====== Ripping the content of a DVD ======
===== How to rip the video and audio streams =====
This recipe uses the **lsdvd** and **vobcopy** programs, which are found in the Debian packages of the same names (verified on **Debian 10 Buster**).
First you need to list the content (titles and chapters) of the video DVD:
lsdvd
Disc Title: MY_DVD_1_DISC1
Title: 01, Length: 00:00:00.580 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 02, Length: 00:00:14.000 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 03, Length: 00:43:32.330 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Title: 04, Length: 00:42:12.320 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 05, Length: 00:44:45.840 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 06, Length: 00:41:55.370 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Longest track: 03
You can also list the content of an ISO image mounted elsewhere:
lsdvd /mnt/
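In the sample listing above, the "Longest track" line reports title 03, although title 05 has the greater Length value. If you want to double-check, the longest title can be computed from the listing itself. This is only a sketch: the function name ''longest_title'' is made up for this example, and it assumes the "Title: NN, Length: HH:MM:SS.mmm" layout shown above.

```shell
# Print the number of the longest title, reading lsdvd output on stdin.
# Assumes lines of the form "Title: NN, Length: HH:MM:SS.mmm ...".
longest_title() {
    awk '/^Title:/ {
        num = $2; sub(/,/, "", num)        # "03," -> "03"
        split($4, t, ":")                  # hours, minutes, seconds.millis
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        if (secs > best) { best = secs; title = num }
    }
    END { if (title != "") print title + 0 }'
}
```

It can be used as ''lsdvd /dev/dvd | longest_title''.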
Then you can rip the required track (e.g. #3) as a single (large) file. The **%%-i%%** parameter will accept the DVD device name or the directory containing the DVD structure:
vobcopy -n 3 -i /dev/dvd --large-file -o ./dstdir
The resulting VOB file will also contain the subtitles, if any.
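If you need several tracks (e.g. the four episodes #3 to #6 of the listing above), the same command can be repeated in a loop. The following is a minimal sketch; the helper name ''rip_titles'' is invented here, and it only prints the commands (dry run) so that you can check them before running anything.

```shell
# Print (dry run) one vobcopy command per requested title number.
# Usage: rip_titles DEVICE DSTDIR TITLE...
rip_titles() {
    device=$1
    dstdir=$2
    shift 2
    for n in "$@"; do
        # Same options as the single-title command above.
        echo vobcopy -n "$n" -i "$device" --large-file -o "$dstdir"
    done
}
```

For example ''rip_titles /dev/dvd ./dstdir 3 4 5 6'' prints four commands; pipe the output to ''sh'' to actually execute them.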
===== How to rip subtitles from the DVD =====
This recipe uses the **mplayer**, **mencoder**, **vobsub2srt** and **tesseract-ocr** programs, from the Debian packages of the same names (tested with **Debian 10 Buster**; ''vobsub2srt'' comes from the **[[http://www.deb-multimedia.org/|deb-multimedia]]** repository). To improve OCR performance on subtitle images you may install the tesseract package for the language of the subtitles, e.g. **tesseract-ocr-ita** for Italian.
Suppose that we have the DVD structure mounted under **/mnt**. If you have the physical disc instead, substitute ''/mnt'' with the DVD device in the commands.
We can use **mplayer** to identify the subtitles available in track #1 (here we find a single subtitle, SID #0):
mplayer -dvd-device /mnt dvd://1 -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
number of subtitles on disk: 1
...
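The identify output is quite verbose, so it can be handy to filter out only the subtitle IDs and their languages. This is a small sketch, assuming the ''ID_SID_<n>_LANG=<code>'' lines shown in the excerpt above; the function name ''list_sids'' is made up for this example.

```shell
# Read "mplayer -identify" output on stdin and print "SID LANG" pairs,
# one per line. Lines that are not ID_SID_<n>_LANG=... are ignored.
list_sids() {
    sed -n 's/^ID_SID_\([0-9][0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}
```

A possible use is ''mplayer -dvd-device /mnt dvd://1 -identify 2>/dev/null | list_sids''.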
Suppose that the DVD track is **#1** (specified via the **%%dvd://%%** option) and the subtitle index is **#0** (specified via the **%%-sid%%** option). Use **mencoder** to extract the subtitle **index** file and the subtitle **bitmaps** file; in the following example the files will be **vobsubs-it.idx** and **vobsubs-it.sub** respectively:
mencoder -dvd-device /mnt dvd://1 \
-nosound \
-ovc 'copy' -o /dev/null \
-ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
-sid 0 -vobsubout vobsubs-it
The .IFO file is required to know the palette to apply to the bitmaps.
The following command, working on the two files **vobsubs-it.idx** and **vobsubs-it.sub**, will do the OCR on each subtitle image using **tesseract** (it requires several minutes to run):
vobsub2srt \
--ifo /mnt/VIDEO_TS/VIDEO_TS.IFO \
--dump-images \
--tesseract-lang ita \
vobsubs-it
The result will be a **vobsubs-it.srt** text file, containing the subtitle text and timing information. The **%%--dump-images%%** option used above keeps **one pgm image** file for each subtitle; omit it if you do not need the images.
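As a quick sanity check after the OCR, you can count how many subtitle entries ended up in the SRT file and compare the number with the dumped images. This is just a sketch: the function name ''count_srt_cues'' is invented here, and it relies on the standard SRT timing lines ("HH:MM:SS,mmm %%-->%% HH:MM:SS,mmm").

```shell
# Count the subtitle entries of an SRT file by counting its timing lines,
# i.e. the lines containing " --> " between start and end time.
count_srt_cues() {
    grep -c -- ' --> ' "$1"
}
```

For example: ''count_srt_cues vobsubs-it.srt''.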
===== Converting a DVD with subtitles to MKV using ffmpeg =====
I had a rather complicated DVD to rip; basically the problems were:
* Subtitles are in **dvdsub** format (which is normal for a DVD), which needs **palette** info to be displayed correctly.
* Different subtitle streams **start at different times**; some start **after several minutes**. The automatic detection performed by ''ffmpeg'' does not detect some of them and gets the sorting wrong.
* **Languages of subtitles** are not automatically detected.
=== Inspect the disk ===
Using **lsdvd** directly on the DVD disc, you can see the **video** tracks, **audio** streams and **subtitles** available:
lsdvd -s /dev/dvd
Disc Title: FREEDOMDOWNTIME
Title: 01, Length: 02:01:38.600 Chapters: 30, Cells: 30, Audio streams: 04, Subpictures: 24
Subtitle: 01, Language: da - Dansk, Content: Undefined, Stream id: 0x20,
Subtitle: 02, Language: de - Deutsch, Content: Undefined, Stream id: 0x21,
...
Title: 02, Length: 01:18:01.000 Chapters: 06, Cells: 06, Audio streams: 01, Subpictures: 00
Title: 03, Length: 00:00:24.066 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 04, Length: 00:00:09.800 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
=== Rip the track ===
First of all I **ripped the first track** (the only one I'm really interested in) from the DVD into a directory:
vobcopy -n 1 -i /dev/dvd --large-file -o ./track1/
Using the **mediainfo** tool you can inspect the resulting vob file to verify that **video**, **audio** and **text** (subtitles) streams are the ones we expect.
=== Get subtitles palette info ===
Then I extracted the first (#0) **dvdsub stream** (there are 24!) from the DVD:
mencoder -dvd-device /dev/dvd dvd://1 \
-nosound \
-ovc 'copy' -o /dev/null \
-ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
-sid 0 -vobsubout vobsubs-sid0
This command will produce two files: **vobsubs-sid0.idx** and **vobsubs-sid0.sub**. Actually we are just interested in the **palette**, which is written into the idx file. It looks something like this:
palette: d7410d, 101010, 0e00d7, d5ccc9, d4b1cb, aac5d0, abd3af, d5ff0c,
d717cc, d6a80b, 8b02d6, 1dca41, 0d007f, 95679f, 8caa67, 783d3f
As an alternative you can get the **.IFO** of the track (for the first track it is **VIDEO_TS/VTS_01_0.IFO**); that file contains the palette info and can be used instead of the palette numbers.
Also **lsdvd** should be able to print the palette, using the option **-P**. But in my tests it produced a palette with different color values, which displayed incorrectly in the final result.
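To avoid copying the palette by hand, it can be read from the idx file and converted into the comma-separated form (without spaces) used by the ''ffmpeg'' command further below. This is a minimal sketch: the function name ''idx_palette'' is made up, and it assumes that the whole palette sits on a single "palette:" line of the idx file.

```shell
# Print the palette of a VobSub .idx file as a comma-separated list
# without spaces. Assumes the palette is on one "palette:" line.
idx_palette() {
    # Keep only the text after "palette:", then drop all whitespace
    # (this also removes the newline, so echo adds one back).
    sed -n 's/^palette:[[:space:]]*//p' "$1" | tr -d '[:space:]'
    echo
}
```

For example: ''idx_palette vobsubs-sid0.idx''.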
=== Transcode with ffmpeg ===
Finally I launched the **ffmpeg** incantation:
ffmpeg -probesize 500M -analyzeduration 500M \
-palette 'd7410d,101010,0e00d7,d5ccc9,d4b1cb,aac5d0,abd3af,d5ff0c,d717cc,d6a80b,8b02d6,1dca41,0d007f,95679f,8caa67,783d3f' \
-i 'FREEDOMDOWNTIME1.vob' \
-map '0:v:0' -map '0:a:0' -map '0:a:1' \
-map '0:s:20' \
-map '0:s:0' -map '0:s:1' -map '0:s:2' -map '0:s:3' \
-map '0:s:4' -map '0:s:5' -map '0:s:6' -map '0:s:7' \
-map '0:s:8' -map '0:s:9' -map '0:s:10' -map '0:s:11' \
-map '0:s:12' -map '0:s:13' -map '0:s:14' -map '0:s:15' \
-map '0:s:16' -map '0:s:17' -map '0:s:18' -map '0:s:19' \
-map '0:s:21' -map '0:s:22' -map '0:s:23' \
-metadata:s:a:0 title='English' -metadata:s:a:0 language=eng \
-metadata:s:a:1 title='English Commented' -metadata:s:a:1 language=eng \
-metadata title='Freedom Downtime' -metadata:s:v:0 title='Freedom Downtime' \
-metadata:s:s:0 language=eng -metadata:s:s:0 title='English' \
-metadata:s:s:1 language=eng -metadata:s:s:1 title='English FCC-Approved' \
-metadata:s:s:2 language=dan -metadata:s:s:2 title='Dansk' \
-metadata:s:s:3 language=deu -metadata:s:s:3 title='Deutsch' \
-metadata:s:s:4 language=spa -metadata:s:s:4 title='Espanol' \
-metadata:s:s:5 language=est -metadata:s:s:5 title='Estonian' \
-metadata:s:s:6 language=per -metadata:s:s:6 title='Persian' \
-metadata:s:s:7 language=fin -metadata:s:s:7 title='Suomi' \
-metadata:s:s:8 language=fra -metadata:s:s:8 title='Francais' \
-metadata:s:s:9 language=heb -metadata:s:s:9 title='Hebrew' \
-metadata:s:s:10 language=hrv -metadata:s:s:10 title='Hrvatski' \
-metadata:s:s:11 language=ita -metadata:s:s:11 title='Italiano' \
-metadata:s:s:12 language=jpn -metadata:s:s:12 title='Japanese' \
-metadata:s:s:13 language=nld -metadata:s:s:13 title='Nederlands' \
-metadata:s:s:14 language=nor -metadata:s:s:14 title='Norsk' \
-metadata:s:s:15 language=pol -metadata:s:s:15 title='Polish' \
-metadata:s:s:16 language=por -metadata:s:s:16 title='Portugues' \
-metadata:s:s:17 language=rus -metadata:s:s:17 title='Russian' \
-metadata:s:s:18 language=swe -metadata:s:s:18 title='Svenska' \
-metadata:s:s:19 language=tur -metadata:s:s:19 title='Turkish' \
-metadata:s:s:20 language=zho -metadata:s:s:20 title='Chinese' \
-metadata:s:s:21 language=xxx -metadata:s:s:21 title='Babel nonsense' \
-metadata:s:s:22 language=xxx -metadata:s:s:22 title='Game' \
-metadata:s:s:23 language=xxx -metadata:s:s:23 title='Words' \
-codec:s 'dvdsub' \
-vf yadif \
-codec:v 'libx264' -pix_fmt 'yuvj420p' -preset 'veryslow' -tune 'film' -profile:v 'high' -level:v 5 \
-b:v '2048k' \
-ac 2 -codec:a 'libvorbis' -b:a '192k' \
'FREEDOMDOWNTIME1.mkv'
Without the **%%-probesize%%** and **%%-analyzeduration%%** options (both are required), ''ffmpeg'' does not see the subtitle streams that start some time after the beginning of the video. If you explicitly map an unseen stream it will produce an error like this:
Stream map '0:s:20' matches no streams.
If you don't explicitly map the streams, you will get only a warning message during the transcode:
New subtitle stream 0:27 at pos:8284174 and DTS:20.0200s
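Before launching a long transcode, it may be useful to check how many subtitle streams are actually detected with the chosen probe settings. The sketch below uses **ffprobe** (shipped with ffmpeg) with the same ''-probesize'' and ''-analyzeduration'' values as above; the function name ''count_subtitle_streams'' is made up, and I am assuming that ffprobe detects the same streams that ffmpeg would see with these values.

```shell
# Print how many subtitle streams ffprobe detects in the given file,
# using the same enlarged probe settings as the ffmpeg command above.
count_subtitle_streams() {
    ffprobe -v error -probesize 500M -analyzeduration 500M \
        -select_streams s -show_entries stream=index -of csv=p=0 "$1" | wc -l
}
```

For example ''count_subtitle_streams FREEDOMDOWNTIME1.vob''; if the number is lower than the subpictures count reported by lsdvd, some streams are still unseen.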
I mapped (i.e. selected to be inserted into the output) the **video track**, then **two audio tracks** (there were four), and finally **24 subtitle tracks** (they are actually bitmaps in dvdsub format). The order of the **%%-map%%** options is used to re-arrange the position of the subtitles, overriding the autodetection performed by ''ffmpeg''. All the **%%-metadata%%** options are used to properly tag the subtitles once they are sorted as I want.
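The long list of subtitle **%%-map%%** options does not need to be typed by hand. As a sketch (the function name ''subtitle_maps'' is invented for this example), a small loop can print them in the same order used in the command above: stream 20 first, then 0 to 19, then 21 to 23.

```shell
# Print the -map options for the 24 subtitle streams on one line:
# input stream 20 first, then 0..19, then 21..23.
subtitle_maps() {
    for i in 20 $(seq 0 19) $(seq 21 23); do
        printf '%s ' "-map 0:s:$i"
    done
    echo
}
```

The output can be pasted into the command line, or used unquoted as ''$(subtitle_maps)'' so that the shell splits it into separate arguments.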
It is mandatory to use the **%%-codec:s 'dvdsub'%%** option for subtitles: if you use the **copy** option (which does not re-encode the stream) the **palette** is not applied and you will get subtitles with wrong colors.
Yes, the source video has annoying **interlacing artifacts**, so I used the **yadif** video filter to apply a deinterlace effect.