====== How to rip DVD subtitles with vobsub2srt ======
The **vobsub2srt** program reads a pair of **subtitles.sub** and **subtitles.idx** files, runs OCR on the images contained in the //sub// file and creates a **subtitles.srt** file with the subtitle text and the timing information obtained from the //idx// file.
The **vobsub2srt** program is not packaged in Debian 12 Bookworm, but it should be possible to compile it from source (see the **[[https://github.com/ruediger/VobSub2SRT|VobSub2SRT GitHub]]** repository). Alternatively, you can get the binary package from the **[[https://deb-multimedia.org/dists/testing/main/binary-i386/package/vobsub2srt|Deb Multimedia repository]]**.
The required Debian packages are:
  * **lsdvd** - From the official Debian repository.
  * **vobcopy** - From the official Debian repository.
  * **mediainfo** - From the official Debian repository.
  * **mkvtoolnix** - From the official Debian repository.
  * **ffmpeg** - From the official Debian repository.
  * **vobsub2srt** - From the Deb Multimedia repository.
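Before starting, it can be handy to check that every tool used on this page is available. The following is only a sketch: the ''missing_tools'' function is a hypothetical helper written for this example, and the tool list in the usage comment (which also includes ''ffmpeg'', used later on this page) is an assumption to adapt to your setup.

```shell
# Print the name of every command given as argument that is NOT found in PATH.
# Hypothetical helper, not part of any of the packages listed above.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage:
# missing_tools lsdvd vobcopy mediainfo ffmpeg mkvextract vobsub2srt
```

If the function prints nothing, all the listed commands were found in the ''PATH''.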
===== Ripping the .vob from the DVD =====
A DVD can contain several **titles** and you should identify which one you want to rip; generally it is the longest one or the one with the most chapters. We check the DVD content using the **lsdvd** tool:
  lsdvd /dev/sr0
  Disc Title: DVD_TITLE
  Title: 01, Length: 01:02:36.480 Chapters: 03, Cells: 03, Audio streams: 02, Subpictures: 04
  Title: 02, Length: 00:00:12.800 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
  Title: 03, Length: 00:21:01.760 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
  Title: 04, Length: 00:00:00.480 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
  Title: 05, Length: 00:21:10.000 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
  Title: 06, Length: 00:20:24.720 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
  Longest track: 01
The longest title is **#1**, so we extract it using **vobcopy**:
  vobcopy -n '1' -i /dev/sr0 --large-file -o .
The resulting file is saved into the working directory (as specified by the **%%-o%%** option) and is named after the DVD title, something like **DVD_TITLE.vob**.
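The title selection can also be scripted. The following is a minimal sketch, assuming that ''lsdvd'' prints a ''Longest track:'' line as in the output shown above; the ''longest_title'' function is a hypothetical helper introduced only for this example.

```shell
# Read lsdvd output on stdin and print the number of the longest title
# without leading zeros (e.g. "Longest track: 01" becomes "1").
longest_title() {
    sed -n 's/^Longest track: 0*\([0-9][0-9]*\).*/\1/p'
}

# Usage (commented out because it needs a DVD in the drive):
# vobcopy -n "$(lsdvd /dev/sr0 | longest_title)" -i /dev/sr0 --large-file -o .
```

Keep in mind that the longest title is not always the one you want, so it is worth reading the ''lsdvd'' listing before relying on this shortcut.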
You can inspect the content of the file using the **mediainfo** tool; in our case the file contains one video stream, two audio streams and three subtitle streams. The subtitles are in the standard DVD format, VobSub, which is an image (bitmap) format, not text.
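To get a quick summary of the streams, the text report of ''mediainfo'' can be filtered. This sketch assumes the usual report layout, where each stream starts with a header line such as ''Video'', ''Audio #1'' or ''Text #1''; the ''count_streams'' function is a hypothetical helper, not a mediainfo feature.

```shell
# Read a mediainfo text report on stdin and count the stream section headers.
# Assumes headers of the form "Video", "Audio #1", "Text #2", one per line.
count_streams() {
    awk '
        /^Video( #[0-9]+)?$/ { v++ }
        /^Audio( #[0-9]+)?$/ { a++ }
        /^Text( #[0-9]+)?$/  { t++ }
        END { printf "video=%d audio=%d text=%d\n", v, a, t }
    '
}

# Usage (commented out because it needs mediainfo and the ripped file):
# mediainfo 'DVD_TITLE.vob' | count_streams
```

The ''Text'' sections are the subtitle streams; these counts are what is needed later to write the stream mapping for ffmpeg.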
===== Converting the .vob into .mkv format =====
As far as I know, there is no tool capable of extracting the VobSub subtitles directly from the //vob// file; one might hope that **ffmpeg** could do this, but it seems it cannot.
Fortunately the **mkvextract** tool (from the mkvtoolnix Debian package) can extract the VobSub stream from a //mkv// file, so we first use ffmpeg to convert the //vob// into //mkv//. In the following example all the streams are copied, without re-encoding. At this step you may want to re-encode the video to squeeze the MPEG-2 stream into the more efficient H.264 format.
  ffmpeg -probesize 500M -analyzeduration 500M \
      -i 'DVD_TITLE.vob' \
      -map 0:v:0 -map 0:a:0 -map 0:a:1 -map 0:s:0 -map 0:s:1 -map 0:s:2 \
      -vcodec 'copy' \
      -acodec 'copy' \
      -scodec 'copy' \
      'DVD_TITLE.mkv'
Notice the several **%%-map%%** options required to embed all the source streams into the destination file; in our example we have **one video** stream, **two audio** streams and **three subtitle** streams. The **%%-probesize%%** and **%%-analyzeduration%%** options are required because the subtitle streams do not start at the very beginning of the file and may otherwise be missed.
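Writing the **%%-map%%** options by hand becomes tedious when a title has many streams. As an illustration, the hypothetical ''build_maps'' helper below (not part of ffmpeg) prints them from the number of video, audio and subtitle streams; it assumes that every stream of each type must be kept.

```shell
# build_maps VIDEO AUDIO SUBS
# Print the ffmpeg -map options for the given numbers of video, audio and
# subtitle streams of input file 0. Hypothetical helper for this example.
build_maps() {
    out=''
    i=0; while [ "$i" -lt "$1" ]; do out="$out -map 0:v:$i"; i=$((i + 1)); done
    i=0; while [ "$i" -lt "$2" ]; do out="$out -map 0:a:$i"; i=$((i + 1)); done
    i=0; while [ "$i" -lt "$3" ]; do out="$out -map 0:s:$i"; i=$((i + 1)); done
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
}

# Usage: one video, two audio and three subtitle streams.
# build_maps 1 2 3
```

The output can be placed in the ffmpeg command line as an unquoted ''$(build_maps 1 2 3)'', so that the shell splits it into separate arguments. If you prefer to re-encode the video, replacing ''-vcodec 'copy''' with something like ''-vcodec libx264 -crf 20'' is a common starting point; the CRF value here is only an example to tune.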
===== Extracting .sub and .idx files from the .mkv =====
From the //mkv// file it is now possible to create **two files** (.sub and .idx) for each subtitle stream. The stream numbering expected by ''mkvextract'' in our example is as follows: **#0** is the video stream, **#1** and **#2** are the two audio streams, so the first subtitle stream is **#3**:
  mkvextract 'DVD_TITLE.mkv' tracks '3:subtitles-3'
The result will be two files: **subtitles-3.sub** and **subtitles-3.idx**. It is possible to repeat the command to extract the other subtitles (**#4** and **#5** in our example).
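Instead of repeating the command by hand, the track IDs can be read from ''mkvmerge -i'' (also part of mkvtoolnix). This is a sketch under the assumption that the identification output contains lines of the form ''Track ID 3: subtitles (VobSub)''; both ''vobsub_track_ids'' and ''track_specs'' are hypothetical helpers written for this example.

```shell
# Read `mkvmerge -i` output on stdin and print the IDs of the VobSub tracks.
vobsub_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (VobSub).*$/\1/p'
}

# Read track IDs on stdin and print one "ID:subtitles-ID" spec per line,
# which is the argument format used by mkvextract in tracks mode.
track_specs() {
    while read -r id; do
        printf '%s:subtitles-%s\n' "$id" "$id"
    done
}

# Usage (commented out because it needs mkvtoolnix and the mkv file):
# mkvextract 'DVD_TITLE.mkv' tracks \
#     $(mkvmerge -i 'DVD_TITLE.mkv' | vobsub_track_ids | track_specs)
```

Since ''mkvextract'' accepts several track specifications in one call, all the subtitle streams are extracted in a single pass over the file.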
===== OCR the images from the .sub file =====
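Finally, run **vobsub2srt** on the base name (without extension) of the //sub// and //idx// pair; the **%%--tesseract-lang%%** option selects the OCR language (Italian in our example) and **%%--dump-images%%** saves each subtitle image to a file for inspection:
  vobsub2srt --ifo './VTS_01_0.IFO' --dump-images --tesseract-lang ita 'subtitles-3'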
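The .IFO file (found in the VIDEO_TS directory of the DVD) provides the correct palette, width and height; the **%%--ifo%%** option is not mandatory. The recognized text is written to **subtitles-3.srt**.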
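When several subtitle streams were extracted, the OCR step can be repeated in a loop. A minimal sketch follows, assuming the ''subtitles-N.idx'' naming used above and Italian subtitles; ''strip_idx'' is a hypothetical helper that only removes the extension, because vobsub2srt expects the base name.

```shell
# Print each file name given as argument without its trailing ".idx".
strip_idx() {
    for f in "$@"; do
        printf '%s\n' "${f%.idx}"
    done
}

# Usage (commented out because it needs vobsub2srt and the extracted files):
# strip_idx subtitles-*.idx | while IFS= read -r base; do
#     vobsub2srt --tesseract-lang ita "$base"
# done
```

Each run should produce one //srt// file per stream; if the streams are in different languages, the **%%--tesseract-lang%%** value has to be changed accordingly for each of them.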