How to rip DVD subtitles with vobsub2srt

The vobsub2srt program reads a pair of subtitles.sub and subtitles.idx files, OCRs the images contained in the sub file and creates a subtitles.srt file with the subtitles text and the appropriate timing information obtained from the idx file.

The vobsub2srt program does not exist in Debian 12 Bookworm, but it should be possible to compile it from source (see the VobSub2SRT GitHub repository). Alternatively, you can get the binary package from the Deb Multimedia repository.

The required Debian packages, going by the tools used in the steps below, are: lsdvd, vobcopy, mediainfo, ffmpeg and mkvtoolnix.

Ripping the .vob from the DVD

A DVD can contain several titles, and you should identify which one you want to rip; generally it is the longest one or the one with the most chapters. We check the DVD content using the lsdvd tool:

lsdvd /dev/sr0
Disc Title: DVD_TITLE
Title: 01, Length: 01:02:36.480 Chapters: 03, Cells: 03, Audio streams: 02, Subpictures: 04
Title: 02, Length: 00:00:12.800 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 03, Length: 00:21:01.760 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 04, Length: 00:00:00.480 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 05, Length: 00:21:10.000 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 06, Length: 00:20:24.720 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Longest track: 01
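
The "Longest track" line at the end of the listing can also be picked up by a script. The following is a minimal sketch, assuming the output format shown above; longest_title is a hypothetical helper name, not part of lsdvd.

```shell
# Print the title number from the "Longest track:" line of an lsdvd listing.
# Reads the listing on standard input; prints nothing if the line is absent.
# (longest_title is a hypothetical helper, not an lsdvd feature.)
longest_title() {
    # Split on ": " and add 0 so that "01" is printed as 1.
    awk -F': *' '/^Longest track:/ { print $2 + 0 }'
}

# Possible usage with a real drive:
#   title=$(lsdvd /dev/sr0 | longest_title)
#   echo "Longest title: $title"
```

The helper only reads text, so it can be tried on a saved copy of the listing without a disc in the drive.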

The longest title is #1, so we will extract it using vobcopy:

vobcopy -n '1' -i /dev/sr0 --large-file -o .

The resulting file will be saved into the working directory (as specified by the -o option) and will be named after the DVD title, something like DVD_TITLE.vob.

You can inspect the content of the file using the mediainfo tool. In our case the file contains one video stream, two audio streams and three subtitle streams. The subtitles are in the standard DVD format, VobSub, which is an image (bitmap) format, not text.
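
If a scriptable summary is preferred over reading the mediainfo report, the stream types can be counted from ffprobe output instead. This is a sketch using a different tool from the one above (ffprobe ships with ffmpeg); count_streams is a hypothetical helper that expects one "INDEX,TYPE" line per stream on standard input.

```shell
# Summarise stream types from lines of the form "INDEX,TYPE", for example
# as printed by:
#   ffprobe -v error -probesize 500M -analyzeduration 500M \
#       -show_entries stream=index,codec_type -of csv=p=0 'DVD_TITLE.vob'
# (count_streams is a hypothetical helper name.)
count_streams() {
    awk -F',' '
        $2 == "video"    { v++ }
        $2 == "audio"    { a++ }
        $2 == "subtitle" { s++ }
        END { printf "video=%d audio=%d subtitle=%d\n", v, a, s }
    '
}

# Possible usage:
#   ffprobe -v error -probesize 500M -analyzeduration 500M \
#       -show_entries stream=index,codec_type -of csv=p=0 'DVD_TITLE.vob' \
#       | count_streams
```

Streams of any other type are simply not counted.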

Converting the .vob into .mkv format

As far as I know, there is no tool capable of extracting the VobSub subtitles directly from the vob file; we might hope that ffmpeg could do this, but it seems not.

Fortunately, mkvextract (from the mkvtoolnix Debian package) can extract the VobSub stream from an mkv file, so we first use ffmpeg to convert the vob into mkv. In the following example all the streams are copied, without re-encoding. At this step you may want to re-encode the video to squeeze the MPEG-2 stream into the more efficient H.264 format.

ffmpeg -probesize 500M -analyzeduration 500M \
    -i 'DVD_TITLE.vob' \
    -map 0:v:0 -map 0:a:0 -map 0:a:1 -map 0:s:0 -map 0:s:1 -map 0:s:2 \
    -vcodec 'copy' \
    -acodec 'copy' \
    -scodec 'copy' \
    'DVD_TITLE.mkv'

Notice the several -map options required to embed all the source streams into the destination file; in our example we have one video stream, two audio streams and three subtitle streams. The -probesize and -analyzeduration options are needed because the subtitle streams do not start at the very beginning of the file and might otherwise be missed.
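
For a title with a different number of streams, the list of -map options can be generated rather than typed by hand. A minimal sketch, assuming a single video stream; build_map_args is a hypothetical helper that takes the number of audio streams and the number of subtitle streams.

```shell
# Print the ffmpeg -map options for one video stream plus the given number
# of audio and subtitle streams. (build_map_args is a hypothetical helper.)
build_map_args() {
    n_audio=$1
    n_subs=$2
    args='-map 0:v:0'
    i=0
    while [ "$i" -lt "$n_audio" ]; do
        args="$args -map 0:a:$i"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$n_subs" ]; do
        args="$args -map 0:s:$i"
        i=$((i + 1))
    done
    printf '%s\n' "$args"
}

# For the example title (two audio streams, three subtitle streams):
#   build_map_args 2 3
# prints: -map 0:v:0 -map 0:a:0 -map 0:a:1 -map 0:s:0 -map 0:s:1 -map 0:s:2
```

The output can be placed on the ffmpeg command line as an unquoted $(build_map_args 2 3), so that the shell splits it into separate arguments.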

Extracting .sub and .idx files from the .mkv

From the mkv file it is now possible to create two files (.sub and .idx) for each subtitle stream. The stream numbering expected by mkvextract in our example is as follows: #0 is the video stream, #1 and #2 are the two audio streams, so the first subtitle stream is #3:

mkvextract 'DVD_TITLE.mkv' tracks -c 'S_VOBSUB' '3:subtitles-3'

The result will be two files: subtitles-3.sub and subtitles-3.idx. It is possible to repeat the command to extract the other subtitles (#4 and #5 in our example).
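
The track numbers follow from the stream counts, so the "ID:name" arguments for all subtitle tracks can be computed in one go. This is a sketch under the assumption that the mkv holds one video track, then the audio tracks, then the subtitle tracks, as in the example above; subtitle_track_specs is a hypothetical helper.

```shell
# Print one "ID:basename" argument per subtitle track, assuming the mkv
# contains one video track (ID 0), then n_audio audio tracks, then n_subs
# subtitle tracks. (subtitle_track_specs is a hypothetical helper.)
subtitle_track_specs() {
    n_audio=$1
    n_subs=$2
    id=$((1 + n_audio))          # ID of the first subtitle track
    last=$((id + n_subs - 1))    # ID of the last subtitle track
    specs=''
    while [ "$id" -le "$last" ]; do
        specs="${specs}${specs:+ }${id}:subtitles-${id}"
        id=$((id + 1))
    done
    printf '%s\n' "$specs"
}

# For the example file (two audio tracks, three subtitle tracks):
#   subtitle_track_specs 2 3
# prints: 3:subtitles-3 4:subtitles-4 5:subtitles-5
```

mkvextract accepts several track specifications in a single run, so the output can be appended, unquoted, after the tracks keyword.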

OCR the images from the .sub file

vobsub2srt --ifo './VTS_01_0.IFO' --dump-images --tesseract-lang ita 'subtitles-3'

The .IFO file is used to get the correct palette, width and height, but it is not mandatory.
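
As a rough sanity check on the result, the number of entries in the .idx file can be compared with the number of cues in the generated .srt file. A minimal sketch, assuming the usual "timestamp:" lines in the .idx and "start --> end" timing lines in the .srt; both helper names are hypothetical, and the two counts are not guaranteed to match exactly.

```shell
# Count the entries listed in a VobSub .idx file
# (lines starting with "timestamp:"). Hypothetical helper.
idx_entry_count() {
    grep -c '^timestamp:' "$1"
}

# Count the cues in an .srt file by matching its timing lines,
# e.g. "00:00:01,000 --> 00:00:03,000". Hypothetical helper.
srt_cue_count() {
    grep -c -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> ' "$1"
}

# Possible usage after the OCR step:
#   echo "idx entries: $(idx_entry_count subtitles-3.idx)"
#   echo "srt cues:    $(srt_cue_count subtitles-3.srt)"
```

A large difference between the two numbers suggests that some images were not converted, in which case the files written by --dump-images are worth a look.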