Ripping DVDs with Mencoder
For a simple recipe to rip (extract) the content of a DVD using Debian 10, see Ripping the content of a DVD.
Install the necessary programs
On Debian: (substitute k7 with 586 for Intel users)
apt-get install mplayer-k7 mencoder-k7 ogmtools libdvdcss dvdbackup
On Gentoo:
emerge mplayer ogmtools libdvdcss dvdbackup
On other distributions, use the appropriate package management tools to install mplayer, mencoder (which may be part of the mplayer package), ogmtools, libdvdcss and dvdbackup.
Rip & Unencrypt the DVD
Change to a directory on a disk with 10GiB+ free.
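A quick way to check the free space before starting is sketched here. The function name has_free_gib is made up for this example, and it assumes a POSIX df whose second output line holds the figures for the directory:

```shell
# Succeed if the filesystem holding directory $1 has at least $2 GiB free.
# has_free_gib is a hypothetical helper name, not part of any package.
has_free_gib() {
    # df -Pk prints one header line, then one line with sizes in KiB;
    # the fourth column is the available space.
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}

if has_free_gib . 10; then
    echo "Enough room here"
else
    echo "Less than 10GiB free here"
fi
```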
Backup the DVD with: (where MyDVD is the name of your project)
dvdbackup -i /dev/dvd -M -o MyDVD
cd MyDVD
ls
You should see one directory. I will call this directory $RIPDIR.
If you get key errors
If your DVD is from the wrong region and dvdbackup says it was unable to get the CSS keys, don't panic. Libdvdcss doesn't give a damn about regions (quite rightly), but it needs some help: you need to use ide-scsi. Unlike with that Redmond OS, it also doesn't matter what region your drive is set to (or whether it has RPC-I or RPC-II firmware); it just works.
Exception
Matsushita / Matshita / Panasonic (all synonyms) drives will mostly not work. You need to get a patched firmware from somewhere like http://www.rpc1.org/
Follow this procedure:
- Compile your kernel with ide-scsi and, when you boot up, pass the kernel the argument hdc=ide-scsi (where hdc is the name of your dvdrom device).
- OR, if you have ide-scsi as a module, modprobe it (with the right options, which I don't know).
- Try again.
- If you now find your dvdrom is dead slow and your machine unusable when you rip, you need to follow this procedure to enable DMA.
$ mkdir tmp-dev
$ cd tmp-dev
$ sudo MAKEDEV hdc   # Where hdc is the name of your dvdrom
$ sudo hdparm -d1 hdc
$ cd ..
$ sudo rm -rf tmp-dev
Of course, when you want to write DVDs, ide-scsi must be off. Life is tough.
Determine Encoding Parameters
Title
DVDs are made up of a number of titles. Generally, each video on the DVD is a title (i.e. main feature is title 1, behind the scenes documentary is title 2, etc.)
First we need to determine which title we want to rip. You can use xine, totem, ogle, etc. for this:
totem dvd://$RIPDIR
Navigate to the main feature and see what Title your player says it is. I will call it $TITLE
Cropping
The movie probably has lots of black space around it. We might as well get rid of it to save some file space (and a little screen space).
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf cropdetect -ss 50:00
Let it play for a little, (until you reach a bit where you can see the edges of the picture) then quit. You will see output like:
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
Replace cropdetect with the crop command above and run mplayer again. It should have the picture perfectly cropped:
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf crop=640:416:10:80 -ss 50:00
I will call that 640:416:10:80 bit $CROP.
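If you would rather not copy the value by hand, something along these lines could pull it out of the cropdetect output. This is only a sketch: last_crop is a made-up helper name, and it assumes the output lines look like the sample above.

```shell
# Print the last crop value found in cropdetect output read from stdin.
# Assumes lines containing "(-vf crop=W:H:X:Y)" as in the sample output.
last_crop() {
    sed -n 's/.*(-vf crop=\([0-9:]*\)).*/\1/p' | tail -n 1
}

# Example using the sample line from this page:
echo 'crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%' | last_crop
```

In practice you would pipe the output of the cropdetect run into last_crop and store the result, e.g. CROP=$(... | last_crop).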
Scaling
Scaling options:
- Don't rescale at all (will only play nicely in mplayer and other decent players if it isn't 4:3).
  - You will need to add :autoaspect to the -lavcopts string for mencoder.
  - For a high-quality rip, this is the option.
- Rescale to square pixels without resizing.
  - To do so, look for a line like VO: [xv] 720×576 ⇒ 1024×576 Planar YV12 in your mplayer output.
  - The second set of dimensions is the one you want to scale to.
  - I will say $SCALE=scale=1024:576. (This must go in the -vf option.)
- Rescale and resize.
  - I often resize to a vertical height of 368 (a multiple of 16). You can choose whatever multiple you want.
  - Then (either by calculation or by trial and error using -vf scale -xy 650 and tweaking the 650), find the width. I will say $SCALE=scale=654:368. (This must go in the -vf option.)
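The "by calculation" route for the width can be sketched as below. The helper name scale_width is invented here, and it assumes you feed it the display dimensions reported by mplayer (the second pair on the VO: line) and simply keep that aspect ratio:

```shell
# Width that keeps the display aspect ratio for a chosen target height.
# Arguments: display width, display height, target height (all integers).
# scale_width is a hypothetical helper; the result is rounded to nearest.
scale_width() {
    echo $(( ($1 * $3 + $2 / 2) / $2 ))
}

# 1024x576 display size, target height 368:
scale_width 1024 576 368   # prints 654
```

With the numbers used on this page this gives the 654 in scale=654:368.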
Three-pass encode
There are several different ways to encode the video. The best quality is obtained by having three (main) separate passes:
- Extract Audio
- Encode Audio (this is a separate step if we are using OGG/Vorbis)
- Examine Video to determine the compressibility of each frame.
- Compress Video
- Merge audio and video (if it is an OGG or Matroska file.)
The advantages of a three pass encode are that we can get exactly the right file size (for, say, 2 CDs), and we can use containers besides AVI (which sucks big time compared to OGM and Matroska).
Extract frameno and audio
AVI
If you want an avi, encode your audio like this:
mencoder -dvd-device $RIPDIR dvd://$TITLE -ovc frameno -oac mp3lame -o frameno.avi
It will tell you some bitrates to use for various common rip-sizes based on the audio size.
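If you want a figure for a size mencoder doesn't list, the arithmetic is roughly as follows. This is a back-of-the-envelope sketch, not mencoder's own formula: video_bitrate is a made-up helper and it ignores container overhead.

```shell
# Video bitrate (kbit/s) that fills a target size once the audio is counted.
# Arguments: target size in MiB, audio size in bytes, duration in seconds.
# video_bitrate is a hypothetical helper; container overhead is ignored.
video_bitrate() {
    echo $(( ($1 * 1048576 - $2) * 8 / ($3 * 1000) ))
}

# Two 700MiB CDs, 100MiB of audio, a two-hour film:
video_bitrate 1400 104857600 7200   # prints 1514
```

Round the result down a little to leave room for the container.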
OGG
For ogg, rip the audio: (you can tweak the ogg quality as necessary)
mplayer -dvd-device $RIPDIR dvd://$TITLE -vc dummy -vo null -hardframedrop -ao pcm:file=audio.wav
normalize-audio audio.wav
oggenc -q 2.5 audio.wav
Additional audio tracks can be ripped using mplayer's -aid option. Find the right id with -identify and some trial and error.
Extract chapter points (ogg only)
dvdxchap -t $TITLE $RIPDIR > chapters.txt
Encode video
Feel free to tweak bitrate (and other lavc options):
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
  -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=1 \
  -oac copy -o /dev/null
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
  -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=2 \
  -oac copy -o video.avi
Remember that $SCALE might or might not be part of -vf. When it is, it follows the crop filter after a comma (-vf options are comma separated).
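One way to handle the optional scale filter in a script is to build the -vf value first. A small sketch, with build_vf being a name invented for this example:

```shell
# Build the value for -vf from a crop spec and an optional scale filter.
# Arguments: crop spec (W:H:X:Y), scale filter (e.g. scale=654:368) or "".
# build_vf is a hypothetical helper name.
build_vf() {
    if [ -n "$2" ]; then
        echo "crop=$1,$2"
    else
        echo "crop=$1"
    fi
}

build_vf 640:416:10:80 scale=654:368   # prints crop=640:416:10:80,scale=654:368
build_vf 640:416:10:80 ""              # prints crop=640:416:10:80
```

The result can then be passed as -vf "$(build_vf "$CROP" "$SCALE")" in both passes.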
For a high-quality rip, I generally use a bitrate of 1500. If I'm rescaling down to a height of 384, I use 1000.
For really high quality at the expense of encoding time, add :v4mv:mbd=2:trell to your -lavcopts.
If you don't want to preview your avi at this stage, you can replace -oac copy with -nosound. We will totally ignore the sound track in this avi file at the ogmmerge stage.
Merge OGM file
ogmmerge -o "Title.ogm" -c "LANGUAGE=English" audio.ogg chapters.txt -c "TITLE=Title" -A video.avi
For extra audio tracks, add in -c "LANGUAGE=English: Director Commentary" audio-c.ogg for example.
Two-pass encode
For a two pass encode, we are forced to end up with an AVI (or an MPG). Video quality remains the same as for three passes, though. It isn't much shorter, time-wise…:
- Examine Video
- Encode Video & Audio and mux into AVI.
For this, skip the Frameno and Merge OGM steps. Change the -oac option on your second video pass mencoder command from copy to mp3lame.
One-pass encode
For a one pass encode, we have the same restrictions as for two passes, but it takes about half the time (at the expense of video quality):
- Encode and mux into AVI.
Now, we skip the first pass of the video encode, and remove the vpass=2 option from the mencoder command. You must make the same change to -oac as for two-pass.
Extract Subtitles with transcode
The following programs are missing in Debian 10 Buster: tcextract, subtitle2vobsub and subtitle2pgm. We are searching for some alternatives.
DVDs have subtitles stored as images. There are some options for dealing with them:
- Extract them and keep them as images (vobsubs).
  - Using transcode, as explained below, seems to be the best method to extract subtitles from a DVD-Video.
  - You cannot add this to the OGM file. You have to distribute it as a separate file.
  - It isn't that big - about 4MB on average.
  - You have to manually tell your player (which must be decent) to use the subtitle file.
- OCR them and add them to the OGM file.
  - This takes a couple of hours of your time, but it is nice to do the job properly…
NOTICE: The extract operation can be accomplished with mencoder, but mencoder seems to write different image data into the .sub file and slightly different timestamps into the index (.idx) file depending on the video codec used (-ovc option): strangely enough, I got different outputs using the copy and raw options. Transcode seems to be more deterministic.
VobSub is a well known subtitle format that saves subtitles nearly in the same format as it appears in DVD subtitle streams. From a technical point of view, VobSub saves subtitles as little images.
Extracting the subtitles
Use mplayer to identify the subtitle streams contained in the DVD; each is identified by an ID and a language:
mplayer -dvd-device $RIPDIR dvd://$TITLE -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
ID_SUBTITLE_ID=1
ID_SID_1_LANG=en
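To pick the IDs and languages out of the rest of the -identify noise, a filter like this could help. It is a sketch that assumes the ID_SID_n_LANG lines look exactly like the ones shown above; list_subtitles is a made-up name.

```shell
# Print "ID language" pairs from mplayer -identify output read on stdin.
# Assumes lines of the form ID_SID_<n>_LANG=<code>; list_subtitles is a
# hypothetical helper name.
list_subtitles() {
    sed -n 's/^ID_SID_\([0-9][0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}

# Example using the sample output from this page:
printf 'ID_SUBTITLE_ID=0\nID_SID_0_LANG=it\nID_SUBTITLE_ID=1\nID_SID_1_LANG=en\n' | list_subtitles
```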
The tccat command concatenates all the files that make up the specified $TITLE to the standard output. Files are taken from the directory where the DVD-Video was ripped ($RIPDIR).
The tcextract command extracts the requested stream; ps1 stands for MPEG private stream (subtitles). The source type (-t vob) must be specified when reading from standard input.
NOTICE: The number 0x21 is 0x20 + the subtitle ID.
tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
If you have just the .VOB files, you can use this recipe:
cat VTS_02_?.VOB | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
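The value for -a can be worked out from the subtitle ID as the NOTICE above describes. A tiny sketch (subtitle_stream_id is a name made up for this example):

```shell
# Print the tcextract -a value for a given subtitle ID (0x20 + ID).
# subtitle_stream_id is a hypothetical helper name.
subtitle_stream_id() {
    printf '0x%x\n' $(( 0x20 + $1 ))
}

subtitle_stream_id 1   # prints 0x21
```

So for the "it" stream in the sample output (ID 0) you would pass -a 0x20 instead.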
Use the scripts from "How to rip DVD subtitles with vobsub2srt" to obtain the VobSub files:
subtitle2vobsub -p subtitles_stream.ps1 -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO -o subtitles
We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the VobSub format; two files will be generated: subtitles.idx and subtitles.sub.
If you need to extract only a part of subtitle stream (e.g. if you have cut the original track into several pieces), just use the -e option, to indicate the start, the end and a new_start (new time offset) of the extraction, in seconds, like this:
subtitle2vobsub -p subtitles_stream.ps1 \
  -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO \
  -e 9673.914,12673,0 -o subtitles
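Players usually show positions as hours:minutes:seconds, while -e wants seconds. A small conversion sketch, assuming awk is available (hms_to_seconds is an invented helper name):

```shell
# Convert HH:MM:SS[.mmm] to seconds with three decimals.
# hms_to_seconds is a hypothetical helper name.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 02:41:13.914   # prints 9673.914
```

The start value in the example above corresponds to 02:41:13.914, and the end value 12673 to 03:31:13.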
OCRing
Right, let's make our lives really nasty and create hundreds of PGM files:
cat subtitles_stream.ps1 | subtitle2pgm
If you want to control how grey levels are converted, try to use the -c option of subtitle2pgm, something like: -c 255,0,0,255.
Each subtitle should now be one file named like movie_subtitle0003.pgm, and a movie_subtitle.srtx file will be created to index them and their times on-screen.
With Tesseract OCR
#!/bin/sh
# OCR every .pgm file; tesseract writes the text to "$file.txt".
find . -type f -name '*.pgm' | sort | while read -r file; do
    printf '%s ' "$(basename "$file")"
    tesseract -l eng --psm 4 "$file" "$file"
done
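After a long OCR run it can be worth checking that every image actually produced some text. A sketch, assuming the output base passed to tesseract is the image name itself (so the text ends up in image.pgm.txt); list_missing_ocr is a made-up name:

```shell
# List .pgm files in directory $1 whose OCR text file is missing or empty.
# Assumes the text for foo.pgm was written to foo.pgm.txt.
# list_missing_ocr is a hypothetical helper name.
list_missing_ocr() {
    for f in "$1"/*.pgm; do
        [ -e "$f" ] || continue            # directory has no .pgm files
        [ -s "$f.txt" ] || basename "$f"   # no text, or an empty file
    done
}

list_missing_ocr .
```

Anything it prints is a candidate for a second OCR attempt or for typing in by hand.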
With Gocr
NOTICE: Don't use the following, because Gocr is not the best tool for OCR. Use Tesseract OCR instead.
To OCR all the .pgm images with gocr (using a nice wrapper for the job):
pgm2txt -d -f en -v -s 10 movie_subtitle
It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character…)
Make a single .srt file
Now we will re-merge all these text files produced into a big subtitle file:
srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt
Now it's time to proofread. I prefer to go through each one manually:
display *.pgm & vim english.srt
You can use spacebar to advance your images in display.
Gocr is very predictable, so if it makes a mistake once, it will do it again, a lot! Use your editor's regular expression features whenever you spot a mistake to correct all the instances. It saves time.
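The same kind of bulk correction can be done from the shell with sed. The two substitutions below are only illustrative guesses at typical OCR slips (a vertical bar read for a capital I, a stray space before a comma); fix_ocr is a made-up name, and you would swap in whatever mistakes you actually see:

```shell
# Apply a few search-and-replace fixes to subtitle text read from stdin.
# The patterns are hypothetical examples of recurring OCR mistakes.
fix_ocr() {
    sed -e 's/|/I/g' -e 's/ ,/,/g'
}

printf '| said hello , twice\n' | fix_ocr   # prints: I said hello, twice
```

Run it as fix_ocr < english.srt > english-fixed.srt and compare the two files before replacing the original.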
Then spell check:
aspell -l british -c english.srt
You can now add english.srt onto the end of your ogmmerge command. Oh, and stick a -c LANGUAGE=English before it.
Fixing time, etc
Finally, you can proof-check the final .srt file using the graphical interface of Gaupol, a full-featured subtitle editor. It can handle some of the more common operations required:
- Shift times, from Tools, Shift Positions…
- Renumber subtitles, this is done automatically when you save the project.