For a simple recipe to rip (extract) the content of a DVD using Debian 10, see Ripping the content of a DVD.
On Debian: (substitute k7 with 586 for Intel users)
apt-get install mplayer-k7 mencoder-k7 ogmtools libdvdcss dvdbackup
On Gentoo:
emerge mplayer ogmtools libdvdcss dvdbackup
On other distributions, use the appropriate package management tools to install mplayer, mencoder (which may be part of the mplayer package), ogmtools, libdvdcss and dvdbackup.
Change to a directory on a disk with 10GiB+ free.
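If you are not sure how much room you have, a quick check along these lines may help. This is only a sketch: free_gib is a helper name I made up, and it assumes a POSIX df whose fourth column (with -Pk) is the available space in KiB.

```sh
#!/bin/sh
# free_gib DIR: print the free space on DIR's filesystem in whole GiB.
# Hypothetical helper; relies on "df -Pk" printing the available space
# in KiB in the 4th column of its second line.
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Usage: warn if the current directory has less than 10 GiB free.
# [ "$(free_gib .)" -ge 10 ] || echo "Less than 10 GiB free here" >&2
```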
Back up the DVD with: (where MyDVD is the name of your project)
dvdbackup -i /dev/dvd -M -o MyDVD
cd MyDVD
ls
You should see one directory. I will call this directory $RIPDIR.
If your DVD was from the wrong region and dvdbackup says it was unable to get the CSS keys, don't panic. Libdvdcss doesn't give a damn about regions (quite rightly), but it needs some help - you need to use ide-scsi. Unlike that Redmond OS, it also doesn't matter what region your drive is in (or RPC-I / RPC-II firmware), it just works.
Exception
Matsushita / Matshita / Panasonic (all synonyms) drives will mostly not work. You need to get a patched firmware from somewhere like http://www.rpc1.org/
Follow this procedure:
Build ide-scsi into your kernel and, when you boot up, pass the kernel the argument hdc=ide-scsi (where hdc is the name of your dvdrom device).
If you built ide-scsi as a module, modprobe it (with the right options, which I don't know).
$ mkdir tmp-dev
$ cd tmp-dev
$ sudo MAKEDEV hdc # Where hdc is the name of your dvdrom
$ sudo hdparm -d1 hdc
$ cd ..
$ sudo rm -rf tmp-dev
Of course when you want to write DVDs, ide-scsi must be off. Life is tough.
DVDs are made up of a number of titles. Generally, each video on the DVD is a title (i.e. main feature is title 1, behind the scenes documentary is title 2, etc.)
First we need to determine which title we want to rip. You can use xine, totem, ogle, etc. for this:
totem dvd://$RIPDIR
Navigate to the main feature and see what Title your player says it is. I will call it $TITLE.
The movie probably has lots of black space around it. We might as well get rid of it to save some file space (and a little screen space).
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf cropdetect -ss 50:00
Let it play for a little, (until you reach a bit where you can see the edges of the picture) then quit. You will see output like:
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
Replace cropdetect with the crop command above and run mplayer again. It should have the picture perfectly cropped:
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf crop=640:416:10:80 -ss 50:00
I will call that 640:416:10:80 bit $CROP.
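If you would rather not copy the numbers by hand, something like the following can pull them out of the cropdetect output. It is a rough sketch: extract_crop is a name I invented, and it assumes the output contains crop=W:H:X:Y exactly as in the sample above.

```sh
#!/bin/sh
# extract_crop: read mplayer cropdetect output on stdin and print the
# last suggested crop value (the W:H:X:Y part after "crop=").
# Hypothetical helper; assumes the "crop=640:416:10:80" style shown above.
extract_crop() {
    sed -n 's/.*crop=\([0-9]*:[0-9]*:[0-9]*:[0-9]*\).*/\1/p' | tail -n 1
}

# Usage (assumes $RIPDIR and $TITLE are set as described above, and that
# playing 30 seconds from the 50 minute mark is enough to see the edges):
# CROP=$(mplayer -dvd-device "$RIPDIR" "dvd://$TITLE" -vf cropdetect \
#        -ss 50:00 -endpos 30 2>/dev/null | extract_crop)
```

The last suggestion is used because cropdetect refines its guess as it sees more frames.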
Scaling options:
Add :autoaspect to the -lavcopts string for mencoder.
Or look for a line like VO: [xv] 720×576 ⇒ 1024×576 Planar YV12 in your mplayer output and set $SCALE=scale=1024:720. (This must go in the -vf options.)
Or experiment with the size (by running mplayer with -vf scale -xy 650 and tweaking the 650), find the width you like, and set $SCALE accordingly, e.g. $SCALE=scale=654:368. (This must go in the -vf options.)
There are several different ways to encode the video. The best quality is obtained by having three (main) separate passes.
The advantages of a three pass encode are that we can get exactly the right file size (for, say, 2 CDs), and we can use containers besides AVI (which sucks big time compared to OGM and Matroska).
If you want an avi, encode your audio like this:
mencoder -dvd-device $RIPDIR dvd://$TITLE -ovc frameno -oac mp3lame -o frameno.avi
It will tell you some bitrates to use for various common rip-sizes based on the audio size.
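The arithmetic behind those suggestions is roughly the following. This is my own sketch, not what mencoder actually runs: video_bitrate is a made-up helper, it ignores container overhead, and it assumes 1 kbit = 1000 bits.

```sh
#!/bin/sh
# video_bitrate TARGET_MIB AUDIO_BYTES DURATION_S
# Print the video bitrate in kbit/s (1 kbit = 1000 bits) that fills what
# is left of TARGET_MIB once the audio is accounted for.
# Hypothetical helper; container overhead is ignored, so round down a bit.
video_bitrate() {
    target_bytes=$(( $1 * 1048576 ))
    echo $(( (target_bytes - $2) * 8 / $3 / 1000 ))
}

# Example: two 700 MiB CDs, 100 MiB of audio, a two hour film.
video_bitrate 1400 104857600 7200
```

The example prints 1514, which would go into vbitrate= in the -lavcopts string.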
For ogg, rip the audio: (you can tweak the ogg quality as necessary)
mplayer -dvd-device $RIPDIR dvd://$TITLE -vc dummy -vo null -hardframedrop -ao pcm:file=audio.wav
normalize-audio audio.wav
oggenc -q 2.5 audio.wav
Additional audio tracks can be ripped using mplayer's -aid option. Find the right id with -identify and some trial and error.
dvdxchap -t $TITLE $RIPDIR > chapters.txt
Feel free to tweak bitrate (and other lavc options):
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP $SCALE \
  -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=1 \
  -oac copy -o /dev/null
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP $SCALE \
  -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=2 \
  -oac copy -o video.avi
Remember that $SCALE might or might not be part of -vf (-vf options are comma separated).
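One way to avoid getting the commas wrong is to build the -vf value in a small function. This is just a sketch; build_vf is a name I made up, and it assumes $SCALE is either empty or a complete filter such as scale=654:368.

```sh
#!/bin/sh
# build_vf CROP [SCALE]: print the value to pass to -vf.
# Hypothetical helper: the crop filter always comes first, and the scale
# filter is appended after a comma only when one was given.
build_vf() {
    if [ -n "$2" ]; then
        echo "crop=$1,$2"
    else
        echo "crop=$1"
    fi
}

# Usage in the mencoder commands:
# mencoder ... -vf "$(build_vf "$CROP" "$SCALE")" ...
```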
For a high-quality rip, I generally use a bitrate of 1500. If I'm rescaling down to a height of 384, I use 1000.
For really high quality at the expense of encoding time, add :v4mv:mbd=2:trell to your -lavcopts.
If you don't want to preview your avi at this stage, you can replace -oac copy with -nosound. We will totally ignore the sound track in this avi file at the ogmmerge stage.
ogmmerge -o "Title.ogm" -c "LANGUAGE=English" audio.ogg chapters.txt -c "TITLE=Title" -A video.avi
For extra audio tracks, add in -c "LANGUAGE=English: Director Commentary" audio-c.ogg, for example.
For a two pass encode, we are forced to end up with an AVI (or an MPG). Video quality remains the same as for three passes, though. It isn't much shorter, time-wise.
For this, skip the Frameno and Merge OGM steps. Change the -oac option on your second video pass mencoder command from copy to mp3lame.
For a one pass encode, we have the same restrictions as for two passes, but it takes about half the time (at the expense of video quality).
Now, we skip the first pass of the video encode, and remove the vpass=2 option from the mencoder command. You must make the same change to -oac as for two-pass.
The following programs are missing in Debian 10 Buster: tcextract, subtitle2vobsub and subtitle2pgm. We are searching for some alternatives.
DVDs have subtitles stored as images. There are some options for dealing with them:
Using transcode, as explained here, seems to be the best method to extract subtitles from a DVD-Video.
NOTICE: The extract operation can be accomplished with mencoder, but mencoder seems to write different image data into the .sub file and slightly different timestamps into the index (.idx) file depending on the video codec used (-ovc option): strangely enough, I got different outputs using the copy and raw options. Transcode seems to be more deterministic.
VobSub is a well-known subtitle format that saves subtitles in nearly the same format as they appear in DVD subtitle streams. From a technical point of view, VobSub saves subtitles as little images.
Use mplayer to identify the subtitle streams contained in the DVD; each one is identified by an ID and a language:
mplayer -dvd-device $RIPDIR dvd://$TITLE -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
ID_SUBTITLE_ID=1
ID_SID_1_LANG=en
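The -identify output is long, so it can help to filter it down to just the subtitle IDs and languages. A small sketch, assuming the ID_SID_n_LANG=xx lines look exactly like the sample above (list_subtitles is a name I made up):

```sh
#!/bin/sh
# list_subtitles: read "mplayer -identify" output on stdin and print one
# "ID LANGUAGE" pair per subtitle stream.
# Hypothetical helper; it only looks at lines of the form ID_SID_<n>_LANG=<xx>.
list_subtitles() {
    sed -n 's/^ID_SID_\([0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}

# Usage (assumes $RIPDIR and $TITLE are set as described above):
# mplayer -dvd-device "$RIPDIR" "dvd://$TITLE" -identify 2>/dev/null | list_subtitles
```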
The tccat command will concatenate all the files that compose the specified $TITLE to the standard output. Files are taken from the directory where the DVD-Video was ripped ($RIPDIR).
The tcextract command extracts the requested stream; ps1 stands for MPEG private stream (subtitles), and the source type (-t vob) must be specified when reading from standard input.
NOTICE: The number 0x21 is 0x20 + the subtitle ID.
tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
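To work out the value for -a from a subtitle ID without doing hex in your head, a sketch like this will do (subtitle_stream_id is a name I invented; the 0x20 offset is the one given in the notice above):

```sh
#!/bin/sh
# subtitle_stream_id SID: print the value for tcextract's -a option,
# i.e. 0x20 plus the subtitle ID, in hexadecimal.
# Hypothetical helper built on the "0x20 + subtitle ID" rule above.
subtitle_stream_id() {
    printf '0x%x\n' $(( 0x20 + $1 ))
}

# Usage: subtitle ID 1 (English in the sample listing) gives 0x21.
# tccat -i "$RIPDIR" -T "$TITLE" -L | \
#     tcextract -x ps1 -t vob -a "$(subtitle_stream_id 1)" > subtitles_stream.ps1
```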
If you have just the .VOB files, you can use this recipe:
cat VTS_02_?.VOB | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
Use the scripts from "How to rip DVD subtitles with vobsub2srt" to obtain the VobSub files:
subtitle2vobsub -p subtitles_stream.ps1 -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO -o subtitles
We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the VobSub format; two files will be generated: subtitles.idx and subtitles.sub.
If you need to extract only a part of subtitle stream (e.g. if you have cut the original track into several pieces), just use the -e option, to indicate the start, the end and a new_start (new time offset) of the extraction, in seconds, like this:
subtitle2vobsub -p subtitles_stream.ps1 \
  -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO \
  -e 9673.914,12673,0 -o subtitles
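Players usually show positions as hours:minutes:seconds, while -e wants plain seconds. A conversion sketch, assuming an h:mm:ss[.mmm] input (hms_to_seconds is a made-up helper name):

```sh
#!/bin/sh
# hms_to_seconds H:MM:SS[.mmm]: print the time as seconds with three
# decimals, ready to paste into the -e option.
# Hypothetical helper; expects exactly three colon-separated fields.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Usage: the start offset from the example above.
hms_to_seconds 2:41:13.914
```

The usage line prints 9673.914, the start value used in the command above.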
Right, let's make our lives really nasty and create hundreds of PGM files:
cat subtitles_stream.ps1 | subtitle2pgm
If you want to control how grey levels are converted, try to use the -c option of subtitle2pgm, something like: -c 255,0,0,255.
Each subtitle should now be one file named like movie_subtitle0003.pgm, and a movie_subtitle.srtx file will be created to index them and their times on-screen.
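#!/bin/sh
# OCR every .pgm subtitle image; tesseract writes its text to "$file.txt".
find . -type f -name '*.pgm' | sort | while read -r file; do
    # Print the name of the image being processed, without a newline.
    printf '%s ' "$(basename "$file")"
    tesseract -l eng --psm 4 "$file" "$file"
done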
NOTICE: Don't use the following, because Gocr is not the best tool for OCR. Use Tesseract OCR instead.
To OCR all the .pgm images with gocr (using a nice wrapper for the job):
pgm2txt -d -f en -v -s 10 movie_subtitle
It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character…)
Now we will re-merge all these text files produced into a big subtitle file:
srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt
Now it's time to proofread. I prefer to go through each one manually:
display *.pgm & vim english.srt
You can use spacebar to advance your images in display.
Gocr is very predictable, so if it makes a mistake once, it will do it again, a lot! Use your editor's regular expression features whenever you spot a mistake to correct all the instances. It saves time.
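If you prefer to do this outside your editor, sed can apply the same corrections to the whole file. A sketch only: fix_ocr is a name I made up, and the two substitutions are invented examples of a recurring misread (a lone lowercase "l" that should be "I"), so swap in the mistakes you actually find.

```sh
#!/bin/sh
# fix_ocr: apply the same corrections to every subtitle line on stdin.
# Hypothetical helper; the substitutions below are made-up examples of a
# recurring misread, a lone lowercase "l" that should have been "I".
fix_ocr() {
    sed -e 's/^l /I /' -e 's/ l / I /g'
}

# Usage: write the corrected subtitles to a new file, then compare.
# fix_ocr < english.srt > english.fixed.srt
```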
Then spell check:
aspell -l british -c english.srt
You can now add english.srt onto the end of your ogmmerge command. Oh, and stick a -c LANGUAGE=English before it.
Finally, you can proofread the final .srt file using the graphical interface of Gaupol, a full-featured subtitle editor. It can handle some of the more common operations required.