====== Ripping DVDs with Mencoder ======
:!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**.
===== Install the necessary programs =====
On Debian: (substitute k7 with 586 for Intel users)
apt-get install mplayer-k7 mencoder-k7 ogmtools libdvdcss dvdbackup
On Gentoo:
emerge mplayer ogmtools libdvdcss dvdbackup
On other distributions, use the appropriate package management tools to install mplayer, mencoder (which may be part of the mplayer package), ogmtools, libdvdcss and dvdbackup.
===== Rip & Unencrypt the DVD =====
Change to a directory on a disk with 10GiB+ free.
Back up the DVD with: (where MyDVD is the name of your project)
dvdbackup -i /dev/dvd -M -o MyDVD
cd MyDVD
ls
You should see one directory. I will call this directory $RIPDIR.
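If you would rather capture that directory name in a shell variable than type it out each time, something like the following sketch may help. The function name ''find_ripdir'' is made up for this example, and it assumes dvdbackup created exactly one subdirectory (named after the disc) inside your project directory.

```shell
# Print the first subdirectory of the given project directory.
# find_ripdir is a hypothetical helper name, not part of dvdbackup.
find_ripdir() {
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"   # strip the trailing slash left by the glob
            return 0
        fi
    done
    return 1                          # no subdirectory found
}

# Example use, run from inside MyDVD:
#   RIPDIR=$(find_ripdir "$PWD")
```

Using an absolute path means the later commands keep working after you change directory.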
==== If you get key errors ====
If your DVD was from the wrong region and dvdbackup says it was unable to get the CSS keys, don't panic. Libdvdcss doesn't give a damn about regions (quite rightly), but it needs some help - you need to use ''ide-scsi''. Unlike that Redmond OS, it also doesn't matter what region your drive is in (or RPC-I / RPC-II firmware), it just works :-)
**Exception** \\
Matsushita / Matshita / Panasonic (all synonyms) drives will mostly not work. You need to get patched firmware from somewhere like [[http://www.rpc1.org/]].
Follow this procedure:
- Enable ''ide-scsi'' in one of two ways:
* Compile your kernel with ''ide-scsi'' and when you boot up, pass the kernel the argument ''hdc=ide-scsi'' (where hdc is the name of your dvdrom device).
* **OR**, if you have ''ide-scsi'' as a module, modprobe it (with the right options, which I don't know).
- Try again.
- If you now find your dvdrom is dead slow and your machine unusable when you rip, you need to follow this procedure to enable DMA.
$ mkdir tmp-dev
$ cd tmp-dev
$ sudo MAKEDEV hdc # Where hdc is the name of your dvdrom
$ sudo hdparm -d1 hdc
$ cd ..
$ sudo rm -rf tmp-dev
Of course when you want to write DVDs, ''ide-scsi'' must be off. Life is tough :-)
===== Determine Encoding Parameters =====
==== Title ====
DVDs are made up of a number of titles. Generally, each video on the DVD is a title (e.g. the main feature is title 1, the behind-the-scenes documentary is title 2, etc.)
First we need to determine which title we want to rip. You can use xine, totem, ogle, etc. for this:
totem dvd://$RIPDIR
Navigate to the main feature and see what Title your player says it is. I will call it $TITLE
==== Cropping ====
The movie probably has lots of black space around it. We might as well get rid of it to save some file space (and a little screen space).
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf cropdetect -ss 50:00
Let it play for a little while (until you reach a bit where you can see the edges of the picture), then quit. You will see output like:
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
crop area: X: 3..653 Y: 74..502 (-vf crop=640:416:10:80)1.2% 0 0 43%
Replace cropdetect with the crop command above and run mplayer again. It should have the picture perfectly cropped:
mplayer -dvd-device $RIPDIR dvd://$TITLE -vf crop=640:416:10:80 -ss 50:00
I will call that 640:416:10:80 bit $CROP.
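Rather than copying the numbers by hand, you could save the cropdetect output and pull the value out with sed. This is only a sketch: ''last_crop'' and the log file name ''cropdetect.log'' are names invented here, and it assumes the output lines contain ''(-vf crop=W:H:X:Y)'' as in the sample above.

```shell
# Print the last crop=W:H:X:Y value found in mplayer cropdetect output on stdin.
# last_crop is a hypothetical helper name.
last_crop() {
    sed -n 's/.*-vf crop=\([0-9]*:[0-9]*:[0-9]*:[0-9]*\)).*/\1/p' | tail -n 1
}

# Example use (cropdetect.log is an assumed file name):
#   mplayer -dvd-device $RIPDIR dvd://$TITLE -vf cropdetect -ss 50:00 2>&1 | tee cropdetect.log
#   CROP=$(last_crop < cropdetect.log)
```

Taking the last match rather than the first gives cropdetect time to settle before the value is read.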
==== Scaling ====
Scaling options:
- Don't rescale at all (will only play nicely in mplayer and other decent players if it isn't 4:3).
* You will need to add '':autoaspect'' to the ''-lavcopts'' string for mencoder.
* For a high-quality rip, this is //the// option.
- Rescale to square pixels without resizing.
* To do so, look for a line like ''VO: [xv] 720x576 => 1024x576 Planar YV12'' in your mplayer output.
* The second set of dimensions is the one you want to scale to.
* I will say ''$SCALE=scale=1024:576''. (This must go in the ''-vf'')
- Rescale and resize.
* I often resize to a vertical height of 368 (a multiple of 16). You can choose whatever multiple you want.
* Then (either by calculation or trial and error using ''-vf scale -xy 650'' and tweaking the 650), find the width
* I will say ''$SCALE=scale=654:368''. (This must go in the ''-vf'')
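For the "by calculation" route, the width is just the display width scaled by the same factor as the height. The sketch below is one way to do that in integer shell arithmetic. ''scale_for_height'' is a hypothetical helper, rounding the width to the nearest even number is my own assumption, and the input dimensions are assumed to be the second pair from the ''VO:'' line.

```shell
# Print a scale= filter for a target height, keeping the display aspect ratio.
# Arguments: display width, display height (the second pair on the VO: line),
# and the target height. The width is rounded to the nearest even number.
scale_for_height() {
    dw=$1 dh=$2 th=$3
    # round(dw*th / (2*dh)) * 2, using integer arithmetic only
    w=$(( (dw * th + dh) / (2 * dh) * 2 ))
    printf 'scale=%s:%s\n' "$w" "$th"
}

scale_for_height 1024 576 368
```

For a 1024x576 display size and a target height of 368 this gives the ''scale=654:368'' value used above. If you cropped the picture, feed in the display size reported with the crop applied.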
===== Three-pass encode =====
There are several different ways to encode the video. The best quality is obtained with three main passes (in bold), plus two supporting steps:
- **Extract Audio**
- Encode Audio (this is a separate step if we are using OGG/Vorbis)
- **Examine Video** to determine the compressibility of each frame.
- **Compress Video**
- Merge audio and video (if it is an OGG or Matroska file).
The advantages of a three pass encode are that we can get exactly the right file size (for, say, 2 CDs), and we can use containers besides AVI (which sucks big time compared to OGM and Matroska).
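To see how the target size translates into a video bitrate, here is a rough calculation. It is a sketch under stated assumptions: ''video_kbps'' is an invented name, the bitrate is taken as kbit/s with 1 kbit = 1000 bits, and container overhead is ignored, so the real figure will be slightly lower.

```shell
# Estimate the video bitrate (kbit/s) that fills a target size.
# Arguments: target size in MiB, audio size in MiB, running time in seconds.
# Container overhead is ignored, so treat the result as an upper bound.
video_kbps() {
    target_mib=$1 audio_mib=$2 seconds=$3
    echo $(( (target_mib - audio_mib) * 1024 * 1024 * 8 / (seconds * 1000) ))
}

# Two 700 MiB CDs, 100 MiB of audio, a two-hour film:
video_kbps 1400 100 7200
```

The example prints 1514, which would go into ''vbitrate='' in the ''-lavcopts'' string.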
==== Extract frameno and audio ====
=== AVI ===
If you want an avi, encode your audio like this:
mencoder -dvd-device $RIPDIR dvd://$TITLE -ovc frameno -oac mp3lame -o frameno.avi
It will tell you some bitrates to use for various common rip-sizes based on the audio size.
=== OGG ===
For ogg, rip the audio: (you can tweak the ogg quality as necessary)
mplayer -dvd-device $RIPDIR dvd://$TITLE -vc dummy -vo null -hardframedrop -ao pcm:file=audio.wav
normalize-audio audio.wav
oggenc -q 2.5 audio.wav
Additional audio tracks can be ripped using mplayer's ''-aid'' option. Find the right id with ''-identify'' and some trial and error.
==== Extract chapter points (ogg only) ====
dvdxchap -t $TITLE $RIPDIR > chapters.txt
==== Encode video ====
Feel free to tweak bitrate (and other lavc options):
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
-ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=1 \
-oac copy -o /dev/null
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
-ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=2 \
-oac copy -o video.avi
''$SCALE'' is only part of ''-vf'' if you chose to rescale; otherwise drop '',$SCALE'' from both commands (''-vf'' filters are comma separated).
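If you script the encode, the ''-vf'' string can be assembled so that the comma and scale filter only appear when a scale was chosen. A minimal sketch, with ''build_vf'' being a name made up for this page:

```shell
# Build the -vf argument: crop always, scale only when one was chosen.
# build_vf is a hypothetical helper; pass an empty second argument to skip scaling.
build_vf() {
    crop=$1 scale=$2
    printf 'crop=%s%s\n' "$crop" "${scale:+,$scale}"
}

build_vf 640:416:10:80 scale=654:368   # crop plus scale
build_vf 640:416:10:80 ""              # crop only
```

The result can then be passed to mencoder as ''-vf "$(build_vf "$CROP" "$SCALE")"''.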
For a high-quality rip, I generally use a bitrate of 1500. If I'm rescaling down to a height of 384, I use 1000.
For //really// high-quality at the expense of encoding time, add '':v4mv:mbd=2:trell'' to your ''-lavcopts''.
If you don't want to preview your avi at this stage, you can replace ''-oac copy'' with ''-nosound''. We will totally ignore the sound track in this avi file at the ogmmerge stage.
==== Merge OGM file ====
ogmmerge -o "Title.ogm" -c "LANGUAGE=English" audio.ogg chapters.txt -c "TITLE=Title" -A video.avi
For extra audio tracks, add in ''-c "LANGUAGE=English: Director Commentary" audio-c.ogg'', for example.
===== Two-pass encode =====
For a two pass encode, we are forced to end up with an AVI (or an MPG). Video quality remains the same as for three passes, though, and it doesn't save much time:
- Examine Video
- Encode Video & Audio and mux into AVI.
For this, skip the //Frameno// and //Merge OGM// steps. Change the ''-oac'' option on your second video pass mencoder command from ''copy'' to ''mp3lame''.
===== One-pass encode =====
For a one pass encode, we have the same restrictions as for two passes, but it takes about half the time (at the expense of video quality):
- Encode and mux into AVI.
Now, we skip the first pass of the video encode, and remove the ''vpass=2'' option from the mencoder command. You must make the same change to ''-oac'' as for two-pass.
===== Extract Subtitles with transcode =====
FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, **subtitle2vobsub** and **subtitle2pgm**. We are searching for some alternatives.
DVDs have subtitles stored as images. There are some options for dealing with them:
* Extract them and keep them as images (**vobsubs**).
* Using ''transcode'' as explained here **seems to be the best method** to extract subtitles from a DVD-Video.
* You cannot add this to the OGM file. You have to distribute it as a separate file.
* It isn't //that// big - about 4MB on average.
* You have to manually tell your player (which must be decent) to use the subtitle file.
* **OCR** them and add them to the OGM file.
* This takes a couple of hours of //your// time, but it is nice to do the job properly...
**NOTICE:** The extract operation can be accomplished with [[mplayer#how_to_extract_subtitles_from_a_dvd-video_into_vobsub_format|mencoder]], but mencoder seems to write different image data into the ''**.sub**'' file and slightly different timestamps into the index (''**.idx**'') file depending on the video codec used (''-ovc'' option). Strangely enough, I got different outputs using the //copy// and //raw// options. Transcode seems to be more deterministic.
**[[glossary#vobsub|VobSub]]** is a well known subtitle format that saves subtitles nearly in the same format as it appears in DVD subtitle streams. From a technical point of view, VobSub saves subtitles as little images.
==== Extracting the subtitles ====
Use **mplayer** to identify the subtitle streams contained in the DVD; each is identified by an ID and a language:
mplayer -dvd-device $RIPDIR dvd://$TITLE -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
ID_SUBTITLE_ID=1
ID_SID_1_LANG=en
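If the ''-identify'' output is long, the ID for a given language can be filtered out with sed. This is a sketch only: ''sid_for_lang'' is an invented name, and it assumes the ''ID_SID_n_LANG=xx'' lines look exactly like the sample above.

```shell
# Print the subtitle ID for a language code, reading -identify output on stdin.
# sid_for_lang is a hypothetical helper; only the first match is printed.
sid_for_lang() {
    sed -n "s/^ID_SID_\([0-9]*\)_LANG=$1\$/\1/p" | head -n 1
}

# Example use:
#   mplayer -dvd-device $RIPDIR dvd://$TITLE -identify 2>/dev/null | sid_for_lang en
```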
The **tccat** command will concatenate all the files that compose the specified ''$TITLE'' to the standard output. Files are taken from the directory where the DVD-Video was ripped (''$RIPDIR'').
The **tcextract** command extracts the requested stream; //ps1// stands for MPEG private stream (subtitles). The source type (''-t vob'') must be specified when reading from standard input.
**NOTICE**: The number **0x21** is **0x20** + the subtitle ID.
tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
If you have just the .VOB files, you can use this recipe:
cat VTS_02_?.VOB | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
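The value passed to ''-a'' can be computed from the subtitle ID instead of worked out by hand. A small sketch of the 0x20 + ID rule, where ''subtitle_stream_id'' is a name invented for this example:

```shell
# Print the tcextract -a value for a subtitle ID: 0x20 plus the ID, in hex.
# subtitle_stream_id is a hypothetical helper name.
subtitle_stream_id() {
    printf '0x%x\n' $(( 0x20 + $1 ))
}

subtitle_stream_id 1
```

For subtitle ID 1 (English in the listing above) this prints 0x21, the value used in the commands above.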
Use the **[[subtitleripper]]** scripts to obtain the VobSub files:
subtitle2vobsub -p subtitles_stream.ps1 -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO -o subtitles
We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the [[glossary#vobsub|VobSub]] format; two files will be generated: **subtitles.idx** and **subtitles.sub**.
If you need to extract only a part of subtitle stream (e.g. if you have cut the original track into several pieces), just use the **-e** option, to indicate the **start**, the **end** and a **new_start** (new time offset) of the extraction, in **seconds**, like this:
subtitle2vobsub -p subtitles_stream.ps1 \
-i $RIPDIR/VIDEO_TS/VTS_02_0.IFO \
-e 9673.914,12673,0 -o subtitles
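Since ''-e'' wants seconds while players usually show hours, minutes and seconds, a conversion helper can save some mental arithmetic. This is a sketch: ''hms_to_seconds'' is a made-up name and it expects input of the form H:MM:SS, optionally with a fractional part.

```shell
# Convert H:MM:SS(.mmm) to seconds with three decimal places.
# hms_to_seconds is a hypothetical helper name.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 2:41:13.914
```

The example prints 9673.914, the start value used in the command above.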
==== OCRing ====
Right, let's make our lives really nasty and create hundreds of PGM files:
cat subtitles_stream.ps1 | subtitle2pgm
If you want to control how grey levels are converted, try to use the **%%-c%%** option of subtitle2pgm, something like: **%%-c 255,0,0,255%%**.
Each subtitle should now be one file named like **movie_subtitle0003.pgm**, and a **movie_subtitle.srtx** file will be created to index them and their times on-screen.
=== With Tesseract OCR ===
#!/bin/sh
# OCR every PGM image; tesseract writes its text to "$file.txt".
find . -type f -name '*.pgm' | sort | while IFS= read -r file; do
    printf '%s ' "$(basename "$file")"
    tesseract -l eng --psm 4 "$file" "$file"
done
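After the OCR run it can be worth checking that every image produced some text before merging. The sketch below lists images with a missing or empty text file. ''missing_ocr'' is an invented name, and it assumes the text for ''foo.pgm'' ends up in ''foo.pgm.txt'', as it does with the loop above.

```shell
# List .pgm files in a directory that have no non-empty .pgm.txt next to them.
# missing_ocr is a hypothetical helper name; the directory defaults to ".".
missing_ocr() {
    for f in "${1:-.}"/*.pgm; do
        [ -e "$f" ] || continue               # glob matched nothing
        [ -s "$f.txt" ] || printf '%s\n' "$f" # text file missing or empty
    done
}

# Example use:
#   missing_ocr .
```

Any file it prints is a candidate for re-running tesseract or typing the subtitle in by hand.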
=== With Gocr ===
**NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead.
To OCR all the .pgm images with **gocr** (using a nice wrapper for the job):
pgm2txt -d -f en -v -s 10 movie_subtitle
It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character...)
==== Make a single .srt file ====
Now we will merge all the text files produced into one big subtitle file:
srttool -s -w < movie_subtitle.srtx > english.srt
Now it's time to proofread. I prefer to go through each one manually:
display *.pgm &
vim english.srt
You can use spacebar to advance your images in display.
Gocr is very predictable, so if it makes a mistake once, it will do it again, a lot! Use your editor's regular expression features whenever you spot a mistake to correct all the instances. It saves time.
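The same kind of bulk correction can be done from the shell with sed. The two substitutions below are purely illustrative guesses at typical OCR confusions (a vertical bar or a line-initial lowercase "l" where a capital "I" was meant); replace them with whatever mistakes you actually see. ''fix_ocr'' is a name made up for this sketch.

```shell
# Apply a few bulk corrections to subtitle text on stdin.
# The rules are examples only; fix_ocr is a hypothetical helper name.
fix_ocr() {
    sed -e 's/|/I/g' \
        -e 's/^l /I /'
}

# Example use:
#   fix_ocr < english.srt > english.fixed.srt
```

Diff the result against the original before keeping it, since a rule that is too broad will happily break correct text as well.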
Then spell check:
aspell -d british -c english.srt
You can now add english.srt onto the end of your ''ogmmerge'' command. Oh, and stick a ''-c LANGUAGE=English'' before it ;-)
==== Fixing time, etc ====
Finally, you can proofread the final .srt file using the graphical interface of **Gaupol**, a full-featured subtitle editor. It can handle some of the more common operations required:
* **Shift times**, from //Tools//, //Shift Positions...//
* **Renumber subtitles**, this is done automatically when you save the project.
===== Links =====
* [[http://www.mplayerhq.hu/DOCS/HTML/en/index.html]]
* [[http://gentoo-wiki.com/HOWTO_Mencoder_Introduction_Guide]]
* [[http://axljab.homelinux.org/Mencoder_DVD_to_OGM]]
* [[http://wiki.clug.org.za/index.php/DVD_Compatibility|DVD Compatibility]]
* [[http://wiki.clug.org.za/index.php/Authoring_DVDs_with_dvdauthor|Authoring DVDs with dvdauthor]]