doc:appunti:linux:video:ripping_dvds_with_mencoder
Differences
This shows you the differences between two versions of the page.
doc:appunti:linux:video:ripping_dvds_with_mencoder [2017/10/13 16:49] – [OCRing] niccolo | doc:appunti:linux:video:ripping_dvds_with_mencoder [2020/04/21 17:05] (current) – [OCRing] niccolo | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Ripping DVDs with Mencoder ====== | ====== Ripping DVDs with Mencoder ====== | ||
+ | :!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**. | ||
===== Install the necessary programs ===== | ===== Install the necessary programs ===== | ||
Line 200: | Line 201: | ||
===== Extract Subtitles with transcode ===== | ===== Extract Subtitles with transcode ===== | ||
+ | |||
+ | FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, | ||
DVDs have subtitles stored as images. There are some options for dealing with them: | DVDs have subtitles stored as images. There are some options for dealing with them: | ||
Line 264: | Line 267: | ||
< | < | ||
- | cat subtitles_stream.ps1 | subtitle2pgm | + | cat subtitles_stream.ps1 | subtitle2pgm |
</ | </ | ||
+ | |||
+ | If you want to control how grey levels are converted, try to use the **%%-c%%** option of subtitle2pgm, | ||
Each subtitle should now be one file named like **movie_subtitle0003.pgm**, | Each subtitle should now be one file named like **movie_subtitle0003.pgm**, | ||
- | === Tesseract OCR === | + | === With Tesseract OCR === |
<code bash> | <code bash> | ||
Line 275: | Line 280: | ||
find . -type f -name ' | find . -type f -name ' | ||
echo -n " | echo -n " | ||
- | tesseract -l eng -psm 4 " | + | tesseract -l eng --psm 4 " |
done | done | ||
</ | </ | ||
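The Tesseract change in this revision replaces `-psm` with `--psm`. As a minimal sketch, assuming that Tesseract 4.x and later accept only the double-dash form while 3.x accepts the single-dash form, a script that has to run on both could choose the flag from the major version. The helper names `psm_flag` and `tesseract_major` are hypothetical and are not part of the original page or of Tesseract itself.

```bash
#!/usr/bin/env bash

# Print the page-segmentation-mode flag for a given Tesseract major version.
# Assumption: 4.x and later want "--psm", 3.x accepts "-psm".
psm_flag() {
    local major="$1"
    if [ "$major" -ge 4 ]; then
        printf '%s\n' '--psm'
    else
        printf '%s\n' '-psm'
    fi
}

# Read `tesseract --version` output on stdin and print the major version,
# e.g. "tesseract 4.0.0-beta.1" gives "4".
tesseract_major() {
    sed -n '1s/^tesseract[[:space:]]*v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

# Usage, only meaningful when tesseract is installed
# (input.pgm and output are placeholder names):
#   major=$(tesseract --version 2>&1 | tesseract_major)
#   tesseract -l eng "$(psm_flag "$major")" 4 input.pgm output
```

The version output is redirected with `2>&1` in the usage comment because some Tesseract builds print it on standard error.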
- | === Gocr === | + | === With Gocr === |
**NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead. | **NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead. | ||
Line 291: | Line 296: | ||
It will prompt you for tons of characters that it doesn' | It will prompt you for tons of characters that it doesn' | ||
- | ==== ==== | + | ==== Make a single .srt file ==== |
Now we will merge all the text files produced into one big subtitle file: | Now we will merge all the text files produced into one big subtitle file: |
doc/appunti/linux/video/ripping_dvds_with_mencoder.1507906148.txt.gz · Last modified: 2017/10/13 16:49 by niccolo