Video Editing on the Command Line

ffmpeg Recipes

The problem: I have recorded about 95 video clips using the XiaoYi Yi action camera. They total about 20 GB of data and 3.7 hours of playing time. I want to do some video editing with the following guidelines:

  1. Work from the command line, using scripts to do all the work. I don't want to use non-linear video editing software, which is very resource hungry and requires a big screen.
  2. Avoid re-encoding altogether.
  3. Cut the required clips and assemble them in a freely chosen order.
  4. Make some additional clips using still images or images with the Ken Burns effect.
  5. No transitions between clips (cross-fades and the like) are planned at the moment.

To solve step 4, we decided to use the ffmpeg tool. We want to make the still-image clips and the Ken Burns clips with the same features (as far as possible) as the original videos from the XiaoYi Yi camera, because we want to concatenate them without re-encoding.

Using the GNU/Linux tools: file(1), ffprobe(1), mplayer(1) and avidemux we inspected the Yi videos, determining the following:

Frame size | 1920×1080, 24 bpp
Frames per second | 29.970
Video codec | H264
Pixel format | yuvj420p
Video time base | 1/240000
Types of frame | IPPPPPPPIPPPPPPPI… (no B-Frames, group of pictures = 8)
Audio codec | AAC
Audio stream | 48000 Hz, 2 ch, floatle
File type | ISO Media, MPEG v4 system, 3GPP JVT AVC [ISO 14496-12:2005]

To make a video file as similar as possible to the ones from the Xiaomi Yi camera, we used the mp4 muxer (selected automatically by the output filename) with the muxer brand set to avc1 and the 3gp format:

ffmpeg ... -brand avc1 -f 3gp output.mp4

Other available muxer brands are: mp41, mp42, isom, iso2, … (discovered by inspecting the binary file with strings; are there better alternatives?).

Mixing videos from different sources

Sometimes we needed to mix video clips originating from different sources. To apply our simple cut-and-paste recipe (which preserves the original video quality, without re-encoding), we normalized all the “different” video clips to the same format as the one produced by the Xiaomi Yi camera.

One annoying effect of videos created by joining clips with different encoding parameters can be seen in mplayer: if you switch to full-screen playback, the video automatically jumps back to windowed mode whenever such a clip join is reached.

Normalize (Resize and Re-encode)

If we want to mix videos from different sources (e.g. a smartphone and the Xiaomi Yi camera), we first convert the clips all into the same format.

With this recipe we convert an input video, resizing (upscaling) it and converting it into a format as close as possible to the one produced by the Xiaomi Yi camera. The video format was inspected with the mediainfo tool.

ffmpeg -i input_video.mkv \
    -r '30000/1001' -aspect '1920:1080' \
    -c:v libx264 -vf 'scale=1920:1080:flags=lanczos' \
    -profile:v main -level:v 4.0 \
    -x264-params \
      bitrate=12000:vbv_maxrate=12500:vbv_bufsize=20000:nal_hrd=vbr:keyint=8:bframes=0:scenecut=-1:ref=1 \
    -brand avc1 \
    output_video.mp4

WARNING: The -r option forces frames to be dropped or duplicated as necessary to obtain the requested FPS (30000/1001 = 29.97). The resulting video will have the same duration as the source, but it may show some jerky stuttering.
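To get a feeling for how much duplication to expect, the following sketch maps each output frame to the source frame that is on screen at the same instant. It assumes constant frame rates on both sides and a hypothetical 25 fps source; the function name and the "latest frame already started" rule are illustrative assumptions, not a description of ffmpeg's internal algorithm.

```python
from fractions import Fraction

def source_frame_for(output_index, src_fps, dst_fps):
    """Index of the source frame visible at the time of a given output frame."""
    t = Fraction(output_index) / Fraction(dst_fps)   # presentation time in seconds
    return int(t * Fraction(src_fps))                # floor: latest source frame already started

SRC_FPS = Fraction(25)            # hypothetical source rate (assumption)
DST_FPS = Fraction(30000, 1001)   # target rate requested with -r

# Source frame index shown by each of the first 12 output frames.
mapping = [source_frame_for(n, SRC_FPS, DST_FPS) for n in range(12)]
print(mapping)
```

In this example source frames 0 and 5 appear twice, which is the kind of periodic duplication that shows up as stutter.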

mediainfo output | ffmpeg option
Format: MPEG-4 |
Format profile: JVT | -brand avc1
Codec ID: avc1 (isom/iso2/avc1/mp41) | -brand avc1
Format: AVC |
Format/Info: Advanced Video Codec |
Format profile: Main@L4 | -profile:v main -level:v 4.0
Format settings, CABAC: Yes |
Format settings, ReFrames: 1 frame | -x264-params ref=1
Format settings, GOP: M=1, N=8 | -x264-params keyint=8:bframes=0:scenecut=-1
Bit rate mode: Variable | -x264-params vbv_maxrate=12000:vbv_bufsize=2000:nal_hrd=vbr
Frame rate: 29.970 (30000/1001) FPS | -r 30000/1001
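When the recipe is driven from a script, the colon-separated argument of -x264-params is easier to maintain as a dictionary. This is only a sketch: the helper name x264_params is hypothetical, and the values are the GOP and reference-frame settings listed in the table.

```python
def x264_params(options):
    """Join a dict of x264 options into the KEY=VALUE:KEY=VALUE form used by -x264-params."""
    return ":".join(f"{key}={value}" for key, value in options.items())

# Settings that mimic the GOP structure of the camera (no B-frames, one keyframe every 8 frames).
yi_like = {
    "keyint": 8,
    "bframes": 0,
    "scenecut": -1,
    "ref": 1,
}

params = x264_params(yi_like)
print(params)
```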

Rotate

Sometimes it is necessary to rotate a video by 180 degrees (e.g. a video made with a smartphone held in the wrong orientation). The video rotation metadata must be removed beforehand, because ffmpeg does not seem able to remove it and apply the video filter in a single pass.

ffmpeg -i input_video.mp4 -c copy -metadata:s:v:0 rotate=0 tmp_video.mp4

Then the ffmpeg transpose video filter is required. To rotate a video by 180 degrees we need to apply transpose=1 twice:

ffmpeg -i tmp_video.mp4 -vf transpose=1,transpose=1 rotated_video.mp4

You can apply the transpose and the normalization (if required, see the above paragraph) in a single pass: just add the transpose operation to the whole normalization recipe above.

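INFO: The number accepted by the transpose filter means:

  0 = rotate by 90 degrees counterclockwise and flip vertically (default)
  1 = rotate by 90 degrees clockwise
  2 = rotate by 90 degrees counterclockwise
  3 = rotate by 90 degrees clockwise and flip vertically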

A 90-degree rotation involves the problem of the height / width ratio, so a more complex recipe is needed, such as the one described on the page: Convert a Smartphone's Portrait Video into Landscape, with ffmpeg.

Add an Audio Track

The time-lapse videos taken by the Xiaomi Yi Action camera do not have an audio track. This causes problems when cutting and concatenating clips with the recipes presented below: during playback (with mplayer) the video freezes for several seconds at the join between a clip without audio and a clip with audio.

So I add a silent audio track to these clips with ffmpeg:

ffmpeg -f lavfi -i anullsrc=sample_rate=48000 -i timelapse_video.mp4 \
    -shortest -c:a libfdk_aac -b:a 128k -c:v copy timelapse_video_silence.mp4

Concatenate the Clips

Suppose (see below for the actual recipes) that we have extracted the chosen clips from the original videos and made some extra clips with ffmpeg (still images and Ken Burns effects).

We also have the list of all the clips that make up the montage: a file named clip_list.txt containing something like this:

file 'clip_001.mp4'
file 'clip_002.mp4'
file 'clip_003.mp4'
...
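Such a list can be generated by a script. The sketch below builds the lines from a list of file names; the function name concat_list_lines is hypothetical. It assumes the usual ffmpeg quoting rule, where a single quote inside a quoted name is written by closing the quote, adding an escaped quote and reopening it ('\'').

```python
def concat_list_lines(filenames):
    """Return one "file '<name>'" line per clip, escaping embedded single quotes."""
    lines = []
    for name in filenames:
        escaped = name.replace("'", "'\\''")   # close quote, escaped quote, reopen quote
        lines.append(f"file '{escaped}'")
    return lines

clips = ["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"]
clip_list_text = "\n".join(concat_list_lines(clips)) + "\n"
print(clip_list_text, end="")
```

The resulting text can be written to clip_list.txt as is.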

Now we face the problem of concatenating them all together. Our first attempt was a failure, read below!

Concatenate at File Level: Does NOT Work!

Our first attempt was to use the concat demuxer (-f concat) with the copy codec. This works well only if all the MP4 files share the same parameters (frame rate, time base, etc.).

# This does not work well!
ffmpeg -f concat -safe 0 -i clip_list.txt -c copy montage.mp4

Unfortunately this was not the case when we mixed clips filmed with the action cam and clips generated with ffmpeg. We think that the problem was the difference in time_base: it was 1/240000 for files coming from the XiaoYi Yi cam and 1/90000 for files produced with ffmpeg. When played with mplayer, the montage freezes for some seconds at the scene cuts where the time_base changes.

We could not find a way to set the time_base when producing videos with ffmpeg, so we abandoned this approach.
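To make the two time bases concrete, this small sketch computes the duration of one frame at 30000/1001 fps expressed in ticks of each time base, and rescales a timestamp from one to the other. The helper names are illustrative; the arithmetic is the only thing being shown.

```python
from fractions import Fraction

CAMERA_TB = Fraction(1, 240000)   # time_base of the camera files
FFMPEG_TB = Fraction(1, 90000)    # time_base of the files produced with ffmpeg
FPS = Fraction(30000, 1001)

def ticks_per_frame(time_base, fps=FPS):
    """Duration of one frame, measured in time_base units."""
    return (1 / fps) / time_base

def rescale(pts, src_tb, dst_tb):
    """Express the same instant in a different time base."""
    return pts * src_tb / dst_tb

print(ticks_per_frame(CAMERA_TB))          # 8008
print(ticks_per_frame(FFMPEG_TB))          # 3003
print(rescale(8008, CAMERA_TB, FFMPEG_TB)) # 3003
```

One frame lasts a whole number of ticks in both cases, so the same timestamps are representable either way; only the unit differs between the two kinds of file.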

Concatenate at the Transport Streams Level

We found the solution on this page: the Concatenate protocol. Basically all the streams must be remuxed into MPEG transport streams and then muxed into an mp4 file. This is a lossless operation when using H.264 video and AAC audio.

The trick is to use the appropriate bitstream filters (-bsf command line option), and produce all the pieces as mpegts clips.

Recipe to Cut Clips

This is the command line to extract (cut) a clip from a larger video. The piece is identified by its starting time (-ss MM:SS.mmm) and its length (-t SS.mmm).

ffmpeg -i YDXJ0945.mp4 -ss 01:48.641 -c copy -t 4.538 -bsf:v h264_mp4toannexb -f mpegts clip_242.ts
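With many clips to cut, the command can be assembled by a script. The following is a minimal sketch with hypothetical helper names: parse_timestamp converts the MM:SS.mmm notation to seconds (handy for computing where a clip ends), and cut_command returns the same argument list as the recipe above without running it.

```python
import shlex

def parse_timestamp(text):
    """Convert 'SS.mmm', 'MM:SS.mmm' or 'HH:MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def cut_command(source, start, length, output):
    """Argument list that cuts a clip as an MPEG-TS piece, without re-encoding."""
    return ["ffmpeg", "-i", source, "-ss", start, "-c", "copy", "-t", length,
            "-bsf:v", "h264_mp4toannexb", "-f", "mpegts", output]

cmd = cut_command("YDXJ0945.mp4", "01:48.641", "4.538", "clip_242.ts")
print(shlex.join(cmd))
# Second at which the clip ends in the source video.
clip_end = parse_timestamp("01:48.641") + 4.538
print(round(clip_end, 3))
```

The list can be passed to subprocess.run(cmd, check=True) to actually perform the cut.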

Recipe to Concatenate

Suppose that clip_list.txt is a text file with the list of mpegts clips to concatenate (see above). This is the command line to concatenate them all together and make an mp4 file:

ffmpeg -f concat -safe 0 -i clip_list.txt -c copy -bsf:a aac_adtstoasc -brand avc1 -f 3gp montage.mp4

During the process you may get the following error message:

[h264 @ 0x564b14890620] non-existing SPS 0 referenced in buffering period
[h264 @ 0x564b14890620] SPS unavailable in decode_picture_timing

It seems to be harmless, at least according to this FAQ: These messages come from the H.264 video decoder in ffmpeg and are printed when re-muxing MPEG-TS files to MP4. As far as is known, the conditions flagged by ffmpeg do not have any effect on get_iplayer re-muxing, so you can ignore these messages.

Video Clip From a Still-image

#!/bin/sh -e
INPUT_FILE="$1"         # An MP4 file.
STILLFRAME="$2"         # Frame to extract MM:SS.DDD.
DURATION_S="$3"         # SS.DDD duration.
OUTPUT_MP4="$4"         # Output video (an MPEG-TS stream, despite the variable name).
 
STILL_IMG="_stillimage.$$.jpg"
# NOTICE: Put the -ss option after -i if the cut is not frame precise (slower)!
ffmpeg -loglevel fatal -ss "$STILLFRAME" -i "$INPUT_FILE" -vframes 1 -f image2 "$STILL_IMG"
# NOTICE: libfaac was removed from recent ffmpeg releases; use 'aac' or
# 'libfdk_aac' if your build lacks it.
ffmpeg -loglevel error -y -f lavfi -i 'anullsrc=sample_rate=48000' \
    -loop 1 -framerate '29.970' -i "$STILL_IMG" -vf 'scale=1920:1080' -t "$DURATION_S" \
    -pix_fmt 'yuvj420p' \
    -c:a 'libfaac' \
    -c:v 'libx264' -profile:v 'main' -x264-params 'keyint=8:bframes=0' \
    -bsf:v 'h264_mp4toannexb' -f 'mpegts' -r '29.970' "$OUTPUT_MP4"
rm "$STILL_IMG"

NOTICE: The resulting video clips have a start_pts greater than zero (e.g. 127920), which is different from the videos produced by the XiaoYi camera. It is common practice to produce videos with the PTS (Presentation Time Stamp) of the first frame greater than zero; otherwise a subsequent B-frame might require a negative Decode Time Stamp (DTS), which is bad. This non-zero PTS should not pose any problem when concatenating video clips.
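To read a start_pts value as a time, multiply it by the stream time base. The sketch below assumes the 1/90000 time base reported elsewhere on this page for files produced with ffmpeg; the helper name is hypothetical.

```python
from fractions import Fraction

ASSUMED_TIME_BASE = Fraction(1, 90000)   # assumption: time base of the ffmpeg-made clips

def pts_to_seconds(pts, time_base=ASSUMED_TIME_BASE):
    """Convert a timestamp in time_base units to seconds."""
    return float(pts * time_base)

start_seconds = pts_to_seconds(127920)
print(round(start_seconds, 3))
```

Under that assumption the example start_pts corresponds to roughly 1.4 seconds.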

Video Clip With Ken Burns Effect

Step 1: make the images

This is the kenburst-1-make-images Python script. It sends a batch of images to stdout, producing the Ken Burns effect on an image for the specified duration. The effect is specified by two geometries, which identify the start and the end crop regions within the image.

#!/usr/bin/python3
# Make a batch of images to create a Ken Burns effect on a source image.
 
import os.path, re, subprocess, sys
if len(sys.argv) < 4:
    print("Usage %s <input_img> <geometry:geometry> <duration_s>" % (os.path.basename(sys.argv[0]),))
    sys.exit(1)
 
# Geometry: "WxH+X+Y" => Width, Height, X (from left), Y (from top)
input_img   = sys.argv[1]
geometry    = sys.argv[2]
duration_s  = float(sys.argv[3])
fps         = '29.970'
stdout_mode = True
 
(geom1, geom2) = geometry.split(':', 1)
(w1, h1, x1, y1) = re.split(r'x|\+', geom1)
(w2, h2, x2, y2) = re.split(r'x|\+', geom2)
# Index of the last frame; at least 1 to avoid a division by zero below.
frames = max(int(round(duration_s * float(fps))) - 1, 1)
 
for f in range(0, (frames + 1)):
    # Linear interpolation between the start and the end geometry.
    percent = float(f) / frames
    w = int(float(w1) + (float(w2) - float(w1)) * percent)
    h = int(float(h1) + (float(h2) - float(h1)) * percent)
    x = int(float(x1) + (float(x2) - float(x1)) * percent)
    y = int(float(y1) + (float(y2) - float(y1)) * percent)
    crop = '%dx%d+%d+%d!' % (w, h, x, y)
    if stdout_mode:
        # Write the JPEG data of each frame to stdout, one after the other.
        cmd = ['convert', input_img, '-crop', crop, '-resize', '1920x1080!', 'jpeg:-']
    else:
        cmd = ['convert', input_img, '-crop', crop, '-resize', '1920x1080!', 'frame_%04d.jpg' % (f,)]
        print(' '.join(cmd))
    subprocess.call(cmd)
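The interpolation at the heart of the script can be tried on its own, without ImageMagick, to preview the crop regions before rendering. This is a sketch with hypothetical function names and example geometries; it uses the same linear interpolation and integer truncation as the script.

```python
import re

def parse_geometry(geometry):
    """Split 'WxH+X+Y' into integers (width, height, x, y)."""
    return tuple(int(value) for value in re.split(r"x|\+", geometry))

def interpolate(start, end, frames):
    """Yield one crop geometry per frame, moving linearly from start to end."""
    g1, g2 = parse_geometry(start), parse_geometry(end)
    steps = max(frames - 1, 1)   # avoid dividing by zero for a single frame
    for f in range(frames):
        w, h, x, y = (int(a + (b - a) * f / steps) for a, b in zip(g1, g2))
        yield f"{w}x{h}+{x}+{y}"

# Zoom from the full 1920x1080 frame into its central 960x540 region in 5 frames.
crops = list(interpolate("1920x1080+0+0", "960x540+480+270", 5))
for crop in crops:
    print(crop)
```

Keeping both geometries at the same aspect ratio as the output (16:9 here) avoids distortion, since every crop is resized to 1920x1080.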

Step 2: assemble the video

This is the kenburst-2-make-video shell script. It reads a batch of images from stdin and creates a video from them.

#!/bin/sh
# Get a stream of JPG images from stdin, make an MP4 video.
VIDEO_OUT="$1"
if [ -z "$VIDEO_OUT" ]; then
    echo "Usage: $(basename "$0") <video_out>"
    exit 1
fi
ffmpeg -loglevel error -y -f lavfi -i 'anullsrc=sample_rate=48000' \
    -f image2pipe -vcodec mjpeg -framerate '29.970' -i - -shortest \
    -c:a 'libfaac' \
    -c:v 'libx264' -profile:v 'main' -x264-params 'keyint=8:bframes=0' \
    -bsf:v 'h264_mp4toannexb' -f 'mpegts' -r '29.970' "$VIDEO_OUT"

Step 3: pipe all together

With this script we will pipe the output of kenburst-1-make-images script to the input of kenburst-2-make-video, so it will not save the intermediate images to the disk:

#!/bin/sh -e
#
# Extract a frame from a video, or use a single image
# and make a Ken Burns video of the specified duration.
 
INPUT_FILE="$1"         # The JPG file.
KBURST_GEO="$2"         # Ken Burns data: begin and end geometry.
DURATION_S="$3"         # SS.DDD duration.
OUTPUT_MP4="$4"         # Output video (an MPEG-TS stream, despite the variable name).
 
./kenburst-1-make-images "$INPUT_FILE" "$KBURST_GEO" "$DURATION_S" \
    | ./kenburst-2-make-video "$OUTPUT_MP4"

Re-encoding with tonal correction

We had some video clips recorded with an SJCAM SJ8 Pro camera with bad color balance and saturation, due to some bad tables loaded into the firmware. It is possible to re-encode all the video clips applying an equalization filter, keeping all the encoding parameters as similar as possible to the original ones.

The video clips were extracted from the original MP4 container as MPEG-TS snippets containing only video (no audio). To re-encode each clip we used the following ffmpeg recipe:

#!/bin/sh
#
# Re-encode video clips in MPEG transport stream (MPEG-TS) format applying
# some saturation and gamma correction.
#
# saturation:           In range 0.0 to 3.0. The default value is "1".
# gamma_{r|g|b}         In range 0.1 to 10.0. The default value is "1".
 
INPUT="$1"
OUTPUT="$INPUT.eq.ts"
EQ_FILTER="eq=saturation=0.88:gamma_r=0.917:gamma_g=1.007:gamma_b=1.297"
 
# Produces MPEG segments like the ones produced by the SJCAM SJ8Pro:
ffmpeg -i "$INPUT" \
    -vf "$EQ_FILTER" \
    -codec:v libx264 \
    -preset veryslow -profile:v main -level:v 4.2 -pix_fmt yuvj420p \
    -x264-params 'vbv-maxrate=38000:vbv_bufsize=20000:nal-hrd=vbr:force-cfr=1:keyint=8:bframes=0:scenecut=-1:ref=1' \
    -keyint_min 8 -brand avc1 -f 3gp \
    -bsf:v h264_mp4toannexb -f mpegts \
    "$OUTPUT"

The gamma correction for the three RGB channels was determined with the GIMP, using the Colors → Levels → Pick the gray point for all channels tool. The use of MPEG-TS clips allowed the montage of the final video by just concatenating them.
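When several correction settings have to be tried, the filter string can be built by a small helper that also enforces the ranges quoted in the script comments (saturation 0.0 to 3.0, gamma 0.1 to 10.0). This is a sketch; the function name eq_filter is hypothetical.

```python
def eq_filter(saturation=1.0, gamma_r=1.0, gamma_g=1.0, gamma_b=1.0):
    """Build the argument for -vf, rejecting values outside the documented ranges."""
    if not 0.0 <= saturation <= 3.0:
        raise ValueError("saturation must be in the range 0.0 to 3.0")
    for name, value in (("gamma_r", gamma_r), ("gamma_g", gamma_g), ("gamma_b", gamma_b)):
        if not 0.1 <= value <= 10.0:
            raise ValueError(f"{name} must be in the range 0.1 to 10.0")
    return (f"eq=saturation={saturation}:gamma_r={gamma_r}"
            f":gamma_g={gamma_g}:gamma_b={gamma_b}")

# The values used in the recipe above.
filter_string = eq_filter(saturation=0.88, gamma_r=0.917, gamma_g=1.007, gamma_b=1.297)
print(filter_string)
```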

AVC (x264) is better than ASP (xvid4)

See this page: Common myths to understand the differences between formats (standards) and codecs (pieces of software). Read also this simple page: Difference between MPEG-4 AVC and MPEG-4 ASP. See also the Wikipedia article about Advanced Video Coding.

If you want to tweak the x264 codec options, here are some hints on the meaning of the parameters:

More on ffmpeg Command Line

Suppress Some Output

Every ffmpeg invocation can be made less verbose, e.g. by suppressing all the messages except the errors:

ffmpeg -loglevel error ...

Getting Help on Command Line

NOTICE: Often there are generic ffmpeg options that have the same effect as options specific to the codec in use. As an example, the generic option

ffmpeg -i input.mp4 -g 8 ...

sets the GOP size (group of pictures, i.e. the distance between one keyframe and the next). With the x264 codec you can use a specific option instead, like this:

ffmpeg -i input.mp4 -c:v libx264 -x264-params keyint=8 ...

Full Help About ffmpeg Options

ffmpeg -h full

Help About a Specific Codec Options

How to get help about the options of an external encoder. The ffmpeg executable must be compiled with support for that external library; on a Debian system you must install the appropriate packages, e.g. ffmpeg and libx264-148 from Debian Multimedia:

ffmpeg -help encoder=libx264

Those options can be used on the ffmpeg command line, after the -c:v libx264 option.

More libx264 Options

Some options of the x264 library can be specified on the ffmpeg command line (see above). Many other options are read from system or user configuration files (e.g. $HOME/.ffmpeg/x264-preset.ffpreset). Those advanced options can be overridden on the ffmpeg command line using the -x264-params parameter. Example:

ffmpeg -i input.mp4 -c:v libx264 -profile:v main -x264-params 'keyint=8:bframes=0' output.mp4

To obtain the full list of x264 options you must run the x264 command line tool (provided by the x264 Debian package). Be aware that there may be discrepancies between the library versions in use: on our system ffmpeg is compiled against libx264-148 (both packages installed from Debian Multimedia), whereas the x264 tool may be compiled against libx264-142 if installed from the Debian main archive.

x264 --fullhelp

Help About a Specific Video Filter

ffmpeg has many video filters that can be applied. To get the available options of a specific filter, execute this command:

ffmpeg -help filter=scale

then you can pass the options on the command line, as the argument of the -vf video filter option:

ffmpeg -i input.mp4 -vf 'scale=1920:1080:flags=bicubic' output.mp4

as an alternative you can pass some options as native ffmpeg options:

ffmpeg -i input.mp4 -vf 'scale=1920:1080' -sws_flags bicubic output.mp4

Getting Video Info

To get quick info about an audio/video file, you can use the mediainfo tool, from the Debian package of the same name. Otherwise there is the ffprobe tool from the ffmpeg Debian package.

Show detailed information about the audio/video streams in a file, e.g. codec_name, codec_time_base, width, height, avg_frame_rate, duration, etc.:

ffprobe -show_streams video.mp4

show one line of info per frame in CSV format:

ffprobe -show_frames -select_streams v -print_format csv video.mp4

the CSV format can be controlled by several options, e.g. if you want to print each field as a key=val pair, use:

ffprobe -show_frames -select_streams v -print_format csv=nokey=0 video.mp4

Among others, you can get data like:

key_frame | 1 or 0, depending on whether the frame is a key frame or not.
pkt_pts | Packet Presentation Time Stamp in time_base units.
pkt_pts_time | Packet Presentation Time Stamp in seconds.
pkt_dts | Packet Decoding Time Stamp in time_base units.
pkt_dts_time | Packet Decoding Time Stamp in seconds.
pkt_duration | Packet duration in time_base units.
pkt_duration_time | Packet duration in seconds.
width | Width in pixels, e.g. 1920.
height | Height in pixels, e.g. 1080.
pix_fmt | Pixel format, e.g. yuvj420p.
pict_type | I, P or B for Intra frame, Predictive frame and Bidirectional predictive frame; this is the picture type used by the video compression.
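Once the pict_type values of consecutive frames have been collected (for example from the per-frame ffprobe output), the GOP structure can be checked with a few lines. This sketch uses the frame pattern reported for the camera at the top of this page; the function name is hypothetical and the distance between I frames is taken as the GOP length.

```python
def gop_lengths(pict_types):
    """Distances, in frames, between consecutive I frames."""
    key_positions = [i for i, kind in enumerate(pict_types) if kind == "I"]
    return [b - a for a, b in zip(key_positions, key_positions[1:])]

# Frame pattern of the camera videos: one I frame followed by seven P frames.
sequence = ("I" + "P" * 7) * 2 + "I"

lengths = gop_lengths(sequence)
has_b_frames = "B" in sequence
print(lengths, has_b_frames)
```

A clip generated with keyint=8:bframes=0 should give the same result as the camera videos.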

Extracting AAC Audio from an MP4 File (or other format/container)

With these two commands we extract the AAC audio from an MP4 file, and then convert it into a 16 bit WAV file at 48 kHz:

ffmpeg -i video.mp4 -vn -c:a copy audio.m4a
ffmpeg -i audio.m4a -vn -c:a pcm_s16le -ar 48000 avaudio.wav

The file extension of the destination does matter to ffmpeg: if you use a wrong one you may get the error message Unable to find a suitable output format.

You should use the mediainfo tool (from the Debian package of the same name) to inspect the video file and discover the audio format, then use the proper file extension. E.g. if the source file has an audio stream in Opus format, you should use:

ffmpeg -i video.mkv -vn -c:a copy audio.opus

WARNING! If you plan to use the track in Audacity (or in any tool that converts it into an uncompressed format), check the duration of the track before and after the conversion. If you see a mismatch, you can use the following recipe to fix timestamp gaps in the source:

ffmpeg -i audio.m4a -af aresample=async=1 audio.wav

Video Stabilization

How to stabilize a shaky video. The procedure inevitably sacrifices part of the picture, because the borders are cropped. See the post Video Stabilization with FFmpeg.

The operation takes two passes; the first one computes the stabilization vectors and saves them to a file:

ffmpeg -i input.mp4 \
    -vf vidstabdetect=stepsize=6:shakiness=8:accuracy=9:result=stab_vect.trf -f null -

the second pass applies the correction:

ffmpeg -i input.mp4 \
    -vf vidstabtransform=input=stab_vect.trf:zoom=1:smoothing=30,unsharp=5:5:0.8:3:3:0.4 \
    -vcodec libx264 -preset slow -tune film -crf 18 -acodec copy output.mp4

The parameters used in this example are very “aggressive”; try lowering shakiness (e.g. to 3 or 4), and lowering zoom and smoothing as well.

A test on a very shaky video (hand-held camera on a moving motorcycle) with zoom=1 and smoothing=30 reduced the picture to about 77% of the original in both dimensions. Of the original 1920 pixels of width, about 1480 were kept (the video is scaled anyway to preserve the original frame size). Setting zoom=0.5 recovered about 15 linear pixels (reduction to 78% of the original).
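The percentages quoted for this test follow from the pixel counts. A quick check, using only the figures reported above (the helper name is hypothetical):

```python
ORIGINAL_WIDTH = 1920

def retained_percent(retained_px, original_px=ORIGINAL_WIDTH):
    """Share of the original width that survives the stabilization crop, in percent."""
    return round(100 * retained_px / original_px)

with_zoom_1 = retained_percent(1480)          # zoom=1, smoothing=30
with_zoom_05 = retained_percent(1480 + 15)    # zoom=0.5 recovers about 15 pixels
print(with_zoom_1, with_zoom_05)
```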

Adding Subtitles

If you have a subtitle file for a video, you can mux it in:

ffmpeg -i movie.mkv -i movie.sub -c copy \
    -sub_charenc UTF-8 -c:s srt -metadata:s:s:0 language=eng \
    movie_sub.mkv

Deinterlace

We can use the video filter named yadif (yet another deinterlacing filter). In this example the result was encoded in MPEG-4 AVC using the libx264 library, forcing one keyframe every 8 frames:

ffmpeg -i input-video.mkv -codec:a copy -vf yadif -codec:v libx264 -preset slow \
        -x264-params 'force-cfr=1:keyint=8:bframes=0:ref=1' \
        output-video.mkv

Problem in MKV Remux

There seems to be a bug in ffmpeg, ticket #6037 "mkv muxing not broken: muxing two working files into a mkv produces a broken file: seeking around can break (mute) audio". I experienced this bug (with ffmpeg 4.1.6) when trying to mux one mkv file containing one audio and one subtitle stream into another mkv file containing video and audio. The resulting file did not play well in mplayer: seeking within the file caused the audio or the video to stop playing.

This was the first try command line:

# The resulting video is broken.
ffmpeg -i input_file1.mkv -i input_file2.mkv \
    -map '0:v:0' -map '0:a:0' \
    -map '1:a:0' -map '1:s:0' \
    -codec:v copy -codec:a copy -codec:s copy \
    output_file.mkv

The workaround was to extract each individual stream and mux them together:

ffmpeg -i input_file1.mkv -map 0:v:0 -codec:v copy input-v_env.mkv
ffmpeg -i input_file1.mkv -map 0:a:0 -codec:a copy input-a_ita.mkv
ffmpeg -i input_file2.mkv -map 0:a:0 -codec:a copy input-a_eng.mkv
ffmpeg -i input_file2.mkv -map 0:s:0 -codec:s copy input-s_eng.mkv
ffmpeg \
    -i input-v_env.mkv \
    -i input-a_ita.mkv \
    -i input-a_eng.mkv \
    -i input-s_eng.mkv \
    -codec:v copy -codec:a copy -codec:s copy \
    -map '0' -map '1' -map '2' -map '3' \
    output_file.mkv

ffmpeg: Reading the Sequence of VOB Files from a DVD

In the VIDEO_TS directory of a DVD the main track is usually split into sequentially numbered files, e.g. VTS_01_0.VOB, VTS_01_1.VOB, …

In theory it is enough to concatenate the files into a single destination file and then handle it as an ordinary audio/video file. However, it is possible to pass the individual files as input, without using extra disk space, with this syntax:

SOURCE="concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB|VTS_01_4.VOB|VTS_01_5.VOB"
ffmpeg -i "$SOURCE" ...
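For titles split into many pieces, the input string can be generated instead of typed by hand. A minimal sketch, with a hypothetical function name and the file names of the example above:

```python
def concat_url(files):
    """Build the input string for the concat protocol from a list of file names."""
    return "concat:" + "|".join(files)

# The five VOB files of the example, numbered from 1 to 5.
vobs = [f"VTS_01_{n}.VOB" for n in range(1, 6)]
source = concat_url(vobs)
print(source)
```

The resulting string is passed to ffmpeg as the argument of -i.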

ffmpeg: Setting a Delay on Subtitles During Muxing

If a subtitle stream (e.g. in the Picture based DVD format) does not correctly report its initial playback offset, you can tell ffmpeg to set it during muxing. In this example the first subtitle appears at 44.5 seconds:

ffmpeg -i video-stream.mkv -i audio-stream.mkv -itsoffset 44.5 -i subtitles-stream.mkv ...

In general it should be possible to find out the offset while ffmpeg reads the whole stream: when it finds the first subtitle frame it prints something like this on the console:

[mpeg @ 0x55f98bb2c6c0] New subtitle stream 0:7 at pos:14755854 and DTS:44.5s
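If the ffmpeg console output is saved to a file, the offset can be picked out automatically. The sketch below parses the message shown above with a regular expression; the pattern is written against that single sample line, so treat it as an assumption about the message format rather than a stable interface.

```python
import re

# Sample console line, copied from the example above.
LOG_LINE = "[mpeg @ 0x55f98bb2c6c0] New subtitle stream 0:7 at pos:14755854 and DTS:44.5s"

PATTERN = re.compile(r"New subtitle stream (\d+:\d+) at pos:\d+ and DTS:([\d.]+)s")

def subtitle_offset(line):
    """Return (stream id, offset in seconds), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2))

result = subtitle_offset(LOG_LINE)
print(result)
```

The second element of the result is the value to pass to -itsoffset.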

Parameters for final rendering

See the page Final Rendering with libx264 and ffmpeg.

Audio Dubbing with Ardour

See the dedicated page: Doppiaggio audio con Ardour.