The problem: I have recorded about 95 video clips using the XiaoYi Yi action camera. They amount to about 20 GB of data and 3.7 hours of playing time. I want to do some video editing following these guidelines:
To solve step 4, we decided to use the ffmpeg tool. We wish to make the still-image clips and the Ken Burns clips with features as close as possible to those of the original videos from the XiaoYi Yi camera, because we want to concatenate them without re-encoding.
Using the GNU/Linux tools file(1), ffprobe(1), mplayer(1) and avidemux, we inspected the Yi videos, determining the following:
Frame size | 1920×1080 24bpp |
---|---|
Frames per second | 29.970 |
Video codec | H264 |
Pixel format | yuvj420p |
Video time base | 1/240000 |
Types of frame | IPPPPPPPIPPPPPPPI…. (no B-Frames, group of pictures = 8) |
Audio codec | AAC |
Audio stream | 48000 Hz, 2 ch, floatle |
File type | ISO Media, MPEG v4 system, 3GPP JVT AVC [ISO 14496-12:2005] |
To make a video (file type) as similar as possible to the one of the XiaoYi camera, we used the mp4 muxer (selected automatically by the output filename) with the muxer brand set to avc1 and the 3gp format:
ffmpeg ... -brand avc1 -f 3gp output.mp4
Other available muxer brands are: mp41, mp42, isom, iso2, … (discovered by inspecting the binary file with strings(1); what are the alternatives?).
Sometimes we needed to mix video clips originating from different sources. To apply our simple cut-and-paste recipe (which preserves the original video quality, without re-encoding), we normalized all the “different” video clips to the same format as the one produced by the Xiaomi Yi camera.
One annoying effect of videos created by joining clips with different encoding parameters can be seen in mplayer: if you switch the playback to full screen, the video automatically jumps back to non-full screen whenever such a clip join is reached.
If we want to mix videos from different sources (e.g. a smartphone and the Xiaomi Yi camera), we first convert all the clips into the same format.
With this recipe we will convert an input video, resizing (upscaling) it and converting it into a format as similar as possible to the one produced by the Xiaomi Yi camera. The video format is inspected with the mediainfo tool.
ffmpeg -i input_video.mkv \
    -r '30000/1001' -aspect '1920:1080' \
    -c:v libx264 -vf 'scale=1920:1080:flags=lanczos' \
    -profile:v main -level:v 4.0 \
    -x264-params \
    bitrate=12000:vbv_maxrate=12500:vbv_bufsize=20000:nal_hrd=vbr:keyint=8:bframes=0:scenecut=-1:ref=1 \
    -brand avc1 \
    output_video.mp4
WARNING The -r option forces frame dropping or frame duplication as necessary, to obtain the requested FPS (30000/1001 = 29.97). The resulting video will have the same duration as the source, but it can show some jerky stuttering.
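To get a feel for how much -r has to intervene, here is a quick sketch (a hypothetical helper, not part of the recipe) that counts how many frames the retiming adds or drops:

```python
from fractions import Fraction

def frames_after_retiming(src_fps, dst_fps, duration_s):
    """How many frames a clip has before and after forcing a new rate.

    ffmpeg's -r preserves the duration by duplicating or dropping
    frames; the difference between the two counts is roughly the number
    of duplicated (positive) or dropped (negative) frames.
    """
    src = round(Fraction(duration_s) * src_fps)
    dst = round(Fraction(duration_s) * dst_fps)
    return src, dst, dst - src

# A 10 s clip shot at 25 fps, retimed to 30000/1001 fps:
src, dst, delta = frames_after_retiming(Fraction(25), Fraction(30000, 1001), 10)
print(src, dst, delta)  # 250 300 50: about one duplicated frame in five
```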
mediainfo output | ffmpeg option | |
---|---|---|
Format | MPEG-4 | |
Format profile | JVT | -brand avc1 |
Codec ID | avc1 (isom/iso2/avc1/mp41) | -brand avc1 |
Format | AVC | |
Format/Info | Advanced Video Codec | |
Format profile | Main@L4 | -profile:v main -level:v 4.0 |
Format settings, CABAC | Yes | |
Format settings, ReFrames | 1 frame | -x264-params ref=1 |
Format settings, GOP | M=1, N=8 | -x264-params keyint=8:bframes=0:scenecut=-1 |
Bit rate mode | Variable | -x264-params bitrate=12000:vbv_maxrate=12500:vbv_bufsize=20000:nal_hrd=vbr |
Frame rate | 29.970 (30000/1001) FPS | -r 30000/1001 |
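The GOP row above (M=1, N=8) can be double-checked with a tiny sketch that reproduces the frame-type pattern produced by keyint=8 with bframes=0 (a hypothetical helper for illustration only):

```python
def gop_pattern(keyint, bframes, n_frames):
    # With bframes=0 and scenecut disabled, x264 emits an I-frame every
    # `keyint` frames and P-frames in between (M=1, N=keyint in
    # mediainfo's notation).
    if bframes != 0:
        raise NotImplementedError("this sketch covers the B-frame-free case only")
    return ''.join('I' if i % keyint == 0 else 'P' for i in range(n_frames))

print(gop_pattern(8, 0, 17))  # IPPPPPPPIPPPPPPPI
```

This matches the IPPPPPPPI… frame-type sequence observed in the original camera files.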
Sometimes it is necessary to rotate a video by 180 degrees (e.g. a video made with a smartphone held in the wrong orientation). It is necessary to remove the video rotation metadata beforehand, because ffmpeg does not seem able to remove it and apply the video filter in a single pass.
ffmpeg -i input_video.mp4 -c copy -metadata:s:v:0 rotate=0 tmp_video.mp4
Then the ffmpeg transpose video filter is required; to rotate a video by 180 degrees we need to apply transpose=1 two times:
ffmpeg -i tmp_video.mp4 -vf transpose=1,transpose=1 rotated_video.mp4
You can apply the transpose and the normalization (if required, see the above paragraph) in a single pass: just add the transpose operation to the whole normalization recipe above.
INFO: The numbers accepted by the transpose filter have the following meanings:
0 | Rotate by 90 degrees counterclockwise and flip vertically. |
---|---|
1 | Rotate by 90 degrees clockwise. |
2 | Rotate by 90 degrees counterclockwise. |
3 | Rotate by 90 degrees clockwise and flip vertically. |
A 90-degree rotation involves the problem of the height / width ratio, so a more complex recipe is needed, such as the one described on the page: Convert a Smartphone's Portrait Video into Landscape, with ffmpeg.
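Why two transpose=1 passes give a 180-degree rotation can be seen on a toy 2×2 “frame” (an illustrative sketch, not ffmpeg code):

```python
def transpose_cw(frame):
    # The equivalent of ffmpeg's transpose=1: rotate 90 degrees clockwise.
    return [list(row) for row in zip(*frame[::-1])]

frame = [[1, 2],
         [3, 4]]
rotated = transpose_cw(transpose_cw(frame))
print(rotated)  # [[4, 3], [2, 1]] -- the frame upside down
```

Two clockwise quarter turns leave the frame size unchanged, which is why no aspect-ratio handling is needed in this case.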
The time-lapse videos taken by the Xiaomi Yi action camera do not have an audio track. This causes problems when cutting and concatenating clips with the recipes presented below: the playback (with mplayer) freezes for several seconds at the joining point between a clip without audio and a clip with audio.
So I add a silent audio track to these clips with ffmpeg:
ffmpeg -f lavfi -i anullsrc=sample_rate=48000 -i timelapse_video.mp4 \
    -shortest -c:a libfdk_aac -b:a 128k -c:v copy timelapse_video_silence.mp4
Suppose (see below for the actual recipes) that we have extracted the chosen clips from the original videos and made some extra clips with ffmpeg (still images and Ken Burns effects).
We also have the list of all the clips for the montage: a file named clip_list.txt containing something like this:
file 'clip_001.mp4'
file 'clip_002.mp4'
file 'clip_003.mp4'
...
Now we face the problem of concatenating them all together. Our first attempt was a failure; read below!
Our first attempt was to use the concat demuxer and the copy codec. This works well only if all the MP4 files are identical (frame rate, time base, etc.).
# This does not work well!
ffmpeg -f concat -safe 0 -i clip_list.txt -c copy montage.mp4
Unfortunately this was not the case when mixing clips filmed with the action cam and clips generated with ffmpeg. We think the problem was the difference in time_base: it was 1/240000 for files coming from the XiaoYi Yi cam and 1/90000 for files produced with ffmpeg. When played with mplayer, the montage exhibits a freeze of a few seconds on scene cuts, where the time_base switches.
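The mismatch is easy to see with a little arithmetic: the same instant maps to very different tick counts under the two time bases (an illustrative sketch):

```python
from fractions import Fraction

def seconds_to_pts(t_seconds, time_base):
    # A PTS is a tick count: the time in seconds divided by the
    # stream's time_base.
    return int(Fraction(t_seconds) / time_base)

t = Fraction(3, 2)  # 1.5 seconds into the clip
print(seconds_to_pts(t, Fraction(1, 240000)))  # 360000 (XiaoYi Yi clip)
print(seconds_to_pts(t, Fraction(1, 90000)))   # 135000 (ffmpeg-produced clip)
```

A player that carries raw tick counts across a join between streams with different time bases will place frames at the wrong instants, which is consistent with the freezes we observed.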
We could not find a way to set the time_base when producing videos with ffmpeg, so we abandoned this approach.
We found the solution on this page: the Concatenate protocol. Basically, all the streams must be transcoded to MPEG transport stream and then muxed into an mp4 file; this is a lossless operation when using H.264 video and AAC audio.
The trick is to use the appropriate bitstream filters (the -bsf command line option) and to produce all the pieces as mpegts clips.
This is the command line to extract (cut) a clip from a larger video; the piece is identified by its starting time (-ss MM:SS.mmm) and its length (-t SS.mmm).
ffmpeg -i YDXJ0945.mp4 -ss 01:48.641 -c copy -t 4.538 -bsf:v h264_mp4toannexb -f mpegts clip_242.ts
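The -ss and -t values are just times; a small hypothetical helper (for illustration, not part of the recipe) shows the arithmetic behind the MM:SS.mmm notation:

```python
def cut_time_to_seconds(timestamp):
    """Convert an ffmpeg 'MM:SS.mmm' timestamp to seconds."""
    minutes, seconds = timestamp.split(':')
    return int(minutes) * 60 + float(seconds)

# The -ss value used above:
print(cut_time_to_seconds('01:48.641'))  # 108.641
```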
Supposing that clip_list.txt is a text file with the list of mpegts clips to concatenate (see above), this is the command line to concatenate them all and make an mp4 file:
ffmpeg -f concat -safe 0 -i clip_list.txt -c copy -bsf:a aac_adtstoasc -brand avc1 -f 3gp montage.mp4
During the process you may get the following error message:
[h264 @ 0x564b14890620] non-existing SPS 0 referenced in buffering period
[h264 @ 0x564b14890620] SPS unavailable in decode_picture_timing
It seems to be harmless, at least according to this FAQ: “These messages come from the H.264 video decoder in ffmpeg and are printed when re-muxing MPEG-TS files to MP4. As far as is known, the conditions flagged by ffmpeg do not have any effect on get_iplayer re-muxing, so you can ignore these messages.”
#!/bin/sh -e
INPUT_FILE="$1"    # An MP4 file.
STILLFRAME="$2"    # Frame to extract MM:SS.DDD.
DURATION_S="$3"    # SS.DDD duration.
OUTPUT_MP4="$4"    # Output MP4 video.
STILL_IMG="_stillimage.$$.jpg"
# NOTICE: Put -ss option after -i if cut is not frame precise (slower)!
ffmpeg -loglevel fatal -ss "$STILLFRAME" -i "$INPUT_FILE" -vframes 1 -f image2 "$STILL_IMG"
ffmpeg -loglevel error -y -f lavfi -i 'anullsrc=sample_rate=48000' \
    -loop 1 -framerate '29.970' -i "$STILL_IMG" -vf 'scale=1920:1080' -t "$DURATION_S" \
    -pix_fmt 'yuvj420p' \
    -c:a 'libfaac' \
    -c:v 'libx264' -profile 'main' -x264-params 'keyint=8:bframes=0' \
    -bsf:v 'h264_mp4toannexb' -f 'mpegts' -r '29.970' "$OUTPUT_MP4"
rm "$STILL_IMG"
NOTICE: The resulting video clips have a start_pts greater than zero (e.g. 127920), which is different from the videos produced by the XiaoYi camera. It is common practice to produce videos with the PTS (Presentation Time Stamp) of the first frame greater than zero; otherwise a subsequent B-frame may require a negative Decode Time Stamp (DTS), which is bad. This non-zero PTS should not pose any problem when concatenating video clips.
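The reasoning about negative DTS can be sketched numerically (an illustration under the assumption of a one-frame decoder delay; the actual muxer logic is more involved):

```python
def first_dts(first_pts, frame_duration, decoder_delay_frames):
    # With B-frames the decoder must run `decoder_delay_frames` ahead
    # of presentation, so DTS = PTS - delay * frame_duration.
    return first_pts - decoder_delay_frames * frame_duration

# With time_base 1/240000 and 29.97 fps, one frame lasts 8008 ticks
# (240000 * 1001 / 30000 = 8008).
print(first_dts(0, 8008, 1))      # -8008: a negative DTS, problematic
print(first_dts(8008, 8008, 1))   # 0: shifting the first PTS avoids it
```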
This is the kenburst-1-make-images Python script. It sends a batch of images to stdout, producing the Ken Burns effect on an image, for the specified duration. The effect is specified by two geometries which identify the start and the end positions within the image itself.
#!/usr/bin/python
# Make a batch of images to create a Ken Burns effect on a source image.
import os.path, re, subprocess, sys

if len(sys.argv) < 4:
    print "Usage %s <input_img> <geometry:geometry> <duration_s>" % (os.path.basename(sys.argv[0]),)
    sys.exit(1)

# Geometry: "WxH+X+Y" => Width, Height, X (from left), Y (from top)
input_img = sys.argv[1]
geometry = sys.argv[2]
duration_s = float(sys.argv[3])
fps = '29.970'
stdout_mode = True
(geom1, geom2) = geometry.split(':', 1)
(w1, h1, x1, y1) = re.split(r'x|\+', geom1)
(w2, h2, x2, y2) = re.split(r'x|\+', geom2)
frames = (int(round(duration_s * float(fps))) - 1)
for f in range(0, (frames + 1)):
    percent = float(f) / frames
    w = int(float(w1) + (float(w2) - float(w1)) * percent)
    h = int(float(h1) + (float(h2) - float(h1)) * percent)
    x = int(float(x1) + (float(x2) - float(x1)) * percent)
    y = int(float(y1) + (float(y2) - float(y1)) * percent)
    crop = '%dx%d+%d+%d!' % (w, h, x, y)
    if stdout_mode:
        cmd = ['convert', input_img, '-crop', crop, '-resize', '1920x1080!', 'jpeg:-']
    else:
        cmd = ['convert', input_img, '-crop', crop, '-resize', '1920x1080!', 'frame_%04d.jpg' % (f,)]
    # Log the command to stderr, keeping stdout clean for the JPEG stream.
    print >> sys.stderr, ' '.join(cmd)
    subprocess.call(cmd)
This is the kenburst-2-make-video shell script. It takes a batch of images from stdin and creates a video from them.
#!/bin/sh
# Get a stream of JPG images from stdin, make an MP4 video.
VIDEO_OUT="$1"
if [ -z "$VIDEO_OUT" ]; then
    echo "Usage: $(basename $0) <video_out>"
    exit 1
fi
ffmpeg -loglevel error -y -f lavfi -i 'anullsrc=sample_rate=48000' \
    -f image2pipe -vcodec mjpeg -framerate '29.970' -i - -shortest \
    -c:a 'libfaac' \
    -c:v 'libx264' -profile 'main' -x264-params 'keyint=8:bframes=0' \
    -bsf:v 'h264_mp4toannexb' -f 'mpegts' -r '29.970' "$VIDEO_OUT"
With this script we pipe the output of the kenburst-1-make-images script into the input of kenburst-2-make-video, so intermediate images are not saved to disk:
#!/bin/sh -e
#
# Extract a frame from a video, or use a single image,
# and make a Ken Burns video of the specified duration.
INPUT_FILE="$1"    # The JPG file.
KBURST_GEO="$2"    # Ken Burns data: begin and end geometry.
DURATION_S="$3"    # SS.DDD duration.
OUTPUT_MP4="$4"    # Output MP4 video.
./kenburst-1-make-images "$INPUT_FILE" "$KBURST_GEO" "$DURATION_S" \
    | ./kenburst-2-make-video "$OUTPUT_MP4"
We had some video clips recorded with an SJCAM SJ8 Pro camera with bad color balance and saturation, due to bad tables loaded into the firmware. It is possible to re-encode all the video clips applying an equalization filter, keeping all the encoding parameters as similar as possible to the original ones.
The video clips were extracted from the original MP4 container as MPEG-TS snippets containing only video (no audio). To re-encode each clip we used the following ffmpeg recipe:
#!/bin/sh
#
# Re-encode video clips in MPEG transport stream (MPEG-TS) format applying
# some saturation and gamma correction.
#
# saturation:    In range 0.0 to 3.0. The default value is "1".
# gamma_{r|g|b}: In range 0.1 to 10.0. The default value is "1".
INPUT="$1"
OUTPUT="$INPUT.eq.ts"
EQ_FILTER="eq=saturation=0.88:gamma_r=0.917:gamma_g=1.007:gamma_b=1.297"
# Produces MPEG segments like the ones produced by the SJCAM SJ8Pro:
ffmpeg -i "$INPUT" \
    -vf "$EQ_FILTER" \
    -codec:v libx264 \
    -preset veryslow -profile:v main -level:v 4.2 -pix_fmt yuvj420p \
    -x264-params 'vbv-maxrate=38000:vbv_bufsize=20000:nal-hrd=vbr:force-cfr=1:keyint=8:bframes=0:scenecut=-1:ref=1' \
    -keyint_min 8 -brand avc1 -f 3gp \
    -bsf:v h264_mp4toannexb -f mpegts \
    "$OUTPUT"
The gamma correction for the three RGB channels was determined with the GIMP, using the Colors ⇒ Levels ⇒ Pick the gray point for all channels tool. The use of MPEG-TS clips allowed the montage of the final video by just concatenating them.
See this page: Common myths to understand the differences between formats (standards) and codecs (pieces of software). Read also this simple page: Difference between MPEG-4 AVC and MPEG-4 ASP. See also the Wikipedia article about Advanced Video Coding.
If you want to tweak the x264 codec options, here are some hints about the meaning of the parameters:
Every ffmpeg invocation can be made less verbose, e.g. by suppressing all the messages except the errors:
ffmpeg -loglevel error ...
NOTICE: Often there are generic ffmpeg options that produce the same effect as options specific to the codec in use. As an example, the generic option
ffmpeg -i input.mp4 -g 8 ...
sets the GOP size (group of pictures, i.e. the distance between one keyframe and the next). Using the x264 codec you can use a specific option, like this:
ffmpeg -i input.mp4 -c:v libx264 -x264-params keyint=8 ...
To get the full list of the generic ffmpeg options:
ffmpeg -h full
How to get help about the options of an external encoder: the ffmpeg executable must be compiled with support for that external library; on a Debian system you must install the appropriate packages, e.g. ffmpeg and libx264-148 from Debian Multimedia:
ffmpeg -help encoder=libx264
The options listed can be used on the ffmpeg command line, after the -c:v libx264 option.
Some options of the x264 library can be specified on the ffmpeg command line (see above). Many other options are read from system or user configuration files (e.g. $HOME/.ffmpeg/x264-preset.ffpreset). Those advanced options can be overridden on the ffmpeg command line using the -x264-params parameter. Example:
ffmpeg -i input.mp4 -c:v libx264 -profile main -x264-params 'keyint=8:bframes=0' output.mp4
To obtain the full list of x264 options, execute the x264 command line tool (provided by the x264 Debian package). Be aware that there may be discrepancies between the library versions used: on our system ffmpeg is compiled against libx264-148 (both packages installed from Debian Multimedia), while the x264 tool may be compiled against libx264-142 if installed from the Debian Main archive.
x264 --fullhelp
ffmpeg has many video filters that can be applied. To get the available options of a specific filter, execute this command:
ffmpeg -help filter=scale
then you can pass the options on the command line, as the argument of the -vf video filter option:
ffmpeg -i input.mp4 -vf 'scale=1920:1080:flags=bicubic' output.mp4
As an alternative you can pass some options as native ffmpeg options:
ffmpeg -i input.mp4 -vf 'scale=1920:1080' -sws_flags bicubic output.mp4
To get quick info about an audio/video file you can use the mediainfo tool, from the Debian package of the same name. Otherwise there is the ffprobe tool, from the ffmpeg Debian package.
Show several pieces of info about the audio/video streams in a file, e.g. codec_name, codec_time_base, width, height, avg_frame_rate, duration, etc.:
ffprobe -show_streams video.mp4
Show one line of info per frame, in CSV format:
ffprobe -show_frames -select_streams v -print_format csv video.mp4
The CSV format can be controlled by several options; e.g. if you want to print each field as a key=val pair, use:
ffprobe -show_frames -select_streams v -print_format csv=nokey=0 video.mp4
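If you want to post-process that key=val output, a minimal parser might look like this (the sample line is made up for illustration; real ffprobe lines contain many more fields):

```python
def parse_frame_line(line):
    """Parse one 'csv=nokey=0' frame line from ffprobe into a dict.

    Each line is assumed to start with the section name ("frame"),
    followed by comma-separated key=value fields.
    """
    fields = line.strip().split(',')
    record = {}
    for field in fields[1:]:          # fields[0] is the section name
        key, _, value = field.partition('=')
        record[key] = value
    return record

sample = 'frame,media_type=video,key_frame=1,pkt_pts=8008,pict_type=I'
print(parse_frame_line(sample))
```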
Among others, you can get data like:
key_frame | 1 if the frame is a keyframe, 0 otherwise. |
---|---|
pkt_pts | Packet Presentation Time Stamp, in time_base units. |
pkt_pts_time | Packet Presentation Time Stamp, in seconds. |
pkt_dts | Packet Decoding Time Stamp, in time_base units. |
pkt_dts_time | Packet Decoding Time Stamp, in seconds. |
pkt_duration | Packet duration, in time_base units. |
pkt_duration_time | Packet duration, in seconds. |
width | Width in pixels, e.g. 1920. |
height | Height in pixels, e.g. 1080. |
pix_fmt | Pixel format, e.g. yuvj420p. |
pict_type | I, P or B for Intra frame, Predictive frame or Bidirectional predictive frame; this is the video compression picture type. |
With these two commands we extract the AAC audio from an MP4 file, and then convert it into a 16-bit WAV file at 48 kHz:
ffmpeg -i video.mp4 -vn -c:a copy audio.m4a
ffmpeg -i audio.m4a -vn -c:a pcm_s16le -ar 48000 avaudio.wav
The extension of the destination file does matter to ffmpeg; if you use a wrong one you may get the error message Unable to find a suitable output format. You should use the mediainfo tool (from the Debian package of the same name) to inspect the video file and discover the audio format, then use the proper file extension. E.g. if the source file has an audio stream in Opus format, you should use:
ffmpeg -i video.mkv -vn -c:a copy audio.opus
WARNING! If you plan to use the track in Audacity (or anything that converts it into an uncompressed format), check the duration of the track before and after the conversion. If you see a mismatch, you can use the following recipe to fix timestamp gaps in the source:
ffmpeg -i audio.m4a -af aresample=async=1 audio.wav
How to stabilize a shaky video. The procedure obviously sacrifices a part of the video by cropping the borders. See the post Video Stabilization with FFmpeg.
The operation is done in two passes; the first computes the stabilization vectors and saves them into a file:
ffmpeg -i input.mp4 \
    -vf vidstabdetect=stepsize=6:shakiness=8:accuracy=9:result=stab_vect.trf -f null -
The second pass applies the correction:
ffmpeg -i input.mp4 \
    -vf vidstabtransform=input=stab_vect.trf:zoom=1:smoothing=30,unsharp=5:5:0.8:3:3:0.4 \
    -vcodec libx264 -preset slow -tune film -crf 18 -acodec copy output.mp4
The parameters used in the example are very “aggressive”; try lowering the shakiness (e.g. using a value of 3 or 4) and also lowering the zoom and the smoothing.
A test with a very shaky video (camera held by hand on a moving motorbike) with zoom=1 and smoothing=30 reduced the film to about 77% of the original in both dimensions. Of the original 1920 pixels of width, about 1480 were kept (the footage is scaled anyway to keep the original size). Setting zoom=0.5 recovered about 15 linear pixels (a reduction to 78% of the original).
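The percentages quoted above are just the ratio between retained and original pixels; as a quick check:

```python
def stabilized_fraction(original_px, retained_px):
    # Fraction of the original frame that survives the stabilization crop.
    return retained_px / original_px

# Figures from the hand-held motorbike test above:
print(round(stabilized_fraction(1920, 1480), 2))  # 0.77 with zoom=1
print(round(stabilized_fraction(1920, 1495), 2))  # 0.78 with zoom=0.5
```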
If you have a subtitles file for a video, you can mux it in:
ffmpeg -i movie.mkv -i movie.sub -c copy \
    -sub_charenc UTF-8 -c:s srt -metadata:s:s:0 language=eng \
    movie_sub.mkv
We can use the video filter named yadif (yet another deinterlacing filter). In this example the result was encoded in MPEG-4 AVC using the libx264 library, forcing one keyframe every 8 frames:
ffmpeg -i input-video.mkv -codec:a copy -vf yadif -codec:v libx264 -preset slow \
    -x264-params 'force-cfr=1:keyint=8:bframes=0:ref=1' \
    output-video.mkv
It seems there is a bug in ffmpeg, #6037: muxing two working files into an mkv produces a broken file, and seeking around can break (mute) the audio. I experienced this bug (with ffmpeg 4.1.6) while trying to mux one mkv file, containing one audio stream and one subtitle stream, with another mkv file containing video and audio. The resulting file did not play well in mplayer: seeking into the file caused the audio or the video to stop playing.
This was the first try command line:
# The resulting video is broken.
ffmpeg -i input_file1.mkv -i input_file2.mkv \
    -map '0:v:0' -map '0:a:0' \
    -map '1:a:0' -map '1:s:0' \
    -codec:v copy -codec:a copy -codec:s copy \
    output_file.mkv
The workaround was to extract each individual stream and then mux them together:
ffmpeg -i input_file1.mkv -map 0:v:0 -codec:v copy input-v_env.mkv
ffmpeg -i input_file1.mkv -map 0:a:0 -codec:a copy input-a_ita.mkv
ffmpeg -i input_file2.mkv -map 0:a:0 -codec:a copy input-a_eng.mkv
ffmpeg -i input_file2.mkv -map 0:s:0 -codec:s copy input-s_eng.mkv
ffmpeg \
    -i input-v_env.mkv \
    -i input-a_ita.mkv \
    -i input-a_eng.mkv \
    -i input-s_eng.mkv \
    -codec:v copy -codec:a copy -codec:s copy \
    -map '0' -map '1' -map '2' -map '3' \
    output_file.mkv
In the VIDEO_TS directory of a DVD the main track is normally split into sequentially numbered files, for example: VTS_01_0.VOB, VTS_01_1.VOB, …
In theory it is sufficient to concatenate the files into a single destination file and then treat it as a normal audio/video file. However, it is possible to pass the individual files as input, without using additional disk space, with this syntax:
SOURCE="concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB|VTS_01_4.VOB|VTS_01_5.VOB"
ffmpeg -i "$SOURCE" ...
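The concat: argument is nothing more than the file names joined by |; building it programmatically is trivial (an illustrative sketch):

```python
def concat_source(files):
    # Build the argument for ffmpeg's concat protocol: the input files
    # joined by '|' after the 'concat:' prefix.
    return 'concat:' + '|'.join(files)

vobs = ['VTS_01_%d.VOB' % n for n in range(1, 6)]
print(concat_source(vobs))
# concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB|VTS_01_4.VOB|VTS_01_5.VOB
```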
If a subtitle stream (e.g. in the picture-based DVD format) does not correctly declare its initial playback offset, you can tell ffmpeg to set it at muxing time. In this example the first subtitle appears at 44.5 seconds:
ffmpeg -i video-stream.mkv -i audio-stream.mkv -itsoffset 44.5 -i subtitles-stream.mkv ...
In general it should be possible to discover the offset when ffmpeg reads the entire stream; at the moment it finds the first frame of the subtitles, it shows something like this on the console:
[mpeg @ 0x55f98bb2c6c0] New subtitle stream 0:7 at pos:14755854 and DTS:44.5s
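If you want to extract that offset automatically from the ffmpeg console output, a simple regex does it (a sketch, assuming the log line format shown above):

```python
import re

def subtitle_offset(log_line):
    """Extract the first-subtitle DTS (seconds) from an ffmpeg console line."""
    match = re.search(r'DTS:([0-9.]+)s', log_line)
    return float(match.group(1)) if match else None

line = '[mpeg @ 0x55f98bb2c6c0] New subtitle stream 0:7 at pos:14755854 and DTS:44.5s'
print(subtitle_offset(line))  # 44.5
```

The returned value can then be passed to ffmpeg as the -itsoffset argument.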
See the page Final Rendering with libx264 and ffmpeg.
See the dedicated page: Doppiaggio audio con Ardour (audio dubbing with Ardour).