AgentSkillsCN

audio-mixing-patterns

视频制作中的 ffmpeg 音频混音模式。当您需要将旁白与音乐混合、实现音量衰减,或为演示内容平衡音量时,可选用此技能。

SKILL.md
--- frontmatter
name: audio-mixing-patterns
description: ffmpeg audio mixing patterns for video production. Use when mixing narration with music, implementing ducking, or balancing volume levels for demos
tags: [video, audio, ffmpeg, mixing, ducking, narration, music]
context: fork
agent: demo-producer
user-invocable: false
version: 1.0.0

Audio Mixing Patterns

Comprehensive guide to audio mixing for video production using ffmpeg. Covers narration/music balancing, automatic ducking, timing control, and loudness normalization.

Core Principle

Quality Audio = Clear Narration + Supportive Music + Appropriate Levels

The human voice occupies 85-255 Hz (fundamental) with harmonics up to 8kHz. Music must support, not compete.

Volume Balancing Formula

code
Standard Video Mix Ratios:
--------------------------
Narration:  100% (reference level)
Music:      15-20% of narration level
SFX:        70-100% of narration level (contextual)

dB Relationships:
-----------------
Narration:  -14 dB LUFS (dialogue standard)
Music bed:  -30 to -35 dB LUFS (under narration)
Music only: -16 dB LUFS (no narration sections)
SFX:        -18 to -20 dB LUFS

Volume Multiplier Quick Reference

RatioMultiplierUse Case
100%1.0Full volume (narration)
70%0.7Prominent SFX
50%0.5Equal blend
30%0.3Noticeable background
20%0.2Subtle bed (recommended music)
15%0.15Minimal presence
10%0.1Barely audible

Basic ffmpeg Mixing Commands

Two-Track Mix (Narration + Music)

bash
# Basic mix: narration at full, music at 15%
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[narr];[1:a]volume=0.15[music];[narr][music]amix=inputs=2:duration=first" \
  -c:a aac -b:a 192k output.m4a

Three-Track Mix (Narration + Music + SFX)

bash
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.mp3 \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:weights='3 1 2'" \
  -c:a aac -b:a 192k output.m4a

Timing with adelay Filter

The adelay filter positions audio at precise timestamps.

Syntax

bash
adelay=delays[|delays...][,all=1]
# delays: milliseconds or samples (with 'S' suffix)
# all=1: apply same delay to all channels

Position Music at Specific Time

bash
# Start music at 5 seconds
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=5000|5000,volume=0.15[music];\
    [narr][music]amix=inputs=2:duration=first" \
  output.m4a

Multiple Timed Audio Cues

bash
# Narration starts at 0, music at 2s, SFX at 5.5s
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=2000|2000,volume=0.15[music];\
    [2:a]adelay=5500|5500,volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=longest" \
  output.m4a

Audio Ducking

Automatically lower music when speech is present.

Simple Sidechain Compression (Ducking)

bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a][sc]sidechaincompress=threshold=0.02:ratio=10:attack=50:release=500[ducked];\
    [narr][ducked]amix=inputs=2:duration=first" \
  output.m4a

Parameters Explained

ParameterValueEffect
threshold0.02 (default 0.125)Lower = more sensitive to speech
ratio10:1How much to reduce (10:1 = significant duck)
attack50msHow fast to duck when speech starts
release500msHow fast to return after speech ends
knee2.82843Softness of compression curve

Advanced Ducking with Precise Control

bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a]volume=0.5[music_pre];\
    [music_pre][sc]sidechaincompress=\
      threshold=0.015:\
      ratio=15:\
      attack=30:\
      release=800:\
      makeup=1:\
      knee=6[ducked];\
    [narr][ducked]amix=inputs=2:duration=first:weights='1 0.4'" \
  output.m4a

Mix Ratios by Content Type

code
Content Type          | Narration | Music | SFX  | Notes
---------------------|-----------|-------|------|------------------
Tutorial/How-to      | 100%      | 10%   | 50%  | Voice clarity critical
Corporate/Business   | 100%      | 15%   | 60%  | Professional presence
Social Media         | 100%      | 20%   | 80%  | Higher energy
Documentary          | 100%      | 25%   | 100% | Cinematic feel
Promo/Advertising    | 100%      | 30%   | 100% | Impactful
Music Video          | 50%       | 100%  | 80%  | Music dominant
Podcast              | 100%      | 5%    | 30%  | Minimal distraction
E-learning           | 100%      | 8%    | 40%  | Focus on retention

Loudness Normalization (LUFS)

LUFS (Loudness Units Full Scale) is the broadcast standard for perceived loudness.

Target Levels by Platform

PlatformTarget LUFSTrue PeakNotes
YouTube-14 LUFS-1 dB TPAuto-normalized
Spotify-14 LUFS-1 dB TPLoudness penalty applied
Apple Music-16 LUFS-1 dB TPSound Check
Broadcast TV-24 LUFS-2 dB TPEBU R128 standard
Podcast-16 to -19 LUFS-1 dB TPApple spec
TikTok/Reels-14 LUFS-1 dB TPMobile optimization

Loudness Normalization Command

bash
# Normalize to -14 LUFS (YouTube/Spotify standard)
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11 \
  -c:a aac -b:a 192k output.m4a

Two-Pass Normalization (More Accurate)

bash
# Pass 1: Analyze
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:print_format=json \
  -f null - 2>&1 | grep -A 12 "output_i"

# Pass 2: Apply measured values
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:\
measured_I=-18.5:measured_TP=-2.3:measured_LRA=8.2:\
measured_thresh=-28.5:\
linear=true \
  -c:a aac -b:a 192k output.m4a

Multi-Track Production Pipeline

Complete Video Audio Mix

bash
ffmpeg -i video.mp4 -i narration.wav -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [1:a]volume=1.0,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[narr];\
    [2:a]volume=0.15,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[music];\
    [3:a]adelay=3000|3000,volume=0.7,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:normalize=0[mixed];\
    [mixed]loudnorm=I=-14:TP=-1:LRA=11[final]" \
  -map 0:v -map "[final]" \
  -c:v copy -c:a aac -b:a 192k \
  output.mp4

Audio-Only Master Mix

bash
ffmpeg -i narration.wav -i music.mp3 -i intro_sfx.wav -i outro_sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]adelay=0|0,volume=0.8[intro];\
    [3:a]adelay=55000|55000,volume=0.8[outro];\
    [narr][music][intro][outro]amix=inputs=4:duration=longest:weights='3 1 2 2'[mix];\
    [mix]loudnorm=I=-14:TP=-1[final]" \
  -map "[final]" -c:a aac -b:a 256k master_audio.m4a

Quick Reference: Common Patterns

Pattern 1: Narration + Background Music

bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[n];[1:a]volume=0.15[m];[n][m]amix=inputs=2:duration=first" \
  output.m4a

Pattern 2: Music with Auto-Duck

bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]asplit=2[n][sc];[1:a][sc]sidechaincompress=threshold=0.02:ratio=10[d];[n][d]amix=inputs=2" \
  output.m4a

Pattern 3: Timed Intro Music Fade

bash
ffmpeg -i narration.mp3 -i intro_music.mp3 \
  -filter_complex "\
    [1:a]afade=t=out:st=8:d=2,volume=0.3[intro];\
    [0:a]adelay=10000|10000[narr];\
    [intro][narr]amix=inputs=2:duration=longest" \
  output.m4a

Pattern 4: Crossfade Between Segments

bash
ffmpeg -i segment1.mp3 -i segment2.mp3 \
  -filter_complex "\
    [0:a]afade=t=out:st=28:d=2[s1];\
    [1:a]adelay=28000|28000,afade=t=in:d=2[s2];\
    [s1][s2]amix=inputs=2:duration=longest" \
  output.m4a

Troubleshooting

IssueCauseSolution
Clipping/distortionCombined levels too highReduce individual volumes or add limiter
Narration buriedMusic too loudReduce music to 10-15%, add ducking
Hollow/thin soundPhase cancellationCheck mono compatibility
Pumping artifactsAggressive duckingIncrease attack/release times
Inconsistent levelsNo normalizationApply loudnorm filter

Add Limiter to Prevent Clipping

bash
ffmpeg -i input.mp3 \
  -af "alimiter=level_in=1:level_out=0.9:limit=0.95:attack=5:release=50" \
  output.m4a

Related Skills

  • video-pacing: Video rhythm and timing patterns
  • remotion-composer: Programmatic video generation
  • demo-producer: Product demo video production
  • thumbnail-first-frame: Video thumbnail optimization

References