Transcribe
Convert audio and video files to Korean text transcripts using OpenAI Whisper.
Quick Start
Run the transcription script with one or more media files:
.venv/bin/python .claude/skills/transcribe/scripts/transcribe.py sources/audio.m4a
The script:
- •Extracts audio from video files (skips for audio files)
- •Transcribes using Whisper with Korean language setting
- •Generates filename from transcript content + timestamp
- •Saves to
./output/directory
Workflow
Single File Transcription
.venv/bin/python .claude/skills/transcribe/scripts/transcribe.py sources/[FILE]
Example:
- •User: "sources/audio.m4a를 전사해줘"
- •Action: Run script with default
basemodel
Multiple Files
.venv/bin/python .claude/skills/transcribe/scripts/transcribe.py sources/file1.m4a sources/file2.mp4
Example:
- •User: "sources/ 폴더의 모든 m4a 파일을 전사해줘"
- •Action: Find all .m4a files, pass to script
Custom Model
Use --model flag to specify Whisper model (tiny, base, small, medium, large):
.venv/bin/python .claude/skills/transcribe/scripts/transcribe.py sources/video.mp4 --model tiny
Model selection:
- •
tiny: Fastest, less accurate (good for testing, short files) - •
base: Default, balanced speed/accuracy - •
small: Better accuracy, slower - •
medium/large: Best accuracy, very slow
Example:
- •User: "tiny 모델로 sources/video.mp4를 전사해줘"
- •Action: Run with
--model tiny
File Duration Considerations
Check file duration before transcription to set user expectations:
ffmpeg -i sources/[FILE] 2>&1 | grep Duration
Rough processing time estimates (base model):
- •2-5 min audio: ~30 seconds
- •10-20 min audio: ~1-2 minutes
- •1+ hour audio: ~5-10 minutes
For long files (>30 min), inform user of expected wait time and suggest using tiny model for faster results.
Output Format
Generated files are saved to ./output/ with naming pattern:
[first 30 chars of transcript]_YYYYMMDD_HHMMSS.txt
Example: 네 엉덕션을 미야 사는 신의 순놈을_20260108_183045.txt
Supported File Types
Video: .mp4, .avi, .mov, .mkv, .flv, .wmv, .webm Audio: .mp3, .wav, .m4a, .aac, .ogg, .flac, .wma
Scripts
transcribe.py
Main script for transcription workflow. Handles:
- •Video to audio extraction (ffmpeg)
- •Whisper transcription
- •Output file naming and saving
Usage:
.venv/bin/python .claude/skills/transcribe/scripts/transcribe.py <files...> [--model MODEL] [--output-dir DIR]
extract_media_to_text.py
Lower-level utility for single file transcription. Use transcribe.py instead for standard workflows.