parakeet-stt

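Local speech-to-text using NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, supports 25 languages with auto-detection, and is compatible with the OpenAI API. Use when you need to transcribe audio files, convert speech to text, or process voice recordings locally without cloud APIs.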

SKILL.md
---
name: parakeet-stt
description: >-
  Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU).
  30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API.
  Use when transcribing audio files, converting speech to text, or processing
  voice recordings locally without cloud APIs.
homepage: https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai
metadata: {"clawdbot":{"emoji":"🦜","env":["PARAKEET_URL"]}}
---

Parakeet TDT (Speech-to-Text)

Local transcription using NVIDIA Parakeet TDT 0.6B v3 with ONNX Runtime. Runs on CPU — no GPU required. ~30x faster than realtime.
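To make the speed figure concrete, here is a rough back-of-the-envelope sketch. It treats the approximate 30x figure quoted above as a constant speedup over realtime; actual throughput will depend on the CPU and the audio, and the function name `estimated_seconds` is purely illustrative.

```python
def estimated_seconds(audio_seconds, speedup=30.0):
    """Estimate processing time for a recording at a given realtime speedup.

    The default of 30 is the approximate figure quoted in this document,
    not a measured value for any particular machine.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup
```

Under that assumption, a one-hour recording (3600 seconds) would take roughly two minutes to transcribe.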

Installation

bash
# Clone the repo
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai.git
cd parakeet-tdt-0.6b-v3-fastapi-openai

# Run with Docker (recommended)
docker compose up -d parakeet-cpu

# Or run directly with Python
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 5000

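The server listens on port 5000 by default. If it is reachable at a different address or port, set PARAKEET_URL to that base URL (e.g., http://localhost:5092) so the examples in this document point at it.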

API Endpoint

OpenAI-compatible API at $PARAKEET_URL (default: http://localhost:5000).
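As a small illustration of that default, the sketch below resolves the base URL from PARAKEET_URL and derives the transcription endpoint. The helper name `transcription_endpoint` and the trailing-slash handling are assumptions made for illustration, not part of the project.

```python
import os

# Default taken from this document; the helper itself is hypothetical.
DEFAULT_PARAKEET_URL = "http://localhost:5000"


def transcription_endpoint(env=None):
    """Return the full transcription URL, honoring PARAKEET_URL if set."""
    env = os.environ if env is None else env
    # Fall back to the default when the variable is unset or empty.
    base = env.get("PARAKEET_URL") or DEFAULT_PARAKEET_URL
    # Strip a trailing slash so the joined path has no double slash.
    return base.rstrip("/") + "/v1/audio/transcriptions"
```

Passing a plain dict instead of the real environment makes the fallback easy to check without touching the shell.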

Quick Start

bash
# Fall back to the default URL when PARAKEET_URL is unset
PARAKEET_URL="${PARAKEET_URL:-http://localhost:5000}"

# Transcribe audio file (plain text)
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=text"

# Get timestamps and segments
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=verbose_json"

# Generate subtitles (SRT)
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=srt"

Python / OpenAI SDK

python
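import os
from openai import OpenAI

# Point the SDK at the local server; the API key is only a placeholder.
base_url = os.getenv("PARAKEET_URL", "http://localhost:5000").rstrip("/")
client = OpenAI(
    base_url=base_url + "/v1",
    api_key="not-needed",
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",
        file=f,
        response_format="text",
    )
# With response_format="text" the SDK returns a plain string.
print(transcript)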
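The same SDK call can also produce subtitle files. The following is a minimal sketch, assuming the server returns subtitle text for the `srt` and `vtt` formats and that the SDK hands it back as a string. `save_subtitles` is a hypothetical helper, and `client` is expected to be an `OpenAI` instance configured as in the example above.

```python
from pathlib import Path


def save_subtitles(client, audio_path, response_format="srt"):
    """Transcribe audio_path and write the subtitles next to it.

    Hypothetical helper: only meant for the "srt" and "vtt" formats,
    where the format name doubles as the file extension.
    """
    if response_format not in ("srt", "vtt"):
        raise ValueError("response_format must be 'srt' or 'vtt'")
    audio_path = Path(audio_path)
    with audio_path.open("rb") as f:
        subtitles = client.audio.transcriptions.create(
            model="parakeet-tdt-0.6b-v3",
            file=f,
            response_format=response_format,
        )
    # e.g. talk.mp3 -> talk.srt
    out_path = audio_path.with_suffix("." + response_format)
    out_path.write_text(str(subtitles), encoding="utf-8")
    return out_path
```

Calling `save_subtitles(client, "audio.mp3")` would then write `audio.srt` in the same directory as the audio file.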

Response Formats

| Format | Output |
|--------|--------|
| text | Plain text |
| json | {"text": "..."} |
| verbose_json | Segments with timestamps and words |
| srt | SRT subtitles |
| vtt | WebVTT subtitles |
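When saving results to disk, it can help to validate the requested format before calling the server. The sketch below maps the formats listed above to file extensions; the extensions are a local naming convention chosen for illustration, not something the server dictates, and `output_path` is a hypothetical helper.

```python
from pathlib import Path

# Formats listed in this document mapped to file extensions.
# The extensions are an assumed local convention, not a server behavior.
EXTENSIONS = {
    "text": ".txt",
    "json": ".json",
    "verbose_json": ".json",
    "srt": ".srt",
    "vtt": ".vtt",
}


def output_path(audio_path, response_format):
    """Return where to save a transcript, rejecting unknown formats early."""
    if response_format not in EXTENSIONS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return Path(audio_path).with_suffix(EXTENSIONS[response_format])
```

Failing fast on an unknown format avoids uploading a large audio file only to have the request rejected.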

Supported Languages (25)

English, Spanish, French, German, Italian, Portuguese, Polish, Russian, Ukrainian, Dutch, Swedish, Danish, Finnish, Maltese, Greek, Czech, Romanian, Hungarian, Bulgarian, Slovak, Croatian, Lithuanian, Latvian, Estonian, Slovenian

Language is auto-detected — no configuration needed.

Web Interface

Open $PARAKEET_URL in a browser for drag-and-drop transcription UI.

Docker Management

bash
# Check status
docker ps --filter "name=parakeet"

# View logs
docker logs -f <container-name>

# Restart
docker compose restart

# Stop
docker compose down
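After a restart, the container may take a moment before it accepts requests. The following is a minimal readiness-check sketch using only the standard library. It assumes that a GET on the base URL returns a success status once the server is up (plausible because the web interface is served there, but not confirmed here); the function names are illustrative.

```python
import http.client
import time
import urllib.request


def is_up(base_url, timeout=2.0):
    """Return True if a GET on base_url succeeds, False on any failure."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException, ValueError):
        # Connection refused, timeouts, HTTP errors, and malformed URLs
        # are all treated as "not ready".
        return False


def wait_until_up(base_url, attempts=30, delay=1.0):
    """Poll base_url until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_up(base_url):
            return True
        time.sleep(delay)
    return False
```

A script could call `wait_until_up("http://localhost:5000")` after `docker compose restart` and only start submitting audio once it returns True.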

Why Parakeet over Whisper?

  • Speed: ~30x faster than realtime on CPU
  • Accuracy: Comparable to Whisper large-v3
  • Privacy: Runs 100% locally, no cloud calls
  • Compatibility: Drop-in replacement for OpenAI's transcription API