AgentSkillsCN

voice-interaction

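Prepare and review simple U.S. individual federal income tax returns for tax year 2025 using Form 1040/1040-SR and the common schedules. Use this when Codex needs to determine filing requirements, choose a filing status, compare the standard deduction with itemized deductions, apply the core credits (Child Tax Credit, EITC, Child and Dependent Care Credit), and produce a clear filing checklist from taxpayer-provided documents (W-2, the 1099 series, basic deduction/credit inputs). Federal returns only.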

SKILL.md
--- frontmatter
name: voice-interaction
description: Enable voice conversations with Claude Code - speak commands and hear responses
tools:
  - mcp__voice__voice_listen
  - mcp__voice__voice_speak
  - mcp__voice__voice_conversation
  - mcp__voice__voice_status

Voice Interaction Mode

Speak to Claude Code and hear responses. Perfect for hands-free coding, accessibility, or when you're away from the keyboard.

Quick Start

  1. Check that voice is working:

    code
    Use voice_status tool
    
  2. Listen to the user:

    code
    Use voice_listen with duration=5
    
  3. Speak a response:

    code
    Use voice_speak with text="Hello, I heard you!"
    

Voice Conversation Flow

When in voice mode, follow this pattern:

code
1. Speak a prompt/question to user
2. Listen for their response
3. Process what they said
4. Speak your response
5. Repeat
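
The loop above can be sketched in Python. This is only an illustration, not the skill's implementation: `speak`, `listen`, and `handle` are hypothetical callables standing in for the `voice_speak` and `voice_listen` tools and for whatever processing the agent performs, and `max_turns` is an assumed safety limit. The stop phrases are taken from the voice command list in this document.

```python
from typing import Callable

# Phrases that end the conversation, taken from the voice command list.
STOP_WORDS = {"stop", "cancel", "never mind"}


def conversation_loop(
    speak: Callable[[str], None],   # hypothetical stand-in for voice_speak
    listen: Callable[[], str],      # hypothetical stand-in for voice_listen
    handle: Callable[[str], str],   # hypothetical: turns a request into a reply
    max_turns: int = 10,            # assumed safety limit, not part of the skill
) -> int:
    """Speak, listen, process, speak again; return the number of requests handled."""
    speak("What would you like me to help with?")
    turns = 0
    while turns < max_turns:
        heard = listen().strip()
        # Ignore case and trailing punctuation when checking for a stop phrase.
        if heard.lower().rstrip(".!") in STOP_WORDS:
            speak("Okay, leaving voice mode.")
            break
        speak(handle(heard))
        turns += 1
    return turns
```

Passing the tools in as callables keeps the loop testable without a microphone or speaker.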

Example Conversation

code
Claude: voice_speak("What would you like me to help with?")
Claude: voice_listen(duration=10)
→ User said: "Can you find all the TODO comments in my code?"
Claude: [runs grep for TODOs]
Claude: voice_speak("I found 5 TODO comments. The first one is in main.py line 42 about fixing the database connection.")
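
The spoken summary in the last step could be built from search results along these lines. This is a sketch under assumptions: `summarize_matches` and the `(path, line number, text)` tuple shape are hypothetical and not part of the voice tools.

```python
from typing import List, Tuple

# Hypothetical result shape: (file path, line number, matched line).
Match = Tuple[str, int, str]


def summarize_matches(label: str, matches: List[Match]) -> str:
    """Build a short spoken summary instead of reading every match aloud."""
    if not matches:
        return f"I found no {label} comments."
    path, line_no, _ = matches[0]
    if len(matches) == 1:
        return f"I found 1 {label} comment. It is in {path} line {line_no}."
    return (
        f"I found {len(matches)} {label} comments. "
        f"The first one is in {path} line {line_no}."
    )
```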

Commands via Voice

Users can say things like:

  • "Search for [pattern] in [file/directory]"
  • "Open [filename]"
  • "Run the tests"
  • "Fix the error in [file]"
  • "What does [function name] do?"
  • "Commit these changes with message [message]"
  • "Stop" or "Cancel" or "Never mind"

Voice Response Guidelines

When speaking responses:

DO

  • Keep responses concise (under 30 seconds of speech)
  • Summarize long outputs ("Found 15 files, the most relevant are...")
  • Confirm actions before running them
  • Ask clarifying questions for ambiguous requests
  • Spell out unusual names/symbols

DON'T

  • Read entire file contents aloud
  • Speak raw error messages (summarize them)
  • Give long explanations (offer to show on screen instead)
  • Assume; ask when a request is unclear
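
One way to enforce the 30-second guideline is to estimate speech length from the word count and trim long replies. The sketch below assumes a speaking rate of 150 words per minute, which is a rough figure and will differ between TTS engines; the function names are hypothetical.

```python
WORDS_PER_MINUTE = 150  # assumed speaking rate; real TTS engines vary


def estimate_speech_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough spoken duration of the text, in seconds."""
    return len(text.split()) * 60.0 / wpm


def fit_for_speech(text: str, max_seconds: float = 30.0, wpm: int = WORDS_PER_MINUTE) -> str:
    """Return the text unchanged if short enough, else a trimmed version with a pointer to the screen."""
    words = text.split()
    max_words = int(max_seconds * wpm / 60)
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ... The rest is shown on screen."
```

Trimming by word count is crude; summarizing the content, as the guidelines suggest, is usually the better first step.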

Handling Voice Commands

Parse the user's speech and map it to an action:

| User Says | Action |
| --- | --- |
| "search for X" | Grep for X |
| "find file X" | Glob for X |
| "read/open/show X" | Read file X |
| "edit/change/fix X" | Edit file X |
| "run tests" | Bash: pytest or npm test |
| "run/execute X" | Bash: X |
| "commit X" | Git commit with message X |
| "what/explain X" | Describe X |
| "help" | List available commands |
| "stop/cancel/quit" | Exit voice mode |

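The mapping from phrases to actions can be sketched as an ordered list of regular expressions. This is an illustration only: the action names (`grep`, `glob`, and so on) and `parse_command` are hypothetical labels chosen here, and real transcriptions will need fuzzier matching than exact prefixes.

```python
import re
from typing import Optional, Tuple

# (pattern, action) pairs mirroring the phrase list; the first match wins,
# so "run tests" is checked before the generic "run X".
COMMAND_PATTERNS = [
    (re.compile(r"^(?:stop|cancel|quit)\b", re.I), "exit"),
    (re.compile(r"^help\b", re.I), "help"),
    (re.compile(r"^run (?:the )?tests\b", re.I), "run_tests"),
    (re.compile(r"^search for (?P<arg>.+)", re.I), "grep"),
    (re.compile(r"^find file (?P<arg>.+)", re.I), "glob"),
    (re.compile(r"^(?:read|open|show) (?P<arg>.+)", re.I), "read"),
    (re.compile(r"^(?:edit|change|fix) (?P<arg>.+)", re.I), "edit"),
    (re.compile(r"^(?:run|execute) (?P<arg>.+)", re.I), "bash"),
    (re.compile(r"^commit (?P<arg>.+)", re.I), "commit"),
    (re.compile(r"^(?:what|explain) (?P<arg>.+)", re.I), "describe"),
]


def parse_command(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (action, argument); (None, None) means the request needs clarifying."""
    text = utterance.strip().rstrip(".?!")
    for pattern, action in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict().get("arg")
    return None, None
```

An unmatched utterance returns `(None, None)`, which is the natural point to ask a clarifying question rather than guess.
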
Error Handling

If the transcription is unclear:

code
voice_speak("I didn't catch that. Could you repeat?")
voice_listen(duration=10)

If the command is ambiguous:

code
voice_speak("Did you mean X or Y?")
voice_listen(duration=5)
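
The re-prompt pattern can be wrapped in a small retry helper. The sketch below treats an empty transcription as "unclear", which is an assumption; `speak` and `listen` are hypothetical stand-ins for the `voice_speak` and `voice_listen` tools, and `max_attempts` is an illustrative limit.

```python
from typing import Callable, Optional


def listen_with_retry(
    speak: Callable[[str], None],   # hypothetical stand-in for voice_speak
    listen: Callable[[int], str],   # hypothetical stand-in for voice_listen(duration=...)
    duration: int = 10,
    max_attempts: int = 3,          # assumed limit, not part of the skill
) -> Optional[str]:
    """Return the first non-empty transcription, re-prompting after each empty one."""
    for attempt in range(max_attempts):
        heard = listen(duration).strip()
        if heard:
            return heard
        # Only re-prompt if another attempt will follow.
        if attempt < max_attempts - 1:
            speak("I didn't catch that. Could you repeat?")
    return None
```

Returning `None` after the last attempt lets the caller decide whether to fall back to text input or leave voice mode.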

Voice Settings

Control voice behavior via environment variables:

bash
# Whisper model size (tiny/base/small/medium/large)
export WHISPER_MODEL=base

# Default recording duration
export VOICE_DURATION=5
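
For illustration, a Python process could read these two variables as follows. This is a sketch, not the voice server's actual code: the fallback values `base` and `5` are assumed from the example above, and the validation is an addition.

```python
import os
from typing import Mapping, Tuple

# Model sizes listed in the settings comment.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def load_voice_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read WHISPER_MODEL and VOICE_DURATION, falling back to assumed defaults."""
    model = env.get("WHISPER_MODEL", "base").strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(f"WHISPER_MODEL must be one of {VALID_MODELS}, got {model!r}")
    try:
        duration = int(env.get("VOICE_DURATION", "5"))
    except ValueError:
        raise ValueError("VOICE_DURATION must be a whole number of seconds") from None
    if duration <= 0:
        raise ValueError("VOICE_DURATION must be positive")
    return model, duration
```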

Accessibility Notes

Voice mode is designed for:

  • Hands-free operation
  • Screen reader compatibility
  • Reduced visual dependency

When voice mode is active, also output each response as text so that screen readers can pick it up.

Troubleshooting

"No audio available"

bash
pip install sounddevice numpy

"Whisper not found"

bash
pip install openai-whisper

"No TTS on Linux"

bash
pip install pyttsx3

Poor transcription

  • Speak clearly and move closer to the microphone
  • Reduce background noise
  • Try a longer recording duration
  • Use a larger Whisper model: export WHISPER_MODEL=small
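
Before working through the fixes above one by one, a quick check of which packages are importable can narrow things down. This sketch uses the standard library's `importlib.util.find_spec`; the package list comes from the install commands in this section, and `missing_packages` is a hypothetical helper name. Note that `openai-whisper` is imported as `whisper`.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import name -> pip package name, taken from the install commands in this section.
REQUIRED = {
    "sounddevice": "sounddevice",
    "numpy": "numpy",
    "whisper": "openai-whisper",
    "pyttsx3": "pyttsx3",  # the source lists this only for TTS on Linux
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for module, pip_name in required.items() if find_spec(module) is None]
```

Calling `missing_packages()` and passing the result to `pip install` covers the first three troubleshooting entries in one step.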