Text-to-Speech Skill ⭐ Flagship
Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.0.0
Last Updated: 2026-02-05
Author: Alex (Master Alex)
Status: ⭐ Flagship Skill - Core Alex capability
Why This is a Flagship Skill
Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:
- Read documents aloud while you walk, drive, or rest your eyes
- Proofread by ear: catch errors your eyes miss
- Accessibility: full document access for vision-impaired users
- Rehearsal: practice presentations with natural-sounding narration
- Export knowledge: create MP3s for offline learning
Zero cost, minimal dependencies: uses Microsoft Edge TTS (free, no API key) directly from native TypeScript, with no Python runtime or MCP server required.
User Experience
🎯 Quick Start: Read Any Document
Keyboard shortcut (fastest):
- Open any document in VS Code
- (Optional) Select specific text to read only that portion
- Press Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)
- Audio begins playing through the webview player
Command palette:
- Ctrl+Shift+P → "Alex: Read Aloud"
📊 Status Bar Feedback
The status bar shows real-time progress during TTS operations:
| State | Display | Click Action |
|---|---|---|
| Connecting | $(loading~spin) Connecting... | - |
| Synthesizing | $(loading~spin) Synthesizing... | - |
| Streaming | $(loading~spin) Receiving... 45KB | - |
| Playing | $(unmute) Playing 35% | Stop |
| Paused | $(unmute) Paused | Stop |
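The state table above can be modelled as a small pure function. This is a minimal sketch, assuming a hypothetical `TtsState` union and `statusBarFor()` helper that are not taken from the extension's source; the returned `text` and `command` would be assigned to the matching properties of a `vscode.StatusBarItem`. Rounding to kilobytes of 1024 bytes is also an assumption.

```typescript
// Hypothetical state model for the TTS status bar item (names are illustrative).
type TtsState =
  | { kind: "connecting" }
  | { kind: "synthesizing" }
  | { kind: "streaming"; bytesReceived: number }
  | { kind: "playing"; percent: number }
  | { kind: "paused" };

interface StatusBarContent {
  text: string;     // codicon syntax such as $(unmute) is rendered by VS Code
  command?: string; // set only in states where a click should stop playback
}

const STOP_COMMAND = "alex.stopReading";

function statusBarFor(state: TtsState): StatusBarContent {
  switch (state.kind) {
    case "connecting":
      return { text: "$(loading~spin) Connecting..." };
    case "synthesizing":
      return { text: "$(loading~spin) Synthesizing..." };
    case "streaming":
      // Assumption: report received audio in whole KB of 1024 bytes.
      return { text: `$(loading~spin) Receiving... ${Math.round(state.bytesReceived / 1024)}KB` };
    case "playing":
      return { text: `$(unmute) Playing ${Math.round(state.percent)}%`, command: STOP_COMMAND };
    case "paused":
      return { text: "$(unmute) Paused", command: STOP_COMMAND };
  }
}
```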
🎵 Webview Audio Player
A sleek panel opens with full playback controls:
┌─────────────────────────────────────────────────────────┐
│ Alex TTS Player                                     [×] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ▶️ ⏹️  ═══════════●══════════  1:23 / 4:56              │
│                                                         │
│  🔊 ────────●────────                                   │
│                                                         │
└─────────────────────────────────────────────────────────┘
Features:
- Progress bar with scrubbing (click or drag to seek)
- Play/Pause button: toggles playback
- Stop button: ends playback and closes the panel
- Volume slider: adjusts playback volume
- Time display: current position / total duration
- Auto-close: the panel closes when playback ends
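The time display and click-to-seek behaviour come down to two small calculations. The sketch below uses hypothetical helper names; in the webview, `duration` would come from the HTML5 audio element's `duration` property (which is `NaN` until metadata has loaded, hence the guards) and the seek result would be assigned to its `currentTime`.

```typescript
// Format a position in seconds as m:ss, e.g. for the "1:23 / 4:56" display.
function formatTime(totalSeconds: number): string {
  if (!Number.isFinite(totalSeconds) || totalSeconds < 0) return "0:00";
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Convert a click or drag position on the progress bar into seconds.
function seekFromClick(clickX: number, barWidth: number, duration: number): number {
  if (barWidth <= 0 || !Number.isFinite(duration)) return 0;
  const fraction = Math.min(Math.max(clickX / barWidth, 0), 1); // clamp to [0, 1]
  return fraction * duration;
}
```

Clamping the fraction to [0, 1] keeps a drag that overshoots the bar from seeking past either end.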
🎤 Voice Selection
Choose Alex's voice before reading:
- Ctrl+Shift+P → "Alex: Read with Voice Selection"
- A quick pick appears:
| Voice | Character | Best For |
|---|---|---|
| Default (GuyNeural) | Professional, clear | Technical docs, code review |
| Warm (ChristopherNeural) | Friendly, conversational | Tutorials, READMEs |
| British (RyanNeural) | Authoritative | Formal documents, presentations |
| Friendly (DavisNeural) | Casual, approachable | Chat logs, informal content |
- Select a voice → reading begins immediately
💾 Save as MP3
Export any document to an audio file:
- Ctrl+Shift+P → "Alex: Save as Audio"
- A save dialog opens (the default file name is based on the document)
- A progress notification shows synthesis progress
- A success notification offers two options:
  - Open File: plays the file in the default audio player
  - Open Folder: reveals the file in the file explorer
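The default file name and the final MP3 payload can be derived as follows. This is a hedged sketch: `defaultAudioFileName()` and `joinAudioChunks()` are hypothetical, and it assumes the synthesis step yields raw MP3 chunks that can be concatenated as-is. In the extension, the name would plausibly seed `vscode.window.showSaveDialog({ defaultUri })` and the bytes would be written with `vscode.workspace.fs.writeFile()`.

```typescript
import * as path from "path";

// Hypothetical helper: "/docs/notes.md" -> "notes.mp3".
function defaultAudioFileName(documentPath: string): string {
  const { name } = path.parse(documentPath);
  return `${name || "untitled"}.mp3`;
}

// Concatenate MP3 chunks received during synthesis into one buffer.
// Assumption: chunks are raw MP3 frames, so appending them needs no re-encoding.
function joinAudioChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```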
Use cases:
- Create podcasts from documentation
- Generate audio for offline learning
- Archive presentations as audio
⏹️ Stop Reading
Multiple ways to stop playback:
- Click the status bar item (it shows the $(unmute) icon during playback)
- Press Escape while reading
- Click the stop button in the webview player
- Close the webview panel
- Ctrl+Shift+P → "Alex: Stop Reading"
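Because every one of these triggers means "stop", one way to structure them is a single idempotent stop routine shared by all entry points. This sketch assumes a hypothetical `ReadingSession` class built on the standard `AbortController`; its `signal` would be handed to whatever performs synthesis so an in-flight request can be cancelled, and `onStop()` callbacks would dispose the webview panel and reset the status bar.

```typescript
// Hypothetical session object: the status bar click, Escape, the stop button,
// closing the panel and the command palette would all call the same stop().
class ReadingSession {
  private readonly controller = new AbortController();
  private readonly cleanups: Array<() => void> = [];

  // Pass this to synthesis code so it can abort the request.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  onStop(cleanup: () => void): void {
    this.cleanups.push(cleanup);
  }

  stop(): void {
    if (this.controller.signal.aborted) return; // later triggers are no-ops
    this.controller.abort();
    for (const cleanup of this.cleanups.splice(0)) cleanup();
  }
}
```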
📝 Smart Markdown Processing
Alex automatically strips markdown formatting for natural speech:
| You Write | Alex Reads |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold text**` | "bold text" (slight emphasis) |
| `[link text](url)` | "link text" |
| `` `code` `` | "code" |
| `> blockquote` | "Quote: ..." |
| `---` | (long pause) |
Symbol conversion:
| Symbol | Spoken As |
|---|---|
| `~5 minutes` | "about 5 minutes" |
| `50%` | "50 percent" |
| `A → B` | "A leads to B" |
| `±5%` | "plus or minus 5 percent" |
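A minimal sketch of the symbol conversion, assuming a hypothetical `speakSymbols()` helper built from regular-expression rules applied in sequence. It covers only the four rows above; the extension's full rule set is larger and may be implemented differently.

```typescript
// Rewrite rules for the four symbols in the table, applied in order.
const SYMBOL_RULES: ReadonlyArray<[RegExp, string]> = [
  [/±\s*/g, "plus or minus "], // ±5%        -> plus or minus 5%
  [/~(?=\d)/g, "about "],      // ~5 minutes -> about 5 minutes
  [/(\d)\s*%/g, "$1 percent"], // 50%        -> 50 percent
  [/\s*→\s*/g, " leads to "],  // A → B      -> A leads to B
];

function speakSymbols(text: string): string {
  return SYMBOL_RULES.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}
```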
For Master Alex (Promotion Notes)
This skill gives Alex a voice. Version 2.0 uses a native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating the Python and MCP server dependencies, and reads documents aloud with natural-sounding neural voices.
Version 2.0 Changes:
- Native TypeScript implementation (no Python/MCP dependencies)
- Direct WebSocket connection to the Edge TTS endpoint
- Webview-based audio player (cross-platform)
- Integrated as VS Code commands
- Status bar progress feedback
Why promote to Master:
- Universal utility across all projects
- Zero-cost implementation (uses the free Edge TTS API)
- No external runtime dependencies (Python, MCP server)
- Accessibility benefits for vision-impaired users
- Integrated into the VS Code extension
Dependencies (v2.0):
- ws npm package (WebSocket client)
- VS Code webview API (for audio playback)
Overview
This skill provides Alex's voice synthesis capability using Microsoft Edge TTS from native TypeScript. It reads markdown documents, code files, and plain text aloud with natural-sounding voices and is fully integrated into the VS Code extension.
Architecture (v2.0)
┌─────────────────────────────────────────────────────────────┐
│ Alex VS Code Extension │
├─────────────────────────────────────────────────────────────┤
│ │
│ Commands: │
│ • Alex: Read Aloud (Ctrl+Alt+R) │
│ • Alex: Read with Voice Selection │
│ • Alex: Save as Audio │
│ • Alex: Stop Reading │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ ttsService.ts │ │
│ │ Native WebSocket to Edge TTS │ │
│ │ • SSML generation │ │
│ │ • Markdown stripping │ │
│ │ • Progress callbacks │ │
│ └─────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ audioPlayer.ts │ │
│ │ Webview-based playback │ │
│ │ • Cross-platform HTML5 Audio │ │
│ │ • Play/pause/stop controls │ │
│ │ • Progress tracking │ │
│ └─────────────────────────────────────────────┘ │
│ │
└──────────────────────┬──────────────────────────────────────┘
│ WebSocket (wss://)
▼
┌─────────────────────────────────────────────────────────────┐
│ Microsoft Edge TTS Endpoint │
│ wss://speech.platform.bing.com/consumer/speech/... │
├─────────────────────────────────────────────────────────────┤
│ • 400+ neural voices, 90+ languages │
│ • Free, no API key required │
│ • MP3 output (24kHz, 48kbps) │
│ • SSML support for prosody control │
└─────────────────────────────────────────────────────────────┘
Alex Voice Presets
| Preset | Voice ID | Character |
|---|---|---|
| Default | en-US-GuyNeural | Professional male, clear articulation |
| Warm | en-US-ChristopherNeural | Friendly, conversational |
| British | en-GB-RyanNeural | British accent, authoritative |
| Friendly | en-US-DavisNeural | Casual, approachable |
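The preset table translates directly into a lookup. In this sketch `VOICE_PRESETS` and `voiceIdFor()` are hypothetical names; the keys would plausibly be offered through `vscode.window.showQuickPick(Object.keys(VOICE_PRESETS))`. Falling back to the default voice for unknown input is an assumption, not documented behaviour.

```typescript
// Preset names and voice IDs copied from the table above.
const VOICE_PRESETS = {
  Default: "en-US-GuyNeural",
  Warm: "en-US-ChristopherNeural",
  British: "en-GB-RyanNeural",
  Friendly: "en-US-DavisNeural",
} as const;

type VoicePreset = keyof typeof VOICE_PRESETS;

// Resolve a preset name (e.g. a quick pick result) to an Edge TTS voice ID.
// hasOwnProperty avoids matching inherited keys such as "toString".
function voiceIdFor(preset: string): string {
  return Object.prototype.hasOwnProperty.call(VOICE_PRESETS, preset)
    ? VOICE_PRESETS[preset as VoicePreset]
    : VOICE_PRESETS.Default; // assumed fallback
}
```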
Voice Selection Rationale
Alex's default voice (GuyNeural) was chosen for:
- Clarity: Excellent pronunciation of technical terms
- Neutrality: Not too formal, not too casual
- Distinctiveness: Recognizable as "Alex's voice"
- Consistency: Same voice across all platforms
VS Code Commands
Alex: Read Aloud
Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)
Reads the current selection or entire document aloud using Alex's default voice.
Behavior:
- If text is selected, reads only the selection
- If nothing is selected, reads the entire document
- Markdown files are stripped of formatting for natural speech
- Progress is shown in the status bar
- Click the status bar to stop playback
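The selection-versus-document rule can be isolated from the editor API for clarity. This sketch uses a hypothetical `textToRead()` helper; in the extension the inputs would likely come from `editor.document.getText()`, `editor.document.getText(editor.selection)` and `editor.document.languageId`. Treating a whitespace-only selection as empty is an assumption.

```typescript
interface DocumentInfo {
  languageId: string; // VS Code language identifier, e.g. "markdown"
  fullText: string;
}

// A non-empty selection wins; otherwise the whole document is read.
// Markdown is passed through the supplied stripper before synthesis.
function textToRead(
  doc: DocumentInfo,
  selectedText: string | undefined,
  stripMarkdown: (md: string) => string,
): string {
  const raw = selectedText && selectedText.trim().length > 0 ? selectedText : doc.fullText;
  return doc.languageId === "markdown" ? stripMarkdown(raw) : raw;
}
```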
Alex: Read with Voice Selection
Command: alex.readWithVoice
Quick pick to select a voice preset before reading.
Alex: Save as Audio
Command: alex.saveAsAudio
Generate and save speech to an MP3 file. Opens a save dialog for output location.
Alex: Stop Reading
Command: alex.stopReading
Keybinding: Escape (when reading)
Immediately stops current playback.
Implementation Details
Core Files (src/tts/)
| File | Purpose |
|---|---|
| `ttsService.ts` | WebSocket connection, SSML generation, synthesis |
| `audioPlayer.ts` | Webview panel, playback controls, system fallback |
| `index.ts` | Module exports |
Text Preprocessing
The prepareTextForSpeech() function strips markdown:
| Markdown | Speech Output |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold**` | "bold" (emphasis via prosody) |
| `*italic*` | "italic" |
| `` `code` `` | "code" |
| `[link](url)` | "link" |
| `- item` | "Item." |
| `> quote` | "Quote: ..." |
| `---` | (long pause) |
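A regular-expression sketch of the table above follows. It is named `stripMarkdownForSpeech()` to make clear it is an illustration, not the extension's `prepareTextForSpeech()`. It handles only the constructs in the table, drops bold markers instead of adding prosody emphasis, and maps `---` to a paragraph break rather than an SSML `<break>` element; all three are simplifying assumptions.

```typescript
// Ensure a spoken segment ends with punctuation so the voice pauses after it.
function endSentence(text: string): string {
  const t = text.trim();
  return /[.!?:]$/.test(t) ? t : `${t}.`;
}

function stripMarkdownForSpeech(markdown: string): string {
  return markdown
    .replace(/^[ \t]*-{3,}[ \t]*$/gm, "") // --- -> empty line (paragraph break as pause)
    .replace(/^#{1,6}[ \t]+(.+)$/gm, (_m: string, heading: string) => endSentence(heading)) // # Heading -> "Heading."
    .replace(/^[ \t]*[-*+][ \t]+(.+)$/gm, (_m: string, item: string) =>
      endSentence(item.charAt(0).toUpperCase() + item.slice(1))) // - item -> "Item."
    .replace(/^>[ \t]?(.*)$/gm, "Quote: $1") // > quote -> "Quote: quote"
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [link](url) -> "link"
    .replace(/`([^`]+)`/g, "$1")             // `code` -> "code"
    .replace(/\*\*(.+?)\*\*/g, "$1")         // **bold** -> "bold"
    .replace(/\*(.+?)\*/g, "$1");            // *italic* -> "italic"
}
```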
Code Block Handling
```python
def hello():
    print("Hello")
```
Becomes: "Python code block. Definition hello. Print hello. End code block."
Symbol-to-Speech Transformations
Symbols are converted to natural speech equivalents:
| Symbol | Spoken As | Example |
|---|---|---|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal to" / "less than or equal to" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees Celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |
**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.
Installation (v2.0)
TTS v2 is built into the Alex VS Code extension; no separate installation is required.
Package Dependencies
The extension automatically includes:
- `ws` (WebSocket client for the Edge TTS connection)
- `fs-extra` (file operations for saving audio)
Verification
After updating the extension, verify that TTS works:
1. Open any document.
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS).
3. The status bar should show "$(loading~spin) Synthesizing...".
4. Audio should play through the webview panel.
Usage Patterns
Read Current Document
Press Ctrl+Alt+R (Cmd+Alt+R on macOS) to read the document aloud. Select text first to read only the selection.
Generate Audio File
Command Palette → "Alex: Save as Audio" → choose an output location → MP3 saved
Voice Customization
Command Palette → "Alex: Read with Voice Selection" → choose Default, Warm, British, or Friendly
Edge TTS Technical Reference
WebSocket Endpoint
wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4&ConnectionId=[UUID]
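The endpoint is an ordinary URL with two query parameters, so it can be assembled with the standard `URL` API. This is a sketch: `edgeTtsUrl()` is a hypothetical helper, the dash-free hex format of `ConnectionId` is an assumption borrowed from the community `edge-tts` Python client, and the live service may require additional parameters or headers not documented here.

```typescript
import { randomUUID } from "crypto";

const EDGE_TTS_BASE =
  "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1";
const TRUSTED_CLIENT_TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4";

// Build the connection URL with a fresh ConnectionId per session.
// Assumption: the ConnectionId is a UUID with its dashes removed.
function edgeTtsUrl(connectionId: string = randomUUID().replace(/-/g, "")): string {
  const url = new URL(EDGE_TTS_BASE);
  url.searchParams.set("TrustedClientToken", TRUSTED_CLIENT_TOKEN);
  url.searchParams.set("ConnectionId", connectionId);
  return url.toString();
}
```

The resulting string could then be passed to the `WebSocket` constructor from the `ws` package.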
Audio Format
- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono
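Audio arrives over the WebSocket as binary messages. The sketch below assumes the framing handled by the community `edge-tts` client (a 2-byte big-endian header length, a text header block, then MP3 bytes); `parseAudioFrame()` and `estimateDurationSeconds()` are hypothetical helpers, and the duration estimate relies on the constant 48 kbps bitrate listed above.

```typescript
interface AudioFrame {
  headers: Record<string, string>;
  audio: Uint8Array;
}

// Assumed framing: [2-byte big-endian header length][header text][MP3 bytes].
function parseAudioFrame(frame: Uint8Array): AudioFrame {
  if (frame.length < 2) throw new Error("Frame too short");
  const headerLength = (frame[0] << 8) | frame[1];
  if (2 + headerLength > frame.length) throw new Error("Header length exceeds frame");
  const headerText = new TextDecoder().decode(frame.subarray(2, 2 + headerLength));
  const headers: Record<string, string> = {};
  for (const line of headerText.split("\r\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) headers[line.slice(0, colon)] = line.slice(colon + 1).trim();
  }
  return { headers, audio: frame.subarray(2 + headerLength) };
}

// Constant bitrate: seconds = bytes * 8 / (kbps * 1000).
function estimateDurationSeconds(mp3Bytes: number, bitrateKbps = 48): number {
  return (mp3Bytes * 8) / (bitrateKbps * 1000);
}
```

At 48 kbps, one minute of speech is 48,000 × 60 / 8 = 360,000 bytes (about 352 KB), a useful sanity check for the "Receiving..." counter in the status bar.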
SSML Template
```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Text content here
    </prosody>
  </voice>
</speak>
```
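Filling the template programmatically mainly requires escaping the document text, since `&` and `<` are common in technical prose and would otherwise produce malformed SSML. A minimal sketch with hypothetical `escapeXml()` and `buildSsml()` helpers; the defaults mirror the neutral prosody values in the template.

```typescript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface Prosody {
  rate?: string;   // e.g. "+0%"
  pitch?: string;  // e.g. "+0Hz"
  volume?: string; // e.g. "+0%"
}

// Fill the SSML template; every interpolated value is escaped.
function buildSsml(text: string, voice = "en-US-GuyNeural", prosody: Prosody = {}): string {
  const { rate = "+0%", pitch = "+0Hz", volume = "+0%" } = prosody;
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${escapeXml(voice)}">` +
    `<prosody rate="${escapeXml(rate)}" pitch="${escapeXml(pitch)}" volume="${escapeXml(volume)}">` +
    escapeXml(text) +
    `</prosody></voice></speak>`
  );
}
```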
Popular Voice IDs
| Language | Voice | Style |
|---|---|---|
| en-US | GuyNeural | Professional male |
| en-US | JennyNeural | Professional female |
| en-US | AriaNeural | News anchor style |
| en-GB | RyanNeural | British male |
| en-GB | SoniaNeural | British female |
| en-AU | WilliamNeural | Australian male |
| en-IN | NeerjaNeural | Indian English |
Accessibility Benefits
| Use Case | Benefit |
|---|---|
| Vision impaired | Full document access via audio |
| Multitasking | Review code while walking/driving |
| Learning | Auditory reinforcement of reading |
| Proofreading | Catch errors by hearing text |
| Long documents | Listen during breaks |
Version History
v2.0.0 (2026-02-06)
- Native TypeScript implementation
- Removed Python/MCP server dependencies
- Webview-based cross-platform audio player
- VS Code command integration
- Status bar progress feedback
v1.1.0 (2026-02-05)
- Added Alex voice presets
- Enhanced markdown stripping
- Symbol-to-speech conversion
v1.0.0 (2026-02-04)
- Initial implementation via MCP server
- Python edge-tts integration
- Basic markdown support
Synapses
- accessibility: Primary use case enabler
- vscode-extension-patterns: Extension command patterns
- markdown-mermaid: Source content processing
- academic-research: Document reading for research projects
- gamma-presentations: Audio playback of pitch content for rehearsal and delivery
- project-management: Stakeholder pitch presentations generated as audio files
Future Enhancements
| Feature | Status | Notes |
|---|---|---|
| Real-time streaming | Planned | Start playing before full generation |
| SSML support | Planned | Fine-grained prosody control |
| Section navigation | Planned | "Skip to next heading" |
| Bookmark resume | Planned | Resume from last position |
| Speed presets | Planned | 1x, 1.5x, 2x reading speeds |
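For the planned speed presets, one option is to reuse the `rate` attribute already present in the SSML template. The sketch assumes a hypothetical `speedToSsmlRate()` helper and that the service reads a value such as `+50%` as 50 percent faster than normal; the alternative of setting the webview audio element's `playbackRate` would avoid re-synthesis. Neither approach is confirmed as the chosen design.

```typescript
type SpeedPreset = 1 | 1.5 | 2;

// Convert a speed multiplier into a relative SSML prosody rate:
// percent change = (speed - 1) * 100, so 1.5x -> "+50%".
function speedToSsmlRate(speed: SpeedPreset): string {
  const percent = Math.round((speed - 1) * 100);
  return `${percent >= 0 ? "+" : ""}${percent}%`;
}
```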