Text-to-Speech Skill ⭐ Flagship

Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.0.0
Last Updated: 2026-02-05
Author: Alex (Master Alex)
Status: ⭐ Flagship Skill - Core Alex capability

Why This is a Flagship Skill

Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:

•Read documents aloud while you walk, drive, or rest your eyes
•Proofread by ear - catch errors your eyes miss
•Support accessibility - full document access for vision-impaired users
•Rehearse - practice presentations with natural-sounding narration
•Export knowledge - create MP3s for offline learning

Zero cost, nothing extra to install - uses Microsoft Edge TTS (free, no API key) through native TypeScript, with no Python runtime or MCP server.

User Experience

🎯 Quick Start: Read Any Document

Keyboard shortcut (fastest):

•Open any document in VS Code
•(Optional) Select specific text to read only that portion
•Press Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)
•Audio begins playing through the webview player

Command palette:

•Ctrl+Shift+P → "Alex: Read Aloud"

📊 Status Bar Feedback

The status bar shows real-time progress during TTS operations:

State	Display	Click Action
Connecting	`$(loading~spin) Connecting...`	-
Synthesizing	`$(loading~spin) Synthesizing...`	-
Streaming	`$(loading~spin) Receiving... 45KB`	-
Playing	`$(unmute) Playing 35%`	Stop
Paused	`$(unmute) Paused`	Stop
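
The table above can be read as a mapping from playback state to status bar text. A minimal sketch of that mapping follows; the `TtsState` type and `formatStatus` function are hypothetical names used for illustration, not taken from the extension source.

```typescript
// Hypothetical state model; the real ttsService.ts may track progress differently.
type TtsState =
  | { kind: "connecting" }
  | { kind: "synthesizing" }
  | { kind: "streaming"; bytesReceived: number }
  | { kind: "playing"; fraction: number } // 0..1 of total duration
  | { kind: "paused" };

// Returns the status bar label for a state, using the codicon names from the table.
function formatStatus(state: TtsState): string {
  switch (state.kind) {
    case "connecting":
      return "$(loading~spin) Connecting...";
    case "synthesizing":
      return "$(loading~spin) Synthesizing...";
    case "streaming":
      return `$(loading~spin) Receiving... ${Math.round(state.bytesReceived / 1024)}KB`;
    case "playing":
      return `$(unmute) Playing ${Math.round(state.fraction * 100)}%`;
    case "paused":
      return "$(unmute) Paused";
  }
}
```

In an extension, the returned string would typically be assigned to the `text` property of a status bar item.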

🎵 Webview Audio Player

A sleek panel opens with full playback controls:

┌─────────────────────────────────────────────────────────┐
│  Alex TTS Player                                    [×] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ▶️ ⏹️   ═══════════●══════════   1:23 / 4:56          │
│                                                         │
│  🔊 ────────●────────                                   │
│                                                         │
└─────────────────────────────────────────────────────────┘

Features:

•Progress bar with scrubbing (click/drag to seek)
•Play/Pause button - toggle playback
•Stop button - ends playback and closes panel
•Volume slider - adjust playback volume
•Time display - current position / total duration
•Auto-close - panel closes when playback ends
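
As an illustration of the time display, here is a small sketch that formats a position and duration in seconds as `m:ss / m:ss`. The function names are hypothetical, and the guard for non-finite values assumes the player may not know the duration before audio metadata has loaded.

```typescript
// Formats a number of seconds as m:ss (for example 83 -> "1:23").
function formatTime(totalSeconds: number): string {
  // Duration may be NaN or Infinity before metadata is available.
  if (!Number.isFinite(totalSeconds)) return "0:00";
  const safe = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(safe / 60);
  const seconds = safe % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Combines current position and total duration into the player's time label.
function formatTimeDisplay(current: number, duration: number): string {
  return `${formatTime(current)} / ${formatTime(duration)}`;
}
```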

🎤 Voice Selection

Choose Alex's voice before reading:

•Ctrl+Shift+P → "Alex: Read with Voice Selection"
•Quick pick appears:

Voice	Character	Best For
Default (GuyNeural)	Professional, clear	Technical docs, code review
Warm (ChristopherNeural)	Friendly, conversational	Tutorials, READMEs
British (RyanNeural)	Authoritative	Formal documents, presentations
Friendly (DavisNeural)	Casual, approachable	Chat logs, informal content

•Select voice → reading begins immediately

💾 Save as MP3

Export any document to audio file:

•Ctrl+Shift+P → "Alex: Save as Audio"
•Save dialog opens (default name based on document)
•Progress notification shows synthesis progress
•Success notification with options:
  •Open File - plays in default audio player
  •Open Folder - reveals in file explorer

Use cases:

•Create podcasts from documentation
•Generate audio for offline learning
•Archive presentations as audio

⏹️ Stop Reading

Multiple ways to stop playback:

•Click status bar (shows $(unmute) icon during playback)
•Press Escape when reading
•Click stop button in webview player
•Close webview panel
•Ctrl+Shift+P → "Alex: Stop Reading"

📝 Smart Markdown Processing

Alex automatically strips markdown formatting for natural speech:

You Write	Alex Reads
`# Heading`	"Heading." (pause)
`**bold text**`	"bold text" (slight emphasis)
`[link text](url)`	"link text"
`` `code` ``	"code"
`> blockquote`	"Quote: ..."
`---`	(long pause)

Symbol conversion:

Symbol	Spoken As
`~5 minutes`	"about 5 minutes"
`50%`	"50 percent"
`A → B`	"A leads to B"
`±5%`	"plus or minus 5 percent"

For Master Alex (Promotion Notes)

This skill gives Alex a voice. Version 2.0 uses native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating the Python and MCP server dependencies of earlier versions, and reads documents aloud with natural-sounding neural voices.

Version 2.0 Changes:

•Native TypeScript implementation (no Python/MCP dependencies)
•Direct WebSocket connection to Edge TTS endpoint
•Webview-based audio player (cross-platform)
•Integrated as VS Code commands
•Status bar progress feedback

Why promote to Master:

•Universal utility across all projects
•Zero-cost implementation (uses free Edge TTS API)
•No external runtimes to install (no Python, no MCP server)
•Accessibility benefits for vision-impaired users
•Integrated into VS Code extension

Dependencies (v2.0):

•ws npm package (WebSocket client)
•VS Code webview API (for audio playback)

Overview

Alex's voice synthesis capability uses Microsoft Edge TTS via native TypeScript. It reads markdown documents, code files, and plain text aloud with natural-sounding voices and is fully integrated into the VS Code extension.

Architecture (v2.0)

┌─────────────────────────────────────────────────────────────┐
│                 Alex VS Code Extension                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Commands:                                                   │
│  • Alex: Read Aloud (Ctrl+Alt+R)                            │
│  • Alex: Read with Voice Selection                          │
│  • Alex: Save as Audio                                      │
│  • Alex: Stop Reading                                       │
│                     │                                        │
│                     ▼                                        │
│  ┌─────────────────────────────────────────────┐            │
│  │           ttsService.ts                       │            │
│  │   Native WebSocket to Edge TTS               │            │
│  │   • SSML generation                          │            │
│  │   • Markdown stripping                       │            │
│  │   • Progress callbacks                       │            │
│  └─────────────────┬───────────────────────────┘            │
│                    │                                         │
│                    ▼                                         │
│  ┌─────────────────────────────────────────────┐            │
│  │           audioPlayer.ts                      │            │
│  │   Webview-based playback                     │            │
│  │   • Cross-platform HTML5 Audio               │            │
│  │   • Play/pause/stop controls                 │            │
│  │   • Progress tracking                        │            │
│  └─────────────────────────────────────────────┘            │
│                                                              │
└──────────────────────┬──────────────────────────────────────┘
                       │ WebSocket (wss://)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│               Microsoft Edge TTS Endpoint                    │
│   wss://speech.platform.bing.com/consumer/speech/...        │
├─────────────────────────────────────────────────────────────┤
│  • 400+ neural voices, 90+ languages                        │
│  • Free, no API key required                                │
│  • MP3 output (24kHz, 48kbps)                               │
│  • SSML support for prosody control                         │
└─────────────────────────────────────────────────────────────┘

Alex Voice Presets

Preset	Voice ID	Character
Default	en-US-GuyNeural	Professional male, clear articulation
Warm	en-US-ChristopherNeural	Friendly, conversational
British	en-GB-RyanNeural	British accent, authoritative
Friendly	en-US-DavisNeural	Casual, approachable
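
One way to represent the preset table in code is a constant map with a lookup that falls back to the default voice. This is a sketch only: `VOICE_PRESETS` and `resolveVoice` are hypothetical names, and the case-insensitive lookup is an assumption rather than documented behavior.

```typescript
// Preset name -> Edge TTS voice ID, copied from the table above.
const VOICE_PRESETS = {
  default: "en-US-GuyNeural",
  warm: "en-US-ChristopherNeural",
  british: "en-GB-RyanNeural",
  friendly: "en-US-DavisNeural",
} as const;

type PresetName = keyof typeof VOICE_PRESETS;

// Resolves a preset name to a voice ID; unknown or missing names use the default.
function resolveVoice(preset?: string): string {
  const key = (preset ?? "default").toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(VOICE_PRESETS, key);
  return known ? VOICE_PRESETS[key as PresetName] : VOICE_PRESETS.default;
}
```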

Voice Selection Rationale

Alex's default voice (GuyNeural) was chosen for:

•Clarity: Excellent pronunciation of technical terms
•Neutrality: Not too formal, not too casual
•Distinctiveness: Recognizable as "Alex's voice"
•Consistency: Same voice across all platforms

VS Code Commands

Alex: Read Aloud

Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)

Reads the current selection or entire document aloud using Alex's default voice.

Behavior:

•If text is selected, reads only the selection
•If no selection, reads the entire document
•Markdown files are stripped of formatting for natural speech
•Progress shown in status bar
•Click status bar to stop playback
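
The selection rule above can be sketched as a pure function. `EditorLike` is a hypothetical stand-in for the few editor properties the rule needs; in the extension these values would come from the active VS Code text editor. Treating a whitespace-only selection as "no selection" is an assumption.

```typescript
// Minimal stand-in for the editor state this sketch needs (hypothetical shape).
interface EditorLike {
  selectedText: string; // empty when nothing is selected
  documentText: string;
  languageId: string; // VS Code uses "markdown" for markdown files
}

// Chooses what to read: the selection if there is one, otherwise the whole document.
function pickTextToRead(editor: EditorLike): { text: string; isMarkdown: boolean } {
  const hasSelection = editor.selectedText.trim().length > 0;
  return {
    text: hasSelection ? editor.selectedText : editor.documentText,
    isMarkdown: editor.languageId === "markdown",
  };
}
```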

Alex: Read with Voice Selection

Command: alex.readWithVoice

Quick pick to select a voice preset before reading.

Alex: Save as Audio

Command: alex.saveAsAudio

Generate and save speech to an MP3 file. Opens a save dialog for output location.
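
A sketch of how a default output name could be derived from the document path for the save dialog. The function name is hypothetical, and placing the MP3 next to the source document is an assumption; the source only states that the default name is based on the document.

```typescript
import * as path from "path";

// Suggests an .mp3 path next to the source document, e.g. README.md -> README.mp3.
function defaultAudioPath(documentPath: string): string {
  const parsed = path.parse(documentPath);
  return path.join(parsed.dir, `${parsed.name}.mp3`);
}
```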

Alex: Stop Reading

Command: alex.stopReading
Keybinding: Escape (when reading)

Immediately stops current playback.

Implementation Details

Core Files (src/tts/)

File	Purpose
`ttsService.ts`	WebSocket connection, SSML generation, synthesis
`audioPlayer.ts`	Webview panel, playback controls, system fallback
`index.ts`	Module exports

Text Preprocessing

The prepareTextForSpeech() function strips markdown:

Markdown	Speech Output
`# Heading`	"Heading." (pause)
`**bold**`	"bold" (emphasis via prosody)
`*italic*`	"italic"
`` `code` ``	"code"
`[link](url)`	"link"
`- item`	"Item."
`> quote`	"Quote: ..."
`---`	(long pause)
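
To make the table concrete, here is a minimal sketch of line-by-line markdown stripping. It is not the actual `prepareTextForSpeech()` implementation; `stripMarkdownSketch` is a hypothetical name, emphasis and pauses are not rendered (a horizontal rule is simply dropped), and nested or multi-line constructs are not handled.

```typescript
// Strips common markdown syntax so the remaining text reads naturally.
function stripMarkdownSketch(markdown: string): string {
  // Adds a period unless the text already ends with sentence punctuation.
  const endSentence = (s: string): string => (/[.!?:]$/.test(s) ? s : `${s}.`);

  return markdown
    .split(/\r?\n/)
    .map((line) => {
      let text = line.trim();
      // Horizontal rule: dropped here; a fuller version might emit a pause.
      if (/^(-{3,}|\*{3,}|_{3,})$/.test(text)) return "";
      // Inline formatting first, so the block rules below see plain text.
      text = text
        .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [link](url) -> link
        .replace(/`([^`]+)`/g, "$1") // `code` -> code
        .replace(/\*\*([^*]+)\*\*/g, "$1") // **bold** -> bold
        .replace(/\*(\S(?:[^*]*\S)?)\*/g, "$1"); // *italic* -> italic
      const heading = text.match(/^#{1,6}\s+(.*)$/);
      if (heading) return endSentence(heading[1]);
      const quote = text.match(/^>\s?(.*)$/);
      if (quote) return `Quote: ${quote[1]}`;
      const item = text.match(/^-\s+(.*)$/);
      if (item) return endSentence(item[1].charAt(0).toUpperCase() + item[1].slice(1));
      return text;
    })
    .filter((line) => line.length > 0)
    .join(" ");
}
```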

Code Block Handling

```python
def hello():
    print("Hello")
```

Becomes: "Python code block. Definition hello. Print hello. End code block."
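
The announcement part of this behavior can be sketched as follows. This covers only the "code block" / "End code block" framing around fenced blocks; turning the code itself into spoken phrases (for example "Definition hello") is not attempted here, and `announceCodeBlocks` is a hypothetical name.

```typescript
// Wraps each fenced code block in spoken start/end markers.
function announceCodeBlocks(markdown: string): string {
  const fence = /```([A-Za-z0-9_+-]*)[^\n]*\n([\s\S]*?)```/g;
  return markdown.replace(fence, (_match: string, lang: string, body: string) => {
    // Capitalize the language tag if present: "python" -> "Python code block."
    const label = lang
      ? `${lang.charAt(0).toUpperCase()}${lang.slice(1)} code block.`
      : "Code block.";
    // Collapse newlines and indentation so the body reads as one run of text.
    const spoken = body.trim().replace(/\s+/g, " ");
    return `${label} ${spoken} End code block.`;
  });
}
```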

### Symbol-to-Speech Transformations

Symbols are converted to natural speech equivalents:

| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal" / "less than or equal" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees Celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |

**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.
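
A few of these transformations can be sketched as an ordered list of regular-expression rules. This is an illustrative subset under stated assumptions, not the extension's rule set: `SYMBOL_RULES` and `speakSymbols` are hypothetical names, `→` is always spoken as "leads to", and context-dependent symbols such as `#`, `<`, and `>` are left out.

```typescript
// Ordered [pattern, spoken form] pairs; earlier rules run first.
const SYMBOL_RULES: Array<[RegExp, string]> = [
  [/±\s*/g, "plus or minus "],
  [/~\s*(?=\d)/g, "about "], // only when a number follows
  [/\+(?=\d)/g, "plus "], // only when a number follows
  [/(\d)\s*°C\b/g, "$1 degrees Celsius"],
  [/(\d)\s*%/g, "$1 percent"],
  [/\s*→\s*/g, " leads to "],
  [/\s*&\s*/g, " and "],
];

// Applies every rule in order and trims the result.
function speakSymbols(text: string): string {
  return SYMBOL_RULES.reduce(
    (out, [pattern, spoken]) => out.replace(pattern, spoken),
    text
  ).trim();
}
```

Rule order matters: `±5%` is first rewritten to "plus or minus 5%", and the percent rule then completes it.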

---

## Installation (v2.0)

TTS v2 is built into the Alex VS Code extension. No separate installation required.

### Package Dependencies

The extension automatically includes:
- `ws` (WebSocket client for Edge TTS connection)
- `fs-extra` (file operations for audio saving)

### Verification

After extension update, verify TTS works:

1. Open any document
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
3. Status bar should show "$(loading~spin) Synthesizing..."
4. Audio should play through webview panel

---

## Usage Patterns

### Read Current Document

- Press `Ctrl+Alt+R` to read the document aloud
- Select text first to read only the selection


### Generate Audio File

- Command Palette → "Alex: Save as Audio"
- Choose output location → MP3 saved


### Voice Customization

- Command Palette → "Alex: Read with Voice Selection"
- Choose: Default | Warm | British | Friendly


---

## Edge TTS Technical Reference

### WebSocket Endpoint

`wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4&ConnectionId=[UUID]`
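
A sketch of assembling this URL with a fresh connection ID per session. The endpoint and token are the values quoted above; `buildEndpointUrl` is a hypothetical name, and using Node's `randomUUID()` output unchanged for `ConnectionId` is an assumption, since the exact ID format the service expects is not stated here.

```typescript
import { randomUUID } from "crypto";

const EDGE_TTS_ENDPOINT =
  "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1";
const TRUSTED_CLIENT_TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4";

// Builds the WebSocket URL; a new UUID is generated unless one is supplied.
function buildEndpointUrl(connectionId: string = randomUUID()): string {
  const params = new URLSearchParams({
    TrustedClientToken: TRUSTED_CLIENT_TOKEN,
    ConnectionId: connectionId,
  });
  return `${EDGE_TTS_ENDPOINT}?${params.toString()}`;
}
```

The resulting string would then be passed to the WebSocket client from the `ws` package.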


### Audio Format

- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono
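
Given the fixed bitrate, playback length can be estimated from the number of bytes received: seconds = bytes × 8 / 48,000. The sketch below assumes a constant bitrate, 1 kbps = 1,000 bits per second, and ignores any container or header overhead; the function name is hypothetical.

```typescript
const BITRATE_BPS = 48000; // 48 kbps, from the format listed above

// Estimates audio duration in seconds from the size of the MP3 data in bytes.
function estimateDurationSeconds(byteLength: number): number {
  return (byteLength * 8) / BITRATE_BPS;
}
```

For example, 45 KB (46,080 bytes) of audio corresponds to 368,640 bits, or about 7.68 seconds of speech.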

### SSML Template

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Text content here
    </prosody>
  </voice>
</speak>
```
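
A sketch of filling in this template from code. The text must be XML-escaped so that characters such as `<` and `&` in the document do not break the SSML. `SsmlOptions`, `escapeXml`, and `buildSsml` are hypothetical names, and `xml:lang` is kept at `en-US` as in the template, which is a simplification for non-US voices.

```typescript
interface SsmlOptions {
  voice?: string;
  rate?: string;
  pitch?: string;
  volume?: string;
}

// Escapes the five XML special characters.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds an SSML document matching the template above; defaults mirror its values.
function buildSsml(text: string, options: SsmlOptions = {}): string {
  const { voice = "en-US-GuyNeural", rate = "+0%", pitch = "+0Hz", volume = "+0%" } = options;
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">`,
    `  <voice name="${escapeXml(voice)}">`,
    `    <prosody rate="${escapeXml(rate)}" pitch="${escapeXml(pitch)}" volume="${escapeXml(volume)}">`,
    `      ${escapeXml(text)}`,
    `    </prosody>`,
    `  </voice>`,
    `</speak>`,
  ].join("\n");
}
```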

Popular Voice IDs

Language	Voice	Style
en-US	GuyNeural	Professional male
en-US	JennyNeural	Professional female
en-US	AriaNeural	News anchor style
en-GB	RyanNeural	British male
en-GB	SoniaNeural	British female
en-AU	WilliamNeural	Australian male
en-IN	NeerjaNeural	Indian English

Accessibility Benefits

Use Case	Benefit
Vision impaired	Full document access via audio
Multitasking	Review code while walking/driving
Learning	Auditory reinforcement of reading
Proofreading	Catch errors by hearing text
Long documents	Listen during breaks

Version History

v2.0.0 (2026-02-06)

•Native TypeScript implementation
•Removed Python/MCP server dependencies
•Webview-based cross-platform audio player
•VS Code command integration
•Status bar progress feedback

v1.1.0 (2026-02-05)

•Added Alex voice presets
•Enhanced markdown stripping
•Symbol to speech conversion

v1.0.0 (2026-02-04)

•Initial implementation via MCP server
•Python edge-tts integration
•Basic markdown support

Synapses

•accessibility: Primary use case enabler
•vscode-extension-patterns: Extension command patterns
•markdown-mermaid: Source content processing
•academic-research: Document reading for research projects
•gamma-presentations: Audio playback of pitch content for rehearsal and delivery
•project-management: Stakeholder pitch presentations generated as audio files

Future Enhancements

Feature	Status	Notes
Real-time streaming	Planned	Start playing before full generation
SSML support	Planned	Fine-grained prosody control
Section navigation	Planned	"Skip to next heading"
Bookmark resume	Planned	Resume from last position
Speed presets	Planned	1x, 1.5x, 2x reading speeds