Text-to-Speech (TTS) Service
Natural Voice Synthesis using Microsoft Edge TTS
Overview
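The CSuite TTS Service provides high-quality text-to-speech using Microsoft Edge's neural voices. The service runs locally and has no per-character API cost. The synthesis itself is performed by Microsoft's online voice service, so an internet connection is required. It supports three Brazilian Portuguese voices and two US English voices, all with natural prosody.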
🎯 Features
| Feature | Description |
|---|---|
| Neural Voices | Microsoft's natural-sounding voices |
| Portuguese Support | 3 Brazilian Portuguese voices |
| Voice Control | Adjustable rate, pitch, and volume |
| Multiple Formats | MP3, WAV, OGG output |
| Cost | FREE (no API costs) |
🎙️ Available Voices
Brazilian Portuguese 🇧🇷
| Voice ID | Name | Gender | Style |
|---|---|---|---|
| pt-BR-FranciscaNeural | Francisca | Female | Natural, warm |
| pt-BR-AntonioNeural | Antonio | Male | Professional |
| pt-BR-ThalitaNeural | Thalita | Female | Young, friendly |
English (US) 🇺🇸
| Voice ID | Name | Gender |
|---|---|---|
| en-US-JennyNeural | Jenny | Female |
| en-US-GuyNeural | Guy | Male |
🔗 Endpoints
Health Check
GET /health
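A client can poll this endpoint before sending synthesis requests, for example right after the service starts. The sketch below uses only the Python standard library and assumes that a healthy service answers with HTTP 200. The response body is not documented here, so it is ignored. `is_healthy` and `wait_until_healthy` are hypothetical helpers, not part of the service.

```python
import time
import urllib.request


def is_healthy(base_url: str = "http://localhost:8011", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed healthy status)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx status) and timeouts are all OSError subclasses
        return False


def wait_until_healthy(base_url: str = "http://localhost:8011",
                       attempts: int = 10, delay: float = 1.0) -> bool:
    """Hypothetical helper: poll /health until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if is_healthy(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next probe
    return False
```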
List Voices
GET /voices
Synthesize Speech
POST /synthesize
Content-Type: application/json
Request Body:
{
"text": "Olá! Este é um teste de voz.",
"voice": "pt-BR-FranciscaNeural",
"rate": "+0%",
"pitch": "+0Hz",
"format": "mp3"
}
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| text | String | Yes | Text to synthesize |
| voice | String | No | Voice ID (default: pt-BR-FranciscaNeural) |
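| rate | String | No | Speed offset as a signed percentage, from -50% to +100% (e.g. -25%) |
| pitch | String | No | Pitch offset as a signed value in Hz, from -50Hz to +50Hz (e.g. +20Hz) |
| format | String | No | Output: mp3, wav, or base64 (base64-encoded audio returned in a JSON response) |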
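The rules in this table can be checked on the client before a request is sent. This sketch assumes that rate and pitch are signed integers followed by "%" and "Hz", as in the request body above. The defaults mirror that example (the service's real default for format is not stated). `build_payload` is a hypothetical helper, and the service may apply its own, different validation.

```python
import re

RATE_RE = re.compile(r"[+-]\d+%")    # e.g. "+0%", "-25%"
PITCH_RE = re.compile(r"[+-]\d+Hz")  # e.g. "+0Hz", "+20Hz"
FORMATS = {"mp3", "base64", "wav"}


def build_payload(text: str, voice: str = "pt-BR-FranciscaNeural",
                  rate: str = "+0%", pitch: str = "+0Hz",
                  fmt: str = "mp3") -> dict:
    """Validate parameters against the documented ranges and build a /synthesize body."""
    if not text or not text.strip():
        raise ValueError("text is required")
    # fullmatch requires the explicit sign and unit; the slice strips the unit
    if not RATE_RE.fullmatch(rate) or not -50 <= int(rate[:-1]) <= 100:
        raise ValueError(f"rate must be between -50% and +100%, got {rate!r}")
    if not PITCH_RE.fullmatch(pitch) or not -50 <= int(pitch[:-2]) <= 50:
        raise ValueError(f"pitch must be between -50Hz and +50Hz, got {pitch!r}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}, got {fmt!r}")
    return {"text": text, "voice": voice, "rate": rate, "pitch": pitch, "format": fmt}
```

For example, `build_payload("Olá", rate="+150%")` raises `ValueError` without a network round trip, while a valid call returns a dictionary that can be passed directly as the JSON body.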
Stream Audio
GET /stream?text=Olá&voice=pt-BR-FranciscaNeural
Returns the audio as a streaming HTTP response rather than a JSON body.
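Portuguese text contains accented characters and spaces, so the query parameters for this endpoint must be percent-encoded. A minimal standard-library sketch follows. `build_stream_url` and `download_stream` are hypothetical helpers, and the saved file is assumed to be MP3, as in the cURL example below.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlencode


def build_stream_url(text: str, voice: str = "pt-BR-FranciscaNeural",
                     base_url: str = "http://localhost:8011") -> str:
    """Build a /stream URL; quote (not the default quote_plus) encodes spaces as %20."""
    query = urlencode({"text": text, "voice": voice}, quote_via=quote)
    return f"{base_url}/stream?{query}"


def download_stream(text: str, path: str = "output.mp3", **kwargs) -> None:
    """Copy the streamed response to a file in chunks instead of buffering it all."""
    with urllib.request.urlopen(build_stream_url(text, **kwargs), timeout=30) as resp, \
            open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```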
📝 Example Usage
Python
import base64

import requests

# Request base64-encoded speech from the local TTS service
response = requests.post(
    "http://localhost:8011/synthesize",
    json={
        "text": "Olá! Bem-vindo ao CSuite.",
        "voice": "pt-BR-FranciscaNeural",
        "format": "base64",
    },
    timeout=30,
)
response.raise_for_status()  # fail fast on HTTP errors
result = response.json()

# Decode the base64 payload and save it as an MP3 file
audio_data = base64.b64decode(result["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)
cURL
# Get base64 audio in a JSON response
curl -X POST http://localhost:8011/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Olá mundo!", "voice": "pt-BR-FranciscaNeural", "format": "base64"}'
# Stream audio directly to a file (--data-urlencode encodes accents and spaces)
curl -G http://localhost:8011/stream \
  --data-urlencode "text=Olá mundo" \
  --data-urlencode "voice=pt-BR-FranciscaNeural" \
  -o output.mp3
JavaScript
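// Request base64-encoded audio
const response = await fetch('http://localhost:8011/synthesize', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Olá! Este é o CSuite.',
    voice: 'pt-BR-FranciscaNeural',
    format: 'base64'
  })
});
if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);
const { audio } = await response.json();

// Play the MP3 from a data URL (audio/mpeg is the standard MP3 MIME type)
const audioElement = new Audio(`data:audio/mpeg;base64,${audio}`);
await audioElement.play(); // may reject if the browser blocks autoplay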
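📊 Response Format
A request with "format": "base64" returns JSON like the following, where "audio" holds the base64-encoded MP3 data: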
{
"audio": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVVVVVV...",
"voice": "pt-BR-FranciscaNeural",
"format": "mp3",
"text_length": 42,
"duration_estimate": 3.2
}
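A sketch of client-side handling for this response. It assumes that "audio" carries base64-encoded bytes (as when the request used "format": "base64"), and that the response's "format" field names the audio encoding, which is used here as the file extension. `save_audio` is a hypothetical helper.

```python
import base64
import binascii
from pathlib import Path


def save_audio(result: dict, stem: str = "output") -> Path:
    """Decode the base64 'audio' field and write it using the reported format."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        audio = base64.b64decode(result["audio"], validate=True)
    except (KeyError, binascii.Error) as exc:
        raise ValueError("response has no valid base64 'audio' field") from exc
    path = Path(f"{stem}.{result.get('format', 'mp3')}")
    path.write_bytes(audio)
    return path
```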
🔧 Configuration
| Variable | Default | Description |
|---|---|---|
| TTS_PORT | 8011 | Service port |
| TTS_VOICE | pt-BR-FranciscaNeural | Default voice |
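A sketch of how these variables might be read with the documented defaults. The dataclass, the integer parsing, and the port range check are assumptions for illustration, not the service's own startup code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSConfig:
    port: int = 8011
    voice: str = "pt-BR-FranciscaNeural"


def load_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Read TTS_PORT and TTS_VOICE, falling back to the documented defaults."""
    port = int(env.get("TTS_PORT", "8011"))
    if not 1 <= port <= 65535:
        raise ValueError(f"TTS_PORT out of range: {port}")
    voice = env.get("TTS_VOICE", "pt-BR-FranciscaNeural")
    return TTSConfig(port=port, voice=voice)
```

With TTS_VOICE=pt-BR-AntonioNeural set in the environment, this loader would make Antonio the default voice.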
⚙️ Voice Control Examples
Slower Speech
{
"text": "Leia devagar para entender melhor.",
"rate": "-25%"
}
Faster Speech
{
"text": "Urgente! Ação necessária!",
"rate": "+50%"
}
Higher Pitch
{
"text": "Parabéns! Você ganhou!",
"pitch": "+20Hz"
}
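The rate and pitch values in these examples are signed strings. Two small hypothetical helpers can produce them from numbers, clamping to the documented ranges (-50% to +100% for rate, -50Hz to +50Hz for pitch). Clamping is a design choice made here; raising an error instead would surface out-of-range input to the caller.

```python
def _clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def format_rate(percent: int) -> str:
    """Format a speed offset with an explicit sign, e.g. -25 -> '-25%'."""
    return f"{_clamp(percent, -50, 100):+d}%"


def format_pitch(hz: int) -> str:
    """Format a pitch offset with an explicit sign, e.g. 20 -> '+20Hz'."""
    return f"{_clamp(hz, -50, 50):+d}Hz"
```

For instance, {"text": "Parabéns! Você ganhou!", "pitch": format_pitch(20)} reproduces the Higher Pitch example.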
🆚 Comparison with Cloud Services
| Service | Cost | Voices | Quality |
|---|---|---|---|
| CSuite TTS | FREE | 5 | Neural |
| ElevenLabs | $0.30/1K chars | 100+ | Premium |
| Google TTS | $0.016/1K chars | 220+ | Neural |
| Amazon Polly | $0.004/1K chars (standard), $0.016/1K chars (neural) | 60+ | Standard / Neural |
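Per-1K-character prices convert to a bill with simple arithmetic. The sketch below uses `Decimal` to avoid floating-point rounding in money values. The 2 million character volume is a made-up example, and the prices are the ones listed in the table, which providers may change.

```python
from decimal import Decimal


def estimate_cost(characters: int, price_per_1k: str) -> Decimal:
    """Cost in USD for `characters` at a given price per 1,000 characters."""
    return Decimal(characters) / Decimal(1000) * Decimal(price_per_1k)


monthly_chars = 2_000_000  # hypothetical monthly volume
for name, price in [("CSuite TTS", "0"), ("Google TTS", "0.016"), ("ElevenLabs", "0.30")]:
    print(f"{name}: ${estimate_cost(monthly_chars, price):.2f}")
```

For this volume the loop prints $0.00, $32.00 and $600.00 respectively.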
🔗 Related Services
- STT Service - Convert speech to text
- OCR Service - Extract text from documents
- ARIA Gateway - Voice-enabled AI assistant