Text-to-Speech (TTS) Service

Natural Voice Synthesis using Microsoft Edge TTS

Overview

The CSuite TTS Service converts text to speech using Microsoft Edge's neural voices. The service runs locally and incurs no API costs. It offers three Brazilian Portuguese voices with natural prosody, plus two US English voices.

🎯 Features

| Feature | Description |
|---------|-------------|
| Neural Voices | Microsoft's natural-sounding voices |
| Portuguese Support | 3 Brazilian Portuguese voices |
| Voice Control | Adjustable rate, pitch, and volume |
| Multiple Formats | MP3, WAV, OGG output |
| Cost | FREE (no API costs) |

🎙️ Available Voices

Brazilian Portuguese 🇧🇷

| Voice ID | Name | Gender | Style |
|----------|------|--------|-------|
| pt-BR-FranciscaNeural | Francisca | Female | Natural, warm |
| pt-BR-AntonioNeural | Antonio | Male | Professional |
| pt-BR-ThalitaNeural | Thalita | Female | Young, friendly |

English (US) 🇺🇸

| Voice ID | Name | Gender |
|----------|------|--------|
| en-US-JennyNeural | Jenny | Female |
| en-US-GuyNeural | Guy | Male |

🔗 Endpoints

Health Check

GET /health

List Voices

GET /voices

Synthesize Speech

POST /synthesize
Content-Type: application/json

Request Body:

{
    "text": "Olá! Este é um teste de voz.",
    "voice": "pt-BR-FranciscaNeural",
    "rate": "+0%",
    "pitch": "+0Hz",
    "format": "mp3"
}

Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| text | String | Yes | Text to synthesize |
| voice | String | No | Voice ID (default: pt-BR-FranciscaNeural) |
| rate | String | No | Speed: -50% to +100% |
| pitch | String | No | Pitch: -50Hz to +50Hz |
| format | String | No | Output: mp3, base64, wav |
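
The ranges in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, not part of the service itself: `validate_rate` and `validate_pitch` are hypothetical helper names, and the code assumes values always carry an explicit sign, as in the request body above.

```python
import re

# Hypothetical client-side helpers; the ranges come from the parameters table.
_RATE_PATTERN = re.compile(r"([+-]\d+)%")
_PITCH_PATTERN = re.compile(r"([+-]\d+)Hz")


def validate_rate(rate: str) -> str:
    """Return rate unchanged if it is a signed percentage in -50%..+100%."""
    match = _RATE_PATTERN.fullmatch(rate)
    if match is None or not -50 <= int(match.group(1)) <= 100:
        raise ValueError(f"rate must be between -50% and +100%, got {rate!r}")
    return rate


def validate_pitch(pitch: str) -> str:
    """Return pitch unchanged if it is a signed value in -50Hz..+50Hz."""
    match = _PITCH_PATTERN.fullmatch(pitch)
    if match is None or not -50 <= int(match.group(1)) <= 50:
        raise ValueError(f"pitch must be between -50Hz and +50Hz, got {pitch!r}")
    return pitch
```

Calling `validate_rate("-25%")` returns the string unchanged, while `validate_rate("+150%")` raises `ValueError` before any network traffic happens.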

Stream Audio

GET /stream?text=Olá&voice=pt-BR-FranciscaNeural

Returns the audio as a streaming response.
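
Because the text travels in the query string, non-ASCII characters such as `á` should be percent-encoded. The following is a minimal sketch using only the Python standard library. `BASE_URL`, `build_stream_url`, and `save_stream` are illustrative names, and the host and port are assumed to be the local defaults used elsewhere in this document.

```python
import shutil
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8011"  # Assumed: service on localhost, default port


def build_stream_url(text: str, voice: str = "pt-BR-FranciscaNeural") -> str:
    """Build a /stream URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 instead of "+"
    query = urlencode({"text": text, "voice": voice}, quote_via=quote)
    return f"{BASE_URL}/stream?{query}"


def save_stream(text: str, path: str, voice: str = "pt-BR-FranciscaNeural") -> None:
    """Copy the streamed audio to a file chunk by chunk."""
    with urlopen(build_stream_url(text, voice), timeout=30) as response:
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)
```

With the service running, `save_stream("Olá mundo", "output.mp3")` would write the audio to disk without holding the whole file in memory.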

📝 Example Usage

Python

import base64

import requests

# Request speech as base64-encoded audio
response = requests.post(
    "http://localhost:8011/synthesize",
    json={
        "text": "Olá! Bem-vindo ao CSuite.",
        "voice": "pt-BR-FranciscaNeural",
        "format": "base64",
    },
    timeout=30,
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()

# Decode the base64 payload and save it to a file
audio_data = base64.b64decode(result["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio_data)

cURL

# Get base64 audio
curl -X POST http://localhost:8011/synthesize \
    -H "Content-Type: application/json" \
    -d '{"text": "Olá mundo!", "voice": "pt-BR-FranciscaNeural"}'

# Stream audio directly (curl percent-encodes the query parameters)
curl -G "http://localhost:8011/stream" \
    --data-urlencode "text=Olá mundo" \
    --data-urlencode "voice=pt-BR-FranciscaNeural" \
    -o output.mp3

JavaScript

// Generate audio
const response = await fetch('http://localhost:8011/synthesize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        text: 'Olá! Este é o CSuite.',
        voice: 'pt-BR-FranciscaNeural',
        format: 'base64'
    })
});

const { audio } = await response.json();

// Play audio from a data URL (audio/mpeg is the registered MIME type for MP3)
const audioElement = new Audio(`data:audio/mpeg;base64,${audio}`);
audioElement.play();

📊 Response Format

{
    "audio": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVVVVVV...",
    "voice": "pt-BR-FranciscaNeural",
    "format": "mp3",
    "text_length": 42,
    "duration_estimate": 3.2
}
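
A client may want to turn this JSON into a typed object and decode the audio in one step. The sketch below assumes the response always carries the five fields shown above. `SynthesisResult` and `parse_synthesis_response` are hypothetical names, not part of the service.

```python
import base64
from dataclasses import dataclass


@dataclass
class SynthesisResult:
    """Typed view of the /synthesize response (field names from the example)."""
    audio: bytes  # Decoded audio bytes, ready to write to a file
    voice: str
    format: str
    text_length: int
    duration_estimate: float


def parse_synthesis_response(payload: dict) -> SynthesisResult:
    """Decode the base64 audio and copy the remaining fields."""
    return SynthesisResult(
        audio=base64.b64decode(payload["audio"]),
        voice=payload["voice"],
        format=payload["format"],
        text_length=int(payload["text_length"]),
        duration_estimate=float(payload["duration_estimate"]),
    )
```

A missing field raises `KeyError`, which surfaces a malformed response immediately instead of producing an empty audio file.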

🔧 Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| TTS_PORT | 8011 | Service port |
| TTS_VOICE | pt-BR-FranciscaNeural | Default voice |
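
How the service reads these variables internally is not shown here. As an illustration only, a client or wrapper script could resolve them with the same defaults. `load_tts_config` is a hypothetical helper, and the `localhost` host in `base_url` is an assumption.

```python
import os
from typing import Mapping


def load_tts_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve TTS_PORT and TTS_VOICE, falling back to the documented defaults."""
    port = int(env.get("TTS_PORT", "8011"))
    voice = env.get("TTS_VOICE", "pt-BR-FranciscaNeural")
    return {
        "port": port,
        "voice": voice,
        "base_url": f"http://localhost:{port}",  # Assumed: service on localhost
    }
```

Passing the environment as a parameter keeps the function easy to test: `load_tts_config({"TTS_PORT": "9000"})` resolves port 9000 without touching the real environment.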

⚙️ Voice Control Examples

Slower Speech

{
    "text": "Leia devagar para entender melhor.",
    "rate": "-25%"
}

Faster Speech

{
    "text": "Urgente! Ação necessária!",
    "rate": "+50%"
}

Higher Pitch

{
    "text": "Parabéns! Você ganhou!",
    "pitch": "+20Hz"
}
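
User interfaces often expose speed as a multiplier (for example 1.5x) rather than a signed percentage. The sketch below converts one to the other, assuming the `rate` percentage is a relative change from normal speed, so that `+50%` corresponds to 1.5x. `speed_to_rate` is a hypothetical helper name.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (1.0 = normal) to a signed rate string."""
    # Assumption: rate is a relative change, so 1.5x maps to +50%
    percent = round((speed - 1.0) * 100)
    if not -50 <= percent <= 100:
        raise ValueError(f"speed must be between 0.5x and 2.0x, got {speed}")
    # The "+d" format always emits the sign, matching values like "+0%"
    return f"{percent:+d}%"
```

Under that assumption, the slower and faster examples above correspond to `speed_to_rate(0.75)` and `speed_to_rate(1.5)`.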

🆚 Comparison with Cloud Services

| Service | Cost | Voices | Quality |
|---------|------|--------|---------|
| CSuite TTS | FREE | 6 | Neural |
| ElevenLabs | $0.30/1K chars | 100+ | Premium |
| Google TTS | $0.016/1K chars | 220+ | Neural |
| Amazon Polly | $0.004/1K chars | 60+ | Neural |
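
To put the per-character prices in perspective, the following sketch estimates the cost of a given volume of text. It uses only the figures from the table above, which may not reflect current vendor pricing. `PRICE_PER_1K_CHARS` and `estimate_cost` are illustrative names.

```python
# Prices in USD per 1,000 characters, copied from the comparison table.
PRICE_PER_1K_CHARS = {
    "CSuite TTS": 0.0,
    "ElevenLabs": 0.30,
    "Google TTS": 0.016,
    "Amazon Polly": 0.004,
}


def estimate_cost(service: str, characters: int) -> float:
    """Return the estimated USD cost of synthesizing `characters` characters."""
    return PRICE_PER_1K_CHARS[service] * characters / 1000
```

At one million characters, these prices work out to about $300 for ElevenLabs, $16 for Google TTS, $4 for Amazon Polly, and $0 for CSuite TTS.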
