Speech-to-Text (STT) Service

Local Voice Transcription using Faster-Whisper

Overview

The CSuite STT Service transcribes speech to text with OpenAI's Whisper models, run locally through Faster-Whisper. Because all processing happens on your own machine, there are no API costs and audio never leaves it.

🎯 Features

| Feature | Description |
|---------|-------------|
| Multilingual | 99+ languages supported |
| High Accuracy | State-of-the-art Whisper models |
| Privacy | 100% local processing |
| Cost | FREE (no API costs) |
| Speed | ~2-4 seconds per minute of audio |

🔗 Endpoints

Health Check

GET /health
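
A minimal readiness check can be sketched as follows. It assumes the service listens on `http://localhost:8010` (the port used in the usage examples) and that an HTTP 200 means healthy. The response body is not documented here, so the sketch ignores it. `is_healthy` is a hypothetical helper name, not part of the service.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8010", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as unhealthy.
        return False


if __name__ == "__main__":
    print("STT service healthy:", is_healthy())
```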

List Models

GET /models

Returns available Whisper models:
- tiny - 75MB, fastest
- base - 145MB, good balance
- small - 484MB, very accurate (default)
- medium - 1.5GB, excellent
- large-v3 - 3GB, best accuracy
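
As a rough illustration of the size/accuracy trade-off, the sketch below picks the largest model whose download size fits a given budget. The sizes are copied from the list above, with 1.5GB and 3GB approximated as 1500 and 3000 MB. `largest_model_within` is a hypothetical helper, and download size is only a proxy for runtime memory use.

```python
from typing import Optional

# Approximate model sizes in MB, taken from the list above.
MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 145,
    "small": 484,
    "medium": 1500,
    "large-v3": 3000,
}


def largest_model_within(budget_mb: int) -> Optional[str]:
    """Return the biggest model that fits the budget, or None if none fits."""
    fitting = [(size, name) for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(largest_model_within(500))
```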

Transcribe Audio

POST /transcribe
Content-Type: multipart/form-data

Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| file | File | Yes* | Audio file (MP3, WAV, FLAC, etc.) |
| url | String | Yes* | URL to audio file |
| audio_base64 | String | Yes* | Base64 encoded audio |
| language | String | No | Language code (default: pt) |
| model | String | No | Model to use (default: small) |

*One of file, url, or audio_base64 is required.
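
The source rule above can be sketched as a small helper that builds the text fields of the form. This is an illustration under assumptions: `build_form_fields` is a hypothetical name, it treats supplying more than one source as an error (the documentation only says one is required), and it assumes `url` and `audio_base64` are sent as ordinary form fields.

```python
import base64
from typing import Optional


def build_form_fields(
    *,
    file_path: Optional[str] = None,
    url: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
    language: str = "pt",
    model: str = "small",
) -> dict:
    """Validate the audio source and return the text fields for POST /transcribe."""
    sources = {"file": file_path, "url": url, "audio_base64": audio_bytes}
    provided = [name for name, value in sources.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(f"expected exactly one audio source, got {provided or 'none'}")

    fields = {"language": language, "model": model}
    if url is not None:
        fields["url"] = url
    elif audio_bytes is not None:
        fields["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    # A file upload travels as its own multipart part, so it is not added here.
    return fields
```

With `requests`, the returned dictionary would be passed as `data=fields`, and a file source would additionally be passed as `files={"file": open_file}`.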

Streaming Transcription

WebSocket /ws/transcribe

Real-time transcription via WebSocket for live audio.
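
The message protocol of this endpoint is not documented here, so the following is only a sketch under explicit assumptions: that the server accepts raw audio as binary frames and replies with text messages. It uses the third-party `websockets` package. `chunk_audio` and `stream_audio` are hypothetical helpers, and the chunk size and timeout are arbitrary.

```python
import asyncio
from typing import Iterator, List


def chunk_audio(data: bytes, chunk_size: int = 32_000) -> Iterator[bytes]:
    """Split audio bytes into fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


async def stream_audio(data: bytes, uri: str = "ws://localhost:8010/ws/transcribe") -> List[str]:
    """Send audio chunk by chunk and collect whatever the server sends back."""
    import websockets  # third-party: pip install websockets

    messages: List[str] = []
    async with websockets.connect(uri) as ws:
        for chunk in chunk_audio(data):
            await ws.send(chunk)  # ASSUMPTION: server accepts binary audio frames
            try:
                # ASSUMPTION: partial results may arrive after any chunk.
                reply = await asyncio.wait_for(ws.recv(), timeout=0.5)
                messages.append(str(reply))
            except asyncio.TimeoutError:
                continue  # No partial result yet; keep sending.
    return messages
```

A caller would run it with `asyncio.run(stream_audio(audio_bytes))`. The real framing, audio format, and end-of-stream signal need to be confirmed against the service before relying on this.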

📝 Example Usage

Python

import requests

# Upload an audio file as multipart/form-data
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "http://localhost:8010/transcribe",
        files={"file": f},
        data={"language": "pt"},
    )

response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
print(result["text"])       # Transcribed text
print(result["confidence"]) # Confidence score
print(result["duration"])   # Audio duration in seconds

cURL

curl -X POST http://localhost:8010/transcribe \
    -F "file=@recording.mp3" \
    -F "language=pt"

JavaScript

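// audioBlob is a Blob holding the recording (for example from MediaRecorder)
const formData = new FormData();
formData.append('file', audioBlob, 'audio.webm');
formData.append('language', 'pt');

const response = await fetch('http://localhost:8010/transcribe', {
    method: 'POST',
    body: formData
});

if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status}`);
}

const result = await response.json();
console.log(result.text);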

📊 Response Format

{
    "text": "Olá, este é um teste de transcrição.",
    "language": "pt",
    "confidence": 0.95,
    "duration": 3.5,
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": "Olá,",
            "confidence": 0.98
        },
        {
            "start": 1.2,
            "end": 3.5,
            "text": "este é um teste de transcrição.",
            "confidence": 0.92
        }
    ],
    "processing_time": 1.8
}
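
One way to consume the `segments` array is to turn it into SubRip (SRT) subtitles. The sketch below relies only on the `start`, `end`, and `text` fields shown in the response above. `format_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result: dict) -> str:
    """Render the response's segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = {
        "segments": [
            {"start": 0.0, "end": 1.2, "text": "Olá,"},
            {"start": 1.2, "end": 3.5, "text": "este é um teste de transcrição."},
        ]
    }
    print(segments_to_srt(sample))
```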

🔧 Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| STT_PORT | 8010 | Service port |
| STT_MODEL | small | Default Whisper model |
| STT_DEVICE | cpu | Device (cpu/cuda) |
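
A wrapper script or client could read these variables with the same defaults, as in the sketch below. This is not the service's own loader. `SttConfig` and `load_config` are hypothetical names, and rejecting devices other than `cpu` and `cuda` is an assumption based on the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SttConfig:
    port: int = 8010
    model: str = "small"
    device: str = "cpu"


def load_config(env: Optional[Mapping[str, str]] = None) -> SttConfig:
    """Read STT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    device = env.get("STT_DEVICE", "cpu")
    if device not in ("cpu", "cuda"):
        raise ValueError(f"unsupported STT_DEVICE: {device!r}")
    return SttConfig(
        port=int(env.get("STT_PORT", "8010")),
        model=env.get("STT_MODEL", "small"),
        device=device,
    )


if __name__ == "__main__":
    print(load_config())
```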

🆚 Comparison with Cloud Services

| Service | Cost | Privacy | Latency |
|---------|------|---------|---------|
| CSuite STT | FREE | ✅ Local | ~2s/min |
| Deepgram | $0.0125/min | ❌ Cloud | ~1s/min |
| Google Speech | $0.024/min | ❌ Cloud | ~1s/min |
| Azure Speech | $0.016/min | ❌ Cloud | ~1s/min |
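
To put the per-minute prices in perspective, the sketch below multiplies them by a monthly audio volume. The prices are the ones listed in the comparison above and may be out of date. `monthly_cost` is a hypothetical helper, and the CSuite STT figure covers API fees only, not hardware or electricity.

```python
# Per-minute API prices in USD, copied from the comparison above.
PRICE_PER_MINUTE_USD = {
    "CSuite STT": 0.0,
    "Deepgram": 0.0125,
    "Google Speech": 0.024,
    "Azure Speech": 0.016,
}


def monthly_cost(minutes: float) -> dict:
    """Return the API cost per service for the given minutes of audio."""
    return {name: round(price * minutes, 2) for name, price in PRICE_PER_MINUTE_USD.items()}


if __name__ == "__main__":
    for service, cost in monthly_cost(10_000).items():
        print(f"{service}: ${cost:.2f}")
```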
