Speech-to-Text (STT) Service
Local Voice Transcription using Faster-Whisper
Overview
The CSuite STT Service provides high-quality speech-to-text transcription with OpenAI's Whisper models, served through Faster-Whisper and running entirely locally. There are no per-request API costs, and audio never leaves your machine.
🎯 Features
| Feature | Description |
|---|---|
| Multilingual | 99+ languages supported |
| High Accuracy | State-of-the-art Whisper models |
| Privacy | 100% local processing |
| Cost | FREE (no API costs) |
| Speed | ~2-4 seconds per minute of audio (varies with model and hardware) |
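
As a rough planning aid, the throughput figure above can be turned into a processing-time range for a given recording. This sketch simply scales the table's ~2-4 s per audio minute linearly. Actual speed depends on the chosen model, `STT_DEVICE`, and hardware, and the function name is hypothetical.

```python
def estimate_processing_seconds(audio_seconds: float,
                                low_s_per_min: float = 2.0,
                                high_s_per_min: float = 4.0) -> tuple[float, float]:
    """Return a (low, high) processing-time estimate in seconds.

    Linearly scales the ~2-4 s per minute of audio figure from the
    Features table; treat the result as a ballpark, not a guarantee.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = audio_seconds / 60.0
    return (minutes * low_s_per_min, minutes * high_s_per_min)


# A 30-minute recording: roughly 60-120 seconds of processing.
low, high = estimate_processing_seconds(30 * 60)
print(f"~{low:.0f}-{high:.0f} s")  # prints "~60-120 s"
```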
🔗 Endpoints
Health Check
GET /health
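
Loading a Whisper model can take a while at startup, so a client or deployment script may want to wait for the service before sending audio. A minimal sketch using only the standard library. It assumes an HTTP 200 from `/health` means "ready" and does not inspect the response body, whose format is not documented here. The function name is hypothetical.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8010",
                       timeout: float = 60.0,
                       interval: float = 1.0) -> bool:
    """Poll GET /health until it answers HTTP 200 or `timeout` seconds pass.

    Assumption: status 200 means the service is ready; the body is ignored.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError:
            # the service is not reachable or not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage:
# if not wait_until_healthy(timeout=120):
#     raise SystemExit("STT service did not become healthy")
```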
List Models
GET /models
Returns available Whisper models:
- tiny - 75MB, fastest
- base - 145MB, good balance
- small - 484MB, very accurate (default)
- medium - 1.5GB, excellent
- large-v3 - 3GB, best accuracy
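
Because the models trade size for accuracy, a deployment script might pick the largest model that fits a storage budget and pass its name as the `model` parameter. The helper below is a hypothetical illustration that uses the approximate sizes listed above. On-disk size is only a rough proxy for runtime memory use.

```python
# Approximate download sizes in MB, taken from the list above,
# ordered from fastest/smallest to most accurate/largest.
MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 145,
    "small": 484,
    "medium": 1500,
    "large-v3": 3000,
}


def largest_model_within(budget_mb: float) -> str:
    """Return the biggest listed model whose size fits in `budget_mb`."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        raise ValueError(f"No model fits in {budget_mb} MB; the smallest is 'tiny'.")
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1]


print(largest_model_within(1000))  # prints "small"
```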
Transcribe Audio
POST /transcribe
Content-Type: multipart/form-data
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| file | File | Yes* | Audio file (MP3, WAV, FLAC, etc.) |
| url | String | Yes* | URL of an audio file |
| audio_base64 | String | Yes* | Base64-encoded audio |
| language | String | No | Language code (default: pt) |
| model | String | No | Model to use (default: small) |
*One of file, url, or audio_base64 is required.
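
For the two non-upload inputs, the request carries plain form fields instead of a file part. This sketch builds those fields. Requiring exactly one of `url` or raw bytes is an assumption based on the footnote, and so is the use of standard (RFC 4648) base64 for `audio_base64`. The function name is hypothetical.

```python
import base64
from typing import Optional


def build_form_fields(*, url: Optional[str] = None,
                      audio_bytes: Optional[bytes] = None,
                      language: str = "pt",
                      model: str = "small") -> dict:
    """Build form fields for POST /transcribe when not uploading a file.

    Pass exactly one of `url` or `audio_bytes`; raw bytes are sent as the
    `audio_base64` field. Defaults mirror the documented defaults.
    """
    if (url is None) == (audio_bytes is None):
        raise ValueError("Pass exactly one of url or audio_bytes")
    fields = {"language": language, "model": model}
    if url is not None:
        fields["url"] = url
    else:
        # Standard base64 alphabet, as ASCII text (assumed encoding).
        fields["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    return fields


# Usage (needs the `requests` package and a running service). Passing
# (None, value) tuples via `files=` makes requests send plain multipart
# fields, matching the documented multipart/form-data content type:
# import requests
# fields = build_form_fields(url="https://example.com/meeting.mp3")
# resp = requests.post("http://localhost:8010/transcribe",
#                      files={k: (None, v) for k, v in fields.items()})
```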
Streaming Transcription
WebSocket /ws/transcribe
Real-time transcription via WebSocket for live audio.
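
The message protocol for `/ws/transcribe` is not documented here, so this sketch covers only the client-side step every live-audio client needs: splitting raw audio into fixed-duration frames. It assumes 16 kHz, mono, 16-bit PCM (Whisper models operate on 16 kHz audio) and 100 ms frames. Both the format the endpoint accepts and the frame size are assumptions, and the function name is hypothetical.

```python
from typing import Iterator


def pcm_frames(pcm: bytes,
               sample_rate: int = 16_000,
               sample_width: int = 2,
               channels: int = 1,
               frame_ms: int = 100) -> Iterator[bytes]:
    """Yield successive `frame_ms` chunks of raw PCM audio.

    At 16 kHz mono 16-bit: 16000 samples/s * 2 bytes * 0.1 s = 3200 bytes.
    The final frame may be shorter.
    """
    bytes_per_frame = sample_rate * sample_width * channels * frame_ms // 1000
    # Keep frames aligned to whole samples across all channels.
    bytes_per_frame -= bytes_per_frame % (sample_width * channels)
    if bytes_per_frame <= 0:
        raise ValueError("frame_ms too small for this audio format")
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]
```

Each frame could then be sent as a binary message (for example with `await ws.send(frame)` in the `websockets` package), but how the service expects audio to be framed and how it returns partial results is not specified in this README.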
📝 Example Usage
Python
```python
import requests

# Upload an audio file for transcription (Portuguese)
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "http://localhost:8010/transcribe",
        files={"file": f},
        data={"language": "pt"},
    )
response.raise_for_status()  # Fail loudly on HTTP errors

result = response.json()
print(result["text"])        # Transcribed text
print(result["confidence"])  # Confidence score
print(result["duration"])    # Audio duration in seconds
```
cURL
```bash
curl -X POST http://localhost:8010/transcribe \
  -F "file=@recording.mp3" \
  -F "language=pt"
```
JavaScript
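```javascript
// `audioBlob` is a Blob of recorded audio (e.g. from MediaRecorder).
// Run inside an async function or an ES module (top-level await).
const formData = new FormData();
formData.append('file', audioBlob, 'audio.webm');
formData.append('language', 'pt');

const response = await fetch('http://localhost:8010/transcribe', {
  method: 'POST',
  body: formData
});
if (!response.ok) throw new Error(`Transcription failed: HTTP ${response.status}`);

const result = await response.json();
console.log(result.text);
```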
📊 Response Format
```json
{
  "text": "Olá, este é um teste de transcrição.",
  "language": "pt",
  "confidence": 0.95,
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.2,
      "text": "Olá,",
      "confidence": 0.98
    },
    {
      "start": 1.2,
      "end": 3.5,
      "text": "este é um teste de transcrição.",
      "confidence": 0.92
    }
  ],
  "processing_time": 1.8
}
```
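
The `segments` array carries start and end timestamps in seconds (consistent with `duration` matching the last segment's `end`), which maps directly onto the SRT subtitle format. A minimal converter, assuming the response shape shown above. The function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response: dict) -> str:
    """Convert the `segments` of a /transcribe response into SRT text."""
    blocks = []
    for index, seg in enumerate(response["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Usage: srt_text = segments_to_srt(response.json())
```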
🔧 Configuration
| Variable | Default | Description |
|---|---|---|
| STT_PORT | 8010 | Service port |
| STT_MODEL | small | Default Whisper model |
| STT_DEVICE | cpu | Device (cpu/cuda) |
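
A hypothetical sketch of how these variables could be read and validated with the documented defaults. It is not the service's actual startup code, and restricting `STT_MODEL` to the five listed models is an assumption (the service may accept other Whisper checkpoints).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Model names from the "List Models" section; devices from the table above.
KNOWN_MODELS = {"tiny", "base", "small", "medium", "large-v3"}
KNOWN_DEVICES = {"cpu", "cuda"}


@dataclass(frozen=True)
class STTConfig:
    port: int
    model: str
    device: str


def load_config(env: Optional[Mapping[str, str]] = None) -> STTConfig:
    """Read STT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    config = STTConfig(
        port=int(env.get("STT_PORT", "8010")),
        model=env.get("STT_MODEL", "small"),
        device=env.get("STT_DEVICE", "cpu").lower(),
    )
    if config.model not in KNOWN_MODELS:
        raise ValueError(f"Unknown STT_MODEL {config.model!r}")
    if config.device not in KNOWN_DEVICES:
        raise ValueError(f"STT_DEVICE must be 'cpu' or 'cuda', got {config.device!r}")
    return config
```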
🆚 Comparison with Cloud Services
| Service | Cost | Privacy | Speed (per min of audio) |
|---|---|---|---|
| CSuite STT | FREE | ✅ Local | ~2-4s/min |
| Deepgram | $0.0125/min | ❌ Cloud | ~1s/min |
| Google Speech | $0.024/min | ❌ Cloud | ~1s/min |
| Azure Speech | $0.016/min | ❌ Cloud | ~1s/min |
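
The per-minute rates translate into a simple volume estimate. For example, 10,000 minutes of audio (about 167 hours) at the listed rates comes to $125 on Deepgram, $240 on Google Speech, and $160 on Azure Speech. The sketch below reproduces that arithmetic using the prices from the table as given; it does not model local hardware or electricity costs, and the function name is hypothetical.

```python
# Per-minute prices in USD, as listed in the comparison table.
PRICE_PER_MIN_USD = {
    "CSuite STT": 0.0,
    "Deepgram": 0.0125,
    "Google Speech": 0.024,
    "Azure Speech": 0.016,
}


def cost_for_minutes(minutes: float) -> dict:
    """Cost in USD for `minutes` of audio at each listed rate, rounded to cents."""
    return {name: round(rate * minutes, 2) for name, rate in PRICE_PER_MIN_USD.items()}


for name, cost in cost_for_minutes(10_000).items():
    print(f"{name:<14} ${cost:,.2f}")
```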
🔗 Related Services
- TTS Service - Convert text to speech
- OCR Service - Extract text from documents
- ARIA Gateway - Voice-enabled AI assistant