Speech-to-Text (STT) Service

Local Voice Transcription using Faster-Whisper

Overview

The CSuite STT Service provides speech-to-text transcription with OpenAI's Whisper models, run locally through Faster-Whisper. Audio never leaves the machine, and there are no per-minute API charges.

🎯 Features

| Feature | Description |
|---------|-------------|
| Multilingual | 99+ languages supported |
| High Accuracy | State-of-the-art Whisper models |
| Privacy | 100% local processing |
| Cost | FREE (no API costs) |
| Speed | ~2-4 seconds per minute of audio |

🔗 Endpoints

Health Check

GET /health
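
The response body of the health endpoint is not documented here, so a client can rely on the HTTP status alone. The following is a minimal sketch, assuming that a 200 status means the service is ready; `check_health`, `wait_until_healthy`, and their parameters are hypothetical names, not part of the service.

```python
import time
import urllib.error
import urllib.request


def check_health(base_url, timeout=5.0):
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(base_url, attempts=10, delay=1.0, probe=check_health):
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

Calling `wait_until_healthy("http://localhost:8010")` before the first transcription request avoids a failed upload while the model is still loading.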

List Models

GET /models

Returns available Whisper models:
- tiny - 75MB, fastest
- base - 145MB, good balance
- small - 484MB, very accurate (default)
- medium - 1.5GB, excellent
- large-v3 - 3GB, best accuracy
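
When disk space or download size is the constraint, the model choice can be automated. The sketch below picks the largest model from the list above that fits a size budget. The sizes are the figures listed above; `pick_model` is a hypothetical helper, and runtime memory use may differ from the download size.

```python
# Approximate model sizes in MB, copied from the model list above.
MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 145,
    "small": 484,
    "medium": 1500,
    "large-v3": 3000,
}


def pick_model(budget_mb):
    """Return the largest listed model whose size fits within budget_mb."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        raise ValueError(f"no model fits in {budget_mb} MB")
    return max(fitting, key=MODEL_SIZES_MB.get)


print(pick_model(500))
```

The returned name can be passed as the `model` parameter of a transcription request.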

Transcribe Audio

POST /transcribe
Content-Type: multipart/form-data

Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| file | File | Yes* | Audio file (MP3, WAV, FLAC, etc.) |
| url | String | Yes* | URL to audio file |
| audio_base64 | String | Yes* | Base64 encoded audio |
| language | String | No | Language code (default: pt) |
| model | String | No | Model to use (default: small) |

*One of file, url, or audio_base64 is required.
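
The "one of" rule can be checked on the client before any data is sent. The sketch below is one way to do that, assuming that exactly one source should be supplied and that `audio_base64` expects standard base64 without a data-URI prefix; neither detail is stated above, and `build_transcribe_fields` is a hypothetical helper.

```python
import base64


def build_transcribe_fields(file_path=None, url=None, audio_bytes=None,
                            language="pt", model="small"):
    """Validate the audio source and return (form_fields, file_path_to_upload)."""
    sources = {"file": file_path, "url": url, "audio_base64": audio_bytes}
    given = [name for name, value in sources.items() if value is not None]
    if len(given) != 1:
        raise ValueError(
            f"expected exactly one of file, url, audio_base64; got {given or 'none'}"
        )

    # Defaults mirror the parameter table: language=pt, model=small.
    fields = {"language": language, "model": model}
    if url is not None:
        fields["url"] = url
    elif audio_bytes is not None:
        # Assumed encoding: plain base64, ASCII text.
        fields["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    return fields, file_path


fields, upload_path = build_transcribe_fields(url="https://example.com/audio.mp3")
print(fields, upload_path)
```

With `requests`, the returned fields go in `data=` next to a `files={"file": ...}` upload. When there is no file, passing each field as `files={name: (None, value)}` keeps the request encoded as multipart/form-data.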

Streaming Transcription

WebSocket /ws/transcribe

Real-time transcription via WebSocket for live audio.

📝 Example Usage

Python

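import requests

# Upload an audio file as multipart/form-data
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "http://localhost:8010/transcribe",
        files={"file": f},
        data={"language": "pt"},
        timeout=300,  # long recordings can take a while to process
    )

response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = response.json()
print(result["text"])       # Transcribed text
print(result["confidence"]) # Confidence score
print(result["duration"])   # Audio duration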

cURL

curl -X POST http://localhost:8010/transcribe \
    -F "file=@recording.mp3" \
    -F "language=pt"

JavaScript

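const formData = new FormData();
// audioBlob is a Blob of recorded audio, e.g. from MediaRecorder
formData.append('file', audioBlob, 'audio.webm');
formData.append('language', 'pt');

const response = await fetch('http://localhost:8010/transcribe', {
    method: 'POST',
    body: formData
});

if (!response.ok) {
    throw new Error(`Transcription failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.text);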

📊 Response Format

{
    "text": "Olá, este é um teste de transcrição.",
    "language": "pt",
    "confidence": 0.95,
    "duration": 3.5,
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": "Olá,",
            "confidence": 0.98
        },
        {
            "start": 1.2,
            "end": 3.5,
            "text": "este é um teste de transcrição.",
            "confidence": 0.92
        }
    ],
    "processing_time": 1.8
}
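
The per-segment timestamps make it straightforward to turn a response into subtitles. The sketch below converts the `segments` array into SRT text. It assumes `start` and `end` are in seconds, which matches the sample above (the last segment ends at 3.5 and `duration` is 3.5); `format_timestamp` and `segments_to_srt` are hypothetical helpers.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render a list of {"start", "end", "text"} dicts as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Segments copied from the sample response above.
sample_segments = [
    {"start": 0.0, "end": 1.2, "text": "Olá,", "confidence": 0.98},
    {"start": 1.2, "end": 3.5, "text": "este é um teste de transcrição.", "confidence": 0.92},
]
print(segments_to_srt(sample_segments))
```

The same loop is a natural place to filter on `confidence`, for example to flag segments below a chosen threshold for manual review.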

🔧 Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `STT_PORT` | 8010 | Service port |
| `STT_MODEL` | small | Default Whisper model |
| `STT_DEVICE` | cpu | Device (cpu/cuda) |
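
A launcher script or client can read the same variables to stay in sync with the service. The sketch below applies the defaults from the table and rejects obviously invalid values. How the service itself parses these variables is not shown here, so `load_stt_config` and its validation rules are assumptions.

```python
import os

# Device values taken from the configuration table above.
VALID_DEVICES = {"cpu", "cuda"}


def load_stt_config(env=None):
    """Read STT_* settings from env (os.environ by default), using the documented defaults."""
    env = os.environ if env is None else env
    port = int(env.get("STT_PORT", "8010"))
    model = env.get("STT_MODEL", "small")
    device = env.get("STT_DEVICE", "cpu").lower()
    if not 1 <= port <= 65535:
        raise ValueError(f"STT_PORT out of range: {port}")
    if device not in VALID_DEVICES:
        raise ValueError(f"STT_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    return {"port": port, "model": model, "device": device}


config = load_stt_config({"STT_DEVICE": "cuda"})
print(f"http://localhost:{config['port']}", config["model"], config["device"])
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.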

🆚 Comparison with Cloud Services

| Service | Cost | Privacy | Latency |
|---------|------|---------|---------|
| CSuite STT | FREE | ✅ Local | ~2-4s/min |
| Deepgram | $0.0125/min | ❌ Cloud | ~1s/min |
| Google Speech | $0.024/min | ❌ Cloud | ~1s/min |
| Azure Speech | $0.016/min | ❌ Cloud | ~1s/min |
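
To put the per-minute prices in perspective, the sketch below computes the monthly API charge for a given audio volume. The prices are the ones listed in the table above and may be out of date. The zero for CSuite STT covers API charges only, not hardware or electricity, and `monthly_cost` is a hypothetical helper.

```python
# Per-minute prices in USD, copied from the comparison table above.
PRICE_PER_MINUTE = {
    "CSuite STT": 0.0,
    "Deepgram": 0.0125,
    "Google Speech": 0.024,
    "Azure Speech": 0.016,
}


def monthly_cost(minutes_per_month):
    """Return {service: API cost in USD} for the given monthly audio volume."""
    return {
        service: round(price * minutes_per_month, 2)
        for service, price in PRICE_PER_MINUTE.items()
    }


for service, cost in monthly_cost(1000).items():
    print(f"{service}: ${cost:.2f}")
```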

🔗 Related Services

- TTS Service - Convert text to speech
- OCR Service - Extract text from documents
- ARIA Gateway - Voice-enabled AI assistant
