Document OCR Service
Document Text Recognition using docTR
Overview
The CSuite OCR Service provides high-accuracy text extraction from documents and images using the docTR library. It is designed for Brazilian Portuguese documents such as Notas Fiscais, contracts, and other business documents.
🎯 Features
| Feature | Description |
|---|---|
| Layout-Aware | Preserves document structure |
| Table Detection | Recognizes tables and cells |
| High Accuracy | 80%+ confidence on typical documents |
| Portuguese Support | Optimized for Brazilian documents |
| Multiple Formats | PNG, JPG, TIFF, PDF support |
| Cost | FREE (no API costs) |
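A client can reject unsupported files before uploading them. The sketch below assumes the formats listed in the table map to the usual file extensions; the extension set and the `is_supported_document` helper are illustrative and not part of the service.

```python
from pathlib import Path

# ASSUMPTION: the documented formats (PNG, JPG, TIFF, PDF) map to these extensions.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}


def is_supported_document(path: str) -> bool:
    """Return True if the file extension matches a documented input format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_document("nota_fiscal.PNG"))  # True
print(is_supported_document("contrato.docx"))    # False
```

Checking the extension only filters obvious mistakes; the service itself remains the authority on what it can decode.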
📄 Supported Document Types
- Notas Fiscais (NF-e, NFS-e)
- Boletos (bank payment slips)
- Contracts and proposals
- Financial reports
- Receipts and proofs of payment
- Identity documents
- Invoices and bills
🔗 Endpoints
Health Check
GET /health
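A minimal readiness probe might look like the following. The response body of `/health` is not documented here, so this sketch only checks for an HTTP 200 status; the `is_healthy` helper is hypothetical, and the port is the default 8012 used in the examples.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as unhealthy.
        return False
```

Calling `is_healthy("http://localhost:8012")` before submitting documents avoids uploading large files to a service that is still loading its models.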
Full OCR (with layout)
POST /ocr/file
Content-Type: multipart/form-data
Returns complete document structure with blocks, lines, and word positions.
Simple OCR (text only)
POST /ocr/simple
Content-Type: multipart/form-data
Returns only the extracted text.
OCR from Base64
POST /ocr/base64
Content-Type: application/x-www-form-urlencoded
Process a base64-encoded image.
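The sketch below shows one way to build the form body for this endpoint using only the standard library. The form field name is not documented in this page, so `image` is a placeholder assumption, as are the `build_base64_form` and `post_base64` helpers.

```python
import base64
import urllib.parse
import urllib.request

# ASSUMPTION: the form field name is not documented; "image" is a placeholder.
FORM_FIELD = "image"


def build_base64_form(image_bytes: bytes, field: str = FORM_FIELD) -> bytes:
    """URL-encode a base64 image as an application/x-www-form-urlencoded body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return urllib.parse.urlencode({field: encoded}).encode("ascii")


def post_base64(image_bytes: bytes, base_url: str = "http://localhost:8012") -> bytes:
    """Send the form body to /ocr/base64 and return the raw response bytes."""
    request = urllib.request.Request(
        f"{base_url}/ocr/base64",
        data=build_base64_form(image_bytes),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.read()
```

Base64 output can contain `+`, `/`, and `=`, which have special meaning in form bodies, so the payload is percent-encoded with `urlencode` rather than concatenated by hand.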
📝 Example Usage
Python - Simple
import requests

# Extract text from document
with open("nota_fiscal.png", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/simple",
        files={"file": f}
    )

result = response.json()
print(result["text"])        # Full text
print(result["confidence"])  # Average confidence
print(result["word_count"])  # Number of words
Python - Full Layout
import requests

with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/file",
        files={"file": f}
    )

result = response.json()

# Iterate through pages
for page in result["pages"]:
    print(f"Page {page['page_number']}:")
    for block in page["blocks"]:
        for line in block["lines"]:
            print(f"  {line['text']} (conf: {line['confidence']:.0%})")
cURL
# Simple text extraction
curl -X POST http://localhost:8012/ocr/simple \
  -F "file=@document.png"

# Full layout extraction
curl -X POST http://localhost:8012/ocr/file \
  -F "file=@document.pdf" | jq
JavaScript
const formData = new FormData();
formData.append('file', documentFile);

const response = await fetch('http://localhost:8012/ocr/simple', {
  method: 'POST',
  body: formData
});

const { text, confidence, word_count } = await response.json();
console.log(`Extracted ${word_count} words with ${(confidence * 100).toFixed(0)}% confidence`);
📊 Response Formats
Simple Response (/ocr/simple)
{
  "text": "NOTA FISCAL ELETRÔNICA\nNúmero: NF-2026-12345\nData: 01/02/2026\n...",
  "confidence": 0.85,
  "word_count": 62
}
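A client may want to wrap this payload in a typed object and route low-confidence results to manual review. The sketch below assumes a review threshold of 0.80, borrowed from the "80%+" figure in the feature table; the `SimpleOcrResult` class and its `needs_review` method are illustrative names, not part of the service.

```python
from dataclasses import dataclass


@dataclass
class SimpleOcrResult:
    text: str
    confidence: float
    word_count: int

    @classmethod
    def from_json(cls, payload: dict) -> "SimpleOcrResult":
        """Build a result from the decoded /ocr/simple response."""
        return cls(
            text=payload["text"],
            confidence=float(payload["confidence"]),
            word_count=int(payload["word_count"]),
        )

    def needs_review(self, threshold: float = 0.80) -> bool:
        """Flag results whose average confidence falls below the threshold."""
        return self.confidence < threshold


sample = {
    "text": "NOTA FISCAL ELETRÔNICA\nNúmero: NF-2026-12345",
    "confidence": 0.85,
    "word_count": 62,
}
result = SimpleOcrResult.from_json(sample)
print(result.needs_review())  # False, since 0.85 is not below 0.80
```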
Full Response (/ocr/file)
{
  "pages": [
    {
      "page_number": 1,
      "blocks": [
        {
          "lines": [
            {
              "text": "NOTA FISCAL ELETRÔNICA",
              "words": [
                {
                  "text": "NOTA",
                  "confidence": 0.98,
                  "bbox": [0.1, 0.05, 0.2, 0.08]
                },
                {
                  "text": "FISCAL",
                  "confidence": 0.97,
                  "bbox": [0.21, 0.05, 0.35, 0.08]
                }
              ],
              "confidence": 0.95
            }
          ],
          "block_type": "text"
        }
      ],
      "full_text": "NOTA FISCAL ELETRÔNICA..."
    }
  ],
  "full_text": "Complete document text...",
  "word_count": 150,
  "confidence": 0.85,
  "processing_time": 2.5
}
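The word-level fields can be used to highlight uncertain words on the original image. The sketch below assumes `bbox` holds normalized `[x_min, y_min, x_max, y_max]` values between 0 and 1, which matches the sample values but is not stated explicitly; both helper functions are hypothetical.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale an assumed normalized [x_min, y_min, x_max, y_max] box to integer pixels."""
    x_min, y_min, x_max, y_max = bbox
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


def low_confidence_words(result, threshold=0.90):
    """Yield (page_number, text, confidence, bbox) for words below the threshold."""
    for page in result["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        yield page["page_number"], word["text"], word["confidence"], word["bbox"]


# Trimmed copy of the sample response above.
sample = {
    "pages": [{
        "page_number": 1,
        "blocks": [{
            "lines": [{
                "text": "NOTA FISCAL ELETRÔNICA",
                "words": [
                    {"text": "NOTA", "confidence": 0.98, "bbox": [0.1, 0.05, 0.2, 0.08]},
                    {"text": "FISCAL", "confidence": 0.97, "bbox": [0.21, 0.05, 0.35, 0.08]},
                ],
            }],
        }],
    }],
}

print(bbox_to_pixels([0.1, 0.05, 0.2, 0.08], 1000, 2000))  # (100, 100, 200, 160)
for page_number, text, confidence, bbox in low_confidence_words(sample, threshold=0.975):
    print(page_number, text, confidence)  # 1 FISCAL 0.97
```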
🔧 Configuration
| Variable | Default | Description |
|---|---|---|
| OCR_PORT | 8012 | Service port |
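Client scripts that run alongside the service can read the same variable to stay in sync with a non-default port. This is a sketch under that assumption; the `ocr_base_url` helper and the `localhost` default are illustrative.

```python
import os


def ocr_base_url(host: str = "localhost") -> str:
    """Build the service URL from OCR_PORT, falling back to the default 8012."""
    port = int(os.environ.get("OCR_PORT", "8012"))
    return f"http://{host}:{port}"
```

With `OCR_PORT` unset, `ocr_base_url()` yields `http://localhost:8012`, the address used throughout the examples.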
📐 Document Preprocessing Tips
For best results:
- Resolution: Use 300 DPI or higher
- Contrast: Ensure good text/background contrast
- Alignment: Rotate skewed documents
- Quality: Avoid blurry or compressed images
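The resolution tip can be checked with simple arithmetic when the physical page size is known: effective DPI is the pixel width divided by the page width in inches. The helpers below are a hypothetical sketch; the A4 width of 8.27 inches (210 mm) is a standard value, and the pixel widths are made-up examples.

```python
A4_WIDTH_IN = 8.27  # 210 mm expressed in inches


def effective_dpi(width_px: int, page_width_in: float) -> float:
    """Effective scan resolution: pixels across divided by inches across."""
    return width_px / page_width_in


def meets_resolution(width_px: int, page_width_in: float, min_dpi: float = 300.0) -> bool:
    """Return True if the scan reaches the recommended minimum DPI."""
    return effective_dpi(width_px, page_width_in) >= min_dpi


print(meets_resolution(2550, 8.5))          # True: 2550 px over 8.5 in is 300 DPI
print(meets_resolution(1240, A4_WIDTH_IN))  # False: roughly 150 DPI
```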
🆚 Comparison with Cloud Services
| Service | Cost | Accuracy | Privacy |
|---|---|---|---|
| CSuite OCR | FREE | 80-90% | ✅ Local |
| Google Vision | $1.50/1K pages | 95%+ | ❌ Cloud |
| AWS Textract | $1.50/1K pages | 95%+ | ❌ Cloud |
| Azure Vision | $1.00/1K pages | 90%+ | ❌ Cloud |
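To put the per-page prices in context, the following sketch computes an API bill for a given volume. It uses the prices from the table as-is, which may be out of date, and it ignores free tiers as well as the hardware and operating cost of running the local service; the 50,000-page volume is an arbitrary example.

```python
# Prices per 1,000 pages, copied from the comparison table.
PRICES_PER_1K = {
    "CSuite OCR": 0.00,
    "Google Vision": 1.50,
    "AWS Textract": 1.50,
    "Azure Vision": 1.00,
}


def api_cost(pages: int, price_per_1k: float) -> float:
    """API cost in USD for a page volume at a flat per-1,000-pages price."""
    return pages / 1000 * price_per_1k


for service, price in PRICES_PER_1K.items():
    print(f"{service}: ${api_cost(50_000, price):.2f}")
```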
🔧 Model Architecture
The service uses docTR with:
- Detection: db_resnet50 - Text block detection
- Recognition: crnn_vgg16_bn - Character recognition
- Model Size: ~165MB total
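For reference, these two architectures can be loaded directly through docTR's public API. How the service wires the predictor internally is not shown in this page, so the `build_predictor` and `recognize` helpers below are only a sketch; they require the `python-doctr` package with a deep-learning backend installed.

```python
DET_ARCH = "db_resnet50"     # text detection model listed above
RECO_ARCH = "crnn_vgg16_bn"  # text recognition model listed above


def build_predictor():
    """Create a docTR predictor with the detection and recognition models above."""
    # Imported lazily so this module still loads where docTR is not installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=DET_ARCH, reco_arch=RECO_ARCH, pretrained=True)


def recognize(image_path: str) -> dict:
    """Run OCR on one image and return docTR's nested export dictionary."""
    from doctr.io import DocumentFile

    document = DocumentFile.from_images(image_path)
    return build_predictor()(document).export()
```

The first call with `pretrained=True` downloads the model weights, so expect a delay before the first result.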
📈 Performance
| Document Type | Processing Time | Accuracy |
|---|---|---|
| Simple text | ~1-2s | 90%+ |
| Tables | ~2-3s | 80%+ |
| Mixed layout | ~3-4s | 85%+ |
| Multi-page PDF | ~5-10s | 85%+ |
🔗 Related Services
- STT Service - Convert speech to text
- TTS Service - Convert text to speech
- ARIA Gateway - Voice-enabled AI assistant