Document OCR Service

Document Text Recognition using docTR

Overview

The CSuite OCR Service provides high-accuracy text extraction from documents and images using the docTR library. It is designed for Portuguese-language documents such as Notas Fiscais (Brazilian tax invoices), contracts, and other business documents.

🎯 Features

Feature	Description
Layout-Aware	Preserves document structure
Table Detection	Recognizes tables and cells
High Accuracy	80%+ confidence on typical documents
Portuguese Support	Optimized for Brazilian documents
Multiple Formats	PNG, JPG, TIFF, PDF support
Cost	FREE (no API costs)

📄 Supported Document Types

Notas Fiscais (NF-e, NFS-e)
Boletos (bank payment slips)
Contracts and proposals
Financial reports
Receipts and payment vouchers
Identity documents
Invoices and bills

🔗 Endpoints

Health Check

GET /health
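
A quick reachability check before sending documents might look like the sketch below. It uses only the Python standard library and assumes the service listens on `http://localhost:8012` (the address used in the examples on this page) and answers with HTTP 200 when healthy. The response body is not documented here, so the sketch ignores it. The helper name `is_healthy` is hypothetical.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8012", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False
```

Calling `is_healthy()` at client startup gives a simple yes/no answer without raising on an unreachable service.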

Full OCR (with layout)

POST /ocr/file
Content-Type: multipart/form-data

Returns the complete document structure: pages, blocks, lines, and per-word positions and confidence scores.

Simple OCR (text only)

POST /ocr/simple
Content-Type: multipart/form-data

Returns only the extracted text.

OCR from Base64

POST /ocr/base64
Content-Type: application/x-www-form-urlencoded

Processes a base64-encoded image.
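
The request for this endpoint could be assembled roughly as follows. This is a sketch under stated assumptions: the name of the form field is not given on this page, so `image` is a hypothetical placeholder, and the helper name `build_base64_request` is also made up for illustration. The body is URL-encoded because base64 output can contain `+`, `/`, and `=`, which are not safe in a form-encoded payload.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical form field name; check the service's API schema for the real one.
FIELD_NAME = "image"


def build_base64_request(image_bytes: bytes,
                         base_url: str = "http://localhost:8012") -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST for /ocr/base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # urlencode percent-escapes the base64 characters that are unsafe in a form body.
    body = urllib.parse.urlencode({FIELD_NAME: encoded}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/ocr/base64",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, read the image bytes from disk and pass the returned request to `urllib.request.urlopen`.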

📝 Example Usage

Python - Simple

import requests

# Extract text from document
with open("nota_fiscal.png", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/simple",
        files={"file": f}
    )

response.raise_for_status()  # Fail fast on HTTP errors
result = response.json()
print(result["text"])       # Full text
print(result["confidence"]) # Average confidence
print(result["word_count"]) # Number of words

Python - Full Layout

import requests

with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/file",
        files={"file": f}
    )

response.raise_for_status()  # Fail fast on HTTP errors
result = response.json()

# Iterate through pages
for page in result["pages"]:
    print(f"Page {page['page_number']}:")
    for block in page["blocks"]:
        for line in block["lines"]:
            print(f"  {line['text']} (conf: {line['confidence']:.0%})")

cURL

# Simple text extraction
curl -X POST http://localhost:8012/ocr/simple \
    -F "file=@document.png"

# Full layout extraction
curl -X POST http://localhost:8012/ocr/file \
    -F "file=@document.pdf" | jq

JavaScript

// documentFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', documentFile);

const response = await fetch('http://localhost:8012/ocr/simple', {
    method: 'POST',
    body: formData
});

const { text, confidence, word_count } = await response.json();
console.log(`Extracted ${word_count} words with ${(confidence * 100).toFixed(0)}% confidence`);

📊 Response Formats

Simple Response (`/ocr/simple`)

{
    "text": "NOTA FISCAL ELETRÔNICA\nNúmero: NF-2026-12345\nData: 01/02/2026\n...",
    "confidence": 0.85,
    "word_count": 62
}
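
Because `text` is a plain newline-separated string, simple post-processing is often enough to pull out labelled values. The sketch below assumes lines of the form `Label: value`, as in the sample response above. The function name `extract_fields` is hypothetical and not part of the service.

```python
import re


def extract_fields(text: str) -> dict[str, str]:
    """Collect 'Label: value' pairs from OCR text, one pair per line."""
    fields = {}
    for line in text.splitlines():
        # The label is everything before the first colon; the value is the rest of the line.
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


sample = "NOTA FISCAL ELETRÔNICA\nNúmero: NF-2026-12345\nData: 01/02/2026"
print(extract_fields(sample))
# {'Número': 'NF-2026-12345', 'Data': '01/02/2026'}
```

Lines without a colon, such as the document title, are skipped.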

Full Response (`/ocr/file`)

{
    "pages": [
        {
            "page_number": 1,
            "blocks": [
                {
                    "lines": [
                        {
                            "text": "NOTA FISCAL ELETRÔNICA",
                            "words": [
                                {
                                    "text": "NOTA",
                                    "confidence": 0.98,
                                    "bbox": [0.1, 0.05, 0.2, 0.08]
                                },
                                {
                                    "text": "FISCAL",
                                    "confidence": 0.97,
                                    "bbox": [0.21, 0.05, 0.35, 0.08]
                                }
                            ],
                            "confidence": 0.95
                        }
                    ],
                    "block_type": "text"
                }
            ],
            "full_text": "NOTA FISCAL ELETRÔNICA..."
        }
    ],
    "full_text": "Complete document text...",
    "word_count": 150,
    "confidence": 0.85,
    "processing_time": 2.5
}
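
A response like this can be walked to flag words that may need manual review and to map boxes back onto the source image. The sketch below is illustrative only: it assumes `bbox` is `[x_min, y_min, x_max, y_max]` normalized to the 0..1 range, which matches the sample values but is not stated on this page, and both helper names are hypothetical.

```python
def low_confidence_words(result: dict, threshold: float = 0.9) -> list[dict]:
    """Return every word whose confidence is below `threshold`, tagged with its page."""
    flagged = []
    for page in result["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        flagged.append({"page": page["page_number"], **word})
    return flagged


def bbox_to_pixels(bbox: list[float], width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a bbox to pixel coordinates.

    Assumption: bbox is [x_min, y_min, x_max, y_max], normalized to 0..1.
    """
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))
```

For example, `low_confidence_words(result, threshold=0.9)` on a parsed `/ocr/file` response yields a flat list of word entries, each carrying its page number, text, confidence, and box.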

🔧 Configuration

Variable	Default	Description
`OCR_PORT`	8012	Service port
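
A client running in the same environment as the service could derive its base URL from the same variable. This is a minimal sketch that assumes the service is reachable on `localhost`. The helper name `ocr_base_url` is hypothetical.

```python
import os


def ocr_base_url(host: str = "localhost") -> str:
    """Build the service URL from OCR_PORT, falling back to the default 8012."""
    port = int(os.environ.get("OCR_PORT", "8012"))
    return f"http://{host}:{port}"
```

Reading the variable at call time, rather than at import time, lets the client pick up a changed port without a code change.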

📐 Document Preprocessing Tips

For best results:

Resolution: Use 300 DPI or higher
Contrast: Ensure good text/background contrast
Alignment: Rotate skewed documents
Quality: Avoid blurry or compressed images
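
The resolution tip can be checked with simple arithmetic when the physical page size is known. The sketch below estimates the effective DPI of a scan and the resize factor needed to reach 300 DPI. The function names are hypothetical, and upscaling does not recover detail that was never captured, so rescanning at a higher resolution is preferable where possible.

```python
TARGET_DPI = 300  # Recommended minimum from the tips above


def effective_dpi(pixel_width: int, physical_width_in: float) -> float:
    """Effective resolution, given the image width in pixels and the page width in inches."""
    return pixel_width / physical_width_in


def upscale_factor(dpi: float, target: float = TARGET_DPI) -> float:
    """Factor by which to resize an image to reach `target` DPI (never below 1.0)."""
    return max(1.0, target / dpi)
```

For example, a US Letter page (8.5 in wide) scanned to 1275 pixels across works out to 150 DPI, so it would need a 2x resize to meet the recommendation.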

🆚 Comparison with Cloud Services

Service	Cost	Accuracy	Privacy
CSuite OCR	FREE	80-90%	✅ Local
Google Vision	$1.50/1K pages	95%+	❌ Cloud
AWS Textract	$1.50/1K pages	95%+	❌ Cloud
Azure Vision	$1.00/1K pages	90%+	❌ Cloud

🔧 Model Architecture

The service uses docTR with:
- Detection: db_resnet50 - Text block detection
- Recognition: crnn_vgg16_bn - Character recognition
- Model Size: ~165MB total
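
For readers who want to reproduce this pairing directly with docTR, a minimal sketch follows. It assumes the `python-doctr` package is installed with one of its deep-learning backends, and it shows one way to combine the two architectures named above, not necessarily how the service configures them internally. The helper names `build_predictor` and `recognize` are hypothetical.

```python
DET_ARCH = "db_resnet50"     # Text detection model
RECO_ARCH = "crnn_vgg16_bn"  # Text recognition model


def build_predictor():
    """Create a docTR OCR predictor with the two architectures listed above.

    docTR is imported lazily because loading it, and downloading the
    pretrained weights on first use, is slow.
    """
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=DET_ARCH, reco_arch=RECO_ARCH, pretrained=True)


def recognize(image_path: str) -> str:
    """Run OCR on one image file and return the recognized text."""
    from doctr.io import DocumentFile

    doc = DocumentFile.from_images(image_path)
    result = build_predictor()(doc)
    return result.render()
```

In a long-running process the predictor would normally be built once and reused, since constructing it is far more expensive than a single inference call.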

📈 Performance

Document Type	Processing Time	Accuracy
Simple text	~1-2s	90%+
Tables	~2-3s	80%+
Mixed layout	~3-4s	85%+
Multi-page PDF	~5-10s	85%+

🔗 Related Services

STT Service - Convert speech to text
TTS Service - Convert text to speech
ARIA Gateway - Voice-enabled AI assistant
