Document OCR Service

Document Text Recognition using docTR

Overview

The CSuite OCR Service extracts text from documents and images using the docTR library. It is tuned for Portuguese-language documents such as Notas Fiscais (Brazilian invoices), contracts, and other business documents.

🎯 Features

| Feature | Description |
|---|---|
| Layout-Aware | Preserves document structure |
| Table Detection | Recognizes tables and cells |
| High Accuracy | 80%+ confidence on typical documents |
| Portuguese Support | Optimized for Brazilian documents |
| Multiple Formats | PNG, JPG, TIFF, PDF support |
| Cost | FREE (no API costs) |

📄 Supported Document Types

🔗 Endpoints

Health Check

GET /health
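
A minimal sketch of a readiness probe for this endpoint, using only the Python standard library. The source does not document the response body, so the check below only assumes that a healthy service answers with HTTP 200; the function name `is_healthy` and the default base URL are illustrative.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8012", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False
```

A caller could poll `is_healthy()` before submitting the first document, since the OCR models may take a while to load at startup.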

Full OCR (with layout)

POST /ocr/file
Content-Type: multipart/form-data

Returns complete document structure with blocks, lines, and word positions.

Simple OCR (text only)

POST /ocr/simple
Content-Type: multipart/form-data

Returns only the extracted text.

OCR from Base64

POST /ocr/base64
Content-Type: application/x-www-form-urlencoded

Processes a base64-encoded image.
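
The form field that carries the image is not named in this document, so the sketch below uses a hypothetical field called `image`; check the service's schema before relying on it. It shows how to build the `application/x-www-form-urlencoded` body so that the `+`, `/`, and `=` characters of the base64 alphabet are percent-encoded and survive transport.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_base64_form(image_bytes: bytes, field: str = "image") -> bytes:
    """Encode raw image bytes as a URL-encoded form body.

    `field` is a hypothetical parameter name, not confirmed by the service docs.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=' so the payload is not mangled.
    return urlencode({field: encoded}).encode("ascii")


def post_base64(image_bytes: bytes, base_url: str = "http://localhost:8012") -> bytes:
    """Send the form body to /ocr/base64 and return the raw response bytes."""
    request = urllib.request.Request(
        f"{base_url}/ocr/base64",
        data=build_base64_form(image_bytes),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.read()
```

Base64 inflates the payload by roughly a third, so the multipart endpoints are usually the better choice when the file is available on disk.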

📝 Example Usage

Python - Simple

import requests

# Extract text from document
with open("nota_fiscal.png", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/simple",
        files={"file": f}
    )

response.raise_for_status()  # Fail fast on HTTP errors
result = response.json()
print(result["text"])       # Full text
print(result["confidence"]) # Average confidence
print(result["word_count"]) # Number of words

Python - Full Layout

import requests

with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8012/ocr/file",
        files={"file": f}
    )

response.raise_for_status()  # Fail fast on HTTP errors
result = response.json()

# Iterate through pages
for page in result["pages"]:
    print(f"Page {page['page_number']}:")
    for block in page["blocks"]:
        for line in block["lines"]:
            print(f"  {line['text']} (conf: {line['confidence']:.0%})")

cURL

# Simple text extraction
curl -X POST http://localhost:8012/ocr/simple \
    -F "file=@document.png"

# Full layout extraction
curl -X POST http://localhost:8012/ocr/file \
    -F "file=@document.pdf" | jq

JavaScript

const formData = new FormData();
formData.append('file', documentFile);

const response = await fetch('http://localhost:8012/ocr/simple', {
    method: 'POST',
    body: formData
});

const { text, confidence, word_count } = await response.json();
console.log(`Extracted ${word_count} words with ${(confidence * 100).toFixed(0)}% confidence`);

📊 Response Formats

Simple Response (/ocr/simple)

{
    "text": "NOTA FISCAL ELETRÔNICA\nNúmero: NF-2026-12345\nData: 01/02/2026\n...",
    "confidence": 0.85,
    "word_count": 62
}

Full Response (/ocr/file)

{
    "pages": [
        {
            "page_number": 1,
            "blocks": [
                {
                    "lines": [
                        {
                            "text": "NOTA FISCAL ELETRÔNICA",
                            "words": [
                                {
                                    "text": "NOTA",
                                    "confidence": 0.98,
                                    "bbox": [0.1, 0.05, 0.2, 0.08]
                                },
                                {
                                    "text": "FISCAL",
                                    "confidence": 0.97,
                                    "bbox": [0.21, 0.05, 0.35, 0.08]
                                }
                            ],
                            "confidence": 0.95
                        }
                    ],
                    "block_type": "text"
                }
            ],
            "full_text": "NOTA FISCAL ELETRÔNICA..."
        }
    ],
    "full_text": "Complete document text...",
    "word_count": 150,
    "confidence": 0.85,
    "processing_time": 2.5
}
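
As an illustration of working with the nested structure above, the following sketch flattens the response into words, flags the ones below a confidence threshold, and converts a bounding box to pixel coordinates. It assumes `bbox` is `[x_min, y_min, x_max, y_max]` in coordinates normalized to the page size, which matches the sample values but is not stated by the service; all helper names are illustrative.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_words(result: dict) -> Iterator[dict]:
    """Yield every word in the response, tagged with its page number."""
    for page in result.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield {"page": page.get("page_number"), **word}


def low_confidence_words(result: dict, threshold: float = 0.8) -> List[dict]:
    """Return the words whose confidence is below `threshold`."""
    return [w for w in iter_words(result) if w["confidence"] < threshold]


def bbox_to_pixels(
    bbox: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale an assumed normalized [x_min, y_min, x_max, y_max] box to pixels."""
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))
```

Words returned by `low_confidence_words` are natural candidates for manual review, and the pixel boxes can be used to highlight them on the source image.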

🔧 Configuration

| Variable | Default | Description |
|---|---|---|
| OCR_PORT | 8012 | Service port |
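
A client can honor the same variable so that it follows the service when the port changes. This is a minimal sketch: the helper name `ocr_base_url` and the `host` parameter are illustrative, and only `OCR_PORT` and its default come from the table above.

```python
import os
from typing import Mapping, Optional


def ocr_base_url(host: str = "localhost",
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Build the service base URL from OCR_PORT, defaulting to 8012."""
    env = os.environ if env is None else env
    port = int(env.get("OCR_PORT", "8012"))  # Raises ValueError if not numeric
    return f"http://{host}:{port}"
```

Passing `env` explicitly keeps the function easy to test; in normal use, calling `ocr_base_url()` reads the process environment.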

📐 Document Preprocessing Tips

For best results:

  1. Resolution: Use 300 DPI or higher
  2. Contrast: Ensure good text/background contrast
  3. Alignment: Rotate skewed documents
  4. Quality: Avoid blurry or compressed images
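
The resolution tip can be checked before upload when the physical page size is known. The sketch below computes the effective DPI of a scan from its pixel dimensions; the helper names are illustrative, and the A4 size used in the usage note (210 mm × 297 mm) is an assumption about the input, not something the service requires.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, physical_mm: float) -> float:
    """Dots per inch along one axis, given pixel count and physical length."""
    return pixels / (physical_mm / MM_PER_INCH)


def meets_min_dpi(width_px: int, height_px: int,
                  width_mm: float, height_mm: float,
                  min_dpi: float = 300.0) -> bool:
    """True if both axes reach at least `min_dpi`."""
    return min(effective_dpi(width_px, width_mm),
               effective_dpi(height_px, height_mm)) >= min_dpi
```

For example, an A4 page scanned at 1240 × 1754 pixels works out to about 150 DPI, so `meets_min_dpi(1240, 1754, 210, 297)` is false and the page should be rescanned at a higher resolution.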

🆚 Comparison with Cloud Services

| Service | Cost | Accuracy | Privacy |
|---|---|---|---|
| CSuite OCR | FREE | 80-90% | ✅ Local |
| Google Vision | $1.50/1K pages | 95%+ | ❌ Cloud |
| AWS Textract | $1.50/1K pages | 95%+ | ❌ Cloud |
| Azure Vision | $1.00/1K pages | 90%+ | ❌ Cloud |

🔧 Model Architecture

The service uses docTR with:
- Detection: db_resnet50 - Text block detection
- Recognition: crnn_vgg16_bn - Character recognition
- Model Size: ~165MB total
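
For reference, this is roughly how the same pair of architectures can be loaded directly with docTR's public API. It is a sketch of standalone docTR usage, not the service's actual startup code, which this document does not show; `build_predictor` and `recognize` are illustrative names.

```python
# Architecture names taken from the list above.
DET_ARCH = "db_resnet50"
RECO_ARCH = "crnn_vgg16_bn"


def build_predictor():
    """Load a docTR OCR pipeline with pretrained weights."""
    # Imported lazily so this module can be loaded without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=DET_ARCH, reco_arch=RECO_ARCH, pretrained=True)


def recognize(image_path: str) -> dict:
    """Run OCR on one image file and return docTR's nested dict export."""
    from doctr.io import DocumentFile

    doc = DocumentFile.from_images(image_path)
    result = build_predictor()(doc)
    return result.export()
```

The pretrained weights are downloaded on first use, so the first call is slower than later ones. In a long-running process the predictor should be built once and reused rather than rebuilt per request as `recognize` does here for brevity.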

📈 Performance

| Document Type | Processing Time | Accuracy |
|---|---|---|
| Simple text | ~1-2 s | 90%+ |
| Tables | ~2-3 s | 80%+ |
| Mixed layout | ~3-4 s | 85%+ |
| Multi-page PDF | ~5-10 s | 85%+ |
