Building a Document AI Pipeline That Actually Scales

Processing 14 million documents per year for a 600-bed hospital network taught us that document AI is 20% model work and 80% engineering.

The pipeline

Four stages, each with its own failure modes:

Stage 1: Ingestion

Documents arrive as scanned PDFs, phone photos, faxes, and direct digital uploads. We normalize everything to high-resolution PNGs, applying deskewing, denoising, and adaptive contrast enhancement. This stage handles 40+ input formats.
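Routing 40+ formats starts with knowing what each file actually is, because a file's extension doesn't always match its contents. The following is a minimal sketch that assumes the router sniffs leading "magic" bytes before handing the file to a format-specific converter. `sniff_format`, `MAGIC_SIGNATURES`, and `HEIF_BRANDS` are hypothetical names, and only a few formats are shown:

```python
# Hypothetical sketch: identify a file's real format from its leading bytes.
# Only a few of the 40+ supported formats are shown.

MAGIC_SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF (fax images are often TIFF)
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

# ISO base media "brands" used by HEIF/HEIC phone photos.
HEIF_BRANDS = {b"heic", b"heix", b"mif1"}


def sniff_format(data: bytes) -> str:
    """Return a short format name, or "unknown" if nothing matches."""
    for signature, name in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return name
    # HEIF files start with a box: 4-byte size, b"ftyp", then the major brand.
    if data[4:8] == b"ftyp" and data[8:12] in HEIF_BRANDS:
        return "heic"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.7 ..."))  # pdf
```

Anything that comes back "unknown" is a natural candidate for a dead-letter queue rather than a best-guess conversion.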

Stage 2: Layout detection

Before OCR, we detect the document structure: where are the tables, the headers, and the handwritten versus printed regions? We use a LayoutLM-based model fine-tuned on 5,000 annotated medical forms. This step is what makes extraction reliable. Without it, OCR returns a jumble of text with no structure.
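One way to picture the layout stage's output is as a list of typed regions with bounding boxes. These regions are put into reading order and then routed to the right recognizer. This is a sketch under assumptions: the `Region` schema, the `row_tolerance` value, and the two-way routing rule are illustrative and do not reflect the model's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One detected layout region (illustrative schema, not LayoutLM's output)."""
    kind: str                       # e.g. "header", "table", "printed", "handwritten"
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def reading_order(regions: list[Region], row_tolerance: int = 20) -> list[Region]:
    """Sort regions top-to-bottom, then left-to-right within a row.

    A region whose top edge lies within `row_tolerance` pixels of the first
    region in the current row joins that row, so side-by-side fields
    (label, then value) keep their left-to-right order.
    """
    rows: list[list[Region]] = []
    for region in sorted(regions, key=lambda reg: reg.box[1]):
        if rows and region.box[1] - rows[-1][0].box[1] <= row_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])
    return [reg for row in rows for reg in sorted(row, key=lambda reg: reg.box[0])]


def route_regions(regions: list[Region]) -> dict[str, list[Region]]:
    """Send handwritten regions to the handwriting model and everything else to OCR."""
    routes: dict[str, list[Region]] = {"ocr": [], "handwriting": []}
    for region in reading_order(regions):
        target = "handwriting" if region.kind == "handwritten" else "ocr"
        routes[target].append(region)
    return routes
```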

Stage 3: Extraction

Extraction combines three components: OCR for printed text (Tesseract with custom post-processing), a separate handwriting recognition model for handwritten fields, and the Claude API for interpreting ambiguous or poorly scanned text. The LLM is the fallback, not the primary extractor. It handles the 15% of fields that rule-based extraction can't resolve.
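The fallback logic can be sketched as a gate that checks both confidence and format. The primary recognizer's value is accepted only if it is confident and matches the field's expected pattern. Otherwise, the crop goes to the LLM. The threshold, the `FIELD_PATTERNS` rules, and the recognizer signature are all hypothetical. In a real pipeline, `primary` would wrap Tesseract or the handwriting model, and `llm_fallback` would wrap a Claude API call. Here, both are plain callables.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    value: Optional[str]
    confidence: float  # recognizer's own score in [0, 1]
    source: str        # which component produced the value


# A recognizer takes a cropped field image and returns its best reading.
Recognizer = Callable[[bytes], Extraction]

# Hypothetical post-processing rules: expected shape of each field's value.
FIELD_PATTERNS = {
    "dob": re.compile(r"\d{4}-\d{2}-\d{2}"),  # ISO date
    "mrn": re.compile(r"[A-Z]{2}\d{6}"),      # made-up MRN format
}


def extract_field(field: str, crop: bytes, primary: Recognizer,
                  llm_fallback: Recognizer, threshold: float = 0.9) -> Extraction:
    """Use the primary recognizer when it is confident and its output has the
    expected shape; otherwise defer to the LLM fallback."""
    result = primary(crop)
    pattern = FIELD_PATTERNS.get(field)
    well_formed = result.value is not None and (
        pattern is None or pattern.fullmatch(result.value) is not None
    )
    if well_formed and result.confidence >= threshold:
        return result
    return llm_fallback(crop)
```

A typical OCR slip like reading "1984-O3-12" (a letter O for a zero) fails the date pattern even at high confidence, so it is sent to the fallback instead of being accepted silently. Because the gate sits outside the recognizers, the LLM's share of fields can be tuned by adjusting the threshold and rules alone.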

Stage 4: Validation + Integration

Every extracted field is validated: patient IDs against the hospital's master patient index, medication names against RxNorm, and dates for logical consistency. Validated data is written to the Epic EHR through the HL7 FHIR API. Anything that fails validation is queued for human review.
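Here is a minimal sketch of the validation gate. It assumes in-memory sets stand in for the master patient index and an RxNorm name lookup, and that records carry ISO-format `dob` and `admission_date` fields. All of these names are hypothetical. Records that pass would go on to the FHIR write, and anything else goes to the review queue with its reasons attached.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ValidationResult:
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_record(record: dict, master_index: set[str],
                    rxnorm_names: set[str], today: date) -> ValidationResult:
    """Check one extracted record, collecting every failure, not just the first."""
    result = ValidationResult()

    # Patient identity: the ID must exist in the master patient index.
    if record.get("patient_id") not in master_index:
        result.errors.append("patient_id not in master patient index")

    # Medications: case-insensitive exact match against known names
    # (a stand-in for a real RxNorm lookup).
    for med in record.get("medications", []):
        if med.strip().lower() not in rxnorm_names:
            result.errors.append(f"unrecognized medication: {med!r}")

    # Dates: must parse, and must be in a logically consistent order.
    try:
        dob = date.fromisoformat(record["dob"])
        admitted = date.fromisoformat(record["admission_date"])
    except (KeyError, TypeError, ValueError):
        result.errors.append("missing or malformed date")
        return result
    if dob > today:
        result.errors.append("dob is in the future")
    if admitted < dob:
        result.errors.append("admission_date precedes dob")
    if admitted > today:
        result.errors.append("admission_date is in the future")
    return result


def route_record(record: dict, result: ValidationResult,
                 ehr_outbox: list, review_queue: list) -> None:
    """Valid records go on to the EHR write; failures go to human review."""
    if result.ok:
        ehr_outbox.append(record)
    else:
        review_queue.append((record, result.errors))
```

The sketch collects all errors instead of stopping at the first one, so a reviewer sees every reason a record was held back.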

Scaling considerations

Queue-based architecture. Each stage is a separate service with its own queue. If extraction is slow, ingestion doesn't stop.
Horizontal scaling. We run 8 extraction workers in parallel during peak hours (morning admissions) and scale down to 2 at night.
Monitoring. Dashboards track per-stage latency, accuracy, and queue depth, and alerts fire when accuracy on any field drops below 98%.
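The decoupling can be sketched with in-process queues. Each stage's workers read from one queue and write to the next, so a slow stage only makes its inbox deeper. This is a stand-in under assumptions: production stages are separate services, whereas this sketch uses threads in one process. The queue-depth scaling rule in `desired_workers` and its `per_worker` parameter are also hypothetical. Only the 2 and 8 worker bounds come from the numbers above.

```python
import math
import queue
import threading
from typing import Any, Callable


def start_stage(name: str, work: Callable[[Any], Any], inbox: queue.Queue,
                outbox: queue.Queue, workers: int) -> list[threading.Thread]:
    """Start `workers` threads that take from `inbox`, process, and put to `outbox`."""
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is None:  # sentinel: this worker should exit
                return
            outbox.put(work(item))

    threads = [threading.Thread(target=loop, name=f"{name}-{i}", daemon=True)
               for i in range(workers)]
    for t in threads:
        t.start()
    return threads


def stop_stage(inbox: queue.Queue, threads: list[threading.Thread]) -> None:
    """Queue one sentinel per worker behind any pending items, then wait."""
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()


def desired_workers(queue_depth: int, per_worker: int = 50,
                    min_workers: int = 2, max_workers: int = 8) -> int:
    """Hypothetical scaling rule: about one worker per `per_worker` queued
    items, clamped to the range [min_workers, max_workers]."""
    return max(min_workers, min(max_workers, math.ceil(queue_depth / per_worker)))


if __name__ == "__main__":
    ingest_in, extract_in, done = queue.Queue(), queue.Queue(), queue.Queue()
    ingest = start_stage("ingest", str.upper, ingest_in, extract_in, workers=2)
    extract = start_stage("extract", lambda s: s + "!", extract_in, done, workers=3)
    for doc in ["a", "b", "c"]:
        ingest_in.put(doc)
    stop_stage(ingest_in, ingest)  # every item reaches extract_in before this returns
    stop_stage(extract_in, extract)
    print(sorted(done.get() for _ in range(3)))  # ['A!', 'B!', 'C!']
```

The queues here are unbounded, so ingestion never blocks on a slow extraction stage. The cost is that a backlog shows up as queue depth, which is why that metric belongs on the dashboards next to latency.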

The accuracy numbers

After 6 months in production: 99.4% accuracy on structured fields (patient name, DOB, ID), 96.8% on semi-structured fields (medications, diagnoses), and 94.1% end-to-end (every field on a document correct). The remaining 5.9% of documents go to human review.
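The gap between the per-field and end-to-end numbers follows from how each is counted. Here is a small sketch that assumes each document is recorded as a mapping from field name to whether that field was extracted correctly (a hypothetical format):

```python
def field_accuracy(docs: list[dict[str, bool]]) -> dict[str, float]:
    """Per-field accuracy: share of documents on which that field was correct."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for doc in docs:
        for name, ok in doc.items():
            totals[name] = totals.get(name, 0) + 1
            correct[name] = correct.get(name, 0) + int(ok)
    return {name: correct[name] / totals[name] for name in totals}


def end_to_end_accuracy(docs: list[dict[str, bool]]) -> float:
    """Share of documents on which every field was correct."""
    return sum(all(doc.values()) for doc in docs) / len(docs)
```

A document counts as correct end-to-end only if every field is correct. As a result, end-to-end accuracy can never exceed the lowest per-field accuracy, and it tends to fall as documents carry more fields. That is consistent with 94.1% sitting below the 96.8% semi-structured figure.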