How receipt OCR actually works
Modern receipt scanning uses a pipeline of machine learning models, not simple text recognition. Le, Pham, and Nguyen’s 2019 research on deep learning receipt recognition, presented at ICDAR, broke the process into distinct stages—each introducing potential accuracy loss:
1. Image preprocessing
Correct rotation, skew, and lighting. Poor preprocessing cascades errors through every subsequent step.
2. Text detection
Locate text regions on the receipt. Thermal paper degradation, wrinkles, and shadows complicate detection.
3. Character recognition
Convert detected regions into text. This is where “Margherita Pizza” is either read correctly or comes out as “Margh3rita Pizra.”
4. Field extraction
Classify text as item names, prices, tax, totals, or irrelevant information. This step determines whether the system understands the receipt’s structure.
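To make step 4 concrete, here is a minimal sketch of rule-based field extraction over text that has already been recognized. It is an illustration only: production systems typically use trained models rather than keyword rules, and the regular expression, the keyword list, the `classify_line` helper, and the sample receipt lines are all assumptions made up for this example.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumed price format: an optional "$" and two decimals at the end of the line.
PRICE_RE = re.compile(r"\$?(\d+\.\d{2})\s*$")

# Order matters: "subtotal" must be checked before "total".
KEYWORDS = [("subtotal", "subtotal"), ("total", "total"), ("tax", "tax"), ("tip", "tip")]


def classify_line(line: str) -> Tuple[str, str, Optional[Decimal]]:
    """Return (field, label, amount) for one line of recognized text."""
    match = PRICE_RE.search(line)
    if match is None:
        # No trailing price: merchant name, greeting, or other noise.
        return ("other", line.strip(), None)
    amount = Decimal(match.group(1))
    label = line[: match.start()].strip()
    lowered = label.lower()
    for keyword, field in KEYWORDS:
        if keyword in lowered:
            return (field, label, amount)
    # A priced line with no summary keyword is treated as a purchased item.
    return ("item", label, amount)


# Made-up sample text, standing in for the output of step 3.
ocr_lines = [
    "LUIGI'S TRATTORIA",
    "Margherita Pizza   12.50",
    "Tiramisu  $6.00",
    "Subtotal  18.50",
    "Tax  1.48",
    "Total  19.98",
    "Thank you!",
]

fields = [classify_line(line) for line in ocr_lines]
for field in fields:
    print(field)
```

Even this toy version shows why the step is hard: a recognition error such as "Tota1" would slip past the keyword check and be counted as an item.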
The critical difference between receipt scanning implementations is how deep the pipeline goes. A system that stops at step 3 (character recognition) gives you raw text. A system that completes all four steps gives you structured data: item names linked to prices, tax identified and separated, totals verified against line items.
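Verifying totals against line items is simple arithmetic once the data is structured. The sketch below assumes prices are held as `Decimal` values and that the stated total should equal items plus tax plus tip; the `verify_receipt` function, its one-cent tolerance, and the sample figures are hypothetical choices for illustration, not any particular product's logic.

```python
from decimal import Decimal


def verify_receipt(items, tax, tip, total, tolerance=Decimal("0.01")):
    """Return (ok, difference), where difference = stated total - computed total."""
    computed = sum((price for _, price in items), Decimal("0")) + tax + tip
    difference = total - computed
    return abs(difference) <= tolerance, difference


# Made-up sample data: two items, tax, no tip.
items = [("Margherita Pizza", Decimal("12.50")), ("Tiramisu", Decimal("6.00"))]
ok, diff = verify_receipt(items, tax=Decimal("1.48"), tip=Decimal("0.00"), total=Decimal("19.98"))

# The same receipt with one price misread (12.50 recognized as 72.50).
misread = [("Margherita Pizza", Decimal("72.50")), ("Tiramisu", Decimal("6.00"))]
bad_ok, bad_diff = verify_receipt(misread, tax=Decimal("1.48"), tip=Decimal("0.00"), total=Decimal("19.98"))

print(ok, diff)
print(bad_ok, bad_diff)
```

A failed check does not say which line is wrong, but it does flag the receipt for review instead of silently passing a bad number along.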
splitty uses Azure AI Document Intelligence, whose prebuilt receipt model performs all four pipeline stages. It extracts merchant name, individual line items with prices, tax amounts, tip, and totals as structured JSON. Splitwise’s scanner primarily captures the total amount, requiring manual itemization after the fact.
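As a rough picture of what "structured JSON" means for downstream code, the sketch below parses a simplified receipt payload into typed records. The JSON shape, the field names, and the `Receipt` and `LineItem` classes are hypothetical and deliberately minimal; this is not the actual response format of Azure AI Document Intelligence, which is considerably richer.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineItem:
    description: str
    price: Decimal


@dataclass
class Receipt:
    merchant: str
    items: List[LineItem]
    tax: Decimal
    tip: Decimal
    total: Decimal


def parse_receipt(payload: str) -> Receipt:
    """Turn a simplified, hypothetical receipt JSON string into typed records."""
    data = json.loads(payload)
    return Receipt(
        merchant=data["merchant"],
        items=[LineItem(i["description"], Decimal(i["price"])) for i in data["items"]],
        tax=Decimal(data["tax"]),
        tip=Decimal(data["tip"]),
        total=Decimal(data["total"]),
    )


# Made-up sample payload; amounts are strings so no float rounding creeps in.
payload = """
{
  "merchant": "Luigi's Trattoria",
  "items": [
    {"description": "Margherita Pizza", "price": "12.50"},
    {"description": "Tiramisu", "price": "6.00"}
  ],
  "tax": "1.48",
  "tip": "0.00",
  "total": "19.98"
}
"""

receipt = parse_receipt(payload)
for item in receipt.items:
    print(item.description, item.price)
```

With item names already linked to prices, assigning each line to a person becomes a data operation rather than a typing exercise.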
Source: Le, Pham & Nguyen, “Deep Learning Approach for Receipt Recognition,” ICDAR Workshops (2019).