Every AP desk has a different invoice format, and almost none of them speak the same language. The work was building the extraction + normalization layer that turns a pile of vendor PDFs into clean, comparable records, without forcing the team to babysit it.
What shipped
- An OCR + LLM extraction pipeline tuned per client portfolio, with industry-specific parsing rules.
- A normalization and homogenization layer so vendor formats collapse into one comparable record shape.
- Discovery and PRD work with engineering to ship the pipeline into the existing AP workflow.
Outcome
- ~2 minutes saved per invoice across ~70k invoices/month.
- Vendor-specific formatting stopped blocking the team; the data was already in shape by the time it hit review.