For most healthtech product teams, buying intelligent document processing is a better use of engineering capacity than building it. The case is straightforward once the underlying numbers are visible, and the goal of this post is to make those numbers visible — so the decision takes hours rather than quarters.
If you are a product manager, CTO, or founder weighing this decision, the analysis below covers what an in-house IDP build actually requires, why the modern “wrap an LLM” shortcut does not hold up in production, and what to look for in a vendor that operates as a real product partner. The intent is to give you a clear, defensible answer for the next roadmap review.
The decision is largely settled by the math
A production-quality IDP capability in healthcare requires 4–8 engineers, 18–36 months, and low-single-digit millions in fully-loaded cost before the first customer-facing workflow goes live against it. Ongoing maintenance is permanent, and the team has to grow with every new document type and every new EHR integration added over time.
That investment recreates a capability that already exists, that you can integrate via API in weeks, and that a vendor whose entire roadmap is IDP will continue to advance faster than an in-house team. The opportunity cost — the differentiated product work those 4–8 engineers could be doing instead — is the line item most build analyses underweight.
There is one exception worth naming up front. If your core product is document automation itself, you must own the core. For every other healthtech category — practice management, RCM, population health, referral management, care coordination, clinical workflow — IDP is infrastructure. Infrastructure that already exists in a mature form is rarely the highest-leverage place to invest internal engineering.
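The headcount and timeline figures at the top of this section can be turned into a rough range with a few lines of arithmetic. The sketch below is only that: the 4–8 engineers and 18–36 months come from this post, while the fully loaded cost per engineer-year (200,000 here) is a placeholder assumption, not a benchmark. Substitute your own number.

```python
# Headcount and timeline ranges quoted in this post.
ENGINEERS = (4, 8)
MONTHS = (18, 36)


def engineer_months() -> tuple:
    """Engineer-months consumed before the first workflow goes live (low, high)."""
    return ENGINEERS[0] * MONTHS[0], ENGINEERS[1] * MONTHS[1]


def build_cost_range(loaded_cost_per_engineer_year: int) -> tuple:
    """Pre-launch build cost (low, high) for a given fully loaded annual cost."""
    low_months, high_months = engineer_months()
    return (
        low_months * loaded_cost_per_engineer_year // 12,
        high_months * loaded_cost_per_engineer_year // 12,
    )


if __name__ == "__main__":
    # ASSUMPTION: 200_000 is an illustrative fully loaded cost, not a benchmark.
    low, high = build_cost_range(200_000)
    print(f"Engineer-months before launch: {engineer_months()}")
    print(f"Pre-launch build cost: ${low:,} to ${high:,}")
```

With that placeholder, the range works out to 72 to 288 engineer-months and $1.2M to $4.8M before launch. Ongoing maintenance is not included, and the engineer-month figure is also a direct measure of the roadmap work that does not get done.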
Why IDP is larger than it looks on the roadmap
The first place teams underestimate is the framing. “Intelligent document processing” sounds like a single capability. In practice, it is the orchestration of five distinct disciplines, each of which represents a meaningful engineering investment in its own right.
Document ingestion and normalization. Inbound documents arrive as TIFFs, PDFs, scanned images, born-digital files, and through API hand-offs from upstream platforms. Each format has its own failure modes — page rotation, multi-document splitting, cover sheet handling, low-DPI scans of forms reproduced multiple times before reaching the queue. Issues here propagate to every downstream step.
OCR and structured extraction. Healthcare documents combine typed text, handwritten annotations, checkbox forms, tables, stamps, and hand-drawn diagrams. Off-the-shelf OCR reaches roughly 85–90% accuracy on clean documents. The remaining 10–15% is where production engineering lives, and it is the difference between a demo and a system operations teams will rely on.
Classification and routing. Referrals, prior authorizations, lab results, and records requests look superficially similar to a model not trained on enough healthcare documents to recognize their differences. Classification accuracy depends on labeled, varied, real-world training data, and the long tail of edge cases in healthcare is unusually long. Each specialty introduces its own conventions.
Patient and provider matching. Matching an inbound document to the correct patient record in an EHR is a separate engineering discipline that typically surfaces only after the extraction layer is built. Demographic variation, name changes, partial duplicates, and shared addresses introduce ambiguity that cannot be resolved with rules alone. In a clinical context, “mostly works” is not an acceptable threshold.
EHR integration. NextGen, PointClickCare, eClinicalWorks, Athena, ModMed — each has distinct API surfaces, authentication models, and version drift. A single integration is manageable. A portfolio of them is an ongoing engineering commitment that does not shrink.
Any one of these is a substantial feature. All five, working together reliably at scale in a HIPAA-regulated environment against documents that are routinely incomplete or malformed, amount to a category. Teams that attempt to build a category-level product as a side workstream alongside a primary roadmap consistently produce something that underperforms what a dedicated vendor delivers.
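To illustrate why patient matching cannot be left at "mostly works," here is a minimal sketch of one common pattern: score each candidate record, auto-link only when the best candidate is both strong and clearly ahead of the runner-up, and send everything else to human review. The field weights, the `accept` and `margin` thresholds, and all names are hypothetical choices for illustration. A production matcher would use far more signals than this.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Demographics:
    name: str
    dob: str       # ISO date, e.g. "1980-02-14"
    zip_code: str


def score(doc: Demographics, record: Demographics) -> float:
    """Weighted agreement between extracted demographics and one EHR record."""
    name_sim = SequenceMatcher(None, doc.name.lower(), record.name.lower()).ratio()
    dob_match = 1.0 if doc.dob == record.dob else 0.0
    zip_match = 1.0 if doc.zip_code == record.zip_code else 0.0
    # ASSUMPTION: illustrative weights, not tuned on real data.
    return 0.5 * name_sim + 0.4 * dob_match + 0.1 * zip_match


def match(doc: Demographics, candidates: dict, accept: float = 0.9, margin: float = 0.1):
    """Return ("auto", patient_id) or ("review", None).

    `candidates` maps patient_id -> Demographics.
    """
    ranked = sorted(
        ((score(doc, rec), pid) for pid, rec in candidates.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not ranked:
        return ("review", None)
    best_score, best_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    # Auto-link only when the best candidate is strong AND unambiguous.
    if best_score >= accept and best_score - runner_up >= margin:
        return ("auto", best_id)
    return ("review", None)


incoming = Demographics("Maria Lopez", "1980-02-14", "60601")
exact = Demographics("Maria Lopez", "1980-02-14", "60601")
twin = Demographics("Mario Lopez", "1980-02-14", "60601")  # same DOB, shared address

print(match(incoming, {"A": exact}))
print(match(incoming, {"A": exact, "B": twin}))
```

The second call is the instructive one. A near-identical record with the same date of birth and address scores almost as high as the exact match, so the sketch declines to auto-link and routes the document to review rather than guessing.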
The LLM wrapper assumption does not survive production
A common counterargument in 2026 is that modern foundation models compress the work above into a thin integration layer. Send the document to a multimodal LLM with a structured output prompt, parse the response, and call it done.
The approach demonstrates well in pilots and handles the first several hundred documents adequately. It does not hold up as a production system for four reasons:
Unit economics deteriorate at volume. Take a specialty practice processing 5,000 inbound documents per month. Multiply that volume by long-context multimodal token costs and retry overhead, and the result is an operating expense that scales linearly with usage. The economics that work in a pilot do not work in a deployed product.
Determinism and auditability are not optional. Healthcare workflows require traceable, explainable routing decisions. A black-box generative model output does not satisfy compliance review. Building the explainability layer on top of a generative model is its own engineering project, and typically larger than the wrapper it sits on.
The long tail is the actual workload. Foundation models hallucinate confidently on inputs they do not recognize. Smudged faxes, misidentified cover pages, and multi-document bundles requiring splitting are not edge cases — they represent a substantial share of real-world inbound volume. Handling them well requires the validation infrastructure the wrapper approach was meant to skip.
HIPAA compliance overhead compounds. Every model provider needs a business associate agreement and every fallback model needs one as well. Every meaningful change to the model stack triggers a security review. The compliance surface grows with every provider in the stack and never simplifies.
Teams that have delivered production-grade IDP did not skip these layers. They built or bought through each of them deliberately, with engineers whose full-time responsibility was the document layer.
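The unit-economics point above can be sketched as a simple cost function. Only the 5,000 documents per month comes from this post. Pages per document, tokens per page, token price, and retry rate are placeholder assumptions, so treat the output as a shape rather than a forecast and plug in your own provider's pricing.

```python
def monthly_llm_cost(
    docs_per_month: int,
    pages_per_doc: int,
    tokens_per_page: int,
    usd_per_million_tokens: float,
    retry_rate: float,
) -> float:
    """Estimated monthly model spend for one practice, in USD."""
    billed_tokens = docs_per_month * pages_per_doc * tokens_per_page * (1 + retry_rate)
    return billed_tokens * usd_per_million_tokens / 1_000_000


# ASSUMPTION: every value below is an illustrative placeholder.
ASSUMED = dict(
    pages_per_doc=6,
    tokens_per_page=1_500,
    usd_per_million_tokens=10.0,
    retry_rate=0.2,
)


def fleet_cost(practices: int, docs_per_month: int = 5_000) -> float:
    """Total monthly spend across `practices` identical customers."""
    return practices * monthly_llm_cost(docs_per_month, **ASSUMED)


if __name__ == "__main__":
    for practices in (1, 10, 100):
        print(f"{practices:>3} practices: ${fleet_cost(practices):,.2f} per month")
```

Whatever inputs you use, cost per practice stays constant, so total spend grows in step with the customer base. There is no volume at which the line flattens, which is the difference between pilot economics and deployed-product economics.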
The real cost of building is the work you do not deliver
The argument for building is almost always framed around control. Building preserves ownership; buying creates dependency. The concern is valid in the abstract. It is rarely decisive in practice.
The cost most build analyses underweight is opportunity cost. Every engineer assigned to OCR accuracy is an engineer not building the workflows your customers chose your product for. Every quarter the roadmap is dominated by classification and ingestion work is a quarter competitors are differentiating on the layer above it. That layer — the workflows, the user experience, the integration into broader practice operations — is where product differentiation actually accrues. It is also the layer only your team can build, because no vendor will ever know your customers’ workflows the way you do.
Building IDP in-house trades a known, ongoing engineering cost for a category of work that does not differentiate your product. Buying IDP redirects that engineering capacity toward the layer that does. The arithmetic is straightforward, and it consistently favors buying.
Why partnering wins on innovation
For product managers, the partner-versus-build calculus comes down to two factors that compound over time.
The first is the pace of innovation. A vendor whose entire company is IDP improves the document layer at the pace of every customer’s combined production volume, a full-time research budget, and a roadmap dedicated to one thing. An in-house team improves it at the pace of whatever capacity the broader product roadmap can spare in any given quarter, which, in a competitive healthtech category, is rarely enough to stay current. The gap widens with every release. Partnering means the document layer in your product is best-in-class on day one and continues to advance without consuming a roadmap slot.
The second factor is the staffing reality. Building IDP is not a project; it is a department. The team that creates version one has to maintain it, then extend it to the next set of document types, then absorb the next EHR integration, then re-architect for whatever foundation model shift arrives eighteen months out. That is a dedicated, permanent headcount commitment, not a one-time investment that ends when the feature is released. A vendor decision frees a product manager to fund the workflow and experience layers that differentiate the product. A build decision commits to permanently staffing a category that does not.
What to look for in a partner
The decision to buy is the easy half. Choosing the right vendor is where the discipline matters. Not every IDP vendor will operate as a serious product partner. Some are foundation model wrappers themselves. Some demo well and underperform in production. Some have strong products and weak support.
The criterion that matters most, beyond feature parity, is whether the vendor functions as an extension of your product organization. The following questions surface that distinction:
1. How does the classification model adapt to our document types? Generic IDP underperforms in healthcare. A serious vendor has a defined process for adapting classification to a customer’s specific document mix without requiring the customer to build the training infrastructure.
2. What is the integration surface? API-first architectures integrate cleanly. Vendors that require their UI to serve as the entry point introduce friction your engineering team will resent in six months. The IDP layer should behave like infrastructure, not like a tool with its own login.
3. How are failure modes handled? Ask for failure-mode documentation. Vendors without one have not yet encountered the long tail at scale. Vendors with one will tell you exactly what happens when a document cannot be classified, when patient matching is ambiguous, or when downstream systems are unavailable.
4. What does the 6-month roadmap look like? A vendor partnership is a bet on trajectory as much as current capability. The roadmap should align with where your product is going. Specifics matter. Vendors who will not discuss roadmap details should be eliminated from consideration.
5. What is the support experience during incidents? Reference calls are more useful here than case studies. Speak to customers who have hit problems and resolved them with the vendor. Partnership behavior under stress is the strongest signal of long-term fit.
A vendor that answers these five questions well is a vendor worth integrating against. A vendor that struggles with any of them is a vendor whose problems will eventually become your problems.
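One way to picture what failure-mode documentation (question 3) implies on the integration side: every failure the vendor can report maps to an explicit, pre-agreed disposition, and nothing falls through silently. The sketch below uses the three failure cases named in that question. The enum, the queue names, and the `route` function are hypothetical and do not describe any particular vendor's API.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    UNCLASSIFIED = "document could not be classified"
    AMBIGUOUS_MATCH = "patient match is ambiguous"
    DOWNSTREAM_UNAVAILABLE = "downstream system is unavailable"


# ASSUMPTION: queue names are placeholders for whatever your product uses.
DISPOSITION = {
    Failure.UNCLASSIFIED: "manual_triage_queue",
    Failure.AMBIGUOUS_MATCH: "human_review_queue",
    Failure.DOWNSTREAM_UNAVAILABLE: "retry_queue",
}


def route(failure: Optional[Failure]) -> str:
    """Return where a processed document goes next."""
    if failure is None:
        return "deliver_to_ehr"
    return DISPOSITION[failure]


def undocumented_failures() -> list:
    """Failure modes with no agreed disposition; should always be empty."""
    return [f for f in Failure if f not in DISPOSITION]


if __name__ == "__main__":
    print(route(None))
    print(route(Failure.AMBIGUOUS_MATCH))
    print(undocumented_failures())
```

If a vendor cannot help you fill in a table like `DISPOSITION` for their own failure modes, that gap is itself an answer to the question.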
The decision in one paragraph
For nearly every healthtech product organization, the right call is to buy IDP, integrate against it through an API-first vendor, and redirect internal engineering capacity to the workflow and experience layers that define the product. The companies that succeed in this category over the next five years will not be the ones that built the best document processing engines. They will be the ones that built the best products on top of one. Where you choose to spend engineering capacity is the most consequential product decision your team will make. Spend it on the work only you can do.
Summary
- Intelligent document processing in healthcare is not a single feature — it is the orchestration of ingestion, OCR, classification, patient matching, and EHR integration. Treating it as a feature underestimates the work by an order of magnitude.
- Foundation model wrappers handle pilot volumes well but fall short in production on unit economics, auditability, edge-case handling, and compliance overhead.
- Building IDP in-house typically requires 4–8 engineers and 18–36 months before delivering customer-facing value, with ongoing maintenance permanently scaled to your document types and EHR footprint.
- The opportunity cost — the differentiated product work not produced during the build — is the cost most analyses fail to price in.
- Buying redirects engineering capacity to the layers that define your product, and a serious IDP vendor will continue to advance the document layer faster than an in-house team.
- For nearly every healthtech product team whose core product is not document automation itself, the right decision is to buy.
Conclusion
The build-versus-buy question is rarely decided by the merits of the build itself. It is decided by what the team does with the engineering capacity the decision frees up — or absorbs.
A product organization that chooses to build IDP will, eighteen months from now, have a working document processing layer and a roadmap that has been quietly shaped around it. A product organization that chooses to buy will, eighteen months from now, have integrated against a mature document processing layer and used the intervening time to advance the parts of the product that customers chose them for. Both outcomes are functional. Only one of them moves the product forward.
The healthtech category is becoming more competitive, not less. The window in which product teams can win on infrastructure parity is closing. The teams that emerge well-positioned over the next five years will be the ones that recognized which layers of the stack deserved internal investment and which layers were better served by a serious partner. Intelligent document processing belongs firmly in the second category for almost every healthtech product team, and the sooner that decision is made deliberately, the sooner internal engineering can be redirected to the work that defines the product.
If your team is currently in a build evaluation for IDP, the most useful next step is to model the cost honestly — engineering headcount, time-to-value, ongoing maintenance, and the opportunity cost of the roadmap items that will be deferred — and compare it against an API-first vendor integration. When that comparison is done with real numbers, the decision tends to make itself.



