The Model Card Your Procurement Team Treated Like a Datasheet

June 2, 2026 · 11 min read

Software Engineer

A model card is a research artifact. A datasheet is a contract. Procurement teams routinely read the first as if it were the second, and the AI vendor that handed it over is now bound to claims its engineering team thought were narrative.

This is the cleanest way to lose a renewal: you forwarded the same PDF you publish on your model index page, the customer's legal team excerpted four sentences into Schedule B, and twelve months later you discover that "intended use: general question answering" has become a contractual representation about scope of service. Your team measured those sentences in BLEU points. Their team is now measuring them in breach.

The mistake is not that the model card was wrong. It was almost certainly accurate at the moment it was written. The mistake is that two different professional cultures read the same document with two different ideas of what a sentence is. Researchers write a model card to inform downstream decisions about behavior. Procurement reviewers read any vendor-supplied document as a commitment surface — every claim is a clause until negotiated otherwise. Neither side notices the genre mismatch because, on the page, the artifact looks like documentation.

Why The Model Card Is The Wrong Object To Hand Procurement

Model cards were proposed in 2018 as a transparency artifact for the machine learning community. The original framing was explicit: a card exists so that practitioners deciding whether to deploy a model can understand what the model does well, what it does poorly, what populations it was evaluated on, and where the authors expect it to fail. The document was a research norm, not a vendor representation, and the audience was assumed to be technical enough to read "intended use" as a description of where the authors had thought about behavior, not where the vendor had committed to providing it.

That framing has not survived contact with enterprise sales. Once a model card becomes part of a vendor's response to a security questionnaire, the audience changes. Now the reader is a third-party risk reviewer with a checklist, a procurement officer with a template, or — most dangerously — a customer's outside counsel paid by the hour to find binding language. None of them read the card the way the authors intended. They read it the way they read every other vendor document: as a representation that, if attached to or referenced by the master agreement, becomes enforceable.

The genre mismatch matters because the model card was written to disclose limitations honestly, not to limit liability precisely. "Limitations" in a model card is a research disclosure. "Limitations" in a vendor contract is a carve-out. The same word does two different jobs, and a procurement reviewer who has never seen a model card before will default to the second reading every time.

The Contractual Claims You Made Without Noticing

Walk through a representative model card and ask which sentences a procurement reviewer would highlight. The list is uncomfortable.

"Intended use" reads as scope of service. If the card says the model is intended for English-language question answering, a downstream customer may argue that you have represented suitability for that use, and that unsuitability outside it is now your problem to disclaim, not theirs to assume. "Training data" reads as data provenance, which intersects copyright indemnity in ways your engineering team does not track. "Evaluation results" reads as a set of performance representations, which in any traditional vendor contract would be bounded by an SLA and backed by a corresponding service credit. Your card has neither. "Known limitations" reads as a list of things you have already conceded the model gets wrong, which a customer's counsel will treat as your admission of risk and your obligation to monitor.

None of those readings is what the authors meant. All of them are reasonable readings of a document that crossed from engineering culture into procurement culture without a translation layer. The card was authored to inform researchers about behavior. The contract is now authored against the card. Neither team designed for that handoff, and both teams will sign their respective documents without realizing they have bound the other one.

The version of this that hurts most is benchmark results. A model card published in March that lists a 92% score on some evaluation suite is not the same artifact in December — the model has been retrained twice, the benchmark has been updated, and the score is no longer reproducible. But the contract still references the original card, and the customer is still operating against a number that was never meant to be a guarantee. When their auditor asks for evidence that you are meeting the represented performance, the engineering team's honest answer — "we replaced that eval six months ago" — sounds like evasion.
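One way to make that drift visible before an auditor does is to record, next to every published score, the model build and benchmark version it was measured on, and compare that against what is in production. The sketch below is a minimal illustration of that idea, not a description of any real tooling: `BenchmarkClaim`, `stale_reasons`, and every version string and date in it are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkClaim:
    """One score as it appeared on a published card (all fields are hypothetical)."""
    suite: str           # evaluation suite name
    suite_version: str   # version of the suite the score was measured on
    model_version: str   # model build the score was measured on
    score: float
    published: date


def stale_reasons(claim: BenchmarkClaim, current_model: str, current_suite: str) -> list[str]:
    """Return the reasons a contract-referenced claim no longer describes production."""
    reasons = []
    if claim.model_version != current_model:
        reasons.append(f"model changed: {claim.model_version} -> {current_model}")
    if claim.suite_version != current_suite:
        reasons.append(f"suite changed: {claim.suite_version} -> {current_suite}")
    return reasons


# Illustrative values only: a card published in March, checked later in the year.
march_claim = BenchmarkClaim("example-eval", "v1", "model-a", 0.92, date(2025, 3, 1))
reasons = stale_reasons(march_claim, current_model="model-c", current_suite="v2")
for reason in reasons:
    print(reason)
```

A non-empty result is a prompt to notify the customer or amend the referenced document. It says nothing about whether the current model is better or worse, only that the number in the contract no longer corresponds to anything you can reproduce.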

The Two-Artifact Discipline

The fix is to stop letting a research artifact perform contract work, and to author the contract work as its own document.

A technical model card stays where it belongs: on the model index, behind the developer portal, written in the voice of the team that built the model. It describes behavior, limitations, evaluation methodology, training data summary, and known failure modes for an engineering audience. Its job is to help a downstream developer make a good integration decision. It is allowed to be honest about limitations because its readers know how to read research prose.

A separate vendor due-diligence package is the artifact that goes to procurement. Its claims are written by, or co-authored with, legal. Every commitment in it has a defined scope, a notice mechanism, a remediation path, and — where appropriate — a liability cap. It addresses the questions procurement reviewers actually ask: data retention windows, processing region, audit rights, sub-processor list, security certifications, deprecation notice period, incident notification SLA, indemnification scope including outputs, and exit assistance. It does not include benchmark scores unless the scores are paired with the conditions under which they are guaranteed and the remedy when they are not.
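The rule that every commitment carries a scope, a notice mechanism, and a remediation path can be checked mechanically before the package leaves the building. What follows is a rough sketch under assumed names: `Commitment`, `gaps`, and the field names are hypothetical and do not come from any standard or template. It treats the liability cap as optional, matching "where appropriate", and requires stated guarantee conditions only for benchmark scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commitment:
    """One claim in the due-diligence package (hypothetical structure)."""
    claim: str
    scope: str = ""
    notice_mechanism: str = ""
    remediation_path: str = ""
    liability_cap: Optional[str] = None  # optional: only "where appropriate"
    is_benchmark: bool = False
    guarantee_conditions: str = ""       # required only for benchmark scores


def gaps(c: Commitment) -> list[str]:
    """List the required fields a commitment is still missing."""
    required = {
        "scope": c.scope,
        "notice_mechanism": c.notice_mechanism,
        "remediation_path": c.remediation_path,
    }
    missing = [name for name, value in required.items() if not value.strip()]
    # A benchmark score must also state the conditions under which it is guaranteed;
    # its remedy is already covered by remediation_path above.
    if c.is_benchmark and not c.guarantee_conditions.strip():
        missing.append("guarantee_conditions")
    return missing


# Illustrative entries only.
package = [
    Commitment(
        "Incident notification SLA",
        scope="production API",
        notice_mechanism="email to the customer's named security contact",
        remediation_path="service credit",
    ),
    Commitment("92% on an evaluation suite", is_benchmark=True),
]
for c in package:
    print(c.claim, "->", gaps(c) or "complete")
```

The second entry is what happens when a score is copied straight from the model card: it fails every check, which is the signal either to drop it from the package or to have legal attach the conditions and remedy.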

