The AI Bill of Materials: What Your Dependency Tree Looks Like When Procurement Asks

May 2, 2026 · 11 min read

Software Engineer

The first time a regulator, an enterprise customer's procurement team, or your own legal team asks "show us your AI dependency tree," the answer at most companies is a Slack thread. Someone in the platform channel pings the model team. The model team pings the prompt owners. The prompt owners cc the data lead. Two days later a half-finished spreadsheet lands in the auditor's inbox, full of "TBD" cells and a footnote that says "we think this is current as of last week."

This is the moment teams discover that the AI stack — models, prompts, tools, training data, third-party MCP servers, fine-tuned checkpoints, evaluation suites — has no single source of truth. Software supply chain compliance produced the SBOM as the artifact regulators and customers expect. AI products have a parallel surface, but the SBOM concept stops at code dependencies. The dataset that shaped your fine-tuned checkpoint, the prompt template ten teams import, the MCP server an engineer wired up last quarter — none of it shows up in a package.json.

The fix is the AI Bill of Materials, or AIBOM: a continuously updated, machine-readable inventory of every AI component your product depends on, generated from instrumentation rather than from someone's memory. This is not a documentation exercise. It is becoming a contractual deliverable on the timeline of "next renewal cycle," and a compliance artifact on the timeline of August 2026, when the EU AI Act's core obligations land for high-risk systems.

Why the SBOM Concept Doesn't Cover AI

SBOMs were designed for code: libraries, versions, licenses, vulnerabilities. The model is "what binaries went into this build, and which of them have known CVEs." That works because software behavior is determined by code.

AI system behavior isn't. A frontier model's behavior is determined by its training data, its post-training procedure, its tokenizer, the system prompt it ships with, the temperature you set, the tools you wired in, the retrieval corpus it queries, and the version pin you specified, if you specified one. None of this lives in your dependency manifest. An SBOM that captures only the Python packages your inference service imports is missing the actual sources of behavior.

The gap shows up in concrete ways. A model provider rolls a "minor update" and your refusal patterns change overnight. A fine-tuning dataset includes scraped content from a license bucket your contracts don't cover. An engineer adds a third-party MCP server that quietly gets credentials to your CRM. A prompt template gets edited without a version bump and ten downstream features start producing different outputs. None of these are caught by traditional supply chain tooling, because traditional supply chain tooling doesn't know that prompts, datasets, or model checkpoints exist.

This is why the standards bodies have moved. The OWASP AIBOM Initiative is building open implementations on top of the existing CycloneDX 1.6 and SPDX 3.0 schemas, both of which now have AI/ML-specific component types. The CycloneDX spec, recently published as Ecma International standard ECMA-424, supports AI/ML-BOM as a first-class document type alongside SBOM and SaaSBOM. SPDX 3.0 added AI and Dataset profiles to capture model metadata, training data references, and evaluation results. The format wars are mostly settled. What teams are missing is the generation pipeline.

The Four Surfaces You Have to Track

A useful AIBOM has to cover four surfaces, and most teams underestimate at least two of them.

Models: Every model invocation in production, with provider, model ID, version pin, and the feature that called it. This sounds easy until you realize how often "version pin" is actually "whatever the provider's latest alias resolves to today." A team I talked to recently discovered three different versions of the same Claude model in production simultaneously, because three different services had been deployed at different times and none had pinned versions. Their AIBOM lacked a row for "version drift across services." When they rolled out the registry, two of those features had measurable behavior differences they hadn't been tracking.

Prompts: Every system prompt, every templated user prompt, every assistant pre-fill. These need IDs, version history, and an explicit owner. The reason they need an owner is that prompts have become critical business logic with no clear org placement — sometimes product owns them, sometimes engineering, sometimes neither. Without an owner, change management becomes "whoever last edited it." A real prompt registry stores these as configs rather than code, with the same git-style diffs and CI gates you'd put on a database migration. MLflow's prompt registry, LaunchDarkly's prompt management, and Vertex AI's prompt registry all converge on this same shape: prompt as versioned, environment-promoted artifact.

Tools: Every function-calling tool, every MCP server, every plugin reachable from an agent. Capability scope (read-only? write? what resources?), authentication path, deprecation status. This is where shadow AI lives. One enterprise inventory exercise turned up 150 agents on the official list and over 500 actually deployed. A separate audit of 22.4 million enterprise prompts identified 665 distinct generative AI tools across enterprise environments — most unauthorized. If your tool registry is "the array of objects in tools.py," you don't have a registry.

Datasets and checkpoints: Every training set, every fine-tuning dataset, every retrieval corpus, every evaluation set. Provenance, license, last-refresh timestamp, the checkpoint it produced. Research on the lineage of widely used fine-tuning datasets found license miscategorization rates above 50% and license information omission rates above 70%. If you fine-tune on a dataset whose license you've miscategorized, your model is shipping a problem you can't see — and you can't fix it without an AIBOM that ties checkpoint to dataset to license.

The mistake teams make is treating these as four separate spreadsheets. They aren't. A production AI feature is the cross product of all four: this prompt, on this model version, calling these tools, against this retrieval corpus. Change any one and behavior changes. An AIBOM has to record the binding, not just the parts.
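One way to picture "record the binding, not just the parts" is a single record per feature whose fingerprint changes whenever any one surface changes. The sketch below is illustrative only: the FeatureBinding class, its field names, and the sample identifiers are assumptions, not a standard schema.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureBinding:
    """One production feature as a binding of all four surfaces (hypothetical schema)."""
    feature: str
    model: str               # provider/model ID with an explicit version pin
    prompt_id: str
    prompt_version: int
    tools: tuple[str, ...]   # tool or MCP server IDs reachable from this feature
    corpus: str              # retrieval corpus or dataset snapshot ID

    def fingerprint(self) -> str:
        """Stable hash of the binding; it changes if any single surface changes."""
        payload = {
            "model": self.model,
            "prompt": f"{self.prompt_id}@{self.prompt_version}",
            "tools": sorted(self.tools),  # tool order should not matter
            "corpus": self.corpus,
        }
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Sample values are made up for illustration.
before = FeatureBinding(
    feature="ticket-summary",
    model="example-provider/model-x@2026-01-15",
    prompt_id="summarize-ticket",
    prompt_version=3,
    tools=("crm.read", "kb.search"),
    corpus="kb-snapshot-2026-04",
)
after = replace(before, prompt_version=4)  # only the prompt changed
reordered = replace(before, tools=("kb.search", "crm.read"))

print(before.fingerprint() != after.fingerprint())      # a prompt bump changes the binding
print(before.fingerprint() == reordered.fingerprint())  # tool order alone does not
```

Storing the fingerprint next to each feature makes "did anything about this feature change?" a string comparison rather than a review of four spreadsheets.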

Generation by Instrumentation, Not by Wiki

The first attempt at an AIBOM is almost always a wiki page. Someone fills it out manually, gets thanked, and within six weeks the page is wrong. Manual AIBOMs do not scale; this is now well documented enough that it's the first thing the OWASP guidance addresses. The only sustainable approach is to generate the AIBOM from the same instrumentation you'd use for observability.

Concretely: every LLM call from your inference layer emits structured telemetry that includes model ID, model version, provider, prompt ID, prompt version, tool list, and retrieval source. Your AIBOM is a query against that stream. If a row appears in production telemetry that doesn't have a corresponding registered prompt, your AIBOM generator flags it as undocumented. If a model version appears that nobody deployed, your AIBOM flags it as drift. The artifact stops being something a human writes and becomes something the system continuously emits.
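As a rough sketch of "the AIBOM is a query against that stream," the example below treats telemetry as a list of per-call records and derives inventory rows and undocumented prompts from it. The LLMCallEvent fields mirror the list above, but the class, the function names, and the sample data are hypothetical; a real pipeline would likely read from a log store or tracing backend.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCallEvent:
    """One structured telemetry record per LLM call (hypothetical field names)."""
    service: str
    provider: str
    model_id: str
    model_version: str
    prompt_id: str
    prompt_version: int
    tools: tuple[str, ...]
    retrieval_source: str


def inventory_rows(events):
    """Collapse the event stream into distinct inventory rows with call counts."""
    return Counter(
        (e.provider, e.model_id, e.model_version,
         e.prompt_id, e.prompt_version, e.tools, e.retrieval_source)
        for e in events
    )


def undocumented_prompts(events, registered):
    """Prompt (id, version) pairs seen in production but absent from the registry."""
    seen = {(e.prompt_id, e.prompt_version) for e in events}
    return sorted(seen - registered)


# Made-up sample stream: two calls through a registered prompt, one ad hoc call.
events = [
    LLMCallEvent("support-api", "example-provider", "model-x", "2026-01-15",
                 "summarize-ticket", 3, ("crm.read",), "kb-snapshot-2026-04"),
    LLMCallEvent("support-api", "example-provider", "model-x", "2026-01-15",
                 "summarize-ticket", 3, ("crm.read",), "kb-snapshot-2026-04"),
    LLMCallEvent("billing-bot", "example-provider", "model-x", "2026-03-02",
                 "adhoc-debug", 1, (), "none"),
]
registered = {("summarize-ticket", 3)}

rows = inventory_rows(events)
flagged = undocumented_prompts(events, registered)
print(len(rows), flagged)
```

The same set difference applies to model versions: compare the versions seen in telemetry against the versions someone actually deployed, and whatever is left over is drift.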

The shape that works in practice has three parts. A control plane (registries for prompts, tools, models) where humans declare intent. A data plane (production telemetry) that records what actually happened. A reconciler that compares the two and flags discrepancies. The AIBOM is the materialized output of that reconciler, in CycloneDX or SPDX format, signed and timestamped. When procurement asks, you don't write a document — you export the latest run.
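The control plane, data plane, and reconciler split can be sketched in a few lines. Everything here is an assumption made for illustration: the "name@version" reference format, the function names, and the sample model IDs. The output dict borrows a few top-level CycloneDX field names but is a simplified stand-in; it has not been validated against the CycloneDX schema, and signing is omitted.

```python
import json
from datetime import datetime, timezone


def reconcile(declared: set[str], observed: set[str]) -> dict[str, list[str]]:
    """Compare declared intent (control plane) with observed usage (data plane)."""
    return {
        "undocumented": sorted(observed - declared),  # in production, never registered
        "unused": sorted(declared - observed),        # registered, not seen in telemetry
        "confirmed": sorted(declared & observed),     # registered and actually in use
    }


def to_bom(confirmed_models: list[str]) -> dict:
    """Render confirmed model refs as a simplified CycloneDX-style document."""
    components = []
    for ref in confirmed_models:
        name, _, version = ref.partition("@")
        components.append(
            {"type": "machine-learning-model", "name": name, "version": version}
        )
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "version": 1,
        "metadata": {"timestamp": datetime.now(timezone.utc).isoformat()},
        "components": components,
    }


# Hypothetical registry contents and telemetry-derived observations.
declared = {"model-x@2026-01-15", "model-y@2025-11-01"}
observed = {"model-x@2026-01-15", "model-x@2026-03-02"}

report = reconcile(declared, observed)
bom = to_bom(report["confirmed"])
print(json.dumps(report, indent=2))
print(json.dumps(bom["components"], indent=2))
```

Keeping the discrepancy report separate from the exported document is a deliberate choice in this sketch: the report is for engineers to act on, while the export contains only what has been both declared and observed.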

This shape generalizes to the change management problem. When a prompt version is promoted, the registry emits an event. When a new model version starts appearing in telemetry, the reconciler fires. When a tool's capability scope expands, a CI gate trips. The AIBOM becomes a side effect of doing AI development with normal engineering discipline, not a separate compliance project.
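The "CI gate trips when a tool's capability scope expands" idea might look like the following check, run against the tool registry before and after a change. The scope strings, the approvals mapping, and both function names are hypothetical; this is a sketch of the comparison, not of any particular CI system.

```python
def scope_expansions(old: dict[str, set[str]],
                     new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per tool, the capabilities present in `new` but not in `old`."""
    expansions = {}
    for tool, scopes in new.items():
        added = scopes - old.get(tool, set())  # a brand-new tool counts as all-new scope
        if added:
            expansions[tool] = added
    return expansions


def ci_gate(old: dict[str, set[str]],
            new: dict[str, set[str]],
            approved: dict[str, set[str]]) -> list[str]:
    """List unapproved expansions; a CI job would fail when this is non-empty."""
    failures = []
    for tool, added in sorted(scope_expansions(old, new).items()):
        unapproved = added - approved.get(tool, set())
        if unapproved:
            failures.append(f"{tool}: unapproved scope {sorted(unapproved)}")
    return failures


# Hypothetical registry state on the main branch vs. the proposed change.
main_branch = {"crm-mcp": {"contacts.read"}}
proposed = {
    "crm-mcp": {"contacts.read", "contacts.write"},  # scope grew
    "calendar-mcp": {"events.read"},                 # new tool
}
approvals = {"calendar-mcp": {"events.read"}}

failures = ci_gate(main_branch, proposed, approvals)
print(failures)
```

Narrowing a scope or removing a tool passes silently here; only growth in what an agent can reach requires an explicit approval entry.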

What Forces the Issue: Regulators and Procurement

Two forces are accelerating AIBOM adoption past the "nice-to-have" line.

Regulators: The EU AI Act's core obligations for high-risk AI systems land August 2, 2026, with deadlines for AI embedded in regulated products extending to August 2, 2027. High-risk providers must maintain technical documentation per Annex IV, register systems in the EU database under Article 71, and complete conformity assessment before going live. Article 99 sets fines for record-keeping violations at up to €15 million or 3% of worldwide annual turnover. The NIST AI RMF's Govern function calls for an AI inventory; ISO/IEC 42001 explicitly requires an AI management system (AIMS) audit trail. None of these regimes dictates the format you use, but they all call for the same artifact: a current, accurate inventory of what AI is in your product, what it depends on, and what risk class it operates in.

Procurement: This is the underappreciated driver. As enterprises buy AI capabilities, their security and procurement teams are starting to require AIBOM-style documentation as a vendor questionnaire response. The pattern looks like the SOC 2 trajectory from a decade ago: it began as an optional artifact, became a frequently asked-for one, and is now a contractual deliverable on most enterprise deals. AIBOM is following the same curve, faster. Your customer's procurement team isn't waiting for your industry to standardize. They're asking now, and the team that doesn't have an answer is the team that loses the deal to one that does.

The combined effect is that AIBOM stops being a compliance team's problem and becomes a sales-engineering problem. Eight months out from a renewal, someone in procurement asks for the dependency tree. The team that can export it from instrumentation in twenty minutes wins. The team that has to start a Slack thread loses, sometimes for reasons that have nothing to do with the actual technical posture.

The Engineering Discipline AIBOM Requires

If your AIBOM is going to be more than a snapshot, the engineering practices behind it have to land. The big ones are unfashionable.

Pin model versions explicitly, never latest. Every team eventually learns this lesson; the only question is whether they learn it before or after a silent behavior change in production. A version pin is the most basic AIBOM line item, and you cannot have an honest one without it.

Treat prompts as configs with a registry, not as strings in code. The prompt is the closest thing the AI stack has to a deployed artifact, and like any deployed artifact it needs an ID, a version, an owner, and a promotion pipeline.

Maintain a tool registry with capability scope, even for "internal" tools. The MCP server an engineer added last quarter is not internal once it's exposed to an agent that talks to your data layer. Inventory it like any other reachable system, with explicit auth boundaries.

Tie datasets to checkpoints with provenance metadata. When a fine-tuning run produces a checkpoint, record the input dataset version, hash, and license — not just in MLflow, but in a way the AIBOM generator can pull from later. The auditor's question isn't "what did you train on" but "show me the lineage of every checkpoint serving traffic right now."

Make the AIBOM exportable on demand. Treat it like a database query, not a Word document. The success criterion is "we can produce a current AIBOM in under five minutes." If yours takes a meeting to produce, it's not load-bearing yet.
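For the dataset-to-checkpoint practice, the auditor's question ("show me the lineage of every checkpoint serving traffic right now") can be treated as a query over lineage records. The sketch below assumes two hypothetical record types and made-up dataset names and license values; where these records live (an experiment tracker, a database) is left open.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    version: str
    sha256: str   # content hash of the exact training input
    license: str  # SPDX identifier where one exists, else "UNKNOWN"


@dataclass(frozen=True)
class CheckpointRecord:
    checkpoint_id: str
    dataset: DatasetRecord
    serving: bool  # True if this checkpoint currently serves traffic


def content_hash(data: bytes) -> str:
    """Hash the dataset bytes so the lineage points at exact content, not a name."""
    return hashlib.sha256(data).hexdigest()


def serving_lineage(checkpoints):
    """Lineage rows for every checkpoint serving traffic right now."""
    return [
        {
            "checkpoint": c.checkpoint_id,
            "dataset": f"{c.dataset.name}@{c.dataset.version}",
            "sha256": c.dataset.sha256,
            "license": c.dataset.license,
        }
        for c in checkpoints
        if c.serving
    ]


def license_gaps(checkpoints, allowed: set[str]) -> list[str]:
    """Serving checkpoints whose training data license is not on the allowed list."""
    return sorted(
        c.checkpoint_id
        for c in checkpoints
        if c.serving and c.dataset.license not in allowed
    )


# Made-up records: one clean, one with an unknown license, one retired.
tickets = DatasetRecord("support-tickets", "v7", content_hash(b"tickets-v7"), "CC-BY-4.0")
scraped = DatasetRecord("forum-scrape", "v2", content_hash(b"forum-v2"), "UNKNOWN")
checkpoints = [
    CheckpointRecord("ckpt-041", tickets, serving=True),
    CheckpointRecord("ckpt-042", scraped, serving=True),
    CheckpointRecord("ckpt-030", scraped, serving=False),
]

lineage = serving_lineage(checkpoints)
gaps = license_gaps(checkpoints, allowed={"CC-BY-4.0", "MIT", "Apache-2.0"})
print(lineage)
print(gaps)
```

Because both functions are plain queries over stored records, producing the answer on demand is a matter of running them, which is the "database query, not a Word document" criterion in practice.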

The Realization Most Teams Reach Last

The framing trap is treating AIBOM as a compliance exercise — something the GRC team does on the way to an audit. This is the same mistake teams made with SBOM a decade ago, and the cost is the same: tooling that's bolted on after the fact, perpetually out of date, useful only for the audit and useless for engineering decisions.

The teams that benefit most treat AIBOM as engineering infrastructure that compliance happens to also need. The same registry that gates a prompt promotion gives the auditor an answer. The same telemetry that catches model version drift in your latency dashboards generates the inventory line item. The same tool registry that prevents an engineer from wiring an unaudited MCP server into an agent is the artifact your customer's procurement team wants to see.

When AIBOM is built that way, the first time someone asks "show us your dependency tree" isn't a fire drill. It's a query. The team that has the query already running is the team that gets to spend its compliance bandwidth on the things that actually matter, and the team that's still chasing a Slack thread is the one explaining to legal why the answer to a customer's question takes two weeks.

The AIBOM discipline is, at root, a bet that your AI stack is going to be inspected — by regulators, by customers, by your own incident response — more often as time goes on, not less. The teams making that bet now are buying themselves a calmer 2027.
