5 posts tagged with "supply-chain"

Your Fine-Tuning Corpus Is a Codebase. Stop Shipping It Through a Bucket.

· 11 min read
Tian Pan
Software Engineer

By month nine of any serious fine-tuning project, your training corpus has more authors than your codebase. Synthetic generation pipelines wrote a few million examples. The vendor labeling firm contributed 80K rows from a workforce you have never met. An engineer added 47 examples last Tuesday to fix a regression they spotted in eval. A scraping job pulls production traces into a "supplementary" parquet file every night. A CSV someone dropped into S3 in February is still there, still in the training mix, and the person who wrote it left the company in March.

Now look at your application code repo. Every line is attributable to a named author. Every change went through a PR with at least one reviewer. Commits are signed. The main branch is protected. Merges require a second human. There is an audit log. If an auditor asks who wrote line 47 of payment_processor.py, you have an answer within seconds.

If they ask who wrote example 47 of the corpus that produced model v2.3, the honest answer is "a Mechanical Turk batch from 2024-Q2, vendor unknown, justification absent." Your fine-tuning corpus is a higher-privilege deployment surface than your codebase — it directly shapes model behavior in production — and you are shipping it through a bucket while you ship code through a reviewed PR. The threat model is inverted.
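To make the comparison concrete, here is a minimal sketch of what per-example attribution could look like. The `ExampleProvenance` record, its field names, and the `unattributed` helper are hypothetical illustrations, not a format taken from any particular tool; the point is only that "who wrote example 47" becomes a lookup rather than a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleProvenance:
    """Hypothetical per-example record; field names are illustrative assumptions."""
    example_id: str
    author: Optional[str]         # named person or pipeline that produced the row
    source: str                   # e.g. "synthetic", "vendor-batch", "manual-fix"
    reviewer: Optional[str]       # who approved it into the training mix
    justification: Optional[str]  # why it was added


def unattributed(records):
    """Return ids of examples missing an author, a reviewer, or a justification."""
    return [
        r.example_id
        for r in records
        if not (r.author and r.reviewer and r.justification)
    ]


records = [
    ExampleProvenance("ex-46", "alice", "manual-fix", "bob", "fixes eval regression"),
    ExampleProvenance("ex-47", None, "vendor-batch", None, None),
]
print(unattributed(records))
```

A check like this could run as a merge gate on the corpus, the same way a missing reviewer blocks a code PR.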

Open-Weight Licenses Are a Compliance Minefield Your Team Hasn't Mapped

· 9 min read
Tian Pan
Software Engineer

The word "open" is doing an extraordinary amount of work in "open-weight." When an engineer downloads a safetensors file from a model hub, they tend to file the act under the same mental category as npm install lodash — pull a dependency, ship a feature, move on. But the license that ships next to those weights is rarely Apache 2.0 or MIT. It is more often a custom community license with acceptable-use carve-outs, attribution requirements, derivative-naming rules, and user-count thresholds that switch the contract terms once your product gets popular. And almost none of it is enforced by the loader. The model runs whether you complied or not.

This is how compliance debt accumulates silently. The team that treats license review as a one-time download check is signing the company up for an audit finding that will ship years after the developer who clicked "I agree" has left. The fix is not a stricter procurement gate at the door — it is a discipline of treating model weights as a supply chain, with provenance, periodic re-review, and a manifest that traces every deployed inference path back to its upstream license.
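One way to picture such a manifest is sketched below. The `WeightsEntry` fields, the example model and license names, and the 180-day re-review window are all assumptions chosen for illustration, not a standard schema or a recommended interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class WeightsEntry:
    """Hypothetical manifest row linking a deployed path to its upstream license."""
    inference_path: str    # service or endpoint that serves the model
    model_name: str
    upstream_license: str  # license identifier as published by the upstream
    last_reviewed: date    # when someone last re-read the license terms


def due_for_review(manifest, today, max_age_days=180):
    """Return inference paths whose license review is older than the window."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.inference_path for e in manifest if e.last_reviewed < cutoff]


manifest = [
    WeightsEntry("search-ranker", "example-model-a", "custom-community-license",
                 date(2024, 1, 15)),
    WeightsEntry("support-bot", "example-model-b", "Apache-2.0",
                 date(2025, 5, 1)),
]
print(due_for_review(manifest, today=date(2025, 6, 1)))
```

Because the loader does not enforce any of this, the re-review date is the only thing that turns a one-time download check into a recurring one.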

The MCP Server Graveyard: When Your Agent's Dependencies Stop Shipping

· 10 min read
Tian Pan
Software Engineer

The last commit to the MCP server your agent calls every five minutes was eight months ago. The upstream API it wraps rolled out a new authentication model in February. There are 47 open issues, 12 of them flagged as security issues. The maintainer's GitHub account hasn't shown activity since October. Your agent still connects, still receives tool descriptions, still executes calls, and every one of those calls silently flows through a piece of infrastructure that nobody is watching.

This is the shape of MCP abandonment. Not a malicious rug pull, not a compromised package, just neglect. Somebody published a useful server in 2025, got adopted, then moved on. The server kept working because nothing forced it to break. Eventually something does, and by then the trust boundary your agent was crossing every five minutes has already failed.

Most teams adopted community MCP servers the way they adopted npm packages: by running install and reading the README. That mental model makes sense for libraries that sit in your dependency tree, get audited at build time, and surface their deprecations through your package manager. It does not survive contact with MCP, where the dependency is a live trust boundary that the LLM invokes in a loop, with credentials, on production data.
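The neglect signals described above (an old last commit, open security issues) can be checked mechanically. The following is a rough sketch under stated assumptions: `ServerHealth`, `abandonment_flags`, and the 180-day threshold are hypothetical, and the inputs are assumed to have been collected already from wherever the server's repository is hosted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServerHealth:
    """Hypothetical snapshot of an MCP server's upstream repository."""
    name: str
    last_commit: date
    open_security_issues: int


def abandonment_flags(server, today, max_commit_age_days=180):
    """Return a list of neglect signals; an empty list means none were found."""
    flags = []
    if (today - server.last_commit).days > max_commit_age_days:
        flags.append("stale-commits")
    if server.open_security_issues > 0:
        flags.append("open-security-issues")
    return flags


neglected = ServerHealth("example-server", date(2025, 1, 1), 12)
healthy = ServerHealth("other-server", date(2025, 8, 15), 0)
today = date(2025, 9, 1)
print(abandonment_flags(neglected, today))
print(abandonment_flags(healthy, today))
```

Unlike a package manager's deprecation warning, nothing surfaces these signals on its own, so a check like this would have to run on a schedule rather than at build time.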

The MCP Composability Trap: When 'Just Add Another Server' Becomes Dependency Hell

· 9 min read
Tian Pan
Software Engineer

The MCP ecosystem has 10,000+ servers and 97 million SDK downloads. It also has 30 CVEs filed in sixty days, 502 server configurations with unpinned versions, and a supply chain attack that BCC'd every outgoing email to an attacker for fifteen versions before anyone noticed. The composability promise — "just plug in another MCP server" — is real. But so is the dependency sprawl it creates, and most teams discover the cost after they're already deep in integration debt.

If you've built production systems on npm, you've seen this movie before. The MCP ecosystem is speedrunning the same plot, except the packages have shell access to your machine and credentials to your production systems.
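Unpinned versions are the easiest of these problems to detect. The sketch below assumes a client configuration shaped like `{"mcpServers": {name: {"command": ..., "args": [...]}}}` and only inspects servers launched through `npx`, treating the first non-flag argument as the package spec. Both the config shape and that heuristic are assumptions; the server and package names are made up.

```python
import re

# Only an exact major.minor.patch version counts as pinned in this sketch.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def is_pinned(package_spec):
    """True if the spec ends in an exact version, e.g. 'pkg@1.2.3'."""
    # Skip the first character so the leading "@" of a scoped name
    # such as "@scope/pkg@1.2.3" is not mistaken for the version separator.
    _name, sep, version = package_spec[1:].rpartition("@")
    return bool(sep) and EXACT_VERSION.match(version) is not None


def unpinned_servers(config):
    """Return names of npx-launched servers whose package is not pinned."""
    unpinned = []
    for name, entry in config.get("mcpServers", {}).items():
        if entry.get("command") != "npx":
            continue  # other launchers are out of scope for this sketch
        specs = [a for a in entry.get("args", []) if not a.startswith("-")]
        if specs and not is_pinned(specs[0]):
            unpinned.append(name)
    return unpinned


config = {
    "mcpServers": {
        "mail": {"command": "npx", "args": ["-y", "example-mail-mcp"]},
        "files": {"command": "npx", "args": ["-y", "@example/files-mcp@1.4.2"]},
        "search": {"command": "npx", "args": ["-y", "example-search-mcp@latest"]},
    }
}
print(unpinned_servers(config))
```

Pinning does not make a version trustworthy, but it does mean a modified release cannot reach production without someone changing a line in a reviewed file.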

MCP Server Supply Chain Risk: When Your Agent's Tools Become Attack Vectors

· 9 min read
Tian Pan
Software Engineer

In September 2025, an unofficial Postmark MCP server with 1,500 weekly downloads was quietly modified. The update added a single BCC field to its send_email function, silently copying every email to an attacker's address. Users who had auto-update enabled started leaking email content without any visible change in behavior. No error. No alert. The tool worked exactly as expected — it just also worked for someone else.

This is the new shape of supply chain attacks. Not compromised binaries or trojaned libraries, but poisoned tool definitions that AI agents trust implicitly. With over 12,000 public MCP servers indexed across registries and the protocol becoming the default integration layer for AI agents, the MCP ecosystem is recreating every mistake the npm ecosystem made — except the blast radius now includes your agent's ability to read files, send messages, and execute code on your behalf.
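One defensive pattern against silently changed tool definitions is to fingerprint each definition at approval time and compare on every connection. The sketch below is illustrative only: `fingerprint`, `changed_tools`, and the example schema are hypothetical, and the approach catches only changes that are visible in the tool definition itself. A change confined to the server's code would not alter the fingerprint and would need version or artifact pinning instead.

```python
import hashlib
import json


def fingerprint(tool_definition):
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(approved, current):
    """approved maps tool name -> fingerprint; current is a list of tool dicts.

    Returns the names of tools that are new or differ from the approved version.
    """
    return sorted(
        t["name"] for t in current if approved.get(t["name"]) != fingerprint(t)
    )


# Hypothetical tool definition as it looked when it was reviewed.
reviewed = {
    "name": "send_email",
    "description": "Send an email",
    "inputSchema": {"type": "object",
                    "properties": {"to": {"type": "string"},
                                   "body": {"type": "string"}}},
}
approved = {reviewed["name"]: fingerprint(reviewed)}

# The same tool after an update that adds a bcc property to its schema.
modified = json.loads(json.dumps(reviewed))
modified["inputSchema"]["properties"]["bcc"] = {"type": "string"}

print(changed_tools(approved, [reviewed]))
print(changed_tools(approved, [modified]))
```

An agent runtime could refuse to call any tool that appears in the second list until a human re-approves it.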