Skip to main content

17 posts tagged with "ai-governance"

View all tags

The Chain-of-Thought You Stripped to Save Tokens That Hid an Evidence Requirement

· 10 min read
Tian Pan
Software Engineer

A platform team shipped a prompt refactor that cut average response cost by thirty-two percent. The change was simple: strip the "explain your reasoning" preamble, ask the model to return only the JSON object, and drop the post-processing step that parsed the rationale out of the model's prose. The dashboard turned green. The unit economics page in the quarterly review went from yellow to gold. Nobody on the platform team thought to consult the risk team, because no part of the change touched the answer the customer received.

Two quarters later, a regulated customer's auditor requested the decision rationale for a denied-loan letter from a date six months prior. The team pulled the trace. The input was there. The output was there. The reasoning was gone — not because anyone deleted it, but because it had stopped being produced the day the refactor shipped. The customer's compliance program had been operating on the assumption that the rationale was somewhere in the trace store; the platform team had been operating on the assumption that the rationale was nobody's problem because the customer-facing answer was unchanged. Both assumptions were correct in isolation. Together they cost the customer a regulatory finding and the platform team a contract renewal.

The Legal Review Timeline Your AI Feature Roadmap Never Costed

· 10 min read
Tian Pan
Software Engineer

You sketched a six-quarter AI roadmap. The model swap, the new data source, the multilingual launch, and the prompt that now offers advice each got a single row on the Gantt chart, sized by engineering effort. Then the first launch slipped four weeks, and the post-mortem said the same thing three times in three different sections: "waiting on legal." The roadmap had assumed engineering capacity was the binding constraint. The actual binding constraint was a queue of legal reviews, each running its own three-to-six-week SLA, none of them aware of each other, and all of them landing on the same two product counsels.

The mistake was not in any of the individual reviews. Each one was warranted. The mistake was treating four parallel features as four parallel timelines while their legal dependencies serialized through the same upstream resource. By the second slip the org learns the shape of the problem. By the fourth it learns to plan against it. The teams that ship AI features on a predictable cadence have stopped treating legal throughput as an external surprise and started treating it as a planning input on the same footing as headcount and infra capacity.

The Model Card Your Procurement Team Treated Like a Datasheet

· 11 min read
Tian Pan
Software Engineer

A model card is a research artifact. A datasheet is a contract. Procurement teams routinely read the first as if it were the second, and the AI vendor that handed it over is now bound to claims its engineering team thought were narrative.

This is the cleanest way to lose a renewal: you forwarded the same PDF you publish on your model index page, the customer's legal team excerpted four sentences into Schedule B, and twelve months later you discover that "intended use: general question answering" has become a contractual representation about scope of service. Your team measured those sentences in BLEU points. Their team is now measuring them in breach.

Shadow AI: The Agents Your Team Already Shipped

· 10 min read
Tian Pan
Software Engineer

Shadow IT used to mean a marketing team expensing a SaaS subscription, or an engineer spinning up an unsanctioned S3 bucket. It was annoying, it was a procurement headache, and it was mostly survivable. Shadow AI is the same instinct — route around the slow official path — except the blast radius is larger and the entry cost has collapsed to almost nothing.

An engineer can wire an LLM API call into a production workflow in an afternoon. A support lead can stand up a no-code triage agent before lunch. A data analyst can paste a quarter's worth of customer records into a chat window to "just summarize this real quick." None of it passes through review, none of it shows up in an architecture diagram, and your governance program cannot protect a system it does not know exists.

The uncomfortable part is the scale. A 2025 UpGuard survey found that more than 80% of workers — and nearly 90% of security professionals — use unapproved AI tools at work. Your security team is doing it. Your executives are doing it. The question is not whether you have shadow AI. It is whether you can see any of it.

Production Bias Auditing: Catching AI Discrimination Before Your Users Do

· 11 min read
Tian Pan
Software Engineer

The most expensive bias bug I've seen in production was discovered by a Twitter thread, not a dashboard. A small team had shipped a credit-scoring assistant. They'd run the standard pre-launch audit: balanced training set, adversarial debiasing, equalized-odds gap under five percent on the holdout. A month after launch, a user posted screenshots showing women in their household consistently received lower limits than men with identical financials. By the time the team's monitoring caught up, the regulator had already opened an inquiry.

The lesson isn't that the team was lazy. They ran exactly the audit the literature recommends. The lesson is that pre-launch audits measure a snapshot of a model that no longer exists by the time real users hit it. Distribution shifts. New populations show up. A prompt-template change introduces a phrasing artifact that interacts with names. A model upgrade quietly trades calibration for a fluency win. The audit you ran in November does not protect the model running in production in May.

The Shadow AI Problem: Why Engineers Bypass Your Official AI Platform and What to Do About It

· 9 min read
Tian Pan
Software Engineer

Your data governance audit probably found them: API keys for OpenAI and Anthropic billed to personal credit cards, Slack bots wired to Claude through personal accounts, local Ollama instances proxying requests through the corporate VPN. Nobody told platform engineering. Nobody asked IT. The engineers just... did it.

This is the shadow AI problem, and it is already inside your organization whether you have detected it yet or not. Roughly half of employees in knowledge-work environments report using AI tools that their employers have not sanctioned. Among software engineers — who have the technical skill to set up unofficial integrations and the productivity pressure to want them — that number is almost certainly higher.

The instinct of most security and platform teams is to respond with prohibition: block the endpoints, restrict the API keys, add AI tool requests to the procurement queue. That response reliably produces more shadow AI, not less, because it treats a platform design failure as a compliance failure.

The Stakeholder Explanation Layer: Building AI Transparency That Regulators and Executives Actually Accept

· 12 min read
Tian Pan
Software Engineer

When legal asks "why did the AI deny this loan application?", your chain-of-thought trace is the wrong answer. It doesn't matter that you have 1,200 tokens of step-by-step reasoning. What they need is a sentence that holds up in a deposition — and right now, most engineering teams have no idea how to produce it.

This is the stakeholder explanation gap: the distance between what engineers understand about model behavior and what regulators, executives, and legal teams need to do their jobs. Closing it requires a distinct architectural layer — one that most production AI systems never build.

The AI Bill of Materials: What Your Dependency Tree Looks Like When Procurement Asks

· 11 min read
Tian Pan
Software Engineer

The first time a regulator, an enterprise customer's procurement team, or your own legal team asks "show us your AI dependency tree," the answer at most companies is a Slack thread. Someone in the platform channel pings the model team. The model team pings the prompt owners. The prompt owners cc the data lead. Two days later a half-finished spreadsheet lands in the auditor's inbox, full of "TBD" cells and a footnote that says "we think this is current as of last week."

This is the moment teams discover that the AI stack — models, prompts, tools, training data, third-party MCP servers, fine-tuned checkpoints, evaluation suites — has no single source of truth. Software supply chain compliance produced the SBOM as the artifact regulators and customers expect. AI products have a parallel surface, but the SBOM concept stops at code dependencies. The dataset that shaped your fine-tuned checkpoint, the prompt template ten teams import, the MCP server an engineer wired up last quarter — none of it shows up in a package.json.

Open-Weight Licenses Are a Compliance Minefield Your Team Hasn't Mapped

· 9 min read
Tian Pan
Software Engineer

The word "open" is doing an extraordinary amount of work in "open-weight." When an engineer downloads a safetensors file from a model hub, they tend to file the act under the same mental category as npm install lodash — pull a dependency, ship a feature, move on. But the license that ships next to those weights is rarely Apache 2.0 or MIT. It is more often a custom community license with acceptable-use carve-outs, attribution requirements, derivative-naming rules, and user-count thresholds that switch the contract terms once your product gets popular. And almost none of it is enforced by the loader. The model runs whether you complied or not.

This is how compliance debt accumulates silently. The team that treats license review as a one-time download check is signing the company up for an audit finding that will ship years after the developer who clicked "I agree" has left. The fix is not a stricter procurement gate at the door — it is a discipline of treating model weights as a supply chain, with provenance, periodic re-review, and a manifest that traces every deployed inference path back to its upstream license.

The Indexing Policy Committee Nobody Convened: RAG Corpus Governance Beyond the One-Time Migration

· 9 min read
Tian Pan
Software Engineer

Two years ago, a team pointed their retrieval index at the wiki, the Zendesk export, and a snapshot of the public docs. Last week, that same index returned a deprecated runbook that told an SRE to restart a service that no longer exists. The runbook had been deprecated for eighteen months. Nobody owned its retirement, so nobody retired it. The agent confidently cited it. The model wasn't wrong; the corpus was.

This is the failure mode that doesn't show up in retrieval evals: the corpus is treated as a one-time engineering decision when it's actually an ongoing governance problem. The team that scoped the initial ingestion is long gone. The legal review that should have flagged the customer-confidential PDFs never happened, because nobody told legal there was a pipeline. The "freshness strategy" is a Slack message from someone who left in Q3. The retrieval index has become a shared inbox for every document anyone ever scraped, and the bar for inclusion has drifted to "whatever was easy to ingest."

Content Provenance for AI Outputs: C2PA, SynthID, and the Audit Trail You Will Soon Owe

· 10 min read
Tian Pan
Software Engineer

A model's output used to be a string. By August 2026 it will be a signed artifact with a chain-of-custody manifest, and any team treating it as anything less will be retrofitting under deadline pressure.

That sentence sounds dramatic until you read Article 50 of the EU AI Act, which becomes fully enforceable on August 2, 2026, and requires that any synthetic content from a generative system be machine-detectable as AI-generated. The Code of Practice published in March 2026 is explicit that a single marking technique is not sufficient — providers must combine metadata embedding (C2PA) with imperceptible watermarking, and the output must survive common transformations like cropping, compression, and screenshotting. Penalties for non-compliance reach €15 million or 3% of global turnover. This is not a labeling guideline; it is a signed-artifact mandate, and it lands on every team shipping a generative feature into the EU market.

The Agent Backfill Problem: Your Model Upgrade Is a Trial of the Last 90 Days

· 12 min read
Tian Pan
Software Engineer

Here is a Tuesday-morning conversation that nobody on your AI team is prepared for. The new model lands in shadow mode. Within an hour the eval dashboard lights up: it categorizes 4% of refund requests differently than the model you have been running for the last quarter. Most of those flips look like the new model is right. Someone in the room — usually the one with the most lawyers in their reporting line — asks the question that ends the celebration: so what are we doing about the ninety days of decisions the old model already shipped?

That is the agent backfill problem. The moment a smarter model starts producing outputs that look more correct than your previous model's, every durable decision the previous model made becomes a contested record. You did not intend to indict the past. The new model did it for you, automatically, the first time you compared traces. And now you have an engineering question (can we replay history?), a legal question (do we have to disclose corrected outcomes?), and a product question (do users see retroactive changes?), and they collide.