12 posts tagged with "governance"

You Accidentally Built a Feature-Flag System for Prompts — Without the Governance

· 10 min read
Tian Pan
Software Engineer

Pull up the config repo your team uses to ship prompt changes. Look at the last thirty commits. How many had a code review? How many had an eval gate in CI? How many can you attribute — with certainty — to a measurable change in production behavior for the users who saw them? If your answer is "most," you are an outlier. For everyone else, those commits are running in production right now, and the system reading them is doing exactly what a feature-flag service does: hot-reload a value, fan it out to users, change product behavior. The difference is that your feature-flag service has audit logs, exposure tracking, kill switches, and per-cohort targeting. Your prompt deploy pipeline has git push.

This is not a metaphor. It is an accurate description of the production system your team is running. The prompt config repo, the S3 bucket your workers poll, the "prompts" collection in your database, the LangSmith/PromptLayer/Braintrust asset that your app fetches on boot — these are all feature-flag services. They have the same runtime shape: a value lives outside the binary, the binary reads it on a hot path, changing the value changes behavior for real users without a deploy. The only thing missing is every control your SRE team demanded before they would approve the actual feature-flag service.
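
To make the comparison concrete, here is a minimal sketch of that runtime shape, assuming a worker that polls an S3 bucket for its prompt. The bucket name, key, and polling interval are illustrative, not taken from any particular team's setup:

```python
import time
import boto3

# Hypothetical names; nothing here is prescribed by the post.
BUCKET = "prompt-configs"
KEY = "support-agent/system-prompt.txt"
POLL_SECONDS = 30

s3 = boto3.client("s3")
_current_prompt = None

def poll_prompt_forever():
    """Hot-reload the prompt: the same runtime shape as a feature-flag service,
    minus the review gate, exposure tracking, and kill switch."""
    global _current_prompt
    while True:
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")
        if body != _current_prompt:
            # Behavior changes for every user on the next request: no deploy,
            # no audit entry, no record of which users saw which version.
            _current_prompt = body
        time.sleep(POLL_SECONDS)
```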

Organizational Antibodies: Why AI Projects Die After the Pilot

· 11 min read
Tian Pan
Software Engineer

The demo went great. The pilot ran for six weeks, showed clear results, and the stakeholders in the room were impressed. Then nothing happened. Three months later the project was quietly shelved, the engineer who built it moved on to something else, and the company's AI strategy became a slide deck that said "exploring opportunities."

This is the pattern that kills AI initiatives. Not technical failure. Not insufficient model capability. Not even budget. The technology actually works — research consistently shows that around 80% of AI projects that reach production meet or exceed their stated expectations. The problem is the 70-90% that never get there.

Board-Level AI Governance: The Five Decisions Only Executives Can Make

· 9 min read
Tian Pan
Software Engineer

A major insurer's AI system was denying coverage claims. When humans reviewed those decisions, 90% were found to be wrong. The insurer's engineering team had built a performant model. Their MLOps team had solid deployment pipelines. Their data scientists had rigorous evaluation metrics. None of that mattered, because no one at the board level had ever answered the question: what is our acceptable failure rate for AI decisions that affect whether a sick person gets treated?

That gap — between functional technical systems and missing executive decisions — is where AI governance most often breaks down in practice. The result is organizations that are simultaneously running AI in production and exposed to liability they've never formally acknowledged.

The EU AI Act Is Now Your Engineering Backlog

· 12 min read
Tian Pan
Software Engineer

Most engineering teams discovered the GDPR through a legal email that arrived three weeks before the deadline. The EU AI Act is repeating that pattern, and the August 2, 2026 enforcement date for high-risk AI systems is close enough that "we'll deal with compliance later" is no longer an option. The difference between GDPR and the AI Act is that GDPR compliance was mostly about data handling policies. AI Act compliance requires building new system components — components that don't exist yet in most production AI systems.

What the regulation calls "human oversight obligations" and "audit trail requirements" are, translated into engineering language, a dashboard, an event log, and a data lineage system. This article treats the EU AI Act as an engineering specification rather than a legal document and walks through what you actually need to build.
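
As a sketch of what the "event log" piece might look like, here is one possible shape for an append-only oversight record. The field names are hypothetical; the Act specifies obligations, not schemas:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class OversightEvent:
    """One append-only entry the oversight dashboard and audit trail read from.
    Every field name here is an illustration, not a requirement from the Act."""
    request_id: str
    model_version: str
    prompt_version: str
    input_ref: str              # pointer into the data-lineage store, not the raw input
    output_ref: str
    risk_tier: str              # e.g. "high-risk" under the Act's classification
    human_reviewer: str | None  # populated when a human confirms or overrides
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```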

The EU AI Act Features That Silently Trigger High-Risk Compliance — and What You Must Ship Before August 2026

· 9 min read
Tian Pan
Software Engineer

An appliedAI study of 106 enterprise AI systems found that 40% had unclear risk classifications. That number is not a reflection of regulatory complexity — it is a reflection of how many engineering teams shipped AI features without asking whether the feature changes their compliance tier. The EU AI Act has a hard enforcement date of August 2, 2026 for high-risk systems. At that point, being in the 40% is not a management problem. It is an architecture problem you will be fixing at four times the original cost, under deadline pressure, with regulators watching.

This article is not a legal overview. It is an engineering read on the specific product decisions that silently trigger high-risk classification, the concrete deliverables those classifications require, and why the retrofit path is so much more expensive than the build-it-in path.

The Prompt Governance Problem: Managing Business Logic That Lives Outside Your Codebase

· 9 min read
Tian Pan
Software Engineer

A junior PM edits a customer-facing prompt during a product sprint to "make it sound friendlier." Two weeks later, a backend engineer tweaks the same prompt to fix a formatting quirk. An ML engineer, unaware of either change, adds chain-of-thought instructions in a separate system message that now conflicts with the PM's edit. None of these changes have a ticket. None have a reviewer. None have a rollback plan.

This is how most teams manage prompts. And at five prompts, it's annoying. At fifty, it's a liability.

The AI Procurement Gap: Why Your Vendor Evaluation Process Can't Handle Probabilistic Systems

· 11 min read
Tian Pan
Software Engineer

A procurement team I worked with spent eleven weeks scoring four LLM vendors against a 312-row RFP spreadsheet. They negotiated 99.9% uptime, $0.0008 per 1K input tokens, SOC 2 Type II, and a glossy benchmark PDF that put their selected vendor 2.3 points ahead on MMLU. The contract was signed on a Friday. The following Tuesday, the vendor silently rolled a model update, and the customer-support agent the team had built started routing roughly 14% of refund requests to the wrong queue. The uptime SLA was honored. The benchmark scores were unchanged. The procurement process had functioned exactly as designed, and the system was still broken.

This is the AI procurement gap. The instruments enterprise procurement uses to manage software risk — feature checklists, uptime guarantees, security questionnaires, sample benchmarks — were built for systems whose outputs are reproducible. None of those instruments measure the thing that actually determines whether an AI vendor will keep working for you: the behavioral stability of a stochastic surface that the vendor controls and you do not.
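
One way to make behavioral stability measurable is a scheduled regression suite run against the vendor endpoint, so a silent model update shows up as a number rather than a support ticket. The suite file, failure budget, and client call below are all hypothetical:

```python
import json

FAILURE_BUDGET = 0.02  # hypothetical: more than 2% routing drift triggers an alert

def run_routing_regression(client, cases_path="refund_routing_cases.jsonl"):
    """Replay a frozen set of labeled refund requests against the vendor model
    and report drift against the expected queue for each case."""
    with open(cases_path) as f:
        cases = [json.loads(line) for line in f]
    failures = 0
    for case in cases:
        predicted_queue = client.route(case["ticket_text"])  # assumed client method
        if predicted_queue != case["expected_queue"]:
            failures += 1
    drift = failures / len(cases)
    if drift > FAILURE_BUDGET:
        raise RuntimeError(f"Routing drift {drift:.1%} exceeds budget {FAILURE_BUDGET:.0%}")
    return drift
```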

Agentic Audit Trails: What Compliance Looks Like When Decisions Are Autonomous

· 12 min read
Tian Pan
Software Engineer

When a human loan officer denies an application, there is a name attached to that decision. That officer received specific information, deliberated, and acted. The reasoning may be imperfect, but it is attributable. There is someone to call, question, and hold accountable.

When an AI agent denies that same application, there is a database row. The row says the decision was made. It does not say why, or what inputs drove it, or which version of the model was running, or whether the system prompt had been quietly updated two weeks prior. When your compliance team hands that row to a regulator, the regulator is not satisfied.

This is the agentic audit trail problem, and most engineering teams building on AI agents have not solved it yet.
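
A sketch of what would need to be captured at decision time to close that gap. The field set is illustrative; the point is that none of it can be reconstructed after the fact:

```python
import json
import uuid
from datetime import datetime, timezone

AUDIT_LOG = "agent_decisions.jsonl"  # hypothetical append-only sink

def record_decision(outcome, *, model_version, prompt_version, inputs, rationale, tool_calls):
    """Persist everything a regulator would ask about, not just the outcome."""
    entry = {
        "decision_id": str(uuid.uuid4()),
        "outcome": outcome,                  # e.g. "denied"
        "model_version": model_version,
        "prompt_version": prompt_version,    # catches the quietly-updated system prompt
        "inputs": inputs,                    # what information the agent actually saw
        "rationale": rationale,              # why, in the model's own words
        "tool_calls": tool_calls,            # what it did along the way
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry["decision_id"]
```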

Stakeholder Prompt Conflicts: When Platform, Business, and User Instructions Compete at Inference Time

· 10 min read
Tian Pan
Software Engineer

Air Canada's chatbot invented a bereavement fare refund policy that didn't exist, and in 2024 a tribunal ruled the company was bound by what the bot said. The root cause wasn't a model hallucination in the traditional sense — it was a priority inversion. The system prompt said "be helpful." Actual policy said "follow documented rules." When a user asked about compensation, the model silently resolved the conflict in favor of sounding helpful, and nobody audited that choice before it cost the company in a legal ruling.

This is the stakeholder prompt conflict problem. Every production LLM system has at least three instruction authors: the platform layer (safety constraints and base model behavior), the business layer (operator-defined rules, compliance requirements, brand voice), and the user layer (the actual request). When those layers contradict each other — and they will — the model picks a winner. The question is whether your engineering team made that pick deliberately, or whether the model did it without anyone noticing.
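
A minimal sketch of making that pick deliberate: assemble the layers with an explicit, written-down precedence instead of letting the model resolve collisions implicitly. The rules and ordering below are hypothetical examples:

```python
PLATFORM_RULES = "Refuse unsafe requests. Never fabricate company policy."
BUSINESS_RULES = "Only state refund terms that appear verbatim in the policy document."
PRECEDENCE = ["platform", "business", "user"]  # hypothetical, but written down and reviewable

def assemble_messages(user_request: str) -> list[dict]:
    """Make the priority order an explicit artifact rather than an emergent
    property of whatever the model decides at inference time."""
    return [
        {"role": "system", "content": f"[platform] {PLATFORM_RULES}"},
        {"role": "system", "content": f"[business] {BUSINESS_RULES}"},
        {"role": "system", "content": "On conflict, higher layers win: " + " > ".join(PRECEDENCE)},
        {"role": "user", "content": user_request},
    ]
```

Stating precedence in the prompt does not guarantee the model honors it, but the ordering now exists as an artifact a reviewer can inspect and an eval can test.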

Internal AI Tools vs. External AI Products: Why Most Teams Get the Safety Bar Backwards

· 8 min read
Tian Pan
Software Engineer

Most teams assume that internal AI tools need less safety work than customer-facing AI products. The logic feels obvious: employees are trusted users, the blast radius is contained, and you can always fix things with a Slack message. This intuition is dangerously wrong. Internal AI tools often need more safety engineering than external products — just a completely different kind.

Of the 88% of organizations that reported AI agent security incidents last year, most weren't hit through their customer-facing products. The incidents came through internal tools with ambient authority over business systems, access to proprietary data, and the implicit trust of an employee session.

Building Governed AI Agents: A Practical Guide to Agentic Scaffolding

· 10 min read
Tian Pan
Software Engineer

Most teams building AI agents spend the first month chasing performance: better prompts, smarter routing, faster retrieval. They spend the next six months chasing the thing they skipped—governance. Agents that can't be audited get shut down by legal. Agents without permission boundaries wreak havoc in staging. Agents without human escalation paths quietly make consequential mistakes at scale.

The uncomfortable truth is that most agent deployments fail not because the model underperforms, but because the scaffolding around it lacks structure. Nearly two-thirds of organizations are experimenting with agents; fewer than one in four have successfully scaled to production. The gap isn't model quality. It's governance.
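
A sketch of the permission-boundary piece of that scaffolding, assuming a single chokepoint that checks an allowlist before any tool executes. The tool names and escalation hook are illustrative:

```python
class PermissionDenied(Exception):
    pass

# Hypothetical policy: which tools the agent may call on its own,
# and which require a human in the loop before they execute.
ALLOWED_TOOLS = {"search_docs", "read_ticket"}
ESCALATE_TOOLS = {"issue_refund", "send_email"}

def guarded_tool_call(tool_name, args, invoke, escalate):
    """Route every tool call through one chokepoint that enforces the boundary,
    instead of trusting the model's own judgment about what it may do."""
    if tool_name in ALLOWED_TOOLS:
        return invoke(tool_name, args)
    if tool_name in ESCALATE_TOOLS:
        return escalate(tool_name, args)   # queue for human approval, don't execute
    raise PermissionDenied(f"Tool '{tool_name}' is outside the agent's permission boundary")
```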

Governing Agentic AI Systems: What Changes When Your AI Can Act

· 9 min read
Tian Pan
Software Engineer

For most of AI's history, the governance problem was fundamentally about outputs: a model says something wrong, offensive, or confidential. That's bad, but it's contained. The blast radius is limited to whoever reads the output.

Agentic AI breaks this assumption entirely. When an agent can call APIs, write to databases, send emails, and spawn sub-agents, the question is no longer just "what did it say?" but "what did it do, to what systems, on whose behalf, and can we undo it?" Nearly 70% of enterprises already run agents in production, but most of those agents operate outside traditional identity and access management controls, making them invisible, overprivileged, and unaudited.