31 posts tagged with "governance"

The Prompt Governance Problem: Managing Business Logic That Lives Outside Your Codebase

· 9 min read
Tian Pan
Software Engineer

A junior PM edits a customer-facing prompt during a product sprint to "make it sound friendlier." Two weeks later, a backend engineer tweaks the same prompt to fix a formatting quirk. An ML engineer, unaware of either change, adds chain-of-thought instructions in a separate system message that now conflicts with the PM's edit. None of these changes have a ticket. None have a reviewer. None have a rollback plan.

This is how most teams manage prompts. And at five prompts, it's annoying. At fifty, it's a liability.
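The fix the article argues for is treating prompts like code: versioned, reviewed, and reversible. Here's a minimal sketch of what that could look like, using a hypothetical in-memory registry (the class and field names are illustrative, not from any particular library):

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class PromptVersion:
    """One immutable revision of a prompt, with enough metadata to audit it."""
    name: str
    text: str
    author: str
    reviewer: str  # no unreviewed edits reach production
    created_at: str

    @property
    def digest(self) -> str:
        # Content hash lets logs reference the exact prompt text that ran.
        return hashlib.sha256(self.text.encode()).hexdigest()[:12]


class PromptRegistry:
    """Append-only history per prompt name; rollback re-points to an old version."""

    def __init__(self):
        self._history: dict[str, list[PromptVersion]] = {}

    def publish(self, version: PromptVersion) -> str:
        self._history.setdefault(version.name, []).append(version)
        return version.digest

    def current(self, name: str) -> PromptVersion:
        return self._history[name][-1]

    def rollback(self, name: str) -> PromptVersion:
        # Drop the latest revision; the previous one becomes current again.
        self._history[name].pop()
        return self._history[name][-1]
```

With this in place, the PM's "friendlier" edit, the engineer's formatting fix, and the ML engineer's chain-of-thought change each become a named, reviewed revision with a one-line rollback instead of three untracked overwrites.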

The AI Procurement Gap: Why Your Vendor Evaluation Process Can't Handle Probabilistic Systems

· 11 min read
Tian Pan
Software Engineer

A procurement team I worked with spent eleven weeks scoring four LLM vendors against a 312-row RFP spreadsheet. They negotiated 99.9% uptime, $0.0008 per 1K input tokens, SOC 2 Type II, and a glossy benchmark PDF that put their selected vendor 2.3 points ahead on MMLU. The contract was signed on a Friday. The following Tuesday, the vendor silently rolled a model update, and the customer-support agent the team had built started routing roughly 14% of refund requests to the wrong queue. The uptime SLA was honored. The benchmark scores were unchanged. The procurement process had functioned exactly as designed, and the system was still broken.

This is the AI procurement gap. The instruments enterprise procurement uses to manage software risk — feature checklists, uptime guarantees, security questionnaires, sample benchmarks — were built for systems whose outputs are reproducible. None of those instruments measure the thing that actually determines whether an AI vendor will keep working for you: the behavioral stability of a stochastic surface that the vendor controls and you do not.
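One instrument that would have caught the refund-routing regression is a behavioral regression gate: replay a frozen golden set of inputs against the live vendor endpoint and alert on drift. A minimal sketch, assuming a hypothetical `call_model` wrapper around the vendor's API (not any real SDK):

```python
def behavioral_drift_check(call_model, golden_set, min_accuracy=0.95):
    """Replay a frozen golden set against the live model and flag drift.

    call_model: fn(input_text) -> label, a hypothetical vendor wrapper
    golden_set: list of (input_text, expected_label) pairs frozen at signing
    Returns (accuracy, failures) so a scheduler can alert before customers do.
    """
    failures = [
        (text, expected, got)
        for text, expected in golden_set
        if (got := call_model(text)) != expected
    ]
    accuracy = 1 - len(failures) / len(golden_set)
    return accuracy, failures
```

Run nightly, a silent vendor model update shows up as a failed gate on Tuesday morning, not as a 14% misrouting rate discovered by the support team. Note what this measures that the 312-row RFP did not: behavior on your inputs, not the vendor's benchmark.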

Agentic Audit Trails: What Compliance Looks Like When Decisions Are Autonomous

· 12 min read
Tian Pan
Software Engineer

When a human loan officer denies an application, there is a name attached to that decision. That officer received specific information, deliberated, and acted. The reasoning may be imperfect, but it is attributable. There is someone to call, question, and hold accountable.

When an AI agent denies that same application, there is a database row. The row says the decision was made. It does not say why, or what inputs drove it, or which version of the model was running, or whether the system prompt had been quietly updated two weeks prior. When your compliance team hands that row to a regulator, the regulator is not satisfied.

This is the agentic audit trail problem, and most engineering teams building on AI agents have not solved it yet.
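To make the gap concrete: here's a sketch of the fields that database row would need before a regulator is satisfied. The record type and field names are hypothetical, but each field answers one of the questions above:

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class AgentDecisionRecord:
    """What an audit row needs beyond 'a decision was made'."""
    decision_id: str
    outcome: str             # e.g. "denied"
    inputs: dict             # the exact features the agent saw
    model_version: str       # which model was running
    prompt_digest: str       # hash of the system prompt in effect
    reasoning_summary: str   # the agent's stated rationale, captured at decision time

    def to_log_line(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True)
        # Append a content hash so after-the-fact edits are detectable.
        return payload + "\t" + hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The `prompt_digest` field is the one most teams miss: without it, the quiet system-prompt update two weeks prior is indistinguishable from model nondeterminism.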

Stakeholder Prompt Conflicts: When Platform, Business, and User Instructions Compete at Inference Time

· 10 min read
Tian Pan
Software Engineer

In 2024, Air Canada's chatbot invented a bereavement fare refund policy that didn't exist. A court ruled the company was bound by what the bot said. The root cause wasn't a model hallucination in the traditional sense — it was a priority inversion. The system prompt said "be helpful." Actual policy said "follow documented rules." When a user asked about compensation, the model silently resolved the conflict in favor of sounding helpful, and nobody audited that choice before it landed the company in court.

This is the stakeholder prompt conflict problem. Every production LLM system has at least three instruction authors: the platform layer (safety constraints and base model behavior), the business layer (operator-defined rules, compliance requirements, brand voice), and the user layer (the actual request). When those layers contradict each other — and they will — the model picks a winner. The question is whether your engineering team made that pick deliberately, or whether the model did it without anyone noticing.
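Making that pick deliberate means resolving conflicts before inference, not during it. A minimal sketch of explicit layer precedence, with every override recorded rather than silently absorbed (the function and ordering here are illustrative assumptions, not a standard API):

```python
# Highest to lowest authority; an assumed ordering, set by the engineering team.
LAYER_PRECEDENCE = ("platform", "business", "user")


def resolve_instructions(instructions):
    """instructions: list of (layer, topic, rule) tuples.

    For each topic, the highest-precedence layer wins; every losing rule is
    recorded as overridden, so conflicts are auditable instead of silent.
    """
    winners, overridden = {}, []
    for layer, topic, rule in instructions:
        rank = LAYER_PRECEDENCE.index(layer)
        if topic not in winners or rank < LAYER_PRECEDENCE.index(winners[topic][0]):
            if topic in winners:
                overridden.append((topic, winners[topic]))
            winners[topic] = (layer, rule)
        else:
            overridden.append((topic, (layer, rule)))
    return winners, overridden
```

In the Air Canada scenario, the business layer's "follow documented rules" would have beaten the user-pleasing interpretation on the refund topic, and the override log would have surfaced the conflict for review instead of leaving the model to resolve it in court.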

Internal AI Tools vs. External AI Products: Why Most Teams Get the Safety Bar Backwards

· 8 min read
Tian Pan
Software Engineer

Most teams assume that internal AI tools need less safety work than customer-facing AI products. The logic feels obvious: employees are trusted users, the blast radius is contained, and you can always fix things with a Slack message. This intuition is dangerously wrong. Internal AI tools often need more safety engineering than external products — just a completely different kind.

Of the 88% of organizations that reported AI agent security incidents last year, most weren't hit through their customer-facing products. The incidents came through internal tools with ambient authority over business systems, access to proprietary data, and the implicit trust of an employee session.

Building Governed AI Agents: A Practical Guide to Agentic Scaffolding

· 10 min read
Tian Pan
Software Engineer

Most teams building AI agents spend the first month chasing performance: better prompts, smarter routing, faster retrieval. They spend the next six months chasing the thing they skipped—governance. Agents that can't be audited get shut down by legal. Agents without permission boundaries wreak havoc in staging. Agents without human escalation paths quietly make consequential mistakes at scale.

The uncomfortable truth is that most agent deployments fail not because the model underperforms, but because the scaffolding around it lacks structure. Nearly two-thirds of organizations are experimenting with agents; fewer than one in four have successfully scaled to production. The gap isn't model quality. It's governance.
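The permission-boundary and escalation pieces of that scaffolding can be surprisingly small. Here's an illustrative sketch of a gate that sits between the agent and its tools, with hypothetical names and a made-up spend-limit policy:

```python
class EscalationRequired(Exception):
    """Raised when an agent action exceeds its granted authority."""


def governed_call(tool_name, args, *, allowed_tools, spend_limit=0.0):
    """Gate every agent tool call through an explicit permission boundary.

    allowed_tools: per-agent allowlist granted at deployment time
    spend_limit: dollar cap above which a human must approve (example policy)
    """
    if tool_name not in allowed_tools:
        raise EscalationRequired(f"{tool_name} is outside this agent's permissions")
    if args.get("amount", 0.0) > spend_limit:
        raise EscalationRequired(
            f"{tool_name} amount exceeds ${spend_limit:.2f}; needs human approval"
        )
    # In a real system this would dispatch to the tool and log the call.
    return {"tool": tool_name, "status": "executed", "args": args}
```

The point is architectural, not the twenty lines: every consequential action passes through one choke point where permissions are checked, limits enforced, and escalations routed to a human, which is exactly the structure the failed six-month retrofits try to bolt on after the fact.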

Governing Agentic AI Systems: What Changes When Your AI Can Act

· 9 min read
Tian Pan
Software Engineer

For most of AI's history, the governance problem was fundamentally about outputs: a model says something wrong, offensive, or confidential. That's bad, but it's contained. The blast radius is limited to whoever reads the output.

Agentic AI breaks this assumption entirely. When an agent can call APIs, write to databases, send emails, and spawn sub-agents, the question is no longer just "what did it say?" but "what did it do, to what systems, on whose behalf, and can we undo it?" Nearly 70% of enterprises already run agents in production, but most of those agents operate outside traditional identity and access management controls, making them invisible, overprivileged, and unaudited.