Board-Level AI Governance: The Five Decisions Only Executives Can Make

· 9 min read
Tian Pan
Software Engineer

A major insurer's AI system was denying coverage claims. When humans reviewed those decisions, 90% were found to be wrong. The insurer's engineering team had built a performant model. Their MLOps team had solid deployment pipelines. Their data scientists had rigorous evaluation metrics. None of that mattered, because no one at the board level had ever answered the question: what is our acceptable failure rate for AI decisions that affect whether a sick person gets treated?

That gap — between functional technical systems and missing executive decisions — is where AI governance most often breaks down in practice. The result is organizations that are simultaneously running AI in production and exposed to liability they've never formally acknowledged.

Right now, 50–75% of companies use AI operationally. Fewer than 25% have board-approved structured AI policies. The EU AI Act's prohibited-use provisions are already in force (as of February 2025), with high-risk AI compliance phases arriving in 2026–2027. The SEC has flagged AI governance disclosures as an investor concern. And class-action lawsuits are establishing that boards bear fiduciary responsibility for algorithmic decisions made under their watch.

Most of the AI governance writing that engineers encounter focuses on tooling: model registries, eval pipelines, audit logs, prompt versioning. That work matters. But it cannot substitute for five strategic decisions that only executives can make — and that no amount of MLOps sophistication can replace.

Decision 1: Risk Appetite

Risk appetite is the most foundational decision, and the one most organizations have never formally made.

Engineering teams regularly make implicit risk decisions. They choose a confidence threshold for a fraud model. They set a retry budget for an agentic workflow. They pick the false positive rate that minimizes user complaints. Those are engineering tradeoffs within an unstated upper bound — but the upper bound itself is a business and legal decision, not a technical one.

Boards need to explicitly define what failure rates are acceptable for different categories of AI decisions. A recommendation system and a credit-approval system have completely different acceptable error rates. A customer support chatbot and a diagnostic tool require different levels of human review. These thresholds determine what systems can be deployed autonomously, which require human-in-the-loop review, and which shouldn't be built at all.

The practical output of this decision is a risk tier matrix — typically High, Medium, and Low risk categories — with defined performance floors and human oversight requirements for each. Engineering teams can then build to a spec rather than inferring intent.
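One way to make such a matrix machine-checkable is to encode it as data that deployment tooling can consult. The tier names, error ceilings, and oversight modes below are illustrative assumptions, not recommended values — the real numbers are exactly the board-level decision this section describes:

```python
from dataclasses import dataclass
from enum import Enum

class Oversight(Enum):
    AUTONOMOUS = "autonomous"        # system may act without review
    HUMAN_IN_LOOP = "human_in_loop"  # a person reviews each decision
    PROHIBITED = "prohibited"        # do not build or deploy

@dataclass(frozen=True)
class RiskTier:
    name: str
    max_error_rate: float  # board-approved failure ceiling
    oversight: Oversight

# Illustrative values only — setting them is the board's job, not engineering's.
RISK_MATRIX = {
    "low": RiskTier("low", max_error_rate=0.05, oversight=Oversight.AUTONOMOUS),
    "medium": RiskTier("medium", max_error_rate=0.01, oversight=Oversight.HUMAN_IN_LOOP),
    "high": RiskTier("high", max_error_rate=0.001, oversight=Oversight.HUMAN_IN_LOOP),
}

def may_deploy_autonomously(tier: str, observed_error_rate: float) -> bool:
    """True only if the tier permits autonomy AND the system meets its floor."""
    spec = RISK_MATRIX[tier]
    return (spec.oversight is Oversight.AUTONOMOUS
            and observed_error_rate <= spec.max_error_rate)
```

With the matrix written down, "can this ship without a human in the loop?" becomes a lookup rather than a negotiation.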

Without this decision, engineering absorbs the risk by default. They make calls that should be board-level policy, often under competitive pressure, and with no documented authority to do so.

Decision 2: Liability Model

When an AI system causes harm, who is responsible? This seems like a legal question, but its answer has direct engineering implications.

Organizations need to choose between several liability postures. Strict liability says the company is responsible for all AI outputs, period. Vicarious liability says responsibility flows through the human who deployed or used the system. Shared liability distributes responsibility between vendor, deployer, and operator depending on the decision chain.

The Air Canada chatbot case illustrates why this matters. A customer received incorrect information from the airline's support chatbot and acted on it. The court held Air Canada liable — the chatbot was treated as a company representative, not an external product. The governance failure wasn't in the ML model; it was that no one at the executive level had defined the company's liability posture for chatbot outputs, which would have determined what review processes, disclaimers, and human escalation paths were required.

The EU AI Act now imposes liability frameworks on high-risk AI deployers regardless of whether they've defined one internally. Insurance policies are tightening: broad AI exclusions in D&O policies now leave boards personally exposed to follow-on shareholder claims when AI incidents occur. The organizations that survive AI incidents most cleanly are the ones that defined their liability model before the incident — not the ones that had the best models.

Decision 3: Model Selection Authority

Who has authority to approve deploying a new AI model or switching vendors? At most companies, the practical answer is whoever approved the last pull request.

The consequences of undefined model selection authority accumulate quietly. Engineers use the best available tool. Product managers integrate third-party APIs on tight timelines. Teams run experiments that graduate into production features. Shadow AI proliferates — organizations with high shadow AI usage saw an average of $670,000 in excess breach costs compared to those with proper access controls, according to 2025 security data.

Model selection authority requires the board to define: what approval is required to use a new AI model in production, who has authority to approve models at each risk tier, what vendor due diligence is mandatory before integration, and what the process is for approving open-source models versus commercial APIs.

This decision doesn't need to be bureaucratically heavy. A tiered approval system works: low-risk models can be approved by a department head, medium-risk by the CTO, high-risk by a formal AI governance committee. The key is that the decision tree exists and is written down.
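The decision tree can be a few lines of policy-as-data. The role names here are illustrative assumptions taken from the example above; the real assignments belong in the board-approved policy document:

```python
# Illustrative mapping from risk tier to required approver.
APPROVAL_AUTHORITY = {
    "low": "department_head",
    "medium": "cto",
    "high": "ai_governance_committee",
}

def required_approver(risk_tier: str) -> str:
    """Return who must sign off before a model at this tier reaches production."""
    try:
        return APPROVAL_AUTHORITY[risk_tier]
    except KeyError:
        # An unclassified model defaults to the strictest path, not the fastest one.
        return "ai_governance_committee"
```

Defaulting unknown tiers to the strictest approver is the design choice that stops shadow AI from slipping through the gap between "not yet classified" and "approved."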

Engineering teams actually benefit from this clarity. "We don't have authority to deploy this without risk classification" is a much stronger engineering position than "we're concerned about this but management wants to ship."

Decision 4: AI Incident Escalation Path

What happens when an AI system fails in a way that could cause harm? Who finds out, in what order, with what authority to act?

Most organizations have incident response playbooks for infrastructure outages. Very few have separate AI incident escalation paths. The difference matters because AI failures have different characteristics: they may be statistically distributed rather than binary, they may accumulate slowly before becoming visible, they may require domain experts (lawyers, clinicians, financial analysts) to assess severity, and they may require public disclosure rather than just internal mitigation.

The 90% error rate case described at the outset almost certainly involved months of signals that never escalated to the board because there was no defined path for them to travel. By the time the failures became visible through litigation, the liability was already locked in.

A working AI incident escalation path needs to define: what constitutes an AI incident (a threshold, not a judgment call), what the severity tiers are, who receives notification at each tier and within what timeframe, what decisions can be made at the engineering level versus which require executive or board involvement, and how external parties (regulators, insurers, affected users) get notified.
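The "threshold, not a judgment call" requirement can be sketched as a classification function. Severity names, recipients, deadlines, and thresholds below are all illustrative assumptions — the point is that they are constants an executive approved, not values an on-call engineer improvises:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SeverityRule:
    tier: str
    notify: str          # who must be informed
    deadline_hours: int  # notification deadline for that tier

# Illustrative tiers — the real thresholds and recipients are board policy.
RULES = [
    SeverityRule("sev1", notify="board_and_legal", deadline_hours=4),
    SeverityRule("sev2", notify="executive_sponsor", deadline_hours=24),
    SeverityRule("sev3", notify="engineering_lead", deadline_hours=72),
]

def classify(error_rate: float, affects_protected_decision: bool) -> SeverityRule:
    """Map an observed failure to a severity tier by fixed thresholds."""
    if affects_protected_decision and error_rate > 0.01:
        return RULES[0]
    if error_rate > 0.05:
        return RULES[1]
    return RULES[2]
```

A signal like the insurer's 90% denial-error rate would classify as the top tier on its first measurement, instead of accumulating for months in dashboards no executive reads.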

The emerging framework here adapts epidemiological models — with phases from "no evidenced occurrence" through "endemic and mitigated" — because AI failures share the distributed, probabilistic characteristics of infectious disease more than they share characteristics with traditional software outages. Engineers don't need to understand epidemiology to implement this; they need clear severity thresholds and escalation triggers, which are exactly what executives need to define.

Decision 5: Data Retention Strategy

AI systems require data — training data, inference logs, evaluation datasets, audit trails. The question of what to keep, for how long, and under what access controls cannot be answered by engineering alone.

Regulatory requirements create minimum floors: healthcare PHI requires 6–7 year retention, financial records under SEC and FINRA rules require 5–7 years, and GDPR creates deletion obligations that conflict with training data retention. The EU AI Act adds audit trail requirements for high-risk AI systems that persist beyond the system's active life. These requirements apply at the organizational level, not the system level — meaning boards bear responsibility for data retention decisions that may span dozens of AI systems and multiple regulatory jurisdictions simultaneously.

Beyond compliance minimums, boards need to decide: how long inference logs are retained (a longer retention window enables better audit and debugging; a shorter window reduces breach exposure), whether training data provenance is tracked at the record level (required if you may need to demonstrate compliance with IP or consent requirements), and what deletion procedures apply when a model is retired or a data subject requests erasure.
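Once the board has chosen answers, the policy itself is small enough to express as data. The categories, durations, and erasability flags below are illustrative assumptions (they loosely echo the regulatory floors above but are not legal guidance):

```python
from datetime import timedelta

# Illustrative retention policy as data — durations are assumptions, not advice.
RETENTION_POLICY = {
    "inference_logs": {"keep": timedelta(days=365), "erasable_on_request": True},
    "training_data":  {"keep": timedelta(days=365 * 7), "erasable_on_request": True},
    "audit_trails":   {"keep": timedelta(days=365 * 7), "erasable_on_request": False},
}

def must_delete(category: str, age: timedelta, erasure_requested: bool) -> bool:
    """Decide deletion from policy: honor erasure where allowed, else expire by age."""
    rule = RETENTION_POLICY[category]
    if erasure_requested and rule["erasable_on_request"]:
        return True
    return age > rule["keep"]
```

Note how the GDPR-vs-audit-trail tension from the paragraph above surfaces directly: audit trails are marked non-erasable, which is a policy call only executives and counsel can make.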

Engineering teams can build excellent retention pipelines, but without an organizational policy, they're building infrastructure for an unstated goal. The Air Canada and copyright litigation cases both involved data governance failures that had engineering solutions — if the policy had existed to require them.

Designing the Governance Interface

Boards that govern AI well don't require board members to have ML expertise. They require a functioning interface between technical teams and executive oversight.

That interface has three components.

The first is a risk-tiered AI inventory. Every AI system in production gets a one-page summary: what it does, what risk tier it falls in, its key performance metrics in business terms (not ML metrics), its incident history, and its last audit date. This is not a technical whitepaper. It's a document that lets a board member ask informed questions.
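The one-page summary is, structurally, just a record type. The field names here are illustrative assumptions about what such an entry might contain:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# An illustrative inventory entry — field names are assumptions, not a standard.
@dataclass
class AISystemSummary:
    name: str
    purpose: str               # what it does, in business terms
    risk_tier: str             # from the board's risk matrix
    business_metrics: dict     # e.g. {"claims wrongly denied per 1k": 2}
    incidents: list = field(default_factory=list)
    last_audit: Optional[date] = None
```

Expressing the inventory as structured records (rather than slideware) means it can be queried: which high-risk systems have never been audited, which have open incidents, and so on.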

The second is an escalation interface — a defined path from engineering to executive that activates on specified triggers, not on individual judgment calls. When an AI system's false positive rate crosses a defined threshold, that information travels to the right executive level automatically, not through informal channels.

The third is model selection documentation. When engineering evaluates AI vendors or models, they produce a structured record of what was considered, what criteria were applied, and who approved the decision. This serves two purposes: it creates an audit trail, and it gives engineering the authority to decline or slow down deployments that haven't completed the required review.

Organizations whose boards have made these five decisions don't have less AI velocity than organizations that haven't. They have better-defined authority structures, which means engineering teams spend less time absorbing liability they were never meant to hold, and more time building.

The Accountability Gap Is Closing

In 2023, the prevailing posture was that AI governance was a compliance concern for regulated industries. That posture is no longer defensible.

EU AI Act enforcement is live. SEC investor expectations are formalized. Class-action precedents are establishing fiduciary duty for algorithmic failures. Insurance exclusions are narrowing. The organizations that reach 2027 with clean records won't be the ones that moved slowest — they'll be the ones that made the five decisions early, built the governance interfaces their technical teams needed, and treated AI risk as a board-level concern from the start.

Engineering can build a lot of things. But it cannot build risk appetite, liability posture, authority structures, or accountability chains. Those have to come from the top.
