AI-Generated Code Now Runs in Medical Devices, Power Grids, and Vehicles — 80% of Embedded Teams Use It, But Safety Standards Never Anticipated This

Here is a statistic that should alarm every engineer working on safety-critical systems: a recent industry survey found that 80% of embedded software development teams now use AI-assisted tools for code generation, testing, or documentation. These are not web applications or mobile apps — these are systems that control medical devices, power grid infrastructure, autonomous vehicles, and industrial machinery. These are the systems where software failures don’t produce a 500 error page; they cause physical harm or death.

The fundamental problem is that our safety standards were never designed for this reality. ISO 26262 (automotive functional safety), IEC 62304 (medical device software lifecycle), and DO-178C (airborne systems) were all written with a core assumption: code is deterministic, human-authored, and fully traceable from requirements through design to implementation. Every line of code can be linked to a requirement. Every design decision has a documented rationale. Every review has a record.

AI-generated code breaks every link in that traceability chain.

Why Traceability Matters in Safety-Critical Systems

When a medical device malfunctions and a patient is harmed, regulatory investigators need to trace backward from the failure. They examine the code that caused the malfunction, trace it to the design decisions that produced that code, link those decisions to the requirements they were meant to satisfy, and verify that appropriate reviews occurred at every stage.

With human-authored code, this chain is intact. A function exists because a requirement demanded it. The implementation approach was chosen because a design review evaluated alternatives. The code was reviewed by engineers who understood the domain-specific risks.

With AI-generated code, the “design decision” is a prompt. The “review” is often a developer glancing at the output and confirming it compiles and passes tests. The traceability chain doesn’t just weaken — it shatters. When a regulator asks “why was this interrupt handler implemented this way?” the honest answer is “because that’s what the model generated,” which is fundamentally incompatible with safety certification requirements.

The Adversarial Risk Dimension

The threat model expands significantly when AI generates the code itself. Researchers have already demonstrated that adversarial inputs can fool AI models in ways that have physical consequences — causing perception systems to misclassify stop signs, or medical imaging models to miss tumors. When the code logic is also AI-generated, the attack surface grows in ways we barely understand.

AI-generated code can contain subtle logical flaws that appear correct in testing but activate under specific timing conditions, input combinations, or hardware states. Unlike a human engineer’s mistake, which typically reflects a misunderstanding that a skilled reviewer can catch, an AI’s error can be alien — structurally plausible but semantically wrong in ways that don’t match any known bug pattern.
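
To make that concrete, here is a hedged, purely hypothetical sketch (the names and the timeout value are invented) of a flaw that is structurally plausible but wrong under a specific timing condition: both functions compile, look interchangeable in a quick review, and pass any test that runs for hours rather than weeks, yet the first fails once a 32-bit millisecond tick counter wraps after roughly 49.7 days of continuous operation.

```c
#include <stdint.h>

#define WATCHDOG_TIMEOUT_MS 500u   /* invented value, for illustration only */

/* Version A: reads naturally and passes short-duration testing, but the
 * comparison breaks when start_ms + WATCHDOG_TIMEOUT_MS wraps past
 * UINT32_MAX, producing spurious or missed expirations. */
int watchdog_expired_buggy(uint32_t now_ms, uint32_t start_ms)
{
    return now_ms > start_ms + WATCHDOG_TIMEOUT_MS;
}

/* Version B: wrap-safe. Unsigned subtraction is defined modulo 2^32, so
 * the elapsed time is computed correctly across the counter wrap. */
int watchdog_expired(uint32_t now_ms, uint32_t start_ms)
{
    return (uint32_t)(now_ms - start_ms) > WATCHDOG_TIMEOUT_MS;
}
```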

The Emerging Regulatory Response

ISO/PAS 8800 represents the first serious attempt to bridge the gap between AI capabilities and safety requirements. The standard requires organizations to maintain a “catalog of potential weaknesses” specific to AI/ML components, implement continuous monitoring for AI behavior drift, and ensure meaningful human oversight at critical decision points. It acknowledges that AI-assisted development is happening and attempts to create guardrails.

But ISO/PAS 8800 is advisory, not mandatory. Compliance tooling barely exists. And the standard’s requirements for human oversight assume organizations have enough qualified engineers to perform that oversight — which brings us back to the skills gap that AI tools were supposed to address.

A Real-World Case That Keeps Me Up at Night

I consulted for a medical device company where engineers used GitHub Copilot to generate C code for a drug infusion pump controller. The code compiled cleanly, passed all unit tests, and appeared functionally correct through integration testing. The development team was thrilled with the productivity improvement.

Then we ran formal verification. The analysis found a race condition in the interrupt handler that could cause the pump to miss a dose under specific timing conditions — specifically, when a timer interrupt fired during a communication interrupt service routine. The window was narrow (microseconds), which is why testing never caught it. But in a device that runs 24/7 for weeks, microsecond windows eventually get hit.
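
I can’t share the actual pump code, but the underlying pattern is common enough to sketch in hedged, generic form. All the names below are invented; the point is the structure: two interrupt contexts perform read-modify-write operations on the same flag byte, and if the higher-priority timer ISR preempts the communication ISR between its load and its store, the timer’s update is silently overwritten.

```c
#include <stdint.h>

/* Hypothetical sketch of the lost-update pattern, not the vendor's code.
 * All names (status_flags, DOSE_DUE, COMM_ACTIVE) are illustrative. */

#define DOSE_DUE     (1u << 0)   /* set by the timer ISR when a dose is scheduled */
#define COMM_ACTIVE  (1u << 1)   /* managed by the communication ISR               */

static volatile uint8_t status_flags;

/* Timer ISR (higher priority): marks that a dose is due. */
void timer_isr(void)
{
    status_flags |= DOSE_DUE;                /* read-modify-write #1 */
}

/* Communication ISR (lower priority): clears its own flag. The |= and &=
 * operations compile to load, modify, store. If timer_isr preempts between
 * the load and the store here, its DOSE_DUE bit is overwritten by the stale
 * value written back below, and the dose is silently lost. The window is a
 * few instructions wide: microseconds. */
void comm_isr(void)
{
    status_flags &= (uint8_t)~COMM_ACTIVE;   /* read-modify-write #2 */
}
```

Note that the volatile qualifier keeps the compiler from caching status_flags, but it does nothing to make the read-modify-write atomic; the two properties are easy to conflate.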

A human embedded systems engineer with experience in real-time systems would almost certainly have avoided this. They understand hardware-software interaction patterns — that interrupt handlers must be atomic, that shared state between interrupt contexts requires careful synchronization, that the compiler’s optimization assumptions don’t hold across interrupt boundaries. The AI model had none of this understanding. It generated syntactically correct C that happened to contain a potentially lethal timing bug.
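
For contrast, here is a hedged sketch of the conventional fix an experienced embedded engineer would reach for: wrap the lower-priority read-modify-write in a short critical section so the timer ISR can never observe or clobber a half-finished update. The disable_irq()/restore_irq() calls are placeholders for whatever the target actually provides (on a Cortex-M part, for example, saving and restoring PRIMASK around __disable_irq()/__enable_irq()).

```c
#include <stdint.h>

#define DOSE_DUE     (1u << 0)
#define COMM_ACTIVE  (1u << 1)

static volatile uint8_t status_flags;

/* Placeholder intrinsics: mask interrupts and return the previous state,
 * then restore it. Real names depend on the toolchain and core. */
extern uint32_t disable_irq(void);
extern void     restore_irq(uint32_t key);

void comm_isr(void)
{
    uint32_t key = disable_irq();            /* critical section: timer ISR is held off   */
    status_flags &= (uint8_t)~COMM_ACTIVE;   /* the read-modify-write can no longer tear  */
    restore_irq(key);                        /* previous interrupt state restored         */
}
```

Alternatives exist (giving each interrupt context its own single-writer flag, or using the core’s atomic primitives where they exist), but every one of them starts from the understanding that this byte is touched from two interrupt contexts, which is precisely the knowledge described above.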

The Question We Need to Answer

Should AI-generated code be permitted in safety-critical systems at all? And if so, what verification standards are needed beyond what we currently have? The productivity benefits are real — embedded development is expensive and slow. But the failure modes are not theoretical. They’re a race condition away from a missed medication dose, a misclassified obstacle, or a grid controller making the wrong switching decision.

I don’t think blanket bans are practical — the genie is out of the bottle. But I do think we need mandatory formal verification for any AI-generated code in safety-critical paths, updated certification standards that explicitly address AI-assisted development, and a fundamental rethinking of what “code review” means when the author is a neural network.
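
To ground what “mandatory formal verification” might mean for C in practice, here is a hedged illustration using ACSL contracts of the kind that Frama-C’s WP plugin can discharge. The function and all names are hypothetical; the point is that the safety property is proved for every possible input, rather than sampled by a test suite.

```c
#include <stdint.h>

#define MAX_RATE_ML_PER_H 1200u   /* hypothetical hardware ceiling for the pump */

/*@ assigns \nothing;
  @ ensures \result <= MAX_RATE_ML_PER_H;
  @ ensures \result == (requested > MAX_RATE_ML_PER_H ? MAX_RATE_ML_PER_H : requested);
  @*/
uint32_t clamp_infusion_rate(uint32_t requested)
{
    /* Saturate the commanded rate at the hardware maximum. */
    if (requested > MAX_RATE_ML_PER_H)
        return MAX_RATE_ML_PER_H;
    return requested;
}
```

A contract like this obviously does not prove the absence of the race condition described earlier; concurrency properties need different machinery, such as model checking of ISR interleavings or sound static analysis aimed at data races. The broader point stands: for AI-generated code in a safety-critical path, “it passed the tests” should not be the end of the argument.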

What frameworks or verification approaches are your teams using for AI-generated code in high-assurance systems?

The power grid scenario is the one that genuinely terrifies me, and I say this as someone who has spent years working on infrastructure systems.

SCADA systems and industrial control systems already carry a massive legacy security debt. Many of these systems run on decades-old software that was never designed to be networked and has since been retrofitted with security measures that don’t fully address the underlying architectural vulnerabilities. The attack surface is already enormous. Now we are adding AI-generated code to systems that control electricity distribution, water treatment, and gas pipelines.

This is not a theoretical risk. Stuxnet proved definitively that industrial control systems can be compromised with devastating physical consequences — centrifuges were physically destroyed through software manipulation. The Colonial Pipeline attack showed that even peripheral IT system compromises can cascade into critical infrastructure shutdowns. If AI-generated code introduces a subtle vulnerability in a power grid controller — a timing flaw, an edge case in load balancing logic, an incorrect threshold in fault detection — the consequences are not a website outage or a degraded user experience. The consequences are cascading blackouts affecting millions of people, failed hospital backup systems, disrupted water treatment, and potentially loss of life.

What makes AI-generated code particularly dangerous in this context is the subtlety of potential flaws. A traditionally written bug in a SCADA system is usually a known pattern — buffer overflow, integer overflow, unvalidated input. Security teams know how to scan for these. An AI-generated flaw might be structurally novel — correct enough to pass code review and all standard static analysis, but containing a logical vulnerability that only manifests under specific grid conditions that are rare but not impossible.
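
A hedged, invented example of what such a flaw could look like (not drawn from any real controller): the protection logic below is memory-safe, type-correct, and clean under every pattern-based scanner, yet it quietly fails for an unbalanced fault, a grid condition that is rare in routine testing but entirely possible in the field.

```c
#include <stdint.h>

#define OVERCURRENT_LIMIT_A 400u   /* invented threshold, for illustration only */

/* Plausible-looking check: trips on the average of the three phase currents.
 * Under a single-phase fault (one phase far above the limit, the other two
 * normal, e.g. 800/100/100 A) the average stays below the threshold and the
 * breaker never trips. Balanced-load testing will never expose this. */
int overcurrent_trip_buggy(uint32_t ia, uint32_t ib, uint32_t ic)
{
    return ((ia + ib + ic) / 3u) > OVERCURRENT_LIMIT_A;
}

/* Per-phase check: any single phase above the limit trips the breaker. */
int overcurrent_trip(uint32_t ia, uint32_t ib, uint32_t ic)
{
    return (ia > OVERCURRENT_LIMIT_A) ||
           (ib > OVERCURRENT_LIMIT_A) ||
           (ic > OVERCURRENT_LIMIT_A);
}
```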

I think we need an outright ban on AI-generated code in critical infrastructure until verification tooling catches up to the threat model. The productivity gains from AI-assisted development do not justify the risk when the failure mode is “city loses power for 72 hours” or “water treatment plant delivers contaminated water.” We can afford to write grid control software slowly and carefully. We cannot afford to get it wrong.

The counterargument that “humans make mistakes too” misses the point. Human mistakes in these systems are well-characterized, and our safety processes evolved specifically to catch them. AI mistakes are a new category that our existing verification infrastructure was not designed to detect.

The regulatory arbitrage concern here is real and deeply troubling from a business strategy perspective.

Companies operating in jurisdictions with strict safety standards — the EU under the Medical Device Regulation, Japan under PMDA requirements, the US under FDA 21 CFR Part 820 — face rigorous review processes when AI is involved in safety-critical system development. These processes add time, cost, and documentation overhead. They exist for good reason, but they create a competitive asymmetry.

Companies in less regulated markets can ship AI-generated medical device code, industrial controller firmware, or vehicle software with minimal oversight. They move faster. They spend less on verification. They reach market sooner. And patients, workers, and consumers in those markets receive safety-critical devices with significantly less verified code.

This creates a race to the bottom. Regulated companies face a choice: maintain rigorous safety processes and accept slower time-to-market, or lobby for weaker regulation to compete. Meanwhile, unregulated companies ship faster and cheaper, and the people who bear the risk are end users who have no visibility into how the software controlling their medical device or vehicle was developed.

ISO/PAS 8800 is a step in the right direction, but voluntary standards don’t prevent bad actors. They provide cover for responsible companies while doing nothing to constrain irresponsible ones. We need mandatory certification for AI-assisted development in safety-critical domains, similar to how we certify the engineers who design bridges, buildings, and aircraft. You cannot design a load-bearing structure without a licensed Professional Engineer signing off. Why should software that controls a drug infusion pump or a vehicle braking system be held to a lower standard?

The certification should cover not just the final code, but the development process: what AI tools were used, how outputs were verified, what formal methods were applied, and who had the expertise to review the results. This creates accountability that voluntary standards cannot provide.

I am also concerned about the liability gap. When AI-generated code causes harm, who is liable? The developer who prompted the model? The company that deployed the device? The AI tool vendor? Current product liability frameworks were not designed for this scenario, and the ambiguity creates an environment where everyone can point fingers and no one is accountable.

The skills gap dimension compounds every other problem in this thread, and as an engineering director, it is the issue I find most difficult to address.

Traditional embedded systems engineers understand hardware-software interaction through years of specialized training and hands-on experience. They know why volatile qualifiers matter — because the compiler will optimize away reads to memory-mapped hardware registers without them. They understand why certain operations are not atomic on specific architectures — that a 32-bit read on an 8-bit microcontroller requires multiple bus cycles during which an interrupt can fire and corrupt the value. They know why priority inversion can effectively deadlock a system — because a high-priority context blocked on a resource held by a lower-priority context can be stalled indefinitely while that lower-priority context is itself preempted by medium-priority work.
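
A hedged sketch of the middle example, since it is the one that trips people up most often: on an 8-bit core, reading a 32-bit tick counter that an ISR increments takes four separate byte loads, and the ISR can fire between any two of them. The intrinsic names below are placeholders for whatever the toolchain actually provides.

```c
#include <stdint.h>

static volatile uint32_t g_ticks;    /* incremented once per millisecond in the timer ISR */

/* Placeholder intrinsics: mask interrupts, returning the previous state, then restore it. */
extern uint32_t disable_irq(void);
extern void     restore_irq(uint32_t key);

/* Torn read: on an 8-bit core this single expression is four byte loads, and
 * the timer ISR can fire between them, e.g. turning a counter stepping from
 * 0x0000FFFF to 0x00010000 into a nonsense value like 0x0001FFFF. */
uint32_t ticks_torn(void)
{
    return g_ticks;
}

/* Consistent snapshot: briefly mask interrupts around the copy. */
uint32_t ticks_atomic(void)
{
    uint32_t key = disable_irq();
    uint32_t snapshot = g_ticks;
    restore_irq(key);
    return snapshot;
}
```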

This knowledge does not come from reading documentation. It comes from debugging a system at 3 AM when a race condition manifests once every 10,000 hours of operation. It comes from chasing an interrupt storm with an oscilloscope when the storm only appears within a narrow temperature range. It is experiential knowledge that AI tools fundamentally cannot replicate.

The race condition that @priya_security described in the drug infusion pump is the perfect example. This is not a bug that any AI code review tool would catch, because it requires understanding of hardware timing — that the interrupt controller has a specific arbitration behavior, that the communication peripheral generates interrupts at irregular intervals, and that the overlap window between timer and communication ISRs creates a shared-state corruption path. This understanding comes from working with the specific hardware, reading errata sheets, and knowing the failure modes from experience.

Here is the paradox that concerns me most: AI tools abstract away the very understanding needed to safely review AI-generated code. A developer using Copilot to generate embedded C code might not understand why the suggestions are dangerous, because they never needed to learn the low-level details that the AI is getting wrong. They see code that compiles, passes tests, and appears to work. They lack the expertise to ask “but what happens when these two interrupts collide?”

If we use AI to generate safety-critical code, we need more experienced reviewers, not fewer. We need engineers who deeply understand the target hardware, the real-time constraints, and the failure modes specific to the application domain. But the economics push in the opposite direction — AI tools promise to reduce headcount and accelerate timelines, which means fewer experienced engineers reviewing more AI-generated code, with less time per review.

I have started requiring that any AI-generated code in our safety-critical paths undergoes review by a senior embedded engineer with at least 10 years of domain experience, plus formal verification where applicable. It is slower and more expensive than what our competitors are doing. But I would rather ship late than ship a device that harms someone because we optimized for velocity over safety.