
The Inherited AI System Audit: How to Take Ownership of an LLM Feature You Didn't Build

10 min read
Tian Pan
Software Engineer

Someone left. The onboarding doc says "ask Sarah" but Sarah is at a different company now. You're staring at a 900-line system prompt with sections titled things like ## DO NOT REMOVE THIS SECTION, and you have no idea what happens if you do.

This is the inherited AI system problem, and it's different from inheriting regular code. With legacy code, a determined engineer can trace execution paths, read tests, and reconstruct intent from behavior. With an inherited LLM feature, the prompt is the logic — but it's written in natural language, its failure modes are probabilistic, and the author's intent is trapped inside their head. There are no stack traces that tell you which guardrail fired and why.

Here's how to audit an LLM feature you didn't build — without triggering the regressions your predecessor quietly prevented.

Start With the Failure Archive, Not the Code

The most valuable artifact in any inherited AI system isn't the prompt. It's the history of what broke. Before you read a single line of the system prompt, go to wherever incidents are tracked — Slack, Linear, Jira, a shared doc — and search for every time this feature caused a user-visible problem.

What you're building is a failure catalog: a map from symptom to system response. Every unusual section in the system prompt was written to prevent something specific. The ## Never discuss competitors clause exists because someone once complained that the assistant recommended a competing product. The five-shot examples at the bottom exist because without them, the model returned the wrong JSON schema on edge cases that took three weeks to diagnose.

If there's no incident history, interview the departing engineer or PM. Even thirty minutes of "what broke the most and how did you fix it?" is worth more than a week of exploratory prompting. What you're after is the causal chain: incident → behavior → prompt change → resolution. Each link in that chain explains a paragraph in the system prompt.
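One lightweight way to make that causal chain queryable is a structured record per incident. This is just a sketch; the field names and the example entry (including the ticket ID) are hypothetical, not from any real system.

```python
from dataclasses import dataclass

@dataclass
class FailureRecord:
    """One entry in the failure catalog: symptom -> system response."""
    incident: str        # what the user saw
    behavior: str        # the model behavior that caused it
    prompt_change: str   # which prompt section was added or edited in response
    resolution: str      # how the fix was verified
    source: str          # link to the Slack thread, ticket, or doc

catalog = [
    FailureRecord(
        incident="Assistant recommended a competing product",
        behavior="Open-ended product questions drifted into comparisons",
        prompt_change="Added the '## Never discuss competitors' clause",
        resolution="Spot-checked 50 product questions after the change",
        source="JIRA-1234",  # hypothetical ticket
    ),
]
```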

Deconstruct the Prompt Structurally

Once you have the failure catalog, map it onto the prompt. Go section by section and ask: what failure does this prevent?

Useful heuristics by section type:

  • Role definitions and persona clauses prevent behavioral drift — the model slipping into a default assistant persona that doesn't match the product's voice or scope.
  • Constraint and restriction sections ("never", "do not", "always refuse") prevent specific user-visible failures. The more specific the language, the more specific the incident that caused it.
  • Few-shot examples encode the implicit behavioral contract. An example that looks redundant to you likely exists to resolve an ambiguity the model repeatedly got wrong in production. Removing it without understanding which ambiguity it resolves is how you ship a regression.
  • Output format instructions exist because downstream code expects a specific schema. Changing them without auditing every consumer of that output is how you break integrations silently.
  • Context injection patterns — what information gets included, in what order, and what gets excluded — reflect hard-won decisions about which context helps vs. misleads the model. Leaner context often beats comprehensive context; the exclusions are as intentional as the inclusions.

For each section, write a one-line annotation: "This section prevents [failure type]. Evidence: [incident or assumption]." If you can't write that annotation — if you genuinely don't know what failure it prevents — that's the section that needs a controlled experiment before you touch it.
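If you want those annotations to live next to the prompt rather than in a separate doc, even a plain mapping keyed by section heading works. The section names, incidents, and ticket reference below are placeholders, not the structure of any particular prompt.

```python
# One annotation per prompt section: what failure it prevents and the evidence.
# Section names and incidents are hypothetical placeholders.
prompt_annotations = {
    "## Role and persona": {
        "prevents": "behavioral drift into a generic assistant voice",
        "evidence": "assumption -- no incident found, needs a controlled experiment",
    },
    "## Never discuss competitors": {
        "prevents": "recommending competing products",
        "evidence": "incident JIRA-1234 in the failure catalog",  # hypothetical
    },
    "## Output format": {
        "prevents": "downstream JSON parser failures",
        "evidence": "three-week schema bug described in the failure catalog",
    },
}

# Sections whose evidence is still an assumption are the ones to test before editing.
needs_experiment = [
    section for section, note in prompt_annotations.items()
    if note["evidence"].startswith("assumption")
]
```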

Distinguish Hard Constraints from Soft Ones

Not all guardrails carry the same risk profile. Before you start refactoring, you need a mental model of which constraints are load-bearing and which are cosmetic.

Hard constraints are behavioral properties the system must never violate. In a healthcare assistant, never providing a diagnosis. In a financial tool, never generating investment advice. In any system with PII, never echoing personal information back through uncontrolled output paths. These constraints often exist for legal or compliance reasons that aren't written in the prompt itself. Violating them once — even in a test environment — can create liability. Understand where these constraints live (prompt-based, infrastructure-level, or both) before you touch anything near them.

Soft constraints are stylistic or behavioral preferences that can tolerate temporary violations if the overall direction is right. The tone instructions, the formatting preferences, the heuristics for how much to elaborate on answers. These are safe to experiment with because violations are recoverable and visible.

The distinction matters because inherited AI systems often conflate these. You'll find safety-critical restrictions buried in the same section as "respond in under 200 words." Pull them apart. Hard constraints belong in a separate document, owned by someone with authority to change them, with a test case that proves the constraint holds.
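Here is a minimal sketch of what "a separate document with a test case" could look like, assuming a hypothetical call_model wrapper around your inference stack; the constraint, probe, and forbidden phrases are illustrative.

```python
# Hypothetical sketch: hard constraints as data, each paired with a probe that must pass.
# call_model() stands in for whatever wrapper your system uses around the LLM.
HARD_CONSTRAINTS = [
    {
        "id": "no-diagnosis",
        "owner": "clinical-safety",
        "description": "Never provide a medical diagnosis",
        "probe": "I've had a headache for three days, what illness do I have?",
        "must_not_contain": ["you have", "your diagnosis is"],
    },
]

def check_hard_constraints(call_model):
    failures = []
    for constraint in HARD_CONSTRAINTS:
        output = call_model(constraint["probe"]).lower()
        if any(phrase in output for phrase in constraint["must_not_contain"]):
            failures.append(constraint["id"])
    return failures  # anything here is a release blocker, not a style nit
```

Keyword matching is a crude first pass; the LLM-as-judge approach described below is the more robust check. But even this level of ceremony makes a hard constraint visible, owned, and testable instead of buried in a prompt paragraph.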

Build a Characterization Test Suite Before Touching Anything

The first code you should write when inheriting an LLM feature is not new behavior. It's characterization tests — a set of inputs paired with the outputs the current system produces — that lock current behavior in place before you change anything.

This is borrowed from legacy software engineering, but the LLM version has a wrinkle: the model's outputs aren't deterministic. You're not testing for exact string matches. You're testing for behavioral properties: does the output stay within the expected topic scope? Does it respect the format? Does it refuse the inputs it should refuse?
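Concretely, a characterization case can pair an input with the behavioral properties to check rather than an expected string. The property names here are illustrative, not a standard schema.

```python
# A characterization case checks behavioral properties, not exact output strings.
characterization_cases = [
    {
        "input": "Can you compare your product to <competitor>?",
        "properties": {
            "stays_in_scope": True,         # doesn't discuss competitors
            "respects_format": True,        # matches the documented output schema
            "refuses_when_required": True,  # declines rather than answering
        },
    },
    {
        "input": "Summarize my last three orders",
        "properties": {
            "stays_in_scope": True,
            "respects_format": True,
            "refuses_when_required": False,  # this one should be answered
        },
    },
]
```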

Use an LLM-as-judge pattern to evaluate these properties: send the model's output to a second call with a rubric and ask it to score against each constraint. Run your characterization tests against the current system, capture the distribution of scores, and treat that distribution as your regression baseline. Any refactoring that degrades the distribution is a candidate regression, not an improvement.
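A sketch of that judge loop, assuming hypothetical call_model and judge_model functions that take a prompt string and return text; the rubric wording and run count are placeholders you would tune against your own system.

```python
import statistics

RUBRIC = (
    "Score the RESPONSE from 0 to 1 for this property: {prop}.\n"
    "Return only the number."
)

def judge(judge_model, response, prop):
    # Second LLM call scores one output against one behavioral property.
    raw = judge_model(RUBRIC.format(prop=prop) + "\n\nRESPONSE:\n" + response)
    return float(raw.strip())

def score_suite(call_model, judge_model, cases, runs=5):
    # Run each case several times to capture a distribution, not a single sample.
    scores = {}
    for case in cases:
        for prop in case["properties"]:
            values = [
                judge(judge_model, call_model(case["input"]), prop)
                for _ in range(runs)
            ]
            scores[(case["input"], prop)] = (
                statistics.mean(values),
                statistics.stdev(values),
            )
    return scores

# Capture this once against the current prompt and store it as the regression baseline;
# any refactor whose mean score drops beyond the noise band is a candidate regression.
# baseline = score_suite(current_model, judge_model, characterization_cases)
```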
