
The 14-Month Half-Life of Your Prompt Expert

Tian Pan
Software Engineer

Every company shipping AI features in production has one or two engineers it cannot afford to lose, and most of those companies do not know who those engineers are until the resignation email arrives.

The person in question is rarely the loudest in the room. They are the one who remembers that the customer-support summarizer's tone got fixed by a three-line system-prompt edit after the Q2 escalation, that six cases were added to the eval suite the week the model provider quietly changed its default sampling, and that the judge calibration drifted the last time someone "cleaned up" the rubric. None of this is written down in a place a successor would find. It lives in one head, and roughly every two weeks a recruiter messages that head with a 25% raise attached.

The uncomfortable part is not that this engineer might leave. Engineers leave; that is normal. The uncomfortable part is that your organization is using a promotion ladder, a compensation band, and a knowledge-management discipline that were all designed for a role that does not describe what this person actually does. The generic IC ladder predicts AI engineering competence about as well as a 2018 promotion rubric predicts it in 2026 — which is to say, not at all. And the engineers it fails to see are precisely the ones holding the load-bearing knowledge.

Why the Standard Ladder Cannot See Them

Most engineering promotion rubrics measure delivered scope: a new service, a system rewrite, a migration with a clean before-and-after. The implicit unit of impact is lines of code shipped against a roadmap. That unit was a reasonable proxy for fifteen years. It is now actively misleading for AI work.

Consider what a high-impact quarter looks like for a prompt expert. They diagnosed a hallucination pattern that was costing the support team manual corrections, traced it to an ambiguous instruction in a retrieval prompt, and fixed it with a 30-line diff that moved an eval slice up two percentage points. They added provenance metadata to eleven eval cases so the next engineer would know which incident each one guards. They re-ran the judge calibration after a model version bump and caught a regression before it shipped.

That is enormous value. It is also nearly invisible to a traditional promo packet. There is no 5,000-line artifact. There is no service with the engineer's name on the on-call rotation. The eval lift sounds small to anyone who does not know that two points on that slice was the difference between a feature that shipped and one that got pulled. The work reads, on the rubric's own terms, like maintenance — and maintenance has always lost to building.

This is the same bias that has long favored the engineer who launches a feature named in the all-hands over the one who prevents an outage nobody sees. AI engineering did not create the bias. It just moved most of the consequential work onto the wrong side of it. The artifact of good AI engineering is usually a small diff with a measured effect, not a large diff with a visible footprint. A rubric that cannot credit that is a rubric that cannot promote your most important people, and engineers who do not see a path up leave.

The Comp Band Is Calibrated to a Job That No Longer Exists

The retention problem has a second mechanism, and it is purely financial. The role of "the engineer who owns our prompts and evals" mostly did not exist eighteen months ago. The people in it were hired into something else — a backend role, an ML role, a generalist software role — and their compensation band still reflects that original grade.

Meanwhile the external market for the skill has detached from that band entirely. PwC's 2025 Global AI Jobs Barometer measured a 56% wage premium for AI skills, up from 25% a year earlier, which means the premium itself more than doubled in twelve months. Specialists in LLM fine-tuning and evaluation command 25 to 40 percent above generalist engineers. The internal band does not move at that speed, because internal bands are reviewed annually and the market is repricing quarterly.

The result is a widening gap between what your prompt expert is paid and what the next employer will pay. The average tenure of an engineer in 2025 sits around two to three years, and the average switcher gets a double-digit raise simply for moving. For an engineer whose skill premium just more than doubled, the math is not subtle. They do not need to be unhappy to leave. They just need to do arithmetic.

If your compensation review cycle is annual and your retention strategy is a counteroffer written the week someone resigns, you have already lost. The counteroffer is the most expensive and least effective form of comp adjustment: it costs more than a proactive raise, it signals that you only pay market rates under duress, and it usually delays the departure rather than preventing it. Pricing AI engineering against the market — on a refresh cycle that matches how fast the market actually moves — is cheaper than the archaeology project that follows the exit.

The Departure Is a Six-Month Archaeology Project

Here is what makes the AI-engineer exit qualitatively worse than a typical one. When a backend engineer leaves, their work is mostly in the codebase. It is version-controlled, reviewed, and at least partially documented by its own structure. A new engineer can read it.

When a prompt expert leaves, a large fraction of their work is in their memory. The prompts are in the repo, but the reasons are not. The eval cases exist, but the failure mode each one guards is not recorded. The judge has a rubric, but the calibration history — what was tuned, when, and why — is gone. The new engineer inherits a system that works and has no idea why it works, which means they cannot safely change it. The first model migration after the departure becomes a forensic exercise: something breaks, and nobody knows whether it broke a guarantee the team deliberately built or an accident the team never noticed.

This is the bus factor problem, and AI systems have an unusually severe version of it because the knowledge is unusually implicit. The fix is not heroics. It is treating institutional knowledge as an artifact with the same review discipline as code:

  • Every prompt carries a docstring. Not a description of what it does — the diff shows that — but a statement of intent and a list of the incidents or eval failures that shaped its current form. The next engineer should be able to read why a sentence is there before deciding to delete it.
  • Every eval case carries provenance. A metadata block naming the failure mode it guards and the incident that produced it. A case with no provenance is a case a future cleanup will delete, taking a silent guarantee with it.
  • Every judge carries a versioned rubric and a calibration log. When the rubric changed, who changed it, and what the calibration looked like before and after. Judge drift is invisible without this record.
  • Every prompt repo has a CODEOWNERS file that names a successor. Not the original author — a second person who has actually reviewed changes and can answer questions. If the file names exactly one person, that is your bus factor, written down.
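Rules like these hold up best when a CI check enforces them instead of relying on memory. The sketch below assumes a hypothetical in-house eval format (an `EvalCase` record with `failure_mode` and `incident` fields) and a GitHub-style CODEOWNERS file, where each non-comment line is a path pattern followed by its owners. It flags eval cases with no provenance and ownership patterns that name fewer than two people. The record, field, and function names are illustrative, not any tool's API, and the CODEOWNERS parsing is deliberately simplified.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    # Hypothetical eval-case record; only the provenance fields matter here.
    case_id: str
    prompt_input: str
    expected: str
    failure_mode: str = ""  # the failure mode this case guards against
    incident: str = ""      # the incident, ticket, or postmortem that produced it


def missing_provenance(cases):
    """Return IDs of eval cases that lack a failure mode or an incident."""
    return [
        c.case_id
        for c in cases
        if not c.failure_mode.strip() or not c.incident.strip()
    ]


def single_owner_patterns(codeowners_text):
    """Return CODEOWNERS patterns that name fewer than two owners.

    Simplified parser: skips blank lines and lines starting with '#',
    treats the first token as the path pattern and the rest as owners.
    """
    flagged = []
    for raw in codeowners_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        if len(owners) < 2:
            flagged.append(pattern)
    return flagged


# Illustrative usage with made-up cases and owners.
cases = [
    EvalCase("tone-001", "...", "...", failure_mode="curt tone", incident="Q2 escalation"),
    EvalCase("retrieval-007", "...", "..."),
]
print(missing_provenance(cases))  # ['retrieval-007']

codeowners = """
# Prompt and eval ownership
/prompts/support/   @alice @bob
/prompts/retrieval/ @alice
/evals/             @carol @alice
"""
print(single_owner_patterns(codeowners))  # ['/prompts/retrieval/']
```

Failing the build on either list stops the guarantees from eroding quietly; a gentler rollout could report the counts first and only start blocking merges once the backlog is cleared.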

None of this depends on the original engineer staying. That is the entire point. The goal is to convert a departure from a six-month archaeology project into a three-week handoff.

Rotate the Knowledge Before Tenure Makes It Permanent

Documentation discipline reduces the damage of a departure. It does not, by itself, reduce the probability that the knowledge is concentrated in one person — because the person who writes the prompts naturally accumulates the context faster than they can write it down. Concentration is the default state. You have to actively work against it.

The lever is deliberate rotation. Before an engineer's tenure on a prompt or eval surface makes their knowledge irreversibly load-bearing, a second engineer should be doing real work on it: reviewing the non-trivial diffs, owning a slice of the eval suite, running a model migration with the first engineer as backup rather than driver. This is not free — it is slower in the short term, the way pair programming is slower in the short term — and it is the only thing that actually raises the bus factor above one.

Treat the single-owner prompt surface the way a mature codebase treats a file owned more than 80 percent by one author: as a flagged risk, surfaced on a dashboard, with an owner accountable for bringing the number down. The teams that survive a key departure are not the ones with the best documentation. They are the ones where the knowledge was already spread across two heads before anyone gave notice.
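The 80 percent signal can be approximated from version-control history. A minimal sketch, assuming prompts live as files in a git repository: `git blame --line-porcelain` emits an `author <name>` header for every line of a file, so counting those headers gives each author's share of the file's current lines. The helper names and the 0.8 default are illustrative choices, and surviving-line counts are only a rough proxy for who actually holds the context behind a prompt.

```python
import subprocess
from collections import Counter


def blame(path: str) -> str:
    """Run `git blame --line-porcelain` on one file (call inside a git checkout)."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout


def author_counts_from_blame(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Every blamed line carries an "author <name>" header. Headers such as
        # "author-mail" start with "author-", and file content lines start
        # with a tab, so neither matches this prefix.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def top_author_share(counts: Counter) -> tuple[str, float]:
    """Return the most frequent author and their fraction of all lines."""
    author, n = counts.most_common(1)[0]
    return author, n / sum(counts.values())


def concentration_flags(file_counts: dict, threshold: float = 0.8) -> dict:
    """Map each file whose top author exceeds `threshold` to (author, share)."""
    flagged = {}
    for path, counts in file_counts.items():
        if not counts:
            continue  # empty or untracked file: nothing to attribute
        author, share = top_author_share(counts)
        if share > threshold:
            flagged[path] = (author, round(share, 2))
    return flagged


# Illustrative usage with made-up counts; against a real repo you would build
# the dict as {p: author_counts_from_blame(blame(p)) for p in prompt_paths}.
example = {
    "prompts/support_summarizer.txt": Counter({"alice": 92, "bob": 8}),
    "prompts/retrieval.txt": Counter({"alice": 55, "carol": 45}),
}
print(concentration_flags(example))  # {'prompts/support_summarizer.txt': ('alice', 0.92)}
```

Running this on a schedule and charting the number of flagged files gives the accountable owner a number to bring down, which is the point of putting it on a dashboard.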

The Leadership Reframe

Most of the failure here traces to a single category error: treating AI engineers as a sub-class of software engineer who happens to import a different SDK. Under that framing, the existing ladder, the existing comp band, and the existing knowledge practices all seem adequate, because they work for software engineers and this is "just" a software engineer.

It is not. This is a role with its own promotion signals: small diffs with measured eval effects, judge calibrations that survived a migration, eval suites that caught a regression, postmortems where the AI dimension was the root cause. It has its own market dynamics, with a skill premium that more than doubled in a single year. And it has its own knowledge-preservation problem, because more of its work lives in memory than in the repo.

An organization that treats this role with the generic ladder will keep promoting the engineers whose work photographs well and keep losing the ones whose work is a 30-line diff that quietly held the product together. The half-life is not a law of nature. It is the predictable output of a system that was built for a different job. The recruiter who messaged your prompt expert this week is not the cause of the problem. They are just the part of it you can see.
