
The Production Logs Your Agent Cannot Read

· 9 min read
Tian Pan
Software Engineer

You wired your incident-response agent into Splunk. You gave it the query syntax in the system prompt, a tool to execute SPL, and a fresh API token. The first time it triaged a real page, it pulled the wrong logs, summarized the wrong service, and confidently named the wrong customer. The integration was perfect. The agent was useless.

Here is what you forgot. Fifteen years of log conventions, undocumented field names, severity strings that drifted from ERR to error to ERROR across three reorgs, and team-specific suffixes that turn customer_id into cust_id_v2_actual on the auth service and tenant.user.id on billing — none of that is in the prompt. You gave the agent access to the API. You did not give it access to the institutional knowledge that makes the API useful.

The shape of this failure is bigger than Splunk. It applies to any agent integration where the tool exposes a query language over a corpus the team has been shaping by hand for a decade. The agent has the verbs. It does not have the nouns.

Access Is Not Usable Access

There is a phase shift between "the agent can call this tool" and "the agent can ask the right questions of this tool." Most production integrations stop at the first one and assume the second comes for free.

It does not. Tool documentation in the typical MCP server publishes a name, a description, and an input schema. That is enough for the model to know how to format a call. It is not enough for the model to know what to ask. The schema tells you that query is a string. It does not tell you which fields exist in the index, which of them are populated reliably, which became deprecated in 2021 but still appear in 30% of records, or which team owns the convention that says "severity 4 means warning except in the legacy ingestion pipeline where it means fatal."

Knowing how to format a call is only half the integration. The half you shipped is the API contract: call shape, parameter types, response format. The half you skipped is the corpus contract: what the data actually contains, what its fields mean, and what an answerable question looks like.

Text-to-SQL teams hit this wall years ago and named it. Even top-performing models drop to 77% accuracy or worse on complex queries against real enterprise schemas, with the bulk of failures traceable to missing semantic context: column meanings, business definitions, valid filter values, relationships that exist socially but not in the foreign-key graph. Adding a semantic layer that explicitly encodes that context can lift accuracy past 99% on the same underlying database. The model did not get smarter. The corpus got readable.

Logs are worse than SQL. A SQL schema at least has typed columns. Logs have free-text messages with embedded JSON that someone half-structured in 2019 and forgot to finish.

How Tacit Knowledge Hides In Your Stack

Walk into the on-call channel during a real incident and watch what happens. A senior engineer types a Splunk query in eight seconds. It contains four field names you have never seen in any documentation. It joins against a lookup table you did not know existed. It filters by a magic string that is the name of a deprecated service the data pipeline still labels its rows with for backward compatibility.

That query is the artifact of a decade of pattern-matching. The engineer learned it by writing 4,000 queries, getting yelled at for the wrong ones, and slowly accumulating a private map between "the question I am being asked" and "the SPL that actually answers it on this corpus." Nobody wrote that map down. It exists in their head and the heads of three other people, two of whom have left.

Meta published an internal analysis last month of what happened when they tried to extend an AI coding assistant from incident pattern-matching to broader development tasks. The system worked for the first task because the patterns were already labeled — incident reports, root causes, fix diffs. It failed at the second because nobody had ever labeled the conventions. Two configuration modes used different field names for the same logical operation. Deprecated enum values had to stay because serialization compatibility silently depended on them. The team found over fifty non-obvious patterns of hidden intermediate naming conventions that no document in the company described.

That is what your logs look like. Not as artifacts of bad engineering, but as the accumulated geology of every shipping pressure your team ever absorbed. The agent does not see geology. It sees a flat surface and assumes the fields named in the manual are the fields populated in the data.

What Belongs In The Tool Surface

If you accept that access is half the integration, the other half has a name: the corpus contract. It is a first-class part of the tool surface, sitting alongside the call shape.

A useful corpus contract for a log query tool includes things the model would otherwise have to guess:

  • A queryable schema description. Not the formal index definition. The annotated one — which fields exist, which are reliably populated, which are sparse and why, which mean different things in different services, and which are vestigial. The annotations are where the value lives.
  • A sample-question library. Twenty to fifty pairs of "natural-language question" and "the SPL query that actually answers it on this corpus." This is the same trick that lifts text-to-SQL accuracy: in-context examples that show the model the local idiom rather than asking it to invent one. Pick examples that exercise the field-name traps deliberately.
  • A glossary of magic strings. The service names, environment labels, severity values, and tenant identifiers that appear as raw strings in the data. The agent will not guess "prod-us-east-1-legacy" from "production." Tell it.
  • Lookups and joins that are not in the index. If the engineer always cross-references against cmdb_hosts.csv to get the team owner, that lookup is part of the answer. Surface it as a callable tool or include it inline.
  • Anti-patterns. "Do not filter by host for service ownership questions, that field was abandoned in 2022 — use service.owner instead." Negative examples are cheap and very effective.
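
As a rough illustration of the list above, here is a minimal Python sketch of a corpus contract kept as data and rendered into text for the system prompt or tool description. The class names, the `render` format, and the `app_logs` index are hypothetical and invented for this example; the field names and magic strings are the ones mentioned in this post. It is one possible shape, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class FieldNote:
    name: str
    status: str  # e.g. "reliable", "sparse", "deprecated" (hypothetical labels)
    note: str = ""


@dataclass
class CorpusContract:
    fields: list[FieldNote] = field(default_factory=list)
    glossary: dict[str, str] = field(default_factory=dict)  # plain term -> raw string in the data
    examples: list[tuple[str, str]] = field(default_factory=list)  # (question, SPL)
    anti_patterns: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the contract as plain text the model can read in context."""
        lines = ["FIELDS:"]
        for f in self.fields:
            lines.append(f"- {f.name} [{f.status}] {f.note}".rstrip())
        lines.append("GLOSSARY:")
        for term, raw in self.glossary.items():
            lines.append(f'- "{term}" appears in the data as "{raw}"')
        lines.append("EXAMPLES:")
        for question, spl in self.examples:
            lines.append(f"- Q: {question}")
            lines.append(f"  SPL: {spl}")
        lines.append("DO NOT:")
        for rule in self.anti_patterns:
            lines.append(f"- {rule}")
        return "\n".join(lines)


contract = CorpusContract(
    fields=[
        FieldNote("service.owner", "reliable", "use for ownership questions"),
        FieldNote("host", "deprecated", "abandoned in 2022 for ownership"),
    ],
    glossary={"production": "prod-us-east-1-legacy"},
    examples=[
        (
            "Which team owns the failing service?",
            # "app_logs" is a made-up index name for illustration.
            "index=app_logs severity=ERROR | stats count by service.owner",
        )
    ],
    anti_patterns=[
        "Do not filter by host for service ownership questions; use service.owner."
    ],
)

prompt_context = contract.render()
print(prompt_context)
```

Keeping the contract as structured data rather than free prose means the same source can feed the prompt, a retrieval index, and any review tooling.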

The pattern here is the same one Splunk now formalizes for its own LLM integrations: get_splunk_fields to discover the fields available for a source type, get_splunk_lookups to expose the lookup tables, and a tool definition that describes each index along with its fields and values. Microsoft's Azure Copilot observability agent explicitly warns that investigation accuracy depends on whether your application emits complete telemetry with preserved correlation fields and sufficient service context. The vendors are converging on the same answer: the tool surface has to publish what the data calls things, not just how to query it.

The Audit Nobody Wants To Run

Try this exercise. Pick the ten most common questions your on-call rotation asks during an incident. For each one, ask a senior engineer to write down the exact query they would run, including every field name, lookup, and filter string. Then ask a junior engineer to write the same query using only your written documentation and your wiki.

The gap is the tacit-knowledge debt. It is also, almost exactly, the gap the agent will fall into.
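
One rough way to put a number on that gap is to diff the fields a senior engineer's query uses against the fields your documentation mentions. The sketch below assumes a crude heuristic (any token directly before `=` or `!=` counts as a field name) and a hypothetical query and documented-field set. It will miss lookups, `by` clauses, and eval expressions, so treat the output as a starting list for the audit rather than a complete one.

```python
import re

# Crude heuristic: a token immediately followed by = or != is treated as a field name.
FIELD_RE = re.compile(r"\b([A-Za-z_][\w.]*)\s*(?:!=|=)")


def fields_used(spl: str) -> set[str]:
    """Return the set of field-like tokens that appear before = or != in a query."""
    return set(FIELD_RE.findall(spl))


def tacit_gap(senior_spl: str, documented_fields: set[str]) -> set[str]:
    """Fields the senior engineer relies on that no document mentions."""
    return fields_used(senior_spl) - documented_fields


# Hypothetical inputs for illustration only.
senior_query = (
    "index=auth cust_id_v2_actual=* severity!=ERR "
    "| lookup cmdb_hosts.csv host OUTPUT team_owner"
)
documented = {"index", "severity", "customer_id", "host"}

gap = tacit_gap(senior_query, documented)
print(sorted(gap))
```

Here the only undocumented field the heuristic catches is `cust_id_v2_actual`. The `cmdb_hosts.csv` lookup is also tacit knowledge, but this regex does not see it, which is a reminder that the audit still needs a human pass.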

The findings are uncomfortable. You will discover that the field that actually holds the customer ID changed last quarter and the dashboard still works only because somebody added a coalesce. You will discover that "errors" in one service mean exceptions and in another mean failed health checks. You will discover that the lookup table the senior engineer joins against is a CSV in someone's home directory, exported manually from a system retired in 2023, refreshed when somebody remembers.

Most of your observability stack runs on this kind of knowledge. The dashboards work because the people who built them remember the workarounds. The runbooks work because the people who wrote them are the same people who read them. None of it survives contact with a fresh model that takes the schema documentation at face value.

You cannot ship an agent on top of this corpus without first making the implicit explicit. The audit is the work. The agent is the forcing function that exposes it.

Treating What The Corpus Calls Things As Context

Once you accept that the corpus has its own vocabulary, the architectural shift follows. The agent's context window is not just for instructions and reasoning. It is also for the local dialect of the data it operates on.

Concretely, this means a few things. The system prompt or retrieval layer carries a living schema annotation, version-controlled and reviewed when the data model shifts. A sample-question library lives next to the query tool, not in a separate wiki nobody updates. The glossary of magic strings is a file in the repo, owned by the team that owns the data, and changes go through review. When somebody adds a new service or renames a field, the corpus contract is part of the diff.
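
A minimal sketch of what "the corpus contract is part of the diff" could look like in practice: a check, run in review or CI, that every field a sample question depends on is still present and not deprecated in the annotated schema. The schema statuses, the example entries, and the `fields` key are all hypothetical and chosen for this illustration.

```python
def check_examples(schema: dict[str, str], examples: list[dict]) -> list[str]:
    """Flag sample questions that rely on unknown or deprecated fields."""
    problems = []
    for ex in examples:
        for name in ex["fields"]:
            status = schema.get(name)
            if status is None:
                problems.append(f"{ex['question']}: unknown field {name}")
            elif status == "deprecated":
                problems.append(f"{ex['question']}: deprecated field {name}")
    return problems


# Hypothetical annotated schema: field name -> status.
schema = {"service.owner": "reliable", "severity": "reliable", "host": "deprecated"}

# Hypothetical sample-question library; each entry declares the fields it uses.
examples = [
    {
        "question": "Who owns the failing service?",
        "spl": "severity=ERROR | stats count by service.owner",
        "fields": ["severity", "service.owner"],
    },
    {
        "question": "Which hosts are erroring?",
        "spl": "severity=ERROR | stats count by host",
        "fields": ["severity", "host"],
    },
    {
        "question": "Errors per tenant?",
        "spl": "severity=ERROR | stats count by tenant_id",
        "fields": ["severity", "tenant_id"],
    },
]

problems = check_examples(schema, examples)
for p in problems:
    print(p)
```

Failing the build when `problems` is non-empty means a field rename cannot land without someone also updating the examples the agent learns from.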

This is more work than wiring up the tool was. That is the lesson. The wiring takes an afternoon. The corpus contract takes a quarter, and it never finishes, and that is fine because the data never finishes either. Treat it as part of the integration's running cost, the way you treat database migrations as part of the running cost of a schema-driven application.

The teams that win at agentic observability are not the ones with the best models or the slickest MCP servers. They are the ones who already had the audit. They are the ones whose senior engineers were already half-documenting the local idiom because they were tired of explaining it to every new hire. They are the ones whose semantic layer was built for humans and now gets reused for machines.

The agent is not the new thing. The agent is the loud customer that finally forces you to write down what your most experienced engineers have been keeping in their heads. Build for the corpus, not just the API. The integration starts working when the agent stops needing to guess what your data is called.
