
The Five Definitions of 'Now' Inside Your LLM Prompt

Tian Pan
Software Engineer

A customer support agent told a user "based on our latest pricing, as of today" and quoted last quarter's price sheet. The system prompt interpolated "Today is {current_date}" correctly. The retrieval layer pulled the document with the highest freshness score. The model answered confidently. Every component did exactly what it was specified to do, and the user got a wrong answer that the on-call engineer could not reproduce because, by the time they replayed the trace at 9pm, "today" was a different day.

This is not a rare bug. It is a failure mode that lives in almost every production LLM pipeline because "now" is implicit in the prompt at five different layers, and those layers were authored at different times, by different people, against different definitions of the present. As long as a request runs synchronously from a foreground user session, the layers mostly agree. The moment the request is replayed for debugging, batch-processed overnight, run from an eval harness pinned in March, or queued and consumed an hour later, the layers start disagreeing — and the model produces an answer that is internally consistent within its prompt but externally wrong.

The teams that ship this bug do not lack rigor. They lack a vocabulary for treating "now" as a first-class variable. The fix is not "add the date to the system prompt" — most teams already did that. The fix is recognizing that there are at least five distinct meanings of "the current time" inside a single LLM call, naming each of them, and reconciling them at the seams.

Layer 1: The system prompt's interpolated date

Almost every production prompt today contains something like "Today is {current_date}" or "The current date is {now}". This is the most visible time variable, and it is also the most innocuous — until you ask what current_date is the current date of.

The user's wall clock? The orchestrator's wall clock? The request enqueue time, or the dequeue time? The eval harness's frozen anchor? Most teams never make this distinction explicit because for the dominant case — a foreground request handled within a few hundred milliseconds — all of these are within tolerance. The bug hides until something stretches the time between request creation and prompt construction. A retried request that sat in a dead-letter queue for two days will interpolate "today" as the day it was retried, while the user-supplied "what happened yesterday" still refers to two days before the original request. The prompt is now lying to itself.

The discipline is to treat "current_date" not as a string but as a tuple: (timestamp, source, timezone). The source matters because a prompt built from request_enqueue_time produces stable behavior under replay, while one built from Date.now() at prompt-construction time does not. Pick one explicitly, document the choice, and stop calling it "today."
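Concretely, the tuple can be a small value object built once at ingest and passed down the pipeline. A minimal sketch; TimeAnchor and TimeSource are illustrative names, not from any particular framework:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class TimeSource(Enum):
    REQUEST_ENQUEUE = "request_enqueue_time"          # stable under replay
    PROMPT_CONSTRUCTION = "prompt_construction_time"  # drifts under replay
    EVAL_ANCHOR = "eval_anchor_time"                  # pinned by the eval harness

@dataclass(frozen=True)
class TimeAnchor:
    """'Today' as data: a timestamp plus where it came from."""
    timestamp: datetime
    source: TimeSource
    tz: str  # the user's IANA timezone, e.g. "America/New_York"

    def render(self) -> str:
        # The prompt string names its source so a replayed trace can explain itself.
        return (f"The current date is {self.timestamp.date().isoformat()} "
                f"(anchor: {self.source.value}, timezone: {self.tz}).")

# Built once at ingest and carried with the request instead of calling now() later.
anchor = TimeAnchor(
    timestamp=datetime(2026, 5, 10, 14, 3, tzinfo=timezone.utc),
    source=TimeSource.REQUEST_ENQUEUE,
    tz="America/New_York",
)
```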

Layer 2: The few-shot examples' implicit dates

Few-shot examples almost always carry temporal context, even when the prompt author did not realize they were embedding any. An example that says "Q4 2026 forecasting" anchors the model to a fiscal context that may be obsolete six months later. An example that uses concrete dates ("the user asked on March 14, the answer references events through March 13") teaches the model a relative-date pattern that becomes wrong every time those exact dates pass.

The pernicious version is the example that seems timeless but isn't. "The user wants to renew their subscription" carries no explicit date, but the example response references "the upcoming pricing change" — a phrase that was correct when the example was authored but stopped being correct when the pricing change shipped. Three months later, the model is producing answers grounded in an event the company no longer considers upcoming.

The discipline is to version few-shot examples with their authoring date, treat them as code that rots, and rebuild them on a cadence rather than letting them age silently. A "few-shot freshness" metric — the median age of the examples currently in your prompt — is a one-line dashboard that surfaces decay before the model behavior does.
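The metric itself is a few lines. A sketch, assuming each example record carries the authored_on date it was written:

```python
from datetime import date
from statistics import median

# Illustrative schema: each few-shot example records when it was authored.
few_shot_examples = [
    {"authored_on": date(2025, 11, 2), "prompt": "...", "response": "..."},
    {"authored_on": date(2026, 1, 15), "prompt": "...", "response": "..."},
    {"authored_on": date(2026, 4, 20), "prompt": "...", "response": "..."},
]

def few_shot_freshness_days(examples, as_of: date) -> float:
    """Median age, in days, of the examples currently in the prompt."""
    return median((as_of - ex["authored_on"]).days for ex in examples)

# Emit this as a gauge; alert when the median age crosses your rebuild cadence.
print(few_shot_freshness_days(few_shot_examples, as_of=date(2026, 5, 10)))
```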

Layer 3: The retrieval layer's freshness window

If you are using RAG, you have a third definition of "now" that lives in your retrieval scorer. A freshness-weighted retriever ranks documents by some combination of semantic similarity and recency, and the recency score is computed against something — usually now() at retrieval time. Recent research on freshness in RAG has shown that ranking with only semantic proximity works for timeless facts but fails the moment a question is time-sensitive, and the canonical fix is to fuse semantic similarity with a recency prior so newer items outrank older ones for time-bounded queries.

That fix introduces a new variable. The recency prior is anchored to a clock — and that clock is now the third "now" in your pipeline. Most implementations silently use now() at retrieval time, which means the freshness-ranked results for a request being replayed at 9pm look different from the freshness-ranked results for the same request at 9am, even when the underlying corpus has not changed. Document timestamps drift past staleness thresholds; a piece of evidence that was "recent" from the user's perspective becomes "old" from the replay's perspective.

The deeper failure is when the retriever's "recent" disagrees with the system prompt's "today." The user asks about "the most recent pricing as of today," the system prompt says today is May 10, but the retriever scores recency against the request enqueue time three days earlier and surfaces a document that was "recent" then. The model dutifully describes this old document as the latest, and the user gets misinformation that the trace will fail to explain. Engineers debugging RAG systems in production have written about this as the silent failure mode that dwarfs hallucination because every step "worked."

The discipline is to log the freshness window used per query so post-hoc you can ask "what was 'recent' from the perspective of this request" — and to pass an explicit as_of_time to the retriever instead of letting it call now().
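A sketch of what that looks like at the retrieval seam. The exponential half-life decay is one common choice of recency prior, and the updated_at and embedding field names are assumptions; the point is that as_of arrives as a parameter rather than from a call to now():

```python
import logging
import math
from datetime import datetime

log = logging.getLogger("retrieval")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def freshness_score(doc_updated_at: datetime, as_of: datetime,
                    half_life_days: float = 30.0) -> float:
    """Exponential recency prior anchored to an explicit as_of, never now()."""
    age_days = max((as_of - doc_updated_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-math.log(2) * age_days / half_life_days)

def rank(docs, query_embedding, as_of: datetime, alpha: float = 0.7):
    """Fuse semantic similarity with the recency prior; log the anchor per query."""
    log.info("retrieval as_of=%s", as_of.isoformat())
    scored = [
        (alpha * cosine(query_embedding, d["embedding"])
         + (1 - alpha) * freshness_score(d["updated_at"], as_of), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, key=lambda t: t[0], reverse=True)]
```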

Layer 4: The base model's training cutoff

The base model has its own implicit "now," fixed at the data cutoff date of its pretraining or last fine-tune. This is the most stable layer — it does not move during a single deployment of a single model — but it is also the layer most teams forget, and the one whose disagreement with the other layers is hardest to fix.

The cutoff matters because it determines what the model "knows" without retrieval. A model trained through January 2026 will confidently make claims about events through January 2026, hedge gently about February, and refuse or hallucinate after that. If your system prompt says "Today is May 10, 2026" and your retrieval layer doesn't surface anything about a recent product launch, the model will fall back on its priors — and its priors think it is January. The model writes "the upcoming Q2 launch" when the launch shipped two months ago.

A widely noticed failure mode is the agent that, when asked for the current date with no other context, returns its training cutoff year instead of the interpolated date the system prompt provided. The system prompt was correct; the model just trusted its priors more than the prompt. The fix here is not to give up on prompt-level date injection but to make the disagreement loud — explicit instructions like "If your training data suggests a different year than the system prompt's date, the system prompt is authoritative" measurably reduce this failure mode in published reports.
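A sketch of that instruction as a reusable prompt fragment; the wording here is illustrative, not quoted from any report:

```python
# Illustrative system-prompt fragment; the exact wording is an assumption.
# {current_date} is filled in by the orchestrator at prompt-construction time.
DATE_AUTHORITY_CLAUSE = (
    "The current date is {current_date}. "
    "If your training data suggests a different year than this date, "
    "the date above is authoritative; say so explicitly when the two conflict."
)
```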

The discipline is to track the spread between training cutoff and request time as a deployment metric. When that spread exceeds your model's working tolerance — typically a few months — you need stronger retrieval coverage to compensate, or the model will quietly substitute its prior for your data.
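A sketch of that spread as a deployment gauge; the cutoff date, tolerance, and print statement are placeholders for whatever your model registry and metrics stack provide:

```python
from datetime import datetime, timezone

# Illustrative values; in practice these come from the model registry and config.
MODEL_TRAINING_CUTOFF = datetime(2026, 1, 31, tzinfo=timezone.utc)
WORKING_TOLERANCE_DAYS = 90

def cutoff_spread_days(request_time: datetime) -> int:
    """Days between the deployed model's training cutoff and the request."""
    return (request_time - MODEL_TRAINING_CUTOFF).days

spread = cutoff_spread_days(datetime(2026, 5, 10, tzinfo=timezone.utc))
if spread > WORKING_TOLERANCE_DAYS:
    # Past this point the model's priors are stale by more than your tolerance,
    # so retrieval has to carry any fact newer than the cutoff.
    print(f"training-cutoff spread is {spread} days; check retrieval coverage")
```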

Layer 5: The user's own time references

The fifth layer is the messiest because it is in the input you don't control. A user who says "this morning's meeting," "next week's deadline," or "the contract we signed last quarter" is encoding their own implicit "now" into the prompt — and that "now" is the time they typed the message, which is not necessarily the time the request reaches the model.

For a foreground chat session, the gap is small enough to ignore. For an async job — a user who emails an agent, a long-running batch task, a scheduled report — the gap becomes hours or days. The user's "this morning" was Tuesday morning. The agent processes the request Wednesday afternoon. The model interprets "this morning" against the system prompt's interpolated date (Wednesday) and confidently answers about an event the user did not mean.

This is the layer where most teams give up because "the user's intent" feels intractable. It is not. The discipline is to capture the user's submission timestamp at ingest, propagate it as an explicit field in the prompt — the user submitted this message at {user_submission_time} — and instruct the model to resolve relative references against that timestamp rather than the request processing time. The model does this surprisingly well when the variable is named and the instruction is explicit; the failure was never reasoning ability, it was missing context.
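A sketch of that propagation, assuming the orchestrator records the submission time and the user's timezone at ingest; the function and field names are illustrative:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def build_user_block(message: str, submitted_at: datetime, user_tz: str) -> str:
    """Carry the user's 'now' alongside their words, and say what to resolve against."""
    local = submitted_at.astimezone(ZoneInfo(user_tz))
    return (
        f"The user submitted this message at {local.isoformat()} ({user_tz}).\n"
        "Resolve relative time references ('this morning', 'last quarter') against "
        "that timestamp, not against the current processing time.\n\n"
        f"User message:\n{message}"
    )

# A Tuesday-morning message processed Wednesday afternoon still means Tuesday.
block = build_user_block(
    "Can you summarize this morning's meeting?",
    submitted_at=datetime(2026, 5, 5, 13, 12, tzinfo=timezone.utc),
    user_tz="America/New_York",
)
```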

What this looks like in practice

A prompt that has been audited for temporal coherence does not have one date variable. It has several, each named, each sourced from a documented place, each consistent with the others by construction:

  • request_enqueue_time — when the user's request entered the system, used for resolving the user's relative time references
  • prompt_construction_time — when the prompt was assembled, used for retrieval freshness scoring (passed explicitly to the retriever, not implicitly via now())
  • eval_anchor_time — present only in eval contexts, pinned to make a six-month-old eval still score meaningfully
  • corpus_freshness_window — the recency window the retriever applied, logged per query
  • model_training_cutoff — a static fact about the deployed model, surfaced in the prompt so the model can flag when its priors might disagree with provided data
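Collected in one place, these variables can travel as a single context object instead of scattered now() calls. A minimal sketch using the same names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass(frozen=True)
class TemporalContext:
    """Every 'now' in the request, named and carried together."""
    request_enqueue_time: datetime          # resolves the user's relative references
    prompt_construction_time: datetime      # passed explicitly to the retriever
    model_training_cutoff: datetime         # static fact about the deployed model
    corpus_freshness_window: timedelta      # recency window the retriever applied
    eval_anchor_time: Optional[datetime] = None  # pinned only in eval runs

    def effective_now(self) -> datetime:
        # Evals reason about a fixed moment; production resolves against enqueue time.
        return self.eval_anchor_time or self.request_enqueue_time
```

Anything downstream that needs a clock takes this object as an argument, which is what makes replay, batch, and eval runs deterministic by construction.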

The eval harness deserves special attention. Eval datasets accumulate over time and become tribal knowledge — a 2024 golden dataset still scoring an LLM in 2026 is a common pattern, and best practices now suggest marking golden rows stale after 90 days unless re-verified. But staleness in the dataset is only half the problem; the other half is that the eval prompt itself contains a "current date" that, if interpolated against the harness run time, makes a March-authored eval fail in June for reasons that have nothing to do with model regression. Pinning eval_anchor_time to the dataset's authoring date lets the eval reason about a fixed moment instead of drifting with the wall clock.
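A sketch of that pinning at load time, assuming each golden row carries an authored_on date and an optional last_verified_on date (both field names are illustrative):

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)

def load_eval_case(case: dict, harness_run_date: date) -> dict:
    """Pin the eval's 'today' to the dataset's authoring date, not the run date."""
    authored = case["authored_on"]
    verified = case.get("last_verified_on", authored)
    return {
        **case,
        "eval_anchor_time": authored,                           # prompt interpolates this
        "stale": (harness_run_date - verified) > STALE_AFTER,   # flag, do not silently score
    }

case = {"authored_on": date(2026, 3, 1), "prompt": "...", "expected": "..."}
print(load_eval_case(case, harness_run_date=date(2026, 6, 15)))
```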

A regression test worth adding: re-run a prompt with a date-shifted wrapper and assert the answer changes only where it should. Shift "today" forward by a year and see what breaks. The bugs that surface — few-shot examples that hardcoded a date, retrievers that scored against the wrong anchor, model responses that confidently extrapolate from a now-stale prior — are all bugs that would have shipped to production unnoticed.
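A sketch of such a test; run_pipeline is a hypothetical entry point standing in for whatever assembles the prompt, calls retrieval, and returns the answer, and the asserted fields are equally illustrative:

```python
from datetime import datetime, timedelta, timezone

def test_answer_stable_under_date_shift():
    """Shift 'now' forward a year and assert only date-dependent output changes."""
    anchor = datetime(2026, 5, 10, tzinfo=timezone.utc)
    shifted = anchor + timedelta(days=365)
    question = "What is the current enterprise price?"

    baseline = run_pipeline(question, as_of_time=anchor)   # hypothetical entry point
    moved = run_pipeline(question, as_of_time=shifted)

    # The quoted price should not change just because the clock moved;
    # the interpolated date in the prompt should.
    assert moved.quoted_price == baseline.quoted_price
    assert moved.prompt_date != baseline.prompt_date
```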

The architectural realization

"Now" is a multi-meaning variable in your prompt the same way "session" or "user" are. A team that lets current_user mean five different things across five layers would not ship that pipeline. The same team routinely ships five different definitions of current_time and is surprised when the agent says "as of today" while quoting last quarter's prices.

The temporal layers are not adversaries of the prompt author — they are working as designed. Each one is doing its local job with its local clock. The bug is at the seams between them, and the fix is not at any single layer. It is the discipline of naming "now" explicitly, sourcing each instance from a documented place, and making the layers reconcile by construction rather than by coincidence.

The triggers for this bug are the boring operational events the prompt author never modeled: replay, batch, eval, async, retry. They are also the events that production traffic generates by default. A pipeline that only behaves correctly when every clock is within a few hundred milliseconds of every other clock is a pipeline that works during demos and fails during operations. Treat "now" as plural. Pass it as data. Reconcile it at the seams. The model will stop time-traveling when you stop pretending it has only one clock.
