Interview Mode vs. Task Mode: The Unspoken Contract Your Agent Keeps Breaking
Open any agent's customer feedback channel and you will find two complaints, both loud, both common, and both blamed on the model. The first sounds like "it asks too many questions before doing anything." The second sounds like "it just runs off and does the wrong thing without checking with me." Product teams hear those as opposite problems and ship opposite fixes — tighten the system prompt to ask fewer questions, then loosen it again next quarter when the other complaint gets louder. Neither change works for long, because the two complaints are not really about questions or actions. They are about a contract the user picked silently and the agent failed to honor.
Every conversation with an agent operates in one of two implicit modes. Interview mode is the contract where the user expects the agent to extract requirements before doing anything substantive — clarifying questions are welcome, premature execution is the failure. Task mode is the contract where the user has already done the thinking, has a specific plan in mind, and expects the agent to execute on the available context, asking only when truly blocked — questions are friction, half-baked execution is the failure.
Users do not announce which mode they are in. They expect the agent to read it from the message, the conversation history, and the situation, and they punish the agent harshly when it gets it wrong. The fix for "asks too many questions" and "didn't ask enough questions" is the same fix: make mode a first-class concept in your agent, detect it from signals you can actually see, and surface it to the user when you are unsure.
The Two Modes Users Already Have in Their Heads
Watch a few real users open a session with a coding agent or a research assistant. The interview-mode user types something like "help me build something to track my reading list" and leans back. They have a fuzzy goal, no schema, and an expectation that the agent will walk them through it the way a thoughtful collaborator would: ask about the data, the surface area, the constraints, then propose. If the agent immediately scaffolds a Postgres schema and a React app, the user feels run over — not impressed.
The task-mode user, in the same product, types "add a priority field to the reading list model and migrate existing rows to medium". They have already decided. They want execution. If the agent responds with "What priority levels would you like? Low/medium/high, or numeric? Should null be allowed during migration? Do you want to add an index?" the user reads it as the agent stalling. They wrote a specific instruction; the agent's job is to act, not to host a requirements workshop.
Industry research backs this up at scale. One commonly cited customer-service study found that 55% of users dislike chatbots asking repetitive questions, and "the bot kept asking the same questions" appears at the top of nearly every chatbot-failure list. The mirror complaint — "it didn't ask, it just guessed and got it wrong" — shows up just as consistently in agent-coding postmortems, where teams discover that their agent shipped a feature based on the first plausible interpretation of a vague prompt. Both are real failure modes. They just happen to the opposite users.
The trap for builders is that the two modes look like personality traits. "Our agent is too nosy." "Our agent is too cocky." So teams reach for personality knobs — a sentence in the system prompt nudging the model toward more or fewer questions. That moves the needle for one cohort and breaks it for the other. The actual variable is not personality. It is which contract the user came in expecting.
Signals That Should Drive Mode Selection
If mode is a contract, then mode detection is a classifier — and like any classifier, it works best when you choose features deliberately instead of hoping the LLM figures it out. Several signals discriminate well between interview and task contracts, and they are signals you already have:
- Specificity of nouns. "a thing to track my reading" is interview-mode language. "the ReadingList model" is task-mode language. Concrete identifiers — file paths, function names, ticket IDs, table names — strongly correlate with users who already have a plan.
- Verbs of intent. "help me figure out", "I'm thinking about", "how should I" are deliberation verbs. "add", "rename", "deploy", "fix" are execution verbs. The mix in a single message is one of the cleanest signals available.
- Constraints in the prompt. A message that includes acceptance criteria, a target system, or a "must not" clause is almost always task mode. A message with no constraints is more often interview mode.
- Conversation position. The first message of a session skews toward interview mode (the user is introducing a goal). A message that follows a plan or a clarification answer skews hard toward task mode (the user has now picked a path).
- Message length and structure. Three-line bulleted requests with code blocks are task mode. One-sentence open questions are interview mode. The user has already done the formatting work that signals their stance.
- Presence of pasted artifacts. A pasted error, log, traceback, or diff is a near-certain task-mode signal. The user is not asking whether to do something; they want help executing on the artifact.
None of these are perfect, and you do not need them to be. The point is that an agent's mode classifier should run on real signals — many of them cheap and deterministic — instead of being implicitly delegated to whatever the model feels like doing this turn. The literature on intent ambiguity, including recent work on confidence-thresholded clarification, is consistent: route low-confidence inputs to a clarification path, route high-confidence inputs to action. The bug in most agents is that they have no threshold at all.
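To make that concrete, here is a minimal sketch of a deterministic signal extractor in Python. Every phrase list, regex, and field name is an illustrative assumption, not a vetted feature set; the point is only that most of these signals can be computed cheaply, without a model call:

```python
import re
from dataclasses import dataclass

# Illustrative placeholders drawn from the signals above, not tuned values.
# Trailing spaces on verbs avoid substring hits like "address".
DELIBERATION_PHRASES = ("help me figure out", "i'm thinking about", "how should i")
EXECUTION_VERBS = ("add ", "rename ", "deploy ", "fix ")
# Rough matches for dotted paths (app.models), PascalCase identifiers
# (ReadingList), and ticket references (#123).
IDENTIFIER_RE = re.compile(r"\b\w+\.\w+\b|\b[A-Z][a-z]+[A-Z]\w*\b|#\d+")
# Rough matches for pasted artifacts: Python tracebacks, git diffs, JS stack frames.
ARTIFACT_RE = re.compile(r"Traceback \(most recent call last\)|^diff --git|^\s+at \w+", re.M)

@dataclass
class ModeSignals:
    concrete_identifiers: int
    execution_verbs: int
    deliberation_phrases: int
    has_pasted_artifact: bool
    has_constraints: bool
    is_first_message: bool

def extract_signals(message: str, turn_index: int) -> ModeSignals:
    lower = message.lower()
    return ModeSignals(
        concrete_identifiers=len(IDENTIFIER_RE.findall(message)),
        execution_verbs=sum(lower.count(v) for v in EXECUTION_VERBS),
        deliberation_phrases=sum(lower.count(p) for p in DELIBERATION_PHRASES),
        has_pasted_artifact=bool(ARTIFACT_RE.search(message)),
        has_constraints="must not" in lower or "acceptance criteria" in lower,
        is_first_message=turn_index == 0,
    )
```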
A reasonable starting policy: if the classifier's task-mode confidence is above a high bar (say, 0.8), execute. If it is below a low bar (say, 0.4), open with a single, well-targeted clarifying question. If it lands in the gray zone, surface the ambiguity to the user explicitly instead of guessing.
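A sketch of that policy, building on the hypothetical extractor above. The linear scoring function and its weights are placeholders standing in for whatever classifier you actually train; the two-threshold routing structure is the part that matters:

```python
# A hand-weighted linear score is enough to demonstrate the policy; a real
# system would likely swap in a small trained classifier here.
def task_mode_confidence(s: ModeSignals) -> float:
    score = 0.5
    score += 0.10 * min(s.concrete_identifiers, 3)
    score += 0.10 * min(s.execution_verbs, 2)
    score -= 0.15 * min(s.deliberation_phrases, 2)
    score += 0.20 if s.has_pasted_artifact else 0.0
    score += 0.10 if s.has_constraints else 0.0
    score -= 0.10 if s.is_first_message else 0.0
    return max(0.0, min(1.0, score))

EXECUTE_THRESHOLD = 0.8   # above: act on available context
CLARIFY_THRESHOLD = 0.4   # below: ask one well-targeted question

def route(message: str, turn_index: int) -> str:
    conf = task_mode_confidence(extract_signals(message, turn_index))
    if conf >= EXECUTE_THRESHOLD:
        return "execute"        # task mode: act, narrate briefly
    if conf <= CLARIFY_THRESHOLD:
        return "clarify"        # interview mode: open with one question
    return "surface_ambiguity"  # gray zone: state your guess, offer the escape hatch
```

The gray zone is deliberately wide to start; you narrow it as the mode-mismatch metrics discussed below give you evidence about where the thresholds actually belong.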
The Mode-Transition UX
Detection is half the job. The other half is showing the user which contract you think you are in, so they can correct you cheaply when you are wrong.
The most successful pattern is what coding agents have converged on: a visible plan mode that is a different surface from execution. When Cursor or Claude Code enters plan mode, the agent is allowed to ask, propose, and revise — but it cannot touch files. The user sees the contract reflected in the UI: a markdown plan they can edit, a confirmation step before any action, a clear handoff into "do it." The plan itself becomes the artifact of the interview, and the transition into task mode is a discrete, user-acknowledged event. That is mode-as-contract, made visible.
You do not need a separate mode toggle to get this benefit. The same idea works inline:
- Lead with the mode. The agent's first sentence should make the contract obvious. "I'll ask three quick questions before drafting anything" and "I'll go ahead and add the field with sensible defaults — let me know if you want to override" tell the user which contract you picked. If you picked wrong, they push back in one turn instead of ten.
- Offer a one-line escape hatch. Whenever you guess, give the user the cheap escape: "…or if you'd rather just specify the priority levels, paste them and I'll skip the questions." This is what mode-negotiation UX looks like in a chat interface: making it free to swap contracts mid-conversation.
- Acknowledge mode shifts explicitly. When the user transitions ("ok, let's just do low/medium/high, ship it"), the agent should mark the shift — "Switching to execution; I'll add the field now." Silent transitions are how agents end up halfway between modes, asking another question after the user has already approved action.
- Use the plan as a contract artifact. A short bulleted plan, presented before action, lets the user audit the agent's interpretation in seconds. If it is wrong, they edit the plan, not the agent's personality. Plan mode works because the plan is the externalized contract — it is reviewable, versionable, and steerable in a way that a vibe is not.
Notice what these patterns have in common: they make the contract observable. The single biggest reason mode mismatches feel so bad is that the user cannot see the agent's stance until something has gone wrong. Surfacing the stance is much cheaper than getting the stance perfectly right.
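One way to implement that commitment in code is to hold mode as explicit session state and change it only on an explicit user signal, announcing every change. A minimal sketch, with hypothetical trigger phrases standing in for a real high-confidence classifier:

```python
from enum import Enum

class Mode(Enum):
    INTERVIEW = "interview"
    TASK = "task"

# Hypothetical phrases that signal an explicit mode shift; in practice these
# would come from the same classifier, gated to high confidence.
TO_TASK = ("just do it", "ship it", "go ahead", "looks good, proceed")
TO_INTERVIEW = ("hold on", "ask me first", "don't write code yet")

class SessionMode:
    """Commit to a mode at the session boundary; change it only on an
    explicit user signal, and announce every change."""

    def __init__(self, initial: Mode):
        self.mode = initial

    def maybe_transition(self, user_message: str) -> str | None:
        lower = user_message.lower()
        if self.mode is Mode.INTERVIEW and any(p in lower for p in TO_TASK):
            self.mode = Mode.TASK
            return "Switching to execution; I'll make the change now."
        if self.mode is Mode.TASK and any(p in lower for p in TO_INTERVIEW):
            self.mode = Mode.INTERVIEW
            return "Pausing execution; I'll confirm the plan with you first."
        return None  # no explicit signal: stay committed, do not oscillate
```

Returning the acknowledgement string from the transition itself is the design choice worth copying: the agent cannot silently change stance, because the stance change and its announcement are one operation.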
When Mode Negotiation Looks Like a Personality Problem
Once you start watching for it, a striking number of "the agent is too X" complaints turn out to be mode-negotiation failures wearing a personality mask. A few patterns that usually trigger the mistaken diagnosis:
- "It asks the same things every time" often means the agent is opening every session in interview mode regardless of how specific the message is. The fix is signal-driven mode detection, not a "ask fewer questions" instruction in the prompt — that one will swing it too far in the other direction for the next user.
- "It made assumptions instead of asking" often means the agent is in task mode when the user expected interview mode. The fix is a higher confidence threshold for execution, plus an inline plan that the user can audit before action lands.
- "It feels indecisive" often means the agent is oscillating — asking, then half-acting, then asking again — because mode is being re-evaluated turn by turn with no commitment. The fix is to pick a mode at the boundary and stay in it until the user explicitly negotiates a transition.
- "It overexplains" often means the agent is in interview mode and treating the explanation as part of the requirements-gathering. In task mode, the same agent should narrate less, act more, and report what it did.
Diagnosing these as personality problems sends teams down dead ends: tone tuning, persona prompts, "be more concise" instructions. None of those address the root cause, which is that the agent never decided which contract it was operating under.
What to Instrument
Most agent observability dashboards measure satisfaction, latency, and tool-call success. Almost none measure mode mismatches, which is why mode-negotiation bugs persist invisibly for quarters. A few metrics worth adding:
- Mode-correction turns. Count the turns where the user explicitly redirects the agent's stance ("just do it", "wait, ask me first", "don't write code yet"). High rates point straight at a misclassifying mode detector.
- Time-to-first-action. Long delays between the opening message and the first concrete action correlate with interview mode. If your task-mode-classified sessions show long delays, you are mis-detecting and interrogating users who wanted execution.
- Plan-edit rate. When users frequently edit the plan before approving it, the interview was incomplete or the agent's interpretation was off. When users approve plans without edits and then complain about the result, the plan was too vague to function as a real contract.
- Mode-flip rate within a session. Agents that flip mode mid-conversation without an explicit user signal are oscillating — investigate why the classifier is unstable.
Tracking these numbers turns "users complain about the personality" into a debuggable signal. You can A/B threshold values, prompt phrasings, and UI affordances against measurable mode-mismatch rates instead of vibes.
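As a starting point for the first metric, here is a sketch of a mode-correction counter over a session's user turns. The phrase list is an assumption and deliberately crude; a small labeled classifier would replace it in production:

```python
# Hypothetical redirect phrases; deliberately crude. The signal is still
# useful at this fidelity because you care about the trend, not the level.
CORRECTION_PHRASES = (
    "just do it", "stop asking", "wait, ask me first",
    "don't write code yet", "you didn't ask", "why did you assume",
)

def mode_correction_rate(user_turns: list[str]) -> float:
    """Fraction of user turns that explicitly redirect the agent's stance."""
    if not user_turns:
        return 0.0
    corrections = sum(
        any(p in turn.lower() for p in CORRECTION_PHRASES)
        for turn in user_turns
    )
    return corrections / len(user_turns)
```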
Mode Is a Contract, Not a Personality
The framing shift that pays off is treating mode as something the agent and user negotiate, not something the model embodies. A well-designed agent is not nosy or cocky; it is in interview mode or task mode, with a visible stance and a cheap path to switch. The complaints stop being about what the agent is and start being about whether the contract was right — which is a problem you can actually solve.
Two practical takeaways for the next sprint. First, write down which contract each surface of your agent is supposed to default to (a one-shot completion endpoint is task mode by default; a long-running planning session is interview mode by default), and audit whether your prompts and UI actually reflect that. Second, instrument mode-correction turns this week. The number is almost always higher than teams guess, and once you can see it, the personality knobs stop looking so attractive — and the real fix, mode as a first-class contract, starts to look obvious.
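For the first takeaway, the written-down defaults can be as simple as a reviewable mapping, reusing the Mode enum from the state-machine sketch above; the surface names here are hypothetical:

```python
# Hypothetical per-surface defaults, written down where they can be
# reviewed and A/B tested rather than implied by prompt wording.
SURFACE_DEFAULT_MODE = {
    "one_shot_completion": Mode.TASK,    # one message, action expected
    "planning_session": Mode.INTERVIEW,  # long-running, goal still fuzzy
    "error_triage": Mode.TASK,           # pasted artifact implies execution
}
```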
- https://www.uxtigers.com/post/intent-ux
- https://www.smashingmagazine.com/2026/02/designing-agentic-ai-practical-ux-patterns/
- https://www.uxmatters.com/mt/archives/2025/12/designing-for-autonomy-ux-principles-for-agentic-ai.php
- https://arxiv.org/abs/2510.12015
- https://arxiv.org/html/2502.13069v1
- https://aclanthology.org/2024.emnlp-industry.28.pdf
- https://www.servicetarget.com/blog/ai-customer-support-chatbot-problems-solutions
- https://cursor.com/docs/agent/plan-mode
- https://cursor.com/blog/plan-mode
- https://lucumr.pocoo.org/2025/12/17/what-is-plan-mode/
- https://code.claude.com/docs/en/best-practices
- https://medium.com/@mr.murga/enhancing-intent-classification-and-error-handling-in-agentic-llm-applications-df2917d0a3cc
