
2 posts tagged with "in-context-learning"


The Zero-Shot Wall: Why In-Context Examples Stop Working at Production Scale

· 8 min read
Tian Pan
Software Engineer

Most teams discover the zero-shot wall the same way: a new edge case breaks the model, they add an example to the prompt, it helps. Three months later they've got 40 examples, 6,000 tokens of context, the performance metrics haven't moved in weeks, and the prompt engineer who knows where every example came from just left the company.

Few-shot prompting is seductive because it works quickly. You observe a failure, you add a demonstration, the failure goes away. The feedback loop is tight and the wins feel free. What you don't notice is that each subsequent example is buying less than the last — and at some point you're spending tokens, latency, and cognitive overhead for improvements that round to zero.

This is the zero-shot wall: not a hard limit where performance drops off a cliff, but a zone of sharply diminishing returns where in-context learning has hit the ceiling of what it can accomplish for your task, and the only lever left is fine-tuning.
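One way to notice you're in that zone is to track the marginal gain of each added example rather than absolute accuracy. A minimal sketch, assuming you already have an eval harness that reports held-out accuracy at each shot count (`accuracies` and the 0.5-point threshold here are illustrative, not from the post):

```python
# Hypothetical sketch: detect the zero-shot wall by watching the marginal
# accuracy gain of each added in-context example, rather than the absolute
# score. accuracies[k] = held-out accuracy with k examples in the prompt.

def find_saturation_point(accuracies, min_gain=0.005):
    """Return the first k where adding example k bought less than min_gain."""
    for k in range(1, len(accuracies)):
        marginal = accuracies[k] - accuracies[k - 1]
        if marginal < min_gain:
            return k  # past this point, examples cost tokens, not accuracy
    return None  # still gaining; keep measuring

# Gains shrink from +4 points to +0.1 points per example:
curve = [0.80, 0.84, 0.86, 0.87, 0.873, 0.874]
print(find_saturation_point(curve))  # -> 4
```

The point of measuring it this way is that the tight add-an-example feedback loop hides the trend: each individual win looks real, but the marginal curve makes the flattening visible weeks before the absolute metric does.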

The Few-Shot Saturation Curve: Why Adding More Examples Eventually Hurts

· 9 min read
Tian Pan
Software Engineer

A team testing Gemini 3 Flash on a route optimization task watched their model score 93% accuracy at zero-shot. They added examples, performance climbed — and then at eight examples it collapsed to 30%. That's not noise. That's the few-shot saturation curve biting hard, and it's a failure mode most engineers only discover after deploying a prompt that held up at four examples and fell apart at twelve.

The intuition that more examples are strictly better is wrong. The data across 12 LLMs and dozens of task types shows three distinct failure patterns: steady plateau (gains flatten), peak regression (gains then crash), and selection-induced collapse (gains that evaporate when you switch example-retrieval strategy). Understanding which pattern you're in changes how you build prompts, when you give up on few-shot entirely, and whether you should be fine-tuning instead.
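The first two patterns can be read off a single accuracy-vs-shot-count curve; selection-induced collapse needs a second curve under a different retrieval strategy, so it's omitted here. A rough sketch with illustrative thresholds (not the post's data or methodology):

```python
# Hypothetical sketch: classify which few-shot failure pattern a task shows,
# given accuracies[k] = held-out accuracy with k in-context examples.
# Thresholds are illustrative placeholders.

def classify_curve(accuracies, plateau_eps=0.01, crash_drop=0.05):
    peak = max(accuracies)
    final = accuracies[-1]
    if peak - final >= crash_drop:
        return "peak regression"        # gains, then a crash past the peak
    if peak - accuracies[0] <= plateau_eps:
        return "no gain"                # few-shot never helped
    if abs(final - peak) <= plateau_eps:
        return "steady plateau"         # gains that flatten and hold
    return "mixed"

# The 93% -> 30% collapse described above reads as peak regression:
print(classify_curve([0.93, 0.95, 0.96, 0.96, 0.30]))  # -> peak regression
print(classify_curve([0.80, 0.85, 0.86, 0.86, 0.86]))  # -> steady plateau
```

A steady plateau means you can safely truncate the prompt at the knee of the curve; peak regression means more examples are actively dangerous and the shot count itself needs to be tuned and pinned.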