Skip to main content

One post tagged with "owasp"

View all tags

Your System Prompt Will Leak: Designing for Prompt Extraction

· 10 min read
Tian Pan
Software Engineer

The threat model for LLM features over-indexes on three failure modes: prompt injection, user-data exfiltration, and unauthorized tool calls. There is a quieter attack that lands more often, costs less to mount, and shows up in fewer postmortems because nobody filed one — prompt extraction. An adversarial user, sometimes a competitor, sometimes a curious researcher, walks the model into reciting its own system prompt over a handful of turns. The carefully tuned instructions that encode your team's product behavior, refusal policy, retrieval scaffolding, and brand voice land in a public GitHub repository within the week.

The repositories already exist. A widely-circulated GitHub project tracks extracted system prompts from Claude, ChatGPT, Gemini, Grok, Perplexity, Cursor, and v0.dev — updated as new model versions ship, often within hours of release. Anthropic's full Claude prompt clocks in at over 24,000 tokens including tools, and you can read it. The companies most invested in prompt secrecy are the ones whose prompts leak most reliably, because they are also the ones whose attackers are most motivated.