44 posts tagged with "security"

AI Content Provenance in Production: C2PA, Audit Trails, and the Compliance Deadline Engineers Are Missing

· 12 min read
Tian Pan
Software Engineer

When the EU AI Act's transparency obligations take effect on August 2, 2026, every system that generates synthetic content for EU-resident users will need to mark that content with machine-readable provenance. Most engineering teams building AI products are vaguely aware of this. Far fewer have actually stood up the infrastructure to comply — and of those that have, a substantial fraction have implemented only part of what regulators require.

The dominant technical response to "AI content provenance" has been to point at C2PA (the Coalition for Content Provenance and Authenticity standard) and declare the problem solved. C2PA is important. It's real, it's being adopted by Adobe, Google, OpenAI, Sony, and Samsung, and it's the closest thing to a universal standard the industry has. But a C2PA implementation alone will not satisfy EU AI Act Article 50. It won't survive your CDN. And it won't prevent bad actors from producing "trusted" provenance for manipulated content.

This post is about what AI content provenance actually requires in production — the technical stack, the failure modes, and the compliance gaps that catch teams off guard.
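To give a flavor of what machine-readable provenance involves, here is a deliberately simplified sketch: hash the generated content, attach generator metadata, and sign the claim so tampering is detectable. Real C2PA manifests use COSE signatures with X.509 certificate chains embedded in JUMBF boxes; the helper names below (`make_provenance_record`, `verify_provenance_record`) are illustrative, not part of any C2PA SDK.

```python
import hashlib
import hmac
import json

def make_provenance_record(content: bytes, generator: str, signing_key: bytes) -> dict:
    """Toy provenance record: content hash + generator metadata, HMAC-signed.
    Real C2PA uses COSE/X.509 signatures over JUMBF-embedded manifests."""
    claim = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,
        "ai_generated": True,  # the machine-readable disclosure Article 50 asks for
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}

def verify_provenance_record(content: bytes, record: dict, signing_key: bytes) -> bool:
    """Recompute hash and signature; any byte change in content breaks both."""
    if hashlib.sha256(content).hexdigest() != record["claim"]["content_sha256"]:
        return False
    payload = json.dumps(record["claim"], sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Note what this sketch already demonstrates about the CDN problem: the signature binds to exact bytes, so any re-encoding or optimization pass downstream invalidates it.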

The AI Output Copyright Trap: What Engineers Need to Know Before It's a Legal Problem

· 11 min read
Tian Pan
Software Engineer

When a large language model reproduces copyrighted text verbatim in response to a user prompt, who is legally responsible — the model provider, your company that built the product, or the user who typed the query? In 2026, courts are actively working through exactly this question, and the answers have consequences that land squarely on your production systems.

Most engineering teams have absorbed the basic narrative: "AI training might infringe copyright, but that's the model provider's problem." That narrative is wrong in two important ways. First, output-based liability — what the model produces at inference time — is largely distinct from training-data liability and remains an open legal question in most jurisdictions. Second, the contractual indemnification you think you have from your AI provider is probably narrower than you believe.

This post covers the practical risk surface for engineering teams: what verbatim memorization rates look like in production, how open source license contamination actually shows up in generated code, where enterprise AI agreements leave you exposed, and the engineering controls that meaningfully reduce liability without stopping AI adoption.
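One control worth previewing: measuring verbatim overlap between model output and a known corpus. A minimal sketch, using word-level n-gram shingles (the function name and the 8-gram window are assumptions for illustration; production systems typically hash shingles into an index rather than scanning raw text):

```python
def ngram_overlap(generated: str, corpus_text: str, n: int = 8) -> float:
    """Fraction of n-word shingles in `generated` that appear verbatim in
    corpus_text. High overlap flags likely memorized or license-contaminated
    output for human review before it ships."""
    gen_tokens = generated.split()
    if len(gen_tokens) < n:
        return 0.0
    corpus = " ".join(corpus_text.split())  # normalize whitespace
    shingles = [" ".join(gen_tokens[i:i + n]) for i in range(len(gen_tokens) - n + 1)]
    hits = sum(1 for s in shingles if s in corpus)
    return hits / len(shingles)
```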

The PII Leak in Your RAG Pipeline: Why Your Chatbot Knows Things It Shouldn't

· 10 min read
Tian Pan
Software Engineer

Your new internal chatbot just told an intern the salary bands for the entire engineering department. The HR director didn't configure anything wrong. No one shared a link they shouldn't have. The system just... retrieved it, because the intern asked about "compensation expectations for engineers."

This is the RAG privacy failure mode that most teams don't see coming. It's not a bug in the traditional sense—it's a fundamental mismatch between how retrieval works and how access control is supposed to work.
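The fix the salary-band story points at is retrieval-time permission filtering: the index may match a chunk, but it never reaches the prompt unless the requesting user could have opened the source document. A minimal sketch (the `Chunk` shape and group-based ACL are assumptions; real systems carry ACLs from the source system of record):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    allowed_groups: set = field(default_factory=set)  # ACL copied from the source doc

def filter_by_permission(chunks: list, user_groups: set) -> list:
    """Post-retrieval gate: drop any retrieved chunk whose source ACL does not
    intersect the requesting user's groups."""
    return [c for c in chunks if c.allowed_groups & user_groups]
```

The key property: semantic similarity and authorization are evaluated independently, so a well-phrased question about "compensation expectations" cannot widen access.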

The Privacy Architecture of Embeddings: What Your Vector Store Knows About Your Users

· 10 min read
Tian Pan
Software Engineer

Most engineers treat embeddings as safely abstract — a bag of floating-point numbers that can't be reverse-engineered. That assumption is wrong, and the gap between perception and reality is where user data gets exposed.

Recent research achieved over 92% accuracy reconstructing exact token sequences — including full names, health diagnoses, and email addresses — from text embeddings alone, without access to the original encoder model. These aren't theoretical attacks. Transferable inversion techniques work in black-box scenarios where an attacker builds a surrogate model that mimics your embedding API. The attack surface exists whether you're using a proprietary model or an open-source one.

This post covers the three layers of embedding privacy risk: what inversion attacks can actually do, where access control silently breaks down in retrieval pipelines, and the architectural patterns — per-user namespacing, retrieval-time permission filtering, audit logging, and deletion-safe design — that give your users appropriate control over what gets retrieved on their behalf.
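Two of those patterns, per-user namespacing and deletion-safe design, can be sketched together. The class below is a toy in-memory store (the API names are invented for illustration; real vector databases expose namespaces or metadata filters for the same purpose):

```python
class NamespacedVectorStore:
    """Toy per-user namespaced store: queries and deletes are scoped to one
    user, so one user's embeddings can never surface in another's retrieval."""

    def __init__(self):
        self._spaces = {}  # user_id -> {doc_id: (vector, text)}

    def upsert(self, user_id: str, doc_id: str, vector: list, text: str):
        self._spaces.setdefault(user_id, {})[doc_id] = (vector, text)

    def query(self, user_id: str, vector: list, k: int = 3) -> list:
        """Similarity search restricted to the caller's namespace."""
        space = self._spaces.get(user_id, {})
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(space.values(), key=lambda vt: dot(vt[0], vector), reverse=True)
        return [text for _, text in ranked[:k]]

    def delete_user(self, user_id: str):
        """Deletion-safe: dropping the namespace removes every embedding at once,
        which makes GDPR-style erasure requests tractable."""
        self._spaces.pop(user_id, None)
```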

Red-Teaming Consumer LLM Features: Finding Injection Surfaces Before Your Users Do

· 9 min read
Tian Pan
Software Engineer

A dealership deployed a ChatGPT-powered chatbot. Within days, a user instructed it to agree with anything they said, then offered $1 for a 2024 SUV. The chatbot accepted. The dealer pulled it offline. This wasn't a sophisticated attack — it was a three-sentence prompt from someone who wanted to see what would happen.

At consumer scale, that curiosity is your biggest security threat. Internal LLM agents operate inside controlled environments with curated inputs and trusted data. Consumer-facing LLM features operate in adversarial conditions by default: millions of users, many actively probing for weaknesses, and a stochastic model that has no concept of "this user seems hostile." The security posture these two environments require is fundamentally different, and teams that treat consumer features like internal tooling find out the hard way.

Sandboxing Agents That Can Write Code: Least Privilege Is Not Optional

· 12 min read
Tian Pan
Software Engineer

Most teams ship their first code-executing agent with exactly one security control: API key scoping. They give the agent a GitHub token with repo:read and a shell with access to a working directory, and they call it "sandboxed." This is wrong in ways that become obvious only after an incident.

The threat model for an agent that can write and execute code is categorically different from the threat model for a web server or a CLI tool. The attack surface isn't the protocol boundary anymore — it's everything the agent reads. That includes git commits, documentation pages, API responses, database records, and any file it opens. Any of those inputs can contain a prompt injection that turns your research agent into a data exfiltration pipeline.
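A minimal starting point, and explicitly not a real sandbox: run agent-written code in a separate interpreter with a scratch directory, an emptied environment, and a hard wall-clock timeout. The function name is illustrative; a production setup layers containers or VMs, seccomp filters, and a no-network egress policy on top of all three of these controls.

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute agent-generated Python in a throwaway working directory with no
    inherited environment (so no leaked tokens) and a hard timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "task.py")
        with open(script, "w") as f:
            f.write(code)
        proc = subprocess.run(
            [sys.executable, "-I", script],  # -I: isolated mode, ignores env/site dirs
            cwd=workdir,                     # no access to the caller's working tree
            env={},                          # no inherited credentials or API keys
            capture_output=True,
            text=True,
            timeout=timeout,                 # raises TimeoutExpired on runaway code
        )
        return proc.stdout
```

Even this toy version blocks the most common incident shape: an injected instruction that tells the agent to `cat ~/.aws/credentials`, which fails here because the child process inherits neither the home-relative environment nor the parent's secrets.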

Text-to-SQL at Scale: What Nobody Tells You Before Production

· 11 min read
Tian Pan
Software Engineer

Text-to-SQL demos are deceptively easy to build. You paste a schema into a prompt, ask GPT-4 a question, get back a clean SELECT statement, and suddenly your Slack is full of "what if we built this into our data platform?" messages. Then you try to actually ship it. The benchmark says 85% accuracy. Your internal data team reports that about half the answers are wrong. Your security team asks who reviewed the generated queries before they hit production. Nobody has a good answer.

This is the gap between text-to-SQL as a research problem and text-to-SQL as an engineering problem. The research problem is about getting models to produce syntactically valid SQL. The engineering problem is about schema ambiguity, access control, query validation, and the fact that your enterprise database looks nothing like Spider or BIRD.
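The query-validation piece of that engineering problem can be sketched as a pre-execution gate: read-only statements only, one statement per request, and every referenced table on an allowlist. This version is regex-based for brevity and deliberately naive; a production gate would parse the AST (for example with a library like sqlglot) instead.

```python
import re

# Reject anything that mutates state, even inside an otherwise SELECT-shaped query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate)\b", re.I
)

def validate_generated_sql(sql: str, allowed_tables: set) -> bool:
    """Gate model-generated SQL before it touches the database."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.lower().startswith("select"):
        return False
    if FORBIDDEN.search(stripped):
        return False
    tables = re.findall(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)", stripped, re.I)
    return all(t.lower() in allowed_tables for t in tables)
```

Even as a sketch, the table allowlist is the part that matters most in practice: it converts "the model hallucinated a join against the payroll table" from a data breach into a rejected query.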

Agent Identity and Delegated Authorization: OAuth Patterns for Agentic Actions

· 10 min read
Tian Pan
Software Engineer

When an AI agent books a calendar event, sends an email, or submits a form, it isn't acting on its own identity — it's acting under delegated authority from a human who said "go do this." That distinction sounds philosophical until an agent leaks sensitive data, takes an irreversible action the user didn't intend, or gets compromised. At that point, the question isn't what happened but who authorized it, when, and can we revoke it.

The blast radius of poorly scoped agent credentials is larger than most teams realize. An agent authenticated with broad API access isn't one point of failure — it's a standing invitation. In 2025, agentic AI CVE counts jumped 255% year-over-year, and most incidents traced back to credentials that were too broad, too long-lived, or impossible to revoke cleanly. Building agents right means designing the authorization layer before you hit production.
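The shape of a well-scoped grant, short-lived, narrowly scoped, revocable, can be sketched in a few lines. This mirrors the result of an OAuth 2.0 token exchange (RFC 8693) in miniature; the class and function names are invented for illustration, not a real authorization server API.

```python
import time
from dataclasses import dataclass

@dataclass
class DelegatedGrant:
    """A per-task grant: who delegated, which scopes, when it expires,
    and a revocation switch that works immediately."""
    user: str
    scopes: frozenset
    expires_at: float
    revoked: bool = False

    def allows(self, scope: str) -> bool:
        return (not self.revoked) and time.time() < self.expires_at and scope in self.scopes

def grant_for_task(user: str, scopes, ttl_seconds: float = 300) -> DelegatedGrant:
    """Short-lived by default: the agent re-requests authority per task
    instead of holding a standing credential."""
    return DelegatedGrant(
        user=user,
        scopes=frozenset(scopes),
        expires_at=time.time() + ttl_seconds,
    )
```

The design point is that every `allows` check answers the incident-response questions up front: who authorized it (`user`), what exactly (`scopes`), until when (`expires_at`), and can we revoke it (`revoked`).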

Prompt Injection at Scale: Defending Agentic Pipelines Against Hostile Content

· 10 min read
Tian Pan
Software Engineer

A banking assistant processes a customer support chat. Embedded in the message—invisible because it's rendered in zero-opacity white text—are instructions telling the agent to bypass the transaction verification step. The agent complies. By the time the anomaly surfaces in logs, $250,000 has moved to accounts the customer never touched.

This isn't a contrived scenario. It happened in June 2025, and it's a precise illustration of why prompt injection is the hardest unsolved problem in production agentic AI. Unlike a chatbot that produces text, an agent acts. It calls tools, sends emails, executes code, and makes API requests. When its instructions get hijacked, the blast radius isn't a bad sentence—it's an unauthorized action at machine speed.

According to OWASP's 2025 Top 10 for LLM Applications, prompt injection now ranks as the #1 critical vulnerability, present in over 73% of production AI deployments assessed during security audits. Every team building agents needs a coherent threat model and a defense architecture that doesn't make the system useless in the name of safety.

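One narrow but concrete defense against the zero-opacity channel in that incident: strip invisibly styled text from HTML before it reaches the model. A minimal sketch using the standard-library parser; it closes only the hidden-CSS channel (and doesn't handle CSS classes, external stylesheets, or unbalanced void tags inside hidden regions), so injections in plainly visible text still need model- and policy-level defenses.

```python
import re
from html.parser import HTMLParser

HIDDEN_STYLE = re.compile(
    r"opacity\s*:\s*0|display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0",
    re.I,
)

class VisibleTextExtractor(HTMLParser):
    """Collect only text outside elements styled to be invisible."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if self.hidden_depth or HIDDEN_STYLE.search(style):
            self.hidden_depth += 1  # everything nested under a hidden element is hidden

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

def visible_text(html: str) -> str:
    p = VisibleTextExtractor()
    p.feed(html)
    return "".join(p.parts)
```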

The Insider Threat You Created When You Deployed Enterprise AI

· 9 min read
Tian Pan
Software Engineer

Most enterprise security teams have a reasonably well-developed model for insider threats: a disgruntled employee downloads files to a USB drive, emails a spreadsheet to a personal account, or walks out with credentials. The detection playbook is known — DLP rules, egress monitoring, UEBA baselines. What those playbooks don't account for is the scenario where you handed every one of your employees a tool that can plan, execute, and conceal multi-stage operations at machine speed. That's what deploying AI coding assistants and RAG-based document agents actually does.

The problem isn't that these tools are insecure in isolation. It's that they dramatically amplify what a compromised or malicious insider can accomplish in a single session. The average cost of an insider incident has reached $17.4 million per organization annually, and 83% of organizations experienced at least one insider attack in the past year. AI tools don't introduce a new threat category — they multiply the capability of every threat category that already exists.

The Minimal Footprint Principle: Least Privilege for Autonomous AI Agents

· 10 min read
Tian Pan
Software Engineer

A retail procurement agent inherited vendor API credentials "during initial testing." Nobody ever restricted them before the system went to production. So when an off-by-one bug started placing errant orders, nothing stopped it: the agent had full ordering authority — permanently, with no guardrails. By the time finance noticed, $47,000 in unauthorized vendor orders had gone out. The code was fine. The model performed as designed. The blast radius was a permissions problem.

This is the minimal footprint principle: agents should request only the permissions the current task requires, avoid persisting sensitive data beyond task scope, clean up temporary resources, and scope tool access to the task's current intent. It is the Unix least-privilege principle adapted for a world where your code makes runtime decisions about what it needs to do next.

The reason teams get this wrong is not negligence. It is a category error: they treat agent permissions as a design-time exercise when agentic AI makes them a runtime problem.
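What "runtime problem" means in code: authority is acquired inside a task scope and revoked when the scope exits, even on failure. A minimal sketch; the broker name and permission strings are invented for illustration, and a real system would enforce this at the credential layer rather than in-process.

```python
from contextlib import contextmanager

class PermissionBroker:
    """Runtime permission broker: the agent holds authority only inside a
    task scope, and everything granted is released when the task exits."""

    def __init__(self):
        self.active = set()

    @contextmanager
    def task_scope(self, *permissions):
        granted = set(permissions) - self.active
        self.active |= granted
        try:
            yield self
        finally:
            self.active -= granted  # cleanup runs even if the task raises

    def check(self, permission: str) -> bool:
        return permission in self.active
```

Had the procurement agent above run under this pattern, its ordering authority would have expired with the testing task instead of persisting into production.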

Prompt Injection Detection at 100,000 Requests Per Day: Why Simple Defenses Break and What Actually Works

· 11 min read
Tian Pan
Software Engineer

Most teams discover their prompt injection defense is broken after a user finds it, not before. You add "ignore all previous instructions" to your blocklist and ship. Three months later an attacker encodes the payload in Base64, or buries instructions in HTML comments retrieved via RAG, or uses typoglycemia ("ignroe all prevuois insrtucioins"), and your entire defense evaporates. The blocklist doesn't help because prompt injection has an unbounded attack surface — there is no closed vocabulary of malicious inputs.

At low traffic volumes you can absorb the cost of calling a second LLM to validate each request. At 100,000 requests per day, that math becomes ruinous and the latency becomes user-visible. This post is about what the architecture looks like when brute-force approaches stop working.
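A first tier of that architecture, sketched below: canonicalize the cheap evasions (surface HTML-comment contents, decode base64-looking tokens) before a fast pattern pass. The names and patterns are illustrative; this raises attacker cost as a cheap pre-filter, but as the post argues, it does not make a blocklist sufficient on its own.

```python
import base64
import binascii
import re

INJECTION_PATTERNS = re.compile(r"ignore (all )?previous instructions", re.I)

def normalize(text: str) -> str:
    """Undo cheap encodings so the pattern pass sees the payload:
    expose HTML-comment contents and decode base64-looking tokens."""
    text = re.sub(r"<!--(.*?)-->", r" \1 ", text, flags=re.S)
    decoded_parts = [text]
    for token in re.findall(r"\b[A-Za-z0-9+/]{16,}={0,2}", text):
        try:
            decoded_parts.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid base64 text; ignore
    return " ".join(decoded_parts)

def cheap_screen(text: str) -> bool:
    """Tier one of a layered pipeline: fast, high-precision, low-recall.
    Suspicious hits escalate to heavier (e.g. classifier-based) tiers."""
    return INJECTION_PATTERNS.search(normalize(text)) is not None
```

Note that the typoglycemia variant still sails through this tier, which is exactly the point: at 100k requests/day, the cheap tier exists to triage load for the expensive detectors, not to replace them.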