Skip to main content

One post tagged with "redaction"

View all tags

The PII Redactor Whose Own Training Corpus Was the Leak Vector

· 9 min read
Tian Pan
Software Engineer

A team stands up a fine-tuned redaction model in front of their log pipeline. It strips names, emails, account numbers, and IP addresses before anything lands in long-term storage. The model is small, fast, and easy to deploy alongside the ingestion workers. The privacy review approves it. Six months later a customer support engineer pastes a strange-looking log line into a debugging tool, and the redactor produces an output that contains a real customer's email address — one that does not appear anywhere in the input.

The pipeline did exactly what it was built to do. The redactor was the leak.