2 posts tagged with "machine-unlearning"

The User You Can't Delete: Right to Be Forgotten in AI Systems

July 5, 2026 · 9 min read

Tian Pan

Software Engineer

A deletion request lands in your queue. A user has invoked their right to erasure, and legally you have a month to make their personal data disappear. In a normal system this is a DELETE statement and a smug audit-log entry. In an AI system it is the moment you discover that this user's data does not live in one place: it has been smeared across a fine-tuned model's weights, baked into a vector index, cached in a dozen retrieval snapshots, and copied into last quarter's evaluation set. There is no single row to delete. The user is, in a very literal engineering sense, undeletable.

This is not a hypothetical. In March 2025 the European Data Protection Board launched a coordinated enforcement action across thirty national authorities focused specifically on the right to erasure. Regulators have converged on an uncomfortable position: including someone's data in training is processing, so Article 17 applies to the model, not just the database. The question every AI team eventually faces is whether output suppression — teaching the model to refuse to talk about someone — is enough, or whether you actually have to remove the influence of their data from the system. The honest answer is that most teams have never designed for either.
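One way to make the problem concrete is to track where each user's data has gone, so that an erasure request at least produces a plan. The sketch below is a minimal illustration, not a description of any real system: the `LineageLedger` class, the `ArtifactKind` categories, and the split between "directly deletable" artifacts and those needing a model-level remedy are all assumptions chosen to mirror the places listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArtifactKind(Enum):
    DATABASE_ROW = "database_row"
    VECTOR_INDEX = "vector_index"
    RETRIEVAL_SNAPSHOT = "retrieval_snapshot"
    EVAL_SET = "eval_set"
    MODEL_WEIGHTS = "model_weights"


# Hypothetical policy: these kinds can be erased by deleting records.
# Anything else (here, model weights) needs a heavier remedy such as
# retraining, unlearning, or output suppression.
DIRECTLY_DELETABLE = {
    ArtifactKind.DATABASE_ROW,
    ArtifactKind.VECTOR_INDEX,
    ArtifactKind.RETRIEVAL_SNAPSHOT,
    ArtifactKind.EVAL_SET,
}


@dataclass
class LineageLedger:
    """Records which artifacts contain data derived from which user."""

    # user_id -> set of (kind, artifact_id) pairs
    _entries: dict[str, set[tuple[ArtifactKind, str]]] = field(default_factory=dict)

    def record(self, user_id: str, kind: ArtifactKind, artifact_id: str) -> None:
        self._entries.setdefault(user_id, set()).add((kind, artifact_id))

    def erasure_plan(self, user_id: str) -> dict[str, list[str]]:
        """Split a user's artifacts into plain deletions and model-level work."""
        plan: dict[str, list[str]] = {"delete": [], "needs_model_remedy": []}
        entries = sorted(
            self._entries.get(user_id, set()),
            key=lambda e: (e[0].value, e[1]),
        )
        for kind, artifact_id in entries:
            bucket = "delete" if kind in DIRECTLY_DELETABLE else "needs_model_remedy"
            plan[bucket].append(artifact_id)
        return plan
```

A ledger like this does not answer the suppression-versus-removal question. It only ensures that the `needs_model_remedy` list is visible when the request arrives, rather than discovered afterwards.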

The Dataset License That Retroactively Poisoned Your Fine-Tune

June 2, 2026 · 10 min read

Tian Pan

Software Engineer

The fine-tuned checkpoint that has been running in production for nine months is now sitting in a Slack thread between your CTO and outside counsel. A data source that you scraped under what looked like a permissive license has changed its terms, sent a notice, and named your model. Your engineers want to know whether the model can simply be "untrained" on the offending records. Counsel wants to know whether the weights file itself is now a regulated artifact. Nobody on the call has a good answer, because your training pipeline treated the license as an event — read once at ingestion time — instead of a state that the world can edit after you have already paid for the H100s.

This is the failure mode that very few fine-tuning playbooks bother to discuss. The license under which a dataset was distributed is not a static gate that you walk through at ingestion. It is an ongoing claim by a third party that you do not control, and the half-life of that claim is shrinking. Hugging Face's own legal repository quietly logs DMCA takedowns against named datasets every few weeks — AoPS pulling the MATH benchmark, PaperDemon pulling scraped artwork, Archive of Our Own removing a fanfiction dump within hours of notice. Each takedown is a downstream signal that some model somewhere was trained on data whose redistribution rights have since evaporated.
