Adding a Modality Is a Privacy-Classification Event, Not a Feature Flag

11 min read
Tian Pan
Software Engineer

A product manager pings the AI team on a Tuesday: "Customers want to paste screenshots into the support agent. Should be a small lift, right? The model already takes images." The eng lead checks the SDK, confirms the vision endpoint accepts JPEGs and PNGs, ships the change behind a feature flag, and rolls it to ten percent. Two weeks later, the legal team forwards a regulator letter asking why a user's bank statement, an image of their driver's license, and a screenshot containing another customer's order ID all appeared in the agent's training-eligible logs. Nobody on the AI team flagged the modality change, because nobody thought a modality change was a change. The privacy review that approved the text agent never re-ran for the image variant — and the image variant turned out to live under entirely different consent, retention, and residency rules.

This is not a story about a careless engineer. It is a story about a category error built into how most teams ship AI features. Text input is a known data class with a stable threat model: the user types, the user sees what they typed, the engineering team has years of habit around what to log and what to drop. Images are a different data class with a different threat model — they smuggle in metadata the user cannot see, capture surrounding content the user did not intend to share, and create storage and processing footprints with their own residency and contract terms. Treating "now with vision" as a UX iteration, when it is actually a privacy-classification event, is how teams discover at the regulator's request that their PII inventory understated their actual exposure by an order of magnitude.

What an Image Actually Carries

When a user "just pastes a screenshot," the bytes that hit your inference provider contain at least four distinct payloads, only one of which the user consciously chose to send.

The first is the visible image content — what they meant to share. The second is EXIF and embedded metadata: GPS coordinates with meter-level precision, device serial numbers, capture timestamps, the photo-editing software they used, and on some camera apps the user's cloud-account identifier. Phones strip some of this on certain share paths and not others; an image dropped from the Files app on iOS retains data that the same image shared via the Photos picker would have had stripped. Your application has no way to know which path the user took.

The third is adjacent content the user did not notice. A screenshot of an error message also captures the browser tab strip, the bookmark bar, the notification overlay that just popped up with a 2FA code, and the unrelated Slack DM in the corner. A photo of a paper invoice also captures the corner of the next document on the desk. A screenshot of a chat with your own product also captures the prior conversation in that chat — including, on consumer messaging apps, other contacts' identifiers and message previews. Recent research on multimodal inference shows that frontier models can geolocate user-uploaded images to within a few kilometers by reasoning over street layouts, signage, and architectural cues alone — meaning even "innocuous" images are de-anonymizing in ways the user does not anticipate.

The fourth is content the user uploaded "just to test." Every team that ships image input discovers, within the first month, that some users will paste their driver's license, their passport, a medical bill, or a screenshot of an authentication app to see what the AI does with it. The test-upload pattern is universal and you cannot prompt your way out of it.

Each of these payloads has different sensitivity, different retention implications, and different consent requirements. The text version of your feature governed exactly one of them.

The Consent You Wrote for Text Does Not Stretch to Images

Most AI teams inherit their data-handling rules from the consent flow they wrote for text. That consent typically says something like: "We process your messages to provide and improve the service. Messages may be retained for thirty days for abuse and quality review." The user reads "messages" as "the things I typed." When you add image input under the same flow, that mental model breaks in three places.

First, the data class itself is different. Under GDPR, images that contain identifiable faces are biometric data and may be a special category requiring explicit, separate consent. Images of identity documents trigger different rules in different jurisdictions — some require notification to a regulator if you store them at all. Photographs of children processed by an AI system have triggered fines under both GDPR and U.S. state laws. The consent that covered "your typed messages" almost certainly does not cover "any photograph or screenshot you decide to upload, including photographs of third parties who never consented to your service."

Second, the retention implications scale differently. A thirty-day text-message retention costs almost nothing and rarely re-surfaces. A thirty-day image retention at the same user volume can mean petabytes of storage, image-specific scanning obligations under various jurisdictions' CSAM and illegal-content statutes, and a much larger blast radius if the storage is breached. Several large incidents in the past year have involved AI image services that retained user uploads in misconfigured object stores; the retention window was the same as their text logs, but the consequences of a leak were several orders of magnitude worse.

Third, the inference provider's contract may differ by modality. Many providers offer text-only zero-retention modes that do not extend to image endpoints, or that route image traffic through a different region with different sub-processors. Your DPA with the provider was likely scoped to the endpoints your team named at signing. Adding a new endpoint to your call graph is a contract event, not just a code change. Teams that do not check this find out during a procurement audit that their image traffic was processed in a region their data-residency commitments to enterprise customers explicitly excluded.

What an Image Preprocessing Layer Actually Has to Do

Once you accept that image input is a different data class, the engineering response is not "add validation." It is a preprocessing layer that sits between user upload and inference call, with several responsibilities that have no analog on the text path.

Strip metadata before the bytes leave your trust boundary. Re-encode every uploaded image through a routine that drops EXIF, ICC profiles, XMP, and any embedded thumbnails. Do this even if your inference provider claims to ignore metadata, because your own logs and any downstream training selection will see what you stored. The re-encode also normalizes file format, which removes a class of polyglot-file attacks where an image is a valid PNG and a valid script in another parser.
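A minimal sketch of that re-encode step, assuming Pillow; the function name, the allowed-format list, and the choice of PNG as the normalized output are illustrative choices, not a prescribed implementation.

```python
# A minimal sketch, assuming Pillow; names and format choices are illustrative.
from io import BytesIO

from PIL import Image

ALLOWED_FORMATS = {"JPEG", "PNG"}

def strip_and_reencode(raw_bytes: bytes) -> bytes:
    """Decode the upload, drop all embedded metadata, re-encode to one format."""
    img = Image.open(BytesIO(raw_bytes))
    img.load()  # force a full decode so truncated or polyglot files fail here
    if img.format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported upload format: {img.format}")
    if img.mode not in ("RGB", "RGBA", "L"):
        img = img.convert("RGB")
    # Rebuilding the image from raw pixel data leaves EXIF, ICC profiles, XMP,
    # and embedded thumbnails behind; only the raster survives.
    clean = Image.frombytes(img.mode, img.size, img.tobytes())
    out = BytesIO()
    clean.save(out, format="PNG")  # normalize the container format as well
    return out.getvalue()
```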

Detect and act on adjacent-content patterns. This is harder than metadata stripping and the right answer depends on your product. Options range from light-touch (run a small classifier that flags screenshots and warns the user "we noticed this looks like a screenshot — please confirm it does not contain other people's information") to heavy (run an OCR pass on every upload, scan the extracted text for PII patterns, and either redact or refuse). The trade-off is latency, cost, and false-positive rate against blast-radius reduction. For high-stakes domains, the OCR-and-scan path is increasingly table stakes; tools like Microsoft Presidio's image redactor and Azure's image-analysis OCR pipelines exist precisely because every team that ships vision input rediscovers this need.
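A light-touch version of the OCR-and-scan path might look like the sketch below, using pytesseract for extraction; the regexes and the warn-versus-refuse decision are placeholders, and a production system would lean on a real PII engine such as the Presidio tooling mentioned above.

```python
# A light-touch sketch: OCR the upload and count PII-looking matches to decide
# whether to warn or refuse. Regexes and thresholds here are illustrative only.
import re
from io import BytesIO

import pytesseract
from PIL import Image

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_like": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def pii_hits(raw_bytes: bytes) -> dict[str, int]:
    """Count PII-looking matches in the text OCR'd out of the upload."""
    text = pytesseract.image_to_string(Image.open(BytesIO(raw_bytes)))
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}

def needs_user_confirmation(raw_bytes: bytes) -> bool:
    # Any hit triggers the "please confirm this does not contain other
    # people's information" prompt; refusing outright is the stricter option.
    return any(pii_hits(raw_bytes).values())
```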

Treat ID documents and faces as a separate handling class. A face detector and a document classifier running before inference let you route those inputs to a stricter pipeline — or refuse them outright if your product has no business reason to accept them. The teams that get into the most trouble are the ones that accept anything because "the model can handle it" and then discover after the fact that "handle it" included the model summarizing the user's social security number into a downstream system's logs.
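A sketch of that routing decision: a cheap face detector runs before any inference call and splits uploads into handling classes. The OpenCV Haar cascade here is a stand-in for whatever detector you actually deploy, and the class names are made up for illustration.

```python
# A routing sketch using an OpenCV Haar cascade as a stand-in face detector;
# the handling-class names are illustrative.
import cv2
import numpy as np

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def contains_faces(raw_bytes: bytes) -> bool:
    gray = cv2.imdecode(np.frombuffer(raw_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise ValueError("upload is not a decodable image")
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def handling_class(raw_bytes: bytes) -> str:
    """Decide which pipeline this upload is routed to before inference."""
    if contains_faces(raw_bytes):
        return "strict"   # stricter consent, retention, and human-review rules
    return "default"      # the ordinary preprocessing path
```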

Log what you stored, not just what you processed. The audit trail for image input has to record the image's hash, the preprocessing decisions made, the redactions applied, and the storage region — not just the final tokens that went to the model. When the regulator asks what images you held on a given date, "we don't keep that information" is not an acceptable answer in most jurisdictions.
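One way to shape that audit record, sketched with illustrative field names; where the log lands (SIEM, append-only store) will depend on your stack.

```python
# A sketch of the audit record described above; field names are illustrative.
import hashlib
import json
import logging
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

audit_log = logging.getLogger("image_audit")

@dataclass
class ImageAuditRecord:
    image_sha256: str         # hash of the bytes actually stored, not the raw upload
    received_at: str          # ISO-8601 UTC timestamp
    preprocessing: list[str]  # e.g. ["exif_stripped", "reencoded_png"]
    redactions: list[str]     # e.g. ["ocr_redacted:ssn_like"]
    storage_region: str       # where the stored copy physically lives
    retention_days: int

def record_stored_image(stored_bytes: bytes, steps: list[str], redactions: list[str],
                        region: str, retention_days: int) -> None:
    record = ImageAuditRecord(
        image_sha256=hashlib.sha256(stored_bytes).hexdigest(),
        received_at=datetime.now(timezone.utc).isoformat(),
        preprocessing=steps,
        redactions=redactions,
        storage_region=region,
        retention_days=retention_days,
    )
    audit_log.info(json.dumps(asdict(record)))
```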

The Audit Slice You Have to Build Before Launch

Text agents typically get audited by sampling conversations and reading the transcripts. Image agents need a parallel audit slice that almost no team builds before launch, and almost every team scrambles to build after their first incident.

The slice has three parts. First, a sampled review of inbound images with humans looking at what users actually uploaded — not the model's interpretation, the raw bytes. The first time your compliance team does this, they will find ID documents, screenshots of competitors' internal tools, photos of medical records, and at least one image of a whiteboard from another company's office. This is not a hypothetical; it is what every team that runs the audit finds in the first hundred samples.
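A sketch of pulling that sample, assuming two hypothetical adapters over your image store; the detail the code encodes is that reviewers receive the raw bytes, not the model's labels.

```python
# A sampling sketch; list_stored_image_keys and fetch_raw_bytes are assumed
# adapters over whatever store actually holds the uploads.
import random

SAMPLE_SIZE = 100

def build_review_batch(list_stored_image_keys, fetch_raw_bytes) -> list[tuple[str, bytes]]:
    keys = list(list_stored_image_keys())
    sampled = random.sample(keys, min(SAMPLE_SIZE, len(keys)))
    # Hand reviewers the raw bytes; pre-filtering by the model's own labels
    # would make the audit inherit the model's blind spots.
    return [(key, fetch_raw_bytes(key)) for key in sampled]
```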

Second, a metadata-leak audit that confirms the preprocessing layer is actually doing its job. Run a periodic check that re-extracts EXIF from a sample of stored images and alerts if any GPS, device, or user-identifier fields survive. Preprocessing pipelines silently break — a library upgrade changes default behavior, a routing rule sends some traffic past the stripper, a new upload path bypasses the layer entirely.
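A sketch of that periodic check, assuming Pillow; the tag list covers the obvious offenders, and the alert is logged where a real system would page on-call.

```python
# A periodic leak-audit sketch; tag IDs are standard EXIF tags.
import logging
from io import BytesIO

from PIL import Image

logger = logging.getLogger("metadata_leak_audit")

SENSITIVE_EXIF_TAGS = {
    0x8825: "GPSInfo",
    0xA431: "BodySerialNumber",
    0xA430: "CameraOwnerName",
    0x013B: "Artist",
}

def leaked_exif_fields(stored_bytes: bytes) -> list[str]:
    exif = Image.open(BytesIO(stored_bytes)).getexif()
    return [name for tag, name in SENSITIVE_EXIF_TAGS.items() if tag in exif]

def run_leak_audit(samples: dict[str, bytes]) -> None:
    for key, raw in samples.items():
        leaked = leaked_exif_fields(raw)
        if leaked:
            # In production this should page, not just log.
            logger.error("stored image %s retained EXIF fields: %s", key, leaked)
```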

Third, a cross-modality consistency check. The text agent and the image agent share a model, share a memory, and share a tool catalog. If the user uploads a screenshot of another customer's data and the agent acts on it as authoritative, you have created a privacy and integrity incident at the same time. Periodic red-team exercises that test "what does the agent do when an image disagrees with the conversation it's in" should be standing fixtures, not one-time launch checks.
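One standing check from that red-team suite might look like the sketch below, assuming a hypothetical run_agent harness whose result exposes the tool calls the agent issued; the shape of the harness is an assumption, the assertion is the point.

```python
# A red-team check sketch; run_agent, its result shape, and the fixture
# identifiers are hypothetical test-harness pieces.
def image_cannot_speak_for_another_customer(run_agent, screenshot_bytes: bytes) -> bool:
    """Upload a screenshot referencing another customer's order and verify the
    agent does not treat it as authoritative for the logged-in user."""
    result = run_agent(
        user_id="customer-a",
        messages=["What's the status of my order?"],
        images=[screenshot_bytes],
    )
    # Describing what the image shows is fine; issuing a tool call against an
    # account other than the authenticated one is the failure this check catches.
    return not any("customer-b" in str(call.arguments) for call in result.tool_calls)
```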

The Org Change That Has to Land With the Code Change

The reason this is an organizational story and not just an engineering one is that the privacy review process at most companies is gated on data-class changes, not on UI changes. A new endpoint that accepts a new data type triggers a review. A new feature flag that turns on an existing capability does not. Vision input ships, in practice, as the second of those — and the data-class change rides along invisibly.

The fix is to name "input modality added" as a privacy-classification event in the same checklist your team uses for adding a new third-party data source or expanding a region. The merge gate should require: a data-class re-classification ticket signed by privacy and security, a contract review confirming the inference provider's terms cover the new endpoint, an updated retention policy with the storage-cost and breach-blast-radius numbers attached, and a preprocessing-layer design doc reviewed by someone who has shipped one before.

This sounds heavy until you compare it to the cost of the alternative. Regulator letters, incident response, and the engineering time spent retroactively scrubbing logs from before the preprocessing layer existed routinely run into months of team output. The lighter-weight discipline of treating modality additions as classification events costs a few engineering days per addition and prevents the entire class.

The architectural realization is the part worth carrying into the next planning meeting. The list of input modalities your AI feature accepts is not a UX choice and not a model-capabilities choice. It is the most load-bearing line in your privacy posture, because each entry on it defines a separate consent regime, a separate retention policy, a separate contract scope, and a separate audit slice. The team that ships vision as "just another input" is the team that learns, at the regulator's request, that their PII inventory was wrong by an order of magnitude — and the team that treats every new modality as a classification event is the team that gets to keep shipping them.
