Prompt-Eligibility: The Missing Column in Your Data Classification
Pull up your company's data classification policy. Public, internal, confidential, restricted — four neat tiers, each mapped to a set of access controls and a list of approved storage locations. Now ask a question the policy was never written to answer: which of these tiers are allowed to leave the corporate perimeter as a token sequence sent to a third-party model API?
The answer is almost always silence. Not because the policy is wrong, but because it is incomplete. Every classification scheme in use today was designed for an access vector that asks "is this employee allowed to read this row?" The prompt layer introduced a different vector entirely: an authorized service reads the row, transforms it into a prompt, and ships it across the network to a vendor that may log it, train on it, or hold it in plaintext for thirty days. None of that is read-access. None of it is covered.
This is the missing column. Until you add it, your data classification document is confidently asserting a control posture you do not have.
Read-Access Is Not Egress-Eligibility
The core conceptual error is treating "the calling service has permission to fetch this field" as the only check that matters. It was the only check that mattered when the destination was a database join, an internal microservice, or a log line in your own infrastructure. Once the destination is api.openai.com or api.anthropic.com, four new questions appear, and read-access answers none of them.
First: does the vendor retain the prompt? Most providers' default consumer terms allow logging for thirty days or longer for abuse review, and some retain indefinitely for service improvement. Enterprise tiers can negotiate down to seven days or zero, but only if you signed the agreement and only for the products it covers. Anthropic, for example, recently dropped default API log retention from thirty days to seven, and offers true zero-data-retention only to qualifying enterprise customers and only for specific API products — not for everything that talks to a Claude endpoint.
Second: can the prompt enter a training set? Even when "no training on customer data" is the default for paid API tiers, that promise is contractual, not architectural. It applies only to the products and accounts named in the data processing agreement. A side project on a personal API key is not covered. A team that signed up for a new product line is not covered until procurement reviews it.
Third: where does the prompt physically land? A US-resident customer's data routed through an EU-hosted gateway to a US-hosted model crosses two jurisdictions. Whether that crossing is lawful depends on transfer mechanisms (SCCs, adequacy decisions, the EU-US Data Privacy Framework that replaced Privacy Shield) that your read-access ACL knows nothing about.
Fourth: who else sees the byte stream in flight? Inline DLP gateways, observability vendors, prompt-injection scanners, and analytics tools may all process the prompt before it reaches the model. Each is its own subprocessor, each requires its own DPA, and each is invisible to the application code that built the prompt.
Read-access answers none of these. Prompt-eligibility is a function of data sensitivity and the destination contract — and the destination contract is a moving target measured in vendor SKUs.
What a Prompt-Eligibility Tier Looks Like
The fix is not to bolt warning labels onto the existing classification scheme. The fix is to add a parallel classification — call it prompt-eligibility — that is computed, not declared, and that resolves to a list of allowed model endpoints rather than a list of allowed users.
A workable scheme has four tiers, each with a contract requirement attached:
- Open: prompt-eligible to any model, including consumer-tier APIs and unverified vendors. Public marketing copy, documentation, open-source code.
- Bounded: prompt-eligible only to vendors with a signed DPA that prohibits training on inputs and bounds retention to a documented window. Most internal business data, non-public roadmaps, customer-facing communications stripped of identifiers.
- Restricted: prompt-eligible only to vendors with an active zero-data-retention agreement covering the specific API product in use. PII, financial records, employee data, source code with embedded secrets-handling.
- Prohibited: not prompt-eligible to any external vendor regardless of contract. Authentication secrets, raw cardholder data, regulated health data in jurisdictions where the vendor cannot demonstrate compliance, anything covered by export control.
The critical move is that "Restricted" is not a property of the data alone. It is a function that takes the data tier and the model endpoint and returns allow/deny. The same field — say, an employee's home address — is prompt-eligible against a self-hosted Llama instance, prompt-eligible against an Azure OpenAI deployment with ZDR enabled, and prompt-ineligible against the public OpenAI API even though the company has an enterprise contract for a different product.
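To make "computed, not declared" concrete, here is a minimal sketch in Python. The tier names, contract tags, and their ordering are illustrative assumptions; a real implementation would source the tiers from the data catalog and the contract tags from the vendor inventory rather than hard-coding either:

```python
from enum import IntEnum

class DataTier(IntEnum):
    """Prompt-eligibility tiers, least to most sensitive."""
    OPEN = 0
    BOUNDED = 1
    RESTRICTED = 2
    PROHIBITED = 3

class EndpointContract(IntEnum):
    """What the signed paperwork for an endpoint actually guarantees,
    weakest to strongest."""
    CONSUMER_DEFAULT = 0       # no DPA; vendor may log and may train
    DPA_NO_TRAINING = 1        # DPA: no training on inputs, bounded retention
    ZERO_DATA_RETENTION = 2    # negotiated ZDR covering this API product
    SELF_HOSTED = 3            # never leaves your infrastructure

# Minimum contract strength each tier requires before a prompt may egress.
REQUIRED_CONTRACT = {
    DataTier.OPEN: EndpointContract.CONSUMER_DEFAULT,
    DataTier.BOUNDED: EndpointContract.DPA_NO_TRAINING,
    DataTier.RESTRICTED: EndpointContract.ZERO_DATA_RETENTION,
}

def is_prompt_eligible(tier: DataTier, contract: EndpointContract) -> bool:
    """Allow/deny as a function of (data tier, endpoint contract)."""
    if tier is DataTier.PROHIBITED:
        # Prohibited data never leaves the perimeter; only a
        # self-hosted endpoint qualifies.
        return contract is EndpointContract.SELF_HOSTED
    return contract >= REQUIRED_CONTRACT[tier]

# The same field resolves differently per endpoint:
home_address = DataTier.RESTRICTED
assert is_prompt_eligible(home_address, EndpointContract.SELF_HOSTED)
assert is_prompt_eligible(home_address, EndpointContract.ZERO_DATA_RETENTION)
assert not is_prompt_eligible(home_address, EndpointContract.DPA_NO_TRAINING)
```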
This is uncomfortable because it means the classification decision cannot be made once at the data layer and then forgotten. It has to be evaluated at prompt-construction time, against the specific endpoint the prompt is about to be sent to. That sounds expensive. It is the actual shape of the problem.
Audit Every Prompt Template Like It's a Database Query
Most teams treat prompt templates as application code. They live in a repo, get reviewed in a PR, and ship when CI passes. From a privacy perspective this is wrong: a prompt template is closer to a SQL query that exfiltrates data to an external party. It deserves the same review rigor as a SELECT running against a regulated table.
The audit pass that has to land — and that nobody currently runs — looks like this. For every template in the codebase:
- Surface every interpolated field.
- For each field, document its source table or service, its current data classification, and its prompt-eligibility tier.
- Cross-reference the eligibility tier against the model endpoint the template targets.
- Block the change if any field's eligibility tier prohibits that endpoint.
- Require sign-off if any field's eligibility tier is more permissive than the template author appears to have realized.
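A sketch of that gate as a CI check, reusing the tier model sketched earlier. The field registry, the str.format-style placeholder convention, and the field names are all assumptions standing in for a real data catalog and real templates:

```python
import re
import sys

# Hypothetical field registry: interpolated field -> (source, tier).
# In practice, generated from the data catalog, never hand-maintained.
FIELD_REGISTRY = {
    "customer_name": ("crm.contacts", DataTier.BOUNDED),
    "home_address":  ("hr.employees", DataTier.RESTRICTED),
    "release_notes": ("docs.public",  DataTier.OPEN),
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # assumes str.format-style templates

def audit_template(template: str, contract: EndpointContract) -> list[str]:
    """One violation per interpolated field the target endpoint may not receive."""
    violations = []
    for field in PLACEHOLDER.findall(template):
        # Unregistered fields fail closed: Prohibited until classified.
        source, tier = FIELD_REGISTRY.get(field, ("UNKNOWN", DataTier.PROHIBITED))
        if not is_prompt_eligible(tier, contract):
            violations.append(
                f"{field} ({source}, {tier.name}) is not eligible for "
                f"an endpoint under {contract.name}")
    return violations

template = "Summarize support history for {customer_name} at {home_address}."
problems = audit_template(template, EndpointContract.DPA_NO_TRAINING)
if problems:
    print("\n".join(problems))
    sys.exit(1)  # block the merge
```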
This is grindy work. It is also the only audit that catches the failure mode that has burned everyone who has been burned. The Samsung incident in 2023 — three separate engineers pasting source code, meeting transcripts, and yield-test sequences into ChatGPT within weeks of an internal ban being lifted — was not a failure of the data classification policy. The policy probably correctly tagged that source code as confidential. It was a failure of nobody having drawn a boundary between "engineer is allowed to read this code" and "engineer is allowed to send this code to a vendor we have no contract with." A template-level audit would not have stopped a copy-paste, but it would have stopped the institutional version of the same mistake: an internal tool quietly interpolating that source code into a prompt and shipping it to a model nobody flagged.
The 2025 EchoLeak vulnerability in Microsoft 365 Copilot makes the inverse case. There the classification was correct, the contract was correct, but a prompt-injection chain caused Copilot to embed sensitive context into an outbound link that bypassed the redaction layer. The takeaway is not that audits are pointless against adversaries; it is that the egress channel exists whether you mapped it or not, and adversaries are mapping it before you are.
Redact at Construction Time, Not at the API Boundary
A common architectural mistake is to push redaction to the AI gateway — a centralized proxy that sits between application code and the model API and enforces policy on outbound traffic. This is correct as a defense-in-depth layer. It is wrong as the primary control.
By the time a prompt reaches the gateway, the application has already assembled the byte stream. The decision about which fields to interpolate has already been made. If the application chose to embed an employee SSN in the system prompt, the gateway can pattern-match and redact it, but the gateway is now operating with low context: it sees 123-45-6789 and has to guess whether that is an SSN, a part number, or a serialized timestamp. False positives break legitimate prompts; false negatives ship the SSN.
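To see the low-context problem concretely, the pattern-matching gateway is reduced to something like the following sketch, where the same nine digits mean three different things:

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Identical byte patterns, three different meanings. The gateway sees
# only the stream and has to guess.
prompts = [
    "Employee SSN on file: 123-45-6789",          # true positive
    "Expedite motor part 123-45-6789 to dock 4",  # false positive: breaks the prompt
    "Ticket 123-45-6789-A is still open",         # partial match: redaction mangles the ID
]
for p in prompts:
    print(SSN_PATTERN.sub("[REDACTED]", p))
```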
The correct primary control is at prompt-construction time, where the application code knows exactly which field is which and can make a typed decision: "this is an EmployeePII.SSN, the active model endpoint is Bounded, the eligibility map says Restricted data cannot be sent to a Bounded endpoint, refuse." The gateway is then a backstop that catches construction-layer bugs, third-party libraries that interpolate fields the application did not authorize, and prompt-injection chains that smuggle restricted data into outputs.
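A minimal sketch of that construction-time refusal, continuing with the tier model from earlier. ClassifiedValue, build_prompt, and the field names are hypothetical; the point is that the check runs while the code still knows the type, before any byte stream exists for a gateway to second-guess:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassifiedValue:
    """A field value that carries its prompt-eligibility tier with it."""
    value: str
    tier: DataTier

class EligibilityError(Exception):
    pass

def build_prompt(template: str, contract: EndpointContract,
                 **fields: ClassifiedValue) -> str:
    """Interpolate only fields the target endpoint may contractually receive."""
    for name, field in fields.items():
        if not is_prompt_eligible(field.tier, contract):
            raise EligibilityError(
                f"{name} is {field.tier.name}; contract {contract.name} "
                f"does not cover it")
    return template.format(**{k: v.value for k, v in fields.items()})

ssn = ClassifiedValue("123-45-6789", DataTier.RESTRICTED)
# Raises EligibilityError before any prompt bytes are assembled:
build_prompt("Verify identity for SSN {ssn}.",
             EndpointContract.DPA_NO_TRAINING, ssn=ssn)
```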
This inverts the popular AI-gateway-as-savior architecture. The gateway is necessary but it is not the boundary. The boundary is wherever a typed value gets concatenated into a string that will become a prompt — usually inside a feature team's code, and usually invisible to the security team's threat model.
Map Your Vendor Stack to Your Eligibility Tiers
The last piece is operational. Every model endpoint your stack can reach — including the ones a feature team added last month, the ones embedded in a SaaS tool you bought, and the ones a contractor wired into an internal demo — needs to be tagged with its current contract status. Default API tier with thirty-day retention. Enterprise API with seven-day retention and no-training default. Enterprise API with negotiated zero-data-retention. Self-hosted with no external egress. Each tag determines which prompt-eligibility tier may be sent to it.
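A sketch of what that inventory might look like in code, again reusing the contract tags from earlier. The hostnames and tag assignments are illustrative, not statements about any vendor's actual terms:

```python
# Every model endpoint the stack can reach, tagged with current contract
# status. This table goes stale fast; reconcile it quarterly.
ENDPOINT_REGISTRY = {
    "api.openai.com":           EndpointContract.CONSUMER_DEFAULT,     # personal key, no DPA
    "api.anthropic.com":        EndpointContract.DPA_NO_TRAINING,      # enterprise API, bounded retention
    "myorg.openai.azure.com":   EndpointContract.ZERO_DATA_RETENTION,  # negotiated ZDR, this SKU only
    "llama.internal.myorg.net": EndpointContract.SELF_HOSTED,          # no external egress
}

def contract_for(endpoint: str) -> EndpointContract:
    # Unknown endpoints fail closed to the weakest guarantee until
    # someone actually reviews the contract.
    return ENDPOINT_REGISTRY.get(endpoint, EndpointContract.CONSUMER_DEFAULT)
```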
This list goes stale fast. Vendors change defaults. Products move between SKUs and inherit different terms. A team upgrades to a new API version and silently exits the contract that covered the old one. The audit cannot be a one-time exercise; it has to be a quarterly reconciliation that every feature team owes evidence for. Some of the better AI gateway implementations now do this enforcement automatically — Cloudflare, Vercel, Kong, and others ship variants of policy-tagged routing that refuses to forward a Restricted payload to a non-ZDR endpoint — but the gateway only knows what the application told it. Garbage tagging in, plausible compliance theater out.
The cost frame nobody surfaces is that this is platform work. It does not show up in feature velocity, it does not move a product metric, and it competes for engineering hours against things that do. The team that does it spends a quarter not shipping much. The team that skips it is fine right up until a regulator asks "what data left your perimeter as model inputs in 2026?" and the answer is a shrug. The EU AI Act enforcement deadline for general-purpose models passed in August 2026 with maximum fines reaching seven percent of global revenue. That is the frame finance understands.
The Architectural Realization
The prompt layer is a data egress channel that your existing DLP scheme did not model. Email, file storage, and outbound network traffic all have controls. The prompt layer slipped past those controls because it looks like an internal API call from inside the application, and because the data leaving it does not look like a file or an email — it looks like ordinary string interpolation, the kind every developer writes a hundred times a day without thinking about it as exfiltration.
Naming the channel is the first move. A column in the data classification document that says "prompt-eligibility tier" and a corresponding inventory of "approved model endpoints by tier" forces a conversation that today happens only in incident postmortems. Once the column exists, the audit that fills it forces every prompt template into the same review surface as every other piece of code that touches sensitive data — which is where it should have been from the start.
The organizations that have not done this are not necessarily wrong yet. They are unscored. The day a regulator, a customer security review, or a class-action discovery process asks the question, the absence of an answer is the answer. The work to build the answer is not glamorous and does not market well. It is, however, exactly the kind of platform investment that retroactively looks obvious — the way the first DLP rollouts looked obvious in hindsight, after the breach. The team that names the column now is the team that does not have to invent the audit trail later.
- https://securiti.ai/data-classification-policy/
- https://www.nightfall.ai/blog/data-classification-policies-the-essential-guide-and-free-policy-template-for-2025
- https://www.cyera.com/blog/four-levels-of-data-classification
- https://privacy.claude.com/en/articles/8956058-i-have-a-zero-data-retention-agreement-with-anthropic-what-products-does-it-apply-to
- https://www.spellbook.legal/learn/most-private-ai
- https://arxiv.org/pdf/2510.11558
- https://vercel.com/blog/zdr-on-ai-gateway
- https://www.kiteworks.com/cybersecurity-risk-management/prevent-llm-data-leakage-controls/
- https://www.merge.dev/blog/zero-data-retention-gateway
- https://routine.co/blog/posts/ai-data-retention-startups
- https://neuraltrust.ai/blog/zero-data-retention-agents
- https://contractnerds.com/understanding-training-data-in-contracts-with-ai-vendors/
- https://www.advantage.tech/data-loss-prevention-rules-for-llm-workflows/
- https://www.datasunrise.com/knowledge-center/ai-security/data-loss-prevention-for-genai-llm-pipelines/
- https://aws.amazon.com/blogs/apn/how-forcepoint-data-loss-prevention-dlp-safeguards-your-aws-generative-ai-solutions/
- https://developers.cloudflare.com/ai-gateway/features/dlp/
- https://www.nightfall.ai/ai-security-101/data-leakage-prevention-dlp-for-llms
- https://reruption.com/en/knowledge/blog/enterprise-ai-security-2025-guide-audit-proof-ai-systems
- https://certpro.com/prompt-security-risks-enterprise-ai/
- https://www.parloa.com/blog/AI-privacy-2026/
- https://iapp.org/resources/article/mapping-interplays-gdpr-eu-ai-act
- https://shadowaiwatch.com/compliance/ai-data-privacy-2026-gdpr-eu-ai-act-us-collision/
- https://arxiv.org/html/2509.10540
- https://www.cshub.com/data/news/iotw-samsung-employees-allegedly-leak-proprietary-information-via-chatgpt
- https://incidentdatabase.ai/cite/768/
- https://konghq.com/blog/enterprise/building-pii-sanitization-for-llms-and-agentic-ai
- https://www.truefoundry.com/blog/ai-security-platforms-and-gateways
- https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- https://www.obsidiansecurity.com/blog/prompt-injection
