Treating Telemetry Like PII: Why Privacy-by-Design Observability Is Now Mandatory

A conversation with our CISO last quarter changed how I think about observability data entirely.

The Regulatory Wake-Up Call

“Michelle, do you know what’s in your telemetry data?”

I started listing technical signals - latency, error rates, request volumes. She stopped me.

“No. I mean, do you know whose session IDs are in your traces? Which IP addresses are in your logs? What user identifiers flow through your metrics labels?”

I didn’t. And that’s a problem.

Telemetry as Compliance Liability

In 2026, regulated industries are treating telemetry like they treat any other data store containing personal information:

What’s in your observability data?

  • User IDs and session tokens
  • IP addresses and geolocation
  • Device fingerprints
  • Email addresses in error messages
  • Credit card numbers in debug logs
  • Health information in API responses

Who can access it?

  • Your observability vendor
  • Your engineering team
  • Your support team
  • Anyone with dashboard access

How long do you keep it?

  • Often longer than your formal data retention policies
  • Sometimes indefinitely “for debugging”

The Regulatory Framework

GDPR Article 5: Personal data must be “adequate, relevant and limited to what is necessary.”

Your traces capturing full request bodies? Probably not necessary.

CCPA Right to Delete: Users can request deletion of personal information.

Can you delete a specific user’s data from your observability systems? Most organizations can’t.

HIPAA Minimum Necessary: Only the minimum necessary information should be used.

Those detailed logs from your healthcare application? Potential violation.

Privacy-by-Design Implementation

1. Data Classification at Collection

Before telemetry leaves your infrastructure:

processors:
  attributes/pii:
    actions:
      - key: user.email
        action: hash    # One-way hash (SHA-1), preserves cardinality
      - key: user.ip
        action: update  # Overwrite the value in place
        value: "[REDACTED]"
      - key: http.request.body
        action: delete  # Remove entirely
      # Truncation (e.g. keeping only the first 4 chars of user.id) isn't an
      # attributes-processor action; use the transform processor's Substring() for that.
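To make the semantics concrete, here's a rough Python sketch of what those four operations (hash, redact, delete, truncate) do to a span's attributes. The `scrub_attributes` helper and the attribute keys are illustrative, not collector internals:

```python
import hashlib

def scrub_attributes(attrs):
    """Apply the four PII actions to a dict of span attributes."""
    out = dict(attrs)
    if "user.email" in out:
        # Hash: one-way, but equal inputs map to equal outputs, so cardinality survives
        out["user.email"] = hashlib.sha256(out["user.email"].encode()).hexdigest()
    if "user.ip" in out:
        # Redact: replace the value outright
        out["user.ip"] = "[REDACTED]"
    # Delete: remove the key entirely
    out.pop("http.request.body", None)
    if "user.id" in out:
        # Truncate: keep only a short prefix
        out["user.id"] = out["user.id"][:4]
    return out

scrubbed = scrub_attributes({
    "user.email": "a@b.com",
    "user.ip": "10.0.0.5",
    "http.request.body": "...",
    "user.id": "u-12345678",
})
# Hashed email, "[REDACTED]" IP, no request body, user ID "u-12"
```

The key trade-off: hashing keeps per-user analytics possible (same user, same hash), while redaction and deletion destroy that linkage entirely.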

2. Tiered Access Controls

  • SRE On-Call: Aggregated metrics, sampled traces (anonymized)
  • Engineering: Service-level traces, hashed identifiers
  • Security: Full fidelity, audit logged
  • Compliance: Query-only, purpose-limited

3. Retention with Teeth

  • Hot tier (7 days): Full detail, fast queries
  • Warm tier (30 days): Aggregated, anonymized
  • Cold tier (1 year): Compliance minimum only
  • Delete: Automated, audited
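As a sketch, the tier boundaries above reduce to a simple age lookup that a cleanup job could run against each record (the function and tier names are illustrative):

```python
def retention_tier(age_days):
    """Map a record's age in days to the tier (and fidelity) it belongs in."""
    if age_days <= 7:
        return "hot"      # full detail, fast queries
    if age_days <= 30:
        return "warm"     # aggregated, anonymized
    if age_days <= 365:
        return "cold"     # compliance minimum only
    return "delete"       # past retention: purge, and audit the purge
```

The point of "teeth" is that the `delete` branch actually runs on a schedule and leaves an audit record, rather than relying on someone remembering to clean up.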

How Adaptive Telemetry Helps

This is where adaptive telemetry becomes a privacy tool, not just a cost tool:

  1. Reduce surface area - Less data collected means less data to protect
  2. Purpose-driven retention - Keep what you use, delete what you don’t
  3. Intelligent redaction - ML can identify and mask PII patterns
  4. Audit-ready logging - Track who accessed what and why

The Competitive Advantage

Organizations that get this right won’t just avoid fines - they’ll win enterprise deals. Privacy-by-design observability is becoming a procurement checkbox.

Is anyone else navigating this intersection of observability and privacy compliance? What frameworks are working for you?

Michelle, this is exactly the conversation we need to be having more broadly. The regulatory landscape is only getting more complex.

GDPR/CCPA Implications for Telemetry

Let me expand on the specific regulatory challenges:

GDPR: The Right to Be Forgotten

Article 17 gives EU residents the right to erasure. When a user requests deletion:

What most organizations think they need to delete:

  • User database records
  • CRM entries
  • Marketing lists

What they actually need to delete:

  • All traces containing their session ID
  • All logs with their IP address
  • All metrics with their user ID as a label
  • All error reports with their email

The problem: Most observability systems aren’t designed for selective deletion. You can drop entire time ranges, not specific user records.

CCPA: Know What You Have

California requires disclosure of what personal information you collect. This means:

  1. Data inventory - You need to know every PII field in every telemetry stream
  2. Purpose specification - Why are you collecting each field?
  3. Third-party disclosure - Your observability vendor counts as a third party

Regulatory Friction Points

  • User deletion: tooling offers bulk time-range deletion only, so you can't target a specific user's records
  • Data minimization: a "collect everything" mentality means over-collection by default
  • Purpose limitation: "general debugging" justifies everything, with no documented purpose per field
  • Retention limits: indefinite archival, with no automated cleanup

What’s Actually Working

1. PII Detection Pipelines

We run ML-based PII detection on telemetry before it leaves our collectors:

  • Email regex patterns
  • Credit card number detection
  • SSN/national ID patterns
  • Custom patterns for our domain (member IDs, etc.)

False positive rate is high, but false negatives are the bigger risk.
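A minimal version of such a detector, using nothing beyond the standard library, might pair the regexes with a Luhn checksum to cut down credit-card false positives. The pattern set and function names here are illustrative, not our production pipeline:

```python
import re

PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "cc":    re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
}

def luhn_ok(digits):
    """Luhn checksum: filters out random 16-digit numbers the cc regex flags."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:   # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def detect_pii(text):
    """Return (kind, match) pairs for every PII hit in the text."""
    hits = []
    for kind, pat in PATTERNS.items():
        for m in pat.finditer(text):
            if kind == "cc" and not luhn_ok(re.sub(r"[- ]", "", m.group())):
                continue  # likely an order ID or timestamp, not a card number
            hits.append((kind, m.group()))
    return hits
```

The Luhn check is the kind of cheap validation that trims false positives without risking false negatives: every real card number passes it, but only about one in ten random digit strings does.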

2. Pseudonymization by Default

All user identifiers are hashed at collection. We maintain a separate, access-controlled lookup table if we need to correlate back - but that table has its own retention policy.
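One way to sketch this pattern is keyed hashing with a per-user secret: deleting the secret ("crypto-shredding") makes already-stored pseudonyms impossible to re-link to the user, which is also one practical answer to erasure requests. The class below is a toy illustration of the idea, not our actual implementation:

```python
import hashlib
import hmac
import secrets

class Pseudonymizer:
    """Pseudonymize user IDs with a per-user secret held in a lookup table.

    The lookup table is the only way back from pseudonym to user, so it gets
    its own access controls and retention policy. Dropping a user's key
    ("crypto-shredding") orphans every pseudonym already written to telemetry.
    """

    def __init__(self):
        self._keys = {}  # user_id -> secret key

    def pseudonym(self, user_id):
        key = self._keys.setdefault(user_id, secrets.token_bytes(32))
        return hmac.new(key, user_id.encode(), hashlib.sha256).hexdigest()

    def forget(self, user_id):
        # Erasure request: stored telemetry keeps the old pseudonyms,
        # but nothing can link them to this user anymore.
        self._keys.pop(user_id, None)
```

A keyed hash (HMAC) rather than a plain hash matters here: without the secret, an attacker can't just hash known user IDs and match them against your telemetry.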

3. Geographic Data Segregation

EU user telemetry stays in EU regions. Period. This isn’t just about latency - it’s about jurisdictional control.

The regulatory pressure is only increasing. Organizations that don’t build privacy into their observability architecture now will be retrofitting painfully later.

Michelle, Sam - this thread is highlighting something we’ve been wrestling with on the product side.

User Privacy in Observability Data

We’ve been doing a lot of user research lately, and privacy expectations are shifting dramatically.

What Users Assume vs. Reality

Users assume:

  • “Debugging data” is anonymous
  • Technical logs don’t contain personal information
  • Their activity isn’t being tracked in detail

Reality:

  • Their user ID is in every trace
  • Their search queries are in logs
  • Their exact click patterns are observable
  • Their error messages contain their data

The Product Trust Equation

We’ve seen this play out in user research:

“Wait, you can see exactly which pages I visited and when? That feels invasive.”

Users don’t distinguish between “observability data” and “tracking data.” To them, if you can see their behavior in detail, you’re tracking them.

Where Product and Engineering Collide

Product wants:

  • Detailed user journey analytics
  • Error reproduction with full context
  • Performance metrics per user segment

Privacy requires:

  • Minimal data collection
  • Anonymization by default
  • Limited retention

Our Compromise Framework

  1. Aggregate first - Can we answer the question with aggregate data? If yes, don’t collect individual data.

  2. Ephemeral by default - Detailed user context is available for 24 hours for debugging, then anonymized.

  3. Opt-in detail - Users experiencing issues can consent to detailed session capture for support purposes.

  4. Transparent dashboards - Show users what data we have about them (GDPR data access request compliance as a feature).
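Step 2 above can be sketched as a scheduled job that strips user-level fields from records past the 24-hour window, keeping only aggregate-safe ones. The field names are made up for illustration:

```python
import time

RETENTION_SECONDS = 24 * 3600  # detailed user context lives for 24 hours

def anonymize_expired(records, now=None):
    """Strip user-level detail from records older than the retention window."""
    now = time.time() if now is None else now
    out = []
    for r in records:
        if now - r["ts"] > RETENTION_SECONDS:
            # Keep only the fields needed for aggregate analysis
            r = {"ts": r["ts"], "route": r["route"], "latency_ms": r["latency_ms"]}
        out.append(r)
    return out
```

Run hourly, a job like this turns "ephemeral by default" from a policy statement into an enforced property of the data store.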

The Competitive Angle

We’re starting to see privacy-first observability as a selling point. B2B customers are asking about it in procurement. Consumer products are marketing it.

The organizations that figure this out first will have an advantage - both in compliance and in user trust.

Great discussion. Let me add the technical implementation perspective on data scrubbing in observability pipelines.

Technical Implementation of Data Scrubbing

We’ve gone through several iterations of PII handling in our telemetry pipeline. Here’s what we’ve learned.

Architecture Options

Option 1: Client-Side Scrubbing

Application → Scrub PII → Collector → Backend
  • Pros: PII never leaves the application boundary
  • Cons: Inconsistent implementation across services, performance overhead in hot paths

Option 2: Collector-Side Scrubbing

Application → Collector → Scrub PII → Backend
  • Pros: Centralized rules, consistent application
  • Cons: PII traverses internal network

Option 3: Edge Scrubbing (Our Choice)

Application → Local Agent → Scrub PII → Collector → Backend
  • Pros: PII contained to local node, centralized rules via agent config
  • Cons: Additional agent to manage

OpenTelemetry Collector Configuration

Here’s our production config for PII handling:

processors:
  # Transform processor for field-level operations
  transform/pii:
    trace_statements:
      - context: span
        statements:
          # Hash user identifiers
          - set(attributes["user.id"], SHA256(attributes["user.id"])) 
            where attributes["user.id"] != nil
          # Redact email addresses
          - replace_pattern(attributes["user.email"], 
            "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", 
            "[EMAIL_REDACTED]")
          # Truncate IP addresses to /24
          - replace_pattern(attributes["client.ip"],
            "(\\d+\\.\\d+\\.\\d+)\\.\\d+",
            "$$1.0")
    log_statements:
      - context: log
        statements:
          # Redact credit card patterns
          - replace_pattern(body, 
            "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b",
            "[CC_REDACTED]")
          # Redact SSN patterns
          - replace_pattern(body,
            "\\b\\d{3}-\\d{2}-\\d{4}\\b",
            "[SSN_REDACTED]")

  # Filter processor to drop sensitive spans entirely
  filter/sensitive:
    spans:
      exclude:
        match_type: regexp
        attributes:
          - key: http.route
            value: "/admin/.*|/internal/.*"

Performance Considerations

Regex-based scrubbing adds overhead:

  • ~2-5% CPU increase at collector
  • ~10-20ms latency increase for log processing
  • Memory overhead for pattern compilation

We pre-compile patterns and use sampling to reduce impact on high-volume streams.

Testing Your Scrubbing

We run weekly “PII leak tests” - injecting known PII patterns through the pipeline and verifying they’re scrubbed before reaching storage.
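A leak test of this kind can be as simple as round-tripping canary values through the scrubbing stage and asserting that none survive. In this sketch, `scrub` stands in for the pipeline using the same regexes as the collector config above; the canary values and function names are illustrative:

```python
import re

# Known PII canaries injected through the pipeline; none may reach storage.
CANARIES = ["leak-test@example.com", "4111-1111-1111-1111", "123-45-6789"]

def scrub(text):
    """Stand-in for the pipeline's scrubbing stage (same patterns as the collector)."""
    text = re.sub(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
                  "[EMAIL_REDACTED]", text)
    text = re.sub(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b",
                  "[CC_REDACTED]", text)
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b",
                  "[SSN_REDACTED]", text)
    return text

def leak_test(scrub_fn):
    """Return the canaries that survived scrubbing; empty list means the test passed."""
    leaked = []
    for canary in CANARIES:
        if canary in scrub_fn(f"payload with {canary} embedded"):
            leaked.append(canary)
    return leaked
```

The same harness doubles as a regression test: any config change that weakens a pattern shows up as a non-empty leak list before it reaches production.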

The tooling for privacy-compliant observability is maturing rapidly. A year ago this would have required significant custom development. Now it’s mostly configuration.