85% of high-security orgs use ephemeral runners—should this be table stakes?

cto_michelle · March 15, 2026, 5:37pm

I’ve been analyzing our CI/CD security posture, and one statistic keeps bothering me: 85% of high-security organizations have adopted ephemeral runners, yet most companies still run persistent runners. Now that we’re into 2026, I’m questioning whether ephemeral runners should be considered baseline infrastructure hygiene rather than an advanced security practice.

Let me explain what ephemeral runners are for those less familiar with the infrastructure details. Traditional CI/CD runners are long-lived virtual machines or containers that persist across multiple build jobs. They accumulate state, secrets, cached dependencies, and environmental configurations over time. An ephemeral runner, by contrast, is a fresh container or VM spun up for a single build job and destroyed immediately afterward. Nothing persists. No state accumulates. Each job starts from a known, clean baseline.

The security implications are profound. With persistent runners, if malware or a malicious dependency compromises a build job, that compromise can persist and affect subsequent builds - potentially for months without detection. We’ve seen this pattern in recent supply chain attacks where adversaries established persistence in build environments and exfiltrated secrets or injected malicious code into artifacts over extended periods.

Ephemeral runners remove the runner itself as a persistence vector. A compromised job might steal secrets accessible during that specific build, but it cannot leave anything behind on the runner for later jobs to pick up. As long as shared caches and long-lived credentials are locked down as well, the blast radius is contained to a single job execution. From a security architecture perspective, this is a fundamental shift from detection-and-response to prevention-by-design.
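As a toy illustration of the persistence difference (not a real runner; the job names and the `backdoor.sh` file are made up for the example), the sketch below runs the same three jobs against one shared workspace and against a fresh workspace per job, and records what each job can see when it starts:

```python
import shutil
import tempfile
from pathlib import Path


def run_job(workspace: Path, job_name: str, malicious: bool = False) -> list:
    """Run a toy job; return the names of files already present when it started."""
    leftovers = sorted(p.name for p in workspace.iterdir())
    (workspace / f"{job_name}.out").write_text("build output")
    if malicious:
        # A compromised job drops a file it hopes later jobs will execute.
        (workspace / "backdoor.sh").write_text("echo compromised")
    return leftovers


def persistent_runner(jobs):
    """One long-lived workspace shared by every job."""
    workspace = Path(tempfile.mkdtemp(prefix="persistent-"))
    try:
        return [run_job(workspace, name, bad) for name, bad in jobs]
    finally:
        shutil.rmtree(workspace)


def ephemeral_runner(jobs):
    """A fresh workspace per job, destroyed as soon as the job ends."""
    seen = []
    for name, bad in jobs:
        with tempfile.TemporaryDirectory(prefix="ephemeral-") as tmp:
            seen.append(run_job(Path(tmp), name, bad))
    return seen


JOBS = [("job1", False), ("job2", True), ("job3", False)]
persistent_view = persistent_runner(JOBS)
ephemeral_view = ephemeral_runner(JOBS)

print("persistent:", persistent_view)
print("ephemeral: ", ephemeral_view)
```

On the shared workspace, `job3` starts with `backdoor.sh` already on disk. With a fresh workspace per job, every job starts empty. Note that the model covers only runner-local state; anything a job writes to a shared cache or registry is outside it.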

We implemented ephemeral runners across our infrastructure last year, and I want to share both the benefits and challenges we encountered. The benefits were immediate: our security team stopped worrying about runner compromise persistence, our compliance team loved the clean audit story, and our infrastructure costs actually decreased because we could scale runners elastically rather than maintaining persistent capacity.

The challenges were more subtle. Our existing build workflows had accumulated dependencies on persistent state - cached dependencies, build artifacts from previous jobs, environmental configurations that teams had “fixed” over time without documenting. Moving to ephemeral runners forced us to make all of these dependencies explicit, which was painful but ultimately healthy. We had to redesign our caching strategy, formalize our secret distribution approach, and document environmental requirements that had previously lived in tribal knowledge.
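One common way to make a cache dependency explicit, shown here as a minimal sketch rather than the approach described in this post, is to derive the cache key from the inputs that determine the cache contents. The function name, the `deps` prefix, and the toolchain string are assumptions for illustration:

```python
import hashlib


def cache_key(lockfile_bytes: bytes, toolchain: str, prefix: str = "deps") -> str:
    """Derive a cache key from the lockfile contents and the toolchain version.

    Any change to either input produces a different key, so a job never
    silently reuses a cache that was built from different dependencies.
    """
    digest = hashlib.sha256()
    digest.update(toolchain.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot run together
    digest.update(lockfile_bytes)
    return f"{prefix}-{digest.hexdigest()[:16]}"


key_a = cache_key(b"requests==2.32.0\n", "python-3.12")
key_b = cache_key(b"requests==2.32.0\n", "python-3.12")
key_c = cache_key(b"requests==2.32.1\n", "python-3.12")
print(key_a, key_a == key_b, key_a == key_c)
```

A job on a fresh runner would look up this key in a shared cache or registry and fall back to a clean install on a miss, rather than relying on whatever the previous job left on disk.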

The implementation complexity isn’t trivial. We use Kubernetes-based runners with pod templates that define base images, resource limits, and security contexts. Each job gets a fresh pod created from the template, and once the job finishes the pod is deleted after a 30-second TTL. We integrate with our secret management system to inject credentials just-in-time rather than storing them in runner environments. Our dependency caching moved to a centralized artifact registry rather than relying on local filesystem caching.
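For readers who want something concrete, here is a rough sketch of what such a template could look like, generated from Python as a plain manifest. It assumes each runner pod is wrapped in a Kubernetes `Job` so that `ttlSecondsAfterFinished` can handle the 30-second cleanup. The image name, resource sizes, mount path, and the `build_runner_job` helper are all hypothetical, and secret injection is deliberately left out:

```python
import json


def build_runner_job(job_id: str, image: str, cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a Job manifest for a single-use CI runner pod.

    job_id must be a valid lowercase DNS label, since it becomes part of the name.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"ci-runner-{job_id}", "labels": {"app": "ci-runner"}},
        "spec": {
            "ttlSecondsAfterFinished": 30,  # delete the Job and its pod 30s after it finishes
            "backoffLimit": 0,              # never retry on the same runner
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "automountServiceAccountToken": False,
                    "containers": [{
                        "name": "runner",
                        "image": image,
                        "resources": {"limits": {"cpu": cpu, "memory": memory}},
                        "securityContext": {
                            "runAsNonRoot": True,
                            "allowPrivilegeEscalation": False,
                            "readOnlyRootFilesystem": True,
                        },
                        # Scratch space for the build; discarded with the pod.
                        "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
                    }],
                    "volumes": [{"name": "workspace", "emptyDir": {}}],
                }
            },
        },
    }


manifest = build_runner_job("12345", "registry.example.com/ci/base:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied like any other manifest. Runner controllers typically generate something equivalent internally, so treat this as a way to read the moving parts rather than a drop-in configuration.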

But here’s my fundamental question: given the known risks of persistent runners, the proven effectiveness of ephemeral runners in high-security environments, and the maturity of implementation tooling available in 2026, should we still consider this an “advanced” security practice? Or should ephemeral runners be the expected baseline, with persistent runners relegated to legacy systems we’re actively migrating away from?

I’m particularly interested in perspectives from organizations that haven’t yet made this transition. What’s holding you back - technical complexity, resource constraints, organizational inertia, or something else? And for those who have implemented ephemeral runners, what was your experience? Would you consider this a prerequisite for modern CI/CD security, or am I overstating the case?

maya_builds · March 15, 2026, 5:37pm

Michelle, I love the framing of runners as disposable by design - it resonates deeply with how we think about design systems and component architecture. We’ve spent years evangelizing immutability in our design components: each version is a discrete, immutable artifact rather than something that evolves in place. The parallels to ephemeral runners are striking.

In design systems, we learned that immutability forces discipline. Teams can’t rely on undocumented state or “this component works because someone tweaked it three months ago.” Every dependency must be explicit, every configuration must be declared. It sounds like ephemeral runners create exactly the same forcing function for build infrastructure.

But I’m curious about the implementation complexity and cost implications you mentioned. When we rolled out our immutable design system architecture, we faced significant resistance from teams who had built workflows around mutable components. The transition required dedicated migration support, extensive documentation, and honestly, a lot of hand-holding.

A few questions from the perspective of someone who’s led similar transformations in a different domain:

What was the timeline from decision to full implementation? How long did teams need to migrate their existing workflows?
You mentioned infrastructure costs decreased - was that immediate or did you go through a period of running both persistent and ephemeral runners simultaneously?
How did you handle the education and training challenge? Did you create templates or reference implementations to help teams understand the new model?
Were there specific types of build jobs that proved particularly difficult to migrate to ephemeral runners?

I’m asking because if this truly should be table stakes for 2026, we need to understand not just the “why” but the “how” - and specifically, how to make the transition approachable for organizations that don’t have dedicated platform engineering teams.

eng_director_luis · March 15, 2026, 5:37pm

Michelle, from a financial services perspective, I’d argue that ephemeral runners aren’t just table stakes - they’re rapidly becoming a de facto compliance requirement, even if regulators haven’t explicitly codified it yet.

We completed our migration to ephemeral runners nine months ago, and it was driven primarily by our annual SOC 2 audit. Our auditors asked increasingly pointed questions about build environment security, secret rotation, and the blast radius of potential CI/CD compromises. When we explained our persistent runner architecture, the audit report included a formal recommendation to implement ephemeral runners or document compensating controls. Given the complexity of compensating controls versus the clear security benefits of ephemeral runners, the decision became straightforward.

What I found particularly valuable from a compliance perspective is the audit trail clarity. With ephemeral runners, we can demonstrate with precision exactly what environment executed each build, what secrets were accessible during that specific job, and that no persistent state could have been compromised. Our compliance team loves being able to point to infrastructure-as-code definitions and say, “This is exactly what every runner looked like.”

But Maya’s questions about implementation complexity are spot-on. Our migration took six months from planning to completion, with a four-month period where we ran both persistent and ephemeral runners in parallel. This dual-running period was necessary not just technically but organizationally - teams needed time to understand the new model, migrate their workflows, and validate that nothing broke.

The resource requirements were significant. We assigned two senior platform engineers full-time to the migration, plus 20% of our security team’s capacity for secrets management redesign. Every engineering team invested time in workflow migration, though we created template pipelines that cut the effort for most teams to 2-3 days of work.

The specific challenges we encountered:

Legacy test suites that relied on filesystem state between test runs
Build processes that used incremental compilation assuming persistent disk state
Custom tooling that teams had installed directly on runners rather than containerizing
Secret distribution patterns that assumed long-lived runner identities
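To illustrate the last item, here is a minimal sketch of a per-job, short-lived credential replacing a long-lived runner identity. It is a stand-in for what a real secret manager or OIDC-based token exchange would do, not a description of our setup. `SIGNING_KEY`, `mint_job_token`, and `verify_job_token` are hypothetical names:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-only-key"  # hypothetical; a real broker keeps this in a secret manager


def mint_job_token(job_id: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Issue a token that is valid for one job ID and expires after ttl_seconds."""
    issued = time.time() if now is None else now
    claims = {"job": job_id, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_job_token(token: str, job_id: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature, job ID, and expiry all check out."""
    current = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["job"] == job_id and current < claims["exp"]


token = mint_job_token("build-42", ttl_seconds=60, now=1000.0)
print(verify_job_token(token, "build-42", now=1030.0))
```

The point of the design is that a stolen token is useful only for that one job and only until it expires, which matches the single-job blast radius of the runner itself.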

Each of these required fundamental rethinking, but in every case, the ephemeral model forced us to build better architecture. We eliminated hidden dependencies, improved our containerization practices, and formalized our secret management.

My perspective: if you’re subject to any meaningful regulatory oversight, ephemeral runners are already effectively mandatory. The question is whether you implement them proactively or wait for an auditor to require them.

vp_eng_keisha · March 15, 2026, 5:38pm

The organizational change management aspect of ephemeral runner adoption is something I don’t think we talk about enough. Michelle and Luis both touched on technical implementation, but I want to focus on the people and process challenges we faced rolling this out across 80+ engineers.

When we announced the ephemeral runner migration, the initial reaction from teams wasn’t resistance - it was confusion. Most engineers didn’t understand why it mattered. They knew their builds worked on the current persistent runners, and the abstract security benefits of ephemeral runners felt distant compared to the concrete work of migrating their pipelines. We had to invest heavily in education before we could even begin technical migration.

We created a comprehensive rollout program:

Threat modeling workshops: We walked teams through actual supply chain attack scenarios, showing them how persistent runner compromises could affect their services. Making the threat concrete and personal drove much better engagement than abstract security presentations.
Reference implementations: Our platform team built example pipelines for every major tech stack in our organization - Node.js services, Python data pipelines, Java backend systems, Go microservices. Teams could start from working examples rather than translating concepts.
Office hours: We held weekly office hours for four months where teams could get live help with their migration challenges. The most common issues weren’t technical - they were teams discovering undocumented dependencies in their build processes.
Metrics and visibility: We built dashboards showing each team’s migration progress and, critically, the security posture improvement as they migrated. Seeing the tangible improvement in their security metrics motivated teams.

What I learned is that organizational maturity matters enormously. Teams with strong DevOps practices, good containerization, and explicit infrastructure-as-code adapted quickly - often in a single sprint. Teams with tribal knowledge, manual configuration, and implicit dependencies struggled for months.

This raises a challenging question: if ephemeral runners should be table stakes, but successful adoption requires organizational maturity in containerization, infrastructure-as-code, and explicit dependency management, are we actually saying those practices should be table stakes? Because that’s a much higher bar, and I’m not confident most organizations have achieved it.

How do we make ephemeral runners accessible to organizations that are still on their journey toward infrastructure maturity?

product_david · March 15, 2026, 5:38pm

I want to bring a business perspective to this discussion, because ultimately the “should this be table stakes” question comes down to ROI and competitive dynamics.

When Michelle asks if ephemeral runners should be baseline for 2026, I think we need to reframe the question: what’s the business case for making this investment, and how do we measure success?

From a product leadership standpoint, I look at three factors:

1. Risk reduction: What’s the potential business impact of a CI/CD compromise versus the cost of ephemeral runner implementation? Luis mentioned six months and significant engineering resources. That’s real cost. But a supply chain attack that compromises customer data or service availability could be orders of magnitude more expensive in terms of customer trust, regulatory fines, and remediation.

2. Velocity impact: Keisha’s point about organizational maturity is crucial here. If ephemeral runners slow down teams who aren’t already practicing good containerization and infrastructure-as-code, we’re trading velocity for security. That’s a valid trade-off if we’re honest about it, but we need to measure the velocity impact quantitatively.

3. Competitive differentiation: Are our competitors implementing ephemeral runners? If 85% of high-security organizations have adopted this practice, and we’re competing for enterprise customers who care about security posture, we may not have a choice. This becomes table stakes not because it’s technically optimal, but because it’s what the market expects.

What I’m struggling with is how to quantify the security benefit. We can measure implementation cost precisely - engineering time, infrastructure costs, productivity impact during migration. But how do we measure “reduced risk of persistent runner compromise”? The benefit is preventing something that hasn’t happened yet.
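One framing that sometimes helps, offered as a sketch rather than an answer, is the standard annualized loss expectancy calculation: expected yearly loss is the cost of a single incident multiplied by how often you expect it per year. Every number below is a placeholder I made up to show the arithmetic, and the hard part remains estimating the rates:

```python
def annualized_loss_expectancy(single_loss: float, annual_rate: float) -> float:
    """ALE = cost of one incident x expected number of incidents per year."""
    return single_loss * annual_rate


def net_annual_benefit(single_loss: float, rate_before: float,
                       rate_after: float, annual_cost: float) -> float:
    """Reduction in expected yearly loss, minus the yearly cost of the control."""
    ale_before = annualized_loss_expectancy(single_loss, rate_before)
    ale_after = annualized_loss_expectancy(single_loss, rate_after)
    return (ale_before - ale_after) - annual_cost


# Placeholder inputs: a $2M incident, estimated at 5% per year before
# and 1% per year after, with $50k per year of ongoing cost.
benefit = net_annual_benefit(
    single_loss=2_000_000,
    rate_before=0.05,
    rate_after=0.01,
    annual_cost=50_000,
)
print(f"Net annual benefit: ${benefit:,.0f}")
```

The output is only as good as the estimates, so it tends to work better as a sensitivity check (how low would the incident rate have to be for this not to pay off?) than as a single headline figure.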

This creates a challenging conversation with executive leadership. “We need to invest six months of platform engineering time to reduce a risk we can’t quantify” is a tough sell compared to “We need to invest six months to ship features customers are requesting.”

How are other product and engineering leaders making the business case for ephemeral runners in ROI terms that resonate with non-technical stakeholders?