Building Pre-Deployment Cost Gates: Technical Implementation

I’ve been deep in the weeds building pre-deployment cost gates for our LLM infrastructure, and I want to share what we’ve built. This is a technical deep-dive for anyone actually implementing this.

Context: We’re running inference for large language models where costs can spike from $10K to $100K+ per month if we’re not careful. A single misconfigured deployment can burn through serious money before anyone notices. We needed gates that actually work.

Architecture Overview:

┌─────────────────┐
│   Developer     │
│   Pushes Code   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GitHub Actions │
│   CI/CD Pipeline│
└────────┬────────┘
         │
         ▼
┌─────────────────┐    ┌──────────────────┐
│ Cost Estimation │───▶│  Policy Engine   │
│   (Infracost)   │    │      (OPA)       │
└─────────────────┘    └────────┬─────────┘
                               │
                     ┌─────────┴─────────┐
                     │                   │
                     ▼                   ▼
              ┌──────────┐        ┌──────────┐
              │  ALLOW   │        │  BLOCK   │
              │  Deploy  │        │ + Notify │
              └──────────┘        └──────────┘
                                        │
                                        ▼
                                 ┌─────────────┐
                                 │ Approval Bot│
                                 │   (Slack)   │
                                 └─────────────┘

Component 1: Cost Estimation Engine

We use Infracost for Terraform infrastructure and custom scripts for application-level costs.

Terraform cost estimation:

# .github/workflows/cost-check.yml
- name: Checkout base branch
  uses: actions/checkout@v3
  with:
    ref: '${{ github.event.pull_request.base.ref }}'

- name: Generate base cost estimate
  run: |
    infracost breakdown --path=. \
      --format=json \
      --out-file=/tmp/infracost-base.json

- name: Checkout PR branch
  uses: actions/checkout@v3

- name: Generate PR cost estimate  
  run: |
    infracost breakdown --path=. \
      --format=json \
      --out-file=/tmp/infracost-pr.json

- name: Generate cost diff
  run: |
    infracost diff \
      --path=/tmp/infracost-pr.json \
      --compare-to=/tmp/infracost-base.json \
      --format=json \
      --out-file=/tmp/infracost-diff.json

Application-level cost estimation (custom):

For things like Lambda, API calls, and LLM inference that Infracost can’t estimate:

# cost_estimator.py
def estimate_lambda_cost(config):
    """Estimate Lambda costs based on memory, duration, invocations"""
    memory_gb = config['memory_mb'] / 1024
    duration_sec = config['estimated_duration_ms'] / 1000
    invocations_per_month = config['estimated_invocations']
    
    # Lambda pricing: $0.0000166667 per GB-second
    compute_cost = memory_gb * duration_sec * invocations_per_month * 0.0000166667
    
    # Request cost: $0.20 per 1M requests
    request_cost = (invocations_per_month / 1000000) * 0.20
    
    return compute_cost + request_cost

def estimate_llm_cost(config):
    """Estimate LLM inference costs"""
    model = config['model']  # e.g., 'gpt-4', 'claude-3'
    estimated_tokens_per_month = config['estimated_tokens']
    
    pricing = {
        'gpt-4': {'input': 0.03/1000, 'output': 0.06/1000},
        'claude-3': {'input': 0.015/1000, 'output': 0.075/1000}
    }
    
    # Assume 60/40 input/output split
    input_tokens = estimated_tokens_per_month * 0.6
    output_tokens = estimated_tokens_per_month * 0.4
    
    cost = (input_tokens * pricing[model]['input'] + 
            output_tokens * pricing[model]['output'])
    
    return cost
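As a quick sanity check on the arithmetic above: 10M tokens/month on gpt-4 with the 60/40 split works out like this (prices are the illustrative ones from the table):

```python
# Worked example using the illustrative gpt-4 prices above.
tokens_per_month = 10_000_000
input_cost = tokens_per_month * 0.6 * (0.03 / 1000)   # 6M input tokens
output_cost = tokens_per_month * 0.4 * (0.06 / 1000)  # 4M output tokens
print(round(input_cost + output_cost, 2))  # 420.0
```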

Component 2: Policy Engine (OPA)

We run Open Policy Agent to evaluate costs against our policies:

# policies/cost_limits.rego
package cost

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Default limits by environment
default_limits := {
    "dev": 1000,
    "staging": 2000,
    "prod": 5000
}

# Get environment from resource tags
get_environment(resource) := env if {
    env := resource.tags.environment
}

# Deny if monthly cost exceeds environment limit
deny contains msg if {
    some resource in input.resources
    cost := resource.monthly_cost
    env := get_environment(resource)
    limit := default_limits[env]
    cost > limit
    not resource.tags.cost_approved
    msg := sprintf(
        "Resource %s costs $%.2f/month, exceeds %s limit of $%.2f",
        [resource.name, cost, env, limit]
    )
}

# Deny if aggregate deployment cost is too high
deny contains msg if {
    total_cost := sum([r.monthly_cost | r := input.resources[_]])
    total_cost > 10000
    not input.deployment.tags.high_cost_approved
    msg := sprintf(
        "Total deployment cost $%.2f exceeds $10,000 threshold",
        [total_cost]
    )
}

# Require cost tags on all resources
deny contains msg if {
    some resource in input.resources
    not resource.tags.cost_center
    msg := sprintf("Resource %s missing required cost_center tag", [resource.name])
}

# Warn for expensive instance types in dev
warn contains msg if {
    some resource in input.resources
    resource.type == "aws_instance"
    resource.tags.environment == "dev"
    resource.instance_type in ["m5.4xlarge", "r5.4xlarge", "c5.4xlarge"]
    msg := sprintf(
        "Resource %s uses expensive instance type %s in dev environment",
        [resource.name, resource.instance_type]
    )
}
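For unit-testing estimator output locally before it reaches OPA, we keep a Python mirror of the per-resource limit rule (illustrative only; the Rego above remains the source of truth):

```python
# Python mirror of the per-resource cost-limit deny rule, used to unit-test
# estimator output locally. The Rego policy is the source of truth.
DEFAULT_LIMITS = {"dev": 1000, "staging": 2000, "prod": 5000}

def deny_messages(resources):
    msgs = []
    for r in resources:
        env = r["tags"].get("environment")
        limit = DEFAULT_LIMITS.get(env)
        if limit is None or r["tags"].get("cost_approved"):
            continue
        if r["monthly_cost"] > limit:
            msgs.append(
                f"Resource {r['name']} costs ${r['monthly_cost']:.2f}/month, "
                f"exceeds {env} limit of ${limit:.2f}"
            )
    return msgs

print(deny_messages([
    {"name": "api-server", "monthly_cost": 8500.0,
     "tags": {"environment": "prod"}},
]))
```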

Component 3: CI/CD Integration

Putting it all together in GitHub Actions:

- name: Run cost estimation
  run: |
    python scripts/cost_estimator.py \
      --terraform-plan=tfplan \
      --output=costs.json

- name: Evaluate cost policies
  run: |
    conftest test costs.json \
      --policy=policies/ \
      --output=json \
      --no-fail \
      > policy-results.json

- name: Post PR comment with results
  uses: actions/github-script@v6
  with:
    script: |
      const fs = require('fs');
      const results = JSON.parse(fs.readFileSync('policy-results.json', 'utf8'));
      const comment = generateCostComment(results);  // helper defined elsewhere in the workflow script
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: comment
      });

- name: Block if policies failed
  run: |
    # conftest output is a list of result objects, each with its own failures array
    if jq -e 'map(.failures // [] | length) | add > 0' policy-results.json; then
      echo "Cost policies failed, blocking deployment"
      exit 1
    fi
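The jq gate depends on conftest's JSON shape, which is worth spelling out: the output is a list of result objects, one per input file, each carrying its own `failures` array. The same gate in Python (assuming that shape):

```python
import json

# Count policy failures across all result objects in conftest's JSON output.
def count_failures(results):
    return sum(len(r.get("failures") or []) for r in results)

sample = json.loads('[{"filename": "costs.json", "failures": [{"msg": "over budget"}]}]')
print(count_failures(sample))  # 1
```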

Component 4: Exception Workflow (Slack Bot)

When a deployment is blocked, we notify via Slack with one-click approval:

# slack_approval_bot.py
@app.command("/cost-approve")
def handle_approval_request(ack, command, client):
    ack()
    
    pr_url = command['text'].split()[0]
    justification = ' '.join(command['text'].split()[1:])
    cost_amount = get_cost_from_pr(pr_url)
    
    # Determine approver based on cost
    approver = determine_approver(cost_amount)
    
    client.chat_postMessage(
        channel=approver_slack_channel(approver),
        text=f"Cost approval request for {pr_url}",
        blocks=[
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*Cost Approval Needed*\n\n"
                            f"PR: {pr_url}\n"
                            f"Estimated cost: ${cost_amount}/month\n"
                            f"Justification: {justification}\n"
                            f"Requested by: {command['user_name']}"
                }
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "value": f"approve:{pr_url}",
                        "action_id": "approve_cost"
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Deny"},
                        "style": "danger",
                        "value": f"deny:{pr_url}",
                        "action_id": "deny_cost"
                    }
                ]
            }
        ]
    )

@app.action("approve_cost")
def handle_approval(ack, body, client):
    ack()
    pr_url = body['actions'][0]['value'].split(':', 1)[1]  # maxsplit=1: the URL itself contains ':'
    
    # Add cost-approved tag to PR
    add_tag_to_pr(pr_url, "cost-approved")
    
    # Notify requester
    client.chat_postMessage(
        channel=get_requester_channel(pr_url),
        text=f"✅ Cost approval granted for {pr_url}. You can retry deployment."
    )
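`determine_approver` above is org-specific; as a sketch, we tier approvers by estimated monthly cost (the thresholds here are made up):

```python
# Hypothetical approver tiers keyed on estimated monthly cost.
def determine_approver(cost_amount):
    if cost_amount <= 2_000:
        return "team-lead"
    if cost_amount <= 10_000:
        return "eng-manager"
    return "vp-engineering"

print(determine_approver(8_500))  # eng-manager
```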

Technical Challenges & Solutions:

Challenge 1: Accurate serverless cost estimation

Problem: Hard to estimate Lambda costs without knowing invocation patterns.

Solution:

  • Use historical data from similar functions
  • Conservative estimates (assume high-end of expected range)
  • Monitor actual vs estimated, improve model over time
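"Conservative" in practice means taking a high percentile of observed usage rather than the mean. A minimal sketch, assuming you can pull daily invocation counts for a comparable function:

```python
# Conservative monthly invocation estimate: 95th percentile of observed
# daily invocations for a comparable function, times 30 days.
def conservative_monthly_invocations(daily_counts, percentile=0.95):
    s = sorted(daily_counts)
    idx = min(len(s) - 1, int(percentile * len(s)))
    return s[idx] * 30

print(conservative_monthly_invocations([1000, 1200, 900, 5000, 1100]))  # 150000
```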

Challenge 2: Policy evaluation performance

Problem: Running Infracost + OPA added 45 seconds to CI/CD initially.

Solution:

  • Cache Infracost pricing data (refresh daily, not every build)
  • Run policy evaluation in parallel with other CI/CD steps
  • Optimize Rego policies (avoid unnecessary iterations)
  • Now down to 18 seconds

Challenge 3: Handling multi-cloud deployments

Problem: We use AWS, GCP, and on-prem K8s. Different cost models.

Solution:

  • Normalize costs to “monthly cost” abstraction
  • Cloud-specific estimators feed into unified format:
{
  "resources": [
    {
      "name": "api-server",
      "type": "compute",
      "cloud": "aws",
      "monthly_cost": 450.00,
      "breakdown": {...}
    }
  ]
}
  • Policies evaluate normalized format, don’t care about cloud
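The normalization step itself is trivial; each cloud-specific estimator just maps into the unified shape shown above:

```python
# Map a cloud-specific estimate into the unified resource format.
def normalize(cloud, name, resource_type, monthly_cost, breakdown=None):
    return {
        "name": name,
        "type": resource_type,
        "cloud": cloud,
        "monthly_cost": round(monthly_cost, 2),
        "breakdown": breakdown or {},
    }

print(normalize("aws", "api-server", "compute", 450.0)["monthly_cost"])  # 450.0
```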

Challenge 4: Cost estimation for usage-based services

Problem: API Gateway, LLM calls, etc. have variable costs.

Solution:

  • Estimate based on historical usage patterns
  • Show cost ranges: “Estimated: $200-$800/month”
  • Policy handles ranges:
deny contains msg if {
    some resource in input.resources
    cost_max := resource.cost_estimate.max
    cost_max > default_limits[resource.tags.environment]
    msg := sprintf("Resource %s worst-case estimate $%.2f exceeds limit", [resource.name, cost_max])
}
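The range itself comes from spread in historical usage; a minimal sketch with assumed low/high multipliers:

```python
# Point estimate widened into a min/max range via assumed multipliers
# (0.5x/2x here; in practice, derived from historical usage variance).
def cost_range(point_estimate, low=0.5, high=2.0):
    return {"min": point_estimate * low, "max": point_estimate * high}

print(cost_range(400))  # {'min': 200.0, 'max': 800.0}
```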

Monitoring & Iteration:

We track:

  • Estimation accuracy: Compare estimated vs actual costs monthly
  • False positive rate: Blocked deployments that should have passed
  • Approval time: How long exception workflows take
  • Developer satisfaction: Regular surveys on the process
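Estimation accuracy reduces to a signed percent error per resource, aggregated monthly:

```python
# Signed percent error of an estimate against the actual billed cost.
def percent_error(estimated, actual):
    return (estimated - actual) / actual * 100

print(round(percent_error(1250, 1000), 1))  # 25.0
```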

Current stats after 4 months:

  • Estimation accuracy: ±25% for infrastructure, ±40% for serverless
  • False positive rate: 4.2% (down from 18% initially)
  • Average approval time: 22 minutes
  • Developer NPS: 7/10 (up from 5/10)

Lessons Learned:

  1. Start with read-only policies: Show violations without blocking, build confidence
  2. Invest in developer experience: Clear error messages matter more than policy sophistication
  3. Monitor actual vs estimated: Feedback loop improves estimation
  4. Keep policies simple initially: Add complexity gradually
  5. Fast approval workflow is critical: If approval takes >2 hours, people find workarounds

Open Source:

I’m planning to open-source our cost estimation scripts and OPA policies. Would include:

  • Terraform cost policy library
  • Application-level cost estimators (Lambda, API Gateway, LLM)
  • GitHub Actions workflow templates
  • Slack approval bot code
  • Policy testing framework

Would this be useful? What else should I include?

Questions I’m still working through:

  1. How to handle cost trends vs absolute costs? (A single deployment is cheap, but deploying 10 of them is expensive)
  2. Best way to estimate costs for new services we haven’t run before?
  3. How to integrate with FinOps tools (CloudHealth, Kubecost, etc.)?
  4. Should policies block on estimated cost or committed cost (RIs, savings plans)?

Anyone else building this? Let’s compare notes.

This is exactly the integration I was envisioning! The parallel with security policy gates is even stronger than I thought.

What really excites me: you’re reusing the same OPA infrastructure we already have for security. This means:

  • No new tooling to learn
  • Same CI/CD integration points
  • Policies can be tested the same way
  • Audit trail is unified

Security-Cost policy overlap:

I’m seeing interesting opportunities to combine security and cost policies. Example:

A massive EC2 instance in a dev environment could indicate:

  • Cost waste (over-provisioned for dev)
  • OR security incident (cryptomining malware)

Our security policies could trigger on both signals.

Integration question:

How do you handle the join between Infracost cost data and OPA policy input? Do you:

  1. Merge into single JSON input for OPA?
  2. Keep separate and use OPA’s data loading?
  3. Something else?

Tool suggestion for K8s costs:

For Kubernetes cost estimation, check out Kubecost’s APIs. They can estimate pod costs based on resource requests/limits. We use it and it integrates well with policy engines.

I’d absolutely contribute security policy patterns to your open-source library. The more we can share these patterns, the better.

Impressive technical implementation, Alex. I’m evaluating this for our enterprise environment (40+ teams), and I have some scalability questions.

Organizational scalability:

Managing cost policies across 40+ teams with different needs is the hard part. How do you handle:

  1. Policy ownership: Centralized (platform team owns all) vs federated (teams can customize)?
  2. Policy conflicts: What if Team A’s policies conflict with global policies?
  3. Policy updates: How do you roll out policy changes without breaking everyone’s deployments?

Enterprise integration requirements:

In financial services, we need:

  • Audit trail: Every policy decision logged for SOX compliance
  • Ticketing integration: Approval workflows through ServiceNow/JIRA
  • Cost center mapping: Map infrastructure to finance cost centers
  • Multi-tenant: Different policies for different business units

Your Slack bot is great for startups, but we’d need integration with enterprise approval systems.

Performance at scale:

18 seconds for cost estimation + policy eval is good for one repo. But we have:

  • 50+ active repos
  • 200+ deployments per day
  • Shared infrastructure across teams

How does this scale? Do you run centralized policy service or embed in each CI/CD pipeline?

Implementation timeline:

Month 1: Infrastructure setup
Month 2: Policy development
Month 3: Pilot
Month 4: Rollout

Is 4 months realistic for enterprise? Or should we plan 6-8 months?

I’d love to see your policy testing framework. Testing policies is critical for confidence.

Alex, solid technical implementation. I’m particularly interested in the developer experience aspects because that’s where these systems live or die.

Questions about the PR comment UX:

You mentioned posting cost estimates as PR comments. Can you share:

  • Screenshots or examples of what developers actually see?
  • How do you handle PR comment noise (if every commit triggers new comment)?
  • Do you update existing comment or create new ones?

Approval workflow UX:

22-minute average approval time is good, but what’s the distribution?

  • 50th percentile: probably fast
  • 95th percentile: might be hours?

If 5% of approvals take 2+ hours, that’s still frustrating for developers.

Developer education:

How do you help developers understand:

  • Why their deployment was blocked?
  • How to fix it (specific actions, not just “reduce cost”)?
  • Who to contact if they think it’s a false positive?

Success metrics question:

Developer NPS of 7/10 is decent, but what would make it 9/10? What are developers still complaining about?

Budget ask:

What’s the total implementation cost?

  • Engineering time (FTEs)
  • Infrastructure costs (tools, services)
  • Ongoing maintenance

Trying to build business case for my organization.

Your open-source library would be incredibly valuable. Please include:

  • Developer-facing documentation (how to fix violations)
  • Training materials (how to think about cost)
  • Communication templates (announcing cost gates to teams)

I can’t contribute to the technical implementation (way over my head) but I want to echo Keisha’s point about developer experience.

The error messages you showed are GREAT.

This is good: “Resource api-server costs $8,500/month, exceeds prod limit of $5,000”

But it could be even better with:

  • Clear visual hierarchy showing what’s wrong
  • Specific suggestions for how to fix it
  • Easy next steps for approval or questions

Visual feedback would be powerful:

Instead of just text in PR comments, what if you showed:

  • Cost breakdown chart (what’s expensive?)
  • Cost trend (team spending over time)
  • Budget gauge (team budget remaining)

Think GitHub’s code coverage comments but for cost.

Questions:

  1. What do developers see when policy evaluation is running? Loading state? Progress?
  2. How do you communicate policy changes to developers? (Hey, cost limits changed)
  3. If I’m a new engineer, how do I learn about cost policies?

I’d be happy to help design:

  • Error message templates
  • PR comment formatting
  • Slack notification UX
  • Developer documentation

Good UX makes good technical systems actually get adopted. Let me know if you want design help!