🏗️ SF Tech Week Enterprise Panel: Why 80% of AI Projects Never Make It to Production

Just came from the most sobering panel of SF Tech Week: “Enterprise AI: From POC to Production.”

Four enterprise CTOs. All deploying AI at scale. All with horror stories.

The Stat That Shocked the Room

“Less than 20% of AI initiatives are scaled across the enterprise.”

Source: EPAM study, confirmed by all 4 panelists from their own experience

Translation: 80% of AI projects FAIL to reach production.

They don’t fail because the technology doesn’t work. They fail because of organizational, technical, and cultural barriers.

The “AI POC Success” Trap

Here’s the pattern every panelist described:

Phase 1: POC Success :white_check_mark:

  • Pick a well-scoped use case
  • Use GPT-4 or Claude API
  • Build a demo in 2-4 weeks
  • Show impressive results
  • Executive sponsors excited

Phase 2: Production Reality :cross_mark:

  • Try to integrate with enterprise systems
  • Hit data quality issues
  • Discover compliance requirements
  • Realize infrastructure doesn’t scale
  • Security review kills the project

One CTO called it “the Death Valley between POC and production.”

The 5 Barriers to Production AI

From the panel + supporting research:

1. Legacy System Integration (60% cite this as the top barrier)

The POC:

  • Demo works on clean CSV data
  • No system dependencies
  • Runs on someone’s laptop

Production reality:

  • Data in 15 different systems (SAP, Salesforce, Oracle, custom databases)
  • No APIs, only batch file exports from 1995
  • Data format inconsistencies
  • Need real-time sync but systems don’t support it

Panelist quote: “Our POC worked perfectly. Then we found out the data we needed was in a mainframe from 1987 that nobody knows how to access anymore.”

2. Data Quality and Availability (cited by 50% of companies)

The POC:

  • Hand-curated sample data
  • Edge cases removed
  • “Representative” subset (actually: cleaned and perfect)

Production reality:

  • Missing data (30% of fields null)
  • Inconsistent formats (dates in 7 different formats)
  • Duplicate records
  • Contradictory data across systems
  • Historical data needed for training doesn’t exist

Data quality rule: the POC runs on the cleanest 10% of your data; production has to handle the other 90% (a quick profiling pass, sketched below, surfaces this before you commit).
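
A minimal pandas sketch of that profiling pass, run against the raw production extract rather than the curated POC sample. File and column names here are illustrative assumptions, not anyone’s real schema:

```python
import pandas as pd

# Profile the *production* extract, not the hand-curated POC sample.
df = pd.read_csv("loan_applications_raw.csv")  # illustrative file name

# Share of missing values per column (the "30% of fields null" problem).
null_rates = df.isna().mean().sort_values(ascending=False)

# Exact duplicate records.
duplicate_rate = df.duplicated().mean()

# How many rows fail to parse with the single date format the POC assumed?
parsed = pd.to_datetime(df["application_date"], format="%Y-%m-%d", errors="coerce")
unparseable_rate = parsed.isna().mean()

print(null_rates.head(10))
print(f"duplicates: {duplicate_rate:.1%}, dates not in expected format: {unparseable_rate:.1%}")
```

If these numbers look bad on day one, they will not look better at production scale.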

3. Skills Gap (40% lack AI expertise)

The POC:

  • External consultants or ML engineers build it
  • Data scientists fine-tune the model
  • Proof of concept doesn’t need maintenance

Production reality:

  • Need engineers to maintain it (consultants are gone)
  • IT team doesn’t understand ML (can’t debug when it breaks)
  • Data drift happens (model accuracy degrades)
  • Nobody on staff can retrain or update models

Hiring challenge: 43% of companies plan to hire AI roles in 2025, competing for the same small talent pool.

Most in-demand roles:

  • Machine learning engineers
  • AI researchers
  • ML ops engineers
  • AI ethics/governance specialists

4. AI Governance and Compliance (18 months to implement)

The POC:

  • “Don’t worry about compliance for the demo”
  • Skips security review
  • No audit trail
  • No bias testing

Production reality:

  • Legal requires AI risk assessment (new process, takes 3 months)
  • Compliance requires model explainability (black box = blocked)
  • Security requires penetration testing of AI components
  • Privacy requires data minimization and consent
  • Regulations (EU AI Act, state laws) require documentation

Average time to implement AI governance: 18 months

From idea to a governance framework actually in place: a year and a half.

5. Cost at Scale

The POC:

  • 1,000 API calls to GPT-4
  • Cost: $50
  • “Totally affordable!”

Production reality:

  • 10 million API calls/month
  • Cost: $500,000/year
  • CFO: “Absolutely not”

Need to:

  • Optimize prompts (reduce tokens)
  • Switch to smaller models (lose quality)
  • Self-host open source (infrastructure complexity)
  • Implement caching (engineering effort)

One panelist’s estimate: POC to production is roughly a 23x cost multiplier once all infrastructure, tooling, and engineering are accounted for.
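
To make the scaling cliff easy to rerun with your own numbers, here is a back-of-the-envelope cost model. Every price, token count, and cache-hit rate below is an illustrative placeholder, not a quote from any provider or from the panel:

```python
# Back-of-the-envelope LLM API cost model. All numbers are placeholders.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # $ per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # $ per 1K output tokens (assumed)
TOKENS_IN_PER_CALL = 1500          # prompt size (assumed)
TOKENS_OUT_PER_CALL = 400          # completion size (assumed)

def monthly_api_cost(calls_per_month: int, cache_hit_rate: float = 0.0) -> float:
    """Cost of billable calls; cached responses are assumed to cost ~nothing."""
    per_call = (TOKENS_IN_PER_CALL / 1000) * PRICE_PER_1K_INPUT_TOKENS \
             + (TOKENS_OUT_PER_CALL / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return calls_per_month * (1 - cache_hit_rate) * per_call

print(f"POC scale   (1K calls):   ${monthly_api_cost(1_000):,.0f}/month")
print(f"Prod scale  (10M calls):  ${monthly_api_cost(10_000_000):,.0f}/month")
print(f"Prod + 40% cache hits:    ${monthly_api_cost(10_000_000, 0.4):,.0f}/month")
```

Running the POC numbers and the production numbers through the same model, before the demo, is how you avoid the CFO conversation later.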

Real Enterprise AI Production Stories

Success Story: Financial Services Company

Use case: Automated document processing for loan applications

POC (4 weeks):

  • GPT-4 API extracts data from PDFs
  • 95% accuracy on test set
  • Stakeholders thrilled

Production journey (14 months):

  • Month 1-3: Security review (data can’t leave premises → need self-hosted model)
  • Month 4-6: Infrastructure build (GPU cluster, ML ops platform)
  • Month 7-9: Model fine-tuning (open source model to match GPT-4 accuracy)
  • Month 10-11: Integration (connect to 8 legacy systems)
  • Month 12-14: Compliance (audit trail, explainability, testing)

Final cost:

  • POC: $5K
  • Production: $800K first year (infrastructure + engineering)

Result: Successful deployment, processes 50K documents/month, saves $2M/year in manual labor

ROI positive after 6 months in production

Failure Story: Healthcare Company

Use case: AI-powered patient diagnosis support

POC (6 weeks):

  • LLM analyzes patient records, suggests diagnoses
  • Doctors love it in trials
  • 85% accuracy vs expert human

Production attempt (failed after 9 months):

  • HIPAA compliance blocked cloud AI APIs
  • Self-hosting required (security review takes 4 months)
  • Model explainability insufficient (doctors need to know WHY AI suggested diagnosis)
  • Liability concerns (who’s responsible if AI is wrong?)
  • Integration with EHR system impossible (vendor won’t provide API)

Result: Project cancelled, $1.2M spent, zero production deployment

Lesson: Some use cases aren’t ready for AI (regulatory/liability too high)

The 30% Who Succeed: What They Do Differently

IBM study: 30% of tech-advanced companies successfully implemented AI at scale

What separates success from failure:

:white_check_mark: Start with infrastructure, not use cases

  • Build ML ops platform first
  • Establish data pipelines
  • Create governance framework
  • THEN identify use cases

:white_check_mark: Choose low-risk, high-volume use cases

  • Not “AI diagnosis” (high risk)
  • Yes “email triage” (low risk)
  • Focus on efficiency, not critical decisions

:white_check_mark: Invest in change management

  • Train employees on AI tools
  • Address “AI will replace me” fears
  • Create AI champions in each department

:white_check_mark: Plan for the whole lifecycle

  • POC budget: $10K
  • Production budget: $500K-2M (50-200x multiplier)
  • If you can’t afford production, don’t start POC

:white_check_mark: Hybrid approach

  • Use APIs for low-volume, low-risk
  • Self-host for high-volume, high-sensitivity
  • Don’t go all-in on one strategy

The Questions I’m Taking Back to My Team

We’re a mid-size company (500 employees) evaluating AI deployment.

Based on this panel, here’s my new checklist before starting ANY AI project:

Before POC:

  1. ☐ Do we have executive sponsorship + multi-year budget?
  2. ☐ Have we assessed data quality and availability?
  3. ☐ Do we have (or can we hire) ML engineering talent?
  4. ☐ Is our infrastructure ready (or can we build it)?
  5. ☐ Have we identified compliance requirements upfront?
  6. ☐ Can we commit to 12-18 month timeline?
  7. ☐ Is the ROI worth the investment (realistic production cost)?

If answer to ANY question is “no,” we shouldn’t start the POC.

POC-to-production checklist:

  1. ☐ Integration plan with all systems (documented before POC)
  2. ☐ Data quality assessment (measure completeness, accuracy)
  3. ☐ Compliance review completed (legal, security, privacy)
  4. ☐ Production cost model (realistic, not POC costs)
  5. ☐ Team trained (not relying on consultants)
  6. ☐ Monitoring and observability plan
  7. ☐ Model governance (versioning, retraining, rollback)

My Controversial Take

Hot take from the panel (everyone nodded):

“Most companies should NOT be building custom AI models. Use off-the-shelf AI products instead.”

Instead of:

  • :cross_mark: Building custom LLM application from scratch
  • :cross_mark: Fine-tuning open source models
  • :cross_mark: Hiring ML team

Consider:

  • :white_check_mark: Buying AI-enabled SaaS products (Salesforce with Einstein, Microsoft with Copilot)
  • :white_check_mark: Using AI APIs for specific tasks (OpenAI, Anthropic, Cohere)
  • :white_check_mark: Partnering with AI consultancies for specialized use cases

When to build custom:

  • AI is your core competitive advantage
  • Unique data gives you proprietary edge
  • Volume justifies infrastructure investment (>$500K/year in API costs)

When to buy:

  • AI is supporting tool, not core business
  • Standard use cases (email, documents, customer support)
  • Small/mid-size company (<1000 employees)

Questions for This Community

For CTOs/engineering leaders:

  • What’s been your POC → production success rate?
  • What’s your biggest barrier (integration, skills, cost, compliance)?
  • Are you building or buying AI?

For ML engineers:

  • How do you convince leadership that production is 10-50x harder than POC?
  • What does your ML ops stack look like?

For everyone:

  • Is the 20% success rate acceptable or is enterprise AI fundamentally broken?

I’m trying to avoid becoming part of the 80% failure statistic.

Sources:

  • SF Tech Week “Enterprise AI: From POC to Production” panel (Day 5)
  • EPAM “What Is Holding Up AI Adoption” study
  • PwC 2025 AI Business Predictions
  • IBM “5 Biggest AI Adoption Challenges”
  • Pellera Technologies AI Adoption Challenges
  • Converge TP Top 5 AI Challenges 2025
  • Panel CTOs from: Financial services, healthcare, manufacturing, retail

@cto_michelle This hits HARD. We’re living this right now.

Our POC → Production Journey (Still Ongoing)

Use case: AI code review automation

POC (3 weeks):

  • :white_check_mark: GPT-4 analyzes pull requests
  • :white_check_mark: Suggests improvements (performance, security, style)
  • :white_check_mark: Developers love it in beta (10 person team)
  • :white_check_mark: Demo to CTO: “This will save 5 hours/week per developer!”

Production attempt (currently month 7):

  • :cross_mark: Integration with GitHub Enterprise (self-hosted, behind firewall)
  • :cross_mark: Data security (code can’t leave our VPC)
  • :cross_mark: Need self-hosted LLM (switching from GPT-4 to Llama 3.1)
  • :cross_mark: Infrastructure build (GPU servers, vLLM, monitoring)
  • :cross_mark: Model quality dropped (Llama 3.1 not as good as GPT-4 for code)
  • :cross_mark: Fine-tuning on our codebase (need ML engineer, don’t have one)
  • :counterclockwise_arrows_button: Currently stuck in “infrastructure build” phase

Original budget: $10K for POC
Actual cost so far: $180K and counting

The Barriers We Hit (In Order)

Barrier 1: Security Review

Security team: “Code is intellectual property. Can’t send to OpenAI.”

Us: “But it’s just a demo…”

Security: “No cloud AI APIs with proprietary code. Non-negotiable.”

Timeline impact: +2 months to evaluate self-hosting options

Barrier 2: Infrastructure

Need to self-host LLM:

  • GPU servers (we’re mostly CPU-based infrastructure)
  • ML ops platform (no expertise in-house)
  • Monitoring and logging (different from our standard tools)
  • Load balancing and scaling (new patterns for us)

Decision: Hire contractor for 3 months to build infrastructure

Timeline impact: +3 months

Barrier 3: Model Quality

Llama 3.1 70B (best we can self-host with our GPU budget):

  • :white_check_mark: Decent at general code review
  • :cross_mark: Misses security issues GPT-4 caught
  • :cross_mark: Style suggestions less helpful
  • :cross_mark: Hallucinates more often

Developers: “This is worse than the POC. Why did we downgrade?”

Need: Fine-tuning on our codebase to improve quality

Barrier 4: Skills Gap

Nobody on our team knows how to:

  • Fine-tune LLMs
  • Optimize inference performance
  • Debug model quality issues
  • Implement ML monitoring

Decision: Trying to hire an ML engineer (6 weeks of searching, no offers accepted yet)

Timeline impact: +??? (still hiring)

Barrier 5: Integration Complexity

GitHub Enterprise webhook → our infrastructure → LLM → back to GitHub

Sounds simple. Reality:

  • Rate limiting (GitHub API limits)
  • Error handling (what if LLM times out?)
  • Retry logic (what if analysis fails?)
  • Versioning (how to handle model updates without breaking?)

Engineering effort: 2 senior engineers, 6 weeks
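
For a sense of the shape of that pipeline, here is a hedged sketch of the middle hop: a handler that calls the self-hosted model with a timeout and retries with backoff before giving up, so a slow or failing model never blocks a PR. The endpoint, model name, and function are illustrative, not our actual service:

```python
import time
import requests

LLM_ENDPOINT = "http://llm.internal:8000/v1/chat/completions"  # illustrative internal endpoint
MAX_RETRIES = 3

def review_pull_request(diff_text: str) -> str | None:
    """Call the self-hosted model with a timeout and exponential backoff.

    Returns the review text, or None if analysis failed after retries
    (the caller posts a 'review unavailable' comment instead of blocking the PR).
    """
    payload = {
        "model": "llama-3.1-70b-instruct",  # assumed model name
        "messages": [{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    }
    for attempt in range(MAX_RETRIES):
        try:
            resp = requests.post(LLM_ENDPOINT, json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    return None  # give up gracefully; never leave the PR stuck waiting on the model
```

GitHub rate limits and model versioning sit on top of this, which is where most of the 6 weeks went.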

The Lessons We’re Learning

Lesson 1: Multiply POC estimates by 20x

  • POC: 3 weeks, $10K
  • Production: 9 months (so far), $180K+ (and counting)

Rule of thumb I’m using now: POC cost × 20 = production cost

Lesson 2: Security kills cloud AI for enterprises

Every enterprise we talk to:

  • Can’t use OpenAI/Anthropic for proprietary data
  • Forces self-hosting
  • Self-hosting = infrastructure complexity

This is THE barrier for enterprise AI adoption.

Lesson 3: Open source models aren’t “free”

Yes, Llama 3.1 is free to download.

But total cost of ownership:

  • GPU infrastructure: $5K/month
  • ML engineer: $200K/year = $16.6K/month
  • Contractor for setup: $60K
  • Fine-tuning compute: $3K/month
  • Monitoring and tools: $2K/month

Total: $26.6K/month, versus roughly $8K/month if we had stayed on the OpenAI API.

We’re paying 3.3x more to self-host. The “savings” are a myth.

Why we’re still doing it: Data security requirements (non-negotiable)
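
For anyone re-running this comparison, the arithmetic behind the 3.3x figure, using the numbers in the list above and amortizing the one-time contractor spend over the first year:

```python
# Self-hosting vs. API cost comparison (figures from the list above).
gpu = 5_000            # $/month GPU infrastructure
ml_engineer = 16_600   # $/month fully loaded
fine_tuning = 3_000    # $/month compute
monitoring = 2_000     # $/month tooling
contractor = 60_000    # one-time setup, amortized over 12 months

self_host_monthly = gpu + ml_engineer + fine_tuning + monitoring   # ~$26.6K steady state
self_host_year1 = self_host_monthly + contractor / 12              # ~$31.6K in year one
api_monthly = 8_000                                                 # estimated OpenAI spend

print(f"steady state: {self_host_monthly / api_monthly:.1f}x the API cost")
print(f"year one:     {self_host_year1 / api_monthly:.1f}x the API cost")
```

Steady state comes out around 3.3x; year one closer to 4x once setup is included.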

Lesson 4: POC hides the hard problems

POC answers: “Can AI do this task?”

Production answers:

  • Can AI do this task AT SCALE?
  • Can AI do this task RELIABLY?
  • Can AI do this task SECURELY?
  • Can AI do this task COST-EFFECTIVELY?
  • Can AI do this task WITH OUR DATA QUALITY?

These are different questions.

What I’d Do Differently

If I could restart this project:

1. Start with infrastructure assessment

Before POC:

  • ☐ What are our security requirements?
  • ☐ Can we use cloud APIs or must self-host?
  • ☐ If self-host, do we have GPU infrastructure?
  • ☐ Do we have ML engineering talent?

If answers are “must self-host” and “no infrastructure/talent,” STOP.

Either:

  • Build infrastructure first (6-12 months)
  • OR use AI-enabled products instead of building custom

2. Do POC with production constraints

Don’t demo GPT-4 if production will be Llama 3.1.

POC should use:

  • Same model as production
  • Same infrastructure as production
  • Same data quality as production
  • Same security constraints as production
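
One way to hold the “same model, same infrastructure” line is to point the POC at the serving stack production will use from day one. Since vLLM exposes an OpenAI-compatible API, the same client code can target either; a minimal sketch, where the internal hostname and API key handling are illustrative assumptions:

```python
from openai import OpenAI

# Production constraint: self-hosted vLLM behind our firewall, exposing an
# OpenAI-compatible endpoint. The POC should hit this, not api.openai.com.
client = OpenAI(
    base_url="http://llm.internal:8000/v1",  # illustrative internal host
    api_key="unused",                         # vLLM only checks this if the server requires a key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # the model production will actually run
    messages=[{"role": "user", "content": "Review this pull request diff: ..."}],
)
print(response.choices[0].message.content)
```

If the demo only looks good when the base_url points at OpenAI, you have learned that before spending $180K, not after.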

3. Budget for production from day 1

POC pitch should be:

  • “POC will cost $10K and take 3 weeks”
  • “Production will cost $500K and take 12 months”
  • “Do we have $500K and 12 months?”

If no, don’t start POC.

4. Hire ML talent BEFORE starting

We tried to build AI without ML engineers. Bad idea.

Should have:

  • Hired ML engineer first
  • Had them architect the solution
  • Then started POC with production-ready approach

The 20% Success Rate Makes Sense Now

Looking at @cto_michelle’s 5 barriers:

  1. :white_check_mark: Legacy system integration - YEP, we hit this
  2. :white_check_mark: Data quality - YEP, our code has inconsistent formatting
  3. :white_check_mark: Skills gap - YEP, still trying to hire
  4. :hourglass_not_done: Governance - Haven’t hit this yet (will be barrier 6)
  5. :white_check_mark: Cost at scale - YEP, way more expensive than POC

We’ve hit 4 out of 5 barriers. No wonder 80% fail.

Are We Going to Make It?

Honest assessment:

Optimistic case (40% probability):

  • Hire ML engineer in next 2 months
  • Fine-tune model to acceptable quality
  • Deploy to production by month 12
  • Developers use it and save time
  • ROI positive after 18 months

Realistic case (40% probability):

  • Struggle to hire ML engineer
  • Launch with “good enough” model quality
  • Some developers use it, many ignore it
  • Mediocre ROI, project limps along

Pessimistic case (20% probability):

  • Can’t hire ML engineer
  • Model quality not good enough
  • Project cancelled after $300K spent
  • Join the 80% failure statistic

My Advice for Engineering Leaders

Before starting AI project, answer these:

  1. Can we use off-the-shelf product instead?

    • GitHub Copilot exists, so why are we building custom?
    • Answer: We want code review, not code completion
  2. Can we use cloud APIs?

    • Security says no for code
    • Answer: Must self-host
  3. Do we have infrastructure?

    • No GPU platform
    • Answer: Need to build (6 months)
  4. Do we have talent?

    • No ML engineers
    • Answer: Need to hire (unknown timeline)
  5. What’s the ROI at production cost?

    • Save 5 hours/week × 100 developers = 500 hours/week
    • Value: ~$50K/month
    • Cost: ~$27K/month (at scale)
    • ROI: Positive if it works
    • Answer: Worth pursuing IF we can solve 1-4

If answers to 1-4 are all blockers, STOP.

Questions for @cto_michelle

You mentioned:

30% of tech-advanced companies successfully implemented AI at scale

What separates them from the 70% that failed?

We’re clearly “tech-advanced” (engineering team of 120), but we’re struggling.

What are we missing?

Sources:

  • Our internal project timeline and costs
  • 6 months of painful lessons
  • Conversations with other enterprise engineering teams at SF Tech Week
  • IBM and PwC enterprise AI adoption studies

Product manager here - let me add the USER ADOPTION perspective that technical folks often miss.

The Barrier Nobody Talks About: People

You can solve all 5 technical barriers (@cto_michelle’s list):

  • :white_check_mark: Integration with legacy systems
  • :white_check_mark: Data quality
  • :white_check_mark: Skills gap
  • :white_check_mark: Governance
  • :white_check_mark: Cost

And STILL fail at production because users don’t adopt it.

Our Story: AI-Powered Sales Tool

POC Success:

  • AI analyzes sales calls, provides real-time coaching
  • Tested with 5 top sales reps
  • They LOVED it: “This is incredible! Game-changer!”
  • Executives greenlit $800K production build

Production Reality:

  • Deployed to 200-person sales team
  • 6 months later, usage data:
    • 12% active users (24 out of 200)
    • 88% never even logged in

We solved all the technical problems. Failed at adoption.

Why Users Don’t Adopt Enterprise AI

Reason 1: “AI will replace me” Fear

Sales reps thought:

  • “If AI can coach me, can AI replace me?”
  • “If I use AI and succeed, is it me or the AI?”
  • “Management will use this to track my performance”

Result: Passive resistance. Not openly opposed, just… never use it.

Reason 2: Workflow Disruption

AI tool required:

  • Installing browser extension
  • Granting microphone access
  • Recording all calls
  • Reviewing AI suggestions after each call (5 minutes)

Sales reps: “I have 30 calls/day. I don’t have time for 2.5 extra hours reviewing AI.”

We built a feature. Users needed a workflow.

Reason 3: Trust Issues

AI suggested:

  • “Mention competitor pricing” (violates our sales policy)
  • “Follow up in 2 days” (customer explicitly said call back in 2 weeks)
  • “Emphasize ROI” (customer cares about compliance, not ROI)

After 3-4 bad suggestions, reps stopped trusting it.

The AI was 85% accurate. But the 15% of errors destroyed trust.

Reason 4: Lack of Training

We deployed with:

  • :white_check_mark: Technical documentation
  • :white_check_mark: Tutorial video (15 minutes)
  • :cross_mark: No hands-on training
  • :cross_mark: No champions to help peers
  • :cross_mark: No ongoing support

Reps who hit issues:

  • Couldn’t troubleshoot
  • Contacted IT
  • IT didn’t know how to support AI tool
  • Rep gave up

Documentation is not training.

Reason 5: No Executive Use = No Urgency

Sales VPs didn’t use the tool.

Message to sales reps: “This is optional.”

Compare to Salesforce:

  • Executives USE Salesforce daily
  • Clear message: “If it’s not in Salesforce, it doesn’t exist”
  • Adoption: 95%

AI tool:

  • Executives just wanted reports from it
  • Unclear if it was mandatory or optional
  • Adoption: 12%

Users adopt what leadership uses.

The Change Management We Should Have Done

Based on painful retrospective:

Phase 1: Before POC (Change Management = 0%)

What we did:

  • :cross_mark: Skipped change management
  • :cross_mark: “Let’s just build it and they’ll love it”

What we should have done:

  • :white_check_mark: User research (what do sales reps actually need?)
  • :white_check_mark: Involve reps in design (co-create, not impose)
  • :white_check_mark: Address fears upfront (AI augments, doesn’t replace)

Phase 2: During Development (Change Management = 10%)

What we did:

  • :white_check_mark: Showed demos to sales leadership
  • :cross_mark: Didn’t involve actual sales reps
  • :cross_mark: Built in isolation

What we should have done:

  • :white_check_mark: Beta program with 20 reps (not just top performers)
  • :white_check_mark: Iterate based on feedback
  • :white_check_mark: Build champions who can advocate to peers

Phase 3: Deployment (Change Management = 20%)

What we did:

  • :white_check_mark: Announcement email from VP
  • :white_check_mark: Tutorial video
  • :cross_mark: Assumed that’s enough

What we should have done:

  • :white_check_mark: Hands-on training (2-hour workshop for every rep)
  • :white_check_mark: Office hours (daily support for first month)
  • :white_check_mark: Champion network (1 champion per 10 reps)
  • :white_check_mark: Incentives (gamification, recognition for top users)

Phase 4: Post-Launch (Change Management = 30%)

What we did:

  • :white_check_mark: Monitored usage metrics
  • :white_check_mark: Fixed bugs
  • :cross_mark: No proactive outreach to non-users

What we should have done:

  • :white_check_mark: 1-on-1s with non-adopters (understand barriers)
  • :white_check_mark: Success stories (showcase reps who benefited)
  • :white_check_mark: Continuous improvement (ship features users request)
  • :white_check_mark: Executive accountability (VPs use the tool themselves)

The AI Adoption Formula

Technical success ≠ User adoption

Formula for production AI success:

Technical Excellence (50%):

  • Works reliably
  • Integrates with systems
  • Acceptable quality
  • Secure and compliant

User Adoption (50%):

  • Solves real user pain (not what executives think users need)
  • Fits into workflow (minimal disruption)
  • Earns trust (high accuracy + transparent about limitations)
  • Supported by training and champions
  • Modeled by leadership

Most AI projects focus 90% on technical, 10% on adoption.

Should be 50/50.

The Relaunch Plan

We’re doing a reboot (6 months after the failed launch):

1. User research (4 weeks)

  • Interview 40 sales reps
  • Understand actual pain points
  • Identify workflow constraints
  • Address fears and concerns

2. Redesign (8 weeks)

  • Simplify: Remove features reps didn’t want
  • Workflow integration: Work within existing tools (Salesforce, not standalone)
  • Trust building: Show confidence scores, explain reasoning

3. Beta program (8 weeks)

  • 20 reps, diverse (not just top performers)
  • Weekly feedback sessions
  • Iterate rapidly based on input
  • Build champions

4. Phased rollout (12 weeks)

  • Start with champion teams (20 reps)
  • Hands-on training (2 hours per rep)
  • Week 4: Expand to next 40 reps
  • Week 8: Expand to next 80 reps
  • Week 12: Full deployment (200 reps)

5. Executive accountability

  • Sales VPs commit to using tool themselves
  • Review AI insights in team meetings
  • Recognize top users publicly

Budget: $200K for change management (on top of $800K technical build)

Goal: 70% adoption within 6 months

The ROI of Change Management

Original launch:

  • Technical cost: $800K
  • Adoption: 12%
  • Value delivered: $800K × 12% = $96K worth of value
  • ROI: Negative

Relaunch with change management:

  • Technical cost: $800K (sunk)
  • Change management: $200K
  • Adoption target: 70%
  • Value delivered: $800K × 70% = $560K worth of value
  • ROI: Positive (assuming success)

Lesson: $200K in change management turns a failed $800K project into a successful one.

My Advice for Product Managers

Before building AI product:

  1. User research FIRST

    • What problems do users actually have?
    • Will AI solve them better than current solutions?
    • Will users change behavior to use AI?
  2. Prototype with Wizard of Oz

    • Human pretends to be AI
    • Test if users want the feature
    • Validate workflow integration
    • THEN build real AI
  3. Budget for adoption

    • Technical build: X
    • Change management: 0.25X (25% of technical cost)
    • Training and support: 0.15X (15% of technical cost)
    • Total: 1.4X
  4. Measure adoption, not just technical metrics

    • Not just “does it work?”
    • Also “are people using it?”
    • Track: Active users, frequency, retention

Questions for This Community

For product managers:

  • What’s your AI product adoption rate?
  • How much do you invest in change management vs. technical build?

For CTOs/eng leaders:

  • How do you balance technical excellence with user adoption?
  • Do you involve users in AI development process?

For @cto_michelle and @eng_director_luis:

  • Are you planning for change management?
  • Who owns user adoption (product, engineering, or someone else)?

The 80% failure rate isn’t just technical. It’s organizational and cultural.

Sources:

  • Our failed launch and retrospective
  • SF Tech Week “Enterprise AI Adoption” workshop (Day 5)
  • Change management research from PwC and IBM studies
  • Conversations with 8 other product teams at SF Tech Week who had similar adoption failures

Security and compliance perspective: The governance barrier is MASSIVE and often underestimated.

The 18-Month Governance Timeline is Real

@cto_michelle mentioned:

Average 18 months to implement AI governance

I’m living this. We’re 11 months into building an AI governance framework and still not done.

What “AI Governance” Actually Means

Most people think: “Write a policy, check a box, done.”

Reality: It’s an entire organizational process covering:

1. AI Risk Assessment Framework

  • Identify all AI use cases
  • Classify by risk level (high/medium/low)
  • Different approval workflows for each level
  • Risk assessment template and process

Timeline to build: 3 months

2. Model Validation and Testing

  • Bias testing (does model discriminate?)
  • Adversarial testing (can it be manipulated?)
  • Performance testing (accuracy, precision, recall)
  • Explainability testing (can we explain decisions?)

Timeline per model: 2-4 weeks
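
For the bias-testing piece, the simplest starting point is a group fairness metric computed straight from model outputs, before reaching for a full toolkit. A minimal sketch of demographic parity and equal-opportunity gaps in plain pandas; the data and column names are illustrative stand-ins:

```python
import pandas as pd

# One row per screened candidate: model decision, ground truth, protected attribute.
df = pd.DataFrame({
    "selected":  [1, 0, 1, 1, 0, 0, 1, 0],              # model's screening decision (illustrative)
    "qualified": [1, 0, 1, 1, 1, 0, 1, 0],              # ground-truth label (illustrative)
    "group":     ["a", "a", "a", "a", "b", "b", "b", "b"],  # protected attribute
})

# Demographic parity: selection rate per group should be comparable.
selection_rates = df.groupby("group")["selected"].mean()

# Equal opportunity: true-positive rate per group among qualified candidates.
tpr = df[df["qualified"] == 1].groupby("group")["selected"].mean()

print("selection rate gap:", selection_rates.max() - selection_rates.min())
print("TPR gap:", tpr.max() - tpr.min())
```

Toolkits like AI Fairness 360 add more metrics and mitigation, but a gap report like this is enough to catch the obvious problems before a compliance review does.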

3. Data Governance

  • What data can be used for AI training?
  • How is data anonymized/de-identified?
  • Consent management (do we have permission?)
  • Data retention and deletion policies

Timeline to build: 4 months

4. Compliance Mapping

  • EU AI Act (entered into force August 2024; obligations phase in through 2026)
  • State AI laws (Colorado, California, etc.)
  • Industry regulations (HIPAA, SOX, PCI-DSS)
  • International laws (GDPR, etc.)

Timeline: 2 months initial, ongoing updates

5. AI Ethics and Responsible AI

  • Define responsible AI principles
  • Create AI ethics review board
  • Establish fairness criteria
  • Document decision-making processes

Timeline: 3 months

6. Audit Trail and Monitoring

  • Log all AI decisions
  • Track model versions
  • Monitor for drift and degradation
  • Incident response procedures

Timeline: 2 months

Total to build a comprehensive framework: 14-18 months

The Compliance Trap for AI POCs

Common pattern:

POC team: “We’ll skip compliance for the demo, add it later.”

6 months later, trying to go to production:

Compliance team: “This doesn’t meet ANY of our requirements. Start over.”

Real example from our company:

POC: AI hiring tool screens resumes

  • Built in 4 weeks
  • 92% accuracy finding qualified candidates
  • HR team loved it

Compliance review:

  • :cross_mark: No bias testing (could discriminate by race, gender, age)
  • :cross_mark: No explainability (can’t tell candidates why they were rejected)
  • :cross_mark: Violates EEOC requirements (need to show hiring process is fair)
  • :cross_mark: No audit trail (can’t prove decisions for legal defense)

Result: Project blocked. Can’t deploy until all compliance requirements met.

Timeline to fix: 6 months

Total wasted time: 4 weeks building + 6 months fixing = 7 months, vs. doing it right from the start

The EU AI Act Impact

The EU AI Act (in force since August 2024, with high-risk obligations phasing in through August 2026) classifies AI systems by risk:

Unacceptable risk: Banned

  • Social scoring
  • Real-time biometric surveillance
  • Manipulative AI

High risk: Strict requirements

  • AI in hiring (bias testing, transparency)
  • AI in credit decisions (explainability, audit trail)
  • AI in healthcare (safety testing, human oversight)

Requirements for high-risk AI:

  • Risk management system
  • Data governance
  • Technical documentation
  • Record-keeping (audit trail)
  • Transparency and user information
  • Human oversight
  • Accuracy, robustness, security

Penalty for non-compliance: Up to €35M or 7% of global revenue

Impact on our POCs:

3 out of 5 AI POCs we built are classified as “high-risk”:

  • AI hiring tool
  • AI credit scoring for internal procurement
  • AI-powered fraud detection

All 3 now require full compliance before production deployment.

Timeline impact: +6 months per project for compliance work

The Governance Maturity Model

Based on conversations at SF Tech Week security track:

Level 0: No governance (most startups)

  • Building AI without any framework
  • “Move fast and break things”
  • High risk of compliance violations

Level 1: Reactive governance (early AI adopters)

  • Address compliance when forced to
  • No proactive risk management
  • Slow, expensive compliance retrofitting

Level 2: Policy-based governance (where most enterprises are)

  • Written AI policies and principles
  • Approval workflows
  • But: Not systematically enforced

Level 3: Systematic governance (AI-mature companies - 30%)

  • Automated compliance checking
  • Model registry and versioning
  • Continuous monitoring
  • Integrated into development process

Level 4: AI-native governance (rare, <5%)

  • AI governance is competitive advantage
  • Fast compliance (not blocker)
  • Transparent and explainable by design
  • Trust as differentiator

The Tools We’re Using

Building governance isn’t just process - you also need tools:

Model registry:

  • Track all AI models
  • Version control
  • Metadata (training data, performance, risks)
  • We use: MLflow
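
As a concrete illustration of the registry piece, a minimal MLflow sketch: log the run, attach governance-relevant metadata as params and tags, and register a versioned model. The classifier here is a stand-in, and names like “resume-screener” and the tag values are illustrative:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; in practice this is the production scoring model.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# mlflow.set_tracking_uri("http://mlflow.internal:5000")  # point at a shared server; defaults to ./mlruns

with mlflow.start_run(run_name="resume-screener-v3"):
    # Governance metadata lives with the run, so auditors can trace a deployed
    # version back to its training data snapshot and approvals.
    mlflow.log_params({"training_data_snapshot": "2025-06-01", "risk_level": "high"})
    mlflow.log_metric("eval_accuracy", model.score(X, y))
    mlflow.set_tag("bias_tested", "true")
    mlflow.set_tag("approved_by", "ai-governance-board")
    # Registering a named version requires a database-backed tracking server.
    mlflow.sklearn.log_model(model, "model", registered_model_name="resume-screener")
```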

Bias testing:

  • Detect discrimination across protected classes
  • Fairness metrics (demographic parity, equal opportunity)
  • We use: AI Fairness 360 (IBM)

Explainability:

  • SHAP values (feature importance)
  • LIME (local explanations)
  • We use: SHAP library
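
For the explainability requirement, a minimal SHAP sketch: fit a stand-in tabular model, compute per-decision attributions, and keep the top drivers of each decision for the audit record. Model, feature names, and the “top 3” cut are illustrative choices:

```python
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in tabular model; in practice this is the production scoring model.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.Explainer(model, X)   # dispatches to the tree explainer for this model
shap_values = explainer(X.iloc[:5])    # explain five individual decisions

# Per-decision attributions: which features pushed this prediction up or down.
for i in range(5):
    top = pd.Series(shap_values.values[i], index=X.columns).abs().nlargest(3)
    print(f"decision {i}: top drivers -> {list(top.index)}")
```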

Audit logging:

  • Every AI decision logged
  • Immutable audit trail
  • We use: Custom build on top of our SIEM

Policy as code:

  • Automated compliance checks
  • Block non-compliant deployments
  • We use: Open Policy Agent (OPA)
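
And for policy as code, a hedged sketch of how a CI/CD step can gate deployment on an OPA decision via its REST data API. The policy package path and the input fields are our illustrative conventions; the Rego policy itself lives on the OPA server and isn’t shown here:

```python
import sys
import requests

# Deployment metadata gathered earlier in the pipeline (illustrative fields).
deployment = {
    "model_name": "resume-screener",
    "risk_level": "high",
    "bias_tested": True,
    "explainability_report": True,
    "audit_logging_enabled": True,
}

# Ask OPA for a decision; /v1/data/<package>/<rule> is OPA's standard data API.
resp = requests.post(
    "http://opa.internal:8181/v1/data/ai/governance/allow",  # illustrative policy path
    json={"input": deployment},
    timeout=5,
)
resp.raise_for_status()

if not resp.json().get("result", False):
    print("Deployment blocked: governance policy not satisfied")
    sys.exit(1)
print("Governance checks passed")
```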

Total tooling cost: $50K/year + engineering time to integrate

The Governance Team We Built

You can’t build governance without people:

Our AI governance team (company of 5,000 employees):

  • 1 AI governance lead (new role, hired from outside)
  • 2 compliance specialists (existing team, 50% allocated)
  • 1 AI ethics researcher (new role)
  • 1 legal counsel specializing in AI (contract, not full-time)
  • 3 security engineers (existing team, 25% allocated)

Cost: $800K/year fully loaded

For context: We only have 3 production AI systems. That’s $267K/year governance cost per AI system.

The Fast-Track Governance Approach

If you’re starting from zero and need to move faster than 18 months:

Option 1: Third-party governance platform

  • Companies like Credo AI, DataRobot, Fiddler
  • Pre-built compliance frameworks
  • Faster to deploy (3-6 months vs. 18)
  • Cost: $100K-500K/year depending on scale

Option 2: Limit to low-risk use cases only

  • Avoid high-risk AI (hiring, credit, healthcare)
  • Focus on internal efficiency tools (lower compliance burden)
  • Faster to production, less governance needed

Option 3: Partner with AI vendors who handle compliance

  • Use OpenAI/Anthropic APIs (they handle some compliance)
  • Buy AI-enabled SaaS (vendor handles governance)
  • Trade-off: Less customization, data sovereignty concerns

My Recommendations

For startups (<100 employees):

  • Don’t build governance from scratch
  • Use third-party platforms or API vendors
  • Focus on building product, not compliance infrastructure

For mid-size companies (100-1000 employees):

  • Start with Level 2 (policy-based governance)
  • Hire 1 dedicated governance lead
  • Use open-source tools (AI Fairness 360, SHAP)
  • Budget 6-9 months to build basics

For enterprises (1000+ employees):

  • Invest in Level 3 (systematic governance)
  • Build dedicated governance team (3-5 people)
  • Consider third-party platforms to accelerate
  • Budget 12-18 months for comprehensive framework

For everyone:

  • Don’t skip governance in POC
  • At minimum: Bias testing, explainability, audit trail
  • Build compliance in from day 1, not as afterthought

Questions for @cto_michelle and @eng_director_luis

@eng_director_luis you mentioned:

Barrier 6: Governance (haven’t hit this yet)

You will. And it will be painful.

My advice:

  • Start governance review NOW (don’t wait for deployment)
  • Identify compliance requirements upfront
  • Budget 3-6 months for governance work

@cto_michelle asked:

Is the 20% success rate acceptable?

From security/compliance perspective: No.

80% failure rate = wasted investment, frustrated teams, missed opportunities.

The fix: Build governance early, not late. It’s slower upfront but faster overall.

The Opportunity

Hot take: Governance is competitive advantage.

Companies that:

  • :white_check_mark: Have mature AI governance
  • :white_check_mark: Can move fast AND comply
  • :white_check_mark: Can demonstrate trustworthy AI

Will win enterprise deals.

Customers are asking:

  • “How do you ensure your AI is unbiased?”
  • “Can you explain AI decisions for audits?”
  • “Are you EU AI Act compliant?”

Companies with good answers: Win deals.
Companies without: Lose to competitors.

Sources:

  • Our 11-month AI governance implementation
  • SF Tech Week “AI Security and Governance” track (Day 5)
  • EU AI Act (in force since August 2024)
  • IBM AI Fairness 360 and governance frameworks
  • Conversations with compliance teams from 6 enterprises at SF Tech Week