🏗️ SF Tech Week Enterprise Panel: Why 80% of AI Projects Never Make It to Production

Just came from the most sobering panel of SF Tech Week: “Enterprise AI: From POC to Production.”

Four enterprise CTOs. All deploying AI at scale. All with horror stories.

The Stat That Shocked the Room

“Less than 20% of AI initiatives are scaled across the enterprise.”

Source: EPAM study, confirmed by all 4 panelists from their own experience

Translation: 80% of AI projects FAIL to reach production.

They don’t fail because the technology doesn’t work. They fail because of organizational, technical, and cultural barriers.

The “AI POC Success” Trap

Here’s the pattern every panelist described:

Phase 1: POC Success :white_check_mark:

  • Pick a well-scoped use case
  • Use GPT-4 or Claude API
  • Build a demo in 2-4 weeks
  • Show impressive results
  • Executive sponsors excited

Phase 2: Production Reality :cross_mark:

  • Try to integrate with enterprise systems
  • Hit data quality issues
  • Discover compliance requirements
  • Realize infrastructure doesn’t scale
  • Security review kills the project

One CTO called it “the Death Valley between POC and production.”

The 5 Barriers to Production AI

From the panel + supporting research:

1. Legacy System Integration (60% cite this as the top barrier)

The POC:

  • Demo works on clean CSV data
  • No system dependencies
  • Runs on someone’s laptop

Production reality:

  • Data in 15 different systems (SAP, Salesforce, Oracle, custom databases)
  • No APIs, only batch file exports from 1995
  • Data format inconsistencies
  • Need real-time sync but systems don’t support it

Panelist quote: “Our POC worked perfectly. Then we found out the data we needed was in a mainframe from 1987 that nobody knows how to access anymore.”

2. Data Quality and Availability (cited by 50% of companies)

The POC:

  • Hand-curated sample data
  • Edge cases removed
  • “Representative” subset (actually: cleaned and perfect)

Production reality:

  • Missing data (30% of fields null)
  • Inconsistent formats (dates in 7 different formats)
  • Duplicate records
  • Contradictory data across systems
  • Historical data needed for training doesn’t exist

Data quality rule: the POC runs on the cleanest 10% of your data; production has to handle the other 90% (a quick profiling pass, sketched below, surfaces this before you commit).
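
A minimal pandas sketch of that profiling pass, run against the raw production extract rather than the curated POC sample. File and column names here are illustrative assumptions, not anyone’s real schema:

```python
import pandas as pd

# Profile the *production* extract, not the hand-curated POC sample.
df = pd.read_csv("loan_applications_raw.csv")  # illustrative file name

# Share of missing values per column (the "30% of fields null" problem).
null_rates = df.isna().mean().sort_values(ascending=False)

# Exact duplicate records.
duplicate_rate = df.duplicated().mean()

# How many rows fail to parse with the single date format the POC assumed?
parsed = pd.to_datetime(df["application_date"], format="%Y-%m-%d", errors="coerce")
unparseable_rate = parsed.isna().mean()

print(null_rates.head(10))
print(f"duplicates: {duplicate_rate:.1%}, dates not in expected format: {unparseable_rate:.1%}")
```

If these numbers look bad on day one, they will not look better at production scale.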

3. Skills Gap (40% lack AI expertise)

The POC:

  • External consultants or ML engineers build it
  • Data scientists fine-tune the model
  • Proof of concept doesn’t need maintenance

Production reality:

  • Need engineers to maintain it (consultants are gone)
  • IT team doesn’t understand ML (can’t debug when it breaks)
  • Data drift happens (model accuracy degrades)
  • Nobody on staff can retrain or update models

Hiring challenge: 43% of companies plan to hire AI roles in 2025, competing for the same small talent pool.

Most in-demand roles:

  • Machine learning engineers
  • AI researchers
  • ML ops engineers
  • AI ethics/governance specialists

4. AI Governance and Compliance (18 months to implement)

The POC:

  • “Don’t worry about compliance for the demo”
  • Skips security review
  • No audit trail
  • No bias testing

Production reality:

  • Legal requires AI risk assessment (new process, takes 3 months)
  • Compliance requires model explainability (black box = blocked)
  • Security requires penetration testing of AI components
  • Privacy requires data minimization and consent
  • Regulations (EU AI Act, state laws) require documentation

Average time to implement AI governance: 18 months

From idea to a governance framework actually in place: a year and a half.

5. Cost at Scale

The POC:

  • 1,000 API calls to GPT-4
  • Cost: $50
  • “Totally affordable!”

Production reality:

  • 10 million API calls/month
  • Cost: $500,000/year
  • CFO: “Absolutely not”

Need to:

  • Optimize prompts (reduce tokens)
  • Switch to smaller models (lose quality)
  • Self-host open source (infrastructure complexity)
  • Implement caching (engineering effort)

One panelist’s estimate: POC to production is roughly a 23x cost multiplier once all infrastructure, tooling, and engineering are accounted for.
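
To make the scaling cliff easy to rerun with your own numbers, here is a back-of-the-envelope cost model. Every price, token count, and cache-hit rate below is an illustrative placeholder, not a quote from any provider or from the panel:

```python
# Back-of-the-envelope LLM API cost model. All numbers are placeholders.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # $ per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # $ per 1K output tokens (assumed)
TOKENS_IN_PER_CALL = 1500          # prompt size (assumed)
TOKENS_OUT_PER_CALL = 400          # completion size (assumed)

def monthly_api_cost(calls_per_month: int, cache_hit_rate: float = 0.0) -> float:
    """Cost of billable calls; cached responses are assumed to cost ~nothing."""
    per_call = (TOKENS_IN_PER_CALL / 1000) * PRICE_PER_1K_INPUT_TOKENS \
             + (TOKENS_OUT_PER_CALL / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return calls_per_month * (1 - cache_hit_rate) * per_call

print(f"POC scale   (1K calls):   ${monthly_api_cost(1_000):,.0f}/month")
print(f"Prod scale  (10M calls):  ${monthly_api_cost(10_000_000):,.0f}/month")
print(f"Prod + 40% cache hits:    ${monthly_api_cost(10_000_000, 0.4):,.0f}/month")
```

Running the POC numbers and the production numbers through the same model, before the demo, is how you avoid the CFO conversation later.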

Real Enterprise AI Production Stories

Success Story: Financial Services Company

Use case: Automated document processing for loan applications

POC (4 weeks):

  • GPT-4 API extracts data from PDFs
  • 95% accuracy on test set
  • Stakeholders thrilled

Production journey (14 months):

  • Month 1-3: Security review (data can’t leave premises → need self-hosted model)
  • Month 4-6: Infrastructure build (GPU cluster, ML ops platform)
  • Month 7-9: Model fine-tuning (open source model to match GPT-4 accuracy)
  • Month 10-11: Integration (connect to 8 legacy systems)
  • Month 12-14: Compliance (audit trail, explainability, testing)

Final cost:

  • POC: $5K
  • Production: $800K first year (infrastructure + engineering)

Result: Successful deployment, processes 50K documents/month, saves $2M/year in manual labor

ROI positive after 6 months in production

Failure Story: Healthcare Company

Use case: AI-powered patient diagnosis support

POC (6 weeks):

  • LLM analyzes patient records, suggests diagnoses
  • Doctors love it in trials
  • 85% accuracy vs expert human

Production attempt (failed after 9 months):

  • HIPAA compliance blocked cloud AI APIs
  • Self-hosting required (security review takes 4 months)
  • Model explainability insufficient (doctors need to know WHY AI suggested diagnosis)
  • Liability concerns (who’s responsible if AI is wrong?)
  • Integration with EHR system impossible (vendor won’t provide API)

Result: Project cancelled, $1.2M spent, zero production deployment

Lesson: Some use cases aren’t ready for AI (regulatory/liability too high)

The 30% Who Succeed: What They Do Differently

IBM study: 30% of tech-advanced companies successfully implemented AI at scale

What separates success from failure:

:white_check_mark: Start with infrastructure, not use cases

  • Build ML ops platform first
  • Establish data pipelines
  • Create governance framework
  • THEN identify use cases

:white_check_mark: Choose low-risk, high-volume use cases

  • Not “AI diagnosis” (high risk)
  • Yes “email triage” (low risk)
  • Focus on efficiency, not critical decisions

:white_check_mark: Invest in change management

  • Train employees on AI tools
  • Address “AI will replace me” fears
  • Create AI champions in each department

:white_check_mark: Plan for the whole lifecycle

  • POC budget: $10K
  • Production budget: $500K-2M (50-200x multiplier)
  • If you can’t afford production, don’t start POC

:white_check_mark: Hybrid approach

  • Use APIs for low-volume, low-risk
  • Self-host for high-volume, high-sensitivity
  • Don’t go all-in on one strategy

The Questions I’m Taking Back to My Team

We’re a mid-size company (500 employees) evaluating AI deployment.

Based on this panel, here’s my new checklist before starting ANY AI project:

Before POC:

  1. ☐ Do we have executive sponsorship + multi-year budget?
  2. ☐ Have we assessed data quality and availability?
  3. ☐ Do we have (or can we hire) ML engineering talent?
  4. ☐ Is our infrastructure ready (or can we build it)?
  5. ☐ Have we identified compliance requirements upfront?
  6. ☐ Can we commit to 12-18 month timeline?
  7. ☐ Is the ROI worth the investment (realistic production cost)?

If answer to ANY question is “no,” we shouldn’t start the POC.

POC-to-production checklist:

  1. ☐ Integration plan with all systems (documented before POC)
  2. ☐ Data quality assessment (measure completeness, accuracy)
  3. ☐ Compliance review completed (legal, security, privacy)
  4. ☐ Production cost model (realistic, not POC costs)
  5. ☐ Team trained (not relying on consultants)
  6. ☐ Monitoring and observability plan
  7. ☐ Model governance (versioning, retraining, rollback)

My Controversial Take

Hot take from the panel (everyone nodded):

“Most companies should NOT be building custom AI models. Use off-the-shelf AI products instead.”

Instead of:

  • :cross_mark: Building custom LLM application from scratch
  • :cross_mark: Fine-tuning open source models
  • :cross_mark: Hiring ML team

Consider:

  • :white_check_mark: Buying AI-enabled SaaS products (Salesforce with Einstein, Microsoft with Copilot)
  • :white_check_mark: Using AI APIs for specific tasks (OpenAI, Anthropic, Cohere)
  • :white_check_mark: Partnering with AI consultancies for specialized use cases

When to build custom:

  • AI is your core competitive advantage
  • Unique data gives you proprietary edge
  • Volume justifies infrastructure investment (>$500K/year in API costs)

When to buy:

  • AI is supporting tool, not core business
  • Standard use cases (email, documents, customer support)
  • Small/mid-size company (<1000 employees)

Questions for This Community

For CTOs/engineering leaders:

  • What’s been your POC → production success rate?
  • What’s your biggest barrier (integration, skills, cost, compliance)?
  • Are you building or buying AI?

For ML engineers:

  • How do you convince leadership that production is 10-50x harder than POC?
  • What does your ML ops stack look like?

For everyone:

  • Is the 20% success rate acceptable or is enterprise AI fundamentally broken?

I’m trying to avoid becoming part of the 80% failure statistic.

Sources:

  • SF Tech Week “Enterprise AI: From POC to Production” panel (Day 5)
  • EPAM “What Is Holding Up AI Adoption” study
  • PwC 2025 AI Business Predictions
  • IBM “5 Biggest AI Adoption Challenges”
  • Pellera Technologies AI Adoption Challenges
  • Converge TP Top 5 AI Challenges 2025
  • Panel CTOs from: Financial services, healthcare, manufacturing, retail

@cto_michelle This hits HARD. We’re living this right now.

Our POC → Production Journey (Still Ongoing)

Use case: AI code review automation

POC (3 weeks):

  • :white_check_mark: GPT-4 analyzes pull requests
  • :white_check_mark: Suggests improvements (performance, security, style)
  • :white_check_mark: Developers love it in beta (10 person team)
  • :white_check_mark: Demo to CTO: “This will save 5 hours/week per developer!”

Production attempt (currently month 7):

  • :cross_mark: Integration with GitHub Enterprise (self-hosted, behind firewall)
  • :cross_mark: Data security (code can’t leave our VPC)
  • :cross_mark: Need self-hosted LLM (switching from GPT-4 to Llama 3.1)
  • :cross_mark: Infrastructure build (GPU servers, vLLM, monitoring)
  • :cross_mark: Model quality dropped (Llama 3.1 not as good as GPT-4 for code)
  • :cross_mark: Fine-tuning on our codebase (need ML engineer, don’t have one)
  • :counterclockwise_arrows_button: Currently stuck in “infrastructure build” phase

Original budget: $10K for POC
Actual cost so far: $180K and counting

The Barriers We Hit (In Order)

Barrier 1: Security Review

Security team: “Code is intellectual property. Can’t send to OpenAI.”

Us: “But it’s just a demo…”

Security: “No cloud AI APIs with proprietary code. Non-negotiable.”

Timeline impact: +2 months to evaluate self-hosting options

Barrier 2: Infrastructure

Need to self-host LLM:

  • GPU servers (we’re mostly CPU-based infrastructure)
  • ML ops platform (no expertise in-house)
  • Monitoring and logging (different from our standard tools)
  • Load balancing and scaling (new patterns for us)

Decision: Hire contractor for 3 months to build infrastructure

Timeline impact: +3 months

Barrier 3: Model Quality

Llama 3.1 70B (best we can self-host with our GPU budget):

  • :white_check_mark: Decent at general code review
  • :cross_mark: Misses security issues GPT-4 caught
  • :cross_mark: Style suggestions less helpful
  • :cross_mark: Hallucinates more often

Developers: “This is worse than the POC. Why did we downgrade?”

Need: Fine-tuning on our codebase to improve quality

Barrier 4: Skills Gap

Nobody on our team knows how to:

  • Fine-tune LLMs
  • Optimize inference performance
  • Debug model quality issues
  • Implement ML monitoring

Decision: Trying to hire an ML engineer (6 weeks of searching, no offers accepted yet)

Timeline impact: +??? (still hiring)

Barrier 5: Integration Complexity

GitHub Enterprise webhook → our infrastructure → LLM → back to GitHub

Sounds simple. Reality:

  • Rate limiting (GitHub API limits)
  • Error handling (what if LLM times out?)
  • Retry logic (what if analysis fails?)
  • Versioning (how to handle model updates without breaking?)

Engineering effort: 2 senior engineers, 6 weeks
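
For a sense of the shape of that pipeline, here is a hedged sketch of the middle hop: a handler that calls the self-hosted model with a timeout and retries with backoff before giving up, so a slow or failing model never blocks a PR. The endpoint, model name, and function are illustrative, not our actual service:

```python
import time
import requests

LLM_ENDPOINT = "http://llm.internal:8000/v1/chat/completions"  # illustrative internal endpoint
MAX_RETRIES = 3

def review_pull_request(diff_text: str) -> str | None:
    """Call the self-hosted model with a timeout and exponential backoff.

    Returns the review text, or None if analysis failed after retries
    (the caller posts a 'review unavailable' comment instead of blocking the PR).
    """
    payload = {
        "model": "llama-3.1-70b-instruct",  # assumed model name
        "messages": [{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    }
    for attempt in range(MAX_RETRIES):
        try:
            resp = requests.post(LLM_ENDPOINT, json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    return None  # give up gracefully; never leave the PR stuck waiting on the model
```

GitHub rate limits and model versioning sit on top of this, which is where most of the 6 weeks went.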

The Lessons We’re Learning

Lesson 1: Multiply POC estimates by 20x

  • POC: 3 weeks, $10K
  • Production: 9 months (so far), $180K+ (and counting)

Rule of thumb I’m using now: POC cost × 20 = production cost

Lesson 2: Security kills cloud AI for enterprises

Every enterprise we talk to:

  • Can’t use OpenAI/Anthropic for proprietary data
  • Forces self-hosting
  • Self-hosting = infrastructure complexity

This is THE barrier for enterprise AI adoption.

Lesson 3: Open source models aren’t “free”

Yes, Llama 3.1 is free to download.

But total cost of ownership:

  • GPU infrastructure: $5K/month
  • ML engineer: $200K/year = $16.6K/month
  • Contractor for setup: $60K
  • Fine-tuning compute: $3K/month
  • Monitoring and tools: $2K/month

Total: $26.6K/month, versus roughly $8K/month if we had stayed on the OpenAI API.

We’re paying 3.3x more to self-host. The “savings” are a myth.

Why we’re still doing it: Data security requirements (non-negotiable)
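
For anyone re-running this comparison, the arithmetic behind the 3.3x figure, using the numbers in the list above and amortizing the one-time contractor spend over the first year:

```python
# Self-hosting vs. API cost comparison (figures from the list above).
gpu = 5_000            # $/month GPU infrastructure
ml_engineer = 16_600   # $/month fully loaded
fine_tuning = 3_000    # $/month compute
monitoring = 2_000     # $/month tooling
contractor = 60_000    # one-time setup, amortized over 12 months

self_host_monthly = gpu + ml_engineer + fine_tuning + monitoring   # ~$26.6K steady state
self_host_year1 = self_host_monthly + contractor / 12              # ~$31.6K in year one
api_monthly = 8_000                                                 # estimated OpenAI spend

print(f"steady state: {self_host_monthly / api_monthly:.1f}x the API cost")
print(f"year one:     {self_host_year1 / api_monthly:.1f}x the API cost")
```

Steady state comes out around 3.3x; year one closer to 4x once setup is included.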

Lesson 4: POC hides the hard problems

POC answers: “Can AI do this task?”

Production answers:

  • Can AI do this task AT SCALE?
  • Can AI do this task RELIABLY?
  • Can AI do this task SECURELY?
  • Can AI do this task COST-EFFECTIVELY?
  • Can AI do this task WITH OUR DATA QUALITY?

These are different questions.

What I’d Do Differently

If I could restart this project:

1. Start with infrastructure assessment

Before POC:

  • ☐ What are our security requirements?
  • ☐ Can we use cloud APIs or must self-host?
  • ☐ If self-host, do we have GPU infrastructure?
  • ☐ Do we have ML engineering talent?

If answers are “must self-host” and “no infrastructure/talent,” STOP.

Either:

  • Build infrastructure first (6-12 months)
  • OR use AI-enabled products instead of building custom

2. Do POC with production constraints

Don’t demo GPT-4 if production will be Llama 3.1.

POC should use:

  • Same model as production
  • Same infrastructure as production
  • Same data quality as production
  • Same security constraints as production
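
One way to hold the “same model, same infrastructure” line is to point the POC at the serving stack production will use from day one. Since vLLM exposes an OpenAI-compatible API, the same client code can target either; a minimal sketch, where the internal hostname and API key handling are illustrative assumptions:

```python
from openai import OpenAI

# Production constraint: self-hosted vLLM behind our firewall, exposing an
# OpenAI-compatible endpoint. The POC should hit this, not api.openai.com.
client = OpenAI(
    base_url="http://llm.internal:8000/v1",  # illustrative internal host
    api_key="unused",                         # vLLM only checks this if the server requires a key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # the model production will actually run
    messages=[{"role": "user", "content": "Review this pull request diff: ..."}],
)
print(response.choices[0].message.content)
```

If the demo only looks good when the base_url points at OpenAI, you have learned that before spending $180K, not after.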

3. Budget for production from day 1

POC pitch should be:

  • “POC will cost $10K and take 3 weeks”
  • “Production will cost $500K and take 12 months”
  • “Do we have $500K and 12 months?”

If no, don’t start POC.

4. Hire ML talent BEFORE starting

We tried to build AI without ML engineers. Bad idea.

Should have:

  • Hired ML engineer first
  • Had them architect the solution
  • Then started POC with production-ready approach

The 20% Success Rate Makes Sense Now

Looking at @cto_michelle’s 5 barriers:

  1. :white_check_mark: Legacy system integration - YEP, we hit this
  2. :white_check_mark: Data quality - YEP, our code has inconsistent formatting
  3. :white_check_mark: Skills gap - YEP, still trying to hire
  4. :hourglass_not_done: Governance - Haven’t hit this yet (will be barrier 6)
  5. :white_check_mark: Cost at scale - YEP, way more expensive than POC

We’ve hit 4 out of 5 barriers. No wonder 80% fail.

Are We Going to Make It?

Honest assessment:

Optimistic case (40% probability):

  • Hire ML engineer in next 2 months
  • Fine-tune model to acceptable quality
  • Deploy to production by month 12
  • Developers use it and save time
  • ROI positive after 18 months

Realistic case (40% probability):

  • Struggle to hire ML engineer
  • Launch with “good enough” model quality
  • Some developers use it, many ignore it
  • Mediocre ROI, project limps along

Pessimistic case (20% probability):

  • Can’t hire ML engineer
  • Model quality not good enough
  • Project cancelled after $300K spent
  • Join the 80% failure statistic

My Advice for Engineering Leaders

Before starting AI project, answer these:

  1. Can we use off-the-shelf product instead?

    • GitHub Copilot exists, so why are we building custom?
    • Answer: We want code review, not code completion
  2. Can we use cloud APIs?

    • Security says no for code
    • Answer: Must self-host
  3. Do we have infrastructure?

    • No GPU platform
    • Answer: Need to build (6 months)
  4. Do we have talent?

    • No ML engineers
    • Answer: Need to hire (unknown timeline)
  5. What’s the ROI at production cost?

    • Save 5 hours/week × 100 developers = 500 hours/week
    • Value: ~$50K/month
    • Cost: ~$27K/month (at scale)
    • ROI: Positive if it works
    • Answer: Worth pursuing IF we can solve 1-4

If answers to 1-4 are all blockers, STOP.

Questions for @cto_michelle

You mentioned:

30% of tech-advanced companies successfully implemented AI at scale

What separates them from the 70% that failed?

We’re clearly “tech-advanced” (engineering team of 120), but we’re struggling.

What are we missing?

Sources:

  • Our internal project timeline and costs
  • 6 months of painful lessons
  • Conversations with other enterprise engineering teams at SF Tech Week
  • IBM and PwC enterprise AI adoption studies

Product manager here - let me add the USER ADOPTION perspective that technical folks often miss.

The Barrier Nobody Talks About: People

You can solve all 5 technical barriers (@cto_michelle’s list):

  • :white_check_mark: Integration with legacy systems
  • :white_check_mark: Data quality
  • :white_check_mark: Skills gap
  • :white_check_mark: Governance
  • :white_check_mark: Cost

And STILL fail at production because users don’t adopt it.

Our Story: AI-Powered Sales Tool

POC Success:

  • AI analyzes sales calls, provides real-time coaching
  • Tested with 5 top sales reps
  • They LOVED it: “This is incredible! Game-changer!”
  • Executives greenlit $800K production build

Production Reality:

  • Deployed to 200-person sales team
  • 6 months later, usage data:
    • 12% active users (24 out of 200)
    • 88% never even logged in

We solved all the technical problems. Failed at adoption.

Why Users Don’t Adopt Enterprise AI

Reason 1: “AI will replace me” Fear

Sales reps thought:

  • “If AI can coach me, can AI replace me?”
  • “If I use AI and succeed, is it me or the AI?”
  • “Management will use this to track my performance”

Result: Passive resistance. Not openly opposed, just… never use it.

Reason 2: Workflow Disruption

AI tool required:

  • Installing browser extension
  • Granting microphone access
  • Recording all calls
  • Reviewing AI suggestions after each call (5 minutes)

Sales reps: “I have 30 calls/day. I don’t have time for 2.5 extra hours reviewing AI.”

We built a feature. Users needed a workflow.

Reason 3: Trust Issues

AI suggested:

  • “Mention competitor pricing” (violates our sales policy)
  • “Follow up in 2 days” (customer explicitly said call back in 2 weeks)
  • “Emphasize ROI” (customer cares about compliance, not ROI)

After 3-4 bad suggestions, reps stopped trusting it.

The AI was 85% accurate. But the 15% of errors destroyed trust.

Reason 4: Lack of Training

We deployed with:

  • :white_check_mark: Technical documentation
  • :white_check_mark: Tutorial video (15 minutes)
  • :cross_mark: No hands-on training
  • :cross_mark: No champions to help peers
  • :cross_mark: No ongoing support

Reps who hit issues:

  • Couldn’t troubleshoot
  • Contacted IT
  • IT didn’t know how to support AI tool
  • Rep gave up

Documentation is not training.

Reason 5: No Executive Use = No Urgency

Sales VPs didn’t use the tool.

Message to sales reps: “This is optional.”

Compare to Salesforce:

  • Executives USE Salesforce daily
  • Clear message: “If it’s not in Salesforce, it doesn’t exist”
  • Adoption: 95%

AI tool:

  • Executives just wanted reports from it
  • Unclear if it was mandatory or optional
  • Adoption: 12%

Users adopt what leadership uses.

The Change Management We Should Have Done

Based on painful retrospective:

Phase 1: Before POC (Change Management = 0%)

What we did:

  • :cross_mark: Skipped change management
  • :cross_mark: “Let’s just build it and they’ll love it”

What we should have done:

  • :white_check_mark: User research (what do sales reps actually need?)
  • :white_check_mark: Involve reps in design (co-create, not impose)
  • :white_check_mark: Address fears upfront (AI augments, doesn’t replace)

Phase 2: During Development (Change Management = 10%)

What we did:

  • :white_check_mark: Showed demos to sales leadership
  • :cross_mark: Didn’t involve actual sales reps
  • :cross_mark: Built in isolation

What we should have done:

  • :white_check_mark: Beta program with 20 reps (not just top performers)
  • :white_check_mark: Iterate based on feedback
  • :white_check_mark: Build champions who can advocate to peers

Phase 3: Deployment (Change Management = 20%)

What we did:

  • :white_check_mark: Announcement email from VP
  • :white_check_mark: Tutorial video
  • :cross_mark: Assumed that’s enough

What we should have done:

  • :white_check_mark: Hands-on training (2-hour workshop for every rep)
  • :white_check_mark: Office hours (daily support for first month)
  • :white_check_mark: Champion network (1 champion per 10 reps)
  • :white_check_mark: Incentives (gamification, recognition for top users)

Phase 4: Post-Launch (Change Management = 30%)

What we did:

  • :white_check_mark: Monitored usage metrics
  • :white_check_mark: Fixed bugs
  • :cross_mark: No proactive outreach to non-users

What we should have done:

  • :white_check_mark: 1-on-1s with non-adopters (understand barriers)
  • :white_check_mark: Success stories (showcase reps who benefited)
  • :white_check_mark: Continuous improvement (ship features users request)
  • :white_check_mark: Executive accountability (VPs use the tool themselves)

The AI Adoption Formula

Technical success ≠ User adoption

Formula for production AI success:

Technical Excellence (50%):

  • Works reliably
  • Integrates with systems
  • Acceptable quality
  • Secure and compliant

User Adoption (50%):

  • Solves real user pain (not what executives think users need)
  • Fits into workflow (minimal disruption)
  • Earns trust (high accuracy + transparent about limitations)
  • Supported by training and champions
  • Modeled by leadership

Most AI projects focus 90% on technical, 10% on adoption.

Should be 50/50.

The Relaunch Plan

We’re doing a reboot (6 months after the failed launch):

1. User research (4 weeks)

  • Interview 40 sales reps
  • Understand actual pain points
  • Identify workflow constraints
  • Address fears and concerns

2. Redesign (8 weeks)

  • Simplify: Remove features reps didn’t want
  • Workflow integration: Work within existing tools (Salesforce, not standalone)
  • Trust building: Show confidence scores, explain reasoning

3. Beta program (8 weeks)

  • 20 reps, diverse (not just top performers)
  • Weekly feedback sessions
  • Iterate rapidly based on input
  • Build champions

4. Phased rollout (12 weeks)

  • Start with champion teams (20 reps)
  • Hands-on training (2 hours per rep)
  • Week 4: Expand to next 40 reps
  • Week 8: Expand to next 80 reps
  • Week 12: Full deployment (200 reps)

5. Executive accountability

  • Sales VPs commit to using tool themselves
  • Review AI insights in team meetings
  • Recognize top users publicly

Budget: $200K for change management (on top of $800K technical build)

Goal: 70% adoption within 6 months

The ROI of Change Management

Original launch:

  • Technical cost: $800K
  • Adoption: 12%
  • Value delivered: $800K × 12% = $96K worth of value
  • ROI: Negative

Relaunch with change management:

  • Technical cost: $800K (sunk)
  • Change management: $200K
  • Adoption target: 70%
  • Value delivered: $800K × 70% = $560K worth of value
  • ROI: Positive (assuming success)

Lesson: $200K in change management turns a failed $800K project into a successful one.

My Advice for Product Managers

Before building AI product:

  1. User research FIRST

    • What problems do users actually have?
    • Will AI solve them better than current solutions?
    • Will users change behavior to use AI?
  2. Prototype with Wizard of Oz

    • Human pretends to be AI
    • Test if users want the feature
    • Validate workflow integration
    • THEN build real AI
  3. Budget for adoption

    • Technical build: X
    • Change management: 0.25X (25% of technical cost)
    • Training and support: 0.15X (15% of technical cost)
    • Total: 1.4X
  4. Measure adoption, not just technical metrics

    • Not just “does it work?”
    • Also “are people using it?”
    • Track: Active users, frequency, retention

Questions for This Community

For product managers:

  • What’s your AI product adoption rate?
  • How much do you invest in change management vs. technical build?

For CTOs/eng leaders:

  • How do you balance technical excellence with user adoption?
  • Do you involve users in AI development process?

For @cto_michelle and @eng_director_luis:

  • Are you planning for change management?
  • Who owns user adoption (product, engineering, or someone else)?

The 80% failure rate isn’t just technical. It’s organizational and cultural.

Sources:

  • Our failed launch and retrospective
  • SF Tech Week “Enterprise AI Adoption” workshop (Day 5)
  • Change management research from PwC and IBM studies
  • Conversations with 8 other product teams at SF Tech Week who had similar adoption failures

Security and compliance perspective: The governance barrier is MASSIVE and often underestimated.

The 18-Month Governance Timeline is Real

@cto_michelle mentioned:

Average 18 months to implement AI governance

I’m living this. We’re 11 months into building an AI governance framework and still not done.

What “AI Governance” Actually Means

Most people think: “Write a policy, check a box, done.”

Reality: It’s an entire organizational process covering:

1. AI Risk Assessment Framework

  • Identify all AI use cases
  • Classify by risk level (high/medium/low)
  • Different approval workflows for each level
  • Risk assessment template and process

Timeline to build: 3 months

2. Model Validation and Testing

  • Bias testing (does model discriminate?)
  • Adversarial testing (can it be manipulated?)
  • Performance testing (accuracy, precision, recall)
  • Explainability testing (can we explain decisions?)

Timeline per model: 2-4 weeks
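
For the bias-testing piece, the simplest starting point is a group fairness metric computed straight from model outputs, before reaching for a full toolkit. A minimal sketch of demographic parity and equal-opportunity gaps in plain pandas; the data and column names are illustrative stand-ins:

```python
import pandas as pd

# One row per screened candidate: model decision, ground truth, protected attribute.
df = pd.DataFrame({
    "selected":  [1, 0, 1, 1, 0, 0, 1, 0],              # model's screening decision (illustrative)
    "qualified": [1, 0, 1, 1, 1, 0, 1, 0],              # ground-truth label (illustrative)
    "group":     ["a", "a", "a", "a", "b", "b", "b", "b"],  # protected attribute
})

# Demographic parity: selection rate per group should be comparable.
selection_rates = df.groupby("group")["selected"].mean()

# Equal opportunity: true-positive rate per group among qualified candidates.
tpr = df[df["qualified"] == 1].groupby("group")["selected"].mean()

print("selection rate gap:", selection_rates.max() - selection_rates.min())
print("TPR gap:", tpr.max() - tpr.min())
```

Toolkits like AI Fairness 360 add more metrics and mitigation, but a gap report like this is enough to catch the obvious problems before a compliance review does.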

3. Data Governance

  • What data can be used for AI training?
  • How is data anonymized/de-identified?
  • Consent management (do we have permission?)
  • Data retention and deletion policies

Timeline to build: 4 months

4. Compliance Mapping

  • EU AI Act (entered into force August 2024; obligations phase in through 2026)
  • State AI laws (Colorado, California, etc.)
  • Industry regulations (HIPAA, SOX, PCI-DSS)
  • International laws (GDPR, etc.)

Timeline: 2 months initial, ongoing updates

5. AI Ethics and Responsible AI

  • Define responsible AI principles
  • Create AI ethics review board
  • Establish fairness criteria
  • Document decision-making processes

Timeline: 3 months

6. Audit Trail and Monitoring

  • Log all AI decisions
  • Track model versions
  • Monitor for drift and degradation
  • Incident response procedures

Timeline: 2 months

Total to build a comprehensive framework: 14-18 months

The Compliance Trap for AI POCs

Common pattern:

POC team: “We’ll skip compliance for the demo, add it later.”

6 months later, trying to go to production:

Compliance team: “This doesn’t meet ANY of our requirements. Start over.”

Real example from our company:

POC: AI hiring tool screens resumes

  • Built in 4 weeks
  • 92% accuracy finding qualified candidates
  • HR team loved it

Compliance review:

  • :cross_mark: No bias testing (could discriminate by race, gender, age)
  • :cross_mark: No explainability (can’t tell candidates why they were rejected)
  • :cross_mark: Violates EEOC requirements (need to show hiring process is fair)
  • :cross_mark: No audit trail (can’t prove decisions for legal defense)

Result: Project blocked. Can’t deploy until all compliance requirements met.

Timeline to fix: 6 months

Total wasted time: 4 weeks building + 6 months fixing = 7 months, vs. doing it right from the start

The EU AI Act Impact

The EU AI Act (in force since August 2024, with high-risk obligations phasing in through August 2026) classifies AI systems by risk:

Unacceptable risk: Banned

  • Social scoring
  • Real-time biometric surveillance
  • Manipulative AI

High risk: Strict requirements

  • AI in hiring (bias testing, transparency)
  • AI in credit decisions (explainability, audit trail)
  • AI in healthcare (safety testing, human oversight)

Requirements for high-risk AI:

  • Risk management system
  • Data governance
  • Technical documentation
  • Record-keeping (audit trail)
  • Transparency and user information
  • Human oversight
  • Accuracy, robustness, security

Penalty for non-compliance: Up to €35M or 7% of global revenue

Impact on our POCs:

3 out of 5 AI POCs we built are classified as “high-risk”:

  • AI hiring tool
  • AI credit scoring for internal procurement
  • AI-powered fraud detection

All 3 now require full compliance before production deployment.

Timeline impact: +6 months per project for compliance work

The Governance Maturity Model

Based on conversations at SF Tech Week security track:

Level 0: No governance (most startups)

  • Building AI without any framework
  • “Move fast and break things”
  • High risk of compliance violations

Level 1: Reactive governance (early AI adopters)

  • Address compliance when forced to
  • No proactive risk management
  • Slow, expensive compliance retrofitting

Level 2: Policy-based governance (where most enterprises are)

  • Written AI policies and principles
  • Approval workflows
  • But: Not systematically enforced

Level 3: Systematic governance (AI-mature companies - 30%)

  • Automated compliance checking
  • Model registry and versioning
  • Continuous monitoring
  • Integrated into development process

Level 4: AI-native governance (rare, <5%)

  • AI governance is competitive advantage
  • Fast compliance (not blocker)
  • Transparent and explainable by design
  • Trust as differentiator

The Tools We’re Using

Building governance isn’t just process - you also need tools:

Model registry:

  • Track all AI models
  • Version control
  • Metadata (training data, performance, risks)
  • We use: MLflow
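
As a concrete illustration of the registry piece, a minimal MLflow sketch: log the run, attach governance-relevant metadata as params and tags, and register a versioned model. The classifier here is a stand-in, and names like “resume-screener” and the tag values are illustrative:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; in practice this is the production scoring model.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# mlflow.set_tracking_uri("http://mlflow.internal:5000")  # point at a shared server; defaults to ./mlruns

with mlflow.start_run(run_name="resume-screener-v3"):
    # Governance metadata lives with the run, so auditors can trace a deployed
    # version back to its training data snapshot and approvals.
    mlflow.log_params({"training_data_snapshot": "2025-06-01", "risk_level": "high"})
    mlflow.log_metric("eval_accuracy", model.score(X, y))
    mlflow.set_tag("bias_tested", "true")
    mlflow.set_tag("approved_by", "ai-governance-board")
    # Registering a named version requires a database-backed tracking server.
    mlflow.sklearn.log_model(model, "model", registered_model_name="resume-screener")
```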

Bias testing:

  • Detect discrimination across protected classes
  • Fairness metrics (demographic parity, equal opportunity)
  • We use: AI Fairness 360 (IBM)

Explainability:

  • SHAP values (feature importance)
  • LIME (local explanations)
  • We use: SHAP library
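
For the explainability requirement, a minimal SHAP sketch: fit a stand-in tabular model, compute per-decision attributions, and keep the top drivers of each decision for the audit record. Model, feature names, and the “top 3” cut are illustrative choices:

```python
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in tabular model; in practice this is the production scoring model.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.Explainer(model, X)   # dispatches to the tree explainer for this model
shap_values = explainer(X.iloc[:5])    # explain five individual decisions

# Per-decision attributions: which features pushed this prediction up or down.
for i in range(5):
    top = pd.Series(shap_values.values[i], index=X.columns).abs().nlargest(3)
    print(f"decision {i}: top drivers -> {list(top.index)}")
```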

Audit logging:

  • Every AI decision logged
  • Immutable audit trail
  • We use: Custom build on top of our SIEM

Policy as code:

  • Automated compliance checks
  • Block non-compliant deployments
  • We use: Open Policy Agent (OPA)
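
And for policy as code, a hedged sketch of how a CI/CD step can gate deployment on an OPA decision via its REST data API. The policy package path and the input fields are our illustrative conventions; the Rego policy itself lives on the OPA server and isn’t shown here:

```python
import sys
import requests

# Deployment metadata gathered earlier in the pipeline (illustrative fields).
deployment = {
    "model_name": "resume-screener",
    "risk_level": "high",
    "bias_tested": True,
    "explainability_report": True,
    "audit_logging_enabled": True,
}

# Ask OPA for a decision; /v1/data/<package>/<rule> is OPA's standard data API.
resp = requests.post(
    "http://opa.internal:8181/v1/data/ai/governance/allow",  # illustrative policy path
    json={"input": deployment},
    timeout=5,
)
resp.raise_for_status()

if not resp.json().get("result", False):
    print("Deployment blocked: governance policy not satisfied")
    sys.exit(1)
print("Governance checks passed")
```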

Total tooling cost: $50K/year + engineering time to integrate

The Governance Team We Built

You can’t build governance without people:

Our AI governance team (company of 5,000 employees):

  • 1 AI governance lead (new role, hired from outside)
  • 2 compliance specialists (existing team, 50% allocated)
  • 1 AI ethics researcher (new role)
  • 1 legal counsel specializing in AI (contract, not full-time)
  • 3 security engineers (existing team, 25% allocated)

Cost: $800K/year fully loaded

For context: We only have 3 production AI systems. That’s $267K/year governance cost per AI system.

The Fast-Track Governance Approach

If you’re starting from zero and need to move faster than 18 months:

Option 1: Third-party governance platform

  • Companies like Credo AI, DataRobot, Fiddler
  • Pre-built compliance frameworks
  • Faster to deploy (3-6 months vs. 18)
  • Cost: $100K-500K/year depending on scale

Option 2: Limit to low-risk use cases only

  • Avoid high-risk AI (hiring, credit, healthcare)
  • Focus on internal efficiency tools (lower compliance burden)
  • Faster to production, less governance needed

Option 3: Partner with AI vendors who handle compliance

  • Use OpenAI/Anthropic APIs (they handle some compliance)
  • Buy AI-enabled SaaS (vendor handles governance)
  • Trade-off: Less customization, data sovereignty concerns

My Recommendations

For startups (<100 employees):

  • Don’t build governance from scratch
  • Use third-party platforms or API vendors
  • Focus on building product, not compliance infrastructure

For mid-size companies (100-1000 employees):

  • Start with Level 2 (policy-based governance)
  • Hire 1 dedicated governance lead
  • Use open-source tools (AI Fairness 360, SHAP)
  • Budget 6-9 months to build basics

For enterprises (1000+ employees):

  • Invest in Level 3 (systematic governance)
  • Build dedicated governance team (3-5 people)
  • Consider third-party platforms to accelerate
  • Budget 12-18 months for comprehensive framework

For everyone:

  • Don’t skip governance in POC
  • At minimum: Bias testing, explainability, audit trail
  • Build compliance in from day 1, not as afterthought

Questions for @cto_michelle and @eng_director_luis

@eng_director_luis you mentioned:

Barrier 6: Governance (haven’t hit this yet)

You will. And it will be painful.

My advice:

  • Start governance review NOW (don’t wait for deployment)
  • Identify compliance requirements upfront
  • Budget 3-6 months for governance work

@cto_michelle asked:

Is the 20% success rate acceptable?

From security/compliance perspective: No.

80% failure rate = wasted investment, frustrated teams, missed opportunities.

The fix: Build governance early, not late. It’s slower upfront but faster overall.

The Opportunity

Hot take: Governance is competitive advantage.

Companies that:

  • :white_check_mark: Have mature AI governance
  • :white_check_mark: Can move fast AND comply
  • :white_check_mark: Can demonstrate trustworthy AI

Will win enterprise deals.

Customers are asking:

  • “How do you ensure your AI is unbiased?”
  • “Can you explain AI decisions for audits?”
  • “Are you EU AI Act compliant?”

Companies with good answers: Win deals.
Companies without: Lose to competitors.

Sources:

  • Our 11-month AI governance implementation
  • SF Tech Week “AI Security and Governance” track (Day 5)
  • EU AI Act (in force since August 2024)
  • IBM AI Fairness 360 and governance frameworks
  • Conversations with compliance teams from 6 enterprises at SF Tech Week