93% Use AI Tools, Only 46% Trust Them - Are We Building Production Systems on 'Sort Of Trust'?

We have a trust problem in production, and nobody wants to talk about it.

93% of developers use AI coding tools. Only 46% actually trust them.

Let me repeat that: mass adoption without trust.

As CTO, I lose sleep over this, because we’d never accept it for any other production system. Imagine telling your board: “93% of our engineers use this database, but only 46% trust it won’t lose data.”

They’d demand we fix it immediately.

But for AI coding tools? We’re just… accepting it. Writing production code with tools we don’t fully trust. Every day.

The trust breakdown:

  • 46% trust AI tools
  • 33% “sort of trust” them
  • 21% don’t trust them at all

Yet all of them keep using the tools because the productivity gains feel real even when the trust doesn’t exist.

What does “sort of trust” even mean in production?

You trust it for boilerplate. Don’t trust it for complex logic.
Trust it for CSS. Don’t trust it for security-critical code.
Trust it to give you a starting point. Don’t trust it to be right.

This is fine when you have the expertise to evaluate the output. It’s dangerous when you don’t.

The business risk calculation I’m running:

Acceptable risk: AI handles routine CRUD operations, senior engineers review everything.

Dangerous risk: AI handles complex distributed systems logic, juniors can’t evaluate if it’s correct.

Crisis scenario: Production incident caused by AI-generated code, team can’t debug it because they don’t understand what the AI built.

The data that scares me:

  • AI code shows 1.7x more issues overall
  • 23.7% more security vulnerabilities specifically
  • 66% of developers say AI code is “almost right, but not quite”

That last one is insidious. “Almost right” passes code review if reviewers are overwhelmed. It passes testing if test coverage isn’t comprehensive. It ships to production and breaks in edge cases.

What we implemented:

  1. AI-Assisted Code Tiers

    • Green tier: Juniors with AI can touch this (well-tested, non-critical paths)
    • Yellow tier: Mid-level+ with AI (feature code, business logic)
    • Red tier: Senior only, AI optional (security, distributed systems, data integrity)

    Controversial because it limits junior autonomy. But I’d rather limit scope than cause outages.

  2. Security Review Process
    All AI-assisted code goes through automated security scanning + manual review. 23.7% more vulnerabilities means we can’t rely on normal review processes.

  3. Blast Radius Limits
    AI-assisted features ship behind feature flags with gradual rollout. If something breaks, we can roll back fast.

  4. Dependency Audits
    AI tools love suggesting libraries. We now audit every new dependency before approval. Found three cases of AI suggesting deprecated packages with known vulnerabilities.
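The tier system in step 1 can be enforced mechanically rather than by convention. Here's a minimal sketch of a Green/Yellow/Red gate; the path patterns and reviewer roles are hypothetical examples, not our actual configuration.

```python
# Sketch: map changed files to risk tiers and check reviewer seniority.
# TIER_RULES patterns and role names are illustrative, not a real config.
from fnmatch import fnmatch

TIER_RULES = [
    ("red",    ["src/payments/*", "src/auth/*", "infra/*"]),
    ("yellow", ["src/features/*", "src/api/*"]),
    ("green",  ["docs/*", "tests/*", "src/ui/*"]),
]

MIN_REVIEWER = {"green": "junior", "yellow": "mid", "red": "senior"}
SENIORITY = {"junior": 0, "mid": 1, "senior": 2}

def tier_for(path: str) -> str:
    """Classify a file path into a tier; unknown paths default to strictest."""
    for tier, patterns in TIER_RULES:
        if any(fnmatch(path, p) for p in patterns):
            return tier
    return "red"

def review_ok(path: str, reviewer_level: str) -> bool:
    """True if the reviewer's seniority meets the tier's minimum."""
    required = MIN_REVIEWER[tier_for(path)]
    return SENIORITY[reviewer_level] >= SENIORITY[required]
```

Defaulting unknown paths to Red is deliberate: the gate fails closed, so a new directory gets senior review until someone explicitly classifies it.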

The strategic question:

Is “sort of trust” acceptable when building production systems?

In financial services, where I work, compliance frameworks explicitly require trust in our development process. I can’t tell auditors “we sort of trust the code.”

But in other industries, maybe the risk tolerance is different? Maybe “move fast with sort of trusted AI” is an acceptable trade-off?

The long-term concern:

Does trust improve as tools mature, or does dependency deepen without trust improving?

If developers use AI for 18 months and still only “sort of trust” it, we’re building production systems on a foundation of uncertainty. That compounds over time.

Michelle’s point about skill debt applies here: we won’t know we have a trust problem until something breaks badly. And by then, we might have a codebase full of AI-generated code that nobody fully understands or trusts.

The uncomfortable question:

If we don’t trust the tools, why are we trusting the output?

Are we deferring a crisis, or is this the new normal? Is “sort of trust” enough?

Michelle, this is the cultural question that underlies everything else.

How do we build trust when the tools are fundamentally black boxes?

I can explain to my team HOW a database works. We can reason about failure modes. We can build mental models. That creates trust through understanding.

But AI coding tools? I can’t explain WHY Claude suggested this pattern instead of that one. Neither can my engineers. We’re trusting outputs we don’t fully understand.

The trust-building approaches we’re trying:

  1. Transparency Through Testing
    If we can’t understand how AI generated the code, we can at least prove it works through comprehensive testing. Trust through validation, not comprehension.

    Problem: This only works if test coverage is excellent. Many teams don’t have that baseline.

  2. Shared Learning Sessions
    When AI generates interesting solutions, we do team reviews: “Why did the AI choose this approach? What does it tell us about the problem space?”

    Turns AI outputs into learning opportunities. Builds conceptual understanding even if we don’t understand the generation process.

  3. Pattern Libraries
    We’re documenting “AI-generated patterns that worked” vs “AI-generated patterns that caused issues.” Building institutional knowledge about what to trust and what to question.

  4. Gradual Autonomy
    New engineers start in Green tier (Michelle’s framework), earn trust to move to Yellow and Red. Trust is earned through demonstrated capability to evaluate AI output.
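Approach 1, trust through validation, can be made concrete. The sketch below wraps a stand-in for an AI-generated helper (`slugify` is an invented example, not code from our codebase) in invariant checks: we pin down what the output must satisfy regardless of how the code was produced.

```python
# Sketch of "trust through validation": we don't need to understand how the
# AI produced a function if we can verify invariants on its output.
import re

def slugify(title: str) -> str:
    """Imagine this helper came from an AI assistant."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def validate_slugify() -> None:
    """Invariants we require no matter who (or what) wrote the code."""
    cases = ["Hello, World!", "  spaces  ", "", "already-a-slug", "Ünïcode"]
    for title in cases:
        slug = slugify(title)
        assert re.fullmatch(r"[a-z0-9-]*", slug)              # safe charset
        assert not slug.startswith("-") and not slug.endswith("-")
        assert slugify(slug) == slug                          # idempotent
```

The invariants (safe character set, no dangling separators, idempotence) are chosen from the problem, not from the generated code, which is what makes this validation rather than rubber-stamping.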

The DEI concern I mentioned before:

Who gets slotted into “can be trusted with AI in Red tier” vs “limited to Green tier”?

If that decision is subjective, we risk bias. If it’s metric-driven, what metrics actually measure “ability to evaluate AI-generated code quality”?

I don’t have good answers. But “sort of trust” isn’t enough when we’re scaling from 25 to 80+ engineers. The trust model needs to be explicit and fair.

The hiring implication:

I’m adding interview questions specifically about “how do you evaluate code you didn’t write and don’t fully understand?”

Looking for metacognitive skills: recognizing what you don’t know, knowing when to ask for help, having frameworks for evaluation even without deep understanding.

That might be the skill that matters most in the AI era: epistemic humility combined with systematic validation.

The financial services perspective is critical here.

Michelle, you’re right that compliance won’t accept “sort of trust.” We have explicit regulatory requirements about code quality, security, and auditability.

When I present our AI tool adoption to compliance teams, they ask:

  1. “Can you prove this code is secure?” - Yes, through testing and review
  2. “Can you explain how it works?” - Mostly yes, it’s still deterministic code
  3. “Can you guarantee consistent quality?” - This is where we struggle

Regulatory frameworks assume human accountability.

Someone signed off on the code. Someone reviewed it. Someone can explain the design decisions.

With AI-generated code, the accountability chain breaks. “The AI suggested it and it passed tests” isn’t an acceptable audit trail.

What we added for compliance:

  1. Human Validation Checkpoints
    Every AI-assisted feature requires human sign-off at multiple stages: requirements, design, implementation, review. Clear audit trail of who validated what.

  2. Explainability Requirements
    Engineers must document WHY they accepted AI suggestions, not just what was generated. Creates accountability for evaluation decisions.

  3. Risk Classification
    Michelle’s tier system (Green/Yellow/Red) maps to our risk framework. Different approval levels for different risk tiers.

  4. Rollback Capabilities
    Feature flags and gradual rollout aren’t just best practices—they’re risk mitigation. If AI-generated code causes issues, we can roll back fast.
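Checkpoints 1 and 2 can be captured in a single audit record. A minimal sketch, with hypothetical field names and checkpoints, not our actual compliance schema:

```python
# Sketch of a human-validation audit trail for AI-assisted changes:
# a named engineer signs each checkpoint, and the rationale field records
# WHY the AI suggestion was accepted. Schema is illustrative only.
from dataclasses import dataclass, field

CHECKPOINTS = ("requirements", "design", "implementation", "review")

@dataclass
class AiChangeRecord:
    change_id: str
    risk_tier: str                                 # green / yellow / red
    rationale: str                                 # why the suggestion was accepted
    signoffs: dict = field(default_factory=dict)   # checkpoint -> engineer

    def sign(self, checkpoint: str, engineer: str) -> None:
        if checkpoint not in CHECKPOINTS:
            raise ValueError(f"unknown checkpoint: {checkpoint}")
        self.signoffs[checkpoint] = engineer

    def audit_ready(self) -> bool:
        """Auditor-ready only when every checkpoint names a human."""
        return bool(self.rationale) and all(c in self.signoffs for c in CHECKPOINTS)
```

The point of the structure is that "the AI suggested it and it passed tests" can never satisfy `audit_ready()`: every stage requires a named person and a written reason.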

The “almost right but not quite” problem is a security nightmare.

When 66% of developers report this, subtle defects in AI-generated code aren’t edge cases; they’re routine. Some are caught in review, some in testing, some slip through to production.

In financial systems, “almost right” in payment processing or compliance reporting isn’t acceptable. It’s either right or it’s wrong, and wrong can cost millions in fines or customer trust.

Michelle’s dependency audit point is spot-on. AI loves suggesting libraries without understanding security implications. We found three cases of deprecated packages with known CVEs.

The strategic question for regulated industries:

Can we adopt AI tools at the same pace as unregulated industries, or do compliance requirements force us to be slower and more careful?

I’m betting on “slower but sustainable.” The productivity gains exist, but the risk tolerance is lower.

Michelle’s tier system is exactly right, but here’s the product question:

If only 46% of developers trust AI tools, but we’re building products with them, how do we expect customers to trust our products?

Trust cascades down the value chain.

Engineers “sort of trust” AI → they build features with uncertainty → QA tests but can’t catch everything → product ships with unknown risk → customers experience bugs → customer trust degrades

The customer trust equation:

We surveyed users after a bug caused by AI-generated code (edge case in payment processing that AI missed). Customer response:

  • 23% said it made them trust our product less
  • 67% didn’t care how code was written, just wanted bugs fixed fast
  • 10% were actually interested in how we used AI

Most customers don’t care about our development process. They care about reliability.

But if AI-generated code has 1.7x more issues and 23.7% more security vulnerabilities, that affects reliability directly. Which affects customer trust.

The metrics I’m tracking:

  1. Customer-reported bugs per release - are AI productivity gains increasing bug rate?
  2. Time to resolution - can we fix AI-generated bugs as fast as human bugs?
  3. Severity distribution - Michelle mentioned this: are AI bugs more expensive?

Early data: Customer-reported bugs up 18% after AI adoption. Time to resolution up 22% (because debugging AI code is harder).

The business case tension:

Ship faster with AI tools → more bugs → customer trust decreases → churn increases → revenue impact

Ship slower with careful review → fewer bugs → customer trust maintained → but we lose competitive position to faster-shipping competitors

Which revenue hit do we take?

Luis and Michelle’s points about systematic validation are critical.

We can’t trust AI blindly, but we can build systems that validate AI output systematically. That’s the path to “trust through verification” instead of “sort of trust through hope.”

Feature flags, gradual rollout, comprehensive testing, security scanning - these aren’t optional nice-to-haves. They’re requirements for responsible AI adoption.
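The feature-flag and gradual-rollout piece is simple enough to sketch. This is a toy percentage rollout, with invented flag names; production systems would use a real flag service, but the mechanism is the same.

```python
# Sketch of percentage-based gradual rollout behind a feature flag, so an
# AI-assisted feature reaches a small cohort first and can be pulled back
# instantly. Flag names and percentages are illustrative.
import hashlib

FLAGS = {"ai_assisted_checkout": 5}    # flag -> rollout percentage (0-100)

def is_enabled(flag: str, user_id: str) -> bool:
    pct = FLAGS.get(flag, 0)           # unknown or killed flags are off
    # Stable hash: the same user stays in or out of the cohort across requests.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

def kill(flag: str) -> None:
    FLAGS[flag] = 0                    # instant rollback, no deploy needed
```

Hashing flag and user together keeps cohorts independent across flags, and `kill()` is the "roll back fast" path: one config change, no redeploy.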

The design world went through this exact trust crisis with generative tools.

Designers “sort of trusted” AI to generate layouts, icons, and copy. Then we shipped something AI-generated with accessibility violations we didn’t catch.

A customer using a screen reader couldn’t navigate the interface. It passed visual QA because it LOOKED right, but the AI hadn’t considered semantic HTML structure.

The trust-building lesson from design:

Trust comes from understanding failure modes, not just happy paths.

We now teach designers:

  • What AI tools are good at (iteration, exploration, common patterns)
  • What AI tools are bad at (accessibility, edge cases, context-specific design)
  • How to validate AI output systematically

Engineering can learn from this. AI tools have predictable failure modes:

AI is good at:

  • Boilerplate and common patterns
  • Syntax and formatting
  • Refactoring within clear boundaries
  • Generating test cases from examples

AI is bad at:

  • Security implications of code combinations
  • Performance characteristics at scale
  • Edge cases and error handling
  • Cross-system architectural impacts

If we teach engineers these failure modes explicitly, “sort of trust” becomes “systematic evaluation.”

Michelle’s point about transparency:

In design, we require explanations for AI-generated work. “Why did you accept this AI suggestion?” becomes part of the design review.

Not because we don’t trust designers, but because explaining builds metacognitive awareness. They have to think critically about what they’re accepting.

Same should apply to code review. “Can you explain why this AI-generated approach is correct?” forces engineers to engage critically instead of just accepting outputs.

Trust through explanation, not trust through blind acceptance.