We Rebuilt Our Backend from Scratch. 9 Months Later, I'm Not Sure It Was Worth It

I need to share this story because I see a lot of engineers (including my past self) romanticizing the “rewrite from scratch” decision. Here’s what actually happened.

The Setup

18 months ago, we were running a legacy .NET monolith serving 500 enterprise customers in the financial services space. The code was 7 years old, built by contractors who were long gone. Every deployment was nerve-wracking. Every new feature took weeks.

Our board asked: “Can this system scale to 2,000 customers?”

My answer: “Not without serious investment.”

The Decision

We presented two options:

  1. Incremental refactor: Keep .NET, break apart the monolith gradually, modernize piece by piece
  2. Full rebuild: Rewrite in Node.js microservices, modern architecture, “done right this time”

The board approved the rebuild. Budget: $1.2M. Timeline: 9 months. We hired 6 new engineers with Node.js experience.

I was so confident this was the right call.

The Reality (9 Months In)

Here’s where we actually are:

✅ What went well:

  • New architecture is genuinely better—microservices, proper API contracts, better testing
  • Team morale was high during the “greenfield” phase
  • We learned a lot about our product by rebuilding it

❌ What went wrong:

  • We’re only at ~70% feature parity with the old system
  • Customers are frustrated with bugs in the new version
  • We missed critical edge cases that the old system handled
  • We’re still maintaining the old system alongside the new one (double the work)
  • We’ve shipped almost zero new features for 9 months

The Part That Haunts Me

A major customer (15% of our revenue) is threatening to churn because a reporting feature they rely on is “broken” in the new system. It worked fine in the old .NET version, but we didn’t realize they were depending on a specific bug in the date calculation logic.

We could have fixed that in the old system in 2 days. Instead, we’re now debugging microservices communication patterns and data consistency issues.

What I Should Have Done Differently

Looking back, the strangler pattern would have been smarter:

  1. Keep the .NET monolith running
  2. Build new features in Node.js microservices
  3. Gradually extract modules from the monolith one at a time
  4. Run both systems in parallel for 12-18 months
  5. Only retire the old system once we’re confident

This would have:

  • Let us ship new features during migration
  • Reduced risk of missing edge cases
  • Kept customers happy
  • Validated the new architecture incrementally
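To make the "gradually extract modules" step concrete, here's a rough sketch of what the routing layer for a strangler setup could look like in a Node.js world. Module names are invented for illustration; a real version would live in an API gateway or reverse proxy, not a hand-rolled function:

```typescript
// Strangler-pattern routing sketch (hypothetical module names).
// Every route starts on the legacy monolith; as a module is extracted,
// flipping it to the new services is just adding it to `extracted`.

type Target = "legacy" | "microservices";

// Modules already extracted from the monolith (made-up examples).
const extracted = new Set<string>(["notifications", "billing"]);

function routeFor(module: string): Target {
  // Anything not yet extracted keeps hitting the battle-tested monolith.
  return extracted.has(module) ? "microservices" : "legacy";
}

console.log(routeFor("reporting")); // "legacy" — not extracted yet
console.log(routeFor("billing"));   // "microservices"
```

The point of the sketch: the cutover per module is a one-line, instantly reversible change, which is exactly the "easy rollback" property a big-bang rewrite gives up.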

The Question I’m Wrestling With

If I could go back, would I still rebuild? Honestly… I don’t know.

The new architecture is better. But was it worth 9 months of zero feature development? Was it worth risking customer churn? Was it worth the team stress of trying to match 7 years of production battle-testing in 9 months?

What Would You Do?

For those who’ve done major rewrites:

  • How did you handle the “edge cases we didn’t know existed” problem?
  • What was your strategy for maintaining both systems during migration?
  • How did you decide when to finally sunset the old system?
  • In hindsight, would you have chosen a different path?

I’m sharing this not as a success story, but as a cautionary tale. Maybe someone can learn from my mistakes. 📉

Luis, thank you for sharing this so honestly. This is the conversation the industry needs to have more often.

I’m going to validate your pain first: you’re not alone in this experience. I’ve seen this exact pattern play out at three different companies. The “rewrite from scratch” decision almost always takes longer and costs more than estimated.

My Own “Rebuild Regret” Story

Five years ago at a previous company, we did almost the same thing—rewrote our Ruby on Rails monolith as Go microservices. We estimated 12 months. It took 22 months. We lost two major customers during the transition due to bugs and missing features.

The part that still bothers me: we could have achieved 80% of the architectural benefits with 20% of the risk using the strangler pattern.

Where Rewrites Go Wrong

Here’s what I’ve learned about why “big bang” rewrites fail:

  1. Institutional knowledge loss: The old codebase encoded 7 years of customer quirks, edge cases, and business logic that wasn’t documented anywhere. When you throw it away, you lose all that context.

  2. The “clean slate” trap: New codebases start clean, but within 6 months they accumulate their own tech debt. You’re just trading old problems for new problems.

  3. Opportunity cost: 9 months of zero new features means your competitors shipped 3-4 product cycles while you were rebuilding.

The Dual-System Strategy That Works

When I finally learned my lesson, here’s what worked at my current company:

We ran the old Rails system and new Go services in parallel for 18 months:

  • All reads hit the old system (stable, battle-tested)
  • All writes went to both systems (dual-write pattern with reconciliation)
  • New features shipped on new system only
  • Gradually moved read traffic module-by-module

Yes, it’s more complex. Yes, you’re maintaining two systems. But:

  • Zero customer impact from migration bugs
  • Continuous feature delivery during transition
  • Incremental validation of new architecture
  • Easy rollback if something breaks
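Here's a stripped-down sketch of the dual-write-plus-reconciliation idea, using in-memory maps as stand-ins for the two databases. Everything here is illustrative, not our actual code; a production version would queue failed secondary writes and reconcile on a schedule:

```typescript
// Dual-write sketch: legacy store is the source of truth, new store
// is best-effort, and reconciliation repairs any drift afterwards.

const legacyStore = new Map<string, string>();
const newStore = new Map<string, string>();

function dualWrite(key: string, value: string): void {
  legacyStore.set(key, value); // source of truth: write it first
  try {
    newStore.set(key, value);
  } catch {
    // A failed secondary write is tolerated; reconciliation fixes it later.
  }
}

// Periodic reconciliation: re-copy anything the new store missed or mangled.
function reconcile(): string[] {
  const repaired: string[] = [];
  for (const [key, value] of legacyStore) {
    if (newStore.get(key) !== value) {
      newStore.set(key, value);
      repaired.push(key);
    }
  }
  return repaired;
}
```

The design choice that matters: the legacy write happens first and unconditionally, so a bug in the new stack can never lose customer data — it can only create drift that reconciliation surfaces and repairs.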

Your Specific Situation

The reporting feature bug you mentioned—the date calculation edge case—this is exactly why parallel systems matter. If you’d been running both systems, you could have compared outputs and caught that discrepancy before it reached customers.
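A minimal sketch of that comparison, sometimes called shadow traffic or dark launching. The shapes and callbacks here are invented for illustration; in practice the "calls" would be real requests to both systems:

```typescript
// Shadow-comparison sketch: always serve the legacy answer, run the new
// system in the shadow, and report any divergence instead of exposing it.

type Report = { total: number };

function shadowCompare(
  legacyCall: () => Report,
  newCall: () => Report,
  onMismatch: (legacy: Report, candidate: Report) => void
): Report {
  const legacy = legacyCall();
  try {
    const candidate = newCall();
    if (JSON.stringify(candidate) !== JSON.stringify(legacy)) {
      onMismatch(legacy, candidate); // e.g. the date-calculation discrepancy
    }
  } catch {
    // A shadow failure must never affect the customer-facing response.
  }
  return legacy; // customers always get the battle-tested answer
}
```

Run this against real production traffic for a few weeks and the "bug the customer depended on" shows up as a mismatch log entry, not a churn conversation.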

For your 500 enterprise customers, I’d seriously consider:

  • Keep the .NET system in “maintenance mode” for critical workflows
  • Gradually move customers to new system based on their feature usage
  • Run both systems for another 6-12 months
  • Use feature flags to control which system handles which requests

It’s not too late to shift strategy.

My Advice Going Forward

  1. Don’t sunset the old system yet. You’re at 70% feature parity—that means up to 30% of customer workflows could break if you force migration.

  2. Create a compatibility matrix: Which customers use which features? Migrate customers whose needs are 100% covered by new system first.

  3. Celebrate what you’ve learned: The new architecture knowledge your team gained is valuable, even if the timeline was rough.
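For the compatibility matrix in point 2, even something this simple can tell you who is safe to migrate today. Feature names and customers below are invented placeholders:

```typescript
// Compatibility-matrix sketch: a customer is migratable only if every
// feature they use is already at parity on the new system.

const newSystemFeatures = new Set(["dashboards", "exports", "alerts"]);

const customerFeatures: Record<string, string[]> = {
  acme: ["dashboards", "exports"],
  globex: ["dashboards", "custom-reporting"], // not at parity yet
};

function migratableCustomers(
  usage: Record<string, string[]>,
  covered: Set<string>
): string[] {
  return Object.entries(usage)
    .filter(([, features]) => features.every((f) => covered.has(f)))
    .map(([name]) => name);
}

console.log(migratableCustomers(customerFeatures, newSystemFeatures)); // ["acme"]
```

The hard part isn't the code — it's getting honest per-customer feature-usage data, which usually means instrumenting the legacy system before you trust the matrix.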

Would you be open to sharing more about your dual-system maintenance burden? I’m curious if you’re keeping both DBs in sync or treating them as separate systems.

Luis, as someone currently facing the rebuild vs refactor decision, this is exactly the reality check I needed to hear. 😰

The User’s Perspective

Here’s what strikes me from the design/product side: users don’t care about your stack. They only care about reliability and new features.

Your customers don’t know (or care) that you’re running Node.js microservices now. They just know that:

  • Features they relied on are broken
  • You haven’t shipped anything new in 9 months
  • Their workflows are disrupted

This is the part that scares me most about our potential migration. We have design customers who’ve built muscle memory around our product. If we break their workflows during a migration, they won’t care that our new architecture is “better.”

The “Edge Cases We Didn’t Know Existed” Problem

That date calculation bug you mentioned resonates hard. In my failed startup, we once tried to rebuild our notification system “the right way.” We thought we understood all the requirements.

Turns out, one customer was relying on notifications being delivered in a specific order because their workflow depended on it. We never documented this. It wasn’t in any spec. But when we changed it, their entire team’s process broke.

The old codebase is the spec. All those quirks and edge cases represent real user needs, even if they look like bugs.

My Question

You mentioned your team morale was high during the “greenfield” phase. How’s morale now that you’re in the “70% done but customers are angry” phase?

I worry about this with our team. Engineers love greenfield rewrites, but the messy middle—where you’re maintaining two systems, chasing edge cases, and getting customer complaints—that’s where burnout happens.

How are you keeping your team motivated through this? 💪

Luis, I really appreciate you being this vulnerable about the decision. This kind of honesty is rare and valuable.

I want to talk about the organizational impact of rewrites, because I think that’s the part that doesn’t get discussed enough.

The Team Morale Cycle

You mentioned morale was high during greenfield development. I’ve seen this pattern repeatedly:

Months 1-3: Excitement! Clean architecture! Modern stack! Engineers are energized.

Months 4-6: Grinding. Rebuilding boring CRUD features. Copying logic from old system. Engineers start asking “why are we doing this again?”

Months 7-9: Crisis mode. Edge cases everywhere. Customers complaining. Leadership asking “when will this be done?” Engineers are burned out.

Months 10+: Resentment. The new system has its own tech debt. The promise of “this time we’ll do it right” rings hollow.

This cycle affects retention. I’ve lost senior engineers during rewrite projects because they join for the greenfield phase and leave during the grind.

The Questions I’d Be Asking

  1. What’s your engineering attrition been over these 9 months? Rewrites are notoriously hard on retention.

  2. How did this affect your hiring? Were you still hiring for .NET skills to maintain the old system, or only Node.js for the new system?

  3. Did you have a clear “done” criteria? Or did scope keep expanding as you discovered new edge cases?

  4. What happened to your product/engineering relationship? When you tell Product “we can’t ship features for 9 months,” that trust is hard to rebuild.

The Alternative Path

I’m going to echo Michelle’s advice about running parallel systems, but add the org design perspective:

Split your team into two squads:

  • Stability squad (30% of team): Maintains old .NET system, handles critical bugs
  • Innovation squad (70% of team): Builds new features on new Node.js stack

This way:

  • You’re still delivering value to customers
  • Engineers see their work shipping, not just “rewrite progress”
  • You incrementally validate the new architecture
  • Product team stays engaged instead of frustrated

The Hard Truth

Rewrites test your credibility as a leader. Your board funded this based on your recommendation. If it doesn’t deliver the promised value, that affects future strategic decisions.

I’d suggest:

  1. Reset expectations with the board—9 months in, what’s the honest finish line?
  2. Define what “feature parity” actually means (maybe 70% is enough?)
  3. Create a parallel-system strategy so you can resume feature development
  4. Measure the real outcome: is engineering velocity actually faster on the new stack?

What does your roadmap look like for the next 6 months? Are you committed to finishing the rewrite, or are you considering running both systems long-term?

Luis, coming from the product side, this thread is both validating and terrifying.

The Product-Engineering Trust Breakdown

You mentioned that a major customer (15% of revenue!) is threatening to churn. As a VP of Product, this is my nightmare scenario.

Here’s what I imagine happened on the product side of your company:

Months 1-2: Product team is told “engineering is doing a migration, expect slower feature delivery for a few months”

Months 3-5: Product is fielding customer complaints about bugs. Engineering says “we’re working on it, migration is almost done”

Months 6-8: Product has no new features to sell. Sales is frustrated. Customer success is putting out fires.

Month 9: A key customer is threatening to leave. Product team is thinking “was this migration worth it?”

The Customer Communication Failure

Here’s what bugs me most: did you tell customers about the migration?

If yes: How did you frame it? “We’re improving our infrastructure for your benefit” probably sounded good, but customers just experienced 9 months of bugs and zero new features.

If no: Customers are confused why the product suddenly got buggier and development slowed down.

Either way, the product story breaks down.

What I Wish Engineering Teams Understood

When you decide to do a 9-month rewrite:

  1. Product’s roadmap becomes a lie. We promised customers features. We can’t deliver. We lose credibility.

  2. Sales has nothing to sell. “We’re rebuilding our infrastructure” doesn’t close deals. New features close deals.

  3. Customer success bears the brunt. Every bug, every missing feature—CS is the one explaining it to angry customers.

  4. Competitive positioning suffers. While you’re rebuilding, competitors are shipping.

The Questions I’d Want Answers To

  1. What’s the real feature parity number? You said 70%, but that could mean:

    • 70% of features work perfectly
    • 100% of features work at 70% quality
    • 70% of customer workflows are supported
  2. What customer value did the new architecture unlock? Can you now ship features faster? Handle bigger customers? Reduce infrastructure costs? If the answer is “not yet,” that’s a problem.

  3. How did this impact your renewal rates? 9 months of instability usually shows up in churn metrics 6-12 months later.

  4. What’s your win-back plan for the churning customer? That 15% revenue customer—are you giving them a dedicated team to fix their issues? Credits? Executive attention?

My Advice

From a product perspective, I’d do a customer impact assessment:

  • Tier 1 customers (top 20% of revenue): Get the old .NET system stable for them, even if it means maintaining it longer
  • Tier 2 customers: Migrate gradually with close monitoring
  • Tier 3 customers: Use as testing ground for new system

You can’t afford to lose that 15% revenue customer over a technical decision. Sometimes the right answer is “we’ll maintain the old system just for you” while you figure out the migration.

What does your customer segmentation look like, and are you treating all customers the same during this migration?