Three months ago, we faced a decision that could define the next two years of our EdTech platform: stick with our aging Django monolith or rewrite from scratch.
We did what felt modern and data-driven - we fed our entire codebase to GPT-4.5, asked it to analyze our architecture, and requested recommendations.
The AI came back with a detailed, compelling rewrite plan. New stack (Next.js, microservices, PostgreSQL + Redis), AI-generated boilerplate to bootstrap 70% of the code, estimated timeline of 6 months. The analysis was thorough, the architecture looked clean, the proposal was convincing.
We almost said yes. I’m so glad we didn’t.
The Context: Our Legacy Platform
Our EdTech platform is 8 years old, built on Django, runs as a monolith with 250,000 lines of Python code serving 5 million students and teachers. It’s not pretty:
- Mixed Django templates and React components
- Database queries that should be optimized but work “well enough”
- Feature flags layered on feature flags
- Three different authentication patterns from different eras
- No consistent API design
- Documentation that’s 40% outdated
Every new feature takes longer than it should. Onboarding engineers takes 3-4 weeks. We have tech debt. We know we have tech debt. And GPT-4.5 saw exactly what we saw.
What the AI Recommended
The GPT-4.5 analysis was genuinely impressive:
Technical Assessment:
- Identified architectural inconsistencies
- Flagged performance bottlenecks
- Noted security patterns that needed updating
- Highlighted scalability limitations
Proposed Solution:
- Next.js for frontend (better performance, modern React)
- FastAPI microservices for backend (Python async, easy migration path)
- PostgreSQL with proper normalization (fix our schema issues)
- Redis for caching and session management
- Estimated 6 months with AI-assisted code generation handling 70% of boilerplate
The Pitch:
Clean slate, modern stack, AI acceleration, better developer experience, improved performance, easier to scale.
It was exactly what a senior architect would propose. Well-reasoned, technically sound, exciting.
Why We Said No
I spent two weeks stress-testing the rewrite plan with our team, customers, and business stakeholders. The more we dug, the more the hidden costs emerged:
1. The Real Timeline
AI estimated 6 months based on raw lines of code and standard patterns. What it couldn’t account for:
- 200+ integrations with third-party services (SSO providers, payment processors, LMS systems)
- State education compliance certifications that require 6-month audit cycles
- Student data privacy regulations (FERPA, COPPA) requiring legal review
- Accessibility requirements (Section 508) needing manual testing
- Institutional knowledge embedded in edge cases
Our realistic estimate: 18-24 months to reach feature parity, 6 more months to achieve regulatory compliance, another 3-6 months for educators to trust the new system.
That’s 3 years of opportunity cost.
2. The Business Continuity Risk
Our platform serves real students taking real classes. Summer break gives us a 3-month window for major changes. But summer isn’t enough for a full rewrite.
What happens to our customers during a multi-year transition? We can’t tell schools “we’re rebuilding, expect bugs for 18 months.” They’ll switch to competitors.
The AI optimized for technical elegance. It didn’t consider customer retention, revenue continuity, or competitive dynamics.
3. The Knowledge Transfer Problem
Our “ugly” Django code contains 8 years of learned behavior:
- Edge cases we discovered through support tickets
- Performance optimizations for specific school district network conditions
- Workarounds for browser bugs in educational IT environments
- Business logic that isn’t documented anywhere except the code
AI could convert syntax but couldn’t transfer this institutional knowledge.
4. The Team Expertise Reality
Our team knows Django deeply. We have senior engineers who can debug complex issues quickly, junior engineers who are productive immediately, and hiring pipelines that work.
A rewrite to Next.js + FastAPI means:
- 12-18 months of reduced productivity during learning
- Difficulty hiring engineers with both Python and modern JS expertise
- Higher risk of architecture mistakes in unfamiliar territory
- Loss of the “go fast” advantage we have with our current stack
What We’re Doing Instead: Strangler Fig Pattern
We decided on a phased refactor approach, extracting high-value modules while keeping the core stable:
Phase 1 (Months 1-4): Extract authentication service
- Build new OAuth service in FastAPI
- Migrate from session-based auth to JWT
- Run both systems in parallel, gradually shift traffic
- Zero downtime, can rollback instantly
Phase 2 (Months 5-8): Extract student dashboard frontend
- Rebuild most-used UI in Next.js
- Use new auth service
- Old Django app still handles admin, less-used features
- Measure performance improvements with real users
Phase 3 (Months 9-12): Extract reporting/analytics module
- Move heavy database queries to dedicated service
- Implement proper caching layer
- Reduce load on monolith
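The caching layer is conceptually simple: wrap the heavy report queries in a read-through cache. A minimal sketch, using an in-memory stand-in for Redis and a hypothetical `district_report` query (both are illustrative, not our real code):

```python
import functools
import json

def cached(client, ttl_seconds=300):
    """Read-through cache for JSON-serializable results.
    `client` only needs get/setex, so a redis.Redis instance
    or an in-memory stub both work."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = f"{fn.__name__}:{json.dumps(args)}"
            hit = client.get(key)
            if hit is not None:
                return json.loads(hit)     # cache hit: skip the query
            result = fn(*args)
            client.setex(key, ttl_seconds, json.dumps(result))
            return result
        return wrapper
    return decorator

class DictCache:
    """In-memory stand-in for redis.Redis (ignores TTL, for the sketch)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value

cache = DictCache()
calls = []

@cached(cache, ttl_seconds=60)
def district_report(district_id):
    calls.append(district_id)              # stands in for a heavy SQL query
    return {"district": district_id, "active_students": 1234}

district_report("d-17")
district_report("d-17")                    # served from cache
assert len(calls) == 1                     # the heavy query ran only once
```

In production the `DictCache` is replaced by a real Redis client; the decorator stays the same.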
Phase 4+: Extract additional modules based on ROI and risk
The key: each phase delivers value independently, can be paused or rolled back, and maintains business continuity.
Using AI Strategically
We’re not avoiding AI - we’re using it where it actually helps:
1. Test Generation for Legacy Code
Before extracting modules, we need comprehensive tests. AI excels at generating test cases from existing code behavior. We went from 40% test coverage to 75% in our auth module in 2 weeks.
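What the AI generates for us are essentially characterization tests: they pin down what the legacy code does today, so an extraction can't silently change behavior. A hand-trimmed example of the shape, with a hypothetical helper standing in for real auth-module code:

```python
import unittest

# Hypothetical legacy helper, standing in for real auth-module code.
def normalize_school_email(raw: str) -> str:
    """Legacy behavior we want to pin down before extraction:
    lowercase, strip whitespace, collapse a '+tag' in the local part."""
    local, _, domain = raw.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

class TestNormalizeSchoolEmail(unittest.TestCase):
    # Characterization tests assert what the code *does* today,
    # not what we wish it did, so refactors can't silently drift.
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_school_email("  Jane@District.EDU "),
                         "jane@district.edu")

    def test_collapses_plus_tag(self):
        self.assertEqual(normalize_school_email("jane+hw@district.edu"),
                         "jane@district.edu")
```

The AI drafts dozens of cases like these from the code's observed behavior; humans review them and delete the ones that enshrine actual bugs.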
2. Documentation of Existing System
We fed modules to GPT-4.5 and asked it to explain what the code does and why. The generated docs aren't perfect, but they give new engineers a starting point and saved weeks of senior engineer time.
3. Boilerplate for New Services
When building the new auth service, AI generated the FastAPI boilerplate, OpenAPI schemas, and database models, letting us focus on business logic.
4. Migration Script Assistance
AI helped write data migration scripts to move from session storage to JWT. Still required human review but saved significant time.
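The shape of those scripts: read each legacy session row, mint an equivalent signed token. In production we used a vetted JWT library; this stdlib-only sketch (with a demo secret and hypothetical session fields) just shows the transformation:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # in production this comes from a secrets manager

def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def session_to_jwt(session_row: dict) -> str:
    """Turn one legacy session record into an HS256-style signed token.
    Production used a real JWT library; this shows the shape only."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "sub": session_row["user_id"],
        "role": session_row["role"],
        "exp": int(time.time()) + 3600,
    }
    signing_input = (
        f"{_b64(json.dumps(header).encode())}."
        f"{_b64(json.dumps(claims).encode())}"
    )
    sig = hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64(sig)}"

# Migrating a batch of legacy sessions:
legacy_sessions = [{"user_id": "u-1", "role": "teacher"},
                   {"user_id": "u-2", "role": "student"}]
tokens = [session_to_jwt(row) for row in legacy_sessions]
assert all(t.count(".") == 2 for t in tokens)  # header.claims.signature
```

The human-review step mattered here: the AI's first draft happily migrated expired sessions too, which a person caught immediately.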
The pattern: AI assists with mechanical transformation and documentation, humans handle strategy and domain logic.
Four Months In: Progress Report
We’ve completed Phase 1 (auth service extraction) and are midway through Phase 2 (student dashboard):
What’s Working:
- Zero downtime during migration
- New auth service handles 60% of traffic, old system handles 40%
- Performance improved by 35% for authentication flows
- Team is learning Next.js gradually without productivity collapse
- Can demonstrate progress to stakeholders every month
What’s Challenging:
- Integration complexity higher than expected (classic strangler fig problem)
- Maintaining two systems temporarily increases operational burden
- Some junior engineers frustrated by slower pace than “rewrite everything”
The Key Insight:
AI optimizes for technical elegance, not business risk. The rewrite would have been “better” code. But the refactor approach is better for our business, our customers, and our team’s ability to keep shipping value while we modernize.
The Uncomfortable Question
How many companies are making rewrite decisions based on AI recommendations without fully accounting for the hidden costs?
GPT-4.5 doesn’t understand:
- Customer relationships built over years
- Regulatory compliance timelines
- Institutional knowledge that only exists in code and senior engineers’ heads
- Opportunity cost of not shipping new features for 18+ months
- Team skill gaps and hiring market realities
It sees code patterns and suggests optimal technical solutions. But optimal code doesn’t always mean optimal business decisions.
What I’d Tell My Past Self
If I could go back to that moment three months ago when we got the AI rewrite recommendation:
“Use AI to analyze the problem, but don’t let AI make the strategic decision. The analysis is valuable - AI found real issues. But the solution requires business context, customer empathy, team dynamics, and risk tolerance that AI can’t quantify.”
The wisdom: AI is a powerful tool for assessment and assistance. It’s a dangerous decision-maker for questions with high business stakes and long time horizons.
Questions for the Community
- Has anyone successfully used AI for major framework/architecture migrations? What made it work?
- How do you balance “technically better” versus “business risk” when AI suggests rewrites?
- What’s the right role for AI in architectural decision-making?
- How do you avoid the “rewrite temptation” when AI makes it look so achievable?
We’re 4 months into a 24+ month journey. I believe we made the right call, but we won’t really know for another year. I’m curious if others have navigated similar decisions, with or without AI in the mix.