
4 posts tagged with "ml-engineering"


The Fine-Tune Artifact Your Departing Engineer Took With Them

· 12 min read
Tian Pan
Software Engineer

A fine-tune is not a file. It is the closure of a pipeline over a training set, and the team that ships the file without the closure has built a production dependency whose source code is in someone else's head. The day that person leaves with two weeks of notice and a clean handoff document is the day your bus factor on a revenue feature drops to zero and nobody notices, because the weights are still in the registry and the registry tag is still stable and the model still serves traffic. The reckoning shows up later, in a routine base-model migration that should have taken a sprint and takes a quarter instead.

The pattern is consistent across teams I have watched run into it. An ML engineer spends six months iterating on a fine-tune: data curation, hyperparameter sweeps, behavioral patches evaluated by feel against a held-out set. The final adapter weights get pushed to the model registry with a tag. The training pipeline that produced those weights is a notebook on the engineer's laptop, with hard-coded paths and floating dependencies that resolved to whatever was the latest version on the day each cell was last executed. The team accepts the handoff at face value because the weights work and the eval scores are good and the registry tag is stable. Eighteen months later, the engineer departs. Six months after that, a base-model migration requires regenerating the adapter against an updated base. The notebook runs, but it produces weights that score three points lower and regress visibly on the hardest customer segment, and the team spends four months trying and failing to reproduce the original artifact.
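
One way to keep part of that closure next to the weights is to write a small provenance manifest at training time. The sketch below is illustrative only and is not taken from the post: the function names, the manifest fields, and the idea of storing the JSON beside the adapter are all assumptions. It records content hashes of the training files, the installed versions of named packages, the seed, and the base-model identifier, using only the Python standard library.

```python
import hashlib
import json
import platform
from importlib import metadata
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_versions(packages):
    """Resolve each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_manifest(data_files, base_model, hyperparameters, seed, packages):
    """Collect the inputs a later engineer would need to attempt a re-run.

    All field names here are hypothetical; adapt them to your own registry.
    """
    return {
        "base_model": base_model,
        "seed": seed,
        "hyperparameters": hyperparameters,
        "data_sha256": {str(p): sha256_of_file(p) for p in data_files},
        "python": platform.python_version(),
        "packages": installed_versions(packages),
    }


def write_manifest(manifest, destination):
    """Serialize with sorted keys so two manifests can be diffed line by line."""
    Path(destination).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A manifest like this does not make a run reproducible on its own, since it captures neither the curation decisions nor the evaluation judgment calls described above. It does make it obvious when the data or the dependency versions behind a re-run differ from the ones that produced the registered weights.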

Synthetic Seed Data: Bootstrapping Fine-Tuning Before Your First Thousand Users

· 9 min read
Tian Pan
Software Engineer

Fine-tuning a model is easy when you have data. The brutal part is the moment before your product exists: you need personalization to attract users, but you need users to generate the data that personalization depends on. Most teams either skip fine-tuning entirely ("we'll add it later") or spend weeks collecting labeled examples by hand. Neither works well. The first produces a generic model users immediately recognize as generic. The second is slow enough that by the time you have data, the task has evolved.

Synthetic seed data solves this — but only when you understand exactly where it breaks.

Staffing AI Engineering Teams: Who Owns What When Every Feature Has an AI Component

· 11 min read
Tian Pan
Software Engineer

Three years ago, "AI team" meant a group of specialists tucked into a corner of the org chart, mostly invisible to product engineers. Today, a senior software engineer at a fintech company ships a fraud-scoring feature using a fine-tuned model on Monday, wires up a RAG pipeline for customer support on Wednesday, and debugs LLM latency on Friday. The specialists didn't go away—but the boundary between "AI work" and "product engineering" dissolved faster than almost anyone planned for.

Most teams responded by bolting new titles onto existing job descriptions and calling it done. That's the wrong answer, and the dysfunction shows up quickly: unclear ownership, duplicated tooling, and an ML platform team that spends half its time explaining why product teams can't just call the OpenAI API directly.

This post is about getting the structure right—not in the abstract, but for the actual stages of AI adoption most engineering organizations go through.

Synthetic Data Pipelines for Domain-Specific LLM Fine-Tuning

· 9 min read
Tian Pan
Software Engineer

Your model fine-tuned on synthetic data scores 95% on your internal evals. Then you deploy it, and it confidently invents drug interactions that don't exist, cites legal precedents with wrong case numbers, and hallucinates API endpoints with plausible-sounding names. The model hasn't regressed on fluency — it's gotten worse in a way that fluency metrics completely miss. Researchers call this knowledge collapse: factual accuracy degrades while surface coherence stays intact. It's one of the more insidious failure modes in synthetic data training, and it happens most often when engineers build pipelines without accounting for it.

Synthetic data generation has become unavoidable for teams fine-tuning LLMs on specialized domains. Human annotation at scale is expensive and slow, and it is often impractical for tasks that require scarce domain expertise. Synthetic data generated by a capable teacher model can fill that gap cheaply. But the pipeline is not as simple as "prompt GPT-4 for examples, train your model." The details determine whether you get a specialized system that outperforms a general model on your domain, or a fluent but factually broken one.