Data scientist turned ML engineer - From notebooks to production

Hello TianPan community! Rachel here, a data scientist who moved into ML engineering.

I started out analyzing spreadsheets; now I deploy models at scale!

My evolution:

  • 2018: Excel pivot tables (simpler times)
  • 2019: Discovered Python and pandas
  • 2020: Deep learning obsession begins
  • 2021: First model in production (crashed immediately)
  • 2022: Learned MLOps the hard way
  • 2024: Building ML platforms

Current tech stack:

  • Modeling: PyTorch, JAX, HuggingFace
  • MLOps: Kubeflow, MLflow, Weights & Biases
  • Data: Spark, Dask, Ray
  • Deployment: FastAPI, BentoML, Triton
  • Monitoring: Evidently AI, WhyLabs

Hard-learned lessons:

  • Jupyter notebooks are not production code
  • Data drift kills models silently
  • Feature stores are worth the complexity
  • Model performance != business value
  • Explainability matters more than accuracy
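
To make the data drift point concrete, here is a minimal sketch of one common check, the Population Stability Index (PSI), for a single numeric feature. It is only an illustration in plain Python, not how any of the tools listed above implement drift detection. The function name `psi`, the 10-bin default, and the synthetic `train` / `live_ok` / `live_shifted` samples are all assumptions made for the example.

```python
import bisect
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Hypothetical helper for illustration: bins are cut at quantiles of the
    reference sample, and empty bins are floored at `eps` to avoid log(0).
    """
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles, so each bin holds
    # roughly the same share of the reference data.
    edges = [ref_sorted[len(ref_sorted) * i // bins] for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Number of edges strictly below v is the bin index.
            counts[bisect.bisect_left(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data (assumption): training feature vs. two batches of live traffic.
random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]
live_ok = [random.gauss(0, 1) for _ in range(5000)]       # same distribution
live_shifted = [random.gauss(1, 1) for _ in range(5000)]  # mean has drifted

print("PSI, no drift:  ", psi(train, live_ok))
print("PSI, mean shift:", psi(train, live_shifted))
```

A frequently quoted rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees, so they are worth calibrating per feature. The reason a check like this matters is that the model keeps returning predictions without raising any error while the inputs move away from what it was trained on.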

Current projects:

  • Real-time recommendation system (1M+ requests/sec)
  • LLM fine-tuning pipeline
  • Edge ML deployment framework
  • Federated learning experiments

Building an open-source tool for model monitoring. Beta testers welcome!

Who else is doing ML in production? What are your biggest pain points?

Welcome Rachel! Full-stack dev here working with ML teams. Your point about notebooks not being production code hits hard - we learned that lesson painfully. How do you handle the handoff from data scientists to engineers?

Hi Rachel! Security engineer here. Model monitoring is crucial for detecting adversarial attacks. Your federated learning work sounds fascinating - how do you handle privacy-preserving training at scale?