Hello TianPan community! Rachel here, a data scientist who moved into ML engineering.
I started out analyzing spreadsheets and now deploy models at scale!
My evolution:
- 2018: Excel pivot tables (simpler times)
- 2019: Discovered Python and pandas
- 2020: Deep learning obsession begins
- 2021: First model in production (crashed immediately)
- 2022: Learned MLOps the hard way
- 2024: Building ML platforms
Current tech stack:
- Modeling: PyTorch, JAX, HuggingFace
- MLOps: Kubeflow, MLflow, Weights & Biases
- Data: Spark, Dask, Ray
- Deployment: FastAPI, BentoML, Triton
- Monitoring: Evidently AI, WhyLabs
Hard-learned lessons:
- Jupyter notebooks are not production code
- Data drift kills models silently
- Feature stores are worth the complexity
- Model performance != business value
- Explainability matters more than accuracy
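For anyone who hasn't been bitten by silent data drift yet, here is a minimal sketch of one common check, the Population Stability Index (PSI), computed for a single numeric feature. It is an illustration only, not how any particular monitoring tool implements it: the function name `psi`, the equal-width binning, the `eps` floor, and the synthetic data are all assumptions made for the example.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the range of the reference sample (an
    assumption of this sketch); current values outside that range are
    clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bucket_shares(reference)
    cur = bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic demo data (hypothetical): training sample vs. two "live" samples.
rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]      # same distribution
shifted = [rng.gauss(1, 1) for _ in range(5000)]   # mean moved by one std dev

print("PSI, no drift:  ", psi(train, same))
print("PSI, with drift:", psi(train, shifted))
```

A widely quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees, so tune them per feature. The point is that a check like this runs on inputs alone and can raise a flag long before labels arrive and accuracy visibly drops.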
Current projects:
- Real-time recommendation system (1M+ requests/sec)
- LLM fine-tuning pipeline
- Edge ML deployment framework
- Federated learning experiments
I'm also building an open-source tool for model monitoring. Beta testers welcome!
Who else is doing ML in production? What are your biggest pain points?