The ML Lifecycle: From Notebooks to Production
ML Infrastructure Architecture
"A model that only runs on your local machine brings zero value to the business. MLOps is the discipline of making Machine Learning predictable, scalable, and reliable."
Data Preparation & Versioning
Traditional software version control (Git) handles code perfectly, but it breaks down when dealing with 10GB datasets. In MLOps, we use tools like DVC (Data Version Control) to track datasets alongside code. If a model behaves strangely in production, you must be able to roll back to the exact code and the exact data used to train it.
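The trick DVC uses is to commit a small hash pointer to Git while the heavy data lives in remote storage. A minimal sketch of that idea in plain Python (this is the pattern, not DVC's actual API; `snapshot_dataset` and the registry format are illustrative):

```python
import hashlib
from pathlib import Path

def snapshot_dataset(path: str, registry: dict) -> str:
    """Hash a dataset file and record a lightweight pointer to it,
    similar in spirit to the .dvc files that get committed to Git."""
    data = Path(path).read_bytes()
    digest = hashlib.md5(data).hexdigest()
    registry[path] = {"md5": digest, "size": len(data)}
    return digest

registry = {}  # the small pointer file is what Git versions

Path("train.csv").write_text("id,label\n1,0\n2,1\n")
version_1 = snapshot_dataset("train.csv", registry)

Path("train.csv").write_text("id,label\n1,0\n2,1\n3,0\n")  # data changed
version_2 = snapshot_dataset("train.csv", registry)

print(version_1 != version_2)  # True: the data change is detected
```

Rolling back then means checking out an old pointer and fetching the matching blob from storage by its hash.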
Model Training & Registry
Data scientists experiment with dozens of algorithms and hyperparameters. Without tracking (e.g., MLflow, Weights & Biases), knowing which parameters produced the best accuracy is impossible. Once a model is finalized, it's stored in a Model Registry—a centralized repository treating ML models as deployable artifacts.
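What a tracking server like MLflow gives you is, at its core, a queryable log of runs. A toy in-memory version of that pattern (class and method names here are hypothetical, not MLflow's API):

```python
import uuid

class ExperimentTracker:
    """Toy stand-in for an MLflow-style tracking backend."""

    def __init__(self):
        self.runs = {}

    def log_run(self, params: dict, metrics: dict) -> str:
        """Record one training run: its hyperparameters and results."""
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": params, "metrics": metrics}
        return run_id

    def best_run(self, metric: str) -> dict:
        """Answer the question tracking exists for: which run won?"""
        return max(self.runs.values(), key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "max_depth": 3}, {"accuracy": 0.87})
tracker.log_run({"lr": 0.01, "max_depth": 5}, {"accuracy": 0.91})

print(tracker.best_run("accuracy")["params"])  # {'lr': 0.01, 'max_depth': 5}
```

A Model Registry adds one more step on top of this: the winning run's artifact is promoted to a named, versioned entry (e.g. "fraud-model v3, stage: Production") that deployment tooling can pull by name.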
Serving & Monitoring
Deployment often involves wrapping the model in an API (FastAPI) and containerizing it with Docker. But the lifecycle doesn't end there. In the real world, data changes (Data Drift) and model performance degrades (Concept Drift). Continuous Monitoring using Prometheus and Grafana ensures we know exactly when a model needs retraining.
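Stripped of the framework, a serving endpoint does three things: parse the request, score it with the loaded model, and return a response. A framework-free sketch of that handler logic (the linear scorer and feature names are invented for illustration; a real service would deserialize a trained artifact from the registry):

```python
import json

# Assumption: a trivial linear model standing in for the real artifact.
WEIGHTS = {"amount": 0.8, "age_days": -0.2}

def predict_handler(request_body: str) -> str:
    """The work a FastAPI endpoint would do per request:
    deserialize input, apply the model, serialize the prediction."""
    features = json.loads(request_body)
    score = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return json.dumps({"fraud_score": round(score, 3)})

print(predict_handler('{"amount": 1.0, "age_days": 2.0}'))  # {"fraud_score": 0.4}
```

In production this function body sits behind a FastAPI route inside a Docker container, and each request/response pair is also logged so the monitoring stack has data to detect drift from.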
⚙️ MLOps Frequently Asked Questions
What is the Machine Learning Lifecycle?
The Machine Learning Lifecycle is the iterative process of taking an ML project from conception to production. It consists of Scoping, Data Prep (Ingestion/Cleaning), Modeling (Training/Evaluation), Deployment (Serving), and Monitoring. It is highly cyclical; monitoring feeds back into data prep for retraining.
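The cyclical shape of those stages can be sketched as a chain of functions whose last step decides whether to loop back (all names and thresholds here are illustrative stubs, not a real pipeline framework):

```python
def prepare_data(raw):
    """Data Prep: ingestion and cleaning."""
    return [x for x in raw if x is not None]

def train_and_evaluate(clean):
    """Modeling: training and evaluation of candidates."""
    return {"name": "v1", "trained_on": len(clean)}

def deploy(model):
    """Deployment: expose the chosen artifact for serving."""
    return f"serving {model['name']}"

def monitoring_detects_drift(live_accuracy, threshold=0.8):
    """Monitoring: watch live quality against an acceptance bar."""
    return live_accuracy < threshold

# One pass through the cycle; drift sends us back to Data Prep.
model = train_and_evaluate(prepare_data([1, None, 2, 3]))
endpoint = deploy(model)
status = "retrain" if monitoring_detects_drift(0.72) else "stable"
print(endpoint, "->", status)  # serving v1 -> retrain
```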
Why is MLOps necessary compared to standard DevOps?
Standard DevOps focuses on versioning code. MLOps is far more complex because it must version three dimensions: Code, Data, and Models. Furthermore, code doesn't typically degrade on its own, whereas ML models naturally degrade over time as real-world data distributions drift away from the training data.
What is Data Drift vs Concept Drift?
Data Drift happens when the statistical properties of the input data (features) change (e.g., users get older).
Concept Drift happens when the relationship between input features and the target variable changes (e.g., the definition of a "fraudulent transaction" evolves because hackers change tactics). Both require model retraining.
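The simplest data-drift check compares a summary statistic of a live feature window against the training baseline. Production systems use proper statistical tests (e.g. Kolmogorov-Smirnov or the Population Stability Index); the relative-mean-shift rule and tolerance below are a toy illustration:

```python
def mean_shift_drift(baseline: list, live: list, tolerance: float = 0.25) -> bool:
    """Flag data drift when the live feature mean moves more than
    `tolerance` (as a fraction) away from the training-time mean."""
    base_mean = sum(baseline) / len(baseline)
    live_mean = sum(live) / len(live)
    return abs(live_mean - base_mean) / abs(base_mean) > tolerance

train_ages = [30, 32, 35, 31, 33]  # feature distribution at training time
live_ages = [44, 46, 45, 47, 43]   # the user base got older: inputs drifted

print(mean_shift_drift(train_ages, live_ages))  # True
```

Note that a check like this only sees the inputs, so it catches Data Drift; detecting Concept Drift additionally requires comparing predictions against ground-truth labels as they arrive.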