Data Science Unveiled: From Raw Data to Insightful Decisions – Chapter 11
Published 2026-03-06 23:14
# Chapter 11
## From Prototype to Production – MLOps & Ethical Deployment
After polishing your models, dashboards, and notebooks into portfolio pieces, the next frontier is **deployment**. In this chapter we’ll treat deployment not as a final, one‑off step but as a **continuous, ethical, and reproducible practice** that turns a single‑use prototype into a robust service that can scale, learn, and comply with regulations.
> *Takeaway – Treat deployment as an iterative project: version your models, monitor them in real time, and audit for bias as part of the life‑cycle.*
---
## 1. The Notebook to Microservice Pipeline
A data scientist’s notebook is great for exploration, but a business needs a **stable, scalable service**. Below is a typical workflow:
1. **Containerize the model** – Docker lets you package code, dependencies, and the model artifact into a portable image.
2. **Define an API** – Flask/FastAPI or TensorFlow Serving expose a REST or gRPC endpoint.
3. **Automate CI/CD** – GitHub Actions or GitLab CI build the image, run tests, and push to a registry.
4. **Deploy to the cloud** – Kubernetes, ECS, or a managed MLOps platform (e.g., SageMaker, Vertex AI) orchestrate scaling.
### 1.1 Dockerfile for a Scikit‑Learn Model
```dockerfile
# Base image
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and API code
COPY model.joblib ./
COPY api.py ./

# Expose port and start service
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```
> **Note:** Use `joblib` for scikit‑learn pickles, `pickle` for general Python objects, or `onnx`/`torchscript` for cross‑platform compatibility.
---
## 2. Orchestrating Data Pipelines
Once the model is a service, the data feeding it must be reliable. Two dominant paradigms:
| Paradigm | Strengths | Typical Tools |
|----------|-----------|---------------|
| **Batch** | Simpler, deterministic, easier to audit | Airflow, Prefect, Dagster |
| **Streaming** | Real‑time, low latency | Kafka, Flink, Spark Structured Streaming |
### 2.1 Airflow DAG Example
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="predict_customer_churn",
    default_args={"owner": "data_scientist"},
    schedule_interval="@daily",
    start_date=days_ago(1),
) as dag:
    fetch_features = BashOperator(
        task_id="fetch_features",
        bash_command="python fetch_features.py",
    )
    make_prediction = BashOperator(
        task_id="make_prediction",
        bash_command="curl -X POST http://localhost:8000/predict -d @features.json",
    )

    # Only call the prediction service after the features exist
    fetch_features >> make_prediction
```
---
## 3. Model Monitoring and Feedback Loops
A deployed model can degrade. **Continuous monitoring** catches drift early.
| Metric | What to watch | Tool |
|--------|--------------|------|
| **Prediction Accuracy** | Sudden or gradual drops (e.g., after a seasonal shift) | Prometheus + Grafana |
| **Feature Distribution** | Shift in customer demographics | Evidently, NannyML |
| **Latency** | Slow responses | Datadog, New Relic |
| **Bias Drift** | Unfair predictions across groups | Fairlearn, AI Fairness 360 |
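Before reaching for a full monitoring stack, feature-distribution drift can be quantified with a few lines of pure Python using the Population Stability Index (PSI). The sketch below uses 10 equal-width bins and the conventional 0.1 / 0.25 thresholds; both are rules of thumb, not hard limits:

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 likely drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_fracs(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)  # clamp max into last bin
            counts[idx] += 1
        # Floor at a tiny value so log() never sees zero
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Computing PSI per feature on each day's scoring batch, and alerting when it crosses the investigate threshold, is a cheap first line of defense alongside the tools in the table.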
### 3.1 Logging with MLflow
```python
import mlflow

with mlflow.start_run():
    # Log small samples of inputs and outputs as run artifacts for debugging
    mlflow.log_dict({"raw_features": raw_features[:10]}, "raw_features_sample.json")
    mlflow.log_dict({"predictions": list(predictions[:10])}, "predictions_sample.json")
```
> **Tip:** Log a *sample* of raw inputs and outputs so you can debug without overwhelming the logging backend.
---
## 4. Reproducibility in Production
Reproducibility is not just a research nicety; it’s a production necessity.
1. **Experiment Tracking** – MLflow, Weights & Biases.
2. **Artifact Versioning** – Store model files in an artifact store (S3, GCS) with semantic tags.
3. **Dataset Versioning** – DVC or Delta Lake to keep track of training data snapshots.
4. **Environment Capture** – Use `poetry` or `pipenv` to lock dependencies; Docker images serve as the final reference.
### 4.1 DVC Example
```bash
# Generate and track the training data
python create_training_data.py
dvc add data/raw/train.csv

# `dvc add` writes a small .dvc pointer file; version that with git
git add data/raw/train.csv.dvc data/raw/.gitignore
git commit -m "Add training data v1"

# Push the data itself to remote storage
dvc push
```
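Semantic tags are easiest to trust when they are backed by something immutable. One common trick, sketched here with only the standard library, is to derive part of the tag from the artifact's content hash, so two builds labeled "v1" can never silently diverge (the tag format is illustrative):

```python
import hashlib
from pathlib import Path


def artifact_tag(path, algo="sha256", length=12):
    """Short, content-derived tag for a model artifact — e.g., appended to a
    semantic version as 'churn-model:1.2.0-<tag>' in an S3 key or image name."""
    digest = hashlib.new(algo)
    digest.update(Path(path).read_bytes())
    return digest.hexdigest()[:length]
```

The same bytes always produce the same tag, so a CI step can verify that the artifact being deployed matches the one that was validated.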
---
## 5. Ethical Deployment: Transparency, Fairness, and Compliance
Deploying a model is not just about speed; it’s about **trust**.
### 5.1 Transparency
*Document assumptions, data sources, and model limits.* Use a *Model Card*:
```markdown
# Model Card – Customer Churn Predictor
**Purpose:** Predict churn probability for telecom customers.
**Data:** 2023-05-01 dataset, 50k records.
**Algorithm:** Gradient Boosting Machine.
**Limitations:** Not trained on post-2023 data; no support for non-English labels.
```
### 5.2 Fairness Audits
*Run bias tests before every major version change.*
```python
from fairlearn.metrics import demographic_parity_difference

diff = demographic_parity_difference(y_true, y_pred, sensitive_features=df["gender"])
print(f"Gender parity diff: {diff:.3f}")
```
### 5.3 Regulatory Compliance
*GDPR, CCPA, and other data‑privacy laws require:*
- Data minimization.
- Right to explanation.
- Logging audit trails.
> **Pro Tip:** Embed a *privacy‑by‑design* check into your CI pipeline: a script that scans for personally identifiable information (PII) before model training.
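The pro tip above can be prototyped in a handful of lines. The regexes below are deliberately simple and illustrative — a production scanner (e.g., Microsoft Presidio) covers far more identifier types and locales:

```python
import re

# Illustrative patterns only — extend for your data and jurisdiction
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\+?\d[\d\s()-]{8,}\d\b"),
}


def scan_for_pii(text):
    """Return {pattern_name: [matches]} for any PII-like strings found."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

A CI step can run this over training-data samples and fail the pipeline when the result is non-empty, forcing an explicit review before any PII reaches model training.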
---
## 6. Iterative Improvement: A/B Testing & Continuous Learning
A/B tests let you **experiment in production** safely.
| Test Type | Use‑case | Tool |
|-----------|----------|------|
| **Feature toggle** | Test new algorithm on 10% traffic | LaunchDarkly |
| **Canary deployment** | Gradual rollout of new model | Argo Rollouts |
| **Online learning** | Adapt to concept drift | River, scikit-multiflow |
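Whichever rollout tool you use, assignment should usually be deterministic per user rather than per request, so the same customer doesn't flip between model versions mid-session. A standard-library sketch (the salt and the 10% split are illustrative):

```python
import hashlib


def assign_variant(user_id, new_model_share=0.1, salt="churn-ab-2026"):
    """Deterministically route a user to 'v1' or 'v2'.
    Hashing (salt, user_id) gives a stable, roughly uniform bucket per user."""
    bucket = int(hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest(), 16) % 10_000
    return "v2" if bucket < new_model_share * 10_000 else "v1"
```

Changing the salt reshuffles every user into a fresh, independent split, which is useful when starting a new experiment.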
### 6.1 Simple A/B Test with FastAPI
```python
import random

from fastapi import FastAPI

# Assumes old_model and new_model were loaded elsewhere at startup
app = FastAPI()


@app.post("/predict")
async def predict(data: dict):
    # Route 10% of traffic to the new model
    if random.random() < 0.1:
        result, version = new_model.predict(data), "v2"
    else:
        result, version = old_model.predict(data), "v1"
    # Return the version with every prediction so A/B outcomes can be compared
    return {"prediction": result, "model_version": version}
```
---
## Conclusion
Deploying a model is a **living process** that extends beyond the notebook. It involves:
- Building reproducible, container‑based artifacts.
- Orchestrating data pipelines.
- Monitoring performance and bias.
- Ensuring transparency and compliance.
- Iterating with A/B tests and online learning.
> **Remember:** Treat each deployment as a *new project* in your portfolio. Document every step, version everything, and maintain a clear audit trail. Recruiters and hiring managers value real‑world, reproducible systems more than flashy notebooks.
---
### Further Reading
- *Machine Learning Engineering with Python*, Andrew P. McMahon.
- *Designing Data-Intensive Applications*, Martin Kleppmann.
- *Fairness and Machine Learning: Limitations and Opportunities*, Solon Barocas, Moritz Hardt, and Arvind Narayanan.
### Practice Exercise
Take the churn prediction model you built in Chapter 9 and:
1. Containerize it.
2. Deploy it to a local or free-tier Kubernetes cluster (e.g., Minikube locally, or GKE's free tier).
3. Set up Prometheus to monitor latency.
4. Create a simple A/B test that routes 20% of traffic to a slightly different hyperparameter set.
5. Document the entire pipeline in a Model Card.
---
**End of Chapter 11**