Data Science for the Modern Analyst: From Concepts to Implementation - Chapter 8
Published 2026-02-26 07:29
# Chapter 8: Deployment & Production
Deploying a data‑science model is where the magic meets the business. A well‑designed production pipeline transforms an analytical prototype into a reliable, scalable service that delivers value day‑in, day‑out. This chapter walks through the core components of a modern MLOps workflow:
1. **Model Serving** – Exposing a trained model as an API or batch job.
2. **MLOps Pipelines** – CI/CD for data, code, and model artifacts.
3. **Version Control & Artifacts** – Reproducibility, rollback, and collaboration.
4. **Monitoring & Observability** – Detecting data drift, performance regressions, and SLA violations.
By the end, you’ll know how to create a robust end‑to‑end pipeline that aligns with both technical excellence and governance requirements.
---
## 1. Model Serving
Model serving is the mechanism by which predictions are delivered to downstream systems (web apps, mobile apps, internal dashboards, etc.). It can be implemented as a *real‑time* or *batch* service.
### 1.1 Real‑Time Serving
| Layer | Tool | Typical Use‑Case |
|-------|------|------------------|
| Inference | **FastAPI** + **uvicorn** | Low‑latency microservice for API calls |
| Container | **Docker** | Portable, isolated environment |
| Orchestration | **Kubernetes** (K8s) | Horizontal scaling, auto‑healing |
| Traffic routing | **Istio** / **NGINX** | Load balancing, retries |
```python
# demo_serving.py
from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI()
model = joblib.load("/opt/models/credit_approval_v1.pkl")

@app.post("/predict")
def predict(features: dict):
    X = np.array([list(features.values())])
    pred = model.predict(X)[0]
    return {"approved": bool(pred)}
```
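One caveat: `list(features.values())` relies on the client sending keys in the exact order the model was trained on. A safer pattern pins an explicit column order before building the array; the feature names below are hypothetical, so adjust them to your training schema:

```python
import numpy as np

# Must match the column order used at training time (names are illustrative)
FEATURE_ORDER = ["age", "income", "loan_amount"]

def to_feature_vector(features: dict) -> np.ndarray:
    """Build the model input in a fixed column order, failing fast on missing keys."""
    missing = [f for f in FEATURE_ORDER if f not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return np.array([[features[f] for f in FEATURE_ORDER]])

# Key order in the request no longer matters
vec = to_feature_vector({"income": 50000, "age": 35, "loan_amount": 10000})
assert vec.tolist() == [[35, 50000, 10000]]
```

This keeps a schema mismatch from silently producing a prediction on scrambled inputs.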
Deploy with Docker:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY demo_serving.py .
# Place the model where demo_serving.py expects it
COPY credit_approval_v1.pkl /opt/models/credit_approval_v1.pkl
RUN pip install fastapi uvicorn joblib scikit-learn numpy
CMD ["uvicorn", "demo_serving:app", "--host", "0.0.0.0", "--port", "80"]
```
Build & push:
```bash
docker build -t registry.company.com/credit-service:latest .
# Push to private registry
docker push registry.company.com/credit-service:latest
```
### 1.2 Batch Serving
Batch jobs are preferable for large‑scale scoring (e.g., nightly risk calculation). Common orchestration tools include **Airflow**, **Prefect**, or **Kubeflow Pipelines**.
```python
# batch_job.py
import joblib
import pandas as pd

model = joblib.load("/opt/models/credit_approval_v1.pkl")
# Reading/writing s3:// paths with pandas requires s3fs (or pyarrow's S3 support)
df = pd.read_parquet("s3://data-lake/transactions.parquet")
X = df.drop(columns=["transaction_id", "label"])
preds = model.predict(X)
results = df["transaction_id"].to_frame()
results["approved"] = preds
results.to_parquet("s3://data-lake/credit_predictions.parquet")
```
The job can be scheduled in Airflow:
```python
# airflow_dag.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG("credit_batch", start_date=datetime(2024, 2, 1), schedule_interval="@daily") as dag:
    BashOperator(
        task_id="run_prediction",
        bash_command="python3 /opt/airflow/dags/batch_job.py",
    )
```
---
## 2. MLOps Pipelines
MLOps extends traditional CI/CD to handle data, feature engineering, model training, and evaluation. The pipeline stages are:
1. **Data Validation & Ingestion** – Check schema, quality, and lineage.
2. **Feature Store Retrieval** – Pull latest features, handle TTL.
3. **Training** – Hyper‑parameter search, cross‑validation.
4. **Model Registry** – Store artifacts, metadata, and tags.
5. **Testing** – Unit, integration, and sanity checks.
6. **Deployment** – Push to serving environment.
7. **Monitoring** – Continuous evaluation of drift and performance.
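Stage 1 above can be sketched as a plain schema-and-type check; the column names and types here are illustrative, not from any real feature store:

```python
# Expected schema for an incoming batch (illustrative columns)
EXPECTED_SCHEMA = {
    "transaction_id": str,
    "amount": float,
    "age": int,
}

def validate_rows(rows):
    """Return a list of (row_index, error) tuples; an empty list means the batch passed."""
    errors = []
    for i, row in enumerate(rows):
        for col, col_type in EXPECTED_SCHEMA.items():
            if col not in row:
                errors.append((i, f"missing column: {col}"))
            elif not isinstance(row[col], col_type):
                errors.append((i, f"bad type for {col}: {type(row[col]).__name__}"))
    return errors

good = [{"transaction_id": "t1", "amount": 10.5, "age": 42}]
bad = [{"transaction_id": "t2", "amount": "oops"}]
assert validate_rows(good) == []
assert validate_rows(bad) == [(0, "bad type for amount: str"), (0, "missing column: age")]
```

In practice a library such as Great Expectations or pandera would replace this hand-rolled check, but the gating logic (fail the pipeline before training sees bad data) is the same.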
### 2.1 Example Pipeline with MLflow + Argo
```yaml
# argo_mlflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: credit-mlflow-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: data-validation
            template: data-val
          - name: feature-store
            depends: data-validation
            template: feature-store
          - name: train-model
            depends: feature-store
            template: train
          - name: register-model
            depends: train-model
            template: register
          - name: deploy-model
            depends: register-model
            template: deploy
    - name: data-val
      container:
        image: company/mlflow:1.30
        command: ["bash", "-c"]
        args: ["python validate.py"]
    # ... (other templates follow the same pattern)
```
This DAG runs in a Kubernetes cluster; each step is idempotent and logged.
---
## 3. Version Control & Artifacts
### 3.1 Git for Code & Configuration
All source code, including notebooks, scripts, and pipeline definitions, should be stored in a **Git** repository. Adopt semantic versioning (e.g., `v1.2.0`) and use feature branches for experimentation.
```bash
git checkout -b feature/optimize-learning-rate
# After testing
git commit -m "Increase learning rate to 0.01 for XGBoost"
git push origin feature/optimize-learning-rate
```
### 3.2 Model Registry (MLflow, DVC, S3)
| Registry | Strengths | Typical Workflow |
|----------|-----------|------------------|
| **MLflow** | Experiment tracking, model packaging, deployment | Log metrics with `mlflow.log_metric`; register with `mlflow.register_model` |
| **DVC** | Data versioning, reproducible pipelines | Track large files (`dvc add data/`), push to remote storage |
| **S3 + Glue** | Enterprise data lake, schema enforcement | Store serialized models; catalog with Glue Data Catalog |
Example: Register a model with MLflow
```python
import mlflow
from mlflow.models import infer_signature
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load data & train (train/test split prepared elsewhere)
X_train, X_test, y_train, y_test = ...
model = XGBClassifier(n_estimators=100).fit(X_train, y_train)

# Log experiment
with mlflow.start_run() as run:
    mlflow.log_params(model.get_params())
    mlflow.log_metrics({"accuracy": accuracy_score(y_test, model.predict(X_test))})
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, "model", signature=signature)
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "credit-approval-v1")
```
---
## 4. Monitoring & Observability
A robust production model is a monitored one. Key metrics to track include:
- **Latency** – Average and percentile response times.
- **Throughput** – Requests per second.
- **Accuracy drift** – Comparison of recent predictions vs. historical ground truth.
- **Feature drift** – Distribution shift in input features.
- **Resource utilization** – CPU, GPU, memory usage.
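Feature drift is commonly quantified with the Population Stability Index (PSI) between a reference sample and recent traffic. A minimal sketch follows; the 0.2 alert threshold is a widely used rule of thumb, not a universal constant, and should be tuned per feature:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a recent feature sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 10000)
assert psi(ref, rng.normal(0, 1, 10000)) < 0.05   # same distribution: tiny PSI
assert psi(ref, rng.normal(1.0, 1, 10000)) > 0.2  # shifted mean: flagged as drift
```

Tools like Evidently AI or NannyML (Section 4.2) compute this and richer drift statistics out of the box; the value of the hand-rolled version is mainly pedagogical.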
### 4.1 Prometheus + Grafana
Deploy Prometheus to scrape metrics exposed by the FastAPI app (via the **prometheus_client** library). Grafana dashboards visualize the trends.
```python
# metrics.py
from prometheus_client import Summary
import time

REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

@REQUEST_TIME.time()
def process_request():
    time.sleep(0.2)  # Simulate work
```
In the FastAPI app:
```python
from fastapi import FastAPI
from prometheus_client import start_http_server

app = FastAPI()
# Serve the /metrics endpoint on a separate port from the API itself
start_http_server(8001)
```
Grafana panel: `rate(request_processing_seconds_count[1m])` plots the request rate, using the counter that the `Summary` above exposes.
### 4.2 Model‑specific Alerts
- **Data Drift Alert**: Use **Evidently AI** or **NannyML** to compute drift scores. If drift > threshold, send Slack notification.
- **Accuracy Alert**: Run a validation set every hour; if accuracy < 0.9, trigger rollback.
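The hourly accuracy gate reduces to a plain comparison; the 0.9 threshold comes from the alert rule above, while the window size and labelling pipeline are left as assumptions:

```python
def should_rollback(y_true, y_pred, threshold=0.9):
    """Return True when accuracy on the latest labelled window falls below the SLA threshold."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy < threshold

assert should_rollback([1, 1, 0, 0], [1, 0, 0, 0]) is True   # 75% < 90%
assert should_rollback([1, 1, 0, 0], [1, 1, 0, 0]) is False  # 100%
```

In a real deployment this check would feed the rollback automation (e.g. re-pointing traffic to the previous registry version) rather than acting directly.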
Example Slack webhook snippet:
```python
import requests, json

def notify_slack(message):
    webhook_url = "https://hooks.slack.com/services/..."
    payload = {"text": message}
    requests.post(webhook_url, data=json.dumps(payload), headers={"Content-Type": "application/json"})
```
---
## 5. Governance & Compliance in Production
- **Model Registry Tags**: Include `staging`, `production`, `deprecated`.
- **Audit Logs**: Capture every version change, deployment, and rollback.
- **GDPR & CCPA**: Ensure data residency, encryption, and data‑subject rights are upheld.
- **Explainability**: Store SHAP or LIME explanations alongside each prediction batch for audit.
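The audit-log point can be made concrete with a per-prediction JSON-lines record. All field names below are illustrative, and hashing the raw features is one (assumed) way to keep the log auditable without persisting personal data:

```python
import json, hashlib, datetime

def audit_record(model_name, model_version, features, prediction, explanation=None):
    """Build one JSON-lines audit entry per prediction (field names are illustrative)."""
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model_name,
        "version": model_version,
        # Hash raw inputs so the log stays auditable without storing personal data
        "features_sha256": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode()
        ).hexdigest(),
        "prediction": prediction,
        "explanation": explanation,
    })

rec = json.loads(audit_record("credit-approval", "v1", {"age": 42}, True))
assert rec["model"] == "credit-approval"
assert len(rec["features_sha256"]) == 64
```

Whether to store hashed or raw features (and where the SHAP/LIME explanations live) is a policy decision that your data-protection officer should sign off on.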
### 5.1 CI/CD Pipeline for Model Updates
```yaml
# .github/workflows/model-ci.yml
name: ML Pipeline
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run unit tests
        run: pytest tests/
      - name: Run training pipeline
        run: python scripts/train.py
      - name: Deploy if success
        if: success()
        run: bash scripts/deploy.sh
```
---
## 6. Summary
| Topic | Key Takeaway |
|-------|--------------|
| Model Serving | Containerize inference, expose as REST, scale with K8s |
| MLOps Pipeline | Automate data, feature, training, and deployment steps |
| Version Control | Keep code, data, and models in a single, auditable repository |
| Monitoring | Detect drift, latency, and SLA violations in real time |
| Governance | Embed audit, explainability, and compliance checkpoints |
The journey from a prototype notebook to a production‑ready model is iterative. By integrating **model registry**, **continuous monitoring**, and **robust pipelines**, analysts can deliver reliable insights that scale and comply with regulatory standards.
---
## Quick Reference Checklist
| ✅ | Item |
|---|------|
| ✅ | Models are registered in MLflow with clear tags |
| ✅ | Inference API is containerized and deployed to K8s |
| ✅ | CI/CD pipeline runs unit tests, training, and deployment |
| ✅ | Prometheus scrapes latency & request metrics |
| ✅ | Data and model drift alerts are configured to Slack |
| ✅ | All artifacts are versioned with Git and DVC |
| ✅ | GDPR compliance logs are enabled for all predictions |
---
## Further Reading
- **MLflow Documentation** – https://mlflow.org/docs/latest/
- **Argo Workflows** – https://argoproj.github.io/docs/latest/
- **Prometheus** – https://prometheus.io/docs/introduction/overview/
- **Grafana** – https://grafana.com/docs/grafana/latest/
- **Evidently AI** – https://evidentlyai.com/
- **NannyML** – https://nannyml.com/
---
*Prepared by the Data Science Operations Team, February 2024.*