Data Science for the Modern Analyst: From Concepts to Implementation - Chapter 8
Published 2026-02-26 07:29
# Chapter 8: Deployment & Production
Deploying a data‑science model is where the magic meets the business. A well‑designed production pipeline transforms an analytical prototype into a reliable, scalable service that delivers value day‑in, day‑out. This chapter walks through the core components of a modern MLOps workflow:
1. **Model Serving** – Exposing a trained model as an API or batch job.
2. **MLOps Pipelines** – CI/CD for data, code, and model artifacts.
3. **Version Control & Artifacts** – Reproducibility, rollback, and collaboration.
4. **Monitoring & Observability** – Detecting data drift, performance regressions, and SLA violations.
By the end, you’ll know how to create a robust end‑to‑end pipeline that aligns with both technical excellence and governance requirements.
---
## 1. Model Serving
Model serving is the mechanism by which predictions are delivered to downstream systems (web apps, mobile apps, internal dashboards, etc.). It can be implemented as a *real‑time* or *batch* service.
### 1.1 Real‑Time Serving
| Layer | Tool | Typical Use‑Case |
|-------|------|------------------|
| Inference | **FastAPI** + **uvicorn** | Low‑latency microservice for API calls |
| Container | **Docker** | Portable, isolated environment |
| Orchestration | **Kubernetes** (K8s) | Horizontal scaling, auto‑healing |
| Traffic routing | **Istio** / **NGINX** | Load balancing, retries |
```python
# demo_serving.py
from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI()
model = joblib.load("/opt/models/credit_approval_v1.pkl")

@app.post("/predict")
def predict(features: dict):
    X = np.array([list(features.values())])
    pred = model.predict(X)[0]
    return {"approved": bool(pred)}
```
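One caveat: `list(features.values())` relies on the client sending keys in the exact order the model was trained on. A safer pattern pins an explicit column order before building the array; the feature names below are hypothetical, so adjust them to your training schema:

```python
import numpy as np

# Must match the column order used at training time (names are illustrative)
FEATURE_ORDER = ["age", "income", "loan_amount"]

def to_feature_vector(features: dict) -> np.ndarray:
    """Build the model input in a fixed column order, failing fast on missing keys."""
    missing = [f for f in FEATURE_ORDER if f not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return np.array([[features[f] for f in FEATURE_ORDER]])

# Key order in the request no longer matters
vec = to_feature_vector({"income": 50000, "age": 35, "loan_amount": 10000})
assert vec.tolist() == [[35, 50000, 10000]]
```

This keeps a schema mismatch from silently producing a prediction on scrambled inputs.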
Deploy with Docker:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY demo_serving.py .
# Place the model where demo_serving.py expects it
COPY credit_approval_v1.pkl /opt/models/credit_approval_v1.pkl
RUN pip install fastapi uvicorn joblib scikit-learn numpy
CMD ["uvicorn", "demo_serving:app", "--host", "0.0.0.0", "--port", "80"]
```
Build & push:
```bash
docker build -t registry.company.com/credit-service:latest .
# Push to private registry
docker push registry.company.com/credit-service:latest
```
### 1.2 Batch Serving
Batch jobs are preferable for large‑scale scoring (e.g., nightly risk calculation). Common orchestration tools include **Airflow**, **Prefect**, or **Kubeflow Pipelines**.
```python
# batch_job.py
import joblib
import pandas as pd

model = joblib.load("/opt/models/credit_approval_v1.pkl")
# Reading/writing s3:// paths with pandas requires s3fs (or pyarrow's S3 support)
df = pd.read_parquet("s3://data-lake/transactions.parquet")
X = df.drop(columns=["transaction_id", "label"])
preds = model.predict(X)
results = df["transaction_id"].to_frame()
results["approved"] = preds
results.to_parquet("s3://data-lake/credit_predictions.parquet")
```
The job can be scheduled in Airflow:
```python
# airflow_dag.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG("credit_batch", start_date=datetime(2024, 2, 1), schedule_interval="@daily") as dag:
    BashOperator(
        task_id="run_prediction",
        bash_command="python3 /opt/airflow/dags/batch_job.py",
    )
```
---
## 2. MLOps Pipelines
MLOps extends traditional CI/CD to handle data, feature engineering, model training, and evaluation. The pipeline stages are:
1. **Data Validation & Ingestion** – Check schema, quality, and lineage.
2. **Feature Store Retrieval** – Pull latest features, handle TTL.
3. **Training** – Hyper‑parameter search, cross‑validation.
4. **Model Registry** – Store artifacts, metadata, and tags.
5. **Testing** – Unit, integration, and sanity checks.
6. **Deployment** – Push to serving environment.
7. **Monitoring** – Continuous evaluation of drift and performance.
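Stage 1 above can be sketched as a plain schema-and-type check; the column names and types here are illustrative, not from any real feature store:

```python
# Expected schema for an incoming batch (illustrative columns)
EXPECTED_SCHEMA = {
    "transaction_id": str,
    "amount": float,
    "age": int,
}

def validate_rows(rows):
    """Return a list of (row_index, error) tuples; an empty list means the batch passed."""
    errors = []
    for i, row in enumerate(rows):
        for col, col_type in EXPECTED_SCHEMA.items():
            if col not in row:
                errors.append((i, f"missing column: {col}"))
            elif not isinstance(row[col], col_type):
                errors.append((i, f"bad type for {col}: {type(row[col]).__name__}"))
    return errors

good = [{"transaction_id": "t1", "amount": 10.5, "age": 42}]
bad = [{"transaction_id": "t2", "amount": "oops"}]
assert validate_rows(good) == []
assert validate_rows(bad) == [(0, "bad type for amount: str"), (0, "missing column: age")]
```

In practice a library such as Great Expectations or pandera would replace this hand-rolled check, but the gating logic (fail the pipeline before training sees bad data) is the same.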
### 2.1 Example Pipeline with MLflow + Argo
```yaml
# argo_mlflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: credit-mlflow-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: data-validation
            template: data-val
          - name: feature-store
            depends: data-validation
            template: feature-store
          - name: train-model
            depends: feature-store
            template: train
          - name: register-model
            depends: train-model
            template: register
          - name: deploy-model
            depends: register-model
            template: deploy
    - name: data-val
      container:
        image: company/mlflow:1.30
        command: ["bash", "-c"]
        args: ["python validate.py"]
    # ... (other templates follow the same pattern)
```
This DAG runs in a Kubernetes cluster; each step is idempotent and logged.
---
## 3. Version Control & Artifacts
### 3.1 Git for Code & Configuration
All source code, including notebooks, scripts, and pipeline definitions, should be stored in a **Git** repository. Adopt semantic versioning (e.g., `v1.2.0`) and use feature branches for experimentation.
```bash
git checkout -b feature/optimize-learning-rate
# After testing
git commit -m "Increase learning rate to 0.01 for XGBoost"
git push origin feature/optimize-learning-rate
```
### 3.2 Model Registry (MLflow, DVC, S3)
| Registry | Strengths | Typical Workflow |
|----------|-----------|------------------|
| **MLflow** | Experiment tracking, model packaging, deployment | Log metrics with `mlflow.log_metric`; register with `mlflow.register_model` |
| **DVC** | Data versioning, reproducible pipelines | Track large files (`dvc add data/`), push to remote storage |
| **S3 + Glue** | Enterprise data lake, schema enforcement | Store serialized models; catalog with Glue Data Catalog |
Example: Register a model with MLflow
```python
import mlflow
from mlflow.models import infer_signature
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load data & train (train/test split prepared elsewhere)
X_train, X_test, y_train, y_test = ...
model = XGBClassifier(n_estimators=100).fit(X_train, y_train)

# Log experiment
with mlflow.start_run() as run:
    mlflow.log_params(model.get_params())
    mlflow.log_metrics({"accuracy": accuracy_score(y_test, model.predict(X_test))})
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, "model", signature=signature)
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "credit-approval-v1")
```
---
## 4. Monitoring & Observability
A robust production model is a monitored one. Key metrics to track include:
- **Latency** – Average and percentile response times.
- **Throughput** – Requests per second.
- **Accuracy drift** – Comparison of recent predictions vs. historical ground truth.
- **Feature drift** – Distribution shift in input features.
- **Resource utilization** – CPU, GPU, memory usage.
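Feature drift is commonly quantified with the Population Stability Index (PSI) between a reference sample and recent traffic. A minimal sketch follows; the 0.2 alert threshold is a widely used rule of thumb, not a universal constant, and should be tuned per feature:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a recent feature sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 10000)
assert psi(ref, rng.normal(0, 1, 10000)) < 0.05   # same distribution: tiny PSI
assert psi(ref, rng.normal(1.0, 1, 10000)) > 0.2  # shifted mean: flagged as drift
```

Tools like Evidently AI or NannyML (Section 4.2) compute this and richer drift statistics out of the box; the value of the hand-rolled version is mainly pedagogical.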
### 4.1 Prometheus + Grafana
Deploy Prometheus to scrape metrics exposed by the FastAPI app (via the **prometheus_client** library). Grafana dashboards visualize the trends.
```python
# metrics.py
from prometheus_client import Summary
import time

REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

@REQUEST_TIME.time()
def process_request():
    time.sleep(0.2)  # Simulate work
```
In the FastAPI app:
```python
from fastapi import FastAPI
from prometheus_client import start_http_server

app = FastAPI()
# Serve the /metrics endpoint on a separate port from the API itself
start_http_server(8001)
```
Grafana panel: `rate(request_processing_seconds_count[1m])` plots the request rate, using the counter that the `Summary` above exposes.
### 4.2 Model‑specific Alerts
- **Data Drift Alert**: Use **Evidently AI** or **NannyML** to compute drift scores. If drift > threshold, send Slack notification.
- **Accuracy Alert**: Run a validation set every hour; if accuracy < 0.9, trigger rollback.
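The hourly accuracy gate reduces to a plain comparison; the 0.9 threshold comes from the alert rule above, while the window size and labelling pipeline are left as assumptions:

```python
def should_rollback(y_true, y_pred, threshold=0.9):
    """Return True when accuracy on the latest labelled window falls below the SLA threshold."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy < threshold

assert should_rollback([1, 1, 0, 0], [1, 0, 0, 0]) is True   # 75% < 90%
assert should_rollback([1, 1, 0, 0], [1, 1, 0, 0]) is False  # 100%
```

In a real deployment this check would feed the rollback automation (e.g. re-pointing traffic to the previous registry version) rather than acting directly.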
Example Slack webhook snippet:
```python
import requests, json

def notify_slack(message):
    webhook_url = "https://hooks.slack.com/services/..."
    payload = {"text": message}
    requests.post(webhook_url, data=json.dumps(payload), headers={"Content-Type": "application/json"})
```
---
## 5. Governance & Compliance in Production
- **Model Registry Tags**: Include `staging`, `production`, `deprecated`.
- **Audit Logs**: Capture every version change, deployment, and rollback.
- **GDPR & CCPA**: Ensure data residency, encryption, and data‑subject rights are upheld.
- **Explainability**: Store SHAP or LIME explanations alongside each prediction batch for audit.
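The audit-log point can be made concrete with a per-prediction JSON-lines record. All field names below are illustrative, and hashing the raw features is one (assumed) way to keep the log auditable without persisting personal data:

```python
import json, hashlib, datetime

def audit_record(model_name, model_version, features, prediction, explanation=None):
    """Build one JSON-lines audit entry per prediction (field names are illustrative)."""
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model_name,
        "version": model_version,
        # Hash raw inputs so the log stays auditable without storing personal data
        "features_sha256": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode()
        ).hexdigest(),
        "prediction": prediction,
        "explanation": explanation,
    })

rec = json.loads(audit_record("credit-approval", "v1", {"age": 42}, True))
assert rec["model"] == "credit-approval"
assert len(rec["features_sha256"]) == 64
```

Whether to store hashed or raw features (and where the SHAP/LIME explanations live) is a policy decision that your data-protection officer should sign off on.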
### 5.1 CI/CD Pipeline for Model Updates
```yaml
# .github/workflows/model-ci.yml
name: ML Pipeline
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run unit tests
        run: pytest tests/
      - name: Run training pipeline
        run: python scripts/train.py
      - name: Deploy if success
        if: success()
        run: bash scripts/deploy.sh
```
---
## 6. Summary
| Topic | Key Takeaway |
|-------|--------------|
| Model Serving | Containerize inference, expose as REST, scale with K8s |
| MLOps Pipeline | Automate data, feature, training, and deployment steps |
| Version Control | Keep code, data, and models in a single, auditable repository |
| Monitoring | Detect drift, latency, and SLA violations in real time |
| Governance | Embed audit, explainability, and compliance checkpoints |
The journey from a prototype notebook to a production‑ready model is iterative. By integrating **model registry**, **continuous monitoring**, and **robust pipelines**, analysts can deliver reliable insights that scale and comply with regulatory standards.
---
## Quick Reference Checklist
| ✅ | Item |
|---|------|
| ✅ | Models are registered in MLflow with clear tags |
| ✅ | Inference API is containerized and deployed to K8s |
| ✅ | CI/CD pipeline runs unit tests, training, and deployment |
| ✅ | Prometheus scrapes latency & request metrics |
| ✅ | Data and model drift alerts are configured to Slack |
| ✅ | All artifacts are versioned with Git and DVC |
| ✅ | GDPR compliance logs are enabled for all predictions |
---
## Further Reading
- **MLflow Documentation** – https://mlflow.org/docs/latest/
- **Argo Workflows** – https://argoproj.github.io/docs/latest/
- **Prometheus** – https://prometheus.io/docs/introduction/overview/
- **Grafana** – https://grafana.com/docs/grafana/latest/
- **Evidently AI** – https://evidentlyai.com/
- **NannyML** – https://nannyml.com/
---
*Prepared by the Data Science Operations Team, February 2024.*