Data Science Unveiled: From Raw Data to Insightful Decisions - Chapter 9
Chapter 9 – From Prototype to Production: Deploying Data‑Science Models at Scale
Published 2026-03-06 22:26
# 9.1 From Notebook to Service
In the last chapter we learned to orchestrate experiments as *first‑class artifacts*—data, code, hyper‑parameters, logs—all versioned in a Git‑style repository. The next logical step is to expose the *best* of those experiments to the world: a model that runs, scales, and serves predictions on demand.
## 9.1.1 Why Deployment Matters
> **Deploying a model is not a one‑off task; it is a continuous journey.** A model that once delivered 95 % accuracy in a sandbox may degrade to 80 % once the production data distribution shifts. Thus, deployment is a gateway to **monitoring**, **re‑training**, and **governance**.
### 1. Docker: The Packaging Unit
Docker provides a declarative way to capture all the runtime dependencies of your model.
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=8080
EXPOSE 8080
# FastAPI is an ASGI app, so gunicorn needs the uvicorn worker class
ENTRYPOINT ["gunicorn", "app:app", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8080"]
```
- **Benefits**: reproducibility, isolation, ease of CI/CD.
- **Pitfalls**: image bloat; watch the size of the base image.
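One common remedy for image bloat is a multi-stage build: dependencies are installed in a throwaway build stage, and only the resulting packages are copied into the final image. The sketch below assumes the same layout as the Dockerfile above:

```dockerfile
# Build stage: install dependencies into an isolated prefix
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only the installed packages and the app code
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
ENV PORT=8080
EXPOSE 8080
ENTRYPOINT ["gunicorn", "app:app", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8080"]
```

Because build tools and pip caches never reach the runtime stage, the final image typically shrinks noticeably.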
### 2. Building a Predictive Service
```python
# app.py
from fastapi import FastAPI, Request
import joblib
import numpy as np
app = FastAPI()
model = joblib.load("model.pkl")
@app.post("/predict")
async def predict(request: Request):
    payload = await request.json()
    # Reshape the flat feature list into a single-row matrix
    X = np.array(payload["features"]).reshape(1, -1)
    prob = model.predict_proba(X)[0, 1]
    # Cast the NumPy scalar to a native float so it serializes to JSON
    return {"probability": float(prob)}
```
FastAPI gives you async request handling, auto‑generated OpenAPI docs, and a minimal learning curve.
## 9.1.2 Scaling with Kubernetes
Running containers locally is fine for experimentation, but production demands **horizontal scaling**, **self‑healing**, and **observability**.
### 1. Helm Charts
Using Helm lets you package all Kubernetes manifests into reusable charts.
```yaml
# helm/myapp/values.yaml
replicaCount: 3
image:
  repository: myrepo/myapp
  tag: "1.0.0"   # values files are not rendered as templates; pin a concrete tag
service:
  type: ClusterIP
  port: 80
resources:
  limits:
    cpu: "1"
    memory: "512Mi"
```
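These values are consumed by the chart's templates, where Go templating *is* rendered. A minimal deployment template sketch (file path, labels, and container port are illustrative):

```yaml
# helm/myapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-myapp
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8080
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```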
### 2. Autoscaling
Leverage **HorizontalPodAutoscaler** (HPA) to scale pods based on CPU or custom metrics (e.g., request latency). Example:
```yaml
apiVersion: autoscaling/v2   # v2beta2 is deprecated; v2 is stable since Kubernetes 1.23
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
## 9.1.3 Observability & Monitoring
- **Prometheus + Grafana**: scrape metrics from a `/metrics` endpoint exposed by the FastAPI service.
- **ELK Stack**: log routing and anomaly detection.
- **OpenTelemetry**: distributed tracing across microservices.
```yaml
# OpenTelemetry Collector sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-opentelemetry
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          args: ["--config=/conf/otel-collector-config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-collector-config   # ConfigMap holding the collector config
```
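On the application side, the Prometheus bullet above assumes the service exports metrics. A minimal sketch using the `prometheus_client` library (the metric names and the idea of a dedicated `/metrics` endpoint are illustrative choices, not a fixed convention):

```python
# Expose request metrics in the Prometheus text format.
# Metric names here are illustrative, not a required convention.
from prometheus_client import Counter, Histogram, generate_latest

PREDICTIONS = Counter("predictions_total", "Total prediction requests served")
LATENCY = Histogram("prediction_latency_seconds", "Prediction request latency")

def record_prediction(duration_seconds: float) -> None:
    """Update the metrics after each /predict call."""
    PREDICTIONS.inc()
    LATENCY.observe(duration_seconds)

def metrics_payload() -> bytes:
    """Body for a /metrics endpoint, in the Prometheus exposition format."""
    return generate_latest()

record_prediction(0.042)
```

Wiring `metrics_payload()` to a `GET /metrics` route gives Prometheus a scrape target, and the histogram feeds latency-based autoscaling later.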
## 9.1.4 Model Governance in Production
Once your model is live, governance becomes an operational necessity:
1. **Versioning**: Tag each model with a semantic version and store its artifacts in a model registry (e.g., MLflow).
2. **A/B Testing**: Serve traffic to a new model version to a subset of users and compare metrics.
3. **Feature Drift Monitoring**: Compare the distribution of incoming features against the training data.
4. **Explainability Dashboards**: Deploy SHAP or LIME visualizations for end‑users to interrogate predictions.
## 9.1.5 Continuous Delivery Pipeline
1. **Git Push** → **CI Build** (lint, unit tests, integration tests).
2. **Docker Build** → **Push to Registry**.
3. **Helm Upgrade** → **Deploy to K8s**.
4. **Post‑Deployment Smoke Test** → **Monitor**.
5. **Model Retraining Trigger** when drift exceeds threshold.
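The steps above map naturally onto a CI workflow. A hedged GitHub Actions sketch; the job names, image repository, and chart path are assumptions, not a prescribed layout:

```yaml
# .github/workflows/deploy.yml (illustrative)
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint and test
        run: |
          pip install -r requirements.txt
          pytest
      - name: Build and push image
        run: |
          docker build -t myrepo/myapp:${{ github.sha }} .
          docker push myrepo/myapp:${{ github.sha }}
      - name: Deploy with Helm
        run: |
          helm upgrade --install myapp ./helm/myapp \
            --set image.tag=${{ github.sha }}
```

Tagging images with the commit SHA keeps every deployment traceable back to the exact code and model version that produced it.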
## 9.1.6 Lessons Learned
| Observation | Takeaway |
|--------------|----------|
| Cold starts on GKE averaged 4 seconds. | Consider using a *Knative* eventing model or pre‑warm pods. |
| GPU utilization for inference dropped to 20 % in production. | Tune the batch size and consider ONNX Runtime. |
| Drift detection triggered 12 re‑trains in 3 months. | Invest in automated data pipelines to keep the training set fresh. |
# 9.2 The Big Picture
Deploying a model is a *continuous loop*—not a checkpoint. The code you write in the notebook is only the first iteration. Production is an ecosystem where data, code, monitoring, and governance interact fluidly. In the next chapter we’ll formalize this ecosystem with **Kubeflow Pipelines** and **MLOps best practices**, turning that loop into a scalable, maintainable pipeline.