
Data Science for Business Insight: A Practical Guide for Decision‑Makers – Chapter 7


Published 2026-02-27 13:55

# Chapter 7: Deployment & Scaling

Deploying a data‑science model is the bridge between analytical insight and operational impact. In this chapter we cover the end‑to‑end journey from a trained model in a Jupyter notebook to a production‑grade microservice that scales with traffic, all while maintaining traceability, observability, and governance.

## 7.1 The Deployment Lifecycle

| Phase | Goal | Typical Activities |
|-------|------|--------------------|
| **Model Packaging** | Bundle code, dependencies, and artifacts | Use `mlflow` or Docker images; create a `requirements.txt` |
| **Continuous Integration (CI)** | Automated validation of model changes | Unit tests, inference tests, data‑quality checks |
| **Continuous Delivery (CD)** | Seamless promotion to environments | Automated build pipelines (GitHub Actions, Jenkins) |
| **Model Serving** | Expose a predictable API | REST, gRPC, or serverless endpoints |
| **Observability & Monitoring** | Detect drift and performance regressions | Metric collection, alerting, model explainability |
| **Scaling** | Handle variable load and large datasets | Horizontal scaling, caching, data partitioning |

> **Lesson** – Treat the model as a first‑class citizen of your IT stack. Just as you would deploy a web application, you should follow disciplined CI/CD practices for data‑science artifacts.

## 7.2 Packaging Models for Production

### 7.2.1 Containerization with Docker

```dockerfile
# Base image
FROM python:3.10-slim

# Install system dependencies, then remove the apt cache to keep the image small
RUN apt-get update && apt-get install -y \
    libpq-dev gcc \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY . .

# Expose port for API
EXPOSE 8000

# Start the server
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

*Key points:*

* Keep the image small: use `slim` variants and delete package caches.
* Pin dependencies to exact versions (or to a specific commit for VCS installs).
* Separate runtime configuration (e.g., a `runtime.txt`) from build dependencies (`requirements.txt`).

### 7.2.2 Artifact Repositories

| Repository | Use‑case |
|------------|----------|
| **MLflow Artifacts** | Versioned models, parameters, and metrics |
| **S3 / GCS** | Raw data, pre‑processed datasets |
| **Docker Registry** | Container images |

Versioning guarantees that a model deployed in production is reproducible.

## 7.3 Model Serving Options

| Option | Pros | Cons |
|--------|------|------|
| **FastAPI + Uvicorn** | Lightweight, async support | Requires custom code |
| **TensorFlow Serving** | Optimized for TensorFlow models | Tied to the TensorFlow ecosystem |
| **ONNX Runtime** | Framework agnostic | Limited GPU support |
| **AWS SageMaker Endpoint** | Managed, auto‑scaling | Vendor lock‑in |
| **Azure ML Inference Cluster** | Enterprise‑grade | Subscription cost |
| **Serverless (AWS Lambda, Azure Functions)** | Pay‑per‑use | Cold‑start latency |

The choice depends on model size, inference‑latency requirements, and organizational cloud strategy.

## 7.4 Continuous Integration for Models

A robust CI pipeline for data science should cover:

1. **Linting & Static Analysis** – `pylint`, `flake8`.
2. **Unit & Integration Tests** – Verify data pipelines and inference.
3. **Model Validation** – Compare predictions against a baseline.
4. **Data Quality Checks** – Using `great_expectations` or `deequ`.
5. **Metric Regression** – Ensure AUC / RMSE hasn't regressed beyond a threshold.
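The model‑validation and metric‑regression steps above can be sketched as a small standalone check. A minimal sketch, assuming the metric is RMSE; the `validate` helper, the tolerance, and all numeric values below are illustrative, not a prescribed implementation:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def validate(candidate_rmse, baseline_rmse, max_regression=0.05):
    """Pass only if the candidate is no worse than the baseline
    by more than max_regression (absolute), so CI fails on metric regressions."""
    return candidate_rmse <= baseline_rmse + max_regression

# Hold-out labels and candidate-model predictions (illustrative values)
y_true = [3.0, 2.5, 4.0, 5.5]
y_pred = [2.8, 2.7, 4.1, 5.0]

candidate = rmse(y_true, y_pred)
baseline = 0.40  # metric recorded for the currently deployed model
print("candidate RMSE:", round(candidate, 3))
print("validation passed:", validate(candidate, baseline))
```

In a real pipeline the baseline metric would come from a model registry (e.g., MLflow) rather than a hard‑coded constant, and a failed check would exit non‑zero to stop the build.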
### Example: GitHub Actions Workflow

```yaml
name: Model CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Run model validation
        run: python scripts/validate_model.py
```

## 7.5 Continuous Delivery & Deployment

### 7.5.1 Blue‑Green vs Canary Releases

* **Blue‑Green** – Deploy to a new environment, then switch all traffic once it is verified.
* **Canary** – Route a small percentage of traffic to the new version and expand gradually.

Both strategies minimize risk and allow fast rollback.

### 7.5.2 Infrastructure as Code (IaC)

Tools: Terraform, Pulumi, CloudFormation. Example Terraform snippet for an EC2 instance running a FastAPI app:

```hcl
resource "aws_instance" "model_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"
  user_data     = file("setup.sh")

  tags = {
    Name = "ModelServer"
  }
}
```

## 7.6 Observability & Monitoring

| Metric | Source | Alert Condition |
|--------|--------|-----------------|
| **Request latency** | Prometheus + Grafana | 95th percentile > 500 ms |
| **CPU / Memory** | Docker stats | > 80% usage |
| **Prediction error drift** | MLflow Tracking | Mean absolute error > 0.05 |
| **Data drift** | Evidently | KS statistic > 0.1 |

### 7.6.1 Example: Prometheus Exporter

```python
from prometheus_client import start_http_server, Summary
import time

# Summary metric tracking time spent per request
REQUEST_TIME = Summary('request_processing_seconds',
                       'Time spent processing request')

@REQUEST_TIME.time()
def process_request():
    time.sleep(0.5)

if __name__ == '__main__':
    start_http_server(8001)
    while True:
        process_request()
```

Prometheus scrapes the `/metrics` endpoint, and Grafana visualises the data.

## 7.7 Scaling Strategies

### 7.7.1 Horizontal Scaling

Deploy multiple instances behind a load balancer (AWS ALB, Nginx) and use auto‑scaling groups to adjust capacity based on CPU or request‑queue metrics.
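When the API runs on Kubernetes rather than directly on EC2, the same CPU‑based scaling policy can be expressed declaratively. A minimal sketch, assuming a Deployment named `model-server` already exists; the name, replica bounds, and utilization target here are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The autoscaler adds replicas when average CPU utilization across pods exceeds 70% and removes them as load subsides, mirroring what an AWS auto‑scaling group does for EC2 instances.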
### 7.7.2 Model‑Level Caching

Cache frequent predictions (e.g., using Redis). For example, if a pricing model is queried for the same SKU‑region pair often, store the result instead of recomputing it.

### 7.7.3 Batch vs Real‑Time

* **Batch** – Process large data chunks overnight (Spark, Airflow). Ideal for recommendation engines.
* **Real‑Time** – Low‑latency inference (FastAPI, TensorFlow Serving). Use for fraud detection.

## 7.8 Governance & Compliance in Deployment

| Requirement | How to Implement |
|-------------|------------------|
| **Data Provenance** | MLflow lineage, data catalog |
| **Model Explainability** | SHAP values exposed via API |
| **Audit Logs** | Store request/response in secure logs |
| **Access Control** | IAM roles, API keys, OAuth |
| **Versioning** | Git tags, Docker tags |

### 7.8.1 Example: Logging Requests with GDPR Compliance

```python
from fastapi import FastAPI, Request
import logging

app = FastAPI()
logging.basicConfig(filename='requests.log', level=logging.INFO)

@app.post("/predict")
async def predict(request: Request, payload: dict):
    # Under GDPR, client IPs are personal data: pseudonymize or minimize
    # what is logged, and apply a retention policy to the log files.
    client_ip = request.client.host
    logging.info(f"{client_ip} - {payload}")
    # ... prediction logic
    return {"prediction": 42}
```

Ensure logs are stored in an encrypted S3 bucket with access limited to compliance staff.

## 7.9 Key Metrics to Track in Production

| Metric | Definition | Why It Matters |
|--------|------------|----------------|
| **Latency (99th percentile)** | Time from request to response | Affects user experience |
| **Throughput** | Requests per second | Capacity planning |
| **Accuracy drift** | Change in model performance over time | Indicates need for retraining |
| **Data drift** | Shift in feature distribution | Signals changing business context |
| **Error rate** | Percentage of failed requests | Operational reliability |
| **Cost per inference** | Cloud compute + storage costs | Budgeting |

## 7.10 Lessons Learned

1. **Treat models as software** – Apply the same rigor (versioning, testing, CI/CD) to models as you do to code.
2. **Observability is non‑optional** – Without metrics you cannot diagnose drift or performance issues.
3. **Automation saves time and reduces bias** – Automated retraining pipelines help maintain fairness and accuracy.
4. **Security is integral** – Protect models, data, and APIs with encryption, IAM, and monitoring.
5. **Plan for the unexpected** – Build failover strategies and graceful degradation for critical services.

Deploying and scaling models is an ongoing, iterative effort. By embedding best practices into your organization’s engineering culture, decision‑makers can reap sustained value from predictive analytics.