Analytics Alchemy: Turning Data into Strategic Advantage - Chapter 6
Published 2026-03-02 15:57
# Chapter 6: Advanced Machine Learning & Deep Learning
In the previous chapters we established the statistical bedrock and built reliable pipelines that bring data into a usable form. Here we turn to the most powerful tools at our disposal: **advanced supervised learning models, ensembles, and deep neural networks**. The chapter is organized into six thematic blocks, followed by a case study and take-home messages; each block pairs practical Python snippets with key points that translate directly into production-ready workflows.
---
## 1. Decision Trees – The Building Blocks
Decision trees partition the feature space into axis‑aligned rectangles. They are *interpretable*, *non‑parametric*, and serve as the foundation for many ensemble methods.
| Algorithm | Splitting Criterion | Pros | Cons |
|-----------|---------------------|------|------|
| CART | Gini impurity or MSE | Fast, handles mixed data | Prone to over‑fitting, unstable |
| ID3 | Information gain | Simple, interpretable | Bias toward features with many levels |
| C4.5 | Gain ratio | Handles missing values | More computational overhead |
### 1.1 Core Concepts
- **Node**: Decision point; **leaf**: Prediction.
- **Depth**: Number of edges from root to deepest leaf.
- **Pruning**: Reducing over‑fitting by cutting back branches that offer little improvement.
### 1.2 Quick Implementation
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cap depth to keep the tree interpretable and reduce over-fitting
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
```
> **Tip**: Use `max_depth`, `min_samples_split`, and `min_samples_leaf` to control tree complexity.
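Beyond those pre-pruning knobs, scikit-learn's CART implementation also supports post-hoc cost-complexity pruning via `ccp_alpha`. A minimal, self-contained sketch on the same iris data (the alpha value here is illustrative, not tuned):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Larger ccp_alpha prunes more aggressively (more sub-trees collapsed)
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

# Pruning trades a little training fit for a simpler, more stable tree
print("leaves (full):  ", full.get_n_leaves())
print("leaves (pruned):", pruned.get_n_leaves())
```

`cost_complexity_pruning_path` can be used to enumerate candidate alphas and pick one by cross-validation.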
---
## 2. Ensembles – Bagging, Boosting, and Stacking
Ensemble techniques combine multiple learners to improve generalization. They mitigate the weaknesses of single models while preserving interpretability where possible.
### 2.1 Bagging – Random Forest
*Bootstrap Aggregating* reduces variance by training each tree on a bootstrap sample of the data; random forests additionally sample a random subset of features at each split, which decorrelates the trees.
```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=42)
rf.fit(X_train, y_train)
print("RF Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```
### 2.2 Boosting – Gradient Boosting & XGBoost
*Gradient Boosting* builds trees sequentially, each correcting residuals of the previous.
```python
from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                    subsample=0.8, colsample_bytree=0.8, random_state=42)
xgb.fit(X_train, y_train)
print("XGB Accuracy:", accuracy_score(y_test, xgb.predict(X_test)))
```
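The residual-correction idea is easy to see in a from-scratch sketch: start from a constant prediction, then repeatedly fit a shallow tree to the current residuals and add a shrunken copy of it. The learning rate and number of rounds below are illustrative, not tuned:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Start from the mean; each stump fits the residual of the current ensemble
pred = np.full_like(y, y.mean())
lr = 0.1
for _ in range(100):
    stump = DecisionTreeRegressor(max_depth=2, random_state=0)
    stump.fit(X, y - pred)          # fit the current residuals
    pred += lr * stump.predict(X)   # shrink and add the correction

mse_start = np.mean((y - y.mean()) ** 2)
mse_final = np.mean((y - pred) ** 2)
print(f"MSE: {mse_start:.3f} -> {mse_final:.3f}")
```

Libraries like XGBoost add second-order gradient information, regularization, and clever split-finding on top of exactly this loop.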
### 2.3 Stacking
Stacking combines the predictions of diverse base models via a meta-learner trained on their cross-validated outputs.
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('xgb', XGBClassifier(eval_metric='mlogloss', random_state=42))
]
# cv=5: base-model predictions for the meta-learner come from out-of-fold data
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(max_iter=1000), cv=5)
stack.fit(X_train, y_train)
print("Stack Accuracy:", accuracy_score(y_test, stack.predict(X_test)))
```
---
## 3. Neural Networks – From Feedforward to Transformers
Deep neural networks (DNNs) approximate complex functions using layers of non‑linear units. The most common types are:
| Architecture | Typical Use‑Case |
|--------------|------------------|
| Feedforward (MLP) | Tabular regression/classification |
| Convolutional Neural Network (CNN) | Image, video |
| Recurrent Neural Network (RNN) / LSTM / GRU | Time series, NLP |
| Transformer | NLP, vision, multi‑modal |
### 3.1 Basic MLP Example
```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # three iris classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.1)
loss, acc = model.evaluate(X_test, y_test)
print('Test accuracy:', acc)
```
### 3.2 Regularization Techniques
- **Dropout**: Randomly disables neurons during training.
- **Batch Normalization**: Stabilizes learning by normalizing activations.
- **Weight Decay (L2)**: Penalizes large weights.
```python
layers.Dense(64, activation='relu',
             kernel_regularizer=tf.keras.regularizers.l2(0.001))
```
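Dropout itself is just masking plus rescaling. A minimal NumPy sketch of the standard *inverted* variant (rescaling at training time so inference needs no change) shows the mechanism behind the Keras layer:

```python
import numpy as np

def dropout(a, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep = 1.0 - rate
    mask = rng.random(a.shape) < keep
    return a * mask / keep  # rescaling keeps E[output] equal to the input

rng = np.random.default_rng(0)
acts = np.ones((4, 1000))
out = dropout(acts, rate=0.5, rng=rng)

# Roughly half the units are zeroed; the mean is preserved in expectation
print("zeroed fraction:", (out == 0).mean().round(2))
print("mean activation:", out.mean().round(2))
```

In Keras this is simply `layers.Dropout(0.5)`, which is active only during training.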
---
## 4. Transfer Learning – Reusing Knowledge
Training deep networks from scratch requires large labeled corpora. Transfer learning leverages pretrained models and adapts them to new tasks.
### 4.1 Image Classification Transfer
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze base layers for feature extraction

x = GlobalAveragePooling2D()(base.output)
preds = Dense(3, activation='softmax')(x)
model = Model(inputs=base.input, outputs=preds)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(...)  # train the new head on your data
```
### 4.2 NLP Transfer – BERT
```python
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Prepare dataset, train, evaluate
```
> **Rule of Thumb**: If the target domain is similar to the pretraining data, *feature extraction* (freeze all layers) often works; otherwise, *fine‑tuning* (unfreeze top layers) yields better performance.
---
## 5. Hyper‑Parameter Tuning – Beyond Grid Search
Optimal model performance hinges on the right hyper‑parameters. Modern strategies balance exploration with computational cost.
| Method | Search Space | Pros | Cons |
|--------|--------------|------|------|
| Grid Search | Exhaustive | Simple | Exponential cost |
| Random Search | Uniform | Finds good regions faster | May miss peaks |
| Bayesian Optimization (Optuna, Hyperopt) | Probabilistic | Efficient | Implementation overhead |
| Evolutionary / AutoML (TPOT, Ray Tune PBT) | Evolutionary | Automated pipeline search | Resource intensive |
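Random search is often the best-value starting point in the table above. A short sketch with scikit-learn's `RandomizedSearchCV` on the same hyper-parameters used later in this section (the budget of 20 trials is illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Sample 20 configurations instead of enumerating the full grid
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_distributions={
        "max_depth": randint(3, 16),
        "min_samples_split": randint(2, 11),
    },
    n_iter=20, cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

Unlike grid search, the cost here is fixed by `n_iter`, not by the size of the search space.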
### 5.1 Optuna Example
```python
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    max_depth = trial.suggest_int('max_depth', 3, 15)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    clf = RandomForestClassifier(max_depth=max_depth,
                                 min_samples_split=min_samples_split,
                                 n_estimators=100,
                                 random_state=42)
    score = cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print('Best params:', study.best_params)
```
---
## 6. Reproducibility – The Engineering of Trust
A model that works locally but fails in production erodes stakeholder confidence. Reproducibility spans from data to code to environment.
| Layer | Practice |
|-------|----------|
| Randomness | Set seeds (NumPy, TensorFlow, PyTorch), use deterministic ops |
| Environment | Use `conda` or `poetry` to lock package versions, maintain a `requirements.txt` or `pyproject.toml` |
| Data | Version data with DVC or LakeFS; keep a manifest of the exact split |
| Experiments | Log hyper‑parameters, metrics, and artifacts with MLflow or Weights & Biases |
| Models | Store artifacts in a model registry (MLflow Model Registry, SageMaker Model Registry) with metadata |
| Documentation | Auto‑generate notebooks or docs that capture the full pipeline |
```python
# Example: seeding the common sources of randomness
import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)
```
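Seeding covers randomness, but the data layer also needs a fingerprint so drift in the inputs is detectable. A stdlib-only sketch (the dataset name and row counts below are hypothetical) that hashes a split manifest deterministically:

```python
import hashlib
import json

def manifest_fingerprint(manifest: dict) -> str:
    """Stable SHA-256 over a JSON manifest (sorted keys make it order-independent)."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical manifest describing the exact train/test split
manifest = {
    "dataset": "churn_2025Q4.parquet",
    "train_rows": 800_000,
    "test_rows": 200_000,
    "split_seed": 42,
}
fp = manifest_fingerprint(manifest)
print("fingerprint:", fp[:16], "...")
```

Tools like DVC do this at scale, but even a fingerprint logged next to each experiment run answers "was this trained on the same data?" unambiguously.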
---
## 7. Case Study – Customer Churn Prediction
**Business Problem**: Predict which subscribers are likely to cancel their subscription within the next month.
| Step | Action |
|------|--------|
| Data | 1M records, 30 features, imbalanced (5% churn). |
| Preprocess | Impute missing values, one‑hot encode categories, scale numeric. |
| Baseline | Logistic regression → 0.78 AUC. |
| Ensemble | Random Forest + XGBoost (Stacking) → 0.86 AUC. |
| Deep | MLP with dropout → 0.84 AUC. |
| Hyper‑Tuning | Optuna on ensemble + MLP → 0.88 AUC. |
| Deployment | Model packaged as ONNX; API served via FastAPI; drift monitoring via Evidently. |
| Outcome | 12% reduction in churn over 6 months. |
> **Lesson Learned**: Ensemble models outperformed deep nets on tabular data, but transfer learning and deeper nets could be beneficial if we had richer sequential or textual signals.
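With only 5% churners, every model in the table needed imbalance handling. One common option, sketched here on synthetic data rather than the real churn set, is `class_weight='balanced'` in the logistic baseline:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: roughly 5% positives, like the churn data
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Reweighting trades some precision for much better minority-class recall
print("recall (plain):   ", recall_score(y_te, plain.predict(X_te)).round(2))
print("recall (balanced):", recall_score(y_te, weighted.predict(X_te)).round(2))
```

For the tree ensembles the analogous knob is `class_weight` in scikit-learn or `scale_pos_weight` in XGBoost.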
---
## 8. Take‑Home Messages
1. **Decision trees are the lingua franca** of explainable ML; they scale to large datasets when combined into ensembles.
2. **Random Forest and Gradient Boosting** are often first‑line solutions for tabular data; tune with cross‑validation and Bayesian search.
3. **Neural networks shine** on high‑dimensional, structured signals (images, text, time series); regularization is key to avoid over‑fitting.
4. **Transfer learning** can dramatically reduce training time and data requirements; choose fine‑tuning vs feature extraction based on domain similarity.
5. **Hyper‑parameter optimization** should be automated and reproducible; tools like Optuna, Ray Tune, and Hyperopt make this tractable.
6. **Reproducibility is non‑negotiable** in production; treat data, code, experiments, and models as versioned artifacts.
7. **Deploy with a view to monitoring**: data drift, concept drift, and latency must be logged and acted upon.
---
> *In the next chapter we will dive into the ethical dimensions of data science, exploring how to weave fairness, accountability, and transparency into every stage of the analytics lifecycle.*