Analytics Alchemy: Turning Data into Strategic Advantage - Chapter 6
Published 2026-03-02 15:57
# Chapter 6: Advanced Machine Learning & Deep Learning
In the previous chapters we established the statistical bedrock and built reliable pipelines that bring data into a usable form. Here we turn to the most powerful tools at our disposal: **advanced supervised learning models, ensembles, and deep neural networks**. The chapter is organized into six thematic blocks, followed by a case study and take-home messages; each block pairs practical Python snippets with key points that translate directly into production-ready workflows.
---
## 1. Decision Trees – The Building Blocks
Decision trees partition the feature space into axis‑aligned rectangles. They are *interpretable*, *non‑parametric*, and serve as the foundation for many ensemble methods.
| Algorithm | Splitting Criterion | Pros | Cons |
|-----------|---------------------|------|------|
| CART | Gini impurity or MSE | Fast, handles mixed data | Prone to over‑fitting, unstable |
| ID3 | Information gain | Simple, interpretable | Bias toward features with many levels |
| C4.5 | Gain ratio | Handles missing values | More computational overhead |
### 1.1 Core Concepts
- **Node**: Decision point; **leaf**: Prediction.
- **Depth**: Number of edges from root to deepest leaf.
- **Pruning**: Reducing over‑fitting by cutting back branches that offer little improvement.
### 1.2 Quick Implementation
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cap depth to keep the tree interpretable and reduce over-fitting
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
```
> **Tip**: Use `max_depth`, `min_samples_split`, and `min_samples_leaf` to control tree complexity.
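Beyond those pre-pruning knobs, scikit-learn's CART implementation also supports post-hoc cost-complexity pruning via `ccp_alpha`. A minimal, self-contained sketch on the same iris data (the alpha value here is illustrative, not tuned):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Larger ccp_alpha prunes more aggressively (more sub-trees collapsed)
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

# Pruning trades a little training fit for a simpler, more stable tree
print("leaves (full):  ", full.get_n_leaves())
print("leaves (pruned):", pruned.get_n_leaves())
```

`cost_complexity_pruning_path` can be used to enumerate candidate alphas and pick one by cross-validation.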
---
## 2. Ensembles – Bagging, Boosting, and Stacking
Ensemble techniques combine multiple learners to improve generalization. They mitigate the weaknesses of single models while preserving interpretability where possible.
### 2.1 Bagging – Random Forest
*Bootstrap Aggregating* reduces variance by training each tree on a bootstrap sample of the data; random forests additionally sample a random subset of features at each split, which decorrelates the trees.
```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=42)
rf.fit(X_train, y_train)
print("RF Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```
### 2.2 Boosting – Gradient Boosting & XGBoost
*Gradient Boosting* builds trees sequentially, each correcting residuals of the previous.
```python
from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                    subsample=0.8, colsample_bytree=0.8, random_state=42)
xgb.fit(X_train, y_train)
print("XGB Accuracy:", accuracy_score(y_test, xgb.predict(X_test)))
```
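The residual-correction idea is easy to see in a from-scratch sketch: start from a constant prediction, then repeatedly fit a shallow tree to the current residuals and add a shrunken copy of it. The learning rate and number of rounds below are illustrative, not tuned:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Start from the mean; each stump fits the residual of the current ensemble
pred = np.full_like(y, y.mean())
lr = 0.1
for _ in range(100):
    stump = DecisionTreeRegressor(max_depth=2, random_state=0)
    stump.fit(X, y - pred)          # fit the current residuals
    pred += lr * stump.predict(X)   # shrink and add the correction

mse_start = np.mean((y - y.mean()) ** 2)
mse_final = np.mean((y - pred) ** 2)
print(f"MSE: {mse_start:.3f} -> {mse_final:.3f}")
```

Libraries like XGBoost add second-order gradient information, regularization, and clever split-finding on top of exactly this loop.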
### 2.3 Stacking
Stacking combines the predictions of diverse base models via a meta-learner trained on their cross-validated outputs.
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('xgb', XGBClassifier(eval_metric='mlogloss', random_state=42))
]
# cv=5: base-model predictions for the meta-learner come from out-of-fold data
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(max_iter=1000), cv=5)
stack.fit(X_train, y_train)
print("Stack Accuracy:", accuracy_score(y_test, stack.predict(X_test)))
```
---
## 3. Neural Networks – From Feedforward to Transformers
Deep neural networks (DNNs) approximate complex functions using layers of non‑linear units. The most common types are:
| Architecture | Typical Use‑Case |
|--------------|------------------|
| Feedforward (MLP) | Tabular regression/classification |
| Convolutional Neural Network (CNN) | Image, video |
| Recurrent Neural Network (RNN) / LSTM / GRU | Time series, NLP |
| Transformer | NLP, vision, multi‑modal |
### 3.1 Basic MLP Example
```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # three iris classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.1)
loss, acc = model.evaluate(X_test, y_test)
print('Test accuracy:', acc)
```
### 3.2 Regularization Techniques
- **Dropout**: Randomly disables neurons during training.
- **Batch Normalization**: Stabilizes learning by normalizing activations.
- **Weight Decay (L2)**: Penalizes large weights.
```python
layers.Dense(64, activation='relu',
             kernel_regularizer=tf.keras.regularizers.l2(0.001))
```
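Dropout itself is just masking plus rescaling. A minimal NumPy sketch of the standard *inverted* variant (rescaling at training time so inference needs no change) shows the mechanism behind the Keras layer:

```python
import numpy as np

def dropout(a, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep = 1.0 - rate
    mask = rng.random(a.shape) < keep
    return a * mask / keep  # rescaling keeps E[output] equal to the input

rng = np.random.default_rng(0)
acts = np.ones((4, 1000))
out = dropout(acts, rate=0.5, rng=rng)

# Roughly half the units are zeroed; the mean is preserved in expectation
print("zeroed fraction:", (out == 0).mean().round(2))
print("mean activation:", out.mean().round(2))
```

In Keras this is simply `layers.Dropout(0.5)`, which is active only during training.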
---
## 4. Transfer Learning – Reusing Knowledge
Training deep networks from scratch requires large labeled corpora. Transfer learning leverages pretrained models and adapts them to new tasks.
### 4.1 Image Classification Transfer
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze base layers for feature extraction

x = GlobalAveragePooling2D()(base.output)
preds = Dense(3, activation='softmax')(x)
model = Model(inputs=base.input, outputs=preds)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(...)  # train the new head on your data
```
### 4.2 NLP Transfer – BERT
```python
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Prepare dataset, train, evaluate
```
> **Rule of Thumb**: If the target domain is similar to the pretraining data, *feature extraction* (freeze all layers) often works; otherwise, *fine‑tuning* (unfreeze top layers) yields better performance.
---
## 5. Hyper‑Parameter Tuning – Beyond Grid Search
Optimal model performance hinges on the right hyper‑parameters. Modern strategies balance exploration with computational cost.
| Method | Search Space | Pros | Cons |
|--------|--------------|------|------|
| Grid Search | Exhaustive | Simple | Exponential cost |
| Random Search | Uniform | Finds good regions faster | May miss peaks |
| Bayesian Optimization (Optuna, Hyperopt) | Probabilistic | Efficient | Implementation overhead |
| Evolutionary / AutoML (TPOT, Ray Tune PBT) | Evolutionary | Automated pipeline search | Resource intensive |
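Random search is often the best-value starting point in the table above. A short sketch with scikit-learn's `RandomizedSearchCV` on the same hyper-parameters used later in this section (the budget of 20 trials is illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Sample 20 configurations instead of enumerating the full grid
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_distributions={
        "max_depth": randint(3, 16),
        "min_samples_split": randint(2, 11),
    },
    n_iter=20, cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

Unlike grid search, the cost here is fixed by `n_iter`, not by the size of the search space.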
### 5.1 Optuna Example
```python
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    max_depth = trial.suggest_int('max_depth', 3, 15)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    clf = RandomForestClassifier(max_depth=max_depth,
                                 min_samples_split=min_samples_split,
                                 n_estimators=100,
                                 random_state=42)
    score = cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print('Best params:', study.best_params)
```
---
## 6. Reproducibility – The Engineering of Trust
A model that works locally but fails in production erodes stakeholder confidence. Reproducibility spans from data to code to environment.
| Layer | Practice |
|-------|----------|
| Randomness | Set seeds (NumPy, TensorFlow, PyTorch), use deterministic ops |
| Environment | Use `conda` or `poetry` to lock package versions, maintain a `requirements.txt` or `pyproject.toml` |
| Data | Version data with DVC or LakeFS; keep a manifest of the exact split |
| Experiments | Log hyper‑parameters, metrics, and artifacts with MLflow or Weights & Biases |
| Models | Store artifacts in a model registry (MLflow Model Registry, SageMaker Model Registry) with metadata |
| Documentation | Auto‑generate notebooks or docs that capture the full pipeline |
```python
# Example: seeding the common sources of randomness
import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)
```
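Seeding covers randomness, but the data layer also needs a fingerprint so drift in the inputs is detectable. A stdlib-only sketch (the dataset name and row counts below are hypothetical) that hashes a split manifest deterministically:

```python
import hashlib
import json

def manifest_fingerprint(manifest: dict) -> str:
    """Stable SHA-256 over a JSON manifest (sorted keys make it order-independent)."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical manifest describing the exact train/test split
manifest = {
    "dataset": "churn_2025Q4.parquet",
    "train_rows": 800_000,
    "test_rows": 200_000,
    "split_seed": 42,
}
fp = manifest_fingerprint(manifest)
print("fingerprint:", fp[:16], "...")
```

Tools like DVC do this at scale, but even a fingerprint logged next to each experiment run answers "was this trained on the same data?" unambiguously.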
---
## 7. Case Study – Customer Churn Prediction
**Business Problem**: Predict which subscribers are likely to cancel their subscription within the next month.
| Step | Action |
|------|--------|
| Data | 1M records, 30 features, imbalanced (5% churn). |
| Preprocess | Impute missing values, one‑hot encode categories, scale numeric. |
| Baseline | Logistic regression → 0.78 AUC. |
| Ensemble | Random Forest + XGBoost (Stacking) → 0.86 AUC. |
| Deep | MLP with dropout → 0.84 AUC. |
| Hyper‑Tuning | Optuna on ensemble + MLP → 0.88 AUC. |
| Deployment | Model packaged as ONNX; API served via FastAPI; drift monitoring via Evidently. |
| Outcome | 12% reduction in churn over 6 months. |
> **Lesson Learned**: Ensemble models outperformed deep nets on tabular data, but transfer learning and deeper nets could be beneficial if we had richer sequential or textual signals.
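With only 5% churners, every model in the table needed imbalance handling. One common option, sketched here on synthetic data rather than the real churn set, is `class_weight='balanced'` in the logistic baseline:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: roughly 5% positives, like the churn data
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Reweighting trades some precision for much better minority-class recall
print("recall (plain):   ", recall_score(y_te, plain.predict(X_te)).round(2))
print("recall (balanced):", recall_score(y_te, weighted.predict(X_te)).round(2))
```

For the tree ensembles the analogous knob is `class_weight` in scikit-learn or `scale_pos_weight` in XGBoost.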
---
## 8. Take‑Home Messages
1. **Decision trees are the lingua franca** of explainable ML; they scale to large datasets when combined into ensembles.
2. **Random Forest and Gradient Boosting** are often first‑line solutions for tabular data; tune with cross‑validation and Bayesian search.
3. **Neural networks shine** on high‑dimensional, structured signals (images, text, time series); regularization is key to avoid over‑fitting.
4. **Transfer learning** can dramatically reduce training time and data requirements; choose fine‑tuning vs feature extraction based on domain similarity.
5. **Hyper‑parameter optimization** should be automated and reproducible; tools like Optuna, Ray Tune, and Hyperopt make this tractable.
6. **Reproducibility is non‑negotiable** in production; treat data, code, experiments, and models as versioned artifacts.
7. **Deploy with a view to monitoring**: data drift, concept drift, and latency must be logged and acted upon.
---
> *In the next chapter we will dive into the ethical dimensions of data science, exploring how to weave fairness, accountability, and transparency into every stage of the analytics lifecycle.*