Chapter 6: Deep Learning and Advanced Models

Published 2026-02-26 18:04

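# Chapter 6: Deep Learning and Advanced Models

This chapter starts from the basic concepts of deep learning and works through the mainstream network architectures, training techniques, practical case studies, interpretability methods, and deployment strategies. Code examples (Python with PyTorch or TensorFlow) and notes on business applications are meant to help readers put deep learning models into production quickly while keeping them maintainable and interpretable.

---

## 6.1 Deep Learning Concepts

| Concept | Definition | Why it matters |
|------|------|--------|
| Neural network | A structure loosely modeled on biological neural systems, built from multiple layers of learnable "perceptrons" | The basic unit at the core of every deep model |
| Backpropagation | Computes gradients with the chain rule so that parameters can be updated | Lets the model optimize itself automatically |
| Optimizer | The algorithm that updates the weights (SGD, Adam, and so on) | Determines learning speed and convergence |

### 6.1.1 Neural Network Structure

The simplest model is the single-layer perceptron (a single-layer feed-forward network):

\[ y = \sigma(\mathbf{w}^T\mathbf{x} + b) \]

where \(\sigma\) is an activation function (such as ReLU or Sigmoid). A multilayer perceptron (MLP) stacks several hidden layers on top of this:

\[ \mathbf{h}^{(l)} = \sigma(\mathbf{W}^{(l)}\mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}) \]

with \(\mathbf{h}^{(0)} = \mathbf{x}\) as the input.

### 6.1.2 Backpropagation and Gradient Descent

- **Forward pass**: compute the prediction.
- **Loss function**: measure the gap between the prediction and the ground truth (for example MSE or cross-entropy).
- **Backpropagation**: use the chain rule to compute the gradient of the loss with respect to every parameter.
- **Parameter update**: \( \theta \leftarrow \theta - \eta \nabla L(\theta) \), where \(\eta\) is the learning rate.

---

## 6.2 Mainstream Network Architectures

### 6.2.1 Convolutional Neural Networks (CNN)

| Component | Role |
|------|------|
| Convolutional layer | Extracts local features |
| Pooling layer | Reduces spatial resolution and adds some translation invariance |
| Filters | Learn features at different scales and orientations |

#### Example: a simple CNN (PyTorch)

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Two conv + pool stages: 3x32x32 -> 32x16x16 -> 64x8x8
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # The 64*8*8 input size assumes 32x32 images (e.g. CIFAR-10).
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
```

### 6.2.2 Recurrent Neural Networks (RNN)

| Variant | Characteristics |
|------|------|
| Vanilla RNN | Suited to sequence data, but prone to vanishing or exploding gradients |
| LSTM | Adds gated memory cells that mitigate vanishing gradients |
| GRU | A simplified structure with fewer parameters |

#### Example: LSTM for time-series forecasting

```python
import torch
import torch.nn as nn


class TimeSeriesLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        # batch_first=True: inputs have shape (batch, seq_len, input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.lstm(x)
        # Predict from the hidden state of the last time step only.
        out = self.fc(out[:, -1, :])
        return out
```

### 6.2.3 Transformers and Self-Attention

| Main component | Role |
|------|------|
| Multi-Head Attention | Captures relationships in several subspaces at once |
| Positional Encoding | Injects information about sequence position |
| Feed-Forward | Applies a position-wise nonlinear transformation |

#### Example: a simple Transformer encoder (TensorFlow)

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class SimpleTransformerEncoder(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff,
                 input_vocab_size, maximum_position_encoding):
        super().__init__()
        self.embedding = layers.Embedding(input_vocab_size, d_model)
        self.pos_encoding = self.positional_encoding(maximum_position_encoding, d_model)
        # One self-attention layer and one feed-forward block per encoder layer.
        self.enc_layers = [layers.MultiHeadAttention(num_heads, key_dim=d_model)
                           for _ in range(num_layers)]
        # Each block projects back to d_model so that layers can be stacked.
        self.ffn_layers = [tf.keras.Sequential([layers.Dense(dff, activation="relu"),
                                                layers.Dense(d_model)])
                           for _ in range(num_layers)]

    def positional_encoding(self, position, d_model):
        # Standard sinusoidal encoding, shape (1, position, d_model).
        pos = np.arange(position)[:, np.newaxis]
        i = np.arange(d_model)[np.newaxis, :]
        angles = pos / np.power(10000.0, 2 * (i // 2) / d_model)
        angles[:, 0::2] = np.sin(angles[:, 0::2])
        angles[:, 1::2] = np.cos(angles[:, 1::2])
        return tf.cast(angles[np.newaxis, ...], tf.float32)

    def call(self, x):
        seq_len = tf.shape(x)[1]
        x = self.embedding(x) + self.pos_encoding[:, :seq_len, :]
        # Layer normalization is left out to keep the example short.
        for mha, ffn in zip(self.enc_layers, self.ffn_layers):
            x = x + mha(x, x)  # self-attention with a residual connection
            x = x + ffn(x)     # feed-forward with a residual connection
        return x
```

---

## 6.3 Training Techniques

| Technique | Purpose | Reference implementation |
|------|------|-----------|
| Dropout | Prevent overfitting | `nn.Dropout(p)` |
| Batch Normalization | Stabilize training and shorten convergence time | `nn.BatchNorm2d()` |
| Weight Decay | Regularization | `torch.optim.SGD(params, lr=lr, weight_decay=1e-4)` |
| Data augmentation | Enlarge the sample space | For example `torchvision.transforms.RandomCrop` |
| Early Stopping | Prevent overfitting | Monitor the validation loss |
| Learning-Rate Scheduler | Adjust the learning rate automatically | `torch.optim.lr_scheduler.CosineAnnealingLR` |

> **Note**: For large models (such as Transformers), start with a **warm-up** phase that raises the learning rate gradually, then move into a **decay** phase.

---

## 6.4 Case Studies

| Case | Dataset | Model | Main challenge |
|------|--------|------|-----------|
| Image classification | CIFAR-10 | ResNet-18 | Training efficiently on a small GPU |
| Text classification | IMDB | BERT (pretrained) | Semantic understanding that depends on large pretraining corpora |
| Time-series forecasting | Electricity price curves | LSTM + Attention | Long-term dependencies and noise |

### 6.4.1 CIFAR-10 with ResNet-18 (PyTorch)

```python
import torch
import torch.nn as nn
from torchsummary import summary


# Reference implementation of ResNet-18
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Match the shortcut's shape when the block changes resolution or width.
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, layers_cfg, num_classes=10):
        super().__init__()
        self.in_channels = 64
        # ImageNet-style stem; it downsamples 32x32 CIFAR images aggressively.
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers_cfg[0])
        self.layer2 = self._make_layer(block, 128, layers_cfg[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers_cfg[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers_cfg[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )
        layers_list = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers_list.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers_list)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x


# Build ResNet-18 (two BasicBlocks per stage)
model = ResNet(BasicBlock, [2, 2, 2, 2])
# summary() prints the table itself; device="cpu" matches a model kept on the CPU.
summary(model, (3, 32, 32), device="cpu")
```

---

## 6.5 Advanced Models and Strategies

### 6.5.1 Transfer Learning

- **Pretrained models**: for example VGG, ResNet, BERT.
- **Fine-tuning**: update only the last few layers to reduce computation.

```python
# PyTorch: fine-tune ResNet-50 for 5-class classification
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False  # freeze all pretrained layers
# Replace the classifier; the new layer is trainable by default (in_features is 2048).
model.fc = nn.Linear(model.fc.in_features, 5)
```

### 6.5.2 Multimodal Models

| Modalities | Examples |
|------|------|
| Image + text | For example CLIP, ViLBERT |
| Image + audio | For example Audio-Visual Speech Recognition |

### 6.5.3 Reinforcement Learning (RL) Overview

- **Agent**: the decision maker.
- **Environment**: the world in which actions are carried out.
- **Reward**: the signal that scores how good an action was.
- **Policy**: \(\pi(a|s)\), the mapping from states to actions.

> *This chapter only touches on the topic. For details, see "Chapter 9: Reinforcement Learning" or the book Deep Reinforcement Learning Hands-On.*

---

## 6.6 Interpretability Methods

| Method | Typical use |
|------|-----------|
| Grad-CAM | Visual explanations for CNNs |
| Integrated Gradients | Attributing a prediction to input features by integrating gradients |
| SHAP for DL | Feature importance for deep models |

### 6.6.1 Grad-CAM Example

```python
import cv2
import numpy as np
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

# Capture the feature map of the last convolutional layer.
activation = {}


def get_activation(name):
    def hook(module, inputs, output):
        output.retain_grad()  # keep the gradient of this non-leaf tensor
        activation[name] = output
    return hook


model.layer4[1].conv2.register_forward_hook(get_activation("layer4"))

# Preprocessing: BGR -> RGB, resize, scale to [0, 1], ImageNet normalization.
img = cv2.cvtColor(cv2.imread("dog.jpg"), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
img_tensor = torch.from_numpy((img - mean) / std).permute(2, 0, 1).unsqueeze(0)

# Forward pass, then backpropagate the score of the top class.
output = model(img_tensor)
class_idx = output.argmax().item()
output[0, class_idx].backward()

# Grad-CAM: weight each channel by its spatially averaged gradient.
acts = activation["layer4"]
weights = acts.grad.mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * acts).sum(dim=1)).squeeze()
cam = cam.detach().cpu().numpy()
# Heat-map rendering omitted.
```

---

## 6.7 Deployment Strategies

| Platform | Advantage | Example |
|------|------|------|
| ONNX | Cross-framework compatibility | `torch.onnx.export(...)` |
| TensorRT | Efficient GPU inference | `tensorrt.Builder(logger)` |
| TorchServe | Model serving | `torchserve --start --model-store model_store` |
| Hugging Face Inference API | Cloud inference | `transformers.pipeline('sentiment-analysis')` (the local counterpart) |

> **Tip**: For models such as ResNet-18 or an LSTM, applying 8-bit **quantization** before deployment can reduce memory usage by roughly 70% while aiming to keep the accuracy loss within about 5%. Verify the actual loss on validation data.

---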
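As a rough illustration of the 8-bit quantization tip in Section 6.7, the sketch below quantizes one weight matrix with NumPy only. It assumes symmetric per-tensor quantization, which is just one of several possible schemes. The helper names `quantize_int8` and `dequantize` and the random weights are hypothetical, and framework quantization tools do considerably more (calibration, activation quantization, per-channel scales).

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8 (illustrative)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale


# Hypothetical weights standing in for one dense layer of a trained model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 512)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

saving = 1.0 - q.nbytes / w.nbytes          # fraction of weight memory saved
max_err = float(np.max(np.abs(w - w_hat)))  # rounding error, at most about scale / 2
print(f"memory saving: {saving:.0%}, max abs error: {max_err:.6f}")
```

Going from float32 to int8 cuts weight storage to a quarter, which is in line with the roughly 70% figure once scales and unquantized layers are counted. The effect on accuracy cannot be read off this sketch and has to be measured on the real model.

---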
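## 6.8 Summary

1. **Model selection**: pick the architecture that matches the problem type (images, text, sequences).
2. **Training engineering**: use Batch Normalization, Dropout, learning-rate schedulers, and similar techniques to keep training stable and convergent.
3. **Advanced techniques**: transfer learning, fine-tuning, and quantization can cut hardware costs substantially.
4. **Interpretability**: tools such as Grad-CAM and Integrated Gradients make deep models more transparent.
5. **Deployment**: ONNX with TensorRT, or TorchServe, can move a model into a real environment quickly.

> With the theory and worked examples in this chapter, you can build, optimize, and deploy a range of deep learning models, and provide interpretability and regulatory compliance where needed. Happy building!

---

**References**

- *Deep Learning* (Ian Goodfellow, Yoshua Bengio, Aaron Courville)
- *Computer Vision: Algorithms and Applications* (Richard Szeliski)
- *Natural Language Processing with Transformers* (L. M. R. H.
- *Deep Reinforcement Learning Hands-On* (Maxim Lapan)
- PyTorch official documentation, TensorFlow official documentation
- Hugging Face official website

*For more in-depth implementation or fine-tuning techniques, consult the references above or contact a professional consultant.*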
