Chapter 5: Automatic Generation and Post-Production of Visual Imagery

Published 2026-03-02 20:24

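# Chapter 5: Automatic Generation and Post-Production of Visual Imagery

This chapter focuses on **using Diffusion-family generative models** to turn the fine-tuned virtual idol character data into high-quality **2D character illustrations, motion frames, and 3D models**. It also walks through the complete post-production workflow and the practical tools involved. After working through the exercises, readers will be able to build their own automated pipeline that goes from Prompt → Image → Video → 3D assets.

---

## 5.1 Why Diffusion-Family Models?

| Capability | Representative models | Typical use |
|------|----------|----------|
| Text → Image | Stable Diffusion 2.1, SDXL | High-resolution character illustrations, illustration styles |
| Text + Structure → Image | ControlNet, ControlNet-Pose | Consistent poses guided by a skeleton or depth map |
| Text → 3D | DreamFusion, Magic3D | Quick, rough 3D prototypes |
| Image restoration | SD Inpainting; ESRGAN upscaling (GAN-based, usually paired with SD) | Filling in details, raising resolution |

Diffusion models generate an image by **iteratively denoising** random noise. This lets them stay stylistically consistent while remaining highly **controllable** through the prompt, sampler, CFG scale, and other parameters. That combination makes them well suited as the core engine for automatically generating a virtual idol's visual assets.

---

## 5.2 Stable Diffusion Basics

1. **Latent Diffusion**: a VAE first maps the image into a low-dimensional latent space, which cuts the computational cost.
2. **Conditioning**: text, images, depth maps, and other conditions guide the denoising process.
3. **Sampler**: Euler a, DDIM, DPM++, and others determine the denoising path, which affects both detail and speed.
4. **CFG Scale (Classifier-Free Guidance)**: controls how strongly the text prompt steers generation. Higher values follow the prompt more closely. Values that are too high reduce variety and tend to produce oversaturated, over-contrasted images.

> **Tip**: For character illustrations, set CFG to **7–9**. For creative concept art (such as background sketches), lower it to **4–5** for freer variation.

---

## 5.3 Workflow Overview

```mermaid
flowchart TD
    A[Prepare prompt and parameters] --> B[Stable Diffusion generates 2D image]
    B --> C{Post-processing needed?}
    C -- Yes --> D[Upscale / Inpaint / Color-Grade]
    C -- No --> E[Output directly]
    D --> F[High-resolution illustration]
    F --> G{Make an animation?}
    G -- Yes --> H[ControlNet-Pose generates keyframes]
    H --> I[Interpolate and compose video]
    G -- No --> J[Enter 3D pipeline]
    J --> K[Depth map + MeshLab / Blender]
    K --> L[Finished 3D character model]
    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style L fill:#bbf,stroke:#333,stroke-width:2px;
```

---

## 5.4 Prompt Engineering

### 5.4.1 Basic Structure

```
<character name>, <appearance>, <outfit style>, <lighting>, <art style>, <resolution>
```

#### Example Prompt (V-Style Illustration)

```text
"Luna Star, pastel pink hair, twin tails, cyber-punk runner outfit with neon accents, glowing visor, soft rim light, ultra-detail, digital art, 8k"
```

### 5.4.2 Advanced Techniques

| Technique | Description | Example |
|------|------|------|
| **Negative Prompt** | Excludes unwanted elements (e.g., "low-res, blurry") | `negative: low-res, blurry, watermark` |
| **Style Tag** | Invokes a known artist or model style (e.g., "by sakimichan") | `style: by sakimichan` |
| **Region Prompt** | Controls a specific region (such as the hands) separately, e.g. with ControlNet, inpainting, or a regional-prompting extension | `hand: open palm, detailed fingernails` |
| **Fixed Seed** | Reproduces the same composition or produces controlled variants | `seed: 12345678` |

---

## 5.5 Data Preprocessing and Fine-Tuning Recap

Chapter 4 completed the character-specific LoRA fine-tuning (`lora_star_appearance.safetensors`). In this chapter we **load that LoRA** to keep the character's style consistent.

```bash
# Launch the Automatic1111 WebUI with the base model and the LoRA directory
python webui.py --ckpt models/StableDiffusion/stable-diffusion-v1-5.ckpt \
    --lora-dir models/LoRA/
# The WebUI has no command-line flag that applies a LoRA; add it to the prompt instead.
# The name must match the file in --lora-dir without its extension, e.g. for star_appearance.safetensors:
#   Luna Star, pastel pink hair, twin tails, ... <lora:star_appearance:0.8>
```

---

## 5.6 Generating 2D Character Illustrations

### 5.6.1 Common Sampler and Parameter Combinations

| Sampler | Steps | CFG | Recommended use |
|---------|------|-----|----------|
| Euler a | 30 | 8 | Fast concept art |
| DPM++ 2M Karras | 50 | 9 | High-quality finished illustrations |
| DDIM | 25 | 7 | Animation keyframes (at eta = 0 it injects no extra noise per step, which keeps frames consistent) |

### 5.6.2 Example Command (CLI)

```bash
python scripts/stable_diffusion.py \
  --prompt "Luna Star, pastel pink hair, twin tails, cyber-punk runner outfit, neon glow, ultra-detail, digital art, 8k" \
  --negative "lowres, blurry" \
  --ckpt models/StableDiffusion/v1-5-pruned.safetensors \
  --lora models/LoRA/star_appearance.safetensors:0.85 \
  --sampler dpm++_2m_karras \
  --steps 50 \
  --cfg 9 \
  --seed 20260302 \
  --outdir outputs/illustrations \
  --W 1024 --H 1024
```

The generated image is saved as `outputs/illustrations/20260302.png`.

SD 1.5 checkpoints were trained at 512×512. Generating directly at 1024×1024 often yields duplicated subjects or extra limbs. If that happens, either generate at 512–768 px and upscale afterwards (5.9.1), or switch to an SDXL checkpoint.

### 5.6.3 Common Problems and Troubleshooting

| Symptom | Likely cause | Fix |
|------|-----------|----------|
| Colors too dark | CFG too high | Lower it to 7–8, or add `bright lighting` to the prompt |
| Malformed hands | No control over hand detail | Use ControlNet-Pose or load a hand LoRA |
| Too much noise | Too few steps | Increase steps to 45–60 |

---

## 5.7 Animation Keyframes and Interpolation

### 5.7.1 Generating a Pose Sequence with ControlNet-Pose

1. **Prepare the skeleton JSON** (exported from OpenPose or MediaPipe).
2. **Set ControlNet** to `pose` mode. ControlNet-Pose is conditioned on a rendered skeleton image, so the script is expected to draw each frame's keypoints as an OpenPose-style skeleton before generating.

```bash
python scripts/controlnet_pose.py \
  --pose_json data/pose_seq.json \
  --prompt "Luna Star, cyber-punk runner pose, dynamic motion, soft lighting" \
  --ckpt models/StableDiffusion/v1-5.safetensors \
  --lora models/LoRA/star_appearance.safetensors:0.8 \
  --sampler euler_a \
  --steps 30 \
  --cfg 8 \
  --outdir outputs/pose_frames
```

The script generates one image per skeleton frame, producing `frame_000.png … frame_059.png` (60 frames, i.e. 2 seconds at 30 fps).

### 5.7.2 Frame Interpolation

Use **RIFE** or **Flowframes** to raise the frame rate to 60 fps.

```bash
# -n is the target frame count (default 2x input), not a multiplier: 60 -> 120 frames, saved as %08d.png
rife-ncnn-vulkan -i outputs/pose_frames -o outputs/pose_interp -n 120
```

### 5.7.3 Composing the Video

```bash
# -framerate sets the image-sequence input rate; yuv420p keeps the MP4 playable in most players
ffmpeg -framerate 60 -i outputs/pose_interp/%08d.png -c:v libx264 -pix_fmt yuv420p -crf 18 outputs/animation.mp4
```

---

## 5.8 From 2D to 3D: Depth Maps, Mesh Generation, and Rendering

### 5.8.1 Generating a Depth Map

ControlNet-Depth *consumes* a depth map as its conditioning input. The depth map itself comes from a monocular depth estimator, namely the ControlNet depth preprocessor, typically MiDaS. The project script below is assumed to generate a front view and export the corresponding depth map.

```bash
python scripts/controlnet_depth.py \
  --prompt "Luna Star front view, high detail" \
  --ckpt models/StableDiffusion/v1-5.safetensors \
  --outdir outputs/depth_maps \
  --W 512 --H 512
```

### 5.8.2 Converting to a 3D Mesh (MiDaS + Meshroom)

1. Use MiDaS to estimate a finer depth map from the RGB illustration.
2. Load the images into a **photogrammetry** pipeline (Meshroom, or Blender with a photogrammetry add-on) to export an OBJ. Photogrammetry reconstructs geometry from many views of the subject, so this route needs multi-view images (for example, several generated camera angles). For a single illustration, use the displacement approach in step 3.

```bash
# MiDaS v3.x official repo: run on the RGB illustration (not on a depth map); paths are folders, output is .pfm + .png
python midas/run.py --model_type dpt_beit_large_512 --model_weights midas/weights/dpt_beit_large_512.pt \
  --input_path outputs/illustrations --output_path outputs/depth_midas
```

3. In Blender, import the depth map and use a **Displace modifier** to turn a subdivided plane into a 3D relief. Then **retopologize** to simplify the mesh.

### 5.8.3 3D Rendering and Rigging

| Step | Tool | Key point |
|------|------|------|
| UV unwrapping | Blender | Keep UVs aligned with the 2D textures |
| Rigging | Mixamo + Auto-Rig Pro | Apply common animations such as running or dancing directly |
| Rendering | Unity URP / Unreal Engine | PBR materials; two-layer rendering (character + background) |

---

## 5.9 Post-Production: Upscaling, Inpainting, and Color Grading

### 5.9.1 Super-Resolution (Upscaling)

- **ESRGAN**, **Real-ESRGAN**, and the **Stable Diffusion x4 Upscaler** are all viable options.

```bash
# RealESRGAN_x4plus is the model name used by the official Real-ESRGAN release
python scripts/upscale.py --input outputs/illustrations/20260302.png --model RealESRGAN_x4plus --scale 2 --output outputs/upscaled/20260302_up.png
```

### 5.9.2 Inpainting (Local Repair)

Use Stable Diffusion inpainting to refine details such as hands and eyes. The mask is white where the image should be repainted and black elsewhere, which is the usual convention in SD inpainting tools.

```bash
python scripts/inpaint.py \
  --image outputs/upscaled/20260302_up.png \
  --mask masks/hand_mask.png \
  --prompt "detail of hand, sharp fingers, realistic skin texture" \
  --outdir outputs/final
```

### 5.9.3 Color Correction and Style Unification

- **DaVinci Resolve** and **Adobe Lightroom** can apply a global color grade or LUT.
- For automation, **ColorfulGAN** can be used to generate stylized LUTs.

---

## 5.10 Recommended Toolchain and Environment

| Category | Recommended tools | Docs / Interface | Key features |
|------|----------|----------|----------|
| UI / front end | Automatic1111, ComfyUI, InvokeAI | GitHub | Interactive prompting, batch generation, plugin ecosystem |
| Video editing | FFmpeg, DaVinci Resolve | CLI, GUI | Efficient transcoding, timeline editing |
| 3D modeling | Blender, Maya, Unity (URP) | Official manuals | Complete rendering pipeline, real-time preview |
| Version control | Git + DVC | Official Git docs | Tracking large model files and datasets |
| Hardware | RTX 3090 / 4090 (24 GB VRAM) | NVIDIA | Roughly 3–5× faster Diffusion generation than mid-range cards |

**Recommended Docker environment** (for quick reproduction)

```dockerfile
FROM nvidia/cuda:12.2.2-runtime-ubuntu22.04
# python-is-python3 provides `python`; libgl1/libglib2.0-0 are needed by OpenCV at runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3-pip python-is-python3 git libgl1 libglib2.0-0 && \
    rm -rf /var/lib/apt/lists/*
# Pin torchvision/torchaudio to the releases built against torch 2.2.0
RUN python3 -m pip install torch==2.2.0+cu121 torchvision==0.17.0+cu121 torchaudio==2.2.0+cu121 \
        --extra-index-url https://download.pytorch.org/whl/cu121
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git /app && \
    cd /app && python3 -m pip install -r requirements_versions.txt
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# launch.py fetches the WebUI's remaining dependencies; --exit stops after installation (no GPU at build time)
RUN python launch.py --skip-torch-cuda-test --exit
CMD ["python", "launch.py", "--listen", "--port", "7860"]
```

---

## 5.11 Case Study: From Prompt to a Complete 30-Second MV

1. **Character**: Luna Star (as defined in Chapter 2)
2. **Goal**: a 1080p, 30 fps MV set to an original song. Process outline:
   - Generate illustrations for 5 scenes (street, stage, virtual starry sky, café, futuristic city).
   - Each scene lasts 6 seconds, i.e. 180 frames at 30 fps. Generate 90 keyframes per scene with ControlNet-Pose.
   - Interpolate 2× with RIFE to 180 frames per scene (900 frames, 30 seconds in total).
   - Cut each animation segment to the **beats** of the music and add **glow effects** (After Effects).
   - Apply the final grade with a **DaVinci Resolve** LUT (Cyber-Neon).
3. **Script outline** (Bash):

```bash
# 1. Generate the scene illustrations
for scene in street stage sky cafe city; do
  python generate_illustration.py --scene "$scene" --out "outputs/scene_${scene}.png"
done
# 2. Generate the pose frames from a MediaPipe-derived pose JSON (90 keyframes per scene, 450 in total)
python generate_pose_frames.py --pose_json data/luna_pose.json --out outputs/frames
# 3. Interpolate 450 -> 900 frames (-n is the target frame count; 2x is also the default)
rife-ncnn-vulkan -i outputs/frames -o outputs/frames_interp -n 900
# 4. Compose the 30 s video: scale to 1080p (frames assumed to be 16:9), fade in/out, add audio.
#    -shortest stops at the end of the shorter stream; yuv420p keeps the MP4 widely playable.
ffmpeg -framerate 30 -i outputs/frames_interp/%08d.png -i audio/luna_theme.wav \
  -filter_complex "[0:v]scale=1920:1080,fade=t=in:st=0:d=1,fade=t=out:st=29:d=1[v];[1:a]afade=t=in:st=0:d=2,afade=t=out:st=28:d=2[a]" \
  -map "[v]" -map "[a]" -c:v libx264 -pix_fmt yuv420p -crf 18 -c:a aac -b:a 192k -shortest outputs/luna_mv.mp4
```

4. **Results**: the video runs 30 seconds at 1920×1080. Its overall visual style stays highly consistent with the character design, and viewers commented that "the character feels alive, and the lighting has a great sense of rhythm."

---

## 5.12 Summary and Checklist

| Step | Sections | Main task | Recommended tools | Output |
|------|------|----------|----------|----------|
| 1 | 5.1–5.2 | Understand Diffusion basics | Original papers, Hugging Face | ✅ Concepts understood |
| 2 | 5.4–5.5 | Set up prompt and LoRA | Automatic1111 UI | `prompt.txt` |
| 3 | 5.6 | Generate 2D illustrations | `scripts/stable_diffusion.py` | `illustrations/*.png` |
| 4 | 5.7.1 | Generate animation keyframes | ControlNet-Pose | `pose_frames/*.png` |
| 5 | 5.7.2–5.7.3 | Frame interpolation and video composition | RIFE + FFmpeg | `animation.mp4` |
| 6 | 5.8 | Depth maps and 3D mesh | MiDaS + Blender | `3d_model.obj` |
| 7 | 5.9 | Post-production: upscaling, inpainting, color grading | Real-ESRGAN, Inpaint, DaVinci Resolve | `final/*.png` |
| 8 | 5.11 | Final review and export | OBS / Unity | `final_mv.mp4` |

> **Best practice**: after each step, run `dvc add <output>` to track large model and image files. Then `git add` the generated `<output>.dvc` file and commit with `git commit -m "[5.x] Complete <task>"`. Finally, run `dvc push` to copy the data to remote storage so it cannot be lost.

---

**Chapter Summary**

Diffusion-family models are highly controllable and extend across modalities. As a result, creators can produce complete 2D, animated, and even 3D visual assets from a text concept in a **short time**. Combined with the LoRA and Adapter-Fusion fine-tuning from Chapter 4, these models keep the character's style and personality consistent across every medium. This lays a solid foundation for voice synthesis (Chapter 6) and content workflows (Chapter 7). May you create dazzling, memorable visual wonders on your own virtual idol journey!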
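
As a supplement to the CFG Scale item in 5.2: classifier-free guidance runs the denoiser twice per sampling step. One pass uses the prompt; the other uses an empty prompt (or the negative prompt, in tools that support one). The sampler then extrapolates from the unconditional prediction toward the conditioned one. The sketch below shows only that combination step. Plain Python lists stand in for the latent noise tensors here, purely as an assumption for readability. Real implementations such as diffusers apply the same formula to PyTorch tensors.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond).

    scale = 0 returns the unconditional prediction, scale = 1 the conditioned one,
    and scale > 1 extrapolates past the conditioned prediction.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.10, -0.20, 0.05]  # toy stand-ins for the unconditional noise prediction
    cond = [0.30, -0.10, 0.00]    # toy stand-ins for the prompt-conditioned prediction
    for s in (1.0, 4.5, 8.0):
        print(s, [round(v, 3) for v in cfg_combine(uncond, cond, s)])
```

With a scale of 1 the output equals the conditioned prediction. Larger scales push further in the prompt's direction, which is why very high CFG values tend to exaggerate contrast and saturation.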
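
The prompt template in 5.4.1 can be filled programmatically so that batch jobs keep the field order stable. The sketch below is a hypothetical helper: `PromptSpec`, `build_prompt`, and `build_negative` are illustrative names, not part of any library. The only tool-specific piece is the `<lora:name:weight>` tag, which is how the Automatic1111 WebUI applies a LoRA from inside the prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    # Fields follow the template order from 5.4.1
    name: str
    appearance: List[str]
    outfit: List[str]
    lighting: List[str]
    style: List[str]
    quality: List[str]
    lora: Optional[str] = None  # hypothetical: LoRA file name without extension
    lora_weight: float = 0.8
    negative: List[str] = field(default_factory=lambda: ["lowres", "blurry", "watermark"])


def build_prompt(spec: PromptSpec) -> str:
    """Join the template fields in order, skipping empty entries."""
    parts = [spec.name, *spec.appearance, *spec.outfit, *spec.lighting, *spec.style, *spec.quality]
    prompt = ", ".join(p.strip() for p in parts if p.strip())
    if spec.lora:
        # Automatic1111 applies a LoRA through the <lora:name:weight> prompt tag
        prompt += f" <lora:{spec.lora}:{spec.lora_weight}>"
    return prompt


def build_negative(spec: PromptSpec) -> str:
    return ", ".join(spec.negative)


if __name__ == "__main__":
    luna = PromptSpec(
        name="Luna Star",
        appearance=["pastel pink hair", "twin tails"],
        outfit=["cyber-punk runner outfit with neon accents", "glowing visor"],
        lighting=["soft rim light"],
        style=["digital art"],
        quality=["ultra-detail", "8k"],
        lora="star_appearance",
    )
    print(build_prompt(luna))
    print(build_negative(luna))
```

Keeping each field as a list makes it easy to swap one attribute (for example, the outfit) across a batch while the character name and LoRA tag stay fixed.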
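
The command in 5.6.2 calls a project script. If you run the Automatic1111 WebUI instead, the same parameters can be sent to its HTTP API. This sketch makes the following assumptions:

- The WebUI was started with `--api` on the default port 7860.
- Requests go to the `/sdapi/v1/txt2img` endpoint.
- The helper names are hypothetical.

Sampler names differ between WebUI versions; newer releases expose the scheduler (e.g. Karras) as a separate setting. Check `GET /sdapi/v1/samplers` on your install.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: WebUI launched locally with --api on the default port
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def txt2img_payload(prompt: str, negative: str, *, sampler: str = "DPM++ 2M Karras",
                    steps: int = 50, cfg: float = 9.0, seed: int = 20260302,
                    width: int = 1024, height: int = 1024) -> Dict[str, Any]:
    """Translate the flags used in 5.6.2 into a txt2img request body."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "sampler_name": sampler,  # must match a name listed by GET /sdapi/v1/samplers
        "steps": steps,
        "cfg_scale": cfg,
        "seed": seed,
        "width": width,
        "height": height,
    }


def txt2img(payload: Dict[str, Any], url: str = API_URL) -> List[bytes]:
    """POST the payload and return the generated images as PNG bytes.

    The response's "images" field holds base64-encoded images.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [base64.b64decode(img) for img in body["images"]]


if __name__ == "__main__":
    payload = txt2img_payload(
        "Luna Star, pastel pink hair, twin tails, cyber-punk runner outfit, neon glow, "
        "digital art <lora:star_appearance:0.85>",
        "lowres, blurry",
    )
    # Only prints the request; call txt2img(payload) once the server is running
    print(json.dumps(payload, indent=2))
```

With the server running, the first element returned by `txt2img(payload)` can be written to `outputs/illustrations/20260302.png` in binary mode.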
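
For 5.7.1: ControlNet-Pose is conditioned on a skeleton image, so whatever script consumes `data/pose_seq.json` has to read the keypoints first. OpenPose's JSON output stores a flat `pose_keypoints_2d` list of `(x, y, confidence)` triples for each detected person. The sketch below parses that layout and drops low-confidence points so that a renderer can skip unreliable limbs.

OpenPose itself writes one JSON file per frame. The single-file layout used here (a JSON array of per-frame objects) is therefore a hypothetical convention for `pose_seq.json`. MediaPipe output would need converting to this layout first.

```python
import json
from typing import List, Optional, Tuple

Keypoint = Optional[Tuple[float, float]]


def parse_openpose_frame(frame: dict, min_conf: float = 0.1) -> List[List[Keypoint]]:
    """Return one list of (x, y) keypoints per detected person.

    Keypoints below min_conf become None so they can be skipped when drawing.
    """
    people: List[List[Keypoint]] = []
    for person in frame.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        if len(flat) % 3:
            raise ValueError("pose_keypoints_2d must hold (x, y, confidence) triples")
        pts: List[Keypoint] = []
        for i in range(0, len(flat), 3):
            x, y, c = flat[i:i + 3]
            pts.append((x, y) if c >= min_conf else None)
        people.append(pts)
    return people


def load_pose_sequence(path: str) -> List[List[List[Keypoint]]]:
    """Hypothetical layout: a JSON array with one OpenPose frame object per entry."""
    with open(path, encoding="utf-8") as f:
        frames = json.load(f)
    return [parse_openpose_frame(fr) for fr in frames]


if __name__ == "__main__":
    demo = {"people": [{"pose_keypoints_2d": [120.0, 80.0, 0.95, 0.0, 0.0, 0.0]}]}
    print(parse_openpose_frame(demo))
```

Marking missing joints as `None`, rather than leaving them at `(0, 0)`, avoids drawing stray limbs to the image corner, which would otherwise be fed to ControlNet as part of the pose.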
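
For 5.7.2 and the case study in 5.11, frame budgets are easiest to check by working backwards. The final frame count is duration × fps. The number of keyframes ControlNet-Pose must generate is then that count divided by the interpolation factor. rife-ncnn-vulkan's `-n` option is a target frame count (its README documents the default as twice the input), so the sketch also reports the value to pass. `plan_frames` and `FramePlan` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FramePlan:
    final_frames: int  # frames in the delivered clip
    keyframes: int     # frames ControlNet-Pose must generate
    rife_target: int   # value for rife-ncnn-vulkan's -n option


def plan_frames(seconds: float, fps: int, factor: int) -> FramePlan:
    """Derive keyframe count and interpolation target from clip length and frame rate,
    assuming interpolation multiplies the frame count by `factor`."""
    if factor < 1:
        raise ValueError("interpolation factor must be >= 1")
    final = round(seconds * fps)
    keys = math.ceil(final / factor)
    return FramePlan(final_frames=final, keyframes=keys, rife_target=keys * factor)


if __name__ == "__main__":
    print(plan_frames(2, 60, 2))   # the 5.7 example: 60 keyframes -> 120 frames at 60 fps
    print(plan_frames(6, 30, 2))   # one 6-second scene at 30 fps with 2x interpolation
    print(plan_frames(30, 30, 2))  # the full 30-second MV
```

If `final_frames` is not divisible by `factor`, `rife_target` rounds up, and the extra frames can be trimmed during editing.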
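
Step 3 of 5.8.2 (a Displace modifier on a plane) can also be approximated outside Blender. Treat every depth-map pixel as a grid vertex, push it along Z by its depth value, and join neighboring vertices into triangles. The sketch below writes such a height field as a Wavefront OBJ string. It assumes the depth map is already loaded as rows of values normalized to [0, 1].

MiDaS outputs relative inverse depth, where larger values are closer to the camera. The `invert` switch is therefore only needed for true depth maps. This is a conceptual stand-in for the Displace modifier, not Blender's implementation, and `depth_to_obj` is a hypothetical name.

```python
from typing import List, Sequence


def depth_to_obj(depth: Sequence[Sequence[float]], strength: float = 0.3,
                 invert: bool = False) -> str:
    """Convert a depth map (rows of values in [0, 1]) into an OBJ height-field mesh:
    one vertex per pixel displaced along +Z, two triangles per pixel quad."""
    h = len(depth)
    w = len(depth[0]) if h else 0
    if h < 2 or w < 2 or any(len(row) != w for row in depth):
        raise ValueError("depth map must be a rectangular grid of at least 2x2 values")
    s = 1.0 / (max(w, h) - 1)  # uniform scale keeps the image's aspect ratio
    lines: List[str] = []
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            z = (1.0 - d if invert else d) * strength
            # Flip Y so image row 0 becomes the top edge of the mesh
            lines.append(f"v {x * s:.6f} {(h - 1 - y) * s:.6f} {z:.6f}")
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x + 1  # OBJ vertex indices are 1-based
            # Counter-clockwise winding seen from +Z, so normals face the viewer
            lines.append(f"f {i} {i + w} {i + 1}")
            lines.append(f"f {i + 1} {i + w} {i + w + 1}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_depth = [[0.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 0.0]]  # a single bump
    print(depth_to_obj(toy_depth, strength=0.5))
```

Saving the returned string to a `.obj` file lets Blender import it via File > Import > Wavefront (.obj), where it can be retopologized as described in step 3. A full-resolution depth map produces one vertex per pixel, so downsample it first for a lighter mesh.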
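
5.9.3 mentions automated grading. A simpler, deterministic alternative is sketched here; it is swapped in for illustration and is not ColorfulGAN. The idea is to sample a color function on a grid and save the result as an Adobe `.cube` 3D LUT, which DaVinci Resolve and FFmpeg's `lut3d` filter can load. In `.cube` files, the data lines list `R G B` output values with red varying fastest, then green, then blue. The `neon_grade` curve is purely illustrative and is not the Cyber-Neon LUT from 5.11.

```python
from typing import Callable, Tuple

RGB = Tuple[float, float, float]


def _clamp(v: float) -> float:
    return min(1.0, max(0.0, v))


def make_cube_lut(transform: Callable[[float, float, float], RGB], size: int = 17,
                  title: str = "Generated LUT") -> str:
    """Sample `transform` on a size^3 grid and emit an Adobe .cube 3D LUT.

    Loop order puts red innermost, matching the .cube convention.
    """
    if size < 2:
        raise ValueError("LUT_3D_SIZE must be at least 2")
    step = 1.0 / (size - 1)
    out = [f'TITLE "{title}"', f"LUT_3D_SIZE {size}"]
    for b in range(size):
        for g in range(size):
            for r in range(size):
                rr, gg, bb = transform(r * step, g * step, b * step)
                out.append(f"{_clamp(rr):.6f} {_clamp(gg):.6f} {_clamp(bb):.6f}")
    return "\n".join(out) + "\n"


def neon_grade(r: float, g: float, b: float) -> RGB:
    """Illustrative look: boost reds and blues slightly, pull greens down."""
    return (r * 1.05, g * 0.95, b * 1.1 + 0.02)


if __name__ == "__main__":
    lut = make_cube_lut(neon_grade, size=17, title="Neon sketch")
    print(lut.splitlines()[:4])
```

Writing `make_cube_lut(neon_grade, 33)` to `neon.cube` and running `ffmpeg -i outputs/animation.mp4 -vf lut3d=file=neon.cube graded.mp4` applies the same grade to every frame. This keeps batch renders consistent without opening an editor.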
