
Data Science for Social Good: Analytics to Drive Impact - Chapter 7

Chapter 7: Advanced Analytics & AI

Published 2026-03-02 07:11

> *In the realm of social good, advanced analytics and artificial intelligence (AI) unlock patterns that were previously invisible, scale interventions, and provide a data‑driven voice for marginalized communities.*

## 7.1 Neural Networks for Social Impact

### 7.1.1 What Are Neural Networks?

Neural networks (NNs) are computational models loosely inspired by the human brain. They consist of layers of interconnected **neurons** that learn to map inputs to outputs by training on labeled data (supervised learning). Key concepts include:

| Concept | Definition |
|---------|------------|
| **Layer** | A set of neurons that transforms its input into an intermediate representation. |
| **Activation Function** | A non‑linear function (e.g., ReLU, sigmoid) applied to each neuron's output; without it, stacked layers would reduce to a single linear transformation. |
| **Loss Function** | A function that quantifies prediction error (e.g., cross‑entropy for classification). |
| **Backpropagation** | An algorithm that computes the gradient of the loss with respect to every weight by propagating the error backwards through the layers; an optimizer (e.g., gradient descent, Adam) then uses these gradients to update the weights. |

### 7.1.2 Example: Predicting Child Malnutrition

A public‑health NGO wants to target the villages most at risk of severe acute malnutrition. It uses a multi‑layer perceptron (MLP):

1. **Data** – Household surveys, weather data, sanitation scores.
2. **Preprocessing** – Normalization, one‑hot encoding, imputation.
3. **Model** – 3 hidden layers (128, 64, 32 units) with ReLU activations and a dropout rate of 0.2.
4. **Training** – 80/20 train/test split, Adam optimizer, binary cross‑entropy loss.
5. **Evaluation** – ROC‑AUC = 0.87; the precision‑recall curve indicates a good balance between precision and recall.
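
The NumPy sketch below makes the concepts in the 7.1.1 table concrete. It trains a tiny network by hand: one ReLU hidden layer, a sigmoid output, binary cross‑entropy loss, and backpropagation followed by plain gradient‑descent updates. It is an illustrative toy, not the NGO's model. The synthetic data, the 8‑unit hidden layer, the learning rate and the step count are all arbitrary assumptions. Frameworks such as TensorFlow compute these gradients automatically, and optimizers such as Adam replace the fixed‑step update.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic binary task (assumption): label is 1 when the two features sum to > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Layer parameters: 2 inputs -> 8 hidden units -> 1 output
W1 = rng.normal(scale=0.5, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy loss, averaged over samples."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


lr = 0.5
losses = []
for _ in range(300):
    # Forward pass: input layer -> ReLU hidden layer -> sigmoid output
    z1 = X @ W1 + b1
    h = relu(z1)
    p = sigmoid(h @ W2 + b2)
    losses.append(bce(p, y))

    # Backward pass (backpropagation): apply the chain rule from the loss to each weight.
    # For sigmoid + binary cross-entropy, dLoss/dlogit simplifies to (p - y) / n.
    n = X.shape[0]
    d_logit = (p - y) / n
    dW2 = h.T @ d_logit
    db2 = d_logit.sum(axis=0, keepdims=True)
    d_h = d_logit @ W2.T
    d_z1 = d_h * (z1 > 0)          # ReLU derivative: 1 where z1 > 0, else 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # Plain gradient-descent update with a fixed learning rate
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

accuracy = float(((p > 0.5) == y).mean())
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

When a sigmoid output is paired with binary cross‑entropy, the gradient at the output logit reduces to `p - y`. That simple gradient is one reason the pairing is common for binary classifiers like the one in this example.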
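
Step 2 of the workflow can be sketched with pandas alone. This is a minimal illustration, not the NGO's actual pipeline. The `preprocess` helper, the column names (`rainfall_mm`, `sanitation_score`, `region`, `at_risk`) and the toy values are all hypothetical. Median imputation and z‑score normalization are assumed choices. The helper performs the 80/20 split first. It then derives every statistic (medians, means, standard deviations, category levels) from the training rows only, so no information leaks from the test set into training.

```python
import numpy as np
import pandas as pd


def preprocess(df, numeric, categorical, target, test_frac=0.2, seed=0):
    """Hypothetical helper: split ~80/20, then impute, normalize, and one-hot encode.

    All statistics come from the training rows only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = int(round(len(df) * test_frac))
    train, test = df.iloc[idx[n_test:]], df.iloc[idx[:n_test]]

    medians = train[numeric].median()                # imputation values
    filled = train[numeric].fillna(medians)
    means = filled.mean()
    stds = filled.std(ddof=0).replace(0, 1.0)        # guard against constant columns
    levels = {c: sorted(train[c].dropna().unique()) for c in categorical}

    def transform(part):
        # Imputation followed by z-score normalization
        num = (part[numeric].fillna(medians) - means) / stds
        # One-hot encoding with category levels fixed from the training split;
        # unseen categories in `part` become all-zero rows
        dummies = [
            pd.get_dummies(pd.Series(pd.Categorical(part[c], categories=levels[c]),
                                     index=part.index),
                           prefix=c, dtype=float)
            for c in categorical
        ]
        X = pd.concat([num, *dummies], axis=1).to_numpy(dtype=float)
        return X, part[target].to_numpy()

    X_train, y_train = transform(train)
    X_test, y_test = transform(test)
    return X_train, X_test, y_train, y_test


# Toy survey data (hypothetical values, including missing entries)
survey = pd.DataFrame({
    'rainfall_mm':      [120.0, np.nan, 80.0, 95.0, 60.0, 150.0, np.nan, 70.0, 110.0, 90.0],
    'sanitation_score': [3, 2, np.nan, 4, 1, 5, 2, 1, 3, 4],
    'region':           ['north', 'south', 'south', 'east', 'north',
                         'east', 'south', 'north', 'east', 'south'],
    'at_risk':          [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
})
X_train, X_test, y_train, y_test = preprocess(
    survey, numeric=['rainfall_mm', 'sanitation_score'],
    categorical=['region'], target='at_risk')
print(X_train.shape, X_test.shape)   # (8, 5) (2, 5)
```

In practice, scikit‑learn's `ColumnTransformer` combines `SimpleImputer`, `StandardScaler` and `OneHotEncoder` into one object that is fitted once and reused. It packages the same steps.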

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumes X_train (preprocessed features) and y_train (1 = high-risk village)
# come from the 80/20 split described above.
model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),                    # randomly drops 20% of units during training
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # predicted probability of high risk
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy', tf.keras.metrics.AUC(name='auc')])  # ROC-AUC by default

# Hold out 10% of the training data for validation while fitting
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1)
```

### 7.1.3 Interpretability & Fairness

- **SHAP values** quantify each feature's contribution to an individual prediction.
- Run a **subgroup analysis** that compares error rates and selection rates across income groups, to ensure model decisions do not disproportionately affect low‑income groups.
- Provide **counterfactual explanations** to policymakers, e.g., which changes in a village's inputs would move it out of the high‑risk group.

## 7.2 Natural Language Processing (NLP)

### 7.2.1 Why NLP Matters in Social Good

Text data (policy documents, social media posts, open‑ended survey responses) contains rich but unstructured insights into public sentiment, needs, and barriers. NLP turns this unstructured text into actionable metrics.

### 7.2.2 Core NLP Tasks

| Task | Use‑Case in Social Good |
|------|------------------------|
| Sentiment Analysis | Gauge community mood during disaster relief |
| Topic Modeling | Identify dominant themes in NGO reports |
| Named Entity Recognition (NER) | Extract locations and organizations from news feeds |
| Text Summarization | Condense long reports into concise policy briefs |

### 7.2.3 Pipeline Example: Disaster Relief Tweet Analysis

```python
import pandas as pd
import tweepy
from transformers import pipeline

# 1. Collect tweets. search_recent_tweets returns at most 100 tweets per
#    request, so page through results until 1,000 tweets are collected.
client = tweepy.Client(bearer_token='YOUR_TOKEN')
paginator = tweepy.Paginator(client.search_recent_tweets,
                             query='earthquake -is:retweet lang:en',  # default model is English-only
                             max_results=100)
texts = [tweet.text for tweet in paginator.flatten(limit=1000)]

# 2. Sentiment (downloads a default English sentiment model on first use)
sentiment = pipeline('sentiment-analysis')
results = sentiment(texts, truncation=True)

# 3. Aggregate
df = pd.DataFrame({'text': texts, 'sentiment': [r['label'] for r in results]})
print(df['sentiment'].value_counts())
```

### 7.2.4 Bias Mitigation

- Use **balanced corpora** across demographic groups.
- Evaluate **fairness metrics** for each sentiment class, e.g., equal opportunity (equal true‑positive rates across groups).
- Apply **embedding de‑biasing** (e.g., debiased word vectors).

## 7.3 Geospatial Analytics

### 7.3.1 Spatial Data Foundations

| Term | Definition |
|------|------------|
| **Raster** | Grid‑based representation (e.g., satellite imagery). |
| **Vector** | Points, lines, and polygons (e.g., city boundaries). |
| **Spatial Autocorrelation** | Degree to which values at nearby locations resemble each other, commonly measured with Moran’s I. |
| **Hotspot Analysis** | Identification of statistically significant spatial clusters of high values. |

### 7.3.2 Tools & Libraries

- **GeoPandas** – Pythonic geospatial data manipulation.
- **PySAL** – Spatial analysis toolbox.
- **Leaflet / Folium** – Interactive web maps (Folium is a Python wrapper around Leaflet).
- **ArcGIS / QGIS** – GUI‑based GIS platforms.

### 7.3.3 Case: Mapping Cholera Outbreaks

1. **Data** – Daily case counts per district, a population density raster, and water source locations.
2. **Spatial Join** – Associate cases with the nearest water source.
3. **Hotspot** – Apply Getis‑Ord Gi* to locate high‑risk clusters.
4. **Visualization** – Folium choropleth with overlay markers.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from esda import G_Local
from libpysal.weights import Queen

districts = gpd.read_file('districts.shp')
cases = pd.read_csv('cholera_cases.csv')   # one row per district per day

# Aggregate daily counts to one total per district, then merge
totals = cases.groupby('district_id', as_index=False)['cases'].sum()
districts = districts.merge(totals, on='district_id')

# Spatial weights: districts that share an edge or a vertex are neighbors
w = Queen.from_dataframe(districts)

# Local Getis-Ord Gi* (star=True includes each district's own value);
# 999 random permutations give pseudo p-values for significance
g_local = G_Local(districts['cases'].to_numpy(dtype=float), w, star=True, permutations=999)

# Add results: standardized Gi* z-scores and pseudo p-values
districts['Gi_z'] = g_local.Zs
districts['p_value'] = g_local.p_sim

# Static plot; high positive z-scores mark hot spots.
# districts.explore(column='Gi_z') renders an interactive Folium map instead.
districts.plot(column='Gi_z', cmap='Reds', legend=True)
plt.title('Cholera Hotspots (Gi* z-scores)')
plt.show()
```

### 7.3.4 Spatial Modeling

- **Spatial and spatio‑temporal regression**: autoregressive models such as SAR and CAR, which account for dependence between neighboring areas.
- **Geographically Weighted Regression (GWR)**: fits a separate, locally weighted regression at each location to capture spatial variation in relationships.
- **Agent‑Based Models**: simulate disease spread on a grid.

## 7.4 Explainability & Responsible AI

- **SHAP (SHapley Additive exPlanations)**: attributes each prediction to its input features using Shapley values from cooperative game theory.
- **LIME (Local Interpretable Model‑agnostic Explanations)**: fits a simple surrogate model (e.g., a sparse linear model) that approximates the original model around a single prediction.
- **Counterfactual Generation**: "what if" scenarios that show how inputs would need to change to alter a prediction, useful for policy simulation.
- **Model Cards**: documentation of a model's training data, evaluation metrics, intended use, and limitations.

## 7.5 Scaling AI Models for Impact

| Consideration | Practical Tip |
|---------------|---------------|
| **Compute** | Use cloud TPUs (e.g., on Google Cloud) or on‑prem GPUs for large models. |
| **Edge Deployment** | Deploy TinyML models on mobile health kits in low‑connectivity areas. |
| **Reproducibility** | Containerize pipelines (Docker) and version datasets (DVC). |
| **Energy Efficiency** | Prefer sparse architectures and prune models; track carbon footprints. |
| **Data Pipelines** | Orchestrate ETL, model training, and monitoring with Airflow or Prefect. |
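
The energy‑efficiency tip mentions pruning. One common variant, unstructured magnitude pruning, zeroes out the smallest‑magnitude weights of a trained layer. The NumPy sketch below is a minimal illustration of that variant only. The `magnitude_prune` helper, the 80% sparsity target and the random weight matrix are hypothetical. Production tooling such as the TensorFlow Model Optimization Toolkit or PyTorch's `torch.nn.utils.prune` applies the same idea during or after training. Pruning is usually followed by fine‑tuning to recover accuracy.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Hypothetical helper: zero out the `sparsity` fraction of weights with smallest |value|.

    Returns the pruned copy and a binary mask (1 = weight kept).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=weights.dtype)
    if n_prune > 0:
        # Indices of the n_prune smallest-magnitude weights (ties broken arbitrarily)
        smallest = np.argpartition(np.abs(weights).ravel(), n_prune - 1)[:n_prune]
        mask[smallest] = 0
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64)).astype(np.float32)   # e.g. a dense layer's kernel
W_pruned, mask = magnitude_prune(W, sparsity=0.8)

kept = int(mask.sum())
print(f"kept {kept} of {W.size} weights ({kept / W.size:.0%})")
```

Zeroed weights only save compute and energy when the runtime or hardware exploits sparsity, for example through sparse kernels or compressed storage. Otherwise a dense matrix multiply costs the same whether or not its entries are zero.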

## 7.6 Case Study: AI for Refugee Support

| Stage | Action | Outcome |
|-------|--------|---------|
| Data | SMS, social media, UNHCR records | Rich, multilingual dataset |
| Model | Transformer‑based NER for identity extraction; BERT‑based sentiment analysis of stated needs | 92% F1 for NER; early detection of food insecurity |
| Deployment | Cloud API, integrated into the UNHCR portal | 30% faster allocation of aid resources |
| Impact | | 15% reduction in unmet basic needs within 6 months |

## 7.7 Practical Toolkit

| Domain | Recommended Libraries | Why |
|--------|-----------------------|-----|
| Deep Learning | TensorFlow, PyTorch, Keras | Mature ecosystems, GPU support |
| NLP | Hugging Face Transformers, spaCy, Gensim | State‑of‑the‑art models & pipelines |
| Geospatial | GeoPandas, Rasterio, PySAL, Folium | Pythonic GIS & spatial statistics |
| Model Explainability | SHAP, LIME, ELI5 | Model transparency |
| Orchestration | Airflow, Prefect, Dagster | Workflow automation |

> **Tip**: Build a **starter kit** that bundles sample datasets, notebooks, and Docker images so newcomers can hit the ground running.

## 7.8 Ethical & Responsible Practices

- **Data Privacy**: Use differential privacy for sensitive health data.
- **Bias Auditing**: Regularly assess model performance across demographic slices.
- **Transparency**: Publish model cards and, where feasible, open‑source the code.
- **Community Engagement**: Validate insights with local stakeholders before deployment.

## 7.9 Takeaway

Advanced analytics and AI unlock deeper, more actionable insights for social good, but only when coupled with robust governance, explainability, and ethical stewardship. Neural networks power predictive precision, NLP brings voices captured in unstructured text to the forefront, and geospatial analytics maps where help is most needed. Together, they form a triad that transforms raw data into targeted, compassionate interventions.