Data Science for Decision Makers: Turning Numbers into Insight – Chapter 4
Published 2026-02-24 13:41
# Chapter 4: Statistical Inference – From Data to Decision
In the previous chapter you learned how to explore data and uncover patterns that guide modeling choices. Now we shift our focus from **exploratory** to **confirmatory** analysis. Statistical inference provides the toolkit that lets you move from observations to conclusions, quantify uncertainty, and support decisions with evidence.
---
## 4.1 Why Inference Matters for Decision Makers
- **Uncertainty is inevitable**. Even the best data come from noisy sources. Inference teaches you how to measure that uncertainty.
- **Decisions require evidence**. Whether you’re a product manager launching a new feature or a CFO tightening a budget, you need to know if a change *really* works.
- **Regulatory and ethical stakes**. In healthcare, finance, or public policy, conclusions must be statistically sound to avoid costly mistakes.
> *In practice, the difference between “I think this worked” and “The evidence shows this worked” is what separates a data‑driven culture from a data‑heavy one.*
---
## 4.2 Core Concepts
| Concept | What it is | Why it matters | Typical Formula |
|---------|------------|----------------|-----------------|
| **Hypothesis Test** | Formal comparison of two or more groups | Determines if observed differences are likely due to chance | *t‑test, chi‑square, ANOVA* |
| **p‑value** | Probability, under the null, of a result at least as extreme as the one observed | Provides a threshold for significance | *P(T ≥ t_obs \| H₀)* |
| **Confidence Interval (CI)** | Range of values that likely contains the true parameter | Communicates precision of an estimate | *Mean ± 1.96×SE* |
| **Effect Size** | Magnitude of an effect, independent of sample size | Helps decide if a statistically significant result is practically useful | *Cohen’s d, Odds Ratio, R²* |
| **Multiple Testing** | Conducting many tests simultaneously | Controls the chance of false positives | *Bonferroni, Benjamini–Hochberg* |
> **Tip:** A 95% CI for a difference that excludes zero indicates statistical significance at the 5% level, but always interpret it alongside the effect size.
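The CI formula from the table can be computed in a few lines; a minimal sketch on simulated data (the metric values here are illustrative, not from any real dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=100.0, scale=15.0, size=200)  # hypothetical metric values

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

# 95% CI using the normal approximation from the table: mean ± 1.96 × SE
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

For small samples, replacing 1.96 with the appropriate t‑distribution quantile gives a slightly wider, more honest interval.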
---
## 4.3 The Inference Workflow
1. **Formulate a research question** – e.g., "Does the new checkout flow reduce cart abandonment?"
2. **Specify null and alternative hypotheses** – H0: No difference; H1: Difference exists.
3. **Choose the right test** – Depends on data type, distribution, and sample size.
4. **Check assumptions** – Normality, homogeneity of variance, independence.
5. **Compute test statistic & p‑value** – Often via statistical software.
6. **Calculate confidence intervals** – Gives a range for the effect size.
7. **Interpret** – Combine p‑value, CI, effect size, and business context.
8. **Report** – Present results with clear visuals and plain‑English explanations.
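Steps 4 through 6 of the workflow can be sketched with SciPy; `sample_a` and `sample_b` are hypothetical arrays holding a continuous metric for two groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.normal(50, 10, 500)  # control group (simulated)
sample_b = rng.normal(52, 10, 500)  # treatment group (simulated)

# Step 4: check assumptions (normality, homogeneity of variance)
_, p_norm = stats.shapiro(sample_a)
_, p_var = stats.levene(sample_a, sample_b)

# Step 5: Welch's t-test (robust to unequal variances)
t_stat, p_val = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Step 6: 95% CI for the difference in means (normal approximation)
diff = sample_b.mean() - sample_a.mean()
se = np.sqrt(sample_a.var(ddof=1) / len(sample_a)
             + sample_b.var(ddof=1) / len(sample_b))
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(f"t={t_stat:.2f}, p={p_val:.4f}, 95% CI for diff: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Steps 7 and 8 remain human work: the numbers only become a decision once they are weighed against business context.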
---
## 4.4 Common Tests in a Nutshell
### 4.4.1 Two‑Sample t‑Test
- **Use**: Compare means of two independent groups.
- **Assumptions**: Normally distributed outcomes, equal variances (check with Levene’s test; Welch’s variant drops this requirement), independent observations.
- **Example**: Testing if average order value differs between customers exposed to a new recommendation engine.
```python
from scipy import stats

# equal_var=False runs Welch's variant, which stays valid when variances differ
t_stat, p_val = stats.ttest_ind(sample_a, sample_b, equal_var=False)
```
### 4.4.2 Chi‑Square Test of Independence
- **Use**: Assess association between two categorical variables.
- **Assumptions**: Expected frequency of at least 5 in each cell, independent observations.
- **Example**: Relationship between device type (mobile vs. desktop) and click‑through rate.
```python
from scipy import stats

# Returns the chi-square statistic, p-value, degrees of freedom, and expected counts
chi2, p_val, dof, exp = stats.chi2_contingency(contingency_table)
```
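The `contingency_table` above has to be built from raw observations first; one way to do that, assuming the data live in a pandas DataFrame with hypothetical `device` and `clicked` columns:

```python
import pandas as pd

# Hypothetical raw data: one row per session
df = pd.DataFrame({
    "device": ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"],
    "clicked": [1, 0, 1, 1, 1, 0],
})

# Cross-tabulate device type against click outcome
contingency_table = pd.crosstab(df["device"], df["clicked"])
print(contingency_table)
```

With a real dataset, check the expected-frequency assumption before trusting the chi-square result; a table this small would not meet it.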
### 4.4.3 ANOVA (Analysis of Variance)
- **Use**: Compare means across three or more groups.
- **Assumptions**: Normality, equal variances, independence.
- **Example**: Evaluating three pricing tiers’ impact on subscription churn.
```python
from scipy import stats

# One-way ANOVA across the three groups; rejects H0 if any group mean differs
f_stat, p_val = stats.f_oneway(group1, group2, group3)
```
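A significant F‑statistic says that at least one group mean differs, but not which. A common follow‑up is Tukey’s HSD for pairwise comparisons; a sketch with simulated groups (SciPy 1.8+ provides `tukey_hsd`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(10, 2, 100)  # e.g. churn metric under pricing tier 1 (simulated)
group2 = rng.normal(10, 2, 100)  # tier 2 (simulated)
group3 = rng.normal(12, 2, 100)  # tier 3 (simulated, shifted mean)

f_stat, p_val = stats.f_oneway(group1, group2, group3)

# Tukey's HSD: which pairs of tiers actually differ?
res = stats.tukey_hsd(group1, group2, group3)
print(res)
```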
### 4.4.4 Logistic Regression Inference
- **Use**: Predict a binary outcome and interpret odds ratios.
- **Assumptions**: Linearity in log‑odds, no perfect multicollinearity, independence.
- **Example**: Modeling the probability that a lead converts based on campaign features.
```python
import statsmodels.api as sm

# X should include a constant column (sm.add_constant) for the intercept
logit = sm.Logit(y, X).fit()
print(logit.summary())
```
---
## 4.5 Real‑World Case Study: A/B Test for a New Feature
**Scenario**: The e‑commerce team rolls out a new product recommendation widget and wants to know if it increases average order value (AOV).
1. **Define hypotheses**
- H0: The widget does not change AOV.
- H1: The widget changes AOV (a two‑sided alternative).
2. **Randomly assign** 10,000 users to Control (old UI) and Treatment (new widget).
3. **Collect data**: AOV for each user over a 30‑day window.
4. **Check assumptions**: Shapiro–Wilk shows no evidence against normality; Levene’s test shows no evidence of unequal variances.
5. **Run t‑test**: t‑stat = 4.90, p‑value < 0.0001.
6. **Compute 95% CI** for the mean difference: [0.15, 0.35] dollars.
7. **Effect size**: Cohen’s d ≈ 0.10 (small; per‑order variability is large relative to the $0.25 lift).
8. **Interpret**: Statistically significant, but the mean increase is modest. Business impact depends on conversion volume.
9. **Report**: Present a bar chart, the CI, and a plain‑English note: *"The new widget increases AOV by about $0.25 on average, a modest lift that may still justify the development cost if scaled across millions of users."*
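Steps 6 and 7 can be reproduced in code; a sketch using simulated order values (the distributions are illustrative and only roughly calibrated to the numbers above):

```python
import numpy as np

rng = np.random.default_rng(3)
control = rng.normal(25.00, 2.5, 5000)    # simulated AOV, old UI
treatment = rng.normal(25.25, 2.5, 5000)  # simulated AOV, new widget

diff = treatment.mean() - control.mean()

# Step 6: 95% CI for the mean difference (normal approximation)
se = np.sqrt(control.var(ddof=1) / len(control)
             + treatment.var(ddof=1) / len(treatment))
ci = (diff - 1.96 * se, diff + 1.96 * se)

# Step 7: Cohen's d = mean difference / pooled standard deviation
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = diff / pooled_sd
print(f"diff={diff:.2f}, 95% CI=[{ci[0]:.2f}, {ci[1]:.2f}], Cohen's d={d:.2f}")
```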
> *Lesson*: Statistical significance does not automatically translate to business significance. Combine inference with cost‑benefit analysis.
---
## 4.6 Ethical & Practical Pitfalls
| Pitfall | What It Looks Like | How to Avoid It |
|---------|--------------------|-----------------|
| **p‑hacking** | Tweaking tests until p < 0.05 | Pre‑register hypotheses and analyses; confirm exploratory findings on fresh data |
| **Multiple Comparisons** | Many subgroup tests inflate false positives | Apply FDR control or Bonferroni correction |
| **Over‑reliance on p‑values** | Treating p < 0.05 as a hard threshold | Report effect sizes & confidence intervals alongside p‑values |
| **Ignoring Assumptions** | Using t‑test with heavily skewed data | Transform data or use non‑parametric alternatives |
| **Misinterpreting Causality** | Correlation ≠ causation | Use randomized experiments or causal inference techniques |
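The multiple‑comparisons row above can be handled with `statsmodels`; a sketch applying the Benjamini–Hochberg procedure to a hypothetical batch of subgroup p‑values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from ten subgroup tests
p_values = [0.001, 0.008, 0.020, 0.041, 0.049,
            0.120, 0.300, 0.450, 0.700, 0.950]

# Benjamini-Hochberg keeps the expected false-discovery rate below 5%
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, pa, r in zip(p_values, p_adj, reject):
    print(f"raw p={p:.3f}  adjusted p={pa:.3f}  significant: {r}")
```

Note how several raw p‑values under 0.05 no longer survive once the correction accounts for the number of tests.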
---
## 4.7 Quick Reference Cheat Sheet
| Situation | Test | Key Output |
|-----------|------|------------|
| Two means, equal variances | Independent t‑test | t‑stat, p‑value |
| Two means, unequal variances | Welch’s t‑test | t‑stat, p‑value |
| Two categorical variables | Chi‑square test of independence | χ², p‑value |
| Categorical variables with many levels | Chi‑square (larger contingency table) | χ², p‑value |
| Continuous outcome, one categorical predictor | One‑way ANOVA | F‑stat, p‑value |
| Multiple continuous predictors, binary outcome | Logistic regression | Odds ratios, p‑values |
---
## 4.8 Key Takeaways
- **Inference turns data into evidence**: It quantifies whether patterns are likely real.
- **Always pair p‑values with effect sizes and confidence intervals** to gauge practical significance.
- **Assumptions matter**: Violating them can invalidate results; check before you trust the numbers.
- **Context is king**: Statistical results must be interpreted in light of business goals, cost structures, and stakeholder priorities.
- **Ethical rigor**: Pre‑specify tests, control false positives, and transparently report all findings.
---
> *Ready for the next leap? In Chapter 5 we’ll build predictive models that not only describe the data but also forecast future outcomes, powering proactive decision making.*