Lecture 08: Power Analysis

Experimental Design in Education

Author: Jihong Zhang, Ph.D.
Affiliation: Educational Statistics and Research Methods (ESRM) Program, University of Arkansas

Published: March 7, 2025
Modified: February 2, 2026

2 Introduction

2.1 Learning Objectives

By the end of this lecture, you will be able to:

  1. Define statistical power and explain its importance in experimental design
  2. Understand the relationship between Type I and Type II errors
  3. Identify the key components of power analysis
  4. Calculate required sample sizes for different experimental designs
  5. Apply power analysis to common research scenarios

2.2 Why This Matters

Note: Real-World Impact in Education Research
  • Underpowered studies waste limited education funding and may miss important interventions
  • Overpowered studies consume resources that could benefit more students
  • Power analysis is required for grant proposals (IES, NSF, NIH)
  • Helps determine if research questions are feasible with available schools/classrooms
  • Ethical obligation: Don’t ask teachers/students to participate in futile studies

3 What is Power Analysis?

3.1 Definition: Statistical Power

Tip: In Plain English

Imagine you’re testing whether a new teaching method works better than the old one. Power is the chance that your study will be able to detect the improvement, if the improvement is really there.

Think of it like a metal detector:

  • A high-power detector can find small coins buried deep in the sand
  • A low-power detector might miss those same coins, even though they’re there

Formal Definition: Statistical power is the probability that a study will correctly reject a false null hypothesis.

\text{Power} = 1 - \beta

where \beta is the probability of a Type II error (false negative)

Tip: In Plain English

In other words: Power tells us how good our study is at discovering real effects. Higher power means we’re less likely to miss something important.
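To make the definition concrete, power can be estimated by simulation: generate many datasets in which the effect truly exists and count how often the test rejects H₀. A minimal sketch (the sample size and effect size below are illustrative, not from the lecture):

```r
# Estimate power by simulation: proportion of significant results
# across many studies where a true effect exists
set.seed(123)
n_sims <- 5000
n      <- 30    # students per simulated study (illustrative)
true_d <- 0.5   # true standardized effect (illustrative)

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = true_d, sd = 1)  # data generated under H1
  t.test(x, mu = 0)$p.value             # one-sample t-test of H0: mu = 0
})

mean(p_values < 0.05)  # empirical power; theory gives about 0.75 here
```

Raising n or true_d pushes the empirical power toward 1, mirroring the metal-detector analogy above.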

3.2 The Four Possible Outcomes

flowchart TD
    A[Reality: Null Hypothesis is TRUE] --> B{Decision}
    B -->|Reject H0| C[Type I Error<br/>False Positive<br/>α]
    B -->|Fail to Reject H0| D[Correct Decision<br/>1 - α]

    E[Reality: Alternative Hypothesis is TRUE] --> F{Decision}
    F -->|Reject H0| G[Correct Decision<br/>Power = 1 - β]
    F -->|Fail to Reject H0| H[Type II Error<br/>False Negative<br/>β]

    style C fill:#ff6b6b
    style H fill:#ff6b6b
    style D fill:#51cf66
    style G fill:#51cf66

3.3 Understanding Errors

3.3.1 Type I Error (α)

  • False Positive
  • Rejecting true null hypothesis
  • Usually set at α = 0.05
  • “Finding an effect that isn’t there”

3.3.2 Type II Error (β)

  • False Negative
  • Failing to reject false null hypothesis
  • Usually set at β = 0.20
  • “Missing an effect that is there”

3.4 Power in Context

Visualization of statistical power showing the relationship between null and alternative distributions


4 Why Do Power Analysis?

4.1 Critical Reasons

  1. Resource Efficiency
    • Avoid wasting time, money, and effort on underpowered studies
    • Don’t collect more data than necessary
  2. Ethical Considerations
    • Minimize participant burden and exposure to risks
    • Ensure research has realistic chance of success
  3. Scientific Integrity
    • Detect meaningful effects when they exist
    • Avoid publishing false negatives
  4. Funding Requirements
    • Grant agencies require power analyses
    • Demonstrates research feasibility

4.2 Consequences of Low Power

4.2.1 Too Small Sample

  • Miss effective interventions (Type II error)
  • Waste teacher and student time
  • Publish false negatives (“it doesn’t work”)
  • Discourage future research on promising approaches
  • Contribute to replication crisis

4.2.2 Too Large Sample

  • Unnecessary recruitment burden on schools
  • Waste limited education funding
  • Find statistically significant but educationally trivial results
  • Fewer resources for other important studies

4.3 Power Analysis is NOT About Gaming the System

Warning: Common Misconceptions

Power analysis should help determine if a question can be reasonably answered, not:

  • Justify a predetermined sample size
  • Defend what you want to study anyway
  • Manipulate effect sizes to get funding
  • Be written defensively after the study is planned

5 The Logic of Power Analysis

5.1 Five Key Components

Every power analysis involves specifying values for:

  1. Significance Level (α): Usually 0.05
  2. Power (1 - β): Usually 0.80 (minimum 80%)
  3. Effect Size (δ): Meaningful difference to detect
  4. Variability (σ): Standard deviation of measurements
  5. Sample Size (n): What we usually solve for
Note

If you know any four, you can calculate the fifth!
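The pwr package (used throughout this lecture) implements this directly: leave exactly one of the arguments to pwr.t.test() unspecified and R solves for it. A sketch with illustrative values:

```r
library(pwr)

# Solve for sample size: supply d, alpha, and power
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "two.sample")$n       # about 64 per group

# Solve for power: supply n, d, and alpha
pwr.t.test(n = 64, d = 0.5, sig.level = 0.05,
           type = "two.sample")$power   # about 0.80

# Solve for detectable effect: supply n, alpha, and power
pwr.t.test(n = 64, sig.level = 0.05, power = 0.80,
           type = "two.sample")$d       # about 0.5
```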

5.2 Significance Level (α)

  • Type I error rate - probability of false positive
  • Conventionally set at α = 0.05 (5%)
  • Controls how “strict” we are about declaring effects significant
  • Relates to critical value for hypothesis testing
# Critical value for one-tailed test, alpha = 0.05
qnorm(0.95)  # z = 1.645
[1] 1.644854
# Critical value for two-tailed test, alpha = 0.05
qnorm(0.975)  # z = 1.96
[1] 1.959964

5.3 Power (1 - β)

  • Probability of detecting a true effect
  • Conventionally set at 0.80 (80%) or higher
  • Higher power = more certainty in detecting effects
  • Common choices: 70%, 80%, 90%
Important: Minimum Standards
  • 80% power is considered minimum acceptable
  • 90% power for clinical trials or high-stakes research
  • Choice depends on how certain you want to be

5.4 Effect Size (δ)

The meaningful difference you want to detect

Must be specified based on:

  • Prior research literature
  • Educational/practical significance (not just statistical)
  • Subject matter expertise
  • Pilot data
Tip: Cohen's d (Standardized Effect Size)

Often expressed as: d = \frac{|\delta|}{\sigma} (effect size relative to SD)

  • Small effect: d ≈ 0.2 (subtle but meaningful in education)
  • Medium effect: d ≈ 0.5 (typical for many interventions)
  • Large effect: d ≈ 0.8 (rare in education research)

Reality check: Most education interventions have d = 0.2-0.4

5.5 Variability (σ)

Standard deviation of measurements

Estimate from:

  • Prior published research
  • Pilot studies
  • Similar studies in literature
  • Expert knowledge
Warning: Impact on Sample Size

Sample size increases proportionally to variance: n \propto \sigma^2. Doubling the SD quadruples the required sample size!

5.6 Relationships Among Components
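A numeric sketch of these relationships, using the one-sample formula n = (z_\alpha + z_\beta)^2 (\sigma/\delta)^2 developed later in this lecture (baseline values are illustrative):

```r
# Required n from the z-approximation, to show how each component
# moves the sample size (one-tailed test; baseline values illustrative)
n_required <- function(alpha = 0.05, power = 0.80, delta = 5, sigma = 20) {
  (qnorm(1 - alpha) + qnorm(power))^2 * (sigma / delta)^2
}

n_required()               # baseline: about 99
n_required(power = 0.90)   # more power      -> larger n (about 137)
n_required(delta = 2.5)    # half the effect -> 4x the n (about 396)
n_required(sigma = 40)     # double the SD   -> 4x the n (about 396)
```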

5.7 Understanding (σ/δ)²: The Signal-to-Noise Ratio

Note: Why Does Sample Size Depend on (σ/δ)²?

The ratio σ/δ represents noise-to-signal:

  • δ = effect size (the “signal” you want to detect)
  • σ = variability (the “noise” that obscures the signal)

Intuitive examples:

Scenario                    δ    σ    σ/δ   Interpretation    n needed
Strong signal, low noise    10   5    0.5   Easy to detect    Small
Weak signal, low noise      5    5    1.0   Moderate          Medium
Weak signal, high noise     5    20   4.0   Hard to detect    Large

Why squared?

  • Doubling noise (σ) means 4× more participants needed
  • Halving effect size (δ) means 4× more participants needed
  • This quadratic relationship comes from variance, not an arbitrary choice!

5.8 Key Facts to Remember

  1. Sample size increases with power: More power → larger sample needed
  2. Sample size increases with smaller detectable differences: Smaller effect → larger sample (quadratically!)
  3. Sample size increases with variance: More variability → larger sample (quadratically!)
  4. One-sided tests require smaller samples than two-sided tests
  5. The (z_α + z_β)² term represents the required distance between distributions
  6. The (σ/δ)² term represents the signal-to-noise ratio

6 Power Analysis: One-Sample Case

6.1 One-Sample Situation

Testing if a mean differs from a known value:

  • H_0: \mu = \mu_0 (null hypothesis)
  • H_1: \mu \neq \mu_0 or \mu < \mu_0 or \mu > \mu_0 (alternative)

Required sample size formula:

n = (z_\alpha + z_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2

where:

  • z_\alpha = critical value for significance level
  • z_\beta = critical value for power
  • \delta = \mu_1 - \mu_0 = effect size

6.2 Where Does This Formula Come From?

Note: The Logic Behind the Formula

Key insight: The distributions under H₀ and H₁ must be separated enough that:

  1. The critical value from H₀ cuts off α (Type I error)
  2. The critical value from H₁ cuts off β (Type II error)

The distance between distributions (in standard error units) must span both critical values:

\frac{\delta}{SE} = z_\alpha + z_\beta

Since SE = \frac{\sigma}{\sqrt{n}}, we have:

\frac{\delta}{\sigma/\sqrt{n}} = z_\alpha + z_\beta

Solving for n: \sqrt{n} = \frac{(z_\alpha + z_\beta) \cdot \sigma}{\delta}

Square both sides: n = (z_\alpha + z_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2

6.3 Visual Intuition

Tip: The Key Insight

The formula is squared because:

  1. We need: \delta = (z_\alpha + z_\beta) \times SE
  2. Since SE = \sigma/\sqrt{n}, we have: \delta = (z_\alpha + z_\beta) \times \frac{\sigma}{\sqrt{n}}
  3. Rearranging for \sqrt{n}: \sqrt{n} = \frac{(z_\alpha + z_\beta) \times \sigma}{\delta}
  4. Squaring both sides to isolate n: n = \frac{(z_\alpha + z_\beta)^2 \times \sigma^2}{\delta^2}

The squaring comes from solving for n, not from squaring the sum first!

6.4 Example: Reading Achievement Study

Note: Research Question

Do students in a new literacy program have higher reading scores compared to the national average?

Known Information:

  • National average reading score: μ₀ = 100
  • Standard deviation: σ = 20
  • Meaningful difference: δ = 5 points (improvement)
  • Desired power: 80%
  • Significance level: α = 0.05 (one-tailed)

6.5 Calculating Sample Size: Step by Step

# Given values
mu0 <- 100      # national average score
sigma <- 20     # standard deviation
delta <- 5      # meaningful improvement
alpha <- 0.05   # significance level
power <- 0.80   # desired power

# Step 1: Find critical values
z_alpha <- qnorm(1 - alpha)  # = 1.645 for one-tailed
z_beta <- qnorm(power)       # = 0.842 for 80% power

cat("Step 1: Critical values\n")
cat("  z_alpha =", round(z_alpha, 3), "\n")
cat("  z_beta  =", round(z_beta, 3), "\n\n")

# Step 2: Calculate required distance (in SE units)
required_distance <- z_alpha + z_beta
cat("Step 2: Required distance between distributions\n")
cat("  z_alpha + z_beta =", round(required_distance, 3), "standard errors\n\n")

# Step 3: Set up the equation
# We need: delta = required_distance × SE
# We need: delta = required_distance × (sigma / sqrt(n))
# Solve for sqrt(n)
sqrt_n <- required_distance * sigma / delta
cat("Step 3: Solve for sqrt(n)\n")
cat("  sqrt(n) = (", round(required_distance, 3), " × ", sigma, ") / ", delta, "\n")
cat("  sqrt(n) =", round(sqrt_n, 3), "\n\n")

# Step 4: Square to get n
n <- sqrt_n^2
cat("Step 4: Square to get n\n")
cat("  n = (", round(sqrt_n, 3), ")²\n")
cat("  n =", round(n, 2), "\n\n")

cat("Required sample size:", ceiling(n), "students")
Step 1: Critical values
  z_alpha = 1.645 
  z_beta  = 0.842 

Step 2: Required distance between distributions
  z_alpha + z_beta = 2.486 standard errors

Step 3: Solve for sqrt(n)
  sqrt(n) = ( 2.486  ×  20 ) /  5 
  sqrt(n) = 9.946 

Step 4: Square to get n
  n = ( 9.946 )²
  n = 98.92 

Required sample size: 99 students

6.6 Formula Summary: What Each Part Means

n = \underbrace{(z_\alpha + z_\beta)^2}_{\text{Separation needed}} \times \underbrace{\left(\frac{\sigma}{\delta}\right)^2}_{\text{Signal-to-noise}}

6.6.1 (z_\alpha + z_\beta)^2

What it controls:

  • Type I error (α)
  • Type II error (β)
  • Power (1 - β)

Typical values:

  • α = 0.05, Power = 80%
  • z_α = 1.645, z_β = 0.842
  • Sum = 2.487
  • (z_\alpha + z_\beta)^2 \approx 6.2

6.6.2 (\sigma/\delta)^2

What it controls:

  • Effect size you want to detect (δ)
  • Population variability (σ)

Typical values:

  • Small effect: σ/δ = 5 → (σ/δ)² = 25
  • Medium effect: σ/δ = 2 → (σ/δ)² = 4
  • Large effect: σ/δ = 1.25 → (σ/δ)² ≈ 1.6

Total: n = 6.2 × (σ/δ)²
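A quick numeric check of this summary (one-tailed α = 0.05, 80% power):

```r
# (z_alpha + z_beta)^2 for one-tailed alpha = 0.05 and 80% power
base <- (qnorm(0.95) + qnorm(0.80))^2   # about 6.18

ceiling(base * 5^2)      # small effect,  sigma/delta = 5    -> 155
ceiling(base * 2^2)      # medium effect, sigma/delta = 2    -> 25
ceiling(base * 1.25^2)   # large effect,  sigma/delta = 1.25 -> 10
```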

6.7 Using the pwr Package

# Effect size (standardized)
effect_size <- abs(delta) / sigma

# Power analysis using pwr package
result <- pwr.t.test(
  d = effect_size,
  sig.level = alpha,
  power = power,
  type = "one.sample",
  alternative = "greater"  # or "two.sided", "less", "greater"
)

print(result)

     One-sample t test power calculation 

              n = 100.2877
              d = 0.25
      sig.level = 0.05
          power = 0.8
    alternative = greater
# Verify our manual calculation approximately matches
cat("\nManual calculation:",
    ceiling(n),
    "\npwr package:",
    ceiling(result$n))

Manual calculation: 99 
pwr package: 101

The small gap (99 vs. 101) arises because the manual formula uses normal (z) critical values, while pwr.t.test() uses the slightly wider t distribution.

6.8 Paired Comparisons

The one-sample formula applies to paired (blocked) designs:

Response: D = Y_2 - Y_1 (difference between paired measurements)

Examples:

  • Pre-test vs. Post-test
  • Treatment vs. Control on same subject
  • Left eye vs. Right eye

Key: Need SD of the differences, not individual measurements
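When only the SDs of the two measurements are known, the SD of the differences can be sketched from the pre-post correlation (the values below are assumptions for illustration):

```r
# SD of paired differences: sd_D = sqrt(s1^2 + s2^2 - 2*r*s1*s2)
s1 <- 10   # SD of pre-test scores (assumed)
s2 <- 10   # SD of post-test scores (assumed)
r  <- 0.7  # assumed pre-post correlation

sd_diff <- sqrt(s1^2 + s2^2 - 2 * r * s1 * s2)
sd_diff  # about 7.75, smaller than either individual SD
```

The stronger the pairing (higher r), the smaller sd_diff and the fewer pairs needed; this is exactly why blocking pays off.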

6.9 Example: Pre-Post Test Anxiety Study

Note: Research Question

Does a mindfulness intervention reduce student test anxiety?

Study Design:

  • Paired comparison: post-intervention vs. baseline on same students
  • SD of differences: σ = 5 points
  • Meaningful reduction: δ = -2.5 points
  • Power: 80%, α = 0.05
# Calculate sample size for paired comparison
sigma_diff <- 5.0
delta_anxiety <- -2.5
effect <- abs(delta_anxiety) / sigma_diff

pwr.t.test(d = effect, sig.level = 0.05, power = 0.80,
           type = "paired", alternative = "two.sided")

     Paired t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*

6.10 Exploring Power Curves

# Create power curve for different sample sizes
sample_sizes <- seq(5, 50, by = 1)
effect_size <- 0.5

power_values <- sapply(sample_sizes, function(n) {
  pwr.t.test(n = n, d = effect_size, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")$power
})

ggplot(data.frame(n = sample_sizes, power = power_values),
       aes(x = n, y = power)) +
  geom_line(linewidth = 1.2, color = "#1971c2") +
  geom_hline(yintercept = 0.80, linetype = "dashed", color = "red") +
  annotate("text", x = 40, y = 0.83, label = "80% Power", color = "red") +
  labs(x = "Sample Size", y = "Statistical Power",
       title = "Power Curve for One-Sample t-test",
       subtitle = "Effect size = 0.5, α = 0.05") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

7 Power Analysis: Two-Sample Case

7.1 Two-Sample Comparison

Comparing means of two independent groups:

  • H_0: \mu_2 - \mu_1 = 0
  • H_1: \mu_2 - \mu_1 \neq 0 (or < or >)

Both means are unknown - different from one-sample case!

7.2 Two-Sample Formulas

7.2.1 Case 1: Unequal Variances

Total sample size: N = (z_\alpha + z_\beta)^2 \left[\frac{\sigma_1 + \sigma_2}{\delta}\right]^2

Allocate samples proportional to SDs: n_1 = N \cdot \frac{\sigma_1}{\sigma_1 + \sigma_2}, \quad n_2 = N \cdot \frac{\sigma_2}{\sigma_1 + \sigma_2}

7.2.2 Case 2: Equal Variances

Sample size per group: n = 2(z_\alpha + z_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2
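As a check, the equal-variance z-approximation lands close to the t-based answer from pwr.t.test() (σ = 0.5 and δ = 0.3 are illustrative planning values):

```r
sigma <- 0.5               # illustrative planning SD
delta <- 0.3               # illustrative meaningful difference
z_alpha <- qnorm(0.975)    # two-tailed alpha = 0.05
z_beta  <- qnorm(0.80)

n_z <- 2 * (z_alpha + z_beta)^2 * (sigma / delta)^2
ceiling(n_z)  # 44 per group; pwr.t.test gives about 45 (t distribution)
```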

7.3 Example: Math Achievement by SES

Note: Research Question

Compare math achievement scores between students from high and low socioeconomic status (SES) backgrounds

Known Information:

  • High SES group: σ₁ = 12 points
  • Low SES group: σ₂ = 15 points (larger variability)
  • Meaningful difference: δ = 10 points
  • Power: 80%, α = 0.05

7.4 Calculation: Unequal Variances

# Given values
sigma1 <- 12    # High SES SD
sigma2 <- 15    # Low SES SD
delta <- 10     # Effect size
alpha <- 0.05
power <- 0.80

# Critical values
z_alpha <- qnorm(1 - alpha/2, lower.tail = TRUE)  # two-tailed
z_beta <- qnorm(power, lower.tail = TRUE)

# Total sample size
N <- (z_alpha + z_beta)^2 * ((sigma1 + sigma2) / delta)^2

# Allocate proportional to SDs
n1 <- ceiling(N * sigma1 / (sigma1 + sigma2))
n2 <- ceiling(N * sigma2 / (sigma1 + sigma2))

cat("High SES students:", n1, "\n")
High SES students: 26 
cat("Low SES students:", n2, "\n")
Low SES students: 32 
cat("Total sample size:", n1 + n2)
Total sample size: 58

7.5 Using pwr for Two-Sample Tests

# Equal-variance approximation: use the pooled SD as the planning value
sigma <- sqrt((sigma1^2 + sigma2^2)/2) # pooled standard deviation
effect_size <- abs(delta) / sigma # Cohen's d

result <- pwr.t.test(
  d = effect_size,
  sig.level = 0.05,
  power = 0.80,
  type = "two.sample",
  alternative = "two.sided"
)

print(result)

     Two-sample t test power calculation 

              n = 29.95364
              d = 0.7362102
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group
cat("\nTotal sample size:", ceiling(result$n * 2))

Total sample size: 60

7.6 Example: Study Skills Workshop

Note: Research Question

Does a study skills workshop improve GPA compared to no intervention?

Study Design:

  • Two independent groups (workshop vs. control)
  • Equal SDs: σ = 0.5 GPA points
  • Meaningful change: δ = 0.3 GPA points
  • Power: 80%, α = 0.05
sigma <- 0.5
delta <- 0.3
effect_size <- delta / sigma

pwr.t.test(d = effect_size, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")

     Two-sample t test power calculation 

              n = 44.58577
              d = 0.6
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

7.7 Visualizing Two-Sample Power
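A two-sample power curve can be drawn the same way as the one-sample curve above (a sketch using base graphics; assumes the pwr package is installed):

```r
library(pwr)

sample_sizes <- seq(10, 150, by = 5)
power_values <- sapply(sample_sizes, function(n) {
  pwr.t.test(n = n, d = 0.5, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$power
})

plot(sample_sizes, power_values, type = "l",
     xlab = "Sample Size per Group", ylab = "Statistical Power",
     main = "Power Curve for Two-Sample t-test (d = 0.5)")
abline(h = 0.80, lty = 2, col = "red")  # 80% power threshold
```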

8 Power Analysis: Other Designs

8.1 Log-Normal Distributions

When data are log-normally distributed:

  • Easier to specify effects as percentage changes
  • Variability expressed as coefficient of variation

c = \frac{\sqrt{\text{Var}(Y)}}{E(Y)}

Sample size per group: n = \frac{2(z_\alpha + z_\beta)^2 c^2}{[\log(1+f)]^2}

where f is the proportionate change (e.g., 0.20 for 20% increase)

8.2 Example: Reaction Time Study

Note: Research Question

Compare reaction times between students with ADHD receiving medication vs. placebo

Known Information:

  • Coefficient of variation: c = 0.30
  • Expected difference: 20% change in reaction time with medication (the calculation uses the magnitude, f = 0.20)
  • Power: 80%, α = 0.05

Note: Reaction times typically follow log-normal distributions

# Given values
c <- 0.30       # Coefficient of variation
f <- 0.20       # 20% proportionate change
alpha <- 0.05
power <- 0.80

# Critical values
z_alpha <- qnorm(1 - alpha)   # one-tailed critical value, 1.645
z_beta <- qnorm(power)        # 0.842 for 80% power

# Sample size calculation
n <- 2 * (z_alpha + z_beta)^2 * c^2 / (log(1 + f))^2
cat("Sample size per group:", ceiling(n), "\n")
Sample size per group: 34 
cat("Total sample size:", ceiling(n) * 2)
Total sample size: 68

8.3 Cluster Randomized Designs

Clusters = groups of experimental units

Common examples:

  • In education: Students within classrooms, classrooms within schools
  • Also: Patients within clinics, siblings within families

Intracluster Correlation (ρ):

  • Measures similarity within clusters
  • Range: 0 to 1
  • ρ = 0: units independent (no clustering effect)
  • ρ = 1: units identical within cluster
  • Typical values in education: 0.10-0.25 for students in classrooms

Sample size adjustment: n = km = 2(z_\alpha + z_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2 [1 + (m-1)\rho]

where k = number of clusters, m = units per cluster

8.4 Impact of Intracluster Correlation
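To see the impact numerically, compute the design effect 1 + (m − 1)ρ across plausible ICC values (a sketch assuming m = 20 students per classroom):

```r
m   <- 20  # students per classroom (illustrative)
rho <- c(0, 0.05, 0.10, 0.15, 0.20, 0.25)

design_effect <- 1 + (m - 1) * rho
data.frame(rho, design_effect)
# at rho = 0.20 the required sample size nearly quintuples (DE = 4.8)
```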

8.5 Cluster Design: Key Insights

  1. High ICC → Need more clusters, not more units per cluster
    • With ρ = 0.20, adding more students per classroom doesn’t help proportionally
    • Better to recruit more classrooms with fewer students each
    • In education research, typical ICC for classrooms: 0.10-0.25
  2. Ignoring ICC leads to underpowered studies
    • Don’t calculate as if students are independent
    • Must account for clustering in design
    • Failure to account for ICC is a common error in educational research
  3. Design Efficiency
    • Maximize number of clusters (k = classrooms, schools)
    • Consider cluster size (m) and practical constraints
    • Balance statistical needs with recruitment feasibility

8.6 Example: Students in Classrooms

Note: Scenario
  • Need 100 students total with ρ = 0.20
  • Cluster size m = 20 (students per classroom)

Incorrect approach (ignoring ICC): n = 100 \text{ students} \rightarrow 5 \text{ classrooms}

Correct approach (accounting for ICC): n = 100 \times [1 + (20-1)(0.20)] = 100 \times 4.8 = 480 \text{ students} \rightarrow 24 \text{ classrooms needed (12 per condition)}

Warning

Always account for clustering structure in your design!

9 Power Analysis: ANOVA Designs

9.1 One-Way ANOVA

Comparing means across multiple groups (k ≥ 3)

Effect size (f): f = \frac{\sigma_{\text{between}}}{\sigma_{\text{within}}}

Conventional values:

  • Small: f = 0.10
  • Medium: f = 0.25
  • Large: f = 0.40

9.2 ANOVA Power Analysis in R

# Example: Comparing 4 groups
k <- 4              # number of groups
effect_size <- 0.25 # medium effect
alpha <- 0.05
power <- 0.80

# Power analysis
pwr.anova.test(
  k = k,
  f = effect_size,
  sig.level = alpha,
  power = power
)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 44.59927
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

9.3 Factorial ANOVA

For 2×2 factorial design:

  • Two factors, each with 2 levels
  • Tests main effects and interaction
Note

Note: pwr.f2.test() returns v (denominator degrees of freedom). To get total sample size: n = v + u + 1

# Effect size for factorial design
effect_size <- 0.25
alpha <- 0.05
power <- 0.80

# Numerator df = (levels - 1)
# For 2x2: df = 1 for each main effect and interaction

result <- pwr.f2.test(
  u = 1,              # numerator df for one effect
  f2 = effect_size^2, # f-squared effect size
  sig.level = alpha,
  power = power
)

print(result)

     Multiple regression power calculation 

              u = 1
              v = 125.5312
             f2 = 0.0625
      sig.level = 0.05
          power = 0.8
# Calculate total sample size from v (denominator df)
# Formula: n = v + u + 1
n_total <- ceiling(result$v + result$u + 1)
cat("\nTotal sample size needed:", n_total, "participants")

Total sample size needed: 128 participants

9.4 Understanding pwr.f2.test Output

Tip: Interpreting the Results

When using pwr.f2.test(), the output shows:

  • u = numerator degrees of freedom (number of predictors/effects tested)
  • v = denominator degrees of freedom (error df)
  • f2 = Cohen’s f² effect size
  • sig.level = α level
  • power = statistical power

Important: To get the total sample size needed:

n = v + u + 1

Example: If u = 1 and v = 125.53, you need n = 128 participants total.

9.5 Visualizing ANOVA Power
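ANOVA power can be traced across group sizes in the same spirit (assumes the pwr package; k = 4 groups and f = 0.25 follow the example above):

```r
library(pwr)

group_sizes <- seq(10, 80, by = 5)
anova_power <- sapply(group_sizes, function(n) {
  pwr.anova.test(k = 4, n = n, f = 0.25, sig.level = 0.05)$power
})

round(data.frame(n = group_sizes, power = anova_power), 3)
# power crosses 0.80 near n = 45 per group
```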

10 Power Analysis: Proportions and Correlations

10.1 Comparing Two Proportions

Testing difference between two proportions (e.g., success rates)

Effect size (h): h = 2[\arcsin(\sqrt{p_1}) - \arcsin(\sqrt{p_2})]

# Example: Compare graduation rates
p1 <- 0.70  # Control group
p2 <- 0.80  # Treatment group

# Calculate effect size
ES.h(p1, p2)
[1] -0.2319843
# Power analysis
pwr.2p.test(
  h = ES.h(p1, p2),
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)

     Difference of proportion power calculation for binomial distribution (arcsine transformation) 

              h = 0.2319843
              n = 291.6887
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: same sample sizes

10.2 Testing Correlations

Testing if correlation differs from zero

Effect size = r (the correlation itself)

# Test if correlation r = 0.30 is significant
r <- 0.30

pwr.r.test(
  r = r,
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)

     approximate correlation power calculation (arctangh transformation) 

              n = 84.07364
              r = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

10.3 Chi-Square Tests

For contingency tables:

Effect size (w):

  • Small: w = 0.10
  • Medium: w = 0.30
  • Large: w = 0.50
# Example: 3x3 contingency table
df <- (3 - 1) * (3 - 1)  # degrees of freedom

pwr.chisq.test(
  w = 0.30,         # medium effect
  df = df,
  sig.level = 0.05,
  power = 0.80
)

     Chi squared power calculation 

              w = 0.3
              N = 132.6143
             df = 4
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

11 Software Tools for Power Analysis

11.1 Available Software

11.1.1 R Packages

  • pwr: Most common for psychology/education
  • WebPower: Web-based interface
  • simr: Simulation-based for mixed models
  • powerAnalysis: Educational resource
  • clusterPower: Cluster randomized trials

11.1.2 Standalone Software

  • G*Power: Free, widely used in psychology
  • Optimal Design: Multilevel designs in education
  • PowerUpR: R package for education research
  • PASS: Commercial, extensive capabilities
  • Russell Lenth’s applets: Free online tools

11.2 Using G*Power

Popular free software for power analysis:

  1. Select test family (t-test, ANOVA, regression, etc.)

  2. Choose specific test type

  3. Specify:

    • Effect size
    • α level
    • Power or sample size
  4. Get results with visualizations

Download: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower

12 Practical Guidelines

12.1 Before Running Your Study

  1. Review literature for:

    • Expected effect sizes
    • Variability estimates
    • Similar study designs
  2. Consider pilot studies to:

    • Estimate parameters
    • Test procedures
    • Refine hypotheses
  3. Specify meaningful effects:

    • Clinical/practical significance
    • Minimum detectable difference
    • Cost-benefit considerations

12.2 Conducting Power Analysis

  1. Be conservative:

    • Use realistic (not optimistic) effect sizes
    • Account for attrition/dropout
    • Consider multiple comparisons
  2. Perform sensitivity analysis:

    • Vary effect sizes
    • Check impact of assumptions
    • Explore “what if” scenarios
  3. Document everything:

    • Assumptions and their sources
    • Calculations and software used
    • Rationale for parameter choices
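One concrete way to fold attrition into the plan (the 15% dropout rate is an assumption for illustration) is to inflate the computed sample size so the expected completers still meet the target:

```r
n_planned <- 100    # n from the power analysis (illustrative)
dropout   <- 0.15   # expected attrition rate (assumed)

n_recruit <- ceiling(n_planned / (1 - dropout))
n_recruit  # recruit 118 to expect 100 completers
```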

12.3 Common Pitfalls to Avoid

Warning: Don't:
  1. Manipulate effect sizes to justify small samples
  2. Ignore clustering or dependence
  3. Use published effect sizes uncritically (publication bias!)
  4. Forget about attrition/missing data
  5. Conduct power analysis after data collection (“post-hoc power”)
  6. Ignore practical constraints (budget, time, recruitment)

12.4 Post-Hoc Power Analysis

Important: Why Not to Do It

Post-hoc power analysis (after collecting data) is controversial:

  • Circular reasoning: uses observed effect to estimate power
  • Non-significant results always have low post-hoc power
  • Doesn’t change interpretation of results
  • Confidence intervals more informative

Exception: Informing future studies with pilot data

13 Applied Examples

13.1 Example 1: Reading Intervention

Note: Research Question

Does a new reading intervention improve test scores compared to standard curriculum?

Information:

  • Two independent groups (intervention vs. control)
  • σ = 15 points (from prior studies)
  • Meaningful improvement: 5 points
  • Power: 80%, α = 0.05 (two-tailed)
effect_size <- 5 / 15  # 0.333
pwr.t.test(d = effect_size, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")

     Two-sample t test power calculation 

              n = 142.2462
              d = 0.3333333
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Need ~143 students per group, 286 total

13.2 Example 2: Class Size Study

Note: Research Question

Do smaller class sizes improve student achievement? Compare 3 class size conditions.

Information:

  • Three groups: Small (15), Medium (25), Large (35)
  • σ = 10 points
  • Expected η² = 0.06 (medium effect)
  • Convert to f: f = \sqrt{\frac{\eta^2}{1-\eta^2}} = 0.25
eta_squared <- 0.06
f <- sqrt(eta_squared / (1 - eta_squared))

pwr.anova.test(k = 3, f = f, sig.level = 0.05, power = 0.80)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 51.32635
              f = 0.2526456
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

Need ~52 students per condition, 156 total

13.3 Example 3: Correlation Study

Note: Research Question

Is there a significant correlation between homework time and GPA?

Expected correlation: r = 0.30 (medium)

pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80,
           alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation) 

              n = 84.07364
              r = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Need ~85 students

13.4 Example 4: Multilevel Design

Note: Research Question

Testing intervention effects with students nested in classrooms

Information:

  • 20 students per classroom
  • ICC = 0.15 (typical for education)
  • Effect size d = 0.40
# Base sample size (ignoring clustering)
base <- pwr.t.test(d = 0.40, power = 0.80, sig.level = 0.05,
                   type = "two.sample")
base_n <- ceiling(base$n)

# Adjust for clustering
m <- 20           # cluster size
rho <- 0.15       # ICC
design_effect <- 1 + (m - 1) * rho

adjusted_n <- ceiling(base_n * design_effect)

cat("Base sample size per group:", base_n, "\n")
Base sample size per group: 100 
cat("Design effect:", round(design_effect, 2), "\n")
Design effect: 3.85 
cat("Adjusted sample size per group:", adjusted_n, "\n")
Adjusted sample size per group: 385 
cat("Number of classrooms needed per group:",
    ceiling(adjusted_n / m))
Number of classrooms needed per group: 20

14 Summary and Recommendations

14.1 Key Takeaways

  1. Power analysis is essential for ethical, efficient research

  2. Minimum 80% power to detect meaningful effects

  3. Four key inputs: α, power, effect size, variability

  4. Sample size relationships:

    • Increases with desired power
    • Increases quadratically with smaller effect sizes
    • Increases with greater variability
  5. Account for study design: clustering, pairing, multiple groups

14.2 Recommendations for Practice

14.2.1 Planning Stage

  • Conduct early in research design
  • Use realistic effect sizes
  • Consider attrition
  • Perform sensitivity analyses
  • Document all assumptions

14.2.2 Resources

  • Use established software (pwr, G*Power)
  • Consult statistician
  • Review similar studies
  • Conduct pilot studies
  • Get peer review of power analysis

14.3 Decision Framework

flowchart TD
    A[Research Question] --> B{Feasible<br/>Sample Size?}
    B -->|Yes| C[Conduct Study]
    B -->|No| D{Can Increase<br/>Sample?}
    D -->|Yes| E[Revise Budget/<br/>Timeline]
    D -->|No| F{Can Accept<br/>Larger δ?}
    F -->|Yes| G[Revise Research<br/>Question]
    F -->|No| H[Abandon/<br/>Restructure]
    E --> C
    G --> C

    style C fill:#51cf66
    style H fill:#ff6b6b

14.4 Further Reading

  1. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
    • The classic reference for power analysis in psychology and education
  2. Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
    • Free software widely used in psychology and education
  3. Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1), 33267.
    • Modern perspective on justifying sample sizes
  4. Spybrook, J., et al. (2011). Optimal Design Plus Empirical Evidence: Documentation for the Optimal Design software.
    • Specialized for cluster randomized trials in education
  5. Ledolter, J., & Kardon, R. (2020). Focus on data: Statistical design of experiments and sample size selection using power analysis. Investigative Ophthalmology & Visual Science, 61(8), 11.
    • General principles applicable across disciplines

15 Questions?

15.1 Resources and Practice

Tip: Practice Exercises
  1. Calculate sample size for your own research question
  2. Explore power curves with different parameters
  3. Compare results across different software
  4. Conduct sensitivity analysis

15.1.1 Contact Information

  • Office Hours: [Schedule]
  • Email: [Your email]
  • Course Website: [Link]