1.1 Class Outline

Go through three assumptions of ANOVA and their checking statistics
Post-hoc tests for more group comparisons
Example: Intervention and Verbal Acquisition
After-class Exercise: Effect of Sleep Duration on Cognitive Performance

1.2 Review ANOVA Procedure

Set hypotheses:
- Null hypothesis (H_0): All group means are equal.
- Alternative hypothesis (H_A): At least one group mean differs.
Determine statistical parameters:
- Significance level \alpha
- Degrees of freedom for between-group (df_b) and within-group (df_w)
- Find the critical F-value.
Compute test statistic:
- Calculate F-ratio based on between-group and within-group sum of squares.
Compare results:
- Either compare F_{\text{obs}} with F_{\text{crit}} or p-value with \alpha.
- If p < \alpha, reject H_0.

1.3 ANOVA Type I Error

The way to compare the means of multiple groups (i.e., more than two groups)
- Compared to multiple t-tests, the ANOVA does not inflate Type I error rates.
  - For F-test in ANOVA, Type I error rate is not inflated (testing the H_0 once a time)
  - Type I error rate is inflated when we do multiple comparison (testing multiple H_0s given the same data)
- Family-wise error rate (FWER): probability of at least one Type I error occur = 1 − (1 − \alpha)^c where c =number of tests. Frequently used FWER methods include Bonferroni or FDR.

Click to see R code

library(purrr)
alpha <-  .05
c <- seq(10)
prob_TypeOneError <- 1 - (1 - alpha)^c

plot(x = c, y = prob_TypeOneError, type = "b",
     xlab = "Number of comparisons", ylab = "Pr(#error > 1)",
     main = "Type I error inflation with number of tests increases")

Family-Wise Error Rate

The Family-Wise Error Rate (FWER) is the probability of making at least one Type I error (false positive) across a set of multiple comparisons.
In Analysis of Variance (ANOVA), multiple comparisons are often necessary when analyzing the differences between group means. FWER control is crucial to ensure the validity of statistical conclusions.

2 ANOVA Assumptions

2.1 Overview

Like all statistical tests, ANOVA requires certain assumptions to be met for valid conclusions:
- Independence: Observations are independent of each other.
- Normality: The residuals (errors) follow a normal distribution.
- Homogeneity of variance (HOV): The variance within each group is approximately equal.

2.2 Importance of Assumptions

If assumptions are violated, the results of ANOVA may not be reliable.
We call models robust when ANOVAs are not easily influenced by the violation of assumptions.
Typically:
- ANOVA is robust to minor violations of normality, especially for large sample sizes (Central Limit Theorem).
- Not robust to violations of independence—if independence is violated, ANOVA is inappropriate.
- Moderately robust to HOV violations if sample sizes are equal.

2.3 Assumption 1: Independence among samples

Definition: Each observation should be independent of others.
Violations:
- Clustering of data (e.g., repeated measures).
- Participants influencing each other (e.g., classroom discussions).
Check (Optional): Use the Durbin-Watson test to check Serial Correlation.
Solution: Repeated Measures ANOVA, Mixed ANOVA, Multilevel Model
Consequences: If independence is violated, ANOVA results are not valid.

Example: Demonstrating Independence Violation

Click to see R code

# Generate data with dependency (autocorrelation)
set.seed(1234)
n_per_group <- 20
groups <- 3

# Create dependent data using autoregressive structure
generate_dependent_data <- function(n, mean_val, sd_val, rho = 0.7) {
  y <- numeric(n)
  y[1] <- rnorm(1, mean_val, sd_val)
  for(i in 2:n) {
    y[i] <- rho * y[i-1] + rnorm(1, mean_val * (1-rho), sd_val * sqrt(1-rho^2))
  }
  return(y)
}

# Generate data for 3 groups with different means but with dependency
dependent_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  score = c(
    generate_dependent_data(n_per_group, 50, 10),  # Group A
    generate_dependent_data(n_per_group, 55, 10),  # Group B
    generate_dependent_data(n_per_group, 60, 10)   # Group C
  )
)

# Compare with independent data
independent_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  score = c(
    rnorm(n_per_group, 50, 10),  # Group A
    rnorm(n_per_group, 55, 10),  # Group B
    rnorm(n_per_group, 60, 10)   # Group C
  )
)

# Run ANOVA on both datasets
anova_dependent <- aov(score ~ group, data = dependent_data)
anova_independent <- aov(score ~ group, data = independent_data)

# Compare results
cat("ANOVA with DEPENDENT data:\n")
print(summary(anova_dependent))
cat("--------------------------\n")
cat("\nANOVA with INDEPENDENT data:\n")
print(summary(anova_independent))


# Check Durbin-Watson test
cat("--------------------------\n")
cat("\nDurbin-Watson test for dependent data:")
print(lmtest::dwtest(anova_dependent))
cat("--------------------------\n")
cat("\nDurbin-Watson test for independent data:")
print(lmtest::dwtest(anova_independent))

ANOVA with DEPENDENT data:
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2    695   347.3    3.72 0.0303 *
Residuals   57   5321    93.4                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------

ANOVA with INDEPENDENT data:
            Df Sum Sq Mean Sq F value Pr(>F)
group        2    377  188.55   2.389  0.101
Residuals   57   4500   78.94               
--------------------------

Durbin-Watson test for dependent data:
    Durbin-Watson test

data:  anova_dependent
DW = 0.74297, p-value = 3.078e-09
alternative hypothesis: true autocorrelation is greater than 0

--------------------------

Durbin-Watson test for independent data:
    Durbin-Watson test

data:  anova_independent
DW = 1.8171, p-value = 0.1641
alternative hypothesis: true autocorrelation is greater than 0

Key observations:

Dependent data often shows artificially smaller p-values (inflated significance)
Standard errors are underestimated when independence is violated
Durbin-Watson test helps detect this violation (DW ≠ 2, p < 0.05)

2.3.1 Background: Durbin-Watson (DW) Test

Note

The Durbin-Watson test is primarily used for detecting autocorrelation in time-series data.
In the context of ANOVA with independent groups, residuals are generally assumed to be independent.
- It’s good practice to check this assumption, especially if there’s a reason to suspect potential autocorrelation.

Important

Properties of DW Statistic:
- H_0: Independence of residual is satisfied.
- Ranges from 0 to 4.
  - A value around 2 suggests no autocorrelation.
  - Values approaching 0 indicate positive autocorrelation.
  - Values toward 4 suggest negative autocorrelation.
- P-value
  - P-Value: A small p-value (typically < 0.05) indicates violation of independence

Real-world examples of independence violations:

Dependent data scenarios:
- Repeated measures: Testing the same students multiple times (pre-test, mid-test, post-test)
- Classroom clusters: Students in the same classroom influence each other’s performance
- Time series: Daily test scores where yesterday’s performance affects today’s
- Spatial correlation: Schools in the same district sharing similar resources and outcomes
Independent data scenarios:
- Random sampling: Selecting students from different schools across different regions
- Single measurement: Each participant tested only once with no interaction
- Controlled assignment: Participants randomly assigned to groups with no cross-contamination

Why this matters: When observations are dependent, they don’t provide as much unique information as independent observations. This leads to overconfident results (smaller standard errors and p-values).

2.3.2 Visualization of indepdence test

Click to see R code

library(ggplot2)
library(gridExtra)

# Use the same data from previous example
n_per_group <- 20

# Generate dependent and independent data
dependent_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  score = c(
    generate_dependent_data(n_per_group, 50, 10),
    generate_dependent_data(n_per_group, 55, 10),
    generate_dependent_data(n_per_group, 60, 10)
  ),
  observation = 1:(3*n_per_group),
  type = "Dependent (AR)"
)

independent_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  score = c(
    rnorm(n_per_group, 50, 10),
    rnorm(n_per_group, 55, 10),
    rnorm(n_per_group, 60, 10)
  ),
  observation = 1:(3*n_per_group),
  type = "Independent"
)

# Combine data
combined_data <- rbind(dependent_data, independent_data)

# Plot 1: Time series plot showing autocorrelation
p1 <- ggplot(combined_data, aes(x = observation, y = score, color = group)) +
  geom_line(size = 0.8) + geom_point(size = 1.5) +
  facet_wrap(~type, scales = "free") +
  labs(title = "Time Series: Independence vs Dependence",
       x = "Observation Order", y = "Score") +
  theme_minimal() + theme(legend.position = "bottom")

# Plot 2: Lag plot to show autocorrelation
create_lag_data <- function(data) {
  data$score_lag1 <- c(NA, data$score[-length(data$score)])
  return(data[!is.na(data$score_lag1), ])
}

lag_dependent <- create_lag_data(dependent_data)
lag_independent <- create_lag_data(independent_data)
lag_combined <- rbind(lag_dependent, lag_independent)

p2 <- ggplot(lag_combined, aes(x = score_lag1, y = score)) +
  geom_point(aes(color = group), alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "black", size = 0.5) +
  facet_wrap(~type, scales = "free") +
  labs(title = "Lag Plot: Current vs Previous Observation",
       x = "Score (t-1)", y = "Score (t)") +
  theme_minimal() + theme(legend.position = "bottom")

# Display plots
grid.arrange(p1, p2, ncol = 1)

Key Visual Differences: - Dependent data: Shows smooth, connected patterns with visible trends - Independent data: Shows random scatter with no systematic patterns - Lag plot: Dependent data shows positive correlation between consecutive observations

lmtest::dwtest()

Performs the Durbin-Watson test for autocorrelation of disturbances.
The Durbin-Watson test has the null hypothesis that the autocorrelation of the disturbances is 0.

Click to see R code

#install.packages("lmtest")
library(lmtest)
err1 <- rnorm(100)

## generate regressor and dependent variable
x <- rep(c(-1,1), 50)
y <- 1 + x + err1
dat <- data.frame(x = x, y = y)
## perform Durbin-Watson test
dwtest(y ~ x, data = dat)

These results suggest there is no autocorrelation at the alpha level of .05.

Below is the R code that has loaded two datasets for you. dat1 and dat2, try to examine which dataset violates independence.

2.4 Assumption 2: Normality

The dependent variable (DV) should be normally distributed within each group.
Assessments:
- Graphical methods: Histograms, Q-Q plots.
- Statistical tests:
  - Shapiro-Wilk test (common)
  - Kolmogorov-Smirnov (KS) test (for large samples)
  - Anderson-Darling test (detects kurtosis issues).
Robustness:
- ANOVA is robust to normality violations for large samples.
- If normality is violated, consider transformations or non-parametric tests.

2.4.1 Normality check for the whole samples

Assume that the DV (Y) is distributed normally within each group for ANOVA
ANOVA is robust to minor violations of normality
- So generally start with homogeneity, then assess normality

2.4.2 R function `shapiro.test()`

Click to see R code

set.seed(123)  # For reproducibility
sample_normal_data <- rnorm(200, mean = 50, sd = 10)  # Generate normal data
sample_nonnormal_data <- sample(1:10, 200, replace = TRUE)  # Generate non-normal data

cat("Normality assumption holds:")
shapiro.test(sample_normal_data)
cat("Normality assumption does not hold:")
shapiro.test(sample_nonnormal_data)

Normality assumption holds:
    Shapiro-Wilk normality test

data:  sample_normal_data
W = 0.99076, p-value = 0.2298

Normality assumption does not hold:
    Shapiro-Wilk normality test

data:  sample_nonnormal_data
W = 0.93581, p-value = 1.005e-07

2.4.3 R function `ks.test()`

Click to see R code

# Perform Kolmogorov-Smirnov test against a normal distribution
cat("Normality assumption holds:")
ks.test(scale(sample_normal_data), "pnorm")
cat("Normality assumption does not hold:")
ks.test(scale(sample_nonnormal_data), "pnorm")

Warning in ks.test.default(scale(sample_nonnormal_data), "pnorm"): ties should
not be present for the one-sample Kolmogorov-Smirnov test

Normality assumption holds:
    Asymptotic one-sample Kolmogorov-Smirnov test

data:  scale(sample_normal_data)
D = 0.054249, p-value = 0.5983
alternative hypothesis: two-sided

Normality assumption does not hold:
    Asymptotic one-sample Kolmogorov-Smirnov test

data:  scale(sample_nonnormal_data)
D = 0.12037, p-value = 0.006084
alternative hypothesis: two-sided

Perform Shapiro-Wilk and Kolmogorov-Smirnov for both datasets. Tell me which test works or does not work and why.

2.4.4 Normaliy check for each group

# Generate three-group data with normal distributions
library(dplyr) # we need this package to group the data
set.seed(123)
group_data <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  score = c(rnorm(30, mean = 50, sd = 10),  # Group A
            rnorm(30, mean = 55, sd = 12),  # Group B
            rnorm(30, mean = 48, sd = 9))   # Group C
)

# Run Shapiro-Wilk test for each group
group_data |>
  group_by(group) |>
  summarize(
    shapiro_p_value = shapiro.test(score)$p.value
  )

# A tibble: 3 × 2
  group shapiro_p_value
  <chr>           <dbl>
1 A               0.797
2 B               0.961
3 C               0.848

Interpretation of Normality Checking Results:

If all p-values > 0.05: All groups satisfy normality assumption - proceed with ANOVA
If one group has p-value < 0.05 but others are normal:
- ANOVA is generally robust to minor normality violations, especially with equal sample sizes
- Consider the sample size: larger samples (n > 30) make ANOVA more robust
- Options: (1) Continue with ANOVA if violation is mild, (2) Apply data transformation (log, square root), or (3) Use non-parametric alternative (Kruskal-Wallis test)
If multiple groups violate normality: Strong evidence against using ANOVA - consider transformations or non-parametric tests

Perform Shapiro-Wilk for each group datasets. Tell me which group is not normal.

2.5 Assumption 3: Homogeneity of Variance (HOV)

Variance across groups should be equal.
Assessments:
- Levene’s test: Tests equality of variances.
- Brown-Forsythe test: More robust to non-normality.
- Bartlett’s test: For data with normality.
- Boxplots: Visual inspection.
What if violated?
- Welch’s ANOVA (adjusted for variance differences).
- Transforming the dependent variable.
- Using non-parametric tests (e.g., Kruskal-Wallis).

Practical Considerations

Click to see R code

## Computes Levene's test for homogeneity of variance across groups.
car::leveneTest(outcome ~ group, data = data)

## Boxplots to visualize the variance by groups
boxplot(outcome ~ group, data = data)

## Brown-Forsythe test
onewaytests::bf.test(outcome ~ group, data = data)

## Bartlett's Test
bartlett.test(outcome ~ group, data = data)

Levene’s test and Brown-Forsythe test are often preferred when data does not meet the assumption of normality, especially for small sample sizes.
Bartlett’s test is most powerful when the data is normally distributed and when sample sizes are equal across groups. However, it can be less reliable if these assumptions are violated.

Decision Tree

2.6 ANOVA Robustness

Robust to:
- Minor normality violations (for large samples).
- Small HOV violations if group sizes are equal.
Not robust to:
- Independence violations—ANOVA is invalid if data points are dependent.
- Severe HOV violations—Type I error rates become unreliable.
The robustness of assumptions is something you should be careful about before/when you perform data collection. They are not something you can address after data collection has been finished.

Robustness to Violations of Normality Assumption

ANOVA assumes that the residuals (errors) are normally distributed within each group.
However, ANOVA is generally robust to violations of normality, particularly when the sample size is large.
Theoretical Justification: This robustness is primarily due to the Central Limit Theorem (CLT), which states that, for sufficiently large sample sizes (typically n≥30 per group), the sampling distribution of the mean approaches normality, even if the underlying population distribution is non-normal.
This means that unless the data are heavily skewed or have extreme outliers, ANOVA results remain valid and Type I error rates are not severely inflated.

2.6.1 Robustness to Violations of Homogeneity of Variance

The homogeneity of variance (homoscedasticity) assumption states that all groups should have equal variances. ANOVA can tolerate moderate violations of this assumption, particularly when:
- Sample sizes are equal (or nearly equal) across groups – When groups have equal sample sizes, the F-test remains robust to variance heterogeneity because the pooled variance estimate remains balanced.
- The degree of variance heterogeneity is not extreme – If the largest group variance is no more than about four times the smallest variance, ANOVA results tend to remain accurate.

2.6.2 ANOVA: Lack of Robustness to Violations of Independence of Errors

The assumption of independence of errors means that observations within and between groups must be uncorrelated. Violations of this assumption severely compromise ANOVA’s validity because:
- Inflated Type I error rates – If errors are correlated (e.g., due to clustering or repeated measures), standard errors are underestimated, leading to an increased likelihood of falsely rejecting the null hypothesis.
- Biased parameter estimates – When observations are not independent, the variance estimates do not accurately reflect the true variability in the data, distorting F-statistics and p-values.
- Common sources of dependency – Examples include nested data (e.g., students within schools), repeated measurements on the same subjects, or time-series data. In such cases, alternatives like mixed-effects models or generalized estimating equations (GEE) should be considered.

3 Omnibus ANOVA Test

3.1 Overview

What does it test?
- Whether there is at least one significant difference among means.
Limitation:
- Does not tell which groups are different.
Solution:
- Conduct post-hoc tests.

3.2 Individual Comparisons of Means

If ANOVA is significant, follow-up tests identify where differences occur.
Types:
- Planned comparisons: Defined before data collection.
- Unplanned (post-hoc) comparisons: Conducted after ANOVA.

3.3 Planned vs. Unplanned Comparisons

Planned:
- Based on theory.
- Can be done even if ANOVA is not significant.
Unplanned (post-hoc):
- Data-driven.
- Only performed if ANOVA is significant.

3.4 Types of Unplanned Comparisons

Common post-hoc tests:
1. Fisher’s LSD
2. Bonferroni correction or Adjusted p-values
3. Sidák correction
4. Tukey’s HSD

3.4.1 Post-hoc test

Tukey’s HSD

Controls for Type I error across multiple comparisons.
Uses a q-statistic from a Tukey table.
Preferred when all pairs need comparison.

Click to see R code

aov_result <- aov(score ~ group, group_data) # run the ANOVA model first
TukeyHSD(aov_result) # call TukeyHSD followed by the model to perform Tukey multiple comparisons

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = score ~ group, data = group_data)

$group
         diff        lwr       upr     p adj
B-A  7.611098   1.902128 13.320067 0.0057521
C-A -1.309179  -7.018149  4.399791 0.8483773
C-B -8.920277 -14.629246 -3.211307 0.0009985

Family-wise Error Rate (adjusted p-values)

Adjusts p-values to reduce Type I error
Report adjusted p-values (typically larger that original p-values)

3.4.2 Tukey Honest Significant Differences (HSD) Test

Purpose: Conducts all possible pairwise comparisons between group means while controlling the family-wise error rate at the specified alpha level (typically 0.05).
Why it’s needed: Simple pairwise t-tests after a significant ANOVA inflate the Type I error rate. Tukey HSD adjusts for multiple comparisons to maintain the overall error rate.
How it works: Creates confidence intervals for all pairwise mean differences using the studentized range distribution, which accounts for the number of groups being compared.
Interpretation: If a confidence interval does not include zero, or if p adj < 0.05, the pair of groups differs significantly.

Click to see R code

tukey_result <- TukeyHSD(aov_result)
print(tukey_result$group)

         diff        lwr       upr        p adj
B-A  7.611098   1.902128 13.320067 0.0057520929
C-A -1.309179  -7.018149  4.399791 0.8483772504
C-B -8.920277 -14.629246 -3.211307 0.0009984894

Interpretation: The Tukey HSD test provides pairwise comparisons between all groups while controlling for family-wise error rate. The output shows:

diff: The difference in means between each pair of groups
lwr & upr: The 95% confidence interval for the difference
p adj: The adjusted p-value for multiple comparisons

Pairs with p adj < 0.05 indicate statistically significant differences between those groups.

Click to see R code

plot(tukey_result)

3.5 Interpreting Results

3.5.1 ANOVA Statistics

           Df Sum Sq Mean Sq F value   Pr(>F)
group        3  724.1  241.37   11.45 2.04e-05 ***
Residuals   36  759.2   21.09
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Click to see R code

library(dplyr)
data |>
  group_by(group) |>
  summarise(
    Mean = mean(score),
    SD = sd(score)
  )

A one-way analysis of variance (ANOVA) was conducted to examine the effect of teaching method on students’ test scores. The results indicated a statistically significant difference in test scores across the four groups, (F(3, 36) = 11.45, p < .001). Post-hoc comparisons using the Tukey HSD test revealed that the Interactive method (G3) (M = 28.06, SD = 3.33) resulted in significantly higher scores than the Traditional method (M = 18.17, SD = 4.47) with p < .001. However, no significant difference was found between G1 and the Control group (p = .99). These results suggest that certain teaching methods can significantly improve student performance compared to traditional approaches.

3.6 ANOVA Example: Intervention and Verbal Acquisition

3.6.1 Background

Research Question: Does an intensive intervention improve students’ verbal acquisition scores?
Study Design:
- 4 groups: Control, G1, G2, G3 (treatment levels).
- Outcome variable: Verbal acquisition score (average of three assessments).
Hypotheses:
- H_0: No difference in verbal acquisition scores across groups.
- H_A: At least one group has a significantly different mean.

3.6.2 Step 1: Generate Simulated Data in R

Click to see R code

# Load necessary libraries
library(tidyverse)

# Set seed for reproducibility
set.seed(123)

# Generate synthetic data for 4 groups
data <- tibble(
  group = rep(c("Control", "G1", "G2", "G3"), each = 30),
  verbal_score = c(
    rnorm(30, mean = 70, sd = 10),  # Control group
    rnorm(30, mean = 75, sd = 12),  # G1
    rnorm(30, mean = 80, sd = 10),  # G2
    rnorm(30, mean = 85, sd = 8)    # G3
  )
)

# View first few rows
head(data)

# A tibble: 6 × 2
  group   verbal_score
  <chr>          <dbl>
1 Control         64.4
2 Control         67.7
3 Control         85.6
4 Control         70.7
5 Control         71.3
6 Control         87.2

Function explanations:

library(tidyverse): Loads the tidyverse package collection, including dplyr, ggplot2, and tibble
set.seed(123): Sets the random number generator seed to ensure reproducible results across runs
tibble(): Creates a modern data frame (tibble) with improved printing and subsetting behavior
rep(c("Control", "G1", "G2", "G3"), each = 30): Repeats each group label 30 times, creating 120 total observations (30 per group)
rnorm(30, mean = X, sd = Y): Generates 30 random numbers from a normal distribution with specified mean and standard deviation
c(): Combines the four vectors of random numbers into a single vector of 120 values
head(data): Displays the first 6 rows of the dataset for preview

3.6.3 Step 2: Summary Statistics

Click to see R code

# Summary statistics by group
data|>
  group_by(group)|>
  summarise(
    mean_score = mean(verbal_score),
    sd_score = sd(verbal_score),
    n = n()
  )

# A tibble: 4 × 4
  group   mean_score sd_score     n
  <chr>        <dbl>    <dbl> <int>
1 Control       69.5     9.81    30
2 G1            77.1    10.0     30
3 G2            80.2     8.70    30
4 G3            84.2     7.25    30

Function explanations:

group_by(group): Groups the data by the group variable, allowing subsequent operations to be performed separately for each group
summarise(): Creates summary statistics for each group, collapsing the grouped data into summary values
mean(verbal_score): Calculates the arithmetic mean of verbal scores within each group
sd(verbal_score): Computes the standard deviation of verbal scores within each group
n(): Counts the number of observations in each group

Click to see R code

# Boxplot visualization
library(ggplot2)  # Load ggplot2 package for creating visualizations
ggplot(data, aes(x = group, y = verbal_score, fill = group)) +  # Create base plot with group on x-axis, scores on y-axis, fill by group
  geom_boxplot() +  # Add boxplot layer to show distribution of scores within each group
  theme_minimal() +  # Apply minimal theme for clean appearance
  labs(title = "Verbal Acquisition Scores Across Groups", y = "Score", x = "Group")  # Add descriptive labels

3.6.4 Step 3: Check ANOVA Assumptions

3.6.5 Assumption Check 1: Independence of residuals Check

Click to see R code

# Fit the ANOVA model
anova_model <- lm(verbal_score ~ group, data = data)

# Load the lmtest package
library(lmtest)

# Perform the Durbin-Watson test
dw_test_result <- dwtest(anova_model)

# View the test results
print(dw_test_result)


    Durbin-Watson test

data:  anova_model
DW = 2.0519, p-value = 0.5042
alternative hypothesis: true autocorrelation is greater than 0

Interpretation:
- In this example, the DW value is close to 2, and the p-value is greater than 0.05, indicating no significant autocorrelation in the residuals.

3.6.6 Assumption Check 2: Normality Check

Click to see R code

# Shapiro-Wilk normality test for each group
data |>
  group_by(group) |>
  summarise(
    shapiro_p = shapiro.test(verbal_score)$p.value
  )

# A tibble: 4 × 2
  group   shapiro_p
  <chr>       <dbl>
1 Control     0.797
2 G1          0.961
3 G2          0.848
4 G3          0.568

Interpretation:
- If p>0.05, normality assumption is not violated.
- If p<0.05, data deviates from normal distribution.
Since the data meets the normality requirement and no outliers, we can use Bartlett’s or Levene’s tests for HOV checking.

Alternative Check: Q-Q Plot

Click to see R code

ggplot(data, aes(sample = verbal_score)) +
  geom_qq() + geom_qq_line() +
  facet_wrap(~group) +
  theme_minimal() +
  labs(title = "Q-Q Plot for Normality Check")

3.6.7 Assumption Check 3: Homogeneity of Variance (HOV) Check

Click to see R code

library(car) # for leveneTest
library(onewaytests) # for bf.test
data$group <- as.factor(data$group)
car::leveneTest(verbal_score ~ group, data = data)
cat("-----------------------------\n")
bartlett.test(verbal_score ~ group, data = data)
cat("-----------------------------\n")
onewaytests::bf.test(verbal_score ~ group, data = data)

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  1.1718 0.3236
      116               
-----------------------------

    Bartlett test of homogeneity of variances

data:  verbal_score by group
Bartlett's K-squared = 3.5385, df = 3, p-value = 0.3158

-----------------------------

  Brown-Forsythe Test (alpha = 0.05) 
------------------------------------------------------------- 
  data : verbal_score and group 

  statistic  : 14.32875 
  num df     : 3 
  denom df   : 109.9888 
  p.value    : 6.030497e-08 

  Result     : Difference is statistically significant. 
-------------------------------------------------------------

Interpretation:
- If p>0.05, variance is homogeneous (ANOVA assumption met).
- If p<0.05, variance differs across groups (consider Welch’s ANOVA).
It turns out our data does not violate the homogeneity of variance assumption.

3.6.8 Step 4: Perform One-Way ANOVA

Click to see R code

anova_model <- aov(verbal_score ~ group, data = data)
summary(anova_model)

             Df Sum Sq Mean Sq F value   Pr(>F)    
group         3   3492  1164.1   14.33 5.28e-08 ***
Residuals   116   9424    81.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:
- If p<0.05, at least one group mean is significantly different.
- If p>0.05, fail to reject H0 (no significant differences).

3.6.9 Step 5: Post-Hoc Tests (Tukey’s HSD)

After finding a significant ANOVA result, we perform Tukey’s HSD test to identify which specific groups differ from each other.
For example, we can compare:
- G1 vs Control
- G2 vs Control
- G3 vs Control
- G2 vs G1
- G3 vs G1
- G3 vs G2

# Tukey HSD post-hoc test
tukey_results <- TukeyHSD(anova_model)
round(tukey_results$group, 3)

             diff    lwr    upr p adj
G1-Control  7.611  1.545 13.677 0.008
G2-Control 10.715  4.649 16.782 0.000
G3-Control 14.720  8.654 20.786 0.000
G2-G1       3.104 -2.962  9.170 0.543
G3-G1       7.109  1.042 13.175 0.015
G3-G2       4.005 -2.062 10.071 0.318

Interpretation:
- Identifies which groups differ.
- If p<0.05, the groups significantly differ.
  - G1-Control
  - G2-Control
  - G3-Control
  - G3-G1

3.6.10 Step 6: Family-wise Error Rate Control

General Linear Hypothesis Tests: glht function in multcomp package allow us to perform family-wise error rate control (using adjusted p-values).
Alternatively, we can use TukeyHSD to adjust p values to control inflated Type I error.

Y = \beta_0G_0+\beta_1G_1 +\beta_2G_2 +\beta_3G_3

When \{G_0, G_1, G_2, G_3\} = \{0, 1, 0, 0\}, Y = \beta_1, which is the group difference between G1 with the control group
When \{G_0, G_1, G_2, G_3\} = \{0, 0, 1, 0\}, Y = \beta_2, which is the group difference between G2 with the control group
When \{G_0, G_1, G_2, G_3\} = \{0, -1, 1, 0\}, Y = \beta_2 - \beta_1, which is the group difference between G2 with G1

Click to see R code

library(multcomp)
# install.packages("multcomp")
### set up multiple comparisons object for all-pair comparisons
# head(model.matrix(anova_model))
comprs <- rbind(
  "G1 - Ctrl" = c(0, 1, 0,  0),
  "G2 - Ctrl" = c(0, 0, 1,  0),
  "G3 - Ctrl" = c(0, 0, 0,  1),
  "G2 - G1" = c(0, -1, 1,  0),
  "G3 - G1" = c(0, -1, 0,  1),
  "G3 - G2" = c(0, 0, -1,  1)
)
cht <- glht(anova_model, linfct = comprs)
summary(cht, test = adjusted("fdr"))


     Simultaneous Tests for General Linear Hypotheses

Fit: aov(formula = verbal_score ~ group, data = data)

Linear Hypotheses:
               Estimate Std. Error t value Pr(>|t|)    
G1 - Ctrl == 0    7.611      2.327   3.270  0.00283 ** 
G2 - Ctrl == 0   10.715      2.327   4.604 3.20e-05 ***
G3 - Ctrl == 0   14.720      2.327   6.325 2.94e-08 ***
G2 - G1 == 0      3.104      2.327   1.334  0.18487    
G3 - G1 == 0      7.109      2.327   3.055  0.00419 ** 
G3 - G2 == 0      4.005      2.327   1.721  0.10555    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- fdr method)

3.6.11 Side-by-Side comparison of two methods

3.6.11.1 TukeyHSD method

TukeyHSD method

             diff    lwr    upr p adj
G1-Control  7.611  1.545 13.677 0.008 **
G2-Control 10.715  4.649 16.782 0.000 ***
G3-Control 14.720  8.654 20.786 0.000 ***
G2-G1       3.104 -2.962  9.170 0.543
G3-G1       7.109  1.042 13.175 0.015 *
G3-G2       4.005 -2.062 10.071 0.318

3.6.11.2 FDR method

multcomp package

               Estimate Std. Error t value Pr(>|t|)
G1 - Ctrl == 0    7.611      2.327   3.270  0.00283 **
G2 - Ctrl == 0   10.715      2.327   4.604 3.20e-05 ***
G3 - Ctrl == 0   14.720      2.327   6.325 2.94e-08 ***
G2 - G1 == 0      3.104      2.327   1.334  0.18487
G3 - G1 == 0      7.109      2.327   3.055  0.00419 **
G3 - G2 == 0      4.005      2.327   1.721  0.10555

The differences in p-value adjustment between the TukeyHSD method and the multcomp package stem from how each approach calculates and applies the multiple comparisons correction. Below is a detailed explanation of these differences.

3.6.12 Comparison of Differences

Feature	`TukeyHSD()` (Base R)	`multcomp::glht()`
Distribution Used	Studentized Range (q-distribution)	t-distribution
Error Rate Control	Strong FWER control	Flexible error control
Simultaneous Confidence Intervals	Yes	Typically not (depends on method used)
Adjustment Method	Tukey-Kramer adjustment	Single-step, Westfall, Holm, Bonferroni, etc.
P-value Differences	More conservative (larger p-values)	Slightly different due to t-distribution

3.6.13 Step 6: Reporting ANOVA Results

Result

We first examined three assumptions of ANOVA for our data as the preliminary analysis. According to the Durbin-Watson test, the Shapiro-Wilk normality test, and the Bartletts’ test, the sample data meets all assumptions of the one-way ANOVA modeling.
A one-way ANOVA was then conducted to examine the effect of three intensive intervention methods (Control, G1, G2, G3) on verbal acquisition scores. There was a statistically significant difference between groups, F(3,116)=14.33, p<.001.
To further examine which intervention method is most effective, we performed Tukey’s post-hoc comparisons. The results revealed that all three intervention methods have significantly higher scores than the control group (G1-Ctrl: p = .003; G2-Ctrl: p < .001; G3-Ctrl: p < .001). Among the three intervention methods, G3 seems to be the most effective. Specifically, G3 showed significantly higher scores than G1 (p = .004). However, no significant difference was found between G2 and G3 (p = .105).

Discussion

These findings suggest that higher intervention intensity improves verbal acquisition performance, which is consistent with prior literature [xxxx/references].

4 After-Class Exercise: Effect of Sleep Duration on Cognitive Performance

4.1 Background

Research Question:
- Does the amount of sleep affect cognitive performance on a standardized test?
Study Design
- Independent variable: Sleep duration (3 groups: Short (≤5 hrs), Moderate (6-7 hrs), Long (≥8 hrs)).
- Dependent variable: Cognitive performance scores (measured as test scores out of 100).

A one-way ANOVA was conducted to examine the effect of sleep duration on cognitive performance.
There was a statistically significant difference in cognitive test scores across sleep groups, F(2,87)=15.88,p<.001.
Tukey’s post-hoc test revealed that participants in the Long sleep group (M=81.52, SD=6.27) performed significantly better than those in the Short sleep group (M=65.68, SD=12.55), p<.01.
These results suggest that inadequate sleep is associated with lower cognitive performance.

--- title: "Lecture 04: ANOVA Assumptions Checking" subtitle: "Experimental Design in Education" date: "2025-02-10" execute: eval: true echo: true format: html: code-tools: true code-line-numbers: false code-fold: true code-summary: 'Click to see R code' number-offset: 1 fig.width: 10 fig-align: center message: false grid: sidebar-width: 350px uark-revealjs: chalkboard: true embed-resources: false code-fold: false number-sections: true number-depth: 1 footer: "ESRM 64503" slide-number: c/t tbl-colwidths: auto scrollable: true output-file: slides-index.html mermaid: theme: forest engine: knitr filters: - webr --- ## Class Outline - Go through three assumptions of ANOVA and their checking statistics - Post-hoc tests for more group comparisons - Example: Intervention and Verbal Acquisition - After-class Exercise: Effect of Sleep Duration on Cognitive Performance ## Review ANOVA Procedure 1. **Set hypotheses**: - **Null hypothesis** ($H_0$): All group means are equal. - **Alternative hypothesis** ($H_A$): At least one group mean differs. 2. **Determine statistical parameters**: - Significance level $\alpha$ - Degrees of freedom for between-group ($df_b$) and within-group ($df_w$) - Find the critical F-value. 3. **Compute test statistic**: - Calculate **F-ratio** based on between-group and within-group sum of squares. 4. **Compare results**: - Either compare $F_{\text{obs}}$ with $F_{\text{crit}}$ or $p$-value with $\alpha$. - If $p < \alpha$, reject $H_0$. ## ANOVA Type I Error 1. The way to compare the means of multiple groups (i.e., more than two groups) - Compared to multiple t-tests, the ANOVA does not inflate Type I error rates. - For F-test in ANOVA, Type I error rate is not inflated (testing the $H_0$ once a time) - Type I error rate is inflated when we do multiple comparison (testing multiple $H_0$s given the same data) - **Family-wise error rate (FWER)**: probability of at least one Type I error occur = $1 − (1 − \alpha)^c$ where c =number of tests. Frequently used FWER methods include Bonferroni or FDR. ------------------------------------------ ```{r} #| code-fold: true library(purrr) alpha <- .05 c <- seq(10) prob_TypeOneError <- 1 - (1 - alpha)^c plot(x = c, y = prob_TypeOneError, type = "b", xlab = "Number of comparisons", ylab = "Pr(#error > 1)", main = "Type I error inflation with number of tests increases") ``` ::: callout-note ## Family-Wise Error Rate 1. The **Family-Wise Error Rate** (FWER) is the probability of making at least one **Type I error** (false positive) across a set of multiple comparisons. 2. In **Analysis of Variance (ANOVA)**, multiple comparisons are often necessary when analyzing the differences between group means. FWER control is crucial to ensure the validity of statistical conclusions. ::: # ANOVA Assumptions ## Overview - Like all statistical tests, ANOVA requires certain assumptions to be met for valid conclusions: - **Independence**: Observations are independent of each other. - **Normality**: The residuals (errors) follow a normal distribution. - **Homogeneity of variance (HOV)**: The variance within each group is approximately equal. ## Importance of Assumptions - **If assumptions are violated**, the results of ANOVA may not be reliable. - **We call models robust** when ANOVAs are not easily influenced by the violation of assumptions. - Typically: - ANOVA is **robust** to minor violations of normality, especially for large sample sizes (Central Limit Theorem). - **Not robust** to violations of independence—if independence is violated, ANOVA is inappropriate. - **Moderately robust** to HOV violations if sample sizes are equal. ## Assumption 1: Independence among samples - **Definition**: Each observation should be independent of others. - **Violations**: - Clustering of data (e.g., repeated measures). - Participants influencing each other (e.g., classroom discussions). - **Check (Optional)**: Use the [Durbin-Watson test]{.redcolor} to check Serial Correlation. - **Solution**: Repeated Measures ANOVA, Mixed ANOVA, Multilevel Model - **Consequences**: If independence is violated, [ANOVA results are not valid]{.red}. ---------------------------------- **Example: Demonstrating Independence Violation** ```{r} #| code-fold: true #| results: hold # Generate data with dependency (autocorrelation) set.seed(1234) n_per_group <- 20 groups <- 3 # Create dependent data using autoregressive structure generate_dependent_data <- function(n, mean_val, sd_val, rho = 0.7) { y <- numeric(n) y[1] <- rnorm(1, mean_val, sd_val) for(i in 2:n) { y[i] <- rho * y[i-1] + rnorm(1, mean_val * (1-rho), sd_val * sqrt(1-rho^2)) } return(y) } # Generate data for 3 groups with different means but with dependency dependent_data <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( generate_dependent_data(n_per_group, 50, 10), # Group A generate_dependent_data(n_per_group, 55, 10), # Group B generate_dependent_data(n_per_group, 60, 10) # Group C ) ) # Compare with independent data independent_data <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( rnorm(n_per_group, 50, 10), # Group A rnorm(n_per_group, 55, 10), # Group B rnorm(n_per_group, 60, 10) # Group C ) ) # Run ANOVA on both datasets anova_dependent <- aov(score ~ group, data = dependent_data) anova_independent <- aov(score ~ group, data = independent_data) # Compare results cat("ANOVA with DEPENDENT data:\n") print(summary(anova_dependent)) cat("--------------------------\n") cat("\nANOVA with INDEPENDENT data:\n") print(summary(anova_independent)) # Check Durbin-Watson test cat("--------------------------\n") cat("\nDurbin-Watson test for dependent data:") print(lmtest::dwtest(anova_dependent)) cat("--------------------------\n") cat("\nDurbin-Watson test for independent data:") print(lmtest::dwtest(anova_independent)) ``` **Key observations:** - Dependent data often shows artificially **smaller p-values** (inflated significance) - Standard errors are **underestimated** when independence is violated - Durbin-Watson test helps detect this violation (DW ≠ 2, p < 0.05) ------------------------------------------------------------------------ ### Background: Durbin-Watson (DW) Test ::: callout-note - The Durbin-Watson test is primarily used for detecting **autocorrelation** in time-series data. - In the context of ANOVA with independent groups, residuals are generally assumed to be independent. - It's good practice to check this assumption, especially if there's a reason to suspect potential autocorrelation. ::: ::: callout-important - Properties of DW Statistic: - $H_0$: [Independence of residual is satisfied.]{.redcolor} - Ranges from 0 to 4. - A value around 2 suggests no autocorrelation. - Values approaching 0 indicate positive autocorrelation. - Values toward 4 suggest negative autocorrelation. - P-value - P-Value: A small p-value (typically \< 0.05) indicates violation of independence ::: --------------------------------------------- **Real-world examples of independence violations:** - **Dependent data scenarios**: - **Repeated measures**: Testing the same students multiple times (pre-test, mid-test, post-test) - **Classroom clusters**: Students in the same classroom influence each other's performance - **Time series**: Daily test scores where yesterday's performance affects today's - **Spatial correlation**: Schools in the same district sharing similar resources and outcomes - **Independent data scenarios**: - **Random sampling**: Selecting students from different schools across different regions - **Single measurement**: Each participant tested only once with no interaction - **Controlled assignment**: Participants randomly assigned to groups with no cross-contamination **Why this matters**: When observations are dependent, they don't provide as much unique information as independent observations. This leads to overconfident results (smaller standard errors and p-values). --------------------------------------------- ### Visualization of indepdence test ```{r} #| code-fold: true #| fig-width: 12 #| fig-height: 8 #| warning: false library(ggplot2) library(gridExtra) # Use the same data from previous example n_per_group <- 20 # Generate dependent and independent data dependent_data <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( generate_dependent_data(n_per_group, 50, 10), generate_dependent_data(n_per_group, 55, 10), generate_dependent_data(n_per_group, 60, 10) ), observation = 1:(3*n_per_group), type = "Dependent (AR)" ) independent_data <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( rnorm(n_per_group, 50, 10), rnorm(n_per_group, 55, 10), rnorm(n_per_group, 60, 10) ), observation = 1:(3*n_per_group), type = "Independent" ) # Combine data combined_data <- rbind(dependent_data, independent_data) # Plot 1: Time series plot showing autocorrelation p1 <- ggplot(combined_data, aes(x = observation, y = score, color = group)) + geom_line(size = 0.8) + geom_point(size = 1.5) + facet_wrap(~type, scales = "free") + labs(title = "Time Series: Independence vs Dependence", x = "Observation Order", y = "Score") + theme_minimal() + theme(legend.position = "bottom") # Plot 2: Lag plot to show autocorrelation create_lag_data <- function(data) { data$score_lag1 <- c(NA, data$score[-length(data$score)]) return(data[!is.na(data$score_lag1), ]) } lag_dependent <- create_lag_data(dependent_data) lag_independent <- create_lag_data(independent_data) lag_combined <- rbind(lag_dependent, lag_independent) p2 <- ggplot(lag_combined, aes(x = score_lag1, y = score)) + geom_point(aes(color = group), alpha = 0.7) + geom_smooth(method = "lm", se = TRUE, color = "black", size = 0.5) + facet_wrap(~type, scales = "free") + labs(title = "Lag Plot: Current vs Previous Observation", x = "Score (t-1)", y = "Score (t)") + theme_minimal() + theme(legend.position = "bottom") # Display plots grid.arrange(p1, p2, ncol = 1) ``` **Key Visual Differences:** - **Dependent data**: Shows smooth, connected patterns with visible trends - **Independent data**: Shows random scatter with no systematic patterns - **Lag plot**: Dependent data shows positive correlation between consecutive observations ------------------------------------------------------------------------ [`lmtest::dwtest()`]{.bluecolor .bigger} - Performs the Durbin-Watson test for autocorrelation of disturbances. - The Durbin-Watson test has the null hypothesis that the autocorrelation of the disturbances is 0. ```{r} #| eval: false #install.packages("lmtest") library(lmtest) err1 <- rnorm(100) ## generate regressor and dependent variable x <- rep(c(-1,1), 50) y <- 1 + x + err1 dat <- data.frame(x = x, y = y) ## perform Durbin-Watson test dwtest(y ~ x, data = dat) ``` These results suggest there is no autocorrelation at the alpha level of .05. -------------------------------- ::: panel-tabset ### Exercise - Below is the R code that has loaded two datasets for you. `dat1` and `dat2`, try to examine which dataset violates independence. ```{webr-r} #| context: setup set.seed(1999) # Create dependent data using autoregressive structure generate_dependent_data <- function(n, mean_val, sd_val, rho = 0.7) { y <- numeric(n) y[1] <- rnorm(1, mean_val, sd_val) for(i in 2:n) { y[i] <- rho * y[i-1] + rnorm(1, mean_val * (1-rho), sd_val * sqrt(1-rho^2)) } return(y) } # Use the same data from previous example n_per_group <- 30 # Generate dependent and independent data dat1 <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( generate_dependent_data(n_per_group, 50, 10), generate_dependent_data(n_per_group, 55, 10), generate_dependent_data(n_per_group, 60, 10) ) ) dat2 <- data.frame( group = factor(rep(c("A", "B", "C"), each = n_per_group)), score = c( rnorm(n_per_group, 50, 10), rnorm(n_per_group, 55, 10), rnorm(n_per_group, 60, 10) ) ) ``` ```{webr-r} #| editor-code-line-numbers: 4-7 print(dat1) print(dat2) ## Test which data violate independence assumption library(________) print(dwtest(________)) print(dwtest(________)) ``` ### Answer ```{webr-r} # DW = 0.8248, p-value = 6.68e-11, dat1 violates independence # DW = 2.2514, p-value = 0.8386, dat2 does not violates independence library(lmtest) print(dwtest(score ~ group, data = dat1)) print(dwtest(score ~ group, data = dat2)) ``` ::: ## Assumption 2: Normality - The **dependent variable (DV)** should be normally distributed within each group. - **Assessments**: - Graphical methods: Histograms, [Q-Q plots]{.redcolor}. - Statistical tests: - [Shapiro-Wilk test (common)]{.redcolor} - Kolmogorov-Smirnov (KS) test (for large samples) - Anderson-Darling test (detects kurtosis issues). - **Robustness**: - ANOVA is robust to normality violations for large samples. - If normality is violated, consider transformations or non-parametric tests. ------------------------------------------------------------------------ ### Normality check for the whole samples - Assume that the DV (Y) is distributed normally within each group for ANOVA - ANOVA is **robust** to minor violations of normality - So generally start with homogeneity, then assess normality ![](images/clipboard-3483482689.png) ------------------------------------------------------------------------ ### R function `shapiro.test()` ```{r} #| eval: true #| code-fold: true #| results: hold set.seed(123) # For reproducibility sample_normal_data <- rnorm(200, mean = 50, sd = 10) # Generate normal data sample_nonnormal_data <- sample(1:10, 200, replace = TRUE) # Generate non-normal data cat("Normality assumption holds:") shapiro.test(sample_normal_data) cat("Normality assumption does not hold:") shapiro.test(sample_nonnormal_data) ``` ------------------------------------------------------------------------ ### R function `ks.test()` ```{r} #| results: hold # Perform Kolmogorov-Smirnov test against a normal distribution cat("Normality assumption holds:") ks.test(scale(sample_normal_data), "pnorm") cat("Normality assumption does not hold:") ks.test(scale(sample_nonnormal_data), "pnorm") ``` ::: panel-tabset ### Exercise II Perform Shapiro-Wilk and Kolmogorov-Smirnov for both datasets. Tell me which test works or does not work and why. ```{webr-r} #| editor-code-line-numbers: 6-11 set.seed(2345) normal_data <- rnorm(50, mean = 50, sd = 10) # normal data nonnormal_data <- sample(1:10, 50, replace = TRUE) # non-normal ### Perform Shapiro-Wilk and Kolmogorov-Smirnov for both data ### Shapiro-Wilk _____(normal_data) _____(nonnormal_data) ### Kolmogorov-Smirnov _____(____(normal_data, _____)) _____(____(nonnormal_data, _____)) ``` ### Answer ```{webr-r} shapiro.test(normal_data) shapiro.test(nonnormal_data) ks.test(scale(normal_data), "pnorm") ks.test(scale(nonnormal_data), "pnorm") ``` ::: ----------------------- ### Normaliy check for each group ```{r} #| code-fold: false #| warning: false # Generate three-group data with normal distributions library(dplyr) # we need this package to group the data set.seed(123) group_data <- data.frame( group = rep(c("A", "B", "C"), each = 30), score = c(rnorm(30, mean = 50, sd = 10), # Group A rnorm(30, mean = 55, sd = 12), # Group B rnorm(30, mean = 48, sd = 9)) # Group C ) # Run Shapiro-Wilk test for each group group_data |> group_by(group) |> summarize( shapiro_p_value = shapiro.test(score)$p.value ) ``` **Interpretation of Normality Checking Results:** - If all p-values > 0.05: All groups satisfy normality assumption - proceed with ANOVA - If one group has p-value < 0.05 but others are normal: - ANOVA is generally robust to minor normality violations, especially with equal sample sizes - Consider the sample size: larger samples (n > 30) make ANOVA more robust - Options: (1) Continue with ANOVA if violation is mild, (2) Apply data transformation (log, square root), or (3) Use non-parametric alternative (Kruskal-Wallis test) - If multiple groups violate normality: Strong evidence against using ANOVA - consider transformations or non-parametric tests -------- ::: panel-tabset ### Exercise III Perform Shapiro-Wilk for each group datasets. Tell me which group is not normal. ```{webr-r} #| context: setup set.seed(2345) group_data2 <- data.frame( group = rep(c("A", "B", "C"), each = 30), score = c(round(rnorm(30, mean = 50, sd = 10), 2), # Group A sample(seq(50, 100, .01), 30, replace = TRUE), # Group B round(rnorm(30, mean = 48, sd = 9), 2)) # Group C ) ``` ```{webr-r} #| editor-code-line-numbers: 5-9 library(dplyr) # we need this package to group the data head(group_data2) # check first 6 rows ### Perform Shapiro-Wilk for each group group_data2 |> ______(______) |> ______( shapiro_p_value = __________(________)$__________ ) ``` ### Answer ```{webr-r} group_data2 |> group_by(group) |> summarize( shapiro_p_value = shapiro.test(score)$p.value ) ``` ::: -------------------------------------------------------------------- ## Assumption 3: Homogeneity of Variance (HOV) - Variance across groups should be equal. - **Assessments**: - **Levene’s test**: Tests equality of variances. - **Brown-Forsythe test**: More robust to non-normality. - **Bartlett's test**: For data with normality. - **Boxplots**: Visual inspection. - **What if violated?** - **Welch’s ANOVA** (adjusted for variance differences). - **Transforming** the dependent variable. - **Using non-parametric tests** (e.g., Kruskal-Wallis). ------------------------------------------------------------------------ [Practical Considerations]{.bluecolor .bigger} ```{r} #| eval: false ## Computes Levene's test for homogeneity of variance across groups. car::leveneTest(outcome ~ group, data = data) ## Boxplots to visualize the variance by groups boxplot(outcome ~ group, data = data) ## Brown-Forsythe test onewaytests::bf.test(outcome ~ group, data = data) ## Bartlett's Test bartlett.test(outcome ~ group, data = data) ``` - **Levene's test** and **Brown-Forsythe test** are often preferred when data does not meet the assumption of normality, especially for small sample sizes. - **Bartlett’s test** is most powerful when the data is normally distributed and when sample sizes are equal across groups. However, it can be less reliable if these assumptions are violated. ------------------------------------------------------------------------ [Decision Tree]{.redcolor .bigger} ```{dot} //| echo: false digraph flowchart { node [shape=box, style=filled, fillcolor=lightgray]; A [label="Is the data normal by group?"]; B [label="Use Bartlett's Test"]; C [label="Are there outliers?"]; D [label="Use Brown-Forsythe Test"]; E [label="Use Levene's Test"]; A -> B [label="Yes"]; A -> C [label="No"]; C -> D [label="Yes"]; C -> E [label="No"]; } ``` ------------------------------------------------------------------------ ## ANOVA Robustness - **Robust to**: - Minor normality violations (for large samples). - Small HOV violations if group sizes are **equal**. - **Not robust to**: - **Independence violations**—ANOVA is invalid if data points are dependent. - **Severe HOV violations**—Type I error rates become unreliable. - The robustness of assumptions is something you should be careful about before/when you perform data collection. They are not something you can address after data collection has been finished. ------------------------------------------------------------------------ [Robustness to Violations of Normality Assumption]{.redcolor} - ANOVA assumes that the residuals (errors) are normally distributed within each group. - However, ANOVA is generally robust to violations of normality, particularly when **the sample size is large**. - **Theoretical Justification**: This robustness is primarily due to the Central Limit Theorem (CLT), which states that, for sufficiently large sample sizes (typically $n≥30$ per group), the sampling distribution of the mean approaches normality, even if the underlying population distribution is non-normal. - This means that unless the data are heavily skewed or have extreme outliers, ANOVA results remain valid and Type I error rates are not severely inflated. ------------------------------------------------------------------------ ### Robustness to Violations of Homogeneity of Variance - The homogeneity of variance (homoscedasticity) assumption states that all groups should have equal variances. ANOVA can tolerate moderate violations of this assumption, particularly when: - **Sample sizes are equal (or nearly equal) across groups** – When groups have equal sample sizes, the F-test remains robust to variance heterogeneity because the pooled variance estimate remains balanced. - **The degree of variance heterogeneity is not extreme** – If the largest group variance is no more than about four times the smallest variance, ANOVA results tend to remain accurate. ------------------------------------------------------------------------ ### ANOVA: Lack of Robustness to Violations of Independence of Errors - The assumption of independence of errors means that observations within and between groups must be uncorrelated. Violations of this assumption severely compromise ANOVA’s validity because: - **Inflated Type I error rates** – If errors are correlated (e.g., due to clustering or repeated measures), standard errors are underestimated, leading to an increased likelihood of falsely rejecting the null hypothesis. - **Biased parameter estimates** – When observations are not independent, the variance estimates do not accurately reflect the true variability in the data, distorting F-statistics and p-values. - **Common sources of dependency** – Examples include nested data (e.g., students within schools), repeated measurements on the same subjects, or time-series data. In such cases, alternatives like mixed-effects models or generalized estimating equations (GEE) should be considered. # Omnibus ANOVA Test ## Overview - **What does it test?** - Whether there is **at least one** significant difference among means. - **Limitation**: - Does **not** tell **which** groups are different. - **Solution**: - Conduct **post-hoc tests**. ## Individual Comparisons of Means - If ANOVA is **significant**, follow-up tests identify **where** differences occur. - **Types**: - **Unplanned (post-hoc) comparisons**: Conducted **after** ANOVA (Ominibus ANOVA Test; Post-hoc Test). - **Planned comparisons**: Defined **before** data collection. ## Planned vs. Unplanned Comparisons - **Unplanned (post-hoc)**: - **Data-driven**. - Only performed **if ANOVA is significant**. - **Planned**: - Based on **theory**. - Can be done **even if ANOVA is not significant**. ## Types of Unplanned Comparisons - **Common post-hoc tests**: 1. **Tukey’s HSD** 2. **Family-Wise Error Rate** or Adjusted p-values 3. **Fisher’s LSD** 4. **Sidák correction** ------------------------------------------------------------------------ ### Post-hoc test [Tukey’s HSD]{.bluecolor} - Controls for **Type I error** across multiple comparisons. - Uses a **q-statistic** from a Tukey table. - Preferred when **all pairs** need comparison. ```{r} aov_result <- aov(score ~ group, group_data) # run the ANOVA model first TukeyHSD(aov_result) # call TukeyHSD followed by the model to perform Tukey multiple comparisons ``` [Family-wise Error Rate (adjusted p-values)]{.bluecolor} - Adjusts p-values to reduce Type I error - Report adjusted p-values (typically larger that original p-values) ### Tukey Honest Significant Differences (HSD) Test - **Purpose**: Conducts all possible pairwise comparisons between group means while controlling the family-wise error rate at the specified alpha level (typically 0.05). - **Why it's needed**: Simple pairwise t-tests after a significant ANOVA inflate the Type I error rate. Tukey HSD adjusts for multiple comparisons to maintain the overall error rate. - **How it works**: Creates confidence intervals for all pairwise mean differences using the studentized range distribution, which accounts for the number of groups being compared. - **Interpretation**: If a confidence interval does not include zero, or if p adj < 0.05, the pair of groups differs significantly. ```{r} #| eval: true tukey_result <- TukeyHSD(aov_result) print(tukey_result$group) ``` **Interpretation**: The Tukey HSD test provides pairwise comparisons between all groups while controlling for family-wise error rate. The output shows: - **diff**: The difference in means between each pair of groups - **lwr & upr**: The 95% confidence interval for the difference - **p adj**: The adjusted p-value for multiple comparisons Pairs with p adj < 0.05 indicate statistically significant differences between those groups. ------------------------------------------------------------------------ ```{r} #| eval: true plot(tukey_result) ``` ## Interpreting Results ### ANOVA Statistics ``` Df Sum Sq Mean Sq F value Pr(>F) group 3 724.1 241.37 11.45 2.04e-05 *** Residuals 36 759.2 21.09 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```{r} #| eval: false #| code-fold: true library(dplyr) data |> group_by(group) |> summarise( Mean = mean(score), SD = sd(score) ) ``` - A one-way analysis of variance (ANOVA) was conducted to examine the effect of teaching method on students' test scores. The results indicated a statistically significant difference in test scores across the four groups, (F(3, 36) = 11.45, p < .001). Post-hoc comparisons using the Tukey HSD test revealed that the Interactive method (G3) (M = 28.06, SD = 3.33) resulted in significantly higher scores than the Traditional method (M = 18.17, SD = 4.47) with p < .001. However, no significant difference was found between G1 and the Control group (p = .99). These results suggest that certain teaching methods can significantly improve student performance compared to traditional approaches. ## ANOVA Example: Intervention and Verbal Acquisition ### Background - Research Question: Does an intensive intervention improve students’ verbal acquisition scores? - Study Design: - 4 groups: Control, G1, G2, G3 (treatment levels). - Outcome variable: Verbal acquisition score (average of three assessments). - Hypotheses: - $H_0$: No difference in verbal acquisition scores across groups. - $H_A$: At least one group has a significantly different mean. ### Step 1: Generate Simulated Data in R ```{r} #| warning: false #| message: false #| results: hold # Load necessary libraries library(tidyverse) # Set seed for reproducibility set.seed(123) # Generate synthetic data for 4 groups data <- tibble( group = rep(c("Control", "G1", "G2", "G3"), each = 30), verbal_score = c( rnorm(30, mean = 70, sd = 10), # Control group rnorm(30, mean = 75, sd = 12), # G1 rnorm(30, mean = 80, sd = 10), # G2 rnorm(30, mean = 85, sd = 8) # G3 ) ) # View first few rows head(data) ``` **Function explanations:** - `library(tidyverse)`: Loads the tidyverse package collection, including dplyr, ggplot2, and tibble - `set.seed(123)`: Sets the random number generator seed to ensure reproducible results across runs - `tibble()`: Creates a modern data frame (tibble) with improved printing and subsetting behavior - `rep(c("Control", "G1", "G2", "G3"), each = 30)`: Repeats each group label 30 times, creating 120 total observations (30 per group) - `rnorm(30, mean = X, sd = Y)`: Generates 30 random numbers from a normal distribution with specified mean and standard deviation - `c()`: Combines the four vectors of random numbers into a single vector of 120 values - `head(data)`: Displays the first 6 rows of the dataset for preview ### Step 2: Summary Statistics ```{r} # Summary statistics by group data|> group_by(group)|> summarise( mean_score = mean(verbal_score), sd_score = sd(verbal_score), n = n() ) ``` **Function explanations:** - `group_by(group)`: Groups the data by the `group` variable, allowing subsequent operations to be performed separately for each group - `summarise()`: Creates summary statistics for each group, collapsing the grouped data into summary values - `mean(verbal_score)`: Calculates the arithmetic mean of verbal scores within each group - `sd(verbal_score)`: Computes the standard deviation of verbal scores within each group - `n()`: Counts the number of observations in each group ------------------------------------------------------------------------ ```{r} # Boxplot visualization library(ggplot2) # Load ggplot2 package for creating visualizations ggplot(data, aes(x = group, y = verbal_score, fill = group)) + # Create base plot with group on x-axis, scores on y-axis, fill by group geom_boxplot() + # Add boxplot layer to show distribution of scores within each group theme_minimal() + # Apply minimal theme for clean appearance labs(title = "Verbal Acquisition Scores Across Groups", y = "Score", x = "Group") # Add descriptive labels ``` ### Step 3: Check ANOVA Assumptions ### Assumption Check 1: Independence of residuals Check ```{r} #| results: hold #| warning: false #| message: false # Fit the ANOVA model anova_model <- lm(verbal_score ~ group, data = data) # Load the lmtest package library(lmtest) # Perform the Durbin-Watson test dw_test_result <- dwtest(anova_model) # View the test results print(dw_test_result) ``` - Interpretation: - In this example, the `DW` value is close to 2, and the p-value is greater than 0.05, indicating no significant autocorrelation in the residuals. ------------------------------------------------------------------------ ### Assumption Check 2: Normality Check ```{r} # Shapiro-Wilk normality test for each group data |> group_by(group) |> summarise( shapiro_p = shapiro.test(verbal_score)$p.value ) ``` - Interpretation: - If $p>0.05$, normality assumption is not violated. - If $p<0.05$, data deviates from normal distribution. - Since the data meets the normality requirement and no outliers, we can use [Bartlett's or Levene's tests]{.redcolor} for HOV checking. ------------------------------------------------------------------------ - Alternative Check: Q-Q Plot ```{r} ggplot(data, aes(sample = verbal_score)) + geom_qq() + geom_qq_line() + facet_wrap(~group) + theme_minimal() + labs(title = "Q-Q Plot for Normality Check") ``` ------------------------------------------------------------------------ ### Assumption Check 3: Homogeneity of Variance (HOV) Check ```{r} #| results: hold #| warning: false library(car) # for leveneTest library(onewaytests) # for bf.test data$group <- as.factor(data$group) car::leveneTest(verbal_score ~ group, data = data) cat("-----------------------------\n") bartlett.test(verbal_score ~ group, data = data) cat("-----------------------------\n") onewaytests::bf.test(verbal_score ~ group, data = data) ``` - Interpretation: - If $p>0.05$, variance is homogeneous (ANOVA assumption met). - If $p<0.05$, variance differs across groups (consider Welch’s ANOVA). - It turns out our data does not violate the homogeneity of variance assumption. ### Step 4: Perform One-Way ANOVA ```{r} anova_model <- aov(verbal_score ~ group, data = data) summary(anova_model) ``` - Interpretation: - If $p<0.05$, at least one group mean is significantly different. - If $p>0.05$, fail to reject $H0$ (no significant differences). ### Step 5: Post-Hoc Tests (Tukey's HSD) - After finding a significant ANOVA result, we perform Tukey's HSD test to identify which specific groups differ from each other. - For example, we can compare: - G1 vs Control - G2 vs Control - G3 vs Control - G2 vs G1 - G3 vs G1 - G3 vs G2 ```{r} #| code-fold: false # Tukey HSD post-hoc test tukey_results <- TukeyHSD(anova_model) round(tukey_results$group, 3) ``` - Interpretation: - Identifies which groups differ. - If $p<0.05$, the groups significantly differ. - G1-Control - G2-Control - G3-Control - G3-G1 ------------------------------------------------------------------------ ### Step 6: Family-wise Error Rate Control - General Linear Hypothesis Tests: `glht` function in multcomp package allow us to perform family-wise error rate control (using `adjusted p-values`). - Alternatively, we can use `TukeyHSD` to adjust p values to control inflated Type I error. $$ Y = \beta_0G_0+\beta_1G_1 +\beta_2G_2 +\beta_3G_3 $$ - When $\{G_0, G_1, G_2, G_3\}$ = $\{0, 1, 0, 0\}$, $Y = \beta_1$, which is the group difference between G1 with the control group - When $\{G_0, G_1, G_2, G_3\}$ = $\{0, 0, 1, 0\}$, $Y = \beta_2$, which is the group difference between G2 with the control group - When $\{G_0, G_1, G_2, G_3\}$ = $\{0, -1, 1, 0\}$, $Y = \beta_2 - \beta_1$, which is the group difference between G2 with G1 ```{r} #| warning: false #| message: false #| results: hold library(multcomp) # install.packages("multcomp") ### set up multiple comparisons object for all-pair comparisons # head(model.matrix(anova_model)) comprs <- rbind( "G1 - Ctrl" = c(0, 1, 0, 0), "G2 - Ctrl" = c(0, 0, 1, 0), "G3 - Ctrl" = c(0, 0, 0, 1), "G2 - G1" = c(0, -1, 1, 0), "G3 - G1" = c(0, -1, 0, 1), "G3 - G2" = c(0, 0, -1, 1) ) cht <- glht(anova_model, linfct = comprs) summary(cht, test = adjusted("fdr")) ``` ------------------------------------------------------------------------ ### Side-by-Side comparison of two methods ::::: columns ::: {.column width="50%"} #### TukeyHSD method ```{r} #| eval: false #| code-line-numbers: false #| code-summary: "TukeyHSD method" diff lwr upr p adj G1-Control 7.611 1.545 13.677 0.008 ** G2-Control 10.715 4.649 16.782 0.000 *** G3-Control 14.720 8.654 20.786 0.000 *** G2-G1 3.104 -2.962 9.170 0.543 G3-G1 7.109 1.042 13.175 0.015 * G3-G2 4.005 -2.062 10.071 0.318 ``` ::: ::: {.column width="50%"} #### FDR method ```{r} #| eval: false #| code-line-numbers: false #| code-summary: "multcomp package" Estimate Std. Error t value Pr(>|t|) G1 - Ctrl == 0 7.611 2.327 3.270 0.00283 ** G2 - Ctrl == 0 10.715 2.327 4.604 3.20e-05 *** G3 - Ctrl == 0 14.720 2.327 6.325 2.94e-08 *** G2 - G1 == 0 3.104 2.327 1.334 0.18487 G3 - G1 == 0 7.109 2.327 3.055 0.00419 ** G3 - G2 == 0 4.005 2.327 1.721 0.10555 ``` ::: - The differences in p-value adjustment between the `TukeyHSD` method and the `multcomp` package stem from how each approach calculates and applies the multiple comparisons correction. Below is a detailed explanation of these differences. ::::: ### Comparison of Differences | Feature | `TukeyHSD()` (Base R) | `multcomp::glht()` | |----|----|----| | **Distribution Used** | Studentized Range (q-distribution) | t-distribution | | **Error Rate Control** | Strong FWER control | Flexible error control | | **Simultaneous Confidence Intervals** | Yes | Typically not (depends on method used) | | **Adjustment Method** | Tukey-Kramer adjustment | Single-step, Westfall, Holm, Bonferroni, etc. | | **P-value Differences** | More conservative (larger p-values) | Slightly different due to t-distribution | ### Step 6: Reporting ANOVA Results [Result]{.bigger} 1. We first examined three assumptions of ANOVA for our data as the preliminary analysis. According to the Durbin-Watson test, the Shapiro-Wilk normality test, and the Bartletts' test, the sample data meets all assumptions of the one-way ANOVA modeling. 2. A one-way ANOVA was then conducted to examine the effect of three intensive intervention methods (Control, G1, G2, G3) on verbal acquisition scores. There was a statistically significant difference between groups, $F(3,116)=14.33$, $p<.001$. 3. To further examine which intervention method is most effective, we performed Tukey's post-hoc comparisons. The results revealed that all three intervention methods have significantly higher scores than the control group (G1-Ctrl: p = .003; G2-Ctrl: p \< .001; G3-Ctrl: p \< .001). Among the three intervention methods, G3 seems to be the most effective. Specifically, G3 showed significantly higher scores than G1 (p = .004). However, no significant difference was found between G2 and G3 (p = .105). [Discussion]{.bigger} These findings suggest that higher intervention intensity improves verbal acquisition performance, which is consistent with prior literature \[xxxx/references\]. # After-Class Exercise: Effect of Sleep Duration on Cognitive Performance ## Background - Research Question: - Does the amount of sleep affect cognitive performance on a standardized test? - Study Design - Independent variable: Sleep duration (3 groups: Short (≤5 hrs), Moderate (6-7 hrs), Long (≥8 hrs)). - Dependent variable: Cognitive performance scores (measured as test scores out of 100). --------------------------------------------- ::: panel-tabset ### Exercise ```{webr-r} #| context: setup # Set seed for reproducibility set.seed(42) # Generate synthetic data for sleep study sleep_data <- data.frame( sleep_group = rep(c("Short", "Moderate", "Long"), each = 30), cognitive_score = c( rnorm(30, mean = 65, sd = 10), # Short sleep group (≤5 hrs) rnorm(30, mean = 75, sd = 12), # Moderate sleep group (6-7 hrs) rnorm(30, mean = 80, sd = 8) # Long sleep group (≥8 hrs) ) ) ``` ```{webr-r} # load packages you need library(dplyr) library(ggplot2) library(lmtest) library(car) # View first few rows head(sleep_data) # Summary statistics by sleep group sleep_data |> group_by(sleep_group) |> summarise( _______ ) # Step 3: Check ANOVA Assumptions # Indepdendence check: Perform the Durbin-Watson test dw_test_result <- ____(____(_________, data = sleep_data)) # View the test results print(dw_test_result) # Normality check: Shapiro-Wilk normality test for each group sleep_data |> group_by(sleep_group) |> summarise( shapiro_p = ______(_______)$p.value ) # HOV check: Levene's Test for homogeneity of variance ______(_______, data = sleep_data) # Step 5: Perform One-Way ANOVA anova_sleep <- ________(__________, data = sleep_data) summary(anova_sleep) # Step 6: Type I error control -- Tukey HSD post-hoc test tukey_sleep <- TukeyHSD(_______) tukey_sleep ``` ### Answer ```{webr-r} #| eval: false #| echo: false library(dplyr) library(ggplot2) library(lmtest) library(car) # View first few rows head(sleep_data) # Summary statistics by sleep group sleep_data |> group_by(sleep_group) |> summarise( mean_score = mean(cognitive_score), sd_score = sd(cognitive_score), n = n() ) # Boxplot visualization ggplot(sleep_data, aes(x = sleep_group, y = cognitive_score, fill = sleep_group)) + geom_boxplot() + theme_minimal() + labs(title = "Cognitive Performance Across Sleep Groups", y = "Test Score", x = "Sleep Duration") # Step 3: Check ANOVA Assumptions # Indepdendence check: Perform the Durbin-Watson test dw_test_result <- dwtest(aov(cognitive_score ~ sleep_group, data = sleep_data)) # View the test results print(dw_test_result) # Normality check: Shapiro-Wilk normality test for each group sleep_data |> group_by(sleep_group) |> summarise( shapiro_p = shapiro.test(cognitive_score)$p.value ) ggplot(sleep_data, aes(sample = cognitive_score)) + geom_qq() + geom_qq_line() + facet_wrap(~sleep_group) + theme_minimal() + labs(title = "Q-Q Plot for Normality Check") # HOV check: Levene's Test for homogeneity of variance leveneTest(cognitive_score ~ sleep_group, data = sleep_data) # Step 5: Perform One-Way ANOVA anova_sleep <- aov(cognitive_score ~ sleep_group, data = sleep_data) summary(anova_sleep) # Step 6: Type I error control -- Tukey HSD post-hoc test tukey_sleep <- TukeyHSD(anova_sleep) tukey_sleep ``` ::: heimu - A one-way ANOVA was conducted to examine the effect of sleep duration on cognitive performance. - There was a statistically significant difference in cognitive test scores across sleep groups, $F(2,87)=15.88$,$p<.001$. - Tukey's post-hoc test revealed that participants in the Long sleep group (M=81.52, SD=6.27) performed significantly better than those in the Short sleep group (M=65.68, SD=12.55), p\<.01. - These results suggest that inadequate sleep is associated with lower cognitive performance. :::

1.1 Class Outline

1.2 Review ANOVA Procedure

1.3 ANOVA Type I Error

2 ANOVA Assumptions

2.1 Overview

2.2 Importance of Assumptions

2.3 Assumption 1: Independence among samples

2.3.1 Background: Durbin-Watson (DW) Test

2.3.2 Visualization of indepdence test

2.4 Assumption 2: Normality

2.4.1 Normality check for the whole samples

2.4.2 R function shapiro.test()

2.4.3 R function ks.test()

2.4.4 Normaliy check for each group

2.5 Assumption 3: Homogeneity of Variance (HOV)

2.6 ANOVA Robustness

2.6.1 Robustness to Violations of Homogeneity of Variance

2.6.2 ANOVA: Lack of Robustness to Violations of Independence of Errors

3 Omnibus ANOVA Test

3.1 Overview

3.2 Individual Comparisons of Means

3.3 Planned vs. Unplanned Comparisons

3.4 Types of Unplanned Comparisons

3.4.1 Post-hoc test

3.4.2 Tukey Honest Significant Differences (HSD) Test

3.5 Interpreting Results

3.5.1 ANOVA Statistics

3.6 ANOVA Example: Intervention and Verbal Acquisition

3.6.1 Background

3.6.2 Step 1: Generate Simulated Data in R

3.6.3 Step 2: Summary Statistics

3.6.4 Step 3: Check ANOVA Assumptions

3.6.5 Assumption Check 1: Independence of residuals Check

3.6.6 Assumption Check 2: Normality Check

3.6.7 Assumption Check 3: Homogeneity of Variance (HOV) Check

3.6.8 Step 4: Perform One-Way ANOVA

3.6.9 Step 5: Post-Hoc Tests (Tukey’s HSD)

3.6.10 Step 6: Family-wise Error Rate Control

3.6.11 Side-by-Side comparison of two methods

3.6.11.1 TukeyHSD method

3.6.11.2 FDR method

3.6.12 Comparison of Differences

3.6.13 Step 6: Reporting ANOVA Results

4 After-Class Exercise: Effect of Sleep Duration on Cognitive Performance

4.1 Background

2.4.2 R function `shapiro.test()`

2.4.3 R function `ks.test()`