Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-04-09
ANOVA is Analysis of Variance
ANCOVA is Analysis of Covariance
What we will discuss today is a statistical control for reducing the variance due to error.
Statistical control is used when we know a subject’s score on an additional variable.
ANCOVA, by definition, is a general linear model that includes both a categorical independent variable (the factor) and a continuous covariate.
We are interested in comparing methods of instruction on students’ math problem-solving skills, as measured by a test score.
One-way, independent ANOVA: If we get a significant F statistic, we conclude that the methods of instruction differed in the mean number of math problems answered correctly.
But, performance on math word problems may be affected by things other than method of instruction and math ability.
The DV score results from the instructional method plus these other factors (e.g., motivation, verbal proficiency).
We really want to ask:
To what extent might we have obtained method of instruction difference on math scores had the groups been equivalent in their motivation levels? (or verbal proficiency?)
The inclusion of the covariate adjusts the groups’ means, as if they are the same on the covariate.
The analyses addressed different research questions!
If there is no random assignment to treatments, we should be cautious about using ANCOVA!
Each student had also completed a math pre-test (as an indicator of students’ ability), so the principal decided to include the pre-test scores in the analysis as the covariate.
In this example, we will simulate data for 200 students who were randomly assigned to one of three teaching methods.
The pre-test scores will be used as a covariate in the ANCOVA analysis.
The post-test scores will be generated based on the pre-test scores and the teaching method.
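The three simulation steps above can be sketched in Python (a minimal illustration; the means, SDs, and method effects below are assumed values, not the document’s actual simulation parameters):

```python
# Minimal sketch of the simulation described above.
# All numeric parameters here are assumptions for illustration.
import random

random.seed(123)

methods = ["lecture", "hands-on", "self-paced"]
method_effect = {"lecture": 0.0, "hands-on": 4.0, "self-paced": 2.5}  # assumed effects

students = []
for student_id in range(1, 201):  # 200 students
    method = random.choice(methods)    # random assignment to a teaching method
    pretest = random.gauss(65, 5)      # pre-test: indicator of math ability
    # post-test depends on the pre-test (covariate) plus the method effect
    posttest = 20 + 0.7 * pretest + method_effect[method] + random.gauss(0, 5)
    students.append({"student_id": student_id, "method": method,
                     "pretest": round(pretest, 1), "posttest": round(posttest, 1)})
```

In a real analysis the `students` records would then be assembled into a data frame like the one shown below.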
student_id | method | pretest | posttest |
---|---|---|---|
1 | lecture | 68.9 | 70.9 |
2 | hands-on | 66.1 | 62.8 |
3 | lecture | 61.6 | 53.5 |
4 | self-paced | 72.5 | 85.8 |
5 | hands-on | 64.9 | 73.1 |
Assumption Checks in ANCOVA:
➤ The three usual ANOVA assumptions apply: independence of observations, normality, and homogeneity of variance.
➤ Three additional data considerations: a linear relationship between the covariate and the DV, homogeneity of regression slopes across groups, and a reliably measured covariate.
Looking at the plot, we can see a generally linear relationship (i.e., a straight line) between the covariate (pretest) and the DV (mathtest).
# Plot
ggplot(df, aes(x = pretest, y = posttest)) +
geom_point(color = "blue", alpha = 0.6) +
geom_smooth(method = "lm", color = "steelblue", se = FALSE) +
theme_minimal(base_size = 14) +
labs(x = "pretest", y = "mathtest") +
theme(legend.position = "bottom") +
guides(color = guide_legend(title = NULL))
import matplotlib.pyplot as plt
import numpy as np
# Generate x values (covariate)
x = np.linspace(0, 10, 100)
# Homogeneous regression slopes
y1_homo = 1.0 * x + 2
y2_homo = 1.0 * x + 4
y3_homo = 1.0 * x + 6
# Heterogeneous regression slopes
y1_hetero = -0.1 * x + 6
y2_hetero = 1.2 * x + 1
y3_hetero = 0.9 * x + 3
# Create the figure with two subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
# Plot for homogeneous regression slopes
axs[0].plot(x, y1_homo, label="Group 1")
axs[0].plot(x, y2_homo, label="Group 2")
axs[0].plot(x, y3_homo, label="Group 3")
axs[0].set_title("(a) Homogeneity of regression (slopes)")
axs[0].set_xlabel("Covariate (X)")
axs[0].set_ylabel("DV (Y)")
axs[0].legend()
# Plot for heterogeneous regression slopes
axs[1].plot(x, y1_hetero, label="Group 1")
axs[1].plot(x, y2_hetero, label="Group 2")
axs[1].plot(x, y3_hetero, label="Group 3")
axs[1].set_title("(b) Heterogeneity of regression (slopes)")
axs[1].set_xlabel("Covariate (X)")
axs[1].legend()
plt.tight_layout()
plt.show()
➤ This is equivalent to saying that the relationship between the DV and covariate has to be the same for each cell (a.k.a. “group”)
➤ The consequences of violating this assumption depend on whether the cells have equal sample sizes and whether the study is a true experiment
When we “eyeball” the three regression slopes (regression of the covariate predicting the DV), we see the relationship is approximately equal.
# Plot
ggplot(df, aes(x = pretest, y = posttest, color = method)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
scale_color_manual(values = c("steelblue", "tomato", "seagreen4")) +
theme_minimal(base_size = 14) +
labs(x = "pretest", y = "mathtest") +
theme(legend.position = "bottom") +
guides(color = guide_legend(title = NULL))
➤ The formal/general method of checking homogeneity of regression slope:
\(SS_{total} = SS_{IV} + SS_{COV} + \color{red}{SS_{IV*COV}} + SS_{within}\)
\(SS_{total} = SS_{IV} + SS_{COV} + SS_{within}\)
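Dropping the interaction term folds \(SS_{IV*COV}\) into \(SS_{within}\), so the interaction SS equals the difference between the two models’ within-cell SS. A quick numeric sketch (values rounded from the ANOVA tables elsewhere in this document):

```python
# The interaction SS equals the increase in SS_within when the
# IV*COV term is dropped (values rounded from the document's tables).
ss_within_full = 4879.0     # SS_within with the IV*COV interaction in the model
ss_within_reduced = 4945.0  # SS_within without the interaction
ss_interaction = ss_within_reduced - ss_within_full
print(ss_interaction)  # → 66.0, close to the tabled 66.190 (rounding)
```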
Homogeneity of Regression Slope:
Run the model with the interaction term included to make sure it is negligible (a.k.a., not significant) → this is not the ANCOVA yet!
library(gt)
library(tidyverse)
res <- anova(lm(posttest~pretest*method, data=df))
res_tbl <- res |>
as.data.frame() |>
rownames_to_column("Coefficient")
res_gt_display <- gt(res_tbl) |>
fmt_number(
columns = `Sum Sq`:`Pr(>F)`,
suffixing = TRUE,
decimals = 3
)
res_gt_display|>
tab_style( # highlight the interaction row
style = list(
cell_fill(color = "royalblue"),
cell_text(color = "red", weight = "bold")
),
locations = cells_body(
columns = colnames(res_tbl),
rows = Coefficient == "pretest:method")
)
Coefficient | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
pretest | 1 | 3.774K | 3.774K | 150.059 | 0.000 |
method | 2 | 726.111 | 363.055 | 14.436 | 0.000 |
pretest:method | 2 | 66.190 | 33.095 | 1.316 | 0.271 |
Residuals | 194 | 4.879K | 25.148 | NA | NA |
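As a sanity check, the interaction F in the table can be reproduced from its sums of squares and degrees of freedom:

```python
# Reproducing the interaction F test from the table's SS and df:
ss_interaction, df_interaction = 66.190, 2
ss_residual, df_residual = 4879.0, 194   # "4.879K" in the table

ms_interaction = ss_interaction / df_interaction  # mean square = SS / df
ms_residual = ss_residual / df_residual
f_value = ms_interaction / ms_residual
print(round(f_value, 3))  # → 1.316, matching the table (p = 0.271, n.s.)
```

Because the interaction is not significant, the homogeneity-of-slopes assumption is tenable and we can drop the interaction term and fit the actual ANCOVA.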
res <- anova(lm(posttest~pretest+method, data=df))
res_tbl <- res |>
as.data.frame() |>
rownames_to_column("Coefficient")
res_gt_display <- gt(res_tbl) |>
fmt_number(
columns = `Sum Sq`:`Pr(>F)`,
suffixing = TRUE,
decimals = 3
)
res_gt_display|>
tab_style( # highlight the method row
style = list(
cell_fill(color = "royalblue"),
cell_text(color = "red", weight = "bold")
),
locations = cells_body(
columns = colnames(res_tbl),
rows = Coefficient == "method")
)
Coefficient | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
pretest | 1 | 3.774K | 3.774K | 149.576 | 0.000 |
method | 2 | 726.111 | 363.055 | 14.390 | 0.000 |
Residuals | 196 | 4.945K | 25.230 | NA | NA |
➤ Reliability of Covariates
➤ The reliability of covariate scores is crucial with ANCOVA
True Experimental Design | Quasi-Experimental Design |
---|---|
- Relationship between covariate and DV underestimated, resulting in less adjustment than is necessary | - Relationship between covariate and DV underestimated, resulting in less adjustment than is necessary |
- Less powerful F test | - Group effects (IV) may be seriously biased |
[Example] Step #1
\[ \bar{Y}_{\text{adjusted}} = \bar{Y}_{\text{original}} - b (\bar{X}_{\text{cell}} - \bar{X}_{\text{grand}}) \]
b is the pooled slope from the regression of the DV on the covariate
X is the covariate (cell mean and grand mean)
Y is the dependent variable (adjusted and unadjusted cell means)
If b is zero (relationship is zero) then there is no adjustment.
The bigger b is (the stronger the covariate/DV relationship), the more of an adjustment.
➔ The further a cell mean is from the covariate grand mean (the bigger the deviation), the more the cell mean is adjusted.
Based on the ANCOVA adjusted means formula:
\[ \bar{Y}_{\text{adjusted}} = \bar{Y}_{\text{original}} - b (\bar{X}_{\text{cell}} - \bar{X}_{\text{grand}}) \] We can compute the adjusted means for each group using the following steps:
# Unadjusted means of posttest by group
unadjusted_means <- df %>%
group_by(method) %>%
summarise(posttest_mean = mean(posttest))
# Pretest means by group
pretest_means <- df %>%
group_by(method) %>%
summarise(pretest_mean = mean(pretest))
# Grand mean of pretest
grand_pretest_mean <- mean(df$pretest)
# Fit a linear model to get the regression slope
# (note: lm(posttest ~ pretest) gives the overall slope; the ANCOVA pooled
# within-group slope comes from lm(posttest ~ pretest + method))
model <- lm(posttest ~ pretest, data = df)
# Extract slope
pooled_slope <- coef(model)["pretest"]
# Combine into one table
results <- left_join(unadjusted_means, pretest_means, by = "method")
# Calculate adjusted means using the ANCOVA adjustment formula
results2 <- results |>
mutate(grand_pretest_mean = grand_pretest_mean) |>
mutate(pooled_slope = pooled_slope) |>
mutate(adjusted_mean = posttest_mean - pooled_slope * (pretest_mean - grand_pretest_mean))
# View results
gt(results2) |>
fmt_number(
columns = posttest_mean:adjusted_mean
)
method | posttest_mean | pretest_mean | grand_pretest_mean | pooled_slope | adjusted_mean |
---|---|---|---|---|---|
hands-on | 70.41 | 64.90 | 65.11 | 0.74 | 70.56 |
lecture | 65.96 | 65.25 | 65.11 | 0.74 | 65.86 |
self-paced | 69.10 | 65.20 | 65.11 | 0.74 | 69.03 |
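The adjusted means in the table can be verified by hand with the adjustment formula, using the rounded table values:

```python
# Verifying the adjusted means with Y_adj = Y - b * (X_cell - X_grand),
# using the rounded values from the table above.
pooled_slope = 0.74
grand_pretest = 65.11

rows = {  # method: (posttest_mean, pretest_mean)
    "hands-on":   (70.41, 64.90),
    "lecture":    (65.96, 65.25),
    "self-paced": (69.10, 65.20),
}
adjusted = {m: y - pooled_slope * (x - grand_pretest) for m, (y, x) in rows.items()}
for m, adj in sorted(adjusted.items()):
    print(m, round(adj, 2))
```

The hands-on value lands at 70.57 here because the inputs are rounded; the table’s 70.56 comes from the unrounded data.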
Based on the adjusted means, pretest, and posttest means, we can visualize the results using a bar plot.
used_colors <- c("steelblue", "tomato", "seagreen4")
used_group_labels <- c("Pretest", "Posttest", "Adjusted")
results2 |>
select(method, pretest_mean, posttest_mean, adjusted_mean) |>
pivot_longer(ends_with("_mean"), names_to = "type", values_to = "Mean") |>
mutate(type = factor(type, levels = paste0(c("pretest", "posttest", "adjusted"), "_mean"))) |>
ggplot(aes(y = method, x = Mean)) +
geom_col(aes(y = method, x = Mean, fill = type), position = position_dodge()) +
geom_text(aes(x = Mean + 5, label = round(Mean, 2), color = type), position = position_dodge(width = .85)) +
scale_color_manual(values = used_colors, labels = used_group_labels) +
scale_fill_manual(values = used_colors, labels = used_group_labels) +
labs(y = "", title = "Comparing Adjusted and Unadjusted Means") +
theme_minimal() +
theme(legend.position = "bottom")
➤ Categorical variable - ✓ Contain a finite number of categories or distinct groups. - ✓ Might not have a logical order. - ✓ Examples: gender, material type, and payment method.
➤ Discrete variable - ✓ Numeric variables that have a countable number of values between any two values. - ✓ Examples: number of customer complaints, number of items correct on an assessment, attempts on GRE. - ✓ It is common practice to treat discrete variables as continuous, as long as there are a large number of levels (e.g., 1–100 not 1–4).
➤ Continuous variable - ✓ Numeric variables that have an infinite number of values between any two values. - ✓ Examples: length, weight, time to complete an exam.
➔ We often assume the DV for ANCOVA is continuous, but we can sometimes “get away” with discrete, ordered outcomes if there are enough categories.
➤ Not related to this course, but categorical outcomes are commonly analyzed: - ✓ Examples: pass/fail a fitness test; pass/fail an academic test; retention (yes/no); on-time graduation (yes/no); proficiency (below, meeting, advanced), etc.
➔ These are not continuous, so we cannot use them as the DV in ANOVA/ANCOVA
➤ Instead: logistic regression (PROC LOGISTIC in SAS, or glm() with family = "binomial" in R) - ✓ Logistic regression can include both categorical and continuous IVs (and their interactions)
# Load libraries
library(ggplot2)
library(dplyr)
# Simulate data
set.seed(123)
n <- 100
weight <- rnorm(n, 140, 20)
prob_obese <- 1 / (1 + exp(-(0.1 * weight -15))) # logistic model
obese <- rbinom(n, size = 1, prob = prob_obese)
data <- data.frame(weight = weight, obese = obese)
# Linear model
lm_model <- lm(obese ~ weight, data = data)
# Logistic model
logit_model <- glm(obese ~ weight, data = data, family = "binomial")
# Prediction data
pred_data <- data.frame(weight = seq(min(weight), max(weight), length.out = 100))
pred_data$lm_pred <- predict(lm_model, newdata = pred_data)
pred_data$logit_pred <- predict(logit_model, newdata = pred_data, type = "response")
# Plot 1: Linear Regression
p1 <- ggplot(data, aes(x = weight, y = obese)) +
geom_point(color = "red", size = 2) +
geom_line(data = pred_data, aes(x = weight, y = lm_pred), color = "black") +
labs(title = "Linear Regression", x = "weight", y = "Obesity (0/1)") +
# ylim(0, 1.1) +
theme_minimal()
# Plot 2: Logistic Regression
p2 <- ggplot(data, aes(x = weight, y = obese)) +
geom_point(color = "red", size = 2) +
geom_line(data = pred_data, aes(x = weight, y = logit_pred), color = "black") +
labs(title = "Logistic Regression", x = "weight", y = "Predicted Probability") +
# ylim(0, 1.1) +
theme_minimal()
# Combine plots using patchwork
library(patchwork)
p1 + p2
In addition to the traditional degrees of freedom for an ANOVA, you now lose a degree of freedom for each covariate.
Degrees of Freedom → In our scenario, we have 1 IV with 3 groups and 1 covariate.
The \(df_{method}\) is the same as before: k - 1, where k represents the number of groups.
The \(df_{error}\) is different: \(df_{error} = N - k - c\), where \(c\) is the number of covariates (one additional degree of freedom is lost per covariate).
The \(df_{covariate}\) equals the number of covariates; here, 1.
If the principal in the scenario assigned a total of 200 students, the degrees of freedom for this analysis would be: \(df_{method} = 3 - 1 = 2\), \(df_{covariate} = 1\), and \(df_{error} = 200 - 3 - 1 = 196\).
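A quick check of these degrees-of-freedom calculations:

```python
# Degrees of freedom for the scenario: N = 200 students, k = 3 groups, 1 covariate.
N, k, n_covariates = 200, 3, 1

df_method = k - 1                 # 2
df_covariate = n_covariates       # 1
df_error = N - k - n_covariates   # 196: one df lost for the covariate
print(df_method, df_covariate, df_error)  # → 2 1 196
```

Note that 196 matches the Residuals df in the ANCOVA table above.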
Now we need to follow-up to see where the differences lie.
Planned and Pairwise comparisons
Post-hoc tests
 | True Experimental Design | Quasi-Experimental Design |
---|---|---|
Assignment to treatment | The researcher randomly assigns subjects to control and treatment groups. | Some other, non-random method is used to assign subjects to groups. |
Control over treatment | The researcher usually designs the treatment. | The researcher often does not have control over the treatment, but studies pre-existing groups. |
Use of control groups | Requires the use of control and treatment groups. | Control groups are not required (although they are commonly used). |
What is the risk/concern with quasi-experimental design?
➤ For example, the inclusion of the covariate adjusts the two groups’ means, as if they are the same on the covariate.
➤ Therefore, the two analyses are addressing two different research questions!
Biases the effect size of the IV | Values can’t be trusted |
---|---|
- Can remove real “effect variance” and attenuate the effect size | - Adjusted means are implausible values |
- If other variables involved, can make it look like there is an effect when there isn’t | - Interaction and slope values could just apply to the cells observed, not the population |
Use of covariates does not guarantee that groups will be “equivalent” —
even after using multiple covariates, there still may be some confounding variables operating that you are unaware of.
Best way to overcome differences between groups due to variables other than the IV
is to randomly assign subjects to groups.
Make sure that the covariate you are using is reliable!
ESRM 64103