Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-02-17
Class Outline
Planned contrasts are pre-defined: weights are assigned to the group means before the analysis, and the contrast value is the weighted sum of the means

\[ D = \sum_i w_i \mu_i \]
Imagine a study comparing the effects of three different study methods (A, B, C) on test scores.
One planned contrast might compare the average score of method A (the “experimental” method) against the combined average of methods B and C (the “control” conditions),
testing the hypothesis that method A leads to significantly higher scores than the traditional methods:
\(H_0: \mu_{A} = \frac{\mu_B+\mu_C}{2}\). This is also called a complex contrast because it involves more than two group means.
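Equivalently, the same hypothesis can be written as a weighted sum whose weights sum to zero:

\[
C = (1)\mu_A + \left(-\tfrac{1}{2}\right)\mu_B + \left(-\tfrac{1}{2}\right)\mu_C, \qquad H_0: C = 0
\]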
When to use planned contrasts:
Note
We should not test all possible combinations of groups. Instead, justify your comparison plan before performing the statistical analysis.
Orthogonal Contrasts: Independent of each other; the sum of the products of their weights equals zero.
Non-Orthogonal Contrasts: Not independent; they can lead to inflated Type I error rates.
Note
Orthogonal contrasts allow clear interpretation without redundancy.
Orthogonal contrasts form a series of group comparisons whose explained variances do not overlap.
The Helmert contrast, for example:
Group | Contrast 1 | Contrast 2 | Product |
---|---|---|---|
G1 | +1 | -1 | -1 |
G2 | +1 | +1 | +1 |
G3 | -2 | 0 | 0 |
Sum | 0 | 0 | 0 |
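The orthogonality shown in the table can be checked numerically in R; a minimal sketch (`cm` is an assumed name for the contrast matrix):

```r
# Contrast matrix: columns are Contrast 1 and Contrast 2 from the table
cm <- cbind(c(1, 1, -2), c(-1, 1, 0))
cm
crossprod(cm[, 1], cm[, 2])  # sum of products of weights: 0 => orthogonal
crossprod(cm)                # zero off-diagonals confirm orthogonality
cov2cor(crossprod(cm))       # scaled to correlations: the identity matrix
```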
[,1] [,2]
[1,] 1 -1
[2,] 1 1
[3,] -2 0
[,1]
[1,] 0
[,1] [,2]
[1,] 6 0
[2,] 0 2
[,1] [,2]
[1,] 1 0
[2,] 0 1
library(tidyverse)
library(kableExtra)
library(here)
# Set seed for reproducibility
set.seed(42)
dt <- read.csv(here("teaching/2025-01-13-Experiment-Design/Lecture05","week5_example.csv"))
options(digits = 5)
summary_tbl <- dt |>
group_by(group) |>
summarise(
N = n(),
Mean = mean(score),
SD = sd(score),
shapiro.test.p.values = shapiro.test(score)$p.value
)
kable(summary_tbl)
group | N | Mean | SD | shapiro.test.p.values |
---|---|---|---|---|
g1 | 28 | 4.2500 | 3.15054 | 0.07759 |
g2 | 28 | 2.7589 | 2.19478 | 0.07605 |
g3 | 28 | 3.5446 | 2.86506 | 0.00623 |
g4 | 28 | 3.8568 | 0.58325 | 0.03023 |
g5 | 28 | 2.0243 | 1.30911 | 0.06147 |
 | Df | F value | Pr(>F) |
---|---|---|---|
group | 4 | 12.966 | 0 |
Residuals | 135 | NA | NA |
Even though the assumption checks did not pass using the original categorical levels, we may still be interested in different group contrasts.
For example, the four Helmert contrasts:
Summary Statistics:
group | N | Mean | SD | shapiro.test.p.values | Ctras1 | Ctras2 | Ctras3 | Ctras4 |
---|---|---|---|---|---|---|---|---|
g1 | 28 | 4.2500 | 3.15054 | 0.07759 | -1 | -1 | -1 | -1 |
g2 | 28 | 2.7589 | 2.19478 | 0.07605 | 1 | -1 | -1 | -1 |
g3 | 28 | 3.5446 | 2.86506 | 0.00623 | 0 | 2 | -1 | -1 |
g4 | 28 | 3.8568 | 0.58325 | 0.03023 | 0 | 0 | 3 | -1 |
g5 | 28 | 2.0243 | 1.30911 | 0.06147 | 0 | 0 | 0 | 4 |
Orthogonal contrast matrix
Ctras1 Ctras2 Ctras3 Ctras4
0 0 0 0
Ctras1 Ctras2 Ctras3 Ctras4
Ctras1 2 0 0 0
Ctras2 0 6 0 0
Ctras3 0 0 12 0
Ctras4 0 0 0 20
The relationship between planned contrasts in ANOVA and coding in regression lies in how categorical variables are represented and interpreted in statistical models.
Both approaches aim to test specific hypotheses about group differences, but their implementation varies with the framework.
\(t = \frac{C}{\sqrt{MSE \sum \frac{c_i^2}{n_i}}}\)
# Helmert contrast matrix (columns = Ctras1..Ctras4), as shown above
cH <- cbind(Ctras1 = c(-1, 1, 0, 0, 0),
            Ctras2 = c(-1, -1, 2, 0, 0),
            Ctras3 = c(-1, -1, -1, 3, 0),
            Ctras4 = c(-1, -1, -1, -1, 4))
Sum_C2_n <- colSums(cH^2 / summary_tbl$N)  # sum(c_i^2 / n_i) for each contrast
C <- crossprod(summary_tbl$Mean, cH)       # contrast estimates: sum(c_i * mean_i)
MSE <- 5.0                                 # mean squared error from the one-way ANOVA
t <- as.numeric(C / sqrt(MSE * Sum_C2_n))
t
[1] -2.495040 0.077632 0.694597 -3.340639
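The p-values in the tibble that follows are lower-tail probabilities from the t distribution with \(N - k = 140 - 5 = 135\) residual degrees of freedom; a minimal sketch, with the t statistics copied from the output above:

```r
# t statistics for the four Helmert contrasts (from the output above)
t_vals <- c(-2.495040, 0.077632, 0.694597, -3.340639)
# lower-tail p-values with 135 residual degrees of freedom
p_vals <- pt(t_vals, df = 135)
round(p_vals, 5)
```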
# A tibble: 4 × 2
t_value p_value
<dbl> <dbl>
1 -2.50 0.00690
2 0.0776 0.531
3 0.695 0.756
4 -3.34 0.000541
g1 vs. g2: We reject the null and conclude that the mean growth mindset score of Education differs from that of Engineering (p = 0.007).
\(\frac{g1+g2}{2}\) vs. g3: We retain the null and conclude that the mean growth mindset score of Chemistry is not significantly different from the combined mean of Education and Engineering (p = 0.531).
Recall the planned contrast g1 vs. g2 from the Helmert contrasts:
dt$group <- factor(dt$group)  # ensure group is a factor before assigning contrasts
contrasts(dt$group) <- "contr.helmert"
fit_helmert <- lm(score ~ group, dt)
contr.helmert(levels(dt$group))
[,1] [,2] [,3] [,4]
g1 -1 -1 -1 -1
g2 1 -1 -1 -1
g3 0 2 -1 -1
g4 0 0 3 -1
g5 0 0 0 4
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.287 0.189 17.391 0.000
group1 -0.746 0.299 -2.495 0.014
group2 0.013 0.173 0.078 0.938
group3 0.085 0.122 0.695 0.489
group4 -0.316 0.095 -3.340 0.001
Planned contrasts can be carried out using linear regression combined with custom contrasts.
Let’s look at the default contrasts plan: treatment contrasts == dummy coding
g2 g3 g4 g5
g1 0 0 0 0
g2 1 0 0 0
g3 0 1 0 0
g4 0 0 1 0
g5 0 0 0 1
[,1] [,2] [,3] [,4]
g1 1 0 0 0
g2 0 1 0 0
g3 0 0 1 0
g4 0 0 0 1
g5 -1 -1 -1 -1
[,1] [,2] [,3] [,4]
g1 -1 -1 -1 -1
g2 1 -1 -1 -1
g3 0 2 -1 -1
g4 0 0 3 -1
g5 0 0 0 4
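The three matrices above come directly from R's built-in contrast generators; a minimal sketch (the level names `g1`–`g5` are assumed):

```r
lv <- paste0("g", 1:5)   # assumed level names
contr.treatment(lv)      # dummy coding: each group vs. the reference g1
contr.sum(lv)            # effect coding: deviations from the grand mean
contr.helmert(lv)        # Helmert: each group vs. the mean of earlier groups
```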
For treatment contrasts, four dummy variables are created to compare:

Intercept
: G1's mean

group2
: G2 vs. G1

group3
: G3 vs. G1

group4
: G4 vs. G1

group5
: G5 vs. G1

(Intercept) groupg2 groupg3 groupg4 groupg5 group
1 1 0 0 0 0 1
29 1 1 0 0 0 2
57 1 0 1 0 0 3
85 1 0 0 1 0 4
113 1 0 0 0 1 5
Another type of coding is effect coding. In R, the corresponding contrast type is the so-called sum contrast.
A detailed post about sum contrasts can be found here
With sum contrasts the reference level is in fact the grand mean.
[,1] [,2] [,3] [,4]
g1 1 0 0 0
g2 0 1 0 0
g3 0 0 1 0
g4 0 0 0 1
g5 -1 -1 -1 -1
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.28693 0.18900 17.39087 2.8188e-36
group1 0.96307 0.37801 2.54777 1.1962e-02
group2 -0.52800 0.37801 -1.39680 1.6476e-01
group3 0.25771 0.37801 0.68177 4.9655e-01
group4 0.56986 0.37801 1.50753 1.3401e-01
Note
Effect coding is a method of encoding categorical variables in regression models, similar to dummy coding, but with a different interpretation of the resulting coefficients. It is particularly useful when researchers want to compare each level of a categorical variable to the overall mean rather than to a specific reference category.
In effect coding, categorical variables are transformed into numerical variables, typically using values of -1, 0, and 1. The key difference from dummy coding is that the reference category is represented by -1 instead of 0, and the coefficients indicate deviations from the grand mean.
For a categorical variable with \(k\) levels, effect coding requires \(k-1\) coded variables. If we have a categorical variable \(X\) with three levels \(A, B, C\), the effect coding scheme could be:
Category | \(X_1\) | \(X_2\) |
---|---|---|
A | 1 | 0 |
B | 0 | 1 |
C (reference) | -1 | -1 |
The last category (\(C\)) is the reference group, coded as -1 for all indicator variables.
When effect coding is used in a regression model:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]
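Substituting the codes from the table gives the model-implied group means, which shows why \(\beta_0\) equals the grand mean:

\[
\begin{aligned}
\mu_A &= \beta_0 + \beta_1, \qquad \mu_B = \beta_0 + \beta_2, \qquad \mu_C = \beta_0 - \beta_1 - \beta_2,\\
\beta_0 &= \frac{\mu_A + \mu_B + \mu_C}{3}.
\end{aligned}
\]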
library(ggplot2)
# Create a data frame for text labels
text_data <- data.frame(
x = rep(0.25, 3), # Repeating the same x-coordinate
y = c(0.3, 0.7, 0.9), # Different y-coordinates
label = c("C: beta[0] - beta[1] - beta[2]",
"A: beta[0] + 1*'×'*beta[1] + 0*'×'*beta[2]",
"B: beta[0] + 0*'×'*beta[1] + 1*'×'*beta[2]") # Labels
)
# Create an empty ggplot with defined limits
ggplot() +
geom_text(data = text_data, aes(x = x, y = y, label = label), parse = TRUE, size = 11) +
# Add a vertical line at x = 0.5
# geom_vline(xintercept = 0.5, color = "blue", linetype = "dashed", linewidth = 1) +
# Add two horizontal lines at y = 0.3 and y = 0.7
geom_hline(yintercept = c(0.35, 0.75, 0.95), color = "red", linetype = "solid", linewidth = 1) +
geom_hline(yintercept = 0.5, color = "grey", linetype = "solid", linewidth = 1) +
geom_text(aes(x = .25, y = .45, label = "grand mean of Y"), color = "grey", size = 11) +
# Set axis limits
xlim(0, 1) + ylim(0, 1) +
labs(y = "Y", x = "") +
# Theme adjustments
theme_minimal() +
theme(text = element_text(size = 20))
For comparison, dummy coding represents the reference category as 0 on all indicator variables:

Category | \(X_1\) | \(X_2\) |
---|---|---|
A | 1 | 0 |
B | 0 | 1 |
C (reference) | 0 | 0 |
Effect coding is beneficial when researchers want to compare each group to the grand mean rather than to a specific reference category. Effect coding can be set in R using the `contr.sum` function:
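A minimal sketch of attaching sum contrasts to a five-level factor (the factor `g` and its levels are assumed for illustration):

```r
g <- factor(paste0("g", rep(1:5, each = 2)))  # assumed five-level factor
contrasts(g) <- contr.sum(nlevels(g))          # attach effect (sum) coding
contrasts(g)                                   # last level is coded -1 throughout
```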
group N Mean SD shapiro.test.p.values Contrasts
1 g1 28 4.2500 3.15054 0.0775874 0.50000
2 g2 28 2.7589 2.19478 0.0760542 -0.33333
3 g3 28 3.5446 2.86506 0.0062253 0.50000
4 g4 28 3.8568 0.58325 0.0302312 -0.33333
5 g5 28 2.0243 1.30911 0.0614743 -0.33333
\[ H_0: \frac{\mu_{Engineering}+\mu_{Chemistry}}{2} = \frac{\mu_{Education}+\mu_{PoliSci}+\mu_{Psychology}}{3} \]
weighted mean difference:
\[
\begin{aligned}
C &= c_1\mu_{Eng}+c_2\mu_{Edu}+c_3\mu_{Chem}+c_4\mu_{PoliSci}+c_5\mu_{Psych}\\
  &= \tfrac{1}{2}(4.2500)+\left(-\tfrac{1}{3}\right)(2.7589)+\tfrac{1}{2}(3.5446)+\left(-\tfrac{1}{3}\right)(3.8568)+\left(-\tfrac{1}{3}\right)(2.0243)\\
  &= 1.0173
\end{aligned}
\]
\[ \sum\frac{c^2}{n} = \frac{(\frac12)^2}{28}+\frac{(-\frac13)^2}{28}+\frac{(\frac12)^2}{28}+\frac{(-\frac13)^2}{28}+\frac{(-\frac13)^2}{28} \]
[1] 0.029762
[1] 5.0011
[1] 2.6369
\[ t = \frac{C}{\sqrt{MSE\sum\frac{c^2}{n}}} = \frac{1.0173}{\sqrt{5.0011 \times 0.029762}} = 2.6369 \]
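The hand computation can be verified numerically; a sketch using the group means, \(n = 28\) per group, and MSE = 5.0011 reported above:

```r
w     <- c(1/2, -1/3, 1/2, -1/3, -1/3)              # contrast weights (sum to 0)
means <- c(4.2500, 2.7589, 3.5446, 3.8568, 2.0243)  # group means from the table
C     <- sum(w * means)                             # weighted mean difference
se    <- sqrt(5.0011 * sum(w^2 / 28))               # sqrt(MSE * sum(c^2 / n))
c(C = C, t = C / se)
```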
Note
Many psychology journals require the reporting of effect sizes.
Df Sum Sq Mean Sq F value Pr(>F)
group 4 89.368 22.3420 4.4674 0.0020173
Residuals 135 675.149 5.0011 NA NA
Interpretation: \(\eta^2 = 89.368 / (89.368 + 675.149) = .1169\); 11.69% of the variance in the DV is due to group differences.
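\(\eta^2\) can be computed directly from the sums of squares in the ANOVA table above:

```r
ss_group <- 89.368   # between-groups sum of squares (from the ANOVA table)
ss_resid <- 675.149  # residual sum of squares
eta_sq   <- ss_group / (ss_group + ss_resid)  # proportion of variance explained
round(eta_sq, 4)     # 0.1169
```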
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.25000 0.42262 10.0562 4.2275e-18
groupg2 -1.49107 0.59768 -2.4948 1.3810e-02
groupg3 -0.70536 0.59768 -1.1802 2.4001e-01
groupg4 -0.39321 0.59768 -0.6579 5.1172e-01
groupg5 -2.22571 0.59768 -3.7239 2.8718e-04
[1] 0.654 0.210 0.101 0.057 0.305
ESRM 64503