Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-08-18
            Df Sum Sq Mean Sq F value Pr(>F)
group        1 218.42  218.42   264.1 <2e-16 ***
Residuals   38  31.43    0.83
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
            Df Sum Sq Mean Sq F value Pr(>F)
group        1  113.4  113.43   5.485 0.0245 *
Residuals   38  785.8   20.68
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We will discuss this in more detail in the next lecture.
Context: the example study examines whether three different teaching methods (labeled G1, G2, G3) affect students’ test scores.
In total, 40 students are assigned to the three teaching-method groups and one default (control) teaching group. Each group has 10 students.
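set.seed(1234)  # fix the random seed so the simulated scores are reproducible
# Equal-variance data: four groups of 10, each drawn with SD = 5
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
# Unequal-variance data: same group means, but SDs of 10, 5, 1, and 0.1
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, .1))
)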
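The ANOVA tables that follow have the layout of `summary()` applied to an `aov()` fit. A minimal sketch of how such tables could be produced, assuming the model is `score ~ group` fitted to each data set; the object names `fit_equal` and `fit_unequal` are illustrative and do not come from the lecture:

```r
# Recreate both simulated data sets in one session, in the same order as above
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, .1))
)

# One-way ANOVA of score on group for each data set (object names are illustrative)
fit_equal <- aov(score ~ group, data = data)
fit_unequal <- aov(score ~ group, data = data_unequal)

summary(fit_equal)
summary(fit_unequal)

# Degrees of freedom follow from the design: 4 - 1 between groups, 40 - 4 within
df_equal <- summary(fit_equal)[[1]]$Df
df_equal
```

The degrees of freedom depend only on the design (4 groups, 40 observations), so they are a quick check that the model was specified as intended.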
            Df Sum Sq Mean Sq F value   Pr(>F)
group        3  724.1  241.37   11.45 2.04e-05 ***
Residuals   36  759.2   21.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
            Df Sum Sq Mean Sq F value   Pr(>F)
group        3   1413   470.9   19.14 1.37e-07 ***
Residuals   36    886    24.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bartlett test of homogeneity of variances
data: score by group
Bartlett's K-squared = 2.0115, df = 3, p-value = 0.57
Bartlett test of homogeneity of variances
data: score by group
Bartlett's K-squared = 82.755, df = 3, p-value < 2.2e-16
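The two Bartlett outputs above can be obtained with base R's `bartlett.test()`. A minimal sketch, assuming the first output refers to `data` and the second to `data_unequal` (the object names `bt_equal` and `bt_unequal` are illustrative):

```r
# Recreate both simulated data sets in one session
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, .1))
)

# Bartlett's test: the null hypothesis is that all group variances are equal
bt_equal <- bartlett.test(score ~ group, data = data)
bt_unequal <- bartlett.test(score ~ group, data = data_unequal)

bt_equal
bt_unequal

# A small p-value is evidence against homogeneity of variance
bt_unequal$p.value < 0.05
```

Bartlett's test is sensitive to departures from normality, which is one reason it is often paired with a more robust check.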
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.1779 0.9107
      36
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)
group  3  3.4749 0.02581 *
      36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
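The header of the Levene output matches the format printed by `leveneTest()` in the car package, although the lecture's call is not shown. The median-centered version of the test is a one-way ANOVA on each score's absolute deviation from its group median, so it can be sketched in base R. The helper name `levene_median` is hypothetical:

```r
# Recreate both simulated data sets in one session
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, .1))
)

# Median-centered Levene test written out by hand (hypothetical helper)
levene_median <- function(df) {
  # Absolute deviation of each score from its own group's median
  df$abs_dev <- abs(df$score - ave(df$score, df$group, FUN = median))
  # One-way ANOVA on those deviations; its F test is the Levene statistic
  anova(lm(abs_dev ~ group, data = df))
}

lev_equal <- levene_median(data)
lev_unequal <- levene_median(data_unequal)

lev_equal
lev_unequal
```

Centering on the median rather than the mean makes the test less sensitive to skewed or heavy-tailed scores.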
Create a set of confidence intervals on the differences between the means of the pairwise levels of a factor with the specified family-wise probability of coverage.
When comparing the means for the levels of a factor in an analysis of variance, a simple comparison using t-tests will inflate the probability of declaring a significant difference when it is not in fact present.
Based on Tukey’s ‘Honest Significant Differences’ method
                  diff        lwr       upr        p adj
G1-Control -0.08482175 -5.6158112  5.446168 0.9999741881
G2-Control  6.24011176  0.7091223 11.771101 0.0218818550
G3-Control  9.89123122  4.3602418 15.422221 0.0001491636
G2-G1       6.32493351  0.7939441 11.855923 0.0197364909
G3-G1       9.97605296  4.4450635 15.507042 0.0001317258
G3-G2       3.65111946 -1.8798700  9.182109 0.3003394077
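To see why unadjusted pairwise t-tests inflate the error rate, and how a table like the one above could be produced, here is a minimal sketch. The error-rate line treats the six tests as independent, which is only an approximation because the comparisons share groups. The object names `fit` and `tukey` are illustrative:

```r
# Family-wise error rate if every pair of the 4 groups is tested at alpha = 0.05
n_pairs <- choose(4, 2)         # 6 pairwise comparisons
fwer <- 1 - (1 - 0.05)^n_pairs  # about 0.26 under the independence approximation
fwer

# Recreate the equal-variance data set
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
# Make the grouping variable an explicit factor; levels sort as Control, G1, G2, G3
data$group <- factor(data$group)

# Tukey's HSD on the fitted one-way ANOVA, with 95% family-wise coverage
fit <- aov(score ~ group, data = data)
tukey <- TukeyHSD(fit, which = "group", conf.level = 0.95)
tukey$group
```

A pair whose interval (`lwr`, `upr`) excludes zero differs at the family-wise 5% level, which agrees with an adjusted p-value below 0.05.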
            Df Sum Sq Mean Sq F value   Pr(>F)
group        3  724.1  241.37   11.45 2.04e-05 ***
Residuals   36  759.2   21.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# A tibble: 4 × 3
  group    Mean    SD
  <chr>   <dbl> <dbl>
1 Control  18.2  4.47
2 G1       18.1  4.98
3 G2       24.4  5.34
4 G3       28.1  3.33
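The `# A tibble` header suggests the summary above was computed with dplyr, but the call is not shown. A base R sketch that computes the same per-group mean and standard deviation, with the illustrative object name `desc`:

```r
# Recreate the equal-variance data set
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)

# Per-group mean and standard deviation of the scores
group_mean <- tapply(data$score, data$group, mean)
group_sd <- tapply(data$score, data$group, sd)

# Collect the results in one table, one row per group
desc <- data.frame(
  group = names(group_mean),
  Mean = as.numeric(group_mean),
  SD = as.numeric(group_sd)
)
desc
```

Reading these descriptives next to the Tukey comparisons helps show which group means drive the significant differences.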
aov()
# One weight per observation, in the row order of `data`: G1 = 1, G2 = 2, G3 = 0.5, Control = 1.5
weights <- c(rep(1, 10), rep(2, 10), rep(0.5, 10), rep(1.5, 10))
anova_weighted <- aov(score ~ group, data = data, weights = weights)
summary(anova_weighted)
            Df Sum Sq Mean Sq F value   Pr(>F)
group        3  666.6  222.20   7.578 0.000471 ***
Residuals   36 1055.5   29.32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
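# Down-weight observations from a "LargeSchool" group. No group in `data` has that label,
# so every weight is 1 here and the table below matches the unweighted ANOVA.
weights <- ifelse(data$group == "LargeSchool", 0.5, 1)
anova_weighted <- aov(score ~ group, data = data, weights = weights)
summary(anova_weighted)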
            Df Sum Sq Mean Sq F value   Pr(>F)
group        3  724.1  241.37   11.45 2.04e-05 ***
Residuals   36  759.2   21.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
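The second weighted table is identical to the unweighted one. A short sketch of why, assuming the same simulated `data`: the condition `group == "LargeSchool"` never matches, so all weights equal 1. The names `fit_unweighted`, `fit_weighted`, `f_unweighted`, and `f_weighted` are illustrative:

```r
# Recreate the equal-variance data set
set.seed(1234)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)

# No row has group == "LargeSchool", so ifelse() returns 1 for every observation
weights <- ifelse(data$group == "LargeSchool", 0.5, 1)
all(weights == 1)

# With unit weights, weighted least squares reduces to the ordinary fit
fit_unweighted <- aov(score ~ group, data = data)
fit_weighted <- aov(score ~ group, data = data, weights = weights)

f_unweighted <- summary(fit_unweighted)[[1]]$`F value`[1]
f_weighted <- summary(fit_weighted)[[1]]$`F value`[1]
all.equal(f_unweighted, f_weighted)
```

For the weights to change the result, the condition would need to match a label that actually occurs in `data$group`.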
library(ellmer)
chat <- chat_ollama(model = "llama3.2", seed = 1234)
prompt <- paste0("Perform ANOVA analysis using R code given the generated data sets by R and then interpret the results",
'
set.seed(1234)
data <- data.frame(
group = rep(c("G1", "G2", "G3", "Control"), each = 10),
score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
')
chat$chat(prompt)
Here is an example of how to perform ANOVA analysis using the given data set in
R:
```r
# Load necessary libraries
library(car)
# Perform ANOVA analysis
anova_result <- aov(score ~ group, data = data)
# Print summary of ANOVA results
summary(anova_result)
```
In this code, we first load the necessary "car" library to access the `aov()`
function, which performs the ANOVA analysis. We then perform ANOVA using
`aov()` on our score variable and a categorical independent variable 'group'
and print out the results.
Here is how you can interpret the results:
```
Df Sum Sq Mean Sq F value Pr(>F)
group 3 145.5 48.50 11.41 1.39e-10 ***
Residuals 30 1550.2 51.75
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.0556
(multiplied by sqrt())
Number of observations: 40
(All tests are two-tailed)
Multiple R-squared: 0.9635 # near perfect fit
Adjusted R-squared: 0.9517 # a very strong correlation between score and group
F-statistic: 11.412 on 3 and 30 DF, p-value = inf
```
From the results:
- The F-statistic of 11.41, accompanied by an extremely low (almost flat)
p-value of `1.39e-10`, suggests that there is a statistically significant
difference among groups.
- With nearly perfect fit using R-squared value with more than **0.9** the
result supports that prediction model perfectly explains given data and can be
assumed suitable predictor model for other scenarios like forecasting , or
making predictions about scores.
Thus, ANOVA proves that groups are significantly different from one another in
terms of mean scores, supporting the idea that group is a significant predictor
variable in our model.
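One way to sanity-check a reply like this is to redo the arithmetic from the numbers it reports and compare them with the design. The sketch below uses only values quoted in the reply and in the ANOVA table shown earlier for the same data. The variable names are illustrative:

```r
# aov() ships with base R's stats package, so library(car) is not needed for it
environmentName(environment(aov))

# Values copied from the table in the model's reply
ss_group <- 145.5
df_group <- 3
ms_resid_reported <- 51.75
df_resid_reported <- 30

ms_group <- ss_group / df_group            # 48.5, consistent with the reply
f_implied <- ms_group / ms_resid_reported  # about 0.94, not the reported 11.41
p_implied <- pf(f_implied, df_group, df_resid_reported, lower.tail = FALSE)
f_implied
p_implied

# Residual degrees of freedom implied by 40 observations in 4 groups
df_resid_design <- 40 - 4  # 36, not the reported 30

# R-squared implied by the earlier ANOVA table: SS_group / (SS_group + SS_residual)
r_squared <- 724.1 / (724.1 + 759.2)  # about 0.49, far from the reported 0.96
r_squared
```

The reply's table is therefore not internally consistent, and it does not match the output R produces for these data.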
ESRM 64503: Lecture 03