Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-03-07
Think about our example of the effects of teaching methods (M1, M2, M3, M4) and measurement forms (F1, F2, F3, F4) on math performance.
Answer
The RCBD utilizes an additive model (two-way ANOVA without interaction)
Both the treatments and blocks can be considered as random effects rather than fixed effects, if the levels were selected at random from a population of possible treatments or blocks. We consider this case later, but it does not change the test for a treatment effect.
What are the consequences of not blocking if we should have? Generally the unexplained error in the model will be larger, and therefore the test of the treatment effect less powerful.
How to determine the sample size in the RCBD?
\[ Y_{ij} = \mu + \tau_i + \rho_j + \epsilon_{ij} \]
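where \(\mu\) is the overall mean, \(\tau_i\) is the effect of treatment \(i\) (\(i = 1, \dots, n_a\)), \(\rho_j\) is the effect of block \(j\) (\(j = 1, \dots, n_b\)), and \(\epsilon_{ij}\) is the random error. The total sum of squares is partitioned as: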
\[ \mathrm{SS}_{\mathrm{T}}= n_b \sum\left(\bar{y}_{i .}-\bar{y}_{. .}\right)^{2}+ n_a \sum\left(\bar{y}_{. j}-\bar{y}_{. .}\right)^{2}+\sum \sum\left(y_{i j}-\bar{y}_{i .}-\bar{y}_{. j}+\bar{y}_{. .}\right)^{2} \]
\(\mathrm{SS}_{\mathrm{treatment}}= n_b \sum\left(\bar{y}_{i .}-\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = n_a -1\)
\(\mathrm{SS}_{\mathrm{block}}= n_a \sum\left(\bar{y}_{. j}-\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = n_b -1\)
\(\mathrm{SS}_{\mathrm{Residual}}= \sum \sum\left(y_{i j}-\bar{y}_{i .}-\bar{y}_{. j}+\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = (n_a-1)(n_b -1)\)
\[ \mathrm{SS}_{\mathrm{T}} = \mathrm{SS}_{\mathrm{treatment}} + \mathrm{SS}_{\mathrm{block}} + \mathrm{SS}_{\mathrm{Residual}} \]
\[ SS_{Total} = \sum_{i=1}^{n_a}\sum_{j=1}^{n_b}(y_{ij})^2-(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
\[ SS_{Treatment} = \frac{1}{n_b}\sum{(y_{i.})}^2 -(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
\[ SS_{Block} = \frac{1}{n_a}\sum{(y_{.j})}^2 -(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
Background
An experiment was designed to study the performance of four different detergents in cleaning clothes. The following “cleanness” readings (higher=cleaner) were obtained with specially designed equipment for three different types of common stains. Is there a difference between the detergents?
Detergent | Stain1 | Stain2 | Stain3 |
---|---|---|---|
1 | 45 | 43 | 51 |
2 | 47 | 46 | 52 |
3 | 48 | 50 | 55 |
4 | 42 | 37 | 49 |
Marginal Sums of treatment: \(y_{i.}\); R code: rowSums(detergents[, 2:4])
Marginal Sums of Stain: \(y_{.j}\); R code: colSums(detergents[, 2:4])
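The object detergents is not defined in the code shown here. A minimal sketch, assuming it is a data frame that holds the table above with columns Detergent, Stain1, Stain2, and Stain3 (the names treatment_sums and block_sums are illustrative), shows how the two sets of marginal sums are obtained:

```r
# Assumed layout: one row per detergent (treatment), one column per stain (block)
detergents <- data.frame(
  Detergent = 1:4,
  Stain1 = c(45, 47, 48, 42),
  Stain2 = c(43, 46, 50, 37),
  Stain3 = c(51, 52, 55, 49)
)

# Marginal sums over the stains for each detergent: y_i.
treatment_sums <- rowSums(detergents[, 2:4])

# Marginal sums over the detergents for each stain: y_.j
block_sums <- colSums(detergents[, 2:4])

treatment_sums
block_sums
```

The detergent sums are 139, 145, 153, and 128, and the stain sums are 182, 176, and 207; both sets add up to the grand total of 565.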
treatment_marginal_Sums = rowSums(detergents[, 2:4])
grand_mean <- mean(unlist(detergents[, 2:4]))
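## Method 1: deviations of the treatment means from the grand mean
3 * sum((treatment_marginal_Sums/3 - grand_mean)^2)
[1] 110.9167
## Method 2: computational formula based on the marginal sums
sum(treatment_marginal_Sums^2)/3 - sum(detergents[, 2:4])^2/12
[1] 110.9167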
block_marginal_Sums = colSums(detergents[, 2:4])
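## Method 1: deviations of the block means from the grand mean
4 * sum((block_marginal_Sums/4 - grand_mean)^2)
[1] 135.1667
## Method 2: computational formula based on the marginal sums
sum(block_marginal_Sums^2)/4 - sum(detergents[, 2:4])^2/12
[1] 135.1667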
\[ F = \frac{SS_{\mathrm{treatment}}/(n_a-1)}{SS_\mathrm{residual}/ ((n_a-1)(n_b-1))} \]
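The F ratio can also be evaluated by hand. The sketch below assumes the detergent table is stored as a data frame named detergents (an assumption, since its definition is not shown); it obtains the residual sum of squares by subtraction and divides each sum of squares by its degrees of freedom, \(n_a - 1\) for treatments and \((n_a-1)(n_b-1)\) for the residual:

```r
# Assumed layout of the detergent data: rows = treatments, columns = blocks
detergents <- data.frame(
  Detergent = 1:4,
  Stain1 = c(45, 47, 48, 42),
  Stain2 = c(43, 46, 50, 37),
  Stain3 = c(51, 52, 55, 49)
)
y <- as.matrix(detergents[, 2:4])
n_a <- nrow(y)  # number of treatments (detergents)
n_b <- ncol(y)  # number of blocks (stains)
grand_mean <- mean(y)

# Partition of the total sum of squares
ss_total     <- sum((y - grand_mean)^2)
ss_treatment <- n_b * sum((rowMeans(y) - grand_mean)^2)
ss_block     <- n_a * sum((colMeans(y) - grand_mean)^2)
ss_residual  <- ss_total - ss_treatment - ss_block

# Mean squares and the F test for the treatment effect
df_treatment <- n_a - 1
df_residual  <- (n_a - 1) * (n_b - 1)
f_treatment  <- (ss_treatment / df_treatment) / (ss_residual / df_residual)
p_treatment  <- pf(f_treatment, df_treatment, df_residual, lower.tail = FALSE)

round(c(SS_total = ss_total, SS_residual = ss_residual,
        F = f_treatment, p = p_treatment), 4)
```

With these data the residual sum of squares is about 18.83 on 6 degrees of freedom, and the F ratio for Detergent is about 11.78.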
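library(tidyverse)  # provides pivot_longer() (tidyr) and mutate() (dplyr)
# Reshape to long format: one row per detergent-stain combination
detergents_aov <- detergents |>
  pivot_longer(starts_with("Stain"), names_to = "Stain") |>
  mutate(Detergent = factor(Detergent, levels = 1:4))
# Additive model: treatment (Detergent) + block (Stain), no interaction
fit <- aov(value ~ Detergent + Stain, data = detergents_aov)
summary(fit)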
            Df Sum Sq Mean Sq F value  Pr(>F)
Detergent    3 110.92   36.97   11.78 0.00631 **
Stain        2 135.17   67.58   21.53 0.00183 **
Residuals    6  18.83    3.14
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note
DV: Midterm Score (in cells)
IV: Time (\(a = 2\))
Nuisance factor: Tutor (\(b = 4\))
DV: Midterm Score
Now:
Thus, we can partition the effects into three parts:
Sum of squares due to treatments (IV = Time),
Sum of squares due to the blocking factor,
and Sum of squares due to error.
We do not model an interaction in blocked designs (we will talk about this later).
\[ SS_{\mathrm{Total}}=\sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2 \]
[1] 9.325
[1] 0.0875
Do this for everyone and then sum over all people \(\sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2\)
However, when calculating Sum of Squares for IVs: \(SS_{Model}\), \(SS_{Block}\), \(SS_{error}\), we need to compute “marginal means”
For example:
\[ SS_{Total} = \sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2=15060.48 \]
\[ SS_{Model(time)} = \sum_{a=1}^{a}n_a(\bar{y}_{a.}-\bar{y}_{..})^2=4489.02 \]
where \(n_a\) is the group size for AM/PM and \(\bar{y}_{a.}\) are the marginal means for AM and PM {22.95, 13.43}
This is similar to how we computed \(SS_{Model}\) before: subtract the grand mean from each marginal group mean, square the difference, weight it by the group size, and sum over all groups.
\[ SS_{Block} = \sum_{b=1}^{b}n_{b}(\bar{y}_{.b}-\bar{y}_{..})^2=3239.43 \]
Technically, the blocking factor is just another IV, but one that we are not interested in or that falls outside the scope of the research question.
Note
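Under \(\alpha=.05\), for the “Model” factor (Time), we have \(df_{Model} = 1\) and \(df_{error} = 195\), so \(F_{crit}=3.89\); the observed F exceeds this value, so the effect of Time is significant.
Similarly, for the “Blocking” factor (Tutor), we have \(df_{block} = 3\) and \(df_{error} = 195\), so \(F_{crit}=2.65\); the observed F exceeds this value, so the effect of Tutor is also significant.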
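The critical values quoted in this note can be looked up with the quantile function of the F distribution. This is a small sketch using base R's qf(); the variable names are illustrative only:

```r
alpha <- 0.05
df_error <- 195

# Upper-tail critical values: the (1 - alpha) quantile of F(df1, df2)
f_crit_time  <- qf(1 - alpha, df1 = 1, df2 = df_error)  # Time: a - 1 = 1
f_crit_tutor <- qf(1 - alpha, df1 = 3, df2 = df_error)  # Tutor: b - 1 = 3

round(c(Time = f_crit_time, Tutor = f_crit_tutor), 2)
```

An observed F ratio larger than the corresponding critical value leads to rejecting the null hypothesis at the 5% level.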
The Rockwell hardness test
The Rockwell hardness test is a hardness test based on indentation hardness of a material. The Rockwell test measures the depth of penetration of an indenter under a large load (major load) compared to the penetration made by a preload (minor load).
Metal | Tip | Hardness |
---|---|---|
Metal1 | Tip1 | 9.9 |
Metal2 | Tip1 | 9.5 |
Metal3 | Tip2 | 9.4 |
Metal4 | Tip2 | 9.3 |
Metal5 | Tip3 | 9.6 |
Metal6 | Tip3 | 9.0 |
Metal7 | Tip4 | 9.8 |
Metal8 | Tip4 | 9.1 |
If we conduct this as a blocked experiment, we assign all four tips to the same test specimen, with each tip randomly assigned to a different location on the specimen. Since each treatment occurs once in each block, the number of test specimens is the number of replicates.
Back to the hardness testing example, the experimenter may very well want to test the tips (treatment) across specimens (block) of various hardness levels. This shows the importance of blocking. To conduct this experiment as an RCBD, we assign all 4 tips to each specimen.
Suppose that we use b = 4 blocks as shown in the table below:
We are primarily interested in testing the equality of treatment means, but now we have the ability to remove the variability associated with the nuisance factor (the blocks) through the grouping of the experimental units prior to having assigned the treatments.
library(tibble)  # tribble()
library(gt)      # gt(), tab_header(), tab_spanner(), tab_options()
# Each column is one coupon (block) and contains every tip exactly once
tribble(
  ~`1`, ~`2`, ~`3`, ~`4`,
  "Tip 3", "Tip 3", "Tip 2", "Tip 1",
  "Tip 1", "Tip 4", "Tip 1", "Tip 4",
  "Tip 4", "Tip 2", "Tip 3", "Tip 2",
  "Tip 2", "Tip 1", "Tip 4", "Tip 3"
) |>
  gt() |>
  tab_header(
    title = "The Hardness Testing Experiment",
    subtitle = "Randomized Complete Block Design"
  ) |>
  tab_spanner(
    label = "Test Coupon (Block)",
    columns = everything()
  ) |>
  tab_options(
    table.width = px(500),
    table.font.size = px(20)
  )
The Hardness Testing Experiment: Randomized Complete Block Design
Test Coupon (Block)
1 | 2 | 3 | 4 |
---|---|---|---|
Tip 3 | Tip 3 | Tip 2 | Tip 1 |
Tip 1 | Tip 4 | Tip 1 | Tip 4 |
Tip 4 | Tip 2 | Tip 3 | Tip 2 |
Tip 2 | Tip 1 | Tip 4 | Tip 3 |
Important
Notice the two-way structure of the experiment: we have four blocks, and within each block the four tips are assigned in random order.
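A layout of this kind can be generated by drawing an independent random permutation of the tips for each coupon. The following is a minimal sketch, not the procedure used to produce the table above; the seed and the object names (tips, layout) are arbitrary choices:

```r
set.seed(2025)  # arbitrary seed, used only to make the layout reproducible

tips <- paste("Tip", 1:4)
n_blocks <- 4

# One independent random ordering of all four tips per coupon (block);
# rows are test positions on the coupon, columns are coupons
layout <- sapply(seq_len(n_blocks), function(b) sample(tips))
colnames(layout) <- paste("Coupon", seq_len(n_blocks))
layout

# Completeness check: every tip appears exactly once in every block
all(apply(layout, 2, function(block) setequal(block, tips)))
```

Because the randomization is restricted to happen within each block, every coupon receives all four tips, which is what makes the design complete.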
Obs | Tip | Hardness | Coupon |
---|---|---|---|
1 | 1 | 9.3 | 1 |
2 | 1 | 9.4 | 2 |
3 | 1 | 9.6 | 3 |
4 | 1 | 10.0 | 4 |
5 | 2 | 9.4 | 1 |
6 | 2 | 9.3 | 2 |
7 | 2 | 9.8 | 3 |
8 | 2 | 9.9 | 4 |
9 | 3 | 9.2 | 1 |
10 | 3 | 9.4 | 2 |
11 | 3 | 9.5 | 3 |
12 | 3 | 9.7 | 4 |
13 | 4 | 9.7 | 1 |
14 | 4 | 9.6 | 2 |
15 | 4 | 10.0 | 3 |
16 | 4 | 10.2 | 4 |
We fit the model with aov(). We can see four levels of Tip and four levels of Coupon:
Note
The Analysis of Variance table shows three degrees of freedom for Tip, three for Coupon, and nine residual (error) degrees of freedom.
The ratio of the treatment mean square to the error mean square gives an F ratio of 14.44, which is highly significant: it exceeds the upper .001 critical value (the 99.9th percentile) of the F distribution with three and nine degrees of freedom.
Our two-way analysis also provides a test for the block factor, Coupon. The ANOVA shows that this factor is also significant, with F = 30.94. So, there is a large amount of variation in hardness between the pieces of metal.
This is why we used specimen (or coupon) as our blocking factor. We expected in advance that it would account for a large amount of variation. By including block in the model and in the analysis, we removed this large portion of the variation, such that the residual error is quite small. By including a block factor in the model, the error variance is reduced, and the test on treatments is more powerful.
Tip | N_Tip | Hardness_Tip | Coupon | N_Coupon | Hardness_Coupon |
---|---|---|---|---|---|
1 | 4 | 9.575 | 1 | 4 | 9.400 |
2 | 4 | 9.600 | 2 | 4 | 9.425 |
3 | 4 | 9.450 | 3 | 4 | 9.725 |
4 | 4 | 9.875 | 4 | 4 | 9.950 |
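The analysis and the marginal means reported above can be reproduced from the 16 observations in the data table. This is a sketch under the assumption that the data are stored in a data frame (here called hardness, a name chosen for illustration) with Tip and Coupon coded as factors:

```r
# Observations entered tip by tip, in the order of the data table (Obs 1 to 16)
hardness <- data.frame(
  Tip      = factor(rep(1:4, each = 4)),
  Coupon   = factor(rep(1:4, times = 4)),
  Hardness = c(9.3, 9.4,  9.6, 10.0,
               9.4, 9.3,  9.8,  9.9,
               9.2, 9.4,  9.5,  9.7,
               9.7, 9.6, 10.0, 10.2)
)

# RCBD: additive model with treatment (Tip) and block (Coupon)
fit_hardness <- aov(Hardness ~ Tip + Coupon, data = hardness)
anova_table <- summary(fit_hardness)[[1]]
anova_table

# Marginal means for each tip and each coupon
tip_means    <- tapply(hardness$Hardness, hardness$Tip, mean)
coupon_means <- tapply(hardness$Hardness, hardness$Coupon, mean)
tip_means
coupon_means
```

The resulting table has 3, 3, and 9 degrees of freedom for Tip, Coupon, and Residuals, with F ratios of about 14.44 and 30.94.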
In a greenhouse experiment, there was a single factor (fertilizer) with 4 levels (i.e. 4 treatments), six replications, and a total of 24 experimental units (potted plants). Suppose the image below is the greenhouse bench (viewed from above) that was used for the experiment.
To use a CRD, we need to randomly assign each of the treatment levels to 6 potted plants. To do this, we first assign numbers to the physical positions of the pots on the bench. Each column of plants will use one fertilizer.
To expand this to an RCBD, we place the plants on different farms (say we have six farms). Each farm will have 4 plants, and each plant will receive one type of fertilizer.
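The difference between the two randomization schemes can be sketched in a few lines. This is an illustration only, assuming 24 numbered positions for the CRD and six blocks of four plants for the RCBD; the seed and the object names (crd, rcbd) are arbitrary:

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

fertilizers <- c("Control", "F1", "F2", "F3")

# CRD: shuffle six copies of each treatment over all 24 positions
crd <- data.frame(
  Position   = 1:24,
  Fertilizer = sample(rep(fertilizers, each = 6))
)

# RCBD: shuffle the four treatments separately within each of the six blocks
rcbd <- data.frame(
  Block      = rep(1:6, each = 4),
  Fertilizer = unlist(lapply(1:6, function(b) sample(fertilizers)))
)

table(crd$Fertilizer)               # each treatment is used 6 times overall
table(rcbd$Block, rcbd$Fertilizer)  # each treatment appears once per block
```

In the CRD nothing prevents one region of the bench from receiving mostly one fertilizer, whereas the RCBD guarantees that every block contains all four treatments.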
library(gt)
# dat: the greenhouse data (assumed columns: Block, Fertilizer, Height, Plant)
gt(dat,
   rownames_to_stub = TRUE,
   groupname_col = "Block",
   row_group_as_column = TRUE) |>
  tab_options(table.width = px(500))
Block | Row | Fertilizer | Height | Plant |
---|---|---|---|---|
Block1 | 1 | Control | 19.5 | 1 |
Block1 | 7 | F1 | 25.0 | 7 |
Block1 | 13 | F2 | 22.5 | 13 |
Block1 | 19 | F3 | 27.5 | 19 |
Block2 | 2 | Control | 20.5 | 2 |
Block2 | 8 | F1 | 27.5 | 8 |
Block2 | 14 | F2 | 25.2 | 14 |
Block2 | 20 | F3 | 28.0 | 20 |
Block3 | 3 | Control | 21.0 | 3 |
Block3 | 9 | F1 | 28.0 | 9 |
Block3 | 15 | F2 | 26.0 | 15 |
Block3 | 21 | F3 | 29.2 | 21 |
Block4 | 4 | Control | 21.0 | 4 |
Block4 | 10 | F1 | 28.6 | 10 |
Block4 | 16 | F2 | 26.5 | 16 |
Block4 | 22 | F3 | 29.5 | 22 |
Block5 | 5 | Control | 21.5 | 5 |
Block5 | 11 | F1 | 30.5 | 11 |
Block5 | 17 | F2 | 27.0 | 17 |
Block5 | 23 | F3 | 30.0 | 23 |
Block6 | 6 | Control | 22.5 | 6 |
Block6 | 12 | F1 | 32.0 | 12 |
Block6 | 18 | F2 | 28.0 | 18 |
Block6 | 24 | F3 | 31.0 | 24 |
For the RCBD, use the aov() function with the formula <Outcome> ~ <Treatment> + <Block>.
For the CRD, the formula is <Outcome> ~ <Treatment>.
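Both formulas can be applied to the greenhouse data from the table above. The sketch below assumes the data are stored in a data frame called greenhouse (a name chosen for illustration) with Fertilizer and Block coded as factors, and it extracts the error sums of squares with deviance():

```r
# Heights entered fertilizer by fertilizer, Block1 to Block6 within each
greenhouse <- data.frame(
  Block      = factor(rep(paste0("Block", 1:6), times = 4)),
  Fertilizer = factor(rep(c("Control", "F1", "F2", "F3"), each = 6)),
  Height     = c(19.5, 20.5, 21.0, 21.0, 21.5, 22.5,
                 25.0, 27.5, 28.0, 28.6, 30.5, 32.0,
                 22.5, 25.2, 26.0, 26.5, 27.0, 28.0,
                 27.5, 28.0, 29.2, 29.5, 30.0, 31.0)
)

# CRD analysis: the block structure is ignored
fit_crd <- aov(Height ~ Fertilizer, data = greenhouse)
# RCBD analysis: the block is added as a second additive factor
fit_rcbd <- aov(Height ~ Fertilizer + Block, data = greenhouse)

summary(fit_crd)
summary(fit_rcbd)

# Residual (error) sums of squares of the two models
sse_crd  <- deviance(fit_crd)
sse_rcbd <- deviance(fit_rcbd)
ss_block <- sse_crd - sse_rcbd  # the part of the CRD error explained by blocks

round(c(SSE_CRD = sse_crd, SS_Block = ss_block, SSE_RCBD = sse_rcbd), 3)
```

The error sum of squares of the CRD (about 61.033) splits into a block sum of squares (about 53.318) and the RCBD error sum of squares (about 7.715).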
Important
Comparing the two ANOVA tables, we see that the MSE in the RCBD has decreased considerably in comparison to the CRD. This reduction in MSE can be viewed as the partition of the SSE for the CRD (61.033) into SSBlock (53.32) + SSE (7.715). The potential reduction in SSE from blocking is offset to some degree by the degrees of freedom lost to the blocks, but more often than not it is worth it in terms of the improvement in the calculated F-statistic. In our example, the F-statistic for the treatment has increased considerably for the RCBD in comparison to the CRD. It is reasonable to assume that the result from the RCBD is more valid than that from the CRD, as the MSE value obtained after accounting for the block-to-block variability is a more accurate representation of the random error variance.
stud <- factor(rep(c("male", "female"), each = 2))
perf <- factor(rep(c("ah", "ac" ), times = 2))
perf
[1] ah ac ah ac
Levels: ac ah
y <- c(5.5, 5,
4, 6.2)
# y is the hours students
# studied in specific places
results <- data.frame(y, stud, perf)
fit <- aov(y ~ perf+stud, data = results)
summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
perf 1 0.7225 0.7225 0.396 0.642
stud 1 0.0225 0.0225 0.012 0.930
Residuals 1 1.8225 1.8225
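Here 0.0225 << 1.8225: the sum of squares for the block (stud) is far smaller than the residual sum of squares, i.e., blocking was not necessary. The treatment sum of squares (0.7225) is also smaller than the residual sum of squares, and its p-value is 0.642 > 0.05 (5% significance level), so we fail to reject the null hypothesis for perf. The p-value for stud (0.930) likewise gives no sufficient evidence that females and males differ significantly.
GENDER, STATUS
Schools, Classes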
ESRM 64503