Jihong Zhang, Ph.D. – Lecture 08: Block Design (2)

Overview

Review Last week’s Lecture
Review Block Design
Randomized Complete Block Design with R Programming

Last week (1)

We reviewed different types of randomized experimental designs:
- Block Design: Complete block design (CBD) vs. Randomized Complete Block Design (RCBD)
  - Difference: A “complete block design” simply means that every treatment is applied in every block, while a “randomized complete block design” adds random assignment of the treatments within each block.
- Block designs with more than one blocking factor: Latin square design (LSD), repeated LSD, Graeco-Latin squares
  - Benefit: (1) They account for more of the variance, which leads to a lower residual variance; (2) they make the estimate of the treatment effect more precise.
  - Limitation: They require all blocking factors (and the treatment factor) to have the same number of levels.

Last week (2)

We discussed why we need a block design rather than a simple treatment-control design.
- Potential confounding effects that can become nuisance factors
- Heterogeneity of samples (variability across gender, schools, age groups)
- Greater generalization of results
Assumptions of Block Design
- Continuous outcome
- Experimental units are randomly sampled
- No interactions between treatment factor(s) and blocking factor(s)
- Each block group’s outcome is normally distributed
- Each block group has “equal” or “close” variances in outcome

Features of RCBD

Think about our example of the effects of teaching methods (M1, M2, M3, M4) and measurement forms (F1, F2, F3, F4) on math performance.

In randomized complete block design (RCBD), each block size is the same and is equal to the number of treatments (i.e. factor levels or factor level combinations).
- Students who use the same measurement form (the same block) will each be randomly assigned to one of the 4 teaching methods.
Each treatment will be randomly assigned to exactly one experimental unit (i.e., one student) within every block.
The assignments of treatment levels (teaching methods) to the experimental units (students) have to be done within each block separately.
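The within-block randomization described above can be sketched in R as follows. This is only an illustration under assumed names: the student IDs, the seed, and the `design` data frame are hypothetical and not part of the lecture data.

```r
# Hypothetical illustration: 4 forms (blocks) x 4 teaching methods (treatments)
set.seed(2025)  # arbitrary seed, only for reproducibility
forms   <- paste0("F", 1:4)
methods <- paste0("M", 1:4)

# One row per student; each block holds exactly as many students as treatments
design <- data.frame(
  Form    = rep(forms, each = length(methods)),
  Student = seq_len(length(forms) * length(methods))
)

# Randomize separately within each block: every block gets its own permutation
design$Method <- unlist(lapply(forms, function(f) sample(methods)))

# Each method should appear exactly once per form
table(design$Form, design$Method)
```

Because `sample(methods)` is called once per form, the assignment in one block does not depend on the assignment in any other block.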

Random Effects of RCBD

It is important to mention that blocks are usually (but not always) treated as random effects, as they typically represent a sample from the population of all possible blocks.
In other words, the mean comparison among specific blocks is not of interest. The variability could be large or small depending on your context.
However, the variation between blocks must be incorporated into the model.

Exercises

A poultry experiment was run to investigate the effect of diet and antibiotics on egg production. They evaluated 2 diets of interest and 2 specific antibiotics that are on the market. The feed and antibiotic were combined and used to fill the feeding trays in barns. They chose 3 poultry farms at random and randomly assigned the combinations of diet and antibiotic to 4 barns within each farm. Total egg production by the chickens was recorded after 4 weeks.
1. What is the experimental design (hint: think about the randomization process)?
2. Identify which factors are treatment and block.

Answer

RCBD.
treatment: combination of Diet and Antibiotic; block: Farms.

Other Aspects of the RCBD

The RCBD utilizes an additive model (two-way ANOVA without interaction)
- one in which there is no interaction between treatments and blocks. The error term in a randomized complete block model reflects how the treatment effect varies from one block to another.
Both the treatments and blocks can be considered as random effects rather than fixed effects, if the levels were selected at random from a population of possible treatments or blocks. We consider this case later, but it does not change the test for a treatment effect.
What are the consequences of not blocking if we should have? Generally the unexplained error in the model will be larger, and therefore the test of the treatment effect less powerful.
How to determine the sample size in the RCBD?
- The Operating Characteristic (OC) curve approach can be used to determine the number of blocks to run. The number of blocks, b, represents the number of replications (from the researcher's point of view, the two are exchangeable). The power calculations that we looked at before would be the same, except that we use b rather than n, and we use the estimate of the error variance, $\sigma^2$, that reflects the improved precision based on having used blocks in our experiment. So, the major gain in power comes not from the number of replications but from the error variance, which is much smaller because the effects due to blocks have been removed.
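As a rough sketch of this idea, the power of the treatment F test can also be computed directly from the noncentral F distribution instead of reading it off OC curves. The function name `rcbd_power`, the treatment effects `tau_assumed`, and the error SD `sigma_assumed` are hypothetical values chosen only for illustration.

```r
# Power of the treatment F test in an RCBD with a treatments and b blocks.
# tau:   assumed treatment effects (deviations from the grand mean, summing to 0)
# sigma: assumed error SD after the block effects have been removed
rcbd_power <- function(b, tau, sigma, alpha = 0.05) {
  a   <- length(tau)
  df1 <- a - 1
  df2 <- (a - 1) * (b - 1)
  ncp <- b * sum(tau^2) / sigma^2        # noncentrality parameter
  f_crit <- qf(1 - alpha, df1, df2)      # critical value under H0
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

tau_assumed   <- c(-1, 0, 0, 1)  # hypothetical effects
sigma_assumed <- 1.5             # hypothetical error SD

# Power for b = 2, ..., 8 blocks
power_by_b <- sapply(2:8, rcbd_power, tau = tau_assumed, sigma = sigma_assumed)
names(power_by_b) <- paste0("b=", 2:8)
round(power_by_b, 3)
```

Rerunning the sketch with a smaller `sigma_assumed` shows how much of the gain comes from the reduced error variance rather than from adding blocks.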

Statistical form of RCBD

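The mean comparison among specific blocks (the block effects $\beta_j$) is not of interest
- In a RCBD, the variation between blocks is partitioned out of the MSE, resulting in a smaller MSE for testing hypotheses about the treatments.

$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

where:

$Y_{ij}$: math scores for Method i and Form j
$\mu$: grand mean
$\alpha_i$: effect of Method i with i = 1, …, 4
$\beta_j$: effect of Form j with j = 1, 2, 3
$\beta_j$ and $\varepsilon_{ij}$ are independent random variables such that $\beta_j \sim N(0, \sigma^2_{\beta})$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$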

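A little bit of statistics ¹

Can partition $SS_{Total}$ into:

$$SS_{Total} = SS_{Treatment} + SS_{Block} + SS_{Error}$$

$SS_{Treatment}$ with $df = a - 1$
$SS_{Block}$ with $df = b - 1$
$SS_{Error}$ with $df = (a - 1)(b - 1)$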
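For readers who want to see where the partition comes from, here is a short derivation sketch. It assumes one observation $Y_{ij}$ per treatment-block cell, $a$ treatments, $b$ blocks, and the usual dot notation for means ($\bar{Y}_{i\cdot}$, $\bar{Y}_{\cdot j}$, $\bar{Y}_{\cdot\cdot}$); this notation is an assumption and may differ from the original slide.

```latex
\[
Y_{ij} - \bar{Y}_{\cdot\cdot}
  = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})
  + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})
  + (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})
\]
% Squaring both sides and summing over i and j, the cross-product terms vanish:
\[
\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2}_{SS_{Total}}
  = \underbrace{b \sum_{i=1}^{a} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{SS_{Treatment}}
  + \underbrace{a \sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2}_{SS_{Block}}
  + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})^2}_{SS_{Error}}
\]
```

The degrees of freedom add up in the same way: $ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1)$.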

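A little bit more statistics

Assume the treatment factor has $a$ levels and the blocking factor has $b$ levels, and let $Y_{\cdot\cdot}$ be the sum of all values:

$$SS_{Treatment} = \frac{1}{b}\sum_{i=1}^{a} Y_{i\cdot}^2 - \frac{Y_{\cdot\cdot}^2}{ab}$$

Mean of the “sum of squares” of the marginal sums minus the mean of the “square of the sum”
- Marginal sums of treatment: $Y_{i\cdot} = \sum_{j=1}^{b} Y_{ij}$

$$SS_{Block} = \frac{1}{a}\sum_{j=1}^{b} Y_{\cdot j}^2 - \frac{Y_{\cdot\cdot}^2}{ab}$$

Mean of the “sum of squares” of the marginal sums minus the mean of the “square of the sum”
- Marginal sums of block: $Y_{\cdot j} = \sum_{i=1}^{a} Y_{ij}$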

Example: Performance of detergents

Background

An experiment was designed to study the performance of four different detergents in cleaning clothes. The following “cleanness” readings (higher=cleaner) were obtained with specially designed equipment for three different types of common stains. Is there a difference between the detergents?


library(tidyverse)
detergents <- tribble(
  ~Detergent, ~Stain1, ~Stain2, ~Stain3,
  1, 45, 43, 51,
  2, 47, 46, 52,
  3, 48, 50, 55,
  4, 42, 37, 49
)
kableExtra::kable(detergents)

Detergent	Stain1	Stain2	Stain3
1	45	43	51
2	47	46	52
3	48	50	55
4	42	37	49

Marginal Sums of treatment: $Y_{i\cdot}$ = (139, 145, 153, 128); R code: rowSums(detergents[, 2:4])
Marginal Sums of Stain: $Y_{\cdot j}$ = (182, 176, 207); R code: colSums(detergents[, 2:4])

Example: Total Sum of Square

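Sum of squares of all values: $\sum_i \sum_j Y_{ij}^2$ = 26867
Square of the sum of all values, divided by the number of values: $Y_{\cdot\cdot}^2 / 12 = 565^2 / 12$ = 26602.0833
Total Sum of Squares: $SS_{Total}$ = 26867 − 26602.0833 = 264.9167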

sum((detergents[, 2:4])^2) - (sum(detergents[, 2:4]))^2 / 12

[1] 264.9167

Example: Sum of squares for Detergent

treatment_marginal_Sums = rowSums(detergents[, 2:4])
grand_mean <- mean(unlist(detergents[, 2:4]))
## Method 1
3 * sum((treatment_marginal_Sums/3 - grand_mean)^2)

[1] 110.9167

## Method 2
sum(treatment_marginal_Sums^2) / 3 - (sum(detergents[, 2:4]))^2 / 12

[1] 110.9167

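Sum of squares of the marginal sums, divided by the values per level: $\frac{1}{3}\sum_i Y_{i\cdot}^2$ = 26713
Square of the sum of all values: $Y_{\cdot\cdot}^2 = 565^2$ = 319225
Sum of Squares for treatment: $SS_{Treatment}$ = 26713 − 319225 / 12 = 110.917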

Example: Sum of squares for block

block_marginal_Sums = colSums(detergents[, 2:4])
## Method 1
4 * sum((block_marginal_Sums/4 - grand_mean)^2)

[1] 135.1667

## Method 2
(1 / 4) * sum(block_marginal_Sums^2) - (sum(detergents[, 2:4]))^2 / 12

[1] 135.1667

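Sum of squares of the marginal sums, divided by the values per level: $\frac{1}{4}\sum_j Y_{\cdot j}^2$ = 26737.25
Square of the sum of all values: $Y_{\cdot\cdot}^2 = 565^2$ = 319225
Sum of Squares for block: $SS_{Block}$ = 26737.25 − 319225 / 12 = 135.1667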

F-statistics

SS_total <- sum((detergents[, 2:4])^2) - (sum(detergents[, 2:4]))^2 / 12
SS_treatment <- 3 * sum((treatment_marginal_Sums/3 - grand_mean)^2)
SS_block <- 4 * sum((block_marginal_Sums/4 - grand_mean)^2)
SS_residual = SS_total - SS_treatment - SS_block
SS_residual

[1] 18.83333

F_stat = (SS_treatment / (4-1)) / (SS_residual / ((4-1)*(3-1)))
F_stat

[1] 11.77876
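To turn the F statistic into a decision, one can look up its upper-tail probability with `pf()`. The sketch below re-enters the sums of squares computed in this example as plain numbers, so it runs on its own; the variable names with the `df_` and `p_` prefixes are introduced here for illustration.

```r
# Values from the detergent example (a = 4 detergents, b = 3 stains)
SS_treatment <- 110.9167
SS_block     <- 135.1667
SS_residual  <- 18.83333
df_treatment <- 4 - 1
df_block     <- 3 - 1
df_residual  <- (4 - 1) * (3 - 1)

MS_residual <- SS_residual / df_residual
F_treatment <- (SS_treatment / df_treatment) / MS_residual
F_block     <- (SS_block / df_block) / MS_residual

# Upper-tail probabilities of the F distribution
p_treatment <- pf(F_treatment, df_treatment, df_residual, lower.tail = FALSE)
p_block     <- pf(F_block, df_block, df_residual, lower.tail = FALSE)
c(F_treatment = F_treatment, p_treatment = p_treatment,
  F_block = F_block, p_block = p_block)
```

These hand-computed values should agree with the F and Pr(>F) columns that `aov()` reports for the same data.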

R Code for Sum of Squares

detergents_aov <- detergents |> 
  pivot_longer(starts_with("Stain"), names_to = "Stain") |>
  mutate(Detergent = factor(Detergent, levels = 1:4))

fit <- aov(value ~ Detergent+Stain, data= detergents_aov)
summary(fit)

            Df Sum Sq Mean Sq F value  Pr(>F)   
Detergent    3 110.92   36.97   11.78 0.00631 **
Stain        2 135.17   67.58   21.53 0.00183 **
Residuals    6  18.83    3.14                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example: Tutoring session

Let’s look at a simple example:
Research question: What is the effect of time of day of tutoring session on midterm grades?
The Primary Investigator (PI) of the study wants to control for tutor, believing some tutors may be better in the subject matter than others. Thus, they use a randomized block design.
Each tutor works with students in the morning and afternoon.
- In this example, students select their favorite tutor, but are then randomly assigned to either a morning (AM) or afternoon (PM) session.
- 200 total students: 91 in the morning, 109 in the afternoon

Note

DV: Midterm Score (in cells)

Example: Variables and Null Hypothesis

IV: Time (a = 2)
- 2 levels: AM and PM
Nuisance factor: Tutor (b = 4)
- 4 levels: Bobby, Julia, Monique, and Ned
DV: Midterm Score
Null hypothesis pertaining to the IV of interest: $H_0: \mu_{AM} = \mu_{PM}$
We will also have a null hypothesis pertaining to the blocking factor: $H_0: \mu_{Bobby} = \mu_{Julia} = \mu_{Monique} = \mu_{Ned}$
Two nulls = two values of $F_{obs}$, two values of $F_{crit}$, two decisions

Note

DV: Midterm Score (in cells)

Example: Sum of Squares

IV: Time (a = 2)
Nuisance factor: Tutor (b = 4)
DV: Midterm Score
Now: $SS_{Total} = SS_{Model} + SS_{Block} + SS_{Error}$
Thus, we can partition the effects into three parts:
- Sum of squares due to treatments (IV = Time), $SS_{Model}$
- Sum of squares due to the blocking factor, $SS_{Block}$
- and Sum of squares due to error, $SS_{Error}$
We do not model an interaction in blocked designs (we will talk about this later).

Example: Mean of Squares and F-statistics

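IV: Time (a = 2)
Nuisance factor: Tutor (b = 4)
DV: Midterm Score
Model: $Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$ (Time effect $\alpha_i$, Tutor effect $\beta_j$, no interaction)
ANOVA table: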
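For reference, the general layout of an ANOVA table for an additive block model can be sketched as follows. The labels $SS_{Model}$, $SS_{Block}$, $SS_{Error}$ and the use of $N = 200$ students are assumed notation for this example; the numeric entries of the original table are not reproduced here.

```latex
\[
\begin{array}{lllll}
\hline
\text{Source} & SS & df & MS & F \\
\hline
\text{Model (Time)}  & SS_{Model} & a - 1 = 1           & MS_{Model} = SS_{Model}/(a-1)     & MS_{Model}/MS_{Error} \\
\text{Block (Tutor)} & SS_{Block} & b - 1 = 3           & MS_{Block} = SS_{Block}/(b-1)     & MS_{Block}/MS_{Error} \\
\text{Error}         & SS_{Error} & N - a - b + 1 = 195 & MS_{Error} = SS_{Error}/(N-a-b+1) & \\
\text{Total}         & SS_{Total} & N - 1 = 199         &                                   & \\
\hline
\end{array}
\]
```

Each mean square is a sum of squares divided by its degrees of freedom, and both F ratios use $MS_{Error}$ as the denominator.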

Example: Sum of Square Formula

This is the same as before: take each individual score ($Y_{ijk}$), subtract the grand mean ($\bar{Y}_{\cdot\cdot\cdot}$), and square it

scores <- c(9.1, 9.3, 9.4, 9.5) # four individuals' scores
mean(scores) # grand mean

[1] 9.325

sum((scores - mean(scores))^2) # total sum of squares across individuals

[1] 0.0875

Do this for everyone and then sum over all people
However, when calculating the Sums of Squares for the IVs ($SS_{Model}$ and $SS_{Block}$), we need to compute “marginal means”
- A marginal mean is the mean for one level of the variable, ignoring the other variable

Example: Marginal Means

A marginal mean is the mean for one level of the variable, ignoring the other variable

For example:

The AM marginal mean is the average of all students’ midterm scores in the morning, ignoring who they have as a tutor
The Bobby marginal mean is the average of all students’ midterm scores who had Bobby as a tutor, ignoring time of day
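A minimal sketch of computing marginal means in R is shown below. The `toy` data frame is hypothetical and is NOT the lecture's tutoring data; it only illustrates averaging over one factor while ignoring the other.

```r
# Hypothetical toy data (not the lecture's tutoring data)
toy <- data.frame(
  Time  = c("AM", "AM", "AM", "PM", "PM", "PM"),
  Tutor = c("Bobby", "Julia", "Julia", "Bobby", "Bobby", "Julia"),
  Score = c(20, 24, 22, 12, 14, 16)
)

# Marginal means for Time: average over all students, ignoring Tutor
time_means <- tapply(toy$Score, toy$Time, mean)

# Marginal means for Tutor: average over all students, ignoring Time
tutor_means <- tapply(toy$Score, toy$Tutor, mean)

time_means
tutor_means
```

With unequal group sizes, as in this toy data, the marginal mean is the mean of the individual scores at that level, not the mean of the cell means.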

Example: SS for Total and Model

$SS_{Model} = \sum_i n_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$, where $n_i$ is the group size for AM/PM and $\bar{Y}_{i\cdot\cdot}$ are the marginal means for AM and PM {22.95, 13.43}
This is similar to what we computed before: subtract the grand mean from each marginal group mean, square it, weight it by the group size, and sum over all groups.

Technically, the blocking factor is just another IV (but it is not of interest, i.e., it is outside the scope of the research question).
- $SS_{Block}$ = 7332.03
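As a worked sketch of the weighted formula for the Time factor, the code below plugs in the group sizes (91 AM, 109 PM) and the rounded marginal means (22.95, 13.43) stated in this example. It assumes the grand mean equals the size-weighted average of the two marginal means; because the inputs are rounded, the result may differ slightly from the value in the original ANOVA table.

```r
# Inputs taken from the tutoring example (marginal means are rounded)
n_i    <- c(AM = 91, PM = 109)
mean_i <- c(AM = 22.95, PM = 13.43)

# Grand mean as the size-weighted average of the marginal means (assumption)
grand_mean <- sum(n_i * mean_i) / sum(n_i)

# Weighted sum of squared deviations of marginal means from the grand mean
SS_model <- sum(n_i * (mean_i - grand_mean)^2)

c(grand_mean = grand_mean, SS_model = SS_model)
```

The same pattern, with tutor group sizes and tutor marginal means, gives the sum of squares for the blocking factor.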

Example: Details of ANOVA Table

Then, we can fill out the ANOVA table:

Note

Under $\alpha = .05$, for the “Model” factor (Time), we have $df$ = 1 and $F_{obs} > F_{crit}$: so significant.

Similarly, for “Blocking” (Tutor), we have $df$ = 3 and $F_{obs} > F_{crit}$: so significant.

Interpretation

A randomized block design was used to test the effect of tutoring time on midterm scores. For each tutor, participants were randomly assigned to either morning (n = 91) or afternoon (n = 109) tutoring sessions.
The effect of tutoring time was significant, with a large effect. Using Tukey’s test, midterm scores were significantly higher in morning sessions than in afternoon sessions (p < .05).
The effect of the blocking factor, tutor, was significant, with a large effect. Using Tukey’s test, Monique’s students scored significantly higher than other students, and Bobby’s students scored significantly lower than other students on the midterm (p < .05).

Example: Hardness Reading¹

In this example we wish to determine whether 4 different tips (the treatment factor) produce different (mean) hardness readings on a Rockwell hardness tester.
- The treatment factor is the design of the tip for the machine that determines the hardness of metal. The tip is one component of the testing machine.

The Rockwell hardness test

The Rockwell hardness test is a hardness test based on indentation hardness of a material. The Rockwell test measures the depth of penetration of an indenter under a large load (major load) compared to the penetration made by a preload (minor load).

To conduct this experiment we assign the tips to an experimental unit; that is, to a test specimen (called a coupon), which is a piece of metal on which the tip is tested.
- The blocking factor is the block of test specimens. The test specimens are blocks of metal that are similar in hardness. The test specimens are used to block the variation in hardness of the metal from the variation in the tips.

Example: Block Design - CRD

If the structure were a completely randomized experiment (CRD) that we discussed in lecture 7, we would assign the tips to a random piece of metal for each test. In this case, the test specimens would be considered a source of nuisance variability.


set.seed(1234)
data.frame(
  Metal = paste0("Metal", 1:8),
  Tip = rep(c("Tip1", "Tip2", "Tip3", "Tip4"), each = 2),
  Hardness = sample(seq(9, 10, by =.1), 8)
) |> 
  kableExtra::kable()

Metal	Tip	Hardness
Metal1	Tip1	9.9
Metal2	Tip1	9.5
Metal3	Tip2	9.4
Metal4	Tip2	9.3
Metal5	Tip3	9.6
Metal6	Tip3	9.0
Metal7	Tip4	9.8
Metal8	Tip4	9.1

Example: Block Design - RCBD

If we conduct this as a blocked experiment, we would assign all four tips to the same test specimen, with each tip tested at a randomly assigned location on the specimen. Since each treatment occurs once in each block, the number of test specimens is the number of replicates.
Back to the hardness testing example, the experimenter may very well want to test the tips (treatment) across specimens (block) of various hardness levels. This shows the importance of blocking. To conduct this experiment as a RCBD, we assign all 4 tips to each specimen.
- In this experiment, each specimen is called a “block”; thus, we have designed a more homogeneous set of experimental units on which to test the tips.

Example: Block Design Table - RCBD

Suppose that we use b = 4 blocks as shown in the table below:
We are primarily interested in testing the equality of treatment means, but now we have the ability to remove the variability associated with the nuisance factor (the blocks) through the grouping of the experimental units prior to having assigned the treatments.


library(gt)  # provides gt(), tab_header(), tab_spanner(), tab_options()
# Each column is one coupon (block); every tip must appear exactly once per column
tribble(
  ~`1`,     ~`2`,   ~`3`,   ~`4`,
  "Tip 3",  "Tip 3",    "Tip 2",    "Tip 1",
  "Tip 1",  "Tip 4",    "Tip 1",    "Tip 4",
  "Tip 4",  "Tip 2",    "Tip 3",    "Tip 2",
  "Tip 2",  "Tip 1",    "Tip 4",    "Tip 3"
) |> 
  gt() |> 
  tab_header(
    title = "The Hardness Testing Experiment",
    subtitle = "Randomized Complete Block Design"
  ) |> 
  tab_spanner(
    label = "Test Coupon (Block)",
    columns = everything()
  ) |> 
  tab_options(
    table.width = px(500),
    table.font.size = px(20)
  )

The Hardness Testing Experiment
Randomized Complete Block Design
Test Coupon (Block)
1	2	3	4
Tip 3	Tip 3	Tip 2	Tip 1
Tip 1	Tip 4	Tip 1	Tip 4
Tip 4	Tip 2	Tip 3	Tip 2
Tip 2	Tip 1	Tip 4	Tip 3

Important

Notice the two-way structure of the experiment. Here we have four blocks, and within each block the tips are assigned in random order.
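A layout with this two-way structure could be generated in R roughly as follows. The seed and the object name `layout` are arbitrary choices for illustration, so the resulting order will not match the table shown in the lecture.

```r
set.seed(503)  # arbitrary seed, only for reproducibility
tips    <- paste("Tip", 1:4)
coupons <- 1:4

# One independent random permutation of the tips per coupon (block);
# columns are coupons, rows are test positions on the coupon
layout <- sapply(coupons, function(block) sample(tips))
colnames(layout) <- paste("Coupon", coupons)
layout
```

Every column is a permutation of the four tips, so each tip is tested exactly once on each coupon.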

Example: ANOVA Results (1)

Remember, the hardness of specimens (coupons) is tested with 4 different tips.


library(here)
dat <- read.csv(here::here("teaching/2025-01-13-Experiment-Design/Lecture08", "tip_hardness.csv"))
kableExtra::kable(dat)

Obs	Tip	Hardness	Coupon
1	1	9.3	1
2	1	9.4	2
3	1	9.6	3
4	1	10.0	4
5	2	9.4	1
6	2	9.3	2
7	2	9.8	3
8	2	9.9	4
9	3	9.2	1
10	3	9.4	2
11	3	9.5	3
12	3	9.7	4
13	4	9.7	1
14	4	9.6	2
15	4	10.0	3
16	4	10.2	4

Example: ANOVA Results (2)

Here is the output from R aov(). We can see four levels of the Tip and four levels for Coupon:


dat$Tip <- factor(dat$Tip)        # treatment factor
dat$Coupon <- factor(dat$Coupon)  # blocking factor
fit_exp2 <- aov(Hardness ~ Tip + Coupon, data = dat)
summary(fit_exp2)                 # print the ANOVA table

Note

The Analysis of Variance table shows three degrees of freedom for Tip, three for Coupon, and nine residual (error) degrees of freedom.
The ratio of the treatment mean square to the error mean square gives an F ratio of 14.44, which is highly significant since it exceeds the upper 0.001 critical value of the F distribution with three and nine degrees of freedom (p < .001).
Our 2-way analysis also provides a test for the block factor, Coupon. The ANOVA shows that this factor is also significant, with F = 30.94. So, there is a large amount of variation in hardness between the pieces of metal.
This is why we used specimen (or coupon) as our blocking factor. We expected in advance that it would account for a large amount of variation. By including block in the model and in the analysis, we removed this large portion of the variation, such that the residual error is quite small. By including a block factor in the model, the error variance is reduced, and the test on treatments is more powerful.
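The F ratios quoted above can be checked without the CSV file by typing in the 16 hardness readings from the data table in this example. The object names `hardness_dat` and `fit_check` are introduced here only for illustration.

```r
# Hardness readings copied from the data table in this example
# (rows ordered by Tip, and by Coupon 1-4 within each Tip)
hardness_dat <- data.frame(
  Tip      = factor(rep(1:4, each = 4)),
  Coupon   = factor(rep(1:4, times = 4)),
  Hardness = c(9.3, 9.4, 9.6, 10.0,
               9.4, 9.3, 9.8, 9.9,
               9.2, 9.4, 9.5, 9.7,
               9.7, 9.6, 10.0, 10.2)
)

# Additive model: treatment (Tip) plus block (Coupon), no interaction
fit_check <- aov(Hardness ~ Tip + Coupon, data = hardness_dat)
anova_tab <- summary(fit_check)[[1]]
anova_tab

# F ratios for Tip and Coupon
f_values <- anova_tab[["F value"]][1:2]
round(f_values, 2)
```

Dropping `Coupon` from the formula shows how much larger the residual mean square becomes when the block is ignored.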

Example: ANOVA Results (3)

The test on the block factor is typically not of interest except to confirm that you used a good blocking factor. The results are summarized by the table of means given below.


cbind(
dat |> 
  group_by(Tip) |>
  summarize(
    N_Tip = n(),
    Hardness_Tip = mean(Hardness))
,
dat |> 
  group_by(Coupon) |>
  summarize(
    N_Coupon = n(),
    Hardness_Coupon = mean(Hardness)) 
) |> 
  kableExtra::kable()

Tip	N_Tip	Hardness_Tip	Coupon	N_Coupon	Hardness_Coupon
1	4	9.575	1	4	9.400
2	4	9.600	2	4	9.425
3	4	9.450	3	4	9.725
4	4	9.875	4	4	9.950

Example: Plant Fertilizer¹

In a greenhouse experiment, there was a single factor (fertilizer) with 4 levels (i.e. 4 treatments), six replications, and a total of 24 experimental units (potted plants). Suppose the image below is the greenhouse bench (viewed from above) that was used for the experiment.
To use a CRD, we need to randomly assign each of the treatment levels to 6 potted plants. To do this, we first assign numbers to the physical positions of the pots on the bench. Each column of plants will use one fertilizer.
To expand it to an RCBD, we group the plants into blocks (say we have six farms). Each farm will have 4 plants, each of which receives one type of fertilizer.

Read in R Data

After saving exp1_data.csv in the same directory as your R file, you should be able to import the dataset from the Files panel in RStudio:
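If the CSV file is not at hand, the same data can be typed in directly. The sketch below transcribes the 24 heights from the block design table in this lecture; the column names follow that table, and the commented `read.csv()` call assumes the file sits in the working directory.

```r
# Heights transcribed from the block design table in this lecture
dat <- data.frame(
  Plant      = 1:24,
  Fertilizer = factor(rep(c("Control", "F1", "F2", "F3"), each = 6)),
  Block      = factor(rep(paste0("Block", 1:6), times = 4)),
  Height     = c(19.5, 20.5, 21.0, 21.0, 21.5, 22.5,   # Control
                 25.0, 27.5, 28.0, 28.6, 30.5, 32.0,   # F1
                 22.5, 25.2, 26.0, 26.5, 27.0, 28.0,   # F2
                 27.5, 28.0, 29.2, 29.5, 30.0, 31.0)   # F3
)
str(dat)

# Alternative, assuming the file is in the working directory:
# dat <- read.csv("exp1_data.csv")
```

Storing `Fertilizer` and `Block` as factors matters: `aov()` would otherwise treat a numeric block label as a continuous covariate.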

Block Design

gt(dat, 
   rownames_to_stub = TRUE, 
   groupname_col = "Block", 
   row_group_as_column = TRUE) |> 
   tab_options(table.width = px(500))

		Fertilizer	Height	Plant
Block1	1	Control	19.5	1
	7	F1	25.0	7
	13	F2	22.5	13
	19	F3	27.5	19
Block2	2	Control	20.5	2
	8	F1	27.5	8
	14	F2	25.2	14
	20	F3	28.0	20
Block3	3	Control	21.0	3
	9	F1	28.0	9
	15	F2	26.0	15
	21	F3	29.2	21
Block4	4	Control	21.0	4
	10	F1	28.6	10
	16	F2	26.5	16
	22	F3	29.5	22
Block5	5	Control	21.5	5
	11	F1	30.5	11
	17	F2	27.0	17
	23	F3	30.0	23
Block6	6	Control	22.5	6
	12	F1	32.0	12
	18	F2	28.0	18
	24	F3	31.0	24

ANOVA Results

Let us obtain the ANOVA table for the RCBD. To run the model with Block Design in R we can use the aov() function with the formula:

<Outcome>~<Treatment>+<Block>.

fit_rcbd <- aov(Height ~ Fertilizer + Block, data = dat)                 
summary(fit_rcbd)

            Df Sum Sq Mean Sq F value   Pr(>F)    
Fertilizer   3 251.44   83.81  162.96 1.14e-11 ***
Block        5  53.32   10.66   20.73 2.99e-06 ***
Residuals   15   7.72    0.51                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For comparison, let us obtain the ANOVA table for the CRD for the same data. We use the following R code with the formula:

<Outcome>~<Treatment>

fit_cbd <- aov(Height ~ Fertilizer, data = dat)              
summary(fit_cbd)

            Df Sum Sq Mean Sq F value   Pr(>F)    
Fertilizer   3 251.44   83.81   27.46 2.71e-07 ***
Residuals   20  61.03    3.05                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

Important

Comparing the two ANOVA tables, we see that the MSE in the RCBD has decreased considerably in comparison to the CRD. This reduction in MSE can be viewed as the partition of the SSE for the CRD (61.033) into SSBlock (53.32) + SSE (7.715). The potential reduction in SSE by blocking is offset to some degree by losing degrees of freedom for the blocks. But more often than not, it is worth it in terms of the improvement in the calculated F-statistic. In our example, we observe that the F-statistic for the treatment has increased considerably for the RCBD in comparison to the CRD. It is reasonable to assume that the result from the RCBD is more valid than that from the CRD, as the MSE value obtained after accounting for the block-to-block variability is a more accurate representation of the random error variance.
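The partition described above can be verified with a few lines of arithmetic. The numbers are read from the two ANOVA tables in this example (rounded as printed), so small rounding differences are expected; the variable names are introduced here for illustration.

```r
# Values read from the CRD and RCBD ANOVA tables (rounded as printed)
SS_fert  <- 251.44
SSE_crd  <- 61.033; df_crd   <- 20
SS_block <- 53.32;  df_block <- 5
SSE_rcbd <- 7.715;  df_rcbd  <- 15

# Blocking moves SS_block (and its 5 df) out of the CRD error term
c(SSE_crd = SSE_crd, SS_block_plus_SSE_rcbd = SS_block + SSE_rcbd)
c(df_crd = df_crd, df_block_plus_df_rcbd = df_block + df_rcbd)

# Mean squared errors and treatment F ratios under each design
MSE_crd  <- SSE_crd / df_crd
MSE_rcbd <- SSE_rcbd / df_rcbd
F_crd    <- (SS_fert / 3) / MSE_crd
F_rcbd   <- (SS_fert / 3) / MSE_rcbd
round(c(MSE_crd = MSE_crd, MSE_rcbd = MSE_rcbd, F_crd = F_crd, F_rcbd = F_rcbd), 2)
```

The treatment sum of squares is identical in both analyses; only the error term in the denominator changes.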

Example: Performance of Students at varied environment

Background: We compare the performance of students, blocked by gender (male and female), in different environments (at home and at college). The experiment can be represented as in the following figure:

Where AC: At College, AH: At Home

stud <- factor(rep(c("male", "female"), each = 2)) 
perf <- factor(rep(c("ah", "ac" ), times = 2)) 
perf

[1] ah ac ah ac
Levels: ac ah

y <- c(5.5, 5, 
    4, 6.2) 

# y is the hours students 
# studied in specific places 
results <- data.frame(y, stud, perf) 

fit <- aov(y ~ perf+stud, data = results)                
summary(fit)

            Df Sum Sq Mean Sq F value Pr(>F)
perf         1 0.7225  0.7225   0.396  0.642
stud         1 0.0225  0.0225   0.012  0.930
Residuals    1 1.8225  1.8225

Explanation: The Mean Sq for the block factor (stud) is 0.0225 << 1.8225 (the residual Mean Sq), i.e., blocking wasn’t necessary here. The Pr value for the environment (perf) is 0.642 > 0.05 (5% significance level), so we fail to reject the null hypothesis: there is not sufficient evidence that studying at home versus at college makes a significant difference.

Health_Program	N
Program A	50
Program B	50
Program C	50
Control	50