Author: Jihong Zhang, Ph.D.

Affiliation: Educational Statistics and Research Methods (ESRM) Program, University of Arkansas

Homework: Experimental Design and Validity

Course: ESRM 64103 (Experimental Design in Education)

Based on: Lecture 07 (Experimental Design and Validity)

Total Points: 10 (1 point each)


Instructions

This homework consists of 10 multiple-choice questions covering concepts from Lecture 07 on experimental design types and the four types of validity. Each question presents a research scenario. Select the best answer for each question.


Questions

Question 1

A researcher wants to evaluate whether a new social-emotional learning (SEL) curriculum improves empathy scores in middle school students. She is specifically worried that administering a pre-test questionnaire about empathy might sensitize students to the topic — making them more empathetic regardless of the curriculum — and she wants to be able to estimate that pre-test effect separately from the curriculum effect. She has 240 participants available and four intact classrooms.

Which experimental design is most appropriate?

A. Pre-Post-test Design
B. Post-test Only Design
C. Solomon Four-Group Design
D. Randomized Block Design

Answer: C

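The Solomon Four-Group design is built specifically to isolate the pre-test sensitization effect from the treatment effect. Its four groups (pre-tested + treatment, non-pretested + treatment, pre-tested + control, non-pretested + control) cross pretesting with treatment, so it can estimate the treatment effect, the pre-test effect, and whether pretesting changes the treatment effect. A Post-test Only design would avoid sensitization but, having no pre-tested groups, could not estimate it; a Pre-Post-test design would leave any sensitization entangled with the curriculum effect. The researcher’s explicit wish to quantify the pre-test effect therefore points to the Solomon design.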
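To make the logic concrete, the sketch below treats the Solomon design as a 2 × 2 layout (pretest yes/no × treatment yes/no) and computes simple contrasts from the four posttest means. The class name, field names, and example means are hypothetical; a real analysis would test these contrasts with a 2 × 2 ANOVA on individual posttest scores rather than read them off cell means.

```python
from dataclasses import dataclass


@dataclass
class SolomonMeans:
    """Posttest means for the four Solomon groups (hypothetical names)."""
    pre_treat: float    # pre-tested + treatment
    nopre_treat: float  # non-pretested + treatment
    pre_ctrl: float     # pre-tested + control
    nopre_ctrl: float   # non-pretested + control


def solomon_effects(m: SolomonMeans) -> dict:
    # Treatment effect, averaged over pretested and non-pretested groups
    treatment = ((m.pre_treat + m.nopre_treat) - (m.pre_ctrl + m.nopre_ctrl)) / 2
    # Pre-test (testing) effect, averaged over treatment and control groups
    pretest = ((m.pre_treat + m.pre_ctrl) - (m.nopre_treat + m.nopre_ctrl)) / 2
    # Pretest x treatment interaction: does pretesting change the treatment effect?
    interaction = (m.pre_treat - m.pre_ctrl) - (m.nopre_treat - m.nopre_ctrl)
    return {"treatment": treatment, "pretest": pretest,
            "pretest_x_treatment": interaction}


# Hypothetical posttest empathy means, for illustration only
print(solomon_effects(SolomonMeans(78.0, 74.0, 72.0, 70.0)))
# {'treatment': 5.0, 'pretest': 3.0, 'pretest_x_treatment': 2.0}
```

Here a nonzero `pretest` contrast corresponds to the researcher’s worry that the questionnaire itself raises empathy, independent of the curriculum.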


Question 2

An educational researcher examines whether a mindfulness intervention reduces burnout in graduate students. She randomly assigns 60 students to either the treatment or control group and measures burnout before and after the 10-week program. However, midway through the study, the university launches a campus-wide mental health awareness campaign unrelated to her intervention.

Which threat to internal validity does this illustrate?

A. Regression to the mean
B. Maturation
C. History
D. Instrumentation

Answer: C

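A history threat occurs when an external event, unrelated to the treatment, happens during the study and may influence the outcome. The campus mental health campaign is a concurrent real-world event, so any pre-to-post reduction in burnout could reflect the campaign rather than the mindfulness program. Because the campaign reaches both randomized groups, comparing the treatment group’s change with the control group’s change helps separate the two, but the threat remains if the campaign affects the groups differently (for example, by amplifying the program’s effect).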


Question 3

A school district creates a new standardized test intended to measure students’ critical thinking skills. A validation study reveals that scores on this test correlate at r = .89 with a reading fluency measure, but only at r = .21 with an established logical reasoning assessment.

Which validity concern does this finding raise?

A. Statistical validity: the sample size is too small to detect real effects
B. External validity: the results cannot generalize beyond this district
C. Construct validity: the test may be measuring reading fluency rather than critical thinking
D. Internal validity: random assignment was not used

Answer: C

Construct validity is threatened when the operationalization does not match the intended theoretical construct. A test labeled “critical thinking” that correlates far more strongly with reading fluency than with logical reasoning is likely capturing reading skill, not the construct it claims to measure.


Question 4

A researcher develops a growth mindset intervention and tests it on a convenience sample of undergraduate psychology students at a large research university. The study finds a significant positive effect. The researcher concludes that the intervention should be adopted by K–12 schools nationwide.

Which threat is most relevant to this conclusion?

A. Instrumentation threat
B. Interaction of selection and treatment (external validity)
C. Differential attrition (internal validity)
D. Low statistical power (statistical validity)

Answer: B

Interaction of selection and treatment is an external validity threat: the effect observed in one type of sample (18–22-year-old university students) may not generalize to a very different population (K–12 students). The sample characteristics are tied to the effect, limiting generalizability.


Question 5

A researcher hypothesizes that the effectiveness of a flipped-classroom intervention depends on both class size (small vs. large) and student prior knowledge (low vs. high). She wants to test whether these two factors interact — that is, whether the intervention works better for small classes only when students have high prior knowledge.

Which design directly supports this investigation?

A. Pre-Post-test Design
B. Repeated Measures Design
C. 2 × 2 Factorial Design
D. Randomized Block Design

Answer: C

A factorial design crosses two or more factors so that every combination of their levels is observed, which is what allows the researcher to test interaction effects between them. A 2 × 2 factorial (class size × prior knowledge) creates four conditions and directly tests whether the effect of one factor depends on the level of the other. Prior knowledge is a measured rather than a manipulated factor, so it can be crossed with class size but not randomly assigned.
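The interaction idea can be sketched in Python: enumerate the four cells, compute the class-size effect separately at each level of prior knowledge (the simple effects), and take their difference. The factor labels and cell means below are hypothetical; in practice the interaction would be tested with a two-way ANOVA on individual scores.

```python
from itertools import product

CLASS_SIZES = ("small", "large")
PRIOR_KNOWLEDGE = ("low", "high")


def cells():
    """All four conditions of the 2 x 2 design."""
    return list(product(CLASS_SIZES, PRIOR_KNOWLEDGE))


def simple_effects(cell_means):
    """Small-minus-large class difference at each prior-knowledge level."""
    return {p: cell_means[("small", p)] - cell_means[("large", p)]
            for p in PRIOR_KNOWLEDGE}


def interaction(cell_means):
    """How much the class-size effect changes from low to high prior knowledge."""
    se = simple_effects(cell_means)
    return se["high"] - se["low"]


# Hypothetical means in which small classes help only high-prior-knowledge students
means = {("small", "high"): 85.0, ("large", "high"): 78.0,
         ("small", "low"): 70.0, ("large", "low"): 70.0}
assert set(means) == set(cells())
print(simple_effects(means))  # {'low': 0.0, 'high': 7.0}
print(interaction(means))     # 7.0
```

A nonzero interaction contrast is exactly the pattern the researcher hypothesizes: the class-size effect appears at one level of prior knowledge but not the other.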


Question 6

A researcher conducts five separate independent-samples t-tests to compare treatment and control groups on five different outcome measures (reading, math, attendance, self-efficacy, and anxiety), each at α = .05. She reports that two outcomes are significant and concludes the treatment is broadly effective.

Which statistical validity threat is most relevant here?

A. Low power due to small sample size
B. Inflated Type I error rate from multiple comparisons
C. Violation of the normality assumption
D. Sampling error from inadequate randomization

Answer: B

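Running five tests at α = .05 without correction inflates the familywise error rate. If all five null hypotheses were true and the tests were independent, the probability of at least one false positive would be 1 − (1 − .05)^5 = 1 − (0.95)^5 ≈ .23, roughly a one-in-four chance. Two significant results out of five are therefore weak evidence that the treatment is “broadly effective.” A Bonferroni correction (α = .05/5 = .01 per test) or a multivariate approach (MANOVA) should be used to control the Type I error rate.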
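The arithmetic above can be checked with a few lines of Python. The function names are illustrative, and the familywise formula assumes independent tests with every null hypothesis true; the Bonferroni bound holds regardless of dependence.

```python
def familywise_error_rate(alpha, k):
    """P(at least one Type I error) across k independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / k


k = 5  # reading, math, attendance, self-efficacy, anxiety
print(f"uncorrected FWER: {familywise_error_rate(0.05, k):.3f}")        # 0.226
print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, k):.3f}")    # 0.010
print(f"FWER after Bonferroni: "
      f"{familywise_error_rate(bonferroni_alpha(0.05, k), k):.3f}")     # 0.049
```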


Question 7

A researcher is studying the effect of a new writing workshop on essay quality. She knows that students’ baseline writing ability (measured by GPA) is strongly related to the outcome. Before randomly assigning students to treatment and control, she divides participants into three groups — low, medium, and high GPA — and then randomly assigns within each group.

Which design is she using, and what is its primary benefit?

A. Solomon Four-Group Design; it controls for pre-test sensitization
B. Factorial Design; it tests the interaction between GPA and the workshop
C. Randomized Block Design; it reduces error variance by controlling a known nuisance variable
D. Repeated Measures Design; each student serves as their own control

Answer: C

By dividing participants into homogeneous blocks (based on GPA) before randomizing within blocks, the researcher controls for a known covariate that would otherwise inflate error variance. This increases the precision of the treatment effect estimate without adding more participants.
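The block-then-randomize procedure can be sketched as follows. The GPA cutoffs (2.5 and 3.3), the function names, and the student IDs are hypothetical; the source specifies only three blocks (low, medium, high) with random assignment inside each.

```python
import random
from collections import defaultdict


def assign_blocks(gpas, cutoffs=(2.5, 3.3)):
    """Group students into low / medium / high GPA blocks (hypothetical cutoffs)."""
    blocks = defaultdict(list)
    for student, gpa in gpas.items():
        if gpa < cutoffs[0]:
            blocks["low"].append(student)
        elif gpa < cutoffs[1]:
            blocks["medium"].append(student)
        else:
            blocks["high"].append(student)
    return dict(blocks)


def block_randomize(gpas, cutoffs=(2.5, 3.3), seed=None):
    """Randomly split each GPA block into treatment and control halves."""
    rng = random.Random(seed)
    assignment = {}
    for members in assign_blocks(gpas, cutoffs).values():
        rng.shuffle(members)       # random order within the block
        half = len(members) // 2   # odd-sized blocks put the extra student in control
        for i, student in enumerate(members):
            assignment[student] = "treatment" if i < half else "control"
    return assignment


# Hypothetical students and GPAs
gpas = {"s1": 2.1, "s2": 3.6, "s3": 2.9, "s4": 3.8, "s5": 2.3, "s6": 3.0}
print(block_randomize(gpas, seed=42))
```

Because randomization happens inside each block, low-, medium-, and high-GPA students are split as evenly as possible between conditions, which is what lets the analysis remove GPA-related variance from the error term.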


Question 8

A psychologist measures the same 30 children’s working memory capacity at the beginning, middle, and end of a school year to track developmental change. She chooses this approach partly because the assessment is expensive, and she wants to maximize statistical power while minimizing the number of participants needed.

Which design does this study use, and why is it efficient?

A. Post-test Only Design; there is no pre-test to reduce costs
B. Randomized Block Design; children are blocked by age
C. Repeated Measures Design; each child serves as their own control, removing between-person variability
D. Solomon Four-Group Design; it separates pre-test effects from maturation effects

Answer: C

In a repeated measures design, each participant is measured multiple times, so stable individual differences are held constant across measurement occasions. Removing these between-person differences from the error term eliminates a large source of error variance, yielding higher statistical power with fewer participants than a between-subjects design would need.
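A small simulation can make this variance argument concrete. It generates hypothetical working-memory scores for 30 children at three occasions (each child has a stable ability level plus occasion-specific noise), then compares the error variance left when person differences are ignored with the error variance after each child’s own level is removed. All parameter values and function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_scores(n_children=30, occasions=3, person_sd=10.0,
                    noise_sd=3.0, growth=2.0, seed=1):
    """Hypothetical scores: rows = children, columns = occasions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_children):
        ability = rng.gauss(100.0, person_sd)  # stable between-person difference
        data.append([ability + growth * t + rng.gauss(0.0, noise_sd)
                     for t in range(occasions)])
    return data


def error_variances(data):
    """Mean squared residual with and without removing each child's own level."""
    occasions = len(data[0])
    occ_means = [statistics.fmean(row[t] for row in data) for t in range(occasions)]
    grand = statistics.fmean(x for row in data for x in row)
    # Between-subjects view: residual = deviation from the occasion mean,
    # so stable person differences stay in the error term.
    between = statistics.fmean(
        (row[t] - occ_means[t]) ** 2 for row in data for t in range(occasions))
    # Repeated-measures view: also subtract each child's mean (the person effect).
    within = statistics.fmean(
        (row[t] - occ_means[t] - statistics.fmean(row) + grand) ** 2
        for row in data for t in range(occasions))
    return between, within


between, within = error_variances(simulate_scores())
print(f"error variance ignoring persons: {between:.1f}")
print(f"error variance removing persons: {within:.1f}")
```

The squared deviations decompose exactly, so the first number always exceeds the second by the variance of the children’s mean levels; that gap is the between-person variability a repeated measures analysis removes from the error term.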


Question 9

A researcher creates a new scale to measure academic self-efficacy (a student’s belief in their ability to succeed academically — a construct theoretically related to general self-efficacy but unrelated to personality traits such as extraversion). To validate it, she correlates scores with (a) an established general self-efficacy measure (r = .78) and (b) a scale measuring extraversion (r = .71).

What do these correlations suggest about the scale’s construct validity?

A. Both correlations support convergent validity; the scale is well-validated
B. The first correlation supports convergent validity, but the second threatens discriminant validity
C. The second correlation supports discriminant validity because it is below r = .80
D. Neither correlation is relevant to construct validity

Answer: B

Convergent validity is supported when a scale correlates highly with theoretically related constructs — because general self-efficacy is conceptually similar to academic self-efficacy, the r = .78 is a good sign. However, discriminant validity requires that the scale does not correlate highly with theoretically unrelated constructs — because extraversion (how outgoing a person is) is conceptually distinct from academic self-efficacy, the r = .71 is unexpectedly high and suggests the scale may be capturing something broader than academic self-efficacy.


Question 10

A school district pilots a new homework reduction policy for one randomly selected school, while a comparable school serves as the control. Teachers at the control school learn that the treatment school’s students are doing less homework and getting similar or better grades. The control school teachers, feeling their students are being disadvantaged, begin informally reducing homework loads on their own.

Which threat to validity does this scenario best illustrate?

A. Regression to the mean (internal validity)
B. Interaction of setting and treatment (external validity)
C. Diffusion of treatment / resentful demoralization (internal validity)
D. Mono-operation bias (construct validity)

Answer: C

When control group members learn about the treatment condition and begin to mimic it, the distinction between groups breaks down. This is diffusion of treatment (the treatment “spreads” to the control group), which threatens internal validity because any observed difference between groups will likely underestimate the true treatment effect. Because the control teachers act out of a sense that their students are being disadvantaged, their behavior also overlaps with compensatory equalization, in which staff give the control group some of the treatment’s benefits.


Answer Key

Question Answer
1 C
2 C
3 C
4 B
5 C
6 B
7 C
8 C
9 B
10 C