Lecture 02: Hypothesis Testing

Experimental Design in Education

Jihong Zhang, Ph.D.

Educational Statistics and Research Methods (ESRM) Program

University of Arkansas

2025-08-18

Presentation Outline

  • Types of Statistics
    • Descriptive: Summarize data (central tendency, variability, shape)
    • Inferential: Makes population inferences from samples
  • Hypothesis Testing
    1. State \(H_0\) and \(H_A\)
    2. Set α level
    3. Compute test statistics
    4. Conduct test
  • ANOVA Fundamentals
    • One DV, one IV with multiple levels
    • Between/within designs
    • Interaction effects
  • Test Components
    • Error types (I & II)
    • Variance analysis (\(SS_T\), \(SS_B\), \(SS_W\))
    • F-statistics and critical values
  • Examples
    • Political attitudes (Democrats, Republicans, Independents)
  • Decision Making
    • Compare F-observed vs F-critical
    • Compare p-value vs α
    • Interpret at (1-α) confidence level

Types of Statistics

1. Descriptive Statistics

  • Definition: Describes and summarizes the collected data using numbers/values
    • Central tendency: mean, median, mode
    • Variability: range, interquartile range (IQR), variance, standard deviation
    • Shape of distribution: skewness, kurtosis
moments::skewness(c(1:10, 100))
[1] 2.793716
moments::skewness(rnorm(100, 0, 1))
[1] 0.1482851
  • Examples of skewness with two graphs (figures omitted here; see the sketch below):
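
The two graphs are not reproduced here. A minimal sketch (assuming ggplot2 is installed) regenerates comparable examples: the outlier-contaminated sample from above, which is right-skewed, next to a roughly symmetric standard-normal sample.

library(ggplot2)

set.seed(1)
# Re-create the two samples used above: one right-skewed (outlier at 100),
# one roughly symmetric (standard normal)
skew_data <- data.frame(
  x = c(c(1:10, 100), rnorm(100, 0, 1)),
  sample = rep(c("Right-skewed", "Roughly symmetric"), times = c(11, 100))
)
ggplot(skew_data, aes(x = x)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ sample, scales = "free") +
  labs(title = "Examples of Skewness")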

2. Inferential Statistics

  • Definition: Uses probability theory to infer/estimate population characteristics from a sample using hypothesis testing
  • Visual representation shows:
    • Population → Sampling → Sample
    • Sample → Inference → Population
      • Sample is analyzed using descriptive statistics
      • Inferential statistics used to make conclusions about population
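
A minimal sketch of this Population → Sampling → Sample → Inference loop, using a hypothetical population of exam scores (all numbers below are made up for illustration):

set.seed(123)
# Hypothetical population: 10,000 exam scores
population <- rnorm(10000, mean = 75, sd = 8)

# Sampling: draw one sample of 50 from the population
sample_scores <- sample(population, size = 50)

# Descriptive statistics summarize the sample
mean(sample_scores)
sd(sample_scores)

# Inferential statistics: 95% confidence interval for the population mean
t.test(sample_scores)$conf.int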

3. Predictive Statistics

  • Definition: Use observed data to produce the most accurate prediction possible for new data. Here, the primary goal is that the predicted values have the highest possible fidelity to the true value of the new data.

  • Example: A simple example would be for a book buyer to predict how many copies of a particular book should be shipped to their store for the next month.
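
A minimal sketch of the book-buyer example, assuming hypothetical monthly sales data: fit a simple trend model on the observed months and predict the next one.

set.seed(7)
# Hypothetical monthly sales of one title over the past year
sales <- data.frame(month = 1:12,
                    copies = round(50 + 2 * (1:12) + rnorm(12, mean = 0, sd = 5)))

# Fit a simple linear trend and predict copies needed for month 13
fit <- lm(copies ~ month, data = sales)
predict(fit, newdata = data.frame(month = 13))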

Which type of statistics to use?

  1. How many houses burned in the California wildfires in the first week?

    • Descriptive
  2. Which factor is most important in causing the fires?

    • Inferential
  3. How likely is it that a California wildfire will not happen again in the next 5 years?

    • Predictive
  4. How likely is it that humans will live on Mars?

    • Not statistics. Sci-fi.
  5. Which type of statistics is used by ChatGPT?

Statistical Hypothesis Testing Steps

To perform inferential statistics, we need to go through the following steps:

  1. State null hypothesis (H₀) and alternative hypothesis (\(H_A\))
    • Null hypothesis must be some statement that is statistically testable.
  2. Set alpha α (type I error rate) to determine significance levels
    • rejection region vs. p-value
  3. Compute the test statistic (e.g., the F-statistic)
  4. Conduct hypothesis testing:
    • Compare test statistics: critical value vs. observed value
    • Compare alpha and p-value

ANOVA Introduction

  • ANOVA is one of the most frequently used statistical tools for inferential statistics in experimental design.
  • Settings for Analysis of Variance (ANOVA):
    • One dependent variable (DV), “Outcome”
    • One independent variable (IV) with multiple levels, “Group”
  • Example question: “Are there mean differences in SAT math scores (outcome) for different high school program types (group)?”
  • Course covers advanced ANOVA topics:
    • Group comparisons (Group A vs. B vs. C)
    • Model comparisons
    • Between/within-subject design
    • Interaction effects

Types of ANOVA: Key Differences

  • One-Way ANOVA

    • Purpose: Tests one factor with three or more levels on a continuous outcome.

    • Use Case: Comparing means across multiple groups (e.g., diet types on weight loss).

  • Two-Way ANOVA

    • Purpose: Examines two factors and their interaction on a continuous outcome.

    • Use Case: Studying effects of diet and exercise on weight loss.

  • Repeated Measures ANOVA

    • Purpose: Tests the same subjects under different conditions or time points.

    • Use Case: Longitudinal studies measuring the same outcome over time (e.g., cognitive tests after varying sleep durations).

  • Mixed-Design ANOVA

    • Purpose: Combines between-subjects and within-subjects factors in one analysis.

    • Use Case: Evaluating treatment effects over time with control and experimental groups.

  • Multivariate Analysis of Variance (MANOVA)

    • Purpose: Assesses multiple continuous outcomes (dependent variables) influenced by independent variables.

    • Use Case: Impact of psychological interventions on anxiety, stress, and self-esteem.
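
In R, these designs differ mainly in the model formula passed to aov() or manova(). The sketch below uses hypothetical data (outcome y, within-subject factor A, between-subject factor B, subject identifier id) purely to illustrate the syntax, not a substantive analysis:

set.seed(1)
# Hypothetical long-format data: 20 subjects, each measured under two
# conditions of A; B is a between-subjects grouping
dat <- data.frame(
  id = factor(rep(1:20, each = 2)),
  A  = factor(rep(c("a1", "a2"), times = 20)),
  B  = factor(rep(c("b1", "b2"), each = 20)),
  y  = rnorm(40)
)

# One-way ANOVA: one between-subjects factor
summary(aov(y ~ B, data = dat))

# Two-way ANOVA: two factors and their interaction
summary(aov(y ~ A * B, data = dat))

# Repeated measures ANOVA: subjects as error strata for factor A
summary(aov(y ~ A + Error(id/A), data = dat))

# Mixed design: between-subjects B crossed with within-subjects A
summary(aov(y ~ A * B + Error(id/A), data = dat))

# MANOVA: multiple outcomes bound into a matrix
dat$y2 <- rnorm(40)
summary(manova(cbind(y, y2) ~ B, data = dat))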

1 Example 1: Political Study on Tax Reform Attitudes

Background

  • A political scientist is interested in politicians’ attitudes toward tax reform and conducts a survey of Republicans, Independents, and Democrats.
  • Political scientist study on tax reform attitudes:
    • Groups: Democrats, Republicans, Independents
    • Data shows attitude scores (higher score = greater concern for tax reform)
    • Analysis conducted at α = .05
    • Data includes:
      • party: political affiliation (Democrat, n = 5; Republican, n = 6; Independent, n = 5)
      • scores: attitudes scores for the survey respondents
        • The higher the score, the greater the concern for tax reform
remotes::install_github("JihongZ/ESRM64103")
library(ESRM64103)
library(dplyr)
exp_political_attitude
         party scores
1     Democrat      4
2     Democrat      3
3     Democrat      5
4     Democrat      4
5     Democrat      4
6   Republican      6
7   Republican      5
8   Republican      3
9   Republican      7
10  Republican      4
11  Republican      5
12 Independent      8
13 Independent      9
14 Independent      8
15 Independent      7
16 Independent      8

Descriptive Statistics: summary statistics

  • Standard deviations and variances for each group

  • Grand mean: 5.625

# Grand mean
mean(exp_political_attitude$scores)
[1] 5.625
exp_political_attitude$party <- factor(exp_political_attitude$party, levels = c("Democrat", "Republican", "Independent"))
mean_byGroup <- exp_political_attitude |> 
  group_by(party) |> 
  summarise(Mean = mean(scores),
            SD = round(sd(scores), 2),
            Vars = round(var(scores), 2),
            N = n())
mean_byGroup
# A tibble: 3 × 5
  party        Mean    SD  Vars     N
  <fct>       <dbl> <dbl> <dbl> <int>
1 Democrat        4  0.71   0.5     5
2 Republican      5  1.41   2       6
3 Independent     8  0.71   0.5     5

Descriptive Statistics: bar chart of group means

library(ggplot2)
ggplot(data = mean_byGroup) +
  geom_bar(mapping = aes(x = party, y = Mean, fill = party), stat = "identity", width = .5) +
  geom_label(aes(x = party, y = Mean, label = Mean), nudge_y = .3) +
  labs(title = "Attitudes Toward Tax Reform") +
  theme(text = element_text(size = 15))

Analysis Steps

  1. State the null hypothesis and alternative hypothesis:

    • \(H_0\): \(\mu_{dem} = \mu_{rep} = \mu_{ind}\) (the population means are equal)
    • \(H_A\): At least two group means are significantly different
    • Question: Why not test \(\sigma_{dem} = \sigma_{rep} = \sigma_{ind}\)?
    • Answer: You can; that is a test of homogeneity of variances.
  2. Set the significance level alpha = 0.05

  3. Quick Review of F-statistics:

    \[ F_{obs} = \frac{SS_b/df_b}{SS_w/df_w} \]

    • \(df_b\) = 3 (groups) - 1 = 2, \(df_w\) = 16 (samples) - 3 (groups) =13
    • \(SS_b\) = \(\Sigma n_j(\bar{Y}_j - \bar{Y})^2\) = 43.75; where \(n_j\) is the sample size of group j, \(\bar{Y}_j\) is the group mean, and \(\bar{Y}\) is the grand mean.
    • \(SS_w\) = \(\Sigma_{j=1}^{3} \Sigma_{i=1}^{n_j}(Y_{ij}-\bar{Y}_j)^2\) = 14.00; where \(Y_{ij}\) is individual i’s score in group j
    GrandMean <- mean(exp_political_attitude$scores)
    ## Between-group Sum of Squares
    sum(mean_byGroup$N * (mean_byGroup$Mean - GrandMean)^2)
    ## Within-group Sum of Squares
    SSw_dt <- exp_political_attitude |> 
      group_by(party) |> 
      mutate(GroupMean = mean(scores),
             Diff_sq = (scores - GroupMean)^2) 
    sum(SSw_dt$Diff_sq)
    • F_critical (df_num = 2, df_denom = 13) = 3.81
    • F_observed = 20.31
    mod1 <- lm(scores ~ party, data = exp_political_attitude)
    anova(mod1)
    Analysis of Variance Table
    
    Response: scores
              Df Sum Sq Mean Sq F value    Pr(>F)    
    party      2  43.75 21.8750  20.312 9.994e-05 ***
    Residuals 13  14.00  1.0769                      
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results show rejection of H₀ (F_obs > F_critical)
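
Both decision quantities can be verified directly with base R's F-distribution functions, qf() for the critical value and pf() for the p-value:

alpha <- 0.05
# Critical value: the F value cutting off the upper alpha tail of F(2, 13)
qf(1 - alpha, df1 = 2, df2 = 13)  # 3.81, as reported above

# p-value: probability under H0 of an F at least as large as F_obs
pf(20.312, df1 = 2, df2 = 13, lower.tail = FALSE)  # ~9.99e-05, matching Pr(>F) in the ANOVA table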

Step 1: State the null hypothesis and alternative hypothesis

  1. Formulate the null hypothesis (𝐻₀) and the alternative hypothesis (𝐻ₐ)
    • Prior to any statistical tests, start with a working hypothesis based on an initial guess about the phenomenon.
    • Example: Investigating whether diet groups affect weight loss.
      • Research question: “Is there a difference in weight loss among different diet groups?”
      • Hypothesis: “Different diet groups will show varying weight loss.”
    • Operational Definitions:
      • Null hypothesis (𝐻₀): No observed difference or effect (“Something is something”).
        • Group A’s mean - Group B’s mean = 0
      • Alternative hypothesis (𝐻ₐ): Noticeable difference or effect, contrary to 𝐻₀. (“Something is not something”)
    • The adequacy of the data will dictate whether 𝐻₀ can be confidently rejected.

Step 2: Rejection region

The F-statistic has two degrees of freedom. Below is the density of the F-distribution with numerator and denominator degrees of freedom 2 and 13.

# Set degrees of freedom for the numerator and denominator
num_df <- 2  # Change this as per your specification
den_df <- 13  # Change this as per your specification

# Generate a sequence of F values
f_values <- seq(0, 8, length.out = 1000)

# Calculate the density of the F-distribution
f_density <- df(f_values, df1 = num_df, df2 = den_df)

# Create a data frame for plotting
data_to_plot <- data.frame(F_Values = f_values, Density = f_density)
data_to_plot$Reject05 <- data_to_plot$F_Values > 3.81
data_to_plot$Reject01 <- data_to_plot$F_Values > 6.70
# Plot the density using ggplot2
ggplot(data_to_plot) +
  geom_area(aes(x = F_Values, y = Density), fill = "grey", 
            data = filter(data_to_plot, !Reject05)) + # Draw the line
  geom_area(aes(x = F_Values, y = Density), fill = "yellow", 
            data = filter(data_to_plot, Reject05)) + # Draw the line
  geom_area(aes(x = F_Values, y = Density), fill = "tomato", 
            data = filter(data_to_plot, Reject01)) + # Draw the line
  geom_vline(xintercept = 3.81, linetype = "dashed", color = "red") +
  geom_label(label = "F_crit = 3.81 (alpha = .05)", x = 3.81, y = .5, color = "red") +
  geom_vline(xintercept = 6.70, linetype = "dashed", color = "royalblue") +
  geom_label(label = "F_crit = 6.70 (alpha = .01)", x = 6.70, y = .5, color = "royalblue") +
  ggtitle("Density of F-Distribution") +
  xlab("F values") +
  ylab("Density") +
  theme_classic()
  1. Set the alpha \(\alpha\) (i.e., the type I error rate), which defines the rejection region against which the p-value is compared

    • Alpha determines several quantities for the statistical hypothesis test: the critical value of the test statistic, the rejection region, etc.

    • Large sample sizes call for a lower alpha level (.01 or .001), i.e., a more restrictive rejection rate

  2. When we conduct a hypothesis test, four cases can occur:

Type I & II Error

| Decision | Reality: True \(H_0\) | Reality: False \(H_0\) |
|---|---|---|
| Fail to reject \(H_0\) | Correct decision | Error made: Type II error (\(\beta\)) |
| Reject \(H_0\) | Error made: Type I error (\(\alpha\)) | Correct decision (power) |
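
A small simulation can make the Type I error rate concrete. The sketch below (group sizes and replication count are arbitrary choices) generates many data sets in which \(H_0\) is true; the one-way ANOVA should wrongly reject about 5% of the time at α = .05.

set.seed(42)
# Simulate 2,000 data sets where H0 is TRUE: three groups from one population
p_values <- replicate(2000, {
  dat <- data.frame(group = factor(rep(c("A", "B", "C"), each = 5)),
                    y = rnorm(15))  # no true group differences
  anova(lm(y ~ group, data = dat))$`Pr(>F)`[1]
})

# Proportion of (false) rejections; should be close to alpha = .05
mean(p_values < 0.05)
hist(p_values)  # p-values are roughly uniform when H0 is true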

Step 3: Compute the test statistics

  • Investigate where the variability of the outcome comes from.

    • In this study, do people’s attitude scores differ because of political parties?

    • Imagine we have two factors, A and B; the variability of the outcome can then be partitioned into pieces attributable to A, to B, and to random error. [Figure omitted: partition of outcome variability]

F-statistics

  • Core idea of the F-statistic: compare the variance between groups with the variance within groups to determine whether the group means differ significantly from each other.

  • Logic: if the between-group variance (due to systematic differences caused by the independent variable) is significantly greater than the within-group variance (attributable to random error), the observed differences between group means are likely not due to chance.

  • F-statistics under 1-way ANOVA:

    \[ F_{obs} = \frac{SS_b/df_b}{SS_w/df_w}\]

    • \(df_b\) = 3 (groups) - 1 = 2, \(df_w\) = 16 (samples) - 3 (groups) =13
    • \(SS_b\) = \(\Sigma n_j(\bar{Y}_j - \bar{Y})^2\) = 43.75;
      • the variability in the differences between groups (weighted by the sample size of the group)
    • \(SS_w\) = \(\Sigma_{j=1}^{3} \Sigma_{i=1}^{n_j}(Y_{ij}-\bar{Y}_j)^2\) = 14.00; where \(Y_{ij}\) is individual i’s score in group j
      • Random error within groups: people differ in attitudes for unknown reasons
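
Plugging these pieces into the formula reproduces the observed F:

# F_obs = (SS_b / df_b) / (SS_w / df_w)
SSb <- 43.75; df_b <- 2
SSw <- 14.00; df_w <- 13
(SSb / df_b) / (SSw / df_w)  # 20.31, matching the ANOVA table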

Step 4: Conduct a hypothesis testing

  • In addition to comparing the critical value with the observed value of the test statistic, we can also compare the alpha with the p-value:

  • We determine F_crit by setting the α value.
    • α = (acceptable) type I error rate = the probability that we wrongly reject \(H_0\) when \(H_0\) is true
  • From the data, we obtain F_obs with its p-value.
    • p-value = the probability, if \(H_0\) were true, of obtaining an F-statistic larger than F_obs
  • If the F-statistic from the data (F_obs) is larger than the F critical value, you are in the rejection region: reject \(H_0\) and accept \(H_A\) with (1-α) level of confidence.
  • If the p-value obtained from the ANOVA is less than α, reject \(H_0\) and accept \(H_A\) with (1-α) level of confidence.

Step 5: Results Report

A one-way ANOVA was conducted to compare the level of concern for tax reform among three political groups: Democrats, Republicans, and Independents. There was a significant effect of political affiliation on tax reform concern at the p < .001 level for the three conditions [F(2, 13) = 20.31, p < .001]. This result indicates significant differences in the attitudes toward tax reform among the groups.

Note: Relationship Between P-values and Type I Error

  1. p-value: the probability of observing data as extreme as, or more extreme than, the observed data, under the assumption that the null hypothesis is true.
    • The lower the p-value, the less likely we are to see the observed data given that the null hypothesis is true
    • Question: Given that we already have the observed data, does a lower p-value mean the null hypothesis is unlikely to be true, which is our goal in inferential statistics?
    • Answer: No. p(observed data | \(H_0\) is true) is not equal to p(\(H_0\) is true | observed data). P-values are often misconstrued as the probability that the null hypothesis is true given the observed data, but this interpretation is incorrect.
  2. Type I error, also known as a “false positive,” occurs when the null hypothesis is incorrectly rejected when it is, in fact, true.
  3. The alpha level set before conducting a test (commonly α = 0.05) essentially defines the cut-off point for the p-value below which the null hypothesis will be rejected.
    • A p-value that is less than the alpha level suggests a low probability that the observed data would occur if the null hypothesis were true. Consequently, rejecting the null hypothesis in this context implies that there is a statistically significant difference likely not due to random chance.

Note: Limitations of p-values

Relying solely on p-values to reject the null hypothesis can be problematic for several reasons:

  • Binary Decision Making: The use of a threshold (e.g., α = 0.05) to determine whether to reject the null hypothesis reduces the complexity of the data and the underlying phenomena to a binary decision. This can oversimplify the interpretation and overlook the nuances in the data.

    • Remedies: report confidence intervals, or adopt Bayesian statistics and report the full posterior distribution.
  • Neglect of Effect Size: P-values do not convey the size or importance of an effect. A very small effect can produce a small p-value if the sample size is large enough, leading to a rejection of the null hypothesis even when the effect may not be practically significant.

    • Remedy: also report effect sizes, which are independent of sample size.
  • Probability of Extremes Under the Null: Since p-values quantify the extremeness of the observed data under the null hypothesis, they do not address whether similarly extreme data could also occur under alternative hypotheses. This can lead to an overemphasis on the null hypothesis and potentially disregard other plausible explanations for the data.

    • Remedies: explore theory, look for alternative explanations, and try varied models.

2 Example 2: the Effect of Sleep on Academic Performance (Simulation)

Background

  • A study investigates the effect of different sleep durations on the academic performance of university students. Three groups are defined based on nightly sleep duration: Less than 6 hours, 6 to 8 hours, and more than 8 hours.

  • We can simulate the data

# Set seed for reproducibility
set.seed(42)

# Generate data for three sleep groups
less_than_6_hours <- rnorm(30, mean = 65, sd = 10)
six_to_eight_hours <- rnorm(50, mean = 75, sd = 8)
more_than_8_hours <- rnorm(20, mean = 78, sd = 7)

# Combine data into a single data frame
sleep_data <- data.frame(
  Sleep_Group = factor(c(rep("<6 hours", 30), rep("6-8 hours", 50), rep(">8 hours", 20))),
  Exam_Score = c(less_than_6_hours, six_to_eight_hours, more_than_8_hours)
)

# View the first few rows of the dataset
head(sleep_data)

Descriptive Statistics

  • Groups:

    • Less than 6 hours: 30 students

    • 6 to 8 hours: 50 students

    • More than 8 hours: 20 students

  • Performance Metric: Average exam scores out of 100.

    • Less than 6 hours: Mean = 65, SD = 10

    • 6 to 8 hours: Mean = 75, SD = 8

    • More than 8 hours: Mean = 78, SD = 7
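
These targets can be checked against the simulated data; the simulated group means and SDs will be close to, but not exactly, the generating values:

# Quick check of the simulated groups (dplyr loaded earlier)
sleep_data |> 
  group_by(Sleep_Group) |> 
  summarise(N = n(),
            Mean = round(mean(Exam_Score), 2),
            SD = round(sd(Exam_Score), 2))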

Your turn:

F-test

  • Analysis: A one-way ANOVA was conducted to compare the average exam scores among the three groups.

  • Results: F_observed = XX.XX, p = 0.001
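
As in Example 1, the test can be run with lm() and anova(); fill in F_observed from the output:

mod2 <- lm(Exam_Score ~ Sleep_Group, data = sleep_data)
anova(mod2)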

Interpretation

  • Alpha Level: α = 0.05

  • P-value Interpretation: The p-value (0.001) is less than the alpha level (0.05), indicating a statistically significant difference in exam scores among the different sleep groups.

  • Conclusion: The results suggest that the amount of sleep significantly affects academic performance, with students getting 6 hours or more of sleep performing better on average than those with less sleep. The findings highlight the importance of adequate sleep among students for optimal academic outcomes.

Homework 1

Due on 02/03 at 5 PM.