Lecture 01
Course and Bayesian Analysis Introduction
Jihong Zhang
Educational Statistics and Research Methods
Today’s Lecture Objectives
- Introduction to Bayesian Analysis
but, before we begin…
Background: What is Bayesian?
Where does the word “Bayesian” come from?
The term “Bayesian” is named after Thomas Bayes (1701-1761), an English statistician. He developed a mathematical theorem that describes how to update probabilities based on new evidence. His work was published posthumously in 1763 and has since become the foundation of Bayesian statistics.
- How to Pronounce Bayesian (Real Life Examples!)
- “B-Asian” or “Bayes-ian”?
- What does “Bayesian” mean? Assign probabilities to everything!
- Frequentist vs Bayesian
- Example: In the U.S., are there more male or female Asian faculty?
- Frequentist: If “Male:Female = 1:1 out of Asian Faculty” is fixed, how is the probability that the data happens?
- Bayesian: If I believe Male:Female = 1:2 but the data says 1:1, to what degree do I need to update my belief?
Bayesian Model Components
Simple Definition: Bayesian statistics is a method that combines what we already know (prior knowledge) with new evidence (data) to update our beliefs about uncertain things. Everything is treated as having some level of uncertainty that can be expressed as a probability.
- What we see: Observed Data (the data you actually collect)
- What we cannot see: Future Data, Data yet to be collected, Parameters, Population Distribution
Take home Note: Everything is random in Bayesian
- Observed Data: Some random information given a unknown generation process
- Parameter: The random components controlling the generation process
Take home Note: Every random component can be expressed as probability!
- The probability of observed data given parameters: \(p(\text{Observed Data}|\text{Parameters})\) = Data Likelihood
- The probability of parameters: \(p(\text{Parameters})\) = Prior Distribution
- The probability of parameters given observed data: \(p(\text{Parameters}|\text{Observed Data})\) = Posterior Distribution*
Bayesian Logic
A toy example:
The ratio of male Asian faculty to female Asian faculty is about 1:2 (Prior). We obtained information from 2000 doctoral candidates suggesting the gender ratio is 1:1 (Data; Likelihood given the true ratio we don’t know). Based on Bayes’ rule, the estimate for the ratio is probably 1:1.5 (Posterior) after combining these two statements.
Bayesian Analysis: Why It Is Used?
There are at least four main reasons why people use Bayesian Analysis:
- Missing data
- Multiple imputation (fill in missing data points with external information)
- More complicated model for certain types of missing data
- Lack of software capable of handling large-sized analyses
- Have a zero-inflated Poisson model with 1000 observations and 1000 parameters? No problem in Bayesian!
Bayesian Analysis: Why It Is Used? (Cont.)
- New complex models not available in frequentist framework
- Have a new model? (A model that estimates the probability students choose the right answers then choose the wrong answers in a multiple choice test?)
- Enjoy the Bayesian thinking process
- It is a way of thinking that everything is random and everything can be expressed as probability. It is a way of thinking that we can update our belief as we collect more data. It is a way of thinking that we can use our prior knowledge to help us understand the data.
Bayesian Analysis: Why It Is Used? (Cont.)
I created a BayesNet model for 30 survey items. It simulates the complete item relationships, which can still be estimated using Bayesian analysis.
Bayesian Analysis: Issues
- Subjective vs. Objective
- Prior distribution is subjective. It is based on your prior knowledge.
- However, 1) Scientific judgement is always subjective 2) you can use objective prior distribution to avoid this issue.
- Computationally Intensive
- It is not a problem anymore. We have computers.
- But we still need weeks or months to get results for some complex models and large datasets
- Difficult to understand
Bayesian Analysis is popular
![]()
Funding available only for NIH, CDC, FDA, AHRQ, and ACF 2020 Spring. Source: https://report.nih.gov/
What topics Bayesian Analysis can cover?
![]()
Funding available only for NIH, CDC, FDA, AHRQ, and ACF 2020 Spring. Source: https://report.nih.gov/
Case Study: Medical Test Example
The Scenario
Suppose you are a doctor and a patient comes in for a rare disease screening:
Question: If a patient tests positive, what is the probability they actually have the disease?
Case Study: Understanding the Test
Confusion Matrix (Test Characteristics)
| Has Disease |
True Positive = 95% |
False Negative = 5% |
|
(Correctly detected) |
(Missed cases) |
| No Disease |
False Positive = 10% |
True Negative = 90% |
|
(False alarm) |
(Correctly ruled out) |
Key Terms:
- Sensitivity = True Positive Rate = 95%
- Specificity = True Negative Rate = 90%
Case Study: Setting Up the Problem
What We Know (Define Our Probabilities):
Prior Distribution: \(P(\text{Disease}) = 0.01\) and \(P(\text{No Disease}) = 0.99\)
Likelihood: The probability of test results given disease status
- \(P(\text{Positive Test}|\text{Disease}) = 0.95\) (true positive)
- \(P(\text{Positive Test}|\text{No Disease}) = 0.10\) (false positive)
What We Want: Posterior Distribution
- \(P(\text{Disease}|\text{Positive Test}) = ?\)
Question: Can you guess how large is this probability?
Case Study: Applying Bayes’ Rule
\[P(\text{Disease}|\text{Positive Test}) = \frac{P(\text{Positive Test}|\text{Disease}) \times P(\text{Disease})}{P(\text{Positive Test})}\]
We need to find \(P(\text{Positive Test})\) first (total probability of testing positive):
\[P(\text{Positive Test}) = P(\text{Positive}|\text{Disease}) \times P(\text{Disease})\] \[+ P(\text{Positive}|\text{No Disease}) \times P(\text{No Disease})\]
\[= 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085\]
Case Study: The Calculation
Now we can calculate the posterior:
\[P(\text{Disease}|\text{Positive Test}) = \frac{0.95 \times 0.01}{0.1085} = \frac{0.0095}{0.1085} \approx 0.0876\]
Result: Only about 8.76% chance the patient actually has the disease!
Case Study: Interpretation
Why Is the Probability So Low?
- Prior matters: The disease is rare (1%), so most people don’t have it
- False positives: 10% false positive rate means many healthy people test positive
- Bayesian updating: We updated our belief from 1% (prior) to 8.76% (posterior)
Key Insight:
- Out of 10,000 people:
- 100 have the disease (1%); 95 test positive ✓
- 9,900 are healthy (99%); 990 test positive (false alarm!)
- Total positive tests: 95 + 990 = 1,085
- Probability of disease given positive: \(\frac{95}{1,085} \approx 8.76\%\)
Case Study: What If We Test Again?
Second Test After First Positive:
New Prior: Our posterior becomes the new prior = P(Disease)’ = 0.0876 (8.76%)
If the second test is also positive:
\[P(\text{Disease}|\text{2nd Positive}) = \frac{0.95 \times 0.0876}{0.95 \times 0.0876 + 0.10 \times 0.9124}\]
\[= \frac{0.0832}{0.0832 + 0.0912} \approx 0.477\]
Result: Now about 47.7% chance of the individual actually having disease! This is sequential Bayesian updating.
Case Study: Take-Home Messages
- Prior information is powerful: The 1% base rate strongly influences the result
- Bayesian updating is iterative: We can keep updating as new data arrives
- Counterintuitive results: A “95% accurate” test doesn’t mean 95% chance of disease
- Everything is probability: Disease status, test results, all expressed as probabilities
This is the essence of Bayesian thinking: combining prior knowledge with data to update beliefs!
Wrapping Up
- We understand what “Bayesian” means and its components.
- How Bayesian estimation differs from other estimation methods, such as maximum likelihood
- We also know that Bayesian analysis is popular in many fields, especially for complex data
Next Class
We will talk about how Bayesian methods works in a little bit more technical way.
Suggestions
Your opinions are very important to me. Feel free to let me know if you have any suggestions on the course.