Outcomes can take many forms: continuous variables (e.g., time, blood pressure, height), categorical variables (e.g., Likert-type responses, ordered categories, nominal categories), or combinations of continuous and categorical (e.g., either 0 or some other continuous number)
1.3 The Goal of Generalized Models
Generalized models map the substantive theory onto the sample space of the observed outcomes
Sample space = the type/range of outcome values that are possible
The general idea is that the statistical model will not approximate the outcome well if the assumed distribution is not a good fit to the sample space of the outcome
If the model does not fit the outcome, the findings cannot be trusted
The key to making all of this work is the use of differing statistical distributions for the outcome
Generalized models allow for different distributions for outcomes
The mean of the distribution is still modeled by the model for the means (the fixed effects)
The variance of the distribution may or may not be modeled (some distributions don’t have variance terms)
1.4 What kind of outcome? Generalized vs. General
Generalized Linear Models → General Linear Models whose residuals follow some non-normal distribution and in which a link-transformed Y is predicted instead of Y on its original scale
Many kinds of non-normally distributed outcomes have some kind of generalized linear model to go with them:
Binary (dichotomous)
Unordered categorical (nominal)
Ordered categorical (ordinal)
Counts (discrete, positive values)
Censored (piled up and cut off at one end – left or right)
Zero-inflated (pile of 0’s, then some distribution after)
Continuous but skewed data (pile on one end, long tail)
(Note: the two categorical cases above – nominal and ordinal – are both, inconsistently, called “multinomial”)
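As a quick orientation (not from the original slides), here is a sketch of common R modeling functions that pair with each outcome type; the package choices below are illustrative options, not the only ones:

```r
# Illustrative R pairings for each outcome type (one option among several each)
# Binary:            glm(y ~ x, family = binomial(link = "logit"))
# Nominal:           nnet::multinom(y ~ x)
# Ordinal:           ordinal::clm(y ~ x)  or  MASS::polr(y ~ x)
# Counts:            glm(y ~ x, family = poisson(link = "log"))
# Censored:          survival::survreg(survival::Surv(y, event) ~ x)
# Zero-inflated:     pscl::zeroinfl(y ~ x)
# Skewed continuous: glm(y ~ x, family = Gamma(link = "log"))
```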
1.5 Common distributions and canonical link functions (from Wikipedia)
| Distribution | Support of distribution | Typical uses | Link name | Link function \mathbf{X}\boldsymbol{\beta} = g(\mu) | Mean function |
|---|---|---|---|---|---|
| Normal | real: (-\infty, +\infty) | Linear-response data | Identity | \mathbf{X}\boldsymbol{\beta} = \mu | \mu = \mathbf{X}\boldsymbol{\beta} |
| Exponential / Gamma | real: (0, +\infty) | Exponential-response data, scale parameters | Negative inverse | \mathbf{X}\boldsymbol{\beta} = -\mu^{-1} | \mu = -(\mathbf{X}\boldsymbol{\beta})^{-1} |
| Inverse Gaussian | real: (0, +\infty) | | Inverse squared | \mathbf{X}\boldsymbol{\beta} = \mu^{-2} | \mu = (\mathbf{X}\boldsymbol{\beta})^{-1/2} |
| Poisson | integer: 0, 1, 2, \ldots | Count of occurrences in a fixed amount of time/space | Log | \mathbf{X}\boldsymbol{\beta} = \ln(\mu) | \mu = \exp(\mathbf{X}\boldsymbol{\beta}) |
| Bernoulli | integer: \{0, 1\} | Outcome of a single yes/no occurrence | Logit | \mathbf{X}\boldsymbol{\beta} = \ln\left(\frac{\mu}{1-\mu}\right) | \mu = \frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1+\exp(\mathbf{X}\boldsymbol{\beta})} |
| Binomial | integer: 0, 1, \ldots, N | Count of “yes” occurrences out of N yes/no occurrences | Logit | \mathbf{X}\boldsymbol{\beta} = \ln\left(\frac{\mu}{n-\mu}\right) | (as above) |
| Categorical | integer: [0, K), or K-vector of 0/1 with exactly one 1 | Outcome of a single K-way occurrence | Logit | \mathbf{X}\boldsymbol{\beta} = \ln\left(\frac{\mu}{1-\mu}\right) | (as above) |
| Multinomial | K-vector of integers in [0, N] | Count of occurrences of each of K types out of N total K-way occurrences | Logit | (as above) | (as above) |
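R’s family objects expose canonical link/inverse pairs like these directly; a small base R sketch (note one difference from the table: R’s default Gamma link is the inverse, 1/\mu, rather than the negative inverse):

```r
# Canonical link g(.) and inverse g^{-1}(.) from base R family objects
binomial()$linkfun(0.7)    # logit link: log(0.7 / 0.3) = 0.847
binomial()$linkinv(0.847)  # inverse logit: back to ~0.7
poisson()$linkfun(5)       # log link: log(5)
gaussian()$linkfun(5)      # identity link: returns 5 unchanged
```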
1.6 Three Parts of a Generalized Linear Model
Link Function g(\cdot) (the main difference from the GLM):
How a non-normal outcome gets transformed into something more continuous (unbounded) that we can predict
For outcomes that are already normal, general linear models are just a special case with an “identity” link function
Model for the Means (“Structural Model”):
How the predictors relate linearly to the link-transformed outcome (equivalently, non-linearly to the outcome on its original scale): \hat{Y}_p = g^{-1}(\beta_0 + \beta_1 X_p + \beta_2 Z_p + \beta_3 X_p Z_p)
Model for the Variance (“Sampling/Stochastic Model”):
If the errors aren’t normally distributed, then what are they?
We have a family of alternative distributions at our disposal that map onto what the distribution of errors could possibly look like
In logistic regression, you often hear sayings like “no error term exists” or “the error term has a binomial distribution”
1.7 Link Functions: How Generalized Models Work
Generalized models work by providing a mapping of the theoretical portion of the model (the right hand side of the equation) to the sample space of the outcome (the left hand side of the equation)
The mapping is done by a feature called a link function
The link function is a non-linear function that takes the linear model predictors, random/latent terms, and constants and puts them onto the space of the outcome observed variables
Link functions are typically expressed for the mean of the outcome variable (we will only focus on that)
In generalized models, the error variance is often a function of the mean, so there is no additional information to be gained by estimating a separate error term
1.8 Link Functions in Practice
The link function expresses the conditional value of the mean of the outcome
E(Y_p) = \hat{Y}_p = \mu_y
where E(\cdot) stands for expectation.
… through a non-linear link function g(\hat Y_p) when used on the conditional mean of the outcome, or its inverse link function g^{-1}(\mathbf{X}\boldsymbol{\beta}) when used on the linear combination of predictors
The general form is:
E(Y_p) = \hat{Y}_p = \mu_y = g^{-1}(\beta_0+\beta_1X_p+\beta_2Z_p+\beta_3X_pZ_p)
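As a concrete sketch of a link/inverse-link pair (using the logit link that appears later in this lecture), base R’s qlogis() and plogis() implement g(\cdot) and g^{-1}(\cdot); the coefficient values below are made up for illustration:

```r
# Logit link and its inverse in base R
p <- 0.7
qlogis(p)                    # g(0.7) = log(0.7/0.3) = 0.847: an unbounded value
eta <- -0.338 + 0.548 * 1    # a hypothetical linear predictor beta0 + beta1 * X
plogis(eta)                  # g^{-1}(eta): maps any real number back into (0, 1)
```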
2.6 Three Problems with the GLM for Binary Outcomes
Assumption Violation problem: the GLM assumes a continuous, conditionally normal outcome, but with a binary outcome the residuals cannot be normally distributed
Restricted Range problem (e.g., 0 to 1 for outcomes)
Predictors should not be linearly related to observed outcome
Effects of predictors need to be ‘shut off’ at some point to keep predicted values of binary outcome within range
Decision Making problem: the GLM yields a continuous predicted value on the same scale as the outcome (Y), so how do we answer questions such as whether or not a student will apply given a certain GPA?
With a generalized linear model, we can instead say that a student has, for example, a 50% probability of applying (see the sketch below)
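A minimal sketch of the restricted-range problem using the lecture’s dataLogit data (from the ESRM64503 package), assuming LLAPPLY is still coded as numeric 0/1 (before the factor conversion used later): predictions from lm() can escape [0, 1], while glm() with a logit link cannot.

```r
library(ESRM64503)
# Linear probability model vs. logistic regression on the same data
lm_fit  <- lm(LLAPPLY ~ GPA, data = dataLogit)   # assumes 0/1 numeric coding
glm_fit <- glm(LLAPPLY ~ GPA, data = dataLogit, family = binomial)
new_gpa <- data.frame(GPA = c(0, 2, 4, 8))       # GPA = 8 is deliberately extreme
predict(lm_fit, newdata = new_gpa)                     # can fall outside [0, 1]
predict(glm_fit, newdata = new_gpa, type = "response") # always stays within (0, 1)
```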
2.7 The Binary Outcome: Bernoulli Distribution
The Bernoulli distribution has the following properties:
Notation: Y_p \sim B(\boldsymbol{p}_p) (where \boldsymbol{p}_p is the conditional probability of a 1 for person p)
Sample Space: Y_p \in \{0, 1\} (Y_p can either be a 0 or a 1)
Probability Density Function (PDF): f(Y_p) = (\boldsymbol{p}_p)^{Y_p}(1-\boldsymbol{p}_p)^{1-Y_p}
Expected value (mean) of Y: E(Y_p) = \mu_{Y_p} = \boldsymbol{p}_p
Variance of Y: Var(Y_p) = \sigma^2_{Y_p} = \boldsymbol{p}_p(1-\boldsymbol{p}_p)
\boldsymbol{p}_p is the only parameter – so we only need to provide a link function for it
The logit link is defined by g(\mathbb{E}(Y)) = \log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \mathbf{X}\boldsymbol{\beta} (Equation 1), and a logit can be translated back to a probability with some algebra: P(Y=1) = \frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1+\exp(\mathbf{X}\boldsymbol{\beta})} (Equation 2). From Equations 1 and 2, we can see that g(\mathbb{E}(Y)) has a range of (-\infty, +\infty), while P(Y = 1) has a range of [0, 1].
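A quick simulation (not in the original slides) confirming the Bernoulli mean/variance relationship and the range of the logit; the seed is arbitrary:

```r
set.seed(64503)                  # arbitrary seed for reproducibility
p <- 0.3
y <- rbinom(n = 1e5, size = 1, prob = p)
mean(y)                          # close to p = .30
var(y)                           # close to p * (1 - p) = .21
qlogis(c(0.001, 0.5, 0.999))     # probabilities near 0/1 map to large +/- logits
```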
2.10 Interpretation of Coefficients
```r
# function to translate OR to probability
OR_to_Prob <- function(OR) {
  p <- OR / (1 + OR)
  return(p)
}

# function to translate logit to probability
Logit_to_Prob <- function(Logit) {
  OR <- exp(Logit)
  p <- OR_to_Prob(OR)
  return(p)
}

Logit_to_Prob(Logit = -2.007) # p = .118
```

[1] 0.1184699

```r
exp(-2.007)
```

[1] 0.1343912
Intercept \beta_0:
Logit: \beta_0 is the predicted logit of Y = 1 for an individual when all predictors are zero; i.e., the average logit is -2.007
Probability: Alternatively, the expected probability of Y = 1 is \frac{\exp(\beta_0)}{1+\exp(\beta_0)} when all predictors are zero; i.e., the average probability of applying to grad school is 0.1184699
Odds Ratio: Alternatively, the expected odds (ratio) of Y = 1 are \exp(\text{Logit}) when all predictors are zero; the average odds of applying to grad school are \exp(-2.007) = 0.13439
Slope \beta_1:
Logit: \beta_1 is the predicted increase in the logit of Y = 1 for a one-unit increase in X
Probability: the expected increase in the probability of Y = 1 is \frac{\exp(\beta_0+\beta_1)}{1+\exp(\beta_0+\beta_1)}-\frac{\exp(\beta_0)}{1+\exp(\beta_0)} for a one-unit increase in X; note that this increase (\Delta(\beta_0, \beta_1)) is non-linear and depends on the value of X
Odds Ratio: the expected odds of Y = 1 are \exp(\beta_1) times larger for a one-unit increase in X (hint: the new odds are \exp(\beta_0+\beta_1) = \exp(\beta_0)\exp(\beta_1) when X = 1 and the old odds are \exp(\beta_0) when X = 0)
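To see the non-linearity of the probability change numerically, here is a small sketch using the intercept from the output above (-2.007) and a hypothetical slope of 0.5:

```r
beta0 <- -2.007   # intercept from the output above
beta1 <- 0.5      # hypothetical slope, for illustration only
x <- 0:4
p <- plogis(beta0 + beta1 * x)  # predicted probability at each value of X
round(p, 3)
round(diff(p), 3)               # the per-unit probability change varies with X
exp(beta1)                      # while the odds ratio is a constant 1.65
```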
2.11 Example: Fitting The Models
Model 0: the empty model for logistic regression, with the binary variable (applying to grad school) as the outcome
Model 1: the logistic regression model including centered GPA and two binary predictors (parent has a graduate degree; student attends a public university)
2.12 Model 0: The empty model
The statistical form of empty model:
P(Y_p =1) = \frac{\exp(\beta_0)}{1+\exp(\beta_0)}
or
\text{logit}(P(Y_p = 1)) = \beta_0
Takehome Note
Many generalized linear models don’t list an error term in their statistical form. This is because the error has a fixed mean and a fixed variance.
For the logit link, e_p follows a logistic distribution with a mean of zero and a variance of \pi^2/3 = 3.29.
Using the ordinal package and its clm() function, we can model categorical dependent variables
```r
library(ordinal)

# response variable must be a factor  # <1>
dataLogit$LLAPPLY = factor(dataLogit$LLAPPLY, levels = 0:1)

# Empty model: likely to apply  # <2>
model0 = clm(LLAPPLY ~ 1, data = dataLogit, control = clm.control(trace = 1))
```

1. The dependent variable must be stored as a factor
2. The formula and data arguments are identical to lm(); the control = argument is used here only to show the iteration history of the ML algorithm
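Because LLAPPLY is binary, the same empty model can be cross-checked with base R’s glm(); this sketch is not part of the original lecture code. Note that glm() reports \beta_0 directly, whereas clm() reports the threshold \tau_0 = -\beta_0.

```r
# Equivalent empty model with glm(); the intercept should equal minus the clm() threshold
model0_glm <- glm(LLAPPLY ~ 1, data = dataLogit, family = binomial(link = "logit"))
coef(model0_glm)          # beta_0 on the logit scale
plogis(coef(model0_glm))  # implied probability of "likely to apply"
```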
Question #1: Does Model 1 fit better than the empty model (Model 0)? This is equivalent to testing H_0: \beta_1=\beta_2=\beta_3=0 against H_1: at least one slope is not 0, using a likelihood ratio test: -2\Delta = -2(-275.26 - (-264.96)) = 20.59, with DF = 4 (parameters in Model 1) - 1 (parameter in Model 0) = 3.

```r
# anova() can compare the two models
anova(model0, model1)
# Or we can use the chi-square distribution to calculate the p-value directly
as.numeric(pchisq(-2 * (logLik(model0) - logLik(model1)), 3, lower.tail = FALSE))
```

[1] 0.0001282981

Conclusion: reject H_0; we prefer Model 1 (the conditional model) over the empty model
Question #2: Are the effects of GPA, PARED, and PUBLIC significant?
Intercept \beta_0 (SE) = -0.3382 (0.1187):
Logit: the predicted logit of applying to grad school is -0.3382 for a person with a 3.0 GPA, parents without a graduate degree, and attendance at a private university
Odds Ratio: the predicted odds of applying to grad school are 0.7130 for that same person (OR < 1: the probability of applying is less than the probability of not applying)
Probability: the predicted probability of applying to grad school is 41.6% for that same person
Slope of parents having a graduate degree: \beta_1 (SE) = 1.0596 (0.2974) with p < .05
Logit: the predicted logit of applying to grad school increases by 1.0596 for students whose parents have a graduate degree, controlling for the other predictors
Odds Ratio: the predicted odds increase from 0.7130 to 2.05 for students whose parents have a graduate degree, controlling for the other predictors – these students have roughly 3x the odds (\exp(1.0596) = 2.89) of rating the item “likely to apply”
Probability: compared to students without a parent holding a graduate degree, the predicted probability of “likely to apply” increases from .416 to .673 for students with one
Slope of attending a public university: \beta_3 (SE) = -0.2006 (0.3053) with p = .511; students at public universities have 0.82 times the odds of applying (\exp(-0.2006) = 0.818), and the predicted probability decreases from .416 to .369 – a nonsignificant difference
Slope of GPA (centered at 3): \beta_2 (SE) = 0.5482 (0.2724) with p < .05; for every one-unit increase in GPA, the logit of applying to grad school increases by 0.548, the odds are multiplied by \exp(0.548) = 1.73, and the predicted probabilities are 19.2%, 29.2%, 41.6%, and 55.2% for GPA = 1, 2, 3, and 4
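The predicted probabilities across GPA can be reproduced directly from the reported estimates; a minimal sketch:

```r
beta0 <- -0.3382  # Model 1 intercept (GPA centered at 3, PARED = 0, PUBLIC = 0)
beta2 <-  0.5482  # GPA slope
gpa <- 1:4
round(plogis(beta0 + beta2 * (gpa - 3)), 3)  # 0.192 0.292 0.416 0.552
```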
---title: "Lecture 06: Generalized Linear Models (Binary Outcome) and Matrix Algebra"subtitle: "Generalized Linear Models"author: "Jihong Zhang*, Ph.D"institute: | Educational Statistics and Research Methods (ESRM) Program* University of Arkansasdate: "2024-09-24"sidebar: falseexecute: echo: true warning: falseoutput-location: defaultcode-annotations: belowformat: uark-revealjs: scrollable: true chalkboard: true embed-resources: false code-fold: false number-sections: false footer: "ESRM 64503 - Lecture 06: Matrix Algebra" slide-number: c/t tbl-colwidths: auto output-file: slides-index.html html: page-layout: full toc: true toc-depth: 2 toc-expand: true lightbox: true code-fold: false fig-align: centerfilters: - quarto - line-highlight---## Today's Class- Introduction to [Generalized Linear Models]{style="color: tomato; font-weight: bold"} - Expanding your linear models knowledge to models for outcomes that are not conditionally normally distributed - An example of generalized models for binary data using logistic regression- Matrix Algebra# An Introduction to Generalized Linear Models## Categories of Multivariate Models- **Statistical models** can be broadly organized as: - [General]{style="color: tomato;"} (normal outcome) vs. [Generalized]{style="color: royalblue"} (not normal outcome) - [One dimension of sampling]{style="color: turquoise"} (one variance term per outcome) vs. [multiple dimensions of sampling]{style="color: yellowgreen"} (multiple variance terms) - Fixed effects only vs. Mixed effects (fixed and random effects = multilevel)- All models have **fixed effects**, and then: - **General Linear Models (GLM)**: conditionally normal distribution of data, fixed effects and [no]{style="color: tomato; font-weight: bold"} random effects - **General Linear** [Mixed]{style="color: yellowgreen; font-weight: bold"} **Models (GLMM)**: conditionally normal distribution for data, fixed and **random** effects - **General[ized]{style="color: royalblue; font-weight: bold"}** L**inear Models**: any conditional distribution for data, fixed effects through [link functions]{style="color: royalblue; font-weight: bold"}, no random effects - **General[ized]{style="color: royalblue; font-weight: bold"}** **Linear [Mixed]{style="color: yellowgreen; font-weight: bold"} Models**: any conditional distribution for data, fixed effects through [link functions]{style="color: royalblue; font-weight: bold"}, fixed and **random** effects- "Linear" means the fixed effects predict the link-transformed DV in a linear combination of $$ g^{-1}(\beta_0 +\beta_1X_1+ \beta_2X_2 + \cdots) $$## Unpacking the Big Picture```{mermaid}%%| echo: falseflowchart RL A("Model: Substantive Theory") --> |Hypothesized Causal Process|B("Observed Outcomes (any format)")```- **Substantive theory**: what guides your study- **Hypothetical causal process**: what the statistical model is testing (attempting to falsify) when estimated- **Observed outcomes**: what you collect and evaluate based on your theory - Outcomes can take many forms: - Continuous variables (e.g., time, blood pressure, height) - Categorical variables (e.g., Likert-type response, ordered categories, nominal categories) - Combinations of continuous and categorical (e.g., either 0 or some other continuous number)## The Goal of Generalized Models- Generalized models map [[the substantive theory]{.underline}]{style="color: royalblue; font-weight: bold"} [onto the **sample space** of the observed outcomes]{.underline} - **Sample space** = type/range/outcomes that are 
possible- The general idea is that the statistical model will not approximate the outcome well if the assumed distribution is not a good fit to the sample space of the outcome - If model does not fit the outcome, the findings cannot be trusted- The key to making all of this work is the use of differing statistical distributions for the outcome- Generalized models allow for different distributions for outcomes - The mean of the distribution is still modeled by the model for the means (the [fixed]{style="color: tomato; font-weight: bold"} effects) - The variance of the distribution may or may not be modeled (some distributions don’t have variance terms)## What kind of outcome? Generalized vs. General- Generalized Linear Models $\rightarrow$ General Linear Models whose residuals follow some **not-normal distributions** and in which a link [transformed Y]{style="color: yellowgreen; font-weight: bold"} is predicted instead of [the original scale of Y]{style="color: royalblue; font-weight: bold"}- Many kinds of non-normally distributed outcomes have some kind of generalized linear model to go with them: 1. Binary (dichotomous) 2. [Unordered categorical (nominal)]{style="color: tomato"} 3. [Ordered categorical (ordinal)]{style="color: tomato"} [These two are often called "multinomial" inconsistently]{.aside style="color: tomato; font-weight: bold"} 4. Counts (discrete, positive values) 5. Censored (piled up and cut off at one end – left or right) 6. Zero-inflated (pile of 0's, then some distribution after) 7. Continuous but skewed data (pile on one end, long tail)## Common distributions and canonical link functions (from [Wikipedia](https://en.wikipedia.org/wiki/Generalized_linear_model))```{=html}<table class="wikitable" style="background:white;"><tbody><tr><th>Distribution</th><th>Support of distribution</th><th>Typical uses</th><th>Link name</th><th>Link function <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=g(\mu )\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>g</mi> <mo stretchy="false">(</mo> <mi>μ<!-- μ --></mi> <mo stretchy="false">)</mo> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=g(\mu )\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/5e2ebd12256b9e1b8dcdfdd4bd625f37df639ded" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:11.366ex; height:2.843ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=g(\mu )\,\!}"></span></th><th>Mean function</th></tr><tr><td><a href="/wiki/Normal_distribution" title="Normal distribution">Normal</a></td><td>real: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle (-\infty ,+\infty )}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo stretchy="false">(</mo> <mo>−<!-- − --></mo> <mi 
mathvariant="normal">∞<!-- ∞ --></mi> <mo>,</mo> <mo>+</mo> <mi mathvariant="normal">∞<!-- ∞ --></mi> <mo stretchy="false">)</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle (-\infty ,+\infty )}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/e577bfa9ed1c0f83ed643206abae3cd2f234cf9c" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:11.107ex; height:2.843ex;" alt="{\displaystyle (-\infty ,+\infty )}"></span></td><td>Linear-response data</td><td>Identity</td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu \,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>μ<!-- μ --></mi> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu \,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/63238c06f9c1927aee60b40fec3adccd419cf32a" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:8.441ex; height:2.676ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu \,\!}"></span></td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mu =\mathbf {X} {\boldsymbol {\beta }}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mi>μ<!-- μ --></mi> <mo>=</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mu =\mathbf {X} {\boldsymbol {\beta }}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/12c514082234f52d09595635789f474de0279b7d" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:8.441ex; height:2.676ex;" alt="{\displaystyle \mu =\mathbf {X} {\boldsymbol {\beta }}\,\!}"></span></td></tr><tr><td><a href="/wiki/Exponential_distribution" title="Exponential distribution">Exponential</a></td><td rowspan="2">real: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle (0,+\infty )}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo stretchy="false">(</mo> <mn>0</mn> <mo>,</mo> <mo>+</mo> <mi mathvariant="normal">∞<!-- ∞ --></mi> <mo stretchy="false">)</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle (0,+\infty )}</annotation> </semantics></math></span><img 
src="https://wikimedia.org/api/rest_v1/media/math/render/svg/de77e40eb7e2582eef8a5a1da1bc027b7d9a8d6e" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:8.138ex; height:2.843ex;" alt="{\displaystyle (0,+\infty )}"></span></td><td rowspan="2">Exponential-response data, scale parameters</td><td rowspan="2"><a href="/wiki/Multiplicative_inverse" title="Multiplicative inverse">Negative inverse</a></td><td rowspan="2"><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=-\mu ^{-1}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mo>−<!-- − --></mo> <msup> <mi>μ<!-- μ --></mi> <mrow class="MJX-TeXAtom-ORD"> <mo>−<!-- − --></mo> <mn>1</mn> </mrow> </msup> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=-\mu ^{-1}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/f6532ae0a7d9f63020f9a3e4175c391fb1130f99" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:12.582ex; height:3.176ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=-\mu ^{-1}\,\!}"></span></td><td rowspan="2"><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mu =-(\mathbf {X} {\boldsymbol {\beta }})^{-1}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mi>μ<!-- μ --></mi> <mo>=</mo> <mo>−<!-- − --></mo> <mo stretchy="false">(</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <msup> <mo stretchy="false">)</mo> <mrow class="MJX-TeXAtom-ORD"> <mo>−<!-- − --></mo> <mn>1</mn> </mrow> </msup> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mu =-(\mathbf {X} {\boldsymbol {\beta }})^{-1}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/11209fa27eda9b964da5691b83fd3652d59ddcc0" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:14.391ex; height:3.176ex;" alt="{\displaystyle \mu =-(\mathbf {X} {\boldsymbol {\beta }})^{-1}\,\!}"></span></td></tr><tr><td><a href="/wiki/Gamma_distribution" title="Gamma distribution">Gamma</a></td></tr><tr><td><a href="/wiki/Inverse_Gaussian_distribution" title="Inverse Gaussian distribution">Inverse <br>Gaussian</a></td><td>real: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle (0,+\infty )}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo 
stretchy="false">(</mo> <mn>0</mn> <mo>,</mo> <mo>+</mo> <mi mathvariant="normal">∞<!-- ∞ --></mi> <mo stretchy="false">)</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle (0,+\infty )}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/de77e40eb7e2582eef8a5a1da1bc027b7d9a8d6e" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:8.138ex; height:2.843ex;" alt="{\displaystyle (0,+\infty )}"></span></td><td></td><td>Inverse <br>squared</td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu ^{-2}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <msup> <mi>μ<!-- μ --></mi> <mrow class="MJX-TeXAtom-ORD"> <mo>−<!-- − --></mo> <mn>2</mn> </mrow> </msup> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu ^{-2}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/0a3b87590326202b24e85ce5762989fd34bff8c2" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:10.774ex; height:3.176ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\mu ^{-2}\,\!}"></span></td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mu =(\mathbf {X} {\boldsymbol {\beta }})^{-1/2}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mi>μ<!-- μ --></mi> <mo>=</mo> <mo stretchy="false">(</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <msup> <mo stretchy="false">)</mo> <mrow class="MJX-TeXAtom-ORD"> <mo>−<!-- − --></mo> <mn>1</mn> <mrow class="MJX-TeXAtom-ORD"> <mo>/</mo> </mrow> <mn>2</mn> </mrow> </msup> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mu =(\mathbf {X} {\boldsymbol {\beta }})^{-1/2}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/9f2b2781a377e3d9ed78c1b1e026fda1e8895402" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:14.227ex; height:3.343ex;" alt="{\displaystyle \mu =(\mathbf {X} {\boldsymbol {\beta }})^{-1/2}\,\!}"></span></td></tr><tr><td><a href="/wiki/Poisson_distribution" title="Poisson distribution">Poisson</a></td><td>integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle 0,1,2,\ldots }"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> 
<mn>0</mn> <mo>,</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>,</mo> <mo>…<!-- … --></mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle 0,1,2,\ldots }</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/b1da8ed7e74b31b6314f23f122a1198c104fcaad" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.671ex; width:9.312ex; height:2.509ex;" alt="{\displaystyle 0,1,2,\ldots }"></span></td><td>count of occurrences in fixed amount of time/space</td><td><a href="/wiki/Natural_logarithm" title="Natural logarithm">Log</a></td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln(\mu )\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>ln</mi> <mo><!-- --></mo> <mo stretchy="false">(</mo> <mi>μ<!-- μ --></mi> <mo stretchy="false">)</mo> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln(\mu )\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/245ed014e9dd7f9624171201d1a4daecb1c20997" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:12.189ex; height:2.843ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln(\mu )\,\!}"></span></td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mu =\exp(\mathbf {X} {\boldsymbol {\beta }})\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mi>μ<!-- μ --></mi> <mo>=</mo> <mi>exp</mi> <mo><!-- --></mo> <mo stretchy="false">(</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo stretchy="false">)</mo> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mu =\exp(\mathbf {X} {\boldsymbol {\beta }})\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/7fac36b3451b711d49417813988a6e8bb4db5719" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; margin-right: -0.387ex; width:13.803ex; height:2.843ex;" alt="{\displaystyle \mu =\exp(\mathbf {X} {\boldsymbol {\beta }})\,\!}"></span></td></tr><tr><td><a href="/wiki/Bernoulli_distribution" title="Bernoulli distribution">Bernoulli</a></td><td>integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \{0,1\}}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo fence="false" stretchy="false">{</mo> <mn>0</mn> <mo>,</mo> 
<mn>1</mn> <mo fence="false" stretchy="false">}</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \{0,1\}}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/28de5781698336d21c9c560fb1cbb3fb406923eb" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:5.684ex; height:2.843ex;" alt="{\displaystyle \{0,1\}}"></span></td><td>outcome of single yes/no occurrence</td><td rowspan="5"><a href="/wiki/Logit" title="Logit">Logit</a></td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>ln</mi> <mo><!-- --></mo> <mrow> <mo>(</mo> <mrow class="MJX-TeXAtom-ORD"> <mfrac> <mi>μ<!-- μ --></mi> <mrow> <mn>1</mn> <mo>−<!-- − --></mo> <mi>μ<!-- μ --></mi> </mrow> </mfrac> </mrow> <mo>)</mo> </mrow> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/8756b6c8f78882b05820c4058a861002462ef4b4" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -2.505ex; margin-right: -0.387ex; width:18.64ex; height:6.176ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}"></span></td><td rowspan="5"><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mu ={\frac {\exp(\mathbf {X} {\boldsymbol {\beta }})}{1+\exp(\mathbf {X} {\boldsymbol {\beta }})}}={\frac {1}{1+\exp(-\mathbf {X} {\boldsymbol {\beta }})}}\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mi>μ<!-- μ --></mi> <mo>=</mo> <mrow class="MJX-TeXAtom-ORD"> <mfrac> <mrow> <mi>exp</mi> <mo><!-- --></mo> <mo stretchy="false">(</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo stretchy="false">)</mo> </mrow> <mrow> <mn>1</mn> <mo>+</mo> <mi>exp</mi> <mo><!-- --></mo> <mo stretchy="false">(</mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo stretchy="false">)</mo> </mrow> </mfrac> </mrow> <mo>=</mo> <mrow class="MJX-TeXAtom-ORD"> <mfrac> <mn>1</mn> <mrow> <mn>1</mn> <mo>+</mo> <mi>exp</mi> <mo><!-- --></mo> <mo stretchy="false">(</mo> <mo>−<!-- − --></mo> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo stretchy="false">)</mo> </mrow> </mfrac> </mrow> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> 
<annotation encoding="application/x-tex">{\displaystyle \mu ={\frac {\exp(\mathbf {X} {\boldsymbol {\beta }})}{1+\exp(\mathbf {X} {\boldsymbol {\beta }})}}={\frac {1}{1+\exp(-\mathbf {X} {\boldsymbol {\beta }})}}\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/b739082e7ee418a2163685f976c75b4906910158" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -2.671ex; margin-right: -0.387ex; width:37.302ex; height:6.509ex;" alt="{\displaystyle \mu ={\frac {\exp(\mathbf {X} {\boldsymbol {\beta }})}{1+\exp(\mathbf {X} {\boldsymbol {\beta }})}}={\frac {1}{1+\exp(-\mathbf {X} {\boldsymbol {\beta }})}}\,\!}"></span></td></tr><tr><td><a href="/wiki/Binomial_distribution" title="Binomial distribution">Binomial</a></td><td>integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle 0,1,\ldots ,N}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mn>0</mn> <mo>,</mo> <mn>1</mn> <mo>,</mo> <mo>…<!-- … --></mo> <mo>,</mo> <mi>N</mi> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle 0,1,\ldots ,N}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/4f0dabd0eecff746a5377991354a67ea28a4e684" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.671ex; width:10.601ex; height:2.509ex;" alt="{\displaystyle 0,1,\ldots ,N}"></span></td><td>count of # of "yes" occurrences out of N yes/no occurrences</td><td><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{n-\mu }}\right)\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>ln</mi> <mo><!-- --></mo> <mrow> <mo>(</mo> <mrow class="MJX-TeXAtom-ORD"> <mfrac> <mi>μ<!-- μ --></mi> <mrow> <mi>n</mi> <mo>−<!-- − --></mo> <mi>μ<!-- μ --></mi> </mrow> </mfrac> </mrow> <mo>)</mo> </mrow> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{n-\mu }}\right)\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/ecbce4c90689853e5656461e1165f5473d276a44" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -2.505ex; margin-right: -0.387ex; width:18.873ex; height:6.176ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{n-\mu }}\right)\,\!}"></span></td></tr><tr><td rowspan="2"><a href="/wiki/Categorical_distribution" title="Categorical distribution">Categorical</a></td><td>integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle [0,K)}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo 
stretchy="false">[</mo> <mn>0</mn> <mo>,</mo> <mi>K</mi> <mo stretchy="false">)</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle [0,K)}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/aa074207d3bea2e879410172ce89ba2435d37d11" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:5.814ex; height:2.843ex;" alt="{\displaystyle [0,K)}"></span></td><td rowspan="2">outcome of single <i>K</i>-way occurrence</td><td rowspan="3"><span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold">X</mi> </mrow> <mrow class="MJX-TeXAtom-ORD"> <mi mathvariant="bold-italic">β<!-- β --></mi> </mrow> <mo>=</mo> <mi>ln</mi> <mo><!-- --></mo> <mrow> <mo>(</mo> <mrow class="MJX-TeXAtom-ORD"> <mfrac> <mi>μ<!-- μ --></mi> <mrow> <mn>1</mn> <mo>−<!-- − --></mo> <mi>μ<!-- μ --></mi> </mrow> </mfrac> </mrow> <mo>)</mo> </mrow> <mspace width="thinmathspace"></mspace> <mspace width="negativethinmathspace"></mspace> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/8756b6c8f78882b05820c4058a861002462ef4b4" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -2.505ex; margin-right: -0.387ex; width:18.64ex; height:6.176ex;" alt="{\displaystyle \mathbf {X} {\boldsymbol {\beta }}=\ln \left({\frac {\mu }{1-\mu }}\right)\,\!}"></span></td></tr><tr><td><i>K</i>-vector of integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle [0,1]}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo stretchy="false">[</mo> <mn>0</mn> <mo>,</mo> <mn>1</mn> <mo stretchy="false">]</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle [0,1]}</annotation> </semantics></math></span><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/738f7d23bb2d9642bab520020873cccbef49768d" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:4.653ex; height:2.843ex;" alt="{\displaystyle [0,1]}"></span>, where exactly one element in the vector has the value 1</td></tr><tr><td><a href="/wiki/Multinomial_distribution" title="Multinomial distribution">Multinomial</a></td><td><i>K</i>-vector of integer: <span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle [0,N]}"> <semantics> <mrow class="MJX-TeXAtom-ORD"> <mstyle displaystyle="true" scriptlevel="0"> <mo stretchy="false">[</mo> <mn>0</mn> <mo>,</mo> <mi>N</mi> <mo stretchy="false">]</mo> </mstyle> </mrow> <annotation encoding="application/x-tex">{\displaystyle [0,N]}</annotation> </semantics></math></span><img 
src="https://wikimedia.org/api/rest_v1/media/math/render/svg/703d57dca548a7f9d927247c2a27b67666aebdd5" class="mwe-math-fallback-image-inline mw-invert skin-invert" aria-hidden="true" style="vertical-align: -0.838ex; width:5.554ex; height:2.843ex;" alt="{\displaystyle [0,N]}"></span></td><td>count of occurrences of different types (1, ..., <i>K</i>) out of <i>N</i> total <i>K</i>-way occurrences</td></tr></tbody></table>```## Three Parts of a Generalized Linear Model1. Link Function $g(\cdot)$ (main difference from GLM): - How a **non-normal outcome** gets transformed into something we can predict that is more continuous (unbounded) - For outcomes that are already normal, general linear models are just a special case with an "identity" link function (Y \* 1)2. Model for the Means ("Structural Model"): - How predictors linearly related to the link-transformed outcome - Or predictor non-linearly related to the original scale of outcome - New link-transformed $Y_p = \color{royalblue}{g^{-1}} (\color{tomato}{\mathbf{\beta_0}} + \color{tomato}{\mathbf{\beta_1}}X_p + \color{tomato}{\mathbf{\beta_2}} Z_p + \color{tomato}{\mathbf{\beta_3}} X_pZ_p)$3. Model for the Variance ("Sampling/Stochastic Model"): - If the errors aren’t normally distributed, then what are they? - Family of alternative distributions at our disposal that map onto what the distribution of errors could possibly look like - In logistic regression, you often hear sayings like "[no error term exists]{style="color: tomato"}" or "[the error term has a binomial distribution]{style="color: royalblue"}"## Link Functions: How Generalized Models Work1. Generalized models work by providing [a mapping of the theoretical portion of the model]{.underline} (the right hand side of the equation) to the sample space of the outcome (the left hand side of the equation) - The mapping is done by a feature called a link function2. The link function is a non-linear function that takes the linear model predictors, random/latent terms, and constants and puts them onto the space of the outcome observed variables3. Link functions are typically expressed for the mean of the outcome variable (we will only focus on that) - In generalized models, **the error variance is often a function of the mean, so no additive information exists if estimating the error term**## Link Functions in Practice- The link function expresses the conditional value of the mean of the outcome$$E(Y_p) = \hat{Y}_p = \mu_y$$where $E(\cdot)$ stands for expectation.- ... 
through a non-linear **link function** $g(\hat Y_p)$ [when used on conditional mean of outcome]{style="color: royalblue; font-weight: bold"} or its [inverse]{style="color: red; font-weight: bold"} **link function** $g^{-1}(\mathbf{\beta X})$ [when used on linear combination of predictors]{style="color: tomato; font-weight: bold"}- The general form is: $$ E(Y_p) =\hat{Y}_p = \mu_y =\color{royalblue}{g^{-1}}(\color{tomato}{\beta_0+\beta_1X_p+\beta_2Z_p+\beta_3X_pZ_p}) $$- The [**red**]{style="color: tomato; font-weight: bold"} part is the linear combination of predictors and their effects.## Why normal GLM is one type of Generalized Linear Model- Our familiar general linear model is actually a member of the generalized model family (it is subsumed) - The link function is called the identity, the linear predictor is unchanged- The normal distribution has two parameters, a mean $\mu$ and a variance $\sigma^2$ - Unlike most distributions, the normal distribution parameters are directly modeled by the GLM- In conditionally normal GLMs, the inverse link function is called the identity function:$$g^{-1}(\cdot) = \boldsymbol{I}(\cdot) = 1 * (\color{yellowgreen}{\text{linear combination of predictors}})$$- The identity function does not alter the predicted values -- they can be any real number- This matches the sample space of the normal distribution – the mean can be any real number<!-- -->- The expected value of an outcome from the GLM is:$$\begin{align}\mathbb{E}(Y_p) =\hat{Y}_p = \mu_y &=\color{royalblue}{\boldsymbol{I}}(\color{yellowgreen}{\beta_0+\beta_1X_p+\beta_2Z_p+\beta_3X_pZ_p}) \\&= \beta_0+\beta_1X_p+\beta_2Z_p+\beta_3X_pZ_p\end{align}$$## About the Variance of GLM- The other parameter of the normal distribution described the variance of an outcome – called the error (residual) variance- We found that the model for the variance for the GLM was:$$\begin{align}Var(Y_p) &= Var(\beta_0+\beta_1X_p+\beta_2Z_p+\beta_3X_pZ_p + \color{tomato}{e}_p ) \\&= Var(e_p) \\&= \sigma^2_e\end{align}$$- Similarly, this term directly relates to the variance of the outcome in the normal distribution - We will quickly see distributions from other families (i.e., logistic ) where this doesn’t happen- **Error terms are independent of predictors**# Generalized Linear Models For Binary Data## Today's Data Example1. To help demonstrate generalized models for binary data, we borrow from an example listed on the [UCLA ATS website](https://stats.idre.ucla.edu/stata/dae/ordered-logistic-regression/).2. The data can be used when you upgraded the `ESRM64503` package3. 
Data come from a survey of 400 college juniors looking at factors that influence the decision to apply to graduate school: - **Y (outcome)**: student rating of likelihood he/she will apply to grad school ([0 = unlikely]{style="color: tomato"}; [1 = somewhat likely]{style="color: yellowgreen"}; [2 = very likely]{style="color: royalblue"}) - We will first look at Y for two categories ([0 = unlikely]{style="color: tomato; font-weight: bold"}; [**1 = somewhat or very likely**]{style="color: royalblue; font-weight: bold"}) – we merged Cat1 and Cat2 into one category for illustration - You wouldn't do this in practice (use a different distribution for 3 categories) - **ParentEd**: indicator (0/1) if one or more parent has graduate degree - **Public**: indicator (0/1) if student attends a public university - **GPA**: grade point average on 4 point scale (4.0 = perfect)## Descriptive Statistics for Data```{r}#| code-fold: truelibrary(ESRM64503)library(tidyverse)library(kableExtra)Desp_GPA <- dataLogit |>summarise(Variable ="GPA",N =n(),Mean =mean(GPA),`Std Dev`=sd(GPA),Minimum =min(GPA),Maximum =max(GPA) ) Desp_Apply <- dataLogit |>group_by(APPLY) |>summarise(Frequency =n(),Percent =n() /nrow(dataLogit) *100 ) |>ungroup() |>mutate(`Cumulative Frequency`=cumsum(Frequency),`Cumulative Percent`=cumsum(Percent) )Desp_LLApply <- dataLogit |>group_by(LLAPPLY) |>summarise(Frequency =n(),Percent =n() /nrow(dataLogit) *100 ) |>ungroup() |>mutate(`Cumulative Frequency`=cumsum(Frequency),`Cumulative Percent`=cumsum(Percent) )Desp_PARED <- dataLogit |>group_by(PARED) |>summarise(Frequency =n(),Percent =n() /nrow(dataLogit) *100 ) |>ungroup() |>mutate(`Cumulative Frequency`=cumsum(Frequency),`Cumulative Percent`=cumsum(Percent) )Desp_PUBLIC <- dataLogit |>group_by(PUBLIC) |>summarise(Frequency =n(),Percent =n() /nrow(dataLogit) *100 ) |>ungroup() |>mutate(`Cumulative Frequency`=cumsum(Frequency),`Cumulative Percent`=cumsum(Percent) )``````{r}show_table(Desp_Apply) # Likelihood of Applying (1 = somewhat likely; 2 = very likely)show_table(Desp_LLApply) # Likelihood of Applying (1 = likely)show_table(Desp_GPA) # Analysis Variable : GPAshow_table(Desp_PARED) # Parent Has Graduate Degreeshow_table(Desp_PUBLIC) # Student Attends Public University```## What If We Used a Normal GLM for Binary Outcomes?- If $Y_p$ is a binary (0 or 1) outcome - Expected mean is proportion of people who have a 1 (or "p", the probability of Y_p = 1 in the sample) - The probability of having a 1 is what we're trying to predict for each person, given the values of his/her predictors- **General linear model**: $Y_p = I(\beta_0 + \beta_1x_p + \beta_2z_p + e_p)$ - $\color{tomato}{\beta}_0$ = expected probability when all predictors are 0 - $\color{tomato}{\beta}_s$ = expected change in [probability for a one-unit change in the predictor]{.underline} - $\color{royalblue}{e}_p$ = difference between observed and predicted values - Generalized Linear Model becomes $Y_p = \color{tomato}{(\text{predicted probability of outcome equal to 1})} + e_p$## A General Linear Model of Predicting Binary Outcomes?- But if $Y_p$ is binary and link function is identity link, then $e_p$ can only be 2 things: - $e_p$ = $Y_p - \hat{Y}_p$ - If $Y_p = 0$ then $e_p$ = (0 - predicted probability) - If $Y_p = 1$ then $e_p$ = (1 - predicted probability) - The mean of errors would still be 0 ... 
by definition - But variance of errors can't possibly be constant over levels of X like we assume in general linear models - The mean and variance of a binary outcome are **dependent!** - As shown shortly, mean = *p* and variance = *p* \* (1 - *p*), so they are tied together - This means that because the conditional mean of Y (*p*, the predicted probability Y = 1) is dependent on X, then so is the error variance## A General Linear Model With Binary Outcomes?- How can we have a linear relationship between X & Y?- Probability of a 1 is bounded between 0 and 1, but predicted probabilities from a linear model aren't bounded - Impossible values- Linear relationship needs to 'shut off' somehow $\rightarrow$ made nonlinear### Predicted Regression Line of GLM```{r}#| fig-align: center#| code-fold: truelibrary(ggplot2)ggplot(dataLogit) +aes(x = GPA, y = LLAPPLY) +geom_hline(aes( yintercept =1)) +geom_hline(aes( yintercept =0)) +geom_point(color ="tomato", alpha = .8) +geom_smooth(method ="lm", se =FALSE, fullrange =TRUE, linewidth =1.3) +scale_x_continuous(limits =c(0, 8), breaks =0:8) +scale_y_continuous(limits =c(-0.4, 1.4), breaks =seq(-0.4, 1.4, .1)) +labs(y ="Predicted Probability of Y") +theme_classic() +theme(text =element_text(size =20))```### Predicted Regression Line of Logistic Regression```{r}#| fig-align: center#| code-fold: trueggplot(dataLogit) +aes(x = GPA, y = LLAPPLY) +geom_hline(aes( yintercept =1)) +geom_hline(aes( yintercept =0)) +geom_point(color ="tomato", alpha = .8) +geom_smooth(method ="glm", se =FALSE, fullrange =TRUE,method.args =list(family =binomial(link ="logit")), color ="yellowgreen", linewidth =1.5) +scale_x_continuous(limits =c(0, 8), breaks =0:8) +scale_y_continuous(limits =c(-0.4, 1.4), breaks =seq(-0.4, 1.4, .1)) +labs(y ="Predicted Probability of Y") +theme_classic() +theme(text =element_text(size =20))```## 3 Problems with GLM predicting binary outcomes- Assumption Violation problem: GLM for continuous, conditionally normal outcome = residuals can't be normally distributed- Restricted Range problem (e.g., 0 to 1 for outcomes) - Predictors should not be linearly related to observed outcome - Effects of predictors need to be 'shut off' at some point to keep predicted values of binary outcome within range- Decision Making Problem: for GLM, the predicted value will a continuous predicted value with same scale of outcome (Y), how do we answer the question such as whether or not students will apply given certain value of GPA - But for Generalized Linear model, we can say students will have 50% probability of applying## The Binary Outcome: Bernoulli Distribution- Bernoulli distribution has following properties - Notation: $Y_p \sim B(\boldsymbol{p}_p)$ (where ***p*** is the conditional probability of a 1 for person p) - Sample Space: $Y_p \in \{0, 1\}$ ($Y_p$ can either be a 0 or a 1) - Probability Density Function (PDF): - $$ f(Y_p) = (\mathbf{p}_p)^{Y_p}(1-\mathbf{p}_p)^{1-Y_p} $$<!-- --> - Expected value (mean) of Y: $E(y_p) = \mu_{Y_p}=\boldsymbol{p}_p$ - Variance of Y: $Var(Y_p) = \sigma^2_{Y_p} = \boldsymbol{p}_p ( 1- \boldsymbol{p}_p )$::: callout-note$\boldsymbol{p}_p$ is the only parameter – so we only need to provide a link function for it ...:::## Generalized Models for Binary Outcomes- Rather than modeling the probability of a 1 directly, we need to transform it into a more continuous variable with a **link function**, for example: - We could transform **probability** into an **odds ratio** - Odds ratio (OR): $\frac{p}{1-p} = \frac{Pr(Y = 1)}{Pr(Y = 0)}$ 
- For example, if $p = .7$, then OR(1) = 2.33; OR(0) = .429 - Odds scale is [way skewed, asymmetric, and ranges from 0 to $+\infty$]{style="color: tomato"} - This is not a helpful property - **Take natural log of odds ratio** $\rightarrow$ called "logit" link - $\text{logit}(p) = \log(\frac{p}{1-p})$ - For example, $\text{logit}(.7) = .846$ and $\text{logit}(.3) = -.846$ - Logit scale is now symmetric at $p = .5$ - The logit link is one of many used for the Bernoulli distribution - Names of others: [Probit, Log-Log, Complementary Log-Log]{style="color: tomato"}## More Details about Logit Transformation- The link function for a logit is defined by: $$ g(\mathbb{E}(Y)) = \log(\frac{P(Y=1)}{1-P(Y=1)}) = \beta X^T $$ {#eq-logit-e} where $g$ called **link function**, $\mathbb{E}(Y)$ is the expectation of Y, $\beta X^T$ is the linear predictorA logit can be translate back to a probability with some algebra:$$P(Y=1) = \frac{\exp(\beta X^T)}{1+\exp(\beta X^T)} = \frac{\exp(\beta_0 + \beta_1 X + \beta_2Z + \beta_3XZ)}{1+\exp(\beta_0 + \beta_1 X + \beta_2Z + \beta_3XZ)} \\= (1+\exp(-1*(\beta_0 + \beta_1 X + \beta_2Z + \beta_3XZ)))^{-1}$$ {#eq-logit-p}From @eq-logit-e and @eq-logit-p, we can know that $g(\mathbb{E}(Y))$ has a range of \[-$\infty$, +$\infty$\], P(Y = 1) has a range of \[0, 1\].## Interpretation of Coefficients```{r}# function to translate OR to ProbabilityOR_to_Prob <-function(OR){ p = OR / (1+OR)return(p)}# function to translate Logit to ProbabilityLogit_to_Prob <-function(Logit){ OR <-exp(Logit) p =OR_to_Prob(OR)return(p)}Logit_to_Prob(Logit =-2.007) # p = .118exp(-2.007)```- Intercept $\beta_0$: **Logit**: We can say the predicted logit value of Y = 1 for an individual when all predictors are zero; i.e., *the average logit is -2.007* **Probability**: Alternatively, we can say the expected value of probability of Y = 1 is $\frac{\exp(\beta_0)}{1+\exp(\beta_0)}$ when all predictors are zero; i.e., *the average probability of applying to grad score is* [0.1184699]{style="color: tomato; font-weight: bold"} **Odds Ratio**: Alternatively, we can say the expected odds ratio (OR) of probability of Y = 1 is $\exp(\text{Logit})$ when all predictors are zero; the average odds (ratio) of the probability of applying to grad school is exp(-2.007) = 0.13439- Slope $\beta_1$: **Logit**: We can say the predicted [**increase**]{style="color: tomato; font-weight: bold"} of logit value of Y = 1 with one-unit increase of X; **Probability**: We can say the expected [**increase**]{style="color: tomato; font-weight: bold"} of probability of Y = 1 is $\frac{\exp(\beta_0+\beta_1)}{1+\exp(\beta_0+\beta_1)}-\frac{\exp(\beta_0)}{1+\exp(\beta_0)}$ with one-unit increase of X; Note that the increase ($\Delta(\beta_0, \beta_1)$) is non-linear and dynamic given varied value of X. **Odds Ratio**: We can say the expected odds ratio (OR) of probability of Y = 1 is $\exp(\beta_1)$ times larger with one-unit increase of X. 
([hint: the new odds ratio is $\exp(\beta_0+\beta_1) = \exp(\beta_0)\exp(\beta_1)$ when X = 1 and the old odds ratio is $\exp(\beta_0)$ when X = 0]{.mohu})## Example: Fitting The Models- Model 0: The empty model for logistic regression with binary variable (applying for grad school) as the outcome- Model 1: The logistic regression model including centered GPA and binary predictors (Parent has granduate degree, Student Attend Public University)## Model 0: The empty model- The statistical form of empty model: $$ P(Y_p =1) = \frac{\exp(\beta_0)}{1+\exp(\beta_0)} $$ or $$ \text{logit}(P(Y_p = 1)) = \beta_0 $$::: callout-note## Takehome NoteMany generalized linear models don't list an error term in the statistical form. This is because the error has fixed mean and fixed variances.For the logit function, $e_p$ has a logistic distribution with a zero mean and a variance as $\pi^2$/3 = 3.29.:::- Use `ordinal` package and `clm()` function, we can model categorical dependent variables```{r}library(ordinal)# response variable must be a factordataLogit$LLAPPLY =factor(dataLogit$LLAPPLY, levels =0:1) # <1># Empty model: likely to applymodel0 =clm(LLAPPLY ~1, data = dataLogit, control =clm.control(trace =1)) # <2>```1. The dependent variable must be stored as a `factor`2. the `formula` and `data` arguments are identical to `lm`; The `control =` argument is only used here to show iteration history of the ML algorithm## Model 0: Result```{r}summary(model0)```- The `clm` function output **Threshold** parameter (labelled as $\tau_0$) rather than intercept ($\beta_0$).- The relationship between $\beta_0$ and $\tau_0$ is $\beta_0 = - \tau_0$ - Thus, the estimated $\beta_0$ is -0.2007 with SE = 0.1005 for model 0 - The predicted logit is -0.2007; the predicted probability is .55 - The log-likelihood is -275.26; AIC is 552.51## Model 1: The conditional model$$P(Y_p = 1) = \frac{\exp(\beta_0 + \beta_1PARED_p + \beta_2 (GPA_p-3) +\beta_3 PUBLIC_p)}{1 + \exp(\beta_0 + \beta_1PARED_p + \beta_2 (GPA_p-3) +\beta_3 PUBLIC_p)}$$or$$\text{logit}(P(Y_p =1)) = \beta_0 + \beta_1PARED_p + \beta_2 (GPA_p-3) +\beta_3 PUBLIC_p$$```{r}dataLogit$GPA3 <- dataLogit$GPA -3model1 =clm(LLAPPLY ~ PARED + GPA3 + PUBLIC, data = dataLogit)summary(model1)```## Model 1: Results- LL(Model 1) = -264.96 (LL(Model 0) = -275.26)- $\beta_0 = -\tau_0 = 0.3382 (0.1187)$- $\beta_1 (SE) = 1.0596 (0.2974)$ with $p < .05$- $\beta_2 (SE) = 0.5482 (0.2724)$ with $p < .05$- $\beta_3 (SE) = -0.2006 (0.3053)$ with $p = .511$### Understand the results1. **Question #1**: does Model 1 fit better than the empty model (Model 2)? This question is equivalent to test the following hypothesis: $$ H_0: \beta_0=\beta_1=\beta_2=0\\ H_1: \text{At least one not equal to 0} $$ We can use [Likelihood Ratio Test]{.mohu}: $$ -2\Delta = (-275.26 - (-264.96)) = 20.586 $$ - DF = 4 (# of params of Model 0) - 1 (# of params of Model 1) = 3 - p-value: p = .0001283```{r} # anova can compare two models anova(model0, model1) # Or we can use chi-square distribution to calculate p-value as.numeric(pchisq(-2 * (logLik(model0)-logLik(model1)), 3, lower.tail = FALSE)) ``` - Conclusion: reject $H_0$ and we preferred to [the empty model]{.mohu}------------------------------------------------------------------------2. **Question #2**: Whether the effects of GPA, PARED, PUBLIC are significant or not? 
- **Intercept** $\beta_0 = -0.3382 (0.1187)$: **Logit**: the predicted logit of probability of applying for the grad school is [-0.3382]{style="color: tomato"} for a person with 3.0 GPA, parents without a graduate degree, and at a private university **Odds Ratio:** the predicted OR of applying for the grad school is $0.7130$ for a person with 3.0 GPA, parents without a graduate degree, and at a private university (OR \< 1: the probability of applying is less than the probability of not applying) **Probability**: the predicted probability of applying for the grad school is [41.7%]{style="color: tomato"} for a person with 3.0 GPA, parents without a graduate degree, and at a private university - **Slope of parents having a graduate degree**: $\beta_1 (SE) = 1.0596 (0.2974)$ with $p < .05$ **Logit**: the predicted logit of applying for the grad school will increase [1.0596]{style="color: tomato"} for whose parents having a graduate degree controlling other predictors. **Odds Ratio**: the predicted OR will increase [from 0.7139 to 2.05]{style="color: tomato"} for whose parents having a graduate degree controlling other predictors – students who have parents with a graduate degree has 3x the odds of rating the item with a "likely to apply" **Probability**: Compared to those without parental graduate degree, the predicted probability of "likely to apply" for students with parental graduate degree increases [from 0.416 to .673]{style="color: tomato"} $$ \frac{\exp(\beta_0+\beta_1)}{1+ \exp(\beta_0+\beta_1)}= .673 $$ $$ \frac{\exp(\beta_0)}{1+ \exp(\beta_0)}= .416 $$ - **Slope of students in public vs. private universities:** $\beta_3 (SE) = -0.2006 (0.3053)$ with $p = .511$```{r}## Interpret outputOR_p <-function(logit_old, logit_new){data.frame(OR_old =exp(logit_old),OR_new =exp(logit_new),p_old =exp(logit_old) / (1+exp(logit_old)),p_new =exp(logit_new) / (1+exp(logit_new)) )}beta_0 <--0.3382beta_3 <--0.2006Result <-OR_p(logit_old = beta_0, logit_new = (beta_0 + beta_3)) Result |>show_table()Result[1, 2] / Result[1, 1]Result[1, 4] - Result[1, 3]```::: callout-note## Interpretation- Students has [0.81 times odds ratio of applying for grad.]{.mohu}school when they are in public universities- The probability [of applying for grad. school increases 4.78%]{.mohu}.- This change in odds ratio and probability of applying for grad. school is not significant.:::------------------------------------------------------------------------- **Slope of GPA3:** $\beta_2(SE) = 0.5482 (0.2724)$ with $p < .05$```{r} beta_2 <- 0.5482 Result2 <- OR_p(logit_old = beta_0, logit_new = (beta_0 + beta_2)) Result2 |> show_table() Result2[1, 2]/Result2[1, 1] ```::: callout-note## Interpretation- For every one-unit increase in GPA, the logit of applying for grad. 
school will increase 0.548, the odds ratio will be 1.73 times, the probabilities will be 19.2%, 29.2%, 41.6% to 55.2% for GPA = 1 , 2, 3 and 4:::```{r}new_data <-data.frame(GPA3 =seq(-3, 1, .1),PARED =0,PUBLIC =0)Pred_prob <-predict(model1, newdata=new_data)$fitas.data.frame(cbind(GPA = new_data$GPA3+3, P_Y_0 = Pred_prob[,1], P_Y_1 = Pred_prob[,2])) |>pivot_longer(starts_with("P_Y")) |>ggplot() +aes(x = GPA, y = value) +geom_path(aes(group = name, color = name), linewidth =1.3) +labs(y ="Predicted Probability") +scale_x_continuous(breaks =seq(0, 4, .2)) +scale_color_discrete(labels =c("P(Y = 0)", "P(Y = 1)"), name ="") +theme_classic()```::: callout-important## Takehome Note- For logistic models with two responses: - Regression weights are now for LOGITS - The direction of what is being modeled has to be understood (Y = 0 or = 1) - The change in odds and probability is not linear per unit change in the IV, but instead is linear with respect to the logit - Interactions will still - Will still modify the conditional main effects - Simple main effects are effects when interacting variables = 0:::## Wrap up- Generalized linear models are models for outcomes with distributions that are not necessarily normal- The estimation process is largely the same: maximum likelihood is still the gold standard as it provides estimates with understandable properties- Learning about each type of distribution and link takes time: - They all are unique and all have slightly different ways of mapping outcome data onto your model- Logistic regression is one of the more frequently used generalized models – binary outcomes are common