Lecture 04

Linear Regression Model with Stan II

Jihong Zhang

Educational Statistics and Research Methods

Today’s Lecture Objectives

  1. Making Stan Syntax Shorter

  2. Computing Functions of Model Parameters

    Download R file DietDataExample2.R

In previous class…

R code for data read in
library(cmdstanr)
library(bayesplot)
library(tidyr)
library(dplyr)
color_scheme_set('brightblue')
dat <- read.csv(here::here("posts", "Lectures", "2024-01-12-syllabus-adv-multivariate-esrm-6553", "Lecture03", "Code", "DietData.csv"))
dat$DietGroup <- factor(dat$DietGroup, levels = 1:3)
dat$HeightIN60 <- dat$HeightIN - 60
kableExtra::kable( rbind(head(dat), tail(dat)) ) |> kableExtra::kable_classic_2() |> 
  kableExtra::kable_styling(full_width = F, font_size = 15)
Respondent DietGroup HeightIN WeightLB HeightIN60
1 1 1 56 140 -4
2 2 1 60 155 0
3 3 1 64 143 4
4 4 1 68 161 8
5 5 1 72 139 12
6 6 1 54 159 -6
25 25 3 70 259 10
26 26 3 52 201 -8
27 27 3 59 228 -1
28 28 3 64 245 4
29 29 3 65 241 5
30 30 3 72 269 12
  1. Introduce the empty model
  2. Example: Post-Diet Weights
    • WeightLB (Dependent Variable): The respondents’ weight in pounds
    • HeightIN: The respondents’ height in inches
    • DietGroup: 1, 2, 3 representing the group to which a respondent was assigned
  3. The empty model has two parameters to be estimated: (1) \(\beta_0\), (2) \(\sigma_e\)
  4. The posterior mean/median of \(\beta_0\) should be mean of WeightLB
  5. The posterior mean/median of \(\sigma_e\) should be sd of WeightLB

Making Stan Code Short and Efficient

The Stan syntax from our previous model was lengthy:

  • A declared variable for each parameter
  • The linear combination of coefficients by multiplying predictors

Stan has built-in features to shorten syntax:

  • Matrices/Vectors
  • Matrix products
  • Multivariate distribution (initially for prior distributions)
  • Built-in Functions (sum() better than +=)

Note: if you are interested in Efficiency tuning in Stan, look at this Charpter for more details.

Linear Models without Matrices

The linear model from our example was:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

with:

  • \(\text{Group2}_p\) the binary indicator of person \(p\) being in group 2

  • \(\text{Group}3_p\) the binary indicator of person \(p\) being in group 3

  • \(e_p \sim N(0, \sigma_e)\)

Path Diagram of the Full Model

graph LR;
  HeightIN60 --> WeightLB;
  DietGroup2 --> WeightLB;
  DietGroup3 --> WeightLB;
  HeightIN60xDietGroup2 --> WeightLB;
  HeightIN60xDietGroup3 --> WeightLB;
Figure 1

Linear Models with Matrices

Model (predictor) matrix with the size 30 (rows) \(\times\) 6 (columns)

\[ \mathbf{X} = \begin{bmatrix}1 & -4 & 0 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 12 & 0 & 1 & 0 & 12 \end{bmatrix} \]

Coefficients vectors with the size 6 (rows) \(\times\) 1 (column):

\[ \mathbf{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \end{bmatrix} \]

FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
model.matrix(FullModelFormula, data = dat) |> head()
  (Intercept) HeightIN60 DietGroup2 DietGroup3 HeightIN60:DietGroup2
1           1         -4          0          0                     0
2           1          0          0          0                     0
3           1          4          0          0                     0
4           1          8          0          0                     0
5           1         12          0          0                     0
6           1         -6          0          0                     0
  HeightIN60:DietGroup3
1                     0
2                     0
3                     0
4                     0
5                     0
6                     0

Linear Models with Matrices (Cont.)

We then rewrite the equation from

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

to:

\[ \mathbf{WeightLB} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]

Where:

  • \(\mathbf{WeightLB}\) is the vector of outcome (N \(\times\) 1)

  • \(\mathbf{X}\) is the model (predictor) matrix (N \(\times\) P for P - 1 predictors)

  • \(\boldsymbol{\beta}\) is the coefficients vector (P \(\times\) 1)

  • \(\mathbf{e}\) is the vector for residuals (N \(\times\) 1)

Example: Predicted Values and \(\text{R}^2\)

Similar to Monte Carlo Simulation, given matrices \(P\) and \(\boldsymbol{\beta}\)

Code
set.seed(1234)
fit_lm <- lm(formula = FullModelFormula, data = dat)

beta = coefficients(fit_lm)
P = length(beta)
X = model.matrix(FullModelFormula, data = dat)
head(X%*%beta)
      [,1]
1 149.0639
2 147.5566
3 146.0493
4 144.5419
5 143.0346
6 149.8176

Calculating \(R^2\) and adjusted \(R^2\):

rss = crossprod(dat$WeightLB - X%*%beta) # residual sum of squares
tss = crossprod(dat$WeightLB - mean(dat$WeightLB)) # total sum of squares
R2 = 1 - rss / tss
R2.adjust = 1 - (rss/(nrow(dat)-P)) / (tss/((nrow(dat)-1)))
data.frame(
  R2, # r-square
  R2.adjust # adjusted. r-square
)
         R2 R2.adjust
1 0.9786724 0.9742292

lm function: \(R^2\) and adjusted \(R^2\):

summary_lm <- summary(fit_lm)
summary_lm$r.squared
[1] 0.9786724
summary_lm$adj.r.squared
[1] 0.9742292

Vectorize prior distributions

Previously, we defined a normal distribution for each regression coefficient \[ \beta_0 \sim normal(0, 1) \\ \vdots \\ \beta_p \sim normal(0, 1) \]

  • They are all univariate normal distribution
  • Issue: Each parameter had a prior that was independent of the other parameter; then the correlation between betas is low and cannot be changed.

For example, the code shows two betas with univariate normal distribution have low correlation (r = -0.025)

set.seed(1234)
beta0 = rnorm(100, 0, 1)
beta1 = rnorm(100, 0, 1)
cor(beta0, beta1)
[1] -0.02538285

Vectorize prior distributions (Cont.)

When combining all parameters into a vector, a natural extension is a multivariate normal distribution, so that the betas have a pre-defined correlation strength

  • The syntax shows the two betas generated by the multivariate normal distribution with correlation of .5
set.seed(1234)
sigma_of_betas = matrix(c(1, 0.5, 0.5, 1), ncol = 2)
betas = mvtnorm::rmvnorm(100, mean = c(0, 0), sigma = sigma_of_betas)
beta0 = betas[,1]
beta1 = betas[,2]
cor(beta0, beta1)
[1] 0.5453899

Back to the stan code, we need to specify:

  • Mean vector of betas (meanBeta; size P \(\times\) 1)
    • Put all prior means for those coefficients into a vector
  • Covariance matrix for betas (covBeta; size P \(\times\) P)
    • Put all prior variances into the diagonal; zeros for off diagonal; ’cause we are not sure the potential correlation between betas

Syntax Changes: Data Section

  • Old syntax without matrix:
data{
    int<lower=0> N;
    vector[N] weightLB;
    vector[N] height60IN;
    vector[N] group2;
    vector[N] group3;
    vector[N] heightXgroup2;
    vector[N] heightXgroup3;
}
  • New syntax with matrix:
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: rate parameter for residual standard deviation
}

Syntax Changes: Parameters Section

  • Old syntax without matrix:
parameters {
  real beta0;
  real betaHeight;
  real betaGroup2;
  real betaGroup3;
  real betaHxG2;
  real betaHxG3;
  real<lower=0> sigma;
}
  • New syntax with matrix:
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}

Syntax Changes: Prior Distributions Definition

  • Old syntax without matrix:
model {
  beta0 ~ normal(0,100);
  betaHeight ~ normal(0,100);
  betaGroup2 ~ normal(0,100);
  betaGroup3 ~ normal(0,100);
  betaHxG2 ~ normal(0,100);
  betaHxG3 ~ normal(0,100);
  sigma ~ exponential(.1); // prior for sigma
  weightLB ~ normal(
    beta0 + betaHeight * height60IN + betaGroup2 * group2 + 
    betaGroup3*group3 + betaHxG2*heightXgroup2 +
    betaHxG3*heightXgroup3, sigma);
}

New syntax with matrix:

  • multi_normal() is the multivariate normal sampling in Stan, similar to rmvnorm() in R; For uninformative, we did not need to specify
  • exponential() is the exponential distribution sampling in Stan, similar to rexp() in R
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}

A little more about exponential distribution

  • The mean of the exp. distribution is \(\frac{1}{\lambda}\), where \(\lambda\) is called rate parameter
  • The variance of the exp. distribution is \(\frac{1}{\lambda^2}\)
  • It is typically positive skewed (skewness is 2)
  • Question: which hyperparameter rate \(\lambda\) is most informative/uninformative
library(tidyr)
library(dplyr)
library(ggplot2)
rate_list = seq(0.1, 1, 0.2)
pdf_points = sapply(rate_list, \(x) dexp(seq(0, 20, 0.01), x)) |> as.data.frame()
colnames(pdf_points) <- rate_list
pdf_points$x = seq(0, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    sigmaRate = factor(name, levels = rate_list)
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = sigmaRate, group = sigmaRate), size = 1.2) +
  scale_x_continuous(limits = c(0, 20)) +
  labs(x = "Sigma")
Figure 2: PDF for the exponential distribution by varied rate parameters

Since we talked about Exponential distribution…

Let’s dive deeper into Laplace distribution. It is sometimes called double-exponential distribution. Exponential distribution is positive part of Laplace distribution.

\[ \text{PDF}_{exp.} = \lambda e^{-\lambda x} \]

\[ \text{PDF}_{laplace} = \frac{1}{2b} e^{-\frac{|x - u|}{b}} \]

  • Thus, we know that for x > 0, exponential distribution is a special case of Laplace distribution with scale parameter \(b\) as \(\frac{1}{\lambda}\) and location parameter as 0.

  • Laplace-based distribution, Cauchy, and Horseshoe distribution all belong to so-called “shrinkage” priors.

  • Shrinkage priors will be very useful for high-dimensional data (say P = 1000) and variable selection
library(LaplacesDemon)
b_list = 1 / rate_list * 2
pdf_points = sapply(b_list, \(x) dlaplace(seq(-20, 20, 0.01), scale = x, location = 0)) |> as.data.frame()
colnames(pdf_points) <- round(b_list, 2)
pdf_points$x = seq(-20, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    scale = factor(name, levels = round(b_list, 2))
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = scale, group = scale), size = 1.2) +
  scale_x_continuous(limits = c(-20, 20)) +
  labs(x = "")

Compare results and computational time

Code
mod_full_old <- cmdstan_model("Code/FullModel_Old.stan")
data_full_old <- list(
  N = nrow(dat),
  weightLB = dat$WeightLB,
  height60IN = dat$HeightIN60,
  group2 = as.numeric(dat$DietGroup == 2),
  group3 = as.numeric(dat$DietGroup == 3),
  heightXgroup2 = as.numeric(dat$DietGroup == 2) * dat$HeightIN60,
  heightXgroup3 = as.numeric(dat$DietGroup == 3) * dat$HeightIN60
)
fit_full_old <- mod_full_old$sample(
  data = data_full_old,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_old$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable      mean  median    sd   mad     q5     q95  rhat
  <chr>        <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__       -76.7   -76.3   2.17  2.00  -80.8  -73.8    1.00
2 beta0      148.    148.    3.26  3.20  142.   153.     1.00
3 betaHeight  -0.370  -0.378 0.495 0.488  -1.17   0.437  1.00
4 betaGroup2 -24.2   -24.1   4.59  4.52  -31.6  -16.4    1.00
5 betaGroup3  81.2    81.1   4.31  4.36   74.6   88.5    1.00
6 betaHxG2     2.46    2.45  0.690 0.678   1.34   3.59   1.00
7 betaHxG3     3.56    3.55  0.658 0.648   2.48   4.66   1.00
8 sigma        8.26    8.10  1.23  1.13    6.52  10.5    1.00
Code
mod_full_new <- cmdstan_model("Code/FullModel_New.stan")
FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
X = model.matrix(FullModelFormula, data = dat)
data_full_new <- list(
  N = nrow(dat),
  P = ncol(X),
  X = X, 
  weightLB = dat$WeightLB,
  sigmaRate = 0.1
)
fit_full_new <- mod_full_new$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4
)
fit_full_new$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable    mean  median    sd   mad     q5     q95  rhat
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__     -76.6   -76.3   2.15  2.00  -80.6  -73.9    1.00
2 beta[1]  148.    148.    3.27  3.23  142.   153.     1.00
3 beta[2]   -0.374  -0.372 0.483 0.479  -1.16   0.414  1.00
4 beta[3]  -24.1   -24.2   4.59  4.45  -31.6  -16.5    1.00
5 beta[4]   81.3    81.3   4.44  4.40   74.1   88.5    1.00
6 beta[5]    2.47    2.48  0.683 0.676   1.32   3.56   1.00
7 beta[6]    3.57    3.57  0.646 0.643   2.52   4.62   1.00
8 sigma      8.25    8.10  1.26  1.20    6.50  10.5    1.00

The differences between two method:

  • betaGroup3 has the largest differences between two methods
cbind(fit_full_old$summary()[,1], fit_full_old$summary()[, -c(1, 9, 10)] - fit_full_new$summary()[, -c(1, 9, 10)])
    variable         mean     median            sd          mad        q5
1       lp__ -0.007672400  0.0063000  0.0242926098  0.007931910 -0.115880
2      beta0  0.071524500  0.1155000 -0.0135811270 -0.029652000 -0.102200
3 betaHeight  0.004062205 -0.0067305  0.0121231572  0.008502340 -0.018165
4 betaGroup2 -0.013670300  0.0513000 -0.0005405792  0.069904590 -0.037665
5 betaGroup3 -0.103896750 -0.2107500 -0.1281290904 -0.041068020  0.468720
6   betaHxG2 -0.019823355 -0.0341300  0.0072874047  0.001823598  0.020645
7   betaHxG3 -0.010911025 -0.0213900  0.0116900063  0.005040840 -0.041349
8      sigma  0.011585872  0.0005500 -0.0326167204 -0.068377512  0.021184
          q95          rhat
1  0.03527000  2.381442e-03
2  0.00740000  1.487790e-04
3  0.02306265  1.225408e-03
4  0.07863000  4.347532e-05
5 -0.04479000 -4.122054e-05
6  0.02261500  5.967715e-04
7  0.03879350  7.265786e-05
8 -0.00486500  2.854078e-04

Compare computational time

  • The Stan code with matrix has faster computation:
fit_full_old$time()
$total
[1] 0.202059

$chains
  chain_id warmup sampling total
1        1  0.053    0.042 0.095
2        2  0.064    0.050 0.114
3        3  0.057    0.053 0.110
4        4  0.056    0.043 0.099
fit_full_new$time()
$total
[1] 0.1630359

$chains
  chain_id warmup sampling total
1        1  0.031    0.028 0.059
2        2  0.036    0.033 0.069
3        3  0.036    0.032 0.068
4        4  0.033    0.029 0.062
  • Pros: With matrices, there is less syntax to write

    • Model is equivalent
    • More efficient for sampling (sample from matrix space)
    • More flexible: modify matrix elements in R instead of individual parameters in Stan
  • Cons: Output, however, is not labeled with respect to parameters

    • May have to label output

Computing Functions of Parameters

  • Often, we need to compute some linear or non-linear function of parameters in a linear model

    • Missing effects - beta for diet group 2 and 3

    • Model fit indices: \(R^2\)

    • Transformed effects - residual variance \(\sigma^2\)

  • In non-Bayesian (frequentist) analyses, there are often formed with the point estimates of parameters (with standard errors - second derivative of likelihood function)

  • For Bayesian analyses, however, we seek to build the posterior distribution for any function of parameters

    • This means applying the function to all posterior samples
    • It is especially useful when you want to propose your new statistic

Example: Need Slope for Diet Group 2

Recall our model:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

Here, \(\beta_1\) denotes the average change in \(\text{WeightLB}\) with one-unit increase in \(\text{HeightIN}\) for members in the reference group— Diet Group 1.

Question: What about the slope for members in Diet Group 2.

  • Typically, we can calculate by hand by assign \(\text{Group2}\) as 1 and all effects regarding \(\text{HeightIN}\):

    \[ \beta_{\text{group2}}*\text{HeightIN} = (\beta_1 + \beta_4*1 + \beta_5*0)*\text{HeightIN} \]

    \[ \beta_{\text{group2}}= \beta_1 +\beta_4 \]

  • Similarly, the intercept for Group2 - the average mean of \(\text{WeightLB}\) is \(\beta_0 + \beta_2\).

Computing slope for Diet Group 2

Our task: Create posterior distribution for Diet Group 2

  • We must do so for each iteration we’ve kept from our MCMC chain

  • A somewhat tedious way to do this is after using Stan

fit_full_new$summary()
# A tibble: 8 × 10
  variable    mean  median    sd   mad     q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -76.6   -76.3   2.15  2.00  -80.6  -73.9    1.00    1401.    1886.
2 beta[1]  148.    148.    3.27  3.23  142.   153.     1.00    1402.    1838.
3 beta[2]   -0.374  -0.372 0.483 0.479  -1.16   0.414  1.00    1522.    2019.
4 beta[3]  -24.1   -24.2   4.59  4.45  -31.6  -16.5    1.00    1746.    2190.
5 beta[4]   81.3    81.3   4.44  4.40   74.1   88.5    1.00    1538.    2091.
6 beta[5]    2.47    2.48  0.683 0.676   1.32   3.56   1.00    1805.    2118.
7 beta[6]    3.57    3.57  0.646 0.643   2.52   4.62   1.00    1678.    2218.
8 sigma      8.25    8.10  1.26  1.20    6.50  10.5    1.00    2265.    2212.
beta_group2 <- fit_full_new$draws("beta[2]")  + fit_full_new$draws("beta[5]")
summary(beta_group2)
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 beta[2]   2.10   2.09 0.478 0.478  1.32  2.88  1.00    4032.    3211.

Computing slope within Stan

Stan can compute these values for us-with the “generated quantities” section of the syntax

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  real slopeG2;
  slopeG2 = beta[2] + beta[5];
}

The generated quantities block computes values that do not affect the posterior distributions of the parameters–they are computed after the sampling from each iteration

  • The values are then added to the Stan object and can be seen in the summary

    • They can also be plotted using bayesplot package
mod_full_compute <- cmdstan_model("Code/FullModel_compute.stan")
fit_full_compute <- mod_full_compute$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_compute$summary('slopeG2')
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 slopeG2   2.10   2.09 0.478 0.478  1.32  2.88  1.00    4032.    3211.
bayesplot::mcmc_dens_chains(fit_full_compute$draws('slopeG2'))

Alternative way of computing the slope with Matrix

This is a little more complicated but more flexible method.

That is, we can make use of matrix operation and form a contrast matrix

  • Contrasts are linear combinations of parameters

    • You may have used these in R using glht package

For use, we form a contrast matrix that is size of \(C \times P\) where C is the number of contrasts:

  • The entries of this matrix are the values that multiplying the coefficients

    • For \((\beta_1 + \beta_2)\) this would be:

      • a “1” in the corresponding entry for \(\beta_1\)

      • a “1” in the corresponding entry for \(\beta_4\)

      • “0”s elsewhere

    • \[ \mathbf{C} = \begin{bmatrix} 0\ \mathbf{1}\ 0\ 0\ \mathbf{1}\ 0 \end{bmatrix} \]

  • Then, the contrast matrix is multiplied by the coefficients vector to form the values:

    • \(\mathbf{C} * \beta\)

Contrasts in Stan

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
  int<lower=0> nContrasts;
  matrix[nContrasts, P] contrast; // C matrix 
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
}
R code
mod_full_contrast <- cmdstan_model("Code/FullModel_contrast.stan")
contrast_dat <- list(
  nContrasts = 2,
  contrast = matrix(
    c(0,1,0,0,1,0, # slope for diet group2
      1,0,1,0,0,0),# intercept for diet group 2
    nrow = 2, byrow = TRUE
  )
)
fit_full_contrast <- mod_full_contrast$sample(
  data = c(data_full_new, contrast_dat),
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_contrast$summary('computedEffects')[, -c(9, 10)]
# A tibble: 2 × 8
  variable             mean median    sd   mad     q5    q95  rhat
  <chr>               <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
1 computedEffects[1]   2.10   2.09 0.478 0.478   1.32   2.88  1.00
2 computedEffects[2] 123.   123.   3.16  3.02  118.   129.    1.00
bayesplot::mcmc_hist(fit_full_contrast$draws('computedEffects'))

Computing \(\text{R}^2\)

We can use the generated quantities section to build a posterior distribution for \(\text{R}^2\)

There are several formulas for \(\text{R}^2\), we will use the following:

\[ \text{R}^2 = 1-\frac{RSS}{TSS} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)}{\Sigma_{p=1}^{N}(y_p -\bar{y}_p)} \] Where:

  1. RSS is the residual sum of squares
  2. TSS is the total sum of squares of dependent variable
  3. \(\hat{y}_p\) is the predicted values: \(\hat{y}_p = \mathbf{X}\boldsymbol{\beta}\)
  4. \(\bar{y}_p\) is the mean value of dependent variable: \(\bar{y}_p = \frac{\Sigma_{p=1}^{N}y_p}{N}\)

Notice: RSS depends on sampled parameters, so we will use this to build our posterior distribution for \(\text{R}^2\)

For adjusted \(\text{R}^2\), we use the following:

\[ \text{adj.R}^2 = 1-\frac{RSS/(N-P)}{TSS/(N-1)} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)}{\Sigma_{p=1}^{N}(y_p -\bar{y}_p)}*\frac{N-P}{N-1} \]

Then, we can calculate the how to calculate \(\text{adj.R}^2\) by \(\text{R}^2\):

\[ \text{adj.R}^2 = 1-(1-\text{R}^2)*\frac{N-P}{N-1} = \frac{(P-1)+(N-1)R^2}{N-P} \]

Stan code for Computing \(\text{R}^2\)

Stan code
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
  // compute R2
  real rss;
  real tss;
  real R2;
  real R2adj;
  {// anything in these brackets will not appear in summary table
    vector[N] pred = X*beta;
    rss = dot_self(weightLB-pred); // dot_self is stan function for matrix square
    tss = dot_self(weightLB-mean(weightLB));
  }
  R2 = 1-rss/tss;
  R2adj = 1-(rss/(N-P))/(tss/(N-1));
}

Recall that our lm function provides \(\text{R}^2\) as 0.9787 and adjusted \(\text{R}^2\) as 0.9742

fit_full_contrast$summary(c('rss', 'tss', 'R2','R2adj'))[, -c(9, 10)]
# A tibble: 4 × 8
  variable      mean    median        sd       mad        q5       q95  rhat
  <chr>        <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
1 rss       1941.     1877.    285.      226.       1616.     2499.     1.00
2 tss      71112     71112       0         0       71112     71112     NA   
3 R2           0.973     0.974   0.00401   0.00317     0.965     0.977  1.00
4 R2adj        0.967     0.968   0.00484   0.00384     0.958     0.973  1.00
bayesplot::mcmc_hist(fit_full_contrast$draws(c('R2', 'R2adj')))

Get posterior mode

# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Calculate the mode using the user function.
getmode(fit_full_contrast$draws('R2'))
[1] 0.974917
getmode(fit_full_contrast$draws('R2adj'))
[1] 0.96674

Wrapping up

Today we further added generated quantities into our Bayesian toolset:

  • How to make Stan use less syntax using matrices
  • How to form posterior distributions for functions of parameters

We will use both of these features in psychometric models.

Next Class

  1. Bayesian Model fit
  2. Bayesian Model Comparison