When the regression coefficients are combined into a vector, a natural extension is a multivariate normal prior, so that the betas have a pre-defined correlation strength.
The syntax shows the two betas drawn from a multivariate normal distribution with a correlation of .5.
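As a minimal sketch of such a prior (not the original code: the prior means and standard deviations below are assumed values), Stan's `multi_normal()` can be used:

```stan
parameters {
  vector[2] beta;
}
model {
  // assumed prior: mean 0, sd 10 for each beta, correlation .5
  // covariance term = .5 * 10 * 10 = 50
  matrix[2, 2] Sigma = [[100, 50],
                        [50, 100]];
  beta ~ multi_normal(rep_vector(0, 2), Sigma);
}
```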
```stan
data {
  int<lower=0> N;      // number of observations
  int<lower=0> P;      // number of predictors (plus column for intercept)
  matrix[N, P] X;      // model.matrix() from R
  vector[N] weightLB;  // outcome
  real sigmaRate;      // hyperparameter: rate parameter for residual standard deviation
}
```
Figure 2: PDF for the exponential distribution by varied rate parameters
Since we have already discussed the exponential distribution, let's dive deeper into the Laplace distribution, sometimes called the double-exponential distribution. The exponential distribution is the positive part of the Laplace distribution.
$$\text{PDF}_{\text{exp}} = \lambda e^{-\lambda x}$$

$$\text{PDF}_{\text{Laplace}} = \frac{1}{2b}\, e^{-\frac{|x-\mu|}{b}}$$
Thus, we know that for $x > 0$, the exponential distribution is a special case of the Laplace distribution with scale parameter $b = 1/\lambda$ and location parameter $\mu = 0$.
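Plugging $\mu = 0$ and $b = 1/\lambda$ into the Laplace PDF makes this explicit:

$$\left.\frac{1}{2b}\, e^{-\frac{|x-\mu|}{b}}\right|_{\mu = 0,\ b = 1/\lambda} = \frac{\lambda}{2}\, e^{-\lambda |x|} = \frac{1}{2}\,\lambda e^{-\lambda x} \quad \text{for } x > 0,$$

so truncating the Laplace to $x > 0$ (and renormalizing by a factor of 2) recovers the exponential density.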
The Laplace, Cauchy, and horseshoe distributions all belong to the family of so-called "shrinkage" priors.
Shrinkage priors are very useful for high-dimensional data (say, P = 1000) and for variable selection.
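As a sketch of how a shrinkage prior looks in Stan (the scale of 1 is an assumed value, and the rest of the model is taken from the syntax above), the Laplace prior is available as `double_exponential()`:

```stan
model {
  // Laplace ("double exponential") shrinkage prior: sharply peaked at 0,
  // pulling small coefficients toward 0 (scale 1 is an assumed value)
  beta ~ double_exponential(0, 1);
  sigma ~ exponential(sigmaRate);       // prior for sigma
  weightLB ~ normal(X * beta, sigma);   // linear model
}
```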
Pros: with matrices, there is less syntax to write
The model is equivalent
More efficient sampling (sampling from the matrix space)
More flexible: modify matrix elements in R instead of individual parameters in Stan
Cons: the output is not labeled with respect to the parameters
You may have to label the output yourself
Computing Functions of Parameters
Often, we need to compute some linear or non-linear function of parameters in a linear model
Missing effects: the betas for diet groups 2 and 3
Model fit indices: $R^2$
Transformed effects: the residual variance $\sigma^2$
In non-Bayesian (frequentist) analyses, these are often formed from the point estimates of the parameters (with standard errors derived from the second derivative of the log-likelihood function)
For Bayesian analyses, however, we seek to build the posterior distribution for any function of parameters
This means applying the function to all posterior samples
This is especially useful when you want to propose a new statistic.
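As a hedged sketch in R (assuming a fitted cmdstanr object named `fit` and the posterior package), the posterior of a sum of two slopes is built by applying the sum to every draw:

```r
# sketch: build the posterior of a function of parameters, draw by draw
# (`fit` is an assumed cmdstanr fit object with a coefficient vector `beta`)
draws <- posterior::as_draws_matrix(fit$draws(c("beta[2]", "beta[5]")))
slopeG2 <- draws[, "beta[2]"] + draws[, "beta[5]"]  # one value per posterior draw
quantile(slopeG2, c(.025, .5, .975))                # posterior summary of the function
```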
Stan can compute these values for us with the generated quantities block of the syntax.
Stan code
```stan
data {
  int<lower=0> N;      // number of observations
  int<lower=0> P;      // number of predictors (plus column for intercept)
  matrix[N, P] X;      // model.matrix() from R
  vector[N] weightLB;  // outcome
  real sigmaRate;      // hyperparameter: prior rate parameter for residual standard deviation
}
parameters {
  vector[P] beta;      // vector of coefficients for beta
  real<lower=0> sigma; // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);     // prior for sigma
  weightLB ~ normal(X * beta, sigma); // linear model
}
generated quantities {
  real slopeG2;
  slopeG2 = beta[2] + beta[5];
}
```
The generated quantities block computes values that do not affect the posterior distributions of the parameters; they are computed after the sampling at each iteration.
The values are then added to the Stan object and can be seen in the summary
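For example (a sketch; the object name `fit` is assumed), the new quantity shows up in the cmdstanr summary alongside the parameters:

```r
# sketch: generated quantities appear in the summary like any parameter
fit$summary(variables = "slopeG2")
```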
An alternative way of computing the slope: contrast matrices
This is a slightly more complicated but more flexible method.
That is, we can make use of matrix operations and form a contrast matrix.
Contrasts are linear combinations of parameters
You may have used these in R via the glht() function from the multcomp package
To use them, we form a contrast matrix of size $C \times P$, where $C$ is the number of contrasts:
The entries of this matrix are the values that multiply the coefficients
For the slope $(\beta_1 + \beta_4)$ this would be:
a "1" in the corresponding entry for $\beta_1$
a "1" in the corresponding entry for $\beta_4$
"0"s elsewhere

$$C = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix}$$

(with zero-based $\beta_0, \dots, \beta_5$; these entries correspond to beta[2] and beta[5] in Stan's one-based indexing)
Then, the contrast matrix is multiplied by the coefficient vector to form the values:

$$C \beta$$
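As a tiny numeric sketch in R (the draw values here are made up for illustration):

```r
C <- matrix(c(0, 1, 0, 0, 1, 0), nrow = 1)  # 1 x P contrast matrix
beta <- c(2.0, 1.5, 0.3, -0.2, 0.8, 0.1)    # one hypothetical posterior draw
C %*% beta                                  # = 1.5 + 0.8 = 2.3
```

Applying this multiplication to every posterior draw yields the posterior distribution of the contrast.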
Contrasts in Stan
Stan code
```stan
data {
  int<lower=0> N;                 // number of observations
  int<lower=0> P;                 // number of predictors (plus column for intercept)
  matrix[N, P] X;                 // model.matrix() from R
  vector[N] weightLB;             // outcome
  real sigmaRate;                 // hyperparameter: prior rate parameter for residual standard deviation
  int<lower=0> nContrasts;
  matrix[nContrasts, P] contrast; // C matrix
}
parameters {
  vector[P] beta;                 // vector of coefficients for beta
  real<lower=0> sigma;            // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);     // prior for sigma
  weightLB ~ normal(X * beta, sigma); // linear model
}
generated quantities {
  vector[nContrasts] computedEffects;
  computedEffects = contrast * beta;
}
```
R code
```r
mod_full_contrast <- cmdstan_model("Code/FullModel_contrast.stan")

contrast_dat <- list(
  nContrasts = 2,
  contrast = matrix(c(0, 1, 0, 0, 1, 0,   # slope for diet group 2
                      1, 0, 1, 0, 0, 0),  # intercept for diet group 2
                    nrow = 2, byrow = TRUE)
)

fit_full_contrast <- mod_full_contrast$sample(
  data = c(data_full_new, contrast_dat),
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
```
Then, we can calculate the adjusted $R^2$ from $R^2$:
$$\text{adj.}R^2 = 1 - (1 - R^2)\,\frac{N-1}{N-P} = \frac{(N-1)R^2 - (P-1)}{N-P}$$
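As a quick numeric check of the formula in R (the values of N and P here are assumed, hypothetical values, not taken from the data):

```r
R2 <- 0.9787                        # an R-squared value
N <- 60; P <- 6                     # hypothetical sample size and columns of X
1 - (1 - R2) * (N - 1) / (N - P)    # first form of the formula
((N - 1) * R2 - (P - 1)) / (N - P)  # second form: algebraically the same value
```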
Stan code for Computing R2
Stan code
```stan
generated quantities {
  vector[nContrasts] computedEffects;
  real rss;
  real tss;
  real R2;
  real R2adj;

  computedEffects = contrast * beta;

  // compute R2
  { // anything in these brackets will not appear in the summary table
    vector[N] pred = X * beta;
    rss = dot_self(weightLB - pred);           // dot_self() returns the sum of squared elements
    tss = dot_self(weightLB - mean(weightLB));
  }
  R2 = 1 - rss / tss;
  R2adj = 1 - (rss / (N - P)) / (tss / (N - 1));
}
```
Recall that R's lm() function gives $R^2$ = 0.9787 and adjusted $R^2$ = 0.9742.
```r
# Create the function to compute the mode of a vector.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Calculate the mode using the user function.
getmode(fit_full_contrast$draws('R2'))
```
[1] 0.974917
```r
getmode(fit_full_contrast$draws('R2adj'))
```
[1] 0.96674
Wrapping up
Today we further added generated quantities to our Bayesian toolset:
How to write Stan models with less syntax by using matrices
How to form posterior distributions for functions of parameters
We will use both of these features in psychometric models.