Lecture 10: Mixed Models for Multivariate Regression

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

2024-10-09

Today's Class

Multivariate regression via mixed models

Comparing and contrasting path analysis with mixed models

Differences in model fit measures

Differences in software estimation methods

Model comparisons via multivariate Wald tests (instead of LRTs)

How to compute R-square

R Setup

library(ESRM64503)
library(kableExtra)
library(tidyverse)
library(DescTools) # Desc() allows you to quickly screen data
library(lavaan)    # for path analysis
head(dataMath)

id hsl cc use msc mas mse perf female
1 1 NA 9 44 55 39 NA 14 1
2 2 3 2 77 70 42 71 12 0
3 3 NA 12 64 52 31 NA NA 1
4 4 6 20 71 65 39 84 19 0
5 5 2 15 48 19 2 60 12 0
6 6 NA 15 61 62 42 87 18 0

dim(dataMath)

[1] 350 9

Correction about fixed.x argument in previous lecture

If TRUE, the exogenous x covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values.

If FALSE, they are considered random, and their means, variances and covariances are free parameters; they are then treated like exogenous latent variables.

If “default”, the value is set depending on the mimic option.

Thus, with fixed.x = TRUE we treated the distributions of the exogenous variables as known quantities (fixed to their sample values) rather than as estimated parameters.
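As a minimal sketch of what this setting changes, the snippet below fits the same regression twice; the model string and lavaan's built-in HolzingerSwineford1939 data are used only for demonstration and are not from this lecture:

```r
library(lavaan)

# Illustrative regression; x2 and x3 are exogenous covariates
model <- "x1 ~ x2 + x3"

# fixed.x = TRUE: distributions of x2, x3 fixed to their sample values
fit_fixed <- sem(model, data = HolzingerSwineford1939, fixed.x = TRUE)

# fixed.x = FALSE: variances and covariance of x2, x3 become free
# parameters (their means too, when meanstructure = TRUE)
fit_free <- sem(model, data = HolzingerSwineford1939, fixed.x = FALSE)

fitMeasures(fit_fixed, "npar") # slopes + residual variance
fitMeasures(fit_free, "npar")  # adds var(x2), var(x3), cov(x2, x3)
```

The parameter counts differ only because the exogenous variables' moments are estimated in the second fit; the slope estimates themselves are essentially unchanged.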

What Are Mixed Models?

A mixed model (also called a mixed-effects model or, in the linear case, a linear mixed model, LMM) is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological, and social sciences.

Mixed models can answer research questions similar to those addressed by path analysis (or structural equation models):

Relationships among multiple endogenous variables

Fixed effects vs. Random effects

Definition: Fixed effects are constant across individuals, and random effects vary.

Assume a person is measured t times (a repeated-measures or longitudinal design), so we have t pairs of x and y for each individual:

\[
y_{it} = \beta_{0i} + \beta_1 x_{it} + e_{it}
\]

Here, \(\beta_{0i}\) is a random intercept that varies across individuals, \(\beta_1\) is a fixed slope shared across individuals, and \(e_{it}\) is a residual.
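In the usual LMM notation (introduced here for completeness; \(\gamma_{00}\), \(u_{0i}\), and \(\tau^2_0\) are standard symbols not used elsewhere in this lecture), the random intercept decomposes into a fixed overall intercept plus a person-specific deviation:

\[
\beta_{0i} = \gamma_{00} + u_{0i}, \qquad u_{0i} \sim N(0, \tau^2_0)
\]

so \(\gamma_{00}\) is a fixed effect, while the deviations \(u_{0i}\) are the random effects.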

Alternative definition: effects are fixed if they are interesting in themselves, or random if the interest is in the underlying population they are drawn from.

\[
y_i = \beta_0 + \beta_1 x_i +e_i
\] Thus, \(e_i\) (with variance \(\sigma^2_e\)) is the random component, while \(\beta_0\) and \(\beta_1\) are fixed effects.

Properties of Mixed Models

Mixed models are used for many types of analyses:

Analogous to MANOVA and multivariate regression (i.e., repeated-measures analyses)

Multilevel models for clustered, longitudinal, and crossed-effects data

The biggest difference between mixed models and path analysis software is the assumed distribution of the exogenous variables:

Mixed models: no distribution assumed

Path analysis: most software assumes multivariate normal distribution

This affects how missing data are managed – mixed models cannot have any missing IVs

Mixed models also do not allow endogenous variables to predict other endogenous variables

No indirect effects are possible from a single analysis (multiple analyses needed)

Mixed models software also often needs variables to be stored in so-called “stacked” or long-format data (one row per DV)

We used wide-format data for lavaan (one row per person)

Wide to Long Data Transformation

Original wide-format data (all DVs for a person on one row)

dat <- dataMath
dat$cc10 <- dat$cc - 10
dat_wide <- dat |>
  select(id, perf, use, female, cc10)
head(dat_wide) # show first 6 lines

cols: Columns to pivot into longer format. Put multiple column names (unquoted) into c()

names_to: A character vector specifying the new column to create from the information stored in the column names

values_to: A string specifying the name of the column to create from the data stored in cell values
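Putting the three arguments together, a sketch of the pivot_longer() call they describe; the toy dat_wide below copies the first two persons from the head() output above, and the dPerf dummy (0 = perf, 1 = use) matches the long data shown in the tibble:

```r
library(tidyr)
library(dplyr)

# Toy wide-format data standing in for dat_wide (two persons only)
dat_wide <- data.frame(id = c(1, 2), female = c(1, 0), cc10 = c(-1, -8),
                       perf = c(14, 12), use = c(44, 77))

dat_long <- dat_wide |>
  pivot_longer(cols = c(perf, use),   # columns to stack
               names_to = "DV",       # old column names go here
               values_to = "score") |> # old cell values go here
  mutate(dPerf = if_else(DV == "use", 1, 0)) # DV indicator dummy
```

Each person now occupies two rows (one per DV), which is the "stacked" format the mixed-model software expects.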

# A tibble: 6 × 6
id female cc10 DV score dPerf
<int> <int> <dbl> <chr> <int> <dbl>
1 1 1 -1 perf 14 0
2 1 1 -1 use 44 1
3 2 0 -8 perf 12 0
4 2 0 -8 use 77 1
5 3 1 2 perf NA 0
6 3 1 2 use 64 1

Exercise 1: Wide-to-Long Transformation

Turn msc, mas, mse into long-form

dat_wide <- dat |>
  select(id, msc, mas, mse)
## Your turn

id DV Score
1 msc 55
1 mas 39
1 mse NA
2 msc 70

Statistical Form: Bridging Path Models with Mixed Models

Before we dive into mixed models, we will begin with a multivariate regression model:

Predicting mathematics performance (PERF) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

Predicting perceived usefulness (USE) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

Mixed Model: Here I use the symbol \(\delta\) to represent each fixed effect in the multivariate model from the mixed model perspective. dPerf is the DV indicator: 0 = Perf and 1 = Use

corSymm: general correlation structure; provides estimates of all unique correlations. Needs the id variable name after | so the program knows which rows come from which person; ~ 1 means the order of the observations in the data is used as the covariate, with no groups.


varIdent: a constant variance function structure; estimates a different residual variance for each DV. Together with the correlation line, this ensures an unstructured covariance model is estimated.
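Putting these pieces together, a sketch of the empty-model gls() call these annotations describe; the data here are simulated stand-ins for dat_long, with variable names taken from the output below:

```r
library(nlme)

# Toy long-format data: 50 persons, two DVs each (values illustrative)
set.seed(1)
dat_long <- data.frame(
  id    = rep(1:50, each = 2),
  DV    = rep(c("perf", "use"), times = 50),
  dPerf = rep(c(0, 1), times = 50),
  score = as.vector(rbind(rnorm(50, 14, 3), rnorm(50, 52, 16)))
)

model01_mixed <- gls(
  score ~ 1 + dPerf, data = dat_long, method = "REML",
  correlation = corSymm(form = ~ 1 | id), # within-person correlations
  weights = varIdent(form = ~ 1 | DV),    # separate residual SD per DV
  na.action = na.omit
)
summary(model01_mixed)
```

The combination of corSymm and varIdent reproduces an unstructured 2x2 covariance matrix for the two DVs.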

Estimating the Empty Model

Generalized least squares fit by REML
Model: score ~ 1 + dPerf
Data: dat_long
AIC BIC logLik
3774.313 3795.871 -1882.156
Correlation Structure: General
Formula: ~1 | id
Parameter estimate(s):
Correlation:
1
2 0.136
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | DV
Parameter estimates:
perf use
1.000000 5.337397
Coefficients:
Value Std.Error t-value p-value
(Intercept) 13.94085 0.1868631 74.60461 0
dPerf 38.46901 0.9399708 40.92575 0
Correlation:
(Intr)
dPerf -0.079
Standardized residuals:
Min Q1 Med Q3 Max
-3.24789265 -0.64196261 0.01956532 0.68109325 2.99644101
Residual standard error: 3.023304
Degrees of freedom: 553 total; 551 residual
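As a reading check on the fixed effects (with dPerf = 0 for performance and 1 for usefulness), the model-implied means follow directly from the coefficients above:

\[
\hat{\mu}_{\text{perf}} = 13.94, \qquad \hat{\mu}_{\text{use}} = 13.94 + 38.47 = 52.41
\]

which closely match the intercepts estimated by the path model (13.959 and 52.440).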

Empty Model Results I: Covariance Matrix of DVs

The covariance matrix of the DVs from the mixed model comes from the getVarCov() function; for comparison, the corresponding empty path model in lavaan gives:

lavaan 0.6.17 ended normally after 29 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Used Total
Number of observations 348 350
Number of missing patterns 3
Model Test User Model:
Standard Scaled
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Model Test Baseline Model:
Test statistic 6.064 5.573
Degrees of freedom 1 1
P-value 0.014 0.018
Scaling correction factor 1.088
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000 1.000
Tucker-Lewis Index (TLI) 1.000 1.000
Robust Comparative Fit Index (CFI) 1.000
Robust Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -2085.032 -2085.032
Loglikelihood unrestricted model (H1) -2085.032 -2085.032
Akaike (AIC) 4180.064 4180.064
Bayesian (BIC) 4199.325 4199.325
Sample-size adjusted Bayesian (SABIC) 4183.464 4183.464
Root Mean Square Error of Approximation:
RMSEA 0.000 NA
90 Percent confidence interval - lower 0.000 NA
90 Percent confidence interval - upper 0.000 NA
P-value H_0: RMSEA <= 0.050 NA NA
P-value H_0: RMSEA >= 0.080 NA NA
Robust RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: Robust RMSEA <= 0.050 NA
P-value H_0: Robust RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.000 0.000
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf ~~
use 6.847 2.850 2.403 0.016 6.847 0.147
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf 13.959 0.174 80.442 0.000 13.959 4.721
use 52.440 0.872 60.140 0.000 52.440 3.322
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf 8.742 0.754 11.596 0.000 8.742 1.000
use 249.245 19.212 12.973 0.000 249.245 1.000
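As a consistency check, converting the estimated covariance to a correlation reproduces the Std.all value:

\[
r_{\text{perf,use}} = \frac{6.847}{\sqrt{8.742 \times 249.245}} \approx 0.147
\]

close to (though not identical to) the 0.136 from the mixed model, which uses REML and handles missing data differently.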

Comparing and Contrasting Results: Intercept (fixed effect)

Note: R’s nlme does not handle df.residual well for these tests, so it provides a Chi-square (Wald) test instead

Also note there are 6 degrees of freedom (one for each additional beta weight in the model)

Questions that can be answered

What is the effect of college experience on usefulness for males?

What is the effect of college experience on usefulness for females?

What is the difference between males' and females' ratings of usefulness when college experience = 10?

How did the difference between males' and females' ratings change for each additional hour of college experience?

What is the effect of college experience on performance for males?

What is the effect of college experience on performance for females?

What is the difference between males' and females' performance when college experience = 10?

How did the difference between males' and females' performance change for each additional hour of college experience?
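Using the \(\delta\) notation, and assuming the centered predictor cc10 = cc − 10 created earlier, the usefulness equation can be written \(USE_i = \delta_0 + \delta_1 F_i + \delta_2 CC10_i + \delta_3 (F_i \times CC10_i)\); the four usefulness questions then map onto combinations of fixed effects:

\[
\begin{aligned}
\text{CC effect for males } (F = 0) &: \ \delta_2 \\
\text{CC effect for females } (F = 1) &: \ \delta_2 + \delta_3 \\
\text{F vs. M difference at } CC = 10 \ (CC10 = 0) &: \ \delta_1 \\
\text{change in that difference per hour} &: \ \delta_3
\end{aligned}
\]

The performance questions map onto the analogous \(\delta\)s in the PERF equation.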

Model R-squared

To determine the model R-squared, we have to compare the variance/covariance matrices from model01 and model02 and compute the statistic ourselves:
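In equation form, for each DV the statistic is the proportional reduction in variance from the empty model (model01) to the conditional model (model02):

\[
R^2_{DV} = \frac{\hat{\sigma}^2_{DV,\,\text{empty}} - \hat{\sigma}^2_{DV,\,\text{conditional}}}{\hat{\sigma}^2_{DV,\,\text{empty}}}
\]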

Vmodel01 = getVarCov(model01_mixed)
Vmodel02 = getVarCov(model02_mixed)
## R-square for Performance and Usefulness
(diag(Vmodel01) - diag(Vmodel02)) / diag(Vmodel01)

[1] 0.064686451 0.003341706

6.47% of the variance in performance was explained by the added predictors.

0.33% of the variance in usefulness was explained by the added predictors.

Exercise 2: Fit a mixed model with mse, mas, and msc

Model 1: A path analysis with mse, mas, and msc as outcomes

Model 2: An empty mixed model where mse, mas, and msc are repeated measures nested within each individual

Compare the two models: intercepts and correlations

Wrapping up

Things we get directly from path models that we do not get directly in mixed models:

Tests for approximate model fit

Scaled Chi-square for some types of non-normal data

Standardized parameter coefficients

Tests for indirect effects

R-squared statistics

Things we get directly in mixed models that we do not get in path models:

REML (unbiased estimates of variances/covariances)

In this lecture we discussed the basics of mixed model analyses for multivariate models

Model specification/identification

Model estimation

Model modification and re-estimation

Final model parameter interpretation

There is a lot to the analysis

but what is important to remember is the over-arching principle of multivariate analyses: covariance between variables is important

Mixed models imply very specific covariance structures

The validity of the results still hinges upon accurately approximating the covariance matrix