Lecture 10: Mixed Models for Multivariate Regression

Author

Affiliation

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

Published

October 9, 2024

Modified

October 11, 2024

0.1

Today's Class

Multivaraite regression via mixed models

Comparing and contrasting path analysis with mixed models

Differences in model fit measures

Differences in software estimation methods

Model comparisons via multivariate Wald tests (instead of LRTs)

How to compute R-square

0.2 R Setup

library(ESRM64503)library(kableExtra)library(tidyverse)library(DescTools) # Desc() allows you to quick screen datalibrary(lavaan) # Desc() allows you to quick screen datahead(dataMath)

id hsl cc use msc mas mse perf female
1 1 NA 9 44 55 39 NA 14 1
2 2 3 2 77 70 42 71 12 0
3 3 NA 12 64 52 31 NA NA 1
4 4 6 20 71 65 39 84 19 0
5 5 2 15 48 19 2 60 12 0
6 6 NA 15 61 62 42 87 18 0

dim(dataMath)

[1] 350 9

0.3 Correction about fixed.x argument in previous lecture

If TRUE, the exogenous x covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values.

If FALSE, they are considered random, and the means, variances and covariances are free parameters. Typically, called latent variable

If “default”, the value is set depending on the mimic option.

Thus, we considered the distributions of exogenous variables as known parameters.

0.4 What is Mixed Models

A mixed model, mixed-effects model or Linear mixed models (LMMs) is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences.

Mixed model can answer similar research questions as Path Analysis (or structural equation model):

Relationships among multiple endogenous variables

0.5 Fixed effects vs. Random effects

Definition: Fixed effects are constant across individuals, and random effects vary.

Assume a person is measured t times (repeated measure design or longitudinal design), thus we have t points of x and y for each individuals

y_{it} = \beta_{0i} + \beta_1 x_{it}

Here, \beta_{0i} is random intercept that varies across individuals. \beta_{1} is the fixed slope that shared acorss individuals.

Alternative definition: Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population.

y_i = \beta_0 + \beta_1 x_i +e_i
Thus, \sigma^e is random effect, \beta_0 and \beta_1 are fixed effects.

0.6 Properties of Mixed Models

Mixed models are used for many types of analyses:

Analogous to MANOVA and M-Regression (so repeated measures analyses)

Multilevel models for clustered, longitudinal, and crossed-effects data

The biggest difference between mixed models and path analysis software is the assumed distribution of the exogenous variables：

Mixed models: no distribution assumed

Path analysis: most software assumes multivariate normal distribution

This affects how missing data are managed – mixed models cannot have any missing IVs

Mixed models also do not allow endogenous variables to predict other endogenous variables

No indirect effects are possible from a single analysis (multiple analyses needed)

Mixed models software also often needs variables to be stored in so-called “stacked” or long-format data (one row per DV)

We used wide-format data for lavaan (one row per person)

0.7 Wide to Long Data Transformation

Original wide-format data (all DVs for a person on one row)

dat <- dataMathdat$cc10 <- dat$cc -10dat_wide <- dat |>select(id, perf, use, female, cc10)head(dat_wide) # show first 6 lines

cols: Columns to pivot into longer format. Put multiple column names (no quote) into c()

2

names_to: A character vector specifying the new column to create from the information stored in the column names

3

values_to: A string specifying the name of the column to create from the data stored in cell values.

# A tibble: 6 × 6
id female cc10 DV score dPerf
<int> <int> <dbl> <chr> <int> <dbl>
1 1 1 -1 perf 14 0
2 1 1 -1 use 44 1
3 2 0 -8 perf 12 0
4 2 0 -8 use 77 1
5 3 1 2 perf NA 0
6 3 1 2 use 64 1

0.8 Execise 1: Wide to Long Transform

Turn msc, mas, mse into long-form

dat_wide <- dat |>select(id, msc, mas, mse)## You turn

id

DV

Score

1

msc

55

1

mas

39

1

mse

NA

2

msc

70

0.9 Statistical Form: Bridge Path Model w/t Mixed Model

Before we dive into mixed models, we will begin with a multivariate regression model:

Predicting mathematics performance (PERF) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

Predicting perceived usefulness (USE) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

Mixed Model: Here I use the symbol \delta to represent each fixed effect in the multivariate model from the mixed model perspective. dPERF is the DV indicator: 1 - Perf and 0 - Use

corSymm: General Correlation Structure; Provides estimates of all unique correlations; Needs id variable name after | for program to know which data comes from which person; ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

4

varIdent: a constant variance function structure; Estimates a different (residual) variance for each DV; With correlation line ensures an unstructured model is estimated

Generalized least squares fit by REML
Model: score ~ 1 + dPerf
Data: dat_long
AIC BIC logLik
3774.313 3795.871 -1882.156
Correlation Structure: General
Formula: ~1 | id
Parameter estimate(s):
Correlation:
1
2 0.136
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | DV
Parameter estimates:
perf use
1.000000 5.337397
Coefficients:
Value Std.Error t-value p-value
(Intercept) 13.94085 0.1868631 74.60461 0
dPerf 38.46901 0.9399708 40.92575 0
Correlation:
(Intr)
dPerf -0.079
Standardized residuals:
Min Q1 Med Q3 Max
-3.24789265 -0.64196261 0.01956532 0.68109325 2.99644101
Residual standard error: 3.023304
Degrees of freedom: 553 total; 551 residual

1.3 Empty Model Results I: Covariance Matrix of DVs

The covariance matrix of DVs comes from the getVarCov() function

lavaan 0.6.17 ended normally after 29 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Used Total
Number of observations 348 350
Number of missing patterns 3
Model Test User Model:
Standard Scaled
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Model Test Baseline Model:
Test statistic 6.064 5.573
Degrees of freedom 1 1
P-value 0.014 0.018
Scaling correction factor 1.088
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000 1.000
Tucker-Lewis Index (TLI) 1.000 1.000
Robust Comparative Fit Index (CFI) 1.000
Robust Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -2085.032 -2085.032
Loglikelihood unrestricted model (H1) -2085.032 -2085.032
Akaike (AIC) 4180.064 4180.064
Bayesian (BIC) 4199.325 4199.325
Sample-size adjusted Bayesian (SABIC) 4183.464 4183.464
Root Mean Square Error of Approximation:
RMSEA 0.000 NA
90 Percent confidence interval - lower 0.000 NA
90 Percent confidence interval - upper 0.000 NA
P-value H_0: RMSEA <= 0.050 NA NA
P-value H_0: RMSEA >= 0.080 NA NA
Robust RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: Robust RMSEA <= 0.050 NA
P-value H_0: Robust RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.000 0.000
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf ~~
use 6.847 2.850 2.403 0.016 6.847 0.147
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf 13.959 0.174 80.442 0.000 13.959 4.721
use 52.440 0.872 60.140 0.000 52.440 3.322
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
perf 8.742 0.754 11.596 0.000 8.742 1.000
use 249.245 19.212 12.973 0.000 249.245 1.000

1.5 Comparing and Contrasting Results: Intercept (fixed effect)

Note: R’s nlme function doesn’t do a good job with df.residual and provides a Chi-square test

Also note there are 6 degrees of freedom (one for each additional beta weight in the model)

2.4 Questions that can be answered

What is the effect of college experience on usefulness for males?

What is the effect of college experience on usefulness for females?

What is the difference between males and females ratings of usefulness when college experience = 10?

How did the difference between males and females ratings change for each additional hour of college experience?

What is the effect of college experience on performance for males?

What is the effect of college experience on performance for females?

What is the difference between males and females performance when college experience = 10?

How did the difference between males and females performance change for each additional hour of college experience?

2.5 Model R-squared

To determine the model R-squared, we have to compare the variance/covariance matrix from model01 and model02 and make the statistics ourselves:

Vmodel01 =getVarCov(model01_mixed)Vmodel02 =getVarCov(model02_mixed)## Rsquare for Performance and Usefulness(diag(Vmodel01) -diag(Vmodel02)) /diag(Vmodel01)

[1] 0.064686451 0.003341706

6.47% variance of performance was explained by added predictors.

0.33% variance of usefulness was explained by added predictors.

2.6 Exercise 2: Model mixed model with mse, mas and msc

Model 1: A path analysis with mse, mas, and msc as outcomes

Model 2: A empty mixed model with mse, mas, and msc are repeated measures nested in each individual

Compare two models: intercepts and correlations

2.7 Wrapping up

Things we get directly from path models that we do not get directly in mixed models:

Tests for approximate model fit

Scaled Chi-square for some types of non-normal data

Standardized parameter coefficients

Tests for indirect effects

R-squared statistics

Things we get directly in mixed models that we do not get in path models:

REML (unbiased estimates of variances/covariances)

In this lecture we discussed the basics of mixed model analyses for multivariate models

Model specification/identification

Model estimation

Model modification and re-estimation

Final model parameter interpretation

There is a lot to the analysis

but what is important to remember is the over-arching principal of multivariate analyses: covariance between variables is important

Mixed models imply very specific covariance structures

The validity of the results still hinge upon accurately finding an approximation to the covariance matrix

---title: "Lecture 10: Mixed Models for Multivariate Regression"subtitle: ""author: "Jihong Zhang*, Ph.D"institute: | Educational Statistics and Research Methods (ESRM) Program* University of Arkansasdate: "2024-10-09"date-modified: "2024-10-11"sidebar: falseexecute: echo: true warning: falseoutput-location: defaultcode-annotations: belowhighlight-style: "nord"format: uark-revealjs: scrollable: true chalkboard: true embed-resources: false code-fold: false number-sections: false footer: "ESRM 64503 - Lecture 10: Introduction to mixed model" slide-number: c/t tbl-colwidths: auto output-file: slides-index.html html: page-layout: full toc: true toc-depth: 2 toc-expand: true lightbox: true code-fold: false fig-align: centerfilters: - quarto - line-highlight---## ```{=html}<div class="card shadow"> <div class="ml-3 mt-2"> <svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"> <g fill="none" fill-rule="evenodd" transform="translate(1 1)"> <circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle> <circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle> <circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle> </g> </svg> </div> <div class="card-body"> <h4 class="card-title"><b>Today's Class</b></h4> <ul> <li>Multivaraite regression via <b>mixed models</b></li> <li>Comparing and contrasting path analysis with mixed models</li> <ul> <li>Differences in model fit measures </li> <li>Differences in software estimation methods </li> <li>Model comparisons via multivariate Wald tests (instead of LRTs) </li> <li>How to compute R-square </li> </ul> </ul> </div></div>```## R Setup```{r}#| output-location: defaultlibrary(ESRM64503)library(kableExtra)library(tidyverse)library(DescTools) # Desc() allows you to quick screen datalibrary(lavaan) # Desc() allows you to quick screen datahead(dataMath)dim(dataMath)```## Correction about `fixed.x` argument in previous lecture- If TRUE, the exogenous `x` covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values.- If FALSE, they are considered random, and the means, variances and covariances are free parameters. Typically, called **latent variable**- If "default", the value is set depending on the mimic option.[Thus, we considered the distributions of exogenous variables as known parameters.]{.underline}## What is Mixed Models- A **mixed model**, **mixed-effects model** or **Linear mixed models** (LMMs) is a [statistical model](https://en.wikipedia.org/wiki/Statistical_model "Statistical model") containing both [fixed effects](https://en.wikipedia.org/wiki/Fixed_effect "Fixed effect") and [random effects](https://en.wikipedia.org/wiki/Random_effect "Random effect"). These models are useful in a wide variety of disciplines in the physical, biological and social sciences.- Mixed model can answer similar research questions as Path Analysis (or structural equation model): - Relationships among multiple endogenous variables## Fixed effects vs. Random effects1. Definition: Fixed effects are constant across individuals, and random effects vary.Assume a person is measured t times (repeated measure design or longitudinal design), thus we have t points of x and y for each individuals$$y_{it} = \beta_{0i} + \beta_1 x_{it}$$Here, $\beta_{0i}$ is random intercept that varies across individuals. $\beta_{1}$ is the fixed slope that shared acorss individuals.2. Alternative definition: Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population.$$y_i = \beta_0 + \beta_1 x_i +e_i$$ Thus, $\sigma^e$ is random effect, $\beta_0$ and $\beta_1$ are fixed effects.## Properties of Mixed Models1. Mixed models are used for many types of analyses: - Analogous to **MANOVA** and M-Regression (so repeated measures analyses) - Multilevel models for *clustered*, *longitudinal*, and *crossed-effects* data2. The biggest difference between mixed models and path analysis software is the [assumed distribution of the exogenous variables]{.underline}： - Mixed models: **no distribution assumed** - Path analysis: most software assumes **multivariate normal distribution** - This affects how missing data are managed – mixed models cannot have any missing IVs3. Mixed models also do not allow endogenous variables to predict other endogenous variables - No indirect effects are possible from a single analysis (multiple analyses needed)4. Mixed models software also often needs variables to be stored in so-called “stacked” or long-format data (one row per DV) - We used **wide-format** data for `lavaan` (one row per person)## Wide to Long Data Transformation- Original wide-format data (all DVs for a person on one row)```{r}dat <- dataMathdat$cc10 <- dat$cc -10dat_wide <- dat |>select(id, perf, use, female, cc10)head(dat_wide) # show first 6 lines```- Reshape with `pivot_longer()` function and Resulting data:```{r}dat_long <- dat_wide |>pivot_longer(cols =c(perf, use), #<1>names_to ="DV", #<2>values_to ="score") |>#<3>mutate(dPerf =ifelse(DV =='perf', 0, 1)) # convert DVs into indicator variable - dperfhead(dat_long)```1. `cols`: Columns to pivot into longer format. Put multiple column names (no quote) into `c()`2. `names_to`: A character vector specifying the new column to create from the information stored in the column names3. `values_to`: A string specifying the name of the column to create from the data stored in cell values.## Execise 1: Wide to Long Transform- Turn msc, mas, mse into long-form```{r}#| eval: falsedat_wide <- dat |>select(id, msc, mas, mse)## You turn```| id | DV | Score ||-----|-----|-------|| 1 | msc | 55 || 1 | mas | 39 || 1 | mse | NA || 2 | msc | 70 |## Statistical Form: Bridge Path Model w/t Mixed Model- Before we dive into mixed models, we will begin with a multivariate regression model: - Predicting mathematics performance (PERF) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC) - Predicting perceived usefulness (USE) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)$$PERF_i = \beta_{0,PERF} + \beta_{F,PERF} F_i + \beta_{CC,PERF}CC_i + \beta_{F*CC,PERF}F_i*CC_i + e_{i,PERF}$$ {#eq-pathmodel1} $$USE_i = \beta_{0,USE} + \beta_{F,USE} F_i + \beta_{CC,USE}CC_i + \beta_{F*CC,USE}F_i*CC_i + e_{i,USE}$$ {#eq-pathmodel2}- Mixed Model: Here I use the symbol $\delta$ to represent each fixed effect in the multivariate model from the mixed model perspective. `dPERF` is the DV indicator: 1 - Perf and 0 - Use$$Score_i = (\delta_{0,PERF} + \delta_{0, dPERF}) + (\delta_{F,PERF} + \delta_{F, dPERF}) F_i + (\delta_{CC,PERF} \\ + \delta_{CC, dPERF}) CC_i + (\delta_{F*CC,PERF} + \delta_{F*CC,dPERF}) F_i*CC_i + e_{i,PERF} + e_{i,dPERF}$$ {#eq-mixedmodel1}$$Score_i = \delta_{0,PERF} + \delta_{0, dPERF}dPERF_i + \delta_{F,PERF} F_i + \delta_{F, dPERF}dPERF_i * F_i \\+ \delta_{CC,PERF} CC_i + \delta_{CC, dPERF} dPERF_i * CC_i \\+ \delta_{F*CC,PERF} F_i*CC_i + \delta_{F*CC,dPERF} dPERF_i * F_i*CC_i + e_{i,PERF} + e_{i,dPERF}$$ {#eq-mixedmodel2}# Build the Empty Model: Not So Empty## Statistical Form of empty mixed model- For illustration, let's start from the empty model- A multivariate model using mixed model software uses the dummy code for DV to make all effects conditional on the specific DV in the model - I will compare/contrast these with the symbols $\beta$ from the fixed effects in path analysis- For instance, our empty model is thus: - Where *Score* is condition on the value of *dPerf*: - When $dPerf = 0$ $\rightarrow$ DV = "Use" $\rightarrow$ $Score_i = Use_i = \delta_0 + e_{i, Use}$ - When $dPerf = 1$ $\rightarrow$ DV = "Perf" $\rightarrow$ $Score_i = Perf_i = \delta_0 + \delta_1 + e_{i, perf}$$$Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV}$$## Estimating the Empty Model- From the `nlme` library, we will use the `gls()` function - Be sure the library is installed and loaded before trying this!```{r}#| output-location: slidelibrary(nlme) # install.packages("nlme")dat_long <- dat_long[complete.cases(dat_long), ]# create empty model using REML estimation to attempt to mirror initial analysis:model01_mixed =gls(model = score ~1+ dPerf, #<1>data = dat_long,method ="REML", #<2>correlation =corSymm(form =~1|id), #<3>weights =varIdent(form =~1|DV)) #<4>summary(model01_mixed)```1. `score ~ 1 + dPerf`: $Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV}$2. "REML": Residual Maximum Likelihood Estimation3. `corSymm`: General Correlation Structure; Provides estimates of all unique correlations; Needs `id` variable name after \| for program to know which data comes from which person; `~ 1`, which corresponds to using the order of the observations in the data as a covariate, and no groups.4. `varIdent`: a constant variance function structure; Estimates a different (residual) variance for each DV; With correlation line ensures an unstructured model is estimated## Empty Model Results I: Covariance Matrix of DVs- The covariance matrix of DVs comes from the `getVarCov()` function```{r}getVarCov(model01_mixed)```- Estimated variance-covariance matrix of PERF and USE scores.## Mapping Multivariate Mixed Models onto Path Models- To compare this result with the path analyses we conducted previously, we’ll have to use this data set - Omit the same observations- So, we’ll need to take our long-format data and reshape it into wide-format:```{r}head(dat_wide)``````{r}#| output-location: slidelibrary(lavaan)model01_mirror.syntax ="# Means:perf ~ 1use ~ 1# Variances:perf ~~ perfuse ~~ use# Covariance:perf ~~ use"model01_path_noNA.fit =sem(model01_mirror.syntax, data = dat_wide,fixed.x =TRUE, mimic ="MPLUS", estimator ="MLR")summary(model01_path_noNA.fit, fit.measures =TRUE,standardized =TRUE)```## Comparing and Contrasting Results: Intercept (fixed effect)- $\beta_{0,Perf}$ and $\beta_{0,Use}$:```{r}parameterestimates(model01_path_noNA.fit) |>filter(op =="~1")```- $\delta_{0,DV}$ and $\delta_{1,DV}$:```{r}summary(model01_mixed)$tTable```$\delta_{0,DV} + \delta_{1,DV}$: `r (13.94085+38.46901)` is close to $\beta_{0,Use}$## Comparing and Contrasting Results: Residual variance coviarance```{r}parameterestimates(model01_path_noNA.fit) |>filter(op =="~~") |>select(lhs, rhs, est) |>pivot_wider(names_from = rhs, values_from = est)```In mixed model, we cannot get the z-value (significance testing)```{r}getVarCov(model01_mixed)```## Comparing and Contrasting Results: Correlation```{r}standardizedsolution(model01_path_noNA.fit) |>filter(op =="~~", lhs != rhs)``````{r}model01_mixed$modelStruct$corStruct```## REML: Residual Maximum Likelihood Estimation- The ML estimator is nice, but the variance estimate is downward biased (too small) - Remember – it divides by N for the residual covariance matrix- In small samples, this is likely to lead to biased estimates and incorrect p-values - The variance goes into the SE, which goes into the Wald test, which dictates the p-value for the beta- Instead, another maximum likelihood technique has been developed: **Residual Maximum Likelihood** (REML) - Maximizes the likelihood of the residuals rather than the data - Has unbiased estimates of the residual covariance matrix - Is the default method of estimation for most mixed model estimation packages------------------------------------------------------------------------- There is one catch to REML: you cannot use a LRT to compare nested models with differing fixed effects - Because the algorithm uses residuals not data likelihood, if the residuals change, the likelihood changes - Residuals come from the fixed effects $\rightarrow$ if fixed effects are different, then residuals change, causing the likelihood to change - Can use multivariate Wald test for fixed effects- Don't mix ML and REML for the same analysis# Adding Predictors To The Model## Adding more predictors: female and cc10- Adding predictors to the model is similar to adding predictors in regular regression models- By using REML we cannot compare models using likelihood ratio tests - REML LRTs must have same fixed effects - Adding predictors adds new fixed effects to the empty model- We are predicting each DV with `female`, `cc10`, and `female*cc10`## Model with Predictors: Syntax```{r}#| output-location: slide# Model 02: all predictors includedmodel02_formula =as.formula("score ~ 1 + dPerf + female + dPerf*female + cc10 + dPerf*cc10 + female*cc10 + dPerf*female*cc10")model02_mixed <-gls(model = model02_formula, method ="REML",data = dat_long, correlation =corSymm(form =~1|id),weights =varIdent(form =~1|DV))summary(model02_mixed)``````{r}getVarCov(model02_mixed)``````{r}summary(model02_mixed)$tTable |>round(3)```1. $\beta_0 = 13.689, p < .001$2. $\beta_{dPerf} = 38.110, p < .001$3. $\beta_{female} = 0.658, p = 0.087$4. $\beta_{cc10} = 0.099, p = 0.006$5. $\beta_{dPerf*female} = 1.177, p = 0.558$6. $\beta_{dPerf*cc10} = 0.097, p = 0.626$7. $\beta_{female*cc10} = 0.094, p = 0.163$8. $\beta_{dPerf*female*cc10} = 0.166, p = 0.637$## First Question: Which Model “Fits” Better?- After adding the predictors (estimating their betas) to the model, we must first ask which model fits better- A likelihood ratio test (LRT) cannot be performed as we are using REML- Use multivariate wald test```{r}library(multcomp)model2_model_matrix <-diag(rep(1, 8))rownames(model2_model_matrix) <-c("Intercept","dPerf","female","cc10","dPerf:female","dPerf:cc10","female:cc10","dPerf:female:cc10")effects <-glht(model = model02_mixed, linfct = model2_model_matrix)summary(effects)summary(effects, test =Ftest())```- Note: R’s `nlme` function doesn’t do a good job with df.residual and provides a Chi-square test- Also note there are 6 degrees of freedom (one for each additional beta weight in the model)## Questions that can be answered- What is the effect of college experience on usefulness for males?- What is the effect of college experience on usefulness for females?- What is the difference between males and females ratings of usefulness when college experience = 10?- How did the difference between males and females ratings change for each additional hour of college experience?- What is the effect of college experience on performance for males?- What is the effect of college experience on performance for females?- What is the difference between males and females performance when college experience = 10?- How did the difference between males and females performance change for each additional hour of college experience?## Model R-squaredTo determine the model R-squared, we have to compare the variance/covariance matrix from model01 and model02 and make the statistics ourselves:```{r}Vmodel01 =getVarCov(model01_mixed)Vmodel02 =getVarCov(model02_mixed)## Rsquare for Performance and Usefulness(diag(Vmodel01) -diag(Vmodel02)) /diag(Vmodel01)```- 6.47% variance of performance was explained by added predictors.- 0.33% variance of usefulness was explained by added predictors.## Exercise 2: Model mixed model with mse, mas and msc- Model 1: A path analysis with mse, mas, and msc as outcomes- Model 2: A empty mixed model with mse, mas, and msc are repeated measures nested in each individual- Compare two models: intercepts and correlations```{r}#| echo: false#| eval: falseexe2_pathmodel.syntax ="# Means:mse ~ 1mas ~ 1msc ~ 1# Variances:mse ~~ msemas ~~ masmsc ~~ msc# Covariance:mse ~~ masmse ~~ mscmas ~~ msc"exe2_pathmodel.fit =sem(exe2_pathmodel.syntax, data = dat,fixed.x =TRUE, mimic ="MPLUS", estimator ="MLR")## Mixed modeldat_exe2 <- dat |> dplyr::select(id, msc, mas, mse) |>pivot_longer(cols =c(msc, mas, mse), names_to ="survey", values_to ="score")dat_exe2_complete <- dat_exe2[complete.cases(dat_exe2),]dat_exe2_complete$survey <-factor(dat_exe2_complete$survey, levels =c("mse", "mas", "msc"))exe2_mixedmodel <-gls(model = score ~1+ survey, data = dat_exe2_complete, method ="REML",correlation =corSymm(form =~1|id),weights =varIdent(form =~1|survey))## Interceptparameterestimates(exe2_pathmodel.fit) |>filter(op =="~1") |> dplyr::select(lhs, est) # Path Modelintercept_mixed_model <-summary(exe2_mixedmodel)$tTableintercept_mixed_model_vector <-c(intercept_mixed_model[1], intercept_mixed_model[2:3, 1] + intercept_mixed_model[1, 1] )names(intercept_mixed_model_vector) <-c("mse", "mas", "msc")intercept_mixed_model_vector## Residual correlationstandardizedsolution(exe2_pathmodel.fit) |>filter(op =="~~") |> dplyr::select(lhs, rhs, est.std) |>pivot_wider(names_from = rhs, values_from = est.std) # mse~~mas:0.565962; mse~~msc:0.6446238; mas~~msc:0.8695978exe2_mixedmodel$modelStruct$corStruct # msc~~mas: 0.807; msc~~mse: 0.643; mas~~mse: 0.531```## Wrapping up- Things we get directly from path models that we do not get directly in mixed models: - Tests for approximate model fit - Scaled Chi-square for some types of non-normal data - Standardized parameter coefficients - Tests for indirect effects - R-squared statistics- Things we get directly in mixed models that we do not get in path models: - REML (unbiased estimates of variances/covariances)- In this lecture we discussed the basics of mixed model analyses for multivariate models - Model specification/identification - Model estimation - Model modification and re-estimation - Final model parameter interpretation- There is a lot to the analysis - but what is important to remember is the over-arching principal of multivariate analyses: covariance between variables is important - Mixed models imply very specific covariance structures - The validity of the results still hinge upon accurately finding an approximation to the covariance matrix