Network Psychometrics

Jihong Zhang

Educational Statistics and Research Methods

- Understand what network psychometrics is
- Understand the relationship between psychometric/psychological networks and factor analysis
- Understand how network models can be applied to real scenarios

Network analysis is a broad area. It goes by many names in different fields:

1. Graphical Models (Computer Science, Machine Learning)
2. Bayesian Network (Computer Science, Educational Measurement)
3. Social Network (Sociology)
4. Psychological/Psychometric Network (Psychopathology, Psychology)
5. Structural Equation Model, Path Analysis (Psychology, Education)

1 and 2 focus on the probabilistic relationships, and further the causal relationships, among variables. 3 and 4 focus on network structure and node importance. 5 focuses on the regression coefficients of the structural and measurement models.

All 5 analysis methods have a network-shaped diagram. __Graphical modeling__ is a more “general” term that can encompass the other network models.

- Bayesian Network (BN) aims to derive the causal relations between variables
- Social Network aims to examine the network structure (community, density, or centrality) of __individuals__
- Factor Analysis aims to identify latent variables
- Psychological Network aims to examine the associations among observed variables (topological structures) and their positions in the network

- Network psychometrics is a novel area that allows representing complex phenomena of measured constructs in terms of a set of elements that interact with each other.
- It is inspired by the so-called __mutualism model__ and research in ecosystem modeling (Kan, Maas, and Levine 2019).
  - A mutualism model proposes that basic cognitive abilities directly and positively interact during development.

- Psychometric networks arise as __dynamics or reciprocal causation__ among variables receive more attention.
  - For example, individual differences in depression could arise from, and could be maintained by, vicious cycles of mutual relationships among symptoms.
  - A depression symptom such as insomnia can cause another symptom, such as fatigue, which in turn can lead to concentration problems and worrying, which can result in more insomnia, and so on.

__Factor analysis__ (common factor model) assumes that the associations between observed features can be explained by one or more common factors.

- For example, a higher “depression” level leads to an increased frequency of depressive behaviors

__Psychometric network__, however, assumes that the associations between observed features ARE the reason for the development of depression. In other words, “depression” is the network itself.

- An unavoidable cycle of “fatigue -> worrying -> insomnia -> fatigue” leads to higher “depression”

- Explain the pathways of a certain psychological phenomenon
- Identify the most important problem that needs to be targeted in the treatment procedure
- Examine group differences in interactions among observed features
- Examine the density of the network: a denser network indicates more dynamics among certain problems
- Examine clusters/communities of observed features: some symptoms are more likely to co-occur than others

- __Network structure estimation__: the application of statistical models to assess the structure of pairwise (conditional) associations in multivariate data.
- __Network description__: characterization of the global topology and the position of individual nodes in that topology.
- __Psychometric network analysis__: the analysis of multivariate psychometric data using network structure estimation and network description.
- __Node__: psychometric variables that are selected for the network, such as responses to questionnaire items, symptom ratings, cognitive test scores, background variables such as age and gender, and experimental interventions.
- __Edge (conditional association)__: an association between variables that takes into account other variables that may explain the association.
- __Edge weight__: parameter estimates that represent the strength of conditional association between nodes.
- __Node centrality__: the relative importance of a node in the context of other nodes, which can be calculated using different statistics.
- __Node selection__: the choice of which variables will function as nodes in the network model.
- __Network stability analysis__: the assessment of estimation precision and robustness to sampling error of psychometric networks.
- __Pairwise Markov random field (PMRF)__: an undirected network that represents variables as nodes and conditional associations as edges, in which unconnected nodes are conditionally independent.

Psychometric network analysis is exploratory by nature. To obtain a meaningful network structure, weak edges need to be dropped while strong edges are kept.

This procedure is typically called edge selection. One popular edge selection method is __regularization__.

- A network estimated without regularization is called a saturated network; one estimated with regularization is called a regularized network

Psychometric network analysis methodology includes steps of network structure estimation (to construct the network), network description (to characterize the network) and network stability analysis (to assess the robustness of results).

- Cross-sectional data (N = large, T = 1)
  - Ising model for binary variables
  - Gaussian graphical model (GGM; Foygel & Drton, 2010) for continuous variables
  - Mixed graphical model (MGM) for mixed types of variables
- Panel data (N >> T)
  - e.g., longitudinal data, repeated measures
  - Multilevel graphical vector autoregression
- Time-series data (\(N \geq 1\), T = large)
  - e.g., ecological momentary assessment, conducted via smartphones
  - Graphical vector autoregression

The GGM is the type of pairwise Markov random field (PMRF) used when data are continuous.

In a PMRF, the joint likelihood of multivariate data is modelled through the use of pairwise conditional associations, leading to a network representation that is undirected.

For \(p\)-dimensional data following a multivariate normal distribution:

\[ \newcommand{\bb}[1]{\boldsymbol{#1}} \bb{X} \sim \mathcal{N}(\mu, \bb{K}^{-1}) \]

where \(\bb{K}\) is the inverse covariance matrix of \(\bb{X}\) (\(\bb{K} = \bb{\Sigma}^{-1}\)), also known as the *precision* or *concentration* matrix.

To obtain a sparse network structure, the element in the \(i\)th row and \(j\)th column of \(\bb{K}\) is set to zero, \(k_{ij}=0\), when edge \(\{i, j\}\) is not included in the network \(G\).

- This means \(X^{(i)}\) and \(X^{(j)}\) are independent conditional on the other variables.

The GGM can be standardized as a partial correlation network, in which each edge represents the partial correlation between two nodes.

\[ \rho_{ij}=\rho(X^{(i)}, X^{(j)}|\bb{X}^{-(i,j)}) = -\frac{k_{ij}}{\sqrt{k_{ii}}\sqrt{k_{jj}}} \]

Assume there are three variables (fatigue, insomnia, and concentration) with true covariance matrix \(\bb{\Sigma}\):

```
      [,1]  [,2]  [,3]
[1,]  1.00 -0.26  0.31
[2,] -0.26  1.00 -0.08
[3,]  0.31 -0.08  1.00
```

- Someone who is tired is also more likely to suffer from concentration problems and insomnia.
- Concentration problems and insomnia are conditionally independent given the level of fatigue.
- Put differently, the correlation between insomnia and concentration can be fully explained by the relationships of both variables with fatigue.
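The conditional independence described above can be checked numerically. The following is a minimal base R sketch, assuming the matrix shown above is the covariance matrix and that its rows are ordered fatigue, insomnia, concentration (the ordering and the object names `Sigma`, `K`, and `pcor` are illustrative assumptions, not part of the original analysis).

```r
# Covariance matrix from the example above; the variable order is an assumption.
vars <- c("fatigue", "insomnia", "concentration")
Sigma <- matrix(c( 1.00, -0.26,  0.31,
                  -0.26,  1.00, -0.08,
                   0.31, -0.08,  1.00),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Precision matrix K = Sigma^{-1}
K <- solve(Sigma)

# Partial correlations: rho_ij = -k_ij / sqrt(k_ii * k_jj)
pcor <- -K / sqrt(outer(diag(K), diag(K)))
diag(pcor) <- 0  # by convention, no self-loops in the network

round(pcor, 3)
```

The partial correlation between insomnia and concentration is close to zero (about 0.001), while fatigue keeps sizeable partial correlations with insomnia (about -0.248) and with concentration (about 0.300).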

Factor analysis model:

\[ \bb{X}\sim\mathcal{N}(\mu, \bb{\Lambda\Psi\Lambda^\text{T}+\Phi}) \]

GGM with partial correlation matrix:

\[ \bb{X} \sim \mathcal{N}(0, \bb{\Delta(I-\Omega)^{-1}\Delta}) \]

Where

- \(\bb{\Delta}\) is a diagonal scaling matrix that controls the variances
- \(\bb{\Omega}\) is a square symmetrical matrix with \(0\)s on the diagonal and partial correlation coefficients on the off diagonal.
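To see how this parameterization relates to the precision matrix, the sketch below derives \(\bb{\Delta}\) and \(\bb{\Omega}\) from the three-variable covariance matrix of the earlier example and then rebuilds the covariance matrix from them. It is a base R illustration only; the object names are assumptions, and it uses the identity \(\bb{K} = \bb{\Delta}^{-1}(\bb{I}-\bb{\Omega})\bb{\Delta}^{-1}\) that follows from the GGM formula above.

```r
# Covariance matrix of the three-variable example
Sigma <- matrix(c( 1.00, -0.26,  0.31,
                  -0.26,  1.00, -0.08,
                   0.31, -0.08,  1.00),
                nrow = 3, byrow = TRUE)

K <- solve(Sigma)                 # precision matrix

# Diagonal scaling matrix: delta_ii = 1 / sqrt(k_ii)
Delta <- diag(1 / sqrt(diag(K)))

# Partial correlations off the diagonal, zeros on the diagonal
Omega <- diag(3) - Delta %*% K %*% Delta

# Rebuild the covariance matrix: Sigma = Delta (I - Omega)^{-1} Delta
Sigma_implied <- Delta %*% solve(diag(3) - Omega) %*% Delta

all.equal(Sigma_implied, Sigma)   # TRUE up to numerical tolerance
round(diag(Delta), 3)
round(Omega, 3)
```

The diagonal of `Delta` comes out at roughly 0.921, 0.966, and 0.951, and `Omega` holds the partial correlations of the example.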

`psychonetrics`: Partial correlation matrix estimation

True covariance matrix \(\bb{\Sigma}\):

```
      [,1]  [,2]  [,3]
[1,]  1.00 -0.26  0.31
[2,] -0.26  1.00 -0.08
[3,]  0.31 -0.08  1.00
```

Partial correlation matrix \(\bb{\Omega}\):

```
       [,1]   [,2]  [,3]
[1,]  0.000 -0.248 0.300
[2,] -0.248  0.000 0.001
[3,]  0.300  0.001 0.000
```

Scaling matrix \(\bb{\Delta}\):

```
      [,1]  [,2]  [,3]
[1,] 0.921 0.000 0.000
[2,] 0.000 0.966 0.000
[3,] 0.000 0.000 0.951
```

`BGGM`: Bayesian approach

Posterior mean partial correlations:

```
       [,1]   [,2]  [,3]
[1,]  0.000 -0.258 0.189
[2,] -0.258  0.000 0.040
[3,]  0.189  0.040 0.000
```

```
BGGM: Bayesian Gaussian Graphical Models
---
Type: continuous
Analytic: FALSE
Formula:
Posterior Samples: 1000
Observations (n):
Nodes (p): 3
Relations: 3
---
Call:
BGGM::estimate(Y = dat, type = "continuous", analytic = FALSE,
iter = 1000)
---
Estimates:
Relation Post.mean Post.sd Cred.lb Cred.ub
1--2 -0.258 0.042 -0.342 -0.173
1--3 0.189 0.043 0.105 0.274
2--3 0.040 0.046 -0.055 0.129
---
```

Multiple procedures and software packages can be used to perform edge selection:

- The `prune` function in the `psychonetrics` package uses a step-down model search that prunes non-significant parameters
  - edges that are not significant at level \(\alpha\) are removed and the model is re-fit, until no non-significant edge remains
  - the \(p\) values of edges need to be adjusted
- `EBICglasso` in the `qgraph` package, which builds on the `glasso` package, uses the Extended Bayesian Information Criterion (EBIC) to select the best model

  \[ \text{EBIC}=-2\text{L}+E\log(N)+4\gamma E\log(P) \]

  where \(\text{L}\) is the log-likelihood, \(E\) the number of nonzero edges, \(N\) the sample size, \(P\) the number of nodes, and \(\gamma\) a tuning hyperparameter
- The `select` function in the `BGGM` package uses Bayesian hypothesis testing with the Bayes factor (BF) to select the model
  - \(\mathcal{H}_0: \rho_{ij}=0\)
  - \(\mathcal{H}_1: \rho_{ij}\in(-1, 1)\)
  - \[ BF_{10} = \frac{p(\bb{X}|\mathcal{H}_1)}{p(\bb{X}|\mathcal{H}_0)} \]
  - By default, edges with \(BF_{10} > 3\) are included
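As a rough illustration of how the EBIC trades fit against the number of edges, the sketch below implements the formula given above in base R and applies it to three candidate networks. The helper `ebic()` is hypothetical (it is not a function of `qgraph` or `glasso`), and the log-likelihoods, edge counts, and sample size are made-up numbers used only to show the mechanics.

```r
# EBIC = -2 * L + E * log(N) + 4 * gamma * E * log(P)
# loglik: log-likelihood, n_edges: nonzero edges, n_obs: sample size,
# n_nodes: number of nodes, gamma: EBIC hyperparameter
ebic <- function(loglik, n_edges, n_obs, n_nodes, gamma = 0.5) {
  -2 * loglik + n_edges * log(n_obs) + 4 * gamma * n_edges * log(n_nodes)
}

# Hypothetical candidate models (values invented for illustration only)
candidates <- data.frame(
  loglik  = c(-5600, -5120, -5100),
  n_edges = c(40, 90, 150)
)
candidates$ebic <- ebic(candidates$loglik, candidates$n_edges,
                        n_obs = 1000, n_nodes = 25)

# The model with the smallest EBIC is preferred
candidates[which.min(candidates$ebic), ]
```

With these invented numbers the middle model is selected: the sparsest model fits too poorly, and the densest model pays too large a penalty for its extra edges. Setting `gamma = 0` removes the extra penalty term and leaves the ordinary BIC.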

25 self-reported personality items representing 5 factors.

70.6% of edges are nonzero using EBICglasso, while only 63.6% of edges are nonzero using the BF method, and 62.0% of edges are kept using significance testing at \(\alpha = .01\).

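```
library(psych)   # provides the bfi data set
library(qgraph)  # cor_auto() and EBICglasso()
data(bfi)

# Item indices for the five personality factors
big5groups <- list(
  Agreeableness = 1:5,
  Conscientiousness = 6:10,
  Extraversion = 11:15,
  Neuroticism = 16:20,
  Openness = 21:25
)

# Correlation matrix of the 25 personality items
CorMat <- cor_auto(bfi[, 1:25])
# Regularized partial correlation network, EBIC hyperparameter gamma = 0.5
EBICgraph <- EBICglasso(CorMat, nrow(bfi), 0.5, threshold = TRUE)

## density: proportion of possible edges that are nonzero
density_nonzero_edge <- function(pcor_matrix) {
  # count nonzero entries in the upper triangle (each edge counted once)
  N_nonzero_edge <- sum(pcor_matrix[upper.tri(pcor_matrix)] != 0)
  N_all_edge <- ncol(pcor_matrix) * (ncol(pcor_matrix) - 1) / 2
  N_nonzero_edge / N_all_edge
}
density_nonzero_edge(EBICgraph)
```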

`[1] 0.7066667`

`[1] 0.62`

Strength centrality measures suggest that `C4` and `E4` or `N1` have the highest centrality, indicating that they play the most important roles in the networks.

`E4` and `E5` have the highest bridge strength, indicating that they serve as bridges linking communities of personality. They are important elements connecting different types of personality.
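For readers unfamiliar with these two indices, here is a minimal base R sketch of how they are commonly defined: strength is the sum of the absolute edge weights attached to a node, and bridge strength is the same sum restricted to edges that reach nodes in other communities. The sketch reuses the three-variable partial correlation matrix from the earlier example rather than the personality data; the node labels and the community assignment are hypothetical and serve only as illustration.

```r
# Partial correlation matrix from the three-variable example (zero diagonal)
nodes <- paste0("V", 1:3)
Omega <- matrix(c( 0.000, -0.248, 0.300,
                  -0.248,  0.000, 0.001,
                   0.300,  0.001, 0.000),
                nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Strength: sum of absolute edge weights attached to each node
strength <- rowSums(abs(Omega))

# Hypothetical community assignment, for illustration only
community <- c(V1 = "A", V2 = "A", V3 = "B")

# TRUE where two nodes belong to different communities
other_community <- outer(community, community, FUN = "!=")

# Bridge strength: absolute edge weights to nodes in other communities only
bridge_strength <- rowSums(abs(Omega) * other_community)

strength
bridge_strength
```

In this toy network `V1` has the highest strength (0.548), while `V3` has the highest bridge strength (0.301) because all of its edges cross the community boundary.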

- Network analysis is an alternative way of modeling dependency among variables.
- Gaussian graphical model is used for multivariate continuous data.
- The goal is to estimate network structure, node importance, and stability.

Kan, Kees-Jan, Han L. J. van der Maas, and Stephen Z. Levine. 2019. “Extending Psychometric Network Analysis: Empirical Evidence Against *g* in Favor of Mutualism?” *Intelligence* 73 (March): 52–62. https://doi.org/10.1016/j.intell.2018.12.004.