Lecture 10

Generalized Measurement Models: Modeling Observed Polytomous Data

Jihong Zhang

Educational Statistics and Research Methods

Previous Class

Dive deep into factor scoring
Show how different initial values affect Bayesian model estimation
Show how parameterization differs for standardized latent variables vs. marker item scale identification

Today’s Lecture Objectives

Show how to estimate unidimensional latent variable models with polytomous data
- Also know as Polytomous Item response theory (IRT) or Item factor analysis (IFA)
Distributions appropriate for polytomous (discrete; data with lower/upper limits)

Example Data: Conspiracy Theories

Today’s example is from a bootstrap resample of 177 undergraduate students at a large state university in the Midwest.
The survey was a measure of 10 questions about their beliefs in various conspiracy theories that were being passed around the internet in the early 2010s
All item responses were on a 5-point Likert scale with:
1. Strong Disagree \(\rightarrow\) 0
2. Disagree \(\rightarrow\) 0
3. Neither Agree nor Disagree \(\rightarrow\) 0
4. Agree \(\rightarrow\) 1
5. Strongly Agree \(\rightarrow\) 1
The purpose of this survey was to study individual beliefs regarding conspiracies.
Our purpose in using this instrument is to provide a context that we all may find relevant as many of these conspiracies are still prevalent.

From Previous Lectures: CFA (Normal Outcomes)

For comparisons today, we will be using the model where we assumed each outcome was (conditionally) normally distributed:

For an item \(i\) the model is:

\[ Y_{pi}=\mu_i+\lambda_i\theta_p+e_{pi}\\ e_{pi}\sim N(0,\psi_i^2) \]

Recall that this assumption wasn’t good one as the type of data (discrete, bounded, some multimodality) did not match the normal distribution assumption.

Polytomous Data Characteristics

As we have done with each observed variable, we must decide which distribution to use

To do this, we need to map the characteristics of our data on to distributions that share those characteristics

Our observed data:

Discrete responses
Small set of known categories: 1,2,3,4,5
Some observed item responses may be multimodal

Discrete Data Distributions

Stan has a list of distributions for bounded discrete data

Binomial distribution
- Pro: Easy to use / code
- Con: Unimodal distribution
Beta-binomial distribution
- Not often used in psychometrics (but could be)
- Generalizes binomial distribution to have different probability for each trial
Hypergeometric distribution
- Not often used in psychometrics
Categorical distribution (sometimes called multinomial)
- Most frequently used
- Base distribution for graded response, partial credit, and nominal response models
Discrete range distribution (sometimes called uniform)
- Not useful–doesn’t have much information about latent variables

Binomial Distribution Models

The binomial distribution is one of the easiest to use for polytomous items

However, it assumes the distribution of responses are unimodal

Binomial probability mass function (i.e., pdf):

\[ P(Y=y)=\begin{pmatrix}n\\y\end{pmatrix}p^y(1-p)^{(n-y)} \]

Parameters:

n - “number of trials” (range: \(n \in \{0, 1, \dots\}\))
y - “number of successes” out of \(n\) “trials” (range: \(y\in\{0,1,\dots,n\}\))
p - “probability of success” (range: [0, 1])
Mean: \(np\)
Variance: \(np(1-p)\)

Probability Mass Function

Fixing n = 4 and probability of “brief in conspiracy theory” for each item p = {.1, .3, .5, .7} , we can know probability mass function across each item response from 0 to 4.

Question: which shape of distribution matches our data’s empirical distribution most?

Adapting the Binomial for Item Response Models

Although it doesn’t seem like our items fit with a binomial, we can actually use this distribution

Item response: number of successes \(y_i\)
- Needed: recode data so that lowest category is 0 (subtract one from each item)
Highest (recoded) item response: number of trials \(n\)
- For all out items, once recoded, \(n_i = 4\)
Then, use a link function to model each item’s \(p_i\) as a function of the latent trait:

\[ P(Y_i=y_i)=\begin{pmatrix}4\\y_i\end{pmatrix}p^{y_i}(1-p)^{(4-y_i)} \]

where probability of success of item \(i\) for individual \(p\) is as:

\[ p_{pi}=\frac{\text{exp}(\mu_i+\lambda_i\theta_p)}{1+\text{exp}(\mu_i+\lambda_i\theta_p)} \]

Note:

Shown with a logit link function (but could be any link)
Shown in slope/intercept form (but could be distrimination/difficulty for unidimensional items)
Could also include asymptote parameters (\(c_i\) or \(d_i\))

Binomial Item Response Model

The item response model, put into the PDF of the binomial is then:

\[ P(Y_{pi}|\theta_p)=\begin{pmatrix}n_i\\Y_{pi}\end{pmatrix}(\frac{\text{exp}(\mu_i+\lambda_i\theta_p)}{1+\text{exp}(\mu_i+\lambda_i\theta_p)})^{y_{pi}}(1-\frac{\text{exp}(\mu_i+\lambda_i\theta_p)}{1+\text{exp}(\mu_i+\lambda_i\theta_p)})^{4-y_{pi}} \]

Further, we can use the same priors as before on each of our item parameters

\(\mu_i\): Normal Prior \(N(0, 100)\)
\(\lambda_i\): Normal prior \(N(0, 100)\)

Likewise, we can identify the scale of the latent variable as before, too:

\(\theta_p \sim N(0,1)\)

`model{}` for the Binomial Model in Stan

model {
  lambda ~ multi_normal(meanLambda, covLambda); // Prior for item discrimination/factor loadings
  mu ~ multi_normal(meanMu, covMu);             // Prior for item intercepts
  theta ~ normal(0, 1);                         // Prior for latent variable (with mean/sd specified)
  for (item in 1:nItems){
    Y[item] ~ binomial(maxItem[item], inv_logit(mu[item] + lambda[item]*theta));
  }
}

Here, the binomial item response function has two arguments:

The first part: (maxItem[Item]) is the number of “trials” \(n_i\) (here, our maximum score minus one – 4)
The second part: (inv_logit(mu[item] + lambda[item]*theta)) is the probability from our model (\(p_i\))

The data Y[item] must be:

Type: integer
Range: 0 through maxItem[item]

`parameters{}` for the Binomial Model in Stan

parameters{
  vector[nObs] theta;                // the latent variables (one for each person)
  vector[nItems] mu;                 // the item intercepts (one for each item)
  vector[nItems] lambda;             // the factor loadings/item discriminations (one for each item)
}

No changes from any of our previous slope/intercept models

`data{}` for the Binomial Model in Stan

data {
  int<lower=0> nObs;                     // number of observations
  int<lower=0> nItems;                   // number of items
  array[nItems] int<lower=0> maxItem;    // maximum value of Item (should be 4 for 5-point Likert)
  
  array[nItems, nObs] int<lower=0>  Y;   // item responses in an array

  vector[nItems] meanMu;                 // prior mean vector for intercept parameters
  matrix[nItems, nItems] covMu;          // prior covariance matrix for intercept parameters
  
  vector[nItems] meanLambda;             // prior mean vector for discrimination parameters
  matrix[nItems, nItems] covLambda;      // prior covariance matrix for discrimination parameters
}

Note:

Need to supply maxItem (maximum score minus one for each item)
The data are the same (integer) as in the binary/dichotomous item syntax

Preparing Data for Stan

# note: data must start at zero
conspiracyItemsBinomial = conspiracyItems
for (item in 1:ncol(conspiracyItemsBinomial)){
  conspiracyItemsBinomial[, item] = conspiracyItemsBinomial[, item] - 1
}

# check first item
table(conspiracyItemsBinomial[,1])


 0  1  2  3  4 
49 51 47 23  7

# determine maximum value for each item
maxItem = apply(X = conspiracyItemsBinomial,
                MARGIN = 2, 
                FUN = max)
maxItem

 item1  item2  item3  item4  item5  item6  item7  item8  item9 item10 
     4      4      4      4      4      4      4      4      4      4

R Data List for Binomial Model

### Prepare data list -----------------------------
# data dimensions
nObs = nrow(conspiracyItems)
nItems = ncol(conspiracyItems)

# item intercept hyperparameters
muMeanHyperParameter = 0
muMeanVecHP = rep(muMeanHyperParameter, nItems)

muVarianceHyperParameter = 1000
muCovarianceMatrixHP = diag(x = muVarianceHyperParameter, nrow = nItems)

# item discrimination/factor loading hyperparameters
lambdaMeanHyperParameter = 0
lambdaMeanVecHP = rep(lambdaMeanHyperParameter, nItems)

lambdaVarianceHyperParameter = 1000
lambdaCovarianceMatrixHP = diag(x = lambdaVarianceHyperParameter, nrow = nItems)


modelBinomial_data = list(
  nObs = nObs,
  nItems = nItems,
  maxItem = maxItem,
  Y = t(conspiracyItemsBinomial), 
  meanMu = muMeanVecHP,
  covMu = muCovarianceMatrixHP,
  meanLambda = lambdaMeanVecHP,
  covLambda = lambdaCovarianceMatrixHP
)

Binomial Model Stan Call

modelBinomial_samples = modelBinomial_stan$sample(
  data = modelBinomial_data,
  seed = 12112022,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 5000,
  iter_sampling = 5000,
  init = function() list(lambda=rnorm(nItems, mean=5, sd=1))
)

Seed as 12112022
Number of Markov Chains as 4
Warmup Iterations as 5000
Sampling Iterations as 5000
Initial values of \(\lambda\)s as \(N(5, 1)\)

Binomial Model Results

# checking convergence
max(modelBinomial_samples$summary(.cores = 4)$rhat, na.rm = TRUE)

[1] 1.003589

# item parameter results
print(modelBinomial_samples$summary(variables = c("mu", "lambda"), .cores = 4) ,n=Inf)

# A tibble: 20 × 10
   variable     mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
 1 mu[1]      -0.840 -0.838 0.126 0.126 -1.05  -0.637  1.00    2506.    6240.
 2 mu[2]      -1.84  -1.83  0.215 0.212 -2.21  -1.50   1.00    2374.    6009.
 3 mu[3]      -1.99  -1.98  0.223 0.221 -2.38  -1.65   1.00    2662.    6596.
 4 mu[4]      -1.73  -1.72  0.208 0.206 -2.08  -1.40   1.00    2341.    5991.
 5 mu[5]      -2.00  -1.99  0.247 0.244 -2.43  -1.62   1.00    2046.    4880.
 6 mu[6]      -2.10  -2.09  0.241 0.240 -2.51  -1.72   1.00    2370.    6109.
 7 mu[7]      -2.44  -2.43  0.261 0.257 -2.90  -2.04   1.00    2735.    5943.
 8 mu[8]      -2.16  -2.15  0.240 0.239 -2.58  -1.79   1.00    2452.    6329.
 9 mu[9]      -2.40  -2.39  0.278 0.274 -2.88  -1.97   1.00    2412.    6524.
10 mu[10]     -3.97  -3.93  0.517 0.511 -4.88  -3.19   1.00    3523.    7434.
11 lambda[1]   1.11   1.10  0.143 0.142  0.881  1.35   1.00    5249.    8927.
12 lambda[2]   1.90   1.89  0.246 0.241  1.53   2.33   1.00    4555.    6779.
13 lambda[3]   1.92   1.90  0.254 0.253  1.52   2.36   1.00    4719.    7465.
14 lambda[4]   1.89   1.88  0.235 0.230  1.53   2.30   1.00    4539.    7441.
15 lambda[5]   2.28   2.26  0.281 0.277  1.84   2.76   1.00    4041.    7534.
16 lambda[6]   2.15   2.13  0.273 0.269  1.72   2.62   1.00    4082.    6668.
17 lambda[7]   2.13   2.11  0.293 0.292  1.68   2.64   1.00    4389.    6712.
18 lambda[8]   2.08   2.07  0.268 0.263  1.67   2.55   1.00    4109.    6184.
19 lambda[9]   2.35   2.33  0.315 0.308  1.87   2.90   1.00    4302.    7267.
20 lambda[10]  3.40   3.36  0.559 0.541  2.56   4.39   1.00    4807.    6536.

Option Characteristic Curves

Expected probability of certain response across a range of latent variable \(\theta\)

ICC for Item 10: The expected scores with latent variable

ICC for all items: which items have high expected score given theta

Investigate Latent Variable Estimates

Compare factors scores of Binomial model to 2PL-IRT

Comparing EAP Estimates to Sum Scores

Categorical / Multinomial Distribution Models

Multinomial Distribution Models

Although the binomial distribution is easy, it may not fit our data well

Instead, we can use the categorical distribution, with following PMF:

\[ P(Y=y)=\frac{n!}{y_i!\dots y_C!}p_1^{y_1}\dots p_C^{y_C} \]

Here,

\(n\): number of “trials”
\(y_c\): number of events in each of \(c\) categories (c \(\in \{1, \cdots, C\}\); \(\Sigma_cy_c = n\))
\(p_c\): probability of observing an event in category \(c\)

Graded Response Model (GRM)

GRM is one of the most popular model for ordered categorical response

\[ P(Y_{ic}|\theta) = 1 - P^*(Y_{i1}|\theta) \ \ \text{if}\ c=1\\ P(Y_{ic}|\theta) = P^*(Y_{i,c-1}|\theta)-P^*(Y_{i,c}|\theta) \ \ \text{if}\ 1<c<C\\ P(Y_{ic}|\theta) = P^*(Y_{i,C-1}|\theta) \ \ \text{if}\ c=C_i\\ \]where probability of response larger than \(c\):

\[ P^*(Y_{i,c}|\theta) = P(Y_{i,c}>c|\theta)=\frac{\text{exp}(\mu_{ic}+\lambda_u\theta_p)}{1+\text{exp}(\mu_{ic}+\lambda_u\theta_p)} \]

With:

\(C_i-1\) has ordered intercept intercepts: \(\mu_i > \mu_2 > \cdots > \mu_{C_i-1}\)

`model{}` for GRM in Stan

model {
  
  lambda ~ multi_normal(meanLambda, covLambda); // Prior for item discrimination/factor loadings
  theta ~ normal(0, 1);                         // Prior for latent variable (with mean/sd specified)
  
  for (item in 1:nItems){
    thr[item] ~ multi_normal(meanThr[item], covThr[item]);             // Prior for item intercepts
    Y[item] ~ ordered_logistic(lambda[item]*theta, thr[item]);
  }
  
}

ordered_logistic is the built-in Stan function for ordered categorical outcomes
- Two arguments neeeded for the function
- lambda[item]*theta is the muliplications
- thr[item] is the thresholds for the item, which is negative values of item intercept
The function expects the response of Y starts from 1

`parameters{}` for GRM in Stan

parameters {
  vector[nObs] theta;                // the latent variables (one for each person)
  array[nItems] ordered[maxCategory-1] thr; // the item thresholds (one for each item category minus one)
  vector[nItems] lambda;             // the factor loadings/item discriminations (one for each item)
}

Note that we need to declare threshould as: ordered[maxCategory-1] so that

\[ \tau_{i1} < \tau_{i2} <\cdots < \tau_{iC-1} \]

`generated quantities{}` for GRM in Stan

generated quantities{
  array[nItems] vector[maxCategory-1] mu;
  for (item in 1:nItems){
    mu[item] = -1*thr[item];
  }
}

Convert threshold back into item intercept

`data{}` for GRM in Stan

data {
  int<lower=0> nObs;                            // number of observations
  int<lower=0> nItems;                          // number of items
  int<lower=0> maxCategory; 
  
  array[nItems, nObs] int<lower=1, upper=5>  Y; // item responses in an array

  array[nItems] vector[maxCategory-1] meanThr;   // prior mean vector for intercept parameters
  array[nItems] matrix[maxCategory-1, maxCategory-1] covThr;      // prior covariance matrix for intercept parameters
  
  vector[nItems] meanLambda;         // prior mean vector for discrimination parameters
  matrix[nItems, nItems] covLambda;  // prior covariance matrix for discrimination parameters
}

Note that the input for the prior mean/covariance matrix for threshold parameters is now an array (one mean vector and covariance matrix per item)

Prepare Data for GRM

To match the array for input for the threshold hyperparameter matrices, a little data manipulation is needed

# item threshold hyperparameters
thrMeanHyperParameter = 0
thrMeanVecHP = rep(thrMeanHyperParameter, maxCategory-1)
thrMeanMatrix = NULL
for (item in 1:nItems){
  thrMeanMatrix = rbind(thrMeanMatrix, thrMeanVecHP)
}

thrVarianceHyperParameter = 1000
thrCovarianceMatrixHP = diag(x = thrVarianceHyperParameter, nrow = maxCategory-1)
thrCovArray = array(data = 0, dim = c(nItems, maxCategory-1, maxCategory-1))
for (item in 1:nItems){
  thrCovArray[item, , ] = diag(x = thrVarianceHyperParameter, nrow = maxCategory-1)
}

R array matches Stan’s array type

GRM Results

# checking convergence
max(modelOrderedLogit_samples$summary(.cores = 5)$rhat, na.rm = TRUE)

[1] 1.003504

# item parameter results
print(modelOrderedLogit_samples$summary(variables = c("lambda", "mu"), .cores = 5) ,n=Inf)

# A tibble: 50 × 10
   variable       mean   median    sd   mad      q5     q95  rhat ess_bulk
   <chr>         <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>    <dbl>
 1 lambda[1]    2.13     2.11   0.286 0.282   1.68   2.61    1.00    4435.
 2 lambda[2]    3.06     3.04   0.428 0.422   2.40   3.80    1.00    4504.
 3 lambda[3]    2.57     2.55   0.382 0.376   1.98   3.23    1.00    5053.
 4 lambda[4]    3.09     3.07   0.423 0.416   2.44   3.82    1.00    4762.
 5 lambda[5]    4.98     4.92   0.762 0.746   3.84   6.34    1.00    3883.
 6 lambda[6]    5.01     4.95   0.760 0.738   3.88   6.35    1.00    4065.
 7 lambda[7]    3.22     3.19   0.487 0.483   2.47   4.06    1.00    4260.
 8 lambda[8]    5.33     5.26   0.863 0.839   4.06   6.87    1.00    3751.
 9 lambda[9]    3.14     3.10   0.480 0.472   2.40   3.97    1.00    4633.
10 lambda[10]   2.64     2.61   0.473 0.463   1.92   3.47    1.00    5129.
11 mu[1,1]      1.49     1.48   0.286 0.285   1.03   1.97    1.00    2189.
12 mu[2,1]      0.182    0.185  0.332 0.325  -0.374  0.712   1.00    1430.
13 mu[3,1]     -0.447   -0.442  0.297 0.296  -0.944  0.0271  1.00    1641.
14 mu[4,1]      0.348    0.350  0.336 0.334  -0.204  0.895   1.00    1427.
15 mu[5,1]      0.332    0.338  0.494 0.483  -0.484  1.14    1.00    1188.
16 mu[6,1]     -0.0321  -0.0198 0.498 0.488  -0.859  0.769   1.00    1166.
17 mu[7,1]     -0.809   -0.793  0.360 0.353  -1.42  -0.245   1.00    1424.
18 mu[8,1]     -0.381   -0.365  0.530 0.509  -1.28   0.454   1.00    1187.
19 mu[9,1]     -0.740   -0.726  0.353 0.346  -1.34  -0.185   1.00    1487.
20 mu[10,1]    -1.97    -1.95   0.394 0.385  -2.65  -1.37    1.00    2329.
21 mu[1,2]     -0.654   -0.649  0.258 0.256  -1.09  -0.244   1.00    1816.
22 mu[2,2]     -2.21    -2.19   0.390 0.381  -2.88  -1.61    1.00    1918.
23 mu[3,2]     -1.67    -1.66   0.327 0.321  -2.23  -1.15    1.00    1901.
24 mu[4,2]     -1.79    -1.78   0.366 0.361  -2.42  -1.22    1.00    1668.
25 mu[5,2]     -3.18    -3.14   0.627 0.608  -4.28  -2.23    1.00    1807.
26 mu[6,2]     -3.25    -3.21   0.612 0.599  -4.30  -2.31    1.00    1660.
27 mu[7,2]     -2.72    -2.71   0.435 0.425  -3.48  -2.04    1.00    1879.
28 mu[8,2]     -3.26    -3.22   0.649 0.637  -4.41  -2.29    1.00    1655.
29 mu[9,2]     -2.67    -2.64   0.433 0.424  -3.42  -1.99    1.00    2109.
30 mu[10,2]    -3.25    -3.22   0.463 0.455  -4.05  -2.54    1.00    2987.
31 mu[1,3]     -2.51    -2.50   0.317 0.319  -3.04  -2.00    1.00    2449.
32 mu[2,3]     -4.13    -4.11   0.500 0.494  -5.00  -3.36    1.00    3009.
33 mu[3,3]     -4.35    -4.32   0.517 0.503  -5.26  -3.56    1.00    4073.
34 mu[4,3]     -4.59    -4.56   0.538 0.533  -5.52  -3.75    1.00    3361.
35 mu[5,3]     -6.04    -5.98   0.867 0.852  -7.56  -4.73    1.00    3042.
36 mu[6,3]     -7.47    -7.39   1.02  1.00   -9.25  -5.93    1.00    3599.
37 mu[7,3]     -5.11    -5.08   0.636 0.629  -6.21  -4.12    1.00    3421.
38 mu[8,3]     -9.02    -8.89   1.32  1.29  -11.4   -7.07    1.00    4280.
39 mu[9,3]     -4.00    -3.98   0.521 0.512  -4.90  -3.19    1.00    2803.
40 mu[10,3]    -4.48    -4.44   0.560 0.551  -5.45  -3.62    1.00    4067.
41 mu[1,4]     -4.53    -4.50   0.507 0.506  -5.39  -3.74    1.00    5718.
42 mu[2,4]     -5.76    -5.72   0.683 0.676  -6.94  -4.69    1.00    5189.
43 mu[3,4]     -5.52    -5.48   0.660 0.649  -6.67  -4.50    1.00    5692.
44 mu[4,4]     -5.55    -5.52   0.635 0.623  -6.65  -4.57    1.00    4458.
45 mu[5,4]     -8.62    -8.53   1.17  1.14  -10.7   -6.85    1.00    4420.
46 mu[6,4]    -10.4    -10.3    1.46  1.44  -13.0   -8.23    1.00    6549.
47 mu[7,4]     -6.87    -6.81   0.871 0.851  -8.36  -5.52    1.00    5615.
48 mu[8,4]    -12.1    -11.9    1.91  1.86  -15.5   -9.26    1.00    7019.
49 mu[9,4]     -5.79    -5.75   0.714 0.703  -7.04  -4.70    1.00    4707.
50 mu[10,4]    -4.74    -4.71   0.589 0.583  -5.77  -3.83    1.00    4319.
# ℹ 1 more variable: ess_tail <dbl>

Option Characteristics Curve

OPC for other 9 items

EAP of Factor Scores for Four Models

Posterior SDs around Factor Scores

Resources

Dr. Templin’s slide

Lecture 10

Previous Class

Today’s Lecture Objectives

Example Data: Conspiracy Theories

From Previous Lectures: CFA (Normal Outcomes)

Polytomous Data Characteristics

Discrete Data Distributions

Binomial Distribution Models

Binomial Distribution Models

Probability Mass Function

Adapting the Binomial for Item Response Models

Binomial Item Response Model

model{} for the Binomial Model in Stan

parameters{} for the Binomial Model in Stan

data{} for the Binomial Model in Stan

Preparing Data for Stan

R Data List for Binomial Model

Binomial Model Stan Call

Binomial Model Results

Option Characteristic Curves

ICC for Item 10: The expected scores with latent variable

ICC for all items: which items have high expected score given theta

Investigate Latent Variable Estimates

Comparing EAP Estimates to Sum Scores

Categorical / Multinomial Distribution Models

Multinomial Distribution Models

Graded Response Model (GRM)

model{} for GRM in Stan

parameters{} for GRM in Stan

generated quantities{} for GRM in Stan

data{} for GRM in Stan

Prepare Data for GRM

GRM Results

Option Characteristics Curve

OPC for other 9 items

EAP of Factor Scores for Four Models

Posterior SDs around Factor Scores

Resources

`model{}` for the Binomial Model in Stan

`parameters{}` for the Binomial Model in Stan

`data{}` for the Binomial Model in Stan

`model{}` for GRM in Stan

`parameters{}` for GRM in Stan

`generated quantities{}` for GRM in Stan

`data{}` for GRM in Stan