Discussant Commentary: Cutscore Estimation Via Two-Stage Analysis—Implications for Psychometric Research and Practice

Slides

It is my honor to serve as discussant for this well-integrated symposium that addresses the challenge of cutscore estimation through innovative two-stage modeling frameworks.

Item Response Theory (IRT) has been around for decades, and Diagnostic Classification Models (DCMs) have been used for over ten years. Both have evolved into different variants, each suited for particular scenarios. However, challenges in communication between users of these two modeling traditions still remain. The main theme of this symposium stems from a simple but important question: how can we best combine IRT and DCMs to leverage the strengths of both? Today’s symposium brings together five presentations that tackle this question from different perspectives, including theoretical developments, methodological innovations, and practical applications.

Theoretical and Methodological Contributions

Dr. Alfonso Martinez opens the session by introducing a maximum likelihood estimation (MLE) approach for cutscore identification using a two-stage mixture model. He raises a very interesting point: diagnostic models are often pushed to classify individuals into groups without considering the magnitude of their underlying traits. This can lead to a situation where “mastery” and “non-mastery” groups may have very different meanings across different populations.

Instead of focusing only on classification, we can think of the latent ability space as a continuum, with certain points serving as latent cutpoints. In this framework, both the continuous distribution and the discrete cutpoints can be estimated. The idea of blending IRT with classification models is not new— for example, Kentaro Yamamoto’s Hybrid Model of IRT and Latent Class Models is an early effort in this area. However, the research questions in this space remain challenging and largely underexplored.

Alfonso’s two-stage MLE approach may offer a new way forward in addressing this topic.

Dr. Jonathan Templin expands on this framework through a Bayesian estimation of the two-stage mixture model. The Bayesian perspective not only facilitates the incorporation of prior knowledge, but also yields posterior uncertainty estimates around cutpoints. These are critical for defensible reporting of decision consistency and classification precision. Dr. Templin’s work exemplifies how Bayesian methods can offer transparency in probabilistic classification and absorb the expert’s knowledge.

Practical and Substantive Applications

Sergio Haab pushes the discussion from estimation to interpretation, addressing how scale score cutpoints can be made substantively meaningful. His approach advocates anchoring cutpoints in real-world behavioral or cognitive benchmarks rather than purely statistical partitions. This resonates with calls for construct validity in standard setting, and Mr. Haab provides empirical strategies for aligning score regions with theoretically performance descriptions. His contribution reminds us that psychometric modeling must support human judgment and policy making.

Ae Kyong Jung builds on the idea of estimating cutscores for continuous latent traits and applies it to the area of Computerized Adaptive Testing (CAT), focusing especially on the item selection process. The idea follows naturally: if you can convert a continuous latent distribution into a discrete one, using the method presented by Alfonso and Jonathan, you can then apply Shannon Entropy for item selection while using IRT for theta estimation at the same time. Ae Kyong then compares the use of Shannon Entropy and D-optimality in Multidimensional IRT (MIRT) in terms of item selection accuracy. Her results show that D-optimality in MIRT outperforms Shannon Entropy within the DCM framework. Overall, her work aims to find a balance between diagnostic precision and theta estimation accuracy.

Finally, Ahmed Bediwy presents a compelling application of the two-stage approach to standard setting procedures, suggesting that cutscore estimation can—and perhaps should—be integrated with operational practices such as the Angoff or Bookmark methods. By quantifying the psychometric implications of judgments and enabling model-based refinement, His work suggests a promising hybrid model that respects both statistical evidence and expert consensus.

Integrative Reflections and Future Directions

Together, these five presentations show a coherent vision for the future of hybrid general psychometric models, one that embraces methodological rigor, computational efficiency, and substantive relevance. Several cross-cutting themes deserve emphasis:

Two-stage modeling with different item responses types
The Bayesian approach with different priors
The field must continue to move beyond statistical adequacy to ensure interpretability of cutpoints, as emphasized by Mr. Haab.
Innovations in CAT Scoring including cutpoints information
The integration of psychometric modeling into standard setting practices represents a promising frontier, helping resolve the long-standing tension between expert judgment and statistical modeling.

In closing, this symposium not only advances the theoretical landscape of cutscore estimation but also expand its practical scenarios. As psychometricians, we are called not only to estimate well, but to explain and justify clearly. This work does both.

Thank you to all presenters for your contributions to this important dialogue.

Open Discussion

Given that we have cutpoints as well as The minimum test length needed for acceptable classification accuracy in CAT.
What about the non-normal latent traits.
Number of cutpoints per trait? More cutpoints for each traits.
The meanings of cutscores in real world.
The uncertainty of cutscores

HYBRID MODEL OF IRT AND LATENT CLASS MODELS, Kentaro Yamamoto (Dec 1982), https://onlinelibrary.wiley.com/doi/10.1002/j.2333-8504.1982.tb01326.x - Efficient Models for Cognitive Diagnosis With Continuous and Mixed-Type Latent Variables, Hong (Apr 2014). - A model that combines continuous and discrete latent variables is proposed that includes a noncompensatory item response theory (IRT) term and a term following the discrete attribute Deterministic Input, Noisy “And” Gate (DINA) model in cognitive diagnosis.