Asymptotically Correct Person Fit z-Statistics For the Rasch Testlet Model

Published online by Cambridge University Press:  01 January 2025

Zhongtian Lin*
Affiliation:
Financial Industry Regulatory Authority
Tao Jiang
Affiliation:
Cambium Assessment
Frank Rijmen
Affiliation:
Cambium Assessment
Paul Van Wamelen
Affiliation:
Cambium Assessment
* Correspondence should be made to Zhongtian Lin, Financial Industry Regulatory Authority, Washington, USA. Email: lzt713@gmail.com

Abstract

A well-known person fit statistic in the item response theory (IRT) literature is the $l_{z}$ statistic (Drasgow et al. in Br J Math Stat Psychol 38(1):67-86, 1985). Snijders (Psychometrika 66(3):331-342, 2001) derived $l_{z}^{*}$, which is the asymptotically correct version of $l_{z}$ when the ability parameter is estimated. However, both statistics, and the extensions developed later, apply only to unidimensional IRT models or to multidimensional models that require a joint estimate of the latent traits across all dimensions. Considering a marginalized maximum likelihood ability estimator, this paper proposes $l_{zt}$ and $l_{zt}^{*}$, which are extensions of $l_{z}$ and $l_{z}^{*}$, respectively, for the Rasch testlet model. The computation of $l_{zt}^{*}$ relies on several extensions of the Lord-Wingersky algorithm (1984) that are additional contributions of this paper. Simulation results show that $l_{zt}^{*}$ has close-to-nominal Type I error rates and satisfactory power for detecting aberrant responses. For unidimensional models, $l_{zt}$ and $l_{zt}^{*}$ reduce to $l_{z}$ and $l_{z}^{*}$, respectively, and therefore allow for the evaluation of person fit with a wider range of IRT models. A real data application is presented to show the utility of the proposed statistics for a test with an underlying structure that consists of both the traditional unidimensional component and the Rasch testlet component.
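
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch, not the authors' method. It assumes the unidimensional Rasch model with a known ability value and shows (a) the classical $l_{z}$ statistic of Drasgow et al. (1985) and (b) the basic dichotomous Lord-Wingersky (1984) recursion for the summed-score distribution. It does not implement $l_{zt}$, $l_{zt}^{*}$, the Snijders correction, or the extended recursions proposed in the paper. The function names `rasch_prob`, `lz`, and `lord_wingersky` and the example item difficulties are hypothetical, chosen only for illustration.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz(responses, theta, difficulties):
    """Classical l_z for 0/1 responses, treating theta as known.

    l_z = (l0 - E[l0]) / sqrt(Var[l0]), where l0 is the log-likelihood
    of the observed response pattern at theta.
    """
    l0 = expected = variance = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    # The variance is zero only if every item has p = 0.5 (theta equals
    # every difficulty); this sketch does not guard against that case.
    return (l0 - expected) / math.sqrt(variance)


def lord_wingersky(probs):
    """Summed-score distribution for independent dichotomous items.

    probs[i] is the probability of a correct response to item i at a
    fixed ability. Returns a list whose entry s is P(summed score = s).
    """
    dist = [1.0]  # before any item, the score is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)  # item answered incorrectly
            new[s + 1] += mass * p      # item answered correctly
        dist = new
    return dist


# Hypothetical five-item test; difficulties are illustrative only.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0
consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

lz_consistent = lz(consistent, theta, difficulties)
lz_aberrant = lz(aberrant, theta, difficulties)
score_dist = lord_wingersky([rasch_prob(theta, b) for b in difficulties])
```

Large negative values of `lz` indicate a response pattern that is unlikely under the model, so the aberrant pattern above yields a smaller value than the consistent one. When the ability is estimated rather than known, this uncorrected statistic is no longer asymptotically standard normal, which is the problem that $l_{z}^{*}$ and the proposed $l_{zt}^{*}$ address.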

Information

Type
Original Research
Copyright
© 2024 The Author(s), under exclusive licence to The Psychometric Society

Footnotes

The research reported in this paper was performed when the first author was an employee of Cambium Assessment. The first author is currently an employee of the Financial Industry Regulatory Authority (FINRA). Any opinions expressed in this publication are those of the authors and not necessarily those of FINRA.

References

Albers, C. J., Meijer, R. R., Tendeiro, J. N. (2016). Derivation and applicability of asymptotic results for multiple subtests person-fit statistics. Applied Psychological Measurement, 40(4), 274–288.
Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62(2), 191–199.
Bradlow, E. T., Wainer, H., Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168.
Cai, L. (2015). Lord-Wingersky algorithm version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing. Psychometrika, 80(2), 535–559.
Chen, H. (2013). Testlet Effects on Standardized Log-likelihood Person Fit Index to Detect Aberrant Responses for the IRT Testlet Model (Doctoral dissertation, University of Missouri–Columbia).
De La Torre, J., Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.
Drasgow, F., Levine, M. V., Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86.
Glas, C. A. W., Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72(2), 159–180.
Gorney, K., Sinharay, S., Eckerly, C. (2024). Efficient corrections for standardized person-fit statistics. Psychometrika, 1–23.
Hong, M., Lin, L., Cheng, Y. (2021). Asymptotically corrected person fit statistics for multidimensional constructs with simple structure and mixed item types. Psychometrika, 86(2), 464–488.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277–298.
Liou, M., Chang, C. H. (1992). Constructing the exact significance level for a person fit statistic. Psychometrika, 57(2), 169–181.
Lord, F. M., Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461.
Magis, D., Raîche, G., Béland, S. (2012). A didactic presentation of Snijders’s lz* index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37(1), 57–81.
Meijer, R. R., Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135.
Molenaar, I. W., Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75–106.
Nering, M. L. (1995). The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19(2), 121–129.
New Hampshire Department of Education (2019). New Hampshire statewide assessment system 2018-2019 annual technical report volume 1. https://www.education.nh.gov/sites/g/files/ehbemt326/files/inline-documents/sonh/nhsas-v1-tech-report-2018-19.pdf
Reise, S. P. (1995). Scoring method and the detection of person misfit in a personality assessment context. Applied Psychological Measurement, 19(3), 213–229.
Rijmen, F., Turhan, A., Jiang, T. (2018). An item response theory model for next generation of science standards assessments. National Council of Measurement in Education Annual Conference, New York, NY.
Rupp, A. A. (2013). A systematic review of the methodology for person fit research in item response theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3.
Seo, D. G., Weiss, D. J. (2013). lz Person-fit index to identify misfit students with achievement test data. Educational and Psychological Measurement, 73(6), 994–1016.
Sinharay, S. (2015). Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics, 40(4), 343–365.
Sinharay, S. (2016). Asymptotically correct standardization of person-fit statistics beyond dichotomous items. Psychometrika, 81(4), 992–1013.
Sireci, S. G., Thissen, D., Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28(3), 237–247.
Snijders, T. A. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331–342.
van Krimpen-Stoop, E. M., Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23(4), 327–345.
von Davier, M., Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68(2), 213–228.
Wainer, H., Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57(5), 741–758.
Wainer, H., Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22–29.
Wainer, H., Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203–220.
Wang, W. C., Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149.
Xia, Y., Zheng, Y. (2018). Asymptotically normally distributed person fit indices for detecting spuriously high scores on difficult items. Applied Psychological Measurement, 42(5), 343–358.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30(3), 187–213.