Asymptotically Correct Person Fit z-Statistics For the Rasch Testlet Model

Zhongtian Lin; Tao Jiang; Frank Rijmen; Paul Van Wamelen

doi:10.1007/s11336-024-09997-y

Published online by Cambridge University Press: 01 January 2025

Affiliations
Zhongtian Lin*: Financial Industry Regulatory Authority
Tao Jiang: Cambium Assessment
Frank Rijmen: Cambium Assessment
Paul Van Wamelen: Cambium Assessment
*Correspondence should be made to Zhongtian Lin, Financial Industry Regulatory Authority, Washington, USA. Email: lzt713@gmail.com

Abstract

A well-known person fit statistic in the item response theory (IRT) literature is the $l_{z}$ statistic (Drasgow et al. in Br J Math Stat Psychol 38(1):67-86, 1985). Snijders (Psychometrika 66(3):331-342, 2001) derived $l_{z}^{*}$, which is the asymptotically correct version of $l_{z}$ when the ability parameter is estimated. However, both statistics, and other extensions developed later, concern only unidimensional IRT models or multidimensional models that require a joint estimate of latent traits across all dimensions.
Considering a marginalized maximum likelihood ability estimator, this paper proposes $l_{zt}$ and $l_{zt}^{*}$, which are extensions of $l_{z}$ and $l_{z}^{*}$, respectively, for the Rasch testlet model. The computation of $l_{zt}^{*}$ relies on several extensions of the Lord-Wingersky algorithm (1984) that are additional contributions of this paper. Simulation results show that $l_{zt}^{*}$ has close-to-nominal Type I error rates and satisfactory power for detecting aberrant responses.
For unidimensional models, $l_{zt}$ and $l_{zt}^{*}$ reduce to $l_{z}$ and $l_{z}^{*}$, respectively, and therefore allow for the evaluation of person fit with a wider range of IRT models. A real data application is presented to show the utility of the proposed statistics for a test with an underlying structure that consists of both the traditional unidimensional component and the Rasch testlet component.
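
As a point of reference for the unidimensional case, the following is a minimal Python sketch of the two classical building blocks the abstract names: the $l_{z}$ statistic of Drasgow et al. (1985) for dichotomous Rasch items, and the basic Lord-Wingersky (1984) recursion for the summed-score distribution. It is not the paper's $l_{zt}$ or $l_{zt}^{*}$ and does not include the Snijders (2001) correction or any testlet extension. The function names and the example item difficulties are hypothetical, and the ability value is assumed to be known rather than estimated.

```python
import math


def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood person fit statistic l_z.

    Assumes dichotomous (0/1) items and a known theta; no correction
    for an estimated theta is applied. The variance is zero if every
    item has difficulty equal to theta, so that case is not handled.
    """
    l0 = 0.0    # observed log-likelihood of the response pattern
    mean = 0.0  # expectation of the log-likelihood under the model
    var = 0.0   # variance of the log-likelihood under the model
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        l0 += x * math.log(p) + (1 - x) * math.log(q)
        mean += p * math.log(p) + q * math.log(q)
        var += p * q * math.log(p / q) ** 2
    return (l0 - mean) / math.sqrt(var)


def lord_wingersky(probs):
    """Summed-score distribution for locally independent 0/1 items.

    probs holds each item's P(correct) at a fixed ability value.
    Entry s of the returned list is P(summed score = s).
    """
    dist = [1.0]  # with zero items, the score is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)  # item answered incorrectly
            new[s + 1] += mass * p      # item answered correctly
        dist = new
    return dist


# Hypothetical five-item test, for illustration only.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
theta = 0.0
consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

lz_consistent = lz_statistic(consistent, theta, difficulties)
lz_aberrant = lz_statistic(aberrant, theta, difficulties)
score_dist = lord_wingersky([rasch_prob(theta, b) for b in difficulties])
```

In this toy setup the model-consistent pattern yields a larger $l_{z}$ than the aberrant one, which is the direction the statistic is meant to capture: large negative values signal a response pattern that is unlikely under the model.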

Keywords

Person fit; IRT; $l_{z}$ statistic; Rasch testlet model

Information

Type: Original Research
Information: Psychometrika, Volume 89, Issue 4, December 2024, pp. 1230–1260

DOI: https://doi.org/10.1007/s11336-024-09997-y
Copyright: © 2024 The Author(s), under exclusive licence to The Psychometric Society

Footnotes

The research reported in this paper was performed when the first author was an employee of Cambium Assessment. The first author is currently an employee of the Financial Industry Regulatory Authority (FINRA). Any opinions expressed in this publication are those of the authors and not necessarily of FINRA.

References

Albers, C. J., Meijer, R. R., Tendeiro, J. N. (2016). Derivation and applicability of asymptotic results for multiple subtests person-fit statistics. Applied Psychological Measurement, 40(4), 274–288.

Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62(2), 191–199.

Bradlow, E. T., Wainer, H., Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168.

Cai, L. (2015). Lord-Wingersky algorithm version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing. Psychometrika, 80(2), 535–559.

Chen, H. (2013). Testlet Effects on Standardized Log-likelihood Person Fit Index to Detect Aberrant Responses for the IRT Testlet Model (Doctoral dissertation, University of Missouri–Columbia).

De La Torre, J., Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.

Drasgow, F., Levine, M. V., Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86.

Glas, C. A. W., Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72(2), 159–180.

Gorney, K., Sinharay, S., Eckerly, C. (2024). Efficient corrections for standardized person-fit statistics. Psychometrika, 1–23.

Hong, M., Lin, L., Cheng, Y. (2021). Asymptotically corrected person fit statistics for multidimensional constructs with simple structure and mixed item types. Psychometrika, 86(2), 464–488.

Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277–298.

Liou, M., Chang, C. H. (1992). Constructing the exact significance level for a person fit statistic. Psychometrika, 57(2), 169–181.

Lord, F. M., Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461.

Magis, D., Raîche, G., Béland, S. (2012). A didactic presentation of Snijders’s lz* index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37(1), 57–81.

Meijer, R. R., Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135.

Molenaar, I. W., Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75–106.

Nering, M. L. (1995). The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19(2), 121–129.

New Hampshire Department of Education (2019). New Hampshire statewide assessment system 2018-2019 annual technical report volume 1. https://www.education.nh.gov/sites/g/files/ehbemt326/files/inline-documents/sonh/nhsas-v1-tech-report-2018-19.pdf

Reise, S. P. (1995). Scoring method and the detection of person misfit in a personality assessment context. Applied Psychological Measurement, 19(3), 213–229.

Rijmen, F., Turhan, A., Jiang, T. (2018). An item response theory model for next generation of science standards assessments. National Council of Measurement in Education Annual Conference, New York, NY.

Rupp, A. A. (2013). A systematic review of the methodology for person fit research in item response theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3.

Seo, D. G., Weiss, D. J. (2013). lz Person-fit index to identify misfit students with achievement test data. Educational and Psychological Measurement, 73(6), 994–1016.

Sinharay, S. (2015). Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics, 40(4), 343–365.

Sinharay, S. (2016). Asymptotically correct standardization of person-fit statistics beyond dichotomous items. Psychometrika, 81(4), 992–1013.

Sireci, S. G., Thissen, D., Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28(3), 237–247.

Snijders, T. A. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331–342.

van Krimpen-Stoop, E. M., Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23(4), 327–345.

von Davier, M., Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68(2), 213–228.

Wainer, H., Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57(5), 741–758.

Wainer, H., Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22–29.

Wainer, H., Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203–220.

Wang, W. C., Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149.

Xia, Y., Zheng, Y. (2018). Asymptotically normally distributed person fit indices for detecting spuriously high scores on difficult items. Applied Psychological Measurement, 42(5), 343–358.

Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30(3), 187–213.
