1 Introduction
In the fields of education and psychology, it is often of interest to study constructs that are not directly observable. These constructs cannot be measured directly and are often formulated as latent variables (LVs; De Boeck et al., Reference De Boeck, Pek, Walton, Wegener, Turner, Andersen, Beauchaine, Lecavalier, Myung and Petty2023). Instruments are administered to collect observable indicators associated with these LVs, which are referred to as manifest variables (MVs). The LVs can then be inferred from the collected MVs by fitting a measurement model that specifies the relationship between the LVs and MVs.
A widely used LV measurement model is the common factor model (e.g., Bollen, Reference Bollen1989; Jöreskog, Reference Jöreskog1969; Kaplan, Reference Kaplan2008; Kline, Reference Kline2023; Lawley & Maxwell, Reference Lawley and Maxwell1971; Thurstone, Reference Thurstone1947), particularly when analyzing continuous response data. The common factor model requires MVs to be linearly dependent on LVs, aiming to attribute the dependencies among MVs to LVs. The model has been applied in a wide range of settings, such as testing theories about the number of LVs and the pattern of MV–LV dependency (i.e., confirmatory factor analysis) or learning these structures from data (i.e., exploratory factor analysis). Other common uses include examining individual differences in latent traits or investigating MV-level characteristics, similar to applications of item response theory (IRT; e.g., Hambleton & Swaminathan, Reference Hambleton and Swaminathan2013), but focusing on continuous MVs rather than categorical ones. Examples of such continuous responses include response times (RTs), often recorded in computer-based testing environments, and responses made by marking a placement along a line segment (Minchen & de la Torre, Reference Minchen and de la Torre2018).
A crucial aspect of all these applications is the evaluation of model-data fit, as a poorly fitted model can invalidate subsequent inferences. For common factor models, numerous goodness-of-fit (GOF) assessment tools have been developed, among which the most commonly reported (e.g., Jackson et al., Reference Jackson, Gillaspy and Purc-Stephenson2009) are the
$\chi ^2$
model fit test (i.e., likelihood ratio test) and several GOF indices, such as comparative fit index (CFI; Bentler, Reference Bentler1990), Tucker–Lewis index (TLI; Tucker & Lewis, Reference Tucker and Lewis1973), standardized root mean squared residual (SRMR; Bentler, Reference Bentler1995), and root mean square error of approximation (RMSEA; Steiger & Lind, Reference Steiger and Lind1980). If these fit diagnostics indicate good fit, researchers would generally be satisfied with the model and proceed with subsequent inferences.
However, conventional GOF statistics are primarily based on residual means and covariances (i.e., the differences between observed and fitted mean vectors and covariance matrices), which limits their ability to detect misfit occurring beyond the mean and covariance structures. This implies that even when conventional GOF diagnostics appear satisfactory, misfit not captured by these measures may remain masked. When the fit to mean and covariance structures is poor, researchers often turn to modification indices, which approximate changes in the model
$\chi ^2$
statistic in response to relaxing model constraints (e.g., allowing for cross-loadings or correlated residuals). Although these indices can help identify potential sources of misfit and guide model modifications, these adjustments remain confined to improving the recovery of mean and covariance structures.
In this article, we emphasize that assessing GOF to mean and covariance structures only may overlook critical aspects of model-data disagreement, potentially resulting in misleading conclusions. For instance, in a common factor model with two LVs, the relation between the LVs is often of interest, typically inferred from the estimated correlation coefficient. When bivariate normality is assumed for the LVs, the correlation coefficient quantifies the direction and strength of their linear relation. If normality is violated, the correlation coefficient taken at face value may inaccurately represent the inter-LV association, as the two LVs may not even be linearly related (Liu & Wang, Reference Liu and Wang2024). Such violations, however, cannot easily be detected by conventional methods. Another aspect that could potentially be overlooked is MV-level misfit. While residual means and variances can be inspected for individual MVs, this often provides little insight because the observed means and variances are often perfectly recovered in standard linear factor models.
To facilitate GOF assessment beyond mean and covariance structures, we focus on the theory of generalized residuals (Haberman & Sinharay, Reference Haberman and Sinharay2013), which was originally developed for and applied to categorical data organized in contingency tables. Informally, generalized residuals refer to the discrepancy between the sample average and the model-based population expectation of a specific summary quantity, which can be flexibly defined to evaluate various aspects of model misfit. This flexibility, coupled with the following benefits, makes generalized residuals a powerful tool for fit assessment. First, generalized residuals are asymptotically normal under the null hypothesis, enabling formal statistical tests. Second, they can be defined conditionally on LV values, allowing for localized fit assessments when combined with LV measurement models. When the latent dimensionality is low (e.g., one or two), graphical displays can also be conveniently generated conditionally on LV values, providing intuitive diagnostics of the nature of the misfit. Based on these features, the generalized residuals framework has been successfully applied in IRT with categorical data to evaluate specific parametric assumptions of the model, such as functional forms of item characteristic curves and the normality of LVs (Haberman et al., Reference Haberman, Liu and Lee2019; Haberman et al., Reference Haberman, Sinharay and Chon2013; Monroe, Reference Monroe2021; van Rijn et al., Reference van Rijn, Sinharay, Haberman and Johnson2016).
One contribution of our work is to extend the theory of generalized residuals (Haberman & Sinharay, Reference Haberman and Sinharay2013) to more general measurement models. We focus on GOF testing for linear normal factor analysis, given its wide application in measurement scenarios involving continuous MVs. As part of this extension, we introduce transformed residuals and establish their joint multivariate normality, which offers greater flexibility in constructing test statistics compared to the original framework. In addition, we propose summary statistics that evaluate multiple residuals simultaneously.
The second contribution of our work is to propose three example test statistics that are designed to capture various aspects of model misfit in common factor models. These examples illustrate how our general framework can be adapted to specific fit assessment problems. The first example evaluates the fit of the LV density, while the second and third examples focus on MV-level fit assessment by examining the conditional mean and variance functions of each MV given the LVs. In all three examples, residuals are defined conditionally on LV values, allowing for both pointwise fit assessments at specific LV values and overall evaluations across multiple LV values. Given the flexibility of our framework, test statistics targeting other types of misfit can be explored in future research. The strength of our framework lies in its ability to identify issues that traditional GOF diagnostics may overlook, even when their results appear favorable. We demonstrate the utility of our method through simulations and an empirical data analysis using RT data. Ultimately, our proposed framework is expected to complement conventional GOF diagnostics for common factor models by enabling more nuanced assessments of model-data fit.
The remainder of the article is organized as follows. In Section 2, we introduce the common factor model and present our extended theory of generalized residuals, along with example GOF test statistics. In Section 3, the performance of the proposed test statistics is evaluated by Monte Carlo (MC) studies. In Section 4, our GOF testing methods are illustrated with a real data example. The article concludes with a discussion of the main findings, limitations, and future directions in Section 5.
2 Theory
2.1 Common factor model
2.1.1 LV measurement model
Let
$Y_{ij} \in \mathcal {R}$
be the individual i’s response for MV j, and
$\mathbf {X}_i=(X_{i1}, \dots , X_{id})^{^{\top }} \in \mathcal {R}^d$
be the d-dimensional LV for individual i. Denote the conditional density of
$Y_{ij}|\mathbf {X}_{i}$
by
$f_j(y_{ij}|\boldsymbol {x})$
, where
$y_{ij}$
and
$\boldsymbol {x}=(x_1, \dots , x_d)^{^{\top }}$
indicate the realized values of
$Y_{ij}$
and
$\mathbf {X}_i$
, respectively. Let
$\mathbf {Y}_i=(Y_{i1}, \dots , Y_{im})^{\top }$
be a collection of m MVs from the same individual i. It is typically assumed that
$Y_{i1}, \dots , Y_{im}$
are conditionally independent given
$\mathbf {X}_i$
, often referred to as the local independence assumption (McDonald, Reference McDonald1999, pp. 255–257). The conditional density of
$\mathbf {Y}_i=\boldsymbol {y}_i=(y_{i1}, \dots , y_{im})^{\top }$
given
$\mathbf {X}_i=\boldsymbol {x}$
can then be factorized as
(1)$$ \begin{align} f(\boldsymbol{y}_i \,|\, \boldsymbol{x})=\prod_{j=1}^m f_j(y_{ij} \,|\, \boldsymbol{x}). \end{align} $$
It follows that the marginal density/likelihood of
$\mathbf {Y}_i=\boldsymbol {y}_i$
is obtained as the following integral:
(2)$$ \begin{align} f(\boldsymbol{y}_i)=\int_{\mathcal{R}^d} \prod_{j=1}^m f_j(y_{ij} \,|\, \boldsymbol{x})\,\phi(\boldsymbol{x})\, d\boldsymbol{x}, \end{align} $$
in which
$\phi $
denotes the population density of
$\mathbf {X}_i$
.
Pooling across a sample of n independent and identically distributed (i.i.d.) random vectors of MVs, the sample log-likelihood can be expressed as
(3)$$ \begin{align} l_n(\boldsymbol{\xi}; \mathbf{Y})=\sum_{i=1}^n \log f(\mathbf{Y}_i; \boldsymbol{\xi}), \end{align} $$
in which
$\mathbf {Y}=(\mathbf {Y}_1, \dots , \mathbf {Y}_n)^{\top } \in \mathcal {R}^{n \times m}$
denotes the data matrix, and the dependency on model parameters, denoted as
$\boldsymbol {\xi } \in \mathcal {R}^q$
, is now included in the notation. In parametric LV measurement models where the distribution of LVs and the conditional distribution of MVs given LVs are specified,
$\boldsymbol {\xi }$
is often estimated by maximum likelihood (ML). ML estimation seeks the value of
$\boldsymbol {\xi }$
that solves the following estimating equation:
(4)$$ \begin{align} \nabla_{\boldsymbol{\xi}} l_n(\boldsymbol{\xi}; \mathbf{Y})=\mathbf{0}, \end{align} $$
in which
$\nabla _{\boldsymbol {\xi }}l_n(\boldsymbol {\xi }; \mathbf {Y})$
is a
$1 \times q$
vector of partial derivatives of
$l_n(\boldsymbol {\xi }; \mathbf {Y})$
with respect to
$\boldsymbol {\xi }$
. Provided that the Hessian matrix is negative definite, the solution to Equation (4), denoted
$\hat {\boldsymbol {\xi }}$
, corresponds to a local maximizer of the sample log-likelihood function. Under suitable regularity conditions and given correct model specification,
$\hat {\boldsymbol {\xi }}$
satisfies
(5)$$ \begin{align} \sqrt{n}\left(\hat{\boldsymbol{\xi}}-\boldsymbol{\xi}\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \boldsymbol{\mathcal{I}}(\boldsymbol{\xi})^{-1}\right) \end{align} $$
as
$n \to \infty $
, where
$\boldsymbol{\mathcal {I}}(\boldsymbol {\xi })=\mathbb {E} [\nabla _{\boldsymbol {\xi }} \log f(\mathbf {Y}_i; \boldsymbol {\xi })^{{}^{\top }} \nabla _{\boldsymbol {\xi }} \log f(\mathbf {Y}_i; \boldsymbol {\xi })]$
denotes the
$q \times q$
per-observation Fisher information matrix. By Equation (5), local identifiability of the model parameters is implicitly assumed.
2.1.2 Common factor model
In the current work, we focus on a specific LV measurement model for continuous MVs: the common factor model. We assume that
$\mathbf {X}_i \sim \mathcal {N}(\mathbf {0}, \boldsymbol {\Phi })$
, with the population density of
$\mathbf {X}_i=\boldsymbol {x}$
expressed by
(6)$$ \begin{align} \phi(\boldsymbol{x}; \boldsymbol{\xi})=(2\pi)^{-d/2}\,|\boldsymbol{\Phi}|^{-1/2}\exp\left(-\frac{1}{2}\boldsymbol{x}^{\top}\boldsymbol{\Phi}^{-1}\boldsymbol{x}\right). \end{align} $$
We further assume that
$Y_{ij}|\mathbf {X}_i=\boldsymbol {x} \sim \mathcal {N}(\nu _j+\boldsymbol {\lambda }_j^{^{\top }} \boldsymbol {x}, \theta _j)$
for
$j=1,\dots ,m$
, where
$\nu _j$
denotes an intercept,
$\boldsymbol {\lambda }_j$
denotes a
$d \times 1$
vector of factor loadings, and
$\theta _j$
denotes an error variance. With this model specification, the conditional density
$f_j(y_{ij}|\boldsymbol {x})$
is given by
(7)$$ \begin{align} f_j(y_{ij} \,|\, \boldsymbol{x})=\frac{1}{\sqrt{2\pi\theta_j}}\exp\left(-\frac{\left(y_{ij}-\nu_j-\boldsymbol{\lambda}_j^{\top}\boldsymbol{x}\right)^2}{2\theta_j}\right). \end{align} $$
The marginal likelihood
$f(\boldsymbol {y}_i)$
can then be derived from Equation (2), resulting in a normal density function for
$\mathcal {N}(\boldsymbol {\nu }, \boldsymbol {\Lambda } \boldsymbol {\Phi } \boldsymbol {\Lambda }^{\top } + \boldsymbol {\Theta })$
:
(8)$$ \begin{align} f(\boldsymbol{y}_i; \boldsymbol{\xi})=(2\pi)^{-m/2}\left|\boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top}+\boldsymbol{\Theta}\right|^{-1/2}\exp\left[-\frac{1}{2}(\boldsymbol{y}_i-\boldsymbol{\nu})^{\top}\left(\boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top}+\boldsymbol{\Theta}\right)^{-1}(\boldsymbol{y}_i-\boldsymbol{\nu})\right], \end{align} $$
in which
$\boldsymbol {\nu }=(\nu _1, \dots , \nu _m)^{^{\top }}$
is an
$m \times 1$
vector of intercepts,
$\boldsymbol {\Lambda }=(\boldsymbol {\lambda }_1, \dots , \boldsymbol {\lambda }_m)^{^{\top }}$
is an
$m \times d$
matrix of factor loadings, and
$\boldsymbol {\Theta } = \text {Diag}(\theta _1,\dots ,\theta _m)$
is an
$m \times m$
diagonal error covariance matrix. The model parameters
$\boldsymbol {\xi }$
now consist of the free elements in
$\boldsymbol {\nu }$
,
$\boldsymbol {\Lambda }$
,
$\boldsymbol {\Phi }$
, and
$\boldsymbol {\Theta }$
. The ML estimator
$\hat {\boldsymbol {\xi }}$
can be obtained by solving Equation (4) with the normal log-likelihood
$l_n(\boldsymbol {\xi }; \mathbf {Y})$
.
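To make the estimation target concrete, the following minimal R sketch evaluates the normal marginal log-likelihood implied by Equations (3) and (8); the objects `Y` (an n × m data matrix) and the candidate parameter values `nu`, `Lambda`, `Phi`, and `Theta` are assumed inputs, and the mvtnorm package is assumed available. This is an illustration of the objective function rather than the optimization routine itself.
library(mvtnorm)

# Marginal log-likelihood of the common factor model: Y_i ~ N(nu, Lambda Phi Lambda' + Theta).
loglik_factor <- function(Y, nu, Lambda, Phi, Theta) {
  Sigma_y <- Lambda %*% Phi %*% t(Lambda) + Theta           # model-implied covariance matrix
  sum(dmvnorm(Y, mean = nu, sigma = Sigma_y, log = TRUE))   # Equation (3) with the density in Equation (8)
}
Maximizing this function over the free elements of $\boldsymbol {\nu }$, $\boldsymbol {\Lambda }$, $\boldsymbol {\Phi }$, and $\boldsymbol {\Theta }$ yields the ML estimator; in practice we rely on dedicated software such as lavaan, as described in Section 3.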
2.1.3 Summary quantity and fit assessment
Under this common factor model, our goal now is to develop test statistics that can capture misfit beyond the mean and covariance structures. To achieve this, consider a summary quantity
$H(\mathbf {Y}_i; \boldsymbol {\xi })$
for each individual i, which is a real continuous function allowed to depend on the MVs
$\mathbf {Y}_i$
and the model parameters
$\boldsymbol {\xi }$
. Depending on the context, this function can also incorporate fixed LV values
$\boldsymbol {x}$
. Using
$H(\mathbf {Y}_i; \boldsymbol {\xi })$
, we obtain its population expectation by
(9)$$ \begin{align} \eta(\boldsymbol{\xi})=\mathbb{E}\left[H(\mathbf{Y}_i; \boldsymbol{\xi})\right]=\int H(\boldsymbol{y}; \boldsymbol{\xi})\, f(\boldsymbol{y}; \boldsymbol{\xi})\, d\boldsymbol{y}. \end{align} $$
An empirical estimator of Equation (9) is the sample average
(10)$$ \begin{align} \hat{\eta}(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n H(\mathbf{Y}_i; \boldsymbol{\xi}). \end{align} $$
The difference
$\hat {\eta }(\boldsymbol {\xi })-\eta ( \boldsymbol {\xi })$
then quantifies the discrepancy between the sample average and the model-implied population expectation, forming the basis for assessing model fit. Because model parameters are typically unknown in practice and must be estimated from data, we replace
$\boldsymbol {\xi }$
by the ML estimator
$\hat{\boldsymbol{\xi}}$
and rely on the residual
$\hat {\eta }(\hat {\boldsymbol {\xi }})-\eta ( \hat {\boldsymbol {\xi }})$
as the final basis for constructing fit statistics.
When the same idea is applied to models for categorical data, the difference between sample and population averages of summary quantities is referred to as generalized residuals (Haberman & Sinharay, Reference Haberman and Sinharay2013). Generalized residuals are asymptotically normal under the correct model specification, which has enabled their wide application in constructing various fit statistics in IRT (e.g., Haberman et al., Reference Haberman, Liu and Lee2019; Haberman et al., Reference Haberman, Sinharay and Chon2013; Monroe, Reference Monroe2021; van Rijn et al., Reference van Rijn, Sinharay, Haberman and Johnson2016).
In this article, we extend the existing theory of generalized residuals (Haberman & Sinharay, Reference Haberman and Sinharay2013) to establish asymptotic properties of transformed residuals under more general measurement models. Our proposed framework not only accommodates continuous MVs, but also enables flexible construction of diverse fit test statistics not considered in extant work on generalized residuals. In the following sections, we begin by presenting the most general theory in Section 2.2, construct two different types of test statistics in Section 2.3, and then narrow down to specific examples of fit assessment in Section 2.4.
2.2 Residuals
2.2.1 Asymptotic normality of residuals
Consider a
$k \times 1$
vector of summary quantities
(11)$$ \begin{align} \mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi})=\left(H_1(\mathbf{Y}_i; \boldsymbol{\xi}), \dots, H_k(\mathbf{Y}_i; \boldsymbol{\xi})\right)^{\top}, \end{align} $$
whose components are formed by different choices of summary quantities introduced in Section 2.1.3. Analogous to Equations (9) and (10), let
$\boldsymbol {\eta }(\boldsymbol {\xi })$
and
$\hat {\boldsymbol {\eta }}(\boldsymbol {\xi })$
denote the population expectation and sample average of the vector-valued
$\mathbf {H}(\mathbf {Y}_i; \boldsymbol {\xi })$
, respectively:
(12)$$ \begin{align} \boldsymbol{\eta}(\boldsymbol{\xi})=\mathbb{E}\left[\mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi})\right], \end{align} $$
(13)$$ \begin{align} \hat{\boldsymbol{\eta}}(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n \mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi}). \end{align} $$
With the ML estimates for model parameters plugged in, the
$k \times 1$
difference vector
$\hat {\boldsymbol {\eta }}(\hat {\boldsymbol {\xi }})-\boldsymbol {\eta }(\hat {\boldsymbol {\xi }})$
quantifies the discrepancy between the data and the model, evaluated based on
$H_{\ell }(\mathbf {Y}_i; \hat {\boldsymbol {\xi }})$
for
$\ell =1,\dots ,k$
.
To further generalize, let
$\boldsymbol {\varphi }: \mathcal {R}^{k} \to \mathcal {R}^{k'}$
be a twice continuously differentiable transformation function applied to both
$\hat {\boldsymbol {\eta }}$
and
$\boldsymbol {\eta }$
. This transformation allows us to construct a
$k' \times 1$
vector of transformed residuals:
(14)$$ \begin{align} \boldsymbol{e}_{\boldsymbol{\varphi}}=\boldsymbol{\varphi}\left(\hat{\boldsymbol{\eta}}(\hat{\boldsymbol{\xi}})\right)-\boldsymbol{\varphi}\left(\boldsymbol{\eta}(\hat{\boldsymbol{\xi}})\right), \end{align} $$
which evaluates model-data fit with respect to the population quantities of interest
$\boldsymbol {\varphi }(\boldsymbol {\eta }(\boldsymbol {\xi }))$
. This general formulation enables more flexible construction of statistics, including those involving ratios of summary quantities, as will be illustrated in Sections 2.4.2 and 2.4.3.
In the supplementary material, we establish the following asymptotic normality of the transformed residuals:
(15)$$ \begin{align} \sqrt{n}\,\boldsymbol{e}_{\boldsymbol{\varphi}} \xrightarrow{d} \mathcal{N}\left(\mathbf{0},\; \nabla\boldsymbol{\varphi}(\boldsymbol{\eta}(\boldsymbol{\xi}))\left[\boldsymbol{\Sigma}_{\mathbf{H}}(\boldsymbol{\xi})-\mathbf{A}(\boldsymbol{\xi})\,\boldsymbol{\mathcal{I}}(\boldsymbol{\xi})^{-1}\mathbf{A}(\boldsymbol{\xi})^{\top}\right]\nabla\boldsymbol{\varphi}(\boldsymbol{\eta}(\boldsymbol{\xi}))^{\top}\right), \end{align} $$
in which
$\nabla \boldsymbol {\varphi }$
stands for a
$k' \times k$
Jacobian matrix of
$\boldsymbol {\varphi }$
,
$\mathbf {A}(\boldsymbol {\xi })=\mathbb {E}\left [\mathbf {H}(\mathbf {Y}_i; \boldsymbol {\xi }) \nabla _{\boldsymbol {\xi }} \log f(\mathbf {Y}_i; \boldsymbol {\xi }) \right ]$
, and
$\boldsymbol {\Sigma }_{\mathbf {H}}(\boldsymbol {\xi })=\mathrm {Cov}[\mathbf {H}( \mathbf {Y}_i; \boldsymbol {\xi })].$
In the special case where the identity transformation
$\boldsymbol \varphi = \boldsymbol {\iota }$
is applied, with
$\boldsymbol {\iota }(\mathbf {h})=\mathbf {h}$
for any
$\mathbf {h}\in \mathcal {R}^{k}$
, Equation (14) reduces to a
$k \times 1$
vector of original residuals,
(16)$$ \begin{align} \boldsymbol{e}_{\boldsymbol{\iota}}=\hat{\boldsymbol{\eta}}(\hat{\boldsymbol{\xi}})-\boldsymbol{\eta}(\hat{\boldsymbol{\xi}}). \end{align} $$
In this case,
$\nabla \boldsymbol {\varphi }(\boldsymbol {\eta }(\boldsymbol {\xi }))$
in Equation (15) simplifies to the
$k \times k$
identity matrix.
2.2.2 Estimation of the asymptotic covariance matrix
In practice, the asymptotic covariance matrix (ACM) of residuals (Equation (15)) needs to be estimated from the data. When the population ACM is substituted with its consistent estimator, the asymptotic normality results remain unchanged by Slutsky's theorem (Bickel & Doksum, Reference Bickel and Doksum2015, p. 512).
To obtain the consistent estimator of the ACM in Equation (15), we replace
$\boldsymbol {\xi }$
with the ML estimator
$\hat {\boldsymbol {\xi }}$
wherever they appear in the formula. For the population mean and (co)variance in
$\mathbf {A}(\hat {\boldsymbol {\xi }})$
and
$\boldsymbol {\Sigma }_{\mathbf {H}}(\hat {\boldsymbol {\xi }})$
, a straightforward approach is to use the sample mean and sample covariance matrix, as done by Haberman et al. (Reference Haberman, Sinharay and Chon2013) and Monroe (Reference Monroe2021). However, our pilot simulation study revealed that this approach could result in severely inflated Type I error rates, particularly with small sample sizes. To address this issue, we suggest obtaining more stable estimates for
$\mathbf {A}( \hat {\boldsymbol {\xi }})$
and
$\boldsymbol {\Sigma }_{\mathbf {H}}( \hat {\boldsymbol {\xi }})$
by computing the MC mean and covariance. This MC approach has been similarly used to approximate the expected Fisher information for IRT models (Monroe, Reference Monroe2019). The steps are outlined as follows:
1. Sample
$\tilde {\boldsymbol {y}}_i, i=1,\dots ,M$ , from the marginal distribution of
$\mathbf {Y}_i$ with density
$f(\boldsymbol {y}_i;\hat {\boldsymbol {\xi }})$ .
2. Use
$\tilde {\boldsymbol {y}}_i$ to calculate
$\mathbf {H}( \tilde {\boldsymbol {y}}_i; \hat {\boldsymbol {\xi }})$ and
$\nabla _{\boldsymbol {\xi }} \log f(\tilde {\boldsymbol {y}}_i; \hat {\boldsymbol {\xi }})$ .
3. Estimate
$\mathbf {A}(\hat {\boldsymbol {\xi }})$ and
$\boldsymbol {\Sigma }_{\mathbf {H}}(\hat {\boldsymbol {\xi }})$ by
(17)$$ \begin{align} \hat{\mathbf{A}}(\hat{\boldsymbol{\xi}})&=\frac{1}{M}\sum_{i=1}^M \left[\mathbf{H}(\tilde{\boldsymbol{y}}_i; \hat{\boldsymbol{\xi}}) \nabla_{\boldsymbol{\xi}} \log f(\tilde{\boldsymbol{y}}_i; \hat{\boldsymbol{\xi}})\right], \end{align} $$
(18)$$ \begin{align} &\hat{\boldsymbol{\Sigma}}_{\mathbf{H}}(\hat{\boldsymbol{\xi}})= \frac{1}{M-1}\sum_{i=1}^M \left[\left(\mathbf{H}(\tilde{\boldsymbol{y}}_i; \hat{\boldsymbol{\xi}}) - \bar{\mathbf{H}}( \hat{\boldsymbol{\xi}})\right)\left(\mathbf{H}(\tilde{\boldsymbol{y}}_i; \hat{\boldsymbol{\xi}}) - \bar{\mathbf{H}}(\hat{\boldsymbol{\xi}})\right)^{\top}\right]. \end{align} $$
In Equation (18),
$\bar {\mathbf {H}}(\hat {\boldsymbol {\xi }})=\frac {1}{M}\sum _{i=1}^M \mathbf {H}(\tilde {\boldsymbol {y}}_i; \hat {\boldsymbol {\xi }})$
. As
$M \to \infty, \hat {\mathbf {A}}(\hat {\boldsymbol {\xi }}) \text{ and } \hat{\boldsymbol {\Sigma}}_{\mathbf {H}}(\hat {\boldsymbol {\xi }})$
are consistent estimators of
$\mathbf {A}(\hat {\boldsymbol {\xi }}) \text{ and } \boldsymbol {\Sigma }_{\mathbf {H}}(\hat {\boldsymbol {\xi }})$
, respectively.
In the subsequent notation for ACMs, a caret “ˆ” is added to indicate that the ACMs are obtained using
$\hat {\boldsymbol {\xi }}, \hat {\mathbf {A}},$
and
$\hat {\boldsymbol {\Sigma }}_{\mathbf {H}}$
.
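As a minimal R sketch of the MC procedure above (assuming the model-implied mean vector `mu_hat` and covariance matrix `Sigma_hat` of the fitted linear normal factor model, together with user-supplied functions `H_fun(y)` returning the k × 1 summary vector and `score_fun(y)` returning the 1 × q score, all evaluated at the ML estimates; the MASS package is assumed available):
library(MASS)

# MC estimation of A(xi-hat) and Sigma_H(xi-hat) as in Equations (17) and (18).
mc_acm_pieces <- function(M, mu_hat, Sigma_hat, H_fun, score_fun) {
  y_tilde <- mvrnorm(M, mu = mu_hat, Sigma = Sigma_hat)  # Step 1: draw from f(y; xi-hat)
  H_mat <- t(apply(y_tilde, 1, H_fun))                   # Step 2: M x k matrix of summary values
  S_mat <- t(apply(y_tilde, 1, score_fun))               # Step 2: M x q matrix of scores
  A_hat <- crossprod(H_mat, S_mat) / M                   # Equation (17)
  Sigma_H_hat <- cov(H_mat)                              # Equation (18)
  list(A = A_hat, Sigma_H = Sigma_H_hat)
}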
2.3 Test statistics
Based on the asymptotic normality of residuals established in Section 2.2.1, we develop two types of test statistics for assessing model misfit: (1) a z-statistic for testing a single residual and (2) a
$\chi ^2$
-statistic for testing multiple residuals simultaneously.
2.3.1 z-statistic
Let
$e_{\boldsymbol {\varphi },r}$
be the rth component of
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
in Equation (14) (
$r=1, \dots , k'$
). Similarly, let
$\hat {\sigma }_{\boldsymbol {\varphi },r}$
be the square root of the rth diagonal element of
$\hat {\boldsymbol {\Sigma }}_{\boldsymbol {\varphi }}$
, where the shorthand notation
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}$
is introduced to denote the ACM in Equation (15):
(19)$$ \begin{align} \boldsymbol{\Sigma}_{\boldsymbol{\varphi}}=\nabla\boldsymbol{\varphi}(\boldsymbol{\eta}(\boldsymbol{\xi}))\left[\boldsymbol{\Sigma}_{\mathbf{H}}(\boldsymbol{\xi})-\mathbf{A}(\boldsymbol{\xi})\,\boldsymbol{\mathcal{I}}(\boldsymbol{\xi})^{-1}\mathbf{A}(\boldsymbol{\xi})^{\top}\right]\nabla\boldsymbol{\varphi}(\boldsymbol{\eta}(\boldsymbol{\xi}))^{\top}. \end{align} $$
For a single residual
$e_{\boldsymbol {\varphi },r}$
, we standardize it to construct the z-statistic:
(20)$$ \begin{align} z=\frac{\sqrt{n}\, e_{\boldsymbol{\varphi},r}}{\hat{\sigma}_{\boldsymbol{\varphi},r}}. \end{align} $$
Under the null hypothesis that the model is correctly specified, the z-statistic converges in distribution to
$\mathcal {N}(0, 1)$
. Consequently, values of
$|z|$
> 1.96 suggest significant misfit at
$\alpha =0.05$
based on
$e_{\boldsymbol {\varphi },r}$
.
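For instance, a minimal R helper (object names are illustrative) that turns a single residual, its estimated asymptotic variance, and the sample size into the z-statistic and a two-sided p-value is:
# z-statistic of Equation (20) for the r-th residual, given the r-th diagonal
# element of the estimated ACM and the sample size n.
z_stat <- function(e_r, sigma2_rr, n) {
  z <- sqrt(n) * e_r / sqrt(sigma2_rr)
  p <- 2 * pnorm(-abs(z))            # two-sided p-value against the N(0, 1) reference
  c(z = z, p.value = p)
}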
2.3.2
$\chi ^2$
-statistic
Building on the joint asymptotic multivariate normality of
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
established in Equation (15), we construct a quadratic form statistic
$n\boldsymbol {e}_{\boldsymbol {\varphi }}^{\top } \mathbf {W} \boldsymbol {e}_{\boldsymbol {\varphi }}$
, where
$\mathbf {W}$
is a
$k' \times k'$
symmetric weight matrix. This statistic provides a summary of model misfit based on multiple residuals. Such summary information can help practitioners make informed judgments about the overall model misfit of interest. In the remainder of this section, we first review some existing approaches for constructing quadratic form statistics based on normal variables, followed by our newly proposed approach.
When the model is correctly specified, it is known that a quadratic form statistic of normal variates is distributed as a mixture of independent
$\chi _1^2$
random variables (Box, Reference Box1954). The correct p-value for this mixture-
$\chi ^2$
distribution can be computed, for example, by using the inversion formula given in Imhof (Reference Imhof1961) or by adjusting the statistic based on its mean and variance, approximating a
$\chi ^2$
distribution (Satorra & Bentler, Reference Satorra and Bentler1994; Satterthwaite, Reference Satterthwaite1946). However, calculating p-values under a
$\chi ^2$
-mixture reference can often be computationally challenging.
To simplify the p-value calculations, the weight matrix
$\mathbf {W}$
can be chosen such that the resulting quadratic form statistic is asymptotically distributed as
$\chi ^2$
. Specifically, this condition is met when
(21)$$ \begin{align} \boldsymbol{\Sigma}_{\boldsymbol{\varphi}}\mathbf{W}\boldsymbol{\Sigma}_{\boldsymbol{\varphi}}\mathbf{W}\boldsymbol{\Sigma}_{\boldsymbol{\varphi}}=\boldsymbol{\Sigma}_{\boldsymbol{\varphi}}\mathbf{W}\boldsymbol{\Sigma}_{\boldsymbol{\varphi}} \end{align} $$
and
$tr(\mathbf {W}\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})=s$
, where s is the degrees of freedom of the resulting
$\chi ^2$
distribution (Schott, Reference Schott2016, Theorem 11.11). A common way to satisfy these conditions is to use the Moore–Penrose pseudoinverse of
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}$
,
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{+},$
as the weight matrix (Magnus & Neudecker, Reference Magnus and Neudecker1999, p. 36). This approach has been employed in previous studies to construct quadratic form fit statistics (e.g., Liu & Maydeu-Olivares, Reference Liu and Maydeu-Olivares2014; Liu et al., Reference Liu, Yang and Maydeu-Olivares2019; Maydeu-Olivares & Liu, Reference Maydeu-Olivares and Liu2015; Reiser, Reference Reiser1996). However, the use of
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{+}$
can introduce numerical instability, as the rank of
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}$
needs to be estimated numerically by counting non-zero eigenvalues in its sample estimate. This procedure relies on an arbitrarily selected tolerance for identifying zero eigenvalues, which may lead to inconsistent performance of the resulting test statistic (Maydeu-Olivares & Joe, Reference Maydeu-Olivares and Joe2008).
To address this issue, we propose defining the weight matrix as
(22)$$ \begin{align} \mathbf{W}=\boldsymbol{\Sigma}_{\boldsymbol{\varphi}}^{-}=\mathbf{U}\boldsymbol{\Omega}^{-}\mathbf{U}^{\top}, \end{align} $$
in which
$\mathbf {U}$
is the
$k' \times k'$
matrix of eigenvectors from the eigendecomposition of
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}$
(i.e.,
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}=\mathbf {U} \boldsymbol {\Omega } \mathbf {U}^{\top }$
),
$\boldsymbol {\Omega }$
is the diagonal matrix of eigenvalues, and
$\boldsymbol {\Omega }^{-}=\text {Diag}(\omega _1^{-1}, \dots , \omega _s^{-1}, 0, \dots , 0)$
. Here,
$\boldsymbol {\Omega }^{-}$
stands for the inverse of
$\boldsymbol {\Omega }$
computed by inverting only the largest s eigenvalues,
$\omega _1, \dots , \omega _s$
, where
$s \leq \text {rank}(\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})$
. The proposed weight matrix
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-}$
generalizes the pseudoinverse
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{+}$
, subsuming it as a special case when
$s=\text {rank}(\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})$
. For
$s \leq \text {rank}(\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})$
,
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}$
serves as a generalized inverse of
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-}$
, such that
$\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-} \boldsymbol {\Sigma }_{\boldsymbol {\varphi }} \boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-}=\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-}$
holds. With this formulation, the conditions in Equation (21) and
$tr(\boldsymbol {\Sigma }_{\boldsymbol {\varphi }}^{-}\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})=s$
are satisfied.
Accordingly, we define our
$\chi ^2$
-statistic using
$\hat {\mathbf {W}}=\boldsymbol {\hat {\Sigma }}_{\boldsymbol {\varphi }}^{-}$
as
(23)$$ \begin{align} T=n\,\boldsymbol{e}_{\boldsymbol{\varphi}}^{\top}\hat{\boldsymbol{\Sigma}}_{\boldsymbol{\varphi}}^{-}\boldsymbol{e}_{\boldsymbol{\varphi}}, \end{align} $$
such that T asymptotically follows a
$\chi ^2$
distribution with s degrees of freedom (
$\chi ^2_s)$
under correct model specification. Since this asymptotic property holds for any
$1 \leq s \leq \text {rank}(\boldsymbol {\Sigma }_{\boldsymbol {\varphi }})$
, a sufficiently small value, such as
$s=1$
, can be chosen as the reference degrees of freedom to avoid numerical instability.
With the choice of
$s=1$
, the reference distribution becomes
$\chi ^2_1$
, where
$T> 3.84$
indicates significant overall misfit at
$\alpha =0.05$
based on multiple residuals in
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
. The performance of the proposed
$\chi ^2$
-statistic with
$s=1$
will be evaluated through our simulation studies in Section 3.
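A minimal R sketch of this construction (the residual vector `e_phi`, its estimated ACM `Sigma_phi_hat`, and the sample size `n` are assumed available, e.g., from the computations sketched earlier) is:
# Quadratic-form statistic T of Equation (23) with the truncated-inverse weight
# matrix of Equation (22); `s` is the chosen reference degrees of freedom.
T_stat <- function(e_phi, Sigma_phi_hat, n, s = 1) {
  ed <- eigen(Sigma_phi_hat, symmetric = TRUE)                        # eigenvalues in decreasing order
  omega_inv <- c(1 / ed$values[1:s], rep(0, length(ed$values) - s))   # invert only the largest s eigenvalues
  W_hat <- ed$vectors %*% diag(omega_inv, nrow = length(omega_inv)) %*% t(ed$vectors)
  T_val <- n * as.numeric(t(e_phi) %*% W_hat %*% e_phi)
  c(T = T_val, p.value = pchisq(T_val, df = s, lower.tail = FALSE))
}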
2.4 Examples
In this section, we provide three examples to illustrate how our proposed framework can be adapted to assess specific model assumptions in common factor models. The first example (Section 2.4.1) focuses on assessing the distributional assumption of LVs. The second and third examples (Sections 2.4.2 and 2.4.3) address testing MV-level linearity and homoscedasticity by evaluating each MV’s mean and variance functions conditional on the LVs. Note that the key connection from our general framework to practical model fit assessment lies in the careful design of the functions
$\mathbf {H}$
and the selection of the transformation function
$\boldsymbol {\varphi }$
. Although we focus on only three examples here, further variations are open for future research.
2.4.1 Testing normality of LV density
Let
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
denote the Q distinct values of the d-dimensional LV, for which we aim to assess the misfit in LV density. Given our goal of evaluating the normality assumption of the LVs, the dependence of the summary quantities on the LV values is now naturally introduced into our formulation, even when not explicitly expressed in the notation.
First, construct
$\mathbf {H}$
in Equation (11) with
$k=Q$
as
(24)$$ \begin{align} \mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi})=\left(H_1(\mathbf{Y}_i; \boldsymbol{\xi}), \dots, H_Q(\mathbf{Y}_i; \boldsymbol{\xi})\right)^{\top}, \end{align} $$
in which each component
$H_\ell $
(
$\ell =1, \dots , Q)$
represents the posterior density of
$\boldsymbol {x}_\ell $
given
$\mathbf {Y}_i$
:
(25)$$ \begin{align} H_\ell(\mathbf{Y}_i; \boldsymbol{\xi})=\phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right). \end{align} $$
With this choice of
$H_\ell $
, its population expectation, denoted by
$\eta _\ell $
, corresponds to the LV density
$\phi (\boldsymbol {x}_\ell; \boldsymbol {\xi })$
defined in Equation (6):
(26)$$ \begin{align} \eta_\ell(\boldsymbol{\xi})=\mathbb{E}\left[\phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right)\right]=\phi(\boldsymbol{x}_\ell; \boldsymbol{\xi}). \end{align} $$
Equation (26) follows from Bayes’ theorem,
(27)$$ \begin{align} \phi\left(\boldsymbol{x}_\ell \,|\, \boldsymbol{y}_i; \boldsymbol{\xi}\right)=\frac{\phi(\boldsymbol{x}_\ell; \boldsymbol{\xi})\prod_{j=1}^m f_j\left(y_{ij} \,|\, \boldsymbol{x}_\ell; \boldsymbol{\xi}\right)}{f(\boldsymbol{y}_i; \boldsymbol{\xi})}, \end{align} $$
along with Equation (9). The empirical estimator of
$\eta _\ell $
, denoted by
$\hat {\eta }_\ell $
, is then constructed as
(28)$$ \begin{align} \hat{\eta}_\ell(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n \phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right), \end{align} $$
representing the sample mean of the posterior densities. Now in the vector form, the difference between
$\hat {\boldsymbol {\eta }}$
and
$\boldsymbol {\eta }$
(Equations (13) and (12)) collectively represents the discrepancies between the population LV density and its sample estimate on
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
. Thus, with
$\boldsymbol {\varphi }=\boldsymbol {\iota }$
being the identity transformation (see Equation (16)), the resulting
$Q \times 1$
vector of residuals,
(29)$$ \begin{align} \boldsymbol{e}_{\boldsymbol{\iota}}=\hat{\boldsymbol{\eta}}(\hat{\boldsymbol{\xi}})-\boldsymbol{\eta}(\hat{\boldsymbol{\xi}}), \end{align} $$
reflects the nonnormality of the LVs.
With this formulation, we construct the z-statistic from Equation (20) based on each single residual
$e_{\boldsymbol {\iota }, \ell }=\hat {\eta }_\ell (\hat {\boldsymbol {\xi }})-\eta _\ell (\hat {\boldsymbol {\xi }})$
, which captures the misfit in LV density at
$\boldsymbol {x}_\ell $
. In addition, we construct the
$\chi ^2$
-statistic from Equation (23) based on
$\boldsymbol {e}_{\boldsymbol {\iota }}$
, which reflects the overall misfit in LV density across
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
. Given the nature of fit assessment in our example, we will also use the terms pointwise statistic and summary statistic for the z- and
$\chi ^2$
-statistics, respectively. Also, we denote these statistics developed for this LV density example as
$z_1(\boldsymbol {x}_\ell )$
and
$T_1$
, respectively, for use in later sections. The pointwise statistic
$z_1$
coincides with the one proposed in the context of IRT for detecting misfit in the LV density under unidimensional IRT models (Monroe, Reference Monroe2021).
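Under the common factor model, the posterior density in Equation (25) is available in closed form because the LVs and MVs are jointly normal. A minimal R sketch (parameter estimates `nu_hat`, `Lambda_hat`, `Phi_hat`, and `Theta_hat` are assumed to have been extracted from the fitted model; the mvtnorm package is assumed available) is:
library(mvtnorm)

# Posterior density phi(x_l | y_i; xi-hat) under the fitted linear normal factor
# model, using the conditional distribution of X_i given Y_i = y_i.
posterior_density <- function(x_l, y_i, nu_hat, Lambda_hat, Phi_hat, Theta_hat) {
  Sigma_y <- Lambda_hat %*% Phi_hat %*% t(Lambda_hat) + Theta_hat  # model-implied Cov(Y_i)
  K <- Phi_hat %*% t(Lambda_hat) %*% solve(Sigma_y)                # regression of X_i on Y_i
  post_mean <- as.vector(K %*% (y_i - nu_hat))
  post_cov  <- Phi_hat - K %*% Lambda_hat %*% Phi_hat
  dmvnorm(x_l, mean = post_mean, sigma = post_cov)                 # H_l(y_i; xi-hat) in Equation (25)
}
Averaging this quantity over the n observations gives $\hat {\eta }_\ell $ in Equation (28), while the model-implied density $\phi (\boldsymbol {x}_\ell; \hat {\boldsymbol {\xi }})$ in Equation (26) is simply the normal density with mean zero and covariance $\hat {\boldsymbol {\Phi }}$ evaluated at $\boldsymbol {x}_\ell $.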
2.4.2 Testing MV-level linearity
On the same grid of LV values
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
, construct
$\mathbf {H}$
in Equation (11) with
$k=2Q$
as follows:
(30)$$ \begin{align} \mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi})=\left(H_1(\mathbf{Y}_i; \boldsymbol{\xi}), \dots, H_Q(\mathbf{Y}_i; \boldsymbol{\xi}), H_{Q+1}(\mathbf{Y}_i; \boldsymbol{\xi}), \dots, H_{2Q}(\mathbf{Y}_i; \boldsymbol{\xi})\right)^{\top}, \end{align} $$
in which
(31)$$ \begin{align} H_\ell(\mathbf{Y}_i; \boldsymbol{\xi})=Y_{ij}\,\phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right), \end{align} $$
(32)$$ \begin{align} H_{Q+\ell}(\mathbf{Y}_i; \boldsymbol{\xi})=\phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right) \end{align} $$
for
$\ell =1, \dots , Q$
.
Following the construction of
$\hat {\boldsymbol {\eta }}$
and
$\boldsymbol {\eta }$
as in Equations (13) and (12), define the
$Q \times 1$
vector of transformed residuals (i.e.,
$k'=Q$
) as
(33)$$ \begin{align} \boldsymbol{e}_{\boldsymbol{\varphi}}=\boldsymbol{\varphi}\left(\hat{\boldsymbol{\eta}}(\hat{\boldsymbol{\xi}})\right)-\boldsymbol{\varphi}\left(\boldsymbol{\eta}(\hat{\boldsymbol{\xi}})\right), \end{align} $$
in which a transformation function
$\boldsymbol {\varphi }: \mathcal {R}^{2Q} \to \mathcal {R}^Q$
is applied to produce ratios:
(34)$$ \begin{align} \boldsymbol{\varphi}(\boldsymbol{\gamma})=\left(\frac{\gamma_1}{\gamma_{Q+1}}, \frac{\gamma_2}{\gamma_{Q+2}}, \dots, \frac{\gamma_Q}{\gamma_{2Q}}\right)^{\top}, \end{align} $$
with
$\boldsymbol {\gamma }=(\gamma _1,\dots ,\gamma _Q,\gamma _{Q+1},\dots ,\gamma _{2Q})^{\top }$
. With this setup, the residuals in Equation (33) represent the misfit in the mean response for the jth MV, conditional on
$\boldsymbol {x}_1,\dots ,\boldsymbol {x}_Q$
.
To elaborate, the expectations of
$H_{\ell }$
and
$H_{Q+\ell }$
are obtained by
(35)$$ \begin{align} \eta_\ell(\boldsymbol{\xi})=\mathbb{E}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \boldsymbol{\xi}\right)\phi(\boldsymbol{x}_\ell; \boldsymbol{\xi}), \end{align} $$
(36)$$ \begin{align} \eta_{Q+\ell}(\boldsymbol{\xi})=\phi(\boldsymbol{x}_\ell; \boldsymbol{\xi}) \end{align} $$
from Equations (9) and (27), with corresponding empirical estimators
(37)$$ \begin{align} \hat{\eta}_\ell(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n Y_{ij}\,\phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right), \end{align} $$
(38)$$ \begin{align} \hat{\eta}_{Q+\ell}(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n \phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right). \end{align} $$
After applying the ratio transformation function in Equation (34) to the vector form counterparts
$\hat {\boldsymbol {\eta }}$
and
$\boldsymbol {\eta }$
, the
$\ell $
th component of
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
in Equation (33) becomes
(39)$$ \begin{align} e_{\boldsymbol{\varphi}, \ell}=\frac{\hat{\eta}_\ell(\hat{\boldsymbol{\xi}})}{\hat{\eta}_{Q+\ell}(\hat{\boldsymbol{\xi}})}-\mathbb{E}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \hat{\boldsymbol{\xi}}\right). \end{align} $$
Note that the conditional expectation in Equation (39),
$\mathbb {E}(Y_{ij} | \boldsymbol {x}_\ell; \hat {\boldsymbol {\xi }})$
, is known as the item characteristic curve under a unidimensional IRT model with dichotomous response variables. Based on this, a similar form of residuals has been proposed to assess item-level fit in IRT models (e.g., Haberman et al., Reference Haberman, Sinharay and Chon2013). In the context of the common factor model, the conditional expectation is a linear function of
$\boldsymbol {x}_\ell $
:
(40)$$ \begin{align} \mathbb{E}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \hat{\boldsymbol{\xi}}\right)=\hat{\nu}_j+\hat{\boldsymbol{\lambda}}_j^{\top}\boldsymbol{x}_\ell. \end{align} $$
Therefore, the residual
$e_{\boldsymbol {\varphi }, \ell }$
in Equation (39) serves as a basis for assessing this linearity assumption for the jth MV.
With this background, the pointwise statistic is now constructed from Equation (20) to assess the MV-level linearity at
$\boldsymbol {x}_\ell $
, while the summary statistic is constructed from Equation (23) to provide an overall assessment across multiple LV grids
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
. We refer to these statistics as
$z_2(\boldsymbol {x}_\ell )$
and
$T_2$
, respectively.
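As an illustration, a minimal R sketch of the $\ell $th conditional-mean residual in Equation (39), reusing the hypothetical posterior_density() helper from Section 2.4.1 and the same assumed parameter estimates, is:
# l-th conditional-mean residual of Equation (39) for MV j evaluated at x_l;
# `Y` is the n x m data matrix. Requires posterior_density() from Section 2.4.1.
mean_residual <- function(x_l, j, Y, nu_hat, Lambda_hat, Phi_hat, Theta_hat) {
  post <- apply(Y, 1, function(y_i)
    posterior_density(x_l, y_i, nu_hat, Lambda_hat, Phi_hat, Theta_hat))
  ratio   <- mean(Y[, j] * post) / mean(post)          # hat-eta_l / hat-eta_{Q+l}
  implied <- nu_hat[j] + sum(Lambda_hat[j, ] * x_l)    # Equation (40)
  ratio - implied
}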
Before moving on to the next example, it is worth noting that alternative ways to formulate residuals targeting the same population quantity may exist. For instance, with the target population quantity
$\mathbb {E}(Y_{ij} | \boldsymbol {x}_\ell; \boldsymbol {\xi })$
as in this example, we can keep the formulation of
$\mathbf {H}$
as in Equation (24), with each component defined as

With this choice of
$H_\ell $
, its population expectation
$\eta _\ell $
directly becomes
$\mathbb {E}(Y_{ij} | \boldsymbol {x}_\ell; \boldsymbol {\xi })$
, eliminating the need for a transformation and thus simplifying the formulation of residuals. The corresponding empirical estimator is

We observed in pilot simulations that the performance of generalized residuals based on Equation (42) is very sensitive to misspecification of the LV density
$\phi (\boldsymbol {x}_\ell; \boldsymbol {\xi })$
. In contrast, the ratio formulation presented in Equation (39) estimates this LV density from the data as well, yielding a fit diagnostic that more specifically targets the conditional mean function. Further discussion on this point is provided in the supplementary material using a real-data example.
2.4.3 Testing MV-level homoscedasticity
In this example, we retain the setup from Section 2.4.2, modifying only the first
$Q$
elements of
$\mathbf{H}(\mathbf{Y}_i; \boldsymbol{\xi})$
defined in Equation (31) as follows:
(43)$$ \begin{align} H_\ell(\mathbf{Y}_i; \boldsymbol{\xi})=\left(Y_{ij}-\mathbb{E}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \boldsymbol{\xi}\right)\right)^2 \phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right). \end{align} $$
The expectation of
$H_{\ell }$
then becomes
(44)$$ \begin{align} \eta_\ell(\boldsymbol{\xi})=\mathrm{Var}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \boldsymbol{\xi}\right)\phi(\boldsymbol{x}_\ell; \boldsymbol{\xi}), \end{align} $$
with the empirical estimator
(45)$$ \begin{align} \hat{\eta}_\ell(\boldsymbol{\xi})=\frac{1}{n}\sum_{i=1}^n \left(Y_{ij}-\mathbb{E}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \boldsymbol{\xi}\right)\right)^2 \phi\left(\boldsymbol{x}_\ell \,|\, \mathbf{Y}_i; \boldsymbol{\xi}\right). \end{align} $$
The definitions of
$H_{Q+\ell }, \eta _{Q+\ell },$
and
$\hat \eta _{Q+\ell }$
remain unchanged, as specified in Equations (32), (36), and (38). After applying the ratio transformation function in Equation (34) to the corresponding vector forms
$\boldsymbol {\eta }$
and
$\boldsymbol {\hat \eta }$
, the
$\ell $
th component of the resulting
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
becomes
(46)$$ \begin{align} e_{\boldsymbol{\varphi}, \ell}=\frac{\hat{\eta}_\ell(\hat{\boldsymbol{\xi}})}{\hat{\eta}_{Q+\ell}(\hat{\boldsymbol{\xi}})}-\mathrm{Var}\left(Y_{ij} \,|\, \boldsymbol{x}_\ell; \hat{\boldsymbol{\xi}}\right). \end{align} $$
Under the common factor model, the conditional variance of the jth MV in Equation (46) is a constant (i.e.,
$\mathrm {Var}(Y_{ij} | \boldsymbol {x}_\ell; \hat {\boldsymbol {\xi }})=\hat {\theta }_j$
) and does not depend on LVs
$\boldsymbol {x}_\ell $
. Therefore, this residual can be used to assess the constant-variance assumption for the jth MV. Similar to the previous examples, the pointwise statistic is constructed from Equation (20) to assess the MV-level homoscedasticity at
$\boldsymbol {x}_\ell $
, while the summary statistic is constructed from Equation (23) to summarize results across multiple LV grids
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
. We refer to these statistics as
$z_3(\boldsymbol {x}_\ell )$
and
$T_3$
, respectively.
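Analogously, a minimal R sketch of the $\ell $th conditional-variance residual in Equation (46), under the same assumptions as the previous sketches, is:
# l-th conditional-variance residual of Equation (46) for MV j evaluated at x_l.
# Requires posterior_density() from Section 2.4.1; `Y` is the n x m data matrix.
var_residual <- function(x_l, j, Y, nu_hat, Lambda_hat, Phi_hat, Theta_hat) {
  post <- apply(Y, 1, function(y_i)
    posterior_density(x_l, y_i, nu_hat, Lambda_hat, Phi_hat, Theta_hat))
  mu_j  <- nu_hat[j] + sum(Lambda_hat[j, ] * x_l)        # model-implied conditional mean at x_l
  ratio <- mean((Y[, j] - mu_j)^2 * post) / mean(post)   # empirical conditional variance
  ratio - Theta_hat[j, j]                                # constant theta_j-hat under the model
}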
In summary, we have discussed three examples of fit assessments that can be developed within our framework. We constructed pointwise statistics,
$z_1(\boldsymbol {x}_\ell ), z_2(\boldsymbol {x}_\ell ),$
and
$z_3(\boldsymbol {x}_\ell )$
(
$\ell =1, \dots , Q$
), and summary statistics,
$T_1, T_2,$
and
$T_3$
, optimized for testing the normality of LV density, MV-level linearity of the mean function, and MV-level homoscedasticity of the variance function, respectively. Table 1 provides a summary of the three examples discussed so far.
Table 1 Summary of components for constructing the three examples discussed in Section 2.4

Note: For brevity, dependency on model parameters
$\boldsymbol {\xi }$
is omitted. In the first column of the second row,
$[\boldsymbol {\varphi }(\boldsymbol {\eta })]_\ell $
denotes the
$\ell $
th component of the transformed
$\boldsymbol {\eta }$
, representing the population quantity of interest. In these examples, the index
$\ell $
distinguishes the LV values under evaluation, where
$\ell = 1, \dots , Q.$
3 Simulation study
To evaluate the finite sample behaviors of the proposed test statistics under our framework, two simulation studies were conducted. Study 1 (Section 3.1) evaluates the performance of
$z_1(\boldsymbol {x}_\ell )$
and
$T_1$
, while Study 2 (Section 3.2) evaluates the performance of
$z_2(\boldsymbol {x}_\ell ), T_2$
and
$z_3(\boldsymbol {x}_\ell ), T_3$
. Within each study, we manipulate two factors: (1) Sample sizes (
$n=100, 500, 1000$
), representing small, moderate, and large samples and (2) the presence or absence of model misspecification. For each simulation condition, pointwise z-tests at
$\boldsymbol {x}_1, \dots , \boldsymbol {x}_Q$
and an overall
$\chi ^2$
-test are performed with one simulated dataset, which is replicated 500 times to obtain empirical rejection rates under each condition.
In the following sections addressing each study, we first describe the simulation setup, including the simulation conditions and data-generating models, followed by the detailed procedures for calculating the test statistics (Sections 3.1.1 and 3.2.1). We then examine their performance using empirical rejection rates as the evaluation criteria (Sections 3.1.2 and 3.2.2). As no convergence issues were found in either study, empirical rejection rates were calculated as the proportion of rejected cases among the 500 replications. At the end of each study, results from some conventional fit diagnostics are also presented for comparison. All computations were performed in R (R Core Team, 2024). Example code is included in the supplementary material.
3.1 Simulation Study 1
3.1.1 Simulation setup
As a special case of the common factor model presented in Section 2.1.2, we considered a two-dimensional independent-cluster model with LVs:
$\mathbf {X}_i=(X_{i1}, X_{i2})^{^{\top }}$
. The number of MVs was set to 20 (i.e.,
$m=20$
), with the first ten MVs loaded only on
$X_{i1}$
, and the remaining ten only on
$X_{i2}$
. The two LVs were assumed to be correlated with each other.
For data generation, two different LV densities were specified to represent correctly specified and misspecified conditions. For the correctly specified condition,
$\mathbf {X}_i, i=1, \dots , n,$
were generated from the standard bivariate normal distribution with a correlation coefficient of 0.2. For the misspecified condition, we simulated the LVs from a mixture of two bivariate normal distributions:
$\mathbf {X}_i \sim 0.5\mathcal {N}(\boldsymbol {\mu }^{(1)}, \boldsymbol {\Phi }^{(1)})+0.5\mathcal {N}(\boldsymbol {\mu }^{(2)}, \boldsymbol {\Phi }^{(2)}),$
in which
$\boldsymbol {\mu }^{(1)}=(-0.6, -0.6)^{\top }$
,
$\boldsymbol {\mu }^{(2)}=(0.6, 0.6)^{\top }$
, and

With this setup, the LV distributions in both conditions have means of zero and the covariance matrix of
$$ \begin{align*} \boldsymbol{\Phi}=\begin{pmatrix} 1 & 0.2 \\ 0.2 & 1 \end{pmatrix}. \end{align*} $$
The left panel of Figure 1 depicts the contour plot of the two data-generating LV densities, with one superimposed on the other. The right panel displays the conditional density of
$X_{i1}=x_1$
given the mean of
$X_{i2}$
, which is zero in this case. The right panel will be revisited in the discussion of results in Section 3.1.2.

Figure 1 Left panel: Contour plot of the data-generating LV densities under correctly specified (gray solid lines) and misspecified (black dashed lines) conditions. Note: Right panel: Conditional density of
$X_{i1}=x_1$
given
$X_{i2}=x_2=0$
under the same conditions. The conditional density of
$X_{i2}=x_2$
given
$X_{i1}=x_1=0$
is identical to that shown in the right panel.
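For illustration, the misspecified LVs can be generated in R as follows; the within-component covariance matrix below is an assumed value chosen so that the mixture has mean zero and the same covariance matrix as the correctly specified condition, and may differ from the exact values used in the study (the MASS package is assumed available).
library(MASS)

# Illustrative draw of LVs from the two-component normal mixture described above.
n   <- 500
mu1 <- c(-0.6, -0.6); mu2 <- c(0.6, 0.6)
Phi_w <- matrix(c(0.64, -0.16, -0.16, 0.64), 2, 2)  # assumed within-component covariance
grp <- rbinom(n, 1, 0.5)                            # mixture-component indicator
X   <- matrix(NA_real_, n, 2)
X[grp == 1, ] <- mvrnorm(sum(grp == 1), mu1, Phi_w)
X[grp == 0, ] <- mvrnorm(sum(grp == 0), mu2, Phi_w)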
In both correctly and incorrectly specified conditions, other data-generating parameters were set equal. In particular, MVs were assumed to have zero intercepts and exhibit an independent-cluster factor structure:
$\boldsymbol {\nu }=\mathbf {0}$
and
$$ \begin{align*} \boldsymbol{\Lambda}=\begin{pmatrix} \boldsymbol{a}_1 & \mathbf{0}_{10} \\ \mathbf{0}_{10} & \boldsymbol{a}_2 \end{pmatrix}, \end{align*} $$
in which
$\boldsymbol {a}_1=\boldsymbol {a}_2=(\sqrt {0.3},\sqrt {0.5},\sqrt {0.7},\sqrt {0.3},\sqrt {0.5},\sqrt {0.7},\sqrt {0.3},\sqrt {0.5},\sqrt {0.7},\sqrt {0.3})^{\top }$
and
$\textbf{0}_{10}$
indicates a
$10 \times 1$
vector of zeros. Error variances
$\theta _j, j=1, \dots , 20,$
were then determined by diagonal elements of the square matrix
$\mathbf {I}-\boldsymbol {\Lambda } \boldsymbol {\Phi } \boldsymbol {\Lambda }^{\top }$
. This model specification implies that the communalities alternate among 0.3, 0.5, and 0.7, covering a range of values typically observed in practice. A value of 0.3 represents low communality, while 0.7 represents high communality (MacCallum et al., Reference MacCallum, Widaman, Zhang and Hong1999). The value of 0.5 was chosen as an intermediate point to reflect moderate communality.
Upon generating data using the specified parameter values, the proposed tests for assessing the LV density fit were conducted using the following procedures: (1) estimate the model parameters
$\boldsymbol {\xi }$
by ML to get
$\hat {\boldsymbol {\xi }}$
, (2) construct
$\boldsymbol {\hat {\eta }}(\hat {\boldsymbol {\xi }})$
and
$\boldsymbol {\eta }(\hat {\boldsymbol {\xi }})$
and calculate residuals by
$\boldsymbol {e}_{\boldsymbol {\iota }}=\boldsymbol {\hat {\eta }}(\hat {\boldsymbol {\xi }})-\boldsymbol {\eta }(\hat {\boldsymbol {\xi }})$
, (3) estimate the ACM of
$\boldsymbol {e}_{\boldsymbol {\iota }}$
, and (4) compute
$z_1(\boldsymbol {x}_\ell )$
for
$\ell =1,\dots , Q$
and
$T_1$
to conduct pointwise z-tests and the overall
$\chi ^2$
-test.
For parameter estimation in Step 1, the lavaan package version 0.6-14 (Rosseel, Reference Rosseel2012) was used with the default option of ML estimation with the normal likelihood. Mean structure was added to the model (meanstructure = TRUE). To check model convergence, we used the following criteria: (1) the absence of Heywood cases, (2) the maximum absolute element of the gradient vector is less than
$10^{-4}$
, and (3) the minimum eigenvalue of the Hessian matrix is greater than zero. The default maximum number of iterations was used. In Step 3, the estimated ACM was obtained following the process described in Section 2.2.2 with
$M=10,000$
. The inverse of the expected information matrix was obtained using the inverted.information option in the lavInspect function, and the gradient of the individual log-likelihood,
$\nabla _{\boldsymbol {\xi }} \log f(\mathbf {Y}_i; \hat {\boldsymbol {\xi }})$
, was obtained using the lavScores function. In Step 4,
$19$
evenly spaced values between
$-3$
and
$3$
were used as evaluation points per LV, resulting in a total of
$19^2=361$
pointwise statistics (i.e.,
$Q=361$
) for evaluating the fit at every possible combination of LVs. For the summary statistic, a subgrid of
$7^2=49$
points, obtained from seven equally spaced values from
$-2$
to
$2$
per LV, was used for illustration purposes. In addition, we set the degrees of freedom
$s=1$
in Equation (23) so that the reference limiting distribution for
$T_1$
becomes
$\chi _1^2$
.
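A minimal R sketch of Steps 1 through 3 (model syntax abbreviated; `dat` denotes the simulated n × 20 data frame of MVs y1–y20; the exact scaling conventions of the extracted matrices should be checked against the lavaan documentation) is:
library(lavaan)

# Step 1: fit the two-factor independent-cluster model with a mean structure.
model <- 'F1 =~ y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8 + y9 + y10
          F2 =~ y11 + y12 + y13 + y14 + y15 + y16 + y17 + y18 + y19 + y20'
fit <- cfa(model, data = dat, meanstructure = TRUE)

# Ingredients used in Steps 2 and 3:
xi_hat <- coef(fit)                                # ML estimates
Iinv   <- lavInspect(fit, "inverted.information")  # inverse expected information (verify scaling)
scores <- lavScores(fit)                           # n x q casewise score contributions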
3.1.2 Results
Empirical Type I error rates were investigated under the correctly specified condition at the nominal level of
$\alpha =0.05$
. If the proposed tests perform well, the Type I error rates should closely match the nominal level. Figure 2 depicts the Type I error results for the pointwise z-tests (presented as points connected by lines) and the overall
$\chi ^2$
-tests (presented as numbers in parentheses alongside the sample sizes) under various sample size conditions. The left and right panels of Figure 2 show the results evaluated at each
$x_1$
and
$x_2$
value, respectively, conditional on the mean of the other LV, which is zero. The results were summarized this way to avoid the influence of LV combinations that are highly unlikely to occur, such as
$\boldsymbol {x}=(-3, 3)^{\top }$
. In addition, a 95% normal-approximation confidence interval (i.e., [0.03, 0.07]) was obtained around the nominal
$\alpha $
-level (i.e., 0.05), which indicates an acceptable range of Type I error rates considering MC error.

Figure 2 Type I error results for the pointwise and overall LV density fit tests at
$\alpha =0.05$
.
Note: Gray-scaled horizontal dotted lines represent the nominal level
$\alpha =0.05$
and the MC confidence band. Pointwise z-test results are summarized for each LV, conditional on the mean of the other LV. Overall
$\chi ^2$
-test results are presented in parentheses next to the sample size legend.
The results indicate that Type I error rates for both pointwise and overall tests are well-controlled across all sample size conditions. For the pointwise tests, most of the points in Figure 2 fall within the MC confidence band, suggesting well controlled Type I error rates. Although a few points fall noticeably outside the confidence band at extreme LV values, these extreme values have a low chance of occurrence in practice given the normal density, and the extent of their deviations is also minimal. For the summary results, the Type I error rates closely align with the nominal level across all sample size conditions, indicating satisfactory performance of
$T_1$
.
Empirical power was investigated next under the misspecified condition. Figure 3 displays the power results for the pointwise z-tests and the overall
$\chi ^2$
-tests under various sample size conditions. The results were summarized in the same way as in Figure 2. In general, both pointwise and overall tests demonstrate increasing power in detecting misfit as the sample size increases. While the tests show limited power with a small sample size (
$n=100$
), they exhibit satisfactory power with moderate to large sample sizes (
$n=500$
and
$n=1000$
).

Figure 3 Power results for the pointwise and overall LV density fit tests.
Note: Pointwise z-test results are summarized for each LV, conditional on the mean of the other LV. Overall
$\chi ^2$
-test results are presented in parentheses next to the sample size legend.
Specifically, the power results for the pointwise z-tests are consistent with the shape of the data-generating LV densities shown in the right panel of Figure 1. When constructing the pointwise statistics, recall that
$\hat \eta _\ell (\hat {\boldsymbol {\xi }})$
attempts to recover the LV density under the misspecified condition (depicted with black dashed lines in Figure 1), while
$\eta _\ell (\hat {\boldsymbol {\xi }})$
is simply a unimodal normal density in this case (depicted with gray solid lines
). The values of
$z_1(\boldsymbol {x}_\ell )$
are derived from the discrepancy between these two; therefore, power is expected to be high in regions with large discrepancies and relatively low near crossover points. In our simulation study, both LVs responded to these varying degrees of discrepancy, as indicated by the oscillating pattern of results in Figure 3.
As suggested by a reviewer, we report in Table 2 the performance of the Satorra–Bentler adjusted fit diagnostics (Satorra & Bentler, 2001) under our misspecified condition. The Satorra–Bentler adjustment aims to correct for the effect of non-normal data on the conventional fit measures. Although not presented here, the adjustment resulted in model fit diagnostics similar to those without the correction. In addition, as can be seen in Table 2, the power of the
$\chi ^2$
model fit test decreased with increasing sample size, converging to the nominal
$\alpha $
level. All fit indices investigated (CFI, TLI, SRMR, and RMSEA) also did not respond to the misspecified LV density. These results show that conventional GOF statistics are sensitive only to imperfect recovery of the mean and covariance structures and miss other important aspects of misfit.
Table 2 Results from the conventional fit diagnostics under the misspecified condition

Note: Satorra–Bentler correction was applied to all statistics. The second column shows rejection rates of the Satorra–Bentler adjusted
$\chi ^2$
model fit tests. The third to sixth columns present average values of the fit indices over 500 replications. Standard errors are presented in parentheses.
Assessing fit based solely on the recovery of mean and covariance structures may be sufficient when examining parameter estimates such as factor loadings. However, when the focus shifts to factors themselves or their relationships, assessing normality at the latent level becomes crucial. In the Study 1 setup, interpreting the correlation coefficient of 0.2 as indicating a positive linear relationship between the two LVs under the assumption of bivariate normality would be misleading, as there actually exist two subgroups with negatively correlated LVs. In general, the inter-LV correlation should be interpreted with caution when there is potential violation of normality at the LV level.
3.2 Simulation Study 2
3.2.1 Simulation setup
In Study 2, the linear normal one-factor model with
$X_i \sim \mathcal {N}(0, 1)$
was investigated with and without misspecification of the MV-level mean/variance functions. Let
$\mu _j$
and
$\sigma _j^2$
denote the mean and variance of the conditional distribution
$Y_{ij}|X_i$
. Under the linear normal factor model,
$Y_{ij}|X_i=x \sim \mathcal {N}(\mu _j(x), \sigma _j^2(x))$
with the mean function
$\mu _j(x)=\nu _j+\lambda _j x$
and variance function
$\sigma _j^2(x)=\theta _j$
. MV-level fit assessments were performed by investigating the linearity of
$\mu _j$
and homoscedasticity of
$\sigma _j^2$
.
The data-generating model consisted of four types of MVs summarized in Table 3. The four types were determined by fully crossing linear versus quadratic mean functions and constant versus log-linear variance functions, labeled as LMCV, QMCV, LMLV, and QMLV, respectively. Regardless of the type of MVs, the intercept was set to zero, i.e.,
$\nu _j=0$
, and the error variance was obtained by
$\theta _j=1-\lambda _j^2$
. The factor loading
$\lambda _j$
was set to alternate between
$\sqrt {0.3}, \sqrt {0.5},$
and
$\sqrt {0.7}$
for LMCVs;
$\lambda _j$
was fixed at
$\sqrt {0.5}$
for the other three types of MVs. After that, coefficients
$\kappa _j$
,
$\gamma _{0j}$
, and
$\gamma _{1j}$
were added for the misspecified MVs. The quadratic coefficient
$\kappa _j$
for QMCV and QMLV was fixed at
$-0.1$
;
$\gamma _{0j}$
and
$\gamma _{1j}$
for LMLV and QMLV were set as
$\gamma _{0j}=\log \theta _j$
and
$\gamma _{1j}=0.3$
, respectively. In Figure 4, the black dashed and solid lines illustrate the shapes of
$\mu _j(x)$
and
$\sigma _j^2(x)$
with and without these additional coefficients, respectively, the combination of which comprises the four types of MVs. As an illustration, the values of
$\nu _j=0$
,
$\lambda _j=\sqrt {0.5}$
, and
$\theta _j=0.5$
were used in generating the figure.
Table 3 Four types of MVs used for data generation

Note: LMCV: linear mean function, constant variance function. QMCV: quadratic mean function, constant variance function. LMLV: linear mean function, log-linear variance function. QMLV: quadratic mean function, log-linear variance function.
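For concreteness, the following R sketch generates a single QMLV-type MV under a plausible parameterization consistent with the description above, namely a quadratic mean $\nu _j+\lambda _j x+\kappa _j x^2$ and a log-linear variance $\exp (\gamma _{0j}+\gamma _{1j} x)$; these functional forms are assumptions for illustration, with the exact forms given in Table 3.
# Illustrative generation of one QMLV-type MV; the functional forms below are
# assumptions consistent with the text, not necessarily the study's exact code.
n <- 1000
x <- rnorm(n)                                   # X_i ~ N(0, 1)
nu <- 0; lambda <- sqrt(0.5); theta <- 1 - lambda^2
kappa <- -0.1; gamma0 <- log(theta); gamma1 <- 0.3
mu_x  <- nu + lambda * x + kappa * x^2          # quadratic mean function
var_x <- exp(gamma0 + gamma1 * x)               # log-linear variance function
y <- rnorm(n, mean = mu_x, sd = sqrt(var_x))    # conditional normal response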

Figure 4 Black lines illustrate the mean function
$\mu _j(x)$
(left panel) and variance function
$\sigma _j^2(x)$
(right panel) under correctly specified (LM, CV) and misspecified (QM, LV) conditions in the data-generating model.
Note: Gray dotted lines (LMQ, CVL) approximate the expected functions under misspecification: a misfitting linear mean under a quadratic population (left) and constant variance under a log-linear population (right). LM: linear mean, QM: quadratic mean, CV: constant variance, LV: log-linear variance, LMQ: linear mean under quadratic population, and CVL: constant variance under log-linear population.
For both correctly and incorrectly specified conditions, the number of MVs was fixed at ten (
$m=10$
). In the correctly specified condition, all ten MVs were LMCVs. In the misspecified condition, three out of ten MVs were misspecified—one for QMCV, LMLV, and QMLV, respectively. The first seven MVs in the misspecified condition were set to be correctly specified to allow for the evaluation of the empirical false detection rate—the proportion of cases in which the tests falsely detect these correctly specified MVs as misfitting MVs in the presence of both correctly and incorrectly specified MVs.
Upon generating the data, the proposed MV-level fit assessments were performed using the following procedures: (1) estimate the model parameters
$\boldsymbol {\xi }$
by ML to get
$\hat {\boldsymbol {\xi }}$
, (2) construct
$\boldsymbol {\hat {\eta }}(\hat {\boldsymbol {\xi }})$
and
$\boldsymbol {\eta }(\hat {\boldsymbol {\xi }})$
, and apply the ratio transformation to obtain residuals by
$\boldsymbol {e}_{\boldsymbol {\varphi }}=\boldsymbol {\varphi }(\boldsymbol {\hat {\eta }}(\hat {\boldsymbol {\xi }}))-\boldsymbol {\varphi }(\boldsymbol {\eta }(\hat {\boldsymbol {\xi }}))$
, (3) estimate the ACM of
$\boldsymbol {e}_{\boldsymbol {\varphi }}$
, and (4) compute
$z_2(x_\ell ), z_3(x_\ell )$
for
$\ell =1,\dots ,Q$
to conduct pointwise z-tests and
$T_2, T_3$
to conduct the overall
$\chi ^2$
-tests. Computational details follow those introduced in Section 3.1.1 for Study 1, except for the number of evaluation points used. In Step 4, we adopted
$31$
evenly spaced LV values between
$-3$
and 3 as evaluation points (i.e.,
$Q=31$
) for
$z_2(x_\ell )$
and
$z_3(x_\ell )$
. Among these, 11 sub-points ranging from
$-2$
to
$2$
were selected to construct
$T_2$
and
$T_3$
for illustration.
3.2.2 Results
Empirical Type I error rates were investigated under the correctly specified condition at the nominal level of
$\alpha =0.05$
. Figure 5 illustrates the Type I error results for the pointwise z-tests (presented as points connected by lines) and the overall
$\chi ^2$
-tests (presented as numbers in parentheses alongside the sample sizes) under various sample size conditions. The performance of
$z_2$
and
$T_2$
, which assess the linearity of
$\mu _j$
, is presented in the left panel, while the performance of
$z_3$
and
$T_3$
, which assess the homoscedasticity of
$\sigma _j^2$
, is presented in the right panel. In the figure, 95% normal-approximation confidence intervals are delineated around the nominal
$\alpha $
-level, indicating an acceptable range of Type I error rates considering MC error. Results for only one MV (
$j=2$
with communality 0.5) are presented due to space constraints, but similar results were found for the other nine MVs in the correctly specified condition.

Figure 5 Type I error results for the pointwise and overall MV-level fit tests at
$\alpha =0.05$
.
Note: The left panel shows the results for
$z_2(x)$
and
$T_2$
; the right panel shows the results for
$z_3(x)$
and
$T_3$
. Gray-scaled horizontal dotted lines represent the nominal level
$\alpha =0.05$
and the MC confidence band. Pointwise z-test results are presented as points and connected by lines. Overall
$\chi ^2$
-test results are presented within parentheses next to the sample size legends. The results are presented for one MV (
$j=2$
) in the correctly specified condition. Similar results were observed for the other nine MVs.
The results demonstrate that Type I error rates are well-controlled for both pointwise and overall tests across all sample size conditions. In the pointwise results, the majority of points in Figure 5 fall within the MC confidence band, suggesting decent performance of
$z_2$
and
$z_3$
. Some noticeable deviations are observed in the extreme range of the LV; however, these extreme LV values close to
$-3$
and 3 under the standard normal density rarely occur in practice, making them of little concern.
$\chi ^2$
-test results closely match with the nominal level across all sample size conditions, verifying the satisfactory performance of
$T_2$
and
$T_3$
.
Next, empirical power was investigated under the misspecified condition. The first row of Figure 6 presents results for $z_2(x)$ and $T_2$, and the second row shows results for $z_3(x)$ and $T_3$. Each column contains results for QMCV, LMLV, and QMLV, respectively. The proposed MV-level fit tests exhibit distinct behaviors in terms of power depending on the types of MVs. Specifically, $z_2(x)$ and $T_2$ show decent power only against QMCV and QMLV, indicating that they are mainly responsive to detecting misspecification in $\mu_j$ but not in $\sigma_j^2$. On the other hand, $z_3(x)$ and $T_3$ show decent power only against LMLV and QMLV, demonstrating the opposite pattern. These results suggest that the two types of statistics, developed respectively for testing $\mu_j$ and $\sigma_j^2$, perform their intended roles effectively. However, with the small sample of $n=100$, the power tends to be low for all statistics, with little differentiation observed in their respective roles.

Figure 6 Power results for the pointwise and overall MV-level fit tests.
Note: The first row of the figure shows the results for $z_2(x)$ and $T_2$; the second row shows the results for $z_3(x)$ and $T_3$. Each column represents results for QMCV, LMLV, and QMLV (i.e., $j=8, 9,$ and $10$ in the misspecified condition). Pointwise z-test results are presented as points and connected by lines. Overall $\chi^2$-test results are presented within parentheses next to the sample size legends.
Another notable observation is the oscillating pattern in the power of the z-tests, which is closely related to the shapes of $\mu_j$ and $\sigma_j^2$ illustrated in Figure 4. In that figure, the gray dotted lines (LMQ in the left panel and CVL in the right) represent the expected linear mean and constant variance functions under misspecification, approximated by the average fitting from the $n=1000$ condition in Simulation Study 2. To evaluate the fit, LMQ and CVL are compared to the empirical estimates of QM and LV, respectively. Accordingly, power is expected to be low around points where QM and LMQ (or LV and CVL) intersect and to increase as the discrepancy between the two widens. This expectation is confirmed by the simulation results shown in Figure 6, where both $z_2(x)$ and $z_3(x)$ exhibit oscillating power patterns that mirror the shape depicted in Figure 4, within approximately the 95% range of the LV. Notably, the misspecified linear mean function under a quadratic population (LMQ) intersects the true quadratic mean (QM) at two distinct points rather than one, which explains the elevated power of $z_2(x)$ in the middle range.
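A quick numerical illustration clarifies why two intersection points arise. Assuming a true quadratic mean of the form $\mu(x)=a+bx+cx^2$ with a standard normal LV (the coefficients below are hypothetical, not the simulation's actual values), the least-squares linear approximation has intercept $a+c$ and slope $b$, so the two curves cross exactly where $x^2=1$, that is, at $x=\pm 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)             # draws from the standard normal LV density

a, b, c = 0.0, 0.7, 0.3                  # hypothetical quadratic-mean coefficients
mu_true = a + b * x + c * x**2

# Least-squares linear fit, mimicking what a misspecified linear mean would recover
design = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(design, mu_true, rcond=None)
print(np.round(beta, 3))                 # close to [a + c, b] = [0.3, 0.7]

# The gap between true and fitted means vanishes near x = -1 and x = 1,
# which is where the pointwise power of z_2(x) is expected to dip.
grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
gap = (a + b * grid + c * grid**2) - (beta[0] + beta[1] * grid)
print(np.round(gap, 3))
```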
Finally, empirical false detection rates were examined using the correctly specified MVs (i.e., LMCVs) in the misspecified condition. The results are presented in Figure 7, in which the performance of $z_2(x)$ and $T_2$ is presented in the left panel, while the performance of $z_3(x)$ and $T_3$ is presented in the right panel. Again, we provide results for only one MV ($j=2$, with communality 0.5); similar results were observed for the other six LMCVs in the misspecified condition.

Figure 7 False detection rates for the pointwise and overall MV-level fit tests at $\alpha=0.05$.
Note: The left panel shows the results for $z_2(x)$ and $T_2$; the right panel shows the results for $z_3(x)$ and $T_3$. Gray-scaled horizontal dotted lines represent the nominal level $\alpha=0.05$ and the MC confidence band. Pointwise z-test results are presented as points and connected by lines. Overall $\chi^2$-test results are presented within parentheses next to the sample size legends. The results are presented for one LMCV ($j=2$) in the misspecified condition. Similar results were observed for the other six LMCVs.
In the presence of misspecified MVs in a model, every correctly specified MV will eventually be flagged as misfitting given a large enough sample size. Hence, the extent to which the rate of false detection is controlled at a low level is another crucial evaluation criterion for MV-level fit tests. In our simulation setup, false detection rates for both pointwise and overall statistics were well-controlled without substantial inflation, indicating their capability of distinguishing between good-fitting and bad-fitting MVs. The rates were similar to the nominal $\alpha$-level in the middle range of the LV. Some inflation was observed at the extreme high end of the LV with the large sample sizes ($n=500$ and $1000$); however, LV values close to 3 again rarely occur in practice.
Similar to Study 1, conventional fit assessment methods with the Satorra–Bentler correction were evaluated for the misspecified condition. Again, there was little difference between adjusted and unadjusted statistics, and as indicated by the results in Table 4, MV-level misspecifications in our setup were not detected by these conventional GOF diagnostics. Note that MV-level misfit cannot be captured by inspecting marginal residual means and variances for each MV either, as these values are always zero for the linear one-factor model, preventing any insights into MV-level misfit conditional on the LV.
Table 4 Results from the conventional fit diagnostics under the misspecified condition

Note: Satorra–Bentler correction was applied to all statistics. The second column shows rejection rates of the Satorra–Bentler adjusted $\chi^2$ model fit tests. The third to sixth columns present average values of the fit indices over 500 replications. Standard errors are presented in parentheses.
4 Empirical example
In this section, we illustrate the application of our proposed fit assessment method using item-level RT data obtained from an online administration of the Test of Relational Reasoning-Junior (TORRjr) among Chinese elementary and middle school students (Zhao et al., Reference Zhao, Alexander and Sun2021). Among the four subscales of TORRjr, we focus on the antinomy subscale, which consists of eight multiple-choice items ($m=8$). The dataset comprises a sample of $n=744$ observations with no missing data. The same RT data were previously analyzed by Liu & Wang (Reference Liu and Wang2022) using a semiparametric approach. Additional information about TORRjr is available, for example, in Alexander et al. (Reference Alexander, Dumas, Grossnickle, List and Firetto2016).
In the RT literature, van der Linden's log-normal RT model (van der Linden, Reference van der Linden2006) stands out as one of the most widely used models for analyzing RT data. This model is essentially a linear normal one-factor model fitted to log-transformed RTs (e.g., Molenaar et al., Reference Molenaar, Tuerlinckx and van der Maas2015; Ranger et al., Reference Ranger, Kuhn and Ortner2020; Sinharay & van Rijn, Reference Sinharay and van Rijn2020), where an individual's latent processing speed is represented by the LV $X_i$, and the item-level log-transformed RTs serve as the MVs. Higher values of $X_i$ indicate slower processing speeds; therefore, we refer to $X_i$ as latent slowness. It is common and convenient to assume that $X_i$ follows a standard normal distribution. One natural application of our method is to assess the fit of the log-normal RT model.
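Written out, the model assessed in this section takes the following form, where the intercepts, loadings, and residual variances are denoted generically (the symbols $\nu_j$ and $\lambda_j$ are used here for exposition and may differ from the article's notation):

$$\log T_{ij} \mid X_i = x \;\sim\; N\!\big(\nu_j + \lambda_j x,\ \sigma_j^2\big), \qquad X_i \sim N(0,1), \qquad j=1,\dots,8,$$

so that the conditional mean function $\mu_j(x)=\nu_j+\lambda_j x$ is linear in latent slowness and the conditional variance $\sigma_j^2$ is constant, which are exactly the two item-level assumptions examined below.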
The importance of evaluating model fit in RT analysis has been discussed by Sinharay & van Rijn (Reference Sinharay and van Rijn2020), who presented several fit assessment methods for the log-normal RT model. For example, they employed the Shapiro–Wilk test (Shapiro & Wilk, Reference Shapiro and Wilk1965) to assess item-level fit of the log-normal RT model by evaluating the normality of each MV. While some of their statistics align with ours in that they look beyond residual means and covariances to enable item-level diagnostics, they did not address normality at the LV level. Additionally, localized assessments conditional on LV values and the accompanying graphical diagnostics are distinctive features of our approach. Although we do not offer a direct comparison in this empirical example, similar analyses using the same RT data were reported in Liu & Wang (Reference Liu and Wang2022), where Shapiro–Wilk tests were conducted to assess the normality of each MV's marginal distribution.
Before proceeding with our proposed fit assessments, we summarize in Table 5 the results of traditional GOF assessments for the linear normal factor model fitted to the log-transformed RT data. In line with our simulation setup, all diagnostics were adjusted using the Satorra–Bentler correction. The $\chi^2$ model fit test was statistically significant at $\alpha=0.05$, indicating a poor fit of the model ($\chi_{20}^2=60.35,\ p < .0001$). However, with a sample size of $n=744$, this result would often be ignored in practice due to the tendency of the $\chi^2$ test to suggest poor fit in large samples, even with a trivial amount of misfit. Instead, researchers typically rely on descriptive GOF indices in such cases. Here, values of CFI=0.97, TLI=0.96, SRMR=0.02, and RMSEA=0.05 all fell within acceptable ranges (according to the rules of thumb by, e.g., Hu & Bentler, Reference Hu and Bentler1999), suggesting an acceptable model fit. In what follows, we illustrate how our fit diagnostics can provide deeper insights using the same data and model.
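As a rough consistency check on the reported indices, the RMSEA value can be approximately recovered from the reported chi-square statistic using the common population formula $\sqrt{\max(\chi^2-df,\,0)/(df\,(n-1))}$; this ignores the Satorra–Bentler adjustment, so it only approximates the value shown in Table 5.

```python
import math

chi2, df, n = 60.35, 20, 744
rmsea = math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))
print(round(rmsea, 3))   # about 0.052, in line with the reported RMSEA of 0.05
```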
Table 5 Results from the conventional GOF diagnostics based on TORRjr RT data, with the Satorra–Bentler correction applied to all statistics

4.1 Fit diagnostics
We now apply our proposed test statistics to evaluate the normality of the LV density and item-level linearity and homoscedasticity, all of which are fundamental assumptions in the linear normal factor model. For each analysis described below, 31 equally spaced LV values from $-3$ to 3 were selected as evaluation points, and all of them were used to construct the summary statistic. This summary information can be useful in practical applications where researchers seek an overall judgment based on multiple pointwise assessments. All other specific options were applied consistently with those used in the simulation studies.
The normality of the LV density was tested using the pointwise statistics $z_1(x_\ell)$, $\ell=1, \dots, 31$, along with the summary statistic $T_1$. Results are summarized graphically in Figure 8. In the left panel, circled points represent the $z_1(x_\ell)$ values, with $T_1$ and the corresponding p-value for the overall $\chi^2$-test presented at the top. Horizontal dashed lines at $\pm 1.96$ represent the 95% pointwise confidence intervals at $\alpha=0.05$. Points falling outside this range (i.e., $|z_1|>1.96$) indicate significant misfit in the LV density. The right panel offers a more intuitive graphical representation of the same information. Here, the solid line depicts the model-implied standard normal density, drawn by connecting the points $\eta_\ell(\hat{\boldsymbol{\xi}})=\phi(x_\ell)$. Similarly, the dotted line represents the empirically estimated LV density, $\hat{\eta}_\ell(\hat{\boldsymbol{\xi}})$. Dashed lines indicate the 95% pointwise confidence intervals, calculated by $\hat{\eta}_\ell(\hat{\boldsymbol{\xi}}) \pm 1.96 [\hat{\sigma}_{\boldsymbol{\varphi},\ell} / \sqrt{n}]$ (from Equation (20) with $\boldsymbol{\varphi}=\boldsymbol{\iota}$). In this plot, the part of the solid line that falls outside the confidence band indicates significant misfit.
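The right-panel display can be drawn directly from these pointwise quantities. The sketch below illustrates the construction, assuming the empirical density estimates and their asymptotic standard deviations (the $\hat{\sigma}_{\boldsymbol{\varphi},\ell}$ above) have already been computed at the 31 evaluation points; the placeholder values are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n, Q = 744, 31
x = np.linspace(-3, 3, Q)
eta_model = stats.norm.pdf(x)                    # model-implied standard normal density

# Placeholders for quantities produced by the generalized-residual computations
eta_hat = eta_model * (1 + 0.1 * np.cos(2 * x))  # hypothetical empirical density estimate
sd_hat = np.full(Q, 0.5)                         # hypothetical asymptotic SDs

half_width = 1.96 * sd_hat / np.sqrt(n)          # 95% pointwise band around the empirical estimate
plt.plot(x, eta_model, "k-", label="model-implied")
plt.plot(x, eta_hat, "k:", label="empirical estimate")
plt.plot(x, eta_hat - half_width, "k--", label="95% pointwise band")
plt.plot(x, eta_hat + half_width, "k--")
plt.xlabel("latent slowness")
plt.ylabel("density")
plt.legend()
plt.show()
```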

Figure 8 LV density fit test results at $\alpha=0.05$.
Note: The left panel displays results with respect to the test statistics $z_1(x)$ and $T_1$. The right panel presents the same information in a more intuitive way. In both panels, dashed lines indicate the 95% pointwise confidence band; points or segments of the solid line falling outside this band indicate significant misfit.
As shown in Figure 8, our method identified some violations of the normality assumption of the LV density in this example. The graphical plot in the right panel further reveals that the LV density, as estimated from the data, is leptokurtic and slightly negatively skewed. This finding mirrors the descriptive statistics reported in Liu & Wang (Reference Liu and Wang2022, Table 2), which showed that the observed log-RT distributions were leptokurtic and/or skewed for some items. Practically, this deviation from the standard normal density suggests, for example, that there are substantially more fast responders and fewer slow responders than assumed by the model. For an overall judgment on the LV density fit based on this graphical diagnostic, one can refer to the overall test result, which indicates significant misfit in the LV density in general ($T_1=27.53$, $p<0.001$).
Next, item-level fit was evaluated by examining the mean function of the log-RT using $z_2(x)$ and $T_2$, and by examining the variance function using $z_3(x)$ and $T_3$. Figures 9 and 10 present the corresponding results for the mean and variance functions, respectively, adopting the graphical notations shown in the right panel of Figure 8. In each figure, solid lines represent the model-implied linear mean functions and constant variance functions, respectively, while dotted lines represent their empirical estimates derived from the data. Dashed lines indicate the 95% pointwise confidence intervals obtained from Equation (20), with $\boldsymbol{\varphi}$ being the ratio transformation. For each item, the p-values for the overall $\chi^2$-test based on $T_2$ or $T_3$ are provided at the top of the figures.

Figure 9 MV-level linearity test results at $\alpha=0.05$.
Note: Solid lines represent model-implied linear mean functions, while dotted lines indicate empirical estimates of the mean functions. Dashed lines delineate 95% pointwise confidence bands. The p-value of the overall $\chi^2$-test based on $T_2$ is denoted as p in this figure.

Figure 10 MV-level homoscedasticity test results at $\alpha=0.05$.
Note: Solid lines represent model-implied constant variances, while dotted lines indicate empirical estimates of the variance functions. Dashed lines delineate 95% pointwise confidence bands. The p-value of the overall $\chi^2$-test based on $T_3$ is denoted as p in this figure.
Examining the pointwise results in Figure 9 reveals that certain items, particularly items 2, 6, and 7, exhibit substantial violations of linearity in their mean functions, as indicated by curvature in their confidence bands. Specifically, item 2 shows a bending-down shape at both extremes of the LV, suggesting that the misfitting linear normal factor model significantly overestimates the conditional mean of log-RT in these regions. Such misfit can be especially problematic if researchers' focus is on investigating extremely slow or fast responders, which is often the case in RT data analysis. For example, when identifying careless responders based on unusually fast responses, the model's overestimation of the expected RT for fast responders could inflate the false positive rate. Similar discussions can also be found in Liu & Wang (Reference Liu and Wang2022), where similar curvature patterns in conditional mean functions were identified using a semiparametric approach. The overall $\chi^2$-test results presented at the top of each plot in Figure 9 can be referenced to identify items whose mean functions are poorly recovered by a linear normal factor model. Specifically, items 2, 6, and 7 are called out by the overall tests, offering a formal justification for what might otherwise depend on subjective judgment.
Similarly, the fit assessment of item-level variance functions in Figure 10 indicates that the constant variance assumption is violated for several items, as evidenced by confidence bands deviating noticeably from the model-implied constant variances. These deviations suggest heteroscedastic variance depending on the level of latent processing speed. Particularly, the salient increase in variability at the high end of the LV for items 6 and 7 reflects greater RT variability among slow responders. Summarizing the pointwise results, the overall tests identified items 5–8 as misfitting in this case.
5 Discussion
To achieve more flexible fit assessment, we extend the theory of generalized residuals (Haberman & Sinharay, Reference Haberman and Sinharay2013) to accommodate more general measurement models. In our framework, transformed residuals are defined to construct asymptotically normal and chi-squared fit statistics, enabling the evaluation of both individual and multiple residuals simultaneously. As specific examples, we propose pointwise and summary statistics to identify misfit in the LV density and MV-level conditional moments under common factor models. Results from two simulation studies indicate that the proposed tests, based on these example statistics, maintain well-controlled Type I error rates even with small sample sizes and exhibit decent power to detect misfit with moderate to large sample sizes. These findings suggest that generalized residuals are promising tools for assessing parametric assumptions involved in measurement models.
The empirical data analysis using RT data further illustrates the practical utility of our approach. While conventional GOF diagnostics suggest an overall reasonable fit for the linear normal one-factor model, our method identifies misfit in the LV density, the MV-level mean function, and the MV-level variance function. Moreover, we provide intuitive visualizations contrasting empirical versus model-implied estimates of relevant summary quantities conditional on LV values (e.g., Figures 8–10), which facilitates the interpretation of misfitting patterns.
Although not addressed in the current manuscript, diagnostics generated from the proposed method could inform model modification. For instance, if a bimodal LV distribution is suggested in the graphical summary of generalized residuals, one may proceed with mixture modeling to explore the presence of hidden subgroups. In longitudinal measurement models for time-varying constructs, the violation of multivariate normality could indicate a nonlinear growth pattern. Similarly, the detected shape of conditional moment functions for MVs could suggest fitting a nonlinear factor model. In addition, identifying misfitting MVs itself could contribute to establishing test fairness.
There are several limitations and extensions of our current work to be addressed in future research.
First, the proposed framework relies on ML estimation, which requires a correctly specified likelihood. This implies that when the framework is adapted to test a specific assumption, such as linearity, it assumes that all other model assumptions are satisfied except for the one under investigation. As noted in Section 2.4.2 and the Supplementary Material, the ratio-form estimator developed for MV-level fit tests may perform well even when the LV density is misspecified; however, its robustness against other types of misspecification has not been evaluated. If other aspects of the model are misspecified alongside the one under investigation, the performance of the statistics under our framework could be affected. Future research could explore the influence of mixed misspecifications on generalized residuals through simulation studies. Furthermore, incorporating alternative estimation methods beyond ML could be considered to expand the applicability of our framework.
Second, the tests developed in our framework share a limitation with the conventional model fit $\chi^2$ test in that they would reject the null hypothesis even under trivial misfit in large samples. Although not considered in the current work, introducing effect size measures, such as those analogous to RMSEA or Cohen's effect sizes (Cohen, Reference Cohen1988), could be explored in future work as a potential solution. For instance, Cohen's d and f remove the influence of sample size from the associated t- and F-tests, and RMSEA rescales the $\chi^2$ fit statistic by both the sample size and the degrees of freedom. Standardized effect size measures based on the proposed z- and $\chi^2$-statistics could be developed in a similar fashion.
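As a purely hypothetical illustration of what such a measure might look like (not something developed or validated in this article), one could rescale an overall statistic $T$ with $df$ degrees of freedom in the same way RMSEA rescales the model fit chi-square:

$$\hat{\varepsilon} \;=\; \sqrt{\frac{\max(T - df,\ 0)}{df\,(n-1)}}.$$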
Third, beyond the three example test statistics provided, other statistics can be constructed under our framework to evaluate different types of model assumptions. One simple application is to formulate generalized residuals for the conditional covariance between two MVs, which would yield a GOF test statistic for assessing the pairwise local independence assumption (e.g., Ip, Reference Ip2001; McDonald, Reference McDonald1994; Strout, Reference Strout1990).
Fourth, although we focused only on the common factor model in this article, the theory we present can be extended to other measurement models, as long as a fully specified parametric model is assumed. For example, the framework can be generalized to GOF assessments in non-linear factor analysis (e.g., Yalcin & Amemiya, Reference Yalcin and Amemiya2001) and count data IRT (e.g., Wang, Reference Wang2010).
Finally, exploring efficient ways to compute and visualize generalized residuals in high-dimensional LV settings is a potential topic for future research. When the latent dimensionality is high, constructing evaluation points using an outer product grid becomes infeasible due to the exponential growth in the number of points. For example, in a five-dimensional case, examining residuals at just five different points per LV leads to $5^5=3,125$ evaluations, making the process computationally intensive and posing a challenge for generating graphical plots. Selecting appropriate grids of LVs to summarize results adds another layer of complexity. Formulating partially marginalized residuals conditioned on only one or two LV(s) may yield more efficient and informative diagnostics.
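The sketch below quantifies this growth and contrasts the full outer-product grid with partially marginalized grids that condition on only one or two LVs at a time; the grid sizes are illustrative.

```python
from itertools import product

points_per_lv, dims = 5, 5
full_grid = list(product(range(points_per_lv), repeat=dims))
print(len(full_grid))                                         # 5**5 = 3125 evaluation points

# Partially marginalized alternatives: evaluate one LV (or one pair of LVs) at a time
one_at_a_time = dims * points_per_lv                          # 5 * 5 = 25 evaluations
two_at_a_time = (dims * (dims - 1) // 2) * points_per_lv**2   # 10 * 25 = 250 evaluations
print(one_at_a_time, two_at_a_time)
```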
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/psy.2025.10037.
Acknowledgements
The authors would like to thank Dr. Hongyang Zhao and Dr. Patricia Alexander from the University of Maryland, College Park, for providing the empirical data example. The authors would also like to thank the anonymous Associate Editor and reviewers for their valuable comments on earlier versions of this manuscript.
Funding statement
The research of the third author (Yang Liu) was supported in part by the NSF Grant SES-1826535.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.