Researchers interested in dyadic processes increasingly collect intensive longitudinal data (ILD), with the longitudinal actor–partner interdependence model (L-APIM) being a popular modeling approach. However, due to non-compliance and the use of conditional questions, ILD are almost always incomplete. These missing data issues become more prominent in dyadic studies, because partners often miss different measurement occasions or disagree about the features that trigger conditional questions. Large amounts of missing data challenge the L-APIM’s estimation performance. Specifically, we found that non-convergence occurred when applying the L-APIM to pre-existing dyadic diary data with substantial missingness. Using a simulation study, we systematically examined the performance of the L-APIM in dyadic ILD with missing values. Consistent with our illustrative data, we found that non-convergence often occurred in conditions with small sample sizes, while the fixed within-person actor and partner effects were well estimated when analyses did converge. Additionally, considering potential convergence failures with the L-APIM, we investigated 31 alternative models and evaluated their performance on simulated and empirical data, showing that multiple alternatives may alleviate the convergence problems. Overall, when the L-APIM fails to converge, we recommend fitting multiple alternative models to check the robustness of the results.
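As a rough illustration of the kind of model involved, the sketch below fits a longitudinal APIM as a multilevel model in R with lme4. The variable names (outcome, actor_x, partner_x, person, dyad) and the simplified random-effects structure are illustrative assumptions, not the exact specification evaluated in the article (which additionally handles dyadic residual structure).

```r
# Minimal sketch of a longitudinal APIM fit as a multilevel model with lme4.
# Person-period data: each row is one partner at one measurement occasion.
library(lme4)

fit <- lmer(
  outcome ~ actor_x + partner_x +              # fixed within-person actor and partner effects
    (1 + actor_x + partner_x | dyad:person) +  # person-specific intercepts and slopes
    (1 | dyad),                                # dyad-level intercept
  data = dyad_long,
  control = lmerControl(optimizer = "bobyqa")  # a more robust optimizer when convergence is fragile
)
summary(fit)
```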
With models and research designs ever increasing in complexity, the foundational question of model identification is more important than ever. The determination of whether or not a model can be fit at all or fit to some particular data set is the essence of model identification. In this article, we pull from previously published work on data-independent model identification applicable to a broad set of structural equation models, and extend it further to include extremely flexible exogenous covariate effects and also to include data-dependent empirical model identification. For illustrative purposes, we apply this model identification solution to several small examples for which the answer is already known, including a real data example from the National Longitudinal Survey of Youth; however, the method applies similarly to models that are far from simple to comprehend. The solution is implemented in the open-source OpenMx package in R.
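A minimal sketch of how an identification check can be run in OpenMx is shown below. The one-factor model and simulated data are illustrative, not the NLSY example, and mxCheckIdentification() is used here as the package's built-in local identification check; whether it is the exact routine implementing the article's solution is an assumption.

```r
# Sketch: checking identification of a small one-factor RAM model in OpenMx.
library(OpenMx)

manifests <- paste0("x", 1:3)
model <- mxModel(
  "oneFactor", type = "RAM",
  manifestVars = manifests, latentVars = "F",
  mxPath(from = "F", to = manifests, free = TRUE, values = 0.8, labels = paste0("l", 1:3)),
  mxPath(from = manifests, arrows = 2, free = TRUE, values = 1, labels = paste0("e", 1:3)),
  mxPath(from = "F", arrows = 2, free = FALSE, values = 1),  # fix factor variance for scaling
  mxData(cov(matrix(rnorm(300), 100, 3, dimnames = list(NULL, manifests))),
         type = "cov", numObs = 100)
)
fit <- mxRun(model)
# Local identification check at the current parameter estimates:
mxCheckIdentification(fit)
```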
This paper presents a model specification for group comparisons regarding a functional trend over time within a trial and learning across a series of trials in intensive binary longitudinal eye-tracking data. The functional trend and learning effects are modeled using by-variable smooth functions. This model specification is formulated as a generalized additive mixed model, which allows for the use of the freely available mgcv package in R (Wood, 2023, https://cran.r-project.org/web/packages/mgcv/mgcv.pdf). The model specification was applied to intensive binary longitudinal eye-tracking data, where the questions of interest concern differences between individuals with and without brain injury in their real-time language comprehension and how this affects their learning over time. Results of a simulation study show that the model parameters are recovered well and that the by-variable smooth functions are adequately predicted under the same conditions as those found in the application.
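A sketch of this kind of generalized additive mixed model in mgcv follows. The variable names (fix, time, trial, group, id) and smooth bases are illustrative, not the article's exact specification.

```r
# Sketch of a binary GAMM with group-specific (by-variable) smooths in mgcv.
library(mgcv)

fit <- bam(
  fix ~ group +
    s(time,  by = group, bs = "tp") +  # group-specific functional trend within a trial
    s(trial, by = group, bs = "tp") +  # group-specific learning across trials
    s(id, bs = "re"),                  # random intercepts for participants (id must be a factor)
  family = binomial,                   # intensive binary outcome (e.g., fixation yes/no)
  data = eye_dat,
  discrete = TRUE                      # fast fitting for large intensive longitudinal data
)
summary(fit)
```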
This article addresses problematic behaviors of Markov chain Monte Carlo (MCMC) methods for finite mixture models due to what we call degenerate nonidentifiability. We discuss the reasons for these behaviors, propose diagnostics to detect them, and show through simulations that using more informative priors than the vague defaults can mitigate the problems in growth mixture models (GMMs). Our motivating example is an application of GMMs to data from the National Longitudinal Survey of Youth (NLSY) to examine heterogeneity in the development of reading skills in children aged 6–14. We also suggest ways of describing and visualizing within-class heterogeneity in GMMs, provide a literature review of likelihood identification and Bayesian identification, propose a viable definition of Bayesian identification for latent variable models based on the marginal likelihood (integrated over the latent variables), and give a brief didactic description of Hamiltonian Monte Carlo (HMC) as implemented in Stan.
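To illustrate the general idea of replacing vague defaults with more informative priors (plus an ordering constraint, a common additional remedy for label switching), here is a toy two-component mixture in Stan run from R via rstan. It is not the growth mixture model fitted to the NLSY reading data.

```r
# Toy two-component normal mixture with weakly informative priors and ordered means.
library(rstan)

stan_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  simplex[2] lambda;        // class proportions
  ordered[2] mu;            // ordered means break label switching
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 5);        // informative relative to flat defaults
  sigma ~ normal(0, 2);     // half-normal via the lower bound
  lambda ~ dirichlet(rep_vector(2, 2));
  for (n in 1:N)
    target += log_mix(lambda[1],
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
}
"
y <- c(rnorm(100, 0, 1), rnorm(100, 3, 1))
fit <- stan(model_code = stan_code, data = list(N = length(y), y = y), chains = 4)
print(fit)
```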
Structured latent curve models (SLCMs) for continuous repeated measures data have been the subject of considerable recent research activity. In this article, we develop a first-order SLCM for repeated measures count data where the underlying change process is theorized to develop in distinct phases. Parameters of the multiphase or piecewise growth model, including changepoints, are allowed to vary across individuals. Exposure is allowed to vary across both individuals and time. We demonstrate our modeling approach on empirical expressive language data (grammatical morpheme counts) drawn from multiple distinct corpora available in the Child Language Data Exchange System (CHILDES), where the acquisition of grammatical morphology is understood to occur in distinct phases in typically developing children. A multiphase SLCM is fit to summarize individuals’ data as well as the average developmental pattern. Change in time-varying dispersion (unexplained variability in morpheme counts) over the course of early childhood is modeled concurrently to provide additional insights into acquisition. Unique characteristics of count data create modeling, identification, estimation, and diagnostic challenges that are exacerbated by incorporating growth models with nonlinear random effects. These are discussed at length. We provide annotated software code for each of the models used in the empirical example.
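As a much-simplified sketch of a piecewise count growth model, the code below fits a Poisson mixed model with a fixed changepoint and an exposure offset using lme4. The article's SLCM additionally allows the changepoint to vary randomly across children, which is not shown here; all variable names (morphemes, age, exposure, child) are illustrative.

```r
# Simplified piecewise Poisson growth model with a *fixed* changepoint.
library(lme4)

cp <- 24                                  # assumed changepoint (e.g., age in months)
dat$phase1 <- pmin(dat$age, cp)           # slope before the changepoint
dat$phase2 <- pmax(dat$age - cp, 0)       # additional slope after the changepoint
fit <- glmer(
  morphemes ~ phase1 + phase2 + offset(log(exposure)) +  # exposure varies by person and time
    (1 + phase1 + phase2 | child),                       # child-specific intercepts and slopes
  family = poisson,
  data = dat
)
summary(fit)
```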
To assess country-level progress toward educational goals, it is important to monitor trends in educational outcomes over time. The purpose of this article is to demonstrate how optimally predictive growth models can be constructed to monitor the pace of progress at which countries are moving toward (or away from) the education sustainable development goals as specified by the United Nations. A number of growth curve models can be specified to estimate the pace of progress; however, choosing one model and using it for predictive purposes assumes that the chosen model is the one that generated the data, and this choice runs the risk of “over-confident inferences and decisions that are more risky than one thinks they are” (Hoeting et al., 1999). To mitigate this problem, we adapt and apply Bayesian stacking to form mixtures of predictive distributions from an ensemble of individual models specified to predict country-level pace of progress. We demonstrate Bayesian stacking using country-level data from the Program on International Student Assessment. Our results show that Bayesian stacking yields better predictive accuracy than any single model as measured by the Kullback–Leibler divergence. Issues of Bayesian model identification and estimation for growth models are also discussed.
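A brief sketch of Bayesian stacking in R with the loo package follows. The candidate models and variable names (score, year, gdp, pisa) are illustrative stand-ins for the country-level models described above.

```r
# Sketch of Bayesian stacking: fit an ensemble of candidate growth models,
# then compute stacking weights that maximize leave-one-out predictive density.
library(rstanarm)
library(loo)

m1 <- stan_glm(score ~ year,            data = pisa, refresh = 0)
m2 <- stan_glm(score ~ poly(year, 2),   data = pisa, refresh = 0)
m3 <- stan_glm(score ~ year + log(gdp), data = pisa, refresh = 0)

loos <- list(loo(m1), loo(m2), loo(m3))
loo_model_weights(loos, method = "stacking")  # stacking weights for the ensemble
```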
When choosing scaling conditions in latent variable structural equation models (SEMs) with continuous observed variables, analysts typically scale a latent variable by setting the factor loading of one indicator to one and either fixing that indicator’s intercept to zero or fixing the mean of the latent variable to zero. When binary and ordinal observed variables are part of SEMs, the identification and scaling choices are more varied and multifaceted, and longitudinal data complicate them further. In SEM software, such as lavaan and Mplus, the two primary scaling conventions are fixing the underlying variables’ variances to one or fixing the error variances to one. As demonstrated in this paper, choosing between these constraints can significantly impact longitudinal analysis, affecting model fit, degrees of freedom, and assumptions about the dynamic process and error structure. We explore alternative parameterizations and conditions of model equivalence with categorical repeated measures.
Using data from the National Longitudinal Survey of Youth 1997, we empirically explore how different parameterizations lead to varying conclusions in longitudinal categorical analysis. More specifically, we provide insights into the specifications of the autoregressive latent trajectory model and its special cases—the linear growth curve and first-order autoregressive models—for categorical repeated measures. These findings have broader implications for a wide range of longitudinal models.
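The contrast between the two scaling conventions can be made concrete in lavaan via its parameterization argument. The sketch below uses an illustrative single-factor model with four ordinal indicators rather than the NLSY 1997 measures; differences in fit and degrees of freedom emerge once longitudinal constraints are imposed.

```r
# Sketch: "delta" (underlying-variable variances fixed to one) versus
# "theta" (error variances fixed to one) parameterizations in lavaan.
library(lavaan)

model <- 'f =~ y1 + y2 + y3 + y4'

fit_delta <- cfa(model, data = dat, ordered = c("y1", "y2", "y3", "y4"),
                 parameterization = "delta")
fit_theta <- cfa(model, data = dat, ordered = c("y1", "y2", "y3", "y4"),
                 parameterization = "theta")

fitMeasures(fit_delta, c("chisq", "df", "cfi"))
fitMeasures(fit_theta, c("chisq", "df", "cfi"))
```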
Despite the versatility of generalized linear mixed models in handling complex experimental designs, they often suffer from misspecification and convergence problems. This makes inference on the values of coefficients problematic. In addition, the researcher’s choice of random and fixed effects directly affects statistical inference correctness. To address these challenges, we propose a robust extension of the “two-stage summary statistics” approach using sign-flipping transformations of the score statistic in the second stage. Our approach efficiently handles within-variance structure and heteroscedasticity, ensuring accurate regression coefficient testing for 2-level hierarchical data structures. The approach is illustrated by analyzing the reduction of health issues over time for newly adopted children. The model is characterized by a binomial response with unbalanced frequencies and several categorical and continuous predictors. The proposed approach efficiently deals with critical problems related to longitudinal nonlinear models, surpassing common statistical approaches such as generalized estimating equations and generalized linear mixed models.
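As a minimal illustration of the two-stage summary-statistics idea combined with sign flipping (not the authors' exact procedure, which flips the score statistic), the sketch below fits a binomial GLM per subject in stage one and then tests the subject-level coefficients with random sign flips in stage two. All variable names (y, n_trials, time, id) are illustrative.

```r
# Stage 1: per-subject binomial GLMs; keep the coefficient of interest.
set.seed(1)
stage1 <- sapply(split(dat, dat$id), function(d) {
  coef(glm(cbind(y, n_trials - y) ~ time, family = binomial, data = d))["time"]
})

# Stage 2: sign-flip test of whether the subject-level coefficients are centered at zero.
n_flips <- 5000
obs <- mean(stage1)
flipped <- replicate(n_flips,
                     mean(stage1 * sample(c(-1, 1), length(stage1), replace = TRUE)))
p_value <- (1 + sum(abs(flipped) >= abs(obs))) / (1 + n_flips)  # two-sided p-value
p_value
```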
Establishing the effectiveness of treatments for psychopathology requires accurate models of its progression over time and of the factors that impact it. Longitudinal data are, however, often fraught with missingness, which hinders accurate modelling. We re-analyse data on schizophrenia severity in a clinical trial using hidden Markov models (HMMs). We consider missing data in HMMs with a focus on situations where data are missing not at random (MNAR) and missingness depends on the latent states, allowing symptom severity to indirectly affect the probability of missingness. In simulations, we show that including a submodel for state-dependent missingness reduces bias when data are MNAR and missingness is state-dependent, whilst not reducing accuracy when data are missing at random (MAR). When missingness depends on time, a model that allows missingness to be both state- and time-dependent is unbiased. Overall, these results show that modelling missingness as state-dependent and including relevant covariates is a useful strategy in applications of HMMs to time series with missing data. Applying the model to data from the clinical trial, we find that drop-out is more likely for patients with less severe symptoms, which may lead to a biased assessment of treatment effectiveness.
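A sketch of the likelihood construction for state-dependent missingness in a two-state Gaussian HMM is given below. Parameter values and data are illustrative, and the code only evaluates the log-likelihood via the forward algorithm rather than estimating the article's model.

```r
# Two-state Gaussian HMM in which the probability of a missing observation
# depends on the latent state, so a missing value is itself informative.
forward_loglik <- function(y, init, trans, mu, sd, p_miss) {
  emis <- function(yt) {
    if (is.na(yt)) p_miss                   # missing: state-dependent missingness probability
    else (1 - p_miss) * dnorm(yt, mu, sd)   # observed: prob. of being observed times emission density
  }
  alpha <- init * emis(y[1])
  loglik <- log(sum(alpha)); alpha <- alpha / sum(alpha)
  for (t in seq_along(y)[-1]) {
    alpha <- as.vector(alpha %*% trans) * emis(y[t])
    loglik <- loglik + log(sum(alpha))      # scaled forward recursion avoids underflow
    alpha <- alpha / sum(alpha)
  }
  loglik
}

# Illustrative parameters: state 2 ("less severe") is more likely to be missing.
y <- c(4.1, NA, 3.8, 2.0, NA, NA, 1.5)
forward_loglik(y,
               init  = c(0.5, 0.5),
               trans = matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE),
               mu = c(4, 2), sd = c(1, 1), p_miss = c(0.1, 0.4))
```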