Researchers interested in dyadic processes increasingly collect intensive longitudinal data (ILD), with the longitudinal actor–partner interdependence model (L-APIM) being a popular modeling approach. However, due to non-compliance and the use of conditional questions, ILD are almost always incomplete. These missing data issues become more prominent in dyadic studies, because partners often miss different measurement occasions or disagree about the features that trigger conditional questions. Large amounts of missing data challenge the L-APIM’s estimation performance. Specifically, we found that non-convergence occurred when applying the L-APIM to pre-existing dyadic diary data with substantial missingness. Using a simulation study, we systematically examined the performance of the L-APIM in dyadic ILD with missing values. Consistent with our illustrative data, we found that non-convergence often occurred in conditions with small sample sizes, while the fixed within-person actor and partner effects were well estimated when analyses did converge. Additionally, considering potential convergence failures with the L-APIM, we investigated 31 alternative models and evaluated their performance on simulated and empirical data, showing that multiple alternatives may alleviate the convergence problems. Overall, when the L-APIM fails to converge, we recommend fitting multiple alternative models to check the robustness of the results.
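As a rough illustration of the kind of model involved, the sketch below fits a longitudinal APIM as a multilevel model in R with lme4. The variable names (outcome, actor_x, partner_x, person, dyad) and the simplified random-effects structure are illustrative assumptions, not the exact specification evaluated in the article (which additionally handles dyadic residual structure).

```r
# Minimal sketch of a longitudinal APIM fit as a multilevel model with lme4.
# Person-period data: each row is one partner at one measurement occasion.
library(lme4)

fit <- lmer(
  outcome ~ actor_x + partner_x +              # fixed within-person actor and partner effects
    (1 + actor_x + partner_x | dyad:person) +  # person-specific intercepts and slopes
    (1 | dyad),                                # dyad-level intercept
  data = dyad_long,
  control = lmerControl(optimizer = "bobyqa")  # a more robust optimizer when convergence is fragile
)
summary(fit)
```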
With models and research designs ever increasing in complexity, the foundational question of model identification is more important than ever. The determination of whether or not a model can be fit at all or fit to some particular data set is the essence of model identification. In this article, we pull from previously published work on data-independent model identification applicable to a broad set of structural equation models, and extend it further to include extremely flexible exogenous covariate effects and also to include data-dependent empirical model identification. For illustrative purposes, we apply this model identification solution to several small examples for which the answer is already known, including a real data example from the National Longitudinal Survey of Youth; however, the method applies similarly to models that are far from simple to comprehend. The solution is implemented in the open-source OpenMx package in R.
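A minimal sketch of how an identification check can be run in OpenMx is shown below. The one-factor model and simulated data are illustrative, not the NLSY example, and mxCheckIdentification() is used here as the package's built-in local identification check; whether it is the exact routine implementing the article's solution is an assumption.

```r
# Sketch: checking identification of a small one-factor RAM model in OpenMx.
library(OpenMx)

manifests <- paste0("x", 1:3)
model <- mxModel(
  "oneFactor", type = "RAM",
  manifestVars = manifests, latentVars = "F",
  mxPath(from = "F", to = manifests, free = TRUE, values = 0.8, labels = paste0("l", 1:3)),
  mxPath(from = manifests, arrows = 2, free = TRUE, values = 1, labels = paste0("e", 1:3)),
  mxPath(from = "F", arrows = 2, free = FALSE, values = 1),  # fix factor variance for scaling
  mxData(cov(matrix(rnorm(300), 100, 3, dimnames = list(NULL, manifests))),
         type = "cov", numObs = 100)
)
fit <- mxRun(model)
# Local identification check at the current parameter estimates:
mxCheckIdentification(fit)
```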
This paper presents a model specification for group comparisons regarding a functional trend over time within a trial and learning across a series of trials in intensive binary longitudinal eye-tracking data. The functional trend and learning effects are modeled using by-variable smooth functions. This model specification is formulated as a generalized additive mixed model, which allows for the use of the freely available mgcv package in R (Wood, 2023, https://cran.r-project.org/web/packages/mgcv/mgcv.pdf). The model specification was applied to intensive binary longitudinal eye-tracking data, where the questions of interest concern differences between individuals with and without brain injury in their real-time language comprehension and how this affects their learning over time. Results of a simulation study show that the model parameters are recovered well and that the by-variable smooth functions are adequately predicted under the same conditions as those found in the application.
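A sketch of this kind of generalized additive mixed model in mgcv follows. The variable names (fix, time, trial, group, id) and smooth bases are illustrative, not the article's exact specification.

```r
# Sketch of a binary GAMM with group-specific (by-variable) smooths in mgcv.
library(mgcv)

fit <- bam(
  fix ~ group +
    s(time,  by = group, bs = "tp") +  # group-specific functional trend within a trial
    s(trial, by = group, bs = "tp") +  # group-specific learning across trials
    s(id, bs = "re"),                  # random intercepts for participants (id must be a factor)
  family = binomial,                   # intensive binary outcome (e.g., fixation yes/no)
  data = eye_dat,
  discrete = TRUE                      # fast fitting for large intensive longitudinal data
)
summary(fit)
```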
This article addresses problematic behaviors of Markov chain Monte Carlo (MCMC) methods for finite mixture models due to what we call degenerate nonidentifiability. We discuss the reasons for these behaviors, propose diagnostics to detect them, and show through simulations that using more informative priors than the vague defaults can mitigate the problems in growth mixture models (GMMs). Our motivating example is an application of GMMs to data from the National Longitudinal Survey of Youth (NLSY) to examine heterogeneity in the development of reading skills in children aged 6–14. We also suggest ways of describing and visualizing within-class heterogeneity in GMMs, provide a literature review of likelihood identification and Bayesian identification, propose a viable definition of Bayesian identification for latent variable models based on the marginal likelihood (integrated over the latent variables), and give a brief didactic description of Hamiltonian Monte Carlo (HMC) as implemented in Stan.
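To illustrate the general idea of replacing vague defaults with more informative priors (plus an ordering constraint, a common additional remedy for label switching), here is a toy two-component mixture in Stan run from R via rstan. It is not the growth mixture model fitted to the NLSY reading data.

```r
# Toy two-component normal mixture with weakly informative priors and ordered means.
library(rstan)

stan_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  simplex[2] lambda;        // class proportions
  ordered[2] mu;            // ordered means break label switching
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 5);        // informative relative to flat defaults
  sigma ~ normal(0, 2);     // half-normal via the lower bound
  lambda ~ dirichlet(rep_vector(2, 2));
  for (n in 1:N)
    target += log_mix(lambda[1],
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
}
"
y <- c(rnorm(100, 0, 1), rnorm(100, 3, 1))
fit <- stan(model_code = stan_code, data = list(N = length(y), y = y), chains = 4)
print(fit)
```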
Structured latent curve models (SLCMs) for continuous repeated measures data have been the subject of considerable recent research activity. In this article, we develop a first-order SLCM for repeated measures count data where the underlying change process is theorized to develop in distinct phases. Parameters of the multiphase or piecewise growth model, including changepoints, are allowed to vary across individuals. Exposure is allowed to vary across both individuals and time. We demonstrate our modeling approach on empirical expressive language data (grammatical morpheme counts) drawn from multiple distinct corpora available in the Child Language Data Exchange System (CHILDES), where the acquisition of grammatical morphology is understood to occur in distinct phases in typically developing children. A multiphase SLCM is fit to summarize individuals’ data as well as the average developmental pattern. Change in time-varying dispersion (unexplained variability in morpheme counts) over the course of early childhood is modeled concurrently to provide additional insights into acquisition. Unique characteristics of count data create modeling, identification, estimation, and diagnostic challenges that are exacerbated by incorporating growth models with nonlinear random effects. These are discussed at length. We provide annotated software code for each of the models used in the empirical example.
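As a much-simplified sketch of a piecewise count growth model, the code below fits a Poisson mixed model with a fixed changepoint and an exposure offset using lme4. The article's SLCM additionally allows the changepoint to vary randomly across children, which is not shown here; all variable names (morphemes, age, exposure, child) are illustrative.

```r
# Simplified piecewise Poisson growth model with a *fixed* changepoint.
library(lme4)

cp <- 24                                  # assumed changepoint (e.g., age in months)
dat$phase1 <- pmin(dat$age, cp)           # slope before the changepoint
dat$phase2 <- pmax(dat$age - cp, 0)       # additional slope after the changepoint
fit <- glmer(
  morphemes ~ phase1 + phase2 + offset(log(exposure)) +  # exposure varies by person and time
    (1 + phase1 + phase2 | child),                       # child-specific intercepts and slopes
  family = poisson,
  data = dat
)
summary(fit)
```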
To assess country-level progress toward educational goals, it is important to monitor trends in educational outcomes over time. The purpose of this article is to demonstrate how optimally predictive growth models can be constructed to monitor the pace of progress at which countries are moving toward (or away from) the education sustainable development goals as specified by the United Nations. A number of growth curve models can be specified to estimate the pace of progress; however, choosing one model and using it for predictive purposes assumes that the chosen model is the one that generated the data, and this choice runs the risk of “over-confident inferences and decisions that are more risky than one thinks they are” (Hoeting et al., 1999). To mitigate this problem, we adapt and apply Bayesian stacking to form mixtures of predictive distributions from an ensemble of individual models specified to predict country-level pace of progress. We demonstrate Bayesian stacking using country-level data from the Program on International Student Assessment. Our results show that Bayesian stacking yields better predictive accuracy than any single model as measured by the Kullback–Leibler divergence. Issues of Bayesian model identification and estimation for growth models are also discussed.
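A brief sketch of Bayesian stacking in R with the loo package follows. The candidate models and variable names (score, year, gdp, pisa) are illustrative stand-ins for the country-level models described above.

```r
# Sketch of Bayesian stacking: fit an ensemble of candidate growth models,
# then compute stacking weights that maximize leave-one-out predictive density.
library(rstanarm)
library(loo)

m1 <- stan_glm(score ~ year,            data = pisa, refresh = 0)
m2 <- stan_glm(score ~ poly(year, 2),   data = pisa, refresh = 0)
m3 <- stan_glm(score ~ year + log(gdp), data = pisa, refresh = 0)

loos <- list(loo(m1), loo(m2), loo(m3))
loo_model_weights(loos, method = "stacking")  # stacking weights for the ensemble
```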
When choosing scaling conditions in latent variable structural equation models (SEMs) with continuous observed variables, analysts typically scale a latent variable by setting the factor loading of one indicator to one and either fixing that indicator’s intercept to zero or fixing the mean of the latent variable to zero. When binary and ordinal observed variables are part of SEMs, the identification and scaling choices are more varied and multifaceted, and longitudinal data complicate them further. In SEM software, such as lavaan and Mplus, the two primary scaling conventions are fixing the underlying variables’ variances to one or fixing the error variances to one. As demonstrated in this paper, choosing between these constraints can significantly impact longitudinal analysis, affecting model fit, degrees of freedom, and assumptions about the dynamic process and error structure. We explore alternative parameterizations and conditions of model equivalence with categorical repeated measures.
Using data from the National Longitudinal Survey of Youth 1997, we empirically explore how different parameterizations lead to varying conclusions in longitudinal categorical analysis. More specifically, we provide insights into the specifications of the autoregressive latent trajectory model and its special cases—the linear growth curve and first-order autoregressive models—for categorical repeated measures. These findings have broader implications for a wide range of longitudinal models.
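The contrast between the two scaling conventions can be made concrete in lavaan via its parameterization argument. The sketch below uses an illustrative single-factor model with four ordinal indicators rather than the NLSY 1997 measures; differences in fit and degrees of freedom emerge once longitudinal constraints are imposed.

```r
# Sketch: "delta" (underlying-variable variances fixed to one) versus
# "theta" (error variances fixed to one) parameterizations in lavaan.
library(lavaan)

model <- 'f =~ y1 + y2 + y3 + y4'

fit_delta <- cfa(model, data = dat, ordered = c("y1", "y2", "y3", "y4"),
                 parameterization = "delta")
fit_theta <- cfa(model, data = dat, ordered = c("y1", "y2", "y3", "y4"),
                 parameterization = "theta")

fitMeasures(fit_delta, c("chisq", "df", "cfi"))
fitMeasures(fit_theta, c("chisq", "df", "cfi"))
```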
Despite the versatility of generalized linear mixed models in handling complex experimental designs, they often suffer from misspecification and convergence problems. This makes inference on the values of coefficients problematic. In addition, the researcher’s choice of random and fixed effects directly affects statistical inference correctness. To address these challenges, we propose a robust extension of the “two-stage summary statistics” approach using sign-flipping transformations of the score statistic in the second stage. Our approach efficiently handles within-variance structure and heteroscedasticity, ensuring accurate regression coefficient testing for 2-level hierarchical data structures. The approach is illustrated by analyzing the reduction of health issues over time for newly adopted children. The model is characterized by a binomial response with unbalanced frequencies and several categorical and continuous predictors. The proposed approach efficiently deals with critical problems related to longitudinal nonlinear models, surpassing common statistical approaches such as generalized estimating equations and generalized linear mixed models.
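As a minimal illustration of the two-stage summary-statistics idea combined with sign flipping (not the authors' exact procedure, which flips the score statistic), the sketch below fits a binomial GLM per subject in stage one and then tests the subject-level coefficients with random sign flips in stage two. All variable names (y, n_trials, time, id) are illustrative.

```r
# Stage 1: per-subject binomial GLMs; keep the coefficient of interest.
set.seed(1)
stage1 <- sapply(split(dat, dat$id), function(d) {
  coef(glm(cbind(y, n_trials - y) ~ time, family = binomial, data = d))["time"]
})

# Stage 2: sign-flip test of whether the subject-level coefficients are centered at zero.
n_flips <- 5000
obs <- mean(stage1)
flipped <- replicate(n_flips,
                     mean(stage1 * sample(c(-1, 1), length(stage1), replace = TRUE)))
p_value <- (1 + sum(abs(flipped) >= abs(obs))) / (1 + n_flips)  # two-sided p-value
p_value
```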
Establishing the effectiveness of treatments for psychopathology requires accurate models of its progression over time and of the factors that impact it. Longitudinal data are, however, often fraught with missingness, which hinders accurate modelling. We re-analyse data on schizophrenia severity in a clinical trial using hidden Markov models (HMMs). We consider missing data in HMMs with a focus on situations where data are missing not at random (MNAR) and missingness depends on the latent states, allowing symptom severity to indirectly affect the probability of missingness. In simulations, we show that including a submodel for state-dependent missingness reduces bias when data are MNAR and missingness is state-dependent, whilst not reducing accuracy when data are missing at random (MAR). When missingness depends on time, a model that allows missingness to be both state- and time-dependent is unbiased. Overall, these results show that modelling missingness as state-dependent and including relevant covariates is a useful strategy in applications of HMMs to time series with missing data. Applying the model to data from the clinical trial, we find that drop-out is more likely for patients with less severe symptoms, which may lead to a biased assessment of treatment effectiveness.
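A sketch of the likelihood construction for state-dependent missingness in a two-state Gaussian HMM is given below. Parameter values and data are illustrative, and the code only evaluates the log-likelihood via the forward algorithm rather than estimating the article's model.

```r
# Two-state Gaussian HMM in which the probability of a missing observation
# depends on the latent state, so a missing value is itself informative.
forward_loglik <- function(y, init, trans, mu, sd, p_miss) {
  emis <- function(yt) {
    if (is.na(yt)) p_miss                   # missing: state-dependent missingness probability
    else (1 - p_miss) * dnorm(yt, mu, sd)   # observed: prob. of being observed times emission density
  }
  alpha <- init * emis(y[1])
  loglik <- log(sum(alpha)); alpha <- alpha / sum(alpha)
  for (t in seq_along(y)[-1]) {
    alpha <- as.vector(alpha %*% trans) * emis(y[t])
    loglik <- loglik + log(sum(alpha))      # scaled forward recursion avoids underflow
    alpha <- alpha / sum(alpha)
  }
  loglik
}

# Illustrative parameters: state 2 ("less severe") is more likely to be missing.
y <- c(4.1, NA, 3.8, 2.0, NA, NA, 1.5)
forward_loglik(y,
               init  = c(0.5, 0.5),
               trans = matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE),
               mu = c(4, 2), sd = c(1, 1), p_miss = c(0.1, 0.4))
```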