Introduction
Suicide remains a major global public health challenge, claiming over 700,000 lives annually (WHO, 2021). In China, approximately 116,000 suicide deaths were reported in 2019, accounting for about 17% of the global total (WHO, 2021). The age-standardized rate was 6.7 per 100,000, lower than the global average of 9.0 per 100,000 (WHO, 2021). Although China has seen a substantial decline in suicide rates over the past three decades (Wu et al., 2024b), the country’s large population keeps the absolute burden high. Suicide attempt (SA) is among the strongest predictors of suicide (Bostwick et al., 2016). An estimated 8.5% to 13% of attempts are fatal, with the first-attempt fatality rate around 3% (Bostwick et al., 2016; Conner et al., 2019; Miller et al., 2012). Nearly half of suicide deaths occur following a prior attempt rather than on the first attempt (Anestis, 2016; Bostwick et al., 2016). Preventing nonfatal SA provides a critical window for early intervention, especially in China.
Age is a well-established factor influencing suicidal behaviours (Yazdi-Ravandi et al., 2023). Although no uniform age pattern has been identified, substantial epidemiological differences across age groups have been consistently reported (Franklin et al., 2017; Nock et al., 2008b). Globally, adolescents and young adults are widely recognized as high-risk groups for both SA and suicide deaths (Franklin et al., 2017; Nock et al., 2008b). In China, a nationwide analysis of cause-of-death surveillance data revealed a bimodal age distribution in suicide mortality, with peaks among those aged 15–24 and older adults (Wu et al., 2024b). However, nationally representative data on SA remain scarce, possibly due to suicide-related stigma, stringent ethical oversight, and the challenges of diagnostic complexity, participant recruitment, disclosure, and data collection in population-based surveys on suicidal behaviours (Lee et al., 2007; Wu et al., 2024b).
Most available evidence comes from small, high-risk samples, including individuals with mental disorders (Chen et al., 2022; Song et al., 2023), sexual and gender minorities (Chen et al., 2019), children and adolescents (Su et al., 2023; Wu et al., 2021), and rural or metropolitan residents (Lee et al., 2007; Lyu et al., 2018), and estimates vary widely across these subgroups. As a result, the overall prevalence and age-specific patterns of SA in the general population remain unclear. Given China’s age-patterned suicide mortality across life stages, we hypothesize that SA similarly displays distinct age-specific profiles and predictors. Clarifying these patterns is critical for informing age-tailored suicide prevention strategies.
SA arises from multiple factors that often interact in complex, nonlinear ways (Franklin et al., 2017). Most prediction studies have relied on linear models, yielding modest accuracy (Wu et al., 2023). Although traditional approaches such as generalized additive models can model certain nonlinearities, they require pre-specifying interactions and are less suited to high-dimensional, mixed-type data. A meta-analysis of 365 studies found that models based solely on conventional risk factors performed only slightly better than chance, highlighting the need for machine-learning (ML) risk algorithms (Franklin et al., 2017). ML can flexibly capture nonlinearities and higher-order interactions, handle heterogeneous predictors and class imbalance, and do so without strict parametric assumptions (García de la Garza et al., 2021). Given the likely age-related heterogeneity of SA in the general population, identifying age-specific predictors in non-clinical samples is essential (García de la Garza et al., 2021; Gordon et al., 2020). To our knowledge, no study in China has used ML for this purpose, leaving a key evidence gap.
In this nationally representative, multicentre survey, we aimed to (1) estimate the age-specific prevalence of lifetime suicide attempts (LSA) among Chinese adults and (2) identify key age-specific predictors using five ML algorithms. These findings may help inform tailored suicide prevention strategies across the life course.
Methods
Data source and study population
Data came from the Psychology and Behavior Investigation of Chinese Residents (PBICR), a nationally representative, multicentre cross-sectional survey led by Peking University from June to September 2024. Using stratified and quota sampling, the survey randomly selected 150 cities, 202 districts/counties, and 390 townships across 31 provincial-level regions (including Hong Kong and Macao), covering 800 communities or villages. Trained interviewers recruited participants online and onsite at survey locations, verified eligibility, and administered standardized electronic questionnaires in one-to-one, face-to-face interviews. Eligible participants were adults (≥18 years) who were Chinese nationals permanently residing in China and able to clearly understand and complete the questionnaire. We excluded individuals with diagnosed psychiatric or cognitive disorders, those concurrently enrolled in similar surveys, those who declined or did not sign informed consent, and responses with completion times < 5 min. For participants with intact decisional capacity but insufficient motor ability, interviewers conducted one-to-one interviews and recorded responses on their behalf. All procedures complied with the Declaration of Helsinki and received institutional ethics approval (see Ethics Statement).
Of 38,793 invited residents, 38,424 responded (response rate: 99.05%). We excluded 2,563 invalid questionnaires due to age < 18, non-Chinese nationality, no consent, response time < 5 min, or missing age and LSA, leaving 35,861 fully eligible participants. To improve national representativeness, post-stratification weights based on sex, age, and urban–rural distribution as reported in the China Statistical Yearbook 2023 (National Bureau of Statistics of China, 2023) were applied, yielding a final analytic sample of 25,047. Missing covariate values (n = 136) were imputed using MissForest, a non-parametric algorithm based on random forests suitable for both continuous and categorical data (Stekhoven and Buhlmann, 2012). Based on prior suicide-related research, mortality surveillance practices, and developmental distinctions across life stages (Miller et al., 2004; Wu et al., 2024b), age was categorized into three groups (18–24, 25–44, and ≥45 years), reflecting distinct patterns of psychosocial development, health risk exposures, and suicide-related vulnerabilities. The sample selection flowchart is shown in Fig. 1.
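Post-stratification weighting can be illustrated with a minimal, dependency-free Python sketch. It assumes each respondent falls into exactly one cell (for example, sex × age group × residence) and that each cell's population share is known. A respondent's weight is then the population share of their cell divided by that cell's sample share. The helper names, cell labels, and shares in the example are hypothetical placeholders, not China Statistical Yearbook figures, and the sketch is not the authors' code.

```python
from collections import Counter


def age_group(age):
    """Map age in completed years to the three analysis groups (18-24, 25-44, >=45)."""
    if age < 18:
        raise ValueError("only adults (>= 18 years) are eligible")
    if age <= 24:
        return "18-24"
    if age <= 44:
        return "25-44"
    return ">=45"


def post_stratification_weights(sample_cells, population_shares):
    """Weight each respondent by (population share) / (sample share) of their cell.

    sample_cells: one cell label per respondent, e.g. (sex, age_group, residence).
    population_shares: dict mapping every cell label to its population proportion.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


# Hypothetical example: cell "A" is over-represented in the sample (3 of 4)
# relative to a made-up 50/50 population split, so its members are down-weighted.
example_weights = post_stratification_weights(["A", "A", "A", "B"], {"A": 0.5, "B": 0.5})
```

When every cell is represented in the sample and the population shares sum to one, the weights sum to the sample size.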

Fig. 1. Flow diagram of sample selection.
Outcome assessment
LSA was defined as any self-inflicted, potentially injurious act at any point in life, carried out with at least some intent to die, regardless of the outcome (Nock et al., 2008b). Consistent with prior suicide-related research (Lee et al., 2007; Oquendo et al., 2024), LSA was assessed using a dichotomous self-report item: “Have you ever attempted suicide?” Participants who answered “yes” were classified as lifetime suicide attempters, and those who answered “no” as non-attempters.
Predictors
Based on previous studies (Su et al., 2023; Wu et al., 2021), we included 37 candidate predictors across six domains (details in Table S1):
(1) Sociodemographic: gender (male/female), marital status (married vs. unmarried—never married, divorced, widowed, or separated), residence (urban/rural), ethnicity (Han/ethnic minority), educational level (low: junior high school or below; medium: senior high school or vocational/technical school; high: college degree or above), job status (in education or employment vs. not in education or employment), economic status (monthly household income: low < 3000 RMB; middle 3000–5999 RMB; high ≥ 6000 RMB), and medical insurance (yes/no).
(2) Physical health: body mass index (BMI, underweight < 18.5, normal 18.5–24.9, overweight/obese ≥ 25); chronic disease (yes/no); activities of daily living (ADL) limitation (yes/no), defined as difficulty with basic self-care tasks (walking across a room, bathing/showering, or dressing); and self-rated health (0–100, higher scores represent better status).
(3) Lifestyle: smoking (yes/no), drinking (yes/no), living alone (yes/no), sleep quality (five-item sleep health score: chronotype, duration, insomnia, snoring, daytime sleepiness; range 0–5; higher scores indicate better sleep) (Fan et al., 2020), social media addiction (SMA, assessed with the Bergen Social Media Addiction Scale; total score 6–30; scores ≥ 19 indicate at-risk status; Cronbach’s α = 0.926 overall) (Leung et al., 2020), and moderate- and vigorous-intensity physical activity (MPA and VPA, each measured as the self-reported number of days per week).
(4) Social environment: social support (three-item Perceived Social Support Scale; total score 0–18, higher scores indicate stronger perceived support; Cronbach’s α = 0.904 overall) (Wu et al., 2025); self-rated social status (a self-assessed item, 1–7; higher scores indicate higher perceived status); neighbourhood relations (a self-assessed item, 1–7; higher scores indicate better relations); and social participation (assessed by the five-item Social Connection Scale; total score 0–5; higher scores indicate greater engagement in shared activities and stronger social connectedness; Cronbach’s α = 0.805 overall) (Foster et al., 2023).
(5) Mental health: depressive symptoms (DS, assessed using the Patient Health Questionnaire-9; total score 0–27, scores ≥ 10 indicate DS; Cronbach’s α = 0.940 overall) (Levis et al., 2019); anxiety symptoms (AS, assessed using the Generalized Anxiety Disorder-3 scale; total score 0–9, scores ≥ 3 indicate AS; Cronbach’s α = 0.901 overall) (Wang et al., 2024); negative life events (NLE, any major stressor in the past year; yes/no); loneliness (a self-rated item, 1–7; higher scores indicate greater loneliness); perceived stress (assessed by the four-item Perceived Stress Scale; total score 0–16; higher scores indicate higher perceived stress; Cronbach’s α = 0.934 overall) (Huang et al., 2020); burnout (measured with the seven-item Copenhagen Burnout Inventory; total score 0–28; higher scores indicate greater exhaustion; Cronbach’s α = 0.851 overall) (Borritz et al., 2005); self-efficacy (assessed with the three-item New Generalized Self-Efficacy Scale; total score 0–12; higher scores indicate greater self-efficacy; Cronbach’s α = 0.929 overall) (Feng and Chen, 2012); intimate partner violence (IPV, measured with a five-item scale covering physical, emotional, and controlling behaviours; total score 0–20; higher scores indicate greater severity; Cronbach’s α = 0.921 overall) (Yount et al., 2022); and adverse childhood experiences (ACEs, cumulative score from 30 binary items covering household dysfunction, abuse, neglect, bereavement, and exposure to violence before age 18; total score 0–30; higher scores indicate greater adversity; Cronbach’s α = 0.901 overall) (Lin et al., 2021).
(6) Self-injury and suicide-related history (all variables refer to lifetime history unless otherwise specified) (Nock et al., 2008b): Non-suicidal self-injury (NSSI): deliberate self-inflicted damage to the body’s surface without suicidal intent (yes/no); NSSI-medical: history of NSSI that required medical treatment (yes/no); Suicidal ideation (SI): thoughts about engaging in behaviour intended to end one’s life (yes/no); Suicide plan (SP): formulation of a specific method, means, timing, or place for a suicide attempt (yes/no); Suicide disclosure: disclosure of suicide plans to others (yes/no).
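Several of the predictors above are indicators derived from raw scores using the stated cut-offs. These are BMI groups, monthly household income bands, PHQ-9 ≥ 10, GAD-3 ≥ 3, and Bergen Social Media Addiction Scale ≥ 19. The sketch below encodes those thresholds. It assumes item-level responses are already integer-coded: 0–3 for PHQ-9 and GAD-3 items, and 1–5 for the six BSMAS items, which is what the stated total ranges imply. The function names are illustrative only.

```python
def bmi_category(weight_kg, height_m):
    """BMI groups used in the text: < 18.5, 18.5-24.9, >= 25 kg/m^2."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    return "overweight/obese"


def income_category(monthly_household_rmb):
    """Monthly household income: low < 3000, middle 3000-5999, high >= 6000 RMB."""
    if monthly_household_rmb < 3000:
        return "low"
    if monthly_household_rmb < 6000:
        return "middle"
    return "high"


def _checked_total(items, n_items, lo, hi, name):
    # validate item count and item range before summing
    if len(items) != n_items or any(not lo <= v <= hi for v in items):
        raise ValueError(f"{name}: expected {n_items} items scored {lo}-{hi}")
    return sum(items)


def depressive_symptoms(phq9_items):
    return _checked_total(phq9_items, 9, 0, 3, "PHQ-9") >= 10


def anxiety_symptoms(gad3_items):
    return _checked_total(gad3_items, 3, 0, 3, "GAD-3") >= 3


def sma_at_risk(bsmas_items):
    return _checked_total(bsmas_items, 6, 1, 5, "BSMAS") >= 19
```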
Feature screening
Feature selection proceeded in two steps. First, univariable logistic regressions for LSA were run to prefilter predictors (retain if p < 0.05). Second, least absolute shrinkage and selection operator (LASSO) logistic regression with five-fold cross-validation was used to tune λ and apply an L1 penalty that shrinks irrelevant coefficients to zero, addressing high dimensionality and multicollinearity (Lee et al., 2018). Only predictors retained by both steps were carried forward. This intersection rule was chosen to enhance model stability, interpretability, and out-of-sample generalizability.
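The two-step screening rule can be sketched without external libraries. The LASSO part fits an L1-penalised logistic regression by proximal gradient descent (ISTA) at one fixed penalty `lam`, showing how soft-thresholding sets weak coefficients exactly to zero. It omits the five-fold cross-validation the text uses to choose λ and assumes standardised features. The univariable p-values are taken as given inputs. This is an illustrative stand-in for a library solver, not the authors' implementation, and the predictor names in the test are hypothetical.

```python
import math


def _sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v, thresh):
    # proximal operator of the L1 penalty: exactly 0 inside [-thresh, thresh]
    if v > thresh:
        return v - thresh
    if v < -thresh:
        return v + thresh
    return 0.0


def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|beta_j|); the intercept is not penalised.
    X is a list of rows of standardised features, y a list of 0/1 labels.
    """
    n, p = len(X), len(X[0])
    # 0.25 * trace(X'X / n) (intercept column contributes the 1.0) bounds the
    # gradient's Lipschitz constant, so step = 1 / bound guarantees convergence.
    trace = 1.0 + sum(sum(row[j] ** 2 for row in X) / n for j in range(p))
    step = 4.0 / trace
    b, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [_sigmoid(b + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * grad_b
        beta = [_soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
    return b, beta


def two_step_screen(names, univariable_p, lasso_coefs, alpha=0.05):
    """Keep predictors passing BOTH steps: univariable p < alpha and non-zero LASSO coefficient."""
    step1 = {f for f in names if univariable_p[f] < alpha}
    step2 = {f for f, c in zip(names, lasso_coefs) if c != 0.0}
    return sorted(step1 & step2)


# Toy data: x1 separates the classes; x2 is balanced within each class (pure noise).
X_toy = [[-2, 1], [-1, 1], [-2, -1], [-1, -1], [1, 1], [2, 1], [1, -1], [2, -1]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
intercept, coefs = lasso_logistic(X_toy, y_toy, lam=0.1)  # coefs[1] stays exactly 0.0
```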
Machine learning algorithms
To identify the most accurate classifier, we compared five supervised algorithms on each age-specific dataset. Within each dataset, all models used the same feature set retained by the two-step feature screening described above. Hyperparameters were tuned by grid search with stratified five-fold cross-validation in the training set, and performance was evaluated on a held-out 30% test set (see Table S2).
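A stratified 70/30 split followed by stratified five-fold partitioning of the training set can be sketched in plain Python. The two functions are hypothetical helpers, not the authors' code. Each shuffles indices within each outcome class using a fixed seed, so the proportion of LSA cases is approximately preserved in every part.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs; each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in indices)):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val


labels = [1] * 10 + [0] * 90          # hypothetical outcome vector with 10% positives
train_idx, test_idx = stratified_split(labels, test_frac=0.3, seed=42)
folds = list(stratified_kfold(labels, train_idx, k=5, seed=42))
```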
(1) Random forest (RF) (Breiman, 2001): RF is an ensemble of decision trees grown on bootstrap samples with random feature subsampling. Predictions are aggregated by majority vote. This setup captures nonlinearities and interactions, is relatively robust to noise and overfitting, and provides internal variable-importance measures.
(2) Logistic regression (LR) (Pan and Xu, 2022): LR models the log-odds of a binary outcome as a linear function of predictors. Coefficients are directly interpretable and estimation is efficient, suiting problems with modest feature sets or an emphasis on describing predictor–outcome associations.
(3) Support Vector Machine (SVM) (Rezvani and Wu, 2023): SVM is a margin-based classifier that finds a separating hyperplane with maximal margin. Kernel functions allow nonlinear decision boundaries. It performs well in high-dimensional or sparse settings, particularly when the number of features exceeds the number of observations or when class boundaries are approximately linearly separable.
(4) Extreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016): XGBoost is a gradient-boosted ensemble of decision trees fitted stage-wise to minimize a differentiable loss, with shrinkage and L1/L2 regularization to curb overfitting. It delivers strong accuracy on structured, high-dimensional data and supports efficient parallel training, making it well suited to complex health datasets.
(5) Naive Bayes (Domingos and Pazzani, 1997): Naive Bayes is a probabilistic classifier grounded in Bayes’ theorem that assumes conditional independence among features. Despite its simplicity, it performs well in many practical settings, particularly for high-dimensional, text-based, or weakly correlated data, and when computational efficiency is essential.
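Of the five classifiers, naive Bayes is compact enough to show in full. The sketch below is a Gaussian variant with one normal likelihood per feature and class, combined under the conditional-independence assumption. Using the Gaussian variant is an assumption, since the text does not state which naive Bayes variant was used; binary predictors would more naturally use a Bernoulli likelihood. The class name and the small `var_smoothing` floor are illustrative choices.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Class prior x product of per-feature normal likelihoods (computed in log space)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # small constant added to every variance

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in groups.items():
            stats = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + self.var_smoothing
                stats.append((mean, var))
            self.params[label] = (math.log(len(rows) / n), stats)
        return self

    def _log_joint(self, row, label):
        log_prior, stats = self.params[label]
        total = log_prior
        for x, (mean, var) in zip(row, stats):
            # log density of N(mean, var) at x; features treated as independent given the class
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_joint(row, c)) for row in X]
```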
Statistical analysis
We estimated and compared the prevalence of LSA across age groups using chi-square tests. Within each age group, we compared participant characteristics between those with and without LSA using independent-samples t tests (or Mann–Whitney U tests for non-normal data) for continuous variables and chi-square tests for categorical variables. Continuous variables are reported as mean ± standard deviation (SD) if approximately normal, and as median (interquartile range, IQR) otherwise; categorical variables are reported as number and percentage.
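The Pearson chi-square test used to compare prevalence across groups can be written with the standard library alone. The chi-square survival function has closed forms for 1 and 2 degrees of freedom. These correspond to a 2 × 2 table (a pairwise comparison) and a 2 × 3 outcome-by-age-group table, respectively. No continuity correction is applied. The tables in the test are hypothetical, not study data.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, df, p). No continuity correction; the p-value uses the
    closed-form chi-square survival function, available here for df = 1 or 2.
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # P(Z^2 > stat) for standard normal Z
    elif df == 2:
        p = math.exp(-stat / 2)             # chi-square(2) is exponential with mean 2
    else:
        raise NotImplementedError("closed-form p-value only for df = 1 or 2 in this sketch")
    return stat, df, p
```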
To mitigate class imbalance in the binary classification task, we applied Adaptive Synthetic (ADASYN) oversampling to the training set after feature selection (Haibo et al., 2008). ADASYN creates synthetic minority-class samples by interpolating between neighbouring minority observations. It generates more of them around minority observations that are harder to learn, that is, those with a larger share of majority-class neighbours, which can improve classifier performance in imbalanced settings.
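A minimal sketch of the ADASYN idea follows. It assumes numeric (already encoded and scaled) features and Euclidean distance. The parameter names `beta` and `k` follow the original ADASYN description. The function itself is a hypothetical stand-alone helper rather than any particular library's implementation. As the text specifies, its output would be appended to the training data only.

```python
import random


def _sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def adasyn(X, y, minority=1, beta=1.0, k=5, seed=0):
    """Return synthetic minority-class rows generated with the ADASYN scheme."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    G = int((n_maj - n_min) * beta)           # total number of synthetic rows to create
    if G <= 0 or n_min < 2:
        return []

    def nearest(i, pool):
        # the k nearest rows to X[i] within `pool`, excluding i itself
        others = sorted((j for j in pool if j != i), key=lambda j: _sq_dist(X[i], X[j]))
        return others[:k]

    # r_i: fraction of majority-class rows among the k nearest neighbours (all classes)
    r = []
    for i in min_idx:
        nbrs = nearest(i, range(len(y)))
        r.append(sum(1 for j in nbrs if y[j] != minority) / len(nbrs))
    total_r = sum(r)
    if total_r == 0:
        return []                              # no minority row borders the majority class

    synthetic = []
    for i, ri in zip(min_idx, r):
        g_i = round(ri / total_r * G)          # harder-to-learn rows receive more samples
        min_nbrs = nearest(i, min_idx)         # neighbours drawn from the minority class only
        for _ in range(g_i):
            z = rng.choice(min_nbrs)
            lam = rng.random()                 # random point on the segment X[i] -> X[z]
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic
```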
Each of the three age-specific datasets was randomly split into training (70%) and test (30%) sets. Five-fold stratified cross-validation was performed within the training set for hyperparameter tuning and to reduce overfitting. Final performance was evaluated on the held-out test set using multiple metrics. The area under the receiver operating characteristic curve (AUC) assessed overall discrimination (0.5 indicating no discrimination and 1.0 perfect classification). Accuracy measured the proportion of correctly classified cases. Sensitivity was the proportion of actual positives (attempters) correctly identified, and specificity the proportion of actual negatives (non-attempters) correctly identified. Precision (positive predictive value, PPV) and negative predictive value (NPV) captured the proportion of positive and negative predictions, respectively, that were correct. The F1 score, defined as the harmonic mean of precision and sensitivity, offered a balanced metric under class imbalance. Beyond discrimination, model calibration was evaluated using calibration curves, and clinical utility was assessed with decision curve analysis (DCA). For the best-performing model in each dataset, we computed SHAP (SHapley Additive exPlanations) values to interpret feature contributions, offering insights into both global importance and individual-level predictions.
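The threshold-based metrics and the AUC defined above can be computed directly from labels and predicted scores. This minimal sketch assumes a binary outcome coded 1 = LSA and 0 = no LSA. It does not guard against empty denominators, such as a model that predicts no positives. The function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn)   # share of actual positives detected
    spec = tn / (tn + fp)   # share of actual negatives correctly ruled out
    ppv = tp / (tp + fp)    # precision: share of positive predictions that are correct
    npv = tn / (tn + fn)    # share of negative predictions that are correct
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),  # harmonic mean of precision and sensitivity
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2);
    equal to the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```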
Sensitivity analyses were conducted to test the robustness of results: (1) reanalysis using complete cases to assess the impact of imputation; (2) 10-fold stratified cross-validation to test resampling sensitivity; (3) a 60:40 development/validation split to examine split-ratio effects; and (4) comparison of feature-importance rankings across similarly performing models to gauge predictor stability.
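The text does not specify how agreement between feature-importance rankings was quantified. One simple option, offered here as an assumption, is to report two quantities. The first is the overlap of the top-k features as a Jaccard index. The second is Spearman's rank correlation between the two full rankings. The feature orderings in the test are made up for illustration.

```python
def top_k_jaccard(rank_a, rank_b, k=10):
    """Jaccard overlap of the top-k features in two importance rankings."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)


def spearman_rho(rank_a, rank_b):
    """Spearman correlation between two orderings of the same features (no ties)."""
    if set(rank_a) != set(rank_b) or len(rank_a) != len(rank_b):
        raise ValueError("rankings must contain the same features")
    n = len(rank_a)
    pos_b = {f: i for i, f in enumerate(rank_b)}
    d2 = sum((i - pos_b[f]) ** 2 for i, f in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```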
All analyses were performed using Python (version 3.7). Statistical significance was defined as two-sided p < 0.05.
Results
Prevalence of LSA and participant characteristics across age groups
As shown in Fig. 2, 1,145 of 25,047 participants reported LSA, including 417, 459, and 269 cases in the 18–24, 25–44, and ≥ 45 age groups, respectively. The overall prevalence of LSA was 4.57%, slightly higher in males (4.69%) than in females (4.45%), and similar between urban (4.49%) and rural (4.66%) residents. Prevalence varied significantly across age groups—peaking at 18–24 years (8.10%), then 25–44 years (4.67%), and lowest at ≥ 45 years (2.67%)—with all pairwise differences significant (all p < 0.001). Table 1 summarizes characteristics by LSA status within each age group across six domains. Age-related heterogeneity was most evident for sociodemographic factors (gender, marital status, education, income, insurance), whereas differences across lifestyle, social environment, mental health, and self-injury/suicide-history domains were broadly consistent across age groups.
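The overall prevalence follows directly from the reported counts. The short check below recomputes it from the age-group case numbers and the analytic sample size given in the text.

```python
# Counts reported in the Results text.
cases_by_age = {"18-24": 417, "25-44": 459, ">=45": 269}
analytic_n = 25_047

total_cases = sum(cases_by_age.values())
overall_pct = round(100 * total_cases / analytic_n, 2)
print(total_cases, overall_pct)  # 1145 4.57
```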

Fig. 2. Age-specific prevalence and subgroup differences in lifetime suicide attempts (LSA) among adults in China.
Table 1. Basic characteristics of participants with lifetime suicide attempts across age groups

Notes: Abbreviations: LSA = lifetime suicide attempts; BMI = body mass index; ADL = activities of daily living; MPA = moderate-intensity physical activity, days/week; VPA = vigorous-intensity physical activity, days/week; DS = depressive symptoms; AS = anxiety symptoms; NLE = negative life events; IPV = intimate partner violence; ACEs = adverse childhood experiences; SI = suicidal ideation; NSSI = non-suicidal self-injury; NSSI-medical = NSSI requiring medical treatment. BMI: underweight < 18.5; normal weight 18.5–24.9; overweight/obesity ≥ 25.0 kg/m². Educational level: low (junior high school or below), medium (senior high school or vocational/technical school), high (college degree or above). Job status: in education or employment (Yes) vs not in education or employment (No). Marital status: married or unmarried (never married, divorced, widowed, or separated).
Predictor screening
We applied the predefined two-step feature selection separately within each age group. Figures S1–S3 show the univariable logistic-regression rankings, LASSO cross-validation error curves, and coefficient shrinkage paths; Table S3 shows the overlap in selected predictors across age groups. Eighteen predictors were retained for ages 18–24, 22 for 25–44, and 20 for ≥45. Several key features overlapped across groups, but age-specific differences were evident, consistent with distinct life-stage risk profiles. This combined approach was intended to improve the robustness of subsequent model development.
Model performance
We evaluated the predictive performance of five ML models on the test set across three age groups (Table 2). Discriminatory power and calibration were assessed using ROC curves (Fig. 3A–C) and calibration plots (Fig. 3D–F). Based on overall performance, SVM was identified as the optimal model in all groups and was further evaluated using DCA (Fig. 3G–I).
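Calibration plots of this kind are usually built by binning predicted probabilities and comparing each bin's mean predicted probability with its observed event rate. Points on the 45° line indicate good calibration. The sketch assumes equal-width bins; this is an assumption, since the binning scheme is not stated in the text.

```python
def calibration_bins(y_true, probs, n_bins=10):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((t, p))
    out = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(t for t, _ in b) / len(b)
            out.append((mean_pred, observed, len(b)))
    return out
```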

Fig. 3. Model performance in predicting lifetime suicide attempts across age groups.
Table 2. Comparison of model performance in predicting lifetime suicide attempts (LSA) on the test set data

Notes: RF = Random Forest; LR = Logistic Regression; SVM = Support Vector Machine; XGBoost = Extreme Gradient Boosting; ROC = Receiver Operating Characteristic curve.
In the 18–24 group, SVM demonstrated the most balanced performance, achieving high AUC (0.88), accuracy (0.805), sensitivity (0.794), specificity (0.815), PPV (0.803), and NPV (0.861). Although XGBoost yielded a slightly higher AUC (0.89), SVM showed more consistent performance across all metrics. Its ROC curve approached the upper-left corner, and its calibration plot showed strong agreement with the ideal reference line. DCA indicated greater net benefit for SVM across a wide range of clinically relevant threshold probabilities compared to “treat all” or “treat none” strategies.
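Decision curve analysis compares a model's net benefit with the “treat all” and “treat none” strategies across threshold probabilities. Under the standard definition, net benefit at threshold p_t is TP/n − (FP/n) × p_t / (1 − p_t). “Treat none” always has a net benefit of 0. The sketch computes the model and “treat all” values at a single threshold, and a full decision curve repeats this over a grid of thresholds. The example data are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of 'intervene if predicted probability >= threshold'."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    odds = threshold / (1 - threshold)  # harm-to-benefit weight implied by the threshold
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening for everyone; 'treat none' is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```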
In the 25–44 group, SVM outperformed all other models, with the highest AUC (0.94), accuracy (0.874), sensitivity (0.868), and NPV (0.870), along with strong specificity (0.880) and PPV (0.871). ROC and calibration curves confirmed its excellent discrimination and calibration. DCA further supported its clinical utility.
In the ≥45 group, both SVM and LR achieved the highest AUC (0.94). However, SVM showed slightly better performance in accuracy (0.858 vs. 0.857), specificity (0.885 vs. 0.879), and PPV (0.875 vs. 0.873), while maintaining comparable sensitivity and NPV. It also showed calibration closer to the 45° line. DCA consistently demonstrated higher net clinical benefit for SVM.
SHAP-based model interpretability analysis
To interpret the best-performing SVM models across age groups, we applied SHAP analysis to quantify each predictor’s contribution to the predicted probability of LSA. For each group, the SHAP bar plot (left) shows mean absolute SHAP values (global importance), and the summary plot (right) illustrates the direction and magnitude of feature effects (Fig. 4a–c; full results in Figures S4–S6). For ease of interpretation, SHAP values > 0 indicate an increase in the model-predicted probability of LSA and values < 0 a decrease; the absolute value reflects contribution size, and point colour encodes the feature value (red = higher, blue = lower).
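SHAP values are Shapley values of a model's prediction. For a handful of features they can be computed exactly by enumerating every coalition, which makes the definition concrete. This brute-force sketch fills absent features with a fixed baseline vector, which is an assumption about the value function. Its cost grows as 2^p in the number of features. Model-agnostic SHAP explainers such as KernelSHAP therefore approximate these values from sampled coalitions. The toy models in the test are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition take their baseline value; the values sum to
    predict(x) - predict(baseline) (the additivity property).
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return predict(z)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        total = 0.0
        for size in range(p):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```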

Fig. 4a. Top 10 features identified by SHAP using the best-performing model (SVM) in the 18–24-year group.

Fig. 4b. Top 10 features identified by SHAP using the best-performing model (SVM) in the 25–44-year group.

Fig. 4c. Top 10 features identified by SHAP using the best-performing model (SVM) in the ≥45-year group.
In the 18–24 group, the top predictors were SI, ACEs, suicide disclosure, sleep quality, and AS, followed by neighbourhood relations, self-efficacy, and self-rated health. Summary plots indicated that higher ACEs and SI were associated with a higher predicted probability of LSA, whereas better sleep quality and stronger neighbourhood relations were associated with a lower predicted probability.
In the 25–44 group, leading contributors included marital status, SI, ACEs, suicide disclosure, and living status. IPV, perceived stress, and self-efficacy also featured prominently. Being unmarried, exposure to IPV, and higher perceived stress were associated with a higher predicted probability of LSA.
In the ≥45 group, SI, ACEs, and suicide disclosure again ranked highest, followed by sleep quality, ADL limitation, and DS. Functional limitations and psychological distress (e.g., perceived stress, AS, DS) contributed substantially to predicted risk in this older group.
Across age groups, SI, ACEs, and suicide disclosure consistently emerged as core predictors. In contrast, other factors varied by life stage: sleep and AS weighed more in younger adults; relationship/structural factors (e.g., marital status, living status) dominated in midlife; and functional status and mental-health measures (e.g., ADL limitation, DS) were most salient in older adults.
Sensitivity analysis
Within each age group, the SHAP-identified top predictors of LSA remained largely consistent under all sensitivity checks, including exclusion of imputed data (Table S4; Figures S7–S10), application of 10-fold stratified cross-validation (Table S5; Figures S11–S14), use of a 60:40 train-test split (Table S6; Figures S15–S18), and comparison of feature rankings between similarly performing models (LR vs. SVM in the ≥45 group; Figure S19), further supporting the robustness of the findings.
Discussion
To our knowledge, this is the first study to apply ML approaches to examine age-stratified prevalence and predictors of LSA among Chinese adults. Using a nationally representative cross-sectional survey, we identified three principal findings: First, the overall prevalence of LSA was 4.57%, indicating a relatively high burden with a pronounced age gradient—prevalence declined with increasing age, and young adults represented the highest-risk group. Second, SI, ACEs, and suicide disclosure consistently emerged as the most robust predictors across all age groups. Third, risk profiles varied by age: psychological distress and sleep-related problems predominated in young adults; marital and living status, IPV, and perceived stress were more salient in mid-life; and in older adults, poor sleep, functional limitations, and DS were the primary contributors. Collectively, these findings provide empirical evidence to guide the development of age-tailored suicide prevention strategies in China.
SA is among the strongest predictors of death by suicide (Bostwick et al., 2016). In this nationally representative study, the prevalence of LSA among Chinese adults was 4.57%, with a marked age gradient: 8.10% in young adults, 4.67% in mid-life, and 2.67% in older adults. National epidemiological data on SA remain limited; most prior studies have focused on local settings or high-risk clinical groups, yielding widely varying estimates. For example, a 2001–2002 survey of community-dwelling adults in metropolitan China reported an LSA prevalence of 1.0% (Lee et al., 2007). By contrast, a meta-analysis of 29 studies in Chinese college students estimated a pooled LSA prevalence of 2.8% (range 0.4%–10.5%) (Yang et al., 2015). Internationally, nationally representative World Mental Health surveys across 17 countries reported adult LSA prevalence ranging from 0.5% in Italy to 5.0% in the United States (Nock et al., 2008a). Age-specific patterns show broad cross-national regularities: most countries report higher LSA in younger adults, whereas Japan is a notable exception with relatively elevated risk in midlife. Our study indicates that the prevalence of LSA among Chinese adults is relatively high.
This may reflect both substantive and methodological factors. Substantive factors include rapid social change with rising depression and stress exposure, and limited access to, and continuity of, mental-health care, alongside stigma that suppresses help-seeking and disclosure (Wu et al., 2021, 2023, 2024b). Methodologically, self-reported recall measures typically yield higher estimates than diagnostic or registry data, and survey timing also differs across studies.
Notably, young adults show the highest LSA risk, plausibly reflecting heavier psychosocial stressors (academic and career pressures, identity formation), greater exposure to self-injury content online, and fewer coping resources (Wu et al., 2021). This pattern should not be equated with the bimodal distribution of suicide mortality (youth and late-life peaks) (Wu et al., 2024b). Mortality reflects both attempt incidence and case-fatality. Case-fatality increases with age (more lethal methods, greater frailty/comorbidity, lower likelihood of rescue), producing a late-life mortality peak despite fewer attempts. Underreporting of lifetime attempts and survivorship bias may also depress observed LSA in older adults. These findings provide national epidemiological evidence for China and underscore the need for improved surveillance and age-targeted prevention.
Guided by the age-specific pattern of LSA, we applied multiple ML algorithms to identify key predictors across the lifespan. SI, ACEs, and suicide disclosure consistently emerged as the most robust predictors. SI was the strongest predictor, reaffirming its pivotal role in the suicidal process (Klonsky et al., 2016). According to the Interpersonal Theory of Suicide, SI arises from perceived burdensomeness, thwarted belongingness, and psychological pain (Van Orden et al., 2010). The Three-Step Theory conceptualizes SI as a necessary, though not sufficient, precursor triggered by intolerable distress and hopelessness (Klonsky et al., 2016). Although not all individuals with SI progress to attempts, SI substantially elevates risk and remains a necessary antecedent (Klonsky et al., 2016). These findings support integrating brief SI screening, together with standardized response protocols, into frontline settings (e.g., primary care, schools, crisis services). They also support including SI indicators in public health surveillance to inform timely intervention and resource allocation.
ACEs represent cumulative early-life adversity with lasting impacts on psychological development (Norman et al., 2012). Extensive evidence links childhood abuse, neglect, and interpersonal violence to elevated risk of suicidal behaviour in adulthood (Angelakis et al., 2019; Norman et al., 2012). One possible mechanism is that ACEs foster negative cognitive patterns, such as powerlessness, defeat, and entrapment, that impair emotional regulation under stress (Angelakis et al., 2019). ACEs are also linked to post-traumatic stress disorder, in which hopelessness and psychological disengagement may trigger SI and accelerate its progression to SA (Angelakis et al., 2019). Their consistent predictive value underscores the need for upstream, trauma-informed policies that extend beyond clinical screening, such as parenting support programs, school-based violence prevention, and cross-sector data integration to identify and intervene in high-risk environments before patterns of harm become established.
Suicide disclosure similarly predicted LSA across age groups. Disclosure often reflects a critical threshold of psychological distress and a strong need for help. From a cognitive-behavioural perspective, disclosure signals the externalization of suicidality, a shift from internal struggle to overt expression, and thus heightened risk (Rudd, 2000). Concealment may reflect fear of stigma, whereas disclosure that is not met with adequate support can intensify hopelessness and isolation, increasing the likelihood of escalation. These findings call for dedicated post-disclosure protocols across sectors, including hotline escalation pathways, school-based rapid response teams, and mandated training for frontline staff to recognize, triage, and support individuals who disclose suicidal thoughts or intent, thereby bridging the gap between expression and action.
In addition to shared predictors, our SHAP analysis revealed distinct age-specific patterns of LSA risk. Among young adults, the pattern was driven primarily by emotional and relational factors. Poor sleep, high anxiety, weak neighbourhood ties, and low self-efficacy emerged as salient predictors, reflecting the developmental challenges of identity formation and autonomy as well as early academic or career stress. Sleep disturbances and anxiety may indicate emotional dysregulation and vulnerability to affective instability, increasing the risk of impulsive suicidal behaviours (Kearns et al., 2020). Weak neighbourhood ties and low self-efficacy may further diminish perceived agency and access to support, reinforcing powerlessness. These findings underscore the value of youth-centred prevention strategies that emphasize emotion regulation, connectedness, and empowerment.
In mid-aged adults, the pattern was driven largely by role-related stress. Being unmarried, living alone, perceived stress, and IPV ranked highest. This life stage is marked by heavier work and family responsibilities alongside shrinking informal support. Emotional isolation arising from marital disruption or living alone may heighten perceived burdensomeness and thwarted belongingness, key drivers of SI (Van Orden et al., 2010). Chronic stress and IPV may also dysregulate the stress-response system, leading to emotional exhaustion and impaired coping (Vidal et al., 2024). These findings highlight the need for integrated psychosocial strategies, including family therapy, IPV screening, and workplace stress reduction.
In older adults, the pattern was driven by declining health. DS, sleep problems, ADL limitations, and NSSI were the top predictors. This life stage is often accompanied by functional decline, bereavement, and shrinking social roles due to retirement or caregiving burdens. ADL impairments undermine autonomy and increase dependency, which in collectivist contexts may be experienced as shame or as a burden on others. Coexisting depression and sleep disturbances deepen emotional pain and reinforce cognitive distortions, while late-life NSSI may reflect chronic psychological distress or entrenched maladaptive coping. Prevention in later life thus requires a multidisciplinary approach integrating chronic disease management, ADL rehabilitation, and geriatric mental health services tailored to loss and meaning reconstruction. Together, these findings highlight the need for age-tailored LSA screening and prevention strategies that are developmentally sensitive to the psychosocial vulnerabilities of each life stage.
Finally, our study demonstrates the potential of ML to improve identification of individuals at risk for LSA. Given the low base rate and complexity of suicidal behaviour, accurate prediction of SA remains a major challenge (Su et al., 2023). Traditional risk assessment tools have shown limited value, with meta-analyses reporting low sensitivity and poor positive predictive values across populations (Franklin et al., 2017; Su et al., 2023). In contrast, ML can capture complex, non-linear interactions among multiple factors (Su et al., 2023; Wu et al., 2023). In our analysis, SVM consistently outperformed other models across age groups, yielding balanced predictive performance, which is critical for reducing both false negatives and false positives in suicide risk screening. These findings support the feasibility and added value of ML-based prediction. However, concerns remain regarding potential bias and inflated performance estimates (Jacobucci et al., 2021); rigorous methodological standards and cautious interpretation are therefore essential to ensure clinical applicability.
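Why balanced performance matters, and why positive predictive value stays low for rare outcomes, can be illustrated with standard confusion-matrix metrics. The sketch below is illustrative only: the class and function names are ours, and the counts and the 5% prevalence are hypothetical, not taken from this study. It shows that even with sensitivity and specificity both at 0.80, most flagged individuals would be false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary screen against observed LSA (hypothetical values)."""
    tp: int  # attempters correctly flagged
    fn: int  # attempters missed (false negatives)
    fp: int  # non-attempters flagged (false positives)
    tn: int  # non-attempters correctly cleared


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard confusion-matrix metrics for a screening classifier."""
    sensitivity = c.tp / (c.tp + c.fn)  # share of attempters detected
    specificity = c.tn / (c.tn + c.fp)  # share of non-attempters cleared
    ppv = c.tp / (c.tp + c.fp)          # share of flagged people who are attempters
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        # Balanced accuracy weights both error types equally, regardless of prevalence.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: PPV implied by fixed sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical cohort of 10,000 with 5% prevalence (500 attempters),
# screened with sensitivity 0.80 and specificity 0.80.
counts = ConfusionCounts(tp=400, fn=100, fp=1900, tn=7600)
metrics = screening_metrics(counts)
print({name: round(value, 3) for name, value in metrics.items()})
print(round(ppv_from_rates(0.80, 0.80, 0.05), 3))
```

In this hypothetical example the PPV is about 0.17, so roughly five in six positive screens are false alarms; the prevalence-based function gives the same figure, making explicit that PPV depends on the base rate while balanced accuracy does not.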
Strengths and limitations
This study has several strengths. First, to our knowledge, it is the first nationwide investigation in China to assess LSA across age groups, providing valuable epidemiological data for future suicide-related research. Second, the inclusion of 37 variables across six domains enabled a comprehensive, multidimensional analysis of LSA predictors. Third, the application and comparison of multiple ML models provided a more robust assessment of predictive performance and model reliability.
However, several limitations should be acknowledged. First, the cross-sectional design precludes causal and temporal inferences. Because predictors reflect current circumstances while LSA covers the whole lifetime, some predictors may be consequences rather than causes of earlier attempts, introducing possible reverse causality. Prospective studies are needed to clarify temporal sequence and strengthen causal interpretation. Second, although ML captures nonlinearities and high-order interactions without prior specification, it remains inherently data-driven. SHAP interpretations are model-specific and descriptive rather than causal, and feature importance may vary with feature selection, hyperparameter tuning, and class-imbalance handling. Future work should examine the robustness of SHAP results across different modelling pipelines. Third, LSA and NSSI were measured using single-item, retrospective self-reports, which are prone to recall bias and misclassification (Su et al., 2023). Stigma, especially among older and rural populations, may further contribute to underreporting and underestimation of prevalence (Wu et al., 2024a, 2024b).
Fourth, the absence of psychiatric diagnoses may have reduced model performance and limited insights into mental health–related risk factors, despite their known associations with suicidal behaviour (Mullins et al., 2022). Finally, model development and validation were conducted within a single dataset, which limits generalizability and highlights the need for external validation in future research.
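The second limitation above calls for checking whether SHAP-based importance rankings are stable across modelling pipelines. One simple way to do this, sketched here as an assumption rather than as an analysis performed in this study, is to compute a global importance score per feature (for example, the mean absolute SHAP value) under each pipeline and compare the resulting rankings with Spearman's rank correlation and top-k overlap. The function names, generic feature labels, and importance values are all hypothetical.

```python
def importance_ranks(scores: list[float]) -> list[int]:
    """Rank features by importance (1 = most important); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_no_ties(x: list[float], y: list[float]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score vectors with at least 2 features")
    rx, ry = importance_ranks(x), importance_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def top_k_overlap(names: list[str], x: list[float], y: list[float], k: int) -> float:
    """Share of the top-k features (by importance) on which two pipelines agree."""
    def top(scores: list[float]) -> set[str]:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return {names[i] for i in order[:k]}
    return len(top(x) & top(y)) / k


# Hypothetical mean |SHAP| values for five generic features under two pipelines
# (e.g., different class-imbalance handling); not results from this study.
features = ["A", "B", "C", "D", "E"]
pipeline_1 = [0.90, 0.40, 0.35, 0.10, 0.08]
pipeline_2 = [0.85, 0.30, 0.42, 0.07, 0.12]
print(spearman_no_ties(pipeline_1, pipeline_2))  # 0.8
print(top_k_overlap(features, pipeline_1, pipeline_2, k=3))  # 1.0
```

Here the two hypothetical pipelines swap the order of features B and C and of D and E, giving a rank correlation of 0.8, yet they agree completely on the top three features. Reporting both measures separates minor reordering from genuine instability in which predictors matter most.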
Conclusion
The prevalence of LSA among Chinese adults remains relatively high, with a clear age gradient—peaking in young adults and declining with age. Risk profiles revealed both shared and age-specific predictors, reflecting distinct life-stage vulnerabilities. These findings highlight the need for age-tailored suicide prevention strategies in China.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796025100231.
Availability of data and materials
The data that support the findings of this study are available from the PBICR team (Yibo Wu) upon reasonable request.
Acknowledgements
We gratefully acknowledge the PBICR team for their significant efforts in conducting the survey.
Author contributions
Yu Wu: conceptualization, methodology, formal analysis, writing – original draft, and writing – review & editing; Yihao Zhao, Chen Chen, Panliang Zhong: methodology, visualization and validation, and writing – review & editing; Yibo Wu: project administration, validation, and writing – review & editing; Xiaoying Zheng: conceptualization, supervision, project administration, writing – original draft, and writing – review & editing. Correspondence may also be addressed to Yibo Wu (email: bjmuwuyibo@outlook.com).
Financial support
This work was supported by the National Key Research and Development Program of China (2022YFC3600800) and the Population and Aging Health Science Program (WH10022023035).
Competing interests
There are no actual or potential conflicts of interest, including any financial, personal, or other relationships with other people or organizations.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2000. Ethical approval was granted by the Key Laboratory of Health Economics and Policy Research of the National Health Commission (NHC-HEPR202401) and the Ethics Committee of Shanghai Jiao Tong University (H20240237I). The study was registered in the Chinese Clinical Trial Registry (ChiCTR). All participants provided informed consent.