Age-specific prevalence and predictors of lifetime suicide attempts using machine learning in Chinese adults: a nationwide multi-centre survey

Yu Wu; Yihao Zhao; Panliang Zhong; Chen Chen; Yibo Wu; Xiaoying Zheng

doi:10.1017/S2045796025100231

Published online by Cambridge University Press: 20 October 2025

Affiliations

Yu Wu: Department of Population Health and Aging Science, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Yihao Zhao: Department of Population Health and Aging Science, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Panliang Zhong: Department of Population Health and Aging Science, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Chen Chen: Department of Population Health and Aging Science, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Yibo Wu: School of Public Health, Peking University, Beijing, China
Xiaoying Zheng*: China Center for Environmental and Energy Economics (C2E3), Peking University, Chengze Yuan, Beijing, China
*Corresponding author: Xiaoying Zheng; Email: zhengxiaoying@sph.pumc.edu.cn

Abstract

Aims

The epidemiology and age-specific patterns of lifetime suicide attempts (LSA) in China remain unclear. We aimed to examine age-specific prevalence and predictors of LSA among Chinese adults using machine learning (ML).

Methods

We analyzed 25,047 adults in the 2024 Psychology and Behavior Investigation of Chinese Residents (PBICR-2024), stratified into three age groups (18–24, 25–44, ≥ 45 years). Thirty-seven candidate predictors across six domains—sociodemographic, physical health, mental health, lifestyle, social environment, and self-injury/suicide history—were assessed. Five ML models—random forest, logistic regression, support vector machine (SVM), Extreme Gradient Boosting (XGBoost), and Naive Bayes—were compared. SHapley Additive exPlanations (SHAP) were used to quantify feature importance.

Results

The overall prevalence of LSA was 4.57% (1,145/25,047), with significant age differences: 8.10% in young adults (18–24), 4.67% in adults aged 25–44, and 2.67% in older adults (≥45). SVM achieved the best test-set performance across all ages [area under the curve (AUC) 0.88–0.94, sensitivity 0.79–0.87, specificity 0.81–0.88], showing superior calibration and net clinical benefit. SHAP analysis identified both shared and age-specific predictors. Suicidal ideation, adverse childhood experiences, and suicide disclosure were consistent top predictors across all ages. Sleep disturbances and anxiety symptoms stood out in young adults; marital status, living alone, and perceived stress in mid-life; and functional limitations, poor sleep, and depressive symptoms in older adults.

Conclusions

LSA prevalence in Chinese adults is relatively high, with a clear age gradient peaking in young adulthood. Risk profiles revealed both shared and age-specific predictors, reflecting distinct life-stage vulnerabilities. These findings support age-tailored suicide prevention strategies in China.

Keywords

age differences; associated factors; China; machine learning; prevalence; suicide attempt; suicidal behavior

Information

Type: Original Article
Information: Epidemiology and Psychiatric Sciences, Volume 34, 2025, e52

DOI: https://doi.org/10.1017/S2045796025100231
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright: © The Author(s), 2025. Published by Cambridge University Press.

Introduction

Suicide remains a major global public health challenge, claiming over 700,000 lives annually (WHO, 2021). In China, approximately 116,000 suicide deaths were reported in 2019, accounting for about 17% of the global total (WHO, 2021). The age-standardized rate was 6.7 per 100,000, lower than the global average of 9.0 per 100,000 (WHO, 2021). Although China has seen a substantial decline in suicide rates over the past three decades (Wu et al., 2024b), the country’s large population keeps the absolute burden high. Suicide attempt (SA) is among the strongest predictors of suicide (Bostwick et al., 2016). An estimated 8.5% to 13% of attempts are fatal, with the first-attempt fatality rate around 3% (Bostwick et al., 2016; Conner et al., 2019; Miller et al., 2012). Nearly half of suicide deaths occur following a prior attempt rather than on the first attempt (Anestis, 2016; Bostwick et al., 2016). Preventing nonfatal SA provides a critical window for early intervention, especially in China.

Age is a well-established factor influencing suicidal behaviours (Yazdi-Ravandi et al., 2023). Although no uniform age pattern has been identified, substantial epidemiological differences across age groups have been consistently reported (Franklin et al., 2017; Nock et al., 2008b). Globally, adolescents and young adults are widely recognized as high-risk groups for both SA and suicide deaths (Franklin et al., 2017; Nock et al., 2008b). In China, a nationwide analysis of cause-of-death surveillance data revealed a bimodal age distribution in suicide mortality, with peaks among those aged 15–24 and older adults (Wu et al., 2024b). However, nationally representative data on SA remain scarce, possibly due to suicide-related stigma, stringent ethical oversight, and the challenges of diagnostic complexity, participant recruitment, disclosure, and data collection in population-based surveys on suicidal behaviours (Lee et al., 2007; Wu et al., 2024b).
Most available evidence comes from small, high-risk samples, such as individuals with mental disorders (Chen et al., 2022; Song et al., 2023), sexual and gender minorities (Chen et al., 2019), children and adolescents (Su et al., 2023; Wu et al., 2021), and rural or metropolitan residents (Lee et al., 2007; Lyu et al., 2018), and shows wide variability across subgroups. As a result, the overall prevalence and age-specific patterns of SA in the general population remain unclear. Given China’s age-patterned suicide mortality across life stages, we hypothesize that SA similarly displays distinct age-specific profiles and predictors. Clarifying these patterns is critical for informing age-tailored suicide prevention strategies.

SA arises from multiple factors that often interact in complex, nonlinear ways (Franklin et al., 2017). Most prediction studies have relied on linear models, yielding modest accuracy (Wu et al., 2023). Although traditional approaches such as generalized additive models can model certain nonlinearities, they require pre-specifying interactions and are less suited to high-dimensional, mixed-type data. A meta-analysis of 365 studies found that models based solely on conventional risk factors performed only slightly better than chance, highlighting the need for machine-learning (ML) risk algorithms (Franklin et al., 2017). ML can flexibly capture nonlinearities and higher order interactions, handle heterogeneous predictors and class imbalance, and do so without strict parametric assumptions (García de la Garza et al., 2021). Given the likely age-related heterogeneity of SA in the general population, identifying age-specific predictors in non-clinical samples is essential (García de la Garza et al., 2021; Gordon et al., 2020). To date, no study in China has used ML for this purpose, leaving a key evidence gap.

In this nationally representative, multicentre survey, we aimed to (1) estimate the age-specific prevalence of lifetime suicide attempts (LSA) among Chinese adults and (2) identify key age-specific predictors using five ML algorithms. These findings may help inform tailored suicide prevention strategies across the life course.

Methods

Data source and study population

Data came from the Psychology and Behavior Investigation of Chinese Residents (PBICR), a nationally representative, multicentre cross-sectional survey led by Peking University from June to September 2024. Using stratified and quota sampling, the survey randomly selected 150 cities, 202 districts/counties, and 390 townships across 31 provincial-level regions (including Hong Kong and Macao), covering 800 communities or villages. Trained interviewers recruited participants online and onsite at survey locations, verified eligibility, and administered standardized electronic questionnaires in one-to-one, face-to-face interviews. Eligible participants were adults (≥18 years) who were Chinese nationals permanently residing in China and able to clearly understand and complete the questionnaire. We excluded individuals with diagnosed psychiatric or cognitive disorders, those concurrently enrolled in similar surveys, those who declined or did not sign informed consent, and responses with completion times < 5 min. For participants with intact decisional capacity but insufficient motor ability, interviewers conducted one-to-one interviews and recorded responses on their behalf. All procedures complied with the Declaration of Helsinki and received institutional ethics approval (see Ethics Statement).

Of 38,793 invited residents, 38,424 responded (response rate: 99.05%). We excluded 2,563 invalid questionnaires due to age < 18, non-Chinese nationality, no consent, response time < 5 min, or missing age and LSA, leaving 35,861 fully eligible participants. To improve national representativeness, post-stratification weights based on sex, age, and urban–rural distribution as reported in the China Statistical Yearbook 2023 (National Bureau of Statistics of China, 2023) were applied, yielding a final analytic sample of 25,047. Missing covariate values (n = 136) were imputed using MissForest, a non-parametric algorithm based on random forests suitable for both continuous and categorical data (Stekhoven and Buhlmann, 2012). Based on prior suicide-related research, mortality surveillance practices, and developmental distinctions across life stages (Miller et al., 2004; Wu et al., 2024b), age was categorized into three groups (18–24, 25–44, and ≥45 years), reflecting distinct patterns of psychosocial development, health risk exposures, and suicide-related vulnerabilities. The sample selection flowchart is shown in Fig. 1.

Fig. 1. Flow diagram of sample selection.

Outcome assessment

LSA was defined as any self-inflicted, potentially injurious act at any point in life, carried out with at least some intent to die, regardless of the outcome (Nock et al., 2008b). Consistent with prior suicide-related research (Lee et al., 2007; Oquendo et al., 2024), LSA was assessed using a dichotomous self-report item: “Have you ever attempted suicide?” Participants who answered “yes” were classified as lifetime suicide attempters, and those who answered “no” as non-attempters.

Predictors

Based on previous studies (Su et al., 2023; Wu et al., 2021), we included 37 candidate predictors across six domains (details in Table S1):

(1) Sociodemographic: gender (male/female), marital status (married vs. unmarried—never married, divorced, widowed, or separated), residence (urban/rural), ethnicity (Han/ethnic minority), educational level (low: junior high school or below; medium: senior high school or vocational/technical school; high: college degree or above), job status (in education or employment vs. not in education or employment), economic status (monthly household income: low < 3000 RMB; middle 3000–5999 RMB; high ≥ 6000 RMB), and medical insurance (yes/no).
(2) Physical health: body mass index (BMI, underweight < 18.5, normal 18.5–24.9, overweight/obese ≥ 25); chronic disease (yes/no); activities of daily living (ADL) limitation (yes/no), defined as difficulty with basic self-care tasks (walking across a room, bathing/showering, or dressing); and self-rated health (0–100, higher scores represent better status).
(3) Lifestyle: smoking (yes/no), drinking (yes/no), living alone (yes/no), sleep quality (five-item sleep health score: chronotype, duration, insomnia, snoring, daytime sleepiness; range 0–5; higher scores indicate better sleep) (Fan et al., 2020), social media addiction (SMA, assessed with the Bergen Social Media Addiction Scale; total score 6–30; scores ≥ 19 indicate at-risk status; Cronbach’s α = 0.926 overall) (Leung et al., 2020), and moderate- and vigorous-intensity physical activity (MPA, VPA) (measured as the number of self-reported days per week).
(4) Social environment: social support (three-item Perceived Social Support Scale; total score 0–18, higher scores indicate stronger perceived support; Cronbach’s α = 0.904 overall) (Wu et al., 2025); self-rated social status (a self-assessed item, 1–7; higher scores indicate higher perceived status); neighbourhood relations (a self-assessed item, 1–7; higher scores indicate better relations); and social participation (assessed by the five-item Social Connection Scale; total score 0–5; higher scores indicate greater engagement in shared activities and stronger social connectedness; Cronbach’s α = 0.805 overall) (Foster et al., 2023).
(5) Mental health: depressive symptoms (DS, assessed using Patient Health Questionnaire-9; total score 0–27, scores ≥ 10 indicating having DS; Cronbach’s α = 0.940 overall) (Levis et al., 2019); anxiety symptoms (AS, assessed using Generalized Anxiety Disorder-3 scale; total score 0–9, scores ≥ 3 indicating AS; Cronbach’s α = 0.901 overall) (Wang et al., 2024); negative life events (NLE, any major stressor in the past year; yes/no); loneliness (a self-rated item, 1–7; higher scores indicate greater loneliness); perceived stress (assessed by the four-item Perceived Stress Scale; total score 0–16; higher scores indicate higher perceived stress; Cronbach’s α = 0.934 overall) (Huang et al., 2020); burnout (measured with the seven-item Copenhagen Burnout Inventory; total score 0–28; higher scores indicate greater exhaustion; Cronbach’s α = 0.851 overall) (Borritz et al., 2005); self-efficacy (assessed with the three-item New Generalized Self-Efficacy Scale; total score 0–12; higher scores indicate greater self-efficacy; Cronbach’s α = 0.929 overall) (Feng and Chen, 2012); intimate partner violence (IPV, measured with the five-item scale covering physical, emotional, and controlling behaviours; total score 0–20; higher scores indicate greater severity; Cronbach’s α = 0.921 overall) (Yount et al., 2022); adverse childhood experiences (ACEs, cumulative score from 30 binary items covering household dysfunction, abuse, neglect, bereavement, and exposure to violence before age 18; total score 0–30; higher scores indicate greater adversity; Cronbach’s α = 0.901 overall) (Lin et al., 2021).
(6) Self-injury and suicide-related history (all variables refer to lifetime history unless otherwise specified) (Nock et al., 2008b): Non-suicidal self-injury (NSSI): deliberate self-inflicted damage to the body’s surface without suicidal intent (yes/no); NSSI-medical: history of NSSI that required medical treatment (yes/no); Suicidal ideation (SI): thoughts about engaging in behaviour intended to end one’s life (yes/no); Suicide plan (SP): formulation of a specific method, means, timing, or place for a suicide attempt (yes/no); Suicide disclosure: disclosure of suicide plans to others (yes/no).
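Several predictors in the list above are derived by applying a stated cut-off to a scale total. The sketch below shows how such derivations might look in code. The cut-offs come from the list; the function and key names are hypothetical, and treating BMI values between 24.9 and 25.0 as normal is an assumption made only to close the gap between the published bands.

```python
# Cut-offs are those stated in the predictor list; all names are hypothetical.

def bmi_category(bmi):
    """Classify BMI (kg/m^2) into the three bands used for the BMI predictor."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:  # assumption: values in (24.9, 25.0) count as normal
        return "normal"
    return "overweight/obese"


def binary_flags(phq9_total, anxiety_total, bsmas_total):
    """Apply the stated screening cut-offs to three scale totals."""
    if not 0 <= phq9_total <= 27:
        raise ValueError("PHQ-9 total must lie in 0-27")
    if not 0 <= anxiety_total <= 9:
        raise ValueError("anxiety scale total must lie in 0-9")
    if not 6 <= bsmas_total <= 30:
        raise ValueError("BSMAS total must lie in 6-30")
    return {
        "DS": phq9_total >= 10,    # depressive symptoms present
        "AS": anxiety_total >= 3,  # anxiety symptoms present
        "SMA": bsmas_total >= 19,  # at-risk social media use
    }


print(bmi_category(22.0))
print(binary_flags(phq9_total=12, anxiety_total=2, bsmas_total=19))
```

Range checks are included because an out-of-range total usually signals a scoring or data-entry problem rather than a genuine extreme value.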

Feature screening

Feature selection proceeded in two steps. First, univariable logistic regressions for LSA were run to prefilter predictors (retain if p < 0.05). Second, least absolute shrinkage and selection operator (LASSO) logistic regression with five-fold cross-validation was used to tune λ and apply an L1 penalty that shrinks irrelevant coefficients to zero, addressing high dimensionality and multicollinearity (Lee et al., 2018). Only predictors retained by both steps were carried forward. This intersection rule was chosen to enhance model stability, interpretability, and out-of-sample generalizability.
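The intersection rule can be illustrated with a short sketch. It assumes the univariable p-values and the LASSO coefficients have already been estimated elsewhere; the function name, the tolerance used to treat a coefficient as zero, and the example values are all hypothetical and are not taken from the study.

```python
def screen_features(univariable_p, lasso_coef, alpha=0.05, tol=1e-8):
    """Return features kept by BOTH the p-value prefilter and the LASSO fit."""
    # Step 1: keep predictors whose univariable p-value is below alpha.
    step1 = {name for name, p in univariable_p.items() if p < alpha}
    # Step 2: keep predictors whose LASSO coefficient was not shrunk to zero.
    step2 = {name for name, b in lasso_coef.items() if abs(b) > tol}
    return sorted(step1 & step2)


# Hypothetical inputs for illustration only
p_values = {"SI": 0.0001, "ACEs": 0.003, "smoking": 0.20, "BMI": 0.04}
coefs = {"SI": 1.10, "ACEs": 0.35, "smoking": 0.02, "BMI": 0.0}
print(screen_features(p_values, coefs))  # ['ACEs', 'SI']
```

In this toy input, smoking survives the LASSO but fails the prefilter, and BMI passes the prefilter but is shrunk to zero, so neither is carried forward.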

Machine learning algorithms

To identify the most accurate classifier, we compared five supervised algorithms on each age-specific dataset. Within each age group, all models used the same feature set retained by the feature screening step. Hyperparameters were tuned by grid search with stratified five-fold cross-validation in the training set, and performance was evaluated on a held-out 30% test set (see Table S2).

(1) Random forest (RF) (Breiman, 2001): RF is an ensemble of decision trees grown on bootstrap samples with random feature subsampling. Predictions are aggregated by majority vote. This setup captures nonlinearities and interactions, is relatively robust to noise and overfitting, and provides internal variable-importance measures.
(2) Logistic regression (LR) (Pan and Xu, 2022): LR models the log-odds of a binary outcome as a linear function of predictors. Coefficients are directly interpretable and estimation is efficient, suiting problems with modest feature sets or an emphasis on describing predictor–outcome associations.
(3) Support Vector Machine (SVM) (Rezvani and Wu, 2023): SVM is a margin-based classifier that finds a separating hyperplane with maximal margin. Kernel functions allow nonlinear decision boundaries. It performs well in high-dimensional or sparse settings, particularly when the number of features exceeds the number of observations or when class boundaries are approximately linearly separable.
(4) Extreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016): XGBoost is a gradient-boosted ensemble of decision trees fitted stage-wise to minimize a differentiable loss, with shrinkage and L1/L2 regularization to curb overfitting. It delivers strong accuracy on structured, high-dimensional data and supports efficient parallel training, making it well suited to complex health datasets.
(5) Naive Bayes (Domingos and Pazzani, 1997): Naive Bayes is a probabilistic classifier grounded in Bayes’ theorem that assumes conditional independence among features. Despite its simplicity, it performs well in many practical settings, particularly for high-dimensional, text-based, or weakly correlated data, and when computational efficiency is essential.
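The resampling scheme described at the start of this section (a stratified 30% hold-out, then stratified five-fold cross-validation inside the training portion) can be sketched in plain Python as follows. This is an illustrative outline under simple assumptions, not the authors' pipeline: the function names and the toy labels are hypothetical, and a real analysis would typically rely on a library implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=5, seed=0):
    """Deal each class's indices round-robin into k validation folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


# Toy labels: 100 records, 10% positive (hypothetical, not study data)
y = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(y)
folds = stratified_folds(train_idx, y)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Stratifying both the hold-out split and the folds matters for a rare outcome such as LSA, because an unstratified fold could otherwise contain very few positive cases.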

Statistical analysis

We estimated and compared the prevalence of LSA across age groups using chi-square tests. Within each age group, we compared participant characteristics between those with and without LSA using independent-samples t tests (or Mann–Whitney U tests for non-normal data) for continuous variables and chi-square tests for categorical variables. Continuous variables are reported as mean ± standard deviation (SD) if approximately normal, and as median (interquartile range, IQR) otherwise; categorical variables are reported as number and percentage.
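As a worked illustration of the prevalence comparison, the sketch below computes a Pearson chi-square statistic for a table of three age groups by attempt status. With three groups the test has 2 degrees of freedom, for which the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed. The function name and the counts are hypothetical; the group denominators of the study are not used here.

```python
import math


def chi_square_3x2(cases, totals):
    """Pearson chi-square for a 3-group x 2-outcome table; returns (stat, p)."""
    if len(cases) != 3 or len(totals) != 3:
        raise ValueError("expects exactly three groups (df = 2)")
    n = sum(totals)
    p_overall = sum(cases) / n  # pooled prevalence under the null hypothesis
    stat = 0.0
    for c, t in zip(cases, totals):
        exp_yes = t * p_overall
        exp_no = t * (1 - p_overall)
        stat += (c - exp_yes) ** 2 / exp_yes + ((t - c) - exp_no) ** 2 / exp_no
    # For df = 2 the upper-tail probability is exactly exp(-stat / 2).
    return stat, math.exp(-stat / 2)


# Hypothetical counts for illustration only
stat, p_value = chi_square_3x2(cases=[40, 25, 15], totals=[500, 500, 500])
print(round(stat, 2), p_value < 0.05)
```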

To mitigate class imbalance in the binary classification task, we applied Adaptive Synthetic (ADASYN) oversampling to the training set after feature selection (Haibo et al., 2008). This method generates synthetic samples adaptively based on local feature density, improving classifier performance in imbalanced settings.
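To make the idea of density-adaptive oversampling concrete, here is a simplified, pure-Python sketch in the spirit of ADASYN. It assumes numeric features, labels coded 1 (minority) and 0 (majority), and Euclidean distance; the function names, parameters, and toy data are hypothetical, and this is not the implementation used in the study.

```python
import random


def _nearest(i, X, candidates, k):
    """Indices of the k candidates closest to X[i] (squared Euclidean distance)."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
    return others[:k]


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class rows, weighted toward hard regions."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    if len(minority) < 2:
        return []
    n_needed = (len(majority) - len(minority)) * beta
    everyone = list(range(len(X)))

    # Share of majority-class points among each minority point's neighbours.
    ratios = []
    for i in minority:
        neigh = _nearest(i, X, everyone, k)
        ratios.append(sum(y[j] == 0 for j in neigh) / len(neigh))
    total = sum(ratios)
    if total == 0:
        return []  # no minority point sits near the majority class

    synthetic = []
    for i, r in zip(minority, ratios):
        n_i = round(r / total * n_needed)       # harder points get more samples
        partners = _nearest(i, X, minority, k)  # interpolate toward minority neighbours
        for _ in range(n_i):
            j = rng.choice(partners)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (hypothetical): 4 minority points, 12 majority points
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [2.0, 2.0]]
X += [[2.1 + 0.3 * i, 2.1 + 0.2 * j] for i in range(4) for j in range(3)]
y = [1] * 4 + [0] * 12
new_rows = adasyn(X, y, k=3)
print(len(new_rows))
```

In this toy example only the minority point at (2.0, 2.0) has majority-class neighbours, so all synthetic rows are generated around it. Oversampling is applied to the training set only, so the test set keeps its natural class balance.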

Each of the three age-specific datasets was randomly split into training (70%) and test (30%). Five-fold stratified cross-validation was performed within the training set for hyperparameter tuning and to reduce overfitting. Final performance was evaluated on the held-out test set using multiple metrics. The area under the receiver operating characteristic curve (AUC) assessed overall discrimination (0.5 indicating no discrimination and 1.0 perfect classification). Accuracy measured the proportion of correctly classified cases. Sensitivity reflected the proportion of true positives correctly identified, while specificity measured the proportion of true negatives. Precision (positive predictive value, PPV) and negative predictive value (NPV) captured the correctness of positive and negative predictions, respectively. The F1 score, defined as the harmonic mean of precision and sensitivity, offered a balanced metric under class imbalance. Beyond discrimination, model calibration was evaluated using calibration curves, and clinical utility was assessed with decision curve analysis (DCA). For the best-performing model in each dataset, we computed SHAP (SHapley Additive exPlanations) values to interpret feature contributions, offering insights into both global importance and individual-level predictions.
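The evaluation metrics defined above can be written out directly from the confusion matrix, and the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is illustrative: the function names and toy scores are hypothetical, and it assumes both classes are present and at least one positive prediction is made.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, NPV, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),  # harmonic mean
    }


def auc(y_true, scores):
    """Share of positive-negative pairs ranked correctly (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): three attempters, five non-attempters
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]
print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Note that accuracy, sensitivity, specificity, PPV, NPV, and F1 all depend on the chosen classification threshold (0.5 in this toy example), whereas the AUC does not.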

Sensitivity analyses were conducted to test the robustness of results: (1) reanalysis using complete cases to assess the impact of imputation; (2) 10-fold stratified cross-validation to test resampling sensitivity; (3) a 60:40 development/validation split to examine split-ratio effects; and (4) comparison of feature-importance rankings across similarly performing models to gauge predictor stability.

All analyses were performed using Python (version 3.7). Statistical significance was defined as two-sided p < 0.05.

Results

Prevalence of LSA and participant characteristics across age groups

As shown in Fig. 2, 1,145 of 25,047 participants reported LSA, including 417, 459, and 269 cases in the 18–24, 25–44, and ≥ 45 age groups, respectively. The overall prevalence of LSA was 4.57%, slightly higher in males (4.69%) than in females (4.45%), and similar between urban (4.49%) and rural (4.66%) residents. Prevalence varied significantly across age groups—peaking at 18–24 years (8.10%), then 25–44 years (4.67%), and lowest at ≥ 45 years (2.67%)—with all pairwise differences significant (all p < 0.001). Table 1 summarizes characteristics by LSA status within each age group across six domains. Age-related heterogeneity was most evident for sociodemographic factors (gender, marital status, education, income, insurance), whereas differences across lifestyle, social environment, mental health, and self-injury/suicide-history domains were broadly consistent across age groups.

Note: (A) LSA prevalence by age group in the overall population; (B) Number of LSA cases by age group; (C) LSA prevalence by age group and sex; (D) LSA prevalence by age group and residential location (urban vs. rural). LSA prevalence differed by age, with all pairwise comparisons significant (all p < 0.001)

Fig. 2. Age-specific prevalence and subgroup differences in lifetime suicide attempts (LSA) among adults in China.

Table 1. Basic characteristics of participants with lifetime suicide attempts across age groups

Notes: Abbreviations: LSA = lifetime suicide attempts; BMI = body mass index; ADL = activities of daily living; MPA = moderate-intensity physical activity, days/week; VPA = vigorous-intensity physical activity, days/week; DS = depressive symptoms; AS = anxiety symptoms; NLE = negative life events; IPV = intimate partner violence; ACEs = adverse childhood experiences; SI = suicidal ideation; NSSI = non-suicidal self-injury; NSSI-medical = NSSI requiring medical treatment. BMI: underweight < 18.5; normal weight 18.5–24.9; overweight/obesity ≥ 25.0 kg/m². Educational level: low (junior high school or below), medium (senior high school or vocational/technical school), high (college degree or above). Job status: in education or employment (Yes) vs not in education or employment (No). Marital status: married or unmarried (never married, divorced, widowed, or separated).

Predictor screening

We applied the predefined two-step feature selection separately within each age group. Figures S1–S3 show the univariable logistic-regression rankings, LASSO cross-validation error curves, and coefficient shrinkage paths; Table S3 shows the overlap in selected predictors across age groups. Eighteen predictors were retained for ages 18–24, 22 for 25–44, and 20 for ≥45. Several key features overlapped across groups, but age-specific differences were evident, consistent with distinct life-stage risk profiles. This combined approach improved the robustness of subsequent model development.

Model performance

We evaluated the predictive performance of five ML models on the test set across three age groups (Table 2). Discriminatory power and calibration were assessed using ROC curves (Fig. 3A–C) and calibration plots (Fig. 3D–F). Based on overall performance, SVM was identified as the optimal model in all groups and was further evaluated using DCA (Fig. 3G–I).

Note: (A–C) Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for 18–24y, 25–44y, and ≥ 45y age groups; (D–F) Calibration curves comparing predicted versus observed probabilities for each model across the three age groups; (G–I) Decision Curve Analyses showing the net benefit of using Support Vector Machine (SVM) models across a range of threshold probabilities. Models included: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Naive Bayes

Fig. 3. Model performance in predicting lifetime suicide attempts across age groups.

Table 2. Comparison of model performance in predicting lifetime suicide attempts (LSA) on the test set data

Notes: RF = Random Forest; LR = Logistic Regression; SVM = Support Vector Machine; XGBoost = Extreme Gradient Boosting; ROC = Receiver Operating Characteristic curve.

In the 18–24 group, SVM demonstrated the most balanced performance, achieving high AUC (0.88), accuracy (0.805), sensitivity (0.794), specificity (0.815), PPV (0.803), and NPV (0.861). Although XGBoost yielded a slightly higher AUC (0.89), SVM showed more consistent performance across all metrics. Its ROC curve approached the upper-left corner, and its calibration plot showed strong agreement with the ideal reference line. DCA indicated greater net benefit for SVM across a wide range of clinically relevant threshold probabilities compared to “treat all” or “treat none” strategies.

In the 25–44 group, SVM outperformed all other models, with the highest AUC (0.94), accuracy (0.874), sensitivity (0.868), and NPV (0.870), along with strong specificity (0.880) and PPV (0.871). ROC and calibration curves confirmed its excellent discrimination and calibration. DCA further supported its clinical utility.

In the ≥45 group, both SVM and LR achieved the highest AUC (0.94). However, SVM showed slightly better performance in accuracy (0.858 vs. 0.857), specificity (0.885 vs. 0.879), and PPV (0.875 vs. 0.873), while maintaining comparable sensitivity and NPV. It also showed calibration closer to the 45° line. DCA consistently demonstrated higher net clinical benefit for SVM.
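For readers less familiar with decision curve analysis, the net benefit behind the comparisons above is commonly computed as TP/n minus FP/n weighted by the odds of the threshold probability, with "treat none" fixed at zero. The sketch below shows that standard formula on toy data; the function names and values are hypothetical and do not reproduce the study's curves.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(t == 1 and p >= threshold for t, p in zip(y_true, probs))
    fp = sum(t == 0 and p >= threshold for t, p in zip(y_true, probs))
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predictions (hypothetical): three attempters, five non-attempters
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, probs, pt), net_benefit_treat_all(y_true, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the "treat none" value.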

SHAP-based model interpretability analysis

To interpret the best-performing SVM models across age groups, we applied SHAP analysis to quantify each predictor’s contribution to the predicted probability of LSA. For each group, the SHAP bar plot (left) shows mean absolute SHAP values (global importance), and the summary plot (right) illustrates the direction and magnitude of feature effects (Figs. 4a, 4b, and 4c; full results in Figures S4–S6). For ease of interpretation, SHAP values > 0 indicate an increase in the model-predicted probability of LSA (values < 0 indicate a decrease); the absolute value reflects contribution size, and point colour encodes the feature value (red = higher, blue = lower).
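The sign convention for SHAP values can be illustrated with an exact Shapley computation on a tiny model. The sketch below enumerates all feature subsets, which is feasible only for a handful of features; SHAP software approximates this quantity for real models. The toy risk function, the feature order (SI, ACEs, sleep quality), and the baseline values are hypothetical and unrelated to the fitted SVMs.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) per feature."""
    n = len(x)
    phi = [0.0] * n

    def value(subset):
        # Features in `subset` take their observed value; others stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score with an SI x ACEs interaction."""
    si, aces, sleep = features
    return 0.15 + 0.30 * si + 0.04 * aces - 0.03 * sleep + 0.02 * si * aces


x = [1, 3, 2]         # SI present, 3 ACEs, sleep score 2
baseline = [0, 0, 4]  # no SI, no ACEs, sleep score 4
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])
```

All three contributions are positive in this toy case: SI and ACEs raise the score, and a sleep score below the baseline also raises it. The contributions sum to the difference between the score for this person and the score at the baseline, which is the additivity property that SHAP plots rely on.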

Note: 1. Left panel: SHAP bar plot of mean absolute SHAP values (global importance; features ordered by mean SHAP). Right panel: SHAP summary plot (each dot = one participant). Colors encode feature values (red = higher, blue = lower). Positive SHAP values indicate an increase, and negative values a decrease, in the model-predicted probability of LSA. SHAP values were computed on the test set; 2. Abbreviations: SHAP = SHapley Additive exPlanations; SVM = support vector machine; LSA = lifetime suicide attempts; SI = suicidal ideation; ACEs = adverse childhood experiences; AS = anxiety symptoms; SP = suicide plan

Fig. 4a. Top 10 features identified by SHAP using the best-performing model (SVM) in the 18–24y group.

Fig. 4b. Top 10 features identified by SHAP using the best-performing model (SVM) in the 25–44y group.

Fig. 4c. Top 10 features identified by SHAP using the best-performing model (SVM) in the ≥ 45y group.

In the 18–24 group, the top predictors were SI, ACEs, suicide disclosure, sleep quality, and AS, followed by neighbourhood relations, self-efficacy, and self-rated health. Summary plots indicated that higher ACEs and SI were associated with a higher predicted probability of LSA, whereas better sleep quality and stronger neighbourhood relations were associated with a lower predicted probability.

In the 25–44 group, leading contributors included marriage, SI, ACEs, suicide disclosure, and living status. IPV, perceived stress, and self-efficacy also featured prominently. Being unmarried, exposure to IPV, and higher perceived stress were associated with higher predicted probability of LSA.

In the ≥45 group, SI, ACEs, and suicide disclosure again ranked highest, followed by sleep quality, ADL limitation, and DS. Functional limitations and psychological distress (e.g., perceived stress, AS, DS) contributed substantially to risk in this older cohort.

Across age groups, SI, ACEs, and suicide disclosure consistently emerged as core predictors. In contrast, other factors varied by life stage: sleep and AS weighed more in younger adults; relationship/structural factors (e.g., marital status, living status) dominated in midlife; and functional status and mental-health measures (e.g., ADL limitation, DS) were most salient in older adults.

Sensitivity analysis

SHAP-identified top predictors of LSA remained largely consistent across age groups under all sensitivity checks, including exclusion of imputed data (Table S4; Figures S7–S10), application of 10-fold stratified cross-validation (Table S5; Figures S11–S14), use of a 60:40 train–test split (Table S6; Figures S15–S18), and comparison of feature rankings between similarly performing models (LR vs. SVM in the ≥45y group; Figure S19), further supporting the robustness of the findings.
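The stratified cross-validation check can be sketched as follows. This is a minimal pure-Python illustration of how stratified folds preserve the share of a rare outcome in every fold; the function name `stratified_kfold`, the 1,000-record sample, and the 5% positive rate are assumptions chosen for illustration, not the study's data or code.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split record indices into k folds with the same class mix in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets an equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical sample: 50 positives (attempt reported) among 1,000 records.
labels = [1] * 50 + [0] * 950
folds = stratified_kfold(labels, k=10, seed=42)

for number, fold in enumerate(folds, start=1):
    positives = sum(labels[i] for i in fold)
    print(f"fold {number}: n={len(fold)}, positives={positives}")
```

With an outcome this rare, an unstratified split could leave some folds with very few positive cases, which makes fold-level performance estimates unstable; stratification avoids that by construction.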

Discussion

To our knowledge, this is the first study to apply ML approaches to examine age-stratified prevalence and predictors of LSA among Chinese adults. Using a nationally representative cross-sectional survey, we identified three principal findings: First, the overall prevalence of LSA was 4.57%, indicating a relatively high burden with a pronounced age gradient—prevalence declined with increasing age, and young adults represented the highest-risk group. Second, SI, ACEs, and suicide disclosure consistently emerged as the most robust predictors across all age groups. Third, risk profiles varied by age: psychological distress and sleep-related problems predominated in young adults; marital and living status, IPV, and perceived stress were more salient in mid-life; and in older adults, poor sleep, functional limitations, and DS were the primary contributors. Collectively, these findings provide empirical evidence to guide the development of age-tailored suicide prevention strategies in China.

SA is among the strongest predictors of death by suicide (Bostwick et al., 2016). In this nationally representative study, the prevalence of LSA among Chinese adults was 4.57%, with a marked age gradient: 8.10% in young adults, 4.67% in mid-life, and 2.67% in older adults. National epidemiological data on SA remain limited; most prior studies have focused on local settings or high-risk clinical groups, yielding widely varying estimates. For example, a 2001–2002 survey of community-dwelling adults in metropolitan China reported an LSA prevalence of 1.0% (Lee et al., 2007). By contrast, a meta-analysis of 29 studies in Chinese college students estimated a pooled LSA prevalence of 2.8% (range 0.4%–10.5%) (Yang et al., 2015). Internationally, nationally representative World Mental Health surveys across 17 countries reported adult LSA prevalence ranging from 0.5% in Italy to 5.0% in the United States (Nock et al., 2008a). Age-specific patterns show broad cross-national regularities: most countries report higher LSA in younger adults, whereas Japan is a notable exception with relatively elevated risk in midlife. Our study indicates that the prevalence of LSA among Chinese adults is relatively high. This may reflect both substantive and methodological factors: rapid social change with rising depression and stress exposure; limited access to and continuity of mental-health care alongside stigma that suppresses help-seeking and disclosure (Wu et al., 2023, 2024b, 2021); and the use of self-reported recall measures, which typically yield higher estimates than diagnostic or registry data, plus differences in survey timing.

Notably, young adults show the highest LSA risk, plausibly reflecting heavier psychosocial stressors (academic and career pressures, identity formation), greater exposure to self-injury content online, and fewer coping resources (Wu et al., 2021). This pattern should not be equated with the bimodal distribution of suicide mortality (youth and late-life peaks) (Wu et al., 2024b): mortality reflects both attempt incidence and case-fatality, and case-fatality increases with age (more lethal methods, greater frailty/comorbidity, lower likelihood of rescue), yielding a late-life peak despite fewer attempts. Underreporting of lifetime attempts and survivorship bias may also depress observed LSA in older adults. These findings provide national epidemiological evidence for China and underscore the need for improved surveillance and age-targeted prevention.
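The contrast between attempt patterns and suicide mortality can be made explicit with a simple decomposition. The notation is introduced here for illustration only and is a simplification that treats age-specific mortality as the product of two rates: M(a) is the suicide mortality rate at age a, I(a) the attempt incidence, and F(a) the case-fatality (the share of attempts that are fatal).

```latex
\[
  M(a) = I(a)\,F(a),
  \qquad
  \frac{M(a_{\mathrm{old}})}{M(a_{\mathrm{young}})}
  = \frac{I(a_{\mathrm{old}})}{I(a_{\mathrm{young}})}
    \times
    \frac{F(a_{\mathrm{old}})}{F(a_{\mathrm{young}})}
\]
```

Under this simplification, mortality is higher in late life whenever the case-fatality ratio F(a_old)/F(a_young) exceeds the inverse incidence ratio I(a_young)/I(a_old), so a late-life mortality peak is compatible with attempts being less frequent at older ages.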

Guided by the age-specific pattern of LSA, we applied multiple ML algorithms to identify key predictors across the lifespan. SI, ACEs, and suicide disclosure consistently emerged as the most robust predictors. SI was the strongest predictor, reaffirming its pivotal role in the suicidal process (Klonsky et al., 2016). According to the Interpersonal Theory of Suicide, SI arises from perceived burdensomeness, thwarted belongingness, and psychological pain (Van Orden et al., 2010), while the Three-Step Theory conceptualizes it as a necessary, though not sufficient, precursor, triggered by intolerable distress and hopelessness (Klonsky et al., 2016). Although not all individuals with SI progress to attempts, SI substantially elevates risk and remains a necessary antecedent (Klonsky et al., 2016). These findings support integrating brief SI screening into frontline settings (e.g., primary care, schools, crisis services), alongside standardized response protocols and inclusion of SI indicators in public health surveillance to inform timely intervention and resource allocation.

ACEs represent cumulative early-life adversity with lasting impacts on psychological development (Norman et al., 2012). Extensive evidence links childhood abuse, neglect, and interpersonal violence to elevated risk of suicidal behaviour in adulthood (Angelakis et al., 2019; Norman et al., 2012). One possible mechanism is that ACEs foster negative cognitive patterns, such as powerlessness, defeat, and entrapment, that impair emotional regulation under stress (Angelakis et al., 2019). ACEs are also linked to post-traumatic stress disorder, where hopelessness and psychological disengagement may trigger SI and accelerate its progression to SA (Angelakis et al., 2019). Their consistent predictive value underscores the need for upstream, trauma-informed policies that go beyond clinical screening, such as parenting support programs, school-based violence prevention, and cross-sector data integration to identify and intervene in at-risk environments before patterns of harm are established.

Suicide disclosure similarly predicted LSA across age groups. It often reflects a critical threshold of psychological distress and a strong need for help. From a cognitive-behavioural perspective, disclosure signals the externalization of suicidality (a shift from internal struggle to overt expression) and thus heightened risk (Rudd, 2000). While concealment may reflect fear of stigma, unsupported disclosure can intensify hopelessness and isolation, increasing the likelihood of escalation. These findings call for dedicated post-disclosure protocols across sectors, including hotline escalation pathways, school-based rapid response teams, and mandated training for frontline staff to recognize, triage, and support individuals who disclose suicidal thoughts or intent, thereby bridging the gap between expression and action.

In addition to shared predictors, our SHAP analysis revealed distinct age-specific patterns of LSA risk. Among young adults, the pattern was primarily emotion–relationship driven. Poor sleep, high anxiety, weak neighbourhood ties, and low self-efficacy emerged as salient predictors, reflecting developmental challenges of identity formation, autonomy, and early academic or career stress. Sleep disturbances and anxiety may indicate emotional dysregulation and vulnerability to affective instability, increasing the risk of impulsive suicidal behaviours (Kearns et al., 2020). Weak community ties and low self-efficacy may further diminish perceived agency and access to support, reinforcing powerlessness. These findings underscore the value of youth-centred prevention strategies emphasizing emotion regulation, connectedness, and empowerment.

In middle-aged adults, the pattern was largely role-stress driven. Being unmarried, living alone, perceived stress, and IPV ranked highest. This stage is marked by heavier work and family responsibilities alongside shrinking informal support. Emotional isolation from marital disruption or living alone may heighten perceived burdensomeness and thwarted belongingness, both key drivers of SI (Van Orden et al., 2010). Chronic stress and IPV may also dysregulate the stress-response system, leading to emotional exhaustion and impaired coping (Vidal et al., 2024). These findings highlight the need for integrated psychosocial strategies, including family therapy, IPV screening, and workplace stress reduction.

In older adults, the pattern was health–decline driven. DS, sleep problems, ADL limitations, and NSSI were top predictors. This stage is often accompanied by functional decline, bereavement, and shrinking social roles from retirement or caregiving burdens. ADL impairments undermine autonomy and increase dependency, which in collectivist contexts may be experienced as shame or perceived burden on others. Coexisting depression and sleep disturbances deepen emotional pain and reinforce cognitive distortions, while late-life NSSI may reflect chronic psychological distress or entrenched maladaptive coping. Prevention in later life thus requires a multidisciplinary approach integrating chronic disease management, ADL rehabilitation, and geriatric mental health services tailored to loss and meaning reconstruction. Together, these findings highlight the need for age-tailored LSA screening and prevention strategies that are developmentally sensitive to the psychosocial vulnerabilities of each life stage.

Finally, our study demonstrates the potential of ML to improve identification of individuals at risk for LSA. Given the rarity and complexity of suicidal behaviour, accurate prediction of SA remains a major challenge (Su et al., 2023). Traditional risk assessment tools have shown limited value, with meta-analyses reporting low sensitivity and poor positive predictive values across populations (Franklin et al., 2017; Su et al., 2023). In contrast, ML can capture complex, non-linear interactions among multiple factors (Su et al., 2023; Wu et al., 2023). In our analysis, SVM consistently outperformed other models across age groups, yielding balanced predictive performance, which is critical for reducing both false negatives and false positives in suicide risk screening. These findings support the feasibility and added value of ML-based prediction. However, concerns remain regarding potential bias and inflated performance estimates (Jacobucci et al., 2021). Thus, rigorous methodological standards and cautious interpretation are essential to ensure clinical applicability.
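To illustrate why positive predictive value tends to be poor for a rare outcome, the short sketch below applies Bayes' rule to a screening classifier. The prevalence of 4.57% is the overall LSA estimate reported in this study; the sensitivity and specificity values are hypothetical and do not describe the performance of any model in this paper.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive screens that are true positives (Bayes' rule)."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


PREVALENCE = 0.0457  # overall LSA prevalence reported in this study

# Hypothetical operating points, chosen only for illustration.
ppv = positive_predictive_value(PREVALENCE, sensitivity=0.80, specificity=0.80)
ppv_high_spec = positive_predictive_value(PREVALENCE, sensitivity=0.80, specificity=0.95)

print(f"PPV at 80% specificity: {ppv:.3f}")  # about 0.16
print(f"PPV at 95% specificity: {ppv_high_spec:.3f}")
```

Even with sensitivity and specificity both at 0.80, only about one in six positive screens would be a true case at this prevalence, which is why false positives weigh so heavily in any screening application.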

Strengths and limitations

This study has several strengths. First, it is the first nationwide investigation in China to assess LSA across age groups, providing valuable epidemiological data for future suicide-related research. Second, the inclusion of 37 variables across six domains enabled a comprehensive, multidimensional analysis of LSA predictors. Third, the application and comparison of multiple ML models provided a more robust assessment of predictive performance and model reliability.

However, several limitations should be acknowledged. First, the cross-sectional design precludes causal and temporal inferences. Time-window misalignment between current exposures and lifetime outcomes may introduce reverse causality. Prospective studies are needed to clarify temporal sequence and strengthen causal interpretation. Second, although ML captures nonlinearities and high-order interactions without prior specification, it remains inherently data-driven. SHAP interpretations are model-specific and descriptive rather than causal, and feature importance may vary with feature selection, hyperparameter tuning, and class-imbalance handling. Future work should examine the robustness of SHAP results across different modelling pipelines. Third, LSA and NSSI were measured using single-item, retrospective self-reports, which are prone to recall bias and misclassification (Su et al., 2023). Stigma, especially among older and rural populations, may further contribute to underreporting and underestimation of prevalence (Wu et al., 2024a, 2024b). Fourth, the absence of psychiatric diagnoses may have reduced model performance and limited insights into mental health–related risk factors, despite their known associations with suicidal behaviour (Mullins et al., 2022). Finally, model development and validation were conducted within a single dataset, which limits generalizability and highlights the need for external validation in future research.

Conclusion

The prevalence of LSA among Chinese adults remains relatively high, with a clear age gradient—peaking in young adults and declining with age. Risk profiles revealed both shared and age-specific predictors, reflecting distinct life-stage vulnerabilities. These findings highlight the need for age-tailored suicide prevention strategies in China.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S2045796025100231.

Availability of data and materials

The data that support the findings of this study are available from the PBICR team (Yibo Wu) upon reasonable request.

Acknowledgements

We gratefully acknowledge the PBICR team for their significant efforts in conducting the survey.

Author contributions

Yu Wu: conceptualization, methodology, formal analysis, writing-original draft, and writing-review & editing; Yihao Zhao, Chen Chen, Panliang Zhong: methodology, visualization and validation, and writing-review & editing; Yibo Wu: project administration, validation, and writing-review & editing; Xiaoying Zheng: conceptualization, supervision, project administration, writing-original draft, and writing-review & editing. Yibo Wu can also be contacted for correspondence, email bjmuwuyibo@outlook.com.

Financial support

This work was supported by the National Key Research and Development Program of China (2022YFC3600800) and the Population and Aging Health Science Program (WH10022023035).

Competing interests

There are no actual or potential conflicts of interest, including any financial, personal, or other relationships with other people or organizations.

Ethical standards

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2000. Ethical approval was granted by the Key Laboratory of Health Economics and Policy Research of the National Health Commission (NHC-HEPR202401) and the Ethics Committee of Shanghai Jiao Tong University (H20240237I). The study was registered in the Chinese Clinical Trial Registry (ChiCTR). All participants provided informed consent.

References

Anestis, MD (2016) Prior suicide attempts are less common in suicide decedents who died by firearms relative to those who died by other means. Journal of Affective Disorders 189, 106–109. https://doi.org/10.1016/j.jad.2015.09.007

Angelakis, I, Gillespie, EL and Panagioti, M (2019) Childhood maltreatment and adult suicidality: A comprehensive systematic review with meta-analysis. Psychological Medicine 49(7), 1057–1078. https://doi.org/10.1017/s0033291718003823

Borritz, M, Bültmann, U, Rugulies, R, Christensen, KB, Villadsen, E and Kristensen, TS (2005) Psychosocial work characteristics as predictors for burnout: Findings from 3-year follow up of the PUMA Study. J Occup Environ Medicine 47(10), 1015–1025. https://doi.org/10.1097/01.jom.0000175155.50789.98

Bostwick, JM, Pabbati, C, Geske, JR and McKean, AJ (2016) Suicide attempt as a risk factor for completed suicide: Even more lethal than we knew. American Journal of Psychiatry 173(11), 1094–1100. https://doi.org/10.1176/appi.ajp.2016.15070854

Breiman, L (2001) Random forests. Machine Learning 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Chen, F, Chi, J, Niu, F, Gao, Q, Mei, F, Zhao, L, Hu, K, Zhao, B and Ma, B (2022) Prevalence of suicidal ideation and suicide attempt among patients with traumatic brain injury: A meta-analysis. Journal of Affective Disorders 300, 349–357. https://doi.org/10.1016/j.jad.2022.01.024

Chen, R, Zhu, X, Wright, L, Drescher, J, Gao, Y, Wu, L, Ying, X, Qi, J, Chen, C, Xi, Y, Ji, L, Zhao, H, Ou, J and Broome, MR (2019) Suicidal ideation and attempted suicide amongst Chinese transgender persons: National population study. Journal of Affective Disorders 245, 1126–1134. https://doi.org/10.1016/j.jad.2018.12.011

Chen, T and Guestrin, C (2016) XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA. https://doi.org/10.1145/2939672.2939785

Conner, A, Azrael, D and Miller, M (2019) Suicide case-fatality rates in the United States, 2007 to 2014: A nationwide population-based study. Annals of Internal Medicine 171(12), 885–895. https://doi.org/10.7326/m19-1324

Domingos, P and Pazzani, M (1997) On the optimality of the simple Bayesian Classifier under Zero-One Loss. Machine Learning 29(2), 103–130. https://doi.org/10.1023/A:1007413511361

Fan, M, Sun, D, Zhou, T, Heianza, Y, Lv, J, Li, L and Qi, L (2020) Sleep patterns, genetic susceptibility, and incident cardiovascular disease: A prospective study of 385 292 UK biobank participants. European Heart Journal 41(11), 1182–1189. https://doi.org/10.1093/eurheartj/ehz849

Feng, X and Chen, XY (2012) Reliability and validity of the New General Self-Efficacy Scale (NGSES). Journal of Mudanjiang Normal University (Philosophy and Social Sciences Edition) 21(4), 127–129. https://doi.org/10.13815/j.cnki.jmtc(pss).2012.04.042

Foster, HME, Gill, JMR, Mair, FS, Celis-Morales, CA, Jani, BD, Nicholl, BI, Lee, D and O’Donnell, CA (2023) Social connection and mortality in UK Biobank: A prospective cohort analysis. BMC Medicine 21(1), 384. https://doi.org/10.1186/s12916-023-03055-7

Franklin, JC, Ribeiro, JD, Fox, KR, Bentley, KH, Kleiman, EM, Huang, X, Musacchio, KM, Jaroszewski, AC, Chang, BP and Nock, MK (2017) Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin 143(2), 187–232. https://doi.org/10.1037/bul0000084

García de la Garza, Á, Blanco, C, Olfson, M and Wall, MM (2021) Identification of suicide attempt risk factors in a national US survey using machine learning. JAMA Psychiatry 78(4), 398–406. https://doi.org/10.1001/jamapsychiatry.2020.4165

Gordon, JA, Avenevoli, S and Pearson, JL (2020) Suicide prevention research priorities in health care. JAMA Psychiatry 77(9), 885–886. https://doi.org/10.1001/jamapsychiatry.2020.1042

Haibo, H, Yang, B, Garcia, EA and Shutao, L (2008) ADASYN: Adaptive synthetic sampling approach for imbalanced learning. 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). https://doi.org/10.1109/IJCNN.2008.4633969

Huang, F, Wang, H, Wang, Z, Zhang, J, Du, W, Su, C, Jia, X, Ouyang, Y, Wang, Y, Li, L, Jiang, H and Zhang, B (2020) Psychometric properties of the perceived stress scale in a community sample of Chinese. BMC Psychiatry 20(1), 130. https://doi.org/10.1186/s12888-020-02520-4

Jacobucci, R, Littlefield, AK, Millner, AJ, Kleiman, EM and Steinley, D (2021) Evidence of inflated prediction performance: a commentary on machine learning and suicide research. Clinical Psychological Science 9(1), 129–134. https://doi.org/10.1177/2167702620954216

Kearns, JC, Coppersmith, DDL, Santee, AC, Insel, C, Pigeon, WR and Glenn, CR (2020) Sleep problems and suicide risk in youth: A systematic review, developmental framework, and implications for hospital treatment. General Hospital Psychiatry 63, 141–151. https://doi.org/10.1016/j.genhosppsych.2018.09.011

Klonsky, ED, May, AM and Saffer, BY (2016) Suicide, suicide attempts, and suicidal ideation. Annual Review of Clinical Psychology 12, 307–330. https://doi.org/10.1146/annurev-clinpsy-021815-093204

Lee, S, Fung, SC, Tsang, A, Liu, ZR, Huang, YQ, He, YL, Zhang, MY, Shen, YC, Nock, MK and Kessler, RC (2007) Lifetime prevalence of suicide ideation, plan, and attempt in metropolitan China. Acta Psychiatrica Scandinavica 116(6), 429–437. https://doi.org/10.1111/j.1600-0447.2007.01064.x

Lee, S, Gornitz, N, Xing, EP, Heckerman, D and Lippert, C (2018) Ensembles of Lasso Screening Rules. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(12), 2841–2852. https://doi.org/10.1109/tpami.2017.2765321

Leung, H, Pakpour, AH, Strong, C, Lin, YC, Tsai, MC, Griffiths, MD, Lin, CY and Chen, IH (2020) Measurement invariance across young adults from Hong Kong and Taiwan among three internet-related addiction scales: Bergen Social Media Addiction Scale (BSMAS), Smartphone Application-Based Addiction Scale (SABAS), and Internet Gaming Disorder Scale-Short Form (IGDS-SF9) (Study Part A). Addictive Behaviors 101, 105969. https://doi.org/10.1016/j.addbeh.2019.04.027

Levis, B, Benedetti, A and Thombs, BD (2019) Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis. BMJ 365, l1476. https://doi.org/10.1136/bmj.l1476

Lin, L, Wang, HH, Lu, C, Chen, W and Guo, VY (2021) Adverse childhood experiences and subsequent chronic diseases among middle-aged or older adults in China and Associations With Demographic and Socioeconomic Characteristics. JAMA Network Open 4(10), e2130143. https://doi.org/10.1001/jamanetworkopen.2021.30143

Lyu, J, Wang, Y, Shi, H and Zhang, J (2018) Early warnings for suicide attempt among Chinese rural population. Journal of Affective Disorders 238, 353–358. https://doi.org/10.1016/j.jad.2018.06.009

Miller, M, Azrael, D and Barber, C (2012) Suicide mortality in the United States: The importance of attending to method in understanding population-level disparities in the burden of suicide. Annual Review of Public Health 33, 393–408. https://doi.org/10.1146/annurev-publhealth-031811-124636

Miller, M, Azrael, D and Hemenway, D (2004) The epidemiology of case fatality rates for suicide in the northeast. Annals of Emergency Medicine 43(6), 723–730. https://doi.org/10.1016/j.annemergmed.2004.01.018

Mullins, N, Kang, J, Campos, AI, Coleman, JRI, Edwards, AC, Galfalvy, H, Levey, DF, Lori, A, Shabalin, A, Starnawska, A, Su, MH, Watson, HJ, Adams, M, Awasthi, S, Gandal, M, Hafferty, JD, Hishimoto, A, Kim, M, Okazaki, S, Otsuka, I, Ripke, S, Ware, EB, Bergen, AW, Berrettini, WH, Bohus, M, Brandt, H, Chang, X, Chen, WJ, Chen, HC, Crawford, S, Crow, S, DiBlasi, E, Duriez, P, Fernández-Aranda, F, Fichter, MM, Gallinger, S, Glatt, SJ, Gorwood, P, Guo, Y, Hakonarson, H, Halmi, KA, Hwu, HG, Jain, S, Jamain, S, Jiménez-Murcia, S, Johnson, C, Kaplan, AS, Kaye, WH, Keel, PK, Kennedy, JL, Klump, KL, Li, D, Liao, SC, Lieb, K, Lilenfeld, L, Liu, CM, Magistretti, PJ, Marshall, CR, Mitchell, JE, Monson, ET, Myers, RM, Pinto, D, Powers, A, Ramoz, N, Roepke, S, Rozanov, V, Scherer, SW, Schmahl, C, Sokolowski, M, Strober, M, Thornton, LM, Treasure, J, Tsuang, MT, Witt, SH, Woodside, DB, Yilmaz, Z, Zillich, L, Adolfsson, R, Agartz, I, Air, TM, Alda, M, Alfredsson, L, Andreassen, OA, Anjorin, A, Appadurai, V, Soler Artigas, M, Van der Auwera, S, Azevedo, MH, Bass, N, Bau, CHD, Baune, BT, Bellivier, F, Berger, K, Biernacka, JM, Bigdeli, TB, Binder, EB, Boehnke, M, Boks, MP, Bosch, R, Braff, DL, Bryant, R, Budde, M, Byrne, EM, Cahn, W, Casas, M, Castelao, E, Cervilla, JA, Chaumette, B, Cichon, S, Corvin, A, Craddock, N, Craig, D, Degenhardt, F, Djurovic, S, Edenberg, HJ, Fanous, AH, Foo, JC, Forstner, AJ, Frye, M, Fullerton, JM, Gatt, JM, Gejman, PV, Giegling, I, Grabe, HJ, Green, MJ, Grevet, EH, Grigoroiu-Serbanescu, M, Gutierrez, B, Guzman-Parra, J, Hamilton, SP, Hamshere, ML, Hartmann, A, Hauser, J, Heilmann-Heimbach, S, Hoffmann, P, Ising, M, Jones, I, Jones, LA, Jonsson, L, Kahn, RS, Kelsoe, JR, Kendler, KS, Kloiber, S, Koenen, KC, Kogevinas, M, Konte, B, Krebs, MO, Landén, M, Lawrence, J, Leboyer, M, Lee, PH, Levinson, DF, Liao, C, Lissowska, J, Lucae, S, Mayoral, F, McElroy, SL, McGrath, P, McGuffin, P, McQuillin, A, Medland, SE, Mehta, D, Melle, I, Milaneschi, Y, Mitchell, PB, 
Molina, E, Morken, G, Mortensen, PB, Müller-Myhsok, B, Nievergelt, C, Nimgaonkar, V, Nöthen, MM, O’Donovan, MC, Ophoff, RA, Owen, MJ, Pato, C, Pato, MT, Penninx, B, Pimm, J, Pistis, G, Potash, JB, Power, RA, Preisig, M, Quested, D, Ramos-Quiroga, JA, Reif, A, Ribasés, M, Richarte, V, Rietschel, M, Rivera, M, Roberts, A, Roberts, G, Rouleau, GA, Rovaris, DL, Rujescu, D, Sánchez-Mora, C, Sanders, AR, Schofield, PR, Schulze, TG, Scott, LJ, Serretti, A, Shi, J, Shyn, SI, Sirignano, L, Sklar, P, Smeland, OB, Smoller, JW, Sonuga-Barke, EJS, Spalletta, G, Strauss, JS, Świątkowska, B, Trzaskowski, M, Turecki, G, Vilar-Ribó, L, Vincent, JB, Völzke, H, Walters, JTR, Shannon Weickert, C, Weickert, TW, Weissman, MM, Williams, LM, Wray, NR, Zai, CC, Ashley-Koch, AE, Beckham, JC, Hauser, ER, Hauser, MA, Kimbrel, NA, Lindquist, JH, McMahon, B, Oslin, DW, Qin, X, Agerbo, E, Børglum, AD, Breen, G, Erlangsen, A, Esko, T, Gelernter, J, Hougaard, DM, Kessler, RC, Kranzler, HR, Li, QS, Martin, NG, McIntosh, AM, Mors, O, Nordentoft, M, Olsen, CM, Porteous, D, Ursano, RJ, Wasserman, D, Werge, T, Whiteman, DC, Bulik, CM, Coon, H, Demontis, D, Docherty, AR, Kuo, PH, Lewis, CM, Mann, JJ, Rentería, ME, Smith, DJ, Stahl, EA, Stein, MB, Streit, F, Willour, V and Ruderfer, DM (2022) Dissecting the shared genetic architecture of suicide attempt, psychiatric disorders, and known risk factors. Biological Psychiatry 91(3), 313–327. https://doi.org/10.1016/j.biopsych.2021.05.029

National Bureau of Statistics of China (2023) China Statistical Yearbook 2023. Beijing: China Statistics Press.Google Scholar

Nock, MK, Borges, G, Bromet, EJ, Alonso, J, Angermeyer, M, Beautrais, A, Bruffaerts, R, Chiu, WT, de Girolamo, G, Gluzman, S, de Graaf, R, Gureje, O, Haro, JM, Huang, Y, Karam, E, Kessler, RC, Lepine, JP, Levinson, D, Medina-Mora, ME, Ono, Y, Posada-Villa, J and Williams, D (2008a) Cross-national prevalence and risk factors for suicidal ideation, plans and attempts. British Journal of Psychiatry 192(2), 98–105. https://doi.org/10.1192/bjp.bp.107.040113CrossRef Google Scholar

Nock, MK, Borges, G, Bromet, EJ, Cha, CB, Kessler, RC and Lee, S (2008b) Suicide and suicidal behavior. Epidemiologic Reviews 30(1), 133–154. https://doi.org/10.1093/epirev/mxn002CrossRef Google Scholar

Norman, RE, Byambaa, M, De, R, Butchart, A, Scott, J and Vos, T (2012) The long-term health consequences of child physical abuse, emotional abuse, and neglect: A systematic review and meta-analysis. PLOS Medicine 9(11), e1001349. https://doi.org/10.1371/journal.pmed.1001349

Oquendo, MA, Wall, M, Wang, S, Olfson, M and Blanco, C (2024) Lifetime suicide attempts in otherwise psychiatrically healthy individuals. JAMA Psychiatry 81(6), 572–578. https://doi.org/10.1001/jamapsychiatry.2023.5672

Pan, X and Xu, Y (2022) A Safe Feature Elimination Rule for L(1)-Regularized Logistic Regression. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 4544–4554. https://doi.org/10.1109/tpami.2021.3071138

Rezvani, S and Wu, J (2023) Handling multi-class problem by intuitionistic fuzzy twin support vector machines based on relative density information. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(12), 14653–14664. https://doi.org/10.1109/tpami.2023.3310908

Rudd, MD (2000) The suicidal mode: A cognitive-behavioral model of suicidality. Suicide and Life-Threatening Behavior 30(1), 18–33. https://doi.org/10.1111/j.1943-278X.2000.tb01062.x

Song, X, Liu, X, Zhou, Y and Zhang, X (2023) Prevalence and correlates of suicide attempts in young patients with first-episode and drug-naïve major depressive disorder: A large cross-sectional study. Journal of Affective Disorders 340, 340–346. https://doi.org/10.1016/j.jad.2023.08.006

Stekhoven, DJ and Bühlmann, P (2012) MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118. https://doi.org/10.1093/bioinformatics/btr597

Su, R, John, JR and Lin, PI (2023) Machine learning-based prediction for self-harm and suicide attempts in adolescents. Psychiatry Research 328, 115446. https://doi.org/10.1016/j.psychres.2023.115446

Van Orden, KA, Witte, TK, Cukrowicz, KC, Braithwaite, SR, Selby, EA and Joiner, TE Jr (2010) The interpersonal theory of suicide. Psychological Review 117(2), 575–600. https://doi.org/10.1037/a0018697

Vidal, C, Reinert, M, Nguyen, T and Jun, HJ (2024) Chronic stress and lack of social support: Role in adolescent depression and suicide-related behaviors in the context of the COVID-19 pandemic. Journal of Affective Disorders 365, 437–442. https://doi.org/10.1016/j.jad.2024.08.090

Wang, F, Wu, Y, Wang, S, Du, Z and Wu, Y (2024) Development of an optimal short form of the GAD-7 scale with cross-cultural generalizability based on Riskslim. General Hospital Psychiatry 87, 33–40. https://doi.org/10.1016/j.genhosppsych.2024.01.010

WHO (2021) Suicide worldwide in 2019: Global Health Estimates. https://www.who.int/publications/i/item/9789240026643 (accessed 5 November 2023).

Wu, R, Zhu, H, Wang, ZJ and Jiang, CL (2021) A Large Sample Survey of Suicide Risk among University Students in China. BMC Psychiatry 21(1), 474. https://doi.org/10.1186/s12888-021-03480-z

Wu, Y, Su, B, Chen, C, Zhao, Y, Zhong, P and Zheng, X (2023) Urban-rural disparities in the prevalence and trends of depressive symptoms among Chinese elderly and their associated factors. Journal of Affective Disorders 340, 258–268. https://doi.org/10.1016/j.jad.2023.07.117

Wu, Y, Su, B, Zhao, Y, Chen, C, Zhong, P and Zheng, X (2024a) Epidemiological features of suicidal ideation among the elderly in China based meta-analysis. BMC Psychiatry 24(1), 562. https://doi.org/10.1186/s12888-024-06010-9

Wu, Y, Su, B, Zhong, P, Wang, Y, Huang, Y and Zheng, X (2024b) The long-term changing patterns of suicide mortality in China from 1987 to 2020: Continuing urban-rural disparity. BMC Public Health 24(1), 1269. https://doi.org/10.1186/s12889-024-18743-z

Wu, Y, Tang, J, Du, Z, Chen, K, Wang, F, Sun, X, Zhang, G and Wu, Y (2025) Development of a short version of the perceived social support scale: Based on classical test theory and ant colony optimization. BMC Public Health 25(1), 232. https://doi.org/10.1186/s12889-025-21399-y

Yang, LS, Zhang, ZH, Sun, L, Sun, YH and Ye, DQ (2015) Prevalence of suicide attempts among college students in China: A meta-analysis. PLoS One 10(2), e0116303. https://doi.org/10.1371/journal.pone.0116303

Yazdi-Ravandi, S, Khazaei, S, Davari, H, Matinnia, N, Karami, M, Taslimi, Z, Afkhami, MR and Ghaleiha, A (2023) Gender and age differences in suicide attempt: A large population study in the West of Iran. Asian Journal of Psychiatry 81, 103470. https://doi.org/10.1016/j.ajp.2023.103470

Yount, KM, Cheong, YF, Khan, Z, Bergenfeld, I, Kaslow, N and Clark, CJ (2022) Global measurement of intimate partner violence to monitor Sustainable Development Goal 5. BMC Public Health 22(1), 465. https://doi.org/10.1186/s12889-022-12822-9

Fig. 1. Flow diagram of sample selection.

Fig. 2. Age-specific prevalence and subgroup differences in lifetime suicide attempts (LSA) among adults in China.

Note: (A) LSA prevalence by age group in the overall population; (B) Number of LSA cases by age group; (C) LSA prevalence by age group and sex; (D) LSA prevalence by age group and residential location (urban vs. rural). LSA prevalence differed by age, with all pairwise comparisons significant (all p

Table 1. Basic characteristics of participants with lifetime suicide attempts across age groups

Fig. 3. Model performance in predicting lifetime suicide attempts across age groups.

Note: (A–C) Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for 18–24y, 25–44y, and ≥ 45y age groups; (D–F) Calibration curves comparing predicted versus observed probabilities for each model across the three age groups; (G–I) Decision Curve Analyses showing the net benefit of using Support Vector Machine (SVM) models across a range of threshold probabilities. Models included: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Naive Bayes

Table 2. Comparison of model performance in predicting lifetime suicide attempts (LSA) on the test set data

Fig. 4a. Top 10 features identified by SHAP using the best-performing model (SVM) in the 18–24y group.

Note: 1. Left panel: SHAP bar plot of mean absolute SHAP values (global importance; features ordered by mean SHAP). Right panel: SHAP summary plot (each dot = one participant). Colors encode feature values (red = higher, blue = lower). Positive SHAP values indicate an increase, and negative values a decrease, in the model-predicted probability of LSA. SHAP values were computed on the test set; 2. Abbreviations: SHAP = SHapley Additive exPlanations; SVM = support vector machine; LSA = lifetime suicide attempts; SI = suicidal ideation; ACEs = adverse childhood experiences; AS = anxiety symptoms; SP = suicide plan

Fig. 4b. Top 10 features identified by SHAP using the best-performing model (SVM) in the 25–44y group.

Fig. 4c. Top 10 features identified by SHAP using the best-performing model (SVM) in the ≥ 45y group.

Wu et al. supplementary material

DOI: https://doi.org/10.1017/S2045796025100231.sm001

