Introduction
The glomerular filtration rate (GFR), estimated from serum creatinine (SCr), is a standard tool in diagnosing, staging, and managing chronic kidney disease (CKD) in clinical practice. Over the years, many SCr-based equations have been developed, with current clinical guidelines recommending the race-free 2021 CKD-EPI SCr equation [Reference Delgado, Baweja and Crews1]. However, the performance of these equations is limited by non-GFR determinants that influence SCr, such as muscle mass and nutritional intake [Reference Kashani, Rosner and Ostermann2,Reference Tio, Shafi, Zhu, Kalantar-Zadeh, Chan and Nguyen3]. In the study that developed the 2021 CKD-EPI equations, external validation has shown that the CKD-EPI SCr equation may introduce inaccuracies in GFR estimation for both Black and non-Black populations, and lead to differential bias between race groups (namely, overestimation in non-Black individuals and underestimation in Black individuals). Furthermore, the 2021 CKD-EPI SCr equation may be less precise than the 2009 race-based CKD-EPI SCr equation [Reference Inker, Eneanya and Coresh4].
To address these limitations, guidelines from the National Kidney Foundation (NKF) and the American Society of Nephrology (ASN) recommend using the CKD-EPI creatinine-cystatin C combined equation for further confirmation and advocate for further investigation into other endogenous filtration markers that may enhance GFR estimation [Reference Delgado, Baweja and Crews1]. Several studies have explored the potential of novel biomarkers like beta-trace protein, β2-microglobulin, and citrulline for estimating GFR [Reference Lousa, Reis, Beirão, Alves, Belo and Santos-Silva5,Reference Benito, Unceta and Maciejczyk6], but SCr remains a valuable marker for GFR estimation due to its low cost, widespread availability, and routine inclusion in the basic metabolic panel. While it is important to continue exploring alternatives to SCr, until guidelines update their recommendations for GFR estimation, the SCr-based 2021 CKD-EPI equation will likely continue to be the standard in clinical settings. Given its clinical relevance, we sought to investigate whether SCr-based equations can be improved using advanced modeling techniques without incorporating additional biomarkers, potentially offering a cost-effective approach to enhance clinical practice.
It is well-known that SCr has a nonlinear relationship with measured GFR (mGFR) [Reference Raman, Middleton, Kalra and Green7]. Existing SCr-based equations typically employ linear regression methods with piecewise spline terms for SCr to capture the nonlinearity, using log-transformed values for both mGFR and SCr [Reference Inker, Eneanya and Coresh4]. However, several advanced smoothing and machine learning techniques can model complex nonlinear associations and identify the most appropriate relationships between the outcome and predictors based on the data itself, without assuming a predefined functional form. We, therefore, assessed whether these advanced methods could improve equation performance by more accurately capturing the nonlinear relationship between SCr and mGFR compared to the traditional linear regression-based approach. Specifically, we developed four new SCr-based equations using advanced approaches and compared their performance to the refitted linear regression-based 2021 CKD-EPI SCr equation.
Materials and methods
Data sources
This investigation utilized existing data from seven study cohorts that had mGFR data. For the development of new equations, we used the Genetic Epidemiology Network of Arteriopathy Study (GENOA) (N = 1010) [Reference Rule, Bailey, Lieske, Peyser and Turner8], African American Study of Kidney Disease and Hypertension (AASK) (N = 1807) [Reference Appel, Middleton and Miller9], Modification of Diet in Renal Disease (MDRD) Study (N = 1628) [Reference Levey, Bosch, Lewis, Greene, Rogers and Roth10], and Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) (N = 168) [Reference Chapman, Guay-Woodford and Grantham11]. For external validation, we utilized three studies: the Epidemiology of Coronary Artery Calcification (ECAC) cohort study (N = 406) [Reference Rule, Bergstralh, Slezak, Bergert and Larson12], the Assessing Long Term Outcomes in Living Kidney Donors (ALTOLD) study (N = 386) [Reference Kasiske, Anderson-Haag and Ibrahim13], and the Chronic Renal Insufficiency Cohort (CRIC) Study (N = 1423) [Reference Anderson, Yang and Hsu14]. Additional information about these cohorts can be found in the Supplemental Materials (Section: Details of Study Cohorts).
Measured GFR, serum creatinine, and covariates
Details of mGFR protocols and laboratory measurements were published elsewhere [Reference Rule, Bailey, Lieske, Peyser and Turner8,Reference Levey, Bosch, Lewis, Greene, Rogers and Roth10,Reference Chapman, Guay-Woodford and Grantham11,Reference Kasiske, Anderson-Haag and Ibrahim13–Reference Kwong, Stevens and Selvin16]. GFR was measured using urinary clearance of non-radiolabeled iothalamate in GENOA, ECAC, and CRISP; urinary clearance of radiolabeled 125I-iothalamate in MDRD, AASK, and CRIC; and plasma clearance of iohexol in ALTOLD. mGFRs from all studies were standardized to 1.73 m2 of body surface area. SCr measurement was standardized in all studies. Age, sex, and race were self-reported.
Similar to the guideline-recommended 2021 CKD-EPI SCr equation, all models included age, sex, and SCr as predictors, with mGFR as the outcome. mGFR was expressed in ml/min/1.73m2, SCr in mg/dL, and age in years for equation development. mGFR was log-transformed due to its high variability and to ensure positive predictions. SCr was kept on its original scale in all models except for the linear regression model used to refit the CKD-EPI SCr equation, where it was log-transformed.
Statistical analysis
We selected four distinct methods to estimate GFR, each with unique strengths in capturing nonlinear relationships between predictors and the outcome: multivariable fractional polynomials (MFP), generalized additive models (GAM), random forests (RF), and gradient boosted machines (GBM). MFP employs built-in variable selection and applies appropriate power transformations to continuous predictor variables when nonlinearity is identified [Reference Sauerbrei17]. GAM utilizes smooth functions, represented by penalized regression splines, to flexibly model nonlinear relationships when present and can effectively handle complex interactions between variables [Reference Wood18]. RF and GBM are ensemble machine learning techniques that combine predictions from multiple models to improve the overall accuracy and robustness of predictions. RF aggregates predictions from multiple deep decision trees, each trained on a random subset of the data, ensuring diversity among the trees and reducing overfitting [Reference Breiman19]. It captures nonlinearity by allowing individual decision trees to model different aspects of the relationship between dependent and independent variables. GBM uses a stochastic gradient boosting strategy, sequentially building a series of shallow decision trees, where each tree corrects the residuals of the previous ones [Reference Friedman20,Reference Friedman21]. The technical details of these methods are provided in the Supplemental Materials (Technical Notes of Statistical Methods).
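The residual-correcting idea behind boosting can be illustrated with a minimal sketch. It is not the gbm implementation used in the study: it assumes a single predictor, squared-error loss, depth-one trees (stumps), and no subsampling (so it is not stochastic), and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares single-split stump: returns (threshold, left_mean, right_mean)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    overall = sum(residuals) / len(residuals)
    best = None
    for k in range(1, len(order)):
        lo_i, hi_i = order[k - 1], order[k]
        if x[lo_i] == x[hi_i]:
            continue  # cannot split between tied predictor values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (x[lo_i] + x[hi_i]) / 2, left_mean, right_mean)
    if best is None:  # all predictor values identical: fall back to the mean
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted(x, y, n_trees=100, shrinkage=0.1):
    """Sequentially fit stumps, each one to the residuals left by the previous ones."""
    base = sum(y) / len(y)  # initial prediction: the mean outcome
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution before adding it to the ensemble.
        pred = [pi + shrinkage * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return base, shrinkage, stumps


def predict_boosted(model, xi):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(predict_stump(s, xi) for s in stumps)
```

In this sketch, `x` would play the role of a predictor such as SCr and `y` the log-transformed outcome; the shrinkage parameter and the number of trees correspond to the tuning parameters that the study selected by grid search.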
To compare these four methods with the linear regression-based CKD-EPI SCr equation, we refitted the 2021 CKD-EPI SCr equation using a linear regression model (hereafter referred to as LM). We preserved the same expression for the linear predictors as in the original 2021 CKD-EPI SCr equation, which included a linear effect for age and sex-specific linear splines for SCr, with cut-off values of 0.7 mg/dL for females and 0.9 mg/dL for males.
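One way to write this linear predictor on the log scale is sketched below. The symbols $\beta_0, \dots, \beta_4$, $\kappa$, and $\varepsilon$ are our own notation rather than the authors', and the sex-specific slope below the knot is an assumption based on the structure of the original 2021 CKD-EPI SCr equation:

$$
\log(\mathrm{mGFR}) = \beta_0 + \beta_{1,\mathrm{sex}} \min\{\log(SCr/\kappa),\, 0\} + \beta_2 \max\{\log(SCr/\kappa),\, 0\} + \beta_3\,\mathrm{Age} + \beta_4\,\mathrm{Female} + \varepsilon
$$

where $\kappa$ is the sex-specific SCr cut-off. Exponentiating both sides gives the familiar multiplicative form: $e^{\beta_0}$ (times $e^{\beta_4}$ for females) is the leading constant, $\beta_{1,\mathrm{sex}}$ and $\beta_2$ are the exponents of $SCr/\kappa$ below and above the cut-off, and $e^{\beta_3}$ is the per-year age factor.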
The MFP, GAM, RF, and GBM methods were implemented using the R packages mfp, mgcv, randomForest, and gbm, respectively, while the LM was implemented using base R. The R packages pdp and visreg were used to visualize the functional form of the equations.
External validation
We evaluated the performance of the new GFR-estimating equations in the pooled external validation data set using several population-level and individual-level metrics. Bias, the population-level systematic difference, was evaluated as the median difference between mGFR and estimated GFR (eGFR) (i.e., mGFR minus eGFR). Precision was assessed as the interquartile range of the difference. Population-level accuracy was assessed as root mean square error (RMSE) of log mGFR and log eGFR, mean absolute error (MAE) of the difference between mGFR and eGFR, percentage of eGFR within 10% of mGFR (P10), and percentage of eGFR within 30% of mGFR (P30) [Reference Levey, Stevens and Schmid22,Reference Stevens, Zhang and Schmid23].
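The population-level metrics defined above can be computed as in the following sketch. It assumes paired lists of mGFR and eGFR values in ml/min/1.73 m2 and a linear-interpolation rule for quantiles (the text does not state which quantile rule was used); the function names are hypothetical.

```python
import math
import statistics


def quantile(values, q):
    """Quantile (q in [0, 1]) using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def performance_metrics(mgfr, egfr):
    """Bias, imprecision, RMSE (log scale), MAE, P10, and P30 for paired values."""
    n = len(mgfr)
    diffs = [m - e for m, e in zip(mgfr, egfr)]  # mGFR minus eGFR

    def pct_within(fraction):
        # Percentage of eGFR values within the given fraction of mGFR.
        hits = sum(abs(e - m) <= fraction * m for m, e in zip(mgfr, egfr))
        return 100.0 * hits / n

    log_sq_err = [(math.log(m) - math.log(e)) ** 2 for m, e in zip(mgfr, egfr)]
    return {
        "bias": statistics.median(diffs),
        "imprecision": quantile(diffs, 0.75) - quantile(diffs, 0.25),
        "rmse_log": math.sqrt(sum(log_sq_err) / n),
        "mae": sum(abs(d) for d in diffs) / n,
        "p10": pct_within(0.10),
        "p30": pct_within(0.30),
    }
```

A positive bias under this sign convention means that eGFR falls below mGFR on average, that is, the equation underestimates measured GFR.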
We examined individual-level accuracy by calculating 95% prediction intervals (PIs) of mGFR at different eGFR values. The lower and upper bounds of these intervals were determined using the 2.5th and 97.5th percentiles of mGFR, predicted from quantile regression models [Reference Austin, Tu, Daly and Alter24,Reference Shafi, Zhu and Lirette25]. Similarly, we constructed the 50% prediction interval (PI) using the 25th and 75th percentiles of mGFR obtained from quantile regression models.
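For reference, a quantile regression at level $\tau$ is conventionally fitted by minimizing the check (pinball) loss. The linear form in eGFR shown below is an assumption made for illustration, and the symbols $a_\tau$, $b_\tau$, and $\rho_\tau$ are our notation:

$$
(\hat{a}_\tau, \hat{b}_\tau) = \arg\min_{a,\,b} \sum_{i} \rho_\tau\!\left(\mathrm{mGFR}_i - a - b\,\mathrm{eGFR}_i\right), \qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr)
$$

Under this sketch, the 95% PI of mGFR at a given eGFR value $e$ would be $[\hat{a}_{0.025} + \hat{b}_{0.025}\,e,\ \hat{a}_{0.975} + \hat{b}_{0.975}\,e]$, and the 50% PI would use $\tau = 0.25$ and $\tau = 0.75$ in the same way.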
We obtained 95% confidence intervals (CIs) (2.5th percentile, 97.5th percentile) for all population-level metrics using 2000 bootstrap samples (see details in Supplemental Materials: Bootstrapping Methods to Calculate Obtain Estimates (95% CI) of Metrics for Equation Performance) [Reference Royston and Sauerbrei26]. All analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria) and STATA/SE version 18.0 (StataCorp LLC, College Station, TX). Example R code is provided in the Supplementary Materials.
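A percentile bootstrap of this kind can be sketched as follows. The sketch assumes simple resampling of participants with replacement and a linear-interpolation percentile rule; the authors' exact procedure is described in their supplement and may differ. The function names and the fixed seed are hypothetical.

```python
import math
import random
import statistics


def percentile(sorted_values, q):
    """Quantile (q in [0, 1]) of pre-sorted values, with linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap_ci(mgfr, egfr, metric, n_boot=2000, seed=0):
    """95% percentile CI for metric(mgfr, egfr), resampling pairs with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(mgfr)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of pairs
        estimates.append(metric([mgfr[i] for i in idx], [egfr[i] for i in idx]))
    estimates.sort()
    return percentile(estimates, 0.025), percentile(estimates, 0.975)


def median_bias(mgfr, egfr):
    """Bias as defined in the text: median of mGFR minus eGFR."""
    return statistics.median(m - e for m, e in zip(mgfr, egfr))
```

For example, `bootstrap_ci(mgfr, egfr, median_bias)` would return the lower and upper bounds for bias; any other metric with the same two-argument signature can be passed in its place.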
Results
Characteristics of study participants
Table 1 shows the characteristics of the participants in both the development and external validation data sets. In the development data set, the mean ± SD age of participants was 54 ± 13 years; 2099 (45%) participants were female, and 2516 (54%) self-reported Black race. The mean mGFR was 58 ± 28 ml/min/1.73 m2, and 2571 (55%) had mGFR < 60 ml/min/1.73 m2. The external validation data set showed a slightly higher mean mGFR (62 ± 28 ml/min/1.73 m2), an older mean age (56 ± 14 years), a higher proportion of females (49%), and a lower proportion of Black race (24%). Characteristics of participants from each cohort are shown in Supplemental Table 1.
Table 1. Characteristics of study participants

Note: GFR = glomerular filtration rate; mGFR = measured glomerular filtration rate. Data are presented as mean (standard deviation) for continuous variables and as n (%) for categorical variables.
Formulation of equations
MFP equation
Nonlinear relationships of both SCr and age with log(mGFR) were selected by the MFP model based on the deviance criterion. The following equation was used to estimate GFR:


Supplemental Figure 1 displays the relationships of MFP-based eGFR with SCr and age.
GAM equation
Nonlinear relationships with log(mGFR) for both SCr and age were statistically supported in the GAM-based equation. Penalized cubic regression splines were used in the GAM model, as they resulted in the lowest generalized cross-validation scores compared to other available spline functions. The equation to estimate GFR is:
$\mathrm{eGFR} = \exp\left[3.755293 + f_1(SCr) + f_2(age) + 0.291015\ (\text{if male})\right]$, where $f_1$ and $f_2$ represent the smooth functions of SCr and age, respectively.
However, unlike MFP, it is difficult to transcribe the mathematical forms of the smooth terms concisely. Supplemental Figure 2 shows the relationships of GAM-based eGFR with SCr and age.
RF equation
In the RF approach, a grid search was conducted across various parameter combinations, including the number of trees and the minimum size of terminal nodes, to identify the optimal parameters for the final model, based on the lowest out-of-bag RMSE. The final model used 1000 trees with a node size of nine. Two randomly sampled variables were used to perform each of the splits because there were only three predictors in total. In contrast to the MFP and GAM models, RF models are purely designed for prediction and thus do not provide interpretable equations. Supplemental Figure 3 shows how RF-based eGFR changes with SCr or age.
GBM equation
The GBM equation was developed based on a boosted regression model. A grid search, similar to that used for RF models, was performed to determine the parameters for the final model, including the total number of trees, the maximum depth of each tree, and the shrinkage parameter, based on the RMSE. A total of 1386 trees were included in the final model.
Like the RF equation, predictions can be easily generated from the GBM model, although interpretable equations are not available. Supplemental Figure 4 illustrates how the eGFR changes with SCr or age.
Refitted CKD-EPI SCr 2021 (LM) equation
The 2021 CKD-EPI SCr equation was refitted using our development data set. The refitted equation, referred to as the LM equation, was defined as follows:
$$
\mathrm{eGFR} =
\begin{cases}
135 \times \left(\dfrac{SCr}{0.7}\right)^{-0.135} \times 0.9940^{Age} & \text{if } SCr \le 0.7,\ \text{female} \\
135 \times \left(\dfrac{SCr}{0.7}\right)^{-1.159} \times 0.9940^{Age} & \text{if } SCr > 0.7,\ \text{female} \\
139 \times \left(\dfrac{SCr}{0.9}\right)^{-0.110} \times 0.9940^{Age} & \text{if } SCr \le 0.9,\ \text{male} \\
139 \times \left(\dfrac{SCr}{0.9}\right)^{-1.159} \times 0.9940^{Age} & \text{if } SCr > 0.9,\ \text{male}
\end{cases}
$$
Supplemental Figure 5 demonstrates that the refitted equation produced predictions generally consistent with the original CKD-EPI SCr equation across various SCr values.
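As an illustration, the refitted LM equation reported above can be transcribed into a small function. The coefficients are taken from the equation as printed; the function name, argument names, and the "female"/"male" string coding are hypothetical, and SCr is assumed to be in mg/dL and age in years, as stated in the methods.

```python
def egfr_lm(scr, age, sex):
    """eGFR (ml/min/1.73 m2) from the refitted 2021 CKD-EPI SCr (LM) equation.

    scr: serum creatinine in mg/dL; age: years; sex: "female" or "male".
    """
    if sex == "female":
        scale, knot, exponent_below = 135.0, 0.7, -0.135
    elif sex == "male":
        scale, knot, exponent_below = 139.0, 0.9, -0.110
    else:
        raise ValueError("sex must be 'female' or 'male'")
    # The exponent above the knot is shared by both sexes.
    exponent = exponent_below if scr <= knot else -1.159
    return scale * (scr / knot) ** exponent * 0.9940 ** age
```

Because the ratio SCr/knot equals 1 at the cut-off, both branches give the same value there, so the estimate is continuous in SCr even though the slope changes.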
Performance of equations in the external validation data set
Overall, all equations underestimated mGFR, with small biases ranging from 0.9 to 1.4 ml/min/1.73 m2, except for the GBM equation, whose bias was not significantly different from zero (–0.3, 95% CI: –0.9, 0.3) (Table 2). The LM, MFP, and GAM equations demonstrated similar accuracy, as measured by both P10 and P30, while the GBM and RF equations showed lower accuracy (Table 2). Imprecision, RMSE, and MAE followed a pattern similar to that of P10 and P30, with GBM and RF demonstrating larger imprecision and errors than the other equations (Table 2). Despite these small differences, the overall performance of the equations was generally similar, as the 95% confidence intervals overlapped for most of the metrics.
Table 2. Overall performance of estimating equations in the external validation data set

Note: RMSE = root mean square error; MAE = mean absolute error; GFR = glomerular filtration rate; eGFR = estimated GFR; mGFR = measured GFR; CKD-EPI = Chronic Kidney Disease Epidemiology Collaboration; SCr = serum creatinine; LM = the refitted 2021 CKD-EPI SCr equation using a linear regression model; MFP = the equation based on the multivariable fractional polynomial model; GAM = the equation based on the generalized additive model; RF = the equation based on random forests; GBM = the equation based on gradient boosted machines. Bias is defined as the median of the differences between mGFR and eGFR for each individual in the sample (mGFR minus eGFR). Imprecision is the interquartile range (75th minus the 25th percentiles) of the differences. P10 is the percentage of eGFRs within 10% of mGFR, and P30 is the percentage of eGFRs within 30% of mGFR. RMSE is the root mean square error of log mGFR and log eGFR. MAE is calculated as the average of the absolute differences between each eGFR and the corresponding mGFR.
The performance of these equations across subgroups of race, sex, age, and eGFR (calculated using the CKD-EPI SCr equation) (Supplemental Table 2) mirrored the overall trends. All equations, including the LM equation, exhibited the greatest underestimation biases in the Black subgroup, followed by the eGFR < 60 ml/min/1.73 m2 subgroup (Supplemental Table 2, Figure 1). The GBM equation showed the smallest bias in both of these subgroups; however, it exhibited the largest overestimation bias in the White and eGFR ≥ 60 ml/min/1.73 m2 subgroups. Other equations demonstrated relatively similar biases within each subgroup. The GBM and RF equations generally yielded lower accuracy, as measured by P10 and P30, within each subgroup when compared to other equations (Supplemental Table 2, Figures 2 and 3). The LM, MFP, and GAM equations performed similarly within most of the subgroups, but the MFP and GAM equations tended to show higher P10 and P30 than the LM equation in Black individuals and females (Figures 2 and 3). Other performance metrics, including imprecision, MAE, and RMSE, followed the pattern of P10 and P30, with GBM and RF showing slightly larger errors compared to the other equations in most subgroups (Supplemental Figure 6).

Figure 1. Bias of equations overall and by subgroups in the external validation data set. Shows the bias of all equations overall and across subgroups. The dots are point estimates and the horizontal lines are 95% confidence intervals. The vertical dashed line represents the unbiased reference line, with estimates closer to 0 indicating better performance. eGFR based on the 2021 CKD-EPI SCr equation was used to define the subgroups with eGFR < 60 ml/min/1.73 m2 and eGFR ≥ 60 ml/min/1.73 m2.
Note: Bias was defined as the median of the differences between mGFR and eGFR for each individual in the sample (mGFR minus eGFR); GFR = glomerular filtration rate; eGFR = estimated GFR; mGFR = measured GFR; CKD-EPI = Chronic Kidney Disease Epidemiology Collaboration; SCr = serum creatinine; LM = the refitted 2021 CKD-EPI SCr equation using a linear regression model; MFP = the equation based on the multivariable fractional polynomial model; GAM = the equation based on the generalized additive model; RF = the equation based on random forests; GBM = the equation based on gradient boosted machines.

Figure 2. P10 of equations overall and by subgroups in the external validation data set. Shows accuracy measured by P10 of all equations overall and across subgroups. The dots are point estimates and the horizontal lines are 95% confidence intervals. The vertical reference line is positioned at the highest P10 value across all equations, with estimates closer to 100 indicating higher accuracy. eGFR based on the 2021 CKD-EPI SCr equation was used to define the subgroups with eGFR < 60 ml/min/1.73 m2 and eGFR ≥ 60 ml/min/1.73 m2.
Note: P10 is the percentage of eGFRs within 10% of mGFR; GFR = glomerular filtration rate; eGFR = estimated GFR; mGFR = measured GFR; CKD-EPI = Chronic Kidney Disease Epidemiology Collaboration; SCr = serum creatinine; LM = the refitted 2021 CKD-EPI SCr equation using a linear regression model; MFP = the equation based on the multivariable fractional polynomial model; GAM = the equation based on the generalized additive model; RF = the equation based on random forests; GBM = the equation based on gradient boosted machines.

Figure 3. P30 of equations overall and by subgroups in the external validation data set. Shows accuracy measured by P30 of all equations overall and across subgroups; the dots are point estimates and the horizontal lines are 95% confidence intervals. The vertical reference line is positioned at the highest P30 value across all equations, with estimates closer to 100 indicating greater accuracy. eGFR based on the 2021 CKD-EPI SCr equation was used to define the subgroups with eGFR < 60 ml/min/1.73 m2 and eGFR ≥ 60 ml/min/1.73 m2.
Note: P30 is the percentage of eGFRs within 30% of mGFR; GFR = glomerular filtration rate; eGFR = estimated GFR; mGFR = measured GFR; CKD-EPI = Chronic Kidney Disease Epidemiology Collaboration; SCr = serum creatinine; LM = the refitted 2021 CKD-EPI SCr equation using a linear regression model; MFP = the equation based on the multivariable fractional polynomial model; GAM = the equation based on the generalized additive model; RF = the equation based on random forests; GBM = the equation based on gradient boosted machines.
We further performed the analysis in subgroups based on combinations of race and age (<65 years), sex and age (<65 years), race and eGFR (<60 ml/min/1.73 m2), and sex and eGFR (<60 ml/min/1.73 m2) (Supplemental Tables 3 and 4, Supplemental Figure 7, Supplemental Figure 8). We found that the performances of GBM and RF were similar in these subgroups and comparable to the performance in the overall sample, with generally smaller biases but lower precision and accuracy. The GAM equation had higher P10 and P30 compared to the LM equation for Black individuals across age groups (Supplemental Figure 7B, Supplemental Figure 7C) and eGFR categories (Supplemental Figure 8B, Supplemental Figure 8C). Similarly, the MFP equation showed higher P30 than the LM equation for Black individuals, regardless of age group (Supplemental Figure 7C) or eGFR category (Supplemental Figure 8C), although 95% CIs overlapped for most of the equations across subgroups.
The individual-level differences between mGFR and the eGFRs derived from all equations, including the LM equation, were large (Supplemental Table 5, Figure 4). The 95% PIs were wide at each eGFR threshold used to define CKD diagnosis or staging. For example, at an eGFR of 60 ml/min/1.73 m2, the 95% PI of mGFR ranged from 38 to 89 ml/min/1.73 m2 for the LM equation (i.e., mGFR could fall anywhere between 38 and 89 ml/min/1.73 m2), 36 to 88 ml/min/1.73 m2 for both the MFP and GAM equations, 37 to 87 ml/min/1.73 m2 for the RF equation, and 36 to 86 ml/min/1.73 m2 for the GBM equation. Overall, all equations exhibited similar performance at the chosen eGFR thresholds.

Figure 4. Comparison of 95% prediction intervals of mGFR among all equations in the external validation data set. Vertical lines represent prediction intervals of the new equations, with each equation represented by a different color. The numbers near the caps of vertical lines show the 2.5th and 97.5th percentiles of mGFR at given eGFR values. Symbols (arrows and dots) on the vertical lines identify the 25th and 75th percentiles, and median of mGFR at given eGFR values. The interpretation is that at a given eGFR, 95% of mGFRs range from the 2.5th to 97.5th percentiles. Similarly, 50% of mGFRs range from the 25th to 75th percentiles. For each equation, the percentile values of mGFR are obtained from separate quantile regression models (at the 2.5th, 25th, median, 75th, and 97.5th percentiles, respectively) of mGFR on eGFR.
Note: GFR = glomerular filtration rate; eGFR = estimated GFR; mGFR = measured GFR; CKD-EPI = Chronic Kidney Disease Epidemiology Collaboration; SCr = serum creatinine; LM = the refitted 2021 CKD-EPI SCr equation using a linear regression model; MFP = the equation based on the multivariable fractional polynomial model; GAM = the equation based on the generalized additive model; RF = the equation based on random forests; GBM = the equation based on gradient boosted machines.
Discussion
In this study, we developed and examined four new SCr-based GFR-estimating equations using advanced methods and compared their performance to that of the refitted linear regression-based 2021 CKD-EPI SCr equation (the LM equation). Our findings showed that the MFP and GAM equations performed very similarly to the refitted CKD-EPI SCr equation, with slightly improved accuracy as measured by P10 and P30 in certain subgroups, including Black individuals and females. The GBM and RF methods had smaller biases in the overall sample, as well as in some subgroups, including Black and eGFR < 60 ml/min/1.73 m2 subgroups. However, they tended to show lower precision and accuracy compared to the other equations. Generally, differences among equations were modest across the entire external validation data set and subgroups defined by race, sex, age, and eGFR categories.
A few other studies have also examined GFR estimation using machine learning methods. For example, Liu et al. employed an artificial neural network (ANN) model to estimate GFR using sex, age, and SCr as predictors, based on data collected from a group of patients with CKD in China. The model did not outperform a linear regression model [Reference Liu, Li and Lv27]. The authors later applied an ensemble approach, averaging predictions from ANN, support vector machines, and the regression model, to estimate GFR, which improved precision but yielded bias and accuracy comparable to the regression-model-based approach [Reference Liu, Li and Lv28].
Despite the numerous advantages of RF and GBM, our study did not demonstrate clear benefits of these techniques for SCr-based GFR estimation. One possible explanation is that a key strength of these machine learning methods, variable selection, could not be exploited, because the predictors in our study were pre-selected according to standard clinical practice. This may explain why GBM and RF did not significantly outperform the traditional linear regression approach in GFR estimation. In addition, we suspect that our data did not exhibit complex nonlinear relationships between SCr and mGFR, so the usual benefit of advanced methods, namely their ability to handle intricate nonlinear relationships, may not have been fully demonstrated in our study sample.
All equations, including the LM-based refitted 2021 CKD-EPI SCr equation, exhibited suboptimal performance in the external validation data set. The greatest underestimation biases were observed in Black individuals across all subgroups defined by race, sex, age, and eGFR (Figure 1). The largest biases were seen in the Black subgroup with eGFR ≥ 60 ml/min/1.73 m2, followed by the Black subgroup with eGFR < 60 ml/min/1.73 m2 (Supplemental Table 4, Supplemental Figure 8A). P30 was lowest for Black individuals with eGFR < 60 ml/min/1.73 m2 (Supplemental Table 4, Supplemental Figure 8C), ranging between 70% and 80%, whereas 90% is typically considered good. Our findings indicate that the limitations of SCr-based equations persist even when advanced statistical and machine learning methods are employed, highlighting the difficulty of improving upon established SCr-based equations such as the CKD-EPI SCr equation.
Some limitations of the current study bear mentioning. First, the sample sizes of both our development and external validation data sets were relatively small compared to those used to develop the CKD-EPI equations. The external validation data set included fewer Black participants (n = 535, 24%) than White participants, resulting in wide confidence intervals in certain analyses, such as the subgroup analyses. This may have reduced the ability to detect true differences between White and Black participants across equations. Future research with a larger sample of Black participants would enable more robust and valid comparisons.
Moreover, the equations generated using the methods we employed are more difficult to understand than those based on linear regression models. While we were able to explicitly formulate the MFP equation, we could only visualize the GAM, RF, and GBM equations. User-friendly tools, such as online calculators and R packages, will be essential to facilitate the application of these advanced methodologies in clinical practice. However, the goal of this study was to assess whether advanced methods can improve SCr-based GFR-estimating equations without incorporating other markers, rather than to advocate for any of these equations.
In conclusion, our results suggest that advanced methodologies, including MFP, GAM, RF, and GBM, may have limited utility in enhancing GFR estimation using SCr as the main predictor. Future research should aim to integrate novel biomarkers to enhance GFR estimation and to improve the clinical feasibility of mGFR measurements, especially for subgroups where the current eGFR equations show less optimal performance, such as Black individuals.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/cts.2025.10057
Acknowledgments
The data from the CRIC, MDRD, AASK, CRISP, and ALTOLD studies reported here were supplied by the NIDDK Central Repository. Data for GENOA is housed at the University of Mississippi Medical Center. We thank Dr Andrew Rule from the Mayo Clinic for the help in obtaining data from the ECAC cohort study.
Author contributions
Xiaoqian Zhu: conceptualization, data curation, formal analysis, investigation, methodology, project administration, resources, software, validation, visualization, writing – original draft, writing – review and editing; Tariq Shafi: conceptualization, supervision, writing – review and editing; Keith Norris: supervision, writing – review and editing; Jeannette Simino: supervision, writing – review and editing; Srishti Shrestha: writing – review and editing; Thomas Mosley: resources, writing – review and editing; Michael Griswold: supervision, writing – review and editing; Seth Lirette: conceptualization, methodology, supervision, writing – review and editing.
Funding statement
XZ is partially supported by 75N92022D0004, HHSN268201800012i, and HIS-2020C1-19350. JS is supported by 75N92022D0004, 5P20GM144041, and 1RF1AG059421. KCN is partially supported by NIH research grants UL1TR001881, P30AG021684, U2CDK129496, P50MD017366, and OT2OD032581. TS is supported by R01NR017399, R01DK123062, U01DK127918, and R01HL153499. SL is partially supported by the Mississippi Center for Clinical and Translational Research and Mississippi Center of Excellence in Perinatal Research COBRE funded by the National Institute of General Medical Sciences of the National Institutes of Health under Award Numbers 5U54GM115428 and P20GM121334.
Competing interests
KCN is a Kidney Disease Quality Improvement Consultant for Atlantis Health, Inc.