Introduction
Nicotine use and nicotine dependence (or tobacco use disorder) are influenced by both genetic liability and environmental factors (Le Foll et al., Reference Le Foll, Piper, Fowler, Tonstad, Bierut, Lu and Hall2022; Maes et al., Reference Maes, Sullivan, Bulik, Neale, Prescott, Eaves and Kendler2004). Several different smoking-related phenotypes have been studied, including smoking initiation (SmkInit) (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022), cigarettes per day (CPD) (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022), the Fagerström Test for Nicotine Dependence (FTND) (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020), problematic tobacco use (PTU) (Hatoum et al., Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022), tobacco use disorder defined by codes from the International Classification of Diseases (ICD-based tobacco use disorder [ICD-TUD]) (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024), and nicotine dependence (Loukola et al., Reference Loukola, Wedenoja, Keskitalo-Vuokko, Broms, Korhonen, Ripatti and Kaprio2014) defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-NicDep) (Table 1).
Table 1. Definitions of different smoking-related phenotypes discussed in this article, the acronyms used to represent them, and the relevant GWASs used in analyses

Genome-wide association studies (GWASs) of nicotine use behaviors have largely focused on phenotypes that are easily ascertained through self-report questionnaires (e.g. When did you start smoking?, Are you a current smoker?, and How many cigarettes do you smoke in a day?). The largest GWASs of tobacco smoking behaviors to date identified 140 loci associated with CPD (N = 784,353) and 1,346 loci related to the initiation of regular smoking (SmkInit; N ~ 3.38 million) (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022).
GWASs of nicotine dependence phenotypes have generally been smaller. Some GWASs of nicotine dependence-related phenotypes have focused on data collected from short questionnaires, such as the FTND (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020). The FTND is a six-item questionnaire that includes cigarettes smoked per day as an ordinal indicator. The largest GWAS of FTND (defined as 0–3 for mild, 4–6 for moderate, and 7–10 for severe dependence) identified five loci (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020). Given the strong genome-wide genetic correlation between CPD and FTND (r g = 0.95) (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020), these GWASs were combined into a single phenotype, PTU, in another study (Hatoum et al., Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022) using multi-trait analysis of GWAS. A recent GWAS meta-analysis of ICD-TUD from electronic health records (see Table 1) reached nearly 900,000 samples and identified 88 loci (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024), all of which had been implicated in prior smoking-related GWASs. ICD-TUD showed moderate genetic correlations with CPD (r g = 0.44) (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024) and FTND (r g = 0.63).
Our study was motivated by epidemiological and clinical data supporting nosological distinctions between FTND-defined and ICD- or DSM-based diagnoses, including some studies that suggest qualitative and quantitative differences in the associations between DSM- and FTND-defined nicotine dependence and some psychopathology (Breslau & Johnson, Reference Breslau and Johnson2000; Brook, Koppel, & Pahl, Reference Brook, Koppel and Pahl2009). The FTND is brief and, therefore, easily and frequently collected. It has been especially prioritized in clinical trials of tobacco cessation (Ramon, Morchon, Baena, & Masuet-Aumatell, Reference Ramon, Morchon, Baena and Masuet-Aumatell2014; Tashkin et al., Reference Tashkin, Rennard, Hays, Ma, Lawrence and Lee2011), likely because FTND scores correlate well with relapse and treatment response, and the scale places a great deal of emphasis on physiological aspects of dependence (e.g. items related to tolerance and withdrawal [Baker, Breslau, Covey, & Shiffman, Reference Baker, Breslau, Covey and Shiffman2012]). On the other hand, both ICD- and DSM-based nicotine dependence include criteria related to physical and psychological (and social, in DSM-5) impairment due to nicotine/tobacco use, as well as behaviors directed at seeking and using nicotine to the exclusion of other activities. Neither FTND nor the ICD-TUD diagnostic classification maps perfectly to DSM-NicDep (Moolchan et al., Reference Moolchan, Radzius, Epstein, Uhl, Gorelick, Cadet and Henningfield2002; Mwenifumbo & Tyndale, Reference Mwenifumbo and Tyndale2011).
Many large GWAS meta-analyses of substance use disorders have relied on cases defined using clinical criteria recommended by the DSM or ICD classification. However, the few GWASs of DSM-NicDep to date have been relatively small (Loukola et al., Reference Loukola, Wedenoja, Keskitalo-Vuokko, Broms, Korhonen, Ripatti and Kaprio2014). We conducted a large meta-analysis of DSM-NicDep, combining data across 16 cohorts and multiple genetic ancestries. The largest analyses of psychiatric traits have focused on individuals whose genetics most closely resemble the European ancestry (EUR) subset of the 1,000 Genomes Project. Therefore, we assessed the genetic correlations between DSM-NicDep and other substance use phenotypes, psychiatric disorders, and related phenotypes in that subset. We also compared the correlations of the various nicotine dependence-related measures (DSM-NicDep, ICD-TUD, and PTU) and other substance use disorders (cannabis use disorder [CanUD], problematic alcohol use [PAU], and opioid use disorder [OUD]) by fitting a previously published common factor model (the Addiction-Risk-Factor) (Hatoum et al., Reference Hatoum, Colbert, Johnson, Huggett, Deak, Pathak and Hansen2023) to the data using genomic structural equation modeling (SEM). Finally, we tested whether a polygenic score (PGS) for DSM-NicDep was associated with DSM-5 tobacco use disorder and its 11 diagnostic criteria and 4 of the 6 FTND criteria in a large, independent sample – the National Epidemiologic Survey on Alcohol and Related Conditions-III (NESARC-III) cohort.
Methods
GWASs of DSM-NicDep
We performed meta-analyses of GWASs of DSM-defined nicotine dependence (hereafter referred to as “DSM-NicDep”) across a total of 18 cohorts, 3 of which included samples of multiple ancestries, using a sample size-weighted meta-analysis implemented in METAL (Willer, Li, & Abecasis, Reference Willer, Li and Abecasis2010). We used nicotine-exposed controls where possible (see Supplemental Materials for more details on each cohort). The deCODE and Finnish Twin Cohort samples were entirely of EUR. For other cohorts, genetic ancestry similarity was inferred by comparing an individual’s genome to the genomes from global reference populations using statistical methods such as principal components analysis. Specifically, all Psychiatric Genomics Consortium cohorts included in this meta-analysis used the 1,000 Genomes Phase 3 as a reference panel for defining ancestries based on principal components, as did the Thai sample. There were 16 cohorts with samples that were most genetically similar to EUR global reference populations, 4 with samples that were most genetically similar to African ancestry (AFR) global reference populations, and 1 cohort whose participants were most genetically similar to East Asian ancestry (EAS) global reference populations (hereafter referred to as European, African, or East Asian ancestries). All GWASs controlled for age, sex, and 10 principal components, except for the Minnesota Center for Twin and Family Research (MCTFR) cohort, which covaried for 20 principal components, and the deCODE cohort, which controlled for population stratification by adjusting for a genome-wide inflation factor; more details, including cohort-specific covariates, are provided in Supplemental Table 1. Twelve cohorts provided summary statistics, while five cohorts provided individual genotype and phenotype data for analysis (Supplemental Table 1).
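As an illustration of the sample size-weighted scheme, the sketch below combines per-cohort results for a single variant by converting each two-sided p-value and effect direction into a signed Z-score and weighting it by the square root of the cohort sample size. This is a minimal sketch of the general approach, not the METAL implementation itself; the function names, p-values, and sample sizes are hypothetical, and whether total or effective sample sizes were used as weights is not specified here.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def cohort_z(p_value: float, effect_sign: int) -> float:
    """Convert a two-sided p-value and an effect direction (+1/-1) to a signed Z-score."""
    return effect_sign * _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)


def sample_size_weighted_meta(z_scores, sample_sizes):
    """Combine per-cohort Z-scores using weights w_i = sqrt(N_i).

    Returns the meta-analytic Z-score and its two-sided p-value.
    The p-value computation loses precision for extremely large |Z|;
    that is acceptable for a sketch.
    """
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_meta = numerator / denominator
    p_meta = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


# Hypothetical example: three cohorts testing the same SNP.
example_z = [cohort_z(0.01, +1), cohort_z(0.20, +1), cohort_z(0.50, -1)]
z_meta, p_meta = sample_size_weighted_meta(example_z, [10_000, 5_000, 2_000])
print(f"Z = {z_meta:.3f}, p = {p_meta:.3g}")
```

Because the weights depend only on sample size, this scheme does not require effect sizes to be on a common scale across cohorts, which is convenient when cohorts use different analysis models.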
We used FUMA v1.5.2 (Watanabe, Taskesen, van Bochoven, & Posthuma, Reference Watanabe, Taskesen, van Bochoven and Posthuma2017) to identify independent, genome-wide significant risk loci, annotate variants, and perform gene-wise analyses via MAGMA (de Leeuw, Mooij, Heskes, & Posthuma, Reference de Leeuw, Mooij, Heskes and Posthuma2015). We used the default FUMA parameters to define “independent significant single-nucleotide polymorphisms (SNPs)” as those that reached genome-wide significance (p < 5e−8) and were independent of each other at r 2 < 0.6, and “lead SNPs” as those SNPs that were independent of each other at r 2 < 0.1. “Genomic risk loci” were defined by merging linkage disequilibrium (LD) blocks of independent significant SNPs within a 250-kb distance. We performed gene mapping in FUMA using positional mapping (based on ANNOVAR annotations), expression quantitative trait locus mapping (using GTEx V8 [Aguet et al., Reference Aguet, Brown, Castel, Davis, He and Jo2017], CommonMind [Huckins et al., Reference Huckins, Dobbyn, Ruderfer, Hoffman, Wang and Pardiñas2019], and Braineac [Trabzuni et al., Reference Trabzuni, Ryten, Walker, Smith, Imran, Ramasamy and Hardy2011] data), and chromatin interaction mapping. We also performed the MAGMA gene, gene-set, and gene expression analyses (using GTEx V8 data).
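The locus-merging step can be sketched as a simple interval merge. The example below is an illustrative simplification, not FUMA's code: it assumes each LD block is already summarized as a (chromosome, start, end) tuple, and it treats a gap of exactly 250 kb as mergeable, which is an assumption. The function name and coordinates are hypothetical.

```python
def merge_ld_blocks(blocks, max_gap_bp=250_000):
    """Merge LD blocks on the same chromosome whose gap is <= max_gap_bp.

    `blocks` is an iterable of (chrom, start_bp, end_bp) tuples.
    Returns a sorted list of merged (chrom, start_bp, end_bp) loci.
    """
    loci = []
    for chrom, start, end in sorted(blocks):
        same_chrom = bool(loci) and loci[-1][0] == chrom
        if same_chrom and start - loci[-1][2] <= max_gap_bp:
            # Close enough to the previous locus: extend it.
            loci[-1][2] = max(loci[-1][2], end)
        else:
            # Different chromosome or too far away: start a new locus.
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]


# Hypothetical coordinates: the first two blocks are 200 kb apart and merge;
# the third is 600 kb from the merged locus and stays separate.
example_blocks = [
    (15, 1_000_000, 1_100_000),
    (15, 1_300_000, 1_400_000),
    (15, 2_000_000, 2_050_000),
]
print(merge_ld_blocks(example_blocks))
```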
We performed most follow-up analyses described below (genomic SEM and PGSs) only in the EUR data because the AFR and EAS subsets were underpowered.
Genetic correlations with other substance use disorders, psychiatric disorders, and other relevant phenotypes
We used LD score regression (LDSC) (Bulik-Sullivan et al., Reference Bulik-Sullivan, Finucane, Anttila, Gusev, Day, Loh and Neale2015; Bulik-Sullivan et al., Reference Bulik-Sullivan, Loh, Finucane, Ripke, Yang and Patterson2015) to estimate the SNP-heritability of DSM-NicDep and the genetic correlations between DSM-NicDep and other substance use phenotypes, using published GWASs of PAU (Zhou et al., Reference Zhou, Kember, Deak, Xu, Toikumo and Yuan2023), FTND (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020), ICD-TUD (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024), CPD (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022), cannabis ever-use (Pasman et al., Reference Pasman, Verweij, Gerring, Stringer, Sanchez-Roige and Treur2018), CanUD (Levey et al., Reference Levey, Galimberti, Deak, Wendt, Bhattacharya and Koller2023), OUD (Deak et al., Reference Deak, Zhou, Galimberti, Levey, Wendt, Sanchez-Roige and Gelernter2022), SmkInit (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022), and smoking cessation (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022). We also estimated genetic correlations between DSM-NicDep and other phenotypes, including psychiatric disorders, behavioral traits, respiratory health, and socioeconomic status-related phenotypes. Details on the individual GWAS used in genetic correlation analyses are provided in the Supplemental Methods. We further tested whether genetic correlations for DSM-NicDep and FTND were different from each other using a block-jackknife method (Bulik-Sullivan, Finucane, et al., Reference Bulik-Sullivan, Finucane, Anttila, Gusev, Day, Loh and Neale2015; Coleman et al., Reference Coleman, Peyrot, Purves, Davis, Rayner and Choi2020).
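The logic of the block-jackknife comparison can be sketched as follows. The example assumes that delete-one-block estimates of each genetic correlation are already available (for instance, exported from LDSC); the function name and the numbers are hypothetical, and this is a simplified illustration rather than the exact procedure used in the cited work.

```python
from math import erfc, sqrt


def jackknife_difference_test(full_a, full_b, delete_one_a, delete_one_b):
    """Test whether two genetic correlations differ, using a block jackknife.

    full_a, full_b: estimates of r_g from the full data.
    delete_one_a, delete_one_b: estimates with each block left out in turn
    (same blocks, same order, for both traits).
    Returns (difference, jackknife SE, Z, two-sided p-value).
    """
    n = len(delete_one_a)
    if n != len(delete_one_b) or n < 2:
        raise ValueError("need two delete-one lists of equal length >= 2")
    diffs = [a - b for a, b in zip(delete_one_a, delete_one_b)]
    mean_diff = sum(diffs) / n
    # Jackknife variance of the difference across delete-one estimates.
    variance = (n - 1) / n * sum((d - mean_diff) ** 2 for d in diffs)
    se = sqrt(variance)
    difference = full_a - full_b
    z = difference / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
    return difference, se, z, p


# Hypothetical delete-one estimates for four blocks (real analyses use many more).
diff, se, z, p = jackknife_difference_test(
    0.6, 0.3, [0.5, 0.6, 0.7, 0.6], [0.3, 0.3, 0.3, 0.3]
)
print(f"difference = {diff:.2f}, SE = {se:.3f}, Z = {z:.2f}, p = {p:.3g}")
```

Differencing within each block before computing the variance accounts for the fact that the two correlation estimates share the DSM-NicDep or FTND summary statistics and are therefore not independent.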
We also estimated genetic correlations between the AFR DSM-NicDep GWAS and eight traits for which GWAS data were available from a reasonably large AFR sample. With a total sample size <5,000, it was not feasible to estimate meaningful genetic correlations using LDSC for the DSM-NicDep EAS samples.
Genomic SEM
We applied confirmatory factor analysis to the covariance matrix generated by LDSC using genomic SEM (Grotzinger et al., Reference Grotzinger, Rhemtulla, de Vlaming, Ritchie, Mallard, Hill and Tucker-Drob2019) with weighted least squares estimation. As presented in Hatoum et al. (Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022), the indicators were allowed to load freely on a single latent factor (Addiction-Risk-Factor), but we updated the OUD (Deak et al., Reference Deak, Zhou, Galimberti, Levey, Wendt, Sanchez-Roige and Gelernter2022), PAU (Zhou et al., Reference Zhou, Kember, Deak, Xu, Toikumo and Yuan2023), and CanUD (Levey et al., Reference Levey, Galimberti, Deak, Wendt, Bhattacharya and Koller2023) GWASs. We compared Addiction-Risk-Factor models with DSM-NicDep, PTU (a combination of FTND and CPD [Hatoum, Colbert, et al., Reference Hatoum, Colbert, Johnson, Huggett, Deak, Pathak and Hansen2023]), or ICD-TUD as the tobacco-related indicator. The variance of the common latent factor was scaled to 1.0.
PGS analyses
We created PGSs for DSM-NicDep, FTND, ICD-TUD, and CPD in the EUR subset of the NESARC-III sample. NESARC-III was genotyped using the Affymetrix Axiom® Exome Array (Zhang et al., Reference Zhang, Grant, Hodgkinson, Ruan, Kerridge, Huang and Chou2022), which limited our ability to impute SNPs due to a lack of appreciable nonexonic coverage and resulted in some regions with low SNP densities. Details of quality control and imputation are available in the Supplemental Materials. We used PRS-CS (Ge, Chen, Ni, Feng, & Smoller, Reference Ge, Chen, Ni, Feng and Smoller2019), a Bayesian method that uses continuous shrinkage priors to weight SNP effect sizes, and used the “auto” function, which allows the global scaling parameter to be automatically learned from the data. We then used the “score” function in PLINK 1.9 (Chang et al., Reference Chang, Chow, Tellier, Vattikuti, Purcell and Lee2015) to create PGS for DSM-NicDep, FTND, ICD-TUD, and CPD in NESARC-III. We used logistic regression models to estimate associations between the PGS and endorsement of the 11 individual diagnostic criteria for DSM-5 TUD and 4 of the 6 FTND criteria (excluding CPD and smoking when ill) in NESARC-III (N = 12,482; DSM-TUD N cases = 4,205). Linear regression models were used to estimate associations between PGSs and the total DSM-5 TUD criterion count and total FTND criterion count. All regression models covaried for age, sex, and 10 within-ancestry principal components. We first ran a model with only the DSM-NicDep PGS as a predictor, then a model with only the FTND PGS as a predictor, and finally a model where all four PGSs (DSM-NicDep, FTND, ICD-TUD, and CPD) were included jointly as predictors.
We also created PGSs for DSM-NicDep, FTND, and CPD in the BioVU biobank (we did not create a PGS for ICD-TUD, because BioVU was included in that published GWAS meta-analysis). We used the methods described above to create the PGS and tested whether the DSM-NicDep, FTND, and CPD PGS were associated with ICD-TUD (N = 66,914; N cases = 9,666) in BioVU in a logistic regression model that controlled for age, sex, and 10 ancestry principal components.
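Conceptually, the scoring step described above reduces to a weighted sum of effect-allele dosages. The sketch below illustrates that idea only: the SNP identifiers, dosages, and weights are hypothetical, and PLINK's own output may be scaled differently (for example, averaged over alleles rather than summed). Standardizing the resulting scores before regression is shown as a common convention and is an assumption here.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: dict mapping SNP ID -> effect-allele dosage (0 to 2) for one person.
    weights: dict mapping SNP ID -> posterior effect size (e.g. from PRS-CS).
    """
    shared_snps = dosages.keys() & weights.keys()
    return sum(dosages[snp] * weights[snp] for snp in shared_snps)


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (assumes the scores are not all equal)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical weights and genotypes for three individuals.
snp_weights = {"rsA": 0.10, "rsB": -0.20, "rsC": 0.05}
people = [
    {"rsA": 2, "rsB": 1, "rsC": 0},
    {"rsA": 0, "rsB": 0, "rsC": 2},
    {"rsA": 1, "rsB": 2, "rsC": 1},
]
raw_scores = [polygenic_score(person, snp_weights) for person in people]
print(standardize(raw_scores))
```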
Results
GWAS of DSM-NicDep
There were a total of 15 cohorts included in the EUR meta-analysis (N cases = 20,923, N controls = 26,961), 4 cohorts included in the AFR meta-analysis (N cases = 5,293, N controls = 4,938), and 1 cohort included in the EAS GWAS (N cases = 2,007, N controls = 1,739), for a total cross-ancestry sample size of N = 61,861 (28,223 cases; Supplemental Table 1). We identified one genome-wide significant locus, near the cholinergic receptor nicotinic alpha 5 subunit (CHRNA5) gene on chromosome 15 (lead SNP: rs147144681, p = 1.27e−11 in EUR; lead SNP: rs2036527, p = 6.49e−13 in cross-ancestry analysis; Supplemental Tables 2 and 3). Using a lifetime prevalence of 24% (Breslau, Johnson, Hiripi, & Kessler, Reference Breslau, Johnson, Hiripi and Kessler2001), we estimated the liability scale SNP-heritability of DSM-NicDep to be 0.07 (SE = 0.01) in the EUR samples. The estimated liability scale SNP-heritability was similar in the AFR samples, but with a much larger standard error (SNP-h 2 = 0.08, SE = 0.09).
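For readers unfamiliar with the liability-scale transformation, the sketch below shows the standard conversion from observed-scale to liability-scale heritability for a case-control trait, given the population prevalence (K) and the proportion of cases in the sample (P). The observed-scale value used in the example is hypothetical and is not a result from this study; treating the pooled EUR case proportion as a single sample prevalence is also a simplifying assumption, since individual cohorts differ in their case fractions.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, population_prevalence, sample_prevalence):
    """Convert observed-scale SNP-heritability to the liability scale.

    Uses h2_liab = h2_obs * [K(1-K)]^2 / [P(1-P) * z^2], where z is the
    standard normal density at the liability threshold implied by K.
    """
    std_normal = NormalDist()
    k = population_prevalence
    p = sample_prevalence
    threshold = std_normal.inv_cdf(1.0 - k)  # liability threshold for prevalence K
    z = std_normal.pdf(threshold)            # normal density at that threshold
    return h2_observed * (k * (1.0 - k)) ** 2 / (p * (1.0 - p) * z ** 2)


# K = 0.24 is the lifetime prevalence cited in the text; P is the pooled EUR
# case proportion from the reported case and control counts.
# The observed-scale heritability of 0.05 is a hypothetical input.
pooled_case_fraction = 20_923 / (20_923 + 26_961)
print(observed_to_liability(0.05, 0.24, pooled_case_fraction))
```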
In the MAGMA gene-based analysis of the EUR data, three genes were significant after multiple testing corrections: CHRNA5, CHRNA3, and IREB2 (Supplemental Table 4). The following three gene sets were significantly associated: “GOBP_TRANSCRIPTION_BY_RNA_POLYMERASE_III,” “GOMF_PRE_MRNA_5_SPLICE_SITE_BINDING,” and “REACTOME_HIGHLY_CALCIUM_PERMEABLE_NICOTINIC_ACETYLCHOLINE_RECEPTORS.” None of the 30 tissue types from GTEx v8 were significant.
Genetic correlations with substance use and other phenotypes
In the EUR data, DSM-NicDep was strongly correlated with the other measures of nicotine dependence (r gs with FTND and ICD-TUD ranged from 0.81 to 1.01). The genetic correlations between DSM-NicDep and published GWASs of substance use disorders were of moderate-to-high magnitude: CanUD, PAU, and OUD (r g = 0.64–0.84). Overall, DSM-NicDep was significantly correlated with 23 of the 26 phenotypes tested (Figure 1 and Supplemental Table 5). Compared to the other tobacco-related phenotypes, DSM-NicDep showed the strongest correlations with many traits, albeit with wider confidence intervals (CIs) due to smaller sample sizes. When correcting for 24 comparisons, the genetic correlations between DSM-NicDep and SmkInit, cannabis ever-use, CanUD, PAU, and the Townsend Deprivation Index were significantly larger (p < 0.002) than the corresponding genetic correlations between FTND and these phenotypes.

Figure 1. Comparing genetic correlations (r g) for DSM-NicDep, FTND, ICD-TUD, and CPD with other traits in European ancestry data. Traits include other substance use disorders (CanUD, cannabis use disorder [Levey et al., Reference Levey, Galimberti, Deak, Wendt, Bhattacharya and Koller2023]; OUD, opioid use disorder [Deak et al., Reference Deak, Zhou, Galimberti, Levey, Wendt, Sanchez-Roige and Gelernter2022]; PAU, problematic alcohol use [Zhou et al., Reference Zhou, Kember, Deak, Xu, Toikumo and Yuan2023]; ICD-TUD, ICD-based tobacco use disorder [Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024]), substance use behaviors (CanUse, cannabis ever-use [Pasman et al., Reference Pasman, Verweij, Gerring, Stringer, Sanchez-Roige and Treur2018]; DPW, drinks per week [Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022]; SmkInit, smoking initiation [Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022]; SmkCessation, smoking cessation [Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022]), psychiatric disorders and other mental health phenotypes (ADHD, attention deficit hyperactivity disorder [Demontis et al., Reference Demontis, Walters, Athanasiadis, Walters, Therrien, Nielsen and Zeng2023]; PTSD, post-traumatic stress disorder [Nievergelt et al., Reference Nievergelt, Maihofer, Klengel, Atkinson, Chen, Choi and Koenen2019]), biomarkers (Cot + HC, cotinine + 3-hydroxycotinine [Buchwald et al., Reference Buchwald, Chenoweth, Palviainen, Zhu, Benner, Gordon and Lehtimäki2021]), lung health-related traits (FEV1, forced expiratory volume in 1 s), risk tolerance (Linnér et al., Reference Linnér, Biroli, Kong, Meddens, Wedow, Fontana and Hammerschlag2019), socioeconomic status-related traits (Edu attainment, educational attainment [Lee et al., Reference Lee, Wedow, Okbay, Kong, Maghzian and Zacher2018]; TDI, Townsend Deprivation Index), executive function (EF [Hatoum, Morrison, et al., Reference Hatoum, Morrison, Mitchell, Lam, Benca-Bachman, Reineberg and Friedman2023]), and anthropometric measures (BMI, body mass index [Yengo et al., Reference Yengo, Sidorenko, Kemper, Zheng, Wood and Weedon2018]; height [Yengo et al., Reference Yengo, Vedantam, Marouli, Sidorenko, Bartell, Sakaue and Raghavan2022]). * indicates r gs that significantly differ between DSM-NicDep and FTND at α = 0.002 (Bonferroni correction for 24 comparisons).
Presumably due to the smaller sample size of the AFR data (reflected in the imprecise estimate of SNP-heritability, SNP-h 2 = 0.08, SE = 0.09), the standard errors of the genetic correlations were quite large (Supplemental Table 6). However, the genetic correlation between DSM-NicDep and ICD-TUD in the AFR ancestry was significant at p < 0.05 (r g = 0.58, p = 0.024) (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024).
Genomic structural equation models of broad addiction liability
In the EUR data, we found that a common genetic factor model with DSM-NicDep, PAU, OUD, and CanUD as indicators, similar to the Addiction-Risk-Factor presented in Hatoum et al. (Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022), fit the data well (Figure 2a and Supplemental Table 7). There were two differences between the Hatoum et al. Addiction-Risk-Factor and our modified model: (1) we used larger, more recent versions of the OUD (Deak et al., Reference Deak, Zhou, Galimberti, Levey, Wendt, Sanchez-Roige and Gelernter2022), PAU (Zhou et al., Reference Zhou, Kember, Deak, Xu, Toikumo and Yuan2023), and CanUD (Levey et al., Reference Levey, Galimberti, Deak, Wendt, Bhattacharya and Koller2023) GWASs; and (2) we removed the residual correlation between PAU and OUD, as this path was no longer significant. We compared the model with DSM-NicDep as the tobacco-related indicator (Figure 2a) to one where DSM-NicDep was substituted by the original PTU GWAS meta-analysis (FTND + CPD) from Hatoum et al. (Figure 2b and Supplemental Table 8), and one where DSM-NicDep was substituted by ICD-TUD (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024) (Figure 2c and Supplemental Table 9). Each of the modified models fit the data well, but the loading for DSM-NicDep was the largest of the three tobacco-related indicators, almost threefold larger than the loading for PTU in the equivalent model (0.86 vs. 0.30).

Figure 2. A modified Addiction-Risk-Factor model. This model is patterned upon the common factor model in Figure 1A of Hatoum et al., Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022, but updated with new, larger versions of the OUD (Deak et al., Reference Deak, Zhou, Galimberti, Levey, Wendt, Sanchez-Roige and Gelernter2022), PAU (Zhou et al., Reference Zhou, Kember, Deak, Xu, Toikumo and Yuan2023), and CanUD GWAS (Levey et al., Reference Levey, Galimberti, Deak, Wendt, Bhattacharya and Koller2023) and using three different phenotypes for tobacco GWAS. (a) DSM-NicDep. (b) PTU (Hatoum et al., Reference Hatoum, Johnson, Colbert, Polimanti, Zhou, Walters and Agrawal2022) GWAS. (c) ICD-TUD (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024). *Significant loadings at p < 0.05. Addiction-rf, the Addiction-Risk-Factor; CanUD, cannabis use disorder; DSM-NicDep, nicotine dependence; ICD-TUD, ICD-based tobacco use disorder; OUD, opioid use disorder; PAU, problematic alcohol use.
PGSs for DSM-NicDep, FTND, CPD, and ICD-TUD
Results from the item-level logistic regression models in the EUR subset of the NESARC-III sample are shown in Figure 3. As the sole PGS predictor (Figure 3a and Supplemental Table 10), the DSM-NicDep PGS was significantly associated with all 11 DSM criteria (p FDR < 0.05) and 2 of the 4 FTND items (how soon after waking is the first cigarette smoked, and whether one smokes more frequently during the early hours of the day). The DSM-NicDep PGS was also associated with the total number of endorsed DSM-5 TUD criteria (β = 0.116, SE = 0.025, p FDR = 7.1e−5, variance explained = 0.17%) and total number of endorsed FTND items (β = 0.033, SE = 0.010, p FDR = 0.003, variance explained = 0.08%). When the FTND PGS was the sole predictor, it was associated with all 4 FTND items and 9 of 11 DSM criteria (Figure 3b), as well as the total number of endorsed DSM-5 TUD criteria (β = 0.118, SE = 0.025, p FDR = 7.2e−6, variance explained = 0.2%) and total number of endorsed FTND items (β = 0.052, SE = 0.010, p FDR = 1.3e−6, variance explained = 0.2%; Supplemental Table 11). When all four PGS were included as predictors (Figure 3c and Supplemental Table 12), the DSM-NicDep PGS was no longer associated with any individual DSM or FTND items nor the total criterion counts. Overall, the ICD-TUD PGS was the strongest predictor of endorsement of individual DSM and FTND items, but the variance explained was still quite small (maximum variance explained = 0.716%).
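The FDR-corrected p-values reported above can be illustrated with a generic adjustment procedure. The sketch below implements the Benjamini-Hochberg step-up method; whether this specific procedure was the one applied in these analyses is an assumption, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four item-level tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```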

Figure 3. Polygenic scores (PGSs) for DSM-NicDep (a), FTND (b), and DSM-NicDep, FTND, ICD-TUD, and CPD (c) predict individual DSM-5 nicotine use disorder and FTND criteria and total criterion or item counts, respectively, in the European ancestry subset of the NESARC-III sample. Filled circles represent estimates that were significant after FDR correction, while open circles represent estimates that were not significant after FDR correction. Hazardous = Recurrent use in physically hazardous situations; Fail = Recurrent use resulting in failure to fulfill major role obligations at work, school, or home; Tolerance = Marked need for increased amount to get the same effect or diminished effect of the same amount; TimeSpent = Great deal of time spent in activities necessary to obtain or use; GiveUp = Important recreational, social, or occupational activities given up or reduced; Problems = Use despite knowledge of persistent/recurrent physical/psychological problems; Larger = Taken over larger amounts/longer periods than intended; Withdrawal = Withdrawal syndrome or use to relieve/avoid syndrome; Cutdown = Persistent desire or unsuccessful attempts to cut down or control use; Crave = Craving or strong urge or desire to use; Social = Persistent use despite recurring social/interpersonal problems caused or exacerbated by use; FTND1_within30min = How soon after you wake up do you smoke your first cigarette?; FTND2_prohibited = Do you find it difficult to refrain from smoking in places where it is forbidden?; FTND3_morning = Which cigarette would you hate most to give up?; FTND5_waking = Do you smoke more frequently during the first hours after waking than during the rest of the day?
In a model that included DSM-NicDep, FTND, and CPD PGSs as joint predictors, the DSM-NicDep PGS was significantly associated with ICD-TUD in the EUR subset of the BioVU biobank (odds ratio [OR] = 1.013, 95% CI = [1.01, 1.016], p = 5.7e−18), as was the PGS for CPD (OR = 1.017, 95% CI = [1.014, 1.02], p = 1.3e−31). The PGS for FTND was also significantly associated, but with an even smaller effect size (OR = 1.006, 95% CI = [1.003, 1.009], p = 1.6e−5). Altogether, the three PGSs explained 0.58% of the variance in ICD-TUD, while the DSM-NicDep PGS alone explained 0.12% of the variance in ICD-TUD after accounting for the variance explained by the other PGSs (Supplemental Table 13).
Discussion
In this first cross-ancestry GWAS of DSM-NicDep (N = 61,861), we replicated the well-known CHRNA5 risk locus. DSM-NicDep was genetically correlated with other substance use disorders, smoking-related phenotypes, and psychiatric disorders. In genomic SEM analyses, we found that DSM-NicDep was more closely related to the genetic underpinnings of other DSM- or ICD-defined substance use disorders than PTU (a phenotype that combined FTND and CPD measures).
ICD and DSM criteria focus on physical and psychological (and social, in DSM-5) impairment due to substance use and are mostly identical across substances. FTND, on the other hand, places greater emphasis on physiological aspects of dependence (e.g. items related to tolerance and withdrawal [Baker et al., Reference Baker, Breslau, Covey and Shiffman2012]). Because most GWASs of substance use disorders utilize ICD and DSM criteria, we hypothesized that this discrepancy might underlie the lower genetic correlation observed between nicotine dependence and other substance use disorders in prior twin and genome-wide correlation studies that used FTND. Indeed, the loading for DSM-NicDep in the Addiction-Risk-Factor genomic SEM was nearly three times that of the loading for PTU (0.83 vs. 0.30) (Figure 2). Notably, the loadings for OUD and PAU also increased in our model (from 0.83 to 1.0 and from 0.58 to 0.81, respectively), likely due to the larger sample sizes of these more recent GWAS. We also noted no significant residual correlation between OUD and PAU in our analysis.
In the large, independent BioVU biobank, the DSM-NicDep and CPD PGS showed significant associations with ICD-TUD of similar magnitude (ORs = 1.013–1.017), while the association with the FTND PGS was smaller (OR = 1.006). A very small proportion of variance was explained by the three PGSs (0.58%), reflecting underpowered GWASs and PGSs that are not currently suitable for individual-level prediction. PGS analyses of individual criteria in NESARC-III revealed that the DSM-NicDep PGS was associated with all individual DSM-5 TUD criteria and two of the FTND-specific items when it was the only predictor in the model (Figure 3a). However, when all four tobacco-related PGSs (ICD-TUD, FTND, CPD, and DSM-NicDep) were included as predictors, the DSM-NicDep PGS was no longer associated with any individual items (Figure 3c), nor the total criterion counts. In this multi-PGS model, the FTND PGS was associated with only 1 item (social, one of the three DSM-IV abuse criteria in the DSM-5 TUD diagnosis) while the ICD-TUD PGS was associated with 10 of the 11 DSM-5 criteria and all 4 FTND items, consistent with the ICD-TUD GWAS being much more statistically powerful than FTND or DSM-NicDep. Future item-level GWASs using novel structural equation modeling methods that bring together DSM and FTND items might be of high value in parsing whether genetic and clinical heterogeneity align.
We expected that DSM-NicDep would be more genetically correlated with psychopathology than FTND, as has been found in clinical and epidemiological studies (Breslau & Johnson, Reference Breslau and Johnson2000). The point estimates of our genetic correlation analysis suggested that PTU (CPD + FTND) was the least associated with most indices of psychopathology, while both DSM-NicDep and ICD-TUD were more strongly associated with psychosocial indices. DSM-NicDep appeared to additionally index material deprivation more strongly than the other traits, while all four tobacco GWASs (DSM-NicDep, ICD-TUD, FTND, and CPD) were equivalently related to respiratory (e.g. FEV1 and lung cancer) and metabolic markers of tobacco exposure.
The largest GWAS of CPD (Saunders et al., Reference Saunders, Wang, Chen, Jang, Liu and Wang2022) identified 140 genomic risk loci, compared with 5 in the largest GWAS of FTND (Quach et al., Reference Quach, Bray, Gaddis, Liu, Palviainen, Minica and Hancock2020) and 1 in the current study of DSM-NicDep. This likely reflects the much larger sample size in the CPD study. Despite a similar sample size, the FTND GWAS identified more loci than the current DSM-NicDep GWAS, possibly because that study used a linear model to analyze a categorical three-level (mild–moderate–severe) measure while our analyses relied on binary diagnoses. A recent GWAS of ICD-TUD that had a much larger sample size than ours (N = 898,680) identified 88 loci (Toikumo et al., Reference Toikumo, Jennings, Pham, Lee, Mallard and Bianchi2024). In the current study, DSM-NicDep was highly correlated with ICD-TUD. Given their high genetic correlation, future efforts may consider combining these diagnostic modalities as sources of information on tobacco use disorder.
We did not find any significant loci in the much smaller AFR or EAS analyses, nor could we precisely estimate SNP-heritability or genetic correlations due to the limited power. The lack of diversity in genetic data is unfortunate, as nicotine dependence and tobacco use disorder are leading contributors to mortality in worldwide populations (Le Foll et al., Reference Le Foll, Piper, Fowler, Tonstad, Bierut, Lu and Hall2022; Reitsma et al., Reference Reitsma, Kendrick, Ababneh, Abbafati, Abbasi-Kangevari, Abdoli and Gakidou2021). A related limitation was our inability to conduct sex- and birth cohort-stratified analyses; rates of tobacco use vary markedly according to both sex and birth cohort, which may in turn modify the extent to which genetic liability influences risk for DSM-NicDep. Amassing more genetic datasets with DSM-derived nicotine dependence would enable such valuable analyses.
In summary, our analyses highlight the importance of considering diagnostic assessment in genetic studies of substance use disorders. We found that DSM-NicDep was more closely related to a general addiction liability factor compared to a “PTU” phenotype that combined FTND and CPD. The DSM-NicDep PGS was significantly associated with ICD-TUD in an independent biobank and was associated with individual DSM-5 TUD criteria in another independent sample. Compared to FTND, DSM-NicDep was more strongly genetically correlated with ICD-TUD, PAU, SmkInit, cannabis ever-use, and material deprivation. Given the strong genetic correlation between DSM-NicDep and ICD-TUD, future analyses may consider combining data from DSM-NicDep and ICD-based studies of TUD to maximize sample size for gene discovery.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291725100883.
Acknowledgments
The authors gratefully acknowledge our contributing studies and the participants in those studies, without whom this effort would not be possible.
AGDS: The authors would like to thank all the participants for giving their time to contribute to this study. The authors wish to thank all the people who helped in the conception, implementation, beta testing, media campaign, and data cleaning. The authors would specifically like to acknowledge Ken Kendler, Patrick Sullivan, Andrew McIntosh, and Cathryn Lewis for input on the questionnaire; Lorelle Nunn, Mary Ferguson, Lucy Winkler, and Natalie Garden for data and sample collection; Natalia Zmicerevska, Alissa Nichles, and Candace Brennan for participant recruitment support; Jonathan Davies, Luke Lowrey, and Valeriano Antonini for support with IT aspects; and Vera Morgan and Ken Kirkby for help with the media campaign.
GBP: The authors would like to thank the participants for giving their time and support for this project. The authors acknowledge and thank M. Steffens for her generous donations in loving memory of J. Banks.
Funding support for the Comorbidity and Trauma Study (CATS) (dbGaP accession number: phs000277.v1.p1) was provided by the National Institute on Drug Abuse (R01 DA17305), and GWAS genotyping services at the CIDR at The Johns Hopkins University were supported by the National Institutes of Health (contract N01-HG-65403). The Christchurch Health and Development Study (CHDS: dbGaP in progress) has been supported by funding from the Health Research Council of New Zealand, the National Child Health Research Foundation (Cure Kids), the Canterbury Medical Research Foundation, the New Zealand Lottery Grants Board, the University of Otago, the Carney Centre for Pharmacogenomics, the James Hume Bequest Fund, US NIH grant MH077874, and NIDA grant “A developmental model of gene–environment interplay in SUDs” (R01DA024413) 2007–2012. The Collaborative Study on the Genetics of Alcoholism (COGA), led by Principal Investigators B. Porjesz, V. Hesselbrock, and A. Agrawal, Scientific Director A. Agrawal, and Translational Director D. Dick, includes 10 different centers: University of Connecticut (V. Hesselbrock); Indiana University (H.J. Edenberg, T. Foroud, Y. Liu, and M.H. Plawecki); University of Iowa Carver College of Medicine (S. Kuperman and J. Kramer); SUNY Downstate Health Sciences University (B. Porjesz, J. Meyers, C. Kamarajan, and A. Pandey); Washington University in St. Louis (L. Bierut, J. Rice, K. Bucholz, A. Agrawal, and S. Hartz); University of California at San Diego (M. Schuckit); Rutgers University (J. Tischfield, D. Dick, R. Hart, and J. Salvatore); The Children’s Hospital of Philadelphia, University of Pennsylvania (L. Almasy); Icahn School of Medicine at Mount Sinai (A. Goate and P. Slesinger); and Howard University (D. Scott). Other COGA collaborators include: L. Bauer (University of Connecticut); J. Nurnberger Jr., L. Wetherill, X. Xuei, D. Lai, and S. O’Connor (Indiana University); G. Chan (University of Iowa and University of Connecticut); D.B. Chorlian, J. Zhang, P. Barr, S. Kinreich, G. Pandey, and Z. Neale (SUNY Downstate); N. Mullins (Icahn School of Medicine at Mount Sinai); A. Anokhin, S. Hartz, E. Johnson, V. McCutcheon, and S. Saccone (Washington University); J. Moore, F. Aliev, Z. Pang, and S. Kuo (Rutgers University); and A. Merikangas (The Children’s Hospital of Philadelphia and University of Pennsylvania). H. Chin and A. Parsian are the NIAAA Staff Collaborators. The authors continue to be inspired by their memories of Henri Begleiter and Theodore Reich, the founding PI and Co-PI of COGA, and owe a debt of gratitude to other past organizers of COGA, including Ting-Kai Li, P. Michael Conneally, Raymond Crowe, and Wendy Reich, for their critical contributions. This national collaborative study is supported by NIH Grant U10AA008401 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA). Support for the Study of Addiction: Genetics and Environment (SAGE) was provided through the NIH Genes, Environment and Health Initiative (GEI; U01 HG004422; dbGaP study accession phs000092.v1.p1). SAGE is one of the genome-wide association studies funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for the collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (COGA; U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (COGEND; P01 CA089392; see also phs000404.v1.p1), and the Family Study of Cocaine Dependence (FSCD; R01 DA013423 and R01 DA019963).
Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR), was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). The GSMS project (phs000852.v1.p1) was supported by the National Institute on Drug Abuse (U01DA024413 and R01DA11301), the National Institute of Mental Health (R01MH063970, R01MH063671, R01MH048085, K01MH093731, and K23MH080230), NARSAD, and the William T. Grant Foundation. The authors are grateful to all the GSMS and CCC study participants who contributed to this work. The following grants supported data collection and analysis of CADD (dbGAP in progress): DA011015, DA012845, DA021913, DA021905, DA032555, and DA035804. ADAA was funded by NIH grant R01 AA017444. Brisbane Longitudinal Twin Study (BLTS) was supported by the United States National Institute on Drug Abuse (R00DA023549) and by the Australian Research Council (DP0343921, DP0664638, 464914, 619667, and FT110100548). Yale-Penn (phs000425.v1.p1 and phs000952.v1.p1) was supported by National Institutes of Health Grants RC2 DA028909, R01 DA12690, R01 DA12849, R01 DA18432, R01 AA11330, and R01 AA017535 and the Veterans Affairs Connecticut and Philadelphia Veterans Affairs Mental Illness Research, Educational, and Clinical Centers. 
The Australian Alcohol and Nicotine studies (OZ-ALC-NAG; phs000181.v1.p1) were supported by National Institutes of Health Grants AA07535, AA07728, AA13320, AA13321, AA14041, AA11998, AA17688, DA12854, and DA019951; by grants from the Australian National Health and Medical Research Council (241944, 339462, 389927, 389875, 389891, 389892, 389938, 442915, 442981, 496739, 552485, and 552498); by grants from the Australian Research Council (A7960034, A79906588, A79801419, DP0770096, DP0212016, and DP0343921); and by the 5th Framework Programme (FP-5) GenomEUtwin Project (QLG2-CT-2002-01254). Genome-wide association study genotyping at the Center for Inherited Disease Research was supported by a grant to the late Richard Todd, MD, PhD, former Principal Investigator of Grant AA13320. The Finnish Twin Cohort/Nicotine Addiction Genetics-Finland study was supported by the Academy of Finland (grants 213506 and 129680 to JK), NIH DA12854 (PAFM), Global Research Award for Nicotine Dependence/Pfizer Inc. (JK), Wellcome Trust Sanger Institute, UK, and the European Community’s Seventh Framework Programme ENGAGE Consortium (HEALTH-F4-2007-201413). In Finntwin12, support for data collection and genotyping has come from the National Institute on Alcohol Abuse and Alcoholism (grants AA-12502, AA-00145, and AA-09203 to RJR and AA15416 and K02AA018755 to DMD), the Academy of Finland (grants 100499, 205585, 118555, 141054, and 264146 to JK), and the Wellcome Trust Sanger Institute, UK.
Author contribution
ECJ, DL, HJE, JG, and AA conceived the study idea. ECJ, DL, APM, ASH, JDD, MJ, JVB, MG, KS, TT, TE, PJ, TK, CM, YN, TP, MS, PNRV, LW, and IRG contributed to data analysis. ECJ, DL, and AA drafted the manuscript, and all authors edited, reviewed, and approved the final manuscript.
Funding statement
We acknowledge the following sources of support: K01DA051759 (ECJ); T32DA007261 (JVB); K01DA058807 (JDD); K01AA031724 (APM); R01DA054869 (AA, HJE, and JG); DP1DA054394 and T32IR5226 (SSR); R01DA042755, U01DA042217, R01AG077742, R01DA054087, R01DA044283, DA05147, DA13240, DA02441, AA09367, AA11886, and MH066140 (TE, AG, EAW, SV, MM, and JJL); R01DA030976 (KCW and IRG); R01MH100141 (PNRV); R01MH123489, R01MH123619, and R01MH134284 (ARD); AUS NHMRC 464914 (IBH and NGM); R00DA023549 (NAG); and HSRI 67–095 (VS). LD acknowledges support from NHMRC Investigator Grant L3 (2016825), NHMRC Senior Principal Research Fellowship (1135991), the National Drug and Alcohol Research Centre (NDARC), and UNSW funded by the Australian Government Department of Health. The views expressed in this publication do not necessarily represent the position of the Australian Government.
The AGDS was primarily funded by the National Health and Medical Research Council (NHMRC) of Australia (Grant No. APP1086683) to NGM. This work was further supported by NHMRC grants (No. 1145645, 1078901, and 1087889). NGM is supported by an NHMRC Investigator Grant (No. APP1172990).
GBP: Data collection was funded and data analysis was supported by the Australian National Health and Medical Research Council (No. APP1138514) to SEM. SEM is supported by a National Health and Medical Research Council Investigator Grant (No. APP1172917).
Some of the datasets used for the analyses described were obtained from Vanderbilt University Medical Center’s BioVU, which is supported by numerous sources: institutional funding, private agencies, and federal grants. These include the NIH-funded Shared Instrumentation Grant S10RR025141 and CTSA grants UL1TR002243, UL1TR000445, and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962, R01HD074711, and additional funding sources listed at https://victr.vumc.org/biovu-funding/.
The Psychiatric Genomics Consortium’s Substance Use Disorders (PGC-SUD) Working Group receives support from the National Institute on Drug Abuse via R01DA054869. We gratefully acknowledge prior support from the National Institute on Alcohol Abuse and Alcoholism and support from the National Institute of Mental Health to the overall PGC. Statistical analyses were carried out on the NL Genetic Cluster Computer (http://www.geneticcluster.org) hosted by SURFsara.
Competing interests
The authors declare none.