Introduction
Inventories of species diversity created by conducting taxonomy and systematics research underpin many other research areas in biological sciences. Parasite species need to be discovered and characterised to be distinguished from other species. Undescribed species, as well as species not yet represented with sequence data, limit our ability to trace parasite diversity through space and time (Poulin et al. 2019). The number of species discoveries of parasitic helminths (hereafter parasites) has grown steadily for decades (Cribb et al. 2014; Poulin 2014; Poulin and Presswell 2016). Today, collaborations among researchers, taxonomists, and other experts contributing to new species publications result in higher quality and more comprehensive species characterisations than in the past (Poulin and Presswell 2016). Nevertheless, the field of parasite taxonomy faces an imminent crisis. It lacks turnover for a generation of prolific researchers approaching retirement, and funding for the discipline is decreasing (Brooks and Hoberg 2001; Cribb 2016; Pearson et al. 2011; Poulin and Presswell 2022). At current discovery rates, it would take centuries to systematically sample, collect, and name all vertebrate parasites (Carlson et al. 2020).
In times of a biodiversity crisis such as the one we are living in, research should speed up to fulfil the knowledge needs of society. But instead, parasitology is lagging behind. The popularisation of molecular methods in the last 50 years turned DNA sequences into a fundamental resource, together with morphological analysis, for the study of parasite diversity and species discovery, as well as a valuable tool to resolve phylogenetic relationships, parasite distribution range, and life cycles (Blasco-Costa et al. 2016; Blasco-Costa and Poulin 2017; Caira 2011; Perez-Ponce de Leon and Poulin 2018; Perkins et al. 2011). However, the adoption of molecular technological breakthroughs in parasitology research, whether the use of allozymes (see Glossary), nucleotide sequences, or more recently genomic data, progresses at a slower pace than in the general biology field (Selbach et al. 2019). Likewise, the implementation of diverse molecular approaches by parasitologists, like population genetics or eDNA (see Glossary) methods, is overdue (Bass et al. 2015; Criscione 2016; Hupało et al. 2025). Overall, research studies on parasites contributing or using molecular data represent only 5% of the total molecular research output, with veterinary and medically relevant species accounting for most of the publications (Selbach et al. 2019).
Here, I posit that the parasitology research community has not yet fully embraced molecular advances that allow the use of multiple independent loci (i.e., genome-wide markers; see Glossary) to study parasite species diversity, including species delimitation, phylogeography, and interspecies phylogenetic relationships. Aligned with Poulin’s (2025) idea, the parasite diversity discipline has mostly focused on breadth but has lacked depth in the application of advanced molecular tools to taxonomically complex scenarios. Caution in the interpretation of complex cases is a first step, as requested by Cribb et al. (2025), but it will not unequivocally resolve the complexity. To move forward, species delimitations of ‘taxonomically difficult’ cases sensu Cribb et al. (2025) and other research questions (see BOX 1 below) should be addressed using advanced molecular approaches providing genome-wide markers. Otherwise, we risk proposing erroneous or unsound species’ delimitation hypotheses, which could bias diversity estimates.
In this brief review, I first examine the recent historical trends in the molecular markers used specifically to study trematode diversity. I follow by pinpointing past and current molecular approaches that have failed or are struggling to take off in the field of parasitology, and discussing how the knowledge gained from the analysis of genome-wide markers would benefit research on parasite diversity today. Finally, I provide an overview of considerations for obtaining high-throughput molecular data of parasitic helminths. Although I focus on trematodes mostly, the issues I address apply broadly to other helminth parasite taxa as well.
Historical trends in the use of molecular markers for studying trematode diversity
Detecting genetic variation is a must for the study of diversity. Finding and interpreting variation at a particular genetic marker (see Glossary) will depend on the evolutionary rates of the gene or region, as well as on the scale of our question (e.g., variation at the population level within a species (intraspecific), or above the species level (interspecific) and higher taxonomic levels). One way to increase the chances of finding genetic variation is to increase the sample size of specimens analysed. Indeed, increasing the number of specimens in analyses increases the likelihood of detecting cryptic species (Perez-Ponce de Leon and Poulin 2018). Another means of increasing the chances of detecting genetic variation is to use a genetic marker with a high mutation rate, so that variation is accumulated more rapidly. Among typically employed molecular markers (see Glossary), the internal transcribed spacers (ITS, also referred to as ITS1-5.8S-ITS2) of the ribosomal (rRNA) gene are more variable than the large (28S) or the small (18S) ribosomal subunits (Nolan and Cribb 2005); the mitochondrial (mt) genes are more variable than the ribosomal genes and spacers (Avise et al. 1987; Brown et al. 1979; Vilas et al. 2005), and the microsatellite regions spread over the genome are more variable than mitochondrial genes (Avise 2004). In trematodes, the 28S and the ITS are predominantly used to assess species diversity, species delimitations, and phylogenetic relationships, with mt genes being commonly employed in studies focused on genetic variation within a species, at population and phylogeographic levels (Blasco-Costa et al. 2016).
Studies on cryptic species have focused more on obtaining mt cytochrome c oxidase subunit 1 (Cox1) sequences than on nuclear sequences (Perez-Ponce de Leon and Poulin 2018). Thus, over time, researchers are putting more effort into sequencing markers with higher evolutionary rates. However, are we expanding the choice of markers employed beyond these?
To update our knowledge on the current practice in the field, I performed a literature review by searching for publications in Web of Science (WoS) with title, abstract or keywords that included the following terms: TS= (‘*barcod*’ or ‘phylogen*’ or ‘population genetic*’ or ‘population genomic*’ or ‘phylogeog*’). I retained studies on trematodes for the period 2020–2023. I acknowledge that the literature search did not capture all the studies on trematode species diversity, but it provided a representative sample. I compared the results to those reported in Blasco-Costa et al. (2016), compiled from 252 articles published in 11 major parasitology journals from 2011 to 2015. The current data were mined from 434 articles including multiple species descriptions, but also diversity and lifecycle research studies, as well as many species diagnostic or epidemiology assessments of parasites of medical or veterinary importance.
In the last 4 years, the 28S rDNA subunit and the ITS regions remain the most employed molecular markers (Figure 1a). The percentage of studies using mt genes has increased, with the Cox1 being employed in over 40% of studies. However, the use of multilocus genetic markers, such as microsatellites, has not changed over time (see Blasco-Costa et al. 2016). Note that the percentage of studies exceeds 100% because several genetic markers can be employed in the same study. Indeed, the number of genetic markers used in any particular study has not changed either, with 25 to 35% of studies every year still limited to the use of a single genetic marker, and 30 to 40% of studies using two genetic markers, rarely more (Figure 1b). The number of mitogenomes of trematodes available at the National Center for Biotechnology Information (NCBI) now reaches 364 entries, representing 111 species, of which over half are of medical (24%, including zoonotic species) or veterinary (23%) importance. Except for the emergence of pioneer studies employing whole mitogenomes, the current use of genetic markers resembles closely that of a decade ago. Thus, trematode diversity research is highly conservative in the genetic resources and approaches utilised, restricted to a single locus or a combination of two, rarely three, mt and ribosomal loci, despite the rapid changes in molecular approaches and sequencing technologies in the last 20 years (Slatko et al. 2018). Although I investigated only the trematode literature herein, these results and conclusion apply to other parasite groups as well (see data in Rojas et al. 2025).
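The arithmetic behind percentages that exceed 100% can be illustrated with a minimal Python sketch. The four study records below are invented for illustration and are not taken from the literature review; the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical records: the set of markers sequenced in each study.
# Illustrative values only, not data from the literature review.
studies = [
    {"28S", "ITS"},
    {"28S", "Cox1"},
    {"Cox1"},
    {"28S", "ITS", "Cox1"},
]

def marker_percentages(studies):
    """Percentage of studies using each marker (a study counts once per marker)."""
    counts = Counter(marker for markers in studies for marker in markers)
    return {marker: 100 * n / len(studies) for marker, n in counts.items()}

def markers_per_study(studies):
    """Percentage of studies using exactly k markers."""
    counts = Counter(len(markers) for markers in studies)
    return {k: 100 * n / len(studies) for k, n in sorted(counts.items())}

per_marker = marker_percentages(studies)
per_count = markers_per_study(studies)
print(per_marker)                # each marker is counted in every study that used it
print(sum(per_marker.values()))  # exceeds 100 when studies combine markers
print(per_count)                 # these categories are exclusive and sum to 100
```

Per-marker percentages overlap because one study contributes to several markers, whereas the number of markers per study partitions the studies and therefore sums to 100%.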

Figure 1. Percentage of studies between 2020 and 2023 using a) different molecular markers, b) one or more molecular markers simultaneously. Annual variation in the percentage of studies represented by one or more distribution peaks per molecular marker and colour gradient according to increasing percentage. Abbreviations: 28S, large ribosomal DNA region; ITS, the gene cluster including the internal transcribed spacer 1, the 5.8S rDNA region and the internal transcribed spacer 2 (considered as 2 markers in Figure 1b since 5.8S is short and rarely used independently); Cox1, mitochondrial (mt) cytochrome c oxidase subunit 1; ITS2, internal transcribed spacer 2; 18S, small ribosomal RNA region; ITS1, internal transcribed spacer 1.
How the approaches we have missed make us fall behind
For some time already, taxonomists working on free-living taxa have proposed that detailed species diversity assessment and species delimitation require the sequencing of multilocus genetic markers (i.e., multiple loci/genome-wide markers) and integrative methods (Fujita et al. 2012; Sites Jr and Marshall 2004). Although integrative taxonomy (Dayrat 2005), using multiple lines of evidence, is often reported for species delimitation in parasite diversity research, the use of multiple loci with independent evolutionary histories is generally absent. Sequencing the whole mitogenome may seem like a wealth of data because of the number of functional genes it comprises. However, the phylogenetic information contained is limited because these genes are transmitted as a mostly non-recombining unit through female lines (Avise 2004). Thus, their phylogenetic information is linked and non-independent, equivalent to a single locus genetic marker. In parasites, single-locus (e.g., the broadly used barcoding mt Cox1 marker) or a few loci with linked evolutionary history (e.g., the different subunits and ITS of the ribosomal gene) are commonly considered effective proxies for species discovery by detecting species-level lineages, and specimen identification/diagnostics when the species is already characterised molecularly. Although this may be the case for many parasite species that have diverged for a long time, there exist several examples where interpretation of patterns in Cox1 and ribosomal markers is not straightforward (Cribb et al. 2025).
These cases would benefit from the use of genome-wide markers. Furthermore, mt and ribosomal markers are often not representative of the entire phylogenetic history of the species (i.e., the species tree), due to their idiosyncratic evolution (Brower et al. 1996; Dasmahapatra et al. 2010; Dupuis et al. 2012; Rubinoff et al. 2006). As reviewed quantitatively in the section above and discussed in Cribb et al. (2025), to this date, most genetic evidence used in species circumscription of trematodes is still based on the peculiar evolution of the most variable genes analysed, often a mt gene, though combined with other non-genetic data in an integrative manner.
Several approaches for obtaining genetic data from multiple loci have been used in research on free-living species, particularly for assessing species diversity and defining species boundaries, but these are dramatically absent in studies of parasitic helminths. Microsatellite loci are highly polymorphic tandem repeat regions found at high frequencies in the nuclear genome of most taxa. Each microsatellite locus consists of a tandem repetition of a short sequence (1–6 nucleotides) at a particular chromosomal location, with variation in the repeat copy number often underlying the presence of distinct alleles in a population (Avise 2004). Microsatellites were discovered in the late 1980s and became very commonly employed molecular markers to assess population differentiation and relationships among closely related species in a large variety of taxonomic groups (Avise 2004; Kalia et al. 2011; Selkoe and Toonen 2006). Conversely, they struggled to gain attention in the field of helminth parasitology. Performing a search in WoS with the terms in Table 1 and requiring the term ‘microsatellite*’, from 2000 to 2024, I compared the number of records from the general literature, including both free-living and parasitic taxa, to those on parasites, as well as to the number of records on trematodes exclusively. Only 1.3% of the studies using microsatellites are focused on parasitic helminths, and half of those were on trematodes (Figure 2). Microsatellites are still used today and are particularly suitable for diagnostics, where hundreds to thousands of samples of the same species are routinely analysed (Bazsalovicsová et al. 2020; Knapp et al. 2021; Umhang et al. 2021).
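To make the definition of a microsatellite concrete, the following minimal Python sketch scans a sequence for tandem repeats of a 1–6 nucleotide motif and reports the repeat copy number, so that two alleles differing in copy number can be told apart. The sequences, the function name, and the threshold of four copies are assumptions made for illustration; the sketch ignores imperfect repeats and sequencing error.

```python
import re

def find_microsatellites(seq, min_repeats=4, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats of short motifs.

    min_repeats and max_motif are illustrative defaults, not a standard.
    """
    # Shortest motif first (lazy quantifier), then at least
    # min_repeats - 1 further copies of that same motif (backreference).
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Two hypothetical alleles of one locus: same flanks, different copy number.
allele_1 = "GGTT" + "CA" * 5 + "TTGA"
allele_2 = "GGTT" + "CA" * 8 + "TTGA"

print(find_microsatellites(allele_1))
print(find_microsatellites(allele_2))
```

In practice, alleles are usually scored by fragment length or read counts across many individuals and loci; the sketch only shows why copy-number variation at a fixed chromosomal location yields distinguishable alleles.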
Table 1. Literature search terms and topics to extract the records from Web of Science and compare them across parasite groups and the general literature (including both, free-living and parasitic taxa)


Figure 2. Percentage of studies on helminths and trematodes from the general literature output (including free-living and parasitic taxa) using early approaches providing multiple independent loci: a) microsatellites and b) amplified fragment-length polymorphisms (AFLP) or restriction fragment-length polymorphisms (RFLP). Data from Web of Science, published in the period 2000–2024.
RFLP (restriction fragment-length polymorphisms) and AFLP (amplified fragment-length polymorphisms) are two other early approaches that generate data for multiple loci. In this case, the patterns of genomic DNA (gDNA; see Glossary) fragmentation across individuals are compared, but sequence data are not often obtained. They appeared in the early 1990s (Vos et al. 1995; Welsh et al. 1991; Williams et al. 1990) and lasted about a decade. They were mostly employed in population genetics, but AFLP found applications in forensic and genomic (see Glossary) analysis that require large numbers of qualitatively unlinked polymorphisms (Mueller and Wolfenbarger 1999). When dominant markers were suitably analysed, AFLPs allowed estimation of relatedness between individuals with a large number of loci. A search in the literature using the terms in Table 1 but requiring the terms AFLP AND RFLP produced only 1.4% of records associated with parasitic helminths, and only 0.2% included trematodes. Among parasitic helminths, most studies were conducted on parasitic nematodes, and the only trematodes studied were Fasciola spp. (Huang et al. 2004; Ichikawa-Seki et al. 2022). Nevertheless, both approaches were quickly superseded with the rise in popularity of the microsatellite markers that are also highly polymorphic and provide data on more informative codominant markers (Avise 2004).
With the advent of high-throughput sequencing, many novel approaches have emerged that allow molecular characterisation of multiple genomic loci for one or several specimens at an ever-decreasing cost (De Coster et al. 2021; Goodwin et al. 2016; Hook and Timp 2023; Wetterstrand 2023). Thousands of single nucleotide polymorphisms (SNPs; see Glossary) can be obtained, even assembled into the genomes of non-model organisms for which no previous genome resources may exist (Davey and Blaxter 2011; Luikart et al. 2019). However, parasite genomic (see Glossary) research is still in its infancy by the proportion of studies conducted, and it is mainly dominated by medically important species (Selbach et al. 2019). Today, the number of available genomes of trematodes reaches 67 entries at NCBI (of various completeness and quality), representing 34 distinct species, of which 62% are of medical relevance (including zoonotic species), 26% are of veterinary relevance, and only 12% are other trematodes. Although 34 species represent a negligible portion of the diversity of trematodes, other helminths are even less represented, with only 23 species of cestodes (38% of medical relevance) and a single species of acanthocephalan. Nematodes have the highest number of genomes sequenced among parasitic helminths, with 617 entries for 252 species/species-level lineages, of which 42% are non-parasitic, 39% are parasites of animals, 16% plant parasites, 2% facultative, and 1% symbionts. But nematodes are also the most species-rich helminth phylum.
This retrospective assessment highlights that molecular approaches providing genome-wide markers have found resistance in the trematode and the helminth parasitology field over the past four decades. However, by falling behind in the application of such approaches, we are lacking comprehensive datasets that allow us to confidently resolve ‘taxonomically difficult’ species boundaries, as well as fine-scale questions related to the evolution, phylogeography, population biology, and adaptation of these organisms, which are otherwise intractable with traditional genetic markers (Luikart et al. 2019; Thorn et al. 2023). We lack understanding of the role of important processes generating diversity, like hybridization (see Glossary), incomplete lineage sorting (see Glossary), or selection (Thorn et al. 2023). Besides, we are unable to challenge putatively erroneous interpretations of evolutionary histories generated with traditional markers (e.g., Andriollo et al. 2015; Glon et al. 2021), and we may bias species diversity estimates (e.g., Hupalo et al. 2023). Given that a cohort of highly prolific parasite taxonomists is approaching retirement without obvious turnover and that most species are only described once (Poulin and Presswell 2022), young taxonomists and researchers working on parasite diversity and evolution should embrace new molecular approaches and high-throughput technologies.
The use of genome-wide markers, together with integrative taxonomy, may partially compensate for the loss of expertise by contributing a wealth of molecular data that would increase confidence in species delimitations and in the distribution of species/populations over a range (phylogeography) (Vences et al. 2024). Key applications of genome-wide markers to diversity research questions in parasites (BOX 1) include the ‘taxonomically difficult’ cases sensu Cribb et al. (2025) such as species complexes and cryptic (or nearly cryptic) species, as well as species with a complex taxonomic history, recently diverged species, or a history of suspected mitonuclear discordance (see Glossary), introgression (see Glossary), or hybridization with other species (e.g., Cribb et al. 2022; Krupenko et al. 2022; Panzner and Boissier 2021; Tantrawatpan et al. 2021).
BOX 1: Key applications of genome-wide markers to diversity research questions in parasitic helminths
1. How many species are there?
• Provide comprehensive molecular evidence to resolve ‘taxonomically difficult’ species sensu Cribb et al. (2025), such as species complexes and cryptic (or nearly cryptic) species, but also polymorphic species with no evidence of divergence based on traditional markers.
• Validate species delineations in taxa with a complex taxonomic history, or suspected recent origin/divergence (e.g., taxa with limited or no variation in ribosomal markers but some variation in mt markers; or forming distinct clades with short branches in mt-based phylogenetic trees).
• Elucidate mitonuclear discordance, hybridization, and introgression among taxa with a suspected history.
2. What is the phylogeographic distribution of a taxon?
• Confirm distribution over a geographical range in cosmopolitan and widely distributed taxa.
• Evaluate taxa circumscriptions in species rich genera with wide distributions or species with disjoint distributions.
3. What are the evolutionary relationships among these taxa?
• Assess the systematics of a taxon, especially when traditional markers do not fully resolve the relationships among the included taxa.
Embracing omics in trematode and parasitic helminth diversity research
Perhaps sequencing whole genomes (see Glossary) is neither the most accessible nor the most appropriate strategy to study the diversity and evolution of non-model trematodes today. After all, obtaining whole genome data is still costly for most taxonomic projects, especially when species lack medical or veterinary relevance. To establish the relationships among taxa, we would need to invest in sequencing not only the species of interest, but also all the closely related species we would like to compare it to, given that few genomes are available to date (see section above). Instead, reduced-representation genome sequencing (see Glossary) or sequence-capture (see Glossary) approaches are best suited to provide genome-wide markers for diversity studies at multiple scales. Although still scarce in the trematode literature, a few recent examples exist on the use of genome-wide markers of trematodes or other Platyhelminthes that illustrate the application of multiple loci data to uncover species diversity, delimit species, and examine phylogeographic and population structure or evolutionary histories.
Feijen et al. (2022) used a combination of 35 diallelic, synonymous nuclear SNP positions in coding genes for about 400 specimens, the mt NADH-ubiquinone oxidoreductase chain 5, and 28S rDNA and ITS2 for a subset of specimens to assess cryptic species structure and phylogeographic patterns of what was, a priori, a single microphallid species across the Southern Alps of New Zealand. Their extensive sampling over the geographic range with ecological and molecular data for multiple independent loci allowed the authors to detect strong mitochondrial divergence across geographic regions dating from the Pleistocene, and mitonuclear discordance for populations in a potential secondary contact zone (i.e., divergent gene flow of nuclear and mitochondrial genomes). Furthermore, a thorough genetic characterisation and genetic delimitation allowed the recognition of a species complex, with at least three divergent putative species, although no taxonomic action could be taken since the specimens were juvenile metacercariae.
The true diversity of 139 plerocercoid specimens of Ligula intestinalis (L.) sensu lato (Cestoda: Diphyllobothriidea), a globally distributed tapeworm with a wide host spectrum, was analysed using reduced-representation genome-wide SNP data (including thousands of SNPs) and sequences for 3 mt genes (cytochrome b, cytochrome oxidase subunit 1, and NADH-ubiquinone oxidoreductase chain 1) from 5 biodiversity realms (Nazarizadeh et al. 2023). The authors unraveled at least 10 evolutionary lineages, with the deepest divergence c. 4.99–5.05 Mya, which is much younger than the diversification of the fish host genera and orders utilized. Their study also shows the impact of historical distribution shifts on host switching and the evolution of host specificity without parallel host-parasite cospeciation.
Using reduced-representation genome-wide SNP data (including thousands of SNPs) of over 400 adult specimens, Brabec et al. (2024) examined the reciprocal effects of the postglacial European whitefish adaptive radiations on the differentiation of Proteocephalus fallax La Rue, 1911 (Cestoda: Proteocephalidae) populations among lakes and within lakes across sympatric whitefish species. The authors found strong geographic differentiation of P. fallax populations among lakes, while postglacial whitefish intralake radiations prompted P. fallax host repertoire expansion in all lakes, with differentiation among whitefish hosts remaining incipient. Such recent differentiation was driven by the host ecology rather than host genetics, supporting differentiation by ecological fitting rather than by codivergence with the radiated hosts.
In a recent assessment of the current systematic classification of the Diplostomida, Locke et al. (2018) used mitogenomes and hundreds of ultra-conserved genomic elements (UCEs) for seven representative species to reconstruct the phylogeny of the order. The authors detected mitonuclear discordance in the Diplostomida, which was recovered as paraphyletic based on mitogenome phylogeny but monophyletic based on UCEs distributed across the nuclear genome. Both datasets supported monophyly of the Diplostomoidea and showed congruent relationships within it.
The few examples above often built upon questions emerging from previous studies based on classical molecular markers. Such studies are still relevant and necessary, although with limitations as mentioned in the sections above. Likewise, genomic datasets can be mined to extract popular markers of interest to address other taxonomic and diversity questions (Brabec et al. 2023). Therefore, the two types of studies should not walk different paths but provide feedback on each other’s results to move the field forward.
Challenges and recommendations for obtaining high-throughput molecular data of parasitic helminths
Even when the budget permits, there are practical challenges to our ability to get genome-wide markers from high-throughput sequencing of trematodes and other parasitic helminths. The first challenge is to obtain sufficient gDNA of acceptable quality from a single individual to successfully build the genomic library. At least three factors play a crucial role: the size of the worm, the preservation of the specimen, and the capacity of the extraction kit or protocol to recover gDNA. All else being equal, larger worms contain more cells and therefore more gDNA. Typically, adults are the largest developmental stage of parasites and thus the most suitable. But sometimes they are inaccessible, whereas juvenile or larval stages are more easily encountered in hosts or the environment. Depending on the particular taxa, it may be out of reach to use reduced-representation or whole genome sequencing approaches today (but see below). The type of tissue fixation and the preservative used for storage can also have an impact on the parasite gDNA yield. A priori, specimens extracted fresh should produce the best quality and quantity of gDNA, followed by specimens fixed alive in a buffer (e.g., RNAlater, DNAzol), flash-frozen in liquid nitrogen and stored dry at -80°C, or preserved in 95% ethanol before storage at -20°C (Mulcahy et al. 2016; Oosting et al. 2020). Although several studies have compared tissue preservation methods and storage conditions, a thorough systematic comparison of the most employed ones is lacking, especially for helminths. This is important, as we may be among the last generations of scientists to be allowed to collect specimens anew, given the increasing complexity of regulations for collecting vertebrate wildlife and their conservation status (Fukushima et al. 2021).
Thus, optimisation of the most adequate specimen preservation media for long-term storage in museum collections is needed to ensure the most data is retrievable from them in the future.
A great diversity of commercial extraction kits is available nowadays, in addition to traditional in-house protocols. To date, a few studies have tested gDNA yields from samples preserved in different fixatives and/or processed with different extraction kits or procedures (Ayana et al. Reference Ayana, Cools, Mekonnen, Biruksew, Dana, Rashwan, Prichard, Vlaminck, Verweij and Levecke2019; Doyle et al. Reference Doyle, Sankaranarayanan, Allan, Berger, Jimenez Castro, Collins, Crellen, Duque-Correa, Ellis, Jaleta, Laing, Maitland, McCarthy, Moundai, Softley, Thiele, Ouakou, Tushabe, Webster, Weiss, Lok, Devaney, Kaplan, Cotton, Berriman and Holroyd2019; Papaiakovou et al. Reference Papaiakovou, Pilotte, Baumer, Grant, Asbjornsdottir, Schaer, Hu, Aroian, Walson and Williams2018), but they are often restricted to the idiosyncrasies of the taxonomic group or type of environmental sample concerned. Thus, their results are only marginally informative for other parasitic taxa or developmental stages. Comparative studies applying similar kits and protocols to a variety of taxa are still lacking. Because most library preparation protocols for high-throughput sequencing have minimum gDNA input requirements and volume limitations, extraction methods that maximise yield recovery and allow low elution volumes are the most appropriate.
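The interplay between yield, elution volume, and library input requirements can be illustrated with a minimal Python sketch. All names and numbers here are hypothetical: the 50 ng minimum input and the 20 µl maximum input volume are placeholders rather than values from any particular kit, and real protocols may allow extracts to be concentrated before use.

```python
from dataclasses import dataclass


@dataclass
class Extract:
    """One gDNA extract; all fields are illustrative assumptions."""
    sample_id: str
    conc_ng_per_ul: float  # measured concentration, e.g., by fluorometry
    elution_ul: float      # volume in which the extract was eluted


def usable_input_ng(extract: Extract, max_input_ul: float) -> float:
    """gDNA mass (ng) that fits into the library reaction.

    The loadable volume is capped both by the protocol's maximum
    input volume and by the volume that was actually eluted.
    """
    loadable_ul = min(extract.elution_ul, max_input_ul)
    return extract.conc_ng_per_ul * loadable_ul


def meets_requirement(extract: Extract, min_input_ng: float,
                      max_input_ul: float) -> bool:
    """True if the extract can supply the minimum input mass."""
    return usable_input_ng(extract, max_input_ul) >= min_input_ng


# Two extracts with the same total yield (60 ng) but different elution volumes.
low_volume = Extract("worm_A", conc_ng_per_ul=3.0, elution_ul=20.0)
high_volume = Extract("worm_B", conc_ng_per_ul=0.6, elution_ul=100.0)

for extract in (low_volume, high_volume):
    ok = meets_requirement(extract, min_input_ng=50.0, max_input_ul=20.0)
    print(extract.sample_id, usable_input_ng(extract, 20.0), ok)
```

Under these assumed thresholds, only the extract eluted in the smaller volume passes, although both contain the same total amount of gDNA.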
Although reduced-representation and whole genome sequencing approaches are difficult to implement on individual parasite specimens with low DNA yields, it is possible to apply whole-genome amplification kits to the extracted gDNA (Doyle et al. Reference Doyle, Laing, Bartley, Britton, Chaudhry, Gilleard, Holroyd, Mable, Maitland, Morrison, Tait, Tracey, Berriman, Devaney, Cotton and Sargison2017; Shortt et al. Reference Shortt, Card, Schield, Liu, Zhong, Castoe, Carlton and Pollock2017; Small et al. Reference Small, Labbé, Coulibaly, Nutman, King, Serre and Zimmerman2019). However, this technique has a significant economic cost, and it can bias the sequence data by introducing technical artifacts that make amplified samples differ from unamplified ones (Sabina and Leamon Reference Sabina, Leamon and Kroneis2015; Taitt et al. Reference Taitt, Leski, Compton, Chen, Berk, Dorsey, Sozhamannan, Dutt and Vora2024; Tsai et al. Reference Tsai, Hunt, Holroyd, Huckvale, Berriman and Kikuchi2013). Thus, depending on the goal of the study, whole-genome amplification may be best avoided. An alternative is high-throughput sequencing of pooled samples (Pool-seq; see Glossary). This approach is very cost-effective but comes with its own limitations and complexities in data analysis (Schlötterer et al. Reference Schlötterer, Tobler, Kofler and Nolte2014).
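One source of the analytical complexity of Pool-seq is that allele frequencies are estimated from read counts, so the sampling of individuals into the pool and the sampling of reads both add noise. The Python sketch below is a simplified illustration, not a method taken from the cited studies. It assumes an idealised two-stage binomial model in which every chromosome in the pool contributes equally to the library and there are no sequencing errors; the function names and the example numbers are hypothetical.

```python
def poolseq_allele_freq(alt_reads: int, total_reads: int) -> float:
    """Estimate an allele frequency as the fraction of reads carrying it."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def poolseq_sampling_variance(p: float, n_chromosomes: int,
                              coverage: int) -> float:
    """Sampling variance of the Pool-seq estimate at one site.

    Assumed model: n_chromosomes are drawn binomially from a population
    with true frequency p, then `coverage` reads are drawn binomially
    from the pool. The law of total variance gives
    p(1-p) * (1/C + 1/N - 1/(C*N)).
    """
    c, n = coverage, n_chromosomes
    return p * (1.0 - p) * (1.0 / c + 1.0 / n - 1.0 / (c * n))


def individual_sampling_variance(p: float, n_chromosomes: int) -> float:
    """Variance when every chromosome is genotyped without error."""
    return p * (1.0 - p) / n_chromosomes


# Hypothetical pool of 20 diploid worms (40 chromosomes) at 50x coverage.
pooled = poolseq_sampling_variance(0.5, n_chromosomes=40, coverage=50)
individual = individual_sampling_variance(0.5, n_chromosomes=40)
print(poolseq_allele_freq(alt_reads=23, total_reads=50), pooled, individual)
```

Under this model the pooled estimate is always at least as noisy as individually genotyping the same worms, and the gap narrows as coverage increases.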
All samples need to be available before library preparation starts, as it is important to randomize the position of samples on the plate and to include technical replicates, both to avoid spurious results and to check for library contamination after sequencing. It is also wise to run a sequencing test on a subset of libraries before the library protocol is applied to all available samples. This allows potential problems to be solved without risking all the samples, time, and money on an unsuccessful library preparation. Furthermore, the efficacy of genomic approaches in addressing particular research questions can vary (e.g., sequence-capture versus ddRAD sequencing; Glon et al. (Reference Glon, Quattrini, Rodríguez, Titus and Daly2021)) and a few critical decisions are always required (e.g., choice of digestion enzymes; Brabec et al. (Reference Brabec, Gauthier, Selz, Knudsen, Bilat, Alvarez, Seehausen, Feulner, Præbel and Blasco-Costa2024)). Because there is no single best or most flexible method, researchers must consider the trade-offs of the different methods and choose the one best suited to their study goals.
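The randomization of sample positions with technical replicates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a standard 96-well plate (rows A to H, columns 1 to 12), replicates created by placing a randomly chosen subset of samples twice, and hypothetical sample names. The function name and the `_rep` suffix are inventions for this example.

```python
import random
from string import ascii_uppercase


def randomized_plate_layout(sample_ids, n_replicates, seed, rows=8, cols=12):
    """Assign samples, plus technical replicates, to random wells.

    n_replicates samples are drawn at random and placed a second time
    with the suffix '_rep'. Returns a dict mapping well names such as
    'A1' to sample labels; wells left unassigned are simply absent.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    replicated = rng.sample(sample_ids, n_replicates)
    labels = list(sample_ids) + [f"{s}_rep" for s in replicated]
    wells = [f"{ascii_uppercase[r]}{c + 1}"
             for r in range(rows) for c in range(cols)]
    if len(labels) > len(wells):
        raise ValueError("more libraries than wells on the plate")
    # Drawing wells without replacement scatters samples across the plate.
    chosen_wells = rng.sample(wells, len(labels))
    return dict(zip(chosen_wells, labels))


# Hypothetical batch of 40 specimens with 4 technical replicates.
samples = [f"S{i:02d}" for i in range(1, 41)]
layout = randomized_plate_layout(samples, n_replicates=4, seed=2024)
for well in sorted(layout)[:5]:
    print(well, layout[well])
```

Recording the seed together with the layout lets the plate map be regenerated later, which helps when tracing a suspected contamination back to neighbouring wells.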
Looking ahead
Certainly, the molecular data gathered over the past decades have contributed immensely to revealing the diversity of trematodes and other parasitic helminths, delineating species, and mapping their spatial and host distributions. But speciation is a continuous and complex process, involving the evolution of multiple, interacting reproductive barriers among populations that affect patterns of variation across the whole genome at different rates (Fujita et al. Reference Fujita, Leaché, Burbrink, McGuire and Moritz2012; Ravinet et al. Reference Ravinet, Faria, Butlin, Galindo, Bierne, Rafajlović, Noor, Mehlig and Westram2017). Although senior parasitologists were often not trained in genetics or genomics, the present generation of early-career researchers is educated in the foundations of genomics and genomic analyses and is well placed to apply these methods to parasites. In addition, strong collaborations with geneticists will speed up the progress of the parasitology community in this regard. The boom in sequencing technology and genomic approaches over the last two decades offers unprecedented opportunities to test species hypotheses in complex species assemblages by examining multiple loci across the genome and quantifying genetic variation, alongside other sources of evidence. Embracing this opportunity will bring us closer to understanding the evolutionary processes that shape the genomic differentiation of parasitic organisms, the links between patterns of genomic differentiation/divergence and phenotypes, and speciation in parasitic organisms. In a time when disease emergence and changes in species distributions occur faster than ever, progress on these broad and timely subjects is a must.
Glossary
- Allozymes: structurally variant forms of an enzyme with the same function, encoded by different alleles at the same locus. They exhibit high levels of evolutionary conservatism throughout phyla. They were used as molecular markers to infer the evolutionary histories and relationships among species prior to the use of DNA sequence data.
- eDNA/Environmental DNA: DNA of organisms collected from environmental samples such as soil, water, or even feces of other organisms. It can be sequenced by environmental genomic techniques and reveal the presence of species in an ecosystem, including microscopic and difficult-to-observe ones. eDNA is often degraded into small fragments.
- gDNA/Genomic DNA: chromosomal DNA, typically found in the cell nucleus. It contrasts with extra-chromosomal DNA, such as that found in the organelles of eukaryotic cells (e.g., mitochondria) or in plasmids.
- Genomics: interdisciplinary field of molecular biology focused on the structure, function, evolution, mapping, and editing of genomes. Genomics aims at the characterization and quantification of all genes of an organism, their interrelations, and their influence on the organism.
- Genome-wide markers: partial sequences or single nucleotide polymorphisms (SNPs) distributed across the genome of an organism. Genome-wide markers are also considered multiple independent loci if they are not closely linked or influenced by each other.
- Hybridization/Introgressive hybridization: the process of interbreeding between two different parent species to produce offspring that are genetically similar to both parent organisms. Hybridization occurs naturally, resulting in a greater genetic variety of plant and animal species. Simple hybridization results in a relatively even mixture of the genomes; gene and allele frequencies in the first generation will be a uniform mix of the two parental species.
- Incomplete lineage sorting: a phenomenon in evolutionary biology and population genetics that results in discordance between species and gene trees. In a speciation event, one or both daughter species inherit a subset of alleles present in the parental species (retention of ancestral polymorphism), contrary to complete lineage sorting where both daughter species will inherit all alleles of the gene in question. This results in differences between the average divergence time between genes and the divergence time between species.
- Introgression: the transfer of genetic material from one species into the gene pool of another following hybridization and repeated backcrossing to the parental species. It is a long-term process.
- Mitonuclear discordance: the phenomenon where the evolutionary history of mitochondrial (mt) DNA differs from that of nuclear (n) DNA. For example, phylogenetic trees based on mtDNA may depict two groups of organisms as closely related, while trees based on nDNA show they are not. This discordance can arise from various factors including incomplete lineage sorting, introgressive hybridization, and asymmetrical gene flow, among others.
- Molecular marker: also referred to as a genetic or DNA marker; a fragment of DNA sequence used to identify variations between individuals or within a population. The sequence may or may not correspond to a gene, and it is associated with a certain location within the genome, either in the cell nucleus (nuclear genetic marker) or its organelles (e.g., mitochondria or chloroplast genetic marker).
- Multiple independent loci: (singular ‘locus’) loci are used as equivalent to markers herein. Loci are the specific, distinct positions on chromosomes where particular genes or genetic markers are located. Loci are considered independent if they are sufficiently far apart that the inheritance of their alleles (variants of a gene) is not closely linked or influenced by each other.
- Pool-seq/Pooled sequencing: a sequencing approach in which the gDNA of multiple individuals is combined for the library preparation and sequenced in bulk to obtain genome-wide polymorphism data for the sample rather than the individual specimens. This is a cost-effective method, especially for non-model organisms of small size with low DNA quantities and large numbers of samples.
- Reduced-representation genome sequencing/approaches: DNA library preparation and sequencing protocols that analyze a portion of the genome, rather than the entire genome, to obtain partial fragments of DNA sequences or SNPs across the genome of an organism. By focusing on specific regions or fragments, these approaches make sequencing cost-effective and efficient, especially for non-model organisms and large numbers of samples.
- Sequence-capture methods: also known as target enrichment; a sequencing method that allows specific DNA regions of interest to be isolated and amplified. It uses synthetic oligonucleotide probes (known as baits) that hybridize to the target sequences in the gDNA.
- Single nucleotide polymorphism/SNP: (plural ‘SNPs’) a substitution of a single nucleotide at a specific position in the genome. It represents a common measure of genetic variation in an organism, and each variant may be referred to as an ‘allele’. SNPs may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes).
- Whole genome sequencing/approaches: also referred to as genome-wide approaches. DNA library preparation and sequencing protocols that yield the DNA sequence of the entire genome of an organism, including all its genes.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0022149X25100333.
Acknowledgements
This article is based on a keynote talk presented at the Trematodes 2024 conference, held in Brisbane, Australia, September 2024. I thank Scott Cutmore for inviting me to present at the conference. I am also grateful to Kristin Herrmann (Tarleton State University, USA) and Mar Llaberia-Robledillo (University of Valencia, Spain) for providing feedback on an earlier version of this manuscript. I also thank two anonymous reviewers for their helpful suggestions and comments, which helped to improve the original version of this manuscript.
Financial support
This work was supported by the Natural History Museum of Geneva, Switzerland.
Competing interests
The author declares no conflict of interest.
Ethical standard
Not applicable.