
Policlim: A Dataset of Climate Change Discourse in the Political Manifestos of Forty-Five Countries from 1990 to 2022

Published online by Cambridge University Press:  23 October 2025

Mary Sanford*
Affiliation: CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy; RFF-CMCC European Institute on Economics and the Environment, Milan, Italy; Centre for Research on Geography, Resources, Environment, Energy & Networks (GREEN), Bocconi University, Milan, Italy
Silvia Pianta
Affiliation: CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy; RFF-CMCC European Institute on Economics and the Environment, Milan, Italy; Centre for Research on Geography, Resources, Environment, Energy & Networks (GREEN), Bocconi University, Milan, Italy
Nicolas Schmid
Affiliation: CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy; RFF-CMCC European Institute on Economics and the Environment, Milan, Italy; INFRAS Research and Consulting, Zurich, Switzerland
Giorgio Musto
Affiliation: CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy; RFF-CMCC European Institute on Economics and the Environment, Milan, Italy; Nova School of Business and Economics, Lisbon, Portugal
*Corresponding author: Mary Sanford; Email: mary.sanford@cmcc.it

Abstract

With ambitious action required to achieve global climate mitigation goals, climate change has become increasingly salient in the political arena. This article presents a dataset of climate change salience in 1,792 political manifestos of 620 political parties across different party families in forty-five OECD, European, and South American countries from 1990 to 2022. Importantly, our measure uniquely isolates climate change salience, avoiding the conflation with general environmental and sustainability content found in other work. Exploiting recent advances in supervised machine learning, we developed the dataset by fine-tuning a pre-trained multilingual transformer with human coding, employing a resource-efficient and replicable pipeline for multilingual text classification that can serve as a template for similar tasks. The dataset unlocks new avenues of research on the political discourse of climate change, on the role of parties in climate policy making, and on the political economy of climate change. We make the model and the dataset available to the research community.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Introduction

As the impacts of the climate crisis become increasingly salient and a growing number of countries commit to climate mitigation and adaptation goals, climate change has gained prominence in the political arena. Academic interest in the politics of climate change has also increased, particularly focusing on wealthy, high-emission democracies (Green and Hale 2017; Hadden and Prakash 2024; Keohane 2015; Javeline 2014; Ross 2025). However, to date, there is no comprehensive and publicly available measure of climate change salience in political platforms across a wide set of countries. To address this gap, this article presents a dataset of climate change salience in political manifestos across forty-five European, OECD, and South American countries from 1990 to 2022, and a model which can be used to identify climate-relevant discourse in other political texts.

While previous work has studied climate change in political manifestos for subsets of countries, parties, or elections in Europe, the literature still lacks a measure of salience covering a wider geographic scope and an extensive time period. Such a measure can also facilitate the study of more detailed characterizations of climate policy goals and positions. With the exception of Carter et al. (2018), these studies do not provide their data to the research community at the manifesto level, hindering further applications of the data and comparison with other measures. Moreover, much of the previous work tends to conflate climate change issues with environmental issues more broadly, even using the terms interchangeably. We further argue for a finer distinction of the two concepts.

Using the manifestos collected by the Manifesto Project Dataset (Lehmann et al. 2024), we develop and rigorously validate a supervised machine learning model to identify climate change salience in our sample of political manifestos. Combining the strength of manual coding and the scale-up power of computational tools, we train our model on a set of double-coded manual annotations that follow an inductively derived annotation framework specifically focused on climate change relevance. While the Manifesto Project scores the manifestos for pro-environmental and sustainability or anti-growth positions (variables 501 and 416), it lacks a variable specific to climate change. Indeed, we find limited overlap between our measure of climate change salience and these two Manifesto Project variables in the final dataset.

We report the performance in terms of accuracy and F1 scores.[Footnote 1] In model training, we achieve accuracy and F1 scores of 0.936 and 0.948, respectively. In a randomly sampled post-hoc validation set, we achieve an accuracy score of 0.957 and an F1 score of 0.935, outperforming alternative approaches and other models trained for similar tasks (Dickson and Hobolt 2024).

We make the model and the dataset freely available to the research community.[Footnote 2] The model can be used to identify climate change discourse in other political texts, while the dataset enables further research on the political economy of climate change, the role of political parties in climate policy making, as well as party responsiveness and party competition.

Political Manifestos and Climate Change Salience

While less frequent than other forms of party communication, political manifestos are widely employed within political science as they offer official policy pronouncements and provide a strong basis for analysis of party platforms that is comparable over time and space (Budge 2002; Farstad 2018). There is an established literature using political manifestos to study party positions, issue salience, and other political dynamics. The Manifesto Project has developed several variables related to salience and party positions on key topics (Lehmann et al. 2024). Two of these variables measure pro-environmental and pro-sustainability or anti-growth positions (variables 501 and 416 in the Manifesto Project codebook). The latter have been used to study trends and drivers of the environmental positions of political parties and related dynamics of party competition and political behaviour (Aklin and Urpelainen 2013; Bez et al. 2023; Derndorfer et al. 2022; Facchini et al. 2017; Garritzmann and Seng 2023). However, very few contributions address climate policy in particular, and there are no extensive publicly available datasets of climate change salience or positions.

In political science, salience generally refers to how much different actors – members of the public, politicians, and other policymakers – engage with a given political issue (Moniz and Wlezien 2020). It constitutes a key aspect of the struggle for policy output, particularly in so far as it signals the prioritization of some issues over others, reflects what parties think their voters care about, and informs dynamics of party competition (Dolezal et al. 2014). Salience is more inclusive than measures of political position because it does not require the expression of a specific stance. Political actors can and often do discuss issues related to climate change without taking specific stances (Sanford and Painter 2024; Wetts 2020). Therefore, measuring the salience of climate change captures the full scope of signalling that parties can make about the issue and identifies where future work can focus efforts to characterize specific positions and other details.

There is some previous work mapping climate change discourse in political manifestos, mostly employing manual classification methods, and therefore having high reliability and precision in identifying climate change relevance but limited party, temporal, and geographic coverage. Farstad (2018) developed a measure of climate change salience for eighteen OECD countries for elections in the 2009–12 period. Carter et al. (2018) coded the climate mitigation policy positions of the top two parties in six Western EU countries from the mid-1990s to 2015. Schmid (2021) developed measures of party positions on energy technologies and related policy instruments in Germany, France, and the United Kingdom from 1980 to 2017. More recently, Schwörer and Fernández-García (2023) and Schwörer (2024) combined keyword search and manual coding to map climate policy positions of populist radical right parties and mainstream parties in ten Western European countries from the 1990s to 2022.

Meanwhile, there have been computational efforts, leveraging advances in supervised machine learning, to identify environmental salience as an approximation of climate change relevance. For example, Wappenhans et al. (2024) trained a model to detect references to sustainability and environmental policies, including those related to climate change, in the press releases of political parties in nine European countries. However, similar to the pro-environmental protection and pro-sustainability variables of the Manifesto Project, their operationalization also captures topics and issues which are not specific to climate change, e.g., asbestos and the illicit wildlife trade, and therefore does not provide a precise measure of climate change salience (full definition reproduced in Appendix 3).

Dickson and Hobolt (2024) present a multilingual model for identifying climate change issues in the press releases of political parties in nine European countries. However, their operationalization of climate change salience relies on the Comparative Agendas Project’s (CAP’s) Environment variable, which was designed to cover a wide scope of environmental issues, including climate change but also the protection of drinking water and disposal of hazardous waste (Alexandrova et al. 2014; full definition in Appendix 3). It therefore also does not provide a precise measure of climate change salience.

Finally, Webersinke et al. (2022) developed the ClimateBert model, which is meant to determine whether a paragraph of text contains climate-relevant content. The model was fine-tuned on different types of corpora, primarily academic research papers on climate change and corporate annual and sustainability reports. Here also, the definition of climate change underlying ClimateBert includes a wide range of content generally related to environmental protection and sustainability (examples from the data used to train the model are provided in Appendix 3). Moreover, although the model is available to the public, it is only trained on English-language texts and is therefore not suitable for multilingual corpora.

The objective of the present paper is therefore to develop a measure of climate change salience which captures all references to climate change as an issue and to objectives, policies, and actions with direct implications for climate change, while excluding references to other environmental problems and references to general sustainability goals which are not climate-change specific. This definition differs fundamentally from the work mentioned above, which combines content related to the environment and sustainability with climate-change-specific content. While these other measures, along with the Manifesto Project’s pro-environment and pro-sustainability/anti-growth variables, do capture content relevant to climate change, they are designed to include a wider spectrum of issues and thus do not provide a precise measure of climate change salience.

Moreover, the existing Manifesto Project pro-environment and pro-sustainability variables capture neither antagonistic nor obstructionist positions on climate change issues (for example, protecting the coal industry), which a salience variable does include. While our variable does not measure party positions on climate change, it takes care of the arduous prerequisite of identifying climate-relevant content in the manifestos, in favour of or against climate action, which can then be expanded with additional coding to identify positions, different policy approaches, and other details.

Defining Climate Change Salience

Accurately identifying all references to climate change is a non-trivial task because the issue is discussed in many different ways across countries and languages, generating an unknown topic scope that no previous empirical work has fully covered. Therefore, there is no structure we can assume a priori to exist. Moreover, as noted in Geese et al. (2024), it is difficult to define climate change relevance without ‘overlooking’ or ‘overstretching’. We therefore used an inductive approach, closely engaging with the manifesto texts over several iterations, to develop an annotation framework for coding the training set.

Our annotation framework is designed to capture references to: (1) climate change as an issue, (2) greenhouse gas (GHG) emissions, (3) climate change mitigation or adaptation objectives, and (4) any policy or action that has direct climate change mitigation or adaptation implications. This definition includes mentions of policies whose key objective is climate action, but also policies whose main objectives are not made explicit in the text but that unequivocally have impacts in terms of climate mitigation or adaptation. For example, we code as relevant references to the expansion of renewable energy production as it leads to lower GHG emissions. We code as relevant references to road pricing and low emission zones which might be introduced to address air pollution but unequivocally have impacts in terms of GHG reduction. Instead, if a quasi-sentence only contains references to the general management of transport systems, without relating it to emissions reduction or other climate-relevant goals, we code it as not relevant.

Moreover, we code as relevant references to reforestation, deforestation, or afforestation, as we know they have climate implications in terms of carbon capture, while we code as not relevant generic references to ‘sustainable forestry’ without further context, as the climate implications of the phrase alone are not clear. While climate mitigation and adaptation represent different political issues, we include both as relevant since they are both fundamentally climate issues. Finally, we code as not relevant references to other environmental problems (such as soil or water pollution) and generic references to sustainability (such as advocacy for sustainable development) without clear references to climate issues. The full annotation framework and examples are provided in Appendix 5.

Manifesto Data

We collected from the Manifesto Project Dataset (MPD) all available manifestos of parties that obtained at least a 5 per cent vote share in any national election after 1990 in forty-five OECD, European Union, and South American countries (see Table A4 for the full list).[Footnote 3] This sums to 1,792 manifestos from 620 parties in forty-five countries spanning twenty-six languages. The manifestos are unitized by human coders at what the Manifesto Project defines as the quasi-sentence level. Each quasi-sentence is intended to capture a single statement or ‘message’. Our sample contains over 1.8 million quasi-sentences. See Appendices 1 and 2 for more discussion of the collection process, quasi-sentences, and the structure of the MPD.

Supervised Machine Learning Pipeline

We designed a supervised machine learning pipeline to develop our measure of climate change salience. We performed a purposive selection of the training set, employed an ‘active learning’ approach to maximize the performance of our model, validated the performance in random samples of predictions, and carried out a post-hoc manual correction of the quasi-sentences that the model was least confident in predicting. Each step is summarized below and discussed in further detail in Appendices 4–9. Figure 1 visualizes the pipeline for additional clarity. The primary steps included: (1) defining the variable and the annotation framework; (2) selecting and annotating the training set; (3) fine-tuning the model’s hyper-parameters and cross-validating the model on the training set; (4) running and validating the initial model; (5) expanding the training set to account for errors in the initial model; (6) running and validating the final model; and (7) correcting the most uncertain predictions post-hoc. We trained the model originally only on manifestos from the EU and then applied the validated model to the rest of the countries in our sample.

Figure 1. Key steps of our methodological pipeline. The F1 scores provided for each model run correspond to the performance of each model in post-hoc validation samples.

Model Selection

The volume and linguistic diversity of our sample of manifestos necessitated a scalable computational method capable of yielding high accuracy and reliability for multilingual corpora. Multilingual transformer models have become the state of the art for such tasks (Rodriguez and Spirling 2022; Licht and Lind 2023). Transformer models are pre-trained on extremely large datasets to predict missing words from their surrounding context (Devlin et al. 2019; Vaswani et al. 2017). They can then be fine-tuned on additional labelled data for more specific tasks. We employ the multilingual transformer XLM-RoBERTa, which has been pre-trained on text from over 100 languages, thereby enabling classification of multilingual texts within a single model (Conneau et al. 2020). We fine-tune the model for our classification task by training it on a set of manual annotations for climate change salience in the manifestos.

Training Set Construction

We develop a custom process to construct the training set in order to maximize the performance of the model. This process is summarized below and additional detail is provided in Appendix 4. To adequately train the model, we needed a sufficient number of positive and negative cases – that is, quasi-sentences that are obviously climate-relevant, quasi-sentences that are obviously not climate-relevant, and quasi-sentences mentioning sustainability or environmental issues that are, however, not climate-specific. This included an initial set of 2,598 quasi-sentences selected based on a set of keywords as well as some randomly selected quasi-sentences which we then translated to English and manually annotated for climate relevance.[Footnote 4] Each quasi-sentence was then double-coded and the average intercoder reliability score, determined by Cohen’s kappa (Cohen 1960), was 0.915. All conflicting annotations were discussed in meetings where at least three coders were present to align conflicts and ensure full agreement across the training set.
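As an illustration of the intercoder reliability statistic reported here, the following is a minimal sketch of Cohen's kappa for two coders. The function name and the toy labels are hypothetical and are not taken from the article's data or code.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels (1 = climate-relevant, 0 = not relevant); illustrative only.
coder_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
coder_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(coder_a, coder_b)
```

In this toy example the coders agree on nine of ten items, but kappa is lower than 0.9 because part of that agreement would be expected by chance given how often each coder uses each label.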

We then trained a preliminary model on this set of annotations, with the quasi-sentences in their original language. Examining the predictions revealed some systematic errors on themes that did not appear in the training set. We annotated a random sample of these predictions (stratified by language, n = 836) using the same procedure as the initial training set (Cohen’s kappa score = 0.757), including alignment to full agreement, and added them to the training set. This approach of ‘active learning’ built robustness into the training set, as we were able to learn from the initial model’s mistakes, thereby accounting for weaknesses in the topic scope of the initial set of quasi-sentences. Indeed, the performance of the second model in post-hoc validation increased by 20 percentage points relative to the initial model (we provide more information on performance metrics below). We use this expanded set (n = 3,434) for the final model training.
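The sampling step in this active learning round can be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: the record fields (`language`, `text`, `predicted`), the fixed per-language quota, and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for stratum in sorted(strata):
        items = strata[stratum]
        # Small strata contribute everything they have.
        k = min(per_stratum, len(items))
        sample.extend(rng.sample(items, k))
    return sample


# Toy predictions from a preliminary model; illustrative only.
predictions = [
    {"language": lang, "text": f"{lang}-{i}", "predicted": i % 2}
    for lang, count in (("de", 5), ("fr", 3), ("it", 1))
    for i in range(count)
]
to_annotate = stratified_sample(predictions, key="language", per_stratum=2)
# After manual annotation, these records would be appended to the training set.
```

Stratifying by language keeps low-resource languages represented in the batch sent to annotators, which a simple random draw over the whole corpus would not guarantee.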

Model Training and Performance

Before fine-tuning the base XLM-RoBERTa model to our classification task, we fine-tuned the model hyper-parameters (see Appendix 7 for the final specifications) and conducted five-fold cross validation on the training set, stratifying the folds by language and salience class, to confirm the stability of the model. Table 1 summarizes the average accuracy and F1 score of the model in the cross-validation. For comparison, we also include the performance of (1) a simple keyword search employing the (stemmed) keywords: ‘climate’, ‘climate change’, ‘global warming’, ‘emissions’, ‘renewable’, ‘environment’, and ‘sustainability’ (‘full keyword search’); (2) a keyword search using the set of (stemmed) keywords most likely to be present in positive cases – ‘climate change’, ‘global warming’, ‘climate’, ‘emissions’, ‘renewable’ (‘high probability keyword search’); and (3) the ClimateBert model run over the English translations of all the quasi-sentences in the annotated set.
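One way to build folds stratified jointly by language and salience class is sketched below. It is a simplified stand-in for whatever tooling the authors used; the function name, the item fields, and the round-robin assignment are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(items, strata_key, n_folds=5, seed=0):
    """Assign item indices to folds so each stratum is spread evenly."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, item in enumerate(items):
        by_stratum[strata_key(item)].append(index)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for stratum in sorted(by_stratum):
        indices = by_stratum[stratum]
        rng.shuffle(indices)
        # Deal indices round-robin; the running offset keeps fold sizes balanced.
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


# Toy annotated set: 2 languages x 2 classes x 10 quasi-sentences each.
items = [
    {"language": lang, "label": label}
    for lang in ("de", "fr")
    for label in (0, 1)
    for _ in range(10)
]
folds = stratified_folds(items, lambda item: (item["language"], item["label"]))
```

Each fold then serves once as the held-out set while the model is trained on the other four, and the reported scores are averages over the five runs.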

Table 1. Accuracy and F1 scores for the training set. Classification methods: (1) Full Keyword Search – Keyword search using: ‘climate change’, ‘global warming’, ‘climate’, ‘emissions’, ‘renewable’, ‘environment’, and ‘sustainability’; (2) High Probability Keyword Search – Keyword search using the high probability set: ‘climate change’, ‘global warming’, ‘climate’, ‘emissions’, and ‘renewable’; (3) ClimateBert; and (4) average five-fold cross-validation of our model. Highest values per metric shown in bold

The performance of the full keyword search is the weakest. The high probability keyword search yields higher performance, similar to that of ClimateBert. However, our model outperforms both by over 8 percentage points on both metrics. This is likely due to (a) our model generalizing to climate-relevant content that does not contain these keywords and (b) the stricter definition of climate change salience underlying our model, which, as mentioned above, was formulated to exclude generic pro-environmental and pro-sustainability content, in contrast to the broader definition employed by ClimateBert. Our model also exceeds human intercoder reliability scores documented for similar tasks, as well as our own intercoder reliability score for the training set annotations.
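The two keyword baselines can be approximated with a few lines of pattern matching. The sketch below assumes English text and uses hand-written stems chosen for illustration; the article does not list its exact stems or stemming method. Note that the stem `climat` already covers 'climate change'.

```python
import re

# Hypothetical stems approximating the article's keyword lists.
FULL_STEMS = ("climat", "global warming", "emission", "renewabl",
              "environment", "sustainab")
HIGH_PROBABILITY_STEMS = ("climat", "global warming", "emission", "renewabl")


def build_matcher(stems):
    """Compile a case-insensitive pattern matching any stem at a word start."""
    pattern = r"\b(?:" + "|".join(re.escape(stem) for stem in stems) + r")"
    return re.compile(pattern, flags=re.IGNORECASE)


def keyword_predict(text, matcher):
    """Return 1 if any stem occurs in the text, else 0."""
    return 1 if matcher.search(text) else 0


full_matcher = build_matcher(FULL_STEMS)
high_matcher = build_matcher(HIGH_PROBABILITY_STEMS)

sentences = [
    "We will expand renewable energy production.",
    "We will protect the environment from asbestos.",
    "We will lower taxes for families.",
]
full_labels = [keyword_predict(s, full_matcher) for s in sentences]
high_labels = [keyword_predict(s, high_matcher) for s in sentences]
```

The second sentence shows why the full list is the weaker baseline under a climate-specific definition: 'environment' fires on content about asbestos, which the annotation framework treats as not climate-relevant.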

Post-hoc Validation

After testing the model performance in the training set, we applied the model to the rest of the corpus to produce predictions of climate change salience for each quasi-sentence. We evaluated the model performance outside the training set by manually checking random samples of predictions (n = 2,103). In this post-hoc validation set, our model achieved an F1 score of 0.935, more than 20 percentage points higher than that of ClimateBert and nearly triple the score of either keyword search method. Details of the validation sample and full performance metrics are provided in Appendix 8.
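For readers who want to reproduce this kind of check on their own validation sample, the two reported metrics can be computed as below. The sketch assumes F1 is calculated for the climate-relevant class; the averaging scheme used in the article is not stated in this section, and the labels are toy values.

```python
def accuracy_and_f1(gold, predicted, positive=1):
    """Accuracy over all items and F1 for the positive class."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy manual checks (gold) against model predictions; illustrative only.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, f1 = accuracy_and_f1(gold, predicted)
```

Because climate-relevant quasi-sentences are a small minority of the corpus, accuracy alone can look high even for a weak classifier, which is why F1 is reported alongside it.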

Post-hoc Correction

Finally, to maximize the reliability of the published dataset, we undertook a process of post-hoc corrections of the quasi-sentences which the model was least confident in predicting. This is not done to inflate the predictive power of the model – the samples for validation are taken before the correction process – but rather as a way of minimizing the number of quasi-sentences the model may have mislabelled so that the dataset can be of the most utility for use by the research community.

The vast majority of predictions are made with extremely high confidence, that is, probability scores exceeding 0.9 for either class, but there is a small set of quasi-sentences to which the model assigned very similar probability scores for each class. Thus, we examined all quasi-sentences with probability scores between 0.4 and 0.6 for climate change salience – 2,002 quasi-sentences in total across all countries in the final dataset (0.1% of all predictions). One author reviewed each of these quasi-sentences, and the resulting manual code became the final prediction value.
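The correction step amounts to flagging a probability band for review and letting the manual code override the model. A minimal sketch follows; the field names, the inclusive band boundaries, and the 0.5 decision threshold for unreviewed items are assumptions, not details confirmed by the article.

```python
def flag_uncertain(predictions, lower=0.4, upper=0.6):
    """Return predictions whose climate probability falls in the review band."""
    return [p for p in predictions if lower <= p["prob_climate"] <= upper]


def apply_corrections(predictions, manual_labels, threshold=0.5):
    """Final label per id: the manual code if reviewed, else the thresholded model score."""
    final = {}
    for p in predictions:
        if p["id"] in manual_labels:
            final[p["id"]] = manual_labels[p["id"]]
        else:
            final[p["id"]] = 1 if p["prob_climate"] >= threshold else 0
    return final


# Toy model outputs; illustrative only.
predictions = [
    {"id": "a", "prob_climate": 0.97},
    {"id": "b", "prob_climate": 0.55},
    {"id": "c", "prob_climate": 0.42},
    {"id": "d", "prob_climate": 0.03},
]
to_review = [p["id"] for p in flag_uncertain(predictions)]
# Hypothetical reviewer decisions for the flagged items.
manual_labels = {"b": 0, "c": 1}
final_labels = apply_corrections(predictions, manual_labels)
```

Validation samples should be drawn before this override is applied, as the article notes, so that reported performance reflects the model rather than the manual corrections.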

Presenting the Dataset

The policlim dataset provides a measure of climate change salience for the 1,792 manifestos included in our sample, covering 620 parties in forty-five OECD, EU, and South American countries for the 1990–2022 period. For each manifesto, the measure of climate change salience is defined as the number of climate-relevant quasi-sentences divided by the total number of quasi-sentences in the manifesto.
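The manifesto-level measure is a simple share, which can be computed from quasi-sentence labels as in this sketch. The manifesto identifiers and the input layout are hypothetical.

```python
from collections import defaultdict


def salience_by_manifesto(quasi_sentences):
    """Share of climate-relevant quasi-sentences per manifesto.

    `quasi_sentences` is an iterable of (manifesto_id, label) pairs,
    where label is 1 for climate-relevant and 0 otherwise.
    """
    relevant = defaultdict(int)
    total = defaultdict(int)
    for manifesto_id, label in quasi_sentences:
        relevant[manifesto_id] += label
        total[manifesto_id] += 1
    return {mid: relevant[mid] / total[mid] for mid in total}


# Toy labels for two hypothetical manifestos.
quasi_sentences = [
    ("party_A_2017", 1), ("party_A_2017", 0), ("party_A_2017", 0), ("party_A_2017", 0),
    ("party_B_2017", 0), ("party_B_2017", 0),
]
salience = salience_by_manifesto(quasi_sentences)
```

Here the first manifesto has a salience of 0.25 (one of four quasi-sentences) and the second a salience of 0.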

In total, 96,253 quasi-sentences – 5.2% of the full dataset – were predicted as relevant to climate change. This compares with 8.4% of quasi-sentences labelled either as referring to pro-environmental protection (501) or pro-sustainability or anti-growth (416) by the Manifesto Project annotators.[Footnote 5] Only 30% of the quasi-sentences that are coded as 501 or 416 by the MPD are also predicted by our model to be relevant to climate change.

Further to the point, Table 2 shows the cross-tabulation of the ten MPD variables [Footnote 6] with the most overlap with our climate variable. We see that 51% of quasi-sentences identified as climate-relevant by our model were given neither the pro-environmental (501) nor the pro-sustainability/anti-growth (416) label. Instead, there is a long tail of categories to which the climate-relevant content belongs. These results show that our measure of climate change salience is capturing the multidimensional nature of climate policy in ways that the 501 and 416 variables on their own do not, confirming the need to disambiguate climate change from environment- and sustainability-related issues that are not climate-specific within the research space.
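The two overlap figures discussed here correspond to two different conditional shares, which the sketch below makes explicit. The records are toy values, and the MPD codes other than 501 and 416 are placeholders used only to stand in for the long tail of categories.

```python
from collections import Counter

ENVIRONMENT_CODES = {"501", "416"}


def overlap_summary(records):
    """Cross-tabulate MPD codes against the climate label.

    `records` is an iterable of (mpd_code, climate_label) pairs.
    """
    records = list(records)
    climate_codes = Counter(code for code, label in records if label == 1)
    n_climate = sum(climate_codes.values())
    n_env = sum(1 for code, _ in records if code in ENVIRONMENT_CODES)
    n_both = sum(climate_codes[code] for code in ENVIRONMENT_CODES)
    return {
        # Climate-relevant quasi-sentences per MPD code, most frequent first.
        "top_codes": climate_codes.most_common(10),
        # Share of climate-relevant content coded as neither 501 nor 416.
        "climate_outside_env": (n_climate - n_both) / n_climate if n_climate else 0.0,
        # Share of 501/416 content that is also climate-relevant.
        "env_also_climate": n_both / n_env if n_env else 0.0,
    }


# Toy quasi-sentences: (MPD code, predicted climate label); illustrative only.
records = [
    ("501", 1), ("501", 1), ("501", 0), ("501", 0), ("416", 0),
    ("411", 1), ("703", 1), ("305", 0),
]
summary = overlap_summary(records)
```

Keeping the two denominators separate matters: the first share conditions on climate-relevant content, the second on content coded 501 or 416.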

Table 2. Overlap between our measure of climate salience and Manifesto Project variables (code number and descriptive label). Only the ten MPD variables with the highest overlap with our measure of climate salience are listed

To illustrate the value of our climate salience measure, we report a few descriptive statistics. First, Figure 2 plots the mean climate change salience aggregated across all parties in each country over time. With the exceptions of Argentina and Brazil, climate change salience has increased in all countries, but significant variation across countries clearly exists, with parties in Northern and Western Europe having the highest share of climate-salient quasi-sentences.
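A country-level series of this kind can be derived from manifesto-level salience by averaging within each country and election year. The sketch below uses hypothetical field names and toy values, and assumes an unweighted mean across parties.

```python
from collections import defaultdict


def mean_salience_by_country_year(manifestos):
    """Unweighted mean salience across parties for each (country, year)."""
    grouped = defaultdict(list)
    for manifesto in manifestos:
        grouped[(manifesto["country"], manifesto["year"])].append(manifesto["salience"])
    return {key: sum(values) / len(values) for key, values in sorted(grouped.items())}


# Toy manifesto-level records; illustrative only.
manifestos = [
    {"country": "DE", "year": 2017, "salience": 0.06},
    {"country": "DE", "year": 2017, "salience": 0.10},
    {"country": "DE", "year": 2021, "salience": 0.12},
    {"country": "FR", "year": 2017, "salience": 0.04},
    {"country": "FR", "year": 2017, "salience": 0.02},
]
country_means = mean_salience_by_country_year(manifestos)
```

Weighting parties by vote share would be a reasonable alternative aggregation, depending on the research question.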

Figure 2. Average climate change salience over time for each country in our sample. Note: The panel for Uruguay (UY) shows only one point because we only have the manifestos from one election (2014).[Footnote 8]

Using the party family categorization of the Manifesto Project team (Mair and Mudde 1998), Figure 3 breaks down the country-level trends between right- and left-leaning parties. Again, for most countries, we see a clear increasing trend for both right- and left-leaning parties, with climate salience being in most (but not all) cases higher for the latter. It should be noted that, among the several factors affecting issue ownership of climate change by left- and right-wing political parties, the presence of strong Green parties is one important element, driving the spike in salience for left-wing parties in countries such as Denmark (2007 elections) or Brazil (in 2010).[Footnote 7]

Figure 3. Mean share of climate-relevant quasi-sentences per manifesto for right- and left-leaning parties within each country over time. We use the MPD’s party family variable to group the parties into right- and left-leaning groups. The right-leaning group includes parties in the Christian Democrat, Conservative, and Nationalist and radical right families. The left-leaning group includes the families Social Democrats, Socialist and other left, and Ecological. Note: For simplicity, we exclude the Liberal party family from this visualization because it contains both traditionally left-leaning (the UK’s Liberal Democrats) and right-leaning (Germany’s Free Democratic Party) parties.

To further explore the relationship between political orientation and the importance of climate change as an issue for European parties, Figure 4 displays a scatterplot of average climate change salience and a right–left score for all parties across all elections in our sample, based on the party-average standardized right–left index presented in Lowe et al. (2011).[Footnote 9] The two variables are negatively correlated, as evidenced by the trend line: on average, parties to the left of the political spectrum tend to discuss issues related to climate change more in their manifestos than those to the right.

Figure 4. Scatterplot of average climate change salience and standardized right–left index for parties averaged across all elections in our sample. The right–left index measures the right- versus left-leaning of the positions articulated in each party’s manifesto, according to Lowe et al. (2011). The trend line fits a linear model and the shaded area represents 95 per cent confidence intervals.
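A trend line of this kind is an ordinary least squares fit of salience on the right–left index. The sketch below computes the slope, intercept, and Pearson correlation on toy party averages; it assumes higher index values mean more right-leaning positions and does not reproduce the confidence band.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Toy party averages; illustrative only, not values from the dataset.
right_left = [-2.0, -1.0, 0.0, 1.0, 2.0]
salience = [0.09, 0.07, 0.06, 0.04, 0.03]
slope, intercept = linear_fit(right_left, salience)
r = pearson_r(right_left, salience)
```

A negative slope in this setup corresponds to the pattern described in the text, with salience falling as parties move to the right.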

The plot highlights some of the largest parties and notable outliers in our sample, showing the potential of the dataset for identifying climate change discourse of specific parties. The biggest European parties are mostly closer to the centre of the plot as they maintain more centrist positions along the left–right dimension and display moderate climate change salience. Still, some differences emerge, with the far-right Rassemblement National (RN) in France showing higher salience compared to the centre-right Renaissance (RE). This result highlights how critical positions towards climate policies, coming from a far-right party such as the RN, can translate into higher issue salience according to our model.

Limitations

The policlim model and dataset are the outcomes of a rigorous training and validation pipeline. We made every effort to optimize the synthesis of rigorous manual classification with the scaling power of the transformer model. Thanks to this, we achieve an unprecedented level of performance for our task of identifying climate change salience in the manifesto texts. Nonetheless, both the model and the dataset contain errors. This is the nature of large-scale text classification tasks, both manual and computational, and it presents researchers with the challenge of how to mitigate the detrimental effects of classification errors in downstream analyses. Scholars such as TeBlunthuis et al. (2024) have documented these issues and developed strategies for handling this misclassification bias. We encourage all researchers who consider using the policlim dataset or model in their own work to consider the potential impacts of misclassification bias in their analyses and to compensate accordingly.

Intended Uses

We created this dataset because we saw a gap in the data available to researchers for analysing the prevalence of climate change in political manifestos. We therefore consider the dataset appropriate for any analysis in which a measure of the attention given to climate change within the electoral manifestos in our sample is relevant. Researchers may also find the dataset useful as a starting point for coding more specific features of climate policy discourse, e.g., a focus on specific industries or policy instruments. Instead of having to sift through the full corpus of manifesto text, researchers can use the policlim dataset to focus automatically on the passages most likely to be climate relevant. However, the dataset should not be used as a proxy for climate change position, as our variable does not contain information on the stance of the content expressed about climate change.

Similarly, the model can also be used to identify climate change salience in other kinds of political texts. For instance, the model could be applied to sentences in political press releases, social media posts, and advertisements. However, depending on the domain, the model may produce more or less certain predictions. We therefore recommend inspecting post hoc both samples of the predictions and the predictions that fall in borderline regions of the probability distribution, as we did in the post-hoc validation and correction phases of our dataset development.
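The kind of post-hoc inspection recommended above can be sketched as follows. This is a minimal illustration rather than the authors' validation code: the function name, the borderline band of 0.4 to 0.6, the sample size, and the example sentences and probabilities are all assumptions made for the example.

```python
import random


def select_for_review(predictions, lower=0.4, upper=0.6, n_random=2, seed=0):
    """Split predictions into borderline cases and a random sample of the rest.

    predictions: list of (text, probability) pairs, where probability is the
                 model's predicted probability that the text is climate relevant.
    lower/upper: assumed bounds of the borderline band (hypothetical values).
    """
    borderline, confident = [], []
    for text, prob in predictions:
        if lower <= prob <= upper:
            borderline.append((text, prob))
        else:
            confident.append((text, prob))
    # Draw a reproducible random sample of the confident predictions for spot checks.
    rng = random.Random(seed)
    random_sample = rng.sample(confident, min(n_random, len(confident)))
    return borderline, random_sample


# Hypothetical sentences and probabilities, for illustration only.
predictions = [
    ("We will cut greenhouse gas emissions by 2030.", 0.97),
    ("We will protect local rivers and forests.", 0.45),
    ("We will reform the pension system.", 0.02),
    ("We will invest in modern railways.", 0.58),
    ("We will lower income taxes.", 0.01),
]
borderline, random_sample = select_for_review(predictions)
for text, prob in borderline:
    print(f"REVIEW (borderline, p={prob:.2f}): {text}")
```

All borderline cases are returned for manual review because these are the predictions most likely to be wrong, while the random sample gives a check on the error rate among confident predictions.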

Conclusion

This article presents the policlim dataset of climate change salience in 1,792 manifestos of 620 parties active in forty-five OECD, European, and South American countries from 1990 to 2022. Exploiting advances in supervised machine learning, we fine-tune the XLM-RoBERTa model on a manually annotated sample for our classification task, developing an extensive framework for defining climate change salience as distinct from other environmental and sustainability topics. We then compare the performance of our model to alternative approaches and show that our model outperforms them. We make the annotation framework, training data, and model available to the research community, allowing replication and application to similar tasks.

We make the policlim dataset available to the research community, complete with the full suite of manifesto metadata from the Manifesto Project, to allow further investigations of climate change discourse in political texts. The dataset can be a valuable resource for investigating the drivers and consequences of parties' climate discourse across different countries and decades, as well as for exploring party competition and party responsiveness more broadly. At the same time, the model allows for the identification of climate change salience in other political texts. Additional experimentation is needed to determine the extent to which the model also works for non-political texts, for example news articles, scientific papers, and social media posts.

Defining the boundaries of climate change salience is difficult given the extent to which climate change is implicated across policy domains, from energy and transport to housing and agriculture, and how intertwined it can be with other environmental issues. However, making this distinction is fundamental to correctly measuring the relative importance that political parties attach to climate change specifically, in comparison with other, potentially less controversial environmental topics. We hope that our framework to identify climate change discourse can inform future work in this area and highlight the importance of making a clear distinction between climate change and other environmental topics.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0007123425100719

Data availability statement

Replication data for this paper can be found in Harvard Dataverse at https://doi.org/10.7910/DVN/OXMPTR.

Acknowledgements

The authors would like to thank Doina Vasilev and Marie Holzer for excellent research assistance, and Max Callaghan for input on the modelling framework. They also thank participants of the Political Dynamics and Consequences of Climate Policy workshop at Bocconi University, the 2024 COMPTEXT Conference, the 2024 What Works Climate Solutions Summit, the King’s College Public Policy and Regulation Workshop, members of the Environmental Politics and Governance community, Italo Colantone, and Fay Farstad and members of her PARTYCLIM team for very useful feedback and commentary. They are also grateful to the Manifesto Project team for all of their efforts to make the manifesto data freely available to researchers.

Financial support

This work was supported by the European Union under the Horizon programme (CAPABLE project – Grant No. 101056891).

Competing interests

The authors declare no competing interests.

Footnotes

1 Accuracy is the ratio of the sum of the true positive and true negative predictions to the total number of predictions. The F1 score reported here is the harmonic mean of precision and recall for the climate salience class, i.e., twice the number of true positives divided by the sum of twice the true positives, the false positives, and the false negatives. Precision is the proportion of all the model’s positive classifications that are true positives. Recall is the proportion of all actual positives that the model classified correctly as positives.
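As a worked illustration of these four metrics, the following minimal sketch computes them from confusion-matrix counts, taking F1 as the harmonic mean of precision and recall. The function name and the counts are hypothetical and are not results from the article.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class.

    tp, fp, fn, tn: counts of true positives, false positives,
                    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; algebraically equal to
    # 2*TP / (2*TP + FP + FN).
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because climate-relevant sentences are a small minority in this hypothetical sample, accuracy is dominated by the true negatives, which is why a class-specific measure such as F1 is informative alongside it.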

2 Links to the dataset, codebase, and model are available on GitHub: https://github.com/marysanford/policlim.

3 There is typically a delay of up to five years between an election and when the corresponding manifestos are uploaded to the MPD. The MPD makes available the machine-readable text of the manifestos it has on file via an application programming interface (API). However, it does not have the text available for all manifestos. We therefore collected by hand a number of manifestos that were missing, particularly from France. The MPD includes only two manifestos from Malta (the Labour Party manifestos for 1996 and 1998), so we decided to exclude Malta from the analysis. We also include manifestos from the United Kingdom for the post-Brexit period. We include all OECD countries except Iceland, Japan, South Korea, and Turkey: the model did not perform at an acceptable level for the languages of these countries, so, for the time being, we do not include them in the dataset. We include all South American countries except Guyana, Paraguay, Peru, Suriname, and Venezuela, for which the text of the manifestos was not readily available in the MPD.

4 The model is trained on EU manifestos. We then apply the model to the additional non-EU manifestos and validate these results in the same way as the results for the original EU manifestos.

5 The quasi-sentence annotations are not made available by the Manifesto Project Database for 22 per cent of the quasi-sentences in our sample. For this subset, we infer the quasi-sentence scores using the manifestoberta model, developed by the Manifesto Project to classify their original variables.

7 Figure A3 examines the average trends of climate change salience over time for each party family, again employing the party family categorization developed by the Manifesto Project team (Mair and Mudde 1998). There we see salience increasing over time for most party families, but with a sharp decline after 2019.

8 We use two-letter ISO 3166-1 alpha-2 codes (ISO2C) as abbreviations for all country names.

9 The Lowe et al. (2011) right–left score is determined for each manifesto as the log odds-ratio of its sentences coded to right versus left positions on a number of policy variables traditionally considered to be polarized between right- and left-leaning parties. Parties with values below 0 are classified as left-leaning, while parties with values greater than or equal to 0 are classified as right-leaning. This version of the variable is an improvement over the one originally provided by the Manifesto Project from Laver and Budge (1992) because the Lowe et al. (2011) version includes more policy categories and does not scale the variable by manifesto length. As the authors demonstrate, by using the log odds-ratio of sentences coded to left versus right positions, their variable is not sensitive to the number of sentences not coded to a position and therefore more precisely captures the relative balance of right versus left positions in each manifesto.
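The contrast drawn in footnote 9 can be illustrated with a minimal sketch. It assumes the log odds-ratio takes the form log((R + 0.5) / (L + 0.5)), where R and L are the numbers of sentences coded to right and left positions; the smoothing constant of 0.5, the simplified length-scaled comparison score, the function names, and the counts are assumptions for illustration and are not taken from the article.

```python
import math


def logit_right_left(right_count: float, left_count: float, smoothing: float = 0.5) -> float:
    """Log odds-ratio of right- versus left-coded sentences.

    The smoothing constant (assumed here to be 0.5) keeps the score
    defined when one of the counts is zero.
    """
    return math.log((right_count + smoothing) / (left_count + smoothing))


def length_scaled_right_left(right_count: float, left_count: float, total_sentences: float) -> float:
    """Simplified length-scaled score, (R - L) / N, shown only for contrast."""
    return (right_count - left_count) / total_sentences


def classify_leaning(score: float) -> str:
    """Values below 0 are left-leaning; values >= 0 are right-leaning."""
    return "left-leaning" if score < 0 else "right-leaning"


# Two hypothetical manifestos with the same right/left balance but a
# different number of sentences not coded to either position.
short = {"right": 40, "left": 20, "total": 100}
long = {"right": 40, "left": 20, "total": 400}

for name, m in (("short", short), ("long", long)):
    logit = logit_right_left(m["right"], m["left"])
    scaled = length_scaled_right_left(m["right"], m["left"], m["total"])
    print(f"{name}: logit={logit:.3f} ({classify_leaning(logit)}), length-scaled={scaled:.3f}")
```

The log odds-ratio is identical for both manifestos because it depends only on the right and left counts, whereas the length-scaled score shrinks as the number of uncoded sentences grows.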

References

Aklin, M and Urpelainen, J (2013) Political competition, path dependence, and the strategy of sustainable energy transitions. American Journal of Political Science 57(3), 643–658. https://doi.org/10.1111/ajps.12002
Alexandrova, P, Carammia, M, Princen, S and Timmermans, A (2014) Measuring the European Council agenda: Introducing a new approach and dataset. European Union Politics 15, 152–167. https://doi.org/10.1177/1465116513509124
Bez, C, Bosetti, V, Colantone, I and Zanardi, M (2023) Exposure to international trade lowers green voting and worsens environmental attitudes. Nature Climate Change 13(10), 1131–1135. https://doi.org/10.1038/s41558-023-01789-z
Budge, I (2002) Mapping policy preferences: 21 years of the comparative manifestos project. European Political Science 1(3), 60–68. https://doi.org/10.1057/eps.2002.33
Carter, N, Ladrech, R, Little, C and Tsagkroni, V (2018) Political parties and climate policy: A new approach to measuring parties’ climate policy preferences. Party Politics 24(6), 731–742. https://doi.org/10.1177/1354068817697630
Cohen, J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1), 37–46. https://doi.org/10.1177/001316446002000104
Conneau, A, Khandelwal, K, Goyal, N, Chaudhary, V, Wenzek, G, Guzmán, F, Grave, E, Ott, M, Zettlemoyer, L and Stoyanov, V (2020) Unsupervised cross-lingual representation learning at scale. In Jurafsky, D., Chai, J., Schluter, N. and Tetreault, J. (eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 8440–8451. https://doi.org/10.18653/v1/2020.acl-main.747
Derndorfer, J, Hoffmann, R and Theine, H (2022) Integrating environmental issues within party manifestos: Exploring trends across European welfare states. In Towards Sustainable Welfare States in Europe. Edward Elgar Publishing, pp. 80–104.
Devlin, J, Chang, M-W, Lee, K and Toutanova, K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186.
Dickson, ZP and Hobolt, SB (2024) Going against the grain: Climate change as a wedge issue for the radical right. Comparative Political Studies 58(8), 1733–1759. https://doi.org/10.1177/00104140241271297
Dolezal, M, Ennser-Jedenastik, L, Müller, WC and Winkler, AK (2014) How parties compete for votes: A test of saliency theory. European Journal of Political Research 53(1), 57–76. https://doi.org/10.1111/1475-6765.12017
Facchini, F, Gaeta, GL and Michallet, B (2017) Who cares about the environment? An empirical analysis of the evolution of political parties’ environmental concern in European countries (1970–2008). Land Use Policy 64, 200–211. https://doi.org/10.1016/j.landusepol.2017.02.017
Farstad, FM (2018) What explains variation in parties’ climate change salience? Party Politics 24(6), 698–707. https://doi.org/10.1177/1354068817693473
Garritzmann, JL and Seng, K (2023) The politics of (de)liberalization: Studying partisan effects using mixed-effects models. Political Science Research and Methods 12(4), 750–766. https://doi.org/10.1017/psrm.2023.35
Geese, L, Sullivan-Thomsett, C, Jordan, AJ, Kenny, J and Lorenzoni, I (2024) Measuring climate mitigation policy content in text-as-data: Navigating the conceptual challenges. Political Research Exchange 6(1). https://doi.org/10.1080/2474736X.2024.2387120
Green, JF and Hale, TN (2017) Reversing the marginalization of global environmental politics in international relations: An opportunity for the discipline. PS: Political Science & Politics 50(2), 473–479.
Hadden, J and Prakash, A (2024) Introduction: What scholars know (and need to know) about the politics of climate change. PS: Political Science & Politics 57(1), 17–20.
Javeline, D (2014) The most important topic political scientists are not studying: Adapting to climate change. Perspectives on Politics 12(2), 420–434. https://doi.org/10.1017/S1537592714000784
Keohane, RO (2015) The global politics of climate change: Challenge for political science. PS: Political Science & Politics 48(1), 19–26.
Laver, MJ and Budge, I (eds.) (1992) Party Policy and Government Coalitions. London: Palgrave Macmillan UK. https://doi.org/10.1007/978-1-349-22368-8
Lehmann, P, Burst, T, Lewandowski, J, Matthieß, T, Regel, S and Zehnter, L (2024) Manifesto Corpus: Version: 2024-A.
Licht, H and Lind, F (2023) Going cross-lingual: A guide to multilingual text analysis. Computational Communication Research 5(2), 1. https://doi.org/10.5117/CCR2023.2.2.LICH
Lowe, W, Benoit, K, Mikhaylov, S and Laver, M (2011) Scaling policy preferences from coded political texts. Legislative Studies Quarterly 36(1), 123–155. https://doi.org/10.1111/j.1939-9162.2010.00006.x
Mair, P and Mudde, C (1998) The party family and its study. Annual Review of Political Science 1, 211–229. https://doi.org/10.1146/annurev.polisci.1.1.211
Moniz, P and Wlezien, C (2020) Issue salience and political decisions. In Oxford Research Encyclopedia of Politics. https://doi.org/10.1093/acrefore/9780190228637.013.1361
Rodriguez, PL and Spirling, A (2022) Word embeddings: What works, what doesn’t, and how to tell the difference for applied research. The Journal of Politics 84(1), 101–115. https://doi.org/10.1086/715162
Ross, ML (2025) The new political economy of climate change. World Politics 77(1), 155–194.
Sanford, M and Painter, J (2024) Divergences between mainstream and social media discourses after COP26, and why they matter. Oxford Open Climate Change 4(1), kgae006. https://doi.org/10.1093/oxfclm/kgae006
Sanford, M, Pianta, S, Schmid, N and Musto, G (2025) “Replication Data for: Policlim: A Dataset of Climate Change Discourse in the Political Manifestos of 45 Countries from 1990–2022”, https://doi.org/10.7910/DVN/OXMPTR, Harvard Dataverse, V1.
Schmid, N (2021) A comparative and dynamic analysis of political party positions on energy technologies. Environmental Innovation and Societal Transitions 39, 206–228. https://doi.org/10.1016/j.eist.2021.04.006
Schwörer, J (2024) Mainstream parties and global warming: What determines parties’ engagement in climate protection? European Journal of Political Research 63, 303–325. https://doi.org/10.1111/1475-6765.12602
Schwörer, J and Fernández-García, B (2023) Climate sceptics or climate nationalists? Understanding and explaining populist radical right parties’ positions towards climate change (1990–2022). Political Studies 72(3), 1178–1202. https://doi.org/10.1177/00323217231176475
TeBlunthuis, N, Hase, V and Chan, C-H (2024) Misclassification in automated content analysis causes bias in regression. Can we fix it? Yes we can! Communication Methods and Measures 18(3), 278–299. https://doi.org/10.1080/19312458.2023.2293713
Vaswani, A, Shazeer, N, Parmar, N, Uszkoreit, J, Jones, L, Gomez, AN, Kaiser, Ł and Polosukhin, I (2017) Attention is all you need. In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
Wappenhans, T, Klüver, H, Stoetzer, L and Valentim, A (2024) Extreme weather events do not increase political parties’ environmental attention. Nature Climate Change 14, 696–699. https://doi.org/10.1038/s41558-024-02024-z
Webersinke, N, Kraus, M, Bingler, J and Leippold, M (2022) ClimateBERT: A pretrained language model for climate-related text. In Proceedings of AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges. https://doi.org/10.2139/ssrn.4229146
Wetts, R (2020) Models and morals: Elite-oriented and value-neutral discourse dominates American organizations’ framings of climate change. Social Forces 98(3), 1339–1369.

Figure 1. Key steps of our methodological pipeline. The F1 scores provided for each model run correspond to the performance of each model in post-hoc validation samples.


Table 1. Accuracy and F1 scores for the training set. Classification methods: (1) Full Keyword Search – Keyword search using: ‘climate change’, ‘global warming’, ‘climate’, ‘emissions’, ‘renewable’, ‘environment’, and ‘sustainability’; (2) High Probability Keyword Search – Keyword search using the high probability set: ‘climate change’, ‘global warming’, ‘climate’, ‘emissions’, and ‘renewable’; (3) ClimateBERT; and (4) average five-fold cross-validation of our model. Highest values per metric shown in bold.


Table 2. Overlap between our measure of climate salience and Manifesto Project variables (code number and descriptive label). Only the ten MPD variables with the highest overlap with our measure of climate salience are listed


Figure 2. Average climate change salience over time for each country in our sample. Note: The panel for Uruguay (UY) shows only one point because we only have the manifestos from one election (2014). See footnote 8.


Figure 3. Mean share of climate-relevant quasi-sentences per manifesto for right- and left-leaning parties within each country over time. We use the MPD’s party family variable to group the parties into right- and left-leaning groups. The right-leaning group includes parties in the Christian Democrat, Conservative, and Nationalist and radical right families. The left-leaning group includes the families Social Democrats, Socialist and other left, and Ecological. Note: For simplicity, we exclude the Liberal party family from this visualization because it contains both traditionally left-leaning (the UK’s Liberal Democrats) and right-leaning (Germany’s Free Democratic Party) parties.

