Politicians, to put it mildly, often stick their feet in their mouths. Some live to fight another day, and some never recover. In 2006, former Virginia Sen. George Allen was caught on camera calling a videographer a “macaca” (monkey) (Tapper and Kulman, Reference Tapper and Kulman2006), and he lost his seat in disgrace. In 2016, a hot-mic recording from a decade earlier exposed Donald Trump bragging that he could “grab [women] by the pussy” (Times, Reference Times2016), and he was elected president. There is no shortage of high-profile politicians who come under fire for making insensitive remarks. Amid highly charged debates over “cancelations,” these controversies matter. What can and cannot be said not only implicates politicians. It has become one of the most contested terrains over which America’s “culture wars” are fought (Clark, Reference Clark2020; Friedersdorf, Reference Friedersdorf2022; Mishan, Reference Mishan2020). Even a cursory look at anecdotes, however, suggests that the consequences for violating speech norms can vary markedly. Why?
In popular circles, answers tend to highlight the remarks themselves and how politicians exercise damage control. Yet beyond the precise language that politicians use, observers commonly speculate that not all politicians, at all times, are held to the same standards. Pundits, for instance, have routinely commented on Trump’s “teflon” reputation, whereby nothing he says – from trafficking in racism to demeaning “locker room talk” against women – seems to “stick” to him (Cillizza, Reference Cillizza2016). One CNN reporter has written about a “political double standard for ugly speech” (Sheffield, Reference Sheffield2018). Such examples are not confined to the U.S. In the U.K., a writer for a major British newspaper argued that a conservative MP “fe[lt] she c[ould] say whatever she wants, because she’s a woman and she’s queer, and therefore in her eyes she’s not like those other party members” (Necati, Reference Necati2017).
What such counterfactuals have in common is a simple intuition: that the words politicians use may not be the only (or even most salient) factor determining their fates. A growing body of political science research has looked at how the content of speech – ranging from inflammatory racial rhetoric to sexist and other tropes – can affect voter responses to politicians (Benoit, Reference Benoit2017; Conway et al., Reference Conway, Repke and Houck2017; Hodges, Reference Hodges2020; Ott, Reference Ott2016; Newman et al., Reference Newman, Merolla, Sono Shah, Collingwood and Karthick Ramakrishnan2021; Rhodes et al., Reference Rhodes, Sharrow, Greenlee and Nteta2020; Shafer, Reference Shafer2017). At the same time, a parallel scholarship has examined how an array of situational factors – including the conditions under which words are spoken and details about a speaker – impact public reactions (Christiani, Reference Christiani2023; Lindner and Nosek, Reference Lindner and Nosek2009; McGraw, Reference McGraw1990; Thompson and Busby, Reference Thompson and Busby2023). To date, however, research has not offered an integrated account of how such variables shape the outcomes of speech scandals.
This gap represents a challenge. Despite efforts to conceptualize other serious types of politician misconduct (Dewberry, Reference Dewberry2015; Dziuda and Howell, Reference Dziuda and Howell2021; Invernizzi, Reference Invernizzi2016), there has been no similarly comprehensive attempt to explain divergent outcomes of speech acts. Even if differences in the controversial words politicians use or the situations in which they say them “matter,” the relative importance of these variables is unclear. Voters weigh many factors when evaluating candidates, with some taking precedence. When examined in isolation, individual features of a speech controversy may appear more salient than they in fact are (De la Cuesta et al., Reference De la Cuesta, Egami and Imai2022; Horiuchi et al., Reference Horiuchi, Smith and Yamamoto2020; Rabinowitz et al., Reference Rabinowitz, Prothro and Jacoby1982). How voters prioritize different criteria – including both the content of speech and the situation surrounding it – requires direct comparisons.
This article fills that gap by developing a general framework to explain public reactions to inappropriate language by politicians. We focus on identity-based speech because of its highly fraught nature. The diffusion of “social justice” movements and causes, including Black Lives Matter, #MeToo, LGBTQ+ advocacy, and the Israel-Palestine issue, has placed identity-based speech squarely in the public eye (Bernstein, Reference Bernstein2005; Fukuyama, Reference Fukuyama2018; Lilla, Reference Lilla2018). In doing so, it has galvanized members of the public either advocating for more sensitivity or claiming that the “Overton window” of permissible speech has narrowed too far (Costello et al., Reference Costello, Hawdon, Bernatzky and Mendes2019; Haidt and Lukianoff, Reference Haidt and Lukianoff2018; Howard, Reference Howard2019). The features of speech transgressions that we highlight cover the basic “who, what, where, and when” of controversies that would likely feature in media coverage (Tumber and Waisbord, Reference Tumber and Waisbord2019).
To test our predictions, we fielded a preregistered, national online conjoint experiment in the U.S. The experiment asked respondents to evaluate fictitious news articles about politicians, where we randomized key features of the content of the controversial remark and the situation in which it occurred. Our findings provide partial support for our expectations: As predicted, subjects react most negatively to insensitive speech when the target belongs to their own identity group, when aggravating circumstances exist, and when politicians are of an opposing political party. However, contrary to predictions, subjects do not respond most negatively to slurs, react similarly regardless of how a politician addresses the scandal after the fact, and are no more likely to rule out voting for a politician based on having dissimilar demographic traits.
Theoretically, our study can help to explain why only some politicians have their careers derailed (or even ended) by making derogatory comments. Although prior work on speech controversies has focused on a small number of causal factors in isolation, we offer a generalized framework. Empirically, we provide – to our knowledge – the first exploratory test of how a large set of contentual and situational variables compares in determining the outcomes for politicians accused of “wrongspeak.” Broadly, our study adds to a considerable literature on scandals in American politics, which has focused on topics like sex allegations, financial misconduct, and other malfeasance (Barnes et al., Reference Barnes, Beaulieu and Saxton2020; Basinger and Rottinghaus, Reference Basinger and Rottinghaus2012; Busby, Reference Busby2022; Bowler and Karp, Reference Bowler and Karp2004; Doherty et al., Reference Doherty, Dowling and Miller2014; Funk, Reference Funk1996; Green et al., Reference Green, Zelizer and Kirby2018; Knight, Reference Knight and Dagnes2011; Maier, Reference Maier2010; Vonnahme, Reference Vonnahme2014).
How voters react to speech controversies
We propose a framework in which both the content of the speech and the situation in which it happens can shape public judgments of speech scandals. By content, we refer to (1) the nature of the remarks (what was said and who was targeted) and (2) the politician’s response (whether the candidate apologizes, makes excuses, defends the speech, or offers “no comment”). By situational factors, we refer to (1) the context (the timeframe, the degree of spontaneity, and whether the words reflect a pattern) and (2) the politician’s background traits (his or her political party, race, gender, and age). Our preregistered predictions are as follows:
Contentual factors
Nature of original remarks
Reporting on identity-based speech controversies starts with details of the speech itself (Ayo et al., Reference Ayo, Folorunso, Ibharalu and Ademola Osinuga2020; Kennedy et al., Reference Kennedy, Kogon, Coombs, Hoover, Park, Portillo-Wightman, Davani, Atari and Dehghani2018). More severe comments should generally generate greater backlash. Relative to other forms of insensitive speech – including dehumanizing language, stereotypes, and denials of discrimination – slurs, which impugn a group’s innate self-worth, tend to be regarded as most transgressive (Croom, Reference Croom2013; McWhorter, Reference McWhorter2021). As such, the public may be most likely to condemn their use as a “cardinal sin.” Separately, in line with social identity theory (Ellemers and Haslam, Reference Ellemers, Alexander Haslam, Van Lange, Kruglanski and Tory Higgins2012; Tajfel and Turner, Reference Tajfel, Turner, Jost and Sidanius2004), which presumes that people feel more commonality with in-group members, voters may be especially activated if they are members of the same identity group being targeted. These vicarious feelings accord with the “homophily” principle rooted in in-group preferences (McPherson et al., Reference McPherson, Smith-Lovin and Cook2001).
- Prediction 1a: Voters will object most to insensitive speech when a slur is used.
- Prediction 1b: Voters will object most to insensitive speech when their own identity group is “congruent” with the target.
Politician’s response
Politicians routinely try to control public relations after a speech scandal. Studies largely suggest that voters will be most punitive toward politicians who expressly try to shift blame away from themselves (Chanley et al., Reference Chanley, Sullivan, Hope Gonzales and Bull Kovera1994; Kitagawa and Chu, Reference Kitagawa and Chu2021; Thompson and Busby, Reference Thompson and Busby2023). Compared to its main alternatives – apologizing or defending one’s words – making an excuse for inappropriate language has been shown to be ineffective. Whereas apologies, defenses, and even “no comments” avoid blame shifting, excuses deflect culpability. Research shows that standard excuses – such as pleading ignorance or claiming to misspeak or to be quoted out of context – often backfire by making public officials seem weak or unwilling to “take the heat” (McGraw, Reference McGraw1990). This accords with analyses of groups being more positively inclined toward leaders who insist that “the buck stops” with them (Miller and Reeves, Reference Miller and Reeves2022).
- Prediction 2: Voters will object most to insensitive speech when politicians make excuses for their words.
Situational factors
Context
Context may serve as an aggravating factor in assigning guilt (Nyhan, Reference Nyhan2014, Reference Nyhan2017; Schein, Reference Schein2020). Several issues may be relevant. First, the public may be more critical of biased comments that are more recent (Schulte, Reference Schulte2021). Because definitions of prejudice tend to become more capacious over time (Greenland et al., Reference Greenland, West and van Laar2022), with an informal “statute of limitations” attached to past mistakes, voters may be reluctant to apply today’s standards of morality to dated words. Planned comments (e.g., at a speech or on social media) might also offer politicians less latitude to eschew accountability or to clarify their remarks than ones delivered more spontaneously (e.g., in an interview, Q&A, or debate) (Frantzich, Reference Frantzich2012). Finally, repeated usage of controversial language may make it harder for voters to discount statements as an anomaly or “out of character” (Agadjanian et al., Reference Agadjanian2019; Nisan and Horenczyk, Reference Nisan and Horenczyk1990).
- Prediction 3: Voters will object most to insensitive speech when there are more aggravating factors to impute guilt (that is, when remarks are more recent, planned, and part of a pattern of behavior).
Politician’s background traits
Voters often turn to politicians’ background traits as “shortcuts,” or heuristics, to make inferences (Campbell and Cowley, Reference Campbell and Cowley2014; McGraw, Reference McGraw1990). One way is by “projecting” onto politicians certain attitudes or biases. When voters are negatively predisposed toward a politician, they may evaluate bad behavior more harshly. Evidence indicates that this is especially likely if their own background characteristics do not match those of the accused politician. Rejection may be conditioned by either the personal political attachments of voters (i.e., party identification) or their ascriptive traits (i.e., race, gender, and age). Extensive research, for example, shows that voters empathize less with candidates who are on opposing political “teams” (Green et al., Reference Green, Palmquist and Schickler2002) and who do not “look like” them (Aichholzer and Willmann, Reference Aichholzer and Willmann2020). Affective misalignment may even trump more programmatic or policy concerns when voting.
- Prediction 4: Voters will object most to insensitive speech when their partisanship and ascriptive traits (race, gender, and age) are “incongruent” with the politician.
Survey and data collection
Appendix Table A.1 summarizes the full set of experimental variables and hypotheses. To test our predictions, we fielded a preregistered online survey experiment via YouGov U.S. between December 5th and 19th, 2024.Footnote 1 The conjoint design randomly manipulated core elements of speech allegations (nature of the remarks, politician’s response, context, and politician’s background traits) to discern public reactions. Respondents included 3,162 adults drawn from a national panel, weighted to approximate the U.S. population by age, gender, race, education, region, income level, and political party. Each respondent received modest compensation for participating in our study, equivalent to between $5 and $7.50. Due to the sensitivity of the topic, all respondents were warned prior to the experiment that the content could be offensive and given the choice to opt out. Subjects were debriefed that the scenarios were fictitious. One scope condition on our study is that, by necessity, we assumed that everyone in the public is aware of the speech scandal, which may be more or less realistic depending on actual media coverage and voter attentiveness.
Conjoint design
For each vignette, respondents evaluated in isolation a single candidate, portrayed as running for state legislature in a general election, who had made a controversial remark; respondents completed this task three times (for a total N of 10,200). Conjoints maximize statistical efficiency by measuring the impact of randomly isolated variables on voter preferences (Bansak et al., Reference Bansak, Hainmueller, Hopkins, Yamamoto, Druckman and Green2021b; Hainmueller et al., Reference Hainmueller, Hopkins and Yamamoto2014). The design addresses the concern that making inappropriate remarks is collinear with other traits of politicians that may depress ballot box appeal. It also sidesteps the often idiosyncratic nature of real-world speech controversies. We presented just one candidate for respondents to evaluate at a time because this is more realistic for the setting. Although forced-choice conjoints involving candidate “face-offs” are common, it is rare that voters in an actual election would be presented with a choice between two politicians, both of whom had made insensitive comments.
Respondents read about the politician and the speech allegations in a brief vignette, similar to a write-up in a newspaper.Footnote 2 In total, we randomized ten independent variables. The effort required by the survey is below the level that would likely trigger “satisficing” behavior. Research shows that respondents are capable of processing a relatively large number of dimensions while maintaining the integrity of causal effects (Bansak et al., Reference Bansak, Hainmueller, Hopkins and Yamamoto2018, Reference Bansak, Hainmueller, Hopkins and Yamamoto2021a). To increase realism, a headshot of the candidate manipulated the politician’s background traits of race (via skin tone and hair color), gender, and age. Alongside listing a party affiliation, we also included racialized, gendered names for each politician.Footnote 3 In total, we used 16 pictures, one for each race-gender-age category (see Figure A.1).Footnote 4 Figure 1 shows an example vignette.Footnote 5

Figure 1. Example of candidate presentation.
We took particular care in representing the content of speech. Media outlets follow different style guidelines in reporting on insensitive language.Footnote 6 Apart from a “clean” control that only describes the politician and does not reference any speech scandal, we presented realistic quotes that denoted (1) slurs; (2) stereotypes; (3) dehumanizing language; or (4) denials of discrimination. This spared respondents from having to make assumptions about what these generic terms mean. SlursFootnote 7 and stereotypes were necessarily specific to each identity group. To ensure that responses were not driven by idiosyncrasies in the chosen language (Bertrand and Mullainathan, Reference Bertrand and Mullainathan2004; Sen and Wasow, Reference Sen and Wasow2016), we randomized two quotes each (eight in total) varying the precise language for each treatment, then collapsed them into the four central categories (see Table A.2). This also reduces the concern that some comments directed against the main identity categories (e.g., specific stereotypes for different groups) are regarded as more derogatory.
For the politician’s response, we used quotes that represented three different variants of apologies, excuses, and defenses that are commonly invoked. Apologies included the following: (1) a remorseful apology that expressed simple contrition; (2) a “woke” apology that admitted complicity in broader power structures; or (3) a “sorry-I-offended” apology that conveyed regret that the statement hurt feelings. Excuses included the following: (1) pleading ignorance; (2) claiming to misspeak; or (3) maintaining that remarks were taken out of context. Defenses included the following: (1) denying wrongdoing; (2) playing the victim; or (3) going on the attack.Footnote 8 Collectively, we randomized nine independent treatments (see Table A.3) corresponding to the response sub-types, in addition to “no comment” controls. For each of the conditions, we again used two quotes each (20 in total) to reduce idiosyncratic reactions to particular language and then combined them into the four main categories.
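To illustrate the logic of this randomization, below is a minimal sketch in Python of how a single vignette profile might be assembled. All attribute levels, the share of “clean” controls, and the placeholder quote banks are illustrative assumptions standing in for the actual materials in Tables A.2 and A.3; the point is that a specific quote variant is drawn within each condition while the collapsed category is what enters the analysis.

```python
import random

# Illustrative stand-ins for the actual quote banks in Tables A.2 and A.3.
REMARK_QUOTES = {
    "slur": ["<slur quote 1>", "<slur quote 2>"],
    "stereotype": ["<stereotype quote 1>", "<stereotype quote 2>"],
    "dehumanizing": ["<dehumanizing quote 1>", "<dehumanizing quote 2>"],
    "denial_of_discrimination": ["<denial quote 1>", "<denial quote 2>"],
}
RESPONSE_SUBTYPES = {  # collapsed category -> sub-types (each sub-type has two quote variants)
    "apology": ["remorseful", "woke", "sorry_I_offended"],
    "excuse": ["ignorance", "misspoke", "out_of_context"],
    "defense": ["deny_wrongdoing", "play_victim", "attack"],
    "no_comment": ["no_comment"],
}

def draw_profile(rng: random.Random) -> dict:
    """Assemble one randomized candidate vignette (illustrative levels only)."""
    profile = {
        "party": rng.choice(["Democrat", "Republican"]),
        "race": rng.choice(["White", "Black", "Hispanic", "Asian"]),
        "gender": rng.choice(["man", "woman"]),
        "age": rng.choice(["younger", "older"]),
        "target_group": rng.choice(["race", "religion", "sexual_orientation", "gender"]),
        "timeframe": rng.choice(["recent", "years_ago"]),
        "spontaneity": rng.choice(["planned", "spontaneous"]),
        "pattern": rng.choice(["repeated", "one_off"]),
    }
    if rng.random() < 0.2:  # assumed share of "clean" controls with no scandal
        profile["remark_category"] = "no_controversy"
        return profile
    remark = rng.choice(list(REMARK_QUOTES))        # collapsed category used in analysis
    profile.update(remark_category=remark,
                   remark_quote=rng.choice(REMARK_QUOTES[remark]))
    response = rng.choice(list(RESPONSE_SUBTYPES))  # collapsed category used in analysis
    profile.update(response_category=response,
                   response_subtype=rng.choice(RESPONSE_SUBTYPES[response]))
    return profile

print(draw_profile(random.Random(2024)))
```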
We included three main dependent variables (see Appendix – DV Questions). The first was a simple binary question measuring electoral preferences: “If you had to make a choice without knowing more, would you ever consider voting for this candidate?” (Yes/No). We categorize a “No” answer as the respondent “ruling out” voting for the candidate. The second question asked respondents to evaluate the politician on a “feeling thermometer,” based on a standard measure from the American National Election Study. The continuum ran on a 0–100 scale from “coldness” to “warmth.” Finally, we asked a general question about the speech act itself: “On a scale of 1–7, how objectionable do you think the candidate’s behavior is in this case?”, from 1 (“Not at all objectionable”) to 7 (“The most objectionable I can imagine”). We framed the question in the negative because we assumed that most respondents would reduce their support for politicians accused of insensitive statements.
Data and empirics
Power calculations
We conducted a power calculation using the procedure in Schuessler and Freitag (Reference Schuessler and Freitag2020), in which we sought to identify minimum detectable effects (MDEs) as small as 0.05. To do so, we relied on results from a pilot study with a convenience sample of volunteers (N = 919) that we fielded via the Harvard Digital Lab for the Social Sciences (DLABSS) from Dec. 2023 to Jan. 2024.Footnote 9 Using estimated coefficients from the pilot on a main “ruling out” binary DV, we calculated (for our maximum number of levels (5) across our attributes, including interactions) a required N of 9,394 (accounting for the loss of power due to including a clean control with no reference to a speech controversy, which lacks variation for the remaining attributes). This was less than our final sample size of 9,486 observations (three tasks per individual respondent). When adjusting our estimates using the procedure in Storey and Tibshirani (Reference Storey and Tibshirani2003) to account for testing several hypotheses with effects expected to be of different sizes, we found that we should still be able to detect significance (at q < .05) in most scenarios, except when coefficients are tiny (less than .001, or 1/40th of the smallest coefficient in our pilot).
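The calculation itself follows Schuessler and Freitag (2020). As a rough illustration of the logic only, the simulation-based sketch below (Python, with a linear-probability setup, an assumed base rate, and placeholder sample sizes) shows how one could approximate power for an AMCE of a given size on a binary “ruling out” outcome. It ignores clustering across a respondent’s three tasks and is not the procedure used in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulated_power(n_obs: int, amce: float, base_rate: float = 0.3,
                    n_levels: int = 5, n_sims: int = 200, alpha: float = 0.05,
                    seed: int = 1) -> float:
    """Share of simulated conjoint samples in which a true AMCE of `amce`
    (level 1 vs. the reference level 0 of one attribute) is detected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        level = rng.integers(0, n_levels, size=n_obs)  # uniform attribute randomization
        p = base_rate + amce * (level == 1)            # effect only for level 1
        y = rng.binomial(1, p)
        df = pd.DataFrame({"y": y, "level": level})
        fit = smf.ols("y ~ C(level)", data=df).fit(cov_type="HC2")
        rejections += fit.pvalues["C(level)[T.1]"] < alpha
    return rejections / n_sims

# e.g., approximate power to detect an AMCE of 0.05 with ~9,400 profile evaluations
print(simulated_power(n_obs=9400, amce=0.05))
```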
Data pre-processing
YouGov data are generally high-quality and have been used in a range of recent American politics studies on public opinion (e.g., Bartels, Reference Bartels2018; Huddy et al., Reference Huddy, Mason and Aaroe2015). As part of data pre-processing, we performed standard checks to verify approximate national demographic representativeness and balanced randomization (see Table A.5, which shows some differences from the population that are typical of online survey samples). This justifies the use of respondent covariates in our models, as described below.
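As a sketch of what such a randomization check can look like (assuming a long-format dataset with one row per profile evaluation and hypothetical column names), one can regress an indicator for a randomized attribute level on respondent covariates; under successful randomization the joint F-test should be insignificant.

```python
import pandas as pd
import statsmodels.formula.api as smf

def balance_check(df: pd.DataFrame, attribute: str, covariates: list) -> float:
    """Regress an indicator for one level of a randomized attribute on respondent
    covariates and return the p-value of the joint F-test; large p-values are
    consistent with balanced randomization."""
    level = df[attribute].dropna().unique()[0]          # any one level of the attribute
    df = df.assign(level_ind=(df[attribute] == level).astype(float))
    formula = "level_ind ~ " + " + ".join(f"C({c})" for c in covariates)
    return smf.ols(formula, data=df).fit().f_pvalue

# hypothetical column names for the respondent covariates
# balance_check(data, "remark_category",
#               ["gender", "age_group", "race", "education", "region", "income", "party_id"])
```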
Analysis and Results
For our analyses, we estimate marginal means (MMs) under each randomized condition using simple linear models with each level of the attribute as a dummy. MMs have increasingly become standard in conjoint research because they allow for substantive interpretations of the baseline estimands and are useful for comparing the relevance of attributes across subgroups or interactions given different baselines (Abramson et al., Reference Abramson, Kocak and Magazinnik2022; Leeper et al., Reference Leeper, Hobolt and Tilley2020). We also report the statistical significance of main effects using differences in Average Marginal Component Effects (AMCEs), with q-values computed in accordance with the Storey-Tibshirani method, which offers a favorable trade-off between Type I and Type II errors under multiple hypothesis testing. Covariates are used in all the models (respondent gender, age, race, education, region, income level, and partisanship).Footnote 10
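A minimal sketch of this workflow is below (Python, hypothetical column names; respondent covariates are omitted for brevity). Marginal means are recovered as coefficients from an intercept-free dummy regression with respondent-clustered standard errors, and q-values are computed from a vector of AMCE p-values with a simplified Storey-style estimator rather than the full Storey-Tibshirani procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def marginal_means(df, attribute, dv="rule_out", cluster="respondent_id"):
    """Mean of the DV at each level of one attribute (intercept-free dummy
    regression), with standard errors clustered by respondent."""
    df = df.dropna(subset=[dv, attribute, cluster])
    fit = smf.ols(f"{dv} ~ C({attribute}) - 1", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df[cluster]})
    return pd.DataFrame({"mm": fit.params, "se": fit.bse})

def storey_qvalues(pvals, lam=0.5):
    """Simplified Storey q-values: estimate pi0 from p-values above `lam`,
    then apply a step-up adjustment scaled by pi0."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, (p > lam).sum() / (m * (1 - lam)))   # estimated share of true nulls
    order = np.argsort(p)
    q = np.empty(m)
    running_min = 1.0
    for rank, idx in enumerate(order[::-1], start=1):   # from largest p downward
        i = m - rank + 1                                # ascending rank of this p-value
        running_min = min(running_min, pi0 * p[idx] * m / i)
        q[idx] = running_min
    return q

# usage sketch: q-values for a hypothetical vector of AMCE p-values
print(storey_qvalues([0.001, 0.02, 0.2, 0.6, 0.8]))
```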
Before turning to our main models, we first estimate a baseline regression, the marginal means of which are shown in Figure 2 Panel A, that compares responses to the “clean” control to those of any treatment condition that references a speech scandal. This estimation indicates the net effect of insensitive remarks on a respondent’s likelihood of “ruling out” voting for a candidate accused of controversial remarks. As expected, respondents are significantly more likely to rule out voting for politicians associated with a speech controversy, more than doubling the probability that they would never consider voting for the candidate.

Figure 2. Marginal means (MMs) of the probability of answering “I would NEVER consider voting for this candidate,” contrasting respondents who receive no information about controversial speech by the candidate with those who are informed that the candidate has been involved in a speech scandal.
Note: Within models, all treatments are statistically significant, calculated as an AMCE (q < .05).
In Panels B–F of Figure 2, we also check whether some respondents react more negatively in the aggregate to insensitive speech by stratifying respondents by political party/ideology and demographics (race, gender, and age). In terms of party identification and ideology, research suggests that Democrats and liberals more generally may be more sensitive to concerns about harm to vulnerable groups (Graham et al., Reference Graham, Haidt and Nosek2009). In line with this prediction, we find that both Democrats (compared to Republicans) (Panel B) and liberals (compared to moderates and conservatives, respectively) (Panel C) are significantly more likely to rule out voting for politicians who have made insensitive remarks. Regarding demographics, scholarship indicates that aversion to insensitivity may be more pronounced among racial minorities (Hutchings and Valentino, Reference Hutchings and Valentino2004), women (Bittner and Goodyear-Grant, Reference Bittner and Goodyear-Grant2017), and youth voters (Holbein and Hillygus, Reference Holbein and Sunshine Hillygus2020), for whom identity politics is potentially more salient. As expected, with gender, we find that female respondents are more likely than male respondents to rule out voting for politicians linked to controversial speech (Panel E). However, results yield no clear statistical differences for respondents of different races (Panel D) or age cohorts (Panel F).
For our main models, we compare all experimental conditions within their respective treatment categories. As explained above, we made specific predictions only about which one variable within each category leads to the highest voter rejection of politicians. We fit linear models of the form:
$$Y_{ik} = \beta_0 + \sum_{j} \beta_j A_{jik} + \gamma X_i + \varepsilon_{ik}$$
where Y is the DV (the binary indicator of ruling out voting for the candidate, the thermometer scale of “warmth” toward the candidate, or the scale of objection to the behavior), A is a vector of dummy variables for each attribute level j (our IVs), and X is the series of respondent-level covariates. We binarize the “aggravating factors” variables and sum them into a composite value (from 0 to 3). We also binarize the “respondent-politician incongruence” variable based on whether all of a respondent’s background traits are incongruent with the politician’s. Respondents are indexed by i, and k (1–3) indexes the three candidates each respondent assesses. Our main IVs are summarized in Table A.6.
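A minimal sketch of fitting this specification is below (Python, hypothetical column names), assuming a long-format dataset with one row per respondent-task. The aggravating-factor count and the all-incongruent indicator are constructed as just described, and standard errors are clustered by respondent; rows from the clean control, which lack remark and response attributes, would implicitly drop out of such fits.

```python
import statsmodels.formula.api as smf

def fit_main_model(df, dv="rule_out"):
    """Linear model with attribute dummies, the 0-3 aggravating-factor count,
    the all-incongruent indicator, and respondent covariates; standard errors
    clustered by respondent (all column names are hypothetical)."""
    df = df.assign(
        aggravating=(df["recent"].astype(int) + df["planned"].astype(int)
                     + df["pattern"].astype(int)),                   # 0-3 composite
        all_incongruent=((df["party_match"] + df["race_match"]
                          + df["gender_match"] + df["age_match"]) == 0).astype(int),
    )
    formula = (f"{dv} ~ C(remark_category) + C(target_congruent) + C(response_category)"
               " + aggravating + all_incongruent"
               " + C(resp_gender) + C(resp_age_group) + C(resp_race)"
               " + C(resp_education) + C(resp_region) + C(resp_income) + C(resp_party)")
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["respondent_id"]})

# fit = fit_main_model(data); print(fit.summary())
```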
Main estimations. Figure 3 displays the MMs associated with the main DV of ruling out voting for the candidate. Alongside these estimations, Figure A.2 reports AMCEs with confidence intervals.

Figure 3. Marginal means (MMs) of the probability of answering “I would never consider voting for this candidate.”
Note: In panels A–E, we boldface variables for which the difference with the reference category is statistically significant, calculated as an AMCE (q < .05). The reference category is the topline variable in each case (“no speech controversy,” “incongruent,” “no comment,” “no aggravating circumstances,” “some congruent factor”). Full AMCE results are shown in Figure A.2, with point estimates, t-statistics, and q-values for all variables in Table A.7.
Panel A reveals that, contrary to Prediction 1a, respondents react most negatively to politicians accused of using dehumanizing language, not slurs. However, slurs are rated more objectionable than stereotyping or denying that discrimination exists against identity groups. In Panel B, consistent with Prediction 1b, respondents are significantly more likely to rule out voting for politicians when the target of the insensitive speech belongs to their own identity group. Panel C shows that, contrary to Prediction 2, which anticipated that excuses would elicit the most negative reaction, no specific type of politician response statistically stands out as the most likely to increase the likelihood of respondents ruling out voting for politicians. In fact, politicians who make excuses actually tend to fare better than those who defend their insensitive speech or (perhaps surprisingly) those who offer no comment. Panel D supports Prediction 3, indicating that the presence of more aggravating factors increases the likelihood of respondents being unwilling to vote for politicians. Finally, Panel E is inconsistent with Prediction 4, showing that respondents are no more or less likely to rule out voting for politicians when all of the respondent’s background traits are incongruent with the politician’s. Figure A.3 shows the AMCEs for the two alternative DVs: the 0–100 feeling thermometer, and the 1–7 scale of how objectionable the respondent finds the candidate’s behavior. In both cases, results are substantively identical to those of our main “ruling out” variable.
Additional analyses. In addition to estimating aggregate subgroup effects as described above, Figures A.4–A.8 also report our main findings with a breakdown of respondents by political party/ideology and demographics (race, gender, and age). Directionally, results are similar to the main models for the various subgroups, although less precisely estimated due to the smaller sample size within each category. Figure A.9 unpacks the effects of respondent congruence with the target of a speech controversy (by race, religion, sexual orientation, and gender). All of the effects are directionally positive as expected and statistically significant, showing that any form of respondent-target congruence makes subjects more likely to rule out voting for politicians. Figure A.10 disaggregates the respective effects of aggravating factors that we collapsed into a composite. Results are again directionally positive as predicted and statistically significant for each factor. Respondents are more apt to rule out voting for politicians whose statements are more recent, planned, and reflect a pattern. Lastly, Figure A.11 disaggregates the effects of respondent incongruence with a politician’s background traits by political party and demographics (race, gender, and age). All of the effects are directionally positive as predicted, though only political party is statistically significant. Respondents are more inclined to rule out voting for politicians accused of insensitive speech who are of the opposing party.
Robustness checks
We conduct several robustness checks to ensure the reliability of our findings. First, given that we present candidates running for state legislature, and models average over the demographics of candidates presented (De la Cuesta et al., Reference De la Cuesta, Egami and Imai2022), Figure A.12 Panel A weights observations so that the demographics of the candidates in the experiment (by race, gender, and age) match those of elected officials in state legislatures across the U.S.Footnote 11 Additionally, because we do not observe the full target distribution of controversial speech acts, we also use a procedure analogous to De la Cuesta et al. (Reference De la Cuesta, Egami and Imai2022) in Figure A.12 Panels B and C that “bounds” our estimates. Specifically, we reweight our sample under a “low expected rule out” scenario (where observations denoting attribute levels expected to most correlate with voters not ruling out voting for a politician are double-weighted) and a “high expected rule out” scenario (where observations denoting attribute levels expected to most correlate with voters ruling out voting for a politician are double-weighted). In general, results look substantively similar to our main findings. Lastly, as described above, we test whether the results are affected by idiosyncrasies in the phrasings of the controversial remarks by the politicians or their responses. Figure A.13 estimates differentiated effects for quotes under each category, as presented in Table A.2 and Table A.3. Results again appear substantively robust to alternative wordings of the politician quotes. However, it is worth mentioning that we do find considerable variability in reactions to different kinds of insensitive speech depending on both the group targeted and the particular language used. For example, in multiple instances, the use of slurs had effects similar to those of dehumanizing language, even though the latter produced larger negative reactions overall. In some cases, such as slurs targeted at Blacks and Jews, the effects are actually larger than the average effects of using dehumanizing language. In others, such as slurs targeted at women, Hispanics, and gays, effects are generally below the average effects of the use of dehumanizing language.
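As an illustration of the bounding logic only (not the exact De la Cuesta et al. procedure), the sketch below double-weights observations whose attribute levels are expected to push respondents toward, or away from, ruling out a candidate, and recomputes the rule-out rate under each scenario. The indicator column names are hypothetical, and in the paper the reweighting enters the regression models rather than a raw mean.

```python
import numpy as np

def bounded_rule_out(df, dv="rule_out", push_cols=("slur", "pattern", "recent")):
    """Rule-out rate after double-weighting profiles whose attribute levels
    (hypothetical 0/1 indicator columns) are expected to correlate most with
    ruling out ('high' scenario) or with not ruling out ('low' scenario)."""
    push = df[list(push_cols)].astype(int).sum(axis=1) > 0
    w_high = np.where(push, 2.0, 1.0)  # double-weight expected "rule out" profiles
    w_low = np.where(push, 1.0, 2.0)   # double-weight expected "not rule out" profiles
    return {"unweighted": df[dv].mean(),
            "high_rule_out": np.average(df[dv], weights=w_high),
            "low_rule_out": np.average(df[dv], weights=w_low)}

# bounded_rule_out(data)
```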
Conclusion
Our multi-part, preregistered survey experiment in the U.S. sheds light on how both the content of speech and the situation surrounding it affect voter reactions. Although prior accounts have identified such variables as potentially impactful, studies have not contrasted their relative salience within an integrated framework. In a national survey experiment, we found that respondents are most likely to reject politicians for insensitive speech when the target is a member of their own identity group (race, religion, sexual orientation, and gender), when there are more aggravating circumstances (such as being repeat offenders), and when politicians are of a different party from the respondent’s own.
Our findings advance a template of relevant factors shaping public reactions to speech scandals, permitting further studies to add, revise, or delete criteria as appropriate. Future research might expand on our results in several ways. First, experiments could present vignettes of controversial speech that simulate how voters obtain information in other “real-life” scenarios (e.g., via social media feeds). Additionally, qualitative analyses of real speech controversies might trace how the mechanisms that we identify as important operate in practice by changing recipients’ beliefs or emotions. Scholars could also investigate how less tangible traits like personality or charisma affect the ability of politicians to escape backlash from speech scandals. Moreover, they could explore how controversies affect particular demographic groups differently, including potentially galvanizing the support of some subgroups. Finally, studies could examine speech controversies in other contexts, such as journalism, academia, or entertainment.
Our study complements a large literature on public responses to scandals, which has mostly focused on topics like extramarital affairs, financial corruption, and other indiscretions. Despite the attention given in the media and popular culture to speech controversies, the question of why some politicians seem more or less immune to criticism remains ripe for research. Examining how social norms and “evolving standards of decency” change over time, vary both within and across societies, and lead to different outcomes for politicians is key. Such extensions could help political scientists better comprehend the varied responses to politicians accused of “wrongspeak.” These issues are likely to remain relevant given the momentum of social justice movements and moves to redefine what public figures can (and cannot) say across diverse areas.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/XPS.2025.10010
Data availability
The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network, at https://doi.org/10.7910/DVN/DUVWXQ. See Gift and Lastra Anadón (Reference Gift and Lastra Anadón2025).
Acknowledgements
Support for this research was provided by the Department of Political Science at University College London and is gratefully acknowledged. We thank audiences at the American Political Science Association annual meeting and the IE politics research group seminar. The authors thank Zoey Weisman and Victoria Krueger for their excellent research assistance and Ryan Enos and the Harvard Digital Lab on the Social Sciences for hosting the pilot of this study and for comments on the research design.
Competing interests
The authors declare no conflicts of interest.
Ethics statement
This study was approved by the ethics review board at IE University (Research Committee reference #IERC-16/2024-2025).