Introduction
Longstanding attention has been paid to the role of ethnicity in shaping language variation and change, from as early as Labov’s founding studies in Martha’s Vineyard (Labov, 1963) and New York (Labov, 1966), such that ethnicity is recognized as one of the fundamental social variables alongside gender and social class. However, while gender and social class are regularly considered in relation to each other (Labov, 1990), ethnicity is often given privileged status and considered independently of other social factors, reflecting an assumption that ethnically marked ways of speaking serve to index ethnicity only (as observed by Britain, 2022:332; Eckert, 2008:26).
In this paper, we challenge this assumption by demonstrating an intersection between ethnicity and social class. The analyses presented here build on a body of work on ethnic variation in Australian English that has demonstrated not only that social class is a strong explanatory factor for language variation among Australians of Anglo-Celtic background and of minority ethnic backgrounds, but also that the patterning of different ethnic groups is impacted by corresponding social class affiliations, such that apparent ethnic differences are often diminished once social class is factored in (as summarized in Travis, 2024:177-178). The evidence for this has come from the linguistic behavior observed across different ethnic and social class groups, following the standard approach in sociolinguistics. However, an approach that relies on pre-determined social categories is not ideal if the purpose is to explore the nature of those categories (cf. Horvath & Sankoff, 1987:182-184). First, it assumes a priori the reality of these social groupings, which is problematic because social class is notoriously difficult to define and has not been widely studied in Australian English, and because ethnicity is not a monolithic concept. Second, it can be difficult to obtain sufficient representation across ethnic and social class groups to fully test interactions, given that ethnic groups often show skewed social distribution. In order to address this, here we reverse the lens, to first identify groupings defined according to linguistic behavior and then explore how these groupings align with social dimensions, following the pioneering work of Horvath and Sankoff (1987).
We consider 10 linguistic variables spanning phonetic, morphophonological, morphosyntactic, and discourse levels, occurring in approximately 750,000 words of speech recorded with 159 Australians across two time periods (the 1970s and 2010s). These variables are the fleece and face vowels, word-final -er, prevocalic the, word-final unstressed -ing, existential there’s, modals of obligation have got to and need to, and quotatives go and be like. Speakers come from the majority Anglo-Celtic population and three of Australia’s largest migrant groups (Italian, Greek, and Chinese Australians), and are further stratified by age, gender, and social class. To identify linguistic groupings, we conduct cluster analyses using speaker random intercepts extracted from independent regression analyses that included the linguistic conditioning only for each of the 10 variables. Cluster analyses for the 1970s and the 2010s groups point to an intersection between ethnicity and social class, in that ethnic groups cluster with their age and social class cohorts such that no clusters are defined by ethnicity alone. This methodology provides an empirical, quantitative foundation for exploration of social groupings based on linguistic behavior, and the result evidences the interconnectedness of these social dimensions in shaping linguistic variation.
Ethnic variation in its social context
A key question that has been explored in relation to ethnic variation relates to the “integration of ethnic groups into the social system” (Labov, 1966:vi). One particularly marked case of ethnic variation is the development of “ethnolects” (e.g., Hoffman & Walker, 2010:42) or “multiethnolects” (e.g., Cheshire et al., 2011:153), that is, new language varieties that arise in contexts where people from different ethnic and linguistic backgrounds come together. Such varieties have been interpreted as being used as “a means of expressing linguistic identity” (Clyne et al., 2001:226).
Another line of research has been around the participation of ethnic minorities in wider patterns of variation and change, and here also, ethnic identity has been drawn on as an explanation for any differences observed. For example, young people of migrant background in Gothenburg, Sweden, were reported to be differentiated from the Swedish majority in that they were leading in a change, being the first in the city to adopt a shift taking place in Stockholm towards a more open realization of the /ɛ:/ vowel (Gross et al., 2016:238), whereas Turkish and Moroccan youths in the Netherlands were reported to be differentiated by lagging in a change towards a diphthongized variant for Dutch /ɛi/, preferring the older monophthongal realizations “characteristic of the ‘traditional’ urban dialects” (van Meel et al., 2014:68). In both cases, this patterning was interpreted as serving to index ethnicity (Gross et al., 2016:243; van Meel et al., 2014:69). Hoffman and Walker identified similar linguistic conditioning among British, Italian, and Chinese Canadians who grew up in Toronto for both the Canadian Vowel Shift, a change in progress, and (t/d)-deletion, a stable variable (2010:52, 55), but found some differences in rates of participation and deletion, leading them to propose that participants “may be using overall rates of use to construct and express ethnic identities” (2010:58). In Australia, differential patterning for Lebanese Australians’ realization of /l/, Voice Onset Time, and prosody has also been tied to ethnic identity (Clothier, 2019:1891; Clothier & Loakes, 2018:15-16; Cox & Palethorpe, 2011:531).
It is important to acknowledge that not all studies find ethnic differences. In the Swedish study reported above, ethnic differences were observed in Gothenburg, but not in Stockholm (Gross et al., 2016:238). In Toronto, Hoffman and Walker’s finding regarding shared linguistic conditioning across ethnic groups corresponds with other work by Hoffman that has reported no significant differences in participation in the Canadian Vowel Shift between British, Italian, and Chinese Canadians, as all groups were “acquiring and accommodating to the local norm” (2010:136). Another study comparing Chinese Americans in San Francisco and New York found that they were participating in change in the bought vowel as it was taking place in the city in which they lived, and differently from each other, such that their patterning aligned not with ethnicity, but with regional variation (Wong & Hall-Lew, 2014:33).
Furthermore, where differences do emerge, assuming a priori that these index ethnicity may fail to account for differences within the one ethnic group and may risk overlooking other factors that may be at play. For example, one study identified differences in participation in the Northern Cities Vowel Shift by Jewish Americans in Chicago across more and less affluent neighborhoods, leading the authors to conclude that “productions were influenced by local place- and class-based meanings” (Benheim & D’Onofrio, 2024:152). Class-based meanings were also drawn on to account for participation in the Northern Cities Vowel Shift by some African Americans in Rochester, New York, where it was argued that a group of “Mobile Black Professionals” were adopting supralocal sound changes as part of their external orientation (King, 2021:174).
While these studies have suggested that ethnic minorities draw on the potential indexicality associated with class-based differences, it has also been shown that ethnic patterning cannot be separated from social class characteristics of the relevant ethnic groups. This was demonstrated in early variationist work for Jewish communities in North America. For example, in comparing the linguistic behavior of Italian and Jewish Americans in New York City, though Labov found that ethnicity was a “more powerful factor” than social class for the trap vowel (Labov, 1966:306), the distribution across the ethnic groups was skewed, with the Jewish participants having an overall higher class distribution than the Italian Americans (Labov, 1966:293). In Boston, to interpret her finding that Jewish Americans were ahead of Italian and Irish Americans in a shift away from regionally specific patterning for the north vowel, Laferriere noted that Jewish Americans were more likely than both Italian and Irish Americans to be university educated and work in white collar occupations, rendering them “the first to adopt the standard variant, associated with the values of the larger society, through educational exposure” (Laferriere, 1979:612). Many years later, Boberg attributed his finding of greater participation in Canadian norms by Jewish than Italian Canadians to Jewish Canadians’ “access to higher education, and therefore exposure to mainstream, middle-class, Canadian English” (Boberg, 2004:563).
A parallel in the United Kingdom may be what has been named “British Asian English,” spoken by people of Punjabi heritage in a middle class area in West London, which, unlike Multicultural London English (spoken in underprivileged areas), is described as aligning with the norms of “high prestige Standard Southern British English,” while also incorporating some features from “elite Indian English” (Sharma, 2020:56). In Australia, work that compares speech in more versus less ethnically diverse regions in Sydney is also faced with a skewed distribution, in that the more linguistically diverse areas (specifically, Western Sydney) tend to be more socio-economically disadvantaged (Cox & Penney, 2024:204). Most recently, Walker found that level of education helped explain differences in patterning of -ing among Chinese, Italian, and Portuguese Canadians in Toronto: the Chinese Canadians (all of whom were university educated) had the lowest rate of vernacular -in’ and no gender difference, and the gender differences among the Italian and Portuguese Canadians corresponded to the different educational backgrounds of the men and women (2024:302).
Some of the first studies to directly test the intersection between ethnicity and social class were conducted in Australia, using a corpus collected in the 1970s and 1980s. In this early work, Horvath found that Australian-born teenagers of Italian and Greek backgrounds were leading a change in Australian English vowel realizations, away from both “broad” and “cultivated” realizations (associated with working class speech and British Received Pronunciation, respectively), towards more “general” realizations (Horvath, 1985:91-94). She interpreted this as an effort by the teenagers to differentiate themselves from their Italian- and Greek-born parents, whose accented vowels were described as “ethnic broad” (Horvath, 1985:69), in that they were qualitatively distinct from realizations of the Anglo-Celtic majority (being influenced by the speakers’ first language) and were closest to “broad” vowels, consistent with their work in lower-status occupations, for example factories, the construction industry, and family businesses such as milk bars or vegetable shops (Horvath, 1985:46; cf. Jupp, 2001:83). The vowel realizations of the Greek and Italian teenagers, in contrast, were not qualitatively distinct from those of their Anglo peers, though these groups were socially distinct: neither the Greeks nor the Italians exhibited the same gender differences as the Anglos, and the Italians did not exhibit the same class differences (Horvath, 1985:81-83). Horvath hypothesized that, at this time, “the safest ground for ‘sounding’ Australian, whether Greek, Italian or Anglo, [was] the middle of the Broad-General-Cultivated continuum” (1985:176), and in this way, the Italian and Greek teenagers drove forward a change that was underway in Australian English.
Using the same corpus, Guy, Horvath, and colleagues found no linguistic differentiation in High Rising Terminal pitch by the Anglo, Italian, and Greek teenagers, with, again, the only distinction being in relation to the social conditioning, specifically in how the ethnic groups respond to class differentiation (1986:40). Our own recent research, in which we have reanalyzed these same data together with a comparable contemporary corpus (outlined in the following section), corroborates this general pattern.
What is clear from this body of work is that ethnic variation cannot be assumed to be directly indexing ethnicity and must be considered in relation to the broader social context. Britain has observed that what are assumed to be “ethnic” variants may not “merely (or possibly even mainly) index ethnicity per se, but potentially a localized, gendered, classed, age-constrained identity (too),” leading him to call for a shift away from an “asocial perspective” to pay greater attention to potential intersectionality (Britain, 2022:332). This paper seeks to address this call, and to do this at scale, looking across multiple variables, different ethnic groups, and over time.
Sydney Speaks
The data for this study come from the Sydney Speaks corpus (Travis et al., 2023), a sociolinguistic corpus collected for the purpose of examining language variation and change in one of Australia’s largest and most diverse cities. The corpus totals over 1.2 million words of transcribed speech from 130 hours of recordings with 265 Australians taken from three sub-corpora (recorded in the 1970s, 1980s, and 2010s), with birth years spanning from 1889 to 2001 (see Travis, 2024, for a corpus overview).
Participants
For this paper, we draw on two sub-corpora: the Sydney Social Dialect Survey recorded in the 1970s with Australians of Anglo-Celtic, Greek, and Italian backgrounds (Horvath, 1985), and the ANU Corpus of Sydney Speech, recorded from 2014 to the present with Australians of Anglo-Celtic, Greek, Italian, and Chinese backgrounds. Participants are stratified for age, gender, socio-economic class, and ethnicity, allowing for ethnic variation and change to be examined in both real and apparent time. A total of 159 participants are included in this study (restricted to those for whom data for all the variables is available; see the section below on the linguistic variables). As summarized in Table 1, these participants represent four time points, based on time of recording and year of birth: Adults and Teens recorded in the 1970s (born around the 1930s and 1960s, respectively), and Adults and Young Adults in the 2010s (born around the 1960s and 1990s, respectively). As there is evidence of language change over this time, for the purposes of the analyses presented here, we analyze each group with their cohort recorded in the same time period.
Table 1. Participants by age, ethnicity, and gender

All participants grew up in Australia (either being born in Australia or having arrived before the age of six) and live in Sydney. The different ethnic groups included represent the largest groups in Australia at the time of recording. Anglo-Celtic Australians are defined as those whose parents and grandparents were born in Australia, and who grew up in an English-speaking household; this represents the majority community in Australia, and they are included at each time point. Greek and Italian Australians are included in the Sydney Speaks corpus from the 1970s Teens on. In the analyses conducted here, we include Italian Australians for three time points, but Greek Australians for the 1970s Teens only, due to low numbers of participants for the other age groups. The 1970s Teens and 2010s Adults are the children of the first large wave of migration from Greece and Italy following the Second World War (Jupp, 2001:83). The 2010s Italian Young Adults are primarily second-generation Australians; three of the 12 are third-generation Australians. The Chinese Australians are second-generation Australians whose parents migrated from Hong Kong and Cantonese-speaking areas in mainland China, primarily under business migration schemes that ran in the 1980s, following the opening up of Australian migration policy (Jupp, 2001:218). The Greek, Italian, and Chinese participants all grew up in homes in which the respective language was spoken; they thus all have exposure to the language, but not all of them speak it.
Participants are stratified for social class, which was determined from a composite score made up of three independent factors, obtained from a demographic questionnaire that was carried out immediately following the sociolinguistic interview: occupation, for the participants themselves or for their parents for those who were still in high school (five points, scored according to the Australian Socio-Economic Index, AUSEI; McMillan et al., 2009); highest level of education (five points, distinguishing between non-completion of high school, completion of high school, technical college [known as TAFE], university, and post-graduate degree); and type of high school, based on historical and current government funding as well as fees and resources typically contributed by families for attendance (four points, distinguishing between State, Catholic, Selective [prestigious, government-funded high schools], and Independent schools). These scores were summed together to provide a continuous measure, on a scale from three to 14. This operationalization allows us to capture, in a single measure, the complexity of social class, by taking into account different elements that make it up.
The different ethnic groups are not evenly spread across social class, as can be seen in the density plot in Figure 1, which shows the distribution of the participants by class and ethnicity for the different age groups. Class is presented on the x-axis, moving up in social class score as we move farther to the right, and the four age groups are presented on the y-axis, from oldest to youngest as we move down. The different ethnic groups are captured in color, and height represents the number of speakers in each group. First, of note here is the overall higher social class scores for the 2010s participants relative to the 1970s participants, seen in that the peaks fall farther to the right as we move down the chart, especially for the 2010s Young Adults. This is largely due to changes in higher education tuition funding from 1974 onwards, which made university more accessible, resulting in increased participation rates (Norton, 2023:32-33).
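The arithmetic of the composite score can be sketched as follows. This is a minimal Python illustration and not the procedure used to code the corpus: the point values assigned to each education level and school type follow the order in which they are listed above, which is an assumption, and the function and category names are hypothetical. The occupation component is taken as an already-banded value from one to five, since the AUSEI banding is not described here.

```python
# Point values follow the order in which the categories are listed in the text;
# treating that order as the scoring order is an assumption of this sketch.
EDUCATION_POINTS = {
    "no_high_school": 1,
    "high_school": 2,
    "tafe": 3,
    "university": 4,
    "postgraduate": 5,
}
SCHOOL_POINTS = {"state": 1, "catholic": 2, "selective": 3, "independent": 4}


def class_score(occupation_points, education, school):
    """Sum the three component scores into a composite from 3 to 14.

    occupation_points is a 1-5 band assumed to have been derived
    from the AUSEI beforehand; the banding itself is not shown here.
    """
    if occupation_points not in range(1, 6):
        raise ValueError("occupation_points must be an integer from 1 to 5")
    return occupation_points + EDUCATION_POINTS[education] + SCHOOL_POINTS[school]


# The two extremes of the scale described in the text.
lowest = class_score(1, "no_high_school", "state")
highest = class_score(5, "postgraduate", "independent")
```

Under these assumptions, the minimum and maximum of the three components give the stated range of three to 14.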

Figure 1. Social class scores for participants by ethnicity over time (n = 159).
Of more interest to us here is the distribution of the ethnic groups according to social class. For the 1970s Adults and Teens, the Anglo Australians cover the full range; the Italian and Greek Teens, on the other hand, fall slightly lower on the range (the peaks are to the left of the Anglos), a reflection of the working-class occupations of their parents. The 2010s Adult Italians, however, have a wider distribution, reflecting the upward mobility this group experienced over this 40-year time period (Ricatti, 2018:41). For the 2010s Young Adults, the Italian-background participants have limited distribution, most falling between six and nine (a distribution that we are working to broaden through ongoing data collection). The Chinese Australians, in contrast, are overall higher on the scale, something which is not a sampling issue but a reflection of the nature of the community of second-generation Australians of Cantonese background today, a group that tends to attend selective or private high schools, achieve high levels of education, and work in professional occupations (Jupp, 2001:221).
Sociolinguistic interview data
The recorded speech comes from sociolinguistic interviews (Labov, 1972), recorded in the 1970s by Horvath and associates, and in the 2010s by research assistants who were members of the same ethnic communities as the participants. From interviews lasting between 45 and 90 minutes, approximately 30 minutes were selected for transcription, which amounted to around 4,500 words of speech from each participant. The data were transcribed orthographically in ELAN (Lausberg & Sloetjes, 2009) and force-aligned in LaBB-CAT (Fromont & Hay, 2012). Vowel alignments were manually corrected and processed in Praat (Boersma & Weenink, 2024) to obtain estimates of F1 and F2 values for analysis (more details on corpus preparation available in Travis, 2024:172-175).
Though the topics covered vary across the two time periods and across individual interviews, common themes that came up include participants’ life experiences (e.g., growing up in Sydney, travel, work, and hobbies), language use, network, and identity, all of which can be helpful for understanding the social background of the participants and interpreting the linguistic behavior observed. To illustrate, Example 1 presents an excerpt from an interview with a Young Adult Chinese Australian, Lathan, who is talking about accent differentiation across Sydney and outlining his belief that, more than ethnic differences, there are class-driven regional differences, in what turns out to be a very astute observation.
(1)

Linguistic variables
As this work is concerned with the social patterning for multiple variables, we do not have space for detailed explanation of the conditioning of each individual variable. We therefore utilize a set of variables for which analyses are already available, allowing us to focus our attention on the patterning across these variables as a set. The variables included are listed in Table 2, along with the nature of the variation observed in prior Sydney Speaks work and the number of tokens included in the regression analyses (the model summaries of which are produced in the Supplementary Materials). It is necessary to consider the 1970s and 2010s data separately because the variation is not identical at the two time points, and because we include different ethnic groups at each time point. We now briefly describe each of the variables in turn.
Table 2. Linguistic variables included

The variables come from different linguistic levels. Two phonetic variables are the vowels fleece and face, which have been demonstrated to be undergoing raising and fronting over time, moving away from realizations traditionally associated with men and the working class, though with different timing, with fleece shifting from the 1970s and face from the 2010s (Grama et al., 2021:300, 306; Purser et al., 2020:284). As noted above, Italian and Greek teenagers were at the forefront of this change in the 1970s (Horvath, 1985:94), whereas today, it is the Chinese Australians who are the most ahead (Grama et al., 2021:307). For these variables, as an overall measure, we use the position along the front diagonal, calculated by subtracting F1 from F2, using the Lobanov normalized values converted back to Hz (as was done in Grama et al., 2021:302).
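The front-diagonal measure can be illustrated with a short Python sketch. It assumes that Lobanov normalization is applied per speaker and per formant as a z-score over that speaker’s vowel tokens, and that conversion back to Hz is done by rescaling with a reference mean and standard deviation; the rescaling formula, the reference values, and the function names are all assumptions made for illustration, not details reported above.

```python
from statistics import mean, stdev


def lobanov(values):
    """z-score one speaker's formant values (mean 0, sample SD 1)."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


def front_diagonal(f1, f2, ref_f1, ref_f2):
    """Return F2 minus F1 per token after normalization and rescaling.

    f1, f2: one speaker's formant measurements in Hz, token by token.
    ref_f1, ref_f2: hypothetical (mean, sd) pairs in Hz used to put the
    normalized values back on a Hz-like scale (an assumption of this sketch).
    """
    f1_hz = [z * ref_f1[1] + ref_f1[0] for z in lobanov(f1)]
    f2_hz = [z * ref_f2[1] + ref_f2[0] for z in lobanov(f2)]
    # A larger difference corresponds to a higher and fronter vowel.
    return [second - first for first, second in zip(f1_hz, f2_hz)]


# Made-up measurements for three tokens from one speaker.
diagonal = front_diagonal(
    f1=[300, 400, 500],
    f2=[2000, 2200, 2400],
    ref_f1=(450, 100),
    ref_f2=(2100, 200),
)
```

In practice the normalization would be computed over all of a speaker’s vowels rather than the three tokens shown here.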
We include three variables at the morphophonological level: word-final -er; prevocalic the; and word-final unstressed -ing. Word-final -er refers to unstressed schwa in word-final position followed by an orthographic “r” that is typically not pronounced in Australian English, which is a non-rhotic variety (e.g., teacher, culture). Grama et al. (2020) found lengthening (and accompanying backing and lowering) of this form beginning with the 1970s Teens, and in particular the Greek Teens, for whom this appears to have been an ethnolectal marker (cf. Clyne et al., 2001:228; Kiesling, 2005:20). Today, -er is no longer differentiated by ethnicity, with the lengthening having been taken up across the community (Grama et al., 2020:358-359; Sheard, 2022). Prevocalic the (e.g., in the other, the Italian) is currently undergoing a change from fleece to schwa in Australian English, as in other varieties (e.g., Cox et al., 2023:15; Fox, 2015:167; Hay et al., 2012). In Australia, this is a recent change, seen in apparent time comparisons between the 2010s Adults and Young Adults, and is led by middle-class women and by Chinese Australians (Gan & Travis, 2022:58-59). The variation between velar and alveolar -ing (e.g., going/goin’) follows a similar pattern in Australia as elsewhere: it is generally stable, with the alveolar variant favored by men and the working class. The highest alveolar rates are found here with the 1970s Teens, largely consistent with an adolescent peak, and the 2010s Adult Italians, attributed to the working-class associations of this group; the lowest rates are found with the 2010s Young Adult Chinese (Travis et al., 2023:446).
At the morphosyntactic level, we consider existential there’s and modals of obligation. Existential there’s refers to the alternation of the singular and plural verbs with plural arguments (there’s versus there are; e.g., there’s lots of local breweries around, [SydS_AOF_005]). There was a marked increase in there’s from the 1970s Adults to Teens, led by the working class, and today, this change has largely stabilized, with 2010s Adults and Young Adults exhibiting a similar rate of there’s as the 1970s Teens (Gan, 2024:151, 159). For modals of obligation, we consider the long-standing variant have to in variation with have got to, for which there is largely stable variation, with a favoring of have got to (also in the reduced form gotta) by the working class and men. In the 2010s a new variant has arisen, namely need to, which is not socially distinguished from have to but which is favored by the middle class and by Chinese Australians relative to should (Travis & Torres Cacoullos, 2023:364, 369).
Finally, we include quotatives as a discourse-pragmatic variable. Variation across quotatives in Australia (as elsewhere) includes say, be like, think, go, and zero quotative, as well as other miscellaneous verbs (tell, wonder, etc.). In the 1970s, quotative go increased as a competing variant (e.g., he goes mate, come on let’s go for a drive. [SydS_IOM_058]), and in the 2010s, be like has arisen and taken over as the majority variant among Young Adults (e.g., I’m like, I don’t even have this much food, [SydS_IYF_037]) (Lee, 2020:34). The 1970s rise of go was led by the Greek Teens, and by the Anglo and Italian middle and lower middle class men, while 2010s be like is favored by women, the middle class, and Anglos and Italians over Chinese Australians (Lee, 2020:62, 76).
This set of variables includes some that exhibit change that is ongoing over the period examined (fleece, -er) and some that are stable (-ing, have to versus have got to), as well as others that undergo change in one period and are stable in the other (e.g., face stable in the 1970s, there’s stable in the 2010s), and others that are only relevant in one time period (quotatives, go in the 1970s and be like in the 2010s; the + Vowel and need to in the 2010s). Some have clear class and gender differentiation (e.g., -ing, have got to), while others do not (e.g., -er in the 2010s), and some exhibit related ethnic differentiation, with the Italian and Greek Teens and Italian Adults tending to favor forms that are more associated with men and the working class, and Chinese Young Adults forms that are more associated with the middle class (e.g., -ing). The individual analyses carried out already paint a picture of class-ethnicity intersections for many variables, but we might wonder whether this result has been promoted by assuming defined ethnic and social class groups. Thus, we now test this further by examining how speakers are clustered when we consider only their linguistic behavior. If this reveals clear ethnic groupings, this would support the primacy of ethnicity in driving variation; if not, then the emerging groups should help us better understand the relationship between ethnicity and other social factors.
Cluster analyses
The method we employ to group speakers according to their linguistic behavior is the clustering technique of Divisive Analysis (DIANA), following previous work assessing social groupings in New York (Haddican et al., 2021, 2022). Prior to presenting the results, we discuss the preparation of the data for entering into the clustering algorithm, the nature of the clustering performed, and the methods used to interpret the clusters.
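As background to the divisive approach, the sketch below shows a single splitting step of the kind that DIANA applies repeatedly, working top-down from one cluster containing all speakers. It is a simplified Python illustration under stated assumptions, not the implementation used for the analyses: each speaker is represented as a tuple of scores, dissimilarity is taken to be Euclidean distance, and the function names are hypothetical. A complete implementation (for instance, the `diana` function in the R `cluster` package) additionally chooses which cluster to split at each stage and records the full hierarchy.

```python
from math import dist


def avg_dist(point, group):
    """Mean Euclidean distance from one point to every point in a group."""
    return sum(dist(point, other) for other in group) / len(group)


def diana_split(points):
    """Split one cluster in two by growing a 'splinter' group.

    Returns (remaining, splinter). The splinter group is seeded with the
    point that is on average farthest from the rest; further points move
    across while they sit closer to the splinter group than to the
    points left behind.
    """
    remaining = list(points)
    if len(remaining) < 2:
        return remaining, []

    def others(index):
        return remaining[:index] + remaining[index + 1:]

    seed = max(range(len(remaining)),
               key=lambda i: avg_dist(remaining[i], others(i)))
    splinter = [remaining.pop(seed)]

    while len(remaining) > 1:
        # Positive gain: the point is closer to the splinter group
        # than to the other points still in the original cluster.
        gains = [avg_dist(p, others(i)) - avg_dist(p, splinter)
                 for i, p in enumerate(remaining)]
        best = max(range(len(remaining)), key=gains.__getitem__)
        if gains[best] <= 0:
            break
        splinter.append(remaining.pop(best))
    return remaining, splinter


# Made-up two-dimensional scores for five speakers.
speakers = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5)]
stay, leave = diana_split(speakers)
```

With these made-up scores, the two speakers near (5, 5) form the splinter group and the three near the origin remain together.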
Speaker random intercepts as a measure of speaker behavior
In order to be able to group speakers according to their linguistic behavior, the first step is to obtain comparable data for each of the linguistic variables included. The raw values are on different scales (Hz measurements for the vowels, duration in milliseconds for -er, and rates of a certain variant for the categorical variables). Though we could scale these to make them comparable, relying on raw values is still potentially problematic because they are impacted by the linguistic context in which the instances occur, and the distribution of those contexts is not even across speakers in these spontaneous speech data (cf. Torres Cacoullos & Travis, 2018:148). For example, a speaker who produces a high proportion of quotatives in the historical present is likely to have a higher rate of the go quotative than others who use less historical present because this is an environment that strongly favors go (Lee, 2020:57). In such a case, a higher rate of go may reflect the use of historical present more than a general favoring of go.
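The confound described here can be made concrete with a small worked example in Python. All of the numbers are invented for illustration and are not taken from the corpus: two hypothetical speakers are given identical rates of go within each tense context, and differ only in how often they use the historical present.

```python
def overall_rate(context_counts, context_rates):
    """Overall rate of a variant from token counts and per-context rates."""
    total = sum(context_counts.values())
    expected = sum(count * context_rates[context]
                   for context, count in context_counts.items())
    return expected / total


# Invented per-context rates of quotative go, shared by both speakers.
rates = {"historical_present": 0.6, "other": 0.1}

# Invented token counts: speaker A narrates mostly in the historical present.
speaker_a = {"historical_present": 80, "other": 20}
speaker_b = {"historical_present": 20, "other": 80}

rate_a = overall_rate(speaker_a, rates)  # about 0.5
rate_b = overall_rate(speaker_b, rates)  # about 0.2
```

Although the two hypothetical speakers behave identically within each context, their raw rates differ by a wide margin, which is the kind of distortion that a measure taking linguistic conditioning into account is intended to remove.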
In order to control for the impact of predicting factors (such as tense for quotatives), we use by-speaker estimates from independent regression models run for each variable as a measure of “how much that individual’s trend diverges from the predicted trends set forth in the statistical model” (Drager & Hay, Reference Drager and Hay2012:62). These models are based on the patterning already identified in our prior studies in the literature referred to in the previous section on the linguistic variables but exclude the social predictors to ensure that the clusters are formed on the basis of the linguistic conditioning only. For continuous variables (fleece, face, -er) we used the lmer function, and for categorical variables (the + Vowel, -ing, there’s, modals of obligation, quotatives) we used the glmer function (in the lme4 package; Bates et al., Reference Bates, Mächler, Bolker, Walker, Christensen, Singmann, Dai, Scheipl, Grothendieck, Green and Fox2019), including speaker as a random intercept in all models. (See Supplementary Material for model summaries for all variables.)
We then extracted the speaker random intercepts from each model and used these as the input to the divisive clustering analysis. These speaker random intercept scores are on different scales, coming from both continuous and categorical variables, and thus we scaled them in order to make them comparable (by centering each score and then dividing it by the standard deviation for each set to produce a z-score, using the scale function in R) (cf. Haddican et al., 2022:516).
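The scaling step can be illustrated with a minimal sketch. This is not the R code used in the analysis; it is a Python analogue that assumes, as R's scale function does by default, that each set is centered on its mean and divided by the sample standard deviation (n - 1 denominator). The variable names and intercept values are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Center on the mean and divide by the sample SD (n - 1 denominator)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical by-speaker random intercepts for four speakers (illustrative only).
intercepts = {
    "fleece": [120.0, -80.0, 40.0, -80.0],  # from a linear model (Hz-based scale)
    "ing": [1.2, -0.4, 0.1, -0.9],          # from a logistic model (log-odds scale)
}

# Scale each variable separately so that all variables share one scale.
scaled = {name: zscore(vals) for name, vals in intercepts.items()}
```

After scaling, each variable has a mean of zero and a standard deviation of one, so no single variable dominates the distance calculation simply because of its original units.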
The reliability of the random intercepts in capturing speaker behavior is supported by a strong correlation with raw values. Figure 2 visualizes this correlation for one continuous (fleece) and one categorical variable (-ing), in the 1970s (top panes) and 2010s (bottom panes). The x-axis represents the scaled speaker random intercepts, and the y-axis the raw values (normalized F2 minus F1 for fleece and rate of velar for -ing). Values farther to the right on the x-axis represent a higher intercept (and thus a favoring of a higher and fronter fleece, or of velar over alveolar -ing), and values higher on the y-axis represent a higher and fronter fleece as measured in Hz, or a higher rate of velar -ing. As can be seen, the values are strongly correlated, indicating that the overall picture would be similar regardless of the measure used. In taking account of the linguistic conditioning, however, the random intercepts distinguish the speakers better. For example, for -ing there are several categorically velar speakers who differ according to the random intercept, such as Jade (a 1970s Anglo Adult) and Lisa (a 1970s Italian Teen), who are both categorically velar, but whose scaled random intercepts differ (1.947 for Jade and 0.838 for Lisa). Examination of their data reveals that Jade produces proportionally more tokens preceding a coronal (e.g., going to school, something like that), an environment that strongly favors the alveolar realization (Travis et al., 2023:448). The high intercept reflects her strong favoring of the velar variant overall, even in contexts where the alveolar variant is favored. In this way, we consider the random intercept to be a more reliable measure than the raw values.

Figure 2. Correlations between the by-speaker random effects and raw values.
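The kind of check summarized in Figure 2 can be sketched as a Pearson correlation between scaled intercepts and raw rates. The sketch below is illustrative only: the five data points are invented, not taken from the corpus, and the choice of Pearson's r is an assumption, since the text does not name the correlation measure used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented values: scaled random intercepts and raw rates of velar -ing.
# The last two speakers are both categorically velar (rate 1.0) but differ
# in intercept, mirroring the situation described for Jade and Lisa.
scaled_intercepts = [-1.5, -0.5, 0.0, 0.8, 1.9]
raw_rates = [0.2, 0.5, 0.6, 1.0, 1.0]

r = pearson_r(scaled_intercepts, raw_rates)
```

In this toy example the two measures are strongly correlated, yet only the intercept separates the two speakers whose raw rate is at ceiling.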
We obtain speaker random intercepts for each of the variables included, and thus for the 1970s analysis, we obtain seven measures for each speaker, and for the 2010s nine measures (see Table 2); these are the data on which we perform the clustering.
Identifying the social makeup of the linguistically defined clusters
The divisive clustering technique we use (DIANA) is particularly well suited for identifying potentially intersecting groups for a number of reasons, as outlined by Haddican and colleagues (2021:149, 2022:516). This is a top-down clustering approach, which starts with the entire dataset (here, all speakers, with the corresponding random intercept scores for each variable) in a single cluster and recursively splits the data (the speakers) into smaller clusters based on their similarity across these variables until reaching individual speakers. Clusters are determined based on the overall similarity between sets of speakers across all variables, measured in Euclidean distance (the square root of the sum of squared differences). Unlike some other clustering algorithms, the number of clusters is not pre-defined but is determined on the basis of the data, which is important here, as how many clusters there are in the data is one of the questions that we are asking. The hierarchical structure of this clustering method identifies splits in the dataset at different levels, thus bringing to the fore intersections across social categories, and the results are visualized in dendrograms, which facilitates the identification of groupings at different levels of the hierarchy.
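To make the procedure concrete, the following is a minimal Python sketch of Euclidean distance and of a single DIANA split. It is not the implementation used in the analysis. It assumes the standard splinter-group procedure for DIANA: the member with the highest average distance to the others seeds a new group, and further members move across for as long as they are, on average, closer to the new group than to the rest. The full algorithm repeats this on successive clusters until each speaker stands alone. The five two-variable "speakers" are hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    """Square root of the sum of squared differences across variables."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_dist(i, group, points):
    """Average distance from speaker i to the members of group (excluding i)."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(euclidean(points[i], points[j]) for j in others) / len(others)


def diana_split(cluster, points):
    """Perform one divisive split of a cluster into (splinter, remaining)."""
    remaining = list(cluster)
    # Seed the splinter group with the most dissimilar member.
    seed = max(remaining, key=lambda i: avg_dist(i, remaining, points))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        # Gain: average distance to the rest minus average distance to the splinter group.
        gains = {
            i: avg_dist(i, remaining, points) - avg_dist(i, splinter, points)
            for i in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # nobody left is closer to the splinter group
        splinter.append(best)
        remaining.remove(best)
    return splinter, remaining


# Hypothetical scaled intercepts: five speakers, two linguistic variables.
points = [(-1.0, -1.2), (-0.8, -0.9), (-1.1, -0.7), (1.0, 1.1), (0.9, 1.3)]
splinter, remaining = diana_split(range(len(points)), points)
```

With these toy values, the split separates the two speakers with positive scores from the three with negative scores; no number of clusters is specified in advance.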
Clustering algorithms will produce clusters because that is their function, but this does not mean that the clusters are meaningful. The validity of the clusters formed here can be demonstrated both socially and linguistically. To interpret their social makeup, we examine the distribution according to the age, gender, social class, and ethnicity of cluster members through visualizations in the form of dendrograms and scatterplots, and to interpret their linguistic makeup, we assess the relative contribution of the different linguistic variables to the clusters via random forest models.
The social makeup of the clusters
1970s
Figure 3 displays the dendrogram that is output from the cluster analysis of the 1970s data. The points along the bottom of the dendrogram represent individual speakers, clustered according to their linguistic behavior across these variables. The length of each branch indicates the relative similarity between speakers and clusters, thus the higher up the tree, the greater the difference between the groups. These clusters are entirely agnostic as to their social makeup, but we have visualized age (in shape) and ethnicity (in color) to give an initial sense of the patterns. It is by considering the social groupings that come out of the linguistic patterning that we can track both change over time and how that is taken up according to gender, class, and ethnicity.

Figure 3. Dendrogram of a DIANA cluster analysis: 1970s (81 speakers, 7 variables: fleece; face; -er; -ing; there’s; have got to; quotative go).
This model initially identifies three main clusters. Within Group 1, the largest cluster (57 members), we observe two subgroups that are distinct socially (labelled here Group 1.1 and 1.2, with 41 and 16 members, respectively), which will be examined separately. Groups 2 and 3 each have smaller numbers of members (18 and six, respectively), and we will not break them down further.
The order in which the clusters at different levels of the hierarchy are placed is arbitrary (that is, Group 1.2 is no more similar linguistically to Group 2 than Group 1.1 is), but the social patterns that emerge suggest that they are distinguished first and foremost by age, in that most of the Adults (empty squares) fall into Groups 1.1 and 1.2, while Group 2 has only two adults, and Group 3 has none. To better visualize the social composition of these groups, we reproduce the clusters from the dendrogram in scatter plots (Figure 4), re-ordering them in a socially relevant way, with the cluster containing the highest proportion of the oldest age group (1970s Adults) on the left, and the youngest age group (1970s Teens) on the right, thus ordering the panes in a way that captures the apparent time change. We further distinguish the members of each cluster according to age, gender, ethnicity, and social class.

Figure 4. Scatter plots for social groupings: 1970s.
In Figure 4, each of the four panes represents one of the clusters identified. We have split each pane according to age group (Adults on the left, Teens on the right), and the y-axis shows social class, with a higher score corresponding to higher social class. Men are marked by triangles and women by squares, and ethnicity is coded by color.
Here, it can more easily be seen that almost all the Adults occur in the first two panes (Groups 1.2 and 1.1, the two groups making up the first cluster in Figure 3). The first pane (Group 1.2) includes both Adults and Teens and spans a wide range for social class, but it is distinguished by gender, comprising more women than men. For ethnicity (relevant for the Teens, as the Adults are all Anglos), we find that the Anglos, Greeks, and Italians are all represented. The second pane (Group 1.1) is distinguished by having proportionally more Teens and proportionally more men than the first pane; again, it covers the range of social class, and for the Teens the three ethnic groups are represented.
The third pane (Group 2) includes only two Adults and only two women, and overall corresponds to lower social class than the previous two groups. We still see no ethnically driven clustering, with Anglo, Greek, and Italian Teens occurring together.
It is in the fourth pane (Group 3) that we do observe some ethnic differentiation. This small cluster, consisting of only six participants, is made up entirely of Italian and Greek Teens with low social class scores. This marks the first time we see ethnicity playing a defining role in the clustering of linguistic behavior, and it does so in conjunction with low socio-economic status.
For the 1970s, then, the clusters formed on the basis of linguistic behavior are primarily distinguished by age, reflecting change in apparent time. The distribution of the social groups across the clusters suggests that the changes are led by the working class and by men, seen in the gradual shift down in social class as we move from left to right across the panes, and in the shift from predominantly women to gradually more men for all but the final pane, which has both men and women. Crucially, ethnicity plays a minor role here, emerging as a distinguishing factor only in a small cluster and with participants of lower social class.
2010s
Figure 5 presents the dendrogram output from the 2010s analysis. Here, the initial split delineates two primary groups (Group 1 and Group 2), quite markedly distinguished by age, with almost all Adults (empty squares) and no Young Adults occurring in Group 2. Group 1 is divided into three subgroups (1.1, 30 members; 1.2, 15 members; 1.3, six members), and Group 2 into two subgroups (2.1, 12 members, and 2.2, 15 members). We consider the social makeup of these five groups presented in the scatterplot in Figure 6, in which (as for Figure 4) we have re-ordered the clusters to capture the progression of change, presenting the two sub-groups (made up of Adults) in Group 2 on the left, and then the three sub-groups (made up of Young Adults) in Group 1 on the right.

Figure 5. Dendrogram of a DIANA cluster analysis: 2010s (78 speakers, 9 variables: fleece; face; -er; the + Vowel; -ing; there’s; have got to; need to; quotative be like).

Figure 6. Scatter plots for social groupings: 2010s.
The two groups where almost all of the older speakers cluster, the first two panes in Figure 6, are distinguished primarily by gender, with the first pane (Group 2.2) having proportionally more men and the second (Group 2.1) having proportionally more women. Both groups cover a broad social class distribution and, importantly, Anglos and Italians are similarly distributed in both. Thus, the separate grouping of some Italian and Greek Teens with lower social class scores that we saw for the 1970s has dissipated for this age group.
The next three panes (Groups 1.1, 1.2, and 1.3) represent the younger speakers, who are not differentiated by gender but are differentiated by social class. Group 1.1 has a class distribution as broad as that of the two Adult groups, while in Groups 1.2 and 1.3, there are progressively fewer speakers with low class scores. There is a corresponding shift in the ethnic makeup of these clusters; Anglo Young Adults are distributed across the three clusters, though the majority are situated in 1.1 and 1.2, with the Italian Young Adults, whereas the majority of the Chinese Young Adults are situated in Groups 1.2 and 1.3 (11 out of the 17 Chinese Australians occur in these groups, as opposed to six out of the 18 Anglo Australian Young Adults and two out of the 12 Italian Australian Young Adults, both in 1.2).
For the 2010s, we thus see clustering according to age, as was also seen for the 1970s, with gender playing a role in further differentiating the age groups. For ethnicity, the Anglo and Italian Australians pattern similarly, and though the Chinese Australians are distinct, this is again in conjunction with a class distinction, this time patterning with higher social class.
Summary: social class, ethnicity, and ethnic orientation
In sum, in neither time period do we get clustering according to ethnicity alone, but rather the ethnic patterning that we observe is tied to social class, specifically to lower social class for the 1970s Greek and Italian Teens, and higher social class for the 2010s Chinese Young Adults. It is also of note that the nature of the set of changes in each time period is distinct: in the 1970s, the change is led by men and the working class, whereas in the 2010s, it is led by women and the middle class. This may be due to the specific set of variables and ethnic groups considered here, rather than an overall shift in the leaders of language change over time, a question which we leave for further analysis. What is of more interest to us here is that the intersection between class and ethnicity remains, independent of the leaders of change: the ethnic groups pattern with their corresponding class affiliations.
Though here we have highlighted class distinctions, other work has highlighted distinctions in the degree to which members of an ethnic community orient to their ethnic heritage, and such “ethnic orientation” has been reported to impact linguistic behavior (e.g., Hoffman & Walker, 2010:59). We have also considered ethnic orientation in Sydney Speaks, compiling information from the sociolinguistic interview and from the demographic questionnaire about the makeup of participants’ social network, language ability and use, visits to and connections with the countries of their parents, engagement with cultural traditions, and so on (see Travis, 2024:171). However, ethnic orientation has typically been found to have no effect (e.g., Gan, 2024:189; Lee, 2020:94-95; Sheard, 2023:177-178, 212, 285), and in the few cases where an effect has been identified, this occurs with variables that are highly constrained by social class (lengthening of word-final -er for the Greek 1970s Teens; Sheard, 2023:249; and use of schwa versus fleece in prevocalic the for the Chinese 2010s Young Adults; Gan, 2024:189). Perhaps even more telling is the fact that, for these groups, ethnic orientation correlates with social class: for the 1970s Greek Teens, higher ethnic orientation correlates with lower social class (and longer -er), and for the 2010s Chinese Young Adults, higher ethnic orientation correlates with higher social class (and higher use of schwa) (Gan, 2024:192; Travis, 2021). Thus, ethnic orientation, just like ethnicity, is tied to social class.
The relative contribution of the linguistic variables to the clustering
As a final consideration, we turn to the linguistic makeup of the clusters. The linguistic variables that form the basis of the clustering are for the most part structurally independent (an exception may be fleece and face, and the two sets of modals of obligation), ensuring that the results are not restricted to a specific aspect of language. However, we might ask whether the different linguistic features equally impact the clustering or whether the variation is driven more by any particular subset of variables. To determine the relative contribution of each variable to the clusters, we applied random forest analysis, which ranks the variables according to their relative importance in predicting an outcome. Random forests are a tree-based modeling method that uses recursive partitioning, assessing the likelihood of each variant within a specified set of predictors and making binary splits in progressively smaller data subsets until no further significant splits are found. Random forests average the results across multiple conditional inference trees produced in this way to determine the overall importance of each predictor, based on randomly generated subsets of the data (Tagliamonte & Baayen, 2012:159-160). For the random forests produced here, the dependent variable was the cluster in which the speaker was placed (four for the 1970s and five for the 2010s), and the independent variables were the speaker random intercepts for each of the linguistic variables in the analysis (seven for the 1970s and nine for the 2010s). We used the ranger package in R (Wright & Ziegler, 2017) and set it to produce 500 decision trees and to consider three linguistic variables at each split.
Figure 7 presents the results of the random forest analyses, displaying the importance ranking of the variables in determining the clusters for the 1970s data on the left and the 2010s data on the right (see Footnote 5). The x-axis shows the weights of the variables, which reflect their importance, and the y-axis lists each variable in order of its ranking. All variables have scores greater than zero, indicating that each plays a role in shaping speaker groupings, but the wide range shows that they do this to varying degrees. The range is larger for the 1970s (2.7 to 10.6) than for the 2010s (2.9 to 7.8), indicating that the contribution of the different variables is more similar in the 2010s than it is in the 1970s data.

Figure 7. Two independent analyses testing the importance ranking for linguistic variables in the 1970s (L) and 2010s (R).
The ranking of the variables in each model is slightly different: for the 1970s, fleece contributes most to the speaker groupings, followed by there’s and -ing, and then (contributing progressively less) -er, quotative go, face, and modal have got to. For the 2010s, quotatives have the largest impact, followed by fleece, (have) got to, and face, then (with progressively smaller contributions) there’s, -ing, the + Vowel, and finally, -er and modal need to.
It has been proposed that phonetic variables are more widely available for social indexing than morphosyntactic and discourse variables (e.g., Cheshire, 2005:479). If that applied here, the phonetic variables should be ranked higher in importance than the grammatical or discourse variables, but this is not the case. For the 1970s, fleece is the most important but face is among the least important, and for the 2010s, fleece again has a higher ranking than face, but the strongest effect is with the discourse-pragmatic variable, be like. The primary factor that would appear to be driving the relative importance is the timing of the change. For example, though fleece and face have both been undergoing change for some time, the change in fleece began with the 1970s Teens, whereas the change in face began later; thus, fleece distinguishes the 1970s groups more than face does. In the 2010s, we see the meteoric rise of be like among the Young Adults (with a rate of use of just 10% for the Adults and 70% for the Young Adults), rendering this the most important variable. Likewise, there’s, -ing, -er, and quotative go are variables that show a marked shift from the 1970s Adults to the Teens, whereas the variation between (have) got to/have to is more similar across the two groups. For the 2010s, there is a drop in use of (have) got to versus have to from the Adults to the Young Adults, in contrast to there’s, -ing, and need to, for which there is less apparent time change. The low importance of -er and the + Vowel for the 2010s is surprising, given that they do undergo change in this time period. It may be that the overall change is smaller for these forms, thus lessening their importance relative to the other variables. (Regarding the timing of these changes, see section on the linguistic variables above and references therein.)
These analyses show, then, that first, in each time period, all variables contribute to the clustering observed, and second, the main factor that determines their relative importance is the difference between the two age groups within each time period, reflecting the changes that have occurred. This is consistent with the social distribution of the clusters, for which age was the primary factor in the first division made, thus confirming that these clusters are not random, but meaningfully capture social and linguistic patterning in the data.
Conclusion
The results presented here align with observations in prior Sydney Speaks work that interpreting ethnic variation in the Sydney speech community requires consideration of social class. This intersection is likely a reflex of multiple things, including, socially, the network in which people move, and linguistically, the prestige attached to different variants (cf. Travis et al., 2023:461-463). The current study presents a further level of evidence for an ethnicity-social class intersection in treating the linguistic behavior as the organizing factor for the social groups, rather than comparing the linguistic behavior of pre-defined social groups. Doing this demonstrates that this ethnicity-social class intersection holds across different time points and for different ethnic groups. This is hardly surprising, given what we know about intersections between other social variables, but it remains the case that many studies overlook this, considering ethnicity in isolation.
There are several risks with prioritizing ethnicity in this way. From a linguistic perspective, it may give an incorrect impression of the patterns of variation observed and impede our understanding of the nature of language change and variation across society. There are also social implications, as considering ethnicity independently from other social factors may be taken to suggest that the linguistic behavior of migrant groups (or ethnic minorities more generally) can be explained by their ethnic background alone, and thereby that they are solely defined by their ethnicity, thus potentially “idealising or glamorising a rather undifferentiated monolithic perspective on the international migrant” (Britain, 2022:332).
In contrast to such an essentialist approach, the intersectionality we put forward here recognizes that “no one category […] is sufficient to account for individual experience or behavior,” just as has been recognized for gender (Levon, 2015:295), and thus better captures the multidimensionality of social groups. Such an approach brings to the fore the fact that someone like Lathan (from Example 1 above) is not just a Chinese Australian who speaks some Cantonese and Mandarin, has a very Asian social network, and has visited Hong Kong. He is also a 24-year-old man who attended a selective high school in Sydney and is now completing an Arts/Law degree and working as a paralegal. Categorizing him solely as a Chinese Australian ignores these other factors relating to social class that exist alongside his Chinese identity. Here, we have taken them into account and find that the social class measures are closely tied in with ethnicity (and with ethnic orientation). By considering all these factors together, we can gain a better understanding of what it means to be a “Chinese Australian” or a “Greek Australian,” which in turn allows for a more informed interpretation of the linguistic variation observed in ethnically diverse settings, and in particular in urban environments like Sydney.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0954394525100537.
Acknowledgements
This research was supported by the ARC Centre of Excellence for the Dynamics of Language (CE140100041). We thank James Grama for initial discussions about the class-ethnicity intersection, and Benjamin Purser, Heba Bou Orm, and two anonymous LVC reviewers for helpful comments on the paper.
Competing interests
The authors declare none.