
Cue-specificity of contrastive hyperarticulation: evidence from the voicing contrast in Japanese

Published online by Cambridge University Press:  15 August 2025

Shinichiro Sano*
Affiliation:
Faculty of Business and Commerce, Keio University, Yokohama, Japan
Céleste Guillemot
Affiliation:
Faculty of Sciences and Engineering, Hosei University, Tokyo, Japan
*Corresponding author: Shinichiro Sano; Email: shinichirosano@gmail.com

Abstract

In natural speech, phonetic cues that distinguish lexical items can be hyperarticulated when there is a minimal pair competitor, a process known as contrastive hyperarticulation. Building upon prior work, this article examines the cue-specific nature of contrastive hyperarticulation in Japanese, focussing on stop voice onset time (VOT) using a speech corpus. We confirmed that the existence of a voicing minimal pair competitor in the lexicon triggers hyperarticulation of VOT duration on the target segment (shorter for voiced stops and longer for voiceless stops), while other contrasts (singleton vs. geminate) did not. The results also suggest that contrastive hyperarticulation (a) is more compatible with casual speech than slow/clear speech, (b) is sensitive to position in a word (greater in word-initial position than in non-initial position) and (c) applies to a greater degree in Japanese than in English due to properties of stops. This provides further evidence that the phonetic specificity of contrastive hyperarticulation is cross-linguistically relevant.

Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (https://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Mathematical theories such as information theory (Shannon 1948) and Bayesian inference (Bayes 1763; Laplace [1820] 1886) offer a means of quantifying the notion of communicative efficiency under the assumption that language is an effective system of message transfer. Recent linguistic research has provided increasing evidence that information-theoretic measures such as predictability, surprisal and informativity/entropy play a significant role in accounting for patterns in linguistic behaviour and the course of linguistic change. One of the underlying ideas of this research program is that linguistic signals differ in amount and quality, which causes a tendency for contextually predictable linguistic units to be reduced. Conversely, signals that increase the probability of successfully conveying the intended message may be enhanced in order to achieve robust information transmission. Prior research has investigated the role of information-theoretical and probabilistic measures in linguistic behaviour (see Jaeger & Buz 2017 for a comprehensive review). For example, findings establish links with morphological contraction (Frank & Jaeger 2008; Bresnan & Spencer 2012), omission of morphemes (e.g., case marker omission; Fry 2001; Lee 2006; Kurumada & Jaeger 2015; Norcliffe & Jaeger 2015), function words (e.g., that; Jaeger 2010; Jaeger & Grimshaw 2013) or referring expressions (e.g., pronouns; Tily & Piantadosi 2009).

This kind of approach is especially flourishing in phonology in the framework of Message-Oriented Phonology (Hall et al. 2016, 2018; henceforth MOP), which applies the basic concepts of information theory and Bayesian inference to phonological research. The idea that constitutes the foundation of MOP is that context-specific differences in the amount of information that phonological units (segments, syllables, etc.) carry about messages influence the extent to which general articulatory and perceptual biases affect their realisation (Hall et al. 2016). In this way, more informative pieces of the signal will be enhanced, while less informative pieces will be reduced. When Hall et al. (2016) first introduced MOP, they demonstrated the concept based on final obstruent devoicing, a cross-linguistically common phenomenon, focussing on the patterns in English and German. They claimed that there is a direct correlation between the rate of devoicing in word-final obstruents and the number of minimal pairs (lexical competitors). Under the assumptions of MOP, final voiced obstruents are prone to reduction in languages such as German, where only a few minimal pairs exist, because word-final positions are ‘weak’ in terms of the amount of information they can convey, and maintaining voicing word-finally is physiologically costly. On the other hand, when the voicing distinction in word-final position allows distinguishing many lexical competitors, as in English, voicing will be preserved to maintain the contrast.

Probability effects at different levels provide supporting evidence for MOP: segments (Seyfarth 2014; Cohen Priva 2015; Turnbull 2018; with a focus on consonants in Kirov & Wilson 2012; Schertz 2013; Seyfarth et al. 2016; Nelson & Wedel 2017; Chodroff & Wilson 2018; Sano 2018a; Wedel et al. 2018; and on vowels in Aylett & Turk 2004; Hume & Bromberg 2005; Shaw & Kawahara 2017; Wedel et al. 2018), phonological patterns and processes (Hall et al. 2018; Kawahara & Lee 2018; Wedel et al. 2019), variation and sound change (Wedel et al. 2013a,b; Bowern & Babinsky 2018) or morpheme/word duration (Bell et al. 2003, 2009; Hashimoto 2021), among others.

Within MOP-based research, close attention has been paid to the linguistic phenomenon referred to as contrastive hyperarticulation (Wedel et al. 2018). When a phonetic cue contributes to phonetically distinguishing a word from its lexical competitors, that cue tends to be hyperarticulated. That is, the distinctive feature of a sublexical unit that is relevant to a specific contrast is enhanced (e.g., longer duration, greater distance between vowels; Wedel et al. 2018) to make the distance in phonetic properties greater. This process is active not only within a single conversation, but also based on the existence of competitors in the lexicon as a whole (as also shown in Baese-Berk & Goldrick 2009).

Contrastive hyperarticulation must be distinguished from slow/clear-speech hyperarticulation. Although the two types seem to have similar purposes (i.e., better comprehension/identification of lexical items; Payton et al. 1994; Bradlow et al. 1996), they can have opposite effects on phonetic implementation. If we take the example of voice onset time (VOT; Lisker & Abramson 1964), the interval of time between the release of the stop closure and the onset of voicing of the following vowel, the logical consequence of slow speech hyperarticulation is that, together with the enhancement of segment duration, VOT for voiceless stops should be longer. Contrastive hyperarticulation has a similar effect. For voiced stops, however, enhancement due to contrastive hyperarticulation should result in a shorter VOT, while we expect that in clear speech it would become longer in most cases (Smiljanić & Bradlow 2008). (In contrast, Kang & Guion 2008 report a shortening of VOT in Korean lenis stops for clear speech hyperarticulation.)

Previous studies provide evidence that context-based contrastive hyperarticulation is active cross-linguistically, for example, duration of /p/ aspiration, fricative voicing contrasts, vowel length and quality contrasts and stop VOT duration in English (Baese-Berk & Goldrick 2009; Kirov & Wilson 2012; Schertz 2013; Seyfarth et al. 2016; Nelson & Wedel 2017; Wedel et al. 2018); vowels in Korean (Kang et al. 2019); and closure duration in singleton–geminate contrasts in Japanese (Sano 2018a).

These studies suggest, in part, that hyperarticulation is cue-specific. That is, the only phonetic cues that participate in contrastive hyperarticulation are those that specifically contribute to the maintenance of the phonetic distance between the target and competitors for a given contrast. For example, hyperarticulation of VOT can be triggered by the presence of ‘voicing’ minimal pairs (e.g., /pit/ vs. /bit/), but not by other lexical neighbours of the target word (e.g., /kit/, /sit/, …).

The studies mentioned above offer a picture of the cue-specific nature of contrastive hyperarticulation in English. Experimental findings in Kirov & Wilson (2012), for example, indicate that the VOT of word-initial voiceless stops correlates with the existence of a competitor for voicing and place of articulation, while no such correlation is observed in other positions. In Fricke (2013), rather than the number of neighbours of a given word (neighbourhood density), the number of minimal pairs targeting the initial stop was the better predictor of VOT duration of voiceless stops in the Buckeye Corpus. In another experimental study exploring voiced and voiceless stops, Schertz (2013) reports that hyperarticulation of VOT was triggered by the existence of a voicing competitor, rather than a place/manner competitor. Seyfarth et al. (2016) focus on the word-final /s/–/z/ voicing contrast and experimentally demonstrate that speakers hyperarticulate when it is contextually relevant. Specifically, they observed that the signal was enhanced (i.e., shorter vowels before /s/ and longer voicing for /z/) when it resulted in increasing a relevant contrast, namely when there is a lexical competitor. Nelson & Wedel (2017), using a speech corpus, found that the presence of a VOT-specific minimal pair was a better predictor of contrastive hyperarticulation than the number of lexical neighbours differing in the initial segment. They also found a more robust effect for voiced stops (decreased duration when there is a competitor) than for voiceless stops. Lastly, Wedel et al. (2018), using the same corpus, investigated both VOT of word-initial stops in voicing contrasts and F1−F2 Euclidean distance for vowel contrasts. For both types, they report that the existence of a cue-specific minimal pair competitor is a trigger for hyperarticulation: a greater phonetic distance with the competitor was observed. Conversely, neighbourhood density was shown to be irrelevant.

This article offers a case study of the cue-specificity of contrastive hyperarticulation, focussing on voicing and length contrasts in Japanese using a speech corpus. Based on the assumption that lexical competition induces synchronic, phonetically specific enhancement of phonemic contrasts (Aylett & Turk 2004, inter alia), this study analyses the patterns in sub-lexical hyperarticulation and examines the hypothesis that a specific cue (viz., VOT) that allows distinguishing a word from its minimal pair competitor is hyperarticulated to provide more information and maintain the contrast. Note that corpus studies appear to be especially relevant to the study of contrastive hyperarticulation: Nelson & Wedel (2017) point out that experimental studies, which often involve the use of speech paradigms, do not seem to provide enough motivation to trigger hyperarticulation in speakers. They suggest that speech in an experimental setting may induce so much clear-speech hyperarticulation that contrastive hyperarticulation is obscured or does not occur, as further hyperarticulation may not seem necessary to speakers.

Japanese differs from languages that are already well-studied in the MOP framework (English and other typologically related Indo-European languages) in rhythmic/prosodic properties, phonological characteristics and grammatical structure (such as word order or morphological system). Thus, investigating Japanese provides evidence bearing on the question of whether the cue-specificity of contrastive hyperarticulation holds across language types.

In terms of rhythmic/prosodic properties, Japanese is a mora-timed language (Han 1962, 1994; Port et al. 1987), in which moras tend to be of similar duration (see Port et al. 1980; Homma 1981; Warner & Arai 2001a,b for a critical review of mora isochrony and mora-timing compensation), which may affect the realisation of hyperarticulation in ways not found in other languages. That is, we expect hyperarticulation targeting durational cues to be constrained to maintain mora-based durational contrasts: variations affecting duration should be restricted to a threshold of categorical perception in order to avoid the violation of language-specific timing properties that might hinder communication (e.g., by causing a short segment to be identified as long). Conversely, in a stress-timed language like English, no such restriction should be active, as there are no contrasts based solely on durational cues, and so the degree of hyperarticulation of durational cues should be greater in English than in Japanese. It is therefore meaningful to examine whether there is contrastive hyperarticulation despite the timing restrictions, or whether the phonetic implementation is rigid.

Furthermore, Japanese has two different phonological contrasts that target stops: voicing (see §1.1.1) and length (§1.1.2), which means that the same segment can have both a voicing and a geminate minimal pair (e.g., /kaki/ ‘persimmon’ vs. /kagi/ ‘key’ and /kakki/ ‘energy’). This allows us to test the cue-specificity of contrastive hyperarticulation by comparing the effects that contrasts have on VOT. If contrastive hyperarticulation is cue-specific, then we expect it to target only the specific cue that is relevant to a contrast. However, if it is not cue-specific, then we expect it to target the segment as a whole (as in clear speech hyperarticulation). For example, VOT duration should vary as well in the case of a geminate counterpart (and not only closure duration as reported in previous studies; Sano 2019). Additionally, the literature reports an ongoing change in the VOT of Japanese voiced stops (from long lead to short lag; Riney et al. 2007; Gao et al. 2019; see §1.1.1), and it seems of interest to investigate how this sound change may affect patterns of hyperarticulation.

Lastly, we are not aware of any study examining multiple contrasts targeting the same segment in the same language. Because Japanese offers a variety of minimal pairs (voicing and length, both contrasts having a high functional load) for stop consonants, it provides a good opportunity to explore the phonetic specificity of contrastive hyperarticulation.

1.1. Phonetic properties of Japanese

1.1.1. Voicing contrast in Japanese

Japanese has a two-way contrast in voicing: voiceless stops /p/, /t/, /k/ contrast with their voiced counterparts /b/, /d/, /g/ (e.g., /kin/ ‘gold’ vs. /gin/ ‘silver’), in both word-initial and word-medial positions (Vance 1987). Prior phonetic research investigating Japanese stops has identified the presence/absence of laryngeal activity and VOT as the main acoustic cues that contribute to the distinction between the two categories (Shimizu 1977 et seq.). Thus, the traditional description of VOT opposes two types of plosives, ‘prevoiced’ (negative VOT; Shimizu 1996, 1999) and ‘voiceless unaspirated’ (positive VOT; Homma 1980; Shimizu 1996, 1999), with variations depending on speech rate and position in the word.

However, the nature of VOT in Japanese is more complex than the dichotomy presented above. Recent studies indicate that the realisation of voicing is variable and undergoing a synchronic change: younger speakers tend to devoice stops in word-initial position (while older speakers tend to retain prevoicing), which suggests that VOT might be progressively losing its status as the primary cue for voicing (Takada et al. 2015; Gao & Arai 2018; Gao et al. 2019). Takada et al. (2015) found that the degree of voicing appears to be shifting, as they observe extreme-lead, short-lead and short-lag VOT in speakers from oldest to youngest. Additionally, the VOT of voiceless consonants in Japanese does not match straightforwardly the traditional short lag vs. long lag dichotomy (Lisker & Abramson 1967). Instead, voiceless stops are ‘moderately’ aspirated (Riney et al. 2007), with a VOT that is shorter than long-lag VOT but longer than short-lag VOT.

The consequence of the above is that the VOT of voiced and voiceless consonants in initial position may overlap, suggesting that VOT alone might not be enough to maintain the contrast. However, the investigation of other possible secondary cues (f0, voice quality, following vowel) has produced limited results, and VOT still remains the primary and necessary cue for the voicing contrast (Riney et al. 2007; Takada et al. 2015; Byun 2021). In the case of English, although both voiced and voiceless stops may have a positive VOT (Lisker & Abramson 1964), there is no confusion between categories, because they fall into the short-lag and long-lag categories. In Japanese, however, the distinction may be difficult in initial position, because voiceless stop VOT values are between the short- and long-lag categories. In medial position, however, the contrast is retained.

1.1.2. Consonantal length contrast in Japanese

Segmental length plays an important role in Modern Japanese. Consonants in intervocalic position can be short (singleton) or long (geminate), and the distinction between them carries lexical contrasts in a variety of minimal pairs, such as /kako/ ‘past’ vs. /kakko/ ‘parenthesis’ or /hato/ ‘dove’ vs. /hatto/ ‘hat’. A large body of research has been conducted on the phonetics of singletons and geminates in Japanese. These studies have contributed to identifying the acoustic correlates involved in this distinction. The primary acoustic correlate of the singleton–geminate contrast in Japanese is closure duration: constriction for geminates is two to three times longer than for their singleton counterparts, with duration varying according to place and voicing (Han 1962, 1994; Homma 1981; Port et al. 1987; Kawahara 2015). Previous studies have also identified factors such as the duration of preceding or following vowels and non-durational acoustic correlates like intensity, f0 and F1 (Port et al. 1987; Han 1994; Hirata 2007; Kawahara 2015). On the other hand, other phonetic cues such as VOT have been shown to be unrelated to the phonetic implementation of the contrast (Homma 1981; Hirata & Whiton 2005; Sano 2019).

1.2. Goals and research agenda

In examining voicing and consonantal length contrasts in Japanese, we aim to provide additional evidence for the cue-specific nature of contrastive hyperarticulation, as MOP predicts that the same phonological unit can be affected in different ways by contrastive hyperarticulation when it contributes to identifying specific words relative to competitors. That is, depending on the competitors in the lexicon (e.g., the presence or absence of a minimal pair), specific phonetic cues (e.g., VOT) are enhanced, while others remain unchanged.

Previous studies on VOT in the MOP framework suggest that, in English, the existence of a voiced stop competitor for a word-initial voiceless stop tends to induce contrastive hyperarticulation of VOT. That is, in an experimental setting, contrastive hyperarticulation of a voiceless stop leads to a longer VOT for target words with a minimal pair competitor when compared to those without (Baese-Berk & Goldrick 2009; Peramunage et al. 2011). Other studies have also found that hyperarticulation of the voicing contrast could be enhanced by visual stimuli (Kirov & Wilson 2012; Buz et al. 2016; Seyfarth et al. 2016) and specifically targets VOT, the primary acoustic cue, instead of triggering a lengthening at the word level (Buz et al. 2014). In the case of Japanese, it appears reasonable to expect that the presence of lexical competitors should also trigger hyperarticulation of VOT duration: a shorter VOT for voiced consonants and a longer one for voiceless consonants. This is supported by Sano (2018a,b), who found in a corpus study that hyperarticulation can also be observed in Japanese consonantal length contrasts: when a lexical item has lexical competitors, the closure duration is longer for geminates and shorter for singletons.

In the current study, we specifically investigate the prediction that a distinctive phonetic cue (VOT) will undergo hyperarticulation when involved in a lexical contrast in which it maintains the phonetic distance between the target and the competitor. We show that in Japanese the existence of a voicing minimal pair competitor in the lexicon affects (i.e., enhances) the VOT duration of the target segment (shorter for voiced stops, longer for voiceless stops), while no such effect is induced by the existence of other types of contrasts, here consonantal length, in which VOT is not a relevant phonetic cue. This provides further evidence that the phonetic specificity of contrastive hyperarticulation is not limited to English and related languages, but also holds in typologically different languages like Japanese.

The remainder of this article is structured as follows: in §2, we introduce the corpus used for the analysis and the variables included in the statistical model. §3 summarises the results obtained in the statistical analysis. Based on the results, §4 discusses the nature of cue-specificity and other issues related to hyperarticulation with reference to the previous literature. §5 concludes this study.

2. Methods

2.1. The corpus and data collection

The analysis of the present study is based on the Corpus of Spontaneous Japanese Relational Database (NINJAL 2012; henceforth CSJ-RDB). The CSJ-RDB is a subset of the Corpus of Spontaneous Japanese (CSJ), one of the largest annotated corpora of spoken Japanese. The CSJ is abundantly annotated with linguistic and non-linguistic information that is suitable for detailed analysis. The target data were retrieved from the CSJ-RDB by focussing on a selection of 12 speech samples that is balanced in speech style (the CSJ-mini provided by Hanae Koiso of the National Institute for Japanese Language and Linguistics, which consists of the following speech samples: A01F0055, A02M0098, A05F0039, A11M0846, D01F0023, D01M0009, D04F0022, D04M0010, S00F0014, S01M0005, S02M0043, S03F0108). The breakdown of the CSJ-mini is as follows: monologue (eight speech samples) and dialogue (four speech samples), amounting to about 34,000 words produced by 11 speakers (five males and six females; age: 20s (3), 30s (5), 40s (1) and 60s (2)). Using the SQLite database language (https://jp.navicat.com/), this study employed the phonetic/phonological and morphological information annotated in the CSJ-RDB.

The target segments were stops (/p/, /t/, /k/, /b/, /d/, /g/). In addition to the target segments, we also retrieved information from the CSJ-mini about (a) segments immediately preceding/following the target segments, (b) syllables immediately preceding/following the syllables that contain the target segments and (c) words and phrases that contain the target segments. Tokens were excluded from the dataset, however, if the targeted segments occurred in filled pauses or word fragments.

After exclusion, the remaining target segments in the dataset were categorised as voiceless or voiced, based on the phonetic annotation provided in the CSJ-RDB. Other segmental properties, such as place, position in a word, height of the following vowel and existence of a minimal pair (see §2.2.2) were manually annotated for each target segment.

Segmental intervals in the CSJ-RDB (onset time and offset time) are annotated for each linguistic unit, such as segment, syllable, word and phrase. For stops, separate annotation is provided for the closure (with the label ‘<cl>’) and the burst (labelled with the segment) portions. The duration of VOT was obtained by subtracting the onset time of the target segments from their offset time based on the annotation in the CSJ-RDB. Note that the nature of the labels did not allow us to take into account the presence of laryngeal activity during the closure portion; thus, VOT was treated as positive in our analysis, as in English (which has an aspiration contrast). However, as mentioned in §1.1.1, the VOT of voiced plosives in Japanese is undergoing a change, and prevoicing is often absent in younger generations. Given that speakers in the CSJ-mini mainly belong to younger generations, we expect to observe few occurrences of prevoicing, and so this treatment of VOT seems suitable. The duration of other units was calculated in the same manner. An exhaustive search and filtering of the data from the CSJ-mini resulted in a dataset of 4,448 tokens, of which 1,222 (27.5%) were voiced stops and 3,226 (72.5%) were voiceless stops.
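The duration measure described above can be sketched as follows. This is a minimal illustration, not the authors' extraction script: the `Interval` class, its field names and the timing values are hypothetical and do not reflect the actual CSJ-RDB schema.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One annotated interval (hypothetical layout, not the CSJ-RDB schema)."""
    label: str     # "<cl>" for the closure, or the segment label for the burst
    onset: float   # onset time in seconds
    offset: float  # offset time in seconds


def interval_duration(interval: Interval) -> float:
    """Offset time minus onset time, in seconds.

    Applied to the burst interval of a stop, this gives the (positive)
    VOT measure described in the text.
    """
    if interval.offset < interval.onset:
        raise ValueError("offset precedes onset")
    return interval.offset - interval.onset


# Invented timings for one /k/ token: closure portion, then burst portion.
closure = Interval("<cl>", 1.200, 1.262)
burst = Interval("k", 1.262, 1.290)

vot_ms = interval_duration(burst) * 1000  # VOT of the token in milliseconds
closure_ms = interval_duration(closure) * 1000
```

Because only the burst interval enters the measure, any voicing during the closure is ignored, which corresponds to the positive-VOT treatment noted above.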

2.2. Factors in statistical analysis

2.2.1. Response variable

For the purpose of examining the relationship between speech rate and VOT duration, we calculated speech rate by dividing the number of moras in a word containing the target segment by the duration of the word in seconds. As previously reported, VOT varied considerably with speech rate, and the two were inversely correlated (voiced: $r = -0.189$, $t(1220) = -6.724$, $p < 0.01$; voiceless: $r = -0.123$, $t(3224) = -7.015$, $p < 0.01$). In other words, VOT tends to be shorter as the speech rate increases and longer as it decreases. Furthermore, phonetic enhancement resulting from contrastive hyperarticulation should be distinguished from enhancement due to slow/clear-speech hyperarticulation (see, e.g., Wedel et al. 2018). For these reasons, raw VOT values observed in the corpus were normalised by speech rate using a measure-internal method (see Wedel et al. 2018 and the references cited therein): namely, we multiplied raw VOT by speech rate. Following the previous literature, the speech rate-normalised VOT was then log-transformed (Bell et al. 2009; Seyfarth 2014).
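As a worked illustration of this normalisation, the sketch below computes speech rate in moras per second, multiplies raw VOT by it and log-transforms the product. The function names and example values are hypothetical, and the use of the natural logarithm is an assumption, since the base of the log transform is not stated.

```python
import math


def speech_rate(n_moras: int, word_duration_s: float) -> float:
    """Moras per second for the word containing the target segment."""
    if word_duration_s <= 0:
        raise ValueError("word duration must be positive")
    return n_moras / word_duration_s


def normalised_log_vot(vot_s: float, n_moras: int, word_duration_s: float) -> float:
    """Log of (raw VOT x speech rate).

    Seconds x moras/second yields VOT expressed in moras.
    Natural log is assumed here; the text does not state the base.
    """
    return math.log(vot_s * speech_rate(n_moras, word_duration_s))


# Invented tokens: the same 30 ms VOT in a slow and a fast 4-mora word.
slow = normalised_log_vot(0.030, n_moras=4, word_duration_s=0.8)  # 5 moras/s
fast = normalised_log_vot(0.030, n_moras=4, word_duration_s=0.5)  # 8 moras/s
```

Under this scheme, a VOT of a given raw duration counts as relatively longer when it occurs in faster speech (`fast > slow`), so lengthening that merely tracks a slower speech rate is factored out of the response variable.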

2.2.2. Factors of interest

As the aim of this study is to examine whether minimal-pair-driven contrastive hyperarticulation in Japanese is cue-specific, the factors of primary interest are the presence or absence of minimal pairs for (1) the voicing contrast and (2) the singleton–geminate contrast. Labels regarding the presence/absence of minimal pairs were coded item-by-item, using a lexicon based on three Japanese dictionaries (Kōjien, Shimmura 2018; Sanseidō Kokugo Jiten, Kembo et al. 2013; and Goo Kokugo Jisyo, Matsumura 2024). For each word token, from the corresponding phonemic representation of the lemma annotated in the CSJ-RDB, we identified a potential minimal-pair competitor that contrasts with the lemma by substituting the distinctive feature (voicing or length) of the stop in question and checked if the competitor was present as a dictionary entry. If the potential competitor was present, the value of the minimal pair existence was coded as true; otherwise it was coded as false. Lexical accent was not taken into account. In this process, if a member of a pair was a personal name, jargon, or an archaic or dialectal form, the pair was not regarded as a minimal pair, as it is not likely to be shared by the majority of Japanese speakers. This process resulted in the distribution of minimal pairs summarised in Tables 1 (for the voicing contrast) and 2 (for the length contrast).
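The coding procedure can be sketched as a lookup against a word list. The sketch below is an illustration under several assumptions, not the procedure actually used: words are romanised phonemic strings, a geminate is written as a doubled consonant, the toy `lexicon` set stands in for the dictionary-based lexicon, and the exclusion of names, jargon and archaic or dialectal forms is assumed to have been applied beforehand.

```python
# Voiced/voiceless counterparts of the six target stops.
VOICING_SWAP = {"p": "b", "b": "p", "t": "d", "d": "t", "k": "g", "g": "k"}


def voicing_competitor(word: str, i: int) -> str:
    """Flip the voicing of the singleton stop at index i."""
    return word[:i] + VOICING_SWAP[word[i]] + word[i + 1:]


def length_competitor(word: str, i: int) -> str:
    """Toggle the length of the stop at index i (geminate = doubled letter)."""
    stop = word[i]
    if i > 0 and word[i - 1] == stop:              # second half of a geminate
        return word[:i - 1] + word[i:]
    if i + 1 < len(word) and word[i + 1] == stop:  # first half of a geminate
        return word[:i] + word[i + 1:]
    return word[:i] + stop + word[i:]              # singleton -> geminate


def has_minimal_pair(word: str, i: int, lexicon: set, contrast: str) -> bool:
    """True if the competitor for the given contrast is a lexicon entry."""
    make = voicing_competitor if contrast == "voicing" else length_competitor
    return make(word, i) in lexicon


# Toy lexicon built from the example words cited in this article.
lexicon = {"kaki", "kagi", "kakki", "kako", "kakko", "hato", "hatto"}

kaki_voicing = has_minimal_pair("kaki", 2, lexicon, "voicing")  # kagi
kaki_length = has_minimal_pair("kaki", 2, lexicon, "length")    # kakki
hato_voicing = has_minimal_pair("hato", 2, lexicon, "voicing")  # hado
```

The two flags are computed independently for each token, so a stop such as the medial /k/ of /kaki/ can be coded as having both a voicing and a length competitor, which is what makes the comparison between contrasts possible.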

Table 1 Distribution of minimal pair existence by segment and position for the voicing contrast ($x/y$: for each cell, x is the number of types and y the number of tokens).

Table 2 Distribution of minimal pair existence by segment and position for the singleton–geminate contrast ($x/y$: for each cell, x is the number of types and y the number of tokens).

2.2.3. Control variables

Factors that may have an effect on VOT were also included in the model:

  • Place of articulation (labial, coronal, dorsal): VOT differs across places of articulation, as in labial $<$ coronal $<$ dorsal (Lisker & Abramson Reference Lisker and Abramson1967).

  • Position in word (initial/non-initial): Previous studies on VOT hyperarticulation in English focussed on word-initial stops (Wedel et al. 2018) and did not explore stops in non-initial position. Following previous research on VOT in Japanese, this study examined both word-initial and non-initial positions, taking the effect of position on VOT into account.

  • Word frequency: We counted the number of occurrences of each word in the complete CSJ ($N(\textit{word}_{x})$) and log-transformed the counts. Word frequency is known to affect word duration: frequent words are shorter than less frequent ones (Zipf 1935; Wright 1979). Word frequency may therefore also affect VOT duration.

  • Contextual predictability/local phonotactic probability (backward and forward): The average probability of each two-phoneme sequence was calculated by dividing the probability (number of occurrences in the corpus) of the target segment and the preceding/following vowel by the unconditional probability of the target segment (backward: $p(\textit{phoneme}_{x} \mid \textit{phoneme}_{x-1})$; forward: $p(\textit{phoneme}_{x} \mid \textit{phoneme}_{x+1})$). The values were log-transformed. Contextual predictability has been shown to be a predictor of duration in a corpus study of English (Seyfarth 2014).

  • Following vowel height (high/non-high): VOT is affected by the height of the following vowel (e.g., Klatt 1975). /i, u/ were coded as high and /a, e, o/ as non-high.

  • Following vowel duration: VOT of stops may be affected by the duration of the following vowel, based on the principle of mora-timing compensation (Port et al. 1980; Homma 1981). We expect VOT to be shorter when the following vowel is longer, and vice versa. The values were normalised by speech rate.

  • Word length: the number of moras in the word containing the target segment. The values were log-transformed. Turk & Shattuck-Hufnagel (2000) propose that mean syllable duration may decrease with the number of syllables in a word, a phenomenon that they call ‘polysyllabic shortening’. If the same holds for moras, then the more moras a word contains, the shorter each mora will be, and the segments within it should be shortened accordingly at the level of the phonetic cue.
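The frequency and predictability measures in the list above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it follows the conditional-probability notation given in the list (normalising each two-phoneme count by the count of the conditioning neighbour), uses natural logarithms, and the function names and toy input are hypothetical.

```python
import math
from collections import Counter


def log_word_frequencies(corpus_words):
    """Map each word to the natural log of its corpus count, log N(word_x)."""
    counts = Counter(corpus_words)
    return {word: math.log(n) for word, n in counts.items()}


def log_conditional_probabilities(phoneme_sequences):
    """Return (backward, forward) dicts keyed by adjacent phoneme pairs (prev, nxt).

    backward[(prev, nxt)] = log p(nxt | preceding prev)
    forward[(prev, nxt)]  = log p(prev | following nxt)
    """
    pair_counts = Counter()
    as_first = Counter()   # how often a phoneme opens a pair
    as_second = Counter()  # how often a phoneme closes a pair
    for seq in phoneme_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            as_first[prev] += 1
            as_second[nxt] += 1
    backward = {pair: math.log(n / as_first[pair[0]]) for pair, n in pair_counts.items()}
    forward = {pair: math.log(n / as_second[pair[1]]) for pair, n in pair_counts.items()}
    return backward, forward


# Toy data (hypothetical): two romanised words, one character per phoneme.
backward, forward = log_conditional_probabilities(["kaki", "kagi"])
print(math.exp(backward[("a", "k")]))  # p(k | preceding a)
print(math.exp(forward[("k", "i")]))   # p(k | following i)
```

In a real pipeline the counts would be taken over the whole corpus before any subsetting, so that the probabilities reflect the language sample rather than the analysed tokens alone.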

2.3. Model building

Following prior work (e.g., Wedel et al. 2018), we fitted separate models for voiced and voiceless stops, since (a) VOT differs significantly depending on voicing (voiced: $\textrm{mean} = 16.15\,\textrm{ms}$, $\textrm{SD} = 7.17$; voiceless: $\textrm{mean} = 21.84\,\textrm{ms}$, $\textrm{SD} = 10.05$; $t(3068.8) = -21.008$, $p < 0.01$); (b) contrastive hyperarticulation is expected to affect VOT in opposite directions: voiced stops become shorter and voiceless stops become longer (if there are minimal-pair competitors); and (c) other control variables may not affect voiced and voiceless stops in the same way. Specifically, we fitted linear mixed-effects models to our data using the lmer function of the lmerTest package (Kuznetsova et al. 2017) in R (R Core Team 2019). The full model had normalised VOT as the dependent variable and the following fixed effects: presence/absence of minimal pairs (for the voicing and singleton–geminate contrasts), place of articulation, position in word, word frequency, contextual predictability (backward and forward), following vowel height, following vowel duration and word length. Random intercepts for speaker and item (lemma) and by-speaker random slopes (for position, voicing minimal pair, singleton–geminate minimal pair and the interaction of position and voicing minimal pair) were also included in the model. The model was structured to include the greatest number of theoretically relevant factors, rather than to maximise the complexity of the random effects structure (see Tang & Bennett 2018, following Baayen et al. 2017; Matuschek et al. 2017). Model fit was assessed by referring to AIC, BIC and log-likelihood.
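One way to read the model structure just described is sketched below for a token produced by speaker $i$ on item (lemma) $j$. The symbols are notation introduced here for illustration and are not taken from the original analysis; $\beta_3\,\mathrm{Place}$ stands in for the set of contrasts needed to code the three-level place factor, and the sketch assumes that the fixed part contains exactly the terms listed above.

```latex
\begin{align*}
\mathrm{VOT}_{ij} ={}& \beta_0 + \beta_1\,\mathrm{MP}^{\text{voice}} + \beta_2\,\mathrm{MP}^{\text{length}}
  + \beta_3\,\mathrm{Place} + \beta_4\,\mathrm{Pos} + \beta_5\,\mathrm{Freq} \\
& + \beta_6\,\mathrm{Pred}^{\text{back}} + \beta_7\,\mathrm{Pred}^{\text{fwd}}
  + \beta_8\,\mathrm{Height} + \beta_9\,\mathrm{VDur} + \beta_{10}\,\mathrm{Len} \\
& + u_{0i} + u_{1i}\,\mathrm{Pos} + u_{2i}\,\mathrm{MP}^{\text{voice}} + u_{3i}\,\mathrm{MP}^{\text{length}}
  + u_{4i}\,(\mathrm{Pos} \times \mathrm{MP}^{\text{voice}}) + w_{0j} + \varepsilon_{ij}
\end{align*}
```

Here $u_{0i}$ and $w_{0j}$ are the by-speaker and by-item random intercepts, $u_{1i}, \dots, u_{4i}$ are the by-speaker random slopes, and $\varepsilon_{ij}$ is the residual error.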

3. Results

We now turn to the description of the final models. Results are presented separately for voiced and voiceless stops. Table 3 presents the summary of fixed factors for the voiced-stop model. For the control predictors, place of articulation, position in word, word frequency, contextual predictability (backward and forward), following vowel height, following vowel duration and word length were all retained in the final model, which indicates that they contribute significantly to model fit. Although word frequency did not reach significance, the other factors had highly significant positive or negative effects on VOT duration.

Table 3 Fixed effects summary for the voiced-stop model.

The factors of interest regarding the two minimal pair contrasts were retained in the final model. Figure 1 illustrates the distribution of speech rate-normalised VOT for voiced and voiceless stops, by presence/absence of a minimal-pair competitor.

Figure 1 Distribution of speech rate-normalised VOT values by presence/absence of a minimal-pair competitor for voiced and voiceless stops. Solid circles represent mean values and vertical lines interquartile ranges.

Figure 1 shows that the mean VOT value for voiced stops is lower when a lexical competitor (voiceless counterpart) exists than when there is no such competitor. This is also confirmed in Table 3. The presence of a minimal-pair competitor in a voicing contrast was shown to significantly predict a shorter VOT ($p < 0.01$) for voiced stops, as indicated by the negative coefficient for this variable. The average difference in raw VOT between voiced and voiceless stops when there is a minimal-pair competitor was 8.45 ms, and it was 5.08 ms with no such competitor (see §4.1 for a comparison with English). On the other hand, the presence of a lexical competitor for the singleton–geminate contrast (i.e., a geminate counterpart) was not significantly predictive ($p = 0.38304$). This is consistent with the hypothesis that contrastive hyperarticulation of a given cue is triggered by the existence of a minimal pair distinguished by that cue. Additionally, the shorter VOT in voiced stops caused by the existence of voicing minimal pairs suggests that what is observed in this model is contrastive hyperarticulation, rather than slow/clear-speech hyperarticulation. Next, let us turn our attention to the summary of fixed factors for the voiceless-stop model in Table 4.

Table 4 Fixed effects summary for the voiceless stop model.

For the control predictors, place of articulation, position in word, word frequency, contextual predictability (backward and forward), following vowel height, following vowel duration and word length were all retained in the final model, suggesting that these predictors significantly contribute to model fit. Unlike in the voiced model, place of articulation (labial), position in word and backward contextual predictability were not significant, while word frequency was highly significant. Forward contextual predictability was close to significance at the five percent level.

For the factors of interest, both retained in the final model, the presence of a minimal-pair competitor in a voicing contrast (voiced counterpart) was found to significantly affect VOT in the expected direction, that is, towards longer VOT ($p < 0.05$), as illustrated in Figure 1, while the presence of a minimal-pair competitor in the singleton–geminate contrast did not reach significance ($p = 0.16767$). The result in the voiceless-stop model is again consistent with the hypothesis that contrastive hyperarticulation is triggered by the existence of cue-specific minimal pairs, but not by non-cue-specific minimal pairs. The direction of the effect of cue-specific lexical competition in the voiceless-stop model was the reverse of that found in the voiced-stop model, in support of the hypothesis that contrastive hyperarticulation is realised in such a way that the phonetic correlate of a distinctive feature moves away from the competitor.

However, care should be taken in interpreting these results, since the distribution of geminates is biased with respect to position and voicing. As Table 2 shows, there is no singleton–geminate contrast word-initially in Japanese; consonantal length is contrastive only word-medially. Additionally, there are very few voiced geminates. Considering the possible effects of these biases, we ran a third model consisting only of the word-medial voiceless stop fraction of the data for triangulation purposes. For consistency, position in word, as a control predictor and as a term in the by-speaker random slope, was taken out of this model. The results are presented in Table 5.

Table 5 Fixed effects summary for the word-medial voiceless stop model.

In the word-medial-voiceless-stop model, the pattern for the control predictors is mostly similar to that of the voiceless-stop model in Table 4, except for following vowel duration, which did not reach significance. Most importantly, the pattern of the factors of interest in this model is consistent with the voiceless-stop model in that the presence of a minimal-pair competitor in a voicing contrast (i.e., a voiced counterpart) was found to significantly affect VOT, making it longer ($p < 0.05$), while the presence of a minimal-pair competitor in the singleton–geminate contrast did not reach significance ($p = 0.12024$). Thus, we can safely reject the possibility that the lack of a significant contrastive hyperarticulation effect of singleton–geminate minimal pairs is due to an unbalanced distribution of geminates with respect to position and voicing.

4. Discussion

We confirmed in the previous section that (a) the presence of a voicing minimal pair competitor induces hyperarticulation of VOT in such a way that voiceless stops become longer, while voiced stops become shorter; and (b) the presence of a singleton–geminate minimal pair competitor does not induce hyperarticulation of VOT. The VOT of stop consonants in Japanese thus provides additional support for the cue-specific nature of contrastive hyperarticulation. This is the first study that attests the existence of minimal pair-driven contrastive hyperarticulation of VOT in Japanese, which is typologically different from better-studied languages in its grammatical structure and rhythmic/prosodic properties. This provides additional evidence that the cue-specificity of contrastive hyperarticulation holds across language types.

As reviewed above, VOT functions as the main acoustic cue for the distinction between voiced and voiceless stops in Japanese (Shimizu 1977 et seq.). For the singleton–geminate contrast, closure duration has been shown to be the primary acoustic correlate in natural speech (Sano 2019). Furthermore, the perceptual distance between singletons and geminates is found to be enhanced by contrastive hyperarticulation (Sano 2018a,b). Building upon prior research, the present study provides further evidence for the existence of contrastive hyperarticulation in VOT in Japanese, but its effect is induced only by the existence of voiced/voiceless minimal-pair competitors, and it is insensitive to lexical competition with singleton–geminate minimal pairs. This is consistent with previous findings that contrastive hyperarticulation occurs only at the level of the phonetic cue; hence, it is cue-specific.

4.1. Degree of contrastive hyperarticulation

The results of this study are consistent with recent seminal work on the effect of contrastive hyperarticulation on VOT in English (Wedel et al. 2018). We found a greater difference between the VOT of voiceless and voiced stops when a competitor exists in the lexicon than when there is no such competitor. In this study, the average difference in raw VOT between voiced and voiceless stops with a minimal-pair competitor (8.45 ms) is about 66% greater than without such a competitor (5.08 ms).

This difference between the VOT of voiced vs. voiceless stops in the presence/absence of a minimal pair is below the threshold of the Weber–Fechner law of the just-noticeable difference, that is, the minimal difference between two stimuli that leads to a change in experience (Treutwein 1995). Under the assumptions of the just-noticeable difference, a 3 ms difference in duration indeed seems too small for listeners to make a conscious distinction between the two competitors (see Lehiste 1976, inter alia; a 10 ms difference is necessary for hearers to distinguish two sounds). However, contrastive hyperarticulation is assumed to be an unconscious mechanism of spontaneous speech (in contrast with the conscious mechanism involved in clear/elicited speech): it is speakers’ implicit knowledge of the existence of a minimal pair in the lexicon that leads them to enhance specific cues in order to better keep lexical items apart. We can thus postulate that the same is true for perception: both speakers and listeners should be able to unconsciously use sub-lexical information at a higher level for a more efficient communication process. Findings from a study on VOT in English by McMurray et al. (2002) suggest that this inference is correct. Their results in a perceptual experiment where VOT varied on a continuum indicate that fine-grained acoustic differences at the level of the phonemic cue that have minimal effects on phoneme identification play, in fact, a crucial role in lexical access. Drawing a parallel with the results of the present study, we propose that a minimal change in VOT, which should not be relevant when examining the two segments in isolation, provides listeners with additional cues for lexical identification.

Let us now turn our attention to the reasons behind the small size of this difference. One possibility is that, since the just-noticeable difference is defined as being proportional to the size of the original unit, the increase should not be considered in terms of raw duration, but as a proportion. In this case, the small size of the difference in VOT between voiced and voiceless stops without a minimal-pair competitor (5.08 ms) might justify the small size of the difference between stops with competitors (8.45 ms), as a 66% increase between the two is observed. Note that contrastive hyperarticulation of the closure duration in the singleton–geminate contrast (Sano 2018a) represents an increase of 59% of the difference between singleton and geminate closure (81.7 ms with a minimal-pair competitor vs. 51.2 ms without).
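The proportional reading can be made explicit with a short worked calculation. The notation is introduced here for illustration only: $\Delta_{\text{MP}}$ and $\Delta_{\text{no MP}}$ denote the mean gap between the two members of a contrast with and without a minimal-pair competitor, using the rounded values quoted above.

```latex
\[
\frac{\Delta_{\text{MP}} - \Delta_{\text{no MP}}}{\Delta_{\text{no MP}}}
  = \frac{8.45 - 5.08}{5.08} = \frac{3.37}{5.08} \approx 0.663
  \quad \text{(VOT gap, voicing contrast)}
\]
\[
\frac{\Delta_{\text{MP}} - \Delta_{\text{no MP}}}{\Delta_{\text{no MP}}}
  = \frac{81.7 - 51.2}{51.2} = \frac{30.5}{51.2} \approx 0.596
  \quad \text{(closure-duration gap, length contrast)}
\]
```

Both ratios agree with the reported percentages to within the rounding of the quoted means, and they show that the two contrasts are enhanced by a similar proportion despite a roughly tenfold difference in raw duration.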

We can find further food for thought in the nature of VOT in Japanese. As previously mentioned in §1.1.1, recent studies report that VOT in Japanese is undergoing a change and tends to deviate from its original description, especially in younger speakers: devoicing is often observed in initial position, and VOT seems to be shifting from a long lead to a short lag (Riney et al. 2007; Takada et al. 2015; Gao & Arai 2018; Gao et al. 2019). What these studies suggest is that VOT seems to be losing its role as a primary cue for the voicing contrast in Japanese. Thus, one might postulate that cues other than VOT might also be hyperarticulated to enhance the contrast. In this regard, our models seem to indicate that the following vowel plays a role in the contrast, as we found a correlation with the following vowel duration in both the voiced ($p < 0.001$) and the voiceless ($p < 0.05$) models. On the other hand, as indicated in Gao et al. (2019), although the status of VOT as a phonetic cue for voicing appears to be shifting, there is a lag between production and perception, and they find that in terms of perception, listeners still heavily rely on VOT. This suggests that even a small difference in VOT duration should be meaningful.

If we put the present results into perspective with those of Wedel et al. (2018), it is interesting to note that the degree of contrastive hyperarticulation was found to be greater in Japanese than in English. In Wedel et al. (2018), the average difference in raw VOT between voiced and voiceless stops in words that have a voicing minimal-pair competitor (63 ms) is approximately 20% greater than that in words that do not have such a competitor (53 ms), while the increase was 66% in this study.

One clue to understanding this difference between English and Japanese may be the role of aspiration. Wedel et al. (2018) and most previous studies focussed on word-initial stops, where voiceless segments are aspirated, producing longer VOT durations (Lisker & Abramson 1964). For this reason, the effect of contrastive hyperarticulation on VOT in English may be inhibited; that is, it takes more effort to produce contrastive articulation at longer VOTs. The present study, on the other hand, targeted both word-initial and non-initial stops, because Japanese does not have positional aspiration, and VOT is thus not expected to differ greatly between these two positions. Compared to English, the shorter VOT in Japanese may allow extra room for the effect of contrastive hyperarticulation (i.e., lengthening). Furthermore, for a shorter VOT to be salient enough to convey information about lexical contrast, the duration should be enhanced to a greater extent. In other words, voiced and voiceless stops in Japanese are phonetically more similar, with closer VOT durations; the stops of interest therefore are subject to a stronger effect of contrastive hyperarticulation (even 66% of enhancement results in no more than 3.37 ms of difference). This is not the case for English, where the difference in VOT between voiced and voiceless stops is greater than in Japanese (20% of enhancement results in about 10 ms of difference).
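The cross-linguistic comparison above rests on simple arithmetic, which the following sketch reproduces from the rounded means quoted in the text. The function name and data layout are illustrative assumptions, not part of either study's analysis.

```python
def enhancement(gap_with_competitor, gap_without_competitor):
    """Return (absolute_ms, relative) enhancement of the voiced-voiceless VOT gap."""
    absolute = gap_with_competitor - gap_without_competitor
    relative = absolute / gap_without_competitor
    return absolute, relative


# Mean voiced-voiceless VOT gaps in ms (with competitor, without competitor),
# as quoted in the text.
GAPS_MS = {
    "Japanese (this study)": (8.45, 5.08),
    "English (Wedel et al. 2018)": (63.0, 53.0),
}

for language, (with_mp, without_mp) in GAPS_MS.items():
    absolute, relative = enhancement(with_mp, without_mp)
    print(f"{language}: +{absolute:.2f} ms, {relative:.0%} relative increase")
```

With these inputs the relative increase comes out at roughly 66% for Japanese and 19% for English (the "approximately 20%" of the text), while the absolute gain is about three times larger in English, which is the asymmetry the paragraph describes.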

4.2. Other issues

4.2.1. Position in word

It has been shown that language users pay more attention to word-initial positions (Bruner & O’Dowd 1958), which contribute more information to lexical identification (van Son & Pols 2003; Wedel et al. 2019). Beckman (1998) classifies initial syllables as ‘psycholinguistically prominent’, and therefore initial positions favour phonological processes that enhance the realisation of lexical contrasts. Other previous research has shown that cross-linguistically, strengthening phonological processes are more likely to target word beginnings, while neutralisation processes prefer word ends (Barnes 2006; Wedel et al. 2019). In terms of prosody as well, domain-initial positions are stronger (Keating 2003 and references cited therein). In our dataset, the difference in raw VOT between voiced and voiceless stops with and without a minimal-pair competitor was slightly greater in initial positions (3.82 ms) than in non-initial positions (3.58 ms), although the predictor, position in word, was significant in the voiced-stop model ($p < 0.01$) but not in the voiceless-stop model. From this, a possibility arises that the degree of contrastive hyperarticulation differs depending on the position of the segment in the word. If the positional difference is confirmed in future work, it can provide additional evidence for the cross-linguistic precedence of (word) initial positions over non-initial positions (e.g., Wedel et al. 2019).

4.2.2. Slow/clear speech vs. casual speech

In §3, our results confirmed that contrastive hyperarticulation tends to be incompatible with slow/clear speech. This supports previous findings in Wedel et al. (2018) that, assuming that the purpose of contrastive hyperarticulation is to avoid perceptually confusable productions near the category boundary, its effect may be less robust in slow/clear speech, where phonemic categories are less likely to overlap. Conversely, the context where contrastive hyperarticulation can be observed more robustly and is more likely to occur would be casual speech, which includes more perceptually confusable productions due, for example, to reduction processes (Wedel et al. 2018).

We examined the potential connection between speech style and the effect of contrastive hyperarticulation by running additional models with style in interaction with the existence of a voicing minimal pair. The distinction in the CSJ-RDB (monologue vs. dialogue) corresponds to the distinction between slow/clear speech and casual speech, as monologues represent slow/clear speech and dialogues represent casual speech (Maekawa 2003). The gap in raw VOT duration due to the existence of a minimal-pair competitor was greater in dialogues than in monologues for both voiced and voiceless stops (voiced: monologues = 0.47 ms, dialogues = 3.05 ms; voiceless: monologues = 2.21 ms, dialogues = 3.11 ms), although the predictor was not significant in the voiceless-stop model, and the voiced-stop model only showed a tendency towards significance ($p = 0.09068$). If it is confirmed in future work that the degree of contrastive hyperarticulation is greater in casual speech than in slow/clear speech, it will reinforce our understanding of the distinction between slow/clear-speech hyperarticulation and contrastive hyperarticulation, and suggest that contrastive hyperarticulation is induced only when necessary.

5. Conclusion

This study sheds light on hitherto unexplored aspects of contrastive hyperarticulation. Building upon prior work, this study provided an additional test case that supports cue-specificity. As mentioned above, in Japanese two kinds of minimal pairs are distinguished by duration-based cues coexisting within a single stop consonant: VOT for the voicing contrast and closure duration for the singleton–geminate contrast. By taking advantage of the variety of minimal pairs distinguished based solely on durational considerations, we examined how information about lexical competition is reflected at the level of the phonetic cue. The results showed that what matters in contrastive hyperarticulation of VOT is the existence of voicing minimal pairs, providing further support for the cue-specificity of contrastive hyperarticulation. Our results also offer additional support for MOP’s perspective that language is an effective tool for communication, and that speakers phonetically enhance cues for accurate message transmission: when there is a lexical competitor, VOT is hyperarticulated and the distance is increased, but speakers are less likely to hyperarticulate when little benefit is expected.

Acknowledgements

We would like to thank Andrew Wedel for his valuable feedback and support on earlier versions of this article. We are also deeply grateful to the editors and anonymous reviewers for their constructive suggestions. Any remaining errors are solely our responsibility.

Funding statement

This study is supported by the Japan Society for the Promotion of Science KAKENHI Grant No. 19K00558.

Competing interests

The authors declare no competing interests.

References

Aylett, Mathew & Turk, Alice (2004). The smooth signal redundancy hypothesis: a functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47, 31–56. https://doi.org/10.1177/00238309040470010201
Baayen, Harald R., Vasishth, Shravan, Kliegl, Reinhold & Bates, Douglas (2017). The cave of shadows: addressing the human factor with generalized additive mixed models. Journal of Memory and Language 94, 206–234. https://doi.org/10.1016/j.jml.2016.11.006
Baese-Berk, Melissa & Goldrick, Matthew (2009). Mechanisms of interaction in speech production. Language and Cognitive Processes 24, 527–554. https://doi.org/10.1080/01690960802299378
Barnes, Jonathan (2006). Strength and weakness at the interface: positional neutralization in phonetics and phonology. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110197617
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London 53, 370–418.
Beckman, Jill N. (1998). Positional faithfulness. PhD dissertation, University of Massachusetts, Amherst.
Bell, Alan, Brenier, Jason M., Gregory, Michelle, Girand, Cynthia & Jurafsky, Dan (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language 60, 92–111. https://doi.org/10.1016/j.jml.2008.06.003
Bell, Alan, Jurafsky, Dan, Fosler-Lussier, Eric, Girand, Cynthia, Gregory, Michelle & Gildea, Daniel (2003). Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. JASA 113, 1001–1024. https://doi.org/10.1121/1.1534836
Bowern, Claire & Babinsky, Sarah (2018). Redundancy, and category variation. Topics in Cognitive Science 8, 503–513.
Bradlow, Ann R., Torretta, Gina M. & Pisoni, David B. (1996). Intelligibility of normal speech I: global and fine-grained acoustic-phonetic talker characteristics. Speech Communication 20, 255–272. https://doi.org/10.1016/S0167-6393(96)00063-5
Bresnan, Joan & Spencer, Jessica (2012). Frequency effects in spoken syntax: Have and be contraction. Presented at the symposium on New Ways of Analyzing Syntactic Variation, Radboud University, Nijmegen, November 2012.
Bruner, Jerome S. & O’Dowd, Donald (1958). A note on the informativeness of parts of words. Language and Speech 1, 98–101. https://doi.org/10.1177/002383095800100203
Buz, Esteban, Jaeger, Florian T. & Tanenhaus, Michael K. (2014). Contextual confusability leads to targeted hyperarticulation. In Bello, Paul, Guarini, Marcello, McShane, Marjorie & Scassellati, Brian (eds.) Proceedings of the 36th annual meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 1970–1975.
Buz, Esteban, Tanenhaus, Michael K. & Jaeger, Florian T. (2016). Dynamically adapted context-specific hyper-articulation: feedback from interlocutors affects speakers’ subsequent pronunciations. Journal of Memory and Language 89, 68–86. https://doi.org/10.1016/j.jml.2015.12.009
Byun, Hi-Gyung (2021). Acoustic characteristics for Japanese stops in word-initial position: VOT and post-stop f0. Journal of the Phonetic Society of Japan 23, 174–197.
Chodroff, Eleanor R. & Wilson, Colin (2018). Predictability of stop consonant phonetics across talkers: between-category and within-category dependencies among cues for place and voice. Linguistics Vanguard 4, article no. 20170047 (11 pp.). https://doi.org/10.1515/lingvan-2017-0047
Cohen Priva, Uriel (2015). Informativity affects consonant duration and deletion rates. Laboratory Phonology 6, 243–278. https://doi.org/10.1515/lp-2015-0008
Frank, Austin F. & Jaeger, Florian T. (2008). Speaking rationally: uniform information density as an optimal strategy for language production. In Love, Bradley, McRae, Ken & Sloutsky, Vladimir (eds.) Proceedings of the 30th annual meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 939–944.
Fricke, Melinda D. (2013). Phonological encoding and phonetic duration. PhD dissertation, University of California, Berkeley.
Fry, John S. (2001). Ellipsis and wa-marking in Japanese conversation. PhD dissertation, Stanford University.
Gao, Jiayin & Arai, Takayuki (2018). F0 perturbation in a “pitch accent” language. In Proceedings of the 6th International Symposium on Tonal Aspects of Languages (TAL 2018). International Speech Communication Association, 56–60. Published online at https://doi.org/10.21437/TAL.2018-12.
Gao, Jiayin, Yun, Jihyeon & Arai, Takayuki (2019). VOT-f0 coarticulation in Japanese: production-biased or misparsing? In Calhoun, Sasha, Escudero, Paola, Tabain, Marija & Warren, Paul (eds.) Proceedings of the 19th International Congress of Phonetic Sciences. Australasian Speech Science and Technology Association, Inc., and International Phonetic Association, 210–214.
Hall, Kathleen Currie, Hume, Elizabeth, Jaeger, Florian T. & Wedel, Andrew (2016). The message shapes phonology. Ms, University of British Columbia, University of Canterbury, University of Rochester and University of Arizona. Available at https://www.researchgate.net/publication/309033386_The_Message_Shapes_Phonology.
Hall, Kathleen Currie, Hume, Elizabeth, Jaeger, Florian T. & Wedel, Andrew (2018). The role of predictability in shaping phonological patterns. Linguistics Vanguard 4, article no. 20170027 (15 pp.). https://doi.org/10.1515/lingvan-2017-0027
Han, Mieko (1962). The feature of duration in Japanese. Onsei no Kenkyuu 10, 65–80.
Han, Mieko (1994). Acoustic manifestations of mora timing in Japanese. JASA 96, 73–82. https://doi.org/10.1121/1.410376
Hashimoto, Daiki (2021). Probabilistic reduction and mental accumulation in Japanese: frequency, contextual probability, and average predictability. JPh 87, 1–17.
Hirata, Yukari (2007). Durational variability and invariance in Japanese stop quantity distinction: roles of adjacent vowels. Journal of the Phonetic Society of Japan 11, 9–22.
Hirata, Yukari & Whiton, Jacob (2005). Effects of speaking rate on the singleton/geminate stop distinction in Japanese. JASA 118, 1647–1660. https://doi.org/10.1121/1.2000807
Homma, Yayoi (1980). Voice onset time in Japanese stops. Onsei Gakkai Kaiho 163, 7–9.
Homma, Yayoi (1981). Durational relationship between Japanese stops and vowels. JPh 9, 273–281.
Hume, Elizabeth & Bromberg, Ilana (2005). Predicting epenthesis: an information-theoretic account. Paper presented at the 7th Annual Meeting of the French Network of Phonology, Aix-en-Provence, January 2005.
Jaeger, Florian T. (2010). Redundancy and reduction: speakers manage syntactic information density. Cognitive Psychology 61, 23–62. https://doi.org/10.1016/j.cogpsych.2010.02.002
Jaeger, Florian T. & Buz, Esteban (2017). Signal reduction and linguistic encoding. In Fernandez, Eva M. & Cairns, Helen Smith (eds.) Handbook of psycholinguistics. Hoboken, NJ: Wiley-Blackwell, 38–81. https://doi.org/10.1002/9781118829516.ch3
Jaeger, Florian T. & Grimshaw, Jane (2013). Information density affects both production and grammatical constraints. Paper presented at Architectures and Mechanisms for Language Processing (AMLaP), Université Aix-Marseille, September 2013.
Kang, Kyoung-Ho & Guion, Susan G. (2008). Clear speech production of Korean stops: changing phonetic targets and enhancement strategies. JASA 124, 3909–3917. https://doi.org/10.1121/1.2988292
Kang, Yoonjung, Ryu, Na-Young & Yun, Suyeon (2019). Contrastive hyperarticulation of vowels in two dialects of Korean. In Calhoun, Sasha, Escudero, Paola, Tabain, Marija & Warren, Paul (eds.) Proceedings of the 19th International Congress of Phonetic Sciences. Australasian Speech Science and Technology Association, Inc., and International Phonetic Association, 4347.Google Scholar
Kawahara, Shigeto (2015). The phonetics of sokuon, or geminate obstruents. In Kubozono, Haruo (ed.) Handbook of Japanese phonetics and phonology. Berlin: De Gruyter Mouton, 4378.10.1515/9781614511984.43CrossRefGoogle Scholar
Kawahara, Shigeto & Lee, Seunghun J. (2018). Truncation in message-oriented phonology: a case study using Korean vocative truncation. Linguistics Vanguard 4, article no. 20170016.10.1515/lingvan-2017-0016CrossRefGoogle Scholar
Keating, Patricia (2003). Phonetic encoding of prosodic structure. In Palethorpe, Sallyanne & Tabain, Marija (eds.) Proceedings of the 6th International Seminar on Speech Production. Sydney: Macquarie Centre for Cognitive Science, 119124.Google Scholar
Kembo, Hidetoshi, Ichikawa, Takashi, Hida, Yoshifumi, Yamazaki, Makoto, Iima, Hiroaki & Shioda, Takehiro (eds.) (2013). Sanseidō kokugo jiten [Sanseidō Japanese dictionary]. 7th edition. Tokyo: Sanseidō.Google Scholar
Kirov, Christo & Wilson, Colin (2012). The specificity of online variation in speech production. In Miyake, Naomi, Peebles, David & Cooper, Richard P. (eds.) Proceedings of the 34th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 587592.Google Scholar
Klatt, Dennis H. (1975). Voice onset time, frication and aspiration in word-initial consonant clusters. Journal of Speech, Language, and Hearing Research 18, 686706.10.1044/jshr.1804.686CrossRefGoogle ScholarPubMed
Kurumada, Chigusa & Jaeger, Florian T. (2015). Communicative efficiency in language production: optional case-marking in Japanese. Journal of Memory and Language 83, 152178.10.1016/j.jml.2015.03.003CrossRefGoogle Scholar
Kuznetsova, Alexandra, Brockhoff, Per B. & Christensen, Rune H.B. (2017). lmerTest package: tests in linear mixed effects models. Journal of Statistical Software 82, 126.10.18637/jss.v082.i13CrossRefGoogle Scholar
Laplace, Pierre S. ([1820] 1886). Théorie analytique des probabilités. 3rd edition. Paris: Courcier. Reprinted in 1886 by Gauthier-Villars as volume 7 of Œuvres complètes de Laplace.
Lee, Hanjung (2006). Parallel optimization in case systems: evidence from case ellipsis in Korean. Journal of East Asian Linguistics 15, 69–96. doi:10.1007/s10831-005-3004-1
Lehiste, Ilise (1976). Suprasegmental features in speech. In Lass, Norman (ed.) Contemporary issues in experimental phonetics. New York: Academic Press, 225–239. doi:10.1016/B978-0-12-437150-7.50013-0
Lisker, Leigh & Abramson, Arthur S. (1964). A cross-language study of voicing in initial stops: acoustical measurements. Word 20, 384–422. doi:10.1080/00437956.1964.11659830
Lisker, Leigh & Abramson, Arthur S. (1967). Some effects of context on voice onset time in English stops. Language and Speech 10, 1–28. doi:10.1177/002383096701000101
Maekawa, Kikuo (2003). Corpus of spontaneous Japanese: its design and evaluation. In Proceedings of the ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition. International Speech Communication Association, 7–12.
Matsumura, Akira (ed.) (2024). Goo kokugo jisho [goo Japanese dictionary]. Tokyo: NTT. Available at https://dictionary.goo.ne.jp/jn/.
Matuschek, Hannes, Kliegl, Reinhold, Vasishth, Shravan, Baayen, Harald R. & Bates, Douglas (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language 94, 305–315. doi:10.1016/j.jml.2017.01.001
McMurray, Bob, Tanenhaus, Michael K. & Aslin, Richard N. (2002). Gradient effects of within-category phonetic variation on lexical access. Cognition 86, 33–42. doi:10.1016/S0010-0277(02)00157-9
Nelson, Noah & Wedel, Andrew (2017). The phonetic specificity of competition: contrastive hyperarticulation of voice onset time in conversational English. JPh 64, 51–70.
NINJAL, National Institute for Japanese Language and Linguistics (2012). Corpus of spontaneous Japanese relational database. Available at https://clrd.ninjal.ac.jp/csj/.
Norcliffe, Elizabeth J. & Jaeger, Florian T. (2015). Predicting head-marking variability in Yucatec Maya relative clause production. Language and Cognition 8, 1–39.
Payton, Karen L., Uchanski, R.M. & Braida, Louis D. (1994). Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. JASA 95, 1581–1592. doi:10.1121/1.408545
Peramunage, Dasun, Blumstein, Sheila E., Myers, Emily B., Goldrick, Mathew & Baese-Berk, Melissa (2011). Phonological neighbourhood effects in spoken word production: an fMRI study. Journal of Cognitive Neuroscience 23, 593–603. doi:10.1162/jocn.2010.21489
Port, Robert F., Al-Ani, Salman & Maeda, Shosaku (1980). Temporal compensation and universal phonetics. Phonetica 37, 235–252. doi:10.1159/000259994
Port, Robert F., Dalby, Jonathan & O’Dell, Michael (1987). Evidence for mora timing in Japanese. JASA 81, 1574–1585. doi:10.1121/1.394510
R Core Team (2019). R: a language and environment for statistical computing. Available at https://www.r-project.org.
Riney, Timothy, Takagi, Naoyuki, Ota, Kaori & Uchida, Yoko (2007). The intermediate degree of VOT in Japanese initial voiceless stops. JPh 35, 439–443.
Sano, Shin-ichiro (2018a). Durational contrast in gemination and informativity. Linguistics Vanguard 4, article no. 20170011. doi:10.1515/lingvan-2017-0011
Sano, Shin-ichiro (2018b). Minimal pairs and hyperarticulation of singleton and geminate consonants as enhancement of lexical/pragmatic contrasts. NELS 48, 53–66.
Sano, Shin-ichiro (2019). The distribution of singleton/geminate consonants in spoken Japanese and its relation to preceding/following vowels. In Calhoun, Sasha, Escudero, Paola, Tabain, Marija & Warren, Paul (eds.) Proceedings of the 19th International Congress of Phonetic Sciences. Australasian Speech Science and Technology Association, Inc., and International Phonetic Association, 1833–1837.
Schertz, Jessamyn (2013). Exaggeration of featural contrasts in clarifications of misheard speech in English. JPh 41, 249–263.
Seyfarth, Scott (2014). Word informativity influences acoustic duration: effects of contextual predictability on lexical representation. Cognition 133, 140–155. doi:10.1016/j.cognition.2014.06.013
Seyfarth, Scott, Buz, Esteban & Jaeger, Florian T. (2016). Dynamic hyperarticulation of coda voicing contrasts. JASA 139, EL31–EL37. doi:10.1121/1.4942544
Shannon, Claude E. (1948). A mathematical theory of communication. Mobile Computing and Communications Review 5, 3–55. doi:10.1145/584091.584093
Shaw, Jason & Kawahara, Shigeto (2017). Effects of surprisal and entropy on vowel duration in Japanese. Language and Speech 62, 80–114. doi:10.1177/0023830917737331
Shimizu, Katsumasa (1977). Voicing features in the perception and production of stop consonants by Japanese speakers. Studia Phonologica 11, 25–34.
Shimizu, Katsumasa (1996). A cross-language study of voicing contrasts of stop consonants in Asian languages. Tokyo: Seibido.
Shimizu, Katsumasa (1999). A study on phonetic characteristics of voicing of stop consonants in Japanese and English. Journal of the Phonetic Society of Japan 3, 4–10.
Shimmura, Izuru (ed.) (2018). Kōjien [Wide garden of words]. 7th edition. Tokyo: Iwanami Shoten.
Smiljanić, Rajka & Bradlow, Ann R. (2008). Stability of temporal contrasts across speaking styles in English and Croatian. JPh 36, 91–113.
van Son, Robert J.J.H. & Pols, Louis C.W. (2003). Information structure and efficiency in speech production. In Bourlard, Hervé (ed.) Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech). International Speech Communication Association, 769–772. doi:10.21437/Eurospeech.2003-63
Takada, Mieko, Kong, Eun Jong, Yoneyama, Kiyoko & Beckman, Mary E. (2015). Loss of prevoicing in modern Japanese /g, d, b/. In Proceedings of the 18th International Congress of Phonetic Sciences. London: International Phonetic Association, 5 pp.
Tang, Kevin & Bennett, Ryan (2018). Contextual predictability influences word and morpheme duration in a morphologically complex language (Kaqchikel Mayan). JASA 144, 997–1017. doi:10.1121/1.5046095
Tily, Harry & Piantadosi, Steven T. (2009). Refer efficiently: use less informative expressions for more predictable meanings. In van Deemter, Kees, Gatt, Albert, van Gompel, Roger & Krahmer, Emiel (eds.) Proceedings of the Workshop on the Production of Referring Expressions: bridging the gap between computational and empirical approaches to reference (PRE-CogSci 2009). Amsterdam: Cognitive Science Society, 8 pp.
Treutwein, Bernhard (1995). Adaptive psychophysical procedures. Vision Research 35, 2503–2522. doi:10.1016/0042-6989(95)00016-X
Turk, Alice & Shattuck-Hufnagel, Stephanie (2000). Word-boundary-related duration patterns in English. JPh 28, 397–440.
Turnbull, Rory (2018). Effects of lexical predictability on patterns of phoneme deletion/reduction in conversational speech in English and Japanese. Linguistics Vanguard 4, article no. 20170033.
Vance, Timothy J. (1987). An introduction to Japanese phonology. Albany, NY: SUNY Press.
Warner, Natasha & Arai, Takayuki (2001a). Japanese mora-timing: a review. Phonetica 58, 1–25. doi:10.1159/000028486
Warner, Natasha & Arai, Takayuki (2001b). The role of the mora in the timing of spontaneous Japanese speech. JASA 109, 1144–1156. doi:10.1121/1.1344156
Wedel, Andrew, Jackson, Scott & Kaplan, Abby (2013a). Functional load and the lexicon: evidence that syntactic category and frequency relationships in minimal lemma pairs predict the loss of phoneme contrasts in language change. Language and Speech 56, 395–417. doi:10.1177/0023830913489096
Wedel, Andrew, Kaplan, Abby & Jackson, Scott (2013b). High functional load inhibits phonological contrast loss: a corpus study. Cognition 128, 179–186. doi:10.1016/j.cognition.2013.03.002
Wedel, Andrew, Nelson, Noah & Sharp, Rebecca (2018). The phonetic specificity of contrastive hyperarticulation in natural speech. Journal of Memory and Language 100, 61–88. doi:10.1016/j.jml.2018.01.001
Wedel, Andrew, Ussishkin, Adam & King, Adam (2019). Crosslinguistic evidence for a strong statistical universal: phonological neutralization targets word-ends over beginnings. Lg 95, 428–446.
Wright, Charles E. (1979). Duration differences between rare and common words and their implications for the interpretation of word frequency effects. Memory & Cognition 7, 411–419. doi:10.3758/BF03198257
Zipf, George K. (1935). The psycho-biology of language. Boston, MA: Houghton Mifflin.

Table 1 Distribution of minimal pair existence by segment and position for the voicing contrast ($x/y$: for each cell, $x$ is the number of types and $y$ the number of tokens).

Table 2 Distribution of minimal pair existence by segment and position for the singleton–geminate contrast ($x/y$: for each cell, $x$ is the number of types and $y$ the number of tokens).

Table 3 Fixed effects summary for the voiced-stop model.

Figure 1 Distribution of speech rate-normalised VOT values by presence/absence of a minimal-pair competitor for voiced and voiceless stops. Solid circles represent mean values and vertical lines interquartile ranges.

Table 4 Fixed effects summary for the voiceless stop model.

Table 5 Fixed effects summary for the word-medial voiceless stop model.