Verbal expressions can evoke alternative states of the world that may not be present. Communicated linguistic representations may, for example, inform us about new or unknown aspects of ongoing experience, such as what a novel object is for or how to use it. Likewise, linguistic representations may mediate action planning, memory recollection or learning about the world. Thus, many ordinary cognitive activities are accompanied or supported by language. How do we integrate verbal and non-verbal stimuli when performing a task? How do linguistic expressions and non-verbal representations relate to each other?
These questions are fundamental to the study of human cognition, as they concern the relationship between basic cognitive functions such as language, perception and memory. An enduring approach to studying this relationship has been linguistic relativity – the idea that the language one speaks may influence the way one thinks. Owing to its anthropological origins, this approach has primarily focused on cross-linguistic and cross-cultural comparisons, as exemplified in existing reviews (Gumperz & Levinson, 1996; Ünal & Papafragou, 2016; Wolff & Holmes, 2011). For example, Wolff and Holmes (2011) identified several ways in which language may influence thought: (a) language may meddle with task performance, e.g., by suggesting competing stimulus representations; (b) language may augment concurrent conceptual representations, e.g., by providing additional features supporting task performance; and (c) language may act as a spotlight or inducer, making certain features more salient or priming behaviour once it has been used.
However, these purported language influences on cognition are not exclusive to cross-linguistic studies (Gentner, 2003; Gentner & Goldin-Meadow, 2003). Most contemporary approaches to human cognition, such as embodied or connectionist theories, argue that interactions between linguistic and cognitive processes involve partially shared representations integrating verbal and non-verbal aspects (Barsalou, 1999; Barsalou et al., 2003; McClelland & Rogers, 2003; Patterson et al., 2007). Consistent with this view, many studies have demonstrated that verbal expressions can modulate visual perception (Estes et al., 2008; Spivey et al., 2001; Tanenhaus et al., 1995) or action planning (Glenberg & Kaschak, 2002; Zwaan & Taylor, 2006) and elicit activity in brain regions shared with perception or action planning (Hauk et al., 2004; Martin & Chao, 2001; Pulvermüller, 2018). These approaches have also emphasised the flexible and context-dependent nature of the cognitive processes involved in a task. For example, language modulations of action planning depend on the temporal overlap between language processing and motor planning (Borreggine & Kaschak, 2006). In general, conceptual representations are adaptive in that they may dynamically change based on contextual cues and the agent’s goals (J. R. Anderson, 1991; Barsalou, 1999, 2009; Glushko et al., 2008).
Consistent with these interactive views, more recent discussions on language and cognition have incorporated single-language studies and argued for task-dependent interactive or predictive processes. For example, Lupyan (2012) argues that linguistic labels may exert top-down influences on perceptual representations. Similarly, Lupyan et al. (2020) argue that linguistic cues may minimise prediction errors in perception and action when viewed within a predictive coding framework. These proposals acknowledge other cognitive processes outside language (e.g., prediction), constraining how language operates in non-verbal tasks – an approach further elaborated here.
Nevertheless, the specific cognitive principles or constraints governing language–cognition interactions still need to be fleshed out in detail. Such principles are essential for elaborating mechanistic theories of how language operates in cognition that entail testable predictions. What leads language to meddle in a task, facilitate or hinder prediction or performance, modulate decisions or compete with alternative stimulus representations? Why do language influences often appear ad hoc? Just as theories of attention and perception predict which behaviours are more likely to occur in an experimental context, it should be possible to specify principles that increase the likelihood of specific language influences on cognitive performance and behaviour. The renewed interest in the Whorfian hypothesis in recent years, along with the flurry of studies reporting cross-linguistic and within-language effects, underscores the need for theoretical frameworks capable of generating specific predictions.
The present article reviews some principles of adult cognition that may constrain and even determine the interaction between verbal and non-verbal representations, drawing on existing cognitive psychology and neuroscience findings. Neither the cognitive principles nor the studies mentioned are meant to be exhaustive. Many cognitive tasks recruiting language likely entail distinct constraints on performance, as highlighted by various subfields of cognitive science (e.g., perception, memory, decision-making, and working memory). Comprehensive reviews of cross-linguistic studies can also be found elsewhere (Bohnemeyer, 2020; Samuel et al., 2019; Ünal & Papafragou, 2016). Instead, following the topic of this special issue, the aim is to propose a framework for understanding ‘ad-hoc cognition’ and language.
Section 1 begins by delineating the relationship between words and conceptual structures, as well as other forms of prior knowledge. Linguistic meaning and prior knowledge are not always separable because both are acquired through experience across development and become associated with one another. Section 2 introduces the role of task goals and experimental characteristics in constraining performance, leading the cognitive system to satisfy these constraints. It then illustrates how task goals and designs may trigger the use of available linguistic information in some existing studies. Section 3 summarises these observations and suggests a tentative framework for delineating how and why linguistic knowledge modulates cognitive performance, regardless of whether these modulations occur within speakers of the same language or across speakers of different languages.
1. Language and prior knowledge in memory
Our mind implicitly and explicitly acquires a staggering array of knowledge throughout life. For example, we know that some events bring about others (causation and contingencies) and know how to perform activities or interact with objects and people in multiple contexts. As much of this knowledge can be recruited at any time as needed, it is argued that our mind possesses different kinds of memory representations (or knowledge) supporting these cognitive processes. Cognitive scientists often distinguish among different types of long-term memory acquired from prior experience and practice. Episodic memory includes context-specific memories linked to a time and space (e.g., one’s lunch yesterday). Semantic memory includes generalised knowledge abstracted across similar multi-sensory experiences (e.g., typical lunches) (Tulving, 1972, 1984). Procedural memory encompasses generalised task abilities acquired from prior practice (Eichenbaum, 2010; Gupta & Cohen, 2002).
Generalised semantic knowledge includes object concepts and categories (Murphy, 2004; Smith & Medin, 1981) or structured event schemas (Franklin et al., 2020; Rumelhart, 1980; Schank & Abelson, 1977; Zacks, 2020). For example, we are familiar with various animal and tool types, as well as how to interact with them. We are also familiar with many actions and events, their typical participants and the situations in which they occur (e.g., arrests, hunts). The more extensive our experience with objects or events, the more detailed and embedded the knowledge we acquire. These conceptual representations are necessary because they enable inferences and context-appropriate actions when encountering new objects or events. For example, when faced with a novel gadget, we may infer what it is for or how to use it by assessing it against existing knowledge.
Models of concepts and semantic memory generally assume that word and sentence meanings convey conceptual representations (Kumar, 2021; Murphy, 2004). Words like bird or arrest evoke associated features or situations, including sensory features (e.g., shape) and situations where the entities involved typically occur (Hare et al., 2009; McNorgan et al., 2011; Stanfield & Zwaan, 2001). However, researchers often highlight the absence of a one-to-one correspondence between individual words and concepts (Murphy, 2004). The meanings of some words, like dog and table, appear to refer to external categories and behave like concepts, e.g., they may have prototypical features and support categorical inferences. Still, as dictionary entries suggest, most words have multiple senses and meanings depending on sentential contexts (e.g., turn, taller, paper), and many concepts that we can think of are not expressed by a single word (e.g., dishes to taste in San Fermín’s festival). Indeed, ambiguity-resolution research has extensively studied how linguistic meaning is computed on the spot as a function of larger phrasal, sentential or pragmatic contexts (Kaiser & Trueswell, 2004; MacDonald et al., 1994). Thus, isolated words do not necessarily correspond to unique concepts, just as vocabulary alone does not reflect the variety of ideas a language can communicate.
Nevertheless, embodied and connectionist approaches to cognition incorporate linguistic meanings – including the products of context-dependent interpretations – into semantic memory (see Figure 1). In connectionist models, for example, words are associated through learning with a distributed network of semantic features partially shared with object and action representations (Hoffman et al., 2017; Lambon Ralph et al., 2017; McClelland et al., 2010; McClelland & Rogers, 2003; McRae et al., 1997). These models of semantic memory are experience based in that cognitive representations and linguistic meanings emerge from learning over time and are grounded in sensory–motor features distributed across the cerebral cortex (Binder & Desai, 2011; Fernandino et al., 2016; Martin & Chao, 2001). In this view, language learning and usage in multi-sensory contexts strengthen the association between words and sensorimotor features in semantic memory.

Figure 1. Schematic representation of conceptual features and their links to words within semantic memory. Features may temporarily cluster together into concepts and relate to others in a context-dependent fashion, such as when interpreting ambiguous words. Words may have associative links to conceptual features and other words or linguistic structures.
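To make the associative architecture sketched in Figure 1 concrete, the toy example below treats a word and a percept as overlapping sets of weighted features, so that activating one partially activates the other. It is a minimal sketch of the shared-feature idea, not any published connectionist model: the feature names, weights and gain parameter are invented for exposition.

```python
# A toy distributed semantic network (cf. Figure 1): words and percepts are
# linked to overlapping weighted sensorimotor features, so activating one
# representation partially activates the features it shares with others.
# All features and weights are illustrative assumptions.

WORD_FEATURES = {
    "bird": {"wings": 0.9, "flies": 0.8, "small": 0.5},
    "arrest": {"police": 0.9, "handcuffs": 0.7, "street": 0.4},
}

def activate(features, gain=1.0):
    """One step of (possibly attenuated) spreading activation from a representation."""
    return {f: w * gain for f, w in features.items()}

def shared_activation(a, b):
    """Activation common to two distributed representations."""
    return sum(min(a[f], b[f]) for f in a.keys() & b.keys())

# Hearing "bird" pre-activates features that a sparrow percept also uses,
# so word-driven activation can support later visual processing.
sparrow = {"wings": 0.8, "flies": 0.7, "small": 0.9, "brown": 0.6}
print(shared_activation(activate(WORD_FEATURES["bird"]), sparrow))  # 2.0
```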
Multiple neurocognitive studies have shown that words activate brain regions recruited for action or perception (Martin & Chao, 2001; Pulvermüller, 2005). For example, verbs like kick, pick and lick activate premotor regions associated with leg, hand or mouth actions, respectively. Even motor features resulting from sentential composition appear to recruit motor-related regions. For example, pushing the piano elicits stronger activity than pushing the chair or forgetting the piano (Moody & Gennari, 2010). As argued by models of semantic cognition, action or perception networks may be co-activated with language-processing regions computing context-dependent interpretations (Hoffman et al., 2017; Lambon Ralph et al., 2017), suggesting that computing linguistic meaning is distinct from but overlaps with semantic memory networks.
Experience-based approaches to cognition are consistent with the possibility that people from different cultures and languages may recruit different associated features in semantic memory (Kemmerer, 2023). Group differences could emerge from learning different systems that map verbal expressions to semantic memory features. For example, different languages employ distinct constructions and morphological resources to describe the same picture, indicating variations in mapping a conceptual representation into verbal expressions (Gennari et al., 2012; Papafragou et al., 2002). Some authors argue that word order sequencing in language is akin to learned procedures in procedural memory (Hamrick et al., 2018; Ullman, 2016), which entails procedural memory differences across languages with distinct sequencing patterns. Likewise, languages differ in how words map into sensory experiences, such as colour perception, implying that words from different languages activate distinct semantic-memory features (Regier & Kay, 2009). Thus, cross-linguistic differences may lead to contrasting associations in semantic or procedural memory, with some languages and cultures more readily activating certain features or patterns than others.
Many cross-cultural studies are consistent with this possibility. Growing up in different cultures modulates how people inspect scenes (Chua et al., 2005; Flecken et al., 2014), how they reason about the world (Atran et al., 2005; Ojalehto & Medin, 2015) or how they respond to nameable colours (Thierry et al., 2009). For example, in oddball tasks where participants respond to shapes, speakers with contrasting names for the colour stimuli show different event-related potentials from speakers who lack the naming distinction (Thierry et al., 2009). In another study, Korean speakers, who consistently distinguish different containment relationships between objects (tight vs loose fit), were more susceptible than English speakers to attentional capture by visual fit features in colour tasks (Goller et al., 2020). These examples suggest that frequent references to visual features may increase the likelihood of attending to these features, leading to cross-linguistic differences.
However, one’s semantic associative network is not necessarily stable over time or entirely available at any given moment. Individuals continuously learn from experience, and new experiences can potentially change one’s semantic network. Learning to navigate a new city or learning a new language can lead to slow semantic reorganisation through memory consolidation – a process integrating episodic experiences with existing semantic knowledge (Dudai, 2012; James et al., 2017). Moreover, contextual features may increase the availability of some cognitive representations more than others, as in semantic or associative priming (McNorgan et al., 2011; Thompson-Schill et al., 1998). For example, when we sit back and listen to language in visual scene contexts, words and phrases spontaneously drive attention to semantically related but unnamed visual objects (e.g., looking at a trumpet when hearing piano) (Altmann & Mirković, 2009; Huettig & Altmann, 2005; Kamide et al., 2003; Spivey et al., 2002; Tanenhaus et al., 1995). Finally, we entertain mostly relevant semantic information when pursuing specific goals. For example, when searching for our house keys, we must retrieve relevant memories (where we left them or are likely to have left them) while representing object features capable of identifying the keys in the current search environment (visual vs tactile search in our pockets). Thus, our action goals and contexts constrain the semantic memory representations recruited in a specific situation.
These observations suggest that although speakers of different languages may vary in their semantic associative networks and verbal practices, in most laboratory studies and ordinary goal-oriented tasks, contextual features, intended goals or task demands will constrain the cognitive representations entertained and the extent to which linguistic meaning is recruited. Attuning to contextual conditions is a key characteristic of adult human cognition, enabling successful performance in a multi-faceted and dynamic world.
2. Language and cognition in action
A critical feature of goal-directed behaviour is that attention is focused on the contextual features consistent with the internal goal representation (Barsalou, 1999; Hommel, 2022; Hommel et al., 2001). Different task goals can therefore elicit distinct processing of the same stimuli. For example, watching an animation in order to describe it elicits a different pattern of fixations from watching it for other purposes (Papafragou et al., 2008; Sakarias & Flecken, 2019). Likewise, providing form-based or meaning-based word judgements directs attention to different stimulus aspects, resulting in differential stimulus recollection (Craik & Tulving, 1975). Even salient stimulus characteristics can be missed when participants are engaged in goal-directed visual processing. For example, when counting the number of ball passes between players in a video, the presence of a gorilla among the players often goes unnoticed (Simons, 2000). Thus, visual attention oriented to action is typically selective, with observers attending to goal-relevant features more than goal-irrelevant ones (Hommel et al., 2001).
Goal-oriented action may require additional processes when tasks admit alternative responses or procedures. In these cases, competition or weighting mechanisms may intervene to choose one alternative over another within the allocated time (Allen et al., 2010; Botvinick & Cohen, 2014; Hommel, 2022; Kool et al., 2010; Shenhav et al., 2017). For instance, when naming an action performed with a visually presented object (e.g., a door or scissors), the conflict between two equally likely alternatives (e.g., open and close) elicits longer reaction times than having one dominant alternative (e.g., cut). When alternative task procedures are available, processing is often argued to yield the optimal, least costly response to the task’s demands. For example, selecting among alternative courses of action may depend on perceived cognitive effort, i.e., selecting the action or response expected to incur the least mental effort within the available resources (Shah & Oppenheimer, 2008; Shenhav et al., 2017).
Accommodating external demands with available internal resources is particularly important in tasks involving open-ended judgements, such as questions or decisions that do not require objective accuracy or specify a clear decision criterion. For this reason, much decision-making research has focused on cognitive shortcuts or heuristics that minimise cognitive effort. For example, when asked which of two unrelated events, A or B, is more likely, people choose the event with more readily available (easily retrieved) instances – a strategy known as the availability heuristic (Tversky & Kahneman, 1973). In Bayesian accounts, heuristics are modelled as probabilistic inferences based on available information, including stimuli, prior knowledge and recent contextual experience (Chater et al., 2006; Oaksford & Chater, 2020; Shah & Oppenheimer, 2008).
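As a concrete illustration of the heuristic just described, the sketch below judges relative likelihood from ease of retrieval rather than true frequency, so vividly encoded events can be judged more likely even when rarer. The event names, frequencies and vividness weights are invented for exposition, not data from the cited work.

```python
# Toy availability heuristic (cf. Tversky & Kahneman, 1973): likelihood
# judgements track how easily instances come to mind, which encoding
# vividness can distort. All numbers below are illustrative assumptions.

# event -> (true frequency in memory, retrieval vividness)
EVENTS = {"plane crash": (5, 4.0), "car accident": (50, 0.3)}

def ease_of_retrieval(event):
    """Accessible instances: frequency scaled by how vividly they were encoded."""
    frequency, vividness = EVENTS[event]
    return frequency * vividness

def judged_more_likely(a, b):
    """Choose whichever event yields more easily retrieved instances."""
    return a if ease_of_retrieval(a) >= ease_of_retrieval(b) else b

# The rarer but more vivid event wins: 5 * 4.0 = 20 > 50 * 0.3 = 15.
print(judged_more_likely("plane crash", "car accident"))  # -> plane crash
```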
These brief observations suggest that goal-oriented action is constrained by the relationship between contextual characteristics and task demands, and these constraints may determine which available features of semantic memory are temporarily activated in a specific situation. These contextual constraints operate as modulatory forces dynamically interacting during processing, as argued by most approaches to cognition, including embodied, connectionist and dynamic theories (Hoffman et al., 2017; Spivey, 2008, 2023). Therefore, an interactive view of cognition implies that language may play different roles depending on how and when linguistic information contributes to achieving a goal within the current context and available resources. In theories of cognition, the management of mental representations oriented towards behaviour is variably referred to as working memory and executive, domain-general or cognitive control processes and is typically associated with frontal brain regions (Baddeley, 2003; Badre, 2025; Braver, 2012). A common characteristic of executive or control processes is that they converge on optimal solutions within the available task contexts and limited cognitive resources (Lieder & Griffiths, 2020).
In what follows, I review previous findings exemplifying potential linguistic contributions to these cognitive processes, such as goal-directed representations and attuning to task demands. Reframing earlier studies in these terms illustrates how task and goal representations constrain and even determine the role of language in non-verbal tasks.
2.1. Task–goal representations in experimental contexts
As semantic memory contains interconnected linguistic meanings and multi-sensory conceptual structures (cf. Figure 1), a task involving concurrent or temporally contiguous visual and linguistic stimuli will activate semantic features associated with both stimulus types. Yet, how these features affect behavioural performance depends on goal representations. The role of task goals can be inferred from studies demonstrating that words and sentences may either facilitate or interfere with perceptual tasks such as visual object recognition (Estes et al., 2008; Lupyan et al., 2020; Spivey et al., 2001; Stanfield & Zwaan, 2001; Zwaan et al., 2002; Zwaan & Taylor, 2006). When the words’ semantic features overlap with the conceptual features recruited for task performance, e.g., object identification, performance is facilitated via spreading activation through the semantic network. Interference or response delays, on the other hand, may occur when the words’ features conflict with those recruited to achieve the task goal. For example, processing words whose features are incongruent with a target may delay target identification. For more details on the temporal dynamics during processing, see S. E. Anderson et al. (2011) and Connell and Lynott (2012).
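The facilitation–interference logic above can be summarised in a small sketch: a prime word pre-activates features, shared features speed target identification and conflicting ones slow it. The feature sets, baseline and millisecond costs are illustrative assumptions, not fitted parameters from any of the studies cited.

```python
# Sketch of word-to-percept congruence effects: features shared between a
# word prime and a visual target produce facilitation; features the prime
# activates but the target lacks produce interference. Values are invented.

BASE_RT_MS = 600.0  # hypothetical baseline identification time

def predicted_rt(prime_features, target_features, gain=15.0, cost=10.0):
    """Baseline minus facilitation from shared features, plus conflict cost."""
    shared = prime_features & target_features
    conflicting = prime_features - target_features
    return BASE_RT_MS - gain * len(shared) + cost * len(conflicting)

flying_eagle = {"wings_outstretched", "airborne"}
print(predicted_rt({"wings_outstretched", "airborne"}, flying_eagle))  # 570.0, faster
print(predicted_rt({"wings_folded", "perched"}, flying_eagle))         # 620.0, slower
```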
In memory studies, the stimulus structure or the experimental design may facilitate or hamper goal attainment. For example, verbal categorisation during learning may produce contrasting results in later visual-recognition tests. Studies examining object or scene recognition as a function of learning tasks, e.g., comparing a linguistic task to a non-linguistic task, have found that visual memory performance was poorer after linguistic categorisation tasks (Carmichael et al., 1932; Feist & Gentner, 2007; Lupyan, 2008). These results align with an interactive encoding account, whereby language use during learning distorts object representations towards typical category features, resulting in poorer recognition or discrimination (Feist & Gentner, 2007; Lupyan, 2008).
Other studies, in contrast, have shown better, rather than poorer, recognition memory with language use. They demonstrate that language production or comprehension during stimulus exposure may lead to better memory performance than non-verbal tasks (Huff & Schwan, 2008; Lupyan et al., 2007; Richler et al., 2011, 2013; Sakarias & Flecken, 2019). Richler et al. (2013) argued that memory facilitation results from verbal cues making the stimuli more memorable and distinctive. A key difference between these and Lupyan’s (2008) studies was the structure of the stimulus set: naming facilitated subsequent recognition memory when the labels uniquely identified an object of a given category (lamps, cups, chairs, etc.), i.e., when the labels helped diagnose whether a specific object was previously seen. In contrast, category labels hindered subsequent recognition when many similar objects shared the same label during learning, causing exemplars of a category to resemble one another due to their similarity to the category prototype. Therefore, stimulus features and labels operate differently within contrasting stimulus sets and experimental designs (e.g., the number of exemplars in a category). What matters in visual-recognition tasks is whether stimulus features help identify and discriminate an item from the set of studied stimuli. See Wang et al. (2024) for a comparison of task instructions across the same experimental designs and Wang and Gennari (2019) for language-mediated retrieval effects.
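One simple way to express this design difference is as label diagnosticity within the study list: a label carried by a single studied item can later confirm that exactly that item was seen, whereas a label shared by many cannot. The sketch below, with invented study lists and a deliberately crude 1/n measure, is only meant to make that contrast explicit; it is not an analysis from the studies cited.

```python
from collections import Counter

# Label diagnosticity within a study list: 1.0 means the label picks out a
# unique studied item (helpful at recognition); values near 0 mean many
# exemplars share it (prototype-like, unhelpful for discrimination).

def label_diagnosticity(study_labels):
    return {label: 1.0 / n for label, n in Counter(study_labels).items()}

# One object per label, as when labels name distinct categories:
print(label_diagnosticity(["lamp", "cup", "chair"]))
# -> {'lamp': 1.0, 'cup': 1.0, 'chair': 1.0}

# Many similar objects sharing a label, as in category-learning designs:
print(label_diagnosticity(["chair"] * 10 + ["lamp"]))
# -> {'chair': 0.1, 'lamp': 1.0}
```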
Together, these perceptual and memory studies suggest that the relationship between the task goal and the experimental context determines how verbal labels modulate performance.
2.2. Optimising task performance with internally available words
Experience-based accounts of semantic memory are compatible with robust links between words and conceptual structures, particularly links that are frequently strengthened through everyday communication and practice. Therefore, verbal expressions may be spontaneously recruited to aid task performance. For example, verbal expressions can provide shorthand substitutes for complex non-verbal representations or for representations that must be mentally retained or manipulated further. In many ordinary tasks, people use language to encode information, e.g., when memorising a phone number, calculating sums or studying for an exam. As suggested by research on inner speech, people may also reason or plan complex actions by talking to themselves or writing to-do lists (Alderson-Day & Fernyhough, 2015; Fernyhough & Borghi, 2023). A recent meta-analysis of verbal interference in dual-task paradigms indeed suggests that complex cognitive tasks, such as reasoning, mental calculation and behavioural self-cuing (e.g., task reminders), involve some form of inner speech (Nedergaard et al., 2022).
This suggestion is consistent with many studies demonstrating distinctive behaviours across and within language groups. Object names, for example, are spontaneously recruited in visual search tasks in monolingual speakers (Meyer et al., 2007; Walenchok et al., 2016). In bilinguals, English and Spanish speakers fixate on different objects in a display while searching for the same target (see Figure 2): when searching for a clock, English speakers fixate on an English phonological competitor (e.g., clouds), whereas Spanish speakers fixate on a Spanish phonological competitor (e.g., a gift, Spanish ‘regalo’, which overlaps with ‘reloj’, clock) (Chabal & Marian, 2015). The competitors’ activation in these studies may depend on name accessibility (familiar objects prime their high-frequency names) and the requirement to maintain the object in working memory for an upcoming visual search.

Figure 2. Task structure of the visual search task in Chabal and Marian (2015).
Having readily available names also facilitates colour discrimination. For example, speakers possessing linguistic categories for colour stimuli are faster at discriminating them than speakers lacking those categories, unless task demands prevent lexical access (Gilbert et al., 2005; Lupyan et al., 2020; Winawer et al., 2007). In some colour discrimination tasks, performance would be strenuous without linguistic aid. For example, some studies asked speakers of different languages to retain a colour shade for 30 seconds and later indicate which of two similar shades had been seen (Davidoff et al., 1999; Roberson et al., 2005). The alternatives straddled name boundaries in one language or another. Participants may naturally resort to colour names to facilitate encoding and discrimination in these contexts, as decisions are readily apparent when the alternatives bear different names.
Resorting to names in the colour domain is magnified by the continuous nature of the stimulus. Indeed, individual stimuli drawn from continuous domains, such as colour, time and spatial distance, are difficult, if not impossible, to retain in memory because our brain does not encode these domains in the metrics conventionally used to measure them (e.g., hue, saturation and intensity values for colours). For this reason, Bayesian approaches to cognition have argued that retrieving individual stimuli from continuous domains can be explained by probabilistic inferences (Huttenlocher et al., 1990, 1991, 2000; Regier & Xu, 2017; Shi et al., 2013). In memory-based colour discrimination tasks, uncertainty about which colour shade was previously seen leads participants to combine prior knowledge – the linguistic category – with the stimulus memory to infer the most likely seen colour from the presented alternatives (Cibelli et al., 2016). This inference is argued to operate across and within a language. For example, when participants are instructed to pinpoint a previously seen colour in a continuous colour wheel, probabilistic inferences lead to responses biased towards typical category members. Likewise, the recollection of object sizes involves inferences from category-based prior knowledge, resulting in biased recollection towards typical object sizes (Hemmer & Steyvers, 2009a, 2009b; Steyvers & Hemmer, 2012).
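The category-adjustment inference described above has a simple Gaussian form: the recalled value is a reliability-weighted average of the noisy stimulus trace and the category prior, so noisier memories are pulled more strongly towards the category prototype. The sketch below implements that textbook combination with invented hue values and variances; it is not the fitted model of Cibelli et al. (2016).

```python
# Gaussian category adjustment (cf. Huttenlocher et al., 1991): the recalled
# value is the posterior mean of a Gaussian prior (the linguistic category)
# combined with a Gaussian memory trace. All numeric values are invented.

def recalled_value(trace, trace_var, prior_mean, prior_var):
    """Posterior mean: reliability-weighted average of trace and prior."""
    w = prior_var / (prior_var + trace_var)  # weight on the stimulus trace
    return w * trace + (1.0 - w) * prior_mean

# A shade seen at hue 205, with the "blue" category prototype at hue 220:
# the noisier the memory trace, the stronger the pull towards the prototype.
print(recalled_value(trace=205, trace_var=25, prior_mean=220, prior_var=100))   # 208.0
print(recalled_value(trace=205, trace_var=400, prior_mean=220, prior_var=100))  # 217.0
```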
These examples indicate that response uncertainty and expected difficulty may encourage spontaneous language recruitment. As language is not necessarily recruited in all visual tasks, spontaneously resorting to language may depend on the convergence of contextual factors, such as stimulus properties, task goals and response uncertainty. For example, words would not be recruited if visual stimuli did not have standard names available. Likewise, inferences or covert naming may be unnecessary if task responses are easily determined. Thus, the role of language in cognitive tasks depends on how well it serves the processes involved in achieving the task goal.
2.3. Optimising performance with contextually available language
Semantic memory features or structures that have been processed recently within the experimental context remain more available for further use than structures that have not been recently used – a phenomenon often referred to as implicit priming or implicit learning. For example, many lexical and syntactic priming studies show persistent reuse of words or structures across multiple intervening events or tasks without explicit recollection (Bock et al., 2007; Chang et al., 2006, 2012; Schacter, 1990). It follows that language use in the experimental context may modulate task performance at a distance, i.e., when language is not temporally contiguous with visual stimuli.
This possibility has been demonstrated in tasks involving decision-making with indeterminate responses, such as selecting an item among similar alternatives. A cross-linguistic study manipulated the task that preceded a similarity judgement task in which participants had to choose which of two manner or path alternatives was most similar to a target event. Similarity choices were aligned with linguistic meaning (path alternatives in Spanish) only when a description task, rather than a non-verbal task, preceded the similarity judgements (Gennari et al., 2002). This result suggests that the linguistic context primed linguistic meanings and made them available to support subsequent similarity choices. In Bayesian inference terms, verbal expressions become part of the priors – the contextually available information used to infer responses – resulting in response biases towards linguistic meanings. Nevertheless, it remains unclear whether these biases arise spontaneously from the network’s activation dynamics when facing uncertainty or whether they are strategically (deliberately) controlled.
Interestingly, similar contextual language modulations have been observed in bilinguals, where similarity choices or sorting decisions are consistent with the language of the experimental context (Athanasopoulos et al., 2015; Kersten et al., 2010). Semantic memory in bilinguals is the subject of intense study (Heredia & Altarriba, 2014). However, a crude approximation to the observations in section 1 is that words from the two spoken languages will be associated with potentially distinct sets of conceptual features. Critically, words and linguistic structures in bilinguals are also linked to the contrasting learning experiences, memories and practices that using each language throughout life entails. Indeed, bilingual studies have shown that language use can prime cultural values, practices and autobiographical memories experienced in that language (Akkermans et al., 2010; Chen & Bond, 2007, 2010; Holtgraves et al., 2014; Marian & Kaushanskaya, 2007). Thus, using one or another language in the experimental context may increase the availability of words and knowledge schemas associated with the language of the context, supporting subsequent decisions. Nevertheless, control processes in bilinguals are a topic of active research, so other higher-order influences may occur (Bialystok, 2017; Filipović & Hawkins, 2019; Green & Abutalebi, 2013).
In continuous domains with few known or nameable categories, such as spatial location and duration, uncertainty in judging or reproducing the precise stimulus duration, distance or location leads to probabilistic inferences based on relevant available knowledge. Many studies have shown, for example, that judging the duration of tones presented one after another is modulated by the tone durations of previous trials, as participants implicitly compare current and preceding stimuli. Across multiple trials, this comparison results in temporal judgements biased towards the overall stimulus average, a phenomenon referred to as the central tendency effect (Jazayeri & Shadlen, 2010; Shi et al., 2013). Similar results have been reported for judgements of spatial location, where averaged locations or coarse representations relative to known categories guide performance (Gudde et al., 2016; Huttenlocher et al., 1990, 1991, 2000; Tompary & Thompson-Schill, 2021). These results suggest that continuous stimulus domains might be particularly susceptible to inferences based on contextually available information, either prior categorical (linguistic) knowledge or recent stimulus experience. From this observation, it follows that contextually available linguistic stimuli have the potential to modulate temporal or spatial judgements, as shown in some studies (Bylund & Athanasopoulos, 2017; Casasanto, 2016).
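The trial-to-trial logic behind central tendency effects can be made explicit with a toy simulation: each judgement mixes the current noisy duration with a running prior built from preceding trials, so long durations are underestimated and short ones overestimated. The fixed mixing weight and the incremental-mean update are simplifying assumptions, not the fitted model of Jazayeri and Shadlen (2010).

```python
# Toy central tendency simulation: judgements combine the current stimulus
# with a prior that tracks the running mean of earlier trials, biasing
# responses towards the session average. Weights and durations are invented.

def simulate_judgements(durations_ms, trace_weight=0.7):
    judgements = []
    prior = durations_ms[0]  # first stimulus anchors the prior
    for n, duration in enumerate(durations_ms, start=1):
        judgements.append(trace_weight * duration + (1 - trace_weight) * prior)
        prior += (duration - prior) / (n + 1)  # incremental running mean
    return judgements

stimuli = [400, 800, 600, 1000, 500]
for seen, judged in zip(stimuli, simulate_judgements(stimuli)):
    print(f"{seen} ms -> judged {judged:.0f} ms")  # extremes pulled to the mean
```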
The availability of linguistic labels or nameable features from prior experience has been extensively studied in category learning (Brojde et al., 2011; Lupyan et al., 2007; Vong et al., 2016; Zettersten & Lupyan, 2020). These studies have examined various stimulus types and task designs, including meaningful and non-meaningful labels for visual stimuli. Some of these studies suggest that categories with nameable features are learned more quickly and that this learning depends on stimulus features and prior linguistic experience. For instance, given our linguistic experience, object shapes are inherently more nameable than object textures (Brojde et al., 2011). Other studies have shown that colour or shape categories containing easier-to-name features are learned more effectively than those with harder-to-name features (Zettersten & Lupyan, 2020). These findings suggest that prior verbal experience makes category learning (an instance of goal-oriented behaviour) more efficient and highlight promising avenues for exploring cross-linguistic variations in feature naming patterns.
3. Discussion
This brief review highlighted some operating principles of goal-directed action and decisions based on semantic knowledge, suggesting that higher-order cognitive processes may govern the interaction of linguistic and non-linguistic representations. Section 1 introduced the role of prior knowledge in supporting cognitive processes and suggested that verbal expressions become associated with conceptual features or structures through experience-based learning. This view is compatible with different languages establishing distinctive links to conceptual structures and leading to contrasting attentional patterns or cognitive representations. Nevertheless, cross-linguistic vocabulary or lexicalisation differences do not necessarily entail distinct cognitive representations beyond contrasting linguistic meanings. The semantic features that words and linguistic patterns bundle together during language use need not operate together in cognitive tasks outside of language use. In many respects, non-verbal representations are similar across languages despite linguistic differences because the physical world is largely shared, and its regularities are similarly learned (Malt et al., 2003, 2008; Papafragou et al., 2002; Ünal et al., 2021).
Section 2 discussed how task goals, contexts and demands constrain the role of language in an experimental task. In studies involving linguistic and visual stimuli, such as those in section 2.1, the relationship between task goals and experimental contexts determines performance. Similarly, internally available verbal expressions can be recruited to expedite processing or enable further cognitive operations, such as visual search, colour discrimination or decisions (see section 2.2). Language use within the experimental context may encourage (or implicitly prime) resorting to linguistic meanings to facilitate other processes, such as decision-making or learning (see section 2.3). Resorting to language may thus depend on the convergence or interaction of context and stimulus properties, task goals and processing demands.
Generally, the adult cognitive system responds to experimental goals as efficiently as possible within the constraints of internal and contextually available representations and resources. Rather than viewing language as the driving force behind language effects, this perspective presents language as a resource for other cognitive processes oriented towards a goal within a constraining context. Language may intervene in a non-verbal task because domain-general cognitive processes promote its recruitment. Several computationally explicit theories support these goal-oriented processes, although they differ in their target level of explanation. Interactive connectionist and constraint satisfaction models aim to elucidate processing mechanisms, while Bayesian accounts strive to establish general cognitive principles independently of their mechanistic implementation (Chater et al., 2006; Jones & Love, 2011; McClelland et al., 2014). Based on these models, I have argued that previous findings demonstrating language modulations in non-verbal tasks exemplify the operation of these cognitive principles.
Nevertheless, specifying the structure and inner workings of the cognitive processes that guide behaviour is not simple. Multiple theories of working memory, control and executive function have been proposed, differing in the cognitive architectures they assume and the extent to which they mimic brain structure and mechanisms (Baddeley, 2012; Botvinick et al., 2001; Botvinick & Cohen, 2014; Miyake & Friedman, 2012; Shenhav et al., 2017; Van Ede & Nobre, 2025). Most theories assume some form of working memory as a system or network that temporarily holds and manipulates long-term representations. The information currently active in working memory is selected to fulfil one’s goals and may flexibly draw on different sources of available knowledge or accessible features, as schematically represented in Figure 3. Various processes have been proposed to occur within working memory, including information maintenance, selection and several types of conflict resolution and prioritisation processes (e.g., those dealing with dual-task goals, competing stimulus features, cues or responses). Ultimately, contextual constraints, such as the nature of the task, the stimuli and the experimental context, will modulate the relative reliance on one resource over another and how a specific task is performed.

Figure 3. Schematic representations of resources and a goal representation interacting within working memory, and modulated by external task constraints (e.g., stimulus structure, speed or accuracy demands).
For language–cognition interactions, it remains to be determined which cognitive tasks, stimuli or experimental contexts are more conducive to spontaneous or primed language recruitment. For example, Bayesian approaches suggest that tasks with indeterminate responses may be more likely to rely on prior linguistic and conceptual knowledge. Yet, systematic comparisons across decision and stimulus types are scarce. Likewise, it is unclear whether resorting to language knowledge, such as word meanings or stimulus names, is under conscious control or rather implicit. These issues are not confined to language and cognition research but extend to other areas of cognitive science. The proliferation of increasingly specialised research fields in working memory, decision-making, attention or cognitive control makes it challenging to infer general cognitive mechanisms. In this respect, the Whorfian question presents a unique opportunity to continue concerted efforts to explore the tasks, contextual features and linguistic experiences that lead to one outcome or another.
From an experimental perspective, progress in understanding language–cognition interactions necessitates the development of higher-order theories that predict how and when verbal knowledge or experience serves task goals or permeates cognitive activities. Systematic hypothesis testing across tasks, stimulus types or contexts will enhance our understanding of how and why linguistic information modulates cognitive representations and, most importantly, elucidate why some tasks, and not others, exhibit verbal influences.