3 - Collecting Data
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 33-54
- Chapter
- Open access
Summary

Chapter 3 considers different approaches to data collection. Three case studies are included. The first study involves a purpose-built corpus of news articles about obesity. We focus on theoretical considerations attending to corpus design, as well as practical challenges involved in processing texts provided by repositories such as LexisNexis to make them amenable to corpus analysis. The second study focuses on how corpus linguists might work with existing datasets, in this case, transcripts collected by research collaborators conducting ethnographic research in Australian Emergency Departments. We discuss the ways in which data collected for the purposes of different kinds of analysis is likely to require some pre-processing before it becomes suitable for corpus-based analysis. The third study is concerned with the creation of a corpus of anti-vaccination literature from Victorian England. We discuss the challenges involved in sourcing historical material from existing databases, selecting a principled set of potential texts for inclusion, and using optical character recognition (OCR) software to convert the texts into a format that is appropriate for corpus tools.

1 - Introduction
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 1-15
- Chapter
- Open access
Summary

Chapter 1 introduces the context and aims of the book, and provides a brief introduction to corpus linguistics for readers unfamiliar with it. It finishes by providing a chapter-by-chapter overview of the book.

11 - Positions Legitimated
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 169-185
- Chapter
- Open access
Summary

Chapter 11 introduces the concept of legitimation in discourse and considers how it might function, and be studied, in the context of health(care) communication. First, we look at how contributors to the online parenting forum Mumsnet use labels denoting attitudes towards vaccinations. We point out how labels that involve opposition to vaccinations, such as ‘anti-vaxxer’, tend to collocate with negation, and then consider how people justify negating the applicability of the label to themselves. This reveals a range of different concerns around vaccinations. We then draw on a study of patient feedback in which we examined how patients legitimate their perspectives and the evaluations they gave in their feedback. For example, this included patients representing themselves as experienced users of healthcare services. Additionally, some patients used aspects of their identities to position themselves as requiring attention, while others used techniques such as employing second person pronouns to imply that their experiences could be generalised to other patients.

7 - Change over Time
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 101-117
- Chapter
- Open access
Summary

Chapter 7 considers how language change over short timespans can be examined using corpus-assisted methods. We present three case studies. The first study involves a corpus of patient feedback relating to cancer care, collected over four consecutive years. A measure called the coefficient of variation was used to identify lexical items whose frequency had increased or decreased over time. The second study considered UK newspaper articles about obesity. To examine changing themes over time, we employed a combination of keyness and concordance analyses to identify which themes in the corpus were becoming more or less popular. Additionally, the analysis considered time in a different way, by using the concept of the annual news cycle. To this end, the corpus was divided into 12 parts, each consisting of articles published in a particular month, and the same type of analysis was applied to each part. The third case study involves an analysis of a corpus of forum posts about anxiety. Time was considered in terms of the age of the poster and in terms of the number of contributions that a poster had made to the forum, and differences were found depending on both approaches to time.
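
As a rough illustration of the coefficient of variation named in the Chapter 7 summary, the sketch below ranks words by how much their relative frequency varies across four yearly sub-corpora. It is not the authors' procedure: the word list, the frequencies, the use of the population standard deviation, and the helper names `coefficient_of_variation` and `direction` are all assumptions made for the example. Because the coefficient itself only measures variability, the sketch adds a crude first-versus-last comparison to label the direction of change.

```python
from statistics import mean, pstdev


def coefficient_of_variation(freqs):
    """Population standard deviation as a percentage of the mean."""
    m = mean(freqs)
    if m == 0:
        return 0.0  # avoid dividing by zero for items that never occur
    return pstdev(freqs) / m * 100


def direction(freqs):
    """Crude trend label: compare the last period with the first."""
    if freqs[-1] > freqs[0]:
        return "increase"
    if freqs[-1] < freqs[0]:
        return "decrease"
    return "flat"


# Hypothetical relative frequencies (per million words) for four years.
yearly_freqs = {
    "waiting": [120.0, 150.0, 180.0, 210.0],
    "nurse": [300.0, 302.0, 298.0, 300.0],
    "letter": [90.0, 70.0, 50.0, 40.0],
}

# Rank items from most to least variable across the four years.
ranked = sorted(
    yearly_freqs,
    key=lambda w: coefficient_of_variation(yearly_freqs[w]),
    reverse=True,
)

for word in ranked:
    cv = coefficient_of_variation(yearly_freqs[word])
    print(f"{word}\t{cv:.1f}\t{direction(yearly_freqs[word])}")
```

In a real study the highest-ranked items would still need to be checked in concordance lines, since a high coefficient can reflect a single unusual year as well as a steady rise or fall.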

6 - Language Use and Identity
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 85-100
- Chapter
- Open access
Summary

Chapter 6 shows how it is possible to use demographic metadata to study identities in health-related corpora. We present two case studies, based on research on patient feedback on NHS services in England. The first study compares how cancer patients of different age and sex groups evaluate healthcare services and, specifically, how they use distinct linguistic and rhetorical strategies to do this. The corpus was encoded with demographic metadata which allowed the researchers to explore the language used by people of different age and sex identity groups. For the second study, a different corpus of more general patient feedback was used, one which did not contain demographic metadata. Instead, targeted searches were used to identify patients’ demographic characteristics based on cases where they made those characteristics explicit within their feedback. In contrasting these case studies, we also evaluate the two different approaches taken, considering the affordances and limitations of both. Taken together, the case studies demonstrate how language and identity can be explored in corpora with and without reliable demographic metadata.

5 - Interaction
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 68-84
- Chapter
- Open access
Summary

Chapter 5 is concerned with sequential aspects of health-oriented interactions and the challenges this poses for corpus research. Two case studies demonstrate how conventional corpus procedures can be augmented with other linguistic approaches to facilitate a critical examination of the relationships between parts of the data that might otherwise be separated in corpus analysis. The first study is an investigation of a thread from an online forum dedicated to cancer – one that is explicitly dedicated to irreverent verbal play. We show how a corpus approach enabled the identification of humorous metaphors and helped us reveal recurrent lexical and grammatical features that facilitate discussion around sensitive topics, enable a coherent identity, and contribute to a sense of community. In the second study we use an approach that was originally applied to the Spoken BNC 2014 corpus to examine interactional data in terms of functional discourse units. We apply this coding framework to a sample of anxiety support forum data in order to document, quantify, and evaluate how various communicative purposes are formulated in forum posts and are met with different types of response.

13 - Conclusions
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 201-213
- Chapter
- Open access
Summary

Chapter 13 presents a synthesis of the previous chapters, beginning by asking the question – what have our experiences taught us about health communication that we didn’t know? We go on to examine lessons we learnt about carrying out corpus-based research on health communication, offering practical advice and tips for people who might be carrying out similar kinds of studies to the ones described in this book. We then consider the limitations of a corpus-based approach and end by looking to the future – what changes have taken place since we completed our analyses? What kinds of developments in the field of healthcare and in corpus linguistic analysis have occurred recently? And what avenues of research into health care do we believe are potentially interesting to investigate next?

2 - Research Questions
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 16-32
- Chapter
- Open access
Summary

Chapter 2 is concerned with research questions. We discuss the different processes through which research questions can be identified and developed in corpus-based research on health communication. Three case studies are considered. The first study involved the analysis of press representations of obesity. In this study, the researchers developed their own research questions in a variety of ways, including by drawing from the non-linguistic literature on obesity. The second study focused on the McGill Pain Questionnaire – a well-known language-based diagnostic tool for pain. A pain consultant asked the researchers if they could help understand why some patients find it difficult to respond to some sections of the questionnaire. In response, the researchers formulated a series of questions that could be answered using corpus linguistic tools, and identified some issues with the questionnaire that address the pain consultant’s concerns. The third study involved the analysis of patient feedback on the UK’s National Health Service. The researchers were approached by the NHS Feedback Team and given 12 questions that they were commissioned to answer by means of corpus linguistic methods.

12 - Dissemination
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 186-200
- Chapter
- Open access
Summary

Chapter 12 discusses the potential opportunities and challenges associated with disseminating the findings of corpus-based approaches to health communication, which also apply more generally to interdisciplinary research and collaborations between researchers and non-academic stakeholders. We include two case studies. The first case study involves work on patient feedback with members of the NHS who had provided a list of questions for us to work on. We discuss the importance of and challenges around building and maintaining relationships with members of this large, changing organisation, as well as outlining how we approached dissemination of findings, both in academic and non-academic senses, and the extent to which we were able to achieve impact. The second case study considers our experiences of disseminating findings from a project on metaphors and cancer, focussing particularly on writing for a healthcare journal, dealing with the media, and going beyond corpus data to create a metaphor-based resource for communication about cancer.

4 - Ethics
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 55-67
- Chapter
- Open access
Summary

Chapter 4 considers ethical issues in healthcare communication research through two case studies. The first case study looks at a relatively straightforward situation involving a study of the Pain Concern online forum. Data from the forum was provided by Health Unlocked, a company that runs a large number of online communities related to health. One advantage of using their service was that Health Unlocked took care of relevant legal requirements concerning ethics and only shared data from contributors to the forum who had agreed for their posts to be used for research purposes. The second case study relates to the study of dementia and brings into focus the difficulties of working with multiple datasets and a range of stakeholders. The data collection for this project involved public health communication in terms of news media and external communications from support services, including social media. As such, it presents scenarios that are common to studies of health communication and thereby offers instruction in how to navigate related ethical concerns.

10 - Representing Social Actors
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 153-168
- Chapter
- Open access
Summary

Chapter 10 demonstrates how corpus approaches support the study of various social actors. We include two case studies. The first study investigates how representations of people with obesity in the UK press contribute to stigmatisation. The analysis centres on the naming strategies used to refer collectively and individually to people with obesity, as well as the adjectives used to describe them and the activities that they are reported to be involved in. Furthermore, we show that people with obesity are regularly held up as figures of ridicule and obesity is discussed in the context of social deviance, foregrounded when reporting on perpetrators of crimes. The second study uses a tailor-made annotation system to discuss referential strategies, descriptions of traits and the capacity to carry out different kinds of actions in the context of voice-hearing, to critically consider the different degrees to which people who experience psychosis personify their voices. We track these representations in the reports of those with lived experience over time and consider the implications of a social actor model for therapeutic interventions to support those with chronic mental health issues.

8 - Historical Data
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 118-133
- Chapter
- Open access
Summary

Chapter 8 is concerned with the use of historical corpora in the study of language relating to health. We present two case studies – one where an issue was well understood and discussed publicly, the other where there was a clear issue with the framing of a discussion. For the former study we explore the VicVaDis corpus, first introduced in chapter 1. We combine different corpus techniques to show the main anti-vaccination arguments in the corpus and to point out parallels with present-day anti-vaccination discourse. The second case study looks at the emergence of venereal disease in the seventeenth century using the Early English Books Online corpus. By examining collocates of the word pox, we are able to separate relevant uses of the word (e.g., those which refer to venereal disease) from those which are not relevant. Additionally, we show that through the investigation of one type of collocate (words referring to geographical locations) the analysis was taken in an unexpected but rewarding direction.

9 - Representing the Experience of Illness
Elena Semino, Lancaster University, Paul Baker, Lancaster University, Gavin Brookes, Lancaster University, Luke Collins, Lancaster University, Tony McEnery, Lancaster University
Book:

Applying Corpus Linguistics to Illness and Healthcare

Published online:

05 September 2025

Print publication:

25 September 2025, pp 134-152
- Chapter
- Open access
Summary

Chapter 9 considers how the experience of illness is represented linguistically, focussing on two contexts. In the first case study, collocational patterns were examined in order to show how people represented the word anxiety. Different patterns around anxiety were grouped together in order to identify oppositional pairs of representation (e.g., medicalising/normalising). The second case study involved an examination of the ways in which cancer was constructed in a corpus of interviews with and online forum posts by people with cancer, family carers, and healthcare professionals. Using a combination of manual analysis and corpus searches, we considered how metaphors were used to convey a sense of empowerment or disempowerment in the experience of cancer. More specifically, the analysis of metaphors around cancer revealed insights into people’s identity construction and the relationships between doctors and patients.

Applying Corpus Linguistics to Illness and Healthcare

Elena Semino, Paul Baker, Gavin Brookes, Luke Collins, Tony McEnery
Published online:

05 September 2025

Print publication:

25 September 2025
- Book
- Open access
Communication is central to the experience of illness and the provision of healthcare. This book showcases the insights that can be gained into health communication by means of corpus linguistics – the computer-aided linguistic analysis of large datasets of naturally occurring language use known as 'corpora'. The book takes readers through the stages that they must go through to carry out corpus linguistic research on health communication, from formulating research questions to disseminating findings to interested stakeholders. It helps readers anticipate and deal with different kinds of challenges they may encounter, and shows the variety of applications of the methods discussed, from interactions in Accident and Emergency departments, to online discussions of mental illness, and press representations of obesity. Providing the reader with a wide range of clear case studies, it makes the relevant methods and findings accessible, engaging and inspiring. This title is also available open access on Cambridge Core.

Corpus linguistics in practice: Investigating register variation and usage - Eniko Csomay and William J. Crawford, Doing Corpus Linguistics (2nd edn.) New York: Routledge, 2024. Pp. 175. Hardback $52.99, ISBN: 9781032425771; e-book ISBN: 9781003363309.
Yuying Zhi
Journal:

English Today, First View

Published online by Cambridge University Press:

22 August 2025, pp. 1-3
- Article

Pattern, Construction, System

A Unified Approach to Grammar and Lexis
Susan Hunston
Published online:

21 August 2025

Print publication:

04 September 2025
- Book
- Open access
Construction Grammar and Systemic Functional Grammar take different approaches to the study of lexico-grammar, based on language as a cognitive and as a social phenomenon respectively. This is the first book to bring the two approaches together, using corpus-based Pattern Grammar as an underlying descriptive framework, in order to present a comprehensive and original treatment of verb-based patterns in English. It describes in detail two processes: deriving over 800 verb argument constructions from 50 verb complementation patterns; and using those constructions to populate systemic networks based on 9 semantic fields. The result is an approach to the lexis and grammar of English that unifies disparate theories, finding synergies between them and offering a challenge to each. Pattern Grammar, Construction Grammar and Systemic Functional Grammar are introduced in a way that makes each approach accessible to readers from other backgrounds. This title is also available as open access on Cambridge Core.

Automatic Image Tagging for Corpus Linguistics

A Multimodal Study of News Representations of Islam
Paul Baker, Hanna Schmück, Yufang Qian
Published online:

02 July 2025

Print publication:

24 July 2025
- Element
- Open access
This Element reports on the creation and analysis of a 1.5-million-word corpus consisting of a year's worth of UK national press news articles about Islam and Muslims, published between December 2022 and November 2023. The corpus also contains 8,546 image files which have been automatically tagged using Google's Vertex AI. Analysis was carried out on three levels: (a) written text only, (b) images only, and (c) interactions between written text and images. Using examples from the analyses, the authors demonstrate the affordances of these three approaches, providing a critical evaluation of Vertex AI's capabilities and the abilities of popular corpus software to work with visually tagged corpora. The Element acts as a practical guide for researchers who want to carry out this form of analysis. This title is also available as Open Access on Cambridge Core.

‘Quirky construction looking for syntactically flexible verb’: creativity and productivity in a dynamic link-based network perspective
Susanne Flach
Journal:

English Language & Linguistics / Volume 29 / Issue 2 / June 2025

Published online by Cambridge University Press:

20 June 2025, pp. 298-318
- Article
- Open access
Approaches to creativity commonly distinguish between F-creativity (rule-compliant use) and E-creativity (rule-breaking use). This dichotomy in part stems from a focus on grammatical constructions (‘nodes’) at the relative expense of their connections (‘links’). We approach creativity and productivity from a link-based perspective in Usage-Based Construction Grammar, and assume that productivity pertains to a unit’s inventory of links, while creativity pertains to the creation and maintenance of links. These assumptions are showcased using the into-causative (He talked me into going, They scared us into working harder). The construction is productive because it hosts a large inventory of verbal slot-fillers (talk, scare). Conversely, these slot-fillers themselves are creative because they can establish and maintain links with a construction that is not their primary host. This property is not linear: we assume that the slot-fillers’ ability to occur in unusual constructional environments reflects their general ‘creative potential’ to form and maintain (new) links within the network. In data from the Corpus of Contemporary American English (COCA), we find weak, but consistent correlations between verbs’ association with the into-causative and (i) their semantic and syntactic compatibility with the construction, and, crucially, (ii) their general flexibility and ability to establish and maintain links.

27 - The Meaning of “Reasonable”
from Part III - Applications
- By Lucien Baumgartner, Markus Kneer
Edited by Kevin Tobia, Georgetown University, Washington DC
Book:

The Cambridge Handbook of Experimental Jurisprudence

Published online:

17 May 2025

Print publication:

05 June 2025, pp 440-463
- Chapter
Summary

The reasonable person standard is key to both Criminal Law and Torts. What does and does not count as reasonable behavior and decision-making is frequently determined by lay jurors. Hence, laypeople’s understanding of the term must be considered, especially whether they use it predominately in an evaluative fashion. In this corpus study based on supervised machine learning models, we investigate whether laypeople use the expression “reasonable” mainly as a descriptive, an evaluative, or merely a value-associated term. We find that “reasonable” is predicted to be an evaluative term in the majority of cases. This supports prescriptive accounts, and challenges descriptive and hybrid accounts of the term – at least given the way we operationalize the latter. Interestingly, however, other expressions often used interchangeably in jury instructions (e.g. “careful,” “ordinary,” “prudent,” etc.) are predicted to be descriptive. This indicates a discrepancy between the intended use of the term “reasonable” and the understanding lay jurors might bring into the courtroom.

13 - Corpus Linguistics and Armchair Jurisprudence
from Part II - Introductions
- By Thomas R. Lee, Stephen C. Mouritsen
Edited by Kevin Tobia, Georgetown University, Washington DC
Book:

The Cambridge Handbook of Experimental Jurisprudence

Published online:

17 May 2025

Print publication:

05 June 2025, pp 201-216
- Chapter
Summary

The law and corpus linguistics movement shares many of the commitments of experimental jurisprudence. Both are concerned with testing intuitions about legal concepts through the lens of empirical evidence gathered through experimentation. Though often discussed in the context of a given case or legal problem, linguistic evidence from legal corpora can help provide content to otherwise indeterminate concepts in the law.
Using language evidence from linguistic corpora, we can begin to have more meaningful conversations about what concepts like ordinary meaning, ambiguity, and speech community might actually mean and make progress on the boundaries of these concepts and their implications for legal interpretation. And, because corpora are constructed from linguistic utterances made in natural linguistic settings, they can provide an important check and means of triangulation for experimental jurisprudence claims that are often premised on survey data.
