
8 - Effective Human–AI Collaborative Intelligence

Published online by Cambridge University Press: 19 September 2025

Dan Wu, Wuhan University, China
Shaobo Liang, Wuhan University, China

Summary

In today’s data-driven world, the demand for advanced intelligent systems to automate and enhance complex tasks is growing. However, developing effective artificial intelligence (AI) often depends on extensive, high-quality training data, which can be costly and time-consuming to obtain. This chapter highlights the potential of human–AI collaboration by integrating human expertise into machine learning workflows to address data limitations and enhance model performance. We explore foundational concepts such as Human-in-the-Loop systems, Active Learning, Crowdsourcing, and Interactive Machine Learning, outlining their interconnections as key paradigms. Through practical applications in diverse domains such as healthcare, finance, and agriculture, along with real-world case studies in education and law, we demonstrate how strategically incorporating human expertise into machine learning workflows can significantly enhance AI performance. From an information science perspective, this chapter emphasizes the powerful human–AI partnership that can drive the next generation of AI systems, enabling continuous learning from human experts and advancing capability and performance.

Information

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2025

8 Effective Human–AI Collaborative Intelligence

8.1 Introduction

8.1.1 The Data Challenge in AI

Artificial Intelligence (AI) systems, especially those driven by machine learning (ML), thrive on the availability of high-quality data. Data serves as the cornerstone for model training, evaluation, and, ultimately, the effectiveness of AI in real-world applications. However, obtaining such high-quality data remains a significant challenge. While iterative optimization of model architectures can effectively enhance performance, the benefits derived solely from structural improvements are gradually diminishing. More attention needs to be directed toward the quality, availability, and management of data, which is vital for sustained AI advancement.

The data-for-AI pipeline, which encompasses data acquisition, integration, cleaning, and annotation, is often a bottleneck in AI development. Without systematic approaches to improving data quality, inaccuracies and inconsistencies can emerge, leading to unreliable models and increased costs. As defined by the ISO/IEC 25012 standard, high-quality data is characterized by “Accuracy,” “Completeness,” “Consistency,” “Credibility,” and “Currentness” (Gualo et al., 2021). Achieving these qualities is particularly challenging due to the dynamic nature of data sources, especially in real-time systems, where data continuously evolves (Rahm & Do, 2000; Reddy, 2023). In this context, managing and maintaining data quality across diverse, dynamic sources has become a critical aspect of AI system development (Madnick et al., 2009; Pipino et al., 2002).

Beyond technical considerations, fairness and bias in data represent another significant challenge for AI (Mehrabi et al., 2022). As AI systems become more integrated into critical sectors such as healthcare and finance, biased training data can lead to harmful consequences. For instance, computer vision models for diagnosing malignant skin lesions have been shown to perform significantly worse on dark skin compared to light skin, with an AUROC (Area Under the Receiver Operating Characteristic curve) score that dropped by 10–15 percent (Liang et al., 2022). This gap resulted from biased data and annotation errors. Addressing these issues by improving annotations and ensuring diverse, representative datasets effectively mitigated the performance disparity, highlighting the importance of data diversity and fairness in AI.

Additionally, AI systems face pressing ethical concerns regarding data privacy. Techniques such as model inversion (Fredrikson et al., 2015) and membership inference attacks (Hu et al., 2022) have shown that even anonymized data can reveal sensitive information about individuals. The aggregation of data from multiple sources, particularly in international contexts where privacy regulations vary, further amplifies these risks (Casalini et al., 2021). Strong data governance is essential to ensure privacy protection, particularly in sectors like finance and e-commerce where the exposure of personal data could have serious ramifications (Panian, 2010).

As these challenges illustrate, the complexities surrounding data quality are only part of the larger issue in AI development. Equally important are the significant costs and time constraints involved in acquiring and preparing the necessary data, which further complicates the efficient deployment of AI systems. The process of data labeling, for instance, can range from simple tasks – such as categorizing social media posts as positive, negative, or neutral – to more complex activities, like designing graphical interfaces for drawing bounding boxes around objects in video footage. These more advanced tasks often require substantial engineering resources, further driving up costs (Monarch, 2021).

Human error is another critical factor that complicates data acquisition. In some cases, such as analyzing broad consumer trends, errors may have a minor impact. However, in high-stakes applications like autonomous vehicles, even small mistakes in data labeling can lead to catastrophic outcomes, such as the failure to detect pedestrians. While some machine learning algorithms can tolerate minor noise, significant human errors often introduce biases that are difficult to rectify later in the training process, emphasizing the importance of high-quality training data (Monarch, 2021).

Data preparation itself is a resource-intensive task. Surveys reveal that data scientists spend approximately 80 percent of their time on tasks like data collection, cleaning, and labeling – far more than on model development (Press, 2022). Annotation, the process of adding labels to raw data to make it suitable for training machine learning models, often consumes more time and effort than model-building (Chai & Li, 2020; Monarch, 2021). Furthermore, enterprises continue to face data-related challenges, with 96 percent of organizations reporting difficulties in handling data (Dimensional Research, 2019), and 40 percent of them expressing concerns over data quality (Forrester Consulting, 2020).

In summary, data is a fundamental component of AI systems, particularly those driven by machine learning, serving as the foundation for training, evaluation, and real-world performance. However, acquiring and maintaining high-quality data presents significant challenges. Issues such as data quality, fairness, and privacy, along with the labor-intensive processes of data acquisition, annotation, and preparation, complicate the AI development pipeline. Human errors, biases in data, and ethical concerns regarding data privacy further exacerbate these challenges. Ultimately, overcoming these obstacles is critical for the successful deployment of AI systems in practical, high-stakes environments.

8.1.2 Leveraging Human Expertise in AI: Strategies for Effective Collaboration

Given these challenges, relying solely on automated machine learning methods may no longer be sufficient to address the complexities of real-world scenarios. To further enhance AI system performance and tackle data-related issues, leveraging human expertise and skills, particularly through human–AI collaboration at critical stages, has become essential for the development of efficient intelligent systems. By integrating effective human–AI collaborative intelligence techniques (Aldoseri et al., 2023; Chai & Li, 2020; X. Wu et al., 2022), we can improve AI’s ability to learn from diverse, high-quality data, enabling it to adapt to complex environments and address challenges related to data quality, fairness, and privacy. This approach paves the way for building systems that are not only high-performing but also fair, reliable, and ethically sound, accelerating AI’s advancement in practical applications.

In the context of AI development, integrating human expertise into machine learning frameworks allows prior knowledge to be embedded into the learning process, enhancing the model’s ability to make informed decisions even when data is limited (Diligenti et al., 2017). Table 8.1 outlines the key stages where human expertise can be effectively integrated into the machine learning workflow (Dimensional Research, 2019):

Table 8.1 Key stages for integrating human expertise in the machine learning workflow

Data Extraction: Converting unstructured data into structured forms using human-provided rules or machine learning models.
Data Integration: Merging structured data from various sources, with human intervention resolving complex cases such as deduplication or schema alignment.
Data Cleaning: Identifying and correcting errors in datasets, with human involvement handling complex issues like missing values or duplicate data.
Data Annotation and Iterative Labeling: Iteratively labeling data, focusing human effort on critical samples to reduce annotation costs.
Model Training and Inference: Combining human insights with machine learning techniques, such as deep learning or hybrid methods, to improve model performance.

In these key stages, a series of strategies can be employed to effectively integrate human knowledge and achieve an efficient combination of human intelligence with machine learning (Monarch, 2021). The strategies outlined by Chai and Li (2020) emphasize different approaches to improving machine learning systems. Quality improvement relies on human input to fine-tune outcomes and adjust task distribution for optimal results. Cost reduction is achieved through crowdsourcing and efficient management of human resources, minimizing overall expenses. Latency reduction tackles the slower response times associated with human involvement by implementing efficient machine learning models. Active learning directs human annotation efforts toward the most informative samples, maximizing model performance within budget constraints. Lastly, weak supervision allows for the semi-automatic generation of a large volume of useful labels, ensuring high accuracy while maintaining cost-effectiveness. These diverse strategies together enhance machine learning efficiency across multiple dimensions.
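To make the weak supervision strategy concrete, the sketch below shows the core idea in Python: a few hand-written labeling functions vote on unlabeled text, and a majority vote yields provisional labels at scale. The heuristics, label constants, and example data are illustrative inventions, not drawn from the chapter's sources.

```python
# Weak supervision sketch: heuristic labeling functions vote on unlabeled
# text to generate provisional labels cheaply. Rules are illustrative.
from collections import Counter

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_mentions_great(text):        # keyword heuristic
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_mentions_terrible(text):     # keyword heuristic
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN

def lf_many_exclamations(text):     # weak stylistic signal
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_great, lf_mentions_terrible, lf_many_exclamations]

def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN              # no heuristic fired; leave unlabeled
    return Counter(votes).most_common(1)[0][0]

texts = ["Great service, great prices!!", "Terrible wait times.", "It was okay."]
print([weak_label(t) for t in texts])   # -> [1, 0, -1]
```

Production systems in this vein typically go further, modeling each labeling function's accuracy and correlations rather than weighting all votes equally.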

This chapter will delve into the technical foundations and practical applications of human–AI collaboration, providing a comprehensive understanding of how these systems are built, refined, and applied across various domains. We begin by exploring the core technical frameworks that support effective collaboration, including human-in-the-loop systems, active learning, crowdsourcing, and interactive machine learning, with a focus on their definitions, principles, quality control mechanisms, and cost considerations. This is followed by an in-depth examination of real-world applications and case studies, illustrating how these collaborative models are transforming industries. Finally, we turn to future directions, highlighting continuous learning from human experts, advancements in human–AI interfaces, the integration of large language models, and the ethical implications and societal impacts of these evolving technologies.

8.2 Technical Foundations of Human–AI Collaboration

8.2.1 Human-in-the-Loop (HITL) Systems

Definition and Principles

Human-in-the-Loop (HITL) systems represent a powerful framework that allows for the seamless collaboration between human intelligence and machine learning processes. The essence of HITL lies not just in improving machine learning performance, but also in enhancing human efficiency by enabling dynamic, real-time interaction with AI models throughout the learning process. By embedding human expertise, intuition, and contextual knowledge into various stages of data processing, HITL systems unlock the potential for a more symbiotic relationship between human oversight and automation (Monarch, 2021).

At their core, HITL systems engage humans in tasks like data preprocessing, annotation, and interactive labeling, where human insight is crucial for optimizing the quality and context of the data that machines learn from. This direct involvement ensures that the strengths of human cognition – such as pattern recognition, domain expertise, and creative problem-solving – complement the raw processing power of machine learning algorithms (X. Wu et al., 2022).

In domains like Natural Language Processing (NLP) and Computer Vision (CV), where training data often requires nuanced interpretation, the HITL paradigm can significantly enhance both the accuracy and efficiency of AI models. By iteratively incorporating human feedback into the learning cycle, these systems mitigate the limitations posed by sparse, ambiguous, or noisy data, fostering the development of AI that is not only technically proficient but also adaptable to complex real-world scenarios (Mosqueira-Rey et al., 2023). The ultimate goal of HITL is to integrate the strengths of both humans and machines, leveraging the precision of machines and the adaptability of humans to accelerate AI innovation and improve decision-making across various domains.

The foundational principles of Human-in-the-Loop Machine Learning (HITL-ML) are as follows (Monarch, 2021):

  • Improving Model Accuracy: One of the core objectives of HITL-ML is to enhance the accuracy of machine learning models by incorporating human feedback during key stages of the learning process. Human insight, particularly in tasks like annotation and validation, adds critical value to data quality, thereby refining model predictions.

  • Accelerating Target Accuracy: HITL systems aim to expedite the process of reaching desired accuracy levels for machine learning models. By involving humans in the iterative learning loop, the model can achieve optimal performance more quickly through targeted corrections and guidance.

  • Maximizing Combined Intelligence: HITL synergistically combines human intuition and machine precision to maximize accuracy. By leveraging the strengths of both human decision-making and automated learning, these systems can overcome limitations inherent in either component alone.

  • Enhancing Human Efficiency with AI Assistance: Another fundamental principle of HITL is to increase human efficiency by automating routine and labor-intensive tasks. Machine learning algorithms assist in handling large-scale data and repetitive tasks, freeing human experts to focus on higher-level decision-making and strategic oversight (a toy triage sketch follows this list).
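As a toy illustration of the efficiency principle, the following Python sketch routes confident model predictions straight through and reserves human attention for uncertain cases. The confidence threshold and data are invented; this is a minimal sketch, not a prescribed HITL design.

```python
# HITL triage sketch: auto-accept confident predictions, route uncertain
# ones to a human reviewer. Threshold and example data are illustrative.
def triage(predictions, threshold=0.90):
    """predictions: list of (item_id, label, confidence) tuples."""
    auto_accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((item_id, label))
        else:
            needs_review.append((item_id, label, confidence))  # human decides
    return auto_accepted, needs_review

preds = [("a", "spam", 0.99), ("b", "ham", 0.62), ("c", "spam", 0.95)]
auto, review = triage(preds)
print(auto)    # -> [('a', 'spam'), ('c', 'spam')]
print(review)  # -> [('b', 'ham', 0.62)]
```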

Benefits and Limitations

As with any system, HITL systems come with their inherent advantages and challenges.

The benefits of HITL systems include:

  1. (1) Improved Accuracy in Sparse Data Contexts: One of the most significant benefits of HITL systems is their ability to excel in situations where data is scarce. By integrating human a priori knowledge, especially in data-limited fields such as clinical diagnosis, HITL systems can enhance model accuracy. In cases where high-quality labeled data is not readily available, human expertise fills the gap, ensuring that models perform optimally despite data constraints (X. Wu et al., 2022).

  2. (2) Enhanced Model Performance through Minimal Feedback: Another key benefit of HITL is its ability to improve model performance with minimal human input. Cognitive science research shows that even small datasets of human feedback can lead to substantial improvements in machine learning outcomes. HITL systems leverage this principle, allowing better training outcomes even with limited human engagement, thus increasing efficiency (X. Wu et al., 2022).

  3. (3) Increased Interpretability and Usability: Traditional machine learning models can often be difficult to interpret, leading to hesitancy in their adoption. HITL systems address this issue by incorporating human input into the learning process, which not only improves the interpretability of the results but also makes the models more usable in real-world applications. This human–AI interaction builds trust and allows practitioners to better understand and utilize AI-generated insights.

  4. (4) Tailored Learning through Human Preferences: HITL systems offer the unique advantage of aligning AI models more closely with human preferences. By incorporating datasets reflecting human preferences, such as preferred summaries, HITL systems employ supervised learning to develop reward models that predict outcomes aligned with human choices. Reinforcement learning can then be applied to maximize these predictions, ensuring models meet user expectations (Ziegler et al., 2020); a toy preference-learning sketch follows this list.
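To give the preference-based idea some shape, the sketch below fits a linear reward model on synthetic pairwise preferences with a Bradley–Terry style objective, the loss family behind this line of work. The features, learning rate, and simulated preferences are illustrative simplifications, not the cited system.

```python
# Minimal reward-model sketch: learn r(x) = w . x from pairwise human
# preferences by gradient descent on -log sigmoid(r(x+) - r(x-)).
# All data here is synthetic; a real system scores model outputs.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
w = np.zeros(dim)                        # reward model parameters

def grad_step(x_preferred, x_rejected, lr=0.1):
    """One step on the Bradley-Terry preference loss."""
    global w
    margin = w @ x_preferred - w @ x_rejected
    p = 1.0 / (1.0 + np.exp(-margin))    # P(model agrees with the human)
    w += lr * (1.0 - p) * (x_preferred - x_rejected)

w_true = rng.normal(size=dim)            # hidden "human taste" direction
for _ in range(2000):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    if w_true @ a >= w_true @ b:         # simulate the human's choice
        grad_step(a, b)
    else:
        grad_step(b, a)

cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine(learned w, true w) = {cos:.3f}")   # approaches 1.0
```

A reinforcement learning step would then optimize a policy against this learned reward, which is beyond the scope of the sketch.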

While HITL systems offer numerous advantages, several challenges remain. These potential constraints highlight areas where further research and refinement are necessary.

  1. (1) One of the key challenges in HITL systems is how to efficiently and effectively embed human expertise and knowledge into machine learning models. Current approaches often rely heavily on manual data annotation (X. Wu et al., 2022), but this limits the system’s ability to learn from human experience on a broader scale. Additionally, maintaining system robustness while integrating human input is a critical issue.

  2. (2) Another significant limitation is the absence of standardized evaluation benchmarks for HITL systems (X. Wu et al., 2022). Determining which key samples to use and how to construct appropriate evaluation metrics remains unclear. This becomes particularly important in scenarios such as active learning, where learners request human labeling. Ensuring the questions are presented in a clear and understandable manner is crucial, making the usability features – such as clarity, consistency, and efficiency – highly relevant in these systems (Mosqueira-Rey et al., 2023).

  3. (3) As HITL systems increasingly involve human input, ethical considerations such as fairness, transparency, and interpretability become more pressing. While HITL aims to make machine learning more understandable and reliable, integrating these human aspects into AI systems requires careful thought and ongoing development to ensure equitable and transparent outcomes (Mosqueira-Rey et al., 2023).

8.2.2 Active Learning

Definition and Principles

Active Learning (AL) is a machine learning approach where the learner plays an active role in selecting data for human annotation (Monarch, 2021). The learner requests an oracle, typically a human expert, to label ambiguous or informative examples that will contribute significantly to improving the model (Mosqueira-Rey et al., 2023). This method is designed to address the challenge of the labeling bottleneck by iteratively presenting unlabeled instances to be annotated by the oracle (Burr, 2009; Schröder & Niekler, 2020). Active Learning aims to reduce the amount of data that requires human labeling while maintaining or even enhancing the overall model performance (Schröder & Niekler, 2020). Through selective data sampling, AL allows for more efficient learning, leveraging both labeled and unlabeled data in the process (P. Liu et al., 2022; Monarch, 2021).

Active learning has already been widely applied in various fields, including collaborative filtering recommender systems, supervised remote sensing image classification, text classification using deep neural networks, and medical image analysis (P. Liu et al., 2022).

The following core principles guide the operation of Active Learning systems:

  • Learner Control Over Data: In Active Learning, the learner actively controls which data to query. The learner can request a knowledgeable entity, often a human expert, to annotate selected unlabeled examples, thereby focusing on the most informative data (Mosqueira-Rey et al., 2023).

  • Iterative and Incremental Annotation Process: Active Learning follows an iterative, cyclic process in which the learner employs a query strategy to request labels for specific examples. These labeled examples are then used to refine and improve the model incrementally (Mosqueira-Rey et al., 2023).

  • Combination of Labeled and Unlabeled Data: AL effectively operates as a form of semi-supervised learning, combining both labeled and unlabeled data. This approach is particularly valuable when labeled data is costly or scarce, enabling efficient use of available resources (Burr, 2009).

  • Query Strategy for Effective Sampling: A critical aspect of AL is the query strategy (X. Wu et al., 2022), which determines which examples the learner requests for labeling. Common strategies include:

    1. (a) Membership Query Synthesis: The learner may generate and request labels for any instance, even synthetic ones.

    2. (b) Stream-based Selective Sampling: The learner evaluates instances one at a time, choosing whether to query or discard each.

    3. (c) Pool-based Sampling: The learner evaluates a pool of data to select the most informative examples for labeling.

  • Focus on Informative Instances: The ultimate goal of Active Learning is to select the most informative or uncertain examples for annotation. This approach maximizes the efficiency of the labeling process, leading to faster model improvement and reduced reliance on large labeled datasets (a minimal pool-based loop is sketched after this list).
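To tie these principles together, here is a minimal pool-based loop using scikit-learn and least-confidence uncertainty sampling on synthetic data. The seed-set size, query budget, and classifier are arbitrary choices for illustration, and the human oracle is simulated by the known labels.

```python
# Pool-based active learning sketch: train, score the unlabeled pool by
# uncertainty, query the most uncertain point, repeat. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

# Stratified seed set so both classes are present from the start.
idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
labeled = list(rng.choice(idx0, 5, replace=False)) + \
          list(rng.choice(idx1, 5, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                            # annotation budget: 20 queries
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)      # least-confidence score
    query = pool.pop(int(np.argmax(uncertainty)))
    labeled.append(query)                      # human oracle supplies y[query]

model.fit(X[labeled], y[labeled])
print(f"labels used: {len(labeled)}, accuracy: {model.score(X, y):.3f}")
```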

Quality Control and Cost Consideration

Active learning’s quality control is centered on selecting the most valuable samples for labeling, assuming that human experts provide accurate annotations. The design of the selection process, whether model-driven or data-driven, plays a critical role. Key factors influencing quality include (P. Liu et al., 2022):

  1. (1) Data: Representativeness, diversity, and other characteristics influence the quality of samples. Sample distribution is an inherent property of the data, and there are at least three primary methods to leverage it: representativeness, diversity, and core-set selection. These methods connect sample selection to the distribution from different angles; even when the distribution is well understood, a metric is still required to make the actual selections.

    1. (a) Representativeness: Selecting samples that effectively represent the overall data distribution, ensuring that the chosen samples reflect the key characteristics of the entire dataset.

    2. (b) Diversity: Selecting samples that exhibit a broad range of variability within the sample space, capturing more distinctions and reducing model bias.

    3. (c) Core-set: Identifying a set of influential samples that can maximally summarize the information contained in other samples, thus simplifying the learning process.

  2. (2) Model Attributes: Metrics such as gradient length and Fisher information are used to assess how much influence unlabeled data can have on the model (Chaudhuri et al., 2015; Sourati et al., 2017).

  3. (3) Metrics: These include adversarial metrics and uncertainty measures that guide sample selection (illustrative implementations follow this list).

    1. (a) Best-versus-Second Best (BvSB): Compares the probabilities of the top two predictions to identify the most uncertain samples (Joshi et al., 2009).

    2. (b) Multiple Peak Entropy (MPE): Measures the entropy in the model’s predictions, selecting samples that reflect multiple uncertainty modes (B. Liu & Ferrari, 2017).

    3. (c) Query by Committee (QBC): Uses a committee of models to assess disagreement among them, selecting samples where the models exhibit the most uncertainty (Kee et al., 2018).
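The uncertainty metrics above can be computed directly from a model's predicted class probabilities, or, for QBC, from the votes of a committee of models. The NumPy sketch below gives one plausible implementation of each; details such as the epsilon smoothing are our own simplifications.

```python
# Illustrative implementations of BvSB, predictive entropy, and QBC
# vote entropy, computed from probabilities / committee votes.
import numpy as np

def bvsb_margin(proba):
    """Best-versus-Second-Best: gap between the top two class
    probabilities; small margins flag the most uncertain samples."""
    s = np.sort(proba, axis=1)
    return s[:, -1] - s[:, -2]

def predictive_entropy(proba, eps=1e-12):
    """Entropy of the predictive distribution; high values flag uncertainty."""
    return -np.sum(proba * np.log(proba + eps), axis=1)

def qbc_vote_entropy(committee_preds, n_classes, eps=1e-12):
    """Query-by-Committee: entropy of the per-sample vote distribution
    across committee members; high values flag strong disagreement."""
    entropy = np.zeros(committee_preds.shape[1])
    for c in range(n_classes):
        freq = np.clip((committee_preds == c).mean(axis=0), eps, 1.0)
        entropy -= freq * np.log(freq)
    return entropy

proba = np.array([[0.50, 0.45, 0.05],    # ambiguous sample
                  [0.90, 0.05, 0.05]])   # confident sample
print(bvsb_margin(proba))                # -> [0.05 0.85]
print(predictive_entropy(proba))
votes = np.array([[0, 1, 0],             # 3 committee members x 3 samples
                  [0, 1, 1],
                  [0, 0, 1]])
print(qbc_vote_entropy(votes, n_classes=2))  # sample 0: full agreement -> ~0
```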

Cost considerations are an essential part of active learning. While annotating data is typically expensive, active learning helps mitigate these costs by minimizing the number of labeled samples required for effective learning. Some costs that may need to be considered in research and practice include:

  • Decision Cost: In active learning, decision cost considers both the expense of labeling data and the potential cost of future misclassification. This requires balancing the cost of annotation with the risk of errors if the instance is used in training. For example, researchers suggest summing labeling costs, assumed to be proportional to instance length, with expected misclassification costs (Kapoor et al., 2007); a toy version of this trade-off is sketched after this list. While practical, this approach requires both costs to be expressed in the same currency, which can be challenging for some applications.

  • Model Training Costs: In active learning, sample selection can be computationally expensive, as each iteration typically requires re-evaluating the informativeness of every data point. This iterative process can be time-consuming.

  • Expert Availability: In active learning, expert availability is a critical cost, particularly in specialized fields like medical image analysis, where labeling requires a high level of expertise (Budd et al., 2021). The limited availability of qualified experts can slow down the annotation process and increase costs.

  • Annotation Time: In active learning, annotation time is another key cost, as the time spent labeling data can significantly impact the overall efficiency of the process. The design of the user interface and the size of the labeled set also play crucial roles in managing this cost. For example, researchers used active learning for morpheme annotation in rare language documentation, employing two human oracles to reduce both annotation time and corpus size (Baldridge & Palmer, 2009). Similarly, a study highlights that the interface design can be just as important as the active learning strategy itself in reducing time and cost, with the size of the labeled set directly influencing overall expenses (Druck et al., 2009).
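As promised above, here is a toy rendering of the decision-cost trade-off in the spirit of Kapoor et al.: query the human oracle only when the labeling cost, assumed proportional to instance length, is below the expected misclassification cost. Every constant is invented, and a real system would need both quantities expressed in a common currency.

```python
# Toy decision-cost comparison: label an instance only when annotation
# is cheaper than the expected cost of misclassifying it. Numbers invented.
def labeling_cost(instance_length, cost_per_token=0.02):
    return instance_length * cost_per_token   # proportional to length

def expected_misclassification_cost(p_error, cost_of_error=5.0):
    return p_error * cost_of_error

def should_query(instance_length, p_error):
    """Ask the oracle only when labeling beats the expected risk."""
    return labeling_cost(instance_length) < expected_misclassification_cost(p_error)

print(should_query(instance_length=40, p_error=0.30))   # 0.8 < 1.5 -> True
print(should_query(instance_length=400, p_error=0.05))  # 8.0 < 0.25 -> False
```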

Benefits and Limitations

Active learning offers significant benefits, particularly in terms of cost efficiency and managing scenarios with limited data. By focusing on selecting only the most informative samples, it reduces the number of labeled instances required, leading to significant cost savings while maintaining high-quality annotations (Mosqueira-Rey et al., 2023). This is especially valuable in domains where labeled data is scarce or expensive to obtain, such as rare disease detection, clinical decision-making, or specialized medical image analysis (Budd et al., 2021). For instance, in radiology reports (Hoi et al., 2006; Nguyen & Patrick, 2014) or biological research (Luo et al., 2005), active learning enables models to achieve high accuracy with fewer labeled examples, minimizing annotation costs. Moreover, active learning is highly effective in addressing data limitations, such as class imbalance or constrained annotation budgets. In such cases, it enhances the performance of pre-trained models (PTMs) and large language models (LLMs) by optimizing sample selection in imbalanced datasets, making it a powerful tool for maximizing efficiency in resource-limited settings (Dor et al., 2020).

However, active learning is not without limitations (Donmez & Carbonell, 2008; Settles, 2011). The assumption that human oracles are infallible and always available may not hold in practice (Mosqueira-Rey et al., 2023). Human annotators can experience fatigue, and labeling quality may vary over time. Additionally, querying samples one at a time can be inefficient, leading to batch querying methods, which, while more efficient, may still not fully address the variability in labeling quality. Furthermore, the cost of obtaining new labels is often assumed to be fixed, yet in real-world scenarios, this cost can vary depending on the complexity of the data. Lastly, in situations where data distributions or model classes change, active learning systems must be adaptable enough to incorporate new knowledge and adjust accordingly.

8.2.3 Crowdsourcing

Definition and Principles

Crowdsourcing refers to an online, distributed problem-solving and production model (Brabham, 2008), where tasks are outsourced to a large group of people – often through platforms like Amazon Mechanical Turk (Paolacci et al., 2010). The term was first coined in 2006 by Jeff Howe and Mark Robinson to describe how businesses were leveraging the Internet to “outsource work to the crowd” (Howe, 2006). This approach has since become integral to machine learning, advancing key areas such as data generation, model evaluation, hybrid human–AI systems, and behavioral experiments that enhance our understanding of human interaction with AI technologies (Sheng & Zhang, 2019; Vaughan, 2018). By harnessing the complementary strengths of humans and machines, crowdsourcing enables the expansion of AI capabilities through collaborative intelligence.

Crowdsourcing operates on several key principles:

  1. (1) Collective Intelligence: The concept of the “wisdom of crowds” suggests that large groups of people, when working together, can often solve problems or make judgments more effectively than individual experts (Kameda et al., 2022). This collective intelligence is fundamental to crowdsourcing, where the input from many contributors can lead to more accurate or creative solutions.

  2. (2) Task Submission and Contribution: Crowdsourcing involves two main groups: requesters, who submit tasks to a crowdsourcing platform, and workers, who form the crowd that contributes to solving these tasks. The output is the result of this collective effort, and its quality often depends on the diversity and volume of contributions.

  3. (3) Evaluation and Rewards: Requesters may evaluate the quality of the outputs and provide rewards based on performance. In some cases, quality control is delegated to the crowdsourcing platform, where outputs are evaluated and rewarded automatically. Rewards can vary, ranging from monetary compensation to gifts, reputation badges, or other incentives (Daniel et al., 2019).

These principles ensure that crowdsourcing initiatives, regardless of their specific application, follow a structured process that leverages collective human intelligence to enhance AI systems (Lease, 2011).

Quality Control and Cost Consideration

When implementing crowdsourcing, two primary concerns are cost efficiency and quality control. The challenge lies in ensuring that data labels and prediction models derived from crowdsourced work maintain high quality, especially given the varied skills and motivations of the workers (Daniel et al., 2019; Sheng & Zhang, 2019).

From a quality model perspective, several key factors affect the final output quality (Daniel et al., 2019):

  • Data: The quality of both input (e.g., images, text) and output (e.g., labeled data, translated text) is crucial. High-quality data output is vital for the broader acceptance and effectiveness of crowdsourcing systems.

  • Task Design: How a task is described, structured, and managed significantly impacts worker engagement, output quality, and overall performance. Clear descriptions, user-friendly interfaces, and well-designed incentives (whether intrinsic, like status, or extrinsic, such as financial rewards) contribute to task efficiency. Additionally, terms and conditions, including privacy, intellectual property rights, and ethical standards, influence worker participation and compliance. Effective resource management is also important, ensuring that the quantity of useful work completed aligns with the resources consumed, thus sustaining productivity in crowdsourcing initiatives.

  • People: Both the requesters (who submit tasks and evaluate outputs) and workers (who complete tasks) play critical roles. Characteristics such as fairness, communication, and worker motivation affect the overall success of crowdsourcing tasks. Additionally, group dynamics, such as collaborative teams or the larger crowd, influence task performance and quality.

Table 8.2 outlines various methods used to evaluate outputs in crowdsourcing from a quality assessment perspective.

Table 8.2 Quality assessment methods in crowdsourcing

Individual Assessment: Involves individuals (workers, experts, or requesters) evaluating outputs, such as rating the accuracy of specific data or reviewing completed tasks.
Group Assessment: Uses the input of multiple people to arrive at a collective judgment, as seen in methods like voting or peer review.
Computation-based Assessment: Uses automated processes, such as comparing outputs to a predefined ground truth, to assess quality without human involvement.

To effectively manage crowdsourcing quality control, researchers should focus on several key strategies: incorporating more labeled data, integrating hybrid systems, targeting uncertain or diverse data, and conducting ongoing assessments. Additionally, using on-demand evaluations and creating more benchmarks can help minimize the need for reusing data (Lease, 2011).
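Group and computation-based assessment are often realized as label aggregation. The sketch below shows plain majority voting plus a variant that weights each worker by accuracy on embedded gold questions; the workers, answers, and accuracy figures are fabricated for illustration.

```python
# Crowd label aggregation sketch: majority vote, and a weighted vote that
# trusts workers with better gold-question accuracy. Data is fabricated.
from collections import defaultdict

answers = {                      # worker -> {item: label}
    "w1": {"img1": "cat", "img2": "dog", "img3": "cat"},
    "w2": {"img1": "cat", "img2": "cat", "img3": "cat"},
    "w3": {"img1": "dog", "img2": "dog", "img3": "cat"},
}
gold_accuracy = {"w1": 0.9, "w2": 0.6, "w3": 0.8}   # from embedded gold items

def aggregate(weighted=True):
    scores = defaultdict(lambda: defaultdict(float))
    for worker, labels in answers.items():
        weight = gold_accuracy[worker] if weighted else 1.0
        for item, label in labels.items():
            scores[item][label] += weight
    return {item: max(votes, key=votes.get) for item, votes in scores.items()}

print(aggregate(weighted=False))  # plain majority vote
print(aggregate(weighted=True))   # reliable workers count for more
```

More elaborate estimators, such as expectation-maximization over worker confusion matrices, follow the same pattern of jointly inferring labels and worker quality.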

Benefits and Limitations

Crowdsourcing offers several significant benefits that make it a valuable tool in human–AI collaboration:

  • Lowered Costs and Improved Speed: Crowdsourcing allows for faster and more cost-effective data collection by tapping into a diverse workforce. This flexibility helps accelerate data gathering and reduces the financial burden on organizations (Vaughan, 2018).

  • Improved Quality: By leveraging the collective intelligence of a diverse group of contributors, crowdsourcing can handle tasks that are difficult for computers alone. Human workers bring common sense, practical knowledge, and real-world experience to the table, leading to more nuanced and accurate outputs (Gomes et al., 2011).

  • Increased Flexibility: Crowds can provide dynamic data for training machine learning algorithms. The ability to generate and aggregate potentially noisy labels has spurred research into methods for improving label accuracy, making crowdsourcing an effective tool for training AI models (Zhang et al., 2016).

  • Promoting Diversity: Crowdsourcing encourages input from a broad range of contributors, leading to diverse perspectives and insights, which can enhance the quality and breadth of solutions (Vaughan, 2018).

  • Data Generation and Evaluation: Crowdsourcing platforms are particularly adept at generating large datasets and evaluating machine learning models, for example, evaluating the coherence of topic models and enhancing the interpretability of predictions in supervised learning (Chang et al., 2009; Ribeiro et al., 2016).

  • Hybrid Intelligence Systems: Crowdsourcing supports hybrid intelligence systems that combine human reasoning with machine learning. These systems, drawing on human common sense, subjective beliefs, and life experiences, can outperform traditional AI in specific tasks that require flexible reasoning (Chang et al., 2009; Ribeiro et al., 2016).

Despite these advantages, crowdsourcing comes with notable limitations:

  • Impact on Product Quality: The inclusion of unqualified participants can lead to a significant number of low-quality or unusable contributions (Eapen et al., 2023). To mitigate errors, employers often rely on multiple workers to complete the same task, which increases both time and costs (Ipeirotis et al., 2010).

  • Motivation and Engagement: It can be difficult to maintain consistent engagement and motivation among crowd participants. Varying levels of dedication can result in inconsistent contribution quality, affecting overall outcomes.

  • Cultural and Contextual Differences: Differences in cultural background and context among contributors may lead to misunderstandings or inconsistencies in contributions, posing a challenge for maintaining uniform quality.

  • Information Overload: The large volume of data generated through crowdsourcing can create challenges in filtering, managing, and processing information effectively.

  • Data Security and Privacy Concerns: Crowdsourcing tasks may inadvertently expose sensitive personal information, such as when workers input private data like business card details or share location information through certain tasks, raising concerns over privacy and security (Deng et al., 2013).

8.2.4 Interactive Machine Learning

Definition and Principles

Interactive Machine Learning (IML) refers to a learning paradigm where humans are actively involved in the machine learning process. Unlike classical machine learning (CML), which operates in a passive, offline manner, IML allows users – whether experts or non-experts – to interactively guide, correct, and refine models as they are being trained. This iterative process ensures that the human agent can adjust the learning process to meet specific needs, optimizing the performance and relevance of the models (Fails & Olsen, 2003; Porter et al., 2013; Ware et al., 2001).

IML systems enable humans to perform tasks that are complex for machines, such as refining classification boundaries or providing annotations in image segmentation. The human role can vary, from providing input at the beginning (e.g., initial data labeling) to intervening at the end (e.g., validating machine outputs). This interaction allows for a dynamic collaboration between humans and machines, enabling each to focus on the tasks they perform best (Porter et al., 2013; Ramos et al., 2020).

Several key principles underpin IML systems:

  • Humans in the ML Loop: In IML, humans play an active role in the learning loop, performing tasks that require their expertise, such as labeling or correcting outputs. The inclusion of human agents makes IML particularly valuable in domains where uncertainty and complexity prevail, such as health informatics (Holzinger, 2016).

  • Incremental and Iterative Learning: IML is characterized by an incremental and iterative learning process. Human input is continuously integrated as the model evolves, ensuring that the system adapts to real-time feedback and improves over time (Dudley & Kristensson, 2018); a minimal incremental loop is sketched after this list.

  • Multiple Human Roles: The individuals involved in IML can assume various roles, from machine learning experts and data scientists to crowdsource workers or domain specialists. The diversity of human roles influences both the interaction and the outcomes of the system (Ramos et al., 2020).

  • User-Friendly Interfaces: The design of the user interface is critical in IML systems, as it determines how users interact with the model and influences the overall learning outcome. Well-designed interfaces can empower non-experts to effectively contribute to the model-building process (Q. Yang et al., 2018).
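To make the incremental principle concrete, the sketch below runs an online learner that is updated whenever a (simulated) human corrects one of its predictions. It assumes a recent scikit-learn (where SGDClassifier accepts loss="log_loss"); the data and the simulated oracle are illustrative stand-ins for a real interactive interface.

```python
# IML sketch: the model predicts each incoming instance; when the human
# corrects a mistake, the correction is folded in immediately.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
model = SGDClassifier(loss="log_loss", random_state=1)

model.partial_fit(X[:20], y[:20], classes=np.unique(y))  # bootstrap seed set

corrections = 0
for i in range(20, len(X)):
    pred = model.predict(X[i:i + 1])[0]
    if pred != y[i]:                     # human spots and corrects the error
        corrections += 1
        model.partial_fit(X[i:i + 1], y[i:i + 1])   # immediate online update
print(f"human corrections: {corrections} of {len(X) - 20} interactions")
```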

In IML, quality control and cost considerations go beyond traditional metrics like accuracy. IML involves subjective evaluations that account for factors such as cost, confidence, and task complexity. Unlike classical approaches, IML adopts a human-centered perspective, placing emphasis on the model’s utility and effectiveness for end-users.

Quality Control and Cost Consideration

Quality control in IML integrates both algorithm-centered evaluations, which are typical in Active Learning, and human-centered assessments (Fiebrink et al., 2011). This dual approach helps to address the “black-box” nature of machine learning algorithms, offering deeper insights into the model’s performance and making it more interpretable and accessible to users (Boukhelifa et al., 2018). Ultimately, this ensures that IML systems are not only accurate but also practical, efficient, and user-friendly.

The cost considerations in IML include the time and resources required for human involvement. Since humans are actively engaged in the iterative learning process – either by providing annotations or refining models – the cost of human labor becomes a significant factor. However, by focusing on tasks where human input is essential, IML can optimize resource usage while still producing high-quality outcomes that meet the needs of its users.

Benefits and Limitations

IML offers several key benefits, contributing to its robustness, trustability, and lower resource demands (Wondimu et al., 2022):

  • Robustness: IML enhances the robustness of models by allowing users to understand how variations in input data affect predictions. This transparency helps in creating more resilient models that can perform reliably across different scenarios and datasets.

  • Trustability: By providing clear explanations for the model’s decisions, IML builds trust among users and stakeholders. When people can comprehend and verify the reasoning behind predictions, they are more likely to accept and rely on the model’s outcomes.

  • Low Resource Requirements: IML often requires fewer computational resources than traditional machine learning models. This makes it suitable for low-resource environments, enabling the deployment of interpretable models in diverse applications, especially where computational power is limited.

Despite its benefits, IML also presents certain limitations (Mosqueira-Rey et al., 2023):

  • Increased Development Complexity: IML combines machine learning with human–computer interaction, which significantly increases the complexity of developing such systems. Each IML application needs to be carefully designed and studied, complicating the overall development process.

  • Dependency on Human Expertise: While IML reduces computational demands and enhances learning efficiency, it relies heavily on human expertise. The system’s effectiveness can be limited by the availability and attention of human experts, as well as the varying levels of expertise, which can introduce inconsistencies and impact performance.

8.2.5 Interconnections as High-Level Paradigms

Human-in-the-Loop (HITL) serves as the overarching framework that connects various methods of human–AI collaboration, emphasizing the essential role of human involvement in machine learning processes. HITL, along with Interactive Machine Learning (IML), Active Learning (AL), and Crowdsourcing, forms a set of high-level paradigms that offer complementary approaches to integrating human intelligence with AI systems. Each method contributes to refining the learning process by leveraging different aspects of human engagement, as illustrated in Figure 8.1.

  1. (1) Active Learning (AL): AL focuses on the strategic selection of data points that are most valuable for annotation. By asking humans to label only the most uncertain or informative data, AL reduces the number of required annotations, making the training process more efficient. As shown in Figure 8.1, AL is foundational to IML by enabling a more streamlined human–AI interaction, where the machine retains control while using humans as oracles for data labeling.

  2. (2) Crowdsourcing: Crowdsourcing leverages the collective effort of nonexperts to gather labeled data at lower costs. This approach complements AL by reducing the cost of annotation while utilizing a large pool of contributors. However, as indicated in Figure 8.1, crowdsourcing often requires robust quality control due to the variable skill levels of contributors. The interplay between AL and crowdsourcing is crucial, as AL helps minimize the number of annotations, while crowdsourcing reduces the per-annotation cost, thereby substantially lowering the overall cost of creating training datasets (Mosqueira-Rey et al., 2023).

  3. (3) Humans in the HITL Workflow: As illustrated in Figure 8.1, humans play different roles and responsibilities throughout the various stages of the machine learning process. From providing high-quality annotations to validating, cleaning, and correcting results, humans ensure that the model continually improves. These diverse roles – annotators, validators, strategists – are critical in the HITL paradigm. This dynamic feedback loop enables humans to interact in real-time, enhancing both AL and IML processes.

  4. (4) Interactive Machine Learning (IML): IML benefits from the synergy between AL and crowdsourcing, but it differs in its approach to human interaction. In IML, humans and machines share control, with humans performing tasks beyond simple labeling, such as refining classification boundaries or providing strategic input. As Figure 8.1 shows, IML emphasizes the continuous exchange of tasks between humans and machines, with humans taking on roles in data annotation, knowledge transfer, strategy design, and model evaluation. IML systems also incorporate Human–Computer Interaction (HCI) techniques, ensuring that the user experience is optimized for nonexperts and experts alike (Mosqueira-Rey et al., 2023).

  5. (5) Dynamic Feedback and Task Segmentation: Figure 8.1 also demonstrates how the HITL workflow facilitates real-time feedback and task segmentation between humans and AI systems. Humans handle tasks that are complex for machines, such as data annotation and model validation, while AI focuses on tasks like automated data selection, model optimization, and decision-making. This dynamic partnership between human input and machine learning drives continual model improvement and ensures a balance between efficiency and effectiveness in the learning process.

The interconnections between these paradigms are crucial for achieving a more robust, efficient, and cost-effective approach to machine learning:

  • Active Learning and Crowdsourcing: Active learning minimizes annotation needs by strategically selecting samples, while crowdsourcing lowers annotation costs by distributing tasks to nonexperts. Together, they substantially reduce the costs and efforts involved in building large training datasets (Mosqueira-Rey et al., 2023).

  • Active Learning and Interactive Machine Learning: Active learning serves as the foundation for interactive machine learning, but the difference lies in how control is distributed. In active learning, the model controls the learning process and uses the human as an oracle for specific tasks, while interactive machine learning involves a closer, interactive relationship between the human and the learning system, with control shared between them. Additionally, interactive machine learning incorporates HCI techniques, enhancing user interaction in more flexible and less structured environments (Mosqueira-Rey et al., 2023).

In summary, these paradigms – human-in-the-loop, active learning, interactive machine learning, and crowdsourcing – form a cohesive framework for integrating human and AI intelligence. Human-in-the-loop serves as the foundation, with active learning, interactive machine learning, and crowdsourcing contributing complementary techniques for improving learning outcomes, enhancing model robustness, and reducing costs. The continuous interaction between humans and machines ensures that these systems remain adaptive, effective, and responsive to real-world challenges.


Figure 8.1 Overview of technical concepts in effective human–AI collaborative intelligence

Figure 8.1 Long description

Top. The human-in-the-loop system, via learning with humans, leads to a complementary annotation method. It integrates active learning and crowdsourcing strategies. AL forms the foundation of IML. These two components are closely connected, working together to validate, clean, and correct results, and to provide high-quality annotation and oversight. Crowdsourcing supports IML by assigning tasks that are easy for humans but complex for machines, thereby enhancing the system’s overall effectiveness. Bottom. The HITL workflow follows a cyclical process encompassing data input, model training, and model output, with continuous interaction between human and AI. Human roles in the loop include data annotation, knowledge transfer, strategy design, feedback provision, and evaluation and verification. AI takes charge of data processing and analysis, automated data selection, model training and optimization, and task completion and decision-making.

8.3 Practical Applications and Case Studies

8.3.1 Practical Applications across Various Domains

Human–AI collaboration has shown transformative potential in numerous fields by combining advanced AI capabilities with human expertise. These efforts have significantly enhanced efficiency, accuracy, and decision-making across various domains.

Healthcare

In the healthcare domain, human–AI collaborative intelligence has become a pivotal force in advancing medical practices, improving patient outcomes, and enhancing overall system efficiency. A prominent example is medical imaging, which plays a critical role in clinical decision-making across multiple areas, including detection, diagnosis, and treatment planning (Tajbakhsh et al., 2020). While traditional methods for analyzing medical images, such as manual segmentation, are often time-consuming and labor-intensive, human–AI collaboration offers new avenues for improvement. One such approach is interactive segmentation, where clinicians can refine AI-generated initial segmentations. For instance, researchers introduced a dual-CNN-based framework that allows user interaction to correct model inaccuracies, significantly reducing the time needed for precise image segmentation in applications like brain tumor and fetal MRI imaging (G. Wang et al., 2019).

Active learning is another crucial application. In cancer pathology, where extracting key details from reports can be tedious, AI systems can assist by flagging challenging cases for human annotators, thereby reducing the manual workload (De Angeli et al., 2021). This hybrid method has shown impressive results, achieving performance similar to traditional methods while requiring less labeled data. Furthermore, in heart disease prediction, models that solicit expert feedback for the most relevant data points can reduce costs and enhance accuracy (El-Hasnony et al., 2022). Human experts also play an essential role in preprocessing medical data, ensuring that the AI system receives high-quality inputs. Such collaboration leads to improved outcomes, as seen in the classification of oral cancer tissues (Folmsbee et al., 2018) and radiology reports (Nguyen & Patrick, 2014).

Maintaining effective human–AI collaboration remains essential in healthcare, where the accuracy and reliability of AI systems must meet the stringent requirements of patient safety (Budd et al., 2021).

Finance

Human–AI collaboration is increasingly transforming the financial domain, combining AI’s computational power with human expertise to enhance efficiency, accuracy, and decision-making. In the lending domain, artificial intelligence is being employed to streamline loan processing and ensure responsible lending decisions. However, machines alone are not sufficient for decision-making. For instance, a human–AI collaborative method was introduced to evaluate multiple independent and conflicting pieces of evidence in loan applications (Sachan et al., 2024). By comparing expert underwriters’ judgments with AI-generated results, the system learns human decision tendencies, optimizing decision quality through continuous human–AI interaction. In international maritime trade, algorithms automatically gather data to generate transaction tables, but human data scientists play a crucial role in reviewing and refining these outputs. Their feedback enhances algorithm performance and adaptability over time (Gronsund & Aanestad, 2020). Moreover, active learning approaches in stock market sentiment analysis allow machines to query human experts for labels, significantly improving the model’s predictive performance (Smailovic et al., 2014). This iterative process ensures that the AI continues to improve through expert guidance.

In summary, human–AI collaboration is essential for addressing complex financial challenges. This evolving synergy will continue to drive innovation and ensure fairness in financial services.

Agriculture

Human–AI collaboration has made significant advancements in the agricultural sector by improving efficiency in tasks such as crop monitoring and pest detection. For instance, in crop yield analysis, AI can detect and analyze cereal crops like wheat and sorghum, focusing on panicle density, a critical factor in evaluating yield. Researchers have introduced an automated system that combines weak and strong labeling methods to reduce annotation time and improve accuracy (Chandra et al., 2020). This approach allows human annotators to collaborate with AI in a more efficient workflow, producing high-quality data with less manual effort. Similarly, distinguishing crops from weeds is another complex task where human expertise complements AI. Scientists have developed a model to identify samples requiring labeling, which are then manually annotated by experts. This method minimizes human workload while maintaining high classification accuracy (Sheikh et al., 2020). In pest classification, research has demonstrated that cleaning redundant public datasets can reduce data size while achieving similar classification results, showcasing the power of human-guided data optimization (J. Yang et al., 2022).
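
The dataset-cleaning idea in the pest study can be approximated with a simple near-duplicate filter: keep a sample only if it is not too similar to one already kept. The cosine threshold and random features below are illustrative assumptions, not the published pipeline.

import numpy as np

def prune_redundant(features, threshold=0.98):
    """Return indices of a de-duplicated subset of `features` (cosine sim)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    kept = []
    for i in range(len(normed)):
        sims = normed[kept] @ normed[i] if kept else np.array([])
        if not kept or sims.max() < threshold:
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 64))
data = np.vstack([base, base + 0.001])  # duplicate half the dataset
print(len(prune_redundant(data)), "of", len(data), "samples retained")

A human expert then reviews only the retained subset, which is where the labor saving comes from.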

In essence, human–AI collaboration significantly advances agricultural practices by improving efficiency and accuracy, minimizing manual workload and leading to more effective and precise agricultural solutions.

Education

Human–AI collaboration has transformed personalized learning and education. Researchers have developed a system in which AI classifies student profiles, enabling human experts to customize learning experiences. This synergy between AI’s data processing capabilities and human expertise facilitates personalized feedback and content adjustments, leading to improved learning outcomes (De Melo et al., 2014). In the realm of scientific reading, human–AI collaboration is critical for helping students comprehend complex documents. Researchers have proposed a collaborative PDF reader powered by Open Educational Resources (OERs), which uses machine learning to analyze reader behaviors, such as highlighting and questioning, and recommends relevant OERs like videos and slides. Readers provide feedback, allowing the system to refine its recommendations based on user interactions (Jiang et al., 2016; X. Liu et al., 2015). For complex mathematical content, the system suggests a Formula Evolution Map that tracks formula development and recommends related OERs (Jiang et al., 2018). Human feedback is crucial, refining the system’s recommendations to ensure greater accuracy and contextual relevance. This collaboration between humans and AI improves understanding of complex concepts, making STEM (Science, Technology, Engineering, and Mathematics) reading more efficient and personalized.
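
As a rough illustration of the feedback loop, the sketch below keeps a per-resource-type weight that each thumbs-up or thumbs-down nudges, and reranks candidate OERs by weighted relevance. The multiplicative update rule is our own simplification, not the cited systems' actual algorithm.

from collections import defaultdict

weights = defaultdict(lambda: 1.0)       # learned preference per OER type

def record_feedback(oer_type, helpful, lr=0.2):
    """Nudge the weight for a resource type after explicit user feedback."""
    weights[oer_type] *= (1 + lr) if helpful else (1 - lr)

def rerank(candidates):
    """candidates: list of (title, oer_type, relevance) tuples."""
    return sorted(candidates, key=lambda c: c[2] * weights[c[1]], reverse=True)

record_feedback("video", helpful=True)    # the reader liked a video
record_feedback("slides", helpful=False)  # but dismissed a slide deck
print(rerank([("Intro lecture", "video", 0.8), ("Deck 3", "slides", 0.9)]))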

Overall, human–AI collaboration in education leverages AI’s strength in data processing and the human ability to provide customized learning experiences. This partnership enhances personalized feedback, improves comprehension of scientific materials, and facilitates understanding of complex concepts through adaptive content recommendations and user interaction.

Misinformation Control and Cybersecurity

Human–AI collaboration has been applied to tackling misinformation and enhancing cybersecurity. For instance, researchers have used semi-supervised and active learning to detect camouflaged Chinese spam content, integrating human annotations with AI insights. This collaboration improves the accuracy of spam detection, making it more adaptable to evolving text camouflage and data imbalance challenges (Jiang et al., 2020). Similarly, a security system has been developed that integrates AI with crowdsourced human feedback to combat online misinformation (Demartini et al., 2020), while a human-centric vulnerability analysis system was created for cybersecurity (Shoshitaishvili et al., 2017).

These examples show how human–AI collaboration addresses complex challenges in information security, improving both the accuracy of threat detection and decision-making processes.

Law

The legal domain has embraced human–AI collaboration for tasks such as case classification and legal consultation. For instance, researchers have proposed a system where human analysts refine AI decision-making processes by adding new factors and adjusting the algorithm, ensuring flexible and informed legal decisions (Odekerken & Bex, 2020). Similarly, another legal model employs Positive-Unlabeled Reinforcement Learning (PURL) to dynamically generate diagnostic legal questions. By integrating AI insights with human expertise, this model provides accurate, context-sensitive legal advice (Y. Wu et al., 2024).

Generally, human–AI collaboration in the legal domain enables more flexible, informed, and context-sensitive decision-making by integrating AI’s analytical capabilities with human expertise, optimizing multiple legal processes.

Expanding Human–AI Collaboration in Other Domains

Human–AI collaboration also extends into other domains. For example, a human–machine interactive image search method was introduced by Kovashka et al. (2015), while human-in-the-loop semi-supervised learning has been applied to the exploration of stochastic gene regulatory network models (Wrede & Hellander, 2019). These efforts demonstrate the adaptability and scalability of human–AI collaboration across diverse industries, ensuring continuous innovation and improvement in performance.

As AI systems evolve, human oversight remains crucial for ensuring transparency, fairness, and efficiency. The integration of human expertise and AI drives innovation, improves decision-making, and enhances accuracy across various applications. These systems foster natural, efficient human–machine interactions, accelerating digital transformation and delivering smarter solutions across diverse industries.

8.3.2 Representative Case Studies

Case 1: Education Domain: Scaffolding with Open Educational Resources (OER)

An exemplary case of effective human–AI collaborative intelligence in the education field is demonstrated through the development of the OER-based Collaborative PDF Reader (OCPR). This system showcases how AI and human intelligence can work together to enhance the comprehension of complex scientific literature for students and junior scholars (Jiang et al., 2016, 2018; X. Liu et al., 2015).

A typical interface of the OCPR system is shown in Figure 8.2. The system integrates advanced text mining and heterogeneous graph mining algorithms to recommend relevant Open Educational Resources (OERs), such as videos, slides, source code, and Wikipedia pages, based on students’ interactions with scientific publications. By leveraging human actions, such as highlighting and asking questions, and, most importantly, feedback on the recommended content, the system dynamically adjusts its recommendations, enabling students to better grasp complex concepts.

Figure 8.2 OER-based collaborative PDF reader system. The highlighted features include the embedding of student or instructor knowledge within the PDF; user-to-user collaboration for annotating information needs; highlighted text that marks important content; system-driven extraction of users’ implicit or explicit information needs; access to further assistance via the “Get Help” drop-down menu; and user–system collaboration through feedback on recommended results.

This human–AI collaboration enables the AI to respond to the emerging information needs of students, while allowing human oversight and intervention to refine the system’s suggestions. The integration of human judgment with AI-driven recommendations creates a learning environment where complex materials are made more accessible, promoting deeper engagement and understanding.

This case highlights the power of human–AI collaboration in educational settings, where the ability of AI to scaffold learning is enhanced by human input, illustrating the future of personalized, intelligent education.

Case 2: Legal Domain: Knowledge-infused Legal Wisdom

One notable example of effective human–AI collaborative intelligence in the legal field is the development of the D3LM (Diagnostic Legal Large Language Model) framework (Y. Wu et al., 2024). This system illustrates how combining human oversight and AI can transform legal case analysis. As shown in Figure 8.3, the core innovation of D3LM lies in its ability to generate lawyer-like diagnostic questions that guide users in formulating their legal queries more accurately. By incorporating human knowledge through targeted questioning, D3LM extracts crucial case details, which significantly improves the AI’s understanding and predictive capabilities.

Figure 8.3 Comparison of legal service methodologies, focusing on traditional LLMs, lawyer consultations, and the D3LM model. The diagram compares how a legal case is handled by a standard LLM, a human lawyer, and the D3LM model. The process begins with an unclear fact description from the client, which initiates a professional dialog involving follow-up questions, such as inquiries about surveillance footage and intoxication. These interactions help construct a structured fact set, which serves as the basis for generating a legally sound court view. The LLM, by contrast, attempts to generate a legal conclusion directly from the vague initial description, without engaging in clarification or fact gathering; as a result, its response is incomplete and overlooks key facts. The human lawyer integrates critical evidence and context into a professional judgment, though at a higher cost. The D3LM model autonomously conducts the dialog, extracts essential facts, and produces a court-view-level decision that is both cost-effective and legally coherent.

Unlike traditional LLMs that passively respond to user inputs, D3LM actively engages with users to gather missing or overlooked information. It utilizes the Positive-Unlabeled Reinforcement Learning (PURL) algorithm to dynamically adapt its questioning strategy, making it an example of Human-in-the-Loop systems where humans enhance AI’s performance through ongoing interaction. This collaboration results in more accurate legal predictions, enriched by human guidance.
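
A heavily simplified way to see why diagnostic questioning helps is expected-entropy reduction: ask the question whose likely answers most sharpen the distribution over outcomes. The probability tables below are invented for illustration; D3LM learns its questioning policy with PURL rather than reading it from a fixed table.

import math

def entropy(dist):
    return -sum(p * math.log(p, 2) for p in dist.values() if p > 0)

# P(outcome | answer) and P(answer), for two candidate questions.
questions = {
    "Was there surveillance footage?": {
        "yes": ({"liable": 0.8, "not_liable": 0.2}, 0.5),
        "no":  ({"liable": 0.2, "not_liable": 0.8}, 0.5),
    },
    "Was the client intoxicated?": {
        "yes": ({"liable": 0.55, "not_liable": 0.45}, 0.5),
        "no":  ({"liable": 0.45, "not_liable": 0.55}, 0.5),
    },
}

def expected_entropy(answer_table):
    return sum(p_ans * entropy(post) for post, p_ans in answer_table.values())

best = min(questions, key=lambda q: expected_entropy(questions[q]))
print("ask first:", best)   # the footage question is far more informative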

This case highlights how human–AI collaboration can empower individuals without legal expertise to receive professional-quality assistance at a fraction of the cost. Moreover, it demonstrates the potential of integrating human knowledge with advanced AI technologies to bridge gaps in domains where precise and detailed understanding is essential.

8.4 Future Directions

8.4.1 Continuous Learning from Human Experts

Model Robustness and Efficiency

The continuous integration of human expertise into AI models offers a promising pathway to enhancing their robustness and efficiency, but it also presents significant challenges. As AI systems undergo iterative training cycles driven by human feedback, issues such as catastrophic forgetting may arise, where models struggle to retain previously learned information when exposed to new data. Addressing this requires strategies like active learning, which ensures models selectively query the most informative data, thereby optimizing learning while maintaining robustness (Bartolo et al., 2020; Budd et al., 2021). Incorporating human rationales into these models can further bolster both performance and stability, reducing the chances of losing acquired knowledge over time (Arous et al., 2021).

Another significant challenge in continuous learning from human experts is the efficiency of data annotation, particularly for complex tasks. Manual annotation is labor-intensive and time-consuming. While innovations like self-annotation frameworks and deep reinforcement active learning have shown potential to streamline the process (Le et al., 2020; Z. Liu et al., 2019), further advancements are needed to minimize the dependency on large-scale labeled data. For instance, some frameworks improve model robustness by integrating active learning with adversarial examples, which significantly reduces the need for annotations while boosting performance (Guo et al., 2021).

Balancing Expertise and Ensuring Feedback Quality

Balancing expertise and feedback quality is another critical factor. While non-experts can contribute valuable insights in some cases, tasks requiring specialized knowledge demand domain experts for accuracy (Fan et al., 2019). Establishing clear guidelines for differentiating tasks suitable for non-experts from those requiring expert input is essential for improving feedback quality. Additionally, variability in annotator expertise can lead to uneven feedback, which may negatively impact model performance (Jwo et al., 2021; Z. Liu et al., 2021). Thus, developing improved methods for evaluating and filtering human feedback is essential, particularly in complex, high-stakes domains where consistent and reliable input is critical (He et al., 2016). For instance, techniques that employ adversarial learning and knowledge graphs have proven effective in tackling these challenges by improving feedback quality and maximizing the utility of human input, even in highly imbalanced datasets (Jiang et al., 2020).
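
One lightweight way to operationalize such filtering is to calibrate each annotator against a small gold-standard set and weight their votes accordingly. The annotators and labels below are invented, and production systems use far richer reliability models, but the aggregation pattern is the same.

from collections import defaultdict

gold = {"item1": "pos", "item2": "neg"}   # items with known true labels
annotations = {
    "item1": {"ann_a": "pos", "ann_b": "pos", "ann_c": "neg"},
    "item2": {"ann_a": "neg", "ann_b": "pos", "ann_c": "neg"},
    "item3": {"ann_a": "pos", "ann_b": "neg", "ann_c": "pos"},
}

# Estimate each annotator's reliability from agreement with the gold items.
hits, seen = defaultdict(int), defaultdict(int)
for item, true_label in gold.items():
    for ann, label in annotations[item].items():
        seen[ann] += 1
        hits[ann] += label == true_label
weight = {ann: hits[ann] / seen[ann] for ann in seen}

# Aggregate an unverified item by reliability-weighted vote.
votes = defaultdict(float)
for ann, label in annotations["item3"].items():
    votes[label] += weight[ann]
print(max(votes, key=votes.get))          # "pos": reliable annotators agree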

Real-World Applications and Generalization

Finally, ensuring that AI systems can generalize effectively across real-world scenarios requires robustness to domain shifts and the handling of out-of-distribution samples. Real-world environments often introduce complexities such as inconsistent data quality and ambiguous tasks (Kreutzer et al., 2021; Ziegler et al., 2020). Research has already begun addressing these challenges, for instance, by using inconsistency-based sample selection to guide human experts in annotating the most uncertain and inconsistent samples, thereby improving model accuracy and reducing labeling costs (Guo et al., 2022). Future research should focus on developing domain-agnostic models that adapt flexibly to diverse scenarios without extensive retraining (Xu et al., 2023; Ziegler et al., 2020), enhancing the scalability and applicability of human–AI collaborative systems across various domains.
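
The inconsistency-based selection idea reduces to a few lines: train two models with different inductive biases on the labeled seed and forward the pool items on which they disagree to human experts. The models and synthetic data below are placeholders, not the cited system's components.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=2)
seed = slice(0, 40)                        # the small labeled seed set
m1 = LogisticRegression(max_iter=1000).fit(X[seed], y[seed])
m2 = RandomForestClassifier(random_state=2).fit(X[seed], y[seed])

pool = X[40:]                              # unlabeled pool
disagree = np.flatnonzero(m1.predict(pool) != m2.predict(pool))
print(f"{len(disagree)} inconsistent samples routed to expert annotation")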

8.4.2 Advancements in Human–AI Interfaces

Advancements in human–AI interfaces are essential for improving collaborative systems that rely on seamless interaction between humans and AI. A critical aspect of this improvement lies in creating more intuitive and user-friendly interfaces, which facilitate better communication between users and AI models. Human-centered design approaches can significantly enhance the quality of feedback from users, leading to improved model performance. For instance, visualizations that clarify what the model has learned play a pivotal role in improving interpretability. This allows non-experts to contribute effectively to model refinement by making the AI’s decision-making process more transparent (Lee et al., 2017; Z. J. Wang et al., 2021).

Future developments in this area should focus on building interfaces that not only collect feedback but also guide users in providing targeted and actionable input. Systems that assist users in offering more precise input can significantly enhance the learning process. Furthermore, interfaces that allow users to track the evolution of AI models over time help increase transparency, fostering greater trust and long-term collaboration between humans and AI (Smith et al., 2018; Wallace et al., 2019). Human-centered designs in crowdsourcing annotation systems have been shown to enhance the quality of feedback and improve active learning model performance, highlighting the importance of intuitive interface design in optimizing human–AI collaboration (Dong et al., 2023).

Another area of focus is the growing demand for interactive tools that enable users to dynamically explore data and influence AI outputs in real time (Patil et al., 2022). These tools allow for immediate, informed adjustments, aligning human feedback more closely with AI decision-making, particularly in complex scenarios (Shraga, 2022). In high-demand environments, interactive data cleaning tools play a crucial role in optimizing real-time human–AI interactions, further improving the system’s overall effectiveness (Räth et al., 2023).

8.4.3 Integration with Large Language Models

The integration of Large Language Models (LLMs) into human–AI collaborative systems is becoming increasingly prevalent, significantly enhancing both the efficiency and accuracy of AI-driven tasks. Recent active learning frameworks, which combine human annotations with LLM-driven text classification, have demonstrated the effectiveness of techniques like uncertainty sampling in prioritizing and annotating the most informative data points. These approaches help reduce the annotation workload while maintaining or even improving model accuracy, offering a promising method for optimizing machine learning workflows (Rouzegar & Makrehchi, 2024). For instance, a recent study exemplifies how LLMs can be dynamically integrated with active learning to iteratively improve model performance, especially in few-shot learning scenarios, while reducing computational costs (C. Liu et al., 2024).
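
The routing logic behind such hybrid pipelines can be distilled to an uncertainty budget: the highest-entropy items go to humans and the rest to an LLM labeler. Both annotator functions below are stubs standing in for a human workforce and an LLM call, not any specific vendor's API.

import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_items(items_with_probs, budget, human_annotate, llm_annotate):
    """Send the `budget` most uncertain items to humans, the rest to the LLM."""
    ranked = sorted(items_with_probs, key=lambda x: entropy(x[1]), reverse=True)
    return {
        item: human_annotate(item) if rank < budget else llm_annotate(item)
        for rank, (item, _probs) in enumerate(ranked)
    }

# Toy usage with stub annotators and classifier confidence scores.
demo = [("doc1", [0.5, 0.5]), ("doc2", [0.95, 0.05]), ("doc3", [0.6, 0.4])]
print(route_items(demo, budget=1,
                  human_annotate=lambda x: f"human:{x}",
                  llm_annotate=lambda x: f"llm:{x}"))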

However, LLMs are not without limitations. Their inherent biases and lack of human-like decision-making capabilities pose significant challenges, especially in high-stakes or sensitive applications. In such environments, eliminating human oversight entirely is neither feasible nor advisable. Human involvement remains essential in safety-critical systems to ensure that AI-driven decisions are safe, reliable, and ethically sound (Xiao et al., 2023).

The future of LLM integration in human–AI collaborative systems will involve finding the right balance between automation and human intervention. While LLMs can complement human expertise in many tasks, they cannot fully replace human judgment in complex or ethically sensitive areas. Ongoing research is focused on developing more interactive and adaptable systems that can manage real-world complexities while ensuring that human oversight remains central to critical decision-making processes.

8.4.4 Ethical Implications and Societal Impact of Human–AI Collaboration

As human–AI collaboration continues to evolve, addressing the ethical concerns associated with these advancements is crucial. A significant risk is that AI systems trained on human feedback could be misused, potentially manipulated to influence beliefs, spread radical ideologies, or facilitate fraudulent activities (Stiennon et al., 2020). In particular, distinguishing between LLM-generated and human-written content has become increasingly difficult, raising concerns about trustworthiness in human–AI collaboration. While domain-specific content, such as scientific writing, may exhibit subtle differences, LLMs are still capable of propagating biased or ideologically influenced content, posing substantial risks to society (Ma et al., 2023; X. Zhou et al., 2023). These risks highlight the importance of developing trustworthy AI systems that embed ethical principles, ensuring transparency, fairness, and accountability, and include safeguards against malicious use, thus preventing unintended harm.

Another critical challenge lies in the vulnerability of machine learning algorithms to adversarial attacks, where malicious actors can compromise training data, threatening the integrity and security of AI models (Rožanec et al., 2024). Detecting and preventing such attacks will be essential in maintaining the reliability of AI-driven systems, ensuring they remain secure and beneficial. The next stage of AI development must therefore balance technological progress with strong ethical considerations, guaranteeing that AI systems operate in a way that is transparent, fair, and aligned with human values (Mosqueira-Rey et al., 2023).

As human–AI collaborative systems are increasingly integrated into various industries, their societal impact must be closely monitored. Research should focus on ensuring the equitable distribution of AI technologies and their application for the greater good, continuously addressing the evolving ethical challenges that emerge from their widespread use.

8.5 Conclusion

In this chapter, we first addressed the critical role of high-quality data in AI development and the challenges of data acquisition, management, and fairness. Machine learning systems depend heavily on accurate, diverse, and timely data, but ensuring these qualities remains a key hurdle. Human errors and biases, alongside ethical concerns like privacy, further complicate this process. To tackle these issues, human expertise plays a crucial role in stages such as data extraction, cleaning, annotation, and model training. By leveraging strategies like active learning and weak supervision, human–AI collaboration enhances system performance, fairness, and scalability.

In Section 8.2, we explored the technical foundations of human–AI collaboration, focusing on Human-in-the-Loop (HITL) systems, active learning, crowdsourcing, and interactive machine learning. HITL systems enhance model accuracy and decision-making by integrating human expertise, but face challenges in embedding input effectively and ensuring robust evaluation. Active learning reduces labeling workload by selecting the most informative data points, balancing efficiency with annotation quality and cost concerns. Crowdsourcing accelerates data generation with collective intelligence, requiring strong quality control to handle varied contributor skill levels. Interactive machine learning fosters real-time human–AI collaboration, improving adaptability but increasing complexity and reliance on human expertise. These paradigms together optimize human–AI collaboration, balancing human oversight with machine automation for better outcomes.

In Section 8.3, we explored the practical applications and case studies of human–AI collaboration across various domains. In healthcare, AI aids in medical imaging and pathology, while human expertise ensures precision and reliability, improving efficiency in areas like cancer detection and heart disease prediction. In finance, AI supports decision-making in lending and stock analysis, with human oversight refining outcomes to enhance quality and fairness. In agriculture, AI assists in crop monitoring and pest detection, with human feedback improving accuracy and reducing manual effort. In education, AI personalizes learning experiences while human input guides content recommendations for better learning outcomes. In misinformation control and cybersecurity, AI models combined with human feedback improve threat detection and response. In law, AI supports legal consultation, with human oversight refining diagnostic processes for more accurate decisions. In the case of the OER-based Collaborative PDF Reader, human feedback refines AI content recommendations, while in the case of the D3LM legal model, human oversight enhances AI’s legal question generation. These examples illustrate the transformative power of human–AI collaboration across diverse fields, optimizing efficiency, decision-making, and accuracy.

In Section 8.4, we explored the future directions of human–AI collaborative intelligence, focusing on several key areas. Continuous learning from human experts emphasizes improving model robustness and efficiency through human feedback, while addressing challenges like catastrophic forgetting and optimizing annotation processes. Advancements in human–AI interfaces will prioritize transparency, intuitive design, and dynamic feedback to foster trust and improve collaboration. The integration of LLMs highlights their growing role in enhancing tasks like text classification, but maintaining human oversight is essential for ensuring safety and ethical AI use. Finally, the ethical implications and societal impact of human–AI collaboration require attention to issues such as bias, adversarial attacks, and manipulation, with a focus on developing transparent, fair, and secure AI systems that benefit society as a whole.

References

Aldoseri, A., Al-Khalifa, K. N., & Hamouda, A. M. (2023). Re-thinking Data Strategy and Integration for Artificial Intelligence: Concepts, Opportunities, and Challenges. Applied Sciences, 13(12), 7082. https://doi.org/10.3390/app13127082
Arous, I., Dolamic, L., Yang, J., Bhardwaj, A., Cuccu, G., & Cudré-Mauroux, P. (2021). Marta: Leveraging Human Rationales for Explainable Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(7), 5868–5876. https://doi.org/10.1609/aaai.v35i7.16734
Baldridge, J., & Palmer, A. (2009). How Well Does Active Learning Actually Work? Time-based Evaluation of Cost-reduction Strategies for Language Documentation. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 296–305.
Bartolo, M., Roberts, A., Welbl, J., Riedel, S., & Stenetorp, P. (2020). Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension. Transactions of the Association for Computational Linguistics, 8, 662–678. https://doi.org/10.1162/tacl_a_00338
Boukhelifa, N., Bezerianos, A., & Lutton, E. (2018). Evaluation of Interactive Machine Learning Systems. In Zhou, J. & Chen, F. (eds.), Human and Machine Learning (pp. 341–360). Springer International Publishing. https://doi.org/10.1007/978-3-319-90403-0_17
Brabham, D. C. (2008). Crowdsourcing as a Model for Problem Solving: An Introduction and Cases. Convergence: The International Journal of Research into New Media Technologies, 14(1), 75–90. https://doi.org/10.1177/1354856507084420
Budd, S., Robinson, E. C., & Kainz, B. (2021). A Survey on Active Learning and Human-in-the-Loop Deep Learning for Medical Image Analysis. Medical Image Analysis, 71, 102062. https://doi.org/10.1016/j.media.2021.102062
Burr, S. (2009). Active Learning Literature Survey. University of Wisconsin-Madison Department of Computer Sciences.
Casalini, F., González, J. L., & Nemoto, T. (2021). Mapping Commonalities in Regulatory Approaches to Cross-border Data Transfers. OECD Trade Policy Papers.
Chai, C., & Li, G. (2020). Human-in-the-Loop Techniques in Machine Learning. IEEE Data Engineering Bulletin, 43(3), 37–52.
Chandra, A. L., Desai, S. V., Balasubramanian, V. N., Ninomiya, S., & Guo, W. (2020). Active Learning with Point Supervision for Cost-effective Panicle Detection in Cereal Crops. Plant Methods, 16(1), 34. https://doi.org/10.1186/s13007-020-00575-8
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J., & Blei, D. (2009). Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems, 22, 288–296.
Chaudhuri, K., Kakade, S. M., Netrapalli, P., & Sanghavi, S. (2015). Convergence Rates of Active Learning for Maximum Likelihood Estimation. Advances in Neural Information Processing Systems, 28, 1090–1098.
Daniel, F., Kucherbaev, P., Cappiello, C., Benatallah, B., & Allahbakhsh, M. (2019). Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions. ACM Computing Surveys, 51(1), 1–40. https://doi.org/10.1145/3148148
De Angeli, K., Gao, S., Alawad, M., Yoon, H.-J., Schaefferkoetter, N., Wu, X.-C., Durbin, E. B., Doherty, J., Stroup, A., Coyle, L., Penberthy, L., & Tourassi, G. (2021). Deep Active Learning for Classifying Cancer Pathology Reports. BMC Bioinformatics, 22(1), 113. https://doi.org/10.1186/s12859-021-04047-1
De Melo, F. R., Flôres, E. L., De Carvalho, S. D., De Teixeira, R. A. G., Loja, L. F. B., & de Sousa Gomide, R. (2014). Computational Organization of Didactic Contents for Personalized Virtual Learning Environments. Computers & Education, 79, 126–137. https://doi.org/10.1016/j.compedu.2014.07.012
Demartini, G., Mizzaro, S., & Spina, D. (2020). Human-in-the-Loop Artificial Intelligence for Fighting Online Misinformation: Challenges and Opportunities. IEEE Data Engineering Bulletin, 43(3), 65–74.
Deng, D., Shahabi, C., & Demiryurek, U. (2013). Maximizing the Number of Worker’s Self-selected Tasks in Spatial Crowdsourcing. Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 324–333. https://doi.org/10.1145/2525314.2525370
Diligenti, M., Roychowdhury, S., & Gori, M. (2017). Integrating Prior Knowledge into Deep Learning. 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 920–923. https://doi.org/10.1109/ICMLA.2017.00-37
Dimensional Research. (2019). What Data Scientists Tell Us About AI Model Training Today. Alegion. https://cdn2.hubspot.net/hubfs/3971219/Content%20Offers/Drafts/Alegion_SurveyNarrative.pdf
Dong, J., Kang, Y., Liu, J., Sun, C., Fan, S., Jin, H., Wu, D., Jiang, Z., Niu, X., & Liu, X. (2023). Human-centred Design on Crowdsourcing Annotation towards Improving Active Learning Model Performance. Journal of Information Science. https://doi.org/10.1177/01655515231204802
Donmez, P., & Carbonell, J. G. (2008). Proactive Learning: Cost-sensitive Active Learning with Multiple Imperfect Oracles. Proceedings of the 17th ACM Conference on Information and Knowledge Management, 619–628. https://doi.org/10.1145/1458082.1458165
Dor, L. E., Halfon, A., Gera, A., Shnarch, E., Dankin, L., Choshen, L., Danilevsky, M., Aharonov, R., Katz, Y., & Slonim, N. (2020). Active Learning for BERT: An Empirical Study. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 7949–7962.
Druck, G., Settles, B., & McCallum, A. (2009). Active Learning by Labeling Features. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 81–90. https://doi.org/10.3115/1699510.1699522
Dudley, J. J., & Kristensson, P. O. (2018). A Review of User Interface Design for Interactive Machine Learning. ACM Transactions on Interactive Intelligent Systems, 8(2), 1–37. https://doi.org/10.1145/3185517
Eapen, T., Finkenstadt, D. J., Folk, J., & Venkataswamy, L. (2023). How Generative AI Can Augment Human Creativity. Harvard Business Review, 101(4), 56–64.
El-Hasnony, I. M., Elzeki, O. M., Alshehri, A., & Salem, H. (2022). Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction. Sensors, 22(3), 1184. https://doi.org/10.3390/s22031184
Fails, J. A., & Olsen, D. R. (2003). Interactive Machine Learning. Proceedings of the 8th International Conference on Intelligent User Interfaces, 39–45. https://doi.org/10.1145/604045.604056
Fan, X., Li, C., Yuan, X., Dong, X., & Liang, J. (2019). An Interactive Visual Analytics Approach for Network Anomaly Detection through Smart Labeling. Journal of Visualization, 22(5), 955–971. https://doi.org/10.1007/s12650-019-00580-7
Fiebrink, R., Cook, P. R., & Trueman, D. (2011). Human Model Evaluation in Interactive Supervised Learning. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 147–156. https://doi.org/10.1145/1978942.1978965
Folmsbee, J., Liu, X., Brandwein-Weber, M., & Doyle, S. (2018). Active Deep Learning: Improved Training Efficiency of Convolutional Neural Networks for Tissue Classification in Oral Cavity Cancer. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 770–773. https://doi.org/10.1109/ISBI.2018.8363686
Forrester Consulting. (2020). Overcome Obstacles to Get to AI at Scale. IBM. www.ibm.com/downloads/cas/VBMPEQLN
Fredrikson, M., Jha, S., & Ristenpart, T. (2015). Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1322–1333. https://doi.org/10.1145/2810103.2813677
Gomes, R., Welinder, P., Krause, A., & Perona, P. (2011). Crowdclustering. Advances in Neural Information Processing Systems 24 (NIPS 2011).
Gronsund, T., & Aanestad, M. (2020). Augmenting the Algorithm: Emerging Human-in-the-Loop Work Configurations. Journal of Strategic Information Systems, 29(2), 101614. https://doi.org/10.1016/j.jsis.2020.101614
Gualo, F., Rodríguez, M., Verdugo, J., Caballero, I., & Piattini, M. (2021). Data Quality Certification using ISO/IEC 25012: Industrial Experiences. Journal of Systems and Software, 176, 110938. https://doi.org/10.1016/j.jss.2021.110938
Guo, J., Kang, Y., Duan, Y., Liu, X., Tang, S., Zhang, W., Kuang, K., Sun, C., & Wu, F. (2022). Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2935–2945. https://doi.org/10.1145/3534678.3539022
Guo, J., Shi, H., Kang, Y., Kuang, K., Tang, S., Jiang, Z., Sun, C., Wu, F., & Zhuang, Y. (2021). Semi-supervised Active Learning for Semi-supervised Models: Exploit Adversarial Examples with Graph-based Virtual Labels. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2896–2905. https://doi.org/10.1109/ICCV48922.2021.00289
He, L., Michael, J., Lewis, M., & Zettlemoyer, L. (2016). Human-in-the-Loop Parsing. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2337–2342. https://doi.org/10.18653/v1/D16-1258
Hoi, S. C. H., Jin, R., Zhu, J., & Lyu, M. R. (2006). Batch Mode Active Learning and Its Application to Medical Image Classification. Proceedings of the 23rd International Conference on Machine Learning: ICML’06, 417–424. https://doi.org/10.1145/1143844.1143897
Holzinger, A. (2016). Interactive Machine Learning for Health Informatics: When Do We Need the Human-in-the-Loop? Brain Informatics, 3(2), 119–131. https://doi.org/10.1007/s40708-016-0042-6
Howe, J. (2006). The Rise of Crowdsourcing. Wired Magazine, 14(6), 176–183.
Hu, H., Salcic, Z., Sun, L., Dobbie, G., Yu, P. S., & Zhang, X. (2022). Membership Inference Attacks on Machine Learning: A Survey. ACM Computing Surveys, 54(11s), 1–37.
Ipeirotis, P. G., Provost, F., & Wang, J. (2010). Quality Management on Amazon Mechanical Turk. Proceedings of the ACM SIGKDD Workshop on Human Computation, 64–67. https://doi.org/10.1145/1837885.1837906
Jiang, Z., Gao, L., Yuan, K., Gao, Z., Tang, Z., & Liu, X. (2018). Mathematics Content Understanding for Cyberlearning via Formula Evolution Map. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 37–46. https://doi.org/10.1145/3269206.3271694
Jiang, Z., Gao, Z., Duan, Y., Kang, Y., Sun, C., Zhang, Q., & Liu, X. (2020). Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 3080–3085. https://doi.org/10.18653/v1/2020.acl-main.279
Jiang, Z., Liu, X., Gao, L., & Tang, Z. (2016). Community-based Cyberreading for Information Understanding. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 789–792. https://doi.org/10.1145/2911451.2914744
Joshi, A. J., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class Active Learning for Image Classification. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2372–2379. https://doi.org/10.1109/CVPR.2009.5206627
Jwo, J.-S., Lin, C.-S., & Lee, C.-H. (2021). Smart Technology-driven Aspects for Human-in-the-Loop Smart Manufacturing. The International Journal of Advanced Manufacturing Technology, 114(5–6), 1741–1752. https://doi.org/10.1007/s00170-021-06977-9
Kameda, T., Toyokawa, W., & Tindale, R. S. (2022). Information Aggregation and Collective Intelligence beyond the Wisdom of Crowds. Nature Reviews Psychology, 1(6), 345–357. https://doi.org/10.1038/s44159-022-00054-y
Kapoor, A., Horvitz, E., & Basu, S. (2007). Selective Supervision: Guiding Supervised Learning with Decision-theoretic Active Learning. IJCAI’07 Proceedings of the 20th International Joint Conference on Artificial Intelligence, 877–882.
Kee, S., Del Castillo, E., & Runger, G. (2018). Query-by-Committee Improvement with Diversity and Density in Batch Active Learning. Information Sciences, 454, 401–418. https://doi.org/10.1016/j.ins.2018.05.014
Kovashka, A., Parikh, D., & Grauman, K. (2015). WhittleSearch: Interactive Image Search with Relative Attribute Feedback. International Journal of Computer Vision, 115(2), 185–210. https://doi.org/10.1007/s11263-015-0814-0
Kreutzer, J., Riezler, S., & Lawrence, C. (2021). Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks. Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021), 37–43. https://doi.org/10.18653/v1/2021.spnlp-1.4
Le, T.-N., Sugimoto, A., Ono, S., & Kawasaki, H. (2020). Toward Interactive Self-annotation for Video Object Bounding Box: Recurrent Self-learning and Hierarchical Annotation based Framework. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 3231–3240. https://doi.org/10.1109/WACV45572.2020.9093398
Lease, M. (2011). On Quality Control and Machine Learning in Crowdsourcing. Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence.
Lee, T. Y., Smith, A., Seppi, K., Elmqvist, N., Boyd-Graber, J., & Findlater, L. (2017). The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human–Computer Studies, 105, 28–42.
Liang, W., Tadesse, G. A., Ho, D., Fei-Fei, L., Zaharia, M., Zhang, C., & Zou, J. (2022). Advances, Challenges and Opportunities in Creating Data for Trustworthy AI. Nature Machine Intelligence, 4(8), 669–677. https://doi.org/10.1038/s42256-022-00516-1
Liu, B., & Ferrari, V. (2017). Active Learning for Human Pose Estimation. Proceedings of the IEEE International Conference on Computer Vision, 4363–4372. https://doi.org/10.1109/ICCV.2017.468
Liu, C., Zhao, F., Kuang, K., Kang, Y., Jiang, Z., Sun, C., & Wu, F. (2024). Evolving Knowledge Distillation with Large Language Models and Active Learning. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 6717–6731. https://aclanthology.org/2024.lrec-main.593/
Liu, P., Wang, L., Ranjan, R., He, G., & Zhao, L. (2022). A Survey on Active Deep Learning: From Model Driven to Data Driven. ACM Computing Surveys, 54(10s), 1–34.
Liu, X., Jiang, Z., & Gao, L. (2015). Scientific Information Understanding via Open Educational Resources (OER). Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 645–654. https://doi.org/10.1145/2766462.2767750
Liu, Z., Guo, Y., & Mahmud, J. (2021). When and Why a Model Fails? A Human-in-the-Loop Error Detection Framework for Sentiment Analysis. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 170–177. https://doi.org/10.18653/v1/2021.naacl-industry.22
Liu, Z., Wang, J., Gong, S., Lu, H., & Tao, D. (2019). Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-identification. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6122–6131. https://doi.org/10.1109/ICCV.2019.00622
Luo, T., Kramer, K., Goldgof, D. B., Hall, L. O., Samson, S., Remsen, A., Hopkins, T., & Cohn, D. (2005). Active Learning to Recognize Multiple Types of Plankton. Journal of Machine Learning Research, 6(4), 589–613.
Ma, Y., Liu, J., Yi, F., Cheng, Q., Huang, Y., Lu, W., & Liu, X. (2023). AI vs. Human: Differentiation Analysis of Scientific Content Generation [arXiv preprint]. arXiv:2301.10416. http://arxiv.org/abs/2301.10416
Madnick, S. E., Wang, R. Y., Lee, Y. W., & Zhu, H. (2009). Overview and Framework for Data and Information Quality Research. Journal of Data and Information Quality, 1(1), 1–22.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2022). A Survey on Bias and Fairness in Machine Learning. ACM Computing Surveys, 54(6), 1–35. https://doi.org/10.1145/3457607
Monarch, R. M. (2021). Human-in-the-Loop Machine Learning: Active Learning and Annotation for Human-centered AI. Simon and Schuster.
Mosqueira-Rey, E., Hernández-Pereira, E., Alonso-Ríos, D., Bobes-Bascarán, J., & Fernández-Leal, Á. (2023). Human-in-the-Loop Machine Learning: A State of the Art. Artificial Intelligence Review, 56(4), 3005–3054. https://doi.org/10.1007/s10462-022-10246-w
Nguyen, D. H., & Patrick, J. D. (2014). Supervised Machine Learning and Active Learning in Classification of Radiology Reports. Journal of the American Medical Informatics Association, 21(5), 893–901. https://doi.org/10.1136/amiajnl-2013-002516
Odekerken, D., & Bex, F. (2020). Towards Transparent Human-in-the-Loop Classification of Fraudulent Web Shops. In Villata, S., Harašta, J., & Křemen, P. (eds.), Frontiers in Artificial Intelligence and Applications (pp. 239–242). IOS Press. https://doi.org/10.3233/FAIA200873
Panian, Z. (2010). Some Practical Experiences in Data Governance. World Academy of Science, Engineering and Technology, 62(1), 939–946.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running Experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411–419. https://doi.org/10.1017/S1930297500002205
Patil, Y., Amer-Yahia, S., & Subramanian, S. (2022). Designing the Evaluation of Operator-enabled Interactive Data Exploration in VALIDE. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 1–7. https://doi.org/10.1145/3546930.3547509
Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data Quality Assessment. Communications of the ACM, 45(4), 211–218. https://doi.org/10.1145/505248.506010
Porter, R., Theiler, J., & Hush, D. (2013). Interactive Machine Learning in Data Exploitation. Computing in Science & Engineering, 15(5), 12–20. https://doi.org/10.1109/MCSE.2013.74
Press, G. (2022, April 14). Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes. www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/
Rahm, E., & Do, H. H. (2000). Data Cleaning: Problems and Current Approaches. IEEE Data Engineering Bulletin, 23(4), 3–13.
Ramos, G., Meek, C., Simard, P., Suh, J., & Ghorashi, S. (2020). Interactive Machine Teaching: A Human-centered Approach to Building Machine-learned Models. Human–Computer Interaction, 35(5–6), 413–451. https://doi.org/10.1080/07370024.2020.1734931
Räth, T., Onah, N., & Sattler, K.-U. (2023). Interactive Data Cleaning for Real-Time Streaming Applications. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 1–3. https://doi.org/10.1145/3597465.3605229
Reddy, D. (2023). Data Engineering Challenges in AI Automation. 2023 International Conference on Computing, Electronics & Communications Engineering (iCCECE), 107–112.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. https://doi.org/10.1145/2939672.2939778
Rouzegar, H., & Makrehchi, M. (2024). Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation. Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII), 98–111.
Rožanec, J. M., Montini, E., Cutrona, V., Papamartzivanos, D., Klemenčič, T., Fortuna, B., Mladenić, D., Veliou, E., Giannetsos, T., & Emmanouilidis, C. (2024). Human in the AI Loop via xAI and Active Learning for Visual Inspection. In Soldatos, J. (ed.), Artificial Intelligence in Manufacturing (pp. 381–406). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-46452-2_22
Sachan, S., Almaghrabi, F., Yang, J.-B., & Xu, D.-L. (2024). Human–AI Collaboration to Mitigate Decision Noise in Financial Underwriting: A Study on FinTech Innovation in a Lending Firm. International Review of Financial Analysis, 93, 103149. https://doi.org/10.1016/j.irfa.2024.103149
Schröder, C., & Niekler, A. (2020). A Survey of Active Learning for Text Classification using Deep Neural Networks [arXiv preprint]. arXiv:2008.07267. http://arxiv.org/abs/2008.07267
Settles, B. (2011). From Theories to Queries: Active Learning in Practice. Active Learning and Experimental Design Workshop in Conjunction with AISTATS 2010, 1–18.
Sheikh, R., Milioto, A., Lottes, P., Stachniss, C., Bennewitz, M., & Schultz, T. (2020). Gradient and Log-based Active Learning for Semantic Segmentation of Crop and Weed for Agricultural Robots. 2020 IEEE International Conference on Robotics and Automation (ICRA), 1350–1356. https://ieeexplore.ieee.org/abstract/document/9196722/
Sheng, V. S., & Zhang, J. (2019). Machine Learning with Crowdsourcing: A Brief Summary of the Past Research and Future Directions. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 9837–9843. https://doi.org/10.1609/aaai.v33i01.33019837
Shoshitaishvili, Y., Weissbacher, M., Dresel, L., Salls, C., Wang, R., Kruegel, C., & Vigna, G. (2017). Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 347–362. https://doi.org/10.1145/3133956.3134105
Shraga, R. (2022). HumanAL: Calibrating Human Matching beyond a Single Task. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 1–8. https://doi.org/10.1145/3546930.3547496
Smailovic, J., Grcar, M., Lavrac, N., & Znidarsic, M. (2014). Stream-based Active Learning for Sentiment Analysis in the Financial Domain. Information Sciences, 285, 181–203. https://doi.org/10.1016/j.ins.2014.04.034
Smith, A., Kumar, V., Boyd-Graber, J., Seppi, K., & Findlater, L. (2018). Closing the Loop: User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. 23rd International Conference on Intelligent User Interfaces, 293–304. https://doi.org/10.1145/3172944.3172965
Sourati, J., Akcakaya, M., Leen, T. K., Erdogmus, D., & Dy, J. G. (2017). Asymptotic Analysis of Objectives based on Fisher Information in Active Learning. Journal of Machine Learning Research, 18(34), 1–41.
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. F. (2020). Learning to Summarize with Human Feedback. Advances in Neural Information Processing Systems, 33, 3008–3021.
Tajbakhsh, N., Jeyaseelan, L., Li, Q., Chiang, J., Wu, Z., & Ding, X. (2020). Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation. Medical Image Analysis, 63, 101693. https://doi.org/10.1016/j.media.2020.101693
Vaughan, J. W. (2018). Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research. Journal of Machine Learning Research, 18(193), 1–46.
Wallace, E., Rodriguez, P., Feng, S., Yamada, I., & Boyd-Graber, J. (2019). Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Transactions of the Association for Computational Linguistics, 7, 387–401. https://doi.org/10.1162/tacl_a_00279
Wang, G., Zuluaga, M. A., Li, W., Pratt, R., Patel, P. A., Aertsen, M., Doel, T., David, A. L., Deprest, J., Ourselin, S., & Vercauteren, T. (2019). DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1559–1572. https://doi.org/10.1109/TPAMI.2018.2840695
Wang, Z. J., Choi, D., Xu, S., & Yang, D. (2021). Putting Humans in the Natural Language Processing Loop: A Survey. Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing, 47–52. https://aclanthology.org/2021.hcinlp-1.8/
Ware, M., Frank, E., Holmes, G., Hall, M., & Witten, I. H. (2001). Interactive Machine Learning: Letting Users Build Classifiers. International Journal of Human–Computer Studies, 55(3), 281–292. https://doi.org/10.1006/ijhc.2001.0499
Wondimu, N. A., Buche, C., & Visser, U. (2022). Interactive Machine Learning: A State of the Art Review [arXiv preprint]. arXiv:2207.06196. http://arxiv.org/abs/2207.06196
Wrede, F., & Hellander, A. (2019). Smart Computational Exploration of Stochastic Gene Regulatory Network Models Using Human-in-the-Loop Semi-supervised Learning. Bioinformatics, 35(24), 5199–5206. https://doi.org/10.1093/bioinformatics/btz420
Wu, X., Xiao, L., Sun, Y., Zhang, J., Ma, T., & He, L. (2022). A Survey of Human-in-the-Loop for Machine Learning. Future Generation Computer Systems, 135, 364–381. https://doi.org/10.1016/j.future.2022.05.014
Wu, Y., Wang, C., Gumusel, E., & Liu, X. (2024). Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning. Findings of the Association for Computational Linguistics: ACL 2024. https://doi.org/10.18653/v1/2024.findings-acl.918
Xiao, R., Dong, Y., Zhao, J., Wu, R., Lin, M., Chen, G., & Wang, H. (2023). FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 14520–14535. https://doi.org/10.18653/v1/2023.emnlp-main.896
Xu, W., Dainoff, M. J., Ge, L., & Gao, Z. (2023). Transitioning to Human Interaction with AI Systems: New Challenges and Opportunities for HCI Professionals to Enable Human-Centered AI. International Journal of Human–Computer Interaction, 39(3), 494–518. https://doi.org/10.1080/10447318.2022.2041900
Yang, J., Lan, G., Li, Y., Gong, Y., Zhang, Z., & Ercisli, S. (2022). Data Quality Assessment and Analysis for Pest Identification in Smart Agriculture. Computers and Electrical Engineering, 103, 108322. https://doi.org/10.1016/j.compeleceng.2022.108322
Yang, Q., Suh, J., Chen, N.-C., & Ramos, G. (2018). Grounding Interactive Machine Learning Tool Design in How Non-Experts Actually Build Models. Proceedings of the 2018 Designing Interactive Systems Conference, 573–584. https://doi.org/10.1145/3196709.3196729
Zhang, J., Wu, X., & Sheng, V. S. (2016). Learning from Crowdsourced Labeled Data: A Survey. Artificial Intelligence Review, 46(4), 543–576. https://doi.org/10.1007/s10462-016-9491-9
Zhou, X., Wang, Q., Wang, X., Tang, H., & Liu, X. (2023). Large Language Model Soft Ideologization via AI-Self-Consciousness [arXiv preprint]. arXiv:2309.16167. http://arxiv.org/abs/2309.16167
Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., Christiano, P., & Irving, G. (2020). Fine-Tuning Language Models from Human Preferences [arXiv preprint]. arXiv:1909.08593. http://arxiv.org/abs/1909.08593
