4.1 Introduction
With the rapid development of artificial intelligence, Human–AI Interaction (HAII) has gradually become a focus of Human–Computer Interaction (HCI) and its related cross-disciplinary fields (Amershi et al., 2019). The emergence of ChatGPT indicates that generative artificial intelligence (GAI) based on large language models (LLMs) has entered a new stage of development, in which deep learning models are employed to generate human-like content – that is, AI-Generated Content (AIGC) – in response to complex and diverse prompts (Lim et al., 2023). Human interaction with GAI will greatly enhance people’s productivity and creativity, and further penetrate all aspects of public life (De Freitas et al., 2023). However, while bringing benefits to people, GAI will inevitably raise a series of technical, socio-cultural, and ethical issues, among which the credibility of GAI remains a research concern worthy of attention in the new era (Longoni et al., 2022).
Credibility was originally defined as ‘believability’, or perceived information quality from the perspective of information recipients; it is not necessarily equal to objective information quality (Flanagin & Metzger, 2017). Many researchers agree that the concept of credibility is multidimensional, including components such as trustworthiness, expertise, and objectivity (Choi & Stvilia, 2015). As technology advances and times move forward, credibility studies also need to pay close attention to the depth of interaction between people and information, digital artifacts, and socio-cultural environments (Shin, 2022). The credibility problems of the traditional mass media era are not comparable to those of the Internet era. Similarly, credibility issues in the era of GAI face further challenges brought on by new technologies, new businesses, and new environments (Huschens et al., 2023), and credibility research therefore needs to keep up with the times and be critically examined.
The credibility assessment and judgment of AI has become an important topic in research on explainable AI (Wagle et al., 2021). While AI technology injects vitality into social development, it also triggers negative problems such as the technical black box (Castelvecchi, 2016), algorithmic discrimination (Shin, 2022), the dissemination of misinformation (Zhou et al., 2023), and echo chambers (Jeon et al., 2024). In particular, the rapid development of GAI in recent years has created a series of public concerns about privacy, employment opportunities, and loss of control, which in turn affect trust between people and technology, as well as the adoption and use of GAI by individuals and organizations (Wach et al., 2023). Credibility assessment of GAI therefore aims to alleviate, to a certain extent, people’s concerns about new technologies represented by ChatGPT, and to advocate the development of human-centered AI, promoting a harmonious symbiotic relationship between humans and the new generation of AI. For example, Johnson et al. (2023) suggested that verifying the reliability of content generated by ChatGPT helps in designing models that improve the robustness of an AI system, thus increasing users’ perceived credibility of AI.
In view of this, the topic of credibility in human–generative AI interaction needs to be further explored. Although there have been reviews of credibility research in the algorithmic era in recent years (Alrubaian et al., 2018), studies specifically addressing credibility issues from the GAI perspective remain limited. So far, some studies have focused on the topic of credibility in users’ adoption and use of GAI, and others have empirically explored the trust and reliability of GAI in various contexts. The aim of this chapter is therefore to present a clear picture of the current state of credibility research in human–generative AI interaction by analyzing the relevant literature dispersed across various disciplines, and to provide a holistic review of measurement instruments, influencing factors, challenges, emerging technologies, and optimization methods for the assessment of AIGC credibility. Finally, the chapter proposes several directions for further investigation with respect to the limitations of AIGC credibility assessment.
4.2 The Concept of AI Credibility
4.2.1 What Is Credibility?
Credibility is a multifaceted construct that pertains to the degree to which an entity – be it information, an individual, or a system – is perceived as trustworthy and reliable in a specific context (Rieh & Danielson, 2007). The foundational definition of credibility often revolves around the term “believability,” signifying the extent to which stakeholders are willing to trust and rely on a given source or system (Fogg & Tseng, 1999). However, credibility encompasses a broader array of dimensions beyond mere believability. Credibility is often complex and multidimensional, encompassing a comprehensive evaluation of various characteristics or factors, such as reliability, accuracy, expertise, authority, objectivity, and appeal (Fogg & Tseng, 1999; McCroskey & Young, 1981; Rieh, 2002). From the perspective of the subject being assessed, researchers have classified credibility into categories like advertisement credibility, review credibility, and media credibility (Cheung et al., 2012; Cotte et al., 2005). Furthermore, Flanagin and Metzger (2007) subdivided credibility into content credibility, source credibility, and design credibility.
Credibility is a key factor for individuals, corporations, governments, and the media in maintaining a good reputation, and it also influences public trust in the broader social structure (Tseng & Fogg, 1999). Whether in information dissemination, investment decisions, or policy formulation, credibility often becomes a benchmark for evaluating the success and effectiveness of these activities. As such, credibility has become increasingly important across various sectors, from news reporting (Hofeditz et al., 2021) and scientific research (Alam & Mohanty, 2022) to business marketing (Khan & Mishra, 2024) and smart healthcare (Aliyeva & Mehdiyev, 2024; Stevens & Stetson, 2023). However, in the age of AI – marked by the rapid proliferation of emerging technologies, the lack of algorithmic transparency, the risks of bias and manipulation, and the globalized, decentralized digital environment – the task of maintaining and enhancing AI credibility presents both significant opportunities and substantial challenges.
There are several similarities between the concept of credibility and the concept of human-centered AI, as both emphasize the central role of users in shaping perceived experience. Some researchers suggest that the design of human-centered AI should pay attention to the influence of AI on people and put the user experience at the center (Shneiderman, 2020; Xu, 2019). Furthermore, traditional human–computer interaction is actively evolving toward human–generative AI interaction, and the original credibility dimensions can no longer fully cover and reflect the connotation of AI credibility. Therefore, it is necessary to revisit the conceptualization of AI credibility in the context of human–generative AI interaction. The integrated framework of credibility evaluation (Hilligoss & Rieh, 2008), prominence-interpretation theory (Fogg, 2003), the credibility MAIN model (Sundar, 2008), and other related theories lay a theoretical foundation for expanding the conceptual map of AIGC credibility.
4.2.2 Main Dimensions of AI Credibility
It is necessary to consider the characteristics of AI when constructing the concept of AIGC credibility. Shin (2022) suggests that the credibility of AIGC should be mapped to characteristics of AI in a broader scope. At present, researchers generally agree that human-centered AI should be explainable AI (Capel & Brereton, 2023), which can be embodied in characteristics of AI such as fairness, accountability, transparency, and interpretability. This section elaborates and expands on AI credibility based on the primary dimensions of explainable AI. Table 4.1 summarizes the main dimensions and corresponding concepts of AI credibility.
Dimension | Description | References |
---|---|---|
Reliability | The ability of AI to consistently deliver accurate and stable results under various conditions – including flexibility, accessibility, and timeliness – which also encompasses the system’s robustness when faced with data changes, failures, or stress. Reliability is crucial in assessing AI credibility, as users expect the system to maintain stable performance even in complex, uncertain, or extreme situations. | Bedué and Fritzsche (2022); Hayashi and Wakabayashi (2017) |
Fairness | Fairness in AI credibility requires that the system not only avoids overt biases but also possesses the ability to detect and correct hidden biases. To ensure AI credibility, developers must rigorously control for bias throughout model design, data collection, training, and testing. | Mehrabi et al. (2021); Sambasivan et al. (2021) |
Accountability | A key component of AI credibility is ensuring clear accountability when errors or failures occur. Whether involving developers, operators, or users, the responsibility framework for AI systems must be well-defined so that issues can be traced back to their source and corrective actions taken. | Busuioc (2021); Hallowell et al. (2022) |
Transparency | Transparency in AI refers not only to the explainability and comprehensibility of the system’s decision-making processes but also to the transparency of information, such as data sources and algorithm choices, and the transparency of processes, like records of system updates or adjustments. A transparent AI system enables users to understand how data is collected, processed, and analyzed, allowing them to better grasp and trust the decision-making flow of the AI. | Ehsan et al. (2021); Vössing et al. (2022) |
Security | Security and robustness ensure that AI systems do not make erroneous decisions in abnormal situations, such as when faced with malicious inputs or adversarial attacks, thereby safeguarding user trust. | Hu et al. (2021) |
Ethics | AI decisions must not only be technically accurate but also align with social, ethical, and moral standards. By addressing issues such as privacy protection, eliminating algorithmic bias, and considering the impact on vulnerable groups, AI systems can enhance user trust. | Reinhardt (2023) |
Intelligibility | From the user’s perspective, AI outputs need to be understandable, and AI decisions must come with explanations that are clear to practitioners without a technical background. This allows users to maintain trust in the results while utilizing AI. | Lim et al. (2019) |
Firstly, the reliability and security of AI systems are paramount, as users expect stable performance and data integrity even in complex or uncertain situations. For example, in the healthcare domain, the accuracy of AIGC affects patients’ trust (Johnson et al., 2023). Secondly, transparency and intelligibility are key dimensions of AI credibility, helping users understand the logic and reasoning behind AI decisions and thus reducing fear or distrust of “black box” models (Shin, 2023). Thirdly, accountability refers to the presence of clear responsibility mechanisms in AI systems, ensuring that issues can be traced, corrected, and prevented from recurring – an essential aspect of building and maintaining user trust (Hallowell et al., 2022). Lastly, fairness and ethics represent two extended dimensions of AI credibility, reflecting the importance of social values and human cultural norms in AI applications. Enhancing AI credibility requires not only technological advancements but also the establishment of strict ethical and fairness standards, ensuring that AI systems make more responsible decisions within various social contexts (Zhang & Zhang, 2023).
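To make the fairness dimension above concrete, the sketch below illustrates two common group-fairness checks for a binary classifier: the demographic parity difference and the equal-opportunity difference. It is a minimal illustration; the predictions and group labels are hypothetical, and real audits would use validated toolkits and domain-appropriate fairness definitions.

```python
# Minimal sketch of two group-fairness metrics (hypothetical data).
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_difference(y_true, y_pred, group) -> float:
    """Absolute difference in true-positive rates between two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Hypothetical predictions for 8 users split across two demographic groups.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_difference(y_pred, group))            # 0.25
print(equal_opportunity_difference(y_true, y_pred, group))     # ~0.33
```

A value near zero on both metrics is one (necessarily partial) signal that the system satisfies the fairness dimension in Table 4.1.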
4.3 Measures of AI Credibility in Human–Generative AI Interaction
Measuring and evaluating AI credibility is a crucial aspect of achieving trustworthy and human-centered AI systems. Through a review of the literature, we classify the measurement of AI credibility into subjective assessments from a user-centric perspective and relatively objective measurements using technical methods.
On the one hand, the user-centered subjective approach primarily measures users’ perceived credibility of AI products through questionnaires tailored to specific research situations and research questions (Xiang et al., 2023). For example, to evaluate students’ perceived credibility of ChatGPT, Tossell et al. (2024) used the updated Multi-Dimensional Measure of Trust (MDMT) version 2 questionnaire, whose measurement dimensions include reliability, capability, ethicality, transparency, and benevolence (Ullman & Malle, 2019). In addition, Tossell et al. (2024) used 7-point Likert scales to evaluate students’ trust in ChatGPT, with measurement items adapted from surveys used in military training (Dzindolet et al., 2003) and autonomous driving research (Tenhundfeld et al., 2020). Uzir et al. (2023) used a questionnaire covering the two dimensions of privacy and security to measure elderly consumers’ perceived credibility of smartwatches.
In addition, some researchers assess users’ perceived AI credibility from other angles. For example, measuring users’ propensity to rely on agents in future situations is one of the initial methods used to assess credibility (Kohn et al., 2021; Momen et al., 2023; Monfort et al., 2018). Because trust arises from rational factors, from the stimulation of positive emotions, or from the combined effect of the two, Chen and Park (2021) divide users’ trust in intelligent personal assistants into cognitive trust (e.g., the usefulness, reliability, honesty, and integrity of AI) and emotional trust (e.g., the safety, comfort, and satisfaction of AI).
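As a concrete illustration of such questionnaire-based measurement, the sketch below scores one hypothetical trust dimension from 7-point Likert items and computes Cronbach’s alpha as an internal-consistency check. The dimension, item count, and responses are invented for illustration; this is not the MDMT instrument itself.

```python
# Minimal sketch of scoring a Likert-based trust dimension (hypothetical data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal-consistency reliability for an (n_respondents, n_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 7-point responses: 5 respondents x 3 items for one dimension.
reliability_items = np.array([
    [6, 7, 6],
    [5, 5, 6],
    [7, 6, 7],
    [4, 5, 4],
    [6, 6, 5],
])

print("Dimension score:", reliability_items.mean())                  # scale mean
print("Cronbach's alpha:", round(cronbach_alpha(reliability_items), 3))
```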
On the other hand, relatively objective measurements using technical methods can assist researchers and developers in quantifying and evaluating AI system credibility, thereby enhancing its reliability and safety in practical applications. For example, automated methods may assess an AI system’s responsiveness and explainability (Lin et al., 2021), test model performance on specific tasks (Huang et al., 2024), and provide quantitative metrics for evaluating the robustness of deep neural networks (Ruan et al., 2019). Some researchers also use machine learning techniques to assess AI system explainability (Yang, 2019) or employ blockchain technology to enhance data credibility (Distefano et al., 2021). Frameworks such as DeepTrust (Cheng et al., 2020) and credibility metrics models (Uslu et al., 2021) have been proposed to measure AI system reliability.
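The sketch below illustrates the flavor of such objective measurement with one simple proxy: the fraction of a model’s predictions that remain unchanged under small random input perturbations. This is an illustrative stand-in under assumed data and noise scales, not the formal robustness metrics of Ruan et al. (2019).

```python
# Minimal sketch of a prediction-stability proxy for model robustness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def perturbation_stability(model, X, sigma=0.05, trials=20, seed=0):
    """Fraction of predictions unchanged under Gaussian noise of scale sigma."""
    rng = np.random.default_rng(seed)
    base = model.predict(X)
    stable = [
        (model.predict(X + rng.normal(0, sigma, X.shape)) == base).mean()
        for _ in range(trials)
    ]
    return float(np.mean(stable))

print("Stability under noise:", perturbation_stability(model, X))
```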
Overall, while numerous studies have highlighted the need to improve the credibility of AI systems, relatively few have explored the quantitative assessment of AI credibility, in particular the contextualized measurement of AIGC credibility and the refinement of credibility dimensions in human–generative AI interaction.
4.4 Influences on Credibility Assessment in Human–Generative AI Interaction
Early research on information credibility assessment identified information sources, cues, and affordances as key factors influencing users’ perceived credibility. Since then, numerous studies on the credibility of HCI have highlighted the impact of technical signifiers in the interaction environment on users’ credibility assessment (Liao & Mak, 2019). For instance, when users search for health information on short video platforms, social media indicators positively influence their perception of credibility (Song et al., 2021). As HCI evolves into human–generative AI interaction, AI credibility assessment not only involves technical aspects such as system components and algorithm optimization, but also focuses on the practical performance of AI systems across diverse application scenarios and users’ trust perceptions in human–generative AI interaction. Therefore, recent research trends toward a comprehensive consideration of various factors affecting AI credibility assessment, including data, system, algorithm, and user factors in addition to information factors. Specific details and examples are provided in Table 4.2.
Dimensions | Categories | Examples |
---|---|---|
Data and information | Data quality | Data acquisition, data processing, and data storage (Hu et al., 2021; Liang et al., 2022; Zhang & Zhang, 2023) |
Data and information | Information source | News organizations/media with cognitive authority (Kim & Kim, 2020) |
Data and information | Information content | Accuracy, authenticity, completeness, and timeliness (Kim et al., 2021; Van Bulck & Moons, 2024) |
System | System interpretability | Audit integrity (Raji et al., 2020), trust calibration (Zhang et al., 2020), agency transparency (Araujo et al., 2020), explanatory element types (Ha & Kim, 2024; Pareek et al., 2024) |
System | System attribute characteristics | System reliability (Hayashi & Wakabayashi, 2017), system (service) quality (Chen et al., 2023), model performance (Zhang et al., 2021) |
System | AI anthropomorphism | AI anthropomorphism features (Chen & Park, 2021), AI voice features (Kim et al., 2022), AI warmth and ability (Chandra et al., 2022) |
Algorithm | Algorithm complexity | Complexity degree of the algorithm (Lehmann et al., 2022) |
Algorithm | Algorithm transparency | Algorithmic interpretability (Chen, 2024; Grimmelikhuijsen, 2023; Markus et al., 2021), algorithm reliability (Durán & Jongsma, 2021) |
Algorithm | Algorithm security | Algorithm errors (Schmitt et al., 2021) |
Algorithm | Algorithm fairness | Algorithm bias (Bernagozzi et al., 2021; Winkle et al., 2021) |
User | Interactive experience | Perceived interactive experience (Zhuang et al., 2024) |
User | Individual ability | Algorithm literacy (Shin, 2022) |
User | Sociocultural contexts | Social and cultural environment (Chien et al., 2018) |
4.4.1 Data and Information-related Attributes
In terms of data factors, data quality significantly impacts the credibility of medical AI. Issues such as data errors and omissions, the lack of standardized metadata, and the prevalence of unstructured data can undermine technical reliability, negatively affecting the credibility of medical AI (Zhang & Zhang, 2023). Additionally, stages of the AI data process (e.g., data design, data archiving, and data evaluation) also influence the credibility of the AI model (Liang et al., 2022).
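A minimal sketch of how such data-quality issues might be surfaced before a model is trained or deployed is shown below; the columns and values are hypothetical, and production pipelines would apply far richer validation.

```python
# Minimal sketch of a data-quality audit (hypothetical clinical-style data).
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> dict:
    """Summarize simple quality indicators that can undermine AI credibility."""
    return {
        "rows": len(df),
        "missing_rate_per_column": df.isna().mean().round(3).to_dict(),
        "duplicate_row_rate": round(df.duplicated().mean(), 3),
        "constant_columns": [c for c in df.columns if df[c].nunique() <= 1],
    }

df = pd.DataFrame({
    "age": [34, None, 51, 51, 29],
    "diagnosis_code": ["I10", "E11", None, None, "I10"],
    "site": ["A", "A", "A", "A", "A"],  # constant column: no usable signal
})
print(data_quality_report(df))
```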
As for information, content quality is a key factor influencing users’ perception of AI credibility. For instance, users’ overall trust in an AI system largely depends on its ability to provide accurate, authentic, complete, and timely information to support their tasks (Kim et al., 2021). Moreover, it has been found that content generated by ChatGPT often lacks completeness, which can easily mislead users and diminish its credibility (Van Bulck & Moons, 2024).
4.4.2 System-related Attributes
At the system level, explanation directly affects the transparency of AI, which in turn correlates positively with AI credibility. For example, research has shown that providing users with text-based explanations can enhance their trust in explainable AI systems more effectively than visual explanations (Ha & Kim, 2024). Additionally, the security and reliability of AI systems affect users’ perceived credibility. For instance, the service quality of AI chatbots positively influences customer loyalty by enhancing perceived value, cognitive trust, and emotional trust (Chen et al., 2023).
The anthropomorphism of AI is supported by its strong comprehension and innovative capabilities (Pelau et al., 2021), enabling AI systems to grasp the nuances of human–generative AI interaction. The anthropomorphic traits of AI enhance users’ trust, making AI systems with human-like expression styles more approachable and trustworthy (Chen & Park, 2021; L. Lu et al., 2022; Wang & Zhao, 2023). For instance, AI instructors with human-like voices tend to achieve higher perceived credibility among students than those with robotic voices (Kim et al., 2022). However, humans generally possess greater social appeal, competence, and credibility compared to robots (Beattie et al., 2020; Edwards et al., 2018; Finkel & Krämer, 2022).
4.4.3 Algorithm-related Attributes
In the realm of algorithms, specific characteristics such as fairness, accountability, transparency, and explainability are closely linked to trust and performance expectations (Shin, 2023). Algorithm transparency can significantly influence users’ trust in the information provided by the algorithm (Grimmelikhuijsen, 2023; Yeomans et al., 2019), as well as their confidence in algorithmic outcomes and decision-makers, ultimately impacting their interactive experiences and decision-making processes (Cadario et al., 2021). However, when the complexity of an algorithm falls below users’ expectations, increased transparency can actually diminish perceived credibility (Lehmann et al., 2022). Additionally, algorithmic bias can undermine users’ trust in AI systems, with gender bias being a particularly prominent issue in human–generative AI interaction (Bernagozzi et al., 2021; Winkle et al., 2021).
4.4.4 User-related Attributes
In early theories of information credibility, it was widely accepted that users’ understanding, judgment, and cognitive processing of information cues or components significantly affect the evaluation of information credibility during interaction with computers (Fogg, 2003). In the context of human–generative AI interaction, the interaction experience between users and AI systems likewise influences their evaluation of AI credibility. For example, older adults have had positive experiences watching short medical videos created by large language models, which has enhanced their trust in medical care (Zhuang et al., 2024).
From the user’s perspective, algorithm literacy is a key factor influencing the credibility assessment of AI; it represents an advanced stage of both information and digital literacy, manifesting a profound understanding of AI (Shin, 2022). It is indispensable in forecasting user decisions in human–generative AI interaction (Shin, 2022). In addition, social and cultural backgrounds also influence the evaluation of AI credibility (Chien et al., 2018). This aligns with sociocultural perspectives, which suggest that people’s evaluations of credibility are constrained by their particular cultural, systemic, and historical backgrounds (Mansour & Francke, 2017).
4.5 Challenges in Credibility Assessment of Human–Generative AI Interaction
The challenges in assessing AI credibility encompass issues related to transparency, ethics, security, privacy, and rights, as detailed in Table 4.3. Because AI models generally rely on complex machine learning and deep learning algorithms, users cannot directly understand the process of AI decision-making (Hamon et al., 2021). For example, “black box” problems are common in healthcare AI systems, characterized by a lack of interpretability and potential biases. This can clash with clinicians’ and patients’ expectations of a clear logical chain, thereby undermining trust in AI (Esmaeilzadeh, 2024). Additionally, as the amount of explanatory information provided by AI systems increases, especially in time-sensitive situations, managing information overload and identifying the most relevant details become a significant challenge (Ehsan et al., 2021).
Challenges | Examples |
---|---|
Transparency issues | Lack of explainability (Ehsan et al., 2021; Esmaeilzadeh, 2024), technical black box (Schoenherr et al., 2023) |
Moral and ethical issues | Gender prejudice (Winkle et al., 2021), moral conflict (Morley et al., 2020) |
Security and privacy issues | Algorithm deviation and error (Kaissis et al., 2020), data abuse (Kaissis et al., 2020), privacy violation (Mou & Meng, 2024) |
Power and responsibility issues | Responsibility attribution (Leo & Huh, 2020) |
Other risk issues | Misinformation dissemination (Esmaeilzadeh, 2020; Molina & Sundar, 2022), cognitive biases (Ehsan et al., 2021), weakening of human autonomy (Abbass, 2019; Ernst, 2020) |
Secondly, moral and ethical issues, such as gender bias (Winkle et al., 2021) and moral conflicts (Morley et al., 2020), must be thoroughly considered in assessing AI credibility. These issues often arise from algorithmic bias. Data security and privacy are also major challenges in AI credibility assessment. The inherent fragility of algorithms can lead to incorrect decisions when processing data, directly impacting the stability and security of AI systems (Zhang & Zhang, 2023). Additionally, using extensive data sets for credibility evaluation raises substantial privacy and security concerns. If data is misused, it can severely threaten user privacy and security (Kaissis et al., 2020). For example, users’ normative behaviors and reactions may be exploited by intelligent machines (Leong & Selinger, 2019) and their designers for monitoring, tracking, or fraudulent activities (Shahriar et al., 2023), posing a serious threat to personal privacy and potentially resulting in significant privacy violations.
Besides the above challenges, there is also the important issue of how to clarify the attribution of responsibility when AI systems fail or cause harm to users. This issue is especially critical and urgent when AI applications directly affect the health and safety of patients (Esmaeilzadeh, 2020), and solving it requires a combination of technical, legal, and ethical considerations.
It is important to recognize that misplaced or inappropriate trust in GAI can lead to a variety of potential consequences and risks, including the spread of misinformation (Molina & Sundar, 2022), cognitive biases (Ehsan et al., 2021), and reduced human autonomy (Abbass, 2019). For instance, AIGC, while fueling efficient content creation, also risks the spread of disinformation (Shusas, 2024).
4.6 Ways to Enhance the Credibility Assessment in Human–Generative AI Interaction
Due to the complexity and opacity of AI systems, users often find it difficult to understand and trust their decision-making processes and outcomes. Therefore, exploring new methods and technical solutions to enhance the credibility evaluation of AI systems is crucial. Currently, a key approach is to calibrate AI system trust and robustness using advanced technologies. Techniques such as machine learning (Carvalho et al., 2019), deep learning (Chander et al., 2024), federated learning (P. Chen et al., 2022; Lo et al., 2022), and SHapley Additive exPlanations (SHAP) (Sabharwal et al., 2024; Trindade Neves et al., 2024) are used to improve model transparency and system explanation. Toreini et al. (2020) proposed four classes of technologies to enhance AI credibility – Fairness, Explainability, Auditability, and Safety (FEAS) – which should be considered throughout all stages of the system life cycle.
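As an illustration of the SHAP technique mentioned above, the sketch below ranks the features of a simple model by mean absolute SHAP value using the third-party `shap` package; the dataset and model are placeholders chosen for brevity, not any system from the cited studies.

```python
# Minimal sketch of SHAP-based feature attribution for model transparency.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
model = RandomForestRegressor(random_state=0).fit(data.data, data.target)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data[:100])   # (100, n_features)

# Mean |SHAP| per feature gives a global importance ranking users can inspect.
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(data.feature_names, importance),
                        key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.3f}")
```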
The human–computer collaborative decision-making design method integrates system decision-making with human experience and cross-domain knowledge, aiming to enhance both the credibility and operational efficiency of AI systems. This approach includes measures such as optimizing configurations or designs to improve user–AI collaboration (Jain et al., 2023), incorporating domain-specific knowledge to interpret local data errors in AI-assisted decision-making (Zhang et al., 2020), and enabling users to provide feedback to algorithms (Molina & Sundar, 2022). These strategies can significantly enhance AI system credibility and ensure its reliable use across various application scenarios. Researchers urge various institutions – including government bodies, accounting firms, insurance companies, non-governmental organizations, civil society organizations, professional groups, and research institutions – to collaborate in exploring new ways to improve the credibility of human-centered AI and advance interpretable AI (Arnold et al., 2019; Shneiderman, 2020).
In addition to the above technical approaches, several theoretical frameworks have been proposed to support the assessment of AI system reliability from multiple dimensions. For example, the algorithmic audit framework can be applied across the whole life cycle of AI system assessment (Raji et al., 2020). The AI Public Trust Model (Knowles & Richards, 2021) and the AI Trust, Risk and Security Management (AI TRiSM) framework (Habbal et al., 2024) aim to improve the trustworthiness and reliability of AI, while the Situation Awareness Framework for Explainable AI (SAFE-AI) (Sanneman & Shah, 2022), confidence-measure frameworks for explainability (van der Waa et al., 2020), and multidimensional interpretative matrices (Hamon et al., 2021) can be used to assess the explainable behavior of AI systems.
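Confidence-based trust calibration can be illustrated with expected calibration error (ECE), which measures the gap between a model’s stated confidence and its observed accuracy. The sketch below is a simplified equal-width-bin version with hypothetical data, not the specific procedure of any framework cited above.

```python
# Minimal sketch of expected calibration error (ECE) for trust calibration.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Weighted gap between mean confidence and accuracy within bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Hypothetical model confidences and whether each prediction was correct.
conf = np.array([0.95, 0.9, 0.8, 0.75, 0.6, 0.55, 0.9, 0.85])
correct = np.array([1, 1, 0, 1, 1, 0, 1, 0], dtype=float)
print(f"ECE: {expected_calibration_error(conf, correct):.3f}")
```

A low ECE suggests the system’s expressed confidence is a trustworthy cue for users deciding when to rely on it; a high ECE flags over- or under-confidence that can miscalibrate user trust.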
Some auxiliary evaluation classifications have also been proposed to address the ethical problems raised by AI credibility, for example, the interpretable classification of evaluation systems (Sokol & Flach, 2020), the “dishonest anthropomorphism” classification of AI robots (Leong & Selinger, 2019), and the visual evaluation classification of gender bias in AI systems (Bernagozzi et al., 2021). These auxiliary evaluations can alleviate, to varying degrees, the diverse socio-cultural and technological ethical dilemmas raised by human–generative AI interaction, and help users make better use of GAI.
4.7 Domains of Credibility Assessment in Human–Generative AI Interaction
With the rapid advancement and widespread application of AI technology, credibility issues in human–generative AI interaction have become increasingly significant across various industries, including healthcare, finance, services, and education. Specific application scenarios such as smart healthcare, autonomous driving, investment forecasting, and intelligent customer service highlight these concerns, as detailed in Table 4.4. This underscores the need for heightened attention to AI credibility within both industry and academia. Addressing these issues is crucial for fostering effective interactions between users and AI technology and advancing the development of trustworthy AI.
Research on AI credibility assessment in the healthcare sector primarily focuses on several key areas. Firstly, as AI applications in disease prediction, diagnosis, treatment, and health management become increasingly prevalent (Holzinger et al., 2019; Zahlan et al., 2023), ensuring the credibility of AI outcomes is crucial. Trust in medical AI is considered foundational for the adoption of smart healthcare. Studies have shown that clinicians’ acceptance of AI is influenced by AI credibility (Stevens & Stetson, 2023), and there are also differences in how patients perceive trust in their doctors versus AI medical systems (Yokoi et al., 2021). Secondly, recent research has concentrated on the impact of the transparency and explainability of medical models on AI credibility (Albahri et al., 2023), particularly addressing issues such as the “black box” nature of algorithms that can lead to distrust or even aversion among patients (Zhang & Zhang, 2023). Finally, AI systems and tools such as ChatGPT (Van Bulck & Moons, 2024), AI chatbots (Weeks et al., 2023), and AI medical devices (Fehr et al., 2022) are central to assessing AI credibility in the healthcare field. Future research needs to focus on enhancing credibility through human–AI collaboration, addressing privacy, ethical, and responsibility issues in medical practice, and improving AI’s decision-making capabilities.
Research on AI credibility in education primarily focuses on students’ perceived trust in AI and the exploration of human–AI collaboration in online learning models. Students’ trust in AI teaching tools may affect the effectiveness of online education, and such studies are usually conducted through user experiments. For instance, the perceived credibility of AI instructors is influenced by AI voice features and their social presence (Kim et al., 2022). Additionally, personal perceptions and communication styles influence how students perceive the credibility of AI graders in the classroom (Abendschein et al., 2024). Researchers are also exploring how to foster a collaborative relationship between AI systems and human educators, rather than relying solely on AI or using it as a supplementary tool, to enhance teaching outcomes (M. Chen et al., 2022; Tossell et al., 2024). Meanwhile, large language models like ChatGPT present significant risks for higher education, including the spread of misinformation, a potential decline in students’ critical thinking abilities, and a reduction in the credibility of educational research evidence (M. Chen et al., 2022; Cukurova et al., 2020).
The evaluation of AI credibility in the marketing field is predominantly based on empirical research, supplemented by qualitative methods such as interviews. The primary focus of these studies is the impact of credibility on consumers’ AI experience and their willingness to purchase AI products. The perceived quality of user experience is a key factor influencing the credibility of AI systems. When consumers use AI products or platforms, their trust in the AI is shaped by the system’s interaction experience, the accuracy of its recommendations, the credibility of its sources, and its anthropomorphic characteristics (Alboqami, 2023; Khan & Mishra, 2024; Kim et al., 2021). Moreover, consumers’ perceived credibility of AI has a significant impact on both their intention to use AI and their actual purchasing behavior (Uzir et al., 2023). Traditional models like the Technology Acceptance Model (TAM) and the Stimulus-Organism-Response (S-O-R) theory provide a theoretical foundation for AI credibility research, though there is a need to extend these theories in the context of human–generative AI interaction (Cheng et al., 2022; Wang et al., 2023).
In the financial sector, AI credibility assessment has garnered significant attention from scholars, particularly in areas such as market volatility, credit risk evaluation, and fraud detection. Recent research has focused on developing more interpretable models, aiming to enable financial professionals to better understand and validate the reasoning behind AI-driven decisions, thereby enhancing transparency and trust in decision-making processes (Edunjobi & Odejide, 2024; Sabharwal et al., 2024).
4.8 Future Research Agenda
The issue of credibility as a cross-cutting research area has been the subject of extensive and sustained attention. In the new context of human–generative AI interaction, credibility research will continue to derive new propositions with the development of technology, changes in scenarios, the updating of measurement approaches, and the adaptive use of theories.
4.8.1 Reconceptualizing the AI Credibility
Compared with the website credibility of earlier years and the social media credibility of the Web 2.0 era, the research object of AI credibility has changed considerably, and the emergence of technologies rich in intelligent features may bring new changes to the concept of credibility. For example, the credibility perception and evaluation of AIGC differ considerably from previous credibility measurement of user-generated content (UGC): the production and dissemination of information content are not comparable in speed, scale, or degree of influence. In addition, the digital artifacts of the GAI era and the embodiment of intelligent agents need to be further incorporated into the conceptual kernel of AI credibility. Further, traditional credibility research conducted at the level of individuals urgently needs to break through toward a collectivist perspective; especially with the development of crowdsourcing, citizen science, and crowd science, the concept of AI credibility needs to take collective characteristics into account in order to better construct measurements of credibility in the interactions of different groups of people with GAI.
4.8.2 Examining Technological Advancement in GAI Credibility
Advances in algorithms have created a complex digital environment in which credibility assessment has become even more difficult. People no longer rely only on information cues (e.g., author, credentials, news source) to make credibility judgments; instead, they make a holistic assessment of the platform, the source, the content, and even the judgments of other users. In this view, how to use algorithm-driven new technologies to improve human capabilities such as decision-making, problem-solving, situational learning, and work performance will be an important future research topic. With the development of GAI, misinformation and disinformation (e.g., fake news, fake videos, fake pictures) can be intentionally created and quickly spread by various social bots. How to combat this dark side of AIGC will be an important topic for future research. While the dark side creates critical problems, the bright side of algorithmic affordances creates promising opportunities for credibility research: GAI could be a powerful tool for filtering misinformation, combating fake news, and supporting laypeople’s credibility judgments. When analyzing large-scale data to understand credibility judgment patterns using deep learning and computational methods, it will be important to incorporate previous credibility research in which individuals’ multiple dimensions of credibility assessment are characterized and identified.
4.8.3 Evolution of Credibility Measures in Human–Generative AI Interaction
We found that a variety of methodological approaches have been taken to investigate credibility issues. Traditional credibility research methods, such as interviews, focus groups, case studies, ethnography, grounded theory, and content analysis, as well as quantitative research methods, such as surveys, experiments, network analyses, sentiment analyses, and data-mining techniques, are often used in combination to assess credibility. Recently, various algorithmic techniques have been developed to detect falseness or inaccuracy in information. We call for more mixed-methods analyses in future AI credibility studies, especially analyses that combine the characteristics of the algorithms themselves with the characteristics of the people interacting with them and that use multi-source data to carry out AI credibility measurements. Furthermore, credibility researchers could examine AI credibility judgments from neuro-information science perspectives, using EEG/fMRI techniques to conduct in-depth studies of the interaction effects between information cues and judgments.
4.8.4 Building a Human-centered Theoretical Lens of AI Credibility
The development of technology and the richness of research objects place new demands on extending credibility theory. Some traditional dimensions of the concept of credibility – trustworthiness, expertise, authority, and objectivity – may no longer be sufficient in the context of GAI. The anonymity and “authorless” character of the next-generation Internet increase as machine learning and GAI play the roles of content creators and gatekeepers of information dissemination. Therefore, future research needs to enrich and deepen the theoretical foundation of credibility by building and testing new dimensions of AIGC credibility, drawing extensively on relevant theories from different disciplines. Concepts of AI credibility that are considered core dimensions of human judgment, such as fairness, openness, inclusiveness, and diversity, can be integrated into the development of machine learning and AI algorithms. Given that people will increasingly conduct holistic credibility assessments in the context of GAI, humanistic elements will be an important focus for future AI credibility constructs.
In addition, the ethical and moral issues surrounding the assessment of AI credibility are complex and multifaceted, involving data privacy, attribution of responsibility, transparency and interpretability during system design, testing, and feedback. Current research lacks a clear mechanism for attributing responsibility and does not adequately address the details of user consent in the collection and use of feedback data. Future research should take a humanistic approach by establishing a clear regulatory framework for AI credibility assessment, strengthening accountability mechanisms, and ensuring rigorous ethical scrutiny of user feedback collected through surveys and other approaches.
4.9 Conclusion
Credibility has always been a central topic of concern for information-related fields, and the intelligent era has brought new opportunities and challenges to credibility assessment. With the dual empowerment of digital and intelligent technologies, future credibility assessment of AI should focus on diverse human–generative AI interaction scenarios (Appelganc et al., 2022), with the goal of developing trustworthy AI (Peckham, 2024). This undoubtedly places higher demands on the theoretical and methodological tools of credibility assessment. Today, the credibility of GAI faces a series of ethical, information security, and data governance challenges. This review outlines the conceptual connotation of AI credibility and analyzes the main dimensions of GAI credibility. In terms of research content, it reviews the main measures, influencing factors, challenges, and emerging approaches to AI credibility assessment. We encourage researchers to strengthen interdisciplinary dialogue, exchange, and cooperation in the future; to further enrich theoretical lenses and innovate assessment methods; to expand the application scenarios of GAI credibility assessment; and to attend to the role of human–generative AI interaction experience in credibility assessment.