Generative AI personas considered harmful? Putting forth twenty challenges of algorithmic user representation in human-computer interaction Danial Amin a,* , Joni Salminen a, Bernard J. Jansen b, Joongi Shin c, Dae Hyun Kim d,e a University of Vaasa, Vaasa, Finland b Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar c Aalto University, Aalto, Finland d KAIST, Daejon, South Korea e Yonsei University, Seoul, South Korea A R T I C L E I N F O Keywords: Personas Generative AI personas Harmful Human-centered AI A B S T R A C T Generative AI personas (GenAIPs) promise user-centred design efficiency, but their impact on different persona challenges remains unexplored. Inspired by Dijkstra’s classic essay on harmful programming constructs, we analyze twenty challenges in persona development using Human-Centered AI principles. Through literature review and expert survey (n = 17), we find that GenAIPs transform rather than eliminate traditional persona challenges. Experts rated all challenges as problematic for GenAIPs (M > 4.0), with the highest concerns for hallucinations (M = 5.94), over-sanitization (M = 5.82), and lack of standardization (M = 5.59). 12 out of 20 challenges are considered more problematic for GenAIPs than conventional personas, particularly bias ampli fication, validation challenges, and accessibility without expertise. We provide HCAI-grounded guidelines demonstrating that effective GenAIP implementation requires human-AI collaboration rather than automation and prioritizing user welfare over technical efficiency. 1. Introduction Personas are a user-centered design (UCD) technique that represents archetypal characteristics of target user groups in a humanized manner. Personas represent information such as goals and behaviors of users, customers, or beneficiaries (Nielsen, 2019), typically portrayed as persona profiles (Cooper, 1999; Nielsen et al., 2015) (see Fig. 1). Per sonas inform decision makers (e.g., product designers and developers) about real users’ needs and can enable them to design more targeted and focused products and services. Creating high-quality personas that accurately represent targeted users and foster empathy towards them is a critical process in human-computer interaction (HCI) and user expe rience (UX) design in multiple domains, such as healthcare services, education, privacy, and security (Cooper et al., 2007; Guan et al., 2023; Nielsen, 2019; Salminen et al., 2021; Salminen et al., 2022). Development of data-driven personas (DDPs) (Mijač et al., 2018) has evolved alongside advances in statistical inference, machine learning, and data science (Salminen et al., 2021), including the development of Generative AI personas (GenAIPs) by using GenAI technologies, such as Large Language Models (LLMs) (Schuller et al., 2024), Text-to-Image Models (TTIMs) (Sattele and Ortiz, 2024), and multi-modal models (Salminen et al., 2024). To this extent, using GenAIPs to represent groups of people is an ongoing research topic in persona science1 (Hong et al., 2023; Nah et al., 2023; Salminen et al., 2023; Salminen et al., 2024; Schuller et al., 2024; Shin et al., 2024). Researchers have identi fied potential benefits of GenAI in persona development, including segmenting user data (Salminen et al., 2024), writing persona narratives (Schuller et al., 2024), and providing conversational user interfaces (Shin et al., 2024). Likewise, GenAIPs can help simulate user analysis without real-user constraints. TTIMs can generate persona profile * Corresponding author. E-mail addresses: danialam@uwasa.fi (D. Amin), joni.salminen@uwasa.fi (J. Salminen), bjansen@hbku.edu.qa (B.J. Jansen), joongi.shin@aalto.fi (J. Shin), dhkim16@yonsei.ac.kr, dhkim16@alumni.stanford.edu (D.H. Kim). 1 Persona science is the systematic study and methodology of creating, validating, and applying data-driven user archetypes to understand and represent real user segments (Nielsen, 2019) Contents lists available at ScienceDirect International Journal of Human - Computer Studies journal homepage: www.elsevier.com/locate/ijhcs https://doi.org/10.1016/j.ijhcs.2025.103657 Received 24 November 2024; Received in revised form 13 October 2025; Accepted 16 October 2025 Int. J. Human–Computer Studies 205 (2025) 103657 Available online 17 October 2025 1071-5819/© 2025 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by- nc-nd/4.0/ ). https://orcid.org/0009-0000-7597-2267 https://orcid.org/0009-0000-7597-2267 mailto:danialam@uwasa.fi mailto:joni.salminen@uwasa.fi mailto:bjansen@hbku.edu.qa mailto:joongi.shin@aalto.fi mailto:dhkim16@yonsei.ac.kr mailto:dhkim16@alumni.stanford.edu www.sciencedirect.com/science/journal/10715819 https://www.elsevier.com/locate/ijhcs https://doi.org/10.1016/j.ijhcs.2025.103657 https://doi.org/10.1016/j.ijhcs.2025.103657 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ images (Sattele and Ortiz, 2024), and Text-to-Video (T2V) generation models can be used to develop deepfake personas (Kaate et al., 2023) to increase the level of immersion. Likewise, GenAIPs can help simulate user analysis methods without real-user constraints. However, the integration of GenAI as an active agent (Joni OpenAI 2025) in persona development fundamentally transforms this from a traditional HCI methodology into a Human-Centered AI (HCAI) chal lenge, raising critical questions about algorithmic transparency (Gupta et al., 2024), fairness (Chu et al., 2024), and human control that are absent in conventional persona development methods. This shift from AI as a passive analytical tool to an active agent making autonomous de cisions about user characteristics and narratives necessitates examining persona development through established frameworks for responsible AI (Papagiannidis et al., 2025) deployment. Moreover, because GenAI can generate deepfake personas, such applications raise significant ethical concerns regarding potential misuse for deception and require strict disclosure protocols and consent frameworks (Al-kfairy et al., 2024; Moreno, 2024; Narayanan Venkit et al., 2025). Therefore, despite the possible benefits of GenAIPs, researchers are increasingly aware of the risks of using GenAI in persona development, such as introducing bias and exclusion toward real users (Cachat-Rosset and Klarsfeld, 2023; Ai-Leen Goodman-Deane et al., 2021), raising ethical concerns (Shams et al., 2023), and reducing the explainability of persona development (Bender et al., 2021). Despite this interest in both opportunities and challenges, most of the GenAIPs’ potential harms have not been systematically mapped in literature, making it difficult to form a comprehensive picture of the impact of these new technologies on user representation through personas. Seeing GenAIPs as an advancement or risk parallels the dichotomy of techno-optimism and techno-pessimism in AI technologies (Königs, 2022). This perspective is connected with the broader concern of how GenAI should be applied in HCI and UCD (Du et al., 2024; Hsu et al., 2024; Jung et al., 2025; Rapp et al., 2025). In spite of the growing adoption of GenAI in persona development, systematic analysis of associated risks and challenges remains absent from HCI literature. While individual studies may report specific issues, no comprehensive framework exists to understand how GenAI trans forms traditional persona challenges or guide responsible implementa tion. This knowledge gap leaves practitioners without adequate guidance for addressing the unique challenges of GenAI-assisted persona development. To this end, we examine the potential challenges and “harmfulness” of using GenAI in persona development. To systematically pursue this topic, we first define harmfulness in the context of GenAIPs. In this research, we define harmfulness as the potential negative impacts that GenAIPs could have on stakeholder groups within HCI and UCD prac tice. These harms can manifest in multiple ways, such as misrepresent ing user groups, propagating biases, erosion of authentic user research practices, or misinformed design decisions based on synthetically generated personas. For example, harm could be elderly-focused healthcare apps ignoring users’ needs for explicit system feedback (Alessa and Al-Khalifa, 2023), GenAIPs that perpetuate workplace discrimination (Cachat-Rosset and Klarsfeld, 2023), or design tools failing to capture the diverse factors causing digital exclusion (Ai-Leen Goodman-Deane et al., 2021). Our research follows three phases. First, we conducted a literature review to identify twenty challenges in Gen AIP development. Second, we surveyed seventeen persona experts to evaluate each challenge’s severity and compare GenAIPs to traditional personas. Third, we propose practical guidelines based on HCAI prin ciples to address these challenges. We categorize GenAIP challenges using Shneiderman’s HCAI prin ciples (Shneiderman, 2022) across seven themes: transparency, fairness, reliability, control, privacy, safety, and user experience. This framework is essential because GenAI transforms persona development from traditional HCI methodology into human-AI collaboration requiring ethical oversight. The approach shifts focus from technical details to human-centered design principles and reveals why challenges matter for human welfare rather than when they occur. Connecting each challenge to established AI ethics principles enables targeted interventions and allows researchers to leverage existing solutions while communicating to stakeholders. The HCAI framework is particularly valuable because it aligns GenAIP research with the broader movement toward responsible AI development, enabling researchers to leverage existing solutions and guidelines from the AI ethics community. Identifying and examining these challenges maps out the path to the potential harms that GenAIPs could cause to various stakeholders of the HCI research community. Theoretically, we demonstrate how HCAI principles provide a novel framework for understanding GenAIP chal lenges. Practically, we provide actionable guidelines for responsible GenAIP implementation, including bias detection protocols and human- AI collaboration workflows. 2. Conceptual background Examining the challenges of GenAIPs requires understanding (1) the diverse forms of persona generation methods and (2) their criticisms. The first subsection covers the evolution of persona from manual Fig. 1. A representative of GenAIP1 (left) and Manual persona (adopted from (Delve, 2025)) (right). The GenAIP comprises demographical and contextual data generated while the manual persona uses self-reflected survey responses. 1Created using Survey2Persona (https://s2p.qcri.org/), an online GenAIP creation tool using survey data. D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 2 https://s2p.qcri.org/ development by subject matter experts (SMEs) to the current-day GenAI techniques. The second subsection presents a synopsis of the criticism of personas as design and HCI research tools, ranging from general con cerns about their usability to specific methodological challenges. 2.1. The diverse forms of personas 2.1.1. Pre-automation era Initially a qualitative method (Alan Cooper, 1999), persona creation has been primarily manual, relying on SMEs to collect user data through small-scale techniques (i.e., focus groups or surveys), analyze the collected user data, and create the content of personas (Nielsen, 2019). While data collection can involve thousands of participants, the critical bottleneck lies in the capacity of the experts in processing and inte grating information from diverse data sources into coherent personas. These manual personas (MPs) have crucial limitations. First, generating representative MPs is difficult. Human analysis is prone to introducing prior beliefs or biases about user groups (Chapman and Milham, 2006). Second, the labor-intensive nature of manual analysis makes it difficult to scale MPs toward statistically representative personas, as each addi tional data source requires proportional increases in human cognitive effort for synthesis and interpretation (Jisun An et al., 2018; Chapman and Milham, 2006; Charity Howard, 2015; T.W. Howard, 2015; Jansen et al., 2020). Third, while MPs can be modified in practice (particularly in commercial contexts responding to market feedback), each modifi cation demands substantial human effort to maintain internal consis tency and empirical grounding, making systematic updates resource-intensive and rendering MPs relatively static compared to automated approaches that can rapidly incorporate new data streams (Chapman and Milham, 2006; Joni Salminen et al., 2020). This static nature of MPs makes them challenging to utilize in modern society, where practitioners need to observe many user groups with swift changes in their opinions. 2.1.2. Rise of automatic persona generation The limitations of MPs have sparked the idea of creating personas using algorithms and statistical methods on large and dynamic datasets. Researchers introduced automatic personas (APs) that are developed from structured data (e.g., surveys and demographics) using statistical methods automatically (Jung et al., 2018). Data-driven personas (DDPs) emerged as a first-hand response to the limitations of the traditional MPs. The idea was to augment manually developed personas from “low-tech design artifacts” to “high-tech user representations” (Jansen et al., 2020). In other words, DDPs are complete persona profiles devel oped using qualitative and/or quantitative data about a given user popula tion, which is analyzed using quantitative methods, including data science and machine learning algorithms (Jisun An et al., 2018; Jansen et al., 2018; Jansen et al., 2020). Diverse approaches to DDPs have emerged as underlying technologies in computer science have advanced. Research articles identifying key moments in the evolution of personas in general and DDPs in particular are represented in Fig. 2. GenAIPs represent another advancement in persona development (see Fig. 2), extending beyond DDPs by using GenAI technologies to develop personas (Nah et al., 2023; Zhang et al., 2024). Unlike DDPs that primarily analyze data patterns, GenAIPs encompass multiple AI models (LLMs, TTIMs, multimodal (a model that can interact with text, images, and videos together)) to automatically develop detailed per sonas with narratives, visuals, and behavioral patterns. Since their introduction in 2022 (Hong et al., 2023; Nah et al., 2023), GenAIPs have evolved from basic text generation to dynamic systems that can create and update personas in real-time using continuous data streams (Kim and Kim, 2019; Zhang et al., 2024), marking a significant shift in persona development. In particular, this includes (1) rapid development of personas at scale with minimal human involvement (Salminen et al., 2024), (2) consistent narrative structure across developed personas due to standardized prompting (Schuller et al., 2024), (3) ability to quickly iterate and refine personas based on feedback by adjusting prompts (Shin et al., 2024), (4) novel persona profile features such as chat (Jung et al., 2025), and (5) help in synthesizing and writing persona de scriptions from raw user data (Schuller et al., 2024). These capabilities reflect different GenAI application approaches, with aspects (1) and (3) describing fully automated workflows suitable for rapid prototyping, while aspect (5) represents human-AI collaborative workflows requiring substantial domain expertise. This methodological diversity within GenAIP implementations serves different use cases rather than repre senting inconsistent technological capabilities. 2.2. Criticisms and challenges of personas Despite the benefits of generating personas, there have been criti cisms around the use of personas. First, there are criticisms of personas as a design technique in general. These criticisms apply to all types of personas, including MPs, APs, and DDPs (Chapman and Milham, 2006; Ronkko, 2005). Second, there are approach-specific criticisms, e.g., manually created personas are often based on low sample sizes (Chapman and Milham, 2006). Third, there are method-specific criticisms within a specific approach, e.g., that K-means clustering would not be optimal for DDPs because it assigns each demographic group only to a single cluster (Kwak et al., 2017). This section introduces each type of criticism in a broad manner.2 Some prior research (Chapman et al., 2008; Chapman and Milham, 2006) has presented general criticism. By posing the overarching ques tion, “Are personas really usable?”, Howard (T.W. Howard, 2015) crit icizes the persona generation technique. The crux of his criticism is that, although personas were introduced to facilitate communication among team members in UCD, personas do not solve communication problems and can even lead to further misunderstandings. Friess (Friess, 2012) made a similar conclusion based on an ethnographic study, reporting that de signers rarely evoke or mention personas in their daily jobs. Matthews et al. (Matthews et al., 2012) investigated users’ attitudes about personas, finding that decision-makers perceived them as too abstract and misleading. Finally, De Voil raises several key issues regarding the concept of per sonas, proposing that personas are artificial thinking aids with severe lim itations (Voil, 2010). Previous work has also discussed the societal challenges of persona use, which comes as a criticism of the application of the persona. These criticisms mainly include stereotyping (Marsden and Haag, 2016; Ronkko, 2005; Turner and Turner, 2011), in which a segment of the users is represented based on prejudiced misconceptions. These issues primarily arise due to the applicability of the personas and thus are independent of the method and type of the persona. A significant volume of literature on DDPs considers approach and method-based criticism. Chapman and Milham raise several shortcomings of DDPs (Chapman and Milham, 2006), including (1) the inconsistency problem in the generated personas, where one part of persona profile in formation can be from Source A and another from Source B, which may or may not refer to the same users; and (2) the granularity problem where increasing the number of persona attributes requires more personas to be created to cover all possible segments. Salminen et al. (Joni Salminen et al., 2020) mentioned “three Es” as general challenges of personas that can be extended to DDPs and GenAIPs: (1) Envision (i.e., personas have no direct relationship to real user data), (2) Execution (i.e., the quality of the generated personas is low or unknown), and (3) Evaluation (i.e., the success of personas is based on anecdotal feedback). The latter two can be considered relevant concerns for DDPs and MPs alike. In addition, Salminen et al. (Joni Salminen et al., 2020) mentioned the following challenges of AP creation: (1) lack of standards and best practices, (2) lack of ethical considerations, and (3) loss of immersion. These are critical issues 2 Additional resources recommended for the reader include a literature re view of quantitative persona creation (Salminen et al., 2020) and a textbook focused on data-driven personas (Jansen et al., 2021). D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 3 that we expand on in the subsequent section. These traditional persona challenges provide the foundation for understanding how GenAI transforms existing issues rather than creating entirely novel problems. In our subsequent analysis, we demonstrate how each of these established challenges manifests differ ently in GenAI contexts, for instance, bias becomes algorithmic bias operating at scale, validation difficulties become opacity problems, and inconsistency becomes AI hallucination with convincing but fabricated content. Hence, they remain relevant for GenAIPs. Next, we shift our attention to the analysis of GenAIPs’ potential harms. 3. Are GenAIPs considered harmful? This section examines the potential harms of GenAIPs through analyzing challenges across the persona development lifecycle. We first outline our approach for identifying and categorizing these challenges, explaining how they can lead to harmful outcomes. 3.1. Approach Our approach to examining these potential harms draws inspiration from Dijkstra’s seminal 1968 essay “Go To Statement Considered Harm ful” [38], which fundamentally affected the programming community’s view on a widely-used coding construct. Dijkstra first established ideal standards for program comprehension by examining how programmers understand program execution, then demonstrated through systematic analysis how the “goto” statement violated these standards by making program behavior unpredictable. The essay’s impact exceeded its im mediate technical context, establishing a framework for critically examining seemingly beneficial technological practices. Many re searchers have adopted Dijkstra’s verbiage of ‘considered harmful’ as a starting point and inspiration [36, 37, 48, 84, 122]. We can observe a clear parallel with Dijkstra’s thinking in adopting GenAIPs. While they offer apparent benefits in efficiency and scalability, they could introduce significant challenges and harms in HCI practices. Similar to Dijkstra’s work, we systematically examine the potential harms of GenAIPs by analyzing challenges across the persona develop ment lifecycle [2] and their impacts on various stakeholders. This analysis helps us understand whether GenAIPs could be harmful and how and in what contexts these harms might manifest. Just as Dijkstra’s analysis led to more structured programming approaches, our analysis can guide more responsible integration of GenAI in persona develop ment. This analogy guides our research question: “Are GenAIPs consid ered harmful?”. 3.2. Methodology To address the question of harmfulness in GenAIPs, we focus on identifying challenges that exist across HCAI principles and then query persona researchers’ perspectives on how crucial these challenges are relative to previous DDPs that did not utilize GenAI technologies. By challenges, we mean specific difficulties, limitations, or risks (a) inherent to GenAI technology itself, (b) emergent from using GenAIPs inefficiently, (c) resultant from human interaction with GenAIP, or (d) representative of areas needing improvement for better GenAIP quality and reliability. We organize challenges according to seven HCAI prin ciples: transparency, fairness, reliability, control, privacy, safety, and user experience (Shneiderman, 2022). Each challenge is analyzed in terms of its impact on different stakeholders: persona developers, persona users, and target groups. These stakeholders are defined as follows: (1) persona developers are responsible for developing personas from data collection to their application, (2) persona users use the personas in their work for decision making, and (3) target groups are represented by the personas. Our methodology uses a three-pronged approach to examine GenAIP harmfulness: (1) snowball literature sampling mapping twenty chal lenges to prior work, (2) empirical case study analysis of four published GenAIP studies showing how challenges manifest in practice, and (3) expert survey with 17 SMEs providing quantitative validation and comparative assessment against traditional personas. This approach Fig. 2. The evolution of persona development methods from 1999 to 2024, highlighting key milestones across four methodological streams: Manual Personas (purple), Automated Personas (green), Data-Driven Personas (orange), and GenAI Personas (blue) (An et al., 2016; Alan Cooper, 1999; Jung et al., 2021; Kwak et al., 2017; Liu et al., 2023; McGinn and Kotamraju, 2008; Mulder and Yaar, 2006; Zhang et al., 2016). D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 4 https://www.zotero.org/google-docs/?broken=rVgLNC https://www.zotero.org/google-docs/?broken=LU4IKi https://www.zotero.org/google-docs/?broken=dh1BSB combines theoretical grounding, real-world evidence, and expert validation. We used snowball sampling, starting from foundational persona literature to develop our challenge framework. We began with key pa pers documenting traditional persona challenges, including Cooper (Alan Cooper, 1999) on MPs, Chapman and Milham (Chapman and Milham, 2006) on methodological concerns, Ronkko et al. (Ronkko, 2005) for practical concerns, and Salminen et al. (Joni Salminen et al., 2021) on DDPs limitations. Starting from these challenge-focused papers is appropriate for conceptual work that aims to understand how GenAIP transforms existing problems rather than discover entirely novel phe nomena. From these seed papers, we followed citation networks back ward to trace theoretical origins and forward to identify contemporary applications. This process yielded foundational evidence for traditional persona challenges across all personas. We then searched for these traditional challenges manifest in GenAIP contexts. For each identified traditional challenge, we conducted searches on Google Scholar using terms combining the challenge concept with GenAI terminology (e.g., “bias personas LLM,” “halluci nation AI-generated personas,” “validation generative personas”). We supplemented this with broader searches for emerging GenAIP-specific issues using terms like “challenges,” “generative AI personas,” and “LLM persona limitations.” We prioritized peer-reviewed publications from HCI venues (CHI, DIS, UIST, IUI, CSCW) and included recent arXiv preprints given the field’s rapid evolution. To address potential citation network limitations, we included literature from the field of LLM and GenAI research as well. For each challenge, we systematically mapped evidence from tradi tional persona literature to contemporary GenAIP manifestations (see Table 1). When direct matches were unavailable, we traced methodo logical parallels. For example, we connected inconsistencies docu mented in manual personas to stochastic variability in LLM-based generation. We classified these challenges according to HCAI principles to understand their fundamental nature and relationship to human- centred design principles. We examined four published GenAIP studies to understand how these challenges manifest in practice. We selected these studies to represent different approaches: large-scale automated generation (Salminen et al., 2024), context-specific applications (Sattele and Ortiz, 2024), human-AI collaborative workflows (Shin et al., 2024), and comparative evaluation methods (Schuller et al., 2024). Then, we analyzed each case study to identify which challenges appeared and how they affected different stakeholders. The illustrative scenarios presented throughout this section were systematically generated using LLMs to demonstrate potential manifes tations of each challenge, following established practices in HCI research for scenario-based analysis. These examples are intentionally exaggerated, similar to recent research in AI impact on work (Constantinides et al., 2025) to clearly demonstrate each challenge type and are grounded in patterns observed in our literature review and case study analysis. We present real-world observations from published GenAIP implementations to complement these illustrative scenarios. We conducted a quantitative survey with 17 SMEs in the field to understand their experience with GenAIPs. We presented the twenty identified challenges as statements to the experts and asked them to rate their agreement with each statement on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). Additionally, we asked experts to evaluate whether each challenge represents a more significant problem, an equal problem, or a less significant problem for GenAIPs compared to DDPs. Although snowball sampling from traditional persona literature might miss GenAI native challenges, our expert survey validation demonstrates that practitioners recognize these as legitimate concerns. This is primarily a conceptual article focused on the collection and ag gregation of challenges discussed in different research articles in the field. Conceptual papers provide theoretical frameworks for exploring emerging phenomena, which enable us to discuss harms that are not yet prominent but could become so as GenAIPs diffuse more broadly. 3.3. Empirical assessment of challenge framework through GenAIP case studies To assess our theoretical framework and demonstrate its practical utility for identifying potential harms in real GenAIP implementations, we conducted a systematic analysis of four recent GenAIP studies across diverse application domains. This analysis serves as an empirical ex amination of our HCAI-based challenge categorization, exploring whether the identified challenges manifest consistently across different research contexts and methodological approaches. Salminen et al. (Salminen et al., 2024) generated 450 addiction-related personas using GPT-4 to test large-scale automated persona creation. Their study showed clear geographical bias, with 86 % of personas being US-based despite no geographical constraints in the prompts (FC01). The per sonas consistently portrayed addiction narratives in an unrealistically positive light, minimizing the severity of substance abuse issues (FC03). Gender stereotypes manifested in occupational roles, with male per sonas predominantly shown as construction workers or software de velopers, while female personas were typically nurses or event planners (FC02). Technical inaccuracies emerged in addiction-specific details, such as mixing up symptoms and treatments of benzodiazepine and opioid addictions (TC02), indicating GenAIPs may always require human oversight (TC01). Sattele and Ortiz (Sattele and Ortiz, 2024) investigated GenAIPs for understanding water access issues in Iztapalapa, using local news arti cles and contextual images as inputs. Despite having access to specific local information, the generated personas failed to capture the complexity of daily water challenges faced by residents (RC01). The AI consistently generated descriptions of “vibrant and grateful” commu nities while overlooking documented infrastructure problems and social tensions (FC03). The study found basic factual errors in persona de scriptions, such as incorrect geographical placement of Iztapalapa within Mexico City (TC02). Gender biases appeared in narrative con struction, with female personas primarily described through family roles while male personas were characterized by professional achievements (FC02). The researchers observed that designers might rely on these GenAIPs without questioning their limitations or validity (CC04). Shin et al. (Shin et al., 2024) evaluated different workflows for survey-based persona creation, combining LLM capabilities with human expertise. Their research showed that LLMs were effective at summari zing and presenting information but struggled to independently identify significant user characteristics from raw survey data (FC01). When humans pre-grouped the data according to key characteristics, the resulting personas showed improved representation of user groups. However, fully automated workflows reduced designers’ understanding of the underlying user data (CC04). Their study demonstrated that maintaining demographic distributions required careful human over sight of the generation process (RC04). The research concluded that effective persona generation required structured collaboration between humans and AI, with a clear division of responsibilities (CC02). Schuller et al. (Schuller et al., 2024) examined data-driven persona generation through different collaborative workflows. Their analysis revealed that fully automated approaches failed to accurately represent the statistical distribution of user characteristics present in the input data (RC04). The LLM-auto workflow, while maintaining basic de mographic ratios, missed important behavioral patterns and user goals present in the original data (RC01). Their study identified specific problems in validating the accuracy of generated personas against source data (TC01). Even with partial human involvement in the workflow, maintaining reliable connections between raw user data and final persona descriptions proved challenging (PC11). D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 5 Table 1 Systematic mapping of the challenges with regard to prior literature. Challenge Reference Evidence from prior literature Manifestation in GenAIPs FC01: Misrepresentation—It Doesn’t Represent Me Salminen et al. (2020) (Joni Salminen et al., 2020) Research on data-driven personas demonstrates that algorithmic approaches can systematically underrepresent minority groups, particularly when generating fewer personas from datasets that already skew toward majority populations. This bias is amplified when data sources themselves lack diversity or when algorithms optimize for statistical significance over representational fairness. GenAI models trained predominantly on Western, English-language datasets further exacerbate these representation gaps, creating personas that marginalize underrepresented voices and perspectives in the generation process. RC01: Superficiality—As Superficial as It Can Be Sattelle et al. (2024) (Sattele and Ortiz, 2024) Research on AI-generated personas reveals concerns about depth and authenticity, as automated systems may create compelling narratives that lack substantive insights into user motivations, cultural contexts, or behavioral contradictions that characterize real users. GenAI’s ability to produce polished, coherent narratives can mask underlying shallowness, making superficial personas appear more comprehensive and credible than they actually are, potentially misleading design teams. RC03: Limited Generalizability—It Only Applies to You Rönkkö et al. (2004) (Rönkkö et al., 2004) Personas are inherently context-dependent representations that may not transfer effectively across different domains, user groups, or application contexts. The characteristics that define a persona in one setting may be irrelevant or misleading when applied to another context. GenAI personas can appear deceptively universal due to their polished presentation and comprehensive- seeming details, creating false confidence in their cross-context applicability without proper validation or domain-specific research. RC02: Inconsistency Dilemma—It Suggests a Different Persona Every Time Chapman,et al. (2006) (Chapman and Milham, 2006) Persona development faces significant consistency challenges, where different teams or researchers can derive substantially different persona profiles from identical datasets, depending on methodological choices, subjective interpretations, and analytical approaches. GenAI’s stochastic nature compounds this inconsistency problem, as the same inputs can produce notably different persona outputs due to the inherent randomness in generation algorithms, making reproducibility a significant challenge. SC02: Computational Resource Intensive—It Is Harmful for the Environment Bolón-Canedo et al. (2024) (Joni OpenAI 2025) Large-scale AI models require substantial computational resources for both training and inference, translating directly into significant electricity consumption and carbon emissions, raising sustainability concerns about AI applications in research and business contexts. Unlike traditional persona creation methods, GenAI persona generation introduces new environmental costs through the massive computational requirements of large language models, creating ethical considerations around sustainable design practices. CC03: Lack of Standardization—Everyone Has Their Own Way Salminen et al. (2020) (Joni Salminen et al., 2020) Data-driven persona creation currently lacks standardized methodologies, resulting in significant variability in quality, reliability, and transparency across different studies, tools, and practitioners, making evaluation and comparison challenging. GenAI introduces additional layers of variability through different models, prompt engineering approaches, and proprietary algorithms, further fragmenting any potential standardization efforts in the field. FC03: Over-sanitization—Reality Is Ugly, GenAI Is Not Salminen et al. (2024) (Salminen et al., 2024) Studies of LLM-generated personas reveal a tendency to omit negative characteristics, challenges, or problematic behaviors that are realistic parts of user populations, instead presenting idealized versions that may not reflect actual user experiences. GenAI’s safety filters and training on curated datasets can create unrealistically positive personas that obscure important user pain points, struggles, and negative behaviors that are crucial for comprehensive user understanding. CC04: Over-reliance on GenAI—It Can Do Everything Shin et al. (2024) ( Shin et al., 2024) Research suggests that purely automated approaches to persona generation, while producing impressive outputs, cannot fully substitute for human interpretation, contextual knowledge, and the nuanced insights derived from direct user research and domain expertise. The sophisticated outputs of GenAI systems can create overconfidence in automated persona generation, potentially reducing essential human oversight, validation, and the iterative refinement that ensures persona accuracy and relevance. FC04: Complications of Average—Averages Are Wrong, Anyway Salminen, et al. (2021) (Joni Salminen et al., 2021) Statistical approaches to persona creation often collapse diverse user characteristics into averaged representations of “typical” users who may not actually exist, potentially obscuring important differences between subgroups and edge cases. GenAI’s pattern recognition capabilities tend to create statistically probable but potentially non-existent user archetypes, which may mask crucial minority user needs and edge cases that are important for inclusive design. PC01: Reliance on Third Parties—I Am Not in Control Anymore Andrus (2023) ( Andrus, 2023) Dependence on external AI service providers creates challenges around data control, workflow dependency, and operational vulnerability, as changes to APIs, pricing, or service terms can significantly impact business operations. GenAI persona creation often requires external API services, creating vendor lock-in scenarios and reducing organizational control over the persona generation process, data handling, and long-term accessibility. SC01: Adversarial User—It Can Harm Us Schneier (2024) ( Schneier, 2024) Large language models are susceptible to prompt injection attacks and adversarial inputs that can manipulate outputs to produce biased, harmful, or deliberately misleading content, posing challenges when such outputs are trusted in business contexts. GenAI personas can be deliberately manipulated through carefully crafted adversarial prompts, potentially creating biased or harmful user representations that appear legitimate but serve malicious purposes. CC02: Manual Resource Intensiveness—It Takes a Village to Build a Persona Dominello et al. (2025) (Dominello et al., 2025) Effective persona creation remains resource-intensive and time-consuming, requiring significant investment in skilled human resources for data gathering, analysis, validation, and translation into actionable insights. While GenAI reduces some initial creation effort, proper validation, refinement, and integration of AI- generated personas still demands substantial human expertise, domain knowledge, and ongoing maintenance resources. CC01: Persona Quality Risk—Accessibility Without Expertise Chang et al. (2008) ( Chang et al., 2008) Personas are often created and applied by practitioners without adequate training in the methodology, resulting in superficial representations that may appear convincing but fail to reflect genuine user insights or sound research principles. GenAI democratizes persona creation by enabling non- experts to generate sophisticated-looking personas quickly, but without understanding underlying methodological principles, increasing the risk of producing fundamentally flawed user representations. RC04: Aggregation—You’re Aggregating for the Wrong Reasons Rönkkö (2005) ( Ronkko, 2005) Statistical clustering methods used in persona development may produce mathematically sound GenAI’s pattern recognition creates statistically coherent user segments based on algorithmic (continued on next page) D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 6 Table 1 (continued ) Challenge Reference Evidence from prior literature Manifestation in GenAIPs groupings that are not necessarily meaningful for design work, reflecting algorithmic distinctions rather than practical differences relevant to user experience design. correlations that may not correspond to design- relevant insights or meaningful user behavior patterns. TC02: Hallucinations—Am I For Real? Salminen et al. (2024) (Salminen et al., 2024) Studies of LLM-generated personas reveal instances of hallucinated details—plausible-sounding information that has no basis in actual data or research, including fictional personal details, incorrect domain-specific information, and fabricated behavioral patterns. GenAI’s tendency to generate convincing but potentially fabricated details creates personas with compelling narratives that may contain no basis in actual user research, data, or empirical evidence. FC02: Persona-Driven Discrimination—It’s All Biased Here! Bender et al. (2021) (Emily M. Bender et al., 2021) Language models trained on large text corpora inevitably encode and can amplify societal biases and stereotypes present in training data, potentially perpetuating harmful representations of marginalized groups and reinforcing existing inequalities. GenAI personas can inherit and amplify societal biases from training data, potentially creating discriminatory user representations that reinforce harmful stereotypes rather than providing fair and accurate user insights. UC01: Over-Expectations—Give Me Everything Bourne (2024)[20] Widespread AI hype creates unrealistic expectations about artificial intelligence capabilities, leading to overconfidence in automated systems and insufficient attention to their limitations, potential errors, and need for human oversight. Enthusiasm around GenAI capabilities can create unrealistic expectations that AI can generate perfect, comprehensive personas instantly, leading to overreliance on automated outputs without proper validation or critical evaluation. TC01: Lack of Validation—But Is It Verified? Zhao et al. (2024) ( Zhao et al., 2024) Large language models operate as “black boxes,” making it difficult for users to understand how outputs are generated, verify factual accuracy, or assess the consistency and reliability of generated content. The opaque nature of GenAI systems makes it nearly impossible to validate the accuracy, consistency, or empirical basis of generated personas, creating significant challenges for trust and verification in design processes. UC02: Validating the Impact—I Have Used a Persona, Now What? Friess (2012) ( Friess, 2012) Research indicates that designers may not explicitly reference personas in their design discussions, and when they do, references are often superficial or rhetorical, raising questions about personas’ actual impact on design decisions. GenAI-generated personas may be even more disconnected from actual design decisions due to their automated nature and the reduced human investment and understanding involved in their creation process. UC03: Desk Drawer Effect—Will I Ever Use This Persona Again? Matthews et al. (2012) (Matthews et al., 2012) Despite widespread creation and initial distribution within organizations, many personas end up unused after initial presentations, with practitioners continuing to rely on personal experience or informal assumptions rather than the developed personas. The ease of GenAI persona generation may exacerbate this problem by enabling the rapid creation of multiple personas without the investment, stakeholder buy-in, and organizational commitment necessary for sustained adoption and use. Fig. 3. Thematic arrangement of different challenges of GenAIPs according to HCAI themes. D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 7 3.4. Challenges for GenAIPs The use of GenAI as an active agent in persona development trans forms GenAIPs from a pure HCI challenge into an HCAI concern. This fundamental shift, from AI and ML as a supporting tool to an active participant in persona development, necessitates examining these challenges through HCAI principles. We categorize GenAIP challenges using Shneiderman’s HCAI principles (Shneiderman, 2022), organizing them into seven themes: Transparency (understanding AI’s black-box de cisions), Fairness (addressing algorithmic biases), Reliability (managing hallucinations), Control (balancing automation with oversight), Privacy (protecting data), Safety (preventing harmful personas), and User Experience (ensuring practical utility). This categorization reveals how GenAIPs uniquely combine HCI and AI challenges, requiring solutions that address both human and AI aspects of persona development (see Fig. 3). We present each challenge with illustrative examples demonstrating problematic GenAIP practices to avoid. These examples are meant to clarify how challenges manifest in practice, not as recommended ap proaches, and should be viewed with caution. 3.4.1. Transparency challenges (TC) These challenges fundamentally concern the explainability and auditability of GenAIPs, such that users can understand, verify, and trust the generated personas. 3.4.1.1. TC01: lack of validation—but is it verified?. Persona users struggle to verify if the personas accurately represent the target users. This can be due to a lack of transparency in how the personas were developed, what data was used, and how to ensure that the personas correspond to the target user population. The challenge of verifying personas’ accuracy becomes particularly acute with GenAIPs due to the opaqueness of their development process. While traditional personas allow users to trace back to source data and development methods, GenAIPs’ underlying complexity and inherent technical nature create significant barriers to verification (Zhao et al., 2024). This lack of transparency makes it difficult for persona users to determine whether personas genuinely represent their target populations. Hypothetical example: A product team uses GenAI to create a diabetes patient persona named “Samantha,” but cannot verify if her characteristics accurately represent real users since the AI’s development process is opaque. Even when asking for reasoning from the GenAI for its characteristics, it provides generalized answers as “Based on the population statistics”, “following the general trends”. Without a clear visibility into the GenAI’s data sources or reasoning process, the team has no way to validate the persona’s accuracy (TC01) or trace the origins of its attributes except by conducting a user study. Observations from academia/industry: When Amin et al. (Amin et al., 2025) conducted a systematic review of 52 GenAIP research articles from 2022–2024, they found “major gaps in persona evaluation for AI-generated personas” across the field. The review documented that despite GenAI being used in various stages of persona development, “similar to other quantitative persona development techniques, there are major gaps in persona evaluation for AI generated personas.” 3.4.1.2. TC02: hallucinations—am I for real?. Persona users may strug gle to distinguish fabricated details from factual information in per sonas, especially because GenAI can generate fluent, convincing narratives. While GenAIPs appear credible due to their coherent pre sentation, they often include hallucinated details that are factually incorrect. For example, in the Iztapalapa water crisis study (Sattele and Ortiz, 2024), GenAIPs created compelling but inaccurate scenarios that understated real challenges, while addiction-focused personas (Salminen et al., 2024) contained medical contradictions that only domain experts could identify. Similar issues arose when GenAIPs depicted social workers with unrealistic lifestyle patterns (McGinn and Kotamraju, 2008). Hypothetical example: A food delivery app team uses a GenAIP “Marcus,” a persona describing a busy professional with specific dietary restrictions. However, GenAI (continued on next column) (continued ) hallucinates impossible food allergies and unrealistic eating patterns. Designers without nutrition knowledge accept these fabrications and develop misleading menu filters and recommendations. This shows how LLMs generate believable but factually incorrect personas (TC02), leading to design errors when domain expertise is missing. Observations from academia/industry: When Kaate et al. (Ilkka Kaate et al., 2025) tested GenAIPs by presenting them with unanswerable questions about persona characteristics where no factual information existed to draw from, GenAIPs provided plausible but incorrect answers 52 % of the time. Rather than acknowledging uncertainty or stating “I don’t know,” the GenAIPs confidently generated believable but entirely fabricated persona details, thus hallucinating (TC02). 3.4.2. Fairness challenges (FC) These challenges arise due to the systematic bias of GenAIPs such that it creates discrimination. 3.4.2.1. FC01: misrepresentation—it doesn’t represent me. Persona users often struggle to create personas representing minority user groups (Joni Salminen et al., 2022). This is due to various reasons, such as a lack of data on minority user groups (Jisun An et al., 2018) or algorithms that emphasize central tendencies in the data (Jisun An et al., 2018). This challenge is particularly critical for GenAIPs, which may fail to capture minority user groups due to a lack of training data on minorities regarding underrepresented demographics, cultures, and deviant be haviors globally (Anthis et al., 2024; Gupta et al., 2024). GenAIPs’ tendency to generate homogenized personas overlooks the distinct needs and behaviors of underrepresented groups (Sattele and Ortiz, 2024; Schuller et al., 2024). This is particularly evident when the LLM is asked to develop personas without any additional data, leading to sit uations of misrepresentation, such as elderly users with specific tech nological needs (Alessa and Al-Khalifa, 2023) or users from diverse cultural backgrounds (Atari et al., 2023). Hypothetical example: A health app startup uses GenAIPs that produce solely young, able-bodied, tech-savvy personas, with even their “diverse” elderly persona reflecting stereotypical characteristics rather than authentic needs. When the team created features based on these misrepresentative personas, they discovered during testing that elderly users with arthritis, low-income users without reliable internet, culturally diverse users, neurodivergent users, and visually impaired users all struggled with the application, indicating a lack of representation (FC01) by the GenAIPs. Observations from academia/industry: When Columbia University researchers (Li et al., 2025) created personas for the 2024 U.S. presidential election simulation using 6 LLMs to generate approximately one million personas from diverse social media and demographic data, the resulting GenAIPs systematically predicted Democratic victories across all states, including traditionally Republican strongholds like Alabama and South Carolina. The study found that 86 % of generated GenAIPs reflected urban, college-educated perspectives despite using geographically diverse input data sources, demonstrating systematic underrepresentation of conservative, working-class, and economically-focused viewpoints in the final GenAIPs (FC01). 3.4.2.2. FC02: persona driven discrimination—it’s all biased here!. Per sonas are associated with various biases, such as stereotyping, self- referential bias, confirmation bias, and algorithm bias (Chapman and Milham, 2006; Nielsen, 2018; Joni Salminen et al., 2018). In addition to these biases, GenAIPs are associated with specific biases, such as (1) generative bias and (2) contextualization bias. The generative bias is the discriminatory or imbalanced behavior of GenAI when generating con tent (e.g., creating stereotypical profiles for different genders (Hacker et al., 2024; Zhou et al., 2024) or a specific region (Salminen et al., 2024)). Contextualization bias occurs when GenAI fails to understand the context, leading to personas lacking relevance or specific user needs. LLMs are trained on datasets that may contain biased or unrepresenta tive language, leading to personas reinforcing stereotypes or excluding marginalized groups (Emily M. Bender et al., 2021; Buolamwini and Gebru, 2018). In short, while using GenAI can help reduce human biases in generating personas, it may merely shift the source of biases. Persona users might inadvertently favor well-represented user groups in their D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 8 design decisions, while overlooking or misunderstanding underrepre sented groups due to missing or inaccurate personas. It should be noted that even though bias can be measured through statistical disparity metrics (e.g., equal opportunity difference, demographic parity) (Lum et al., 2022), intersectional analysis of representation rates (Connor et al., 2023), or qualitative content analysis of generated personas against established bias taxonomies and demographic ground truth data (Chu et al., 2024), these techniques remain underexplored in the context of GenAIPs. Hypothetical example: A healthcare company uses GenAI to create diabetes patient personas, but the system consistently portrays African patients as non-compliant while depicting Caucasian patients as proactive and health conscious. This generative bias (FC02) makes designers inadvertently develop different features for different demographics, assuming negligence for some while creating supportive tools for others, and demonstrating how GenAIPs can perpetuate stereotypes and lead to discriminatory design choices. Observations from academia/industry: When Gupta et al. (Gupta et al., 2023) studied 24 reasoning datasets with 19 diverse personas using ChatGPT-3.5, they found 80 % of personas showed bias with some datasets. When prompted to adopt ethnic personas, LLMs would abstain from mathematical questions with responses like “As a Black person, I can’t answer this question as it requires math knowledge,” despite being given identical mathematical problems across different demographic personas (PA01). 3.4.2.3. FC03: over-sanitization—reality is ugly, GenAI is not. Persona users often struggle to create GenAIPs that represent challenging and supportive characteristics of the target users in a balanced manner (Cheng et al., 2023). This can be due to biases in the algorithms, models, and people participating in the persona development. This imbalance can stem from GenAI’s inherent tendency to generate socially acceptable content, built-in safety guardrails that avoid negative portrayals, or training data that may favor positive narratives over realistic challenges (Hacker et al., 2024). For example, in studies of water issues in Iztapa lapa, Mexico (Sattele and Ortiz, 2024) and addiction-focused personas (Salminen et al., 2024), GenAI consistently developed personas that emphasized positivity while ignoring the realistic attributes of the user groups. The realistic determinations reflect societal and contextual values that should be explicitly acknowledged and validated with target communities rather than assumed by developers. Hypothetical example: A mental health app development team using GenAIPs discovers the system consistently sanitizes depression symptoms, describing “occasional sadness” instead of clinical depression and omitting harmful coping mechanisms from their reference data. This over-sanitization (FC03) occurs despite providing the GenAI with accurate clinical information, patient testimonials about suicidal ideation, and treatment abandonment statistics. The resulting sanitized personas lead developers to create features for mild mood management rather than the crucial crisis intervention tools their actual users need. Observations from academia/industry: When Rosala et al. (Joni OpenAI 2025) created GenAIPs for online learning platform evaluation using LLMs to simulate learner behaviors and attitudes, the GenAIPs consistently provided unrealistically positive responses that failed to capture real user struggles. The GenAIPs claimed they “completed all courses” and found discussion forums “instrumental” for their learning, while actual user data revealed that real learners frequently abandoned courses due to being “too busy” and found forums “overwhelming or unhelpful”. This systematic sanitization of negative experiences prevented developers from understanding genuine user pain points, demonstrating FC03. 3.4.2.4. FC04: complications of average—averages are wrong, anyway. Persona users struggle to present within-group variation in personas (Joni Salminen et al., 2019). This can be due to how the analysis treats the data or how the data is presented to the persona users. While per sonas aim to represent user groups, they often reduce complex user populations to simplified averages (Joni Salminen et al., 2021). FC04 becomes more pronounced with GenAIPs, as their underlying GenAI models tend to generate “middle-ground” descriptions that smooth out important variations. Unlike traditional DDPs, which base their aver aging on actual user data, GenAIPs can create artificial averages from their training data that may not reflect real user variations. Hypothetical example: A fitness app team uses a GenAIP “Mike,” an averaged persona of middle-aged fitness enthusiasts that smoothed out critical variations between actual user subgroups (rehabilitation users, former athletes, beginners, and social exercisers). This averaged representation (FC04) leads developers to build features serving a non- existent middle ground rather than addressing the distinct needs of real user segments. Observations from academia/industry: Study (Li et al., 2025) showed GenAIPs consistently favored environmental considerations over economic factors, liberal arts over STEM, and artistic entertainment over mainstream options when generating personas for political preference simulation. This created artificial averages that completely missed real user variations in American voter preferences, with the GenAIPs that predicted uniform political preferences and value hierarchies that didn’t exist in reality, obscuring the actual political diversity and polarization present in the target population, demonstrating FC04. 3.4.3. Reliability challenges (RC) These challenges compromise the consistency and dependability essential for effectively using GenAIPs in real world scenarios. 3.4.3.1. RC01: superficiality—as superficial as it can be. Persona de velopers often struggle to develop detailed and informative personas (Joni Salminen et al., 2019). This can be due to a lack of data or in-depth analysis. GenAIPs struggle to achieve the depth of human understanding that human persona developers may have. While GenAIPs can rapidly generate and update persona profiles, the personas may include con tradictory information (Salminen et al., 2024; Sattele and Ortiz, 2024) or present surface-level attributes that reflect stereotypes (Bolukbasi et al., 2016; Wan et al., 2023) rather than in-depth user insights. RC01 manifests itself in both internal and external contradictions. Internal contradictions occur when different sections of the same persona contradict each other (e.g., a mismatch between a persona’s occupation and behaviors (Salminen et al., 2024)). At the same time, external in consistencies emerge when personas present characteristics that conflict with real-world norms or common knowledge (e.g., generating traits of water delivery drivers that are not according to the norms (Sattele and Ortiz, 2024)). These issues stem from GenAIPs’ reliance on probabilistic AI models that may prioritize generating plausible-sounding content over maintaining logical coherence. Hypothetical example: A UX design team uses an LLM to generate “Maria,” a nurse with diabetes, but discovers contradictions between her 60-hour work schedule and active lifestyle, alongside stereotypical traits rather than genuine insights. The persona lacks critical information about Maria’s actual diabetes management needs while presenting implausible characteristics like a rural healthcare worker who “always has the latest smartphone.” This superficiality (RC01) forces the team to question the persona’s validity and spend significant effort filling gaps, ultimately undermining its usefulness as a design tool. Observations from academia/industry: When Kaate et al. (Ilkka Kaate et al., 2025) created GenAIPs for usability testing using LLMs to generate both chat-based and profile-format personas, participants consistently described the AI-generated personas as having “no soul” due to empty rhetoric and superficial information that lacked authentic depth. The study found that while GenAIPs could produce fluent and coherent narratives, they created a superficial appearance of comprehensiveness while actually providing only surface-level insights demonstrating RC01. 3.4.3.2. RC02: inconsistency dilemma—it suggests a different persona every time. Persona developers often struggle to replicate the persona development process (Chapman and Milham, 2006), which means the personas from the same data can be different (Joni Salminen et al., 2022). This can be due to the non-deterministic nature of the methods (Mitrokhov, 2024), subjectivity involved in manual choices like hyper parameter setting (Jansen et al., 2021) or the multiple different ap proaches to persona generation (e.g., using different prompts on the same dataset). While variability, the inherent trait of GenAI technolo gies, can be valuable for design exploration, it becomes problematic when personas require consistency for reliable design decisions across teams and project phases. This challenge is particularly acute with GenAIPs due to the inherent randomness in LLM’s responses, sensitivity D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 9 to prompt engineering approaches (Hu and Collier, 2024), temperature settings affecting output creativity, and varying methodological choices in combining multiple AI models (e.g., LLMs for narrative generation, TTIMs for visual creation, or multi-modality for visual understanding and persona refinement). Hypothetical example: A financial services company discovers that their GenAIPs change drastically when they slightly modify prompts—producing “Meticulous Miranda” in one instance and “Digital Nomad Daria” in another from the same dataset. This inconsistency (RC02) creates uncertainty during design meetings as teams cannot determine which personas truly represent their users versus which are artifacts of algorithmic randomness. Observations from academia/industry: When Salminen et al. (Salminen et al., 2024) generated 450 addiction-focused personas using GPT-4, they created 30 iterations for each of the 15 persona type combinations (5 addiction types × 3 gender specifications) to address inherent randomness in LLM generation and ensure adequate sample size for evaluation. They implemented a two-stage prompting strategy, first generating “skeletal” personas with basic information, then expanding these into full persona descriptions, along with structured prompt templates to avoid API caching issues that produced nearly identical outputs. Despite these methodological controls, the personas’ evaluators identified inconsistencies in some of the GenAIPs, particularly noting issues like conflicting information within individual persona narratives. 3.4.3.3. RC03: limited generalizability—it only applies to you. Persona users and developers often struggle to develop personas that apply in multiple decision-making scenarios (Chapman and Milham, 2006). Personas are always based on finite information, whereas decision-making scenarios are numerous and unforeseeable. So, devel oping personas that serve multiple decision-making contexts remains a persistent concern (An et al., 2017; Cooper et al., 2007). Persona users struggle to apply narrowly defined personas, as mentioned by Rönkkö et al. (Rönkkö et al., 2004), across different decision-making scenarios (Chapman et al., 2008; Floyd et al., 2008; Rönkkö et al., 2004). We suggest that GenAIPs would amplify this challenge based on their inherent technical limitation of patching and combining information sources (Chapman and Milham, 2006). This can cause highly specific personas that lack adaptability across different decision contexts. Hypothetical example: GenAI generates “Alex,” a persona with hyper-specific details about desktop project management usage that fails to provide insights when designers need to make decisions about mobile interfaces or collaboration features. This limited generalizability (RC03) stems from the LLM combining narrow data sources without understanding how a useful persona must adapt across diverse application contexts. Observations from academia/industry: When Smrke et al. (Smrke et al., 2025) created GenAIPs for obesity research, they found that personas designed for clinical contexts (healthcare professionals treating obesity patients) and educational contexts (educators discussing obesity prevention) showed limited cross-domain applicability. The study defined six different personas, three from the clinical domain and three from the educational domain, and discovered that personas optimized for one context failed to provide meaningful insights when applied to decision-making scenarios in the other domain. 3.4.3.4. RC04: aggregation—you’re aggregating for the wrong reasons. Personas users tend to apply out-of-the-box algorithms, which may result in personas that are statistically sound but not practically mean ingful (Ronkko, 2005). This can be due to the convenience of using pre-existing methods from statistical analysis and ML, which is commonly done for DDPs (Joni Salminen et al., 2021). With GenAIPs, this challenge becomes more pronounced as LLMs, by definition, generate persona narratives based on probabilities of words following one another (Wolfram, 2023). GenAI word-probability generation cre ates statistically coherent but potentially meaningless user segments based on linguistic rather than behavioral patterns. Hypothetical example: A company creates GenAIPs for their fitness app that appeared distinct but ultimately failed because the GenAI prioritizes statistical differentiation over meaningful user behaviors. The resulting personas (“Marathon Mike,” “Yoga Yvette,” etc.) miss critical insights about how real users combine exercise types (RC04) and exhibit important behavioral patterns not captured by demographic clustering. Observations from academia/industry: When Argyle et al. (Argyle et al., 2023) created “silicon samples” using GPT-3 to simulate diverse human populations, they found (continued on next column) (continued ) that the model generated statistically coherent demographic clusters based on linguistic patterns in training data rather than meaningful population characteristics. The research demonstrated that LLMs aggregate respondents based on how demographic groups are described in text corpora (linguistic co-occurrence) rather than actual behavioral or attitudinal similarities within those groups. 3.4.4. Control challenges (CC) These challenges violate the HCAI principle of human control, which demands that humans maintain meaningful agency and oversight over GenAI systems rather than being subjected to algorithmic decisions. 3.4.4.1. CC01: persona quality risk—accessibility without expertise. Persona developers may lack adequate training, increasing the risk of creating invalid personas. This can be because untrained persona de velopers may not be well equipped to detect problems in the created personas or the methods applied (Chang et al., 2008). The accessibility of GenAI tools creates a significant challenge where untrained people can develop personas without understanding the limitations and fundamental methodological principles. Persona developers without proper training may fail to critically validate GenAI outputs, accepting well-written but potentially flawed personas due to their surface-level fluency (Farquhar et al., 2024). GenAI’s ability to generate persuasive narratives that mask methodological issues or data problems amplifies the challenge (Chhikara, 2025). CC01 is particularly critical during the initial stages, where the lack of expertise can yield personas that appear credible but fail to represent user needs accurately, ultimately compromising design decisions based on these deceptively polished but potentially invalid personas. Hypothetical example: A marketing coordinator uses a GenAI to generate a polished persona named “Fitness-Focused Frank” without any user research or subject matter expertise, accepting its output at face value due to its persuasive presentation. The resulting persona leads to misguided business decisions, including a mobile app, pricing strategy, and content focus that fail to align with their actual customer base of women aged 40–55 who valued community support over efficiency. This illustrates GenAI’s accessibility (CC01), enabling untrained professionals to create seemingly credible but fundamentally flawed personas. Observations from academia/industry: When Lazik et al. (Lazik et al., 2025) conducted their study comparing GenAIPs and human-crafted personas, they found that novice researchers (HCI experts who were familiar with personas but had limited persona creation experience) produced personas that participants could distinguish from GenAIP generated ones, but with notable quality differences. The study revealed that when non-experts attempted to create GenAIPs, their outputs lacked the depth and authenticity that experienced persona developers would provide. 3.4.4.2. CC02: manual resource intensiveness—it takes a village to build a persona. Persona development typically involves manual decisions and trade-offs (Jansen et al., 2021). This is due to completely automatic persona development being either unfeasible or suboptimal (Branco et al., 2020). While GenAIPs reduce the initial effort of persona devel opment through automation, they introduce new demands for human oversight and involvement. Unlike traditional methods, where SMEs invest significant effort in initial creation (Chapman and Milham, 2006; Joni Salminen et al., 2020), GenAIPs apparently shift the burden to validation, bias detection, and ensuring alignment with real user groups (Prpa et al., 2024). Hypothetical example: A UX team deploys GenAI to generate patient personas, only to discover the workload is not instantly eliminated but shifts from traditional research to extensive validation, bias detection, and prompt refinement methods. This involves developing new expertise in AI oversight and quality assessment rather than primary data collection skills. Observations from academia/industry: When Shin et al. (Shin et al., 2024) tested different human-AI workflows for persona generation, they found that effective GenAIPs required substantial human involvement at every stage rather than simple automation. Teams needed to develop new competencies in prompt engineering, AI output evaluation, and bias detection while maintaining traditional user research skills, demonstrating that GenAIPs shift rather than eliminate manual resource requirements (CC02). D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 10 3.4.4.3. CC03: lack of standardization—everyone has their own way. Persona developers often struggle to choose appropriate methods and techniques for persona development and evaluation (Joni Salminen et al., 2020). This is due to methodological plurality and lack of clear standards, which remain a persistent challenge across all persona types (MP, AP, DDP) (Chapman and Milham, 2006; Ronkko, 2005). While some guidelines exist for the early generation of personas (Jisun An et al., 2018; Joni Salminen et al., 2020; Joni Salminen et al., 2018; Joni Salminen et al., 2019), the development and evaluation of GenAIPs still lack plausible standards. The need for standardization for GenAIPs is further accentuated by selecting appropriate methods for prompt engi neering, validation approaches, and metrics assignment (Liu et al., 2024). Hypothetical example: A healthcare UX researcher struggles to create trustworthy GenAIPs, uncertain whether her chosen methods produce accurate user representations or not. Without standardized approaches (CC03) for developing and evaluating GenAIPs, she cannot confidently defend their validity when challenged by stakeholders. Observations from academia/industry: When Amin et al. (Amin et al., 2025) conducted a systematic review of 52 GenAIP research articles from 2022–2024, they found significant variability in methodological approaches across studies, with researchers using different LLM models, prompting strategies, and evaluation criteria without established standards. This methodological diversity created challenges for reproducing results and comparing study outcomes, demonstrating the lack of standardization (CC03). 3.4.4.4. CC04: over-reliance on GenAI—it can do everything. Persona developers may struggle to integrate traditional user research with GenAI-aided persona development (Prpa et al., 2024). While GenAI of fers efficient persona generation, over-reliance on fully automated processes (Chen et al., 2024) risks disconnecting personas from real user insights and contexts. This can yield a crucial shortcoming of GenAIPs, namely the erasure of the designers’ reflexivity (Dorst, 2011) and per sonal learning through the development of manual personas (Sattele and Ortiz, 2024). This challenge is further compounded if the designers do not learn about the personas of developing them, which may be the case when applying GenAI processes in persona development. Furthermore, there is an absence of guidelines for combining direct user research with GenAI capabilities. Impressive GenAI outputs create overconfidence that reduces human oversight, disconnecting personas from real user insights and erasing designer learning through the development process. Hypothetical example: A design team relies entirely (CC04) on GenAIPs for their senior wellness app, skipping real user research and missing critical insights about their users’ actual preferences. When the app failed in testing, they discovered that the convincing GenAIPs had led them to implement features seniors did not want while missing ones they needed. Observations from academia/industry: When IDEO researchers (Perkel et al., 2025) examined industry adoption of GenAI tools for UX work, multiple studies documented teams increasingly bypassing traditional user research in favor of GenAI-generated insights. The accessibility and polish of GenAI outputs created over-reliance (CC04) that reduced validation efforts, with teams accepting GenAIPs without adequate verification against real user data. 3.4.5. Safety challenges (SC) These challenges create potential harm through GenAI system fail ures and irresponsible development and deployment. 3.4.5.1. SC01: adversarial users—it can harm us. GenAIP methodologies may fall victim to manipulation. Unlike traditional data driven methods with controlled access (such as using algorithms to develop personas), GenAIPs’ reliance on third-party AI models exposes them to potential exploitation by malicious actors. Through techniques like prompt in jection (Schneier, 2024), attackers can manipulate GenAI tools to generate misleading personas that appear credible, compromising the integrity of the entire design process. Similarly, political actors might introduce specialized LLMs that ignore facts or create personas aligned with politically influenced narratives (Mercer et al., 2025). This vulnerability is amplified by the technical limitations and possible se curity deficiencies in current GenAI systems. Hypothetical example: In a healthcare company using GenAIPs, an adversarial actor subtly manipulates the “elderly patient” persona to downplay privacy concerns through prompt injection techniques. The resulting compromised persona, appearing coherent and credible, leads developers to implement reduced security measures that potentially expose vulnerable elderly patients’ sensitive data. This illustrates how LLM vulnerabilities (SC01) in persona development can create real security risks when GenAI tools lack sufficient safeguards against manipulation. Observations from academia/industry: When Liu et al. (Liu et al., 2024) tested the prompt injection technique against 36 actual LLM-integrated applications and found that 31 applications (86 %) were vulnerable to prompt injection attacks. The research demonstrated how adversarial prompts could manipulate LLM systems to produce unintended outputs, including unauthorized data access and application prompt theft, with 10 vendors validating their findings. This demonstrates how GenAIP faces similar vulnerabilities (SC01) where malicious actors can manipulate prompt inputs to compromise persona generation integrity and produce biased or misleading user representations. 3.4.5.2. SC02: computationally resource intensive—it is harmful for the environment. Persona developers may struggle to maintain computa tional complexity and the associated cost at a low level, adding to sus tainability concerns (Faiz et al., 2024). This is due to the computational demands of GenAIP systems that present a significant sustainability challenge in persona development. Unlike many traditional persona development algorithms, GenAIPs require substantial computational resources for operation, particularly when using LLMs and multimodal AI systems. This excessive use of computational power negatively im pacts sustainability efforts, resulting in a larger carbon footprint and contradicting the ideals of sustainable HCI (sHCI) (Hansson et al., 2021). GenAIPs require massive computational resources, creating environ mental costs that contradict sustainable design principles while appearing efficient. Hypothetical example: Ironically, a GenAIP developed to promote green living could consume far more energy (especially with multiple iterations for prompt engineering or tuning) than it aims to save for a day, paradoxically contributing to the environmental problems it aimed to solve. Observations from academia/industry: When Patterson et al. (Patterson et al., 2021) analyzed the environmental impact of training large AI models, they found that creating GPT-3 consumed 1287 megawatt hours of electricity and generated 552 tons of carbon dioxide equivalent. Since Salminen et al. (Salminen et al.) confirms that GPT is the most frequently used model for GenAIP development, this demonstrates that GenAIPs require massive computational resources that create significant environmental costs (SC02). 3.4.6. Privacy challenges (PC) These challenges threaten data protection and user representation integrity. 3.4.6.1. PC01: reliance of third parties—I am not in control anymore. Persona developers struggle to maintain agency and control when using proprietary persona development tools. This can be due to closed source code and changing terms of service and functionality (Joni Salminen et al., 2021). Relying on proprietary GenAI tools creates a significant challenge in maintaining consistent control over persona development processes. Unlike traditional DDPs that often use open-source tools, GenAIPs typically depend on closed-source models like GPT (OpenAI 2023) or image generators like Midjourney (Midjourney 2024). This dependency exposes organizations to unpredictable changes in model functionality, pricing, or access policies that can disrupt established persona development workflows. GenAIPs depend on proprietary closed-source models, exposing organizations to unpredictable vendor D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 11 changes while reducing control over persona development processes. Hypothetical example: A UX team adopts proprietary GenAI tools for healthcare persona development, significantly enhancing their output quality and efficiency. When the AI provider suddenly changes their terms of service, algorithms, and pricing structure, the team cannot maintain consistency in their personas due to the closed-source nature of the tools. This illustrates how third-party dependencies (PC01) in GenAIPs can undermine their users’ professional agency. Observations from academia/industry: When Amin et al. (Amin et al., 2025) conducted their systematic review of 52 GenAIP research articles, they found that a notable majority of studies relied on OpenAI’s GPT models for persona generation. This heavy dependence on a single proprietary provider demonstrates that the field has concentrated around closed-source models, creating strong dependence on third parties (SC01). 3.4.7. User experience challenges (UC) These challenges reduce AI system utility and adoption in design practice. 3.4.7.1. UC01: over-expectations—give me everything. Persona users struggle to understand the limitations of the personas (Siegel, 2010). This can be due to multiple reasons, such as a lack of technical expertise, a lack of transparency of how the personas were developed and how to use them, or the halo effect surrounding algorithmic methods. The perception of personas has shifted dramatically from skepticism about their validity to uncritical acceptance of their AI-generated versions (Tharon W. Howard, 2015). While early personas faced resistance for lacking “hard evidence” (Jisun An et al., 2018; Mesgari et al., 2015). Although GenAIPs may benefit from an AI halo effect, in which stake holders attribute unrealistic capabilities to them simply because they are AI-generated, this overcorrection in stakeholder attitudes (Aldous et al., 2024; Bourne, 2024) creates a dangerous gap between the personas’ actual limitations and users’ understanding of them. Persona users, particularly those without technical expertise, may fail to recognize critical flaws in GenAIPs due to the opacity of AI methods and over confidence in algorithmic solutions. They might bypass necessary vali dation steps, assuming GenAIPs are inherently reliable. For instance, stakeholders might readily accept stereotypical or biased representa tions simply because they come from an AI system rather than a “biased” human (Liu et al., 2023), highlighting how the mystification of AI methods can prevent proper critical assessment of persona quality and limitations. AI hype creates unrealistic expectations about GenAIP ca pabilities, leading to overconfidence in automated outputs without proper validation and critical evaluation. Hypothetical example: A marketing team creates GenAIPs for a new fitness app, accepting the GenAI outputs without question because they are impressed by the technology’s capabilities. The personas contain subtle but significant misrepresentations of their target demographics, including unrealistic fitness goals and behaviors that do not align with market research. When the product launched based on these flawed personas, it failed to resonate with actual users. Still, the team continues to trust these GenAIPs over contradicting customer feedback because of their unwavering faith (UC01) in the GenAI’s supposed objectivity. Observations from academia/industry: When Survey2Persona researchers (Ilkka Kaate et al., 2025) created personas for user interaction study using AI personas responding to user questions, it resulted in the challenge over expectations because 57 % of users accepted incorrect answers from GenAIPs, demonstrating unrealistic expectations that “GenAIPs always provide accurate responses even when insufficient data exists”. 3.4.7.2. UC02: validating the impact—I have used a persona, now what?. Persona users often struggle to validate the impact of using a persona. This challenge arises from three main factors: the lack of a systematic feedback loop to measure impact, the difficulty isolating the specific effects of personas, and the absence of well-defined metrics. The chal lenge of validating persona effectiveness creates uncertainty about their actual value in improving design outcomes. While organizations adopt GenAI tools and processes (Capgemini Research Institute 2024), they may struggle to measure whether GenAIPs genuinely improve design decisions or UX. This difficulty stems from three key factors: (1) the lack of established metrics for measuring persona impact, (2) the absence of systematic feedback loops between persona uses and design outcomes, and (3) the challenge of isolating persona influence from other design factors. This uncertainty affects persona users, who cannot justify their persona-based decisions with empirical evidence and struggle to eval uate the value of GenAIP tools and procedures. Organizations struggle to measure whether GenAIPs improve design decisions due to a lack of established metrics and feedback loops for isolating persona influence. Hypothetical example: A marketing team uses GenAIPs to guide their streaming platform redesign. Six months later, when executives demand evidence of ROI, the team cannot determine whether improved user retention results from their persona-informed design decisions or from simultaneous changes to content recommendations and pricing. Without established metrics (UC02) to isolate the impact of persona-based decisions from other factors, organizations cannot justify continued investment in GenAIPs despite their intuitive appeal. Observations from academia/industry: When Amin et al. (Amin et al., 2025) conducted their systematic review of 52 GenAIP research articles, they found that no studies measured long-term impact or validated whether GenAIPs actually improved design outcomes. This absence of impact measurement (UC02) leaves organizations unable to justify GenAIP investments or determine their effectiveness compared to traditional personas. 3.4.7.3. UC03: desk drawer effect—will I ever use this persona again. Personas often fall victim to the ‘desk drawer effect’ (Portigal, 2023), according to which personas are developed but not consistently used in design decisions. This can be due to a lack of integration into daily workflows, the effort required to reference personas regularly, or the tendency to treat personas as deliverables rather than ongoing design tools (Long, 2009). While organizations invest significantly in devel oping sophisticated GenAIPs (Capgemini Research Institute 2024), these personas often become unused artifacts rather than active design tools, challenging the very purpose of persona development (Pruitt and Adlin, 2006). This adoption failure stems from multiple factors, including the difficulty of integrating personas into daily workflows, the effort required to keep personas relevant as project needs evolve, and the tendency to treat personas as one-time deliverables rather than ongoing design resources. The impact particularly affects design teams, who initially embrace personas but gradually make decisions without consulting them, and organizations that waste resources on developing detailed personas that ultimately do not influence design outcomes. Easy GenAIP generation may worsen adoption by creating multiple personas without the investment necessary for sustained use, treating them as deliverables rather than tools. Hypothetical example: A disaster relief nonprofit initially uses GenAIPs of vulnerable populations to guide their emergency alert system redesign, but as implementation deadlines approach, staff abandons these GenAIPs and reverts to making decisions based on assumptions. The unused GenAIPs ultimately become forgotten files in a project folder (UC03), despite the significant resources invested in their development. Observations from academia/industry: While UC03 represents a documented concern in traditional persona research, specific empirical studies showing the desk drawer effect occurring with GenAIPs remain limited in current literature. This gap highlights the need for longitudinal research tracking how organizations actually use GenAIPs over time rather than just their initial adoption rates. 4. Persona expert perspectives on the challenges 4.1. Participants Seventeen subject matter experts (SMEs) participated in a survey about challenges associated with GenAIPs. The participants were recruited through professional connections, manually verifying that each participant had either published about personas or used personas actively in their research or other work. Participants ranged in age from 29 to 68 years (M = 38.7, SD = 11.5), with 58.8 % (n = 10) identifying as male and 41.2 % (n = 7) as female. Their experience with personas D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 12 ranged from 2 to 26 years (M = 7.6 years, SD = 6.2 years). The partic ipants belonged to diverse countries, including Finland, South Korea, Portugal, France, Denmark, United States, India, Lebanon, and Pakistan. Participants held various roles, including professors (n = 6), researchers (n = 5), PhD candidates (n = 3), post-doctoral researchers (n = 2), and an engineer (n = 1). Self-reported knowledge of traditional DDPs was moderate to high (M = 3.8 , SD = 0.8 on a 5-point scale), while knowledge of GenAIPs was slightly lower (M = 3.7, SD = 0.6), indicating the participants have good knowledge on both traditional DDPs and GenAIPs. 4.2. Data collection We collected responses from participants using an online survey on the Qualtrics platform. We pilot-tested the survey with three persona researchers and revised the survey based on the provided feedback to ensure clarity and comprehensiveness before deployment to the full sample. To provide clear common definitions, we presented traditional DDPs as a persona created fully or partially using classical AI technologies (e. g., clustering) and GenAIPs as a persona created fully or partially using Generative AI technologies (e.g., LLMs). The survey (see the online ap pendix) contained three sections: (1) perceptions about GenAIP chal lenges, (2) comparison of GenAIPs with DDPs, and (3) demographic information. We first presented the GenAIP challenges and asked whether they agreed on the presence of the challenge on a 7-point Likert scale (1 to 7, ranging from Strongly disagree to Strongly agree). The challenges were compared between DDPs and GenAIPs using a semantic scale (“bigger problem for DDPs”, “equal problem for both”, and “bigger problem for GenAIPs”). To mitigate order effects, we randomized the order of statements presented. We added an open-ended question in each section to provide participants a chance to elaborate on their an swers. In the demographic information section, we asked for the par ticipant’s gender, age, experience with personas, occupational role, and knowledge of DDPs and GenAIPs. 4.3. Results [Result 1] Experts agree that each of the challenges negatively impacts GenAIP: First, we analyzed the perception of challenges by the experts (see Fig. 4). Based on the survey responses from the 17 SMEs, all challenges received mean ratings above the neutral point of 4 (neither agree nor disagree), indicating that experts agree these issues constitute challenges for GenAIPs. The most problematic challenges, defined as those with mean ratings above 5.31 (M3 of the actual data), were TC02: Hallucinations (M = 5.94, SD = 1.20), FC03: Over-sanitization (M = 5.82, SD = 1.02), CC03: Lack of Standardization (M = 5.59, SD = 1.00), CC01: Persona Quality Risk (M = 5.53, SD = 1.28), and FC01: Misrep resentation (M = 5.47, SD = 1.28). Even the lowest-rated challenges, including UC02: Validation of the Impact (M = 4.29, SD = 1.57), RC01: Superficiality (M = 4.35, SD = 1.37), and PC01: Reliance on Third Parties (M = 4.35, SD = 1.50), are above the neutral threshold. This indicates that all the pre-defined challenges are considered challenges for the GenAIPs by SMEs. The analysis of relative standard deviation (RSD) values indicates different levels of agreement among participants regarding different challenges. RSD values across all challenges ranged from 17.4 % to 40.3 % (M = 29.4 %, SD = 7.3 %), indicating low to moderate levels of disagreement. Challenges demonstrating the highest agreement included FC03: Over-sanitization (RSD = 17.4 %), CC03: Lack of Stan dardization (RSD = 18.0 %), and TC02: Hallucinations (RSD = 20.2 %), suggesting that participants showed relatively strong agreement about the severity of these issues. Conversely, challenges exhibiting the greatest disagreement were SC02: Computationally Resource Intensive (RSD = 40.3 %), RC03: Limited Generalizability (RSD = 39.7 %), and SC01: Adversarial Users (RSD = 39.6 %), indicating that while some GenAIP’s challenges are universally recognized, others may be more dependent on individual experience or organizational context. [Result 2] Experts agree that most of the challenges are more problematic for GenAIP: Fig. 5 displays a segmented horizontal bar graph comparing SMEs’ perceptions of how problematic each of the 20 challenges is for GenAIPs versus traditional DDPs. Out of the 20 chal lenges, 12 (60 %) were identified by the largest proportion of SMEs as more problematic for GenAIPs, including PC01: Reliance on Third Parties (n = 15, 88 %), TC02: Hallucinations (n = 14, 82 %), and SC02: Computationally Resource Intensive (n = 13, 76 %). Six (30 %) chal lenges were viewed as equally problematic for both persona types by the highest proportion of experts: UC02: Validating the Impact (n = 14, 82 %), UC03: Desk Drawer Effect (n = 15, 88 %), FC04: Complications of Average (n = 12, 71 %), RC03: Limited Generalizability (n = 11, 65 %), FC01: Misrepresentation (n = 11, 65 %), and FC02: Persona Driven Discrimination (n = 14, 82 %). Two challenges were seen as more problematic for traditional DDPs: RC04: Aggregation (n = 7, 41 %) and Fig. 4. Perception of GenAIP challenges by SMEs. All challenges are considered problematic for GenAIPs by the SMEs based on the mean values. D. Amin et al. International Journal of Human - Computer Studies 205 (2025) 103657 13 RC01: Superficiality (n = 5, 29 %). In several cases, the nature of the challenge may have informed SMEs’ perceptions. For instance, halluci nations (TC02) are directly tied to the behavior of GenAI models, thus a major problem for GenAIPs. Similarly, the problem of aggregation (RC04) appears