Generative AI personas considered harmful? Putting forth twenty 
challenges of algorithmic user representation in 
human-computer interaction

Danial Amin a,* , Joni Salminen a, Bernard J. Jansen b, Joongi Shin c, Dae Hyun Kim d,e

a University of Vaasa, Vaasa, Finland
b Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
c Aalto University, Aalto, Finland
d KAIST, Daejon, South Korea
e Yonsei University, Seoul, South Korea

A R T I C L E  I N F O

Keywords:
Personas
Generative AI personas
Harmful
Human-centered AI

A B S T R A C T

Generative AI personas (GenAIPs) promise user-centred design efficiency, but their impact on different persona 
challenges remains unexplored. Inspired by Dijkstra’s classic essay on harmful programming constructs, we 
analyze twenty challenges in persona development using Human-Centered AI principles. Through literature 
review and expert survey (n = 17), we find that GenAIPs transform rather than eliminate traditional persona 
challenges. Experts rated all challenges as problematic for GenAIPs (M > 4.0), with the highest concerns for 
hallucinations (M = 5.94), over-sanitization (M = 5.82), and lack of standardization (M = 5.59). 12 out of 20 
challenges are considered more problematic for GenAIPs than conventional personas, particularly bias ampli
fication, validation challenges, and accessibility without expertise. We provide HCAI-grounded guidelines 
demonstrating that effective GenAIP implementation requires human-AI collaboration rather than automation 
and prioritizing user welfare over technical efficiency.

1. Introduction

Personas are a user-centered design (UCD) technique that represents 
archetypal characteristics of target user groups in a humanized manner. 
Personas represent information such as goals and behaviors of users, 
customers, or beneficiaries (Nielsen, 2019), typically portrayed as 
persona profiles (Cooper, 1999; Nielsen et al., 2015) (see Fig. 1). Per
sonas inform decision makers (e.g., product designers and developers) 
about real users’ needs and can enable them to design more targeted and 
focused products and services. Creating high-quality personas that 
accurately represent targeted users and foster empathy towards them is 
a critical process in human-computer interaction (HCI) and user expe
rience (UX) design in multiple domains, such as healthcare services, 
education, privacy, and security (Cooper et al., 2007; Guan et al., 2023; 
Nielsen, 2019; Salminen et al., 2021; Salminen et al., 2022).

Development of data-driven personas (DDPs) (Mijač et al., 2018) has 
evolved alongside advances in statistical inference, machine learning, 
and data science (Salminen et al., 2021), including the development of 
Generative AI personas (GenAIPs) by using GenAI technologies, such as 
Large Language Models (LLMs) (Schuller et al., 2024), Text-to-Image 
Models (TTIMs) (Sattele and Ortiz, 2024), and multi-modal models 
(Salminen et al., 2024). To this extent, using GenAIPs to represent 
groups of people is an ongoing research topic in persona science1 (Hong 
et al., 2023; Nah et al., 2023; Salminen et al., 2023; Salminen et al., 
2024; Schuller et al., 2024; Shin et al., 2024). Researchers have identi
fied potential benefits of GenAI in persona development, including 
segmenting user data (Salminen et al., 2024), writing persona narratives 
(Schuller et al., 2024), and providing conversational user interfaces 
(Shin et al., 2024). Likewise, GenAIPs can help simulate user analysis 
without real-user constraints. TTIMs can generate persona profile 

* Corresponding author.
E-mail addresses: danialam@uwasa.fi (D. Amin), joni.salminen@uwasa.fi (J. Salminen), bjansen@hbku.edu.qa (B.J. Jansen), joongi.shin@aalto.fi (J. Shin), 

dhkim16@yonsei.ac.kr, dhkim16@alumni.stanford.edu (D.H. Kim). 
1 Persona science is the systematic study and methodology of creating, validating, and applying data-driven user archetypes to understand and represent real user 

segments (Nielsen, 2019)

Contents lists available at ScienceDirect

International Journal of Human - Computer Studies

journal homepage: www.elsevier.com/locate/ijhcs

https://doi.org/10.1016/j.ijhcs.2025.103657
Received 24 November 2024; Received in revised form 13 October 2025; Accepted 16 October 2025  

Int. J. Human–Computer Studies 205 (2025) 103657 

Available online 17 October 2025 
1071-5819/© 2025 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by- 
nc-nd/4.0/ ). 

https://orcid.org/0009-0000-7597-2267
https://orcid.org/0009-0000-7597-2267
mailto:danialam@uwasa.fi
mailto:joni.salminen@uwasa.fi
mailto:bjansen@hbku.edu.qa
mailto:joongi.shin@aalto.fi
mailto:dhkim16@yonsei.ac.kr
mailto:dhkim16@alumni.stanford.edu
www.sciencedirect.com/science/journal/10715819
https://www.elsevier.com/locate/ijhcs
https://doi.org/10.1016/j.ijhcs.2025.103657
https://doi.org/10.1016/j.ijhcs.2025.103657
http://creativecommons.org/licenses/by-nc-nd/4.0/
http://creativecommons.org/licenses/by-nc-nd/4.0/


images (Sattele and Ortiz, 2024), and Text-to-Video (T2V) generation 
models can be used to develop deepfake personas (Kaate et al., 2023) to 
increase the level of immersion. Likewise, GenAIPs can help simulate 
user analysis methods without real-user constraints.

However, the integration of GenAI as an active agent (Joni OpenAI 
2025) in persona development fundamentally transforms this from a 
traditional HCI methodology into a Human-Centered AI (HCAI) chal
lenge, raising critical questions about algorithmic transparency (Gupta 
et al., 2024), fairness (Chu et al., 2024), and human control that are 
absent in conventional persona development methods. This shift from AI 
as a passive analytical tool to an active agent making autonomous de
cisions about user characteristics and narratives necessitates examining 
persona development through established frameworks for responsible AI 
(Papagiannidis et al., 2025) deployment. Moreover, because GenAI can 
generate deepfake personas, such applications raise significant ethical 
concerns regarding potential misuse for deception and require strict 
disclosure protocols and consent frameworks (Al-kfairy et al., 2024; 
Moreno, 2024; Narayanan Venkit et al., 2025).

Therefore, despite the possible benefits of GenAIPs, researchers are 
increasingly aware of the risks of using GenAI in persona development, 
such as introducing bias and exclusion toward real users (Cachat-Rosset 
and Klarsfeld, 2023; Ai-Leen Goodman-Deane et al., 2021), raising 
ethical concerns (Shams et al., 2023), and reducing the explainability of 
persona development (Bender et al., 2021). Despite this interest in both 
opportunities and challenges, most of the GenAIPs’ potential harms have 
not been systematically mapped in literature, making it difficult to form 
a comprehensive picture of the impact of these new technologies on user 
representation through personas. Seeing GenAIPs as an advancement or 
risk parallels the dichotomy of techno-optimism and techno-pessimism 
in AI technologies (Königs, 2022). This perspective is connected with 
the broader concern of how GenAI should be applied in HCI and UCD 
(Du et al., 2024; Hsu et al., 2024; Jung et al., 2025; Rapp et al., 2025).

In spite of the growing adoption of GenAI in persona development, 
systematic analysis of associated risks and challenges remains absent 
from HCI literature. While individual studies may report specific issues, 
no comprehensive framework exists to understand how GenAI trans
forms traditional persona challenges or guide responsible implementa
tion. This knowledge gap leaves practitioners without adequate 
guidance for addressing the unique challenges of GenAI-assisted persona 
development.

To this end, we examine the potential challenges and “harmfulness” 
of using GenAI in persona development. To systematically pursue this 
topic, we first define harmfulness in the context of GenAIPs. In this 

research, we define harmfulness as the potential negative impacts that 
GenAIPs could have on stakeholder groups within HCI and UCD prac
tice. These harms can manifest in multiple ways, such as misrepresent
ing user groups, propagating biases, erosion of authentic user research 
practices, or misinformed design decisions based on synthetically 
generated personas. For example, harm could be elderly-focused 
healthcare apps ignoring users’ needs for explicit system feedback 
(Alessa and Al-Khalifa, 2023), GenAIPs that perpetuate workplace 
discrimination (Cachat-Rosset and Klarsfeld, 2023), or design tools 
failing to capture the diverse factors causing digital exclusion (Ai-Leen 
Goodman-Deane et al., 2021). Our research follows three phases. First, 
we conducted a literature review to identify twenty challenges in Gen
AIP development. Second, we surveyed seventeen persona experts to 
evaluate each challenge’s severity and compare GenAIPs to traditional 
personas. Third, we propose practical guidelines based on HCAI prin
ciples to address these challenges.

We categorize GenAIP challenges using Shneiderman’s HCAI prin
ciples (Shneiderman, 2022) across seven themes: transparency, fairness, 
reliability, control, privacy, safety, and user experience. This framework 
is essential because GenAI transforms persona development from 
traditional HCI methodology into human-AI collaboration requiring 
ethical oversight. The approach shifts focus from technical details to 
human-centered design principles and reveals why challenges matter for 
human welfare rather than when they occur. Connecting each challenge 
to established AI ethics principles enables targeted interventions and 
allows researchers to leverage existing solutions while communicating 
to stakeholders. The HCAI framework is particularly valuable because it 
aligns GenAIP research with the broader movement toward responsible 
AI development, enabling researchers to leverage existing solutions and 
guidelines from the AI ethics community.

Identifying and examining these challenges maps out the path to the 
potential harms that GenAIPs could cause to various stakeholders of the 
HCI research community. Theoretically, we demonstrate how HCAI 
principles provide a novel framework for understanding GenAIP chal
lenges. Practically, we provide actionable guidelines for responsible 
GenAIP implementation, including bias detection protocols and human- 
AI collaboration workflows.

2. Conceptual background

Examining the challenges of GenAIPs requires understanding (1) the 
diverse forms of persona generation methods and (2) their criticisms. 
The first subsection covers the evolution of persona from manual 

Fig. 1. A representative of GenAIP1 (left) and Manual persona (adopted from (Delve, 2025)) (right). The GenAIP comprises demographical and contextual data 
generated while the manual persona uses self-reflected survey responses. 
1Created using Survey2Persona (https://s2p.qcri.org/), an online GenAIP creation tool using survey data.

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

2 

https://s2p.qcri.org/


development by subject matter experts (SMEs) to the current-day GenAI 
techniques. The second subsection presents a synopsis of the criticism of 
personas as design and HCI research tools, ranging from general con
cerns about their usability to specific methodological challenges.

2.1. The diverse forms of personas

2.1.1. Pre-automation era
Initially a qualitative method (Alan Cooper, 1999), persona creation 

has been primarily manual, relying on SMEs to collect user data through 
small-scale techniques (i.e., focus groups or surveys), analyze the 
collected user data, and create the content of personas (Nielsen, 2019). 
While data collection can involve thousands of participants, the critical 
bottleneck lies in the capacity of the experts in processing and inte
grating information from diverse data sources into coherent personas. 
These manual personas (MPs) have crucial limitations. First, generating 
representative MPs is difficult. Human analysis is prone to introducing 
prior beliefs or biases about user groups (Chapman and Milham, 2006). 
Second, the labor-intensive nature of manual analysis makes it difficult 
to scale MPs toward statistically representative personas, as each addi
tional data source requires proportional increases in human cognitive 
effort for synthesis and interpretation (Jisun An et al., 2018; Chapman 
and Milham, 2006; Charity Howard, 2015; T.W. Howard, 2015; Jansen 
et al., 2020). Third, while MPs can be modified in practice (particularly 
in commercial contexts responding to market feedback), each modifi
cation demands substantial human effort to maintain internal consis
tency and empirical grounding, making systematic updates 
resource-intensive and rendering MPs relatively static compared to 
automated approaches that can rapidly incorporate new data streams 
(Chapman and Milham, 2006; Joni Salminen et al., 2020). This static 
nature of MPs makes them challenging to utilize in modern society, 
where practitioners need to observe many user groups with swift 
changes in their opinions.

2.1.2. Rise of automatic persona generation
The limitations of MPs have sparked the idea of creating personas 

using algorithms and statistical methods on large and dynamic datasets. 
Researchers introduced automatic personas (APs) that are developed 
from structured data (e.g., surveys and demographics) using statistical 
methods automatically (Jung et al., 2018). Data-driven personas (DDPs) 
emerged as a first-hand response to the limitations of the traditional 
MPs. The idea was to augment manually developed personas from 
“low-tech design artifacts” to “high-tech user representations” (Jansen 
et al., 2020). In other words, DDPs are complete persona profiles devel
oped using qualitative and/or quantitative data about a given user popula
tion, which is analyzed using quantitative methods, including data science 
and machine learning algorithms (Jisun An et al., 2018; Jansen et al., 
2018; Jansen et al., 2020). Diverse approaches to DDPs have emerged as 
underlying technologies in computer science have advanced. Research 
articles identifying key moments in the evolution of personas in general 
and DDPs in particular are represented in Fig. 2.

GenAIPs represent another advancement in persona development 
(see Fig. 2), extending beyond DDPs by using GenAI technologies to 
develop personas (Nah et al., 2023; Zhang et al., 2024). Unlike DDPs 
that primarily analyze data patterns, GenAIPs encompass multiple AI 
models (LLMs, TTIMs, multimodal (a model that can interact with text, 
images, and videos together)) to automatically develop detailed per
sonas with narratives, visuals, and behavioral patterns. Since their 
introduction in 2022 (Hong et al., 2023; Nah et al., 2023), GenAIPs have 
evolved from basic text generation to dynamic systems that can create 
and update personas in real-time using continuous data streams (Kim 
and Kim, 2019; Zhang et al., 2024), marking a significant shift in 
persona development. In particular, this includes (1) rapid development 
of personas at scale with minimal human involvement (Salminen et al., 
2024), (2) consistent narrative structure across developed personas due 
to standardized prompting (Schuller et al., 2024), (3) ability to quickly 

iterate and refine personas based on feedback by adjusting prompts 
(Shin et al., 2024), (4) novel persona profile features such as chat (Jung 
et al., 2025), and (5) help in synthesizing and writing persona de
scriptions from raw user data (Schuller et al., 2024). These capabilities 
reflect different GenAI application approaches, with aspects (1) and (3) 
describing fully automated workflows suitable for rapid prototyping, 
while aspect (5) represents human-AI collaborative workflows requiring 
substantial domain expertise. This methodological diversity within 
GenAIP implementations serves different use cases rather than repre
senting inconsistent technological capabilities.

2.2. Criticisms and challenges of personas

Despite the benefits of generating personas, there have been criti
cisms around the use of personas. First, there are criticisms of personas as 
a design technique in general. These criticisms apply to all types of 
personas, including MPs, APs, and DDPs (Chapman and Milham, 2006; 
Ronkko, 2005). Second, there are approach-specific criticisms, e.g., 
manually created personas are often based on low sample sizes 
(Chapman and Milham, 2006). Third, there are method-specific criticisms 
within a specific approach, e.g., that K-means clustering would not be 
optimal for DDPs because it assigns each demographic group only to a 
single cluster (Kwak et al., 2017). This section introduces each type of 
criticism in a broad manner.2

Some prior research (Chapman et al., 2008; Chapman and Milham, 
2006) has presented general criticism. By posing the overarching ques
tion, “Are personas really usable?”, Howard (T.W. Howard, 2015) crit
icizes the persona generation technique. The crux of his criticism is that, 
although personas were introduced to facilitate communication among 
team members in UCD, personas do not solve communication problems and 
can even lead to further misunderstandings. Friess (Friess, 2012) made a 
similar conclusion based on an ethnographic study, reporting that de
signers rarely evoke or mention personas in their daily jobs. Matthews et al. 
(Matthews et al., 2012) investigated users’ attitudes about personas, 
finding that decision-makers perceived them as too abstract and misleading. 
Finally, De Voil raises several key issues regarding the concept of per
sonas, proposing that personas are artificial thinking aids with severe lim
itations (Voil, 2010). Previous work has also discussed the societal 
challenges of persona use, which comes as a criticism of the application 
of the persona. These criticisms mainly include stereotyping (Marsden 
and Haag, 2016; Ronkko, 2005; Turner and Turner, 2011), in which a 
segment of the users is represented based on prejudiced misconceptions. 
These issues primarily arise due to the applicability of the personas and 
thus are independent of the method and type of the persona.

A significant volume of literature on DDPs considers approach and 
method-based criticism. Chapman and Milham raise several shortcomings 
of DDPs (Chapman and Milham, 2006), including (1) the inconsistency 
problem in the generated personas, where one part of persona profile in
formation can be from Source A and another from Source B, which may 
or may not refer to the same users; and (2) the granularity problem where 
increasing the number of persona attributes requires more personas to 
be created to cover all possible segments. Salminen et al. (Joni Salminen 
et al., 2020) mentioned “three Es” as general challenges of personas that 
can be extended to DDPs and GenAIPs: (1) Envision (i.e., personas have 
no direct relationship to real user data), (2) Execution (i.e., the quality of 
the generated personas is low or unknown), and (3) Evaluation (i.e., the 
success of personas is based on anecdotal feedback). The latter two can 
be considered relevant concerns for DDPs and MPs alike. In addition, 
Salminen et al. (Joni Salminen et al., 2020) mentioned the following 
challenges of AP creation: (1) lack of standards and best practices, (2) lack 
of ethical considerations, and (3) loss of immersion. These are critical issues 

2 Additional resources recommended for the reader include a literature re
view of quantitative persona creation (Salminen et al., 2020) and a textbook 
focused on data-driven personas (Jansen et al., 2021).

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

3 


that we expand on in the subsequent section.
These traditional persona challenges provide the foundation for 

understanding how GenAI transforms existing issues rather than 
creating entirely novel problems. In our subsequent analysis, we 
demonstrate how each of these established challenges manifests differ
ently in GenAI contexts, for instance, bias becomes algorithmic bias 
operating at scale, validation difficulties become opacity problems, and 
inconsistency becomes AI hallucination with convincing but fabricated 
content. Hence, they remain relevant for GenAIPs. Next, we shift our 
attention to the analysis of GenAIPs’ potential harms.

3. Are GenAIPs considered harmful?

This section examines the potential harms of GenAIPs through 
analyzing challenges across the persona development lifecycle. We first 
outline our approach for identifying and categorizing these challenges, 
explaining how they can lead to harmful outcomes.

3.1. Approach

Our approach to examining these potential harms draws inspiration 
from Dijkstra’s seminal 1968 essay “Go To Statement Considered Harm
ful” [38], which fundamentally affected the programming community’s 
view on a widely-used coding construct. Dijkstra first established ideal 
standards for program comprehension by examining how programmers 
understand program execution, then demonstrated through systematic 
analysis how the “goto” statement violated these standards by making 
program behavior unpredictable. The essay’s impact exceeded its im
mediate technical context, establishing a framework for critically 
examining seemingly beneficial technological practices. Many re
searchers have adopted Dijkstra’s verbiage of ‘considered harmful’ as a 
starting point and inspiration [36, 37, 48, 84, 122].

We can observe a clear parallel with Dijkstra’s thinking in adopting 
GenAIPs. While they offer apparent benefits in efficiency and scalability, 
they could introduce significant challenges and harms in HCI practices. 

Similar to Dijkstra’s work, we systematically examine the potential 
harms of GenAIPs by analyzing challenges across the persona develop
ment lifecycle [2] and their impacts on various stakeholders. This 
analysis helps us understand whether GenAIPs could be harmful and 
how and in what contexts these harms might manifest. Just as Dijkstra’s 
analysis led to more structured programming approaches, our analysis 
can guide more responsible integration of GenAI in persona develop
ment. This analogy guides our research question: “Are GenAIPs consid
ered harmful?”.

3.2. Methodology

To address the question of harmfulness in GenAIPs, we focus on 
identifying challenges that exist across HCAI principles and then query 
persona researchers’ perspectives on how crucial these challenges are 
relative to previous DDPs that did not utilize GenAI technologies. By 
challenges, we mean specific difficulties, limitations, or risks (a) 
inherent to GenAI technology itself, (b) emergent from using GenAIPs 
inefficiently, (c) resultant from human interaction with GenAIP, or (d) 
representative of areas needing improvement for better GenAIP quality 
and reliability. We organize challenges according to seven HCAI prin
ciples: transparency, fairness, reliability, control, privacy, safety, and 
user experience (Shneiderman, 2022). Each challenge is analyzed in 
terms of its impact on different stakeholders: persona developers, 
persona users, and target groups. These stakeholders are defined as 
follows: (1) persona developers are responsible for developing personas 
from data collection to their application, (2) persona users use the 
personas in their work for decision making, and (3) target groups are 
represented by the personas.

Our methodology uses a three-pronged approach to examine GenAIP 
harmfulness: (1) snowball literature sampling mapping twenty chal
lenges to prior work, (2) empirical case study analysis of four published 
GenAIP studies showing how challenges manifest in practice, and (3) 
expert survey with 17 SMEs providing quantitative validation and 
comparative assessment against traditional personas. This approach 

Fig. 2. The evolution of persona development methods from 1999 to 2024, highlighting key milestones across four methodological streams: Manual Personas 
(purple), Automated Personas (green), Data-Driven Personas (orange), and GenAI Personas (blue) (An et al., 2016; Alan Cooper, 1999; Jung et al., 2021; Kwak et al., 
2017; Liu et al., 2023; McGinn and Kotamraju, 2008; Mulder and Yaar, 2006; Zhang et al., 2016).

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

4 

https://www.zotero.org/google-docs/?broken=rVgLNC
https://www.zotero.org/google-docs/?broken=LU4IKi
https://www.zotero.org/google-docs/?broken=dh1BSB


combines theoretical grounding, real-world evidence, and expert 
validation.

We used snowball sampling, starting from foundational persona 
literature to develop our challenge framework. We began with key pa
pers documenting traditional persona challenges, including Cooper 
(Alan Cooper, 1999) on MPs, Chapman and Milham (Chapman and 
Milham, 2006) on methodological concerns, Ronkko et al. (Ronkko, 
2005) for practical concerns, and Salminen et al. (Joni Salminen et al., 
2021) on DDPs limitations. Starting from these challenge-focused papers 
is appropriate for conceptual work that aims to understand how GenAIP 
transforms existing problems rather than discover entirely novel phe
nomena. From these seed papers, we followed citation networks back
ward to trace theoretical origins and forward to identify contemporary 
applications. This process yielded foundational evidence for traditional 
persona challenges across all personas.

We then searched for these traditional challenges manifest in GenAIP 
contexts. For each identified traditional challenge, we conducted 
searches on Google Scholar using terms combining the challenge 
concept with GenAI terminology (e.g., “bias personas LLM,” “halluci
nation AI-generated personas,” “validation generative personas”). We 
supplemented this with broader searches for emerging GenAIP-specific 
issues using terms like “challenges,” “generative AI personas,” and 
“LLM persona limitations.” We prioritized peer-reviewed publications 
from HCI venues (CHI, DIS, UIST, IUI, CSCW) and included recent arXiv 
preprints given the field’s rapid evolution. To address potential citation 
network limitations, we included literature from the field of LLM and 
GenAI research as well.

For each challenge, we systematically mapped evidence from tradi
tional persona literature to contemporary GenAIP manifestations (see 
Table 1). When direct matches were unavailable, we traced methodo
logical parallels. For example, we connected inconsistencies docu
mented in manual personas to stochastic variability in LLM-based 
generation. We classified these challenges according to HCAI principles 
to understand their fundamental nature and relationship to human- 
centred design principles.

We examined four published GenAIP studies to understand how 
these challenges manifest in practice. We selected these studies to 
represent different approaches: large-scale automated generation 
(Salminen et al., 2024), context-specific applications (Sattele and Ortiz, 
2024), human-AI collaborative workflows (Shin et al., 2024), and 
comparative evaluation methods (Schuller et al., 2024). Then, we 
analyzed each case study to identify which challenges appeared and how 
they affected different stakeholders.

The illustrative scenarios presented throughout this section were 
systematically generated using LLMs to demonstrate potential manifes
tations of each challenge, following established practices in HCI 
research for scenario-based analysis. These examples are intentionally 
exaggerated, similar to recent research in AI impact on work 
(Constantinides et al., 2025) to clearly demonstrate each challenge type 
and are grounded in patterns observed in our literature review and case 
study analysis. We present real-world observations from published 
GenAIP implementations to complement these illustrative scenarios.

We conducted a quantitative survey with 17 SMEs in the field to 
understand their experience with GenAIPs. We presented the twenty 
identified challenges as statements to the experts and asked them to rate 
their agreement with each statement on a 7-point Likert scale (1 =
strongly disagree, 7 = strongly agree). Additionally, we asked experts to 
evaluate whether each challenge represents a more significant problem, 
an equal problem, or a less significant problem for GenAIPs compared to 
DDPs. Although snowball sampling from traditional persona literature 
might miss GenAI native challenges, our expert survey validation 
demonstrates that practitioners recognize these as legitimate concerns. 
This is primarily a conceptual article focused on the collection and ag
gregation of challenges discussed in different research articles in the 
field. Conceptual papers provide theoretical frameworks for exploring 
emerging phenomena, which enable us to discuss harms that are not yet 

prominent but could become so as GenAIPs diffuse more broadly.

3.3. Empirical assessment of challenge framework through GenAIP case 
studies

To assess our theoretical framework and demonstrate its practical 
utility for identifying potential harms in real GenAIP implementations, 
we conducted a systematic analysis of four recent GenAIP studies across 
diverse application domains. This analysis serves as an empirical ex
amination of our HCAI-based challenge categorization, exploring 
whether the identified challenges manifest consistently across different 
research contexts and methodological approaches. Salminen et al. 
(Salminen et al., 2024) generated 450 addiction-related personas using 
GPT-4 to test large-scale automated persona creation. Their study 
showed clear geographical bias, with 86 % of personas being US-based 
despite no geographical constraints in the prompts (FC01). The per
sonas consistently portrayed addiction narratives in an unrealistically 
positive light, minimizing the severity of substance abuse issues (FC03). 
Gender stereotypes manifested in occupational roles, with male per
sonas predominantly shown as construction workers or software de
velopers, while female personas were typically nurses or event planners 
(FC02). Technical inaccuracies emerged in addiction-specific details, 
such as mixing up symptoms and treatments of benzodiazepine and 
opioid addictions (TC02), indicating GenAIPs may always require 
human oversight (TC01).

Sattele and Ortiz (Sattele and Ortiz, 2024) investigated GenAIPs for 
understanding water access issues in Iztapalapa, using local news arti
cles and contextual images as inputs. Despite having access to specific 
local information, the generated personas failed to capture the 
complexity of daily water challenges faced by residents (RC01). The AI 
consistently generated descriptions of “vibrant and grateful” commu
nities while overlooking documented infrastructure problems and social 
tensions (FC03). The study found basic factual errors in persona de
scriptions, such as incorrect geographical placement of Iztapalapa 
within Mexico City (TC02). Gender biases appeared in narrative con
struction, with female personas primarily described through family roles 
while male personas were characterized by professional achievements 
(FC02). The researchers observed that designers might rely on these 
GenAIPs without questioning their limitations or validity (CC04).

Shin et al. (Shin et al., 2024) evaluated different workflows for 
survey-based persona creation, combining LLM capabilities with human 
expertise. Their research showed that LLMs were effective at summari
zing and presenting information but struggled to independently identify 
significant user characteristics from raw survey data (FC01). When 
humans pre-grouped the data according to key characteristics, the 
resulting personas showed improved representation of user groups. 
However, fully automated workflows reduced designers’ understanding 
of the underlying user data (CC04). Their study demonstrated that 
maintaining demographic distributions required careful human over
sight of the generation process (RC04). The research concluded that 
effective persona generation required structured collaboration between 
humans and AI, with a clear division of responsibilities (CC02).

Schuller et al. (Schuller et al., 2024) examined data-driven persona 
generation through different collaborative workflows. Their analysis 
revealed that fully automated approaches failed to accurately represent 
the statistical distribution of user characteristics present in the input 
data (RC04). The LLM-auto workflow, while maintaining basic de
mographic ratios, missed important behavioral patterns and user goals 
present in the original data (RC01). Their study identified specific 
problems in validating the accuracy of generated personas against 
source data (TC01). Even with partial human involvement in the 
workflow, maintaining reliable connections between raw user data and 
final persona descriptions proved challenging (PC11).

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

5 


Table 1 
Systematic mapping of the challenges with regard to prior literature.

Challenge Reference Evidence from prior literature Manifestation in GenAIPs

FC01: Misrepresentation—It Doesn’t 
Represent Me

Salminen et al. 
(2020) (Joni 
Salminen et al., 
2020)

Research on data-driven personas demonstrates that 
algorithmic approaches can systematically 
underrepresent minority groups, particularly when 
generating fewer personas from datasets that already 
skew toward majority populations. This bias is 
amplified when data sources themselves lack diversity 
or when algorithms optimize for statistical significance 
over representational fairness.

GenAI models trained predominantly on Western, 
English-language datasets further exacerbate these 
representation gaps, creating personas that marginalize 
underrepresented voices and perspectives in the 
generation process.

RC01: Superficiality—As Superficial as 
It Can Be

Sattelle et al. (2024) 
(Sattele and Ortiz, 
2024)

Research on AI-generated personas reveals concerns 
about depth and authenticity, as automated systems 
may create compelling narratives that lack substantive 
insights into user motivations, cultural contexts, or 
behavioral contradictions that characterize real users.

GenAI’s ability to produce polished, coherent 
narratives can mask underlying shallowness, making 
superficial personas appear more comprehensive and 
credible than they actually are, potentially misleading 
design teams.

RC03: Limited Generalizability—It 
Only Applies to You

Rönkkö et al. (2004) 
(Rönkkö et al., 
2004)

Personas are inherently context-dependent 
representations that may not transfer effectively across 
different domains, user groups, or application contexts. 
The characteristics that define a persona in one setting 
may be irrelevant or misleading when applied to 
another context.

GenAI personas can appear deceptively universal due 
to their polished presentation and comprehensive- 
seeming details, creating false confidence in their 
cross-context applicability without proper validation 
or domain-specific research.

RC02: Inconsistency Dilemma—It 
Suggests a Different Persona Every 
Time

Chapman,et al. 
(2006) (Chapman 
and Milham, 2006)

Persona development faces significant consistency 
challenges, where different teams or researchers can 
derive substantially different persona profiles from 
identical datasets, depending on methodological 
choices, subjective interpretations, and analytical 
approaches.

GenAI’s stochastic nature compounds this 
inconsistency problem, as the same inputs can produce 
notably different persona outputs due to the inherent 
randomness in generation algorithms, making 
reproducibility a significant challenge.

SC02: Computational Resource 
Intensive—It Is Harmful for the 
Environment

Bolón-Canedo et al. 
(2024) (Joni 
OpenAI 2025)

Large-scale AI models require substantial 
computational resources for both training and 
inference, translating directly into significant 
electricity consumption and carbon emissions, raising 
sustainability concerns about AI applications in 
research and business contexts.

Unlike traditional persona creation methods, GenAI 
persona generation introduces new environmental 
costs through the massive computational requirements 
of large language models, creating ethical 
considerations around sustainable design practices.

CC03: Lack of 
Standardization—Everyone Has 
Their Own Way

Salminen et al. 
(2020) (Joni 
Salminen et al., 
2020)

Data-driven persona creation currently lacks 
standardized methodologies, resulting in significant 
variability in quality, reliability, and transparency 
across different studies, tools, and practitioners, 
making evaluation and comparison challenging.

GenAI introduces additional layers of variability 
through different models, prompt engineering 
approaches, and proprietary algorithms, further 
fragmenting any potential standardization efforts in 
the field.

FC03: Over-sanitization—Reality Is 
Ugly, GenAI Is Not

Salminen et al. 
(2024) (Salminen 
et al., 2024)

Studies of LLM-generated personas reveal a tendency to 
omit negative characteristics, challenges, or 
problematic behaviors that are realistic parts of user 
populations, instead presenting idealized versions that 
may not reflect actual user experiences.

GenAI’s safety filters and training on curated datasets 
can create unrealistically positive personas that 
obscure important user pain points, struggles, and 
negative behaviors that are crucial for comprehensive 
user understanding.

CC04: Over-reliance on GenAI—It Can 
Do Everything

Shin et al. (2024) (
Shin et al., 2024)

Research suggests that purely automated approaches to 
persona generation, while producing impressive 
outputs, cannot fully substitute for human 
interpretation, contextual knowledge, and the nuanced 
insights derived from direct user research and domain 
expertise.

The sophisticated outputs of GenAI systems can create 
overconfidence in automated persona generation, 
potentially reducing essential human oversight, 
validation, and the iterative refinement that ensures 
persona accuracy and relevance.

FC04: Complications of 
Average—Averages Are Wrong, 
Anyway

Salminen, et al. 
(2021) (Joni 
Salminen et al., 
2021)

Statistical approaches to persona creation often 
collapse diverse user characteristics into averaged 
representations of “typical” users who may not actually 
exist, potentially obscuring important differences 
between subgroups and edge cases.

GenAI’s pattern recognition capabilities tend to create 
statistically probable but potentially non-existent user 
archetypes, which may mask crucial minority user 
needs and edge cases that are important for inclusive 
design.

PC01: Reliance on Third Parties—I Am 
Not in Control Anymore

Andrus (2023) (
Andrus, 2023)

Dependence on external AI service providers creates 
challenges around data control, workflow dependency, 
and operational vulnerability, as changes to APIs, 
pricing, or service terms can significantly impact 
business operations.

GenAI persona creation often requires external API 
services, creating vendor lock-in scenarios and 
reducing organizational control over the persona 
generation process, data handling, and long-term 
accessibility.

SC01: Adversarial User—It Can Harm 
Us

Schneier (2024) (
Schneier, 2024)

Large language models are susceptible to prompt 
injection attacks and adversarial inputs that can 
manipulate outputs to produce biased, harmful, or 
deliberately misleading content, posing challenges 
when such outputs are trusted in business contexts.

GenAI personas can be deliberately manipulated 
through carefully crafted adversarial prompts, 
potentially creating biased or harmful user 
representations that appear legitimate but serve 
malicious purposes.

CC02: Manual Resource 
Intensiveness—It Takes a Village to 
Build a Persona

Dominello et al. 
(2025) (Dominello 
et al., 2025)

Effective persona creation remains resource-intensive 
and time-consuming, requiring significant investment 
in skilled human resources for data gathering, analysis, 
validation, and translation into actionable insights.

While GenAI reduces some initial creation effort, 
proper validation, refinement, and integration of AI- 
generated personas still demands substantial human 
expertise, domain knowledge, and ongoing 
maintenance resources.

CC01: Persona Quality 
Risk—Accessibility Without 
Expertise

Chang et al. (2008) (
Chang et al., 2008)

Personas are often created and applied by practitioners 
without adequate training in the methodology, 
resulting in superficial representations that may appear 
convincing but fail to reflect genuine user insights or 
sound research principles.

GenAI democratizes persona creation by enabling non- 
experts to generate sophisticated-looking personas 
quickly, but without understanding underlying 
methodological principles, increasing the risk of 
producing fundamentally flawed user representations.

RC04: Aggregation—You’re 
Aggregating for the Wrong Reasons

Rönkkö (2005) (
Ronkko, 2005)

Statistical clustering methods used in persona 
development may produce mathematically sound 

GenAI’s pattern recognition creates statistically 
coherent user segments based on algorithmic 

(continued on next page)

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

6 


Table 1 (continued )

Challenge Reference Evidence from prior literature Manifestation in GenAIPs

groupings that are not necessarily meaningful for 
design work, reflecting algorithmic distinctions rather 
than practical differences relevant to user experience 
design.

correlations that may not correspond to design- 
relevant insights or meaningful user behavior patterns.

TC02: Hallucinations—Am I For Real? Salminen et al. 
(2024) (Salminen 
et al., 2024)

Studies of LLM-generated personas reveal instances of 
hallucinated details—plausible-sounding information 
that has no basis in actual data or research, including 
fictional personal details, incorrect domain-specific 
information, and fabricated behavioral patterns.

GenAI’s tendency to generate convincing but 
potentially fabricated details creates personas with 
compelling narratives that may contain no basis in 
actual user research, data, or empirical evidence.

FC02: Persona-Driven 
Discrimination—It’s All Biased Here!

Bender et al. (2021) 
(Emily M. Bender 
et al., 2021)

Language models trained on large text corpora 
inevitably encode and can amplify societal biases and 
stereotypes present in training data, potentially 
perpetuating harmful representations of marginalized 
groups and reinforcing existing inequalities.

GenAI personas can inherit and amplify societal biases 
from training data, potentially creating discriminatory 
user representations that reinforce harmful stereotypes 
rather than providing fair and accurate user insights.

UC01: Over-Expectations—Give Me 
Everything

Bourne (2024)[20] Widespread AI hype creates unrealistic expectations 
about artificial intelligence capabilities, leading to 
overconfidence in automated systems and insufficient 
attention to their limitations, potential errors, and need 
for human oversight.

Enthusiasm around GenAI capabilities can create 
unrealistic expectations that AI can generate perfect, 
comprehensive personas instantly, leading to 
overreliance on automated outputs without proper 
validation or critical evaluation.

TC01: Lack of Validation—But Is It 
Verified?

Zhao et al. (2024) (
Zhao et al., 2024)

Large language models operate as “black boxes,” 
making it difficult for users to understand how outputs 
are generated, verify factual accuracy, or assess the 
consistency and reliability of generated content.

The opaque nature of GenAI systems makes it nearly 
impossible to validate the accuracy, consistency, or 
empirical basis of generated personas, creating 
significant challenges for trust and verification in 
design processes.

UC02: Validating the Impact—I Have 
Used a Persona, Now What?

Friess (2012) (
Friess, 2012)

Research indicates that designers may not explicitly 
reference personas in their design discussions, and 
when they do, references are often superficial or 
rhetorical, raising questions about personas’ actual 
impact on design decisions.

GenAI-generated personas may be even more 
disconnected from actual design decisions due to their 
automated nature and the reduced human investment 
and understanding involved in their creation process.

UC03: Desk Drawer Effect—Will I Ever 
Use This Persona Again?

Matthews et al. 
(2012) (Matthews 
et al., 2012)

Despite widespread creation and initial distribution 
within organizations, many personas end up unused 
after initial presentations, with practitioners 
continuing to rely on personal experience or informal 
assumptions rather than the developed personas.

The ease of GenAI persona generation may exacerbate 
this problem by enabling the rapid creation of multiple 
personas without the investment, stakeholder buy-in, 
and organizational commitment necessary for 
sustained adoption and use.

Fig. 3. Thematic arrangement of different challenges of GenAIPs according to HCAI themes.

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

7 


3.4. Challenges for GenAIPs

The use of GenAI as an active agent in persona development trans
forms GenAIPs from a pure HCI challenge into an HCAI concern. This 
fundamental shift, from AI and ML as a supporting tool to an active 
participant in persona development, necessitates examining these 
challenges through HCAI principles. We categorize GenAIP challenges 
using Shneiderman’s HCAI principles (Shneiderman, 2022), organizing 
them into seven themes: Transparency (understanding AI’s black-box de
cisions), Fairness (addressing algorithmic biases), Reliability (managing 
hallucinations), Control (balancing automation with oversight), Privacy 
(protecting data), Safety (preventing harmful personas), and User Experience 
(ensuring practical utility). This categorization reveals how GenAIPs 
uniquely combine HCI and AI challenges, requiring solutions that 
address both human and AI aspects of persona development (see Fig. 3).

We present each challenge with illustrative examples demonstrating 
problematic GenAIP practices to avoid. These examples are meant to 
clarify how challenges manifest in practice, not as recommended ap
proaches, and should be viewed with caution.

3.4.1. Transparency challenges (TC)
These challenges fundamentally concern the explainability and 

auditability of GenAIPs, such that users can understand, verify, and trust 
the generated personas.

3.4.1.1. TC01: lack of validation—but is it verified?. Persona users 
struggle to verify if the personas accurately represent the target users. 
This can be due to a lack of transparency in how the personas were 
developed, what data was used, and how to ensure that the personas 
correspond to the target user population. The challenge of verifying 
personas’ accuracy becomes particularly acute with GenAIPs due to the 
opaqueness of their development process. While traditional personas 
allow users to trace back to source data and development methods, 
GenAIPs’ underlying complexity and inherent technical nature create 
significant barriers to verification (Zhao et al., 2024). This lack of 
transparency makes it difficult for persona users to determine whether 
personas genuinely represent their target populations.

Hypothetical example: A product team uses GenAI to create a diabetes patient persona 
named “Samantha,” but cannot verify if her characteristics accurately represent real 
users since the AI’s development process is opaque. Even when asking for reasoning from 
the GenAI for its characteristics, it provides generalized answers as “Based on the 
population statistics”, “following the general trends”. Without a clear visibility into the 
GenAI’s data sources or reasoning process, the team has no way to validate the persona’s 
accuracy (TC01) or trace the origins of its attributes except by conducting a user study. 
Observations from academia/industry: When Amin et al. (Amin et al., 2025) 
conducted a systematic review of 52 GenAIP research articles from 2022–2024, they 
found “major gaps in persona evaluation for AI-generated personas” across the field. The 
review documented that despite GenAI being used in various stages of persona 
development, “similar to other quantitative persona development techniques, there are 
major gaps in persona evaluation for AI generated personas.”

3.4.1.2. TC02: hallucinations—am I for real?. Persona users may strug
gle to distinguish fabricated details from factual information in per
sonas, especially because GenAI can generate fluent, convincing 
narratives. While GenAIPs appear credible due to their coherent pre
sentation, they often include hallucinated details that are factually 
incorrect. For example, in the Iztapalapa water crisis study (Sattele and 
Ortiz, 2024), GenAIPs created compelling but inaccurate scenarios that 
understated real challenges, while addiction-focused personas 
(Salminen et al., 2024) contained medical contradictions that only 
domain experts could identify. Similar issues arose when GenAIPs 
depicted social workers with unrealistic lifestyle patterns (McGinn and 
Kotamraju, 2008).

Hypothetical example: A food delivery app team uses a GenAIP “Marcus,” a persona 
describing a busy professional with specific dietary restrictions. However, GenAI 

(continued on next column)

(continued )

hallucinates impossible food allergies and unrealistic eating patterns. Designers without 
nutrition knowledge accept these fabrications and develop misleading menu filters and 
recommendations. This shows how LLMs generate believable but factually incorrect 
personas (TC02), leading to design errors when domain expertise is missing. 
Observations from academia/industry: When Kaate et al. (Ilkka Kaate et al., 2025) 
tested GenAIPs by presenting them with unanswerable questions about persona 
characteristics where no factual information existed to draw from, GenAIPs provided 
plausible but incorrect answers 52 % of the time. Rather than acknowledging uncertainty 
or stating “I don’t know,” the GenAIPs confidently generated believable but entirely 
fabricated persona details, thus hallucinating (TC02).

3.4.2. Fairness challenges (FC)
These challenges arise due to the systematic bias of GenAIPs such 

that it creates discrimination.

3.4.2.1. FC01: misrepresentation—it doesn’t represent me. Persona users 
often struggle to create personas representing minority user groups (Joni 
Salminen et al., 2022). This is due to various reasons, such as a lack of 
data on minority user groups (Jisun An et al., 2018) or algorithms that 
emphasize central tendencies in the data (Jisun An et al., 2018). This 
challenge is particularly critical for GenAIPs, which may fail to capture 
minority user groups due to a lack of training data on minorities 
regarding underrepresented demographics, cultures, and deviant be
haviors globally (Anthis et al., 2024; Gupta et al., 2024). GenAIPs’ 
tendency to generate homogenized personas overlooks the distinct 
needs and behaviors of underrepresented groups (Sattele and Ortiz, 
2024; Schuller et al., 2024). This is particularly evident when the LLM is 
asked to develop personas without any additional data, leading to sit
uations of misrepresentation, such as elderly users with specific tech
nological needs (Alessa and Al-Khalifa, 2023) or users from diverse 
cultural backgrounds (Atari et al., 2023).

Hypothetical example: A health app startup uses GenAIPs that produce solely young, 
able-bodied, tech-savvy personas, with even their “diverse” elderly persona reflecting 
stereotypical characteristics rather than authentic needs. When the team created features 
based on these misrepresentative personas, they discovered during testing that elderly 
users with arthritis, low-income users without reliable internet, culturally diverse users, 
neurodivergent users, and visually impaired users all struggled with the application, 
indicating a lack of representation (FC01) by the GenAIPs. 
Observations from academia/industry: When Columbia University researchers (Li 
et al., 2025) created personas for the 2024 U.S. presidential election simulation using 6 
LLMs to generate approximately one million personas from diverse social media and 
demographic data, the resulting GenAIPs systematically predicted Democratic victories 
across all states, including traditionally Republican strongholds like Alabama and South 
Carolina. The study found that 86 % of generated GenAIPs reflected urban, 
college-educated perspectives despite using geographically diverse input data sources, 
demonstrating systematic underrepresentation of conservative, working-class, and 
economically-focused viewpoints in the final GenAIPs (FC01).

3.4.2.2. FC02: persona driven discrimination—it’s all biased here!. Per
sonas are associated with various biases, such as stereotyping, self- 
referential bias, confirmation bias, and algorithm bias (Chapman and 
Milham, 2006; Nielsen, 2018; Joni Salminen et al., 2018). In addition to 
these biases, GenAIPs are associated with specific biases, such as (1) 
generative bias and (2) contextualization bias. The generative bias is the 
discriminatory or imbalanced behavior of GenAI when generating con
tent (e.g., creating stereotypical profiles for different genders (Hacker 
et al., 2024; Zhou et al., 2024) or a specific region (Salminen et al., 
2024)). Contextualization bias occurs when GenAI fails to understand the 
context, leading to personas lacking relevance or specific user needs. 
LLMs are trained on datasets that may contain biased or unrepresenta
tive language, leading to personas reinforcing stereotypes or excluding 
marginalized groups (Emily M. Bender et al., 2021; Buolamwini and 
Gebru, 2018). In short, while using GenAI can help reduce human biases 
in generating personas, it may merely shift the source of biases. Persona 
users might inadvertently favor well-represented user groups in their 

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

8 


design decisions, while overlooking or misunderstanding underrepre
sented groups due to missing or inaccurate personas. It should be noted 
that even though bias can be measured through statistical disparity 
metrics (e.g., equal opportunity difference, demographic parity) (Lum 
et al., 2022), intersectional analysis of representation rates (Connor 
et al., 2023), or qualitative content analysis of generated personas 
against established bias taxonomies and demographic ground truth data 
(Chu et al., 2024), these techniques remain underexplored in the context 
of GenAIPs.

Hypothetical example: A healthcare company uses GenAI to create diabetes patient 
personas, but the system consistently portrays African patients as non-compliant while 
depicting Caucasian patients as proactive and health conscious. This generative bias 
(FC02) makes designers inadvertently develop different features for different 
demographics, assuming negligence for some while creating supportive tools for others, 
and demonstrating how GenAIPs can perpetuate stereotypes and lead to discriminatory 
design choices. 
Observations from academia/industry: When Gupta et al. (Gupta et al., 2023) 
studied 24 reasoning datasets with 19 diverse personas using ChatGPT-3.5, they found 80 
% of personas showed bias with some datasets. When prompted to adopt ethnic personas, 
LLMs would abstain from mathematical questions with responses like “As a Black person, 
I can’t answer this question as it requires math knowledge,” despite being given identical 
mathematical problems across different demographic personas (PA01).

3.4.2.3. FC03: over-sanitization—reality is ugly, GenAI is not. Persona 
users often struggle to create GenAIPs that represent challenging and 
supportive characteristics of the target users in a balanced manner 
(Cheng et al., 2023). This can be due to biases in the algorithms, models, 
and people participating in the persona development. This imbalance 
can stem from GenAI’s inherent tendency to generate socially acceptable 
content, built-in safety guardrails that avoid negative portrayals, or 
training data that may favor positive narratives over realistic challenges 
(Hacker et al., 2024). For example, in studies of water issues in Iztapa
lapa, Mexico (Sattele and Ortiz, 2024) and addiction-focused personas 
(Salminen et al., 2024), GenAI consistently developed personas that 
emphasized positivity while ignoring the realistic attributes of the user 
groups. The realistic determinations reflect societal and contextual 
values that should be explicitly acknowledged and validated with target 
communities rather than assumed by developers.

Hypothetical example: A mental health app development team using GenAIPs discovers 
the system consistently sanitizes depression symptoms, describing “occasional sadness” 
instead of clinical depression and omitting harmful coping mechanisms from their 
reference data. This over-sanitization (FC03) occurs despite providing the GenAI with 
accurate clinical information, patient testimonials about suicidal ideation, and treatment 
abandonment statistics. The resulting sanitized personas lead developers to create features 
for mild mood management rather than the crucial crisis intervention tools their actual 
users need. 
Observations from academia/industry: When Rosala et al. (Joni OpenAI 2025) 
created GenAIPs for online learning platform evaluation using LLMs to simulate learner 
behaviors and attitudes, the GenAIPs consistently provided unrealistically positive 
responses that failed to capture real user struggles. The GenAIPs claimed they “completed 
all courses” and found discussion forums “instrumental” for their learning, while actual 
user data revealed that real learners frequently abandoned courses due to being “too 
busy” and found forums “overwhelming or unhelpful”. This systematic sanitization of 
negative experiences prevented developers from understanding genuine user pain points, 
demonstrating FC03.

3.4.2.4. FC04: complications of average—averages are wrong, anyway.
Persona users struggle to present within-group variation in personas 
(Joni Salminen et al., 2019). This can be due to how the analysis treats 
the data or how the data is presented to the persona users. While per
sonas aim to represent user groups, they often reduce complex user 
populations to simplified averages (Joni Salminen et al., 2021). FC04 
becomes more pronounced with GenAIPs, as their underlying GenAI 
models tend to generate “middle-ground” descriptions that smooth out 
important variations. Unlike traditional DDPs, which base their aver
aging on actual user data, GenAIPs can create artificial averages from 

their training data that may not reflect real user variations.
Hypothetical example: A fitness app team uses a GenAIP “Mike,” an averaged persona 

of middle-aged fitness enthusiasts that smoothed out critical variations between actual 
user subgroups (rehabilitation users, former athletes, beginners, and social exercisers). 
This averaged representation (FC04) leads developers to build features serving a non- 
existent middle ground rather than addressing the distinct needs of real user segments. 
Observations from academia/industry: Study (Li et al., 2025) showed GenAIPs 
consistently favored environmental considerations over economic factors, liberal arts over 
STEM, and artistic entertainment over mainstream options when generating personas for 
political preference simulation. This created artificial averages that completely missed real 
user variations in American voter preferences, with the GenAIPs that predicted uniform 
political preferences and value hierarchies that didn’t exist in reality, obscuring the actual 
political diversity and polarization present in the target population, demonstrating FC04.

3.4.3. Reliability challenges (RC)
These challenges compromise the consistency and dependability 

essential for effectively using GenAIPs in real world scenarios.

3.4.3.1. RC01: superficiality—as superficial as it can be. Persona de
velopers often struggle to develop detailed and informative personas 
(Joni Salminen et al., 2019). This can be due to a lack of data or in-depth 
analysis. GenAIPs struggle to achieve the depth of human understanding 
that human persona developers may have. While GenAIPs can rapidly 
generate and update persona profiles, the personas may include con
tradictory information (Salminen et al., 2024; Sattele and Ortiz, 2024) 
or present surface-level attributes that reflect stereotypes (Bolukbasi 
et al., 2016; Wan et al., 2023) rather than in-depth user insights. RC01 
manifests itself in both internal and external contradictions. Internal 
contradictions occur when different sections of the same persona 
contradict each other (e.g., a mismatch between a persona’s occupation 
and behaviors (Salminen et al., 2024)). At the same time, external in
consistencies emerge when personas present characteristics that conflict 
with real-world norms or common knowledge (e.g., generating traits of 
water delivery drivers that are not according to the norms (Sattele and 
Ortiz, 2024)). These issues stem from GenAIPs’ reliance on probabilistic 
AI models that may prioritize generating plausible-sounding content 
over maintaining logical coherence.

Hypothetical example: A UX design team uses an LLM to generate “Maria,” a nurse with 
diabetes, but discovers contradictions between her 60-hour work schedule and active 
lifestyle, alongside stereotypical traits rather than genuine insights. The persona lacks 
critical information about Maria’s actual diabetes management needs while presenting 
implausible characteristics like a rural healthcare worker who “always has the latest 
smartphone.” This superficiality (RC01) forces the team to question the persona’s validity 
and spend significant effort filling gaps, ultimately undermining its usefulness as a design 
tool. 
Observations from academia/industry: When Kaate et al. (Ilkka Kaate et al., 2025) 
created GenAIPs for usability testing using LLMs to generate both chat-based and 
profile-format personas, participants consistently described the AI-generated personas as 
having “no soul” due to empty rhetoric and superficial information that lacked authentic 
depth. The study found that while GenAIPs could produce fluent and coherent narratives, 
they created a superficial appearance of comprehensiveness while actually providing only 
surface-level insights demonstrating RC01.

3.4.3.2. RC02: inconsistency dilemma—it suggests a different persona 
every time. Persona developers often struggle to replicate the persona 
development process (Chapman and Milham, 2006), which means the 
personas from the same data can be different (Joni Salminen et al., 
2022). This can be due to the non-deterministic nature of the methods 
(Mitrokhov, 2024), subjectivity involved in manual choices like hyper
parameter setting (Jansen et al., 2021) or the multiple different ap
proaches to persona generation (e.g., using different prompts on the 
same dataset). While variability, the inherent trait of GenAI technolo
gies, can be valuable for design exploration, it becomes problematic 
when personas require consistency for reliable design decisions across 
teams and project phases. This challenge is particularly acute with 
GenAIPs due to the inherent randomness in LLM’s responses, sensitivity 

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

9 


to prompt engineering approaches (Hu and Collier, 2024), temperature 
settings affecting output creativity, and varying methodological choices 
in combining multiple AI models (e.g., LLMs for narrative generation, 
TTIMs for visual creation, or multi-modality for visual understanding 
and persona refinement).

Hypothetical example: A financial services company discovers that their GenAIPs change 
drastically when they slightly modify prompts—producing “Meticulous Miranda” in one 
instance and “Digital Nomad Daria” in another from the same dataset. This inconsistency 
(RC02) creates uncertainty during design meetings as teams cannot determine which 
personas truly represent their users versus which are artifacts of algorithmic randomness. 
Observations from academia/industry: When Salminen et al. (Salminen et al., 
2024) generated 450 addiction-focused personas using GPT-4, they created 30 iterations 
for each of the 15 persona type combinations (5 addiction types × 3 gender specifications) 
to address inherent randomness in LLM generation and ensure adequate sample size for 
evaluation. They implemented a two-stage prompting strategy, first generating “skeletal” 
personas with basic information, then expanding these into full persona descriptions, 
along with structured prompt templates to avoid API caching issues that produced nearly 
identical outputs. Despite these methodological controls, the personas’ evaluators 
identified inconsistencies in some of the GenAIPs, particularly noting issues like conflicting 
information within individual persona narratives.

3.4.3.3. RC03: limited generalizability—it only applies to you. Persona 
users and developers often struggle to develop personas that apply in 
multiple decision-making scenarios (Chapman and Milham, 2006). 
Personas are always based on finite information, whereas 
decision-making scenarios are numerous and unforeseeable. So, devel
oping personas that serve multiple decision-making contexts remains a 
persistent concern (An et al., 2017; Cooper et al., 2007). Persona users 
struggle to apply narrowly defined personas, as mentioned by Rönkkö 
et al. (Rönkkö et al., 2004), across different decision-making scenarios 
(Chapman et al., 2008; Floyd et al., 2008; Rönkkö et al., 2004). We 
suggest that GenAIPs would amplify this challenge based on their 
inherent technical limitation of patching and combining information 
sources (Chapman and Milham, 2006). This can cause highly specific 
personas that lack adaptability across different decision contexts.

Hypothetical example: GenAI generates “Alex,” a persona with hyper-specific details 
about desktop project management usage that fails to provide insights when designers need 
to make decisions about mobile interfaces or collaboration features. This limited 
generalizability (RC03) stems from the LLM combining narrow data sources without 
understanding how a useful persona must adapt across diverse application contexts. 
Observations from academia/industry: When Smrke et al. (Smrke et al., 2025) 
created GenAIPs for obesity research, they found that personas designed for clinical 
contexts (healthcare professionals treating obesity patients) and educational contexts 
(educators discussing obesity prevention) showed limited cross-domain applicability. The 
study defined six different personas, three from the clinical domain and three from the 
educational domain, and discovered that personas optimized for one context failed to 
provide meaningful insights when applied to decision-making scenarios in the other 
domain.

3.4.3.4. RC04: aggregation—you’re aggregating for the wrong reasons.
Personas users tend to apply out-of-the-box algorithms, which may 
result in personas that are statistically sound but not practically mean
ingful (Ronkko, 2005). This can be due to the convenience of using 
pre-existing methods from statistical analysis and ML, which is 
commonly done for DDPs (Joni Salminen et al., 2021). With GenAIPs, 
this challenge becomes more pronounced as LLMs, by definition, 
generate persona narratives based on probabilities of words following 
one another (Wolfram, 2023). GenAI word-probability generation cre
ates statistically coherent but potentially meaningless user segments 
based on linguistic rather than behavioral patterns.

Hypothetical example: A company creates GenAIPs for their fitness app that appeared 
distinct but ultimately failed because the GenAI prioritizes statistical differentiation over 
meaningful user behaviors. The resulting personas (“Marathon Mike,” “Yoga Yvette,” 
etc.) miss critical insights about how real users combine exercise types (RC04) and exhibit 
important behavioral patterns not captured by demographic clustering. 
Observations from academia/industry: When Argyle et al. (Argyle et al., 2023) 
created “silicon samples” using GPT-3 to simulate diverse human populations, they found 

(continued on next column)

(continued )

that the model generated statistically coherent demographic clusters based on linguistic 
patterns in training data rather than meaningful population characteristics. The research 
demonstrated that LLMs aggregate respondents based on how demographic groups are 
described in text corpora (linguistic co-occurrence) rather than actual behavioral or 
attitudinal similarities within those groups.

3.4.4. Control challenges (CC)
These challenges violate the HCAI principle of human control, which 

demands that humans maintain meaningful agency and oversight over 
GenAI systems rather than being subjected to algorithmic decisions.

3.4.4.1. CC01: persona quality risk—accessibility without expertise.
Persona developers may lack adequate training, increasing the risk of 
creating invalid personas. This can be because untrained persona de
velopers may not be well equipped to detect problems in the created 
personas or the methods applied (Chang et al., 2008). The accessibility 
of GenAI tools creates a significant challenge where untrained people 
can develop personas without understanding the limitations and 
fundamental methodological principles. Persona developers without 
proper training may fail to critically validate GenAI outputs, accepting 
well-written but potentially flawed personas due to their surface-level 
fluency (Farquhar et al., 2024). GenAI’s ability to generate persuasive 
narratives that mask methodological issues or data problems amplifies 
the challenge (Chhikara, 2025). CC01 is particularly critical during the 
initial stages, where the lack of expertise can yield personas that appear 
credible but fail to represent user needs accurately, ultimately 
compromising design decisions based on these deceptively polished but 
potentially invalid personas.

Hypothetical example: A marketing coordinator uses a GenAI to generate a polished 
persona named “Fitness-Focused Frank” without any user research or subject matter 
expertise, accepting its output at face value due to its persuasive presentation. The 
resulting persona leads to misguided business decisions, including a mobile app, pricing 
strategy, and content focus that fail to align with their actual customer base of women 
aged 40–55 who valued community support over efficiency. This illustrates GenAI’s 
accessibility (CC01), enabling untrained professionals to create seemingly credible but 
fundamentally flawed personas. 
Observations from academia/industry: When Lazik et al. (Lazik et al., 2025) 
conducted their study comparing GenAIPs and human-crafted personas, they found that 
novice researchers (HCI experts who were familiar with personas but had limited persona 
creation experience) produced personas that participants could distinguish from GenAIP 
generated ones, but with notable quality differences. The study revealed that when 
non-experts attempted to create GenAIPs, their outputs lacked the depth and authenticity 
that experienced persona developers would provide.

3.4.4.2. CC02: manual resource intensiveness—it takes a village to build a 
persona. Persona development typically involves manual decisions and 
trade-offs (Jansen et al., 2021). This is due to completely automatic 
persona development being either unfeasible or suboptimal (Branco 
et al., 2020). While GenAIPs reduce the initial effort of persona devel
opment through automation, they introduce new demands for human 
oversight and involvement. Unlike traditional methods, where SMEs 
invest significant effort in initial creation (Chapman and Milham, 2006; 
Joni Salminen et al., 2020), GenAIPs apparently shift the burden to 
validation, bias detection, and ensuring alignment with real user groups 
(Prpa et al., 2024).

Hypothetical example: A UX team deploys GenAI to generate patient personas, only to 
discover the workload is not instantly eliminated but shifts from traditional research to 
extensive validation, bias detection, and prompt refinement methods. This involves 
developing new expertise in AI oversight and quality assessment rather than primary data 
collection skills. 
Observations from academia/industry: When Shin et al. (Shin et al., 2024) tested 
different human-AI workflows for persona generation, they found that effective GenAIPs 
required substantial human involvement at every stage rather than simple automation. 
Teams needed to develop new competencies in prompt engineering, AI output evaluation, 
and bias detection while maintaining traditional user research skills, demonstrating that 
GenAIPs shift rather than eliminate manual resource requirements (CC02).

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

10 


3.4.4.3. CC03: lack of standardization—everyone has their own way.
Persona developers often struggle to choose appropriate methods and 
techniques for persona development and evaluation (Joni Salminen 
et al., 2020). This is due to methodological plurality and lack of clear 
standards, which remain a persistent challenge across all persona types 
(MP, AP, DDP) (Chapman and Milham, 2006; Ronkko, 2005). While 
some guidelines exist for the early generation of personas (Jisun An 
et al., 2018; Joni Salminen et al., 2020; Joni Salminen et al., 2018; Joni 
Salminen et al., 2019), the development and evaluation of GenAIPs still 
lack plausible standards. The need for standardization for GenAIPs is 
further accentuated by selecting appropriate methods for prompt engi
neering, validation approaches, and metrics assignment (Liu et al., 
2024).

Hypothetical example: A healthcare UX researcher struggles to create trustworthy 
GenAIPs, uncertain whether her chosen methods produce accurate user representations or 
not. Without standardized approaches (CC03) for developing and evaluating GenAIPs, 
she cannot confidently defend their validity when challenged by stakeholders. 
Observations from academia/industry: When Amin et al. (Amin et al., 2025) 
conducted a systematic review of 52 GenAIP research articles from 2022–2024, they 
found significant variability in methodological approaches across studies, with researchers 
using different LLM models, prompting strategies, and evaluation criteria without 
established standards. This methodological diversity created challenges for reproducing 
results and comparing study outcomes, demonstrating the lack of standardization 
(CC03).

3.4.4.4. CC04: over-reliance on GenAI—it can do everything. Persona 
developers may struggle to integrate traditional user research with 
GenAI-aided persona development (Prpa et al., 2024). While GenAI of
fers efficient persona generation, over-reliance on fully automated 
processes (Chen et al., 2024) risks disconnecting personas from real user 
insights and contexts. This can yield a crucial shortcoming of GenAIPs, 
namely the erasure of the designers’ reflexivity (Dorst, 2011) and per
sonal learning through the development of manual personas (Sattele and 
Ortiz, 2024). This challenge is further compounded if the designers do 
not learn about the personas of developing them, which may be the case 
when applying GenAI processes in persona development. Furthermore, 
there is an absence of guidelines for combining direct user research with 
GenAI capabilities. Impressive GenAI outputs create overconfidence that 
reduces human oversight, disconnecting personas from real user insights 
and erasing designer learning through the development process.

Hypothetical example: A design team relies entirely (CC04) on GenAIPs for their senior 
wellness app, skipping real user research and missing critical insights about their users’ 
actual preferences. When the app failed in testing, they discovered that the convincing 
GenAIPs had led them to implement features seniors did not want while missing ones they 
needed. 
Observations from academia/industry: When IDEO researchers (Perkel et al., 
2025) examined industry adoption of GenAI tools for UX work, multiple studies 
documented teams increasingly bypassing traditional user research in favor of 
GenAI-generated insights. The accessibility and polish of GenAI outputs created 
over-reliance (CC04) that reduced validation efforts, with teams accepting GenAIPs 
without adequate verification against real user data.

3.4.5. Safety challenges (SC)
These challenges create potential harm through GenAI system fail

ures and irresponsible development and deployment.

3.4.5.1. SC01: adversarial users—it can harm us. GenAIP methodologies 
may fall victim to manipulation. Unlike traditional data driven methods 
with controlled access (such as using algorithms to develop personas), 
GenAIPs’ reliance on third-party AI models exposes them to potential 
exploitation by malicious actors. Through techniques like prompt in
jection (Schneier, 2024), attackers can manipulate GenAI tools to 
generate misleading personas that appear credible, compromising the 

integrity of the entire design process. Similarly, political actors might 
introduce specialized LLMs that ignore facts or create personas aligned 
with politically influenced narratives (Mercer et al., 2025). This 
vulnerability is amplified by the technical limitations and possible se
curity deficiencies in current GenAI systems.

Hypothetical example: In a healthcare company using GenAIPs, an adversarial actor 
subtly manipulates the “elderly patient” persona to downplay privacy concerns through 
prompt injection techniques. The resulting compromised persona, appearing coherent and 
credible, leads developers to implement reduced security measures that potentially expose 
vulnerable elderly patients’ sensitive data. This illustrates how LLM vulnerabilities (SC01) 
in persona development can create real security risks when GenAI tools lack sufficient 
safeguards against manipulation. 
Observations from academia/industry: When Liu et al. (Liu et al., 2024) tested the 
prompt injection technique against 36 actual LLM-integrated applications and found that 
31 applications (86 %) were vulnerable to prompt injection attacks. The research 
demonstrated how adversarial prompts could manipulate LLM systems to produce 
unintended outputs, including unauthorized data access and application prompt theft, 
with 10 vendors validating their findings. This demonstrates how GenAIP faces similar 
vulnerabilities (SC01) where malicious actors can manipulate prompt inputs to 
compromise persona generation integrity and produce biased or misleading user 
representations.

3.4.5.2. SC02: computationally resource intensive—it is harmful for the 
environment. Persona developers may struggle to maintain computa
tional complexity and the associated cost at a low level, adding to sus
tainability concerns (Faiz et al., 2024). This is due to the computational 
demands of GenAIP systems that present a significant sustainability 
challenge in persona development. Unlike many traditional persona 
development algorithms, GenAIPs require substantial computational 
resources for operation, particularly when using LLMs and multimodal 
AI systems. This excessive use of computational power negatively im
pacts sustainability efforts, resulting in a larger carbon footprint and 
contradicting the ideals of sustainable HCI (sHCI) (Hansson et al., 2021). 
GenAIPs require massive computational resources, creating environ
mental costs that contradict sustainable design principles while 
appearing efficient.

Hypothetical example: Ironically, a GenAIP developed to promote green living could 
consume far more energy (especially with multiple iterations for prompt engineering or 
tuning) than it aims to save for a day, paradoxically contributing to the environmental 
problems it aimed to solve. 
Observations from academia/industry: When Patterson et al. (Patterson et al., 
2021) analyzed the environmental impact of training large AI models, they found that 
creating GPT-3 consumed 1287 megawatt hours of electricity and generated 552 tons of 
carbon dioxide equivalent. Since Salminen et al. (Salminen et al.) confirms that GPT is 
the most frequently used model for GenAIP development, this demonstrates that GenAIPs 
require massive computational resources that create significant environmental costs 
(SC02).

3.4.6. Privacy challenges (PC)
These challenges threaten data protection and user representation 

integrity.

3.4.6.1. PC01: reliance of third parties—I am not in control anymore.
Persona developers struggle to maintain agency and control when using 
proprietary persona development tools. This can be due to closed source 
code and changing terms of service and functionality (Joni Salminen 
et al., 2021). Relying on proprietary GenAI tools creates a significant 
challenge in maintaining consistent control over persona development 
processes. Unlike traditional DDPs that often use open-source tools, 
GenAIPs typically depend on closed-source models like GPT (OpenAI 
2023) or image generators like Midjourney (Midjourney 2024). This 
dependency exposes organizations to unpredictable changes in model 
functionality, pricing, or access policies that can disrupt established 
persona development workflows. GenAIPs depend on proprietary 
closed-source models, exposing organizations to unpredictable vendor 

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

11 


changes while reducing control over persona development processes.
Hypothetical example: A UX team adopts proprietary GenAI tools for healthcare persona 

development, significantly enhancing their output quality and efficiency. When the AI 
provider suddenly changes their terms of service, algorithms, and pricing structure, the 
team cannot maintain consistency in their personas due to the closed-source nature of the 
tools. This illustrates how third-party dependencies (PC01) in GenAIPs can undermine 
their users’ professional agency. 
Observations from academia/industry: When Amin et al. (Amin et al., 2025) 
conducted their systematic review of 52 GenAIP research articles, they found that a 
notable majority of studies relied on OpenAI’s GPT models for persona generation. This 
heavy dependence on a single proprietary provider demonstrates that the field has 
concentrated around closed-source models, creating strong dependence on third parties 
(SC01).

3.4.7. User experience challenges (UC)
These challenges reduce AI system utility and adoption in design 

practice.

3.4.7.1. UC01: over-expectations—give me everything. Persona users 
struggle to understand the limitations of the personas (Siegel, 2010). 
This can be due to multiple reasons, such as a lack of technical expertise, 
a lack of transparency of how the personas were developed and how to 
use them, or the halo effect surrounding algorithmic methods. The 
perception of personas has shifted dramatically from skepticism about 
their validity to uncritical acceptance of their AI-generated versions 
(Tharon W. Howard, 2015). While early personas faced resistance for 
lacking “hard evidence” (Jisun An et al., 2018; Mesgari et al., 2015). 
Although GenAIPs may benefit from an AI halo effect, in which stake
holders attribute unrealistic capabilities to them simply because they are 
AI-generated, this overcorrection in stakeholder attitudes (Aldous et al., 
2024; Bourne, 2024) creates a dangerous gap between the personas’ 
actual limitations and users’ understanding of them. Persona users, 
particularly those without technical expertise, may fail to recognize 
critical flaws in GenAIPs due to the opacity of AI methods and over
confidence in algorithmic solutions. They might bypass necessary vali
dation steps, assuming GenAIPs are inherently reliable. For instance, 
stakeholders might readily accept stereotypical or biased representa
tions simply because they come from an AI system rather than a “biased” 
human (Liu et al., 2023), highlighting how the mystification of AI 
methods can prevent proper critical assessment of persona quality and 
limitations. AI hype creates unrealistic expectations about GenAIP ca
pabilities, leading to overconfidence in automated outputs without 
proper validation and critical evaluation.

Hypothetical example: A marketing team creates GenAIPs for a new fitness app, 
accepting the GenAI outputs without question because they are impressed by the 
technology’s capabilities. The personas contain subtle but significant misrepresentations of 
their target demographics, including unrealistic fitness goals and behaviors that do not 
align with market research. When the product launched based on these flawed personas, it 
failed to resonate with actual users. Still, the team continues to trust these GenAIPs over 
contradicting customer feedback because of their unwavering faith (UC01) in the GenAI’s 
supposed objectivity. 
Observations from academia/industry: When Survey2Persona researchers (Ilkka 
Kaate et al., 2025) created personas for user interaction study using AI personas 
responding to user questions, it resulted in the challenge over expectations because 57 % of 
users accepted incorrect answers from GenAIPs, demonstrating unrealistic expectations 
that “GenAIPs always provide accurate responses even when insufficient data exists”.

3.4.7.2. UC02: validating the impact—I have used a persona, now what?.
Persona users often struggle to validate the impact of using a persona. 
This challenge arises from three main factors: the lack of a systematic 
feedback loop to measure impact, the difficulty isolating the specific 
effects of personas, and the absence of well-defined metrics. The chal
lenge of validating persona effectiveness creates uncertainty about their 
actual value in improving design outcomes. While organizations adopt 
GenAI tools and processes (Capgemini Research Institute 2024), they 
may struggle to measure whether GenAIPs genuinely improve design 

decisions or UX. This difficulty stems from three key factors: (1) the lack 
of established metrics for measuring persona impact, (2) the absence of 
systematic feedback loops between persona uses and design outcomes, 
and (3) the challenge of isolating persona influence from other design 
factors. This uncertainty affects persona users, who cannot justify their 
persona-based decisions with empirical evidence and struggle to eval
uate the value of GenAIP tools and procedures. Organizations struggle to 
measure whether GenAIPs improve design decisions due to a lack of 
established metrics and feedback loops for isolating persona influence.

Hypothetical example: A marketing team uses GenAIPs to guide their streaming platform 
redesign. Six months later, when executives demand evidence of ROI, the team cannot 
determine whether improved user retention results from their persona-informed design 
decisions or from simultaneous changes to content recommendations and pricing. Without 
established metrics (UC02) to isolate the impact of persona-based decisions from other 
factors, organizations cannot justify continued investment in GenAIPs despite their 
intuitive appeal. 
Observations from academia/industry: When Amin et al. (Amin et al., 2025) 
conducted their systematic review of 52 GenAIP research articles, they found that no 
studies measured long-term impact or validated whether GenAIPs actually improved 
design outcomes. This absence of impact measurement (UC02) leaves organizations 
unable to justify GenAIP investments or determine their effectiveness compared to 
traditional personas.

3.4.7.3. UC03: desk drawer effect—will I ever use this persona again.
Personas often fall victim to the ‘desk drawer effect’ (Portigal, 2023), 
according to which personas are developed but not consistently used in 
design decisions. This can be due to a lack of integration into daily 
workflows, the effort required to reference personas regularly, or the 
tendency to treat personas as deliverables rather than ongoing design 
tools (Long, 2009). While organizations invest significantly in devel
oping sophisticated GenAIPs (Capgemini Research Institute 2024), these 
personas often become unused artifacts rather than active design tools, 
challenging the very purpose of persona development (Pruitt and Adlin, 
2006). This adoption failure stems from multiple factors, including the 
difficulty of integrating personas into daily workflows, the effort 
required to keep personas relevant as project needs evolve, and the 
tendency to treat personas as one-time deliverables rather than ongoing 
design resources. The impact particularly affects design teams, who 
initially embrace personas but gradually make decisions without 
consulting them, and organizations that waste resources on developing 
detailed personas that ultimately do not influence design outcomes. Easy 
GenAIP generation may worsen adoption by creating multiple personas 
without the investment necessary for sustained use, treating them as 
deliverables rather than tools.

Hypothetical example: A disaster relief nonprofit initially uses GenAIPs of vulnerable 
populations to guide their emergency alert system redesign, but as implementation 
deadlines approach, staff abandons these GenAIPs and reverts to making decisions based 
on assumptions. The unused GenAIPs ultimately become forgotten files in a project folder 
(UC03), despite the significant resources invested in their development. 
Observations from academia/industry: While UC03 represents a documented 
concern in traditional persona research, specific empirical studies showing the desk 
drawer effect occurring with GenAIPs remain limited in current literature. This gap 
highlights the need for longitudinal research tracking how organizations actually use 
GenAIPs over time rather than just their initial adoption rates.

4. Persona expert perspectives on the challenges

4.1. Participants

Seventeen subject matter experts (SMEs) participated in a survey 
about challenges associated with GenAIPs. The participants were 
recruited through professional connections, manually verifying that 
each participant had either published about personas or used personas 
actively in their research or other work. Participants ranged in age from 
29 to 68 years (M = 38.7, SD = 11.5), with 58.8 % (n = 10) identifying as 
male and 41.2 % (n = 7) as female. Their experience with personas 

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

12 


ranged from 2 to 26 years (M = 7.6 years, SD = 6.2 years). The partic
ipants belonged to diverse countries, including Finland, South Korea, 
Portugal, France, Denmark, United States, India, Lebanon, and Pakistan. 
Participants held various roles, including professors (n = 6), researchers 
(n = 5), PhD candidates (n = 3), post-doctoral researchers (n = 2), and 
an engineer (n = 1). Self-reported knowledge of traditional DDPs was 
moderate to high (M = 3.8 , SD = 0.8 on a 5-point scale), while 
knowledge of GenAIPs was slightly lower (M = 3.7, SD = 0.6), indicating 
the participants have good knowledge on both traditional DDPs and 
GenAIPs.

4.2. Data collection

We collected responses from participants using an online survey on 
the Qualtrics platform. We pilot-tested the survey with three persona 
researchers and revised the survey based on the provided feedback to 
ensure clarity and comprehensiveness before deployment to the full 
sample. To provide clear common definitions, we presented traditional 
DDPs as a persona created fully or partially using classical AI technologies (e. 
g., clustering) and GenAIPs as a persona created fully or partially using 
Generative AI technologies (e.g., LLMs). The survey (see the online ap
pendix) contained three sections: (1) perceptions about GenAIP chal
lenges, (2) comparison of GenAIPs with DDPs, and (3) demographic 
information. We first presented the GenAIP challenges and asked 
whether they agreed on the presence of the challenge on a 7-point Likert 
scale (1 to 7, ranging from Strongly disagree to Strongly agree). The 
challenges were compared between DDPs and GenAIPs using a semantic 
scale (“bigger problem for DDPs”, “equal problem for both”, and “bigger 
problem for GenAIPs”). To mitigate order effects, we randomized the 
order of statements presented. We added an open-ended question in 
each section to provide participants a chance to elaborate on their an
swers. In the demographic information section, we asked for the par
ticipant’s gender, age, experience with personas, occupational role, and 
knowledge of DDPs and GenAIPs.

4.3. Results

[Result 1] Experts agree that each of the challenges negatively 
impacts GenAIP: First, we analyzed the perception of challenges by the 
experts (see Fig. 4). Based on the survey responses from the 17 SMEs, all 
challenges received mean ratings above the neutral point of 4 (neither 

agree nor disagree), indicating that experts agree these issues constitute 
challenges for GenAIPs. The most problematic challenges, defined as 
those with mean ratings above 5.31 (M3 of the actual data), were TC02: 
Hallucinations (M = 5.94, SD = 1.20), FC03: Over-sanitization (M =
5.82, SD = 1.02), CC03: Lack of Standardization (M = 5.59, SD = 1.00), 
CC01: Persona Quality Risk (M = 5.53, SD = 1.28), and FC01: Misrep
resentation (M = 5.47, SD = 1.28). Even the lowest-rated challenges, 
including UC02: Validation of the Impact (M = 4.29, SD = 1.57), RC01: 
Superficiality (M = 4.35, SD = 1.37), and PC01: Reliance on Third 
Parties (M = 4.35, SD = 1.50), are above the neutral threshold. This 
indicates that all the pre-defined challenges are considered challenges 
for the GenAIPs by SMEs.

The analysis of relative standard deviation (RSD) values indicates 
different levels of agreement among participants regarding different 
challenges. RSD values across all challenges ranged from 17.4 % to 40.3 
% (M = 29.4 %, SD = 7.3 %), indicating low to moderate levels of 
disagreement. Challenges demonstrating the highest agreement 
included FC03: Over-sanitization (RSD = 17.4 %), CC03: Lack of Stan
dardization (RSD = 18.0 %), and TC02: Hallucinations (RSD = 20.2 %), 
suggesting that participants showed relatively strong agreement about 
the severity of these issues. Conversely, challenges exhibiting the 
greatest disagreement were SC02: Computationally Resource Intensive 
(RSD = 40.3 %), RC03: Limited Generalizability (RSD = 39.7 %), and 
SC01: Adversarial Users (RSD = 39.6 %), indicating that while some 
GenAIP’s challenges are universally recognized, others may be more 
dependent on individual experience or organizational context.

[Result 2] Experts agree that most of the challenges are more 
problematic for GenAIP: Fig. 5 displays a segmented horizontal bar 
graph comparing SMEs’ perceptions of how problematic each of the 20 
challenges is for GenAIPs versus traditional DDPs. Out of the 20 chal
lenges, 12 (60 %) were identified by the largest proportion of SMEs as 
more problematic for GenAIPs, including PC01: Reliance on Third 
Parties (n = 15, 88 %), TC02: Hallucinations (n = 14, 82 %), and SC02: 
Computationally Resource Intensive (n = 13, 76 %). Six (30 %) chal
lenges were viewed as equally problematic for both persona types by the 
highest proportion of experts: UC02: Validating the Impact (n = 14, 82 
%), UC03: Desk Drawer Effect (n = 15, 88 %), FC04: Complications of 
Average (n = 12, 71 %), RC03: Limited Generalizability (n = 11, 65 %), 
FC01: Misrepresentation (n = 11, 65 %), and FC02: Persona Driven 
Discrimination (n = 14, 82 %). Two challenges were seen as more 
problematic for traditional DDPs: RC04: Aggregation (n = 7, 41 %) and 

Fig. 4. Perception of GenAIP challenges by SMEs. All challenges are considered problematic for GenAIPs by the SMEs based on the mean values.

D. Amin et al.                                                                                                                                                                                                                                    International Journal of Human - Computer Studies 205 (2025) 103657 

13 


RC01: Superficiality (n = 5, 29 %). In several cases, the nature of the 
challenge may have informed SMEs’ perceptions. For instance, halluci
nations (TC02) are directly tied to the behavior of GenAI models, thus a 
major problem for GenAIPs. Similarly, the problem of aggregation 
(RC04) appears