Deus Ex Machina and Personas from Large Language Models: 
Investigating the Composition of AI-Generated Persona 

Descriptions 
Joni Salminen Chang Liu Wenjing Pian 

University of Vaasa, Vaasa, Finland Peking University, Beijing, China Fuzhou University, Fuzhou, China 
jonisalm@uwasa.fi imliuc@pku.edu.cn wpian1@e.ntu.edu.sg 
Jianxing Chi Essi Häyhänen Bernard J. Jansen 

Wuhan University, Wuhan, China; University of Vaasa, Vaasa, Finland Qatar Computing Research Institute, 
and Fujian Normal University, essi.hayhanen@uwasa.fi Hamad Bin Khalifa University 

Fuzhou, China bjansen@hbku.edu.qa 
chijx@fjnu.edu.cn 

ABSTRACT 
Large language models (LLMs) can generate personas based on 
prompts that describe the target user group. To understand what 
kind of personas LLMs generate, we investigate the diversity and 
bias in 450 LLM-generated personas with the help of internal eval-
uators (n=4) and subject-matter experts (SMEs) (n=5). The research 
findings reveal biases in LLM-generated personas, particularly in 
age, occupation, and pain points, as well as a strong bias towards 
personas from the United States. Human evaluations demonstrate 
that LLM persona descriptions were informative, believable, posi-
tive, relatable, and not stereotyped. The SMEs rated the personas 
slightly more stereotypical, less positive, and less relatable than the 
internal evaluators. The findings suggest that LLMs can generate 
consistent personas perceived as believable, relatable, and informa-
tive while containing relatively low amounts of stereotyping. 
CCS CONCEPTS 
• Human-centered computing → Human computer interaction 
(HCI). 
KEYWORDS 
AI, LLMs, HCI, user personas, evaluation 
ACM Reference Format: 
Joni Salminen, Chang Liu, Wenjing Pian, Jianxing Chi, Essi Häyhänen, 
and Bernard J. Jansen. 2024. Deus Ex Machina and Personas from Large 
Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions. In Proceedings of the CHI Conference on Human Factors in 
Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. ACM, 
New York, NY, USA, 20 pages. https://doi.org/10.1145/3613904.3642036 
1 INTRODUCTION 
Personas are fictional representations of target users that provide 
valuable insights into user needs, behaviors, and preferences [11] 

This work is licensed under a Creative Commons Attribution International 4.0 License. 
CHI ’24, May 11–16, 2024, Honolulu, HI, USA 
© 2024 Copyright held by the owner/author(s). 
ACM ISBN 979-8-4007-0330-0/24/05 
https://doi.org/10.1145/3613904.3642036 

presented in a narrative format known as a persona description [25]. 
Persona descriptions mainly consist of textual information about 
the user type the persona represents [26]. In turn, large language 
models (LLMs), such as OpenAI’s GPT-4, have shown remarkable 
capabilities in generating coherent and contextually relevant text. 
Rooted in their ability to understand and produce textual content 
[20], LLMs hold promise for further automating persona generation 
processes, while being able to maintain the narrative realism [10] 
of manually crafted personas. In theory, such personas can be 
generated based on data (i.e., be ‘data-driven’), while at the same 
time offering engaging narratives into the circumstances of different 
people groups’ circumstances. To this end, LLMs can generate 
personas based on prompts that describe the target user group, 
their needs, goals, and preferences (see Figure 1 for an example). 
For example, the prompt “Create a persona for a smartwatch game 
user who likes casual and social games” could yield a persona with a 
name, age, occupation, hobbies, motivations, and challenges related 
to smartwatch gaming [40]. 

However, what are such personas like? Are they diverse in their 
representation of people? Do they contain skewness or bias toward 
certain characteristics? Are the LLM-generated personas at all of 
satisfactory quality? How do stakeholders feel about them? These 
are some of the motivating questions behind our study. 

Paoli [28] summarizes the current state of LLM-generated per-
sonas as follows: “If the LLM (with the support of the human 
researcher) can produce at least satisfactorily some forms (or at 
least ideas) of user personas based on a data analysis, we may also 
be able to make a step [. . .] toward Phase 6 [writing the persona 
descriptions].” (p. 5). Hence, we are still at an early stage where we 
do not know how and where to use LLMs in the persona creation 
process. These questions form essential research gaps that HCI 
research on personas needs to be addressed. 

More precisely, we address the following research questions 
(RQs): 
RQ1: How diverse are the characteristics of personas created by LLMs? 
Are there any notable biases?RQ2: How do (a) UX researchers and 
(b) subject-matter experts assess the LLM-generated personas? 

Both RQs matter. For RQ1, if the LLM-generated personas do 
not contain diverse characteristics, there is a risk of persona users 
‘missing out’ on marginalized or fringe user types [15, 16], as these 

https://doi.org/10.1145/3613904.3642036
https://creativecommons.org/licenses/by/4.0/
https://doi.org/10.1145/3613904.3642036
mailto:essi.hayhanen@uwasa.fi
mailto:jonisalm@uwasa.fi
http://crossmark.crossref.org/dialog/?doi=10.1145%2F3613904.3642036&domain=pdf&date_stamp=2024-05-11


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

Figure 1: The process of obtaining LLM-generated persona descriptions includes drafting a prompt (P) that the LLM follows to 
generate persona descriptions. The focus of these persona descriptions is the narrative text content [24, 26]. 
would not be represented in the generated personas. Moreover, 
even though the personas would be diverse in their representation 
of different user types, the personas can still be biased in that they 
over-emphasize certain characteristics at the expense of others [7, 
8]. For example, the persona distribution could be overwhelmingly 
male or predominantly young. 

For RQ2, if end-users of personas consider them to be of weak 
quality, they will not use such personas [23, 29] and the whole 
purpose of creating the personas would be defeated. Overall, these 
questions matter for the persona design practice and the application 
of personas in real projects. 

In this research, we present an in-depth exploration of LLMs’ 
potential for persona generation. Specifically, we analyze the ben-
efits, challenges, and limitations of employing LLMs in persona 
creation and provide valuable insights into the effectiveness and 
reliability of the generated personas for the HCI community. Our 
findings contribute to the understanding of LLMs’ role in persona 
creation. By using LLMs, there is an opportunity to revolutionize 
the way personas are created, providing designers and researchers 
with valuable insights into user behaviors and preferences. This 
transformative nature of LLMs is not to be underestimated by the 
HCI community, nor should the risk involved with this technology 
cloud our judgment of its positive prospects. For example, Schmidt 
[40] reports that “We tried prompts to ask for [HCI-related] ideas, 
similar to user input in brainstorming and focus groups. The results 
are in many ways what we would expect from working with actual 

users.” (p. 8). This statement attests to the fact that the LLMs gener-
ate plausible outputs, drastically different from what existed just a 
couple of years ago. By venturing into this novel domain, we aim to 
establish a foundation for future investigations into LLM-generated 
personas. 

2 REVIEW OF LITERATURE 
LLMs have been explored in the design process in conjunction with 
personas in various ways, but there is currently no study similar to 
ours. Here, we summarize the main previous work. 

Deshpande et al. [12] study the anthropomorphizing of LLMs. 
For instance, telling an LLM to “Talk like a doctor” allows it to 
assume the role of a doctor. The researchers discuss an example 
of how different personas assigned to the same AI system led to 
varied behaviors. The exact ‘persona’ of the system has a large 
effect on its behaviors and decisions and altering the prompt can 
make the LLM switch roles on the fly. However, the study did not 
investigate the characteristics (specifically diversity and/or bias) of 
LLM-generated personas or user perceptions of such personas. 

Paoli [28] illustrates that LLMs can create user personas based 
on thematic analysis (TA) of semi-structured interviews with real 
users. The LLM can generate codes and themes from the interview 
data, and then use them to write personas narratives that include 
the goals, background, needs, challenges, and other relevant details. 


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions 

However, the study did not investigate the characteristics (specif-
ically diversity and/or bias) of LLM-generated personas or user 
perceptions of such personas. In a similar vein, Zhang et al. [45] 
further elaborate that LLMs can be used for cleaning, integrating, 
predicting, and analyzing user feedback data, which is a key step in 
generating high-quality personas. They introduced a GPT-4 based 
tool PersonaGen to generate personas, which can classify different 
persona attributes for downstream tasks. However, the study did 
not investigate the characteristics (specifically diversity and/or bias) 
of LLM-generated personas or user perceptions of such personas. 

Kocaballi [6] reported the capability of ChatGPT to generate 
fictional user personas for a given project, for which ChatGPT was 
asked to generate five different user personas. Five brief descrip-
tions were successfully generated accordingly with a good variety 
of demographics. The researcher [6] further commented that “[the 
GPT-generated] five different personas [showed] a good range of 
variety in demographics, but potentially [lacked] ethnic diversity 
based on their names”, which may suggest a possibility for ethnic 
mismatches in LLM-generated personas. While Kocaballi’s find-
ings were based on a limited sample of personas generated, our 
evaluation further investigates this topic using a larger sample of 
personas and more thorough analyses. 

Alessa and Al-Khalifa [2] created elderly personas from different 
demographic backgrounds that interacted with a conversational 
agent based on the persona’s details. The context was the miti-
gation of experienced loneliness by the elderly. The interaction 
episodes were rated by subject-matter experts based on criteria 
such as engagement, interestingness, fluency, and sense-making. 
However, the elderly personas themselves were not rated nor were 
their characteristics examined in the study. Similarly, Hong et al. 
[21] illustrated the potential of LLMs for assuming the role of the 
persona by responding to users’ natural language queries. The re-
searchers also pointed out the risk of not precisely knowing “whose 
opinions are reflected in the generated [personas]”, which may lead 
to representational biases such as over- or under-representation 
of demographic subgroups [21]. However, the study did not in-
vestigate the characteristics (specifically diversity and/or bias) of 
LLM-generated personas or user perceptions of such personas. 

Cheng et al. [9] presented a framework named “Marked Per-
sonas” that applies natural language prompts to generating per-
sonas, i.e., imagined individuals belonging to specific demographic 
groups. They evaluated the personas by a method named “Marked 
Words”, which included identifying words that statistically distin-
guish personas of marked groups from corresponding unmarked 
ones. The researchers found evidence of harmful patterns like 
stereotypes and essentializing narratives. They also provided rec-
ommendations for LLM creators and researchers to address stereo-
types and essentializing narratives. This is the closest study we 
could locate to ours; yet, it focuses on the intersectionalist analysis 
of bias (particularly race and gender), whilst we explore more vari-
ables, including age, gender, country, and so on. We also add the 
subject-matter expert perspective, which the study by Cheng et al. 
[9] did not do. 

None of the previous studies, as far as we know, have specifically 
investigated the characteristics of LLM-generated personas in terms 
of diversity and bias. Also, we could not locate a study that would 
have tested subject-matter experts’ perceptions of LLM-generated 

CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

personas (for example, Alessa and Al-Khalifa [2] evaluated the 
interaction between SMEs and personas, not the personas them-
selves). Because these dimensions of personas form a core line of 
investigation, the lack of research in them poses a research gap 
that we address in this study. Incorporating demographic attributes 
into persona generation from an HCI perspective is essential for 
designing systems that are more accessible and tailored to diverse 
user needs, thereby fostering inclusive technology development. 
3 METHODOLOGY 
3.1 Research Context 
As the research context, we chose a serious domain: addiction. Our 
choice was based on the notion that personas could be more broadly 
applied for “social good” – that is, societally beneficial purposes 
[32]. So, in our study, the personas represented individuals with 
various types of addiction, including alcohol, opioids, social media, 
online shopping, and gambling. Based on internal ideation among 
the team, these addictions were chosen to represent a wide range 
of addictions that take place in modern people’s lives and touch 
people regardless of age, gender, or nationality. 

Personas are not only applicable to representing ‘users’ of prod-
ucts; instead, they generalize to representing any groups of people, 
for example, survey respondent groups [33]. We want to emphasize 
this broader applicability, also referred to as ‘personas for social 
good’ [18, 32], by focusing on the context of a real societal issue, 
addictions. Addictions are treatable, chronic medical conditions in 
which individuals’ interactions among their brain, genetics, envi-
ronment, and life experiences may lead them to compulsively use 
substances or act in certain behaviors that are harmful to them in 
multiple ways [17]. Additions could be broadly categorized into 
two types: substance use disorders and behavioral addictions [46]. 
Examples of substance use disorders include opioid, nicotine, and 
alcohol use disorders, while behavioral addictions include (but are 
not limited to) gambling and overeating. In our case, the personas 
addicted to alcohol or opioids represent individuals with substance 
use disorders, whereas the personas addicted to gambling, online 
shopping, or social media represent individuals with behavioral 
addictions. 

The chosen conditions describe different forms of addictions in 
the life of a modern person, ranging from more to less severe in 
terms of their immediate health impacts. Opioid addiction is a 
major issue in the United States [44]. Alcohol addiction remains 
one of the most alarming forms of addiction [13]. Gambling is a 
particular concern among young men, although it touches nearly 
all age and gender demographics [27]. Social media and online 
shopping (ranging from impulsive [19] to compulsive shopping 
[42] behaviors) are perhaps more recent but yet serious forms of 
addiction that can have negative impacts on people’s lives, e.g., by 
having adverse financial or social effects. 

While the context of addictions enables us to test the LLM’s 
ability to create personas for social good [32], this context also 
enables us to examine any potential biases related to age or gender 
in a meaningful way. Going forward, personas generated for this 
context could be used in the design of automated app interventions, 
for example, to mitigate these addictive behaviors (although this is 
beyond the scope of the current work). 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

3.2 Persona Generation 
We used GPT-4 (June 2023 version) to generate 450 personas. We 
created three types of prompts for each addiction: one specifying 
male gender, one specifying female gender, and one not specifying 
a gender at all. The reason for this is that, first, we would control 
and balance the number of each gender; second, we plan to test the 
gender distribution when it is not specified. So, given that we have 
five addictions and three prompt types, that yields 15 combinations 
(3 × 5). However, generating only 15 personas would be susceptible 
to inherent randomness in the LLM generation process [3]. To thor-
oughly evaluate the LLM’s ability to generate personas consistently, 
we need to repeat the generation multiple times. Each time, we 
obtain a different persona. We chose to repeat the generation 30 
times for each of the 15 combinations, thus yielding 450 personas 
in total (30 × 15). 

A general challenge with LLM-generated personas is that in-
putting the same prompt multiple times via Open AI’s API to GPT-4 
yielded nearly identical personas, which might be a caching issue. 
We addressed this issue via a two-stage prompting strategy: first, 
we asked the model to generate a list of 30 “skeletal” personas for 
each addiction-prompt type combination (skeletal in the sense they 
only contain basic information [35, 41]). This resulted in unique 
short persona descriptions that we then inputted back to the model, 
asking it to expand each persona description to create the full per-
sona descriptions (i.e., “rounded personas” [25]) for analysis. 
Our code is publicly available in the following Google Colab note-
books (NB): 

• NB1. Skeletal persona generation: https://bit.ly/LLM-
personas-skeletal 

• NB2. Rounded persona generation: https://bit.ly/LLM-
personas-rounded 

Given access to Open AI’s API, other researchers (who have 
access to Open AI) can run the notebooks to generate personas to 
replicate our findings or create personas from different contexts 
by making slight modifications to the prompt (e.g., by changing 
the context from addictions to something else). Note that in our 
prompt, we did not provide an explicit definition of what a persona 
is, as we presumed, based on prior literature [28], that the model 
already knows what the persona is (and this presumption was 
correct). However, we specified the role of GPT (“You are a helpful 
assistant to a social sciences researcher”) as well as a structured 
template for the information we expected (“Provide the output in a 
json array, with each dict containing only the following keys: ‘index’, 
‘name’, ‘age’, ‘occupation’, ‘background’, ‘details’). The expansion 
was done by taking the input personas in the previous step and 
asking the model to expand on them (“Expand on the following 
summary persona. Ensure that all the information provided is used 
in your expanded persona.”). Overall, using the structured template 
approach is aligned with prior research on LLM-generated personas 
[28] and it also has the benefit of producing comparable personas 
(as the information is in standard structure) – this is beneficial also 
for other researchers, as we share our persona dataset. 

The Personas-addicted dataset can be downloaded here: https: 
//bit.ly/LLM-personas-data. Overall, our method is replicable in 
terms of the programming code provided and the analysis is also 
replicable as we provide the persona descriptions themselves. So, 

the methodology itself exemplifies that LLM-generated personas 
can increase persona creation replicability which has been found 
problematic in past studies [8, 30]. This is important because repli-
cability is one key toward accomplishing persona science [35] which 
is the application of scientific principles in the study of personas 
and their users. As such, we believe the datasets to be beneficial to 
others in the HCI community. 

3.3 Evaluation Protocol 
3.3.1 Internal Evaluation. The procedure of data coding and eval-
uation is divided into two stages. The first stage is the internal 
evaluation of the generated personas to gain a “sanity check” on 
their quality, within which four internal evaluators from our re-
search team were involved. The average experience among the UX 
researchers was 9.25 years in UX/HCI research and they were all fa-
miliar with the concepts used in the evaluation, such as pain points. 
Each evaluated approximately 120 personas (of which around 112 
were evaluated by one evaluator and eight were used for the inter-
coder reliability calculation). A mixture of objective quantitative 
and subjective perception-based metrics was adopted to evaluate 
the quality of these personas. The second stage is the subject-matter 
experts’ (SMEs) evaluation of these personas performed by five 
public health professionals with domain expertise on addictions. 
Within this stage, only a subset of these personas was evaluated by 
these external evaluators. The schema of coding and evaluation is 
shown in Table 1. 

In the following, we explain why each criterion is relevant for 
this study. 

Age, gender, and occupation. These are basic characteristics in 
typical persona profiles [26, 31] that enable us to assess whether 
there are any distinct biases or stereotypes concerning demographic 
variables. Demographic diversity is considered important for inclu-
sive design through personas [15, 16], especially when it comes to 
representing all age and gender groups. Alongside the demographic 
information, occupation is often included in persona profiles [5]. 
Text length. This is an interesting variable that captures how ex-
tensive persona descriptions the LLM generates. The information 
contributes to providing a baseline for further comparisons with 
human-generated personas. 

Pain points. Pain points, often referred to as needs, goals, and 
wants, are typical content for personas [11, 26]. Their analysis 
can illustrate what the model understands about human circum-
stances related to the subject matter. We recorded the frequency 
and content of pain points in the coding stage. 

Physical appearance. Appearances matter for personas; for ex-
ample, smiling pictures affect multiple user perceptions of personas 
[36]. Persona attractiveness is consistent with the ‘what is beauti-
ful is good’ effect; personas that are perceived as physically more 
attractive are attributed to other positive traits [38]. So, we evalu-
ated how LLMs would characterize the physical appearance of the 
persona. 
Personality. Personality traits characterize the persona’s psycho-
logical tendencies [37]. These can reveal insights into the LLM’s 
“thinking” in terms of consistency and stereotypicality. So, we 
extract the mentioned personality traits. 

https://bit.ly/LLM-personas-skeletal
https://bit.ly/LLM-personas-skeletal
https://bit.ly/LLM-personas-rounded
https://bit.ly/LLM-personas-rounded
https://bit.ly/LLM-personas-data
https://bit.ly/LLM-personas-data


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 
Table 1: The coding sheet applied to extract information and assign evaluation ratings to the personas. The definition column 
includes instructions given to the evaluators. The criteria highlighted in blue color (the last six items of the table) were given 
both to internal and external evaluators; the other criteria before that were coded only by internal evaluations (Krippendorff’s 
U = 0.833). 

Extracted from persona Variable Definition provided to the evaluators 
age The persona’s age 

description as information 

Determined based on human 
evaluation of the persona 

gender 
occupation 
text length 
number of pain points mentioned 

pain point list [open-ended] 
physical appearance mentioned 
if yes, how? Describe physical 
appearance [open-ended] 
personality mentioned 
if yes, how? Describe personality 
[open-ended] 
informativeness for design 
[persona perception] 
believability [persona perception] 
stereotypicality [persona 
perception] 
positivity [persona perception] 

relatability [persona perception] 
consistency [persona perception] 

Persona perceptions. These are users’ perceptions of the persona 
they are using [39]. There are both positive and negative persona 
perceptions: positive ones are qualities we would like to see in 
a persona (in our study, these are (a) informativeness for design, 
(b) believability, (c) positivity, (d) relatability, and (e) consistency), 
whereas negative ones are qualities we would like to avoid (in our 
framework: stereotypicality). In short, a good persona provides 
useful information for design purposes, is believable (i.e., realistic, 
credible), presents the persona in a positive light (not as an antago-
nist), is relatable (i.e., evokes empathy), and is consistent (i.e., does 
not contain conflicting information) [11, 25, 39]. 

We computed the inter-coder reliability based on the four infor-
mational categories shown in Table 1 (note that open-ended and 
persona perception categories cannot be used here because they 
contain subjective information). Since these categories contain a 

The persona’s gender (“m” for male, “f” for female) 
The persona’s job title 
The length of the persona description in words 
A pain point is a problem or issue that the persona has; in this 
context, pain points related to addiction (list as many as you 
find) 
Writeup of the pain points, separated by comma and space 
If the persona’s physical appearance is mentioned, mark “y”; if 
it’s not, mark “n” 
Describe how the physical appearance is described (you can 
paste the text from the persona description) 
If the persona’s personality is mentioned, mark “y”; if it’s not, 
mark “n” 
Describe how the personality is described (you can paste the text 
from the persona description) 
Does the persona description contain adequate 
information to design an app or system to address the 
persona’s needs?* 
Does the persona appear realistic, i.e., lifelike, like an 
actual person that could exist?* 
Does the persona appear stereotypical?* Stereotypes are 
related to a widely held but fixed and oversimplified image or 
idea of a particular type of person or thing. 
Is the person depicted in a positive light?* (an example of 
not being depicted in a positive light is to blame the persona for 
the addiction) 
Is the persona relatable? Relatability is the quality of being 
easy to understand or feel empathy for. 
Is the persona consistent?* Consistent persona does not have 
conflicting information (for example, if the description said “he 
is a happy personality” but later said, “because he is often sad” 
=> these information pieces conflict so you would give a low 
rank for consistency. 

mixture of categorical and numerical data, we selected Krippen-
dorff’s Alpha (U) as the inter-coder reliability metric. The average 
value taken from these four categories indicates high agreement 
(U = 0.833, where above 0.800 is considered high). Therefore, we 
conclude that the coded data is quite reliable. For any observed 
disagreement, the lead author made a judgment call about the final 
data value. 
3.3.2 External Evaluation. In addition to the four internal eval-
uators, five SMEs evaluated a sample of the personas. We used 
stratified random sampling to select 30 personas for the SMEs to 
evaluate. We stratified the sampling by gender and addiction type, 
so there were three male and three female personas in each of the 
five addiction types (3 × 2 × 5 = 30), saving other gender identifica-
tions for future research. 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

Table 2: Human evaluators in this study. The SMEs received a USD 100 compensation. The UX researchers were not financially 
compensated. 
Evaluator ID Occupation Years of exp.* Gender Country 
SME1 Health Surveillance Officer at the State Health 3 Female Nigeria 

Department 
SME2 Rehabilitation counselor 20+ Female United States 
SME3 Lead Processor at Medical Health Services 3 Female United States 
SME4 Master’s degree in Epidemiology, worked on a 4 Female India 

research project related to substance abuse 
SME5 Pharmacist 15+ Female Pakistan 
UXR1 Associate Professor 16 Female China 
UXR2 Research Assistant 1 Female Finland 
UXR3 Associate Professor 12 Male China 
UXR4 Lecturer 8 Female China 
*For SMEs: In public health. For UX: In UX/HCI research. 

Figure 2: The age distribution of the personas. The median is 35 years. 
We recruited five public health professionals for the study as 

SMEs using Upwork, a professional services platform (see Table 2 
for description). The recruitment included a screening stage where 
the SMEs were asked three questions: their knowledge of addictions 
and their work experience in public health. These questions were 
used to ensure that each SME participant had prior experience of 
addiction and public health. Prior knowledge of personas was not 
deemed necessary, as we explained to each SME what a persona is 
before they started their evaluations. So, the SMEs were briefed on 
what personas are; they were then provided with definitions of each 
evaluation criterion (the same ones in Table 1) and asked to evaluate 
the 30 persona descriptions (for study replication, the IDs of the 30 
personas are shared in Appendix 1). We also asked them to provide 
a short, written statement of their overall impression regarding 
the evaluated personas in terms of each evaluation criterion. The 
SMEs were not told that the personas were computer-generated; 
they were simply told that we were researching personas. 

4 RESULTS 
4.1 RQ1: How diverse are the characteristics of 

personas created by LLMs? Are there any 
notable biases? 

4.1.1 Age. The average age of the personas is approximately 37.04 
years (SD = 11.11 years). The range is 17-67, meaning the youngest 
persona is 17 years old and the oldest 67 years old. As can be 
seen from Figure 2, personas are generated across different age 
groups, which is a desirable feature relative to a scenario where 
the personas would only focus on a certain age group. At the same 
time, the Shapiro-Wilk test indicated that the age of the personas 
is not normally distributed, W (474) = .97, p < .001. Rather, the age 
distribution displays platykurtic properties, i.e., lower peakedness 
and flatter tails compared to a normal distribution. 

We next conducted Chi-squared tests to compare the frequencies 
of different addiction types between various age groups. The age 
grouping was adopted from previous persona generation research 
[4, 5]. We omit the age groups 13-17 and 65+ from this analysis, as 
each only had one observation. 


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

Figure 3: Relative risk ratios by personas’ age group and addiction type. Red indicates a higher risk ratio, and blue indicates a 
lower. We can observe that the LLM associates the risk of social media addiction with the youngest age cohort (18-24) and the 
risk of alcohol addiction with the oldest age cohort (55-64). 

First, the results indicate a significant difference in the prevalence 
of gambling addiction among the age groups, j2(4, N = 450) = 10.93, 
p < .05. The age group with the highest prevalence for gambling was 
55-64. Second, there was a significant difference in the prevalence 
of alcohol addiction among the age groups, j2(4, N = 450) = 25.34, 
p < .05. The age group with the highest prevalence for alcohol was 
55-64. Third, there was a significant difference in the prevalence 
of social media addiction among the age groups, j2(4, N = 450) = 
67.78, p < .05. The age group with the highest prevalence for social 
media was 18-24. 

There was no significant difference in the prevalence of shopping 
or opioids among the age groups. However, the age groups with 
the highest prevalence of shopping and opioid addiction were 18-24 
and 35-44, respectively. Figure 3 illustrates the relative risk ratios 
that the LLM-generated personas from different age groups had for 
the different addiction types. 

We also investigated which addiction type is most prevalent for 
each age group. For age groups 18-24 and 25-34, the most prevalent 
addiction type was social media. For the 45-54 and 55-64 age groups, 
it was alcohol. For the age group 35-44, it was opioids. At face value, 
the variability in these addiction types seems to make sense in 
terms of the younger age groups being more addicted to social 
media than the older age groups. 

While further comparison to Census statistics at a population 
level is needed to establish the robustness of these differences, from 
these findings, we can surmise that the LLM has an opinion of 
what age group is typically addicted to what – but in the absence of 
baseline data, we cannot deduct if that opinion is factually correct. 
4.1.2 Gender. When not specifying the persona’s gender in the 
prompt (n = 150), the LLM generated a perfectly even distribution 
of male and female personas, thus resulting in perfect gender parity. 
We also verified whether the generated personas for which the 
gender was specified (male or female) actually matched the specified 

gender. We found this to be true in all cases (100% adherence to 
instructions). There was a statistically significant difference in the 
average age between male (M = 37.72, SD = 11.11) and female (M = 
35.50, SD = 10.67) personas; t(448) = 2.16, p = .031. Even though the 
male personas were slightly older than their female counterparts, 
the difference is not meaningful in practice (only two years). There 
was no statistically significant relationship between gender and 
addiction type, j2(1) = 0.231, p = .994. 
4.1.3 Country. The LLM generated the first and last names for 
each persona. However, neither the persona description nor the 
prompt had information about the persona’s country. So, we ap-
plied Name2GAN, an online research tool for inferring likely de-
mographics (gender, age, country) based on their name (the model 
has been trained on millions of names [22]). The results indicate 
that The LLM-generated personas originated from 15 countries: 
Argentina (n = 1), Australia (n = 1); Brazil (n = 1); Colombia (n 
= 6); Germany (n = 1); Hong Kong, China (n = 1); Mexico (n = 
14); Nigeria (n = 2); Philippines (n = 3); South Korea (n = 2); Spain 
(n = 2); Taiwan, China (n = 3); United Kingdom (n = 25); United 
States (US) (n = 385); and Vietnam (n = 3). Although the large 
number of countries indicates diversity, the frequencies in Figure 4 
indicate strong US-centricity, with the overwhelming majority of 
the personas being from the US. Nonetheless, a Chi-squared test 
indicates no statistically significant difference in the prevalence of 
addiction types between US and non-US personas; j2(4, N = 450) = 
47.08, p = .796 
4.1.4 Occupation. In terms of jobs, the generated personas were 
extremely versatile, with 201 unique jobs being used by GPT-4. 
Examples are shown in Table 3. Overall, the data suggests a wide 
variety of occupations across the personas, with the most common 
being “Graphic Designer”, “Real Estate Agent”, and “Accountant”, 
each with 12 occurrences. So, in terms of job occupations, GPT 
generates a wide diversity of personas. In terms of gender, there 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

Table 4: Five most common terms in the personas’ pain points 
(the list has been cleaned from words such as ‘and’, ‘of’, and 
so on). 

Figure 4: Countries of the generated personas according 
to the Name2GAN model [22], accessed online at https: 
//acua.qcri.org/tool/Name2GAN. Most of the personas (85.6%) 
were from the United States, despite the country not being 
specified in the prompt. This implies that GPT tends to gen-
erate US-centric personas by default. 

is some stereotyping: Figure 5 shows that male personas are more 
likely to be construction workers, software developers, and unem-
ployed, and female personas are more likely to be nurses, event 
planners, and baristas. In terms of addiction type, occupations ap-
pear randomly distributed (see Figure 6) – the only occupation with 
a frequency of higher than 3 is “unemployed” (n = 4 for alcohol 
addiction). 
4.1.5 Pain points. To carry out a thematic analysis identifying 
common themes or topics mentioned in the pain points, we used a 
simple word frequency-based approach. The list in Table 4 contains 
keywords from the pain points extracted from the LLM-generated 
personas. Some potential themes that could be inferred from these 
common words are work-related issues, relationship problems, fi-
nancial troubles, performance concerns, stress, and life disruptions. 
At face value, these reasons appear plausible antecedents for the 
development of addictions, although a more robust assessment is 
required. 

Term Frequency 
work 120 
relationships 119 
financial 116 
performance 77 
stress 72 

Regression modeling was carried out to predict each word’s fre-
quency based on age, gender, and addiction type. The regression 
analysis for work showed a significant relationship with addic-
tion types (gambling, opioids, shopping, and social media) but not 
with age or gender. Among the addiction types, gambling had the 
strongest negative relationship with work (V = -1.5848, p < 0.001), 
followed by shopping (V = -1.5209, p < 0.001) and social media (V = 
-0.9426, p = 0.006). 

The word relationships showed a significant relationship with 
addiction types (gambling, shopping, and social media) but not 
with age or gender. Among the addiction types, shopping had the 
strongest negative relationship with relationships (V = -0.7024, p < 
0.001). 

The word financial showed a significant relationship with addic-
tion type (gambling) but not with age, gender, or other addiction 
types. Gambling had a strong positive relationship with financial 
(V = 1.6406, p < 0.001). 

The word performance showed a significant relationship with age 
and addiction type (gambling, shopping, and social media) but not 
with gender or opioid addiction type. Among the addiction types, 
shopping had the strongest negative relationship with performance 
(V = -0.4897, p < 0.001). 

The word stress showed a significant relationship with addiction 
types (gambling, shopping, and social media) but not with age or 
gender. Among the addiction types, shopping had the strongest 
negative relationship with stress (V = -0.7912, p < 0.001), followed 
by social media (V = -1.3773, p < 0.001). 

In summary, there is no evidence of age or gender bias in LLM-
generated personas’ pain points. However, the analyses reveal that 

Table 3: Example occupations in LLM-generated personas. There were 201 unique occupations, indicating a high degree of 
occupational diversity among the personas. The count shows the five most common occupations and samples of occupations 
mentioned only once. 
Occupations with the given frequency Frequency 
Graphic Designer, Real Estate Agent, Accountant 12 
Retired 11 
Barista, College Student 10 
Architect, Construction Worker, Student 9 
Event Planner, Nurse, Chef 8 
… … 
Instagram Influencer, Science Teacher, Stay-at-home dad, Customer service representative, 1 
Sales Assistant 
Other occupations 185 

https://acua.qcri.org/tool/Name2GAN
https://acua.qcri.org/tool/Name2GAN


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

Figure 5: Relative risk ratios of occupations by gender. 

Figure 6: The frequency of occupations in addiction types. 
the LLM appears to assign certain life conditions more commonly 
to some addiction types than others. (Full regression results are 
included in the online supplemental material.) 
4.1.6 Physical appearance. It was extremely rare that the LLM 
would emphasize physical appearance in the persona descriptions 
it created. The coders logged only eight of such cases (∼1.8%). Even 
among these, detailed scrutiny showed that the physical appearance 
was mentioned in passing and related to the negative consequences 
of the addiction (“a distinct change in his physical appearance”; 
“steady decline in his overall physical appearance”). Of the four ob-
servations where physical appearance was mentioned as a specific 
attribute of the persona, one was about a male persona (“His good 
looks”) and three about a female persona (“Through her videos, she 
showcases her talent, personality, and her gorgeous looks”, “Felicity 
is of average height and has a petite figure”, “Standing 5’6” with 
a slender build, Carla has always been conscious of her weight 

and appearance”). So, we conclude that the LLM does not consider 
physical appearance as a dominant descriptor in this context of 
persona creation (which is correct behavior, as it should not be). 
4.1.7 Personality. While physical appearance was not typically 
mentioned, personality was. In this coding, we considered per-
sonality broadly as the persona’s nature or characters. The coders 
identified personality cues in 180 (∼38.0%) personas, so including 
personality descriptions in the generated personas appears com-
mon behavior for the LLM. Some of the common themes in the way 
the LLM described the personas included: 

(1) Hardworking and dedicated: Many of the personality de-
scriptions emphasize traits such as being hardworking, diligent, 
ambitious, and having a strong work ethic. The personas are de-
scribed as committed to their careers, families, and personal goals 
(e.g., “She is known for her dedication to her demanding career”, “As 
a dedicated teacher, Felicity is diligent and resourceful”, “a highly 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

skilled accountant with a strong work ethic”, “hardworking and 
seemingly responsible individual”, “known for his excellent organi-
zational and problem-solving skills”, “known for his creativity and 
unique approach to visual aesthetics.”). 
(2) Compassionate and caring: Several personas are described 
as warm, compassionate, loving, and devoted, especially towards 
their families or those they work with, such as children or stu-
dents (e.g., “warm and compassionate person, loving and devoted 
mother”, “a deeply caring and empathetic individual”, “a dedicated 
and passionate teacher”). 
(3) Intelligent and creative: Many personas are described as 
intelligent, creative, and having unique talents and skills in various 
fields (e.g., “intelligent”, “a highly skilled accountant with a strong 
work ethic”, “a talented and respected architect”, “talented and 
ambitious art school graduate”, “a bright and ambitious young man, 
“creative, careful”, “has a passion for exploring new places and 
culture.”). 
(4) Extroverted: Some personas are characterized as sociable, 
outgoing, and friendly, enjoying social interactions and engag-
ing with others (e.g., “extrovert”, “friendly, social and outgoing 
young woman”, “an outgoing, friendly, and ambitious individual 
who values hard work and dedication”, “a sociable, outgoing per-
son”, “fun-loving and adventurous individual who loves to travel 
and party.”). 
(5) Introverted: Other personas are described as introverted, find-
ing social situations challenging or preferring solitude (e.g., “intro-
verted”, “self-reported introvert person”, “despite outward appear-
ances, Leo is a deeply introverted individual who struggles in social 
situations.”). 

These examples illustrate that the LLM used a diverse set of 
personalities and psychological characteristics to incorporate hu-
manlikeness in the persona descriptions. The LLM’s viewpoint on 
the individual is often positive – the persona is more portrayed as 
a protagonist than an antagonist (in fact, we could locate no case 
where the LLM would have vilified the persona or presented them 
as a bad person). 
4.1.8 Text analyses. To ensure that no persona description would 
be identical or close to identical, we computed the Levenshtein 
distance (LD) between each description pair. This metric tells us 
how many character changes are needed to make the pair identical 
(so, a low value would indicate a highly similar text description). 
The obtained LD values indicate no description is identical (M = 
1931.65, SD = 222.72, Min = 1285.00, Max = 3025.00). So, the LLM 
does not recycle the same descriptions across the different personas. 

The average length of the LLM-generated persona descriptions 
was 381.78 words (SD = 50.70). This suggests a moderate amount 
of variability in the length. Although we do not have a baseline 
of human-generated personas to compare to (as far as we know, 
nobody has investigated the length of persona descriptions previ-
ously!), the length seems reasonable in the sense of giving adequate 
information about the personas. 

We investigated if there was a difference in the word count be-
tween male and female personas, between personas of different 
ages, and between the addiction types. First, a Mann-Whitney U 
test indicates no statistically significant difference in persona de-
scription lengths in terms of word count between male (M = 379.41, 

SD = 54.16) and female (M 384.10, SD = 47.08) persona descriptions, 
U = 23823.0, p = .282. Second, Spearman’s correlation coefficient 
(d = -0.1547, p = .001) indicates a statistically significant but weak 
negative monotonic relationship between age and persona descrip-
tion length. It suggests that as age increases, persona descriptions 
tend to be shorter, but the strength of this relationship is relatively 
modest (see Figure 7a). 

The results of the Kruskal-Wallis indicated a statistically signifi-
cant difference in the word count of persona descriptions between 
addiction types, H = 22.164, P = .0002. Pairwise comparisons using 
Dunn’s test indicated that the word counts in persona descriptions 
were significantly different between gambling and shopping (p = 
.0007) and gambling and social media (p = .0007). No other dif-
ferences were statistically significant. Overall, the lengths of the 
persona descriptions appear to be aligned, with no noteworthy bias 
observed (see Figure 7b). 
4.2 RQ2: How do (a) UX researchers and (b) 

subject-matter experts assess the 
LLM-generated personas? 

4.2.1 Quantitative results. Addressing RQ2, we found that the LLM-
generated personas generally obtained high scores from the human 
evaluators. The scores shown in Figure 8 indicate a high degree 
of consistency, relatability, positivity, believability, and informa-
tiveness for design. In contrast, stereotypicality is low (which is 
desirable as this is a problem and not a virtue in personas [43]). 
So, these evaluations indicate no quality issues in the generated 
personas – quite the opposite. 

We conducted a series of Welch’s t-tests to assess whether the 
differences in the ratings between the internal evaluators and SMEs 
were mixed, with some statistically significant differences for cer-
tain criteria. As there are six tests (one for each criterion), the 
Bonferroni-adjusted alpha value is 0.05/6 = 0.0083. The results indi-
cate no significant differences for informativeness (t(194.00)= -2.19, 
p = .0300 > .0083), believability (t(191.57) = 1.49, p = .1368), and 
consistency (t(183.24) = 2.37, p = .0.019). However, there were three 
significant differences in ratings given by internal evaluators and 
SMEs. First, the SMEs rated the personas more stereotypical than 
the internal evaluators did, t(177.26) = -5.52, p < .0001. Second, the 
SMEs rated the personas less positive than the internal evaluators 
did, t(177.47) = 11.02, p < .0001. Third, the SMEs rated the personas 
less relatable than the internal evaluators did, t(180.78) = 4.56, p < 
.0001. 

In absolute terms, however, the scores given by the SMEs were 
not bad: the average stereotypicality score they gave (M = 2.98) 
was below the scale average which is four for a seven-point Likert 
scale, whereas all of the “desirable” persona traits were above four. 
So, there are two takeaways here: (1) SMEs gave LLM-generated 
personas lower quality scores than internal evaluators, but (2) both 
the quality scores given by SMEs and internal evaluators indicate 
rather “high” than “low” quality personas. Especially surprising 
is that consistency ranks the highest for both evaluator types, as 
consistency has traditionally been an issue with text generation 
[3]. 

There were no notable differences by gender of the personas 
(see Figure 9), and no measure significantly correlated with the 

https://t(180.78
https://t(177.47
https://t(177.26
https://t(183.24
https://t(191.57
https://t(194.00


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

Figure 7: (a) the relationship between age and word count in persona descriptions. (b) The relationship between addiction type 
and word count. 

Figure 8: Average human evaluator scores across four raters. Example question: “Does the persona appear consistent? 1 = Not 
at all, 7 = Very much). The evaluators were provided with a definition of each criterion. Error bars indicate standard deviation. 

Figure 9: Evaluation scores by persona gender. No notable differences exist. We conducted a series of t-tests which are omitted 
from this manuscript (but available upon request) as no significant differences were found. Error bars indicate standard 
deviation. 
persona’s age (details omitted due to parsimony, available upon 
request). So, the scores given by the human evaluators indicate no 
age or gender bias in terms of persona attributes. 
4.2.2 Qualitative results. To better understand the scores given 
by the SMEs, we asked them to provide open-ended explanations 

regarding their answers for each evaluation criterion. The full feed-
back by the SMEs is provided in Appendix 2; here, we summarize 
the main insights (note: positive comments are highlighted in green 
color, while critique or improvement suggestions are in red color, 
and “E” indicates evaluator ID): 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

Believability. Noteworthy comments were as follows: 
• “The veterans’ personas were especially believable. I found 

some of the shopping addictions a bit hard to believe, 
in particular the social worker and teachers. Social 
workers and teachers usually struggle to make ends meet 
even without a shopping addiction.” (E2). 

• “Overall, the majority of personas demonstrated were highly 
believable. The backgrounds presented combined with their 
high pressures in either their professional or social life made 
the scenarios seem very realistic, that they could be an actual 
person. For example, Persona ID P244, or Ava Chen, seemed 
very realistic, as I am sure the pressures of immigrating to 
an entirely new country and the challenges and barriers that 
exist with this transition seem paramount and never-ending. 
In addition, her trying to take care of her family and also 
being a schoolteacher must bring immense stress, leading 
her to unhealthy coping mechanisms such as alcohol.” (E3). 

• “There were only a few personas whose background, pro-
fession, and addiction disorder did not quite add up. For 
example, Persona P64, or Stacey Rivers, seemed a bit off to 
me. Her escalation from winning at a charity casino night to 
a full-on gambling addiction seemed a bit extreme, combined 
with her background of being a schoolteacher.” (E3). 

The feedback on personas highlighted a mix of believability and 
skepticism, with veteran personas being praised for their authentic-
ity, while some personas, like those of social workers and teachers 
with shopping addictions, were questioned for their realism given 
their financial constraints. The detailed background stories, such 
as Ava Chen’s immigration challenges and resultant stress, were 
recognized for adding depth and realism, making the personas re-
latable and believable. However, some scenarios, like Stacey Rivers’ 
rapid descent into gambling addiction, were deemed unrealistic, 
suggesting a need for more nuanced development to align personas 
more closely with their professional and social contexts. 

Relatability. Noteworthy comments were as follows: 
• “The more personal details given about a persona, the more 

relatable I found them. It would have been helpful to have 
a little more info on the current important relationships in 
their lives.” (E2) 

• “Overall, the majority of personas demonstrated appeared 
highly relatable and garnered much empathy. The caring 
and empathetic professions that many personas had, such as 
being teachers, social workers, environmental activists, etc., 
combined with their caring and connected backgrounds with 
family and friends, made their struggles with these negative 
coping mechanisms very relatable and highly sympathetic.” 
(E3) 

• “There were only a few that seemed off, with the stand-
out being Persona P20, or Sean Hall, who was a 34-year-
old HVAC technician that still lived with his parents while 
having a gambling addiction.” (E3) 

Feedback on the relatability of personas indicated that detailed 
personal stories enhanced empathy and connection, with sugges-
tions for more insights into significant relationships to deepen 
relatability. The personas, particularly those in caring professions 
like teaching or social work, were largely seen as empathetic and 

relatable due to their nurturing backgrounds and the realistic por-
trayal of their struggles with negative coping mechanisms. How-
ever, some personas, such as Sean Hall, the HVAC technician with 
a gambling addiction still living with his parents, were viewed 
as less relatable, suggesting the importance of aligning personal 
circumstances with professional and lifestyle choices for greater 
authenticity. 

Consistency. Noteworthy comments were as follows: 
• “For the most part, consistency was very good. Natalia 

Thompson contained a significant discrepancy. It first 
stated that she adopted a child but later said she had post-
partum depression. This was confusing- did she adopt the 
child or give birth?” (E2) 

• “Overall, I believe all of these personas were very consistent 
with their backgrounds, personalities, professions, and un-
healthy coping mechanisms. While reading each persona, I 
really did not see any contradictions between their thoughts, 
emotions, or actions.” (E3) 

• “The only persona that stood out with conflicting in-
formation was (. . .) Natalia Thompson. In her persona, 
it described her excitement and achievement of adopting 
a baby boy named Alex and then, however, described how 
she was diagnosed with postpartum depression, which 
is depression experienced by women following child-
birth.” (E3) 

Feedback on the consistency of personas was predominantly pos-
itive, highlighting their coherence in backgrounds, personalities, 
professions, and coping mechanisms, with no noticeable contra-
dictions in thoughts, emotions, or actions. However, confusion 
arose with the persona Natalia Thompson, where there was an in-
consistency regarding her situation; she was described as adopting 
a child but was also mentioned to have postpartum depression, a 
condition typically associated with childbirth, leading to questions 
about whether she adopted or gave birth. This discrepancy points 
to a need for clearer storytelling to avoid confusion and maintain 
the integrity of the personas’ narratives. 

Informativeness for design. Noteworthy comments were as 
follows: 

• “Overall, informativeness for design was very good. Per-
sonas #29 and #30, Yvette Patel and Anthony Rogers, seemed 
to describe benzodiazepine addictions rather than opioid 
addictions. Benzos are commonly prescribed for anxiety. 
It would be more unusual for someone to start an opioid 
addiction due to anxiety.” (E2) 

• “For the most part, a lot of these personas described a good 
amount of information in relation to the individual’s back-
ground, relationships, emotions, motives, and professional 
goals, allowing designers to pin-point access to resources 
and information that could help the individual in managing 
their disorder.” (E3). 

• “I believe the personas that did not provide a lot of informa-
tion on what drives the individual to their unhealthy coping 
mechanism and their emotions during it were scored lower 
as it would be harder to find out exactly what resources could 
be used to really help the individual in their addiction.” (E3). 


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

Feedback on the informativeness of personas for design high-
lighted their overall effectiveness, though it pointed out specific 
areas for improvement. For instance, Personas 29 and 30 were cri-
tiqued for inaccurately attributing benzodiazepine characteristics to 
opioid addictions, suggesting a need for more precise information 
regarding the nature of the addiction and its causes. The detailed 
backgrounds, relationships, emotions, motives, and professional 
aspirations provided in most personas were praised for giving de-
signers clear insights into the individuals’ needs, thereby facilitating 
the identification of relevant support resources. However, personas 
lacking detailed information on the motivations behind unhealthy 
coping mechanisms and the emotions experienced during these 
periods were viewed as less useful, indicating that a deeper explo-
ration of these aspects could significantly enhance the design utility 
of the personas. 

Stereotypicality. Noteworthy comments were as follows: 
• “I feel that overall the personas were not too stereotyped. It 

would have been nice to see a little more diversity reflected 
in their names.” (E2). 

• “Overall, I believe the majority of these personas were not 
deemed stereotypical scenarios. The majority of these per-
sonas each held unique backgrounds, emotions, and behav-
iors that are not widely held and fixed/oversimplified images 
or ideas of a particular person.” (E3). 

• “For example, Persona P324, or Yvette Patel, being a single 
schoolteacher with crippling anxiety that led her to an opioid 
addiction seemed the opposite of stereotypical in our society.” 
(E3). 

• “All personas made sense except a few. There was not much 
stereotyping.” (E5). 

• “I gave high sterotypicality scores to a few personas (P327, 
P221, P162, P132, P324) because they looked more like a 
fiction story to me, as though an author is creating them 
from scratch and the people are fictional characters that do 
not exist, but if they do, they are moving about their routine 
life normally even though they are “addicted” to one thing 
or another.” (E5). 

Feedback on the stereotypicality of personas indicated that they 
were generally perceived as non-stereotypical, with a call for greater 
diversity in naming to reflect broader inclusivity. The personas 
were praised for their unique backgrounds, emotions, and behav-
iors that went beyond fixed or oversimplified images, particularly 
highlighting examples like Yvette Patel, whose story as a single 
schoolteacher with anxiety leading to opioid addiction was seen as 
counter-stereotypical. While most personas were viewed as real-
istic and well-constructed, a few were critiqued for seeming more 
like fictional characters, with their life situations and addictions 
feeling too constructed and not reflective of real-life complexities. 
This feedback suggests a balance was largely achieved in avoiding 
stereotypes, but some personas could benefit from more grounded 
detailing to enhance their believability and avoid the impression of 
fiction. 

Positivity. Noteworthy comments were as follows: 
• “Overall, I believe the majority of these personas were pre-

sented in a more neutral light, compared to negative or posi-
tive depiction.” (E3). 

• “Personas that were scored higher for positivity, such as 
James Patterson, were due to their recognition and actions 
in trying to manage their addiction and the positive lifestyles 
that they were trying to lead.” (E3). 

Feedback on the positivity aspect of personas indicated that they 
were generally presented in a neutral manner, neither overly posi-
tive nor negative. Personas like James Patterson, who were scored 
higher for positivity, were distinguished by their proactive efforts 
to manage their addiction and their attempts to maintain or shift 
towards positive lifestyles. This approach underscores the impor-
tance of depicting personas in a balanced way that acknowledges 
their struggles while also highlighting their resilience and efforts 
towards recovery or positive change. 
5 DISCUSSION AND IMPLICATIONS 
5.1 Answers to Research Questions 
RQ1 dealt with the diversity and bias of the LLM-generated per-
sonas. The results indicated that the LLM generated personas of 
different ages, from young to elderly people. That said, the LLM 
was biased toward younger age groups. The addiction types var-
ied among personas from different age groups, but their variation 
appeared logical (i.e., younger groups struggled with social media 
addiction more often, the older groups with alcohol). 

In terms of gender, the LLM generated the same number of male 
and female personas. Male personas were slightly older than female 
personas. In terms of country, the LLM generated personas from 15 
different countries, although 86% of the personas were from the US, 
indicating strong bias. In terms of occupation, the LLM generated 
personas with 201 different jobs. Despite high diversity, there 
was some gender stereotyping, for example, males had a higher 
likelihood of being construction workers and females being nurses. 
The personas’ pain points differed by addiction type, indicating 
that the LLM considered people with different addictions facing 
different types of challenges. In terms of physical appearance, 
the LLM rarely referred to the looks of the persona. In terms of 
personality, the LLM tended to portray the person in a positive 
light, highlighting positive traits over negative ones. There was no 
clear bias in terms of the length of the persona description, except 
that the length was slightly shorter for older personas. 

Overall, the personas generated by LLM appear diverse. They do 
contain some biases, but the source and severity of these biases are 
difficult to assess. It appears that when humans perceive certain 
biases as harmful, these could be remedied by relatively minor 
interventions (e.g., changing the female personas profession from 
nurse to software developer). However, such assessments would 
need to be done on a case-by-case basis. 

RQ2 dealt with the internal and external evaluation of the per-
sonas. The results indicate that LLMs can generate consistent per-
sonas that are perceived as believable, relatable, and informative for 
design while retaining a relatively low level of stereotyping, as per-
ceived by the human evaluators. The potential of LLMs for persona 
generation lies in their possible capacity to generate immersive 
persona descriptions, incorporating demographics, motivations, 
pain points, and preferences based on a given set of inputs. As 
noted by Paoli [28], “. . .there is something powerful in the [Chat-
GPT] model since it knows what a user persona is without needing 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

any contextual explanation.” This property of fluency can explain 
why human evaluators give such high ratings to LLM-generated 
personas. 

In the following, we offer some tentative explanations for the 
observed biases. First, the ‘youth bias’ might stem from the fact 
that machine learning (ML) datasets often over-emphasize younger 
demographics [14]. Second, the US-centricity might have a similar 
background, stemming from the fact that many ML training sets are 
based on English materials. We also must bear in mind that OpenAI 
is a US-based company, which might further accentuate the lack of 
cultural adaptation in its model’s behavior. Third, the LLM’s positive 
outlook on each persona (i.e., portraying predominantly positive 
personality traits and describing the person in a positive light) is 
likely due to Open AI’s guardrails on the output and corresponds 
to observations made by other researchers [9]. 
5.2 Practical Implications for Persona Design 
An important observation for the practical deployment of LLMs 
is that GPT-4 seems to have an innate understanding of what a 
persona is, so it is by default able to start listing needs, pain points, 
attitudes, and so on [28]. So, the model that is supposed to create 
“mental models” in the form of personas has a mental model of its 
own when it comes to understanding what constitutes a persona! 
For the successful implementation of LLMs in the persona creation 
process, we propose the following guidelines: 

• 1. Verify the LLM-generated personas using diversity 
and bias analysis techniques, such as those illustrated 
in this work. There is not necessarily a need for complex 
analyses, but basic descriptive statistics go a long way. 

• 2. Verify the LLM-generated personas using subject-
matter experts to establish external validity. Domain ex-
perts ought to be able to spot if there is ‘anything fishy’ 
about the personas. 

• 3. Adjust the prompts if you observe challenges in 
diversity, bias, or quality of the personas. Prompt design 
will substantially affect the characteristics of the generated 
personas. For example, the strong US-centricity of the LLM-
generated personas could be addressed by instructing the 
LLM to generate personas from different countries. 

By following these three guidelines, persona creators can mit-
igate the challenges and risks associated with using LLMs for 
persona generation. We also note that completely alternative ap-
proaches to making use of LLMs could be deployed, such as fine-
tuning based on existing user or population data. These approaches 
are likely to emerge as the research on LLM-generated personas 
matures. 
5.3 Limitations and Future Research 
As with any study, ours includes some limitations. We discuss them 
here. 

The reader should note that the generated personas are based 
on the general knowledge the GPT-4 model has about people with 
addictions. Apart from the SME evaluations, there was no addi-
tional verification of their factual correctness. Because the SMEs 
noted some inconsistencies in some of the generated personas, such 

inconsistencies should be addressed before considering the applica-
tion of the personas in any real-world scenario. As our evaluation 
of the personas primarily relied on subjective assessments from 
internal researchers and external SMEs, it would be beneficial to 
explore additional trustworthy sources, such as persona databases 
or real-world persona case studies, to compare the generated per-
sonas with those created through actual design processes. It was 
not verified whether these SMEs have experience in specific ad-
diction domains or all of them. Future work could verify that as 
well as recruiting more SMEs to achieve more stable evaluation 
ratings. It is also possible that the SMEs did not understand all 
measurement criteria in the same way as the UX/HCI researchers 
did, specifically informativeness for design. Future research could 
cross-check SMEs’ baseline understanding of HCI metrics. 

Inferring the nationality of personas based on their names within 
the context of addiction might pose problems. Names may be linked 
to a person’s place of birth or even to their parents, whereas the 
reasons for addiction might be more related to the current place 
of residence. These distinctions are essential for personas since a 
person’s place of birth and their current place of living may not 
necessarily be the same. For instance, all the personas listed in the 
paper might be living in the United States during their addiction 
journey. Future research could work to entangle the relationship 
between nationality and place of residence within LLM-generated 
personas. 

Also, a significant contribution to HCI would be interpreting 
how to design prompt engineering to be more robust against biases 
in LLM generation. The initial idea on this is to ensure that the 
prompting covers protected classes and minority groups to generate 
personas from all possible user groups. The fact that specifically 
instructing the LLM to generate male or female personas resulted in 
100% correct gender specification in the output supports the notion 
that the LLM can follow instructions concerning specific persona 
attributes. 

Future research could investigate the textual content of LLM-
generated personas using NLP techniques, similar to the study 
by Cheng et al. [9]. Multiple metrics could be deployed, includ-
ing length, lexical diversity, sentiment, psycholinguistics, and so 
on. Linguistic analysis can reveal more insights into the diversity 
and bias in personas [34]. There is also a need for comparing LLM-
enabled persona generation with the traditional persona generation 
processes, such as those based on user research and user behav-
ior data. This would help emphasize the distinctions and unique 
characteristics of personas generated by LLMs. 

It is not yet evident how LLMs will shape the persona-creation 
process. We have illustrated one possible approach, which is us-
ing the LLM’s foundational knowledge about people to generate 
personas. Another possibility is to ground the persona generation 
more strongly to specific datasets, whereupon the LLM becomes 
a “helper” in the analysis [28]. More studies are needed to test 
the pros and cons of integrating LLMs into the persona creation 
process. 

As with any novel technology, LLM-generated personas come 
with possible harms. They can, either intentionally or inadvertently, 
have adverse societal effects, such as generating unreliable informa-
tion, reinforcing gender stereotypes, affecting diversity representa-
tion, and deceiving users about the capabilities and limitations of 


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions 

their actual degree of quality [1]. We did not focus especially on 
these risks, as it was not in the scope of our study. So, these risks 
warrant further scrutiny from the HCI research community. The 
risks should be weighed against the potential benefits to form a 
balanced perspective on the pros and cons of LLMs for HCI research 
and practice. 

Overall, LLMs are rapidly transforming various spheres of soci-
ety. HCI is not immune to their impact, neither is persona design. 
With this work, we have highlighted multiple avenues for future 
research on this topic which certainly warrants much more inves-
tigation from the HCI community. To facilitate replication of our 
study as well as further research on LLM-generated personas, we 
make our data and coding results publicly available (see the links to 
resources in Section 3). This supports the advancement of persona 
science, as called for in the literature [35]. 
6 CONCLUSION 
Based on the findings, it can be concluded that LLM-generated 
personas exhibit diversity across various demographic and psycho-
logical dimensions. However, some biases are present, primarily 
related to age, occupation, and pain points. Younger age groups are 
overrepresented, and there is gender stereotyping in certain occupa-
tions. Additionally, there is a strong bias towards personas from the 
United States. Despite these biases, LLMs can generate consistent, 
believable, relatable, and informative personas for design purposes. 
Human evaluators generally perceive these personas positively, 
highlighting the fluency of LLMs in understanding and portraying 
user personas. It is important to note that while some biases are 
present, they appear to be addressable through minor interven-
tions on a case-by-case basis. Overall, LLM-generated personas 
hold promise for design and user research applications, providing 
a foundation for further research. 
ETHICAL REMARKS 
The personas generated were not evaluated for factuality. They 
were evaluated for other factors such as believability and consis-
tency. Because they were not evaluated for factuality, we do not 
recommend directly applying them for healthcare (or other) inter-
ventions. To generate personas for actual decision-making, we 
recommend either verifying the factuality of the personas gener-
ated using a general LLM like ChatGPT or then using factual data to 
finetune or otherwise adapt the LLM before the persona generation. 
DECLARATION OF GENERATIVE AI AND 
AI-ASSISTED TECHNOLOGIES IN THE 
WRITING PROCESS 
During the preparation of this work, the author(s) used Open AI’s 
ChatGPT (GPT-3.5 and GPT-4) as well as GPT-4 via API to gen-
erate the personas, assist us in the analysis, and provide material 
for addressing the ‘blank page’ problem in writing. After using 
this tool/service, the author(s) reviewed and edited the content as 
needed and take(s) full responsibility for the content of the publica-
tion. 
REFERENCES 
[1] Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, and Zeerak Talat. 

2023. Mirages: On anthropomorphism in dialogue systems. arXiv preprint arXiv: 

CHI ’24, May 11–16, 2024, Honolulu, HI, USA 

2305.09800 (2023). 
[2] Abeer Alessa and Hend Al-Khalifa. 2023. Towards Designing a ChatGPT Conver-

sational Companion for Elderly People. arXiv preprint arXiv:2304.09866 (2023). 
[3] Mostafa M. Amin, Erik Cambria, and Björn W. Schuller. 2023. Will Affective 

Computing Emerge From Foundation Models and General Artificial Intelligence? 
A First Evaluation of ChatGPT. IEEE Intelligent Systems 38, 2 (2023), 15–23. 

[4] Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 
2018. Customer segmentation using online platforms: isolating behavioral and 
demographic segments for persona creation via aggregated user data. Social 
Network Analysis and Mining 8, 1 (2018), 54. https://doi.org/10.1007/s13278-018-
0531-0 

[5] Jisun An, Haewoon Kwak, Joni Salminen, Soon-gyo Jung, and Bernard J. Jansen. 
2018. Imaginary People Representing Real Numbers: Generating Personas from 
Online Social Media Data. ACM Transactions on the Web (TWEB) 12, 4 (2018), 27. 
https://doi.org/10.1145/3265986 

[6] A. Baki Kocaballi. 2023. Conversational AI-Powered Design: ChatGPT as De-
signer, User, and Product. arXiv e-prints (2023), arXiv-2302. 

[7] Chris Chapman, Edwin Love, Russell P. Milham, Paul ElRif, and James L. Alford. 
2008. Quantitative Evaluation of Personas as Information. In Proceedings of the 
Human Factors and Ergonomics Society Annual Meeting, September 01, 2008. 
1107–1111. . https://doi.org/10.1177/154193120805201602 

[8] Chris Chapman and Russell P. Milham. 2006. The Personas’ New Clothes: Method-
ological and Practical Arguments against a Popular Method. In Proceedings of 
the Human Factors and Ergonomics Society Annual Meeting, October 01, 2006. 
634–636. . https://doi.org/10.1177/154193120605000503 

[9] Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023. Marked Personas: Using 
Natural Language Prompts to Measure Stereotypes in Language Models. arXiv 
preprint arXiv:2305.18189 (2023). 

[10] Hyunyi Cho, Lijiang Shen, and Kari Wilson. 2014. Perceived Realism: Dimensions 
and Roles in Narrative Persuasion. Communication Research 41, 6 (August 2014), 
828–851. https://doi.org/10.1177/0093650212450585 

[11] Alan Cooper. 1999. The Inmates Are Running the Asylum: Why High Tech Products 
Drive Us Crazy and How to Restore the Sanity (1 edition ed.). Sams - Pearson 
Education, Indianapolis, IN. 

[12] Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, and Ashwin Kalyan. 
2023. Anthropomorphization of AI: Opportunities and Risks. arXiv preprint arXiv: 
2305.14784 (2023). 

[13] Mohamed A. Elfeki, Mohamed A. Abdallah, Lorenzo Leggio, and Ashwani K. 
Singal. 2023. Simultaneous management of alcohol use disorder and liver disease: 
a systematic review and meta-analysis. Journal of Addiction Medicine 17, 2 (2023), 
e119–e128. 

[14] Raul Vicente Garcia, Lukasz Wandzik, Louisa Grabner, and Joerg Krueger. 2019. 
The Harms of Demographic Bias in Deep Face Recognition Research. In 2019 
International Conference on Biometrics (ICB), June 2019. 1–6. . https://doi.org/10. 
1109/ICB45273.2019.8987334 

[15] Joy Ai-Leen Goodman-Deane, Mike Bradley, Sam Waller, and P. John Clarkson. 
2021. Developing personas to help designers to understand digital exclusion. 
Proceedings of the Design Society 1, (2021), 1203–1212. 

[16] Joy Goodman-Deane, Sam Waller, Dana Demin, Arantxa González-de-Heredia, 
Mike Bradley, and John P. Clarkson. 2018. Evaluating Inclusivity using Quantita-
tive Personas. In In the Proceedings of Design Research Society Conference 2018, 
June 28, 2018, Limerick, Ireland. Limerick, Ireland. . https://doi.org/10.21606/drs. 
2018.400 

[17] Jon E. Grant, Marc N. Potenza, Aviv Weinstein, and David A. Gorelick. 2010. 
Introduction to behavioral addictions. The American journal of drug and alcohol 
abuse 36, 5 (2010), 233–241. 

[18] Kathleen W. Guan, Joni Salminen, Soon-Gyo Jung, and Bernard J. Jansen. 2023. 
Leveraging Personas for Social Impact: A Review of Their Applications to So-
cial Good in Design. International Journal of Human–Computer Interaction 0, 0 
(September 2023), 1–16. https://doi.org/10.1080/10447318.2023.2247568 

[19] Muhammad Bilal Gulfraz, Muhammad Sufyan, Mekhail Mustak, Joni Salminen, 
and Deepak Kumar Srivastava. 2022. Understanding the impact of online cus-
tomers’ shopping experience on online impulsive buying: A study on two leading 
E-commerce platforms. Journal of Retailing and Consumer Services 68, (September 
2022), 103000. https://doi.org/10.1016/j.jretconser.2022.103000 

[20] Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating Large 
Language Models in Generating Synthetic HCI Research Data: a Case Study. 
In Proceedings of the 2023 CHI Conference on Human Factors in Computing Sys-
tems (CHI ’23), April 19, 2023, New York, NY, USA. Association for Computing 
Machinery, New York, NY, USA, 1–19. . https://doi.org/10.1145/3544548.3580688 

[21] Matthew K. Hong, Shabnam Hakimi, Yan-Ying Chen, Heishiro Toyoda, Charlene 
Wu, and Matt Klenk. 2023. Generative AI for Product Design: Getting the Right 
Design and the Design Right. arXiv preprint arXiv:2306.01217 (2023). 

[22] Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2021. All About the 
Name: Assigning Demographically Appropriate Names to Data-Driven En-
tities. In Proceedings of the 54th Hawaii International Conference on System 

arXiv:2305.09800
arXiv:2305.09800
arXiv:2304.09866
https://doi.org/10.1007/s13278-018-0531-0
https://doi.org/10.1007/s13278-018-0531-0
https://doi.org/10.1145/3265986
https://doi.org/10.1177/154193120805201602
https://doi.org/10.1177/154193120605000503
arXiv:2305.18189
https://doi.org/10.1177/0093650212450585
arXiv:2305.14784
arXiv:2305.14784
https://doi.org/10.1109/ICB45273.2019.8987334
https://doi.org/10.1109/ICB45273.2019.8987334
https://doi.org/10.21606/drs.2018.400
https://doi.org/10.21606/drs.2018.400
https://doi.org/10.1080/10447318.2023.2247568
https://doi.org/10.1016/j.jretconser.2022.103000
https://doi.org/10.1145/3544548.3580688
arXiv:2306.01217


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 

Sciences, 2021, Virtual conference. Virtual conference. . Retrieved from http: 
//hdl.handle.net/10125/71108 

[23] Tara Matthews, Tejinder Judge, and Steve Whittaker. 2012. How do designers and 
user experience professionals actually perceive and use personas? In Proceedings 
of the 2012 ACM annual conference on Human Factors in Computing Systems 
- CHI ’12, 2012, Austin, Texas, USA. ACM Press, Austin, Texas, USA, 1219. . 
https://doi.org/10.1145/2207676.2208573 

[24] Lene Nielsen. 2002. From User to Character: An Investigation into User-
descriptions in Scenarios. In Proceedings of the 4th Conference on Designing Interac-
tive Systems: Processes, Practices, Methods, and Techniques (DIS ’02), 2002, London, 
England. ACM, London, England, 99–104. . https://doi.org/10.1145/778712.778729 

[25] Lene Nielsen. 2019. Personas - User Focused Design (2nd ed. 2019 edition ed.). 
Springer, New York, NY, USA. 

[26] Lene Nielsen, Kira Storgaard Hansen, Jan Stage, and Jane Billestrup. 2015. A 
Template for Design Personas: Analysis of 47 Persona Descriptions from Danish 
Industries and Organizations. International Journal of Sociotechnology and Knowl-
edge Development 7, 1 (2015), 45–61. https://doi.org/10.4018/ijskd.2015010104 

[27] Christian Nyemcsok, Hannah Pitt, Peter Kremer, and Samantha L. Thomas. 2023. 
Viewing young men’s online wagering through a social practice lens: implications 
for gambling harm prevention strategies. Critical Public Health 33, 2 (2023), 
241–252. 

[28] Stefano Paoli. 2023. Writing user personas with Large Language Models: Testing 
phase 6 of a Thematic Analysis of semi-structured interviews. 

[29] Kari Rönkkö, Mats Hellman, Britta Kilander, and Yvonne Dittrich. 2004. Personas 
is Not Applicable: Local Remedies Interpreted in a Wider Context. In Proceedings 
of the Eighth Conference on Participatory Design: Artful Integration: Interweaving 
Media, Materials and Practices - Volume 1 (PDC 04), 2004, Toronto, Ontario, 
Canada. ACM, Toronto, Ontario, Canada, 112–120. . https://doi.org/10.1145/ 
1011870.1011884 

[30] Joni Salminen, Kathleen Guan, Soon-gyo Jung, Shammur Absar Chowdhury, and 
Bernard J. Jansen. 2020. A Literature Review of Quantitative Persona Creation. In 
CHI ’20: Proceedings of the 2020 CHI Conference on Human Factors in Computing 
Systems, April 25, 2020, Honolulu, Hawaii, USA. ACM, Honolulu, Hawaii, USA, 
1–14. . https://doi.org/10.1145/3313831.3376502 

[31] Joni Salminen, Kathleen Guan, Lene Nielsen, Soon-gyo Jung, Shammur Absar 
Chowdhury, and Bernard J. Jansen. 2020. A Template for Data-Driven Personas: 
Analyzing 31 Quantitatively Oriented Persona Profiles. In Human Interface and 
the Management of Information. Designing Information. HCII 2020., S. Yamamoto 
and H. Mori (eds.). Springer, Copenhagen, Denmark, 125–144. 

[32] Joni Salminen, Kathleen W. Guan, Soon-Gyo Jung, and Bernard J. Jansen. 2022. 
Use Cases for Design Personas: A Systematic Review and New Frontiers. In 2022 
ACM Conference on Human Factors in Computing Systems (CHI’22), 2022, New 
Orleans, USA. ACM, New Orleans, USA. 

[33] Joni Salminen, Bernard Jansen, and Soon-Gyo Jung. 2022. Survey2Persona: Ren-
dering Survey Responses as Personas. In Adjunct Proceedings of the 30th ACM 
Conference on User Modeling, Adaptation and Personalization (UMAP ’22 Adjunct), 
July 04, 2022, New York, NY, USA. Association for Computing Machinery, New 
York, NY, USA, 67–73. . https://doi.org/10.1145/3511047.3536403 

[34] Joni Salminen, Soon-Gyo Jung, Shammur Chowdhury, Dianne Ramirez Robillos, 
and Bernard J. Jansen. 2021. The Ability of Personas: An Empirical Evalua-
tion of Altering Incorrect Preconceptions About Users. International Journal of 
Human-Computer Studies (March 2021), 102645. https://doi.org/10.1016/j.ijhcs. 
2021.102645 

[35] Joni Salminen, Soon-Gyo Jung, and Bernard Jansen. 2022. Developing Persona 
Analytics Towards Persona Science. In 27th International Conference on Intelligent 
User Interfaces (IUI ’22), March 22, 2022, New York, NY, USA. Association for 
Computing Machinery, New York, NY, USA, 323–344. . https://doi.org/10.1145/ 

3490099.3511144 
[36] Joni Salminen, Soon-gyo Jung, João M. Santos, and Bernard J. Jansen. 2019. Does 

a Smile Matter if the Person Is Not Real?: The Effect of a Smile and Stock Photos 
on Persona Perceptions. International Journal of Human–Computer Interaction 0, 
0 (September 2019), 1–23. https://doi.org/10.1080/10447318.2019.1664068 

[37] Joni Salminen, Rohan Gurunandan Rao, Soon-gyo Jung, Shammur A. Chowdhury, 
and Bernard J. Jansen. 2020. Enriching Social Media Personas with Personality 
Traits: A Deep Learning Approach Using the Big Five Classes. In Artificial 
Intelligence in HCI (Lecture Notes in Computer Science), 2020, Cham. Springer 
International Publishing, Cham, 101–120. . https://doi.org/10.1007/978-3-030-
50334-5_7 

[38] Joni Salminen, João M. Santos, Soon-gyo Jung, and Bernard J. Jansen. 2023. How 
does an imaginary persona’s attractiveness affect designers’ perceptions and IT 
solutions? An experimental study on users’ remote working needs. Information 
Technology & People 36, 8 (January 2023), 196–225. https://doi.org/10.1108/ITP-
09-2022-0729 

[39] Joni Salminen, Joao M. Santos, Haewoon Kwak, Jisun An, Soon-gyo Jung, and 
Bernard J. Jansen. 2020. Persona Perception Scale: Development and Exploratory 
Validation of an Instrument for Evaluating Individuals’ Perceptions of Personas. 
International Journal of Human-Computer Studies 141, (April 2020), 102437. https: 
//doi.org/10.1016/j.ijhcs.2020.102437

[40] Albrecht Schmidt. 2023. Speeding Up the Engineering of Interactive Systems with 
Generative AI. In Companion Proceedings of the 2023 ACM SIGCHI Symposium on 
Engineering Interactive Computing Systems, 2023. 7–8. 

[41] Phillip Douglas Stevenson and Christopher Andrew Mattson. 2019. The Personi-
fication of Big Data. Proceedings of the Design Society: International Conference on 
Engineering Design 1, 1 (July 2019), 4019–4028. https://doi.org/10.1017/dsi.2019. 
409 

[42] Piotr Tarka, Monika Kukar-Kinney, and Richard J. Harnish. 2022. Consumers’ 
personality and compulsive buying behavior: The role of hedonistic shopping ex-
periences and gender in mediating-moderating relationships. Journal of Retailing 
and Consumer Services 64, (2022), 102802. 

[43] Phil Turner and Susan Turner. 2011. Is stereotyping inevitable when designing 
with personas? Design studies 32, 1 (2011), 30–44. 

[44] Jennifer C. Veilleux, Peter J. Colvin, Jennifer Anderson, Catherine York, and 
Adrienne J. Heinz. 2010. A review of opioid dependence treatment: pharmacolog-
ical and psychosocial interventions to treat opioid addiction. Clinical psychology 
review 30, 2 (2010), 155–166. 

[45] Xishuo Zhang, Lin Liu, Yi Wang, Xiao Liu, Hailong Wang, Anqi Ren, and Chetan 
Arora. 2023. PersonaGen: A Tool for Generating Personas from User Feedback. 
arXiv preprint arXiv:2307.00390 (2023). 

[46] Noam Zilberman, Gal Yadid, Yaniv Efrati, Yehuda Neumark, and Yuri Rassovsky. 
2018. Personality profiles of substance and behavioral addictions. Addictive be-
haviors 82, (2018), 174–181. 

A PERSONA IDS FOR SME EVALUATION 
Persona IDs: P327, P01, P410, P414, P182, P221, P197, P246, P83, 
P369, P449, P44, P274, P298, P20, P61, P426, P162, P64, P132, P247, 
P380, P316, P111, P244, P139, P99, P324, P348, P171 

B OPEN-ENDED FEEDBACK FROM SUBJECT 
MATTER EXPERTS 

http://hdl.handle.net/10125/71108
http://hdl.handle.net/10125/71108
https://doi.org/10.1145/2207676.2208573
https://doi.org/10.1145/778712.778729
https://doi.org/10.4018/ijskd.2015010104
https://doi.org/10.1145/1011870.1011884
https://doi.org/10.1145/1011870.1011884
https://doi.org/10.1145/3313831.3376502
https://doi.org/10.1145/3511047.3536403
https://doi.org/10.1016/j.ijhcs.2021.102645
https://doi.org/10.1016/j.ijhcs.2021.102645
https://doi.org/10.1145/3490099.3511144
https://doi.org/10.1145/3490099.3511144
https://doi.org/10.1080/10447318.2019.1664068
https://doi.org/10.1007/978-3-030-50334-5_7
https://doi.org/10.1007/978-3-030-50334-5_7
https://doi.org/10.1108/ITP-09-2022-0729
https://doi.org/10.1108/ITP-09-2022-0729
https://doi.org/10.1016/j.ijhcs.2020.102437
https://doi.org/10.1016/j.ijhcs.2020.102437
https://doi.org/10.1017/dsi.2019.409
https://doi.org/10.1017/dsi.2019.409
arXiv:2307.00390


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 
Question Evaluator 1 Evaluator 2 Evaluator 3 Evaluator 4 Evaluator 5 
Believability - what The personas are quite 
was good and what believable and realistic in 
seemed off? their portrayal of 

individuals struggling with 
addiction. 

The veterans’ personas Overall, the majority of Good: Yes, I think all The first few personas were 
were especially believable. personas demonstrated personas appear realistic. not believable (as I have 
I found some of the were highly believable. The Seemed off: In some explained in the 
shopping addictions a bit backgrounds presented persona like Lila Bennett consistency section) but as I 
hard to believe, in combined with their high little more detail is needed moved down the questions, 
particular the social worker pressures in either their to make them appear like the personas started 
and teachers. Social professional or social life an actual person. making sense and turned 
workers and teachers made the scenarios seem believable especially those 
usually struggle to make very realistic, that they related to social media 
ends meet even without a could be an actual person. addiction, PTSD 
shopping addiction. For example, Persona ID opioid/alcohol dependence 

P244, or Ava Chen, seemed and online shopping. Some 
very realistic, as I am sure personas about single 
the pressures of parents giving in to alcohol 
immigrating to an entirely dependence due to pressure 
new country and the of raising a child were not 
challenges and barriers believable. 
that exist with this 
transition seem paramount 
and never ending. In 
addition, her trying to take 
care of her family and also 
being a schoolteacher must 
bring immense stress, 
leading her to unhealthy 
coping mechanisms such 
as alcohol. There were only 
a few personas whose 
background, profession, 
and addiction disorder did 
not quite add up. For 
example, Persona P64, or 
Stacey Rivers, seemed a bit 
off to me. Her escalation 
from winning at a charity 
casino night to a full-on 
gambling addiction seemed 
a bit extreme, combined 
with her background of 
being a schoolteacher. 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 
Relatability - what 
was good and what 
seemed off? 

Consistency - what 
was good and what 
seemed off? 

The personas are mostly 
relatable in the sense that 
they show case different 
types of people who could 
be affected by addiction. 

The personas are mostly 
consistent in their 
portrayal of addiction and 
the associated challenges 

The more personal details 
given about a persona, the 
more relatable I found 
them. It would have been 
helpful to have a little more 
info on the current 
important relationships in 
their lives. 

For the most part, 
consistency was very good. 
Persona ID #6 (Natalia 
Thompson) contained a 
significant discrepancy. It 
first stated that she 
adopted a child but later 
said she had postpartum 
depression. This was 
confusing- did she adopt 
the child or give birth? 

Overall, the majority of 
personas demonstrated 
appeared highly relatable 
and garnered much 
empathy. The caring and 
empathetic professions that 
many personas had, such 
as being teachers, social 
workers, environmental 
activists, etc., combined 
with their caring and 
connected backgrounds 
with family and friends, 
made their struggles with 
these negative coping 
mechanisms very relatable 
and highly sympathetic. 
Especially due to the fact 
that a large portion of 
these individuals 
recognized their unhealthy 
disorder and were trying to 
find ways to remedying it 
and alleviate the stress and 
pain it causes both to them 
and those who care for 
them. There were only a 
few that seemed off, with 
the stand-out being 
Persona P20, or Sean Hall, 
who was a 34-year-old 
HVAC technician that still 
lived with his parents 
while having a gambling 
addiction. 
Overall, I believe all of 
these personas were very 
consistent with their 
backgrounds, personalities, 
professions, and unhealthy 
coping mechanisms. While 
reading each persona, I 
really did not see any 
contradictions between 
their thoughts, emotions, 
or actions. The only 
persona that stood-out 
with conflicting 
information was Persona 
221, or Natalia Thompson. 
In her persona, it described 
her excitement and 
achievement of adopting a 
baby boy named Alex and 
then, however, described 
how she was diagnosed 
with postpartum 
depression, which is 
depression experienced by 
women following 
childbirth. 

Good: Yes, according to me 
all personas are relatable 

Good: All personas look 
consistent but, in Lila 
Bennett’s description, a 
slight contradiction is there 

Most of the questions made 
sense and there were 
real-life examples to relate 
to. Only 2-3 personas did 
not look relatable to me, 
and I have explained in the 
consistency section. 

Almost all personas showed 
consistency and I gave 
them high scores except 
P327 and P221. Let me 
explain why. 
P327 did not look 
consistent to me. Samantha 
was shown in such a 
positive light, i.e., young, 
determined and committed 
yet she caved into the 
pressure of her business in 
just 5 years. This is not just 
too early but also 
inconsistent for a person 
who has studied and 
prepared for a job/business 
all their life. How can you 
give up on something 
you’re so passionate about 
and that too so quickly? 
The personas turned a 
sharp turn from positivity 
to negativity. Therefore, I 
gave it low consistency 
scores. 
P221 also looked 
inconsistent. Natalie was 
passionate and looked like 
an achiever. She wanted a 
career, she studied and 
strived for it; she wanted a 
baby, she adopted one 
without waiting to meet the 
right man and making a 
biological baby. She looks 
like a doer, yet she allowed 
herself to be crushed under 
the burden of 
responsibilities. For such a 
strong and independent 
person who takes big 
decisions, Natalie seemed 
to have a sudden and 
abrupt shift in her behavior. 
She ran her house 
effectively before adopting 
the baby. There are day 
care centers for babies. She 
has a stable job, there is no 
way she should be quitting 
so soon. It’s abrupt and 
inconsistent, in my opinion. 


Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona 
Descriptions CHI ’24, May 11–16, 2024, Honolulu, HI, USA 
Informativeness for 
design - what was 
good and what 
seemed off? 

The personas are 
informative for design in 
the sense that they provide 
designers with a deeper 
understanding of the 
experiences and need of 
individual struggling with 
addiction. 

Overall, informativeness 
for design was very good. 
Personas #29 and #30, 
Yvette Patel and Anthony 
Rogers, seemed to describe 
benzodiazepine addictions 
rather than opioid 
addictions. Benzos are 
commonly prescribed for 
anxiety. It would be more 
unusual for someone to 
start an opioid addiction 
due to anxiety. 

Overall, I believe the 
majority of these personas 
presented an adequate 
amount of information to 
design an app or system to 
address the persona’s 
needs. For the most part, a 
lot of these personas 
described a good amount 
of information in relation 
to the individual’s 
background, relationships, 
emotions, motives, and 
professional goals, 
allowing designers to 
pin-point access to 
resources and information 

Yes, to design an app 
adequate amount of 
information about 
personas is present 

The informativeness factor 
was really good. All 
personas except 3 were 
educational and 
informative. Persona #1 
looks so conflicting. 
Samantha is young and 
determined. It looks 
unbelievable that she’d fall 
for addiction in such a 
young age and just after 5 
years of starting her 
business. Her challenges do 
not look big enough to 
affect her mental health so 
much. The persona looks 
off and somewhat 

that could help the 
individual in managing 
their disorder. I believe the 

inconsistent, unbelievable, 
and unrelatable. 
P414 also looks weird. I 

personas that did not 
provide a lot of 
information on what drives 
the individual to their 
unhealthy coping 
mechanism and their 
emotions during it were 
scored lower as it would be 
harder to find out exactly 
what resources could be 
used to really help the 
individual in their 
addiction. 

have worked as a 
consultant and after 
spending so much time 
online, smart phone and 
digital screens are the last 
thing I want in my routine. 
Sitting all day and staring 
at the screen is so torturing. 
You want to run away from 
it, not indulge in it. 
P221 looked totally made 
up. Natalia is not the first 
and the only working 
woman in the country. 
Nearly all women work 
full-time and the challenge 
if raising a baby is not big 
enough to turn her into an 
alcoholic. The scenario 

Positivity - any 
comments on this? The personas do not 

necessarily focus on 
positivity, as they are 
meant to depict the 
challenges and struggles 
associated with addiction. 

There was a wide range in 
positivity among the 
personas. Some only 
mentioned the persona’s 
career and no other details 
about them. The more we 
know about a person’s 
positive attributes 
(strengths, interests, 
achievements), the more 
relatable and believable it 
is. 

Overall, I believe the 
majority of these personas 
were presented in a more 
neutral light, compared to 
negative or positive 
depiction. Because the 
majority of these personas 
give a balanced 
background between the 
individual’s successes and 
challenges, I believe they 
seemed human for the 

I feel that very few 
personas show positive 
behavior toward their 
addiction 

looked stereotypical and 
made up. 
Save a few personas, there 
was a huge positive factor 
in all. Many people were 
inherently good but fell 
prey to their circumstances. 
They all realized their 
addiction/dependence 
status and showed 
inclination to seek help. I 
gave low positivity scores 
to a few personas (P327, 
P01, P414, P410, P182, P197, 

most part and were scored 
in a more neutral zone. 
Personas that were scored 
higher for positivity, such 
as P449, or James Patterson, 
were due to their 
recognition and actions in 
trying to manage their 
addiction and the positive 
lifestyles that they were 
trying to lead. 

P83, P162, P64, P171) 
because they had a very 
little positivity element. 
The description showed 
they were either spoiled or 
totally messed up from the 
start and made no effort to 
improve their life even 
though they had families to 
support and responsibilities 
on their shoulders. Many 
personas did not even 
realize they had an 
addiction or needed help. 
Their personas were off. 


CHI ’24, May 11–16, 2024, Honolulu, HI, USA Joni Salminen et al. 
Stereotypicality - any 
comments on this? 

Any other comments 
or remarks about the 
personas? 

The personas do not seem 
to rely on stereotypes, as 
they appear to be based on 
real life experiences. 

The personas are well 
crafted and provide a useful 
starting point for designers 
to understand and 
empathize with individuals 
struggling with addiction. 

I feel that overall the 
personas were not too 
stereotyped. It would have 
been nice to see a little 
more diversity reflected in 
their names. 

I thought these were 
well-done overall. If the