"When AI Writes Personas": Analyzing Lexical Diversity in LLM-Generated Persona Descriptions

Sankalp Sethi, College of Information Science, University of Arizona, Tucson, Arizona, USA, sankalpsethi@arizona.edu
Joni Salminen, University of Vaasa, Vaasa, Finland, jonisalm@uwasa.fi
Danial Amin, University of Vaasa, Vaasa, Finland, danialam@uwasa.fi
Bernard J Jansen, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, bjansen@hbku.edu.qa

Abstract
Large language models (LLMs) are increasingly employed in generating user personas representing various groups of people. It is vital that these personas do not contain major sources of bias for stakeholders using the personas. To investigate linguistic bias in LLM-generated personas, we apply eleven lexical diversity metrics to analyze the association between linguistic diversity in 600 persona descriptions generated using five LLMs (GPT, Claude, Gemini, DeepSeek, Llama) and demographic attributes (age, gender, country) of the personas. We find that LLM-generated persona descriptions are lexically diverse independently of the personas' demographic attributes. While we find no significant demographic bias in the persona profiles, we do find significant differences between the lexical diversity of persona descriptions generated by the LLMs. The persona descriptions generated by Gemini 1.5 Pro have the highest lexical diversity. The results imply that current LLMs can generate lexically diverse persona descriptions, but the selection of an LLM for specific applications is an important decision.

CCS Concepts
• Human-centered computing → Empirical studies in HCI.

Keywords
AI, LLMs, user personas, lexical diversity, evaluation

ACM Reference Format:
Sankalp Sethi, Joni Salminen, Danial Amin, and Bernard J Jansen. 2025. "When AI Writes Personas": Analyzing Lexical Diversity in LLM-Generated Persona Descriptions.
In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25), April 26–May 01, 2025, Yokohama, Japan. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3706599.3719712

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CHI EA '25, Yokohama, Japan
© 2025 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-1395-8/25/04
https://doi.org/10.1145/3706599.3719712

1 Introduction
Personas are humanized depictions of user segments that are used for user representation and understanding in user experience (UX) design, product development, and human-computer interaction (HCI) research and practice [8, 15]. Personas are usually presented as a profile. A key component of the persona profile is the persona description (also called 'narrative'), a text describing the persona's attributes, including needs, preferences, and background in a narrative format [39]. See Figure 1 for an example persona description. Traditionally, persona descriptions have been written by human persona developers. However, this is increasingly changing due to the ability of Generative AI (GenAI) and large language models (LLMs) to generate fluent text content [18, 54, 61]. Natural language processing (NLP) has contributed to data-driven persona development by providing persona developers with multiple techniques to computationally process data, such as from user interviews or user-generated social media content [51]. Current state-of-the-art NLP technologies include GenAI and LLMs, which are rapidly influencing HCI research, including persona development [18, 45, 54].
The ability of the current generation of LLMs to generate context-sensitive and detail-rich text [2] makes persona description generation a seemingly fitting use case for LLMs in HCI. LLMs can contribute to several tasks in persona development, ranging from data analysis to writing persona descriptions [26].
The evaluation of user personas is a central research topic in persona research [7, 51]. One critical aspect of evaluation is diversity, referring to how varied the developed personas are in their representation of various end-user groups. It is believed that more diverse personas also yield more inclusive design choices [53]; that is, covering more (especially underrepresented) user groups [13]. Most existing work evaluating persona diversity focuses on the demographic diversity of persona sets [23], LLM-generated or otherwise [29, 53], analyzing how well the persona set covers the groups represented by the personas [30]. The role of demographics in persona development and use is important, as research has found many effects of the varied demographics of personas on stakeholder perceptions of personas [24, 27, 52, 55].
The proliferation of textual content generated by LLMs has prompted research into benchmarking the linguistic diversity of LLM-generated content [16, 48, 49, 61].

Figure 1: An example persona obtained from Survey2Persona, a system using LLMs for writing persona descriptions (a snippet of which is highlighted in the figure).
Linguistic diversity can be broadly classified into semantic, syntactic, and lexical diversity [16]. In our study, we focus on lexical diversity, leaving other forms of linguistic diversity for future work. Lexical diversity measures the range and variety of words used in a text sample and is an indicator of vocabulary richness and textual complexity [3, 25, 63]. In the context of persona descriptions generated by LLMs, lexical diversity can be an important indicator of how well the description captures the nuanced characteristics and attributes of the persona [32]. The need for a lexical analysis of persona descriptions generated by LLMs is emphasized by evidence of bias and potentially harmful stereotypes [1, 32, 42] in LLM-generated text that could make their way into persona descriptions generated by LLMs [28, 54]. A lexical analysis of LLM-generated persona descriptions can reveal valuable information about the diversity of LLM-assisted user representation, the potential biases and risks involved in it, and practical guidelines about how to quantify this lexical diversity and which LLMs to use in order to create lexically diverse persona descriptions.
Against this backdrop, we put forth the following research questions (RQs):
• RQ1: How lexically diverse are LLM-generated personas?
• RQ2: Is there a dependence between demographic attributes and lexical diversity in LLM-generated personas?
• RQ3: Does the lexical diversity of LLM-generated personas vary by the LLM used?
RQ1 can provide results that can be compared against lexical diversity baselines and conventionally developed persona descriptions [32, 49]. High lexical diversity might be desirable in LLM-generated persona descriptions, which represent entire user groups [30], and could, in turn, lead to more inclusive design.
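The core quantity behind these questions can be illustrated with a toy computation. The Type-Token Ratio (TTR), the simplest lexical diversity index, divides the number of unique word types by the total number of tokens. A minimal sketch in Python (hypothetical sentences, not our study data):

```python
# Type-Token Ratio (TTR): unique word types / total tokens.
# Higher values indicate a more varied vocabulary.

def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

repetitive = "the farm and the farm and the farm"
varied = "a student joined a popular campus fraternity"

print(type_token_ratio(repetitive))  # 3 types / 8 tokens = 0.375
print(type_token_ratio(varied))      # 6 types / 7 tokens, roughly 0.857
```

A description that recycles the same few words scores low; one that introduces new vocabulary scores high, which is the intuition all eleven metrics in this study refine in different ways.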
For RQ2, a dependence between demographic attributes and lexical diversity could indicate bias, in which the LLM generates, for example, lower-quality descriptions for certain demographic groups [32]. Finally, RQ3 allows us to make informed choices in selecting LLMs to maximize lexical diversity in persona descriptions.
In terms of positioning, our work examines the role of LLMs in persona development and, as such, exemplifies the convergence of HCI and NLP in addressing "grand challenges" in persona development [51]. In the remainder of the work, we first concisely review the related work. After this, we present our methodology and findings for each RQ. In the discussion, we summarize the implications of these findings and suggest future research directions.

2 Related Work
Diverse user representation is a key principle in user-centered design (UCD) [10, 19, 39–41, 46], since global user populations are increasingly heterogeneous and, therefore, require variation in persona sets to be represented in a fair and balanced manner [37]. Diverse user understandings can promote inclusion by representing a wider range of user needs and attributes in the design process [13, 14]. At its best, diversity empowers designers to create relevant, inclusive products and features [8] that are accessible and usable to those considered fringe or marginalized users [12, 62]. However, using personas as an inclusive design tool requires that personas represent user groups within a population in a diverse way [31, 38]. Design teams employing diverse persona sets are better equipped to identify and address the needs of underrepresented user segments early in the development process [13, 22, 36]. Diversity in persona sets is associated with representing users in various demographic and behavioral contexts [47, 50].
Furthermore, it is essential to address ethical considerations in data-driven persona development, particularly the risk that algorithms may emphasize majority groups instead of fringe user groups [60]. This is particularly important given the proliferation of algorithms in the persona development process [21, 54].
While previous work has investigated human-AI persona generation workflows [59], qualitative analysis of LLM-generated personas [56], and persona diversity based on demographic attributes [54], to our knowledge, no previous study has specifically investigated lexical diversity in LLM-generated persona descriptions. Yet, analyzing lexical diversity in LLM-generated personas helps to evaluate how well LLMs can create distinct character descriptions without falling into repetitive language that might contain patterns of generic stereotypes.

3 Methodology
3.1 Overview
We applied the methodology of Salminen et al. [54] using publicly available Jupyter notebooks. We used our own OpenAI API keys to run the persona generation code in the notebooks (without any modification) and obtained a set of 450 personas generated by GPT-4o¹ to address RQ1 and RQ2. The first stage involved creating a "skeletal" [54] persona that does not contain a detailed persona description. In the second stage, we used this skeletal persona in the prompt to generate a detailed persona description. These persona descriptions were then passed through a custom NLP pipeline for pre-processing and lexical diversity calculations (detailed in subsequent sections). We have made the pre-processing and the NLP pipeline code available through Jupyter notebooks², which researchers can use to re-run the analysis or calculate lexical diversity metrics of their own.
Statistical analysis was performed on the resulting lexical diversity metric data to address RQ1 and RQ2.
RQ3 was addressed by generating 30 new personas from five different models (for a total of 150 new personas) using a modified version of the original prompt that was previously used to generate the personas used in RQ1 and RQ2. A 120-word limit was enforced through the prompt to ensure consistency in the length of the persona descriptions across all models, because text length might otherwise act as a confounding factor. (Length was not a confound: due to this conditioning, the observed length varied only little across personas; M = 78.5, SD = 6.1 words.) For RQs 1 and 2, since we used the 450 personas developed by Salminen et al. [54], the word limit was not applied. Lexical diversity metrics were calculated based on the descriptions of these new personas, and a Kruskal-Wallis test was performed on each metric to compare the central tendencies of the metrics across models. A post-hoc analysis was performed using Dunn's test for the metrics that exhibited statistically significant differences.

¹All LLMs used in this study are the mentioned versions as of 7 January 2025.
²Available at: https://bit.ly/lexical-diversity-personas-supplementary-material

3.2 Model Selection
For investigating RQ1 and RQ2, we used GPT-4o, as it represented the state-of-the-art model at the time of our study and was supported by the implementation of Salminen et al. [54]. This allowed us to efficiently generate a substantial dataset of 450 personas while maintaining methodological consistency with prior work. GPT-4o has also shown strong capabilities in generating contextually rich text [2], making it an ideal candidate for establishing baseline lexical diversity patterns in persona descriptions.
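The RQ3 statistical procedure described above (Kruskal-Wallis across models, followed by Dunn's post-hoc test) can be sketched as follows. The H statistic is implemented by hand purely for illustration, with hypothetical per-model scores, not our measured data; in practice a library routine such as scipy.stats.kruskal would be used:

```python
from itertools import chain

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic: rank all observations jointly
    (average ranks for ties), then compare group rank sums.
    `groups` is a list of lists of scores."""
    data = sorted(chain.from_iterable(groups))
    n = len(data)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        ranks[data[i]] = (i + 1 + j) / 2     # average rank of the tied run
        tie_term += (j - i) ** 3 - (j - i)   # accumulate tie correction
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction else h

# Hypothetical TTR scores for three models (illustrative values only)
h = kruskal_wallis_h([[0.70, 0.72, 0.71, 0.69],
                      [0.74, 0.76, 0.75, 0.73],
                      [0.80, 0.82, 0.81, 0.79]])
print(h > 5.991)  # True: H exceeds the chi-square cut-off for df = 2, alpha = .05
```

When H exceeds the chi-square critical value for k − 1 degrees of freedom, at least one model's distribution differs, which is what licenses the pairwise Dunn comparisons with Bonferroni correction.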
For RQ3, we expanded our analysis to include multiple models (GPT-4o, Claude 3.5 Sonnet, DeepSeek V3, Gemini 1.5 Pro, and LLaMa 3.1) to provide a comparative perspective on lexical diversity across different LLM architectures. This study design allowed us to first establish fundamental patterns in lexical diversity (RQ1) and its relationship with demographic attributes (RQ2) before expanding to cross-model comparisons (RQ3).

3.3 Metric Selection
Lexical diversity has a rich academic landscape with applications in multiple domains, including the assessment of language disorders [9, 11], the quality of writing [17], language development [64], and NLP tasks [6, 18, 32, 65]. There are open-source Python libraries that implement different sets of lexical diversity metrics. For our study, we applied the LexicalRichness [58] library because of its extensive coverage of lexical diversity metrics, detailed documentation, compatibility with our own custom pre-processing pipeline, and proven record of use in academic research [44, 49]. Using multiple metrics also aligns with McCarthy and Jarvis' recommendation of "using multiple metrics in research studies [rather than any single index] noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation" [34]. Since our data is predominantly limited to persona descriptions of 120 words, which translates to a lower range of 40-50 words after stop-word (common words like 'a', 'and', 'the') removal and further processing, our data falls below the suitable text length threshold for metrics like the Measure of Textual Lexical Diversity (MTLD) and vocD [34, 35], which have thus been excluded from our chosen metrics for this study. A description of the lexical diversity metrics used in this study is available in the supplementary material³.
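Several of the selected indices are simple functions of the type count V and token count N. A hand-rolled sketch from their standard definitions (our study used the LexicalRichness package rather than this code; the function name here is our own):

```python
import math

def lexical_diversity_suite(tokens):
    """A few TTR-family indices from their standard definitions:
    V = number of unique types, N = number of tokens.
    Assumes more than one token (the log-based indices need N > 1)."""
    n = len(tokens)
    v = len(set(tokens))
    return {
        "TTR": v / n,                         # Type-Token Ratio
        "RTTR": v / math.sqrt(n),             # Root TTR (Guiraud)
        "CTTR": v / math.sqrt(2 * n),         # Corrected TTR (Carroll)
        "Herdan": math.log(v) / math.log(n),  # Herdan's C (log TTR)
        "Maas": (math.log(n) - math.log(v)) / math.log(n) ** 2,  # lower is better
    }

metrics = lexical_diversity_suite(
    "a student joined a popular campus fraternity".split()
)
```

Note that CTTR is simply RTTR divided by the square root of two, so both correct TTR's sensitivity to text length while preserving the same ordering of texts.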
The expected values for high diversity have been referenced from the results of empirical studies and existing research [33, 64]. It should also be noted that these studies also emphasize the shortcomings of these metrics, such as the variability of the Type-Token Ratio (TTR) with respect to word count [34]. Our study, by using a collection of these metrics, balances the outcomes against the individual weaknesses of any given metric. For example, CTTR and RTTR are corrected versions of TTR that overcome its dependence on token count.

3.4 Data Processing
We performed standard data pre-processing on the textual persona description data. The original text was sequentially processed by first normalizing the text by lower-casing, removing special characters, and removing excess white space. Stop words were then removed and lemmatization (i.e., reducing the inflected forms of a word to its base form, e.g., driving -> drive) was performed on the normalized text using the NLTK library [4]. The persona's name is also removed with the stop words during this step.
Our selection of demographic attributes (age, gender, and country) follows an established precedent in persona research [30, 54], where these attributes form the main demographic identifiers in persona profiles. These attributes are consistently present in persona templates and are known to potentially influence perception and stereotyping [65], making them appropriate focal points for investigating potential biases in lexical diversity.
The pre-processed text was used to generate LD metrics via the LexicalRichness [58] package. The resulting metrics form the basis for our analysis in this study.
For RQ3, we created a set of personas using different LLMs for comparative analysis. The prompt of Salminen et al. [54] was used to create 30 personas using each model. The following models were used: (a) ChatGPT 4o, (b) Claude 3.5 Sonnet, (c) DeepSeek V3, (d) Gemini 1.5 Pro, and (e) LLaMa 3.1 (405b).
These represented state-of-the-art models at the time we conducted the study. The persona descriptions were passed through the same pre-processing pipeline, and metric calculation was performed as in RQ1. Since this part of the study aggregates results across metrics for each model to calculate overall lexical diversity, we transformed the values of the metrics where a lower score is better onto a higher-is-better scale.

4 Results
4.1 RQ1: How Lexically Diverse Are LLM-Generated Personas?
We first illustrate the difference between persona descriptions with marked and noticeable differences in lexical diversity in Table 1. As an example, we chose two LLM-generated persona descriptions (P209 and P281) from our sample. These descriptions are for personas representing substance addiction (alcohol for P209, opioids for P281). Repeated words negatively affect the lexical diversity and are highlighted in red. P209 scores lower on all eleven lexical diversity metrics used in this study. Taking the Type-Token Ratio (TTR) as a reference, a TTR of 0.59 for P209 indicates moderate lexical diversity and a higher rate of repetition of words in the persona description, as compared to P281 with a TTR of 0.82, which indicates high lexical diversity.

³Available at: https://bit.ly/lexical-diversity-personas-supplementary-table

To understand the central tendency, stability, and overall distribution of the lexical diversity metrics across our set of 450 persona descriptions generated using GPT-4o for RQ1, we performed descriptive statistical analysis (see Table 2). The key takeaway from these descriptive statistics is that LLMs can generate lexically diverse persona descriptions. Across all the metrics, we see high mean values, indicating that these persona descriptions contain a wide-ranging vocabulary, balanced word usage, and overall linguistic variety [33].
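One way to check that a diversity metric behaves consistently across many descriptions is the coefficient of variation, which is dimensionless (SD divided by the mean) and hence comparable across metrics on different scales. A minimal sketch with hypothetical per-persona TTR scores (not our data):

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Dimensionless spread: sample SD divided by the mean, so metrics
    measured on different scales can be compared directly."""
    return stdev(values) / mean(values)

# Hypothetical per-persona TTR scores (illustrative values only)
ttr_scores = [0.72, 0.70, 0.75, 0.68, 0.74, 0.71]
cov = coefficient_of_variation(ttr_scores)
print(f"CoV = {cov:.3f}, stable = {cov < 0.20}")  # 0.20 cut-off for stability
```

A CoV below the 0.20 cut-off indicates that the metric's scores cluster tightly around the mean across the sample.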
Figure 2 illustrates the distribution of the eleven normalized lexical diversity metrics across the 450 LLM-generated persona descriptions created for RQ1.

Figure 2: Boxplot of normalized lexical diversity metrics. The boxplot represents the distribution of each metric calculated on a sample of 450 persona descriptions. Metrics where a lower value is better are highlighted in blue.

The metrics show varying distributions and stability across the entire sample, but low coefficients of variation overall, suggesting consistent lexical diversity scores across our dataset. This indicates that the personas generated by LLMs maintain lexical diversity regardless of the context or length of the persona description. In other words, the persona descriptions exhibit consistent depth and detail.
We further investigated the stability of the diversity metrics using the coefficient of variation (CoV), because it is dimensionless and allows comparison between metrics despite their differing scales [66]. We chose a CoV cut-off of 0.20 for stability [57]. Overall, the results indicate high lexical diversity and stable distributions with minimal outliers across all metrics. The stability of these metrics translates into low variance across the dataset. This minimizes the influence of fluctuations caused by text length, demographic attributes, or randomness.

4.2 RQ2: Is There a Dependence between Demographic Attributes and Lexical Diversity in LLM-Generated Personas?
Regression analysis was conducted using Ordinary Least Squares (OLS) to examine the relationship between the Age, Gender, and Country predictors and the eleven lexical diversity metrics. The results indicate that none of the demographic attributes (age, gender, or country) meaningfully predicts changes in the lexical diversity metrics.
While RTTR (B = −0.0141) and CTTR (B = −0.0100) exhibit statistically significant negative relationships with age, the B for these relationships is very small, indicating a very weak relationship. Overall, these results suggest that the lexical diversity of LLM-generated persona descriptions is independent of the demographic attributes of the personas themselves.

4.3 RQ3: Does the Lexical Diversity of LLM-Generated Personas Vary by the LLM Used?
The Kruskal-Wallis test was conducted to assess the differences in lexical diversity across groups (models) for each metric. The results revealed statistically significant differences for all metrics⁴. The p-values for all metrics are significant (p < .05), suggesting that for each metric, at least one group's (model's) median differs significantly from the others.
Since all metrics exhibit statistically significant variability across groups, post-hoc pairwise comparisons using Dunn's test were conducted to evaluate differences in lexical diversity metrics between groups. A Bonferroni correction was applied to adjust for multiple comparisons across model pairs to ensure stringent control over the Type I error rate. Statistically significant differences between multiple model pairs were observed across all metrics⁴.
Finally, we measured the lexical diversity of each LLM across the lexical diversity metrics to identify the model that generates persona descriptions with the highest lexical diversity. To enable a direct comparison, we transformed the values of the metrics in which a lower score is better (i.e., Maas, Herdan-VM, and Simpson-D) to a higher-is-better scale using min-max normalization (applied to the reciprocal of the original value). The lexical diversity metrics for each model were normalized using min-max normalization. The median for each metric-model pair was calculated and plotted on a radar plot (see Figure 3).
The area under the polygon formed by each model was calculated using the Surveyor's Area Formula [5], which gives us each model's overall lexical diversity across all metrics. The larger the area of the polygon, the higher the overall lexical diversity of the model. The results indicate that Gemini 1.5 Pro had the highest lexical diversity among the models, as measured by our set of metrics, with the largest area under the polygon formed in the radar plot. The models can be ranked in descending order of their overall lexical diversity (see Figure 4) by the area of their respective polygons.

5 Discussion
5.1 Findings Concerning Research Questions
Results on RQ1 show that LLM-generated persona descriptions are lexically diverse, as measured by all metrics. Compared to benchmark values from the previous literature [33, 64], the persona descriptions scored high on normalized scales. The relative stability

⁴Results of the statistical analyses performed can be viewed in the supplementary material at: https://bit.ly/lexical-diversity-personas-supplementary-material

Table 1: Comparison of lexical diversity in two persona descriptions. Repeated words (highlighted in red) reduce the overall lexical diversity of the text. In P209, the word farm is repeated 8 times and family is repeated once. In P281, there is a lower incidence of repeated words, with college and social repeated once each, making it more lexically diverse than P209, which is reflected in the TTR value.

P209 (Lower Diversity - TTR = 0.59) / P281 (Higher Diversity - TTR = 0.82)

P209 is a 49-year-old farmer who has found himself struggling with alcohol. Born and raised on a family farm, P209 was exposed to the demanding nature and responsibilities of farm life from a very young age.
Growing up on the family farm made P209 accustomed to the rigors of rural life, including the seemingly never-ending days of labor and the relentless physical demands of farm work. As an adult, P209 now manages the farm himself, working long hours and taking on all of the day-to-day responsibilities. Although the farm has provided P209 with a stable occupation and a sense of pride in his work, it has also come at a high personal cost. Dealing with the day-to-day stressors and routine of farm life, coupled with feelings of isolation from the lack of social interaction inherent in rural living, P209 turned to alcohol as a way to cope with his emotional struggles.

P281 is a 22-year-old college student currently studying social sciences at a well-known university. He is a member of a popular fraternity on campus, which plays a significant role in his social life. As part of this fraternity, P281 has had the opportunity to make several friends and enjoy a wide range of activities, both on and off-campus. As is quite common among college students, partying is a significant aspect of P281's life. During his freshman year, many of these gatherings involved excessive drinking and the use of recreational drugs. Unfortunately, this environment provided P281 with ample opportunity to experiment with various substances, including opioids. Initially, P281 only used opioids occasionally, believing that this would prevent him from developing an addiction. However, over time, he found himself using increasingly higher doses and more frequent administration to achieve the desired euphoric effects. Inevitably, this led him down the path of opioid addiction.

Table 2: Central tendencies of the lexical diversity metrics. Metrics where a lower value is better are highlighted in blue. The mean values for all metrics lie between our expected values for high diversity (greater than the 70th percentile for metrics with an upper bound).
The metrics also exhibit low coefficients of variation, indicating that the metrics remain stable across the data, which points to consistency in the lexical diversity of LLM-generated persona descriptions.

Metric      Expected Value (High Diversity)   Mean (M)   SD      CoV     Diversity
TTR         0.7 − 0.9                         0.72       0.05    0.063   High
RTTR        10 − 15                           10.70      0.76    0.071   High
CTTR        7 − 10                            7.57       0.54    0.071   High
Herdan      0.8 − 1.0                         0.94       0.01    0.012   High
Summer      0.8 − 1.0                         0.96       0.007   0.007   High
Dugast      90 − 150                          90.96      15.95   0.175   High
Maas        0.005 − 0.02                      0.01       0.002   0.178   High
Yulek       20 − 80                           63.10      14.44   0.229   High
Yulei       70 − 150                          72.83      22.23   0.305   High
Herdan-VM   0.06 − 0.1                        0.067      0.008   0.126   High
Simpson-D   0.002 − 0.01                      0.006      0.001   0.229   High

of these metrics reflects LLMs' ability to produce lexically rich persona descriptions. This aligns with previous findings on the ability of LLMs to generate high-quality, lexically diverse text [2, 32, 54].
For RQ2, we did not find a statistically significant relationship between the demographic attributes of age, gender, or country and the lexical diversity of the persona descriptions. The only statistically significant relationship, between age and two of our 11 lexical diversity metrics (RTTR and CTTR), was found to be extremely weak, with B = −0.0141 and B = −0.0100, respectively.
The findings for RQ3 revealed statistically significant differences between the selected models for each individual metric, making model choice a factor to consider in maximizing the lexical diversity of LLM-generated persona descriptions. A performance analysis across all metrics indicated that the persona descriptions generated by Gemini 1.5 Pro had the highest lexical diversity.

5.2 Limitations and Future Research Directions
Even though we employed metrics that measure conventional lexical diversity, their efficacy and relevance in persona evaluation lack scholarly attention.
A unified quantitative model for measuring lexical diversity relevant to persona generation could inform decisions about which metrics correlate with stakeholders' experience of personas. Thus, more needs to be known about the relationship between lexical diversity and the "quality" of personas, especially from the perspective of persona users. There is also a need to explore similar metrics in the realm of semantic [20] and syntactic [43] diversity to quantify overall linguistic diversity. A correlation analysis between linguistic diversity scores and human evaluations by subject matter experts could form the basis of future research.
In assessing bias, significant findings for RQ2 would have signaled a bias toward a particular demographic attribute. However, the lack of such findings (as in this study) does not establish the absence of bias. A conclusive answer to whether LLM-generated persona descriptions are free of demographic bias would require a cross-sectional analysis of lexical diversity with demographic attributes over a larger sample, with descriptions of varying lengths and additional variables. While outside the scope of this study, this is a needed direction for future research.

Figure 3: A comparison of the overall lexical diversity of the models. Each individual radar plot depicts a model's overall lexical diversity across the metrics. Within each plot, each radial axis represents the median value of a lexical diversity metric. These metrics have been MinMax normalized before plotting. The model with the highest overall lexical diversity across all metrics will have the largest area under the polygon.

Figure 4: Overall lexical diversity of the LLMs for persona generation based on areas under the polygon in Figure 3.
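The polygon-area ranking shown in Figures 3 and 4 rests on the Surveyor's (shoelace) formula applied to one value per equally spaced radar axis. A minimal sketch with hypothetical normalized medians (not our measured values):

```python
import math

def radar_polygon_area(radii):
    """Shoelace (Surveyor's) area of the polygon obtained by plotting one
    value per equally spaced radial axis of a radar chart."""
    k = len(radii)
    # convert each (angle, radius) pair to Cartesian coordinates
    pts = [(r * math.cos(2 * math.pi * i / k), r * math.sin(2 * math.pi * i / k))
           for i, r in enumerate(radii)]
    # sum the signed cross products of consecutive vertices (with wraparound)
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])))

# Hypothetical normalized medians across four metrics for two models
model_a = [0.90, 0.80, 0.85, 0.90]
model_b = [0.60, 0.50, 0.55, 0.60]
print(radar_polygon_area(model_a) > radar_polygon_area(model_b))  # True: larger polygon
```

Because the axes are equally spaced, a model that scores uniformly higher across metrics always encloses a larger polygon, which is what makes the area a usable aggregate of per-metric medians.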
Moreover, while we have studied the results from multiple models on a control prompt, the effect of prompt engineering with respect to prompt length, structure, and input data has not been studied. It offers a promising avenue for future research. In a similar vein, our work paves the way for automated evaluation systems that could be applied before providing persona descriptions to stakeholders in order to ensure that the personas contain sufficient levels of lexical diversity. The use of NLP metrics in the automatic evaluation of persona descriptions thus offers a fruitful avenue for future work that is currently understudied. For example, our approach could be further developed into an algorithm that selects persona descriptions from a pool of candidates to maximize the overall diversity of user representation at the persona set level. Therefore, we believe that our work, exploratory in nature, adds value to HCI research on the impact of LLMs in persona generation and the application of NLP in HCI domains. To this end, we share our programming code to facilitate further explorations in LLM-generated personas.

References
[1] William Babonnaud, Estelle Delouche, and Mounir Lahlouh. 2024. The Bias that Lies Beneath: Qualitative Uncovering of Stereotypes in Large Language Models. Swedish Artificial Intelligence Society (2024), 195–203.
[2] Jason Baronova, Catherine Stevens, Logan Tennant, and Alfred MacPhee. 2024. Dynamic context-aware representation for semantic alignment in large language models. (2024).
[3] Yves Bestgen. 2024. Back to Basics in Measuring Lexical Diversity: Too Simple to Be True. Applied Linguistics 45, 5 (Oct. 2024), 926–932. doi:10.1093/applin/amae053
[4] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.
[5] Bart Braden. 1986. The surveyor's area formula.
The College Mathematics Journal 17, 4 (1986), 326–337.
[6] Erik Cambria. 2024. Semantics Processing. In Understanding Natural Language Understanding. Springer, 113–228.
[7] Chris Chapman, Edwin Love, Russell P. Milham, Paul ElRif, and James L. Alford. 2008. Quantitative Evaluation of Personas as Information. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 52, 16 (Sept. 2008), 1107–1111. doi:10.1177/154193120805201602
[8] Alan Cooper. 1999. The inmates are running the asylum. Springer.
[9] Kevin T. Cunningham and Katarina L. Haley. 2020. Measuring Lexical Diversity for Discourse Analysis in Aphasia: Moving-Average Type–Token Ratio and Word Information Measure. Journal of Speech, Language, and Hearing Research 63, 3 (2020), 710–721. doi:10.1044/2019_JSLHR-19-00226
[10] Nana Kesewaa Dankwa and Claude Draude. 2021. Setting Diversity at the Core of HCI. In Universal Access in Human-Computer Interaction. Design Methods and User Experience, Margherita Antona and Constantine Stephanidis (Eds.). Springer International Publishing, Cham, 39–52.
[11] Gerasimos Fergadiotis, Heather Wright, and Thomas West. 2013. Measuring Lexical Diversity in Narrative Discourse of People With Aphasia. American Journal of Speech-Language Pathology 22 (May 2013), S397–S408. doi:10.1044/1058-0360(2013/12-0083)
[12] Jennifer Goodman and Michelle Broome. 2012. A designer's research manual: Succeed in design by knowing your clients and what they really need. Rockport Publishers.
[13] Joy Ai-Leen Goodman-Deane, Mike Bradley, Sam Waller, and P. John Clarkson. 2021. Developing personas to help designers to understand digital exclusion. 1 (2021), 1203–1212. doi:10.1017/pds.2021.120 Publisher: Cambridge University Press.
[14] Jonathan Grudin. 2006. Why Personas Work: The Psychological Evidence. In The Persona Lifecycle, John Pruitt and Tamara Adlin (Eds.). Elsevier, 642–663. doi:10.1016/B978-012566251-2/50013-7
[15] Jonathan Grudin and John Pruitt. 2002. Personas, participatory design and product development: An infrastructure for engagement. In Proceedings of Participation and Design Conference (PDC2002), Vol. 2. Sweden, 144–161.
[16] Yanzhu Guo, Guokan Shang, and Chloé Clavel. 2024. Benchmarking Linguistic Diversity of Large Language Models. arXiv preprint arXiv:2412.10271 (2024).
[17] Hye Seung Ha. 2019. Lexical Richness in EFL Undergraduate Students' Academic Writing. English Teaching 74, 3 (2019), 3–28.
[18] Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating large language models in generating synthetic HCI research data: a case study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–19.
[19] Julia Himmelsbach, Stephanie Schwarz, Cornelia Gerdenitsch, Beatrix Wais-Zechmann, Jan Bobeth, and Manfred Tscheligi. 2019. Do We Care About Diversity in Human Computer Interaction: A Comprehensive Content Analysis on Diversity Dimensions in Research. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI '19). Association for Computing Machinery, New York, NY, USA, 1–16. doi:10.1145/3290605.3300720
[20] Paul Hoffman, Matthew A Lambon Ralph, and Timothy T Rogers. 2013. Semantic diversity: A measure of semantic ambiguity based on variability in the contextual usage of words.
Behavior Research Methods 45 (2013), 718–730.
[21] Pei-Fang Hsu, Yu-Han Lu, Shih-Chu Chen, and Patricia Pei-Yi Kuo. 2024. Creating and validating predictive personas for target marketing. International Journal of Human-Computer Studies 181 (2024), 103147. doi:10.1016/j.ijhcs.2023.103147
[22] Laiba Husain, Teresa Finlay, Arqam Husain, Joseph Wherton, Gemma Hughes, and Trisha Greenhalgh. 2024. Developing user personas to capture intersecting dimensions of disadvantage in older patients who are marginalised: a qualitative study. British Journal of General Practice 74, 741 (2024), e250–e257.
[23] Bernard J Jansen, Soon-gyo Jung, and Joni Salminen. 2024. Finetuning analytics information systems for a better understanding of users: evidence of personification bias on multiple digital channels. Information Systems Frontiers 26, 2 (2024), 775–798.
[24] Bernard J Jansen, Joni Salminen, Soon-gyo Jung, and Kathleen Guan. 2021. Challenges of Applying Data-Driven Persona Development. Data-Driven Personas (2021), 139–158.
[25] Scott Jarvis. 2013. Capturing the Diversity in Lexical Diversity. Language Learning 63, s1 (March 2013), 87–106. doi:10.1111/j.1467-9922.2012.00739.x
[26] Soon-Gyo Jung, Joni Salminen, Kholoud Khalil Aldous, and Bernard J Jansen. 2025. PersonaCraft: Leveraging language models for data-driven persona development. International Journal of Human-Computer Studies 197 (2025), 103445.
[27] Ilkka Kaate, Joni Salminen, Soon-Gyo Jung, João M Santos, Essi Häyhänen, Trang Xuan, Jinan Azem, and Bernard J Jansen. 2024. Modeling the New Modalities of Personas: How Do Users' Attributes Influence Their Perceptions and Use of Interactive Personas? In Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization. 164–169.
[28] Messi H.J. Lee, Jacob M. Montgomery, and Calvin K. Lai. 2024. Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans.
In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24). ACM, 1321–1340. doi:10.1145/3630106.3658975
[29] Antonio A Lopez-Lorca, Tim Miller, Sonja Pedell, Antonette Mendoza, Alen Keirnan, and Leon Sterling. 2014. One size doesn't fit all: diversifying "the user" using personas and emotional scenarios. In Proceedings of the 6th International Workshop on Social Software Engineering. 25–32.
[30] Nicola Marsden and Monika Pröbster. 2019. Personas and identity: Looking at multiple identities to inform the construction of personas. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–14.
[31] Nicola Marsden and Monika Pröbster. 2019. Personas and Identity: Looking at Multiple Identities to Inform the Construction of Personas. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI '19. ACM Press, Glasgow, Scotland, UK, 1–14. doi:10.1145/3290605.3300565
[32] Gonzalo Martínez, José Alberto Hernández, Javier Conde, Pedro Reviriego, and Elena Merino-Gómez. 2024. Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study. ACM Trans. Intell. Syst. Technol. (Sept. 2024). doi:10.1145/3696459 Just Accepted.
[33] Philip M McCarthy. 2005. An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual lexical diversity (MTLD). Ph.D. Dissertation. The University of Memphis.
[34] Philip M McCarthy and Scott Jarvis. 2007. vocd: A theoretical and empirical evaluation. Language Testing 24, 4 (2007), 459–488.
[35] Philip M. McCarthy and Scott Jarvis. 2010. MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods 42, 2 (May 2010), 381–392. doi:10.3758/BRM.42.2.381
[36] Zdenek Meier, Kristyna Gabova, Radka Zidkova, and Peter Tavel. 2024. Personas of Older Adults in Social and Health Context.
In Intelligent Technologies for Healthcare Business Applications. Springer, 137–171.
[37] Farooq Mubarak, Reima Suomi, and Satu-Päivi Kantola. 2020. Confirming the links between socio-economic variables and digitalization worldwide: the unsettled debate on digital divide. Journal of Information, Communication and Ethics in Society 18, 3 (2020), 415–430.
[38] Timothy Neate, Aikaterini Bourazeri, Abi Roper, Simone Stumpf, and Stephanie Wilson. 2019. Co-Created Personas: Engaging and Empowering Users with Diverse Needs Within the Design Process. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI '19). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3290605.3300880
[39] Lene Nielsen. 2013. Personas - User Focused Design. Vol. 15. Springer.
[40] Lene Nielsen, Marta Larusdottir, and Lars Bo Larsen. 2021. Understanding users through three types of personas. In Human-Computer Interaction – INTERACT 2021: 18th IFIP TC 13 International Conference, Bari, Italy, August 30–September 3, 2021, Proceedings, Part II 18. Springer, 330–348.
[41] Lene Nielsen and Kira Storgaard Hansen. 2014. Personas is applicable: a study on the use of personas in Denmark. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1665–1674.
[42] Naseela Pervez and Alexander J. Titus. 2024. Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts. arXiv:2406.19497 [cs.CL] https://arxiv.org/abs/2406.19497
[43] Alexander Pfaff. 2024. How to measure syntactic diversity: Patternization, methods, algorithms. Noun Phrases in Early Germanic Languages (2024), 33–70.
[44] Esther Ploeger, Huiyuan Lai, Rik van Noord, and Antonio Toral. 2024. Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation. arXiv:2408.17308 [cs.CL] https://arxiv.org/abs/2408.17308
[45] Mirjana Prpa, Giovanni Maria Troiano, Matthew Wood, and Yvonne Coady. 2024.
Challenges and Opportunities of LLM-Based Synthetic Personae and Data in HCI. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–5.
[46] Graham Pullin and Alan Newell. 2007. Focussing on extra-ordinary users. In Proceedings of the 4th International Conference on Universal Access in Human Computer Interaction: Coping with Diversity (Beijing, China) (UAHCI'07). Springer-Verlag, Berlin, Heidelberg, 253–262.
[47] Cynthia Putnam, Emma J. Rose, Erica J. Johnson, and Beth E. Kolko. 2009. Adapting User-Centered Design Methods to Design for Diverse Populations. Information Technologies and International Development 5 (2009), 51–74. https://api.semanticscholar.org/CorpusID:55242592
[48] Emily Reif, Minsuk Kahng, and Savvas Petridis. 2023. Visualizing linguistic diversity of text datasets synthesized by large language models. In 2023 IEEE Visualization and Visual Analytics (VIS). IEEE, 236–240.
[49] Pedro Reviriego, Javier Conde, Elena Merino-Gómez, Gonzalo Martínez, and José Alberto Hernández. 2024. Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans. Machine Learning with Applications 18 (2024), 100602.
[50] Joni Salminen, Kamal Chhirang, Soon-Gyo Jung, Saravanan Thirumuruganathan, Kathleen W. Guan, and Bernard J. Jansen. 2022. Big Data, Small Personas: How Algorithms Shape the Demographic Representation of Data-Driven User Segments. (2022). doi:10.1089/big.2021.0177 Publisher: Mary Ann Liebert.
[51] Joni Salminen, Soon-gyo Jung, Hind Almerekhi, Erik Cambria, and Bernard Jansen. 2023. How Can Natural Language Processing and Generative AI Address Grand Challenges of Quantitative User Personas? In International Conference on Human-Computer Interaction. Springer, 211–231.
[52] Joni Salminen, Soon-gyo Jung, Shammur A Chowdhury, and Bernard J Jansen. 2020.
Rethinking personas for fairness: Algorithmic transparency and accountability in data-driven personas. In Artificial Intelligence in HCI: First International Conference, AI-HCI 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings 22. Springer, 82–100.
[53] Joni Salminen, Soon-Gyo Jung, Lene Nielsen, and Bernard Jansen. 2022. Creating More Personas Improves Representation of Demographically Diverse Populations: Implications Towards Interactive Persona Systems. In Nordic Human-Computer Interaction Conference. 1–11.
[54] Joni Salminen, Chang Liu, Wenjing Pian, Jianxing Chi, Essi Häyhänen, and Bernard J Jansen. 2024. Deus Ex Machina and Personas from Large Language Models: Investigating the Composition of AI-Generated Persona Descriptions. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, 1–20. doi:10.1145/3613904.3642036
[55] Joni Salminen, Sercan Şengün, João M Santos, Soon-gyo Jung, Lene Nielsen, and Bernard Jansen. 2023. The Choice of a Persona: An Analysis of Why Stakeholders Choose a Given Persona for a Design Task. In International Conference on Human-Computer Interaction. Springer, 288–310.
[56] Andreas Schuller, Doris Janssen, Julian Blumenröther, Theresa Maria Probst, Michael Schmidt, and Chandan Kumar. 2024. Generating personas using LLMs and assessing their viability. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–7.
[57] Orit Shechtman. 2013. The Coefficient of Variation as an Index of Measurement Reliability. Springer Berlin Heidelberg, Berlin, Heidelberg, 39–49. doi:10.1007/978-3-642-37131-8_4
[58] Lucas Shen. 2022. LexicalRichness: A small module to compute textual lexical richness. doi:10.5281/zenodo.6607007
[59] Joongi Shin, Michael A Hedderich, Bartłomiej Jakub Rey, Andrés Lucero, and Antti Oulasvirta. 2024. Understanding Human-AI Workflows for Generating Personas. In Proceedings of the 2024 ACM Designing Interactive Systems Conference. 757–781.
[60] Phillip Douglas Stevenson and Christopher Andrew Mattson. 2019. The Personification of Big Data. Proceedings of the Design Society: International Conference on Engineering Design 1, 1 (July 2019), 4019–4028. doi:10.1017/dsi.2019.409
[61] Guy Tevet and Jonathan Berant. 2021. Evaluating the Evaluation of Diversity in Natural Language Generation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, Online, 326–346. doi:10.18653/v1/2021.eacl-main.25
[62] Phil Turner and Susan Turner. 2011. Is stereotyping inevitable when designing with personas? Design Studies 32, 1 (2011), 30–44.
[63] Fiona J. Tweedie and R. Harald Baayen. 1998. How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities 32, 5 (1998), 323–352. https://www.jstor.org/stable/30200474 Publisher: Springer.
[64] Ji Seung Yang, Carly Rosvold, and Nan Bernstein Ratner. 2022. Measurement of lexical diversity in children's spoken language: Computational and conceptual considerations. Frontiers in Psychology 13 (2022), 905789.
[65] Xulang Zhang, Rui Mao, and Erik Cambria. 2024. Multilingual emotion recognition: Discovering the variations of lexical semantics between languages. In 2024 International Joint Conference on Neural Networks (IJCNN).
[66] Wanwan Zheng and Mingzhe Jin. 2024. Evaluate Lexical Richness Measures Using Coefficient of Variation and Relative Value. Working paper. http://www.cicling.org/2018/intranet/pre-print/papers/paper_1.pdf Accessed: 29-Dec-2024.