"When AI Writes Personas": Analyzing Lexical Diversity in LLM-Generated Persona Descriptions

Osuva_Sethi_Salminen_Amin_Jansen_2025.pdf
Final published version - 1.01 MB

Description

© 2025 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Large language models (LLMs) are increasingly employed in generating user personas representing various groups of people. It is vital that these personas do not contain major sources of bias for stakeholders using the personas. To investigate linguistic bias in LLM-generated personas, we apply eleven lexical diversity metrics to analyze the association between linguistic diversity in 600 persona descriptions generated using five LLMs (GPT, Claude, Gemini, DeepSeek, Llama) and demographic attributes (age, gender, country) of the personas. We find that LLM-generated persona descriptions are lexically diverse independently of the personas’ demographic attributes. While we find no significant demographic bias in the persona profiles, we do find significant differences between the lexical diversity of persona descriptions generated by the LLMs. The persona descriptions generated by Gemini 1.5 Pro have the highest lexical diversity. The results imply that current LLMs can generate lexically diverse persona descriptions, but the selection of an LLM for specific applications is an important decision.
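As a rough illustration of what a lexical diversity metric measures, the sketch below computes the type-token ratio (TTR), the share of distinct words among all words in a text. The abstract does not name the eleven metrics used, so TTR is an assumption chosen for illustration. The function name `type_token_ratio` and the simple regex tokenizer are likewise hypothetical, not the authors' pipeline.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct tokens divided by total tokens (0.0 for empty input).

    Assumption: tokens are lowercase runs of letters and apostrophes;
    a real study would likely use a proper tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical persona snippet: 5 tokens, 4 distinct ("the" repeats).
sample = "The cat and the dog"
print(type_token_ratio(sample))
```

Plain TTR tends to fall as texts get longer, so comparing descriptions of different lengths generally calls for a length-corrected variant.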

Parent publication

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25)

ISBN

979-8-4007-1395-8

OKM publication type

B3 Non-refereed article in conference proceedings