This is a self-archived – parallel published version of this article in the
publication archive of the University of Vaasa. It might differ from the original.
Developing Persona Analytics Towards Persona
Science
Author(s): Salminen, Joni; Jung, Soon-Gyo; Jansen, Bernard
Title: Developing Persona Analytics Towards Persona Science
Year: 2022
Version: Accepted manuscript
Copyright © Authors | ACM 2022. This is the author's version of the work. It is posted
here for your personal use. Not for redistribution. The definitive Version
of Record was published in Proceedings of IUI '22: 27th International
Conference on Intelligent User Interfaces,
http://dx.doi.org/10.1145/3490099.3511144.
Please cite the original version:
Salminen, J., Jung, S-G. & Jansen, B. (2022). Developing Persona
Analytics Towards Persona Science. Proceedings of IUI '22: 27th
International Conference on Intelligent User Interfaces, 323-344. New
York: Association for Computing Machinery.
https://doi.org/10.1145/3512891
Developing Persona Analytics Towards Persona Science
Joni Salminen
Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar; and University of Vaasa, Vaasa,
Finland, joni.salminen@uwasa.fi
Soon-gyo Jung
Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, sjung@hbku.edu.qa
Bernard J. Jansen
Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, bjansen@hbku.edu.qa
Much of the reported work on personas suffers from a lack of empirical evidence. To address this issue, we
introduce Persona Analytics (PA), a system that tracks how users interact with data-driven personas. PA captures
users’ mouse and gaze behavior to measure their interaction with algorithmically generated personas and their use
of the features of an interactive persona system. Measuring these activities provides an understanding of persona
user behavior, which is required for quantitative measurement of persona use and for obtaining scientifically valid
evidence. Conducting a study with 144 participants, we demonstrate how PA can be deployed for remote user
studies during exceptional times when physical user studies are difficult, if not impossible.
Keywords
Personas, User research, Persona science, Persona analytics, Remote user studies
1 Introduction
The current work discusses how personas can be effectively combined with the concept of analytics, i.e., the use of
end-user data for drawing insights into human factors [53]. We start by defining the key concepts, and we then
explain our approach to infusing personas with analytics, which we denote as Persona Analytics (PA). We refer
to users when we mean stakeholders that use personas for decision making (e.g., designers, software developers,
marketers). In contrast, we refer to end-users when we mean the people on whose information personas
are based.
Personas, commonly applied in human-computer interaction (HCI) [16], design [8], and business domains such
as marketing and sales [74], are fictional depictions of end-users, patients, customers, or other groups of interest
[16]. Personas convey end-users’ needs and requirements [8], alleviate decision-makers’ self-referential bias [6],
and enable thinking of end-users even when none are physically present [66]. Also, personas give a human face to
analytics data [29,35], humanize segments [11], give design inspiration [57], help compare end-users [34], and
facilitate prioritizing end-user needs [69] for system development [67]. Persona profiles show relevant
information about end-users [58] (see Figure 1). Nielsen summarizes a good deal of personas literature on their
creation, assessment, and use [57]. Personas are easily digestible snapshots of end-users, audiences, or customers
for use throughout an organization.
Figure 1: Example of a persona profile. Persona Analytics enables tracking users’ mouse and gaze
interactions as the user engages with different information elements (A: Picture, name, and text
description, B: Audience size, C: Sentiment, D: Social media quotes, E: Topics of interest, and F: Most
viewed contents).
Personas are increasingly being enriched with quantitative data [72], or their creation is partially or
completely carried out by algorithmic processes, which is referred to as algorithmically generated persona
development [50]. When quantitative data becomes part of the created persona profiles, these profiles start
approaching other analytics systems in terms of producing end-user metrics. In fact, algorithmically generated
personas can be seen as an alternative to UX analytics tools (e.g., Google Analytics, Adobe Analytics, HubSpot,
Mixpanel, and Crazy Egg) for design tasks that require user insights (e.g., understanding website visitors for
improving usability). As such, personas personify the numerical data on end-user characteristics and behaviors,
turning numerical reports into persona profiles [29].
This transformation from “cold numbers” into “warm people” has been denoted as a benefit of personas, in
that personified end-user data is treated more empathetically than nameless, faceless numbers [74].
Algorithmically generated personas are also supported by the automation of data science pipelines—i.e., the
process of persona creation can now be automated—as well as web technologies [52] that enable serving the
personas to users via web browsers—i.e., via interactive persona systems [33]. As such, PA can be defined as
follows:
DEFINITION 1: Persona analytics refers to decision-makers (i.e., persona users) in organizations using
personas as analytical tools to better understand their end-users or other groups of interest.
The above definition follows the conventional understanding of algorithmically generated personas using
quantitative data [29,30,52]. As mentioned, we define a ‘user’ as someone who uses a persona for a professional
task, which can relate to software development, design, marketing, or any other domain where personas are
applied. Therefore, users can be software developers, designers, marketers, or other stakeholders involved in
user-centric decision making. Because of this connection between personas and users, there exists another aspect
to the concept of PA. Namely, PA can be seen as a research instrument to generate lasting knowledge about
personas and how users interact with them. Its role is to grant HCI researchers a systematic approach for
collecting data about persona user behavior and metrics for dealing with this data. In so doing, PA paves the way
to more effective application of persona science, defined as ‘the use of empirical scientific methods, such as
experiments, to produce robust and generalizable information about persona creation, evaluation, use, and
impact’.
Indeed, in prior research [36,37], PA is defined as the systematic measurement of behaviors and interactions of
persona users engaged with interactive persona systems. This is consistent with the second definition we put
forth:
DEFINITION 2: Persona analytics refers to how researchers investigate the behaviors of persona users.
It is this second definition that motivates our current work, because examining persona users’ engagement
with personas can generate vital insights for persona science and the design of personas and persona systems that
better serve stakeholders’ information needs about end-users or customers. Persona research urgently needs
a strong empirical orientation to produce knowledge that is credible, pushes forward the boundaries
of persona practice and theory, and adds up to a coherent understanding of the persona user. Advocates of the
scientific method in persona research [4,5,11,12,23] have continuously mentioned the lack of empirical
experiments and quantitative measurements as a bottleneck for progress in terms of theory and practice.
Our definition of persona science implies not only collecting data and conducting research on personas but
also making an effort to devise theories that explain the data and guide further data collection. Persona science
deals with real user behavior and formulating theories that are relevant to the design of personas. The focus in
these efforts lies in the study of the persona users, which we demonstrate in this work by introducing new tools
for measuring persona user behavior. To this end, the current work concerns itself with the development of a
novel PA system embedded within a persona system, with the purpose of investigating the behavior of persona
users more effectively. Three research questions (RQs) are posed:
• RQ1: How to implement PA in an interactive persona system?
• RQ2: What kind of research questions can PA address?
• RQ3: How can PA be used for understanding persona user behavior?
The goal of the current work is to report efforts of building analytics features into an interactive persona
system. We demonstrate the capabilities of this PA system and discuss its value for empirical persona research. In
practice, PA can assist in designing layouts, features, and information content in algorithmically generated
persona profiles. To achieve these benefits, it is necessary to incorporate analytics into personas, so that the
interaction between the users and personas (and interaction features) can be captured. Previous efforts of this
work appeared in [36,37] – relative to these, the current work adds a full-scale case study demonstrating the
system capabilities with a real user study (previous research only tested the system with one pilot user). The PA
system has implications for researchers and practitioners who are increasingly adopting web-based tools for
remote testing since social distancing hampers in-person user studies [17]. This trend is likely to continue as tools
and practices for remote user studies evolve.
2 Related Work
Algorithmically Generated Personas. Although quantitative personas were first created within software
requirements engineering [7,8], the concept of personas being data-driven was introduced by McGinn and
Kotamraju [50] and later deployed by others [40,41,51,95,97]. However, the idea of using “data” for personas dates
to Cooper’s [16] concept that personas should be based on real user goals instead of fiction. While data
orientation has remained a consistent theme in the persona literature [16,17,32,9,10,50,51], three trends
contribute to the rise of algorithmically generated personas [29,72]: (1) availability of user and customer data
from online analytics and social media platforms; (2) democratization of data science tools and algorithms that
enable automated persona generation; and (3) web technologies that remove the limitations of static personas via
interactive user interfaces. These trends denote a shift from unchanging “flat file” personas into dynamic “full-
stack personas” that update automatically and are traceable to individual user-level data [29].
Interactive Persona Systems. From algorithmically generated personas, the next logical step of evolution is
interactive persona systems [3,52,72], defined as interactive user interfaces (UI) that display persona profiles. This
UI can often, though not always, be accessed via web browsers [32,33,35,38]. The benefits of web technologies
are their broad applicability and accessibility. Personas served via the web can be accessed virtually from
anywhere using any device that supports web browsing (see Figure 2a). Supporting technologies, such as user
account management, can be integrated with relative ease using standard libraries and best practices.
Interactivity refers to users performing various actions on the personas, such as analyzing information on gender
distributions, refreshing the persona quotes, filtering the quotes by sentiment and topic [80], predicting a
persona’s interest for a given topic [2,3], and engaging in dialogue [45,47]. The interactive features are enabled by
standard Web technologies, such as HTML, CSS, and JavaScript.
Emerging Opportunities in the Literature. Following these developments in algorithmically generated personas
and interactive persona systems that have been described as transformational [52], multiple opportunities can be
envisioned. We highlight five such opportunities. First, (i) interaction techniques and multimedia (e.g., persona
chat/dialogue systems [13], video, AI agents [87]…) could be incorporated into persona systems to serve various
end-user needs [75]. Second, (ii) new features for comparing personas by design goal metrics, such as diversity
[74] and inclusivity [21], could be added. Third, (iii) personas could be integrated into an external system to
enable persona-based recommendations [46], content management, and customer relationship management, as
well as facilitating online advertising [79] via application programming interfaces (APIs) [38]. Fourth, (iv)
developers could provide explainability, transparency, and context, which are important when applying
algorithms for persona creation [80,88], as illustrated in Figure 2b. Finally, (v) interactive systems can be used to
drill down to the persona information and make quantitative predictions [3].
Figure 2: (a) Interactive persona features, such as [A] browsing the available personas, [B] searching and
sorting by user-defined criteria, [C] explanatory tooltips, and [D] export of usage logs. (b) Adding
transparency to algorithmically generated personas. The first layer shows Mamdouh, a young Egyptian.
The second information layer, accessible by clicking a chart icon, shows that Mamdouh actually comprises
many demographic groups, of which [Egypt, 25-34 Male] is deemed the most representative by the
algorithm.
Research Gap. While technology introduces novel opportunities for user-to-persona interaction, it also
creates an opportunity to better understand how persona users, such as designers, software developers, and
marketers, interact with personas. This better understanding of persona user behavior
can lead to substantial advances in persona science (i.e., the academic study of personas and their usage), but it
requires effective implementation of measurement. The lack of empirical persona user research has been noted
by several researchers [48,72,77]. The unifying factor behind the opportunities outlined above is the need to
understand persona user behavior, which requires measurement. In our solution, this measurement capability is
provided by PA.
3 Methodology for Persona Analytics
3.1 Requirements Journey
There is no standard method of building an analytics system. In our case, all the researchers had extensive
experience of both Web analytics and personas research, which was instrumental in this process. This experience
consisted of working with industry-leading analytics solutions, such as Google Analytics, for more than a decade
in the case of two authors and half a decade in the case of one researcher. The persona research of the authors,
when combined, also extends well beyond a decade and mostly consists of empirical work. Therefore, we had a
vision of what we wanted to accomplish, what is missing from current research and practice, and what research
questions in persona science should be addressed via empirical data.
We started out by “drinking our own Kool-Aid,” i.e., by defining the ideal user persona for the system. This
“persona” of a user of the PA system is a researcher that wants to conduct persona user studies in order to address
scientifically important questions. Measuring user behaviors helps researchers tackle open research questions in
persona science, which is the goal of this persona. To support this persona, a few requirements are posed: (a) the
data must be accurate so that proper conclusions can be drawn from it, (b) there should be the possibility to
include several data types to enable the comparison of different end-user inputs, and (c) the dimensionality of the
data should not be overwhelming for the analysis task, i.e., the data needs to be exportable in a format that is
relatively easy to analyze.
These desiderata were considered in the design of PA by incorporating multiple data sources and by keeping
the reporting data granularity at a user-friendly level – in other words, reports with different levels of detail and
aggregation are provided, as explained later in this manuscript.
Second, we brainstormed the type of questions that the PA system would need to be able to address by
providing data for the researcher persona. The following list of scientific questions of interest (SQ) was
collaboratively obtained among the research team members, pertaining to various persona aspects (in
parentheses):
• SQa: How do users interact with persona profiles? (interaction techniques)
• SQb: What interactive features facilitate users’ discovery of personas for a given task? (interaction
techniques)
• SQc: What information of personas do users pay attention to? (information design)
• SQd: What persona information influences users’ design choices and how? (information design)
• SQe: What persona information influences users’ behaviors or attitudes about end-users? (persona
perceptions)
• SQf: How do users compare personas for a task? (cognitive styles, information processing)
• SQg: How and why do users choose a persona for a given task? (cognitive styles, information processing)
• SQh: How do users or user groups differ by their persona use? For example, are persona users with less
experience in personas using them differently? Are there gender differences? (demographic, cultural,
and social factors)
Addressing these and other vital questions can provide much needed direction for persona science, addressing
aspects of persona creation, validation, use, and value in use. Empirical, scientific inquiry is not only needed to
produce valid knowledge for practitioners using personas, but it is also required to create robust theories on
personas and their users. Aligned with principles of scientific inquiry, persona research can benefit from adopting
more rigorous research designs: formulating hypotheses based on theories in HCI, information science,
social psychology, and other fields tangential to personas; systematically testing those hypotheses; and
then revising the theory to fit the persona context. Addressing these questions can help persona creators
understand which aspects of human-to-human interactions apply to human-to-persona interactions, so design
decisions can be made to mitigate unwanted effects (e.g., stereotyping [48] and seeing personas as irrelevant,
abstract, or misleading [49]) as much as possible.
3.2 Defining Metrics for Persona User Behavior
In practice, to address these questions, the PA system needs to track various measures and use these measures to
compute metrics. Therefore, we needed to devise PA system metrics. These metrics were defined based on their
ability to address the types of questions posed earlier. These metrics can be divided into (a) persona-based metrics
and (b) user-based metrics. The persona-based metrics include, for example, the following (with potential use
mentioned after definition):
• Time spent per persona: the duration users interacted with a given persona. Purpose: Proxy measure
for users’ interest – it is likely that users spend more time with personas they find more interesting.
• Number of visits per persona: the number of times a given persona was visited by the users. Purpose: Proxy
measure for users’ interest – it is likely that users visit personas they find interesting more often.
• Persona bi- and trigrams: the number of times users visited specific two or three personas during a
session or time period. Purpose: Bi- and trigrams can be indicative of comparative behavior, i.e., how
users compare personas. (Technically, this is not a metric but a measure; however, we mention it here
due to its nature of being computed.)
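The persona-based metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Visit` record and its field names are hypothetical and do not reflect the actual PA log schema, and bi- and trigrams are interpreted here as consecutive persona visits by the same user, which is one possible reading of the definition.

```python
from collections import Counter
from typing import NamedTuple


class Visit(NamedTuple):
    """One persona visit. Field names are hypothetical, not the PA log schema."""
    user_id: str
    persona: str
    t_in: float   # seconds since session start when the visit began
    t_out: float  # seconds since session start when the visit ended


def time_spent_per_persona(visits):
    """Total seconds that all users together spent on each persona."""
    totals = Counter()
    for v in visits:
        totals[v.persona] += v.t_out - v.t_in
    return dict(totals)


def visits_per_persona(visits):
    """Number of times each persona was visited."""
    return dict(Counter(v.persona for v in visits))


def persona_ngrams(visits, n=2):
    """Count runs of n consecutively visited personas within each user's visits."""
    by_user = {}
    for v in sorted(visits, key=lambda v: v.t_in):
        by_user.setdefault(v.user_id, []).append(v.persona)
    counts = Counter()
    for sequence in by_user.values():
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return dict(counts)


# Toy data: two users browsing three personas.
visits = [
    Visit("u1", "P1", 0.0, 12.0),
    Visit("u1", "P2", 12.0, 20.0),
    Visit("u1", "P1", 20.0, 25.0),
    Visit("u2", "P2", 0.0, 30.0),
    Visit("u2", "P3", 30.0, 34.0),
]

print(time_spent_per_persona(visits))
print(visits_per_persona(visits))
print(persona_ngrams(visits, n=2))
```

A user who returns to an earlier persona (as `u1` does with `P1`) produces the bigrams `(P1, P2)` and `(P2, P1)`, which is the kind of back-and-forth pattern that may indicate comparison.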
In other words, these metrics communicate aggregate information about how one persona did relative to
another, i.e., whether one was more popular than another, in what order they were visited, and so on. User-based
metrics, in turn, communicate about a user or a group of users. These include:
• Number of personas visited: the number of personas a user visited during a session. Purpose: to
understand how thoroughly a user viewed the personas. For example, a user that only visits a small
number of personas either quickly found what they were looking for, satisficed with the “first acceptable
choice” [93], or was not engaged with the system and/or personas.
• Persona coverage: the relative share of personas a user visited out of the personas available. Purpose:
The same as previous, but as a ratio metric of the visited personas / the number of available personas.
The higher the number of personas becomes, the more likely it is that the persona coverage per user
decreases, as users would be unlikely to browse a very high number of personas for their professional
tasks.
• Average visit duration: the average time spent per persona for a given user. Purpose: can reveal if the
user was more or less engaged relative to other users.
• Persona rank correlation: the degree to which the order of a user visiting the personas corresponded
with the personas’ order of presentation in the system (can be computed based on visit duration as
well). Purpose: to test if there are order effects that affect persona use.
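As a minimal sketch, the user-based metrics can be computed from one user's sequence of visited personas. The function names are hypothetical, and the rank correlation is assumed here to be Spearman's coefficient between the order of first visits and the order of presentation; the text does not name a specific coefficient.

```python
def personas_visited(sequence):
    """Number of distinct personas in one user's visit sequence."""
    return len(set(sequence))


def persona_coverage(sequence, n_available):
    """Share of the available personas that the user visited."""
    return len(set(sequence)) / n_available


def average_visit_duration(durations):
    """Mean time per persona visit for one user."""
    return sum(durations) / len(durations)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (Pearson on ranks); None if it is undefined."""
    if len(x) < 2:
        return None
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def persona_rank_correlation(sequence, presentation_order):
    """Correlate the order of first visits with the order of presentation."""
    first_seen = list(dict.fromkeys(sequence))  # personas by first visit
    visit_rank = list(range(1, len(first_seen) + 1))
    shown_position = [presentation_order.index(p) for p in first_seen]
    return spearman(visit_rank, shown_position)


presentation_order = ["P1", "P2", "P3", "P4", "P5"]
sequence = ["P1", "P3", "P2", "P1", "P4"]

print(personas_visited(sequence))
print(persona_coverage(sequence, len(presentation_order)))
print(persona_rank_correlation(sequence, presentation_order))
```

The same `spearman` helper could be applied to visit durations instead of visit order, since `average_ranks` handles ties.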
Many of these metrics are inspired by similar metrics used in information theory [92], eye-tracking studies
[25,42,71], and Web analytics [1,14,31]. While similar metrics are well established in said fields, they
are not established in persona science and research. In fact, we are aware of no previous study that discusses
metrics for persona user behavior – again, this hampers scientific progress in this domain. Coupled with
participant data, the metrics can help analyze how users view different personas, if there is selection bias based
on demographic factors of personas and their users, and so on. While these basic metrics provide a useful starting
point for persona science, a lot more development in this domain is needed. It is also important to modify and
adapt known metrics for the persona context, because they could be computed or interpreted differently when
studying personas. We discuss this matter later in Section 6.8.
3.3 Determining the Data Collection Modes
The data collection modes were largely pre-determined by what is possible using the current Web technologies.
Mouse-tracking is the obvious choice due to its commonality in online analytics and support provided by all Web
browsers [39]. The advantages of mouse-tracking are three-fold: it (i) offers an unobtrusive form of tracking of
natural user behavior, (ii) does not require calibration, and (iii) has near-perfect accuracy, i.e., there is virtually no
measurement error because the users’ movement of the mouse is traceable to specific pixels and UI
elements. On the negative side, mouse-tracking is considered a weaker proxy for attention than gaze movement,
i.e., eye-tracking [9,55], mainly because users might not always move their mouse when processing information
on the screen. As processing of persona information requires eyesight, eye-tracking is a useful data source to
complement mouse-tracking in interactive systems [18]. The challenge of webcam-based tracking is that error
margins can reduce data quality, as there are differences in terms of hardware quality, lighting
conditions, distance to screen and device, and a myriad of other conditions that can decrease online eye-tracking
data quality [98]. While these issues concern both separate hardware trackers (e.g., Tobii, MyGaze, GazePoint)
and webcam-based eye-tracking, the data quality challenges are much greater for the latter because webcams
do not provide access to the infrared frequencies that the professional trackers use.
Therefore, because mouse- and eye-tracking in a remote user study context each involve their unique
advantages and challenges, it is appropriate to integrate both data collection modes into the PA system, which
simultaneously completes the scope of the requirements. Figure 3 offers a conceptual overview of the PA system.
Overall, the algorithmically generated persona process is as follows: end-users’ data => personas =>
persona users (e.g., marketers) => persona users’ data (collected via PA) => researchers and analysts (studying
persona user behavior). In other words, personas serve the information needs of stakeholders, and PA serves the
information needs of researchers interested in persona user behavior.
Figure 3: Conceptual diagram of Persona Analytics. Multiple users can simultaneously interact with the
persona system. The user’s interaction with persona profiles is captured via mouse- and eye-tracking,
recorded in a central database (DB), and outputted via reporting interface. By analyzing the reports,
researchers can make important discoveries for persona science.
4 System Implementation
4.1 Overview
The defined questions acted as a guiding idea for requirements and implementation. In software development
projects, requirements detail what is needed from a system [20]. Engineers or developers tend to implement
features and functionalities according to the requirements to create the system. Two of the authors collaborated
on creating the requirements, and one of the authors with the necessary skills implemented them. The system was
tested internally and with a pilot user (reported in [37]), and we found it to log the data correctly.
The implementation of PA was carried out for an interactive system called Automatic Persona Generation
(APG) [2,3], which is a state-of-the-art system for algorithmically generated persona development. The system is
available at https://persona.qcri.org. While we refer the reader to related work for a complete description of
APG’s system functionalities and associated algorithms, the following subsection provides a brief explanation of
the APG system.
4.2 Algorithmic Approach for Persona Generation
APG generates personas from online analytics data—e.g., from YouTube audience statistics or Google Analytics log
data on end-users. APG infers demographically and behaviorally distinct patterns from user datasets [27,28]. APG
has been previously applied to datasets on social media users [89], ad target groups [84], online news audiences
[2,3], and video game players [90]. The APG persona creation relies on three main steps [30]: (a) identify unique
user behavioral patterns using non-negative matrix factorization (NMF) [43], (b) associate these behavioral
patterns with representative demographics (age, gender, country) to form “skeletal personas”, and (c) enrich the
skeletal personas with personified information that matches the demographics (name, picture, job, education
level, relationship status, topics of interest). Figure 4 summarizes the algorithmic persona generation process.
Figure 4: (A) applying the NMF algorithm [43] to the user dataset V that consists of demographic groups
(g) and content (c). This matrix is decomposed to W and H, both involving the hyperparameter p that
indicates the number of personas. Epsilon describes the error term. Through enrichment process (B),
explained in the body text, APG produces a set of p personas (C) that have personified information,
conceptually known as “personification of big data” [96].
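For readers who prefer notation, the decomposition described in the caption can be written as follows. The matrix dimensions treat g and c as the numbers of demographic groups and content items, and the least-squares objective is a common NMF formulation assumed here for illustration; the text does not state which loss APG uses.

```latex
% V: user dataset (demographic groups x content), p: number of personas
V = W H + \varepsilon, \qquad
V \in \mathbb{R}_{\ge 0}^{g \times c}, \quad
W \in \mathbb{R}_{\ge 0}^{g \times p}, \quad
H \in \mathbb{R}_{\ge 0}^{p \times c}

% A common way to fit W and H (assumed, not confirmed by the source):
\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
```

Under this reading, each of the p rows of H is a behavioral pattern over content, and each row of W indicates how strongly a demographic group is associated with each pattern.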
4.3 Measurement Paradigm
The key insight going from APG (interactive persona system) to PA (persona analytics) is that, when personas are
provided through a web browser, PA takes place via mouse- (and eye-)tracking that records the persona users’
mouse (or gaze) movements and clicks (eye fixations) on the persona profiles and their information elements.
This enables empirical persona research, such as building click paths, persona visit sequences, dwell time
analyses, and so on. In PA, we track both the information usage behavior within the persona profiles and the
transitions between the personas. One can think of personas as “pages” in the conventional Web analytics terms,
and then information reports are the pages’ content. To answer research questions that advance persona science,
we need both levels of tracking.
To this end, the PA system records the user’s mouse and gaze movements simultaneously during the session.
The data is stored in a backend database. To support analysis, the screen coordinates are automatically converted
to the corresponding information elements in the logs using JavaScript. In other words, the PA system records
that a given mouse hovering or fixation was targeting, e.g., “Persona picture”. The duration of hover or fixation is
calculated based on the “in” and “out” timestamps. In total, the PA system has 120 predefined HTML elements
describing all information in the persona profile page (see Figure 5). The main elements include Headline (name,
gender, age, country), About (picture, text description, job, education level, relationship status), Sentiment, Topics
of Interest, Viewed Conversations (Quotes), Viewed Contents, and Audience Size, corresponding to typical
information in persona templates [58].
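The mapping from screen coordinates to information elements, and from raw samples to “in” and “out” timestamps, can be sketched as follows. The PA system performs this step in JavaScript in the browser; the Python below is only an illustration, and the `Element` record, the bounding boxes, and the rule of choosing the smallest enclosing element for nested elements are all assumptions.

```python
from typing import NamedTuple, Optional


class Element(NamedTuple):
    """Bounding box of a tracked information element (hypothetical layout)."""
    name: str
    left: float
    top: float
    width: float
    height: float


def element_at(x, y, elements) -> Optional[str]:
    """Name of the smallest element containing point (x, y), or None."""
    hits = [
        e for e in elements
        if e.left <= x < e.left + e.width and e.top <= y < e.top + e.height
    ]
    if not hits:
        return None
    # Nested elements: the smallest box is taken as the innermost target.
    return min(hits, key=lambda e: e.width * e.height).name


def to_events(samples, elements):
    """Collapse (t, x, y) samples into (element, t_in, t_out) events."""
    events = []
    current, t_in, t_last = None, None, None
    for t, x, y in samples:
        name = element_at(x, y, elements)
        if name != current:
            if current is not None:
                events.append((current, t_in, t))
            current, t_in = name, t
        t_last = t
    if current is not None:
        events.append((current, t_in, t_last))
    return events


elements = [
    Element("About", 0, 0, 400, 300),
    Element("Persona picture", 20, 20, 100, 100),
    Element("Sentiment", 0, 300, 400, 100),
]
samples = [
    (0.0, 50, 50),    # inside the picture, which is nested in About
    (0.5, 60, 60),
    (1.0, 200, 200),  # About, outside the picture
    (2.0, 200, 350),  # Sentiment
    (2.5, 900, 900),  # outside every tracked element
]

for name, t_in, t_out in to_events(samples, elements):
    print(name, t_in, t_out, "duration:", t_out - t_in)
```

The same routine works for mouse positions and gaze estimates, since both arrive as timestamped screen coordinates.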
Figure 5: Because interactive persona systems serve the personas via a web browser, web technologies,
such as HTML and JavaScript, can be used for tracking how users interact with the personas. The frames in
the figure illustrate how information elements in the persona profile are tagged for user tracking. In total,
PA tracks 120 elements in the persona profile.
4.4 Online Eye-Tracking
The eye-tracking is implemented using WebGazer.js (https://webgazer.cs.brown.edu/) [62–64], a webcam-based gaze
tracker developed at Brown University. WebGazer is available in JavaScript as an open-source library
(https://github.com/brownhci/WebGazer). Multiple alternative frameworks were compared, but we chose WebGazer
for four reasons: (a) it provides relatively good accuracy based on our pilot testing, (b) it is actively developed,
judging by the update frequency in the GitHub repository, (c) the source code is publicly available and can be
integrated into systems such as APG, and (d) the software is provided free of charge. These properties make
WebGazer a feasible online eye-tracker for research-based systems such as APG.
4.5 Administrative Features
In the APG’s UI, system administrators can enable either mouse tracking, eye tracking, or both for all users or for a
subset of users. From a user’s point of view, the only difference is that, when eye-tracking is enabled, every
session starts with calibration (see Figure 6a and b). The PA system processes mouse- and eye-tracking data
identically, which means that both types of interaction are recorded in the database and can be exported in a
single file in order to improve usability for the researcher using the PA system. When the coordinates of hovering
or gazing correspond to a predefined element, this event, along with timestamps (in/out) and meta-data (User and
Session ID), is sent by the client browser via Ajax (Asynchronous JavaScript and XML) to the backend database.
The PA system maps the coordinates to the persona information element the user is interacting with.
Administrators can download the log files for data analysis (see Figure 6d). They can also create new user studies
from the persona system’s backend.
Figure 6: (a) Eye-tracking calibration dialogue and (b) how it shows to users; (c) example of a user’s
eye-tracking pattern before the data is converted to each specific information element in the persona profile
(denser color indicates more gaze fixations in a given area), and (d) data export dialogue shown for
researchers to export user logs. The logs are provided in CSV files which can be downloaded via the
Download button.
To prepare the data exports, the system uses Pandas (i.e., Python Data Analysis Library) after retrieving the
logs from the backend database. It computes the duration of each interaction based on “in” and “out” timestamps
and generates a comprehensive data report, in which each mouse and eye fixation event, its timestamp, its target
information element in the persona profile, and meta-information (Session ID and User ID), are saved into a file.
The logs can be downloaded for further analysis. Information about the variables logged by the PA is included in
Supplementary Material (see footnote 3). These variables were determined based on the metrics and questions detailed in the
previous sections. As a result, the data recorded by the PA system enables the calculation of various metrics of
persona user behavior. We now illustrate, through an example user study, how PA can serve persona researchers.
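The duration computation described above (deriving each interaction's length from its "in" and "out" timestamps) can be sketched as follows. This is an illustrative sketch only: the PA system itself uses Pandas, whereas this version uses the Python standard library to stay dependency-free, and the column names and log rows are hypothetical because the text does not list them.

```python
import csv
import io
from datetime import datetime

# Hypothetical log rows; the real PA column names are not given in the text.
RAW_LOG = """user_id,session_id,source,element,time_in,time_out
u1,s1,mouse,About,2021-05-01T10:00:00.000,2021-05-01T10:00:04.500
u1,s1,mouse,Topics of Interest,2021-05-01T10:00:05.000,2021-05-01T10:00:06.250
u1,s1,gaze,About,2021-05-01T10:00:07.000,2021-05-01T10:00:09.000
"""


def add_durations(rows):
    """Attach a duration in seconds computed from the in/out timestamps."""
    report = []
    for row in rows:
        time_in = datetime.fromisoformat(row["time_in"])
        time_out = datetime.fromisoformat(row["time_out"])
        duration = (time_out - time_in).total_seconds()
        report.append({**row, "duration_s": duration})
    return report


def dwell_by_element(report):
    """Sum the durations per information element."""
    totals = {}
    for row in report:
        totals[row["element"]] = totals.get(row["element"], 0.0) + row["duration_s"]
    return totals


report = add_durations(csv.DictReader(io.StringIO(RAW_LOG)))
totals = dwell_by_element(report)
print(totals)
```

Mouse and gaze rows share one schema here, so the same aggregation serves both data sources.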
3 https://www.dropbox.com/s/yeu1jrohs6lbpg8/central%20variables.docx?dl=0
5 Validation Study
A remote user study was conducted in which 144 participants used an interactive persona system with PA
enabled, to browse a set of 10 personas, created from a dataset of a tourism-promoting organization with 1.8M
(1,795,115) user likes over 5,312 Instagram posts. The participants’ task was to choose a persona to target for
tourism marketing (i.e., promoting a specific destination, in this case, a country). Basic demographics of
participants are provided in Table 1.
Table 1: Participant demographics.
Age (M)   Age (SD)   Male         Female       Non-binary   N
35.35     9.08       62 (52.5%)   55 (46.6%)   1 (0.8%)     118 (81.9%*)
*For 18.1% of the participants, we did not have demographic information.
The participants were recruited using an online data collection service called Prolific [61]; the same service has
been used in various other persona user studies [82,83,88]. We used the platform’s industry categories as a
sampling criterion, including “Art/Design”, “Graphic Design”, and “Market Research”, in order to reach people that
work in industries where personas are relevant. Students were excluded. All participants were provided with a
definition of personas, and a task description prior to their use of the system. The study flow is illustrated in
Figure 7. The study dealt with testing the effect of simple and complex explanations on user behavior and
perceptions, served via a product walk-through (i.e., a process that sequentially shows different parts of the
system to a user who is logging in for the first time). The participants were directed from the online service to the
APG system, which randomly allocated each participant to one of the three experimental conditions (simple
explanations about the system, complex explanations, and no explanations at all).
Figure 7: An example of how the APG and PA systems can be synchronized to conduct a full remote user
study. In this case, we used an online platform to recruit participants to the study [A]. After using the
system to browse the personas [B] (no time limit was imposed), the participants completed a survey [C]
and were redirected back to the data collection platform [D] that logged successful study completions. PA
recorded both the survey and behavioral data [E].
After using the persona system, which consisted of viewing as many personas as they wanted out of the ones
created by the algorithm (see Table 2) for as long as they wanted, the participants could click on a banner to indicate
they were ready to complete their task. They were then transferred to a survey platform that collected
data about their task completion, perceptions, and demographic variables. Upon completing the survey, each
participant was automatically redirected back to the online data collection platform, where their participation was
marked complete. Both APG and the survey platform recorded the UTM parameters (see footnote 4) that identify the
participant anonymously, so that participants who passed the data validation stage (i.e., researchers validating that their
responses were genuine) could be easily compensated and their system usage data linked with their survey
responses for further analysis. During system usage, PA was enabled, and data was collected on the users’
interactions with the persona system.
Table 2: Personas the algorithm created for the user study.
Persona Age Gender Country
Mamdouh 28 Male Egypt
Rahul 34 Male India
Ashley 25 Female United States
Muhammad 34 Male Pakistan
Alaa 18 Female Egypt
John 26 Male United States
Abdalaziz 23 Male Egypt
Rizky 18 Male Indonesia
Putri 20 Female Indonesia
Chris 40 Male United States
6 Explorations of Persona User Behavior
6.1 Overview
In this section, we analyze the collected data to demonstrate how PA can serve empirical persona user research.
We do not explicitly test any hypotheses, although the data obtained from PA could be used for that, but for the
sake of demonstration, we inductively analyze the data and provide exploratory findings about persona user
behavior. We then synthesize these findings in the form of propositions that future research could test, with
complementary theorization, as hypotheses. In other words, we illustrate how PA can be of service towards
persona science.
The results of the effect of the three conditions will be reported in a future publication; here, we focus on
demonstrating how PA can be used to analyze the data obtained from user experiments. For parsimony, the
following analyses focus on the mouse-tracking data. Because the eye-tracking data is logged in precisely the same
data structure as the mouse-tracking data, the exact same analyses and metrics can be obtained from eye-tracking.
Based on our piloting of the eye-tracking module, its accuracy varies strongly (from ~16% to ~80% in
our testing) by user, condition, and equipment. This is also why it is more reliable to carry out this
demonstration with the mouse-tracking data.
6.2 Descriptive Statistics
Descriptive statistics about participants’ engagement with the system (see Table 3) indicate that, on average,
participants spent around 8.5 minutes browsing the personas for their task and visited persona profiles on
average 14 times. What is striking is the high dispersion among the participants – the standard deviations are
high for both the dwell time (SD = 8.3 minutes) and visit counts (SD = 11). The participant with the shortest dwell
time used the system for only 20 seconds, while the participant with the longest dwell time used the system for
more than an hour (62.6 minutes). The shortest visit path only included visiting one persona, whereas the longest
path consisted of visiting the persona profiles 60 times, which equals 6 visits per profile on average. These results
indicate a major dispersion in engagement, with some participants being “persona power users” while others lack
significant engagement with the system. Future analyses could investigate how these two extreme user types
differ (e.g., demographic or industry variables that might explain the differences) and why (e.g., low task
motivation, not perceiving personas as relevant or useful).
4 Urchin Tracking Module parameters are a standard technique for tracking source and meta-data of Web traffic.
Table 3: Dwell time (in seconds) and number of visits to the persona profiles by the participants.
Measure                                                     Mean    SD      Min    Max      Median
Dwell time (i.e., system usage time)                        514.3   496.8   19.5   3756.8   410.8
Visits (i.e., number of times loading a persona profile)    14.0    11.0    1      60       12
As a whole, exactly half (50.0%) of the users viewed at least 9 of the personas (see Figure 8a), i.e., achieving a
persona coverage of 90%, while less than a third (31.9%) viewed three or fewer personas. The fact that close to
half (47.2%) viewed all 10 personas implies that users have a need for viewing a variety of personas—10 was the
highest number in this study, but it seems likely that given the choice, users would have viewed more than 10
personas. Concerning information viewing patterns (Figure 8b), the persona profiles contain 8 parent information
elements (About, Audience Size, Headline, Sentiment, Timeline, Topics of Interest, Viewed Contents, and Viewed
Conversations). Only a minority of persona visits contained viewing all 8 information elements (0.5%). While
some visits included only viewing one information element (10.5%) – perhaps an indication of rapid verification
of a recalled detail – more than half of the persona visits (52.3%) contained the viewing of at least four parent
information elements (i.e., at least half of the main information in persona profiles). Section 6.5 investigates
further what information was most viewed.
Figure 8: (a) Persona coverage (i.e., the number of personas a participant viewed during their whole
sessions), (b) Information coverage (i.e., how many parent information elements were viewed by the
users during each viewing of a persona).
6.3 Correlations and Gender Effects
There was no notable correlation between participant age and their system usage time (r = 0.08) or between age and
number of visits (r = -0.16). In terms of dwell times, results from a t-test indicate that females used the
system longer (M = 634.8 seconds) than males (M = 493.4 seconds), although the difference did not reach
significance at the 0.05 level, t(114) = -1.55, p = 0.06. In terms of visit
count, there was no significant difference, with females (M = 14.8) and males (M = 16.3) visiting a roughly even
number of personas, t(114) = 0.79, p = 0.21. (Both these tests were based on the 118 participants for whom we
had gender information; there was one participant who indicated non-binary gender and who was therefore
excluded from the analysis.) For males, with 95% confidence, the population mean for persona visit duration is
between 33.4 and 59.4 seconds, based on 61 samples. For females, with 95% confidence, the population mean for
persona visit duration is between 46.5 and 80.3 seconds, based on 55 samples. Finally, there is also no significant effect
based on persona gender, with both male (M = 37.8 seconds) and female (M = 35.3 seconds) personas being
viewed for a roughly equal amount of time, t(1996) = 0.50, p = 0.31.
6.4 Persona Viewing Patterns
One of the researchers conducted an exploratory data analysis (EDA) on 21 participants’ patterns of viewing the
personas – by pattern, we mean how long a participant viewed a persona in their sequence of browsing the
personas. This EDA revealed several different patterns of viewing the personas (see Figure 9), including (a) shark
fin, (b) u-shape, (c) stabilizing, (d) sporadic, (e) linear declining, and (f) triangle shapes.
Figure 9: Different persona viewing duration patterns based on an exploratory analysis: (a) shark fin,
(b) U-shape, (c) stabilizing, (d) sporadic (increasing), (e) linear declining, and (f) triangle.
The variety of patterns indicates that it might be difficult to find general “laws” that would govern how
individual users explore a set of personas. However, among the manually reviewed samples, we observed that the
dwell times tend to decrease over the number of visits – 16 out of 21 had such a trend (76.2%) (e.g., a and c in
Figure 9), while only three participants (14.3%) had an increasing dwell time trend (e.g., d). The two remaining
had trends that could not be categorized as either decreasing or increasing (b and f). Thus, it appears that the time
spent reviewing persona profiles decreases over the number of visits. In cases where the dwell time appears to
“resurge” (e.g., b), the participant may be returning to a persona they found interesting earlier in order to verify,
learn more, or compare information. Some patterns remain highly sporadic until the end of the session (e.g., d),
while others seem to stabilize early (e.g., c).
6.5 Persona Information Viewing Behavior
We used the location of users’ mouse dwell as a proxy for attention and interest. Results in Figure 10a
show that the users were most interested in social media quotes in the persona profile (68.4% of the total dwell
time), followed by the personas’ basic information (“About”, 14.1%) and audience size (8.1%), which indicates how
many people there are on Facebook and Twitter similar to the persona. Plotting the data shows that users’
attention is unevenly distributed, with the quotes garnering nearly five times more dwell time (68.4 / 14.1 ≈ 4.9)
than the second most popular information, i.e., the persona’s basic information. Unlike for persona visits, dwell
time and visit count are strongly correlated (r = 0.63) for persona information. It is known from previous persona
studies that quotes are very impactful for users’ perceptions of personas [86], but it is interesting that the
mouse-tracking shows the quotes overshadow other information this strongly.
Figure 10: Information viewing behavior of the participants. (a) Dwell time (bars) and visit counts (line).
(b) Most common transitions between the parent information elements across all personas (Mamdouh is
used for illustration). Users start browsing the persona’s basic information, including picture, text
description and sociographics (State 0). They then move to comments (S1), most viewed content (S2),
which is viewed repeatedly (S3), before moving to topics of interest (S4), audience size (S5), persona’s
name and demographics (S6), and back to audience size (S7).
On the other hand, if we focus on the number of visits instead of dwell time, personas’ basic information
(“About,” as indicated by the orange line in Figure 10a) becomes the most important information element. This
element contains the text description and picture of the persona, which, again, previous research has found to be
influential for persona profiles [58]. The information viewing sequence (see Figure 10b) seems to move
diagonally from top left to bottom right, then up, then bottom left, and back up and finally down (↘↑↙↑↓). This
sequence was obtained by calculating the most common parent elements in states S0...S7 across all
participant-persona pairs.
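The "most common element per state" computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the paths are imaginary, and only the element names are taken from the persona profile described in this paper.

```python
from collections import Counter
from itertools import zip_longest


def most_common_per_state(paths):
    """For each position (state) in the paths, return the most frequent item.

    Shorter paths simply do not contribute to later states. Ties are
    resolved in favor of the item encountered first.
    """
    result = []
    for items in zip_longest(*paths):
        counts = Counter(item for item in items if item is not None)
        result.append(counts.most_common(1)[0][0])
    return result


# Imaginary element paths, one per participant-persona pair.
paths = [
    ["About", "Viewed Conversations", "Viewed Contents", "Topics of Interest"],
    ["About", "Viewed Conversations", "Audience Size"],
    ["Headline", "Viewed Conversations", "Viewed Contents", "Topics of Interest"],
]
sequence = most_common_per_state(paths)
print(sequence)
```

Note that the resulting sequence is an aggregate: no individual user necessarily followed exactly this path.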
Two takeaways can be elicited from these findings: (a) personas’ quotes, text description, and picture are
among the most impactful information based on users’ mouse engagement, and (b) measuring dwell time and
visit counts can give different results, which is why measuring these two separately makes sense. An information
element with a high visit count but low dwell time is frequented in short bursts, whereas an information element
with a high dwell time is focused on for a longer time. It is logical that quotes interest people because they are
seen to reflect the persona’s attitudes and are information-rich for various user tasks.
6.6 Effect of Order of Personas
Order of presentation has been shown to affect how information is accessed, used, and recalled [19]. Among
the notable effects in this line of work are, e.g., the primacy effect, which implies that the first seen information
is the most impactful [68], and the serial effect, which implies that the first and last items in a list are given
special attention [54]. In the context of persona systems, these effects can matter because the personas are shown
in a list, and such effects can therefore cause, e.g., the needs of the first and last personas in the list to be
considered more strongly than those of other personas.
When plotting the data to investigate, three observations can be made from Figure 11: (1) there is a first
persona effect, i.e., the first shown persona (Mamdouh) gets substantially more attention than the others; (2)
there is no strong pattern of primacy effect in terms of dwell time declining with the persona’s order of being
displayed in the system; however, (3) the fact that the last persona (Chris) is the second most viewed implies a
serial effect in which the first and last items of a list garner the most attention. Spearman rank correlation
between persona order and dwell time is negligible (r = 0.176). Correlation between system order and number of
visits is moderate (r = 0.576). Correlation between dwell time and visits is also moderate (r = 0.455). However,
when we compute the most common personas visited (i.e., S1 = the first persona the user visits, S2 = the second
persona they visit, ... S10 = the tenth persona they visit), we find that the TOP-10 path is precisely identical to the
order of presenting the personas in the system. A further check reveals that 31 users (21.5%) follow this sequence
when using the system. That is, about one fifth of the users browse all the personas in the order in which they
were presented.
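The check for how many users browse in the system's order can be sketched as below. The persona order is taken from Table 2; everything else is an assumption made for illustration. In particular, the text does not define precisely when a path counts as following the sequence, so this sketch assumes that the first ten visits must equal the system order. The function names and paths are hypothetical.

```python
# Order in which the system lists the personas (as in Table 2).
SYSTEM_ORDER = ["Mamdouh", "Rahul", "Ashley", "Muhammad", "Alaa",
                "John", "Abdalaziz", "Rizky", "Putri", "Chris"]


def follows_system_order(path, system_order=SYSTEM_ORDER):
    """True if the first len(system_order) visits equal the system order.

    This matching rule is an assumption; stricter or looser rules are possible.
    """
    return path[:len(system_order)] == system_order


def share_following(paths, system_order=SYSTEM_ORDER):
    """Fraction of paths that follow the system order."""
    if not paths:
        return 0.0
    matches = sum(follows_system_order(p, system_order) for p in paths)
    return matches / len(paths)


# Imaginary visit paths of four users.
paths = [
    SYSTEM_ORDER + ["Mamdouh"],        # follows the list, then revisits
    ["Mamdouh", "Chris", "Ashley"],    # jumps around
    ["Mamdouh", "Rahul", "Alaa"],      # starts in order, then skips
    ["Mamdouh"],                       # leaves after the default persona
]
share = share_following(paths)
print(share)
```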
Figure 11: (a) Dwell time distribution among personas. (b) First persona effect. Mamdouh (the first
persona shown in the system) received almost one third of all dwell time from the users. If dwell time were
distributed evenly, he would only receive 10%, which means an excess of 229% over this equal baseline,
some of which likely stems from him being the default persona in the system.
Overall, these results indicate that (a) the system’s default persona garners the most attention, and (b) a
sizeable portion of the users visit the personas in the exact order that the system shows them. Perhaps the system
should rotate the default position evenly among the personas to mitigate the “discrimination” arising from this
effect. The current logic is that the system loads by default the persona that has the highest audience
representation, i.e., the most engagements in the baseline user data the personas are created from. Testing
various rationales for the default persona and the effect of rotation are excellent ideas for future work. We also
observed a potential cultural effect, which is a possible manifestation of the users better identifying with the
personas from their own cultural sphere (see Table 4). Cultural aspects in personas remain an important area of
future work, as does the special role of default personas and exit personas (i.e., those with whom users finish
their browsing session; see Figure 12).
Table 4: Ethnic bias? The only three Western personas rank the highest in terms of average view time
(apart from Mamdouh, who is the default persona). 89.8% of the participants were from Western
countries (Europe and United States). Users may feel more comfortable identifying with personas from
their own culture and ethnicity. This also implies persona studies should employ culturally and ethnically
diverse samples to obtain internationally valid results.
Persona                 Avg time per visit
Mamdouh (rank = 1)      69.10
Chris (rank = 10)       37.78
John (rank = 6)         33.78
Ashley (rank = 3)       33.49
Alaa (rank = 5)         32.98
Rizky (rank = 8)        28.92
Putri (rank = 9)        28.13
Abdalaziz (rank = 7)    26.53
Muhammad (rank = 4)     24.85
Rahul (rank = 2)        20.79
Figure 12: Exit personas. The number of times a user stopped their session after viewing a given persona.
Because the task dealt with considering a specific persona, it is possible that the exit indicates a higher
likelihood of the persona being chosen for the task.
6.7 Modeling Persona User Behavior
Because the PA system records the user’s transition from one persona information element to another (based on
mouse and gaze movements), as well as capturing transitions from one persona to another, there are important
opportunities for modeling user behavior, some of which we illustrate here. Figure 13 describes these
opportunities through the concept of persona-gram, which refers to a string of letters depicting a user’s path of
visiting either the personas or the information elements within a persona profile. This information can be stored
as a state transition matrix (see bottom of Figure 13) which can be further used for computing the probability of a
user transition from one state to another.
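A state transition matrix of the kind described above can be built by counting consecutive pairs in each path and normalizing each row. The sketch below is illustrative: the function names are hypothetical and the persona-grams are imaginary (one letter per persona).

```python
from collections import defaultdict


def transition_counts(paths):
    """Count how often each state is directly followed by each other state."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(counts):
    """Normalize each row of counts so that it sums to 1."""
    probabilities = {}
    for state, row in counts.items():
        total = sum(row.values())
        probabilities[state] = {nxt: n / total for nxt, n in row.items()}
    return probabilities


# Imaginary persona-grams, one letter per persona.
paths = ["MAPA", "MAPM", "MRA"]
probs = transition_probabilities(transition_counts(paths))
print(probs["M"])  # where users go next after visiting persona M
```

The same functions work unchanged for paths over information elements, since they only require a sequence of hashable states.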
In Figure 13, the names on the left illustrate a user’s path of visiting the personas. On the right-hand side (User
1 = U1), the same path is transformed into a string. The string format enables the comparison of different users
using Levenshtein’s edit distance (ED) [73]. For example, User 2 (U2) differs from User 1 (U1) in only two string
states (bolded in Figure 13), yielding ED1,2 = 2. Users that have a low edit distance are similar to each other in
terms of their persona use behavior, whereas users with a high edit distance are behaviorally more different.
When computing the distances of all users, it becomes possible to identify “average” behaviors and distinct outlier
behaviors. (Moreover, it is possible to consider the duration of each visit to get a higher dimensional
representation of the user’s viewing behavior.)
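For readers who want to experiment with persona-grams, the Levenshtein distance can be computed with a standard dynamic-programming routine. The sketch below is a generic textbook implementation, not code from the PA system, and the two persona-gram strings are imaginary.

```python
def levenshtein(a, b):
    """Edit distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Imaginary persona-grams that differ in two positions.
user_1 = "MAPARBC"
user_2 = "MNPARBN"
print(levenshtein(user_1, user_2))
```

Because insertions and deletions are allowed, the result can be lower than a position-by-position count of mismatches. When paths differ in length, dividing the distance by the length of the longer path is one common way to make values comparable across user pairs.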
Figure 13: Illustration of persona-grams using imaginary data. The series on the left describes a user’s
transition from one persona to another. User 1 on the right is the same sequence transformed into a
string. User 2 has the same sequence except for two differences (A-N, bolded), which means the edit
distance is 2, i.e., one needs to make two edits to make the strings identical. The fewer changes one needs
to make, the more similar the sequence of viewing personas is between two users.
We computed the edit distances across the obtained dataset and found the values highly dispersed. In
other words, two users would rarely view the personas (or the information elements within the personas) in the
same or similar order. This finding is interesting in itself – it implies users’ processing of persona information is
more idiosyncratic than anticipated. To give a simple example, Table 5 shows three randomly chosen participants
that each have a path length of 10, i.e., they visited persona profiles 10 times during their session.
Table 5: Persona-grams of three randomly chosen users who each visited persona profiles 10 times. The
color codes indicate the same persona being visited in the same sequence: yellow is shared by all three
users, green is shared by Users 1 and 3, and turquoise by User 2 and 3. User 3 has a more similar browsing
behavior with User 1 than with User 2, and User 1 and User 2 share the least similarity.
Visit   User 1      User 2      User 3
1       Mamdouh     Mamdouh     Mamdouh
2       Alaa        Rizky       Alaa
3       Putri       Chris       Putri
4       Alaa        Rahul       Alaa
5       Rizky       Alaa        Ashley
6       Abdalaziz   Muhammad    Chris
7       Alaa        Abdalaziz   Abdalaziz
8       Putri       Ashley      John
9       Rizky       John        Mamdouh
10      Alaa        Putri       Ashley
As can be seen from Table 5, the behaviors are almost completely unique. For example, the only shared visit
among the three is to the first persona, which is the default shown by the system. Hence, edit distance is a
troublesome metric, because in this case, we would need 9 edits to make User 1 and User 2 identical; ratio-wise,
this is 9/10, i.e., a 90% change rate (9 out of 10 visits are different). Due to the high uniqueness demonstrated by
this example (which is also accentuated by the fact that the strings are of different lengths across the dataset), the
similarity of behaviors could perhaps be measured using other options. For example, User 2 and User 3 viewed 7
of the same personas (Mamdouh, Chris, Alaa, Abdalaziz, Ashley, John, and Putri), while 3 personas (Rizky, Rahul,
Muhammad) were viewed by User 2 only. So, even though their exact viewing sequences are very different, the users
actually viewed more shared than non-shared personas, i.e., there is likeness in their browsing behavior. To
quantify this likeness, we can apply set theory to form an intersection (i.e., an overlap of paths). The intuition is
that if two users visited more of the same personas than two other users did, their persona browsing behavior was
more similar. We can quantify this by calculating the Jaccard coefficient (J), which indicates the overlap between
two sets as the size of their intersection divided by the size of their union. This metric is
commonly used in information theory to compare sets [44].
Applying J to our examples from Table 5, we can observe that User 2 and User 3 are more similar to each other
(J=0.7) than User 1 and User 2 (J=0.5) or User 1 and User 3 (J=0.5). (For replication, the sets are: User 1 – M, A, P,
A, R, B, A, P, R, A; User 2 – M, R, C, H, A, U, B, S, J, P; and User 3 – M, A, P, A, S, C, B, J, M, S.) Unlike ED, which is only
applicable to pairwise comparison, sets can be expanded from pairwise comparisons to multiple sets (see Figure
14).
Figure 14: Among the 10 available personas, 4 (40%) were viewed by all three users. Users 2 and 3
viewed 3 personas that User 1 did not view, whereas Users 1 and 2 viewed 1 persona that User 3 did not
view. User 2 viewed 2 personas that neither of the other users viewed, which indicates this user had the
most diverse viewing pattern.
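The J values reported above can be reproduced with a few lines of Python. The persona-gram strings are the replication sets given in the text (one letter per persona); the function name `jaccard` and the convention for two empty sets are choices made for this sketch.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


# Persona-grams of the three users in Table 5 (letters as given in the text).
USER_1 = "MAPARBAPRA"
USER_2 = "MRCHAUBSJP"
USER_3 = "MAPASCBJMS"

print(jaccard(USER_1, USER_2))  # 5 shared personas out of 10 in the union
print(jaccard(USER_1, USER_3))  # 4 shared personas out of 8 in the union
print(jaccard(USER_2, USER_3))  # 7 shared personas out of 10 in the union
```

Converting a path to a set discards repeated visits and visit order, which is exactly why J stays meaningful for paths of different lengths.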
Obtaining the number of intersecting elements (i.e., shared personas that any number of participants viewed)
is straightforward using basic functions in scripting languages like R and Python, which increases the practical
appeal of modeling persona behavior using sets. One can also use sets to compare the behaviors of different
groups. We illustrate some of these cases in Figure 15. For example, (a) union can be used for identifying all
personas that a group of users viewed, which can be beneficial when the number of personas exceeds a handful, as
would be the case for large and heterogeneous online audiences. (b) Intersection shows common elements of
two or more users or groups. Intersection can reveal common personas of interest, i.e., that most participants
engaged with. (c) Difference can show personas that one group viewed exclusively, e.g., those that were unique to
more experienced persona users. Finally, (d) subset and superset can help make comparisons on the variety of
personas visited. For example, in our previous case, User 2 is a superset of User 1 (i.e., User 2 visited all the
personas that User 1 did and more).
Figure 15: Examples of basic set operations and how they can be used for investigating persona user
behavior: (a) union, (b) intersection, (c) difference, and (d) sub- and supersets.
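The four set operations map directly onto Python's built-in `set` type. The sketch below applies them to the personas viewed by the three example users of Table 5; the variable names are illustrative only.

```python
# Personas viewed by the three example users (derived from Table 5).
USER_1 = {"Mamdouh", "Alaa", "Putri", "Rizky", "Abdalaziz"}
USER_2 = {"Mamdouh", "Rizky", "Chris", "Rahul", "Alaa", "Muhammad",
          "Abdalaziz", "Ashley", "John", "Putri"}
USER_3 = {"Mamdouh", "Alaa", "Putri", "Ashley", "Chris", "Abdalaziz", "John"}

viewed_by_any = USER_1 | USER_2 | USER_3    # (a) union
viewed_by_all = USER_1 & USER_2 & USER_3    # (b) intersection
only_user_2 = USER_2 - USER_1 - USER_3      # (c) difference
user_2_covers_user_1 = USER_2 >= USER_1     # (d) superset test

print(len(viewed_by_any))
print(sorted(viewed_by_all))
print(sorted(only_user_2))
print(user_2_covers_user_1)
```

The same operators work for groups of users once each group's personas are merged into one set, e.g., by taking the union over the group's members.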
While approaching the analysis of persona user behavior using set theory seems fruitful, we can include even
more information in the comparison. Namely, the set representation ignores that the visits are typically unevenly
distributed among the personas, both by count and duration. Some personas are viewed more than once; some
are viewed considerably longer than others. A set would not consider this information at all.
For a representation that considers such information, we can turn to empirical distributions or probability
distributions. These indicate how the time or number of visits is allocated between different personas during a
user session. For example, if a user visits Alaa 5 times, Putri 5 times, and John 2 times, the empirical distribution is
5 / 12; 5 / 12; 2 / 12; or [0.42, 0.42, 0.17]. For a finite set of personas and users, we can compute a complete
probability distribution for each user, where complete implies that each persona-user pair will have a value. Then,
from information theory, we can use several metrics to compare the obtained probability distributions. These are
known as statistical distance metrics (e.g., Kullback–Leibler (KL) divergence or Jensen-Shannon distance). The
smaller the distance between two users, the closer their behavior is in terms of how they divide their time among
the available personas. Applying this logic to our dataset, we obtain the probability distributions indicated in
Table 6.
Table 6: Distribution of example users’ visits among the shown personas. Non-zero values are highlighted.
User     Mamdouh  Rahul  Ashley  Muhammad  Alaa  John  Abdalaziz  Rizky  Putri  Chris
User 1   10%      0%     0%      0%        40%   0%    10%        20%    20%    0%
User 2   10%      10%    10%     10%       10%   10%   10%        10%    10%    10%
User 3   20%      0%     20%     0%        20%   10%   10%        0%     10%    10%
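A complete visit distribution of this kind can be derived from a visit path by counting visits per persona and dividing by the path length. The sketch below uses User 1's path from Table 5; the function name is hypothetical, and the persona order follows Table 2.

```python
from collections import Counter

PERSONAS = ["Mamdouh", "Rahul", "Ashley", "Muhammad", "Alaa",
            "John", "Abdalaziz", "Rizky", "Putri", "Chris"]


def visit_distribution(path, personas=PERSONAS):
    """Share of a user's visits that went to each persona (zeros included)."""
    if not path:
        raise ValueError("path must contain at least one visit")
    counts = Counter(path)
    return [counts[persona] / len(path) for persona in personas]


# User 1's visit path from Table 5.
USER_1_PATH = ["Mamdouh", "Alaa", "Putri", "Alaa", "Rizky",
               "Abdalaziz", "Alaa", "Putri", "Rizky", "Alaa"]

user_1_distribution = visit_distribution(USER_1_PATH)
print(user_1_distribution)
```

Replacing the visit counts with summed dwell times would give the corresponding duration-based distribution.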
The distance D between two distributions can be computed using the following equation,
\[
D_{p,q} = \frac{1}{N} \sum_{i=1}^{N} \left| p(x_i) - q(x_i) \right|,
\]
where p and q are the distributions to compare (e.g., User 1 and User 2), N is the number of personas, and x_i is
the i-th persona. The formula calculates the absolute difference between the two users’ visit shares for each
persona and then takes the average as the distance. Unlike KL divergence, which is non-symmetrical (i.e., the
distance from User 1 to User 2 might not be the same as the distance from User 2 to User 1), this
formula gives symmetrical results (i.e., D_{p,q} = D_{q,p}). (As a sidenote, symmetry is, of course, desirable for our
purpose, because there is no reason to assume that the results should differ when comparing User 1 to User 2 or
vice versa; in both cases, the sequences are the same.) When inputting the fractions from Table 6 to this formula,
we can see the results aligning with the J comparison, so that Users 2 and 3 are the most similar (D = 0.06),
whereas User 1 is equally distant to User 2 (D = 0.10) and User 3 (D = 0.10). This example illustrates how concepts
from information science can be leveraged for persona science, namely, by understanding persona viewing
behaviors as probability distributions and then computing distance. Essentially, the smaller these distance values
are, the closer two persona viewing behaviors would be (as the behaviors are represented as probability
distributions).
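The distance values reported above can be checked with a short script that averages the absolute differences between two distributions. The input fractions are those of Table 6; the function name is a hypothetical label for this sketch.

```python
def mean_abs_distance(p, q):
    """Average absolute difference between two equally long distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / len(p)


# Visit distributions from Table 6 (persona order as in the table header).
USER_1 = [0.1, 0.0, 0.0, 0.0, 0.4, 0.0, 0.1, 0.2, 0.2, 0.0]
USER_2 = [0.1] * 10
USER_3 = [0.2, 0.0, 0.2, 0.0, 0.2, 0.1, 0.1, 0.0, 0.1, 0.1]

d_12 = mean_abs_distance(USER_1, USER_2)
d_13 = mean_abs_distance(USER_1, USER_3)
d_23 = mean_abs_distance(USER_2, USER_3)
print(round(d_12, 2), round(d_13, 2), round(d_23, 2))
```

One practical reason to prefer this measure here is that the distributions in Table 6 contain zeros: KL divergence is infinite whenever the second distribution assigns zero probability to a persona that the first one visited, so it would require smoothing before use.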
6.8 Interpreting the Metrics
It is well known that general rules about whether a metric value is “good” or “bad” are difficult to draw – for
example, for an entertainment website or social media service, it is desirable that users spend a lot of time on the
site, because time is positively correlated with revenue models [100]. However, the opposite applies for
government information websites or search engines – the user is expected to find the information as quickly as
possible and then leave the site. So, in some cases, a short engagement time is optimal; in other cases, it is not.
For personas, the same applies, with even more nuance: in many professional tasks, users are time-pressed; they
have deadlines, they want the information immediately, and so on. For these scenarios, low
engagement time with a persona system would be considered a good sign (given that the user was able to
complete their task successfully or that the personas helped the user). However, in scenarios where the user is
conducting end-user research (e.g., market research), they might be interested in delving deep into the persona
details: a low engagement time would therefore not indicate that the personas were useful for the user.
We provide further examples, as the matter of defining and computing various metrics is not trivial.
Example 1: Consider that a user visits 5 personas during the session and makes 20 visits overall. If we
apply the average, we get 20 / 5 = 4 visits per persona. However, the average hides a comparative aspect: if the
user visits these 5 personas during their first 5 visits rather than during their first 10 visits, the behavioral
pattern is different. In the former case, the user is first visiting many personas and then spending the rest of the
time comparing them. In the latter case, the user is engaging in comparative behavior already while visiting the
first personas. These two strategies would be different, but a simple average would miss the nuance. To quantify
such patterns, we can compute x = how many personas the user visited and y = how many visits it took the user to
visit each of the x personas at least once, and take the ratio x / y. This metric can be called “persona scanning
tendency” or PST. Using the above example, the scores would be different:
PST5 = 5 / 5 = 1 versus PST10 = 5 / 10 = 0.5.
A lower PST score would indicate a smaller tendency to scan all or many personas first and then delve into
their information, and vice versa. A high PST score could also be associated with a linear user behavior, in which
the user sequentially visits the available personas. In contrast, a low PST score could be associated with non-
linear user behavior. For example, if it took the user ten visits to see all the five personas, they were likely doing
comparisons along the way. Therefore, this PST metric can reveal insights about different users’ tendency to
compare personas against one another and proceed in an organized manner when browsing the available
personas.
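As an illustration, the PST computation from Example 1 can be sketched in a few lines of Python. This is a minimal sketch, assuming that a session is available as an ordered list of persona identifiers; the function name, the identifiers, and the two example sessions are hypothetical and not part of the PA system.

```python
from typing import Hashable, Optional, Sequence


def persona_scanning_tendency(visits: Sequence[Hashable]) -> Optional[float]:
    """Return PST = x / y for one session's ordered persona visits.

    x = number of distinct personas the user visited
    y = number of visits it took to see each of those x personas at least once
    """
    if not visits:
        return None  # PST is undefined for a session without visits
    seen = set()
    visits_needed = 0
    for position, persona_id in enumerate(visits, start=1):
        if persona_id not in seen:
            seen.add(persona_id)
            visits_needed = position  # position at which the last new persona appeared
    return len(seen) / visits_needed


# Hypothetical sessions, each with 20 visits to 5 personas (as in Example 1).
scan_first = ["P1", "P2", "P3", "P4", "P5"] + ["P1", "P2", "P3"] * 5
compare_early = ["P1", "P2", "P1", "P3", "P2", "P4", "P3", "P1", "P4", "P5"] + ["P1", "P5"] * 5

pst_scan_first = persona_scanning_tendency(scan_first)        # 5 / 5
pst_compare_early = persona_scanning_tendency(compare_early)  # 5 / 10
```

Both sessions have the same average of 4 visits per persona, yet their PST scores differ, which is the nuance that the simple average misses.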
Example 2: How to measure how diversely a user browses the personas? Highly diverse behavior would be
one that looks at many personas. For example, if there were a total of 10 personas available, 5 out of 10 personas
visited is less diverse than 10 out of 10 personas visited. That is, persona coverage would seem like a good metric
(being 5 / 10 = 0.5 and 10 / 10 = 1 for the above cases). But, if the user only made 5 visits and visited 5 personas
(out of the 10), then that is more diverse than making 12 visits and visiting 6 personas. In other words, when
assessing the diversity of visit behavior, understood in this manner, the denominator should be the number of
visits the particular user made, not the number of available personas. So, even if persona coverage would be
higher for the user visiting 6 personas (6 / 10 = 0.6), their visit diversity would still be lower than that of the user
visiting one persona less (i.e., 6 / 12 = 0.5 vs. 5 / 5 = 1).
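The difference between the two denominators in Example 2 can be made concrete with a short Python sketch. It assumes, for illustration only, that each user's session is an ordered list of persona identifiers; the function names and the example sessions are hypothetical.

```python
from typing import Hashable, Sequence


def persona_coverage(visits: Sequence[Hashable], n_available: int) -> float:
    """Share of the available personas that the user visited at least once."""
    return len(set(visits)) / n_available


def visit_diversity(visits: Sequence[Hashable]) -> float:
    """Distinct personas visited, relative to the user's own number of visits."""
    return len(set(visits)) / len(visits)


N_AVAILABLE = 10  # total personas in the system, as in Example 2

user_a = ["P1", "P2", "P3", "P4", "P5"]            # 5 visits, 5 personas
user_b = ["P1", "P2", "P3", "P4", "P5", "P6"] * 2  # 12 visits, 6 personas

coverage_a = persona_coverage(user_a, N_AVAILABLE)  # 5 / 10
coverage_b = persona_coverage(user_b, N_AVAILABLE)  # 6 / 10
diversity_a = visit_diversity(user_a)               # 5 / 5
diversity_b = visit_diversity(user_b)               # 6 / 12
```

The second user has the higher coverage but the lower visit diversity, which mirrors the comparison made in the example.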
There are also cases where different granularity would be needed, for example, considering all information
elements (of which there are 120) versus only parent information elements (of which there are 8), or considering
forward-only movement versus both forward and backward movements (i.e., across information elements and/or
personas). Depending on these choices, the dimensionality of the analyzed dataset can greatly increase or
decrease. Overall, these examples highlight the non-trivial nature of measuring and understanding persona user
behavior. Interpreting PA metrics follows the general patterns in UX research: a high value can indicate either a
desirable or undesirable effect depending on the task type and user goals: for example, longer dwelling time may
be a sign of more interest, but it could just as well be a sign of confusion and disorientation. Aligning the metrics
with expressed user perceptions (e.g., “I was confused when using the personas”) can be useful in this regard.
Overall, case-dependent interpretation is required.
7 Discussion
7.1 Highlights and Novelty
Even though there is some empirical evidence showing the effectiveness of personas for specific tasks [78], as a
whole, the body of knowledge remains underdeveloped, even after 20 years of personas being part of HCI
research and practice. The purpose of PA is to help generate new knowledge on persona user behavior towards
the advancement of persona science. Examples include behaviors related to order of presenting the personas,
revisit frequency, users’ styles of browsing and comparing personas, and persona choice – i.e., how and why
people choose a specific persona for their decision-making task. Analyzing users’ typical viewing patterns and
dwell time per persona information can inform persona information design (i.e., what information users most
interact with), and help deduce understanding of persona usage based on a real system and real or realistic use
cases and scenarios. Combined with algorithmically generated persona systems, this opens the possibility of
updating persona profiles in real-time based on expressed user needs.
It is well known that the body of knowledge on personas relies on a high number of case studies as opposed to
repeated experiments with independent samples (i.e., different organizations, locations, users, etc.). Even when a
case study focuses on a specific context, there is a need for multiple case studies to establish something more
generalizable than the one case can do [99]. Without more rigor, persona research cannot progress much,
and it risks going in circles instead of establishing evidence-based empirical phenomena. Examples of this “going
in circles” include conflicting findings about personas being applicable and not being applicable (e.g., [70] vs.
[60]) – to date, nobody has explained when personas are applicable and when not. The only way to achieve
plausible explanations is to carry out repeated testing and measure the results empirically, for which PA provides
a way.
Users’ interaction with personas can be measured in many ways. Here, we focused on two commonly used
technologies: mouse tracking and eye tracking, as these two technologies have unique strengths and weaknesses,
and both can be implemented in a Web-based interactive persona system. As acknowledged in many HCI studies,
eye- and mouse-tracking are helpful techniques for studying user engagement with interactive systems [15,91],
and personas in particular [26,76,78,85,86]. When integrated directly into an interactive persona system, these
data collection methods can deliver rich datasets describing persona use and modeling for complex user
behaviors [18,22].
Moreover, integrating these techniques also enables user studies during exceptionally difficult times (e.g.,
during a global pandemic) when it is either not possible or very difficult to conduct in-person user studies.
Highlights of this work include the following:
• Providing conceptual underpinnings of persona analytics and persona science, two emerging and
promising concepts for HCI and user segmentation.
• Developing a novel persona analytics system embedded within a persona system, by integrating mouse-
and eye-tracking functionalities. Measuring users’ mouse and gaze activities grants an understanding of the
behaviors of a persona user. Studying the behavior of the persona user is essential to generate empirical
knowledge and theories of personas and human-persona interaction.
• Demonstrating an end-to-end experimental loop for empirical persona studies, using an interactive
persona system and data collection platforms with a user study of 144 participants. Remote user studies such as
this help scale persona user studies from the conventional 30-or-so to hundreds of participants.
• Providing exploratory findings of persona user behavior, based on the use of PA, its metrics, and various
statistical techniques.
The trends contributing to interactive persona systems are likely to continue, including the evolution of (1)
digital user data from online analytics platforms through APIs, (2) data science algorithms and libraries that
integrate into quantitative persona creation process, and (3) web medium that surpasses the limitations of paper
for persona delivery and user engagement. Interactive persona systems commonly rely on web standards [33]
that enable users to access personas from any device with an internet connection and make it possible to record
persona users’ behavior and analyze it using PA, a customized solution for tracking users’ interaction with
persona profiles. This system, therefore, has value and potential for advancing persona science and the
development of interactive persona systems for years to come.
The current work puts forward a new artifact for studying the behavior of persona users. While previous
persona user studies [73,76,78] have analyzed dwell time and users’ information viewing sequence, these metrics
have, as far as we know, not been captured directly from an interactive persona system in any previous research.
Thus, the presented solution has novelty in its field. As far as we know, the study presented the first “online
laboratory” solution for interactive persona systems during a period in history where there is a need for remote
user study solutions.
7.2 Novel Opportunities to Push Forward Persona Science
Persona science needs progress on all fronts, keeping an eye on long-term theory formulation but also investing in
short-term returns through the use of empirical methods. Persona science can contribute to a much-needed transition
beyond the general claims that “personas work” or “personas do not work”, or the repetition of their “benefits”
and “problems,” into systematically examining the conditions where the effects emerge.
Many human factors have not been investigated empirically. Based on case studies, variables of special interest
include at least (a) Experience [81], (b) Task type [4], (c) Job role [59], and (d) Culture [56]. Which
combinations of these human factors to investigate would depend on the specific research objectives.
First, the effect of users’ experience with personas on their behaviors; experience tends to be reported in persona
studies but not included as a variable. How does novice persona users’ use of personas differ from that of more
experienced users? Can the behaviors of more experienced users be used for guiding novice users to learn to use
personas more efficiently? Second, the task and, more specifically, the task type. This is reported but rarely controlled –
most typically, only one task type is deployed and in only one empirical setting, without repetition to achieve robustness.
As a result, the current body of literature has, for example, no comparison between different task types – e.g., design,
content creation, ad targeting, etc. Personas can be deployed for a range of professional tasks, but no study
compares what kind of personas are ideal for the various task types and whether users approach the personas
differently based on the task type.
Third, the users’ job roles – again, reported, but comparisons among different roles and organizational units are
rarely conducted, even though it is common sense that a person’s job position would greatly affect how they use
personas to support their work. Studies tend to mention “designers”, but looking deeper into these users’ job
positions reveals that they work in multiple departments, have multiple different perspectives on the end user,
and require very different information for their decision making. Finally, culture. Previous research has hinted at
cultural effects related to persona use [83], but there is no adequate understanding of how the cultural match
between the shown personas and the users mediates the interaction and whether personas themselves can help
bridge cultural gaps for design. Overall, systematic analysis of these variables in experimental studies can produce
long-lasting, consistent, and robust knowledge on personas and their users, extending the boundaries of persona
research.
Table 7 proposes a preliminary research roadmap. Naturally, due to the enormous scope of potential research
topics, this proposal is not a complete one. However, it maps some relevant questions that can be addressed via
PA.
Table 7: Research roadmap for empirical persona research. This roadmap includes research questions
that PA can help address. Through solving multiple research questions, researchers can start formulating
a unified theory of personas that would explain, with robust empirical evidence, when personas work and when
they do not, what factors govern their successful use, and how persona creators and champions can increase the
likelihood of successful persona projects.
Columns: Open Research Question (ORQ) and Sub-Questions (SQ) | Useful for…* | Variation by...

ORQ1: What types of personas are most/least viewed?
• SQ1a: Are there differences based on personas’ gender, age, nationality, ethnic background?
• SQ1b: Are some types of personas systematically disadvantaged in terms of how frequently and how long users
interact with them? If so, can system features, e.g., rearranging order of showing, mitigate such disadvantages?
Useful for…*: Persona Creation and Use ⇒ How do users interact with persona profiles?
Variation by...: experience, task type, job role, culture

ORQ2: What persona information was most/least viewed?
• SQ2a: How is users’ consumption of persona information affected by position and screen size of the information?
• SQ2b: Is some information considered redundant regardless of its position and screen size?
Useful for…*: Persona Information Design ⇒ What information do persona users pay attention to?

ORQ3: How does the user transition between (a) the personas and (b) the information elements in the persona profiles?
• SQ3a: What is the degree of linearity / predictability?
• SQ3b: Are the dwell times consistently increasing / decreasing?
Useful for…*: Persona Information Design ⇒ How can we model users’ cognitive styles of using personas?

ORQ4: How does increasing/decreasing the number of personas affect user behavior?
• SQ4a: What interaction techniques can help users cope with more than a handful of personas?
• SQ4b: What is the extra cognitive cost of adding a persona?
Useful for…*: Persona User Behavior ⇒ What is the optimal number of personas?

ORQ5: How do the viewed (a) personas and (b) persona information influence users’ design choices?
• SQ5a: What is the effect of persona use on desirable outcomes such as increased usability, satisfaction, profitability?
• SQ5b: How can these outcomes be measured?
Useful for…*: Persona Impact, Value of Personas ⇒ What is the real value of personas?

*NOTES: The categories of Persona Creation, Validation, Use, and Impact originate from [77]. ⇒ indicates a
higher abstraction of the ORQ in question.
7.3 Practical Implications
Benefits for Researchers. Persona science is the application of scientific methods to persona research, with the
goal of producing empirically valid knowledge about persona creation, validation, use, and impact for design
outcomes, individual decision making, and organizations. PA supports persona science by aiding in the scientific
process, such as generation and testing of hypotheses: data exploration → inductive analysis → propositions →
hypotheses → independent data collection → hypothesis testing → theory generation of personas and their users. PA
can be deployed for both inductive and deductive research. Deductive research typically formulates hypotheses
and relies on experiments to test those hypotheses. The hypotheses can be inspired by works in HCI, social
psychology, economics, etc. Manipulating variables in the interactive persona system, such studies can be
conducted in-person or remotely (see Figure 7 illustrating an end-to-end loop for remote participation). Inductive
studies rely on freely exploring user behaviors to formulate propositions that can later be tested in controlled
settings. Here, we conduct a series of exploratory analyses in order to demonstrate the type of information,
metrics, and analyses that the PA can afford to researchers.
Benefits Relative to Pre-Existing Solutions. From a practical point of view, the reader might pose the
question, “Why not just use Google Analytics for this purpose? Why develop a new system?”. There are various
reasons for that. Compared to pre-existing industry solutions, such as Google Analytics (GA), which is the
dominant Web analytics service [65], PA has four major benefits, summarized in Table 8 and explained thereafter:
Table 8: PA vs. GA – a practical comparison.
                         Persona Analytics   Google Analytics
Data ownership           x                   -
Tracking customization   x                   Partial
Clickstream logs         x                   Requires Premium version
Persona metric exports   x                   -
• Data ownership: the data is recorded to our servers, not to third-party servers such as Google’s.
• Customized event and information tagging: As we know the exact functionality of the APG system, we can
label the information elements and events with suitable names from the onset.
• Clickstream data: Unlike in the standard GA installation, we are able to obtain raw log data of all actions
taken by a user in the APG system. The standard GA installation only provides aggregate data export.
• Tailored reports for persona user analysis: GA reports are designed for websites, not for custom-built systems
like interactive persona systems. Therefore, the available metrics and the reports are not suitable for analytical
questions related to personas. The development of our reports was inspired by analytical tasks for which the
data would be used. Thus, the reports serve persona research better than reports in GA.
With these benefits, the PA system is more equipped for tracking persona user behavior than GA, and therefore
has the potential to serve researchers more adequately.
Benefits for end-users. Personas are an important tool for professionals in various domains, which is why
understanding how people use them is instrumental for creating better end-user insights tools and,
in turn, for advocating more customer- and user-centric thinking in organizations. The practical implementation of
PA also requires consideration for the users’ privacy, to let them know their usage of the system is being tracked.
Therefore, we notify users of this tracking in APG’s terms of service, which is similar to any other website tracking
user behavior. Additional ethical considerations include acquiring consent from users when conducting user
studies to track their system usage behavior, as well as acceptance from an institutional review board (IRB) when
there is a reason to suspect that the research deals with topics that warrant ethical scrutiny (in persona user
studies, harmful scenarios tend to be rare but nonetheless could exist for some study topics).
7.4 Specific Application Areas for Future Research and Development
There are several areas to further deepen the level of analysis. The data produced by the PA system can be
examined using many computationally advanced approaches, for example, by creating a persona state-transition
matrix and applying Markov Chain modeling or neural networks to the constructed matrix to model historical
dependencies (similar to [94]). Similarly, future work could look into predicting user outputs (e.g., task success)
using behavioral features. There are several architectures to deal with sequential data, including recurrent neural
networks (RNNs) [10], that could be used for modeling purposes that can result in persona recommenders or user
behavior classification. While we leave these additional considerations for future work, we do want to point out
two prominent opportunities:
• Prediction. By varying the personas, their information, or interactive features, the effect of such
variables on outcome variables such as persona choice, task completion success, quality, time, perceived
usefulness, or design task impact (e.g., usability improvement) can be measured. For example, when
there are many personas available, users often choose a specific persona for their task. This choice
matters because selecting one persona over another can influence how different user groups’ needs are
considered (or not considered).
• Persona recommenders. For large and heterogeneous populations, the appropriate number of
personas can be in the hundreds or more to correctly represent the diversity of the user population [11].
Therefore, there is a need, on the one hand, to show more personas to users, and on the other hand, to
build tools and methods for users or systems to narrow down the number of candidate personas for a
given task, without taking away users’ choice of reaching beyond this candidate pool to browse
marginalized or fringe user groups [21]. Approaches for achieving this may include recommending users
a specific set of personas based on task criteria or user modeling. Simple examples that have been
already implemented in interactive persona systems include sorting the personas based on their
segment size (i.e., how many people they represent in the baseline data), either in decreasing order
(when wanting to see the most representative personas) or in increasing order (when wanting to see
outlier personas). However, more elaborate persona recommenders are missing to date. The
recommenders’ value increases as the number of personas increases because large persona sets with
hundreds or even thousands of personas [27] are difficult for a human to manage
efficiently.
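The simple sorting approach mentioned above can be sketched as follows. This is an illustration only: the `Persona` class, its `segment_size` field, and the example values are hypothetical and do not describe the data model of any particular interactive persona system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Persona:
    name: str
    segment_size: int  # hypothetical: number of people the persona represents in the baseline data


def sort_by_segment_size(personas: List[Persona], most_representative_first: bool = True) -> List[Persona]:
    """Order personas by segment size.

    Decreasing order surfaces the most representative personas first;
    increasing order surfaces outlier personas first.
    """
    return sorted(personas, key=lambda p: p.segment_size, reverse=most_representative_first)


# Hypothetical persona set.
personas = [Persona("A", 1200), Persona("B", 50), Persona("C", 400)]

representative_first = [p.name for p in sort_by_segment_size(personas)]
outliers_first = [p.name for p in sort_by_segment_size(personas, most_representative_first=False)]
```

A more elaborate recommender would replace the single sorting key with task criteria or a user model, which is the gap noted above.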
Finally, the PA approach could be applied to other forms of profile systems, including social media profiles,
gaming avatar profiles, and so on. Although the system was designed with personas in mind, if the use case of
wanting to learn how people interact with profiles is adequately similar, then PA-type of analytics can be
deployed.
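To make the persona state-transition matrix mentioned earlier in this section more concrete, the following Python sketch estimates first-order transition probabilities from one ordered sequence of persona visits by row-normalizing transition counts. It is a minimal sketch under stated assumptions: the input format, the function name, and the example session are hypothetical, and a first-order Markov chain is only one of the modeling options discussed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def transition_matrix(visits: Sequence[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(next persona | current persona) from consecutive visits."""
    counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))
    # Count each observed transition between consecutive visits.
    for current, following in zip(visits, visits[1:]):
        counts[current][following] += 1
    # Normalize each row so that the outgoing probabilities sum to 1.
    matrix: Dict[Hashable, Dict[Hashable, float]] = {}
    for state, row in counts.items():
        total = sum(row.values())
        matrix[state] = {following: n / total for following, n in row.items()}
    return matrix


# Hypothetical session: the user keeps returning to persona P1.
session = ["P1", "P2", "P1", "P3", "P1", "P2"]
matrix = transition_matrix(session)
```

In this hypothetical session, two of the three transitions out of P1 lead to P2 and one leads to P3, while P2 and P3 always lead back to P1. The same construction could be applied at the level of information elements instead of personas.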
8 Conclusion
In this work, we demonstrated a fully functional persona analytics system embedded within a fully functional
interactive persona system. We demonstrated ways in which persona analytics can reveal structural patterns in
user interaction with personas. These patterns can lead to advancement of persona science and theoretical
propositions for persona-user interaction. Persona research lacks empirical studies, so there is plenty of room for
contributions by keen researchers. To this end, we hope our research encourages others to pursue empirical
research questions in the persona domain. The techniques we demonstrated have special value during
exceptional times when physical user studies are hindered by social distancing. Finally, we proposed seven
metrics that can be computed from the persona analytics data. In addition to these metrics, devising new
quantitative metrics for persona studies is a valuable research direction.
References
[1] Mohamed Aboelmaged and Samar Mouakket. 2020. Influencing models and determinants in big data
analytics research: A bibliometric analysis. Information Processing & Management 57, 4 (July 2020), 102234.
DOI:https://doi.org/10.1016/j.ipm.2020.102234
[2] Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2018. Customer
segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data. Social
Network Analysis and Mining 8, 1 (2018), 54. DOI:https://doi.org/10.1007/s13278-018-0531-0
[3] Jisun An, Haewoon Kwak, Joni Salminen, Soon-gyo Jung, and Bernard J. Jansen. 2018. Imaginary People
Representing Real Numbers: Generating Personas from Online Social Media Data. ACM Transactions on the Web (TWEB) 12, 4 (2018), 27.
DOI:https://doi.org/10.1145/3265986
[4] F. Anvari, D. Richards, M. Hitchens, and M. A. Babar. 2015. Effectiveness of Persona with Personality
Traits on Conceptual Design. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, 263–272.
DOI:https://doi.org/10.1109/ICSE.2015.155
[5] Farshid Anvari, Deborah Richards, Michael Hitchens, Muhammad Ali Babar, Hien Minh Thi Tran, and
Peter Busch. 2017. An empirical investigation of the influence of persona with personality traits on conceptual design. Journal of Systems and
Software 134, (December 2017), 324–339. DOI:https://doi.org/10.1016/j.jss.2017.09.020
[6] Farshid Anvari, Deborah Richards, Michael Hitchens, and Hien Minh Thi Tran. 2019. Teaching User
Centered Conceptual Design Using Cross-Cultural Personas and Peer Reviews for a Large Cohort of Students. In 2019 IEEE/ACM 41st International
Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), 62–73. DOI:https://doi.org/10.1109/ICSE-SEET.2019.00015
[7] M. Aoyama. 2005. Persona-and-scenario based requirements engineering for software embedded in
digital consumer products. In Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE’05) , Washington, DC, USA,
85–94. DOI:https://doi.org/10.1109/RE.2005.50
[8] M. Aoyama. 2007. Persona-Scenario-Goal Methodology for User-Centered Requirements Engineering. In
Proceedings of the 15th IEEE International Requirements Engineering Conference (RE 2007) , Delhi, India, 185–194.
DOI:https://doi.org/10.1109/RE.2007.50
[9] Ernesto Arroyo, Ted Selker, and Willy Wei. 2006. Usability tool for analysis of web designs using mouse
tracks. In CHI’06 extended abstracts on Human factors in computing systems, 484–489.
[10] Homanga Bharadhwaj. 2019. Explainable recommender system that maximizes exploration. In
Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion , 1–2.
[11] Chris Chapman, Edwin Love, Russell P. Milham, Paul ElRif, and James L. Alford. 2008. Quantitative
Evaluation of Personas as Information. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1107–1111.
DOI:https://doi.org/10.1177/154193120805201602
[12] Chris Chapman and Russell P. Milham. 2006. The Personas’ New Clothes: Methodological and Practical
Arguments against a Popular Method. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 634–636.
DOI:https://doi.org/10.1177/154193120605000503
[13] Eric Chu, Prashanth Vijayaraghavan, and Deb Roy. 2018. Learning Personas from Dialogue with
Attentive Memory Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for
Computational Linguistics, Brussels, Belgium, 2638–2646. Retrieved June 20, 2019 from https://www.aclweb.org/anthology/D18-1284
[14] Theresa Bilitski Clarke and Bernard J. Jansen. 2017. Conversion potential: a metric for evaluating
search engine advertising performance. Journal of Research in Interactive Marketing 11, 2 (2017), 142–159.
[15] Argyris Constantinides, Marios Belk, Christos Fidas, and Andreas Pitsillides. 2020. An eye gaze-driven
metric for estimating the strength of graphical passwords based on image hotspots. In Proceedings of the 25th International Conference on
Intelligent User Interfaces, 33–37.
[16] Alan Cooper. 1999. The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and
How to Restore the Sanity (1st ed.). Sams - Pearson Education, Indianapolis, IN.
[17] Giuseppe Desolda, Rosa Lanzilotti, Danilo Caivano, and Maria F. Costabile. 2021. An experience on
remote testing exploiting new web technology. In INTERACT’21 Workshop on Remote user Testing – Experiences and Trends, Bari, Italy.
[18] A. T. Duchowski. 2009. Eye Tracking Methodology: Theory and Practice. Springer, London.
[19] Hermann Ebbinghaus. 2013. Memory: A contribution to experimental psychology. Annals of
neurosciences 20, 4 (2013), 155.
[20] Achim Ebert, Shah Rukh Humayoun, Norbert Seyff, Anna Perini, and Simone D.J. Barbosa (Eds.). 2016.
Usability- and Accessibility-Focused Requirements Engineering. Springer International Publishing, Cham. DOI:https://doi.org/10.1007/978-3-319-45916-5
[21] Joy Goodman-Deane, Sam Waller, Dana Demin, Arantxa González-de-Heredia, Mike Bradley, and John
P. Clarkson. 2018. Evaluating Inclusivity using Quantitative Personas. In Proceedings of Design Research Society Conference 2018, Limerick,
Ireland. DOI:https://doi.org/10.21606/drs.2018.400
[22] Thomas Grindinger, Andrew T. Duchowski, and Michael Sawyer. 2010. Group-wise Similarity and
Classification of Aggregate Scanpaths. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (ETRA ’10), ACM, New York,
NY, USA, 101–104. DOI:https://doi.org/10.1145/1743666.1743691
[23] Jonathan Grudin. 2006. Why Personas Work: The Psychological Evidence. In The Persona Lifecycle,
John Pruitt and Tamara Adlin (eds.). Elsevier, 642–663. DOI:https://doi.org/10.1016/B978-012566251-2/50013-7
[24] Jonathan Grudin and John Pruitt. 2002. Personas, Participatory Design and Product Development: An
Infrastructure for Engagement. In Proceedings of Participation and Design Conference (PDC2002), Sweden, 8.
[25] Jacek Gwizdka, Rahilsadat Hosseini, Michael Cole, and Shouyi Wang. 2017. Temporal dynamics of eye-
tracking and EEG during reading and relevance decisions. Journal of the Association for Information Science and Technology 68, 10 (October 2017),
2299–2312. DOI:https://doi.org/10.1002/asi.23904
[26] Charles G. Hill, Maren Haag, Alannah Oleson, Chris Mendez, Nicola Marsden, Anita Sarma, and
Margaret Burnett. 2017. Gender-Inclusiveness Personas vs. Stereotyping: Can We Have it Both Ways? In Proceedings of the 2017 CHI Conference,
ACM Press, Denver, Colorado, USA, 6658–6671. DOI:https://doi.org/10.1145/3025453.3025609
[27] Bernard J. Jansen, Soon-gyo Jung, and Joni Salminen. 2019. Creating Manageable Persona Sets from
Large User Populations. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, Glasgow, United Kingdom,
1–6. DOI:https://doi.org/10.1145/3290607.3313006
[28] Bernard J. Jansen, Soon-gyo Jung, and Joni Salminen. 2020. From flat file to interface: Synthesis of
personas and analytics for enhanced user understanding. Proceedings of the Association for Information Science and Technology 57, 1 (October
2020). DOI:https://doi.org/10.1002/pra2.215
[29] Bernard J. Jansen, Joni Salminen, and Soon-gyo Jung. 2020. Data-Driven Personas for Enhanced User
Understanding: Combining Empathy with Rationality for Better Insights to Analytics. Data and Information Management 4, 1 (2020), 1–17.
DOI:https://doi.org/10.2478/dim-2020-0005
[30] Bernard Jansen, Joni Salminen, Soon-gyo Jung, and Kathleen Guan. 2021. Data-Driven Personas (1st
ed.). Morgan & Claypool Publishers. Retrieved February 10, 2021 from
https://www.morganclaypool.com/doi/abs/10.2200/S01072ED1V01Y202101HCI048
[31] Joel Järvinen and Heikki Karjaluoto. 2015. The use of Web analytics for digital marketing performance
measurement. Industrial Marketing Management 50, Supplement C (October 2015), 117–127.
DOI:https://doi.org/10.1016/j.indmarman.2015.04.009
[32] Soon-gyo Jung, Jisun An, Haewoon Kwak, Moeed Ahmad, Lene Nielsen, and Bernard J. Jansen. 2017.
Persona Generation from Aggregated Social Media Data. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in
Computing Systems (CHI EA ’17), ACM, Denver, Colorado, USA, 1748–1755.
[33] Soon-Gyo Jung, Joni Salminen, Jisun An, Haewoon Kwak, and Bernard J Jansen. 2018. Automatically
Conceptualizing Social Media Analytics Data via Personas. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM
2018), San Francisco, California, USA, 2.
[34] Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2019. Personas Changing Over Time: Analyzing
Variations of Data-Driven Personas During a Two-Year Period. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing
Systems - CHI EA ’19, ACM Press, Glasgow, Scotland, UK, 1–6. DOI:https://doi.org/10.1145/3290607.3312955
[35] Soon-Gyo Jung, Joni Salminen, and Bernard J. Jansen. 2020. Giving Faces to Data: Creating Data-Driven
Personas from Personified Big Data. In Proceedings of the 25th International Conference on Intelligent User Interfaces Companion (IUI ’20),
Association for Computing Machinery, Cagliari, Italy, 132–133. DOI:https://doi.org/10.1145/3379336.3381465
[36] Soon-gyo Jung, Joni Salminen, and Bernard J Jansen. 2021. Implementing Eye-Tracking for Persona
Analytics. In ETRA ’21 Adjunct: ACM Symposium on Eye Tracking Research and Applications, ACM, Virtual conference, 1–4.
DOI:https://doi.org/10.1145/3450341.3458765
[37] Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2021. Persona Analytics: Implementing Mouse-
tracking for an Interactive Persona System. In Extended Abstracts of ACM Human Factors in Computing Systems - CHI EA ’21, ACM, Virtual
conference. DOI:https://doi.org/10.1145/3411763.3451773
[38] Soon-gyo Jung, Joni Salminen, Haewoon Kwak, Jisun An, and Bernard J. Jansen. 2018. Automatic
Persona Generation (APG): A Rationale and Demonstration. In CHIIR ’18: Proceedings of the 2018 Conference on Human Information Interaction &
Retrieval, ACM, New Jersey, USA, 321–324. DOI:https://doi.org/10.1145/3176349.3176893
[39] Pascal J. Kieslich, Felix Henninger, Dirk U. Wulff, Jonas MB Haslbeck, and Michael Schulte-
Mecklenbeck. 2019. Mouse-Tracking: A Practical Guide to Implementation and Analysis. In A handbook of process tracing methods. Routledge,
111–130.
[40] Ari Kolbeinsson, Erik Brolin, and Jessica Lindblom. 2021. Data-Driven Personas: Expanding DHM for a
Holistic Approach. In International Conference on Applied Human Factors and Ergonomics, Springer, 296–303.
[41] Dannie Korsgaard, Thomas Bjørner, Pernille Krog Sørensen, and Paolo Burelli. 2020. Creating user
stereotypes for persona development from qualitative data through semi-automatic subspace clustering. User Modeling and User-Adapted Interaction 30, 1 (March
2020), 81–125. DOI:https://doi.org/10.1007/s11257-019-09252-5