This is a self-archived – parallel published version of this article in the publication archive of the University of Vaasa. It might differ from the original.

Developing Persona Analytics Towards Persona Science

Author(s): Salminen, Joni; Jung, Soon-Gyo; Jansen, Bernard
Title: Developing Persona Analytics Towards Persona Science
Year: 2022
Version: Accepted manuscript
Copyright © Authors | ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of IUI '22: 27th International Conference on Intelligent User Interfaces, http://dx.doi.org/10.1145/3490099.3511144.
Please cite the original version: Salminen, J., Jung, S-G. & Jansen, B. (2022). Developing Persona Analytics Towards Persona Science. Proceedings of IUI '22: 27th International Conference on Intelligent User Interfaces, 323-344. New York: Association for Computing Machinery. https://doi.org/10.1145/3512891

Joni Salminen, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar; and University of Vaasa, Vaasa, Finland, joni.salminen@uwasa.fi
Soon-gyo Jung, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, sjung@hbku.edu.qa
Bernard J. Jansen, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, bjansen@hbku.edu.qa

Much of the reported work on personas suffers from a lack of empirical evidence. To address this issue, we introduce Persona Analytics (PA), a system that tracks how users interact with data-driven personas. PA captures users’ mouse and gaze behavior to measure users’ interaction with algorithmically generated personas and use of system features for an interactive persona system.
Measuring these activities grants an understanding of the behaviors of a persona user, required for quantitative measurement of persona use to obtain scientifically valid evidence. Conducting a study with 144 participants, we demonstrate how PA can be deployed for remote user studies during exceptional times when physical user studies are difficult, if not impossible. Keywords Personas, User research, Persona science, Persona analytics, Remote user studies 1 Introduction The current work discusses how personas can be effectively combined with the concept of analytics, i.e., the use of end-user data for drawing insights into human factors [53]. We start by defining the key concepts, and we then explain our approach to infusing personas with analytics, which we denote as Persona Analytics (PA). We refer to users when we mean stakeholders that use personas for decision making (e.g., designers, software developers, marketers). For other terminology, we refer to end-users when we mean people on whose information personas are based. Personas, commonly applied in human-computer interaction (HCI) [16], design [8], and business domains such as marketing and sales [74], are fictional depictions of end-users, patients, customers, or other groups of interest [16]. Personas convey end-users’ needs and requirements [8], alleviate decision-makers’ self-referential bias [6], and enable thinking of end-users even when none are physically present [66]. Also, personas give a human face to analytics data [29,35], humanize segments [11], give design inspiration [57], help compare end-users [34], and facilitate prioritizing end-user needs [69] for system development [67]. Persona profiles show relevant information about end-users [58] (see Figure 1). Nielsen summarizes a good deal of personas literature on their creation, assessment, and use [57]. Personas are easily digestible snapshots of end-users, audiences, or customers for use throughout an organization. 
Figure 1: Example of a persona profile. Persona Analytics enables tracking users’ mouse and gaze interactions as the user engages with different information elements (A: Picture, name, and text description, B: Audience size, C: Sentiment, D: Social media quotes, E: Topics of interest, and F: Most viewed contents). Personas are increasingly being enriched with quantitative data [72], or their creation is partially or completely carried out by algorithmic processes, which is referred to as algorithmically generated persona development [50]. When quantitative data becomes part of the created persona profiles, these profiles start approaching other analytics systems in terms of producing end-user metrics. In fact, algorithmically generated personas can be seen as an alternative method of end-user understanding, as alternatives to UX analytics tools (e.g., Google Analytics, Adobe Analytics, HubSpot, Mixpanel, Crazy Egg, and so on), when it comes to design tasks that require user insights (e.g., understanding website visitors for improving usability). As such, personas personify the numerical data on end-user characteristics and behaviors—turning numerical reports into persona profiles [29]. This transformation from “cold numbers” into “warm people” has been denoted as a benefit of personas, in that personified end-user data is treated more empathetically than nameless, faceless numbers [74]. Algorithmically generated personas are also supported by the automation of data science pipelines—i.e., the process of persona creation can now be automated—as well as web technologies [52] that enable serving the personas to users via web browsers—i.e., via interactive persona systems [33]. As such, PA can be defined as follows: DEFINITION 1: Persona analytics refers to decision-makers (i.e., persona users) in organizations using personas as analytical tools to better understand their end-users or other groups of interest. 
The above definition follows the conventional understanding of algorithmically generated personas using quantitative data [29,30,52]. As mentioned, we define a ‘user’ as someone who uses a persona for a professional task, which can relate to software development, design, marketing, or any other domain where personas are applied. Therefore, users can be software developers, designers, marketers, or other stakeholders involved in user-centric decision making. Because of this connection between personas and users, there exists another aspect to the concept of PA. Namely, PA can be seen as a research instrument to generate lasting knowledge about personas and how users interact with them. Its role is to grant HCI researchers a systematic approach for collecting data about persona user behavior and metrics for dealing with this data. In so doing, PA paves the way to more effective application of persona science, defined as ‘the use of empirical scientific methods, such as experiments, to produce robust and generalizable information about persona creation, evaluation, use, and impact’. Indeed, in prior research [36,37], PA is defined as the systematic measurement of behaviors and interactions of persona users engaged with interactive persona systems. This is consistent with the second definition we put forth: DEFINITION 2: Persona analytics refers to how researchers investigate the behaviors of persona users. It is this second definition that motivates our current work, because examining persona users’ engagement with personas can generate vital insights for persona science and the design of personas and persona systems that better serve stakeholders’ information needs about end-users or customers. The persona research urgently needs a strong empirical orientation to produce knowledge that is believable and can truly push forward the boundaries of personas practice and theory and add up to a coherent understanding of the persona user. 
Advocates of the scientific method in persona research [4,5,11,12,23] have continuously mentioned the lack of empirical experiments and quantitative measurements as a bottleneck for progress in terms of theory and practice. Our definition of persona science implies not only collecting data and conducting research on personas but also making an effort to devise theories that explain the data and guide further data collection. Persona science deals with real user behavior and formulating theories that are relevant to the design of personas. The focus in these efforts lies in the study of the persona users, which we demonstrate in this work by introducing new tools for measuring persona user behavior. To this end, the current work concerns itself with the development of a novel PA system embedded within a persona system, with the purpose of more effectively investigating/researching the behavior of persona users. Three research questions (RQs) are posed: • RQ1: How to implement PA in an interactive persona system? • RQ2: What kind of research questions can PA address? • RQ3: How can PA be used for understanding persona user behavior? The goal of the current work is to report efforts of building analytics features into an interactive persona system. We demonstrate the capabilities of this PA system and discuss its value for empirical persona research. In practice, PA can assist in designing layouts, features, and information content in algorithmically generated persona profiles. To achieve these benefits, it is necessary to incorporate analytics into personas, so that the interaction between the users and personas (and interaction features) can be captured. Previous efforts of this work appeared in [36,37] – relative to these, the current work adds a full-scale case study demonstrating the system capabilities with a real user study (previous research only tested the system with one pilot user). 
The PA system has implications for researchers and practitioners who are increasingly adopting web-based tools for remote testing since social distancing hampers in-person user studies [17]. This trend is likely to continue as tools and practices for remote user studies evolve.

2 Related Work

Algorithmically Generated Personas. Although quantitative personas were first created within software requirements engineering [7,8], the concept of personas being data-driven was introduced by McGinn and Kotamraju [50] and later deployed by others [40,41,51,95,97]. However, the idea of using “data” for personas dates to Cooper’s [16] concept that personas should be based on real user goals instead of fiction. While data orientation has remained a consistent theme in the persona literature [16,17,32,9,10,50,51], three trends contribute to the rise of algorithmically generated personas [29,72]: (1) availability of user and customer data from online analytics and social media platforms; (2) democratization of data science tools and algorithms that enable automated persona generation; and (3) web technologies that remove the limitations of static personas via interactive user interfaces. These trends denote a shift from unchanging “flat file” personas to dynamic “full-stack personas” that update automatically and are traceable to individual user-level data [29].

Interactive Persona Systems. From algorithmically generated personas, the next logical step of evolution is interactive persona systems [3,52,72], defined as interactive user interfaces (UI) that display persona profiles. This UI can be, but is not necessarily, accessed via web browsers [32,33,35,38]. The benefits of web technologies are their broad applicability and accessibility. Personas served via the web can be accessed from virtually anywhere using any device that supports web browsing (see Figure 2a).
Supporting technologies, such as user account management, can be integrated with relative ease using standard libraries and best practices. Interactivity refers to users performing various actions on the personas, such as analyzing information on gender distributions, refreshing the persona quotes, filtering the quotes by sentiment and topic [80], predicting a persona’s interest for a given topic [2,3], and engaging in dialogue [45,47]. The interactive features are enabled by standard Web technologies, such as HTML, CSS, and JavaScript. Emerging opportunities in Literature. Following these developments in algorithmically generated personas and interactive persona systems that have been described as transformational [52], multiple opportunities can be envisioned. We highlight five such opportunities. First, (i) interaction techniques and multimedia (e.g., persona chat/dialogue systems [13], video, AI agents [87]…) could be incorporated into persona systems to serve various end-user needs [75]. Second, (ii) new features for comparing personas by design goal metrics, such as diversity [74] and inclusivity [21], could be added. Third, (iii) personas could be integrated into an external system to enable persona-based recommendations [46], content management, and customer relationship management, as well as facilitating online advertising [79] via application programming interfaces (APIs) [38]. Fourth, (iv) developers could provide explainability, transparency, and context, which are important when applying algorithms for persona creation [80,88], as illustrated in Figure 2b. Finally, (v) interactive systems can be used to drill down to the persona information and make quantitative predictions [3]. (a) (b) Figure 2: (a) Interactive persona features, such as [A] browsing the available personas, [B] searching and sorting by user-defined criteria, [C] explanatory tooltips, and [D] export of usage logs. (b) Adding transparency to algorithmically generated personas. 
The first layer shows Mamdouh, a young Egyptian. The second information layer, accessible by clicking a chart icon, shows that Mamdouh actually comprises many demographic groups, of which [Egypt, 25-34 Male] is deemed the most representative by the algorithm.

Research Gap. While technology introduces novel opportunities for user-to-persona interaction, at the same time, these trends create an opportunity for better understanding of how persona users, such as designers, software developers, and marketers, interact with personas. This better understanding of persona user behavior can lead to substantial advances in persona science (i.e., the academic study of personas and their usage), but it requires effective implementation of measurement. The lack of empirical persona user research has been noted by several researchers [48,72,77]. The unifying factor behind these possibilities is the need for understanding the persona user behavior, which requires measurement. In our solution, this measurement capability is provided by PA.

3 Methodology for Persona Analytics

3.1 Requirements Journey

There is no standard method of building an analytics system. In our case, all the researchers had extensive experience of both Web analytics and personas research, which was instrumental in this process. This experience consisted of working with industry-leading analytics solutions, such as Google Analytics, for more than a decade in the case of two authors and half a decade in the case of one researcher. The persona research of the authors, when combined, also extends well beyond a decade and mostly consists of empirical work. Therefore, we had a vision of what we wanted to accomplish, what is missing from current research and practice, and what research questions in persona science should be addressed via empirical data. We started out by “drinking our own Kool-Aid,” i.e., by defining the ideal user persona for the system.
This “persona” of a user of the PA system is a researcher who wants to conduct persona user studies in order to address scientifically important questions. Measuring user behaviors helps researchers tackle open research questions in persona science, which is the goal of this persona. To support this persona, a few requirements are posed: (a) the data must be accurate so that proper conclusions can be drawn from it, (b) there should be the possibility to include several data types to enable the comparison of different end-user inputs, and (c) the dimensionality of the data should not be overwhelming for the analysis task, i.e., the data needs to be exportable in a format that is relatively easy to analyze. These desiderata were considered in the design of PA by incorporating multiple data sources and by keeping the reporting data granularity at a user-friendly level – in other words, reports with different levels of detail and aggregation are provided, as explained later in this manuscript.

Second, we brainstormed the type of questions that the PA system would need to be able to address by providing data for the researcher persona. The following list of scientific questions of interest (SQ) was collaboratively obtained among the research team members, pertaining to various persona aspects (in parentheses):
• SQa: How do users interact with persona profiles? (interaction techniques)
• SQb: What interactive features facilitate users’ discovery of personas for a given task? (interaction techniques)
• SQc: What information of personas do users pay attention to? (information design)
• SQd: What persona information influences users’ design choices and how? (information design)
• SQe: What persona information influences users’ behaviors or attitudes about end-users? (persona perceptions)
• SQf: How do users compare personas for a task? (cognitive styles, information processing)
• SQg: How and why do users choose a persona for a given task?
(cognitive styles, information processing) • SQh: How do users or user groups differ by their persona use? For example, are persona users with less experience in personas using them differently? Are there gender differences? (demographic, cultural, and social factors) Addressing these and other vital questions can provide much needed direction for persona science, addressing aspects of persona creation, validation, use, and value in use. Empirical, scientific inquiry is not only needed to produce valid knowledge for practitioners using personas, but it is also required to create robust theories on personas and their users. Aligned with principles of scientific inquiry, persona research can benefit from adopting more rigorous research designs, including hypothesis formulation based on theories in HCI, information science, social psychology, and other fields tangential to personas; and followed by systematic testing of those hypotheses, then revising the theory to adapt to persona context. Addressing these questions can help persona creators understand which aspects of human-to-human interactions apply to human-to-persona interactions, so design decisions can be made to mitigate unwanted effects (e.g., stereotyping [48] and seeing personas as irrelevant, abstract, or misleading [49]) as much as possible. 3.2 Defining Metrics for Persona User Behavior In practice, to address these questions, the PA system needs to track various measures and use these measures to compute metrics. Therefore, we needed to devise PA system metrics. These metrics were defined based on their ability to address the types of questions posed earlier. These metrics can be divided into (a) persona-based metrics and (b) user-based metrics. The persona-based metrics include, for example, the following (with potential use mentioned after definition): • Time spent per persona: the duration users interacted with a given persona. 
Purpose: Proxy measure for users’ interest – it is likely that users spend more time with personas they find more interesting.
• Number of visits per persona: the number of times a given persona was visited by the users. Purpose: Proxy measure for users’ interest – it is likely that users visit personas they find interesting more often.
• Persona bi- and trigrams: the number of times users visited specific two or three personas during a session or time period. Purpose: Bi- and trigrams can be indicative of comparative behavior, i.e., how users compare personas. (Technically, this is not a metric but a measure; however, we mention it here due to its nature of being computed.)

In other words, these metrics communicate aggregate information about how one persona performed relative to another – i.e., was one more popular than another, in what order were they visited, and so on. User-based metrics, in turn, communicate about a user or a group of users. These include:
• Number of personas visited: the number of personas a user visited during a session. Purpose: to understand how thoroughly a user viewed the personas. For example, a user who only visits a small number of personas either quickly found what they were looking for, satisficed with the “first acceptable choice” [93], or was not engaged with the system and/or personas.
• Persona coverage: the relative share of personas a user visited out of the personas available. Purpose: The same as the previous metric, but expressed as the ratio of the number of visited personas to the number of available personas. The higher the number of available personas becomes, the more likely it is that the persona coverage per user decreases, as users would be unlikely to browse a very high number of personas for their professional tasks.
• Average visit duration: the average time spent per persona for a given user. Purpose: can reveal if the user was more or less engaged relative to other users.
• Persona rank correlation: the degree to which the order of a user visiting the personas corresponded with the personas’ order of presentation in the system (can be computed based on visit duration as well). Purpose: to test if there are order effects that affect persona use.

Many of these metrics are inspired by similar metrics used in information theory [92], eye-tracking studies [25,42,71], and Web analytics [1,14,31]. While similar metrics are well established in said fields, these metrics are not established in persona science and research. In fact, we are aware of no previous study that discusses metrics for persona user behavior – again, this hampers scientific progress in this domain. Coupled with participant data, the metrics can help analyze how users view different personas, if there is selection bias based on demographic factors of personas and their users, and so on. While these basic metrics provide a useful starting point for persona science, a lot more development in this domain is needed. It is also important to modify and adapt known metrics for the persona context, because they could be computed or interpreted differently when studying personas. We discuss this matter later in Section 6.8.

3.3 Determining the Data Collection Modes

The data collection modes were largely pre-determined by what is possible using the current Web technologies. Mouse-tracking is the obvious choice due to its commonality in online analytics and support provided by all Web browsers [39]. The advantages of mouse-tracking are three-fold: it (i) offers an unobtrusive form of tracking of natural user behavior, (ii) does not require calibration, and (iii) has perfect accuracy, i.e., there is virtually no measurement error, and the users’ movement of the mouse is perfectly traceable to specific pixels and UI elements.
On the negative side, mouse-tracking is considered a weaker proxy for attention than gaze movement, i.e., eye-tracking [9,55], mainly because users might not always move their mouse when processing information on the screen. As processing of persona information requires eyesight, eye-tracking is a useful data source to complement mouse-tracking in interactive systems [18]. The challenge of webcam-based eye-tracking is that error margins can degrade data quality, as there are differences in terms of hardware quality, lighting conditions, distance to screen and device, and a myriad of other conditions that can decrease online eye-tracking data quality [98]. While these issues concern both separate hardware trackers (e.g., Tobii, MyGaze, GazePoint) and webcam-based eye-tracking, for the latter, the challenges in data quality are much greater because webcams do not provide access to the infrared frequencies that the professional trackers use. Therefore, because mouse- and eye-tracking in a remote user study context each involve their unique advantages and challenges, it is appropriate to integrate both data collection modes into the PA system, which simultaneously completes the scope of the requirements.

Figure 3 offers a conceptual overview of the PA system. Overall, the algorithmically generated persona process is as follows: end-users’ data => personas => persona users (e.g., marketers) => persona users’ data (collected via PA) => researchers and analysts (studying persona user behavior). In other words, personas serve the information needs of stakeholders, and PA serves the information needs of researchers interested in persona user behavior.

Figure 3: Conceptual diagram of Persona Analytics. Multiple users can simultaneously interact with the persona system. The user’s interaction with persona profiles is captured via mouse- and eye-tracking, recorded in a central database (DB), and outputted via a reporting interface.
By analyzing the reports, researchers can make important discoveries for persona science.

4 System Implementation

4.1 Overview

The defined questions acted as a guiding idea for requirements and implementation. In software development projects, requirements detail what is needed from a system [20]. Engineers or developers tend to implement features and functionalities according to the requirements to create the system. Two of the authors collaborated on creating the requirements, and one of the authors with the necessary skills implemented them. The system was tested internally and with a pilot user (reported in [37]), and we found it to log the data correctly. The implementation of PA was carried out for an interactive system called Automatic Persona Generation (APG) [2,3], which is a state-of-the-art system for algorithmically generated persona development. The system is available at https://persona.qcri.org. While we refer the reader to related work for a complete description of APG’s system functionalities and associated algorithms, the following subsection provides a brief explanation of the APG system.

4.2 Algorithmic Approach for Persona Generation

APG generates personas from online analytics data, e.g., from YouTube audience statistics or Google Analytics log data on end-users. APG infers demographically and behaviorally distinct patterns from user datasets [27,28]. APG has been previously applied to datasets on social media users [89], ad target groups [84], online news audiences [2,3], and video game players [90].
The APG persona creation relies on three main steps [30]: (a) identify unique user behavioral patterns using non-negative matrix factorization (NMF) [43], (b) associate these behavioral patterns with representative demographics (age, gender, country) to form “skeletal personas”, and (c) enrich the skeletal personas with personified information that matches the demographics (name, picture, job, education level, relationship status, topics of interest). Figure 4 summarizes the algorithmic persona generation process.

Figure 4: (A) applying the NMF algorithm [43] to the user dataset V that consists of demographic groups (g) and content (c). This matrix is decomposed to W and H, both involving the hyperparameter p that indicates the number of personas. Epsilon describes the error term. Through the enrichment process (B), explained in the body text, APG produces a set of p personas (C) that have personified information, conceptually known as “personification of big data” [96].

4.3 Measurement Paradigm

The key insight going from APG (interactive persona system) to PA (persona analytics) is that, when personas are provided through a web browser, PA takes place via mouse- (and eye-)tracking that records the persona users’ mouse (or gaze) movements and clicks (eye fixations) on the persona profiles and their information elements. This enables empirical persona research, such as building click paths, persona visit sequences, dwell time analyses, and so on. In PA, we track both the information usage behavior within the persona profiles and the transitions between the personas. One can think of personas as “pages” in the conventional Web analytics terms, and then information reports are the pages’ content. To answer research questions that advance persona science, we need both levels of tracking. To this end, the PA system records the user’s mouse and gaze movements simultaneously during the session. The data is stored in a backend database.
To support analysis, the screen coordinates are automatically converted to the corresponding information elements in the logs using JavaScript. In other words, the PA system records that a given mouse hover or fixation was targeting, e.g., “Persona picture”. The duration of the hover or fixation is calculated based on the “in” and “out” timestamps. In total, the PA system has 120 predefined HTML elements describing all information in the persona profile page (see Figure 5). The main elements include Headline (name, gender, age, country), About (picture, text description, job, education level, relationship status), Sentiment, Topics of Interest, Viewed Conversations (Quotes), Viewed Contents, and Audience Size, corresponding to typical information in persona templates [58].

Figure 5: Because interactive persona systems serve the personas via a web browser, web technologies, such as HTML and JavaScript, can be used for tracking how users interact with the personas. The frames in the figure illustrate how information elements in the persona profile are tagged for user tracking. In total, PA tracks 120 elements in the persona profile.

4.4 Online Eye-Tracking

The eye-tracking is implemented using WebGazer.js (https://webgazer.cs.brown.edu/) [62–64], a webcam-based gaze tracker developed at Brown University. WebGazer is available in JavaScript as an open-source library (https://github.com/brownhci/WebGazer). Multiple alternative frameworks were compared, but we chose WebGazer for four reasons: (a) it provides relatively good accuracy based on our pilot testing, (b) it is actively developed based on the update frequency of the GitHub repository, (c) the source code is publicly available and can be integrated into systems such as APG, and (d) the software is provided free of charge. These properties make WebGazer a feasible online eye-tracker for research-based systems such as APG.
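The coordinate-to-element conversion and the duration calculation described in Section 4.3 can be illustrated with a minimal sketch. The paper states that this step runs in JavaScript in the browser; the Python below only mirrors the logic, and the names `ElementBox`, `element_at`, and `duration_ms`, as well as the pixel layout, are hypothetical and not taken from the PA system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ElementBox:
    """Hypothetical bounding box of one tagged information element, in pixels."""
    name: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open intervals so that adjacent boxes do not both claim an edge.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def element_at(boxes: Sequence[ElementBox], x: float, y: float) -> Optional[str]:
    """Return the name of the first element whose box contains (x, y), else None."""
    for box in boxes:
        if box.contains(x, y):
            return box.name
    return None


def duration_ms(t_in_ms: int, t_out_ms: int) -> int:
    """Duration of a hover or fixation from its "in" and "out" timestamps."""
    if t_out_ms < t_in_ms:
        raise ValueError("'out' timestamp precedes 'in' timestamp")
    return t_out_ms - t_in_ms


# Illustrative layout only (assumption); the real profile has 120 tagged elements.
boxes = [
    ElementBox("Persona picture", left=0, top=0, width=200, height=200),
    ElementBox("Sentiment", left=220, top=0, width=300, height=120),
]

hit = element_at(boxes, 50, 50)    # a mouse or gaze sample inside the picture
miss = element_at(boxes, 900, 900) # a sample outside every tagged element
dwell = duration_ms(1000, 1750)    # "in" at 1000 ms, "out" at 1750 ms
```

Because mouse and gaze samples are both reduced to (x, y) coordinates, the same lookup serves both data collection modes, which is consistent with the paper's statement that the two are processed identically.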
4.5 Administrative Features

In the APG’s UI, system administrators can enable mouse tracking, eye tracking, or both for all users or for a subset of users. From a user’s point of view, the only difference is that, when eye-tracking is enabled, every session starts with calibration (see Figure 6a and b). The PA system processes mouse- and eye-tracking data identically, which means that both types of interaction are recorded in the database and can be exported in a single file in order to improve usability for the researcher using the PA system. When the coordinates of hovering or gazing correspond to a predefined element, this event, along with its timestamps (in/out) and metadata (User ID and Session ID), is sent by the client browser via Ajax (Asynchronous JavaScript and XML) to the backend database. The PA system maps the coordinates to the persona information element the user is interacting with. Administrators can download the log files for data analysis (see Figure 6d). They can also create new user studies from the persona system’s backend.

Figure 6: (a) Eye-tracking calibration dialogue and (b) how it shows to users; (c) example of a user’s eye-tracking pattern before the data is converted to each specific information element in the persona profile (denser color indicates more gaze fixations in a given area), and (d) data export dialogue shown for researchers to export user logs.

The logs are provided in CSV files which can be downloaded via the Download button. To prepare the data exports, the system uses Pandas (i.e., Python Data Analysis Library) after retrieving the logs from the backend database. It computes the duration of each interaction based on “in” and “out” timestamps and generates a comprehensive data report, in which each mouse and eye fixation event, its timestamp, its target information element in the persona profile, and meta-information (Session ID and User ID), are saved into a file.
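Given an event-level export of this kind, the metrics defined in Section 3.2 can be derived with a few aggregations. The following is a sketch under stated assumptions, not the PA system's implementation: the row layout, the sample events, and all function names are hypothetical, a "visit" is assumed to be a run of consecutive events on the same persona, and Spearman's rho (no-ties formula) is assumed for the persona rank correlation because the paper does not name a specific coefficient.

```python
from collections import Counter, defaultdict

# Hypothetical event rows: (user_id, persona, element, t_in_ms, t_out_ms),
# assumed to be sorted by time within each user.
events = [
    ("u1", "P1", "Persona picture", 0, 400),
    ("u1", "P1", "Sentiment", 400, 1000),
    ("u1", "P3", "Topics of Interest", 1200, 2000),
    ("u1", "P2", "Persona picture", 2100, 2400),
    ("u2", "P2", "Audience Size", 0, 500),
]
presentation_order = ["P1", "P2", "P3", "P4"]  # order shown in the system (assumed)


def visit_sequences(rows):
    """Per user, collapse consecutive events on the same persona into one visit."""
    seqs = defaultdict(list)
    for user, persona, _element, _t_in, _t_out in rows:
        if not seqs[user] or seqs[user][-1] != persona:
            seqs[user].append(persona)
    return dict(seqs)


def time_per_persona(rows):
    """Persona-based metric: total interaction time per persona, in ms."""
    total = Counter()
    for _user, persona, _element, t_in, t_out in rows:
        total[persona] += t_out - t_in
    return total


def visits_per_persona(seqs):
    """Persona-based metric: number of visits each persona received."""
    visits = Counter()
    for seq in seqs.values():
        visits.update(seq)
    return visits


def persona_bigrams(seqs):
    """Count ordered pairs of personas visited back to back."""
    grams = Counter()
    for seq in seqs.values():
        grams.update(zip(seq, seq[1:]))
    return grams


def coverage(seq, available):
    """User-based metric: share of available personas the user visited."""
    return len(set(seq)) / len(available)


def rank_correlation(seq, available):
    """Spearman's rho between first-visit order and presentation order,
    computed over the personas the user actually visited."""
    first_visit = list(dict.fromkeys(seq))  # unique personas in visit order
    n = len(first_visit)
    if n < 2:
        return None  # undefined for fewer than two personas
    shown = sorted(first_visit, key=available.index)
    d2 = sum((first_visit.index(p) - shown.index(p)) ** 2 for p in first_visit)
    return 1 - 6 * d2 / (n * (n * n - 1))


seqs = visit_sequences(events)
```

With the sample rows, user `u1` visits P1, P3, and P2 in that order, so the bigram counts and the rank correlation reflect one swap relative to the presentation order. Trigrams and the average visit duration follow the same pattern (for example, by zipping `seq`, `seq[1:]`, and `seq[2:]`, or by dividing a user's total time by their number of visits).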
The logs can be downloaded for further analysis. Information about the variables logged by the PA is included in the Supplementary Material (https://www.dropbox.com/s/yeu1jrohs6lbpg8/central%20variables.docx?dl=0). These variables were determined based on the metrics and questions detailed in the previous sections. As a result, the data recorded by the PA system enables the calculation of various metrics of persona user behavior. We now illustrate, through an example user study, how PA can serve persona researchers.

5 Validation Study

A remote user study was conducted in which 144 participants used an interactive persona system with PA enabled to browse a set of 10 personas, created from a dataset of a tourism-promoting organization with 1.8M (1,795,115) user likes over 5,312 Instagram posts. The participants’ task was to choose a persona to target for tourism marketing (i.e., promoting a specific destination, in this case, a country). Basic demographics of participants are provided in Table 1.

Table 1: Participant demographics.

                     Age (M)   Age (SD)   Male    Female   Non-binary
N = 118 (81.9%*)     35.35     9.08       62      55       1
Share of N                                52.5%   46.6%    0.8%

*For 18.1% of the participants, we did not have demographic information.

The participants were recruited using an online data collection service called Prolific [61]; the same service has been used in various other persona user studies [82,83,88]. We used the platform’s industry categories as a sampling criterion, including “Art/Design”, “Graphic Design”, and “Market Research”, in order to reach people who work in industries where personas are relevant. Students were excluded. All participants were provided with a definition of personas and a task description prior to their use of the system. The study flow is illustrated in Figure 7.
The study tested the effect of simple and complex explanations on user behavior and perceptions, served via a product walk-through (i.e., a process that sequentially shows different parts of the system to a user who is logging in for the first time). The participants were directed from the online service to the APG system, which randomly allocated each participant to one of three experimental conditions (simple explanations about the system, complex explanations, and no explanations at all).

Figure 7: An example of how the APG and PA systems can be synchronized to conduct a full remote user study. In this case, we used an online platform to recruit participants to the study [A]. After using the system to browse the personas [B] (no time limit was imposed), the participants completed a survey [C] and were redirected back to the data collection platform [D] that logged successful study completions. PA recorded both the survey and behavioral data [E].

After using the persona system, which consisted of viewing as many of the algorithm-created personas (see Table 2) as they wanted for as long as they wanted, the users could click on a banner to indicate they were ready to complete their task, after which they were transferred to a survey platform that collected data about their task completion, perceptions, and demographic variables. Upon completing the survey, the participant was automatically redirected back to the online data collection platform, where their participation was marked complete.
Both APG and the survey platform record the UTM parameters (footnote 4) that identify (in an anonymous way) the participant so that participants who pass the data validation stage (i.e., researchers validating that their responses were genuine) can be easily compensated, and their system usage data is linked with their survey responses for further analysis. During system usage, PA was enabled, and data was collected on the users’ interactions with the persona system.

Table 2: Personas the algorithm created for the user study.

Persona      Age   Gender   Country
Mamdouh      28    Male     Egypt
Rahul        34    Male     India
Ashley       25    Female   United States
Muhammad     34    Male     Pakistan
Alaa         18    Female   Egypt
John         26    Male     United States
Abdalaziz    23    Male     Egypt
Rizky        18    Male     Indonesia
Putri        20    Female   Indonesia
Chris        40    Male     United States

6 Explorations of Persona User Behavior

6.1 Overview

In this section, we analyze the collected data to demonstrate how PA can serve empirical persona user research. We do not explicitly test any hypotheses, although the data obtained from PA could be used for that; instead, for the sake of demonstration, we inductively analyze the data and provide exploratory findings about persona user behavior. We then synthesize these findings in the form of propositions that future research could test, with complementary theorization, as hypotheses. In other words, we illustrate how PA can be of service towards persona science. The results of the effect of the three conditions will be reported in a future publication; here, we focus on demonstrating how PA can be used to analyze the data obtained from user experiments. For parsimony, the following analyses focus on the mouse-tracking data. Because the eye-tracking data is logged in exactly the same data structure as the mouse-tracking data, the same analyses and metrics can be obtained from eye-tracking.
Based on our piloting of the eye-tracking module, the accuracy varies strongly (from ~16% to ~80% in our testing) by user, condition, and equipment. This is also why it is more reliable to carry out this demonstration with the mouse-tracking data.

6.2 Descriptive Statistics

Descriptive statistics about participants’ engagement with the system (see Table 3) indicate that, on average, participants spent around 8.5 minutes browsing the personas for their task and visited persona profiles on average 14 times. What is striking is the high dispersion among the participants: the standard deviations are high for both the dwell time (SD = 8.3 minutes) and visit counts (SD = 11). The participant with the shortest dwell time only used the system for 20 seconds, while the participant with the longest dwell time used the system for more than an hour (62.6 minutes). The shortest visit path only included visiting one persona, whereas the longest path consisted of visiting the persona profiles 60 times, which equals 6 visits per profile on average. These results indicate a major dispersion in engagement, with some participants being “persona power users” while others lack significant engagement with the system. Future analyses could investigate how these two extreme user types differ (e.g., demographic or industry variables that might explain the differences) and why (e.g., low task motivation, not perceiving personas as relevant or useful).

Footnote 4: Urchin Tracking Module parameters are a standard technique for tracking source and meta-data of Web traffic.

Table 3: Dwell time (in seconds) and number of visits to the persona profiles by the participants.
                                                            Mean    SD      Min    Max      Median
Dwell time (i.e., system usage time)                        514.3   496.8   19.5   3756.8   410.8
Visits (i.e., number of times loading a persona profile)    14.0    11.0    1      60       12

As a whole, exactly half (50.0%) of the users viewed at least 9 of the personas (see Figure 8a), i.e., achieving a persona coverage of 90%, while less than a third (31.9%) viewed three or fewer personas. The fact that close to half (47.2%) viewed all 10 personas suggests that users have a need for viewing a variety of personas; 10 was the highest number in this study, but it seems likely that, given the choice, users would have viewed more than 10 personas. Concerning information viewing patterns (Figure 8b), the persona profiles contain 8 parent information elements (About, Audience Size, Headline, Sentiment, Timeline, Topics of Interest, Viewed Contents, and Viewed Conversations). Only a small minority (0.5%) of persona visits included viewing all 8 information elements. While some visits included viewing only one information element (10.5%), perhaps an indication of rapid verification of a recalled detail, more than half of the persona visits (52.3%) included viewing at least four parent information elements (i.e., at least half of the main information in persona profiles). Section 6.5 investigates further what information was most viewed.

Figure 8: (a) Persona coverage (i.e., the number of personas a participant viewed during their whole session), (b) Information coverage (i.e., how many parent information elements were viewed by the users during each viewing of a persona).

6.3 Correlations and Gender Effects

There was no notable correlation between participant age and system usage time (r = 0.08) or between age and number of visits (r = -0.16). In terms of dwell times, results from a t-test indicate that females used the system longer (M = 634.8 seconds) than males (M = 493.4 seconds), t(114) = -1.55, p = 0.06, although this difference does not reach significance at the conventional 0.05 level.
In terms of visit count, there was no significant difference, with females (M = 14.8) and males (M = 16.3) visiting a roughly even number of personas, t(114) = 0.79, p = 0.21. (Both these tests were based on the 118 participants for whom we had gender information; the one participant who indicated non-binary gender was excluded from the analysis.) For males, with 95% confidence, the population mean for persona visit duration is between 33.4 and 59.4 seconds, based on 61 samples. For females, with 95% confidence, the population mean for persona visit duration is between 46.5 and 80.3 seconds, based on 55 samples. Finally, there is also no significant effect based on persona gender, with both male (M = 37.8 seconds) and female (M = 35.3 seconds) personas being viewed for roughly the same amount of time, t(1996) = 0.50, p = 0.31.

6.4 Persona Viewing Patterns

One of the researchers conducted an exploratory data analysis (EDA) on 21 participants’ patterns of viewing the personas; by pattern, we mean how long a participant viewed each persona over their sequence of browsing the personas. This EDA revealed several different patterns of viewing the personas (see Figure 9), including (a) shark fin, (b) u-shape, (c) stabilizing, (d) sporadic, (e) linear declining, and (f) triangle shapes.

Figure 9: Different persona viewing duration patterns based on an exploratory analysis. Panels: (a) Shark fin, (b) U-shape, (c) Stabilizing, (d) Sporadic (increasing), (e) Linear declining, (f) Triangle.

The variety of patterns indicates that it might be difficult to find general “laws” that would govern how individual users explore a set of personas. However, among the manually reviewed samples, we observed that the dwell times tend to decrease over the number of visits: 16 out of 21 participants (76.2%) had such a trend (e.g., a and c in Figure 9), while only three participants (14.3%) had an increasing dwell time trend (e.g., d).
The two remaining participants had trends that could not be categorized as either decreasing or increasing (b and f). Thus, it appears that the time spent reviewing persona profiles tends to decrease over the number of visits. In cases where the dwell time appears to “resurge” (e.g., b), the participant may be returning to a persona they found interesting earlier in order to verify, learn more, or compare information. Some patterns remain highly sporadic until the end of the session (e.g., d), while others seem to stabilize early (e.g., c).

6.5 Persona Information Viewing Behavior

We investigated which information users dwelled on, using dwell time and visit counts as proxies for attention and interest. Results in Figure 10a show that the users were most interested in social media quotes in the persona profile (68.4% of the total dwell time), followed by the personas’ basic information (“About”, 14.1%) and audience size (8.1%), which indicates how many people similar to the persona there are on Facebook and Twitter. Plotting the data shows that users’ attention is unevenly distributed, with the quotes garnering nearly five times as much dwell time as the second most popular information, i.e., the persona’s basic information. Unlike for persona visits, dwell time and visit count are strongly correlated (r = 0.63) for persona information. It is known from previous persona studies that quotes are very impactful for users’ perceptions of personas [86], but it is interesting that the mouse-tracking shows the quotes overshadowing other information this strongly.

Figure 10: Information viewing behavior of the participants. (a) Dwell time (bars) and visit counts (line). (b) Most common transitions between the parent information elements across all personas (Mamdouh is used for illustration). Users start browsing the persona’s basic information, including picture, text description and sociographics (State 0).
They then move to comments (S1), most viewed content (S2), which is viewed repeatedly (S3), before moving to topics of interest (S4), audience size (S5), persona’s name and demographics (S6), and back to audience size (S7).

On the other hand, if we focus on the number of visits instead of dwell time, personas’ basic information (“About,” as indicated by the orange line in Figure 10a) becomes the most important information element. This element contains the text description and picture of the persona, which previous research has, again, found influential for persona profiles [58]. The information viewing sequence (see Figure 10b) seems to move diagonally from top left to bottom right, then up, then bottom left, and back up and finally down (↘↑↙↑↓). This sequence was obtained by calculating the most common parent elements in states S0...S7 across all participant-persona pairs.

Two takeaways can be drawn from these findings: (a) personas’ quotes, text description, and picture are among the most impactful information based on users’ mouse engagement, and (b) measuring dwell time and visit counts can give different results, which is why measuring the two separately makes sense: an information element with a high visit count but low dwell time is frequented in short bursts, whereas an information element with a high dwell time is focused on for a longer time. It is logical that quotes interest people, because they are seen to reflect the persona’s attitudes and are information-rich for various user tasks.

6.6 Effect of Order of Personas

Order of presentation has been shown to affect how information is accessed, used, and recalled [19]. Among the notable effects in this line of work are the primacy effect, which implies that the first seen information is the most impactful [68], and the serial effect, which implies that the first and last items in a list are given special attention [54].
In the context of persona systems, these effects can matter because the personas are shown in a list; as a result, e.g., the needs of the first and last personas in the list could be considered more strongly than those of other personas. Three observations can be made from plotting the data (Figure 11): (1) there is a first persona effect, i.e., the first shown persona (Mamdouh) gets substantially more attention than the others; (2) there is no strong pattern of primacy effect in terms of dwell time declining with the persona’s order of display in the system; however, (3) the fact that the last persona (Chris) is the second most viewed suggests a serial effect, in which the first and last items of a list garner the most attention. The Spearman rank correlation between persona order and dwell time is negligible (r = 0.176). The correlation between system order and number of visits is moderate (r = 0.576). The correlation between dwell time and visits is also moderate (r = 0.455). However, when we compute the most commonly visited persona at each step (i.e., S1 = the first persona the user visits, S2 = the second persona they visit, ... S10 = the tenth persona they visit), we find that the TOP-10 path is identical to the order in which the personas are presented in the system. A further check reveals that 31 users (21.5%) follow this sequence when using the system. That is, about one fifth of the users browse all the personas in the order in which they were presented.

Figure 11: (a) Dwell time distribution among personas. (b) First persona effect. Mamdouh (the first persona shown in the system) received almost one third of all dwell time from the users. If dwell time were distributed evenly, he would receive only 10%, which means an excess of 229% over this equal baseline, some of which likely stems from him being the default persona in the system.
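As a worked check of the arithmetic in the caption above, and assuming the excess is measured relative to the equal-share baseline of 10%, the default persona's implied share of total dwell time (denoted s here, a symbol introduced only for this illustration) follows from the reported 229% excess:

```latex
\frac{s - 0.10}{0.10} = 2.29
\quad\Longrightarrow\quad
s = 0.10 \times (1 + 2.29) = 0.329
```

That is, roughly 33% of all dwell time, which is consistent with the caption's "almost one third".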
Overall, these results indicate that (a) the system’s default persona garners the most attention, and (b) a sizeable portion of the users visit the personas in the exact order that the system shows them. Perhaps the system should rotate the default position evenly among the personas to mitigate the “discrimination” arising from this effect. The current logic is that the system loads by default the persona that has the highest audience representation, i.e., the most engagements in the baseline user data from which the personas are created. Testing various rationales for the default persona and the effect of rotation are promising directions for future work. We also observed a potential cultural effect, which possibly reflects users identifying better with personas from their own cultural sphere (see Table 4). Cultural aspects in personas remain an important area of future work, as does the special role of default and exit personas, i.e., those with whom users finish their browsing session (see Figure 12).

Table 4: Ethnic bias? The only three Western personas rank the highest in terms of average view time (apart from Mamdouh, who is the default persona). 89.8% of the participants were from Western countries (Europe and United States). Users may feel more comfortable identifying with personas from their own culture and ethnicity. This also implies persona studies should employ culturally and ethnically diverse samples to obtain internationally valid results.

Persona                 Avg time per visit
Mamdouh (rank = 1)      69.10
Chris (rank = 10)       37.78
John (rank = 6)         33.78
Ashley (rank = 3)       33.49
Alaa (rank = 5)         32.98
Rizky (rank = 8)        28.92
Putri (rank = 9)        28.13
Abdalaziz (rank = 7)    26.53
Muhammad (rank = 4)     24.85
Rahul (rank = 2)        20.79

Figure 12: Exit personas. The number of times a user stopped their session after viewing a given persona.
Because the task dealt with considering a specific persona, it is possible that the exit indicates a higher likelihood of the persona being chosen for the task.

6.7 Modeling Persona User Behavior

Because the PA system records the user’s transitions from one persona information element to another (based on mouse and gaze movements), as well as transitions from one persona to another, there are important opportunities for modeling user behavior, some of which we illustrate here. Figure 13 describes these opportunities through the concept of a persona-gram, which refers to a string of letters depicting a user’s path of visiting either the personas or the information elements within a persona profile. This information can be stored as a state transition matrix (see bottom of Figure 13), which can be further used for computing the probability of a user transitioning from one state to another. In Figure 13, the names on the left illustrate a user’s path of visiting the personas. On the right-hand side (User 1 = U1), the same path is transformed into a string. The string format enables the comparison of different users using Levenshtein’s edit distance (ED) [73]. For example, User 2 (U2) differs from User 1 (U1) in only two string states (bolded in Figure 13), yielding ED1,2 = 2. Users that have a low edit distance are similar to each other in terms of their persona use behavior, whereas users with a high edit distance are behaviorally more different. When computing the distances between all users, it becomes possible to identify “average” behaviors and distinct outlier behaviors. (Moreover, it is possible to consider the duration of each visit to get a higher dimensional representation of the user’s viewing behavior.)

Figure 13: Illustration of persona-grams using imaginary data. The series on the left describes a user’s transition from one persona to another. User 1 on the right is the same sequence transformed into a string.
User 2 has the same sequence except for two differences (A-N, bolded), which means the edit distance is 2, i.e., one needs to make two edits to make the strings identical. The fewer changes one needs to make, the more similar the sequence of viewing personas is between two users.

We computed the average edit distance across the dataset obtained and found the values to be highly dispersed. In other words, two users would rarely view the personas (or the information elements within the personas) in the same or similar order. This finding is interesting in itself, as it implies users’ processing of persona information is more idiosyncratic than anticipated. To give a simple example, Table 5 shows three randomly chosen participants that each have a path length of 10, i.e., they visited persona profiles 10 times during their session.

Table 5: Persona-grams of three randomly chosen users who each visited persona profiles 10 times. The color codes indicate the same persona being visited in the same sequence: yellow is shared by all three users, green is shared by Users 1 and 3, and turquoise by Users 2 and 3. User 3 has a more similar browsing behavior with User 1 than with User 2, and User 1 and User 2 share the least similarity.

Visit   User 1       User 2       User 3
1       Mamdouh      Mamdouh      Mamdouh
2       Alaa         Rizky        Alaa
3       Putri        Chris        Putri
4       Alaa         Rahul        Alaa
5       Rizky        Alaa         Ashley
6       Abdalaziz    Muhammad     Chris
7       Alaa         Abdalaziz    Abdalaziz
8       Putri        Ashley       John
9       Rizky        John         Mamdouh
10      Alaa         Putri        Ashley

As can be seen from Table 5, the behaviors are almost completely unique; for example, the only visit shared among all three users is to the first persona, which is the default shown by the system. Hence, edit distance is a troublesome metric, because in this case, we would need 9 edits to make the paths of User 1 and User 2 identical; ratio-wise, this is 9/10, i.e., a 90% change rate (9 out of 10 path positions are different).
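The persona-gram comparison above can be sketched in a few lines. The sketch below is illustrative only and is not the PA system's code: the three strings encode the Table 5 paths with one letter per persona (M = Mamdouh, A = Alaa, P = Putri, R = Rizky, B = Abdalaziz, C = Chris, H = Rahul, U = Muhammad, S = Ashley, J = John), and the function names are hypothetical. It includes both the classic Levenshtein distance and a simple position-by-position mismatch count; the latter is what reproduces the "9 out of 10" figure for Users 1 and 2, while the Levenshtein distance can never exceed it for equal-length strings because it also permits insertions and deletions.

```python
from collections import Counter

# Table 5 paths as persona-grams (one letter per visited persona).
USER_1 = "MAPARBAPRA"
USER_2 = "MRCHAUBSJP"
USER_3 = "MAPASCBJMS"


def levenshtein(a, b):
    """Edit distance with insertions, deletions, and substitutions (row-wise DP)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def positional_mismatches(a, b):
    """Number of positions at which two equal-length persona-grams differ."""
    if len(a) != len(b):
        raise ValueError("persona-grams must have equal length")
    return sum(x != y for x, y in zip(a, b))


def transition_probabilities(path):
    """First-order transition probabilities estimated from a single path."""
    counts = Counter(zip(path, path[1:]))
    row_totals = Counter()
    for (source, _target), n in counts.items():
        row_totals[source] += n
    return {(source, target): n / row_totals[source]
            for (source, target), n in counts.items()}


if __name__ == "__main__":
    for name, other in (("User 2", USER_2), ("User 3", USER_3)):
        print("User 1 vs", name,
              "| mismatches:", positional_mismatches(USER_1, other),
              "| Levenshtein:", levenshtein(USER_1, other))
    print(transition_probabilities(USER_1))
```

In practice, transition counts would be pooled over many users before normalizing; estimating probabilities from one ten-visit path, as done here, is only meant to show the data structure.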
Due to the high uniqueness demonstrated by this example (which is also accentuated by the fact that the strings are of different lengths across the dataset), the similarity of behaviors could perhaps be measured using other options. For example, User 2 and User 3 viewed 7 of the same personas (Mamdouh, Chris, Alaa, Abdalaziz, Ashley, John, and Putri) and 3 different personas (Rizky, Rahul, Muhammad). So, even though their exact viewing sequences are very different, the users actually viewed more of the same personas than different ones, i.e., there is likeness in their browsing behavior. To quantify this likeness, we can apply set theory to form an intersection (i.e., an overlap of paths). The intuition is that if two users visited more of the same personas than two other users did, their persona browsing behavior was more similar. We can quantify this by calculating the Jaccard coefficient (J), which indicates the overlap between two sets as the size of their intersection divided by the size of their union. This metric is commonly used in information theory to compare sets [44]. Applying J to our examples from Table 5, we can observe that User 2 and User 3 are more similar to each other (J = 0.7) than User 1 and User 2 (J = 0.5) or User 1 and User 3 (J = 0.5). (For replication, the visit sequences from which the sets are formed are: User 1 – M, A, P, A, R, B, A, P, R, A; User 2 – M, R, C, H, A, U, B, S, J, P; and User 3 – M, A, P, A, S, C, B, J, M, S; where M = Mamdouh, A = Alaa, P = Putri, R = Rizky, B = Abdalaziz, C = Chris, H = Rahul, U = Muhammad, S = Ashley, and J = John.) Unlike ED, which is only applicable to pairwise comparison, sets can be expanded from pairwise comparisons to multiple sets (see Figure 14).

Figure 14: Among the 10 available personas, 4 (40%) were viewed by all three users. Users 2 and 3 viewed 3 personas that User 1 did not view, whereas Users 1 and 2 viewed 1 persona that User 3 did not view. User 2 viewed 2 personas that neither of the other users viewed, which indicates this user had the most diverse viewing pattern.
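The set-based comparison above can be reproduced with Python's built-in `set` type. The following is a minimal sketch using the Table 5 paths; the helper names `jaccard` and `viewed_by_all` are hypothetical and chosen only for this illustration.

```python
# Visit paths of the three example users from Table 5.
USER_PATHS = {
    "User 1": ["Mamdouh", "Alaa", "Putri", "Alaa", "Rizky",
               "Abdalaziz", "Alaa", "Putri", "Rizky", "Alaa"],
    "User 2": ["Mamdouh", "Rizky", "Chris", "Rahul", "Alaa",
               "Muhammad", "Abdalaziz", "Ashley", "John", "Putri"],
    "User 3": ["Mamdouh", "Alaa", "Putri", "Alaa", "Ashley",
               "Chris", "Abdalaziz", "John", "Mamdouh", "Ashley"],
}


def jaccard(path_a, path_b):
    """Jaccard coefficient: |intersection| / |union| of the personas visited."""
    set_a, set_b = set(path_a), set(path_b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here for two empty paths
    return len(set_a & set_b) / len(union)


def viewed_by_all(paths):
    """Personas that appear in every one of the given paths."""
    return set.intersection(*(set(path) for path in paths))


if __name__ == "__main__":
    names = list(USER_PATHS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            print(first, "vs", second,
                  jaccard(USER_PATHS[first], USER_PATHS[second]))
    print(sorted(viewed_by_all(USER_PATHS.values())))
```

Note that converting a path to a set discards both visit order and repeat visits, which is precisely the simplification this representation makes.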
Obtaining the number of intersecting elements (i.e., shared personas that any number of participants viewed) is straightforward using basic functions in scripting languages like R and Python, which increases the practical appeal of modeling persona behavior using sets. One can also use sets to compare the behaviors of different groups. We illustrate some of these cases in Figure 15. For example, (a) union can be used for identifying all personas that a group of users viewed, which can be beneficial when the number of personas exceeds a handful, as would be the case for large and heterogeneous online audiences. (b) Intersection shows common elements of two or more users or groups. Intersection can reveal common personas of interest, i.e., those that most participants engaged with. (c) Difference can show personas that one group viewed exclusively, e.g., those that were unique to more experienced persona users. Finally, (d) subset and superset can help make comparisons on the variety of personas visited. For example, in our previous case, the set of personas visited by User 2 is a superset of that of User 1 (i.e., User 2 visited all the personas that User 1 did and more).

Figure 15: Examples of basic set operations and how they can be used for investigating persona user behavior. Panels: (a) Union, (b) Intersection, (c) Difference, (d) Sub- and supersets.

While approaching the analysis of persona user behavior using set theory seems fruitful, we can include even more information in the comparison. Namely, the set representation ignores that the visits are typically unevenly distributed among the personas, both by count and duration. Some personas are viewed more than once; some are viewed considerably longer than others. A set would not consider this information at all. For a representation that considers such information, we can turn to empirical distributions or probability distributions. These indicate how the time or number of visits is allocated between different personas during a user session.
For example, if a user visits Alaa 5 times, Putri 5 times, and John 2 times, the empirical distribution is 5 / 12; 5 / 12; 2 / 12; or [0.42, 0.42, 0.17]. For a finite set of personas and users, we can compute a complete probability distribution for each user, where complete implies that each persona-user pair will have a value. Then, from information theory, we can use several metrics to compare the obtained probability distributions. These are known as statistical distance metrics (e.g., Kullback–Leibler (KL) divergence or Jensen-Shannon distance). The smaller the distance between two users, the closer their behavior is in terms of how they divide their time among the available personas. Applying this logic to our dataset, we obtain the probability distributions indicated in Table 6.

Table 6: Distribution of example users’ visits among the shown personas. Non-zero values are highlighted.

          Mamdouh   Rahul   Ashley   Muhammad   Alaa   John   Abdalaziz   Rizky   Putri   Chris
User 1    10%       0%      0%       0%         40%    0%     10%         20%     20%     0%
User 2    10%       10%     10%      10%        10%    10%    10%         10%     10%     10%
User 3    20%       0%      20%      0%         20%    10%    10%         0%      10%     10%

The distance D between two distributions can be computed using the following equation,

$$D_{p,q} = \frac{1}{N} \sum_{i=1}^{N} \left| p(x_i) - q(x_i) \right|,$$

where p and q are the distributions to compare (e.g., User 1 and User 2) and N is the number of personas. The formula calculates the absolute difference between the two users for each persona x_i and then takes the average as the distance. Unlike KL divergence, which is non-symmetrical (i.e., the divergence from User 1 to User 2 might not equal that from User 2 to User 1), this formula gives symmetrical results (i.e., D_{p,q} = D_{q,p}). (As a sidenote, symmetry is, of course, desirable for our purpose, because there is no reason to assume that the results should differ when comparing User 1 to User 2 or vice versa; in both cases, the sequences are the same.)
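As a sketch of the formula above, the visit distributions of Table 6 can be derived from the Table 5 paths and compared pairwise. The function names (`visit_distribution`, `mean_abs_distance`) are hypothetical; the persona order follows Table 6.

```python
from collections import Counter

# Persona order as in Table 6.
PERSONAS = ["Mamdouh", "Rahul", "Ashley", "Muhammad", "Alaa",
            "John", "Abdalaziz", "Rizky", "Putri", "Chris"]

# Visit paths of the three example users (Table 5).
USER_PATHS = {
    "User 1": ["Mamdouh", "Alaa", "Putri", "Alaa", "Rizky",
               "Abdalaziz", "Alaa", "Putri", "Rizky", "Alaa"],
    "User 2": ["Mamdouh", "Rizky", "Chris", "Rahul", "Alaa",
               "Muhammad", "Abdalaziz", "Ashley", "John", "Putri"],
    "User 3": ["Mamdouh", "Alaa", "Putri", "Alaa", "Ashley",
               "Chris", "Abdalaziz", "John", "Mamdouh", "Ashley"],
}


def visit_distribution(path, personas=PERSONAS):
    """Share of a user's visits that went to each persona, in a fixed order."""
    if not path:
        raise ValueError("path must contain at least one visit")
    counts = Counter(path)
    return [counts[persona] / len(path) for persona in personas]


def mean_abs_distance(p, q):
    """D_{p,q}: mean absolute difference between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same personas")
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


if __name__ == "__main__":
    dists = {user: visit_distribution(path) for user, path in USER_PATHS.items()}
    names = list(dists)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = mean_abs_distance(dists[first], dists[second])
            print(f"D({first}, {second}) = {d:.2f}")
```

The same function could be applied to dwell-time shares instead of visit shares by replacing the counts with summed durations per persona.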
When inputting the fractions from Table 6 into this formula, we can see the results aligning with the J comparison: Users 2 and 3 are the most similar (D = 0.06), whereas User 1 is equally distant from User 2 (D = 0.10) and User 3 (D = 0.10). This example illustrates how concepts from information science can be leveraged for persona science, namely, by understanding persona viewing behaviors as probability distributions and then computing the distance between them. Essentially, the smaller these distance values are, the closer two persona viewing behaviors are (as the behaviors are represented as probability distributions).

6.8 Interpreting the Metrics

It is well known that general rules about whether a metric value is “good” or “bad” are difficult to draw. For example, for an entertainment website or social media service, it is desirable that users spend a lot of time on the site, because time on site is positively correlated with revenue [100]. However, the opposite applies for government information websites or search engines: the user is expected to find the information as quickly as possible and then leave the site. So, in some cases, a short engagement time is optimal; in other cases, it is not. For personas, the same applies, with even more nuance. In many professional tasks, users are time-pressed; they have deadlines, they want the information immediately, and so on. For these scenarios, low engagement time with a persona system would be considered a good sign (given that the user was able to complete their task successfully or that the personas helped the user). However, in scenarios where the user is conducting end-user research (e.g., market research), they might be interested in delving deep into the persona details; a low engagement time would therefore not indicate that the personas were useful for the user. We provide further examples, as the matter of defining and computing various metrics is not trivial.
Example 1: Consider a user who visits 5 personas during the session and makes 20 visits overall. Applying the average, we get 20 / 5 = 4 visits per persona. However, the average hides a comparative aspect: if the user visits these 5 personas during their first 5 visits, the behavioral pattern is different from visiting the 5 personas during their first 10 visits. In the former case, the user is first visiting many personas, and then spending the rest of the time comparing them. In the latter case, the user is engaging in comparative behavior already while visiting the first personas. These two strategies are different, but a simple average would miss the nuance. To quantify such patterns, we can compute x = how many personas the user visited and y = how many visits it took the user to visit each of the x personas at least once. This metric can be called “persona scanning tendency” or PST. Using the above example, the scores would be different: PST5 = 5 / 5 = 1 versus PST10 = 5 / 10 = 0.5. A lower PST score would indicate a smaller tendency to scan all or many personas first and then delve into their information, and vice versa. A high PST score could also be associated with a linear user behavior, in which the user sequentially visits the available personas. In contrast, a low PST score could be associated with non-linear user behavior. For example, if it took the user ten visits to see all the five personas, they were likely doing comparisons along the way. Therefore, this PST metric can reveal insights about different users’ tendency to compare personas against one another and proceed in an organized manner when browsing the available personas.

Example 2: How to measure how diversely a user browses the personas? Highly diverse behavior would be one that looks at many personas. For example, if there were a total of 10 personas available, 5 out of 10 personas visited is less diverse than 10 out of 10 personas visited.
That is, persona coverage would seem like a good metric (being 5 / 10 = 0.5 and 10 / 10 = 1 for the above cases). But, if the user only made 5 visits and visited 5 personas (out of the 10), then that is more diverse than making 12 visits and visiting 6 personas. In other words, when assessing the diversity of visit behavior, understood in this manner, the denominator should be the number of visits the particular user made, not the number of available personas. So, even if persona coverage would be higher for the user visiting 6 personas (6 / 10 = 0.6), their visit diversity would still be lower than that of the user visiting one persona fewer (i.e., 6 / 12 = 0.5 vs. 5 / 5 = 1).

There are also cases where different granularity would be needed, for example, considering all information elements (of which there are 120) versus only parent information elements (of which there are 8), or considering forward-only movement versus both forward and backward movements (i.e., across information elements and/or personas). Depending on these choices, the dimensionality of the analyzed dataset can greatly increase or decrease.

Overall, these examples highlight the non-trivial nature of measuring and understanding persona user behavior. Interpreting PA metrics follows the general patterns in UX research: a high value can indicate either a desirable or undesirable effect depending on the task type and user goals. For example, longer dwelling time may be a sign of more interest, but it could just as well be a sign of confusion and disorientation. Aligning the metrics with expressed user perceptions (e.g., “I was confused when using the personas”) can be useful in this regard. Overall, case-dependent interpretation is required.
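The three metrics from Examples 1 and 2 (persona scanning tendency, persona coverage, and visit diversity) can be computed directly from a visit path. The sketch below follows the definitions given in the text; the function names and the letter-coded example paths are hypothetical and serve only as illustration.

```python
def persona_coverage(path, n_available):
    """Distinct personas visited divided by the number of available personas."""
    return len(set(path)) / n_available


def visit_diversity(path):
    """Distinct personas visited divided by the number of visits made."""
    if not path:
        raise ValueError("path must contain at least one visit")
    return len(set(path)) / len(path)


def persona_scanning_tendency(path):
    """PST = x / y, where x is the number of distinct personas visited and
    y is the number of visits it took to see each of them at least once."""
    if not path:
        raise ValueError("path must contain at least one visit")
    seen = set()
    visits_until_all_seen = 0
    for position, persona in enumerate(path, start=1):
        if persona not in seen:
            seen.add(persona)
            visits_until_all_seen = position  # last visit that added a new persona
    return len(seen) / visits_until_all_seen


if __name__ == "__main__":
    # Two hypothetical 20-visit sessions over the same five personas (A-E).
    scan_first = list("ABCDE" + "ABCDEABCDEABCDE")          # all five seen by visit 5
    compare_early = list("ABABCACDBE" + "ABCDEABCDE")       # all five seen by visit 10
    print(persona_scanning_tendency(scan_first))
    print(persona_scanning_tendency(compare_early))
    print(persona_coverage(scan_first, n_available=10))
    print(visit_diversity(scan_first))
```

Both example sessions have the same average of four visits per persona, yet their PST values differ, which is the nuance the text argues a simple average would miss.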
7 Discussion

7.1 Highlights and Novelty

Even though there is some empirical evidence showing the effectiveness of personas for specific tasks [78], the body of knowledge as a whole remains underdeveloped, even after 20 years of personas being part of HCI research and practice. The purpose of PA is to help generate new knowledge on persona user behavior towards the advancement of persona science. Examples include behaviors related to the order of presenting the personas, revisit frequency, users’ styles of browsing and comparing personas, and persona choice, i.e., how and why people choose a specific persona for their decision-making task. Analyzing users’ typical viewing patterns and dwell time per piece of persona information can inform persona information design (i.e., what information users interact with most) and help build an understanding of persona usage based on a real system and real or realistic use cases and scenarios. Combined with algorithmically generated persona systems, this opens the possibility of updating persona profiles in real time based on expressed user needs.

It is well known that the body of knowledge on personas relies on a high number of case studies as opposed to repeated experiments with independent samples (i.e., different organizations, locations, users, etc.). Even when a case study focuses on a specific context, multiple case studies are needed to establish something more generalizable than a single case can [99]. Without more rigor, persona research cannot progress much, and it risks going in circles instead of establishing evidence-based empirical phenomena. Examples of this “going in circles” include conflicting findings about personas being applicable and not being applicable (e.g., [70] vs. [60]); to date, nobody has explained when personas are applicable and when they are not. The only way to achieve plausible explanations is to carry out repeated testing and measure the results empirically, for which PA provides a way.
Users’ interaction with personas can be measured in many ways. Here, we focused on two commonly used technologies, mouse tracking and eye tracking, as they have unique strengths and weaknesses, and both can be implemented in a Web-based interactive persona system. As acknowledged in many HCI studies, eye and mouse tracking are helpful techniques for studying user engagement with interactive systems [15,91], and with personas in particular [26,76,78,85,86]. When integrated directly into an interactive persona system, these data collection methods can deliver rich datasets for describing persona use and modeling complex user behaviors [18,22]. Moreover, integrating these techniques also enables user studies during exceptionally difficult times (e.g., during a global pandemic) when it is either impossible or very difficult to conduct in-person user studies.

Highlights of this work include the following:
• Providing conceptual underpinnings of persona analytics and persona science, two emerging and promising concepts for HCI and user segmentation.
• Developing a novel persona analytics system embedded within a persona system, by integrating mouse- and eye-tracking functionalities. Measuring users’ mouse and gaze activities grants an understanding of the behaviors of a persona user. Studying the behavior of the persona user is essential to generate empirical knowledge and theories of personas and human-persona interaction.
• Demonstrating an end-to-end experimental loop for empirical persona studies, using an interactive persona system and data collection platforms with a user study of 144 participants. Remote user studies such as this help scale persona user studies from the conventional 30 or so to hundreds of participants.
• Providing exploratory findings of persona user behavior, based on the use of PA, its metrics, and various statistical techniques.
The trends contributing to interactive persona systems are likely to continue, including the evolution of (1) digital user data available from online analytics platforms through APIs, (2) data science algorithms and libraries that integrate into the quantitative persona creation process, and (3) the web medium, which surpasses the limitations of paper for persona delivery and user engagement. Interactive persona systems commonly rely on web standards [33] that enable users to access personas from any device with an internet connection and make it possible to record persona users’ behavior and analyze it using PA, a customized solution for tracking users’ interaction with persona profiles. This system, therefore, has value and potential for advancing persona science and the development of interactive persona systems for years to come.

The current work puts forward a new artifact for studying the behavior of persona users. While previous persona user studies [73,76,78] have analyzed dwell time and users’ information viewing sequence, these metrics have, as far as we know, not been captured directly from an interactive persona system in any previous research. Thus, the presented solution has novelty in its field. As far as we know, this study presents the first “online laboratory” solution for interactive persona systems, during a period in history when there is a need for remote user study solutions.

7.2 Novel Opportunities to Push Forward Persona Science

Persona science needs progress on all fronts, keeping an eye on long-term theory formulation but also investing in short-term returns through the use of empirical methods. Persona science can contribute to a much-needed transition beyond the general claims that “personas work” or “personas do not work”, or the repetition of their “benefits” and “problems”, towards systematically examining the conditions under which the effects emerge. Many human factors have not been investigated empirically.
Based on case studies, variables of special interest include at least (a) Experience [81], (b) Task type [4], (c) Job role [59], and (d) Culture [56]. Which combinations of these human factors to investigate would depend on specific research objectives. Systematic analysis of these variables in experimental studies can produce long-lasting, consistent, and robust knowledge on personas and their users, extending the boundaries of persona research.

First, the effect of users’ experience with personas on their behaviors: experience tends to be reported in persona studies but not included as a variable. How does novice users’ use of personas differ from that of more experienced users? Can the behaviors of more experienced users be used to guide novice users to learn to use personas more efficiently?

Second, the task and, more specifically, the task type. This is reported but rarely controlled; most typically, only one task type is deployed, in only one empirical setting, without repetition to achieve robustness. As a result, the current body of literature has, for example, no comparison between different task types (e.g., design, content creation, ad targeting). Personas can be deployed for a range of professional tasks, but no study compares what kind of personas are ideal for the various task types and whether users approach the personas differently based on the task type.

Third, the users’ job roles. Again, these are reported, but comparisons among different roles and organizational units are rarely conducted, even though it is common sense that a person’s job position would greatly affect how they use personas to support their work. Studies tend to mention “designers”, but a closer look at these users’ job positions reveals that they work in multiple departments, have multiple different perspectives on the end-user, and require very different information for their decision making.

Finally, culture.
Previous research has hinted at cultural effects related to persona use [83], but there is no adequate understanding of how the cultural match between the shown personas and the users mediates the interaction and whether personas themselves can help bridge cultural gaps in design.

Table 7 proposes a preliminary research roadmap. Naturally, due to the enormous scope of potential research topics, this proposal is not a complete one. However, it maps some relevant questions that can be addressed via PA.

Table 7: Research roadmap for empirical persona research. This roadmap includes research questions that PA can help address. By solving multiple research questions, researchers can start formulating a unified theory of personas that would explain, with robust empirical evidence, when personas work and when they do not, what factors govern their successful use, and how persona creators and champions can increase the likelihood of successful persona projects.

Open Research Question (ORQ) and Sub-Questions (SQ) | Useful for…* | Variation by...
--- | --- | ---
ORQ1: What types of personas are most/least viewed? • SQ1a: Are there differences based on personas’ gender, age, nationality, ethnic background? • SQ1b: Are some types of personas systematically disadvantaged in terms of how frequently and how long users interact with them? If so, can system features, e.g., rearranging the order of showing, mitigate such disadvantages? | Persona Creation and Use ⇒ How do users interact with persona profiles? | experience, task type, job role, culture
ORQ2: What persona information was most/least viewed? • SQ2a: How is users’ consumption of persona information affected by the position and screen size of the information? • SQ2b: Is some information considered redundant regardless of its position and screen size? | Persona Information Design ⇒ What information do persona users pay attention to? |
ORQ3: How does the user transition between (a) the personas and (b) the information elements in the persona profiles? • SQ3a: What is the degree of linearity / predictability? • SQ3b: Are the dwell times consistently increasing / decreasing? | Persona Information Design ⇒ How can we model users’ cognitive styles of using personas? |
ORQ4: How does increasing/decreasing the number of personas affect user behavior? • SQ4a: What interaction techniques can help users cope with more than a handful of personas? • SQ4b: What is the extra cognitive cost of adding a persona? | Persona User Behavior ⇒ What is the optimal number of personas? |
ORQ5: How do the viewed (a) personas and (b) persona information influence users’ design choices? • SQ5a: What is the effect of persona use on desirable outcomes such as increased usability, satisfaction, profitability? • SQ5b: How can these outcomes be measured? | Persona Impact, Value of Personas ⇒ What is the real value of personas? |

*NOTES: The categories of Persona Creation, Validation, Use, and Impact originate from [77]. ⇒ indicates a higher abstraction of the ORQ in question.

7.3 Practical Implications

Benefits for Researchers. Persona science is the application of scientific methods to persona research, with the goal of producing empirically valid knowledge about persona creation, validation, use, and impact for design outcomes, individual decision making, and organizations. PA supports persona science by aiding in the scientific process, such as the generation and testing of hypotheses: data exploration → inductive analysis → propositions → hypotheses → independent data collection → hypothesis testing → theory generation of personas and their users. PA can be deployed for both inductive and deductive research. Deductive research typically formulates hypotheses and relies on experiments to test those hypotheses. The hypotheses can be inspired by works in HCI, social psychology, economics, etc.
By manipulating variables in the interactive persona system, such studies can be conducted in person or remotely (see Figure 7, which illustrates an end-to-end loop for remote participation). Inductive studies rely on freely exploring user behaviors to formulate propositions that can later be tested in controlled settings. Here, we conduct a series of exploratory analyses to demonstrate the type of information, metrics, and analyses that PA can afford to researchers.

Benefits Relative to Pre-Existing Solutions. From a practical point of view, the reader might ask, “Why not just use Google Analytics for this purpose? Why develop a new system?” There are several reasons. Compared to pre-existing industry solutions, such as Google Analytics (GA), which is the dominant Web analytics service [65], PA has four major benefits, summarized in Table 8 and explained thereafter:

Table 8: PA vs. GA – a practical comparison.

 | Persona Analytics | Google Analytics
--- | --- | ---
Data ownership | x | -
Tracking customization | x | Partial
Clickstream logs | x | Requires Premium version
Persona metric exports | x | -

• Data ownership: The data is recorded on our servers, not on third-party servers such as Google’s.
• Customized event and information tagging: As we know the exact functionality of the APG system, we can label the information elements and events with suitable names from the outset.
• Clickstream data: Unlike in the standard GA installation, we are able to obtain raw log data of all actions taken by a user in the APG system. The standard GA installation only provides aggregate data export.
• Tailored reports for persona user analysis: GA reports are designed for websites, not for custom-built systems like interactive persona systems. Therefore, the available metrics and reports are not suitable for analytical questions related to personas. The development of our reports was inspired by the analytical tasks for which the data would be used.
Thus, the reports serve persona research better than the reports in GA.

With these benefits, the PA system is better equipped for tracking persona user behavior than GA, and therefore has the potential to serve researchers more adequately.

Benefits for end-users. Personas are an important tool for professionals in various domains, which is why understanding how people use them is instrumental for creating better end-user insight tools and thereby advocating more customer- and user-centric thinking in organizations. The practical implementation of PA also requires consideration of the users’ privacy: users need to know that their usage of the system is being tracked. Therefore, we notify users of this tracking in APG’s terms of service, similar to any other website that tracks user behavior. Additional ethical considerations include acquiring consent from users when conducting user studies that track their system usage behavior, as well as approval from an institutional review board (IRB) when there is reason to suspect that the research deals with topics that warrant ethical scrutiny (in persona user studies, harmful scenarios tend to be rare but could nonetheless exist for some study topics).

7.4 Specific Application Areas for Future Research and Development

There are several areas in which to further deepen the level of analysis. The data produced by the PA system can be examined using many computationally advanced approaches, for example, by creating a persona state-transition matrix and applying Markov chain modeling or neural networks to the constructed matrix to model historical dependencies (similar to [94]). Similarly, future work could look into predicting user outputs (e.g., task success) using behavioral features. There are several architectures for dealing with sequential data, including recurrent neural networks (RNNs) [10], that could be used for modeling and could result in persona recommenders or user behavior classification.
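To make the idea of a persona state-transition matrix concrete, the sketch below estimates first-order Markov transition probabilities from one user's ordered sequence of persona visits. It is only an illustrative sketch under stated assumptions: the function name, the nested-dictionary representation, and the persona labels are hypothetical and do not describe the PA implementation or the approach in [94].

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def transition_probabilities(
    visits: Sequence[Hashable],
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(next persona | current persona) from consecutive visits."""
    counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    # Count every observed move from one visit to the following visit.
    for current, following in zip(visits, visits[1:]):
        counts[current][following] += 1

    matrix: Dict[Hashable, Dict[Hashable, float]] = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Normalize each row so that its probabilities sum to 1.
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


if __name__ == "__main__":
    # Hypothetical visit sequence; the labels stand in for persona ids.
    session = ["A", "B", "A", "C", "A", "B"]
    for current, row in transition_probabilities(session).items():
        print(current, row)
```

Rows are only created for personas that have at least one outgoing transition, so the last persona in a session may have no row of its own. Pooling the counts over many users before normalizing would be a natural extension for group-level analysis.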
While we leave these additional considerations for future work, we do want to point out two prominent opportunities:
• Prediction. By varying the personas, their information, or interactive features, the effect of such variables on outcome variables such as persona choice, task completion success, quality, time, perceived usefulness, or design task impact (e.g., usability improvement) can be measured. For example, when there are many personas available, users often choose a specific persona for their task. This choice matters because selecting one persona over another can influence how different user groups’ needs are considered (or not considered).
• Persona recommenders. For large and heterogeneous populations, the appropriate number of personas can be in the hundreds or more to correctly represent the diversity of the user population [11]. Therefore, there is a need, on the one hand, to show more personas to users, and on the other hand, to build tools and methods for users or systems to narrow down the number of candidate personas for a given task, without taking away users’ choice of reaching beyond this candidate pool to browse marginalized or fringe user groups [21]. Approaches for achieving this may include recommending users a specific set of personas based on task criteria or user modeling. Simple examples that have already been implemented in interactive persona systems include sorting the personas based on their segment size (i.e., how many people they represent in the baseline data), either in decreasing order (when wanting to see the most representative personas) or in increasing order (when wanting to see outlier personas). However, more elaborate persona recommenders are missing to date. The recommenders’ value increases as the number of personas increases because large persona sets with hundreds or even thousands of personas [27] pose a manageability problem for a human to deal with efficiently.
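The simple segment-size ordering mentioned above can be sketched as follows. This is a minimal illustration, assuming each persona carries a segment size taken from the baseline data; the `Persona` class, the `rank_by_segment_size` function, and the example values are hypothetical and are not part of any existing persona system.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Persona:
    name: str
    segment_size: int  # how many people the persona represents


def rank_by_segment_size(
    personas: Sequence[Persona],
    outliers_first: bool = False,
    top_k: Optional[int] = None,
) -> List[Persona]:
    """Order personas by segment size and optionally keep the first top_k.

    outliers_first=False puts the most representative personas first;
    outliers_first=True puts the smallest segments (outliers) first.
    """
    ranked = sorted(
        personas,
        key=lambda p: p.segment_size,
        reverse=not outliers_first,
    )
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    # Made-up personas for demonstration only.
    pool = [
        Persona("P1", 1200),
        Persona("P2", 90),
        Persona("P3", 4300),
    ]
    print([p.name for p in rank_by_segment_size(pool, top_k=2)])
    print([p.name for p in rank_by_segment_size(pool, outliers_first=True)])
```

Keeping the full ranked list available (rather than discarding personas beyond `top_k`) would preserve users' ability to browse beyond the candidate pool.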
Finally, the PA approach could be applied to other forms of profile systems, including social media profiles, gaming avatar profiles, and so on. Although the system was designed with personas in mind, PA-type analytics can be deployed whenever the use case of wanting to learn how people interact with profiles is adequately similar.

8 Conclusion

In this work, we demonstrated a fully functional persona analytics system embedded within a fully functional interactive persona system. We demonstrated ways in which persona analytics can reveal structural patterns in user interaction with personas. These patterns can lead to the advancement of persona science and to theoretical propositions for persona-user interaction. Persona research lacks empirical studies, so there is plenty of room for contributions by keen researchers. To this end, we hope our research encourages others to pursue empirical research questions in the persona domain. The techniques we demonstrated have special value during exceptional times when physical user studies are hindered by social distancing. Finally, we proposed seven metrics that can be computed from the persona analytics data. In addition to these metrics, devising new quantitative metrics for persona studies is a valuable research direction.

References

[1] Mohamed Aboelmaged and Samar Mouakket. 2020. Influencing models and determinants in big data analytics research: A bibliometric analysis. Information Processing & Management 57, 4 (July 2020), 102234. DOI:https://doi.org/10.1016/j.ipm.2020.102234
[2] Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2018. Customer segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data. Social Network Analysis and Mining 8, 1 (2018), 54. DOI:https://doi.org/10.1007/s13278-018-0531-0
[3] Jisun An, Haewoon Kwak, Joni Salminen, Soon-gyo Jung, and Bernard J. Jansen. 2018. Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data. ACM Transactions on the Web (TWEB) 12, 4 (2018), 27. DOI:https://doi.org/10.1145/3265986
[4] F. Anvari, D. Richards, M. Hitchens, and M. A. Babar. 2015. Effectiveness of Persona with Personality Traits on Conceptual Design. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, 263–272. DOI:https://doi.org/10.1109/ICSE.2015.155
[5] Farshid Anvari, Deborah Richards, Michael Hitchens, Muhammad Ali Babar, Hien Minh Thi Tran, and Peter Busch. 2017. An empirical investigation of the influence of persona with personality traits on conceptual design. Journal of Systems and Software 134, (December 2017), 324–339. DOI:https://doi.org/10.1016/j.jss.2017.09.020
[6] Farshid Anvari, Deborah Richards, Michael Hitchens, and Hien Minh Thi Tran. 2019. Teaching User Centered Conceptual Design Using Cross-Cultural Personas and Peer Reviews for a Large Cohort of Students. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), 62–73. DOI:https://doi.org/10.1109/ICSE-SEET.2019.00015
[7] M. Aoyama. 2005. Persona-and-scenario based requirements engineering for software embedded in digital consumer products. In Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE’05), Washington, DC, USA, 85–94. DOI:https://doi.org/10.1109/RE.2005.50
[8] M. Aoyama. 2007. Persona-Scenario-Goal Methodology for User-Centered Requirements Engineering. In Proceedings of the 15th IEEE International Requirements Engineering Conference (RE 2007), Delhi, India, 185–194. DOI:https://doi.org/10.1109/RE.2007.50
[9] Ernesto Arroyo, Ted Selker, and Willy Wei. 2006. Usability tool for analysis of web designs using mouse tracks. In CHI’06 extended abstracts on Human factors in computing systems, 484–489.
[10] Homanga Bharadhwaj. 2019. Explainable recommender system that maximizes exploration. In Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion, 1–2.
[11] Chris Chapman, Edwin Love, Russell P. Milham, Paul ElRif, and James L. Alford. 2008. Quantitative Evaluation of Personas as Information. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1107–1111. DOI:https://doi.org/10.1177/154193120805201602
[12] Chris Chapman and Russell P. Milham. 2006. The Personas’ New Clothes: Methodological and Practical Arguments against a Popular Method. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 634–636. DOI:https://doi.org/10.1177/154193120605000503
[13] Eric Chu, Prashanth Vijayaraghavan, and Deb Roy. 2018. Learning Personas from Dialogue with Attentive Memory Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2638–2646. Retrieved June 20, 2019 from https://www.aclweb.org/anthology/D18-1284
[14] Theresa Bilitski Clarke and Bernard J. Jansen. 2017. Conversion potential: a metric for evaluating search engine advertising performance. Journal of Research in Interactive Marketing 11, 2 (2017), 142–159.
[15] Argyris Constantinides, Marios Belk, Christos Fidas, and Andreas Pitsillides. 2020. An eye gaze-driven metric for estimating the strength of graphical passwords based on image hotspots. In Proceedings of the 25th International Conference on Intelligent User Interfaces, 33–37.
[16] Alan Cooper. 1999. The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity (1st ed.). Sams - Pearson Education, Indianapolis, IN.
[17] Giuseppe Desolda, Rosa Lanzilotti, Danilo Caivano, and Maria F. Costabile. 2021. An experience on remote testing exploiting new web technology. In INTERACT’21 Workshop on Remote user Testing – Experiences and Trends, Bari, Italy.
[18] A. T. Duchowski. 2009. Eye Tracking Methodology: Theory and Practice. Springer, London.
[19] Hermann Ebbinghaus. 2013. Memory: A contribution to experimental psychology. Annals of neurosciences 20, 4 (2013), 155.
[20] Achim Ebert, Shah Rukh Humayoun, Norbert Seyff, Anna Perini, and Simone D.J. Barbosa (Eds.). 2016. Usability- and Accessibility-Focused Requirements Engineering. Springer International Publishing, Cham. DOI:https://doi.org/10.1007/978-3-319-45916-5
[21] Joy Goodman-Deane, Sam Waller, Dana Demin, Arantxa González-de-Heredia, Mike Bradley, and John P. Clarkson. 2018. Evaluating Inclusivity using Quantitative Personas. In Proceedings of Design Research Society Conference 2018, Limerick, Ireland. DOI:https://doi.org/10.21606/drs.2018.400
[22] Thomas Grindinger, Andrew T. Duchowski, and Michael Sawyer. 2010. Group-wise Similarity and Classification of Aggregate Scanpaths. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (ETRA ’10), ACM, New York, NY, USA, 101–104. DOI:https://doi.org/10.1145/1743666.1743691
[23] Jonathan Grudin. 2006. Why Personas Work: The Psychological Evidence. In The Persona Lifecycle, John Pruitt and Tamara Adlin (eds.). Elsevier, 642–663. DOI:https://doi.org/10.1016/B978-012566251-2/50013-7
[24] Jonathan Grudin and John Pruitt. 2002. Personas, Participatory Design and Product Development: An Infrastructure for Engagement. In Proceedings of Participation and Design Conference (PDC2002), Sweden, 8.
[25] Jacek Gwizdka, Rahilsadat Hosseini, Michael Cole, and Shouyi Wang. 2017. Temporal dynamics of eye-tracking and EEG during reading and relevance decisions. Journal of the Association for Information Science and Technology 68, 10 (October 2017), 2299–2312. DOI:https://doi.org/10.1002/asi.23904
[26] Charles G. Hill, Maren Haag, Alannah Oleson, Chris Mendez, Nicola Marsden, Anita Sarma, and Margaret Burnett. 2017. Gender-Inclusiveness Personas vs. Stereotyping: Can We Have it Both Ways? In Proceedings of the 2017 CHI Conference, ACM Press, Denver, Colorado, USA, 6658–6671. DOI:https://doi.org/10.1145/3025453.3025609
[27] Bernard J. Jansen, Soon-gyo Jung, and Joni Salminen. 2019. Creating Manageable Persona Sets from Large User Populations. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, Glasgow, United Kingdom, 1–6. DOI:https://doi.org/10.1145/3290607.3313006
[28] Bernard J. Jansen, Soon-gyo Jung, and Joni Salminen. 2020. From flat file to interface: Synthesis of personas and analytics for enhanced user understanding. Proceedings of the Association for Information Science and Technology 57, 1 (October 2020). DOI:https://doi.org/10.1002/pra2.215
[29] Bernard J. Jansen, Joni Salminen, and Soon-gyo Jung. 2020. Data-Driven Personas for Enhanced User Understanding: Combining Empathy with Rationality for Better Insights to Analytics. Data and Information Management 4, 1 (2020), 1–17. DOI:https://doi.org/10.2478/dim-2020-0005
[30] Bernard Jansen, Joni Salminen, Soon-gyo Jung, and Kathleen Guan. 2021. Data-Driven Personas (1st ed.). Morgan & Claypool Publishers. Retrieved February 10, 2021 from https://www.morganclaypool.com/doi/abs/10.2200/S01072ED1V01Y202101HCI048
[31] Joel Järvinen and Heikki Karjaluoto. 2015. The use of Web analytics for digital marketing performance measurement. Industrial Marketing Management 50, Supplement C (October 2015), 117–127. DOI:https://doi.org/10.1016/j.indmarman.2015.04.009
[32] Soon-gyo Jung, Jisun An, Haewoon Kwak, Moeed Ahmad, Lene Nielsen, and Bernard J. Jansen. 2017. Persona Generation from Aggregated Social Media Data. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA ’17), ACM, Denver, Colorado, USA, 1748–1755.
[33] Soon-Gyo Jung, Joni Salminen, Jisun An, Haewoon Kwak, and Bernard J Jansen. 2018. Automatically Conceptualizing Social Media Analytics Data via Personas. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2018), San Francisco, California, USA, 2.
[34] Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2019. Personas Changing Over Time: Analyzing Variations of Data-Driven Personas During a Two-Year Period. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems - CHI EA ’19, ACM Press, Glasgow, Scotland, UK, 1–6. DOI:https://doi.org/10.1145/3290607.3312955
[35] Soon-Gyo Jung, Joni Salminen, and Bernard J. Jansen. 2020. Giving Faces to Data: Creating Data-Driven Personas from Personified Big Data. In Proceedings of the 25th International Conference on Intelligent User Interfaces Companion (IUI ’20), Association for Computing Machinery, Cagliari, Italy, 132–133. DOI:https://doi.org/10.1145/3379336.3381465
[36] Soon-gyo Jung, Joni Salminen, and Bernard J Jansen. 2021. Implementing Eye-Tracking for Persona Analytics. In ETRA ’21 Adjunct: ACM Symposium on Eye Tracking Research and Applications, ACM, Virtual conference, 1–4. DOI:https://doi.org/10.1145/3450341.3458765
[37] Soon-gyo Jung, Joni Salminen, and Bernard J. Jansen. 2021. Persona Analytics: Implementing Mouse-tracking for an Interactive Persona System. In Extended Abstracts of ACM Human Factors in Computing Systems - CHI EA ’21, ACM, Virtual conference. DOI:https://doi.org/10.1145/3411763.3451773
[38] Soon-gyo Jung, Joni Salminen, Haewoon Kwak, Jisun An, and Bernard J. Jansen. 2018. Automatic Persona Generation (APG): A Rationale and Demonstration. In CHIIR ’18: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, ACM, New Jersey, USA, 321–324. DOI:https://doi.org/10.1145/3176349.3176893
[39] Pascal J. Kieslich, Felix Henninger, Dirk U. Wulff, Jonas MB Haslbeck, and Michael Schulte-Mecklenbeck. 2019. Mouse-Tracking: A Practical Guide to Implementation and Analysis. In A handbook of process tracing methods. Routledge, 111–130.
[40] Ari Kolbeinsson, Erik Brolin, and Jessica Lindblom. 2021. Data-Driven Personas: Expanding DHM for a Holistic Approach. In International Conference on Applied Human Factors and Ergonomics, Springer, 296–303.
[41] Dannie Korsgaard, Thomas Bjørner, Pernille Krog Sørensen, and Paolo Burelli. 2020. Creating user stereotypes for persona development from qualitative data through semi-automatic subspace clustering. User Model User-Adap Inter 30, 1 (March 2020), 81–125. DOI:https://doi.org/10.1007/s11257-019-09252-5