Buddhi Weerasekara

Sentiment Analysis and Stock Volatility: Evidence from Financial News Headlines Published Related to Finland and NASDAQ Helsinki

Vaasa 2024

School of Accounting and Finance
Master's thesis in Finance
Master's Degree Programme in Finance

Contents
1 Introduction
1.1 Motivation
1.2 Finland
1.3 Identification of Gaps
1.4 Background
2 Literature Review and Critical Analysis
2.1 Sentiment Analysis
2.2 Model and Data Gaps
3 Data
3.1 Data Process Chart
3.2 Pre-Processing Data
3.3 Sentiment Score Calculation
3.3.1 FinBERT
3.3.2 Naïve Bayes and SVM
3.3.3 PYsentiment2
4 Methodologies
4.1 Statistical Models
4.1.1 Vector Autoregressive Model (VAR)
4.1.2 Autoregressive Distributed Lag Model (ARDL)
4.1.3 VAR vs ARDL
4.2 Hypothesis 1 – Correlation
4.3 Hypothesis 2 – Lagged Effect
4.4 Hypothesis 3 – High and Low Volatile Companies
4.5 Hypothesis 4 – Macroeconomic Rates
5 Test and Results
5.1.1 Testing Data for Descriptive Statistics
5.1.2 Alternative Models
5.1.3 Unit Root Test
5.1.4 Optimal Lag Test
5.1.5 Autocorrelation Test
5.1.6 Correlogram Q Statistics Test
5.1.7 Serial Correlation LM Test
5.1.8 Adjusting for Autocorrelation
5.2 Results
6 Conclusions
7 Further Research and Limitation
8 References
9 Appendix – Statistical Results
9.1 Statistical Test Results – EViews
9.2 Python Workfiles

List of Images, Figures, Tables, Equations, and Appendices

Images
Image 1. Sentiment map from Finviz.com
Image 2. Visual sentiment web application
Image 3. Sample sentiment analysis heatmap
Image 4. Word frequency map of Finnish financial news

Figures
Figure 1. Data processing map
Figure 2. Word frequency bar chart
Figure 3. Word frequency pie chart
Figure 4. FinBERT sentiment score calculation map
Figure 5. Sample sentiment score calculated from FinBERT
Figure 6. PYsentiment2 sentiment calculation map
Figure 7. Comparison of polarity and weighted average sentiment scores
Figure 8. Map of statistical tests
Figure 9. Volatility comparison of NASDAQ Helsinki and sentiment scores

Tables
Table 1. Volatility distribution of each stock
Table 2. Residual distribution of each stock
Table 3. Volatility and residual distribution of NASDAQ Helsinki and sentiment
Table 4. Least squares results with three lags

Equations
Equation 1. Naïve Bayes theorem
Equation 2. Weighted average of sentiment scores
Equation 3. Personal opinion score calculation
Equation 4. Day's sentiment calculation
Equation 5. Basic Vector Autoregressive (VAR) equation
Equation 6. Basic Autoregressive Lag Equation (ADL)
Equation 7. Breakdown of VAR
Equation 8. Autoregressive Distributed Lag Model (ARDL)
Equation 9. Pearson Correlation Equation
Equation 10. Application of VAR
Equation 11. Application of ARDL Lag 1
Equation 12. Application of ARDL Lag 2
Equation 13. Application of ARDL with Macroeconomic Variables

Appendices
Appendix 9.1.1. OLS Estimator
Appendix 9.1.2. Descriptive Statistics of Variables
Appendix 9.1.3. VAR Estimation NH and NS
Appendix 9.1.4. ARDL Estimation NH and NS
Appendix 9.1.5. ARDL Individual Stocks
Appendix 9.1.6. ARDL P-Values of All Variables
Appendix 9.1.7. Unit Root Test
Appendix 9.1.8. Optimal Lag Test
Appendix 9.1.9. Correlogram Q-Stat Tests
Appendix 9.1.10. Serial Correlation Tests
Appendix 9.2.1. PYsentiment Codes
Appendix 9.2.2. FinBERT Codes

UNIVERSITY OF VAASA
School of Accounting & Finance
Author: Buddhi Weerasekara x1461755
Title of the Thesis: Sentiment Analysis and Price Volatility
Degree: Master's of Finance (Business Economics)
Programme: Master's
Supervisor: Klaus Grobys
Year: 2024
Pages: 78

ABSTRACT:
The challenge of forecasting the stock market fascinates researchers.
New studies that use innovative prediction approaches keep appearing, even though an overwhelming body of evidence suggests that the dynamics of the financial market cannot be foreseen. The widely accepted theoretical benchmark is the Efficient Market Hypothesis. It implies that prices cannot be predicted from publicly available information. Nevertheless, numerous studies on stock price prediction continue to be conducted. This thesis examines non-quantitative financial information, namely financial news headlines, and uses machine learning techniques in Python to quantify it as news sentiment. Predicting financial market trends with time series analysis and natural language processing is a challenging and complex task. Many factors can affect stock prices, including political and economic events. Nevertheless, the rapid development of digital platforms has greatly expanded the information available for forecasting financial trends. A wide variety of financial data can be freely accessed and evaluated for insights, and people's opinions are widely shared and may set the sentiment of the market. Numerous studies have examined the link between public opinion and market volatility. They suggest that sentiment expressed by individuals may influence market trends through mechanisms such as the ripple effect and the herd effect. Variations in opinion and irrational decision-making may depend on factors such as personal risk appetite, financial literacy, and social factors like age, gender, and religion. It is therefore of interest to examine these relationships for a smaller yet economically important European nation. This master's thesis investigates the relationship between News Sentiment (NS) and NASDAQ Helsinki (NH), as well as their effect on a sample of companies with different volatility profiles. It also examines the lagged effects of the variables and the impact of inflation, interest rates, and indexes on each variable.
The study concludes that there was no statistically significant relationship between NS and NH and that NS could not account for the variation in NH. To examine the relationships between the variables, the study applies vector autoregressive (VAR) and autoregressive distributed lag (ARDL) models. The optimal lag length is selected according to the Akaike information criterion (AIC). In the ARDL estimation, the first lag of NH was strongly related to its current value. This is consistent with Engle (1982), who shows that past squared error terms help predict the current error variance. However, the tests did not show a lagged effect of NS on NH, and no noticeable correlation was found for the other companies. In the selected sample, the combined effect of NH and NS was less significant for highly volatile companies than for low-volatility companies. Additionally, the statistical significance of some variables suggests that inflation, interest rates, and indexes had a significant impact on them.

KEYWORDS: Finland, Sentiment, Volatility, VAR, ARDL, NASDAQ Helsinki, Financial News.

1 Introduction

Rapid advances in technology and artificial intelligence (AI) have pushed many business environments to adapt to a much faster processing pace. High-frequency trading is a good example from financial markets (Deneuve, 2022). Macroeconomic factors and other significant indicators are processed within seconds to reach an investment decision (Rodrigo, 2024). News sentiment plays a key role in this process and is derived from financial news published around the world. Investors tend to base their financial decisions and trading activities on analysis from various platforms, including AI-driven trading robots, and these transactions are completed within seconds (Oehler et al., 2021; Wood et al., 2022). Because of this pace, investors have little to no time to read a whole news article.
Financial news therefore tends to be short but still conveys the important information. Headlines are written and published in this manner to cater to fast-moving investors (Sathya, 2023).

According to a research paper series published by the Czech National Bank and its research team, stock prices are not driven solely by market fundamentals. Events such as the Black Monday crash, the dot-com bubble, and the global financial crisis are good examples (Gric et al., 2021), as is the more recent crypto crash fueled by the FTX scandal. The authors document that mood influences future returns substantially more for individual investors than for institutions. Another study shows how investor sentiment affects the cross-section of stock returns. Stocks whose valuations are highly subjective and stocks that are difficult to arbitrage are expected to be more strongly influenced by investor sentiment (Baker and Wurgler, 2006). The authors argue that such elevated investor sentiment may cause mispricing. These studies further support the importance of factoring investor sentiment into the financial decision-making process.

Sentiment analysis has grown in value, and analytical tools such as Sprout Social, RavenPack, and Buffer have become popular sources of sentiment scores for forecasting future prices. These systems use Natural Language Processing (NLP) methods to assess social media posts and news and to understand consumer sentiment. A previous study based on RavenPack analytics documents the relationship between firm-level return volatility and public news sentiment (Yip Ho et al., 2013). The authors process more than 1,200 news releases and their sentiment scores at high frequency to examine intraday price volatility. They conclude that negative news affects volatility more than good news.
Examining the relationship between daily stock price volatility and financial news sentiment could provide valuable insight into the dynamics of financial markets. This thesis follows a methodical approach to quantify the sentiment of news headlines related to Finland and arrive at a sentiment score. It then examines whether that score has a significant connection to stock price volatility, and thereby whether future prices could be predicted in this way. The relationship between sentiment and stock market volatility is heavily researched for major economies. It remains insufficiently studied for the Nordic markets, and for Finland in particular.

1.1 Motivation

Financial news sentiment analysis is a fast-growing area that uses Natural Language Processing (NLP), Machine Learning (ML), and big data analysis to extract useful investor information from published financial news (Simon and Nelson, 2022). The objective is to use an ML process to identify investor sentiment towards a particular stock, index, or other financial instrument. The process generates a sentiment score that can be used in financial planning. These sentiment scores can benefit market participants in various ways.

Investment and trading strategies can be supported by data extracted from financial news published about a particular asset. If sentiment data indicates a positive trend for a specific stock, an investor may choose to buy it, expecting its price to rise. Conversely, investors may rush to sell a stock after negative news is published in order to minimize losses. This process can be seen as part of a day-trading strategy. Investors take a quick glance at the sentiment set by the day's news and may make irrational decisions based solely on the mood it creates. This irrational behavioral aspect is discussed further later in this thesis.
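The sketch below shows one way a sentiment score of this kind can be produced. It collapses headline-level classifier outputs into one score per day. It assumes that a model such as FinBERT has already assigned each headline probabilities of positive, negative, and neutral sentiment. For each day, it takes the average net score, P(positive) minus P(negative). The aggregation rule, the field names, and the `daily_sentiment` helper are illustrative assumptions. They are not the day's sentiment formula used later in this thesis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical headline-level scores, e.g. class probabilities from a
# classifier such as FinBERT (values are invented for illustration).
headlines = [
    {"day": date(2023, 1, 2), "positive": 0.80, "negative": 0.10, "neutral": 0.10},
    {"day": date(2023, 1, 2), "positive": 0.20, "negative": 0.70, "neutral": 0.10},
    {"day": date(2023, 1, 3), "positive": 0.05, "negative": 0.90, "neutral": 0.05},
]


def daily_sentiment(scored_headlines):
    """Average the net score P(positive) - P(negative) of all headlines per day.

    The result lies in [-1, 1]: above zero means a predominantly positive
    news day, below zero a predominantly negative one. This rule is an
    illustrative assumption, not a standard definition.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for h in scored_headlines:
        totals[h["day"]] += h["positive"] - h["negative"]
        counts[h["day"]] += 1
    # Return days in chronological order.
    return {day: totals[day] / counts[day] for day in sorted(totals)}


scores = daily_sentiment(headlines)
for day, score in scores.items():
    print(day, round(score, 3))
```

A simple mean treats every headline equally, so several mildly positive headlines can offset one strongly negative headline. Weighting by classifier confidence or by news source would be a separate design choice.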
Sentiment scores can also be factored into long-term financial planning. Edward et al. (2007) explain how the mood set by the market can be used in pricing models such as the Dividend Discount Model (DDM) and the Capital Asset Pricing Model (CAPM). The DDM discounts all expected future dividends to arrive at the current value of a stock, a fundamental approach to equity valuation. Market sentiment is not accounted for in this basic calculation. To factor sentiment into the DDM, long-term sentiment scores may be considered alongside the company's future growth prospects. Models such as the Autoregressive Distributed Lag (ARDL) model capture both long- and short-term relationships between variables. Suppose such an econometric model establishes a long-term relationship. The DDM valuation can then reflect market sentiment through adjustments to the required rate of return and the dividend growth rate. Under this view, sustained positive sentiment would be associated with gains. Negative sentiment may lead to losses or to the sale of shares to avoid further losses. The required rate of return in the model represents the risk associated with the shares. Investors may wish to apply the DDM to their daily valuation of stocks. In that case, they may adjust the required rate of return to reflect their risk appetite and their personal belief about how the share will perform. An investor may then demand a higher return when the sentiment score is low. This raises the required rate of return and lowers the estimated value of the share.

Similarly, Edward et al. (2007) suggest that the CAPM can be extended to capture market sentiment by adjusting its beta component. Boido and Fasano (2014) also document the relevance of sentiment to the CAPM by linking deviations in asset prices to sentiment indicators.
The authors argue that higher sentiment is associated with higher expected returns and vice versa. They also argue that psychological biases among market participants pose a significant challenge to the efficient market hypothesis. The theoretical prices produced by the CAPM do not align with observed prices, and market factors are insufficient to explain excess returns (Boido and Fasano, 2014). Another paper examines two behavioral biases related to the CAPM with sentiment: ambiguity aversion and positive skewness. Its authors use a Market Sentiment CAPM (MSCAPM) to explain the beta anomaly and three other market anomalies, as well as their relationship with sentiment. They conclude that the MSCAPM takes model uncertainty, positive skewness, disaster risk, and market sentiment into account, which makes it similar to the three-factor asset pricing model (Boido and Fasano, 2014). In general, the beta in the CAPM measures a stock's sensitivity to market returns. If a stock's price follows market sentiment, its beta can be modified to reflect this. For example, during periods of bullish sentiment the beta may rise, indicating higher risk and higher expected returns for the stock.

In line with these studies, this empirical analysis has two aims. The first is to identify the relationship between sentiment scores and stock prices. The second is to apply this relationship to asset pricing models to derive more realistic and accurate valuations. Once the link is established, sentiment can be applied to various areas of financial markets. These include budgeting and forecasting, day-trading strategies, asset pricing models, financial advisory and modeling services, and financial consultancy. In addition, incorporating sentiment into financial planning may help in identifying market volatility, understanding market behavior, and managing risk. Market volatility is the degree to which the price of a financial asset changes over time.
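One common way to put this definition into numbers is to compute daily log returns from closing prices. Volatility is then measured as the sample standard deviation of those returns over a trailing window, and can optionally be annualized with the conventional 252 trading days. The sketch below does this. The prices, the three-day window, and the helper names are hypothetical. This is only one standard measure and not necessarily the volatility measure used in the empirical chapters.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        statistics.stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]


# Hypothetical closing prices for one stock (EUR).
closes = [10.00, 10.20, 10.10, 10.50, 10.30, 10.40]

rets = log_returns(closes)
vols = rolling_volatility(rets, window=3)
# Annualize with the common convention of 252 trading days per year.
annualized = [v * math.sqrt(252) for v in vols]

print([round(r, 4) for r in rets])
print([round(v, 4) for v in vols])
```

Longer windows give smoother but slower-reacting estimates. The choice of window therefore shapes how quickly a sentiment shock would appear in the measured volatility.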
News sentiment may have a substantial influence on market volatility. Contemporaneous news is news that occurs at the same time as a market movement. For instance, a breaking news item about a company's earnings may cause a sudden change in its stock price (Yen et al., 2021). Past news items may also affect market volatility, which is known as the delayed effect. For example, a report from a few days earlier about a company's leadership transition may still affect its stock price today while investors are processing the news (Yen et al., 2021). The authors argue that both contemporaneous and lagged news are main determinants of market volatility. This motivates testing both effects in this thesis.

The ultimate motivation of this exercise is to build a financial modeling system with a wider scope. Such a system would cover fundamental analysis, financial statement analysis, ratio analysis, competitor analysis, and market sentiment analysis. Successful application of these elements could support decision-making and thereby help minimize financial risk. The system could factor daily news sentiment into pricing models in real time. This idea is mainly motivated by a freely available news sentiment website, https://finviz.com/map, which provides a visual illustration of news sentiment for each stock in many indices. For example, Image 1 below shows the sentiment of major indices around the world. The application covers not only stocks but also other financial instruments such as exchange-traded funds. Investors in Finland could also benefit from a similar sentiment map, which could help them identify stock market volatility.
Image 1. Sentiment map from Finviz.com

1.2 Finland

Finland is one of the key countries in Europe. It occupies a strategic geographic location, bordering Russia and neighboring Sweden. Its stable economy is supported by a political environment with low corruption (Aggarwal and Lyttle, 2022). Given Finland's strong position in the European economy, a sentiment analysis of Finnish financial news and its impact on price volatility may support many decision-makers, especially stock market investors. Very little previous empirical research covers NASDAQ Helsinki (NH). There are several reasons why more studies should be undertaken on the Finnish stock market. One is that the country is a key component of Europe, according to the European Union's update on its principal countries (european-union, n.d.).

First, Finland has a strong focus on technology and innovation. The government launched the "Innovation and Skills in Finland 2021-2027" program, which supports well-being and employment and promotes Finland as an attractive destination for international talent. According to the Finland Promotion Board, Finland is well known for its success in big data, virtual reality, cybersecurity, AI, and 5G technology. The Board also claims that Finland is the most digitalized country in Europe. A robust welfare system and a high-quality education system are further key economic factors. They attract international talent and also help retain local talent in the country.

Second, Finland attracts an impressive amount of Foreign Direct Investment (FDI). In 2022, Finland placed second among the top ten Western European countries in the inward FDI performance ranking (Aggarwal and Lyttle, 2022). The country has shown its ability to attract projects even during the COVID-19 pandemic: between 2019 and 2020, the number of projects rose from 148 to 193.
FDI increased by a notable 30.4% during 2020-2021, compared with a global average of 18.1%. The country is particularly successful in attracting businesses in media, communication, software, information technology (IT), and business and professional services. These industries accounted for 44% of total inbound greenfield FDI in 2021. FDI projects in the communication sector increased from 11 to 43 between 2020 and 2021. A stable economy and a business-friendly environment are key factors behind Finland's ability to attract FDI. A successful pandemic recovery and flexible tax measures provide further support (Aggarwal and Lyttle, 2022). Finland may therefore be considered a preferred destination for further FDI inflows.

Furthermore, Lloyds Bank's 2023 update "Investing in Finland" cites several key positive factors. These are low corruption, competitiveness, and a key location at the hub of a dynamic region centered on Russia, Scandinavia, and the Baltic states. The update illustrates the favorable investment environment with figures from Statistics Finland. According to these figures, corporate acquisitions accounted for over two-thirds of the total flow. By country, inflows included EUR 2.9 billion from Luxembourg, EUR 2.4 billion from Sweden, and EUR 1.5 billion from Switzerland. Total FDI in 2021 amounted to USD 98.5 billion, an 8.8% increase from 2020. The largest shares of the investment stock are held by Sweden (23.1%), the USA (18.4%), Germany (10.1%), and Luxembourg and Norway (6.6% each) (LloydsBank, 2023).

Based on these facts, Finland continues to attract investment. This suggests that the economy is stable and well positioned to keep advancing.
Other noteworthy strengths of Finland include its multilingual population, low corruption, expertise in green technology, high labor productivity, free market, highly industrialized economy, and high spending on research and development. Despite these strengths, Finland also has some weaknesses:
- geographic vulnerability due to its proximity to Russia, a politically conflicted area
- a lack of industrial competitiveness
- a small internal market
- an aging population
- a deteriorating current account
- large household debt (LloydsBank, 2023)

NASDAQ Helsinki (NH), formerly known as the Helsinki Stock Exchange, is the main platform on which investors trade the shares listed on it. The annual volume of NH was reported at 13.13 billion in 2023. This figure equals 20% of the volume of the NASDAQ 100 index and of Euronext, Europe's largest stock exchange, for the same year. According to NASDAQ (2022), NH proposed five key initiatives to revitalize the marketplace as the new government was formed in 2022. The plan is expected to help companies obtain equity financing more easily. The first initiative is to revise capital income taxation toward neutral tax treatment and to lower dividend taxes for individual shareholders. The second is to strengthen the domestic investor base and to encourage long-term saving and direct investment in the stock market. According to NASDAQ, Finland has one million private shareholders with 278,000 active equity trading accounts. The base of medium-sized investors is small, and domestic shareholding is concentrated among a small number of large institutional investors (NASDAQ, 2022).

According to Euroclear (2022), almost half of Finnish companies are held by foreign investors. Finland expects to attract more foreign investment, which supports the liquidity and capital requirements of Finnish companies (NASDAQ, 2022).
NH is therefore an important platform for directing these funds to the companies that need them. The proposal also identifies financial literacy as a key component for the whole plan to work. The aim is for all members of society to be aware of financial risks and able to make sound financial decisions to reach their future goals. This includes fostering financial education at school and university level so that students gain the knowledge needed to make effective financial decisions. The final initiative is to support green financing and the green transition. As part of this, NH expects listed companies to report environmental, social, and governance (ESG) metrics according to the guidelines provided by NASDAQ. ESG designation is a key milestone that companies are expected to achieve, in addition to creating sustainable investment opportunities.

NH plays a pivotal role in helping businesses obtain equity financing for their expansion and contributes to the development of both the Nordic and European regions. Studies of NH may therefore provide valuable insight into market dynamics and help attract further FDI. They may also support individual and institutional investors in making well-informed investment decisions.

This thesis aims to support investors by investigating a key component of equity investment: the relationship between financial news sentiment and price volatility. Various sentiment analysis studies have been carried out for Finnish markets, but the impact of financial news sentiment has not yet been examined. Rautiainen and Jokinen (2022) note that the relationship between social media use and stock prices is largely unknown in Finnish markets. They examine this relationship for 105 Finnish public limited companies listed on NH using hand-collected social media data.
That study looks at the value and relevance of social media activity on Facebook, Instagram, LinkedIn, Twitter, and YouTube. The findings indicate that social media activity and popularity are valuable variables for forecasting stock prices. The study concludes that not all social media activities are equally important for managers and investors, and it emphasizes the importance of using multiple visual social media channels. Because the study focuses only on social media, it does not capture the large volume of financial news published by credible news sources. To address this omitted-data bias, this thesis uses a large set of news headlines published in the Refinitiv database for 2023. Another sentiment analysis study, by Vankka et al. (2019), examines online user reviews and headlines. It applies a hybrid algorithm that predicts review polarity from word embeddings and Finnish polarity lexicons. The authors use a weighted average method to arrive at a sentiment score for each review and headline. However, that study focuses on hotel and travel customer sentiment, which is unlikely to affect stock prices unless the reviews concern listed companies.

This thesis therefore focuses specifically on financial news sentiment and is structured as follows. Chapter 1 covers the introduction, identification of gaps, research questions, and background. The background section discusses previous literature in this area in detail, including the scope, process, methods, limitations, and drawbacks of each study. Chapter 2 is a deeper literature review with a critical analysis. It identifies gaps in previous literature and explains how this study addresses them. Chapter 3 explains how the data were prepared for this empirical investigation, including the data sources for news headlines, individual prices, and macroeconomic rates.
It also discusses in detail the different approaches used to calculate a sentiment score for each news headline, along with the justification for the method selected for the empirical analysis. Chapter 4 covers the econometric models used to test each hypothesis. It includes a detailed explanation of the statistical models and of the tests performed to validate the data. Chapter 5 summarizes the test results and states whether the null hypotheses are rejected. Chapter 6 summarizes the overall results, and Chapter 7 discusses limitations and further research.

1.3 Identification of Gaps

Credible news sources are the preferred choice for information, and social media platforms may not always meet this standard. This empirical investigation therefore uses news headlines downloaded from a credible source, Refinitiv. Reuters has a dedicated user-generated content (UGC) verification team that detects false information and validates news (Reuters, n.d.). Reuters also states that all its journalists must adhere to a strict set of rules based on its Trust Principles. According to Reuters, its news services present all sides of a story promptly and are impartial, accurate, and independent. Matthia (2014) found that Reuters sentiment can explain and predict stock performance better than macroeconomic factors. Social media user accounts, on the other hand, may not follow such standards (Stearns and Kille, 2015). Reliable input data therefore strengthens this empirical analysis. On this basis, the study addresses the lack of research on one influential aspect of Finnish stock market investment by answering the following research questions:

1) Do the sentiment scores derived from news headlines have a significant impact on price volatility?
2) Is there a significant lagged effect of news sentiment on price volatility?
3) Are companies with high price volatility more affected by the combined relationship of news sentiment and index volatility?
4) Do macroeconomic factors such as inflation and interest rates, along with news sentiment and index volatility, have an impact on price volatility?

Based on these research questions, four hypotheses are defined as follows:

H1: There is a significant positive correlation between the sentiment scores of financial news and daily price volatility on the Helsinki Stock Exchange.
H0: There is no significant correlation between sentiment scores and daily price volatility.

H2: The sentiment scores of financial news have a significant lagged effect on daily price volatility on the Helsinki Stock Exchange.
H0: The sentiment scores do not have a significant lagged effect on daily price volatility.

H3: The combined impact of News Sentiment and Nasdaq Helsinki on daily price volatility is stronger for highly volatile stocks.
H0: The combined impact of News Sentiment and Nasdaq Helsinki is not stronger for highly volatile stocks than for low-volatility stocks.

H4: Price volatility is significantly influenced by macroeconomic indicators for all the variables.
H0: Price volatility is not significantly influenced by macroeconomic indicators for all the variables.

The first hypothesis examines whether sentiment scores obtained from news headlines significantly affect price volatility in the Finnish stock market. It proposes that a headline's emotional tone, whether positive or negative, can affect investor behavior and result in stock price volatility. This hypothesis has been widely tested around the globe for topics such as economies, countries, individual securities, and politics. Baraniak and Sydow (2021) test it for political news and its connection to voter volatility.
In this paper, the authors propose a unique dataset of 3,819 human-labeled political news headlines from online media sources to train and evaluate machine learning algorithms. They note that every day a large number of casual news readers scan several headlines rapidly and briefly. Headlines read in this manner may unconsciously affect how readers view the world. Notably, the full news item contains more information, and a headline may not convey the whole picture. A reader's perception formed from a headline alone may therefore be misdirected. Readers of financial news may share this bias, in which case sentiment scores derived from financial news headlines may not reflect actual price volatility. A significant connection between headline sentiment scores and price volatility would suggest that many investors trade on perceptions formed from news headlines. The absence of such a connection would suggest the opposite.

Daifeng et al. (2019) address a question similar to research question one by studying stock market trends and social media moods. The authors use microblogs to examine changes in the stock market and how irrational behavior fueled by social media posts can affect price volatility. They analyze three months of data from Tencent Weibo, a large Chinese microblogging platform. From these data, they build a mood-based stock trend analysis tool that mimics the irrational actions of investors. The results reveal a significant correlation between the variables. This lends credibility to behavioral finance theories of irrational investor behavior and helps explain brief price spikes and falls (Daifeng et al., 2019).
The irrational investor behavior documented by Daifeng et al. (2019) may account for temporary volatility in stock prices if this study finds a significant correlation with headline sentiment.
The Efficient Market Hypothesis (EMH) assumes that investors value securities rationally on the basis of fundamental analysis. Other factors that affect stock market volatility, especially short-term and abnormal movements, may, however, be connected to irrational investor decision-making. Edward et al. (2007) attempt to analyze high trading volume, high volatility, and stock bubbles by incorporating investor sentiment into two pricing models, the Dividend Discount Model (DDM) and the Capital Asset Pricing Model (CAPM). They define investor sentiment as a personal opinion on a company's potential for future success or failure. In addition to guidance from professionals and market analysts, investors may draw insight from the macroeconomic conditions of the market as a whole. In the end, however, investors tend to make their own decisions based on their best self-analysis, and these beliefs may differ from person to person depending on age, gender, culture, and level of risk awareness (Edward et al., 2007). This connects to the irrational behavioral aspect discussed earlier. Irrational decision-making may also be linked to financial news and to how well its content is understood. To capture these variations in investor sentiment, Edward et al. (2007) introduce into the CAPM a modified beta that is a function of beta and investor sentiment. An investor who expects a company to fail may require a higher risk premium, which is captured by a modified, increased beta and therefore a higher expected return. As a result, prices may fluctuate beyond the scope of fundamental analysis because of personal bias or irrational decision-making. Such perceptions may also affect the market with a delay.
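As a sketch of the idea only, the sentiment-adjusted CAPM of Edward et al. (2007) can be written as the standard CAPM with the beta replaced by a modified beta. The function f and the sentiment variable S_i are placeholders: the exact specification used by Edward et al. (2007) is not reproduced here.

```latex
E(R_i) = R_f + \beta_i^{*}\,\bigl[E(R_m) - R_f\bigr],
\qquad
\beta_i^{*} = f(\beta_i, S_i)
```

Under the assumption that f increases as sentiment S_i becomes more pessimistic, a pessimistic investor requires a higher expected return E(R_i) for the same market risk premium E(R_m) − R_f.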
There may be a delay between an investor reading a news item and executing a trade. A previous study tested this lagged effect of news sentiment on price volatility using the Fama-French three-factor model (David et al., 2019). The authors handle sentiment on non-trading days by applying a seven-day simple moving average (SMA) to the scores. Their findings reveal that sentiment scores have a significant impact on Dow Jones returns and that lagged daily sentiment scores are significant, indicating that the information embedded in these scores is not immediately reflected in security prices. The SMA applied to cover non-trading days does not have a significant impact on prices (David et al., 2019). When testing hypothesis two in this thesis, the sentiment scores and prices of non-working days are reflected in the previous day's numbers.
Hypothesis three is a novel test examined in this study. First, the combined effect of sentiment scores and index prices, on both a current and a lagged basis, has not previously been tested on stock prices. However, one study examined the impact of the stock exchange and news sentiment on individual stock prices on a current and lagged basis (Dumiter et al., 2023). The authors provide an integrated method to ascertain the relationship between technical analysis, the stock market, and news sentiment indicators through autoregressive models, and they conclude that there is a solid reverse connection between these variables while emphasizing behavioral finance theories. Daxhammer et al. (2023) examine the role of emotional finance in stock markets as a whole. Second, this hypothesis also checks whether a stock's volatility profile is a key factor that changes the impact of sentiment.
1.4 Background
Previous studies find that news headline sentiment scores have a significant influence on price volatility.
Previous literature has examined this effect using models such as vector autoregressive, autoregressive lag, and generalized autoregressive models (Yen et al., 2021). Studies also show that news sentiment has a significant lagged effect on price volatility. The lagged effect is considered when analyzing lead-lag and contemporaneous news effects (Yen et al., 2021). The contemporaneous news effect is the immediate impact of news on the market. The lead-lag effect refers to the influence that the price of a leading series on one day has on the price of a lagging series on the following day; in other words, the volatility of the leader on day one is mirrored in the volatility of the lagger on day two (Yongli et al., 2022). That study notes that power-law distributions are a basic feature of human activity and shows that the cumulative number of lead-lag days between stock pairs in both the Chinese and American stock markets follows such a distribution, which validates the power-law distribution in stock trading. A power law is a functional relationship in which a relative change in one quantity produces a proportional relative change in the other, so that one quantity varies as a power of the other.
Khan and Ahmad (2018) examine the lead-lag and bi-directional relationship between investor sentiment and market returns in an emerging market and conclude that sentiment plays a significant role in derailing the market from its sustainable state, or pulling it down, as investors behave irrationally. This implies that investors' reaction to negative news may pull markets down more than positive news lifts them. This concept is known as the leverage effect, which is examined by Yang et al. (2019), who highlight that the leverage effect and investor sentiment have a large impact on volatility forecasts. Laakkonen and Lanne (2008) likewise find that bad news increases volatility more than good news.
Supporting the finding of Laakkonen and Lanne (2008), research from the Australian National University shows that intraday volatility is more sensitive to negative news than to positive news (ANU, 2013). In addition, Ellwood (2020) shows that consumers around the world respond more strongly, psychologically, to negative news than to positive news. From a behavioral finance perspective, one study shows that negative news increases stock price volatility in a window from one day before to four days after the news (Chung Wu et al., 2022). The relationship between stock index volatility and individual stock prices is widely studied, but the combined effect of news sentiment and stock market performance on individual stock price volatility, together with the impact of inflation and interest rates, appears to be studied far less often, especially in Finland.
Seng and Yang (2017) investigate the relationship between financial news and stock market volatility by building a dictionary based on grammar, multiword structures, and sentiment analysis. They examine news content using a social media content measurement methodology and propose a model that combines structured and unstructured data. The unstructured data are collected from the internet and processed through Natural Language Processing (NLP) to arrive at a sentiment score before being integrated into the model. The dictionary is built from human readings, supplemented by additional information analysis, and is used to assign a sentiment score to each news item. The structured data are integrated into the model directly. According to the findings, there is a substantial correlation between financial news and market volatility, and good news is positively correlated with rising stock prices (Seng and Yang, 2017).
Nevertheless, because that study uses only one news source, it is harder to determine the true impact of market sentiment at large. To address this, I use news headlines from numerous publications that are kept up to date in the Refinitiv database.
In terms of general news-reading habits, the majority of newsreaders still prefer printed newspapers, while around six hundred million readers use digital platforms (Ponsford, 2012). With the rapid development of social media, readers have shifted from traditional platforms to social platforms to read or view news. According to a 2020 survey, more than 80% of Indians aged 16 to 70 used social media as their primary source of news, as did nearly 60% of Argentinians and Australians (Watson, 2022). The Financial Times, a major financial news source, reported one million paying subscribers on its digital platform (Financial Times, 2022). There are approximately 4.3 million active online stock traders in the world, including nearly twelve thousand Finnish online traders (forex.in.rs, 2017). As these numbers represent a significant portion of the whole investor base, it is reasonable to assume that demand for digital news and online trading platforms is high.
Some financial news platforms are expensive, while other channels are free. Platforms such as Bloomberg and Reuters cost thousands of dollars to access: a Bloomberg Terminal costs up to USD 25,000 per user per year, while Reuters costs around USD 2,000 per month. These platforms cover a vast scope of information about companies, ranging from financials to media coverage, and the data can be downloaded, processed, and analyzed according to the user's preference. They also cover a wider range of news published around the world by various credible sources.
Traders who can afford such a platform benefit from thorough analysis based on all the information relevant to a company or share. In terms of financial news, they can view a wide range of news items with the click of a button and, if needed, retrieve a sentiment heatmap that visually illustrates the sentiment of the news related to a company. Instant trading features combined with these news sentiment heatmaps help traders forecast future volatility. On the other hand, newspapers such as the Financial Times require a monthly subscription, which beginners who want access to more reliable news may also find costly. A sentiment analysis of financial news may therefore not fairly represent the true sentiment of the market, as market participants may face information gaps due to cost and unavailability. In other words, new entrants and smaller investors may not have access to all financial news, which implies that not all investors are fully informed, and a sentiment analysis in this setting may contain accuracy gaps. An advanced trader with access to an improved sentiment map, such as the one shown in Image 2, may reduce risk and secure higher returns than a trader in the lower categories.
According to research conducted in 2020 by NORC at the University of Chicago and the FINRA Investor Education Foundation, a large number of new retail investors joined the American securities market. These investors were categorized into three groups: holdover account owners, experienced entrants, and new investors. The findings reveal that, compared with experienced entrants, holdover account owners and new entrants were younger, had lower incomes, and represented a wider racial spectrum (FINRA and NORC, 2021).
Image 2 shows an example of an interactive sentiment heatmap, a freely available web application built by Damian Boh, a machine learning engineer. The application illustrates market sentiment derived from news and is updated daily and hourly, helping traders understand the market mood around their holdings. According to Boh (2022), the user enters a stock ticker to retrieve the sentiment data, which visually shows whether market emotion is negative or positive for that ticker. The score ranges from negative three to positive three. The database is populated with financial news headlines scraped from the web, which are processed with machine learning techniques to provide a visual illustration of the performance of stocks (Boh, 2022).
Image 2. Visual sentiment web application (source: https://damianboh.github.io/stock_sentiment.html)
2 Literature Review and Critical Analysis
2.1 Sentiment Analysis
Sentiment analysis is a branch of Natural Language Processing (NLP), drawing on neuroscience, linguistics, mathematics, and computer science, that ascertains the sentiment contained in a text. It can classify the expressed feeling as neutral, negative, or positive. The most basic implementation uses a scored word list such as AFINN, which scores words between minus five and plus five (Shivanandhan, 2020). Andrea et al. (2015) discuss various tools for analyzing sentiment data. Their study also documents how identifying consumer sentiment supports better choices, not only in finance but in all industries, including politics. According to the authors, sentiment analysis was first discussed by Nasukawa and Yi (2003), and more research has been conducted since. According to Andrea et al. (2015), there are five main stages of sentiment analysis.
These are data collection, text preparation, sentiment detection, sentiment classification, and presentation of output. For the purpose of this study, the focus is on how people's opinions, attitudes, and emotions can lead to better or worse investment decisions, quantified by applying sentiment analysis tools to news articles. In this process, texts are classified as negative, positive, or neutral using trained dictionaries that list words categorized by meaning. For example, words such as 'agree', 'good', 'support', 'pros', 'benefits', and 'win' are treated as positive, while 'disagree', 'bad', 'opposition', 'cons', and 'disadvantage' are listed as negative.
A study by Usmani et al. (2023) reveals a significant correlation between financial news and stock market trends. The authors introduce a weighted average model for news categories, in which news is categorized into sector news and stock-related news. They then propose the Long Short-Term Memory-based Weighted and Categorized News Stock Prediction Model (WCN-LSTM), which they claim better explains stock market trends based on news. This indicates that investors can benefit from sentiment results when news articles are analyzed through a sound model. However, simpler methods can also be used for this purpose.
Popular news platforms such as The Guardian, Bloomberg, Reuters, and CNBC provide the latest financial news. Because of the rapid increase in the number of financial news articles published, mainly through digital platforms, readers can find it difficult to cover all the information relevant to their investments (Kirange et al., 2016). News headlines are therefore written so that readers can grasp the contents of a news item quickly (Sathya, 2023).
In addition, tools such as sentiment analysis are designed to give a visual or quantitative illustration of a particular news item, which can support investors' decisions in a fast-moving trading environment. Most professional traders build their own mechanisms for trading stocks around the world by incorporating various analyses, such as technical analysis. Platforms such as Bloomberg and Reuters are the most popular and provide these technical analyses, supported by heatmaps that show the sentiment of a particular day for a given exchange or security. Image 3 shows an example of sentiment on June 29, 2020, for major U.S. companies, analyzed from Twitter data (AltSignals, 2020).
Image 3. Sample sentiment analysis heatmap
In this map, green represents improved sentiment, red worsened sentiment, and grey neutral sentiment. The map reveals a positive trend for communication services, while consumer durables appear to be on a negative trend and consumer discretionary shows a mixed result. An investor who consults such a heatmap would be less inclined to invest in consumer durables companies than in communication services. However, a daily heatmap represents only that day's sentiment; extending it to weekly, monthly, and yearly heatmaps would help identify long-term trends in stock performance.
Investors have different profiles and can be categorized from new entrants to professionals. Newcomers may first trade with less knowledge and analysis, while experienced traders are more likely to have an overall knowledge of the securities they invest in. Another key element is each investor's level of financial literacy. As Mitchell (2014) suggests, "While the costs of raising financial literacy are likely to be substantial, so too are the costs of being liquidity-constrained, overindebted, and poor." Deeper knowledge is therefore likely to make an investor more successful.
Stemming from this, an experienced investor consults different analyses before making an investment decision, ranging from reading simple financial news to analyzing company financial statements. The analysis of news may therefore be a smaller, but still important, element of decision support. Investor sentiment cannot be measured directly, yet it matters because market participants react strongly to news, even to a small portion of the news feed (Arratia et al., 2021; Matteo et al., 2021). Traditionally, investor sentiment was analyzed through emotional proxy variables (Jiangshan et al., 2021). Thanks to advances in analytical tools such as machine learning and NLP methods, the analysis of textual data has become popular and widely employed in financial market analysis. Digital transformation in the financial industry has also improved customer service through online banking and fintech applications. Kachan (2021) states that service gaps in the banking sector can be filled by NLP-driven sentiment analysis and explains how identifying consumer sentiment can help develop new products and improve the quality of services. A key component of sentiment analysis is textual analysis, which can also extend to the analysis of images, film, art, and other media. Python, MATLAB, and R are popular tools for sentiment analysis, and platforms such as Brand24, Digimind, Lexalytics Salience, and the Microsoft Azure Text Analytics API are also widely used (Fontanella, 2023). To run these analyses on financial news, various financial term libraries have been developed to generate accurate sentiments. FinBERT is a pre-trained language model fine-tuned to analyze the sentiment of financial text. Another popular resource is the Loughran and McDonald financial term dictionary, which lists financial terms associated with positive and negative scores. Similarly, VADER, TextBlob, and NLTK are also used in this process.
An important ongoing area of research in finance is volatility prediction and forecasting. Using a linguistic approach to analyze news content and incorporating news sentiment scores into this process may improve forecasts. Yen et al. (2021) quantify the sentiment score by counting the positive and negative words in the news content. The weakness of this approach is that it captures the number of positive and negative words rather than the statement's overall meaning. Nevertheless, the authors claim that news sentiment is an effective reference for security trading. This may hold true if the news sentiment score is highly accurate. However, high accuracy, or the intended meaning, cannot be achieved by counting words, as counting quantifies negative and positive words but misses the implied message of a news item. A sentence intended to express a positive sentiment may be misinterpreted by a word-by-word count. For example, in the statement "the expenses were reduced by liquidating loss-making business units to increase profit", four negative words outweigh two positive words, leading to an overall negative sentiment score even though the sentence as a whole is a positive statement.
Other studies by Tetlock (2007) and Schumaker et al. (2012) employ the Harvard General Inquirer (GI) and the Arizona Financial Text system, respectively, which limits them to word-based sentiment calculation. Similarly, Loughran and McDonald (2011) use 10-K filings, which limits the scope to the Harvard-IV-4 psychological dictionary. Another drawback of this type of word-based quantification is the misinterpretation of context.
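The word-counting failure described above can be reproduced with a minimal scorer. The lexicon below is a hypothetical toy list chosen only to mirror the example sentence and the phrase "reduction in cost"; it is not the Loughran and McDonald dictionary, the Harvard General Inquirer, or FinBERT, and the function names are illustrative.

```python
import re

# Hypothetical toy lexicon mirroring the examples in the text; it is NOT
# the Loughran-McDonald, Harvard GI, or FinBERT vocabulary.
POSITIVE = {"increase", "profit", "growth", "gain"}
NEGATIVE = {"expenses", "reduced", "liquidating", "loss-making", "reduction", "cost"}

def tokenize(text):
    """Lower-case word tokens; hyphenated words such as 'loss-making' stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def count_score(text):
    """Return (positive count, negative count, net score) from word counts only."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos, neg, pos - neg

def polarity_label(net):
    return "positive" if net > 0 else "negative" if net < 0 else "neutral"

sentence = ("the expenses were reduced by liquidating loss-making "
            "business units to increase profit")
pos, neg, net = count_score(sentence)
print(pos, neg, polarity_label(net))
```

The scorer never looks at how the words relate to each other, so a sentence that reports a profit-improving restructuring is labeled negative, which is the weakness of purely count-based methods.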
The phrase "reduction in cost", for instance, may be interpreted as two negative words, leading to an overall negative sentiment. This is further supported by the FinBERT test used in this study to quantify news sentiment. A sample of misinterpreted news headlines is provided in Section 3.3.1, and as a result, the polarity-based score calculation method is used in this study.
2.2 Model and Data Gaps
A number of different regression models have been used to analyze the impact of news on financial market volatility. Yen et al. (2021) use the GARCH model to quantify the impact on Taiwanese stocks. According to Yen et al. (2021), GARCH may capture relationships between linear data sets but may not capture large changes and heavy tails. In contrast, the ARDL model estimates both the short- and long-term effects of one variable on another by adding lags of the variables to the model (Chetty, 2018). A noteworthy feature of both models, in their basic form, is that they do not capture the leverage effect of sentiment, that is, the asymmetric response to positive and negative sentiment.
Deveikyte et al. (2022) find a relationship between Twitter data and FTSE 100 movements. Sentiment derived from tweets may signal near-future movements but does not affect long- or short-term volatility (Deveikyte et al., 2022). Further, the impact of social media comments and posts may vary with the number of followers, and the messages may not reach the full investor base, for example because of restricted accounts or a lack of awareness of certain Twitter (X) accounts. News articles published on reliable platforms, on the other hand, may reach a larger investor base.
Previous literature states that social media rumors fueled by the herd effect can cause investor confidence to collapse and financial risk to spread further (Zhang et al., 2022). Investors relying solely on social media posts may therefore absorb unquantified risks into their portfolios.
A disadvantage of incorporating such a large number of individual messages is that it may introduce personal bias, and these unsubstantiated personal opinions may disrupt financial stability (Zhang et al., 2022). The Refinitiv database therefore provides better reach, as it captures many credible news sources. It is also important to note that platforms such as Reuters are expensive and not all market participants have access to such high-end platforms. To address this, the Boolean search query also captures Finland's prominent financial newspaper, Kauppalehti.
Hritha and Rishad (2020) provide an empirical investigation of investor sentiment and market volatility using evidence from India. They examine how the emotions of irrational investors affect market volatility by applying Granger causality and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) approaches, and they conclude that excessive market volatility is largely caused by unreasonable sentiment. The limitation of Granger causality is that it assumes volatility occurs after the news sentiment impact. This may be accurate to some extent, particularly where there is a noticeable lag effect, but because investors react quickly to news, this assumption may not always hold.
Okafor and Nneamaka (2023) highlight that the mechanics of volatility in financial decisions can be difficult to understand. The authors argue that GARCH models have trouble with volatility persistence, and to address this, they extend the GARCH model to a Markov regime-switching GARCH model, which allows the conditional mean and variance to vary dynamically. The GARCH model is used to predict the volatility of returns on financial assets and is applied specifically to time series data.
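As a brief illustration of how GARCH-type models relate to volatility persistence, the sketch below runs the GARCH(1,1) conditional-variance recursion σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁ for given parameter values and computes the half-life of a variance shock from the persistence α + β. The parameters and residuals are hypothetical, and the code only filters variances; estimating ω, α, and β (for example, by maximum likelihood) is outside this sketch.

```python
import math

def garch11_variances(residuals, omega, alpha, beta, initial_variance=None):
    """Conditional variances of a GARCH(1,1) model with *given* parameters:
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    if initial_variance is None:
        # Start from the unconditional variance omega / (1 - alpha - beta).
        initial_variance = omega / (1.0 - alpha - beta)
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances

def shock_half_life(alpha, beta):
    """Periods until a variance shock has decayed by half (persistence = alpha + beta)."""
    return math.log(0.5) / math.log(alpha + beta)

# Hypothetical parameters and residuals, not estimated from thesis data.
sigma2 = garch11_variances([2.0, 0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8,
                           initial_variance=1.0)
print(sigma2, shock_half_life(0.1, 0.8))
```

The closer α + β is to one, the longer the half-life, which is the sense in which a news shock has a long-lasting effect on volatility.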
Volatility persistence, often referred to as the autocorrelation of volatility, is a phenomenon in financial markets whereby volatility in one period depends on how volatile the preceding period was (FasterCapital, 2023). Volatility persistence is influenced by a number of factors, one of the most important being news shocks: unexpected and abrupt events that frequently have a long-lasting effect on market volatility (FasterCapital, 2023). The actions of market participants also play a role, as traders and investors may become more cautious and adjust their holdings in response to significant market volatility (FasterCapital, 2023).
The ARDL model offers a particular way of handling volatility persistence (Ari, 2021). Applying the ARDL model in this thesis addresses volatility persistence and autocorrelation, and choosing the optimal lag length based on the Akaike Information Criterion is one way to approach this. Ari (2021) states that volatility measures can be used as explanatory variables in the ARDL model, which allows the model to reflect how volatility persists over time. An error correction model that considers both the short- and long-term impacts of changes in the variables can be obtained by re-parameterizing the ARDL model, and it can account for volatility persistence by capturing the effect of volatility both instantly and over time (Kripfganz and Schneider, 2016).
3 Data
3.1 Data Process Chart
Three categories of data are used in this empirical analysis: news headlines downloaded from the Refinitiv database, individual stock prices downloaded from NASDAQ Helsinki, and macroeconomic data from Suomen Pankki (the Bank of Finland). The data are pre-processed in Excel and uploaded to different platforms to perform tests and analyses.
Figure 1. Data processing map
• Refinitiv data (news headlines: Finland, OMXH25 and OMXHPI, Kauppalehti): sort the news data by date and time; upload the CSV file to the Python platform; translate Finnish and Swedish news into English; calculate news sentiment with FinBERT, Naïve Bayes and SVM, and PYsentiment2; calculate each day's sentiment with the weighted average method; prepare the data for the regression tests (logarithmic transformation).
• NASDAQ Helsinki data (prices of NASDAQ Helsinki and six individual stocks): download to Excel individually; calculate the daily price volatility of each variable; prepare the data for the regression tests (logarithmic transformation).
• Macroeconomic data: inflation and interest rates.
3.2 Pre-Processing Data
The study hypotheses focus on sentiment scores calculated for news headlines published under the key search word Finland for the period January 1, 2023, to December 12, 2023. More than forty thousand news headlines are downloaded manually from the Refinitiv database, one day at a time, as the data are not readily available. As Refinitiv publishes news articles from many credible sources, the search is filtered with Boolean operators to capture the most relevant news items:
Topic:FI OR R: OMXH25 OR R: OMXHPI OR (Product: KAU NOT Language:LZH NOT Topic:US)
Prominent sources such as seekingalpha.com, Public Technologies, EDGAR filings, and many more were included in the search options, capturing breaking news alerts, research, filings, transcripts, press releases, web news, and posts from Twitter (X). With this search, the headlines of the most relevant news articles are downloaded in Excel format for easy use and fast cleaning. Each news headline is tagged with its date and time and, where the news item relates to a company listed on the Helsinki stock exchange, with the related stock identifier.
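A minimal sketch of the "sort by date and time" step, assuming each headline carries a timestamp: headlines are sorted, grouped into days, and counted. Following the statement in the discussion of hypothesis two that non-working days are reflected in the previous day's numbers, weekend headlines are assigned here to the preceding Friday. That mapping, and ignoring public holidays, is an interpretation made for this sketch rather than the documented procedure; the function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def trading_day(timestamp):
    """Map a headline timestamp to a weekday; Saturday and Sunday roll back to Friday.
    Assumption: public holidays are ignored."""
    day = timestamp.date()
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day

def headlines_per_day(timestamps):
    """Count headlines per trading day after sorting them by date and time."""
    return Counter(trading_day(ts) for ts in sorted(timestamps))

stamps = [
    datetime(2023, 1, 6, 9, 30),   # Friday
    datetime(2023, 1, 7, 12, 0),   # Saturday, counted on Friday
    datetime(2023, 1, 8, 18, 15),  # Sunday, counted on Friday
    datetime(2023, 1, 9, 8, 0),    # Monday
]
print(headlines_per_day(stamps))
```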
Kauppalehti, Finland's most prominent financial newspaper, is also included in the Boolean search so that the latest and most relevant financial news is in the database.
The daily historical prices of NASDAQ Helsinki are downloaded from the Refinitiv database for the period January 1, 2023, to December 12, 2023. These prices are processed to capture daily volatility by calculating the difference between the close and open prices for each day. Similarly, the daily prices of the three most volatile and the three least volatile companies are downloaded from the same database. Macroeconomic indicator data are taken from the Suomen Pankki (suomenpankki.fi) database: monthly inflation rates and daily interest rates are added to the dataset. Monthly data are allocated to each day of the respective month; for example, the January inflation rate is used throughout January.
The data are downloaded from the NASDAQ Helsinki database in Excel format for each variable. The prices are then sorted by date for the period January 1, 2023, to December 11, 2023, which accounts for 346 data points. A separate daily price volatility column is added by taking the difference between the open and close prices for each day. This column for each variable is transferred to a separate sheet to prepare the data for the regression analysis. To address negative data points, each series is shifted into positive territory using Excel's =MIN function, which identifies the minimum value of each variable so that the series can be offset to be positive; the shifted series is then converted to natural logarithms with Excel's =LN function to prepare the data for regression analysis. It is important that the data be normalized before regression analyses are performed, and a highly skewed variable can be transformed into a more normally distributed one using a logarithmic transformation.
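The Excel steps above can be sketched in Python as follows. The offset added after subtracting the minimum is an assumption (the text says only that =MIN is used to make the series positive); some positive offset is needed because the shifted minimum would otherwise be zero, and ln(0) is undefined. The function names and all numbers are hypothetical.

```python
import math
from datetime import date

def daily_volatility(opens, closes):
    """Daily price change as defined in the text: close minus open."""
    if len(opens) != len(closes):
        raise ValueError("opens and closes must have equal length")
    return [close - open_ for open_, close in zip(opens, closes)]

def shift_and_log(values, offset=1.0):
    """Shift the series so its minimum equals `offset` (> 0), then take ln."""
    if offset <= 0:
        raise ValueError("offset must be positive, since ln(0) is undefined")
    minimum = min(values)
    return [math.log(v - minimum + offset) for v in values]

def monthly_to_daily(days, monthly_values):
    """Give each day the value recorded for its (year, month)."""
    return [monthly_values[(d.year, d.month)] for d in days]

# Hypothetical prices and rates, for illustration only.
vol = daily_volatility(opens=[10.0, 20.0], closes=[12.0, 18.0])
log_vol = shift_and_log(vol)
inflation = monthly_to_daily([date(2023, 1, 15), date(2023, 2, 1)],
                             {(2023, 1): 1.0, (2023, 2): 2.0})
print(vol, log_vol, inflation)
```

Note that the choice of offset affects the shape of the transformed series, so it is worth reporting alongside the regression results.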
The logarithmic transformation is especially useful for modeling variables with a non-linear relationship, and it can also lead to a negative skew in the distribution of errors (Cleophas and Zwinderman, 2016). A key assumption in regression analysis is homoscedasticity: the variance of the errors is constant at all levels of the independent variables. When this assumption is not met, the errors are heteroscedastic, meaning that their variance changes across values of an independent variable. The logarithmic transformation of a variable Y is ln(Y). Outliers can adversely affect regression findings, and this transformation can mitigate them: by compressing the scale on which the data are measured, it can improve the interpretability of data patterns and the accuracy of the model's predictions (Sole, 2022).
Relationships between variables are often non-linear, which a linear regression model may struggle to represent. A logarithmic transformation can convert such a relationship into a linear one; in other words, it can straighten out a curved relationship. For example, if X and Y are related by Y = a·exp(bX), then ln(Y) = ln(a) + bX, which is linear in X (Greenwood, 2022). Another advantage of the logarithmic transformation is that it helps address overfitting, which occurs when a model has too many parameters relative to the amount of data. By keeping the model's functional form simpler, and by making the distribution closer to a bell curve, the logarithmic transformation can make the model less prone to overfitting (Andy, 2019).
3.3 Sentiment Score Calculation
The focus of this study is to analyze the relationship between the market sentiment set by financial news and stock price volatility.
Therefore, more emphasis is placed on the econometric analysis of this relationship, while ensuring that the sentiment data is suitable and valid for the exercise. The financial news headlines are pre-processed before they are used in machine learning techniques. The Python platform is used for this exercise, and the process is briefly explained below.

More than 40,000 financial news headlines were downloaded from the Refinitiv database for the period from January 1, 2023, to December 11, 2023, an average of 110 headlines per day. A sentiment score is calculated for each day, quantifying the textual impression as a number ranging from negative to positive. This processing is done through several machine learning (ML) techniques, and a natural language toolkit is used to process the headlines. It is imperative to use libraries pre-trained on financial vocabulary. In this exercise, the FinBERT and PYsentiment2 libraries are used to arrive at a sentiment score for each news headline, and the most accurate prediction is used for the regression analysis.

Image 4. Word frequency map of Finnish financial news

This chart represents the distribution of words according to their number of appearances in the dataset. It is evident that the dataset is centered on Finland. The charts below illustrate the number of appearances of key words in the dataset.

Figure 2. Word frequency bar chart
Figure 3. Word frequency pie chart

3.3.1 FinBERT
Each news headline is pre-processed to remove special characters, include dates, and translate headlines in Finnish and Swedish into English through a Google Translator command. Thereafter, the FinBERT model is installed, and the data is processed to arrive at a sentiment score. FinBERT sentiment scores are 0 for neutral, 1 for negative, and 2 for positive. Because these codes are category labels rather than points on an ordered scale, simply averaging them over the 100-plus headlines of a day does not provide a valid sentiment for that day.
In addition, assigning equal weights to each score essentially reproduces the simple average method and ignores the fact that the market may react differently to positive news than to negative news, which is known as the leverage effect. Therefore, assigning more weight to negative sentiment seems reasonable when aggregating headline scores into a daily sentiment score, although identifying the optimal weights is challenging. As a proxy, 50% more weight is assigned to negative scores. This choice is supported by Danowski et al. (2020), who find that negative scores were nearly twice as strong as positive scores; assigning a weight of 1 to neutral, 1 to positive, and 1.5 to negative scores therefore seems a fair approach. To address anomalies in arriving at a consolidated sentiment score, two further models were used, namely Naïve Bayes and Support Vector Machine (SVM).

The chart below illustrates the main steps followed in the FinBERT sentiment calculation.

Figure 4. FinBERT sentiment score calculation map
• Data collection; pre-processing of news headlines and saving the data in CSV format
• Upload to Python
• Installing FinBERT; installing transformers, vectorizers, and Google Translator; processing the data with pre-trained libraries
• Translation of Swedish and Finnish news into English, and chunk processing of the 40,000-plus news headlines
• Calculating news sentiment; calculating the weighted average score for each day
• Download daily sentiment to Excel

FinBERT and other word-based sentiment libraries have the major drawback of responding to the individual words in a headline rather than to the statement as a whole. The sentiment scores derived from FinBERT were, in several cases, inconsistent with the content of the headline statements. The sample table in Figure 5 illustrates these discrepancies: negative headlines are given a positive score, and vice versa.
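One way to implement the 1.5 negative weighting described in this subsection is sketched below. The thesis does not spell out how the weighted labels are combined, so the mapping of labels to signed values (+1 positive, 0 neutral, −1 negative) and the aggregation formula, a weighted mean Σwᵢsᵢ / Σwᵢ per day, are assumptions; the function name, dates, and labels are placeholders.

```python
from collections import defaultdict

# Label codes as given in the text: 0 = neutral, 1 = negative, 2 = positive
SIGNED_VALUE = {0: 0.0, 1: -1.0, 2: 1.0}  # assumed direction of each label
WEIGHT = {0: 1.0, 1: 1.5, 2: 1.0}         # negative headlines weighted 1.5


def daily_weighted_sentiment(labelled_headlines):
    """Weighted mean of signed labels per day (hypothetical helper).

    labelled_headlines: iterable of (day, label_code) pairs, one per headline.
    Returns a dict mapping each day to sum(w * s) / sum(w).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for day, label in labelled_headlines:
        weighted_sum[day] += WEIGHT[label] * SIGNED_VALUE[label]
        weight_total[day] += WEIGHT[label]
    return {day: weighted_sum[day] / weight_total[day] for day in weighted_sum}


# Placeholder labels for two days (not thesis data)
sample = [
    ("2023-01-02", 2), ("2023-01-02", 1), ("2023-01-02", 0),
    ("2023-01-03", 2), ("2023-01-03", 2),
]
print(daily_weighted_sentiment(sample))
```

On the first sample day there is one positive, one negative, and one neutral headline: a simple average of the signed values would be 0, whereas the weighted version gives −0.5/3.5 ≈ −0.14, reflecting the heavier weight on negative news.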
Figure 5. Sample sentiment score calculated from FinBERT

3.3.2 Naïve Bayes and SVM
The Naïve Bayes family of algorithms uses Bayes’ theorem and probability theory to predict the category of a given sample, such as a news headline. Owing to its probabilistic nature, the method determines each category’s probability for a given sample and outputs the category with the highest probability. Its main assumption is that every feature contributes equally and independently (given the category) to the result, and it calculates the likelihood of an event based on the likelihood of an earlier event. The theorem is formulated as:

Equation 1
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where A and B are events and P(B) ≠ 0. The model evaluates the likelihood that event A occurs, given that event B is true. To test this, the data is uploaded to the Python platform and split into training and testing sets. The text is then converted into a matrix of token counts through vectorization so that it can be used in the machine learning algorithms (Grisel et al., 2023). This exercise produced a low accuracy rate of 40%; therefore, the Naïve Bayes combined sentiment score is not selected for this paper. To address this issue, a Support Vector Machine (SVM) is used to try to improve the accuracy rate. Similar to Naïve Bayes, SVM works on the vectorized token counts, and with this method the accuracy improves slightly, to 54%. The accuracy rate of each model represents how often it correctly predicts the outcome in the data. Both models therefore produced low accuracy rates when used to arrive at a combined daily sentiment score.

3.3.3 PYsentiment2
Similar to the FinBERT process, a separate set of commands is used to calculate the sentiment score. In addition, a polarity score is calculated to derive a sentiment for the full news item.
The PYsentiment2 library is used for general financial sentiment analysis and offers two sentiment dictionaries, Harvard IV-4 and Loughran and McDonald (DeRobertis, 2020).

Figure 6. PYsentiment2 sentiment calculation map
• Pre-processing of news headlines and saving the data in CSV format
• Upload to Python
• Installing pysentiment2 and googletrans; applying stopwords
• Translation of Swedish and Finnish news into English, and chunk processing of the 40,000-plus news headlines
• Calculating the daily compound sentiment score; applying the function to the 'Processed_Text' column; adding the polarity score; checking the accuracy of the model
• Download daily sentiment to Excel

The library analyzes each headline statement and arrives at a polarity score, providing a customized output that depends on the number of positive and negative words in the headline (Shah, 2021). For example, Positive: 4 indicates that there are four words in the text with a positive sentiment, and Negative: 1 means that there is one word with a negative sentiment. A polarity score is provided for each news headline, taking the positive and negative counts into account as follows:

Equation 2
$$\text{Polarity} = \frac{\text{Positive} - \text{Negative}}{\text{Positive} + \text{Negative}}$$

Accordingly, the polarity score of the above example is (4 − 1)/(4 + 1) = 0.6, which can be used as a combined sentiment score for a news item. Apart from this, the model also measures the presence of personal opinions, evaluations, or beliefs through a subjectivity score (Shah, 2021), calculated as:

Equation 3
$$\text{Subjectivity} = \frac{\text{Positive} + \text{Negative}}{\text{Number of words in the text}}$$

However, the subjectivity score is not considered in the sentiment calculation, as opinions may differ for each investor. After arriving at a more accurate sentiment score for each headline, the next challenge is to calculate a combined sentiment score for each day. Ranking the scores by the number of times they appear in the dataset on a particular day may provide an overall sentiment for that day.
For example, if most headlines on a given day are negative, the day’s sentiment is negative, and to obtain a single sentiment score for a day, the weighted average method is applied. Polarity scores represent the day’s sentiment by taking the positive and negative word counts for the day into account. This thesis applies the basic weighted average method to sentiment scores. A day’s polarity sentiment score is calculated by averaging the polarity scores of that day’s headlines. Similarly, a weighted average of the positive and negative words is calculated for each day using the formula below.

Equation 4
$$\text{Day's sentiment} = \left(\frac{\text{Positive}}{\text{Positive} + \text{Negative}} \times \text{positive words}\right) + \left(\frac{\text{Negative}}{\text{Positive} + \text{Negative}} \times \text{negative words}\right)$$

A trial-and-error comparison shows that both measures provide similar data patterns and evidence of stationarity. Therefore, the empirical analysis is performed on the polarity averages, which are taken as the day’s sentiment in the regression analyses. The graph below shows the distribution of each score across the news.

Figure 7. Comparison of polarity and weighted average sentiment scores (Polarity Scores vs Weighted Average; series: Average of Polarity, Average of Weighted Average, and a linear trend line for each)

Sentiment scores capture news headlines published on both working and non-working days, while stock prices are only available on business days. To capture the maximum possible number of data points, the previous business day’s price volatility is carried forward to non-working days, creating 349 data points. The volatility of stock prices for each day is the difference between the closing and opening prices. Each volatility series and the other variables are transformed into logarithmic series to prepare the data for the regression analysis.

4 Methodologies
4.1 Statistical Models
This section looks in detail at the methodologies applied to test each hypothesis.
The main model used is the Vector Autoregressive (VAR) model, as this study relates two main time series datasets: daily sentiment scores and daily stock price volatility. In addition, individual stock price volatility, interest rates, and inflation rates are captured in further hypothesis tests. VAR models can be used to examine the relationships among several time series (Chetty, 2018); here, this involves examining the correlations between sentiment scores and stock price volatility over time. The model is widely used in finance to make more accurate forecasts by capturing the dynamic interdependencies between variables.

Time series data, which are sets of data points gathered over a period of time, are frequently used in financial statistics and economic analysis. A time series’ correlation with its own lagged values is known as autocorrelation; it is a fundamental characteristic in time series analysis because it facilitates the discovery of patterns in the data. VAR allows the relationships of each variable with its own past values and with the past values of the other variables to be examined. Building on this, the relationships between the aforementioned variables are examined using both VAR and ARDL analyses, and the results of these two tests are reported in the Results section.

4.1.1 Vector Autoregressive Model (VAR)
VAR is widely used in time series analysis and forecasting (Mohr, 2018). It models each variable as a function of its own past values and the past values of the other variables in the system, which is helpful when multiple time series influence each other. Other autoregressive models, such as AR, ARMA, or ARIMA, are unidirectional, whereas VAR is bi-directional (two-way), which enables a more comprehensive understanding of the variables along with improved forecasting (Prabhakaran, 2024).
Because of its usability in real-world applications, VAR is an effective tool in disciplines like finance and economics, where it is crucial to understand the dynamic relationships between several variables over time. Stemming from the univariate autoregressive model, the VAR model regresses a vector of time series on lagged vectors of the same variables. Put differently, the value of a variable depends on its own lags: today’s price conditions form the base for tomorrow’s price, and similarly for news sentiment. For example, a news article published today about a particular share may set the mood for the next article published about that stock. Autoregressive models are therefore well suited to capturing this behavior in this study.

The basic univariate autoregressive model has one dependent, or endogenous, variable, and the current value of variable Y depends on its own lag:

Equation 5
$$y_t = a + b_0\, y_{t-1} + e_t$$

This is an order-one autoregressive, AR(1), model, and the order can be increased by adding more lags. Here a and e_t stand for the intercept and the error term, respectively, where the error term is normally distributed, and b_0 is the autoregressive coefficient. Stationarity of the data is a key assumption of this model (Mohr, 2018). Regressing a variable on its own lag may not fully reveal the relationship between two or more variables, as other variables may also have a significant impact on each other. Models that incorporate contemporaneous and lagged values of other (exogenous) variables, together with lagged values of the dependent variable, represent this concept (Mohr, 2018). This is known as the Autoregressive Distributed Lag (ADL) model, and its equation can be written as follows:

Equation 6
$$y_t = a + b_0\, y_{t-1} + b_1\, x_t + b_2\, x_{t-1} + e_t$$

This ADL model may forecast the relationship between the variables better than a basic autoregressive model.
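To make Equation 6 concrete, the sketch below simulates a series from an ADL(1,1) process with hypothetical coefficients and recovers them by ordinary least squares using NumPy. It illustrates the model structure only; the thesis estimates its models in EViews, and the coefficients here are not estimates from the thesis data. The function name `fit_adl11` is hypothetical.

```python
import numpy as np


def fit_adl11(y, x):
    """OLS estimate of y_t = a + b0*y_{t-1} + b1*x_t + b2*x_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # The first observation is dropped because it has no lagged values
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:], x[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef  # [a, b0, b1, b2]


# Simulated series with known (hypothetical) coefficients
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 + 0.4 * y[t - 1] + 0.3 * x[t] - 0.2 * x[t - 1] + rng.normal(scale=0.1)

coef = fit_adl11(y, x)
print(np.round(coef, 3))
```

With 5,000 simulated observations the estimates come out close to the true values (0.5, 0.4, 0.3, −0.2); with the roughly 350 daily observations available in this study, the sampling error would be considerably larger.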
If the lagged values of the endogenous variable also affect the exogenous variable, the VAR approach accommodates this two-way relationship (Mohr, 2018). The basic VAR equation can therefore be written as follows:

Equation 7
$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$$

or, equivalently, as two equations:

$$y_{1t} = a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + e_{1t}$$
$$y_{2t} = a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + e_{2t}$$

This model assumes that everything depends on everything, so every variable in the system is treated as endogenous, and each row of the matrix equation above can be considered a separate equation.

4.1.2 Auto Regressive Distributed Lag Model (ARDL)
ARDL is an extension of autoregressive models that considers both the lags of other variables and the variable’s own lagged values (Gupta, 2022). Because of this, ARDL is especially useful for examining sentiment scores and how they affect price volatility. Furthermore, the ARDL model addresses the problem of collinearity by including a variable’s lag along with the lags of the other independent variables (Chetty, 2018). Collinearity is a high correlation among independent variables (stats.stackexchange, 2015); it should be addressed before tests are performed, as it can lead to unreliable and unstable estimates. ARDL is therefore helpful for understanding the complex dynamics of this study, as more exogenous variables are introduced in later hypothesis tests.

The ARDL model is also beneficial for capturing the dynamics of the variables over time, as changes in the variables may not be reflected immediately (Chetty, 2018). In addressing autocorrelation, a lagged response variable included in the model serves as a substitute for the autocorrelation of the response variable. Generally, after the effect of autocorrelation is eliminated, the remaining explanatory variables are included in the test to determine whether a statistical relationship between them remains (Nguyen, 2021).
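The point that a lagged response variable absorbs autocorrelation can be checked with a small simulation, sketched below under assumed parameters. The series y follows its own lag (coefficient 0.7) plus an exogenous x; regressing y on x alone leaves strongly autocorrelated residuals, while adding y_{t-1}, as an ARDL specification does, leaves residuals that are close to uncorrelated. The parameter values and helper names are hypothetical and serve only to illustrate the mechanism.

```python
import numpy as np


def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a residual series."""
    r = r - r.mean()
    return float(np.dot(r[1:], r[:-1]) / np.dot(r, r))


def ols_residuals(X, y):
    """Residuals from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


rng = np.random.default_rng(1)
n = 3000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t] + rng.normal()

ones = np.ones(n - 1)
# Static regression: y_t on x_t only (the y_{t-1} effect ends up in the residuals)
res_static = ols_residuals(np.column_stack([ones, x[1:]]), y[1:])
# ARDL-style regression: the lagged response y_{t-1} is added as a regressor
res_ardl = ols_residuals(np.column_stack([ones, x[1:], y[:-1]]), y[1:])

print("static:", round(lag1_autocorr(res_static), 3))
print("with lagged y:", round(lag1_autocorr(res_ardl), 3))
```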
ARDL can also separate long-run and short-run effects, which are used to test cointegration and the long-run relationship between the variables, and including lagged variables can improve the model’s fit by enabling it to capture more of the variation in the variables (Gupta, 2022). A common challenge when estimating ARDL models is singularity due to multicollinearity in the data. This occurs when regressing on different lags of the independent variables, which may lead to unstable coefficient estimates (Eviews, 2014). To address this when estimating the ARDL model in EViews, only one lagged version of each independent variable is entered in the equations, as the optimal lags are picked automatically by the test.

ARDL captures four components to forecast patterns: self-lagged values, distributed lags, seasonality, and trends. The trend component is captured by $e + x_0 + x_1 t + x_2 t^2 + \dots + x_k t^k$ and the seasonality by $\sum_i X_i S_i$ (Gupta, 2022). The basic equation is therefore as follows:

Equation 8
$$Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \sum_{j=0}^{q} \gamma_j X_{t-j} + \sum_{k=0}^{r} \delta_k Z_{t-k} + \epsilon_t$$

In this equation, p, q, and r are the optimal lag lengths, which are picked automatically using the Akaike Information Criterion (AIC); Y, X, and Z are the variables; β, γ, and δ are coefficients; and ϵ and α are the error term and intercept, respectively.

4.1.3 VAR vs ARDL
VAR uses multiple equations, explaining each variable by its own lags and the lags of the other variables, while ARDL uses a single equation in which one dependent variable is regressed on the independent variables and its own lags. All variables in a VAR model are considered endogenous, meaning that they are affected by the other variables in the system, which leads to simpler forecasting. In comparison, the ARDL model usually treats one variable as endogenous and the others as exogenous, which makes forecasting more difficult (stats.stackexchange, 2015).
This means that although the other variables affect the endogenous variable, they are not considered to be affected by it.

4.2 Hypothesis 1 – Correlation
H1: There is a significant positive correlation between the sentiment scores of financial news and the daily price volatility on the Helsinki Stock Exchange.
H0: There is no significant correlation between sentiment scores and daily price volatility.

To test hypothesis 1, the basic VAR test and the Pearson correlation coefficient are applied to identify the initial relationship between the variables. The dependent variable Y is Nasdaq Helsinki, and the independent variable X is the daily sentiment score derived from the sentiment analysis of each news headline. The econometric model for this analysis is:

Equation 9
$$r = \frac{\sum_{i=1}^{n} (NS_i - \overline{NS})(NH_i - \overline{NH})}{\sqrt{\sum_{i=1}^{n} (NS_i - \overline{NS})^2 \sum_{i=1}^{n} (NH_i - \overline{NH})^2}}$$

$NS_i$ represents the News Sentiment score of Finnish financial news headlines for day i. A consolidated score is derived for each day, considering positive, negative, and neutral sentiment scores, and this score represents the overall market mood. The dependent variable $NH_i$ is the price volatility of Nasdaq Helsinki. The tests are performed for several dependent variables in this study, ranging from top individual stocks to index price volatility. The volatility is calculated as the standard deviation of the daily returns for the selected dependent variable. n is the total number of days from which the scores and volatility are derived. $\overline{NS}$ is the mean of the sentiment scores and $\overline{NH}$ is the mean of price volatility. $(NS_i - \overline{NS})$ and $(NH_i - \overline{NH})$ are the deviations of the individual sentiment scores and price volatility from their means, respectively, and $(NS_i - \overline{NS})(NH_i - \overline{NH})$ is the product of these deviations.

4.3 Hypothesis 2 – Lagged Effect
H2: The sentiment scores of financial news have a lagged effect on the daily price volatility in the Helsinki Stock Exchange.
H0: The sentiment scores do not have a lagged effect on the daily price volatility.

The basic VAR(1), or order-one, model with one endogenous variable is given below; it tests the relationship between a variable and its own lag. Nasdaq Helsinki (NH) and News Sentiment (NS) each act as the dependent variable in turn.

Equation 10
$$NH_t = a + \beta_1 NH_{t-1} + \zeta$$

Further to VAR, the ARDL model can be applied to the dependent variable NH with a first-order lag, along with the independent variable NS, as below:

Equation 11
$$NH_t = a + \beta_1 NH_{t-1} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \zeta$$

In this model, today’s stock price volatility $NH_t$ is expressed as a function of its own first lag $NH_{t-1}$ and of the current and first-lag sentiment scores, $NS_t$ and $NS_{t-1}$. The coefficients $\beta_1$, $\gamma_1$, and $\gamma_2$ measure the impact of a one-unit change in the respective variables on the expected value of the dependent variable. To identify the optimal lag periods, the EViews lag length criteria are used. The results support two lags (order two) according to the Akaike Information Criterion (AIC), and this order is used in the equations. The ARDL model with two lag periods is therefore as follows:

Equation 12
$$NH_t = a + \beta_1 NH_{t-1} + \beta_2 NH_{t-2} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \gamma_3 NS_{t-2} + \zeta$$

The order-two model checks whether today’s price volatility in NH depends on its own lags $NH_{t-1}$ and $NH_{t-2}$, and whether this effect is amplified by the news sentiment scores $NS_t$, $NS_{t-1}$, and $NS_{t-2}$. Regardless of the number of lags added to the above equation, the ARDL test automatically picks the number of lags suitable for the data according to the AIC. Therefore, should there be a significant relationship between NH and NS, the test is expected to yield significant p-values for the lagged variables. If no significant relationship is established under the AIC selection process, a least squares test is performed for NH up to lag 3 to further test the lagged effect on NH.
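The automatic lag selection described above can be approximated outside EViews. The sketch below fits every ARDL(p, q) candidate by OLS on a common sample and picks the pair with the lowest AIC, computed here in the Gaussian form n·ln(RSS/n) + 2k. This form is an assumption: EViews reports a per-observation, log-likelihood-based version, but for a fixed sample the two rank candidate models identically. The simulated data, lag limits, and function names are hypothetical.

```python
import itertools

import numpy as np


def ardl_design(y, x, p, q, max_lag):
    """Regressors for y_t = a + sum_{i=1..p} b_i*y_{t-i} + sum_{j=0..q} g_j*x_{t-j}.

    Every candidate starts at t = max_lag, so all models use the same sample
    and their AIC values are comparable.
    """
    n = len(y)
    cols = [np.ones(n - max_lag)]
    cols += [y[max_lag - i:n - i] for i in range(1, p + 1)]
    cols += [x[max_lag - j:n - j] for j in range(q + 1)]
    return np.column_stack(cols), y[max_lag:]


def select_ardl_by_aic(y, x, max_p=3, max_q=3):
    """Return the (p, q) pair with the lowest AIC and the AIC of every candidate."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    max_lag = max(max_p, max_q)
    aic = {}
    for p, q in itertools.product(range(1, max_p + 1), range(max_q + 1)):
        X, target = ardl_design(y, x, p, q, max_lag)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        n_obs, k = X.shape
        aic[(p, q)] = n_obs * np.log(rss / n_obs) + 2 * k
    best = min(aic, key=aic.get)
    return best, aic


# Hypothetical data: the true process uses one lag of y plus x_t and x_{t-1}
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.2 + 0.5 * y[t - 1] + 0.8 * x[t] + 0.6 * x[t - 1] + rng.normal(scale=0.5)

best, aic = select_ardl_by_aic(y, x)
print("selected (p, q):", best)
```

Because x_{t-1} carries a large coefficient in the simulated process, candidates without it (q = 0) fit clearly worse; AIC may still occasionally prefer an extra lag, since the criterion tends to err on the side of larger models.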
4.4 Hypothesis 3 – High and Low Volatile Companies
H3: The combined impact of News Sentiment and Nasdaq Helsinki on daily price volatility is stronger for highly volatile stocks.
H0: The combined impact of News Sentiment and Nasdaq Helsinki is not stronger for highly volatile stocks than for low-volatility stocks.

Similar to the H2 model, the model is applied to test H3 on the three most and the three least volatile companies. This analysis is expected to identify differences in the impact of NH and NS across volatility levels. The top three and bottom three companies in the 200-day volatility bracket are picked for this test. The optimal lag periods for each company as the dependent variable are selected automatically by the ARDL model according to the AIC method. The companies and their respective volatilities are:
High: Lehto Group Oyj (190.40%), Valoe Oyj (90.49%), Incap Oyj (78.83%)
Low: Elisa Oyj (15.51%), Lassila and Tikanoja Oyj (16.41%), Tallink Grupp AS (17.57%)

This company-wise analysis indicates how the impact differs according to volatility level, and which group is more sensitive to NS and NH, by comparing the coefficients of the sentiment scores for each company. If the average coefficient of the high-volatility companies is significantly larger than that of the low-volatility companies, H3 is supported, leading to the conclusion that news sentiment has a greater impact on high-volatility companies in the selected sample.

The graphs below illustrate the distribution of daily price volatility and residuals for each variable. As stated above, the daily volatility for each variable is calculated as the difference between the opening and closing prices. These graphs also visually indicate that the data is stationary for each variable.
Table 1. Volatility distribution of each stock (daily series, January–December 2023: Lehto Group, Valoe Oyj, Incap Oyj, Elisa Oyj, Lassila and Tikanoja Oyj, Tallink Grupp, Sentiment Score)
Table 2. Residual distribution of each stock (residuals, January–December 2023: LEHTO_GROUP, VALOE_OYJ, INCAP_OYJ, ELISA_OYJ, LASSILLA_AND_TIKANOJA_OY, TALLINK_GRUPP, SENTIMENT_SCORE)

4.5 Hypothesis 4 – Macroeconomic Rates
A study by Dumiter et al. (2023) uses autoregressive models to regress multivariate time series of news sentiment and American stock market indices. The study finds that news sentiment has a strong correlation with stock market movements, and that key interest rates and inflation rates have a significant impact on this relationship. H4 tests the impact of inflation and interest rates by adding two variables to the equations, denoted $PR_t$ and $IR_t$, representing policy rates and inflation rates, respectively. Inflation rates are updated monthly and interest rates weekly. Due to the limitations of testing mixed frequencies, the respective month’s rates are therefore used for each day.
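Assigning each day the rate of its calendar month, as described above, amounts to a simple lookup by (year, month). The sketch below uses only the Python standard library; the function name and the rate values are placeholders, not Bank of Finland figures.

```python
from datetime import date, timedelta


def monthly_to_daily(monthly_rates, start, end):
    """Map every calendar day in [start, end] to the rate of its month.

    monthly_rates: dict keyed by (year, month) tuples (hypothetical layout).
    """
    daily = {}
    day = start
    while day <= end:
        daily[day] = monthly_rates[(day.year, day.month)]
        day += timedelta(days=1)
    return daily


# Placeholder values only, not published inflation figures
inflation = {(2023, 1): 1.0, (2023, 2): 2.0}
daily_inflation = monthly_to_daily(inflation, date(2023, 1, 30), date(2023, 2, 2))
for day, rate in daily_inflation.items():
    print(day.isoformat(), rate)
```

The same lookup works for weekly interest rates if the key is changed to the month of each observation, which is how the text aligns both series to a daily frequency.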
One study finds a negative correlation between inflation rates and stock prices and a positive correlation between interest rates and stock prices (Eldomiaty, 2020). This is tested in this section, along with the impact of Nasdaq Helsinki, since a country’s stock exchange can be considered part of its macroeconomy (Bloomenthal, 2023).
H4: Price volatility is significantly influenced by macroeconomic indicators for all the variables.
H0: Price volatility is not significantly influenced by macroeconomic indicators for all the variables.

Equation 13
$$NH_t = a + \beta_1 NH_{t-1} + \beta_2 NH_{t-2} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \gamma_3 NS_{t-2} + \delta_1 PR_t + \epsilon_1 IR_t + \zeta$$

In addition to the variables explained in the previous ARDL equation, this equation tests the relationship between the same variables and their own lags, as well as two additional control variables: policy rates (PR) and inflation rates (IR). The PR and IR components are treated as fixed, or controlled, in this test.

5 Test and Results
The chart below illustrates the process followed in this section.

Figure 8. Map of statistical tests
Correlation → Alternative Models → Stationarity Test (Unit Root) → Optimal Lag → Autocorrelation (Serial Correlation LM; Correlogram Q-Stat) → Adjusting Autocorrelation → VAR-ARDL Results → Interpretation of Results → Summary

5.1.1 Testing Data for Descriptive Statistics
First, NS and NH are tested for significant correlations, with NH as the dependent variable and NS as the independent variable. The results are discussed further in the Results section. The graph below gives a snapshot of the relationship that may exist between volatilities and news sentiment.

Figure 9. Volatility comparison of NASDAQ Helsinki and sentiment scores (Sentiment Score and NasdaqHelsinki, January–December 2023)

5.1.2 Alternative Models
As discussed in the methodology section, the VAR and ARDL models are considered the best fit for this study.
The VAR and ARDL models assume that the data is stationary, which is an important property of the data used to explain the relationship between price volatility and sentiment scores. A basic check is to plot the data, which can visually show whether the series moves around a fixed mean. If the two time series are not stationary, a combination of both may still be stationary; this can be tested with a cointegration test (stats.stackexchange, 2015). Although the plots suggest some level of stationarity, a unit root test is also performed to check stationarity formally.

Table 3. Volatility and residual distribution of NASDAQ Helsinki and sentiment (panels: NasdaqHelsinki, Sentiment Score, NASDAQHELSINKI Residuals, SENTIMENT_SCORE Residuals; January–December 2023)

Stationarity is a basic concept and an important part of time series analysis. Stationary data behave similarly over time in terms of mean, variance, and autocorrelation, and this consistency makes them more predictable (stats.stackexchange, 2015). Statistical conclusions may be incorrect if the data is non-stationary. For example, even if two series are unrelated, a regression analysis may indicate a relationship between them if both are trending over time (stats.stackexchange, 2015). For stationary random variables, a number of helpful theoretical results hold, such as the central limit theorem and the law of large numbers. Since the future statistical process is expected to remain constant, stationary series are simpler to predict (Singh, 2023). If a time series is non-stationary, its behavior can only be examined for the period under consideration.
By contrast, conclusions drawn from a stationary series can be applied to other time periods (Adeleye, 2018). This makes autoregressive models appropriate for this study. To ensure the data is suitable for examining the relationships among the variables, a number of tests are performed.

5.1.3 Unit Root Test
A unit root test is a statistical procedure used in economics to determine whether a time series variable is non-stationary and has a unit root. Stationarity implies stable statistical properties over time (xlstat, 2023). A unit root indicates that the