Buddhi Weerasekara 

Sentiment Analysis and Stock Volatility 

Evidence from Financial News headlines Published related to Finland and 
NASDAQ Helsinki 

 
Vaasa 2024 

School of Accounting and Finance  
Master’s thesis in Finance  

Master’s degree Programme in Finance 



Contents 

1 Introduction 7 

1.1 Motivation 8 

1.2 Finland 11 

1.3 Identification of Gaps 15 

1.4 Background 19 

2 Literature Review and Critical Analysis 24 

2.1 Sentiment Analysis 24 

2.2 Model and Data Gaps 28 

3 Data 31 

3.1 Data Process Chart 31 

3.2 Pre-Processing Data 32 

3.3 Sentiment Score Calculation 34 

3.3.1 FinBERT 36 

3.3.2 Naïve Bayes and SVM 37 

3.3.3 PYsentiment2 38 

4 Methodologies 41 

4.1 Statistical Models 41 

4.1.1 Vector Autoregressive Model (VAR) 41 

4.1.2 Auto Regressive Distributed Lag Model (ARDL) 43 

4.1.3 VAR Vs ARDL 44 

4.2 Hypothesis 1 – Correlation 44 

4.3 Hypothesis 2 – Lagged Effect 45 

4.4 Hypothesis 3 – High and Low Volatile Companies 47 

4.5 Hypothesis 4 – Macroeconomic Rates 49 

5 Test and Results 50 

5.1.1 Testing Data for Descriptive Statistics 50 

5.1.2 Alternative Models 51 

5.1.3 Unit Root Test 52 

5.1.4 Optimal Lag Test 53 

5.1.5 Autocorrelation Test 53 



5.1.6 Correlogram Q Statistics Test 54 

5.1.7 Serial Correlation LM Test 55 

5.1.8 Adjusting for Autocorrelation 55 

5.2 Results 56 

6 Conclusions 60 

7 Further Research and Limitation 61 

8 References 62 

9 Appendix – Statistical Results 71 

9.1 Statistical Test Results – EViews 71 

9.2 Python Workfiles 75 

  

List of Images, Figures, Tables, Equations, and Appendices 
 
Images  
 
Image 1. Sentiment map from Finviz.com                                                                    11 

Image 2. Visual sentiment web application                                                                  23 

Image 3. Sample sentiment analysis heatmap                                                             25 

Image 4. Words frequency map of Finnish financial news                                         34 

 
Figures 
 
Figure 1. Data processing map                                                                                       31 

Figure 2. Word frequency bar chart                                                                              35 

Figure 3. Word frequency pie chart                                                                              35 

Figure 4. FinBERT sentiment score calculation map                                                   36 

Figure 5. Sample sentiment score calculated from FinBERT                                     37 

Figure 6. PYsentiment2 sentiment calculation map                                                   38 

Figure 7. Comparison of polarity and weighted average sentiment scores           40 

Figure 8. Map of statistical tests                                                                                    50    

Figure 9. Volatility comparison of NASDAQ Helsinki and sentiment scores            50 

 
Tables  
 
Table 1. Volatility distribution of each stock                                                                 48 

Table 2. Residual distribution of each stock                                                                 48 

Table 3. Volatility and residual distribution of NASDAQ Helsinki and sentiment   51 

Table 4. Least Squares of three lags results                                                                  59 

  
Equations 
  
Equation 1. Naïve Bayes theorem                                                                                  37 

Equation 2. Weighted average of sentiment scores                                                    39 

Equation 3. Personal opinion score calculation                                                           39 

Equation 4. Day’s sentiment calculation                                                                       40 

Equation 5. Basic Vector Autoregressive (VAR) equation                                           42 



Equation 6. Basic Autoregressive Lag Equation (ADL)                                                  42 

Equation 7. Breakdown of VAR                                                                                         42  

Equation 8. Autoregressive Distributed Lag Model (ARDL)                                         44 

Equation 9. Pearson Correlation Equation                                                                     45  

Equation 10. Application of VAR                                                                                      45 

Equation 11. Application of ARDL Lag 1                                                                          46 

Equation 12. Application of ARDL Lag 2                                                                         46 

Equation 13. Application of ARDL with Macroeconomic Variables                            49 

 
Appendices                                                                                                   71-78                                                                                                        
 
Appendix 9.1.1.  OLS Estimator 

Appendix 9.1.2.  Descriptive Statistics of Variables 

Appendix 9.1.3.  VAR Estimation NH and NS 

Appendix 9.1.4.  ARDL Estimation NH and NS 

Appendix 9.1.5.  ARDL Individual Stocks 

Appendix 9.1.6.  ARDL P-Values of All Variables 

Appendix 9.1.7. Unit Root Test 

Appendix 9.1.8. Optimal Lag Test 

Appendix 9.1.9. Correlogram Q-Stat Tests 

Appendix 9.1.10. Serial Correlation Tests 

Appendix 9.2.1. PYsentiment Codes                                                                                    

Appendix 9.2.2. FinBERT Codes 

 

 
UNIVERSITY OF VAASA  
School of Accounting & Finance  
Author:       Buddhi Weerasekara x1461755  
Title of the Thesis:                     Sentiment Analysis and Price Volatility 
Degree:       Master’s of Finance (Business Economics)  
Programme:     Master’s  
Supervisor:      Klaus Grobys  
Year: 2024                                   Pages: 78 

 
ABSTRACT:  
The challenge of forecasting the stock market fascinates researchers. Studies that use innovative prediction approaches continue to emerge, despite the overwhelming body of evidence suggesting that the dynamics of the financial market cannot be foreseen. The widely accepted Efficient Market Hypothesis implies that stock prices cannot be predicted systematically. However, numerous studies have still been conducted in the field of stock price prediction. This thesis uses machine learning techniques in Python to quantify non-numerical financial data, namely financial news headlines, as news sentiment scores. Predicting financial market trends using time series analysis and natural language processing is a challenging and complex task, given the many factors that might affect stock prices, such as political and economic events. Nevertheless, the rapid development of digital platforms has greatly expanded the information available for forecasting financial trends. A variety of financial data can be freely accessed and evaluated for insights, and people's opinions are widely shared, which may set the sentiment of the market. Numerous studies have examined the link between public opinion and market volatility and suggest that sentiment expressed by individuals may influence market trends through mechanisms such as the ripple effect and the herd effect. Variations in opinion and irrational decision-making may depend on factors such as personal risk appetite, level of financial literacy, and other social factors such as age, gender, and religion. It is therefore of interest to examine these relationships for a smaller yet economically important nation in Europe. 

 
This master’s thesis investigates the relationship between News Sentiment (NS) and NASDAQ Helsinki (NH), as well as their effect on a sample of companies with different volatility profiles. It also examines the lagged effects of the variables and the impact of inflation, interest rates, and indices on each variable. The study concludes that there was no statistically significant relationship between NS and NH and that NS could not account for the variation in NH. The study applies vector autoregressive (VAR) and autoregressive distributed lag (ARDL) models to examine the relationships between the variables, with the optimal lag length selected according to the Akaike Information Criterion (AIC). In the ARDL test, NH showed a strong relationship with its own first lag, consistent with Engle (1982), who shows that past error terms help predict the current error variance. However, the tests did not show a lagged effect of NS on NH, nor was there a noticeable correlation for the other companies. The combined effect of NH and NS was found to be less significant for highly volatile companies in the selected sample than for low-volatility companies. Additionally, the statistical significance of some variables suggests that inflation, interest rates, and indices had a significant impact on them. 

 
KEYWORDS: Finland, Sentiment, Volatility, VAR, ARDL, NASDAQ Helsinki, Financial News. 



1 Introduction 

Due to rapid advances in technology and artificial intelligence (AI), many business environments have adapted to a fast processing pace. In financial markets, high-frequency trading is a good example (Deneuve, 2022). Macroeconomic factors, along with other significant indicators, are processed within seconds to reach an investment decision (Rodrigo, 2024). News sentiment, derived from financial news published around the world, plays a key role in this process. Investors tend to base their financial decisions and trading activities on analyses from various platforms, including AI-driven trading robots, and these transactions are completed within seconds (Oehler et al., 2021; Wood et al., 2022). Because of this fast pace, investors have little to no time to read whole news articles. Financial news therefore tends to be short yet sufficient to convey the important information, and headlines in particular are written to cater to these fast-moving investors (Sathya, 2023). 

 
According to a research paper series published by the Czech National Bank, stock prices are not solely driven by market fundamentals (Gric et al., 2021). Events such as the Black Monday crash, the dot-com bubble, the global financial crisis, and, more recently, the crypto crash triggered by the FTX scandal illustrate this. Gric et al. (2021) document that the influence of mood on future returns is substantially greater for individual investors than for institutions. Another study shows how investor sentiment affects the cross-section of stock returns: Baker and Wurgler (2006) expect stocks whose valuations are highly subjective and stocks that are difficult to arbitrage to be more influenced by investor sentiment. The authors state that such swings in investor sentiment may cause mispricing. These studies further support the importance of factoring investor sentiment into the financial decision-making process. 

 
Given the growing value of sentiment analysis, analytical tools such as Sprout Social, RavenPack, and Buffer have become popular sources of sentiment scores used to forecast future prices. These systems use Natural Language Processing (NLP) methods to assess social media posts and news in order to understand consumer sentiment. An earlier study based on RavenPack analytics documents the relationship between firm-level return volatility and public news sentiment (Yip Ho et al., 2013). The authors process more than 1,200 news releases, together with their sentiment scores, at high frequency to examine intraday price volatility and conclude that negative news affects volatility more than positive news. 

 
Examining the relationship between daily stock price volatility and financial news sentiment could provide valuable insight into the dynamics of financial markets. This thesis follows a methodical approach: it quantifies the sentiment of news headlines related to Finland as sentiment scores and examines whether these scores are significantly related to stock price volatility, and thereby whether this approach could help predict future prices. The combination of sentiment analysis and stock market volatility is heavily researched for major economies but insufficiently studied for the Nordic markets, and for Finland in particular. 

 
1.1 Motivation  

Financial news sentiment analysis is a fast-growing area that uses Natural Language Processing (NLP), Machine Learning (ML), and big data analytics to extract useful investor information from published financial news (Simon and Nelson, 2022). The objective is to identify investor sentiment towards a particular stock, index, or other financial instrument through an ML process and to generate a sentiment score that can be used in financial planning. These sentiment scores can benefit market participants in various ways. 
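As a toy illustration of how a headline can be turned into a sentiment score, the sketch below counts positive and negative words and computes a polarity of (P - N) / (P + N). It is a minimal sketch under stated assumptions: the two word sets are hypothetical toy lexicons rather than a curated financial dictionary, the headlines are made up, and the formula is a common dictionary-based convention, not necessarily the scoring used later in this thesis.

```python
import re

# Hypothetical toy lexicons for illustration only; a real study would use a
# curated financial dictionary or a trained model instead.
POSITIVE = {"gain", "growth", "profit", "rise", "beat", "strong"}
NEGATIVE = {"loss", "decline", "fall", "cut", "weak", "layoffs"}


def polarity(headline: str) -> float:
    """Return (P - N) / (P + N) in [-1, 1], or 0.0 when no lexicon word occurs."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0  # neutral: no sentiment-bearing word found
    return (pos - neg) / (pos + neg)


if __name__ == "__main__":
    # Made-up headlines, not taken from the thesis data set.
    for text in (
        "Company A reports strong profit growth",
        "Company B announces layoffs after weak quarter",
        "Company C schedules annual general meeting",
    ):
        print(f"{polarity(text):+.2f}  {text}")
```

Because it only counts words, this approach misses negation (a headline such as "profit did not rise" scores as positive), which is one reason model-based scorers are also worth comparing.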

 
Investment and trading strategies can be supported by data extracted from financial news published about a particular asset. If sentiment data indicate a positive trend for a specific stock, an investor may choose to buy that stock, expecting its price to rise. Conversely, investors may rush to sell the stock after negative news to minimize losses. This behavior can be seen as part of a day-trading strategy in which stockholders take a quick glance at the day’s sentiment set by the news and make an irrational decision based solely on the mood the news item creates. 



This irrational behavioral aspect is discussed further later in this paper. However, sentiment scores can also be factored into long-term financial planning. Edward et al. (2007) explain how the mood set by the market can be used in pricing models such as the Dividend Discount Model (DDM) and the Capital Asset Pricing Model (CAPM). The DDM discounts all expected future dividends to arrive at the current stock price, which is a fundamental way of valuing a stock. Market sentiment is not accounted for in this basic calculation. To factor sentiment into the DDM, long-term sentiment scores may be considered along with the company’s future growth prospects. Models such as the Autoregressive Distributed Lag (ARDL) model capture both long- and short-term relationships between variables. If a long-term relationship can be established with such an econometric model, the DDM valuation can be improved to capture market sentiment by adjusting the required rate of return along with the dividend growth rate. A positive long-term sentiment should then be associated with gains, while a negative sentiment may signal losses or lead to the disposal of shares to avoid future losses. The required rate of return in the model represents the risk associated with the shares. If investors wish to apply the DDM in their daily valuation of stocks, the required rate of return may be adjusted for the investor’s risk appetite and personal beliefs about how the share will perform. In this case, an investor may demand a higher return when the sentiment score is low, which raises the required rate of return and lowers the DDM value of the share. 
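The effect of a sentiment-adjusted required rate of return on a DDM valuation can be sketched with the constant-growth (Gordon) form of the model, P0 = D1 / (r - g). The adjustment rule below, in which the required return rises by a fixed sensitivity times the negative of a sentiment score in [-1, 1], is a hypothetical illustration of the idea in the paragraph above; the function names, dividend, rates, and sensitivity are assumptions, not estimates from this thesis.

```python
def gordon_price(next_dividend: float, required_return: float, growth: float) -> float:
    """Constant-growth dividend discount model: P0 = D1 / (r - g), valid only if r > g."""
    if required_return <= growth:
        raise ValueError("required return must exceed the dividend growth rate")
    return next_dividend / (required_return - growth)


def sentiment_adjusted_return(base_return: float, sentiment: float, sensitivity: float) -> float:
    """Hypothetical rule: a low (negative) sentiment score raises the required return.

    sentiment is assumed to lie in [-1, 1]; sensitivity is an illustrative
    parameter, not a value estimated in this thesis.
    """
    if not -1.0 <= sentiment <= 1.0:
        raise ValueError("sentiment score is assumed to lie in [-1, 1]")
    return base_return - sensitivity * sentiment


if __name__ == "__main__":
    dividend, base_r, growth, sens = 1.00, 0.08, 0.03, 0.01  # illustrative inputs
    for score in (-1.0, 0.0, 1.0):
        r = sentiment_adjusted_return(base_r, score, sens)
        print(f"sentiment {score:+.1f}: r = {r:.3f}, price = {gordon_price(dividend, r, growth):.2f}")
```

With these inputs, a sentiment score of -1 raises r from 8% to 9% and lowers the value from 20.00 to about 16.67, which is the direction of adjustment described above.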

 
Similarly, Edward et al. (2007) suggest that the CAPM can be improved to capture market sentiment by adjusting its beta. Boido and Fasano (2014) also highlight the role of sentiment in the CAPM by linking deviations in asset prices to sentiment indicators. The authors claim that higher sentiment is associated with higher expected returns and vice versa, and that psychological biases among market participants are a significant barrier to the efficient market hypothesis. The theoretical prices produced by the CAPM do not align with observed prices, and market factors are insufficient to explain excess returns (Boido and Fasano, 2014). Another paper examines two behavioral biases related to the CAPM with sentiment: ambiguity aversion and positive skewness. The authors use a Market Sentiment CAPM (MSCAPM) to explain the beta anomaly and three market anomalies, as well as their impact on sentiment, and conclude that the MSCAPM takes model uncertainty, positive skewness, disaster risk, and market sentiment into account, which makes it similar to the three-factor asset pricing model (Boido and Fasano, 2014). In general, the CAPM beta measures a stock’s sensitivity to market returns. If a stock price follows market sentiment, its beta can be modified to reflect this. For example, during periods of bullish sentiment the beta may rise, indicating higher systematic risk and higher expected returns for the stock. Similar to these previous studies, this empirical analysis aims, first, to identify the relationship between sentiment scores and stock prices and, second, to apply this relationship to asset pricing models to derive more realistic and accurate valuations. Once the link is established, the application of sentiment can be extended to various areas of financial markets, including budgeting and forecasting, day-trading strategies, asset pricing models, financial advisory services and modeling, and financial consultancy. 
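The beta adjustment described above can be illustrated with a small sketch. Assuming daily returns for a stock and for the market index plus a daily sentiment score are available, the code estimates beta separately on bullish (positive-sentiment) days and on all other days, then plugs each beta into the standard CAPM formula E[R_i] = r_f + beta * (E[R_m] - r_f). The regime split, function names, risk-free rate, market return, and data are all hypothetical; this is not the specification of Edward et al. (2007), Boido and Fasano (2014), or the MSCAPM.

```python
from statistics import fmean


def estimate_beta(asset: list[float], market: list[float]) -> float:
    """OLS slope of asset on market returns: Cov(R_i, R_m) / Var(R_m).

    The 1/(n - 1) factors of covariance and variance cancel, so plain sums are used.
    """
    mean_a, mean_m = fmean(asset), fmean(market)
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset, market))
    var = sum((m - mean_m) ** 2 for m in market)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """CAPM: E[R_i] = r_f + beta * (E[R_m] - r_f)."""
    return risk_free + beta * (market_return - risk_free)


def betas_by_sentiment(asset, market, sentiment):
    """Hypothetical split: beta on bullish days (sentiment > 0) vs. all other days."""
    bull = [(a, m) for a, m, s in zip(asset, market, sentiment) if s > 0]
    other = [(a, m) for a, m, s in zip(asset, market, sentiment) if s <= 0]
    return (
        estimate_beta([a for a, _ in bull], [m for _, m in bull]),
        estimate_beta([a for a, _ in other], [m for _, m in other]),
    )


if __name__ == "__main__":
    # Made-up daily data: the stock moves 1.5x the market on bullish days, 0.8x otherwise.
    market = [0.010, -0.020, 0.015, -0.010, 0.020, -0.005]
    sentiment = [0.5, -0.3, 0.6, -0.4, 0.2, -0.1]
    asset = [(1.5 if s > 0 else 0.8) * m for m, s in zip(market, sentiment)]
    beta_bull, beta_other = betas_by_sentiment(asset, market, sentiment)
    for label, b in (("bullish", beta_bull), ("other", beta_other)):
        er = capm_expected_return(0.02, b, 0.07)
        print(f"{label}: beta = {b:.2f}, CAPM expected return = {er:.3%}")
```

Because the made-up stock is built as an exact multiple of the market in each regime, the sketch recovers betas of 1.5 and 0.8 and expected returns of 9.5% and 6.0%; real data would of course be noisier and would call for significance tests on the difference between the two betas.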

 
In addition, incorporating sentiment into financial planning may help in identifying market volatility, understanding market behavior, and managing risk. Market volatility is the degree of change in the price of a financial asset over time, and news sentiment may have a substantial influence on it. Contemporaneous news refers to news that is published at the same time as a market movement. For instance, a breaking news item about a company’s earnings may cause a sudden change in the company’s stock price (Yen et al., 2021). Market volatility may also be affected by past news items, which is called the delayed effect. For example, a report from a few days ago about a company’s leadership transition may still affect the stock price today while investors process the news (Yen et al., 2021). The authors of this study claim that both contemporaneous and lagged news are main determinants of market volatility. This thesis therefore tests and verifies these concepts for the Finnish market. 
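Before volatility can be related to news, it has to be measured. One common approach, assumed here purely for illustration (the measure used in this thesis's data chapter may differ), is the annualized rolling standard deviation of daily log returns. The window length, the 252-trading-day annualization factor, the function names, and the prices are assumptions.

```python
import math
from statistics import stdev


def log_returns(prices: list[float]) -> list[float]:
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int, periods_per_year: int = 252) -> list[float]:
    """Annualized rolling standard deviation of returns.

    Element i is computed from returns[i : i + window]; the window length and the
    252-day annualization factor are illustrative assumptions.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    scale = math.sqrt(periods_per_year)
    return [stdev(returns[i:i + window]) * scale for i in range(len(returns) - window + 1)]


if __name__ == "__main__":
    closes = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0]  # made-up closing prices
    for value in rolling_volatility(log_returns(closes), window=3):
        print(f"{value:.1%}")
```

Once such a series exists, a contemporaneous test pairs volatility on day t with sentiment on day t, while a delayed-effect test pairs it with sentiment from earlier days.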

 
The ultimate motivation of this exercise is to build a financial modeling system with a wider view, including fundamental analysis, financial statement analysis, ratio analysis, competitor analysis, and market sentiment analysis. Successful application of these aspects could support the decision-making process and thereby minimize financial risk. A system could be developed to factor daily news sentiment into pricing models in real time. This idea is mainly motivated by a freely available news sentiment website, https://finviz.com/map, which provides a visual illustration of news sentiment for each stock in many indices. For example, Image 1 below shows the sentiment of major indices around the world. The application covers not only stocks but also other financial instruments such as exchange-traded funds. Investors in Finland may also benefit from such a sentiment map, which could help in identifying the volatility of the stock market. 

 
Image 1. Sentiment map from Finviz.com (source: finviz.com)

 
1.2 Finland  

Finland is an important European country, strategically located on the border with Russia and next to Sweden, with a stable economy supported by a political environment with low corruption (Aggarwal and Lyttle, 2022). Conducting a sentiment analysis of Finnish financial news and identifying its impact on price volatility may support many decision-makers, especially stock market investors, given the country’s strong position in the European economy. Very little previous research provides empirical analysis of NASDAQ Helsinki (NH), and there are several reasons why more studies should be undertaken on Finnish stock markets, as the country is a key component of Europe according to the European Union’s update on its principal countries (european-union, n.d.). First, Finland has a strong focus on technology and innovation, and the government launched the “Innovation and Skills in Finland 2021-2027” program, which supports well-being and employment and promotes Finland as an attractive destination for international talent. According to the Finland Promotion Board, Finland is well known for its success in big data, virtual reality, cyber security, AI, and 5G technology, and the board claims that it is the most digitalized country in Europe. A robust welfare system and a high-standard education system are also key economic factors that not only attract international talent but also retain local talent in the country. 

 
Secondly, Finland attracts an impressive amount of Foreign Direct Investment (FDI). In 2022, Finland ranked second among the top ten Western European countries in the inward FDI performance ranking (Aggarwal and Lyttle, 2022). The country has demonstrated its ability to attract many projects even during the COVID-19 pandemic, with the number of projects rising from 148 to 193 between 2019 and 2020. The increase in FDI during 2020-2021 was a notable 30.4%, compared with the global average of 18.1%. The country is particularly good at attracting businesses in the fields of media, communication, software, information technology (IT), business, and professional services. These industries accounted for 44% of total inbound greenfield FDI in 2021. The number of FDI projects in the communication sector rose from 11 to 43 between 2020 and 2021. A stable economy and a business-friendly environment are key factors in attracting FDI to Finland, further supported by a successful pandemic recovery and flexible tax measures (Aggarwal and Lyttle, 2022). Hence, Finland may be considered a preferred destination for further FDI inflows. 

 
Furthermore, Lloyds Bank’s 2023 update “Investing in Finland” cites the lack of corruption, competitiveness, and a key location at the hub of a dynamic region centered on Russia, Scandinavia, and the Baltic states as key positive factors. The update illustrates the favorable investment environment with figures published by Statistics Finland. According to these figures, corporate acquisitions accounted for over two-thirds of the total flow. Broken down by country, the figures were EUR 2.9 billion for Luxembourg, EUR 2.4 billion for Sweden, and EUR 1.5 billion for Switzerland. Total FDI in 2021 amounted to USD 98.5 billion, an 8.8% increase from 2020. The largest shares of the investment stock are held by Sweden (23.1%), the U.S.A. (18.4%), Germany (10.1%), and Luxembourg and Norway (6.6% each) (LloydsBank, 2023). 



 
These figures show that Finland continues to attract investment, which suggests that the economy is stable and likely to continue advancing. Other noteworthy strengths of Finland are its multilingual population, low corruption, expertise in green technology, high labor productivity, free market, highly industrialized economy, and high spending on research and development. Despite these strengths, Finland also has some weaknesses, such as its geographic vulnerability due to its proximity to Russia, a known source of political conflict, a lack of industrial competitiveness, a small internet market, an aging population, a deteriorating current account, and large household debt (LloydsBank, 2023). 

 
NASDAQ Helsinki (NH), formerly known as the Helsinki Stock Exchange, is the main platform on which investors trade the shares listed on the exchange. The annual volume of NH was reported at 13.13 billion in 2023. This figure is 20% of the NASDAQ 100 index volume and of the volume of Euronext, Europe’s largest stock exchange, for the same year. According to NASDAQ (2022), NH proposed five key initiatives to revitalize the marketplace as the new government was formed in 2022. The plan is expected to help companies obtain equity financing more easily. The first initiative is to revise capital income taxation toward neutral tax treatment and to lower dividend taxes for individual shareholders. The second is to strengthen the domestic investor base and encourage investors to maintain long-term savings and invest directly in the stock market. According to this article, there are one million Finnish private shareholders with 278,000 active equity trading accounts. The medium-sized investor base is smaller, and domestic shareholding is concentrated among a small number of large institutional investors (NASDAQ, 2022). 

 
According to Euroclear (2022), almost half of Finnish companies are held by foreign investors. Finland expects to attract more foreign investment, which supports the liquidity and capital requirements of Finnish companies (NASDAQ, 2022). NH is therefore an important platform for directing these funds to companies in need. Another part of the proposal is financial literacy, which is key to making the whole plan work: all members of society are expected to be aware of financial risks and able to make sound financial decisions to reach their future financial goals. This includes fostering financial education at school and university level to equip students with the knowledge needed to make effective financial decisions. The final initiative is to support green financing and the green transition. In this process, NH expects listed companies to report environmental, social, and governance (ESG) metrics according to the guidelines provided by NASDAQ. ESG designation is a key milestone that companies are expected to achieve, in addition to creating sustainable investment opportunities. 

 
NH plays a pivotal role in helping businesses obtain equity-based financing to support their expansion and plays an important role in the development of both the Nordic and European regions. Studies conducted on NH may therefore provide valuable insight into market dynamics, help attract further FDI, and support individual and institutional investors in making sound investment decisions in the future. This paper attempts to support investors by investigating a key aspect of equity investment: the relationship between financial news sentiment and price volatility. Various studies have applied sentiment analysis to the Finnish markets, but the impact of financial news sentiment has not yet been examined. 

 
One study by Rautiainen and Jokinen (2022) notes that the relationship between social media use and stock prices is largely unknown in the Finnish markets and examines this relationship for 105 Finnish public limited companies listed on Nasdaq Helsinki (NH) using hand-collected social media data. The study looks at the value and relevance of social media activities on Facebook, Instagram, LinkedIn, Twitter, and YouTube. The findings indicate that social media activity and popularity are valuable variables for forecasting stock prices. The study concludes that not all social media activities are equally important for managers and investors, emphasizing the importance of using multiple visual social media channels. However, the study focuses only on social media, so the large volume of financial news published by credible news sources is not included. To address this omitted-data bias, this thesis incorporates a large set of news headlines published in the Refinitiv database for the year 2023. Another sentiment analysis study, by Vankka et al. (2019), examines online user reviews and headlines by applying a hybrid algorithm that predicts review polarity using word embeddings and Finnish polarity lexicons. The authors use a weighted average method to arrive at the sentiment score for each review and headline. However, this study focuses on hotel and travel customer sentiment, which may not affect stock prices unless the reviews concern listed companies. 
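The weighted-average step mentioned above can be sketched generically. This is not Vankka et al.'s hybrid algorithm; it only shows how headline-level scores could be combined into one score per day, assuming each headline already has a polarity score and a non-negative weight (for example, a classifier's confidence). The function name, weighting scheme, and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def daily_weighted_sentiment(rows: Iterable[tuple[str, float, float]]) -> dict[str, float]:
    """Weighted average headline score per day: sum(w * s) / sum(w).

    rows holds (date, score, weight) tuples; score is a headline polarity and
    weight a non-negative importance measure (e.g. a classifier's confidence).
    Days whose weights sum to zero are dropped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for day, score, weight in rows:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        weighted_sum[day] += weight * score
        weight_total[day] += weight
    return {day: weighted_sum[day] / weight_total[day] for day in weighted_sum if weight_total[day] > 0}


if __name__ == "__main__":
    headlines = [  # made-up (date, score, weight) rows
        ("2023-01-02", 0.8, 0.9),
        ("2023-01-02", -0.4, 0.3),
        ("2023-01-03", -0.6, 0.7),
    ]
    for day, score in daily_weighted_sentiment(headlines).items():
        print(day, round(score, 3))
```

Weighting by confidence lets a headline the scorer is sure about count for more than an ambiguous one; with equal weights the function reduces to a simple daily mean.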

 
Therefore, this paper specifically examines financial news sentiment, and it is structured as follows. Chapter 1 covers the introduction, the identification of gaps, the research questions, and the background. The background section discusses previous literature in this area in detail, covering the scope, process and methods, limitations, and drawbacks of each study. Chapter 2 is a deeper literature review with a critical analysis that identifies the gaps in previous literature and explains how they are addressed in this study. Chapter 3 explains the process followed to prepare the data for this empirical investigation, including the data sources of news headlines, individual stock prices, and macroeconomic rates. In addition, the different approaches used to calculate the sentiment score for each news headline are discussed in detail, along with the justification for the method selected for the empirical analysis. Chapter 4 covers the econometric models used to test each hypothesis, including a detailed explanation of the statistical models and the different tests performed to validate the data. Chapter 5 summarizes the test results and whether the null hypotheses are rejected. Chapter 6 summarizes the overall results in the conclusions, and Chapter 7 discusses the limitations and further research. 

 
1.3 Identification of Gaps 

Credible news sources are the preferred choice for information, and social media platforms may not always meet this standard. Performing this empirical investigation on news headlines downloaded from a credible news source, Refinitiv Data, therefore holds more value, as Reuters has a dedicated user-generated content (UGC) verification team to detect false information and validate news (Reuters, n.d.). Reuters also states that all its journalists must adhere to a set of strict rules and principles based on its Trust Principles, and that its news services report all sides of a story promptly and are impartial, accurate, and independent. Matthia (2014) found that Reuters sentiment can explain and predict stock performance better than macroeconomic factors. On the other hand, social media user accounts may not follow such standards (Stearns and Kille, 2015). Having reliable information for this empirical analysis may therefore add value. Based on this data, this study attempts to address the lack of studies on one of the influential aspects of Finnish stock market investment by answering the following research questions: 

 
1) Do the sentiment scores derived from news headlines have a significant impact 
on price volatility? 
 

2) Is there a significant lagged effect of news sentiment on price volatility? 
 

3) Are companies with high price volatility more affected by the combined rela-
tionship of news sentiment and index volatility? 
 

4) Do macroeconomic factors such as inflation and interest rates, along with news 
sentiment and index volatility, have an impact on price volatility? 

 
Based on these research questions, four hypotheses are defined as follows: 

 
H1: There is a significant positive correlation between the sentiment scores of financial news and the daily price volatility on the Helsinki Stock Exchange.

H0: There is no significant correlation between sentiment scores and daily price volatility.

 
H2: The sentiment scores of financial news have a significant lagged effect on the daily price volatility on the Helsinki Stock Exchange.

H0: The sentiment scores do not have a significant lagged effect on the daily price volatility.

 
H3: The combined impact of news sentiment and Nasdaq Helsinki index volatility on daily price volatility is stronger for highly volatile stocks.

H0: The combined impact of news sentiment and Nasdaq Helsinki index volatility is not stronger for highly volatile stocks than for low-volatility stocks.



 
H4: Price volatility is significantly influenced by macroeconomic indicators (inflation and interest rates) for all the variables.

H0: Price volatility is not significantly influenced by macroeconomic indicators (inflation and interest rates) for all the variables.

 
The first hypothesis examines whether price volatility in the Finnish stock market is significantly affected by sentiment scores obtained from news headlines. It proposes that the emotional tone of a news headline, whether positive or negative, can affect investor behavior and result in stock price volatility. This hypothesis has been widely tested around the globe for topics such as economies, countries, individual securities, and politics. Baraniak and Sydow (2021) test it on political news and its connection to voter volatility. In their paper, the authors propose a unique dataset of 3819 human-labeled political news headlines from online media sources to train and evaluate machine learning algorithms. They note that every day a large number of casual news site readers scan several headlines rapidly and briefly, and that headlines read in this manner may unconsciously affect how readers view the world. Notably, the full news item covers more information, and the headline may not capture the whole impression, so a reader's perception derived from a headline alone may be misdirected. This bias may also be present among financial news readers, in which case sentiment scores derived from financial news may not reflect the real price volatility. Therefore, a significant connection between headline sentiment scores and price volatility would suggest that the majority of readers trade based on perceptions derived from news headlines, while the absence of such a connection would suggest the opposite.

 
Daifeng et al. (2019) conduct a similar study, related to research question one, on stock market trends and social media moods. The authors use microblogs to examine changes in the stock market and how irrational behavior fueled by social media posts can affect price volatility. They analyze data gathered over a three-month period from Tencent Weibo, a large microblogging platform in China, and create a mood-based stock trend analysis tool that mimics the irrational actions of investors. The test results reveal a significant correlation between the variables, lending credibility to current behavioral finance theories on irrational investor behavior and explaining why prices show brief spikes and falls (Daifeng et al., 2019). This concept may account for temporary volatility in stock prices should this study find a significant correlation with news headline sentiment.

 
The Efficient Market Hypothesis (EMH) assumes that investors value securities rationally based on fundamental analysis. However, other factors that affect stock market volatility, especially short-term and abnormal movements, may be connected to irrational investor decision-making. Edward et al. (2007) analyze high trading volume, high volatility, and stock bubbles by incorporating investor sentiment into two pricing models, the Dividend Discount Model (DDM) and the Capital Asset Pricing Model (CAPM). They define investor sentiment as a personal opinion on a company's potential for future success or failure. In addition to guidance from professionals and market analysts, investors may gain insight from the macroeconomic conditions of the market as a whole. Ultimately, however, investors tend to make their own investment decisions based on their own best analysis, and these beliefs may differ from person to person depending on age, gender, culture, and level of risk awareness (Edward et al., 2007). This connects to the irrational behavioral aspect discussed earlier. Irrational decision-making may also be linked to financial news and to how well readers understand its content. To capture these variations in investor sentiment, Edward et al. (2007) incorporate into the CAPM a modified beta that is a function of beta and investor sentiment. An investor who expects a company to fail may require a higher risk premium and thus apply a modified, increased beta, which yields a higher expected return. As a result, prices may fluctuate beyond the scope of fundamental analysis due to personal bias or irrational decision-making.
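Written out, the modification attributed to Edward et al. (2007) keeps the CAPM form but replaces the ordinary beta with a sentiment-adjusted beta. In the sketch below, R_f is the risk-free rate, E(R_m) the expected market return, and S_i the investor's sentiment toward company i; these symbols are introduced here for illustration, and because the functional form of f is not given in this section, it is left generic:

```latex
E(R_i) = R_f + \beta_i^{*}\,\bigl(E(R_m) - R_f\bigr),
\qquad
\beta_i^{*} = f(\beta_i, S_i)
```

An investor with pessimistic sentiment toward company i would use a beta-star above the ordinary beta, which raises the required return E(R_i) whenever the market risk premium E(R_m) - R_f is positive.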

 
These perceptions may have a delayed effect on the market, as there may be a delay between an investor reading a news item and executing a trade. David et al. (2019) test this lagged effect of news sentiment on price volatility using the Fama-French three-factor model. They address sentiment on non-trading days by applying a seven-day simple moving average (SMA) to the scores. Their findings reveal that sentiment scores have a significant impact on Dow Jones returns and that lagged daily sentiment scores are significant, indicating that the information embedded in these scores is not immediately reflected in security prices. The SMA applied to cover non-trading days does not have a significant impact on prices (David et al., 2019). In testing hypothesis two in this thesis, the sentiment scores and prices of non-working days are instead reflected in the previous day's numbers.
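The two treatments of non-trading days described above can be sketched in Python: a trailing seven-day SMA in the spirit of David et al. (2019), and a rule that folds weekend scores back onto the preceding trading day. This is a minimal sketch under stated assumptions: only Saturdays and Sundays are treated as non-trading days (public holidays are ignored), scores that land on the same trading day are simply averaged, and both function names are hypothetical.

```python
from datetime import date, timedelta


def simple_moving_average(values, window=7):
    """Trailing simple moving average; positions before a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def roll_back_to_trading_day(daily_scores):
    """Move Saturday and Sunday scores onto the preceding Friday.

    Assumption: only weekends are non-trading days (holidays ignored), and
    all scores that land on the same trading day are averaged.
    """
    buckets = {}
    for day, score in daily_scores.items():
        trading_day = day
        while trading_day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            trading_day -= timedelta(days=1)
        buckets.setdefault(trading_day, []).append(score)
    return {d: sum(s) / len(s) for d, s in sorted(buckets.items())}


# Hypothetical scores for Friday 6 January to Monday 9 January 2023.
scores = {date(2023, 1, 6): 0.2, date(2023, 1, 7): 0.4,
          date(2023, 1, 8): -0.3, date(2023, 1, 9): 0.1}
print(roll_back_to_trading_day(scores))
print(simple_moving_average([1, 2, 3, 4, 5, 6, 7, 8]))
```

The SMA smooths every day equally, whereas the roll-back rule keeps weekday scores untouched and only changes the days on which no trading takes place.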

 
Hypothesis three is a novel test examined in this study. First, to the author's knowledge, the combined effect of sentiment scores and index prices, on both a current and lagged basis, has not previously been tested on stock prices. However, Dumiter et al. (2023) examine the impact of the stock exchange on news sentiment and individual stock prices on a current and lagged basis. The authors provide an integrated method to ascertain the relationship between technical analysis, the stock market, and news sentiment indicators through autoregressive models, and they conclude that there is a solid reverse connection between these variables while emphasizing behavioral finance theories. Daxhammer et al. (2023) examine the role of emotional finance in stock markets as a whole. In addition, this hypothesis checks whether a stock's volatility profile is a key component that makes the impact of sentiment differ.

 
1.4 Background 

Previous research finds that news headline sentiment scores have a significant influence on price volatility. Earlier literature examines this effect using models such as vector autoregressive models, autoregressive lag models, and generalized autoregressive models (Yen et al., 2021). Studies also show that news sentiment has a significant lagged effect on price volatility. The lagged effect is considered when analyzing lead-lag and contemporaneous news effects (Yen et al., 2021). The contemporaneous news effect is the immediate impact of news on the market. The lead-lag effect refers to the influence that an earlier price movement, the leader, has on a later price movement, the lagger, over two consecutive days. This implies that the volatility on day one is mirrored in the volatility on day two (Yongli et al., 2022). Yongli et al. (2022) state that the power-law distribution is a basic feature of human activities and show that the cumulative number of lead-lag days between stock pairs in both the Chinese and American stock markets follows this distribution, which validates the power-law distribution in stock trading. A power law is a functional relationship between two quantities in which a relative change in one quantity produces a proportional relative change in the other, so that one quantity varies as a power of the other.
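For reference, a power law can be written as follows; the constants a and k are introduced here only for illustration:

```latex
f(x) = a\,x^{k}
\quad\Longrightarrow\quad
f(cx) = a\,(cx)^{k} = c^{k} f(x),
\qquad
\ln f(x) = \ln a + k \ln x
```

Multiplying x by any factor c multiplies f(x) by c^k, whatever the starting value of x, which is why a power law appears as a straight line with slope k on log-log axes. For a power-law distribution, f is the frequency (or probability density) of observing the value x, typically with a negative exponent k.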

 
Khan and Ahmad (2018) examine the lead-lag and bi-directional relationship between investor sentiment and market returns for an emerging market and conclude that sentiment plays a significant role in derailing the market from its sustainable state or pulling the market down as investors behave irrationally. This implies that investor reaction to negative news may pull markets down more than positive news lifts them. This asymmetry is known as the leverage effect, which Yang et al. (2019) examine; they highlight that the leverage effect and investor sentiment have a large impact on volatility forecasts. Similarly, Laakkonen and Lanne (2008) find that bad news increases volatility more than good news. Further supporting this claim, research from the Australian National University shows that intraday volatility is more sensitive to negative news than to positive news (ANU, 2013). In addition, Ellwood (2020) shows that consumers around the world have a stronger psychological response to negative news than to positive news.
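One simple way to examine this asymmetry in daily data is to split the sentiment score into its positive and negative parts and let each part carry its own coefficient. The sketch below does this with ordinary least squares on simulated data. It is an illustrative piecewise-linear regression, not a GARCH-type leverage model and not the model estimated in this thesis; the function name and simulated coefficients are assumptions.

```python
import numpy as np


def asymmetric_sentiment_fit(sentiment, volatility):
    """OLS of volatility on the positive and negative parts of sentiment.

    vol_t = c + b_pos * max(s_t, 0) + b_neg * min(s_t, 0) + e_t

    Because min(s_t, 0) <= 0, a negative b_neg means volatility rises as
    sentiment becomes more negative; |b_neg| > |b_pos| indicates that
    negative news moves volatility more than positive news.
    """
    s = np.asarray(sentiment, dtype=float)
    y = np.asarray(volatility, dtype=float)
    X = np.column_stack([np.ones_like(s), np.maximum(s, 0.0), np.minimum(s, 0.0)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(("const", "b_pos", "b_neg"), coef))


# Simulated data with a built-in asymmetry (illustrative only).
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, 500)
vol = 1.0 + 0.5 * np.maximum(s, 0.0) - 2.0 * np.minimum(s, 0.0) + rng.normal(0.0, 0.05, 500)
est = asymmetric_sentiment_fit(s, vol)
print({k: round(float(v), 2) for k, v in est.items()})
```

A formal test of asymmetry would then compare |b_neg| with |b_pos| using their standard errors.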

 
From a behavioral finance perspective, Chung Wu et al. (2022) show that negative news increases stock price volatility in a window from one day before to four days after its release. The relationship between stock index volatility and individual stock prices is widely studied, but the combined effect of news sentiment and stock market performance on individual stock price volatility, together with the impact of inflation and interest rates, appears to be rarely examined, especially in Finland.

 
Seng and Yang (2017) investigate the relationship between financial news and stock market volatility by building a dictionary using grammar, multiword structures, and sentiment analysis. The researchers examine news content using a social media content measurement methodology and propose a model that combines structured and unstructured data to examine the relationship between financial news and market volatility. The authors collect unstructured data from the internet and process it through Natural Language Processing (NLP) to arrive at a sentiment score before integrating it into the model. The dictionary is built from human reading together with additional information analysis and is used to assign sentiment scores to each news item, while the structured data is integrated into the model directly. The findings show a substantial correlation between financial news and market volatility, with good news positively correlated with rising stock prices (Seng and Yang, 2017). Nevertheless, because this study uses only one news source, it is harder to determine the accurate impact of market sentiment at large. To address this, I use news headlines from numerous publications that are kept up to date in the Refinitiv database.

 
In terms of general news reading habits, the majority of news readers still prefer reading news on paper, while around six hundred million readers use digital platforms (Ponsford, 2012). With the rapid development of social media, news readers have moved from traditional platforms to social platforms to read or view news. According to a survey conducted in 2020, more than 80% of Indians aged 16 to 70 used social media as their primary source of news, as did nearly 60% of Argentinians and Australians (Watson, 2022). A major financial news source, the Financial Times, reported one million paying subscribers on its digital platform (FinancialTimes, 2022). There are approximately 4.3 million active online stock traders in the world and nearly twelve thousand Finnish online traders (forex.in.rs, 2017). As these numbers represent a significant portion of the whole investor base, it is reasonable to assume that demand for digital news and online trading platforms is high.

 
In financial markets, some news platforms are expensive while other channels are free. Access to platforms such as Bloomberg and Reuters costs thousands of dollars; for example, a Bloomberg Terminal costs up to USD 25,000 per user per year, while Reuters costs around USD 2,000 per month. These platforms cover a vast scope of information about companies, ranging from financials to media coverage, and the data can be downloaded, processed, and analyzed according to the user's preference. They also cover a wider range of news published around the world by various credible sources. Traders who can afford such a platform benefit from thorough analysis based on all the information relevant to a company or a share. In terms of financial news, traders can view a wide range of news items with the click of a button and, if needed, retrieve a sentiment heatmap that visually illustrates the sentiment of the news related to a company. Instant trading features that combine these news sentiment heatmaps support traders in forecasting future volatility. On the other hand, newspapers such as the Financial Times require a monthly subscription, which beginners who wish to access more reliable news may also consider costly. Therefore, a sentiment analysis of financial news may not fairly represent the most accurate sentiment of the market, as market participants may experience information gaps due to cost and unavailability. In other words, new entrants to trading and lower-tier investors may not have access to all the financial news, which implies that not all investors are fully informed, and a sentiment analysis in this setting may contain accuracy gaps. An advanced trader who has access to an improved sentiment map, such as the one shown in Image 2, would benefit from reduced risk and higher returns compared to a trader in the lower categories.

According to research conducted in 2020 by the NORC at the University of Chicago and 

the FINRA investor education foundation, a large number of new retail investors joined 

the American securities market. These were categorized into three groups: holdover ac-

count owners, experienced entrants, and new investors. The findings reveal that, com-

pared to experienced entrants, holdover account owners and new entrants were younger, 

had lower income, and represented a wider racial spectrum (FINRA and NORC, 2021).   

 
Image 2 is an example of an interactive sentiment heatmap; this freely available web application was built by Damian Boh, a machine learning engineer. The application illustrates market sentiment derived from news sentiment and is updated daily and hourly, supporting traders in understanding the market mood around their holdings. According to Boh (2022), the user can enter a stock ticker to retrieve the sentiment data, which visually illustrates whether the market emotion is negative or positive for that ticker. The score ranges from negative three to positive three, which is a similar scoring range to that of sentiment analysis libraries like FinBERT. The database is populated with financial news headlines scraped from the web, which are processed through machine learning techniques to provide a visual illustration of the performance of stocks (Boh, 2022).

 
Image 2. Visual sentiment web application (source: https://damianboh.github.io/stock_sentiment.html)



2 Literature Review and Critical Analysis  

2.1 Sentiment Analysis 

Sentiment analysis is a part of Natural Language Processing (NLP) that draws on neuroscience, linguistics, mathematics, and computer science to ascertain the sentiment contained in a text. It can categorize the feeling as neutral, negative, or positive. The most basic implementation of sentiment analysis uses a scored word list such as AFINN, which scores words between minus five and plus five (Shivanandhan, 2020).
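A scored word list of this kind can be applied by summing the scores of the words that appear in a headline. The Python sketch below shows that mechanism. The lexicon is a hypothetical six-word example on the same minus-five-to-plus-five scale; its values are not copied from the published AFINN list, and the function name is illustrative.

```python
import re

# Hypothetical word scores on the AFINN-style -5..+5 scale; these values
# are illustrative and are NOT copied from the published AFINN list.
LEXICON = {"beats": 3, "growth": 2, "profit": 2, "cuts": -1, "loss": -3, "bankruptcy": -4}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the scores of known words; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(lexicon.get(token, 0) for token in tokens)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label


print(lexicon_score("Firm beats forecasts as profit growth continues"))  # (7, 'positive')
```

Words missing from the list contribute nothing, which is why the coverage and domain of the word list matter so much for financial text.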

 
Andrea et al. (2015) discuss various tools that can be used to analyze sentiment data. The study also documents how identifying consumer sentiment supports better choices, not only in finance but in all industries, including politics. According to the authors, sentiment analysis was first discussed by Nasukawa and Yi (2003), and more research has been conducted since. According to Andrea et al. (2015), sentiment analysis has five main stages: data collection, text preparation, sentiment detection, sentiment classification, and presentation of output. For the purpose of this study, the focus is on how people's opinions, attitudes, and emotions can lead to better or worse investment decisions, which is quantified by applying sentiment analysis tools to news articles. In this process, texts are classified as negative, positive, or neutral using trained dictionaries containing lists of words categorized according to their meaning. For example, words such as ‘agree, good, support, pros, benefits, win’ are treated as positive sentiment, while ‘disagree, bad, opposition, cons, disadvantage’ are listed as negative sentiment.

 
Usmani et al. (2023) find a significant correlation between financial news and stock market trends. In this paper, the authors introduce a weighted average model for news categories, in which they divide news into sector-related and stock-related news. The authors then propose the Long Short-Term Memory-based Weighted and Categorized News Stock Prediction Model (WCN-LSTM), which they claim better explains stock market trends based on news. This indicates that when news articles are analyzed through a sound model, investors benefit from the sentiment results generated. However, simpler methods can also be used for this purpose.

 
Popular news platforms like The Guardian, Bloomberg, Reuters, and CNBC provide up-to-date financial news. Because of the rapid increase in the number of financial news articles published at any given time, mainly through digital platforms, readers can find it difficult to cover the important information relevant to their investments (Kirange et al., 2016). Therefore, news headlines are written so that readers can grasp the contents of a news item quickly (Sathya, 2023). In addition, tools like sentiment analysis are designed to give a visual or quantitative illustration of a particular news item, which can support investors in making decisions in a fast-moving trading environment. Most professional traders build their own mechanisms to trade stocks around the world by incorporating various analyses, such as technical analysis. Platforms such as Bloomberg and Reuters are among the most popular and provide these technical analyses supported by heatmaps that show the sentiment of a particular day for a given exchange or security. Image 3 shows an example of sentiment on June 29, 2020, for major U.S. companies analyzed through Twitter data (AltSignals, 2020).

 
Image 3. Sample sentiment analysis heatmap



According to this map, green represents improved sentiment, red worsened sentiment, and grey neutral sentiment. The map reveals a positive trend for communication services, while consumer durables appear to be on a negative trend and consumer discretionary shows mixed results. An investor who refers to a heatmap such as this would be less inclined to invest in consumer durables companies than in communication services. However, a daily heatmap only represents that day's sentiment, and extending it to weekly, monthly, and yearly heatmaps would be ideal for identifying long-term trends in stock performance.
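Extending a daily heatmap to longer horizons amounts to averaging the daily scores within each period and colouring each period by its change from the previous one, following the green (improved), red (worsened), and grey (neutral) convention described above. The Python sketch below does this for ISO calendar weeks. The function name and the neutrality threshold `band` are hypothetical choices; a monthly version would group by `(day.year, day.month)` instead.

```python
from collections import defaultdict
from datetime import date


def weekly_heatmap(daily_scores, band=0.05):
    """Average daily sentiment per ISO week and colour each week by its change.

    green = mean sentiment improved by more than `band` versus the previous week,
    red = worsened by more than `band`, grey = otherwise (including the first week).
    `band` is a hypothetical neutrality threshold.
    """
    weeks = defaultdict(list)
    for day, score in daily_scores.items():
        year, week, _ = day.isocalendar()
        weeks[(year, week)].append(score)
    result, previous = {}, None
    for key in sorted(weeks):
        mean = sum(weeks[key]) / len(weeks[key])
        change = 0.0 if previous is None else mean - previous
        colour = "green" if change > band else "red" if change < -band else "grey"
        result[key] = (mean, colour)
        previous = mean
    return result


# Hypothetical daily scores spanning three ISO weeks of January 2023.
daily = {date(2023, 1, 2): 0.0, date(2023, 1, 3): 0.0,
         date(2023, 1, 9): 0.3, date(2023, 1, 10): 0.3,
         date(2023, 1, 16): -0.2}
for week, (mean, colour) in weekly_heatmap(daily).items():
    print(week, round(mean, 2), colour)
```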

 
Investors have different profiles and can be categorized from new entrants to professionals. Newcomers may initially trade with less knowledge and analysis, while most experienced traders have an overall knowledge of the securities they invest in. Another key element is the financial literacy level of each investor. As Mitchell (2014) suggests, “While the costs of raising financial literacy are likely to be substantial, so too are the costs of being liquidity-constrained, overindebted, and poor.” This suggests that deeper knowledge can make an investor more successful. Stemming from this, an experienced investor looks at different analyses before making an investment decision, ranging from reading simple financial news to analyzing company financial statements. News analysis may therefore represent a smaller, but still important, part of this decision support. Investor sentiment cannot be measured directly, yet it is important because market participants react strongly to news, even to a small portion of the news feed (Arratia et al., 2021; Matteo et al., 2021).

 
Traditionally, investor sentiment was analyzed through emotional proxy variables (Jiangshan et al., 2021). Thanks to the advancement of analytical tools such as machine learning and natural language processing (NLP) methods, the analysis of textual data has become more popular and widely employed in financial market analysis. In addition, the digital transformation of the financial industry has improved customer service through online banking and fintech applications. Kachan (2021) states that service gaps in the banking sector can be filled by NLP-driven sentiment analysis and explains how identifying consumer sentiment can be beneficial in developing new products and improving the quality of services. One of the key components of sentiment analysis is textual analysis, which can also extend to the analysis of images, film, art, and similar content. Programming environments such as Python, MATLAB, and R are popular sentiment analysis tools, and platforms such as Brand24, Digimind, Lexalytics Salience, and the MS Azure text analytics API are also popular (Fontanella, 2023). To run these analyses on financial news, various financial term libraries have been developed to generate accurate sentiment. FinBERT is a language model trained to analyze the sentiment of financial text. Another popular dictionary is the Loughran and McDonald financial term dictionary, which contains lists of financial terms connected to positive and negative scores. Similarly, VADER, TextBlob, and NLTK are also used in this process.

 
An important ongoing area of scientific research in finance is volatility prediction and forecasting. Using a linguistic approach to analyze news content and incorporating news sentiment scores into this process may improve forecasting. Yen et al. (2021) quantify the sentiment score by counting the positive and negative words included in the news content. The weakness of this approach is that it captures only the number of positive and negative words rather than the statement's overall meaning. Nevertheless, they claim that news sentiment is an effective reference for security trading. This may hold true if the accuracy rate of the news sentiment score is high. However, counting words cannot achieve a high accuracy rate or capture the intended meaning, as it only quantifies negative and positive words and misses the implied message of a news item. A sentence included in a news item to express a positive sentiment may be misinterpreted by a word-by-word count. For example, in the statement “the expenses were reduced by liquidating loss-making business units to increase profit”, four negative words override the two positive words, leading to an overall negative sentiment score even though the sentence as a whole is positive.
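The misclassification in this example can be reproduced with a few lines of Python. The two word lists below are hypothetical, chosen only to mirror the four negative and two positive words in the example sentence; they are not taken from any published dictionary.

```python
import re

# Hypothetical word lists chosen to mirror the example sentence;
# they are not taken from any published dictionary.
POSITIVE = {"increase", "profit"}
NEGATIVE = {"expenses", "reduced", "liquidating", "loss"}


def count_based_sentiment(text):
    """Label a text by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # "loss-making" -> "loss", "making"
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    label = "positive" if pos > neg else "negative" if neg > pos else "neutral"
    return pos, neg, label


sentence = ("the expenses were reduced by liquidating loss-making "
            "business units to increase profit")
print(count_based_sentiment(sentence))  # (2, 4, 'negative')
```

The count-based label is negative even though the sentence reports an improvement, which is the weakness of word counting described above.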

 

Other studies by Tetlock (2007) and Schumaker et al. (2012) employ the Harvard General Inquirer (GI) and Arizona Financial Text systems, respectively, which limits them to word-based sentiment calculation. Similarly, Loughran and McDonald (2011) examine 10-K filings with reference to the Harvard-IV-4 psychological dictionary. Another drawback of this type of word-based quantification is the misinterpretation of context. For example, “reduction in cost” may be interpreted as two negative words, leading to an overall negative sentiment. This is further supported by the FinBERT test used in this study to quantify news sentiment. A sample of misinterpreted news headlines is provided in Section 3.3.1, and as a result, the polarity-based score calculation method is used in this study.

 
2.2 Model and Data Gaps  

A number of different regression models have been used to analyze the impact of news on financial market volatility. Yen et al. (2021) use the GARCH model to quantify the impact on Taiwanese stocks. GARCH may capture relationships between linear data sets but may not capture large changes and heavy tails (Yen et al., 2021). In contrast, the ARDL model estimates both the short- and long-term effects of one variable on another by adding lags of the variables to the model (Chetty, 2018). A noteworthy limitation of both models in their standard form is that they do not capture the leverage effect of sentiment, that is, the asymmetric response to positive and negative sentiment.
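The short- and long-term effects that an ARDL model captures can be made explicit in its standard form. In the sketch below, y_t stands for daily price volatility and x_t for news sentiment; this notation is introduced here for illustration and is not taken from this thesis's estimated equations:

```latex
% ARDL(p, q): y_t explained by its own lags and by current and lagged x_t
y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \sum_{j=0}^{q} \beta_j\, x_{t-j} + \varepsilon_t

% Long-run effect of a permanent one-unit change in x
\theta = \frac{\sum_{j=0}^{q} \beta_j}{1 - \sum_{i=1}^{p} \phi_i}

% ARDL(1, 1) rewritten in error-correction form
\Delta y_t = c + (\phi_1 - 1)\,\bigl(y_{t-1} - \theta\, x_{t-1}\bigr) + \beta_0\, \Delta x_t + \varepsilon_t,
\qquad \theta = \frac{\beta_0 + \beta_1}{1 - \phi_1}
```

Here beta_0 is the immediate (contemporaneous) effect, theta is the long-run effect, and phi_1 - 1 is the speed at which y returns to its long-run relationship with x, which is negative when phi_1 < 1. Expanding the error-correction form recovers the ARDL(1, 1) equation term by term. Because every beta_j enters linearly and symmetrically, a positive and a negative sentiment shock of the same size move y by the same amount with opposite signs, which is why the leverage effect is not captured without further modification.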

 
Another analysis by Deveikyte et al. (2022) finds a relationship between Twitter data and 

FTSE100 movements. The sentiment derived through tweets may signal near-future 

movements but does not affect long- or short-term volatility (Deveikyte et al., 2022). 

Further, the impact of social media comments and posts may vary depending on the 

number of followers, and the messages may not reach the full investor base. This may 

be due to restricted accounts, a lack of awareness of the existence of certain Twitter or 

X accounts, etc. On the other hand, news articles published on reliable platforms may 

reach a larger investor base.  



Previous literature states that social media rumors fueled by the herd effect can cause investor confidence to collapse and financial risk to spread further (Zhang et al., 2022). Therefore, investors relying solely on social media posts may absorb uncalculated risks into their portfolios. A disadvantage of incorporating such a large number of individual messages is that it may introduce personal bias, and these unsubstantiated personal opinions may disrupt financial stability (Zhang et al., 2022). The Refinitiv database therefore provides better reach, as it captures many credible news sources. It is also important to note that platforms such as Reuters are expensive and not all market participants have access to such high-end platforms. To address this, the Boolean search also captures Finland's well-known financial newspaper, Kauppalehti.

 
Hritha and Rishad (2020) provide evidence from India in an empirical investigation of investor sentiment and market volatility. They examine how the emotions of irrational investors affect market volatility by applying Granger causality tests and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and they conclude that excessive market volatility is largely caused by unreasonable sentiment. A limitation of Granger causality is that it assumes volatility occurs after the news sentiment has had its impact. This may be accurate to some extent, particularly where there is a noticeable lag effect, but because investors react quickly to news propagation, this assumption may not always hold true.
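A Granger causality test compares a restricted regression of volatility on its own lags with an unrestricted regression that also includes lags of sentiment; a large F statistic means lagged sentiment adds explanatory power. The sketch below computes that F statistic with NumPy least squares on simulated data. Function names and the simulation are illustrative assumptions. A p-value would come from the F(lags, n - k) distribution, for example via `scipy.stats.f.sf` if SciPy is available, and statsmodels offers a ready-made `grangercausalitytests` function.

```python
import numpy as np


def lag_matrix(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])


def granger_f_stat(y, x, lags=1):
    """F statistic for H0: lags of x add no explanatory power for y.

    Restricted model:   y_t on a constant and lags of y.
    Unrestricted model: y_t on a constant, lags of y and lags of x.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lag_matrix(y, lags)])
    unrestricted = np.hstack([restricted, lag_matrix(x, lags)])

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_denom = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_denom)


# Simulated example in which sentiment x leads volatility y by one day.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.empty(300)
y[0] = rng.normal()
y[1:] = 0.8 * x[:-1] + rng.normal(scale=0.3, size=299)
print(round(granger_f_stat(y, x, lags=1), 1))
```

The test only detects effects that arrive with a lag, which is exactly the limitation noted above for same-day reactions.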

 
Okafor and Nneamaka (2023) highlight that the mechanics of volatility in financial decisions may be difficult to understand. The authors argue that GARCH models have trouble with volatility persistence, and to address this, they extend the GARCH model into a Markov regime-switching GARCH model, which allows the conditional mean and variance to vary dynamically. The GARCH model is used to predict the volatility of returns on financial assets and is applied especially to time series data. Volatility persistence, often referred to as the autocorrelation of volatility, is observed in financial markets when the volatility of one period depends on the volatility of a prior period (FasterCapital, 2023). Volatility persistence is influenced by a number of factors, and the existence of news shocks is one of the most important. Unexpected and abrupt events, known as news shocks, frequently have a long-lasting effect on market volatility (FasterCapital, 2023). The actions of market participants also play a role: traders and investors may become more cautious and modify their holdings in response to significant market volatility (FasterCapital, 2023).
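Volatility persistence can be seen directly in the conditional variance recursion of a standard GARCH(1,1) model, where the sum alpha + beta governs how slowly a shock fades. The Python sketch below computes that recursion and the half-life of a shock. The parameter values are illustrative, not estimates for any Finnish stock, and the regime-switching extension discussed above is not shown.

```python
import math


def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1],
    started at the unconditional variance omega / (1 - alpha - beta).
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def shock_half_life(alpha, beta):
    """Periods until the effect of a volatility shock has decayed by half."""
    persistence = alpha + beta
    if not 0 < persistence < 1:
        raise ValueError("persistence alpha + beta must lie in (0, 1)")
    return math.log(0.5) / math.log(persistence)


# Illustrative parameters, not estimates for any Finnish stock.
omega, alpha, beta = 0.05, 0.10, 0.85
print(garch11_variance([3.0, 0.0, 0.0, 0.0], omega, alpha, beta))
print(round(shock_half_life(alpha, beta), 1))  # 13.5
```

A large return on the first day lifts the next day's variance well above its long-run level, and the excess then decays by the factor alpha + beta each period.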

 
The ARDL model offers a specific method for handling volatility persistence (Ari, 2021). Applying the ARDL model in this thesis addresses volatility persistence and autocorrelation. One way to approach this is to choose the optimal lag values based on the Akaike Information Criterion (AIC). Ari (2021) states that volatility measurements may be used as explanatory variables in the ARDL model, which allows the model to reflect accurately how volatility persists over time. Re-parameterizing the ARDL model yields an error correction model that considers both the short- and long-term impacts of variable changes, and this can be used to account for volatility persistence by capturing the effect of volatility both instantly and over time (Kripfganz and Schneider, 2016).
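AIC-based lag selection for an ARDL model can be sketched as a grid search: fit every candidate ARDL(p, q) by least squares on the same sample and keep the pair with the lowest AIC. The Python sketch below does this on simulated data. Function names, the maximum lag orders, and the simulated coefficients are assumptions for illustration; EViews, used in this thesis, and statsmodels' `ardl_select_order` automate the same kind of search.

```python
import itertools

import numpy as np


def ardl_design(y, x, p, q, start):
    """Regressors for ARDL(p, q): constant, y lags 1..p, x lags 0..q.

    Rows begin at index `start` so that every candidate model is fitted on
    the same observations, which keeps the AIC values comparable.
    """
    n = len(y)
    cols = [np.ones(n - start)]
    cols += [y[start - i:n - i] for i in range(1, p + 1)]
    cols += [x[start - j:n - j] for j in range(0, q + 1)]
    return np.column_stack(cols), y[start:]


def select_ardl_lags(y, x, max_p=4, max_q=4):
    """Return the (p, q) pair with the lowest AIC, using the Gaussian
    log-likelihood form AIC = n * ln(RSS / n) + 2k (constants dropped)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    start = max(max_p, max_q)
    best = None
    for p, q in itertools.product(range(1, max_p + 1), range(0, max_q + 1)):
        X, target = ardl_design(y, x, p, q, start)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        n_obs, k = X.shape
        aic = n_obs * np.log(rss / n_obs) + 2 * k
        if best is None or aic < best[0]:
            best = (aic, p, q)
    return best[1], best[2]


# Simulated ARDL(1, 1) data: y_t = 0.5 y_{t-1} + 1.0 x_t + 0.6 x_{t-1} + noise.
rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + 0.6 * x[t - 1] + rng.normal(scale=0.1)
print(select_ardl_lags(y, x))
```

Fitting all candidates on a common sample matters, because AIC values computed on different numbers of observations are not comparable.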

 

3 Data   

3.1 Data Process Chart

Three categories of data are used in this empirical analysis: news headlines downloaded from the Refinitiv database, individual stock prices downloaded from NASDAQ Helsinki, and macroeconomic data from Suomen Pankki (the Bank of Finland). The data are pre-processed in Excel and then uploaded to different platforms to perform the tests and analyses.

 
Figure 1

News data (Refinitiv Data)
- News headlines: Finland, OMXH25 and OMXHPI; Kauppalehti
- Sort news data according to date and time
- Upload CSV file to the Python platform
- Translate Finnish and Swedish news to English
- Translate Finnish news to English
- Calculate news sentiment with FinBERT, Naïve Bayes and SVM, and PYsentiment2
- Calculate each day's sentiment through the weighted average method
- Prepare data for the regression tests: logarithmic transformation

NASDAQ Helsinki data
- Prices: NASDAQ Helsinki and six individual stock prices
- Download to Excel individually
- Calculate daily price volatility for each variable
- Prepare data for the regression tests: logarithmic transformation

Macroeconomic data
- Inflation and interest rates
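The step "calculate each day's sentiment through the weighted average method" in Figure 1 turns many headline-level scores into one score per day. The Python sketch below shows a generic weighted average. The weights used in this thesis are not specified in this section, so the weight attached to each headline here (for example, a model's confidence in its label) is a hypothetical choice, as are the function name and sample values.

```python
from collections import defaultdict


def daily_weighted_sentiment(headlines):
    """Aggregate headline-level scores into one score per day.

    `headlines` is an iterable of (day, score, weight) tuples. The weight is a
    hypothetical choice here (for example a model's confidence in its label);
    days whose weights sum to zero are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for day, score, weight in headlines:
        weighted_sum[day] += score * weight
        weight_total[day] += weight
    return {day: weighted_sum[day] / weight_total[day]
            for day in sorted(weight_total) if weight_total[day] > 0}


# Hypothetical headline scores and weights.
headlines = [("2023-01-02", 1.0, 0.9), ("2023-01-02", -1.0, 0.3), ("2023-01-03", 0.0, 0.5)]
print(daily_weighted_sentiment(headlines))
```

With equal weights, this reduces to a plain daily mean of the headline scores.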



3.2 Pre-Processing Data 

The study hypotheses focus on sentiment scores calculated for the news headlines published under the key search word Finland for the period January 1, 2023, to December 12, 2023. More than forty thousand news headlines are downloaded manually from the Refinitiv database day by day, as the data is not readily available. As Refinitiv publishes news articles from many credible sources, the search is filtered with Boolean operators to capture the most relevant news items: Topic:FI OR R: OMXH25 OR R: OMXHPI OR (Product: KAU NOT Language:LZH NOT Topic:US). Prominent sources such as seekingalpha.com, Public Technologies, EDGAR filings, and many more are included in the search, capturing breaking news alerts, research, filings, transcripts, press releases, web news, and posts from Twitter or X. With this search, the headlines of the most relevant news articles are downloaded in Excel format for easy use and fast cleaning. Each news headline is tagged with its date and time and, where the news item relates to a company listed on the Helsinki Stock Exchange, with the related stock identifier. Finland's most prominent financial newspaper, Kauppalehti, is also included in the Boolean search to add the latest and most relevant financial news to the database.

 
The daily historical prices of NASDAQ Helsinki are downloaded from the Refinitiv database for the period January 1, 2023, to December 12, 2023. These prices are then processed to capture daily volatility, which is calculated as the difference between each day's close and open prices. The daily prices of the three most volatile and the three least volatile companies are downloaded from the same database. Macroeconomic indicator data is taken from the Suomen Pankki (suomenpankki.fi) database, and monthly inflation rates and daily interest rate data are added to the dataset. Monthly data is allocated to each day of the respective month. For example, the January inflation rate is used for every day in January.
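The thesis performs these two preparation steps in Excel. The Python sketch below reproduces them, assuming the open and close prices and the monthly rates have already been exported into plain Python lists and dictionaries. The function names and the example figures are hypothetical and are not the thesis data.

```python
from datetime import date, timedelta


def daily_volatility(opens, closes):
    """Daily price volatility as defined in this study: close price minus open price."""
    return [close - open_ for open_, close in zip(opens, closes)]


def monthly_to_daily(monthly_rates, start, end):
    """Assign each month's rate to every calendar day of that month.

    monthly_rates maps (year, month) -> rate; start and end are inclusive dates.
    """
    daily = {}
    day = start
    while day <= end:
        daily[day] = monthly_rates[(day.year, day.month)]
        day += timedelta(days=1)
    return daily


# Hypothetical example values, not the thesis data.
volatility = daily_volatility(opens=[10.0, 10.4, 10.1], closes=[10.3, 10.2, 10.1])
inflation_by_day = monthly_to_daily({(2023, 1): 1.0, (2023, 2): 2.0},
                                    date(2023, 1, 30), date(2023, 2, 2))
```

Here 30 and 31 January take the January rate, and 1 and 2 February take the February rate.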

 
The data for each variable is downloaded from the NASDAQ Helsinki database in Excel format. The prices are then sorted by date for the period January 1, 2023, to December 11, 2023, which gives 346 data points. A separate daily price volatility column is added by taking the difference between the open and close prices for each day. This column for each variable is transferred to a separate sheet to prepare the data for the regression analysis. To handle negative data points, each series is shifted into positive values using its minimum value, which is identified with Excel's =MIN function. The shifted series is then converted to logarithms with Excel's =LN function to prepare the data for regression analysis.
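The Excel steps can be reproduced in a short Python sketch. The text does not say what is added after the minimum is subtracted. The offset of 1 used here is an assumption, chosen so that the smallest observation maps to ln(1) = 0 instead of the undefined ln(0).

```python
import math


def shift_and_log(values, offset=1.0):
    """Shift a series so every value is positive, then take natural logarithms.

    Mirrors Excel's =MIN(...) and =LN(...). The default offset of 1.0 is an
    assumption so that the minimum becomes ln(1) = 0 rather than ln(0).
    """
    lowest = min(values)
    return [math.log(v - lowest + offset) for v in values]


raw_volatility = [-0.5, 0.0, 1.5]        # hypothetical close-minus-open values
log_volatility = shift_and_log(raw_volatility)
```

A different offset changes the transformed values. The ordering of the observations is unchanged, because the logarithm is monotonic.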

 
Before regression analyses are performed, it is useful to bring the data closer to a normal distribution. A logarithmic transformation can turn a highly skewed variable into a more normally distributed one. For a variable Y, the transformed series is simply ln(Y). The transformation is especially useful for modelling variables with a non-linear relationship, and it also changes the skewness of the error distribution (Cleophas and Zwinderman, 2016). A key assumption in regression analysis is homoscedasticity, which states that the variance of the errors is constant at all levels of the independent variables. When this assumption is not met, the variance of the error term changes across values of an independent variable. This condition is known as heteroscedasticity, and a logarithmic transformation can help stabilize the variance. Outlier values can adversely affect regression findings, and the transformation can mitigate their influence. By compressing the scale on which the data is measured, it can also improve the interpretability of data patterns and the accuracy of the model's predictions (Sole, 2022).

 
Relationships between variables are often non-linear, which makes it difficult for a linear regression model to represent the correlation properly. A logarithmic transformation can convert such a relationship into a linear one; in other words, it can straighten out a curved connection. For example, if X and Y are related by Y = a*exp(b*x), then ln(Y) = ln(a) + b*x, which is a linear relationship (Greenwood, 2022). Another possible advantage of the logarithmic transformation is that it may help address overfitting, which occurs when a model has too many parameters relative to the amount of data. The transformation keeps the model's functional form simpler, which can help prevent overfitting. It also reshapes the distribution into a more regular, bell-shaped curve, which may make the model less prone to overfitting (Andy, 2019).
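The log-linearization example can be checked numerically. The sketch below fits ordinary least squares of ln(Y) on x. It uses hypothetical noise-free data generated with a = 2 and b = 0.5, and it recovers both parameters.

```python
import math


def fit_exponential(xs, ys):
    """Estimate a and b in Y = a*exp(b*x) by OLS on ln(Y) = ln(a) + b*x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                  # slope of the linearized relationship
    ln_a = mean_l - b * mean_x     # intercept equals ln(a)
    return math.exp(ln_a), b


xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # hypothetical curve with a = 2, b = 0.5
a_hat, b_hat = fit_exponential(xs, ys)
```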

 
3.3 Sentiment Score Calculation   

The focus of this study is to analyze the relationship between market sentiment set by 

financial news and stock price volatility. Therefore, more emphasis is put on the econo-

metric analysis of the relationship while ensuring the sentiment data is suitable and valid 

for this exercise. The financial news headlines are preprocessed before they are used 

in machine learning techniques. The Python platform is used in this exercise, and the 

process is briefly explained below. 

 
More than 40,000 financial news headlines were downloaded from the Refinitiv database for the period from January 1, 2023, to December 11, 2023, an average of 110 news headlines per day. A sentiment score is calculated for each day, which expresses the textual impression as a number ranging from negative to positive. This processing is done with several machine learning (ML) techniques, and a natural language toolkit is used to process the headlines. It is important to use libraries pre-trained on financial vocabulary. In this exercise, the FinBERT and PYsentiment2 libraries are used to arrive at sentiment scores for each news headline, and the most accurate prediction is used for the regression analysis.

Image 4 



This chart shows the distribution of words according to their number of appearances in the dataset, and it indicates that the dataset is centered on Finland. The charts below illustrate the number of appearances of key words in the dataset.

 
Figure 2 

 
Figure 3 



3.3.1 FinBERT  

Each news headline is pre-processed to remove special characters, include dates, and translate Finnish and Swedish headlines to English through a Google Translator command. The FinBERT model is then installed, and the data is processed to arrive at a sentiment score. FinBERT sentiment scores are 0 for neutral, 1 for negative, and 2 for positive. Simply averaging these scores over the 100 or more headlines published each day does not provide a valid sentiment for the day. Assigning equal weights to each score does not solve this, because it only reproduces the simple average. The market may react more strongly to negative news than to positive news, an asymmetry associated with the leverage effect. It therefore seems reasonable to give more weight to negative sentiment when arriving at an aggregated sentiment score for a day. Identifying the optimal weights may be challenging, but as a proxy, 50% more weight can be assigned to negative scores. This choice is supported by Danowski et al. (2020), who find that negative scores were nearly twice as strong as positive scores. A weight of 1 for neutral, 1 for positive, and 1.5 for negative scores therefore seems a fair approach. To address anomalies in arriving at a consolidated sentiment score, two further models were used: Naïve Bayes and Support Vector Machine (SVM). The chart below illustrates the main steps followed in the FinBERT sentiment calculation.

 
•Data collection
•Pre-processing of news headlines and saving the data in CSV format

Upload to Python
•Installing FinBERT
•Installing transformers, vectorizers, and Google Translator
•Processing data with pre-trained libraries

Translation of Swedish and Finnish news to English, and chunk processing of the 40k plus news headlines
•Calculating news sentiment
•Calculating a weighted average score for each day

Download daily sentiment to Excel

Figure 4 
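The daily aggregation step can be sketched in Python as follows. The label codes and the weights (1 for neutral and positive, 1.5 for negative) follow the text. The text does not state which numeric values are averaged, so the value attached to each label (+1 positive, 0 neutral, -1 negative) is an assumption made for illustration, and the records are hypothetical.

```python
from collections import defaultdict

# FinBERT label codes as described in the text: 0 neutral, 1 negative, 2 positive.
# Weights from the text: 1 for neutral, 1 for positive, 1.5 for negative.
WEIGHTS = {0: 1.0, 1: 1.5, 2: 1.0}
# Numeric value attached to each label: an assumption for illustration only.
VALUES = {0: 0.0, 1: -1.0, 2: 1.0}


def daily_weighted_sentiment(records):
    """records: iterable of (day, label). Returns {day: weighted average value}."""
    num = defaultdict(float)
    den = defaultdict(float)
    for day, label in records:
        num[day] += WEIGHTS[label] * VALUES[label]
        den[day] += WEIGHTS[label]
    return {day: num[day] / den[day] for day in num}


# Hypothetical headline labels for two days.
scores = daily_weighted_sentiment([
    ("2023-01-02", 2), ("2023-01-02", 1), ("2023-01-02", 0),
    ("2023-01-03", 2), ("2023-01-03", 2),
])
```

With equal weights, the first example day would average exactly 0. The 1.5 weight on its negative headline pulls the score to -0.5 / 3.5, which is the asymmetry the weighting is meant to capture.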

 
A major drawback of FinBERT and other word-based sentiment libraries, as applied here, is that they may not account for the meaning of the entire headline and may rely instead on individual words. The sentiment scores derived from FinBERT were inconsistent with the content of many headlines. The sample table below illustrates these discrepancies: negative headlines are given a positive score, and vice versa.

 
Figure 5 

 
3.3.2 Naïve Bayes and SVM  

The Naïve Bayes family of algorithms uses Bayes' theorem and probability theory to predict the category of a given sample, such as a news headline. Because it is probabilistic, the algorithm calculates the probability of each category for a given sample and outputs the category with the highest probability. The main assumption is that every feature contributes equally and independently to the result. Bayes' theorem then calculates the probability of an event based on the probability of another, related event. The theorem is formulated as follows:

 
Equation 1

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) is not 0. The model gives the probability that event A will occur given that event B is true. To apply this, the data is uploaded to the Python platform and split into training and testing sets. The text is then converted into a matrix of token counts through vectorization so that the machine learning algorithms can use it (Grisel et al., 2023). The results of this exercise showed a low accuracy rate of 40%, and therefore the Naïve Bayes combined sentiment score is not selected for this paper.
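In scikit-learn, the usual tools for this step are CountVectorizer and MultinomialNB. To keep the example self-contained, the sketch below re-implements the same idea in plain Python. Headlines become token counts, and a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing applies Equation 1. The class name, the simple tokenizer, and the four training headlines are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a simplification of real pre-processing)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes on token counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def log_score(self, text, label):
        # log P(label) + sum of log P(token | label). P(B), the evidence term of
        # Equation 1, is identical for every label, so it is omitted when comparing.
        score = math.log(self.class_counts[label] / self.n_docs)
        total = sum(self.word_counts[label].values())
        for token in tokenize(text):
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical labelled headlines, not the thesis data.
train_texts = [
    "profit rises on strong sales",
    "shares jump after record profit",
    "company warns of loss",
    "shares fall after weak results",
]
train_labels = ["positive", "positive", "negative", "negative"]
nb_model = MultinomialNaiveBayes().fit(train_texts, train_labels)
```

For example, nb_model.predict("record profit") returns "positive" because both words occur only in positively labelled training headlines.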

 
To address this issue, another model, the Support Vector Machine (SVM), is used to improve the accuracy rate. As with Naïve Bayes, the headlines are first converted into token counts that serve as input to the machine learning algorithm. With this method, accuracy improves slightly to 54%. The accuracy rate of each model is the share of observations it predicts correctly. Both models therefore produced low accuracy rates when arriving at a combined daily sentiment score.
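The SVM step can be sketched in the same self-contained style. The thesis does not state which SVM implementation was used. The sketch below substitutes a simple primal linear SVM trained with Pegasos-style sub-gradient steps on the hinge loss. It also shows how the accuracy rate is computed as the share of correct predictions. The helper names, the regularization value, and the training headlines are hypothetical.

```python
from collections import Counter


def vectorize(text, vocab):
    """Token-count vector over a fixed vocabulary, plus a constant bias feature."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab] + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM (hinge loss, L2 penalty) trained with Pegasos-style steps.

    y holds +1 / -1 labels. Samples are visited in a fixed order, so the
    result is deterministic. For simplicity the bias feature is also penalized.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def accuracy(y_true, y_pred):
    """Share of observations that are classified correctly."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical headlines: +1 = positive, -1 = negative.
train_texts = [
    "profit rises on strong sales",
    "shares jump after record profit",
    "company warns of loss",
    "shares fall after weak results",
]
train_y = [1, 1, -1, -1]
vocab = sorted({word for text in train_texts for word in text.lower().split()})
X = [vectorize(text, vocab) for text in train_texts]
w = train_linear_svm(X, train_y)
train_accuracy = accuracy(train_y, [predict(w, x) for x in X])
```

In practice, accuracy is measured on a held-out test set, not on the training headlines used here.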

 
3.3.3 PYsentiment2 

The process is similar to the FinBERT process, but a different set of commands is used to calculate the sentiment score. A polarity score is also calculated to derive a sentiment for the full news item. This library is used for general financial sentiment analysis and offers two sentiment dictionaries, Harvard IV-4 and Loughran and McDonald (DeRobertis, 2020).

 
Figure 6 

 
•Pre-processing of news headlines and saving the data in CSV format

Upload to Python
•Install pysentiment2
•Install googletrans
•Apply stopwords

Translation of Swedish and Finnish news to English, and chunk processing of the 40k plus news headlines
•Calculate the daily compound sentiment score
•Apply the function to the 'Processed_Text' column
•Add polarity score
•Check model accuracy

Download daily sentiment to Excel

This library analyzes the headline and arrives at a polarity score. The PYsentiment2 library provides a customized output depending on the number of positive and negative words in each headline (Shah, 2021). For example, positive: 4 indicates that there are 4 words in the text with a positive sentiment, and negative: 1 means that there is 1 word with a negative sentiment. A polarity score is provided for each news headline, taking the positive and negative scores into account as follows:

Equation 2

$$\text{Polarity} = \frac{\text{Positive} - \text{Negative}}{\text{Positive} + \text{Negative}}$$

According to this formula, the polarity score of the above example is (4 - 1)/(4 + 1) = 0.6, which can be used as a combined sentiment score for the news item. The model also measures the presence of personal opinions, evaluations, or beliefs through the subjectivity score (Shah, 2021), which is calculated as:

Equation 3

$$\text{Subjectivity} = \frac{\text{Positive} + \text{Negative}}{\text{Number of words in the text}}$$
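Equations 2 and 3 can be written as two small Python functions. The word counts are taken as given; in practice they come from the dictionary lookup. The zero-count guard in polarity is an assumption added here to avoid division by zero, and the 10-word headline length is hypothetical.

```python
def polarity(positive, negative):
    """Equation 2: (Positive - Negative) / (Positive + Negative).

    Returns 0.0 when no sentiment words were found (an assumed guard).
    """
    hits = positive + negative
    return 0.0 if hits == 0 else (positive - negative) / hits


def subjectivity(positive, negative, n_words):
    """Equation 3: (Positive + Negative) / number of words in the text."""
    return (positive + negative) / n_words


# Worked example from the text: 4 positive words and 1 negative word.
example_polarity = polarity(4, 1)               # 3 / 5 = 0.6
example_subjectivity = subjectivity(4, 1, 10)   # hypothetical 10-word headline
```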
 

However, the subjectivity score is not considered in the sentiment calculation, as opinions may differ from one investor to another. Once a more accurate sentiment score is available for each headline, the next challenge is to calculate a combined sentiment score for each day. Counting how often each score appears in the dataset on a particular day may indicate the overall sentiment for that day. For example, if most headline scores on a day are negative, the day's sentiment is negative. To obtain one sentiment score for each day, the weighted average method is applied.

 
Polarity scores represent the day's sentiment based on the positive and negative word counts for that day. This thesis applies a basic weighted average method to the sentiment scores. A day's polarity sentiment score is calculated by averaging the polarity scores for that day. Similarly, a weighted average score of the positive and negative words is calculated for each day using the formula below.



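Equation 4

$$\text{Day's sentiment} = \left(\frac{\text{Positive}}{\text{Positive} + \text{Negative}} \times \text{positive words}\right) + \left(\frac{\text{Negative}}{\text{Positive} + \text{Negative}} \times \text{negative words}\right)$$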

 
Comparing the two measures by trial and error shows similar data patterns and evidence of stationarity for both. The empirical analysis is therefore performed on the daily polarity averages, which are treated as the day's sentiment in the regression analyses. The graphs below show the distribution of each score.

 
Figure 7 

 
Sentiment scores cover news headlines published on both working and non-working days, while stock prices are only available on business days. To retain as many data points as possible, non-working days are assigned the previous trading day's price volatility, which gives 349 data points. The daily volatility of stock prices is the difference between the closing and opening prices. Each volatility series and the other variables are transformed into logarithmic series to prepare the data for the regression analysis.

(Chart: Polarity Scores Vs Weighted Average, showing the average of polarity and the average of the weighted average, each with a linear trend line)
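Assigning the previous trading day's volatility to non-working days, as described above, is a forward fill. A minimal sketch follows. The function name and the example values are hypothetical; 13 and 16 January 2023 are simply a Friday and the following Monday.

```python
from datetime import date, timedelta


def fill_non_trading_days(trading_vol, start, end):
    """Assign each calendar day the volatility of the most recent trading day.

    trading_vol maps trading dates to volatility values. Days before the first
    trading day are skipped because there is no earlier value to carry forward.
    """
    filled, last = {}, None
    day = start
    while day <= end:
        if day in trading_vol:
            last = trading_vol[day]
        if last is not None:
            filled[day] = last
        day += timedelta(days=1)
    return filled


# Hypothetical volatility on Friday 13 and Monday 16 January 2023.
trading_days = {date(2023, 1, 13): 0.12, date(2023, 1, 16): -0.05}
calendar = fill_non_trading_days(trading_days, date(2023, 1, 13), date(2023, 1, 16))
```

The Saturday and Sunday entries both carry Friday's value of 0.12.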



4 Methodologies 

4.1 Statistical Models 

This section describes in detail the methodologies applied to test each hypothesis. The main model is the Vector Autoregressive (VAR) model, because this study involves two main time series datasets: daily sentiment scores and daily stock price volatility. Individual stock price volatility, interest rates, and inflation rates are added in further hypothesis tests. VAR models can be used to examine the relationships between several time series (Chetty, 2018), which here means the correlations between sentiment scores and stock price volatility over time. The model is widely used in finance to improve forecasts by capturing the dynamic interdependencies between variables.

 
Time series data, which are sets of data points gathered over a period of time, are frequently used in financial statistics and economic analysis. The correlation of a time series with its own lagged values is known as autocorrelation. It is a fundamental characteristic in time series analysis because it helps reveal patterns in the data. VAR allows the relationship of each variable with its own past values and with the past values of other variables to be examined. The relationships between the variables above are examined with both VAR and ARDL analyses, and the results of these two tests are reported in the Results section.
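As a small illustration of autocorrelation, the sketch below computes the sample autocorrelation of a series at a chosen lag. It uses deviations from the full-sample mean, which is one common definition. The alternating example series is hypothetical. A series that flips sign every period has strongly negative lag-1 and positive lag-2 autocorrelation.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag, using the full-sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / denom


alternating = [1.0, -1.0] * 5          # hypothetical series that flips sign each period
r1 = autocorrelation(alternating, 1)   # -9 / 10 = -0.9
r2 = autocorrelation(alternating, 2)   # 8 / 10 = 0.8
```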

 
4.1.1 Vector Autoregressive Model (VAR) 

VAR is widely used in time series analysis and forecasting (Mohr, 2018). It models each variable as a function of its own past values and the past values of the other variables in the system, which is helpful when multiple time series influence each other. Other autoregressive models, such as AR, ARMA, or ARIMA, are univariate (one-directional). VAR is multivariate and bi-directional, which gives a more comprehensive understanding of the variables and improves forecasting (Prabhakaran, 2024). Because VAR is practical in real-world applications, it is an effective tool in disciplines such as finance and economics, where the dynamic relationships between several variables over time are crucial. The VAR model stems from the univariate autoregressive model and regresses the vector of time series on lagged vectors of these variables. Put differently, the value of a variable depends on its own lags: today's price conditions form the basis for tomorrow's price, and the same applies to news sentiment. For example, a news article published today about a particular share may set the mood for the next article published about that stock. Autoregressive models are therefore well suited to capturing this behavior in this study.

 
The basic univariate autoregressive model has one dependent, or endogenous, variable, and the current value of the variable Y depends on its own lag. Equation 5 is a first-order model, and the order can be increased by adding more lags. Here $a$ and $e_t$ denote the intercept and the error term, respectively, and the error term is normally distributed. Stationarity of the data is one key assumption of this model (Mohr, 2018). Regressing a variable on its own lag may not fully reveal the relationship between two or more variables, as other variables may also have a significant impact. Models that combine contemporaneous and lagged values of other, exogenous variables with lagged values of the dependent variable represent this concept (Mohr, 2018). This model is known as the Autoregressive Distributed Lag (ADL) model, shown in Equation 6:

Equation 5

$$y_t = a + b_0 y_{t-1} + e_t$$

Equation 6

$$y_t = a + b_0 y_{t-1} + b_1 x_t + b_2 x_{t-1} + e_t$$

 
This ADL model may forecast the relationship between variables better than a basic autoregressive model. If lagged values of the endogenous variable also affect the exogenous variable, a VAR approach captures this feedback (Mohr, 2018). The basic two-variable VAR(1) model can therefore be written as follows:

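Equation 7

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$$

or

$$y_{1t} = a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + e_{1t}$$

$$y_{2t} = a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + e_{2t}$$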

 
This model assumes that everything depends on everything. Each variable is treated as endogenous and is explained by the lagged values of all variables in the system, and each row of the matrix equation above can be estimated as a separate equation.
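Because each row of the VAR can be estimated as its own regression, a VAR(1) can be sketched in Python with NumPy as equation-by-equation ordinary least squares. This is not the EViews routine used in the thesis. The coefficient matrix and the simulated data are hypothetical and are chosen only so that the estimates can be compared with known values. An intercept is included, which Equation 7 omits.

```python
import numpy as np


def estimate_var1(data):
    """Equation-by-equation OLS for a VAR(1) with intercepts.

    data has shape (T, k). Returns (c, A) such that y_t = c + A @ y_{t-1} + e_t.
    """
    Y = data[1:]                                        # y_t, shape (T-1, k)
    X = np.column_stack([np.ones(len(Y)), data[:-1]])   # [1, y_{t-1}]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)        # shape (k+1, k), one column per equation
    return coef[0], coef[1:].T


# Simulate a stationary two-variable VAR(1) with known (hypothetical) coefficients.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2],
                   [0.1, 0.3]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

c_hat, A_hat = estimate_var1(y)
```

Row i of A_hat holds the estimated a_i1 and a_i2 of Equation 7, and the estimates should lie close to A_true.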

 
4.1.2 Auto Regressive Distributed Lag Model (ARDL) 

ARDL is an extension of autoregressive models that includes both the lags of the dependent variable and the current and lagged values of other variables (Gupta, 2022). This makes ARDL especially useful for examining sentiment scores and how they affect price volatility. Chetty (2018) also notes that the ARDL model can help with collinearity by including a variable's own lags along with the lags of other independent variables. Collinearity is a high correlation among independent variables (stats.stackexchange, 2015). It should be addressed before tests are performed, as it can lead to unreliable and unstable model estimates. ARDL is therefore helpful for understanding the complex dynamics of this study, because later hypothesis tests introduce more exogenous variables.

 
The ARDL model is useful for capturing the dynamics of variables over time, as changes in one variable may not be reflected immediately in another (Chetty, 2018). When a lagged dependent variable is included in the model, it acts as a proxy for autocorrelation in the dependent variable. Once the effect of autocorrelation is removed in this way, the remaining explanatory variables are generally included in the test. The test then determines whether a statistical relationship between these variables still exists (Nguyen, 2021). ARDL can also separate long-run and short-run effects, which are used to test cointegration and the long-run relationship between the variables. Including lagged variables can also improve the model's fit by allowing it to capture more of the variation in the variables (Gupta, 2022). A common challenge when estimating ARDL models is singularity caused by multicollinearity in the data. It arises when different lags of the independent variables are regressed together, which may lead to unstable coefficient estimates (EViews, 2014). To address this in EViews, only one lag of each independent variable is specified in the ARDL equations, because the procedure selects the optimal lags automatically.

 

ARDL captures four components to forecast patterns: self-lagged values, distributed lags, seasonality, and trends. The trend component is captured by $e + x_0 + x_1 t + x_2 t^2 + \dots + x_k t^k$, and seasonality is captured by $\sum_i X_i S_i$ (Gupta, 2022). The basic equation is therefore as follows:

Equation 8

$$Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \sum_{j=0}^{q} \gamma_j X_{t-j} + \sum_{k=0}^{r} \delta_k Z_{t-k} + \epsilon_t$$

 
In this equation, $p$, $q$, and $r$ are the optimal lag lengths, which are selected automatically using the AIC. $Y$, $X$, and $Z$ are the variables, $\beta$, $\gamma$, and $\delta$ are coefficients, and $\epsilon$ and $\alpha$ are the error term and intercept, respectively.
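Automatic lag selection can be illustrated with a small NumPy sketch. Every ARDL(p, q) model on a grid is fitted by OLS on a common estimation sample, and the model with the lowest AIC is kept. The AIC is computed here as n ln(RSS/n) + 2k. EViews reports AIC on a per-observation scale, but the ranking is the same because the sample is identical across candidates. The sketch has only one regressor X (no Z), and the grid limits and the simulated ARDL(1, 1) data are hypothetical.

```python
import numpy as np


def ardl_design(y, x, p, q, max_lag):
    """Regressors for y_t = a + sum_{i=1..p} b_i y_{t-i} + sum_{j=0..q} g_j x_{t-j} + e_t.

    Every candidate starts at t = max_lag so that all AIC values use the same sample.
    """
    rows = np.arange(max_lag, len(y))
    cols = [np.ones(len(rows))]
    cols += [y[rows - i] for i in range(1, p + 1)]
    cols += [x[rows - j] for j in range(0, q + 1)]
    return np.column_stack(cols), y[rows]


def fit_ols(X, target):
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, float(resid @ resid)


def select_ardl(y, x, max_p=3, max_q=3):
    """Return ((p, q), aic) with the lowest AIC = n*ln(RSS/n) + 2k on the grid."""
    max_lag = max(max_p, max_q)
    best = None
    for p in range(1, max_p + 1):
        for q in range(0, max_q + 1):
            X, target = ardl_design(y, x, p, q, max_lag)
            _, rss = fit_ols(X, target)
            n, k = X.shape
            aic = n * np.log(rss / n) + 2 * k
            if best is None or aic < best[1]:
                best = ((p, q), aic)
    return best


# Hypothetical data generated from a known ARDL(1, 1) process.
rng = np.random.default_rng(1)
T = 4000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.2 + 0.5 * y[t - 1] + 0.8 * x[t] + 0.3 * x[t - 1] + rng.normal()

(p_best, q_best), aic_best = select_ardl(y, x)
X11, target11 = ardl_design(y, x, 1, 1, max_lag=3)
coef11, _ = fit_ols(X11, target11)   # [a, b_1, g_0, g_1]
```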

 
4.1.3 VAR Vs ARDL 

VAR uses multiple equations and explains each variable by its own lags and the lags of the other variables. ARDL uses a single equation in which one dependent variable is regressed on independent variables and on its own lags. All variables in a VAR model are considered endogenous, which means the other variables in the system affect them. This makes forecasting simpler, because the system produces forecasts for every variable. The ARDL model, in contrast, usually treats one variable as endogenous and the others as exogenous. This makes forecasting more difficult, since future values of the exogenous variables must be supplied separately (stats.stackexchange, 2015). In other words, the other variables affect the endogenous variable, but the endogenous variable is not considered to affect them.

 
4.2 Hypothesis 1 – Correlation 

H1: There is a significant positive correlation between the sentiment scores of financial news and the daily price volatility on the Helsinki Stock Exchange.

H0: There is no significant correlation between sentiment scores and daily price volatility.

 
To test Hypothesis 1, a basic VAR test and the Pearson correlation coefficient are applied to identify the initial relationship between the variables. The dependent variable Y is the price volatility of Nasdaq Helsinki. The independent variable X is the daily sentiment score derived from the sentiment analysis of each news headline. The correlation coefficient for this analysis is:

Equation 9

$$r = \frac{\sum_{i=1}^{n} (NS_i - \overline{NS})(NH_i - \overline{NH})}{\sqrt{\sum_{i=1}^{n} (NS_i - \overline{NS})^2 \sum_{i=1}^{n} (NH_i - \overline{NH})^2}}$$

 
$NS_i$ represents the news sentiment score of Finnish financial news headlines on day $i$. A consolidated score is derived for each day from the positive, negative, and neutral sentiment scores, and this score represents the overall market mood. The dependent variable $NH_i$ is the price volatility of Nasdaq Helsinki. The tests are performed for several dependent variables in this study, ranging from individual stocks to index price volatility. The volatility is calculated as the standard deviation of the daily returns for the selected dependent variable. $n$ is the total number of days for which the scores and volatility are derived. $\overline{NS}$ is the mean of the sentiment scores, and $\overline{NH}$ is the mean of price volatility. $(NS_i - \overline{NS})$ and $(NH_i - \overline{NH})$ are the deviations of each day's sentiment score and price volatility from their respective means, and $(NS_i - \overline{NS})(NH_i - \overline{NH})$ is the product of these deviations.
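Equation 9 translates directly into a short Python function. The sentiment values below are hypothetical. They show that a volatility series that is an exact linear function of sentiment gives r = +1, and that a mirrored series gives r = -1.

```python
import math


def pearson_r(ns, nh):
    """Equation 9: Pearson correlation between sentiment (NS) and volatility (NH)."""
    n = len(ns)
    ns_bar = sum(ns) / n
    nh_bar = sum(nh) / n
    s_xy = sum((a - ns_bar) * (b - nh_bar) for a, b in zip(ns, nh))
    s_xx = sum((a - ns_bar) ** 2 for a in ns)
    s_yy = sum((b - nh_bar) ** 2 for b in nh)
    return s_xy / math.sqrt(s_xx * s_yy)


ns = [0.1, 0.4, -0.2, 0.3, 0.0]               # hypothetical daily sentiment scores
nh = [2 * s + 1 for s in ns]                  # volatility exactly linear in sentiment
r_linear = pearson_r(ns, nh)                  # close to +1
r_inverse = pearson_r(ns, [-s for s in ns])   # close to -1
```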

 
4.3 Hypothesis 2 – Lagged Effect 

H2: The sentiment scores of financial news have a lagged effect on the daily price volatility in the Helsinki Stock Exchange.

H0: The sentiment scores do not have a lagged effect on the daily price volatility. 

 
A basic first-order autoregressive model with one endogenous variable is given below. It tests the relationship between a variable and its own lag. Nasdaq Helsinki (NH) and News Sentiment (NS) each act in turn as the dependent variable.

Equation 10

$$NH_t = a + \beta_1 NH_{t-1} + \zeta$$

In addition to VAR, the ARDL model can be applied to the dependent variable $NH$ with a first-order lag, together with the independent variable $NS$, as below:

Equation 11

$$NH_t = a + \beta_1 NH_{t-1} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \zeta$$

 
In this model, today's stock price volatility $NH_t$ is expressed as a function of its own first lag $NH_{t-1}$ and of the current and first-lag sentiment scores, $NS_t$ and $NS_{t-1}$. The coefficients $\beta_1$, $\gamma_1$, and $\gamma_2$ measure the impact of a one-unit change in the respective variables on the expected value of the dependent variable. The EViews lag length criteria are used to identify the optimal lag length. The results support a lag of two (order two) according to the Akaike Information Criterion (AIC), and this lag is used in the equations. The ARDL model with two lag periods is therefore as follows:

Equation 12

$$NH_t = a + \beta_1 NH_{t-1} + \beta_2 NH_{t-2} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \gamma_3 NS_{t-2} + \zeta$$

 
The order-two model checks whether today's price volatility in $NH$ is related to its own lagged values $NH_{t-1}$ and $NH_{t-2}$. It also checks whether this effect is amplified by the news sentiment scores $NS_t$, $NS_{t-1}$, and $NS_{t-2}$. Regardless of the number of lags added to the above equation, the ARDL test automatically selects the number of lags suitable for the data according to the AIC. If there is a significant relationship between NH and NS, the test is therefore expected to produce significant p-values for the lagged variables. If the AIC selection process does not establish a significant relationship, a least squares regression of NH on up to three lags is performed to test the lagged effect on NH further.

 

4.4 Hypothesis 3 – High and Low Volatile Companies 

H3: The combined impact of News Sentiment and Nasdaq Helsinki on daily price volatility is stronger for highly volatile stocks.

H0: The combined impact of News Sentiment and Nasdaq Helsinki is not stronger for highly volatile stocks than for less volatile stocks.

 
The H2 model is also applied to test H3 on the three most volatile and the three least volatile companies. This analysis is expected to show how the impact of NH and NS differs across levels of volatility. The companies are selected based on their 200-day volatility. For each company as the dependent variable, the ARDL model selects the optimal lag length automatically according to the AIC. The companies and their respective volatilities are: High - Lehto Group Oyj (190.40%), Valoe Oyj (90.49%), Incap Oyj (78.83%); Low - Elisa Oyj (15.51%), Lassila and Tikanoja Oyj (16.41%), and Tallink Grupp AS (17.57%).

 
This company-wise analysis compares the coefficients of the sentiment scores for each company. The comparison indicates how the impact differs with volatility level and which companies are more sensitive to NS and NH. H3 is supported if the average coefficient of the highly volatile companies is significantly larger than that of the less volatile companies. This would indicate that news sentiment has a greater impact on highly volatile companies in the selected sample. The graphs below illustrate the residuals and the distribution of daily price volatility for each variable. As stated above, the daily volatility for each variable is calculated as the difference between the open and close prices. These graphs also suggest that the data is stationary for each variable.



(Residual plots for 2023, months M1 to M12: LEHTO_GROUP, VALOE_OYJ, INCAP_OYJ, ELISA_OYJ, LASSILLA_AND_TIKANOJA_OY, TALLINK_GRUPP, and SENTIMENT_SCORE residuals)
 
Table 1 

 
(Daily price volatility plots for 2023, months M1 to M12: Lehto Group, Valoe Oyj, Incap Oyj, Elisa Oyj, Lassila and Tikanoja Oyj, Tallink Grupp, and Sentiment Score)

Table 2 

 

4.5 Hypothesis 4 – Macroeconomic Rates 

A study by Dumiter et al. (2023) uses autoregressive models to regress multivariate time series of news sentiment and American stock market indices. The study finds that news sentiment is strongly correlated with stock market movements. It also finds that key interest rates and inflation rates have a significant impact on this relationship. H4 tests the impact of inflation and interest rates by adding two more variables to the equations, $PR_t$ and $IR_t$, which represent policy rates and inflation rates, respectively. Inflation rates are updated monthly and interest rates weekly. Because of limitations on testing mixed frequencies, the respective month's rates are used for each day. Another study reports a negative correlation between inflation rates and stock prices and a positive correlation between interest rates and stock prices (Eldomiaty, 2020). This section tests these relationships. It also tests the impact of NASDAQ Helsinki, since a country's stock exchange can be considered part of its macroeconomy (Bloomenthal, 2023).

 
H4: Price volatility is significantly influenced by macroeconomic indicators for all the variables.

H0: Price volatility is not significantly influenced by macroeconomic indicators for all the variables.

Equation 13

$$NH_t = a + \beta_1 NH_{t-1} + \beta_2 NH_{t-2} + \gamma_1 NS_t + \gamma_2 NS_{t-1} + \gamma_3 NS_{t-2} + \delta_1 PR_t + \epsilon_1 IR_t + \zeta$$

 
This equation tests the relationship between the variables of the previous ARDL equation and their own lags. It also adds two macroeconomic control variables: the policy rate, PR, and the inflation rate, IR. The PR and IR components are treated as fixed, or controlled, in this test.

 

5 Test and Results 

Figure 8 illustrates the process followed in this section.

[Process chart: Correlation; Alternative Models; Stationarity Test (Unit Root, Optimal Lag); Autocorrelation (Serial Correlation LM, Correlogram Q Stat); Adjusting Autocorrelation; VAR-ARDL Results; Interpretation of Results; Summary]

Figure 8

5.1.1 Testing Data for Descriptive Statistics

First, NS and NH are tested for significant correlation, with NH as the dependent variable and NS as the independent variable. The results are discussed further in the Results section. Figure 9 gives a snapshot of the relationship that may exist between volatility and news sentiment.

[Chart: Sentiment Score and NasdaqHelsinki, monthly, M1 to M12 of 2023]

Figure 9
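The correlation step can be sketched in plain Python. The sketch computes the Pearson coefficient between the sentiment score (NS) and the NASDAQ Helsinki series (NH), together with the usual t-statistic for H0: ρ = 0 on n - 2 degrees of freedom. It is a minimal illustration and assumes two aligned, equal-length daily series. The function names and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """t-statistic for H0: rho = 0, distributed t with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical aligned daily values: NS (sentiment) and NH (NASDAQ Helsinki).
ns = [0.2, 0.5, 0.1, 0.7, 0.4, 0.9]
nh = [5.1, 5.4, 5.0, 5.9, 5.3, 6.1]
r = pearson_r(ns, nh)
t = correlation_t_stat(r, len(ns))
print(f"r = {r:.3f}, t = {t:.2f}")
```

For daily data over a year, |t| above roughly 1.97 indicates a correlation that is significant at the 5% level (two-sided).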



5.1.2 Alternative Models 

As discussed in the methodology section, the VAR and ARDL models are the best fit for this study. Both models assume that the data are stationary, which is an important property of the data used to explain the relationship between price volatility and sentiment scores. A basic check is to plot each series and visually confirm that it fluctuates around a fixed mean. If the two time series are not stationary, a linear combination of them may still be stationary; this can be tested with a cointegration test (stats.stackexchange, 2015). Because the plotted data indicate only some level of stationarity, a unit root test is also performed to check stationarity formally.
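The cointegration check mentioned above can be sketched with the two-step Engle-Granger procedure. The first step regresses one series on the other. The second applies a Dickey-Fuller regression to the residuals. This minimal NumPy version assumes no augmentation lags in the residual test. It uses an approximate asymptotic 5% critical value of about -3.34 for two variables with a constant (MacKinnon), which is an assumption and differs in small samples. The function name and the simulated data are hypothetical. A library routine such as `statsmodels.tsa.stattools.coint`, which adds lag selection and p-values, would normally be preferable.

```python
import numpy as np

# Approximate asymptotic 5% critical value for the Engle-Granger test with two
# variables and a constant (MacKinnon); an assumption, small-sample values differ.
EG_CRIT_5PCT = -3.34


def engle_granger_t(y, x):
    """Two-step Engle-Granger statistic: Dickey-Fuller t on the residuals."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: cointegrating regression of y on a constant and x.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Step 2: regress the change in the residual on its lagged level
    # (no constant, no augmentation lags) and return the slope's t-statistic.
    de, e_lag = np.diff(e), e[:-1]
    rho = (e_lag @ de) / (e_lag @ e_lag)
    u = de - rho * e_lag
    s2 = (u @ u) / (len(de) - 1)
    se = np.sqrt(s2 / (e_lag @ e_lag))
    return float(rho / se)


# Simulated example: y shares the stochastic trend of the random walk x.
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=500))
y = 2.0 + 1.5 * x + rng.normal(size=500)
t_stat = engle_granger_t(y, x)
print(f"t = {t_stat:.2f}, cointegrated at ~5%: {t_stat < EG_CRIT_5PCT}")
```

A statistic below the critical value suggests that the residuals are stationary. In that case, the two series would be cointegrated even if each one is non-stationary on its own.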

 
[Plots of NasdaqHelsinki, Sentiment Score, NASDAQHELSINKI Residuals and SENTIMENT_SCORE Residuals, M1 to M12 of 2023]

 
Table 3 

 
Stationarity is a basic and important concept in time series analysis. Stationary data have a constant mean, variance, and autocorrelation structure over time, and this consistency makes them more predictable (stats.stackexchange, 2015). Statistical conclusions may be incorrect if the data are non-stationary. For example, even if two series are unrelated, a regression analysis may indicate a relationship between them if both are trending over time (stats.stackexchange, 2015). For stationary random variables, several useful theoretical results hold, such as the central limit theorem and the law of large numbers. Because the statistical properties of a stationary process are expected to remain constant in the future, stationary series are simpler to predict (Singh, 2023). If a time series is non-stationary, its behavior can only be examined for the period under consideration, whereas conclusions drawn from a stationary series can be applied to other time periods (Adeleye, 2018). Therefore, it is beneficial to apply autoregressive models in this study. To ensure the data are suitable for examining the relationships among the variables, several tests are performed.
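The spurious-regression problem described above can be reproduced with a small simulation. Two independent random walks are regressed on each other, once in levels and once in first differences, and the average R² of each is compared. This is an illustrative NumPy sketch with an arbitrary sample size, replication count, and seed. It is not one of the study's tests. Because the series are unrelated by construction, any sizeable R² in levels is spurious.

```python
import numpy as np


def r_squared(y, x):
    """R-squared from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centred = y - y.mean()
    return float(1 - (resid @ resid) / (centred @ centred))


def mean_r2(n_obs=250, reps=200, seed=0):
    """Average R-squared for independent random walks in levels and in differences."""
    rng = np.random.default_rng(seed)
    levels, diffs = [], []
    for _ in range(reps):
        a = np.cumsum(rng.normal(size=n_obs))  # two independent random walks
        b = np.cumsum(rng.normal(size=n_obs))
        levels.append(r_squared(a, b))  # trending, non-stationary levels
        diffs.append(r_squared(np.diff(a), np.diff(b)))  # stationary first differences
    return float(np.mean(levels)), float(np.mean(diffs))


levels_r2, diffs_r2 = mean_r2()
print(f"mean R^2 in levels: {levels_r2:.3f}, in differences: {diffs_r2:.3f}")
```

Differencing removes the stochastic trend, so the average R² in differences stays close to zero. The regression in levels typically reports a noticeably higher R², even though the series are independent.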

 
5.1.3 Unit Root Test 

A unit root test is a statistical procedure used in econometrics to determine whether a time series variable has a unit root and is therefore non-stationary. Stationarity implies stable statistical properties over time (xlstat, 2023). A unit root indicates that the