Soumitra Guha

How News Headlines Affect Cryptocurrency Prices: A Case Study on Bitcoin and Ethereum

School of Accounting & Finance
Master’s thesis in Finance
Master’s Program in Finance

Vaasa 2025

UNIVERSITY OF VAASA
School of Accounting and Finance
Author: Soumitra Guha
Title of the thesis: How News Headlines Affect Cryptocurrency Prices: A Case Study on Bitcoin and Ethereum
Degree: Master of Finance
Discipline: Master’s Degree Programme
Supervisor: Klaus Grobys
Year: 2025
Pages: 93

ABSTRACT:
This thesis examines whether curated news headlines are systematically associated with Bitcoin and Ethereum returns in the sample. A mixed-source corpus compiled from CoinDesk and the historical CryptoPanic API is scored with two domain-specific models, FinBERT and CryptoBERT, applied to the same headlines to isolate model effects from data effects. Daily sentiment indices are constructed after cleaning, deduplication, and UTC alignment with prices from Investing.com. Associations are evaluated at three fixed horizons, t = 0, 7D, and 30D, with a limited t + 1 check for forecast evaluation only. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. For weekly and monthly windows, overlap is acknowledged; confirmatory non-overlapping specifications with a simple market factor and realized-volatility control are reported in the appendix. Results show clear horizon and asset differences. For Bitcoin, FinBERT sentiment is positively associated with same-day returns, while effects fade at 7D and 30D; CryptoBERT produces a smaller same-day association. For Ethereum, both models yield positive and statistically significant associations at 7D and 30D, with weak or insignificant same-day links. No specification produces a reliable t + 1 effect. The evidence is strictly confirmatory and does not claim causality or tradable prediction.
The contribution is to document, on a single curated headline corpus and under harmonized methods, that model choice and asset differences matter for the timing of sentiment–return associations. The findings inform monitoring and risk governance by identifying when sentiment shocks are most likely to appear in prices.

Keywords: Bitcoin, Ethereum, sentiment analysis, FinBERT, CryptoBERT, cryptocurrency returns, market efficiency, behavioral finance

Table of Contents

1 Introduction
1.1 Background of the Study
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Questions
1.5 Significance of the Study
1.6 Scope and Limitations
1.7 Research Gaps
1.8 Organization of the Thesis
2 Literature Review
2.1 Introduction
2.1.1 Theoretical foundations for a news–sentiment effect in crypto
2.2 Headline Sentiment and Associations with Market Behavior
2.3 Thematic Headline Narratives and Economic Context
2.4 Headline Networks, Attention, and Spillover Mechanisms
2.5 Source Channels, Information Volume, and Cross-Platform Alignment
2.6 Forecasting Models, Accuracy, and Methodological Conflicts
2.7 Short-Term Predictability and Market Reaction Timing
2.8 Regulatory Headlines and Policy Effects
2.9 Comparative Sensitivity of Bitcoin and Ethereum
2.10 Conflicting Evidence and Contextual Factors
2.10.1 Critical comparative analysis and positioning of this thesis
2.11 Theoretical Framework
2.11.1 Mechanistic link between sentiment and returns
2.11.2 Derivation of hypotheses
2.12 Summary of Literature and Hypotheses
3 Methodology
3.1 Overview
3.2 Data Collection
3.2.1 Price Data
3.2.2 News Headlines and Sentiment Data
3.2.2.1 Neutral handling and robustness design
3.3 Model Choice: FinBERT and CryptoBERT
3.4 Data Preparation
3.4.1 Price Data Preparation
3.4.2 Return Calculations
3.4.3 Sentiment Aggregation
3.4.4 Dataset Merging
3.5 Econometric Framework
3.5.1 Model Specification
3.5.2 Estimation Technique
3.6 Visualization and Descriptive Analysis
3.7 Analytical Workflow
3.8 Methodological Justification
4 Results and Discussion
4.1 Overview
4.2 FinBERT Analysis
4.2.1 Bitcoin Analysis
4.2.1.1 Contemporaneous Relationship Between Sentiment and Returns
4.2.1.2 Return Differences by Sentiment Category
4.2.1.3 Dynamic Co-movement of Returns and Sentiment
4.2.1.4 Limited forecast evaluation
4.2.1.5 Discussion of Findings
4.2.1.6 Summary of Empirical Evidence
4.2.1.7 Concluding interpretation (FinBERT, Bitcoin)
4.3 Ethereum Analysis
4.3.1.1 Descriptive Analysis of FinBERT Sentiment
4.3.1.2 Contemporaneous Relationship Between Sentiment and Returns
4.3.1.3 Return differences by sentiment category (FinBERT, Ethereum)
4.3.1.4 Dynamic co-movement of returns and sentiment (FinBERT, Ethereum)
4.3.1.5 Limited forecast evaluation (FinBERT, Ethereum)
4.3.1.6 Discussion of findings (FinBERT, Ethereum)
4.3.1.7 Summary of empirical evidence (FinBERT, Ethereum)
4.3.1.8 Concluding Interpretation
4.4 CryptoBERT Results
4.4.1 Bitcoin analysis (CryptoBERT)
4.4.1.1 Descriptive analysis of CryptoBERT sentiment (BTC)
4.4.1.2 Contemporaneous association between sentiment and returns (CryptoBERT, Bitcoin)
4.4.1.3 Return differences by sentiment category (CryptoBERT, Bitcoin)
4.4.1.4 Dynamic co-movement of returns and sentiment (CryptoBERT, Bitcoin)
4.4.1.5 Limited forecast evaluation
4.4.1.6 Discussion of Findings
4.4.1.7 Summary of Empirical Evidence
4.4.1.8 Concluding interpretation (CryptoBERT, Bitcoin)
4.5 Ethereum analysis (CryptoBERT)
4.6 Cross-Model and Cross-Asset Comparison
4.7 Robustness and Consolidated Results
4.8 Including neutral headlines: robustness check
5 Conclusion
References
Appendix

List of Tables

Table 1: Descriptive statistics of the FinBERT sentiment variable
Table 2: Regression Results (Contemporaneous and Predictive Models)
Table 3: Descriptive Statistics of FinBERT Sentiment (ETH)
Table 4: Regression Results - FinBERT Sentiment and ETH Returns
Table 5: Descriptive Statistics of CryptoBERT Sentiment (BTC)
Table 6: Regression Results - CryptoBERT Sentiment and BTC Returns
Table 7: Summary of Empirical Evidence
Table 8: Ethereum Analysis
Table 9: Consolidated results across models, assets, and horizons

List of Figures

Figure 1: Mechanism linking headline sentiment to cryptocurrency returns.
Figure 2: Distribution of FinBERT Sentiment
Figure 3: BTC Daily Return vs FinBERT Sentiment
Figure 4: BTC Daily Returns by Sentiment Category
Figure 5: BTC Returns and Sentiment Over Time
Figure 6: Distribution of FinBERT Sentiment (ETH)
Figure 7: ETH Daily Return vs FinBERT Sentiment
Figure 8: ETH Daily Returns by FinBERT Sentiment Category
Figure 9: ETH Returns and FinBERT Sentiment Over Time
Figure 10: Distribution of CryptoBERT Sentiment (BTC)
Figure 11: BTC Daily Return vs CryptoBERT Sentiment
Figure 12: BTC Daily Returns by CryptoBERT Sentiment Bucket
Figure 13: BTC Returns and CryptoBERT Sentiment Over Time
Figure 14: Distribution of CryptoBERT Sentiment
Figure 15: ETH Daily Returns by Sentiment Bucket
Figure 16: ETH Daily Return vs. Sentiment Scatter
Figure 17: ETH Returns and Sentiment Over Time

1 Introduction

1.1 Background of the Study

Digital markets now transmit information at high speed, and prices often respond to the arrival of public news and the sentiment it conveys. In this environment, the growth of online news outlets and algorithmic trading has reshaped how market participants process headlines and incorporate them into orders. Cryptocurrencies have been a prominent part of this shift. They introduced new forms of value representation and new channels through which information can affect prices.
Bitcoin was introduced in 2009 and Ethereum followed in 2015 with programmable smart contracts. These assets trade continuously across global venues, and they lack the conventional anchors that guide valuation in many traditional markets. As a result, they are more exposed to attention dynamics, liquidity constraints, and limits to arbitrage. High volatility attracts both institutional and retail participation, which can amplify short-run responses to news.

Headline content is a salient input for attention formation and order flow. Curated news headlines provide time-stamped signals with clearer editorial standards relative to social feeds, and they reach broad audiences quickly. Short headlines can coincide with abrupt changes in order imbalance and volatility. Prior work also indicates that reactions to tone may be asymmetric, with negative news sometimes producing larger absolute effects than positive news. Since Bitcoin and Ethereum are closely followed and often co-owned, common headlines and market-wide narratives can appear as cross-asset comovement.

The present study uses curated daily headlines to examine how headline tone and related thematic signals are associated with Bitcoin and Ethereum market outcomes. We adopt a confirmatory perspective. Our goal is to document associations at clearly defined horizons rather than to claim causal effects or tradable alpha. Throughout the thesis we maintain consistent definitions of variables and windows. Sentiment is measured as continuous daily indices constructed from headlines. Market outcomes are evaluated at same-day, seven-day, and thirty-day horizons, with a limited lead-lag check at t + 1 for forecast evaluation. Estimation relies on ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Where overlapping windows are relevant, we acknowledge the overlap and focus on robust inference. This framing aligns the study with attention-based mechanisms and the reality of 24/7 crypto trading while keeping all claims strictly confirmatory.

1.2 Problem Statement

Existing research on text sentiment and asset prices is extensive, yet the specific role of curated news headlines in cryptocurrency markets remains insufficiently understood. Headlines are the first contact point for many investors and shape attention and expectations before the underlying articles are read. In a 24/7 market with high volatility and heterogeneous participants, headline signals can coincide with sharp changes in order flow and price.

Most prior studies aggregate broader news or rely on social media streams. These choices introduce timestamp noise, variable editorial quality, and a mixture of topics that can obscure the unique effect of headline language. Findings also diverge on directionality. Some studies report that sentiment helps explain or forecast prices, while others document that price movements feed back into measured sentiment. The variation in data sources, horizons, and estimation choices contributes to the lack of consensus.

This study addresses that gap with a confirmatory design. The objective is to assess the in-sample association between daily headline sentiment and Bitcoin and Ethereum market outcomes under clearly defined horizons. We examine same-day, seven-day, and thirty-day windows, and we include a limited t + 1 check for forecast evaluation without making claims about tradable alpha or causality. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlap in multi-day windows is acknowledged and handled through robust inference. By focusing on curated headlines and applying two sentiment classifiers to the same corpus, the study provides a disciplined comparison that clarifies whether headline tone is systematically associated with returns for Bitcoin and Ethereum.
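The estimation approach named above, ordinary least squares with Newey-West HAC standard errors, can be illustrated with a minimal sketch. It is not the thesis's actual code and rests on several assumptions made only for illustration: a single regressor plus an intercept, the Bartlett kernel, a user-supplied lag truncation `max_lag`, and no small-sample correction. The function name `ols_newey_west` and the toy inputs are hypothetical.

```python
from math import sqrt


def ols_newey_west(x, y, max_lag):
    """OLS of y on an intercept and x, with a Newey-West (Bartlett) slope SE.

    Illustrative sketch only: one regressor, no small-sample correction.
    Returns (alpha, beta, se_beta).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < n")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

    # Score vectors g_t = e_t * (1, x_t) for the regressors (intercept, x).
    g = [(e, e * xi) for e, xi in zip(resid, x)]

    def gamma(lag):
        """2x2 matrix: sum over t of g_t g_{t-lag}'."""
        m = [[0.0, 0.0], [0.0, 0.0]]
        for t in range(lag, n):
            for i in range(2):
                for j in range(2):
                    m[i][j] += g[t][i] * g[t - lag][j]
        return m

    # "Meat": Gamma_0 + sum_l w_l (Gamma_l + Gamma_l') with Bartlett weights.
    s = gamma(0)
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1)
        gl = gamma(lag)
        for i in range(2):
            for j in range(2):
                s[i][j] += w * (gl[i][j] + gl[j][i])

    # "Bread": inverse of X'X for X = [1, x].
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    det = n * sum_x2 - sum_x ** 2
    inv = [[sum_x2 / det, -sum_x / det], [-sum_x / det, n / det]]

    # Slope variance is element [1][1] of inv @ s @ inv.
    row = [inv[1][0] * s[0][j] + inv[1][1] * s[1][j] for j in range(2)]
    var_beta = row[0] * inv[0][1] + row[1] * inv[1][1]
    return alpha, beta, sqrt(max(var_beta, 0.0))


if __name__ == "__main__":
    # Made-up toy values: a daily sentiment index and same-day returns.
    sentiment = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 0.2, -0.1, 0.4, -0.3]
    returns = [0.012, -0.008, 0.015, 0.001, 0.02, -0.018, 0.004, -0.002, 0.011, -0.01]
    a, b, se = ols_newey_west(sentiment, returns, max_lag=2)
    print(f"alpha={a:.5f} beta={b:.5f} HAC se(beta)={se:.5f}")
```

With `max_lag=0` the estimator reduces to White's heteroskedasticity-robust (HC0) standard error; larger values add weighted autocovariance terms, which is the part that matters when multi-day return windows overlap. In practice an established statistics package would normally be used rather than a hand-rolled routine.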
1.3 Research Objectives

a. Examine the in-sample association between continuous daily headline sentiment (FinBERT and CryptoBERT) and Bitcoin and Ethereum returns at t = 0, 7D, and 30D.
b. Conduct a limited t + 1 lead-lag check as a forecast evaluation exercise, without claims of predictive power or tradable alpha.
c. Compare association strength across BTC and ETH, and across FinBERT and CryptoBERT, under harmonized controls and identical specifications.
d. Maintain a confirmatory scope with clear definitions, HAC-robust inference, and transparent handling of overlapping multi-day windows.

1.4 Research Questions

To achieve these objectives, the study is guided by the following research questions:

1. How is continuous daily headline sentiment associated with same-day (t = 0) and cumulative (7D, 30D) returns of BTC and ETH in sample?
2. Does today’s sentiment exhibit any t + 1 lead-lag association with next-day returns in sample, recognizing that this is a limited forecast evaluation only?
3. How do the association magnitudes and directions differ between BTC and ETH and between FinBERT and CryptoBERT when modeling choices are held constant?
4. Are conclusions consistent across the two classifiers applied to the same headline corpus, and when overlapping multi-day windows are addressed with HAC-robust inference?

1.5 Significance of the Study

This study contributes to behavioral finance and crypto economics by isolating the role of curated news headlines in shaping price formation for Bitcoin and Ethereum. Unlike many traditional assets, major cryptocurrencies trade 24/7 and lack conventional cash-flow anchors, which heightens sensitivity to attention, liquidity conditions, and limits to arbitrage. By focusing on headline text and clearly defined return horizons, the study documents associations that clarify how short-text sentiment relates to market outcomes in a confirmatory setting.

The findings carry practical value for traders and risk managers.
Headline tone can inform monitoring, risk limits, and post-news positioning when used alongside existing controls and with appropriate caution. For policymakers and regulators, the results illustrate how public communication and policy signals can coincide with measurable changes in widely held digital assets. For researchers, the work narrows the gap between sentiment analysis and market microstructure by using a transparent design. The use of a curated headline corpus, two sentiment classifiers applied to the same data, ordinary least squares with Newey-West standard errors, and explicit treatment of overlapping windows supports reproducibility and disciplined comparison without causal claims.

1.6 Scope and Limitations

This study focuses on Bitcoin and Ethereum. These two assets are the most liquid and widely covered by professional news outlets, which supports reliable headline collection and consistent measurement. The analysis uses curated online financial news headlines as the sole text source. Social media content and long-form articles are not included. The work concentrates on text. Non-textual media such as images, videos, and memes are outside scope.

Sentiment is measured as continuous daily indices derived from headlines using FinBERT and CryptoBERT applied to the same corpus. Outcomes are limited to returns at t = 0, 7D, and 30D, with a limited t + 1 check for forecast evaluation. The study does not claim causality or tradable alpha.

Time alignment uses daily aggregation with UTC-based matching of headline timestamps to return windows. In continuously trading markets, publication time and market reaction may not be perfectly synchronized at the intraday level. This timing uncertainty is a limitation that is mitigated by focusing on daily windows and by reporting robust inference.

Estimation relies on ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors.
Multi-day windows can overlap, which increases serial correlation. This is addressed with HAC standard errors, and results are interpreted as confirmatory associations within sample.

Findings are specific to Bitcoin and Ethereum under the defined period and data source. They should not be generalized to smaller or less liquid cryptocurrencies without further evidence. The study does not evaluate trading strategies, transaction costs, or execution feasibility.

1.7 Research Gaps

Existing crypto–sentiment studies commonly rely on social media streams rather than curated news headlines, evaluate a single sentiment model in isolation, and focus on one asset and one horizon. As a result, it is unclear whether finance-oriented and crypto-oriented language models extract meaningfully different signals from the same headline corpus, and whether any such differences matter for in-sample associations with Bitcoin and Ethereum returns (Gurgul et al., 2023; Kirtac & Germano, 2024; Loginova et al., 2021; Moradi-Kamali et al., 2025; Nie et al., 2024). This thesis addresses that gap within a confirmatory scope.

1. It constructs a unified headline dataset for Bitcoin and Ethereum, cleaned, deduplicated, and time-aligned to market data so that any differences in results can be attributed to the model rather than to differing inputs.
2. It applies two transformer classifiers, FinBERT and CryptoBERT, to the same corpus with consistent handling of polarity classes and daily aggregation to form continuous sentiment indices.
3. It links headline sentiment to returns at t = 0, 7D, and 30D using ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlapping multi-day windows are acknowledged, and inference is kept robust. A limited t + 1 check is included for forecast evaluation without claims of predictive power or tradable alpha.
4. It runs all specifications in parallel for Bitcoin and Ethereum under harmonized modeling choices to reveal asset-specific differences in association strength and horizon.

By holding the information source constant while varying the classifier and the asset, the thesis provides a comparative, model-sensitive answer to a question left open by prior work: when headlines are the common input, do FinBERT and CryptoBERT yield economically and statistically meaningful differences in their in-sample association with Bitcoin and Ethereum returns, and on which horizons? The explicit mapping from prior limitations to this design constitutes the study’s contribution, framed strictly as confirmatory evidence.

1.8 Organization of the Thesis

This thesis is structured in five chapters.

Chapter 1 introduces the study. It states the background, problem statement, objectives, research questions, significance, scope, limitations, research gaps, and the organization of the document.

Chapter 2 presents the literature review and theoretical framework. It analyzes prior work on sentiment and cryptocurrency markets, explains why results differ across sources and methods, and outlines the core mechanisms relevant to this study, including attention, limits to arbitrage, and continuous trading. The chapter positions curated headlines as the information source and motivates a confirmatory approach.

Chapter 3 details the methodology and data. It describes price and headline sources, corpus construction, preprocessing, FinBERT and CryptoBERT application, daily sentiment aggregation, and UTC-based alignment with return windows. It then sets out the econometric design using ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors, notes overlap in multi-day windows, and explains the specification choices used for all comparisons. Visual summaries are included for transparency and reproducibility.

Chapter 4 reports the empirical results.
It presents associations between headline sentiment and returns at t = 0, 7D, and 30D, a limited t + 1 check for forecast evaluation, and comparative evidence across classifiers and across assets. The chapter reports robustness checks consistent with the confirmatory scope and interprets magnitudes in light of the theoretical mechanisms.

Chapter 5 concludes. It summarizes the main findings, discusses theoretical and practical implications, states limitations, and identifies directions for future research.

2 Literature Review

2.1 Introduction

Information arrives continuously in cryptocurrency markets and is rapidly incorporated into trading decisions. Headlines are a prominent channel because they are brief, time-stamped, and widely disseminated. Bitcoin and Ethereum attract persistent coverage, so headline tone can coincide with short-run changes in attention, order flow, and prices. Prior work reports effects on returns and volatility, but the direction, magnitude, and persistence of these effects vary with the data source, labeling approach, and estimation choices (Ahmad et al., 2015; Anese et al., 2023; Chen et al., 2021; Guidolin & Pedio, 2021; Heston & Sinha, 2016; Liu et al., 2022).

Differences across studies often reflect three factors. First, the information source varies. Social media streams introduce timestamp noise and heterogeneous editorial quality, while curated news headlines provide clearer timing and scope. Second, model and feature choices differ. Studies use distinct sentiment pipelines, polarity definitions, and aggregation rules, which can change measured tone. Third, design choices matter. Horizons, overlapping windows, and treatment of serial correlation influence inference.
These elements explain why some studies find that sentiment helps explain prices while others find that price movements feed back into measured sentiment (Baker & Wurgler, 2006; Huang & Ibragimov, 2022; Mai et al., 2022; Sakariyahu et al., 2023; Symitsi & Stamolampros, 2021).

This chapter reviews the literature with that structure in mind. It contrasts social media and curated news evidence, evaluates findings across models and horizons, and links results to mechanisms grounded in attention, limits to arbitrage, and continuous trading. The goal is to clarify how methodological choices shape conclusions and to motivate a confirmatory design that focuses on association, uses curated headlines as the common input, and applies consistent specifications across Bitcoin and Ethereum.

2.1.1 Theoretical foundations for a news–sentiment effect in crypto

Crypto markets have features that make headlines influential as signals for short-horizon price formation.

First, continuous trading removes overnight closures and batch openings that can delay price discovery in other markets. With trading active at all hours, order flow can incorporate headline tone on the same day. This motivates testing t = 0 effects using daily headline aggregates and a limited t + 1 check for forecast evaluation (Aït-Sahalia et al., 2024; Beschwitz et al., 2019; Brière et al., 2022; Chen et al., 2021; Deveikyte et al., 2022).

Second, limits to arbitrage are often stronger in crypto. Liquidity is fragmented across venues, funding conditions vary across stablecoins and derivatives, and occasional outages or frictions restrict capital mobility. When arbitrage capacity is constrained, sentiment-driven mispricing can persist long enough to be observable at daily horizons. This justifies the use of ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors to handle serial correlation from gradual adjustment (Kommel et al., 2018; Szczygielski et al., 2020).
Third, attention theory predicts that investors respond more to salient, easy-to-process cues. Headlines are short, prominent, and arrive in bursts. On high-attention days, coordination of focus can amplify price impact. This is why the empirical design treats curated headlines as the information source and includes headline counts in descriptive summaries and, where applicable, as a control (Aït-Sahalia et al., 2024; Beschwitz et al., 2018; Dim et al., 2023).

Fourth, investor clientele and narrative intensity can generate asymmetric reactions. Technology adoption, regulation, and liquidity stories may elicit stronger responses to certain tones or assets. This provides an interpretive basis for sign and magnitude differences between Bitcoin and Ethereum and for evaluating whether effects attenuate from t = 0 to 7D and 30D (Augustin et al., 2023; Meegan et al., 2020; Ortu et al., 2021).

These mechanisms guide the confirmatory design. The study tests clearly defined associations between daily headline sentiment and returns, uses curated headlines for cleaner timing, and applies consistent specifications across assets without making causal or alpha claims (Baker & Wurgler, 2006; Lefort et al., 2024).

2.2 Headline Sentiment and Associations with Market Behavior

A large strand of research reports that adding sentiment features to price-based models improves forecasts for Bitcoin and Ethereum. Studies using neural networks such as MLP, CNN, and LSTM often find lower forecast errors once sentiment inputs are included, which suggests that text captures behavioral dimensions not fully reflected in lagged prices. Similar results appear during turbulent periods.
For example, work that combines social media sentiment and search intensity with time-series models such as SARIMA reports improved fit for BTC and ETH, and transformer-based classifiers can outperform earlier sentiment tools in those forecasting settings (Arslan, 2024; Chalkiadakis et al., 2023; Ciganovic & D’Amario, 2023; Gurgul et al., 2023; Hossain et al., 2024; Liapis et al., 2021; Moradi-Kamali et al., 2025).

Other frameworks separate the prediction and validation steps. In these designs, one model forecasts prices and a second component checks whether sentiment supports the predicted direction. Such hybrids can reduce forecast errors relative to price-only baselines (Chen et al., 2024).

At the same time, several papers show that the incremental value of sentiment weakens once broader market factors are controlled or when sentiment is measured from sources that may respond to price rather than lead it. When overall market movement is included, measured lead-lag links between sentiment and next-day returns can shrink, which implies that part of the signal reflects contemporaneous mood rather than independent information (Allen et al., 2019; Deveikyte et al., 2022; Gan et al., 2019; Mai et al., 2022).

Taken together, the literature shows promise for sentiment features in forecasting exercises but also highlights sensitivity to data source, horizon, and controls. Social media streams and mixed indicators can raise timestamp noise and introduce feedback from price to measured sentiment. This motivates the present thesis to adopt a confirmatory design based on curated daily headlines, to focus on clearly defined in-sample associations with returns at t = 0, 7D, and 30D, to include only a limited t + 1 check for forecast evaluation, and to apply consistent specifications across Bitcoin and Ethereum.
This approach clarifies whether headline tone from a clean source is systematically associated with returns without making causal or trading claims (Bashchenko, 2022; Gadi & Sicilia, 2024; Moradi-Kamali et al., 2025; Said et al., 2023).

2.3 Thematic Headline Narratives and Economic Context

Headline topics shape how information is processed in crypto markets. Prior studies classify cryptocurrency news into recurring themes such as regulation, security incidents, governance, and investment. Negative items within these themes, for example enforcement actions or hacking events, are consistently associated with short-run price declines, which underscores the salience of negative framing in attention and order flow (Akyildirim et al., 2024; Chokor & Alfieri, 2021; Coulter, 2022; Muktadir-Al-Mukit & Ali, 2025; Zhang et al., 2025).

Narrative clustering typically highlights investment, technology, regulation, and security as major groups. Reported relationships with prices differ by theme and by asset. Some studies find stronger links for investment and regulation narratives, with weaker or asymmetric links for technology and security. These patterns suggest that the market reacts more to headlines that change perceived participation, access, or legal risk than to technical updates that are slower to value (Davoudi et al., 2024; Jesus & Dumitrescu, 2025; Meegan et al., 2020; Schwenkler & Zheng, 2025).

Broader macroeconomic news also conditions crypto returns through the sentiment embedded in financial headlines. During the COVID-19 period, several papers document that indices such as Economic News Sentiment and the VIX are associated with movements in Bitcoin and Ethereum, with differences in sensitivity across the two assets. These findings indicate that headline tone can combine asset-specific narratives with macro risk and optimism signals (Canayaz et al., 2023; Filippou et al., 2023; Meegan et al., 2020).

This thesis does not model narrative themes separately.
The literature in this section is used to interpret results and to motivate the focus on curated headlines and clearly defined horizons. The empirical design remains confirmatory. It tests whether continuous daily headline sentiment is systematically associated with returns for Bitcoin and Ethereum, without causal claims or trading assertions.

2.4 Headline Networks, Attention, and Spillover Mechanisms

Beyond direct tone effects, headline linkages can transmit shocks across assets. Bitcoin and Ethereum occupy central positions in crypto news ecosystems and appear frequently as co-mentions. When one asset experiences a negative headline shock, co-mentioned peers can exhibit short-run movements that later revert. This pattern is consistent with temporary mispricing driven by correlated attention rather than by shared fundamentals (Business-Level Strategy, 2020; Ge et al., 2024; Han et al., 2022; Schwenkler & Zheng, 2025).

Attention intensity amplifies these dynamics. Weeks with spikes in Bitcoin and Ethereum coverage are often accompanied by surges in peer trading activity and return co-movement, which indicates that concentrated media focus can synchronize behavior across assets. In this sense, Bitcoin and Ethereum function as informational anchors within the headline network, and their coverage can shape how other cryptocurrencies are valued in the short run (Corbet et al., 2018; Meegan et al., 2020).

This thesis does not model network spillovers directly. The network evidence informs interpretation only. Our empirical design remains confirmatory and asset-specific. We test whether continuous daily headline sentiment is associated with Bitcoin and Ethereum returns at clearly defined horizons, and we avoid causal or trading claims.

2.5 Source Channels, Information Volume, and Cross-Platform Alignment

Headline effects depend on the source. Social media carries large volumes of sentiment but often exhibits higher noise and faster feedback from price to measured tone. Institutional news sources provide clearer timing, editorial standards, and topic focus, which can yield more persistent market responses. This contrast motivates the thesis focus on curated headlines as the primary information channel (Brière et al., 2022; Bybee et al., 2024).

Cross-platform agreement can strengthen signals. Studies report that when traditional media and social media convey similar tone, short-horizon associations with returns become more pronounced. By contrast, sentiment drawn from a single social stream has lower explanatory power in many settings.
Engagement metrics such as likes and retweets correlate with price movements, yet realized profitability varies with asset and conditions, which highlights the risk of relying on platform activity alone (Kant et al., 2024; Sundarasen & Saleem, 2025).

Information volume matters alongside polarity. Neutral or information-dense headlines can affect liquidity and trading even without strong positive or negative tone, which suggests that investors respond to attention and content load as well as to sentiment direction.

In this thesis, the empirical design remains headline based and confirmatory. We use curated daily headlines to construct continuous sentiment indices, maintain clearly defined horizons, and treat headline counts as descriptive context or, where applicable, as a simple control. The goal is to isolate associations from a clean source without making causal or trading claims (Binsbergen et al., 2024; Brière et al., 2022).

2.6 Forecasting Models, Accuracy, and Methodological Conflicts

Headline-based forecasting has advanced with deep learning and modern NLP. Studies applying transformer models such as FinBERT and sequence models such as Bi-LSTM report higher classification and forecasting accuracy than lexicon approaches in several datasets. Weighted sentiment schemes that scale tone by source credibility or audience reach can further improve forecast metrics in model-specific settings (Karzanov, 2023; Liapis et al., 2021; Zhu et al., 2022).

Comparative studies often find that incorporating sentiment features raises directional accuracy or Sharpe ratios relative to price-only baselines. These gains, however, depend on data scope, feature engineering, and evaluation design. Reported improvements are typically measured against in-sample or rolling benchmarks and may be sensitive to class balance, timestamp alignment, and how overlapping horizons are handled (Chuang et al., 2024; Kisiel & Gorse, 2022; Li et al., 2021).
Disagreement persists on causality and directionality. Some nonlinear frameworks, including Gaussian process models, detect two-way links between sentiment and returns. Linear VAR and related entropy-based analyses often find a one-way link instead, with prices leading measured sentiment. These conflicting results can arise from differences in information sources, horizon choices, and model flexibility. Nonlinear methods can capture feedback and delayed effects, while static or linear approaches can understate them, especially when inputs contain noise or when windows overlap (Calderon & Berman, 2024; Ghorbani et al., 2024; Gonçalves et al., 2024).

2.7 Short-Term Predictability and Market Reaction Timing

Short horizon reactions to headline tone are frequently the strongest in the literature. Studies document that negative headlines are often followed by immediate drawdowns, while positive headlines can coincide with quick upticks in prices and volatility. Classification frameworks that flag impact based on intraday price movements also report high directional accuracy when using modern transformer representations of headline text.

At the same time, several papers find temporal asymmetry. Good news can diffuse more slowly across investors, while negative news tends to trigger faster and larger immediate responses. This asymmetry helps explain why short-run effects can differ by sign and why persistence can vary across horizons (Baker & Wurgler, 2006; Calvet & Fisher, 2008; Hirshleifer et al., 2023; Tao & Shao, 2025).

The present thesis treats these findings as context for interpretation. Our empirical design is confirmatory and focuses on clearly defined daily windows. We test associations between daily headline sentiment and returns at t = 0, 7D, and 30D, and include a limited t + 1 check for forecast evaluation. We do not claim causality or trading viability.
Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors, and overlap in multi-day windows is acknowledged.

2.8 Regulatory Headlines and Policy Effects

Regulatory communication is a central narrative channel for cryptocurrency markets. Headlines that indicate supportive policies, institutional adoption, or legal clarity are commonly associated with positive short-run market reactions. Restrictive or uncertain policy signals are associated with declines. Prior work also shows that policy news can coincide with changes in activity measures on public blockchains, which suggests that regulatory communication affects both valuation and participation (Chokor & Alfieri, 2021; Meegan et al., 2020; Shanaev et al., 2019).

Sensitivity to policy appears broad based across major assets, including Bitcoin and Ethereum. This supports the view that regulatory signals operate at the ecosystem level rather than at the level of individual tokens alone. In this thesis, regulatory headlines are part of the curated corpus but are not modeled as a separate category. The regulatory literature informs interpretation only. Our analysis remains confirmatory. We test whether continuous daily headline sentiment, constructed from all curated headlines, is associated with returns at the specified horizons under consistent specifications and without causal or trading claims (Allen et al., 2019; Heston & Sinha, 2016; Lefort et al., 2024).

2.9 Comparative Sensitivity of Bitcoin and Ethereum

Evidence suggests that Bitcoin and Ethereum differ in how quickly and strongly they respond to professional financial headlines. Bitcoin often reacts more promptly to curated news, consistent with its larger market depth, broader coverage, and higher baseline attention. Ethereum responses appear more context dependent and shaped by community and technology narratives, which can diffuse more gradually.
Studies during macroeconomic and regulatory episodes indicate further heterogeneity. Ethereum is frequently more sensitive to positive macro sentiment, while Bitcoin can adjust more gradually. During major regulatory or crisis events, both assets tend to co-move, which implies exposure to common market-wide sentiment shocks.

In this thesis, these patterns inform interpretation only. The empirical design remains confirmatory and asset specific. We test associations between daily headline sentiment and returns for Bitcoin and Ethereum at t = 0, 7D, and 30D under consistent specifications.

2.10 Conflicting Evidence and Contextual Factors

Findings in the literature often conflict because of differences in data sources, sentiment construction, horizons, and models. Deep learning and other nonlinear approaches frequently detect complex two-way relationships between sentiment and prices. Linear or static econometric frameworks often emphasize a one-way link instead, with prices leading measured sentiment. Results are also sensitive to timestamp alignment, how overlapping windows are handled, and whether market-wide controls are included (Baker & Wurgler, 2006, 2007; Cai & Yung, 2022; Deveikyte et al., 2022; Gan et al., 2019; Huang & Ibragimov, 2022; Kearney & Liu, 2014; Lis, 2024).

This thesis addresses these issues by narrowing scope and standardizing choices. We use curated daily headlines as a single information source, apply FinBERT and CryptoBERT to the same corpus, define horizons as t = 0, 7D, and 30D, and estimate with ordinary least squares and Newey-West heteroskedasticity and autocorrelation consistent standard errors. We include a limited t + 1 check for forecast evaluation only. The objective is to provide a clear, confirmatory assessment of in-sample associations rather than to resolve causal direction across model classes.
2.10.1 Critical comparative analysis and positioning of this thesis

Apparent contradictions in the crypto sentiment literature mostly reflect systematic design differences rather than true disagreement. The key dimensions are:

• Information source: Curated news headlines are editor screened and lower noise. Social media streams are higher variance and frequently price reactive, which raises filtering demands and reverse causality risk. Studies isolating news shocks tend to report clearer contemporaneous responses.
• Horizon and sampling frequency: Intraday and daily designs capture flow-driven reactions. Weekly and monthly designs capture narrative diffusion and macro co-movement. Coverage intensity conditions volatility and effect sizes.
• Asset and market regime: Bitcoin, with deeper liquidity and heavier macro and regulatory coverage, more often shows same-day effects. Ethereum, with stronger technology and community narratives, more often shows multi-day incorporation. Effects strengthen in risk-on or regulatory-active periods and weaken in calm regimes.
• Model class and labels: Domain-specific transformers can draw different polarity boundaries and signal strengths. Without a shared headline corpus, observed model differences conflate classifier and data. Prior BERT-based work shows that design choices materially affect signals.
• Controls and identification: When market-wide controls such as broad market moves, volume, or realized volatility are included, incremental sentiment effects shrink, which indicates overlap between sentiment and state variables.

Positioning of this thesis. To address these divergence sources, the thesis uses curated news headlines as the single information channel, evaluates multiple horizons to separate flow from diffusion (t = 0, 7D, 30D), and runs Bitcoin and Ethereum in parallel under harmonized choices.
FinBERT and CryptoBERT are applied to the same cleaned headline corpus with consistent daily aggregation to form continuous indices. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlap in multi-day windows is acknowledged. Where applicable, simple market and realized volatility controls are included to avoid overstating sentiment. The design is strictly confirmatory. It tests in-sample associations and includes a limited t + 1 check for forecast evaluation only.

Directed integration of the four studies:

• Social media vs. news: Challenges in social media classification and price reactivity support the choice of curated headlines as the primary source (Kulakowski and Frasincar, 2023).
• Breaking news channel: News shocks raise trading activity, which motivates testing same-day return links from headline sentiment (Kulbhaskar and Subramaniam, 2023).
• Media intensity: Higher coverage is associated with higher volatility, so confirmatory specifications note market and realized volatility controls, and treat attention interactions as an extension where relevant (Lee and Jeong, 2023).
• Model sensitivity: BERT design choices matter. Applying FinBERT and CryptoBERT to one shared corpus isolates model effects from data effects (Ider and Lessmann, 2022).

Resulting, testable focus for this thesis:

i) News-based headline sentiment should explain same-day Bitcoin returns more reliably than social media-based measures used in prior work.
ii) Ethereum is expected to show weaker same-day but stronger multi-day associations.
iii) Any differences between FinBERT and CryptoBERT are attributed to model behavior because the input corpus is shared, and effects may vary by horizon.
iv) After adding simple market and attention-related controls, residual associations are expected to be modest and short lived, which is consistent with limits to arbitrage and attention mechanisms.
2.11 Theoretical Framework

2.11.1 Mechanistic link between sentiment and returns

This thesis adopts a simple micro-macro mechanism to explain how headline tone can become visible in daily cryptocurrency returns while remaining consistent with semi-strong efficiency.

• Attention and order flow: Positive headlines attract retail attention and temporarily increase buy pressure, whereas negative tone draws risk-off behavior and short-side volume. Because crypto assets trade continuously and have low institutional intermediation, this attention shock directly affects same-day order flow and prices.
• Absorption and limits to arbitrage: Unlike conventional markets, 24/7 trading and exchange fragmentation mean that mispricing is quickly noticed but not instantly corrected; transaction costs, exchange fees, and basis differentials limit immediate arbitrage. Sentiment shocks therefore appear in returns for one or two horizons before being fully arbitraged away.
• Differential adjustment speed: Bitcoin’s deeper liquidity allows faster absorption, while Ethereum’s thinner market and stronger speculative participation cause slower incorporation of information. This structural difference motivates the expectation of horizon-specific reactions.

2.11.2 Derivation of hypotheses.

• H1 (Same-day reaction): Because attention shocks translate directly into order flow, positive (negative) sentiment will be associated with contemporaneous positive (negative) returns.
• H2 (Short persistence in ETH): Due to slower absorption and more retail participation, the effect of positive sentiment will persist for several days in Ethereum but dissipate rapidly in Bitcoin.
• H3 (Volatility effect): Large absolute sentiment values, regardless of sign, trigger wider dispersion in intraday and daily prices, producing higher realized volatility.
• H4 (Asymmetry across assets and horizons): The strength and duration of the sentiment–return relationship depend on the asset’s liquidity and trading depth, being strongest for Ethereum at multi-day horizons and weakest for Bitcoin at t + 1.

This mechanism unifies the behavioural (attention), market-structure (limits to arbitrage), and informational (efficiency) perspectives and provides a direct theoretical pathway from headline tone to the four empirical hypotheses tested below.

[Figure 1 diagram: Headline Tone (FinBERT/CryptoBERT) → Attention Shock / Media Salience → Order Flow Imbalance (Buy/Sell Pressure) → Short-Term Return Reaction. Further nodes: Volatility Increase (high |Sentiment|, higher dispersion); Bitcoin: Faster Absorption (t = 0); Ethereum: Slower Diffusion (t = 7–30D).]

Figure 1: Mechanism linking headline sentiment to cryptocurrency returns. Headline tone triggers an attention shock that shifts short-term order flow and produces a contemporaneous price reaction. The effect attenuates as arbitrage corrects mispricing but leaves a temporary footprint in returns and volatility. Bitcoin exhibits rapid absorption (t = 0), whereas Ethereum shows slower diffusion (t = 7–30D).

2.12 Summary of Literature and Hypotheses

The literature indicates that investor sentiment relates to cryptocurrency returns, but the magnitude, persistence, and sign asymmetry depend on information source, horizon, and modeling choices. Curated news headlines provide cleaner timing than social feeds and are suitable for testing short-horizon associations. Prior findings often report stronger immediate effects, occasional asymmetry with larger negative responses, and asset-specific differences tied to attention, liquidity, and narrative intensity. These points motivate a confirmatory design that tests clearly defined associations using headline sentiment as a continuous daily index.
In this study, continuous daily headline sentiment is constructed from a curated corpus and applied uniformly to Bitcoin and Ethereum. Associations with returns are evaluated in sample at t = 0, 7D, and 30D, with a limited t + 1 check for forecast evaluation only. Extreme sentiment refers to the tails of the continuous index and is used for a simple volatility robustness note, not as a primary construct. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors, and overlapping multi-day windows are acknowledged.

Hypotheses

H1: More positive daily sentiment is associated with higher same-day returns, and more negative sentiment is associated with lower same-day returns.
H2: Cumulative associations over 7D and 30D are stronger for Ethereum than for Bitcoin, consistent with slower diffusion in Ethereum.
H3: Days with larger absolute sentiment are associated with higher short-term realized volatility. This is an ancillary robustness check based on the tails of the continuous index.
H4: Association patterns differ by asset and horizon, with Bitcoin stronger at t = 0 and Ethereum stronger over 7D and 30D.

3 Methodology

3.1 Overview

This chapter describes the framework used to examine associations between daily headline sentiment and returns for Bitcoin and Ethereum. The workflow covers data collection, preprocessing, sentiment extraction, return construction, dataset alignment, and regression estimation with supporting visual summaries. The study evaluates in-sample associations at t = 0, 7D, and 30D, and includes a limited t + 1 check as a forecast evaluation exercise. No out-of-sample prediction or trading strategy is attempted. Estimation later in the chapter uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors.
3.2 Data Collection

3.2.1 Price Data

Daily Bitcoin and Ethereum prices were obtained from Investing.com for the period 10 November 2021 through 12 August 2025. The file includes open, high, low, and close, with standardized calendar dates. This window contains both rising and falling market phases. All return calculations in later sections use UTC-aligned daily windows to match headline timing.

3.2.2 News Headlines and Sentiment Data

News sources, coverage, and data-quality diagnostics

Sources: The headline corpus used in this thesis comes from two professional news pipelines: (i) CoinDesk web headlines and (ii) CryptoPanic headlines collected via its public API during the study period. CryptoPanic has since discontinued public access and placed the API behind a paywall, so the original collection process cannot be re-run under identical terms. The final analysis therefore uses a compiled mixed-source headline file as a fixed corpus.

Provenance and mixing: Headline items from CoinDesk and CryptoPanic were pooled into a single dataset for sentiment scoring. During compilation, per-item source labels were not preserved, so the analysis cannot attribute an individual headline to a specific upstream source. This limitation is disclosed and is handled by working at the daily aggregation level. Because both inputs are professional, edited news pipelines rather than social feeds, the mixed corpus retains the intended scope of curated market news.

Coverage window and granularity: Headlines span the full sample used in the confirmatory analysis. Items are timestamped to the day and merged on UTC to align with daily return construction. The same compiled corpus is used for both classifiers so that differences in results can be attributed to the sentiment model rather than to the news source.

Deduplication and filtering: Within-day exact-title duplicates were removed before aggregation.
Non-English items, advertisements, and navigation pages were excluded during scraping or API parsing when these fields were detectable. Headlines with empty or null titles were dropped.

Neutral handling: The main confirmatory design excludes neutral headlines after sentiment scoring to reduce noise in daily aggregates. A robustness check that includes neutrals with within-day z-scoring is reported in the appendix. The inclusion of neutrals does not change the sign of the reported coefficients and does not generate a significant t + 1 effect.

Missingness and merge coverage: After UTC alignment and inner joins with price data, the analysis uses only dates with both returns and a valid daily sentiment index. Trading-holiday gaps are not a concern because the crypto market trades continuously. Days with zero valid headlines after filtering contribute no sentiment signal and are treated as missing in the regressions.

Potential biases: Mixing CoinDesk with CryptoPanic aggregates may overweight topics that both sources deem salient and underweight idiosyncratic stories that appear in only one source. Loss of per-item provenance also prevents source-specific diagnostics. These risks are mitigated by the use of a single compiled corpus for both sentiment models, fixed horizons, and a confirmatory focus on in-sample associations rather than causal claims or tradable prediction.

Reproducibility note: Because public API access at CryptoPanic has changed, exact recollection under the original access terms is not possible. To support transparency, this thesis treats the compiled mixed-source headlines file, the classifier outputs, and the merged daily analysis panel as fixed inputs for all reported results. All transformations from compiled headlines to daily sentiment indices are described step by step in this chapter and can be applied to any future headline corpus collected under similar rules.
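As an illustration of the cleaning rules in this subsection (empty-title removal, within-day exact-title deduplication, and UTC date alignment), a minimal Python sketch is given here. It is not the thesis's original script: the function name `clean_headlines` and the field names `timestamp` and `title` are hypothetical, and the sketch assumes each item carries an ISO 8601 timestamp with an explicit UTC offset, whereas the compiled corpus is timestamped to the day.

```python
from datetime import datetime, timezone


def clean_headlines(items):
    """Return (utc_date, title) pairs after basic cleaning.

    Each item is a dict with hypothetical keys "timestamp" (ISO 8601 string
    with an explicit UTC offset) and "title". Field names are illustrative.
    """
    seen = set()
    cleaned = []
    for item in items:
        title = (item.get("title") or "").strip()
        if not title:
            continue  # drop empty or null titles
        stamp = datetime.fromisoformat(item["timestamp"])
        utc_date = stamp.astimezone(timezone.utc).date()  # align on UTC day
        key = (utc_date, title)
        if key in seen:
            continue  # drop within-day exact-title duplicates
        seen.add(key)
        cleaned.append(key)
    return cleaned


sample = [
    {"timestamp": "2022-01-01T23:30:00-05:00", "title": "Bitcoin rallies"},
    {"timestamp": "2022-01-02T08:00:00+00:00", "title": "Bitcoin rallies"},
    {"timestamp": "2022-01-02T09:00:00+00:00", "title": ""},
    {"timestamp": "2022-01-03T10:00:00+00:00", "title": "Bitcoin rallies"},
]
rows = clean_headlines(sample)
```

In the sample, the first item falls on 2 January once converted to UTC, so the second item is a within-day duplicate and is dropped; the same title on 3 January is kept because deduplication is applied within days only.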
3.2.2.1 Neutral handling and robustness design

Policy: The main confirmatory design excludes headlines that the classifier labels as neutral. The goal is to reduce noise in daily aggregates and to focus on tone that is more likely to shift attention and order flow.

Construction: For each day, positive and negative scores are aggregated into a continuous daily index. Neutrals are dropped in the main specification. For the robustness check, all three classes are included after within-day z-scoring of raw headline scores so that days with many headlines do not mechanically dominate days with few headlines.

Robustness plan: All confirmatory regressions are re-estimated with the neutrals-included index using the same horizons, controls, and standard error settings. Results are reported side by side in Appendix A6 and summarized in Section 4.8.

3.3 Model Choice: FinBERT and CryptoBERT

FinBERT is a transformer model adapted to the financial domain. General-purpose language models can misclassify finance-specific terms and phrasing. FinBERT is trained on financial text, which improves the alignment between labeled polarity and the semantics found in headlines about markets. Using FinBERT helps ensure that sentiment extracted from cryptocurrency-related headlines is consistent with conventions in financial sentiment research.

CryptoBERT complements FinBERT by targeting cryptocurrency language. Crypto reporting often uses domain-specific vocabulary, community expressions, and technology references that differ from traditional financial communication. A crypto-adapted transformer is designed to represent these features more faithfully than general encoders trained on broad text.

Using both models serves two purposes in this thesis. First, it allows a disciplined comparison of finance-oriented and crypto-oriented classifiers applied to the same headline corpus with identical preprocessing and daily aggregation.
Any difference in measured associations can then be attributed to the model rather than to differences in inputs. Second, it addresses a gap in prior work, where finance-specific models are often evaluated without a direct crypto-specific counterpart on a shared dataset.

The role of the models in this study is confirmatory. FinBERT and CryptoBERT outputs are used to construct continuous daily sentiment indices for Bitcoin and Ethereum. These indices are then related to returns at t = 0, 7D, and 30D, with a limited t + 1 check for forecast evaluation. The objective is to test in-sample associations under consistent specifications, not to claim causality or to optimize forecasting performance.

3.4 Data Preparation

3.4.1 Price Data Preparation

The price dataset was processed in Python. Dates were parsed into proper datetime format and sorted in ascending order. Commas were removed from numeric fields and all variables converted into numeric type. Only the key OHLC variables were retained.

3.4.2 Return Calculations

Returns were derived from closing prices using percentage change formulas.

Daily return (1-day):

\[ \mathrm{Return}_{t}(1d) = \left[ \frac{P_t - P_{t-1}}{P_{t-1}} \right] \times 100 \]

Seven-day return:

\[ \mathrm{Return}_{t}(7d) = \left[ \frac{P_t - P_{t-7}}{P_{t-7}} \right] \times 100 \]

Thirty-day return:

\[ \mathrm{Return}_{t}(30d) = \left[ \frac{P_t - P_{t-30}}{P_{t-30}} \right] \times 100 \]

To avoid missing values, the first 7 and 30 observations were dropped from the 7D and 30D series. Returns are computed on UTC-aligned daily windows to match headline timing.

3.4.3 Sentiment Aggregation

Multiple headlines on the same date were aggregated to a single daily score. For each headline, expected sentiment was computed as:

\[ \mathrm{Expected}_{\mathrm{row}} = p_{\mathrm{positive}} - p_{\mathrm{negative}} \]

Neutral outputs were excluded by design. For each date, the following measures were derived on the shared corpus for FinBERT and for CryptoBERT: the mean of Expected_row, a confidence-weighted sentiment score, the mean confidence (maximum class probability per headline), and the total headline count.
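The four daily measures listed above can be sketched in Python as follows. This is a hedged illustration rather than the original aggregation script. The function name `aggregate_daily` and the tuple layout are hypothetical, and three assumptions are made that the text does not spell out: a headline counts as neutral when the neutral class has the largest probability, the confidence-weighted score weights each Expected_row by the headline's maximum class probability, and the headline count is taken after neutrals are excluded.

```python
from collections import defaultdict


def aggregate_daily(scored):
    """Aggregate per-headline class probabilities into daily measures.

    `scored` is a list of (day, p_positive, p_negative, p_neutral) tuples,
    a hypothetical layout for the classifier output.
    """
    by_day = defaultdict(list)
    for day, p_pos, p_neg, p_neu in scored:
        if p_neu > max(p_pos, p_neg):
            continue  # assumed neutral rule: neutral is the top class
        expected = p_pos - p_neg  # Expected_row from the text
        confidence = max(p_pos, p_neg, p_neu)  # maximum class probability
        by_day[day].append((expected, confidence))

    daily = {}
    for day, rows in by_day.items():
        n = len(rows)
        total_conf = sum(conf for _, conf in rows)
        daily[day] = {
            "mean_sentiment": sum(exp for exp, _ in rows) / n,
            # Assumed weighting scheme: weights are the confidences.
            "weighted_sentiment": sum(exp * conf for exp, conf in rows) / total_conf,
            "mean_confidence": total_conf / n,
            "headline_count": n,  # counted after dropping neutrals (assumption)
        }
    return daily


scored = [
    ("2022-01-02", 0.7, 0.1, 0.2),
    ("2022-01-02", 0.1, 0.6, 0.3),
    ("2022-01-02", 0.2, 0.2, 0.6),  # neutral, excluded
    ("2022-01-03", 0.8, 0.1, 0.1),
]
daily = aggregate_daily(scored)
```

Days on which every headline is neutral produce no entry at all, which matches the treatment of zero-headline days as missing in the regressions.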
The daily mean served as the primary sentiment index.

3.4.4 Dataset Merging

Price returns and daily sentiment were merged by inner join on the UTC date key. Only dates with both price and sentiment were retained. Two versions were created: one that preserves rows with missing returns, and a cleaned dataset that drops rows with NaNs arising from the first 7D and 30D constructions. The cleaned dataset is used for all analyses.

3.5 Econometric Framework

3.5.1 Model Specification

Contemporaneous association:

\[ R_t = \alpha + \beta_1 \mathrm{Sentiment}_t + \beta_2 \mathrm{HeadlinesCount}_t + \gamma' X_t + \varepsilon_t \]

where $R_t$ is the return over the chosen horizon (1-day, 7-day, or 30-day), $\mathrm{Sentiment}_t$ is the daily sentiment score from FinBERT or CryptoBERT, $\mathrm{HeadlinesCount}_t$ is the number of headlines on day $t$, and $X_t$ contains controls for market-wide movement and realized volatility as defined below.

Limited lead-lag check for forecast evaluation only:

\[ R_{t+1} = \alpha + \beta_1 \mathrm{Sentiment}_t + \beta_2 R_t + \beta_3 \mathrm{HeadlinesCount}_t + \gamma' X_t + \varepsilon_{t+1} \]

This t + 1 specification is not used to claim out-of-sample predictive power or tradable alpha.

3.5.2 Estimation Technique

All regressions are estimated in Stata 17 using ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. For baseline models a lag length of 5 is used to address short-term autocorrelation in financial returns.

To provide confirmatory inference for the 7D and 30D horizons, models are re-estimated on non-overlapping samples: every seventh observation for 7D and every thirtieth observation for 30D. Each confirmatory model includes:

• A market factor, defined as BTC daily return when ETH is the dependent variable, and as the equal-weighted average of BTC and ETH daily returns when BTC is the dependent variable;
• A realized volatility control, computed as the 7-day rolling standard deviation of the dependent asset’s daily returns.

Confirmatory models are estimated by OLS with HAC standard errors using a data-driven bandwidth equal to floor(1.75 × T^(1/3)).
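The sample construction and estimator described above can be sketched in plain Python. The thesis itself estimates in Stata 17, so this block is only an illustration under stated assumptions: all function names are hypothetical, the regression has a single regressor plus a constant (no controls), the HAC weights use the Bartlett kernel of the Newey-West estimator, and no small-sample correction is applied, so standard errors may differ slightly from Stata output.

```python
import math


def horizon_return(prices, h):
    """Percentage return over h days; the first h entries are None."""
    return [None] * h + [
        (prices[t] - prices[t - h]) / prices[t - h] * 100
        for t in range(h, len(prices))
    ]


def non_overlapping(values, step):
    """Keep every step-th observation so return windows do not overlap."""
    return values[::step]


def hac_bandwidth(n_obs):
    """Data-driven lag length floor(1.75 * T^(1/3)) quoted in the text."""
    return math.floor(1.75 * n_obs ** (1 / 3))


def ols_newey_west(y, x, lags):
    """OLS of y on a constant and x with Bartlett-kernel HAC errors.

    Returns (intercept, slope, se_intercept, se_slope).
    """
    n = len(y)
    rows = [(1.0, xi) for xi in x]
    # X'X and its inverse for the 2x2 case.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    beta = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]
    resid = [yi - beta[0] - beta[1] * xi for yi, xi in zip(y, x)]
    # Score vectors g_t = x_t * e_t and the lag-0 term of the "meat" matrix.
    g = [(r[0] * e, r[1] * e) for r, e in zip(rows, resid)]
    s = [[sum(gt[i] * gt[j] for gt in g) for j in range(2)] for i in range(2)]
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)  # Bartlett kernel weight
        for i in range(2):
            for j in range(2):
                gamma = sum(g[t][i] * g[t - lag][j] for t in range(lag, n))
                gamma_t = sum(g[t][j] * g[t - lag][i] for t in range(lag, n))
                s[i][j] += weight * (gamma + gamma_t)
    # Sandwich covariance: (X'X)^-1 S (X'X)^-1.
    tmp = [[sum(inv[i][k] * s[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    cov = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    se = [math.sqrt(max(cov[i][i], 0.0)) for i in range(2)]
    return beta[0], beta[1], se[0], se[1]


# Illustrative use on made-up numbers, not thesis data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 108.0, 110.0,
          109.0, 112.0, 115.0, 113.0, 116.0, 118.0, 117.0]
weekly = [r for r in non_overlapping(horizon_return(prices, 7), 7) if r is not None]
lag_length = hac_bandwidth(len(prices))
```

A full replication would add the headline count, market factor, and realized volatility columns to the regressor matrix; the two-column case is used here only to keep the matrix algebra explicit.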
This design addresses serial dependence from horizon construction and reduces omitted variable bias from market-wide and volatility-related movements while remaining within the existing dataset.

Separate specifications are estimated for each cryptocurrency and classifier pairing: FinBERT with BTC, FinBERT with ETH, CryptoBERT with BTC, and CryptoBERT with ETH. All claims are framed as in-sample associations within a confirmatory scope.

3.6 Visualization and Descriptive Analysis

Visualizations were used to summarize the data and support interpretation of the regression results. All plots use UTC-aligned dates and the cleaned analysis dataset.

• Sentiment distributions. Histograms with kernel density overlays depict the distribution of the daily sentiment indices for FinBERT and CryptoBERT, shown separately for Bitcoin and Ethereum.
• Returns by sentiment bins. Boxplots compare return distributions across simple bins formed from the daily sentiment index. Bins are defined around zero to represent negative, near-zero, and positive days for descriptive purposes. These bins are used only for visualization. All inference in the thesis relies on the continuous sentiment index.
• Scatterplots with fitted lines. Scatterplots of daily sentiment versus returns include a linear fit and a locally weighted fit to illustrate average patterns without imposing a functional form. These plots are descriptive only.
• Time series overlays. Overlays of sentiment and returns provide a view of co-movements and periods of elevated volatility. Plots are presented for the full sample and for selected subperiods to aid readability.

Figures are labeled with consistent captions and are referenced once in the text. The visualizations are intended to complement, not replace, the econometric analysis, and they maintain the confirmatory scope of the study.

3.7 Analytical Workflow

Baseline estimates are reported in the main text.
Appendix Tables A1 through A4 present confirmatory specifications that use non-overlapping seven-day and thirty-day samples, include a simple market factor and a realized volatility control, and are estimated with heteroskedasticity and autocorrelation consistent standard errors using a data-driven bandwidth. These appendix outputs are constructed from the same UTC-aligned, cleaned dataset and follow the identical preprocessing and aggregation rules described above.

3.8 Methodological Justification

The design emphasizes rigor, comparability, and replicability within a confirmatory scope.

• FinBERT and CryptoBERT provide complementary coverage of financial and cryptocurrency language, applied to the same curated headline corpus with identical preprocessing.
• Excluding neutral classifications reduces measurement noise and yields a clearer continuous daily sentiment index.
• Multiple horizons, defined as t = 0, 7D, and 30D, allow assessment of immediate and gradual associations. A limited t + 1 check is included for forecast evaluation only.
• Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlap in multi-day horizons is acknowledged in the baseline and addressed directly in the appendix through non-overlapping sampling.
• Visualizations summarize distributions and co-movements to aid interpretation. They are descriptive and do not replace inference.

4 Results and Discussion

4.1 Overview

This section reports in-sample associations between daily headline sentiment and returns for Bitcoin and Ethereum using FinBERT and CryptoBERT during 10 November 2021 to 12 August 2025. The evidence is horizon and asset specific.

Bitcoin. FinBERT sentiment shows a statistically significant contemporaneous association with same-day returns. Positive tone is associated with higher daily returns and negative tone with lower daily returns. Effects attenuate at longer horizons.
The t + 1 check does not yield a statistically reliable association with next-day returns. CryptoBERT produces qualitatively similar signs but weaker magnitudes in the same-day setting.

Ethereum. Associations are stronger over multi-day windows. FinBERT sentiment is positively associated with cumulative 7D and 30D returns, with limited or no same-day effect. CryptoBERT likewise shows little daily association but positive and statistically significant links over weekly and monthly horizons. This pattern indicates slower diffusion for Ethereum relative to Bitcoin under the same specifications.

Summary.

• Both assets move in the expected direction with respect to headline tone.
• Bitcoin exhibits the clearest same-day association that fades with horizon.
• Ethereum exhibits stronger cumulative associations at 7D and 30D and weaker same-day links.
• The limited t + 1 check does not support a reliable next-day association for either asset.

No out-of-sample prediction is claimed. Appendix Tables A1 through A4 report confirmatory specifications using non-overlapping seven-day and thirty-day samples, a simple market factor, a realized-volatility control, and HAC standard errors with a data-driven bandwidth. The appendix results are consistent with the main findings and support the interpretation of these patterns as confirmatory associations within sample.

4.2 FinBERT Analysis

4.2.1 Bitcoin Analysis

The descriptive statistics of the FinBERT sentiment variable and the financial metrics are summarized in Table 1. The histogram and kernel density of the sentiment values are illustrated in Figure 2.

Table 1: Descriptive statistics of the FinBERT sentiment variable

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| FinBERT sentiment (weighted) | 1,327 | −0.015 | 0.198 | −0.75 | 0.62 |
| Daily return (%) | 1,297 | 0.42 | 9.17 | −56.3 | 162.5 |
| Headlines count | 1,327 | 18.4 | 9.8 | 2 | 41 |

Figure 2: Distribution of FinBERT Sentiment

Table 1 reports summary statistics for the FinBERT daily sentiment index, Bitcoin daily returns, and headline counts. The sentiment index has mean −0.015, standard deviation 0.198, minimum −0.75, and maximum 0.62 across 1,327 observations. Bitcoin daily returns average 0.42 percent with standard deviation 9.17 percent across 1,297 observations. The headline count averages 18.4 per day.

Figure 2 shows the distribution of the FinBERT sentiment index for Bitcoin. The histogram with kernel density overlay indicates a concentration near zero with symmetric tails of moderate size. Most observations fall in a narrow band around neutrality, approximately between −0.2 and +0.2, with fewer days exhibiting strongly positive or negative tone. This pattern implies that the series captures day-to-day variation in tone without being dominated by extreme values.

Taken together, the table and figure suggest that the headline sentiment measure provides sufficient dispersion for regression analysis while avoiding pronounced skew toward either polarity. The presence of both positive and negative tails supports testing for associations with returns at the defined horizons, while the central mass near zero helps interpret estimated magnitudes as effects around typical news conditions. All subsequent inference remains confirmatory and in sample.

4.2.1.1 Contemporaneous Relationship Between Sentiment and Returns

Figure 3: BTC Daily Return vs FinBERT Sentiment

Table 2: Regression Results (Contemporaneous and Predictive Models)

| Model | Dependent Variable | Sentiment Coefficient | Std. Error | p-value | Significance |
| --- | --- | --- | --- | --- | --- |
| (1) Daily Return (ret1) | ret1 | 4.339 | 2.191 | 0.0479 | Significant |
| (2) 7-Day Return | ret7 | 3.04 | 5.401 | 0.5737 | Insignificant |
| (3) 30-Day Return | ret30 | 3.32 | 2.897 | 0.2519 | Insignificant |
| (4) Next-Day Return | F1.ret1 | 1.415 | 1.819 | 0.4368 | Insignificant |

H1 is supported for Bitcoin under the FinBERT specification. In the same-day model, the sentiment coefficient is positive and statistically significant after controlling for headline count and the baseline controls used throughout the thesis. The confirmatory specification in Appendix Table A1 preserves the sign and significance.

In Model (1) Daily Return, the estimated coefficient on FinBERT sentiment is +4.339 with p = 0.0479. Interpreted on the scale of the index, a one-unit increase in daily sentiment is associated with a 4.34 percentage point change in the daily return. Using the sample dispersion reported in Table 1 (standard deviation of sentiment = 0.198), a one standard deviation increase in sentiment is associated with roughly +0.86 percentage points in the daily return (4.339 × 0.198 ≈ 0.86). Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors.

Figure 3 is consistent with this result. The linear fit and the locally weighted smoother both display a mild positive slope. The cloud of points is wide, particularly around neutral sentiment, which indicates that most day-to-day return variation is driven by other factors. Within that noise, the regression identifies a small but statistically reliable positive association between headline tone and same-day Bitcoin returns.

4.2.1.2 Return Differences by Sentiment Category

Figure 4: BTC Daily Returns by Sentiment Category

For descriptive clarity, FinBERT daily sentiment was binned into three categories: Negative (< −0.2), Neutral (−0.2 to +0.2), and Positive (> +0.2). These bins are used only for visualization. All inference in the thesis relies on the continuous index.
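The binning rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the thesis: the function name `sentiment_bin`, the treatment of the boundary values −0.2 and +0.2 as Neutral, and the example scores are assumptions introduced here.

```python
from collections import Counter

# Thresholds follow the text: Negative (< -0.2), Neutral (-0.2 to +0.2), Positive (> +0.2).
LOWER, UPPER = -0.2, 0.2


def sentiment_bin(score: float) -> str:
    """Map a daily sentiment index value to a descriptive category."""
    if score < LOWER:
        return "Negative"
    if score > UPPER:
        return "Positive"
    # Assumption: the boundary values -0.2 and +0.2 fall in the Neutral bin.
    return "Neutral"


# Hypothetical daily index values for illustration only (not thesis data).
example_scores = [-0.45, -0.20, -0.05, 0.00, 0.12, 0.20, 0.37]
labels = [sentiment_bin(s) for s in example_scores]
counts = Counter(labels)

print(labels)
print(dict(counts))
```

Because the categories serve only the boxplots, a rule of this kind would not enter the regressions, which use the continuous index.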
The boxplots show a small upward shift in the median from the negative to the positive group. Positive days display somewhat higher central returns and a thicker upper tail, indicating a greater frequency of large positive moves when headline tone is positive. Downside tails are present in all categories, with wide dispersion around neutral days that reflects background market volatility. Interquartile ranges overlap across all groups, and each category contains extreme outliers. This pattern indicates that most day-to-day variation is driven by forces other than headline tone, while sentiment adds a modest directional component.

Link to H3. The greater tail thickness and dispersion observed on days with stronger absolute sentiment are consistent with H3, which posits higher short-term volatility on extreme sentiment days. This figure is descriptive and does not constitute a formal volatility test. The confirmatory assessment of H3 relies on the volatility specification defined in the methodology and the appendix. All statements remain in sample and strictly confirmatory.

4.2.1.3 Dynamic Co-movement of Returns and Sentiment

Figure 5: BTC Returns and Sentiment Over Time

Figure 5 overlays Bitcoin’s daily return (left axis) with the FinBERT daily sentiment index (right axis). Both series display high short-horizon variability and occasional spikes.

Key observations:

• Short bursts of positive sentiment often coincide with temporary upticks in daily returns.
• Periods with persistently negative sentiment align with broader volatility clusters and drawdowns.
• Beyond these short bursts, the two series do not move in lockstep. Co-movement is episodic and fades quickly.

Interpretation remains strictly confirmatory.
The visualization is consistent with fast incorporation of headline tone at very short horizons and limited persistence thereafter, which aligns with the regression evidence that same-day associations are present while multi-day effects are weaker for Bitcoin. The figure is descriptive and does not imply causality or tradable forecasting ability.

4.2.1.4 Limited forecast evaluation

A simple lead-lag check was estimated to see whether today’s sentiment is associated with next-day returns. The dependent variable is the next-day daily return. The key predictor is today’s FinBERT sentiment, with today’s return and headline count included as in the baseline specification. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors.

In the daily model, the coefficient on lagged sentiment is +1.415 with p = 0.4368. Parallel checks at the 7D and 30D horizons are likewise statistically insignificant. These results indicate that, within this sample and specification, the same-day association does not translate into a reliable next-day association.

Interpretation:

• Positive tone is associated with higher returns on the same day, but this effect does not persist to the next day.
• Negative tone is associated with lower returns on the same day, with no systematic continuation.

These findings are consistent with rapid incorporation of headline tone and limited persistence at the daily frequency. The analysis is confirmatory and does not claim out-of-sample predictive ability or a test of market efficiency.

4.2.1.5 Discussion of Findings

The regression and descriptive evidence for Bitcoin under the FinBERT specification support a clear but modest same-day association between headline tone and returns. The main points are:

1. H1 supported for Bitcoin.
The same-day coefficient on FinBERT sentiment is positive and statistically significant, which is consistent with higher returns on days with more positive headlines and lower returns on days with more negative headlines. The confirmatory specification in the appendix preserves this result.
2. Economic magnitude is small and short lived. The estimated slope implies changes that are minor relative to the asset’s intrinsic volatility. Associations weaken at 7D and 30D and the t + 1 check is not statistically reliable.
3. Visuals are consistent with the regressions. The scatterplot shows a mild positive slope and wide dispersion. The category boxplots display a small upward shift in the median for positive days and thicker upper tails, while interquartile ranges overlap across groups. The time series overlay suggests episodic alignment during short bursts and limited persistence thereafter.
4. Asymmetry is descriptive. Positive days more often coincide with larger upside moves, while negative days do not show comparable continuation. This pattern is consistent with attention-based mechanisms but is not taken as causal. It is used only to interpret the sign and horizon of estimated associations.
5. Scope of inference. Results are in sample and specific to Bitcoin with FinBERT sentiment. They do not imply forecasting ability, trading viability, or causal effects. Cross-asset conclusions are drawn only when Bitcoin and Ethereum are compared directly under the same specifications later in the chapter.

4.2.1.6 Summary of Empirical Evidence

Same-day association: Positive sentiment is associated with higher daily returns. The coefficient on the FinBERT index is +4.339 with p = 0.0479, which supports H1 for Bitcoin.
Direction under negative tone: The linear specification implies lower returns on negative sentiment days, consistent with the sign of the coefficient.
Multi-day horizons: Associations at 7D and 30D are not statistically significant, which indicates a short-term reaction rather than a persistent effect.
Limited t + 1 check: The next-day specification is not statistically significant, so the same-day association does not extend to the following day.

Descriptive corroboration

• Sentiment distribution: Approximately balanced around zero with moderate tails, indicating that the index captures a usable range of variation in tone.
• Figures: The scatterplot shows a mild upward slope, and the time series overlays exhibit episodic co-movement during short bursts. These visuals are descriptive and align with the regression results.

All statements are in sample and confirmatory. No causal interpretation or trading claim is made.

4.2.1.7 Concluding interpretation (FinBERT, Bitcoin)

The regression estimates and descriptive figures point to the same pattern. Bitcoin returns are modestly and positively associated with daily headline tone on the same day, with limited persistence beyond that horizon. The association weakens at 7D and 30D, and the t + 1 check is not statistically reliable.

Positive tone coincides with higher same-day returns and negative tone with lower same-day returns, but the effect is small relative to Bitcoin’s intrinsic volatility. The boxplots and time series overlays show episodic alignment between sentiment and returns together with wide dispersion, which indicates that most day-to-day variation is driven by other forces.

Visual asymmetries are descriptive. Positive days more often coincide with larger upside moves, while negative days do not show systematic continuation. These features are consistent with attention-based mechanisms and limits to arbitrage, but they are not taken as causal evidence.

Overall, FinBERT sentiment provides a useful summary of short-horizon conditions for Bitcoin within this sample. The findings are strictly confirmatory.
They document an in-sample same-day association and do not imply forecasting ability, trading viability, or proof of market efficiency.

4.3 Ethereum Analysis

This section reports the in-sample association between daily headline sentiment and Ethereum returns using FinBERT for the period 10 November 2021 to 12 August 2025. The analysis mirrors the Bitcoin workflow and remains strictly confirmatory. We evaluate clearly defined horizons at t = 0, 7D, and 30D, and include a limited t + 1 check for forecast evaluation only.

The evidence is organized as follows:

1. Distribution of the daily FinBERT sentiment index for Ethereum;
2. Contemporaneous association between sentiment and returns with supporting scatterplots;
3. Descriptive boxplots of returns by simple sentiment bins used only for visualization;
4. Time series overlays of returns and sentiment to illustrate episodic co-movement.

The objective is to determine whether Ethereum-related headline tone is associated with returns at the specified horizons and whether any association persists beyond the same day. All estimates use ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlap in multi-day windows is acknowledged, and appendix tables report the non-overlapping confirmatory specifications. No causal or trading claims are made.

4.3.1.1 Descriptive Analysis of FinBERT Sentiment

Table 3 summarizes the FinBERT daily sentiment index and key ETH variables. The sentiment index has mean 0.014, standard deviation 0.189, minimum −0.73, and maximum 0.68 across 1,202 observations. ETH daily returns average 0.27 percent with standard deviation 6.84 percent across 1,202 observations. The headline count averages 14.2 per day.

Table 3: Descriptive Statistics of FinBERT Sentiment (ETH)

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| FinBERT sentiment (expected) | 1,202 | 0.014 | 0.189 | −0.73 | 0.68 |
| Daily return (%) | 1,202 | 0.27 | 6.84 | −48.2 | 79.5 |
| Headlines count | 1,202 | 14.2 | 7.9 | 1 | 33 |

Figure 6: Distribution of FinBERT Sentiment (ETH)

Figure 6 shows the distribution of the FinBERT sentiment index for Ethereum. The histogram with kernel density overlay is centered near zero with moderate tails and a slight tilt to the right. Most observations lie within approximately −0.2 to +0.2, indicating that typical daily coverage is close to neutral, with somewhat more positive than negative days.

Together, the table and figure indicate that the sentiment measure provides sufficient dispersion for regression analysis without being dominated by extremes or by one polarity. The concentration near neutrality helps interpret coefficients as responses around typical news conditions, while the presence of both positive and negative tails supports testing for in-sample associations at the defined horizons. All claims remain strictly confirmatory.

4.3.1.2 Contemporaneous Relationship Between Sentiment and Returns

Table 4 reports the Newey-West regression results testing the association between FinBERT sentiment and Ethereum returns over various horizons.

Table 4: Regression Results - FinBERT Sentiment and ETH Returns

| Model | Dependent Variable | Coefficient | Std. Error | p-value | Significance |
| --- | --- | --- | --- | --- | --- |
| (1) Daily Return (ret1) | ret1 | 0.579 | 0.463 | 0.211 | Insignificant |
| (2) 7-Day Return | ret7 | 3.117 | 1.152 | 0.0069 | Significant (1%) |
| (3) 30-Day Return | ret30 | 8.732 | 2.337 | 0.0002 | Highly significant (0.1%) |
| (4) Next-Day Return | F1.ret1 | 0.03 | 0.409 | 0.942 | Insignificant |

Figure 7: ETH Daily Return vs FinBERT Sentiment

The estimates in Table 4 are read by horizon as follows.

• Same day (ret1). The coefficient on sentiment is positive but statistically insignificant (coef. = 0.579, p = 0.211).
Interpreted on the index scale, a one standard deviation increase in sentiment (0.189 from Table 3) corresponds to an estimated change of about +0.11 percentage points in the daily return, which is economically small and not statistically reliable in sample.
• Multi-day horizons. The coefficient is positive and significant for the 7D window (coef. = 3.117, p = 0.0069) and positive and highly significant for the 30D window (coef. = 8.732, p = 0.0002). A one standard deviation change in sentiment maps to roughly +0.59 percentage points for 7D and +1.65 percentage points for 30D. This pattern indicates that Ethereum exhibits stronger cumulative associations than same-day links under the same specification.
• Limited t + 1 check. The next-day model is not statistically significant (coef. = 0.030, p = 0.942), so today’s sentiment does not yield a reliable association with next-day returns.

Figure 7 is consistent with these results. The scatter shows an upward-sloping linear fit with substantial dispersion, which aligns with a weak same-day association and stronger cumulative effects documented in the 7D and 30D regressions.

Estimates use ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors. Overlap in multi-day windows is acknowledged; appendix tables report non-overlapping confirmatory specifications that preserve these conclusions. All statements are in sample and strictly confirmatory.

4.3.1.3 Return differences by sentiment category (FinBERT, Ethereum)

Figure 8: ETH Daily Returns by FinBERT Sentiment Category

For descriptive clarity, daily FinBERT sentiment was binned into three categories that are used only for visualization: Negative (< −0.2), Neutral (−0.2 to +0.2), and Positive (> +0.2). Figure 8 shows a small increase in the median daily return from the negative group to the positive group.
Positive days display slightly higher central returns and a thicker upper tail, indicating a greater frequency of large positive moves when headline tone is positive. Downside tails are present in all groups.

Interquartile ranges overlap across the three categories, which implies that same-day effects are economically small relative to Ethereum’s background volatility. The wider tails on stronger sentiment days are consistent with H3, which posits higher short-term volatility when absolute sentiment is large.

These patterns are descriptive and align with the regression results that show weak same-day links and stronger cumulative associations at 7D and 30D. All statements are in sample and confirmatory. No causal or trading claims are made.

4.3.1.4 Dynamic co-movement of returns and sentiment (FinBERT, Ethereum)

Figure 9: ETH Returns and FinBERT Sentiment Over Time

Figure 9 overlays Ethereum’s daily return (left axis) with the FinBERT daily sentiment index (right axis). Both series are volatile and display frequent short bursts.

Observations:

• Short positive sentiment bursts often coincide with temporary upticks in daily returns.
• Sequences of neutral or negative sentiment align with broader drawdown periods and volatility clusters.
• Beyond these brief episodes, the two series do not remain synchronized. Co-movement fades quickly.

Interpretation is strictly confirmatory. The figure is descriptive and consistent with the regression results: limited same-day association and stronger cumulative links at 7D and 30D for Ethereum, with no reliable next-day association. The pattern aligns with rapid incorporation of headline tone rather than persistent effects, without implying causality or out-of-sample predictability.
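The distinction between overlapping and non-overlapping multi-day windows, which underlies the 7D and 30D results, can be illustrated with a short Python sketch. It is not the thesis code: the function names, the forward-window convention (days t through t + h − 1), the simple summation of daily returns, and the toy return series are all assumptions made for illustration. A 3-day horizon stands in for 7D and 30D to keep the example small.

```python
from typing import List


def forward_cumulative(returns: List[float], horizon: int) -> List[float]:
    """Cumulative return over days t .. t+horizon-1 for every feasible start day t.

    Assumption: returns are additive (log-style), so a window sum is used;
    simple returns would need compounding instead.
    """
    n = len(returns)
    return [sum(returns[t:t + horizon]) for t in range(n - horizon + 1)]


def non_overlapping(windows: List[float], horizon: int) -> List[float]:
    """Keep every horizon-th window so that retained windows share no days."""
    return windows[::horizon]


# Hypothetical daily returns in percent (not thesis data).
daily = [1.0, -2.0, 0.5, 1.5, -1.0, 2.0, 0.0, -0.5]
h = 3

overlapping = forward_cumulative(daily, h)   # one window per start day
disjoint = non_overlapping(overlapping, h)   # windows starting on days 0, 3, ...

print(overlapping)
print(disjoint)
```

Adjacent overlapping windows share h − 1 daily observations, so the dependent variable is serially correlated by construction. This is why the baseline relies on HAC standard errors and the appendix re-estimates the models on non-overlapping samples.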
4.3.1.5 Limited forecast evaluation (FinBERT, Ethereum)

To assess whether today’s sentiment is associated with next-day returns, a simple lead-lag specification was estimated with next-day return as the dependent variable and today’s FinBERT sentiment as the key predictor, alongside the baseline controls.

The estimated coefficient is 0.030 with p = 0.942, which is not statistically significant. Parallel checks do not alter this conclusion. Within this sample and specification, today’s sentiment shows no reliable association with next-day returns.

Interpretation remains confirmatory. No out-of-sample predictability or trading implication is claimed.

4.3.1.6 Discussion of findings (FinBERT, Ethereum)

The Ethereum results align with the confirmatory pattern documented for Bitcoin but with a different horizon profile.

• Direction and horizon. The same-day association is weak and statistically insignificant, while cumulative associations at 7D and 30D are positive and statistically significant. This indicates that headline tone for Ethereum relates more to multi-day returns than to same-day moves under the same specification.
• Magnitude and persistence. Estimated multi-day effects are modest in size but more pronounced than the daily link. This suggests gradual diffusion of headline information into Ethereum prices at weekly and monthly horizons.
• Descriptive consistency. The boxplots show a small increase in medians from negative to positive days with thicker upper tails under positive tone, and the time series overlay displays episodic co-movement that fades quickly. These figures are descriptive and consistent with the regression evidence.
• Scope of inference. Results are in sample and framed as associations. They do not imply causality, forecasting ability, or trading viability.

Overall, FinBERT sentiment provides a useful summary of short-horizon news conditions for Ethereum, with the clearest statistical associations appearing at 7D and 30D.
This horizon dependence is considered again in the cross-asset comparison later in the chapter.

4.3.1.7 Summary of empirical evidence (FinBERT, Ethereum)

Same-day association: Directionally positive but statistically insignificant (coef. ≈ 0.58, p = 0.21).
7D association: Positive and statistically significant (coef. ≈ 3.12, p = 0.0069).
30D association: Positive and statistically significant with larger magnitude (coef. ≈ 8.73, p = 0.0002).
Limited t + 1 check: Not statistically significant (coef. ≈ 0.03, p = 0.94).

Descriptive corroboration

• Sentiment distribution: Centered near zero with a mild right skew, indicating slightly more positive than negative days while retaining balance for inference.
• Figures: The scatterplot shows a gentle upward slope with wide dispersion, and the time series overlay exhibits short bursts of co-movement. These visuals are descriptive and consistent with the regression results.

H1 is supported for Ethereum under the FinBERT specification at multi-day horizons. Positive daily sentiment is significantly associated with higher seven-day and thirty-day returns, and the confirmatory non-overlapping models in Appendix Table A2 preserve these results. The same-day association is directionally positive but not statistically significant.

4.3.1.8 Concluding Interpretation

FinBERT sentiment is not reliably associated with same-day Ethereum returns, yet it is positively and significantly associated with cumulative returns at 7D and 30D. This pattern points to gradual incorporation of headline tone into prices at multi-day horizons under the specifications used in this thesis. The limited t + 1 check is not statistically significant.

Descriptive figures align with this evidence. Positive-tone days show slightly higher medians and thicker upper tails, while interquartile ranges overlap across sentiment bins. On days with stronger absolute sentiment, return dispersion is wider, which is consistent with H3.
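The per-standard-deviation magnitudes quoted for Ethereum follow from multiplying each regression coefficient by the sample standard deviation of the sentiment index. The short Python sketch below reproduces that arithmetic from the reported coefficients and standard deviation; the function and variable names are illustrative assumptions and not part of the thesis code.

```python
def per_sd_effect(coefficient: float, sentiment_sd: float) -> float:
    """Change in return (percentage points) for a one-SD increase in sentiment."""
    return coefficient * sentiment_sd


# Values reported in the Ethereum FinBERT tables of this chapter.
ETH_SENTIMENT_SD = 0.189
eth_coefficients = {"ret1": 0.579, "ret7": 3.117, "ret30": 8.732}

eth_effects = {
    horizon: round(per_sd_effect(coef, ETH_SENTIMENT_SD), 2)
    for horizon, coef in eth_coefficients.items()
}

for horizon, effect in eth_effects.items():
    print(f"{horizon}: {effect:+.2f} percentage points per one SD of sentiment")
```

The same scaling applied to the Bitcoin same-day estimate (4.339 × 0.198) gives roughly +0.86 percentage points, matching the figure reported in the Bitcoin section.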
Overall, FinBERT sentiment provides a useful summary of short-horizon news conditions for Ethereum, with the clearest statistical associations emerging over weekly and monthly windows. The findings are strictly confirmatory and in sample. They do not imply causality, forecasting ability, or trading viability.

4.4 CryptoBERT Results

4.4.1 Bitcoin analysis (CryptoBERT)

This section examines in-sample associations between daily headline sentiment from CryptoBERT and Bitcoin returns for 10 November 2021 to 12 August 2025. The analysis mirrors the FinBERT workflow and remains strictly confirmatory. We evaluate horizons at t = 0, 7D, and 30D, and include a limited t + 1 check for forecast evaluation only. Estimation uses ordinary least squares with Newey-West heteroskedasticity and autocorrelation consistent standard errors, with overlap in multi-day windows acknowledged and addressed in the appendix.

Evidence is presented from four perspectives:

• Distribution of the CryptoBERT daily sentiment index for Bitcoin;
• Contemporaneous association between sentiment and returns, supported by scatterplots;
• Descriptive return differences by simple sentiment categories used only for visualization;
• Time series overlays that illustrate episodic co-movement of sentiment and returns.

The objective is to determine whether CryptoBERT headline tone for Bitcoin is associated with returns at the specified horizons and how any association compares with the FinBERT results. No causal interpretation, trading implication, or out-of-sample prediction is claimed.

4.4.1.1 Descriptive analysis of CryptoBERT sentiment (BTC)

Table 5 summarizes the descriptive statistics of the CryptoBERT sentiment and return variables, and Figure 10 illustrates the sentiment distribution.

Table 5: Descriptive Statistics of CryptoBERT Sentiment (BTC)

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| CryptoBERT Sentiment (Expected) | 1,335 | 0.017 | 0.187 | −0.52 | 0.61 |
| Daily Return (%) | 1,335 | 0.23 | 5.47 | −19.8 | 17.6 |
| Headlines Count | 1,335 | 12.6 | 6.9 | 1 | 29 |

Figure 10: Distribution of CryptoBERT Sentiment (BTC)

Table 5 summarizes the CryptoBERT daily sentiment index and key Bitcoin variables. The sentiment index has mean 0.017, standard deviation 0.187, minimum −0.52, and maximum 0.61 across 1,335 observations. Bitcoin daily returns average 0.23 percent with standard deviation 5.47 percent across 1,335 observations. The headline count averages 12.6 per day.

Figure 10 shows the distribution of the CryptoBERT sentiment index for Bitcoin. The histogram with kernel density overlay is centered near zero with moderate tails. Most daily values fall between −0.2 and +0.2, indicating that typical coverage is close to neutral, with both positive and negative days present.

Together, the table and figure indicate that the CryptoBERT sentiment series provides sufficient dispersion for regression analysis while avoiding dominance by extreme values or by one polarity. The concentration near zero helps interpret coefficient magnitudes as responses around typical news conditions. All statements are descriptive and support the confirmatory analyses that follow.

4.4.1.2 Contemporaneous association between sentiment and returns (CryptoBERT, Bitcoin)

Table 6 reports Newey-West (HAC) estimates assessing the association between sentiment and returns over daily, weekly, and monthly horizons.

Table 6: Regression Results CryptoBERT Sentiment and BTC Returns

| Model | Dependent Variable | Coefficient | Std. Error | p-value | Significance |
| --- | --- | --- | --- | --- | --- |
| (1) Daily Return (ret1) | ret1 | 1.192 | 0.648 | 0.066 | Marginal (10%) |
| (2) 7-Day Return | ret7 | 0.251 | 2.545 | 0.921 | Insignificant |
| (3) 30-Day Return | ret30 | 0.651 | 6.412 | 0.919 | Insignificant |
| (4) Next-Day Return | F1.ret1 | −0.453 | 0.77 | 0.556 | Insignificant |

Figure 11: BTC Daily Return vs CryptoBERT Sentiment

Table 6 reports Newey-West regressions of Bitcoin returns on the CryptoBERT daily sentiment index.

Same day (ret1). The coefficient on sentiment is positive and marginal in statistical terms (coef. = 1.192, p = 0.066). Magnitude is small relative to the volatility of daily returns.

Multi-day horizons. Coefficients at 7D and 30D are not statistically significant (p = 0.921 and p = 0.919), indicating no reliable cumulative association under this specification.

Limited t + 1 check. The next-day model is not statistically significant (coef. = −0.453, p = 0.556).

Figure 11 is consistent with these estimates. The scatter cloud is centered near zero and highly dispersed. The fitted line has a slight positive slope, which matches the weak, directionally positive same-day coefficient, but dispersion dominates.

Overall, CryptoBERT sentiment for Bitcoin shows at most a very small same-day association and no robust links at weekly or monthly horizons in sample. These results are strictly confirmatory and do not imply causality, forecasting ability, or trading viability.

4.4.1.3 Return differences by sentiment category (CryptoBERT, Bitcoin)

Figure 12: BTC Daily Returns by CryptoBERT Sentiment Bucket

For descriptive clarity, daily CryptoBERT sentiment was grouped into three categories used only for visualization: Negative (< −0.2), Neutral (−0.2 to +0.2), and Positive (> +0.2). Figure 12 shows a slight increase in the median daily return from negative to positive categories. Positive days display somewhat thicker upper tails, indicating that larger upside moves occur more frequently when headline tone is positive.
At the same time, interquartile ranges overlap across all three groups, which is consistent with the regression evidence that same-day effects are small relative to background volatility.

These patterns are descriptive. They align with a modest, directionally positive same-day association under CryptoBERT and the absence of reliable multi-day links. All statements are in sample and confirmatory. No causal or trading claims are made.

4.4.1.4 Dynamic co-movement of returns and sentiment (CryptoBERT, Bitcoin)

Figure 13: BTC Returns and CryptoBERT Sentiment Over Time

Figure 13 overlays Bitcoin’s daily return (left axis) with the CryptoBERT daily sentiment index (right axis). Both series are volatile and display frequent short bursts without a persistent trend.

Observations:

• Short positive sentiment bursts sometimes coincide with temporary upticks in daily returns.
• Negative sentiment episodes align with occasional drawdowns, but alignment is episodic rather than sustained.
• Beyond these brief intervals, co-movement fades quickly and the series do not move in lockstep.

Interpretation is strictly confirmatory. The figure is descriptive and consistent with the regression evidence: at most a small same-day association under CryptoBERT and no reliable links at weekly, monthly, or next-day horizons. The pattern is consistent with rapid incorporation of headline tone at very short horizons, without implying causality, forecasting ability, or market efficiency claims.

4.4.1.5 Limited forecast evaluation

A lead-lag regression was estimated with next-day Bitcoin return as the dependent variable and today’s CryptoBERT sentiment as the key predictor, together with the baseline controls.

The estimated coefficient is −0.453 with p = 0.556, which is statistically insignificant. This indicates that, within this sample and specification, today’s CryptoBERT sentiment does not predict tomorrow’s Bitcoin return. The reaction to sentiment is therefore interp