Matti Viherkoski

Investigating Financial Drivers of ESG Scores: An Interpretable Machine Learning Approach

Vaasa 2025
School of Finance and Accounting
Master's Degree in Finance

UNIVERSITY OF VAASA
School of Finance and Accounting
Author: Matti Viherkoski
Title of the thesis: Investigating Financial Drivers of ESG Scores: An Interpretable Machine Learning Approach
Degree: Master of Science in Economics and Business Administration
Degree Programme: Master's Programme in Finance
Supervisor: Timo Rothovius
Year: 2025
Pages: 115

ABSTRACT:
Growing interest in sustainable finance has increased the demand for transparent and reproducible assessments of corporate sustainability. The study is grounded in the idea that ESG ratings reflect both sustainability-related practices and, potentially, the economic capacity to disclose and implement them. This thesis examines the extent to which financial information explains variation in ESG ratings. The objective is to assess how far ESG outcomes are predictable from firm-level characteristics, to identify the financial factors most consistently associated with them, and to analyse the underlying structure of these relationships.

The empirical analysis was based on data on firms included in the STOXX Europe 600 index for the period 2014–2023, obtained from the London Stock Exchange Group (LSEG) database. The dataset contained firm-level annual observations, and the dependent variables consisted of the overall ESG score and its environmental, social, and governance pillars. The independent variables comprised profitability, leverage, liquidity, efficiency, and valuation ratios, together with firm size, industry, and year identifiers.

The methods combined supervised machine learning with model-agnostic interpretability. Model evaluation relied on standard regression metrics and explainable artificial intelligence methods.
The results indicated that firm-level financial characteristics explain a substantial portion of the cross-sectional variation in ESG assessments. Nonlinear models outperformed linear alternatives, demonstrating that the relationships between financial and sustainability indicators are complex and potentially interactive. The analysis highlighted firm size, operational efficiency, and capital structure as key predictors of higher ESG scores, whereas high profitability margins and liquidity were not systematically associated with higher assessed sustainability.

The findings suggest that financial structure influences measured sustainability, implying that ESG scores can partly reflect underlying economic fundamentals in addition to non-financial performance. The study showed that interpretable machine learning offers a practical framework for understanding these linkages, but also that financial data alone cannot fully account for the multidimensional nature of sustainability. Future research is encouraged to integrate non-financial and textual data, apply longitudinal designs, and examine regulatory developments to better capture the dynamic relationship between corporate finance and sustainability outcomes.

KEYWORDS: machine learning, modelling, responsible investing, sustainability reporting, corporate responsibility

UNIVERSITY OF VAASA (Vaasan yliopisto)
School of Finance and Accounting
Author: Matti Viherkoski
Title of the thesis: Investigating Financial Drivers of ESG Scores: An Interpretable Machine Learning Approach
Degree: Master of Science in Economics and Business Administration
Major: Finance
Supervisor: Timo Rothovius
Year: 2025
Pages: 115

TIIVISTELMÄ (Finnish abstract, in English translation):
The growing importance of sustainable finance has increased the need for transparent and reproducible ways of assessing corporate responsibility.
The study is based on the assumption that ESG ratings reflect both sustainability practices and, potentially, the financial capacity required to report on and implement them. The objective of the study was to assess how far ESG outcomes can be predicted from firm-level financial indicators, to identify which financial metrics are most strongly linked to them, and to analyse the structural nature of these relationships.

The empirical analysis was based on a dataset of firms in the STOXX Europe 600 index for the years 2014–2023, obtained from the London Stock Exchange Group database. The data consisted of annual firm-level observations; the dependent variables were the firms' sustainability scores, and the explanatory variables were ratios describing profitability, leverage, liquidity, efficiency, and valuation, together with firm size, industry classification, and a year identifier.

The methods combined supervised machine learning with model-agnostic interpretability. Model evaluation was based on standard regression metrics and explainable artificial intelligence methods.

The results showed that firm-level financial characteristics explain a significant share of the cross-sectional variation in ESG assessments. Nonlinear models were found to outperform linear alternatives, indicating that the relationships between financial and sustainability indicators are complex and potentially interactive. The analysis highlighted firm size, operational efficiency, and capital structure as key predictors of ESG scores, whereas high profitability margins and liquidity were not systematically associated with a higher assessed level of sustainability.

The results indicated that financial structure affects measured sustainability, suggesting that ESG scores partly reflect underlying economic fundamentals in addition to non-financial information.
The study showed that interpretable machine learning offers a practical framework for understanding these linkages, but financial information alone is not sufficient to explain the nature of sustainability. For future research, the integration of non-financial and textual data, the use of longitudinal designs, and the examination of regulatory developments were recommended, so that the dynamic relationship between corporate finance and sustainability outcomes can be described more precisely.

KEYWORDS: machine learning, modelling, responsible investing, sustainability reporting, corporate responsibility

Contents

1 Introduction 9
1.1 Purpose and motivation 10
1.2 Research hypotheses 11
1.3 Structure of the study 11
2 Literature review 13
2.1 Corporate Sustainability and ESG Scores 13
2.1.1 Defining Corporate Sustainability 13
2.1.2 Importance of Sustainability for Companies 13
2.1.3 Measuring Sustainability: ESG Scores and Other Metrics 14
2.2 Predictive Modelling of ESG Scores 16
2.2.1 Linear and Regularized Regression Models 17
2.2.2 Tree-Based Ensemble Models (Random Forests and Boosting) 19
2.2.3 Deep Learning Models (Neural Networks) 22
3 Data and Data Processing 26
3.1 Data Integrity and Initial Screening 29
3.2 Currency Standardization 30
3.3 Descriptive Statistics of the Processed Dataset 30
3.3.1 Overview of ESG Scores 31
3.3.2 Overview of Financial Variables 31
3.3.3 Interpretation and Relevance for Modelling 32
3.4 Distribution Across Time 32
3.5 Industry Classification 34
3.6 Correlogram 37
3.7 Density Functions of ESG Scores 39
3.8 Missing Data Assessment and Handling 40
3.8.1 Extent and Distribution of Missingness 41
3.8.2 Mechanisms of Missing Data 42
3.8.3 Implications for Data Integrity and Modelling 42
3.8.4 Treatment Principles 43
3.9 Pre-Modelling Procedures and Pipeline Design 43
3.10 Winsorisation 45
4 Methodology 47
4.1 Machine Learning Models – general information on models used 47
4.1.1 Random Forest 47
4.1.2 XGBoost 48
4.1.3 RIDGE and LASSO Regression 48
4.2 Trained Models 49
4.2.1 XGB1 49
4.2.2 XGB2 51
4.2.3 XGB3 52
4.2.4 RF1 52
4.2.5 RF2 52
4.2.6 RF3 54
4.2.7 RIDGE 55
4.2.8 LASSO 55
5 Results 56
5.1 Observed and Residuals vs. Predicted ESG Score 58
5.1.1 Observed vs. predicted plot 58
5.1.2 Residuals vs. predicted plot 59
5.1.3 Observed vs. Predicted ESG Score 59
5.1.4 Residuals vs. Predicted ESG Score 61
5.2 Feature Importances 62
5.2.1 Permutation Importances 62
5.2.2 Feature Importances by Weight, Cover and Gain 64
5.2.2.1 Feature Importances by Weight 65
5.2.2.2 Feature Importances by Cover 67
5.2.2.3 Feature Importances by Gain 69
5.2.2.4 Cross-metric feature comparison 71
5.3 SHAP Metrics 73
5.3.1 Directional SHAP Feature Importance Metrics 73
5.3.2 SHAP Summary Dot Plot 76
5.4 Cross-validated Partial Dependence Plots 79
5.5 SHAP Dependence plots and 2D partial dependence 83
5.6 Further Analysis of Negative Directions 88
6 Discussion 91
6.1 Findings and Previous Literature 91
6.2 Limitations 95
6.2.1 Comparisons to Previous Literature 95
6.2.2 Data and construct validity 96
6.2.3 Study Design 97
6.2.4 Feature Score and Omitted Variables 97
6.2.5 Technical Constraints 97
6.2.6 Interpretability Caveats 98
6.2.7 External Validity and Regulatory Shifts 98
6.3 Suggestions for Future Research 98
Conclusion 101
References 104
Appendices 108
Appendix 1. ESG Report/Rating Summary Table by Huber et al. (2017) 108
Appendix 2. Results from other trained models 110

Figures

Figure 1. Mean ESG Score by Year and Industry. 36
Figure 2. Standard deviation of ESG Score by Year and Industry. 36
Figure 3. Correlogram. 38
Figure 4. Density functions of ESG score, E, S and G. 39
Figure 5. XGB1 Observed vs predicted ESG score and residuals. 59
Figure 6. Permutation Importances (XGB1). 63
Figure 7. Feature Importance by Weight (XGB1). 66
Figure 8. Feature Importance by Cover (XGB1). 68
Figure 9. Feature Importance by Gain (XGB1). 70
Figure 10. Directional SHAP feature importance Metric Bar (XGB1). 74
Figure 11. SHAP Feature Importance Summary Dot (XGB1). 77
Figure 12. XGB1 Cross-Validated PDPs for SIZE and TD/TA against ESG Score. 79
Figure 13. XGB1 Cross-Validated PDPs for NS/TA and EBIT/NS against ESG Score. 80
Figure 14. XGB1 Cross-Validated PDPs for DIV Y and ROE against ESG Score. 81
Figure 15. XGB1 Cross-Validated PDPs for ROA against ESG Score. 82
Figure 16. SHAP Plot and 2D Dependence for NS/TA and SIZE. 84
Figure 17. SHAP Plot and 2D Dependence for DIV Y and SIZE. 85
Figure 18. SHAP Plot and 2D Dependence for TD/TA and SIZE. 86
Figure 19. SHAP Plot and 2D Dependence for EBIT/NS and SIZE. 86
Figure 20. SHAP Plot and 2D Dependence for NS/TA and TD/TA. 87
Figure 21. RF2 Observed vs predicted and residuals. 110
Figure 22. RIDGE Observed vs predicted and residuals. 110
Figure 23. LASSO Observed vs predicted and residuals. 111
Figure 24. RF2 Intrinsic Feature Importances. 111
Figure 25. RF2 Permutation Importances. 112
Figure 26. RF2 SHAP Feature Importances bar. 112
Figure 27. RF2 SHAP Feature Importances plot. 113
Figure 28. RF2 SHAP Directional metrics. 113
Figure 29. RF2 Cross validated PDPs for ASSETS and TD/TA. 114
Figure 30. RF2 Cross validated PDPs for NS/TA and EBIT/NS. 114
Figure 31. RF2 Cross validated PDPs for DIV Y and ROE. 114
Figure 32. RF2 Cross validated PDPs for P/E and ROA. 115

Tables

Table 1. Summary of key studies. 25
Table 2. Summary Statistics table. 30
Table 3. Main statistics of the ESG, E, S and G score distributions by year for the sample of 600 companies listed in the STOXX Europe 600 Index. 33
Table 4. General Industry Classification explanation. 34
Table 5. Main statistics of the ESG, E, S, and G score distributions by industry sector for the sample of 600 companies listed in the STOXX Europe 600 Index. 35
Table 6. Density functions of ESG score, E, S and G. 40
Table 7. Missing values per industry table. 41
Table 8. Descriptions of XGBoost hyperparameters (XGBoost Developers, 2024). 51
Table 9. Descriptions of RandomForest parameters (scikit-learn developers, 2024). 53
Table 10. Model performance comparison. 56
Table 11. ESG Report/Rating Summary Table by Huber et al. (2017). 108

1 Introduction

The current investment landscape has been undergoing changes propelled by the growing demand for and representation of Socially Responsible Investing (SRI) (D'Amato et al., 2022). Alongside financial characteristics, SRI considers companies' ethical, social, and environmental values, aiming to generate financial returns while pursuing positive sustainability outcomes for stakeholders. Arguably the most important characteristics and metrics for socially responsible investors are Environmental, Social and Corporate Governance (ESG) characteristics. Large financial data and rating institutions play a pivotal role in providing market participants with benchmarks to guide their investment decision-making processes (D'Amato et al., 2022). These institutions provide ESG ratings for companies, giving socially responsible investors a quantifiable tool to compare and analyze the sustainability of companies. However, the accuracy and reliability of these ratings remain subjects of scrutiny. While ESG ratings are gaining traction, the accuracy of existing scores continues to be widely questioned, creating a need for further research and refinement of methodologies (Chowdhury et al., 2023).

An obstacle for investors and policymakers is the inability to accurately evaluate the reliability of the aggregation process used to determine ESG scores. This challenge stems from the lack of transparency in the rating system. Rating agencies generate ESG scores using proprietary models, and the information available to the public is often limited to what the agency chooses to disclose. In many cases, this disclosure is restricted to the fundamental principles of the methodology, which varies between agencies.
Consequently, from the perspective of outside stakeholders, the algorithms used by rating agencies can be considered black-box models whose inner workings are obscure (Del Vitto et al., 2023).

Furthermore, although several papers have found nonlinear relationships between sustainability metrics and their constituent indicators, Berg et al. (2022) discovered that six major ESG ratings are constructed using linear models. These ratings rely on ad hoc weighted averages, meaning that the model weights assigned by the rater are assumed to accurately reflect the relative importance of different ESG aspects. However, this approach overlooks the nuanced nature of ESG factors and may not fully capture their actual significance in sustainability assessment. Referring to the findings of Berg et al. (2022), Svanberg et al. (2022) argue that because complex concepts such as ESG are unlikely to have purely linear relationships with the features constituting ESG indicators, ESG ratings are unlikely to represent the actual degree of corporate sustainability.

Further relating to the issue of whether ESG ratings accurately represent the degree of sustainability, Billio et al. (2021) find that raters' disagreement on the characteristics, and on the weights, that define the components of ESG leads to varying sustainability assessments among rating agencies, and thus disperses the effect of sustainable investors' preferences on asset prices.

1.1 Purpose and motivation

Against this backdrop, the purpose of this thesis is to contribute to the evolving literature on sustainable finance and ESG investing by applying machine learning techniques to predict ESG scores from financial statement items and by analyzing the information learned by the models, in order to investigate the potential relationships between them.
The results of investigating these relationships can improve our understanding of how and which financial characteristics affect the ESG scores of companies, and of the inner workings of the "black-box" models or methods used by rating agencies to score companies' sustainability. The paper investigates whether, and to what extent, a company's ESG performance can be predicted using traditional financial statement items, and examines which financial features are most important in explaining ESG scores using Explainable AI tools, which are methods for interpreting machine learning models.

In terms of data and methodology, this thesis is most similar within the previous literature to the work of Chowdhury et al. (2023) and D'Amato et al. (2021, 2022), but it uses a combination of variables that those studies previously identified as significant, and a larger dataset with more recent observations than, for example, D'Amato et al. (2021), covering the years of the Covid-19 pandemic. In recent years the importance of sustainability has also kept rising, and LSEG's ESG score methodology may have quietly changed, as the more specific methodology is not disclosed.

1.2 Research hypotheses

This thesis hypothesises that the Random Forest and XGBoost machine learning methods can achieve a notable reduction in prediction variability, relative to a standard mean prediction, using only financial data at the firm-year level of observation, and that the information learned by the models can provide further insight into how ESG scores are affected by different financial characteristics. If this hypothesis holds, it raises questions about the nature of sustainability scores: To what degree do these scores accurately capture the sustainability of companies, considering that ESG ratings struggle with the problem of model transparency, and that the real sustainability effects of companies should, at least in theory, be rather independent of purely financial metrics?
To what extent is it appropriate to compare the ESG scores of different companies as a proxy for real sustainability on a continuous 0–100 scale, rather than on some form of weighted or adjusted scale, if companies with certain financially homogeneous metrics are consistently placed in different ESG score quantiles than others?

1.3 Structure of the study

In summary, in this thesis I aim to use supervised machine learning algorithms to predict ESG scores based on financial statement ratios and to analyse the patterns learned by the models. By doing so, the study seeks to examine whether publicly available financial data can approximate ESG ratings and to what extent ESG assessments may be driven by quantifiable financial indicators.

The thesis is organized as follows: Chapter 2 provides a review of the existing literature on sustainability, its impact on companies and how it can be measured, ESG ratings, and predictive modelling approaches. Chapter 3 outlines the data used in the research, the data processing and pre-processing steps, and the imputation methods. Chapter 4 outlines the methodology, covers general information on the machine learning techniques used, and provides further model-specific information on the model data and calibration. Chapter 5 presents the results of the models, including their performance, together with further explainable AI analysis of the patterns the best-performing models learned from the training dataset. Chapter 6 discusses the findings in light of previous studies, evaluates the implications of the model results and their interpretability, discusses the limitations of the study, and suggests directions for future research. Finally, the last chapter concludes the study.

2 Literature review

This chapter introduces key concepts and findings on the topic from the literature. Corporate sustainability is first defined and its importance briefly introduced, together with common metrics for measuring sustainability. The chapter continues by reviewing previous literature on the predictive modelling of ESG scores and concludes with a summary table of the key studies referred to in this chapter.

2.1 Corporate Sustainability and ESG Scores

The next subchapters define corporate sustainability and its importance for firms, and outline the ways companies present it.

2.1.1 Defining Corporate Sustainability

Corporate sustainability refers to a company's ability to conduct business in a way that is environmentally sound, socially responsible, maintains transparent governance principles, and sustains long-term economic viability (Ahmad et al., 2024). In practice, this means integrating ecological integrity, social welfare, and good governance into corporate strategies while continuing to create value for shareholders. This concept aligns with the "triple bottom line" of people, planet, and profit, emphasizing that sustainable firms balance financial performance with social and environmental stewardship. According to the OECD, corporate sustainability entails embedding environmental and social considerations into core business operations and strategy.

2.1.2 Importance of Sustainability for Companies

Companies are increasingly recognizing that strong sustainability practices can confer significant benefits. One key driver is investor demand: a growing share of global investments now incorporates Environmental, Social, and Governance (ESG) factors. As of the late 2010s, roughly $30 trillion in assets were managed using ESG criteria (D'Amato et al., 2021). Investors increasingly view sustainability as linked to long-term financial performance and effective risk management, prompting firms to improve ESG performance to attract capital.
Research further suggests that companies with strong sustainability profiles may be more resilient during periods of crisis. For example, firms with high ESG ratings experienced better stock return performance during the 2008 financial crisis (Lins et al., 2017; D'Amato et al., 2021).

Beyond investor considerations, Pilz (2024) suggests that sustainability also offers reputational and operational advantages. Embracing ESG can enhance a company's brand and consumer trust, while also helping identify risks and opportunities within operations (Pilz, 2024).

Pilz (2024) further suggests that sustainability initiatives can lead to cost savings, such as improved energy efficiency, and can spur innovation. Integrating ESG considerations also enables more informed decision-making and strengthens relationships with stakeholders. Surveys of executives consistently show that sustainability is no longer seen as a niche issue, but as essential for long-term success and risk mitigation (Pilz, 2024). In summary, companies should care about sustainability not only for ethical reasons but also because it aligns with financial prudence, stakeholder expectations, and evolving regulatory trends in today's business environment.

2.1.3 Measuring Sustainability: ESG Scores and Other Metrics

Corporate sustainability is commonly assessed using standardized measurement frameworks, with ESG scores standing out as among the most widely adopted. ESG, an acronym for Environmental, Social, and Governance, represents three dimensions used to evaluate a firm's sustainability-related performance (Del Vitto et al., 2023). An ESG score serves as a summary indicator that reflects how effectively a company manages its risks and externalities across these three domains.
The environmental dimension typically includes metrics such as greenhouse gas emissions, resource consumption, and waste management; the social dimension covers issues such as labour practices, community relations, and product safety; and the governance dimension focuses on board structure, ethical conduct, and transparency (Del Vitto et al., 2023).

These scores are generally produced by independent ESG rating agencies or data providers, which assess company disclosures, news sources, and other information to benchmark sustainability performance relative to industry peers. Leading providers include MSCI ESG Ratings, Sustainalytics, S&P Global (CSA/DJSI), and Refinitiv (formerly Thomson Reuters/Asset4), each applying distinct methodologies. For example, Refinitiv's ESG framework evaluates over 12,000 companies and assigns percentile-based scores ranging from 0 (lowest) to 100 (highest), based on industry-relative performance (Del Vitto et al., 2023). The expansion of ESG scoring systems reflects growing demand for quantifiable sustainability metrics, and they have become a key tool for investors to quickly evaluate a company's sustainability profile (Del Vitto et al., 2023).

It is important to recognize that ESG scores can vary substantially across rating providers due to differences in data sources, weighting schemes, and evaluation methodologies. Berg et al. (2022) documented significant divergence among the ESG scores assigned by six leading rating agencies, underscoring the lack of standardization in sustainability assessment practices. Nevertheless, ESG ratings remain widely used as a proxy for corporate sustainability performance in academic research and investment practice (D'Amato et al., 2021). While these composite scores offer a convenient summary measure, firms often supplement them with more granular sustainability metrics for internal tracking and disclosure purposes.
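To make the idea of percentile-based, industry-relative scoring concrete, the following Python sketch ranks firms against their industry peers on a 0–100 scale. The function name, the mid-rank formula, and the toy inputs are assumptions for exposition; actual providers such as LSEG layer proprietary indicator weighting and materiality logic on top of peer-relative ranks.

```python
from collections import defaultdict

def percentile_scores(value_by_firm, industry_by_firm):
    """Toy percentile-rank scoring (0-100) within industry peer groups.

    Illustrative only -- real rating providers apply proprietary
    weighting and materiality adjustments on top of raw ranks.
    """
    # Group firms into industry peer groups so ranking is peer-relative
    groups = defaultdict(list)
    for firm, industry in industry_by_firm.items():
        groups[industry].append(firm)

    scores = {}
    for firms in groups.values():
        values = [value_by_firm[f] for f in firms]
        n = len(values)
        for f in firms:
            v = value_by_firm[f]
            worse = sum(1 for x in values if x < v)
            same = sum(1 for x in values if x == v)  # includes the firm itself
            scores[f] = 100 * (worse + same / 2) / n  # mid-rank percentile
    return scores

# Four hypothetical "Tech" firms ranked on a single raw indicator
scores = percentile_scores(
    {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    {"A": "Tech", "B": "Tech", "C": "Tech", "D": "Tech"},
)
```

With these toy inputs the four firms land at 12.5, 37.5, 62.5 and 87.5, i.e. evenly spread percentiles within the peer group; the key design point is that a score says how a firm ranks against its industry peers, not how sustainable it is in any absolute sense.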
For further information on the largest ESG report providers and their rating methods, Huber et al. (2017) have constructed a comprehensive summary table of the topic in their paper "ESG Reports and Ratings: What They Are, Why They Matter", which can also be found in this paper's Appendix 1.

In addition to ESG scores, several other frameworks and metrics are used to assess corporate sustainability. Many companies publish sustainability reports in accordance with sustainability regulation such as the Corporate Sustainability Reporting Directive (CSRD) and the accompanying European Sustainability Reporting Standards (ESRS), which follow established sustainability standards such as the Global Reporting Initiative (GRI) and mandate detailed qualitative and quantitative disclosures. Other benchmarks include sustainability indices, such as the Dow Jones Sustainability Index (DJSI) and FTSE4Good, which rank companies based on structured questionnaires and performance criteria. Organizations may also pursue third-party certifications or ratings, such as B Corp certification or Carbon Disclosure Project (CDP) scores, particularly for environmental performance.

Furthermore, concepts like Corporate Social Responsibility (CSR) and alignment with the UN Sustainable Development Goals (SDGs) are often used to qualitatively gauge a company's contributions to sustainable development. These diverse measurement approaches complement ESG ratings. For example, a company may receive a high ESG score from MSCI, be included in the DJSI, and disclose its sustainability efforts in line with GRI standards, together offering a more holistic view of corporate sustainability. In this thesis, however, the primary focus is on ESG scores as a quantifiable measure of sustainability performance, given their widespread use in financial markets and research.
2.2 Predictive Modelling of ESG Scores

In recent years, a growing body of research at the intersection of sustainable finance and machine learning has focused on predicting ESG scores using a variety of data sources. The motivation for this work is twofold: first, to identify the factors that influence ESG ratings, thereby offering insight into the rating process and the relationship between financial and sustainability performance; and second, to develop predictive models capable of estimating ESG scores where data are missing or of forecasting future ESG outcomes, with potential applications for investors and corporate decision-makers. Leveraging the increasing availability of ESG ratings and firm-level financial data, researchers have employed a wide range of machine learning (ML) methods, from linear regressions to advanced deep learning architectures, to model ESG scores. The following review surveys recent literature on ESG score prediction, organizing studies by model type, from linear and regularized models to ensemble and deep learning approaches.

2.2.1 Linear and Regularized Regression Models

Since sustainability has become more topical in recent years, many studies have tried to make ESG scoring more transparent by building predictive models. A study by Licari et al. (2021) used traditional linear regression to predict ESG scores across a large global dataset of more than 19,000 companies in 96 countries between 2004 and 2020. The paper found that traditional models struggle to handle the complexity and inconsistency of ESG score construction, highlighting the limitations of traditional statistical methods in modelling ESG ratings. In the paper, predicting ESG scores using linear regression achieved weak prediction performance, an R² of 31.13%, suggesting it captured only a small portion of the variation in ESG scores (31.13% of the variation in ESG scores was attributable to the independent variables in the model).
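Read literally, an R² of 31.13% means the model removes about 31% of the squared error that a constant mean-ESG prediction would leave. A minimal sketch of that reading, with all numbers invented for illustration:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SSE/SST: the share of squared error around the
    mean baseline that the model's predictions eliminate."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - sse / sst

# Hypothetical observed ESG scores and model predictions
observed = [42.0, 55.0, 61.0, 70.0, 48.0]
predicted = [45.0, 52.0, 66.0, 68.0, 50.0]

model_fit = r_squared(observed, predicted)
# Predicting the sample mean for every firm yields R^2 = 0 by construction
baseline_fit = r_squared(observed, [sum(observed) / 5] * 5)
```

The mean-prediction baseline is exactly the benchmark this thesis's hypotheses use: a useful model must reduce prediction variability well below what the constant mean prediction leaves.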
The paper presents multiple potential reasons for the poor performance of traditional models, including the complex nature of ESG rating methodologies, varying data sources, subjective weighting of ESG attributes, direct company engagement, coverage caps for smaller firms, and varying regulation between sectors and in emerging markets.

Del Vitto, Marazzina, and Stocco (2023) investigate the transparency of proprietary ESG ratings by attempting to replicate the ESG scoring methodology used by LSEG (formerly Refinitiv). Using a combination of machine learning methods, including regularized linear models (Ridge and Lasso regressions), Random Forest, and Artificial Neural Networks, they model the Environmental, Social, and Governance (ESG) pillar scores based on Refinitiv's full set of sustainability indicators and financial variables. A key contribution of their study is the demonstration that interpretable models such as Lasso and Ridge, often referred to as "white-box" methods, can achieve predictive performance comparable to more complex black-box models like neural networks. These linear models also offered the advantage of minimal overfitting and strong generalizability across sectors. The authors report high predictive accuracy for the Environmental pillar and moderate accuracy for the Social and Governance scores. The reduced accuracy for the social pillar is attributed to its broader and less quantifiable scope, while regional variation in Governance scores reflects differing institutional contexts and data availability, prompting caution when making cross-country comparisons (e.g., between the U.S. and China). Their analysis also reveals that feature importance varies across industries and geographies, underscoring the contextual nature of ESG rating mechanisms.
Overall, the findings suggest that a well-specified linear model using relevant financial and ESG indicators can approximate Refinitiv's ESG ratings with surprising accuracy.

In a study of Taiwanese companies, Lin and Hsu (2023) included a multiple linear regression as a benchmark for ESG score prediction. The authors emphasized the importance of establishing interpretable baseline models, particularly in the context of Taiwan's unique market characteristics, including a technology-driven economy, limited stock circulation, and heightened information asymmetry. Although they found that the linear models were consistently outperformed by more advanced machine learning techniques, the linear models still demonstrated moderate predictive accuracy and served as a transparent reference point for evaluating more complex approaches. The authors noted that linear models struggled to capture the nonlinear relationships inherent in ESG ratings, especially in the presence of multicollinearity among financial and governance-related variables. Nonetheless, the inclusion of linear regression highlighted the trade-off between model simplicity and predictive power, underscoring its value in contexts where interpretability and transparency are prioritized.

Notably, linear models allow researchers to identify which financial ratios and indicators have the most explanatory power for ESG scores, albeit under the assumption of a linear relationship. Commonly influential variables include profitability metrics, leverage, firm size, and industry-specific factors, which is consistent with broader empirical findings on the determinants of ESG performance. Regularization techniques such as Lasso regression further enhance model parsimony by shrinking the coefficients of less relevant predictors toward zero, thereby highlighting a core subset of explanatory features (Del Vitto et al., 2023).
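The shrinkage behaviour described above can be demonstrated with scikit-learn on synthetic data. This is a sketch under the assumption that scikit-learn is available, with a made-up data-generating process in which only two of eight "ratios" actually drive the toy ESG score:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic firm-year data: only the first two features matter;
# the remaining six are pure noise predictors.
n_obs, n_feat = 500, 8
X = rng.normal(size=(n_obs, n_feat))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=n_obs)

# Standardise predictors so the L1 penalty treats them comparably
X_std = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X_std, y)
print(np.round(lasso.coef_, 2))  # coefficients of the noise features shrink to ~0
```

The penalty strength `alpha` governs the sparsity: larger values zero out more coefficients, leaving the core subset of explanatory features, which is exactly the parsimony argument made for Lasso above.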
While linear models generally exhibit lower predictive accuracy than nonlinear approaches in more complex environments, they nevertheless provide a transparent and reasonably effective baseline for ESG score modelling, particularly when interpretability and variable selection are of primary importance.

2.2.2 Tree-Based Ensemble Models (Random Forests and Boosting)

A significant portion of recent ESG prediction research employs tree-based ensemble models, including Random Forests (RF) and gradient boosting frameworks such as XGBoost and LightGBM. These models are suited to capturing nonlinear relationships and complex interactions among predictors, making them effective in financial modelling contexts. They have likewise shown strong performance in predicting ESG scores across various studies. As they can handle high-dimensional data and model heterogeneity, they have become a popular choice in studies aiming to replicate ESG scores or forecast sustainability performance. Moreover, ensemble models such as RF offer built-in mechanisms for estimating feature importance, which can provide insights into the relative contribution of predictors to ESG outcomes, albeit still with less transparency than linear models.

A Random Forest model was used by D'Amato, D'Ecclesia, and Levantesi (2021) in one of the pioneering works linking financial fundamentals with ESG ratings. Using data from 109 STOXX Europe 600 index companies during the 2010s, the authors trained the model on balance sheet and income statement ratios to predict Bloomberg's ESG disclosure scores. The study aimed to assess the predictive power of conventional financial variables in explaining variation in sustainability ratings. Among the models tested, Random Forest delivered the highest predictive performance, achieving an R2 of approximately 0.62, which outperformed linear regression and other baseline models. Key predictors identified included firm size, profitability, and leverage.
The authors concluded that financial statement items constitute a robust explanatory basis for ESG scores, providing empirical support for the notion that sustainability assessments, although seemingly non-financial in nature, are linked to a firm's financial characteristics.

Complementing the findings of D'Amato et al. (2021), Lin et al. (2019) had already found a negative link between corporate social responsibility and corporate financial performance measured by ROE, ROA and ROI, which supports the theory that a trade-off exists between optimising financial performance metrics and carrying out sustainability objectives. However, a few later studies have pointed out that the trade-off negatively affects only companies whose financial performance is below optimal to begin with.

D'Amato et al. (2022) expand on their earlier study, aiming to assess the effect of structural data and balance sheet items on the ESG scores of regularly traded stocks. In this study, they instead use Refinitiv (LSEG) ESG scores with a larger sample of companies across 2009–2019 and find that balance sheet items have significant predictive power for ESG scores. Based on their findings, the Random Forest algorithm performs best at predicting ESG scores compared to classical regression approaches, as it can capture the nonlinear relationships between ESG scores and predictive variables, which their study shows to occur consistently.

Cini and Ferrari (2025) took this approach a step further by introducing a time dimension: they trained an RF classification model to predict a firm's next-year ESG rating class using current financial ratios and risk indicators. Using panel data from 2016 to 2021 for European companies, their model categorized firms into ESG performance tiers (e.g., high, medium, or low) with high out-of-sample accuracy.
This is notable as it demonstrates forward-looking predictive power, essentially showing that there is informational content in financial fundamentals that anticipates improvements or declines in ESG performance. The authors described their model's accuracy as "unprecedented," suggesting practical applications in estimating ESG ratings for firms that lack current evaluations, such as small-cap or privately held companies.

Beyond Random Forests, boosting algorithms have also gained traction in ESG score prediction due to their high predictive accuracy and ability to model complex nonlinear relationships. Gradient boosting machines such as XGBoost have been applied in ESG studies with promising results. A study by Choi, Chen, and Lee (2024) compared multiple ML models on a dataset of Korean companies' financial ratios over three years, aiming to predict the companies' ESG ratings. They evaluated linear models, tree ensembles, and neural networks, and applied SHAP (Shapley Additive Explanations) to interpret variable importance. In their results, XGBoost was found to be the most effective model, achieving an F1-score of 85.1% in classifying ESG ratings.

Similarly, Lin and Hsu (2023) included XGBoost in their evaluation of ESG prediction models for Taiwanese firms and found it to perform competitively, although an alternative model, the Extreme Learning Machine (ELM), slightly outperformed it on their dataset. However, the literature also cautions that, particularly in ESG applications where datasets can consist of relatively small panels, boosting algorithms require careful hyperparameter tuning to prevent overfitting.

In summary, ensemble tree-based models have demonstrated strong predictive performance in ESG score modelling. By capturing nonlinear relationships and complex feature interactions, methods such as Random Forest and XGBoost often outperform linear regression models, which assume constant marginal effects.
For instance, the impact of profitability on ESG scores may vary nonlinearly, strengthening or diminishing beyond certain thresholds. The collective evidence from recent studies suggests that these models can effectively learn the functional mapping between financial ratios and ESG ratings, with reported R2 values and classification metrics substantially exceeding baseline accuracy (e.g., Choi et al., 2024; D'Amato et al., 2021). As a more recent example, Alsayyad and Fadel (2025) demonstrated high predictive performance in a comprehensive machine learning study on ESG scores using panel data, with the best R2 scores reaching over 0.9.

The findings generally indicate that a considerable portion of the variance in ESG ratings can be explained by financial data. However, it should be noted that each study's results depend on the specific dataset and ESG rating agency used, as each has a unique methodology. Additionally, several studies point to diminishing returns: once a robust tree-based model is in place, even more complex approaches may not dramatically improve accuracy, as we discuss next.

2.2.3 Deep Learning Models (Neural Networks)

Given the success of machine learning in predicting ESG scores, researchers have investigated deep learning approaches, such as multilayer artificial neural networks (ANNs), to see if they can further improve prediction performance. Neural networks can, in theory, capture very complex nonlinear interactions in data. However, in the context of ESG score prediction, deep learning has been explored less than tree-based models, and the empirical results are mixed.

Del Vitto et al. (2023) evaluated multiple ANN architectures in their effort to replicate Refinitiv's ESG scoring methodology. The authors tested both shallow and deep networks, varying the number of layers and hidden units, and benchmarked their performance against simpler models, including Lasso regression and Random Forest.
They found that increasing the depth and complexity of the neural networks did not consistently improve prediction accuracy. In some cases, a simpler ANN with fewer hidden layers performed comparably to, or better than, more complex architectures. Furthermore, the highest overall performance was achieved by the regularized linear models and the simpler ANN, rather than by the deeper ANNs or ensemble methods. These findings suggest that while ESG–financial relationships are nonlinear, they may not require deep architectures to model effectively. This could be due to the moderate size of structured ESG datasets and the risk of overfitting when models include too many parameters relative to the data available (Del Vitto et al., 2023).

Other studies reinforce the view that deep learning should be applied with caution in the context of ESG score prediction. Choi et al. (2024) included a neural network in their model comparison when classifying ESG ratings for Korean firms but ultimately found the tree-based XGBoost model superior. Lin and Hsu (2023) studied ESG score prediction for Taiwanese non-financial companies using 27 financial metrics together with corporate governance indicators. They used an Extreme Learning Machine (ELM), a form of single-layer neural network with random weights, and reported that the ELM achieved excellent performance (R2 of over 0.9 for multiple models), slightly outperforming Random Forest and XGBoost in predicting ESG scores on their dataset. While the ELM is technically a neural approach, it is not a deep learning model; rather, it offers an efficient architecture for capturing nonlinearities in relatively small datasets.

These findings suggest that neural models, especially shallow or lightweight variants like the ELM, can perform competitively in ESG prediction. However, evidence from recent studies indicates that deep neural networks have not consistently outperformed boosting or ensemble tree methods when using structured financial data alone.
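The appeal of a shallow network on tabular data can be illustrated with a minimal sketch. The example below is an assumption-laden toy comparison, not the setup of any cited study: a single-hidden-layer network is fit alongside Ridge regression on synthetic ratio-style features containing a mild nonlinearity that the linear model cannot capture.

```python
# Illustrative sketch: shallow neural network vs. a linear baseline on
# synthetic tabular data (feature names and data-generating process assumed).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))                 # hypothetical financial ratios
# Linear effect of the first feature plus a quadratic effect of the second
y = 50 + 8 * X[:, 0] - 4 * X[:, 1] ** 2 + rng.normal(scale=2, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)

mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)
mlp.fit(scaler.transform(X_tr), y_tr)
ridge = Ridge().fit(scaler.transform(X_tr), y_tr)

print("shallow MLP R2:", round(mlp.score(scaler.transform(X_te), y_te), 3))
print("ridge R2:", round(ridge.score(scaler.transform(X_te), y_te), 3))
```

On data like this, one hidden layer already captures the quadratic term, mirroring the finding that added depth is not necessarily needed for structured ESG-style datasets.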
Although hybrid deep learning approaches incorporating unstructured data such as ESG reports or news sentiment are gaining attention, they fall outside the scope of predictions based on structured data and remain a rather new topic in the research. As such, the incremental benefit of deep learning over more interpretable machine learning models remains limited in this domain, particularly given concerns around overfitting, data volume, and model transparency. An advantage of neural networks is their flexibility in integrating heterogeneous data sources, such as combining structured financial indicators with unstructured information like textual disclosures or ESG news. However, this flexibility comes at the cost of reduced model interpretability, which presents a limitation for sustainability assessment. To address this concern, recent studies have increasingly employed explainable AI (XAI) techniques to interpret the internal logic of complex models. For example, both Del Vitto et al. (2023) and Choi et al. (2024) applied SHAP (Shapley Additive Explanations) to their ESG prediction models, enabling them to identify which input features, such as the debt-to-equity ratio, return on assets, or carbon emissions, had the greatest influence on predicted ESG scores.

A related line of research explores the integration of natural language processing (NLP) and model robustness techniques. For example, Lee et al. (2022) proposed an AI framework for predicting firm-specific ESG ratings by analysing governance and social-related datasets using a combination of machine learning and NLP algorithms. In addition to evaluating multiple models for prediction accuracy, their study addressed the vulnerability of ESG systems to adversarial attacks, which they describe as malicious manipulations of input data that can distort rating outcomes.
They introduced a method for detecting such attacks, contributing to the growing emphasis on data reliability and security in ESG analytics. While such hybrid approaches extend beyond structured financial data and remain relatively novel, they showcase how new AI technology can be applied to expand the scope and resilience of ESG prediction models.

The use of XAI contributes to making opaque models more transparent, which is important in the ESG domain, where stakeholders seek to understand the drivers behind sustainability ratings. Insights from these interpretability tools also reinforce broader findings in the literature: across both complex and simpler models, a relatively consistent set of financial variables frequently emerges as key predictors of ESG performance. Profitability, firm size, leverage, and industry-specific environmental or social factors are among the most cited drivers, hinting that certain financial fundamentals hold robust explanatory power across models and contexts.

In summary, while model performance varies, recent literature confirms that ESG scores can be predicted with reasonable accuracy using firm-level financial data and machine learning techniques. The table below summarizes key studies from this literature review, highlighting their methods, data sources, and key findings.

Table 1. Summary of key studies.

Study: D'Amato et al. (2021)
Data: Euro Stoxx 600 firms, Bloomberg ESG scores
Methods: Random Forest vs. linear models
Key findings: RF achieved R2 ~0.62, indicating financial metrics explain ESG scores
Additional insights: Highlighted the importance of structural financial data in ESG ratings.

Study: Del Vitto et al. (2023)
Data: Refinitiv ESG scores, global firms by sector
Methods: Lasso, Ridge, Decision Tree, Random Forest, ANN
Key findings: Lasso and a shallow ANN were the best predictors of ESG; deeper ANNs did not improve much
Additional insights: ESG ratings can be largely replicated with a selected feature set and ML models.

Study: Lin & Hsu (2023)
Data: Taiwanese companies, ESG index scores (2018–2021)
Methods: SVM, Random Forest, XGBoost, Extreme Learning Machine (ELM)
Key findings: High accuracy (R2 of ~0.9+) with different models
Additional insights: Integrating financial and governance indicators is effective.

Study: Chowdhury et al. (2023)
Data: 6171 firms from 2005 to 2019
Methods: Six machine learning classification models
Key findings: The RFC model was superior with 78.50% accuracy
Additional insights: Findings highlight the relationship between firm size, liquidity, and ESG investing.

Study: Choi et al. (2024)
Data: Korean firms, ESG ratings from a local agency
Methods: Multiple (linear, RF, XGBoost, deep NN) + XAI
Key findings: XGBoost was best (F1 ~85%), beating deep neural nets
Additional insights: Financial factors (leverage, profitability) were significant.

Study: Cini & Ferrari (2025)
Data: Euro Stoxx 600 (2016–2021), MSCI (or similar) ESG rating classes
Methods: Random Forest classification to predict next-year ESG rating class from current financial ratios + a systemic risk metric
Key findings: High out-of-sample accuracy in classifying ESG ratings one year ahead
Additional insights: Investors can forecast sustainability improvements or deteriorations using financial data.

3 Data and Data Processing

Data for this study is gathered from the LSEG database. LSEG is one of the largest and most important financial data and ESG score providers, covering more than 80% of global market capitalization (LSEG.com). LSEG ESG ratings are percentile rank scores, ranging from 0 (lowest) to 100 (highest). These ratings aim to objectively assess a company's relative ESG performance. LSEG states that their ESG ratings are data-driven, consider the most crucial industry metrics, and are adjusted for biases related to transparency and market capitalization.
However, their scores are not exempt from some of the main problems with ESG scores.

The original dataset consists of six hundred European companies from the STOXX Europe 600 index for the years 2014 to 2023. The original variables are selected based on suggestions and previous findings within the literature, following Chowdhury et al. (2023), D'Amato et al. (2021) and D'Amato et al. (2022). D'Amato et al. (2021) convincingly argue that using ratios reflecting the overall financial statements of the companies, representing profitability, liquidity and solvency, is more informative and improves the characterization of companies than using absolute financial statement values when aiming to explain ESG scores. For this reason, this paper skips the step of testing feature importances using raw financial statement values and uses financial statement ratios instead.

After testing for variable correlations and initial model performance, some potentially influential variables originally recommended in the literature, such as NI/NS (Net Income / Net Sales), were removed from the final dataset and model, leaving EBIT/NS as the main proxy for profitability, since it had proportionally higher prediction power in the models while having over 96% correlation with NI/NS.

After omitting variables based on initial model performance and variable correlations, the following variables are used in the models:

• YEAR: 2014–2023.
• INDUSTRY: General industry sector classification variable (range: 1–6). The classes are transformed into dummy variables in the model.
• ESGScore: ESG score from the LSEG database.
• ESGE: Environmental score from LSEG.
• ESGS: Social score from LSEG.
• ESGG: Governance score from LSEG.
• ASSETS: Total assets of the company.
• SIZE: Logarithm of assets.
• NS/TA: Net Sales divided by Total Assets.
  o Efficiency ratio (turnover).
  o Measures how efficiently a company uses its assets to generate sales.
  o Indicates operational efficiency and asset utilization; a higher ratio indicates more effective use of assets to drive revenue.
• EBIT/NS: The ratio of Earnings Before Interest and Taxes to Net Sales.
  o Profitability ratio (operating margin).
  o The proportion of sales remaining as operating profit before accounting for interest and taxes.
  o Compares operational performance across companies regardless of their financing and tax structures.
• DIV: Dividend yield.
  o Income (yield) ratio.
  o The annual dividend per share relative to the stock price.
  o Indicates the cash return on investment and can reflect the company's commitment to returning profits to shareholders.
• P/E: Price to Earnings ratio.
  o Valuation ratio.
  o The market value of a stock relative to its earnings per share.
  o Provides insights into market expectations and relative valuation.
• CA/CL: Current Assets to Current Liabilities.
  o Liquidity ratio (working capital).
  o The ability of a company to cover its short-term liabilities with its short-term assets.
  o Serves as an indicator of short-term financial health and liquidity.
• TD/TA: Total Debt to Total Assets.
  o Solvency (leverage) ratio.
  o The proportion of a company's assets financed by debt.
  o Evaluates financial risk and solvency; a lower ratio typically implies a more conservative capital structure.
• ROE: Return on Equity.
  o Profitability ratio.
  o Profitability relative to shareholders' equity, reflecting how effectively a company uses equity capital to generate profits.
  o Reflects management effectiveness and overall profitability relative to equity; critical for comparing performance among companies in the same industry.
• ROA: Return on Assets.
  o Profitability ratio (with an efficiency component).
  o How effectively a company generates profit from its total asset base.
  o Gives a view of operational efficiency and profitability, valuable for comparing companies irrespective of their financing structures.
• P/E missing flag.
  o Categorical variable included in the model, indicating observations where P/E was missing.
• CA/CL missing flag.
  o Categorical variable included in the model, indicating observations where CA/CL was missing.

The processed data used in the models consists of the ESG score as the main outcome variable, with the Environmental, Social and Governance scores separately as additional dependent variables. Independent variables consist of the general industry classification from 1 to 6, Year, Total Assets, Dividend Yield, ROE, ROA, categorical flags for missing P/E and CA/CL ratios, and the financial ratios Net Sales / Total Assets, EBIT / Total Assets, EBIT / Net Sales, Price / Earnings, Current Assets / Current Liabilities and Total Debt / Total Assets.

Similarly to D'Amato et al. (2021), "Year" is included in this paper as a separate static variable, disregarding year-on-year changes. The model considers the years 2014 to 2023, although for 2023 most of the ESG scores were missing at the time the dataset was obtained.

Chowdhury et al. (2023) argue, based on variable importance factors, that the lagged ESG score is the most important predictor of ESG, followed by firm size and the debt-to-equity ratio, indicating that previous investments into ESG, firms' total assets and financial leverage are the best predictors of the ESG score. In the testing and model optimisation process of this paper, the lagged ESG score was also found to be the most important predictor of ESG scores based on variable importances. However, it can be reasoned that this finding is rather obvious, and the variable alone could explain most of the ESG score in the model testing phase, dominating the model and results.
As the objective of this paper is to predict ESG scores from financial statement items and reveal information about the underlying influence of financial statement items on ESG scores, including a lagged value of the dependent variable itself as an independent variable works to defeat this purpose, and it is therefore omitted from the models.

3.1 Data Integrity and Initial Screening

The dataset has a panel-like structure, containing annual observations for STOXX Europe 600 companies from 2014 to 2023, but is analysed cross-sectionally. The dataset contains no duplicate values. As the first step of data processing, observations with missing ESG scores were omitted to ensure consistency of the dependent variables. The remaining dataset contains 4937 valid firm-year observations and is described in the following tables in this chapter.

3.2 Currency Standardization

The financial data in the original dataset varied in currency, showing each firm's financial information in its local currency; the data contains observations in eight different currencies: United Kingdom Pound (GBP), Euro (EUR), Danish Krone (DKK), Swiss Franc (CHF), Hong Kong Dollar (HKD), Norwegian Krone (NOK), Swedish Krona (SEK) and Polish Zloty (PLN). Since most of the variables in the model are ratios, currency differences between firms only affected total assets. Assets were converted to EUR using the average annual EUR exchange rate corresponding to each observation's year to ensure comparability across firms and time.

3.3 Descriptive Statistics of the Processed Dataset

Table 2. Summary statistics.
Variable | count | mean | std | min | 25% | 50% | 75% | max
ESG Score | 4937 | 65.37 | 17.28 | 2.60 | 54.96 | 68.55 | 78.45 | 95.72
ESG E | 4937 | 64.43 | 23.44 | 0.00 | 49.44 | 69.69 | 83.20 | 99.14
ESG S | 4937 | 68.84 | 19.87 | 0.25 | 56.83 | 73.43 | 84.41 | 98.20
ESG G | 4937 | 61.52 | 20.92 | 1.45 | 46.91 | 64.95 | 78.42 | 98.56
ASSETS | 4935 | 85375584 | 342680815 | 36571 | 3945650 | 10395392 | 40324822 | 6639198547
EBIT/NS | 4901 | 0.23 | 1.02 | -28.28 | 0.07 | 0.13 | 0.23 | 29.13
TD/TA | 4935 | 0.25 | 0.16 | 0.00 | 0.13 | 0.24 | 0.35 | 1.32
NS/TA | 4934 | 0.67 | 0.58 | -0.05 | 0.23 | 0.58 | 0.90 | 4.41
CA/CL | 3807 | 1.60 | 1.32 | 0.21 | 0.99 | 1.31 | 1.80 | 29.27
DIV | 4916 | 2.72 | 2.67 | 0.00 | 1.18 | 2.35 | 3.92 | 111.23
P/E | 4562 | 38.76 | 206.03 | 0.20 | 12.50 | 19.00 | 28.20 | 8105.00
ROE | 4887 | 17.53 | 65.76 | -262.32 | 7.46 | 13.24 | 20.91 | 2409.86
ROA | 4870 | 6.89 | 12.35 | -63.72 | 2.05 | 5.38 | 9.18 | 269.11

The descriptive statistics of all variables are presented in Table 2, summarizing the central tendency and dispersion of ESG scores and the associated financial statement items.

3.3.1 Overview of ESG Scores

The mean ESG score of the sample is approximately 65.4 with a standard deviation of 17.3, indicating that most companies cluster around mid-to-high ESG performance levels. The environmental (E), social (S), and governance (G) pillars show comparable patterns: the S score has the highest mean (68.8) and slightly lower dispersion, suggesting more consistent social-performance evaluations across firms, while the G score shows the greatest variability (SD ≈ 20.9), potentially indicating broader differences in corporate governance practices across Europe. The overall range of scores (minimum ≈ 2.6, maximum ≈ 95.7) shows that the dataset contains both low- and high-performing firms, offering variation for predictive modelling.

The time-series analysis presented later in Table 3 (Section 3.4) reveals an upward trend in ESG means and a gradual reduction in standard deviations over the period 2014–2023.
This pattern is consistent with the increasing institutional emphasis on sustainability reporting and improved data coverage in Europe, which have led to more homogeneous ESG assessments in recent years.

3.3.2 Overview of Financial Variables

The financial ratios display considerable heterogeneity, which is expected given the cross-industry composition of the sample. Total assets (ASSETS) vary widely, spanning several orders of magnitude and reflecting the coexistence of smaller firms and multinational corporations. Ratios such as EBIT/NS (mean = 0.23, SD = 1.02) and ROE (mean = 17.5, SD = 65.8) show substantial dispersion, partly driven by outliers in profitability and capital structure. This variation underscores the need for robust algorithms and outlier treatment during modelling.

Leverage-related ratios, such as Total Debt to Total Assets (TD/TA), display relatively moderate variation (mean = 0.25, SD = 0.16), suggesting that debt levels among listed European firms are somewhat stable across industries. In contrast, CA/CL exhibits wide variation (mean = 1.6, SD = 1.3) where observed, reflecting differences in liquidity structures, especially between manufacturing firms and financial institutions.

The P/E ratio demonstrates extreme spread (mean ≈ 38.8, SD ≈ 206.0, max > 8,000), indicating the presence of a few extraordinarily high values. Such dispersion arises from low or near-zero earnings denominators, illustrating one reason for winsorising. Missing or undefined P/E values need to be handled carefully, which will be discussed further in the missing-data assessment and the subsequent imputation chapter.

3.3.3 Interpretation and Relevance for Modelling

The descriptive analysis highlights several considerations for modelling. The dataset is sufficiently diverse, with a range of firm sizes, profitability levels and sustainability outcomes, for the aim of finding relationships between financial statement items and ESG scores.
The magnitudes and dispersion of the variables indicate a need for data processing choices, such as winsorisation of extreme values, standardization for linear algorithms, and context-specific handling of missing data.

3.4 Distribution Across Time

Table 3 presents the mean and standard deviation of the ESG score and its subcomponents for the years 2014 to 2023. The results show an upward trend in average scores over the period, accompanied by a gradual reduction in dispersion. The mean overall ESG score rises from approximately 58 in 2014 to about 70 in 2021–2022, while standard deviations decline from around 19 to 14. Similar dynamics are observed across the three pillars, although the magnitude and pace of change vary slightly: the Environmental (E) and Social (S) dimensions increase more steadily than Governance (G), which remains comparatively volatile.

Table 3. Main statistics of the ESG, E, S and G score distributions by year for the sample of 600 companies listed in the STOXX Europe 600 Index.

Year | ESG Mean | ESG SD | E Mean | E SD | S Mean | S SD | G Mean | G SD
2014 | 58.52 | 19.46 | 61.24 | 25.34 | 60.34 | 23.11 | 54.76 | 22.09
2015 | 60.28 | 19.57 | 62.19 | 25.19 | 63.52 | 22.66 | 55.48 | 22.32
2016 | 61.90 | 18.23 | 63.74 | 23.92 | 65.85 | 21.57 | 56.29 | 21.80
2017 | 63.37 | 17.55 | 63.48 | 24.28 | 68.69 | 19.90 | 57.08 | 21.62
2018 | 64.79 | 17.53 | 60.82 | 25.47 | 69.26 | 19.49 | 61.08 | 21.05
2019 | 66.86 | 16.23 | 64.25 | 23.52 | 70.58 | 18.73 | 63.42 | 19.71
2020 | 69.30 | 15.33 | 66.00 | 22.28 | 72.22 | 17.46 | 67.43 | 18.66
2021 | 70.03 | 14.44 | 67.74 | 20.92 | 73.00 | 16.64 | 67.42 | 18.26
2022 | 70.02 | 13.98 | 68.86 | 19.67 | 73.00 | 16.67 | 66.39 | 18.38
2023 | 67.17 | 13.39 | 65.46 | 20.27 | 68.51 | 16.50 | 66.21 | 18.48

The pattern indicates that firms in the STOXX Europe 600 have generally improved their reported sustainability performance during the past decade. The concurrent decline in standard deviations suggests convergence among firms, meaning that extreme low performers have become less frequent while mid-range and high-range scores have become more typical.
This aligns with general observations within the sustainability literature on the increasing importance of sustainability, especially in Europe. According to LSEG, although its ESG scores incorporate country-specific characteristics, they remain broadly comparable to ESG scores on other continents, and Europe is a pioneering market in sustainability relative to other markets.

Over the years, ESG reporting has benefited from enhanced disclosure requirements and more consistent evaluation frameworks (LSEG), which may also reflect gradual improvement in the underlying data infrastructure. The increase in mean scores and the narrowing of their spread can both result from a combination of progress in corporate sustainability and a methodological maturation of ESG measurement. The pattern suggests that the dataset captures broader systemic changes in European sustainability reporting and corporate sustainability in addition to firm-level differences.

3.5 Industry Classification

To capture cross-sectoral patterns in sustainability performance, the dataset is divided into six broad industry groups based on the general business classification used throughout this thesis. The classifications and their definitions are presented in Table 4.

Table 4. General Industry Classification explanation.

1. Various Industries: A "catch-all" category for companies operating across different sectors, containing a diverging range of industries, including personal goods, pharmaceuticals and biotechnology, technology hardware and equipment, food producers, retail, oil, gas, coal, aerospace, etc.
2. Electricity and Telecommunications: Companies involved in providing electricity, telecommunications, and similar utilities.
3. Transportation, Travel, Leisure: Companies involved in transportation services, travel, leisure, and related industries such as tourism.
4. Banks: Banks and financial institutions engaged in commercial banking activities.
5. Insurance: Companies operating in the insurance industry.
6. Real Estate and Investments: Companies involved in real estate, investment management, investment banking, etc.

The mean and standard deviation of the ESG, E, S, and G scores by industry are summarized in Table 5 below.

Table 5. Main statistics of the ESG, E, S, and G score distributions by industry sector for the sample of 600 companies listed in the STOXX Europe 600 Index.

Sector | ESG Mean | ESG SD | E Mean | E SD | S Mean | S SD | G Mean | G SD
1 | 65.85 | 17.02 | 63.25 | 23.06 | 70.37 | 19.69 | 61.16 | 21.04
2 | 69.28 | 14.72 | 69.64 | 19.65 | 70.98 | 19.02 | 64.63 | 17.14
3 | 63.53 | 15.17 | 64.44 | 19.49 | 67.49 | 16.88 | 58.58 | 22.10
4 | 66.54 | 17.26 | 73.92 | 22.02 | 69.41 | 18.12 | 62.02 | 21.16
5 | 64.44 | 16.05 | 65.18 | 23.51 | 62.08 | 18.42 | 69.32 | 18.50
6 | 58.04 | 20.44 | 58.80 | 28.06 | 59.44 | 21.68 | 56.30 | 22.34

Across industries, ESG performance varies notably in both mean values and dispersion. Firms in Electricity and Telecommunications (Industry 2) show the highest average ESG and E scores, consistent with the sector's strong exposure to environmental regulation and renewable energy transition policies. Conversely, Real Estate and Investment firms (Industry 6) record the lowest overall ESG averages and the widest dispersion, reflecting structural heterogeneity and differing reporting standards within that category.

Banks (Industry 4) exhibit relatively high Environmental (E) scores compared to other sectors, which may stem from their lower direct emissions and increasing engagement in sustainable finance. In contrast, Insurance firms (Industry 5) tend to perform better in Governance (G), likely due to regulatory oversight and mature compliance systems. These cross-industry contrasts confirm that sustainability outcomes are influenced not only by firm-level financial characteristics but also by sector-specific operational and regulatory contexts.
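As noted in the variable list, the categorical industry code enters the models as dummy variables. A minimal sketch of that encoding follows; the column names and toy values are illustrative, not the thesis's actual data.

```python
# Minimal sketch of dummy-encoding the INDUSTRY classification (1-6)
# before modelling; values are illustrative.
import pandas as pd

df = pd.DataFrame({
    "INDUSTRY": [1, 2, 4, 6, 2],
    "ESGScore": [65.9, 69.3, 66.5, 58.0, 70.1],
})
# drop_first=True drops one level as the baseline category, avoiding
# perfect collinearity in linear models (tree models would not need this).
dummies = pd.get_dummies(df["INDUSTRY"], prefix="IND", drop_first=True)
df = pd.concat([df.drop(columns="INDUSTRY"), dummies], axis=1)
print(df.columns.tolist())
```

For tree-based models the raw integer code could in principle be used directly, but dummy encoding keeps the feature set consistent across the linear and nonlinear models compared in this study.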
From a modelling perspective, such heterogeneity underscores the importance of including industry identifiers or fixed effects when predicting ESG outcomes. Industry-level variation can capture systematic differences in disclosure norms, business models, and risk exposures. The implications of industry structure for feature importance and model behaviour will be revisited in later chapters, where variable importance and interpretability methods (e.g., SHAP values and partial dependence plots) are discussed.

Figure 1. Mean ESG Score by Year and Industry.

Figure 2. Standard deviation of ESG Score by Year and Industry.

3.6 Correlogram

To examine the relationships among variables and assess potential redundancy among predictors, pairwise correlations were computed using Pearson's correlation coefficient (r). Pearson's r quantifies the linear association between two continuous variables, ranging from –1 to +1, where values near ±1 indicate a strong linear relationship and values close to zero imply weak or no correlation. Correlation analysis provides a diagnostic step for identifying potential multicollinearity, which can affect model stability and interpretability, particularly for linear estimators.

It should be noted that the variables included in this correlation matrix represent the final selected features after preliminary testing. During the earlier data preparation phase, variables showing near-perfect linear dependence and limited marginal contribution were excluded to reduce redundancy. For example, NI/NS (Net Income / Net Sales) was removed due to its very high correlation (r ≈ 0.96) with EBIT/NS, while the latter was retained as a more informative profitability measure with higher predictive relevance in preliminary model evaluations. Consequently, the correlogram presented in Figure 3 visualizes correlations among the refined set of predictors that were ultimately used in model training.
To analyse the correlations among the variables in the dataset, the variables are plotted in a correlogram, shown in Figure 3. Positive correlations are illustrated in red and negative correlations in blue; colour intensity is proportional to the correlation coefficient.

Figure 3. Correlogram.

The strongest positive associations appear between Return on Equity (ROE) and Return on Assets (ROA), both profitability-based measures driven by net income performance. Similarly, EBIT/NS exhibits a strong positive correlation with ROE, indicating that firms with higher operating profitability typically achieve higher returns on equity. Moderate negative correlations emerge between leverage and profitability measures, such as between Total Debt to Total Assets (TD/TA) and ROA, suggesting that higher leverage is, on average, associated with lower returns. Liquidity ratios such as CA/CL show weaker or more heterogeneous relationships with profitability, indicating that short-term solvency conditions vary independently of performance and sustainability factors across sectors.

Overall, the correlation results indicate moderate interdependencies but no critical multicollinearity among the retained predictors. Tree-based models such as Random Forest and XGBoost, which form the primary modelling techniques in this study, are robust to the remaining correlations due to their hierarchical structure. For linear models such as Ridge and Lasso regression, regularization further mitigates residual multicollinearity, as discussed later.

From a broader perspective, the observed positive associations among size, profitability ratios, and ESG outcomes suggest that larger and more profitable firms can achieve higher sustainability ratings. While this does not imply causality, it hints that financially healthy firms may possess greater resources and incentives for sustainability practices.
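The redundancy screen described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the DataFrame below uses synthetic data standing in for the firm-year ratios, and the 0.95 cutoff is chosen to mirror the NI/NS exclusion (r ≈ 0.96 with EBIT/NS) mentioned earlier.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the firm-year ratio data used in the thesis
rng = np.random.default_rng(0)
n = 500
roa = rng.normal(0.05, 0.03, n)
df = pd.DataFrame({
    "ROA": roa,
    "ROE": roa * 2 + rng.normal(0, 0.02, n),   # strongly correlated with ROA
    "TD/TA": rng.uniform(0.1, 0.7, n),
    "CA/CL": rng.uniform(0.5, 3.0, n),
})

# Pairwise Pearson correlations among predictors (the basis of the correlogram)
corr = df.corr(method="pearson")

# Screen for near-perfect linear dependence, e.g. |r| > 0.95
high = (corr.abs() > 0.95) & (corr.abs() < 1.0)
print(corr.round(2))
print("Highly collinear pairs present:", bool(high.values.any()))
```

In the thesis workflow the resulting matrix would then be visualized as a colour-coded correlogram; the screen on |r| motivates dropping one member of each near-duplicate pair before training.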
3.7 Density Functions of ESG Scores

Figure 4 displays kernel density estimates (KDE) for the overall ESG score and its E, S, and G subcomponents. The dataset contains 4,937 firm-year observations. KDE is a non-parametric method for estimating the probability density function of a random variable, providing a representation of how values are distributed across the sample (Silverman, 1986).

Figure 4. Density functions of ESG score, E, S and G.

Table 6. Density functions of ESG score, E, S and G.

            | ESG score | ESG E   | ESG S   | ESG G
Sample size | 4937      | 4937    | 4937    | 4937
Bandwidth   | 3.15344   | 4.27896 | 3.62623 | 3.8181

The bandwidth determines how much the curve is smoothed: a smaller bandwidth produces a curve that follows local fluctuations more closely, while a larger bandwidth smooths over wider ranges, emphasizing the general shape of the distribution but potentially hiding smaller peaks or irregularities.

Bandwidths are determined using Silverman's rule of thumb, resulting in values between approximately 3.1 and 4.3 for the ESG variables. In practice, these values mean that the ESG score distribution is smoothed over windows of about three units on the 0–100 scale, while ESG E is smoothed slightly more broadly.

The density curves show that ESG scores are not uniformly distributed. Most observations cluster in the 60–85 range, while relatively few observations lie at the extremes. The E and G distributions appear broader and more dispersed, while S has a slightly sharper peak at high values. The density functions matter because they highlight both the central tendency and the dispersion of ESG scores in the dataset.

3.8 Missing Data Assessment and Handling

It is important to examine the extent and nature of missing data before modelling, as missingness can affect both the reliability and interpretability of results. In the dataset, missing values occur across several variables with differing magnitudes and underlying causes.
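The KDE and bandwidth rule above can be made concrete. The sketch below implements Silverman's rule of thumb, h = 0.9 · min(sd, IQR/1.34) · n^(−1/5), and a plain Gaussian KDE from scratch; the score sample is synthetic (a clipped normal standing in for the 4,937 ESG observations), so the bandwidth it produces will only be in the same general range as those in Table 6, not identical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n**(-1/5)."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(sd, iqr / 1.34) * x.size ** (-0.2)

def gaussian_kde(x, grid, h):
    """Evaluate a Gaussian-kernel density estimate on `grid` with bandwidth h."""
    x = np.asarray(x, dtype=float)
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))

# Synthetic scores standing in for the ESG sample (n = 4,937, 0-100 scale)
rng = np.random.default_rng(1)
scores = np.clip(rng.normal(70, 15, 4937), 0, 100)
h = silverman_bandwidth(scores)
grid = np.linspace(0, 100, 201)
density = gaussian_kde(scores, grid, h)
```

Plotting `density` against `grid` for each pillar would reproduce the shape of Figure 4; the bandwidth h is the smoothing window discussed above.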
Table 7 summarizes the number of missing observations for each variable by industry classification. The missingness pattern is unevenly distributed, both in terms of variables and industries, suggesting that values are not missing completely at random.

This section focuses on describing the patterns and implications of missing values within the dataset, while the specific imputation procedures and justifications are presented separately in Section 3.9, Pre-Modelling Procedures and Pipeline Design.

Table 7. Missing values per variable by industry.

Variable  | Ind. 1 | Ind. 2 | Ind. 3 | Ind. 4 | Ind. 5 | Ind. 6
ESG score | 0      | 0      | 0      | 0      | 0      | 0
ESG E     | 0      | 0      | 0      | 0      | 0      | 0
ESG S     | 0      | 0      | 0      | 0      | 0      | 0
ESG G     | 0      | 0      | 0      | 0      | 0      | 0
ASSETS    | 2      | 0      | 0      | 0      | 0      | 0
EBIT/TA   | 14     | 0      | 0      | 14     | 6      | 0
EBIT/NS   | 15     | 0      | 0      | 14     | 6      | 1
TD/TA     | 2      | 0      | 0      | 0      | 0      | 0
NS/TA     | 3      | 0      | 0      | 0      | 0      | 0
CA/CL     | 2      | 0      | 0      | 424    | 293    | 411
DIV Y     | 18     | 1      | 0      | 1      | 0      | 1
P/E       | 251    | 32     | 9      | 46     | 15     | 22
ROE       | 46     | 2      | 0      | 1      | 0      | 1
ROA       | 10     | 1      | 0      | 55     | 0      | 1

3.8.1 Extent and Distribution of Missingness

Table 7 reports the number of missing observations for each variable across the six industry classifications. The pattern is uneven and concentrated in specific variables and sectors. The Current Assets to Current Liabilities (CA/CL) ratio shows the highest level of missingness, with approximately one quarter of observations absent overall. The absence is concentrated in the financial and real estate sectors, industries 4 (Banks), 5 (Insurance), and 6 (Real Estate and Investments), where in several cases all CA/CL values are missing.

Another variable affected by substantial missingness is the Price-to-Earnings (P/E) ratio. Inspecting the data reveals that most missing P/E entries coincide with negative Earnings Before Interest and Taxes (EBIT), making the ratio undefined rather than simply unreported. Missingness is therefore embedded in the accounting structure of the variable.
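A per-industry missingness summary like Table 7 is a one-liner in pandas. The frame below is a toy stand-in (column and industry labels mimic the thesis data; the values are illustrative only), showing the pattern where CA/CL is entirely absent for financial-sector firms while other gaps are sporadic.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the structure of the dataset: CA/CL missing for
# financial-sector firms (industries 4 and 6), other gaps sporadic.
df = pd.DataFrame({
    "industry": [1, 1, 2, 4, 4, 6],
    "CA/CL":    [1.8, np.nan, 2.1, np.nan, np.nan, np.nan],
    "ROE":      [0.12, 0.08, np.nan, 0.15, 0.09, 0.11],
})

# Count of missing values per variable within each industry,
# the same summary reported in Table 7
missing_by_industry = df.drop(columns="industry").isna().groupby(df["industry"]).sum()
print(missing_by_industry)
```

Inspecting such a table is what reveals that CA/CL missingness is industry-driven (a MAR-type mechanism), which motivates the treatment described in the following subsections.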
For the remaining financial ratios, missing values are comparatively rare and irregular, suggesting minor gaps in firm-level reporting rather than systematic omissions.

3.8.2 Mechanisms of Missing Data

These patterns imply that missingness is not Missing Completely at Random (MCAR), where the probability of missingness is independent of any variable in the dataset. Rather, the missingness mechanisms align with Missing at Random (MAR) or Missing Not at Random (MNAR).

The CA/CL variable follows an industry-dependent pattern consistent with MAR, since the likelihood of missing values depends on a categorical factor (industry classification) observable in the data. In contrast, the P/E variable is closer to MNAR, because missingness is systematically related to the unobserved (negative) earnings values that make P/E undefined. These mechanisms imply that missingness has economic meaning which needs to be considered.

3.8.3 Implications for Data Integrity and Modelling

Recognizing the origins of missingness has implications for how these gaps should be treated. Excluding all observations with missing values would remove entire industries and firms with negative earnings from the analysis, introducing selection bias and reducing the sample size, while simple mean imputation would ignore economically meaningful differences between industries or profitability levels. Therefore, a more context-sensitive approach is needed, one that allows the dataset to remain as complete as possible while preserving the interpretability of model relationships.

3.8.4 Treatment Principles

In this thesis, for models that need complete data, missing data are handled using variable-specific strategies that depend on the structure and origin of the missingness. For variables such as CA/CL and P/E, missingness itself carries interpretive value, indicating the presence of distinctive financial structures or earnings conditions.
The chosen methods aim to retain their informational content while ensuring that models requiring complete inputs can still be trained. The details of these procedures, including the use of sentinel values paired with missingness indicators, the rationale for applying them only to selected industries, and how they are implemented separately within training and test partitions, are presented in Chapter 3.9, Pre-Modelling Procedures and Pipeline Design. For variables with minor or unsystematic missingness, such as ROE, ROA, or Dividend Yield, a more conventional industry-mean imputation approach is applied, as described there as well.

In contrast, models such as XGBoost inherently manage missing values during tree construction and therefore do not require these imputations.

3.9 Pre-Modelling Procedures and Pipeline Design

Before applying imputation methods, the data is randomly split into training (80%) and testing (20%) sets to prevent data leakage. The next step of the data processing is to deal with the remaining missing values. When choosing an imputation strategy, one should consider why the values are missing and how extensive the missingness is.

Examining the data, it was discovered that missingness is most prevalent in CA/CL and P/E, with CA/CL having approximately a quarter of its observations missing, concentrated in banks, insurance, and real estate and investment companies (Industries 4 to 6). Since all CA/CL observations in Industries 4 and 6 are missing, imputation methods such as MissForest or mean imputation would not produce desirable outcomes for them, as these industries have no observed values to draw on. Because of this, in some models, a sentinel value of -999 was imputed where CA/CL was missing in Industries 4 and 6, together with a missingness flag to mark the imputed observations. This compromise was the result of testing different ways to deal with the issue, including omitting CA/CL as a variable.
As a result, the models using this preprocessing were able to recognize the sentinel as carrying no real weight for predicting the ESG score: the value -999 is far outside the range of ordinary CA/CL values, it is paired with the missingness flag, and it appears in only two industries. This allowed the variable to be kept in models that cannot internally handle missing values, as for observed CA/CL values the variable was a contributing predictor of the ESG score.

Another variable with high missingness was P/E. Many of the observations flagged as missing also had negative Earnings Before Interest and Taxes. Because most of the existing P/E values are positive while the missing values are negative by their real nature, typical imputation methods again do not provide desirable outcomes. For this reason, in models XGB3 and RF2, missing P/E values were replaced with -999 when EBIT was negative and flagged with a categorical indicator set to 1. Most missing P/E values resulted from the company having negative earnings, and these models require a value for optimal performance; the sentinel approach avoids sacrificing either the predictor itself or the corresponding rows. This method did not appear to bias the results, and it provided higher prediction accuracy for Random Forest models than for RF models that handled missing values internally: with the missing flag, the model was able to disregard imputed values far from the real ones, while also interpreting the positive link between the sustainability score and simply having a positive P/E value. This reflects the problem that when there are no negative P/E values, and the model is only comparing positive ones, the variable's predictive power over the sustainability score is low.
The main predictive power of this variable only emerged when observations with missing values, that is, negative earnings, could be included. For observations with positive EBIT, missing P/E values were imputed with industry means.

For the other variables, overall missingness was low, and they were imputed with the mean values of their respective industries, computed on the training set. The imputation results were then applied to the test set.

It should be noted that the motivation behind these imputation choices for specific models was to get the best results from models that have theoretical potential for competitive prediction accuracy but limited functionality to accommodate the weaknesses of the dataset in terms of missing values. While these imputations improved the performance of the Random Forest models, those models were ultimately outperformed by the XGBoost model that handled missingness internally and did not require the imputations discussed above.

3.10 Winsorisation

Winsorisation is the process of moving outlier values to match the values of a specific quantile (Ranta, 2023, Ch. 9). Analysing the dataset, a winsorisation threshold of 1%–99% is selected, similarly to, for example, Chowdhury et al. (2023), as it appears to be a literature standard for firm-level financial datasets of similar scale.

For Random Forest, winsorisation was performed on the training set after the imputation stage, to ensure that outlier caps were computed using the complete imputed distributions of each variable. P/E was capped only on the upper tail, at the 99th percentile among positive values, leaving the sentinel unchanged, while the other continuous variables were symmetrically capped at the 1st and 99th percentiles. The same percentile thresholds, estimated from the training set, were applied to the test set to ensure consistent feature ranges without leaking target information. Finally, industry dummies were created and aligned across splits.
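The P/E treatment described above can be sketched compactly. This is an illustrative reimplementation, not the thesis code: column names (`industry`, `EBIT`, `P/E`), the helper `impute_pe`, and the toy frames are all hypothetical, but the logic follows the stated rules, a sentinel of -999 plus a missing flag when EBIT is negative, and a training-set industry mean otherwise, applied identically to train and test.

```python
import numpy as np
import pandas as pd

SENTINEL = -999.0

def impute_pe(train, test):
    """Missing P/E with negative EBIT -> sentinel + flag;
    missing P/E with positive EBIT -> industry mean learned on train only."""
    means = train.groupby("industry")["P/E"].mean()  # no test-set leakage
    out = []
    for df in (train, test):
        df = df.copy()
        df["P/E_missing"] = (df["P/E"].isna() & (df["EBIT"] < 0)).astype(int)
        df.loc[df["P/E_missing"] == 1, "P/E"] = SENTINEL
        still = df["P/E"].isna()
        df.loc[still, "P/E"] = df.loc[still, "industry"].map(means)
        out.append(df)
    return out

train = pd.DataFrame({"industry": [1, 1, 2, 2],
                      "EBIT": [5.0, -3.0, 4.0, 6.0],
                      "P/E": [12.0, np.nan, 18.0, np.nan]})
test = pd.DataFrame({"industry": [1], "EBIT": [2.0], "P/E": [np.nan]})
train_i, test_i = impute_pe(train, test)
```

The same pattern (sentinel plus indicator column, restricted to selected groups) carries over to the CA/CL treatment for Industries 4 and 6.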
The adopted order maintains consistent data ranges and prevents the reappearance of outliers introduced by later steps.

For XGBoost, winsorisation was applied to all continuous financial ratios to mitigate the influence of extreme outliers while preserving the overall rank order and structure of the data. ASSETS were winsorised first and only then transformed into LOG(ASSETS). The 1st and 99th percentile thresholds were computed from the training set and then applied to both the training and test sets, ensuring consistent feature ranges and preventing data leakage. As XGBoost natively handles missing values, winsorisation was applied only to the non-missing values, leaving NaN entries untouched for internal treatment by the model.

4 Methodology

This chapter provides a short overview of the different machine learning techniques used in the thesis, followed by a review of the model specifications, such as data processing within each model and the model parameters.

4.1 Machine Learning Models – general information on models used

This subchapter provides a short overview of the different machine learning techniques used in the thesis.

4.1.1 Random Forest

Introduced by Leo Breiman in 2001, Random Forest is an ensemble learning method that constructs a multitude of decision trees and combines their results for predictions (Ramchandran et al., 2021). The random forest algorithm selects a random subset of features for each weak estimator, a technique known as the random subspace method (Ranta, 2023, Ch. 8). The idea of random forest is to reduce overfitting and improve accuracy by averaging many uncorrelated decision trees; while a single decision tree can be prone to bias, a random forest reduces this by training multiple trees on random subsets of the data and then aggregating their outputs (IBM, n.d.-a).
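The train-then-apply winsorisation step can be sketched as follows. This is a minimal illustration with synthetic data: caps are estimated with `np.nanpercentile` on the training column only and then applied unchanged to the test column, and NaN entries pass through untouched, matching the XGBoost treatment described above.

```python
import numpy as np

def winsorise(train_col, test_col, lower=0.01, upper=0.99):
    """Cap values at percentiles estimated on the training set only,
    then apply the same caps to the test set (no leakage).
    NaNs propagate through np.clip and are left untouched."""
    lo, hi = np.nanpercentile(train_col, [lower * 100, upper * 100])
    return np.clip(train_col, lo, hi), np.clip(test_col, lo, hi)

rng = np.random.default_rng(2)
train = rng.normal(0, 1, 1000)
train[0] = 50.0                        # an extreme outlier to be capped
test = np.array([-40.0, 0.2, np.nan])  # test outlier capped at train's 1st pct
train_w, test_w = winsorise(train, test)
```

For the one-sided P/E rule, the same function would be called with `lower=0.0` on the positive values only, leaving the -999 sentinel outside the capping step.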
This so-called bagging (bootstrap aggregating) approach, combined with feature randomness, means that each tree produces a distinct result, and their average is more generalizable than any tree individually (IBM, n.d.-a). In practice, each tree is grown on a bootstrap sample of the training dataset, and at each split only a random subset of features is considered as candidates for splitting (Breiman, 2001). This process introduces diversity among the trees, preventing any single feature or data point from dominating the model. After training, the forest makes predictions by aggregating the trees' outputs. Random forest is capable of both classification and regression tasks: for a regression problem, the model averages the predicted values from all trees, and for classification, it takes a majority vote among the trees (IBM, n.d.-a).

4.1.2 XGBoost

Like random forests, gradient boosting is an ensemble method that combines multiple decision trees, but unlike random forest, boosting builds trees sequentially rather than in parallel (NVIDIA, n.d.). In gradient boosting, each new tree is trained to correct the errors (residuals) of the combined previous trees, gradually improving the model's performance (NVIDIA, n.d.). The model can be seen as a weighted sum of multiple weak learner trees, which together form a model with strong predictive power. The final model comes together by combining many small decision trees, each one correcting the mistakes of the previous ones. In contrast to random forests, which reduce variance by averaging many independent trees, boosting reduces bias by gradually improving the fit through a sequence of corrections (NVIDIA, n.d.).

XGBoost (Extreme Gradient Boosting) is one of the most successful implementations of gradient boosting techniques (Ranta, 2023, Ch. 8). It extends the basic gradient boosting framework with engineering improvements and regularization, enhancing model speed and accuracy (NVIDIA, n.d.).
XGBoost builds decision trees using a level-wise parallelization strategy: while boosting itself remains sequential tree by tree, XGBoost evaluates multiple candidate splits in parallel when growing each tree (NVIDIA, n.d.). It also includes a regularized learning objective, which adds a penalty for model complexity to the loss function, helping prevent overfitting and improving generalization (NVIDIA, n.d.). In practice, XGBoost's objective at each iteration minimizes a combination of the gradient-based loss (e.g. mean squared error) and a regularization term that penalizes overly complex trees (NVIDIA, n.d.).

4.1.3 RIDGE and LASSO Regression

Ridge and Lasso regression extend the scope of linear regression models by including regularization (Ranta, 2023, Ch. 9). The methods constrain the coefficient estimates and shrink them towards zero. The models address the common regression problems of multicollinearity and overfitting, improving robustness compared to traditional regression. Ridge and Lasso are considered among the best regularization techniques (Ranta, 2023, Ch. 9).

In a standard linear regression, the model finds coefficients that minimize the residual sum of squares, fitting the data as closely as possible. Both ridge and lasso add a penalty on the size of the regression coefficients to the loss function, which prevents the model from relying heavily on any specific variable, making the results more stable (IBM, n.d.-b).

Ridge regression (L2 regularization) shrinks coefficients towards, but never fully to, zero, retaining all predictors in the model while reducing their influence (IBM, n.d.-b). Ridge is useful when predictors are correlated, as it spreads their effect more evenly (IBM, n.d.-b).

Lasso regression (L1 regularization) can also shrink coefficients all the way to zero. In practice, Lasso can automatically select a smaller subset of predictors, which can improve model interpretability and efficiency (IBM, n.d.-b).
Lasso is valuable when identifying and selecting the most important predictors is a priority. Both models balance model complexity and prediction accuracy in different ways (IBM, n.d.-b).

4.2 Trained Models

This subchapter further reviews the model specifications, such as data processing within the model and model parameters for the trained models. The models were implemented and executed in Python using the Google Colab environment.

4.2.1 XGB1

An XGBoost model trained on the original data without imputation of missing variables, as the model can internally deal with missing values. This was the best performing model with both ASSETS and SIZE (LOG(ASSETS)), and it performed best after retraining the best parameters for the SIZE-based model around the prior best searches, with parameter distributions of:

"max_depth": randint(6, 18),
"min_child_weight": randint(1, 10),
"learning_rate": loguniform(0.01, 0.3),
"n_estimators": randint(150, 600),
"subsample": uniform(0.6, 0.4),
"colsample_bytree": uniform(0.6, 0.4),
"gamma": uniform(0.0, 2.0),
"reg_alpha": loguniform(1e-4, 10.0),
"reg_lambda": loguniform(1e-3, 30.0),

yielding best parameters of: {learning_rate ≈ 0.024, max_depth = 16, n_estimators = 356, subsample ≈ 0.643, colsample_bytree ≈ 0.889, gamma ≈ 0.472, min_child_weight = 9, reg_alpha ≈ 0.014, reg_lambda ≈ 0.0013}. The same parameter distributions apply to the following XGBoost models as well.

Hyperparameter optimization for the XGBoost model was conducted using scikit-learn's RandomizedSearchCV, which samples random combinations of parameter values from pre-specified ranges and evaluates each using five-fold cross-validation (cv=5). Sixty parameter combinations were evaluated (n_iter=60), and the configuration yielding the lowest cross-validated RMSE was selected. The model was then refitted on the full training set using these optimal parameters (refit=True) to obtain the final tuned estimator.
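The RandomizedSearchCV procedure described above can be sketched as follows. To keep the example self-contained and quick to run, it uses scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost, synthetic data, and a reduced search budget (n_iter=5, cv=3 instead of the 60 and 5 used in the thesis); only the distributions that overlap with the listing above are included, and the data and target are illustrative.

```python
import numpy as np
from scipy.stats import loguniform, randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy regression data standing in for the firm-year feature matrix
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, 200)

param_distributions = {
    "max_depth": randint(2, 8),
    "learning_rate": loguniform(0.01, 0.3),
    "n_estimators": randint(50, 200),
    "subsample": uniform(0.6, 0.4),   # scipy's uniform(loc, scale) = U[0.6, 1.0]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=5,        # the thesis uses n_iter=60
    cv=3,            # the thesis uses cv=5
    scoring="neg_root_mean_squared_error",
    random_state=0,
    refit=True,      # refit the best configuration on the full training set
)
search.fit(X, y)
```

Note that `uniform(0.6, 0.4)` in scipy parameterizes the interval by location and width, so the listed subsample and colsample distributions cover [0.6, 1.0], consistent with the best values reported above.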
This process of hyperparameter optimization applies to the following XGBoost and Random Forest models as well. Table 8 below describes what the XGBoost hyperparameters control in the model and their effects on model learning, based on the official XGBoost documentation (XGBoost Developers, 2024).

Table 8. Descriptions of XGBoost hyperparameters (XGBoost Developers, 2024).

learning_rate: Shrinkage factor applied to each tree's contribution. A smaller value slows learning and requires more trees, reducing overfitting.
max_depth: Maximum depth of individual trees. Deeper trees capture more complex nonlinear relationships but increase overfitting risk.
n_estimators: Number of boosting rounds. Determines total model complexity together with the learning rate: more trees compensate for a lower learning rate.
subsample: Fraction of the training data randomly sampled for each tree. Introduces randomness, improving generalization and reducing overfitting.
colsample_bytree: Fraction of features randomly selected for each tree. Reduces feature correlation effects and increases model diversity.
gamma: Minimum required loss reduction to make a further split. Acts as a regularization term: higher values make the algorithm more conservative, pruning weak splits.
min_child_weight: Minimum sum of instance weights in a child node. Prevents the creation of nodes representing too few samples or low variance, stabilizing deep trees and controlling overfitting.
reg_alpha: L1 regularization term on leaf weights. Encourages sparsity in tree leaf weights, reducing the number of active leaves and simplifying the model.
reg_lambda: L2 regularization term on leaf weights. Penalizes large weight magnitudes to reduce model variance and improve generalization stability.

4.2.2 XGB2

Same data and model as XGB1, but uses ASSETS instead of SIZE, and accordingly has different best parameters.
Best parameters: {'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 15, 'min_child_weight': 5, 'n_estimators': 290, 'subsample': 0.8}

4.2.3 XGB3

Trained on data with the same preprocessing steps as RF2. Best parameters: {'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 15, 'min_child_weight': 5, 'n_estimators': 290, 'subsample': 0.8}

4.2.4 RF1

This Random Forest model is trained on data where CA/CL is dropped, as approximately 25% of its observations are missing. For variables with low missingness, industry-mean imputation is used. For P/E, mean imputation is applied where EBIT is positive and -999 where EBIT is negative, with a P/E missing flag added for missing observations. Hyperparameters were tuned with RandomizedSearchCV, yielding best parameters: {'max_depth': 14, 'min_samples_leaf': 1, 'min_samples_split': 4, 'n_estimators': 140}

4.2.5 RF2

CA/CL is imputed with -999 for Industries 4 to 6 when missing; for missing values in Industries 1 to 3, industry-mean imputation is applied, and a CA/CL missing flag is added for missing observations. For P/E, mean imputation is applied where EBIT is positive and -999 where EBIT is negative, with a P/E missing flag added for missing observations. Hyperparameters were searched using RandomizedSearchCV, with parameter distributions of:

"n_estimators": [300, 500, 800],
"max_depth": [None, 8, 12, 16, 20],
"min_samples_split": [2, 4, 6, 10],
"min_samples_leaf": [1, 2, 4, 8],
"max_features": ["sqrt", "log2", 0.3, 0.5, 0.7, 1.0],
"bootstrap": [True],
"max_samples": [None, 0.6, 0.8],
"criterion": ["squared_error", "absolute_error"],
"min_impurity_decrease": [0.0, 1e-6, 1e-5, 1e-4],
"max_leaf_nodes": [None, 128, 256],
"ccp_alpha": [0.0, 1e-4, 1e-3]

resulting in best parameters of: {'n_estimators': 800, 'min_samples_split': 6, 'min_samples_leaf': 1, 'min_impurity_decrease': 1e-05, 'max_samples': None, 'max_leaf_nodes': None, 'max_features': 1.0, 'max_depth': 16, 'criterion': 'squared_error', 'ccp_alpha': 0.0001, 'bootstrap': True}.
Table 9 below describes what the Random Forest hyperparameters control in the model and their effects on model learning, based on the RandomForestRegressor implementation in scikit-learn (scikit-learn developers, 2024).

Table 9. Descriptions of RandomForest parameters (scikit-learn developers, 2024).

n_estimators: The number of trees in the ensemble. Larger numbers reduce variance through averaging, up to a point.
max_depth: Maximum depth of individual trees. Controls the detail of each tree's partitions. Shallow trees simplify relationships and reduce overfitting; deeper trees allow more complex patterns but increase variance.
min_samples_split: Minimum number of samples required to split an internal node. Higher values: fewer splits and smoother predictions. Lower values: deeper, more detailed trees.
min_samples_leaf: Minimum number of samples required to form a leaf node. Prevents leaves that represent very small sample subsets. Increasing it smooths predictions.
max_features: Proportion of predictors randomly considered at each split. Smaller values increase model diversity and reduce overfitting; larger values allow each tree to fit the training data more accurately.
bootstrap: Whether to sample training observations with replacement when building each tree. The default True enables classical bagging, producing more diverse trees and allowing estimation of out-of-bag error for validation.
max_samples: If set (< 1.0), limits the proportion of the training data used for each bootstrap sample. Using less than the full sample increases tree diversity and training speed but can increase bias.
criterion: Measure of split quality. "squared_error" minimizes mean squared deviations (the default for regression); "absolute_error" is more robust to outliers and yields median-based predictions.
min_impurity_decrease: A split is performed only if it decreases impurity (MSE or MAE) by at least this value.
Acts as a small regularization term, preventing low-gain splits and reducing overfitting.
max_leaf_nodes: Caps the number of terminal nodes in each tree. Provides a direct upper bound on tree complexity, similar to limiting max_depth. Smaller values simplify the model and control overfitting.
ccp_alpha: Cost-complexity pruning parameter. After trees are built, branches that contribute less than ccp_alpha to overall model performance are pruned. Larger values yield simpler, more regularized models.

4.2.6 RF3

Instead of the scikit-learn library's RandomForestRegressor, this model uses the xgboost library's XGBRFRegressor. It applies random forest-style bagging instead of boosting residuals and can natively handle missing values, with a parameter grid of:

"n_estimators": [100, 200, 300, 500],
"max_depth": [3, 5, 7, 9, 12],
"subsample": [0.6, 0.8, 1.0],
"colsample_bynode": [0.6, 0.8, 1.0],
"reg_alpha": [0.1, 0.5, 1],
"reg_lambda": [1, 2, 3]

resulting in best parameters of: {'n_estimators': 200, 'max_depth': 12, 'subsample': 1.0, 'colsample_bynode': 0.6, 'reg_alpha': 1, 'reg_lambda': 1}

4.2.7 RIDGE

L2 regularization. Ridge minimizes the squared error with a penalty on the squared magnitude of the coefficients. It helps prevent overfitting by shrinking coefficients, which is especially useful when predictors are correlated. Ridge regression natively handles multi-target outputs. Variables with excessive missingness (CA/CL and P/E) were dropped; for the other variables with low missingness, industry-mean imputation was applied. Because asset values typically span several orders of magnitude and can be right-skewed, a logarithmic transformation was applied. The model was trained with sklearn.linear_model.Ridge. Features were standardized using StandardScaler, and the model used a fixed penalty term alpha=1.0.

4.2.8 LASSO

L1 regularization for multi-output. LASSO encourages sparsity by penalizing the absolute values of the coefficients, which can set some coefficients exactly to zero. The MultiTaskLasso variant enforces a common sparsity structure ac