Wubshet Solomon
A Comparative Analysis of the Use of Deep Learning and Machine Learning in Weather Forecasting: Using Meteorological Dataset on Vaasa
Vaasa 2023
School of Technology and Innovations
Master's thesis in Industrial Systems Analytics

UNIVERSITY OF VAASA
School of Technology and Innovation
Author: Wubshet Solomon
Title of the thesis: A Comparative Analysis of the Use of Deep Learning and Machine Learning in Weather Forecasting: Using Meteorological Dataset on Vaasa
Degree: Master of Science in Technology
Programme: Industrial System Analytics
Supervisor: Assistant Professor Emmanuel Ndzibah
Year: 2023
Pages: 50

ABSTRACT:
This study presents a comparative analysis of two prominent technologies, deep learning and machine learning, in the context of weather forecasting. The main research question is: "How can machine learning and deep learning algorithms be implemented to obtain near-accurate weather forecasts?" The objectives of this research are to identify the fundamental differences between deep learning and machine learning algorithms in handling weather-related datasets, and to ascertain the accuracy of deep learning compared with machine learning in weather forecasting.

The study begins with a detailed overview of deep learning and machine learning techniques, explaining their fundamental principles and highlighting how they are applied to weather data. The research focuses on applying polynomial regression, gradient boosting, NeuralProphet, and recurrent neural network models to weather forecasting. The study applies a quantitative methodology and uses an open-source dataset from the Finnish Meteorological Institute containing weather records collected in the city of Vaasa. The comparative analysis employs these techniques to capture nonlinear relationships between weather variables and patterns within the dataset.
Moreover, the study investigates the performance of each technique and evaluates its effectiveness in forecasting weather conditions over different time intervals using performance evaluation metrics. The outcomes of the comparative analysis provide insights into applying recent machine learning and deep learning methods with regard to the quality and amount of data used. This includes the proper implementation of data pre-processing techniques, which significantly affect model accuracy.

KEYWORDS: Deep Learning, Machine Learning, Weather Forecasting

Contents
1 Introduction
1.1 Background of the study
1.2 Research Gap, Questions and Objectives
1.3 Definitions and Limitations
1.4 Research process
1.5 Structure of the study
2 Literature Review
2.1 Machine Learning for Weather forecasting
2.1.1 Supervised Learning for weather forecasting
2.1.2 Unsupervised Learning for weather forecasting
2.2 Deep Learning for weather forecasting
2.2.1 Neural Networks applications in weather forecasting
3 Methodology
3.1 Research Design
3.2 Data Preprocessing
3.3 Model Selection
3.3.1 Polynomial Regression for weather forecasting
3.3.2 Gradient Boosting for weather forecasting
3.3.3 Recurrent NN (RNN) for weather forecasting
3.3.4 NeuralProphet application in weather forecasting
3.4 Performance Evaluation
4 Research result and Analysis
4.1 Analysis of the dataset
4.2 Analysis of Model results
5 Conclusion
5.1 Key findings
5.2 Conclusions
References

List of Figures
Figure 1. Research Process
Figure 2 Study Structure
Figure 3 Flow chart
Figure 4 Unsupervised ML as a single-step process
Figure 5 Biological Neuron
Figure 6 Multilayer perceptron ANN
Figure 7 Dataset Header
Figure 8 Proposed Model
Figure 9 Data Pre-processing steps
Figure 10 Linear vs Non-Linear relationship
Figure 11 RNN Structure
Figure 12 Yearly distribution of the feature, temperature
Figure 13 Trend and Seasonality simulations
Figure 14 Number of Null values in the dataset
Figure 15 Duplicated value analysis
Figure 16 Boxplot for Maximum temperature
Figure 17 Heatmap for feature selection
Figure 18 Polynomial Regression evaluation metrics
Figure 19 Actual and prediction value comparison
Figure 20 Gradient Boosting model evaluation metrics
Figure 21 LSTM model prediction
Figure 22 LSTM model evaluation metrics
Abbreviations
AI Artificial Intelligence
ANN Artificial Neural Network
AR Auto Regression
DL Deep Learning
ML Machine Learning
NWP Numerical Weather Prediction
NN Neural Network
WF Weather Forecasting
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MSE Mean Squared Error
RMSE Root Mean Squared Error

1 Introduction
1.1 Background of the study
Weather forecasting is an intensive process that involves collecting and analyzing atmospheric observations based on location and time (Chen et al., 2022). Because weather forecasts are a crucial element of daily human activities, traditional forecasting techniques have rapidly transformed into data-driven technologies. One of the pioneering mathematical models in this field is Numerical Weather Prediction (NWP), which describes hydrodynamic processes in the atmosphere using a collection of equations (Rozas, 2019). These equations iteratively process current weather observations to forecast future weather conditions. Despite the success of NWP, its output retains uncertainty due to the equations used in the method (Cho et al., 2022). Moreover, NWP has difficulty capturing the patterns in observation data. Additionally, high-performance computing resources are needed to process the massive amount of data required for accurate predictions (Ren et al., 2021).

ML and DL techniques are increasingly being applied in weather forecasting, and significant progress has been made in addressing challenges such as handling large datasets, improving computational capabilities, and increasing prediction accuracy (Schultz et al., 2020). ML algorithms are trained on a dataset and predict future values based on the behavior learned from the input data.
Several ML techniques are used for weather prediction; regression and random forest are popular choices. DL algorithms, on the other hand, train neural networks that mimic the structure of the human brain on very large datasets. DL is particularly suitable for capturing complex, non-linear relationships in weather data, which makes it a powerful technique for improving forecasting accuracy.

This research paper aims to review distinct ML and DL technologies applied in weather forecasting, explain their theoretical background, and describe how the forecasting process is handled. Moreover, it evaluates the performance of both technologies in terms of accuracy, precision, and other scores.

1.2 Research Gap, Questions and Objectives
Based on the main keywords of this research, different scientific papers were reviewed to identify the research gap. Several academic publication databases were searched for relevant resources, and the articles and journals published in the IEEE database within the last five years are listed in Table 1.

Table 1. Research gaps
Keywords: Machine Learning & Deep Learning, Weather Forecasting
Timeline: 2017 - 2022

Year | Database | Hits | Description
2021 | IEEE | 3 | Air Temperature Forecasting using Traditional and Deep Learning Algorithms (Li et al., 2020)
2021 | IEEE | 69 | Air Temperature Forecasting using Traditional and Deep Learning Algorithms (Chengsi et al., 2021)
2016 | IEEE | 176 | Weather forecasting using deep learning techniques (Ayman et al., 2021)
2022 | IEEE | 47 | Rainfall Prediction using Different Machine Learning and Deep Learning Algorithms (Mahadware et al., 2022)
2021 | IEEE | 55 | Forecasting of Temperature by using LSTM and Bidirectional LSTM approach: Case Study in Semarang, Indonesia (Nizar et al., 2021)

According to Chengsi et al. (2021), weather forecasting requires enormous amounts of data as input.
In addition, the dynamic nature of the collected data makes it harder to interpret and to draw meaningful conclusions from, which significantly affects the accuracy of the prediction. The emergence of ML technologies plays a significant role in discovering hidden patterns when processing massive data and in producing near-accurate predictions.

DL strategies have an extraordinary capacity to investigate and grasp subtle patterns contained within enormous and complex datasets, which ultimately produces dependable and correct results. DL has recently shown tremendous breakthroughs in weather forecasting, allowing forecast providers to better serve their customers (Ayman et al., 2021).

The research applies selected ML and DL technologies to the sample data and evaluates the accuracy of each method using different performance metrics. Based on the results obtained during the process, the study aims to answer the following research questions.
o RC1: How can different deep learning and machine learning algorithms be measured, analyzed, and compared using performance metrics based on the Finnish Meteorological Institute dataset on Vaasa?
o RC2: How can such an algorithm be implemented to obtain near-accurate weather forecasts?
The main objectives of this study are:
1. To identify the fundamental differences between deep learning and machine learning algorithms in handling time series datasets, specifically weather-related datasets.
2. To identify the pros and cons of deep learning and machine learning for weather forecasting.
3. To ascertain the accuracy of deep learning compared with machine learning in weather forecasting.

1.3 Definitions and Limitations
A weather dataset is an accumulation of meteorological data comprising different atmospheric parameters. These parameters are recorded at specific locations over a period of time.
Meteorology uses these datasets for a wide variety of purposes, including scientific studies, weather forecasting, and the analysis of environmental conditions. Weather datasets are used for future prediction by numerical weather prediction models, which contain innate errors. This uncertainty derives from the complicated structure of weather systems, model assumptions, and limits in input data assimilation. However, this study is limited to models that learn patterns from the dataset rather than applying a set of mathematical rules, as numerical weather prediction does.

Weather forecasting is the process of predicting future atmospheric conditions for a specific location and time using scientific techniques (Singh and Chaturvedi, 2019). The raw data used for weather prediction are in time series format, that is, values recorded repeatedly over time (Afteniy, 2021). In recent years, many technologies have been developed to handle such predictions effectively. Although contemporary forecasting systems have substantially increased their accuracy, a degree of uncertainty remains. It may be challenging to accurately forecast small-scale meteorological events such as thunderstorms or heavy rainfall. In addition, both short-term and long-term forecasting are uncertain and challenging due to the complex interactions of numerous atmospheric processes and the limitations of models.

According to Soori et al. (2022), machine learning is defined as "a technology that represents important evolution in computer science and data processing systems which can be used in order to enhance almost every technology enabled service, product and industrial applications". In addition, it learns from data and draws patterns that are used for prediction or classification.
In the case of weather forecasting, ML models can be trained on enormous amounts of meteorological data to improve their ability to predict the weather. These algorithms anticipate the weather by analyzing the dataset and discovering patterns and associations.

To provide reliable forecasts, machine learning algorithms need large volumes of high-quality data. In addition, data coming from various sources may contain errors or inconsistencies. These limitations affect the accuracy of ML models.

Deep Learning is a specialized subfield of machine learning based on neural networks with multiple layers, loosely resembling the structure of the human brain (LeCun et al., 2015). Each layer processes the data and passes its output to the next layer until the network captures the required features. The number of layers and neurons in the structure depends on the volume of data. From the perspective of weather prediction, DL describes the process of using neural network models with numerous layers to evaluate and predict weather trends. These models strive to identify complicated links and patterns in the data, which gives them the ability to create forecasts based on previous observations.

The training procedure for DL models is time-consuming and expensive, which is one of their limitations. Because of these models' intricate design and extensive list of parameters, significant computing resources, such as high-powered graphics processing units or specialized hardware, are necessary. Training DL models on big meteorological datasets takes a substantial amount of time, which may restrict their ability to provide accurate real-time predictions.

1.4 Research process
This study focuses on analyzing the performance of selected ML and DL technologies that implement data-driven techniques.
The theoretical background of these technologies has been reviewed from different scientific papers. The study applies a quantitative methodology, and the research process is divided into the phases listed below.
o Fetching Input Data
The dataset used for this research was retrieved from the Finnish Meteorological Institute and contains hourly records of minimum and maximum temperature for the city of Vaasa.
o Pre-processing Data
The first step converts the raw input data into the proper format and data types required by the ML and DL algorithms. In addition, null values, duplicate values, and erroneous values are handled and dropped. For this purpose, pandas, a Python library for data manipulation and analysis, is applied throughout this task.
o Splitting Data
The pre-processed data is divided into two independent datasets, namely the training and testing sets. The training set covers 70-80% of the whole dataset and is used to train the algorithm to learn the behavior of the data and build a new model that makes predictions for a given scenario. The testing set, which contains the rest of the dataset, is used to evaluate the performance of the newly created model.
o Building Model
After the dataset has been suitably cleaned and prepared, the training set is fed into the selected ML and DL algorithms. This results in a new model that forecasts the temperature for a given time.
o Testing Model
The testing set is used to check the performance of the models. The accuracy level of each model is also calculated using this dataset.

Figure 1. Research Process

1.5 Structure of the study
This research paper contains five chapters, which sequentially explain the topic from its background to the conclusion. Each chapter is described in the following list.
o Chapter 1: Introduction
This chapter starts with a brief description of the study background and explains the research gap, research questions, and main research objectives. It also includes definitions of the main keywords, the limitations of the study, and the research process.
o Chapter 2: Literature review
This chapter presents a detailed literature review of ML and DL technologies related to weather forecasting.
o Chapter 3: Methodology
This chapter covers four methods representing ML and DL algorithms, following the steps described in the research process.
o Chapter 4: Result and Discussion
This chapter answers the research questions and explains the results obtained using the methods described in the previous chapter.
o Chapter 5: Conclusion
This final chapter summarizes the results based on the research objectives defined in the first chapter.

Figure 2 Study Structure

2 Literature Review
ML and DL have made tremendous advances in time series forecasting over the past few years. These technologies enable scientific forecasts based on historical time-stamped observational data. This chapter discusses relevant academic literature on how machine learning and deep learning technologies are approached and implemented for weather forecasting.

In ML, input data is processed to extract a relevant collection of features that are used to train a model. The model then learns how to map the features to the desired output, using statistical techniques such as classification, regression, or clustering. The performance of the model is then tested against a test dataset using an evaluation method.

DL, on the other hand, involves feeding the input data directly into a network of nodes made up of several layers. The NN extracts features from the input data in each layer and uses them to make predictions or classifications.
The network is then trained using an algorithm that optimizes the weights of the nodes in order to minimize the difference between the predicted and actual output (Hinton, 2015).

Figure 3 Flow chart
"Flow chart shows how the different parts of an AI system relate to each other within different AI disciplines. Shaded boxes indicate components that are able to learn from data" (Goodfellow et al., 2016)

2.1 Machine Learning for Weather forecasting
Machine learning has shown tremendous promise for increasing the accuracy of weather forecasting. Machine learning algorithms can generate forecasts and give useful insights into future weather conditions because they use past weather observations to train models that capture complicated relationships and correlations (Holmstrom et al., 2016).

In recent years, research applying ML to weather forecasting has increased broadly across scientific fields. ML combines mathematical techniques with prior knowledge, or experience, to improve its performance and generate precise forecasts. Experience refers to the past information available to the learner, which typically takes the form of digital data collected and made available for analysis. This data could be digitized human-labeled training sets or other types of information obtained via interaction with the environment. The quality and size of this data are critical to the success of the predictions made by the ML model (Mohri et al., 2016).

Machine learning is classified into three categories, namely supervised, unsupervised, and reinforcement learning, based on the learning algorithm, the input data type, and the problem to be addressed (Sah, 2020).

2.1.1 Supervised Learning for weather forecasting
Supervised learning, which uses past meteorological data to train prediction models, has seen widespread adoption in weather forecasting.
This method makes it possible to create precise models that provide predictions based on the features fed into them. Several studies of supervised learning in weather prediction have shown the efficiency of these methods in capturing historical trends and boosting prediction accuracy (Brown et al., 2019).

In supervised ML, the weather data is a collection of labeled examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$. Each element $\mathbf{x}_i$, $i = 1, \dots, N$, is a feature vector: each of its dimensions $j$ holds a value $x^{(j)}$, called a feature, that characterizes the sample in some way. The label $y_i$ can be an element of a finite set of classes (or, in regression, a real number) and identifies the category to which the feature vector $\mathbf{x}_i$ belongs. The main objective of the supervised learning approach is to use the dataset to produce a model that takes a feature vector as input and outputs information from which the label for that feature vector can be inferred (Burkov, 2019).

Supervised learning algorithms can be further grouped into classification and regression problems.
1 Classification: the process of identifying the class to which a new data point belongs, based on a dataset that already contains observations with known class membership. Classes are commonly known as targets or labels and serve as categories for grouping items. For instance, spam detection in email services is a binary classification task involving only two classes (Campesato, 2020).
The field of machine learning covers various classification algorithms, enumerated as follows (Campesato, 2020).
• Decision trees: a classification algorithm that uses a tree-like structure, in which the position of a data point is established through simple conditional reasoning.
• Random Forests: an extension of decision trees in which classification uses multiple trees, the number of which is set by the user.
• kNN (k Nearest Neighbours): a classification technique in which data points are assigned to the same class based on their proximity to one another. When a new point is introduced, it is assigned to the class held by most of its closest neighbours.
• Logistic regression: a statistical method that serves as both a classifier and a linear model, producing a binary output. It handles multiple independent variables and uses a sigmoid function to compute probabilities.
• Naïve Bayes: a probabilistic classifier based on Bayes' theorem. The Naïve Bayes classifier assumes conditional independence among attributes and performs well even when this assumption does not strictly hold. The assumption significantly reduces computational cost and yields a straightforward implementation that requires only linear time.
• SVM (Support Vector Machines): a supervised machine learning algorithm capable of addressing classification or regression problems. SVMs can operate on data that is linearly separable as well as on data that is only nonlinearly separable.
2 Regression: The linear regression algorithm is widely used in regression analysis to learn a model that is a linear combination of input features (Burkov, 2019). The objective of linear regression is to determine the line of best fit that most accurately reflects a given dataset. Two fundamental aspects should be kept in mind. First, the optimal regression line does not necessarily pass through most, or all, of the data points in the dataset.
The objective of determining a best-fitting line is to minimize the vertical deviations between the line and the data points (in ordinary least squares, the sum of their squares). Second, linear regression cannot determine the optimal polynomial fit; that task requires identifying a higher-degree polynomial that passes close to a significant number of the data points in a given dataset (Campesato, 2020).
Moreover, a dataset in a two-dimensional plane may contain two or more points that lie on a common vertical line, meaning that these points share the same x value. A function cannot pass through two such points (x1, y1) and (x2, y2) with x1 = x2 unless their y values are identical (i.e., y1 = y2). Conversely, a function can pass through multiple points that lie on a common horizontal line.

2.1.2 Unsupervised Learning for weather forecasting
Unsupervised learning is widely used in weather forecasting to find hidden patterns in meteorological data that lacks explicitly identified target variables. Although unsupervised learning does not directly provide predictions, it can give insightful information and help with feature mining, anomaly detection, and clustering for weather analysis (Lin et al., 2019).

Figure 4 Unsupervised ML as a single-step process

Unsupervised learning uses techniques to detect trends in datasets whose data points carry no classification or labels. The algorithms can classify, label, and group data points within such datasets autonomously, without external direction (Dridi, 2021).
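As a minimal sketch of this kind of unlabeled grouping, the following Python example clusters pairs of daily minimum and maximum temperature with k-means, a widely used clustering algorithm. It is an illustration under stated assumptions, not part of the study's pipeline: the temperature values are synthetic and are not taken from the FMI dataset, the helper names `farthest_point_init` and `kmeans` are hypothetical, and the farthest-point seeding is a simplifying stand-in for the k-means++ initialisation that common libraries use.

```python
import numpy as np


def farthest_point_init(points: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Pick one random point, then repeatedly add the point farthest from the
    centroids chosen so far (simplified seeding, assumed for this sketch)."""
    rng = np.random.default_rng(seed)
    chosen = [points[rng.integers(len(points))]]
    while len(chosen) < k:
        # Distance from every point to its nearest already-chosen centroid.
        dists = np.linalg.norm(points[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        chosen.append(points[dists.min(axis=1).argmax()])
    return np.array(chosen, dtype=float)


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    centroids = farthest_point_init(points, k, seed)
    for _ in range(n_iter):
        # (n_points, k) matrix of Euclidean distances to each centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if no point was assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic (min, max) daily temperatures in degC for two regimes;
# illustrative values only, not FMI observations.
rng = np.random.default_rng(42)
cold_days = rng.normal(loc=[-10.0, -4.0], scale=1.5, size=(50, 2))
warm_days = rng.normal(loc=[12.0, 20.0], scale=1.5, size=(50, 2))
records = np.vstack([cold_days, warm_days])

labels, centroids = kmeans(records, k=2)
print("cluster sizes:", np.bincount(labels))
print("centroids (min, max):", centroids.round(1))
```

Because no labels are supplied, the algorithm only discovers that the records fall into two groups; interpreting those groups (here, roughly cold and warm regimes) is left to the analyst.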
Unsupervised machine learning methods are used when a target feature is not present; instead, the model captures the fundamental structure inherent in the descriptive features of a given dataset. This structure is commonly represented through newly created features that can be added to the initial dataset, thereby enhancing or supplementing it (Kelleher et al., 2020).

Clustering is an unsupervised learning approach that uses a distance metric and iteratively moves similar items closer together. When the process completes, the items clustered most densely around each of the n centroids are considered to belong to that group. K-means clustering is a well-known variant of clustering in machine learning (Patterson & Gibson, 2017).

K-means clustering and hierarchical clustering are two widely recognized unsupervised clustering algorithms. K-means is a well-established clustering method and a prominent example of unsupervised learning. Owing to its straightforward concept, high efficiency, and simple implementation, it has been used extensively across various domains (Chong, 2021).

2.2 Deep Learning for weather forecasting
Because it can automatically capture complicated patterns and sequential correlations in data, deep learning has attracted substantial interest in weather forecasting in recent years. As meteorological data continues to grow in volume, intelligent technologies have begun to play a significant role in weather forecasting (Chen et al., 2019).

Deep learning techniques for image analysis and recognition are used extensively to identify atmospheric radar and satellite cloud images, as well as to predict inversions that will occur later.
This enables automatic observation of meteorological phenomena (Chen et al., 2022).

DL has gained huge popularity in recent years due to its capacity to process enormous amounts of data and produce near-accurate predictions. According to Ekman (2021), "DL is a class of machine learning algorithms that use multiple layers of computational units where each layer learns its own representation of the input data". The fundamental building block of DL is the ANN, which simulates the biological neurons in the human brain. The brain consists of billions of neurons interconnected through synapses, which exchange electrical signals. In an artificial neuron, the activation function determines the neuron's output from the weighted sum of its inputs plus a constant called the bias, which shifts the input of the activation function toward its positive or negative part (Géron, 2022).

Figure 5 Biological Neuron (Géron, 2022)

DL makes conventional ML more powerful by introducing more complex behaviour into the model, which is achieved by adding extra layers to the NN design. Moreover, DL transforms data with different functions that allow sequential representation at several levels of abstraction. This enables DL models to achieve higher accuracy in a variety of applications, including weather forecasting (Kamilaris et al., 2018).

2.2.1 Neural Networks applications in weather forecasting
Meteorology uses neural networks for a wide variety of applications, including weather forecasting. The behaviour of a neural network is dictated by the network topology, the connection strengths, and the processing carried out at the computation units, also known as nodes. A neural network is a system built from many simple processing units that operate in parallel. The adaptability of neural networks is one of their most fundamental aspects.
Because of this property, ANN approaches are especially attractive in weather forecasting for modelling highly nonlinear events (Baboo et al., 2010).

Like other machine learning models, a neural network can be considered a mathematical function:

$$y = f_{NN}(\mathbf{x}) \tag{1}$$

The function $f_{NN}$ has a specific structure: it is a nested (composite) function. For a network with three layers:

$$y = f_{NN}(\mathbf{x}) = f_3(f_2(f_1(\mathbf{x}))) \tag{2}$$

The layer functions $f_1$ and $f_2$ have the form:

$$f_l(\mathbf{z}) \stackrel{\text{def}}{=} \mathbf{g}_l(\mathbf{W}_l\mathbf{z} + \mathbf{b}_l) \tag{3}$$

The variable $l$ is the layer index, which ranges from 1 to the number of layers. The function $\mathbf{g}_l$ is an activation function. The data analyst typically chooses a non-linear activation function before learning begins. The matrix $\mathbf{W}_l$ and the vector $\mathbf{b}_l$ of each layer are learned through gradient descent optimization, with a cost function that depends on the task at hand (Burkov, 2019).

Three categories of deep neural networks are currently in widespread use.

1. Multilayer Feed-Forward Networks

A multilayer feed-forward network consists of an input layer, one or more hidden layers, and an output layer. Each layer comprises one or more artificial neurons. These neurons resemble their predecessor, the perceptron, but their activation function varies according to the layer's purpose within the network (Patterson & Gibson, 2017).

Figure 6 Multilayer perceptron ANN (Patterson & Gibson, 2017)

2. Convolutional neural network (CNN)

The Convolutional Neural Network (CNN) is a special type of feed-forward neural network (FFNN). It substantially reduces the number of parameters in a deep network with many units while maintaining satisfactory model accuracy.
CNNs have been applied in various domains such as image and text processing, where they outperformed previously established approaches (Burkov, 2019). Their success in image recognition is a significant reason why the capabilities of DL are now so widely acknowledged (Patterson & Gibson, 2017).

3. Recurrent Neural Network (RNN)

Recurrent Neural Networks (RNNs) are a highly expressive class of models commonly used for tasks involving sequences (Sutskever et al., 2019). They can handle inputs of varying length. RNNs can also represent the hierarchical structures present in the training dataset, which sets them apart from other types of neural networks (Patterson & Gibson, 2017).

In traditional neural networks, inputs and outputs are independent of one another. However, some tasks require the network to remember earlier inputs. For example, predicting the next word in a sentence requires remembering the previous words. RNNs address this problem with a hidden layer. Their most important characteristic is the hidden state, which retains information about the sequence. The hidden state is also called the memory state, because it stores information about the most recent inputs to the network. An RNN performs the same operation on every input and uses the same weights at every step to generate its output. Compared with other neural networks, this weight sharing reduces the number of parameters.

Figure 7 Dataset Header

3 Methodology

This chapter describes the research methods and technologies used to collect, analyze, and interpret the dataset for the study.
Moreover, it gives a clear and detailed description of how the study was conducted in terms of research design, data selection, model selection, and data analysis tools.

3.1 Research Design

The research design compares the performance of ML and DL models for weather forecasting. Given the nature of the research topic, the methodology is quantitative and focuses on retrieving and analyzing an open-source dataset. Mathematical, statistical, and computational tools are used to analyze the data and obtain results.

The dataset was obtained from the Finnish Meteorological Institute. It contains hourly historical weather observations from the automatic observation station in Vaasa, western Finland. The dataset is in CSV format, which is fast to import and export. It includes variables such as temperature, atmospheric pressure, humidity, wind, and solar radiation. This study, however, focuses specifically on forecasting the maximum and minimum temperature.

Based on the selected dataset, the study proposes models that use both ML and DL technologies for weather forecasting. The model design uses a variety of algorithms, including polynomial regression, gradient boosting, and recurrent neural networks.

Figure 8 Proposed Model

The main purpose of the models is to forecast temperature accurately from the dataset. The process consists of the following two main steps.

• Train the model on 75% of the data, using the algorithms stated above. The training data are fed into the model so that it learns the patterns and trends of the input data.
• Evaluate the model on the remaining 25% of the dataset to verify its performance. This step establishes the accuracy and reliability of the model.
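The 75/25 procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code. It assumes the observations are already sorted from oldest to newest and keeps that order (no shuffling), so the test set contains the most recent 25% of the data. The function name `chronological_split` and the synthetic one-year series are hypothetical.

```python
import numpy as np


def chronological_split(X, y, train_fraction=0.75):
    """Split time-ordered arrays into training and test sets without shuffling.

    Rows are assumed to be sorted from oldest to newest. The first
    `train_fraction` of the rows become the training set and the most recent
    rows become the test set, so the model is never trained on "future" data.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Hypothetical one-year daily series standing in for the Vaasa observations:
# the day index is the feature and a seasonal temperature curve is the target.
days = np.arange(365).reshape(-1, 1)
temperature = 5 + 10 * np.sin(2 * np.pi * days[:, 0] / 365)

X_train, X_test, y_train, y_test = chronological_split(days, temperature)
print(len(X_train), len(X_test))  # 273 92
```

The same function applies unchanged to hourly rows. A random split would instead mix later observations into the training set, which tends to make forecast accuracy look better than it is.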
3.2 Data Preprocessing

Figure 9 Data Pre-processing steps (Sharda et al., 2021)

The study carries out several pre-processing steps to ensure that the weather dataset is suitable for ML and DL algorithms.

• Data Cleaning: The historical weather data were checked for incomplete records, errors, missing values, and outliers. Problematic entries were removed using appropriate data analysis tools.
• Feature Engineering: New variables were created from the existing variables to provide more informative inputs.
• Data Normalization: The dataset was cleaned and standardized so that all variables have the same format, scale, and range.
• Dimension Reduction: The data were reduced to a lower-dimensional space that preserves the meaningful properties of the raw data.

3.3 Model Selection

Four models suitable for weather forecasting were selected from machine learning (ML) and deep learning (DL).

3.3.1 Polynomial Regression for weather forecasting

Weather forecasting often requires analyzing time-series data, in which the variables fluctuate over time. Polynomial regression assumes that the relationship between the independent and dependent variables does not change over time. It may therefore fail to capture the time-based patterns and dynamics of a meteorological dataset.

Weather observations typically exhibit nonlinear behaviour. As a consequence, a linear regression model fits such data poorly and will not forecast it accurately, because no straight line can account for the majority of the meteorological data. The resulting forecasts would be too uncertain. Polynomial regression therefore becomes the preferred option, since it can follow the curve of the data while keeping the error small. According to Peck et al.
(2012), polynomial regression is a form of regression analysis in which the relationship between an independent variable (X) and a dependent variable (Y) is modeled as an nth-degree polynomial. Polynomial regression is also known as polynomial modeling. It is an ML model that fits a non-linear regression curve to capture a non-linear relationship between the two variables. Polynomial regression is represented by the equation:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_n X^n + e \tag{1}$$

Figure 10 Linear vs Non-Linear relationship (Cukrowski, 2022)

The symbols are defined as follows:
• $Y$ is the dependent variable.
• $X$ is the independent variable.
• $\beta_0, \beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the equation.
• $n$ is the degree of the polynomial.
• $e$ is the error term.

Polynomial regression can capture non-linear trends in the data, which makes it a suitable choice for forecasting time-series data. In such data, the relationship between the dependent and independent variables is typically non-linear.

3.3.2 Gradient Boosting for weather forecasting

Gradient boosting is a modern machine learning approach that can be applied to weather prediction. Weather forecasting means predicting future weather conditions from past observations. The gradient boosting algorithm is one tool that can improve the accuracy of these forecasts.

According to Friedman (2001), "Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data". It is a common ML technique that has recently gained popularity. It works by combining a group of simple, weak learners into a single, stronger model. The method has proven highly effective in a variety of applications, including weather forecasting.
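The idea of combining weak learners, which is formalised in equations (2) and (3) of this section, can be illustrated with a minimal from-scratch sketch. It assumes squared-error loss, a single input feature, and depth-1 regression trees ("stumps") as the weak learners. The function names are hypothetical and the data are synthetic. Library implementations (for example in scikit-learn or XGBoost) use deeper trees and many more options, and this sketch is not the implementation used in the study.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to targets r on one feature x.

    Returns (threshold, left_value, right_value) minimising squared error.
    """
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse = np.inf
    best = (xs[0], rs.mean(), rs.mean())  # fallback: constant prediction
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def gradient_boosting_fit(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for regression with squared-error loss."""
    f0 = y.mean()                                  # equation (2): mean label
    prediction = np.full(len(y), f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction                 # equation (3): residuals become labels
        stump = fit_stump(x, residuals)
        prediction += learning_rate * predict_stump(stump, x)  # f <- f + alpha * f_m
        stumps.append(stump)
    return {"f0": f0, "stumps": stumps, "learning_rate": learning_rate}


def gradient_boosting_predict(model, x):
    prediction = np.full(len(x), model["f0"], dtype=float)
    for stump in model["stumps"]:
        prediction += model["learning_rate"] * predict_stump(stump, x)
    return prediction


# Hypothetical noisy seasonal temperature curve (synthetic, not the FMI data).
rng = np.random.default_rng(0)
day = np.linspace(0, 364, 200)
temp = 5 - 10 * np.cos(2 * np.pi * day / 365) + rng.normal(0, 1, day.size)

model = gradient_boosting_fit(day, temp)
fitted = gradient_boosting_predict(model, day)
print(np.mean((temp - fitted) ** 2), np.mean((temp - temp.mean()) ** 2))
```

Each round fits a stump to the current residuals and adds a shrunken copy of it to the model. As a result, the training error shrinks step by step rather than being fixed in one large jump.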
The concept of boosting is the foundation of another efficient ensemble learning approach, gradient boosting. Consider first gradient boosting for regression. To build a strong regressor, start with a constant model $f = f_0$, the mean of the training labels:

$$f = f_0(\mathbf{x}) \stackrel{\text{def}}{=} \frac{1}{N}\sum_{i=1}^{N} y_i \tag{2}$$

Next, the label of each example $i = 1, \dots, N$ in the training set is modified as follows:

$$\hat{y}_i \leftarrow y_i - f(\mathbf{x}_i) \tag{3}$$

Here, $\hat{y}_i$ is the residual, which becomes the new label of example $\mathbf{x}_i$. The modified training set, with residuals in place of the original labels, is used to build a new decision tree model $f_1$. The boosting model is then defined as $f \stackrel{\text{def}}{=} f_0 + \alpha f_1$, where the learning rate $\alpha$ is a hyperparameter. The residuals are then recomputed with the updated $f$, and the process is repeated until the chosen number of trees has been added.

3.3.3 Recurrent NN (RNN) for weather forecasting

Recurrent neural networks are a prominent kind of deep learning model used for time-series modeling applications such as weather forecasting. RNNs are especially useful for sequential data because they can describe sequential relationships and generate predictions from the context of previous records. However, long-term dependencies or sudden shifts in the sequence of weather data can cause difficulties. Two remedies are possible. More sophisticated architectures, such as transformer-based models, can be applied. Additional inputs, such as geographical or satellite data, can also be incorporated into the model to improve forecast accuracy.

Figure 11 RNN Structure

The Recurrent Neural Network (RNN) is an architectural design that originated in the 1980s. RNNs are a fitting choice for datasets with sequential data (Campesato, 2020). Weather forecasting, stock price forecasting, and energy demand prediction are all time-series prediction problems.
In these examples, events happen in a time-ordered sequence, and previous events affect current and future ones. RNNs learn from data sequences to tackle time-series problems. They pass the hidden state from one step of the sequence to the next and combine it with the input. However, the memory of an RNN is generally short-term: it stores the immediately preceding state and merges it with the current input. In this way, an RNN handles time-based or sequence-based data (Peter, 2021).

Assume that the input sequence is denoted $x_1, x_2, x_3, \dots, x(t), \dots$ and the hidden state sequence $h_1, h_2, h_3, \dots, h(t)$. Each input is represented as a $1 \times n$ vector, where $n$ is the number of features, and each hidden state is also a vector.

At time $t$, the network combines $h(t-1)$ and $x(t)$. An activation function is then applied to this combination, which may also include a bias vector. Another distinguishing feature of recurrent neural networks is the feedback mechanism between successive time steps. Under this mechanism, the current internal state is computed by combining the previous state with the present input. The sequence $\{h(0), h(1), h(2), \dots, h(t-1), h(t)\}$ denotes the internal states of the RNN over the time steps $\{0, 1, 2, \dots, t-1, t\}$. Likewise, $\{x(0), x(1), x(2), \dots, x(t-1), x(t)\}$ denotes the inputs over the same time steps (Campesato, 2020). The equation below gives the basic recurrence of an RNN at time $t$:

$$h(t) = f\big(W x(t) + U h(t-1)\big) \tag{4}$$

In this equation, $W$ and $U$ are weight matrices, and $f$ is the activation function, here tanh.
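Equation (4) can be made concrete with a small NumPy forward pass. This is a sketch under assumptions: the weights are random rather than trained, the initial state before the first input is zero, and the optional bias mentioned above defaults to zero, so without it the code matches equation (4) exactly. The function name `rnn_forward` and the sizes (24 steps, 3 weather variables, 8 hidden units) are hypothetical.

```python
import numpy as np


def rnn_forward(x_seq, W, U, b=None, h0=None):
    """Apply the recurrence h(t) = tanh(W x(t) + U h(t-1) + b) over a sequence.

    x_seq: array of shape (T, n_features), one row per time step.
    W:     input-to-hidden weights, shape (n_hidden, n_features).
    U:     hidden-to-hidden weights, shape (n_hidden, n_hidden).
    b:     optional bias vector of shape (n_hidden,); zero if omitted.
    Returns the hidden state at every time step, shape (T, n_hidden).
    """
    n_hidden = W.shape[0]
    h = np.zeros(n_hidden) if h0 is None else np.asarray(h0, dtype=float)
    b = np.zeros(n_hidden) if b is None else np.asarray(b, dtype=float)
    states = []
    for x_t in x_seq:
        # The same W, U and b are reused at every step (weight sharing).
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)


# Hypothetical example: 24 time steps of 3 weather variables, 8 hidden units,
# with random (untrained) weights.
rng = np.random.default_rng(42)
T, n_features, n_hidden = 24, 3, 8
x_seq = rng.normal(size=(T, n_features))
W = rng.normal(scale=0.5, size=(n_hidden, n_features))
U = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

H = rnn_forward(x_seq, W, U)
print(H.shape)  # (24, 8)
```

An LSTM, which this study later uses through Keras, replaces the single tanh update with gated updates of a separate cell state. This design is intended to cope better with the long-term dependencies discussed above.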
3.3.4 NeuralProphet application in weather forecasting

NeuralProphet is a time-series forecasting algorithm based on Facebook's Prophet algorithm (Catherine, 2022), and it can be applied to weather forecasting. Prophet provided a simple, practical, customizable, and reasonable tool for forecasting time series. However, a persistent issue remained: poor performance. To address this, NeuralProphet was developed by combining neural networks with Prophet.

NeuralProphet emphasizes configurability and interpretability. Users can customize the model's hyperparameters to fit their own approach, and analytical tools allow them to evaluate the model's performance and identify areas for improvement. Furthermore, NeuralProphet's modular architecture allows new components to be added as required (Yu et al., 2022).

The NeuralProphet model consists of six modules, and each module contributes an additional component to the time-series prediction. According to Triebe et al. (2021), "a core concept of the NeuralProphet model is its modular composability". The full model is the sum of the modules, as shown in equation (5). Here, $h$ is the number of steps predicted into the future and $\hat{y}$ is the predicted value:

$$\hat{y}_{t+h-1} = T(t+h-1) + S(t+h-1) + E(t+h-1) + F(t+h-1) + A(t+h-1) + L(t+h-1) \tag{5}$$

The terms are:
• $T$: trend.
• $S$: seasonal effects.
• $E$: event and holiday effects.
• $F$: regression effects of future-known variables.
• $A$: auto-regression effects.
• $L$: regression effects of lagged observations of covariates.

Each module can be configured individually, and the modules are then combined to form the complete model.

• Trend: A common way to model the trend is to combine an offset value, denoted $m$, with a growth rate, denoted $k$.
The trend at a given time $t_c$ is obtained by multiplying the growth rate by the time difference between the starting point $t_i$ and the current time $t_c$. The trend value at the starting point, which plays the role of the offset $m$, is then added (Triebe et al., 2021):

$$T(t_c) = T(t_i) + k\,(t_c - t_i) \tag{6}$$

• Seasonality: Seasonality is the extent to which a dataset exhibits a periodic pattern. It is typically represented with Fourier terms:

$$S(t) = \sum_{i=0}^{k} \left( a_i \cos\left(\frac{2\pi i t}{p}\right) + b_i \sin\left(\frac{2\pi i t}{p}\right) \right) \tag{7}$$

In this equation, $p$ is the period of the seasonality and $k$ is the number of Fourier terms (not the growth rate of the trend). The coefficients $a_i$ and $b_i$ are learned from the data.

• Auto-Regression: This module is a commonly used time-series component that captures the temporal dependence among the random variables within a series.
• Lagged Regressors: Lagged regressors are commonly used to establish correlations between the target time series and other observed variables, which are usually called covariates. Unlike future regressors, the future trajectory of lagged regressors is unknown (Triebe et al., 2021).
• Future Regressors: These are variables whose future values are known in advance, so their value is available at every time step.

3.4 Performance Evaluation

The performance of all selected models is evaluated with the following set of performance metrics.

1. Mean Squared Error (MSE): A widely used measure of the average squared difference between actual and predicted values in regression problems. It is computed as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \tag{8}$$

The symbols are defined as follows:
• $n$ is the number of observations in the data.
• $Y_i$ is the actual value of the dependent variable for observation $i$.
• $\hat{Y}_i$ is the predicted value of the dependent variable for observation $i$.

Squaring the differences makes every term non-negative, so the MSE is always positive or zero. An MSE of zero is obtained only by a perfect model with no errors, which does not occur in practice.
The closer the MSE is to zero, the more accurate the model is considered.

2. Mean Absolute Error (MAE): This metric evaluates the performance of a regression model. It is defined as the average absolute difference between the actual and the predicted observations:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right| \tag{9}$$

Here, $Y_i$ is the actual value and $\hat{Y}_i$ is the predicted value.

One advantage of MAE is that it measures the average size of the model's prediction errors directly, in the unit of the target variable. In addition, MAE does not square the errors, so it is less sensitive than MSE to a few large errors.

3. R-squared (R2): This metric measures the proportion of the variance in the target variable that is explained by the regression model. In other words, it shows how close the data are to the fitted line. It is defined as:

$$R^2 = 1 - \frac{RSS}{TSS} \tag{10}$$

Here, $RSS$ is the residual sum of squares, which measures the difference between the predicted and the actual values. $TSS$ is the total sum of squares, which measures the difference between the actual values and their mean:

$$RSS = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2, \qquad TSS = \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$$

In these sums, $Y_i$ is the actual value, $\hat{Y}_i$ is the predicted value, and $\bar{Y}$ is the mean of the actual values.

A higher R2 indicates a better fit of the model to the data. R2 typically lies between 0 and 1. It can become negative on test data when a model predicts worse than simply using the mean of the actual values.

The models were trained and tested using k-fold cross-validation to verify that the results are robust and do not depend on the particular choice of training and testing data. Cross-validation helps reduce over-fitting and gives a more reliable estimate of predictive performance.

In addition to the performance metrics, qualitative analysis is used to evaluate the models' ability to capture complex patterns between variables. One example is visualizing the model predictions and comparing them with the observed data.
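Equations (8) to (10) translate directly into code. The sketch below implements them with NumPy and adds a time-ordered ("expanding window") variant of k-fold cross-validation. Standard k-fold splitting ignores time order, whereas the expanding-window scheme always tests on data that come after the training data. That scheme is a swapped-in technique commonly used for time series and is not necessarily the variant used in this study. All function names are hypothetical, and the small example arrays are illustrative only.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, equation (8)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def mae(y_true, y_pred):
    """Mean absolute error, equation (9)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination, equation (10): 1 - RSS / TSS."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - rss / tss


def expanding_window_folds(n_samples, k):
    """Yield (train_idx, test_idx) index arrays for k time-ordered folds.

    The samples are cut into k + 1 consecutive blocks numbered 0..k. Fold j
    trains on blocks 0..j-1 and tests on block j, so no fold trains on data
    that come after its test block. Leftover rows join the last test block.
    """
    block = n_samples // (k + 1)
    if block == 0:
        raise ValueError("too few samples for the requested number of folds")
    for j in range(1, k + 1):
        train_end = j * block
        test_end = train_end + block if j < k else n_samples
        yield np.arange(train_end), np.arange(train_end, test_end)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), round(r2(y_true, y_pred), 4))
# 0.375 0.5 0.9486
```

In a full evaluation, each fold's training indices would be used to fit a model and its test indices to compute these metrics. The per-fold scores would then be averaged.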
Figure 12 Yearly distribution of the feature, temperature

4 Research result and Analysis

This chapter answers the research questions and explains the results obtained with the methods described in the previous chapter.

4.1 Analysis of the dataset

Analyzing the properties and relationships in the dataset is essential before building machine learning and deep learning models for weather forecasting. This analysis supports informed decisions about feature engineering, data preprocessing, and model selection. Several key procedures were carried out before developing the models in this study.

- The main preprocessing steps were carried out to ensure the quality of the dataset. The dataset used in this research is a time-series dataset, so its pattern was checked first. Understanding the underlying patterns and characteristics of the data was important for effective analysis and modeling. In addition, time-series data exhibit seasonal dependencies, which means that values at different points in time are likely to be related. Figure 12 shows the yearly distribution of the maximum and minimum temperatures by overlaying the daily values of each year. The graph reveals some interesting patterns in the early phase and suggests some hypotheses about the trend of the maximum temperature.

Figure 13 Trend and Seasonality simulations

- Trend and seasonality detection was performed in the second step of the process. Seasonality refers to occurrences that recur at regular intervals, such as weekly, monthly, and yearly cycles. The trend represents systematic changes in the dataset over time. This step played a significant role in selecting suitable models that capture the overall direction and magnitude of change in the dataset.

Figure 14 Number of Null values in the dataset

- The dataset was carefully examined to detect any null or missing values.
Closer inspection showed that the dataset contained no null values and no missing entries. The result of this check is shown in Figure 14.

Figure 15 Duplicated value analysis

Figure 16 Boxplot for Maximum temperature

- A careful analysis was then conducted to identify duplicated values in the dataset. A small portion of the dataset contained duplicated values, which indicates some replication.
- Most time-series data are likely to contain outliers, which are values that depart dramatically from the rest of the data points. Outlier detection methods were applied to two features of the dataset, and outliers were detected in one of them. Suitable actions were taken to handle these outliers in the subsequent phases.
- Feature extraction was performed using a heatmap to find which features are highly correlated with each other. Features were selected based on their correlation values. This gives a more concise and understandable representation of the basic patterns in the dataset, which can easily be used in the selected models.

Figure 17 Heatmap for feature selection

4.2 Analysis of Model results

The first model is polynomial regression. The minimum and maximum temperatures show seasonal patterns (yearly, monthly, daily) that need to be identified and modelled separately before other factors are studied. To achieve this, the dataset is divided into separate training and test sets, and the model is fitted with various polynomial degrees. The results obtained after training the model are shown in the following figure.

Figure 18 Polynomial Regression Evaluation metrics
Figure 19 Actual and prediction value comparison

The evaluation metrics indicate the accuracy and performance of the weather forecasting model. The RMSE value of 8.18 is expressed in the original unit of the target variable, and a lower value implies better accuracy. The MAE value of 2.14 means that, on average, the absolute difference between the predicted and actual values is 2.14 units. The R2 score is 0.908, which indicates that approximately 90.8% of the variance in the dependent variable can be explained by the independent variables in the model. This implies a strong correlation and a good fit between the predicted and actual values.

In general, the results indicate that the regression model performed well. The low error values suggest that the predictions are generally close to the actual values. Additionally, the high R2 score indicates that the model explains most of the variance in the data.

The second model applied to the dataset was gradient boosting. The model scored 0.91 for R2, which is the fraction of the total variation in the dependent variable that can be attributed to the independent variables. A higher R2 score in gradient boosting suggests that the ensemble of decision trees can explain a larger share of the variation in the data. Here, 91.03% of the variation in the dependent variable was explained by the independent variables used in the gradient boosting model. This suggests a substantial connection between the predicted and actual values, and a good match between the two sets of data.

Figure 20 Gradient Boosting model evaluation metrics

Figure 21 LSTM model prediction
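The polynomial-regression experiment at the start of this section fits the model with various degrees and compares them on a held-out test set. That procedure can be outlined as follows. The exact input features are not listed here, so this sketch assumes hypothetically that the scaled day of year is the single input. It uses synthetic data and a chronological 75/25 split, and it relies on `numpy.polyfit` and `numpy.polyval` rather than whatever library the study used. The printed numbers are therefore unrelated to the results reported above.

```python
import numpy as np

# Hypothetical data: four years of daily maximum temperatures following an
# annual cycle plus noise (synthetic, not the FMI Vaasa observations).
rng = np.random.default_rng(1)
day = np.arange(4 * 365)
day_of_year = day % 365
temp = 5 - 10 * np.cos(2 * np.pi * day_of_year / 365) + rng.normal(0, 2, day.size)

# Use the scaled day of year as the single input so that the polynomial
# models the seasonal curve; keep time order with a 75/25 split.
x = day_of_year / 365.0
cut = int(0.75 * len(x))          # first three years train, last year tests
x_train, x_test = x[:cut], x[cut:]
y_train, y_test = temp[:cut], temp[cut:]


def evaluate(y_true, y_pred):
    """Return (MSE, MAE, R^2) as defined in equations (8) to (10)."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, r2


results = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit of equation (1)
    results[degree] = evaluate(y_test, np.polyval(coeffs, x_test))
    mse, mae, r2 = results[degree]
    print(f"degree {degree}: MSE={mse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

Degree 1 cannot follow the winter-summer-winter shape and scores an R2 near zero, while degrees around 4 follow the seasonal curve closely. Very high degrees add little and risk over-fitting.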
The dataset has the characteristics of a time-series dataset. An RNN model, specifically an LSTM, was therefore implemented with Keras to learn and predict the sequence of maximum and minimum temperatures from year to year. Moreover, the LSTM can handle complex models with multivariate input variables, which supports building a time-series-based forecasting system.

Figure 22 LSTM model evaluation metrics

The MAE value of 0.8449 means that the LSTM model's predictions deviated from the actual values by about 0.8449 units on average. Lower MAE scores indicate better prediction performance. The MSE value of 1.6527 means that the average squared deviation of the LSTM model's predictions from the actual values is about 1.6527 squared units. As with MAE, lower MSE values indicate better predictive ability. The R2 score is 0.91, which indicates that approximately 91.03% of the variance in the dependent variable can be explained by the independent variables used in the LSTM model. A higher R2 score generally indicates better model performance.

5 Conclusion

5.1 Key findings

The analysis of the results of the models described in the previous chapter revealed several key points:

- The LSTM model had the lowest mean absolute error (MAE) of all the models, which indicates that it had the best accuracy in predicting the target variable. Its capacity to capture temporal relationships in sequential data may have contributed to this performance.
- The MAE and MSE of the Polynomial Regression and Gradient Boosting models were not significantly different. Polynomial Regression had a somewhat higher MSE, and Gradient Boosting did marginally better in terms of R2. This indicates that Gradient Boosting explains a slightly larger proportion of the variation in the target variable.
- Each of the three models achieved an R2 score above 0.9, which indicates a satisfactory level of predictive ability. A significant percentage of the variation in the target variable can therefore be traced back to the predictors included in each model.
- The LSTM model had the best R2 value, which indicates the best overall fit to the data. However, LSTM models can be computationally costly and may require more extensive data preparation than other models.
- High-quality data plays an essential role in achieving consistent and accurate predictions across the models. It also reduces the accuracy gap between machine learning and deep learning models.

5.2 Conclusions

Comparing the performance of forecasting models in weather forecasting can be a complex task. The outcome depends on factors such as data quality, model configuration, hyperparameter tuning, and the specific weather patterns being predicted. Both deep learning and machine learning algorithms have strengths and weaknesses when applied to weather-related time-series datasets.

Deep learning models, such as recurrent neural networks and variants like LSTM, excel at capturing complex sequential dependencies and patterns in weather data. They can automatically learn feature representations from raw data, which eliminates the need for manual feature engineering. Deep learning models can handle large-scale datasets effectively, which makes them suitable for weather forecasting tasks. However, they often require substantial amounts of labeled training data and computational resources for training. Interpretability can also be a challenge with deep learning models.
In summary, selecting an appropriate forecasting model for weather-related time-series datasets involves trade-offs between accuracy, interpretability, computational requirements, and data availability. It is recommended to experiment with different models and evaluate their performance using appropriate metrics. The model that best suits the specific requirements of the forecasting task at hand can then be chosen.

In general, high-quality data is essential for accurate and trustworthy predictions with machine learning and deep learning models. DL models have the benefit of being able to learn from raw data. Even so, ensuring data quality through preprocessing, ensuring accurate labeling, and reducing bias remain critical issues for both techniques.

Reference

A. Burkov (2019). The Hundred-Page Machine Learning Book, pp. 1–5.
A. Géron (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
I. Goodfellow, Y. Bengio, A. Courville (2016). Deep Learning, Chapter 3. https://www.deeplearningbook.org/contents/mlp.html
A. Kamilaris, F. Prenafeta-Boldú (2018). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture.
A. Salman, B. Kanigoro, Y. Heryadi (2015). A Deep Learning-Based Weather Forecast System for Data Volume and Recency Analysis.
A. Peck, G. Vining, D. Montgomery (2012). Introduction to Linear Regression Analysis.
B. Chong (2021). K-means clustering algorithm: a brief review. Academic Journal of Computing & Information Science, 4(5).
C. Li, M. Zhao, Y. Liu, F. Xu (2020). Air Temperature Forecasting Using Traditional and Deep Learning Algorithms.
C. Lin, Y. Yu, L. Wu, J. Cao (2019). Unsupervised Learning on U.S. Weather Forecast Performance.
C. Brown, J.
Johnson, J. Mith (2019). Supervised Learning approaches for weather forecasting. Journal of Meteorological Applications.
D. Mishra, P. Joshi (2021). A Comprehensive Study on Weather Forecasting using Machine Learning.
D. Cho, C. Yoo, B. Son, J. Im, D. Yoon, D. Cha (2022). A strategy for operational implementation of 4D-Var, using an incremental approach. In Quarterly Journal of the Royal Meteorological Society 120.519, pp. 1367–1387.
P. Rozas (2019). Application of machine learning techniques to weather forecasting.
I. Sutskever, J. Martens, G. Hinton (2019). Generating Text with Recurrent Neural Networks.
M. Abdalla, H. Ghaith, A. Tamimi (2021). Deep Learning Weather Forecasting Techniques: Literature Survey.
J. Booz, W. Yu, G. Xu, D. Griffith, N. Golmie (2019). A Deep Learning-Based Weather Forecast System for Data Volume and Recency Analysis.
J. Kelleher, B. Mac Namee, A. D'Arcy (2020). Fundamentals of Machine Learning for Predictive Data Analytics (2nd ed.).
J. Segovia, J. Toaquiza, J. Llanos, D. R. Rivas (2013). Meteorological Variables Forecasting System Using Machine Learning and Open-Source Software.
N. Singh, S. Chaturvedi, S. Akhter (2019). Weather Forecasting Using Machine Learning Algorithm. Published in Towards Data Science.
L. Chen, B. Chen, J. Tingting (2019). Application analysis of meteorological Big data Service in cloud computing Environment. China Sci. Technol. Inf. 2019, 11, 88–90.
M. Afteniy (2021). Predicting time series with Transformer.
M. Mohri, A. Rostamizadeh, A. Talwalkar (2016). Foundations of Machine Learning, pp. 1–8.
M. Cukrowski (2022). Polynomial Autoregression: Improve your Forecasts in 2 minutes.
M. Holmstrom, D. Liu, C. Vo (2016). Machine Learning Applied to Weather Forecasting. Stanford University.
M. Soori, B. Arezoo, R. Dastres (2022). Machine learning and artificial intelligence in CNC machine tools, A review.
N. Triebe, H. Hewamalage, P. Pilyugina, N. Laptev, C. Bergmeir, R. Rajagopal (2021). NeuralProphet: Explainable Forecasting at Scale.
O. Campesato (2020). Artificial Intelligence, Machine Learning, and Deep Learning, pp. 128–150.
P. Flach (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
S. Sah (2020). Machine Learning: A Review of Learning Types. https://www.preprints.org/manuscript/202007.0230/v1
Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature, 521, pp. 436–444.
Y. Bengio, Y. LeCun, G. Hinton (2021). Deep Learning for AI.
S. Dridi (2021). Unsupervised Learning – A Systematic Literature Review. https://osf.io/kpqr6
R. Sharda, D. Delen, E. Turban (2021). Analytics, Data Science and Artificial Intelligence.
X. Ren, X. Li, K. Ren, J. Song, Z. Xu, K. Deng, X. Wang (2021). Deep Learning-Based Weather Prediction: A Survey.
V. Catherine (2022). In-Depth Understanding of NeuralProphet through a Complete Example. Published in Towards Data Science.
Z. Yu, K. Niu, X. Chen, Z. Guo, D. Li (2022). A Hybrid Model Based on NeuralProphet and Long Short-Term Memory for Time Series Forecasting.