This is a self-archived – parallel published version of this article in the publication archive of the University of Vaasa. It might differ from the original.

Title: Towards novel deep neuroevolution models: chaotic levy grasshopper optimization for short-term wind speed forecasting
Author(s): Jalali, Seyed Mohammad Jafar; Ahmadian, Sajad; Khodayar, Mahdi; Khosravi, Abbas; Ghasemi, Vahid; Shafie-khah, Miadreza; Nahavandi, Saeid; Catalão, João P. S.
Year: 2022
Version: Accepted manuscript
Copyright ©2022 Springer. This is a post-peer-review, pre-copyedit version of an article published in Engineering with Computers. The final authenticated version is available online at: http://dx.doi.org/10.1007/s00366-021-01356-0
Please cite the original version: Jalali, S. M. J., Ahmadian, S., Khodayar, M., Khosravi, A., Ghasemi, V., Shafie-khah, M., Nahavandi, S. & Catalão, J. P. S. (2022). Towards novel deep neuroevolution models: chaotic levy grasshopper optimization for short-term wind speed forecasting. Engineering with Computers 38(Suppl 3), 1787–1811. https://doi.org/10.1007/s00366-021-01356-0

Abstract

Highly accurate wind speed forecasting plays an important role in ensuring the sustainability of wind power utilization. Although deep neural networks (DNNs) have recently been applied to wind time-series datasets, their performance depends largely on their designed architecture. In current state-of-the-art DNNs, architectures are mainly configured manually, which is a time-consuming task.
Thus, it is difficult and frustrating for regular users without comprehensive experience in DNNs to design optimal architectures for the forecasting problems of interest. This paper proposes a novel framework to optimize the hyperparameters and architecture of DNNs used for wind speed forecasting. To this end, we introduce a novel enhanced version of the grasshopper optimization algorithm (GOA), called EGOA, to optimize the deep long short-term memory (LSTM) neural network architecture by evolving four of its key hyperparameters. To design the enhanced version of GOA, chaos theory and levy flight strategies are applied to strike an efficient balance between the exploitation and exploration phases of the GOA. Moreover, the mutual information (MI) feature selection algorithm is utilized to select the more correlated and effective historical wind speed time series features. The proposed model's performance is comprehensively evaluated on two datasets gathered from wind stations located in the United States (US) for two forecasting horizons, 30 min and 1 h ahead. The experimental results reveal that the proposed model achieves the best forecasting performance compared to seven prominent classical and state-of-the-art forecasting algorithms.

1 Introduction

In recent years, with huge growth in energy demand and dwindling supplies, wind energy has proliferated and gained a great deal of attention as one of the most environmentally and economically sustainable green energy resources [1]. However, due to the naturally stochastic character of wind speed, designing an accurate wind energy model in electrical power systems is problematic.
Moreover, the inconsistency of wind speed can significantly impact the safety and stability of micro-grid scheduling and wind turbine control, which further affects the load demand and supply balance of the wind farm and the energy quality [2]. Thus, in energy conversion and management, the design of optimal and accurate wind speed prediction models can provide a stable basis for the generation and transmission of wind energy and diminish the operating costs of the power system.

Keywords: Wind speed forecasting · Deep neuroevolution · Long short-term memory · Enhanced grasshopper optimization algorithm

Over the last few decades, various forecasting techniques have been developed to predict wind speed time series. Typically, such methods can be categorised into three subgroups: physical strategies, statistical methods, and artificial intelligence algorithms [3]. Physical strategies are explicit approaches that use meteorological information such as density, temperature, roughness, and atmospheric pressure obstacles [4]. A common technique, numerical weather prediction (NWP), uses mathematical models based on physical data to forecast wind speed. Nonetheless, this numerical method is not adequate for practical usage, as it is not straightforward to collect such physical data, particularly for short-term wind speed forecasting.

The statistical methods are the second group used by researchers to forecast wind speed time series. The modeling of different natural phenomena has been studied using several data analysis techniques, such as statistical and mathematical modeling comprising time series analysis, regression modeling, optimization, and numerical analysis [5–9].
For wind speed forecasting, the most well-known statistical methods are auto-regressive (AR) models, auto-regressive moving average (ARMA) models, and auto-regressive integrated moving average (ARIMA) models. Lydia et al. [10] adopted linear and nonlinear AR models to forecast wind speed from 10 min up to 1 h ahead for a wind energy center in India. Their method uses the Gauss–Newton algorithm for parameter tuning of the AR models. They also measured the accuracy of their proposed model using three performance metrics. In another work, Ailliot et al. [11] proposed non-homogeneous Markov-switching auto-regressive (MS-AR) models to forecast wind speed for an island in France. Different weather types were analyzed by their method. Torres et al. [12] used ARMA to forecast the hourly average wind speed for horizons from 1 h up to 10 h ahead. The data for this work were gathered over a period of 9 years at five locations with different topographic characteristics in Navarre (Spain). They showed that the ARMA models have a better forecasting performance than the persistence model. Yunus et al. [13] developed an ARIMA model that can cost-effectively capture the probability distribution and time correlation of wind speed data. Their simulation results show that their technique outperforms most persistence models for short-term forecasting horizons. As stated in [14], due to their inappropriate pre-assumed linear form, many statistical methods cannot cope well with nonlinear wind speed characteristics.

With the rapid growth of feature selection and machine learning approaches [15–22], numerous artificial intelligence (AI) strategies have been applied to several real-world problems [23–30] and have been successfully designed to address the non-stationary and random nature of wind speed time series.
Generally, the existing AI-based wind speed forecasting methods can be classified into two categories: traditional machine learning algorithms and deep learning methods [31]. The support vector machine (SVM) algorithm is one of the prominent traditional machine learning algorithms and has strong generalization potential [32–34]. In recent work, Kong et al. [35] optimized the parameters of a specific type of SVM called the reduced support vector machine (RSVM) using the particle swarm optimization (PSO) algorithm for wind speed prediction. In another work, Yu et al. [36] successfully integrated an SVM algorithm with recurrent neural network methods to forecast wind speed. Artificial neural network (ANN) algorithms, including backpropagation (BP), Elman neural network (ENN) [37], extreme learning machine (ELM) [38, 39], and radial basis function (RBF) [40], are the most commonly used traditional machine learning algorithms in many areas, including the forecasting of wind speed time series. Cadenas and Rivera [41] utilized several BP models to forecast the short-term wind speed of Oaxaca city in Mexico. They showed that the structure used for BP has acceptable accuracy for the energy supplier in Oaxaca. In [42], Guo et al. presented a hybrid algorithm based on the BP algorithm and seasonal exponential adjustment (SEA), which was utilized to forecast the daily wind speed 1 year ahead for an area in China from 2001 to 2006. For ENNs, Wang et al. [43] proposed a novel algorithm that optimizes the weights and thresholds of these neural networks using a multi-objective whale optimization algorithm for wind speed forecasting. In another work, a multi-objective satin bower-bird optimizer algorithm was employed by [44] to optimize and enhance the forecasting performance of ENNs based on two real wind farms in China. Salcedo et al.
[45] developed a combined wind speed forecasting model using the coral reefs optimization algorithm based on a feature selection problem for training ELM [46, 47] networks. In [48], RBF neural networks were trained and optimized by a novel two-step mechanism, comprising the K-means clustering algorithm and the non-dominated sorting genetic algorithm-II (NSGA-II), to maximize the coverage probability of the constructed prediction intervals for a wind speed dataset.

Deep neural network algorithms have gained substantial attention as another successful artificial intelligence category [49–51]. Chen et al. [52] developed a novel nonlinear hybrid ensemble of deep LSTM models for forecasting wind speed time series. Their hybrid method was evaluated through two case studies with data from a Chinese wind farm. Liu et al. [53] proposed a new model for multi-step wind speed forecasting using deep LSTM networks combined with empirical wavelet transform and ELM [54, 55] algorithms. In another work, Pei et al. [56] proposed a hybrid algorithm combining a new cell update LSTM with empirical wavelet transform for wind speed forecasting, simulated on four different datasets. Besides, Khodayar et al. [57] presented a rough deep learning architecture combining a stacked denoising autoencoder (SDAE) and a stacked autoencoder (SAE) to forecast wind speed for ultra-short-term and short-term horizons. Several of these studies and other applications of deep learning have shown that deep learning approaches are more accurate than traditional machine learning methods [58, 59]. In general, deep learning has demonstrated tremendous promise as an advanced and efficient machine learning paradigm for the wind speed forecasting field.

In the research work presented in [60], the authors introduced a hybrid method called VMD-DE-ESN, combining variational mode decomposition [61], differential evolution, and an echo state network for wind speed forecasting.
This proposed algorithm showed efficient performance on data collected from four stations of a wind farm in northwestern Spain. In [62], a new deep learning approach based on the gated recurrent unit (GRU) was coupled with wavelet soft threshold denoising to predict wind speed series. By adjusting the GRU parameters using a cross-validated grid-search strategy, this deep learning-based hybrid model achieved high adaptability across several case studies. In [63], the authors presented a novel model of day-to-day wind speed forecasting based on deep CNNs that exploits Taguchi's orthogonal array. The experimental findings indicate that the proposed design-based CNN outperforms other existing benchmark models.

Among the deep learning approaches [64, 65], the LSTM neural network generally performs effectively and strongly owing to its outstanding ability to cope with long-term time series problems [66, 67]. As a groundbreaking derivative of recurrent neural networks (RNNs), LSTMs can learn the temporal and long-term dependencies in time-series data and effectively mitigate the vanishing gradient problem compared to traditional RNNs [68]. These characteristics motivated us to base the deep learning strategy of this work on the LSTM neural network. Nonetheless, there is no established empirical knowledge for selecting the values of the hyperparameters of the LSTM neural network, and these hyperparameters affect the forecasting potential of LSTM. Therefore, we introduce a novel deep neuroevolution method based on an enhanced version of the grasshopper optimisation algorithm (GOA) to optimize these hyperparameters and improve wind speed forecasting performance. GOA is a recent promising optimization algorithm inspired by the swarming behavior of grasshoppers.
This algorithm has already been applied to many stochastic and continuous optimization problems, where it has outperformed the most common meta-heuristics such as differential evolution [69], whale optimizer [70], particle swarm optimization [71], and genetic algorithm [72]. Saxena et al. [73] introduced an improved version of GOA based on ten forms of chaotic maps, and the performance of these variants was successfully examined on several unimodal and multimodal benchmark functions. In the work by Xu et al. [74], two techniques, namely orthogonal learning and chaotic exploitation, are implemented in the traditional GOA to achieve a more reliable trade-off between the phases of exploration and exploitation. The analytical findings demonstrate that the modified version can alleviate the shortcomings of GOA and provide higher-quality solutions. An annealing-behaved GOA with boosted exploratory and exploitative patterns was proposed by Yu et al. [75] for solving global optimization problems. For a comprehensive review of GOA, please refer to the work presented by Abualigah and Diabat [76]. As mentioned in the works discussed above, the standard model of GOA has some weaknesses. It can easily collapse into a local optimum and demonstrate a slow convergence rate when faced with challenging problems.

To further improve the performance of GOA, we add two powerful evolutionary operators to the basic GOA for the first time. These operators are based on chaos theory [77] and the levy-flight technique, both of which aim to enhance the performance of meta-heuristic evolutionary algorithms on optimization problems. We name this improved version of the basic GOA the enhanced GOA (EGOA).

As discussed by Jalali et al. [26], selecting appropriate hyperparameters for DNN algorithms is of great importance, since their performance depends on the values of these hyperparameters.
Due to their decentralized and relational feature representations, deep neural networks can learn nonlinear structures that are deeper and more dynamic than those of traditional machine learning models such as BP, ENN, ELM, and RBF neural networks, and SVM [78] algorithms. Deep LSTM, as a prominent deep neural network, has been successfully deployed to solve different real-world time series problems [53, 79]. The architecture of LSTM neural networks has mostly been designed manually, which is a costly and time-consuming procedure [80]. Nonetheless, in the field of wind speed forecasting, few works have sought an optimal architecture design for LSTM algorithms. In most of the studies that utilized deep learning technologies for wind speed forecasting, the authors designed the architecture of the deep learning model manually, which is a time-consuming procedure [81, 82]. Therefore, this paper aims to predict the wind speed with the highest accuracy using a novel optimization algorithm that automatically and efficiently designs the LSTM architecture.

In summary, the principal contributions of this paper are as follows:

1. We introduce an LSTM-based deep neuroevolution time series forecasting algorithm for exploring the implicit knowledge in wind speed time series. Moreover, the mutual information (MI) algorithm is implemented to perform input variable selection. The features obtained by MI aid in selecting the most fitting size of the LSTM input window.
2. Whereas references such as [67, 83, 84] select the deep LSTM hyperparameters by trial and error, which is time-consuming, we efficiently optimize the hyperparameters of the deep LSTM neural network in each layer with an enhanced version of the GOA evolutionary algorithm, which we name EGOA.
This modification enhances GOA performance by means of chaos theory and levy-flight strategies, yielding faster convergence and a more efficient balance between the exploitation and exploration phases in the search space.
3. To the best of our knowledge, this work is the first study to utilize an enhanced version of the GOA evolutionary algorithm to optimize the hyperparameters of LSTM neural networks for wind speed forecasting.
4. Our proposed deep hybrid optimization algorithm shows excellent forecasting performance compared to seven competitive classical and state-of-the-art methodologies for wind speed forecasting.

Two forecasting horizons demonstrate the proposed model's superiority: ultra-short-term wind speed forecasting for 30 min ahead and short-term wind speed forecasting for 1 h ahead. The datasets used for our experiments are collected from two wind sites near Las Vegas and Denver in the USA. We compare our novel algorithm with several standard and hybrid state-of-the-art time series forecasting algorithms, including back propagation (BP) [41], convolutional neural network (CNN) [85], long short-term memory (LSTM) [80], Xgboost [86], empirical mode decomposition and genetic algorithm-BP neural network (EMD-GABP) [87], differential evolution-LSTM (DE-LSTM) [88], and ensemble empirical mode decomposition–GA–particle swarm optimization wavelet neural network (EGP-WNN) [89] algorithms. The experimental results show that the proposed model is significantly superior to the other compared models.

The remainder of this study is arranged as follows: Sect. 2 presents the basic formulation of the proposed method. The experimental procedures for the two US datasets at two different forecasting horizons and discussions of the obtained experimental results are given in Sect. 3. Finally, all major findings and future works are summarized in Sect. 4.
2 Proposed method

This section details how our enhanced GOA evolutionary algorithm is developed to optimize the structure of LSTM neural networks.

2.1 Structure of basic GOA

Saremi et al. [72] recently proposed the swarm-based GOA, which imitates the behavior of grasshopper groups in their environment to find optimal or sub-optimal solutions to complex multimodal or composite hybrid problems. After initialization, the updating rule follows three laws: social interaction, gravity force, and wind advection. The current position of the $i$th agent is denoted by $X_i$ and described by

$$X_i = S_i + G_i + A_i, \tag{1}$$

where $S_i$ denotes the social interaction, $G_i$ represents the gravity force, and $A_i$ denotes the wind advection. Social interaction is the most influential component, based on its impact on the motion patterns, and is determined as follows:

$$S_i = \sum_{\substack{j=1 \\ j \neq i}}^{N} s\left(d_{ij}\right)\hat{d}_{ij}, \tag{2}$$

$$d_{ij} = \left|x_j - x_i\right|, \tag{3}$$

$$\hat{d}_{ij} = \left(x_j - x_i\right)/d_{ij}, \tag{4}$$

$$s(r) = f e^{-r/l} - e^{-r}, \tag{5}$$

where $d_{ij}$ represents the distance between the $i$th and the $j$th agent, and $\hat{d}_{ij}$ denotes the unit vector from the $i$th to the $j$th agent. The function $s$ determines the social forces and is governed by the parameters $f$ and $l$. The distances between agents should be normalized into the interval [1, 4]. The gravity force of an agent can be expressed as follows:

$$G_i = -g\,\hat{e}_g, \tag{6}$$

where $g$ is the gravitational constant and $\hat{e}_g$ is the unit vector towards the center of the earth. The wind advection can be computed as follows:

$$A_i = u\,\hat{e}_w, \tag{7}$$

where $u$ denotes a constant drift and $\hat{e}_w$ is the unit vector in the wind direction. Finally, Eq. (1) can be expanded as follows:

$$X_i = \sum_{\substack{j=1 \\ j \neq i}}^{N} s\left(\left|x_j - x_i\right|\right)\frac{x_j - x_i}{d_{ij}} - g\,\hat{e}_g + u\,\hat{e}_w, \tag{8}$$

where $N$ denotes the number of agents.
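As a rough illustration of Eqs. (2)–(5), the following NumPy sketch evaluates the social force $s(r)$ and the social interaction term $S_i$ for a small swarm. The parameter values f = 0.5 and l = 1.5, the function names, and the three example positions are assumptions made for illustration; the text above does not fix them. Gravity and wind advection are left out.

```python
import numpy as np


def social_force(r, f=0.5, l=1.5):
    """Eq. (5): s(r) = f * exp(-r / l) - exp(-r); f and l are assumed values."""
    return f * np.exp(-r / l) - np.exp(-r)


def social_interaction(positions, i, f=0.5, l=1.5, eps=1e-12):
    """Eq. (2): sum over j != i of s(d_ij) times the unit vector from i to j."""
    x_i = positions[i]
    s_i = np.zeros_like(x_i, dtype=float)
    for j, x_j in enumerate(positions):
        if j == i:
            continue
        diff = x_j - x_i
        d_ij = np.linalg.norm(diff)      # Eq. (3): distance between agents
        d_hat = diff / (d_ij + eps)      # Eq. (4): unit vector, eps avoids 0/0
        s_i += social_force(d_ij, f, l) * d_hat
    return s_i


# Agent 0 has one close neighbour (distance 1) and one far neighbour (distance 3).
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
S_0 = social_interaction(positions, 0)
```

With these assumed parameters, $s(r)$ is negative (repulsion) for short distances and positive (attraction) for longer ones, changing sign at $r = 3\ln 2 \approx 2.08$, so agent 0 is pushed away from the neighbour at distance 1 and pulled towards the one at distance 3.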
The influence of the gravity force on the grasshoppers is too weak and is therefore ignored, and the wind direction (the $A$ component) is implicitly assumed to always point towards the best solution $\hat{T}_d$. The logical model between the agents is also demonstrated in Fig. 1. The resulting mathematical formula is as follows:

$$X_i^d = c\left(\sum_{\substack{j=1 \\ j \neq i}}^{N} c\,\frac{ub_d - lb_d}{2}\, s\left(\left|x_j^d - x_i^d\right|\right)\frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d, \tag{9}$$

where $ub_d$ is the upper boundary in the $d$th dimension, $lb_d$ is the lower boundary in the $d$th dimension, $\hat{T}_d$ is the value of the $d$th dimension in the best solution obtained so far, and the parameter $c$ is decreased with the number of iterations to reduce exploration and increase exploitation according to the following equation:

$$c = c_{\max} - l\,\frac{c_{\max} - c_{\min}}{L}, \tag{10}$$

where $c_{\max}$ is the maximum value, $c_{\min}$ is the minimum value, $l$ is the current iteration, and $L$ is the maximum number of iterations.

2.2 Chaotic-population initialization

Boosting the balance of swarm-based methods such as GOA is an essential part of the optimization process. For example, advanced variants of several other evolutionary and swarm intelligence methods, such as boosted moth-flame optimizer (LGCMFO) [90], chaotic, random spare ant colony optimization [91], biogeography-based whale optimizer [92], double adaptive moth-flame optimizer [93], orthogonal learning grey wolf optimizer [94], and Gaussian bare-bones fruit fly optimizer [95], have found applications in many areas by stabilizing the balance between the exploration and exploitation of the core processes. In this regard, the quality of the initial population can significantly impact the convergence speed and solution accuracy of evolutionary algorithms, which search by iterating over a population [96, 97].
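Returning briefly to the position update of Sect. 2.1, a minimal NumPy sketch of Eqs. (9) and (10) is given here. It is an illustration under stated assumptions and not the authors' implementation: the values f = 0.5, l = 1.5, c_max = 1 and c_min = 1e-5 are assumed, all function names are hypothetical, the normalization of distances into [1, 4] is omitted for brevity, and the final clipping to the bounds is an added safeguard.

```python
import numpy as np


def social_force(r, f=0.5, l=1.5):
    """Eq. (5) with assumed parameters f and l."""
    return f * np.exp(-r / l) - np.exp(-r)


def c_schedule(it, max_it, c_max=1.0, c_min=1e-5):
    """Eq. (10): c decreases linearly from c_max towards c_min."""
    return c_max - it * (c_max - c_min) / max_it


def goa_update(positions, target, lb, ub, c, eps=1e-12):
    """Eq. (9) applied to every agent; positions has shape (N, D)."""
    n_agents, dim = positions.shape
    new_positions = np.empty_like(positions)
    for i in range(n_agents):
        total = np.zeros(dim)
        for j in range(n_agents):
            if j == i:
                continue
            diff = positions[j] - positions[i]
            d_ij = np.linalg.norm(diff)
            per_dim = np.abs(diff)  # |x_j^d - x_i^d| for each dimension d
            total += (c * (ub - lb) / 2.0 * social_force(per_dim)
                      * diff / (d_ij + eps))
        # Outer c shrinks the swarm around the best solution found so far.
        new_positions[i] = np.clip(c * total + target, lb, ub)
    return new_positions


rng = np.random.default_rng(0)
lb, ub = np.full(2, -5.0), np.full(2, 5.0)
positions = rng.uniform(lb, ub, size=(6, 2))
target = positions[0].copy()          # stand-in for the best solution so far
updated = goa_update(positions, target, lb, ub, c_schedule(1, 100))
```

The inner factor c scales the interaction between agents and the outer factor c scales the whole step, so as c approaches c_min every agent contracts towards the target.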
The basic GOA typically initializes the population randomly, which makes it hard to guarantee population diversity and leads to weak search results and performance. Therefore, it is essential to enhance the diversity of the initial population. Generally speaking, chaos is a pseudo-random motion produced by a deterministic mechanism that is sensitive to its initial value and generates many pseudo-random patterns [98, 99]. It has the attributes of nonlinearity, randomness, and consistency. When solving function optimization problems, these characteristics help the algorithm escape from local optima, preserve population diversity, and increase the global search efficiency [100, 101]. Among the various chaotic maps, which differ in their function optimization abilities, the tent map has shown better performance than the other maps [102]. Therefore, we used the tent map for agent population initialization, which can be formulated as

$$x_{i+1} = \begin{cases} 2 \times x_i, & 0 \le x_i \le 1/2; \\ 2 \times \left(1 - x_i\right), & 1/2 \le x_i \le 1. \end{cases} \tag{11}$$

Fig. 1 Primitive patterns between the agents in an update of GOA (legend: target location; 3D time-varying comfort zone; trajectory of agents before the current iterations; attraction force applied to an agent; repulsion force applied to an agent; previous location of an agent)

Assume that $D$ represents the search space dimension and $N$ denotes the population size. The tent map sequence $x_{ij}$ $(i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, D)$ is generated by Eq. (11). Based on Eq. (12), the initialized population $P_0 = \{X_{ij}\}$ is mapped into the search space as follows:

$$X_{ij} = x_{ij} \times \left(X_{\max j} - X_{\min j}\right) + X_{\min j}, \tag{12}$$

where the maximum and minimum of the $j$th dimension are represented by $X_{\max j}$ and $X_{\min j}$, respectively.

2.3 Levy flight

Levy flight (LF) was initially proposed in 1937 by Paul Levy, a French mathematician. Many artificial and natural phenomena have been described in terms of levy statistics [103]. LF is a well-respected subclass of non-Gaussian random walks whose step lengths follow a stable levy distribution. The levy distribution is given as follows:

$$\mathrm{Levy}(\beta) \sim u = t^{-1-\beta}, \tag{13}$$

where $\beta$ is the levy index for stability adjustment. The levy random number is determined using the following equation:

$$\mathrm{Levy}(\beta) \sim \frac{\varphi \times \mu}{|v|^{1/\beta}}, \tag{14}$$

where $\mu$ and $v$ are drawn from standard normal distributions, $\Gamma$ denotes the standard Gamma function, the value of the parameter $\beta$ is equal to 1.5, and $\varphi$ is computed as follows:

$$\varphi = \left[\frac{\Gamma(1 + \beta) \times \sin(\pi \times \beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{\frac{\beta-1}{2}}}\right]^{1/\beta}. \tag{15}$$

To achieve a good trade-off between the exploration and exploitation capabilities of evolutionary algorithms, the LF approach is employed to update the position of each agent, which is calculated as follows:

$$X_i^{levy} = X_i + r \oplus \mathrm{levy}(\beta), \tag{16}$$

where $X_i^{levy}$ represents the new position of the $i$th agent $X_i$, $r$ denotes a random vector in the interval [0, 1], and $\oplus$ denotes entry-wise multiplication.

2.4 Enhanced GOA

This section outlines the proposed enhanced GOA (EGOA) in detail. In EGOA, we first adopt chaos theory to boost the quality of the initial population positions, as described in Sect. 2.2. We then incorporate the levy flight strategy, whose fundamental principles are defined in Sect. 2.3, into the GOA to address the original GOA's drawback and strike a more appropriate balance between the exploration and exploitation phases. As is well known for evolutionary algorithms, the diversity of the search agents is crucially important, since diversity enables the population to search towards the global optimum. The levy flight component is used to improve the population diversity of GOA. To this end, once the position of the $i$th search agent $X_i$ is updated, the levy flight component is applied to generate a new candidate solution.
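The chaotic initialization of Sect. 2.2 and the levy step of Sect. 2.3 can be sketched together as follows. This is a minimal NumPy illustration under assumptions, not the authors' code: the function names are hypothetical, each dimension's tent-map sequence is started from an assumed random seed value in (0.01, 0.99) and iterated along the agent index, and $\beta = 1.5$ as stated in the text.

```python
import math

import numpy as np


def tent_map_population(n_agents, dim, x_min, x_max, seed=0):
    """Eqs. (11)-(12): tent-map sequence mapped into [x_min, x_max]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.01, 0.99, size=dim)   # assumed start of each sequence
    chaos = np.empty((n_agents, dim))
    for i in range(n_agents):
        x = np.where(x <= 0.5, 2.0 * x, 2.0 * (1.0 - x))   # Eq. (11)
        chaos[i] = x
    return chaos * (x_max - x_min) + x_min                   # Eq. (12)


def levy_phi(beta=1.5):
    """Eq. (15): scale factor of the levy random number."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(dim, rng, beta=1.5):
    """Eq. (14): phi * mu / |v|^(1/beta) with mu, v standard normal."""
    mu = rng.standard_normal(dim)
    v = rng.standard_normal(dim)
    return levy_phi(beta) * mu / np.abs(v) ** (1.0 / beta)


def levy_move(x, rng, beta=1.5):
    """Eq. (16): X_levy = X + r * levy(beta), r uniform in [0, 1)."""
    r = rng.random(x.shape[0])
    return x + r * levy_step(x.shape[0], rng, beta)


rng = np.random.default_rng(42)
x_min, x_max = np.full(4, -1.0), np.full(4, 1.0)
population = tent_map_population(20, 4, x_min, x_max)
candidate = levy_move(population[0], rng)
```

One practical caveat of this sketch: with a slope of exactly 2, the tent map iterated in binary floating point loses one bit of the starting value per step and collapses to zero after a few dozen iterations, so the loop above is only suitable for small populations. The levy step is heavy-tailed, so the candidate can land outside the search bounds and would need a bounds check before evaluation.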
The modified mathematical equation for the enhanced GOA is defined as follows:

$$X_i^{levy} = X_i^{*} + \mathrm{rand}(d) \oplus \mathrm{levy}(\beta), \tag{17}$$

where $X_i^{*}$ represents the agent position after the current update, and $\mathrm{rand}(d)$ is a random $d$-dimensional vector in the interval [0, 1]. Since levy flight is a randomized procedure in which the jump size follows the levy probability distribution function, the new candidate solution obtained via levy flight has a significant chance of jumping out of a local optimum and achieving a superior solution. Search agents with better fitness are preserved in the population to guarantee its reliability:

$$X_i^{t+1} = \begin{cases} X_i^{levy}, & \mathrm{fitness}\left(X_i^{levy}\right) > \mathrm{fitness}\left(X_i^{*}\right) \\ X_i^{*}, & \text{otherwise.} \end{cases} \tag{18}$$

Therefore, the levy flight mechanism can cause competitive agents to move faster towards the global optimum. Since incorporating chaos theory and levy flight strategies helps to enhance the capabilities of GOA, we name the proposed method enhanced GOA (EGOA).

2.5 LSTM

The LSTM neural network is a deep learning algorithm with time-varying inputs and targets. It performs very well in time-series data processing thanks to its outstanding ability to solve long-term dependency problems. The cornerstone of the LSTM neural network is the memory cell, which can preserve the temporal state. Information is added to or removed from the cell state through the input gate, the forget gate, and the output gate. Figure 2 describes a sample unit of an LSTM network. The key stages of this neural network are as follows:

1. The input gate monitors the input activation. When the input gate is activated, new input information is admitted to the memory cell.
2. The forget gate forgets the unimportant contents. Thus, the past cell state is forgotten when the forget gate is enabled.
3. The output gate regulates the output activation.
Thus, the current cell output is propagated to the final state when the output gate is enabled.

The three gates are sigmoid units that constrain each element to the interval [0, 1]. The standard logistic sigmoid function is specified as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{19}$$

The input gate $i_t$ regulates the input information that passes into the memory cell:

$$i_t = \sigma\left(w_{xi} x_t + w_{hi} h_{t-1} + b_i\right). \tag{20}$$

The forget gate $f_t$ regulates the forgetting of cell information:

$$f_t = \sigma\left(w_{xf} x_t + w_{hf} h_{t-1} + b_f\right). \tag{21}$$

The output gate $o_t$ regulates the output information that flows from the cell:

$$o_t = \sigma\left(w_{xo} x_t + w_{ho} h_{t-1} + b_o\right). \tag{22}$$

At time $t$, a tanh function computes the input features $g_t$ from the input $x_t$ and the previous hidden state $h_{t-1}$ as follows:

$$g_t = \tanh\left(w_{xc} x_t + w_{hc} h_{t-1} + b_c\right). \tag{23}$$

The memory cell is then updated through the regulated input features and the partial forgetting of the previous memory cell, which gives

$$c_t = f_t \ast c_{t-1} + i_t \ast g_t. \tag{24}$$

The hidden output state $h_t$ is determined by the output gate $o_t$ and the memory cell $c_t$:

$$h_t = o_t \ast \tanh\left(c_t\right). \tag{25}$$

Therefore, the LSTM output $y_t$ is determined as follows:

$$y_t = \sigma\left(w_{hy} h_t + b_y\right). \tag{26}$$

In Eqs. (20)–(26), $w_{xi}$, $w_{xf}$, $w_{xo}$, and $w_{xc}$ are the input weight matrices, $w_{hi}$, $w_{hf}$, $w_{ho}$, and $w_{hc}$ are the recurrent weight matrices, and $w_{hy}$ denotes the hidden-to-output weight matrix. The corresponding bias vectors are represented by $b_i$, $b_f$, $b_o$, $b_c$, and $b_y$.

Fig. 2 Structure of the deep LSTM neural network block

2.6 Proposed EGOA-LSTM method

This section presents the proposed wind speed forecasting method, called EGOA-LSTM. This method uses the improved GOA algorithm to optimize the hyperparameters of the LSTM neural network and thereby improve the accuracy of the wind speed forecasting model. Before applying EGOA, two issues should be considered: the representation of solutions and the calculation of the fitness function. Four hyperparameters, namely batch size, learning rate, maximum epoch, and neural units, are optimized by the EGOA algorithm in the proposed method. Therefore, each solution in EGOA can be represented as a vector with four dimensions, each of which corresponds to one of the four hyperparameters. The learning rate is a hyperparameter with continuous values, whose optimal value EGOA can obtain directly. In contrast, batch size, maximum epoch, and neural units are hyperparameters with discrete values. As EGOA explores the solution space in continuous mode, the values obtained for these hyperparameters must be converted to their corresponding discrete values. To this end, each real value is converted to an integer value using the following equation:

$$y_{ij} = \left\lfloor b_j \times \frac{x_{ij} - lb}{ub - lb} + 0.5 \right\rfloor, \quad j = 1, \ldots, n, \tag{27}$$

where $b_j$ is the total number of items of type $j$, $x_{ij}$ is the real number corresponding to the $j$th dimension of the solution $X_i$, $y_{ij}$ is the converted integer value, and $lb$ and $ub$ are the lower and upper bounds of the search space, respectively.

In the proposed EGOA-LSTM method, the initial population with $n$ solutions is first initialized with the chaotic tent map of Eqs. (11) and (12). Each solution is denoted by a four-dimensional vector $X_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, 4$, where each dimension $j$ corresponds to one of the four LSTM hyperparameters. After the initialization of the population, new solutions are obtained by repeatedly updating the solutions' current positions using Eq. (9). Moreover, the levy flight strategy is applied to the updated positions to balance exploration and exploitation using Eqs. (17) and (18). The procedure repeats until the termination criterion is reached, and the best solution obtained is taken as the final result. This solution provides the optimal values of the LSTM hyperparameters.

To evaluate the usefulness of each solution, we need to define a fitness function. To this end, the input time series data is divided into two sets, a training set and a test set. The training set is used to optimize the LSTM hyperparameters using EGOA, while the test set is used to evaluate the performance of the final wind speed forecasting model. Suppose that $\vec{y}$ is a vector denoting the historical wind speed time series data for $M$ time steps, expressed as follows:

$$\vec{y} = \left(y^{(0)}, y^{(1)}, \ldots, y^{(M-1)}\right), \tag{28}$$

where $y^{(t)}$ denotes the actual wind speed value for time step $t$. The purpose of the proposed wind speed forecasting model is to predict the wind speed values of the next $N$ time steps using the LSTM neural network. These predicted values can be represented as follows:

$$\vec{\hat{y}} = \left(\hat{y}^{(M)}, \hat{y}^{(M+1)}, \ldots, \hat{y}^{(M+N-1)}\right), \tag{29}$$

where $\hat{y}^{(t)}$ denotes the predicted wind speed value for time step $t$. Each solution in EGOA is used to configure an LSTM model based on the obtained hyperparameter values. Therefore, the performance of the configured LSTM model in forecasting wind speed values serves as the fitness function. To this end, the input vectors of the LSTM model are represented using Eq. (28) based on the training data. The LSTM model is then used to predict the wind speed values of the next $N$ time steps, which are represented using Eq. (29). To calculate the fitness value of each solution in EGOA, the mean square error (MSE) is used as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \tag{30}$$

where $y_i$ is the actual wind speed value and $\hat{y}_i$ is the predicted wind speed value obtained by the LSTM neural network. A solution with a lower MSE value has a higher fitness value and vice versa. Therefore, the proposed method aims to obtain the solution with the lowest MSE value (i.e., the highest fitness value), which contains the optimal values of the LSTM hyperparameters. This leads to an LSTM model with maximum performance in forecasting wind speed values in the test set. After determining the optimal values of the LSTM hyperparameters using EGOA, the configured LSTM model is used to predict wind speed values in the test set. Algorithm 1 presents the overall steps of the proposed EGOA-LSTM method. Fig. 3 illustrates the whole procedure of the proposed deep model, and the flowchart of the proposed wind speed forecasting model is depicted in Fig. 4.

Fig. 3 The schema of deep EGOA-LSTM model for wind speed forecasting (blocks: input wind speed data; LSTM layers; flattening layer; hyperparameter optimization based on the EGOA optimizer; predicted wind speed)

Algorithm 1 Pseudo-code of the proposed wind speed forecasting method (EGOA-LSTM)
1: Input: pop_size (population size), cmax, cmin and L (maximum number of iterations).
2: Output: Predicted wind speed values.
3: Begin algorithm:
4: Split dataset into training set Tr and test set Te;
5: Initialize the agent population Xi (i = 1, 2, ..., pop_size) based on chaotic theory;
6: for (each search agent Xi) do
7:     Set an LSTM model based on the values of hyperparameters obtained by the solution Xi;
8:     Calculate the fitness of solution Xi using Eq. (30) as the MSE of LSTM model obtained based on the training set Tr;
9: end for
10: Set B = the best search agent based on the calculated fitness values;
11: Set l = 1;
12: while (l < L) do
13:     Update c according to Eq. (10);
14:     for each search agent Xi do
15:         Normalize the distance between agents in [1,4] interval;
16:         Update the current position of Xi based on Eq. (9);
17:         Apply the levy flight strategy by using Eqs. (15) and (16);
18:         Check the Xi values to be in the boundaries and bring them back if they go outside;
19:         Set an LSTM model based on the values of hyperparameters obtained by the solution Xi;
20:         Calculate the fitness of solution Xi using Eq.
(30) as the MSE of LSTM model obtained based on the training set Tr; 21: if the fitness of solution Xi is better than the fitness of B then 22: Set B=Xi; 23: end if 24: end for 25: Set l=l+1; 26: end while 27: Set an LSTM model based on the hyperparameters obtained by the best solution B; 28: Predict the wind speed values in the test set Te using the best obtained LSTM model; 29: Return the predicted wind speed values as the output; 30: End algorithm Fig. 4   Flowchart of the pro- posed EGOA-LSTM model for wind speed forecasting Wind speed time series datasets Data preprocessing with normalization and MI strategy Calculate the fitness values for each agent by LSTM on training data Randomly initialize population based on chaotic theory Start U=The best obtained LSTM hyper- parameters based on fitness function Update the position of agents Is iteration criteria satisfied? Report the optimal set of U If a better solution found ,then update U Perform the levy flight operation Calculate the fitness values for each agent by LSTM on training set Update the agents X Feed the set of U into LSTM using test data Forecast wind speed time series test data using U Obtain the forecasted wind speed error value with optimal LSTM hyper-parameters End No Yes 3 � Experimental results 3.1 � Data In contrast to several studies such as [53, 56, 82] which used a small amount of wind speed data (usually less than one- thousand samples) for showing the efficiency of their pro- posed deep learning algorithms, the 30-min interval between consecutive historical samples of two wind stations in the US for the whole year 2012 has been used in this study. Western Wind Dataset [104] created by the National Renew- able Energy Laboratory (NREL) and 3TIER, the wind speed time series estimated for two wind sites in Las Vegas and Denver are used to examine the efficiency of the proposed EGOA-LSTM algorithm. The location of these two wind sites are shown in Figs. 5 and 6. 
In total, there are 17520 wind speed values measured at intervals of 30 min for each of the two wind stations. Thus, sufficient data are available to train and test our proposed deep learning method. Similar to [80], 70% of each dataset is used for training, 10% for validation, and the rest for testing. At the beginning of the experiments, the raw datasets were scaled into the interval [0, 1] using Eq. (31) to improve the forecasting efficiency:

$$z' = \frac{z - z_{\min}}{z_{\max} - z_{\min}}, \tag{31}$$

where $z$ is a raw value, $z_{\min}$ and $z_{\max}$ are the minimum and maximum of the raw data, and $z'$ is the normalized value. The final goal is to predict the wind speed for the next 30 min (one-step) and 1 h (two-step) ahead.

Fig. 5 Location of wind speed site for Las Vegas case study

Fig. 6 Location of wind speed site for Denver case study

3.2 Evaluation metrics

Four metrics are employed to assess the prediction performance of the proposed model on the wind speed values: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R squared ($R^2$). For RMSE, MAE, and MAPE, a lower value indicates a more accurate model, whereas for $R^2$ a higher value is better. The formulas of the performance evaluation metrics are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y'_i - y_i\right)^2} \tag{32}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y'_i - y_i\right| \tag{33}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y'_i - y_i}{y_i}\right| \tag{34}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y'_i - y_i\right)^2}{\sum_{i=1}^{n} \left(y'_i - \tilde{y}_i\right)^2}, \tag{35}$$

where $y'_i$ represents the predicted wind speed value of the corresponding $y_i$ and $n$ indicates the number of data points in the test set.

3.3 Input feature selection

Input feature selection [105–107] is a fundamental and crucial consideration in determining the optimal structure of data-driven models. In the literature, several studies such as [108] have used the auto-correlation function (ACF) to measure the correlation of wind speed time series at various time instances.
As ACF can only capture the linear dependency of a variable on its own past values, and wind speed is highly nonlinear in nature, mutual information (MI) is an effective strategy to estimate both the nonlinear and linear correlations in the data. Assume X and Y are two random variables. The entropy of X, represented by H(X), is a measure of its uncertainty, and the joint entropy of X and Y is denoted by H(X, Y). The conditional entropy H(Y|X) = H(X, Y) − H(X) indicates the uncertainty that remains in Y after the variable X is observed. The MI is a nonlinear measure between two random variables that quantifies the amount of information acquired about one variable when the other is observed. MI is determined by I(X, Y) = H(Y) − H(Y|X), which is the reduction in the uncertainty of variable Y due to the observation of variable X, and vice versa. Let v(t) be the value of wind speed at time t; the MI between v(t − l + 1) and v(t + 1) is calculated with l as the time lag. To select the most relevant inputs for our deep EGOA-LSTM algorithm, the wind speed values at time-lags with MI greater than x = 0.4 are included in the input sets for both wind datasets and for the 30-min and 1-h ahead forecasting horizons. In Fig. 7, the MI for the lags l = 1 to l = 200 of the Las Vegas dataset for the 30-min ahead interval is illustrated. As this figure indicates, the MI among the wind speed observations decays as the time-lag grows, so that the threshold is exceeded only at short lags. As a result, time-lags from l = 1 to l = 29 are incorporated. Assume the current time is t and we are going to predict the wind speed values for a future time horizon. Then, our input set is a 29 + 28 = 57 dimensional vector that consists of the 29 wind speed values v(t − 28), …, v(t) and the 28 sequential differences Δv(t − 27), …, Δv(t), where Δv(t) = v(t) − v(t − 1).

Fig. 7 Mutual information of various time-lags for Las Vegas dataset

Table 1 The hyperparameters of the deep LSTM network during the evolution
Hyperparameter   Range
$M_e$            [1–500]
$N_u$            [1–60]
$B_s$            [1–200]
$L_r$            [0.0001–0.1]

3.4 Parameter settings

In this section, we describe the default configurations for running our proposed EGOA-LSTM algorithm. Regarding the initial parameters of EGOA, we set the population size to 30, the maximum number of iterations to 20, and the number of runs to 20 for each dataset. The two main parameters of GOA are $c_{\max}$ and $c_{\min}$, whose values are set to 1 and 0.00004, respectively. These values are selected based on the recommended literature [72]. There are four key hyperparameters for training the deep LSTM neural network, namely maximum epoch ($M_e$), neural units in the hidden layer ($N_u$), batch size ($B_s$), and learning rate ($L_r$), which are fed to EGOA. The range of these hyperparameters is shown in Table 1. Previous works [109–111] have used a more limited range of hyperparameters. In this study, however, we have chosen a wider range of hyperparameters to train the LSTM. Moreover, the number of layers used in the LSTM architecture is set to three. To further assess the predictive ability of our approach, the proposed deep neuroevolution model is compared with recently proposed deep learning models. The single and hybrid algorithms listed here are used as comparison models to highlight the efficiency of EGOA-LSTM. These models are backpropagation (BP) [41], convolutional neural network (CNN) [85], long short-term memory (LSTM) [80], Xgboost [86], empirical mode decomposition and genetic algorithm-BP neural network (EMD-GABP) [87], differential evolution–LSTM (DE-LSTM) [88], and ensemble empirical mode decomposition–GA–particle swarm optimization wavelet neural network (EGP-WNN) [89].
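The input construction of Sect. 3.3 and the search space of Table 1 can be illustrated with a small sketch. It is not the authors' implementation: the function names (`minmax`, `build_inputs`, `to_integer`, `decode`) are hypothetical, the order of the four agent dimensions is taken from Table 1 as an assumption, agents are assumed to live in $[0, 1]^4$, the learning rate is assumed to be scaled linearly into its range, and integer values are clipped to at least 1 because Eq. (27) returns 0 at the lower bound.

```python
import math

import numpy as np


def minmax(z):
    """Eq. (31): scale a series into the interval [0, 1]."""
    z = np.asarray(z, dtype=float)
    return (z - z.min()) / (z.max() - z.min())


def build_inputs(v, n_lags=29, horizon=1):
    """Stack v(t-28), ..., v(t) and the 28 differences into 57 features.

    horizon=1 corresponds to 30-min ahead, horizon=2 to 1-h ahead.
    """
    v = np.asarray(v, dtype=float)
    rows, targets = [], []
    for t in range(n_lags - 1, len(v) - horizon):
        window = v[t - n_lags + 1 : t + 1]  # 29 past values up to time t
        rows.append(np.concatenate([window, np.diff(window)]))  # + 28 differences
        targets.append(v[t + horizon])
    return np.array(rows), np.array(targets)


# Upper bounds b_j of the integer hyperparameters in Table 1.
# The ordering of the dimensions is an assumption made for this sketch.
INT_BOUNDS = (("max_epoch", 500), ("neural_units", 60), ("batch_size", 200))
LR_MIN, LR_MAX = 0.0001, 0.1


def to_integer(x, b, lb=0.0, ub=1.0):
    """Eq. (27): floor(b * (x - lb) / (ub - lb) + 0.5)."""
    return math.floor(b * (x - lb) / (ub - lb) + 0.5)


def decode(agent, lb=0.0, ub=1.0):
    """Map a 4-dimensional agent in [lb, ub]^4 to LSTM hyperparameters."""
    params = {}
    for x, (name, b) in zip(agent[:3], INT_BOUNDS):
        # Assumption: clip to 1, since Eq. (27) gives 0 when x equals lb.
        params[name] = max(1, to_integer(x, b, lb, ub))
    frac = (agent[3] - lb) / (ub - lb)
    # Assumption: linear scaling of the continuous learning-rate dimension.
    params["learning_rate"] = LR_MIN + frac * (LR_MAX - LR_MIN)
    return params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = minmax(rng.random(200))  # stand-in for a wind speed series
    X, y = build_inputs(series, horizon=1)
    print(X.shape, y.shape)
    print(decode([0.1, 0.5, 0.25, 0.0]))
```

In a full search loop, each agent would be decoded this way before an LSTM is trained and scored with Eq. (30); that training step is omitted here.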
The configurations of these single and hybrid comparison approaches follow their recommended literature. The proposed EGOA-LSTM model is implemented in the Python programming language [112] version 3.7 with TensorFlow 1.15, CUDA 10.1, and cuDNN 8.0.5, and executed on a machine with an NVIDIA GTX 1080 Ti GPU, 32 GB of RAM, and an Intel Core i7 3.7 GHz 12-core CPU.

3.5 Analysis of the results and discussion

In this section, we report the results of experiments for two case studies with two forecasting horizons. We then discuss these results in detail.

3.5.1 Las Vegas case study

In this case study, the wind speed data recorded every 30 min was used as the dataset. We consider this wind speed dataset for utmost short-term forecasting 30 min ahead (one-step ahead) and short-term forecasting 1 h ahead (two-step ahead). Tables 2 and 3 report the forecasting performance of the different prediction algorithms for the 30-min ahead and 1-h ahead wind speed data, respectively. Moreover, Fig. 8 shows the actual and predicted values of the different forecasting algorithms for the next 30 min ahead on the Las Vegas dataset. The blue and red colors in this figure represent the actual and predicted data values of the algorithms used in this paper, respectively. The convergence curve for the two horizons of the Las Vegas dataset is shown in Fig. 9. Also, the violin plots of the four hyperparameters involved in optimizing LSTMs using our novel deep neuroevolution method are illustrated in Figs. 10 and 11.

From Table 2 and Fig. 8, it is noteworthy that the proposed EGOA-LSTM performs better than the compared forecasting techniques, with the minimum value of RMSE as 0.033647, MAE as 0.019135, MAPE as 24.42821, and the maximum value of $R^2$ as 0.956096 for the next 30-min wind speed prediction.
On the other hand, the best algorithm among the compared predictive models is EGP-WNN with RMSE as 0.037143, MAE as 0.025895, MAPE as 51.28683, and $R^2$ as 0.946497, whereas Xgboost is the worst one with RMSE as 0.158511, MAE as 0.147968, MAPE as 418.574722, and $R^2$ as 0.467719. It appears from Fig. 8 that EGOA-LSTM fits the curve of the actual wind speed time series better than the other forecasting models.

Table 2 Error estimated results of the predictions of the 30-min ahead wind speed time series for Las Vegas dataset. The best value of each performance evaluation metric is in the last row (EGOA-LSTM)
Algorithm    RMSE      MAE       $R^2$     MAPE
XGBOOST      0.158511  0.147968  0.467719  418.5747
BP           0.050608  0.033037  0.900678  58.43221
CNN          0.047746  0.031668  0.911593  55.59189
LSTM         0.046534  0.032627  0.916025  65.59595
DE-LSTM      0.045165  0.030470  0.920891  51.78664
EMD-GABP     0.043582  0.028625  0.926339  47.20511
EGP-WNN      0.037143  0.025895  0.946497  51.28683
EGOA-LSTM    0.033647  0.019135  0.956096  24.42821

Table 3 Error estimated results of the predictions of the 1-h ahead wind speed time series for Las Vegas dataset. The best value of each performance evaluation metric is in the last row (EGOA-LSTM)
Algorithm    RMSE      MAE       $R^2$     MAPE
XGBOOST      0.176611  0.164245  0.206522  472.6453
BP           0.106510  0.076001  0.561186  161.1773
CNN          0.102338  0.073158  0.594892  153.8014
LSTM         0.095044  0.068763  0.650583  127.7113
DE-LSTM      0.092181  0.065956  0.671315  125.8747
EMD-GABP     0.088070  0.062095  0.699977  113.9565
EGP-WNN      0.065221  0.041562  0.835458  81.66707
EGOA-LSTM    0.064619  0.038741  0.838482  66.02486

Table 3 shows that EGOA-LSTM achieves better performance than the compared forecasting algorithms for the next 1-h ahead wind speed forecasting, with the minimum value of RMSE as 0.064619, MAE as 0.038741, MAPE as 66.02486, and the maximum value of $R^2$ as 0.838482. Among the compared models, EGP-WNN is the leading algorithm, with the lowest RMSE, MAE, and MAPE and the highest $R^2$.
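For readers who want to recompute the quantities reported in Tables 2 and 3, the metrics of Sect. 3.2 can be sketched as below. This is an illustration rather than the authors' evaluation code, and the function names are hypothetical. Two assumptions are made: MAPE is returned as a fraction, exactly as Eq. (34) is printed (multiply by 100 for a percentage), and $R^2$ uses the standard coefficient-of-determination form in which the denominator is the squared deviation of the actual values from their mean.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Eq. (32): root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_true, y_pred):
    """Eq. (33): mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def mape(y_true, y_pred):
    """Eq. (34) as a fraction; requires every actual value to be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)))


def r_squared(y_true, y_pred):
    """Standard R^2 (assumption): 1 - SS_res / SS_tot around the mean of y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(rmse(actual, predicted), mae(actual, predicted))
    print(mape(actual, predicted), r_squared(actual, predicted))
```

Because the data are scaled into [0, 1], actual values at or near zero make the MAPE ratio very large or undefined, so such points need separate handling in practice.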
The convergence profile of the proposed EGOA-LSTM algorithm for the Las Vegas dataset using the two forecasting horizons is shown in Fig. 9. As we can see in this figure, the prediction error for 1-h ahead forecasting is much higher than for 30-min ahead forecasting. Moreover, our proposed method converges properly by the end of the maximum iteration number for both forecasting horizons.

Fig. 8 The wind speed forecasting results of 30-min ahead obtained by different algorithms on Las Vegas case study (panels: EGOA-LSTM, Xgboost, BP, CNN, LSTM, DE-LSTM, EMD-GABP, EGP-WNN; x-axis: Time (half-hour); y-axis: Wind speed (m/s))

The violin plots of the four hyperparameters evolved by the EGOA-LSTM algorithm for 30-min and 1-h ahead wind speed forecasting are illustrated in Figs. 10 and 11. These two figures show that EGOA-LSTM assigns values to the deep LSTM hyperparameters that do not imply a high computational load (usually less than the maximum value of the interval). For instance, looking at the batch size values for both the 30-min and 1-h intervals of the Las Vegas dataset, most of the assigned values are around or below the median (the line shown in the figure). The same interpretation applies to the other three hyperparameters and indicates the high capability of the proposed evolutionary search algorithm in setting the hyperparameters of the LSTM neural network.

To evaluate the performance of the proposed algorithm statistically, we show the boxplots of RMSE in Figs. 12 and 13 for the proposed EGOA-LSTM versus the other benchmarks on the Las Vegas dataset for the two forecasting horizons.
As seen from these two figures, the dominance of the proposed deep EGOA-LSTM is evident in both forecasting horizons.

Fig. 9 The convergence profile of EGOA-LSTM algorithm for two forecasting horizons of Las Vegas case study (x-axis: Iteration; y-axis: RMSE; curves: 1 hr and 30 min forecasting horizons)

3.5.2 Denver case study

This section investigates the forecasting of the next 30 min and 1 h ahead for the wind speed time series of the Denver dataset. Tables 4 and 5 display the forecasting performance of the compared algorithms. The actual and predicted points of the test set for the different forecasting algorithms are visualized in Fig. 14.

Table 4 demonstrates that the proposed EGOA-LSTM model dominates the other compared forecasting methods, with the minimum value of RMSE as 0.042213, MAE as 0.028105, MAPE as 40.930122, and the maximum value of $R^2$ as 0.911182 for utmost short-term 30-min wind speed forecasting. Among the compared prediction algorithms, the best forecasting performance is achieved by EGP-WNN with RMSE as 0.045463, MAE as 0.030756, MAPE as 43.49489, and $R^2$ as 0.896977. From Fig. 14, we notice that the actual and predicted points of our novel deep neuroevolution method match closely. We also observe such dominance of our proposed method in Table 5 for 1-h ahead wind speed forecasting, with the maximum value of $R^2$ as 0.746495 and minimum values of RMSE as 0.071538, MAE as 0.049332, and MAPE as 73.45831, while the best predictive algorithm among the compared models is EGP-WNN with RMSE as 0.073322, MAE as 0.052651, MAPE as 85.09159, and $R^2$ as 0.733687. As can be seen in Fig. 14, the wind speed predicted by the EGOA-LSTM model is closer to the actual data points and yields fewer errors in the Denver case study.

Figure 15 shows the convergence curve for the EGOA-LSTM algorithm using the 30-min and 1-h ahead horizons for the Denver case study.
As in the Las Vegas case study, EGOA-LSTM converges smoothly by the maximum iteration number (20), and it produces lower error values for the 30-min ahead prediction than for the 1-h ahead horizon. Besides, the four LSTM hyperparameters optimized by EGOA take values that imply a low computational load, as shown in Figs. 16 and 17. For example, for the learning rate hyperparameter the proposed algorithm mostly chooses, in both cases, values that are close to the beginning of the interval or to the median, indicating that the algorithm is effective in setting the LSTM hyperparameters. Moreover, the boxplots for the two forecasting horizons of the Denver case study are illustrated in Figs. 18 and 19. We notice from these two figures that the novel EGOA-LSTM performs better than all single and hybrid benchmarks.

Finally, we present the best architectures obtained by the proposed algorithm for both datasets in the 1-step (30-min) and 2-step (1-h) ahead time periods in Table 6. As an example, for the prediction of the next 30 min ahead in the Denver case study, the algorithm selects maximum epoch = 50, number of units = 23, batch size = 50, and learning rate = 0.0001, which results in an RMSE equal to 0.033562.

We now briefly discuss and compare the proposed EGOA-LSTM model with the other forecasting algorithms. The experimental findings for utmost short-term and short-term wind speed forecasting in both case studies show that EGOA-LSTM is superior in the error indicators (RMSE, MAE, MAPE, and $R^2$) to the comparative forecasting algorithms, including CNN, LSTM, BP, Xgboost, EMD-GABP, DE-LSTM, and EGP-WNN. Moreover, the predictions of the proposed EGOA-LSTM model match most of the actual points in both case studies.

Fig. 10 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Las Vegas case study of the 30-min ahead forecasting (panels: Batch size, Learning rate, Maximum epoch, Neural units)

Fig. 11 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Las Vegas case study of the 1-h ahead forecasting (panels: Batch size, Learning rate, Maximum epoch, Neural units)

Fig. 12 RMSE box plots of all models for Las Vegas 30-min ahead forecasting (x-axis: Algorithms; y-axis: RMSE)

From the convergence curves of the two case studies, we can see that short-term wind speed forecasting is more costly and challenging than utmost short-term wind speed forecasting: the error rises when the prediction horizon is lengthened from 30 min to 1 h. Besides, to show the effectiveness of the proposed EGOA-LSTM from the statistical point of view, we evaluated it using boxplots for the different horizons of the two datasets. The results indicate that the novel EGOA-LSTM performs better than the other benchmarks used in the experiments.

Based on the evaluation error results, EGP-WNN itself shows a robust performance among the compared benchmarks. Our proposed deep neuroevolution, which optimizes the four key hyperparameters of LSTM networks, improves the generalization robustness and competency of single LSTMs.
On the other hand, the optimized hyperparameters visualized in the violin plots for the two forecasting horizons of the two datasets show that EGOA-LSTM has not chosen complex and heavy values during the optimization of the LSTMs. This behavior shows the cost-efficiency of the EGOA-LSTM algorithm. According to the discussions in this section, we conclude that the EGOA-LSTM algorithm proposed in this study is efficient and promising, and can be considered a reliable alternative strategy for wind speed time series forecasting.

Fig. 13 RMSE box plots of all models for Las Vegas 1-h ahead forecasting (x-axis: Algorithms; y-axis: RMSE)

Table 4 Error estimated results of the predictions of the 30-min ahead wind speed time series for Denver dataset. The best value of each performance evaluation metric is in the last row (EGOA-LSTM)
Algorithm    RMSE      MAE       $R^2$     MAPE
XGBOOST      0.150719  0.137569  0.584349  287.0161
BP           0.072462  0.052733  0.738284  70.73292
CNN          0.067939  0.050329  0.769935  72.37543
LSTM         0.054729  0.039344  0.850702  56.17056
DE-LSTM      0.056017  0.040672  0.843593  60.11669
EMD-GABP     0.053664  0.038637  0.856455  56.57227
EGP-WNN      0.045463  0.030756  0.896977  43.49489
EGOA-LSTM    0.042213  0.028105  0.911182  40.93012

Table 5 Error estimated results of the predictions of the 1-h ahead wind speed time series for Denver dataset. The best value of each performance evaluation metric is in the last row (EGOA-LSTM)
Algorithm    RMSE      MAE       $R^2$     MAPE
XGBOOST      0.164598  0.148963  0.342062  294.7785
BP           0.107107  0.080082  0.431719  116.0144
CNN          0.105921  0.078654  0.444242  113.3859
LSTM         0.112653  0.084376  0.371346  108.6512
DE-LSTM      0.109007  0.082086  0.411381  119.9755
EMD-GABP     0.105245  0.078292  0.451306  111.0744
EGP-WNN      0.073322  0.052651  0.733687  85.09159
EGOA-LSTM    0.071538  0.049332  0.746495  73.45831

Fig. 14 The wind speed forecasting results of 30-min ahead obtained by different algorithms on Denver case study (panels: EGOA-LSTM, Xgboost, BP, CNN, LSTM, DE-LSTM, EMD-GABP, EGP-WNN; x-axis: Time (half-hour); y-axis: Wind speed (m/s))

4 Conclusions and future directions

Wind speed forecasting is an essential problem in the conversion, consumption, and operation of wind energy, and it has received much attention in recent years. This paper presented a novel deep neuroevolution approach for wind speed forecasting, which optimizes the deep learning time series LSTM algorithm with an enhanced version of GOA (EGOA) that involves chaotic theory and levy-flight operators. Adding these two powerful evolutionary operators to the original GOA helps to adjust and balance its exploration and exploitation phases. In this study, EGOA was used to optimize the hyperparameters of the evolved LSTM neural networks so that they learn and predict wind speed time series data. To confirm the feasibility of the proposed EGOA-LSTM, two case studies with data collected from two wind stations near Las Vegas and Denver in the USA were used to forecast the utmost short-term (30-min ahead) and short-term (1-h ahead) wind speeds.
We used mutual information as the feature selection strategy to determine the optimal inputs of our proposed deep learning model. Compared to other prominent forecasting methods such as LSTM, CNN, BP, Xgboost, EMD-GABP, DE-LSTM, and EGP-WNN, our novel EGOA-LSTM algorithm obtained the best prediction performance, with the minimum values of RMSE, MAE, and MAPE and the maximum value of $R^2$. Furthermore, the analysis of the impact of the evolved hyperparameters on the forecasting performance of the LSTMs showed that the LSTM hyperparameters optimized by EGOA lead to a low computational cost. The proposed EGOA-LSTM algorithm achieved adequate wind speed forecasting performance based on the nonlinear-learning features of LSTMs and EGOA.

Fig. 15 The convergence profile of EGOA-LSTM algorithm for two forecasting horizons of Denver case study (x-axis: Iteration; y-axis: RMSE; curves: 1 hr and 30 min forecasting horizons)

In this paper, we analyzed univariate time series prediction for wind speed forecasting. In future work, researchers can study multivariate time series prediction for more complicated wind speed forecasting problems, based on more advanced deep neuroevolution models that use more interdependent attributes such as power system statuses and weather conditions. Moreover, an attempt can be made to develop more optimal deep learning algorithms to promote the forecasting of green energy resources. A further valuable direction might be to expand the size of the datasets, which would make the training process more robust against over-fitting. An analysis of the wind speed results for forecasts of the next few hours and of multiple days could be undertaken as another future work. The deep neuroevolution strategy proposed in this work may further be used to obtain probabilistic forecasts that quantify the corresponding uncertainties in the wind speed datasets.
Also, the new GOA-based model can Batch size Learning rate Maximum epoch Neural units 0 20 40 60 0 100 200 300 400 0.000 0.002 0.004 −50 0 50 100 150 200 Fig. 16   Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Denver study of the 30-min ahead forecasting Batch size Learning rate Maximum epoch Neural units 0 20 40 60 80 −200 0 200 400 −0.005 0.000 0.005 0.010 0.015 −50 0 50 100 150 200 Fig. 17   Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Denver case study of the 1-h ahead forecasting XGBOOST BP CNN LSTM DE-LSTM EMD-GABP EGP-WNN EGOA-LSTM Algorithms 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 R M SE Fig. 18   RMSE box plots of all models for Denver 30-min ahead forecasting be applied to areas such as neural network-based robotic systems [113]. Acknowledgements  This research was partially supported by the Aus- tralian Research Council Discovery Projects funding scheme (project DP190102181 and DP210101465). References 1. Liu M, Cao Z, Zhang J, Wang L, Huang C, Luo X (2020) Short- term wind speed forecasting based on the jaya-svm model. Int J Electric Power Energy Syst 121:106056 2. Watil A, El Magri A, Raihani A, Lajouad R, Giri F (2020) Multi- objective output feedback control strategy for a variable speed wind energy conversion system. Int J Electric Power Energy Syst 121:106081 3. Abedi A, Rahimiyan M (2020) Day-ahead energy and reserve scheduling under correlated wind power production. Int J Elec- tric Power Energy Syst 120:105931 4. Wang J, Song Y, Liu F, Hou R (2016) Analysis and application of forecasting models in wind power integration: a review of multi-step-ahead wind speed forecasting models. Renew Sustain Energy Rev 60:960–981 5. Hassan S, Khosravi A, Jaafar J (2015) Examining performance of aggregation algorithms for neural network-based electric- ity demand forecasting. Int J Electric Power Energy Syst 64:1098–1105 6. 
Mahmoudi MR, Heydari MH, Avazzadeh Z, Pho K-H (2020) Goodness of fit test for almost cyclostationary processes. Digit Signal Proc 96:102597 7. Mahmoudi MR, Maleki M, Pak A (2018) Testing the equality of two independent regression models. Commun Stat-Theory Methods 47:2919–2926 8. Haghbin H, Mahmoudi MR, Shishebor Z (2015) Large sample inference on the ratio of two independent binomial proportions. J Math Ext 5:87–95 9. Mahmoudi MR, Behboodian J, Maleki M (2017) Large sample inference about the ratio of means in two independent popula- tions. J Stat Theory Appl 16:366–374 10. Lydia M, Kumar SS, Selvakumar AI, Kumar GEP (2016) Linear and non-linear autoregressive models for short-term wind speed forecasting. Energy Convers Manag 112:115–124 11. Ailliot P, Monbet V (2012) Markov-switching autoregressive models for wind time series. Environ Model Softw 30:92–101 12. Torres JL, Garcia A, De Blas M, De Francisco A (2005) Fore- cast of hourly average wind speed with arma models in navarre (spain). Sol Energy 79:65–77 13. Yunus K, Thiringer T, Chen P (2015) Arima-based frequency- decomposed modeling of wind speed time series. IEEE Trans Power Syst 31:2546–2556 14. Jahangir H, Golkar MA, Alhameli F, Mazouz A, Ahmadian A, Elkamel A (2020) Short-term wind speed forecasting framework based on stacked denoising auto-encoders with rough ann. Sus- tain Energy Technol Assess 38:100601 15. Zhang X, Wang D, Zhou Z, Ma MYJITOPA (2019) Intelligence, robust low-rank tensor recovery with rectification and alignment. XGBOOST BP CNN LSTM DE-LSTM EMD-GABP EGP-WNN EGOA-LSTM Algorithms 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 R M SE Fig. 19   RMSE box plots of all models for Denver 1-h ahead forecasting Table 6   The best architectures obtained by EGOA-LSTM based on RMSE error metric Dataset Time-step M e N u B s L r RMSE Las Vegas 1 50 6 40 0.0011 0.030012 2 70 17 70 0.0001 0.061139 Denver 1 50 23 50 0.0001 0.033562 2 60 33 40 0.0006 0.068242 IEEE Trans Pattern Anal Mach Intell. 
https​://doi.org/10.1109/ TPAMI​.2019.29290​43 16. Zhang X, Wang T, Wang J, Tang G, Zhao L (2020) Pyramid channel-based feature attention network for image dehazing. Comput Vis Image Understand 197–198:103003. http://www. scien​cedir​ect.com/scien​ce/artic​le/pii/S1077​31422​03007​09 17. Zhang X, Jiang R, Wang T, Wang JJITOC (2020) S. f. V. tech- nology, recursive neural network for video deblurring. IEEE Trans Circ Syst Video Technol. https​://doi.org/10.1109/TCSVT​ .2020.30357​22 18. Zhang X, Wang T, Luo W, Huang PJITOC (2020) S. f. V. Tech- nology, Multi-level fusion and attention-guided cnn for image dehazing. IEEE Trans Circ Syst Video Technol. https​://doi. org/10.1109/TCSVT​.2020.30466​25 19. Zhang X, Wang J, Wang T, Jiang R, Xu J, Zhao LJIS (2020) Robust feature learning for adversarial defense via hierar- chical feature alignment. Inf Sci. https​://doi.org/10.1016/j. ins.2020.12.042 20. Jalali SMJ, Moro S, Mahmoudi MR, Ghaffary KA, Maleki M, Alidoostan A (2017) A comparative analysis of classifiers in can- cer prediction using multiple data mining techniques. In J Bus Intell Syst Eng 1:166–178 21. Jalali SMJ, Khosravi A, Alizadehsani R, Salaken SM, Kebria PM, Puri R, Nahavandi S (2019) Parsimonious evolutionary- based model development for detecting artery disease. In: 2019 IEEE International Conference on industrial technology (ICIT), IEEE, pp 800–805 22. Jalali SMJ, Ahmadian S, Khosravi A, Mirjalili S, Mahmoudi MR, Nahavandi S (2020) Neuroevolution-based autonomous robot navigation: a comparative study. Cogn Syst Res 62:35–43 23. Mousavirad SJ, Schaefer G, Jalali SMJ, Korovin I (2020) A benchmark of recent population-based metaheuristic algorithms for multi-layer neural network training. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference com- panion, pp 1402–1408 24. 
Jalali SMJ, Ahmadian S, Kebria PM, Khosravi A, Lim CP, Nahavandi S (2019) Evolving artificial neural networks using butterfly optimization algorithm for data classification. In: International Conference on neural information processing, Springer, pp 596–607
25. Hasani H, Jalali SMJ, Rezaei D, Maleki M (2018) A data mining framework for classification of organisational performance based on rough set theory. Asian J Manag Sci Appl 3:156–180
26. Jalali SMJ, Kebria PM, Khosravi A, Saleh K, Nahavandi D, Nahavandi S (2019) Optimal autonomous driving through deep imitation learning and neuroevolution. In: 2019 IEEE International Conference on systems, man and cybernetics (SMC), IEEE, pp 1215–1220
27. Mousavirad SJ, Jalali SMJ, Ahmadian S, Khosravi A, Schaefer G, Nahavandi S (2020) Neural network training using a biogeography-based learning strategy. In: International Conference on neural information processing, Springer, pp 147–155
28. Jalali SMJ, Khosravi A, Kebria PM, Hedjam R, Nahavandi S (2019) Autonomous robot navigation system using the evolutionary multi-verse optimizer algorithm. In: 2019 IEEE International Conference on systems, man and cybernetics (SMC), IEEE, pp 1221–1226
29. Ahmadian S, Khanteymoori AR (2015) Training back propagation neural networks using asexual reproduction optimization. In: 2015 7th Conference on information and knowledge technology (IKT), IEEE, pp 1–6
30. Quan H, Srinivasan D, Khosravi A (2016) Integration of renewable generation uncertainties into stochastic unit commitment considering reserve and risk: a comparative study. Energy 103:735–745
31. Qiu T, Shi X, Wang J, Li Y, Qu S, Cheng Q, Cui T, Sui S (2019) Deep learning: a rapid and efficient route to automatic metasurface design. Adv Sci 6:1900128
32. Li C, Hou L, Sharma BY, Li H, Chen C, Li Y, Zhao X, Huang H, Cai Z, Chen H (2018) Developing a new intelligent system for the diagnosis of tuberculous pleural effusion. Comput Methods Programs Biomed 153:211–225
33. Wang M, Chen H (2020) Chaotic multi-swarm whale optimizer boosted support vector machine for medical diagnosis. Appl Soft Comput 88:105946
34. Chen H-L, Wang G, Ma C, Cai Z-N, Liu W-B, Wang S-J (2016) An efficient hybrid kernel extreme learning machine approach for early diagnosis of Parkinson's disease. Neurocomputing 184:131–144
35. Kong X, Liu X, Shi R, Lee KY (2015) Wind speed prediction using reduced support vector machines with feature selection. Neurocomputing 169:449–456
36. Yu C, Li Y, Bao Y, Tang H, Zhai G (2018) A novel framework for wind speed prediction based on recurrent neural networks and support vector machine. Energy Convers Manag 178:137–145
37. Yang S, Deng B, Wang J, Li H, Lu M, Che Y, Wei X, Loparo KA (2019) Scalable digital neuromorphic architecture for large-scale biophysically meaningful neural network with multi-compartment neurons. IEEE Trans Neural Netw Learn Syst 31:148–162
38. Zhang Y, Liu R, Heidari AA, Wang X, Chen Y, Wang M, Chen H (2020) Towards augmented kernel extreme learning models for bankruptcy prediction: algorithmic behavior and comprehensive analysis. Neurocomputing. https://doi.org/10.1016/j.neucom.2020.10.038
39. Xia J, Chen H, Li Q, Zhou M, Chen L, Cai Z, Fang Y, Zhou H (2017) Ultrasound-based differentiation of malignant and benign thyroid nodules: an extreme learning machine approach. Comput Methods Programs Biomed 147:37–49
40. Chen H, Qiao H, Xu L, Feng Q, Cai K (2019) A fuzzy optimization strategy for the implementation of RBF LSSVR model in Vis-NIR analysis of pomelo maturity. IEEE Trans Ind Inf 15:5971–5979
41. Cadenas E, Rivera W (2009) Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renew Energy 34:274–278
42. Guo Z-H, Wu J, Lu H-Y, Wang J-Z (2011) A case study on a hybrid wind speed forecasting method using BP neural network. Knowl-Based Syst 24:1048–1056
43.
Wang J, Du P, Niu T, Yang W (2017) A novel hybrid system based on a new proposed algorithm–multi-objective whale optimization algorithm for wind speed forecasting. Appl Energy 208:344–360
44. Tian C, Hao Y, Hu J (2018) A novel wind speed forecasting system based on hybrid data preprocessing and multi-objective optimization. Appl Energy 231:301–319
45. Salcedo-Sanz S, Pastor-Sánchez A, Prieto L, Blanco-Aguilera A, García-Herrera R (2014) Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization-extreme learning machine approach. Energy Convers Manag 87:10–18
46. Zhao X, Zhang X, Cai Z, Tian X, Wang X, Huang Y, Chen H, Hu L (2019) Chaos enhanced grey wolf optimization wrapped ELM for diagnosis of paraquat-poisoned patients. Comput Biol Chem 78:481–490
47. Wang M, Chen H, Yang B, Zhao X, Hu L, Cai Z, Huang H, Tong C (2017) Toward an optimal kernel extreme learning machine using a chaotic moth-flame optimization strategy with applications in medical diagnoses. Neurocomputing 267:69–84
48. Zhang C, Wei H, Xie L, Shen Y, Zhang K (2016) Direct interval forecasting of wind speed using radial basis function neural networks in a multi-objective optimization framework. Neurocomputing 205:53–63
49.
Zhang H, Qiu Z, Cao J, Abdel-Aty M, Xiong L (2019) Event-triggered synchronization for neutral-type semi-Markovian neural networks with partial mode-dependent time-varying delays. IEEE Trans Neural Netw Learn Syst 31:4437–4450
50. Lv Z, Qiao L (2020) Deep belief network and linear perceptron based cognitive computing for collaborative robots. Appl Soft Comput 92:106300
51. Khodayar M, Khodayar ME, Jalali SMJ (2021) Deep learning for pattern recognition of photovoltaic energy generation. Electric J 34:106882
52. Chen J, Zeng G-Q, Zhou W, Du W, Lu K-D (2018) Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. Energy Convers Manag 165:681–695
53. Liu H, Mi X-W, Li Y-F (2018) Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy Convers Manag 156:498–514
54. Hu L, Hong G, Ma J, Wang X, Chen H (2015) An efficient machine learning approach for diagnosis of paraquat-poisoned patients. Comput Biol Med 59:116–124
55. Shen L, Chen H, Yu Z, Kang W, Zhang B, Li H, Yang B, Liu D (2016) Evolving support vector machines using fruit fly optimization for medical data classification. Knowl-Based Syst 96:61–75
56. Pei S, Qin H, Zhang Z, Yao L, Wang Y, Wang C, Liu Y, Jiang Z, Zhou J, Yi T (2019) Wind speed prediction method based on empirical wavelet transform and new cell update long short-term memory network. Energy Convers Manag 196:779–792
57. Khodayar M, Kaynak O, Khodayar ME (2017) Rough deep neural architecture for short-term wind speed forecasting. IEEE Trans Ind Inf 13:2770–2779
58. Li T, Xu M, Zhu C, Yang R, Wang Z, Guan Z (2019) A deep learning approach for multi-frame in-loop filter of HEVC. IEEE Trans Image Process 28:5663–5678
59.
Chen H, Chen A, Xu L, Xie H, Qiao H, Lin Q, Cai K (2020) A deep learning CNN architecture applied in smart near-infrared analysis of water pollution for agricultural irrigation resources. Agric Water Manag 240:106303
60. Hu H, Wang L, Tao R (2021) Wind speed forecasting based on variational mode decomposition and improved echo state network. Renew Energy 164:729–751
61. Mousavi AA, Zhang C, Masri SF, Gholipour G (2020) Structural damage localization and quantification based on a CEEMDAN Hilbert transform neural network approach: a model steel truss bridge case study. Sensors 20:1271
62. Peng Z, Peng S, Fu L, Lu B, Tang J, Wang K, Li W (2020) A novel deep learning ensemble model with data denoising for short-term wind speed forecasting. Energy Convers Manag 207:112524
63. Hong Y-Y, Satriani TRA (2020) Day-ahead spatiotemporal wind speed forecasting using robust design-based deep learning neural network. Energy 209:118441
64. Qian J, Feng S, Tao T, Hu Y, Li Y, Chen Q, Zuo C (2020) Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3D shape measurement. APL Photon 5:046105
65. Qian J, Feng S, Li Y, Tao T, Han J, Chen Q, Zuo C (2020) Single-shot absolute 3D shape measurement with deep-learning-based color fringe projection profilometry. Opt Lett 45:1842–1845
66. Wu Y-X, Wu Q-B, Zhu J-Q (2019) Data-driven wind speed forecasting using deep feature extraction and LSTM. IET Renew Power Gener 13:2062–2069
67. Yu R, Gao J, Yu M, Lu W, Xu T, Zhao M, Zhang J, Zhang R, Zhang Z (2019) LSTM-EFG for wind power forecasting based on sequential correlation features. Future Gener Comput Syst 93:33–42
68. Wang B, Zhang L, Ma H, Wang H, Wan S (2019) Parallel LSTM-based regional integrated energy system multienergy source-load information interactive energy prediction. Complexity 2019:1–13
69. Sun G, Li C, Deng L (2021) An adaptive regeneration framework based on search space adjustment for differential evolution. Neural Comput Appl.
https://doi.org/10.1007/s00521-021-05708-1
70. Cao Y, Li Y, Zhang G, Jermsittiparsert K, Nasseri M (2020) An efficient terminal voltage control for PEMFC based on an improved version of whale optimization algorithm. Energy Rep 6:530–542
71. Bai B, Guo Z, Zhou C, Zhang W, Zhang J (2021) Application of adaptive reliability importance sampling-based extended domain PSO on single mode failure in reliability engineering. Inf Sci 546:42–59
72. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and application. Adv Eng Softw 105:30–47
73. Saxena A, Shekhawat S, Kumar R (2018) Application and development of enhanced chaotic grasshopper optimization algorithms. Model Simul Eng 2018:1–14
74. Xu Z, Hu Z, Heidari AA, Wang M, Zhao X, Chen H, Cai X (2020) Orthogonally-designed adapted grasshopper optimization: a comprehensive analysis. Expert Syst Appl 150:113282
75. Yu C, Chen M, Cheng K, Zhao X, Ma C, Kuang F, Chen H (2021) SGOA: annealing-behaved grasshopper optimizer for global tasks. Eng Comput. https://doi.org/10.1007/s00366-020-01234-1
76. Abualigah L, Diabat A (2020) A comprehensive survey of the grasshopper optimization algorithm: results, variants, and applications. Neural Comput Appl 32:1–24
77. Wang B, Zhang B, Liu X (2021) An image encryption approach on the basis of a time delay chaotic system. Optik 225:165737
78. Jiang Q, Wang G, Jin S, Li Y, Wang Y (2013) Predicting human microRNA-disease associations based on support vector machine. Int J Data Min Bioinform 8:282–293
79. Song X, Liu Y, Xue L, Wang J, Zhang J, Wang J, Jiang L, Cheng Z (2020) Time-series well performance prediction based on long short-term memory (LSTM) neural network model. J Petrol Sci Eng 186:106682
80. Chang Z, Zhang Y, Chen W (2019) Electricity price prediction based on hybrid model of Adam optimized LSTM neural network and wavelet transform. Energy 187:115804
81.
Wang H, Wang G, Li G, Peng J, Liu Y (2016) Deep belief network based deterministic and probabilistic wind speed forecasting approach. Appl Energy 182:80–93
82. Liu H, Mi X, Li Y (2018) Smart multi-step deep learning model for wind speed forecasting based on variational mode decomposition, singular spectrum analysis, LSTM network and ELM. Energy Convers Manag 159:54–64
83. Ghimire S, Deo RC, Raj N, Mi J (2019) Deep solar radiation forecasting with convolutional neural network and long short-term memory network algorithms. Appl Energy 253:113541
84. Qing X, Niu Y (2018) Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM. Energy 148:461–468
85. Zahid M, Ahmed F, Javaid N, Abbasi RA, Kazmi Z, Syeda H, Javaid A, Bilal M, Akbar M, Ilahi M (2019) Electricity price and load forecasting using enhanced convolutional neural network and enhanced support vector regression in smart grids. Electronics 8:122
86. Li L (2019) Geographically weighted machine learning and downscaling for high-resolution spatiotemporal estimations of wind speed. Remote Sens 11:1378
87. Wang S, Zhang N, Wu L, Wang Y (2016) Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew Energy 94:629–636
88. Peng L, Liu S, Liu R, Wang L (2018) Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 162:1301–1314
89. Filik T (2016) Improved spatio-temporal linear models for very short-term wind speed forecasting. Energies 9:168
90. Xu Y, Chen H, Luo J, Zhang Q, Jiao S, Zhang X (2019) Enhanced moth-flame optimizer with mutation strategy for global optimization. Inf Sci 492:181–203
91.
Zhao D, Liu L, Yu F, Heidari AA, Wang M, Liang G, Muhammad K, Chen H (2020) Chaotic random spare ant colony optimization for multi-threshold image segmentation of 2D Kapur entropy. Knowl-Based Syst 216:106510
92. Tu J, Chen H, Liu J, Heidari AA, Zhang X, Wang M, Ruby R, Pham Q-V (2021) Evolutionary biogeography-based whale optimization methods with communication structure: towards measuring the balance. Knowl-Based Syst 212:106642
93. Shan W, Qiao Z, Heidari AA, Chen H, Turabieh H, Teng Y (2020) Double adaptive weights for stabilization of moth flame optimizer: balance analysis, engineering cases, and medical diagnosis. Knowl-Based Syst 214:106728
94. Hu J, Chen H, Heidari AA, Wang M, Zhang X, Chen Y, Pan Z (2020) Orthogonal learning covariance matrix for defects of grey wolf optimizer: insights, balance, diversity, and feature selection. Knowl-Based Syst 213:106684
95. Yu H, Li W, Chen C, Liang J, Gui W, Wang M, Chen H (2020) Dynamic Gaussian bare-bones fruit fly optimizers with abandonment mechanism: method and analysis. Eng Comput 1–29. https://doi.org/10.1007/s00366-020-01174-w
96. Xu X, Chen H-L (2014) Adaptive computational chemotaxis based on field in bacterial foraging optimization. Soft Comput 18:797–807
97. Chen H, Heidari AA, Chen H, Wang M, Pan Z, Gandomi AH (2020) Multi-population differential evolution-assisted Harris hawks optimization: framework and case studies. Future Gener Comput Syst 111:175–198. https://doi.org/10.1016/j.future.2020.04.008. http://www.sciencedirect.com/science/article/pii/S0167739X19313263. Accessed Oct 2020
98. Wu T, Cao J, Xiong L, Zhang H (2019) New stabilization results for semi-Markov chaotic systems with fuzzy sampled-data control. Complexity 2019
99. Shi K, Tang Y, Zhong S, Yin C, Huang X, Wang W (2018) Non-fragile asynchronous control for uncertain chaotic Lurie network systems with Bernoulli stochastic process. Int J Robust Nonlinear Control 28:1693–1714
100.
Liu J, Wu C, Wu G, Wang X (2015) A novel differential search algorithm and applications for structure design. Appl Math Comput 268:246–269
101. Shi K, Tang Y, Liu X, Zhong S (2017) Non-fragile sampled-data robust synchronization of uncertain delayed chaotic Lurie systems with randomly occurring controller gain fluctuation. ISA Trans 66:185–199
102. Fan Q, Chen Z, Li Z, Xia Z, Yu J, Wang D (2020) A new improved whale optimization algorithm with joint search mechanisms for high-dimensional global optimization problems. Eng Comput 1–28. https://doi.org/10.1007/s00366-019-00917-8
103. Haklı H, Uğuz H (2014) A novel particle swarm optimization algorithm with levy flight. Appl Soft Comput 23:333–345
104. Western wind data set. https://www.nrel.gov/grid/western-wind-data.html. [Online] Accessed 15 Jan 2020
105. Zhao X, Li D, Yang B, Ma C, Zhu Y, Chen H (2014) Feature selection based on improved ant colony optimization for online detection of foreign fiber in cotton. Appl Soft Comput 24:585–596
106. Zhang Y, Liu R, Wang X, Chen H, Li C (2020) Boosted binary Harris hawks optimizer and feature selection. Eng Comput 25:26
107. Zhang X, Fan M, Wang D, Zhou P, Tao D (2020) Top-k feature selection framework using robust 0–1 integer programming. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2020.3009209
108. Bhaskar K, Singh S (2012) AWNN-assisted wind power forecasting using feed-forward neural network. IEEE Trans Sustain Energy 3:306–315
109. Sagheer A, Kotb M (2019) Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 323:203–213
110. Cao J, Li Z, Li J (2019) Financial time series forecasting model based on CEEMDAN and LSTM. Phys A 519:127–139
111. Sagheer A, Kotb M (2019) Unsupervised pre-training of a deep LSTM-based stacked autoencoder for multivariate time series forecasting problems. Sci Rep 9:1–16
112.
He S, Guo F, Zou Q, Ding H (2020) MRMD2.0: a Python tool for machine learning with feature ranking and reduction. Curr Bioinform 15:1213–1221. https://doi.org/10.2174/1574893615999200503030350. http://www.eurekaselect.com/node/181578/article. Accessed Feb 2021
113. Ding L, Li S, Gao H, Chen C, Deng Z (2018) Adaptive partial reinforcement learning neural network-based tracking control for wheeled mobile robotic systems. IEEE Trans Syst Man Cybern Syst 50:2512–2523