
RESEARCH ARTICLE

Short-term forecasting of wind speed using time division ensemble of hierarchical deep neural networks

Ashapurna Marndi1,2 & G. K. Patra1,2 & K. C. Gouda1,2

Received: 24 September 2019 / Accepted: 27 February 2020 © Springer Nature Switzerland AG 2020

Abstract

Wind power in tropical countries like India has great potential as a major source of green energy. However, proper energy provisioning requires forecasting and estimating the wind speed at windmill locations with an actionable lead time. Forecasting wind speed at station level is a big challenge for dynamical models, as they give only macro-level information. Therefore, statistical models are usually preferred for this purpose. With the recent phenomenal growth of applications in artificial intelligence (AI), it is also possible to use data-driven models based on AI, especially deep learning, for short-term forecasting of wind speed. In this paper, we propose a novel ensemble forecasting methodology using the long short-term memory (LSTM) model, which is a deep learning approach for time series data analysis. The capability of this approach is demonstrated using wind speed data obtained from two meteorological stations located at New Delhi in North India and at Bengaluru in South India. We use the ensemble methodology in two different modes: one uses average pooling and the other uses a hierarchical LSTM. The simulations using these models are validated against the true observations at station scale. The ensemble forecasting method shows promising results for 3-h early wind speed prediction at both locations. The results are also compared with two classical statistical methods, namely autoregressive and persistence models, and two state-of-the-art data-driven models, namely support vector machine (SVM) and extreme learning machine (ELM). The capability of the proposed method is demonstrated through various error metrics, and it is found to perform better. We believe that the proposed method has the potential to improve short-term wind speed prediction capability at station level.

Keywords Ensemble forecasting . Long short-term memory . Artificial intelligence . Deep neural networks . Wind speed prediction

https://doi.org/10.1007/s42865-020-00009-2

* Ashapurna Marndi

Extended author information available on the last page of the article

Published online: 4 March 2020

Bulletin of Atmospheric Science and Technology (2020) 1:91–108


1 Introduction

Wind energy is considered one of the most promising green energy resources, with the potential to change the power profile of a country. Especially in tropical countries like India, wind energy production is likely to go up further due to rich wind resources. In wind energy generation, windmills are used to convert wind energy to electrical energy. As a synchronized power provider, a power grid gets its electricity from different sources and distributes it to consumers. In the case of India, about 55% of the total power is thermal, while 13% and 11% of power are obtained from hydro-electric and solar sources respectively. Wind energy accounts for about 10% of the total consumption. Countries like Denmark, Uruguay, and Portugal use wind power as a major source of energy (20–40%), while countries like China, the USA, and Canada (4–6%) are rapidly increasing their wind power generating capability (Alencar et al., 2017). Unlike other sources of power, wind power and solar power generation have a higher degree of uncertainty. Hence, grids face difficulty in energy planning tasks such as power commitment, distribution, and maintenance scheduling. As a result, the grid ends up either wasting excess power or starving for more power. For grids to plan properly, it is essential that the amount of wind energy generated be estimated in advance. This will enable the grid to instruct non-renewable power generating stations to either decrease or increase power production, saving precious resources and reducing pollution. Given effective planning, policies, and programs, wind energy can be a core component of an affordable, reliable, and clean energy portfolio of a country. Accurate prediction of wind speed, which has a cubic relationship with the power that can be generated, is one of the biggest challenges in atmospheric sciences. It is desirable to predict wind speed as early and as accurately as possible. But the stochastic behavior of the wind speed makes it difficult to predict with long lead time and high accuracy.

Typically, numerical weather models are used to predict wind speed. These models depend on dynamical equations and are more suitable for obtaining macro-level information in terms of both time and space. However, wind energy generation depends on the wind speed at the particular location of the windmill. Higher wind speed triggers faster rotation of the blades and in turn generates more power. The amount of energy that can be generated is a function of speed, mass, and the kinetic energy (Patel 2005; Bernstein 2002). In fact, electric power output has a cubic relationship with wind speed. Thus, there is a need to provide a good wind speed forecast that considers the parameters acting at micro level. As dynamical models are not suitable in this case, a better mechanism is needed to predict wind speed at station level.
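The practical weight of the cubic relationship can be illustrated with a short sketch. It uses the textbook idealisation P = 0.5 · ρ · A · Cp · v³; the rotor radius, air density, and power coefficient below are illustrative assumptions rather than values from this study, and turbine cut-in, rated, and cut-out limits are ignored.

```python
import math


def wind_power_watts(speed_m_s, rotor_radius_m=40.0, air_density=1.225,
                     power_coefficient=0.4):
    """Idealised turbine output P = 0.5 * rho * A * Cp * v**3.

    All default values are illustrative assumptions, not data from the paper.
    """
    swept_area = math.pi * rotor_radius_m ** 2
    return 0.5 * air_density * swept_area * power_coefficient * speed_m_s ** 3


def relative_power_error(true_speed, predicted_speed):
    """Relative error in estimated power caused by an error in wind speed."""
    true_power = wind_power_watts(true_speed)
    predicted_power = wind_power_watts(predicted_speed)
    return (predicted_power - true_power) / true_power


if __name__ == "__main__":
    # A 10% overestimate of wind speed inflates the power estimate by about a third.
    print(relative_power_error(8.0, 8.8))
```

Because the error grows with the cube of the speed ratio, even modest improvements in wind speed accuracy translate into noticeably better power estimates.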

Over the last few decades, several efforts have been made to predict wind speed at a specific location using various statistical methods. These are typically autoregressive (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) or persistence-based models. Statistical methods use the difference between predicted and observed parameters to adjust the model parameters. However, the recent trend of rapid instrumentation and the availability of huge observational data enable researchers to use artificial intelligence-based approaches to find relationships between input and output by non-statistical means and use them for forecasting. In particular, methods and techniques based on deep neural networks have shown considerable promise. Deep neural networks, which are based on deep learning that mimics how a human brain works, have already shown considerable promise in areas such as image processing, video processing, audio processing, and natural language processing. However, their application to the prediction of weather and atmospheric parameters is still limited. One of the early studies used three local recurrent neural networks to predict long-term wind speed (Barbounis et al. 2006). Later, Gong and Jing (2010) compared three different neural networks to predict wind speed with a lead time of 1 h. Subsequently, Cao et al. (2012) and Bhaskar and Singh (2012) used a recurrent neural network and a feed-forward neural network respectively to predict wind speed. Using a multi-channel adaptive filter, Dowell et al. (2013) predicted both the wind speed and the direction. Xie et al. (2014) used the probabilistic trigonometric direction diurnal (TDD) model for wind forecasting. The next year, Grover et al. (2015) used a hybrid model based on neural networks for the prediction of wind speed. Wang et al. (2016) proposed a prediction method based on wavelet transform in combination with a deep belief network and spline quantile regression. Later, Xuejiao et al. (2017) used fuzzy logic and a neural network for forecasting the wind speed. In another study, by Liu et al. (2018), the wind speed time series was decomposed into sub-layers of different frequencies using the wavelet packet decomposition technique, and a convolutional neural network (CNN) was used for the high-frequency sub-layers and an LSTM CNN for the low-frequency sub-layers to predict wind speed. Another work proposed a multi-period-ahead wind speed forecast model based on the analysis of variance, stacked denoising autoencoder (SDAE), and ensemble learning (Chen et al. 2019). Though there has been a lot of interest and progress over the last decade, there is still considerable room for improvement in short-term wind speed prediction.

In this paper, we propose and implement a new methodology named time division ensemble (TDE) using a deep neural network (DNN) based on long short-term memory (LSTM) for improving the accuracy of wind speed prediction. Here, each member of the ensemble is distinguished by different time characteristics of the same input data. We present four variants of the proposed approach, two in non-ensemble mode and two in ensemble mode, and compare their results with classical statistical methods such as the persistence and autoregressive models. We also compare them with some artificial intelligence-based models with similar objectives. The proposed algorithm is validated using meteorological observation data at two distinct geographical locations: one in the northern part of India and the other in the southern part of India. The forecast method is demonstrated by first dividing the data into training and testing data. The training data is then used to learn the non-linear relationships between the input and the output parameters, and the testing data is used to validate the performance of the model. It is observed that the ensemble forecasts perform better than the others. To evaluate the performance of all the prediction models, three error indices are used, namely root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (CC).
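For reference, the three error indices can be computed as in the following minimal sketch. It assumes that CC denotes the Pearson correlation coefficient, which the text does not state explicitly; the function names are chosen here for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def correlation(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

Note that a forecast with a constant bias can still have a CC close to 1, which is why CC is read together with RMSE and MAE rather than on its own.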

2 Data

The Council of Scientific and Industrial Research (CSIR), India has set up a number of meteorological towers in various parts of the country to collect most of the major meteorological parameters.

These data sets have already been used in earlier works to understand the variations and dynamics of various parameters at station scale (Kantha Rao and Rakesh 2019; Bhimala and Goswami 2015). For our studies, we have considered the available data from two meteorological observation towers during 2010–2014. For better demonstration, we have considered one tower at New Delhi in north India and the other in Bengaluru (earlier called Bangalore) in south India. Figure 1 shows the climatological daily averaged wind speed for the two cities during 2010 to 2013 as observed by the meteorological towers. It can be observed from the climatology (4-year average) that there is much larger variability in the wind speed observations at New Delhi than at Bengaluru. We have used four meteorological parameters, namely temperature (T), pressure (P), humidity (H), and wind speed (V), available as 30-min averages. The observed data are collected at a height of 20 m.

In this study, the whole data is divided into training, validation, and testing sets. We have considered the data during 2010 to 2013 for training and validation, and the data of 2014 is used for testing. The data used for training, validation, and testing together span a total of 5 years.

3 Methodology

In an artificial neural network (ANN)-based model, it is extremely important to design a robust methodology capable of reliable forecasting by extracting inherent relationships among the data. Hence, it is important to define the basic building blocks of forecasting models such that each of them is robust and efficient. The following subsections explain in detail the methodology adopted for forecasting the wind speed in our study.

3.1 Selection of model parameters

It is important to identify the parameters that drive the forecasting model. Be it a classical statistical model or a DNN, there can be two different approaches. One is the domain-independent approach, where single-variable time series data is used for forecasting without any need for domain knowledge such as the source of the data. In a statistical sense, autoregressive models handle such forecasts, where previous values of the time series are used to predict the future value of the same time series. On the other hand, domain knowledge can enhance the scope of forecasting by intelligently selecting the input. Multi-dimensional data, with the knowledge of the inter-dependency of the various parameters, certainly enhances the forecasting capability. In a statistical sense, multi-regression models use several explanatory variables to predict the outcome of a response variable. In the case of wind speed prediction, the following three aspects need to be identified and defined beforehand to design an optimal forecasting solution.

Fig. 1 Climatological wind speed (daily average) during 2010–13 for New Delhi and Bengaluru locations

3.1.1 Selection of inputs

Time series prediction is one of the hardest prediction problems in data science because not only the features but also the sequence in which they occur needs to be remembered. Wind speed prediction using historical data is one such challenge, on which atmospheric scientists have been working for a long time. Most time series predictions use older/historical data of a particular parameter to predict the same parameter for a future time (Treiber et al. 2015). From the basic understanding of atmospheric circulation, it is well known that wind speed depends on other weather parameters such as temperature, humidity, pressure, rainfall, dew point, and solar radiation. However, the effects of temperature, humidity, and pressure are significantly more important than the others (Cadenas et al. 2016). Hence it is logical to consider at least these three parameters, apart from past wind speed data itself, for predicting wind speed for future time steps.

One of the issues with using long-period historical data is that there may be a significant trend over time, which may increase the mean error or bias. Therefore, detrending of the observed data can be considered. However, earlier studies by Jewson and Brix (2005) have shown that detrending meteorological data covering only a few years (about 5–10 years) can do more harm than good even if the trend is real, because it increases the standard deviation of the prediction. Especially in predictions based on multiple variables, the relationship among the variables may be lost due to the process of detrending. Also, some recurrent neural network-based models, such as LSTM (used in this paper), have the capability of remembering past information while predicting future values. LSTM is known to pick up non-linear trends very well. However, statistical methods such as autoregressive and persistence models do expect the response variables to be stationary and hence need special care while analyzing the data.

Curation of the observed data is another important requirement with respect to quality control, such as cleaning the data and replacing missing data in a scientifically acceptable manner. Though the data used for our studies were already quality controlled during data collection by CSIR, there were still some missing values, which were filled using linear interpolation methods (Falge et al., 2001).
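Gap filling by linear interpolation can be sketched as follows. This is a minimal illustration, not the procedure of Falge et al. (2001): missing samples are assumed to be marked as `None`, and gaps at the very start or end of the record, which have only one valid neighbour, are filled by holding the nearest valid value (an assumption made here for completeness).

```python
def fill_missing_linear(values):
    """Fill None entries by linear interpolation between valid neighbours.

    Leading and trailing gaps are filled with the nearest valid value
    (an assumption of this sketch; the source does not describe edge handling).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from

    # Interior gaps: straight line between the two surrounding valid samples.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(left + 1, right):
            weight = (k - left) / gap
            filled[k] = filled[left] + weight * (filled[right] - filled[left])

    # Edge gaps: hold the nearest valid value.
    for k in range(known[0]):
        filled[k] = filled[known[0]]
    for k in range(known[-1] + 1, len(filled)):
        filled[k] = filled[known[-1]]
    return filled
```

With 30-min sampling, this treats each missing sample as lying on a straight line between its valid neighbours, which is reasonable for short gaps but smooths out variability over long ones.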

3.1.2 Selection of range of prediction

Prediction requirements span different time scales depending on the application, and it is very important to define them in advance. The time scale requirements can be divided into three categories, namely short-term (in hours), medium-term (in days), and long-term (in weeks or months). The prediction accuracy decreases as we go from short to long range of prediction. However, it should be emphasized that the purpose of the prediction decides the range of prediction. In weather-related applications, long-term predictions can be useful for making policy decisions, though the accuracy level may be lower. For wind energy generation, however, accuracy of prediction is of utmost importance, and hence short-term prediction is the requirement. Even in short-term forecasting, it is important that the prediction is provided with a sufficient actionable lead time. Small power system operation needs the wind power at smaller time scales, i.e., 1 to 3 h. This was also confirmed by some of the state power grid agencies in southern India. In their view, a 3-h early prediction gives sufficient actionable time for electrical grids to plan a proper rescheduling of energy generation from non-renewable energy sources. Hence, the present work concentrates on 3-h early prediction at station level, which can provide a proper estimate of potential wind energy.

3.1.3 Selection of model

One of the important and most challenging aspects of a prediction system is to identify a suitable model. As far as wind speed prediction is concerned, several models exist based on numerical, statistical, and artificial intelligence techniques (Buhan et al., 2016; Bianchi et al., 2017). For station-based prediction, numerical prediction methods are not suitable. Traditional time series forecasting methods based on statistical models focus on univariate data with linear relationships. On the other hand, machine learning-based systems model non-linear and complex relationships better and provide a better prediction capability (Makridakis et al. 2018). In our study, in addition to our proposed model, we have considered two classical statistical techniques (autoregressive and persistence) and two state-of-the-art data-driven models, namely support vector machine (Mohandes et al., 2004; Pinto et al., 2014) and extreme learning machine (Lazarevska 2016). Our results are compared with the results obtained from these models. The classical statistical methods as well as the data-driven models are univariate, and hence they use the previous values of wind speed to predict the future wind speed. The proposed LSTM-based model uses four interdependent parameters, i.e., temperature, humidity, and pressure in addition to the wind speed itself, over a predefined number of previous time steps, and predicts the future wind speed.

Among the machine learning methods, the recurrent neural network (RNN) uses feedback loops so that the network can learn sequence information. However, a standard RNN fails to remember long-term information, and hence we propose to use the long short-term memory (LSTM) network for our studies. An earlier study by Choi and Lee (2018) found LSTM to be the most effective solution for time series prediction. LSTM has an edge over other conventional neural network and recurrent neural network algorithms due to its property of selectively remembering patterns for long durations of time, as mentioned by Hochreiter and Schmidhuber (1997). LSTM has a great benefit in time series forecasting, where classical linear methods have difficulty adapting to multivariate forecasting problems.

A typical LSTM (Fig. 2) consists of different memory blocks called “cells.” There are two states, named cell state ($C_n$) and hidden state ($h_n$), that are updated at every timestamp and propagated to the next timestamp. The memory blocks are responsible for remembering and manipulating the memory. This is performed through three major mechanisms called “gates.” The first gate is called the “Forget” gate ($f_n$), which is responsible for removing information. The second gate is the “Input” gate ($i_n$), which is responsible for the addition of information to the cell state. The final one is the “Output” gate ($o_n$), which selects useful information from the current cell and shows it as an output.


The three gates of the LSTM, the forget gate ($f_n$), input gate ($i_n$), and output gate ($o_n$), are defined using the following three equations as described by Hochreiter and Schmidhuber (1997).

$$f_n = \sigma\left(W_f I_n + U_f h_{n-1} + b_f\right) \tag{1}$$

$$i_n = \sigma\left(W_i I_n + U_i h_{n-1} + b_i\right) \tag{2}$$

$$o_n = \sigma\left(W_o I_n + U_o h_{n-1} + b_o\right) \tag{3}$$

where $I_n$ is the input to the LSTM network, which is defined in the next section. Based on the values of the forget gate, input gate, and output gate given in the above equations, the cell state ($C_n$) and hidden state ($h_n$) of the LSTM are updated at every timestamp using the following equations:

$$C_n = f_n \odot C_{n-1} + i_n \odot \tanh\left(W_c I_n + U_c h_{n-1} + b_c\right) \tag{4}$$

$$h_n = \tanh\left(C_n\right) \odot o_n \tag{5}$$

where $W_f$, $W_i$, $W_o$, and $W_c$ are the weight matrices of the current timestamp; $U_f$, $U_i$, $U_o$, and $U_c$ are the weight matrices of the previous timestamp; $b_f$, $b_i$, $b_o$, and $b_c$ are the bias vectors for the gates $f_n$, $i_n$, $o_n$, and the cell state $C_n$ respectively; and $h_{n-1}$ is the hidden state of the previous timestamp. $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions respectively, and $\odot$ denotes element-wise multiplication. These gates enable LSTM to work effectively on time series data.
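The gate and state updates of Eqs. (1) to (5) can be written out directly for a single timestamp. The following NumPy sketch is only an illustration of those equations; the dictionary layout, the helper `init_params`, and the random initialisation are assumptions made here and do not reflect how the Keras implementation used in this work stores its weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_inputs, n_hidden, scale=0.1, seed=0):
    """Hypothetical helper: small random W, U and b for gates f, i, o and cell c."""
    rng = np.random.default_rng(seed)
    gates = "fioc"
    return {
        "W": {g: scale * rng.standard_normal((n_hidden, n_inputs)) for g in gates},
        "U": {g: scale * rng.standard_normal((n_hidden, n_hidden)) for g in gates},
        "b": {g: scale * rng.standard_normal(n_hidden) for g in gates},
    }


def lstm_step(I_n, h_prev, C_prev, params):
    """One LSTM update following Eqs. (1)-(5)."""
    W, U, b = params["W"], params["U"], params["b"]
    f_n = sigmoid(W["f"] @ I_n + U["f"] @ h_prev + b["f"])   # Eq. (1), forget gate
    i_n = sigmoid(W["i"] @ I_n + U["i"] @ h_prev + b["i"])   # Eq. (2), input gate
    o_n = sigmoid(W["o"] @ I_n + U["o"] @ h_prev + b["o"])   # Eq. (3), output gate
    candidate = np.tanh(W["c"] @ I_n + U["c"] @ h_prev + b["c"])
    C_n = f_n * C_prev + i_n * candidate                     # Eq. (4), cell state
    h_n = np.tanh(C_n) * o_n                                 # Eq. (5), hidden state
    return h_n, C_n


if __name__ == "__main__":
    params = init_params(n_inputs=4, n_hidden=3)
    h, C = np.zeros(3), np.zeros(3)
    # Feed three consecutive (T, P, V, H) quadruples; the values are made up.
    for quadruple in ([25.0, 1010.0, 3.2, 60.0],
                      [25.5, 1009.8, 3.6, 58.0],
                      [26.0, 1009.5, 4.1, 57.0]):
        h, C = lstm_step(np.array(quadruple), h, C, params)
    print(h)
```

In practice the inputs would be normalised before being fed to the network, since raw pressure values of the order of 1000 would otherwise saturate the sigmoid and tanh functions.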

As LSTM is an advanced ANN, it retains the basic structure of an ANN, namely an input layer, an output layer, and hidden layers. The input layer receives the input to be fed into the network, and the output layer is responsible for producing the final outcome of the network. Hidden layers consist of a number of hidden neurons, which compute a weighted sum of their inputs and generate an output after passing it through an appropriate activation function. One of the important parameters that affect the accuracy of the prediction system is the number of epochs, which defines how many times the full training set is passed through the network during training. In our proposed algorithm, we have designed a stacked LSTM, formed by stacking together multiple LSTMs, as the base model.

Fig. 2 Architecture of an LSTM network


3.2 The proposed model

The popularity of LSTM as a time series predictor has inspired us to investigate it further as a potential short-term wind speed forecaster. We use LSTM as a multivariate time series forecasting platform, where the input vector consists of a list of interdependent variables from one or more successive timestamps and the output predicts the wind speed with a short lead time. In the case of wind speed, the dependency of its value on parameters such as temperature, humidity, and pressure is significantly high; hence, they are considered as input parameters in this model.

The ability of LSTM to predict time series accurately depends on the quality as well as the quantity of data. In addition, though LSTM is a powerful tool, the result greatly depends on how intelligently the inputs are managed, as different sequences of inputs lead to different results. This motivated us to propose an enhanced methodology where LSTM is used in an ensemble mode with different input characteristics.

Standard ensemble methodology in machine learning typically combines a number of meta-algorithms based on different machine learning techniques, with the aim of decreasing the variance and bias and increasing the prediction capability. However, in our approach, we use LSTM as the core machine learning technique and use different input characteristics as the members of the ensemble. We believe that, in data science, ensembles based on characteristics of the input data can enhance the prediction capability. To establish the superiority of the proposed methodology in the specific case of wind speed prediction, we present results of four different modes of LSTM. Two of them are in non-ensemble mode and the other two are in ensemble mode, as discussed below.

Each mode of LSTM is characterized in terms of the type and nature of its input data. However, to ensure a fair comparison, the hyper-parameters such as the number of hidden layers and the number of neurons in each hidden layer are kept the same in all the modes. The number of epochs is also kept the same across the ensemble members. We have also considered 3-h early prediction (short term) in all the modes. As the inputs have different time characteristics, we have measured the 3-h lead time from the last input timestamp used in the respective modes.

3.2.1 Non-ensemble modes of LSTM

The first two modes fall into the category of non-ensemble LSTM. However, they have two different input characteristics and hence are analyzed independently. The first mode is a non-ensemble non-sequence mode, where the inputs belong to only one timestamp. As discussed earlier, we consider four inter-dependent meteorological parameters, i.e., temperature (T), pressure (P), humidity (H), and wind speed (V), as inputs to predict the wind speed with a 3-h lead time. Henceforth, we will represent the input as a quadruple $I_n = (T_n, P_n, V_n, H_n)$. Here, $n$ is the current timestamp (sequence) identifier. In our naming convention for indexes, we use $n$ for the current timestamp, negative offsets (such as $n-1$ and $n-2$) for past data, and positive offsets (such as $n+1$ and $n+2$) for the future. In this particular case of wind speed prediction, we have data at every 30-min interval, and hence consecutive sequence indexes differ by 30 min. To achieve a 3-h early prediction, we need to predict the wind speed 6 time steps ahead, i.e., $V_{n+6}$. Henceforth, this mode is referred to as LSTM1 in this paper. Figure 3a shows the schematic structure of LSTM1.

To take real advantage of the capability of LSTM, i.e., to learn from time series, we have considered the second non-ensemble mode, where the input consists of multiple timestamps. In this case, the LSTM takes three consecutive timestamps of the quadruple (i.e., a total of 12 inputs) as the input. Hence, $I_n$, $I_{n-1}$, and $I_{n-2}$ are provided as input to the LSTM, and the wind speed for the $(n+6)$th step is predicted (i.e., $V_{n+6}$) to achieve 3-h early prediction. Henceforth, this mode of LSTM is referred to as LSTM2 in this paper. Figure 3b shows the schematic structure of LSTM2.

3.2.2 Ensemble modes of LSTM

This is the core of the proposed methodology. In atmospheric science, especially for weather prediction, ensemble forecasting is a well-known technique to decrease uncertainty and bias in the prediction. Ensembles based on different initial conditions or different dynamical models are the state of the art. Machine learning techniques have also adopted ensemble methodologies to achieve better prediction capability. However, most of these methods are restricted to the use of different models as ensemble members.

In our proposal, we use different input patterns to characterize an ensemble member. Also, we implement them in two different modes depending on how the ensemble outputs are combined to generate the final prediction. In both cases, we have used three ensemble members. Structurally, the ensemble members are similar to LSTM2. However, they differ in their input patterns. The ensemble members are defined as follows.

The first ensemble member is exactly the same as LSTM2, in which there are three input quadruples of consecutive timestamps ($I_n$, $I_{n-1}$, $I_{n-2}$). In the second ensemble member, the input quadruples are from alternate timestamps (i.e., $I_n$, $I_{n-2}$, $I_{n-4}$). For the third ensemble member, the input quadruples are three timestamps apart (i.e., $I_n$, $I_{n-3}$, $I_{n-6}$). It is essential that the ensemble outputs be for the same timestamp in order to combine them. Hence, to achieve 3-h early prediction with 30-min observation data, the output should be 6 timestamps ahead of the last timestamp used as an input. All the ensemble members need to predict the $(n+6)$th wind speed (i.e., $V_{n+6}$). The output of the three ensemble members can be summarized as Eq. (6) below.

Fig. 3 Standard LSTM. a LSTM1 (multiple variables and single time step). b LSTM2 (multiple variables and multiple time steps)


$$V^{i}_{n+6} = \mathrm{LSTM2}_i\left(I_n, I_{n-i}, I_{n-2i}\right), \quad \text{where } i = 1 \text{ to } 3 \tag{6}$$

Here, $i$ represents the $i$th ensemble member of the LSTM. Though we have considered only 3 ensemble members here, theoretically the value of $i$ can be larger. This process is common to both the ensemble modes. It is important to note that the training of the ensemble members is independent of each other. Post-processing of the outputs obtained from the ensemble members differentiates the 3rd and 4th modes of our proposed LSTM.
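The time division of the inputs can be made concrete with a small sketch that builds the training samples of the $i$th ensemble member from one shared record. The function name and the tuple layout are assumptions made for illustration; the records are taken to be (T, P, V, H) quadruples at 30-min spacing, oldest first, and the inputs are returned in chronological order.

```python
def time_division_samples(records, stride, lead=6, steps=3):
    """Build (n, inputs, target) samples for one ensemble member.

    records: list of (T, P, V, H) quadruples at 30-min spacing, oldest first.
    stride:  spacing i between the input timestamps (1, 2 or 3 in the text).
    lead:    forecast horizon in time steps (6 steps = 3 h).
    inputs are [I_{n-2*stride}, I_{n-stride}, I_n]; target is V_{n+lead}.
    """
    span = (steps - 1) * stride          # how far back the oldest input lies
    wind_index = 2                       # position of V in (T, P, V, H)
    samples = []
    for n in range(span, len(records) - lead):
        inputs = [records[n - k * stride] for k in range(steps - 1, -1, -1)]
        target = records[n + lead][wind_index]
        samples.append((n, inputs, target))
    return samples


if __name__ == "__main__":
    # Synthetic record whose fields all equal the timestamp index.
    records = [(t, t, float(t), t) for t in range(20)]
    for stride in (1, 2, 3):
        n, inputs, target = time_division_samples(records, stride)[0]
        print(stride, n, [q[2] for q in inputs], target)
```

Returning the index $n$ with every sample makes it straightforward to align the three members on common timestamps before their outputs are combined, since members with a larger stride start later in the record.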

In the 3rd mode of LSTM, the three outputs undergo average pooling, and the final predicted value is obtained, which is summarized in Eq. (7) for the case of i = 3.

$$V^{\mathrm{Predicted}}_{n+6} = \frac{1}{3}\sum_{i=1}^{3} V^{i}_{n+6} \tag{7}$$

Henceforth, this mode of LSTM is referred to as LSTM3 in this paper. Figure 4a shows the schematic structure of LSTM3.

In LSTM3, we have implemented an average function, where the output of each ensemble member contributes equally to the outcome. This imposes a strong restriction. Hence, to improve upon LSTM3, we looked at implementing a weighted average where the contributions of the ensemble members differ. However, the challenge is to determine the weights. Hence, we introduced a second layer of LSTM at the output and trained it to provide the final outcome. This led to a hierarchical structure of LSTM, which is henceforth referred to as LSTM4 in this paper. It is important to note that the training of the second layer is carried out after the training of all the ensemble members is completed. The output of the LSTM at the second layer can be summarized in Eq. (8) for the case of three ensemble members.

$$V^{\mathrm{Predicted}}_{n+6} = \mathrm{LSTM}\left(V^{1}_{n+6}, V^{2}_{n+6}, V^{3}_{n+6}\right) \tag{8}$$

Figure 4b shows the schematic structure of LSTM4.
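The difference between equal and learned contributions can be illustrated with a short NumPy sketch. Average pooling follows Eq. (7) directly. For the learned combination, a least-squares linear combiner is used here purely as a simple stand-in for the weighted average discussed in the text; it is not the second-layer LSTM of Eq. (8), and the function names are hypothetical.

```python
import numpy as np


def average_pooling(member_predictions):
    """Eq. (7): equal-weight mean over members; input shape (n_samples, n_members)."""
    return np.asarray(member_predictions, dtype=float).mean(axis=1)


def fit_linear_combiner(member_predictions, observed):
    """Least-squares member weights (a stand-in, not the authors' second-layer LSTM)."""
    preds = np.asarray(member_predictions, dtype=float)
    target = np.asarray(observed, dtype=float)
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def combine(member_predictions, weights):
    """Weighted combination of the member outputs."""
    return np.asarray(member_predictions, dtype=float) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.uniform(0.0, 10.0, size=200)
    # Three synthetic members with different noise levels (made-up data).
    members = np.column_stack([truth + rng.normal(0.0, s, size=200)
                               for s in (0.5, 1.0, 2.0)])
    weights = fit_linear_combiner(members, truth)
    print(weights)
    print(np.abs(average_pooling(members) - truth).mean(),
          np.abs(combine(members, weights) - truth).mean())
```

As in the hierarchical scheme, the combiner is fitted only after the members have produced their outputs, so it never influences how the individual members are trained.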

3.3 Model parameter configuration

In the case of the LSTMs, the hyper-parameters, such as the number of input nodes, number of output nodes, number of hidden layers, and number of neurons in the hidden layers, need to be optimized and fixed. This is typically done on a trial and error basis. As discussed in “Section 3.2,” the numbers of input nodes and output nodes are already fixed for each mode of LSTM. Hence, the parameters that need to be determined are the number of hidden layers and the number of neurons per hidden layer. The optimization exercise was carried out using LSTM2, as it acts as the basic building block for the ensemble mode of operation.

To start with, one hidden layer with 10 neurons was considered. The output of the LSTM was fixed to provide the wind speed 3 h in advance, i.e., 6 data points ahead, as the data is collected at 30-min intervals. The training phase was performed on the training data and the training loss was calculated. This process was repeated, increasing the number of neurons in the hidden layer in steps of 10, until there was no improvement in the loss value. Once the number of neurons in the hidden layer was fixed, the process was continued by adding one new hidden layer each time, until an optimum loss was reached. For LSTM2, which forms the basic building block for our ensemble forecasting, we have used 12 hidden units and 50 neurons in each layer. We use mean square error as the loss function and the Adagrad optimizer, with the learning rate set to 0.0001. The data for the period 2010–2013 is used as input for training and validation. The observations being at 30-min intervals, we have used about 52,560 data points with a validation split of 0.33. The training data is used to build the models, and the validation data is used to select the system parameters that perform best on these data. The testing data is used neither in building the systems nor in selecting the system parameters.
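The trial and error procedure described above amounts to a greedy search, first over the layer width and then over the depth. The sketch below captures only that control flow. The callback `evaluate_loss(layers, neurons)` is hypothetical: it is assumed to train a model with the given architecture and return its loss, and the upper bounds are arbitrary safeguards added here.

```python
def greedy_architecture_search(evaluate_loss, start_neurons=10, neuron_step=10,
                               max_neurons=200, max_layers=20):
    """Grow the width in steps, then the depth, while the loss keeps improving.

    evaluate_loss(layers, neurons) is a hypothetical callback that trains a
    model with that architecture and returns its loss (lower is better).
    """
    layers, neurons = 1, start_neurons
    best_loss = evaluate_loss(layers, neurons)

    # Stage 1: widen the single hidden layer until there is no improvement.
    while neurons + neuron_step <= max_neurons:
        candidate = evaluate_loss(layers, neurons + neuron_step)
        if candidate >= best_loss:
            break
        neurons += neuron_step
        best_loss = candidate

    # Stage 2: with the width fixed, add hidden layers one at a time.
    while layers + 1 <= max_layers:
        candidate = evaluate_loss(layers + 1, neurons)
        if candidate >= best_loss:
            break
        layers += 1
        best_loss = candidate

    return layers, neurons, best_loss
```

A greedy search of this kind is inexpensive but can stop at a local optimum, because the width is chosen while the network still has a single hidden layer.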

In the case of the autoregressive and persistence models, only past values of wind speed were used to produce the 3-h early forecast. We have used the AR implementation of statsmodels (Statsmodels 0.11.0, 2009-2019), a Python package that provides statistical computations including descriptive statistics and estimation and inference for statistical models, to identify a suitable order and the coefficients that minimize the model error on the training data. The coefficients of the model are estimated using the linear least squares technique. We used the AR model recursively to achieve 6-step prediction. In the case of the persistence model, we have used the current wind speed as the 3-h (6 time steps) future wind speed. As these models expect stationary input, we tested them using both raw data and detrended data. We have used a simple method of differencing to detrend the data.

Fig. 4 Ensemble mode of LSTM. a LSTM3 (average pooling). b LSTM4 (hierarchical LSTM)
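The recursive use of an AR model and the persistence baseline can be sketched as follows. The sketch assumes that the intercept and coefficients have already been estimated (for example with statsmodels); the function names are illustrative, and the coefficient ordering, with the first coefficient applied to the most recent value, is a convention chosen here.

```python
def ar_forecast_recursive(history, intercept, coefficients, steps=6):
    """Multi-step AR forecast by feeding each prediction back as an input.

    coefficients[0] multiplies the most recent value, coefficients[1] the one
    before it, and so on (a convention assumed for this sketch).
    """
    if len(history) < len(coefficients):
        raise ValueError("history is shorter than the AR order")
    window = list(history[-len(coefficients):])
    for _ in range(steps):
        # reversed(window) yields the most recent value first.
        next_value = intercept + sum(
            c * v for c, v in zip(coefficients, reversed(window)))
        window.append(next_value)
    return window[-1]


def persistence_forecast(history):
    """Persistence model: the future value equals the current value."""
    return history[-1]


def difference(series):
    """First-order differencing, a simple way to remove a trend."""
    return [b - a for a, b in zip(series, series[1:])]
```

Because each step reuses earlier predictions, errors accumulate over the 6 steps; when the model is fitted on differenced data, the forecast increments have to be summed back onto the last observed value to recover a wind speed.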

The two data-driven models used in our studies are based on the support vector machine (SVM) and the extreme learning machine (ELM), respectively, and both are used in univariate mode. The basic idea of SVM for regression is to map the data into a high-dimensional feature space via a nonlinear mapping and to perform a linear regression in this feature space. The SVM (Mohandes et al., 2004; Pinto et al., 2014) used in this paper utilizes the radial basis function (RBF) kernel. The validation data are used to optimize the model parameters. Several trials were run to find reasonably good values of these parameters for the wind speed prediction.
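For readers unfamiliar with the RBF kernel, the similarity it assigns to two input windows can be computed as shown in this short sketch. The width parameter `gamma` is an assumed hyperparameter; its value is not reported in this section, and the example windows are invented.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two invented windows of lagged wind speeds (m/s)
window_a = [2.0, 2.5, 3.0]
window_b = [2.0, 2.5, 4.0]
same = rbf_kernel(window_a, window_a, gamma=0.5)
near = rbf_kernel(window_a, window_b, gamma=0.5)
print(same, near)
```

Identical windows give a kernel value of 1, and the value decays toward 0 as the windows move apart, which is what lets a linear regression in the induced feature space fit a nonlinear relationship in the original inputs.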

The ELM is a new type of single-hidden-layer feed-forward network (SLFN) in which the input weights and hidden layer biases do not need to be tuned. The best feature of the ELM algorithm is its extremely high learning speed, which makes it suitable for short-term forecasting studies. We have used 100 hidden neurons and the “tanh” function as the transfer function (Lazarevska 2016) for the prediction of wind speed.

In all cases, the prediction was made 3 h in advance, i.e., 6 time steps ahead. The different models were implemented in Python with the help of the Keras library on the TensorFlow platform, and the studies were carried out on an Intel(R) Xeon(R) CPU E3-1203 v3 @ 3.30 GHz with 8 cores and 32 GB RAM.
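The 3-h lead time translates into a 6-step offset between the end of each input window and its target. A possible way to build such input/target pairs from a 30-min series is sketched below; the `lookback` length is a hypothetical choice, since the window size is not stated in this section.

```python
from typing import List, Sequence, Tuple

SAMPLING_MINUTES = 30
HORIZON = (3 * 60) // SAMPLING_MINUTES  # 3 h ahead -> 6 time steps


def make_supervised(
    series: Sequence[float], lookback: int, horizon: int = HORIZON
) -> Tuple[List[List[float]], List[float]]:
    """Pair each input window with the value `horizon` steps after its end."""
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback  # index one past the last value in the window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Toy series 0.0, 1.0, ..., 19.0 so the indices are easy to read off
series = [float(i) for i in range(20)]
X, y = make_supervised(series, lookback=4)
print(len(X), X[0], y[0], y[-1])
```

With a window ending at index 3, the target is the value at index 9, i.e., six steps later.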

We have performed our studies using four different modes of LSTM in addition to the two classical statistical methods and two data-driven models.

4 Results and analysis

We train the proposed LSTM-based models with 100 sequences per batch for 80 epochs for both locations, i.e., New Delhi and Bengaluru. To decide the number of epochs to be considered for prediction, the training and validation losses are computed and presented in Fig. 5. It can be observed that for the New Delhi observations the training and validation losses converge after 60 epochs, while for the Bengaluru observations the losses converge as early as 30 epochs. This is expected owing to the larger variability in the observed wind climatology of New Delhi as compared to Bengaluru (Fig. 1).

The epochs are fixed at 60 for the New Delhi and 30 for the Bengaluru observations. The four modes of LSTM were trained accordingly, and the ensemble members of the LSTM were trained separately. In the case of LSTM4, another LSTM was used at the output to provide the final prediction. However, even for this LSTM, the same hyper-parameters were retained.
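The average-pooling mode (LSTM3) combines the separately trained members by a plain mean of their predictions, which can be sketched as below. The weighted variant is included only to illustrate the idea of a second-stage combination; in the paper that second stage is an LSTM, so the fixed weights here are a hypothetical stand-in, and all numbers are invented.

```python
from typing import List, Sequence


def average_pool(member_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over ensemble members (one inner list per member)."""
    n_members = len(member_predictions)
    return [sum(step) / n_members for step in zip(*member_predictions)]


def weighted_pool(
    member_predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Element-wise weighted sum; weights are assumed to sum to 1.

    Stand-in only: fixed weights replace the learned second-stage model.
    """
    return [
        sum(w * p for w, p in zip(weights, step))
        for step in zip(*member_predictions)
    ]


# Three invented members, each predicting two time steps (m/s)
members = [[2.0, 3.0], [4.0, 5.0], [3.0, 4.0]]
pooled = average_pool(members)
weighted = weighted_pool(members, weights=[0.5, 0.25, 0.25])
print(pooled, weighted)
```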

After training, the model is evaluated on the test set. The future parameter is set to 6, i.e., 3-h early predictions. The predicted wind speed was calculated for a period of 322 days, i.e., 15,456 data points for the year 2014, which were compared with the corresponding observed data. To compare the predictions with the actual observations, we have used three error metrics, namely mean absolute error (MAE), root mean square error (RMSE), and the Pearson correlation coefficient (CC). Table 1 shows the comparison of the results for the persistence, autoregressive, SVM, ELM, and the four proposed modes of LSTM.
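The three measures follow their standard definitions and can be computed as in this sketch. The sample values are invented for illustration, and the last line simply checks the test-set size quoted above (322 days at 48 observations per day).

```python
import math
from typing import Sequence


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


def pearson_cc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Invented sample values (m/s)
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 2.5, 4.5]
print(mae(obs, pred), rmse(obs, pred), pearson_cc(obs, pred))

test_points = 322 * 48  # days in the test period x half-hourly samples per day
print(test_points)
```

Note that MAE and RMSE measure the magnitude of the errors while CC measures only co-variation, so a model can score well on one and poorly on the other, as the detrended rows of Table 1 illustrate.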


From Table 1, it is observed that using “detrend” in the classical statistical models resulted in lower MAE and RMSE compared to the raw data analysis. However, the correlation dropped (to almost 0), indicating that the pattern of wind speed variability is captured well in raw data modeling but with more bias. Among the data-driven models, the performances of the SVM and ELM models are comparable to the statistical models in terms of all the error metrics for the New Delhi location. However, the ELM-based model performs better than the statistical as well as the SVM-based model for the Bengaluru location.

In comparison to the statistical models as well as the SVM- and ELM-based models, the LSTM-based models show low MAE and RMSE values. Among the LSTM-based models, the ensemble-based models, i.e., LSTM3 and LSTM4, resulted in a very low MAE and RMSE with strong and significant correlation with the observed wind speed in New Delhi and Bengaluru. This clearly indicates that the ensemble-based LSTM provides a better short-term wind prediction in terms of magnitude as well as the day-to-day variability.

It is very important to assess the model performance in capturing wind of different categories, such as low, high, and very high wind, as this is a good indicator for wind energy assessment and disaster assessment. So we have compared the predicted winds from the data-driven models in order to quantify the model performance for category prediction

Fig. 5 Training and validation losses of LSTM2. a New Delhi. b Bengaluru locations

Table 1 Summary of MAE, RMSE, and CC between predicted and observed wind speed for (a) New Delhi and (b) Bengaluru locations

| Model | New Delhi MAE | New Delhi RMSE | New Delhi CC | Bengaluru MAE | Bengaluru RMSE | Bengaluru CC |
|---|---|---|---|---|---|---|
| Persistence (raw) | 0.883 | 1.214 | 0.590 | 1.713 | 2.320 | 0.742 |
| Persistence (detrend) | 0.483 | 0.673 | 0.008 | 0.518 | 0.725 | 0.015 |
| Autoregressive (raw) | 0.759 | 1.051 | 0.662 | 1.578 | 2.090 | 0.792 |
| Autoregressive (detrend) | 0.332 | 0.468 | 0.181 | 0.360 | 0.509 | 0.187 |
| SVM | 0.824 | 1.091 | 0.510 | 1.757 | 2.287 | 0.614 |
| ELM | 0.773 | 1.027 | 0.547 | 0.919 | 1.173 | 0.651 |
| LSTM1 | 0.179 | 0.210 | 0.044 | 0.154 | 0.189 | 0.167 |
| LSTM2 | 0.100 | 0.128 | 0.100 | 0.136 | 0.161 | 0.170 |
| LSTM3 | 0.078 | 0.107 | 0.489 | 0.133 | 0.160 | 0.716 |
| LSTM4 | 0.077 | 0.105 | 0.585 | 0.129 | 0.156 | 0.760 |


and presented in Figs. 6 and 7. Plotting all 15,456 data points for all the models would clutter the graph, and no information could be derived from it. Hence, we have selected the two data-driven models (SVM and ELM) and our proposed LSTM4 (the model with low error and high correlation), which is a hierarchical ensemble LSTM (HELSTM), for a sample output. Figures 6 (New Delhi) and 7 (Bengaluru) present the comparison between the three identified models and the observed data for a period of only 4 days starting from 1 February 2014 (i.e., about 192 time steps).

From this comparative analysis, the better performance of the ensemble-based short-term wind prediction can be seen in Figs. 6 and 7. The SVM- and ELM-based models are comparable with our proposed ensemble-based LSTM model, although they slightly underestimate the wind speed at both locations. However, no model seems to predict the low wind speed (close to “0”) properly. The ensemble-based LSTM performed comparatively better than the other models in estimating the high wind speed but overestimates the low wind speed values at both locations. No conclusion can be drawn for the prediction of very high wind speed (> 5 m/s for New Delhi and > 7 m/s for Bengaluru), as such values have a lower frequency of occurrence.

To get more insight into the short-term wind speed prediction capability, scatter plots of the observed values (x-axis) against the predicted values (y-axis) are presented for New Delhi and Bengaluru in Figs. 8 and 9, respectively. Both figures present the whole predicted data (15,456 data points) in comparison to the observed values for the (a) autoregressive, (b) SVM, (c) ELM, and (d) hierarchical LSTM (i.e., LSTM4) models.

In order to analyze the performance of the models better, we have also drawn a unit line, which can provide information about the bias and the extent of errors.
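What the unit line shows visually can also be summarized numerically. The sketch below computes the mean bias (predicted minus observed) and the fraction of points lying above the unit line; both helper names and the sample values are illustrative assumptions, not quantities reported in the paper.

```python
from typing import Sequence


def mean_bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of (predicted - observed); positive means over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def fraction_above_unit_line(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """Share of scatter points with predicted > observed (above y = x)."""
    above = sum(1 for o, p in zip(observed, predicted) if p > o)
    return above / len(observed)


# Invented sample values (m/s)
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 2.5, 4.5]
bias = mean_bias(obs, pred)
share = fraction_above_unit_line(obs, pred)
print(bias, share)
```

A cloud of points sitting mostly above the line corresponds to a positive mean bias, and one mostly below it to a negative bias.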

For the New Delhi location depicted in Fig. 8, it can be observed that all four models are comparable in terms of wind speed estimation and have a slight positive bias in the wind prediction. However, the SVM- and ELM-based models are not able to predict very low wind speed (close to zero).

Fig. 6 Comparison of 3-h ahead predicted wind speed (m/s) using SVM, ELM, and HELSTM, vs. observed wind speed (m/s) for the New Delhi location for 4-day period (192 data points) starting from 1 February 2014


Similarly, for the Bengaluru location, presented in Fig. 9, the statistical method (AR) shows a large negative bias and grossly underestimates the wind speed. On the other hand, the data-driven models (SVM, ELM, and hierarchical LSTM) have performed better than the statistical model. The

Fig. 7 Comparison of 3-h ahead predicted wind speed (m/s) using SVM, ELM, and HELSTM, vs. observed wind speed (m/s) for the Bengaluru location for 4-day period (192 data points) starting from 1 February 2014

Fig. 8 Comparison of different models used for the 3-h early prediction of wind speed (m/s) vs. observed wind speed (m/s) at New Delhi. a AR, b SVM, c ELM, and d hierarchical LSTM for the period of 322 days in 2014


SVM-based model is able to predict the high wind speed but fails to predict very low wind speed. The ELM-based model successfully predicts the medium wind speeds but fails to predict both very high wind speed (> 6 m/s) and very low wind speed (< 1 m/s). Our proposed model based on hierarchical LSTM is able to predict all categories except the very high wind speed (> 7 m/s); its wind speed prediction is mostly limited to 7 m/s. Though this is well within the climatological wind limits of the city, it is important that very high wind speed also be estimated accurately for a better assessment of power generation potential. These inferences are similar to those drawn from the wind observations presented in Figs. 6 and 7. One reason for underestimating the very high wind speed could be that instances of very high wind speed are relatively rare, and hence the data-driven models may need more data to capture the characteristics of high wind speed scenarios.
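A per-category view of the errors, of the kind discussed above, could be obtained by binning the observed wind speed and averaging the absolute error within each bin. The following is a sketch under assumptions: the 1 m/s and 7 m/s thresholds echo the values mentioned in the text for Bengaluru, while the three-bin scheme, the label names, and the sample data are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def categorize(speed: float, very_low: float = 1.0, very_high: float = 7.0) -> str:
    """Assign an observed wind speed (m/s) to an illustrative category."""
    if speed < very_low:
        return "very low"
    if speed > very_high:
        return "very high"
    return "mid-range"


def mae_by_category(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Mean absolute error grouped by the category of the observed value."""
    errors = defaultdict(list)
    for o, p in zip(observed, predicted):
        errors[categorize(o)].append(abs(p - o))
    return {name: sum(vals) / len(vals) for name, vals in errors.items()}


# Invented sample values (m/s)
obs = [0.5, 3.0, 5.0, 8.0]
pred = [1.0, 3.0, 4.5, 6.0]
per_category = mae_by_category(obs, pred)
print(per_category)
```

Because very high winds are rare, the corresponding bin will usually contain few samples, so its error estimate should be read with caution.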

It is clear from these analyses that LSTM4, which is a hierarchical ensemble-based LSTM model, provides better predictability in comparison to the other models discussed here. The results are achieved at the cost of additional computation for the second layer training. But this overhead is incurred only at the training stage and hence does not act as a bottleneck for performance.

5 Conclusion

Ensemble forecasting is a well-known methodology in atmospheric studies. This is typically used to get an indication about the range of possible outcomes and for the improvement in the

Fig. 9 Comparison of different models used for the 3-h early prediction of wind speed (m/s) vs. observed wind speed (m/s) at Bangalore station. a AR, b SVM, c ELM, and d hierarchical LSTM for the period of 322 days in 2014


prediction at multiple scales. However, the application of ensemble forecasting using artificial intelligence is relatively new in the field of meteorological studies. The outputs of data-driven models differ slightly each time the model is executed. These output fluctuations are caused by random initialization of the input weights and hidden biases, which limits the decision-making ability. Ensemble-based models stabilize these slight fluctuations. The proposed work generates the forecast from an ensemble of forecasts through a weighted average in which the weights are decided by another layer of AI in general, and LSTM in particular, which is a novel approach. With this proposed method, different LSTMs, SVM, ELM, as well as two classical statistical methods, namely autoregressive and persistence models, are considered and compared with the observations. It has been observed that the hierarchical ensemble-based LSTM performs better than the other models for the 3-h early forecast. The proposed LSTM could also be used to improve the prediction of other time series data that are of interest to scientists and researchers.

Compliance with ethical standards

Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

Alencar BDD, Affonso CDM, Oliveira RCLDO et al (2017) Different models for forecasting wind power generation: case study. Energies 10(12):1976. https://doi.org/10.3390/en10121976

Barbounis TG, Theocharis JB, Alexiadis MC, Dokopoulos PS (2006) Long-term wind speed and power forecasting using local recurrent neural network models. IEEE Transactions on Energy Conversion 21(1):273–284. https://doi.org/10.1109/TEC.2005.847954

Bernstein J (2002) Wind Resource Analysis Program. Minnesota Department of Commerce, Saint Paul, MN, USA

Bianchi FM, Maiorino E, Kampffmeyer MC, Rizzi A, Jenssen R (2017) An overview and comparative analysis of recurrent neural networks for short term load forecasting. Neural and Evolutionary Computing. https://doi.org/10.1007/978-3-319-70338-1

Bhaskar K, Singh S (2012) AWNN-assisted wind power forecasting using feed-forward neural network. IEEE Trans Sustainable Energy 3(2):306–315. https://doi.org/10.1109/TSTE.2011.2182215

Bhimala KR, Goswami P (2015) A comparison of ASCAT soil moisture data with in situ observations over the Indian region. A multiscale analysis. IEEE Trans Geosci Remote Sens 53(10):5425–5434. https://doi.org/10.1109/TGRS.2015.2422377

Buhan S, Ozkazanc Y, Cadirci I (2016) Wind pattern recognition and reference wind mast data correlations with NWP for improved wind-electric power forecasts. IEEE Trans Ind Inf 12(3):991–1004. https://doi.org/10.1109/TII.2016.2543004

Cadenas E, Rivera W, Amezcua RC, Heard C (2016) Wind speed prediction using a univariate ARIMA model and a multivariate NARX model. Energies 9(2):109. https://doi.org/10.3390/en9020109

Cao Q, Ewing BT, Thompson MA (2012) Forecasting wind speed with recurrent neural networks. Eur J Oper Res 221(1):148–154. https://doi.org/10.1016/j.ejor.2012.02.042

Chen L, Zhijun L, Zhang Y (2019) Multiperiod-ahead wind speed forecasting using deep neural architecture and ensemble learning. Math Probl Eng. https://doi.org/10.1155/2019/9240317

Choi JY, Lee B (2018) Combining LSTM network ensemble via adaptive weighting for improved time series forecasting. Math Probl Eng. https://doi.org/10.1155/2018/2470171

Dowell J, Weiss S, David H, David I (2013) Short-term spatio-temporal prediction of wind speed and direction. Wind Energy. https://doi.org/10.1002/we.1682

Falge E et al (2001) Gap filling strategies for long term energy flux data sets. Agric For Meteorol 107(1):71–77. https://doi.org/10.1016/S0168-1923(00)00235-5

Gong L, Jing S (2010) On comparing three artificial neural networks for wind speed forecasting. Appl Energy 87(7):2313–2320. https://doi.org/10.1016/j.apenergy.2009.12.013



Grover A, Kapoor A, Horvitz E (2015) A deep hybrid model for weather forecasting. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 379–386. https://doi.org/10.1145/2783258.2783275

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Jewson S, Brix A (2005) Weather derivative valuation: the meteorological, statistical, financial and mathematical foundations. Cambridge University Press, Cambridge

Kantha Rao B, Rakesh V (2019) Evaluation of WRF-simulated multilevel soil moisture, 2-m air temperature, and 2-m relative humidity against in situ observations in India. Pure Appl Geophys 176:1807–1826. https://doi.org/10.1007/s00024-018-2022-7

Lazarevska E (2016) Wind speed prediction with extreme learning machine. Intelligent Systems (IS), International IEEE Conference. https://doi.org/10.1109/IS.2016.7737415

Liu H, Xiwei M, Yanfei L (2018) Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network. Energy Convers Manag 166:120–131. https://doi.org/10.1016/j.enconman.2018.04.021

Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and machine learning forecasting methods: concerns and ways forward. PLoS One 13(3):e0194889. https://doi.org/10.1371/journal.pone.0194889

Mohandes MA, Halawani TO, Hussain RS, Ahmed A (2004) Support vector machines for wind speed prediction. Renew Energy 29(6):939–947. https://doi.org/10.1016/j.renene.2003.11.009

Pinto T, Ramos S, Sousa MT, Zita V (2014) Short-term wind speed forecasting using support vector machines. Computational intelligence in dynamic and uncertain environments (CIDUE), IEEE symposium on. https://doi.org/10.1109/CIDUE.2014.7007865

Patel MR (2005) Wind and solar power systems: design, analysis, and operation. CRC Press, Boca Raton

Statsmodels 0.11.0 (2009–2019) statistical models, hypothesis tests, and data exploration. https://www.statsmodels.org/stable/index.html. Accessed 24 January 2020

Treiber NA, Spath S, Heinermann J, Bremen LV, Kramer O (2015) Comparison of numerical models and statistical learning for wind speed prediction. ESANN proceedings, European symposium on artificial neural networks, computational intelligence and machine learning. i6doc.com publ., ISBN 978-287587014-8

Wang HZ, Wang GB, Li GQ, Peng JC, Liu YT (2016) Deep belief network based deterministic and probabilistic wind speed forecasting approach. Appl Energy 182:80–93. https://doi.org/10.1016/j.apenergy.2016.08.108

Xie L, Gu Y, Zhu X, Genton MG (2014) Short-term spatio-temporal wind power forecast in robust look-ahead power system dispatch. IEEE Trans Smart Grid 5(1):511–520. https://doi.org/10.1109/TSG.2013.22823

Xuejiao M, Yu J, Qingli D (2017) A generalized dynamic fuzzy neural network based on singular spectrum analysis optimized by brain storm optimization for short-term wind speed forecasting. Appl Soft Comput 54:296–312. https://doi.org/10.1016/j.asoc.2017.01.033

Affiliations

Ashapurna Marndi1,2 & G. K. Patra1,2 & K. C. Gouda1,2

1 CSIR Fourth Paradigm Institute, Wind Tunnel Road, Bengaluru 560037, India

2 Academy of Scientific and Innovative Research, Ghaziabad 201002, India

