
    Improved synthetic wind speed generation using modified Mycielski approach

    Mehmet Fidan (1), Fatih Onur Hocaoğlu (2,3,*), and Ömer N. Gerek (1)

    (1) Department of Electrical Engineering, Anadolu University, 26555, Eskişehir, Turkey
    (2) Engineering Faculty, Department of Electrical Engineering, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey
    (3) Solar and Wind Research and Application Center, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey

    SUMMARY

    In this paper, novel approaches for wind speed data generation using the Mycielski algorithm are developed and presented. To show the accuracy of the developed approaches, we used three years of wind speed data collected from two deliberately selected regions of Turkey (Izmir and Kayseri) to generate artificial wind speed data. The data belonging to the first two years are used for training, whereas the remaining one-year data are used for testing and accuracy comparison purposes. The main idea of the proposed methods is to produce distinct synthetic data whose correlation-wise and distribution-wise statistical properties are similar to those of the measured data. Generated data are compared with test data for both regions in terms of basic statistics, Weibull distribution parameters, transition probabilities, spectral densities, and autocorrelation functions, and are also compared with data generated by the classical first-order Markov chain method. Results indicate that the accuracy and realistic behavior of the proposed method are superior to those of the classical method in the literature. Comparisons and results are discussed in detail. Copyright © 2011 John Wiley & Sons, Ltd.

    KEY WORDS

    wind speed; prediction; Mycielski; Markov; modeling; synthetic data generation

    Correspondence

    *F. O. Hocaoğlu, Engineering Faculty, Department of Electrical Engineering, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey. E-mail: [email protected]

    Received 28 April 2010; Revised 19 May 2011; Accepted 12 June 2011

    1. INTRODUCTION

    Knowledge of wind speed time series is of vital importance for evaluating the characteristics of the data and determining the wind regime of a region for electricity generation purposes [1–6]. Synthetic wind data generation gives useful insight into the underlying process of this meteorological phenomenon. A number of studies deal with synthetic wind speed modeling; a brief review is given in the following paragraphs.

    Sahin and Sen modeled wind speed data measured in the Marmara region of Turkey using first-order Markov chains [7]. Tore et al. used first-order Markov chain models for the synthetic generation of hourly wind speed time series in the Corsica region [8]. Youcef Ettoumi et al. modeled three-hourly wind speed and wind direction data by means of Markov chains [9]. Autoregressive models, Markov chains, and wavelet methods have also been used for wind speed data generation by Aksoy et al. [10]. Shamshad et al. generated hourly wind speed data using first-order and second-order Markov chains and compared the two using wind speed data measured in two different regions of Malaysia [11]; they concluded that the reproduced wind speed behavior improves slightly as the Markov model order increases. More recently, Hocaoglu et al. also modeled wind speed data using Markov chains and observed the effect of the number of Markov states [12]. In all of these studies, wind speed generation was based on Markov transition probabilities. Markov chains are appropriate for approximating the general statistical parameters of the original data. However, they are not appropriate for reproducing the variations of wind speed data or for preserving the correlations between its samples. Spectral density analysis gives useful insight into the behavior of the data in time [13], and the spectral characteristics of data produced by the previously reported methods differ markedly from those of real wind speed data.

    INTERNATIONAL JOURNAL OF ENERGY RESEARCH
    Int. J. Energy Res. 2012; 36:1226–1237
    Published online 31 August 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/er.1893
    Copyright © 2011 John Wiley & Sons, Ltd.

    In this paper, a novel approach based on the Mycielski algorithm (presented in Section 2) is applied to generate artificial wind speed data. The Mycielski algorithm was originally designed as a predictor and has also been used for forecasting wind speed [14]. It has further been developed and used for coding and compression in communications [15] and, with simple inversion modifications, as a pseudo-random number generator [16]. In this paper, the Mycielski algorithm is converted, with several changes, from a predictor into a wind speed data generator. The generated data share the statistical characteristics of the original data but differ from them sample by sample. To show the accuracy and efficiency of the proposed method, we also tested the Markov chain approach to wind speed data generation, described in Section 3. The data generated by the proposed methods are compared with the measured data, and a detailed analysis of the generated data is carried out in Section 4, where the generation methods are discussed in terms of the Weibull distribution parameters, transition probabilities, spectral densities, and autocorrelation functions of the generated and measured data. Finally, the results and conclusions are given in Section 5.

    2. WIND SPEED DATA GENERATION USING MODIFIED MYCIELSKI ALGORITHM

    Wind speed time series generation is important for understanding the underlying process of the data, and such a study is necessary for further analysis of wind data.

    The Mycielski algorithm performs a prediction on time series data using the total exact history of the data samples. The basic idea of the algorithm is to search for the longest suffix string at the end of the data sequence that has been repeated at least once in the history of the sequence. The search starts with a short template (length = 1) and keeps increasing the template size as long as matches are found in the history. When the longest repeating sequence is determined, the value of the sample right after the repeated template is assigned as the prediction value. The prediction rule follows the intuition that if this pattern appeared in the past, the sequence is expected to behave the same way now.

    A time series predictor can be generalized with the ex-pression in Equation (1):

    $$ \hat{x}[n+1] = f_{n+1}\big(x[1], \ldots, x[n]\big) \qquad (1) $$

    where the difference between the prediction \hat{x}[n+1] and the actual value x[n+1] is expected to be small. For our particular case, the function f(·) performs an iterative algorithm that starts from the shortest data segment at the end (i.e., the length-one segment x[n]) and then increases the segment length one sample at a time toward the left: (x[n−1], x[n]), (x[n−2], x[n−1], x[n]), and so forth. For each length, the segment is searched for in the past samples by sliding from the end point toward the start point. Several matches may be found while the algorithm runs. Eventually the segment becomes so long that it is not encountered anywhere in the past sequence. At that point, the prediction is taken as the sample that follows the most recently found match of the one-sample-shorter segment. Because the algorithm searches through the whole data sequence repeatedly for each prediction step, its computational requirements are high. The overall scheme can be expressed analytically as follows:

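    $$ m = \arg\max_{L} \big\{ k < n : x[k] = x[n],\ x[k-1] = x[n-1],\ \ldots,\ x[k-L+1] = x[n-L+1] \big\}, \qquad f_{n+1}: \ \hat{x}[n+1] = x[m+1] \qquad (2) $$

    That is, m is the (most recent) end position k of the longest earlier segment that matches the current suffix, and the prediction is the sample that follows it.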

    The original Mycielski algorithm works on binary sequences. For binary sequences, the steps of the algorithm can be illustrated with the example in Table I.

    Because the time series data adopted herein consist of real numerical values, the algorithm must be modified for the artificial wind speed generation process. Another modification is necessary to avoid cyclic and repeated generation outputs. In this work, two types of modifications are proposed to remedy these problems. The first proposed algorithm can be defined as Mycielski generation with random noise addition (Myc-1), and the second can be called Mycielski generation for level-reduced wind speed data (Myc-2). For both methods, wind speed data acquired over three years were used to generate one year of artificial wind speed data. The measured wind speed values vary between 0 and 14 m/s with 0.1 m/s quantization levels. For both Myc-1 and Myc-2, the training set is kept the same (the three years of data). However, both methods use a randomization process that varies the generated year at each run, making it possible to produce several distinct realizations.

    Table I. A sample run for basic Mycielski prediction: X[0..19] = 01101110011101011011 (length 20), X[20] = ?

    Scanned history        Searched pattern   Repeat location   Prediction location   Prediction value
    0110111001110101101    1                  18                19                    1
    011011100111010110     11                 15                17                    0
    01101110011101011      011                14                17                    0
    0110111001110101       1011               2                 6                     1
    011011100111010        11011              1                 6                     1
    01101110011101         011011             0                 6                     1
    0110111001110          1011011            No repeat         Previous location     Previous prediction

    Stop procedure: X[20] = 1.
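    The basic binary procedure of Table I can be sketched in Python as follows. The helper name `mycielski_predict` is hypothetical and this is not the authors' code: it grows the suffix one sample at a time, shortens the scanned history by one sample per step exactly as the table does, takes the most recent repeat, and keeps the last successful prediction once no repeat is found.

```python
def mycielski_predict(x):
    """Basic binary Mycielski prediction of the sample following x[-1].

    Returns (prediction, trace); each trace row is
    (pattern length, repeat location, prediction location, prediction value).
    """
    n = len(x) - 1                       # index of the last known sample
    prediction = None                    # stays None if even a 1-sample match fails
    trace = []
    for length in range(1, len(x)):
        pattern = x[n - length + 1:]     # searched pattern: the last `length` samples
        history = x[:n - length + 1]     # scanned history shrinks by one sample per step
        found = None
        # scan from the most recent position back toward the start
        for start in range(len(history) - length, -1, -1):
            if history[start:start + length] == pattern:
                found = start
                break
        if found is None:                # no repeat: keep the previous prediction
            break
        prediction = x[found + length]   # sample right after the repeated pattern
        trace.append((length, found, found + length, prediction))
    return prediction, trace


if __name__ == "__main__":
    bits = [int(b) for b in "01101110011101011011"]   # X0..X19 from Table I
    pred, trace = mycielski_predict(bits)
    for row in trace:
        print(row)
    print("X20 =", pred)                 # X20 = 1, as in Table I
```

    The printed trace rows reproduce the repeat and prediction locations listed in Table I.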

    2.1. Myc-1 algorithm

    In the Myc-1 algorithm, the first sample of artificial data (following a recorded history of three years of data) is predicted using a modified Mycielski algorithm. The modification consists of relaxing the exact matching criterion of Equation (2), which corresponds to a zero Hamming distance between compared samples, to a close match. The Hamming distance between two samples is given in Equation (3).

    $$ d_{H}(x, y) = \begin{cases} 1, & x \neq y \\ 0, & x = y \end{cases}, \qquad x, y \in \mathbb{R} \qquad (3) $$

    This alteration is necessary to process time series data with non-integer values. By nature, floating-point numbers rarely match exactly; therefore, a match is declared when two numbers lie within a given interval of each other. For each numerical comparison, this interval was taken as 0.2 in our experiments. The redefined distance can be expressed as in Equation (4).

    $$ d_{TOL}(x, y) = \begin{cases} 1, & |x - y| > 0.2 \\ 0, & |x - y| \le 0.2 \end{cases}, \qquad x, y \in \mathbb{R} \qquad (4) $$

    The method continues by perturbing the prediction value to a randomized level: a random value between −0.4 and 0.4 is added to the prediction. This perturbation was observed to avoid the loops and cyclic limits that otherwise occur in the produced data. The result of the addition is taken as the first sample of the artificial data, and the history, which initially consists of the three-year data, is updated by appending this sample to its end. The sample generation and history update steps are repeated until the generated data cover one year. These steps of the Myc-1 algorithm are also shown in the flow diagram in Figure 1.
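    A minimal sketch of the Myc-1 loop described above, assuming NumPy and uniformly distributed noise (the text only says "a random value between −0.4 and 0.4"). The names `tolerant_predict` and `myc1_generate` are hypothetical. Candidate match positions are extended one sample to the left at a time and kept only while every compared pair has d_TOL = 0 in the sense of Equation (4); the shrinking-history convention follows Table I, and clipping negative outputs to zero and falling back to the last sample when nothing matches are our assumptions, not steps stated in the paper.

```python
import numpy as np


def tolerant_predict(h, tol=0.2):
    """Mycielski prediction with the relaxed match of Eq. (4): |x - y| <= tol."""
    n = len(h) - 1
    ends = np.arange(n)            # candidate end indices of earlier occurrences
    prediction = h[n]              # fallback if nothing matches (assumption: persistence)
    for length in range(1, n + 1):
        # the occurrence must fit inside the shrinking history h[0..n-length] (as in Table I)
        ends = ends[(ends <= n - length) & (ends >= length - 1)]
        # extend every surviving candidate one sample to the left
        ends = ends[np.abs(h[ends - length + 1] - h[n - length + 1]) <= tol]
        if ends.size == 0:
            break
        prediction = h[ends.max() + 1]   # sample after the most recent longest match
    return prediction


def myc1_generate(history, n_samples, tol=0.2, noise=0.4, seed=None):
    """Myc-1 sketch: tolerant prediction plus uniform noise in [-noise, noise)."""
    rng = np.random.default_rng(seed)
    h = np.empty(len(history) + n_samples)
    h[:len(history)] = history
    m = len(history)
    for _ in range(n_samples):
        p = tolerant_predict(h[:m], tol) + rng.uniform(-noise, noise)
        h[m] = max(p, 0.0)         # assumption: negative speeds are clipped to zero
        m += 1                     # the new sample joins the history
    return h[len(history):].copy()
```

    With `tol=0.0` the same function reduces to the exact binary rule of Table I. The full search over the growing history makes a one-year run on three years of hourly data slow, so the sketch is meant for illustration rather than production use.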

    The purpose of the random noise is to keep the search from getting stuck on a repeat. Without the random noise, the generated data would be a copy of a part of the history. Because of the proposed modification, the generated data are distinct from the original data. On the other hand, the generated data have statistical properties that are pleasingly similar to those of the original data, as presented in Section 4. The concept of distinct synthetic data production with correlation-wise and distribution-wise similar statistical properties constitutes the main idea of the proposed methods for successful artificial wind speed generation.

    [Figure 1. Flow diagram of Myc-1 algorithm. P_N: Nth sample of generated wind speed data.]

    2.2. Myc-2 algorithm

    In the Myc-2 algorithm, the original data values are first rounded to the nearest integer values. This rounding reduces the number of wind speed levels and consequently increases the lengths of repeating segments. The relaxed matching of Equation (4) is therefore not necessary for the integer-valued sequence, because exact matching is possible with integer comparisons.

    The algorithm starts by predicting the first value from this rounded history. The history data are then separated into three clusters corresponding to the three years. The first four days of data (corresponding to 24 × 4 = 96 data values) are generated from one of the three clusters, selected at random.

    The search is limited to the selected data cluster. Changing the cluster at random ensures that the generated data do not duplicate a long segment of the history, which could otherwise lock the output into a cyclic, repeating pattern. Once four days of data are generated, the selected part of the history is replaced by another randomly selected cluster. These generation and history shuffling steps, shown in the flow diagram in Figure 2, continue until one year of artificial data has been generated.
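    The Myc-2 procedure can be sketched as below, under one plausible reading of Figure 2: each four-day block starts from a fresh copy of a randomly chosen year, and the samples generated during that block are appended to that copy. The names `exact_predict` and `myc2_generate`, the equal-length split with `np.array_split` standing in for the year boundaries, and the fall-back to the last sample when no match exists are all illustrative assumptions.

```python
import numpy as np

BLOCK = 4 * 24   # samples generated before switching clusters (four days of hourly data)


def exact_predict(h):
    """Mycielski prediction with exact matching on integer-valued data."""
    n = len(h) - 1
    ends = np.arange(n)                       # candidate end indices of earlier matches
    prediction = h[n]                         # fallback if nothing matches (assumption)
    for length in range(1, n + 1):
        ends = ends[(ends <= n - length) & (ends >= length - 1)]
        ends = ends[h[ends - length + 1] == h[n - length + 1]]
        if ends.size == 0:
            break
        prediction = h[ends.max() + 1]        # sample after the most recent longest match
    return prediction


def myc2_generate(history, n_samples=365 * 24, n_clusters=3, seed=None):
    rng = np.random.default_rng(seed)
    rounded = np.rint(np.asarray(history, dtype=float)).astype(int)   # level reduction
    clusters = np.array_split(rounded, n_clusters)                   # one cluster per year
    out = []
    while len(out) < n_samples:
        # pick a cluster at random; the block's samples are appended to this working copy
        working = list(clusters[rng.integers(n_clusters)])
        for _ in range(min(BLOCK, n_samples - len(out))):
            p = int(exact_predict(np.asarray(working)))
            working.append(p)
            out.append(p)
    return np.array(out)
```

    Because every prediction copies an existing value, the output only takes the integer levels present in the rounded history, which is the stepped behavior visible for Myc-2.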

    The main motive of Myc-2 is to obtain not only statistical properties similar to those of the original data but also similar spectral and autocorrelation characteristics. These similarities are analyzed in detail in Section 4. History shuffling makes it possible to analyze different parts of the total history and keeps the search from getting stuck in the same part of the past data, which in turn avoids generating synthetic data that are a long copy of the history.

    [Figure 2. Flow diagram of Myc-2 algorithm. P_N: Nth sample of generated wind speed data.]

    3. WIND SPEED GENERATION USING MARKOV CHAINS

    To compare the proposed Mycielski methods with the classical Markov-based methods, we briefly describe the Markov method here. Markov chains are based on the probabilities of observing a transition from one state (say, a predefined wind speed interval) to another within discrete time intervals. These probabilities are expressed in the form of a so-called probability transition matrix. When constructing a Markov model, a finite number of system states must be determined [17]. In this study, to generate wind speed time series, wind speed values are transformed into wind state intervals with a width of 1 m/s, and the corresponding Markov chain transition probabilities are then calculated.

    As an illustration, let the number of states at each time instant be n. Consequently, there are n × n possible transitions between two successive time instants. It is then possible to estimate the transition probability p_ij from state i at time t to state j at time t + 1, and the following transition probability matrix A, which holds these probabilities at the corresponding rows and columns, can be constructed from the observed wind speed data.

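    $$ A = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \cdots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \cdots & p_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & p_{n3} & \cdots & p_{nn} \end{bmatrix} \qquad (5) $$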

    For the state transition matrix A, the following constraints (6) and (7) must be satisfied:

    $$ 0 \le p_{ij} \le 1 \qquad (6) $$

    $$ \sum_{j=1}^{n} p_{ij} = 1, \qquad i = 1, 2, \ldots, n \qquad (7) $$

    The probabilities p_ij in Equation (5) can be calculated from Equation (8):

    $$ p_{ij} = \frac{m_{ij}}{\sum_{j=1}^{n} m_{ij}}, \qquad i, j = 1, 2, \ldots, n \qquad (8) $$

    Here, m_ij represents the number of observed transitions from state i to state j within one time step.

    In practice, the transition probability matrix elements are the relative frequencies of measured wind speeds that fall into the jth state at time t + 1, given that they were in the ith state at the previous time step. Finally, the cumulative transition probabilities of a system with the transition probabilities in Equation (5) can be calculated using Equation (9):

    $$ P_{ik} = \sum_{j=1}^{k} p_{ij} \qquad (9) $$

    Here, P_ik represents the cumulative transition probability in the ith row up to the kth state.

    After the model is constructed by calculating the probabilities from the available training data, the following algorithm is applied to generate synthetic wind speed time series:

    1. Cumulative transition probabilities are calculated using Equation (9), and a cumulative transition matrix in the form of Equation (5) is obtained.

    2. An initial state is selected at random (using random number generation fed by the Weibull distribution with the available parameters).

    3. A random number is produced with a uniform distribution between 0 and 1.

    4. If i is the current state, the new wind state is the state k for which this random value is greater than the cumulative probability P_i(k−1) of the previous state but less than or equal to the cumulative probability P_ik.

    To produce more realistic artificial data, we also added a random amount of noise to the state values. This algorithm is available in previous studies [7–12] and is widely used for wind speed generation with first-order and second-order Markov chains.
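    Steps 1 to 4 can be sketched in Python as follows, assuming NumPy. The function names are hypothetical, the Weibull parameters k and c are supplied by the caller (for example from a fit to the training data), and two details the text leaves open are filled in as labeled assumptions: the "random amount of noise" is taken to be uniform within the 1 m/s state, and a state that was never left during training simply repeats.

```python
import numpy as np


def transition_matrix(speeds, width=1.0):
    """Estimate p_ij (Eq. (8)) from counts of one-step transitions between states."""
    states = np.floor(np.asarray(speeds, dtype=float) / width).astype(int)
    n = states.max() + 1
    counts = np.zeros((n, n))
    np.add.at(counts, (states[:-1], states[1:]), 1)            # m_ij
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def markov_generate(speeds, n_samples, k, c, width=1.0, seed=None):
    """First-order Markov generation following steps 1-4 of Section 3."""
    rng = np.random.default_rng(seed)
    P = transition_matrix(speeds, width)
    n = P.shape[0]
    cum = np.cumsum(P, axis=1)                                  # step 1: Eq. (9)
    state = min(int(c * rng.weibull(k) // width), n - 1)        # step 2: Weibull(k, c) start
    out = np.empty(n_samples)
    for t in range(n_samples):
        out[t] = (state + rng.uniform()) * width                 # assumed uniform in-state noise
        if cum[state, -1] == 0:                                  # state never left in training
            continue                                             # assumption: stay in it
        u = rng.uniform()                                        # step 3
        # step 4: first state k with P_i(k-1) < u <= P_ik
        state = min(int(np.searchsorted(cum[state], u)), n - 1)
    return out
```

    `np.searchsorted` with its default `side="left"` returns exactly the first index whose cumulative probability is greater than or equal to `u`, which is the interval rule of step 4.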

    4. RESULTS

    4.1. Data generation results

    To test the wind speed data generation abilities of the modified Mycielski algorithms (proposed herein) and of Markov chains, we used the wind speed data of the first three years (2003–2005) for training and the remaining one year of data (2006) for testing, for two regions, Izmir and Kayseri. The training and test data for the Izmir and Kayseri regions are plotted in Figures 3 and 4, respectively.

    First, the modified Mycielski algorithms described in Section 2 are applied, and one year of wind speed data is generated for both regions using the available training data. Then, using the Markov chain approach, we calculated the state transition probabilities of the training data for both regions and applied the algorithm given in Section 3 to generate artificial wind speed data. To test the efficiency of the methods, we compared the basic statistics of the data generated by both methods. The basic statistics of the generated and measured data for the Izmir and Kayseri regions are given in Table II, which also presents the basic statistics of the training data used for generation. It should be noted that the basic statistics constitute only an initial performance measure and do not describe the time-variational behavior of the produced data; a detailed analysis is given in the following subsections.

    In Table II, Myc-1 and Myc-2 denote the two modified Mycielski algorithms described in Section 2, whereas Markov denotes wind speed data generated with the Markov approach described in Section 3.

    It can be observed from Table II that the mean values of the data generated by the Mycielski methods are closer to the mean of the actual test data than that of the Markov approach (for Izmir, 3.5 and 3.6 m/s for Myc-1 and Myc-2 versus 3.0 m/s for Markov, against a test mean of 3.4 m/s). Similar observations can be made for the median values. The standard deviations of all methods are close to each other; therefore, the variations of the generated data are similar for all methods.
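    The statistics in Table II and the comparison above can be reproduced with a few lines of NumPy. This is a sketch with hypothetical helper names; whether the paper's standard deviation uses the sample (ddof=1) or population convention is not stated, so ddof=1 is an assumption.

```python
import numpy as np


def basic_stats(v):
    """Max, mean, median and standard deviation, as reported in Table II."""
    v = np.asarray(v, dtype=float)
    return {
        "max": float(v.max()),
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "std": float(v.std(ddof=1)),   # sample standard deviation (ddof=1 is an assumption)
    }


def stat_errors(test, generated):
    """Absolute difference of each statistic between test and generated data."""
    a, b = basic_stats(test), basic_stats(generated)
    return {key: abs(a[key] - b[key]) for key in a}
```

    For example, `stat_errors(test_2006, myc2_output)` would give the per-statistic gaps that the paragraph above compares by eye (both argument names are placeholders).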

    To illustrate the sample-wise performance of the methods, the data generated by Markov, Myc-1, and Myc-2 are plotted in Figure 5 for the Izmir region (the corresponding graphs for the Kayseri region are available upon request but are not reported here to save space). Markov and Myc-1 generate arbitrary numerical values, whereas Myc-2, because of its rounded (interval-based) number generation, produces data with stepped values (Figure 5c).

    [Figure 3. (a) Training data, (b) test data obtained from Izmir. Axes: Hour vs. Wind Speed (m/s).]

    [Figure 4. (a) Training data, (b) test data obtained from Kayseri. Axes: Hour vs. Wind Speed (m/s).]

    Table II. Basic statistics of the data for Izmir and Kayseri.

    Region    Data         Max (m/s)   Mean (m/s)   Median (m/s)   Std. dev. (m/s)
    Izmir     Test data    13.6        3.4          3.1            2.0
              Train data   13.6        3.7          3.4            2.1
              Markov       14.0        3.0          2.7            2.0
              Myc-1        12.3        3.5          3.2            2.0
              Myc-2        11.0        3.6          3.0            2.0
    Kayseri   Test data    14.2        1.5          1.1            1.3
              Train data   16.4        1.6          1.2            1.3
              Markov       10.9        1.2          0.8            1.3
              Myc-1        10.8        1.6          1.2            1.3
              Myc-2        12.0        1.5          1.0            1.2


    Despite its stepped-valued appearance, the statistical and autocorrelation-wise properties of the Myc-2 method were found to be superior to those of the other tested methods.

    Overall, the basic statistics and the time-domain behavior of the generated and measured data indicate that the modified Mycielski algorithms perform better than Markov models for wind speed data generation.

    A proper accuracy analysis of the produced synthetic wind speed data should include the examination of the Weibull parameters, the Markov transition probabilities, the spectral densities, and the autocorrelation values of the measured and generated test data, the last two serving as tests of cyclic behavior. These comparative tests are performed in the following subsections for both the Izmir and Kayseri regions, and the results are interpreted.
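    For the autocorrelation part of these tests, a sample autocorrelation function can be computed as sketched below. The biased estimator (normalizing every lag by the lag-0 sum) and the helper `acf_gap`, which condenses two curves into one number, are our illustrative choices; the paper does not state which estimator or summary it uses.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation r[0..max_lag]; r[0] == 1 (biased estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - lag], x[lag:]) / denom
                     for lag in range(max_lag + 1)])


def acf_gap(test, generated, max_lag=48):
    """Hypothetical summary: mean absolute gap between two autocorrelation curves."""
    return float(np.mean(np.abs(autocorrelation(test, max_lag)
                                - autocorrelation(generated, max_lag))))
```

    For hourly data with a daily cycle, r should show a positive peak near lag 24 and a trough near lag 12; a generator that lacks the cycle will not reproduce that shape.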

    4.2. Weibull parameters of generated data

    In wind speed-related studies such as wind regime determination, Weibull parameter values play an important role. For instance, Ulgen and Hepbasli determined the Weibull parameters of the wind speed for the Izmir region for the years 1995–1999 [18]. Şahin and Aksakal also analyzed the Weibull parameters of wind speed for the eastern region of Saudi Arabia [19]. Classically, the histogram of wind speed values follows a Weibull distribution whose parameters (which determine quantities such as the mean and variance) are used to determine the wind potential and regime. The size of wind turbines can then be optimized according to the electrical energy expected from them, and accurate optimization obviously depends on accurate determination of these Weibull parameters. Therefore, in this subsection, the success of the proposed method in wind speed data generation is examined by fitting the test and generated data of both regions to the two-parameter Weibull distribution, described by Equation (10).

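    $$ f(v) = \frac{k}{c} \left( \frac{v}{c} \right)^{k-1} \exp\!\left[ -\left( \frac{v}{c} \right)^{k} \right], \qquad v \ge 0;\ k, c > 0 \qquad (10) $$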

    In this equation, k and c represent the shape and scale parameters of the distribution function, respectively.
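    The paper does not say how k and c were estimated. As one common option, the sketch below fits Equation (10) by maximum likelihood: the shape k solves the standard equation Σ vᵏ ln v / Σ vᵏ − 1/k − mean(ln v) = 0, whose left side increases with k, so bisection suffices, and then c = (mean(vᵏ))^(1/k). The function name `weibull_mle`, the bracketing interval, and dropping zero speeds (ln 0 is undefined) are our assumptions.

```python
import numpy as np


def weibull_mle(v, k_lo=0.05, k_hi=20.0, iters=100):
    """Two-parameter Weibull maximum-likelihood fit; returns (shape k, scale c)."""
    v = np.asarray(v, dtype=float)
    v = v[v > 0]                          # assumption: zero speeds are dropped
    logv = np.log(v)
    w = v / v.max()                       # rescaling avoids overflow in v**k

    def g(k):
        # MLE shape condition; the weighted mean of ln v is unchanged by the rescaling
        wk = w ** k
        return np.dot(wk, logv) / wk.sum() - 1.0 / k - logv.mean()

    lo, hi = k_lo, k_hi
    for _ in range(iters):                # bisection on the increasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    c = np.mean(v ** k) ** (1.0 / k)
    return k, c
```

    Applying the same fit to the test data and to each generated series gives parameter pairs that can be compared in the way Table III does.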

    [Figure 5. Generated wind speed data from (a) Markov, (b) Myc-1, (c) Myc-2 approach for Izmir. Axes: Hour vs. Wind Speed (m/s).]


    The Weibull parameters of the measured and generated data of the Izmir and Kayseri regions are calculated and tabulated in Table III.

    For the Izmir region, it can be calculated from Table III that the absolute errors between the measured test data and the Markov-generated data are 0.2372 and 0.0638 for the k and c parameters, respectively. For the Myc-1 and Myc-2 generated data, the corresponding absolute errors are 0.0517 and 0.0854, and 0.1759 and 0.057, respectively. Both Mycielski methods therefore reproduce k more closely than the Markov approach, and Myc-2 also reproduces c more closely, which suggests that the generated data are well suited for wind regime determination. To illustrate the distribution-wise match of the generated data, the wind speed histograms of the Izmir region are plotted in Figure 6 for the test data and for the data generated by each method. A similar histogram for the Kayseri region is available upon request.

    Corresponding theoretical Weibull distributions are alsodrawn as overlay plots on the histograms. It is clear fromthese plots that distributions of the generated data closelymatch the Weibull behavior.

    4.3. Markov transition probability analysis of generated data

    In this section, to further analyze the efficiency of the proposed methods, we calculated the Markov transition probabilities of the data generated by each method and compared them with the transition probabilities of the measured data. The transition probabilities are calculated from Equation (8), and the transition probability matrices are formed. For simple visualization, the matrices are rendered as mesh plots in Figure 7 for the Izmir region. The transition structures of all methods closely match the structure obtained from the test data, with some differences in detail.
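    Figure 7 compares the matrices visually. For a numerical check, one could place the test and generated series on a common grid of 1 m/s states and summarize the gap between the two matrices by a mean absolute difference. The sketch below does this; the fixed state count, the clipping of speeds above the top state, and the scalar summary `mean_abs_difference` are hypothetical additions, not a metric used in the paper.

```python
import numpy as np


def transition_matrix(speeds, n_states, width=1.0):
    """Row-normalized transition counts (Eq. (8)) on a fixed grid of states."""
    s = np.clip(np.floor(np.asarray(speeds, dtype=float) / width).astype(int),
                0, n_states - 1)                     # speeds above the grid go to the top state
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (s[:-1], s[1:]), 1)            # m_ij
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def mean_abs_difference(a, b):
    """Hypothetical scalar summary of how far two transition matrices differ."""
    return float(np.mean(np.abs(a - b)))
```

    Using the same `n_states` for the test data and every generator keeps the matrices aligned element by element.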

    Table III. Weibull parameters of measured (test) and generated (Markov, Myc-1, and Myc-2) data.

    Region     Test              Markov            Myc-1             Myc-2
               k        c        k        c        k        c        k        c
    Izmir      3.9765   1.8604   3.7393   1.7965   4.0282   1.9458   4.1524   1.9174
    Kayseri    1.8877   1.5482   1.9236   1.5311   2.0121   1.5689   1.8436   1.6183

    [Figure 6. Distribution histograms of (a) measured test data, (b) generated data from Markov, (c) generated data from Myc-1, and (d) generated data from Myc-2 approaches for Izmir. Axes: Wind Speed Intervals (m/s) vs. Probability Density.]

    4.4. Spectral density analysis of generated data

    The energy spectral density describes how the energy (or variance) of a signal or time series is distributed with frequency. In this study, spectral analysis of the wind speed time series is carried out to obtain information about oscillatory changes in wind speed, which is useful for illustrating spectral similarities between the generated and measured data. Although the previous techniques (in Sections 4.1 and 4.2) are commonly applied, such time-variational analysis (together with autocorrelation analysis) is rarely applied to artificially generated data, the exception being Aksoy et al., who compared the autocorrelation coefficients of observed and generated data. The reason for avoiding spectral or correlation analysis in previous studies was the lack of time-variational similarity between the artificial and the natural data. The Mycielski methods (particularly Myc-2) not only model the Weibull parameters accurately but also provide natural oscillatory behavior in the generated data. To support this observation, we calculated the spectral densities of the data generated by the Markov, Myc-1, and Myc-2 approaches and compared them with the spectral behavior of the original test data. The spectra for the Izmir region are plotted in Figure 8. A similar spectral plot for the Kayseri region is not reported here to save space but is available upon request from the authors.

    In the spectral plots, the fundamental harmonic points, which are the first peaks following the start of the graph, correspond to the frequency of the main oscillation. In all three graphs, the tagged points show that the fundamental frequency peak appears at 11.94 µHz. This frequency corresponds to a period of roughly 24 h (1/11.94 µHz ≈ 83,750 s ≈ 23.3 h), that is, one day. The other harmonics naturally appear at integer multiples of the fundamental frequency. A fundamental period of 24 h is a natural and expected behavior of the wind phenomenon. The presence of this behavior in the synthetic data generated by the Myc-2 method (Figure 8c) indicates an important strength of the proposed method: it reproduces the hard-to-achieve cyclic structure of natural wind speed data. This is a clear advantage over the other generation methods, which lack this property, and the parametric design of dynamic wind power systems is expected to benefit from this accuracy.
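    The link between the tagged peak and the daily cycle can be checked numerically. The Python/NumPy sketch below uses a plain FFT periodogram of an hourly series and reports the period of the strongest non-zero-frequency peak. The paper does not specify its spectral estimator, so this choice, the function names, and the synthetic test signal are assumptions.

```python
import numpy as np

DT_SECONDS = 3600.0  # hourly sampling interval


def periodogram_db(x, dt=DT_SECONDS):
    """One-sided periodogram of the mean-removed series: (frequencies in µHz, power in dB)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs_uhz = np.fft.rfftfreq(len(x), d=dt) * 1e6
    return freqs_uhz, 10.0 * np.log10(power + 1e-12)  # small offset avoids log10(0)


def dominant_period_hours(x, dt=DT_SECONDS):
    """Period, in hours, of the largest spectral peak excluding the zero-frequency bin."""
    freqs_uhz, power_db = periodogram_db(x, dt)
    idx = 1 + int(np.argmax(power_db[1:]))
    return 1e6 / freqs_uhz[idx] / 3600.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(24 * 365)  # one year of hourly samples (synthetic)
    x = 5.0 + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 1.0, t.size)
    print(f"daily frequency: {1e6 / 86400:.2f} uHz")              # about 11.57 uHz
    print(f"dominant period: {dominant_period_hours(x):.2f} h")    # 24.00 h for this signal
    print(f"period at 11.94 uHz: {1e6 / 11.94 / 3600:.2f} h")      # about 23.26 h
```

    Because the synthetic year contains an integer number of days, the daily component falls exactly on one FFT bin; measured data will show some spreading around the peak.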

    Figure 7. State transition probabilities of (a) test data, (b) generated data from Markov, (c) generated data from Myc-1, and (d) generated data from Myc-2 for Izmir. [Each panel: mesh plot of transition probability versus wind state (i) and wind state (j).]

    4.5. Autocorrelation results of generated data

    As a final study, the characteristics of the data generated by all approaches are discussed in terms of autocorrelation. The autocorrelation at time lag k is determined using the following equation:

    $$r_k = \frac{\frac{1}{N-k}\sum_{i=1}^{N-k}\left(x_i-\bar{x}\right)\left(x_{i+k}-\bar{x}\right)}{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)\left(x_i-\bar{x}\right)} \tag{11}$$

    where $\bar{x}$ is the mean of the wind speed time series $x_i$, $i = 1, 2, \ldots, N$. Mathematically, the autocorrelation function can be regarded as the inverse Fourier transform of the power spectral density. Because the harmonic spectral behavior of the Myc-2 method is evident, its autocorrelation function is expected to match that of the measured data more closely than those of the other methods. The autocorrelations of the measured and generated wind speed data are presented in Figure 9a and b for the Izmir and Kayseri regions, respectively.
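    Equation (11) can be transcribed directly. The Python/NumPy sketch below is one such transcription (the function name is hypothetical); the numerator averages over the N - k available pairs while the denominator averages over all N samples, so r_0 = 1.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Return r_k of Eq. (11) for k = 0, 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    d = x - x.mean()          # deviations x_i - xbar
    denom = np.dot(d, d) / n  # (1/N) * sum of squared deviations
    # (1/(N-k)) * sum_{i=1}^{N-k} (x_i - xbar)(x_{i+k} - xbar), normalized by denom
    return np.array([np.dot(d[: n - k], d[k:]) / (n - k) / denom for k in range(max_lag + 1)])


if __name__ == "__main__":
    t = np.arange(24 * 60)                    # 60 days of hourly samples
    daily = 5.0 + np.sin(2 * np.pi * t / 24)  # idealized daily cycle
    r = autocorrelation(daily, 48)
    print(np.round(r[[0, 12, 24, 48]], 3))    # peaks at lags 24 and 48 h, trough at 12 h
```

    Because the numerator uses 1/(N - k), |r_k| is bounded by N/(N - k) rather than 1; for a one-year hourly record (N ≈ 8760) and k = 100 this bound is about 1.01.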

    This analysis reveals the critical difference and main strength of the Mycielski method. In Figure 9a and b, the peak values of the autocorrelation functions are indicated for the measured (test) data. The autocorrelation functions can be interpreted as the inverse Fourier transforms of the spectral densities depicted in Figure 8 (Section 4.4). The peaks appearing at the fundamental frequencies in Figure 8 also appear as peaks in the autocorrelation functions, this time at the position of the fundamental period. Naturally, the fundamental period is located at 24 h (one day). In other words, there are considerable correlations between data obtained at the same hour of consecutive days. By analyzing the autocorrelation figures, similar to the analysis in Section 4.4, we can see that the Markov and Myc-1 methods are unable to depict the cyclic (or pseudo-periodic) behavior of the wind speed data: their autocorrelation functions decay smoothly, with no peaks indicating cyclic behavior. This inability was also confirmed in the plots of Shamshad et al. in their study of synthetic wind speed generation from Markov-based models [11]. However, artificial data generated by the proposed Myc-2 algorithm capture this correlation-wise cyclic behavior, as depicted in Figure 9a and b. This capability is verified by the data obtained from two geographically different locations (Izmir: seaside; Kayseri: inland). The described natural behavior in terms of daily correlations was found to exist in a similar way at both sites, although the match of the Myc-2 method is closer for Izmir.

    Figure 8. Spectral density of measured (test) data and (a) Markov generated, (b) Myc-1 generated, and (c) Myc-2 generated data for Izmir. [Each panel: power spectrum (dB) versus frequency (µHz, 0 to 150), test data overlaid on the generated data.]

    5. CONCLUSION

    In this study, novel approaches for synthetic wind speed generation are proposed and compared with approaches that exist in the literature. The proposed approaches use modifications of the Mycielski prediction algorithm to generate samples of artificial wind speed data. The Mycielski algorithm essentially searches the history for long repeating patterns in order to predict the next sample of the time series. To demonstrate the efficiency of the proposed algorithms, we selected two geographically different regions (Izmir and Kayseri) and used the wind speed data of these regions. Apart from the good performance in terms of matched Weibull distributions and state transition probabilities, one of the modified Mycielski algorithms (denoted Myc-2) was observed to capture and produce samples with daily quasi-cyclic behavior. This property inherently exists in real-life data and can be seen from the peaks of the autocorrelation or spectral density plots. The Myc-2 method was observed to produce data with autocorrelation characteristics very similar to those of the measured data. Such a property of generating artificial wind speed data with natural daily variations (due to day-night transitions) was not encountered in the data generated by previous artificial data generation methods in the literature, making the proposed approach a noteworthy model for understanding the time series mechanisms of the wind speed phenomenon. The pattern search strategy of the Mycielski algorithm proves to be a promising and plausible approach for similar applications requiring forecasting.
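    The pattern-search idea can be illustrated with the basic (unmodified) Mycielski rule: find the longest suffix of the history that has occurred earlier and predict the sample that followed its most recent earlier occurrence. The Python sketch below implements only this exact-match baseline on quantized states; it is not the authors' Myc-1 or Myc-2 procedure. The Nomenclature's Hamming distance d_H and tolerance d_TOL suggest that the modified variants relax exact matching, which is not reproduced here, and the fallback used when no suffix repeats is an assumption.

```python
def mycielski_predict(history):
    """Predict the next symbol of `history` (a list or string) with the basic Mycielski rule.

    Searches for the longest suffix that also occurs earlier in the sequence and returns
    the symbol that followed its most recent earlier occurrence.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    for length in range(n - 1, 0, -1):  # longest candidate suffix first
        suffix = history[n - length:]
        # Earlier occurrences must end before the last sample so that a successor exists;
        # scan start positions from the most recent backwards.
        for start in range(n - length - 1, -1, -1):
            if history[start:start + length] == suffix:
                return history[start + length]
    return history[-1]  # assumed fallback: no repeated suffix, so repeat the last sample


if __name__ == "__main__":
    states = [2, 3, 3, 4, 2, 3, 3]      # toy quantized wind states
    print(mycielski_predict(states))   # longest repeat is [2, 3, 3], so this predicts 4
```

    The quadratic-per-length search keeps the sketch readable; practical implementations index the history (for example with suffix structures) to avoid rescanning it for every prediction.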

    NOMENCLATURE

    x = measured data value
    x̂ = predicted value of x
    d_H = Hamming distance
    m = location of the sample that is used as the predicted value
    f = prediction function
    d_TOL = predefined distance
    A = state transition matrix
    p_ij = probability of transition from state i to state j
    m_ij = number of observed transitions from state i to state j within one time step
    P_ik = transition probability in the ith row at the kth state
    k = shape parameter of the Weibull distribution function
    c = scale parameter of the Weibull distribution function
    f(v) = Weibull distribution function
    x̄ = mean of the wind speed time series
    r_k = autocorrelation at time lag k
    n = length of the data

    Figure 9. Autocorrelation functions of measured and generated wind speed data for (a) Izmir and (b) Kayseri. [Each panel: autocorrelation versus lag (0 to 100 hours) for the test data, Myc-1, Myc-2, and Markov.]


    ACKNOWLEDGEMENTS

    The authors thank the Turkish State Meteorological Service (DMI) for supplying hourly wind speed data. The authors are also grateful to four anonymous reviewers of this journal for their helpful comments on an earlier version of this paper.

    REFERENCES

    1. Celik AN. A statistical analysis of wind power density based on the Weibull and Rayleigh models at the southern region of Turkey. Renewable Energy 2004; 29:593-604.

    2. Hrayshat ES. Wind resource assessment of the Jordanian southern region. Renewable Energy 2007; 32:1948-1960.

    3. Kavak Akpinar E, Akpinar S. An assessment on seasonal analysis of wind energy characteristics and wind turbine characteristics. Energy Conversion and Management 2005; 46:1848-1867.

    4. Migoya E, Crespo A, Jiménez Á, García J, Manuel F. Wind energy resource assessment in Madrid region. Renewable Energy 2007; 32:1467-1483.

    5. Kurban M, Hocaoglu FO. Potential analysis of wind energy as a power generation source. Energy Sources, Part B: Economics, Planning, and Policy 2010; 5:19-28.

    6. Hocaoğlu FO, Kurban M. Regional wind energy resource assessment. Energy Sources, Part B: Economics, Planning, and Policy 2010; 5:41-49.

    7. Sahin AD, Sen Z. First order Markov chain approach to wind speed modeling. Journal of Wind Engineering and Industrial Aerodynamics 2001; 89:263-269.

    8. Tore MC, Poggi P, Louche A. Markovian model for studying wind speed time series in Corsica. International Journal of Renewable Energy Engineering 2001; 3:311-319.

    9. Youcef Ettoumi F, Sauvageot H, Adane AHE. Statistical bivariate modeling of wind using first order Markov chain and Weibull distribution. Renewable Energy 2003; 28:1787-1802.

    10. Aksoy H, Toprak ZF, Aytek A, Ünal NE. Stochastic generation of hourly mean wind speed data. Renewable Energy 2004; 29:2111-2131.

    11. Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, Sanusi SAM. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005; 30:693-708.

    12. Hocaoğlu FO, Gerek ÖN, Kurban M. The effect of Markov chain state size for synthetic wind speed generation. The 10th International Conference on Probabilistic Methods Applied to Power Systems (PMAPS 2008), Rincón, Puerto Rico, May 25-29, 2008.

    13. Hocaoğlu FO, Gerek ÖN, Kurban M. A novel wind speed modeling approach using atmospheric pressure observations and hidden Markov models. Journal of Wind Engineering and Industrial Aerodynamics 2010; 98:472-481.

    14. Hocaoğlu FO, Fidan M, Gerek ÖN. Mycielski approach for wind speed prediction. Energy Conversion and Management 2009; 50:1436-1443.

    15. Fidan M, Gerek ON. A time improvement over the Mycielski algorithm for predictive signal coding: Mycielski-78. Proceedings of the 14th European Signal Processing Conference EUSIPCO 2006, Florence, Sep. 2006.

    16. Ehrenfeucht A, Mycielski J. A pseudorandom sequence: how random is it? The American Mathematical Monthly 1992; 99(4):373-375.

    17. Benjamin JR, Cornell CA. Probability, Statistics and Decision for Civil Engineers. McGraw-Hill: New York, 1970.

    18. Ulgen K, Hepbasli A. Determination of Weibull parameters for wind energy analysis of Izmir, Turkey. International Journal of Energy Research 2002; 26:495-506.

    19. Şahin AZ, Aksakal A. A statistical analysis of wind energy potential at the eastern region of Saudi Arabia. International Journal of Energy Research 1999; 23:909-917.
