DIPARTIMENTO DI INFORMATICA E SCIENZE DELL'INFORMAZIONE
UNIVERSITÀ DI GENOVA - ITALY

Technical Report DISI-TR-00-4 (v 1.1)

Discontinuous and Intermittent Signal Forecasting: A Hybrid Approach

Francesco Masulli
Istituto Nazionale per la Fisica della Materia & DISI-Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova, Via Dodecaneso 35, I-16146 Genova, Italy
E-mail: [email protected]

Giovambattista Cicioni
Istituto di Ricerca sulle Acque, Consiglio Nazionale delle Ricerche, Via Reno 1, I-00198 Roma, Italy
E-mail: [email protected]

Leonard Studer
Institut de Physique des Hautes Énergies, Université de Lausanne, CH-1015 Dorigny, Switzerland
E-mail: [email protected]



Abstract

A constructive methodology for shaping a neural model of a non-linear process, supported by results and prescriptions related to the Takens-Mañé theorem, has recently been proposed. Following this approach, the measurement of the first minimum of the mutual information of the output signal, together with the estimation of the embedding dimension by the method of global false nearest neighbors, makes it possible to design the input layer of a neural network or of a neuro-fuzzy system to be used as a predictor. In this paper we present an extension of this prediction methodology to discontinuous or intermittent signals. As the universal function approximation theorems for neural networks and fuzzy systems require the continuity of the function to be approximated, we apply Singular-Spectrum Analysis (SSA) to the original signal, in order to obtain a family of time series components that are more regular than the original signal and can, in principle, be predicted individually using the mentioned methodology. On the basis of the properties of SSA, the prediction of the original series can be recovered as the sum of the predictions of all the individual series components. We show an application of this prediction approach to a hydrology problem concerning the forecasting of daily rainfall intensity series, using a database collected over a period of 10 years from 135 stations distributed in the Tiber river basin.


Chapter 1

Introduction

In the last few years, neural networks and fuzzy systems have been widely used in non-linear dynamic systems modeling and forecasting. These applications are supported by the universal function approximation properties holding for both neural networks and fuzzy systems [3, 12, 33]. However, these theorems give no information on how to define the structure of the learning machine, while the application of systems of this kind, such as Multi-Layer Perceptrons (MLPs), to the problem of time series forecasting implies the setting of: the number of units in the input layer, the sampling time of the series, and the structure and dimension of the hidden layers. Neural network theory gives only general suggestions for choosing these quantities. The specificities of the data set have to be taken into account at this level in order to tailor the MLP to the time series to be forecasted.

As shown by our group in [26, 22], a constructive methodology for shaping a supervised neural model of a non-linear process can be based on the results and prescriptions related to the Takens-Mañé theorem and on the Global False Nearest Neighbors (FNN) method proposed by Abarbanel [1]. In [10] a similar methodology has been proposed independently.

In this paper we present an extension of this methodology to the prediction of discontinuous or intermittent signals, based on an ensemble method using a decomposition obtained through Singular-Spectrum Analysis (SSA) [31]. SSA decomposes the raw signal into reconstructed components, each of which can, in principle, be predicted separately using the approach presented in [22]; the prediction of the original series can then be recovered as the sum of the predictions of all the individual series components. The approach followed in this paper is based on the prediction of reconstructed waves, which are sums of disjoint groups of reconstructed components, and on the subsequent recomposition of the forecast of the raw signal by addition of the predictions of the individual reconstructed waves.

We also present an application of the proposed methods to the forecasting of rainfall intensity series collected in the Tiber basin.

In the next chapter we present Multi-Layer Perceptrons and discuss their properties relevant to series forecasting. In Ch. 3 we give the basis of the methodology for time series forecasting. In Ch. 4 we introduce Singular Spectrum Analysis and the ensemble method based on the SSA decomposition. In Ch. 5 we present the application to rainfall forecasting and the results obtained. The conclusions of the work are drawn in Ch. 6.


Chapter 2

Learning Time Series using Multi-Layer Perceptrons

Theoretical and experimental results support the use of neural networks in many applicative tasks. In particular, it has been shown that such systems can perform function approximation [3, 12], Bayesian classification [24], unsupervised vector quantization clustering of inputs [14], content-addressable memories [11], and linear and non-linear principal component analysis and independent component analysis [9].

Artificial neural networks are made up of simple nodes, or neurons, interconnected with one another. Generally speaking, a node of a neural network can be regarded as a block that measures the similarity between the input vector and the parameter vector (or weight vector) associated with the node, followed by another block that computes an activation function, normally non-linear [18, 9]. The transfer function of an artificial neuron is given by the equation:

    y = H\Big( \sum_i w_i x_i - \theta \Big)    (2.1)

where y is the output of the neuron, H is the activation function, the w_i are the weights, the x_i are the inputs, and θ is the threshold.

The most used neural network is the Multi-Layer Perceptron (MLP), a feed-forward model based on layers of neurons. The nodes of each layer are interconnected with all the nodes of the following layer. In this way Multi-Layer Perceptrons perform non-linear maps from an input space to an output space.
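As a minimal numerical sketch of Eq. 2.1 and of the resulting layered map, the neuron transfer function and the feed-forward pass of an MLP can be written in a few lines (a NumPy sketch of ours; the function and variable names are our choices, not part of the report):

```python
import numpy as np

def neuron_output(x, w, theta, H=lambda a: 1.0 / (1.0 + np.exp(-a))):
    """Transfer function of Eq. 2.1: y = H(sum_i w_i x_i - theta)."""
    return H(np.dot(w, x) - theta)

def mlp_forward(x, layers):
    """Feed-forward pass of an MLP: `layers` is a list of (W, theta)
    pairs, one per layer; every node of a layer feeds all the nodes
    of the following layer, with a sigmoid activation."""
    for W, theta in layers:
        x = 1.0 / (1.0 + np.exp(-(W @ x - theta)))
    return x
```

Stacking such layers is what gives the MLP its non-linear input-to-output map.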

Moreover, as demonstrated by the Universal Approximation Theorem [3, 12], an MLP with a single hidden layer (the output nodes constitute the output of the MLP; the remaining nodes constitute the hidden layers of the network), using sigmoid activation functions

    H(x) = \frac{1}{1 + \exp(-a x)}

where a is the slope parameter of the sigmoid function, is sufficient to uniformly approximate any continuous function with support in a unit hypercube.

The non-linear map can be automatically learned from data by an MLP through supervised learning techniques based on the minimization of a cost function, such as


the Root Mean Square (RMS) error. The most widespread learning technique is Error Back-Propagation, which is an efficient application of the Gradient Descent method [25, 9].

The universal approximation property implies that, if the non-linear dynamical process can be represented by a continuous function, an efficient non-linear model can be built from data using a Multi-Layer Perceptron. Using MLPs, the costly detailed design step of the first-principles model usually implemented in non-linear system identification is replaced by a simpler structuring step of the MLP, plus an optional pre-processing (possibly driven by some understanding of the physical model of the process) of the raw data coming from the field.

Even if, in principle, the function approximation property of MLPs guarantees the feasibility of data-based models of non-linear dynamical systems, neural network theory does not give any suggestion about many details: for example, no general prescriptions are available concerning the dimension of the data window (i.e., the input layer of the MLP), the sampling rate of the input data, the dimension of the hidden layers, or the dimension of the training set, and thus most of the time those fundamental design parameters have to be obtained by experiments and heuristics.


Chapter 3

Hints from Dynamical Systems Theory

3.1 Dynamical Systems and Chaos Theory

3.1.1 State Space

A deterministic dynamical system is described by a set of differential equations. Its evolution is represented by the trajectory in state space (of dimension n) of the vector Q = (x, \dot{x}, y, \dot{y}, z, \dot{z}, \ldots), where x, \dot{x}, y, \dot{y}, z, \dot{z}, \ldots are the variables of the system and their derivatives. The figure drawn in state space by Q is the attractor of the system.

For non-linear systems, the dynamical variables (x, y, z, ...) are coupled: the evolution of one variable (say x) is not independent of all the other ones (y, z, ...). Except for a few simple phenomena, the set of differential equations is unknown; often, even the whole set of relevant effective dynamical variables is not well defined. But, as the variables are interdependent, the observation of only one of them brings information, maybe in an implicit way, on the other ones and consequently on the complete dynamical system. This is the reason why time series of non-linear dynamic systems are so useful.

3.1.2 Embedding Theorem

The question is now: "How can the complete dynamical system be reconstructed from only the one-variable time series s_1, s_2, s_3, ...?" Here the theory of dynamical systems gives an answer. In particular, the Embedding Theorem, proposed independently in 1981 by Takens and Mañé [27, 17], answers the above question.

In the Takens-Mañé theorem we consider an augmented vector S built with d elements of the time series. The dimension d of the vector has to be greater than two times the box-counting dimension D_0 of the attractor of the system:

    d > 2 D_0    (3.1)


A vector S satisfying the Takens-Mañé bound cited in the previous paragraph will evolve in a reconstructed state space, and its evolution will be in a diffeomorphic relation with the original state-space point Q (a diffeomorphism is a smooth one-to-one relation). In other words, for every practical purpose the evolution of S is a faithful copy of the evolution of Q.

It is worth noting that there is a distinction between the order of the differential equation (n), which is the dimension of the state space where the true state vector Q lives, and the sufficient dimension (d) of a reconstructed state space, where the reconstructed vector S lives.

3.1.3 An Example

In order to elucidate the Embedding Theorem, let us consider a sine wave s_t = A sin(t). In d = 1 (i.e., the s_t space) this wave oscillates in the interval [-A, A]. Two points which are close in the sense of the Euclidean (or another) distance may have quite different values of \dot{s}_t; in this way two "close" points may move in opposite directions along the single spatial axis. In a two-dimensional space (s_t, s_{t+T}), where T is a time lag, the ambiguity of the dynamics of the points is resolved: the system evolves on a figure (in general an ellipse) that is topologically equivalent to a circle. If we draw the sine wave in the three dimensions (s_t, s_{t+T}, s_{t+2T}), no further unfolding occurs and the sine wave is again represented as an ellipse.

3.1.4 The Method of Embedding

In order to reconstruct the dynamical system we can use the time delay embedding method [1]. This method consists in building d-dimensional state vectors S_i = (s_i, s_{i+T}, \ldots, s_{i+(d-1)T}). In principle, it suffices that d ≥ n. But the effective dimension d is not directly related to the dynamical dimension n, as in the case of weakly coupled variables.
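The construction of the delay vectors S_i can be sketched as follows (a minimal NumPy sketch; `delay_embed` is our name, not terminology from the report):

```python
import numpy as np

def delay_embed(s, d, T):
    """Build the state vectors S_i = (s_i, s_{i+T}, ..., s_{i+(d-1)T})
    from a scalar series s, per the time delay embedding method."""
    s = np.asarray(s, dtype=float)
    n = len(s) - (d - 1) * T          # number of complete state vectors
    if n <= 0:
        raise ValueError("series too short for this (d, T)")
    return np.array([s[i : i + d * T : T] for i in range(n)])
```

The same construction is used below both for the FNN test and for preparing the inputs of the MLP predictor.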

3.2 Choosing the Time Delay

The time delay T (or time lag) used in the embedding has to be chosen carefully. If it is too long, the samples s_i, s_{i+T}, \ldots, s_{i+(d-1)T} are not correlated(1) and then, in general, the dynamical system cannot be reconstructed. If it is too short, every sample is essentially a copy of the previous one, bringing very little information on the dynamical system.

We usethe Shannon’s mutual informationto quantify the amountof informationsharedby two samplesin order to get an usefulestimationof the time lag T. Let’sdefinedtheaverage mutualinformationbetweenmeasurementsai drawn from thesetA andmeasurementsbi drawn from setB. Thesetof measurementsA is madeof thevaluesof theobservablesi andthesetB is madeof thevaluessi t (t is a time interval).

(1) This happens in particular for chaotic systems, for which even two initially close chaotic trajectories will diverge exponentially in time.


The average mutual information is then:

    I(t) = \sum_{s_i \in A,\, s_{i+t} \in B} P(s_i, s_{i+t}) \log_2 \frac{P(s_i, s_{i+t})}{P(s_i)\, P(s_{i+t})}    (3.2)

where the P are probability distributions based on frequency observations.

It has been suggested [6, 5, 29, 1] to take the time T where the first minimum of I(t) occurs as the value to use as the time delay in the phase space reconstruction. In this way the values of s_n and s_{n+T} are the most independent of each other in an information-theoretic sense.

Moreover, the first minimum of the average mutual information is a good candidate for the interval between the components of the state vectors that will be input to the neural network model of the non-linear dynamical process.
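A histogram-based estimate of Eq. 3.2, together with the search for its first minimum, might look as follows (a sketch of ours, assuming a simple equal-width binning of the observations; the bin count and the function names are our choices, not prescriptions from the report):

```python
import numpy as np

def average_mutual_information(s, t, bins=16):
    """Histogram estimate of Eq. 3.2 between s_i and s_{i+t}, in bits."""
    a, b = s[:-t], s[t:]
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab /= pab.sum()                       # joint probabilities P(s_i, s_{i+t})
    pa = pab.sum(axis=1)                   # marginal P(s_i)
    pb = pab.sum(axis=0)                   # marginal P(s_{i+t})
    nz = pab > 0
    return float((pab[nz] * np.log2(pab[nz] / np.outer(pa, pb)[nz])).sum())

def first_minimum_lag(s, max_lag=30, bins=16):
    """Smallest lag t at which I(t) is below both of its neighbors."""
    I = [average_mutual_information(s, t, bins) for t in range(1, max_lag + 1)]
    for k in range(1, len(I) - 1):
        if I[k] < I[k - 1] and I[k] < I[k + 1]:
            return k + 1                   # lags start at t = 1
    return max_lag
```

Since the marginals are computed from the joint histogram, the plug-in estimate is a Kullback-Leibler divergence and therefore non-negative by construction.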

3.3 Evaluating the Global Embedding Dimension

From the Embedding Theorem, the box-counting dimension D_0 should be evaluated. In principle, it can be estimated directly from the time series itself, but this task is very sensitive to noise and needs a large set of data points (of the order of 10^{D_0} data points) [1].

In order to avoid those problems, we can estimate the embedding dimension d_E, defined as the lowest (integer) dimension which unfolds the attractor, i.e., the minimal dimension for which foldings due to the projection of the attractor into a lower dimensional space are avoided. The embedding dimension is a global dimension and in general is different from the local dimension of the underlying dynamics.

The Embedding Theorem guarantees that if the dimension of the attractor is D_0, then we can unfold the attractor in a space of dimension d_E > 2 D_0. It is worth noting that this condition on d_E is not necessary for unfolding, but sufficient.

The dimension of the input layer of the Multi-Layer Perceptron will then be high enough that the deterministic part of the dynamics of the system is unfolded.

3.3.1 Global False Nearest Neighbors

In practice, the method of Global False Nearest Neighbors proposed by Abarbanel [1] can be used to evaluate the embedding dimension d_E. Consider a state space reconstruction in dimension d, with data vectors S_i = (s_i, s_{i+T}, \ldots, s_{i+(d-1)T}), where the time delay T is the first minimum of the average mutual information (Eq. 3.2), and let S^{NN}_i = (s^{NN}_i, s^{NN}_{i+T}, \ldots, s^{NN}_{i+(d-1)T}) be the nearest neighbor vector in phase space. If the vector S^{NN}_i is a false neighbor (FNN) of S_i, having arrived in its neighborhood by projection from a higher dimension because the present dimension d does not unfold the attractor, then by going to the next dimension d + 1 we may move this point out of the neighborhood of S_i.

We define the distance ξ between points, when seen in dimension d + 1, relative to the distance in dimension d, as

    \xi_i = \sqrt{ \frac{ R^2_{d+1}(i) - R^2_d(i) }{ R^2_d(i) } }    (3.3)

and then

    \xi_i = \frac{ \left| s_{i+dT} - s^{NN}_{i+dT} \right| }{ R_d(i) }    (3.4)

As suggested by Abarbanel [1], S^{NN}_i and S_i can be classified as false neighbors if ξ_i is greater than a threshold θ (ξ_i > θ). In many applications a good value for θ is 15.

In the case of clean data from a dynamical system, we expect the percentage of FNNs to drop from nearly 100% in dimension one to close to zero when d_E is reached.

It is worth noting that, as we go to higher dimensional spaces, the volume available for the data grows as the distance to the power of the dimension, and no near neighbor will be classified as a close neighbor. In this case we can modify Eq. 3.4 as

    \xi_i = \frac{ \left| s_{i+dT} - s^{NN}_{i+dT} \right| }{ R_A }    (3.5)

where R_A is the nominal "radius" of the attractor, defined as the Root Mean Square (RMS) value of the data about its mean, e.g.:

    R_A = \frac{1}{N} \sum_{i=1}^{N} | s_i - s_{av} |    (3.6)

    s_{av} = \frac{1}{N} \sum_{i=1}^{N} s_i    (3.7)

A very efficient implementation of the FNN algorithm is presented in [20]; that algorithm is based on the work by Nene and Nayar [21].

It is worth noting that there are two main arguments for sizing the input layer of an MLP-based predictor smaller than the evaluation obtained using the FNN method: this evaluation is still an upper bound, and moreover, for an assigned size of the training set, a limitation of the complexity of the learning machine can lead to better generalization.
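A brute-force version of the global FNN test can be sketched as follows (our simplified reading of Eqs. 3.3-3.5: the growth of the distance along the new coordinate is compared both to the neighbor distance, with threshold θ = 15, and to the attractor radius R_A, with a tolerance of 2; the latter tolerance is a common choice that the report does not state explicitly):

```python
import numpy as np

def fnn_fraction(s, d, T, theta=15.0, a_tol=2.0):
    """Fraction of global false nearest neighbors in embedding dimension d,
    using a brute-force O(N^2) nearest-neighbor search."""
    s = np.asarray(s, dtype=float)
    n = len(s) - d * T                       # leave room for the (d+1)-th coordinate
    X = np.array([s[i : i + d * T : T] for i in range(n)])
    R_A = np.sqrt(np.mean((s - s.mean()) ** 2))   # nominal attractor radius
    false = 0
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf
        j = int(np.argmin(dist))             # nearest neighbor in dimension d
        extra = abs(s[i + d * T] - s[j + d * T])  # growth along the new axis
        if extra / max(dist[j], 1e-12) > theta or extra / R_A > a_tol:
            false += 1
    return false / n
```

For clean data the returned fraction is expected to drop toward zero once d reaches d_E, as stated above.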

3.3.2 Bells, Whistles and Pitfalls of FNN

- The global FNN calculation is simple and fast.
- The FNN calculation applied to signals coming from two different outputs of the same dynamical system gives, in general, two different values of d_E. From each signal we will then obtain different reconstructed coordinate systems, but both consistent with the original dynamical system.
- The FNN method is valid even if the signal of interest results from a filtered output of a dynamical system [1, 4].


- If the signal is contaminated by noise (assumed to be generated by a high dimensional system), it may be that the contamination dominates the signal of interest, and FNN will show the dimension required to unfold the contamination. Here a simple byproduct of the FNN calculation is an indication of the noise level in a signal.


Chapter 4

Ensemble Method based on Singular Spectrum Analysis Decomposition

4.1 Singular Spectrum Analysis

The methodology described in the previous chapter has been successfully applied to the design of Multi-Layer Perceptrons and neuro-fuzzy systems for the forecasting of simulated non-linear and chaotic systems [26, 22] and of real world problems, such as the modeling of the vibration dynamics of a real system consisting of a 150 MW Siemens steam turbine [19].

The proposed methodology cannot be directly applied to forecasting discontinuous or intermittent signals, as the universal function approximation theorems for neural networks [3] and fuzzy systems [33] require the continuity of the function to be approximated.

In order to avoid the effect of the discontinuities of a signal, we can apply Singular-Spectrum Analysis (SSA) [15, 23, 31, 16] to the signal to be forecasted. In SSA the state vector S_i = (s_i, s_{i+1}, \ldots, s_{i+M-1}) is a temporal window (augmented vector) of the series s, made up of a given number of samples M.

The cornerstone of SSA is the Karhunen-Loève expansion, or Principal Component Analysis (PCA) [28], which is based on the eigenvalue problem of the lagged covariance matrix Z_s. Z_s has a Toeplitz structure, i.e., constant diagonals corresponding to equal lags:

    Z_s = \begin{pmatrix}
        c_0     & c_1    & \cdots & c_{M-1} \\
        c_1     & c_0    & \ddots & \vdots  \\
        \vdots  & \ddots & \ddots & c_1     \\
        c_{M-1} & \cdots & c_1    & c_0
    \end{pmatrix}    (4.1)


In the absence of prior information about the signal, it has been suggested [31] to use the following estimate for Z_s:

    c_j = \frac{1}{N - j} \sum_{i=1}^{N-j} s_i s_{i+j}    (4.2)

The original series can be expanded with respect to the orthonormal basis corresponding to the eigenvectors of Z_s:

    s_{i+j} = \sum_{k=1}^{M} p^k_i u^k_j , \quad 1 \le j \le M , \; 0 \le i \le N - M    (4.3)

where the p^k_i are called principal components (PCs) and the eigenvectors u^k_j are called the empirical orthogonal functions (EOFs), and the orthonormality property

    \sum_{k=1}^{M} u^k_j u^k_l = \delta_{jl} , \quad 1 \le j \le M , \; 1 \le l \le M    (4.4)

holds. It is worth noting that SSA does not resolve periods longer than the window length M. Hence, if we want to reconstruct a strange attractor, whose spectrum includes periods of arbitrary length, the larger M the better, while avoiding to exceed M = N/3 (otherwise statistical errors could dominate the last values of the auto-covariance function).

Many applications of Singular Spectrum Analysis have been presented in [30, 8, 31, 13, 16, 7], including noise reduction, detrending, spectral estimation, and prediction.

Concerning the application of SSA to prediction, which is the main interest of the present paper, it is supported by the following argument: since the PCs are filtered versions of the signal, and typically band-limited, their behavior is more regular than that of the raw series s, and hence more predictable.

Vautard and Ghil [31] fit an autoregressive (AR) model to each individual PC using Burg's AR coefficient estimate [2], while Lisi, Nicolis and Sandri [16] used Multi-Layer Perceptrons in order to estimate filtered versions of the raw signal obtained using SSA.

In order to reduce the computational costs, we decompose the raw series s into reconstructed waves corresponding to SSA subspaces of similar explained variance, and we predict them using Multi-Layer Perceptrons, combined with an independent evaluation of the time lag (using the first minimum of the mutual information) and of the embedding dimension (using the False Nearest Neighbors method).
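The Toeplitz estimate of Eqs. 4.1-4.2 and the extraction of EOFs and PCs (Eq. 4.3) can be sketched as follows (a NumPy sketch; the function name and the descending ordering convention are our choices):

```python
import numpy as np

def ssa_decompose(s, M):
    """SSA of the series s with window M: Toeplitz lagged covariance
    estimate (Eq. 4.2), EOFs (eigenvectors of Z_s) and PCs (Eq. 4.3)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    c = np.array([np.dot(s[:N - j], s[j:]) / (N - j) for j in range(M)])
    Z = np.array([[c[abs(j - l)] for l in range(M)] for j in range(M)])
    eigval, eof = np.linalg.eigh(Z)               # symmetric eigenproblem
    order = np.argsort(eigval)[::-1]              # sort by decreasing eigenvalue
    eigval, eof = eigval[order], eof[:, order]    # eof[:, k] is the k-th EOF
    # PC k at time i: projection of the augmented vector (s_i, ..., s_{i+M-1})
    X = np.array([s[i : i + M] for i in range(N - M + 1)])
    pc = X @ eof
    return eigval, eof, pc
```

Because the EOFs form an orthonormal basis, projecting the PCs back onto them recovers the augmented vectors exactly.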

4.2 Reconstructed components and reconstructed waves

Following Vautard and Ghil [31], suppose we want to reconstruct the original signal s_i starting from an SSA subspace \mathcal{K} of k eigenvectors. By analogy with Eq. 4.3, the problem can be formalized as the search for a series s of length N such that the quantity

    H^2(s) = \sum_{i=0}^{N-M} \sum_{j=1}^{M} \Big( s_{i+j} - \sum_{k \in \mathcal{K}} p^k_i u^k_j \Big)^2    (4.5)


is minimized. In other words, the optimal series s is the one whose augmented version S is the closest, in the least-squares sense, to the projection of the augmented series onto the EOFs with indices belonging to \mathcal{K}.

The solution of the least-squares problem of Eq. 4.5 is given by

    s_i = \begin{cases}
        \frac{1}{M} \sum_{j=1}^{M} \sum_{k \in \mathcal{K}} p^k_{i-j} u^k_j & \text{for } M \le i \le N - M + 1 \\[4pt]
        \frac{1}{i} \sum_{j=1}^{i} \sum_{k \in \mathcal{K}} p^k_{i-j} u^k_j & \text{for } 1 \le i \le M - 1 \\[4pt]
        \frac{1}{N - i + 1} \sum_{j=i-N+M}^{M} \sum_{k \in \mathcal{K}} p^k_{i-j} u^k_j & \text{for } N - M + 2 \le i \le N
    \end{cases}    (4.6)

When \mathcal{K} consists of a single index k, the series s is called the k-th reconstructed component (RC), and will be denoted by s^k.

RCs have additive properties, i.e., the series reconstructed from a subspace \mathcal{K} is the sum of the corresponding RCs:

    s = \sum_{k \in \mathcal{K}} s^k    (4.7)

In particular, the original series s can be expanded as the sum of all its RCs:

    s = \sum_{k=1}^{M} s^k    (4.8)

Note that, despite its linear aspect, the transform changing the series s into s^k is, in fact, non-linear, since the eigenvectors u^k depend non-linearly on s.

If we truncate this sum to an assigned number of RCs, the explained variance of the related augmented vector S is the sum of the eigenvalues associated with those RCs, while the estimate of the resulting reconstruction error is the sum of the eigenvalues corresponding to the remaining RCs. As a consequence, it is convenient to order the RCs following the value of their eigenvalues.

Let \mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_L be L disjoint subspaces; then a reconstructed wave (RW) Ω_l (l = 1, \ldots, L) is defined as

    \Omega_l = \sum_{k \in \mathcal{K}_l} s^k , \quad 1 \le l \le L    (4.9)

Then, from Eqs. 4.8 and 4.9, one can obtain:

    s = \sum_{l=1}^{L} \Omega_l    (4.10)

which says that the original series s can be recovered as the sum of all the individual RWs.
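Eq. 4.6 amounts to a diagonal averaging of the rank-reduced augmented matrix, and the additivity of Eqs. 4.7-4.10 follows from its linearity in the chosen subspace. A sketch (our names; `pc` and `eof` are assumed to come from a prior SSA step with window M):

```python
import numpy as np

def reconstruct(pc, eof, ks, N):
    """Series reconstructed from the SSA subspace with EOF indices `ks`
    (Eq. 4.6): each position averages the contributions of every window
    that covers it, with shorter averages at the two edges."""
    M = eof.shape[0]
    Xk = pc[:, ks] @ eof[:, ks].T        # rank-|ks| augmented matrix
    out = np.zeros(N)
    cnt = np.zeros(N)
    for i in range(Xk.shape[0]):         # window i covers positions i..i+M-1
        out[i : i + M] += Xk[i]
        cnt[i : i + M] += 1.0
    return out / cnt
```

With the full index set the reconstruction is exact, and the sum of the single-index RCs equals the full reconstruction, as in Eq. 4.8.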

4.3 Hybrid Approach to Complex Signal Prediction

In order to design a predictor for complex signals, such as discontinuous and/or intermittent signals, we can apply the following approach, which combines an unsupervised step and a supervised one, building up in such a way an ensemble of learning machines:


- Unsupervised decomposition: using Singular Spectrum Analysis, decompose the original signal s into reconstructed waves (RWs), corresponding to subspaces with equal explained variance;
- Supervised learning: prepare a predictor for each RW using the methodology described in Ch. 3;
- Operational phase: the prediction of the original signal s is then obtained as the sum of the predictions of the individual RWs, i.e., using Eq. 4.10.

It is worth noting that sometimes the most complex waves (in general those corresponding to the low eigenvalues) cannot be satisfactorily predicted using the available data. Following the criterion of best prediction [16], we can exclude them from the sum in Eq. 4.10 if, when included, they worsen the overall prediction.
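The three steps above can be sketched as a generic ensemble wrapper (a sketch of ours; the `decompose` and `fit_predictor` callables stand in for the SSA step and for the per-wave MLPs, and are assumptions for illustration, not an API from the report):

```python
def hybrid_forecast(s, groups, decompose, fit_predictor):
    """Ensemble scheme of Ch. 4: (i) unsupervised decomposition of s into
    reconstructed waves, (ii) one supervised predictor per wave,
    (iii) forecast of s as the sum of the wave forecasts (Eq. 4.10).
    `decompose(s, groups)` must return one wave per group;
    `fit_predictor(wave)` must return a callable one-step forecaster."""
    waves = decompose(s, groups)
    models = [fit_predictor(w) for w in waves]
    return sum(m(w) for m, w in zip(models, waves))
```

Dropping a poorly predicted wave from the final sum, per the best-prediction criterion, amounts to omitting its term in the last line.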


Chapter 5

Application to Rainfall Forecasting

5.1 Data Set and Methods

One of our applications of the previously described forecasting approach concerns the forecasting of daily rainfall intensity series of 3652 samples each, collected by 135 stations located in the Tiber river basin (see Fig. 5.1) in the period 01/01/1958 - 12/31/1967.

The data processing started by considering the series of the Mean Station (MS), defined as the average of all 135 rainfall intensity series (Fig. 5.2). In Fig. 5.3 a window on the period 07/01/66 - 12/30/66 is presented in order to better show the discontinuity and intermittence of the studied signal. Fig. 5.4 shows the graph of the mutual information of the MS time series; its first minimum gives T = 7. This value has been used as the time lag for the computation of Global False Nearest Neighbors. The graph of FNN is shown in Fig. 5.5: up to d = 6 the curve decreases with growing dimension, and then reaches a plateau of 20%. The embedding dimension is then d_E = 6.

Following the constructive approach described in Ch. 3, we designed a predictor based on a Multi-Layer Perceptron. The MLP was made up of two hidden layers of 5 units and an input layer of 6 inputs spaced by a time lag of 7 days. The results obtained in such a way are poor, due to the discontinuity of the hydrological variable.

In order to reduce the effects of the discontinuities, we used the SSA decomposition ensemble method described in the previous chapter.

We applied Singular-Spectrum Analysis (SSA) to a signal corresponding to the first 3000 samples of the MS series. The window width used for the SSA was M = 182, i.e., 6 months, a period sufficient to take into account the seasonal periodicities of the related physical phenomena.

Fig. 5.6 shows the ordered list of eigenvalues and the explained variance of the signal reconstructed using an increasing number of RCs.

Then, from the original MS series, we obtained 10 waves Ω1, ..., Ω10, reconstructed from 10 disjoint subspaces, each of them representing 10% of the explained variance


Figure 5.1: Distribution of the 135 stations in the Tiber river basin.


Figure 5.2: Mean Station: daily rain in millimeters. Period 01/01/1958 - 12/31/1967.

Figure 5.3: Mean Station: daily rain in millimeters. Period 07/01/66 - 12/30/66.


Figure 5.4: Mean Station: mutual information. The first minimum is at t = 7.

Figure 5.5: Mean Station: Global False Nearest Neighbors.


Figure 5.6: Mean Station: eigenvalue spectrum (top) and explained variance of the augmented vectors related to an increasing number of RCs (bottom).


Table 5.1: Reconstructed waves (RWs) from disjoint SSA subspaces (each of them explaining 10% of the variance) and corresponding reconstructed components (RCs). The SSA is performed using a window of 182 days.

RW    RCs
Ω1    1-4
Ω2    5-11
Ω3    12-19
Ω4    20-28
Ω5    29-39
Ω6    40-52
Ω7    53-70
Ω8    71-93
Ω9    94-126
Ω10   127-182

(Tab. 5.1). Waves Ω1, ..., Ω6 (corresponding to the first 52 RCs) are more regular, while the remaining waves (corresponding to the subspaces with low eigenvalues) are more complex (Fig. 5.7).

Fig. 5.8 shows the mutual information for each RW, while Fig. 5.9 shows the corresponding Global False Nearest Neighbors plots. The evaluations of the first minimum of the mutual information and of d_E for each RW are presented in Tab. 5.2.

Then we designed a neural predictor based on an MLP for each individual wave of the MS, following the constructive approach described in Ch. 3, implementing in such a way an SSA decomposition ensemble of learning machines.

The best results for each RW have been obtained using as inputs windows of 5 consecutive elements and two hidden layers with the dimensions reported in Tab. 5.3. As each wave contains 3652 daily samples, for each wave we obtained a data set of 3646 associative couples, each of them consisting of a window of 5 consecutive elements as input and the next day's rainfall intensity as output.

Each MLP was trained using the first 2000 associative couples (training set), using the error back-propagation algorithm with momentum [32] and a batch presentation of samples. The following 1000 associative couples (validation set) were used in order to implement an early stopping of the training procedure. The remaining 646 were used for measuring the quality of the forecasting of the reconstructed wave (test set).
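The data preparation just described (windows of consecutive samples as input, the next value as target, and a 2000/1000/remainder split for training, validation-based early stopping, and testing) can be sketched as follows (our function names, for illustration only):

```python
import numpy as np

def make_couples(wave, window=5):
    """Associative couples: a window of `window` consecutive samples as
    input, the next value as output."""
    X = np.array([wave[i : i + window] for i in range(len(wave) - window)])
    y = wave[window:]
    return X, y

def split(X, y, n_train=2000, n_val=1000):
    """First n_train couples for training, next n_val for validation
    (early stopping), the remainder for testing."""
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))
```

Using a chronological split, rather than a random one, keeps the test set strictly in the future of the training data, as required for honest forecast evaluation.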


Table 5.2: First minimum of the mutual information (T) and embedding dimension (d_E) computed using T and other time lags, for each reconstructed wave.

RW    T    dE(T)   dE(7)   dE(1)
Ω1    22   4       3       2
Ω2    9    18      14      3
Ω3    4    10      7       4
Ω4    5    18      14      4
Ω5    4    14      9       4
Ω6    3    5       6       4
Ω7    2    4       4       5
Ω8    2    4       4       6
Ω9    2    5       5       4
Ω10   5    10      8       4

5.2 Results and Discussion

The prediction results for each reconstructed wave are presented in Tab. 5.3 and in Fig. 5.10.

The predictions obtained using the SSA decomposition ensemble of learning machines (i.e., the sum of the predictions of the 10 waves) at 1 day ahead are very satisfactory: for the resulting MS prediction, the Root Mean Square (RMS) error on the test set is .95 mm of rain, while the Maximum Absolute (MAXA) error is 6.47 mm, i.e., the predicted signal substantially coincides with the measured MS rainfall intensity signal.

As shown in Figs. 5.11, 5.12, and 5.13, the predictions of the MS rainfall intensity signal substantially coincide with the measured MS.¹

It is worth noting that the design of the ensemble learning machine is critical. Choosing a window M = 182 for the SSA, the best prediction results were obtained using MLPs with four or five inputs and two hidden layers. Using MLP predictors with four inputs we obtained slightly worse results: in this case the RMS for the MS is 1.05 mm and the MAXA is 8.05 mm. We notice that the Maximum Absolute error occurs on the same day (11/05/1967) as for the architecture using MLPs with five inputs. A different window for the SSA can give results of inferior quality. E.g., using M = 256 as the window for the SSA we obtained good prediction performances only for waves Ω1-Ω6, corresponding to 60% of the explained variance (first 76 RCs). The resulting generalization of the SSA decomposition ensem-

¹ Note that in the comparison shown in Fig. 5.11 the predicted signal is clamped to zero.



Table 5.3: Size of the hidden layers (L1 and L2), Root Mean Square (RMS) error and Maximum Absolute (MAXA) error for each reconstructed wave - Size of MLPs Input Layer = 5.

RW    L1  L2  RMS   MAXA
Ω1     6   4  .02    .05
Ω2     8   5  .03    .12
Ω3     6   4  .04    .15
Ω4     8   4  .04    .11
Ω5     8   5  .06    .14
Ω6     8   4  .15    .40
Ω7     4   4  .15    .38
Ω8     6   4  .64   1.92
Ω9     3   4  .75   2.40
Ω10    3   4  .29    .90

ble was poor, even when leaving out in Eq. 4.10 the predictions of Ω7-Ω10, which, if included in the sum, worsen the overall prediction.

We underline that the dimension of the optimal input layer (i.e., 5) is smaller than the dE evaluated with the FNN method (Tab. 5.2). This choice is supported by the generalization trade-off between the complexity of the learning machine and the limited size of the training set (see discussion in Ch. 3.3.1). Concerning the time lag between inputs, we investigated different values, as the first minimum of the mutual information is only a prescription and not a theoretical result (see [26]).
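A histogram-based estimate of the mutual information as a function of the time lag, and the search for its first minimum, can be sketched as follows (the bin count is our assumption, not a prescription from the report):

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram estimate of the mutual information (in bits)
    between x(t) and x(t + lag)."""
    pxy, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of x(t + lag)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def first_minimum(x, max_lag=40):
    """Smallest lag at which the mutual information has a local minimum."""
    mi = [mutual_information(x, lag) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1  # lags are numbered from 1
    return max_lag
```

For a smooth periodic signal the estimate dips near a quarter of the period, which is the behavior the first-minimum prescription exploits.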

The plateau in the FNN plots of Fig. 5.4 is a symptom of the presence of high-dimensional noise [1]. After the SSA decomposition we can notice that the noise is concentrated mainly in RW10, and also in RW3, RW5, and RW9, as shown by the plateaus in their FNN plots (Fig. 5.9).



[Figure omitted: ten panels (RW1-RW10) of daily rain (mm) vs. time (days).]

Figure 5.7: Reconstructed Waves. Period 07/01/1966 - 12/30/1966.



[Figure omitted: one panel per reconstructed wave, mutual information vs. time lag.]

Figure 5.8: Reconstructed waves - Mutual Information.



[Figure omitted: one panel per reconstructed wave, percentage of global false neighbors vs. embedding dimension.]

Figure 5.9: Reconstructed Waves - Global False Neighbors using T=7.



[Figure omitted: one panel per reconstructed wave, scatter of predicted MS vs. MS.]

Figure 5.10: Reconstructed Waves - Scatter plots on the test set (using MLPs with 5 inputs).


[Figure omitted: daily rain (mm) vs. time (days).]

Figure 5.11: Mean Station: 1 day ahead forecasting in the period 07/01/66 - 12/30/66 using the ensemble of 10 MLPs with 5 inputs.

[Figure omitted: error (mm) vs. time (days).]

Figure 5.12: Mean Station: 1 day ahead forecasting. Errors in the period 07/01/66 - 12/30/66 using the ensemble of 10 MLPs with 5 inputs.



[Figure omitted: scatter of predicted station vs. MS.]

Figure 5.13: Mean Station: scatter plot of the 1 day ahead forecasting on the test set using the ensemble of 10 MLPs with 5 inputs.



Chapter 6

Conclusions

In this paper we presented an extension of a methodology for signal forecasting [1, 26, 22] to the case of discontinuous and intermittent signals.

As in [22], we used predictors based on Multi-Layer Perceptrons or Neuro-Fuzzy Systems characterized by the Universal Function Approximation Property. The input layers of those predictors are shaped using results and suggestions from the theory of dynamical systems linked to the Takens-Mañé theorem [27, 17] about the sufficient dimension of the reconstruction vector for the dynamics of an attractor.

In order to avoid the effect of the discontinuities, in this paper we have proposed an Ensemble Method based on Singular Spectrum Analysis Decomposition [15, 23, 31, 16], built on the following design steps:

- Unsupervised decomposition: using the Singular Spectrum Analysis, decompose the original signal S into reconstructed waves (RWs), corresponding to subspaces with equal explained variance;

- Supervised learning: prepare a predictor for each RW using the methodology described in Ch. 3;

- Operational phase: the prediction of the original signal S is then obtained as the sum of the predictions of the individual RWs, i.e., using Eq. 4.10.
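The unsupervised decomposition step can be sketched as follows (a minimal SSA implementation via the SVD of the trajectory matrix; the grouping of eigentriples into RWs is left to the user and is our illustrative choice):

```python
import numpy as np

def ssa_decompose(x, M, groups):
    """Minimal SSA sketch: embed the series with window M, take the
    SVD of the trajectory matrix, and rebuild one reconstructed wave
    per group of eigentriples by anti-diagonal averaging."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - M + 1
    # M x K trajectory matrix: column j holds x[j:j+M]
    X = np.column_stack([x[j:j + M] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    waves = []
    for g in groups:
        Xg = sum(s[j] * np.outer(U[:, j], Vt[j]) for j in g)
        # anti-diagonal averaging maps the matrix back to a length-N series
        w = np.array([Xg[::-1].diagonal(k).mean()
                      for k in range(-(M - 1), K)])
        waves.append(w)
    return waves
```

Since the groups partition the eigentriples and the averaging is linear, the reconstructed waves sum back exactly to the original series, which is the property exploited in Eq. 4.10.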

The presented methodology has been successfully applied to the forecasting of rainfall intensity series collected by 135 stations distributed in the Tiber river basin over a period of 10 years.

Moreover, preliminary results for the forecasting of the rainfall series of an individual station are also in good agreement with the data [20].



Acknowledgments

This work was supported by IRSA-CNR, Progetto Finalizzato Madess II CNR, INFM, and Università di Genova. We thank Fabio Montarsolo and Daniela Baratta for their programming support.



Bibliography

[1] H.D.I. Abarbanel. Analysis of Observed Chaotic Data. Springer, New York, USA, 1996.

[2] J.P. Burg. Maximum entropy spectral analysis. In D.G. Childers, editor, Modern Spectrum Analysis, page 34, IEEE Press, New York, 1978.

[3] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2:303–314, 1989.

[4] M.E. Davies. Reconstruction of attractors from filtered time series. Physica D, 101:195–206, 1997.

[5] A. Fraser. Information theory and strange attractors. PhD thesis, University of Texas, Austin, 1989.

[6] A. Fraser and L. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review, 33:1134–1140, 1986.

[7] M. Ghil. The SSA-MTM toolkit: Applications to analysis and prediction of time series. In B. Bosacchi, J.C. Bezdek, and D.B. Fogel, editors, Application of Soft Computing, volume 3165 of Proceedings of SPIE, pages 216–230, Bellingham, WA, 1997.

[8] M. Ghil and R. Vautard. Rapid disintegration of the Wordie Ice Shelf in response to atmospheric warming. Nature, 350:324, 1991.

[9] S. Haykin. Neural Networks: A Comprehensive Foundation (Second Edition). Prentice Hall, Upper Saddle River, NJ, 1999.

[10] S. Haykin and J. Principe. Making sense of a complex world: Using neural networks to dynamically model chaotic events such as sea clutter. IEEE Signal Processing Magazine, 15, 1998.

[11] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554–2558, 1982.

[12] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.



[13] C.L. Keppenne and M. Ghil. Adaptive filtering and prediction of noisy multivariate signals: an application to subannual variability in atmospheric angular momentum. International Journal of Bifurcation and Chaos, 3:625–634, 1993.

[14] T. Kohonen. Self-Organization and Associative Memory. Springer, Berlin, third edition, 1989.

[15] R. Kumaresan and D.W. Tufts. Data-adaptive principal component signal processing. In Proc. IEEE Conf. on Decision and Control, page 949, Albuquerque, USA, 1980. IEEE.

[16] F. Lisi, O. Nicolis, and M. Sandri. Combining Singular-Spectrum Analysis and neural networks for time series forecasting. Neural Processing Letters, 2:6–10, 1995.

[17] R. Mañé. On the dimension of the compact invariant sets of certain non-linear maps. In D.A. Rand and L.-S. Young, editors, Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 230–242. Springer-Verlag, Berlin, 1981.

[18] F. Masulli. Bayesian classification by feedforward connectionist systems. In F. Masulli, P.G. Morasso, and A. Schenone, editors, Neural Networks in Biomedicine - Proceedings of the Advanced School of the Italian Biomedical Physics Association - Como (Italy) 1993, pages 145–162, Singapore, 1994. World Scientific.

[19] F. Masulli, R. Parenti, and L. Studer. Neural modeling of non-linear processes: Relevance of the Takens-Mañé theorem. International Journal on Chaos Theory and Applications, 4:59–74, 1999.

[20] F. Montarsolo. A toolkit for discontinuous series forecasting. Laurea thesis in computer science (in Italian), DISI - Department of Computer and Information Sciences, University of Genova, Genova (Italy), 1998.

[21] S.A. Nene and S.K. Nayar. A simple algorithm for nearest neighbor search in high dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 1997.

[22] R. Parenti, F. Masulli, and L. Studer. Control of non-linear processes by neural networks: Benefits using the Takens-Mañé theorem. In Proceedings of the ICSC Symposium on Intelligent Industrial Automation, IIA'97, pages 44–50, Millet, Canada, 1997. ICSC.

[23] E.R. Pike, J.G. McWhirter, M. Bertero, and C. de Mol. Generalized information theory for inverse problems in signal processing. IEE Proceedings, 59:660–667, 1984.

[24] D.W. Ruck, S.K. Rogers, M. Kabrisky, M.E. Oxley, and B.W. Suter. The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, 1:296–298, 1990.



[25] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 8, pages 318–362. MIT Press, Cambridge, 1986.

[26] L. Studer and F. Masulli. Building a neuro-fuzzy system to efficiently forecast chaotic time series. Nuclear Instruments and Methods in Physics Research, Section A, 389:264–667, 1997.

[27] F. Takens. Detecting strange attractors in turbulence. In D.A. Rand and L.-S. Young, editors, Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381. Springer-Verlag, Berlin, 1981.

[28] C.W. Therrien. Decision, Estimation, and Classification: An Introduction to Pattern Recognition and Related Topics. Wiley, New York, 1989.

[29] J. Vastano and L. Rahman. Information transport in spatio-temporal chaos. Physical Review Letters, 72:241–275, 1989.

[30] R. Vautard and M. Ghil. Singular-spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Physica D, 35:395–424, 1989.

[31] R. Vautard, P. Yiou, and M. Ghil. Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. Physica D, 58:95–126, 1992.

[32] T.P. Vogl, J.K. Mangis, A.K. Rigler, W.T. Zink, and D.L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257–263, 1988.

[33] L. Wang and J.M. Mendel. Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Trans. on Neural Networks, 5:807–814, 1992.
