
Bull Earthquake Eng
DOI 10.1007/s10518-015-9728-z

ORIGINAL RESEARCH PAPER

Strategy for the selection of input ground motion for inelastic structural response analysis based on naïve Bayesian classifier

M. Lancieri · M. Renault · C. Berge-Thierry · P. Guéguen · D. Baumont · M. Perrault

Received: 3 June 2014 / Accepted: 11 January 2015
© Springer Science+Business Media Dordrecht 2015

Abstract An application of the naïve Bayesian classifier for selecting strong motion data in terms of the deformation probably induced on a given structural system is presented. The main differences between the proposed method and the “standard” procedure based on the inference of a polynomial relationship between a single intensity measure and the engineering demand parameter are: the discrete description of the engineering demand parameter; the use of an array of intensity measures; the combination of the information issued from the training phase via a Bayesian formulation. Six non-linear structural systems with initial fundamental frequency of 1, 2 and 5 Hz and with different strength reduction factors are modelled. Their behaviour is described using the Takeda hysteretic model and the engineering demand parameter is expressed as the relative drift. A database of 6,373 strong motion records is built from worldwide catalogues and is described by a set of “classical” intensity measures; it constitutes the “training dataset” used to feed the Bayesian classifier. The structural system response is reduced to a description of three possible classes: elastic, if the induced drift is lower than the yield displacement; plastic, if the drift ranges between the yield and the ultimate drift values; fragile, if the drift reaches the ultimate drift. The goal is to evaluate the conditional probability of observing a given status of the system as a function of the intensity measure array. To validate the presented methodology and evaluate its prediction capability, a blind test on a second dataset, completely disjoint from the training one, composed of 7,000 waveforms recorded in Japan, is performed. The Japanese data are classified using the probability distribution functions derived on the first data set. It is shown that, by combining several intensity measures through the likelihood product, a stable result is obtained whereby most of the data (>75 %) are well classified. The degree of correlation between the intensity measure and the engineering demand parameter controls the reliability of the probability curves associated with each intensity measure.

M. Lancieri (B) · M. Renault · D. Baumont
PRP-DGE/SCAN/BERSSIN, Institut de Radioprotection et Sûreté Nucléaire, Bp 17, 92262 Fontenay aux Roses CEDEX, France
e-mail: [email protected]

C. Berge-Thierry
DEN/DANS/DM2S, Commissariat à l’énergie atomique, Centre de Saclay, 91191 Gif sur Yvette, France

P. Guéguen · M. Perrault
ISTerre-IFSTTAR, Université Joseph Fourier, 1/CNRS/IFSTTAR, BP 53, 38041 Grenoble cedex 9, France

Keywords Naïve Bayesian · Strong ground motion · Data analysis · Intensity measures · Inelastic structural response · Takeda model

1 Introduction

Thanks to recent advances in computational resources and the availability of improved numerical methods and robust numerical platforms, the analysis of inelastic structural response is now increasingly performed for structural design. Moreover, there is today a general consensus in the engineering community that only this kind of analysis can give insights into the hierarchy of failures, or quantify the energy absorption and force redistribution phenomena resulting from gradual plastic hinge formation in a structure (Katsanos et al. 2009). The required inputs are time series representative of a given earthquake scenario issued by deterministic seismic hazard assessment or by the disaggregation of probabilistic seismic hazard assessment (PSHA). Unfortunately there is no consensus in the earthquake engineering community on how to appropriately select and scale the input ground motion. An exhaustive review of the ground-motion selection methods, each with its own advantages and limitations, is presented in Katsanos et al. (2009). Indeed, the analyst must have a clear understanding of the goals of the analysis (performance-based design, probability of failure, vulnerability assessment) before choosing a procedure for data selection and scaling (NEHRP 2011). In the case of site-specific performance-based design, the analyst evaluates the rate $\lambda_{EDP}(s)$ of exceedance of a particular value of the engineering demand parameter (EDP) over the state s for a particular return period, as a function of a magnitude–distance pair. It can be expressed as (Luco and Cornell 2007):

$$\lambda_{EDP}(s) = G_{EDP|M,R}(s, M_i, R_i)\,\lambda(M_i, R_i) \quad \text{for the } i\text{th pair} \qquad (1)$$

where $G_{EDP|M,R}(s, M_i, R_i)$ is the probability of exceeding the engineering demand parameter (EDP) over the state s, conditional on the $(M_i, R_i)$ pairs including all the possible scenarios contributing to the seismic hazard definition. It is determined through nonlinear tests on a given structure excited with a set of ground motions. $\lambda(M_i, R_i)$ is the annual frequency of occurrence of each of these seismic events, estimated via PSHA.

The number of ground motions needed for dynamic excitation of a particular structure is given by $n = \left[\sigma/\eta\right]^2$, where σ is the dispersion of the response and η is the level of accuracy (Luco and Bazzurro 2007). To optimize the number of tests needed, a possible strategy is to use spectral matched ground motions (SMGM) as input. SMGM strongly reduce the variability of the observed response spectra, reducing the number of time series that need to be run in engineering analyses and leading to higher confidence in the prediction of average responses (Luco and Bazzurro 2007; Grant and Diaferia 2012). Various methods have been developed to obtain spectral-matched time series, some generating purely artificial waveforms (Gasparini and Vanmarcke 1976; Yamamoto and Baker 2011), others adjusting recorded signals by adding harmonic components in the frequency domain (Preumont 1984; Silva and Lee 1987) or by wavelet adjustment (Abrahamson 1992; Al Atik and Abrahamson 2010).
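As an illustration of the relation $n = [\sigma/\eta]^2$, the following minimal Python sketch computes the number of records for a given dispersion and accuracy level. The function name and the rounding up to the next integer are assumptions made here for illustration; they are not taken from Luco and Bazzurro (2007).

```python
import math


def required_records(sigma: float, eta: float) -> int:
    """Number of records n = (sigma / eta)**2, rounded up to an integer.

    sigma: dispersion of the structural response.
    eta:   target level of accuracy.
    Rounding up is an assumption of this sketch, not stated in the text.
    """
    if sigma < 0 or eta <= 0:
        raise ValueError("sigma must be non-negative and eta positive")
    return math.ceil((sigma / eta) ** 2)


# Halving the dispersion divides the number of required runs by four.
print(required_records(0.5, 0.125))   # dispersion 0.5, accuracy 0.125
print(required_records(0.25, 0.125))  # dispersion 0.25, same accuracy
```

The quadratic dependence on σ is what makes a reduction of the response variability, such as the one obtained with spectral matching, so effective in reducing the number of runs.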


However, in certain configurations, i.e. near the fault planes, or to assess the probability of failure of a given structure, the matched signals are not suitable since they do not take into account the natural variability of the seismic signal. In these cases the possible strategies are to use non-scaled real records, synthetic seismic ground motion based on a physical model of the rupture (Song et al. 2014), or semi-empirical non-stationary stochastic methods (Pousse et al. 2006; Laurendeau et al. 2012) that take into account some basic requirements derived from the physics of the source (Brune 1970 spectral model, realistic envelope function, variability). One problem that could occur in these cases is that signals selected by seismologists may not be suitable for engineering purposes: the dataset does not contain input motions strong enough to induce plastic hinges, or, on the contrary, the signals are too strong and the structure is severely damaged.

In this paper, this problem is addressed by introducing a strategy for signal selection compatible with the aim of the inelastic structural response analysis: to select real ground motion records on the basis of a set of ground motion intensity measures (IM), associating to each waveform the probability that it will induce a linear, plastic or fragile behaviour on a given structural system.

IMs are parameters describing some features of the seismic ground shaking, such as the peak ground acceleration (PGA) or spectral acceleration (PSA), that can be related to their effects on the structure. They represent the link between the seismic hazard assessment provided by seismologists and structural analysis conducted by engineers. The critical point is how to select an appropriate IM (or set of IMs). Following Giovenale et al. (2004) and Luco and Cornell (2007), an appropriate IM must be “efficient”, “sufficient” and “predictable” through a seismic hazard analysis. An IM is efficient if it exhibits a small variability with respect to the selected EDP and it is sufficient if it is conditionally independent of the event magnitude M and distance R. It is important to notice that those conditions depend not only on the nature of the ground motion but also on the characteristics of the analysed structure. If the IM is sufficient, the formulation in Eq. (1) can be expressed as follows:

$$\lambda_{EDP}(s) \cong \sum_{\text{all } s} P(EDP > s_i \mid IM = y_i)\,\Delta\lambda_{IM}(y_i) \qquad (2)$$

where $P(EDP > s_i \mid IM = y_i)$ expresses the probability of exceeding a given EDP level s conditional on an IM value $y_i$, and $\Delta\lambda_{IM}(y_i)$ is the annual frequency of $IM = y_i$.
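The discrete sum in Eq. (2) can be sketched in a few lines of Python. The function name and the representation of the inputs as two equal-length lists (one entry per IM value $y_i$) are assumptions made for illustration, and the numbers in the usage lines are arbitrary.

```python
def edp_exceedance_rate(prob_exceed_given_im, delta_lambda_im):
    """Annual rate of exceeding an EDP level following Eq. (2).

    prob_exceed_given_im: P(EDP > s | IM = y_i) for each IM value y_i.
    delta_lambda_im:      annual frequency of IM = y_i for the same y_i.
    """
    if len(prob_exceed_given_im) != len(delta_lambda_im):
        raise ValueError("both inputs need one entry per IM value")
    return sum(p * dl for p, dl in zip(prob_exceed_given_im, delta_lambda_im))


# Three hypothetical IM levels: weak, moderate and strong shaking.
probabilities = [0.0, 0.5, 1.0]      # arbitrary illustrative values
frequencies = [0.5, 0.25, 0.125]     # arbitrary illustrative values
print(edp_exceedance_rate(probabilities, frequencies))
```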

It is interesting to observe that the best IM is the structural demand measure itself. Traditionally, the most used ground motion IM is the spectral acceleration PSA at a frequency close to the fundamental frequency of the studied structure with a damping of 5 %, or the PGA. Several authors discussed the correlations between the responses of several typologies of structures and a given IM (or set of IMs). Among them, Bommer and Martinez-Pereira (2000) focused their study on the importance of strong ground motion parameters such as PGA, peak ground velocity (PGV) or duration. Lestuzzi et al. (2004) investigated the correlation between the responses of single degree of freedom, non-linear systems (hereinafter SDOF-s) and several IMs (Arias intensity, Housner intensity and a custom spectral intensity based on the fundamental frequency of the single degree of freedom models). Hancock and Bommer (2006) reviewed the influence of the ground motion duration on damage, concluding that it strongly depends on the metric used to quantify the damage, and that the contribution of the duration cannot be easily decoupled from the signal amplitude, since both affect the damage level. It is important to recall that the choice of the best IM is strongly controlled by the investigated structures. For example, Luco et al. (2005) studied the sufficiency and the efficiency of a set of ground motion parameters with respect to the damage of elastic, ductile and fragile steel moment-resisting frames. In that case the sufficient IM vector includes the first mode spectral acceleration, the higher mode spectral acceleration and the first mode inelastic spectral acceleration. Some authors investigated the predictive IMs for medium-period frame structures (Yakut and Yilmaz 2008; Jayaram et al. 2010). Lucchini et al. (2011) looked at the best IM to predict the behaviour of torsional buildings; Asgarian et al. (2012) looked at the best IM for tall buildings; finally, Mollaioli et al. (2013) investigated the efficiency of several IMs for base isolated structures.

In this paper, after a quick overview of the correlation between several IMs and the response of given SDOF-s, a step forward is proposed through a Bayesian approach for the selection of strong ground motion. The scope is to evaluate the conditional probability that the response of a structural system reaches a given level of a given EDP when excited with a seismic input described by an array of classical ground motion parameters (IM). In order to train the Bayesian classifier, in the first part of the work a training dataset composed of 6,373 waveforms is exploited jointly with six non-linear SDOF-s. The correlation between each IM and the chosen EDP (the relative drift) will be used to define the probability density function of a given IM as a function of the drift level; these functions will be combined via a likelihood product. The strength of the proposed algorithm is the combination of different IMs describing the amplitude, frequency content and energy release. This not only leads to a better description of the complexity of the input ground motion, but also makes it possible to take into account the evolution of the degree of correlation between the IMs and the EDP when the system passes from the linear domain to the plastic or the fragile ones. A further advantage of such a strategy is that it takes into account the observed variability of the signal, but at the same time it reduces the number of selected ground motions by conditioning their selection on a given structural response.

In the second part of the work, the proposed scheme will be applied to a target dataset completely disjoint from the first one. Seven thousand waveforms recorded in Japan have been classified with respect to the behaviour of the defined SDOF-s using the probability density functions issued from the training dataset.

2 The Takeda model

The SDOF-s are modelled using the Takeda et al. (1970) hysteretic model. The code implemented by Lestuzzi (distributed through his web site: imac.epfl.ch/pagw-55838-en.html) is used. This MATLAB code evaluates the response of a non-linear system by applying the central difference method (Chopra 2011) and the Takeda et al. (1970) deformation model in the Allahabadi and Powell (1988) formulation.
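To illustrate the time-stepping scheme only, the following Python sketch applies the central difference method to a linear, viscously damped SDOF system with unit mass. It is not the MATLAB code mentioned above and it contains no Takeda hysteresis: the stiffness is constant, and the function names, the unit-mass formulation and the at-rest initial conditions are assumptions made for this example.

```python
import math


def linear_sdof_drift(accel, dt, f0, zeta=0.05):
    """Relative displacement history of a linear, unit-mass SDOF system.

    accel: ground acceleration samples (m/s^2), dt: time step (s),
    f0: natural frequency (Hz), zeta: damping ratio.
    Solves u'' + 2*zeta*w*u' + w^2*u = -a_g with central differences.
    """
    omega = 2.0 * math.pi * f0
    if dt >= 1.0 / (math.pi * f0):
        raise ValueError("central difference is unstable for dt >= T/pi")
    inv_dt2 = 1.0 / dt ** 2
    k_hat = inv_dt2 + zeta * omega / dt      # multiplies u[i+1]
    a_coef = inv_dt2 - zeta * omega / dt     # multiplies u[i-1]
    b_coef = omega ** 2 - 2.0 * inv_dt2      # multiplies u[i]
    # System initially at rest; fictitious displacement at t = -dt.
    u_curr = 0.0
    u_prev = 0.5 * dt ** 2 * (-accel[0])
    drift = [u_curr]
    for a_g in accel[:-1]:
        u_next = (-a_g - a_coef * u_prev - b_coef * u_curr) / k_hat
        u_prev, u_curr = u_curr, u_next
        drift.append(u_curr)
    return drift


def max_drift(accel, dt, f0, zeta=0.05):
    """Peak absolute relative displacement of the linear system."""
    return max(abs(u) for u in linear_sdof_drift(accel, dt, f0, zeta))
```

In a non-linear implementation the constant stiffness term would be replaced at each step by the restoring force given by the hysteretic law, which is where the Takeda rules enter.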

The Takeda model is generally used to simulate the behaviour of reinforced concrete structural elements since it includes some realistic conditions to describe the reloading curves and it takes into account the stiffness degradation caused by the excursions in the plastic domain. Moreover, compared with 1D, 2D or 3D structural modelling, such a model reduces the computation time to only a few seconds per accelerogram. The model is described by five parameters: the initial stiffness (K0), the yield drift (xe), the post-yield stiffness (r · K0) controlling the frequency degradation, the α parameter describing the unloading stiffness degradation and the β parameter which controls the target point for the loading curves after plastic deformation.

The Takeda strength-deformation law is described in Fig. 1 (figure modified from Lestuzzi and Badoux (2008)). In the case of pure elastic response, the diagram is a straight line whose slope is proportional to the element stiffness (grey area in the figure). When the loading stress is strong enough to induce a drift greater than the elasticity limit (xe), the element is deformed plastically (red area) and the stiffness degradation occurs. As a consequence, the loading mechanism changes. The β parameter controls the reloading curve: after each excursion in the plastic domain the maximum allowed deformation is β(xu − xe), with β varying between 0 and 1. The α parameter tunes the degradation of the reloading stiffness (K/K0), following the power-law function:

$$\frac{K}{K_0} = \left(\frac{x_u}{x_e}\right)^{-\alpha}$$

Fig. 1 Takeda model hysteretic diagram describing the evolution of the system strength (y axis, F) as a function of its deformation history (x axis, x). [Figure labels: elastic, plastic and fragile domains; Fy; K0; rK0; xe, xu, xp; β(xu − xe); reloading]

Once the element reaches the maximum deformation (xu), it starts to unload elastically. If the strength increases again, the element is reloaded plastically and the cycle starts again. If the maximum deformation coincides with the plasticity limit (xp), the element is in the fragile domain and the model cannot predict its behaviour realistically.

In this application β and α have been set to 0.1. The post-yield stiffness ratio is set equal to 5 % and a numerical damping of 5 % is assumed. More details on the parameters used are given in the next section.
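The stiffness degradation law $K/K_0 = (x_u/x_e)^{-\alpha}$ can be evaluated directly. In the Python sketch below, the function name and the choice of returning 1 when the maximum deformation has not exceeded the yield drift are assumptions made for illustration; the default α = 0.1 is the value adopted in this application.

```python
def degraded_stiffness_ratio(x_max, x_yield, alpha=0.1):
    """Ratio K/K0 = (x_max / x_yield) ** (-alpha).

    x_max:   maximum deformation reached so far (x_u in the text).
    x_yield: yield drift (x_e in the text).
    Below the yield drift no degradation is applied (assumption of this sketch).
    """
    if x_yield <= 0:
        raise ValueError("the yield drift must be positive")
    if x_max <= x_yield:
        return 1.0
    return (x_max / x_yield) ** (-alpha)


# With alpha = 0.1 the degradation is mild: a deformation of three times
# the yield drift reduces the stiffness by roughly 10 %.
print(degraded_stiffness_ratio(3.0, 1.0))
```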

3 The SDOF-s parameters

The SDOF-s used in this study are described by the capacity curves in Fig. 2; those curves have been computed by measuring the SDOF-s drift values when excited with 6,373 waveforms. The capacity curve is defined by two points: the yield capacity (corresponding to the yield drift xe) and the ultimate capacity (xp). Beyond the yield capacity point the stiffness of the structure decreases as a consequence of the degradation of the concrete (cracking) and the system is plastically deformed. The ultimate capacity is reached when the structure is in a fully plastic state. For earthquake-designed buildings, the structure is assumed to deform beyond the ultimate capacity without loss of stability. In this application the system drift is limited to the ultimate point drift (xp) (Fig. 2). The capacity curve slope (K0) is the SDOF-s initial stiffness, related to the initial frequency of the system by $F_0 = \frac{1}{2\pi}\sqrt{K_0/m}$, where m is the mass of the structure.

Fig. 2 Capacity curves for the 6 single degree of freedom systems. On the left (a): capacity curves for the brittle systems not allowing plastic deformation; in this case the yield and the ultimate drifts coincide. On the right (b): capacity curves for the ductile systems. The drift is bounded to the ultimate values, represented as vertical distributions of points. The higher frequency systems (5 Hz) have the highest slope. For the sake of clarity the yield drift xe and the ultimate drift are traced only for the ductile element with initial fundamental frequency equal to 1 Hz. For this element, the three classes (linear, plastic and fragile) used in the paper are also reported. [Figure labels: panels A and B; curves for 1, 2 and 5 Hz; K0; xe, xp; yield point, ultimate point; linear, plastic and fragile domains]

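Table 1 Parameters of the 6 SDOF-s

| System type | Fundamental frequency (Hz) | Yield drift (cm) | Experiment name |
|---|---|---|---|
| μ = 1, brittle systems | 1 | 3.15 | μ1F1 |
| μ = 1, brittle systems | 2 | 1.40 | μ1F2 |
| μ = 1, brittle systems | 5 | 0.56 | μ1F5 |
| μ = 3, ductile systems | 1 | 4.19 | μ3F1 |
| μ = 3, ductile systems | 2 | 1.85 | μ3F2 |
| μ = 3, ductile systems | 5 | 0.74 | μ3F5 |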

Two sets of SDOF-s have been modelled with different kinematic ductility $\mu_D = x_p/x_e$, initial fundamental frequency of 1, 2 and 5 Hz, and with seismic behaviours representative of low- (μD = 1) and high-level (μD = 3) earthquake design codes. Capacity curves for the brittle systems (μD = 1) are shown in Fig. 2a. In this case the yield and the ultimate points coincide. Capacity curves for the ductile systems (μD = 3) kink at the yield point: the initial stiffness decreases and the system starts to plastically deform up to the ultimate point (Fig. 2b).

In Table 1 the fundamental frequency (F0) and the yield displacement for each SDOF-s are reported. The ultimate drift is given by the yield displacement multiplied by μD. The yield displacements were selected according to the drift thresholds provided by the Hazus methodology (FEMA 2003), by considering the slight damage grade, for three typologies of buildings classified according to their number of floors (related to the frequency) and the level of earthquake design. For brevity, hereinafter each system will be labelled using the experiment name reported in the last column: e.g. the brittle system with fundamental frequency of 1 Hz will be the μ1F1 system.
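The parameters of Table 1 and the two relations given in the text (ultimate drift equal to the yield drift multiplied by μD, and $F_0 = \frac{1}{2\pi}\sqrt{K_0/m}$) can be collected in a short Python sketch. The dictionary layout, the ASCII keys such as `mu3F1` and the function names are illustrative choices, not part of the original study.

```python
import math

# Ductility, initial frequency F0 (Hz) and yield drift (cm) from Table 1.
SDOF_SYSTEMS = {
    "mu1F1": {"ductility": 1, "f0_hz": 1.0, "yield_drift_cm": 3.15},
    "mu1F2": {"ductility": 1, "f0_hz": 2.0, "yield_drift_cm": 1.40},
    "mu1F5": {"ductility": 1, "f0_hz": 5.0, "yield_drift_cm": 0.56},
    "mu3F1": {"ductility": 3, "f0_hz": 1.0, "yield_drift_cm": 4.19},
    "mu3F2": {"ductility": 3, "f0_hz": 2.0, "yield_drift_cm": 1.85},
    "mu3F5": {"ductility": 3, "f0_hz": 5.0, "yield_drift_cm": 0.74},
}


def ultimate_drift_cm(system):
    """Ultimate drift = yield drift multiplied by the ductility."""
    return system["ductility"] * system["yield_drift_cm"]


def stiffness_per_unit_mass(f0_hz):
    """K0/m obtained by inverting F0 = sqrt(K0/m) / (2*pi)."""
    return (2.0 * math.pi * f0_hz) ** 2


for name, system in SDOF_SYSTEMS.items():
    print(name, round(ultimate_drift_cm(system), 2),
          round(stiffness_per_unit_mass(system["f0_hz"]), 1))
```

For the brittle systems the ultimate drift coincides with the yield drift, which is why no plastic class exists for them.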


The Takeda parameters have been set with no intent of reproducing the behaviour of a target structure, but rather as representative of typical reinforced concrete structural elements. The selected EDP describing the response of the system is the maximum relative displacement, corresponding to the maximal drift.

4 Data set and IM

The training dataset is composed of 6,373 strong-motion waveforms including (Fig. 3):

– 3,928 California earthquake strong motion data (CESMD) waveforms (strongmotioncenter.org),
– 716 Chi–Chi mainshock records (strongmotioncenter.org),
– 663 Darfield sequence accelerograms, collected from September 2010 to July 2011 (geonet.org.nz),
– 809 strong-motion records issued by the European strong motion data base (ESDB) (Ambraseys et al. 2004),
– 208 waveforms from the Italian “Rete Accelerometrica Nazionale” including strong motions from L’Aquila and Emilia sequences (itaca.mi.ingv.it),
– 42 waveforms from the French “Réseau Accélérométrique Permanent” (RAP) mainly including Antillean events (www-rap.obs.ujf-grenoble.fr),
– 7 records of the Lorca earthquake (from the “Instituto Geográfico Nacional”, IGN) (Antón-Antón personal communication).

Waveforms are selected with PGA greater than or equal to 0.025 g. This condition is equivalent to a non-zero standardized cumulative absolute velocity (SCAV; see Table 2 for the parameter description) (Kostov 2005), leading to the magnitude–distance distribution reported in Fig. 3 (with M ranging from 4.0 to 7.4 and hypocentral distances ranging from 4 to 143 km).

Fig. 3 a 2D histogram showing the magnitude–distance distribution of the training dataset. The colour scale indicates the number of waveforms in each bin. The criterion applied to select data is based on the PGA value, therefore it is still interesting to look at this kind of distribution. The lack of data for lower magnitudes and larger distances is driven by the PGA threshold (white area in the top-right corner). A well-known problem is the lack of near-source records for the strongest events (white area in the bottom-left corner). b Distribution of PGA

Table 2 Synoptic table of the investigated intensity measures

| Abbreviation | Parameter description | Equation |
|---|---|---|
| PGA | Peak ground acceleration | $\max \lvert a(t) \rvert$ |
| PGV | Peak ground velocity | $\max \lvert \int a(t)\,dt \rvert$ |
| PGD | Peak ground displacement | $\max \lvert \iint a(t)\,dt\,dt \rvert$ |
| Arias | Arias intensity | $\frac{\pi}{2g}\int a^2(t)\,dt$ |
| Husid | Time interval where the Arias intensity varies from 5 to 95 % of its final value | – |
| Housner | Housner intensity | $\int_{0.1\,\mathrm{s}}^{2.5\,\mathrm{s}} PSV(\tau)\,d\tau$ |
| SpInt | Spectral intensity | $\int_{0.1\,\mathrm{s}}^{0.5\,\mathrm{s}} PSA(\tau)\,d\tau$ |
| SCAV | Standardized cumulative absolute velocity | $\int_{t_i}^{t_{i+1}} \lvert a(t) \rvert\,dt \;\; \forall t_i$ where $a(t_i) \geq 0.025$ g |
| PSA 1 Hz | Pseudo-acceleration spectra estimated at 1 Hz | – |
| PSA 2 Hz | Pseudo-acceleration spectra estimated at 2 Hz | – |
| PSA 5 Hz | Pseudo-acceleration spectra estimated at 5 Hz | – |
| Id | Damage factor | $\frac{2g}{\pi}\,\frac{Arias}{PGA \cdot PGV}$ |

A baseline correction to each waveform is also applied before calculating the array of classical intensity measures (IM), listed in Table 2.
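Some of the time-domain intensity measures of Table 2 can be computed with a few lines of Python, as sketched below under stated assumptions: the accelerogram is a list of samples in m/s² with a constant time step, integrals are evaluated with the trapezoidal rule, and no baseline correction or filtering is applied. The function names are illustrative, and the spectral measures (PSA, Housner, SpInt) and the SCAV are left out for brevity.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def cumulative_trapezoid(samples, dt):
    """Running integral of equally spaced samples (trapezoidal rule)."""
    out = [0.0]
    for i in range(1, len(samples)):
        out.append(out[-1] + 0.5 * (samples[i - 1] + samples[i]) * dt)
    return out


def intensity_measures(acc, dt):
    """PGA, PGV, PGD, Arias intensity and Husid duration of a record.

    acc: acceleration samples in m/s^2, dt: sampling interval in s.
    """
    vel = cumulative_trapezoid(acc, dt)
    disp = cumulative_trapezoid(vel, dt)
    squared = cumulative_trapezoid([a * a for a in acc], dt)
    arias_curve = [math.pi / (2.0 * G) * v for v in squared]
    total = arias_curve[-1]
    # First samples at which 5 % and 95 % of the final Arias intensity are reached.
    i05 = next(i for i, v in enumerate(arias_curve) if v >= 0.05 * total)
    i95 = next(i for i, v in enumerate(arias_curve) if v >= 0.95 * total)
    return {
        "PGA": max(abs(a) for a in acc),
        "PGV": max(abs(v) for v in vel),
        "PGD": max(abs(d) for d in disp),
        "Arias": total,
        "Husid": (i95 - i05) * dt,
    }
```

Applied to a real record, the result depends on the preprocessing, since the double integration behind the PGD is sensitive to baseline drifts.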

5 Preliminary statistical analysis

The standardized variable covariance matrix in Fig. 4 provides a first glance at the correlations existing between the IMs described in Table 2 and the maximum drift of the 6 SDOF-s. The standardized variable (z) corresponding to each IM is defined as:

$$z_i = \frac{x_i - \bar{x}}{\sigma_x}$$

where $x_i$ is the ith value of the considered variable, $\bar{x}$ the mean and $\sigma_x$ the standard deviation. This representation is widely used in principal component analysis. Indeed, from the statistical point of view, the normal distribution of the variable x, $x(\bar{x}, \sigma_x)$, becomes a z-distribution z(0, 1) centred on zero with unitary variance: instead of looking at a single sample and its standard deviation, the population mean and standard deviation are used to render the calculation more stable and realistic. From the physical point of view, the IM variables have different units of measure since they describe different physical properties, and the use of the standardized variable allows them to be compared directly. Finally, the covariance (cov) matrix (Fig. 4) expresses the degree of correlation between each pair of IMs and between the IMs and the drift, in terms of how much the two quantities vary together.
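A minimal Python sketch of the standardization and of the covariance between two standardized variables is given below. The function names are illustrative, and the population standard deviation (division by the number of samples) is used, consistently with the text; with this choice the covariance of two standardized variables coincides with their linear correlation coefficient.

```python
import math


def standardize(values):
    """Return z_i = (x_i - mean) / std using the population std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def standardized_covariance(x, y):
    """Covariance of the standardized variables of x and y."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


# Two variables with different units but a perfectly linear relation.
pga = [0.1, 0.2, 0.3, 0.4]       # arbitrary illustrative values
drift = [2.0, 4.0, 6.0, 8.0]     # arbitrary illustrative values
print(standardized_covariance(pga, drift))
```

A constant variable has zero standard deviation and cannot be standardized; the sketch does not handle that case.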

Since the drift is truncated at its ultimate value, the corresponding drift z-distributions for the six SDOF-s are truncated as well at the standardized value $z_{Ultimate\,Drift} = \frac{x_u - \bar{x}}{\sigma_x}$, where $x_u$ is the ultimate drift value. As a consequence, the IM-drift covariance values decrease (the IM values increase while the drift keeps the same value), weakening the statistical significance of such a test. However, since the number of ultimate drift observations is small and the corresponding IM values lie in the tail of the distribution, the test can still be considered valid, at least to evaluate the degree of correlation between the drift and the IMs.

Fig. 4 Correlation matrix. For the sake of clarity all the correlation factors <0.7 are masked. The black bold lines divide the matrix into four quarters (clockwise from top-left): IM correlation, correlation between the IM and the system response, system responses correlation; the 4th quarter is empty since the matrix is symmetric. [Axis labels for the system responses: Drift μ1F1, Drift μ3F1, Drift μ1F2, Drift μ3F2, Drift μ1F5, Drift μ3F5]

The spectral and Arias intensities are the only two parameters with a good level of correlation (cov > 0.7) with the maximum drift for all the SDOF-s under study. The SCAV exhibits cov values with the drift ranging from 0.7 to 0.9, except for the μ3F5 elements (cov < 0.7). The other IMs show a variable degree of correlation mainly depending on the fundamental frequency of the system, e.g. the PGA is well correlated with the drift measured on the μ1F5 and μ3F5 systems (cov > 0.8), while the PGD is well correlated with the drift measured on the μ1F1 and μ3F1 elements (cov > 0.7). The Husid duration and the damage factor (Cosenza and Manfredi 2000) are weakly correlated with the drift (cov < 0.7 for all the SDOF-s); thus those parameters are not included in the further analysis.

Some complementary insights are shown in Fig. 5, where each panel represents the 2D histogram of an IM versus the maximum drift for each SDOF-s. This representation gives a qualitative overview of the scatter associated with each IM-drift relation and of the distribution of IM values over the drift range. Again, the drift is truncated at its ultimate value; data reaching that value are represented as a vertical distribution of points. For each SDOF-s the PSA at the fundamental frequency shows a straight line with unitary slope until the yield drift, then (for the ductile structures) the lines start to be more scattered. This is not a surprising result since the PSA at the SDOF-s initial fundamental frequency is defined as the maximum drift in the linear domain; it thus represents the structural system response itself.

Fig. 5 Correlation for the brittle (left block) and ductile (right block) systems for the different fundamental frequencies (top 1 Hz, middle 2 Hz, bottom 5 Hz). Each panel shows the 2D histogram of a given IM as a function of the drift values; this representation gives insight into the density of observations as a function of the values assumed by the investigated parameters

Concerning the peak IMs, the PGA is less scattered for the 5 Hz systems, the PGV for the 2 Hz systems and the PGD for the 1 Hz systems.

The Arias intensity versus drift histograms have the same scatter and shape independently of the fundamental frequency; however, they are less scattered for ductile structures. The SCAV histograms are very scattered for all the studied elements, so this IM cannot be used as a proxy for damage level prediction. This is an expected result since the SCAV does not take into account the spectral values. Nonetheless the SCAV is considered one of the best indicators of failure (Katona and Tóth 2013), and the parameter to be used to filter the PSHA disaggregation (Chapmann 1999).

Besides the PSA, the best parameters in terms of dispersion and correlation (cov 0.8–1.0 in Fig. 4) are the Housner intensity for the 1 and 2 Hz systems (the PSV has mainly low-intermediate frequency content) and the spectral intensity, defined as the PSA spectrum integral over the 2–10 Hz range, for the 5 Hz systems. This is not surprising, since these parameters describe the spectral velocity or acceleration content averaged between two frequencies, taking into account the fundamental frequency degradation related to the appearance of plastic hinges or cracks in the structures. Furthermore, several authors investigated the best way of defining the frequency integration range for pseudo-displacement/velocity and acceleration spectra, in order to customize the IM parameter as a function of the investigated structure properties (De Biasio et al. 2014; Lestuzzi et al. 2004).

Except for the PSA, the IM-drift trends do not show a clear change in slope and scatter beyond the elasticity limit.

The regression relationships between each IM and the drift are not estimated in this paper since a single regression law cannot explain the three domains (elastic, plastic, fragile), given that it is impossible to include the fragile data in the regression. Moreover, the density of observations is not uniform over the IM and drift ranges (as shown by the distribution in Fig. 5), leading to very high standard errors on the regressions.

A naïve Bayesian classifier is instead proposed: it is a statistical tool that associates the input ground motion with a specific class related to the final drift induced on the SDOF-s.

6 The naïve Bayesian algorithm

Suppose that the EDP of a structural system can be described by a variable c that is categorical (it has a discrete instantiation) and that depends on some other variable IM = {IM1, . . ., IMN}. In the present application, the drift values are discretized into the three classes (c) defined as follows:

– If the measured drift is less than the yield drift, the system remains in the elastic domain; the waveform will be attributed to the class “elastic”.
– If the measured drift is greater than (or equal to) the yield drift and less than the ultimate drift, the system is driven into the plastic domain; the waveform will be classed as “plastic”.
– If the measured drift is greater than (or equal to) the ultimate drift, the system is driven into the fragile domain; the waveform will be classed as “fragile”.
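The three rules above translate directly into a small labelling function. In the Python sketch below the function name is illustrative; the thresholds in the usage lines are the yield and ultimate drifts of the ductile 1 Hz system (4.19 cm and 3 × 4.19 = 12.57 cm).

```python
def classify_drift(drift, yield_drift, ultimate_drift):
    """Label a waveform from the maximum drift it induces on a system."""
    if drift < yield_drift:
        return "elastic"
    if drift < ultimate_drift:
        return "plastic"
    return "fragile"


# Ductile 1 Hz system: yield drift 4.19 cm, ultimate drift 12.57 cm.
for drift_cm in (2.0, 6.0, 12.57):
    print(drift_cm, classify_drift(drift_cm, 4.19, 12.57))
```

For a brittle system the yield and ultimate drifts coincide, so the second branch is never taken and only the “elastic” and “fragile” labels can be returned.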

The problem is to predict the conditional probability that a given structure is driven into the class c given a waveform characterized by an array of intensity measures $\{im_{1p}, \ldots, im_{Np}\}$ (conditional distribution of the class c given IM). This is the typical classification problem that can be addressed using a naïve Bayesian classifier:

$$P(c \mid \mathbf{IM}) = \frac{P(c)\,P(\mathbf{IM} \mid c)}{P(\mathbf{IM})} \qquad (3)$$

The implementation of a Bayesian procedure requires (Wasserman 2004):

– choosing a probability density P(c), called the prior distribution, expressing our belief about the parameter c before seeing any data;
– determining a statistical model P(IM|c) expressing the belief about the data IM given the parameter c;
– updating the beliefs after observing the data $\{IM_1 = im_{1p}, \ldots, IM_N = im_{Np}\}$ (where $IM_j$ denotes the jth IM, with $j \in [1, N]$, and $im_{jp}$ is the pth value assumed by the jth IM), and calculating the posterior distribution P(c|IM).

The aim is to estimate $P(c = c_i \mid \mathbf{IM} = im_{1p}, \ldots, im_{Np})$, which can be expressed as:

$$P(c = c_i \mid \mathbf{IM} = im_{1p}, \ldots, im_{Np}) = \frac{P(c = c_i)\,P(im_{1p}, \ldots, im_{Np} \mid c = c_i)}{\sum_j P(c = c_j)\,P(im_{1p}, \ldots, im_{Np} \mid c = c_j)} \qquad (4)$$

where the summation over j covers the whole event space, and $c_j$ is a partition of the event space. Assuming that the IM variables are conditionally independent given the parameter c, the conditional probability $P(im_{1p}, \ldots, im_{Np} \mid c = c_i)$ can be expressed as the likelihood product:

$$P(IM_{1p}, \ldots, IM_{Np} \mid c = c_i) = \prod_{j=1}^{N} P(IM_{jp} = im_p \mid c_i) \qquad (5)$$

where $P(IM_{jp} = im_p \mid c_i)$ is the conditional probability of observing the value $im_p$ of the jth IM in the class $c_i$.

The hypothesis of conditional independence is not a simple assumption; it rests on the argument that, given the class $c_i$, the value $im_j$ assumed by the jth IM does not depend on the value $im_k$ assumed by the kth IM. From a physical perspective this assumption is not realistic. Nonetheless it works well since, as explained in Kuehn and Scherbaum (2010), the intention of the Bayesian classifier is not to be a physical model explaining the data generating process, but rather to be a simple statistical predictor of the class c given the IM, based on the empirical probability curves (in our application the so-called frequency curves).
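The combination of Eqs. (4) and (5) can be sketched in Python as follows. The function name and the dictionary-based inputs are illustrative choices: `priors` maps each class to P(c), and `likelihoods` maps each class to the list of conditional probabilities, one per IM, read for the observed IM values. The numbers in the usage lines are arbitrary.

```python
def naive_bayes_posterior(priors, likelihoods):
    """Posterior P(c | IM) from a prior and per-IM conditional probabilities.

    priors:      dict class -> P(c).
    likelihoods: dict class -> list of P(IM_j = im_j | c), one entry per IM.
    """
    joint = {}
    for label, prior in priors.items():
        product = prior
        for p in likelihoods[label]:
            product *= p          # likelihood product of Eq. (5)
        joint[label] = product
    evidence = sum(joint.values())  # denominator of Eq. (4)
    if evidence == 0:
        raise ValueError("all classes have zero probability")
    return {label: value / evidence for label, value in joint.items()}


priors = {"elastic": 1.0, "plastic": 1.0, "fragile": 1.0}
likelihoods = {                      # arbitrary illustrative values for two IMs
    "elastic": [0.5, 0.5],
    "plastic": [0.5, 0.25],
    "fragile": [0.0, 0.25],
}
posterior = naive_bayes_posterior(priors, likelihoods)
print(max(posterior, key=posterior.get), posterior)
```

Because of the final normalization, the prior does not need to sum to one; only the relative weights of the classes matter.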

Concerning the prior distribution, two possible hypotheses can be considered. First, P(c) is defined as the relative frequency of the classes observed on a training dataset:

$$P(c = c_i) = \frac{\#(c = c_i)}{\#s} \qquad (6)$$

where the symbol # means number. Second, P(c) is assumed to be uniform and equal to one:

$$P(c) = 1 \qquad (7)$$

The impact of the two assumptions on the classifier performances will be discussed later inthe paper (section: “Influence of the prior distribution”)
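As a worked step that is not spelled out in the text, substituting the uniform prior of Eq. (7) and the likelihood product of Eq. (5) into Eq. (4) shows that the constant prior cancels, so that the posterior reduces to a normalized likelihood product (the summation index k over the classes is introduced here only to avoid a clash with the IM index j):

$$
P(c = c_i \mid \mathrm{im}_{1p}, \ldots, \mathrm{im}_{Np}) = \frac{\prod_{j=1}^{N} P(\mathrm{IM}_{jp} = \mathrm{im}_p \mid c_i)}{\sum_{k} \prod_{j=1}^{N} P(\mathrm{IM}_{jp} = \mathrm{im}_p \mid c_k)}
$$

With the frequency-based prior of Eq. (6), each product is instead weighted by the relative frequency of its class, which favours the classes that are most populated in the training dataset.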

The probability P(IM_jp = im_p | c_i) can be estimated from the relative frequency of data, measured on the training dataset:

$$
P(\mathrm{IM}_{jp} = \mathrm{im}_p \mid c = c_i) = \frac{\#(\mathrm{IM}_{jp} = \mathrm{im}_p \wedge c = c_i)}{\#(\mathrm{IM}_{jp} = \mathrm{im}_p)} \tag{8}
$$

where ∧ is the logical “and” operator. Equation (8) expresses the ratio between the number of simultaneous observations of the class c_i and of the pth value of the jth considered IM, and the number of observations of the pth value of the jth considered IM. A consequence of the formulation in Eq. 8 is that the sum over the classes of P(IM_jp = im_p | c_i) is equal to 1:

$$
\sum_{i} P(\mathrm{IM}_{jp} = \mathrm{im}_p \mid c_i) = 1 \tag{9}
$$

This property is “transferred” to P(c|IM) = P(c)P(IM|c)/P(IM), leading to the definition of complementary probability levels for each class, conditioned on the observation IM. For example: given the array IM_k, if the probability P(c_j|IM) of observing the class c_j is equal to 70 %, the sum of the probabilities of observing the other classes will not be >30 %, allowing the kth waveform to be classed in the c_j class.
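The relative frequencies of Eq. (8) and the normalization of Eq. (9) can be sketched with binned counts. This is a minimal illustration rather than the procedure actually used in the paper: the function name, the integer class labels, the bin edges and the data are all hypothetical.

```python
import numpy as np

def frequency_curves(im_values, classes, bin_edges, n_classes):
    """Relative frequency of each class per IM bin, following Eq. (8).

    Returns an array of shape (n_classes, n_bins). Columns of non-empty
    bins sum to 1 (Eq. 9); empty bins are returned as NaN.
    """
    im_values = np.asarray(im_values, dtype=float)
    classes = np.asarray(classes)
    total, _ = np.histogram(im_values, bins=bin_edges)
    curves = np.full((n_classes, len(bin_edges) - 1), np.nan)
    for c in range(n_classes):
        per_class, _ = np.histogram(im_values[classes == c], bins=bin_edges)
        # Class-specific bin height divided by the overall bin height.
        np.divide(per_class, total, out=curves[c], where=total > 0)
    return curves

# Hypothetical log10(IM) values and labels (0 = elastic, 1 = fragile).
im = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 1.2, 1.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
edges = [0.0, 0.5, 1.0, 1.5]
curves = frequency_curves(im, labels, edges, n_classes=2)
```

In this toy case the middle bin is the “crossing range”: it contains one elastic and two fragile records, so the two curves take the values 1/3 and 2/3 there.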

7 The frequency curves

The relative frequency of data described in Eq. (8) has been obtained by injecting the training dataset of 6,373 accelerometric waveforms into the six non-linear SDOF-s described in previous sections. The plots in Figs. 6 and 7 show the distribution of each IM as a function of the final class of the SDOF-s for the brittle and ductile structures (right column), and the relative probability curves (left column) for the different SDOF-s.

For each IM, the unfilled histogram describes the whole distribution regardless of the final class. The PGA is characterized by a truncated distribution related to the 0.025 g threshold imposed as selection criterion. The histograms filled in grey (in both figures) are relative to the IM distribution associated with the elastic class, the red histograms (in Fig. 7) are the IM distributions relative to the plastic class, and finally the blue histograms (in both figures) are the IM distributions relative to the fragile class. Consistently with the formulation in Eq. 8, the ratio between the height of the class-specific distribution and that of the whole distribution, normalized to one, gives the value of the corresponding probability curve. The probability curves express the probability for the studied system to be in a given damage state, defined on the basis of the reached drift, and in this sense they are equivalent to the system vulnerability curves. Following the Bayesian terminology, the damage states are the classes used for labelling the waveforms.

The choice of the appropriate number and width of the bins is not an easy task. If the binning is too coarse, the histogram does not give much information about the shape of the probability distribution. On the other hand, if the binning is too fine, bins become empty and the histogram becomes noisy (it “overfits” the data). Several methods have been proposed to select an appropriate binning. In this paper, a criterion inspired by the jackknife likelihood proposed by Hogg (2008) is used, whereby the chosen binning is the one that best predicts future data. A monotonic shape of the derived frequency curves is imposed. This requirement stems from the physics of the described process.
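A leave-one-out criterion of this kind can be sketched as follows. The sketch is only in the spirit of a jackknife likelihood and is not the criterion actually implemented in the paper: the equal-width bins, the smoothing count `alpha`, the candidate bin numbers and the synthetic sample are all assumptions made for illustration.

```python
import numpy as np

def jackknife_log_likelihood(data, n_bins, alpha=1.0):
    """Leave-one-out log-likelihood of an equal-width histogram.

    Each point is scored with the histogram built from all other points;
    alpha > 0 is a smoothing count that keeps singly-occupied bins finite.
    """
    data = np.asarray(data, dtype=float)
    counts, edges = np.histogram(data, bins=n_bins)
    width = edges[1] - edges[0]
    n_total = counts.sum()
    occupied = counts > 0
    # Density seen by a point of bin i once that point has been removed.
    density = (counts[occupied] - 1 + alpha) / (
        width * (n_total - 1 + n_bins * alpha)
    )
    return float(np.sum(counts[occupied] * np.log(density)))

def best_bin_count(data, candidates, alpha=1.0):
    """Candidate number of bins with the highest leave-one-out score."""
    scores = [jackknife_log_likelihood(data, m, alpha) for m in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(0)
sample = rng.normal(size=2000)  # synthetic stand-in for a log10(IM) sample
candidates = [2, 5, 10, 20, 50, 200, 1000]
chosen = best_bin_count(sample, candidates)
```

Very coarse binnings are penalized because they flatten the distribution, and very fine ones because each removed point is poorly predicted by its nearly empty bin.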

Since the brittle structures (Fig. 6) have only two possible classes, for each IM a single “crossing range” is identified: a range of values where the fragile class becomes more frequent (and probable) than the elastic one. The width of the “crossing range” depends on the distance between the elastic and fragile distributions. The best IM in Fig. 6 is the PSA since it coincides with the model itself: the elastic and fragile distributions of the parameter are well separated and the frequency curves pass between the two limit values (0 and 1) with a sharp step (reduced cross-over range).

For all the other parameters, the class distributions overlap to some extent; the number of superimposed bins is directly related to the dispersion of the IM as a function of the drift (Fig. 5). Indeed, the overlapping of the two distributions has no impact on the effectiveness of a given parameter in the classification procedure, except when the bin heights become comparable. In this case, the probabilities of observing the IM values in the two classes are very close. An example is given by the PGD distributions in Fig. 6 panel C, where the probability of observing log10(PGD) values ranging from 1 to 1.7 is the same for the elastic and fragile classes.

The frequency curves of ductile structures have a more complex behaviour (Fig. 7). An IM can be considered as a good proxy for the drift class when the elastic and fragile frequency curves keep a monotonic behaviour while the frequency curve for the plastic class has a bell shape (increasing, stable, then decreasing frequency), and the maximum frequencies of the classes are well separated with respect to the IM values. Again the PSA is the best IM, together with the Housner intensity for the μ 1F1 and μ 1F2 systems (Fig. 7a, b). The spectral intensity for the μ 1F2 and μ 1F5 systems (Fig. 6b, c) can also be considered a good IM. The PGD in Fig. 7b, c (μ 1F2 and μ 2F5 systems) and the SCAV in Fig. 7c (μ 1F5 systems) are examples of IMs with a strong scatter over the three classes.

Despite the optimized binning procedure, the frequency curves have been slightly modified to keep a realistic behaviour. An example is shown in Fig. 7c (frequency curves for the μ 3F5 systems), where the last points of the frequency curves for PSA have been adjusted, since it is not realistic that the frequency of observations of higher values of PSA over the fragile domain decreases for log10(PSA) > 0.6. Indeed, the definition of the frequency curves suffers from the lack of observations in the last bins (undersampling problem).

Fig. 6 Distribution of the investigated IM as a function of the drift class traced along with the frequency curves for each class. Left, brittle elements with fundamental frequency of 1 Hz; middle, brittle elements with fundamental frequency of 2 Hz; right, brittle elements with fundamental frequency of 5 Hz. The grey histograms are relative to the elastic class, the blue ones to the fragile class, the black-edge unfilled histogram is the IM distribution. The frequency curve values are obtained from the ratio (normalized to 1) between the height of the IM distribution for a given class (i.e. grey bins for the elastic class) and the overall IM distribution (black unfilled bins)

Fig. 7 Distribution of the investigated IM as a function of the drift class traced along with the frequency curves for each class. Left, ductile elements with fundamental frequency of 1 Hz; middle, ductile elements with fundamental frequency of 2 Hz; right, ductile elements with fundamental frequency of 5 Hz. The grey histograms are relative to the elastic class; the red ones to the plastic class; the blue ones to the fragile class; the black-edge unfilled histogram is the IM distribution. The frequency curve values are obtained from the ratio (normalized to 1) between the height of the IM distribution for a given class (i.e. grey bins for the elastic class) and the overall IM distribution (black unfilled bins). The dashed curves (with transparent markers) are the frequency curves obtained without any adjustment on the IM distributions. Sometimes they show a non-physical behaviour. This is the case for the PSA plastic and fragile curves for the 5 Hz elements, the PGA for the elastic frequency curve for the 1 Hz elements, and also the spectral intensity and the SCAV

8 Sensitivity study and single IM classification scheme

After the training phase, the Bayesian method is tested on the training dataset by comparing the observed classes with those predicted by the Bayesian algorithm.

In this section, two single-IM applications are shown. The first is based on the PSA, with the aim of discussing the algorithm sensitivity. Since the PSA at a given F0 is the response of the oscillators in the linear domain, this IM gives the best result in terms of discrimination between elastic–fragile classes (for brittle systems) or elastic–plastic classes (for ductile systems).

Results are shown in Fig. 8, where the observed and the predicted PSA distributions are plotted along with the confusion matrix for the brittle SDOF-s: in this representation the elements along the diagonal are the numbers of correctly classed data, while the elements outside the diagonal are the misplaced data. In this specific case, all the misplaced data are related to the binning process, except for the plastic–fragile classes. This limitation is due to the discrete distributions: the minimum IM variation that can be discriminated is related to the bin width, leading to a loss of precision.
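The confusion matrix used throughout this comparison can be built as in the following sketch, which adopts the convention of Fig. 8 (observed class on the X axis, predicted class on the Y axis). The function names, the integer class labels and the data are hypothetical and serve only as an illustration.

```python
import numpy as np

def confusion_matrix(observed, predicted, n_classes):
    """Counts with the predicted class on rows and the observed class on columns."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same (row, column) pair repeats.
    np.add.at(matrix, (np.asarray(predicted), np.asarray(observed)), 1)
    return matrix

def success_rate_per_class(matrix):
    """Fraction of each observed class that was correctly predicted."""
    return np.diag(matrix) / matrix.sum(axis=0)

# Hypothetical labels: 0 = elastic, 1 = plastic, 2 = fragile.
observed = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(observed, predicted, n_classes=3)
rates = success_rate_per_class(cm)
```

The diagonal of `cm` counts the correctly classed waveforms; dividing it by the column sums gives per-class success percentages of the kind quoted later for the blind test.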

The second single-IM application is based on the best-correlated parameters (cov > 0.8 in Fig. 4). The aim here is to demonstrate the differences and the improvement related to the use of an array of IMs. This application is closer to a realistic case, where no IM perfectly matches the structural response. The selected IMs are the Housner intensity for the 1 and 2 Hz systems, and the spectral intensity for the 5 Hz systems. The results of the single-parameter classification are reported in Fig. 9 (same structure as Fig. 8 but with the selected IM distributions).

Even if the selected IMs are well correlated with the drift, they are not able to fully discriminate the different classes. In particular they fail over the ranges where the IM distributions overlap, as for the Housner intensity distributions for the 1 and 2 Hz SDOF-s in Fig. 9 or for the spectral intensity distributions (for both brittle and ductile elements). The predicted distributions show a sharp separation between the classes, while the observed ones overlap.

Since the relations between a single IM and the system response are scattered (even for a good degree of correlation), it is not possible to fully classify the waveforms using a single IM.

9 Influence of the prior distribution

Only by combining several IMs via the likelihood product (Eq. 5) and the naïve Bayesian formulation can the number of misplaced data be reduced. Figures 10 (brittle elements) and 11 (ductile elements) show the classifier performance with the prior distribution based on the frequency of each class (Eq. 6) and with the uniform one (Eq. 7). Looking at the IM distributions (Figs. 10, 11), it is clear that using an array of IMs leads to a better description of the observed distribution, even resolving the ranges where the distributions overlap. The improvement is observed when comparing the results in Figs. 9 and 10, especially for the plastic and fragile classes.

However, looking at the confusion matrices in Figs. 9 and 10, the number of elements classed as elastic is higher when a single IM is used. This is a computational artefact: since the number of elastic data is higher than the number of plastic or fragile ones and the distributions are not renormalized, the sharp cut-off distribution (related to the single IM) covers the tail of the elastic distribution (Fig. 9) and the elastic class is assigned more frequently. The drawback is that even the “fragile” waveforms are classed as elastic (see the elements below the diagonal), degrading the algorithm performance in selecting data belonging to these classes.

Fig. 8 Classification based on the PSA for the brittle (left block) and ductile structures (right block), for the 1 (top), 2 (middle) and 5 Hz (bottom) systems. On the left column, PSA distributions obtained applying the naïve Bayesian classifier (empty histograms contoured in black for the elastic class, in red for the plastic class and in blue for the fragile class) and the observed ones (filled histograms). On the right, the confusion matrix: the observed class is reported on the X axis and the predicted one on the Y axis, with the correctly classed waveforms along the diagonal. The size of each square is proportional to the logarithm base 10 of the number of elements; for the sake of clarity, this number is also reported. Since the PSA is the drift measured in the linear domain, this is the best classification that can be achieved, the errors being related to the sensitivity due to the binning process

The different prior distributions do not affect the overall number of well-classed data; however, the frequency-based prior distribution gives the less damaging class as more probable. Since the elastic class is the most frequent, the other classes are considered less probable. The uniform prior distribution does not introduce biases and is more efficient in classing data in the plastic and fragile classes.

From now on, the uniform prior definition will be used, since it is preferable to make no hypothesis on the class distribution rather than to use a hypothesis strongly dependent on the sampling of the training dataset: in a real situation the data distribution is not known.

Fig. 9 Classification based on the best-correlated IM (not considering the PSA) for the brittle (left block) and ductile structures (right block). For the 1 Hz (top) and the 2 Hz (middle) systems, the Housner intensity is used. For the 5 Hz (bottom) systems the spectral intensity is used. The confusion matrices are plotted along with the distribution of the selected IM. The colour scheme is the same as in Fig. 8

10 Confidence bound

In the examples discussed so far, the waveforms are associated with the final class having the highest probability (Eq. 4). The data are “not classed” only if the probability associated with the elastic class is equal to 50 %, a condition that never occurs in the examples shown in Figs. 8, 9, 10 and 11. However, the classifier can be strengthened by bounding the probabilities associated with each class. Given an input signal characterized by the array IM of intensity measures, the classifier will exclude the class c_j if and only if the condition P(c_j|IM) < P_low is respected. Moreover, the classifier associates the waveform with the class c_i if and only if P(c_i|IM) > P_high, where P(c_j|IM) is the probability of the class given the array IM of intensity measures, and P_low and P_high are two probability bounds set by the user. Figures 12a (brittle systems) and b (ductile systems) show the results obtained applying the 10–90 % bounds. Both not-classed and misplaced data are distributed over the bins located on the superposition of the elastic and plastic distributions, and the confusion matrix shows that bounding the probability reduces not only the number of elements outside the diagonal, but also the number of well-classed data. Thus the performance of the algorithm, measured in terms of matched classes, is not directly related to the degree of confidence associated with each labelling: some elements have been associated with an erroneous class with a probability score >90 %. This is because, for some waveforms, the majority of the IMs used do not explain the observed drift; in this sense, such data can be considered as outliers. Examining their characteristics in detail could be an interesting exercise aimed at better understanding the limits of the selected IMs, but it is beyond the scope of the present paper.

Fig. 10 Naïve Bayesian classification for brittle elements based on the IM array and the frequency-based prior distribution (left block); uniform prior distribution (right block) for the 1 (top), 2 (middle) and 5 Hz (bottom) systems. The confusion matrices are plotted along with the observed and predicted distributions for all the IMs used. Since this representation is focused on the shape of the distribution, the IM values are not reported. The AI, HI and SI labels indicate the Arias, Housner and Spectral intensities, respectively
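The bounding rule of this section (exclusion below P_low, assignment above P_high) can be sketched as follows. The function name, the `NOT_CLASSED` marker and the example posteriors are hypothetical; the default bounds reproduce the 10–90 % case.

```python
import numpy as np

NOT_CLASSED = -1  # hypothetical label for waveforms left unassigned

def classify_with_bounds(posterior, p_low=0.10, p_high=0.90):
    """Apply the two probability bounds to a vector of class posteriors.

    Returns the assigned class index (or NOT_CLASSED) and a boolean mask
    of the classes excluded because P(c_j | IM) < p_low.
    """
    posterior = np.asarray(posterior, dtype=float)
    excluded = posterior < p_low
    best = int(np.argmax(posterior))
    # A class is assigned only if its probability exceeds the upper bound.
    label = best if posterior[best] > p_high else NOT_CLASSED
    return label, excluded

# Hypothetical posteriors for (elastic, plastic, fragile).
label_a, excluded_a = classify_with_bounds([0.95, 0.04, 0.01])
label_b, excluded_b = classify_with_bounds([0.60, 0.35, 0.05])
```

In the second case the fragile class is excluded, but no class exceeds the upper bound, so the waveform stays unclassified; this mirrors the trade-off between misplaced and well-classed data described above.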

11 Blind application to Japanese data set

In order to quantify the efficiency of the proposed naïve Bayesian classifier, the method is applied to a second dataset completely disjoint from the first one.

The idea is to use the frequency distributions displayed in Figs. 6 and 7 to class the response of the systems when excited with the new data. The results are then compared with the final classes observed when the new strong ground motion dataset is used as input to the six SDOF-s.

The dataset consists of 7,000 waveforms recorded by the Japanese K-net network from 1994 to 2013. Their characteristics are displayed in Fig. 13. A blind selection has been made, without any discrimination based on the event mechanism (crustal, volcanic or subduction earthquakes), magnitude, focal depth or site-event distance. The only selection criterion applied is PGA > 0.1 g. This choice is not consistent with the assumption made on the training dataset, but imposing PGA > 0.025 g on such a rich dataset leads to the selection of too many waveforms, most of them inducing small drifts on the systems under study, enriching the elastic class but not the plastic or the fragile classes.

Fig. 11 Naïve Bayesian classification for ductile elements based on the IM array. The description of the figure items is the same as for Fig. 10

It is interesting to underline that the naïve Bayesian algorithm classed the Japanese waveforms in 20 s (for the six SDOF-s), while computing the Takeda responses used as benchmark took 3 weeks (on a quad-core PC).

Figure 14a displays the performance of the classifier for the three brittle SDOF-s. Concerning the comparison between the observed and predicted distributions, the observed and predicted relative frequencies per bin differ over a large bin range, while in the previous applications in Figs. 10, 11 and 12 the differences are concentrated in the area of the overlapping distributions. This is related to the fact that the new dataset spans different values of each parameter; in some cases the information is extrapolated and the sampling of the three classes differs. Looking at the confusion matrix for the μ 1F1 and the μ 2F2 elements, the large majority of data are correctly classed (in both cases almost 97 % of the elastic class and more than 90 % of the fragile class are well predicted). For the μ 1F5 elements, the success percentage for the fragile class decreases to 73.4 %, which can still be considered satisfactory for a blind application.

The blind classifier results on the ductile structures are shown in Fig. 14b. The great majority of elements lie along the diagonal for all the SDOF-s; the plastic class is predicted as elastic in 10 % (μ 3F1), 19 % (μ 3F2) and 24 % (μ 3F5) of the cases, and as fragile in 3.5, 3.8 and 2 % of the cases for the μ 3F1, μ 3F2 and μ 3F5 structures, respectively.

These results can be explained by looking at Fig. 15, which shows the comparison between the probability curves issued from the Japanese dataset and those relative to the training dataset used in the blind classification: the small shifts, comparable with the bin size, are negligible.

Fig. 12 Naïve Bayesian classification based on the IM array and the uniform prior distribution with the introduction of a 10–90 % confidence bound, for brittle (a) and ductile structures (b) for the 1 (a), 2 (b) and 5 (c) Hz systems. On the left, IM distributions obtained applying the naïve Bayesian classifier (empty histograms contoured in black for the elastic class, in red for the plastic class and in blue for the fragile class) and the observed ones (filled histograms). The green bins are the non-classified records: the records having a probability >10 % to be in a given class and <90 % to be in the complementary classes. On the right, the confusion matrix: a 2D histogram with the observed class on the X axis and the predicted class on the Y axis, with the well-labelled signals along the diagonal. The number of data contained in each bin is also reported

As a general remark, the probability curves of well-correlated parameters have a behaviour that is independent of the dataset used. It is interesting to follow the behaviour of PGA and Housner intensity: for the μ 1F1 elements the crossing points of the probability curves relative to PGA are shifted by 3 bins, while those relative to the Housner intensity match perfectly. For the μ 1F2 elements the PGA curves are still shifted by 3 bins and the Housner curves are now shifted by one bin. Finally, for the μ 1F5 elements, the probability curves relative to Housner intensity do not match, while the PGA curves match perfectly. Similar considerations can be made for the ductile elements (Fig. 15 bottom). These elements help in understanding the origin of the large range of mismatch between the observed and predicted distributions (Fig. 14a, b). First, the behaviour of the SDOF-s does not depend on the geological context: the frequency curves for the PSA coincide with those issued from the training dataset, and since the PSA coincides with the system drift, the system response is invariant. The probability curves relative to the well-correlated parameters can then be applied successfully in two different contexts. On the contrary, the probability curves relative to the largely scattered IMs differ from one dataset to another; this is because even with a large training dataset of 6,373 waveforms, the variability of those IMs has not been completely explored. This observation shows that the statistical model P(IM|c) strongly depends on the sampling of the final classes in the training phase. This is a limit not only of the proposed approach, but of all empirical approaches, including the classic predictive relationships: if the waveform dataset does not span the whole signal variability, the ergodic assumption must be adopted with care.

Fig. 13 Japanese waveforms database. As in Fig. 2, the database features are plotted as a 2D magnitude–distance histogram. The magnitude–distance ranges differ from those of the training dataset. Very small events (even if the minimum PGA is set to 0.1 g) are present in the dataset, as well as the Tohoku 2011 earthquake (M 9.0)

In order to quantify the impact of a very scattered IM on the Bayesian classifier, the classifier has been modified by introducing for each IM a weight function based on the value of the correlation coefficients (see Fig. 3): if the IM is correlated with the drift with a correlation coefficient greater than 0.7, it is used in the classifier; otherwise it is rejected.
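This binary weighting amounts to filtering the IM array before the likelihood product is formed. A minimal sketch follows, assuming a Pearson correlation computed on whatever scale the caller provides (the text does not specify the scale); the function name and the data are hypothetical.

```python
import numpy as np

def select_correlated_ims(im_table, drift, threshold=0.7):
    """Names of the IMs whose correlation with the drift exceeds the threshold.

    im_table : dict mapping an IM name to its values over the waveforms
    drift    : drift values over the same waveforms
    """
    selected = []
    for name, values in im_table.items():
        # Pearson correlation coefficient between this IM and the drift.
        r = np.corrcoef(values, drift)[0, 1]
        if r > threshold:
            selected.append(name)
    return selected

# Hypothetical values for five waveforms.
drift = [1.0, 2.0, 3.0, 4.0, 5.0]
im_table = {
    "HI": [1.1, 1.9, 3.2, 3.9, 5.1],   # closely follows the drift
    "PGD": [3.0, 1.0, 4.0, 1.0, 5.0],  # strongly scattered
}
kept = select_correlated_ims(im_table, drift)
```

Only the IMs returned by the filter would then enter the product of Eq. (5).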

Fig. 14 Naïve Bayesian classification for Japanese dataset based on the IM array and the uniform prior distribution, for brittle (a) and ductile structures (b), for the 1 (top), 2 (middle), and 5 (bottom) Hz systems. The figure items description is the same as for Fig. 7

Fig. 15 Comparison between the frequency curves issued on the training and blind datasets. Top brittle elements, bottom ductile elements with initial fundamental frequency of 1 (left), 2 (middle) and 5 Hz (right). The continuous lines are the frequency curves deduced on the training dataset; the dashed lines with transparent dots are those deduced from the Japanese dataset. The coherency between the distributions is determined by the level of correlation of the IM with the measured drift

Fig. 16 Confusion matrixes relative to the naïve Bayesian classification for Japanese dataset, based on the array of well correlated IM (following the correlation coefficients in Fig. 3) and the uniform prior distribution, for brittle (on the left) and ductile structures (on the right) for the 1 (top), 2 (middle) and 5 (bottom) Hz systems. The results do not show a significant improvement with respect to the application in Fig. 13. This result proves the stability of the likelihood estimator

The results are reported in Fig. 16. The brittle elements do not show a significant improvement, while for the ductile elements μ 3F2 and μ 3F5 the labelling of the plastic and fragile classes is slightly improved. This indicates that the likelihood estimator is robust with respect to the inclusion of scattered IMs.

12 Conclusions

An application of the naïve Bayesian classifier for selecting strong motion data in terms of the deformation “probably” induced on a given structural system has been presented. The main differences between the “standard” procedure, based on the inference of a polynomial relationship between a single IM and the EDP, and the proposed method are: the discrete description of the engineering demand parameter, the use of an array of intensity measures, and the combination of the information issued from the training phase via a Bayesian formulation.

Comparing the algorithm performance for a single IM and for the IM array, it has been observed, for both training and blind applications, that the IM array classifier better predicts the IM distribution for each response class, even in the IM ranges where the distributions overlap. In conclusion, taking into account multiple IMs leads to a better description of the complexity of the seismic signals; furthermore, since the structural response evolves as a function of plastic hinge formation, a single IM is not always able to follow this evolution and describe all the complexity of the structural demand. That is why the results presented in this paper can be improved using more specific IMs defined to better relate the input ground motion and the structural response (De Biasio et al. 2014; Lestuzzi et al. 2004). In this sense, the naïve Bayesian classifier is conceived as a tool able to collect and fully exploit the published results on the best IMs as a function of the structural response.

It is important to underline that this paper is mainly focused on the methodology; a very simplistic 1D model, strongly correlated with the PSA, has been used, making the application redundant, especially for the brittle elements. However, the advantage of using simple SDOF-s, with respect to 2D or 3D structural models, is that the computing effort is considerably reduced, to a few seconds per waveform, allowing a great number of waveforms to be used as input for the six different systems. The drawback of the SDOF-s is that their response coincides with the spectral acceleration at their fundamental frequency in the linear domain; thus the PSA(f0) is the best IM and the other parameters add little. This is however not the case for ductile elements, nor for more complex systems (multiple degree of freedom systems). It is important to remark that the proposed Bayesian scheme does not depend on the particular structural model, nor on the selected IM array.

Despite its limits, the presented methodology can already be successfully applied; this is shown by the blind application on 7,000 Japanese waveforms, where more than 90 % of data are successfully classed for each SDOF-s.

A further application of the presented algorithm is the classification of massive amounts of synthetic data with respect to a given structural model. Moreover, when synthetic data are used, it is important to check their compatibility with real data; this process is usually based on a comparison between a single given IM measured on the synthetic waveform and the corresponding ground motion prediction equation (GMPE). The presented Bayesian method allows the compatibility to be checked by looking simultaneously at an array of IMs and at the response of the target structure or structural system, going beyond a purely seismological acceptability criterion.

From the statistical point of view, an improvement of the proposed method would be the development of a strategy that takes into account the interdependency between IMs by implementing a Bayesian network (Kuehn et al. 2011).

A possible extension of the field of application of this method is the use of condensed models reproducing the main characteristics of a complex 3D model (such as mass, stiffness, fundamental frequency, ductility). Condensed models are fast enough to be used in the training phase (whose dataset need not necessarily count more than 6,000 waveforms), and the resulting probability distributions can be used to label data to be used for more complex and time-consuming models, or as input for shaking table experiments. Moreover, the method can be useful to class natural waveforms after rescaling or spectral matching.

Acknowledgments This work would not have been possible without the huge work done by the people working on the CESMD, RAP, ITACA, K-Net, KiK-Net, GNS and IGN databases; the authors are deeply thankful for their continuous efforts in ensuring data collection and distribution. The authors also thank Prof Pierino Lestuzzi for distributing his research codes through his personal web-page. This work benefited from fruitful discussions with O. Scotti, C. Satriano, F. Lopez-Caballero and I. Zenter. Maria Lancieri thanks O. Zagordi for his valuable help in developing the naïve Bayesian classifier. The figures have been plotted using Matplotlib (Hunter 2007). The work has been partially funded by the “Group d’intérêt scientifique du Réseau Accélérométrique Permanent” (GIS-RAP).

References

Abrahamson NA (1992) Non-stationary spectral matching. Seismol Res Lett 63(1):30

Al Atik L, Abrahamson NA (2010) An improved method for nonstationary spectral matching. Earthq Spectra 26(3):601–617. doi:10.1193/1.3459159

Allahabadi R, Powell GH (1988) DRAIN-2DX user guide. Report No. UCB/EERC-88/06. Berkeley Earthquake Engineering Research Centre, University of California

Ambraseys N, Smit P, Douglas J, Margaris B, Sigbjornsson R, Olafsson S, Suhadolc P, Costa G (2004) Internet site for European strong-motion data. Bollettino Geofisica Teorica e Applicata 45:113–129

Asgarian B, Nojoumi RM, Alanjari P (2012) Performance-based evaluation of tall building using advanced intensity measures (case study: 30-story steel structure with frame-tube system). Struct Des Tall Spec Build 23(2):81–93, 10. doi:10.1002/tal.2013

Bommer JJ, Martinez-Pereira A (2000) The effective duration of earthquake strong motion. J Earthquake Eng 3:127–172. doi:10.1142/S1363246999000077

Brune JN (1970) Tectonic stress and the spectra of seismic shear waves from earthquakes. J Geophys Res 75(26):4997–5009. doi:10.1029/JB075i026p04997

Chapman MC (1999) On the use of elastic input energy for seismic hazard analysis. Earthq Spectra 15(4):607–635. doi:10.1193/1.1586064

Chopra AK (2011) Dynamics of structures. Prentice-Hall International Series in Civil Engineering and Engineering Mechanics. ISBN-13:978–0132858038

Cosenza E, Manfredi G (2000) Damage index and damage measures. Prog Struct Eng Mater 2(1):50–59. doi:10.1002/(SICI)1528-2716(200001/03)2:1<50::AID-PSE7>3.0CO;2-S

De Biasio M, Grange S, Dufour F, Allain F, Petre-Lazar I (2014) A simple and efficient intensity measure to account for nonlinear structural behavior. Earthq Spectra 30(4):1403–1426

Federal Emergency Management Agency FEMA (2003) HAZUS-MH MR1 advances engineering building module technical and user’s manual. Department of Homeland Security Emergency Preparedness and Response Directorate, Washington

Gasparini DA, Vanmarcke EH (1976) Simulated earthquake motions compatible with prescribed response spectra. MIT civil engineering research report R76–4. Massachusetts Institute of Technology, Cambridge

Giovenale P, Cornell A, Esteva L (2004) Comparing the adequacy of alternative ground motion intensity measures for the estimation of structural responses. Earthq Eng Struct 3:951–979. doi:10.1002/eqe.386

Grant D, Diaferia R (2012) Assessing adequacy of spectrum-matched ground motion for response history analysis. Earthq Eng Struct 42(9):1265–1280. doi:10.1002/eqe.2270

Hancock J, Bommer JJ (2006) A state of knowledge review of the influence of strong-motion duration on structural damage. Earthq Spectra 22:827–845. doi:10.1193/1.2220576

Hogg D (2008) Data analysis recipes: choosing the binning for a histogram. Cornell University Library. arXiv:0807.4820

Hunter JD (2007) Matplotlib: A 2D graphics environment. Comput Sci Eng 9:90–95

Jayaram N, Mollaioli F, Bazzurro P, De Sortis A, Bruno S (2010) Prediction of structural response of reinforced concrete frames subjected to earthquake ground motions. In: 9th US National and 10th Canadian conference of earthquake engineering, pp 428–437, Toronto, Canada

Katona TJ, Tóth L (2013) Damages indicators for post-earthquake condition assessment. Acta Geodaetica et Geophysica Hungarica 48(3):333–345. doi:10.1007/s40328-013-0021-9

Katsanos EI, Sextos AG, Manolis GD (2009) Selection of earthquake ground motion records: a state-of-the-art review from a structural engineering perspective. Soil Dyn Earthq Eng 30:157–169. doi:10.1016/j.soildyn.2009.10.005

Kostov MK (2005) Site specific estimation of cumulative absolute velocity. In: 18th international conference on structural mechanics in reactor technology (SMiRT 18), SMiRT 18–K03_4, Beijing, China

Kuehn NM, Scherbaum F (2010) A naïve Bayesian classifier for intensities using peak ground velocity and acceleration. Bull Seismol Soc Am 100:3278–3283. doi:10.1785/0120100082

Kuehn NM, Riggelsen C, Scherbaum F (2011) Modeling the joint probability of earthquake, site and ground-motion parameters using Bayesian networks. Bull Seismol Soc Am 101:235–249. doi:10.1785/0120100080

Laurendeau A, Cotton F, Bonilla LF (2012) Nonstationary simulation of strong ground motion time histories: application to the K-Net Japanese database. In: 15th World conference on earthquake engineering, Lisbon, Portugal

Lestuzzi P, Badoux M (2008) Génie Parasismique, 2nd edn. Presses polytechniques et universitaires romandes, Lausanne, Switzerland

Lestuzzi P, Schwab P, Koller M, Lacave C (2004) How to choose earthquake recordings for non-linear seismic analysis of structures. In: Proceedings of 13th world conference on earthquake engineering, Vancouver, British Columbia, Canada

Luco N, Bazzurro P (2007) Does amplitude scaling of ground motion records result in biased nonlinear structural drift response? Earthq Eng Struct 36(13):1813–1835. doi:10.1002/eqe.695

Luco N, Cornell A (2007) Structure-specific scalar intensity measures for near-source and ordinary earthquake ground motions. Earthq Spectra 23(2):357–392. doi:10.1193/1.2723158

Luco N, Manuel L, Baldava S, Bazzurro P (2005) Correlation of damage of steel moment resisting frames to a vector-valued set of ground motion parameters. In: Proceedings of 9th international conference on structural safety and reliability, Rome, Italy

Lucchini A, Mollaioli F, Monti G (2011) Intensity measures for response prediction of a torsional building subjected to bi-directional earthquake ground motion. Bull Earthq Eng 9(5):1499–1518. doi:10.1007/s10518-011-9258-2

Mollaioli F, Lucchini A, Cheng Y, Monti G (2013) Intensity measures for the seismic response prediction of base-isolated buildings. Bull Earthq Eng 11:1841–1866. doi:10.1007/s10518-013-9431-x

NEHRP Consultants Joint Venture (2011) Selecting and scaling earthquake ground motions for performing response-history analyses. www.nehrp.gov/pdf/nistgcr11-917-15.pdf

Pousse G, Bonilla LF, Cotton F, Margerin L (2006) Nonstationary stochastic simulation of strong ground motion time histories including natural variability: application to the K-Net Japanese database. Bull Seismol Soc Am 96(6):2103–2117. doi:10.1785/0120050134

Preumont A (1984) The generation of spectrum compatible accelerograms for the design of nuclear power plants. Earthq Eng Struct 12(4):481–497. doi:10.1002/eqe.4290120405

Silva WJ, Lee L (1987) WES RASCAL code for synthesizing earthquake ground motions. State-of-the-art for assessing earthquake hazard in the United States. Report 24. U.S. Army Engineers Waterways Experiment Station, Misc. Paper S-73-1

Song SG, Dalguer LA, Mai PM (2014) Pseudo-dynamic source modelling with 1-point and 2-points statistics of earthquake source parameters. Geophys J Int 196:1770–1786. doi:10.1093/gji/ggt479

Takeda T, Sozen MA, Nielsen NN (1970) Reinforced concrete response to simulated earthquakes. J Struct Div ASCE 96(12):2557–2573

Wasserman L (2004) All of statistics. A concise course in statistical inference. Springer, New York

Yamamoto Y, Baker J (2011) Stochastic model for earthquake ground motion using wavelet packets. In: 11th international conference on applications of statistics and probability in civil engineering, Zurich, Switzerland

Yakut A, Yilmaz H (2008) Correlation of deformation demands with ground motion intensity. J Struct Eng 134(12):1818–1828. doi:10.1061/(ASCE)0733-9445(2008)134:12(1818)
