earthquake forecasting: statistics and information


arXiv:1310.7173v1 [physics.geo-ph] 27 Oct 2013


Earthquake forecasting: Statistics and Information

V. Gertsik, M. Kelbert, A. Krichevets


Abstract We present an axiomatic approach to earthquake forecasting in terms of multi-component random fields on a lattice. This approach provides a method for constructing point estimates and confidence intervals for conditional probabilities of strong earthquakes under conditions on the levels of precursors. It also provides an approach for setting up a multilevel alarm system and for hypothesis testing for binary alarms. We compare different earthquake forecasts in terms of the increase of Shannon information. 'Forecasting' and 'prediction' of earthquakes are equivalent in this approach.

1 Introduction

The methodology of selecting and processing relevant information about the future occurrence of potentially damaging earthquakes has reached a reasonable level of maturity over recent years. However, the problem as a whole still lacks a comprehensive and generally accepted solution. Further efforts to optimize the methodology of forecasting would be productive and well justified.

A comprehensive review of the current state of knowledge in earthquake forecasting, with guidelines for its utilization, can be found in [Jordan et al., 2011]. Note that all methods of evaluating the probabilities of earthquakes are based on a combination of geophysical, geological and probabilistic models and considerations. Even the best and most detailed models used in practice are in fact only 'caricatures' of immensely complicated real processes.

A mathematical toolkit for earthquake forecasting is well presented in the paper [Harte and Vere-Jones, 2005]. This work is based on the modeling of earthquake sequences in terms of marked point processes. However, the mathematical technique used is quite sophisticated and does not provide direct practical tools to investigate how the structure of the spatio-temporal random fields of precursors is related to the appearance of strong earthquakes.

Institute of Earthquake Prediction Theory and Mathematical Geophysics RAS, Moscow, RF, [email protected]

Dept. of Mathematics, Swansea University, Singleton Park, Swansea, SA2 8PP, UK. Institute of Earthquake Prediction Theory and Mathematical Geophysics RAS, [email protected]

Lomonosov MSU, Department of Psychology, Moscow, Russia, [email protected]

The use of multicomponent lattice models (instead of marked point processes) gives a novel and more elementary way of investigating these relations. Discretization of space and time allows us to split the problem in question into two separate tasks. The first task is the selection of relevant precursors, i.e., observable and theoretically explained physical and geological facts which are causally related to a high probability of strong earthquakes. In particular, this task involves the development of models of seismic events and the computation of probabilities of strong earthquakes in the framework of these models. Such probabilities are used as precursors in the second task.

The second task is the development of a methodology of working with these precursors in order to extract the maximum information about the probabilities of strong earthquakes. This is the main topic of this paper.

Our approach allows us to obtain the following results:

• Estimates of probabilities of strong earthquakes for given values of precursors are calculated in terms of the frequencies in historical data.

• Confidence intervals are also constructed to provide reasonable bounds on the precision of the point estimates.

• Methods of prediction (i.e., binary alarm announcement [Keilis-Borok, 1996], [Keilis-Borok, Kossobokov, 1990], [Holiday et al., 2007]) and forecasting (i.e., calculating probabilities of earthquakes [Jordan et al., 2011], [Kagan and Jackson, 2000], [Harte and Vere-Jones, 2005], [WGNCEP]) are equivalent in the following sense: setting a threshold for the probability of earthquakes allows us to update the alarm level. On the other hand, knowledge of the alarm domain based on historical data allows us to evaluate the probabilities of earthquakes. In a sense, prediction is equivalent to hypothesis testing as well, see Section 11.

• In our scheme we propose a scalar statistic which is the ratio of the actual increment of information to the maximal possible increment of information. This statistic allows us to linearly order all possible forecasting algorithms. Nowadays the final judgement about the quality of earthquake forecasting algorithms is left to experts. This arrangement puts the problem outside the scope of the natural sciences, which try to avoid subjective judgements.

The foundation of our proposed scheme is the assumption that the seismic process is random and cannot be described by a purely deterministic model. Indeed, if the seismic process were deterministic, then the inaccuracy of a forecast could be explained by the incompleteness of our knowledge about the seismic events and the imprecision of the available information. This may explain, at least in principle, attacks from the authorities addressed to geophysicists who failed to predict a damaging earthquake. However, these attacks have no grounds if one accepts that the seismic process is random. At the end of the last century (February-April 1999) a group of leading seismologists organized a debate via the web to form a collective opinion of the scientific community on the topic: 'Is the reliable prediction of individual earthquakes a realistic scientific goal?' (see http://www.nature.com/nature/debates/earthquake/).

Despite a considerable divergence on peripheral issues, all experts taking part in the debate agreed on the following main principles:

• the deterministic prediction of an individual earthquake, within sufficiently narrow limits to allow a planned evacuation programme, is an unrealistic goal;

• forecasting of at least some forms of time-dependent seismic hazard can be justified on both physical and observational grounds.

The following facts form the basis of our agreement with this point of view.

The spring-block Burridge-Knopoff model, generally accepted as a mathematical tool to demonstrate the power-law Gutenberg-Richter relationship between the magnitude and the number of earthquakes, involves generators of chaotic behaviour or dynamic stochasticity. In fact, the nonlinearity makes the seismic process stochastic: a small change in the shift force may lead to completely different consequences. If the force is below the threshold of static friction the block is immovable; if the force exceeds this threshold the block starts moving, producing an avalanche of unpredictable size.

This mechanism is widespread in the Earth. Suppose that the propagating front of an earthquake approaches a region of enhanced rock strength. The earthquake magnitude depends on whether this region is destroyed or remains intact. In the first case the front moves further on, in the second case the earthquake remains localized. So, if the strength of the rocks is below the threshold the first scenario prevails; if it is above the threshold the second scenario is realized. The whole situation is usually labelled as a butterfly effect: infinitesimally small changes of strength and stress lead to macroscopic consequences which cannot be predicted because such an infinitesimal change is below any precision of measurement. For these reasons determinism of seismic processes looks more doubtful than stochasticity.

The only comment we would like to contribute to this discussion is that forecasting algorithms based exclusively on empirical data, without consistent physical models, can hardly be effective in practice (see Sections 12, 13 for more details).

In conclusion we discuss the problem of precursor selection and present a theorem by A. Krichevets stating that using a learning sample for an arbitrary feature selection in pattern recognition is useless in principle.

Finally, note that our approach may well be applicable to the space-time forecasting of other extreme events outside the scope of earthquake prediction.

2 Events and precursors on the lattice

In order to define estimates of probabilities of strong earthquakes explicitly, we discretize the two-dimensional physical space and time, i.e., introduce a partition of three-dimensional space-time into rectangular cells, with the space partition in the shape of squares and the time partition in the shape of intervals. These cells should not intersect, to avoid ambiguity in computing the frequencies for each cell. In fact, the space cells cannot be perfect squares because of the curvature of the earth's surface, but this may be neglected if the region of forecasting is not too large.

So, we obtain a discrete set $\Omega_K$ with $N = I \times J \times K$ points which is defined as follows. Let us select a rectangular domain $A$ of the two-dimensional lattice with $I \times J$ points $x = (x_i, y_j)$; $x_i = a \times i$, $i = 1, \dots, I$, and $y_j = a \times j$, $j = 1, \dots, J$, where $a$ is the step of the lattice. A cell in $\Omega_K$ takes the shape of a parallelepiped of height $\Delta t$ with a square base. Clearly, any point in $\Omega_K$ has coordinates $(x_i, y_j, t_k)$, $t_k = t_0 + k\Delta t$; $k = 0, \dots, K$.

We say that a seismic event happens if an earthquake with magnitude greater than some pre-selected threshold $M_0$ is registered, and this earthquake is not a foreshock or aftershock of another, more powerful earthquake (we put aside the technical problem of identifying foreshocks and aftershocks in a sequence of seismic events). For any cell in our space-time grid we define an indicator of an event, i.e., a binary function $h(i, j, k)$. This function takes the value 1 if at least one seismic event is registered in the given cell and 0 otherwise. Suppose that for all points $(x_i, y_j, t_k)$ the value of a vector precursor $\mathbf{f}(i, j, k) = \{f_q(i, j, k),\ q = 1, \dots, Q\}$ is given. The components $f_q(i, j, k)$, $q = 1, \dots, Q$, of the precursor are scalar statistics constructed on the basis of our understanding of the phenomena that precede a seismic event.
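As an illustration of this construction, the following Python sketch fills the indicator $h$ from a list of main shocks. The catalog format `(x, y, t, magnitude)`, the function name `event_indicator`, the 0-based cell indices and the half-open cell boundaries are assumptions of the sketch and not part of the scheme itself; foreshocks and aftershocks are assumed to have been removed beforehand.

```python
from math import floor


def event_indicator(catalog, a, dt, t0, I, J, K, m0):
    """Return h as a dict {(i, j, k): 0 or 1} over an I x J x K grid.

    catalog: iterable of (x, y, t, magnitude) tuples (hypothetical format).
    Cell (i, j, k) is assumed to cover
    [a*i, a*(i+1)) x [a*j, a*(j+1)) x [t0 + k*dt, t0 + (k+1)*dt).
    """
    h = {(i, j, k): 0 for i in range(I) for j in range(J) for k in range(K)}
    for x, y, t, magnitude in catalog:
        if magnitude <= m0:
            continue  # only magnitudes greater than the threshold M0 count
        cell = (floor(x / a), floor(y / a), floor((t - t0) / dt))
        if cell in h:  # events outside the grid are ignored
            h[cell] = 1
    return h
```

Several events in the same cell still give the value 1, in line with the definition of the indicator as "at least one seismic event".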

Remark 1 Note that specifying an alarm domain as a circle with center at a lattice site and radius proportional to the maximal magnitude of the forecasted earthquakes leads to a contradiction. Indeed, suppose we announce an alarm for earthquakes with minimal magnitude 6 in a domain $A_6$. Obviously, the same alarm should be announced in the domain $A_7$ as well. By definition $A_6 \subset A_7$, so we expect an earthquake with magnitude at least 7 and do not expect an earthquake with magnitude at least 6 in the domain $A_7 \setminus A_6$. But this is absurd.

3 Mathematical assumptions

A number of basic assumptions form the foundation of the mathematical technique of earthquake forecasting. In the framework of a mathematical theory they can be treated as axioms, but they are, in fact, an idealization and simplification of the description of the real phenomena. Below we summarize the basic assumptions which are routinely used in existing studies of seismicity and in algorithms of earthquake forecasting, even though the authors do not always formulate them explicitly.

We accept the following assumptions or axioms of the mathematical theory:

(i) The multicomponent random process $(h(i, j, k), \mathbf{f}(i, j, k))$, describing the joint evolution of the vector precursors and the indicator of seismic events, is stationary.

This assumption provides an opportunity to investigate the intrinsic relations between the precursors and the seismic events based on historical data. In other words, the experience obtained by analysing the series of events in the past is applicable to the future, as the properties of the process do not depend on time.

In reality, this assumption holds only approximately and for a restricted time period. Indeed, plate tectonics destroys the stationarity for a number of reasons, including some purely geometrical considerations. For instance, the movements of the plates lead to their collisions and partial destruction, and also change their shapes. Nevertheless, the seismic process can be treated as quasi-stationary for considerable periods of time. At a time when the system passes from one quasi-stationary regime to another (say, nowadays many researchers speak about abrupt climate change) the reliability of any prediction, including the forecast of seismic events, is severely restricted.

(ii) The multicomponent random process $(h(i, j, k), \mathbf{f}(i, j, k))$ is ergodic.

Any quantitative characteristic of seismicity more representative than the registration of an individual event is, in fact, the result of averaging over time. For instance, the Gutenberg-Richter law, applied to a given region, relates the magnitude to the average number of earthquakes, where the averaging is taken over a specific time interval. In order to associate a proper probabilistic characteristic of the process with the time averaging and to make a forecast about the future, one naturally needs the assumption of ergodicity. This means exactly that any average over a time interval $[0, T]$ converges to the stochastic average as $T \to \infty$. In view of ergodicity one can also construct unbiased and consistent estimates of conditional probabilities of strong earthquakes under the condition that the precursors take values in some intervals. Naturally, these estimates are the frequencies of observed earthquakes, i.e., ratios of the number of cells with seismic events and prescribed values of precursors to the total number of cells with the prescribed values of precursors. (Recall that an unbiased point estimate $\hat\theta$ of a parameter $\theta$ satisfies the condition $\mathbb{E}\hat\theta = \theta$, and a consistent estimate converges to the true value $\theta$ when the sample size tends to infinity.)
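The frequency estimate described in this paragraph can be sketched in a few lines of Python. The dictionary layout keyed by the cell `(i, j, k)` and the function name `conditional_frequency` are assumptions made for illustration only.

```python
def conditional_frequency(h, f, lo, hi):
    """Estimate Pr{h = 1 | lo <= f < hi} as a frequency over cells.

    h, f: dicts keyed by the cell (i, j, k); h holds 0/1 indicators and
    f holds the values of one scalar precursor (hypothetical layout).
    Returns None when no cell has its precursor value in [lo, hi).
    """
    selected = [cell for cell, value in f.items() if lo <= value < hi]
    if not selected:
        return None
    # cells with an event and the prescribed precursor values,
    # divided by all cells with the prescribed precursor values
    return sum(h[cell] for cell in selected) / len(selected)
```

With few cells in the interval the returned ratio is a very noisy estimate, which is the practical price of conditioning on narrow ranges of the precursor.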

(iii) Any statement about the value of the indicator of a seismic event $h(i, j, k)$ in the cell $(i, j, k)$ or its probability should be based on the values of the precursor $\mathbf{f}(i, j, k)$ only.

This assumption means that the precursor in the given cell accumulates all the relevant information about the past and the information about the local properties of the area that may be used for the forecast of the seismic event in this cell. In other words, the best possible precursor is used (which is not always the case in practice). As in the other cases, this assumption is only an approximation to reality, and the quality of a forecast depends on the quality of the selection and accumulation of relevant information in the precursors.

Below we present some corollaries and further specifications.

(iii-a) For any $k$ the random variables $h(i, j, k)$, $i = 1, \dots, I$, $j = 1, \dots, J$, are conditionally independent under the condition that the values of any measurable function $u(\mathbf{f}(i, j, k))$ of the precursors $\mathbf{f}(i, j, k)$, $i = 1, \dots, I$, $j = 1, \dots, J$, are fixed.

In practice this assumption means that the forecast for the time $t_k = t_0 + k\Delta t$ cannot be affected by the values related to the future time interval $(t_k, t_k + \Delta t]$. In reality all of these events may be dependent, but our forecast does not use information from the future after $t_k$.

(iii-b) The conditional distribution of the random variable $h$ at a given cell depends on the values of the precursors $\mathbf{f}$ at this cell and is independent of all other variables.

(iii-c) The conditional probabilities $\Pr_{ij}\{h \mid u(\mathbf{f})\}$ of the indicator of seismic events $h$ in the cell $(i, j, k)$, under the condition $u(\mathbf{f})$ in this cell, do not depend on the position of the cell in space (the time index $k$ related to this probability may be dropped due to the stationarity of the process).

In other words, the rule for computing the conditional probability $\Pr_{ij}\{h \mid u(\mathbf{f})\}$ based on the values of precursors is the same for all cells, and the space indices of the probability $\Pr$ may be dropped. This condition is widely accepted in constructions of forecasting algorithms but rarely formulated explicitly. However, the probability of a seismic event depends to a large extent on the local properties of the area. Hence, the quality of the forecasting depends on how adequately these properties are summarized in the precursors. This formalism properly describes the inhomogeneity of the physical space because the stationary joint distribution $\Pr_{ij}\{h, \mathbf{f} \le x\}$ for an arbitrary precursor $\mathbf{f}$ depends, in general, on the position of the cell in the domain $A$.

Below we will use the distributions of precursors and indicators of seismic events in the domain $A$ that do not depend on the spatial coordinates and have the following form:

$$\Pr_A\{h, \mathbf{f} \le x\} = \frac{1}{I \cdot J} \sum_{(i,j) \in A} \Pr_{ij}\{h, \mathbf{f} \le x\},$$

$$P_A(x) \equiv \Pr_A\{\mathbf{f} \le x\} = \frac{1}{I \cdot J} \sum_{(i,j) \in A} \Pr_{ij}\{\mathbf{f} \le x\},$$

$$p_A \equiv \Pr_A\{h = 1\} = \frac{1}{I \cdot J} \sum_{(i,j) \in A} \Pr_{ij}\{h = 1\}.$$

(iii-d) Note that assumption (iii) implies that the conditional probabilities $\Pr\{h \mid u(\mathbf{f})\}$ are computed via the probabilities $\Pr_A\{h, \mathbf{f} \le x\}$ only.

The properties listed above are sufficient to obtain point estimates for the conditional probabilities of seismic events under conditions formulated in terms of the values of precursors. However, an additional assumption is required for testing the forecasting algorithm:

(iv) The random variables $\mathbf{f}(i, j, k)$ are conditionally independent under the condition that $h(i, j, k) = 1$.

Again, these conditions are not exactly true; however, they may be treated as a reasonable approximation to reality. Indeed, if the threshold $M_0$ is sufficiently high, then strong earthquakes may be treated as rare events, and the cells where they are observed are far apart with a high probability. Any two events related to cells separated by a time interval $\Delta t$ are asymptotically independent as $\Delta t \to \infty$ because the seismic process has decaying correlations (the mixing property in the language of random processes). The loss of dependence (or decaying memory) is related to physical phenomena such as healing of defects in the rocks, relaxation of strength due to viscosity, etc. As usual in physical theories, we accept an idealized model of the real phenomena, applying this asymptotic property for large but finite intervals between localizations of seismic events.

The independence of strong earthquakes is not a new assumption: in the case of continuous space-time it is equivalent to the assumption that the locations of these events form a Poisson random field. (Note that the distribution of strong earthquakes should be homogeneous in space, because there is no a priori information about the heterogeneity.) The Poisson hypothesis is used in many papers, see, e.g., [Harte and Vere-Jones, 2005]. It is very natural for the analysis of the 'tails' of the Gutenberg-Richter law for large magnitudes [Pisarenko et al., 2008]. Summing up, the development of a strict mathematical theory of earthquake forecasting does not require any additional assumptions beyond those routinely accepted in the existing algorithms but usually not formulated explicitly.

4 The standard form for precursors

The correct solution of the forecasting problem given the values of precursors $\mathbf{f}(i, j, k) = (f_1(i, j, k), \dots, f_Q(i, j, k))$ is provided by the estimate of the conditional probability $\Pr\{h \mid \mathbf{f}(i, j, k)\}$ of the indicator of a seismic event in the cell $(i, j, k)$. In practice this solution may be difficult to obtain because the number of events in the catalog is not sufficient.

Indeed, the range of values of a scalar precursor is usually divided into a number $M$ of intervals, and only a few events are registered for any such interval. For a $Q$-dimensional precursor the number of $Q$-dimensional rectangles covering the range is already $M^Q$, and the majority of them contain no events. Only a small number of such rectangles contain one or more events, so the precision of such an estimate of the conditional probability is usually too low to have any practical value.
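The counting argument can be made concrete with a small Python sketch. The function name and the numbers in the usage note are hypothetical and serve only to illustrate the orders of magnitude involved.

```python
def max_fraction_of_occupied_rectangles(M, Q, L):
    """Upper bound on the share of the M**Q rectangles that contain an event.

    Each of the L event cells falls into exactly one rectangle, so at most
    min(L, M**Q) rectangles can be non-empty.
    """
    total = M ** Q
    return min(L, total) / total
```

For instance, with the illustrative values M = 10 intervals per component, Q = 4 components and L = 100 event cells, at most 100 of the 10 000 rectangles contain an event, and each occupied rectangle typically holds a single event.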

For this reason one constructs a new scalar precursor in the form of a scalar function of the components of the vector precursor, and optimizes its predictive power. This approach leads to an additional complication, as the units of measurement and the physical sense of different components of the precursor are substantially different. In order to overcome this problem one uses a transformation that reduces all the components of the precursor to a standard form with the same sense and range of values.

Let us transform all the precursors $f_q(i, j, k)$, $q = 1, \dots, Q$, to variables with values in $[0, 1]$ providing estimates of conditional probabilities. So, after some transformation $F$ we obtain an estimate of $\Pr\{h = 1 \mid u(f(i, j, k)) = 1\}$, where $u$ is the characteristic function of some interval $B$, i.e., the probability of the event $h(i, j, k) = 1$ under the condition that this precursor takes a value $f(i, j, k) \in B$.

The transformation $F$ of a scalar precursor $f(i, j, k)$ is defined as follows. Fix an arbitrarily small number $\varepsilon > 0$. Let $L$ be the number of cells $(i, j, k)$ such that $h(i, j, k) = 1$, and let $Z_l$, $l = 1, \dots, L$, be the order statistics, i.e., the values $f(i, j, k)$ in these cells listed in non-decreasing order. Define a new sequence $z_m$, $m = 0, \dots, M$, from the order statistics $Z_l$ by the following recursion: $z_0 = -\infty$, and $z_m$ is defined as the first point in the sequence $Z_l$ such that $z_m - z_{m-1} \ge \varepsilon$. Next, construct the sequence $z^*_m = z_m + (z_{m+1} - z_m)/2$, $m = 1, \dots, M - 1$, and add the auxiliary elements $z^*_0 = -\infty$, $z^*_M = \infty$. Define also a sequence $n_m$, $m = 1, \dots, M$, where $n_m$ equals the number of values in the sequence $Z_l$ such that $z^*_{m-1} \le Z_l < z^*_m$. Finally, define the numbers $N_m$, $m = 1, \dots, M$, counting all cells such that $z^*_{m-1} \le f(i, j, k) < z^*_m$. Observe that $\sum_{m=1}^{M} n_m = L$, $\sum_{m=1}^{M} N_m = N$, and use the ratio

$$\lambda = \frac{L}{N} \tag{1}$$

as an estimate of the unconditional probability of a seismic event in a given cell

$$p_A \equiv \Pr_A\{h(i, j, k) = 1\} = \int_{-\infty}^{\infty} \Pr\{h = 1 \mid x\}\, dP_A(x). \tag{2}$$

The transformation $F$ is defined as follows:

$$g = Ff(i, j, k) = \frac{n_m}{N_m}, \quad \text{if } z^*_{m-1} \le f(i, j, k) < z^*_m, \quad m = 1, \dots, M. \tag{3}$$
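The construction of the transformation $F$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the precursor values and indicators are passed as two flat sequences over all $N$ cells, all precursor values are finite, at least one cell is given, and the name `build_transform` is hypothetical.

```python
from bisect import bisect_right
from math import inf


def build_transform(f_values, h_values, eps):
    """Return (F, edges, n_m, N_m) following the construction of (3).

    f_values: scalar precursor value for every cell (finite numbers).
    h_values: 0/1 event indicator for every cell, in the same order.
    eps: minimal gap between consecutive thresholds z_m.
    """
    # order statistics Z_l: precursor values in event cells, non-decreasing
    Z = sorted(f for f, h in zip(f_values, h_values) if h == 1)
    # z_0 = -inf, so z_1 is the first Z_l; then keep the first Z_l whose
    # distance to the previously kept point is at least eps
    z = []
    for value in Z:
        if not z or value - z[-1] >= eps:
            z.append(value)
    # edges z*_0 = -inf, midpoints z*_m, z*_M = +inf
    edges = [-inf] + [(z[m] + z[m + 1]) / 2 for m in range(len(z) - 1)] + [inf]

    def interval(value):
        # 0-based index m with edges[m] <= value < edges[m + 1]
        return bisect_right(edges, value) - 1

    n_m = [0] * (len(edges) - 1)  # event cells per interval
    N_m = [0] * (len(edges) - 1)  # all cells per interval
    for f, h in zip(f_values, h_values):
        m = interval(f)
        N_m[m] += 1
        n_m[m] += h

    def F(value):
        m = interval(value)
        return n_m[m] / N_m[m]

    return F, edges, n_m, N_m
```

When the catalog contains events, every interval contains at least one of the points $z_m$, so the denominators $N_m$ are positive and the returned values are positive frequencies.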

This definition implies that the transformation $F$ replaces the value of the precursor by a frequency, i.e., the ratio of the number of cells containing a seismic event with values of the precursor in $[z^*_{m-1}, z^*_m)$ to the number of cells with the value of the precursor in this range. These frequencies are the natural estimates of the conditional probabilities $\Pr\{h = 1 \mid z^*_{m-1} \le f < z^*_m\}$, $m = 1, \dots, M$, computed with respect to the stationary distribution $P_A(x)$:

$$\Pr\{h = 1 \mid z^*_{m-1} \le f < z^*_m\} = \frac{\int_{z^*_{m-1}}^{z^*_m} \Pr\{h = 1 \mid x\}\, dP_A(x)}{\int_{z^*_{m-1}}^{z^*_m} dP_A(x)}. \tag{4}$$

(This conditional probability can be written as $\Pr\{h = 1 \mid u(f)\}$, where $u$ is the characteristic function of the interval $[z^*_{m-1}, z^*_m)$.) The function $g$ has a stepwise shape, and the length of each step is bounded from below by $\varepsilon$. It can be checked that the limit $\lim_{\varepsilon \to 0} \lim_{K \to \infty} g = \Pr\{h = 1 \mid f\}$ exists.

The estimates of conditional probabilities in terms of the function $g$ are quite rough because typically the numbers $n_m$ are of order 1. As a final result we will present below sharper but less detailed estimates of conditional probabilities, together with confidence intervals for them.


5 Combinations of precursors

There are many ways to construct a single scalar precursor based on the vector precursor $(Ff_q,\ q = 1, \dots, Q)$. Each such construction inevitably contains a number of parameters or degrees of freedom. These parameters (including the parameters used for the construction of the precursors themselves) should be selected so as to optimize the predictive power of the forecasting algorithm. The optimization procedure will be presented below; its goal is to adapt the parameters of the precursors to a given catalog of earthquakes, that is, to obtain the best possible retrospective forecast. However, this adaptation procedure creates "ghost" information related to the specific features of the given catalog but not present in the physical properties of real seismicity. This "ghost" information will not be reproduced if the algorithm is applied to another catalog of earthquakes. To get rid of this "ghost" information it is necessary to increase the volume of the catalog and to reduce the number of free parameters. Clearly, the first goal requires a considerable increase of the observation period and may be achieved in the remote future only. So, one concentrates on the reduction of the number of degrees of freedom. The simplest ansatz including $Q - 1$ parameters is the linear combination

$$f^* = Ff_1 + \sum_{q=2}^{Q} c_{q-1} Ff_q. \tag{5}$$

As a strictly monotonic function of a precursor is a precursor itself, the log-linear combination is an equally suitable choice:

$$f^* = \ln(Ff_1) + \sum_{q=2}^{Q} c_{q-1} \ln(Ff_q). \tag{6}$$

Here $c_q$, $q = 1, \dots, Q - 1$, are free parameters. The result of the procedure has the form $g = Ff^*$.
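The two combinations (5) and (6) can be written down directly in Python. In this sketch the list `g` is assumed to hold the already transformed components $Ff_1, \dots, Ff_Q$ for one cell and `c` the $Q - 1$ free parameters; the function names are hypothetical.

```python
from math import log


def linear_combination(g, c):
    """Combination (5): g[0] + sum over q >= 2 of c[q - 2] * g[q - 1]."""
    if len(c) != len(g) - 1:
        raise ValueError("expected Q - 1 parameters for Q components")
    return g[0] + sum(cq * gq for cq, gq in zip(c, g[1:]))


def log_linear_combination(g, c):
    """Combination (6); every transformed component must be positive."""
    if len(c) != len(g) - 1:
        raise ValueError("expected Q - 1 parameters for Q components")
    return log(g[0]) + sum(cq * log(gq) for cq, gq in zip(c, g[1:]))
```

Both functions return the combined scalar for a single cell; applying the transformation $F$ to these values over all cells then yields the combined precursor.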

6 Alarm levels, point and interval estimations

In view of (3) the precursor $g$ is the set of estimates for the probabilities $\Pr\{h = 1 \mid z^*_{l-1} \le f(i, j, k) < z^*_l\}$, $l = 1, \dots, L(f)$. Its serious drawback is that typically the intervals $\{z^*_{l-1} \le f(i, j, k) < z^*_l\}$ correspond to single events, and therefore the precision of these estimates is very low (the confidence intervals discussed below may be taken as a convenient measure of precision). In order to increase the precision it is recommended to use larger cells containing a larger number of events, that is, a coarser covering of the space where the precursor takes its values. In a sense, the precision of the estimation and the localization of the precursor values in its time-space region are related by a kind of "uncertainty principle": the more precise an estimate one wants to get, the coarser is the time-space range of their values, and vice versa.

We adopt the following approach in order to achieve a reasonable compromise.

1. For fixed thresholds $a_s$, $s = 1, \dots, S + 1$, with $a_1 = 1$, $a_s > a_{s+1}$, $a_{S+1} = 0$, we define $S$ possible alarm levels $a_{s+1} \le g(i, j, k) < a_s$ and subsets $\Omega_s$, $s = 1, \dots, S$, of the set $\Omega_K$ corresponding to the alarm levels, i.e., $\Omega_s$ is the set of cells of $\Omega_K$ such that $a_{s+1} \le g(i, j, k) < a_s$.

There are different ways to choose the number $S$ of alarm levels and the thresholds $a_s$, $s = 2, \dots, S$. Say, fix $S = 5$ and select $a_s = 10^{-\alpha(s-1)}$. This is a natural choice of the alarm levels because at $\alpha = 1$ it corresponds to the decimal places of the estimate of the conditional probability given by the precursor. The problem with $S = 2$, i.e., a two-level alarm, may be reduced to hypothesis testing and is discussed in more detail below.

2. Compute the point estimates $\theta_s$ of the probabilities $\Pr\{h = 1 \mid a_{s+1} \le g(i, j, k) < a_s\}$, $s = 1, \dots, S$, obtained via the distribution $P_\Omega(x)$ of the precursor $g$ in the same way as in (4). Property (iv) implies that for any domain $\Omega_s$ the binary random variables $h(i, j, k)$ are independent and identically distributed, i.e.,

$$\Pr\{h = 1 \mid a_{s+1} \le g(i, j, k) < a_s\} \equiv p_s,$$

and the unbiased estimate of $p_s$ takes the form

$$\theta_s = \frac{m_s}{n_s}, \tag{7}$$

where $n_s$ stands for the number of cells in the domain $\Omega_s$, and $m_s$ denotes the number of cells in $\Omega_s$ containing seismic events.

3. The random variable $m_s$ takes integer values between 0 and $n_s$. The probabilities of these values are computed via the well-known Bernoulli formula $\Pr(m_s = k) = \binom{n_s}{k} p_s^k (1 - p_s)^{n_s - k}$. Let us specify the confidence interval covering the unknown parameter $p_s$ with confidence level $\gamma$. In view of the integral de Moivre-Laplace theorem, for $n_s$ large enough the statistic $\frac{(\theta_s - p_s)\sqrt{n_s}}{\sqrt{p_s(1 - p_s)}}$ is approximately Gaussian $N(0, 1)$ with zero mean and unit variance. Note that the values $n_s$ increase with time. Omitting straightforward calculations and replacing the parameter $p_s$ by its estimate $\theta_s$, we obtain that $\theta^-_s < p_s < \theta^+_s$, where $\theta^-_s = \theta_s - t_\gamma \frac{\sqrt{\theta_s(1 - \theta_s)}}{\sqrt{n_s}}$, $\theta^+_s = \theta_s + t_\gamma \frac{\sqrt{\theta_s(1 - \theta_s)}}{\sqrt{n_s}}$, and $t_\gamma$ is the solution of the equation $\Phi(t_\gamma) = \frac{\gamma}{2}$. Here $\Phi$ stands for the Laplace function of the standard Gaussian distribution, $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_0^t e^{-x^2/2}\, dx$; in terms of the cumulative distribution function $F_{N(0,1)}$ the same condition reads $F_{N(0,1)}(t_\gamma) = (1 + \gamma)/2$.

4. As a result of these considerations we introduce 'the precursor of alarms' which indicates the alarm level: $R(\mathbf{f}(i, j, k)) = s(i, j, k)$. It will be used for calculating the point estimate and the confidence interval in the form $\theta^-_{s(i,j,k)} < \theta_{s(i,j,k)} < \theta^+_{s(i,j,k)}$. This result will be used in the prospective forecasting procedure.
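Steps 1 to 3 can be sketched in Python as follows. The function names are hypothetical, the thresholds are taken to decrease from 1 to 0 as the level inequalities require, and the quantile $t_\gamma$ is read as the two-sided Gaussian quantile, i.e. the point where the standard normal cumulative distribution function equals $(1 + \gamma)/2$; that reading is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def thresholds(S, alpha):
    """Return [a_1, ..., a_{S+1}] with a_s = 10**(-alpha*(s-1)), a_{S+1} = 0."""
    return [10.0 ** (-alpha * (s - 1)) for s in range(1, S + 1)] + [0.0]


def alarm_level(g, a):
    """Return s in 1..S with a_{s+1} <= g < a_s, or None if no level matches."""
    for s in range(1, len(a)):
        # a[s - 1] holds a_s and a[s] holds a_{s+1} (0-based list)
        if a[s] <= g < a[s - 1]:
            return s
    return None  # e.g. g == 1, which the half-open levels do not cover


def estimate_with_interval(m_s, n_s, gamma):
    """Return (theta_minus, theta, theta_plus) for m_s event cells out of n_s."""
    theta = m_s / n_s  # point estimate (7)
    t_gamma = NormalDist().inv_cdf((1 + gamma) / 2)  # two-sided quantile
    half_width = t_gamma * sqrt(theta * (1 - theta) / n_s)
    return theta - half_width, theta, theta + half_width
```

The interval relies on the Gaussian approximation, so for levels with small $n_s$ or with $\theta_s$ close to 0 or 1 it should be read only as a rough indication of precision.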

7 The information gain and the precursor quality

The construction of a ’combined’ precursor R involves parameters from formula (5) or

(6) as well as parameters which appear in definition of each individual precursors fq . It

is natural to optimize the forecasting algorithm in such a way that the information gain

related to the seismic events is maximal. In one-dimensional case the information gain

as a measure of the forecast efficiency was first intoduced by Vere-Jones [Vere-Jones,

1998]. Here we exploit his ideas in the case of multidimensional space-time process.

Let us recall the notions of entropy and information. Putting aside the mathematical subtleties (see [Kelbert, Suhov, 2013] for details), we follow below the intuitive approach of the book [Prohorov, Rozanov, 1969]. The information contained in a given text is, basically, the length of the shortest compression of this text without loss of its content. The smallest length S of a sequence of digits 0 and 1 (in a binary code) needed for enumerating N different objects satisfies the relations 0 ≤ S − log2 N ≤ 1. So, the quantity S ≈ log2 N characterizes the shortest length of coding the numbers of N objects.

Consider an experiment that can produce one of N non-intersecting events $A_1, \dots, A_N$ with probabilities $q_1, \dots, q_N$, respectively, $q_1 + \dots + q_N = 1$. A message informing about the outcomes of n such independent identical experiments may be written as a sequence $(A_{i_1}, \dots, A_{i_n})$, where $A_{i_k}$ is the outcome of experiment k. For a long enough series of observations the frequency $n_i/n$ of the event $A_i$ is very close to its probability $q_i$. It means that in our message $(A_{i_1}, \dots, A_{i_n})$ the event $A_i$ appears $n_i \approx n q_i$ times. The number of such messages is

$$N_n = \frac{n!}{n_1! \cdots n_N!}.$$

By the Stirling formula the length of the shortest coding of these messages is

$$S_n \approx \log_2 N_n \approx -n \sum_{i=1}^{N} q_i \log_2 q_i.$$

The quantity $S_n$ measures the uncertainty of the given experiment before its start; in our case we are looking for one of the possible outcomes of n independent trials. The specific measure of uncertainty for one trial,

$$\frac{1}{n} S_n = \frac{1}{n} S_n(q_1, \dots, q_N) = -\sum_{i=1}^{N} q_i \log_2 q_i,$$

is known as Shannon’s entropy of the distribution $q_1, \dots, q_N$ (in the physical literature it is also known as a measure of chaos or disorder). After one trial the uncertainty about the future outcomes decreases by the value $S = S_n - S_{n-1}$; this decrement equals the information gain $I = S$ obtained as a result of a single trial.
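The Stirling approximation above can be checked numerically. The sketch below, with a hypothetical distribution q and message length n chosen only for illustration, compares Shannon’s entropy with the exact value of (1/n) log2 of the number of typical messages; the log-gamma function is used so that the factorials do not overflow.

```python
from math import lgamma, log, log2


def shannon_entropy(q):
    """Entropy in bits of a discrete distribution q_1, ..., q_N."""
    return -sum(qi * log2(qi) for qi in q if qi > 0)


def log2_message_count(counts):
    """log2 of n! / (n_1! ... n_N!), computed via log-gamma to avoid overflow."""
    n = sum(counts)
    ln_count = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return ln_count / log(2)


q = [0.5, 0.25, 0.25]               # hypothetical outcome probabilities
n = 100_000                         # number of independent trials
counts = [int(n * qi) for qi in q]  # typical message: n_i = n * q_i
per_trial = log2_message_count(counts) / n
print(shannon_entropy(q), per_trial)
```

The two printed numbers differ by a term of order (log n)/n, which vanishes as the message length grows.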

The quantity

$$S(h) = -p_A \log_2 p_A - (1 - p_A) \log_2 (1 - p_A) \qquad (8)$$

is the (unconditional) entropy of the distribution of the indicator h of a seismic event in a space-time cell in the absence of any precursors. The conditional entropy $S(h \mid a_{s+1} \le g < a_s)$, under the condition that the alarm level s is set up in the cell (i, j, k), equals

$$S(h \mid a_{s+1} \le g < a_s) = -p_s \log_2 p_s - (1 - p_s) \log_2 (1 - p_s).$$

The expected conditional entropy $S_R(h)$ of the indicator of seismic events, where the averaging is taken over the distribution of the precursor R, takes the form

$$S_R(h) = -\sum_{s=1}^{S} \left[ p_s \log_2 p_s + (1 - p_s) \log_2 (1 - p_s) \right] P_A(a_{s+1} \le g < a_s). \qquad (9)$$

We conclude that the knowledge of the precursor values helps to reduce the uncertainty about the future experiment by $S(h) - S_R(h)$, which is precisely the information $I(R, h)$ obtained from the precursor. Taking into account (8), (9) and the fact that

$$p_A = \sum_{s=1}^{S} p_s P_A(a_{s+1} \le g(i, j, k) < a_s),$$

we specify the information gain as

$$I(R, h) = \sum_{s=1}^{S} \left[ p_s \log_2 \frac{p_s}{p_A} + (1 - p_s) \log_2 \frac{1 - p_s}{1 - p_A} \right] P_A(a_{s+1} \le g < a_s).$$

By analogy with the one-dimensional case [Kolmogorov, 1965] the quantity I(R, h) may be called the mutual information about the random field h that may be obtained from observations of the random field R. It is known that the information I(R, h) is non-negative and equals 0 if and only if the random fields h and R are independent.

This mutual information I(R, h) takes its maximal value S(h) in an idealized case

of absolutely exact forecast. The mutual information quantifies the information that

the distributions of precursors contribute to that of the indicator of seismic event. For

this reason it may be considered as an adequate scalar estimate for the quality of the

forecast.

The quantity I(R, h) depends on the cell size, i.e., on the space discretization length a and the time interval ∆t. We need a formal test to compare precursors defined for different sizes of the discretization cells. For this aim let us introduce the so-called ’efficiency’ of precursors as the ratio of the information gain to the unconditional entropy,

$$r(R, h) = \frac{I(R, h)}{S(h)}.$$

This efficiency varies between 0 and 1 and serves as a natural estimate of the information quality of precursors. It allows one to compare different forecasting algorithms and to select the best one.

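A natural estimate of S(h) based on (1) and (2) is defined as follows:

$$S(h) = -\lambda \log_2 \lambda - (1 - \lambda) \log_2 (1 - \lambda). \qquad (10)$$

Taking into account (7) and using an estimate of $P_A(a_{s+1} \le g < a_s)$ in the form of the ratio $\tau_s = n_s / N$, we construct estimates of I(R, h) and r(R, h) as follows:

$$I(R, h) = \sum_{s=1}^{S} \left[ \theta_s \log_2 \frac{\theta_s}{\lambda} + (1 - \theta_s) \log_2 \frac{1 - \theta_s}{1 - \lambda} \right] \tau_s, \qquad (11)$$

$$r(R, h) = \frac{I(R, h)}{S(h)}. \qquad (12)$$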
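The estimates (10)-(12) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are the per-level counts $m_s$ and $n_s$ from (7), and λ is assumed to be the overall share of cells with seismic events, i.e., the sum of all $m_s$ divided by N.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of an indicator with success probability p."""
    return -sum(x * log2(x) for x in (p, 1 - p) if x > 0)


def _term(a, b):
    # a * log2(a / b) with the convention 0 * log 0 = 0
    return a * log2(a / b) if a > 0 else 0.0


def information_efficiency(m, n):
    """Return (I, S, r) following (10)-(12).

    m[s]: number of cells at alarm level s containing a seismic event (m_s)
    n[s]: number of all cells at alarm level s (n_s)
    """
    N = sum(n)
    lam = sum(m) / N                  # assumed estimate of the event share lambda
    if not 0 < lam < 1:
        raise ValueError("lambda must lie strictly between 0 and 1")
    S = binary_entropy(lam)           # estimate (10)
    I = 0.0
    for m_s, n_s in zip(m, n):
        if n_s == 0:
            continue                  # empty alarm level contributes nothing
        theta_s, tau_s = m_s / n_s, n_s / N
        I += tau_s * (_term(theta_s, lam) + _term(1 - theta_s, 1 - lam))  # (11)
    return I, S, I / S                # efficiency (12)


# Hypothetical counts: an exact forecast and a precursor independent of events.
print(information_efficiency([10, 0], [10, 90]))
print(information_efficiency([10, 90], [100, 900]))
```

In the first call all events fall into one alarm level that contains no quiet cells, so the efficiency equals 1; in the second call the event frequency is the same at both levels, so the information gain vanishes.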

Remark 2 The economic quality of a forecast. A natural economic measure for the quality of a binary forecast is the economic risk or damage r related to the earthquakes and the necessary protective measures. In mathematical statistics the risk is defined as the expectation of the loss function; in our case there are two types of losses: damage and expenses related to protection. For each cell of our grid the risk may be specified by the formula

$$r = \alpha \Pr\{h(i,j,k) = 1,\ \eta(i,j,k) = 0\} + \beta \Pr\{h(i,j,k) = 0,\ \eta(i,j,k) = 1\} + \gamma \Pr\{h(i,j,k) = 1,\ \eta(i,j,k) = 1\},$$

where α stands for the average damage from a seismic event; β stands for the average expenses for protection after a seismic alarm is announced; γ stands for the average damage after the alarm, γ = α+β− δ, where δ is the damage prevented by the alarm. The coefficient in front of Pr{h(i, j, k) = 0, η(i, j, k) = 0} obviously equals 0, because in the absence of both a seismic event and an alarm there is no loss of any kind. Clearly, only the case δ > β is economically justified, i.e., the gain from the prevention measures is positive. Obviously, δ should be less than α+β, i.e., an earthquake cannot be profitable. Taking into account that α, β and γ depend on the geographical position of the cell, we write the total risk as the sum over all cells in the region of a

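given forecast. In the simplest case of the absence of the spatial component, when a single cell represents the region of forecast, the three probabilities are estimated by λν, τ − λ(1 − ν) and λ(1 − ν), respectively (with λ, τ and ν as defined in Section 11), and the expression for the risk is simplified as follows: r = αλν + β[τ − λ(1 − ν)] + γλ(1 − ν).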

However, the risk r, which is very useful for economic considerations and as a basis for an administrative decision, could hardly be used as a criterion for the quality of seismic prediction. First of all, it cannot be computed in a consistent way because the coefficients α, β and γ are not known in practice, and hence no effective way of its numerical evaluation is known. The computation of these coefficients is a difficult economic problem and goes far beyond the competence of geophysicists. On the other hand, the readiness of the authorities to commit resources to solving the problem depends on the quality of the geophysical forecast. This situation leads to a vicious circle.

The second drawback of the economic risk as a criterion for the quality of prediction is related to the fact that it depends on many factors which have no relation to geophysics or earthquake prediction. These factors include the density of population, the number and size of industrial enterprises, infrastructure, etc. It also depends on subjective factors such as the willingness of the authorities to use resources for prevention of the damage from earthquakes. The natural sciences could hardly accept criteria for the forecast quality which depend on the type of state organization, priorities of ruling parties, results of the recent elections, etc.

Remark 3 It seems reasonable to introduce a penalty related to the number of superfluous parameters when evaluating the quality of a forecast, by natural analogy with the Akaike criterion [Akaike, 1974] and similar methods in information theory. In our context the main parameter of importance is r(R, t) and its limit as t → ∞. This quantity does not involve the number of parameters directly. Probably, the rate of convergence depends on the number of parameters, but this dependence has not been studied yet.

8 The forecasting procedure

The number of time intervals, i.e., the number of observations N used in the construction of estimates, increases with the growth of observation time. So, the computation procedure requires constant updating. On the other hand, some computation time is required to ’adapt’ the model parameters to the updated information about seismic events via an iterative procedure. For these reasons we propose the following forecasting algorithm.

1. Given initial parameter values at the moment tK−1 = t0 + (K − 2)∆t, we optimize them to obtain the maximum of the efficiency r(R, h) of the precursor in the domain ΩK−1. For this aim the Monte Carlo method is helpful: one perturbs the current values of the parameters randomly and adopts the new values if the efficiency increases. The process continues until the value of the efficiency stabilizes; this may give only a local maximum, so the procedure is repeated a sufficient number of times. The choice of the initial values at the first step of the optimization procedure is somewhat arbitrary, but a reasonable iteration procedure usually leads to consistent results. The optimization procedure takes the period of time tK−1 < t ≤ tK .

2. Next, we construct the forecast in the following way. At the moment tK the values of the precursor g in each cell (i, j, K + 1) are computed with the optimized parameters. Based on these values the alarm levels, the point estimates and the confidence intervals are computed in each cell, as well as the values of the efficiency of precursors.

3. The estimates θ(i, j) of the stationary probabilities of seismic events in the cell are defined as follows:

$$\theta(i, j) = \frac{1}{K} \sum_{k=1}^{K} \theta_{s(i,j,k)}.$$

They can be used for creating a variant of the maps of seismic hazard in the region.
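The random perturbation scheme of step 1 may be sketched as follows. This is a minimal illustration and not the implementation used in our computations: the function `random_search`, its arguments `step`, `patience` and `restarts`, and the toy objective standing in for r(R, h) are all assumptions made for the example.

```python
import random


def random_search(efficiency, start, step=0.1, patience=200, restarts=5, seed=0):
    """Maximize efficiency(params) by accepting only improving random perturbations."""
    rng = random.Random(seed)
    best, best_value = list(start), efficiency(start)
    for restart in range(restarts):
        # The first run starts from the supplied values, later runs from a wider offset.
        spread = 0.0 if restart == 0 else 10 * step
        current = [p + rng.gauss(0.0, spread) for p in start]
        value = efficiency(current)
        stalled = 0
        while stalled < patience:      # stop once the value has stabilized
            trial = [p + rng.gauss(0.0, step) for p in current]
            trial_value = efficiency(trial)
            if trial_value > value:    # adopt the new values only if they improve
                current, value, stalled = trial, trial_value, 0
            else:
                stalled += 1
        if value > best_value:         # keep the best local maximum over all runs
            best, best_value = current, value
    return best, best_value


# Toy objective standing in for r(R, h); its maximum 1.0 is at (0.3, -0.2).
def toy_efficiency(p):
    return 1.0 / (1.0 + (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2)


params, value = random_search(toy_efficiency, [0.0, 0.0])
print(params, value)
```

Repeating the search from several starting points mirrors the remark that a single run may stop at a local maximum.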

9 Retrospective and prospective informativities

The efficiency of a precursor which is achieved as a result of parameter optimization could be considered as retrospective, as it is constructed by adapting the precursors to the historical catalogs of seismic events. The prospective efficiency for the space-time domain Ω∗ containing the cells in the ’future’ is based on the forecast. It is computed via formulas (10), (11), (12) with the only difference that the domain Ω∗s consists of the cells where the forecasted alarm level is s. The efficiency of the prospective forecast is smaller than the retrospective efficiency, but approaches this value with time. In principle, the prospective efficiency is the ultimate criterion of precursor quality, and the retrospective efficiency could serve only for the preliminary selection of precursors and their adaptation to the past history of seismic events.

10 Testing of the forecasting algorithm

The efficiency of a precursor could be computed exactly only in the idealized case of infinite observation time. However, its estimate may be obtained based on observations over a finite time interval. So, if an estimate produces a non-zero value, a real effect is not necessarily present. It may be simply a random fluctuation even if the precursor provides no information about the future earthquake. For this reason we would like to check the hypothesis H0 about the independence of a precursor and an event indicator with a reasonable level of confidence. In case the hypothesis is rejected one has additional assurance that the forecasting is real, not just a "ghost".

So, consider the distributions

$$P_A(x) = \frac{1}{I \cdot J} \sum_{(i,j) \in \Omega} \Pr_{ij}\{ g(i, j, k) \le x \}$$

and

$$P'_A(x) = \frac{1}{I \cdot J} \sum_{(i,j) \in A} \Pr_{ij}\{ g(i, j, k) \le x \mid h(i, j, k) = 1 \}.$$

The function $P_A(P_A^{-1}(y)) = y$ of the variable $y = P_A(x)$ provides a uniform distribution $F^*(y) = \Pr(\xi \le y)$ of some random variable ξ on [0,1]. Next, consider the distribution function $G(y) = P'_A(P_A^{-1}(y))$ on [0,1], and use a parametric representation with abscissa $P_A(x)$ and ordinate $P'_A(x)$. If the random fields g and h are independent, then $P_A(x) = P'_A(x)$ and the distribution G(y) is uniform. So, the hypothesis about the absence of forecasting, i.e., about the independence of g and h, is equivalent to the hypothesis H0 that the distribution G(y) is uniform.

The empirical distribution $G_L(y)$ related to G(y) is defined as follows. Denote by $u_l$, l = 1, ..., L, the values of the function g(i, j, k) sorted in non-decreasing order and belonging to the cells where h(i, j, k) = 1. Let $n_l$ be the number of cells such that h(i, j, k) = 1, g(i, j, k) = $u_l$. Denote by $m(u_l)$ the number of cells from Ω such that g(i, j, k) < $u_l$, and define the empirical distribution $G_L(y)$ as a step-wise function with $G_L(0) = 0$ and positive jumps of size $n_l / L$ at the points $y_l = m(u_l) / N$.

The well-known methods of hypothesis testing require that the function GL(y) has the same shape as for independent trials, i.e., that the random variables ul, l = 1, ..., L, are independent in view of axiom (iv). Naturally, we accept the precursors such that the hypothesis H0 is rejected with a reasonable level of confidence. (Recall that a hypothesis is accepted if and only if its logical negation can be rejected based on the available observations. The fact that a hypothesis cannot be rejected does not mean at all that it should be accepted; it only means that the available observations do not contradict this hypothesis. Say, the well-known fact that "the Sun rises in the East" does not contradict our hypothesis, however it may not be considered as a ground for its acceptance.) For large values of L the Kolmogorov statistic [Kolmogorov, 1933a] is helpful for this aim,

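$$D_L = \sup_y | G_L(y) - y |$$

with the asymptotic distribution

$$\lim_{L \to \infty} \Pr\{ \sqrt{L}\, D_L \le z \} = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2 k^2 z^2}, \quad z > 0,$$

or Smirnov’s statistics [Smirnov, 1939]

$$D_L^+ = \sup_y \left[ G_L(y) - y \right], \qquad D_L^- = -\inf_y \left[ G_L(y) - y \right],$$

with the asymptotic distribution

$$\lim_{L \to \infty} \Pr\{ \sqrt{L}\, D_L^+ \le z \} = \lim_{L \to \infty} \Pr\{ \sqrt{L}\, D_L^- \le z \} = 1 - e^{-2 z^2}, \quad z > 0.$$

The asymptotic expressions for these statistics can be used for L > 20 ([Bolshev, Smirnov, 1965]).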
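The construction of the empirical distribution and of the Kolmogorov statistic can be sketched in Python as follows. The function names and the synthetic precursor values are hypothetical; the sketch assumes that the precursor values of all N cells and of the L event cells are available as plain lists, and it uses the asymptotic series above to approximate the probability of exceeding the observed value under H0.

```python
from bisect import bisect_left
from math import exp, sqrt


def kolmogorov_statistic(g_all, g_events):
    """D_L = sup |G_L(y) - y| for the step function G_L defined in the text."""
    ordered = sorted(g_all)
    N, L = len(ordered), len(g_events)
    # y_l = m(u_l) / N, where m(u_l) counts the cells of Omega with g < u_l
    y = sorted(bisect_left(ordered, u) / N for u in g_events)
    # The supremum is attained just before or just after one of the jumps.
    return max(max(abs((l + 1) / L - y_l), abs(l / L - y_l))
               for l, y_l in enumerate(y))


def kolmogorov_p_value(d, L, terms=100):
    """Asymptotic probability that sqrt(L) * D_L exceeds the observed value."""
    z = sqrt(L) * d
    if z < 0.2:
        return 1.0  # the limiting distribution function is below 1e-12 here
    tail = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * z * z)
                   for k in range(1, terms + 1))
    return min(1.0, max(0.0, tail))


# Synthetic example: 1000 cells with precursor values 0, 1, ..., 999.
g_all = [float(i) for i in range(1000)]
informative = [900.0 + 4 * l for l in range(25)]  # events only at high values of g
spread = [20.0 + 40 * l for l in range(25)]       # events spread evenly over all values

d_inf = kolmogorov_statistic(g_all, informative)
d_spread = kolmogorov_statistic(g_all, spread)
p_inf = kolmogorov_p_value(d_inf, len(informative))
p_spread = kolmogorov_p_value(d_spread, len(spread))
print(d_inf, p_inf)
print(d_spread, p_spread)
```

In the first synthetic case H0 is rejected at any reasonable level, while in the second case the observations do not contradict the independence of the precursor and the events.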

11 The binary alarm and the hypothesis testing

The prediction is the form of forecast in which an alarm is announced in a given cell without a preliminary evaluation of the probability of a seismic event. In this case we can estimate the probabilities of events too. (If the alarm is announced in an arbitrary domain Ω, we set up an alarm in a cell of our model if at least half of the cell is occupied by the alarm.)

Let M be the number of cells in Ω which are in the state of alarm, and M0 be the number of cells where a seismic event is present but no alarm was announced (the number of ’missed targets’). Denote by τ = M/N the share of the cells with an alarm announced, by λ = L/N the share of the cells with seismic events, and by ν = M0/L the share of missed targets. Let the random variable η(i, j, k) equal 1 if an alarm is announced in the cell (i, j, k), and 0 otherwise. Obviously, the estimate of the conditional probability Pr{h(i, j, k) = 1 | η(i, j, k) = 1} of a seismic event under the condition of alarm is λ(1−ν)/τ, and the estimate of the conditional probability Pr{h(i, j, k) = 1 | η(i, j, k) = 0} of a seismic event under the condition of no alarm is λν/(1−τ).

If the alarm is announced according to the procedure described in Section 5, the threshold a1 specifying the acceptable domain of values for g(i, j, k) should be treated as a free parameter and selected by maximizing the information efficiency r(η, h). The estimate of the information increase for given values of τ and ν equals

$$I(\eta, h) = \lambda(1 - \nu) \log_2 \frac{1 - \nu}{\tau} + \lambda\nu \log_2 \frac{\nu}{1 - \tau} + [\tau - \lambda(1 - \nu)] \log_2 \frac{\tau - \lambda(1 - \nu)}{(1 - \lambda)\tau} + (1 - \tau - \lambda\nu) \log_2 \frac{1 - \tau - \lambda\nu}{(1 - \lambda)(1 - \tau)}.$$
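The estimate of I(η, h) is the mutual information of the 2×2 table of joint frequencies of h and η, which the following Python sketch makes explicit. The function name and the numerical values of τ, ν and λ are hypothetical and chosen only to illustrate the two limiting cases.

```python
from math import log2


def _plogr(p, q):
    # p * log2(p / q) with the convention 0 * log 0 = 0
    return p * log2(p / q) if p > 0 else 0.0


def binary_alarm_information(tau, nu, lam):
    """Return (I, r) for alarm share tau, missed-target share nu, event share lam."""
    joint = {  # estimated joint frequencies of (h, eta)
        (1, 1): lam * (1 - nu),
        (1, 0): lam * nu,
        (0, 1): tau - lam * (1 - nu),
        (0, 0): 1 - tau - lam * nu,
    }
    p_h = {1: lam, 0: 1 - lam}
    p_eta = {1: tau, 0: 1 - tau}
    info = sum(_plogr(p, p_h[h] * p_eta[e]) for (h, e), p in joint.items())
    entropy = -lam * log2(lam) - (1 - lam) * log2(1 - lam)
    return info, info / entropy


# Exact forecast: alarms cover exactly the cells with events, no missed targets.
print(binary_alarm_information(tau=0.1, nu=0.0, lam=0.1))
# Alarm independent of events: the share of missed targets equals 1 - tau.
print(binary_alarm_information(tau=0.2, nu=0.8, lam=0.05))
```

The four summands of the table correspond term by term to the four summands of the formula for I(η, h).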

The value of η(i, j, k) characterizes the result of testing two mutually exclusive simple hypotheses:

H0: the distribution of g(i, j, k) has the form $P^0_A(x) \equiv \Pr_A\{ g(i, j, k) \le x \mid h(i, j, k) = 0 \}$, implying ’no seismic events’,

or

H1: the distribution of g(i, j, k) has the form $P^1_A(x) \equiv \Pr_A\{ g(i, j, k) \le x \mid h(i, j, k) = 1 \}$, implying the presence of a seismic event.

The statistic for testing these hypotheses is the precursor g(i, j, k), and the critical domain for H0 has the form g(i, j, k) ≥ a1. (If the usual method of alarm announcement is used, the relevant precursor plays the rôle of the statistic and the critical domain is defined by the rule of the alarm announcement.) The probability of the first type error,

$$\alpha = \Pr\{ \eta(i, j, k) = 1 \mid h(i, j, k) = 0 \},$$

is estimated as $\frac{\tau - \lambda(1 - \nu)}{1 - \lambda}$. The probability of the second type error,

$$\beta = \Pr\{ \eta(i, j, k) = 0 \mid h(i, j, k) = 1 \},$$

is estimated as ν. (Note that due to condition (iii) any test used for checking these hypotheses should not depend on the coordinates of the cell.)

The Neyman-Pearson theory allows one to define the domain of images of all possible criteria: in coordinates (α, β) it is a convex domain with a boundary Γ which corresponds to the set of uniformly most powerful tests. This family may be defined in terms of the likelihood ratio $\Lambda(x) = p^1_A(x) / p^0_A(x)$, under the condition that the distributions $P^1_A(x)$ and $P^0_A(x)$ have densities $p^1_A(x)$ and $p^0_A(x)$:

η(i, j, k) = 1 if Λ(x) > ω,

η(i, j, k) = 0 if Λ(x) < ω,

where ω denotes the threshold. In the paper [Gercsik, 2004] we demonstrated that among all the tests with images on the boundary Γ there exist three different best tests. Here the term "best" may be understood in three different senses, i.e., maximizing the variational, correlational and informational efficiency. The most relevant criterion is the informational efficiency r(η, h).

The well-known Molchan’s error diagram [Molchan, 1990], where the probability of the first kind error is estimated by τ, is constructed in the same way. However, it involves a comparison of two intersecting hypotheses:

H0: the distribution of g(i, j, k) has the form $P^0_A(x) \equiv \Pr_A\{ g(i, j, k) \le x \}$, i.e., the seismic event could "either happen or not happen", and

H1: the distribution of g(i, j, k) has the form $P^1_A(x) \equiv \Pr_A\{ g(i, j, k) \le x \mid h(i, j, k) = 1 \}$, i.e., the seismic event "will happen".

Note that the rejection of hypothesis H0 leads to absurd results.

Remark 4 In the paper [Molchan and Keilis-Borok, 2008] the area of the alarm domain is defined in terms of a non-homogeneous measure depending on the spatial coordinates; in terms of our paper it may be denoted as θ(i, j), i.e., $\tau \sim \sum_{i,j} \theta(i, j)\, \eta(i, j, k)$. This

approach is used to eliminate the decrease of the share of alarmed sites τ with the

extension of the domain when a purely safe and aseismic territory is included into

consideration. It would be well-justified if the quantity τ could be accepted as an ade-

quate criterion of the quality of forecast in its own right. On the other hand, it can be

demonstrated that the information efficiency r(η, h) converges to a non-zero value 1−ν

when the number of cells with an alarm is fixed but the total number of cells tends

to infinity. An inhomogeneous area of the territory under forecast which is propor-

tional to θ(i, j) does not enable us to calculate the informational efficiency. Moreover,

it possesses a number of unnatural features from the point of view of evaluation of

economical damage. A seismic event in the territory of low seismicity is more costly

because no precautions are taken to prevent the damage of infrastructure. However,

in this inhomogeneous area an alarm announced in an aseismic territory will have a

smaller contribution than an alarm in a seismically active territory where the losses

would be in fact smaller. We conclude that this approach ’hides’ the most costly events

and does not provide a reasonable estimate of economic damage.

12 The choice of precursors

We use the term ’empirical precursor of an earthquake’ for any observable characteristic which is derived from the catalog only, provides a reasonable retrospective forecast of seismic events for this catalog, and is not derived from a basic physical conception of seismicity (say, periods of relative calm, deviation of some basic characteristic from its long-time average, etc.). In contrast, physical precursors are derived from physical processes and characterize physical quantities (stress fields, strength, concentration of cracks, etc.) or well-defined physical processes (e.g., phase transitions, crack propagation, etc.). In meteorological forecasting the danger of using empirical precursors was highlighted by A. Kolmogorov in 1933 [Kolmogorov, 1933]. Since that time meteorological forecasting has relied on physical precursors which are theoretically justified by models of atmospheric dynamics. Below we present A. Kolmogorov’s argument adapted to the case of seismic forecasting. It demonstrates that purely empirical precursors work well only for the given catalog from which they are derived. However, their efficiency deteriorates drastically when they are applied to any other independent catalog.

Consider a group of k empirical precursors used for a forecast and selected from a set of n such groups. According to A. Kolmogorov’s remark the number k is typically rather small. This is related to the fact that the number of strong earthquakes in a catalog is unlikely to exceed a few dozen. As the values of the precursors are random, there exists a small probability p that the efficiency of the forecast exceeds a given threshold C. Then the probability of the event r(R, h) ≤ C equals 1 − p, and, for independent groups, the probability of the event r(R, h) > C for at least one collection of precursors equals $P = 1 - (1 - p)^n$ and tends to 1 as n → ∞. (According to Kolmogorov, some arbitrariness of the assumption of independence is compensated by the large number of collections.)
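A short numerical illustration of this argument is given below. The value p = 0.01 and the numbers of groups are hypothetical; the sketch only evaluates the formula for P under the stated assumption of independent groups.

```python
def prob_spurious_success(p, n):
    """P = 1 - (1 - p)**n: at least one of n independent groups exceeds the threshold."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical chance p = 0.01 that a single group exceeds the threshold C.
for n in (1, 10, 100, 1000):
    print(n, round(prob_spurious_success(0.01, n), 4))
```

Even with a one percent chance per group, a search over a thousand groups finds an apparently efficient precursor almost surely.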

Summing up, if the number of groups is large enough, then with probability close to 1 it is possible to find a group giving an effective retrospective forecast for a given catalog. In practice this is always the case, as the number of empirical precursors could be increased indefinitely by varying the real parameters used in their construction. It is important to note that for such a group, which is highly efficient for the initial catalog, the probability that the efficiency is greater than C is still equal to p for any other catalog. In other words, the larger the number of groups of empirical precursors, the less reliable the forecast is. So, the collection of a large list of empirical precursors is counter-productive.

Much more reliable are the physical precursors, intrinsically connected with the physical processes, which preserve their values with the change of sample. The probability of finding such a set of precursors by a purely empirical choice is negligible because they are very rare in the immense collection of all possible precursors.

13 Image identification

The possibility of using the pattern recognition formalism in seismic forecasting is totally based on the acceptance of a deterministic model of seismicity. It is necessary to assume that in principle there exists a group of precursors which allows one to determine with certainty whether a strong earthquake will happen or not. In this case one believes that all random errors are related to the incompleteness of this set of precursors. But if seismicity is a random process, then the image appears only after the earthquake; before it no set of values of precursors can guarantee the outcome, and only the relevant probabilities may be a subject of scientific study. After the discovery of dynamic instability and generators of stochastic behavior in dynamical systems, the deterministic model of seismicity has been cast into doubt. Its potential acceptance requires substantial evidence, which hardly exists at present.

In any case the results of a pattern recognition procedure (i.e., a binary alarm) are useful if they are considered alongside the results of statistical tests. They allow one to calculate estimates of the probabilities of seismic events and of the informational efficiency.

However, the selection of ’features’ for pattern recognition leads to the same difficulties as the selection of precursors: ’features’ based on the observations only and not related to the physics of earthquakes are not helpful, and any hopes for ’perceptron education’ are not grounded. A successful supervised recognition is possible only if the features have a proved causal relation with the pattern. This principle is illustrated by a simple but important theorem by A.N. Krichevets.

Theorem 1 Let A be a finite set, B1, B2 ⊂ A, B1 ∩ B2 = ∅. We say that B1 and B2 are finite educational samples. Let X ∈ A, X ∉ B1 ∪ B2 be a new object. Then among all classifications, i.e., pairs of subsets (A1, A2) such that B1 ⊂ A1, B2 ⊂ A2, A1 ∪ A2 = A, A1 ∩ A2 = ∅, exactly half classify X as an object of sample B1, i.e., B1 ∪ {X} ⊂ A1, and exactly half classify X as an object of sample B2, i.e., B2 ∪ {X} ⊂ A2.

Proof It is easy to define a one-to-one correspondence between the two kinds of classifications. Indeed, if (A1, A2), A1 ∪ A2 = A, is a classification such that B1 ⊂ A1, X ∈ A1, B2 ⊂ A2, one maps it into the unique classification (A′1, A′2) such that B1 ⊂ A′1, X ∈ A′2, B2 ⊂ A′2, where A′1 = A1 \ {X}, A′2 = A2 ∪ {X}.

Corollary 1 A supervised pattern recognition is impossible. After the learning procedure the probability of classifying a new object correctly is the same as before learning, i.e., 1/2.
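The counting argument of the theorem can be verified by direct enumeration for a small set. In the sketch below the set A, the samples B1 and B2 and the new object X are hypothetical; every classification consistent with the samples is generated and counted according to the class it assigns to X.

```python
from itertools import product


def count_classifications(A, B1, B2, X):
    """Count partitions (A1, A2) of A with B1 in A1 and B2 in A2, by the class of X."""
    free = [a for a in A if a not in B1 and a not in B2]  # X is among these
    counts = {1: 0, 2: 0}
    for labels in product((1, 2), repeat=len(free)):
        assignment = dict(zip(free, labels))
        counts[assignment[X]] += 1
    return counts


A = set(range(8))
B1, B2, X = {0, 1}, {2, 3}, 4
result = count_classifications(A, B1, B2, X)
print(result)  # {1: 8, 2: 8}
```

Of the 16 classifications consistent with the samples, 8 place X together with B1 and 8 place it together with B2, whatever the samples are.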

14 Demonstration of algorithm

A preliminary version of the forecast algorithm described above was used in the paper [Ghertzik, 2008] for California and the Sumatra-Andaman earthquake region. These computations serve as a demonstration of the efficiency of the method, but their actual results should be taken with a pinch of salt because the selection of precursors does not appear well-justified from the modern point of view: the number of free parameters to be adapted in the precursor "stress indicator" is too large.

California region. The catalog Global Hypocenter Data Base CD-ROM NEIC/USGS, Denver, CO, 1989, together with data from the site NEIC/USGS PDE (ftp://hazard.cr.usgs.gov) for earthquakes with magnitudes M ≥ 4.0 with epicenters between 113 − 129 of western longitude and 31 − 43 of northern latitude, was used for parameter adaptation. The initial time t0 was selected by subtracting from the time of actual computation, 08.03.2006, an integer number of half-year intervals such that t0 fits the first half of the year 1936. (The final time 08.03.2006 could be considered as an initial moment for constructing a half-year forecast forward, up to the date of the latest earthquake available in the catalog.) During the computation the time interval from the first half of 1936 to the first half of 1976 was used for relaxation of the zero initial data used for precursors. After this date the catalog of earthquakes with magnitude M ≥ 4.0 was used to estimate the probabilities of strong earthquakes with magnitude M ≥ 6.0 up to the moment of actual forecast. Note that the adopted restriction to consider only earthquakes with magnitudes M ≥ 4.0 is a severe one. It decreases the precision of precursor computation and therefore, if a prediction is successful, increases the degree of confidence in the predictor choice. We choose a = 150 km as the size of the spatial lattice, and ∆t = 6 months as the time step. The retrospective forecast was performed with 5 alarm levels defined by the thresholds $a_s = 10^{-\alpha(5-s)}$, s = 1, . . . , 4, and α = 0.75. (Because the time step is too short, no alarms were registered on the lowest level when the parameter α = 1 was selected.) In order to reduce the influence of the boundary

conditions the large square covering all the seismic events in the catalog used in the

computations was reduced by two layers of elementary cells from each boundary. As

a result of optimization the forecast information efficiency of 0.526 was achieved, i.e.,

the forecasting algorithm applied to the given catalog extracted from it about 53% of

all available information about seismic events. It seems that this result could be only

partially explained by a lucky selection of precursors: another contributor to the high

efficiency of the algorithm is the adaptation of the parameters to the features of the

specific catalog. The influence of this artificial information may be reduced only with

the increase of the observation interval.

Accepting the rule of binary alarm announcement in the cells of levels 1 and 2 out of the 5 possible levels, one obtains that the space-time share of alarmed cells is 3.4% and the share of missed targets is 18.2%. This result is comparable with the best forecasts available in the literature and obtained by other methods (in the cases when the quantitative parameters of algorithms are presented in the publications). When the forecast was constructed for the future, we obtained that the estimate of the probability of a strong earthquake anywhere in the area under study is 0.174, and the maximal point estimate of the probability of an event in any individual cell is 0.071. As a whole the seismic situation in California did not look too alarming. Indeed, there were no strong earthquakes in California in the next half-year.

SAE region. We have conducted a retrospective forecast of strong earthquakes with magnitudes M ≥ 7.0 for the whole region where the Sumatra-Andaman earthquake (SAE) happened on 26.12.2004 with magnitude M = 9.3. The catalog Global Hypocenter Data Base CD-ROM NEIC/USGS, Denver, CO, 1989, together with the data from the site NEIC/USGS PDE (ftp://hazard.cr.usgs.gov) for earthquakes with magnitudes M ≥ 5.5 with epicenters between 84.3 − 128 of eastern longitude and 20 − 26 of northern latitude, was used for the parameter adaptation. The initial moment of time t0 was selected by subtracting from the time of actual computation, 10.11.2004, an integer number of half-year intervals such that t0 fits the first half of the year 1936. (The final time was selected in such a way that the next half-year period covers SAE and its powerful aftershock.) During the computation the time interval from the first half of 1936 to the first half of 1976 was used for relaxation of the zero initial data used for precursors. After this date the catalog was used to estimate the probabilities of strong earthquakes with magnitude M ≥ 7.0 up to the moment of actual forecast. (For magnitude M ≥ 7.5 the number of seismic events was not sufficient for a reliable forecast because the 5%-confidence intervals strongly overlapped.) In this case the restriction to consider only earthquakes with magnitudes M ≥ 5.5 was adopted. As before, it decreases the precision of precursor computation and therefore, if the prediction is successful, increases the degree of confidence in the predictor choice. We selected the size of the spatial grid as a = 400 km and the size of the time step as ∆t = half a year. The retrospective analysis was conducted following the same scheme as in the previous case. In order to reduce the influence of the boundary conditions the large square covering all the seismic events in the catalog used in the computations was reduced by two layers of elementary cells from each boundary. As a result of optimization the forecast information efficiency was 0.549, i.e., the forecasting algorithm extracted around 55% of all available information about seismic events when applied to the given catalog.

In the case of binary alarm announcement the space-time share of alarmed cells

was 3.1% and the share of missed targets was 8.3%. This result is comparable with

20 V.Gertsik, M.Kelbert, A.Krichevets

the best forecasts available in the literature and obtained by other methods (in those

cases where the quantitative parameters of the algorithms are given in the publications).
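
The two figures quoted for the binary alarm can be illustrated with a small sketch that computes the share of alarmed space-time cells and the share of missed targets. The function name, the flat-list layout of cells and the toy data are assumptions made for illustration; in particular, targets are counted per event here, which may differ from the convention used in the computations above.

```python
def binary_alarm_scores(alarmed, n_targets):
    """Return (share of alarmed cells, share of missed targets).

    alarmed   -- list of booleans, one per space-time cell
    n_targets -- list of integers, number of strong events in each cell
    """
    if len(alarmed) != len(n_targets):
        raise ValueError("both lists must describe the same cells")
    total_targets = sum(n_targets)
    if not alarmed or total_targets == 0:
        raise ValueError("need at least one cell and one target event")
    alarm_share = sum(1 for a in alarmed if a) / len(alarmed)
    # A target is missed when it occurs in a cell without an alarm.
    missed = sum(n for a, n in zip(alarmed, n_targets) if not a)
    return alarm_share, missed / total_targets


# Toy data (hypothetical): 10 space-time cells, 2 of them alarmed, 4 target events.
toy_alarmed = [True, False, False, False, True, False, False, False, False, False]
toy_targets = [1, 0, 0, 1, 2, 0, 0, 0, 0, 0]
alarm_share, miss_share = binary_alarm_scores(toy_alarmed, toy_targets)
print(alarm_share, miss_share)
```

In the toy data one of the four events falls in a quiet cell, so the miss share is 0.25 at an alarm share of 0.2. A smaller value of both quantities at once indicates a better binary forecast.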

In the case of the forward forecast the two most powerful earthquakes, i.e., SAE and

its major aftershock, happened in the alarm zone of the second level, and two other

events with smaller magnitudes in the fourth alarm zone. Note that in the case of a binary

alarm announcement one would register a square of 9 elementary cells with only one

alarmed and 8 quiet cells. In this case no reliable forward forecast is possible.

Acknowledgements We would like to thank V.Pisarenko and G.Sobolev for stimulating

discussions that gave us the impetus to write this paper.

References

Akaike, 1974. Akaike, H., A new look at the statistical model identification, IEEE Trans. Automatic Control, 19, n. 6, 716-723, 1974.

Bolshev, Smirnov, 1965. Bolshev L. N., Smirnov V. N., Mathematical Statistical Tables (in Russian), V.A. Steklov Mathematical Institute, Academy of Sciences, Moscow, USSR, 1965, 464 pp.

Field, 2007. Field, E. H., Overview on the working group for the development of Regional Earthquake Likelihood Models (RELM), Seismol. Res. Lett., 7-16, 2007.

Ghertzik, 2008. Gercsik V., Physical concepts of fracture and prediction of probabilities of strong earthquakes, Phys. Solid Earth, 44, n. 3, 22-39, 2008.

Gercsik, Kelbert, 2004. Gercsik V., Kelbert M., On comparison of hypothesis tests in Bayesian framework without a loss function, Journ. Modern Applied Statistical Methods, 3, n. 2, 399-405, 2004.

Harte and Vere-Jones, 2005. Harte D. and Vere-Jones D., The entropy score and its uses in earthquake forecasting, Pure and Applied Geophysics, 162, n. 6-7, 1229-1253, 2005.

Holliday et al., 2007. Holliday J.R., Chien-chih Chen, Tiampo K.F., Rundle J.B., Turcotte D.L. and Donnellan A., A RELM earthquake forecast based on pattern informatics, Seismological Research Lett., 78, n. 1, 87-93, 2007.

Jackson, 1996. Jackson D. D., Hypothesis testing and earthquake prediction, Proc. Natl. Acad. Sci. USA, 93, 3772-3775, 1996.

Jordan et all., 2011. Jordan T.H., Chen Y.-T., Gasparini P., Madariaga R., Main I., Marzocchi W., Papadopoulos G., Sobolev G., Yamaoka K. and Zschau J., Operational earthquake forecasting: state of knowledge and guidelines for utilization, Ann. Geophysics, 54, n. 4, 2011.

Kagan and Jackson 2000. Kagan Y. Y., and Jackson D.D., Probabilistic forecasting of earthquakes, Geophysical Journal International, 143, 438-453, 2000.

Keilis-Borok., 1996. Keilis-Borok V. I., Intermediate-term earthquake prediction, Proceedings of the National Academy of Sciences of the United States of America, 93, n. 9, 3748-3755, 1996.

Keilis-Borok, Kossobokov, 1990. Keilis-Borok V.I., Kossobokov V.G., Premonitory activation of earthquake flow: algorithm M8, Physics of the Earth and Planetary Interiors, 61, 73-83, 1990.

Kelbert, Suhov, 2013. Kelbert M., Suhov Y., Information Theory and Coding by Example, Cambridge Univ. Press: Cambridge, 2013, 530 pp.

Kolmogorov, 1965. Kolmogorov A. N., Three approaches to the definition of the notion of information amount, Probl. Information Transmission, 1, n. 1, 3-11, 1965.

Kolmogorov, 1933. Kolmogorov A. N., On the suitability of statistically obtained prediction formulas, Zh. Geofiz., 3, 78-82, 1933.

Kolmogorov, 1933a. Kolmogorov A. N., Sulla determinazione empirica di una legge di distribuzione, G. Ist. Ital. Attuari, 4 (1), 83-91, 1933.

Molchan, 1990. Molchan G. M., Strategies in strong earthquake prediction, Phys. Earth Plan. Int., 61, 84-98, 1990.

Molchan and Keilis-Borok, 2008. Molchan G. M. and Keilis-Borok V. I., Earthquake prediction: probabilistic aspect, Geophys. Journ. Int., 173, 1012-1017, 2008.

Pisarenko et al., 2008. Pisarenko V. F., Sornette A., Sornette D., Rodkin M. V., New approach to the characterization of Mmax and of the tail of the distribution of earthquake magnitudes, Pure and Applied Geophysics, 165, n. 5, 847-888, 2008.

Prohorov, Rozanov, 1969. Prohorov Yu.V., Rozanov Yu.A., Probability theory, basic concepts. Limit theorems, random processes, Springer-Verlag, 1969, 401 pp.

Smirnov, 1939. Smirnov N. V., On deviations from the empirical distribution curve, Mat. Sb., 6(48), 1, 3-24, 1939.

Vere-Jones, 1998. Vere-Jones D., Probability and information gain for earthquake forecasting, in: Problems of Geodynamics and Seismology, Computational Seismology, Issue 30, Moscow, 248-263, 1998.

WGNCEP, 2011. WGNCEP (Working Group on Northern California Earthquake Potential), The Uniform California Earthquake Rupture Forecast, Version 3 (UCERF3) Project Plan, U. S. Geological Survey Open-File Report, 1-176, 2011.