
The Tau model for Data Redundancy: Part 2

Sunderrajan Krishnan

April 24, 2005

Abstract

Knowledge about earth properties arrives from diverse sources of information. Connectivity of permeability is one such earth property, essential for describing flow through porous materials. Information on the multiple-point connectivity of permeability arrives from core data, well-test data and seismic data, which are defined over varying supports with complex redundancy between these information sources. The tau model offers a framework to combine these diverse and partially redundant data. The tau weights in this tau model are a measure of data redundancy, a function of the data values and of the order in which the conditioning data are being considered. In order to compute these tau weights, one needs a model of data redundancy, here expressed as a vectorial training image (Ti). A vectorial Ti can be constructed using prior conceptual knowledge of geology and the physics of data measurement. From such a vectorial Ti, the tau weights can be computed exactly, then compared to those computed using any approximate calibration technique. In the case of estimating permeability connectivity, one observes significant deviations from data independence or conditional independence. Neglecting data redundancy leads to an over-compounding of individual data information and a possible risk of making extreme decisions.

1 Introduction

Several recent studies have pointed out that advective fluid flow through permeable media is affected strongly by the connected patterns of extreme permeability values, both highs and lows. Srinivasan [7] studied the impact of higher-order pattern statistics on fluid flow and showed that complex flow-based reservoir performance can be represented in terms of the multiple-point properties of the underlying permeability field [8]. Several authors have attempted to develop measures of such multiple-point (mp) pattern characteristics


of permeability. One such measure is the rectilinear mp-connectivity of binary indicator values I(u) at location u along any direction h, given by:

K(h; n) = E{I(u) I(u + h) . . . I(u + nh)} = Prob{I(u) = 1, I(u + h) = 1, . . . , I(u + nh) = 1}    (1)

Note that K(h; n) = P(I(u) = 1) for n = 1. This multiple-point measure K(h; n) is an improvement over the traditional two-point transition probability P(I(u + nh) = 1 | I(u) = 1), which considers the interaction between locations taken only two at a time.

Geostatistical algorithms have been developed to impose these mp-statistics over conditional realizations using simulated annealing [2]. However, it has been shown that these rectilinear measures of connectivity fail to capture the curvilinear nature of geological patterns. One striking example demonstrated by Krishnan and Journel [5] is that of a highly continuous channel deposit which has the same rectilinear connectivity K(h; n) and same indicator variogram γI(h) as a discontinuous lens structure which, by simple visual inspection, shows a significantly lower spatial connectivity and hence a lower effective permeability. Newly developed geostatistical simulation algorithms manage to capture such higher-order connectivity measures by considering all mp-statistics within a specified neighborhood template ([9], [1], [10]).

Information about these mp-characteristics of permeability can arrive from a variety of sources. These sources of information, denoted Di, i = 1, . . . , n, could be defined over different supports and could be derived from diverse data such as small-support core data and larger-support well-test data and seismic-derived data. Denoting the unknown connectivity of permeability as A, one can represent the information arriving from each individual data source Di in terms of the conditional probability P(A|Di).¹ It is evident that there could be strong redundancies between these individual data information P(A|Di). This data redundancy arises for a variety of reasons, one prime reason being the overlap in the volume supports over which these data are defined. There could also be overlap in the physical processes generating these data, for example two different well-tests conducted at nearby wells using different pressure or flow controls. Very importantly, note that this overlap between information cannot be fully described by the common unknown A alone, rejecting therefore any possibility of conditional independence (CI) between the data, i.e., P(D1, . . . , Dn|A) = P(D1|A) . . . P(Dn|A) [6].

¹Throughout this paper, we use the short notation P(A|Di) instead of the exact P(A = a|Di = di).


When combining these elementary data information P(A|Di) into a combined knowledge P(A|D1, . . . , Dn), one needs to account for the redundancies between the n data utilized. The question then is how one can account for complex data redundancy as related, for example, to the mp-connectivity of permeability. This problem of combining probabilistic information under redundancy is addressed by the tau model described in detail in the accompanying paper [6]. That paper addresses the general problem of combining conditional probabilities P(A|Di) accounting for data redundancy.

Define the following data probability ratios x0, x1, . . . , xn and the target ratio x, all valued in [0, ∞], as:

x0 = (1 − P(A)) / P(A),  x1 = (1 − P(A|D1)) / P(A|D1),  . . . ,  xn = (1 − P(A|Dn)) / P(A|Dn),  and

x = (1 − P(A|D1, . . . , Dn)) / P(A|D1, . . . , Dn)  ∈ [0, ∞]

The tau model is stated as:

x / x0 = ∏_{i=1}^{n} (xi / x0)^τi ,   τi ∈ [−∞, +∞]    (2)

The most important components in this expression are the weights τi. These weights, which can lie anywhere in [−∞, ∞], account for the redundancy between the information arriving from different data. The accompanying theory paper [6] develops this expression in more detail and interprets the tau weights. Further, a calibration technique was proposed to compute these weights in practice.
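As a concrete illustration of expression (2), the following minimal Python sketch combines elementary probabilities P(A|Di) into an estimate of P(A|D1, . . . , Dn) for given tau weights. The function name and numeric values are illustrative only, and all probabilities are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def tau_combine(p_prior, p_single, taus):
    """Tau-model combination (equation 2) of elementary probabilities.

    p_prior  : prior probability P(A)
    p_single : sequence of elementary probabilities P(A|D_i)
    taus     : sequence of tau weights, one per datum
    """
    x0 = (1.0 - p_prior) / p_prior                            # prior distance
    xi = (1.0 - np.asarray(p_single)) / np.asarray(p_single)  # elementary distances
    x = x0 * np.prod((xi / x0) ** np.asarray(taus))           # combined distance
    return 1.0 / (1.0 + x)                                    # back to a probability

# Two partially redundant data, the second one down-weighted (tau < 1)
print(tau_combine(0.25, [0.60, 0.55], taus=[1.0, 0.6]))
```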

An exact representation of data redundancy would require the generation of joint realizations of the variables A, D1, . . . , Dn. In its turn, this requires knowledge of the physics of data generation accompanied by a conceptual knowledge of the underlying geology which relates A to each of the data Di. A conceptual depiction of geology can be represented in the form of a training image (Ti), which can be seen as a single realization of a spatial random function Z(u). Once we have this conceptual depiction of the geology, a knowledge of the data physics can be used to generate the unknown A and the related data D1, . . . , Dn. Such a generation will give rise to a vectorial Ti: {A(l), D1(l), . . . , Dn(l)}, where this Ti is now composed of a vector of (n + 1) variables. The superscript (l) refers to a specific vectorial Ti; there can be l = 1, . . . , L different such Tis. This vectorial Ti can be used to determine the redundancy between the information arriving from data Di about the unknown A. Examples of such data physics are seismic ray-tracing techniques and flow models for generating well-test data. Such forward models of data generation are now widely developed for different fields of expertise and, in fact, are commonly used to calibrate the individual data probabilities P(A|Di). No rigorous evaluation of data redundancy is possible without such knowledge of the data physics. In fact, our evaluation of data redundancy, and consequently the accuracy of any data combination technique, can only be as good as our knowledge of these data physics. It is better to use some rough idea of the data physics, rather than totally ignoring it and relying on typically poorly estimated correlation values.

In this paper, we first generate a fine-scale Ti representing a heterogeneous distribution of permeability values. This fine-scale data set is averaged using different combinations of power averages, resulting in four other variables, all of which together constitute a 5-dimensional vectorial Ti. The connectivity of permeability at one intermediate support is then evaluated using data from smaller and larger supports of this vectorial Ti. The tau model is used towards this purpose, with the tau weights being computed both exactly and by using an approximate calibration technique. Finally, the impact of ignoring data redundancy is demonstrated by assigning tau weights corresponding to a data conditional independence assumption.

2 Description of the data set

Consider the 500 × 500 permeability field shown in Figure 1a. This reference field has been constructed by using a combination of Gaussian simulation (GSLIB program sgsim, [3]) and a random drop (Poisson distribution) of rectangular-shaped low-permeability shales.
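A rough stand-in for such a construction could look as follows. This sketch is not the actual procedure used here (which relies on GSLIB Gaussian simulation); every numeric parameter in it is illustrative only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
nx = ny = 500

# Stand-in for Gaussian simulation: smooth white noise to impose spatial
# correlation, then map to a lognormal-like permeability field.
g = gaussian_filter(rng.standard_normal((ny, nx)), sigma=8)
perm = np.exp(np.log(100.0) + 1.5 * (g - g.mean()) / g.std())

# Random drop of rectangular low-permeability shales (Poisson count).
for _ in range(rng.poisson(200)):
    i, j = rng.integers(0, ny), rng.integers(0, nx)
    h, w = rng.integers(3, 8), rng.integers(20, 60)   # thin, elongated shapes
    perm[i:i + h, j:j + w] = 1.0                      # low permeability value
```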

Let this reference field represent a permeability variable Z(u) defined on a quasi-point support. Averaging of this variable Z(u) over different supports with different averaging functions leads to the other variables displayed in Figure 1:

• Zv(u) is the geometric average of Z(u) defined over a constant volume of size v = 11 × 11 pixels. v is the support volume to be estimated for input to, say, a flow simulator.

The variable Zv(u) is to be evaluated using point-support data of type Z(u) and the following data:

• Zw1(u) is a linear average of harmonic averages, the latter defined over 8 radial strings of length 10 pixels each (Figure 2). This average is assumed to mimic radial flow from a central well, approximated by 1D parallel flow occurring along each of the 8 radial directions.


• Zw2(u) is the harmonic average of 3 geometric averages, the latter defined over 3 annular regions of radii 5, 10 and 20 pixels (Figure 3). This average mimics radial flow through these annular regions. Note that the w2-support is larger than that of w1.

• Zs(u) is the linear average of Z(u) defined over a constant large volume of size s = 51 × 51 pixels. Zs(u) is a large-scale average of Z(u) obtained, possibly, from calibration of seismic data.

All previous averaging types are power averages. However, Zw1(u) and Zw2(u) each call for a sequence of two power averages; they are thus non-linear, multiple-point averages, as opposed to Zv(u) and Zs(u), which are single-point averages. Taken together, the set of images (a) through (e) in Figure 1 constitutes a vectorial training image. Figure 4 shows the histograms of these five variables. The impact of averaging can be seen clearly in the smoothing of the histograms: the coefficient of variation decreases from 1.83 for Z(u) to 1.17, 1.10, 0.95 and 0.70 for Zv(u), Zw1(u), Zw2(u) and Zs(u), respectively.

Our focus here is the relationship between the high values of these maps. Define the following upper-quartile indicator variables representing the high values: the indicators I(u), Iv(u), Iw1(u), Iw2(u) and Is(u) are the indicators of exceeding, respectively, the upper quartiles z0.75, zv0.75, zw10.75, zw20.75 and zs0.75 defined at each support. These upper quartiles can be read from the histograms in Figure 4. The corresponding indicator maps are shown in Figure 5.
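The block-support averages and their upper-quartile indicators could be computed, for instance, along the following lines. This is an illustrative, unoptimized sketch assuming the perm array from the previous sketch; it omits the radial and annular templates of Figures 2 and 3 needed for Zw1(u) and Zw2(u).

```python
import numpy as np
from scipy.ndimage import generic_filter, uniform_filter

def geometric_average(window):
    """Power average of order 0, i.e. the geometric average of a window."""
    return np.exp(np.mean(np.log(window)))

# Z_v: geometric average over a moving 11 x 11 window (block support v)
z_v = generic_filter(perm, geometric_average, size=11)

# Z_s: linear (arithmetic) average over a 51 x 51 window (seismic-like support s)
z_s = uniform_filter(perm, size=51)

# Upper-quartile indicators at each support
i_point = (perm > np.quantile(perm, 0.75)).astype(int)
i_v = (z_v > np.quantile(z_v, 0.75)).astype(int)
i_s = (z_s > np.quantile(z_s, 0.75)).astype(int)
```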

Using this multiple-support data set, our target of estimation is the mp-connectivity of the variable Zv(u) at the v block support of size 11 × 11. For the sake of illustration, we will consider only the rectilinear measure of connectivity K(h; n) in the East-West direction instead of the curvilinear measure KC(h; n), [5].

Define the following data events:

A(n): (Iv(u) = 1, . . . , Iv(u + nh) = 1), the event of observing a set of n contiguous high values in direction h at the v support.

Similarly, define D1(n): (I(u) = 1, . . . , I(u + nh) = 1), D2(n): (Iw1(u) = 1, . . . , Iw1(u + nh) = 1), D3(n): (Iw2(u) = 1, . . . , Iw2(u + nh) = 1) and D4(n): (Is(u) = 1, . . . , Is(u + nh) = 1).

Note for reference: the prior probability P(A(1)) = 0.25 corresponds to the distance x0 = (1 − 0.25)/0.25 = 3. Similarly, P(Dk(1)) = 0.25 for all k = 1, 2, 3, 4.

Next, consider the conditional connectivity function P(A(n)|Dk(n)), k = 1, . . . , 4. This function gives the probability of observing a string of n connected high values at the support v, given observation of a colocated string of connected high values at another support k.
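These connectivity probabilities can be estimated empirically by scanning colocated indicator maps. The sketch below is one possible implementation, assuming the indicator arrays (e.g. i_v, i_point) from the earlier sketches and a unit East-West lag h.

```python
import numpy as np

def string_event(ind, n, axis=1):
    """Boolean map of the event {I(u)=1, I(u+h)=1, ..., I(u+nh)=1} for a
    unit lag h along `axis` (axis=1 is East-West)."""
    ev = ind.astype(bool)
    for j in range(1, n + 1):
        ev = ev & np.roll(ind.astype(bool), -j, axis=axis)
    return ev[:, :-n] if axis == 1 else ev[:-n, :]   # drop wrapped-around pixels

def connectivity(ind, n):
    """Empirical K(h; n) = Prob{I(u)=1, ..., I(u+nh)=1} in the East-West direction."""
    return string_event(ind, n).mean()

def cond_connectivity(ind_target, ind_data, n):
    """Empirical P(A(n) | D(n)) from two colocated indicator maps."""
    a, d = string_event(ind_target, n), string_event(ind_data, n)
    return (a & d).sum() / d.sum()

# Example: connectivity at the v support given point-support data, 10-pixel string
# p = cond_connectivity(i_v, i_point, n=10)
```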

The availability of the various reference maps of Figure 1 allows us to compute exactly all four previous conditional connectivity functions, as well as the following connectivities given two or three data events, P(A(n)|Di(n), Dj(n)) and P(A(n)|Di(n), Dj(n), Dk(n)), and given all four data, P(A(n)|D1(n), D2(n), D3(n), D4(n)).

Figure 6 shows all the single-data conditional connectivities P(A(n)|Dk(n)), k = 1, . . . , 4. The marginal probability P(A(n)) is the traditional rectilinear connectivity function, here defined at the support v = 11 × 11; it is a decreasing function of n. Note that P(A(n)) at n = 1 is equal to 0.25, the global proportion of high values. The other curves in Figure 6 give the probability of connected v-support high values in the E-W direction given D1(n), D2(n), D3(n) or D4(n) taken one at a time. Note that the P(A(n)|D2(n)) and P(A(n)|D3(n)) curves both intersect the P(A(n)|D1(n)) curve. At all lags, the datum D4(n) gives the conditional probability closest to the marginal, i.e., D4(n) is the least informative datum.

Figure 7 shows the two-data conditional connectivities. All conditioning data pairs which include datum D1(n) show higher conditional probability than those which do not include it. Presence of a string of high values at both the point support and support w2 (data D1(n) and D3(n)) results in an almost sure occurrence of connected high values at the v support.

Figure 8 shows the three- and four-data conditional connectivities. Again, inclusion of datum D1(n) indicates strong connectivity at the v support. On the other hand, inclusion of the large-support datum D4(n) does not add much information. This can be consistently observed for all combinations including this datum.

3 Cross-support statistics

Conditional correlations

The conditional correlation Corr{Di, Dj|A} is a measure of how two data Di and Dj relate to each other with regard to a specific outcome of the unknown A, in other words a measure of data redundancy for informing that particular value of the unknown A. Here we compute these conditional correlations between data taken two at a time for the event A = 1. Figure 9 gives these correlation values for lags n = 1 through 150. Note that, for these binary data, zero correlation implies conditional independence for A = 1; a zero correlation given A = 1 does not imply the same for A = 0, unless there is full independence.
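One way to compute such a conditional correlation from the indicator maps is sketched below; it assumes the string_event helper from the previous sketch and binary data-event maps, and the variable names are illustrative.

```python
import numpy as np

def conditional_correlation(d_i, d_j, a, value=1):
    """Corr{D_i, D_j | A = value} between two binary data-event maps,
    computed over the pixels where the target event A equals `value`."""
    mask = (a == value)
    return np.corrcoef(d_i[mask].astype(float), d_j[mask].astype(float))[0, 1]

# Squared conditional correlation of the point-support and seismic-support
# string events, given a connected string at the v support (A(n) = 1):
# rho2 = conditional_correlation(string_event(i_point, 10),
#                                string_event(i_s, 10),
#                                string_event(i_v, 10)) ** 2
```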


Since we are interested in the data event A = 1, i.e., in the probability of pixels being connected, we only observe the correlation between the data given A = 1. The pair D1(n) and D3(n) has the least conditional correlation for all lags: connected strings of high values at the point support and at the w2 support are almost independent of each other given the connected values at support v. The maximum conditional correlation is observed between data D3(n) and D4(n). Note that the same datum D3(n) exhibits both the least and the greatest conditional correlation, with D1(n) and D4(n) respectively.

4 Combining data from different supports

4.1 Exact tau weights

In the accompanying paper [6], it was shown that the sequence-dependent exact tau weights τi can be expressed as a ratio of the data likelihoods of Di(s) given all previously utilized data D1(s), . . . , Di−1(s) in a sequence s:

τi(s)(d1, . . . , di, a) = Ln[ P(Di(s) | Ā, D1(s), . . . , Di−1(s)) / P(Di(s) | A, D1(s), . . . , Di−1(s)) ] / Ln[ P(Di(s) | Ā) / P(Di(s) | A) ]    (3)

where Ā denotes the complement of the event A.

The superscript (s) refers to a specific sequence of data conditioning starting with D1 and ending with Di. The lower-case notations d1, . . . , di, a represent the values taken by the data D1, . . . , Di and by the unknown A, respectively. In the example here, the data and the unknown are all binary, thus di = 0, 1 and a = 0, 1.

Knowledge of the joint probability distribution between the variables A, D1, D2, D3 and D4 allows one to compute these data likelihoods. One can average these sequence-dependent tau weights over all possible sequences s, resulting in sequence-averaged tau weights τi. For this example, we would have 1, 2, (3! = 6) and (4! = 24) sequences, respectively, for cases with 1, 2, 3 and 4 data. Only the sequence-averaged tau weights τi will be discussed hereafter.
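As an illustration of how equation (3) can be evaluated once the joint distribution is known, the sketch below computes one sequence-dependent tau weight from a tabulated joint pmf of the binary variables (A, D1, . . . , D4). The array layout and function names are assumptions made for this sketch; averaging over the sequence permutations would simply wrap this function.

```python
import numpy as np

def cond_prob(joint, target, given):
    """P(X_target = 1 | X_j = v for (j, v) in given), from a joint pmf `joint`
    over binary variables, one array axis per variable (axis 0 is A)."""
    idx = [slice(None)] * joint.ndim
    for axis, value in given:
        idx[axis] = value
    sub = joint[tuple(idx)]
    free = [ax for ax in range(joint.ndim) if ax not in dict(given)]
    p1 = np.take(sub, 1, axis=free.index(target)).sum()
    return p1 / sub.sum()

def exact_tau(joint, i, previous):
    """Exact tau weight (equation 3) of datum axis `i`, given previously used
    data axes `previous`, for the data values d = 1 and the event A = 1."""
    prev = [(j, 1) for j in previous]
    num = np.log(cond_prob(joint, i, [(0, 0)] + prev) /
                 cond_prob(joint, i, [(0, 1)] + prev))
    den = np.log(cond_prob(joint, i, [(0, 0)]) /
                 cond_prob(joint, i, [(0, 1)]))
    return num / den

# Example: tau weight of D2 (axis 2) after D1 (axis 1) in the sequence (D1, D2),
# given a 5-dimensional joint pmf `joint` of shape (2, 2, 2, 2, 2).
# tau_2 = exact_tau(joint, i=2, previous=[1])
```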

Note that the multivariate distribution between the variables A(n), D1(n), D2(n), D3(n) and D4(n) changes with lag n. Therefore, the relationships between the data are considered separately for each lag, that is, the tau weights are a function of the lag n.

Figure 10 shows these sequence-averaged tau weights τi. As a general rule, deviation of these weights from the value 1 implies deviation from conditional independence. Initially, focus only on the first six cases in Figure 10, corresponding to conditioning to two data only. It is clear that data with poor correlation (D1, D3) have tau weights closer to 1 than those with stronger correlation (D1, D2). Note that even though D1-D3 have very poor correlation (ρ ≈ 0.1) from Figure 9, their tau weights are significantly different from 1 (τ1, τ3 ≈ 0.7). This indicates that even small correlations could result in significant deviations from conditional independence.

Next, look at the (D1, D2) pair. At lag 45, the two data D1 and D2 are equally informative and get an equal weight of 0.59. Next consider the case D1 and D4. Datum D4 gets a lower weight (≈ 0.6) than D1 (≈ 0.8). For the (D2, D3) pair, the weights are almost similar and close to 0.6 at all lags; the correlation in this case is similar in magnitude to that of (D1, D2). The cases (D2, D4) and (D3, D4) show behavior similar to that seen in (D1, D4): datum D4 receives much less weight than the other datum.

From these six cases, one can observe that the sequence-averaged tau weights are a function of the interactions between data redundancy, data information and data value. There is complex overlap between these concepts; for example, changing the data values results in changing the data information content (a form of heteroscedasticity). All of these concepts need to be studied jointly towards their contribution to the tau weights.

The cases of conditioning to three data are considerably more complex. Consider first the triplet (D1, D2, D3) and compare with the three previous cases of (D1, D2), (D1, D3) and (D2, D3). τ2 is now consistently lower than τ1; there is no switching between these two weights. The behavior of τ1 and τ3 is similar to their behavior in the two-data case, with the switch occurring at a greater lag than in the two-data D1-D3 case. As opposed to the two-data case, τ3 is consistently greater than τ2, suggesting that the presence of the datum D1 has an impact on the interaction between the two data D2, D3. The effect of the joint distribution of the three data D1, D2 and D3 is observed here.

Looking at the other cases of three and four data, τ4 is always the lowest and τ1 is consistently greater than the other weights. The other two weights τ2 and τ3 show interesting behavior in that, depending on which other data are involved, one is lower or greater than the other. This means that the weights of closely related data (Figure 9) which are similarly important depend on their interaction with the other data.

4.2 Calibration-based tau weights

Next, the tau weights are computed using the proposed approximate calibration technique [6]. Briefly described, this technique first involves a ranking of the data according to their information content about the unknown. Then, one requires the conditional correlation of each datum with the most informative datum. The tau weight for the most informative, first datum is set to τ1 = 1. Then, all other data obtain a weight given by:

τi = 1 − (ρ²(Di, D1|A))^f(t) ,   τi ∈ [0, 1)    (4)

where t is a calibration parameter varying in [0, 1]. This technique requires the knowledge of the conditional correlation (Figure 9) and of the calibration parameter t.

Calibrating the tau model

In expression (4) the weights τk are approximated in terms of the conditional correlation ρ²(Dk(n), D1(n)|A), requiring the calibration of a single scaling parameter, here denoted t(n); the superscript (n) denotes the lag. More precisely, for a series of values t(n) ∈ [0, 1], expression (4) gives the tau values:

τk(n) = 1 − (ρ²(Dk(n), D1(n)|A))^f(t(n))   ∀ k = 2, . . . , K

Using this series of values for τk(n), the estimated distance x*(t) is computed using equation (2). This estimated value x*(t) is then compared with the true value x computed from the training images of Figure 1. The value of parameter t(n) which minimizes the squared error (x − x*(t))² is then taken as the optimal value:

t(n) = argmin_{t ∈ [0,1]} (x − x*(t))²    (5)
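A possible implementation of this calibration is sketched below; it reuses the tau_combine function from the earlier sketch, and since the form of the exponent f(t) is not restated here, the sketch simply assumes f(t) = t as an illustrative choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def calibrated_taus(rho2, t, f=lambda t: t):
    """Equation (4): tau_1 = 1 for the most informative datum,
    tau_k = 1 - (rho2_k)**f(t) for the others (rho2_k given datum 1)."""
    return np.concatenate(([1.0], 1.0 - np.asarray(rho2) ** f(t)))

def calibrate_t(x_true, p_prior, p_single, rho2):
    """Equation (5): the t in [0, 1] minimizing the squared error between the
    true distance x and the tau-model estimate x*(t)."""
    def sq_err(t):
        p_est = tau_combine(p_prior, p_single, calibrated_taus(rho2, t))
        x_est = (1.0 - p_est) / p_est
        return (x_true - x_est) ** 2
    return minimize_scalar(sq_err, bounds=(0.0, 1.0), method="bounded").x
```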

Combining two data

First consider the case of conditioning to data D1 and D2 only. Using the procedure described above, the rescaling parameter t(n) is computed for each lag n, see Figure 11. There is little variation of this parameter over all string lengths n. This almost constant value of t(n) over all n suggests some aspect unique to the data set and/or to the proposed heuristic calibration equation (4).

Figure 12 shows the τ weights computed using the optimal calibration values t(n) and the conditional correlations shown in Figure 9. At lags less than 45, datum D2(n) is the most informative, but at larger lags it is datum D1(n) that is most informative. Since we assign a weight 1 to the most informative datum, we observe in Figure 12 the switch in weights at lag 45.

Detailed studies have been performed to assess the sensitivity of the estimated combined probability to incorrect calibration of parameter t [4]. Such studies have revealed that estimation using this calibration technique is indeed highly sensitive to accurate estimation of parameter t. A poor knowledge of the data physics, hence a poor vectorial Ti, will lead to an incorrect evaluation of data redundancy and inaccurate estimates of the conditional probability.


The behavior for D1-D3 and D2-D3 is similar to that of D1-D2. In the case of the pair D1-D3, the switch in weights happens around lag 15, whereas for D2-D3 the switch is at an initial lag close to 1.

The behavior of the tau weights is different for all cases involving datum D4. Since the D4 datum is always less informative than any other datum, it receives a lower weight (< 0.5) in all cases. No switch in weights is observed for these cases involving D4.

Combining three data

Next, we consider conditioning to three data taken together. Consider the case of conditioning to D1, D2 and D3 in Figure 12. Beyond lag 45, D1 is the most informative datum and therefore gets the maximum weight of 1. Beyond this lag, τ3 is greater than τ2. From Figure 9, observe that amongst the two data D2 and D3, the datum which is consistently better correlated with D1 is D2. The greater the correlation with the most informative datum, the lower the tau weight given to that datum. This is the behavior observed in this case. One observes a similar behavior in Figure 10 with the exact tau weights, albeit in a less marked fashion.

Next, consider the case of conditioning to D1, D2 and D4. Figure 12 shows that D1 takes the maximum tau weight beyond lag 45, and τ4 receives a greater tau weight than τ2. This follows from the ordering of the respective correlations with datum D1 (Figure 9). However, this ordering of tau weights is different from that observed in Figure 10: the two-point correlations used in computing the calibrated weights are insufficient to predict the behavior of the exact tau weights.

Similar behavior is observed for the cases D1-D3-D4 and D2-D3-D4. The maximum weight is again given to datum D1 for the first case and to D2 for the second case, but the order of the weights given to the other data is reversed.

Considering all four data

Finally, we consider conditioning to all four data D1, D2, D3 and D4. The behavior of the computed weights in Figure 12 is similar to that of D1-D2-D3, but more complex because of the introduction of the fourth datum D4(n). As before, one can distinguish two zones: n ∈ [5, 45] and n > 45.

• In the first zone n ∈ [5, 45], datum D2 has maximum information about the unknown A and receives a weight of 1. The weight given to any other datum is greater if its conditional correlation with the maximally informative datum D2(n) is lower (see Figure 9). Consequently: τ4 ≥ τ1 ≥ τ3.


• In the second zone n > 45, datum D1(n) is maximally informative, receiving a weight of 1. Here τ3 ≥ τ4 ≥ τ2.

Comparing with the exact averaged weights of Figure 10, one observes that the ordering of the weights given to the less important data is different. Also, those weights are less different from each other.

4.3 Cost of ignoring data redundancy

A mere visual inspection of Figures 1 and 5 starkly conveys that there is strong dependence between data coming from different supports, and that those relationships cannot all be linked to the common estimation goal A, the connectivity of strings at the support size 11 × 11. Such data dependence implies that the information coming from these supports towards the estimation goal A bears considerable redundancy. The plots of conditional correlation in Figure 9 confirm this information redundancy.

Figure 13 shows the estimated probabilities using tau weights of 1, resulting from a conditional independence (CI) assumption, compared with the exact probabilities. With CI, one obtains an almost sure probability of 1 for all cases. Conditional independence, here, results in too much importance being given to each individual datum, leading to an apparent sense of greater concordance of information and greater certainty. Such an assumption is hardly ever justified, except for circumstances of physically inferred full independence between data, examples of which are few in the earth sciences. Therefore, one must, if necessary, make an assumption of CI with caution.
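The over-compounding effect is easy to reproduce with the tau_combine sketch given earlier; the probabilities and the redundancy-aware tau values below are illustrative only, not values taken from Figure 13.

```python
# Three moderately informative but mutually redundant data, each with
# P(A|D_i) = 0.6 against a prior P(A) = 0.25.
p_prior, p_single = 0.25, [0.6, 0.6, 0.6]

print(tau_combine(p_prior, p_single, taus=[1.0, 1.0, 1.0]))  # CI assumption: about 0.97
print(tau_combine(p_prior, p_single, taus=[1.0, 0.5, 0.3]))  # redundancy-aware: about 0.83
```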

Nor can the assumption of conditional independence be deemed a safe one. The safety of a model is not determined by matters of analytical convenience, but by the consequences of the assumption on the physical quantities being estimated.

5 Discussion and Conclusions

This paper illustrates a unique methodology to combine complex multiple-point information arriving from diverse sources and defined over varying supports. The information arriving from individual data is represented in the form of conditional probabilities. The overlap in different data information, i.e., data redundancy, is accounted for by the tau weights in the tau model. The tau weights are a measure of multiple-data redundancy; therefore they go much beyond traditional measures such as two-data correlations. These tau weights are a function of the data values, or intervals of data values, and therefore they can also account for heteroscedastic dependency between the data and the unknown.


The example presented in this paper illustrates the case of evaluating the connectivity of high permeability values, which determines to a great extent the paths of fluid flow and transport in porous media. Since information about permeability and its connectivity arrives from multiple sources, namely well core, geophysical data and dynamic well-based data, one needs to synthesize all these different pieces of information together in order to arrive at an estimate of the unknown permeability connectivity. Complex redundancies exist between these data. In most data combination procedures, there are calls to simplifying assumptions which eventually result in ignoring these redundancies. We have shown that ignoring such redundancies cannot be considered a safe assumption. In some cases, accounting for data redundancy can become as important as the individual data processing itself.

The important question that arises then is how to determine these data redundancies. Here, we have proposed the concept of a vectorial training image that represents a single, joint realization of the unknown and all data variables. A conceptual knowledge of the geology is used along with an understanding of the data physics to create this vectorial Ti. Using this vectorial Ti, we infer all statistics required to determine the tau weights. The exact tau weights computed using this Ti show complex interactions resulting from the redundancy between the different data. An approximate calibration technique is proposed to compute these tau weights.

Application of the tau model framework to other problems may require novel strategies to evaluate data redundancy. For many examples in the earth sciences, it should be possible to construct an analog model or a vectorial training image. Conceptual, mathematical and numerical algorithms have been developed for many data measurement processes over the past few decades. These forward models, which operate on earth properties and model the data measurement procedure, are frequently used in inversion procedures that evaluate the data information about the unknown. In fact, it can be stated that any data measurement corresponds to an inversion procedure which uses an implicit forward model. Such forward models can be used to construct a vectorial Ti.

A major challenge that lies ahead is in identifying appropriate procedures for constructing these vectorial Tis and thereby evaluating data redundancy. That would require novel techniques adapted to each individual problem, for example the combination of satellite and ground-truth information. Developing such novel procedures for evaluating data redundancy needs to be an area of focus in the future.


References

[1] B.A. Arpat. A multi-scale pattern-based approach to sequential simulation. In Proceedings of Geostatistics Congress, Banff, 2004.

[2] C. Deutsch. Geostatistical Reservoir Modeling. Oxford University Press, 2002.

[3] C. Deutsch and A.G. Journel. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, 1998.

[4] S. Krishnan. Combining diverse and partially redundant information in the Earth Sciences. PhD Thesis, Stanford University, 2004.

[5] S. Krishnan and A.G. Journel. Spatial connectivity: from variograms to multiple-point measures. Mathematical Geology, 35(8):915–925, 2003.

[6] S. Krishnan and A.G. Journel. The tau model for data redundancy: Part 1. Mathematical Geology, this volume, 2005.

[7] S. Srinivasan. Is crisp modeling of geological objects important for flow - when is flow convective? In 12th Annual SCRF Meeting, Stanford University, 1999.

[8] S. Srinivasan. Integration of production data into reservoir models: a forward modeling perspective. PhD Thesis, Stanford University, 2000.

[9] S. Strebelle. Sequential simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1):1–22, 2001.

[10] T. Zhang, P. Switzer, and A.G. Journel. Sequential conditional simulation using classification of local patterns. In Proceedings of Geostatistics Congress, Banff, 2004.


Figure 1: Pixelmaps of (a) point support Z(u) (fine scale), (b) Zv(u) (geometric average 11 × 11), (c) Zw1(u) (string average, length 10), (d) Zw2(u) (annular average, radii 5, 10, 20), (e) Zs(u) (linear average 51 × 51).

Figure 2: Radial directions: 8 radial directions of length 10 units defining support w1.

Figure 3: Annular regions: 3 annular regions of radii 5, 10 and 20 pixels defining support w2.

Figure 4: Histograms of (a) point support Z(u), (b) Zv(u), (c) Zw1(u), (d) Zw2(u), (e) Zs(u). Summary statistics read from the histograms:

Z(u): mean 228.32, std. dev. 417.96, coef. of var. 1.83, upper quartile 265.78, median 98.51, lower quartile 34.92
Zv(u): mean 153.04, std. dev. 179.13, coef. of var. 1.17, upper quartile 190.28, median 91.52, lower quartile 44.61
Zw1(u): mean 132.79, std. dev. 145.73, coef. of var. 1.10, upper quartile 170.08, median 84.27, lower quartile 38.19
Zw2(u): mean 122.38, std. dev. 116.37, coef. of var. 0.95, upper quartile 154.12, median 87.08, lower quartile 47.82
Zs(u): mean 228.52, std. dev. 160.59, coef. of var. 0.70, upper quartile 289.89, median 188.31, lower quartile 116.99

Figure 5: Pixelmaps of (a) point support I(u), (b) Iv(u), (c) Iw1(u), (d) Iw2(u), (e) Is(u).

Figure 6: Single data conditional probabilities: connectivity function in EW of the high values at the v support.

Figure 7: Two data conditional probabilities: connectivity function in EW at the v support.

Figure 8: Three and four data conditional probabilities: connectivity function in EW at the v support conditioned to data from three other supports.

Figure 9: Conditional correlations: square of conditional correlations between Dk(n), k = 1, . . . , 4 given the unknown A = 1.

Figure 10: Exact averaged tau weights τi: the sequence-averaged exact tau weights computed for each case of data conditioning using the reference vectorial Ti.

Figure 11: Calibration parameter t(n) for all cases: all sets of 11 combinations.

Figure 12: Tau weights using calibration: all sets of 11 tau weights from the calibration approximation.

Figure 13: Impact of incorrect conditional independence assumption: all sets of 11 combinations.