GEOPHYSICS, VOL. 74, NO. 1 (JANUARY-FEBRUARY 2009); P. P1–P11, 12 FIGS., 1 TABLE. 10.1190/1.3046455. OFFICIAL ONLINE VERSION: http://dx.doi.org/10.1190/1.3046455

A visual data-mining methodology for seismic-facies analysis: Part 1 - Testing and comparison with other unsupervised clustering methods

Iván Dimitri Marroquín1, Jean-Jules Brault2, and Bruce S. Hart3

ABSTRACT

Seismic facies analysis aims to identify clusters (groups) of similar seismic trace shapes, where each cluster can be considered to represent variability in lithology, rock properties, and/or fluid content of the strata being imaged. Unfortunately, it is not always clear whether the seismic data has a natural clustering structure. Cluster analysis consists of a family of approaches that have significant potential for classifying seismic trace shapes into meaningful clusters. The clustering can be performed using a supervised process (assigning a pattern to a predefined cluster) or an unsupervised process (partitioning a collection of patterns into groups without predefined clusters). We evaluate and compare different unsupervised clustering algorithms (e.g., partition, hierarchical, probabilistic, and soft competitive models) for pattern recognition based entirely on the characteristics of the seismic response. From validation results on simple data sets, we demonstrate that a self-organizing maps algorithm implemented in a visual data-mining approach outperforms all other clustering algorithms for interpreting the cluster structure. We apply this approach to 2D seismic models generated using a discrete, known number of different stratigraphic geometries. The visual strategy recovers the correct number of end-member seismic facies in the model tests, showing that it is suitable for pattern recognition in highly correlated and continuous seismic data.

INTRODUCTION

Seismic facies originally were defined based on qualitative but objective descriptions of the seismic trace shape (Mitchum et al., 1977). Defined this way, the facies correspond to amplitude, phase, and frequency variations along and between traces in a specific interval of a seismic data set (e.g., high-amplitude parallel reflections or low-amplitude chaotic reflections). Automated seismic facies classification has become a standard technique used in interpreting 3D seismic data volumes (Coléou et al., 2003). The objective is to characterize the physical properties of a reservoir by mapping seismic facies, each of which is thought to represent an area with similar geologic characteristics (Fournier and Derain, 1995; Saggaf et al., 2003).

Manuscript received by the Editor 23 October 2007; revised manuscript received 11 June 2008; published online 31 December 2008.
1Formerly McGill University, Earth & Planetary Sciences Department, Montréal, Canada; presently Paradigm, Houston, Texas, U.S.A. E-mail: ivan[email protected].
2École Polytechnique Montréal, Département de Génie électrique, Campus de l'Université de Montréal, Montréal, Canada. E-mail: jean-jules.brault@polymtl.qc.ca.
3Formerly McGill University, Earth & Planetary Sciences Department, Montréal, Canada; presently ConocoPhillips, Houston, Texas, U.S.A. E-mail: bruce[email protected].
© 2009 Society of Exploration Geophysicists. All rights reserved.

Several studies use automated pattern-recognition techniques on seismic data. For example, Mathieu and Rice (1969) employ discriminant-factor analysis to interpret lithologic changes in a reservoir from seismic data. Dumay and Fournier (1988) use both principal-component analysis and discriminant-factor analysis for automatic seismic facies recognition, whereas Simaan's (1991) knowledge-based system segments the seismic section into zones of common signal character. Yang and Huang (1991) use a hybrid neural network to detect seismic patterns, Vinther et al. (1995) propose a computer-aided approach for texture classification, and Fournier and Derain (1995) combine seismic facies analysis with seismic-attribute calibration to characterize reservoir properties. West et al. (2002) develop a seismic facies classification combining textural attributes and neural networks, Gao (2004) develops a texture-model regression method for seismic facies visualization and discrimination, and de Matos et al. (2007) propose an approach that integrates a wavelet transform to identify singularities in the seismic trace shape and self-organizing maps as a visual tool to determine the cluster structure in data.

In unsupervised clustering of seismic data, the objective is to partition the data into distinct seismic facies without using a priori information to guide the seismic trace classification (Fournier and Derain, 1995). Ideally, this partitioning provides a natural clustering structure, in which patterns in a given cluster resemble each other more than patterns in other clusters (Xu and Wunsch, 2005). However, seismic data volumes consist of highly continuous, redundant data that can sometimes be significantly noisy. These characteristics can prevent well-defined clusters from developing, which in turn may inhibit the data from being partitioned into seismic facies that meaningfully characterize reservoir heterogeneities (Coléou et al., 2003). Other potential problems that may lead to overlapping clusters (or worse, no clusters at all but a continuum) include the effects of seismic acquisition, processing, and imaging on trace shape.

The number of different algorithms reported or used in cluster analysis is more than we can characterize here (e.g., Xu and Wunsch, 2005). In general, most of the clustering algorithms have an underlying optimization criterion, which in turn determines the output of the clustering procedure. Thus, according to the criterion adopted to define clusters, the algorithms can be classified into four types (e.g., Jain et al., 1999; Xu and Wunsch, 2005):

• Partition models, which attempt to decompose the data set directly into a set of disjoint clusters.
• Probabilistic models, which attempt to estimate probabilistic parameters with a clustering purpose.
• Hierarchical models, which proceed successively by merging smaller clusters into larger ones or by splitting larger clusters into smaller ones.
• Soft competitive models, which are similar to partition models except that the adaptation of cluster centers is defined by their interconnections.

As in K-means (MacQueen, 1967), these algorithms start with a random initial partition and keep reassigning the patterns to clusters based on the similarity between the pattern and the cluster centers (except hierarchical-based algorithms, which find successive clusters using previously established clusters). It is also important to mention that nonrandom cluster-initialization methods have been proposed (e.g., He et al., 2004) to deal with clustering-output sensitivity to initial conditions and convergence to an optimal solution. Unfortunately, there is no generally accepted initialization method for clustering, and this topic continues to be an area of active research (Fayyad et al., 1998; Meilă and Heckerman, 1998; He et al., 2004; Moth'd Belal, 2005).

In this paper, we test six different unsupervised clustering algorithms to assess their ability to define seismic facies (Figure 1). The tests we conducted on simple 2D data sets demonstrate that a self-organizing maps algorithm implemented in a visual data-mining approach (Guo et al., 2005) provides an environment that facilitates the interpretation of the cluster structure. Further tests on 2D seismic models having a 25-dimensional analysis space show that this visual strategy is suitable for recovering major groups of patterns in highly correlated and continuous seismic data. In the companion paper (Marroquín et al., 2009), we illustrate how this approach can be applied to seismic facies analysis on a 3D seismic data volume by presenting examples of its application to the characterization of carbonate and clastic deposits.

Figure 1. Clustering methods. For each category, we selected a representative algorithm to assess its ability to define seismic facies.

METHODOLOGY

Cluster analysis is a useful data-mining technique for identifying significant distributions in data sets (Jain et al., 1999). However, a wide variety of clustering algorithms have been developed, and the results depend on the criteria used for partitioning the data. It is not always clear which clustering algorithm will provide the best solution in new situations. To address this clustering problem, we followed the framework proposed by Xu and Wunsch (2005):

1) Select an analysis window, i.e., the features on which the cluster analysis is to be performed.

2) Choose a clustering algorithm that defines a good clustering scheme for the given data set.

3) Validate the results and verify the effectiveness of a clustering algorithm using an appropriate criterion or set of criteria.

4) Analyze the data partitions recovered and interpret whether they represent the correct cluster structure in the data set.

Below, we review the main characteristics of the unsupervised clustering algorithms. We then present the criteria used to compare and evaluate the clustering algorithms' ability to recover the cluster structure present in the data sets. The following discussion is representative, not exhaustive. Interested readers can consult Marroquín (2007) for details, including the public-domain software we evaluated during this study.

Partition models

The objective of partition-based algorithms is to optimize the following criterion:

$$E = \sum_{i=1}^{K} \sum_{x \in K_i} d(x, m_i), \tag{1}$$

where $x$ is an input pattern, $m_i$ is the center of cluster $K_i$, and $d(x, m_i)$ is the Euclidean distance between an input pattern $x$ and $m_i$.

The hard competitive learning algorithm (Fritzke, 1997) is referred to as an online crisp clustering algorithm, where the set of cluster centers $w$ are seen as output units of a neural network that adapts every time a new input $x$ is presented. Only one output unit is the winner for each given pattern, and the winner converges toward its input pattern. The adaptation rule for the winner unit is determined by

$$\Delta w_j(t) = \varepsilon(t)\,\bigl(x(t) - w_j(t)\bigr), \tag{2}$$

where $\varepsilon(t)$ is an exponential learning rate that decays over time (i.e., the extent to which the winner unit is adapted toward the input) and $t$ is the number of iterations. This update procedure reinforces the association between the input pattern and the winning unit.
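The update in equation 2 and the criterion in equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not the software evaluated in the study: the function names, the decay schedule for the learning rate, and the default parameter values are all assumptions made for the example.

```python
import math
import random


def hard_competitive_learning(patterns, n_centers, n_iter=2000,
                              eps_start=0.5, eps_end=0.01, seed=0):
    """Online winner-take-all clustering following equation 2."""
    rng = random.Random(seed)
    # Assumption: centers start on randomly chosen input patterns.
    centers = [list(p) for p in rng.sample(patterns, n_centers)]
    for t in range(n_iter):
        # Assumed schedule: exponential decay from eps_start to eps_end.
        eps = eps_start * (eps_end / eps_start) ** (t / n_iter)
        x = rng.choice(patterns)
        # The winner is the center closest to x in Euclidean distance.
        j = min(range(n_centers), key=lambda c: math.dist(x, centers[c]))
        # Equation 2: only the winner is moved toward the input pattern.
        centers[j] = [w + eps * (xi - w) for w, xi in zip(centers[j], x)]
    return centers


def partition_criterion(patterns, centers):
    """Equation 1: summed distance of each pattern to its nearest center."""
    return sum(min(math.dist(x, m) for m in centers) for x in patterns)
```

Because the initial centers are drawn at random, two runs with different seeds can end in different partitions, which is the sensitivity to initial conditions discussed in the Introduction.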

Probabilistic models

In the probabilistic approach, the input patterns $x_1, \ldots, x_n$ in $\mathbb{R}^d$ are viewed as coming from a mixture of Gaussian distributions $h(x_i \mid \mu_k, \Sigma_k)$, each representing a different cluster with mean $\mu_k$ and covariance matrix $\Sigma_k$. The probability that $x_i$ belongs to a distribution is denoted by $p_k$, with $\sum_k p_k = 1$. We want to find the parameters $\theta_K = (p_1, \ldots, p_K, \mu_1, \ldots, \mu_K, \Sigma_1, \ldots, \Sigma_K)$ that maximize the log likelihood (Dempster et al., 1977)

$$L(\theta_K \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln\left(\sum_{k=1}^{K} p_k\, h(x_i \mid \mu_k, \Sigma_k)\right). \tag{3}$$

The expectation maximization (EM; Dempster et al., 1977) algorithm is an iterative procedure for maximizing $L(\theta_K \mid x_1, \ldots, x_n)$. To deal with the problem of getting the highest maximum log-likelihood values, two EM strategies were used. The first was a random start of a classification EM (CEM) algorithm followed by a run of the EM algorithm (CEM-EM). The CEM algorithm maximizes a modified version of the log-likelihood equation (equation 3), resulting in a mixture of well-separated distributions (Celeux and Govaert, 1992). The second strategy was a random start of the EM algorithm followed by a run of the EM algorithm (EM-EM). For both strategies, the stopping rule was the number of iterations as proposed by Celeux and Govaert (1992). They advise against using relative changes in the log likelihood because of the slow convergence of the EM algorithm.
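As an illustration of equation 3 and of a fixed-iteration stopping rule, the sketch below runs plain EM on a mixture of spherical Gaussians. It is a simplified stand-in, not the CEM-EM or EM-EM strategies themselves: the spherical covariance (var * I), the initialization on random patterns, the small floors that guard against empty components, and all function names are assumptions introduced for the example.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian with covariance var * I."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2.0 * math.pi * var) + sq / var)


def component_logs(x, weights, means, variances):
    """log(p_k * h(x | mu_k, Sigma_k)) for every component k."""
    return [math.log(p) + log_gauss(x, m, v)
            for p, m, v in zip(weights, means, variances)]


def log_likelihood(data, weights, means, variances):
    """Equation 3, evaluated with the log-sum-exp trick for stability."""
    total = 0.0
    for x in data:
        logs = component_logs(x, weights, means, variances)
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total


def em_spherical(data, k, n_iter=20, seed=0):
    """EM with a fixed number of iterations as the stopping rule."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(x) for x in rng.sample(data, k)]
    centroid = [sum(x[c] for x in data) / n for c in range(d)]
    var0 = sum(sum((a - b) ** 2 for a, b in zip(x, centroid))
               for x in data) / (n * d)
    variances = [var0] * k
    weights = [1.0 / k] * k
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pattern.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            r = [math.exp(v - top) for v in logs]
            s = sum(r)
            resp.append([ri / s for ri in r])
        # M-step: re-estimate mixing proportions, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[c] for r, x in zip(resp, data)) / nj
                        for c in range(d)]
            sq = sum(r[j] * sum((a - b) ** 2 for a, b in zip(x, means[j]))
                     for r, x in zip(resp, data))
            variances[j] = max(sq / (nj * d), 1e-9)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history
```

The returned `history` holds the log likelihood before the first iteration and after each subsequent one, so the slow late-stage convergence mentioned above can be inspected directly.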

Hierarchical models

In hierarchical clustering, a data set is organized into a series of partitions according to the measures of the dissimilarities among the current clusters. The result of the clustering procedure is depicted by a dendrogram, in which the intermediate nodes describe the similarity of the input patterns and the height of the dendrogram expresses the distance between each pair of clusters (Schonlau, 2002). Dendrograms provide a visual support for the potential clustering structure, from which the number of clusters can be determined by cutting the dendrogram at different levels of dissimilarity. Hierarchical-based algorithms are usually classified as agglomerative and divisive methods.

In the divisive method, all input patterns initially belong to the same cluster, and a procedure successively divides this cluster until singleton clusters (i.e., a cluster composed of a single input pattern) are present. Agglomerative algorithms proceed in the opposite way. They start with N singleton clusters and by a series of merge operations lead all input patterns to the same cluster. Hierarchical-based algorithms have several variations, depending on the criterion used to measure the distances among the clusters. We opted for the average linkage metric to avoid the extremes of large clusters (i.e., single linkage) or tight, compact clusters (i.e., complete linkage; Kaufman and Rousseeuw, 1990).
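The agglomerative procedure with average linkage can be sketched as follows. This is a deliberately naive implementation meant only to show the merge logic; the function name and the choice to stop when a requested number of clusters remains (equivalent to cutting the dendrogram at that level) are assumptions for the example.

```python
import math


def average_linkage(patterns, n_clusters):
    """Merge singleton clusters until n_clusters remain (average linkage)."""
    clusters = [[i] for i in range(len(patterns))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        return sum(math.dist(patterns[i], patterns[j])
                   for i in a for j in b) / (len(a) * len(b))

    merges = []  # (height, members) of each merge, i.e., the dendrogram links
    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        height, p, q = min(pairs)
        merged = clusters[p] + clusters[q]
        merges.append((height, sorted(merged)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters, merges
```

The merge heights recorded in `merges` are the link lengths that a dendrogram would display; long links late in the sequence suggest where to cut.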

Soft competitive models

Soft competitive algorithms attempt to adapt the actual winner and all other neighboring cluster centers in proportion to their activation. Those centers closer to the current input pattern make the largest adaptation, and those farther away make only minor, or no, updates. We tested two different algorithms: neural gas and self-organizing maps.

The neural gas algorithm (Martinetz and Schulten, 1991) sorts all cluster centers $w$ according to their distance from the input pattern $x$. After the sorting, all cluster centers are adapted according to their rank position. The adaptation of the winner units is given by

$$\Delta w_j(t) = \varepsilon(t)\, h_\lambda(k)\,\bigl(x(t) - w_j(t)\bigr), \tag{4}$$

where $\varepsilon(t)$ is an exponential learning rate that decays over time and $h_\lambda(k)$ is the neighborhood ranking, with $k$ denoting the number of the closest neighbors to the cluster center $w$, and $t$ is the number of adaptations. When the neighborhood-ranking parameter is close to one, the adaptation rule of the neural gas algorithm (equation 4) becomes equivalent to the hard competitive learning algorithm (equation 2). In contrast to Kohonen's algorithm (Kohonen, 1995) described below, the neural gas algorithm does not have a topology of fixed dimensionality to get a low-dimensional representation of the input patterns. In addition, the $h_\lambda(k)$ parameter is an adjusting rule that performs a normalized exponential, avoiding confinement to local minima (Martinetz and Schulten, 1991).

Self-organizing maps (SOM; Kohonen, 1995) is an algorithm that approximates the input patterns by a finite set of processing neurons arranged in a regular, usually 2D grid of map nodes. Each SOM node is associated with a weight vector $w$, updated according to the following rule:

$$w_j(t+1) = w_j(t) + \varepsilon(t)\, h_{bi}(k)\,\bigl(x(t) - w_j(t)\bigr), \tag{5}$$

where $t$ is the number of iterations, $\varepsilon(t)$ is the learning rate, and $h_{bi}(k)$ is the neighborhood kernel. At the beginning of the learning process, the neighborhood kernel is chosen to be fairly large to guarantee global ordering of the SOM nodes. Later, the learning rate and the width of the neighborhood kernel slowly decrease during the learning to fine-tune the SOM nodes, resulting in a topological ordering (e.g., nodes adjacent to each other tend to have similar weight vectors).
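The two soft competitive update rules differ only in how the neighborhood term is computed: neural gas ranks the centers by distance in data space, whereas SOM uses distance on the map grid. The sketch below shows one adaptation step of each. The exponential rank kernel exp(-rank / lam) and the Gaussian grid kernel are common choices assumed here for illustration, and the function and parameter names are hypothetical.

```python
import math


def neural_gas_step(centers, x, eps, lam):
    """One adaptation step in the spirit of equation 4 (rank-based)."""
    order = sorted(range(len(centers)),
                   key=lambda j: math.dist(x, centers[j]))
    for rank, j in enumerate(order):
        h = math.exp(-rank / lam)  # rank 0 (the winner) gets h = 1
        centers[j] = [w + eps * h * (xi - w) for w, xi in zip(centers[j], x)]
    return centers


def som_step(weights, grid, x, eps, sigma):
    """One adaptation step in the spirit of equation 5 (grid-based).

    grid[j] holds the map coordinates of node j; weights[j] is its vector.
    """
    # Best-matching node: closest weight vector to the input pattern.
    b = min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
    for j in range(len(weights)):
        g2 = sum((a - c) ** 2 for a, c in zip(grid[j], grid[b]))
        h = math.exp(-g2 / (2.0 * sigma ** 2))  # assumed Gaussian kernel
        weights[j] = [w + eps * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights
```

In a full training loop, `eps` and the kernel width (`lam` or `sigma`) would be decreased over the iterations, which is the fine-tuning stage described above.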

We used a SOM algorithm implemented in a visual data-mining approach by Guo et al. (2005). In this data-mining tool, the interpreter can explore and examine patterns in the data through the coordinated visual interpretation of three graphic display components: (1) a 2D hexagonal grid of map nodes to represent visually the SOM's clustering output with the unified distance matrix (U-matrix; Ultsch and Siemon, 1990), (2) a multidimensional visualization with the parallel coordinate plot (PCP; Inselberg, 1985), and (3) a map view to display the spatial relationships of the patterns discovered by SOM. Each component performs a specific task and can coordinate the whole process with other components for exploring complex (i.e., high-dimensional and very large) data sets. Guo et al. (2005) develop their method to investigate multivariate spatial patterns between variables such as socioeconomic factors and cancer incidence. In this paper, we use their method to analyze seismic data.

The U-matrix display (Ultsch and Siemon, 1990) contains two maps in one: a distance map and a frequency map. The distance map uses a grayscale image to indicate visually the similarity of a map node to its immediate neighbors. Dark shading indicates the surrounding map nodes have dramatically different values; light shading shows the surrounding nodes have very similar values. Therefore, a region in the U-matrix with light shading is a probable indication of a cluster. On the other hand, the frequency map is presented as colored circles of variable size in the U-matrix display. It shows how many input patterns are projected to each map node, revealing which map nodes are active and which ones smooth the transition from one cluster to the next.

In the PCP approach to visualization, a point in m-dimensional space is represented as a series of m − 1 polygonal lines (Inselberg, 1985). In contrast to Cartesian coordinates, in which all axes are mutually perpendicular, each axis in the PCP plot corresponds to a variable (dimension), and the N-axes are organized as uniformly spaced parallel lines. A vector with values (x1, x2, …, xN) is represented by plotting x1 on axis 1, x2 on axis 2, and so on through xN on axis N. The points plotted in this manner are joined by a polygonal segment. The advantage of PCP display is that it allows us to investigate the presence of multidimensional clusters in a planar diagram (Wegman and Luo, 1996). Thus, clusters are detected by the grouping of polygonal segments on any one axis or between any pair of axes. Furthermore, it is possible to see whether clusters propagate through other axes (dimensions) because of the multidimensional connectedness of the polygonal segments.

However, as with other visualization techniques when data sets are too complex, the PCP display suffers from overplotting, resulting in an image far too cluttered to perceive any patterns (Keim and Kriegel, 1996). The visual strategy reduces visual clutter by plotting the summarized output of the SOM (Guo et al., 2005). Like the frequency map, the thickness of the polygonal segments represents the number of input patterns contained in an SOM map node. The PCP component can use nested-means or linear-scaling procedures to improve the visual aspect of the polygonal segments. The visual strategy also integrates a color component that links the frequency map, in the U-matrix display, with the PCP and map view components. Because of the 2D nature of the data sets used in this paper, we do not show the map display. However, map views from 3D seismic data sets are presented in Marroquín et al. (2009).
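To make the distance map of the U-matrix concrete, the sketch below computes, for each node of a trained SOM, the mean distance between its weight vector and those of its immediate grid neighbors. It assumes a rectangular grid with four-connected neighbors for simplicity, whereas the tool described here uses a hexagonal grid; the function name and storage layout are also assumptions.

```python
import math


def u_matrix(weights, n_rows, n_cols):
    """Mean distance from each node's weight vector to its grid neighbors.

    weights[r * n_cols + c] is the weight vector of the node at (r, c).
    The grid must contain at least two nodes.
    """
    heights = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(math.dist(weights[r * n_cols + c],
                                           weights[rr * n_cols + cc]))
            row.append(sum(dists) / len(dists))
        heights.append(row)
    return heights
```

Large values correspond to the dark ridges that separate clusters, and small values to the light regions interpreted as clusters.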

Figure 2. 2D scatter plots of normal data sets used for evaluating and comparing the unsupervised clustering algorithms. The x- and y-axes show the distribution of data points. (a) Data set 1 consists of well-separated clusters. In (b) data set 2 and (c) data set 3, the clusters become closer and eventually begin to overlap.

Cluster validity assessment

We sought, unsuccessfully (see below), a single criterion that could be used to evaluate and compare the results of the different clustering methods. Many criteria have been proposed to characterize cluster structure in a data set (Jain et al., 1999). We opted for an entropy-based criterion named variation of information (VI; Meilă, 2003). VI measures the amount of information lost or gained when comparing two partitions of the same data set (Meilă, 2003). Thus, the criterion gives a measure of clustering quality because the smaller the entropy value, the better the mapping of the input patterns onto natural clusters. The criterion is given by

$$VI = \bigl[H(C) - I(C, C')\bigr] + \bigl[H(C') - I(C, C')\bigr], \tag{6}$$

where $H(C)$ and $H(C')$ represent the entropies associated with clusterings $C$ and $C'$, respectively, and $I(C, C')$ is the mutual information between the clusterings $C$ and $C'$, given by

$$I(C, C') = \sum_{k=1}^{K} \sum_{k'=1}^{K'} P(k, k') \log \frac{P(k, k')}{P(k)\,P(k')}, \tag{7}$$

where $P(k)$ is the probability that the outcome is in cluster $C_k$, and $P(k, k')$ is the joint probability that a data point belongs to $C_k$ in clustering $C$ and to $C_{k'}$ in $C'$ (Meilă, 2003).
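Equations 6 and 7 can be evaluated directly from two label vectors over the same data points. The sketch below is one way to do so; it estimates the probabilities from relative frequencies and uses natural logarithms, both of which are assumptions made for the example, as is the function name.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI between two clusterings of the same points (equations 6 and 7)."""
    n = len(labels_a)
    pa = Counter(labels_a)                   # cluster sizes in C
    pb = Counter(labels_b)                   # cluster sizes in C'
    joint = Counter(zip(labels_a, labels_b)) # co-occurrence counts

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Equation 7: mutual information between the two clusterings.
    mutual = sum((c / n) * math.log((c / n) / ((pa[k] / n) * (pb[k2] / n)))
                 for (k, k2), c in joint.items())
    # Equation 6: [H(C) - I(C, C')] + [H(C') - I(C, C')].
    return (entropy(pa) - mutual) + (entropy(pb) - mutual)
```

Two clusterings that differ only by a relabeling give a VI of zero, while two statistically independent clusterings give the sum of their entropies.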

We used the VI criterion to estimate the number of clusters for hard competitive learning, EM, and neural gas algorithms. Unfortunately, the VI criterion could not be used for the hierarchical-based and SOM algorithms; so in these cases the output graphic displays were examined to evaluate the data structure and to infer the number of clusters.

RESULTS

Clustering analysis on 2D normally distributed data

We conducted experimental evaluations to examine whether the clustering algorithms could recover the true cluster structure on three 2D normally distributed data sets (referred to as data sets 1, 2, and 3), each containing 950 data points divided into three populations (Figure 2a-c). The data sets ranged from very well-separated clusters (data set 1) through an intermediate state (data set 2) to clusters that almost overlap (data set 3).

Partition models

For data sets 1 and 2, the VI criterion shows two unexpected trends: It points to the incorrect number of clusters (Figure 3a) and it selects more than one possible number of clusters (Figure 3b). The VI criterion points to the correct number of clusters only for data set 3 (Figure 3c). Furthermore, these variations in the VI criterion are independent of the quality of the clustering output. To demonstrate this, we ran the hard competitive learning algorithm a second time with the original data sets. The results are plotted in Figure 3d-f, in which the VI criterion effectively points to the correct number of clusters.

Probabilistic models

The VI criterion generally points to the correct number of clusters (Figure 4). This implies that a spherical Gaussian model fits the distribution of points in the test data sets; as a consequence, the EM strategies show a better convergence when compared to the hard competitive learning and neural gas algorithms (see Figure 3a and b and Figure 6a and b, respectively).


Hierarchical models

For data set 1 (Figure 5a and d), in which the clusters are very well separated, a cut through the longest links in the dendrograms provides the correct number of clusters (i.e., K = 3). As the clusters near each other (data set 2), the agglomerative algorithm still does well (Figure 5b), whereas the divisive algorithm can no longer keep the clusters apart (Figure 5e). Finally, when the clusters overlap even more (data set 3), the lack of hierarchical structure is further accentuated; therefore, neither of the dendrograms (Figure 5c and f) provides an informative description of the clustering structure.

Soft competitive models

For the neural gas clustering results (Figure 6a-c), the VI criterion also shows identical trends to those of the hard competitive learning algorithm (see Figure 3a-c). We investigated the causes of this behavior by examining the clustering output and found that the VI criterion only illustrates how both algorithms generated identical information exchange. We also ran the neural gas algorithm a second time from the original data sets. Again, the clustering output (Figure 6d-f) produced an identical information exchange (refer to Figure 3d-f). A further examination of the clustering output revealed that both algorithms created clusters whose centers are more representative of the distribution of points in the test data sets.

The SOM algorithm we tested performed well. From the U-matrix display of data sets 1 and 2 (Figure 7a and b), we clearly distinguish three clusters separated by ridges of dark gray nodes. On these ridges, the frequency-count map shows nodes with very few hits. As suggested by Zhang and Li (1993), we can also acquire information from these nodes because they may indicate cluster borders. The cluster structure suggested by the U-matrix display corresponds to the correct number of clusters in these test models. In the corresponding PCP graphic displays (Figure 7d and e), the associations of the polygonal lines correctly represent the number of clusters defined in the U-matrix display. For data set 3, in which the overlapping is more accentuated, the U-matrix display (Figure 7c) shows a well-separated cluster of patterns with red/pink on the right side of the SOM map, along with a poorly defined ridge of dark gray nodes separating the patterns with green/brown from those with blue. In this case, the frequency map indicates that all nodes are active during the learning process. However, the polygonal lines in the PCP plot (Figure 7f) can be used to define the presence of three clusters in this data set.

Figure 3. Clustering results for hard competitive learning algorithm. Plots are of variation of information (VI) criterion versus number of clusters from 2D normal data sets with three clusters (Figure 2). (a–c) Results obtained in a first run; (d–f) improved results obtained in a second run. Note how the performance of partition algorithms depends on initial clustering conditions. The arrow indicates when the algorithm recovers the true number of clusters.

Figure 4. Clustering results for EM algorithm. Plots are of variation of information (VI) criterion versus number of clusters from 2D normal data sets consisting of three clusters (Figure 2). (a–c) Results for the CEM-EM strategy; (d–f) results for the EM-EM strategy. Model-based performance depends on strategies to improve the estimation of the maximum likelihood. The arrow indicates when the algorithm recovers the true number of clusters.

Figure 5. Clustering results for hierarchical-based algorithms. Dendrograms obtained by using (a–c) the agglomerative method and (d–f) the divisive method on 2D normal data sets with three clusters (Figure 2). Note how the degree of overlapping influences the ability of the hierarchical algorithms to retrieve the true number of clusters. The horizontal line shows the K groups predicted from the visual examination of the longest links (a correct estimation is denoted by an asterisk).

Figure 6. Clustering results for neural gas algorithm. Plots are of variation of information (VI) criterion versus number of clusters from 2D normal data sets with three clusters (Figure 2). (a–c) Results obtained in a first run; (d–f) improved results obtained in a second run. Note how the performance of the algorithm depends on initial clustering conditions. The arrow indicates when the algorithm recovers the true number of clusters.

Figure 7. Visual data-mining clustering results of 2D normal data sets with three clusters (Figure 2). (a–c) The interpreted clusters are shown by circled areas in the U-matrix displays (i.e., light gray shading between adjacent map nodes). (d–f) The number of distinct associations of polygonal lines in PCP displays corresponds to the correct number of clusters in these data sets.

Clustering analysis on 2D seismic models

In the second phase of the study, we built four geologic models (case studies 1-4) having variable lateral geology but a constant thickness of 22 m (72 ft). Each model consisted of a different combination of five different stratigraphic successions (Figure 8). Four of the five stratigraphic successions consisted of a high-acoustic-impedance layer in which we embedded (1) a bell-shaped (sharp base, gradational top) low-impedance layer, (2) a funnel-shaped (gradational base, sharp top) low-impedance layer, (3) a single low-impedance layer, (4) two thin low-impedance layers, or (5) two thin low-impedance layers, one of which was funnel shaped. We also included a ramp stratigraphic succession wherein an underlying low-impedance layer graded upward into a high-impedance layer. The velocity and density of the high-impedance layer were set to typical values (from log-interpretation manuals) for a sandstone, whereas the low-impedance layer was assigned typical velocity and density values for coal in one run and shale in a second run (Table 1). This was done to assess the impact that changes in reflection amplitude (because of changes in reflection coefficients) might have on the clustering algorithms' performance.
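The size of the amplitude change between the two scenarios can be estimated from the values in Table 1. The sketch below uses the standard normal-incidence reflection coefficient, R = (Z2 − Z1) / (Z2 + Z1) with Z = density × velocity; that formula is not stated in the text and is assumed here, as are the function and variable names.

```python
def acoustic_impedance(density_g_cm3, velocity_m_s):
    """Acoustic impedance as the product of density and velocity."""
    return density_g_cm3 * velocity_m_s


def reflection_coefficient(z_upper, z_lower):
    """Normal-incidence reflection coefficient at a single interface."""
    return (z_lower - z_upper) / (z_lower + z_upper)


# (density in g/cm3, velocity in m/s) taken from Table 1.
ROCKS = {
    "sandstone": (2.6, 5000.0),
    "coal": (1.5, 1300.0),
    "shale": (2.1, 2000.0),
}

z = {name: acoustic_impedance(*props) for name, props in ROCKS.items()}
r_coal = reflection_coefficient(z["sandstone"], z["coal"])    # about -0.74
r_shale = reflection_coefficient(z["sandstone"], z["shale"])  # about -0.51
```

Under these assumptions, the sandstone/coal interface reflects noticeably more strongly than the sandstone/shale interface, which is the contrast in reflection amplitude the two runs are meant to probe.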

Our modeling package linearly interpolated acoustic impedance between input profiles. The geologic models were convolved with a zero-phase band-pass wavelet (10–20–60–70 Hz) to create vertical-incidence 2D seismic models sampled at 2 ms. The number of traces generated for each model ranged from 2000 to 5000, depending on the width of each geologic model (Figure 8).

The first step in the process of cluster analysis was to select a window of investigation such that we captured the lateral variability in trace shape. To that end, we tracked a horizon in the seismic models corresponding to the top of the horizontal-layer interval. We then used a horizon-guided constant window of 50 ms to extract the data. This resulted in traces with a length of 25 samples, giving rise to a 25-dimensional analysis space (in contrast to the 2D space used in the previous section; Figure 2). Figure 9 illustrates the characteristics of the seismic responses associated with each stratigraphic profile.
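The relation between the 50 ms window, the 2 ms sampling, and the 25-sample pattern can be sketched as follows. The text does not say where the window sits relative to the tracked horizon, so the example assumes it starts at the horizon sample; the function name and arguments are hypothetical.

```python
def extract_window(trace, horizon_index, window_ms=50.0, dt_ms=2.0):
    """Return the samples of a constant window hung from the horizon."""
    n_samples = int(round(window_ms / dt_ms))  # 50 ms / 2 ms = 25 samples
    segment = trace[horizon_index:horizon_index + n_samples]
    if len(segment) < n_samples:
        raise ValueError("window runs past the end of the trace")
    return list(segment)
```

Applying the function to every trace with its own horizon pick yields one 25-dimensional input pattern per trace.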

To illustrate the presence of well-defined artificial cluster structures in each case study, we measured the Euclidean distance:

$$D_{ij} = \left(\sum_{k=1}^{n} (x_{ik} - x_{jk})^2\right)^{1/2}, \tag{8}$$

where $i$ and $j$ are two consecutive model traces and $k$ indexes the $n$ samples of each trace. The dissimilarity profiles are shown in Figure 10, in which constant dissimilarity values are indicators of cluster structures, and increasing or decreasing dissimilarity values represent transition zones between two stratigraphic profiles (see Figure 8). Hence, a cluster is defined by two major components: (1) the size (i.e., the number of synthetic traces) and (2) the content (i.e., the shape of the synthetic trace characterizing the cluster). Therefore, the volume occupied by these stratigraphic profiles is represented by clusters that are close to each other in the multidimensional space, in which the transition zones can be regarded as diffuse clouds surrounding the clusters. In all scenario models, the number of seismic traces represented by the end-member trace types is greater than the number of traces having intermediate characteristics. Thus, the number of seismic facies for the models in case studies 1 and 4 is five, whereas the models for case studies 2 and 3 have six and three seismic facies, respectively.
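A dissimilarity profile of the kind described above is simply equation 8 applied to each pair of neighboring traces. The following sketch assumes the traces are stored as equal-length sequences in lateral order; the function name is hypothetical.

```python
import math


def dissimilarity_profile(traces):
    """Equation 8 evaluated for every pair of consecutive traces."""
    return [math.dist(traces[i], traces[i + 1])
            for i in range(len(traces) - 1)]
```

For example, four two-sample traces `[[0, 0], [0, 0], [3, 4], [3, 4]]` give the profile `[0.0, 5.0, 0.0]`: zero within each pair of identical traces and 5 across the change in trace shape.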

In our tests using the 2D seismic models, the hard competitive learning, EM, divisive, agglomerative, and neural gas algorithms performed poorly in recovering the correct cluster structure in the data. Details of these analyses are discussed in Marroquín (2007). He attributes the failure of these methods to factors that include the high dimensionality of the data, the highly correlated and continuous nature of the data (i.e., lack of well-separated clusters), and differences between the actual cluster geometry and the structure (e.g., spherical) assumed by the methods. In light of these failures and the failures identified using simple 2D data sets (see Figure 2), we restrict our discussion to the results obtained using the SOM algorithm and the associated visual data-mining approach developed by Guo et al. (2005).

For the models in case studies 1, 3, and 4 in sandstone/coal and sandstone/shale scenarios, the number of clusters in the U-matrix displays (Figure 11a, c, and d and Figure 12a, c, and d) is represented correctly, with regions of light shading encircled by dark gray ridges. In each display, the frequency map shows a fairly large circle that denotes the high number of data patterns mapped in that node. Additionally, the distinctive associations of the polygonal segments in the corresponding PCP graphic displays (Figure 11e, g, and h and Figure 12e, g, and h) confirm the cluster structure discovered in the SOM maps. Based on the previous interpretation, the resulting U-matrix displays (Figures 11b and 12b) for case study 2 show a cluster number of five, smaller than the actual number of seismic facies (i.e., K = 6).

Table 1. Density and velocity measurements associated with rock types.

Rock type    Density (g/cm³)    Velocity (m/s (ft/s))
Sandstone    2.6                5000 (16,404)
Coal         1.5                1300 (4265)
Shale        2.1                2000 (6562)

Figure 8. Geologic models built to simulate specific artificial cluster structures in the 2D seismic models (Figure 9). The geologic models consisted of a sandstone medium with an embedded horizontal layer of 22 m (72 ft). Two scenarios are modeled. In the first case, the embedded layer is assigned velocities and densities typical of coal; in the second case, physical properties of a typical shale are assigned. The plotted curves represent the impedance profiles of the stratigraphic succession.
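The impedance contrasts implied by Table 1 can be worked out directly. The sketch below uses the standard definitions of acoustic impedance (density times velocity) and normal-incidence reflection coefficient; these formulas and the helper names are not taken from the text and are given only as an illustration of the two scenarios.

```python
# Density (g/cm^3) and velocity (m/s) from Table 1.
ROCK_PROPERTIES = {
    "sandstone": (2.6, 5000.0),
    "coal": (1.5, 1300.0),
    "shale": (2.1, 2000.0),
}

def acoustic_impedance(rock):
    """Acoustic impedance as density times velocity."""
    density, velocity = ROCK_PROPERTIES[rock]
    return density * velocity

def reflection_coefficient(upper, lower):
    """Normal-incidence reflection coefficient at the upper/lower interface."""
    z_upper = acoustic_impedance(upper)
    z_lower = acoustic_impedance(lower)
    return (z_lower - z_upper) / (z_lower + z_upper)

for layer in ("coal", "shale"):
    rc_top = reflection_coefficient("sandstone", layer)
    print(f"{layer}: impedance = {acoustic_impedance(layer):.0f}, "
          f"top reflection coefficient = {rc_top:.3f}")
```

Under these definitions both embedded layers have a lower impedance than the surrounding sandstone, so the top of the layer gives a negative reflection coefficient and the base a positive one, with the coal contrast being the stronger of the two.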



Moreover, the cluster boundaries are not well defined in comparison with the other results. Examination of the input model helps explain why. This is the only model that does not have well-isolated clusters (see Figure 10b). Instead, it has three major groups of clusters, and this situation may prevent the U-matrix display from accurately representing the distance, in the multidimensional space, of the seismic facies. However, visual examination of the corresponding PCP graphic displays (Figures 11f and 12f) indicates the possible existence of a sixth cluster in the data sets (e.g., represented by an association of green polygonal segments). In light of this observation, we reexamined the U-matrix displays of the case study 2 model (Figures 11b and 12b) and concluded that these polygonal associations support the presence of the sixth cluster.

Figure 9. 2D seismic models synthesized from the models in Figure 8, which were convolved with a zero-phase band-pass wavelet (10–20–60–70 Hz). The number of traces generated for case studies 1 and 2 was 5000 (a–b), for case study 3 was 2000 (d), and for case study 4 was 4000 (c). To avoid a cluttered display, only two model traces representing a stratigraphic profile are plotted.

Figure 10. Plots are of dissimilarity versus number of seismic traces. Values of constant dissimilarity indicate the number of clusters K in the 2D seismic models.

DISCUSSION

The clustering problem is about how to partition a given data set into clusters such that the data in one cluster resemble each other more than data in other clusters (Jain et al., 1999; Xu and Wunsch, 2005). Therefore, it is important when investigating the problem of interest to select an appropriate clustering strategy (e.g., ability to handle clusters of arbitrary shapes and high-dimensional input patterns). When working with seismic data, we hope cluster analysis will help us identify a discrete number of trace shapes, each somehow related to geologic features of interest (different depositional environments, pore-filling fluids, and so on). We thus investigated a variety of unsupervised clustering algorithms with respect to their effectiveness at recovering the correct number of end-member seismic facies in 2D seismic-model data sets.

As pointed out by Xu and Wunsch (2005), the output of the clustering procedure can be erroneous if the underlying model it assumes does not fit the cluster structure of the data sets. Based on tests performed on 2D seismic models, we conclude that the poor performance of the hard competitive learning and neural gas algorithms arises because (1) they tend to form hyperspherical clusters that may not be effective if the data are in other geometric forms; (2) they are based on a metric distance (equation 1) and suffer from the “curse” of dimensionality (Xu and Wunsch, 2005), that is, as the number of dimensions in a data set increases, distances between the nearest points become similar to those of other points (Beyer et al., 1999); and (3) they are sensitive to the initial choice of the clustering parameters.
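The distance-concentration effect named in item (2) can be illustrated numerically. The sketch below is a hypothetical experiment on uniformly distributed random points, not on the seismic model traces: it measures the relative contrast between the farthest and nearest neighbor of a query point, (d_max - d_min) / d_min, for 2, 25, and 250 dimensions.

```python
import numpy as np

def relative_contrast(n_points, n_dims, rng):
    """(d_max - d_min) / d_min for one random query against random points."""
    data = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dist = np.sqrt(np.sum((data - query) ** 2, axis=1))
    return (dist.max() - dist.min()) / dist.min()

rng = np.random.default_rng(42)
contrast = {
    n_dims: float(np.mean([relative_contrast(500, n_dims, rng) for _ in range(20)]))
    for n_dims in (2, 25, 250)
}
for n_dims, value in contrast.items():
    print(f"{n_dims:4d} dimensions: mean relative contrast = {value:.2f}")
```

The contrast shrinks as the dimension grows, so in a space such as the 25-dimensional one used here the nearest and farthest points become harder to tell apart with a plain metric distance.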

The poor performance of the EM algorithm can be explained. First, the high degree of similarity among the model traces caused the covariance matrix to be singular, which in turn produced the poor estimate of the maximum log likelihood (Fraley and Raftery, 2002). Second, the assumption that input data arise from a spherical Gaussian model does not reflect the geometric form of the clusters. Third, the EM technique can struggle with high-dimensional data (Witten and Frank, 2005). On the other hand, the hierarchical-based algorithms assume that truly hierarchical relations exist in the data (Xu and Wunsch, 2005), which does not represent the nature of the 2D model data sets.

In contrast to the unsupervised clustering algorithms described above, the SOM algorithm, with its integrated data-visualization techniques (i.e., U-matrix, PCP graphic and geographic mapping displays), provides an environment for exploring patterns in the data sets. This visual strategy shows the best performance in recovering the number of end-member seismic facies. The SOM algorithm offers a significant advantage in that it is not constrained to a particular cluster shape. It also generates a map space in which nearby nodes are pulled toward each other, thereby performing localized clustering on these nodes. In this sense, Flexer (2001) considers the SOM to be a method for simultaneous clustering and discovering real, potentially observable data structures in a visual way. However, SOM does not optimize a cost function; instead, the SOM algorithm involves a trade-off between the accuracy of the clustering and the smoothness of the topological mapping. This flexible nature of the SOM algorithm can impair its performance in recovering the true cluster structure underlying a data set (Flexer, 2001).
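The way neighboring map nodes are pulled toward each other can be seen in a minimal online SOM update. The sketch below is a generic Kohonen-style training loop, not the implementation of Guo et al. (2005) used in this study; the Gaussian neighborhood, the linear decay schedules, the 6 x 6 grid, and the two noisy synthetic trace shapes are all assumptions.

```python
import numpy as np

def train_som(data, grid_shape=(6, 6), n_iter=3000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM training with a Gaussian neighborhood on a rectangular grid."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    rows, cols = grid_shape
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid = np.column_stack([rr.ravel(), cc.ravel()]).astype(float)
    # Initialize node weights from randomly chosen data patterns.
    weights = data[rng.integers(0, len(data), size=rows * cols)].copy()
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays toward 0
        sigma = sigma0 + (0.5 - sigma0) * frac  # neighborhood shrinks to 0.5
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: node whose weight vector is closest to x.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Nodes close to the BMU on the grid are pulled toward x as well.
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights.reshape(rows, cols, -1)

def quantization_error(data, som):
    """Mean distance from each pattern to its best-matching node."""
    flat = som.reshape(-1, som.shape[-1])
    dist = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return dist.min(axis=1).mean()

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 25)
centre_a = np.sin(2.0 * np.pi * 2.0 * t)
centre_b = np.sin(2.0 * np.pi * 3.0 * t)
data = np.vstack([centre_a + 0.05 * rng.normal(size=(200, 25)),
                  centre_b + 0.05 * rng.normal(size=(200, 25))])
som = train_som(data)
qe = quantization_error(data, som)
print(som.shape, round(float(qe), 3))
```

The neighborhood width `sigma` is where the trade-off noted above appears: a wide neighborhood favors a smooth topological mapping, whereas a narrow one favors a lower quantization error.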

Although not tested in this paper, new developments of the original SOM algorithm have been introduced. For example, Yin and Allinson (2001), van Hulle (2002), and Verbeek et al. (2005) treat the SOM nodes as Gaussian (or other) mixture models. The resulting SOM algorithm offers the advantages of probabilistic models, including the ability to interpret the SOM as mixture models, theoretical guarantees of convergence, and robustness for missing and skewed data (Xu and Jordan, 1995; Ordonez and Omiecinski, 2002).

A trained SOM also may suffer from misrepresentation of the density of the input space (Haykin, 1999), i.e., the topological ordering of the SOM nodes may overrepresent an area of low density, whereas an area of high density can be underrepresented. This situation can be seen in the results of case study 2 (Figures 11b and 12b). Another drawback of the visual strategy is the subjective nature of the approach. An interpreter makes a judgment to define how many clusters might be present by looking at the graphic displays. Moreover, the interpretation of the U-matrix (e.g., Koskela et al., 2004; Moutarde and Ultsch, 2005) and/or PCP (e.g., Artero et al., 2004; Johansson et al., 2005) graphic displays in an automated approach is still unresolved and will need to be addressed by future research. Despite these problems, in Marroquín et al. (2009) we present and discuss the application of the visual data-mining approach on a real 3D seismic data volume and show the potential of the visual environment for conducting a seismic facies analysis.
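For readers unfamiliar with the U-matrix display that the interpreter has to judge, the sketch below shows one common way to compute it. It is an assumption-laden illustration rather than the display code used in this study: each node is assigned the mean Euclidean distance to its four grid neighbors, and the 6 x 6 weight grid is a hypothetical trained map whose left and right halves sit near two different prototypes.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected neighbors."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u

# Hypothetical trained map: columns 0-2 near prototype A, columns 3-5 near B.
rng = np.random.default_rng(1)
proto_a = np.zeros(25)
proto_b = np.ones(25)
weights = np.empty((6, 6, 25))
weights[:, :3] = proto_a + 0.01 * rng.normal(size=(6, 3, 25))
weights[:, 3:] = proto_b + 0.01 * rng.normal(size=(6, 3, 25))

u = u_matrix(weights)
print(u.round(2))
```

Low values mark nodes inside a cluster and a ridge of high values marks the boundary between clusters. Deciding how high a ridge must be to separate two clusters is the subjective step discussed above.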

Figure 11. Visual data-mining clustering results of 2D seismic models generated using sandstone/coal sequence. Panels a–d are U-matrix displays; panels e–h are PCP graphic displays of amplitude values. Note how the interpreted cluster distribution in the U-matrix displays (circled areas of light gray shading between adjacent map nodes in a, c, and d) correctly recovers the number of clusters and how the distinct association of polygonal lines in PCP graphic displays supports such a finding. For case study 2, the combined interpretation of both displays (b and f) helps to identify six clusters.


CONCLUSIONS

In this study, we evaluated six different unsupervised clustering algorithms for performing automated seismic facies analysis. We also addressed how to validate the clustering results and decide the optimal number of clusters necessary to fit a data set. Because of the difficulties surrounding cluster analysis, we used a framework in which we provide a comprehensive and systematic evaluation of the clustering procedures’ output.

Tests conducted on 2D data sets were designed to demonstrate how different the clustering algorithms are in terms of the partitioning they produce, especially when clusters have different degrees of overlap. In this experimental stage, we observed that the hard competitive learning and neural gas algorithms exhibit a performance that depends on the initial clustering conditions, whereas the EM algorithm offers a better convergence because of the combined application of probabilistic models that fit the distribution of data points and strategies for getting the highest estimation of the maximum likelihood. The performance of hierarchical-based algorithms depends on the degree to which the clusters are separated in the data sets. For 2D seismic models, the results of this phase allowed us to choose an unsupervised clustering algorithm to validate the existence of a natural cluster structure in highly correlated and continuous seismic data sets.

Figure 12. Visual based data-mining clustering results of 2D seismic models generated using sandstone/shale sequence. Note how the interpreted cluster distribution in the U-matrix displays (circled areas of light gray shading between adjacent map nodes in a, c, and d) correctly recovers the number of clusters and how the distinct association of polygonal lines in the PCP graphic displays supports such a finding. For case study 2, the combined interpretation of both displays (b and f) helps to identify six clusters.

Although not described at length in this paper, Marroquín (2007) explains why the above-mentioned algorithms fail to recover the true cluster structure of 2D seismic models consistently. The primary reason is that clustering algorithms generally are designed with certain assumptions that solve a particular problem. As such, their results are poorer for problems that do not satisfy these assumptions.

Our tests also demonstrate that a SOM algorithm implemented in a visual data-mining approach offers the best performance in interpreting the correct cluster structure in the 2D data sets and the 2D seismic models. Moreover, the visual strategy allows the interpreter to use a subjective judgment to assess the presence and nature of a cluster structure in cases where clusters are poorly defined. Thus, we argue that the visual environment for data exploration is promising for seismic facies analysis because it is robust, does not make assumptions about the shape of the clusters, handles complex data sets, and decreases dependence on user-supplied parameters. Our companion paper extends these analyses to a 3D seismic data set and demonstrates how the integration of the SOM with graphic techniques can benefit seismic facies analysis.

We identified several problems that should form the basis of future research. First, there appears to be no single, quantitative means for comparing the output of different clustering algorithms. Second, our selection of clustering methods was meant to be representative, not exhaustive. Given the large number of clustering algorithms and the many different ways to implement those algorithms (initiation parameters, number of nodes, and so on), it may be that a visual data-mining front end could be adapted to other clustering methods. Finally, we noted some drawbacks to the SOM-based method. We defer recommendations for improving this method to the companion paper.

ACKNOWLEDGMENTS

This work was undertaken by the senior author as part of a Ph.D. dissertation at McGill University. Funding for this research came from NSERC Discovery Grants to Bruce S. Hart (RGPIN38411-07) and to Jean-J. Brault (RGPIN 89742-02). We thank Stéphane Gesbert for a stimulating technical review, as well as Dengliang Gao and an anonymous reviewer for their reviews of this manuscript.




REFERENCES

Artero, A. O., M. C. F. de Oliveira, and H. Levkowitz, 2004, Uncovering clusters in crowded parallel coordinates visualizations: Proceedings of the 10th IEEE Symposium on Information Visualization, 81–88.

Beyer, K., J. Goldstein, R. Ramakrishnan, and U. Shaft, 1999, When is nearest neighbor meaningful?: Proceedings of the 7th International Conference on Database Theory, 217–235.

Celeux, G., and G. Govaert, 1992, A classification EM algorithm for clustering and two stochastic versions: Computational Statistics & Data Analysis, 14, 315–332.

Coléou, T., M. Poupon, and K. Azbel, 2003, Unsupervised seismic facies classification: A review and comparison of techniques and implementation: The Leading Edge, 22, 942–953.

de Matos, M. C., P. L. M. Osorio, and P. R. Schroeder, 2007, Unsupervised seismic facies analysis using wavelet transform and self-organizing maps: Geophysics, 72, no. 1, P9–P21.

Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977, Maximum likelihood from incomplete data via the EM algorithm: Journal of the Royal Statistical Society B, 39, 1–38.

Dumay, J., and F. Fournier, 1988, Multivariate statistical analyses applied to seismic facies recognition: Geophysics, 53, 1151–1159.

Fayyad, U., R. Cory, and P. S. Bradley, 1998, Initialization of iterative refinement clustering algorithms: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 194–198.

Flexer, A., 2001, On the use of self-organizing maps for clustering and visualization: Intelligent Data Analysis, 5, 373–384.

Fournier, F., and J.-F. Derain, 1995, A statistical methodology for deriving reservoir properties from seismic data: Geophysics, 60, 1437–1450.

Fraley, C., and A. Raftery, 2002, Model-based clustering, discriminant analysis, and density estimation: Journal of the American Statistical Association, 97, 611–631.

Fritzke, B., 1997, Some competitive learning methods, http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/JavaPaper, accessed 10 March 2006.

Gao, D., 2004, Texture model regression for effective feature discrimination: Application to seismic facies visualization and interpretation: Geophysics, 69, 958–967.

Guo, D., M. Gahegan, A. M. MacEachren, and B. Zhou, 2005, Multivariate analysis and geovisualization with an integrated geographic knowledge discovery approach: Cartographic and Geographic Information Science, 32, 113–132.

Haykin, S., 1999, Neural networks: A comprehensive foundation, 2nd ed.: Prentice-Hall, Inc.

He, J., M. Lan, C.-L. Tan, S.-Y. Sung, and L. Hwee-Boom, 2004, Initialization of cluster refinement algorithms: A review and comparative study: Proceedings of the International Joint Conference on Neural Networks, 1, 297–302.

Inselberg, A., 1985, The plane with parallel coordinates: The Visual Computer, 1, 69–97.

Jain, A. K., M. N. Murty, and P. J. Flynn, 1999, Data clustering: A review: ACM Computing Surveys, 31, 264–323.

Johansson, J., P. Ljung, M. Jern, and M. Cooper, 2005, Revealing structure within clustered parallel coordinates displays: Proceedings of the 11th IEEE Symposium on Information Visualization, 125–132.

Kauffman, L., and P. Rousseeuw, 1990, Finding groups in data: An introduction to cluster analysis: Wiley Interscience.

Keim, D. A., and H.-P. Kriegel, 1996, Visualization techniques for mining large databases: A comparison: IEEE Transactions on Knowledge and Data Engineering, 8, 923–936.

Kohonen, T., 1995, Self-organizing maps: Springer-Verlag.

Koskela, M., J. Laakson, and E. Oja, 2004, Entropy-based measures for clustering and SOM topology preservation applied to content-based image indexing and retrieval: Proceedings of the 17th International Conference on Pattern Recognition, 2, 1005–1009.

MacQueen, J. B., 1967, Some methods for classification and analysis of multivariate observations: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1, 281–297.

Marroquín, I. D., 2007, Reservoir characterization through the application of seismic attributes: Multiattribute and unsupervised seismic facies analyses: Ph.D. thesis, McGill University.

Marroquín, I. D., B. S. Hart, and J.-J. Brault, 2009, A visual data-mining methodology to conduct seismic-facies analysis, part 2: Application to 3D seismic data: Geophysics, this issue.

Martinetz, T. M., and K. J. Schulten, 1991, A “neural-gas” network learns topologies, in T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, eds., Artificial neural networks: Elsevier Science Publishing Co., Inc., 397–402.

Mathieu, P. G., and G. W. Rice, 1969, Multivariate analysis used in the detection of stratigraphic anomalies from seismic data: Geophysics, 34, 507–515.

Meilă, M., 2003, Comparing clusterings by the variation of information: Proceedings of the 16th Annual Conference of Computational Learning Theory, 173–187.

Meilă, M., and D. Heckerman, 1998, An experimental comparison of several clustering methods: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 386–395.

Mitchum, R. M., Jr., P. R. Vail, and J. B. Sangree, 1977, Stratigraphic interpretation of seismic reflection patterns in depositional sequences, in C. E. Payton, ed., Seismic stratigraphy - Applications to hydrocarbon exploration: AAPG Memoir 26, 117–133.

Moth’d Belal, A.-D., 2005, A new algorithm for clustering initialization: Proceedings of the World Academy of Science, Engineering and Technology, 4, 74–76.

Moutarde, F., and A. Ultsch, 2005, U*F clustering: A new performant “cluster-mining” method based on segmentation of self-organizing maps: Proceedings of the 5th Workshop on Self-Organizing Maps, 25–32.

Ordonez, C., and E. Omiecinski, 2002, FREM: Fast and robust EM clustering for large data sets: International Conference on Information and Knowledge Management, 1–12.

Saggaf, M. M., M. N. Toksöz, and M. I. Marhoon, 2003, Seismic facies classification and identification by competitive neural networks: Geophysics, 68, 1984–1999.

Schonlau, M., 2002, The clustergram: A graph for visualizing hierarchical and non-hierarchical cluster analysis: The Stata Journal, 3, 316–327.

Simaan, M. A., 1991, A knowledge-based computer system for segmentation of seismic sections based on texture: 61st Annual International Meeting, SEG, Expanded Abstracts, 289–292.

Ultsch, A., and H. P. Siemon, 1990, Kohonen’s self organizing feature maps for exploratory data analysis: Proceedings of the International Neural Network Conference, 305–308.

van Hulle, M., 2002, Kernel-based topographic map formation achieved with an information-theoretic approach: Neural Networks, 15, 1029–1039.

Verbeek, J. J., N. Vlassis, and B. J. A. Kröse, 2005, Self-organizing mixture models: Neurocomputing, 63, 99–123.

Vinther, R., K. Mosegaard, K. Kierkegaard, I. Abatzis, C. Andersen, O. Vejback, F. If, and P. Nielsen, 1995, Seismic texture classification: A computer-aided approach to stratigraphic analysis: 65th Annual International Meeting, SEG, Expanded Abstracts, 153–155.

Wegman, E. J., and Q. Luo, 1996, High dimensional clustering using parallel coordinates and the grand tour: Computing Science and Statistics, 28, 352–360.

West, B., S. May, J. E. Eastwood, and C. Rossen, 2002, Interactive seismic facies classification using textural and neural networks: The Leading Edge, 21, 1042–1049.

Witten, I. H., and E. Frank, 2005, Data mining: Practical machine learning tools and techniques with Java implementations, 2nd ed.: Morgan Kaufmann.

Xu, L., and M. Jordan, 1995, On convergence properties of the EM algorithm for Gaussian mixtures: Neural Computation, 7, 129–151.

Xu, R., and D. Wunsch, 2005, Survey of clustering algorithms: IEEE Transactions on Neural Networks, 16, 645–678.

Yang, F. M., and K. Y. Huang, 1991, Multi-layer perception for the detection of seismic anomalies: 61st Annual International Meeting, SEG, Expanded Abstracts, 309–312.

Yin, H., and N. Allinson, 2001, Self-organizing mixture networks for probability density estimation: IEEE Transactions on Neural Networks, 12, 405–411.

Zhang, H., and Y. Li, 1993, Self-organizing map as a new method for clustering and data analysis: Proceedings of the International Joint Conference on Neural Networks, 3, 2448–2451.
