
Chemical Geology 559 (2021) 119973

Available online 4 November 2020. 0009-2541/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Segmenting geochemical records using hierarchical probabilistic models

Aaron C. Davis *

CSIRO Mineral Resources, 26 Dick Perry Ave, Kensington 6151, WA, Australia

ARTICLE INFO

Editor: B. Sherwood-Lollar

Keywords: Change-point; Partition; Probabilistic; Hierarchical; Drill core

ABSTRACT

Geochemical records of drill cores can be complex and multi-variate. We address the problem of automatically classifying collections of sequential geochemical data. Our approach uses a change-point analysis based on a probabilistic explanation of the binary segmentation method. Change-points, which separate geochemical data into segments of self-similar trends, are accepted only if they increase the global probability of the data set. Increased model complexity is penalised by an Occam factor that offsets improvements in explaining the data trends. Therefore, simpler models are preferred. The method is insensitive to unequally spaced samples in the geochemical records, negating the need to interpolate data for missing values. Probability analysis requires an estimation of variance in the measured data which affects our ability to detect change-points. Extension of change-point analysis to multiple variables in geochemical data sets is straightforward. Typical classification schemes only segment geochemical data into regions that are explained by constant models: we extend our analysis to include models where chemical concentrations change linearly with depth. The recursive application of the binary segmentation algorithm generates hierarchical models that can be used to classify geochemical records over a variety of distance scales.

We provide a clear explanation of the development of the mathematics of change-point analysis with binary segmentation and demonstrate its utility with a suite of examples. We compare our results to published classification methods, and provide two examples of classification on data sets from peat bogs and drill core. In one example we detect linear trends in concentrations of Ti, Cr, and Zr that would not be detected with other methods.

1. Introduction

Exploration for mineral resources is complex and risky. Once a prospective area is identified, the primary method of subsurface investigation is drilling. Geological information is gained by visual inspection of drill core; however, multivariate, depth-dependent geochemical and mineralogical data are commonly acquired as auxiliary information sources. Because mineral exploration is directed toward deeper basement deposits, these records are large and complex. There is a need for analysis techniques that deepen our understanding of mineral systems and structures, their provenance and evolution, and their spatial relationships.

One way of analysing depth-dependent data series is to group segments into self-consistent sections sharing similar trends or properties. This is similar to visual logging of core into lithological and stratigraphic units separated by geological changes, and is representative of discrimination and classification problems. Classifying geological records assists our understanding of processes and formation: by extension, classification of geochemical records extends our understanding beyond visual inspection (see, e.g., Pratson et al., 1992; Gazley et al., 2014; Fresia et al., 2017; Konate et al., 2017; Hill and Uvarova, 2018; Stromberg et al., 2019).

Classification can extend to downhole logs of geophysical data. One common method of classifying geophysical records is through continuous wavelet transforms (CWT). CWTs offer multi-scale detection of boundaries and edges in data that is useful for lithological characterisation (Perez-Munoz et al., 2013). Cooper and Cowan (2009) used the CWT method to detect alterations in banded iron formations from magnetic susceptibility measurements in Western Australia. They used the second derivative of a Gaussian as the transform operator. This was further developed by Davis and Christensen (2013) for detecting edges in gamma and conductivity logs from geophysical records along the Gascoyne River, Western Australia. Davis and Christensen determined layer importance by creating a hierarchical grouping of layers. This offered a geophysical record that could be viewed based on selections of layer thickness, number of layers, or wavelet scale. Later, Hill et al. (2015) applied a CWT method to geochemical data, using the hierarchical boundary grouping to interpret logs at chosen scales. Hill and Uvarova (2018) later compared CWT results to expert interpretations, and showed improved geological understanding with automated classifiers. The benefits of extending the classification of data into 3-dimensional space, to create models that reflect mineral processes and formations, are obvious (Hill et al., 2020).

* Corresponding author. E-mail address: [email protected].

https://doi.org/10.1016/j.chemgeo.2020.119973 Received 22 April 2020; Received in revised form 26 October 2020; Accepted 29 October 2020

There are a few problems with the implementation of CWT. Currently, CWT methods require evenly spaced samples along the distance axis; unevenly spaced and missing samples must be interpolated to regular spacings. Multivariate characterisation is conducted individually for each record, making it difficult to obtain measures of how components combine to make separate classes. Hill et al. (2020) classify multivariate data by summing the boundary strength of all variables entering the analysis. They then create a tessellation of the group of variables using the total boundary strengths.

The CWT method has a limited way of treating variance and data noise. Although it can be used to de-noise data, as described in Davis and Christensen (2013) and Cooper and Cowan (2009), it is difficult to tell which variations are due to sampling error and which are due to natural variations. Finally, published CWT methods for geophysical and geochemical analysis are only valid for modelling constant values across layers. Presumably, the dual CWT method of Hill and Uvarova (2018), which can be used to detect sharp and gradual boundary changes, could be extended to classifying layers with values other than a constant trend.

Alternative methods exist for classifying geochemical records by recasting the problem into change-point detection. A change-point is a location that divides depth-series data into two segments of self-consistent trends. For discrete data, a change-point can be the index that indicates the beginning of the right-hand trend. A trend in a segment is defined with functional models that describe the data. The models may be a constant value, a straight line with depth-dependent slope, or more complicated functional forms. An impressive example of change-point detection is described by Gallagher et al. (2011), based on the reversible-jump Markov chain Monte Carlo method. Gallagher et al. (2011) offer a method that separates discrete data series into segments of constant value. They address the problems of unequal data sampling, noise, and multivariate analysis. Although they do not show it, their method can be extended to include trends other than constant values between change-points (as in Kylander et al., 2007). The main drawback of their method is computation time, which becomes substantial for large multivariate data series.

Change-point analysis is applied across many fields under different nomenclature. An excellent summary of the subject is given by Truong et al. (2020), who describe the binary segmentation method, which uses a cost function to create constant-valued segments in a data series (Scott and Knott, 1974; Sen and Srivastava, 1975). The method was extended several times (see Yao, 1984; Barry and Hartigan, 1993; Killick et al., 2012), generally with the aim of shortening the computation time.

In this paper, we develop a change-point analysis method for geochemical and environmental records. Following Denison et al. (2002) and Gregory (2010), we extend the binary segmentation of Scott and Knott (1974) using probability theory. Increased model complexity is accepted only if it improves the global probability of the models fitting the data. Better fits produced by increased model complexity are balanced by an Occam factor that penalises the global probability: therefore, simpler models are preferred. Recursive application of binary segmentation produces a hierarchical model structure based on change-point likelihood and model probability. Sub-selections of the hierarchy can be chosen to simplify visualisation without losing the overall model structure.

Probability theory requires an estimate of noise in the data. We show how noise, and our estimation of it, affects our ability to detect change-points. In general, increased noise decreases the ability to detect change-points. We compare our method to a modified version of the reversible-jump Markov chain Monte Carlo algorithm presented by Gallagher et al. (2011). For a simple data series with constant trends, we recover models very similar to those of Gallagher et al. (2011) in a fraction of the time it takes for the RJMCMC simulation. Like RJMCMC, our method is insensitive to non-uniformly spaced samples. Null values in data series (that is, data points which are either below the detection limit or are missing) are easily treated without having to infer values for equally spaced sample points.

The Bayesian framework allows us to extend the model types permitted between change-points. We use the set of polynomial functions as candidates. We limit our investigation to polynomials of up to order 1, although higher-order models can also be chosen. We demonstrate model selection with examples that show the possibility of detecting linear trends with depth. Combinations of linear and constant trends are modelled and shown.

We extend the change-point analysis to collections of multiple data series, not necessarily co-located in depth. Multivariate analysis determines common-position change-points with differing models for each series. This represents a significant extension of change-point analysis for geochemical records.

Finally, we verify our method on two real-data case studies. One example analyses environmental chemistry data taken from peat bogs, studied by Large et al. (2009) and by Gallagher et al. (2011) using the RJMCMC method with constant trends. Our result compares favourably with theirs. Another example is from continuous X-ray fluorescence data collected from drill core in Western Australia. Focussing on the elements Ti, Cr and Zr, we recover the Formation boundaries determined by visual geological inspection, and detect regions that exhibit linear variations in elements with depth.

2. Theoretical development

2.1. Model parametrisation

Consider the set of N observations of variable y = {y_n : 1 ≤ n ≤ N}, each with an associated uncertainty Δy_n. The set Δy = {Δy_n : 1 ≤ n ≤ N} represents the model variances, assumed to be Gaussian and individually identically distributed. Observations are separated in distance or time by the set of locations x = {x_n : 1 ≤ n ≤ N}.

We wish to segment the series of data into partitions modelled by functions. A change-point c indicates a change in the trend of the data series and locates a partition boundary. We denote a set of partitions as m, where a single partition has model m_i specified by parameters c_i, the index of the change-point, and {a}_i, the parameters of the function modelling data within the partition. We consider the first two polynomial functions as possible bases, so that the partition parameters are either {a_0} for a constant model or {a_0, a_1} for a linear model. The simplest partition of a set of N observed data points is

$$\mathbf{m} = \left( 0, \ \frac{\sum_{n=1}^{N} y_n \cdot \Delta y_n}{\sum_{n=1}^{N} \Delta y_n} \right) \tag{1}$$

The partition consists of N measurements, and parameter a_0 is the weighted mean of the measured data.

A partition set can consist of k change-points resulting in k + 1 partitions (0 ≤ k ≤ N − 1). We write this as

$$\mathbf{m} = (\mathbf{c}, \mathbf{a}) \tag{2}$$

where we drop the brace notation from a, understanding that it is a vector of parameter sets. For a given partition m_n, the prediction for the data is

$$y_{pred,n} = G_n^T a_n \tag{3}$$


where y_pred,n is the predicted data for indexes i = max(1, c_{n-1}) to j = min(c_n, N), and G^T is the transpose of the generating matrix. For polynomials of order n, G is

$$G = \begin{bmatrix} 1 & x_i & x_i^2 & \cdots & x_i^n \\ 1 & x_{i+1} & x_{i+1}^2 & \cdots & x_{i+1}^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_j & x_j^2 & \cdots & x_j^n \end{bmatrix} \tag{4}$$
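As a concrete illustration, a minimal sketch of building G with NumPy; the function name and example values are ours, not from the paper:

import numpy as np

def generating_matrix(x, order):
    """Generating matrix G of Eq. (4): one row per sample location,
    with columns holding the powers 1, x, x^2, ..., x^order."""
    return np.vander(np.asarray(x, dtype=float), N=order + 1,
                     increasing=True)

# Example: a four-sample partition fitted with a linear model (order 1)
x_seg = np.array([10.0, 12.5, 15.0, 16.0])
G = generating_matrix(x_seg, order=1)   # shape (4, 2), columns [1, x]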

2.2. Parameter estimation

We estimate the best-fitting parameters {a} that describe the data in a given partition using Bayes' theorem

$$p(\{a\} \mid y, I) = \frac{p(\{a\} \mid I)\, p(y \mid \{a\}, I)}{p(y \mid I)} \tag{5}$$

where p({a}|y, I) is the probability of the parameters given y and I (our prior information). The first term in the numerator is the prior probability of the parameter-set values {a}; the second term is the likelihood of the data given the modelling parameters. The term in the denominator is a normalisation term that evaluates to a constant for the parameter estimation problem (see Gregory, 2010). Best-fitting model parameters for a given partition are obtained by maximising Eq. (5).

2.2.1. Prior probability

Prior probabilities express our beliefs about the values of the model parameters. For a constant value model in a segment, an appropriate prior probability is a uniform distribution across the range of observed values of the entire series. Designating the minimum and maximum values as y_min and y_max, the prior probability for any parameter a_0 is

$$p(a_0 \mid I) = \frac{1}{y_{max} - y_{min}} \tag{6}$$

We extend the logic to the linear case: consider the example where the minimum and maximum values of the data occur consecutively, and the distance between the two observations is the smallest separation in the series. We then encounter the maximal slope across the two points, with value a_{1,max} = (y_max − y_min)/Δx_min. The minimum possible slope would be a_{1,min} = −a_{1,max}. A conservative prior probability for all a_1 parameters could be

$$p(a_1 \mid I) = \frac{\Delta x_{min}}{2\,(y_{max} - y_{min})} \tag{7}$$

If we choose a polynomial of order 0 to describe the data, the prior probability is given by Eq. (6). If we choose a polynomial of order 1, then the prior is the product of Eqs. (6) and (7):

$$p(\{a\}) = \prod_{\alpha} p(a_\alpha) \tag{8}$$

where we drop the information term I for convenience.
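In code, the log of these priors might be evaluated as below. This is a sketch under our own naming; note that we use the normalised uniform-density form Δx_min/(2(y_max − y_min)) for the slope prior implied by the stated slope range:

import numpy as np

def log_prior(y_all, x_all, order):
    """Summed log prior of Eq. (8) for a polynomial model of the given
    order, built from the whole observed series (Eqs. (6) and (7))."""
    y_range = np.max(y_all) - np.min(y_all)
    logp = -np.log(y_range)                 # Eq. (6): p(a0) is uniform
    if order >= 1:
        dx_min = np.min(np.diff(np.sort(x_all)))
        # Eq. (7): slope uniform on [-a1_max, a1_max] with
        # a1_max = (y_max - y_min) / dx_min
        logp += np.log(dx_min / (2.0 * y_range))
    return logp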

2.2.2. Likelihood

The L2-norm is used to calculate the likelihood of the predicted data fitting the observed data for a given parameter set {a}. The probability of the data occurring is

$$p(y \mid \{a\}, I) = \left[ \frac{\sqrt{\det C_d^{-1}}}{(2\pi)^{N/2}} \right] \exp\left( -\frac{1}{2} \left( y_{pred} - y \right)^T C_d^{-1} \left( y_{pred} - y \right) \right) \tag{9}$$

The first term on the RHS of Eq. (9) is a constant normalisation value for the sequence. C_d^{-1} is the inverse of the data covariance matrix, and is diagonal with entries of 1/(Δy_i)^2.

2.3. Determining {a}

The best-fitting coefficients {a} maximise the likelihood function Eq. (9); this is accomplished by minimising the quadratic term in the exponent with respect to the model parameters. Defining the model parameter covariance matrix as V = (G^T C_d^{-1} G)^{-1}, the best-fitting parameters are

$$\{a\} = V G^T C_d^{-1} y \tag{10}$$

Notice V contains the data covariance matrix; the model parameters are determined by data uncertainty and the polynomial matrix G. Estimates of model parameter uncertainty are obtained from the square root of the main diagonal of V.
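A direct translation of Eq. (10) into NumPy might look as follows; this is a sketch with illustrative names, with the diagonal C_d^{-1} built as described in the text:

import numpy as np

def fit_partition(G, y, dy):
    """Best-fitting coefficients of Eq. (10) for one partition.

    Returns the parameters a, their covariance V = (G^T C_d^-1 G)^-1,
    and the 1-sigma parameter uncertainties from the diagonal of V."""
    Cinv = np.diag(1.0 / np.asarray(dy, dtype=float) ** 2)
    V = np.linalg.inv(G.T @ Cinv @ G)
    a = V @ G.T @ Cinv @ np.asarray(y, dtype=float)
    return a, V, np.sqrt(np.diag(V))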

2.4. Global probability

After selecting a model, we determine the best-fitting parameters for a given partition using Eq. (10), and calculate the global probability for the chosen model. This can be written as p(y|m), which is the probability of the data occurring given the best-fitting model m. The global probability is calculated by integrating over all possible model parameters. Assuming that the best-fitting model parameters have Gaussian distributions, the integral can be computed (Gregory, 2010) and is expressed as

$$p(y \mid m, I) = \left( \prod_{\alpha}^{M} p(a_\alpha)\, (2\pi)^{M/2} \sqrt{\det V} \right) p(y \mid \{a\}, I) = \Omega_m L_{max} \tag{11}$$

The first term Ω_m is known as the Occam factor. It is the product of the prior probabilities of the model parameters for a chosen model (Eq. (8)), multiplied by the square root of the determinant of the parameter covariance matrix V for the model, times a normalising factor of (2π)^{M/2}, where M is the total number of model parameters required. The second term is the value of the likelihood estimated with parameters {a}, given by substituting Eq. (10) into Eqs. (3) and (9). Allowing more model parameters increases the likelihood factor. However, the Occam factor reduces the global probability because more prior probability factors are needed for the added parameters.
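Numerically, it is safer to evaluate Eq. (11) in logarithms; a sketch reusing the helpers above (our names, not code from the paper):

import numpy as np

def log_global_probability(G, y, dy, log_prior_a):
    """Log of Eq. (11): log Occam factor plus log maximum likelihood.

    log_prior_a is the summed log prior of the parameters (Eq. (8))."""
    a, V, _ = fit_partition(G, y, dy)
    resid = (G @ a - y) / dy
    N, M = len(y), len(a)
    # Eq. (9) at its maximum: (2*pi)^(-N/2) * sqrt(det C_d^-1) * exp(...)
    log_Lmax = (-0.5 * N * np.log(2.0 * np.pi) - np.sum(np.log(dy))
                - 0.5 * np.sum(resid ** 2))
    # Occam factor: prior product * (2*pi)^(M/2) * sqrt(det V)
    _, logdetV = np.linalg.slogdet(V)
    log_Omega = log_prior_a + 0.5 * M * np.log(2.0 * np.pi) + 0.5 * logdetV
    return log_Omega + log_Lmax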

2.5. Model selection

Eq. (11) calculates the global probability of model m that best fits N data points with parameters {a}. When we separate the series into two partitions with change-point c at index p and fit the two segments with the model set $\mathbf{m}$ = {m_1, m_2} using Eq. (10), we have the joint likelihood p(y_[1,p], y_[p+1,N] | $\mathbf{m}$). The data vectors consist of points 1 to p, and p + 1 to N, for the two partitions. Because the data sets are treated independently, the global probability becomes

$$p\left( y_{[1,p]},\, y_{[p+1,N]} \mid \mathbf{m} \right) = p\left( y_{[1,p]} \mid m_1 \right)\, p\left( y_{[p+1,N]} \mid m_2 \right) \tag{12}$$

and we use Eq. (11) for each term on the right-hand side. The decision to choose two partitions over a single model is made by taking the ratio of the global probabilities for each. Using Eqs. (12) and (11), the odds ratio for preferring the two partitions is

$$O_{21} = \frac{p(\mathbf{m})}{p(m)}\, \frac{p\left( y_{[1,p]} \mid m_1 \right)\, p\left( y_{[p+1,N]} \mid m_2 \right)}{p(y \mid m)}$$

where p($\mathbf{m}$) is the prior probability for the two-partition model set and p(m) is the prior probability for the single partition. With no preference for competing models, the ratio in the first term evaluates to 1, leaving the simplified expression


$$O_{21} = \frac{\Omega_{m_1} \Omega_{m_2}}{\Omega_m}\, \frac{L_{max,m_1} L_{max,m_2}}{L_{max,m}} \tag{13}$$

which can be evaluated with appropriate substitutions from Eq. (11). We extend the idea to rewrite Eq. (13) into a model selection equation that considers two arbitrary model sets, possibly consisting of several partitions each. Representing the first set as $\mathbf{m}_1$ and the second set as $\mathbf{m}_2$, the odds ratio of selecting model set $\mathbf{m}_2$ over $\mathbf{m}_1$ is written as

$$O_{21} = \frac{\Omega_{\mathbf{m}_2}}{\Omega_{\mathbf{m}_1}}\, \frac{L_{max,\mathbf{m}_2}}{L_{max,\mathbf{m}_1}} \tag{14}$$

2.6. Change-point selection

We want to determine how best to split N points in a series into two partitions with constant values. From Eq. (10), we calculate the best-fitting parameters for any partition. Eqs. (11) and (12) calculate the global probability of the partitions, and Eq. (14) tells us which model set is preferable.

The procedure is prescriptive: calculate the global probability of two-partition models for every possible change-point location in the series from index 1 to N. There is a single index p which gives the largest global probability of the two-partition model, maximising Eq. (12). Dividing this maximum value by the global probability of the single partition gives the odds ratio. If the odds ratio is greater than a threshold O_thresh, we accept the model as a valid partition, and reject it otherwise. The procedure is repeated for each new partition until the odds ratio for every proposed partition falls below O_thresh.
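A sketch of one level of this scan, assuming the helper functions sketched earlier in this section (the minimum segment length is an illustrative choice, and acceptance against log(O_thresh) is left to the caller):

import numpy as np

def best_split(x, y, dy, orders=(0, 1), min_pts=3):
    """Scan all admissible change-point indexes p, returning the best
    split and its log-odds relative to the unsplit model (Eq. (12))."""
    def seg_logp(xs, ys, ds):
        # best single-partition model over the allowed polynomial orders
        return max(log_global_probability(generating_matrix(xs, k),
                                          ys, ds, log_prior(y, x, k))
                   for k in orders if len(xs) > k + 1)

    log_p_whole = seg_logp(x, y, dy)
    best_p, best_lp = None, -np.inf
    for p in range(min_pts, len(x) - min_pts + 1):
        lp = seg_logp(x[:p], y[:p], dy[:p]) + seg_logp(x[p:], y[p:], dy[p:])
        if lp > best_lp:
            best_p, best_lp = p, lp
    # accept the split when best_lp - log_p_whole exceeds log(O_thresh)
    return best_p, best_lp - log_p_whole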

There is also a procedure for simplification. Once change-point j is accepted, generating two new layers, we propose to simplify the new model by removing either or both of change-points j − 1 and j + 1, evaluated with Eq. (14). This extra step attempts to address the problem that binary segmentation does not always lead to the optimal solution (Truong et al., 2020).

Considering different models for partitions does not change the procedure. When we introduce a change-point at index p, we have four proposals to make. Specifying C as a constant value and L as a linear function, we propose the following models: C-C, C-L, L-C, and L-L, where the hyphen represents the breakpoint for the partitions. Selecting the best-fitting models is straightforward: find the location p of the largest global probability among all four proposals and compare it to the global likelihood of the model for the data series without the change-point. Linear functions are more complicated and will fit better than constant values, but the likelihood increase will be balanced by the penalty introduced by the prior probabilities for the extra model parameters (Eq. (11)). Simpler models will be preferred unless the likelihood gain is greater than the Occam penalty.

Fig. 1. Example of sectioning synthetic data using constant models. Odds criterion for acceptance is 2:1. Data is non-uniformly sampled in the sequence. (a) Global probability of the hierarchical models, normalised to maximum probability. (b) Model complexity plot. Colours represent the mean value of the layer between two change-points versus depth. Model complexity increases to the right. The vertical white line indicates the model complexity level chosen for display in (c). (c) Data and error (dots and whiskers) for the example. Solid blue line is the result of the change-point analysis; dashed gold line shows the true base model. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

2.7. Extension to multiple dimensions

Suppose we have S series of independent data samples, not necessarily co-located. The most probable index to separate the ensemble into two partitions with change-point c is found using the method above. Each data series is a vector of samples y_1 to y_S, with N_1 to N_S observations. We form the N × S matrix Y for the total set, where N is the number of locations in position vector x, which contains all points. We split the ensemble into two partitions by maximising the global probability of the data

$$p(Y \mid M, I) = \prod_{s=1}^{S} p\left( Y_{([1,p],\,s)} \mid M_{(1,s)}, I \right)\, p\left( Y_{([p+1,N],\,s)} \mid M_{(2,s)}, I \right) \tag{15}$$

Since each data series is independent, the probability of each partition is calculated as a product of the probability for each column s of the S columns in Y. We compare the global probability of the two-partition models to the global probability of the ensembles taken as a singular partition.

Although the data matrix Y is N × S, it can contain empty entries in one or more of its columns, indicating locations in the data sequence where one or more of the observed variables was not recorded. These null values do not matter: the placeholders in Y are there to indicate the presence of other, valid, variables. We scan through the discrete vector x to seek the location p of change-point c, including only valid data points and eliminating the need to impute data that are not present.
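A sketch of how the per-series products of Eq. (15) can be accumulated while skipping null entries. Here seg_logp is the per-segment scorer from the previous sketch hoisted to module scope, and NaN as the null marker is an assumption of this sketch:

import numpy as np

def multivariate_split_logp(x, Y, dY, p):
    """Joint log probability of Eq. (15) for a change-point at index p.

    Y and dY are N x S arrays over the union of sample locations x;
    NaN entries mark variables not recorded at a location and are
    simply dropped, never imputed."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    total = 0.0
    for s in range(Y.shape[1]):
        valid = ~np.isnan(Y[:, s])
        for sel in (valid & (idx < p), valid & (idx >= p)):
            total += seg_logp(x[sel], Y[sel, s], dY[sel, s])
    return total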

3. Synthetic examples

We use synthetic data examples to demonstrate important features of the method for data analysis and characterisation.

3.1. Constant model

The first example segments data into partitions of constant value separated by abrupt changes. Fig. 1c shows a sequence of noisy data irregularly sampled with respect to depth. Each point is indicated with a black dot and gray lines. Data is generated from a basis set of 8 layers, shown by the dashed gold line. The first 80 points are sampled at 2.5 m intervals, the remainder are sampled at 1 m intervals. Data are produced by adding random Gaussian noise with Δy = 100 to the basis set at the sample locations. The solid blue line in panel (c) shows the result of the segmentation algorithm with a log-odds threshold ratio of 2:1. There are 7 change-points detected, resulting in 8 discrete layers.
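For readers who wish to reproduce an experiment of this kind, a sketch of generating a comparable irregularly sampled step sequence; the layer boundary depths and values here are illustrative, not the paper's exact basis set:

import numpy as np

rng = np.random.default_rng(seed=1)

# Irregular sampling: 2.5 m spacing for the first 80 points, then 1 m
x = np.concatenate([np.arange(80) * 2.5, 200.0 + np.arange(250) * 1.0])

# An 8-layer step basis: boundary depths and layer values (illustrative)
edges = np.array([0.0, 60.0, 120.0, 200.0, 260.0, 337.0, 380.0, 420.0])
levels = np.array([500., 900., 300., 700., 1100., 400., 800., 600.])
truth = levels[np.searchsorted(edges, x, side="right") - 1]

# Add Gaussian noise with the stated standard deviation (dy = 100)
y = truth + rng.normal(scale=100.0, size=x.size)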

Fig. 1b shows the hierarchical complexity plot resulting from the algorithm. The colours show the mean value of the layer produced by the algorithm, while change-points are marked with horizontal black lines. Starting from 1, models become more complex to the right. At every increase in model complexity, an additional change-point is added, resulting in an extra constant model layer in the sequence. The white vertical line at 8 indicates the model displayed in (c). Fig. 1a shows the global probability (Eq. (11)) of the models with increasing model complexity. The difference in log global probability between models 2 and 1 is 229, indicating that two layers are 2.2 × 10^99 times more likely than one layer. The last odds ratio, for 8 layers over 7, is 33:1.

Fig. 2 shows log-probability values for the first change-point of the example. Total global probability is calculated for each index in the data sequence, and the contributions from the left- and right-hand sequences are shown in blue and gold, respectively. The total global probability for each location is shown in black, and has a maximum at 337 m.

3.2. Effect of noise in the measurements

Using the same data sequence as before, we re-run the analyses with different estimates of noise in the data. Fig. 3a shows the number of layers detected by the algorithm with stated noise values of Δd = 100, 200 and 400. Error bars for each noise estimate are shown at 450 m depth. For noise levels of 200 and 400, 5 layers are detected. Larger estimates of noise in the data lead to fewer detected change-points even though the data do not change. Fig. 3b shows the normalised global probabilities for each example.

We further explore the effect of noise in Fig. 4. Each panel in the figure shows a histogram of the total number of layers detected for 10,000 change-point simulations similar to the example above. For each simulation, we use the same sample depths and the same basis set of layers (gold line in Fig. 1), but we set different noise, noise estimations, and sample values. Sample points are produced using a different seed for the random number function that generates the points. The algorithm is run, the number of layers detected for a simulation is recorded, and histograms are accumulated over the simulations. For the cases in Fig. 4(a-d), the actual noise used to produce the data samples was Δd = 100, but the noise estimates used were 100, 200, 400 and 50. For cases (e-g), the actual and estimated noise were 200, 400, and 50. We see that the number of layers detected decreases with increasing noise. For the case where the actual error was Δd = 100 but the estimated noise was 50, the number of change-points detected was much larger than the actual number of change-points in the synthetic data sequence.

Fig. 2. Probability of the first segmentation from the above example. Solid blue line shows the probability of the left-hand side of a constant fit to the data as the change-point index is moved from left to right. Gold curve shows probability of a constant model fit to the right-hand side of the change-point. Black curve shows the total probability (LHS + RHS) of the models. The peak of the black curve occurs at 337 m. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 3. Effect of noise in change-point analysis. (a) Dots indicate data. Blue line shows the result of change-point analysis when Δd = 100, dashed red when Δd = 200, dashed gold when Δd = 400. Representative error bars are shown at 450 m depth in the figure. (b) Normalised global probability for each of the error estimates shown in (a). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.3. Comparison to RJMCMC models

Following Gallagher et al. (2011), we use a simplified version of the RJMCMC method. We do not randomly propose mean values for a given layer: the most probable constant value for a segment is given by the mean of the data contained in the layer.

Fig. 5b shows a distribution of change-points accepted at the end of a realisation involving 400,000 proposals in the Markov chain. The frequency of accepted change-points is normalised to 1. Fig. 5a shows a comparison between our algorithm (blue line) and the posterior mean model of the RJMCMC method (dashed gold). Blue shading shows the predicted error of the constant values from our method, estimated from the best-fitting parameters resulting from the change-points. The RJMCMC method took approximately 275.8 s to compute, compared to about 300 ms for ours (nearly 1000 times faster).

3.4. Linear versus constant trends

So far, we have only demonstrated sectioning by seeking change-points that result in constant-value layers on each side of the partition, but we can extend models to higher-order polynomials. Fig. 6c shows a synthetic data sequence of points with obvious linear trends. The sequence is formed from a basis set (gold line), and points are randomly generated using random numbers with a chosen noise level. Fig. 6b shows the result of sectioning when only constant models are accepted between change-points, and Fig. 6a shows the resulting global probability. There are 14 change-points in the final model. Fig. 6d shows the result when polynomial models up to order 1 are allowed. The last model has 4 change-points, resulting in 5 layers. Fig. 6e shows the global probability. Both the constant and constant-linear models are shown in Fig. 6c.

The colours in Fig. 6d represent the mean value of the layer bounded by the change-points. We also indicate whether the layer is determined with a constant (C) or a linear trend (L). The position of a change-point varies from model 6 to model 7 (and from 5 to 6). This is a consequence of the model simplification step applied after each new change-point is accepted (Section 2.6).

Models generated in Fig. 6d contain both constant and linear layers. The constant models shown in Fig. 6b are a subset of the more general models. Since the model in Fig. 6d consists of linear and constant models, it is preferred over the constant-only models. We calculate the odds ratio of the linear-model segmentation to the constant-only model by comparing their total global probability: the linear model is favoured by a factor of 3.5 × 10^40 : 1.

Fig. 4. Histograms of the number of layers detected for 10,000 random data realisations using different error values and error estimates. (a) Actual error = 100, estimated error Δd = 100. (b) Actual error = 100, estimated error Δd = 200. (c) Actual error = 100, estimated error Δd = 400. (d) Actual error = 100, estimated error Δd = 50. (e) Actual and estimated error Δd = 200. (f) Actual and estimated error Δd = 400. (g) Actual and estimated error Δd = 50. A vertical line in each panel shows the number of layers in the true data model (8).

3.5. Multivariate data

Panels (c) and (d) of Fig. 7 show two multivariate synthetic data sequences that exhibit constant and linearly varying layers. Data series 1 is composed of 5 layers; series 2 has 6 layers. The basis data sequences are shown with gold lines, and the data samples are shown with solid black dots and gray error bars. The segmented data models for both series are shown with blue dashed lines. Both sequences show a gap in the data at 200 m, but data series 1 also has missing data at 700 m, representing a section of data that was not recorded. We choose the locations of the change-points, and linear or constant models for the segmented layers, using Eq. (15). Linear and constant models are chosen for each data trace individually, and only the position of the change-point is shared between the series. Fig. 7b and e show the mean values of the best-fitting models for each segment, and Fig. 7a and f show the global probability for each trace. The total global probability for the entire data set is shown in Fig. 7g. Neighbouring change-points are allowed to be removed with the addition of each newly accepted change-point. This is seen to occur in Fig. 7b and e between model complexities 6 and 7 at 200 m depth.

4. Real data examples

The previous section exhibited segmentation for synthetic examples, placing emphasis on noise, null values, the extension to linear models, and multivariate analysis. We now focus on applications to real data sets, the first of which involves geochemical data from peat beds in Central China. The second examines changes in geochemistry observed in continuous XRF data of drill cores from the Fortescue Group, Western Australia.

4.1. Geochemical data in peat

Fig. 5. Comparison of probabilistic change-point results to an RJMCMC realisation. (a) Data shown with dots and whiskers; blue line shows results from the proposed method using constant models and a 2:1 log-odds acceptance criterion. Shaded blue region shows the ±1σ variance of the constant models. The dashed gold line shows the resulting posterior mean model from an RJMCMC realisation using 400,000 proposals. (b) Histogram of the RJMCMC change-point distributions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The first example examines data originally published by Large et al. (2009) and reinterpreted by Gallagher et al. (2011). Peat bogs are useful for determining changes in climatic history through environmental variables such as carbon (and δ13C), organic carbon density (ρ(C_organic)), density (ρ), nitrogen, hydrogen, and soil density. The changes in environmental variables are interpreted for cold, dry periods and recovery epochs. Our example is from data available from Large et al. (2009) and their subsequent interpretation of different cold periods (Fig. 7 of their article). Gallagher et al. (2011) re-examined this data with their RJMCMC method. They inferred approximately 10 different change-points in their analysis.

Fig. 8 shows the result of our analysis of the δ13C, ρ(C_organic), density (ρ), C, N, and H depth series. We estimate the error of each variable as 50% of the standard deviation of the sequence. The change-points identified are shown by solid lines in Fig. 8h. Change-points from Gallagher et al. (2011) are shown in Fig. 8i, and the cold epochs from Large et al. (2009) are marked as gray zones in each of the panels h-m. The individual data series, together with their estimated errors, are shown in Fig. 8h-m. The recovered models are also shown in the figure (solid blue lines). The increase in total global probability versus model complexity for all variables is shown in Fig. 8a. The global probability increases with each new change-point. The other panels in this row (b-g) show the contributing global probability values for each of the variables.

4.2. Geochemical data from drill holes

The second example uses continuous X-ray fluorescence (XRF) geochemical data acquired from two drill holes in the Fortescue Group, Western Australia. Drilled by Artemis Resources in the West Pilbara Superterrane, core from holes 18ABAD01 and 18ABAD02 was logged with the Minalyse™ instrument, which scans a 20×2 XRF swath along the drill core. Spectral data is analysed for geochemical composition and is averaged into 10 cm intervals. The resulting data set can be analysed as multivariate sequences associated with positions along the drill core. We estimate the error of each 1 m sample point by computing the standard deviation of the 10 cm stacked values across a 1 m interval. This attempts to estimate geological noise in the samples: variations in composition in a measured interval are attributed more to sample averaging than to analytical error.

Fig. 6. Example of synthetic data when linear trends are present. (a) Normalised global probability when constant models are assumed between change-points. (b) Model complexity plot when constant models are assumed. (c) Synthetic data shown with dot and whisker plots; true model shown with solid gold line. Constant model shown with solid magenta line, and model that allows both constant and linear trends shown with dashed blue. (d) Complexity plot for models that allow constant and linear trends. Colours in the figure show the mean value of each layer. Each layer has a letter that describes whether the model is constant ('C') or linear ('L'). (e) Normalised global probability for models that allow constant and linear trends. The odds ratio of the constant-linear models to constant only is 3.5 × 10^40 : 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The Fortescue Group is composed of Formations of extrusive volcanic and sedimentary rocks that overlie the granites and greenstones of the Pilbara Craton (Thorne and Trendall, 2001). At the locations of drill holes 18ABAD01 and 02, the Tumbiana, Hardey, and Kylena Formations are identified before terminating in granitic basement (Stromberg et al., 2019).

Fig. 9 shows our analysis applied to the Cr, Zr and Ti concentrations from both drill holes. The process is computed separately for each drill hole, even though the results are shown grouped together. Fig. 9a and b show the complexity plots of Cr for both holes. For clarity of display, we have chosen a complexity level of 15, which is marked with a vertical white line on the figures. Fig. 9c and d show the measured Cr data for both drill holes with the resulting segmentation at complexity level 15. Likewise, Fig. 9e and f show the Zr concentrations (and segmentation), while g and h show Ti.

The Formation boundaries between the Tumbiana, Kylena and Hardey Formations are marked with dashed horizontal lines in each figure. Our models are shown in blue.

5. Discussion

5.1. Synthetic examples

Section 3 showed several examples of using the change-point detection method on synthetic data to highlight its features and how we address various problems. We showed that we can produce results similar to continuous wavelet transforms (CWT): we can construct hierarchical structures of models of increasing complexity whereby segments of data are modelled with trends separated by indexes that mark abrupt changes. The method presented here offers several advantages over CWT analysis. Fig. 1 shows we do not require equally spaced data, meaning there is no need to impute or attribute data at locations where they do not exist. Figs. 3 and 4 show the probabilistic method requires an estimate of variance in measured data, demonstrating that increased noise reduces our ability to determine the presence of change-points. Fig. 4d shows that underestimation of noise in the data results in the detection of more layers than are present in the set. In contrast, when our estimate of noise is higher than the actual variance in the data, we under-estimate the correct number of change-points. This is because the improvement in data fitting is not greater than the penalty imposed for increasing model complexity. It is a feature of probabilistic fitting that models will only be accepted if the data allow it, and noise determines the degree of improvement (Eq. (9)).

Fig. 7. An example of multivariate models with constant and linear trends. (a) Normalised global probability of the algorithm for data trace 1. (b) Model complexity plot for data trace 1. Colours indicate the mean value of each layer identified by the change-points. As before, each layer model is indicated as constant ('C') or linear ('L'). (c) Data and models for data trace 1. True model is shown with solid gold line; recovered model shown with dashed blue line. (d) As in (c), but for data trace 2. (e) As in (b), but for data trace 2. (f) Normalised global probability for data trace 2. (g) Normalised total global probability for both data traces 1 and 2 sharing common change-points. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5 shows that we achieve, for our example at least, results comparable to the reversible-jump Markov chain Monte Carlo (RJMCMC) method of Gallagher et al. (2011). The RJMCMC method offers excellent posterior data analysis and allows estimation of both layer mean values and data error. As Gallagher et al. (2011) point out, data error can be estimated for the layers between every change-point, and not just for the data sequence as a whole. It is, however, rather slow, and the computational burden increases with the number of data points. For the method presented, we are obliged to either know the data error in advance or have a good estimate in mind. With our method, it is easy to change the noise estimates and the log-odds threshold for program termination, and run several versions of the algorithm in a very short time.

Fig. 6 demonstrates an extension to higher-order models. In the examples shown, we model the data twice: once for layers of constant mean value, and once for layers with constant or linear trends. The combined model is superior (we find a log-odds improvement of 3.5 × 10^40 : 1 for the linear and constant model compared to the constant-only model), and linear trends in the data are faithfully reproduced. To our knowledge, ours is the only method that allows detection of both linear and constant trends in data sequences. Kylander et al. (2007) offer Bayesian change-point approaches that detect either constant or linearly trending models in the sequences, but they do not offer an approach that detects both. Gallagher et al. (2011) write that such a treatment using RJMCMC methods is subject to further research, and we eagerly await its development.

Fig. 8. Analysis of geochemistry data from peat bogs in China (Large et al., 2009). (a) Normalised total global probability of the probabilistic analysis for all 6 geochemistry data sequences. (b-g) Normalised global probability for each variable: (b) δ13C, (c) organic carbon density ρ(C_organic), (d) density ρ, (e) carbon abundance, (f) nitrogen abundance, (g) hydrogen abundance. (h-m) Geochemistry data traces for each of the variables (dots), and analysis results using constant and linear models (blue lines). Change-points identified by our method are shown with solid black horizontal lines in panel (h). Change-points identified in the analysis provided by Gallagher et al. (2011) are shown with dashed lines in panel (i). Cool epochs interpreted by Large et al. (2009) are shown by gray boxes in each panel. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 7 demonstrates our scheme for segmenting multivariate data sequences. Independent model parameters are chosen for each data sequence while the locations of change-points are shared across the series, resulting in linear model segments in one series and constant model segments in another. Change-points are introduced that increase the overall total global probability even if they are not strictly necessary for a given single trace. As in other examples, the probability models are insensitive to data missing from either or both series.

5.2. Real data examples

We re-examine the geochemical analysis taken from peat bogs in China (Large et al., 2009). The geochemical data was later re-interpreted by Gallagher et al. (2011) using their RJMCMC change-point method. We find excellent agreement between our results and the interpretations of both Large et al. and Gallagher et al.

Organic carbon density, δ13C, and H concentrations are the most important variables in our analysis, since their probability values are almost always increasing with model complexity. The other variables are of lesser importance after 4 change-points. This highlights that change-points shared between series will be accepted if the total global probability increases relative to the user-defined threshold, even though individual data series do not necessarily support increased model complexity. Some change-points reduce the probability contributions of individual variables, even though the total probability always increases.

We analyse XRF data sampled at 1 m intervals from drill core in Western Australia. Our method is applied to the data series ensemble of Cr, Ti, and Zr for both drill holes (18ABAD01 and 18ABAD02). The first change-point detected for both holes is in the Kylena Formation at about 350 m depth. This change-point is marked by a large shift in both the Cr and Ti content for both holes about 150 m above the change to the sedimentary Hardey Formation, indicating mixing into the Kylena Formation. There is clear evidence that some layers exhibit linearly increasing or decreasing concentrations of elements with respect to depth. It is beyond the scope of the paper to speculate on the physical or chemical processes that cause the linear changes in elemental concentration with depth. It is useful to point out, however, that segmentation or change-point detection methods that do not allow linear models will fail to discriminate these layers. Instead, there would be multiple change-points separating many thin layers of constant concentration (as seen in Fig. 6b). The linear trends are clear and highly likely: the odds ratios for choosing linear models over constant models for the two layers indicated in Fig. 9c and g are extremely large, indicating that the linear trends are preferred over constant models.

Fig. 9. Analyses for XRF logs taken from drill holes 18ABAD01 and 18ABAD02, using Cr, Zr, and Ti. (a) Model complexity plot for Cr, drill hole 18ABAD01. Colour values shown represent the mean value of Cr between change-points. The vertical white line marks the level of complexity chosen for models in the data panels (15, in this case). (b) As in (a), but for 18ABAD02. (c), (e) and (g) Cr, Zr and Ti for 18ABAD01. Blue lines show predicted models at complexity level 15. Geological formation boundaries are shown with dashed horizontal lines. Arrow in (c) shows log-odds of linear model to constant model in Cr at about 450 m depth. Arrow in (g) shows log-odds for linear versus constant model for Ti at 300 m depth. (d), (f) and (h) Cr, Zr, and Ti data and models for drill hole 18ABAD02. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 9 shows similarities between the drill hole chemistry of 18ABAD01 and 18ABAD02 with respect to depth even though the drill holes are separated by 5 km. Drilling records identify some of the change-points as changes in geological formation. With our method, we see that not only are change-points located at similar depths, they form the boundaries of zones where elements exhibit similar linearly increasing or decreasing trends. This is very clear, for example, in the increasing concentrations of Ti from 250 m to 350 m depth in both drill holes. Presumably, the processes responsible were similar across the 5 km separation. Visually, we see clear relationships in the concentration of elements between drill cores, and our inspection is aided by modelling linear relationships in the segmented traces. Classification and identification of relationships like this will be improved with the introduction of analysis techniques such as machine learning or pattern recognition algorithms, which will aid in spatial coherence modelling, allowing us to infer that similar rock properties exist over large separation distances. The hierarchical model structures that are produced by our method will be useful for this.

6. Summary

Change-point analysis is based on detecting a location in a data sequence that separates different trends in the data segments on either side of the location. Segments are distinct regions that can be modelled with functions, and we present a novel method of segmenting multivariate depth-series data through the application of probability theory. We provide a clear procedure for determining positions of change-points that are common to all data sequences in a collection. The data trends on either side of the change-point are modelled using simple polynomial functions. The position of the change-point and the model parameters of the polynomial functions yield the highest probability for fitting the segmented sequences, and recursive application of the method partitions the data series collection into a hierarchical model structure of change-points and model parameters. Termination of the change-point search occurs once a proposed change-point offers no significant improvement in data misfit over increased model complexity.

Bayesian probability theory requires the estimation of variance in the data sequences. Our approach is to require the user to estimate the noise variance for all data in the collection, forcing the analyst to consider the analytic and measurement noise in the sequences. This may appear to be a hindrance, but the algorithm is fast enough that we can quickly test for reasonable variance estimates. The Bayesian approach balances model complexity against improved misfit due to the addition of new change-points and model parameters. Model complexity is increased to include the maximum number of change-points allowable by the data, the allowed model parameters, and the noise variance. Our method maintains a complete hierarchical structure, and we are at liberty to choose simple or complicated models based upon considerations such as ancillary data or our desire for simplicity in visualisation.

Applying the method to real data sets, we show results that are comparable to those of other analysis techniques. We also find evidence for linear trends in data sequences that are overwhelmingly favoured over constant layer values. The inclusion of linear trends in data analysis across multiple variables should improve classification of multi-variate segments across multiple data sets and aid our understanding of geochemical processes in mineral exploration.

Research data

Data for this paper has been made available in the CSIRO Data Access Portal.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The author declares there is no conflict of interest in the writing of this manuscript. Acknowledgement is given to Andrew King for many fruitful discussions about developing this analysis, to Mark Pearce for discussion on the utility of detecting linear trends in multi-variate geochemical sequences, and to Artemis Resources for permission to publish data from the drill core in Western Australia.

References

Barry, D., Hartigan, J.A., 1993. A Bayesian analysis for change point problems. J. Am. Stat. Assoc. 88, 309–319. https://doi.org/10.2307/2290726.

Cooper, G., Cowan, D., 2009. Blocking geophysical borehole log data using the continuous wavelet transform. Explor. Geophys. 40, 233–236. https://doi.org/10.1071/EG08127.

Davis, A.C., Christensen, N.B., 2013. Derivative analysis for layer selection of geophysical borehole logs. Comput. Geosci. 60, 34–40. https://doi.org/10.1016/j.cageo.2013.06.015.

Denison, D.G.T., Holmes, C.C., Mallick, B.K., Smith, A.F., 2002. Bayesian Methods for Nonlinear Classification and Regression. Wiley Series in Probability and Statistics. Wiley, Chichester, England; New York, NY.

Fresia, B., Ross, P.S., Gloaguen, E., Bourke, A., 2017. Lithological discrimination based on statistical analysis of multi-sensor drill core logging data in the Matagami VMS district, Quebec, Canada. Ore Geol. Rev. 80, 552–563. https://doi.org/10.1016/j.oregeorev.2016.07.019.

Gallagher, K., Bodin, T., Sambridge, M., Weiss, D., Kylander, M., Large, D., 2011. Inference of abrupt changes in noisy geochemical records using transdimensional changepoint models. Earth Planet. Sci. Lett. 311, 182–194. https://doi.org/10.1016/j.epsl.2011.09.015.

Gazley, M.F., Tutt, C.M., Fisher, L.A., Latham, A.R., Duclaux, G., Taylor, M.D., de Beer, S.J., 2014. Objective geological logging using portable XRF geochemical multi-element data at Plutonic Gold Mine, Marymia Inlier, Western Australia. J. Geochem. Explor. 143, 74–83. https://doi.org/10.1016/j.gexplo.2014.03.019.

Gregory, P., 2010. Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support, paperback ed. Cambridge University Press, Cambridge.

Hill, E., Robertson, J., Uvarova, Y., 2015. Multiscale hierarchical domaining and compression of drill hole data. Comput. Geosci. 79, 47–57. https://doi.org/10.1016/j.cageo.2015.03.005.

Hill, E.J., Uvarova, Y., 2018. Identifying the nature of lithogeochemical boundaries in drill holes. J. Geochem. Explor. 184, 167–178. https://doi.org/10.1016/j.gexplo.2017.10.023.

Hill, E.J., Pearce, M.A., Stromberg, J.M., 2020. Improving automated geological logging of drill holes by incorporating multiscale spatial methods. Math. Geosci. https://doi.org/10.1007/s11004-020-09859-0.

Killick, R., Fearnhead, P., Eckley, I.A., 2012. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107, 1590–1598. https://doi.org/10.1080/01621459.2012.737745.

Konate, A.A., Ma, H., Pan, H., Qin, Z., Ahmed, H.A., Dembele, N.D.J., 2017. Lithology and mineralogy recognition from geochemical logging tool data using multivariate statistical analysis. Appl. Radiat. Isot. 128, 55–67. https://doi.org/10.1016/j.apradiso.2017.06.041.

Kylander, M., Muller, J., Wüst, R., Gallagher, K., Garcia-Sanchez, R., Coles, B., Weiss, D., 2007. Rare earth element and Pb isotope variations in a 52 kyr peat core from Lynch's Crater (NE Queensland, Australia): proxy development and application to paleoclimate in the Southern Hemisphere. Geochim. Cosmochim. Acta 71, 942–960. https://doi.org/10.1016/j.gca.2006.10.018.

Large, D.J., Spiro, B., Ferrat, M., Shopland, M., Kylander, M., Gallagher, K., Li, X., Shen, C., Possnert, G., Zhang, G., Darling, W.G., Weiss, D., 2009. The influence of climate, hydrology and permafrost on Holocene peat accumulation at 3500 m on the eastern Qinghai-Tibetan Plateau. Quat. Sci. Rev. 28, 3303–3314. https://doi.org/10.1016/j.quascirev.2009.09.006.

Perez-Munoz, T., Velasco-Hernandez, J., Hernandez-Martinez, E., 2013. Wavelet transform analysis for lithological characteristics identification in siliciclastic oil fields. J. Appl. Geophys. 98, 298–308. https://doi.org/10.1016/j.jappgeo.2013.09.010.

Pratson, E.L., Anderson, R.N., Dove, R.E., Lyle, M., Silver, L.T., James, E.W., Chappell, B.W., 1992. Geochemical logging in the Cajon Pass Drill Hole and its application to a new, oxide, igneous rock classification scheme. J. Geophys. Res. Solid Earth 97, 5167–5180. https://doi.org/10.1029/91JB02643.

Scott, A., Knott, M., 1974. A cluster analysis method for grouping means in the analysis of variance. Biometrics 30, 507–512.

Sen, A., Srivastava, M.S., 1975. On tests for detecting change in mean. Ann. Stat. 3, 98–108. https://doi.org/10.1214/aos/1176343001.

Stromberg, J., Spinks, S., Pearce, M., 2019. Characterisation of the Neoarchean Fortescue Group stratigraphy – integrated downhole geochemical mineralogical correlation from new diamond drilling. Taylor and Francis, Perth, Australia, pp. 1–4.

Thorne, A.M., Trendall, A., 2001. Geology of the Fortescue Group, Pilbara Craton, Western Australia. Bulletin 144. Geological Survey of Western Australia, Perth, Western Australia.

Truong, C., Oudre, L., Vayatis, N., 2020. Selective review of offline change point detection methods. Signal Process. 167, 107299. https://doi.org/10.1016/j.sigpro.2019.107299.

Yao, Y.C., 1984. Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Stat. 12, 1434–1447. https://doi.org/10.1214/aos/1176346802.
