

Mathematical Geology, Vol. 30, No. 7, 1998

Estimation of Probabilities of Three Kinds of Petrologic Hypotheses with Bayes Theorem¹

James Nicholls²

Physical-chemical explanations of the causes of variations in rock suites are evaluated by comparing predicted to measured compositions. Consistent data turn an explanation into a viable hypothesis. Predicted and measured values seldom are equal, creating problems of defining consistency and quantifying confidence in the hypothesis. Bayes theorem leads to methods for testing alternative hypotheses. Information available prior to data collection provides estimates of prior probabilities for competing hypotheses. After consideration of new data, Bayes theorem updates the probabilities for the hypotheses being correct, returning posterior probabilities. Bayes factors, $B$, are a means of expressing Bayes theorem if there are two hypotheses, $H_0$ and $H_1$. For fixed values of the prior probabilities, $B > 1$ implies an increased posterior probability for $H_0$ over its prior probability, whereas $B < 1$ implies an increased posterior probability for $H_1$ over its prior probability. Three common problems are: (1) comparing variances in sets of data with known analytical uncertainties, (2) comparing mean values of two datasets with known analytical uncertainties, and (3) determining whether a data point falls on a predicted trend. The probability is better than 0.9934 that lava flows of the 1968 eruption of Kilauea Volcano, Hawaii, are from a single magma batch. The probability is 0.99 that lava flows from two outcrops near Mount Edziza, British Columbia, are from different magma batches, suggesting that the two outcrops can be the same age only by an unlikely coincidence. Bayes factors for hypotheses relating lava flows from Volcano Mountain, Yukon Territory, by crystal fractionation support the hypothesis for one flow, but the factor for another flow is so small it practically guarantees the fractionation hypothesis is wrong. Probabilities for petrologic hypotheses cannot become large with a single line of evidence; several data points or datasets are required for high probabilities.

KEY WORDS: statistics, analytical uncertainty, Bayes factor, hypothesis testing.

INTRODUCTION

Petrologic hypotheses are physical-chemical explanations of the causes of variations in chemistry, mineralogy, and rock-type in rock bodies or rock suites. Predicted compositions derived from physical-chemical models (e.g., Ghiorso and Sack, 1995) can be compared to measured compositions of rocks and minerals. If the chemical, mineralogical, and rock-type data are consistent with the model, then the model is a viable hypothesis for explaining the causes of the variations; otherwise it is not. Predicted and measured values seldom are equal, and one has the problem of defining consistency. How close should predicted and measured values be if the hypothesis is true? Even if the measured and predicted values are consistent, suggesting that the hypothesis is true, the next question is how confident or sure we are that the model or hypothesis is true. In other words, we want to quantify the degree of confidence in the hypothesis or model. Bayes theorem and Bayesian statistical methods can help answer both questions.

¹Received 19 July 1996; accepted 5 January 1998.
²Department of Geology & Geophysics, University of Calgary, Calgary, Alberta T2N 1N4, Canada. e-mail: [email protected]

0882-8121/98/1000-0817$15.00/1 © 1998 International Association for Mathematical Geology

The application of standard statistical methods to the evaluation of petrologic hypotheses is summarized and described in several texts (e.g., LeMaitre, 1982; Davis, 1986). Bayesian methods offer an alternative point of view that can sometimes provide insight for hypotheses that are not easily represented by standard methods.

Numerical tests of petrologic hypotheses often fall into one of three categories: (1) Does a set of data points that are expected to have equal values have a variance greater than can be expected from analytical uncertainty? (2) Are the population mean values generated from two sets of data points equal? (3) Does a data point fall on a trend predicted by a physical-chemical model? Examples discussed in this paper include estimates of the probability that a set of lava flows originated from a single magma batch and estimates of the probability that a set of rock analyses are samples of a magmatic crystallization sequence.

In symbols, Bayes theorem is:

$$\Pr[H_i \mid D \,\&\, I] = \frac{\Pr[D \mid H_i \,\&\, I]\,\Pr[H_i \mid I]}{\Pr[D \mid I]} \tag{1}$$

where $H_i$ is an expression representing the hypothesis being tested, $D$ is the data collected to test the hypothesis, and $I$ is the prior information available about the hypothesis. $\Pr[H_i \mid D \,\&\, I]$ is the probability that $H_i$ is correct or true, given the data ($D$) and prior information ($I$). $\Pr[D \mid H_i \,\&\, I]$ is the probability of observing or measuring the data, given that $H_i$ is correct and given the prior information. $\Pr[H_i \mid I]$ is the probability of $H_i$ being correct given the prior information and before taking into account the new data ($D$); it is called the prior probability. $\Pr[D \mid I]$ is the probability of observing or measuring the data, regardless of whether $H_i$ or any other hypothesis is true.

Implicit in Bayes theorem is the idea that there are alternative hypotheses. If there is only one hypothesis and no alternatives, the probability for that hypothesis is one; it must be true if there are no other explanations. Consequently, applications of Bayes theorem become a comparison of two or more hypotheses. To evaluate the probabilities in Bayes theorem, the statements of the hypotheses and data are translated into statistical expressions.

Bayes factors are convenient devices for expressing Bayes theorem if there are two alternative hypotheses. A Bayes factor is:

$$B = \frac{\Pr[D \mid H_0 \,\&\, I]}{\Pr[D \mid H_1 \,\&\, I]} \tag{2}$$

where the subscripts distinguish the two hypotheses. Another way to express a Bayes factor is:

$$B = \frac{\Pr[H_0 \mid D \,\&\, I] \,/\, \Pr[H_1 \mid D \,\&\, I]}{\Pr[H_0 \mid I] \,/\, \Pr[H_1 \mid I]} \tag{3}$$

This last expression, which is a ratio of the ratios of the posterior probabilities to the prior probabilities for the two hypotheses, is the definition used by Jeffreys (1961), who labeled it $K$ instead of $B$. In the recent literature, $B$ is the symbol used to label Bayes factors (e.g., Berger and Delampady, 1987; Jeffreys and Berger, 1992).

With Bayes factors, we can write Bayes theorem as:

$$\Pr[H_0 \mid D \,\&\, I] = \left[1 + \frac{1 - p_0}{p_0}\,\frac{1}{B}\right]^{-1} \tag{4}$$

where $p_0$, the prior probability for $H_0$, is given by:

$$p_0 = \Pr[H_0 \mid I] \tag{5}$$

For a fixed value of $p_0$, larger values of $B$ correspond to higher values of $\Pr[H_0 \mid D \,\&\, I]$. The variation of $\Pr[H_0 \mid D \,\&\, I]$ with $B$ is shown in Figure 1. Bayes factors greater than 1 indicate an increased posterior probability for $H_0$ over its prior probability, whereas Bayes factors less than 1 indicate an increased posterior probability for $H_1$ over its prior probability. If the Bayes factor is equal to 1, the posterior probabilities equal the prior probabilities for the two hypotheses.

Statistical expressions for the hypotheses that enter Bayes theorem are usually written with probability density functions. For example:

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{6}$$

is a probability density function for a normally distributed variable, $x$. $\mu$ and $\sigma$ are parameters (the mean and standard deviation, respectively).

Bayes factors for continuous distributions can be calculated from (Berger and Delampady, 1987):

$$B = \frac{f(x \mid \theta_0)}{m_g(x)} \tag{7}$$

where:

$$m_g(x) = \int f(x \mid \theta)\, g(\theta)\, d\theta$$

$g(\theta)$ is the prior probability density function for $\theta$, with the condition that $H_1$ is true:

and the range of integration is over all space available to $\theta$, commonly from $-\infty$ to $+\infty$.

In summary, the posterior probability of a hypothesis that can be represented by a statistical expression can be calculated if (1) there is at least one alternative, and (2) the prior probability density functions for the competing hypotheses can be found.

Figure 1. Plot of Bayes factor vs. posterior probability of $H_0$. The prior probability of $H_0$ is $p_0$. The shaded region is defined by the range of $B$ ($0 < B < 1$) and is where the posterior probability of $H_1$ increases over $p_1$. The unshaded region is defined by $B > 1$ and is where the posterior probability of $H_0$ is larger than $p_0$.
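The relation between a prior probability, a Bayes factor, and the posterior probability of $H_0$ can be sketched in a few lines of Python. The sketch uses the odds form of Bayes theorem for two hypotheses (posterior odds = $B$ × prior odds), which is algebraically the same statement as the posterior-probability expression discussed above. The function name `posterior_h0` is illustrative only and does not come from the paper.

```python
def posterior_h0(p0: float, bayes_factor: float) -> float:
    """Posterior probability of H0 from its prior p0 and a Bayes factor B.

    Uses posterior odds = B * prior odds, which holds when H0 and H1 are
    the only two hypotheses, so that the prior for H1 is 1 - p0.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("a Bayes factor cannot be negative")
    prior_odds = p0 / (1.0 - p0)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # B < 1 lowers the probability of H0, B = 1 leaves it unchanged,
    # and B > 1 raises it, as in Figure 1.
    for b in (0.1, 1.0, 10.0):
        print(f"B = {b:5.1f}  ->  Pr[H0 | D & I] = {posterior_h0(0.5, b):.4f}")
```

With equal prior probabilities ($p_0 = 1/2$) the posterior probability reduces to $B/(1 + B)$, which is a convenient check on hand calculations.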

TESTING HYPOTHESES REPRESENTED BY VARIANCE DISTRIBUTIONS

The simplest tests of petrologic hypotheses compare the dispersion of the data with analytical uncertainty. The data can be ratios of isotopes or conserved elements, or even directly measured concentrations. If the data come from cogenetic samples, then we expect properly selected and transformed datasets to have variances the same as the analytical uncertainties. If the variance of the data sufficiently exceeds the analytical uncertainties, the conclusion is straightforward: the sources of the data are not cogenetic. On the other hand, there is no physical-chemical explanation if the analytical uncertainty exceeds the variance in the data. Perhaps the reported analytical uncertainty is too large and the analyst, in an effort to be conservative, reported an analytical uncertainty as worse than it really is. Although unlikely, it is possible that we got lucky; the data may be better than we think they are. A third possibility is that the particular density function used to describe the analytical uncertainties is incorrect.

Suppose we measure $n$ values of a normally distributed random variable for which the expected variance of the population is $\sigma^2$. The variable $(n - 1)s^2/\sigma^2$ will also be a random variable, but from a $\chi^2_{(n-1)}$ distribution (Meyer, 1975), where $s^2$ is the sample variance. Consequently, the distribution function for calculating the Bayes factor is:

The prior distribution for $H_1$, $g(\sigma)$, is (Jeffreys, 1961; Schmitt, 1969):

Calculating the Bayes factor (Eq. 7) gives:

Table 1 shows some examples of the use of this particular Bayes factor. Data in Table 1 were transformed from analyses of four picrites and one basalt erupted from Kilauea Volcano, Hawaii, in 1968 (Wright, 1971; Wright, Swanson, and Duffield, 1975). Fractionation or accumulation of olivine explains the variations in the data (Nicholls, Russell, and Stout, 1986; Nicholls and Stout, 1988). If this hypothesis is correct, then each element ratio, whose means, standard deviations, and analytical uncertainties are listed in the top part of Table 1, should be constant across all five samples. None of the elements in those ratios enters the olivine structure in measurable amounts. Analytical or measurement errors preclude the numbers being the same in all five analyses. However, we can test whether or not the dispersion of the data, as measured by the standard deviation, $s$, of the five values of a particular ratio, is different from the expected analytical uncertainty, $\sigma_0$.

The Bayes factors for the element ratios range from less than 1 (Ca/K and Na/K) to nearly 20 (P/K). Both Ti/K and P/K have associated Bayes factors that strongly support the hypothesis that the variance in the data is equal to the square of the analytical uncertainty. The Bayes factors for Al/K, Ca/K, and Na/K are close enough to 1 that there is no reason to suspect the variance in the data arises from anything but analytical uncertainty, especially because the standard deviation of the data is less than the analytical uncertainty. Jeffreys (1961) suggests the following interpretations for ranges in $B$:

$B > 1$: $H_0$ supported by the data.

$1 > B > 10^{-1/2}$: Evidence against $H_0$, but not worth more than a bare mention.

$10^{-1/2} > B > 10^{-1}$: Evidence against $H_0$ substantial.

$10^{-1} > B > 10^{-3/2}$: Evidence against $H_0$ strong.

$10^{-3/2} > B > 10^{-2}$: Evidence against $H_0$ very strong.

$10^{-2} > B$: Evidence against $H_0$ decisive.
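The grades above can be applied mechanically. The following Python sketch maps a Bayes factor to the wording of the corresponding grade. The function name `jeffreys_grade` is hypothetical. The list leaves the exact boundary values unassigned, so two choices here are assumptions of this sketch rather than part of Jeffreys' scale: a value that falls exactly on a boundary is given the weaker grade, and $B = 1$ is reported as neutral.

```python
# Upper limits of the grades of evidence against H0, strongest first.
_GRADES_AGAINST_H0 = [
    (10 ** -2.0, "Evidence against H0 decisive"),
    (10 ** -1.5, "Evidence against H0 very strong"),
    (10 ** -1.0, "Evidence against H0 strong"),
    (10 ** -0.5, "Evidence against H0 substantial"),
    (1.0, "Evidence against H0, but not worth more than a bare mention"),
]


def jeffreys_grade(bayes_factor: float) -> str:
    """Return the verbal grade for a Bayes factor B."""
    if not bayes_factor > 0.0:
        raise ValueError("a Bayes factor must be positive")
    if bayes_factor > 1.0:
        return "H0 supported by the data"
    if bayes_factor == 1.0:
        # Assumption: B = 1 leaves the prior probabilities unchanged.
        return "Neutral: posterior probabilities equal prior probabilities"
    for upper_limit, label in _GRADES_AGAINST_H0:
        if bayes_factor < upper_limit:
            return label
    raise AssertionError("unreachable for 0 < B < 1")


if __name__ == "__main__":
    for b in (6.0742, 0.2523, 0.081, 0.0273, 2.83e-6):
        print(f"B = {b:<10g} {jeffreys_grade(b)}")
```

Because the grades are spaced by half powers of ten, a value such as 0.081 sits in the "strong" band but close to its upper limit of 0.1.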

Nicholls and Stout (1988) hypothesized that the five lava flows, which provided the data in Table 1, were related by sorting (fractionation or accumulation) of olivine. If the hypothesis is correct, the data should fall on a trend with a slope of two when plotted on a diagram with (Fe + Mg)/K and Si/K as axial ratios (Fig. 2). Another way of expressing this consequence is: if the hypothesis is true, the intercepts formed by drawing a line with a slope of two through each data point should be the same for all samples. The statistical equivalent, $H_0$, of this characteristic of the data, given that the hypothesis is correct, is that the expected variance of the intercept values should be equal to the propagated analytical uncertainty. Mean values for the intercepts, X-ratios, and Y-ratios are listed in Table 1. The Bayes factors for the X- and Y-ratios are less than 1 and favor the alternative hypothesis, $H_1$, that the values in each set of ratios are not equal. Because the standard deviations of the data ($s$, Table 1) for these two ratios exceed those expected from analytical uncertainty ($\sigma_0$), it is likely that some process acted to cause the diversity, such as crystal sorting.

In contrast, the standard deviation of the intercept values ($s = 0.7530$) is less than expected from analytical uncertainty ($\sigma_0 = 5.4275$). Although the Bayes factor (0.0273) is very strong evidence against $H_0$, we cannot ascribe a physical-chemical cause to the difference. Physical-chemical causes would lead to values of $s$ greater than $\sigma_0$. Rather, the cause must be an overestimation of the analytical uncertainties, the use of an inappropriate probability density function, or luck: the data points plot closer to a single line with a slope of two than expected, somewhat like winning the lottery.

Figure 2. Element ratio diagram for picrites and a basalt that, by hypothesis, are related by olivine sorting. A consequence of the hypothesis is that the data should all lie on a line with a slope of 2. Comparing the dispersion in the intercepts formed by drawing a line with a slope of 2 through each data point with the dispersion expected from analytical uncertainty tests the hypothesis.

TESTING HYPOTHESES WITH DIFFERENCES IN MEAN VALUES

This problem is one of testing whether an observation from a probability density function is equal to a given value or not. The Bayes factor for this case is given by (Berger and Delampady, 1987):

$$B = \sqrt{1 + \rho^{-2}}\;\exp\left\{-\frac{1}{2}\left[\frac{(t - \rho\eta)^2}{1 + \rho^2} - \eta^2\right]\right\} \tag{13}$$

where $\rho = \sigma/(\tau\sqrt{n})$, $\eta = (\mu_0 - \mu)/\tau$, and $t = \sqrt{n}\,(x - \mu_0)/\sigma$.

Volcanic products have been used to define time lines in the stratigraphic column. Correlations between outcrops are based on similarities in chemical parameters (e.g., Oviatt and Nash, 1989). Table 2 contains data from two outcrops of lava flows near Mount Edziza in northern British Columbia (Spooner, 1994). The problem is to decide whether the two outcrops are correlative (i.e., are they the same age?). The lava flows are more likely to be the same age if they are from the same magma batch. Small magma bodies that differentiate high in the crust are transient phenomena, and multiple products must be closely related in time. If the lava flows are from different magma batches, then they can be the same age only by coincidence and there is no reason to suppose they are the same age. Samples 141B, 142B, and 146B are from one locality, whereas 194B and 197A are from another. Figure 3 shows two tests of the comagmatic hypothesis.

The problem has two parts. First is whether there is evidence that the rocks were formed from one magma batch. Second, if there are multiple magma batches, then did the rocks from one locality form from one batch and the rocks from the other locality form from a different batch?

The lava flows are approximately basaltic and if the chemical diversity is caused by an internal process (e.g., crystal fractionation), then sorting of olivine, plagioclase, clinopyroxene, Fe-Ti oxides, and possibly apatite should account for the diversity. The data should fall on a trend with a slope of one on a plot of:

if such a hypothesis is true (see Fig. 3). We can look at the dispersion of the intercepts of lines with slope of one drawn through the data points to test the hypothesis. The results of the calculations are shown in Table 2. The Bayes factor for the variance of the intercepts is considerably less than one, suggesting the presence of more than one magma batch in the suite of samples. On the other hand, the dispersion of the P/K values leads to a Bayes factor greater than one ($B = 6.0742$, Table 2). Consequently, the P/K values are those expected from a single magma batch. This example demonstrates the occurrence of coincidence. Although the P/K values are consistent with a single batch hypothesis and in that sense support the hypothesis, their small dispersion does not guarantee the truth of the hypothesis; petrologic hypotheses can seldom be proven true. The large dispersion in intercept values effectively disproves the single batch hypothesis. These results, however, do not directly answer the second question of whether the samples from one locality belong to the same batch as the samples from the other outcrop.

Table 2. Element Ratios and Intercepts for Mount Edziza Lava Flows

Sample             P/K       $\sigma_0$(P/K)   I          $\sigma_0$(I)
141B               0.451     0.041             -1.606     0.311
142B               0.457     0.042             -1.766     0.318
146B               0.451     0.041             -1.787     0.319
Weighted values    0.4529    0.0414            -1.7178    0.3159
194B               0.336     0.042             -3.324     0.558
197A               0.350     0.043             -3.334     0.547
Weighted values    0.3428    0.0425            -3.3291    0.5524

Variable   Mean     s        $\sigma_0$   $B(s, \sigma_0)$   $\mu_1 - \mu_2$   $\sigma = \tau$   $B(\mu_1 - \mu_2)$   $\Pr(H_0 \mid D \,\&\, I)$
P/K        0.409    0.0605   0.0418       6.0742             0.1101            0.0593            0.2523               0.2015
I          -2.363   0.8843   0.4267       0.0150             1.6113            0.6363            0.0573               0.0143

Figure 3. Pearce element ratio diagrams for samples of lava flows collected at two sites near Mt. Edziza, British Columbia. The shaded area of the inset marks a region of non-overlap between the ratios of the conserved elements, P/K, for lava flows from two localities. The problem is whether the samples are from the same magma batch or not. If they are, then it is likely the flows at the two sites are the same age. If they are not from the same magma batch, then only by coincidence can they be the same age.

To test the specific question of whether the lava flows from the two localities belong to different magma batches, we need to examine measures of central tendency of the data, say the mean values of the data, for the two localities. Table 2 lists the weighted means and standard deviations for each location (see Meyer, 1975). Variables in Equation (13) are assigned as follows. Under $H_0$, the expected value of the difference in the weighted means, $X$, is zero because $H_0$ is the hypothesis that the lava flows from the two localities are products of the same magma batch. Hence, we set $\mu_0$ equal to zero. The observed value of $X$ for P/K is 0.1101. Under $H_1$, the lava flows are from different batches and the most likely value for the difference in the weighted means is the observed difference, $X = 0.1101$ for P/K. Thus, we set $\mu$ equal to 0.1101.

We are assuming the distributions under both hypotheses are normal. The variance of the difference between two normally distributed variables is (Meyer, 1975):

$$\sigma^2 = \sigma_1^2 + \sigma_2^2$$

Whether or not the samples are from the same magma batch, there is no reason to think the variances under the two hypotheses should be different. The variance arises from the analytical uncertainties, regardless of which hypothesis is true. Consequently, in Equation (13), we have

$$\tau = \sigma$$

We have only one estimate of the difference between the means; consequently, $n$ is equal to one. With these considerations, the Bayes factor becomes:

$$B = \sqrt{2}\,\exp\left(-\frac{t^2}{2}\right) \tag{16}$$

where $t = (\mu_1 - \mu_2)/\sigma$.

For both P/K and the intercept, the Bayes factors are less than 1. Neither variable offers any support for $H_0$ according to the criteria of Jeffreys (1961). An estimate of the probability that $H_0$ is correct follows from Bayes theorem. Because we start with no evidence, we are initially indifferent to whether the samples from the two localities are from the same magma batch. We can then set the prior probabilities equal to 1/2 and apply Equation (4) to the P/K value and then to the intercept value. The posterior probability from the first application becomes the prior probability for the second application. The results are: $\Pr[H_0 \mid D \,\&\, I]$ is 0.2015 after application to the P/K value and 0.0143 after the second application to the intercept value. Consequently, Bayes analysis provides an estimate of a probability near 0.99 that the lava flows from the two localities are from different magma batches.
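The two-step update for the Mount Edziza data can be checked numerically. The Python sketch below assumes the simplified Bayes factor $B = \sqrt{2}\exp(-t^2/2)$, with $t$ the difference of the weighted means divided by the standard deviation of that difference, and the variance of the difference taken as the sum of the two variances. That closed form is a reading of the text that reproduces the Table 2 entries, and it should be treated as an assumption. The function names are illustrative, and the input values are the weighted means and uncertainties listed in Table 2.

```python
import math


def bayes_factor_equal_means(mean1, sd1, mean2, sd2):
    """Bayes factor for H0 (equal means), assuming B = sqrt(2) * exp(-t**2 / 2)."""
    sigma = math.hypot(sd1, sd2)  # standard deviation of the difference
    t = (mean1 - mean2) / sigma
    return math.sqrt(2.0) * math.exp(-0.5 * t * t)


def update(prior, bayes_factor):
    """Posterior probability of H0 from posterior odds = B * prior odds."""
    odds = bayes_factor * prior / (1.0 - prior)
    return odds / (1.0 + odds)


# (weighted mean, uncertainty) for locality 1, then for locality 2 (Table 2).
DATA = {
    "P/K": (0.4529, 0.0414, 0.3428, 0.0425),
    "I": (-1.7178, 0.3159, -3.3291, 0.5524),
}

probability = 0.5  # initial indifference between H0 and H1
results = {}
for name, values in DATA.items():
    b = bayes_factor_equal_means(*values)
    probability = update(probability, b)  # posterior becomes the next prior
    results[name] = (b, probability)
    print(f"{name:4s} B = {b:.4f}  Pr[H0 | D & I] = {probability:.4f}")

print(f"Pr[H1 | D & I] = {1.0 - probability:.4f}")
```

Run as written, the Bayes factors and posterior probabilities agree with the Table 2 entries to about three decimal places; the small differences come from rounding of the tabulated standard deviations.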

COMPARISON OF OBSERVED AND THEORETICAL QUANTITIES

Suppose a theory or hypothesis predicts a value for a quantity that can be measured or observed. Seldom will the observed and predicted values exactly agree. We would like to estimate the probability that the observed value, $X$, is equal to the theoretical value, $\mu_0$. The hypothesis can be symbolized:

The alternative hypothesis, $H_1$, is that $x$ comes from some other probability distribution, say one with mean $\mu$ and variance $\tau^2$. If a normal distribution characterizes the distribution of $x$, regardless of hypothesis, and if the variance of the observed value, $\sigma^2$, is known, then the Bayes factor is given by Equation (13). In summary, if theory predicts a single number, we know how to calculate the Bayes factor.

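Often, however, petrologic theory predicts a vector of quantities, say the composition of a solid or fluid solution. Commonly, predicted and observed quantities are shown on rectilinear diagrams such as Harker variation diagrams or Pearce element ratio diagrams. A point on such a graph can be treated as a number in the complex plane (Nicholls, 1990):

with a variance given by:

$\sigma_x^2$ and $\sigma_y^2$ are the variances of the variables plotted on the x- and y-axis, respectively, and $\sigma_{xy}$ is the covariance of the uncertainties between the two variables; $i$ is the imaginary unit, $\sqrt{-1}$. If $x$ and $y$ are normally distributed, then their joint distribution is bivariate normal:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \exp\left\{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x - \mu_{x0}}{\sigma_x}\right)^2 - 2r\left(\frac{x - \mu_{x0}}{\sigma_x}\right)\left(\frac{y - \mu_{y0}}{\sigma_y}\right) + \left(\frac{y - \mu_{y0}}{\sigma_y}\right)^2\right]\right\} \tag{20}$$

$\mu_{x0}$ and $\mu_{y0}$ are the means of the two variables and $r$ is the correlation coefficient. The covariance and variances are related to the correlation coefficient by:

$$r = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y}$$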

If we assign the coordinates of the point predicted by theory or hypothesis to $\mu_{x0}$ and $\mu_{y0}$, then Equation (20) becomes the statistical expression representing our hypothesis:

The alternate hypothesis, $H_1$, is for $x$ and $y$ to come from a density distribution with different means:

but with the same variances and covariance, because these quantities are derived from analytical uncertainties and are independent of which hypothesis is true.

Assuming the formula for calculating Bayes factors can be extended by analogy from the univariate to the bivariate case, i.e., by changing the functions in Equation (7) from univariate ones to bivariate ones, the formula for calculating bivariate Bayes factors is:

Performing the indicated mathematical operations with Equations (20) and (23) gives:

where:

Under the conditions of the alternate hypothesis, $H_1$, the most likely values for $\mu_x$ and $\mu_y$ are the observed or measured values themselves, $X$ and $Y$. Making these substitutions simplifies Equation (25) considerably:

An example is shown in Figure 4. Three samples of lava flows from Volcano Mountain, Yukon (Trupia, 1992; Trupia and Nicholls, 1996) have element ratios shown with filled circles: Xe11, Xe05, and VM01. The question is whether Xe05 and VM01 can be related to Xe11 by fractionation of olivine or clinopyroxene, either separately or together. If olivine is the only fractionating phase, then data should fall on a line with a slope of 1 in Figure 4. If clinopyroxene is the only fractionating phase, data should fall on a line with a slope of -1. The two phases fractionating together should produce a trend with a slope everywhere between -1 and +1. The specific path followed by melts formed by fractionating Xe11 can be calculated through thermodynamic modeling. Fractionation paths are pressure dependent, and three possible paths, calculated with the thermodynamic database constructed by Ghiorso and others (1983), are shown as solid lines in Figure 4. The path at 3 GPa falls as close to the data point representing Xe05 as possible, given the thermodynamic database. Paths at lower and higher pressure fall further from the data point. The hypothesis is that the data point for Xe05 falls on the 3 GPa fractionation trend. The question is: What is the probability that the hypothesis is true?

Figure 4. Pearce element ratio diagram for three samples from Volcano Mountain, Yukon (Trupia, 1992; Trupia and Nicholls, 1996). Fractionation of olivine from Xe11 would produce melts that fall on a line with a slope of one through Xe11 (e.g., VM01). Clinopyroxene plus olivine fractionation from Xe11 would produce melts that would fall on lines with small negative slopes. The particular line depends on pressure. Pressures larger or smaller than approximately 3 GPa produce fractionation lines at more negative values of the Y-axis variable. Contours around the data points are lines of equal probability density. Labels on the contours are values of the parameter $k$ (Eq. 27) that characterize the particular probability densities. The point labeled h marks the coordinates where a contour is tangent to the fractionation path at 3 GPa. The dashed ellipse around the point representing Xe05, $k = 6.4$, is just tangent to the path with a slope of one that represents olivine fractionation.

Centered at each data point are elliptical contours of equal probability density (Meyer, 1975). All points $(x, y)$ on the ellipse defined by:

where $k$ is a parameter that characterizes the contour, have the same probability density. Points inside the contour have a greater probability density than $(x, y)$, whereas points outside the contour have a smaller probability density. The point $(x_0, y_0)$ is the data point itself. Larger values of $k$ characterize contours with smaller probability densities. Consequently, the most likely point on the fractionation path, the point predicted by theory, to equal the observed value is the tangent point of the path to a contour characterized by a particular value of $k$. In other words, given a fractionation path, we calculate a value of $k$ such that the elliptical contour is just tangent to the fractionation path.

Calculation of $k$ and the coordinates of the point of tangency $(X_t, Y_t)$ is relatively straightforward if the fractionation path is or can be approximated by a straight line. Given the equation of the fractionation curve, evaluated at $(X_t, Y_t)$, in the form:

one can set the slope of the line equal to the appropriate derivative of the equation of the ellipse, $dy/dx$, also evaluated at $(X_t, Y_t)$. The resulting expression, with (Eq. 28), and the equation for the ellipse (Eq. 27), are three equations in three unknowns, $k$, $X_t$, and $Y_t$. The solutions are:

The y-coordinate, $Y_t$, can be calculated with Equation (28) after calculating $X_t$. In order to calculate the Bayes factor, one simply assigns the coordinates of the point of tangency to $\mu_{x0}$ and $\mu_{y0}$ in Equation (26).

Table 3 contains Bayes factors for hypotheses relating the Volcano Mountain samples by crystal fractionation. The factor for evaluating a fractionation hypothesis relating Xe05 to Xe11, the hypothetical initial magma, is $2.83 \times 10^{-6}$, a value so small that it practically guarantees the fractionation hypothesis with olivine and pyroxene is wrong. The path for fractionation of olivine alone is tangent to a smaller ellipse than is the closest olivine plus pyroxene path. The Bayes factor for the olivine-alone path is 0.081 (Table 3, sample 2). This value for $B$ suggests the evidence against the hypothesis for olivine fractionation is substantial to strong. Consequently, the data provide no support for either fractionation path between Xe05 and Xe11. If the two samples are related, they are related by some mechanism other than fractionation of olivine and clinopyroxene. The olivine fractionation hypothesis relating the second sample, VM01, to Xe11 has a Bayes factor of 1.956, a value that supports the fractionation hypothesis.
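For a straight fractionation path, the tangency calculation can be sketched in Python as follows. Two assumptions are made and should be read as such. First, $k$ is taken to be the value of the bivariate-normal quadratic form at the tangent point, which for a line $y = \text{slope} \cdot x + \text{intercept}$ equals the squared residual divided by the propagated variance of the residual. Second, the Bayes factor is taken as $B = 2\exp(-k/2)$, a closed form that is consistent with the stated maximum of 2 and with the $k = 6.4$ contour in Figure 4, but that is not quoted from the paper. The function name and tuple layout are illustrative; the numbers are the Table 3 entries, with the tabulated uncertainties treated as standard deviations and a covariance.

```python
import math


def tangent_bayes_factor(x, y, sd_x, sd_y, cov_xy, slope, intercept):
    """Return (k, x_t, y_t, B) for a data point and a straight path.

    (x_t, y_t) is the point on the line where an equal-density ellipse
    centred on the data point just touches it. B = 2 * exp(-k / 2) is an
    ASSUMED closed form, not a formula quoted from the source.
    """
    var_x, var_y = sd_x ** 2, sd_y ** 2
    residual = y - slope * x - intercept
    # Variance of the residual by propagation of the analytical errors.
    var_residual = var_y - 2.0 * slope * cov_xy + slope ** 2 * var_x
    k = residual ** 2 / var_residual
    x_t = x - (cov_xy - slope * var_x) * residual / var_residual
    y_t = slope * x_t + intercept
    return k, x_t, y_t, 2.0 * math.exp(-0.5 * k)


# label: (X, Y, s_x, s_y, s_xy, slope, intercept), from Table 3.
SAMPLES = {
    "1. Xe05": (71.665, -15.024, 1.856, 0.439, -0.710, -0.159, -4.867),
    "2. Xe05": (71.665, -15.024, 1.856, 0.439, -0.710, 1.000, -92.388),
    "3. VM01": (73.700, -19.213, 1.977, 0.556, -1.005, 1.000, -92.388),
}

if __name__ == "__main__":
    for label, row in SAMPLES.items():
        k, x_t, y_t, b = tangent_bayes_factor(*row)
        print(f"{label}: k = {k:7.3f}  tangent = ({x_t:.3f}, {y_t:.3f})  B = {b:.3g}")
```

Under these assumptions the sketch gives values that round to the tabulated Bayes factors for samples 2 and 3, and $k \approx 6.4$ for the olivine path through Xe05. For sample 1 it gives a value of order $10^{-6}$ rather than exactly the tabulated figure, because the result there is very sensitive to the rounded slope and uncertainties.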

LIMITATIONS OF EQUATIONS (16) AND (26)

Bayes factors should vary from 0 to infinity if posterior probabilities as calculated with Equation (4) are to vary from 0 to 1 at a fixed value of p0. Equations (16) and (26) can only result in values of B that are less than or equal to some finite maximum. Figure 5 demonstrates this fact for Equation (16). B can reach its maximum value of √2 only if the observed value of the variable u matches the hypothetical value u0. Suppose a maximum value for B is obtained. Equation (4) calculates the posterior probability for a given p0. The increase in p over p0 depends on p0. The maximum increase possible is 0.086 if the value of p0 is 0.414. Although more complicated, Equation (26) behaves in a similar manner. The principal differences are: (1) the maximum value possible for B is 2, and (2) the dependence of B on the correlation coefficient, r, and the variables u and v. Some of these dependencies are shown in Figure 6.

Table 3. Coordinates and Uncertainties of Data Points on Pearce Element Ratio Diagram (Fig. 4) for Lava Flows from Volcano Mountain, Yukon (a)

Sample (b)   X        Y         sx      sy      sxy      Slope    Intercept   B
1. Xe05      71.665   -15.024   1.856   0.439   -0.710   -0.159   -4.867      2.83 × 10⁻⁶
2. Xe05      71.665   -15.024   1.856   0.439   -0.710    1.000   -92.388     0.081
3. VM01      73.700   -19.213   1.977   0.556   -1.005    1.000   -92.388     1.956

(a) Data from Trupia (1992) and Trupia and Nicholls (1996). The hypothetical initial magma is Xe11. Slope and Intercept are those of the lines in Figure 4 that relate Xe05 and VM01 to Xe11. B is the Bayes factor for the fractionation hypothesis.

(b) 1: olivine + clinopyroxene fractionation hypothesis; 2, 3: olivine fractionation hypothesis.

Figure 5. Plot of the Bayes factor, B, vs. t (Eq. 16). Values of the Bayes factor that increase the posterior probability over the prior probability are limited to the range 1 to √2.
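The ceiling that a bounded Bayes factor places on the gain in posterior probability can be checked numerically. The following is a minimal sketch, not the program mentioned in the Acknowledgments. It assumes that Equation (4), which is not reproduced in this section, has the standard two-hypothesis form p = B p0 / (B p0 + 1 - p0); the function and constant names are illustrative only.

```python
import math

def posterior(p0, bayes_factor):
    """Posterior probability of H0 from prior p0 and Bayes factor B.

    ASSUMPTION: Equation (4) has the standard two-hypothesis form
    p = B*p0 / (B*p0 + 1 - p0).
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    return bayes_factor * p0 / (bayes_factor * p0 + 1.0 - p0)

B_MAX_EQ16 = math.sqrt(2.0)  # largest B attainable with Equation (16)
B_MAX_EQ26 = 2.0             # largest B attainable with Equation (26)

# Tabulate the gain p - p0 for a few priors when B sits at its maximum.
for p0 in (0.1, 0.414, 0.7, 0.9):
    gain16 = posterior(p0, B_MAX_EQ16) - p0
    gain26 = posterior(p0, B_MAX_EQ26) - p0
    print(f"p0 = {p0:5.3f}  gain (Eq. 16 max) = {gain16:.3f}  "
          f"gain (Eq. 26 max) = {gain26:.3f}")
```

Under this assumed form, B = √2 and p0 = 0.414 give a gain of about 0.086, in line with the value quoted in the text. Priors near 0 or 1 gain much less.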

Equations (16) and (26) are both particular cases of the more general Equations (13) and (25). Simplifications result from setting the variances and covariances of the statistical representations of the hypotheses equal. These parameters are derived from the analytical uncertainties, which are the same, regardless of which hypothesis is correct. By setting these statistical parameters equal, one effectively limits the possible range of hypotheses and restricts the range of values Bayes factors can attain. In the case of Equations (13) and (16), values of B are restricted to the range: 0 < B ≤ √2 (Fig. 5).

Figure 6. Plots of the Bayes factor, B, vs. r, the correlation coefficient, for several values of u = v (Eq. 26). Values of B that increase the posterior probability over the prior probability are limited to the range 1 to 2.

Probabilities favoring hypotheses that lead to Equations (16) and (26) cannot become large with a single line of evidence. Several data points must be entered in succession into a Bayes analysis to get a high probability. Perhaps this is fitting for petrologic hypotheses. One can never be sure they are correct; consistency with the data is the best one can ask. Consequently, petrologic hypotheses based on single lines of evidence should rarely have high probabilities of being true.
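Entering data points in succession can be sketched as repeated application of the posterior formula, with each posterior serving as the prior for the next data point. The sketch below rests on two assumptions that are not taken from the text: that Equation (4) has the standard form p = B p0 / (B p0 + 1 - p0), and that the data points are independent. The function names are hypothetical.

```python
import math

def posterior(p0, bayes_factor):
    """ASSUMED form of Equation (4): p = B*p0 / (B*p0 + 1 - p0)."""
    return bayes_factor * p0 / (bayes_factor * p0 + 1.0 - p0)

def updates_needed(p0, bayes_factor, target):
    """Count successive updates, all with the same Bayes factor,
    needed to raise the probability from p0 to at least target."""
    if not (0.0 < p0 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p0 and target must lie strictly between 0 and 1")
    if bayes_factor <= 1.0:
        raise ValueError("a Bayes factor <= 1 never raises the probability")
    p, n = p0, 0
    while p < target:
        p = posterior(p, bayes_factor)  # posterior becomes the next prior
        n += 1
    return n

# Best case: every data point yields the maximum attainable Bayes factor.
print(updates_needed(0.5, math.sqrt(2.0), 0.95))  # Equation (16) maximum
print(updates_needed(0.5, 2.0, 0.95))             # Equation (26) maximum
```

Under these assumptions, and starting from p0 = 0.5, nine successive data points at the Equation (16) maximum are needed to pass a probability of 0.95, and five at the Equation (26) maximum. Real data seldom reach the maximum, so more would usually be required.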

ACKNOWLEDGMENTS

This paper benefitted from comments by M. Z. Stout, J. K. Russell, Ben Edwards, T. M. Gordon, E. D. Ghent, and three anonymous reviewers. They, of course, are not in any way responsible for its defects. George Snyder of the U.S.G.S. once asked how sure I was of a petrologic interpretation. I had no answer. This paper is a belated attempt to answer his question. A computer program that calculates the statistics for the three kinds of probability estimations described in this paper can be obtained by writing. Support was provided by NSERC Research Grant A7372.


REFERENCES

Berger, J. O., and Delampady, M., 1987, Testing precise hypotheses: Statis. Sci., v. 2, no. 3, p. 317-352.

Davis, J. C., 1986, Statistics and data analysis in geology, 2nd Ed.: John Wiley & Sons, New York, 646 p.

Ghiorso, M. S., and Sack, R. O., 1995, Chemical mass transfer in magmatic processes. IV. A revised and internally consistent thermodynamic model for the interpolation and extrapolation of liquid-solid equilibria in magmatic systems at elevated temperatures and pressures: Contrib. Mineral. Petrol., v. 119, no. 2, p. 197-212.

Ghiorso, M. S., Carmichael, I. S. E., Rivers, M. L., and Sack, R. O., 1983, The Gibbs free energy of mixing of natural silicate liquids; an expanded regular solution approximation for the calculation of magmatic intensive variables: Contrib. Mineral. Petrol., v. 84, no. 2, p. 107-145.

Jeffreys, W. H., and Berger, J. O., 1992, Ockham's razor and Bayesian analysis: Am. Scientist, v. 80, no. 1, p. 64-72.

Jeffreys, H., 1961, Theory of probability: Oxford University Press, Oxford, England, 459 p.

LeMaitre, R. W., 1982, Numerical petrology: Statistical interpretation of geochemical data: Elsevier Scientific Publishing Company, Amsterdam, 281 p.

Meyer, S. L., 1975, Data analysis for scientists and engineers: John Wiley & Sons, New York, 513 p.

Nicholls, J., 1990, Stoichiometric constraints on variations in Pearce element ratios and analytical uncertainty in hypothesis testing, in Russell, J. K., and Stanley, C. R., eds., Theory and application of Pearce element ratios to geochemical data analysis, v. 8: Geological Association of Canada, Vancouver, BC, p. 73-98.

Nicholls, J., and Stout, M. Z., 1988, Picritic melts in Kilauea - Evidence from the 1967-1968 Halemaumau and Hiiaka eruptions: Jour. Petrology, v. 29, no. 5, p. 1031-1057.

Nicholls, J., Russell, J. K., and Stout, M. Z., 1986, Testing magmatic hypotheses with thermodynamic modelling, in Scarfe, C. M., ed., Short course in silicate melts, v. 12: Mineralogical Assoc. of Canada, Toronto, p. 210-235.

Oviatt, C. G., and Nash, W. P., 1989, Late Pleistocene basaltic ash and volcanic eruptions in the Bonneville basin, Utah: Geol. Soc. America Bull., v. 101, no. 2, p. 292-303.

Schmitt, S. A., 1969, Measuring uncertainty, an elementary introduction to Bayesian statistics: Addison-Wesley Publishing Company, Reading, MA, 400 p.

Spooner, I. S., 1994, Quaternary Environmental Change in the Stikine Plateau Region, Northwestern British Columbia, Canada: unpubl. doctoral dissertation, University of Calgary, Calgary, 313 p.

Trupia, S., 1992, Petrology of nephelinites and associated ultramafic nodules of Volcano Mountain, Yukon Territory: unpubl. masters thesis, University of Calgary, Calgary, 123 p.

Trupia, S., and Nicholls, J., 1996, Petrology of Recent lava flows, Volcano Mountain, Yukon Territory, Canada: Lithos, v. 37, no. 1, p. 61-78.

Wright, T. L., 1971, Chemistry of Kilauea and Mauna Loa in space and time: U.S. Geol. Survey Prof. Paper, v. 735, p. 1-40.

Wright, T. L., Swanson, D. A., and Duffield, W. A., 1975, Chemical compositions of Kilauea east-rift lava, 1968-1971: Jour. Petrology, v. 16, no. 1, p. 110-133.
