
  • MACHINE LEARNING FOR GEOLOGICAL MAPPING: ALGORITHMS AND APPLICATIONS

    MATTHEW J. CRACKNELL BSc (Hons)

    ARC Centre of Excellence in Ore Deposits (CODES)

    School of Physical Sciences (Earth Sciences)

    Submitted in fulfilment of the requirements for the degree of

    Doctor of Philosophy

    University of Tasmania

    May, 2014

    http://www.go2pdf.com

  • i

    Did you ever fly a kite in bed?

    Did you ever walk with ten cats on your head?

    Did you ever milk this kind of cow?

    Well, we can do it.

    We know how.

    If you never did you should.

    These things are fun and fun is good.

    Dr. Seuss

  • iii

    DECLARATION OF ORIGINALITY

    This thesis contains no material which has been accepted for a degree or diploma by the

    University or any other institution, except by way of background information and duly

    acknowledged in the thesis, and to the best of my knowledge and belief no material

    previously published or written by another person except where due acknowledgement is

    made in the text of the thesis, nor does the thesis contain any material that infringes

    copyright.

    AUTHORITY OF ACCESS

    The non-published content of the thesis (see below) may be made available for loan and

    limited copying and communication in accordance with the Copyright Act 1968.

    STATEMENT REGARDING PUBLISHED WORK CONTAINED IN THESIS

    Chapter 4 of this thesis is published under a Creative Commons Attribution (CC BY)

    licence. You are free to copy, communicate and adapt the work, so long as you attribute

    the authors. To view a copy of this licence, visit http://creativecommons.org/licenses/. The

    publishers of the papers comprising Chapters 5 to 6 hold the copyright for that content, and

    access to the material should be sought from the respective journals.

    Matthew J. Cracknell

    May 2014

  • v

    STATEMENT OF CO-AUTHORSHIP

    The following people and institutions contributed to the publication of work undertaken as

    part of this thesis:

    Matthew James Cracknell, ARC Centre of Excellence in Ore Deposits (CODES), School of

    Earth Sciences, University of Tasmania = Candidate

    Anya Marie Reading, ARC Centre of Excellence in Ore Deposits (CODES), School of

    Earth Sciences, University of Tasmania = Author 1

    Andrew William McNeill, Mineral Resources Tasmania, Department of Infrastructure

    Energy & Resources (DIER) = Author 2

    Author details and their roles:

    Paper 1, Geological mapping using remote sensing data: A comparison of five

    machine learning algorithms, their response to variations in the spatial distribution of

    training data and the use of explicit spatial information:

    Located in Chapter 4

    Candidate was the primary author, with Author 1 contributing to its development,

    refinement and presentation.

    Paper 2, The upside of uncertainty: Identification of lithology contact zones from

    airborne geophysics and satellite data using Random Forests and Support Vector

    Machines:

    Located in Chapter 5

    Candidate was the primary author, with Author 1 contributing to development,

    refinement and presentation.

    Paper 3, Mapping geology and volcanic-hosted massive sulfide alteration in the

    Hellyer–Mt Charter region, Tasmania, using Random Forests and Self-Organising

    Maps:

    Located in Chapter 6

    Candidate was the primary author, with Author 1 contributing to its refinement and

    presentation and Author 2 contributing to its formalisation and development.

    We the undersigned agree with the above stated proportion of work undertaken for each

    of the above published (or submitted) peer-reviewed manuscripts contributing to this

    thesis:

    Signed:

    Anya M. Reading, Supervisor, School of Earth Sciences, University of Tasmania

    Jocelyn McPhie, Head of School, School of Earth Sciences, University of Tasmania

    Date:

  • vii

    ABSTRACT

    Machine learning algorithms are designed to efficiently identify and accurately predict

    patterns within multivariate data. They provide analysts with computational tools to aid

    predictive modelling and the interpretation of interactions between data and the

    phenomena under investigation. The analysis of large volumes of disparate multivariate

    geospatial data using machine learning algorithms therefore offers great promise to

    industry and research in the geosciences. Geoscience data are frequently characterised by a

    restriction in the number and distribution of direct observations, irreducible noise in these

    data and a high degree of intraclass variability and interclass similarity. The choice of

    machine learning algorithm, or algorithms, and the details of how algorithms are applied

    must therefore be appropriate to the context of geoscience data. With this knowledge, I aim

    to employ machine learning as a means of understanding the spatial distribution of

    complex geological phenomena.

    I conduct a rigorous and comprehensive comparison of machine learning algorithms,

    representing the five general machine learning strategies, for supervised lithology

    classification applications. I also develop and test a novel method for obtaining robust

    estimates of the uncertainty associated with machine learning algorithm categorical

    predictions. The insights gained from these experiments lead to the further development

    and comparison of new methods for the incorporation of spatial-contextual information

    into machine learning supervised classifiers.

    In using machine learning algorithms for geoscience applications, I have developed best-

    practice methodologies that address the challenges facing geoscientists for geospatial

    supervised classification. Guidelines are established that detail the preparation and

    integration of disparate spatial data, the optimisation of trained classifiers for a given

    application and the robust statistical and spatial evaluation of outputs. I demonstrate,

    through a case study in a region that is prospective for economic mineralisation, the

    combination of supervised and unsupervised machine learning algorithms for the critical

    appraisal of pre-existing geological maps and formulation of meaningful interpretations of

    geological phenomena.

    The experiments conducted as part of my research confirm the efficacy of machine

    learning algorithms to generate accurate geological maps representing a variety of terranes.

    I identify and explore key aspects of the spatial and statistical distributions of geoscience

    data that affect machine learning algorithm performance. My research clearly identifies

    Random Forests as a good first-choice algorithm for the prediction of classes

    representing lithologies using commonly available multivariate geological and geophysical

    data. Furthermore, Random Forests prediction uncertainty is shown to be closely related to

    ambiguous and/or erroneous classifications and thus provides a practical means of

    indicating variable levels of confidence. Spatial-contextual information is best incorporated

    into machine learning supervised classifiers via the pre-processing of input variables

    and/or the post-regularisation of classifications. My findings indicate that a trade-off

    between optimal predictive models and interpretable explanatory models exists, whereby

    intuitively interpretable models are not necessarily the most accurate.
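As a rough illustration of how prediction uncertainty can flag ambiguous classifications, the sketch below computes a normalised entropy from per-class membership probabilities, for example the fraction of trees in an ensemble voting for each class. The vote fractions, the function name and the choice of Shannon entropy are assumptions made for illustration only; they are not taken from the thesis.

```python
import math


def normalised_entropy(probs):
    """Shannon entropy of class probabilities, scaled to the range [0, 1]."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    # Terms with zero probability contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Dividing by log(number of classes) makes a uniform vote score 1.
    return entropy / math.log(len(probs))


# Hypothetical vote fractions for three lithology classes at two locations.
confident = [0.90, 0.05, 0.05]
ambiguous = [0.40, 0.35, 0.25]

for name, probs in [("confident", confident), ("ambiguous", ambiguous)]:
    print(name, round(normalised_entropy(probs), 3))
```

Under these assumptions, a location whose votes are split across classes scores closer to 1 than a location dominated by one class, which is the sense in which high uncertainty may point to ambiguous or erroneous predictions.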

    The practical application of machine learning algorithms requires the implementation of

    three key stages: (1) data pre-processing; (2) algorithm training; and (3) prediction

    evaluation. This methodology provides the foundation for generating accurate and

    geologically meaningful predictions with minimal user intervention and assists in the

    formulation of robust interpretations of complex geological phenomena. For example,

    classifications obtained by Random Forests are useful for critically appraising interpreted

    geological maps. Clusters produced by Self-Organising Maps indicate the presence of

    discrete, spatially contiguous and geologically significant sub-classes within individual

    lithological units, which represent regions of contrasting primary composition and

    alteration styles. My results may be widely applied to a broad range of practical geoscience

    challenges such as ore deposit targeting, geo-hazard risk assessment, engineering and

    construction projects, hydrological and environmental modelling and ecological studies.

    The applications of machine learning algorithms detailed in this thesis align well with

    state-of-the-art Big Data online infrastructure and virtual laboratories currently emerging in

    Australia.
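The three stages named in this paragraph (data pre-processing, algorithm training and prediction evaluation) can be sketched in a few lines. This is a minimal sketch under stated assumptions: it swaps in a simple nearest-centroid classifier rather than any algorithm evaluated in the thesis, and the sample values, class labels and function names are all made up for illustration.

```python
import statistics


def fit_scaler(rows):
    """Stage 1: learn per-variable mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Fall back to 1.0 so a constant variable does not cause division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stdevs


def scale(row, scaler):
    """Standardise one row using the statistics learned from the training data."""
    means, stdevs = scaler
    return [(x - m) / s for x, m, s in zip(row, means, stdevs)]


def train_centroids(rows, labels):
    """Stage 2: average the scaled training rows of each class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [statistics.mean(c) for c in zip(*members)]
            for label, members in grouped.items()}


def predict(row, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=dist)


def accuracy(predicted, observed):
    """Stage 3: proportion of held-out samples classified correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Made-up samples: two hypothetical input variables per location.
train_x = [[1.0, 200.0], [1.2, 220.0], [3.0, 800.0], [3.3, 790.0]]
train_y = ["basalt", "basalt", "granite", "granite"]
test_x = [[1.1, 210.0], [3.1, 805.0]]
test_y = ["basalt", "granite"]

scaler = fit_scaler(train_x)
centroids = train_centroids([scale(r, scaler) for r in train_x], train_y)
predictions = [predict(scale(r, scaler), centroids) for r in test_x]
print(predictions, accuracy(predictions, test_y))
```

The scaler is fitted on the training rows only and then reused for the test rows, so the held-out samples do not influence pre-processing.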

  • ix

    CONTENTS

    DECLARATION OF ORIGINALITY

    AUTHORITY OF ACCESS

    STATEMENT REGARDING PUBLISHED WORK CONTAINED IN THESIS

    STATEMENT OF CO-AUTHORSHIP

    ABSTRACT

    CONTENTS

    LIST OF TABLES

    LIST OF FIGURES

    LIST OF ABBREVIATIONS

    ACKNOWLEDGEMENTS

    CHAPTER 1 INTRODUCTION

    1.1. Machine learning

    1.2. Geological maps

    1.3. Research scope and hypothesis

    1.3.1. Major research questions to be addressed

    1.4. Thesis structure

    CHAPTER 2 MACHINE LEARNING THEORY AND IMPLEMENTATION

    2.1. Machine learning

    2.1.1. Supervised versus unsupervised learning

    2.2. Supervised classification

    2.2.1. Classification strategies

    2.2.1.1. Statistical learning algorithms

    2.2.1.2. Instance-based learners

    2.2.1.3. Logic-based learners

    2.2.1.4. Support Vector Machines

    2.2.1.5. Perceptrons

    2.2.2. Supervised classifier implementation

    2.2.2.1. Data pre-processing

    2.2.2.2. Classifier training

    2.2.2.3. Prediction evaluation

    2.3. Unsupervised clustering

    2.3.1. Clustering strategies

    2.3.1.1. Partitioning algorithms

    2.3.1.2. Hierarchical algorithms

    2.3.1.3. Self-Organising Maps

    2.3.2. Unsupervised clustering implementation

    2.4. Conclusions

    CHAPTER 3 A REVIEW OF MACHINE LEARNING FOR GEOSCIENCE CLASSIFICATION APPLICATIONS

    3.1. Machine learning non-geoscience applications

    3.2. Machine learning geoscience applications

    3.2.1. Classification of 0D data

    3.2.1. Classification of 1D data

    3.2.1.1. One temporal dimension

    3.2.1.2. One spatial dimension

    3.2.1. Classification of 2D data

    3.2.1.3. Land cover/vegetation mapping

    3.2.1.4. Geological mapping

    Supervised classification

    Unsupervised clustering

    Combined supervised and unsupervised methods

    3.3. Practical machine learning implementation

    3.3.1. Data

    3.3.2. Data pre-processing

    3.3.3. Prediction evaluation

    3.3.4. Integrated workflow

    3.4. Conclusions

    CHAPTER 4 GEOLOGICAL MAPPING USING REMOTE SENSING DATA: A COMPARISON OF FIVE MACHINE LEARNING ALGORITHMS, THEIR RESPONSE TO VARIATIONS IN THE SPATIAL DISTRIBUTION OF TRAINING DATA AND THE USE OF EXPLICIT SPATIAL INFORMATION

    4.0. Abstract

    4.1. Introduction

    4.1.1. Machine learning for supervised classification

    4.1.2. Machine learning algorithm theory

    4.1.2.1. Naïve Bayes

    4.1.2.2. k-Nearest Neighbours

    4.1.2.3. Random Forests

    4.1.2.4. Support Vector Machines

    4.1.2.5. Artificial Neural Networks

    4.1.3. Geology and tectonic setting

    4.2. Data

    4.3. Methods

    4.3.1. Pre-processing

    4.3.2. Classification model training

    4.3.3. Prediction evaluation

    4.4. Results

    4.5. Discussion

    4.5.1. Machine learning algorithms compared

    4.5.2. Influence of training data spatial distribution

    4.5.3. Using spatially constrained data

    4.6. Conclusions

    4.7. Acknowledgements

    4.8. Description of supplementary information

    CHAPTER 5 THE UPSIDE OF UNCERTAINTY: IDENTIFICATION OF LITHOLOGY CONTACT ZONES FROM AIRBORNE GEOPHYSICS AND SATELLITE DATA USING RANDOM FORESTS AND SUPPORT VECTOR MACHINES

    5.0. Abstract

    5.1. Introduction

    5.1.1. The lithology prediction problem

    5.1.2. Random Forests

    5.1.3. Support Vector Machines

    5.2. Data

    5.2.1. Tectonic setting and history

    5.2.2. Data sources

    5.2.3. Data pre-processing

    5.3. Methods

    5.3.1. Training and evaluating algorithms

    5.3.2. Variance

    5.4. Results

    5.5. Discussion

    5.6. Conclusions

    5.7. Acknowledgements

    CHAPTER 6 MAPPING GEOLOGY AND VOLCANIC-HOSTED MASSIVE SULFIDE ALTERATION IN THE HELLYER–MT CHARTER REGION, TASMANIA, USING RANDOM FORESTS AND SELF-ORGANISING MAPS

    6.0. Abstract

    6.1. Introduction

    6.1.1. Geological setting

    6.1.2. Random Forests

    6.1.3. Self-Organising Maps

    6.2. Data and Methods

    6.2.1. Source data

    6.2.2. Data sampling

    6.2.3. Training Random Forests and variable selection

    6.2.4. Implementing Self-Organising Maps

    6.3. Results

    6.3.1. Geological classification using Random Forests

    6.3.2. Discrimination of geological sub-classes using Self-Organising Maps

    6.4. Discussion

    6.5. Conclusions

    6.6. Acknowledgements

    CHAPTER 7 SPATIAL-CONTEXTUAL MACHINE LEARNING SUPERVISED CLASSIFIERS: LITHOSTRATIGRAPHY CLASSIFICATION EXAMPLE

    7.0. Abstract

    7.1. Introduction

    7.1.1. Pre-processing methods

    7.1.1.1. Focal operators

    7.1.1.2. Image segmentation

    7.1.2. Training data selection

    7.1.3. Post-processing methods

    7.1.4. Combination methods

    7.1.5. Study aims

    7.2. Data

    7.2.1. Lithostratigraphy classification target

    7.2.2. Geophysical data input variables

    7.2.2.1. Pre-processing

    7.3. Methods

    7.3.1. Data sampling

    7.3.2. Global pixel-based classifiers

    7.3.3. Spatial-contextual classifiers

    7.3.3.1. Pre-processing

    7.3.3.2. Algorithm training

    7.3.3.3. Post-processing

    7.3.4. Prediction evaluation

    7.4. Results

    7.5. Discussion

    7.5.1. Spatial-contextual classifiers compared

    7.5.2. Issues of spatial scale

    7.5.3. Geological interpretations

    7.6. Conclusions

    CHAPTER 8 SYNTHESIS AND DISCUSSION

    8.1. Algorithms

    8.1.1. Supervised classification

    8.1.1.1. Implementation

    8.1.1.2. Decision structures

    8.1.1.3. Accuracy comparison

    8.1.1.4. Spatial-contextual classifiers

    8.1.1.5. Prediction uncertainty

    8.1.2. Unsupervised clustering

    8.2. Applications

    8.2.1. Data pre-processing

    8.2.1.1. Data preparation

    8.2.1.2. Variable extraction

    8.2.1.3. Variable selection

    8.2.2. Classifier training

    8.2.2.1. Training and test data

    8.2.2.2. Classifier induction

    8.2.2.3. Classification post-processing

    8.2.3. Evaluation and interpretation

    8.2.3.1. Statistical evaluation

    8.2.3.2. Interrogating decision structures

    8.2.3.3. Complementary interpretation

    8.3. Extended research implications

    8.3.1. Integrated workflow using R

    8.3.2. Wider geoscience applications

    8.3.3. Big Data

    CHAPTER 9 CONCLUSIONS

    REFERENCES

    APPENDIX A MACHINE LEARNING ALGORITHM SENSITIVITY TO IMBALANCED CLASS DISTRIBUTIONS

    A.1. Introduction

    A.2. Methods

    A.3. Results

    A.4. Discussion and Conclusions

    APPENDIX B VARIANCE AND ENTROPY FOR MULTICLASS CLASSIFICATION UNCERTAINTY

    APPENDIX C SUPPLEMENTARY INFORMATION

    C.1. Data

    C.2. MLA software and parameters

    APPENDIX D R PACKAGES

    APPENDIX E DATA SOURCES AND PRE-PROCESSING

    APPENDIX F R CODE AND SCRIPTS

    README.txt
