Assessment of Machine Learning Applied to X-Ray Fluorescence Core Scan Data from the Zinkgruvan Zn-Pb-Ag Deposit, Bergslagen, Sweden

Filip Simán

Natural Resources Engineering, master's level (120 credits)

2020

Luleå University of Technology

Department of Civil, Environmental and Natural Resources Engineering

Preface

“Before we implement A.I., we need to have the I”

- C. M. Simán

Abstract

Lithological core logging is a subjective and time-consuming endeavour that could possibly be automated; the question is whether, and to what extent, this automation would affect the resulting core logs. This study presents a case from the Zinkgruvan Zn-Pb-Ag mine, Bergslagen, Sweden, in which Classification and Regression Trees and K-means Clustering on the Self Organising Map were applied to X-Ray Fluorescence lithogeochemistry data derived from automated core scan technology. These two methods are assessed through comparison to manual core logging. It is found that the X-Ray Fluorescence data are not sufficiently accurate or precise for the purpose of automated full lithological classification, since not all elements are successfully quantified. Furthermore, not all lithologies can be distinguished with lithogeochemistry alone, further hindering the success of automated lithological classification. This study concludes that: 1) K-means on the Self Organising Map is the most successful approach, although this may be influenced by the method of domain validation; 2) the choice of ground truth for learning is important for both supervised learning and the assessment of machine learning accuracy; and 3) geology, data resolution and choice of elements are important parameters for machine learning. Both the supervised method of Classification and Regression Trees and the unsupervised method of K-means clustering applied to Self Organising Maps show potential to assist core logging procedures.

Table of Contents

Preface
Abstract
Introduction
    Geological Background of Bergslagen
    Zinkgruvan
Methodology
    Overview
    XRF Core Scanning
    Lithogeochemistry and XRF Data Validation
    Virtual Core Logging (VCL)
    Data pre-processing
    K-means clustering applied to the Self Organising Map
    Classification and Regression Trees (CART)
    Conventional Core Logging
    Domain Comparison
Results
    Classification by Virtual Core Logging
    Classification by K-means Clustering on the SOM
    Classification by CART
    Comparison of Agreement Scores
    Comparison of VCL-2 to Machine Learning Logging
    Method comparison on DBH 4442
    CART-1 vs CART-5
    CART-3 vs CART-4 vs CART-5
    SOM-1 vs SOM-2
Discussion
    Finding 1: Domain Comparison of Unsupervised and Supervised learning
    Finding 2: Method Comparisons to the Conventional Method
    Finding 3: Machine learning dependence on Geology
    Finding 4: Data Resolution for SOM and CART
    Finding 5: Elements Included in Data Given to Machine Learning
    Suggestions for Improving Accuracies of Machine Learning
Conclusions
Acknowledgements
References
Appendix I
Appendix II

Cover Picture: Voxel model of a core box with XRF data projected on top.

Introduction

Since Palaeolithic hominins started mining hematite in the Ngwenya mine 43,000 years ago [R.A. Dart, 1969] and up until the beginning of the 20th century, the human brain has been the main tool for mineral exploration. However, we may be facing a paradigm shift in how exploration is conducted, as big-tech companies such as Amazon.com, Inc. and Google LLC enter the oil and gas business applying their expertise in artificial intelligence (throughout the paper artificial intelligence will be abbreviated A.I. so as to not confuse it with the chemical symbol for aluminium, Al) [C.M. Matthews, 2018]. The use of A.I. tools built upon machine learning methods and algorithms is now also emerging in the mineral exploration industry.

X-Ray Fluorescence (XRF), developed during the late 20th century [Beckhoff et al., 2006], has proven to be a powerful tool for the exploration industry. XRF devices are now being built into automatic core scanners by various enterprises, such as Minalyze AB, enabling the development of new work routines for core logging. The obvious link between XRF core scanners and A.I. is automation, allowing for faster data collection and processing, respectively. The mining industry has seen much progress in efficiency through process automation and scale. This has in turn made it difficult for mining geologists to keep pace with production, which by extension threatens the production itself. One solution would be to employ more geologists, but this is cost-intensive and results in known issues of subjectivity in core logging with individual interpretations [Curtis, A. 2012]. Hence, the development of a more efficient and objective methodology, as attempted in this study, is an intriguing solution.

This study assesses the utility of machine learning applied to XRF core scan data in mineral exploration, attempting to answer whether A.I. can increase the efficiency of core logging without losing critical geological information. This was done by testing the performance of three lithological core logging methods: 1) Virtual Core Logging (VCL); 2) the unsupervised Self Organising Map (SOM) with K-means clustering; and 3) the supervised Classification and Regression Trees (CART). The identical XRF core scan dataset was used for all methods. These three methods were then checked against a more conventional manual core logging method, with collection of lithogeochemical data as a quality control for the XRF core scanning data.

To assess machine learning applied to XRF core scan data, the Zinkgruvan deposit in south-western Bergslagen, Sweden, was used as a test site. Due to the relative simplicity of the Zinkgruvan stratigraphy, the deposit was well suited for this study.

Geological Background of Bergslagen

The economic significance of Bergslagen is widely known, with knowledge and extraction of its mineral deposits tracing back to 375 BCE [Bindler et al., 2017]. During the 12th century the mining of iron began in Norberg and quickly spread in the region, then in the 13th century copper mining began at Falun [Bindler et al., 2017], fuelling Sweden’s industrial success for centuries to come. Bergslagen is dominated by iron oxide deposits with about half as many sulphide deposits [Stephens et al., 2009] (Figure 1).

Models of the geological and tectonic evolution of Bergslagen have seen various revisions over time, with the current understanding involving a 1.91–1.89 Ga extensional back-arc setting, as suggested by Allen et al. [1996]. This was further developed by Hermansson et al. [2008] with a younger 1.86 Ga extensional event. This complex switching between compressional and extensional regimes in an accretionary setting is thought to have been part of the Svecokarelian orogeny, lasting for ca. 50 million years [Stephens et al. 2020, 2009]. The intense volcanism in this palaeo-environment discharged large amounts of ash, which sedimented in shallow marine basins [Kampmann et al. 2017]. Another effect of this heat was the circulation of hydrothermal fluids in the upper crust and the leaching of metals that were deposited either as replacement mineralisation or in brine pools on the seafloor [Kampmann et al. 2017, Jansson et al., 2017]. Broadly, there are three major lithologies formed between 1.9–1.8 Ga, the oldest of which are metagreywackes, followed by rhyolitic to dacitic metavolcanics derived from the intensive volcanism, commonly intercalated with metasedimentary rocks including marbles, with which the metallic deposits are spatially associated [Stephens et al. 2020, 2009]. The youngest lithologies are pelitic metasedimentary rocks [Stephens et al. 2020, 2009].

During and after the emplacement of the main lithologies in Bergslagen, intrusive suites formed in the region, namely the Granitoid-Dioritoid-Gabbroid (GDG), the Granitoid-Syenitoid-Dioritoid-Gabbroid (GSDG) and the Granite-Pegmatite (GP) suites [Kampmann et al., 2016].

Crucial for geochemical exploration in Bergslagen is an understanding of the regional hydrothermal alteration prior to the greenschist- to amphibolite-facies metamorphism. A variety of alterations have caused compositional changes to feldspars including their destruction in favour of phyllosilicates [Stephens et al., 2009]. Skarns are common in marble-rich areas where hydrothermal fluids have interacted with carbonates [Kampmann et al. 2017, Jansson et al., 2011].

The sulphide deposits in Bergslagen bear some similarities to volcanogenic massive sulphide (VMS) deposits or sedimentary exhalative (SEDEX) deposits. Most of the sulphide deposits occur as replacement mineralisation in marble which differs from the VMS and SEDEX genetic models. Hence, the classification of deposits in Bergslagen is subject to discussion with advocates for different classifications [Jansson et al. 2017]. Allen et al. [1996] coined the terms Stratiform Ash Siltstone (SAS) deposits and Stratabound Volcanic-Associated and Limestone Skarn (SVALS) deposits as local alternatives to the narrowly defined SEDEX and VMS classifications, where these local terms are broader to encompass the large differences in style among the deposits in Bergslagen.

Figure 1. Geological map of Bergslagen showing the location of iron oxide, base metal and non-metallic mineral deposits, modified after Stephens et al. 2009.

Zinkgruvan

The Zinkgruvan deposit is situated in south-western Bergslagen (Figure 1). Two different kinds of ore are mined at Zinkgruvan: stratiform Zn-Pb-Ag ore and stratabound Cu ore. The Zn-Pb-Ag resources are at 15.6 Mt grading on average 9.3 wt.% Zn, 3.7 wt.% Pb and 84 g/t Ag, whereas the Zn-Pb-Ag reserves are at 11.9 Mt grading 7.2 wt.% Zn, 2.9 wt.% Pb and 63 g/t Ag. The stratabound Cu resources are at 4.9 Mt with average grades of 2.3 wt.% Cu, 0.3 wt.% Zn and 32 g/t Ag, whereas the stratabound Cu reserves are at 5.2 Mt grading 1.8 wt.% Cu, 0.2 wt.% Zn and 26 g/t Ag [Wardell Armstrong, 2017].

The local stratigraphy of Zinkgruvan is relatively continuous along strike for several kilometres with only a few variations (Figure 2). The ore horizon stretches for 5 km with a thickness of about 5 to 25 m [Hedström et al. 1989]. The stratigraphy is displaced along the Knalla fault which is situated between the Knalla and Nygruvan mineshafts. As a result of regional ductile deformation, the morphology of the stratiform deposit is controlled by an overturned east-west trending syncline [Jansson et al., 2017] (Figure 2). Stratigraphic inversion resulted from folding; hence the stratigraphic footwall overlies the deposit as the structural hanging wall. Main stratigraphic units are the Mariedamm, the Zinkgruvan and the Vintergölen formations.

Figure 2. Map of Zinkgruvan showing the overturned east–west trending syncline, modified after Jansson et al. [2017]

The Mariedamm formation mainly consists of microcline-quartz rock and constitutes the core of the previously mentioned east-west syncline (Figure 2), enveloped by the younger units. This oldest unit is characterized by a red colour at stratigraphic depth, whereas further up towards the Zinkgruvan formation it has a grey-white colour, indicating a decrease in hematite and magnetite [Jansson et al., 2017]. The grey-white Mariedamm rocks are dominated by microcline-quartz with subordinate biotite [Jansson et al., 2017]. As a result of the percolation of hydrothermal fluids these rocks were depleted in Na, Mg, Fe, Mn and enriched in K and Ba [Hedström et al., 1989].

Stratigraphically overlying the Mariedamm formation is the Zinkgruvan formation, which is more heterogeneous, with skarn-marble beds, metamafite inliers, quartzite and volcanoclastic metatuffite [Jansson et al., 2017]. The thickness and character of these rocks differ between the west and east side of the Knalla fault. On the west side a thin layer of metatuffite marks the transition from the underlying Mariedamm microcline-quartz, followed by a thick layer of dolomitic marble, the lower part of which locally hosts Cu-Co mineralisation and the top part of which locally hosts magnetite mineralisation. On the east side the metatuffite and marble are interbedded with skarn. Above the metatuffite and marble the Zn-Pb-Ag mineralisation occurs, thinner on the east side and thicker to the west of the Knalla fault. The mineralisation is locally overlain by a skarn unit on top of which another metatuffite unit occurs. It is well laminated with interbedded calcitic skarn-marble, referred to as KSL in this study.

The youngest stratigraphic unit, the Vintergölen formation, consists of gneissic-migmatised metasedimentary pelite [Jansson et al., 2017]. The transition of this pelitic unit is marked by the FES unit consisting of disseminated pyrrhotite in metatuffite, which is locally graphitic. This is followed by the GBK unit comprised of garnet porphyroblasts in a biotite-quartz matrix.

Radiometric dating efforts of the Zinkgruvan deposit by Kumpulainen et al. [1996] give an age estimate of 1901 +/- 18 to 1889 +35/-24 Ma after U/Pb dating of rhyolites. This age is contemporary with the volcanic activity in Bergslagen [Jansson et al., 2011].

The classification of Zinkgruvan is subject to different interpretations. Jansson et al. [2017] summarise this controversy by stating that, although a consensus on a syngenetic character has been reached, different models are still suggested, such as VMS deposit, Broken Hill-type (BHT) deposit, SEDEX deposit and VMS-SEDEX hybrid deposit. An apt description of the Zinkgruvan deposit is Stratiform Ash Siltstone (SAS) deposit [Allen et al. 1996].

Methodology

Overview

The aim of this study has been to characterise four drill cores and compare three approaches to doing so. These three methods are: 1) lithological core logging in a virtual environment including XRF data, in this study referred to as Virtual Core Logging (VCL); 2) an unsupervised clustering method applied to the XRF data, referred to as K-means clustering applied to the Self Organising Map (SOM); and 3) a supervised classification method, also applied to the XRF data, referred to as Classification and Regression Trees (CART). These three approaches are described in more detail in later sections. Finally, these three methods were compared to a conventional method of manual core logging on site. This contribution followed the workflow outlined in Figure 3. The sections below detail each step of this workflow. The four selected drill cores (DBH 4442, DBH 4400, DBH 4420 and DBH 804) were made available courtesy of the near-mine exploration department at Zinkgruvan Mining AB.

Figure 3. Outline of the method workflow in this study, detailing the steps from data to graphic logs for comparison. Rectangular boxes represent each heading of the methodology section, whereas each elliptical box represents resulting core logs. The ellipses are chronologically organised downward in order of completion in the study. CART-2 has been excluded due to an error in the data pre-processing.

XRF Core Scanning

The core scanner (Figure 4) used to acquire the XRF data for this study is a semi-automatic optical and geochemical drill core scanner, developed and commercialized by Minalyze AB, that is able to analyse entire core trays. While a core box is in the machine, a sensor travels continuously over the core, collecting optical and XRF data simultaneously, achieving a scanning rate of 1 cm/sec while taking 10 px/mm resolution images [Sjöqvist et al., 2015].

Figure 4. The Minalyze XRF core scanner. Online image, Minalyze AB, https://minalyze.com

The mounted XRF is a non-destructive method using energy-dispersive spectrometry (EDS) with a silicon drift detector (SDD) [Sjöqvist et al., 2015]. The benefit of the SDD is that it minimises electronic noise and is thus ideal for high-resolution X-ray spectrometry, allowing for the collection of low-energy photons [Beckhoff et al., 2006]. In addition, the XRF operates in a vacuum, which reduces attenuation in air and allows lower-energy photons to reach the detector; hence a stronger signal is obtained. Plainly, this means that light elements down to Mg and Al can be detected.

For this project the XRF scan data, in the form of counts per second, were converted to geochemical concentrations for 23 elements (Al, Si, P, S, Cl, K, Ca, Ti, V, Cr, Mn, Fe, Ni, Cu, Zn, As, Rb, Sr, Y, Nb, Zr, Ba and Pb) by an empirical approach utilising OREAS 24B and OREAS 624 as standard materials for calibration. This study focuses on Al, Si, K, Ca, Ti and Fe for analysis, since these are most representative of the Zinkgruvan lithologies.
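The details of the empirical calibration are not given here. As a rough illustration only, the sketch below assumes the simplest possible case: a straight line through two standards for a single element. The function name `two_point_calibration` and all count rates and concentrations are hypothetical placeholders, not certified values for OREAS 24B or OREAS 624, and the actual calibration used by the scanner may differ.

```python
def two_point_calibration(cps_a, conc_a, cps_b, conc_b):
    """Build a linear counts-per-second -> concentration converter.

    Assumes (hypothetically) that concentration is a linear function of
    count rate, fixed by two standards A and B for one element.
    """
    if cps_a == cps_b:
        raise ValueError("standards must have different count rates")
    slope = (conc_b - conc_a) / (cps_b - cps_a)
    intercept = conc_a - slope * cps_a

    def to_concentration(cps):
        return slope * cps + intercept

    return to_concentration


# Placeholder numbers for illustration only (not real standard values).
convert_fe = two_point_calibration(cps_a=100.0, conc_a=2.0,
                                   cps_b=500.0, conc_b=10.0)
fe_wt_pct = convert_fe(300.0)
print(f"Fe at 300 cps: {fe_wt_pct:.2f} wt.%")
```

With only two standards such a line cannot capture matrix effects, which is one reason the resulting concentrations should be checked against laboratory data.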

Lithogeochemistry and XRF Data Validation

To enable assessment of the XRF data from core scanning it is important to know the composition of the scanned material. This was achieved by collection of lithogeochemical samples from a selection of the lithologies in this study. The drill core DBH 4442 was selected for collection of these lithogeochemical control data since it intersects most major lithologies found in the Zinkgruvan stratigraphy. Eight homogeneous samples were selected: two samples from the pelite, three from the metatuffite, two from the marble and one from the diopside skarn. No samples were taken from the microcline-quartz rock since this lithology is not present in the selected drill core. Along with the eight samples, two certified reference materials, provided by Boliden AB, were included in the sample batch. Sample preparation and analysis were performed by the commercial laboratory ALS Minerals, involving crushing and milling and a complete whole-rock characterisation package using four-acid digestion and inductively coupled plasma mass spectrometry (ICP-MS) to quantify the entire suite of elements that the XRF core scanner can analyse.

From the Lab:XRF ratio (Figure 5), it is found that the XRF core scanner overestimates the elements Al, Si, K, Ca and Ti. However, the XRF underestimates Fe for all rocks apart from marble. The marble unit shows a dip in the ratio for several elements, most notably K and Ti, since the marble is dolomitic with about 15–17 wt.% Mg [Jansson et al. 2017], which the XRF cannot accurately quantify. The pronounced overestimation in the dolomitic marble may therefore be a closure effect, i.e. the elements that the XRF can quantify are filling up the 15–17 wt.% gap that would have been Mg. This concept may also explain the general overestimation by XRF compared to the lab.

Figure 5. The lab-measured concentration divided by the XRF-measured concentration (ratio) of eight samples from DBH 4442 representing most of the major lithologies, illustrating the overestimation (Ratio < 1) and underestimation (Ratio > 1) of certain elements during compositional analysis using XRF core scanning data. Deviation from the straight line indicates an anomaly, i.e. a difference in the measured concentration between the lab and XRF.
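The Lab:XRF ratio and the suggested closure effect can be sketched as follows. The sample values are hypothetical and chosen only for illustration. The closure factor assumes, as a simplification, that the quantified components are rescaled to a fixed total of 100 wt.% when an unmeasured component is missing; how the scanner software actually normalises the data is not stated here.

```python
def lab_xrf_ratio(lab, xrf):
    """Return lab/XRF concentration ratios per element.

    Ratio < 1: the XRF reads higher than the lab (overestimation).
    Ratio > 1: the XRF reads lower than the lab (underestimation).
    """
    return {el: lab[el] / xrf[el] for el in lab if el in xrf and xrf[el] > 0}


def closure_inflation(unmeasured_wt_pct):
    """Inflation factor for measured components when an unmeasured
    component is absent and the rest is rescaled to 100 wt.%
    (a simplifying assumption)."""
    return 100.0 / (100.0 - unmeasured_wt_pct)


# Hypothetical concentrations for one sample (wt.%), illustration only.
lab = {"Si": 20.0, "Ca": 10.0, "Fe": 4.0}
xrf = {"Si": 25.0, "Ca": 12.5, "Fe": 3.2}

ratios = lab_xrf_ratio(lab, xrf)
for element, ratio in ratios.items():
    if ratio < 1:
        status = "overestimated by XRF"
    elif ratio > 1:
        status = "underestimated by XRF"
    else:
        status = "in agreement"
    print(f"{element}: lab/XRF = {ratio:.2f} ({status})")

# Dolomitic marble with roughly 16 wt.% Mg that is not quantified.
marble_factor = closure_inflation(16.0)
print(f"closure inflation factor: {marble_factor:.2f}")
```

Under this simplification, a missing 15–17 wt.% of Mg would inflate every quantified element by a factor of roughly 1.18 to 1.20, which corresponds to a Lab:XRF ratio of about 0.83 to 0.85 from closure alone.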

By drawing XY-plots of lab vs. XRF values (Figure 6), conclusions about the precision and accuracy of the XRF can be made by observing the spread of the data points and the slope of the regression line through these points. It is found that the two methods plot along a straight regression line for most elements, showing that the relative values are similar between the XRF and the lab values; hence we may conclude that the precision of the XRF is good. Aluminium (Figure 6) exhibits an unclear trend, demonstrated by the second-lowest squared correlation coefficient (R²) value of 0.876; this may be caused by Al being a light element, which makes it difficult for the XRF to quantify. The R² values found in this study are comparable to other literature [Maliki, A. et al, 2017, Gregory B. et al, 2019] where authors have completed similar assessments of XRF accuracies in relation to ICP-MS. The slopes of the regression lines are < 1 and indicate, in the same way as the lab:XRF ratio (Figure 5), that the XRF overestimates the abundance of analysed elements. We conclude that the concentrations are semi-quantitative, i.e. unsuitable for applications that perform calculations on the data. However, for the case of this study, where relative concentrations of elements are used for classification, the data are fit for purpose.

Figure 6. XY-plots with regression lines and squared correlation coefficients for the six elements mainly used for classification. (Ti measured in parts per million, ppm)
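The slope and squared correlation coefficient of such a lab vs. XRF plot can be obtained with an ordinary least-squares fit. The sketch below is a minimal illustration with hypothetical values; it assumes that the XRF values are plotted on the x-axis and the lab values on the y-axis, so that a slope below 1 corresponds to XRF values that are higher than the lab values.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the
    squared Pearson correlation coefficient.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical concentrations (wt.%) for one element, illustration only.
xrf_values = [10.0, 20.0, 30.0, 40.0]   # x-axis (assumed)
lab_values = [8.5, 15.5, 24.5, 31.5]    # y-axis (assumed)

slope, intercept, r_squared = linear_fit(xrf_values, lab_values)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R2 = {r_squared:.4f}")
```

A high R² with a slope different from 1 is the pattern described above: the relative values agree (good precision) while the absolute values are offset (limited accuracy).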

The distributions of the XRF data, visualized in histograms (Figure 7), are asymmetric and thus do not conform to a normal distribution. Aluminium (Figure 7a) displays a peak in the first bin, which indicates poor quantification of Al. The histograms also reveal the bimodal distribution of some of the data, suggesting two distinct groups. This bimodal distribution is most clearly found in Si and Ca (Figure 7b and 7d). Figure 7c, showing K, indicates a great variation in values. For the purpose of classification, it is preferable to have peaks that are distinctive, allowing for the separation of groups.

Scatter plots have the benefit of comparing two elements simultaneously and allowing for identifying groups in two-dimensional space. Figure 8a to 8d show such scatter plots including different elements. Depending on the plotted elements, different groups are more or less distinct. For example, Figure 8a demonstrates a case where it is difficult to outline a distinctive group, whereas Figure 8c allows for the distinction of two or three groups.

Figure 7. Histograms describing the data distribution of the six main elements used in this study for rock classification.

Figure 8. Scatter plots describing the lithogeochemistry of the cores. Plotting different elements against each other allows for the visualisation of different geochemical groups.

Virtual Core Logging (VCL)

Virtual Core Logging (VCL) represents the off-site equivalent to the more conventional on-site core logging. The VCL method bears many similarities to conventional methods as a human observer records visual geological features such as mineralogy, colour, texture and grain size. However, as a complement to the visual logging, the VCL method also allows for consideration of XRF geochemical data during logging. For this contribution, the core and XRF data visualisation web service Minalogger was used in tandem with the geochemical analysis software ioGAS by IMDEX Ltd.

The virtual logging was done by viewing core trays as voxel models (Figure 9). In addition, the core could be viewed as high-resolution images at 10 px/mm (Figure 10). Due to the virtual environment it was not possible to assess magnetism, hardness or reaction with hydrochloric acid. The XRF data were represented as bars on this virtual core (Figure 9), allowing for visual assessment of XRF concentrations in relation to other visual features.

Figure 9. Voxel model of core boxes with XRF concentrations projected as red bars on top. Done in Minalogger by Minalyze AB.

To enable a classification by VCL with XRF data to support decision making, a literature study was done to identify the lithogeochemical patterns that best reflect the different lithologies found in the Zinkgruvan stratigraphy. Jansson et al. [2017, 2018] and Hedström et al. [1989] contribute comprehensive geochemical descriptions of the Zinkgruvan stratigraphy, highlighting which elemental signatures may be expected from the different rock types. During a visit to Zinkgruvan, additional information about the lithologies and the Zinkgruvan logging scheme was obtained.

Figure 10. Optical photograph of a core box from DBH 4442 demonstrating the challenge of heterogeneity. In the case of this core photo, neosome rafts are shown occurring in the pelitic rocks.

More advanced geochemical analysis was performed following the visual logging. The XRF lithogeochemistry data were analysed with the objective of defining a set of geochemical fingerprints for the different lithological units with the help of histograms and scatter plots (Figure 7a–f and Figure 8a–d). Useful fingerprint elements for VCL were Al, Si, S, K, Ca, Ti, Fe, Cu, Zn and Zr. Additionally, the geochemical ratios of Al:Ti and Zr:Ti were calculated and used for classification, since these exhibit less variation in the microcline-quartz rock than in the metatuffite and hence help to separate the two [Jansson et al., 2017]. Of these fingerprints, a selection was chosen for machine learning: Al, Si, K, Ca, Ti and Fe.
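The two ratios can be computed per scan interval as sketched below. The sketch assumes that Al is reported in wt.% and Ti and Zr in ppm, and that Al is converted to ppm before division so that both ratios are unitless; this unit handling is an assumption, and the function name and input values are hypothetical.

```python
PPM_PER_WT_PCT = 10_000.0  # 1 wt.% = 10,000 ppm


def fingerprint_ratios(al_wt_pct, ti_ppm, zr_ppm):
    """Return the Al:Ti and Zr:Ti ratios for one scan interval.

    Al is converted from wt.% to ppm first (assumed unit handling).
    Returns None when Ti is not a positive value, since the ratios
    would be undefined.
    """
    if ti_ppm is None or ti_ppm <= 0:
        return None
    al_ppm = al_wt_pct * PPM_PER_WT_PCT
    return {"Al:Ti": al_ppm / ti_ppm, "Zr:Ti": zr_ppm / ti_ppm}


# Hypothetical interval, for illustration only.
interval_ratios = fingerprint_ratios(al_wt_pct=6.0, ti_ppm=2000.0, zr_ppm=200.0)
print(interval_ratios)
```

Intervals where Ti is at or below zero are skipped rather than assigned an arbitrary ratio, so that they do not create artificial outliers in the classification plots.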

Once both the visual and geochemical classifications were completed, the two were combined to produce VCL-1. In general, the visual log set lithological boundaries whereas the geochemical log assisted in classification of lithologies. At a later stage of the study, a second round of virtual core logging was conducted (VCL-2). VCL-2 was more informed than VCL-1, since other methods had revealed more information prior to this second virtual core log. Having two VCLs allowed for a comparison of using VCL-1 or VCL-2 as learning data in supervised learning algorithms. Furthermore, a comparison of VCL-1 and VCL-2 allows for a discussion on human subjectivity in core logging.

Data pre-processing

The raw XRF core scanning data have two intrinsic properties that must be addressed prior to the data analysis by machine learning: 1) the data contain values below the limit of detection (LOD) of the XRF core scanner; and 2) the data belong to the simplex space, i.e. the data are compositional. These two properties are commonly not an issue when conducting a simple exploratory data analysis as done for the VCL stage. However, they do pose problems when applying more advanced statistical analysis tools, such as the ones used in this study. It was therefore necessary to pre-process the XRF data.

This study investigated the differences in the results using varying data resolutions, i.e. in the length of the XRF scan intervals. In this study a 1-metre resolution dataset and a 10-centimetre resolution dataset were considered. A consequence of this was that the two datasets behaved differently in pre-processing, which in extension meant the incorporation of different elements in the two datasets when applying the same pre-processing rules.

The average detection limits for each element are presented in Table 1 and Table 2. To mitigate the issue of values below the detection limit, such data were either omitted or replaced. Elements for which more than 30 % of the values were below the detection limit were not used for advanced statistical analysis, as recommended by Martín-Fernández et al. [2012]. With 30 % as an omission criterion, the elements P, S, Cl, Cr and As were filtered out for the 1-metre resolution dataset (Table 1); and P, S, Cl, V, Cr, As, Y and Nb were filtered out for the 10-centimetre dataset (Table 2). Elements with less than 30 % of values below the detection limit were subjected to a replacement rule: values below the detection limit were replaced by a value corresponding to half the detection limit, as suggested by Hood et al. [2018].
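The omission and replacement rules above can be sketched as follows. The data layout (one list of values per element, with `None` or a numeric value below the limit marking a below-detection reading), the function name `preprocess_lod` and the example numbers are assumptions made for illustration. The treatment of an element with exactly 30 % of values below the limit is not specified in the text; here such an element is kept.

```python
def preprocess_lod(data, detection_limits, max_fraction_below=0.30):
    """Apply the 30 % omission rule and half-LOD replacement.

    data: dict mapping element -> list of values (None = below LOD).
    detection_limits: dict mapping element -> detection limit.
    Returns (kept, dropped): kept holds the imputed columns, dropped
    lists the elements that exceeded the omission threshold.
    """
    kept, dropped = {}, []
    for element, values in data.items():
        limit = detection_limits[element]
        below = [v is None or v < limit for v in values]
        fraction_below = sum(below) / len(values)
        if fraction_below > max_fraction_below:
            dropped.append(element)  # too many values below the LOD
            continue
        # Replace below-LOD values by half the detection limit.
        kept[element] = [limit / 2.0 if is_below else v
                         for v, is_below in zip(values, below)]
    return kept, dropped


# Hypothetical measurements, for illustration only.
data = {
    "Al": [5.2, None, 6.1, 4.8, 7.0],          # 20 % below LOD -> kept
    "P": [None, None, None, 300.0, None],      # 80 % below LOD -> dropped
}
detection_limits = {"Al": 1.1, "P": 124.0}

kept, dropped = preprocess_lod(data, detection_limits)
print("kept:", kept)
print("dropped:", dropped)
```

Because the rule is applied per dataset, the same threshold can retain an element at one resolution and omit it at another, as seen for V, Y and Nb in Table 1 and Table 2.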


Table 1. Data validation of 1-metre resolution XRF data. The criterion for an element's input data to pass the data validation was that less than 30 % of the data were below the limit of detection (<LOD) of the XRF detector.

Element     <LOD count   <LOD % of dataset   Average detection limit   Data validation
Si (wt.%)            0                0.00                    0.2755   Pass
K (wt.%)             0                0.00                    0.0041   Pass
Ca (wt.%)            0                0.00                    0.0022   Pass
Ti (ppm)             0                0.00                   17.0168   Pass
Mn (ppm)             0                0.00                    7.6498   Pass
Fe (wt.%)            0                0.00                    0.0007   Pass
Ni (ppm)             0                0.00                    4.3617   Pass
Cu (wt.%)            0                0.00                    0.0006   Pass
Zn (wt.%)            0                0.00                    0.0007   Pass
Sr (ppm)             0                0.00                    5.0839   Pass
Pb (wt.%)            0                0.00                    0.0033   Pass
Rb (ppm)             1                0.08                    5.2115   Pass
Zr (ppm)            12                1.00                   10.2912   Pass
Y (ppm)             81                6.73                    9.0792   Pass
Ba (ppm)            96                7.97                   23.4059   Pass
Al (wt.%)          104                8.64                    1.1021   Pass
V (ppm)            204               16.94                    7.6894   Pass
Nb (ppm)           320               26.58                    6.2024   Pass
S (wt.%)           393               32.64                    0.0845   Fail
Cr (ppm)           790               65.61                    6.0711   Fail
Cl (ppm)           844               70.10                   42.2893   Fail
As (ppm)          1137               94.44                    0.4970   Fail
P (ppm)           1197               99.42                  124.1513   Fail


Table 2. Data validation of 10-centimetre resolution data. The criterion for an element's input data to pass the data validation was that less than 30 % of the data were below the limit of detection (<LOD) of the XRF detector.

Element     <LOD count     <LOD % of total   Average detection limit   Data validation
K (wt.%)             0                0.00                    0.0130   Pass
Ca (wt.%)            0                0.00                    0.0070   Pass
Fe (wt.%)            0                0.00                    0.0022   Pass
Zn (wt.%)            0                0.00                    0.0024   Pass
Ni (ppm)             2                0.02                   14.0766   Pass
Si (wt.%)            8                0.07                    0.8811   Pass
Mn (ppm)            26                0.22                   24.4770   Pass
Ti (ppm)           143                1.19                   54.5454   Pass
Sr (ppm)           412                3.43                   16.5626   Pass
Rb (ppm)           539                4.49                   16.9895   Pass
Pb (wt.%)          930                7.75                    0.0109   Pass
Cu (wt.%)          953                7.94                    0.0019   Pass
Zr (ppm)          1126                9.38                   33.3733   Pass
Ba (ppm)          1859               15.49                   74.1581   Pass
Al (wt.%)         2750               22.92                    3.4657   Pass
V (ppm)           4615               38.46                   25.5209   Fail
Y (ppm)           5356               44.64                   28.5236   Fail
S (wt.%)          8132               67.77                    0.2076   Fail
Cr (ppm)          9354               77.96                   21.5028   Fail
Cl (ppm)          9893               82.45                  144.4056   Fail
Nb (ppm)         10130               84.42                   20.2232   Fail
As (ppm)         11452               95.44                    2.1779   Fail
P (ppm)          11967               99.73                  566.6043   Fail

The second issue, that the data belong to the simplex space, was addressed by a log-ratio transformation. The simplex is a mathematical space in which each data point represents a part of a whole, hence all the values are relative to each other [Aitchison, 1982]. As geochemical concentrations are expressed in weight percentages (wt.%) or parts per million (ppm), the data are confined to the simplex. Using advanced statistical analysis tools on data in the simplex could lead to spurious correlations giving no real insight into the data [Hood et al., 2018]. To deal with this issue the data must be transformed into Euclidean space, which can be done by a handful of methods including the additive log-ratio (ALR), centered log-ratio (CLR) and isometric log-ratio (ILR) transforms [Egozcue et al., 2003; Martín-Fernández et al., 2012]. Since the investigation of the suitability of these transforms is beyond the scope of this study, it was decided to use the CLR, since this method is built into the ioGAS software. This method was also chosen for a similar study [Hood et al., 2018]. Applied to the dataset of this study, the CLR transformation produces distributions that better resemble normally distributed data (Figure 11) by normalising the data to their geometric means [Egozcue et al., 2003]. The geometric mean g(x) of the data vector x is found by equation (1).

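$$ g(\mathbf{x}) = \left( x_1 x_2 \cdots x_D \right)^{1/D} \tag{1} $$

where $x_1, \dots, x_D$ are the parts of vector $\mathbf{x}$ in the simplex space and $D$ is the number of dimensions of said vector, i.e. the number of elements analysed.

$$ \mathrm{CLR}(\mathbf{x}) = \left[ \ln\frac{x_1}{g(\mathbf{x})}, \ln\frac{x_2}{g(\mathbf{x})}, \dots, \ln\frac{x_D}{g(\mathbf{x})} \right] = \ln\left( \frac{\mathbf{x}}{g(\mathbf{x})} \right) \tag{2} $$

The CLR transform is done by dividing each part, $x_i$, of the vector $\mathbf{x}$ by the geometric mean, $g(\mathbf{x})$, and taking the natural logarithm of this ratio, as described by equation (2).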
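As an illustration of equations (1) and (2), the following minimal Python sketch applies the CLR transform to a single composition. It is a generic implementation and not the ioGAS routine used in this study; the function name clr and the example values are assumptions made for illustration. The geometric mean is computed in log form, which is mathematically equivalent to equation (1).

```python
import math


def clr(parts):
    """Centered log-ratio transform of one strictly positive composition."""
    if any(p <= 0 for p in parts):
        # Values below the detection limit must be replaced beforehand.
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    log_gmean = sum(logs) / len(logs)        # ln g(x), equation (1)
    return [v - log_gmean for v in logs]     # ln(x_i / g(x)), equation (2)


# Hypothetical three-part composition.
transformed = clr([1.0, 2.0, 4.0])
```

Two properties follow directly from the definition: the transformed values of one sample sum to zero, and multiplying all parts by the same constant leaves the result unchanged.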


Figure 11. Comparison of the Ca wt.% distribution to the transformed Ca-CLR distribution. The CLR transformation from simplex space to Euclidean space produces a distribution better resembling a normal distribution. Notice that the bins in the Ca-CLR plot do not reflect geochemical concentrations but rather the transformed concentrations in Euclidean space.


K-means clustering applied to the Self Organising Map

In this study, unsupervised machine learning is represented by K-means clustering applied to the Self Organising Map (SOM). Unsupervised learning is used in cases with no domain knowledge, i.e. if the number of lithologies and their names are unknown. In reality this study had domain knowledge, but this knowledge was ignored when testing the unsupervised method.

Of the 23 elements from the core scanning, six elements that had passed the data validation were selected. These were chosen to capture the geochemical character of the major lithologies, following the approach by Hood et al. [2018]. These elements were Al, Ca, Fe, Si, K and Ti. The SOM method was executed on all four drill cores.

One of the challenges inherent to classifying geochemical samples is the high (>3) dimensionality of the data. This is caused by each data point representing numerous elements. Due to the high dimensionality it is difficult to graphically represent all values of a single data point. A method to tackle this issue is the SOM, which takes an input of all data points in n dimensions and projects them into a lower dimensional space whilst also preserving proximity relationships between points [Duda et al., 2001]. In this study six dimensions (corresponding to six elements) were used as an input for the SOM, the output being a two-dimensional unified distance matrix (U-matrix). As the spatial proximity of points is preserved, similar samples are mapped in neighbourhoods, which allows for an explicit clustering and classification [Kohonen, 2013].

When executing the SOM algorithm, parameters for matrix size, number of iterations, initial neighbour radius and learning rates need to be set. Matrix size determines the number of rows and columns of the SOM output, effectively altering the resolution of the map. The number of iterations determines how many times the data are processed through the SOM algorithm. For each iteration, the data are presented in a different and random sequence. Initial neighbour radius dictates up to which distance best matches between data points are identified. It is recommended to set this parameter equal to half of the SOM matrix size [ioGAS user manual, 2019]. This study executed 89 SOM runs (see Appendix I) altering the following parameters: matrix size, initial neighbour radius, K-value (determined by plotting K-value against variation) and XRF data interval resolution (1-metre or 10-centimetre intervals).
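To make the role of these parameters concrete, a minimal pure-Python SOM is sketched below. It is a generic textbook formulation and not the ioGAS implementation used in this study. The linear decay of the radius and learning rate, the Gaussian neighbourhood function, the initialisation from random samples and all function names are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d2 = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((a - b) ** 2 for a, b in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows, cols, n_iterations, initial_radius,
              initial_rate=0.5, seed=0):
    """Train a rows x cols SOM on data (a list of equal-length vectors)."""
    rng = random.Random(seed)
    # Assumption: nodes are initialised from randomly drawn samples.
    weights = [[list(rng.choice(data)) for _ in range(cols)]
               for _ in range(rows)]
    for it in range(n_iterations):
        decay = 1.0 - it / n_iterations
        radius = max(initial_radius * decay, 0.5)  # shrinking neighbourhood
        rate = initial_rate * decay                # decaying learning rate
        order = list(data)
        rng.shuffle(order)                         # new random sequence per iteration
        for x in order:
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    if grid_d2 > radius ** 2:
                        continue                   # node lies outside the neighbourhood
                    h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for k in range(len(w)):
                        w[k] += rate * h * (x[k] - w[k])
    return weights


# Hypothetical CLR-transformed samples with two features, for brevity.
samples = [[0.1, 0.2], [0.0, 0.3], [2.9, 3.1], [3.0, 2.8]]
# The initial radius is set to half the matrix size, as recommended above.
som = train_som(samples, rows=4, cols=4, n_iterations=20, initial_radius=2)
node_of_first_sample = best_matching_unit(som, samples[0])
```

Each sample is finally represented by the grid position of its best matching node, which is what allows six-dimensional data to be inspected on a two-dimensional map.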

Once the SOM had been finalized, the K-means clustering algorithm was applied, the principle of which is to find cluster centres by iteratively moving randomly chosen cluster centres toward actual cluster centres. The final positions of the cluster centres in the two-dimensional representation of the data define the centres of Voronoi cells. All data points inside the same Voronoi cell are grouped as belonging to the same cluster [Duda et al., 2001].

K-means clustering has a few parameters that should be set. These include the number of attempts, the maximum number of clusters and, most importantly, the K-value itself, which determines the number of clusters that the K-means clustering yields. In the case of lithological classification, it would be preferable if the K-value was chosen to be identical to the number of lithologies identified in the core. However, as domain knowledge is ignored, the K-value was selected by plotting the change in variation of values in clusters against the number of clusters. The variation in values is calculated as sum of squares (SS) or delta values, i.e. the distance between data points. The K-value is set to the inflection point beyond which adding a new cluster changes the variation only marginally. Figure 12 and Figure 13 show how these K-values were selected for runs 82 and 88, respectively.
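The clustering step and the variation-versus-K curve can be sketched as follows. This is a generic K-means (Lloyd's algorithm) in pure Python and not the ioGAS implementation; the function names, the restart strategy and the toy points are assumptions made for illustration. In this study the points to be clustered would be the SOM nodes rather than the toy coordinates used here.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centres, p):
    return min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))


def kmeans(points, k, n_attempts=5, max_iter=100, seed=0):
    """Return (labels, centres, ss) for the best of n_attempts random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_attempts):
        # Randomly chosen starting centres, moved iteratively toward cluster means.
        centres = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            new_labels = [nearest(centres, p) for p in points]
            if new_labels == labels:
                break                      # assignments stable: converged
            labels = new_labels
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:                # keep the old centre if a cluster is empty
                    centres[j] = [sum(col) / len(members) for col in zip(*members)]
        ss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if best is None or ss < best[2]:
            best = (labels, centres, ss)
    return best


def ss_by_k(points, max_k, **kwargs):
    """Within-cluster sum of squares for K = 1..max_k, for an inflection plot."""
    return {k: kmeans(points, k, **kwargs)[2] for k in range(1, max_k + 1)}


# Two hypothetical, well separated groups of points.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
curve = ss_by_k(toy, max_k=3)
```

Plotting the values of curve against K gives the type of diagram shown in Figure 12 and Figure 13. For the toy data the sum of squares drops sharply from K = 1 to K = 2 and only slightly thereafter, which places the inflection point at K = 2.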


Figure 12. Selection of the K-value for run 82. The K-value was selected by finding the inflection point of the sum of squares (SS) or delta value from the cluster centre, plotted on the Y-axis against the K-values on the X-axis. SS and delta values are two methods of calculating the distance between data points. At this inflection point, K = 7.

Figure 13. Selection of the K-value for run 88. The K-value was selected by finding the inflection point of the sum of squares (SS) or delta value from the cluster centre, plotted on the Y-axis against the K-values on the X-axis. SS and delta values are two methods of calculating the distance between data points. At this inflection point, K = 6.


From the 89 SOM runs (see Appendix I, Table A1) three were selected to represent SOM in comparison against the other methods: 1) run 82 with 1-metre data resolution, 2) run 88 with 10-centimetre data resolution and 3) run 89, also with 10-centimetre data resolution but with Al included in the analysed data (Table 3). These runs were then used to create core logs as follows: run 82 produced SOM-1, run 88 produced SOM-2 and run 89 produced SOM-2b. Aluminium was not included in SOM-2 because of the poor quantification of that element. Instead, SOM-2b included Al and was created with the same parameters as SOM-2. The addition of Al also resulted in a change in K-value, with the variation analysis indicating the inflection point at 7, which was selected for SOM-2b. For a full table with all K-means and SOM parameters see Table A1 (Appendix I).

Table 3. Some of the parameters used for the three SOM runs chosen to represent the K-means clustering on SOM method.

Log      Run   Data resolution   Elements                K-value
SOM-1    82    1 m               Al, Si, K, Ca, Ti, Fe   7
SOM-2    88    10 cm             Si, K, Ca, Ti, Fe       6
SOM-2b   89    10 cm             Al, Si, K, Ca, Ti, Fe   7

It needs to be kept in mind that K-means clustering applied to the SOM is solely a clustering algorithm, i.e. it does not label the clusters. To enable simpler comparison to other methods in this study, assignment of lithologies to clusters was needed. This lithology assignment was done by comparing K-means on SOM core logs to VCL-2 core logs. Through this comparison, the mode lithology from VCL-2 for each cluster was defined as the class assignment for that cluster. A correlation matrix between clusters and lithologies found in VCL-2 was also produced to further nuance the lithology assignment to clusters. VCL-2 was chosen for this lithology assignment since VCL-2 was completed at a more mature stage of the project and was therefore thought to be a more accurate core log.
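The mode-based lithology assignment can be sketched in a few lines of Python. The sketch assumes that the cluster log and the reference log (VCL-2 in this study) are given for the same intervals; the function name and the example labels are hypothetical. The percentages are computed per cluster, which is consistent with the rows of Table 6 each summing to 100 %.

```python
from collections import Counter, defaultdict


def assign_lithologies(cluster_ids, reference_lithologies):
    """Return (mode lithology per cluster, percentage table per cluster)."""
    counts = defaultdict(Counter)
    for cluster, lithology in zip(cluster_ids, reference_lithologies):
        counts[cluster][lithology] += 1
    # Class assignment: the most frequent reference lithology in each cluster.
    mode_label = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    # Share of each reference lithology within each cluster, in percent.
    percentages = {
        c: {lith: 100.0 * n / sum(cnt.values()) for lith, n in cnt.items()}
        for c, cnt in counts.items()
    }
    return mode_label, percentages


# Hypothetical cluster log and reference log for five intervals.
clusters = [1, 1, 1, 2, 2]
reference = ["Marble", "Marble", "Skarn", "Pelite", "Pelite"]
labels, table = assign_lithologies(clusters, reference)
```

In this toy case cluster 1 is assigned to marble although one third of its intervals are skarn, which illustrates how less common lithologies can disappear from the labelled log.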

Classification and Regression Trees (CART)

The second machine learning method tested in this study was Classification and Regression Trees (CART), which is a supervised learning technique. The fundamental difference to unsupervised techniques such as K-means on SOM is that supervised learning requires domain knowledge about the number and names of lithologies. This domain knowledge is used to train a classifier on a portion of the data. The classifier in this case is a decision tree that consists of a series of nodes submitting the data to binary “true-or-false” questions. At the nodes the dataset is split based on the values of each data point. The splitting is dictated by a mathematical split rule that calculates the purity of the two resulting groups after the split [Duda et al., 2001], opting for the highest purity. In the case of lithogeochemistry data, the sample's concentration of a certain element is compared to a threshold, and the outcome (above or below threshold) determines which branch is followed to the next node. An unlabelled data point passes through this sequence of questions until finally arriving at a leaf node, at which the data point is assigned a label. The labels are set by the human operator beforehand in the learning data, in this study set to reflect the lithologies identified in the VCL stage. The goal is to achieve the greatest similarity between data points arriving at the same leaf node.
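The split rule at a single node can be illustrated with the following Python sketch, which searches for the threshold on one element that gives the purest two groups. The Gini impurity is used here as the purity measure; this is an assumption, since the split rule used by the software in this study is not stated. The function names and the example concentrations are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group of labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                       # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical Ca concentrations (wt.%) for six labelled intervals.
ca = [1.0, 2.0, 3.0, 20.0, 25.0, 30.0]
lith = ["Metatuffite"] * 3 + ["Marble"] * 3
threshold, impurity = best_threshold(ca, lith)
```

A full decision tree repeats this search over all elements at every node and recurses into the two resulting groups until a stopping criterion is met.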

For the building of a decision tree, a portion of the labelled data has to be sacrificed for learning. Which data and how much data are used for the building of decision trees will influence the results. The labelled XRF core scan data are divided into training, validation and test data (Figure 14). Through training and validation accuracies (presented in Appendix II) we are able to assess a decision tree before letting it attempt classification of the test data. The partition in percent of data into training and validation data is set before tree building begins and is presented in Appendix II. This study attempts to assess the performance of classifiers. Therefore, to find the best classifier to represent the CART method, several runs were conducted with different parameters to answer two key questions: 1) what is the ideal size and partition of the labelled XRF core scan data, and 2) what is the impact of selecting different drill cores as test data and as training and validation data? To this end, different configurations of learning data (training and validation data) and test data from the four drill cores were tested. This was done in an iterative manner by running first one core, followed by two, and ultimately three, as learning data (green in Figure 15).

Figure 14. The division of data into validation, training and test data

A total of 329 runs were completed (see Appendix II, Table A2) testing different parameters for CART. The first 49 runs were conducted with the aim to understand CART and what configuration of cores as learning data and test data was preferable. The CART tests were done in sets of ten, in order to facilitate averaging of the training and validation accuracies for a statistically reliable result. For each CART test, seven of these sets of ten were done, each time changing which drill cores were used as learning data (Figure 15). The resulting logs were CART-1, CART-3, CART-4, CART-5, CART-3b and CART-4b. CART-2 is excluded from the results due to a critical error in data pre-processing.

For runs 50 to 329, a data partition of 70 % of the learning data as training data and 30 % as validation data was used. CART-1 and CART-5 used the same elements as SOM for comparability, i.e. Al, Si, K, Ca, Ti and Fe. However, for CART-3 and CART-4 Al was excluded. CART-3 and CART-4 were then rerun with Al included, producing CART-3b and CART-4b. The reruns were done to test the influence of Al in CART.
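The division of the labelled data can be sketched as two steps: holding out one drill core as test data, and partitioning the remaining learning data 70/30 into training and validation data. The sketch below assumes that each labelled interval is a record carrying the name of its drill core and that the partition is drawn at random; neither the record layout nor the random draw is specified in this study, and the function names are hypothetical.

```python
import random


def split_by_core(rows, test_core):
    """Separate the intervals of the test core from the learning data."""
    learning = [r for r in rows if r["core"] != test_core]
    test = [r for r in rows if r["core"] == test_core]
    return learning, test


def partition(learning, train_fraction=0.70, seed=0):
    """Randomly divide learning data into training and validation data."""
    rng = random.Random(seed)
    shuffled = list(learning)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical records: five intervals from one core, ten from another.
rows = [{"core": core, "id": i}
        for i, core in enumerate(["DBH 4442"] * 5 + ["DBH 4400"] * 10)]
learning, test = split_by_core(rows, test_core="DBH 4442")
training, validation = partition(learning)
```

Repeating the partition with different seeds gives a set of runs whose training and validation accuracies can be averaged, in the spirit of the sets of ten described above.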


Figure 15. The setup of learning and test data for the CART runs done in this contribution. The figure describes which cores were used as test data to be classified (yellow) or as training and validation data (green) over the 329 runs.

Table 4 presents the different parameters used in CART for this study. CART-1 (runs 50-89) utilised 1-metre data resolution with VCL-1 as ground truth. Since a VCL-2 was later produced, it was decided that new runs should be performed with the more informed log as ground truth. The “hybrid” data resolution in CART-4 (runs 170-209) was created by averaging the 10-centimetre data over 1-metre intervals and subsequently cutting the intervals to fit the lithological boundaries of VCL-2, such that no XRF interval crossed a lithological boundary. For a full table with all parameters and the training and test accuracies see Table A2 (Appendix II).
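One possible reading of the hybrid resolution is sketched below in Python: the 10-centimetre scans are averaged within 1-metre intervals, and an interval is cut wherever a lithological boundary of the reference log falls inside it. This is an interpretation for illustration only. The exact procedure, the use of an arithmetic mean, depths expressed as whole centimetres and the function name are all assumptions.

```python
def hybrid_average(samples, boundaries_cm, interval_cm=100, scan_cm=10):
    """Average 10 cm scans over 1 m intervals cut at lithological boundaries.

    samples:       list of (top_depth_cm, value), sorted by depth.
    boundaries_cm: depths in cm of lithological boundaries in the reference log.
    Returns a list of (from_cm, to_cm, mean_value).
    """
    start = samples[0][0]
    end = samples[-1][0] + scan_cm
    edges = set(range(start, end, interval_cm))   # regular 1 m interval tops
    edges.add(end)
    # Extra cuts so that no interval crosses a lithological boundary.
    edges.update(b for b in boundaries_cm if start < b < end)
    edges = sorted(edges)
    result = []
    for lo, hi in zip(edges, edges[1:]):
        values = [v for top, v in samples if lo <= top < hi]
        if values:
            result.append((lo, hi, sum(values) / len(values)))
    return result


# Hypothetical element values over 2 m, with a boundary at 1.5 m depth.
scans = [(d, 1.0 if d < 150 else 3.0) for d in range(0, 200, 10)]
intervals = hybrid_average(scans, boundaries_cm=[150])
```

For the toy data the second metre is split at the boundary, so that the two lithologies are not averaged together.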

Table 4. Parameters used for the different CART experiments. “Hybrid” data resolution uses 10-centimetre data averaged over 1-metre intervals and cut to fit the lithological boundaries of VCL-2.

Log       Runs      Ground truth   Data resolution   Elements
CART-1    50-89     VCL-1          1 m               Al, Si, K, Ca, Ti, Fe
CART-3    130-169   VCL-2          10 cm             Si, K, Ca, Ti, Fe
CART-4    170-209   VCL-2          Hybrid            Si, K, Ca, Ti, Fe
CART-5    210-249   VCL-2          1 m               Al, Si, K, Ca, Ti, Fe
CART-3b   250-289   VCL-2          10 cm             Al, Si, K, Ca, Ti, Fe
CART-4b   290-329   VCL-2          Hybrid            Al, Si, K, Ca, Ti, Fe

Conventional Core Logging

The conventional logging was completed on DBH 4442 and the first 200 metres of DBH 4400, ensuring that all major lithologies found in the Zinkgruvan stratigraphy were logged.

The conventional log was completed by filling out a logging template and using a hand lens, tungsten carbide scribe, neodymium magnet and hydrochloric acid as tools for the investigation of geological parameters.

The purpose of the conventional method of logging was to gain further understanding of how well the three novel core logging methods perform in general and not only between themselves. In this way, the conventional method operates as a validation method for the three novel methods, i.e. as a best estimate of ground truth.

Domain Comparison

To determine the success of the classification by each of the methods, it was decided to compare the machine learning logs to VCL-2, since this method was completed on all cores. Ideally, all methods would be compared directly to the conventional log; however, the conventional log was not completed for all cores. The domain comparison between methods was done by the calculation of agreement scores. Agreement scores were calculated as the percentage of intervals between two compared core logs that are assigned the same lithology.
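The agreement score can be written as a short Python function. The sketch assumes that the two logs have been resampled to the same intervals, so that they can be compared position by position; the function names and example logs are hypothetical. A second helper pools the intervals of several cores before computing the score, which corresponds to treating the cores as one single continuous core rather than averaging per-core scores.

```python
def agreement_score(log_a, log_b):
    """Percentage of intervals assigned the same lithology in both logs."""
    if len(log_a) != len(log_b):
        raise ValueError("logs must cover the same intervals")
    matches = sum(1 for a, b in zip(log_a, log_b) if a == b)
    return 100.0 * matches / len(log_a)


def overall_agreement(pairs_of_logs):
    """Agreement over several cores, pooled as one continuous core."""
    pooled_a = [lith for log_a, _ in pairs_of_logs for lith in log_a]
    pooled_b = [lith for _, log_b in pairs_of_logs for lith in log_b]
    return agreement_score(pooled_a, pooled_b)


# Hypothetical logs for two cores of different length.
core_1 = (["Marble", "Skarn", "Pelite", "Pelite"],
          ["Marble", "Skarn", "Skarn", "Pelite"])
core_2 = (["Metatuffite"], ["KSL"])
per_core = [agreement_score(a, b) for a, b in (core_1, core_2)]
overall = overall_agreement([core_1, core_2])
```

In this toy case the per-core scores are 75 % and 0 %, whose mean is 37.5 %, whereas the pooled score is 60 %, since the longer core contributes more intervals.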

Results

Classification by Virtual Core Logging

Figure 16a to 16d show the scatter plots of the XRF geochemical data with the assigned classification from VCL-2.

The pelitic rocks of the Viksjö formation (Figure 17) were most easily identified by their characteristic feldspar porphyroblasts during the visual stage of VCL. Pelite apparently forms its own group at around 3 wt.% K. Pelite can also with some ease be separated by plotting Ti vs. Si (Figure 16d), suggesting that the pelite exhibits the highest Ti values. The transition into GBK was visually found by the presence of garnets, which were targeted geochemically through elevated Fe concentrations.

The metatuffites (Figure 18) were difficult to classify due to their non-unique geochemical signature. No other elements studied in the XRF data characterise the metatuffites distinctly. The metatuffites vary also in optical character, increasing the difficulty of accurate classification.


Figure 16. Scatter plots describing the XRF lithogeochemistry of the cores colour coded with assigned VCL-2 lithologies, based on the lab analysis.

Figure 17. Optical photograph of core showing the pelitic rocks of the Viksjö formation.


Figure 18. Optical photograph of core showing an example of the metatuffites.

Skarns in the stratigraphy were characterized by elevated Ca values. However, the distinction between diopside skarns and KSL was mostly based on visual assessment, with the diopside skarns (Figure 19) exhibiting a green tint. KSL (Figure 20), on the other hand, is visually distinct by apparent banding caused by varying mineralogical composition of ferroan diopside, grossular, wollastonite, calcite, biotite and vesuvianite, and its interbedding with metatuffite [Jansson et al. 2017].

Figure 19. Optical photograph of core of diopside skarn interbedded with metatuffite.


Figure 20. Optical photograph of core of finely laminated KSL.

The sphalerite-rich mineralized zone (Figure 21) was visually distinct, exhibiting a metallic brown colour, and showed elevated Zn concentrations compared to all other rock types.

The dolomitic marble also belonged to the visually recognisable rocks by its dark spots of serpentine (Figure 22), occurring as a product of retrograde metamorphism of olivine [Jansson et al. 2017]. Marble is a geochemically distinctive group, easily recognizable by its low Si values and high Ca values (Figure 16).

Figure 21. Optical photograph of core showing the sphalerite mineralisation.


Figure 22. Optical photograph of core showing the dolomitic marble with its dark spots.

Microcline-quartz rock was easily identified visually (Figure 23) by its quartz phenocrysts. In the K vs. Si scatter plot (Figure 16c), the microcline-quartz rock separates out distinctively at 7 to 8 wt.% K.

Figure 23. Optical photograph of core showing the microcline-quartz rock.

Classification by K-means Clustering on the SOM

The resulting count matrix, unified distance matrix (U-matrix) and classified SOM for SOM-1 are presented in Figure 24. The dark patches in the classified SOM represent unclassified nodes. The U-matrix for SOM-2 (Figure 25) appears to be similar to that of SOM-1, with many nodes plotting towards the top right corner. The classified SOM for SOM-2 is also similar to the one for SOM-1, albeit with the clear difference that SOM-1 has seven clusters, the seventh lying in the centre of the map (Figure 24), as opposed to the six in SOM-2. The U-matrix for SOM-2b (Figure 26) has a somewhat similar appearance to those of SOM-1 and SOM-2, but it is rotated upside down, so that many of the nodes plot to the bottom left. A difference worth pointing out is the lack of unclassified nodes in SOM-2b.


Figure 24. SOM plots for run 82 (SOM-1).

Figure 25. SOM plots for run 88 (SOM-2).

Figure 26. SOM plots for run 89 (SOM-2b).


Classification by CART

The decision tree for run 59 (Figure 27) was used to classify DBH 4442 in CART-1. The decision tree for run 219 (Figure 28) was used to produce a core log for DBH 4442 that is later used to represent CART-5 in comparison to other methods (Figure 29). Table 5 compares CART-1, CART-3, CART-4, CART-5, CART-3b and CART-4b to each other and other approaches of lithological core logging.

Comparison of Agreement Scores

The agreement scores for different methods compared to each other are presented in Table 5. Most are compared to VCL-2, since this method was completed for all cores and is also the best estimate of the ground truth apart from the conventional method. Of the machine learning methods, SOM-1 has the highest overall agreement score in comparison to VCL-2, 68.02 %. CART-5 achieves the second-best overall agreement score among the machine learning methods, 62.79 % (Table 5). The tests with CART-3b, CART-4b and SOM-2b were performed including Al in the dataset. From Table 5 we find that CART-3b, CART-4b and SOM-2b achieve lower agreement scores with VCL-2 than their counterparts that excluded Al.

Table 6 shows the correlations in percent between clusters and lithologies. A key observation is that the metatuffite is represented in almost all clusters. Marble shows the opposite behaviour: for each SOM it seems to be represented mainly by two clusters, one with a strong correlation (>80 %) between marble and its respective cluster, and another with a weaker correlation (ca. 50 to 60 %).

Comparison of VCL-2 to Machine Learning Logging

Figure 29 shows VCL-2 in comparison to machine learning guided core logging methods for each of the four drill cores. The two machine learning methods, CART-5 and SOM-1, were chosen for this comparison since they delivered the highest agreement score with VCL-2 for their respective methods (Table 5).

DBH 4442

For this core SOM-1 has a higher agreement with VCL-2 than CART-5 (Table 5). Locally in the first 50 metres pegmatites occur, representing neosome rafts (Figure 10), possibly explaining the heterogeneity in CART-5 and SOM-1. In the bottom half of the core diopside skarns are found in VCL-2, but only a few of them are found in CART-5. For some intervals the diopside skarns of VCL-2 seem to correlate with cluster 3 of SOM-1 (39.36 %, Table 6). The dolomitic marble from VCL-2 is clearly indicated by CART-5 and correlates to cluster 1 from SOM-1 (90.37 %, Table 6). CART-5 and SOM-1 suggest a greater variation of lithologies, with CART-5 indicating a mix of lithologies and SOM-1 a mix of clusters for this interval. Regardless of assigned lithology, all three methods agree that this interval is heterogeneous.


Figure 27. Decision tree from run 59, i.e. CART-1; for classification of DBH 4442 using the other three cores as learning data.


Figure 28. Decision tree from run 219, i.e. CART-5; for classification of DBH 4442 using the other three cores as learning data.


Table 5. Agreement scores in percent for all lithological core logging methods. Notice that the overall score is not an average over the four cores, but an individual calculation treating the four cores as one single continuous core.

Compared methods        DBH 4442  DBH 4400  DBH 4020   DBH 804   Overall
CART-1 vs VCL-2            57.85     54.05     76.51     42.94     52.24
CART-1 vs VCL-1            51.45     50.15     75.30     41.83     48.42
CART-3 vs VCL-2            67.52     29.79     37.60     72.32     57.95
CART-4 vs VCL-2            59.94     43.39     77.01     63.96     59.04
CART-3b vs VCL-2           42.69     25.49     31.47     62.61     45.38
CART-4b vs VCL-2           51.75     25.45     59.64     44.16     44.58
CART-5 vs VCL-2            51.45     64.86     71.08     67.87     62.79
VCL-1 vs VCL-2             63.95     77.48     73.49     69.81     70.76
Conventional vs VCL-2      80.81       N/A       N/A       N/A       N/A
Conventional vs VCL-1      64.83       N/A       N/A       N/A       N/A
SOM-1 vs VCL-2             65.70     78.08     72.89     58.73     68.02
SOM-2 vs VCL-2             52.80     47.87     69.66     56.20     54.58
SOM-2b vs VCL-2            58.08     24.53     59.04     62.34     51.33
SOM-1 vs Conventional      66.57       N/A       N/A       N/A       N/A
CART-5 vs Conventional     48.55       N/A       N/A       N/A       N/A

Table 6. Correlation in percent between clusters and lithologies for all SOMs. The colour coding of the cells is done in ascending order of values, red through yellow to green, across each row.

ID      Cluster    Diabase    GBK  Granite    KSL  Marble  Metamafite  Metatuffite  Microcline-quartz  Pegmatite  Pelite  Skarn  Sphalerite
SOM-1   Cluster 1     0.00   0.00     0.00   0.00   90.37        0.00         0.00               0.00       0.00    0.00   9.63        0.00
SOM-1   Cluster 2     0.00   0.37     2.23   0.74    0.00        0.74        21.19              64.68       2.97    2.23   4.83        0.00
SOM-1   Cluster 3     0.00   0.00     0.53   9.57    2.13       11.17        30.85               5.85       0.53    0.00  39.36        0.00
SOM-1   Cluster 4     0.48   6.67     0.00   0.00    0.00        0.95        18.10               0.00       1.43   70.95   1.43        0.00
SOM-1   Cluster 5     0.00   0.00     0.00  10.20    0.00        1.02        67.35               0.00       1.02    0.00  11.22        9.18
SOM-1   Cluster 6     0.00   0.00     0.00   6.56   59.84        0.82         3.28               0.00       0.00    0.00  29.51        0.00
SOM-1   Cluster 7     0.00   0.00     0.55   0.55    0.00        0.00        88.46               0.00       8.79    1.65   0.00        0.00

SOM-2   Cluster 1     0.00   0.70     0.00   6.84   49.58        0.51         6.84               0.19       0.00    0.26  34.38        0.70
SOM-2   Cluster 2     0.00   0.10     2.28   2.59    0.21        0.80        32.13              48.05       6.88    1.69   4.98        0.28
SOM-2   Cluster 3     0.00   0.00     0.00   0.50   84.83        0.00         0.86               0.00       0.14    0.00  13.66        0.00
SOM-2   Cluster 4     0.00   0.14     0.23   8.67    1.99        9.92        36.53               9.83       0.70    1.67  28.09        2.23
SOM-2   Cluster 5     0.00   0.73     0.26   0.57    0.00        0.26        75.46              11.70       2.18    6.71   1.40        0.73
SOM-2   Cluster 6     0.63   6.18     0.05   0.05    0.15        1.36        20.97               0.24       2.19   65.45   2.14        0.58

SOM-2b  Cluster 1     0.00   0.50     0.00   5.49   50.71        0.43         6.28               0.07       0.00    0.21  35.45        0.86
SOM-2b  Cluster 2     0.00   1.05     0.25   0.56    0.00        0.12        72.89              15.14       1.18    8.00   0.62        0.19
SOM-2b  Cluster 3     0.00   0.00     3.62   0.85    0.21        0.85        48.88              19.81      17.47    2.34   5.96        0.00
SOM-2b  Cluster 4     0.00   0.00     0.00   2.37   87.07        0.00         0.46               0.00       0.09    0.00  10.02        0.00
SOM-2b  Cluster 5     0.00   0.18     1.63   3.89    0.36        1.27        39.90              38.23       2.39    2.49   8.09        1.58
SOM-2b  Cluster 6     0.00   0.40     0.17   9.46    2.45       11.97        37.23               4.68       0.57    2.51  28.73        1.82
SOM-2b  Cluster 7     0.71   6.25     0.00   0.05    0.11        1.26        15.84               0.11       2.47   70.89   2.30        0.00


DBH 4400

For this core SOM-1 also has a higher agreement with VCL-2 than CART-5 (Table 5). In fact, 78.08 % is the highest agreement score with VCL-2 achieved by a machine learning method for any core. VCL-2 and CART-5 both indicate that the first 190 metres of DBH 4400 are dominated by microcline-quartz rock. SOM-1 is dominated by cluster 2, which has a correlation of 64.68 % with microcline-quartz rock (Table 6). The marble and pelite from 190 to 230 metres are found in all three methods of logging. The final 100 metres of DBH 4400 are more heterogeneous according to all methods, although SOM-1 suggests more heterogeneity.

DBH 4020

For this core SOM-1 also has a higher agreement with VCL-2 than CART-5; however, the difference is small, 1.81 percentage units (Table 5). SOM-1 indicates cluster 2 for both microcline-quartz rock and metatuffite for the first 30 metres. The final interval from 120 to 165 metres is marked by metatuffite transitioning to GBK at 150 metres, followed by pelite at 155 metres, according to VCL-2. CART-5 indicates a similar log but excludes the GBK. SOM-1 is most heterogeneous for this final interval and also misses the GBK.

DBH 804

DBH 804 is the most heterogeneous core in this study. Here, CART-5 achieves a higher agreement score with VCL-2 than SOM-1 (Table 5). The first 200 metres are comprised of intercalated metatuffite, diopside skarn and dolomitic marble according to VCL-2. This heterogeneity is indicated in CART-5 and SOM-1. From 200 to 360 metres VCL-2 indicates metatuffite dominance with local KSL. SOM-1 also favours metatuffite; however, it is represented by both cluster 7 and cluster 5. Similarly, CART-5 describes this interval as mostly metatuffite but excludes the KSL. For the thick KSL unit in VCL-2 at 340 to 360 metres, CART-5 instead indicates dolomitic marble and diopside skarn.

Method comparison on DBH 4442

The differences between VCL-1, VCL-2, SOM-1, CART-5 and the conventional method are presented in Figure 30. For DBH 4442 the agreement score between VCL-2 and the conventional method is 80.81 %. This is superior to the agreement between VCL-1 and the conventional log, which is 64.83 %. Comparing CART-5 and SOM-1 to the conventional log yields 66.57 % agreement for SOM-1 and 48.55 % agreement for CART-5 (Table 5).


Figure 29. A comparison of the core logs VCL-2, SOM-1 and CART-5. The machine learning methods in this figure were completed with 1-metre data resolution. CART-5 was trained on VCL-2.


Figure 30. A comparison of the novel methods to the conventional method of manual on-site core logging. Both iterations of the VCL are presented highlighting the differences in judgement over time.


CART-1 vs CART-5

The comparison of CART-1 and CART-5 is presented in Figure 31. The agreement scores for CART-1 and CART-5 versus VCL-2 suggest that CART-5 overall agrees more with VCL-2 than CART-1 does, with agreement scores of 62.79 % and 52.24 %, respectively (Table 5). Visually, the most apparent difference is the abundance of KSL in CART-1 compared to CART-5 (Figure 31). The agreement score for CART-1 vs VCL-1 of 48.42 % is relevant for comparison, since VCL-1 is the learning data for CART-1.

CART-3 vs CART-4 vs CART-5

The comparison of CART-3, CART-4 and CART-5 (Figure 32) highlights the differences in using 10-centimetre data resolution (CART-3), hybrid data resolution (CART-4) and 1-metre data resolution (CART-5). Overall, CART-5 appears to be the most successful method with the highest agreement score of 62.79 %. CART-4 achieves an agreement score of 59.04 % and CART-3 an agreement of 57.95 %. All these agreement scores are in relation to VCL-2 (Table 5).

SOM-1 vs SOM-2

The comparison of SOM-1 and SOM-2 (Figure 33) also indicates that lower data resolution achieves higher agreement scores overall. SOM-1 utilised 1-metre data resolution whereas SOM-2 utilised 10-centimetre data resolution. The agreement score for SOM-1 is 68.02 % as opposed to 54.58 % achieved by SOM-2 (Table 5).


Figure 31. A comparison of CART-1 and CART-5 where the former is trained on VCL-1 and the latter on VCL-2. Both used a data resolution of 1 metre.


Figure 32. A comparison of CART using different data resolutions. CART-3 utilized 10-centimetre data resolution, CART-4 utilized variable data resolution dictated by lithological boundaries in VCL-2, and CART-5 utilized a 1-metre data resolution. For all mentioned methods, VCL-2 was used for learning.


Figure 33. A comparison of different data resolutions in SOM, where SOM-1 utilised 1-metre data resolution and SOM-2 utilised 10-centimetre data resolution. The colouring of the clusters is done by their respective correlation to lithologies.


Discussion

The major findings of the study were: 1) K-means clustering applied to the SOM using 1-metre data resolution achieves the highest overall agreement score (68.02 %) of the machine learning methods; 2) VCL-2 is in better agreement (80.81 %) with the conventional method than both machine learning methods; 3) both CART and K-means clustering applied to the SOM perform differently depending on the geology of the cores; 4) the 10-centimetre data resolution underperforms compared to 1-metre data resolution for both CART and K-means on the SOM; and 5) the choice of elements has an impact on the performance of both CART and K-means on the SOM.

Finding 1: Domain Comparison of Unsupervised and Supervised learning

Finding 1) is surprising, as K-means on the SOM represents the case with the least domain knowledge; that is, we do not know how many or which lithologies we have. One might have expected the supervised method, CART, to achieve a higher agreement score than the unsupervised method. In this study, the more common lithologies in VCL-2 tended to be favoured over less common lithologies (Table 6). It should be discussed whether the inherent differences between the outputs of CART and K-means on the SOM are of relevance. That is, is CART more likely to make mistakes because it does more than K-means on the SOM? Through a decision tree, the algorithm is forced to consider all possible domains for a sample, even the uncommon ones. Hence, could it be that geochemically non-distinctive samples are more likely to fall into incorrect classifications when put through CART than when put through K-means on the SOM? A geochemically non-distinctive sample in K-means on the SOM would be included in larger clusters that are likely to be assigned a commonly occurring lithology, which is therefore more likely to be the actual lithology of that sample.
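The argument above can be illustrated with a small sketch in which each cluster is given the lithology most often logged within it. Majority voting is an assumption made here for illustration; in this study the lithologies were assigned to the clusters by a human on the basis of the lithology-cluster correlations. All names and labels below are hypothetical.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, lithologies):
    """Map each cluster to the lithology most often logged within it."""
    votes = defaultdict(Counter)
    for cluster, lithology in zip(cluster_ids, lithologies):
        votes[cluster][lithology] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Hypothetical cluster memberships and reference-log labels per interval.
clusters = [0, 0, 0, 0, 1, 1, 1]
reference = ["metatuffite", "metatuffite", "marble", "metatuffite",
             "marble", "marble", "metatuffite"]

mapping = label_clusters(clusters, reference)
predicted = [mapping[c] for c in clusters]
agreeing = sum(p == r for p, r in zip(predicted, reference))
print(mapping, f"{agreeing} of {len(reference)} intervals agree")
```

A minority sample inside a large cluster inherits the common label, so the common lithology is favoured by construction. This is one reason why agreement scores of clustered logs should be read with care.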

An important aspect that could have influenced Finding 1) is the domain validation; the format of comparison by agreement scores requires critical reflection. This becomes apparent when realising that CART and K-means on the SOM are fundamentally different. The unsupervised K-means on the SOM method involves the need for a human to assign lithologies to the clusters derived from the method. Thus, we find that the metric for method comparison is problematic and requires further attention in future research.

Finding 2: Method Comparisons to the Conventional Method

Finding 2) assures us that a human core logger is still superior to the machine; however, the finding should be elaborated on by discussing the difference between VCL-1 and VCL-2. VCL-2 is in better agreement with the conventional method than VCL-1 (Table 5). This difference in human interpretation of core alludes to the subjectivity of core logging when done by humans. This problem of subjectivity is relevant for the ground truth in the context of both supervised learning and domain validation. The results found by comparing CART-1 and CART-5 (Figure 31) highlight the differences that arise from utilising VCL-1 or VCL-2 as learning data. This shows the extent to which differences in judgement propagate through the learning algorithm and are finally expressed in the CART logs. These results suggest that a more informed labelling of learning data will produce more accurate classification by CART. It could be argued that the agreement score for CART-5 vs VCL-2 will always be greater than that of CART-1 vs VCL-2 since CART-5 used VCL-2 for learning. However, the agreement score for CART-1 vs VCL-1 (since CART-1 used VCL-1 for learning) should be investigated and specifically be compared to the agreement score for CART-1 vs VCL-2. Interestingly, the agreement between CART-1 and VCL-1 is lower than the agreement between CART-1 and VCL-2 (Table 5). This may suggest that the impact of errors in VCL-1 was decreased by learning, and that CART-1 therefore approaches the ground truth of VCL-2.

Finding 3: Machine learning dependence on Geology

The lithology-cluster correlations in Table 6 highlight Finding 3). The finding relates to geochemical distinctiveness: the marble and microcline-quartz were found to be geochemically easy to distinguish. This, however, cannot be said for the metatuffite. It may therefore not be surprising that the main culprit for misclassification is the metatuffite, since it correlates to almost all clusters (Table 6). The cause of the overrepresentation of metatuffite is difficult to assess, but may be its variable lithogeochemistry, which impedes its distinct classification. This is exemplified by the decision tree (Figure 28), in which we find metatuffite in 12 of the 32 leaf nodes. We find in CART applied to DBH 4400 that the microcline-quartz is misclassified as metatuffite in many intervals (Figure 32). This is probably due to the poor representation of microcline-quartz in all other cores. Hence, when the three other cores are used as learning data for classification of DBH 4400, there is only a small amount of data reflecting this specific lithology. This finding indicates that the CART method is sensitive to the variety of geological units in the learning cores. Care should therefore be taken when selecting learning data for supervised methods of machine learning for core logging; in part, this can probably be solved by using more cores for learning.
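A simple check of how well each lithology is represented in the learning cores, before a core is held out for testing, could look like the sketch below. The interval counts are hypothetical and only mimic the situation described for DBH 4400; the function name is likewise an assumption.

```python
from collections import Counter


def learning_set_counts(labels_by_core, test_core):
    """Count lithology labels in all cores except the held-out test core."""
    counts = Counter()
    for core, labels in labels_by_core.items():
        if core != test_core:
            counts.update(labels)
    return counts


# Hypothetical interval labels per core; the counts are not from the study.
labels_by_core = {
    "DBH 4442": ["metatuffite"] * 6 + ["marble"] * 3,
    "DBH 4400": ["microcline-quartz"] * 5 + ["metatuffite"] * 2,
    "DBH 4020": ["metatuffite"] * 4 + ["marble"] * 4 + ["microcline-quartz"],
    "DBH 804": ["marble"] * 5 + ["metatuffite"] * 3,
}

counts = learning_set_counts(labels_by_core, test_core="DBH 4400")
share = 100.0 * counts["microcline-quartz"] / sum(counts.values())
print(f"microcline-quartz share of learning data: {share:.1f} %")
```

A lithology that makes up only a few percent of the learning intervals, as in this toy case, gives a decision tree little opportunity to learn a split for it.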

Finding 4: Data Resolution for SOM and CART

The tests involving SOM-1, SOM-2, CART-3, CART-4 and CART-5 highlight a challenge of automated approaches to core logging, i.e. that the resolution of the data has an impact on the final core logs that these methods produce, as stated by Finding 4) (Table 5). Higher resolution captures more detail; however, it may incorporate more data noise. Interestingly, CART-4 with hybrid data resolution did not achieve a higher overall agreement score than CART-5 using 1-metre data resolution. The cause of this data resolution effect may be found in how the data were produced. Critical to XRF analysis is the analysis time for each interval of core; ideally this would be the same for all intervals. However, due to the locally high fracture density in the core, this cannot always be achieved [A. Inerfeldt, personal communication, 2020]. For the 10-centimetre analyses the analysis times were shorter, ultimately leading to erroneous high-resolution logs. This may seem far-fetched but could be part of the explanation, apart from the noisier nature of the 10-centimetre data.
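The noise argument can be illustrated with synthetic data. The sketch below assumes, purely for illustration, that a 1-metre value behaves like the mean of ten 10-centimetre readings with independent noise; this is not how the data of this study were produced, and all numbers are hypothetical.

```python
import random
import statistics


def block_average(values, block_size=10):
    """Average consecutive readings, e.g. ten 10-cm values into one 1-m value."""
    return [
        statistics.fmean(values[i:i + block_size])
        for i in range(0, len(values) - block_size + 1, block_size)
    ]


random.seed(0)  # fixed seed so that the sketch is repeatable
true_value = 20.0  # hypothetical constant concentration (wt%)
fine = [true_value + random.gauss(0.0, 2.0) for _ in range(1000)]
coarse = block_average(fine)

print(f"10-cm scatter: {statistics.stdev(fine):.2f}")
print(f"1-m scatter:   {statistics.stdev(coarse):.2f}")
```

Under these assumptions the coarser series scatters less around the true value, at the cost of losing any real variation within each metre.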

Finding 5: Elements Included in Data Given to Machine Learning

Aluminium was excluded from the 10-centimetre data used for CART-3, CART-4 and SOM-2, although it did in fact not exceed the exclusion criterion of 30 % of values below the detection limit. Therefore, CART-3b, CART-4b and SOM-2b were conducted including Al in the dataset. Interestingly, the inclusion of Al decreased the overall agreement scores for all machine learning methods (Table 5). Although the overall agreement scores for the methods including Al are lower, the agreement scores for individual cores are more nuanced. We find that for SOM-2 in comparison to SOM-2b, the difference made by including Al is not as big as for the CART methods. In fact, we find for DBH 804 individually that inclusion of Al in the data can produce higher agreement scores than when Al is excluded. However, inclusion can be very detrimental to the agreement score, as seen for DBH 4400 where SOM-2b greatly underperforms in comparison to SOM-2 (Table 5). The cause of Al’s impact on the individual core logs is elusive but could relate to the element’s representation in marble; cores with a lot of marble would in turn have many poorly quantified Al concentrations influencing the results. Templ et al. [2008] conclude that “the addition or deletion of just one variable can change the results of cluster analysis drastically”. It is therefore reasonable to choose which elements to include in the learning data with great care. The commonly used 30 % omission criterion for limit of detection data may be too lenient when using machine learning. Instead, it may be beneficial to use a stricter omission criterion. An alternative would be to critically assess which elements to process by machine learning. The categorical exclusion of light elements such as Al may be unreasonable due to their heterogeneity in performance in machine learning algorithms.
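The omission criterion can be expressed as a simple filter. The sketch below assumes that an element is kept when the fraction of its values below the detection limit does not exceed a chosen maximum; the concentrations, detection limits and function names are hypothetical.

```python
def fraction_below_lod(values, detection_limit):
    """Fraction of measurements that fall below the detection limit."""
    return sum(v < detection_limit for v in values) / len(values)


def select_elements(data, detection_limits, max_fraction=0.30):
    """Keep elements whose below-limit fraction does not exceed max_fraction."""
    return [
        element for element, values in data.items()
        if fraction_below_lod(values, detection_limits[element]) <= max_fraction
    ]


# Hypothetical concentrations and detection limits, for illustration only.
data = {
    "Al": [0.2, 3.1, 4.0, 0.1, 5.2, 6.3, 2.8, 3.9, 4.4, 5.0],
    "Si": [25.0, 30.1, 28.4, 31.2, 27.7, 29.9, 26.3, 30.8, 28.0, 29.1],
    "Mg": [0.1, 0.2, 0.1, 1.5, 0.3, 0.2, 0.1, 0.4, 2.0, 0.2],
}
detection_limits = {"Al": 0.5, "Si": 0.5, "Mg": 1.0}

lenient = select_elements(data, detection_limits, max_fraction=0.30)
strict = select_elements(data, detection_limits, max_fraction=0.10)
print("30 % criterion keeps:", lenient)
print("10 % criterion keeps:", strict)
```

In this toy case Al passes the 30 % criterion but not a stricter 10 % criterion, which shows how the choice of threshold alone can change the set of elements given to the algorithm.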

Suggestions for Improving Accuracies of Machine Learning

Finding 1) serves to motivate the need to improve the machine learning logging, since a 68.02 % agreement with VCL-2 (Table 5) may be too low for many applications. The question is when an agreement score can be said to be tolerable. For this we could use the agreement between VCL-2 and the conventional method to set thresholds for CART and K-means on the SOM. When automated methods are on par with the agreement between human logs, i.e. indistinguishable from a human log, the accuracy of automated logging could be deemed tolerable.

Choice of data acquisition technique

An important question is the collection of the data itself. This study chose to look at XRF core scan data since they yield the lithogeochemistry along the entire cores. However, important alteration-related elements such as Na and Mg are difficult to quantify by XRF. Future studies should consider acquiring lithogeochemistry from a lab for machine learning as well.

Data Pre-processing

The pre-processing of the data has a critical influence on the accuracy of the machine learning. Examples of this are found in the choice of data resolution, what elements to include and the choice of omission criteria for values below the detection limit. In literature it is debated what pre-processing should be done. The choice of which log-ratio transform to use is not obvious [Aitchison 1982; Egozcue et al. 2003]. In fact, transformation may not be needed at all [Trépanier et al. 2016].
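As an illustration of one such pre-processing chain, the sketch below replaces values below the detection limit (DL) by 0.5 DL and then applies the centred log-ratio (CLR) transform, i.e. the logarithm of each part minus the mean of the logarithms of all parts. The sample, the detection limits and the function names are hypothetical.

```python
import math


def replace_below_lod(values, detection_limits):
    """Replace values below the detection limit (DL) by 0.5 * DL."""
    return [0.5 * dl if v < dl else v for v, dl in zip(values, detection_limits)]


def clr(composition):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical sample (wt%) and detection limits; not values from the study.
sample = [4.0, 30.0, 0.0, 2.0]
limits = [0.5, 0.5, 0.2, 0.1]

replaced = replace_below_lod(sample, limits)
transformed = clr(replaced)
print(replaced)
print([round(value, 3) for value in transformed])
```

The replacement step is needed because the logarithm is undefined for zero. The CLR values of a sample always sum to zero, so the transformed parts are expressed relative to the sample's geometric mean rather than as absolute concentrations.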

Choice of algorithm

At the heart of the choice of machine learning algorithm is the intended purpose of the project. Templ et al. [2008] conclude that clustering algorithms are suitable for exploratory data analysis, i.e. if little is known about the data itself. Supervised methods such as Random Forests used by Hood et al. [2018] could, on the other hand, replace the “manual task of chemically grouping rock types” [Hood et al. 2018] to decrease workloads.

The need for dimensionality reduction should be discussed. In this study the SOM was used to represent the high-dimensional data in a U-matrix, and K-means clustering was then applied to this U-matrix to produce meaningful clusters, as in Klawitter et al. [2019]. However, it is apparent that using the SOM may not be needed [Hill et al. 2020]. Hill et al. demonstrate an improvement to automated core logging where K-means clustering is applied without any dimensionality reduction to the XRF core scan data.
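For reference, K-means applied directly to the samples, without a SOM in between, can be sketched in a few lines. This is plain Lloyd's algorithm with fixed starting centroids and hypothetical two-element samples; it is not the implementation used in this study, nor the multiscale method of Hill et al. [2020].

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign to nearest centroid, then update means."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical two-element samples (e.g. transformed Ca and K values).
points = [(5.0, 0.2), (5.2, 0.1), (4.8, 0.3), (0.3, 4.1), (0.2, 3.9), (0.4, 4.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

In practice the starting centroids are usually chosen at random and the algorithm is repeated several times, which would correspond to the “Tries” parameter in Table A1 if that parameter denotes the number of restarts.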


Ensemble Methods

A way to increase the classification accuracies of machine learning is to apply ensemble classifiers. One such would be a Random Forest classifier, as used by Hood et al. [2018]. Bagging techniques have been found to increase accuracy [Breiman 1996]. It would be up to future research to attempt using these bagging techniques on the unsupervised learning that showed the best performance in this study, and ultimately produce a superior classifier.
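The principle of bagging can be sketched as follows: each base classifier is fitted to a bootstrap resample of the learning data and the predictions are combined by majority vote. To keep the example short, one-split decision stumps stand in for full CART trees, and the random feature selection that a Random Forest adds is not shown. The data and all names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples, labels):
    """Find the single-feature threshold with the fewest training errors."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
            right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left)
            errors += sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, sample):
    """Classify one sample with a fitted stump."""
    feature, threshold, left_label, right_label = stump
    return left_label if sample[feature] <= threshold else right_label


def bagging_fit(samples, labels, n_models=25, seed=0):
    """Fit each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(samples)) for _ in samples]
        models.append(fit_stump([samples[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, sample):
    """Aggregate the individual predictions by majority vote."""
    return Counter(predict_stump(m, sample) for m in models).most_common(1)[0][0]


# Hypothetical (Ca, K) concentrations with their logged lithology.
samples = [(30.0, 1.0), (28.0, 0.5), (32.0, 0.8), (29.0, 1.2),
           (2.0, 4.0), (3.0, 3.5), (1.5, 4.2), (2.5, 3.8)]
labels = ["marble"] * 4 + ["metatuffite"] * 4

models = bagging_fit(samples, labels)
print(bagging_predict(models, (31.0, 0.9)), bagging_predict(models, (1.0, 4.0)))
```

Because every stump sees a slightly different resample, individual errors tend to be outvoted, which is the variance reduction that Breiman [1996] describes.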

Conclusions

This study concludes that:

• K-means on the SOM has the highest overall agreement score; however, this result should be regarded with caution due to influences from domain validation.
• The choice of ground truth for learning is important for both supervised learning and the assessment of machine learning accuracy.
• Geology, data resolution and choice of elements are critical parameters for machine learning.

The challenge this study has faced is attempting to compare two fundamentally different algorithms designed to solve two very different problems. K-means applied to the SOM is a tool for exploring unlabelled data, whereas CART is a tool for automating the otherwise manual task of classifying rocks. Regardless of this comparison, both methods in their own right show much potential in clustering or classifying rocks from the Zinkgruvan stratigraphy. This conclusion on their potential is based on comparing the agreement scores of A.I. logs with the agreement scores between different human logs; the difference is not that big.

Acknowledgements

Firstly, I would like to thank my LTU supervisors Nils Jansson, Tobias Kampmann and Foteini Liwicki for helpful feedback on this thesis. I would also like to thank Marcus Liwicki (LTU) for answering questions on A.I. during my work.

I am also thankful for all the help from the staff at Zinkgruvan Mining AB who gave me the opportunity to explore this exciting topic. I would like to thank Anja Hagerud (Zinkgruvan Mining AB), who helped organise my visits to Zinkgruvan, and Filip Ivarsson (Zinkgruvan Mining AB), who due to travel restrictions related to the COVID-19 pandemic conducted the geochemical sampling, for answering questions and guiding me in the mine. My gratitude also goes to all other employees at the near mine department for their help in the core logging facility and for discussing geology.

I would also like to thank Minalyze AB employees Andreas Inerfeldt and Torbjörn Svensson for answering my many questions concerning the core scanning machine.

Finally, I would like to thank my family and classmates for interesting discussions and support during the entire thesis.

References


Aitchison, J. (1982). The Statistical Analysis of Compositional Data. Journal of the Royal Statistical Society, Vol. 44, No. 2, p. 139–177.

Allen, R.L., Lundström, I., Ripa, M., Simeonov, A., Christofferson, H. (1996). Facies analysis of a 1.9 Ga, continental margin, back–arc, felsic caldera province with diverse Zn-Pb-Ag-(Cu-Au) sulfide and Fe oxide deposits, Bergslagen Region, Sweden. Economic Geology, Vol. 91, p. 979–1008.

Beckhoff, B., Kanngießer, B., Langhoff, N., Wedell, R., Wolff, H. (2006). Handbook of Practical X–Ray Fluorescence Analysis. Springer.

Breiman, L. (1996). Bagging Predictors. Machine Learning, Vol. 24, p. 123–140.

Bindler, R., Karlsson, J., Rydberg, J., Karlsson, B., Nilsson, L.B., Biester, H., Segerström, U. (2017). Copper-ore mining in Sweden since the pre-Roman Iron Age: lake-sediment evidence of human activities at the Garpenberg ore field since 375 BCE. Journal of Archaeological Science, Vol. 12, p. 99–108.

Curtis, A. (2012). The Science of Subjectivity. Geology, Vol. 40, p. 95–96.

Dart, R.A. (1969). Evidence of Iron Ore Mining in Southern Africa in the Middle Stone Age. Current Anthropology, Vol. 10, No. 1, p. 127–128.

Duda, R.O., Hart, P.E., Stork, D.G. (2001). Pattern Classification, 2nd Edition. Wiley.

Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C. (2003). Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology, Vol. 35, No. 3, p. 279–300.

Gregory, B.R.B., Patterson, R.T., Reinhardt, E.G., Galloway, J.M., Roe, H.M. (2019). An evaluation of methodologies for calibrating Itrax X-ray fluorescence counts with ICP-MS concentration data for discrete sediment samples. Chemical Geology, Vol. 521, p. 12–27.

Jansson, N.F., Allen, R.L. (2011). Timing of volcanism, hydrothermal alteration and ore formation at Garpenberg, Bergslagen, Sweden. GFF, Vol. 133:1-2, p. 3–18.

Jansson, N.F., Zetterqvist, A., Allen, R.L., Billström, K., Malmström, L. (2017). Genesis of the Zinkgruvan Zn-Pb-Ag deposit and associated dolomite-hosted Cu ore, Bergslagen, Sweden. Ore Geology Reviews, Vol. 82, p. 285–308.

Jansson, N.F., Zetterqvist, A., Allen, R.L., Malmström, L. (2018). Geochemical vectors for stratiform Zn-Pb-Ag sulfide and associated dolomite-hosted Cu mineralization at Zinkgruvan, Bergslagen, Sweden. Journal of Geochemical Exploration, Vol. 190, p. 207–228.

Hill, E.J., Pearce, M.A., Stromberg, J.M. (2020). Improving automated geological logging of drill holes by incorporating multiscale spatial methods. Math Geosci. https://doi.org/10.1007/s11004-020-09859-0

Kampmann, T., Jansson, N.F., Stephens, M.B., Majka, J., Lasskogen, J. (2017). Systematics of Hydrothermal Alteration at the Falun Base Metal Sulfide Deposit and Implications for Ore Genesis and Exploration, Bergslagen Ore District, Fennoscandian Shield, Sweden. Economic Geology, Vol. 112 (5), p. 1111–1152.

Kampmann, T., Stephens, M.B., Weihed, P. (2016). 3D modelling and sheath folding at the Falun pyritic Zn-Pb-Cu-(Au-Ag) sulphide deposit and implications for exploration in a 1.9 Ga ore district, Fennoscandian Shield, Sweden. Mineralium Deposita, Vol. 51, p. 665–680.

Klawitter, M., Valenta, R. (2019). Automated Geological Drill Core Logging Based on XRF Data Using Unsupervised Machine Learning Methods. Geomin Mineplanning, 6th International Conference on Geology and Mine Planning, Australia. https://espace.library.uq.edu.au/view/UQ:170e551/UQ170e551_OA.pdf

Kohonen, T. (2013). Essentials of the self–organizing map. Neural Networks, Vol. 37, p. 52–65.

Kumpulainen, R.A., Mansfeld, J., Sundblad, K., Neymark, L., Bergman, T. (1996). Stratigraphy, Age, and Sm-Nd Isotope Systematics of the Country Rocks to Zn-Pb Sulfide Deposits, Åmmeberg District, Sweden. Economic Geology, Vol. 91, p. 1009–1021.

Maliki, A.A., Al-lami, A.K., Hussain, H.M., Al-Ansari, N. (2017). Comparison between inductively coupled plasma and X-ray fluorescence performance for Pb analysis in environmental soil samples. Environmental Earth Science, Vol. 76, p. 433.

Matthews, C.M. (2018). Silicon Valley to Big Oil: We Can Manage Your Data Better Than You. Wall Street Journal - Online Edition, p. 1.

Martín-Fernández, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J. (2012). Model-based replacement of rounded zeros in compositional data: Classical and robust approaches. Computational Statistics and Data Analysis, Vol. 56, p. 2688–2704.

Stephens, M.B., Jansson, N.F. (2020). "Paleoproterozoic (1.9–1.8 Ga) syn-orogenic magmatism, sedimentation and mineralization in the Bergslagen lithotectonic unit, Svecokarelian orogen", Sweden: Lithotectonic Framework, Tectonic Evolution and Mineral Resources, Stephens, M. B., Weihed, J.B. The Geological Society, Vol. 50, p. 156–206 . https://doi.org/10.1144/M50-2017-40

Stephens, M.B., Ripa, M., Lundström, I., Persson, L., Bergman, T., Ahl, M., Wahlgren, C.-H., Persson, P.-H. & Wickström, L. (2009). Synthesis of the bedrock geology in the Bergslagen region, Fennoscandian Shield, southcentral Sweden. Geological survey of Sweden Ba 58, p. 259.

Sjöqvist, A.S.L., Arthursson, M., Lundström, A., Calderón Estrada, E., Inerfeldt, A., Lorenz, H. (2015). An innovative optical and chemical drill core scanner. Scientific Drilling, Vol. 19, p. 13–16.

Templ, M., Filzmoser, P., Reimann, C. (2008). Cluster analysis applied to regional geochemical data: Problems and possibilities. Applied Geochemistry, Vol. 23, p. 2198–2213.

Trépanier, S., Mathieu, L., Daignault, R., Faure, S. (2016). Precursors predicted by artificial neural networks for mass balance calculations: Quantifying hydrothermal alteration in volcanic rocks. Computers & Geosciences, Vol. 89, p. 32–43.

Hedström, P., Simenov, A., Malmström, L. (1989). The Zinkgruvan Ore Deposit, South–Central Sweden: A Proterozoic, Proximal Zn-Pb-Ag Deposit in Distal Volcanic Facies. Economic Geology, Vol. 84, p. 1235–1261.

Hermansson, T., Stephens, M.B., Corfu, F., Page, L.M., Andersson, J. (2008). Migratory tectonic switching, western Svecofennian orogen, central Sweden: Constraints from U/Pb zircon and titanite geochronology. Precambrian Research, Vol. 161, p. 250–278.

Hood, S.B., Cracknell, M.J., Gazley, M. F. (2018). Linking protolith rocks to altered equivalents by combining unsupervised and supervised machine learning. Journal of Geochemical Exploration, Vol. 186, p. 270–280.

Wardell Armstrong (2017). NI 43–101 Technical Report for the Zinkgruvan Mine, Sweden (MM1185). https://www.lundinmining.com/site/assets/files/3642/zm-techreport-113017-sedar.pdf


Appendix I Table A1. A table with all of the K-means on SOM runs for this study showing the different input parameters used for the different runs.

SOM

K-means

Clustering

Run Elements Data Resolution

Pre-processing Matrix size

Iterations Initial Neighbour

Radius

Learning rate (n0)

Learning rate (c)

Tries Max Clusters

K value

1 Al, Si, S, K, Ca, Ti, Fe 1 m None 8 100 4 0.8 0.02 50 12 3 2 Al, Si, S, K, Ca, Ti, Fe 1 m None 8 200 4 0.8 0.02 50 12 3 3 Al, Si, S, K, Ca, Ti, Fe 1 m None 12 100 6 0.8 0.02 50 12 3 4 Al, Si, S, K, Ca, Ti, Fe 1 m None 12 200 6 0.8 0.02 50 12 3 5 Al, Si, S, K, Ca, Ti, Fe 1 m None 16 100 8 0.8 0.02 50 12 3 6 Al, Si, S, K, Ca, Ti, Fe 1 m None 16 200 8 0.8 0.02 50 12 3 7 Al, Si, S, K, Ca, Ti, Fe 1 m None 20 100 10 0.8 0.02 50 12 3 8 Al, Si, S, K, Ca, Ti, Fe 1 m None 20 200 10 0.8 0.02 50 12 3 9 Al, Si, S, K, Ca, Ti, Fe 1 m None 24 100 12 0.8 0.02 50 12 3

10 Al, Si, S, K, Ca, Ti, Fe 1 m None 24 200 12 0.8 0.02 50 12 3 11 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 8 100 4 0.8 0.02 50 12 3 12 Al, Si, S, K, Ca, Ti, Fe 1 m None 8 200 4 0.8 0.02 50 12 3 13 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 12 100 6 0.8 0.02 50 12 3 14 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 12 200 6 0.8 0.02 50 12 3 15 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 16 100 8 0.8 0.02 50 12 3 16 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 16 200 8 0.8 0.02 50 12 3 17 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 20 100 10 0.8 0.02 50 12 3 18 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 20 200 10 0.8 0.02 50 12 3 19 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 24 100 12 0.8 0.02 50 12 3 20 Al, Si, S, K, Ca, Ti, Fe 1 m Log 10 24 200 12 0.8 0.02 50 12 3 21 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 8 100 4 0.8 0.02 50 12 3 22 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 3

51

23 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 12 100 6 0.8 0.02 50 12 3 24 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 3 25 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 16 100 8 0.8 0.02 50 12 3 26 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 3 27 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 20 100 10 0.8 0.02 50 12 3 28 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 3 29 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 24 100 12 0.8 0.02 50 12 3 30 Al, Si, S, K, Ca, Ti, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 3 31 Al, Si, S, K, Ca, Fe 1 m None 16 200 8 0.8 0.02 50 12 3 32 Al, Si, S, K, Ca, Fe 1 m None 20 200 10 0.8 0.02 50 12 3 33 Al, Si, S, K, Ca, Fe 1 m None 24 200 12 0.8 0.02 50 12 3 34 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 3 35 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 3 36 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 3 37 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 4 38 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 5 39 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 6 40 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 7 41 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 8 42 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 9 43 Al, Si, S, K, Ca, Fe 1 m CLR 8 200 4 0.8 0.02 50 12 10 44 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 3 45 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 4 46 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 5 47 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 6 48 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 7 49 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 8 50 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 9 51 Al, Si, S, K, Ca, Fe 1 m CLR 12 200 6 0.8 0.02 50 12 10 52 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 3 53 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 4

52

54 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 5 55 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 6 56 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 7 57 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 8 58 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 9 59 Al, Si, S, K, Ca, Fe 1 m CLR 16 200 8 0.8 0.02 50 12 10 60 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 3 61 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 4 62 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 5 63 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 6 64 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 7 65 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 8 66 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 9 67 Al, Si, S, K, Ca, Fe 1 m CLR 20 200 10 0.8 0.02 50 12 10 68 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 3 69 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 4 70 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 5 71 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 6 72 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 7 73 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 8 74 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 9 75 Al, Si, S, K, Ca, Fe 1 m CLR 24 200 12 0.8 0.02 50 12 10 76 Al, Si, S, K, Ca, Fe 1 m CLR 50 400 25 0.8 0.02 50 12 3 77 Al, Si, S, K, Ca, Fe 1 m CLR 50 400 25 0.8 0.02 50 12 7 78 Al, Si, K, Ca, Fe, Ti 1 m CLR, <LOD

replacement by ILR

20 200 10 0.8 0.02 50 12 3

79 Al, Si, K, Ca, Fe, Ti 1 m CLR, <LOD replacement by

ILR

24 400 12 0.8 0.02 50 12 5

53

80 Ti, V, Ni, Rb, Sr, Y, Zr, Nb, Ba, Al, S, K, Ca, Fe, Cu, Zn,

Pb

1 m CLR, <LOD replacement by

0.5DL

24 400 12 0.8 0.02 50 12 6


0.5DL

24 400 12 0.8 0.02 50 12 7


0.5DL

24 400 12 0.8 0.02 50 12 7


0.5DL

20 200 10 0.8 0.02 50 12 7

84 Al, Si, K, Ca, Fe, Ti 10 cm CLR, <LOD replacement by

0.5DL

20 200 10 0.8 0.02 50 12 7

85 Zr, Ti, Rb, Sr, Y, V 10 cm CLR, <LOD replacement by

0.5DL

20 200 10 0.8 0.02 50 12 4

86 Zr, Rb, Sr, Y 10 cm CLR, <LOD replacement by

0.5DL

20 200 10 0.8 0.02 50 12 4

87 Al, Ti, Zr 10 cm CLR, <LOD replacement by

0.5DL

20 200 10 0.8 0.02 50 12 7

88 Si, K, Ca, Fe, Ti 10 cm CLR, <LOD replacement by

0.5DL

24 400 12 0.8 0.02 50 12 6

89 Al, Si, K, Ca, Fe, Ti 10 cm CLR, <LOD replacement by

0.5DL

24 400 12 0.8 0.02 50 12 7


Appendix II

Table A2. A table with all of the CART runs for this study showing the different input parameters used for the different runs.

Run | Result ID | 4442 | 4400 | 4020 | 804 | Decision Tree | Random Forest | Partition (%) | Elements | Max Nodes | Min node size | Split rule | Stratified Sampling | Training Acc. (%) | Validation Acc. (%) | Avg. Train. Acc. (%) | Avg. Val. Acc. (%)

1 N/A

Training and validation AND Test Test Test Test x N/A

Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 85 N/A N/A N/A

2 N/A Test

Training and validation AND Test Test Test x N/A


3 N/A Test Test

Training and validation AND Test Test x N/A


4 N/A Test Test Test

Training and validation AND Test x N/A


5 N/A

Training and validation AND Test

Training and validation AND Test Test Test x N/A


6 N/A Test Test




7 N/A

Training and validation AND Test Test



8 N/A Test




9 N/A

Training and validation AND Test Test Test

Training and validation x N/A


10 N/A Test Training and

validation Training and

validation Test x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 90 N/A N/A N/A


AND Test AND Test

11 N/A Test





12 N/A





13 N/A



Training and validation x N/A


14 N/A





15 N/A






16 N/A Training and

validation Test Test Test x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 85 N/A N/A N/A


validation Test Test x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 88 N/A N/A N/A

18 N/A Test Test Training and


19 N/A Test Test Test Training and

validation x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 85 N/A N/A N/A

20 N/A Training and






22 N/A Training and

validation Test Training and





24 N/A Training and

validation Test Test Training and









27 N/A Training and Test Training and Training and x N/A Al, Si, K, 100 5 Gini N/A 85 N/A N/A N/A


validation validation validation Ca, Fe, Ti

28 N/A Training and




29 N/A Training and




30 N/A Training and




validation x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 85 N/A N/A N/A

31 N/A Training and





32 N/A Training and





33 N/A Training and




validation x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N 85 N/A N/A N/A

34 N/A Training and





35 N/A Training and





36 N/A Training and

validation Test Test Test x N/A Al, Si, K, Ca, Fe, Ti 100 5 Gini N/A 90 N/A N/A N/A





39 N/A Test Test Test Training and


40 N/A Training and






42 N/A Training and






44 N/A Training and

validation Test Test Training and










47 N/A Training and




48 N/A Training and




49 N/A Training and






50 CART-1 Test Training and



validation x 70 Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 86.54 68.22




































validation x 70 Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 85.55 74.42 86.545 70.891

60 CART-1 Training and




















































































validation Test x 70 Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 86.27 74.7





































validation Test x 70 Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 86.1 76.68 86.05 72.688






validation x 70 Si, K, Ca,

Fe, Ti 100 5 Gini Y 75.39 71.93





Fe, Ti 100 5 Gini Y 74.63 70.59





Fe, Ti 100 5 Gini Y 76.26 71.42





Fe, Ti 100 5 Gini Y 75.77 72.16





Fe, Ti 100 5 Gini Y 74.9 71.93





Fe, Ti 100 5 Gini Y 76.74 71.38





Fe, Ti 100 5 Gini Y 74.84 70.87





Fe, Ti 100 5 Gini Y 76.76 70.59





Fe, Ti 100 5 Gini Y 76.22 72.11





Fe, Ti 100 5 Gini Y 76.09 69.35 75.76 71.233





Fe, Ti 100 5 Gini Y 80.39 74.98





Fe, Ti 100 5 Gini Y 78.3 73.91





Fe, Ti 100 5 Gini Y 78.4 74.7






Fe, Ti 100 5 Gini Y 79.34 74.93





Fe, Ti 100 5 Gini Y 79.52 72.61





Fe, Ti 100 5 Gini Y 78.66 74.24





Fe, Ti 100 5 Gini Y 78.72 74.79





Fe, Ti 100 5 Gini Y 78.8 72.85





Fe, Ti 100 5 Gini Y 79 73.35





Fe, Ti 100 5 Gini Y 79.81 76.69 79.094 74.305





Fe, Ti 100 5 Gini Y 75.07 71.57





Fe, Ti 100 5 Gini Y 75.29 70.09





Fe, Ti 100 5 Gini Y 75.35 69.55





Fe, Ti 100 5 Gini Y 75.13 71.1





Fe, Ti 100 5 Gini Y 74.02 70.06





Fe, Ti 100 5 Gini Y 75.64 69.66





Fe, Ti 100 5 Gini Y 74.33 71.97





Fe, Ti 100 5 Gini Y 75.75 72.73





Fe, Ti 100 5 Gini Y 74.88 79.67





Fe, Ti 100 5 Gini Y 74.5 70.06 74.996 71.646




validation Test x 70 Si, K, Ca,

Fe, Ti 100 5 Gini Y 75.74 71.5





Fe, Ti 100 5 Gini Y 76.01 71.81





Fe, Ti 100 5 Gini Y 75.27 70.85






Fe, Ti 100 5 Gini Y 74.36 71.63





Fe, Ti 100 5 Gini Y 75.62 71.63





Fe, Ti 100 5 Gini Y 75.64 71.26





Fe, Ti 100 5 Gini Y 74.58 72.91





Fe, Ti 100 5 Gini Y 75.99 71.72





Fe, Ti 100 5 Gini Y 75.8 71.36





Fe, Ti 100 5 Gini Y 75.62 71.31 75.463 71.598







Fe, Ti 100 5 Gini Y 77.2 73.28





Fe, Ti 100 5 Gini Y 77.1 74.99





Fe, Ti 100 5 Gini Y 77.38 73.16





Fe, Ti 100 5 Gini Y 76.8 74.41





Fe, Ti 100 5 Gini Y 77.57 73.67





Fe, Ti 100 5 Gini Y 78.05 74.64





Fe, Ti 100 5 Gini Y 77.1 72.73





Fe, Ti 100 5 Gini Y 77.2 73.24





Fe, Ti 100 5 Gini Y 78.47 72.54





Fe, Ti 100 5 Gini Y 77.08 72.77 77.395 73.543





Fe, Ti 100 5 Gini Y 78.92 76.12






Fe, Ti 100 5 Gini Y 79.2 76.19





Fe, Ti 100 5 Gini Y 79.61 78.27





Fe, Ti 100 5 Gini Y 78.37 75.46





Fe, Ti 100 5 Gini Y 80.04 75.46





Fe, Ti 100 5 Gini Y 78.92 74.92





Fe, Ti 100 5 Gini Y 79.62 77.5





Fe, Ti 100 5 Gini Y 80.09 76.04





Fe, Ti 100 5 Gini Y 79.99 76.58





Fe, Ti 100 5 Gini Y 79.03 75.81 79.379 76.235





Fe, Ti 100 5 Gini Y 74.82 69.23





Fe, Ti 100 5 Gini Y 74.67 71.68





Fe, Ti 100 5 Gini Y 73.3 69.06





Fe, Ti 100 5 Gini Y 73.92 69.13





Fe, Ti 100 5 Gini Y 74.05 69.65





Fe, Ti 100 5 Gini Y 74.27 71.97





Fe, Ti 100 5 Gini Y 73.23 71





Fe, Ti 100 5 Gini Y 73.92 69.27





Fe, Ti 100 5 Gini Y 74.81 71.13





Fe, Ti 100 5 Gini Y 72.69 69.97 73.968 70.209





Fe, Ti 100 5 Gini Y 75.37 71.12






Fe, Ti 100 5 Gini Y 75.08 70.88





Fe, Ti 100 5 Gini Y 75.2 71.51





Fe, Ti 100 5 Gini Y 74.98 72.55





Fe, Ti 100 5 Gini Y 75.57 71.63





Fe, Ti 100 5 Gini Y 75.97 71.83





Fe, Ti 100 5 Gini Y 75.3 72.82





Fe, Ti 100 5 Gini Y 76.07 72.43





Fe, Ti 100 5 Gini Y 75.57 69.81





Fe, Ti 100 5 Gini Y 75.42 70.28 75.453 71.486


Tree | Random forest | Partition (%) | Elements | Max nodes | Min | Training Accuracy (%) | Validation Acc. (%) | Avg. Val. Acc. (%)





Fe, Ti 100 5 Gini Y 89.32 82.42
Fe, Ti 100 5 Gini Y 89.64 79.85
Fe, Ti 100 5 Gini Y 89.48 80.22
Fe, Ti 100 5 Gini Y 90.42 80.22
Fe, Ti 100 5 Gini Y 88.38 79.49
Fe, Ti 100 5 Gini Y 88.38 76.56
Fe, Ti 100 5 Gini Y 89.48 83.88
Fe, Ti 100 5 Gini Y 89.48 80.59
Fe, Ti 100 5 Gini Y 88.07 82.05
Fe, Ti 100 5 Gini Y 89.32 78.75 89.197 80.403





Fe, Ti 100 5 Gini Y 88.58 74.55
Fe, Ti 100 5 Gini Y 88.73 77.09
Fe, Ti 100 5 Gini Y 88.42 80.36
Fe, Ti 100 5 Gini Y 88.73 77.45
Fe, Ti 100 5 Gini Y 87.01 76
Fe, Ti 100 5 Gini Y 87.48 77.45
Fe, Ti 100 5 Gini Y 88.89 77.45
Fe, Ti 100 5 Gini Y 88.42 74.91
Fe, Ti 100 5 Gini Y 89.36 74.18
Fe, Ti 100 5 Gini Y 88.11 74.91 88.373 76.435





Fe, Ti 100 5 Gini Y 85.98 77.74
Fe, Ti 100 5 Gini Y 87.42 80.49
Fe, Ti 100 5 Gini Y 87.94 78.66
Fe, Ti 100 5 Gini Y 88.07 82.93
Fe, Ti 100 5 Gini Y 88.99 75.3
Fe, Ti 100 5 Gini Y 87.94 77.13
Fe, Ti 100 5 Gini Y 88.6 75
Fe, Ti 100 5 Gini Y 87.29 78.96
Fe, Ti 100 5 Gini Y 88.47 71.34
Fe, Ti 100 5 Gini Y 86.5 79.57 87.72 77.712





Fe, Ti 100 5 Gini Y 88.67 78.63
Fe, Ti 100 5 Gini Y 89.82 75.95
Fe, Ti 100 5 Gini Y 87.19 80.92
Fe, Ti 100 5 Gini Y 88.51 75.19
Fe, Ti 100 5 Gini Y 86.54 80.92
Fe, Ti 100 5 Gini Y 88.83 78.24
Fe, Ti 100 5 Gini Y 88.67 79.39
Fe, Ti 100 5 Gini Y 87.52 78.24
Fe, Ti 100 5 Gini Y 86.54 78.63
Fe, Ti 100 5 Gini Y 88.18 75.57 88.047 78.168


Tree | Random forest | Partition (%) | Elements | Max nodes | Min | Training Accuracy (%) | Validation Acc. (%) | Avg. Val. Acc. (%)





























Tree | Random forest | Partition (%) | Elements | Max nodes | Min | Training Accuracy (%) | Validation Acc. (%) | Avg. Val. Acc. (%)

250 CART-3b Test Training and




















260 CART-3b Training and























validation x 70 Al, Si, K, Ca, Fe, Ti 100 5 Gini Y 80.57 77





































Tree | Random forest | Partition (%) | Elements | Max nodes | Min | Training Accuracy (%) | Validation Acc. (%) | Avg. Val. Acc. (%)












