Un-Dusting the Foundations of Compositional Analysis Approaches of Ceramic Archaeological Data
Elisavet Charalambous
NARNIA ESR08, EuroCy Innovations
Department of Electrical and Computer Engineering, University of Cyprus
The Classification Problem
Classification of archaeological ceramics deals with the isolation of ceramic groups of similar chemical profiles
Given a number of artifacts of known fabric/category, identify to which of a set of categories an uncategorized artifact belongs
An instance of supervised learning, which assumes that a training set of correctly identified observations is available
Algorithms which perform classification require the provision of a set with known labels
The labels of uncategorized observations are determined by processing the already classified material
Compositional Data Analysis
Compositional data do not vary independently
Concentration-based approaches to data analysis can lead to faulty conclusions
Compositional data lie in the constrained simplex space
Correlation analysis and the Euclidean distance are not mathematically meaningful concepts in this context
XY plots for raw or log-transformed data should only be used with care in an exploratory data analysis (EDA) sense
… cautious with the belief that good data will speak for themselves
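To make the point about distances concrete, the following is a minimal sketch of the centred log-ratio (clr) transform and the Aitchison distance built on it. The compositions are hypothetical and not taken from the dataset, and the function names are illustrative. The sketch assumes strictly positive parts; zeros would need separate treatment before taking logarithms.

```python
import math

def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]  # requires strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.dist(clr(x), clr(y))

# Hypothetical three-part compositions (assumed values, for illustration only).
a = [0.60, 0.30, 0.10]
b = [0.50, 0.30, 0.20]

# The same samples reported in percent instead of fractions.
a_pct = [100.0 * v for v in a]
b_pct = [100.0 * v for v in b]

# The raw Euclidean distance changes with the reporting units ...
raw_frac = math.dist(a, b)
raw_pct = math.dist(a_pct, b_pct)

# ... whereas the Aitchison distance depends only on the ratios between parts.
ait_frac = aitchison_distance(a, b)
ait_pct = aitchison_distance(a_pct, b_pct)
ait_mixed = aitchison_distance(a, b_pct)
```

Because the clr transform subtracts the mean log, any constant rescaling of a sample cancels out, which is why the three Aitchison values agree while the raw distances do not.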
Hypothesis Formulation
The development of an experimental design for the classification of data from different datasets which span several periods.
The proposed design is tested with the deployment of three well known classification algorithms.
Test 1: Null Hypothesis
The classification algorithms k-Nearest Neighbour, C4.5 (based on Decision Trees) and Learning Vector Quantization (LVQ Networks) perform equally when analyzing ceramic compositional data.
Test 2: Alternative Hypothesis
The pairwise performance of the algorithms is not equal and one algorithm outperforms the others
Classification Algorithms
k-Nearest Neighbour
Learning Vector Quantization
C4.5
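As an illustration of the simplest of the three algorithms, here is a minimal k-Nearest Neighbour sketch that measures distances on clr-transformed compositions. The training samples, the fabric labels "A" and "B", and the choice k=3 are all assumptions made for the example; this is not the tuned implementation used in the experiment.

```python
import math
from collections import Counter

def clr(parts):
    """Centred log-ratio transform (parts must be strictly positive)."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def knn_predict(train_X, train_y, sample, k=3):
    """Label a sample by majority vote among its k nearest training samples,
    with nearness measured as Euclidean distance after the clr transform."""
    z = clr(sample)
    ranked = sorted(
        (math.dist(clr(x), z), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: three-part compositions with known fabric labels.
train_X = [
    [70, 20, 10], [68, 22, 10], [72, 19, 9],   # fabric "A" (assumed)
    [40, 40, 20], [42, 38, 20], [38, 41, 21],  # fabric "B" (assumed)
]
train_y = ["A", "A", "A", "B", "B", "B"]

pred_1 = knn_predict(train_X, train_y, [69, 21, 10])
pred_2 = knn_predict(train_X, train_y, [41, 39, 20])
```

An odd k with two classes avoids tied votes; with many classes, as in the dataset discussed here, a tie-breaking rule would have to be chosen explicitly.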
The Dataset
177 samples of utilitarian pottery found in Cyprus, analyzed with ED-XRF
Dated to the Philia phase as well as the Early and Middle Bronze Ages
Categorized into 36 classes including samples classified as outliers by the expert
[Figure: Histogram of Class Distribution; x-axis: Class Label]
Experiment Overview
The nature of the problem imposes constraints which lead to the deployment of the following practices:
The Aitchison distance is used when classification takes place in the Euclidean (real) space
Resampling with bootstrapping for the generation of statistics
Fine tuning of the parametric aspects of each algorithm
Evaluation of classification results based on the classification accuracy and the Jaccard Index (an external cluster validity index)
Significance testing at level 0.05: 5x2 Cross Validation paired t-test and Combined 5x2 Cross Validation F test
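The two significance tests named above can be sketched as follows. The code computes the 5x2 cross-validation paired t statistic (compared against a t distribution with 5 degrees of freedom) and the combined 5x2 cross-validation F statistic (compared against an F distribution with 10 and 5 degrees of freedom). The per-fold differences are hypothetical numbers chosen for illustration, the function name is an assumption, and the critical values are the tabulated ones for the 0.05 level.

```python
import math

T_CRIT_5DF = 2.571       # two-sided critical value, t distribution, 5 d.o.f., level 0.05
F_CRIT_10_5DF = 4.74     # critical value, F distribution, (10, 5) d.o.f., level 0.05

def five_by_two_cv_stats(diffs):
    """diffs holds five (p1, p2) pairs: the difference in error rate between
    two classifiers on fold 1 and fold 2 of each of the 5 replications.
    Returns (t statistic, combined F statistic)."""
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications of 2-fold CV")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    var_sum = sum(variances)  # zero only if both folds agree in every replication
    t_stat = diffs[0][0] / math.sqrt(var_sum / 5.0)
    f_stat = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2.0 * var_sum)
    return t_stat, f_stat

# Hypothetical error-rate differences between two classifiers (assumed values).
diffs = [(0.10, 0.12), (0.08, 0.11), (0.13, 0.09), (0.11, 0.10), (0.09, 0.12)]

t_stat, f_stat = five_by_two_cv_stats(diffs)
reject_by_t = abs(t_stat) > T_CRIT_5DF
reject_by_f = f_stat > F_CRIT_10_5DF
```

The t statistic uses only the first of the ten differences in its numerator, so it is sensitive to the order of the folds; the combined F statistic pools all ten differences, which is the usual motivation for reporting both.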
Null Hypothesis
        k-NN     C4.5     LVQ
k-NN    -        Accept   Reject
C4.5    Accept   -        Reject
LVQ     Reject   Reject   -
Results of the experiment
[Figure: Classification Accuracy (%) by algorithm (KNN, C4.5, LVQ)]
            Classification Accuracy (%)    Jaccard Index (%)
Algorithm   Mean    Max     Min            Mean    Max     Min
k-NN        72.1    79.4    64.2           56.7    70.1    42.7
C4.5        68.5    77.2    61.7           49.1    63.7    38
LVQ         55.8    65.2    46.2           30.3    38.8    21
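For readers unfamiliar with the Jaccard Index as an external validity index, the sketch below uses one common pair-counting definition: the number of sample pairs grouped together in both the expert labelling and the predicted labelling, divided by the number of pairs grouped together in at least one of them. The source does not state which formulation was used, so this is an assumption, and the labels in the example are illustrative.

```python
from itertools import combinations

def jaccard_index(true_labels, predicted_labels):
    """Pair-counting Jaccard index between two labellings of the same samples."""
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    denom = both + only_true + only_pred
    # Assumed convention: if no pair is grouped in either labelling,
    # the two labellings agree trivially and the index is taken as 1.
    return both / denom if denom else 1.0

# Hypothetical labellings of five samples (not taken from the experiment).
expert = ["M1.I", "M1.I", "M1.II", "M1.II", "M1.III"]
predicted = ["M1.I", "M1.I", "M1.II", "M1.III", "M1.III"]

score = jaccard_index(expert, predicted)
perfect = jaccard_index(expert, expert)
```

Unlike accuracy, this index ignores pairs that are separated in both labellings, which is why it tends to be lower than accuracy when there are many small classes.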
Classification Beyond Labeling
Analysis of misclassification patterns may lead to relationships between classes
Observation of the results has shown the following patterns:
Elements in class M1.II, if misclassified, would be allocated to class M1.III
The same holds between classes M1.IV and M1.VIII, M1.VIII and M1.XII, and Ph.I and M1.I
Confusion Matrix
M1.I M1.II M1.III M1.IV
M1.I 6 0 0 0
M1.II 0 6 2 0
M1.III 0 2 5 0
M1.IV 0 0 0 9
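The misclassification patterns can be read off a confusion matrix programmatically. The sketch below uses the four-class excerpt shown above and lists every off-diagonal cell with a non-zero count. The helper names are illustrative, and since the slide does not say whether rows hold the expert label or the predicted label, the pairs are reported simply as (row class, column class, count).

```python
labels = ["M1.I", "M1.II", "M1.III", "M1.IV"]
confusion = [
    [6, 0, 0, 0],
    [0, 6, 2, 0],
    [0, 2, 5, 0],
    [0, 0, 0, 9],
]

def confused_pairs(matrix, names):
    """Return (row class, column class, count) for each non-zero
    off-diagonal cell, largest counts first."""
    pairs = [
        (names[i], names[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    return sorted(pairs, key=lambda p: -p[2])

def accuracy(matrix):
    """Share of samples on the diagonal, i.e. assigned to their own class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

pairs = confused_pairs(confusion, labels)
acc = accuracy(confusion)
```

For this excerpt the only confusion is between M1.II and M1.III, in both directions, which matches the first pattern listed on the previous slide.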
What if there is more?
Trace elements may contribute more characteristically to determining the fingerprint of a deposit
The discussed experiment was also run for trace and main elements separately, allowing us to hypothesize that trace elements might contain additional information which we could consider.
Classification using the Main Elements
            Classification Accuracy (%)    Jaccard Index (%)
Algorithm   Mean    Max     Min            Mean    Max     Min
KNN         73.2    79.5    67.6           52.9    64.8    40.5
C4.5        67.8    75.1    62.7           43.2    56.8    32.8
LVQ         57.3    67      48.8           29.3    36.6    20.9
Classification using just the Trace Elements
            Classification Accuracy (%)    Jaccard Index (%)
Algorithm   Mean    Max     Min            Mean    Max     Min
KNN         66.6    75      58.2           47.8    57.7    35.7
C4.5        64.5    71.2    55.8           46.1    58.1    35.6
LVQ         59.1    66.2    51.4           40.2    52.2    26.1
Last words....
Data transformation should be used only when fully understood and in line with the analysis objective
Analysis results may be misleading due to incomplete data; exploratory analysis prior to any other analysis is therefore crucial for gaining insight into the problem
Machine learning techniques and statistical analysis can be very useful if used appropriately
It is necessary to consider the assumptions each method imposes
Important to maintain consistency
Ensure no conflicting constraints
Thank you for your attention!
Comments and Questions are Welcome!