
Un-Dusting the Foundations of Compositional Analysis Approaches of Ceramic Archaeological Data

Elisavet Charalambous

NARNIA ESR08, EuroCy Innovations

Department of Electrical and Computer Engineering, University of Cyprus


http://www.eurocyinnovations.com/eurocy-website/



The Classification Problem

Classification of archaeological ceramics deals with isolating groups of ceramics with similar chemical profiles.

Given a number of artifacts of known fabric/category, identify to which of a set of categories an uncategorized artifact belongs.

This is an instance of supervised learning, which assumes that a training set of correctly identified observations is available.

Algorithms which perform classification require the provision of a set with known labels.

The labels of uncategorized observations are determined by processing the already classified material.



Compositional Data Analysis

Compositional data do not vary independently.

Concentration-based approaches to data analysis can lead to faulty conclusions.

Compositional data lie in the constrained simplex space.

Correlation analysis and the Euclidean distance are not mathematically meaningful concepts in this context.

XY plots for raw or log-transformed data should only be used with care, in an exploratory data analysis (EDA) sense.

Be cautious with the belief that good data will speak for themselves.
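A minimal sketch of why the Euclidean distance is problematic on raw compositions, assuming the centred log-ratio (clr) transform is used to move the data out of the simplex. The function names and the three-part compositions are illustrative assumptions, not values from the dataset. The Aitchison distance between two samples depends only on the proportions of their parts, whereas the raw Euclidean distance changes when one sample is expressed in different units.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return euclidean_distance(clr(x), clr(y))

# Hypothetical three-part compositions (illustrative only).
a = [60.0, 30.0, 10.0]
b = [50.0, 30.0, 20.0]
a_scaled = [10 * v for v in a]  # same proportions as `a`, different units

d_ab = aitchison_distance(a, b)
d_scaled = aitchison_distance(a_scaled, b)  # equal to d_ab up to rounding
raw_ab = euclidean_distance(a, b)
raw_scaled = euclidean_distance(a_scaled, b)  # differs from raw_ab
```

The clr transform requires strictly positive parts, so zeros or values below the detection limit would need separate treatment before it is applied.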



Hypothesis Formulation

The development of an experimental design for the classification of data from different datasets which span several periods.

The proposed design is tested with the deployment of three well-known classification algorithms.

Test 1: Null Hypothesis

The classification algorithms k-Nearest Neighbour, C4.5 (based on decision trees) and Learning Vector Quantization (LVQ networks) perform equally when analyzing ceramic compositional data.

Test 2: Alternative Hypothesis

The pairwise performance of the algorithms is not equal and one algorithm outperforms the others.



Classification Algorithms

k-Nearest Neighbour

Learning Vector Quantization

C4.5
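As an illustration of the simplest of the three algorithms, the following is a minimal k-Nearest Neighbour sketch. It assumes that distances are measured between clr-transformed compositions, which is equivalent to using the Aitchison distance. The training samples, the labels "A" and "B", and the choice k = 3 are hypothetical and do not come from the study.

```python
import math
from collections import Counter

def clr(parts):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def knn_predict(train, labels, query, k=3):
    """Return the majority label among the k training samples nearest to `query`."""
    q = clr(query)
    scored = []
    for sample, label in zip(train, labels):
        s = clr(sample)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training compositions for two made-up fabric classes.
train = [
    [60.0, 30.0, 10.0], [58.0, 32.0, 10.0], [62.0, 28.0, 10.0],
    [20.0, 30.0, 50.0], [22.0, 28.0, 50.0], [18.0, 32.0, 50.0],
]
labels = ["A", "A", "A", "B", "B", "B"]

pred_a = knn_predict(train, labels, [59.0, 31.0, 10.0])
pred_b = knn_predict(train, labels, [21.0, 29.0, 50.0])
```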



The Dataset

177 samples of utilitarian pottery found in Cyprus, analyzed with ED-XRF

Dated to the Philia phase as well as the Early and Middle Bronze Ages

Categorized into 36 classes, including samples classified as outliers by the expert

Figure: Histogram of Class Distribution (x-axis: Class Label)



Experiment Overview

The nature of the problem imposes constraints which lead to the deployment of the following practices:

The Aitchison distance is used when classification takes place in the Euclidean (real) space

Resampling with bootstrapping for the generation of statistics

Fine tuning of the parametric aspects of each algorithm

Evaluation of classification results based on the classification accuracy and the Jaccard Index (an external cluster validity index)

Significance testing at level 0.05, using the 5x2 cross-validation paired t-test and the combined 5x2 cross-validation F test
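The resampling step can be sketched as follows, assuming each bootstrap round trains on a resample drawn with replacement and evaluates on the samples left out of it (out-of-bag). The slides do not state the exact scheme, so this split, the majority-class stand-in classifier and the labels are all hypothetical. Only the sample count of 177 is taken from the dataset description.

```python
import random
from collections import Counter

def bootstrap_splits(n_samples, n_rounds, seed=0):
    """Yield (train_idx, test_idx): a resample with replacement and its out-of-bag rest."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train_idx)
        test_idx = [i for i in range(n_samples) if i not in drawn]
        yield train_idx, test_idx

def summarise(scores):
    """Mean, maximum and minimum of a list of per-round scores."""
    return {"mean": sum(scores) / len(scores), "max": max(scores), "min": min(scores)}

def majority_accuracy(labels, train_idx, test_idx):
    """Hypothetical stand-in classifier: always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)

# Illustrative labels only; the real data have 36 classes.
labels = ["A"] * 100 + ["B"] * 77
scores = [
    majority_accuracy(labels, train_idx, test_idx)
    for train_idx, test_idx in bootstrap_splits(len(labels), n_rounds=50, seed=42)
    if test_idx  # skip the (very unlikely) round with no out-of-bag samples
]
stats = summarise(scores)
```

Replacing `majority_accuracy` with any of the three classifiers would give per-algorithm mean, maximum and minimum scores.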



Null Hypothesis

        k-NN     C4.5     LVQ

k-NN    -        Accept   Reject

C4.5    Accept   -        Reject

LVQ     Reject   Reject   -
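The accept/reject decisions rest on two test statistics, which can be sketched as below. The sketch assumes the standard formulations of the 5x2 cross-validation paired t-test (t distribution with 5 degrees of freedom) and of the combined 5x2 cross-validation F test (F distribution with 10 and 5 degrees of freedom). The error-rate differences are hypothetical, and the critical values are the usual tabulated ones for a 0.05 level.

```python
import math

T_CRIT_5DF = 2.571   # two-sided 0.05 critical value, Student's t, 5 d.o.f.
F_CRIT_10_5 = 4.74   # upper 0.05 critical value, F with (10, 5) d.o.f.

def five_by_two_statistics(diffs):
    """Return (t, f) for five replications of 2-fold cross-validation.

    `diffs` holds five pairs (p1, p2): the difference in error rate between
    two classifiers on fold 1 and on fold 2 of each replication.
    """
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    sum_var = sum(variances)  # must be non-zero for the statistics to exist
    t_stat = diffs[0][0] / math.sqrt(sum_var / 5)
    f_stat = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2 * sum_var)
    return t_stat, f_stat

# Hypothetical differences: one classifier is consistently worse.
clear_gap = [(0.10, 0.12), (0.08, 0.11), (0.12, 0.09), (0.10, 0.13), (0.11, 0.10)]
# Hypothetical differences: no systematic gap between the classifiers.
no_gap = [(0.01, -0.01), (-0.02, 0.02), (0.01, -0.02), (0.0, 0.01), (-0.01, 0.01)]

t_gap, f_gap = five_by_two_statistics(clear_gap)
t_none, f_none = five_by_two_statistics(no_gap)

reject_gap = abs(t_gap) > T_CRIT_5DF and f_gap > F_CRIT_10_5
reject_none = abs(t_none) > T_CRIT_5DF or f_none > F_CRIT_10_5
```

With these made-up inputs the null hypothesis of equal performance is rejected for the first pair and accepted for the second, which is the kind of outcome the table summarises for each pair of algorithms.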

Results of the experiment

Figure: results per algorithm (x-axis: Algorithms: KNN, C4.5, LVQ; y-axis: %)

          Classification Accuracy (%)    Jaccard Index (%)

          Mean    Max     Min            Mean    Max     Min

k-NN      72.1    79.4    64.2           56.7    70.1    42.7

C4.5      68.5    77.2    61.7           49.1    63.7    38

LVQ       55.8    65.2    46.2           30.3    38.8    21
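The Jaccard Index reported alongside accuracy can be sketched as follows, assuming the pair-counting form commonly used as an external cluster validity index: the number of sample pairs grouped together under both labelings, divided by the number of pairs grouped together under at least one. The slides do not give the exact formula, and the labels below are hypothetical.

```python
from itertools import combinations

def jaccard_index(true_labels, predicted_labels):
    """Pair-counting Jaccard index between two labelings of the same samples."""
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    denom = both + only_true + only_pred
    # If no pair is ever grouped together, the labelings agree trivially.
    return both / denom if denom else 1.0

# Hypothetical expert labels and classifier output: one sample is mislabelled.
expert = ["A", "A", "A", "B", "B", "B"]
predicted = ["A", "A", "B", "B", "B", "B"]

accuracy = sum(e == p for e, p in zip(expert, predicted)) / len(expert)
jaccard = jaccard_index(expert, predicted)
```

In this example a single error out of six leaves the accuracy at 5/6 but lowers the Jaccard Index to 4/9, which illustrates why the index can sit well below the accuracy, as it does in the table.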



Classification Beyond Labeling

Analysis of misclassification patterns may reveal relationships between classes.

Observation of the results has shown the following patterns:

Elements in class M1.II, if misclassified, would be allocated to class M1.III.

The same holds between classes M1.IV and M1.VIII, M1.VIII and M1.XII, Ph.I and M1.I.

Confusion Matrix

          M1.I    M1.II   M1.III  M1.IV

M1.I      6       0       0       0

M1.II     0       6       2       0

M1.III    0       2       5       0

M1.IV     0       0       0       9
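A minimal sketch of how such patterns can be read off a confusion matrix, using the four-class excerpt shown above. The helper names are illustrative. Off-diagonal counts are summed in both directions for each pair of classes, so the result does not depend on whether rows hold the expert label or the predicted label.

```python
labels = ["M1.I", "M1.II", "M1.III", "M1.IV"]
matrix = [
    [6, 0, 0, 0],
    [0, 6, 2, 0],
    [0, 2, 5, 0],
    [0, 0, 0, 9],
]

def accuracy(m):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def confused_pairs(m, names):
    """List (class_a, class_b, count) for pairs that are mistaken for each other."""
    pairs = []
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            count = m[i][j] + m[j][i]
            if count:
                pairs.append((names[i], names[j], count))
    return sorted(pairs, key=lambda item: item[2], reverse=True)

acc = accuracy(matrix)
pairs = confused_pairs(matrix, labels)
```

For this excerpt the only confused pair is M1.II and M1.III, with four samples exchanged between them.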



What if there is more?

Trace elements may contribute more characteristically to determining the fingerprint of a deposit.

The discussed experiment was also run for trace and main elements separately, allowing us to hypothesize that trace elements might contain additional information which we could consider.

Classification using the Main Elements


          Classification Accuracy (%)    Jaccard Index (%)

          Mean    Max     Min            Mean    Max     Min

KNN       73.2    79.5    67.6           52.9    64.8    40.5

C4.5      67.8    75.1    62.7           43.2    56.8    32.8

LVQ       57.3    67      48.8           29.3    36.6    20.9

Classification using just the Trace Elements


          Classification Accuracy (%)    Jaccard Index (%)

          Mean    Max     Min            Mean    Max     Min

KNN       66.6    75      58.2           47.8    57.7    35.7

C4.5      64.5    71.2    55.8           46.1    58.1    35.6

LVQ       59.1    66.2    51.4           40.2    52.2    26.1
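Analysing main and trace elements separately amounts to working with two subcompositions. The sketch below assumes each subset is re-closed to sum to one before any log-ratio transform, which is one common choice and not necessarily the one used in the experiment. The element names and concentrations are hypothetical.

```python
def subcomposition(sample, names, total=1.0):
    """Keep only `names` and re-close the selected parts so they sum to `total`."""
    s = sum(sample[n] for n in names)
    return {n: total * sample[n] / s for n in names}

# Hypothetical sample: oxide and element names and values are illustrative only.
sample = {
    "SiO2": 55.0, "Al2O3": 18.0, "Fe2O3": 8.0, "CaO": 6.0,
    "Rb": 0.012, "Sr": 0.030, "Zr": 0.020,
}
main_names = ["SiO2", "Al2O3", "Fe2O3", "CaO"]
trace_names = ["Rb", "Sr", "Zr"]

main = subcomposition(sample, main_names)
trace = subcomposition(sample, trace_names)

# Ratios between parts are unchanged by taking a subcomposition.
ratio_before = sample["Rb"] / sample["Sr"]
ratio_after = trace["Rb"] / trace["Sr"]
```

Because ratios survive the re-closure, log-ratio methods give consistent results on the subsets, while raw concentrations in each subset would change with the closure.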



Last words....

Data transformation should be used only when fully understood and in line with the analysis objective.

Analysis results may be misleading due to incomplete data; exploratory analysis prior to any other analysis is therefore crucial in gaining insight into the problem.

Machine learning techniques and statistical analysis can be very useful if used appropriately:

It is necessary to consider the assumptions each method imposes

It is important to maintain consistency

Ensure there are no conflicting constraints



Thank you for your attention!

Comments and Questions are Welcome!


