caa 2014 - to boldly or bravely go? experiences of using semantic technologies for archaeological...
Post on 11-Jun-2015
by Keith May (@Keith_May), Ceri Binding & Prof Doug Tudhope
Faculty of Advanced Technology, University of South Wales
To Boldly or Bravely Go? Experiences of using Semantic Technologies for Archaeological Resources
Excavation record data modelling
• CRM-EH focuses on common ‘core’ Concepts of our Archaeological processes
• Stratigraphic relationships (e.g. Harris matrix) crucial for relating individual records
• Only a limited degree of the minute archaeological detail is mapped to CIDOC CRM
• Different broad categories of contexts (Deposits, Masonry, Timber, etc) handled by separate forms but modelled together
• Model already "complex" enough - most archaeologists find it a little daunting
Details of Context on recording form
What about comparing records across different countries?
With thanks to Anja Masur
Documentation
• Different excavation methods bring differing documentation
• Comparison of different documentation sheets
Similarities and Differences
Context
Locus
Excavation Unit
Lot
Level
Stratum
Behälter (Troy) (Basket)
Semantics: one language - one meaning - different terms
Stratigraphic Unit
With thanks to Gerald Hiebel
English Heritage Recording Manual
English Heritage Recording Manual with CRM-EH 'Extensions'
German - e.g. Göttingen & Bayer
Befunde - Stratigraphic Unit / Context
1. Bayer -Befundbuch (positive deposit?)
Bodenbefunde (soil SU)
Baubefunde (built SU e.g. Walls)
BefundeKomplex - Feature (Group)
Planum = Multi-context plans by level?
With thanks to Gerald Hiebel
Bavarian Recording Manual
Çatalhöyük - Hodder's 'Post-Processual' excavation recording
Units - Stratigraphic units, similar to Contexts
Features - groupings of units or more complex structures, similar to MoLA Groups
French - e.g. ???? Please !!!!
Examples using Single Context Recording methodology?
INRAP, n'est-ce pas?
Other excavation methodologies?
Prototype Controlled Vocabulary searching
▪ Controlled vocabularies online
  ▪ Vocabularies from EH, RCAHMS, RCAHMW
  ▪ Conversion to a common standard format (SKOS)
  ▪ Persistent globally unique identifiers for every concept
  ▪ Made available online as Linked Open Data
  ▪ Also downloadable data files and listings
▪ Web services
  ▪ Facilitate concept searching, browsing, suggestion, validation
▪ Tools to use controlled vocabularies
  ▪ Browser-based ‘widget’ user interface controls
  ▪ Search, browse, suggest, select concepts
▪ Case studies
  ▪ Legacy data to thesaurus alignment
  ▪ Thesaurus to thesaurus alignment
  ▪ Third party use of project outcomes
STELLAR Project Tools - SKOS Template
SKOS = Simple Knowledge Organisation System
Using SKOS - W3C standard for Web-based Terminologies
Diagram: skos:Concept Castle:c789 and skos:Concept Motte:c456, linked by skos:broader and skos:narrower
Diagram: skos:Concept Bailey:c789 and skos:Concept Motte:c456, linked by skos:related in both directions
Diagram: skos:Concept Motte:c456 linked to skos:ConceptScheme Monument:s123 by skos:inScheme
SKOS_CONCEPTS – scheme_id, broader_id, related_id
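The SKOS relations on this slide can be sketched as rows of a concepts table that are expanded into triples. This is only an illustration, not the STELLAR template itself: the `id` and `label` columns, the identifier `c123` for Castle (used here so that all three concepts have distinct identifiers), and the choice of Castle as the broader concept of Motte are assumptions.

```python
# Hypothetical rows; only scheme_id, broader_id and related_id are named on the slide.
rows = [
    {"id": "c123", "label": "Castle", "scheme_id": "s123", "broader_id": None, "related_id": None},
    {"id": "c456", "label": "Motte", "scheme_id": "s123", "broader_id": "c123", "related_id": "c789"},
    {"id": "c789", "label": "Bailey", "scheme_id": "s123", "broader_id": None, "related_id": "c456"},
]


def rows_to_triples(rows):
    """Expand table rows into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        cid = row["id"]
        triples.add((cid, "rdf:type", "skos:Concept"))
        triples.add((cid, "skos:prefLabel", row["label"]))
        triples.add((cid, "skos:inScheme", row["scheme_id"]))
        if row["broader_id"]:
            # skos:broader and skos:narrower are inverses, so assert both directions.
            triples.add((cid, "skos:broader", row["broader_id"]))
            triples.add((row["broader_id"], "skos:narrower", cid))
        if row["related_id"]:
            # skos:related is symmetric.
            triples.add((cid, "skos:related", row["related_id"]))
            triples.add((row["related_id"], "skos:related", cid))
    return triples


triples = rows_to_triples(rows)
for subject, predicate, obj in sorted(triples):
    print(subject, predicate, obj)
```

Storing only `broader_id` and deriving `skos:narrower` keeps the table small while still exposing both directions to anything that browses the hierarchy.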
Vocabulary Widgets – e.g. for OASIS
▪ Scheme list
▪ Scheme details
▪ Top concepts
▪ Composite control
More Widget details on HeritageData.org
LOD Heritage Vocabularies: http://www.heritagedata.org
Thesaurus searching and browsing
SENESCHAL - Semantic ENrichment Enabling Sustainability of arCHAeological Links
Early adoption (continued)
▪ Clwyd-Powys Archaeological Trust (SENESCHAL widgets embedded into HER application and mobile field recording app)
British Oceanographic Data Centre - LOD
EH Thesauri of Maritime Craft
With Thanks to Adam Leadbetter
Typical alignment problems encountered
▪ Simple spelling errors
  ▪ “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”
▪ Alternate word forms
  ▪ “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”
▪ Prefixes / suffixes
  ▪ “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)”
▪ Nested delimiters
  ▪ “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”
▪ Terms not intended for indexing
  ▪ “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”
▪ Terms that would not be in (any) thesauri
  ▪ “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”
▪ More specific phrases
  ▪ “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
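Some of the problems listed above (nested delimiters, bracketed qualifiers, trailing question marks, placeholder values) could be reduced by a clean-up pass before any matching. The following is a minimal sketch of such a pass, not the project's actual procedure; the function name `candidate_terms` and the exact rules are assumptions made for illustration.

```python
import re

# Placeholder values taken from the examples on the slide.
NON_INDEX_TERMS = {"NONE", "N/A", "NA", "UNIDENTIFIED OBJECT", "INCOHERENT"}


def candidate_terms(raw):
    """Split a legacy field value into cleaned candidate index terms."""
    terms = []
    for part in raw.split(","):  # nested delimiters: several terms in one value
        term = part.strip().upper()
        term = re.sub(r"\s*\([^)]*\)\s*$", "", term)  # trailing "(POSSIBLE)" etc.
        term = term.rstrip("?").strip()  # trailing uncertainty marker
        if term and term not in NON_INDEX_TERMS:
            terms.append(term)
    return terms


print(candidate_terms("POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS"))
print(candidate_terms("PORTAL DOLMEN (RE-ERECTED)"))
print(candidate_terms("CROFT?"))
print(candidate_terms("N/A"))
```

A pass like this discards the qualifier, so a real workflow would probably want to record "(POSSIBLE)" or "?" as a separate certainty flag rather than lose it. Spelling errors and free-text phrases are untouched here and still need fuzzy matching or human review.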
Data alignment - R&D approach
▪ Levenshtein edit distance algorithm
  ▪ Measures the minimum number of character edits required to change one string into another
  ▪ Accommodates small spelling differences/errors
▪ Bulk alignment process
  ▪ Compares each value to all terms from the specified thesaurus to obtain the best textual match
  ▪ Similarity threshold introduced to suppress low-scoring matches. The Levenshtein algorithm will always produce a match, even if it is a bad one!
▪ Periods require an additional approach due to mixed formats (named periods, numeric ranges etc.)
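The bulk alignment idea above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the normalisation of distance into a 0 to 1 similarity score, the default threshold of 0.8, and the short thesaurus list are all assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale edit distance to 0..1 by the longer string's length (an assumed scoring)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(value, thesaurus_terms, threshold=0.8):
    """Return (term, score) for the closest term, or (None, score) below the threshold."""
    if not thesaurus_terms:
        return None, 0.0
    score, term = max((similarity(value.upper(), t.upper()), t) for t in thesaurus_terms)
    return (term, score) if score >= threshold else (None, score)


# Hypothetical thesaurus terms for illustration only.
terms = ["POST HOLE", "CESS PIT", "FURROW", "VILLA"]
for value in ["CESS PITT", "POSTHLOLE", "WOTSITS PACKET"]:
    print(value, "->", best_match(value, terms))
```

With these assumed settings "CESS PITT" aligns to "CESS PIT", while "POSTHLOLE" is two edits from "POST HOLE" and falls just under 0.8, which shows why the threshold has to be tuned against real data and why rejected values still need human review.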
Data Alignment R&D Results – Monument Types
Needs some level of human verification by domain experts. Do we need semantic wiki-style interfaces to enable that?
Conclusions and Challenges - Do you want to share Open Archaeological Data somewhere on or over the horizon?
Different archaeological recording systems share common conceptual frameworks and semantic relationships
By conceptualising common relationships in our different data sets at a broad (metadata) level, and aligning vocabularies of shared reference terms, we can cross-search data with more semantic accuracy to find patterns and answers to related research questions
The technologies are being developed in other domains, but is there a common will for sharing archaeological data openly in the interests of improving research methods?
References
Catalin Pavel. "Describing and Interpreting the Past"
Tudhope, May, Binding, Vlachidis. "Connecting Archaeological Data and Grey Literature via Semantic Cross Search" - Internet Archaeology Vol 30
Contact: Keith.May@english-heritage.org.uk
@Keith_May