Maria Theodoridou: Semantic Integration Experiments



ARIADNE is funded by the European Commission's Seventh Framework Programme

Semantic Integration experiments
Improving Interoperability and Reusability

Unlocking the Potential of Digital Archaeological Data
Florence, 15 December 2016

Maria Theodoridou
FORTH-ICS, Greece

The challenge

Build an Integrated Knowledge Repository and support innovative reasoning on archaeological datasets (relating and combining data), preserving the original meaning and the perspective of the different data providers.

Two main pillars:

Ø A global, extensible schema in the form of a formal ontology that allows for integration without loss of meaning
• ARIADNE Reference Model = CIDOC CRM + Extension Suite

Ø Common vocabularies/terminologies
• Use of well-established standard terminologies
• Getty AAT
• Nomisma.org

ARIADNE  Reference  Model  

Few concepts, high recall

Special concepts, high precision
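The trade-off on this slide (few general concepts give high recall, specialised concepts give high precision) can be illustrated with a minimal Python sketch. The class names, the hierarchy and the records below are hypothetical and are not taken from the ARIADNE Reference Model; the sketch only shows that querying a general class retrieves more records than querying a specialised subclass.

```python
# Hypothetical class hierarchy: child -> parent (None marks the root).
SUBCLASS_OF = {
    "Thing": None,
    "PhysicalObject": "Thing",
    "Coin": "PhysicalObject",
    "Sculpture": "PhysicalObject",
}

# Hypothetical records, each typed with its most specific known class.
RECORDS = [
    ("rec1", "Coin"),
    ("rec2", "Sculpture"),
    ("rec3", "Coin"),
    ("rec4", "PhysicalObject"),
]


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF[cls]
    return False


def instances_of(ancestor):
    """Return the ids of all records whose class falls under ancestor."""
    return [rid for rid, cls in RECORDS if is_a(cls, ancestor)]


# A general concept matches every record (high recall).
broad = instances_of("PhysicalObject")
# A specialised concept matches only the coins (high precision).
narrow = instances_of("Coin")
```

In this toy setting a query on the general class still finds records that a provider could only describe loosely (such as `rec4`), while the specialised class filters down to exactly the items of interest.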

Case Studies

Ø Numismatics
• A traditional science with experience and initiatives in standardization, so it was chosen as a very good starting point for item-level integration
• Nomisma.org serves as an authoritative resource

Ø Wood/Dendrochronology
• Integration of information from diverse datasets and (via NLP) archaeological reports in different languages
• Getty AAT serves as an authoritative resource

Ø Sculptures
• Data integration of sources from various disciplines, including sculpture information and its archaeological context
• Focuses on the provenance of information according to bibliographic references, which leads to advanced literature research

Numismatics Case Study

Extracts of 5 diverse databases & datasets:

Ø OEAW: dFMRO coin archive, 72 records
Ø COINS Project: SAR Archive, 627 records
Ø COINS Project: FWM Archive
Ø iDAI Coins Pergamon, 517 records
Ø CulturaItalia: MuseiD-Italia, 25,562 records
Ø NLP data from Heslington East Excavation Archive, 37 records
Ø ACDM records

Numismatics Case Study

Wood/Dendrochronology Case Study

• Extracts of 5 archaeological datasets, output from NLP on 25 grey literature reports
• Multilingual: English, Dutch and Swedish data
• Data integration via CIDOC CRM and Getty AAT
• 1.09 million RDF triples
• 23,594 records
• 37,935 objects
• Demonstration query builder for easier cross-search and browse of integrated datasets

Wood/Dendrochronology Case Study

[Diagram labels: Archaeological datasets (DCCD; ADS, DANS, SND; VAG cruck, NMS, VAG dendro, UNID); Grey literature (NLP, XML); tabular records; Cleansing + Normalisation (OpenRefine); Transformation (STELETO); Transformation (XSLT); Getty AAT (RDF), direct import; RDF triple store; SPARQL queries; Demonstration application: Query Builder]
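The cleansing, normalisation and transformation steps named in the diagram can be sketched in Python as follows. This is only an illustration under stated assumptions, not the OpenRefine or STELETO processing used in the case study: the CSV columns, the sample rows, the `example.org` URIs and the vocabulary lookup are all hypothetical, and the use of the CIDOC CRM property `P45_consists_of` for the material is an assumption about how such a mapping might look.

```python
import csv
import io

# Hypothetical tabular extract; column names and values are invented.
RAW = """id,material
S1, Oak 
S2,OAK
S3,pine
"""

# Hypothetical lookup from cleaned labels to vocabulary concept URIs.
# These are placeholders, not real Getty AAT identifiers.
VOCAB = {
    "oak": "http://example.org/vocab/oak",
    "pine": "http://example.org/vocab/pine",
}

BASE = "http://example.org/sample/"  # hypothetical namespace for records
# Assumed mapping target: CIDOC CRM property "P45 consists of".
CONSISTS_OF = "http://www.cidoc-crm.org/cidoc-crm/P45_consists_of"


def normalise(label):
    """Cleansing step: trim whitespace and unify letter case."""
    return label.strip().lower()


def rows_to_triples(text):
    """Transformation step: turn each tabular row into one RDF-like triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        uri = VOCAB.get(normalise(row["material"]))
        if uri is None:
            continue  # skip terms that have no vocabulary match
        triples.append((BASE + row["id"].strip(), CONSISTS_OF, uri))
    return triples


triples = rows_to_triples(RAW)
```

After normalisation, the differently spelled values " Oak " and "OAK" resolve to the same concept URI, which is what makes a later cross-search over records from different providers possible.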

Sculptures Case Study

• Extracts of 5 diverse databases & datasets:
– Archaeological object database: Arachne
– Field research databases: Athenian Agora, iDAI.field
– Museum data: British Museum
– Research data: Oxford Roman Economy Project

• Data integration via CIDOC CRM and controlled vocabularies: Getty AAT, Wikidata, Zenon, iDAI.gazetteer

• 5.44 million triples
• 58,343 records

Sculptures Case Study

Integration & Interoperability

[Diagram labels: Content Providers; Provider records; Provider dataset descriptions; ARIADNE aggregation infrastructure; ACDM records; Catalog; ARIADNE portal; Browse the Catalog; NLP; NLP records; X3ML Mapping Framework (mapping provider dataset records to CIDOC CRM; mapping ACDM records to CIDOC CRM); Integrated Knowledge Repository; Integrated Browse/Query Interface]

Integrated Knowledge Repository

Experimental integrated knowledge repository

Ø Numismatics Case Study: 1.2M triples
Ø Wood/Dendrochronology Case Study: 1.5M triples
Ø Sculptures Case Study: 5.5M triples
Ø AAT thesaurus: 4.4M triples

Total ~ 13M triples

Contains different levels of information:

Ø Item specific information
Ø Document research data
Ø NLP data
Ø Catalog information

Technologies used:

http://www.metaphacts.com/

https://www.blazegraph.com/
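As a quick cross-check of the repository size quoted on this slide, the four per-source counts can be added up in a few lines of Python. The figures are the rounded values from the slide; the dictionary keys are just labels chosen for this sketch.

```python
# Triple counts in millions, as quoted on the slide (rounded values).
TRIPLES_MILLIONS = {
    "Numismatics Case Study": 1.2,
    "Wood/Dendrochronology Case Study": 1.5,
    "Sculptures Case Study": 5.5,
    "AAT thesaurus": 4.4,
}

# Sum of the four parts, still in millions of triples.
total_millions = sum(TRIPLES_MILLIONS.values())

# Rounded to the nearest million for comparison with the slide's total.
rounded_total = round(total_millions)
```

The parts add up to about 12.6 million, which is consistent with the approximate total of 13M triples given on the slide.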

Research questions

Ø Query mechanisms support innovative reasoning on archaeological datasets

Ø Query power lies in relating and combining
• data from different providers, preserving the original meaning and their perspective
• data from grey literature reports
• item level with catalog info on archaeological datasets

Research questions

Ø Find all bronze coins (item level info, retrieves datasets from multiple providers)

Ø Find the publishers of all collections that contain coins (catalog info)

Ø Find all datasets and grey literature reports that contain bronze antoninianus (item level, NLP data and catalog info)

 

[Figure labels: SAR records; NLP record; CulturaItalia records; DAI record; OEAW records; Catalog info]
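The first two research questions above can be sketched as simple pattern matching over a handful of triples. This is a toy Python illustration, not the SPARQL queries or the repository used in the experiments: every identifier, predicate name and provider label below is hypothetical, and a real query would use CIDOC CRM properties and vocabulary URIs instead of plain strings.

```python
# Hypothetical (subject, predicate, object) triples from three providers.
TRIPLES = [
    ("a:coin1", "type", "coin"),
    ("a:coin1", "material", "bronze"),
    ("a:coin1", "in_collection", "cat:a"),
    ("b:coin7", "type", "coin"),
    ("b:coin7", "material", "silver"),
    ("b:coin7", "in_collection", "cat:b"),
    ("c:coin3", "type", "coin"),
    ("c:coin3", "material", "bronze"),
    ("c:coin3", "in_collection", "cat:c"),
    ("cat:a", "publisher", "Provider A"),
    ("cat:b", "publisher", "Provider B"),
    ("cat:c", "publisher", "Provider C"),
]


def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects reachable from subject via predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


# Question 1: all bronze coins, regardless of provider (item level info).
bronze_coins = subjects("type", "coin") & subjects("material", "bronze")

# Question 2: publishers of all collections that contain coins
# (joins item level info with catalog info).
coin_publishers = set()
for coin in subjects("type", "coin"):
    for collection in objects(coin, "in_collection"):
        coin_publishers |= objects(collection, "publisher")
```

The second question shows where the query power comes from: the answer needs a join between item level records and catalog descriptions, which only works once both sit in the same integrated graph.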

Contributing partners

Achille Felicetti, PIN

Carlo Meghini, CNR-ISTI

Philipp Gerth, DAI

Ceri Binding, USW

Douglas Tudhope, USW

Andreas Vlachidis, USW

Nadezhda Kecheva, NIAM-BAS

Sara di Giorgio, ICCU

Edeltraud Aspoeck, OEAW

Anja Masur, OEAW

ARIADNE is a project funded by the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193. The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

Thank you