Linked Data, Cultural Heritage & the Karma Mapping Software

Linked Data & Cultural Heritage
Pedro Szekely and Craig Knoblock
USC/Information Sciences Institute
http://isi.edu/integration/karma
February 2015


Page 1: Linked Data, Cultural Heritage & the Karma Mapping Software

Linked Data & Cultural Heritage

Pedro Szekely and Craig Knoblock

USC/Information Sciences Institute

http://isi.edu/integration/karma

February 2015

Page 2: Linked Data, Cultural Heritage & the Karma Mapping Software

Outline

•  Problem

•  Linked Data

•  Karma

•  Reconciliation

•  Next steps

CC-By 2.0 2 USC Information Sciences Institute

Page 3: Linked Data, Cultural Heritage & the Karma Mapping Software

CURRENT STATE OF CULTURAL HERITAGE DATA


Page 4: Linked Data, Cultural Heritage & the Karma Mapping Software

Humans Browsing the Web

•  Crystal Bridges Museum of American Art

•  Dallas Museum of Art

•  Indianapolis Museum of Art

•  The Metropolitan Museum of Art

•  National Portrait Gallery

•  Smithsonian American Art Museum

Page 5: Linked Data, Cultural Heritage & the Karma Mapping Software

WHAT WE SEE


Page 6: Linked Data, Cultural Heritage & the Karma Mapping Software

WHAT THE COMPUTER SEES

blah blah blah blah blah blah blah blah blah blah …

Page 7: Linked Data, Cultural Heritage & the Karma Mapping Software

WEB PAGES ARE UNUSABLE FOR CREATING INNOVATIVE APPLICATIONS USING THE DATA

Page 8: Linked Data, Cultural Heritage & the Karma Mapping Software

SOLUTION: Linked Open Data

“web pages for computers”

using W3C standards for publishing data


Page 9: Linked Data, Cultural Heritage & the Karma Mapping Software

Tim Berners-Lee on Linked Open Data

http://youtu.be/OM6XIICm_qo

Page 10: Linked Data, Cultural Heritage & the Karma Mapping Software

Humans Browsing the Web

•  Crystal Bridges Museum of American Art

•  Dallas Museum of Art

•  Indianapolis Museum of Art

•  The Metropolitan Museum of Art

•  National Portrait Gallery

•  Smithsonian American Art Museum

Page 11: Linked Data, Cultural Heritage & the Karma Mapping Software

RAW DATA NOW

Page 12: Linked Data, Cultural Heritage & the Karma Mapping Software

Publish Your Raw Data

•  Crystal Bridges Museum of American Art

•  Dallas Museum of Art

•  Indianapolis Museum of Art

•  The Metropolitan Museum of Art

•  National Portrait Gallery

•  Smithsonian American Art Museum

Page 13: Linked Data, Cultural Heritage & the Karma Mapping Software

Examples of Raw Data Now

https://github.com/cooperhewitt/collection

https://github.com/IMAmuseum/ima-collection

Page 14: Linked Data, Cultural Heritage & the Karma Mapping Software

Convert Data to CRM (2 star)

•  Crystal Bridges Museum of American Art

•  Dallas Museum of Art

•  Indianapolis Museum of Art

•  The Metropolitan Museum of Art

•  National Portrait Gallery

•  Smithsonian American Art Museum

Page 15: Linked Data, Cultural Heritage & the Karma Mapping Software

Linked Museum Data (3 star)

•  Crystal Bridges Museum of American Art

•  Dallas Museum of Art

•  Indianapolis Museum of Art

•  The Metropolitan Museum of Art

•  National Portrait Gallery

•  Smithsonian American Art Museum

Page 16: Linked Data, Cultural Heritage & the Karma Mapping Software

Linked Cultural Heritage Data (4 star)


Page 17: Linked Data, Cultural Heritage & the Karma Mapping Software

Represent Resources Using URIs

http://szekelys.com/family#pedro

“Pedro”

http://xmlns.com/foaf/0.1/firstName

Page 18: Linked Data, Cultural Heritage & the Karma Mapping Software

Represent Information as Triples

•  Subject … the resource being described: http://szekelys.com/family#pedro

•  Predicate … a property of the resource: http://xmlns.com/foaf/0.1/firstName

•  Object … the value of the property: “Pedro”
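As a minimal sketch of the subject/predicate/object idea, the Python below stores the slide's triple as a plain tuple and prints it in N-Triples syntax. The `Triple` type and the `to_ntriples` helper are illustrative names, not part of any library, and literal escaping is deliberately simplified.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; obj_is_literal marks a plain string value."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (simplified escaping)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# The triple from the slide: subject, predicate, and a literal object.
pedro = Triple(
    subject="http://szekelys.com/family#pedro",
    predicate="http://xmlns.com/foaf/0.1/firstName",
    obj="Pedro",
    obj_is_literal=True,
)
print(to_ntriples(pedro))
```

URIs go inside angle brackets and literal values inside double quotes, which is how the subject and predicate stay distinguishable from the value "Pedro".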

Page 19: Linked Data, Cultural Heritage & the Karma Mapping Software

RDF Graphs

http://szekelys.com/family#pedro  foaf:firstName  “Pedro”

http://szekelys.com/family#pedro  rdf:type  foaf:Person

http://szekelys.com/family#pedro  foaf:homepage  http://isi.edu/~szekely
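An RDF graph is just a set of triples that share nodes. The sketch below assumes that all three edges on the slide start at the pedro node, and shows a hypothetical `match` helper that answers simple pattern queries over an in-memory set. It is an illustration of the data model, not a triple store.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEDRO = "http://szekelys.com/family#pedro"

# Assumption: every edge in the slide's graph has PEDRO as its subject.
graph = {
    (PEDRO, FOAF + "firstName", "Pedro"),
    (PEDRO, RDF + "type", FOAF + "Person"),
    (PEDRO, FOAF + "homepage", "http://isi.edu/~szekely"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Follow the foaf:homepage edge from the pedro node.
homepages = [o for _, _, o in match(graph, s=PEDRO, p=FOAF + "homepage")]
print(homepages)
```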

Page 20: Linked Data, Cultural Heritage & the Karma Mapping Software

Linked Open Data


Page 21: Linked Data, Cultural Heritage & the Karma Mapping Software

Steps to Create Linked Open Data


Page 22: Linked Data, Cultural Heritage & the Karma Mapping Software

Steps to Create Linked Open Data

•  Publish the raw data … get the data out of the proprietary database

•  Select ontologies … that define classes and properties for our data

•  Define URI scheme … identifiers of your resources

•  Convert data to RDF … from data sources to the ontologies

•  Identify links to other Linked Data datasets … aka reconciliation, entity resolution, …

Page 23: Linked Data, Cultural Heritage & the Karma Mapping Software

•  Select ontologies … that define classes and properties for our data

CIDOC CRM

http://www.cidoc-crm.org/

Page 24: Linked Data, Cultural Heritage & the Karma Mapping Software

•  Define URI scheme … identifiers of your resources

Page 25: Linked Data, Cultural Heritage & the Karma Mapping Software

•  Define URI scheme … identifiers of your resources

http://edan.si.edu/saam/person-institution/8
http://edan.si.edu/saam/person-institution/8/id
http://edan.si.edu/saam/person-institution/8/appellation/displayname
http://edan.si.edu/saam/object/12
http://edan.si.edu/saam/object/12/title
http://edan.si.edu/saam/object/12/id
http://edan.si.edu/saam/object/12/acquisition
http://edan.si.edu/saam/object/12/production
http://edan.si.edu/saam/object/12/production/date
http://edan.si.edu/saam/thesauri/nationality/American
http://edan.si.edu/saam/thesauri/classification/Photography
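The example URIs follow a regular pattern: a base, a kind of resource, an identifier, and optional sub-paths. A minimal sketch of a function that mints URIs in that shape is shown below. The `mint_uri` name and its signature are assumptions made for illustration; the slides do not say how the URIs were actually generated.

```python
from urllib.parse import quote

BASE = "http://edan.si.edu/saam"  # base taken from the slide's examples


def mint_uri(kind: str, identifier, *path: str) -> str:
    """Build a URI of the form BASE/kind/identifier[/path...]."""
    parts = [kind, str(identifier), *path]
    # Percent-encode each segment so spaces or slashes in values stay safe.
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


print(mint_uri("object", 12))
print(mint_uri("object", 12, "production", "date"))
print(mint_uri("thesauri", "nationality", "American"))
```

Keeping URI construction in one function makes the scheme easy to change later without touching the mapping logic.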

Page 26: Linked Data, Cultural Heritage & the Karma Mapping Software

•  Convert data to RDF … from data sources to the ontologies

Page 27: Linked Data, Cultural Heritage & the Karma Mapping Software

RDF Mapping Tools

•  custom code … shortcomings: labor intensive, error prone; benefits: flexible

•  R2RML … shortcomings: difficult to learn, only SQL databases; benefits: W3C standard, good documentation, multiple vendors

•  Open Refine … shortcomings: no guidance, only tabular data; benefits: graphical user interface, support for reconciliation, open source

•  Karma … shortcomings: university product; benefits: easy to use, flexible, multiple data formats, multiple deployment databases, scalable, R2RML compatible, open source
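To make the "custom code" option concrete, here is a minimal sketch of a hand-written mapping from a CSV export to triples. The column names and the sample row are hypothetical, and Dublin Core terms stand in for whichever ontology a project actually selects. Every new column or source needs more code like this, which is the labor the comparison refers to.

```python
import csv
import io

DCT = "http://purl.org/dc/terms/"
BASE = "http://edan.si.edu/saam/object/"  # URI pattern from the slides

# Hypothetical export: column names and values are made up for illustration.
RAW = "id,title\n12,Untitled study\n"

# Column name -> property URI. This dictionary is the whole "mapping".
COLUMN_TO_PROPERTY = {
    "id": DCT + "identifier",
    "title": DCT + "title",
}


def rows_to_triples(text):
    """Yield (subject, predicate, literal) for every mapped, non-empty cell."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]
        for column, prop in COLUMN_TO_PROPERTY.items():
            if row[column]:
                yield (subject, prop, row[column])


triples = list(rows_to_triples(RAW))
for t in triples:
    print(t)
```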

Page 28: Linked Data, Cultural Heritage & the Karma Mapping Software

Karma

Interactive tool for rapidly extracting, cleaning, transforming, integrating & publishing linked data in multiple formats

[Diagram labels: XML/JSON, Services, SQL/CSV, BigData, Ontology, Karma, RDF, JSON]

Page 29: Linked Data, Cultural Heritage & the Karma Mapping Software

KARMA DEMO


http://youtu.be/h3_yiBhAJIc

Page 30: Linked Data, Cultural Heritage & the Karma Mapping Software

Easy To Use

easy to use, flexible, multiple data formats, multiple deployment databases, scalable, R2RML compatible, open source

CLEAR DEPICTION OF MAPPING

Page 31: Linked Data, Cultural Heritage & the Karma Mapping Software

LEARNS TO MAP YOUR DATA

Page 32: Linked Data, Cultural Heritage & the Karma Mapping Software

SUGGESTS CORRECT ADJUSTMENTS

Page 33: Linked Data, Cultural Heritage & the Karma Mapping Software

EMBEDDED PYTHON SCRIPTING
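Scripting is typically used for small cleaning steps before values are mapped. The sketch below is a standalone Python function, not Karma's scripting API: it zero-pads dates written like the "1856-1-12" values that appear in the museum records later in this deck. The `normalize_date` name is hypothetical.

```python
def normalize_date(value: str) -> str:
    """Zero-pad a year-month-day string such as "1856-1-12" to ISO 8601.

    Values that are not three dash-separated integers are returned unchanged.
    """
    parts = value.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return value
    year, month, day = (int(p) for p in parts)
    return f"{year:04d}-{month:02d}-{day:02d}"


print(normalize_date("1856-1-12"))
print(normalize_date("1925-4-15"))
print(normalize_date("1856-1925"))  # a date range, left untouched
```

Returning unrecognized values unchanged is a deliberate choice here: a cleaning step should not silently destroy data it does not understand.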

Page 34: Linked Data, Cultural Heritage & the Karma Mapping Software

IMPORT POPULAR DATA FORMATS

Page 35: Linked Data, Cultural Heritage & the Karma Mapping Software

OUTPUT RDF IN MULTIPLE FORMATS

•  N-Triples

•  JSON

•  Avro

•  SPARQL

•  ElasticSearch, GitHub, …

•  Hadoop, BigData

Page 36: Linked Data, Cultural Heritage & the Karma Mapping Software

40 million documents, 1 billion triples

larger than all AAC museums combined

Page 37: Linked Data, Cultural Heritage & the Karma Mapping Software

periodic update: every hour, every day

continuous update: as new records come in

Page 38: Linked Data, Cultural Heritage & the Karma Mapping Software

Karma is compatible with R2RML tools

Page 39: Linked Data, Cultural Heritage & the Karma Mapping Software

Karma Is Open Source

Page 40: Linked Data, Cultural Heritage & the Karma Mapping Software

URI RECONCILIATION

Page 41: Linked Data, Cultural Heritage & the Karma Mapping Software

Multiple “John Singer Sargent”

ima:Singer_Sargent_John a aac:Person ;
    dct:date "1856-1925" ;
    foaf:name "John Singer Sargent" .

saam:person_4253 a aac:Person ;
    saam:associatedPlace saam:SaamPlace_1357324439768t1r13950_0, saam:SaamPlace_1357324439768t1r13951_0 ;
    saam:constituentId "4253" ;
    rdaGr2:biographicalInformation "Painter. Sargent traveled …" ;
    rdaGr2:dateAssociatedWithThePerson "1990-10-1", "1995-5-8" ;
    rdaGr2:dateOfBirth "1856-1-12" ;
    rdaGr2:dateOfDeath "1925-4-15" ;
    rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ;
    rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ;
    skos:altLabel "John S. Sargent" ;
    skos:prefLabel "John Singer Sargent" .

cb:12_4567 a aac:Person ;
    ont0:dateOfBirth "1879", "1885" ;
    ont0:dateOfDeath "1925" ;
    skos:prefLabel "John Singer Sargent" .

met:person_1893_3819 a aac:Person ;
    ont0:placeOfResidence "North and Central America", "United States" ;
    foaf:name "John Singer Sargent" .

dma:person_John_Singer_Sargent a aac:Person ;
    ont0:dateOfBirth "1856" ;
    ont0:dateOfDeath "1925" ;
    foaf:name "John Singer Sargent" .

Page 42: Linked Data, Cultural Heritage & the Karma Mapping Software

John Singer Sargent

ima:SaamPerson_John_Singer_Sargent a aac:Person ;
    dct:date "1856-1925" ;
    foaf:name "John Singer Sargent" .

aac:Person_4253 a aac:Person ;
    saam:associatedPlace saam:SaamPlace_1357324439768t1r13950_0, saam:SaamPlace_1357324439768t1r13951_0 ;
    saam:constituentId "4253" ;
    rdaGr2:biographicalInformation "Painter. Sargent traveled …" ;
    rdaGr2:dateAssociatedWithThePerson "1990-10-1", "1995-5-8" ;
    rdaGr2:dateOfBirth "1856-1-12" ;
    rdaGr2:dateOfDeath "1925-4-15" ;
    rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ;
    rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ;
    skos:altLabel "John S. Sargent" ;
    skos:prefLabel "John Singer Sargent" .

cb:SaamPerson_John_Singer_Sargent a aac:Person ;
    ont0:dateOfBirth "1879", "1885" ;
    ont0:dateOfDeath "1925" ;
    skos:prefLabel "John Singer Sargent" .

met:SaamPerson_John_Singer_Sargent a aac:Person ;
    ont0:placeOfResidence "North and Central America", "United States" ;
    foaf:name "John Singer Sargent" .

dallas:SaamPerson_John_Singer_Sargent a aac:Person ;
    ont0:dateOfBirth "1856" ;
    ont0:dateOfDeath "1925" ;
    foaf:name "John Singer Sargent" .

Page 43: Linked Data, Cultural Heritage & the Karma Mapping Software

Reconciled “John Singer Sargent” URIs

saam:person_4253
    owl:sameAs cb:12_4567 ;
    owl:sameAs dma:person_John_Singer_Sargent ;
    owl:sameAs ima:Singer_Sargent_John ;
    owl:sameAs met:SaamPerson_John_Singer_Sargent ;
    owl:sameAs dbpedia:John_Singer_Sargent ;
    owl:sameAs nytimes/N49129220686803623753 ;
    owl:sameAs w-flick/John_Singer_Sargent ;
    ....
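The simplest way to picture reconciliation is to group records whose labels agree and emit `owl:sameAs` links within each group. The sketch below does only that, using URIs and labels copied from the museum records in this deck. Matching on the name alone is a deliberate simplification for illustration and is not a description of how Karma links records; a real matcher would also weigh evidence such as birth and death dates, which the records above show can disagree.

```python
from collections import defaultdict

# (URI, label) pairs copied from the records shown in the slides.
RECORDS = [
    ("saam:person_4253", "John Singer Sargent"),
    ("cb:12_4567", "John Singer Sargent"),
    ("dma:person_John_Singer_Sargent", "John Singer Sargent"),
    ("ima:Singer_Sargent_John", "John Singer Sargent"),
    ("met:person_1893_3819", "John Singer Sargent"),
]


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def same_as_links(records):
    """Link every record to the first record that shares its normalized label."""
    groups = defaultdict(list)
    for uri, label in records:
        groups[normalize(label)].append(uri)
    links = []
    for uris in groups.values():
        anchor, *others = uris
        links.extend((anchor, "owl:sameAs", other) for other in others)
    return links


links = same_as_links(RECORDS)
for link in links:
    print(*link)
```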

Page 44: Linked Data, Cultural Heritage & the Karma Mapping Software

URI Reconciliation In Karma


Page 45: Linked Data, Cultural Heritage & the Karma Mapping Software

Results of Automatic Linking

•  99% are correct

•  6% are missing
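One way to read these two figures, offered as an assumption since the slide does not define them, is as precision (99% of the links produced are correct) and recall (6% of the true links are missing, so 94% were found). Under that reading the two can be combined into a single F1 score, as in the short calculation below.

```python
precision = 0.99      # "99% are correct", read as precision (assumption)
recall = 1.0 - 0.06   # "6% are missing", read as 94% recall (assumption)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```

Under these assumptions F1 comes out at roughly 0.96, pulled toward the lower of the two figures, which here is recall.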

Page 46: Linked Data, Cultural Heritage & the Karma Mapping Software

Steps to Create Linked Open Data

•  Publish the raw data … get the data out of the proprietary database

•  Select ontologies … that define classes and properties for our data

•  Define URI scheme … identifiers of your resources

•  Convert data to RDF … from data sources to the ontologies

•  Identify links to other Linked Data datasets … aka reconciliation, entity resolution, …

Page 47: Linked Data, Cultural Heritage & the Karma Mapping Software

TMS to CRM easy?

Page 48: Linked Data, Cultural Heritage & the Karma Mapping Software

TMS to CRM easy?

NO

Page 49: Linked Data, Cultural Heritage & the Karma Mapping Software

COMMUNITY EFFORT

•  Publish the raw data … get the data out of the proprietary database

•  Select ontologies … that define classes and properties for our data

•  Define URI scheme … identifiers of your resources

•  Convert data to RDF … from data sources to the ontologies

•  Identify links to other Linked Data datasets … aka reconciliation, entity resolution, …

Page 50: Linked Data, Cultural Heritage & the Karma Mapping Software

Radical Ideas

•  ULAN in Wikipedia or Wikidata

•  ULAN in GitHub

•  Collection data in GitHub

•  Community created CRM mappings in GitHub

•  CRM in JSON-LD in GitHub

•  Tools to export from TMS to GitHub

Page 51: Linked Data, Cultural Heritage & the Karma Mapping Software

STORING AND MAINTAINING THE DATA

Page 52: Linked Data, Cultural Heritage & the Karma Mapping Software

Deployment Options

•  SPARQL endpoint … shortcomings: low reliability, esoteric, slow; benefits: sophisticated query language

•  RDF dump … shortcomings: no query capability, esoteric; benefits: flexibility (clients can download and use in applications), easy to publish

•  JSON-LD + ElasticSearch … shortcomings: restricted query language; benefits: very high performance, mainstream technology, easy to publish

Karma supports the three options
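Part of the appeal of the JSON-LD option is that a JSON-LD record is ordinary JSON, so mainstream tools can store and index it without knowing anything about RDF. The sketch below builds one such record with Python's standard `json` module. The `@id` value is a hypothetical URI and the choice of FOAF terms is an assumption for illustration; no search-engine API is shown.

```python
import json

doc = {
    "@context": {
        "foaf": "http://xmlns.com/foaf/0.1/",
        "name": "foaf:name",  # lets the plain key "name" mean foaf:name
    },
    "@id": "http://example.org/person/john-singer-sargent",  # hypothetical URI
    "@type": "foaf:Person",
    "name": "John Singer Sargent",
}

serialized = json.dumps(doc, indent=2)
print(serialized)
```

The `@context` block is what turns the short key `name` back into a full property URI, so the same document works both as linked data and as a plain JSON record.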

Page 53: Linked Data, Cultural Heritage & the Karma Mapping Software

federation: everyone publishes their data with their own URIs

aggregation: an aggregator republishes everyone’s data with new URIs
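The aggregation option can be sketched as a rewrite step: each source subject gets a newly minted aggregator URI, and the statements are republished under it. In the sketch below the `aggregate` function, the `http://example.org/agg/` base, and the decision to keep an `owl:sameAs` link back to each original URI are all assumptions made for illustration. Under federation this step does not exist, because each publisher's own URIs are used directly.

```python
def aggregate(triples, base="http://example.org/agg/"):
    """Rewrite subjects to aggregator URIs and record links to the originals."""
    minted = {}
    out = []
    for s, p, o in triples:
        if s not in minted:
            # First time this source URI is seen: mint a new one and link back.
            minted[s] = f"{base}{len(minted) + 1}"
            out.append((minted[s], "owl:sameAs", s))
        out.append((minted[s], p, o))
    return out


published = [
    ("saam:person_4253", "skos:prefLabel", "John Singer Sargent"),
    ("dma:person_John_Singer_Sargent", "foaf:name", "John Singer Sargent"),
]
result = aggregate(published)
for triple in result:
    print(*triple)
```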

Page 54: Linked Data, Cultural Heritage & the Karma Mapping Software

Thanks for your attention!

https://github.com/usc-isi-i2/Web-Karma

Open Source, Apache 2 License