Introduction of Linked Data for Science

Linked Open Data for ACademia: Introduction of Linked Data for Science. Hideaki Takeda, [email protected] / ORCID: 0000-0002-2909-7163, Professor, National Institute of Informatics. 2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November 2013.

Presented at 2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013

TRANSCRIPT

Page 1: Introduction of Linked Data for Science

Linked Open Data for ACademia

Introduction of Linked Data for Science

Hideaki Takeda, [email protected] / ORCID: 0000-0002-2909-7163
Professor, National Institute of Informatics

2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013

Page 2: Introduction of Linked Data for Science

Researchers in 1983

[Figure: the research cycle in 1983. Labels: Printed Articles; Data; Real World Object; Survey; Article Writing; Survey, Research, and Writing]

Page 3: Introduction of Linked Data for Science

Researchers in 2013

[Figure: the research cycle in 2013. Labels: Printed Articles; Digital Articles; Data; Digital Information; Real World Object; Survey; Article Writing; Acquiring Data; Publishing Data]

• Digital distribution of articles
• Sharing and re-use of data
• More articles than ever!
• Real and digital objects as targets

Page 4: Introduction of Linked Data for Science

Trends of Research and Data

• Rapid Growth
  – Increase of article publications
  – Big data and many (small) databases
• Open and Share
  – Open access
  – Data sharing
• Integration
  – Among different types of data
  – Across domains

Page 5: Introduction of Linked Data for Science

Key Requirements

• Accessibility
  – Research results must be shared
• Reusability
  – Research results are expected to be re-used by other research
• Sustainability
  – Research results must be preserved

Page 7: Introduction of Linked Data for Science

Open Data

• Open Data is not just “data which is open”, rather …

• “A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/

• Use, re-use, redistribute
• Open license

Page 8: Introduction of Linked Data for Science

5★ Open Data

★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context

http://5stardata.info/
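The five levels are cumulative, which can be sketched as a small rating function. This is only an illustration: the `Dataset` class and its field names are assumptions made for this example and are not part of the 5-star scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    on_web_with_open_license: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if every lower level holds."""
    levels = [
        ds.on_web_with_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# A CSV file published on the Web under an open license, without URIs or links.
csv_on_web = Dataset(True, True, True, False, False)
print(star_rating(csv_on_web))
```

Stopping at the first unmet level reflects that, for example, linked data without an open license does not count as open data at all.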

Page 9: Introduction of Linked Data for Science

Linked Data/Linked Open Data (LOD)

- use URIs to denote things, so that people can point at your stuff

- link your data to other data to provide context

Page 10: Introduction of Linked Data for Science

Web of Documents

Page 11: Introduction of Linked Data for Science

Web of Data

[Figure: data items linked across data sources, with the callouts below]

• Other data about the observation
• Data identical to this one
• What is the meaning of the data?

Interconnection between data in different data sources is enabled.

Page 12: Introduction of Linked Data for Science

Linked Data Principles

• The four rules for Linked Data
  – Use URIs as names for things
    • Give a URI to every object in the world!
  – Use HTTP URIs so that people can look up those names
    • Don't use URNs
  – When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
    • Provide machine-readable data for each URI
  – Include links to other URIs, so that they can discover more things
    • Make data linked together, just like the Web

Tim Berners-Lee, Linked Data, http://www.w3.org/DesignIssues/LinkedData.html
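The second and third rules (look up an HTTP URI, get machine-readable data back) can be sketched as an HTTP request that asks for RDF through the Accept header. The sketch below only builds the request and does not send it. The helper name `rdf_lookup_request` is made up for this example, and whether a given server actually answers with `text/turtle` is an assumption.

```python
from urllib.request import Request


def rdf_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # The fragment (#me) is resolved by the client and never sent to the server,
    # so the document URI is everything before '#'.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": media_type})


req = rdf_lookup_request("http://www-kasm.nii.ac.jp/~takeda#me")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.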

Page 13: Introduction of Linked Data for Science

How to express data in Linked Data

• Use RDF (+ RDFS, OWL)
  – Very simple: <subject> <predicate> <object> .

<http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person07113> .

[Figure: the same four triples drawn as a graph, with http://www-kasm.nii.ac.jp/~takeda#me as the subject node]
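The subject/predicate/object structure maps directly onto tuples. As a minimal sketch, the four statements about `#me` can be held in a Python list and written out as N-Triples, a line-based RDF syntax that spells out full URIs. The `to_ntriples` helper is written for this example and escapes only backslashes, quotes, and newlines, so it is not a complete serializer.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME = "http://www-kasm.nii.ac.jp/~takeda#me"

# Each entry: (subject, predicate, object, object_is_uri)
triples = [
    (ME, RDF_TYPE, FOAF + "Person", True),
    (ME, FOAF + "name", "Hideaki Takeda", False),
    (ME, FOAF + "gender", "male", False),
    (ME, FOAF + "knows", "http://southampton.rkbexplorer.com/id/person07113", True),
]


def to_ntriples(rows):
    """Serialize triples as N-Triples lines (simplified literal escaping)."""
    lines = []
    for s, p, o, is_uri in rows:
        if is_uri:
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```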

Page 14: Introduction of Linked Data for Science

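Describing Linked Data

[Figure: the graph from the previous slide, extended with a link into DBpedia. Edges reconstructed from the figure labels:]

<http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person-07113> .
<http://southampton.rkbexplorer.com/id/person-07113> owl:sameAs <http://dbpedia.org/resource/Tim_Berners-Lee> .
<http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:name "Sir Tim Berners-Lee" .
<http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:birthDate "1955-06-08" .
<http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:birthPlace "London, England" .
<http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:occupation dbpedia:Computer_scientist .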
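The value of owl:sameAs is that facts stated about one URI can be read as facts about its aliases. A minimal sketch of that idea over an in-memory list of triples follows. The helper names `same_as_closure` and `describe` are made up for this example, and prefixed names are kept as plain strings rather than expanded to full URIs.

```python
SAME_AS = "owl:sameAs"
RKB = "http://southampton.rkbexplorer.com/id/person-07113"
DBP = "http://dbpedia.org/resource/Tim_Berners-Lee"

triples = [
    ("http://www-kasm.nii.ac.jp/~takeda#me", "foaf:knows", RKB),
    (RKB, SAME_AS, DBP),
    (DBP, "dbpprop:name", "Sir Tim Berners-Lee"),
    (DBP, "dbpprop:birthPlace", "London, England"),
]


def same_as_closure(start, graph):
    """Return every URI reachable from start via owl:sameAs, in either direction."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == node and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen


def describe(uri, graph):
    """Collect (predicate, object) pairs stated about uri or any of its aliases."""
    aliases = same_as_closure(uri, graph)
    return {(p, o) for s, p, o in graph if s in aliases and p != SAME_AS}


print(sorted(describe(RKB, triples)))
```

Asking about the Southampton URI returns the DBpedia name and birthplace, even though no triple uses the Southampton URI as the subject of those facts.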

Page 15: Introduction of Linked Data for Science

Linking Open Data (LOD)

• The project to collect published Linked Data
• Major Linked Data
• (Translated from the original resources)
  – DBpedia (Wikipedia): 270 million triples
  – Geonames: geographic names and their latitudes and longitudes, 93 million triples
  – MusicBrainz: music
  – WordNet: dictionary
  – DBLP bibliography: bibliography for technical papers, 28 million triples
  – US Census Data: 1 billion triples
• (Crawling)
  – FOAF (Friend Of A Friend)
• (Wrapper)
  – Flickr Wrapper

Page 18: Introduction of Linked Data for Science

LOD Cloud (Linking Open Data)

Page 19: Introduction of Linked Data for Science

Benefits of LOD for Science

• Truly de-centralized database
  – No need for a central database
  – Everyone can create one and join the cloud!
• Truly open and sharable data and schemata
  – Easy to re-use and mash up
  – Easy to use and connect across domains and disciplines
• A single format for all kinds of data
  – Easy data processing

Page 20: Introduction of Linked Data for Science

Bio2RDF

• Bio2RDF is an open source framework to produce and provide biological linked data that uses simple conventions on the emerging semantic web
• Bio2RDF reduces the time and effort involved in data integration so that you can get to doing science
• 19 datasets; 1,010,758,291 triples

http://bio2rdf.org/

At the heart of Linked Data for the Life Sciences

Page 21: Introduction of Linked Data for Science

Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data, The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science Volume 7882, 2013, pp 200-212

Page 22: Introduction of Linked Data for Science

Bio2RDF

Page 23: Introduction of Linked Data for Science

LODAC Project - connecting academic data -

• LODAC SPECIES: Connecting species data by name
  – Connected databases: Specimen DB, Species Info. DB, Taxon Name DB, GBIF, BioSci. DB, Research DB
  – No. of names: 113,118
  – No. of triples: 14,532,449
• LODAC Museum: LOD of data in museums
  – [Figure: Work, Museum, and Creator entities. Data from Source A and Data from Source B hold the raw data for entities; the integrated data holds the minimum data to identify entities. Properties: dc:references, dc:creator, crm:P55_has_current_location]
• App. for query expansion
• CKAN Japanese: Catalog for Open Data
• DBpedia Japanese
• LODAC Location: Integration of location information

Page 24: Introduction of Linked Data for Science

LODAC SPECIES: Linking Species Information with names

[Figure: databases connected through taxon names. Labels: Taxon Name LOD; Museum Specimen DB; Species Info. DB; GBIF; BioSci. DB; Research DB]

No. of Species Names: 113,118
No. of Triples: 14,532,449

Page 25: Introduction of Linked Data for Science

Data model for integration

[Figure: RDF data model for LODAC SPECIES.
 Node labels: Specimen; TaxonName; ScientificName; CommonName; TaxonRank; species; Bryophytes; Butterfly; BDLS.
 Edge labels: rdf:type; rdfs:subClassOf; species; institutionName; collectedDate; collectionLocality; crm:has_current_location; hasCommonName; hasScientificName; hasSuperTaxon; hasTaxonRank; dcterms:source; dcterms:publisher.
 Legend: Named Graph; owl:Class]
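Connecting species data by name can be sketched as a lookup from normalized scientific names to taxon-name URIs. The property names `species`, `institutionName`, `hasScientificName`, and `hasCommonName` come from the figure; the records, the `http://example.org/` URIs, and the normalization rule are assumptions made for this illustration and do not describe how LODAC SPECIES itself matches names.

```python
# Hypothetical sample records; URIs and values are for illustration only.
taxon_names = [
    {"uri": "http://example.org/taxon/1",
     "hasScientificName": "Papilio xuthus",
     "hasCommonName": "Asian swallowtail"},
]
specimens = [
    {"uri": "http://example.org/specimen/42",
     "species": "papilio  xuthus",
     "institutionName": "Example Museum"},
    {"uri": "http://example.org/specimen/43",
     "species": "Unidentified sp.",
     "institutionName": "Example Museum"},
]


def normalize(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def link_by_name(specimens, taxon_names):
    """Return (specimen URI, 'species', taxon URI) triples for matching names."""
    index = {normalize(t["hasScientificName"]): t["uri"] for t in taxon_names}
    links = []
    for sp in specimens:
        taxon_uri = index.get(normalize(sp["species"]))
        if taxon_uri is not None:
            links.append((sp["uri"], "species", taxon_uri))
    return links


print(link_by_name(specimens, taxon_names))
```

Specimens whose name has no match are simply left unlinked, so the raw data stays intact.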

Page 26: Introduction of Linked Data for Science

Search application with LODAC SPECIES

http://lod.ac/apps/lsdcs

Page 27: Introduction of Linked Data for Science

LODAC Museum

• Integrated database for information on museums in Japan
  – Data
    • No. of museums: 114
    • No. of triples: 40,059,131
• Integration by creator, work and institute
• Data publication by RDF
• Some applications using the data

| Type of Information | RDF type | No. of items |
| Collections (total) | lodac:Specimen + lodac:Work | ca. 1,770,000 |
| Collections (specimen) | lodac:Specimen | ca. 1,690,000 |
| Collections (creative and historical work) | lodac:Work | ca. 130,000 |
| Creators | foaf:Person | ca. 8,800 |
| Institutes | foaf:Organization | ca. 200,000 |

Page 28: Introduction of Linked Data for Science

Integrated data processing by RDF

• Collect: RDF by converting RDB / by scraping the Web
• Refine: Define schema and convert data by schema
• Integrate: Schema mapping, ID mapping
• Publish: Dump data / SPARQL Endpoint
• Use: Mash-up applications

Collect → Refine → Integrate → Publish → Use

Processed by RDF
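The Collect step ("RDF by converting RDB") can be sketched with an in-memory SQLite table: each row becomes a subject URI and each non-key column becomes a predicate. The table, its columns, and the `http://example.org/id/` prefix are assumptions for this example, and the column names are used directly as predicates before any schema is applied in the Refine step.

```python
import sqlite3

BASE = "http://example.org/id/"  # hypothetical URI prefix


def rows_to_triples(conn, table, key_column):
    """Turn every row of a table into (subject, column, value) triples."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, column, str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER PRIMARY KEY, title TEXT, creator TEXT)")
conn.execute("INSERT INTO work VALUES (1, 'Sample Work', 'Sample Creator')")
print(rows_to_triples(conn, "work", "id"))
```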

Page 29: Introduction of Linked Data for Science

Extract

Extracting collection data from museum websites

[Figure: collection records on museum web pages extracted into Property / Value tables]

Stage: Collect

Page 30: Introduction of Linked Data for Science

Dataset

| Type | No. | Data source |
| Art work (lodac:Work) | ca. 80,000 | Catalogs of the collections of over 100 museums: 3 National Art Museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482), …; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266) |
| Specimen (lodac:Specimen) | ca. 1,690,000 | (100+ museum collections) Science Net (National Science Museum) |
| Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus |
| Facilities (incl. museums) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data, National and Regional Planning Bureau |

Stage: Collect

Page 31: Introduction of Linked Data for Science

Standardization of data

Re-organized common metadata.

[Figure: Raw Data is mapped to Re-organized Metadata with properties such as dc:title, crm:P45_consists_of, skos:prefLabel, lodac:era, ...]

Current organization policies
• Use existing metadata
• Define own metadata

Stage: Refine

Page 32: Introduction of Linked Data for Science

Metadata schema for works

| lodac:Work | Property |
| Genre | lodac:genre |
| Type of cultural assets | lodac:culturalAssets |
| Creator | dc:creator / dc11:creator |
| Nationality | crm:P7_took_place_at |
| Title | dc:title / skos:prefLabel |
| Title Pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel |
| Title in English | dc:title @en / skos:altLabel |
| Inscription | crm:P62I_is_depicted_by |
| Seal | crm:P65_shows_visual_item |
| No. of parts | crm:P57_has_number_of_parts |
| Collection | dc:isPartOf |
| Created year | dc:created |
| Estimated starting year | lodac:estimatedStartYear |
| Material | dc:medium / crm:P45_consists_of |

Stage: Refine
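Applying such a schema in the Refine step amounts to a field-to-property lookup. The sketch below uses a subset of the rows above and, where the slide lists two alternative properties, arbitrarily keeps the first. The raw record, the subject URI, and the policy of reporting unmapped fields are assumptions for this illustration.

```python
# Subset of the lodac:Work schema above; first listed property chosen where two are given.
WORK_SCHEMA = {
    "Genre": "lodac:genre",
    "Creator": "dc:creator",
    "Title": "dc:title",
    "Collection": "dc:isPartOf",
    "Created year": "dc:created",
    "Material": "dc:medium",
}


def refine(subject, raw_record):
    """Map raw field names to schema properties; report fields with no mapping."""
    triples, unmapped = [], []
    for field, value in raw_record.items():
        prop = WORK_SCHEMA.get(field)
        if prop is None:
            unmapped.append(field)
        else:
            triples.append((subject, prop, value))
    return triples, unmapped


# Hypothetical raw record scraped from a museum page.
raw = {"Title": "Sample Work", "Material": "Ink on paper", "Shelf": "B-12"}
triples, unmapped = refine("http://example.org/id/1", raw)
print(triples)
print(unmapped)
```

Returning the unmapped fields separately makes it visible where the schema still needs a property of its own.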

Page 33: Introduction of Linked Data for Science

Integrating Data

[Figure: Data from Source A and Data from Source B hold the raw data for entities. The integrated data holds Work, Museum, and Creator entities with the minimum data to identify them. Edge labels: dc:references, dc:creator, crm:P55_has_current_location]

Stage: Integrate

Page 34: Introduction of Linked Data for Science

Integrating Data

| Integrated item | Source | Amount of data | Integrated data |
| Facilities | A. Japanese Art Thesaurus | 648 | 77 |
| | B. Cultural Heritage Online | 915 | |
| Title of important cultural properties | A. Japanese Art Thesaurus (Art work) | 3,800 | 74 |
| | B. DB for National Treasure (Art work) | 10,115 | |
| Creator information and Work Title | A. Japanese Art Thesaurus (Creator) | 1,332 | 15,020 |
| | B. All of art work (Work title string) | 61,861 | |
| Creator name | A. Japanese Art Thesaurus (Creator) | 1,332 | 615 |
| | B. All of art work title (using creator name) | 61,861 | |

Stage: Integrate
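ID mapping between two sources can be sketched as matching on a normalized name key. This is one possible approach and not necessarily the matching used in LODAC Museum. The `http://example.org/` URIs and the rule of skipping ambiguous names are assumptions for this example.

```python
import unicodedata


def name_key(name):
    """Normalize compatibility forms (NFKC) and case, and drop all whitespace."""
    return "".join(unicodedata.normalize("NFKC", name).casefold().split())


def match_entities(source_a, source_b):
    """Pair URIs from source A and source B whose names share the same key."""
    index = {}
    for uri, name in source_a.items():
        index.setdefault(name_key(name), []).append(uri)
    matches = []
    for uri_b, name in source_b.items():
        candidates = index.get(name_key(name), [])
        if len(candidates) == 1:  # skip ambiguous names rather than guess
            matches.append((candidates[0], uri_b))
    return matches


# Hypothetical creator records from two sources.
source_a = {"http://example.org/a/1": "下村観山"}
source_b = {
    "http://example.org/b/9": "下村 観山",
    "http://example.org/b/10": "Unknown",
}
print(match_entities(source_a, source_b))
```

Each resulting pair is a candidate for a shared ID in the integrated data, with both original records kept as references.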

Page 35: Introduction of Linked Data for Science

Publishing data as RDF

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    <!-- …… -->
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>

• ID-resource URI (own address): http://lod.ac/id/359
• Ref-resource URI: http://lod.ac/ref/359
• External link: DBpedia Japanese
• Links to the URIs of her/his works

Stage: Publish
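A consumer of this published record could read the creator-to-work links with a plain XML parser, as sketched below on a shortened copy of the record. This handles only the typed-node layout shown on the slide. General RDF/XML allows many other equivalent layouts, so a real application would use an RDF library instead. The function name `works_by_creator` is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
LODAC = "http://lod.ac/ns/lodac#"

# Shortened copy of the record on the slide.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def works_by_creator(rdf_xml: bytes):
    """Map each foaf:Person URI to the list of work URIs it lodac:creates."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for person in root.findall(f"{{{FOAF}}}Person"):
        creator = person.get(f"{{{RDF}}}about")
        result[creator] = [
            el.get(f"{{{RDF}}}resource")
            for el in person.findall(f"{{{LODAC}}}creates")
        ]
    return result


print(works_by_creator(DOC.encode("utf-8")))
```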

Page 36: Introduction of Linked Data for Science

Yokohama Art Spot

– Application using museum and local data
– Data related to art in Yokohama
  • Collections
  • Events
  • Q&A

http://lod.ac/apps/yas/

LODAC Museum × Yokohama Art LOD × PinQA

Stage: Use

Page 37: Introduction of Linked Data for Science

System Architecture

[Figure: Yokohama Art Spot draws on three data sources (LODAC Museum, Yokohama Art LOD, and PinQA). Each is queried with SPARQL and returns results as JSON or XML. Entity labels in the figure: Work, Institution, Artist, Event, Question, Answer, User]

Yokohama Art Spot
‣ Python + SPARQLWrapper
‣ Geolocation

Stage: Use
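The SPARQL-in, JSON-out exchange in this architecture can be sketched with the standard library alone. The slide names SPARQLWrapper; it is left out here only so the example is self-contained. The endpoint URL, the query, and the sample response are assumptions for this illustration. The response follows the standard SPARQL JSON results layout (`head.vars` and `results.bindings`).

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name . } LIMIT 10
"""


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}
        for b in doc["results"]["bindings"]
    ]


# Hand-written sample response standing in for what an endpoint might return.
SAMPLE = json.dumps({
    "head": {"vars": ["person", "name"]},
    "results": {"bindings": [{
        "person": {"type": "uri", "value": "http://lod.ac/id/359"},
        "name": {"type": "literal", "xml:lang": "ja", "value": "下村観山"},
    }]},
})

print(query_url(ENDPOINT, QUERY))
print(bindings_to_rows(SAMPLE))
```

A mash-up application would fetch `query_url(...)` from each data source with an `Accept: application/sparql-results+json` header and merge the resulting rows.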

Page 38: Introduction of Linked Data for Science

Conclusion

• Data and Web
  – Great potential!
• Linked Data: exploit the power of the Web
  – Simple structure: URI and RDF
  – Truly distributed data management
  – Easy to link to each other
  – Suitable for inter-disciplinary areas
• Remaining issues
  – Scalability
  – Sustainability
    • DOI: DataCite
    • ORCID