Introduction of Linked Data for Science
Presented at 2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013
Linked Open Data for ACademia
Hideaki Takeda / ORCID: 0000-0002-2909-7163
Professor, National Institute of Informatics
Researchers in 1983
[Diagram labels: Printed Articles, Data, Real World Object; activities: Survey, Article Writing]
Survey, Research, and Writing
Researchers in 2013
[Diagram labels: Printed Articles, Digital Articles, Data, Digital Information, Real World Object; activities: Survey, Article Writing, Acquiring Data, Publishing Data]
• Digital distribution of articles
• Sharing and re-use of data
• More articles than ever!
• Real and digital objects as target
Trends of Research and Data
• Rapid Growth
  – Increase of article publications
  – Big data and many (small) databases
• Open and Share
  – Open access
  – Data sharing
• Integration
  – Among different types of data
  – Across domains
Key Requirements
• Accessibility
  – Research results must be shared
• Reusability
  – Research results are expected to be re-used by other research
• Sustainability
  – Research results must be preserved
Open Data
• Open Data is not just “data which is open”, rather …
• “A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/
• Use, re-use, redistribute
• Open license
5 ★ Open Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://5stardata.info/
Linked Data/Linked Open Data (LOD)
- use URIs to denote things, so that people can point at your stuff
- link your data to other data to provide context
Web of Documents
Web of Data
Other data related to the observation
Data identical to this
What's the meaning of the data?
Interconnection between data in different data sources is enabled
Linked Data Principles
• The four rules for Linked Data
  – Use URIs as names for things.
    • Give a URI to every object in the world!
  – Use HTTP URIs so that people can look up those names.
    • Don't use URNs
  – When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
    • Provide machine-readable data for each URI
  – Include links to other URIs, so that they can discover more things.
    • Link data together just like the Web
Linked Data, TBL, http://www.w3.org/DesignIssues/LinkedData.html
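The second and third rules (HTTP URIs that can be looked up, returning machine-readable data) can be illustrated with a small sketch. It only prepares an HTTP request that asks for RDF and does not send it; the choice of the text/turtle media type and the helper name build_lookup_request are assumptions for illustration, and whether a given server actually honours the Accept header is not something the slides state.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET asking for RDF about `uri`."""
    # The fragment (for example '#me') is never sent to the server,
    # so the request targets the document part of the URI.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": media_type})


req = build_lookup_request("http://www-kasm.nii.ac.jp/~takeda#me")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup.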
How to express data in Linked Data
• Use RDF (+RDFS, OWL)
  – Very simple: <Subject> <predicate> <object> .
<http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person07113> .
[Graph: the node http://www-kasm.nii.ac.jp/~takeda#me has rdf:type foaf:Person, foaf:name "Hideaki Takeda", foaf:gender "male", and foaf:knows http://southampton.rkbexplorer.com/id/person07113]
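As a rough illustration of the subject, predicate, object pattern, the sketch below holds the example triples as plain Python tuples and prints them in N-Triples form. The PREFIXES table and the helper names expand and to_ntriples are assumptions made for this example; a real project would more likely use an RDF library.

```python
# Namespace IRIs for the prefixes used in the example triples.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term: str) -> str:
    """Turn a prefixed name such as foaf:name into a full IRI in angle brackets."""
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


ME = "<http://www-kasm.nii.ac.jp/~takeda#me>"

# Each triple is (subject, predicate, object).
triples = [
    (ME, expand("rdf:type"), expand("foaf:Person")),
    (ME, expand("foaf:name"), '"Hideaki Takeda"'),
    (ME, expand("foaf:knows"), "<http://southampton.rkbexplorer.com/id/person07113>"),
]


def to_ntriples(rows) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```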
Describing Linked Data
[Graph: http://www-kasm.nii.ac.jp/~takeda#me has rdf:type foaf:Person, foaf:name "Hideaki Takeda", foaf:gender "male", and foaf:knows http://southampton.rkbexplorer.com/id/person-07113. That node is connected by owl:sameAs to <http://dbpedia.org/resource/Tim_Berners-Lee>, which carries dbpprop:name "Sir Tim Berners-Lee", dbpprop:birthDate "1955-06-08", dbpprop:birthPlace "London, England", and dbpprop:occupation dbpedia:Computer_scientist.]
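The value of the owl:sameAs link is that facts published under one URI become usable under the other. A minimal sketch of that idea follows; the short names soton:, takeda: and dbpedia: are shorthand invented here for the URIs in the graph, and merge_same_as is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

triples = [
    ("soton:person-07113", SAME_AS, "dbpedia:Tim_Berners-Lee"),
    ("takeda:me", "foaf:knows", "soton:person-07113"),
    ("dbpedia:Tim_Berners-Lee", "dbpprop:birthDate", "1955-06-08"),
    ("dbpedia:Tim_Berners-Lee", "dbpprop:birthPlace", "London, England"),
]


def merge_same_as(rows):
    """Collect property values under one canonical name per owl:sameAs cluster."""
    canonical = {}

    def find(name):
        # Follow sameAs redirections until a name that maps to itself is reached.
        while canonical.get(name, name) != name:
            name = canonical[name]
        return name

    for s, p, o in rows:
        if p == SAME_AS:
            canonical[find(o)] = find(s)

    merged = defaultdict(dict)
    for s, p, o in rows:
        if p != SAME_AS:
            merged[find(s)][p] = o
    return dict(merged)


merged = merge_same_as(triples)
print(merged["soton:person-07113"])
```

The person known to the first dataset now also has the birth date and birth place that were published only in the second dataset.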
Linking Open Data (LOD)
• The project to collect published Linked Data
• Major Linked Data
  • (Translated from the original resources)
    – DBpedia (Wikipedia): 270 million triples
    – Geonames: geographic names and their latitudes and longitudes, 93 million triples
    – MusicBrainz: music
    – WordNet: dictionary
    – DBLP bibliography: bibliography for technical papers, 28 million triples
    – US Census Data: 1 billion triples
  • (Crawling)
    – FOAF (Friend Of A Friend)
  • (Wrapper)
    – Flickr Wrapper
LOD Cloud (Linking Open Data)
Benefits of LOD for Science
• Truly de-centralized database
  – No need for central database
  – Everyone can create one and join the cloud!
• Truly open and sharable data and schemata
  – Easy for re-use and mash-up
  – Easy for cross-domain/discipline use and connection
• A single format for all kinds of data
  – Easy for data processing
Bio2RDF
• Bio2RDF is an open source framework to produce and provide biological linked data that uses simple conventions on the emerging semantic web
• Bio2RDF reduces the time and effort involved in data integration so that you can get to doing science
• 19 datasets; 1,010,758,291 triples
http://bio2rdf.org/
At the heart of Linked Data for the Life Sciences
Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data, The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science Volume 7882, 2013, pp 200-212
Bio2RDF
LODAC Project - connecting academic data -
• LODAC SPECIES: Connecting species data by name
  – Sources: Specimen DB, Species Info. DB, Taxon Name DB, GBIF, BioSci. DB, Research DB
  – No. of names: 113,118
  – No. of triples: 14,532,449
• LODAC Museum: LOD of data in museums
  – [Thumbnail of the Work / Creator / Museum integration diagram shown under "Integrating Data"]
• App. for query expansion
• CKAN Japanese: Catalog for Open Data
• DBpedia Japanese
• LODAC Location: Integration of location information
LODAC SPECIES: Linking Species Information with names
[Diagram: Museum Specimen DB, Species Info. DB, GBIF, BioSci. DB, and Research DB connected through the Taxon Name LOD]
No. of species names: 113,118
No. of triples: 14,532,449
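Linking by name can be pictured as a lookup from a normalized name string to a taxon-name URI. The sketch below is only an illustration of that idea: the example.org URIs, the specimen identifiers, and the helper names normalize and link_by_name are all invented, and real name matching (synonyms, author citations, spelling variants) needs considerably more care.

```python
def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so trivially different spellings match."""
    return " ".join(name.split()).casefold()


def link_by_name(taxon_names, records):
    """Attach each record to the taxon-name URI whose name it uses, if any."""
    index = {normalize(name): uri for uri, name in taxon_names.items()}
    linked, unmatched = [], []
    for record_id, name in records:
        uri = index.get(normalize(name))
        if uri is None:
            unmatched.append(record_id)
        else:
            linked.append((record_id, uri))
    return linked, unmatched


# Invented sample data: one taxon-name entry and two specimen records.
taxon_names = {"http://example.org/taxon/1": "Papilio xuthus"}
records = [("specimen-001", "papilio  xuthus"), ("specimen-002", "Unknown name")]

linked, unmatched = link_by_name(taxon_names, records)
print(linked)
print(unmatched)
```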
Data model for integration
[Diagram of the data model.
Node labels: Specimen, Bryophytes, TaxonName, ScientificName, CommonName, TaxonRank, species, ButterflyBDLS.
Property labels: rdf:type, rdfs:subClassOf, species, institutionName, collectedDate, collectionLocality, crm:has_current_location, hasScientificName, hasCommonName, hasSuperTaxon, hasTaxonRank, dcterms:source, dcterms:publisher.
Legend: Named Graph, owl:Class.]
Search application with LODAC SPECIES
http://lod.ac/apps/lsdcs
LODAC Museum
• Integrated database for information on museums in Japan
  – Data
    • No. of museums: 114
    • No. of triples: 40,059,131
• Integration by creator, work and institute
• Data publication by RDF
• Some applications using the data
Type of Information                         RDF type                     No. of items
Collections (total)                         lodac:Specimen + lodac:Work  ca. 1,770,000
Collections (specimen)                      lodac:Specimen               ca. 1,690,000
Collections (creative and historical work)  lodac:Work                   ca. 130,000
Creators                                    foaf:Person                  ca. 8,800
Institutes                                  foaf:Organization            ca. 200,000
Integrated data processing by RDF
• Collect: Produce RDF by converting RDBs or by scraping the Web
• Refine: Define a schema and convert data according to the schema
• Integrate: Schema mapping, ID mapping
• Publish: Dump data / SPARQL endpoint
• Use: Mash-up applications
Collect → Refine → Integrate → Publish → Use
Processed by RDF
Extracting collection data from museum websites
[Diagram: collection pages on museum websites are extracted into Property / Value tables]
Collect
Dataset
Type: Art work (lodac:Work)
  No.: ca. 80,000
  Data source: Catalog of the collections of 3 National Art Museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266)
Type: Specimen (lodac:Specimen)
  No.: ca. 1,690,000
  Data source: (100+ museum collections) Science Net (National Science Museum)
Type: Person (foaf:Person)
  No.: ca. 8,800
  Data source: The Japanese Art Thesaurus
Type: Facilities (incl. museums)
  No.: ca. 200,000
  Data source: The Japanese Art Thesaurus; Cultural Heritage Online; GIS data, National and Regional Planning Bureau
Collect
Standardization of data
Re-organized common metadata.
[Diagram: Raw Data is converted into Re-organized Metadata using properties such as dc:title, crm:P45_consists_of, skos:prefLabel, lodac:era, ....]
Current organizing policies
・ Use existing metadata
・ Define own metadata
Refine
Metadata schema for works
lodac:Work                  Property
Genre                       lodac:genre
Type of cultural assets     lodac:culturalAssets
Creator                     dc:creator / dc11:creator
Nationality                 crm:P7_took_place_at
Title                       dc:title / skos:prefLabel
Title Pronunciation (yomi)  dc:title @ja-hrkt / skos:altLabel
Title in English            dc:title @en / skos:altLabel
Inscription                 crm:P62I_is_depicted_by
Seal                        crm:P65_shows_visual_item
No. of parts                crm:P57_has_number_of_parts
Collection                  dc:isPartOf
Created year                dc:created
Estimated starting year     lodac:estimatedStartYear
Material                    dc:medium / crm:P45_consists_of
Refine
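The Refine step can be pictured as applying a field-to-property mapping to each raw record. The sketch below copies a few rows of the schema for works into a dictionary; the raw record, the example.org URI, and the refine helper are invented for illustration, and the slides do not say how the project actually implements this step.

```python
# A few rows of the metadata schema for works: raw field name -> RDF property.
SCHEMA = {
    "Genre": "lodac:genre",
    "Creator": "dc:creator",
    "Title": "dc:title",
    "Collection": "dc:isPartOf",
    "Created year": "dc:created",
}


def refine(work_uri, raw):
    """Return triples for the fields the schema covers, plus the fields it does not."""
    triples, skipped = [], []
    for field, value in raw.items():
        prop = SCHEMA.get(field)
        if prop is None:
            skipped.append(field)  # keep track of fields that still need a mapping
        else:
            triples.append((work_uri, prop, value))
    return triples, skipped


# Invented raw record; "Shelf" stands in for a field with no mapping yet.
raw_record = {"Title": "Example title", "Creator": "Example creator", "Shelf": "B-12"}
triples, skipped = refine("http://example.org/work/1", raw_record)
print(triples)
print(skipped)
```

Reporting the unmapped fields separately makes it easy to see where "define own metadata" is still needed.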
Integrating Data
[Diagram: the integrated data holds Work, Creator, and Museum entities with the minimum data needed to identify them, connected by dc:creator and crm:P55_has_current_location. Each integrated entity points via dc:references to the raw data for that entity in Source A and in Source B.]
Integrate
Integrating Data
Integrate item                          Source                                               Amount of data  Integration data
Facilities                              A. Japanese Art Thesaurus                            648             77
                                        B. Cultural Heritage Online                          915
Title of important cultural properties  A. Japanese Art Thesaurus (Art work)                 3,800           74
                                        B. DB for National Treasure (Art work)               10,115
Creator information and work title      A. Japanese Art Thesaurus (Creator)                  1,332           15,020
                                        B. All of art work (Work title string)               61,861
Creator name                            A. Japanese Art Thesaurus (Creator)                  1,332           615
                                        B. All of art work title (using creator name)        61,861
Integrate
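ID mapping can be sketched as grouping records from different sources that share a name, minting one integrated entity per group, and pointing from it to each source record with dc:references. Everything concrete below is an assumption for illustration: the example.org URIs, the sequential ID scheme, and the exact-match rule on normalized names.

```python
from collections import defaultdict


def integrate(records):
    """Group source records by normalized name and link each group to its sources."""
    groups = defaultdict(list)
    for source_uri, name in records:
        key = " ".join(name.split()).casefold()
        groups[key].append(source_uri)

    triples = []
    for number, key in enumerate(sorted(groups), start=1):
        entity = f"http://example.org/id/{number}"  # hypothetical ID scheme
        for source_uri in groups[key]:
            triples.append((entity, "dc:references", source_uri))
    return triples


# Invented records from two sources; the first two differ only in case and spacing.
records = [
    ("http://example.org/sourceA/12", "Example Museum"),
    ("http://example.org/sourceB/7", "example  museum"),
    ("http://example.org/sourceB/8", "Another Museum"),
]

for triple in integrate(records):
    print(triple)
```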
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:lodac="http://lod.ac/ns/lodac#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    ……
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
Publishing data as RDF
• ID-resource URI (own address): http://lod.ac/id/359
• Ref-resource URI: http://lod.ac/ref/359
• External link: DBpedia Japanese
• Links to her/his work URI
Publish
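To show how a consumer might read such a published description, the sketch below parses a shortened copy of the RDF/XML for http://lod.ac/id/359 with Python's standard XML parser. The SAMPLE string is trimmed from the slide for brevity, and treating RDF/XML as plain XML is a simplification that works for this layout only; a general client would use an RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
LODAC = "http://lod.ac/ns/lodac#"

# Shortened copy of the description shown on the slide.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:lodac="http://lod.ac/ns/lodac#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
person = root.find(f"{{{FOAF}}}Person")

# The ID-resource URI of the person and the URIs of the works linked from it.
person_uri = person.get(f"{{{RDF}}}about")
works = [e.get(f"{{{RDF}}}resource") for e in person.findall(f"{{{LODAC}}}creates")]
name = person.findtext(f"{{{FOAF}}}name")

print(person_uri, name)
print(works)
```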
Yokohama Art Spot
– Application using museum and local data
– Data related to art in Yokohama
  • Collections
  • Events
  • Q&A
http://lod.ac/apps/yas/
LODAC Museum × Yokohama Art LOD × PinQA
Use
System Architecture
[Diagram: Yokohama Art Spot draws on three sources: LODAC Museum, Yokohama Art LOD, and PinQA.
Entity labels: Work, Institution, Artist, Event, Question, Answer, User.
Interface labels: SPARQL, JSON, XML.]
Yokohama Art Spot
‣ Python + SPARQLWrapper
‣ Geolocation
Use
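The SPARQL and JSON labels in the architecture suggest a client that sends a query and reads back the standard SPARQL JSON results format. The sketch below shows both halves without touching the network: it builds a GET URL for a query and flattens a sample response. The endpoint URL is a placeholder, the sample response is hand-written, and the helper names are invented; the application itself is described as using SPARQLWrapper, which returns the same JSON structure when asked for JSON.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not the project's endpoint

QUERY = """
PREFIX lodac: <http://lod.ac/ns/lodac#>
SELECT ?work WHERE { <http://lod.ac/id/359> lodac:creates ?work } LIMIT 5
"""


def request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL with the query as a URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows(response_text: str):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(response_text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["work"]},
    "results": {"bindings": [
        {"work": {"type": "uri", "value": "http://lod.ac/id/20029"}},
    ]},
})

print(request_url(ENDPOINT, QUERY))
print(rows(SAMPLE_RESPONSE))
```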
Conclusion
• Data and Web
  – Great Potential!
• Linked Data - Exploit the power of Web -
  – Simple Structure: URI and RDF
  – Truly distributed data management
  – Easy to link to each other
  – Suitable for inter-disciplinary areas
• Remaining Issues
  – Scalability
  – Sustainability
    • DOI: DataCite
    • ORCID