Release of AGRIS 2.0: Searching agricultural bibliographic data

Presentations by AIMS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Fabrizio Celli, Johannes Keizer

AGRIS – exploiting bibliographic records to create rich Linked Open Data pages

AIMS Webinar

DESCRIPTION

The objective of this presentation is to introduce the new AGRIS website (released on 2 December 2013) and its functions. The new website merges the AGRIS system with OpenAGRIS and provides a simple access point for searching bibliographic data in the AGRIS database.

TRANSCRIPT

Page 2

http://agris.fao.org

Outline

AGRIS network and dataflow

Data consumption
• Centralization
• Interlinking

Provenance

Page 3

AGRIS

The AGRIS database is a collection of more than 7.6 million bibliographic records in the agricultural domain

The records are enhanced with the AGROVOC thesaurus, which cataloguers use extensively to enrich data indexing in agricultural information systems

AGRIS is an RDF-aware system: a mashup application that lets users query the AGRIS RDF content and interlinks every record with external sources of information

7 million bibliographic records become 7 million mashup pages!

Page 4

AGRIS data consumption

Centralization: bibliographic references in the AGRIS domain (agriculture, forestry, animal husbandry, aquatic sciences and fisheries, and human nutrition)

Interlinking: other kinds of information related to the AGRIS domain (statistics, maps, country profiles, etc.)

Page 5

Data consumption

AGRIS consumes metadata provided by the community and publishes it as open data

Metadata is captured either by pulling it, i.e. harvesting it from clients (e.g. aggregators and institutional repositories) with protocols such as OAI-PMH,

or by clients pushing it to AGRIS (e.g. national libraries or journal publishers)
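As a rough illustration of the pull path, here is a minimal OAI-PMH harvester sketch using only the Python standard library. It assumes the client exposes Dublin Core (`oai_dc`) records; the function names, the example base URL and the choice of extracting only identifier and title are illustrative assumptions, not the actual AGRIS harvester.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")  # None for deleted records
        records.append((identifier, title))
    # An empty <resumptionToken/> marks the last page of the list
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, token or None


def harvest(base_url):
    """Yield (identifier, title) pairs, following resumption tokens page by page."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, resumption_token=token)) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
```

Usage would be `for identifier, title in harvest("http://example.org/oai"): ...` against a real repository's base URL; parsing is kept separate from the network call so it can be checked on a saved response.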

Page 6

Interoperability: accept any input format!

Page 7

AGRIS data flow

Page 8

Centralization: Data processing

Metadata are manually spot-checked at random for inconsistencies or recurring semantic errors

Each input format is mapped to AGRIS RDF

Metadata are converted to AGRIS RDF; the AgroTagger is run when Agrovoc keywords are not available

Before the metadata are added to the triplestore and indexed in Solr, duplicates are detected and managed, since the same record may be indexed in multiple collections or duplicated within the same repository
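The slide does not say how duplicates are detected. One common, simple approach, sketched here as an assumption rather than the AGRIS method, is to build a normalized match key (here title plus publication year; `dedup_key` and `find_duplicates` are hypothetical names) and group the records that share it.

```python
import re
import unicodedata
from collections import defaultdict


def dedup_key(record):
    """Hypothetical match key: accent-free, lower-cased title plus year."""
    title = unicodedata.normalize("NFKD", record.get("title", ""))
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Collapse punctuation and whitespace runs into single spaces
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return title, record.get("year")


def find_duplicates(records):
    """Group records whose keys collide; only groups of two or more are returned."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

A title-only key is deliberately crude: a real system would also compare authors, DOIs or ISBNs, and this regex reduces non-Latin titles to an empty string, which would wrongly merge them.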

Page 9

AgroTagger

Not yet implemented

Maui is named after the Polynesian mythological hero and demigod who would transform himself into different kinds of birds to perform many of his exploits.
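The slides do not describe AgroTagger's internals, and Maui is a machine-learning topic indexer. The sketch below is a much cruder stand-in, plain dictionary lookup of thesaurus labels in a text, meant only to show the input and output of automatic Agrovoc-style tagging. The toy vocabulary and the `tag` function are hypothetical, not real AGROVOC data or AgroTagger code.

```python
import re

# Toy, hypothetical vocabulary: label (preferred or alternative) -> preferred term
TOY_VOCAB = {
    "rice": "rice",
    "paddy": "rice",
    "maize": "maize",
    "corn": "maize",
    "soil erosion": "soil erosion",
}


def tag(text, vocab=TOY_VOCAB, max_ngram=3):
    """Return the preferred terms whose labels occur as word n-grams in text."""
    words = re.findall(r"[a-z]+", text.lower())
    found = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            label = " ".join(words[i:i + n])
            if label in vocab:
                found.add(vocab[label])
    return sorted(found)
```

Mapping alternative labels to preferred terms is what lets "corn" and "paddy" end up as the same keywords a cataloguer would assign.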

Page 10

RDF-ization

bibo:Article
bibo:abstract
bibo:doi
bibo:isbn
bibo:presentedAt -> bibo:Conference -> dct:title
bibo:uri
dct:alternative
dct:creator -> foaf:Organization -> foaf:name
dct:creator -> foaf:Person -> foaf:name
dct:dateSubmitted
dct:description
dct:extent
dct:identifier
dct:language
dct:isPartOf
dct:issued
dct:publisher -> foaf:Organization -> foaf:name
dct:source
dct:subject
dct:title
dct:type
dct:rights

Choice of vocabularies and mapping!
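To make the mapping concrete, here is a minimal sketch that turns one flat record into N-Triples using a few of the properties listed above (the standard BIBO, Dublin Core terms and FOAF namespaces). The record URI base `http://example.org/agris/records/`, the input field names and the function names are hypothetical; this is not the AGRIS converter.

```python
BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/agris/records/"  # hypothetical record URI base

# A few of the mappings listed on the slide: record field -> RDF property
FIELD_MAP = {"title": DCT + "title", "issued": DCT + "issued", "abstract": BIBO + "abstract"}


def literal(value):
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record_id, record):
    """Turn one flat record into N-Triples lines (record_id assumed alphanumeric)."""
    subject = f"<{BASE}{record_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BIBO}Article> ."]
    for field, prop in FIELD_MAP.items():
        if field in record:
            lines.append(f"{subject} <{prop}> {literal(record[field])} .")
    for i, name in enumerate(record.get("authors", [])):
        # dct:creator -> foaf:Person -> foaf:name, one blank node per author
        node = f"_:{record_id}_author{i}"
        lines.append(f"{subject} <{DCT}creator> {node} .")
        lines.append(f"{node} <{RDF_TYPE}> <{FOAF}Person> .")
        lines.append(f"{node} <{FOAF}name> {literal(name)} .")
    return lines
```

Prefixing blank-node labels with the record ID keeps authors of different records distinct when many records are written to the same file.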

Page 11

RDF/XML snapshot

Page 12

Provenance

Each AGRIS record has an identifier (ARN), which has a predefined structure and contains information on the data source together with the bibliographic record’s year of creation

“IT 2008 0 00091” refers to a record created in 2008 by a specific AGRIS data provider in Italy; the record's progressive number is 91

Information about data providers is stored in the CIARD RING and converted to RDF in the AGRIS centers dataset (each data provider has its own unique URI)
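The ARN example suggests a fixed 12-character layout: two-letter country code, four-digit year, one provider character, five-digit progressive number. A minimal parser under that assumption follows; the layout is inferred from the single example above, and the set of allowed provider characters is a guess.

```python
import re
from typing import NamedTuple


class ARN(NamedTuple):
    country: str
    year: int
    provider: str
    sequence: int


# Assumed layout: CC YYYY P NNNNN (spaces optional)
ARN_PATTERN = re.compile(r"^([A-Z]{2})(\d{4})([0-9A-Z])(\d{5})$")


def parse_arn(arn):
    """Split an ARN into country, year of creation, provider code and progressive number."""
    compact = arn.replace(" ", "").upper()
    match = ARN_PATTERN.match(compact)
    if not match:
        raise ValueError(f"not a 12-character ARN: {arn!r}")
    country, year, provider, sequence = match.groups()
    return ARN(country, int(year), provider, int(sequence))
```

With the country code and provider character separated out, the provenance lookup becomes a join against the provider URIs in the centers dataset.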

Page 13

Storage system

The AGRIS RDF data is stored in Malaysia, at MIMOS (http://www.mimos.my/)

Triples are managed by the AllegroGraph triplestore (http://www.franz.com/agraph/allegrograph/)

A 90 GB machine is dedicated to the triplestore. Some months ago we used a 32 GB machine, but AllegroGraph went down at least once a month (pending processes, memory problems)

We have tested OWLIM and could move to that triplestore, or find another kind of solution

Page 14

Interlinking

Agrovoc is the backbone

Align Agrovoc to other thesauri (skos:exactMatch, skos:closeMatch)

Discover SPARQL endpoints

Discover web services and APIs

Write the code and interlink!
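One way to exploit the alignments: given a set of mapping triples, collect the concepts aligned with an Agrovoc concept. In the SKOS Reference, skos:exactMatch is symmetric and transitive, while skos:closeMatch is symmetric but not transitive, so this sketch follows exact matches through any number of hops but takes close matches only one hop from the starting concept. The function names and example URIs are hypothetical.

```python
from collections import defaultdict, deque

SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT = SKOS + "exactMatch"
CLOSE = SKOS + "closeMatch"


def build_index(triples):
    """Index mapping links in both directions, since both properties are symmetric."""
    index = {EXACT: defaultdict(set), CLOSE: defaultdict(set)}
    for s, p, o in triples:
        if p in index:
            index[p][s].add(o)
            index[p][o].add(s)
    return index


def aligned_concepts(concept, index):
    """Return (exact matches, close matches) for one concept."""
    # exactMatch is transitive: follow it breadth-first through any number of hops
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for nxt in index[EXACT][current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # closeMatch is not transitive: take direct links of the starting concept only
    close = index[CLOSE][concept] - seen
    return seen - {concept}, close
```

The matched URIs are then the entry points for querying other datasets about the same concept.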

Page 15

The IFPRI case

A user queries the system

AGRIS record with Agrovoc keywords

At least one Agrovoc keyword is a country name

The system queries the IFPRI SPARQL endpoint (http://data.ifpri.org/sparql/) to retrieve the Global Hunger Index (GHI) and the child mortality rate for that country
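The IFPRI flow could be sketched as follows. Only the endpoint URL comes from the slide: the country list stands in for the Agrovoc country concepts, and the `ex:` vocabulary in the query is a hypothetical placeholder, because IFPRI's actual schema is not given here. The code only builds the request URLs, since the 2013 endpoint may no longer be online.

```python
import urllib.parse

IFPRI_ENDPOINT = "http://data.ifpri.org/sparql/"  # from the slide; may be offline today

# Hypothetical stand-in for the Agrovoc concepts that denote countries
COUNTRY_KEYWORDS = {"kenya", "ethiopia", "india"}

# Hypothetical vocabulary: the ex: terms are placeholders, not IFPRI's real schema
QUERY_TEMPLATE = """PREFIX ex: <http://example.org/hypothetical#>
SELECT ?year ?ghi ?childMortality WHERE {{
  ?obs ex:countryName "{country}" ;
       ex:year ?year ;
       ex:globalHungerIndex ?ghi ;
       ex:childMortalityRate ?childMortality .
}}"""


def countries_in(keywords):
    """Keep the record keywords that name a country."""
    return [k for k in keywords if k.lower() in COUNTRY_KEYWORDS]


def ifpri_request_urls(keywords):
    """Build one SPARQL GET URL per country keyword; no country means no call."""
    urls = []
    for country in countries_in(keywords):
        query = QUERY_TEMPLATE.format(country=country.replace('"', '\\"'))
        urls.append(IFPRI_ENDPOINT + "?" + urllib.parse.urlencode({"query": query}))
    return urls
```

Records without a country keyword produce no request at all, which matches the condition on the slide.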

Page 16

Some numbers (2 December 2013)

7,636,069 bibliographic records

187,238,716 triples in the AGRIS records dataset (http://202.45.142.113:10035/repositories/agris)

372,462 triples in the AGRIS serials dataset (http://202.45.142.113:10035/repositories/jad)

11,414 triples in the AGRIS centers dataset (http://202.45.142.113:10035/repositories/centers)
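Figures like these could in principle be checked with a SPARQL COUNT query. The sketch assumes the repository URLs accept standard SPARQL-protocol GET requests (as AllegroGraph repositories generally do) and return JSON results. The 2013 addresses may well be offline, and depending on how a store defines its default graph, the count may differ from the published figure.

```python
import json
import urllib.parse
import urllib.request

COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def count_request(repository_url):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = repository_url + "?" + urllib.parse.urlencode({"query": COUNT_QUERY})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_count(results_json):
    """Read ?n from the first row of a SPARQL JSON results document."""
    data = json.loads(results_json)
    return int(data["results"]["bindings"][0]["n"]["value"])


def count_triples(repository_url, timeout=60):
    """Send the COUNT query and return the number of triples."""
    with urllib.request.urlopen(count_request(repository_url), timeout=timeout) as resp:
        return parse_count(resp.read())
```

For example, `count_triples("http://202.45.142.113:10035/repositories/centers")` would issue the query against the centers repository, if it is still reachable.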

Page 17

AGRIS RDF RECORD

AGROVOC

Page 18

Thank you!