Opening and Integration of CASDD and Germplasm Data to AGRIS Prof. Xuefu Zhang & Dr. Guojian Xian Agricultural Information Institute of CAAS Research Data Alliance Fourth Plenary Meeting, 22-24 September, 2014, Amsterdam

Upload: ciard-movement

Post on 28-May-2015


DESCRIPTION

Presentation delivered at the Agricultural Data Interoperability Interest Group -- Research Data Alliance (RDA) 4th Plenary Meeting -- Amsterdam, September 2014

TRANSCRIPT

Page 1: Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Zhang and Dr. Guojian Xian

Opening and Integration of CASDD and Germplasm Data to AGRIS

Prof. Xuefu Zhang & Dr. Guojian Xian

Agricultural Information Institute of CAAS

Research Data Alliance Fourth Plenary Meeting, 22-24 September, 2014, Amsterdam

Page 2

Contents

• Open CASDD as RESTful APIs
• Open Germplasm as RESTful APIs
• Integration and Extension to AGRIS
• Fruitful Results

Page 3

Main Materials

• Chinese Agricultural Sci-tech Documents Database (CASDD)
– 440,113 records
• CGRIS Germplasm Data
• AGROVOC
– agrovoc_2013-12-17_core.rdf
• Chinese Agricultural Thesaurus (CAT)
• KOS Mapping Results:
– AGROVOC_CAT.nt
• AGRIS 2.0
– (Latest version: 20140427)

Page 4

About CASDD

• The Chinese Agricultural Sci-tech Documents Database (CASDD) is an agricultural bibliographic and abstract database developed by CAAS. Among such databases in China, it has the largest number of records and the longest time span of documents.

• It covers over 1000 agricultural academic journals and other materials, with over 6 million records, in the fields of agronomy, horticulture, plant protection, soil sciences, animal husbandry, veterinary science, agricultural engineering, agricultural products processing, agricultural economics, etc.

• It is the most comprehensive, reliable and accessible resource of agricultural science and technology information from research institutions, education and related departments.

Page 5

Refining and Analyzing CASDD

[Diagram: Tagging CAT and AGROVOC concepts to CASDD. Labels shown:]

• Inputs: CASDD; CAT; AGROVOC RDF Core; Mapping of CAT and AGROVOC
• Virtuoso Triple Store (SPARQL query)
• Java Application (MMseg4J/IKAnalyzer)
• Tagging (URI, prefLabel)
• Solr 4.7 with SQE Plugin (Write & Read)
• Indexing into the CASDD Index
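The tagging step named in this diagram can be illustrated with a minimal sketch. It assumes a plain dictionary lookup from preferred labels to concept URIs and simple substring matching. The labels, the example.org URIs, and the function name `tag_record` are placeholders for illustration, not the authors' Java implementation, which relied on MMseg4J/IKAnalyzer for segmentation.

```python
# Hypothetical prefLabel -> URI table. A real table would be loaded from the
# CAT and AGROVOC SKOS files; these URIs are placeholders, not real concepts.
PREF_LABEL_TO_URI = {
    "barley": "http://example.org/concept/barley",
    "soil fertility": "http://example.org/concept/soil-fertility",
}


def tag_record(text, label_to_uri):
    """Return (URI, prefLabel) pairs for every label that occurs in the text.

    Matching is a case-insensitive substring test, which is far cruder than
    a real analyzer but shows the shape of the (URI, prefLabel) output.
    """
    lowered = text.lower()
    tags = []
    for label, uri in sorted(label_to_uri.items()):
        if label in lowered:
            tags.append((uri, label))
    return tags


if __name__ == "__main__":
    for uri, label in tag_record("Barley yield and Soil Fertility", PREF_LABEL_TO_URI):
        print(uri, label)
```

The resulting pairs are what would be written into the index alongside each record, so that a record can later be retrieved by concept URI as well as by keyword.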

Page 6

English Coverage Analysis of CASDD Records

Field               Records    Percentage
English Title       289,314    65.74%
English Keywords    286,032    64.99%
English Abstract    286,921    65.19%

Total Records: 440,113
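The percentages follow directly from the record counts and the total. A short check, using only the figures on this slide (the function name `coverage_percent` is introduced here for illustration):

```python
TOTAL_RECORDS = 440_113

# Record counts per English field, as reported on the slide.
ENGLISH_FIELDS = {
    "English Title": 289_314,
    "English Keywords": 286_032,
    "English Abstract": 286_921,
}


def coverage_percent(count, total=TOTAL_RECORDS):
    """Share of records that have the field, as a percentage with two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for field, count in ENGLISH_FIELDS.items():
        print(f"{field}: {coverage_percent(count):.2f}%")
```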

Page 7

The CAT Concepts Coverage in CASDD

Page 8

The AGROVOC Concepts Coverage in CASDD

TermCount       TermFreq.      Record Number    Match Ratio
TermCount>=1    TermFreq>=3    400,009          90.89%
TermCount>=2    TermFreq>=3    320,472          72.82%
TermCount>=3    TermFreq>=3    227,481          51.69%
TermCount>=5    TermFreq>=3    83,992           19.08%
TermCount>=3    TermFreq>=5    51,726           11.75%
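The slide does not define TermFreq and TermCount. One plausible reading, assumed here, is that TermFreq is how often a single concept occurs in a record and TermCount is how many distinct concepts reach that frequency. Under that assumption, a match ratio like those in the coverage table above could be computed as in this sketch. The function names and toy records are hypothetical.

```python
from collections import Counter


def record_matches(concept_hits, min_term_freq, min_term_count):
    """Check one record against the thresholds.

    concept_hits holds one entry per occurrence of a concept in the record.
    Assumed reading: a concept counts only if it occurs at least
    min_term_freq times, and the record matches if at least
    min_term_count concepts do so.
    """
    freq = Counter(concept_hits)
    frequent = [concept for concept, n in freq.items() if n >= min_term_freq]
    return len(frequent) >= min_term_count


def match_ratio(records, min_term_freq, min_term_count):
    """Fraction of records that satisfy both thresholds."""
    matched = sum(record_matches(r, min_term_freq, min_term_count) for r in records)
    return matched / len(records)


if __name__ == "__main__":
    # Toy data only; not taken from CASDD.
    toy_records = [
        ["barley"] * 3,
        ["barley"] * 3 + ["wheat"] * 3,
        ["rice"],
        [],
    ]
    print(match_ratio(toy_records, min_term_freq=3, min_term_count=1))
    print(match_ratio(toy_records, min_term_freq=3, min_term_count=2))
```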

Page 9

CASDD RESTful API (Architecture)

[Architecture diagram. Labels shown:]

• CASDD Database
• Container: Tomcat (Jersey API)
• Container: Solr 4.7 (SQE Plugin)
• Index: CAT + AGROVOC + Mapping
• CASDD RESTful Web Service (API) Endpoint
• Third-party applications: AGRIS, agINFRA
• Arrow labels: Reading Only; Accessing & Linking

Page 10

CASDD RESTful API (Features)

• Aims to provide a lightweight solution for exposing CASDD records to third-party applications.

• Provides several ways to access the records, such as querying by keyword, ARN, PublicationDate, AGROVOC concept URI, or Chinese Agricultural Thesaurus (CAT) URI.

• Results support pagination and sorting.

• Output formats include RDF/XML following the AGRIS AP standard and plain JSON.

• Authentication and detailed logging for evaluation.
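A client request combining these access options might be assembled as in the sketch below. The slides do not give the endpoint or parameter names, so the base URL and every parameter name (`keyword`, `agrovocUri`, `sort`, `page`, `pageSize`, `format`) are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real CASDD API address is not given in the slides.
BASE_URL = "http://example.org/casdd/api/records"


def build_query_url(keyword=None, agrovoc_uri=None, sort=None,
                    page=1, page_size=10, fmt="json"):
    """Build a GET URL for a record query with pagination, sorting and format.

    All parameter names are assumed; only the capabilities themselves
    (keyword or concept-URI lookup, pagination, sorting, JSON or RDF/XML
    output) come from the slide.
    """
    params = {}
    if keyword:
        params["keyword"] = keyword
    if agrovoc_uri:
        params["agrovocUri"] = agrovoc_uri
    if sort:
        params["sort"] = sort
    params["page"] = page
    params["pageSize"] = page_size
    params["format"] = fmt
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(keyword="barley", page=2))
    print(build_query_url(agrovoc_uri="http://example.org/concept/barley", fmt="rdfxml"))
```

`urlencode` percent-encodes the concept URI, so a full URI can be passed safely as a query parameter.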

Page 11

CASDD RESTful API (Samples)

• Browsing records with pagination
• Get records with AGROVOC URI

Page 12

Contents

• Open CASDD as RESTful APIs
• Open Germplasm as RESTful APIs
• Integration and Extension to AGRIS
• Fruitful Results

Page 13

Germplasm Data of CGRIS

• The CGRIS germplasm database is a central repository for all types of plant genetic resources information in China. At present, CGRIS stores over 4000 MB of data covering 200 kinds of crops and 410,000 germplasm accessions.

Page 14

The Germplasm RESTful API (Architecture)

[Architecture diagram. Labels shown:]

• CGRIS Germplasm Database
• Container: Tomcat (Jersey API)
• CGRIS Germplasm RESTful API
• Preflabel2URI Mapping: AGROVOC, CAT
• CGRIS Website
• Third-party applications
• Arrow labels: Reading Only; Accessing & Linking; Redirect to Detail

Page 15

The Germplasm RESTful API (Features)

• Aims to provide a lightweight solution for exposing CGRIS Germplasm records to third-party applications.

• Provides several ways to access the records, such as querying by scientific name, vernacular name, catalogNumber, AGROVOC concept URI, or Chinese Agricultural Thesaurus (CAT) URI.

• Output formats include RDF/XML following the darwincore-germplasm schema and plain JSON.

• Authentication and detailed logging for evaluation.
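Querying germplasm records by concept URI presupposes that each record's names have been mapped to thesaurus concepts. A minimal sketch of such a label-to-URI lookup follows. The field names mirror the access keys named on the slide (scientific name, vernacular name, catalogNumber). The mapping table, the example.org URIs, the catalogue number, and the `conceptURIs` field are hypothetical.

```python
# Hypothetical label -> URI table standing in for a prefLabel-to-URI mapping
# over AGROVOC and CAT. The URIs are placeholders, not real concept URIs.
LABEL_TO_URI = {
    "hordeum vulgare": "http://example.org/agrovoc/hordeum-vulgare",
    "barley": "http://example.org/cat/barley",
}


def link_concepts(record, label_to_uri):
    """Return a copy of the record with concept URIs for its names attached.

    Only exact, case-insensitive label matches are linked; names without
    a mapping are skipped.
    """
    uris = []
    for field in ("scientificName", "vernacularName"):
        name = record.get(field)
        if name and name.lower() in label_to_uri:
            uris.append(label_to_uri[name.lower()])
    return {**record, "conceptURIs": uris}


if __name__ == "__main__":
    # Illustrative record; the catalogue number is made up.
    accession = {
        "catalogNumber": "X0001",
        "scientificName": "Hordeum vulgare",
        "vernacularName": "Barley",
    }
    print(link_concepts(accession, LABEL_TO_URI))
```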

Page 16

The Germplasm RESTful API (Samples)

• Get records with scientific name
• Get records with AGROVOC URI
• Get records with vernacular name

Page 17

Contents

• Open CASDD as RESTful APIs
• Open Germplasm as RESTful APIs
• Integration and Extension to AGRIS
• Fruitful Results

Page 18

The Extended AGRIS in Chinese

[Layered architecture diagram. Labels shown; the grouping by layer is approximate:]

• AGRIS SERVICES LAYER: Query, Search, Result Browsing, Statistics, Single Record, Mashups; CASDD Box, CASDD New Page, Germplasm
• TOOLS LAYER: Java Application, Custom Modules, Chinese Query, Solr 4.7, SQE Plugin
• DATA LAYER: AGRIS, CASDD, Germplasm, Other Resources; CAT, AGROVOC RDF Core, Mapping of CAT and AGROVOC
• Connections: RESTful API (CASDD, Germplasm); Read

Page 19

Enhanced Search in Chinese

• Semantic Query Extension
– Solr Query Expander (SQE) 2.0
• Integrating and Linking CASDD API
• Integrating and Linking Germplasm API
• Other Improvements:
– User Query Automatic Suggestion
– Update AGRIS AP XML files Indexer to Solr 4.7
– Integrating Bing Cloud Dictionary

Page 20

Improved and Updated SQE 2.0

• Fully compliant with Solr 4.5.
• Works with SKOS files with the suffixes .rdf (RDF/XML), .n3 (N3), .ttl (Turtle) and .zip (ZIP).
• Supports loading more than one SKOS file at a time.
• Supports customized expansion by relationship type, such as PREF, ALT, HIDDEN, BROADER, NARROWER, BROADERTRANSITIVE, RELATED.
• Excellent performance with the improved version of IKAnalyzer2012FF (supports English phrase analysis and tagging based on an English dictionary).
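Expansion by relationship type can be pictured with a small sketch: a query term is looked up in a thesaurus and the labels reachable through the selected relation types are added to the query. This is not the SQE plugin's code or API. The in-memory thesaurus, its labels, and the function names are made up for illustration, and only the relation-type names come from the slide.

```python
# Tiny in-memory stand-in for a SKOS thesaurus. Labels are illustrative only.
THESAURUS = {
    "barley": {
        "ALT": ["hordeum vulgare"],
        "BROADER": ["cereals"],
        "NARROWER": ["winter barley"],
    },
}


def expand_query(term, thesaurus, relations=("ALT",)):
    """Return the term plus the labels reachable via the chosen relation types."""
    expanded = [term]
    entry = thesaurus.get(term.lower(), {})
    for relation in relations:
        for label in entry.get(relation, []):
            if label not in expanded:
                expanded.append(label)
    return expanded


def to_or_query(terms):
    """Join terms into a quoted OR query in Lucene-style syntax."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    terms = expand_query("barley", THESAURUS, relations=("ALT", "BROADER"))
    print(to_or_query(terms))
```

Restricting `relations` to a subset such as ALT only keeps the expansion tight, while adding BROADER or NARROWER trades precision for recall.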

Page 21

Semantic Expansion Search with SQE 2.0

Page 22

Integrating and Linking CASDD

• AGRIS Search Results (CASDD Box)
– The box displays the CASDD search results (first five records)
– Records include title, author, keywords, submission date, and abstract.
– get more related records
– get more (detail information)

Page 23

Integrating and Linking CASDD

• Detail information (Single Record information)
– Title (ZH/EN), Keywords (ZH/EN), Authors, Submission Date, Abstract (ZH/EN), CAT keywords, AGROVOC keywords, Journal, ISSN
• More Related Records
– Display more related records
– Browsing records with pagination
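Browsing related records page by page amounts to slicing a result list. The sketch below assumes one-based page numbers and a default page size of five, chosen to echo the five-record CASDD box. The function names are hypothetical.

```python
import math


def paginate(records, page, page_size=5):
    """Return the records on the given one-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return records[start:start + page_size]


def page_count(total, page_size=5):
    """Number of pages needed to show `total` records."""
    return math.ceil(total / page_size)


if __name__ == "__main__":
    related = [f"record-{i}" for i in range(1, 13)]  # 12 toy records
    print(paginate(related, page=1))                  # first five
    print(paginate(related, page=page_count(len(related))))  # last page
```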

Page 24

Linking CGRIS Germplasm Resources

• Germplasm Mashup
– First five CGRIS Germplasm records
– get more… (detail information)
• Navigating to CGRIS Website
– CGRIS website

Page 25

Contents

• Open CASDD as RESTful APIs
• Open Germplasm as RESTful APIs
• Integration and Extension to AGRIS
• Fruitful Results

Page 26

Linking CASDD Records with Box

http://agris.fao.org/agris-search/searchIndex.do?query=barley&x=-430&y=-58

Page 27

Detail Info of a CASDD Record

Page 28

More Related Records From CASDD

Page 29

CGRIS Germplasm Mashup

Page 30

Thanks for Your Listening!