DESCRIPTION
Presentation delivered at the Agricultural Data Interoperability Interest Group -- Research Data Alliance (RDA) 4th Plenary Meeting -- Amsterdam, September 2014
TRANSCRIPT
Opening and Integration of CASDD and Germplasm Data to AGRIS
Prof. Xuefu Zhang & Dr. Guojian Xian
Agricultural Information Institute of CAAS
Research Data Alliance Fourth Plenary Meeting, 22-24 September, 2014, Amsterdam
Contents
• Open CASDD as Restful APIs
• Open Germplasm as Restful APIs
• Integration and Extension to AGRIS
• Fruitful Results
Main Materials
• Chinese Agricultural Sci-tech Documents Database (CASDD)
  – 440,113 records
• CGRIS Germplasm Data
• AGROVOC
  – agrovoc_2013-12-17_core.rdf
• Chinese Agricultural Thesaurus (CAT)
• KOS Mapping Results:
  – AGROVOC_CAT.nt
• AGRIS 2.0
  – (Latest version: 20140427)
About CASDD
• The Chinese Agricultural Sci-tech Documents Database (CASDD), developed by CAAS, is the agricultural bibliographic and abstracts database in China with the largest number of records and the longest time span of documents.
• It covers over 1,000 agricultural academic journals and other materials, with over 6 million records, in the fields of agronomy, horticulture, plant protection, soil sciences, animal husbandry, veterinary science, agricultural engineering, agricultural products processing, agricultural economics, etc.
• It is the most comprehensive, reliable, and accessible resource for agricultural science and technology information from research institutions, education, and related departments.
Refining and Analyzing CASDD
[Diagram: Tagging CAT and AGROVOC concepts to CASDD. Labels in the figure: CASDD; CAT; AGROVOC RDF Core; Mapping of CAT and AGROVOC; Virtuoso Triple Store; SPARQL query; Java Application; MMseg4J/IKAnalyzer; Tagging (URI, prefLabel); Indexing; Solr 4.7 with SQE Plugin (write and read); CASDD Index.]
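The tagging step in this figure attaches a concept URI and its preferred label to each record. A minimal sketch of that idea follows, assuming a plain label lookup on English text. The thesaurus entries and `example.org` URIs are placeholders, and the function name `tag_record` is hypothetical. The pipeline in the slides uses MMseg4J/IKAnalyzer for Chinese segmentation, which this sketch does not reproduce.

```python
import re

# Hypothetical thesaurus entries: preferred label -> concept URI.
# The URIs are placeholders, not real CAT or AGROVOC identifiers.
CONCEPTS = {
    "barley": "http://example.org/agrovoc/c_barley",
    "plant protection": "http://example.org/agrovoc/c_plant_protection",
    "soil": "http://example.org/cat/c_soil",
}


def tag_record(text, concepts=CONCEPTS):
    """Return sorted (uri, pref_label) pairs for labels found in the text."""
    lowered = text.lower()
    tags = []
    for label, uri in concepts.items():
        # Whole-word, case-insensitive match on the preferred label.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags.append((uri, label))
    return sorted(tags)


record = {"title": "Effects of soil salinity on barley yield"}
record["tags"] = tag_record(record["title"])
```

Storing the URI next to the label keeps the tag usable as a link to the thesaurus even if the label text changes.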
English Coverage Analysis of CASDD Records
Fields Records Percentage
English Title 289,314 65.74%
English Keywords 286,032 64.99%
English Abstract 286,921 65.19%
Total Records: 440,113
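The percentages in this table follow from dividing each field count by the total of 440,113 records. A short check, using only the figures shown above (the helper name `coverage` is illustrative):

```python
TOTAL_RECORDS = 440_113

# Field counts copied from the table above.
english_fields = {
    "English Title": 289_314,
    "English Keywords": 286_032,
    "English Abstract": 286_921,
}


def coverage(count, total=TOTAL_RECORDS):
    """Share of records that have the field, as a percentage with two decimals."""
    return round(100 * count / total, 2)


for field, count in english_fields.items():
    print(f"{field}: {coverage(count):.2f}%")
```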
The CAT Concepts Coverage in CASDD
The AGROVOC Concepts Coverage in CASDD
TermCount      TermFreq.      Record Number   Match Ratio
TermCount>=1   TermFreq>=3    400,009         90.89%
TermCount>=2   TermFreq>=3    320,472         72.82%
TermCount>=3   TermFreq>=3    227,481         51.69%
TermCount>=5   TermFreq>=3    83,992          19.08%
TermCount>=3   TermFreq>=5    51,726          11.75%
CASDD Restful API (Architecture)
[Diagram: CASDD Restful API architecture. Labels in the figure: CASDD Database; Index; Solr 4.7 (SQE Plugin); CAT + AGROVOC + Mapping; Tomcat (Jersey API); Container; CASDD Restful Web Service (API) Endpoint; Reading Only; Accessing & Linking; Third-Party Application; AGRIS; agINFRA.]
CASDD Restful API(Features)
• Aims to provide a lightweight way to expose CASDD records to third-party applications.
• Provides several ways to access the records, such as queries by keyword, ARN, PublicationDate, AGROVOC concept URI, or Chinese Agricultural Thesaurus (CAT) URI.
• The results also support pagination and sorting.
• The output formats include RDF/XML following the AGRIS AP standard and plain JSON.
• Authentication and detailed logging for evaluation.
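A client of an API with these features would build a read-only GET request that combines a query, paging, sorting, and an output format. The sketch below shows one way to do that. The endpoint `http://example.org/casdd/api/records` and every parameter name (`q`, `agrovocUri`, `page`, `pageSize`, `sort`, `format`) are assumptions for illustration, since the slides do not list the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real CASDD API address is not given in the slides.
BASE_URL = "http://example.org/casdd/api/records"


def build_query_url(keyword=None, agrovoc_uri=None, page=1, page_size=10,
                    sort="PublicationDate", fmt="json"):
    """Build a GET URL for one page of results (parameter names are assumed)."""
    if fmt not in ("json", "rdfxml"):
        raise ValueError("fmt must be 'json' or 'rdfxml'")
    params = {"page": page, "pageSize": page_size, "sort": sort, "format": fmt}
    if keyword:
        params["q"] = keyword
    if agrovoc_uri:
        # urlencode percent-encodes the URI so it is safe inside a query string.
        params["agrovocUri"] = agrovoc_uri
    return BASE_URL + "?" + urlencode(params)


print(build_query_url(keyword="barley", page=2))
```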
CASDD Restful API(Samples)
Browsing records with pagination
Get records with AGROVOC URI
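Browsing with pagination amounts to sorting the result set and returning one slice of it together with the page count. A minimal in-memory sketch, assuming records are dictionaries (the function name `paginate` and the field names are illustrative, not taken from the CASDD API):

```python
def paginate(records, page=1, page_size=5, sort_key=None, descending=False):
    """Return one page of records plus the total number of pages."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    if sort_key is not None:
        ordered = sorted(records, key=lambda r: r[sort_key], reverse=descending)
    else:
        ordered = list(records)
    # Ceiling division gives the number of pages needed for all records.
    total_pages = (len(ordered) + page_size - 1) // page_size
    start = (page - 1) * page_size
    return ordered[start:start + page_size], total_pages


# Twelve made-up records, browsed five at a time.
sample = [{"id": i} for i in range(1, 13)]
last_page, pages = paginate(sample, page=3, page_size=5)
```

A page number past the end returns an empty list rather than an error, which lets a client detect the end of the results.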
Germplasm Data of CGRIS
• The CGRIS germplasm database is a central repository for all types of plant genetic resources information in China. At present, CGRIS stores over 4,000 MB of data on 200 kinds of crops and 410,000 germplasm accessions.
The Germplasm Restful API (Architecture)
[Diagram: Germplasm Restful API architecture. Labels in the figure: CGRIS Germplasm Database; Tomcat (Jersey API); Container; CGRIS Germplasm Restful API; AGROVOC; CAT; prefLabel-to-URI Mapping; Reading Only; Accessing & Linking; Redirect to Detail; CGRIS Website; Third-Party Application.]
The Germplasm Restful API (Features)
• Aims to provide a lightweight way to expose CGRIS germplasm records to third-party applications.
• Provides several ways to access the records, such as queries by scientific name, vernacular name, catalogNumber, AGROVOC concept URI, or Chinese Agricultural Thesaurus (CAT) URI.
• The output formats include RDF/XML following the darwincore-germplasm schema and plain JSON.
• Authentication and detailed logging for evaluation.
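The lookup options listed for the germplasm API can be pictured as exact-match filters over record fields, with the matches returned as plain JSON. The sketch below is a toy under that assumption. The records and catalog numbers are made up, and the JSON envelope (`count`, `records`) is hypothetical rather than the API's actual response shape.

```python
import json

# Toy records with made-up catalog numbers. The field names mirror the
# query options named in the slides.
RECORDS = [
    {"catalogNumber": "X0001", "scientificName": "Hordeum vulgare",
     "vernacularName": "barley"},
    {"catalogNumber": "X0002", "scientificName": "Oryza sativa",
     "vernacularName": "rice"},
]


def find_records(records, **criteria):
    """Return records whose fields equal every criterion, ignoring case."""
    def matches(rec):
        return all(
            str(rec.get(field, "")).lower() == str(value).lower()
            for field, value in criteria.items()
        )
    return [rec for rec in records if matches(rec)]


def to_json(records):
    """Serialize matches in a simple, assumed JSON envelope."""
    return json.dumps({"count": len(records), "records": records},
                      ensure_ascii=False)


print(to_json(find_records(RECORDS, scientificName="oryza sativa")))
```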
The Germplasm Restful API (Samples)
Get records with scientific name
Get records with AGROVOC URI
Get records with vernacular name
The Extended AGRIS in Chinese
[Diagram: architecture of the extended AGRIS in Chinese, drawn as three layers: AGRIS Services Layer, Tools Layer, and Data Layer. Other labels in the figure: Query; Search Result Browsing; Statistics; Single Record; Mashups (CASDD, Germplasm); CASDD Box; CASDD New Page; Restful API (Read); Java Application; Custom Modules; Chinese Query; Solr 4.7 with SQE Plugin; AGRIS; CASDD; Germplasm; Other Resources; CAT; AGROVOC RDF Core; Mapping of CAT and AGROVOC.]
Enhanced Search in Chinese
• Semantic Query Expansion
  – Solr Query Expander (SQE) 2.0
• Integrating and Linking CASDD API
• Integrating and Linking Germplasm API
• Other Improvements:
  – User Query Automatic Suggestion
  – Update AGRIS AP XML files Indexer to Solr 4.7
  – Integrating Bing Cloud Dictionary
Improved and Updated SQE 2.0
• Fully compliant with Solr 4.5.
• Works with SKOS files with the suffixes .rdf (RDF/XML), .n3 (N3), .ttl (Turtle), and .zip (ZIP).
• Supports loading more than one SKOS file at a time.
• Supports customized expansion by relationship type, such as PREF, ALT, HIDDEN, BROADER, NARROWER, BROADERTRANSITIVE, RELATED.
• Excellent performance with the improved version of IKAnalyzer2012FF (supports English phrase analysis and tagging based on an English dictionary).
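Expansion by relationship type can be pictured as replacing a query term with an OR over the labels reachable through the chosen relations. The sketch below illustrates that idea only. It is not SQE's implementation, the thesaurus entry is a toy, and `expand_query` is a hypothetical name.

```python
# Toy SKOS-like entry; labels and relations are illustrative only.
THESAURUS = {
    "barley": {
        "PREF": ["barley"],
        "ALT": ["Hordeum vulgare"],
        "BROADER": ["cereals"],
        "NARROWER": ["winter barley"],
    },
}


def expand_query(term, relations=("PREF", "ALT"), thesaurus=THESAURUS):
    """Expand a term into an OR query over labels of the chosen relation types."""
    concept = thesaurus.get(term.lower())
    labels = []
    if concept is not None:
        for relation in relations:
            for label in concept.get(relation, []):
                if label not in labels:  # keep order, drop duplicates
                    labels.append(label)
    if not labels:
        # Unknown term or no labels for these relations: search the term as typed.
        return f'"{term}"'
    return " OR ".join(f'"{label}"' for label in labels)


print(expand_query("barley", relations=("PREF", "ALT", "NARROWER")))
```

Choosing the relation types per query lets a search stay narrow (PREF and ALT only) or widen recall by adding BROADER or NARROWER labels.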
Semantic Expansion Search with SQE2.0
Integrating and Linking CASDD
• AGRIS Search Results (CASDD Box)
  – The box displays the search results of CASDD (first five records)
  – Records include title, author, keywords, submission date, and abstract.
  – get more related records
  – get more (detail information)
Integrating and Linking CASDD
• Detail information (Single Record information)
  – Title (ZH/EN), Keywords (ZH/EN), Authors, Submission Date, Abstract (ZH/EN), CAT keywords, AGROVOC keywords, Journal, ISSN
• More Related Records
  – Display more related records
  – Browsing records with pagination
Linking CGRIS Germplasm Resources
• Germplasm Mashup
  – get more… (detail information)
  – First five CGRIS Germplasm records information
• Navigating to CGRIS Website
  – CGRIS website
Linking CASDD Records with Box
http://agris.fao.org/agris-search/searchIndex.do?query=barley&x=-430&y=-58
Detail Info of a CASDD Record
More Related Records From CASDD
CGRIS Germplasm Mashup
Thank You for Listening!