agINFRA Germplasm metadata analysis

Metadata analysis of germplasm collections The case of agINFRA Dr. Vassilis Protonotarios Agricultural Biotechnologist, PhD Agro-Know Technologies, Greece e-Conference on Germplasm Data Interoperability Session 2: “Status of data and metadata for germplasm”

Upload: vassilis-protonotarios

Post on 27-Jan-2015


DESCRIPTION

Presentation of the two agINFRA Germplasm data sources (CGRIS, China and CRA, Italy) and the metadata used for the description of their germplasm accessions. Presented during Session 2 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)

TRANSCRIPT

Page 1: agINFRA Germplasm metadata analysis

Metadata analysis of germplasm collections

The case of agINFRA

Dr. Vassilis Protonotarios
Agricultural Biotechnologist, PhD
Agro-Know Technologies, Greece

e-Conference on Germplasm Data Interoperability
Session 2: “Status of data and metadata for germplasm”

Page 2: agINFRA Germplasm metadata analysis

Structure of the presentation

1. The agINFRA germplasm data sources
– Chinese Crop Germplasm Information System
– Italian National Germplasm Database

2. Current status
– Mappings
– Linked Data approach

3. Conclusions

Page 3: agINFRA Germplasm metadata analysis

The agINFRA germplasm data sources

Page 4: agINFRA Germplasm metadata analysis

agINFRA germplasm data sources

• Italian Germplasm Database (CRA)
– Data available through EURISCO -> GENESYS
– Uses EURISCO set of descriptors
– Data also available through GBIF

• Chinese Crop Germplasm Information System (CGRIS/CAAS)
– Data unavailable through aggregators
– Own schema used for description of germplasm accessions
– Metadata exposure in CSV

Page 5: agINFRA Germplasm metadata analysis

agINFRA germplasm data analysis

1. Analysis of agINFRA germplasm data sources
2. Analysis of metadata schemas used
3. Identification of external schemas
– Review of existing work
4. Definition of a base schema (descriptors)
5. Mappings of various schemas to the base one
6. Development of a linked data approach for linking germplasm data sources

Page 6: agINFRA Germplasm metadata analysis

1. Chinese Crop Germplasm Information System (CGRIS / CAAS)

Page 7: agINFRA Germplasm metadata analysis

Chinese Crop Germplasm Information System (CGRIS)

• Provided by: Chinese Academy of Agricultural Sciences
• A central repository for all types of plant genetic resources information. It consists of six subsystems:
1. The management system of the National Crop Gene Bank (NCGB)
2. The management system of the long-term storage in Qinghai
3. The management system of the National Germplasm Resources Nursery
4. The crop characterization and evaluation database system
5. The database system for germplasm exchange at home and abroad
6. The management system of the medium-term storage in Beijing

URL: http://icgr.caas.net.cn/cgrisintroduction.html

Page 8: agINFRA Germplasm metadata analysis

CGRIS: Data

At present, CGRIS holds:
• > 2000 MB of data on 180 kinds of crops
– including food crops, fibre plants, oil crops, vegetables, fruit trees, tea, mulberry, tobacco, sugar crops, green manure crops, tropical crops, etc.
• 390,000 accessions of germplasm

Page 9: agINFRA Germplasm metadata analysis

CGRIS: Accessions (indicative list)

http://icgr.caas.net.cn/cgrisintroduction.html

Page 10: agINFRA Germplasm metadata analysis

Crop Germplasm Classification

Page 11: agINFRA Germplasm metadata analysis

Info on wheat varieties

Page 12: agINFRA Germplasm metadata analysis

Info on wheat varieties

Page 13: agINFRA Germplasm metadata analysis

CGRIS: Germplasm Data Query

Page 14: agINFRA Germplasm metadata analysis

CGRIS: Germplasm Data Query

Page 15: agINFRA Germplasm metadata analysis

CGRIS Metadata

• CGRIS germplasm descriptors are based on its own schema
– It can be seen as the de facto standard for germplasm accession information in China.
– It is based on metadata scheme standards such as those developed by IPGRI (Bioversity) and GRIN.

Page 16: agINFRA Germplasm metadata analysis

CGRIS: Basic Descriptors

Page 17: agINFRA Germplasm metadata analysis

CGRIS: Wheat descriptors

Page 18: agINFRA Germplasm metadata analysis

CGRIS Metadata: Next steps

• A mapping to the Multi-crop Passport Descriptors (MCPD) standard is intended
– According to CAAS subject experts, such a mapping should be rather easy to produce.

Page 19: agINFRA Germplasm metadata analysis

CGRIS: Exposing data

• Data stored in relational DBs
• Hosted in an SQL server
• Exposure of data as CSV files (partially in Chinese)
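
As a rough illustration of consuming such a CSV export, the sketch below parses CSV text with Python's standard `csv` module and renames Chinese column headers to English keys. The column names, the translations, and the sample row are all invented for illustration; the slides do not show the actual CGRIS export layout or its character encoding.

```python
import csv
import io

# Hypothetical header translations; the real CGRIS column names are not
# given in the presentation.
HEADER_TRANSLATIONS = {
    "统一编号": "accession_id",
    "品种名称": "variety_name",
    "原产地": "origin",
}


def read_accessions(csv_text):
    """Parse CSV text and return rows keyed by English field names.

    Headers without a known translation are kept unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({HEADER_TRANSLATIONS.get(key, key): value
                     for key, value in row.items()})
    return rows


# Invented sample data, standing in for a file opened with an explicit
# encoding such as open(path, encoding="utf-8", newline="").
SAMPLE = "统一编号,品种名称,原产地,notes\nZM000001,Example wheat,北京,\n"

accessions = read_accessions(SAMPLE)
print(accessions[0]["accession_id"])
```

Only the headers are translated here; values that are themselves in Chinese (such as the origin above) are passed through untouched and would need a separate vocabulary lookup.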

Page 20: agINFRA Germplasm metadata analysis

CGRIS: IPR information

• The CGRIS website is public and accessible to everybody. The information is provided free of charge but remains subject to copyright.

• With regard to data exchange, there is no explicit policy to follow.

• CGRIS does not have an Open Access mandate, and the members of the CGRIS network apply their own institutional policies.

Page 21: agINFRA Germplasm metadata analysis

2. Italian Germplasm Database (CRA)

Page 22: agINFRA Germplasm metadata analysis

Italian Germplasm Database

• Provided by: Italian Council for Research and Experimentation in Agriculture

• Developed in the context of the “Plant Genetic Resources/FAO” project in 2004
– Research Centres and Units of the CRA
– The Institute of Plant Genetics of the CNR in Bari
– NGO “Rete Semi Rurali”
– University collections (Perugia, Potenza etc.)

URL: http://fru.entecra.it

Page 23: agINFRA Germplasm metadata analysis
Page 24: agINFRA Germplasm metadata analysis

CRA Germplasm: Data

Current status of germplasm data (CRA)
• 20,954 records from Italy are included in EURISCO, of which 17,212 are from CRA
• 28,509 records for 275 plant species in the National Inventory (in general)
– does not allow for identifying the number of CRA germplasm records

Page 25: agINFRA Germplasm metadata analysis

CRA: Accessions (indicative list)

URL: http://fru.entecra.it/accessioni.php

Page 26: agINFRA Germplasm metadata analysis

Info on specific species

Page 27: agINFRA Germplasm metadata analysis
Page 28: agINFRA Germplasm metadata analysis

EURISCO descriptors

Page 29: agINFRA Germplasm metadata analysis

CRA Metadata

• Most CRA institutional databases use the MCPD
– however, in the records provided to the National Inventory several fields are often not filled.
• Some CRA collections also use descriptors defined by
– the International Union for the Protection of New Varieties of Plants (UPOV) and
– the National Register of New Varieties.

• Ensure mapping to the Multi-crop Passport Descriptors (MCPD)/EURISCO
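
One way to quantify how often passport fields are left unfilled is a per-descriptor fill rate. The sketch below is a minimal illustration of that idea: the column names are MCPD descriptor names, but the records and the institute code `ITA000` are invented placeholders, not CRA data.

```python
# MCPD descriptor names used as column keys.
FIELDS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ORIGCTY"]


def fill_rates(records, fields):
    """Return, per field, the share of records with a non-empty value.

    Missing keys, None, and whitespace-only strings all count as unfilled.
    """
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for record in records
                     if (record.get(field) or "").strip())
        rates[field] = filled / total if total else 0.0
    return rates


# Invented sample records; "ITA000" is a placeholder institute code.
sample_records = [
    {"INSTCODE": "ITA000", "ACCENUMB": "A1", "GENUS": "Malus",
     "SPECIES": "domestica", "ORIGCTY": "ITA"},
    {"INSTCODE": "ITA000", "ACCENUMB": "A2", "GENUS": "Prunus",
     "SPECIES": "", "ORIGCTY": ""},
    {"INSTCODE": "ITA000", "ACCENUMB": "A3", "GENUS": "Pyrus",
     "SPECIES": "communis"},
    {"INSTCODE": "ITA000", "ACCENUMB": "A4", "GENUS": "Vitis",
     "SPECIES": "vinifera", "ORIGCTY": " "},
]

rates = fill_rates(sample_records, FIELDS)
for field in FIELDS:
    print(f"{field}: {rates[field]:.0%}")
```

A report like this could help decide which descriptors are reliable enough to use as mapping targets.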

Page 30: agINFRA Germplasm metadata analysis

CRA: IPR information

• The CRA website is public and accessible to everybody. The information is provided free of charge but remains subject to copyright.

• The Multilateral System (MLS) of the Treaty demands free availability of the information on the PGRFA that are under the management and control of the Contracting Parties and in the public domain (Treaty, Art. 11.2).

• This excludes
– germplasm accessions that are subject to IPR and
– other legally binding protection which restricts the Contracting Party’s control over the material.

• Accessions that are not covered by IPR include old and autochthonous varieties, crop wild relatives and other material found in in-situ conditions, new cultivars not protected by IPR and cultivars whose IPR have expired.

Page 31: agINFRA Germplasm metadata analysis

Conclusions

Page 32: agINFRA Germplasm metadata analysis

Current status

• First version of mappings is available
• EURISCO descriptors used as base schema
– MCPD
– Darwin Core for Genebanks
– ABCD
– CGRIS
– CRA
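
A mapping to a base schema can be pictured as a crosswalk table keyed by base descriptor. The sketch below is only an illustration of that structure: the base keys are MCPD/EURISCO descriptor names and the `dwc` entries are existing Darwin Core term names, but the pairings shown and every CGRIS and CRA column name are assumptions made up for the example, not the project's published mapping.

```python
# Crosswalk keyed by base (MCPD/EURISCO) descriptor. The pairings and the
# "cgris"/"cra" column names are hypothetical, for illustration only.
CROSSWALK = {
    "ACCENUMB": {"dwc": "catalogNumber", "cgris": "unified_id",
                 "cra": "acc_number"},
    "GENUS": {"dwc": "genus", "cgris": "genus_name", "cra": "genus"},
    "SPECIES": {"dwc": "specificEpithet", "cgris": "species_name",
                "cra": "species"},
    "ORIGCTY": {"dwc": "countryCode", "cgris": "origin_country",
                "cra": "country"},
}


def to_base(record, schema):
    """Translate a record from a source schema into base descriptor names.

    Fields with no entry in the crosswalk are collected under "_unmapped"
    so that nothing is silently dropped.
    """
    reverse = {names[schema]: base
               for base, names in CROSSWALK.items() if schema in names}
    result = {}
    unmapped = {}
    for key, value in record.items():
        if key in reverse:
            result[reverse[key]] = value
        else:
            unmapped[key] = value
    if unmapped:
        result["_unmapped"] = unmapped
    return result


# Invented record using the hypothetical CGRIS column names.
mapped = to_base(
    {"unified_id": "ZM1", "genus_name": "Triticum", "awn_colour": "white"},
    "cgris",
)
print(mapped)
```

A renaming table like this covers only one-to-one field correspondences; differences in value encodings (for example, country code formats) would need conversion rules on top.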

Page 33: agINFRA Germplasm metadata analysis

Mapping table

Page 34: agINFRA Germplasm metadata analysis

Mapping table

Page 35: agINFRA Germplasm metadata analysis

Development of decision trees

Page 36: agINFRA Germplasm metadata analysis

Development of decision trees

Page 37: agINFRA Germplasm metadata analysis

Linked Data

• A linked data approach will be used by agINFRA for linking germplasm data sources

• OpenAGRIS already aggregates germplasm data using AGROVOC
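
To make the linking idea concrete, the sketch below emits N-Triples in which accessions from two sources point at the same AGROVOC concept, which is what would let a consumer join them. It is an assumption-laden illustration: the accession base URI (`example.org`), the choice of Dublin Core properties, the accession identifiers, and the concept id `c_XXXX` are all placeholders, and the slides do not specify the vocabulary agINFRA will use.

```python
from urllib.parse import quote

# Placeholder base URI for accessions; not an agINFRA endpoint.
ACCESSION_BASE = "http://example.org/accession/"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"

# AGROVOC concept URIs live under this namespace; "c_XXXX" is a
# placeholder, not a real concept id.
PLACEHOLDER_CONCEPT = "http://aims.fao.org/aos/agrovoc/c_XXXX"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def accession_triples(accession_id, source, concept_uri):
    """Return N-Triples lines describing one accession."""
    subject = f"<{ACCESSION_BASE}{quote(source, safe='')}/{quote(accession_id, safe='')}>"
    return [
        f'{subject} <{DCTERMS_IDENTIFIER}> "{escape_literal(accession_id)}" .',
        f"{subject} <{DCTERMS_SUBJECT}> <{concept_uri}> .",
    ]


# Invented accession identifiers for the two sources.
triples = (accession_triples("ZM000001", "cgris", PLACEHOLDER_CONCEPT)
           + accession_triples("A1", "cra", PLACEHOLDER_CONCEPT))
print("\n".join(triples))
```

Because both accessions share the same object URI in their subject triple, a query on that concept would return records from both collections.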

Page 38: agINFRA Germplasm metadata analysis

Conclusions

• Both schemas / sets of descriptors can be mapped to the EURISCO ones

• Linked Data approach will facilitate linking of germplasm data from CRA/CGRIS

• EURISCO descriptors to be published as linked data
– To be used as the base of passport data

• Linking to other germplasm standards
– e.g. Darwin Core for Genebanks*

*https://code.google.com/p/darwincore-germplasm/wiki/DarwinCoreGermplasmMapping

Page 39: agINFRA Germplasm metadata analysis

Take home message

• The identification of common properties between different metadata schemas will facilitate the linked data framework

Page 40: agINFRA Germplasm metadata analysis

(Indicative) List of References

• agINFRA Deliverable D2.3 “Review of Content Requirements”

• agINFRA Deliverable D5.3 “Conceptual specification of linked agricultural data framework”

• agINFRA Germplasm Working Group Wiki http://wiki.aginfra.eu/index.php/Germplasm_Working_Group

• EURISCO passport descriptors http://www.ecpgr.cgiar.org/germplasm_databases.html

• Draft Mapping of EURISCO Descriptors to ABCD 2.06 http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf

Page 41: agINFRA Germplasm metadata analysis

Source: http://verastic.com/social/why-do-people-not-say-thank-you.html

Contact me: [email protected]