Obregón, Mexico March 24, 2014
Importance of data sharing and germplasm movement
Susan McCouch Cornell University
Why share?
• Humans have always shared; it is in our nature to share.
• Sharing creates and sustains relationships, enhances knowledge, protects from ignorance
• Sharing is powerful, helps reach goals, increases competence, brings rewards
Knowledge Sharing
• Knowledge sharing is a fundamental process of civilization; it is central to learning and it creates community
• Making specific knowledge available to the right people at the right time is key to achieving goals
• People who have knowledge and relevant data gain respect when they share what they know
• Today, information technology (IT) codifies & helps manage knowledge and data sharing
Explicit knowledge and data sharing is greatly facilitated by information technology (IT)
In the information age, we share more content, from more sources, with more people than ever before
We also have created an elaborate legal system designed to allow “ownership” of knowledge through Intellectual Property laws
Germplasm sharing
• Historically, plant genetic resources and breeding materials were shared openly and moved rapidly around the globe
• Germplasm was conceived of and treated as a “public good”: available to all without restriction; its value was enhanced (not diminished) by use
• Over the last 50 years, free and open germplasm exchange has been slowly restricted
– Interpretations of the Treaty on Biological Diversity
– Expanded legal (IP) protection of varieties
– Privatization of plant breeding and plant breeding research
– Significant spill-over effects of biotechnology & genomics
Biotechnology & Genomics: powerful partners in plant breeding
The Borlaug Global Rust Initiative (BGRI) aims to utilize biotechnology, genomics and classical breeding to develop & deploy rust-resistant wheat varieties
• Identify host resistance genes
• Provide markers for selection of multi-genic rust resistance
• Create novel forms of resistance or immunity
• Enable modeling and genomic prediction (GS)
• Increase breeding efficiency
• Basis for intellectual property (IP) protection
BGRI advocates the sharing of data & germplasm among participants to hasten the development of resistance
Rules of exchange
GERMPLASM
– Landraces and wild species (Gene Banks)
– Advanced breeding lines (Breeding Programs)
• Governed by different rules of exchange
– Gene Banks => International Treaty on Biological Diversity
• International vs National Gene Banks
– Breeding Lines => Bilateral agreements
• Freely distribute if no IP, formal agreements if protected
DATA
• Bi- or multi-lateral agreements
• Centralized data & information resource (open or restricted access)
– Genomic info & R-gene haplotypes - accessions & advanced public lines
– Pedigree info, IBD blocks, genetic relationships, pop structure
– Phenotype, Environment (use of ontologies & controlled vocabulary)
Big Data
• High-throughput data collection (sequencing, “omics”) generates so much data so quickly that it outpaces our ability to analyze & make sense of them
• Standards urgently needed to guide data collection, management, annotation/curation, sharing, & integration
• Improved experimental designs needed to help optimize value of field evaluation & take advantage of “big data”
• Critically important to link genotypic, phenotypic and environmental information with seed stocks and genetic resources being evaluated => enormous tracking problem
Genotype
• captures wide range of polymorphisms
• supports full range of ploidy
• connects to genomic map
Field/Plant Observation
• integrates planting, treatment, locality data
• links to individual plant sample
Germplasm
• pedigree description
• seed stock information
Phenotype
• quantitative or qualitative traits
• supports ontology integration
GDPDM: www.maizegenetics.net/gdpdm
Track seed stocks, experiments, reps, genotypes, phenotypes, environments; provide data download options, querying tools, pipeline for GWAS, multivariate analysis and data mining.
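The linking described above (seed stocks tied to genotypes, field observations and phenotypes) can be sketched in a few lines. This is only an illustrative sketch, not the GDPDM schema: every class, field and method name here (SeedStock, FieldObservation, GenotypeCall, DiversityStore, records_for) is a hypothetical name chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SeedStock:
    """Germplasm record: a seed stock and its pedigree description."""
    stock_id: str
    pedigree: str


@dataclass
class FieldObservation:
    """One trait value scored on a stock at a location under a treatment."""
    stock_id: str
    location: str
    treatment: str
    trait: str
    value: float


@dataclass
class GenotypeCall:
    """One marker call for a stock."""
    stock_id: str
    marker: str
    alleles: str


@dataclass
class DiversityStore:
    """Hypothetical in-memory store keyed on seed stock IDs."""
    stocks: Dict[str, SeedStock] = field(default_factory=dict)
    observations: List[FieldObservation] = field(default_factory=list)
    genotypes: List[GenotypeCall] = field(default_factory=list)

    def add_stock(self, stock: SeedStock) -> None:
        self.stocks[stock.stock_id] = stock

    def _require_stock(self, stock_id: str) -> None:
        # Refuse data that cannot be traced back to a registered seed stock.
        if stock_id not in self.stocks:
            raise KeyError(f"unknown seed stock: {stock_id}")

    def add_observation(self, obs: FieldObservation) -> None:
        self._require_stock(obs.stock_id)
        self.observations.append(obs)

    def add_genotype(self, call: GenotypeCall) -> None:
        self._require_stock(call.stock_id)
        self.genotypes.append(call)

    def records_for(self, stock_id: str) -> dict:
        """Gather everything known about one seed stock."""
        self._require_stock(stock_id)
        return {
            "stock": self.stocks[stock_id],
            "genotypes": [g for g in self.genotypes if g.stock_id == stock_id],
            "observations": [o for o in self.observations if o.stock_id == stock_id],
        }
```

The design choice worth noting is that every genotype or phenotype record must point at a registered seed stock, which is one simple way to keep the tracking problem from growing unnoticed.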
Diversity Database
Data information resources
Can we connect the dots?
• Everything is moving very quickly
– Data and information
– UG99
• Many people have information that could help others but it is not always available in time or in ways that are useful
• Germplasm movement is more restricted than information sharing => information alone can be critical
– many sources of the same alleles, many forms of local adaptation
Introgression lines, mapping populations
Figure: CSSL, bi-parental and MAGIC populations (5,000 lines, 300 lines, 80 lines); lines labeled RUF01 to RUF48 plotted by chromosome (1-12); axis: number of markers (0-200).
Conflicting motivations
• Small public or private sector groups band together as “research consortia” & agree to share data and create “economy of scale” for problem-solving (BGRI)
• Large, private sector enterprises already at “economy of scale”, benefit by greater access to info & materials, may contribute financially but do not share data or germplasm
• Large, coordinated public sector enterprises (China, India, Brazil) may represent “economies of scale” but lack strong breeding & information infrastructure, want to participate in research consortia, but lack motivation to share
Concept of “public good”
– A public good is a resource that is accessible to all: individuals cannot be excluded from using it, and its value and availability do not diminish with use. Generally paid for by taxation.
– Free Rider Problem: occurs when people are allowed to use a resource without paying for it
– If enough people can use the resource without paying, there is a danger that it breaks the system or, in a free market, that the resource will be under-provided or not provided at all
– In the case of germplasm, dissemination of gene bank material is key to the gene bank mission; development & deployment of improved varieties is key to breeding; capturing the value of varieties is the motivation to “own & exclude” (private & public sector players)
Solutions to Free Rider problems
• Tax: Divide the cost equally among beneficiaries; ensure that everyone who benefits contributes financially to maintain & support the resource.
• Altruism: Ask for donations. Some ‘free riders’ will not donate, but enough people are willing to do so if the cost of the public good is low and its value high.
• Privatize: Restrict access to those willing to pay.
~ Mix-N-Match ~
Figure (map) labels: IRTP; India, Mexico, Iraq, Ethiopia, Uganda, Turkey, Kenya, Spain
Data sharing among partners to improve rate of genetic gain in breeding
Germplasm
Genotype, Phenotype
Multi-trait modeling
Genome Wide Association Studies
Database needed to manage data for consortium partners
Define IBD blocks => impute genotypes, impute phenotypes, characterize germplasm
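To make the step from shared blocks to imputed genotypes concrete, here is a toy sketch. It assumes that lines have already been assigned to groups that share a block identical by descent, and it fills each missing marker call with the most common call among the other lines in the same group. Real imputation tools are probabilistic; the function name impute_by_ibd, the 0/1/2 genotype coding and the group labels are assumptions made for this illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Calls = List[Optional[int]]  # one entry per marker in the block; None = missing


def impute_by_ibd(genotypes: Dict[str, Calls], ibd_group: Dict[str, str]) -> Dict[str, Calls]:
    """Fill missing calls with the consensus call of lines sharing the same IBD group."""
    # Collect the lines that belong to each IBD group.
    members = defaultdict(list)
    for line, group in ibd_group.items():
        members[group].append(line)

    # Work on a copy so the input data stay untouched.
    imputed = {line: list(calls) for line, calls in genotypes.items()}

    for lines in members.values():
        n_markers = len(genotypes[lines[0]])
        for m in range(n_markers):
            observed = [genotypes[l][m] for l in lines if genotypes[l][m] is not None]
            if not observed:
                continue  # nobody in the group was genotyped here; leave the gap
            consensus = Counter(observed).most_common(1)[0][0]
            for l in lines:
                if imputed[l][m] is None:
                    imputed[l][m] = consensus
    return imputed


# Hypothetical example: lines A and B share a block, line C does not.
example = impute_by_ibd(
    {"A": [0, None, 2], "B": [0, 1, None], "C": [None, None, 0]},
    {"A": "h1", "B": "h1", "C": "h2"},
)
```

In this example A and B fill each other's gaps, while C keeps its missing calls because no other line shares its block. The same idea extends to phenotypes only with much more caution, since environment also shapes the trait.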
Disruptive change
• Big data (geno, pheno, environment) and computational power provide basis for multivariate modeling of plant performance.
• More & more emphasis on using genotype to predict phenotypic potential of individuals & populations => driver is low cost of genotyping and high-performance computing.
• Data sharing and germplasm exchange allow consortium members to assemble info about pathogen, R-genes, iteratively develop and test predictions about performance x environment
• Requires new data management practices among partners to ensure timely & systematic sharing of data and germplasm
And most importantly, requires real people on the ground observing, monitoring, selecting, and reporting about their work with wheat rust
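As a rough illustration of using genotype to predict phenotype, the sketch below fits a ridge regression of a trait on marker scores and predicts the trait for a new genotype. Ridge regression is one common baseline for genomic prediction; the slides do not name a specific model, so the method, the function names (solve, fit_ridge, predict) and the penalty value are all assumptions for this example.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X: List[List[float]], y: List[float], lam: float) -> Tuple[float, List[float]]:
    """Return (intercept, marker effects) minimizing squared error + lam * sum(effect^2)."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    col_means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    # Center markers and trait so the intercept is not penalized.
    Xc = [[X[i][j] - col_means[j] for j in range(p)] for i in range(n)]
    yc = [v - mean_y for v in y]
    # Normal equations: (Xc'Xc + lam * I) beta = Xc'yc
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(A, b)
    intercept = mean_y - sum(beta[j] * col_means[j] for j in range(p))
    return intercept, beta


def predict(intercept: float, beta: List[float], X: List[List[float]]) -> List[float]:
    """Predicted trait value for each row of marker scores."""
    return [intercept + sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
```

A consortium would train such a model on lines with both genotype and phenotype data, then score untested lines from genotype alone. Larger penalty values shrink the marker effects toward zero, which matters when markers far outnumber phenotyped lines.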
Thanks for listening!
And thank you all for inviting me to your meeting and for being such a wonderful, communicative and dedicated scientific community...