Bioinformatics and Biostatistics in Limagrain / Biogemma, JOBIM Conference, July 2015

Post on 04-Jan-2016

Bioinformatics and Biostatistics in Limagrain / Biogemma

JOBIM Conference, July 2015

An international agricultural cooperative group

A portfolio of strong brands

4th largest seed company worldwide

Sales of nearly 2 billion euros

Subsidiaries in 42 countries

Nearly 2,000 farmer members

Nearly 9,000 employees

13.5% of turnover re-invested in research

A group that specializes in seeds and cereal products

Field Seeds

Vegetable Seeds

Cereal Products

Limagrain Coop

Field Seeds

Bakery Products

Cereal Ingredients

Garden Products

Vegetable Seeds

A European group open to the world

Nearly 9,000 employees

66 nationalities

69% of sales achieved outside France

Subsidiaries in 42 countries

Map figure: share of sales and of workforce by region (Europe, Asia & Pacific, Africa & Middle East, Americas). Values shown, not matched to regions here: sales 64%, 23%, 7%, 6%; workforce 64%, 16%, 12%, 8%.

An innovative group

13.5% of turnover invested in research

200 M€ invested in research (270 M€ with collaborations)

Share of turnover invested in research:

Average industry: 2.25%*

Automobile industry: 5.4%*

Pharmaceutical industry: 10.2%*

Limagrain: 13.5%

* Source: Leem, April 2013

BIOGEMMA, a research partnership

Biotechnologies

Field Seeds

Chart: partner shares of 55%, 16%, 10%, 9.5% and 9.5% (partner labels not recoverable)

Biogemma

Identification of genes associated with agronomic traits

Development of GM varieties in cereals

Development of tools and knowledge

BIOINFORMATICS

Bioinformatics for breeding

Analyze NGS-based data

Develop databases and tools to store and analyse biological data

Bioinformatics: db, tools

Biostatistics: discover associations

Bioanalysis: explain associations

Molecular Breeding

Omics analysis

Phenotype

Environment

Regulation of expression: chromatin silencing; regulation of transcription; miRNA, siRNA; protein modification, interaction, turnover; regulation of translation; RNA stability

Biological material: DNA (genes, genomes); RNA (mRNA, rRNA; transcriptome); protein (enzyme; proteome); metabolome; trait (phenome)

What we measure: markers; mRNA (transcription levels, DGE); protein (quantity, activity levels); trait (phenome)

How we measure: genotyping, sequencing; RNA-Seq, microarrays; HPLC, crystallography; IA, NIR, HPLC, eyeball

Transcription, translation, expression

LD mapping, GWAS, GS

A great deal of complex information to correlate

Environment

Genotype Phenotype

Data processing tools getting more and more sophisticated

Data production & acquisition

Data analysis & processing

Results interpretation & decision support

field trials

genotyping, sequencing, genomics

LIMS, databases

data retrieval

quality control

statistical analyses

building predictive models

evaluation of individuals

predicting cross value

Data Life Cycle

Data production & acquisition

Sequencing

NGS based: whole genome, targeted sequencing, transcriptome

Deliverables: SNP, structural variations, gene expression level, genomes

Genotyping

High density chips

- 10^3 to 10^5 SNPs

- 10^5 samples

Automate calling / quality control
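The slide does not say which quality-control rules are applied after calling. As a minimal sketch, assuming two filters that are common in SNP-chip pipelines (per-SNP call rate and minor allele frequency), automated QC could look like the following. The function name `snp_qc`, the thresholds and the toy matrix are hypothetical and only illustrate the idea; genotype calling itself (clustering of chip intensities) is not shown.

```python
def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Return the indices of SNPs that pass call-rate and MAF filters.

    genotypes: list of samples; each sample is a list with one call per SNP,
    coded 0, 1 or 2 (copies of the alternate allele) or None (missing call).
    Thresholds are illustrative defaults, not values from the slides.
    """
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    kept = []
    for j in range(n_snps):
        # Non-missing calls for SNP j across all samples.
        calls = [sample[j] for sample in genotypes if sample[j] is not None]
        call_rate = len(calls) / n_samples
        if not calls or call_rate < min_call_rate:
            continue
        # Alternate-allele frequency for a diploid: allele copies / (2 * calls).
        alt_freq = sum(calls) / (2 * len(calls))
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            kept.append(j)
    return kept


# Toy data (hypothetical): 4 samples x 3 SNPs.
toy = [
    [0, 2, None],
    [1, 2, None],
    [2, 2, 1],
    [1, 2, 0],
]
passing = snp_qc(toy, min_call_rate=0.75, min_maf=0.05)
print(passing)  # -> [0]  (SNP 1 is monomorphic, SNP 2 has a 50% call rate)
```

At the scale quoted on the slide, the same logic would normally run on array-based storage rather than nested lists; the nested-list form is kept here only to stay dependency-free.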

Figure: tracks for samples Steem_Z30, Steem_Z32 and Steem_Z65, two replicates each (rep1, rep2)

Data production & acquisition

Phenotypic data

Automate data collection

Sensors, images, NIR spectrometry…

Adjustments/corrections by geostatistical methods

Extraction of relevant information

Data production & acquisition

Environmental data

Local / internal:

- Sensors, airborne imagery, …

Global / external:

- Databases, internet, satellite images, …

Precise description of the growing conditions

Air temperature

Relative humidity

Dew point
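The three sensor variables above are linked: dew point can be derived from air temperature and relative humidity. The slides do not say whether dew point is measured directly or computed. As a sketch, assuming it is computed with the Magnus approximation and a commonly quoted coefficient pair (17.62 and 243.12 °C, valid over water at ordinary field temperatures), the calculation is:

```python
import math

# Magnus coefficients over water; one commonly quoted parameter set
# (an assumption here, not taken from the slides).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Approximate dew point (deg C) from air temperature (deg C) and RH (%)."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


# At 100% relative humidity the dew point equals the air temperature.
print(round(dew_point_c(25.0, 100.0), 6))  # -> 25.0
# Roughly 9.3 deg C for 20 deg C air at 50% relative humidity.
print(round(dew_point_c(20.0, 50.0), 1))   # -> 9.3
```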

Modelling

Molecular data

Cost

Availability

Predict: genotype → phenotype

QTL/GWAS – identify genomic regions involved

genomic selection – "black box" approach
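The slides do not name the model behind the "black box". One widely used option for genomic selection is ridge regression of the phenotype on all markers at once (closely related to RR-BLUP). The sketch below assumes that choice; `fit_ridge`, `predict`, the penalty `lam` and the toy data are all hypothetical and written with the standard library only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(markers, phenotypes, lam=1.0):
    """Estimate all marker effects jointly; returns (effects, marker means, mean phenotype)."""
    n, p = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    z = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    y = [v - y_mean for v in phenotypes]
    # Normal equations of ridge regression: (Z'Z + lam * I) beta = Z'y
    ztz = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(z[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear(ztz, zty), col_means, y_mean


def predict(model, genotype):
    """Predict the phenotype of one individual from its marker genotype."""
    beta, col_means, y_mean = model
    return y_mean + sum(b * (g - m) for b, g, m in zip(beta, genotype, col_means))


# Toy training set (hypothetical): 6 individuals x 3 SNPs coded 0/1/2,
# phenotypes simulated as 10 + 1.0 * snp0 - 0.5 * snp1.
toy_markers = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [0, 2, 1], [1, 2, 2], [2, 0, 0]]
toy_phenotypes = [9.5, 11.0, 11.5, 9.0, 10.0, 12.0]

model = fit_ridge(toy_markers, toy_phenotypes, lam=1.0)
predicted = predict(model, [2, 0, 1])  # untested candidate genotype
print(predicted)
```

Unlike QTL/GWAS, no marker is tested or selected: every marker receives a shrunken effect and only the summed prediction is used, which is what makes the approach a "black box". In practice the number of markers far exceeds the number of individuals, and the penalty is what keeps the system solvable.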

Modelling

Statistical methods

Linear mixed models

Bayesian approaches

More and more complex models

GxE

Epistasis

computationally intensive methods

(from Van Eeuwijk et al., 2010)

Data management

Integrative viewer for genomic data

Databases

BIG DATA: large volume of structured and unstructured data

Infrastructure

Local

on-the-premises computing

"data-centric computing"

Central

enterprise resources

Security

NGS data analysis on BIOGEMMA HPC (912 cores)

Elastic (cloud)

flexibility

low cost / hour CPU

Take Home Messages

Bioinformatics: a major activity supporting a large range of applications in Limagrain

Genomics

Phenomics

Enviromics

Biostatistics, Modelling and Prediction

Big Data (HPC, data management)

Both R&D and Applied

In a highly competitive and challenging research area

More information…

Thank you
