solgs workshop 2016
solGS: Web-based Genomic Selection Analysis Tool
Purposes
Gain an understanding of genomic selection, GS model building, breeding value prediction, and assessment of model input and output data quality.
Brainstorm ideas to make the tool better suit your research purposes.
Outline
Overview of GS and solGS
Demo, exercises, bug watching, feedback
Brainstorming
Genomic selection…
[Diagram: training population (phenotyped & genotyped individuals), prediction model, genotyped selection candidates, predicted breeding values (GEBVs)]
GS advantages
Little or no phenotyping: reduced cost
Shorter breeding cycles: higher selection gain per unit time
Increased prediction accuracy
Challenges…
Data volume, storage
Data structuring, cleaning, imputation
Statistical analysis complexity
Visualization and sharing
solGS: http://cassavabase.org/solgs
What you can do with solGS…
Store data (Chado Natural Diversity schema)
Create training datasets
Build models and predict breeding values of selection candidates
Test model accuracy
Explore phenotype data
Evaluate population structure
Check the relationship between GEBVs and observed phenotypes
Calculate selection indices and correlations
Visualize data on interactive plots
Calculate selection response
What is the statistical approach behind solGS?
…preparing phenotype data
Omits individuals with no phenotype values at all
Adjusts phenotype values for block effects
Averages across multiple trials after adjusting for block effects
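The phenotype preparation steps above can be sketched roughly as follows. This is a minimal illustration, not solGS's actual code: the function name `prepare_phenotypes` and the record layout are hypothetical, and the block adjustment is a simple stand-in (subtracting each block's deviation from its trial mean) rather than whatever model solGS fits for block effects.

```python
from collections import defaultdict
from statistics import mean

def prepare_phenotypes(records):
    """records: iterable of (trial, block, genotype, value); value is None if missing.

    Returns {genotype: adjusted mean}. Hypothetical helper, for illustration only.
    """
    # Individuals with no observed value never enter the analysis.
    observed = [r for r in records if r[3] is not None]

    trial_values = defaultdict(list)
    block_values = defaultdict(list)
    for trial, block, _, value in observed:
        trial_values[trial].append(value)
        block_values[(trial, block)].append(value)
    trial_mean = {t: mean(v) for t, v in trial_values.items()}
    block_mean = {b: mean(v) for b, v in block_values.items()}

    # Simplified block adjustment (an assumption): remove the block's
    # deviation from the mean of its trial.
    per_trial = defaultdict(list)
    for trial, block, genotype, value in observed:
        adjusted = value - (block_mean[(trial, block)] - trial_mean[trial])
        per_trial[(genotype, trial)].append(adjusted)

    # Average within each trial first, then across trials.
    per_genotype = defaultdict(list)
    for (genotype, _), values in per_trial.items():
        per_genotype[genotype].append(mean(values))
    return {g: mean(v) for g, v in per_genotype.items()}

# Toy records (made up): genotype C has no phenotype values and is dropped.
records = [
    ("T1", 1, "A", 10.0), ("T1", 1, "B", 12.0),
    ("T1", 2, "A", 14.0), ("T1", 2, "B", 16.0),
    ("T1", 1, "C", None), ("T1", 2, "C", None),
]
print(prepare_phenotypes(records))
```

A mixed model with block as a random or fixed effect would be the more usual choice in practice; the centering above only shows where the adjustment sits in the pipeline.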
…preparing genotype data
Removes monomorphic markers
Removes markers with > 60% missing values
Removes markers with MAF < 5%
Removes individuals with > 80% missing values
Imputes missing marker data (median substitution)
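As a rough sketch of the genotype filters listed above, assuming markers are coded as allele dosages 0/1/2 with NaN for missing calls (the coding, the function name `filter_and_impute`, and the order of the filters are assumptions, not taken from solGS):

```python
import numpy as np

def filter_and_impute(geno, max_marker_missing=0.60, min_maf=0.05,
                      max_indiv_missing=0.80):
    """geno: individuals x markers array of dosages (0, 1, 2), NaN = missing.

    Returns (cleaned matrix, kept marker indices, kept individual indices).
    """
    geno = np.asarray(geno, dtype=float)
    marker_idx = np.arange(geno.shape[1])
    indiv_idx = np.arange(geno.shape[0])

    # Markers with more than 60% missing values are dropped first, so the
    # later per-marker statistics never see an all-missing column.
    keep = np.isnan(geno).mean(axis=0) <= max_marker_missing
    geno, marker_idx = geno[:, keep], marker_idx[keep]

    # Monomorphic markers: only one genotype class observed.
    keep = np.nanmax(geno, axis=0) > np.nanmin(geno, axis=0)
    geno, marker_idx = geno[:, keep], marker_idx[keep]

    # Minor allele frequency below 5%.
    freq = np.nanmean(geno, axis=0) / 2.0
    keep = np.minimum(freq, 1.0 - freq) >= min_maf
    geno, marker_idx = geno[:, keep], marker_idx[keep]

    # Individuals with more than 80% missing values.
    keep = np.isnan(geno).mean(axis=1) <= max_indiv_missing
    geno, indiv_idx = geno[keep, :], indiv_idx[keep]

    # Median substitution, marker by marker.
    medians = np.nanmedian(geno, axis=0)
    rows, cols = np.where(np.isnan(geno))
    geno[rows, cols] = medians[cols]
    return geno, marker_idx, indiv_idx

# Toy matrix (made up): 6 individuals x 4 markers.
nan = np.nan
raw = np.array([
    [0,   2, nan, 0],
    [1,   2, nan, nan],
    [2,   2, nan, 2],
    [1,   2, nan, 2],
    [0,   2, 1,   2],
    [nan, 2, nan, nan],
])
clean, markers, individuals = filter_and_impute(raw)
print(markers, individuals)
print(clean)
```

The sketch does not guard against edge cases such as every marker being filtered out; a real pipeline would need to.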
…statistical modeling
Univariate
Two-stage analysis
RR-BLUP (Endelman, Plant Genome (2010))
GBLUP: marker-based realized relationship matrix
Prediction accuracy: based on 10-fold cross-validation
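To make the modeling slide more concrete, here is a small NumPy sketch of a marker-based relationship matrix, a ridge-regression fit of marker effects, and 10-fold cross-validation. It is an illustration under stated assumptions, not the rrBLUP implementation: the shrinkage parameter `lam` is fixed by hand here, whereas RR-BLUP normally derives it from estimated variance components, and all function names are hypothetical. Accuracy is taken as the mean Pearson correlation between predicted and observed values across folds, which is one common convention.

```python
import numpy as np

def realized_relationship(geno):
    """Marker-based relationship matrix from 0/1/2 dosages (no missing values)."""
    p = geno.mean(axis=0) / 2.0          # allele frequency per marker
    W = geno - 2.0 * p                   # center each marker by twice its frequency
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def fit_ridge(geno, y, lam):
    """Ridge-regression marker effects with a fixed shrinkage parameter lam."""
    marker_means = geno.mean(axis=0)
    Z = geno - marker_means
    mu = y.mean()
    # Dual form of the ridge solution: u = Z' (ZZ' + lam*I)^-1 (y - mu)
    u = Z.T @ np.linalg.solve(Z @ Z.T + lam * np.eye(len(y)), y - mu)
    return mu, marker_means, u

def predict(geno, mu, marker_means, u):
    """Predicted values for new individuals from fitted marker effects."""
    return mu + (geno - marker_means) @ u

def cv_accuracy(geno, y, lam, k=10, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = fit_ridge(geno[train], y[train], lam)
        predicted = predict(geno[test], *model)
        accuracies.append(np.corrcoef(predicted, y[test])[0, 1])
    return float(np.mean(accuracies))

# Simulated toy data, for illustration only.
rng = np.random.default_rng(1)
freqs = rng.uniform(0.1, 0.9, size=50)
geno = rng.binomial(2, freqs, size=(200, 50)).astype(float)
y = geno @ rng.normal(size=50) + rng.normal(size=200)

G = realized_relationship(geno)
print(G.shape, round(cv_accuracy(geno, y, lam=1.0), 2))
```

The dual form keeps the linear system at individuals x individuals size, which is the usual situation in GS where markers far outnumber phenotyped individuals.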
How does solGS work?
Websites for exercise
Cassava-devel.sgn.cornell.edu
Cassava-test.sgn.cornell.edu
Review.cassavabase.org
Cassavabase.org
https://iita-mirror.cassavabase.org
https://172.30.2.199 (Username: sgn, Password: eggplant)
Genomic selection steps…
[Diagram: training dataset (phenotyped & genotyped individuals), prediction model, selection candidates, predicted breeding values (GEBVs)]
Demo: Part I
Create training data set & build model
Explore model input and output
Phenotype and genetic correlation
Population structure
Selection index
Things to consider when creating a training data set & building a model
Things to consider… phenotype data
Number of phenotyped individuals (minimum 20 clones)
Relevance to target environment
Data quality: experimental design, measurement accuracy, missing values, outliers
Things to consider… genotype data
Marker number, genome distribution, polymorphism
Data quality: allele-calling accuracy, missing values (per marker, per individual)
Minor alleles, heterozygosity, LD
Population structure
Let’s do stuff!
Single trial – single trait
Create training data set and build model (trial method)
Search for trial ‘Cassava Ibadan 2002/03’
Create a training dataset with that trial
Check description, correlation
Build a model for FRW
Explore model input and output, model accuracy
Download GEBVs
Exercise: single trial – single trait
Create training data set and build model
Search for your trial
Create a training dataset with that trial
Check description, correlation
Build a model for your trait
Explore model input and output, population structure, model accuracy
Download GEBVs
Single trial – multiple traits
Create training data set and build models
Search for trial ‘Cassava Ibadan 2002/03’
Create a training dataset with that trial
Check description, correlation
Build models for FRW and CMDS
Explore model input and output for each model
Genetic correlation
Selection index
Exercise: single trial – multiple traits
Create training data set and build models
Search for your trial
Create a training dataset with that trial
Check description, correlation
Build models for two traits at the same time
Explore model input and output for each model
Genetic correlation
Calculate and download selection index
Combined trials – single trait
Create training data set and build models using two trials
Search for ‘cassava ibadan 02/03 & 01/02’
Create a training dataset with the trials
Check description, correlation
Build a model for FRW
Explore model input and output for the model, population structure, prediction accuracy
Download GEBVs
Exercise: combined trials – single trait
Create training data set and build models using two trials
Search for your trials
Create a training dataset with the trials
Check description, correlation
Build a model for your trait
Explore model input and output for the model, population structure, prediction accuracy
Download GEBVs
Using list – single trait
Create training data set and build a model using a plots list
Using the search wizard, create a plots list from trial ‘cassava ibadan 2002/03 plots’
Create a training dataset with the list
Check description, correlation
Build a model for FRW
Explore model input and output for the model, population structure, prediction accuracy
Download GEBVs
Exercise: using list – single trait
Create training data set and build a model using a plots list
Using the search wizard, create a plots list from a trial (select all plots)
Create a training dataset with the list
Check description, correlation
Build a model for your trait
Explore model input and output for the model, population structure, prediction accuracy
Download GEBVs
Demo: Part II
Predict breeding values of selection populations
Genetic correlation
Selection index
Selection gain
Things to consider when applying a model to predict breeding values of selection populations
Things to consider… applying the model
Genetic relationship between training population and selection population
Target environment
Marker types used
Population structure
Predict GEBVs of a selection population
Create training data set & build model: Cassava Ibadan 2002/03, FRW
Search for a selection population: Cassava Ibadan 2003/04
Predict GEBVs for the selection population
Check selection response
Download GEBVs
Exercise: selection population prediction
Create training data set & build model: use one of the models you already built
Search for a selection population related to the training population
Predict GEBVs for the selection population
Check selection response
Download GEBVs
Multiple traits: predict GEBVs of a selection population
Create training data set & build models: Cassava Ibadan 2002/03, FRW and CMDS
Search for a selection population: Cassava Ibadan 2003/04
Predict GEBVs for both traits for the selection population
Check selection response
Download GEBVs
Exercise: multiple traits – selection population prediction
Create training data set & build models: use the previous two models from your training populations
Search for a selection population
Predict GEBVs for both traits for the selection population
Check genetic correlation
Calculate selection index
List: predict GEBVs of a selection population
Create training data set & build model: Cassava Ibadan 2002/03, FRW
Search for a selection candidates list: Cassava Ibadan 213 genotypes
Predict GEBVs for the selection population
Check selection response
Download GEBVs
Exercise: selection candidates list
Create training data set & build model: go to a previous model page
Create a selection candidates list: use the search wizard to create an accessions list
Using the model, predict GEBVs of the list
Check selection response
Download GEBVs
Demo: Part III
Trait search
Search for ‘fresh root weight’
Select trial ‘cassava ibadan 2002/03’
Check model output
Demo: Part III
PCA using accessions list
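For readers who want to see what a PCA on genotype data involves, a minimal NumPy sketch follows. It assumes a complete individuals x markers dosage matrix (0/1/2) and uses a plain SVD of the column-centered matrix; the function name `genotype_pca` is hypothetical, and solGS may scale or filter markers differently.

```python
import numpy as np

def genotype_pca(geno, n_components=2):
    """PCA scores and explained-variance fractions via SVD of centered dosages."""
    X = geno - geno.mean(axis=0)                      # center each marker
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # individuals on the PCs
    explained = s**2 / np.sum(s**2)                   # variance fraction per PC
    return scores, explained[:n_components]

# Simulated toy data: two groups with very different allele frequencies.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 30))
group_b = rng.binomial(2, 0.9, size=(20, 30))
geno = np.vstack([group_a, group_b]).astype(float)

scores, explained = genotype_pca(geno)
print(scores[:3, 0], scores[-3:, 0], explained)
```

With structure this strong, the two groups fall on opposite sides of the first component, which is the pattern to look for on a population structure plot.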
Brainstorm for new features
Make priority list
What features do you like in BMS?
What features would you like to be added to Cassavabase?
Thanks to…
Composing a training population: fitting a prediction model... 3 options
Fitting a prediction model…
Option 1: Search using a trait name
Estimating breeding values of selection candidates
Applying the model…
Fitting a prediction model…
Option 2: Search for trials
Estimating breeding values of selection candidates for multiple traits
Applying the models…
Estimating genetic correlations
Calculating selection indices
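One common way to build a selection index is a weighted sum of standardized GEBVs across traits. The sketch below shows that form only; it is an assumption that this matches what solGS computes, and the function name, the weights, and the toy GEBVs are all made up for illustration.

```python
from statistics import mean, pstdev

def selection_index(gebvs, weights):
    """gebvs: {trait: {genotype: GEBV}}; weights: {trait: relative weight}.

    Returns (genotype, index) pairs sorted from best to worst.
    Assumes every weighted trait varies among the genotypes (non-zero SD).
    """
    # Only genotypes with a GEBV for every weighted trait are ranked.
    genotypes = sorted(set.intersection(*(set(gebvs[t]) for t in weights)))
    index = {g: 0.0 for g in genotypes}
    for trait, weight in weights.items():
        values = [gebvs[trait][g] for g in genotypes]
        mu, sd = mean(values), pstdev(values)
        for g, value in zip(genotypes, values):
            # Standardize so traits on different scales are comparable.
            index[g] += weight * (value - mu) / sd
    return sorted(index.items(), key=lambda item: item[1], reverse=True)

# Made-up GEBVs; a negative weight favors lower values of that trait.
gebvs = {
    "FRW":  {"a": 1.0, "b": 2.0, "c": 3.0},
    "CMDS": {"a": 3.0, "b": 2.0, "c": 1.0},
}
weights = {"FRW": 1.0, "CMDS": -1.0}
for genotype, value in selection_index(gebvs, weights):
    print(genotype, round(value, 3))
```

Standardizing before weighting is a design choice: without it, the trait with the largest numeric range would dominate the index regardless of the weights.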
Fitting a prediction model…
Option 3: Use your own list of individuals
To sum up…
Store data
Build prediction models
Estimate breeding values
Additional analyses: correlation analysis, population structure, selection indices
http://cassavabase.org/solgs
Open source code
Thanks to…
Many thanks!!
Background image: nextgencassava.org