Genotype-by-Sequence
Yung-Fen Huang
Collaborative Oat Research Meeting, March 7, Ottawa
Outline
• Principle of Genotype-by-Sequence (GBS)
• Oat GBS markers
• SNP assay vs. GBS
• Possible applications
• Ongoing oat GBS analysis and expected outcomes
Genotype-by-Sequence
1. Complexity reduction: Digest DNA with restriction enzyme(s)
Use a methylation-sensitive enzyme to avoid repetitive (typically methylated) genomic regions
2. Ligate adapters
Genomic DNA from each sample (Sample 1, Sample 2, Sample 3, ...) receives adapters carrying a sample-specific barcode
3. Pool and amplify samples
4. Sequencing (case of oat: ~1.5 M reads/sample, ~0.7% of the genome, 96-plex)
Genotype-by-Sequence
5. SNP calling: use bioinformatic pipeline(s) to process the raw data and identify SNPs
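Steps 1 to 4 can be mimicked in silico to see how digestion reduces genome complexity: only sequence next to a cut site is read. The sketch below is illustrative only. It assumes a PstI-like site `CTGCAG` cut as CTGCA^G (consistent with the TGCAG prefix of the example reads), a hypothetical 64-base read length, and a random toy genome. It reads only the forward strand and ignores methylation, size selection and the second fragment end, so real protocols will differ.

```python
import random

SITE = "CTGCAG"   # assumed PstI-like recognition site, cut as CTGCA^G
READ_LEN = 64     # hypothetical read length kept from each cut site


def site_positions(genome: str, site: str = SITE):
    """Yield the start index of every occurrence of the site (forward strand only)."""
    pos = genome.find(site)
    while pos != -1:
        yield pos
        pos = genome.find(site, pos + 1)


def gbs_reads(genome: str, site: str = SITE, read_len: int = READ_LEN) -> list[str]:
    """One read per site; each starts one base into the site, i.e. with TGCAG."""
    starts = (p + 1 for p in site_positions(genome, site))
    return [genome[s:s + read_len] for s in starts if s + read_len <= len(genome)]


def fraction_sampled(genome: str, site: str = SITE, read_len: int = READ_LEN) -> float:
    """Share of genome positions covered by at least one full-length read."""
    covered = set()
    for p in site_positions(genome, site):
        s = p + 1
        if s + read_len <= len(genome):
            covered.update(range(s, s + read_len))
    return len(covered) / len(genome)


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(200_000))  # toy random genome
    reads = gbs_reads(genome)
    print(len(reads), "reads;", f"{fraction_sampled(genome):.2%} of the toy genome sampled")
```

A random sequence has no repeats or methylation, so this toy only shows the "sample a small, reproducible subset of the genome" idea, not the repeat filtering that the methylation-sensitive enzyme provides.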
[Figure: stacks of raw sequence reads, each starting with the TGCAG restriction-site remnant]
Example: 1,373 oat samples = 237 Gb (compressed file) = 1.5 billion reads
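The read numbers quoted so far can be related with simple arithmetic. This sketch only divides and multiplies the figures given on the slides (1.5 billion reads over 1,373 samples; ~1.5 M reads per sample in a 96-plex) and assumes reads are spread evenly across samples, which is rarely exactly true in practice.

```python
def reads_per_sample(total_reads: float, n_samples: int) -> float:
    """Average reads per sample, assuming reads are spread evenly across samples."""
    return total_reads / n_samples


def reads_per_pool(reads_each: float, plex: int) -> float:
    """Total reads a multiplexed pool must deliver for a target depth per sample."""
    return reads_each * plex


if __name__ == "__main__":
    # Figures taken from the slides.
    print(f"{reads_per_sample(1.5e9, 1373):,.0f} reads per sample on average")  # 1,092,498
    print(f"{reads_per_pool(1.5e6, 96):,.0f} reads per 96-plex pool")           # 144,000,000
```

The dataset-wide average (~1.1 M reads) is somewhat below the ~1.5 M reads per sample quoted for a 96-plex; the slides do not explain the difference, so none is assumed here.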
Resulting genotype matrix (N = missing call):
Marker  S1  S2  S3  S4  S5  S6  S7  S8  S9
M1      A   C   C   A   A   A   A   A   A
M2      A   H   G   A   A   A   N   A   A
M3      A   T   T   N   T   T   N   N   T
M4      C   A   N   A   N   N   N   N   A
M5      A   A   C   C   C   A   A   A   A
M6      G   C   G   C   G   G   G   C   G
M7      N   G   N   G   G   G   G   G   N
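Missing calls are what drive marker filtering later in this talk. As a sketch, the per-marker completeness of the matrix above can be computed and used as a filter. The matrix is copied from the slide; treating every symbol other than N (including H) as a call is an assumption of this example, and the helper names are hypothetical.

```python
MISSING = "N"  # missing genotype call, as in the matrix above

# Genotype matrix from the slide: one string per marker, one character per sample S1..S9.
MATRIX = {
    "M1": "ACCAAAAAA",
    "M2": "AHGAAANAA",
    "M3": "ATTNTTNNT",
    "M4": "CANANNNNA",
    "M5": "AACCCAAAA",
    "M6": "GCGCGGGCG",
    "M7": "NGNGGGGGN",
}


def completeness(calls: str, missing: str = MISSING) -> float:
    """Fraction of samples with a non-missing call at this marker."""
    return sum(c != missing for c in calls) / len(calls)


def filter_markers(matrix: dict[str, str], min_completeness: float) -> dict[str, str]:
    """Keep markers whose completeness reaches the threshold."""
    return {m: calls for m, calls in matrix.items() if completeness(calls) >= min_completeness}


if __name__ == "__main__":
    for marker, calls in MATRIX.items():
        print(marker, f"{completeness(calls):.0%}")
    print(sorted(filter_markers(MATRIX, 0.6)))  # every marker except M4
```

Raising the threshold trades marker number for data completeness, which is the trade-off shown in the marker-count figures below.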
Advantages: fast, large-scale and cheap
- Marker discovery and genotyping at the same time
- Many samples multiplexed in a single run
Challenges: bioinformatics? Others?
Oat molecular data from CORE
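[Figure: oat molecular data available from CORE. Platforms: GoldenGate (2K), Infinium (6K) and Genotype-by-Sequence (120K). Material: bi-parental mapping populations (KxO, OxT, TxM, DxE, SxH, OxP, HxZ, PxG), IOI (350 lines) and a subset of 108 lines, breeders' selection (580 lines); part of the data is in progress]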
How many markers does GBS produce?
[Figure: cumulative number of markers (0 to 120,000) vs. missing data (10 to 80%) for all lines (1,373), North America (12 breeding programs), Europe (32 lines), the diversity panel (152 IOI lines), the bi-parental populations (7) and OxT; callouts at ~10,000 SNPs and ~4,000 SNPs]
More sequences = more SNPs
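Each curve in the figure is a cumulative count: the number of markers whose missing data is at or below a given threshold, so relaxing the threshold can only add markers. A minimal sketch, using hypothetical per-marker missing-data percentages rather than the oat data:

```python
from bisect import bisect_right


def cumulative_marker_counts(missing_pct, thresholds_pct):
    """For each threshold, count markers whose missing data (%) is at or below it."""
    ordered = sorted(missing_pct)
    return [bisect_right(ordered, t) for t in thresholds_pct]


if __name__ == "__main__":
    # Hypothetical per-marker missing-data percentages, for illustration only.
    missing = [5, 12, 30, 30, 55, 79, 90]
    thresholds = range(10, 90, 10)  # 10, 20, ..., 80 %, as on the x-axis
    for t, n in zip(thresholds, cumulative_marker_counts(missing, thresholds)):
        print(f"<= {t}% missing: {n} markers")
```

Sorting once and using `bisect_right` keeps the count per threshold logarithmic, which matters when the marker list runs to ~120,000 entries.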
Oat GBS markers
[Figure: number of markers (0 to 30,000) at different levels of completeness (10 to 90%), OxT]
- Most SNPs are 25-50% complete
- More sequences = more markers at high completeness
- More sequences are expected with technology updates
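The completeness figure is a histogram rather than a cumulative curve. A sketch of the binning, plus a check of what share of markers falls in a band such as 25-50%, assuming completeness values in percent; the input values are hypothetical.

```python
from collections import Counter


def completeness_histogram(completeness_pct, bin_width=10):
    """Count markers per completeness bin: [0, 10), [10, 20), ..., [90, 100]."""
    n_bins = 100 // bin_width
    # 100% complete markers are folded into the last bin.
    counts = Counter(min(int(c // bin_width), n_bins - 1) for c in completeness_pct)
    return {b * bin_width: counts.get(b, 0) for b in range(n_bins)}


def share_between(completeness_pct, low, high):
    """Fraction of markers with low <= completeness < high."""
    values = list(completeness_pct)
    return sum(low <= c < high for c in values) / len(values)


if __name__ == "__main__":
    values = [5, 25, 30, 45, 49.9, 100]  # hypothetical completeness values (%)
    print(completeness_histogram(values))
    print(f"{share_between(values, 25, 50):.0%} of markers are 25-50% complete")
```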
SNP assay vs. GBS
For 96 samples (in the case of oat; based on former data with future projection):

                              SNP assay          GBS
SNP discovery                 Required           Not required
No. of markers                6K-100K            2K-100K (1)
Time of experiment (2)        3 days             2-4 weeks
Time for SNP calling (3)      Weeks to months    < 1 day
IT demand for SNP calling     Simple             High informatic effort
Data completeness             > 90%              0-100% (4)
Reproducibility               High               High
Cost/sample (5)               ~ $60              $10-20

(1) Variable according to sample diversity and data completeness
(2) From library preparation to raw data collection
(3) Including data curation
(4) Completeness varies according to end use
(5) Library and BeadChip/sequencing consumables
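Scaled to the 96-sample batch of the table, the per-sample costs translate into batch consumable costs as follows. This sketch uses only the table's per-sample figures (treating "~ $60" as exactly $60) and, like footnote 5, covers consumables only: labor, bioinformatics effort and the different marker yields are left out.

```python
N_SAMPLES = 96  # batch size used in the comparison table

# Per-sample consumable cost ranges in USD, from the table.
COST_PER_SAMPLE = {"SNP assay": (60, 60), "GBS": (10, 20)}


def batch_cost(per_sample: tuple[float, float], n: int = N_SAMPLES) -> tuple[float, float]:
    """Low and high consumable cost for a batch of n samples."""
    low, high = per_sample
    return low * n, high * n


if __name__ == "__main__":
    for platform, per_sample in COST_PER_SAMPLE.items():
        low, high = batch_cost(per_sample)
        print(f"{platform}: ${low:,.0f} to ${high:,.0f} for {N_SAMPLES} samples")
```

A fairer comparison would divide by the number of usable data points per sample, which for GBS depends on the completeness threshold chosen.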
What can the GBS data tell us?
Diagram: from high-throughput genomic data to the breeding cycle
- Genetic map → genome organisation
- Structure and relatedness analysis → genetic structure of the sample
- QTL mapping and association mapping → trait genetic architecture
- Genomic prediction → genomic contribution to the breeding cycle: selection precision (+ cycle acceleration); missing data don't matter (e.g., wheat, barley, cassava)
- Major gene introgression → new cultivars
- Key requirement: good phenotypes
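The point that missing data matter little for genomic prediction (reported on the slide for wheat, barley and cassava) can be explored with a toy simulation. This is a stand-in, not the CORE analysis: it simulates inbred lines coded -1/+1, hides 20% of calls at random, mean-imputes them, and fits ridge-regression marker effects (rrBLUP-style shrinkage) with NumPy. Sizes, λ = 1 and the random-missingness model are all assumptions; real GBS missingness is not random.

```python
import numpy as np


def mean_impute(X: np.ndarray) -> np.ndarray:
    """Replace missing calls (NaN) with the mean of that marker."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """Ridge-regression marker effects on centred genotypes (rrBLUP-style shrinkage)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Z = X - x_mean
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return x_mean, y_mean, beta


def ridge_predict(model, X: np.ndarray) -> np.ndarray:
    """Predict phenotypes from (imputed) genotypes with fitted marker effects."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta


def demo(seed: int = 0, missing_rate: float = 0.2) -> float:
    """Simulate lines, hide some calls, and return the prediction accuracy (r)."""
    rng = np.random.default_rng(seed)
    n_train, n_test, p = 200, 100, 50
    X = rng.choice([-1.0, 1.0], size=(n_train + n_test, p))  # homozygous lines coded -1/+1
    y = X @ rng.normal(size=p) + rng.normal(size=n_train + n_test)
    X_obs = X.copy()
    X_obs[rng.random(X.shape) < missing_rate] = np.nan       # random missing calls
    X_imp = mean_impute(X_obs)
    model = ridge_fit(X_imp[:n_train], y[:n_train])
    pred = ridge_predict(model, X_imp[n_train:])
    return float(np.corrcoef(pred, y[n_train:])[0, 1])


if __name__ == "__main__":
    print(f"accuracy with 20% missing calls: r = {demo():.2f}")
```

Comparing `demo(missing_rate=0.0)` with `demo(missing_rate=0.2)` gives a feel for how much accuracy is lost to missing calls under these toy assumptions.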
Ongoing analysis – genetic map update
[Figure: linkage group 16A, consensus map (oc_plos_16A, markers placed from position 0 to 86) vs. updated PxG map with GBS markers (gbs2_pg95_with_dist_16A, markers placed from position 0 to 184)]
- Second-generation framework map built with high-quality SNP and GBS markers (1.5K + 40.6K) from 7 bi-parental populations (quite challenging!)
  Larger regions are covered (e.g., 16A: consensus vs. updated PxG map)
- Place historical markers and 19.6K medium-quality GBS markers on the updated framework map
Expected outcome: a high-density consensus genetic map of more than 50K ordered markers
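Map construction starts from pairwise recombination between markers. As a sketch, the recombination fraction between two markers can be estimated from line-by-line calls and converted to a Haldane map distance. This assumes a population in which the recombinant-line fraction directly estimates r (e.g., doubled haploids); RIL populations such as many oat mapping populations need a correction, and real framework-map ordering is far more involved. Calls coded A/B and the helper names are hypothetical.

```python
import math


def recombination_fraction(m1: str, m2: str, missing: str = "N") -> float:
    """Share of lines showing a recombinant allele combination between two markers.

    Calls are coded 'A' (parent 1 allele) or 'B' (parent 2 allele), one character per
    line; lines missing a call at either marker are skipped.
    """
    scored = [(x, y) for x, y in zip(m1, m2) if missing not in (x, y)]
    if not scored:
        raise ValueError("no line is scored at both markers")
    return sum(x != y for x, y in scored) / len(scored)


def haldane_cm(r: float) -> float:
    """Haldane map distance in centiMorgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("Haldane distance requires 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # Hypothetical calls for ten lines at two linked markers.
    r = recombination_fraction("AAAAABBBBB", "AAAABBBBBA")
    print(f"r = {r:.2f}, distance = {haldane_cm(r):.1f} cM")  # r = 0.20, distance = 25.5 cM
```

Skipping lines with N at either marker is why incomplete GBS markers still contribute, but with fewer informative lines and therefore noisier estimates.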
Expected outcome – upcoming analyses
Diagram: the analysis chain from "What can the GBS data tell us?", marking expected outcomes and upcoming analyses
- Genetic map → genome organisation (expected outcome of the ongoing map update)
- Structure and relatedness analysis → genetic structure of the sample (cf. "Diversity and structure" presentation)
- QTL mapping and association mapping → trait genetic architecture (cf. Allele mining presentations)
- Genomic prediction → breeding cycle: major gene introgression, new cultivars (cf. Jesse's GS presentation)
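Relatedness analysis usually starts from a genomic relationship matrix. The sketch below implements VanRaden's (2008) method 1 with NumPy. It is a common choice but an assumption here: the slides do not say which relationship measure will be used. It also assumes allele dosages coded 0/1/2 with missing calls already imputed, and the input matrix is hypothetical.

```python
import numpy as np


def genomic_relationship(M) -> np.ndarray:
    """VanRaden (2008) method-1 genomic relationship matrix.

    M: lines x markers, allele dosage 0/1/2, with missing calls already imputed.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # frequency of the counted allele per marker
    Z = M - 2.0 * p                       # centre each marker on its expected dosage
    scale = 2.0 * np.sum(p * (1.0 - p))   # scales G to be comparable to pedigree relationships
    return Z @ Z.T / scale


if __name__ == "__main__":
    # Hypothetical dosages for four inbred lines at three markers.
    M = [[0, 2, 2], [0, 2, 2], [2, 0, 0], [2, 2, 0]]
    print(np.round(genomic_relationship(M), 2))
```

Monomorphic markers add nothing to either the numerator or the scale, so they do not need to be removed first; principal components of G are one common way to visualise population structure.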
Genotype-by-Sequence
Main steps of SNP calling
i. Group sequences ("reads") of the same sample according to barcode
ii. Group identical reads (groups of identical reads = "tags")
iii. Identify SNPs (group tags with few base mismatches, e.g., 1 base)
[Figure: reads from Samples 1 to 3 sorted by barcode, collapsed into tags, and tags differing at a single base paired into two example SNPs (A/G and C/T)]
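The three steps can be sketched in a few functions. This is a toy, reference-free illustration in the spirit of tag-pairing pipelines such as UNEAK (TASSEL), not the pipeline used for the oat data. It pairs only equal-length tags that differ at exactly one base and ignores sequencing errors, depth thresholds and filtering of repeated or paralogous tags. The barcodes, reads and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def demultiplex(raw_reads, barcodes):
    """Step i: assign reads to samples by their leading barcode, then strip it.

    Reads with no matching barcode are dropped. Assumes no barcode is a prefix of another.
    """
    by_sample = defaultdict(list)
    for read in raw_reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample


def count_tags(by_sample):
    """Step ii: collapse identical reads into tags with per-sample read counts."""
    tags = defaultdict(Counter)
    for sample, reads in by_sample.items():
        for read in reads:
            tags[read][sample] += 1
    return tags


def call_snps(tags, samples):
    """Step iii: pair equal-length tags that differ at exactly one base."""
    snps = []
    for t1, t2 in combinations(sorted(tags), 2):
        if len(t1) != len(t2):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]
        if len(diffs) != 1:
            continue
        pos = diffs[0]
        alleles = (t1[pos], t2[pos])
        calls = {}
        for s in samples:
            seen = [allele for allele, tag in zip(alleles, (t1, t2)) if tags[tag][s] > 0]
            calls[s] = "/".join(seen) if seen else "N"  # N = no read for either tag
        snps.append({"pos": pos, "alleles": alleles, "calls": calls})
    return snps


if __name__ == "__main__":
    barcodes = {"ACGT": "S1", "TTAG": "S2", "GGCA": "S3"}  # hypothetical barcodes
    reads = (["ACGT" + "TGCAGAAACCC"] * 2 + ["TTAG" + "TGCAGAGACCC"] * 3
             + ["GGCA" + "TGCAGAGACCC", "GGCA" + "TGCAGAAACCC", "CCCC" + "TGCAGAAACCC"])
    tags = count_tags(demultiplex(reads, barcodes))
    for snp in call_snps(tags, ["S1", "S2", "S3"]):
        print(snp)
```

Comparing all tag pairs is quadratic, which is fine for a toy but is one reason real pipelines index tags and filter the resulting tag networks instead.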