A targeted subgenomic approach for phylogenomics based on microfluidic
PCR and high-throughput sequencing
Simon Uribe-Convers, Matt L. Settles and David C. Tank
University of Idaho
www.simonuribe.com
@uribe_convers
The era of Genomics
Sequence Capture
Genome Skimming
GBS/RADSeq
Transcriptomes
Whole Genome
The era of Genomics
Illumina platform
Genomic library, amplification, sequencing
http://www.dddmag.com/sites/dddmag.com/files/legacyimages/Articles/2009_11/fluidigm.jpg
Targeted (sub-) genomics
-Using Fluidigm Access Array
-48 x 48 (2304 PCRs)
-Ready for next-gen sequencing
Microfluidic PCR
Modified from: http://www.dddmag.com/sites/dddmag.com/files/legacyimages/Articles/2009_11/fluidigm.jpg
Primer: forward & reverse
Conserved sequence
Barcodes
Sequencing adaptors
-4 primer reaction
-Dual barcodes and adapters are incorporated in the reaction
-No need for library preparation!
Microfluidic PCR
Primer design criteria
700bp
-Variable regions between 400-900 bp
-Conserved flanking regions
-Every primer has the same annealing temperature (60°C)
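The criteria above can be sketched as a simple filter over candidate primer pairs. This is only an illustration: the `PrimerPair` fields, the region names, the example values, and the 1°C tolerance around 60°C are assumptions, not values from the talk, and the annealing temperatures are taken as inputs from whatever primer design tool is used.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    name: str
    amplicon_length: int  # bp, length of the amplified region
    tm_forward: float     # predicted annealing temperature, degrees C
    tm_reverse: float     # predicted annealing temperature, degrees C


def passes_design_criteria(pair, min_len=400, max_len=900,
                           target_tm=60.0, tm_tolerance=1.0):
    """Check amplicon length and that both primers anneal near target_tm.

    tm_tolerance is a hypothetical parameter; the slides only state 60 C.
    """
    length_ok = min_len <= pair.amplicon_length <= max_len
    tm_ok = all(abs(tm - target_tm) <= tm_tolerance
                for tm in (pair.tm_forward, pair.tm_reverse))
    return length_ok and tm_ok


# Made-up candidates, for illustration only.
candidates = [
    PrimerPair("region_A", 700, 60.2, 59.8),   # within both limits
    PrimerPair("region_B", 1200, 60.0, 60.0),  # amplicon too long
    PrimerPair("region_C", 650, 55.0, 60.1),   # forward primer too cold
]
kept = [p.name for p in candidates if passes_design_criteria(p)]
print(kept)
```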
Success
Dimer
Fail
1000 Plants Project (1KP)
MarkerMiner
Chloroplast data
-Six complete plastomes (via long PCR)
-Most variable regions in the chloroplast
-Designed 74 primer pairs
-53 primer pairs were successfully validated
-72% success rate
-The 48 most informative ones were chosen
-Average variability 2.7% (0.8%-7.5%)
[Figure: plastome map. LSC = Large Single Copy, IRB = Inverted Repeat, SSC = Small Single Copy]
Chloroplast data
-Low coverage genomic data
-Shotgun sequencing for four samples (three species)
-HiSeq 2000, 100 bp paired-end reads
Nuclear data
Orthology, yes!
-Compared our reads to public databases
-PPR gene family
-COSII
-Pipeline: BLAT, keep reads and gene, MAFFT, IntronFinder from SolGenomics
Nuclear data
[Figure: Target gene; F primer, Exon, Exon, R primer; 400-800 bp]
Raw reads
Data Processing
Raw reads
-Trimming (optional)
-Different values for R1 and R2
-Merge reads
-Min. 20 bp overlap
-Red colors are joined reads
-Grey colors are unpaired
-Very little missing data
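The read-merging step with a minimum 20 bp overlap can be sketched as follows. This is a minimal illustration, not the pipeline used for these data: it assumes exact matching in the overlap by default (`max_mismatches` is a hypothetical parameter), ignores base qualities, and uses a made-up 60 bp fragment.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=20, max_mismatches=0):
    """Join R1 and R2 if the end of R1 overlaps the start of revcomp(R2).

    Tries the longest possible overlap first. Returns the merged
    sequence, or None if no overlap of at least min_overlap is found
    (the pair stays unpaired).
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = r1[-overlap:], r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            return r1 + r2_rc[overlap:]
    return None


# Made-up 60 bp fragment sequenced as two 40 bp reads (20 bp overlap).
fragment = ("ACGTTGCAAG" "GCTTAACCGG" "ATCGATTACG"
            "CCGTAGGCTA" "ACGTTAGGCC" "ATCGGATACG")
r1 = fragment[:40]
r2 = reverse_complement(fragment[20:])
merged = merge_pair(r1, r2)
print(merged == fragment)
```

Real mergers also weigh base qualities in the overlap; this sketch only shows the overlap logic behind the 20 bp minimum.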
Sample 1 Sample 2 Sample 3
Raw reads
-Split reads into samples by dual barcodes (demultiplexing)
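Demultiplexing by dual barcodes amounts to a lookup on the barcode pair. The sketch below assumes the two barcodes have already been read for each read; the barcode sequences, read IDs, and the "unassigned" bin are hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical barcode pairs mapped to samples (illustration only).
BARCODE_TO_SAMPLE = {
    ("AAGGTT", "CCTTAA"): "Sample 1",
    ("AAGGTT", "GGAACC"): "Sample 2",
    ("TTCCAA", "CCTTAA"): "Sample 3",
}


def demultiplex(reads, barcode_to_sample):
    """Group read IDs by sample using the (barcode_1, barcode_2) pair.

    reads is an iterable of (read_id, barcode_1, barcode_2) tuples.
    Pairs that are not in the lookup go to an "unassigned" bin.
    """
    by_sample = defaultdict(list)
    for read_id, barcode_1, barcode_2 in reads:
        sample = barcode_to_sample.get((barcode_1, barcode_2), "unassigned")
        by_sample[sample].append(read_id)
    return dict(by_sample)


reads = [
    ("read_1", "AAGGTT", "CCTTAA"),
    ("read_2", "TTCCAA", "CCTTAA"),
    ("read_3", "AAGGTT", "CCTTAA"),
    ("read_4", "TTCCAA", "GGAACC"),  # pair not used by any sample
]
assigned = demultiplex(reads, BARCODE_TO_SAMPLE)
print(assigned)
```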
Region 1 Region 2 Region 3
Sample 1
-Split reads into amplicons by primers
-Up to 2 primer mismatches
-4 last bp of primers must match to produce clean ends
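The two primer rules above (up to 2 mismatches, last 4 bp exact) can be sketched like this. It is a simplification: only the forward primer at the start of the read is checked, ambiguity codes are not handled, and the primer sequences and region names are made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def primer_matches(read, primer, max_mismatches=2, anchor=4):
    """True if the read starts with the primer under the two rules:
    at most max_mismatches overall, and the last `anchor` bases exact."""
    if len(read) < len(primer):
        return False
    window = read[:len(primer)]
    if window[-anchor:] != primer[-anchor:]:
        return False
    return hamming(window, primer) <= max_mismatches


def assign_amplicon(read, primers):
    """Return the first region whose primer matches the read, else None."""
    for region, primer in primers.items():
        if primer_matches(read, primer):
            return region
    return None


# Hypothetical primers, for illustration only.
primers = {
    "region_1": "ACGTACGTACGTACGTACGT",
    "region_2": "TTGACCGGTAAGCTTGGCAT",
}
two_mismatches = "TTGTACGTACGTACGTACGTGGGCCC"  # mismatches at positions 1-2
bad_three_prime = "ACGTACGTACGTACGTACGAGGG"    # mismatch in the last 4 bp
print(assign_amplicon(two_mismatches, primers))
print(assign_amplicon(bad_three_prime, primers))
```

Requiring the 3' end of the primer to match exactly is what gives every read in a region the same start position, so the primer can be trimmed off cleanly.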
Sample 1 - Region 1
40% 40% 15% 5%
Minimum 5 reads and 5% of all reads
Sample 1 - Region 1
21% 21% 21% 21% 12.5% 4.1%
Minimum 5 reads and 5% of all reads
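The "minimum 5 reads and 5% of all reads" rule can be sketched as a filter on the counts of identical sequences within one sample and region. Treating both thresholds as inclusive is an assumption, and the variant names and counts below are made up.

```python
from collections import Counter


def retained_variants(reads, min_reads=5, min_fraction=0.05):
    """Keep sequence variants seen in at least min_reads reads AND
    making up at least min_fraction of all reads for this
    sample/region. Both thresholds are assumed to be inclusive."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n for seq, n in counts.items()
            if n >= min_reads and n / total >= min_fraction}


# 100 made-up reads: the 4-read variant (4%) fails both thresholds.
reads = ["var_a"] * 50 + ["var_b"] * 40 + ["var_c"] * 6 + ["var_d"] * 4
kept = retained_variants(reads)
print(kept)

# With few reads, a 15% variant can still fail the 5-read minimum.
low_depth = ["var_a"] * 17 + ["var_b"] * 3
kept_low = retained_variants(low_depth)
print(kept_low)
```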
Neobartsia - Orobanchaceae (Uribe-Convers et al. in prep; UIdaho)
576 samples
Nuclear: 21 PPR, 24 COSII, 1 ITS, 1 ETS, 1 Phototropin2
Chloroplast: 48 most variable regions
Total: ~50,000 bp
Gene Family     No. Primer Pairs    Validated Primer Pairs    Success rate (%)
PPR             44                  26                        59.09
COSII           130                 25                        19.23
ITS             4                   3                         75
ETS             4                   4                         100
Phototropin1    3                   0                         0
Phototropin2    3                   3                         100
Total           188                 61                        32.45
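The success rates in the table follow from validated / designed primer pairs. A short check, using only the counts from the table (the dictionary layout is just for illustration):

```python
# (designed primer pairs, validated primer pairs), from the table above.
primer_pairs = {
    "PPR": (44, 26),
    "COSII": (130, 25),
    "ITS": (4, 3),
    "ETS": (4, 4),
    "Phototropin1": (3, 0),
    "Phototropin2": (3, 3),
}


def success_rate(designed, validated):
    """Percentage of designed primer pairs that were validated."""
    return round(100 * validated / designed, 2)


rates = {family: success_rate(d, v) for family, (d, v) in primer_pairs.items()}

total_designed = sum(d for d, _ in primer_pairs.values())
total_validated = sum(v for _, v in primer_pairs.values())
rates["Total"] = success_rate(total_designed, total_validated)

for family, rate in rates.items():
    print(f"{family:<14}{rate:>7.2f}")
```

The overall rate is 61/188 = 32.4468%, which is 32.45 when rounded to two decimals.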
Castilleja - Orobanchaceae
96 samples
Nuclear: In primer design
Chloroplast: 48 most variable regions
Total: ~25,000 bp
[Figure: Castilleja phylogeny; tips are individual samples (labels such as CNMR.8, CAPB28, CLiTB.2)]
Bootstrap support (BS): BS ≥ 75%, BS ≥ 90%, BS = 100%
Taxa: C. affinis var. affinis, C. affinis var. neglecta, C. affinis var. inflata, C. affinis var. contentiosa, C. affinis var. insularis, C. wightii, C. mendocinensis, C. latifolia, C. litoralis
Clade labels: A, B, C, D, E, F, G
Clade composition:
-Castilleja affinis vars. affinis/neglecta/inflata
-Castilleja mendocinensis / C. wightii
-Castilleja latifolia
-Castilleja affinis var. contentiosa
-Castilleja wightii
-Castilleja affinis var. insularis
-Castilleja litoralis / C. mendocinensis
Tank et al. in prep
Lachemilla - Rosaceae (Diego Morales-Briones et al., UIdaho)
288 samples
Nuclear: 48 genes
Chloroplast: 48 most variable regions
Total: ~55,000 bp
Autopolyploidy Allopolyploidy
Cucurbita - Cucurbitaceae (Heather-Rose Kates et al.; UFlorida)
22 species Nuclear: 48 genes
Draba - Brassicaceae and Solanum - Solanaceae (Ingrid Jordon-Thaden et al.; Bucknell University)
Nuclear: Genes based on transcriptomes using MarkerMiner
Acknowledgments
Tank lab: Diego Morales-Briones, Hannah Marx, Sarah Jacobs, Maribeth Latvis
IBEST: Sam Hunter, Dan New, Tamara Max