Introduction to Illumina NGS technology, Library
preparation and Mars-seq
Hadas Keren-Shaul Advanced Sequencing Technologies (Sandbox), LSCF
Introduction to Deep Sequencing Analysis course June 2017
LSCF
Sandbox: A Playground for Genomic Research and Science Innovation
• Open 24/7
• Access to Weizmann trained users
Sandbox lab, Levine building, Room 202
Genomic technologies are greatly advancing Biology and Medicine
How do we get access to these technologies?
How do we share the technological developments in Weizmann?
The interface between cutting-edge technologies developed in individual labs and the entire Weizmann community
• Standardizing custom genomic protocols
• Affordable, accessible to many users
• Hands-on workshops
• Quality assured equipment and consumables
• Troubleshooting and guidance
Bringing advanced genomic technologies to Weizmann scientists
Sandbox Vision
Weizmann Genomic
Innovation Sandbox
Weizmann scientists
NGS – Next Generation Sequencing
The research tool to study biological systems with unprecedented throughput, scalability, and speed
Broad range of applications:
• Sequence whole genomes
• Zoom in to deeply sequence target regions
• Utilize RNA sequencing to discover RNA variants and splice sites, or quantify mRNAs for gene expression analysis
• Analyze genome-wide methylation or DNA-protein interactions
• Study microbial diversity in humans or in the environment
Timeline and Comparison of Commercial HTS Instruments
Jason A. Reuter, Damek V. Spacek, Michael P. Snyder Molecular Cell Volume 58, Issue 4, Pages 586-597 (May 2015) DOI: 10.1016/j.molcel.2015.05.004
Illumina, Sequencing By Synthesis
C. Sequencing
https://youtu.be/fCd6B5HRaZ8
Illumina, Sequencing By Synthesis
D. Alignment and Data Analysis
https://youtu.be/fCd6B5HRaZ8
What should I know before sequencing?
• Library quality and quantity
• Type of protocol used
• Type of kit used
• How many bp to read:
rd1, rd2, index1 (i7), index2 (i5)
• Run definitions can be set up on the Illumina website (BaseSpace)
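The read lengths listed above (rd1, rd2, index1, index2) together determine how many sequencing cycles a run requests. A minimal sketch, assuming the requested cycle count is simply the sum of the four read lengths; the function name `total_cycles` is hypothetical, and the capacity of a specific reagent kit should be checked against Illumina's documentation:

```python
def total_cycles(rd1, rd2=0, index1=0, index2=0):
    """Sum the cycles requested for each read (all values in bp)."""
    lengths = {"rd1": rd1, "rd2": rd2, "index1": index1, "index2": index2}
    for name, bp in lengths.items():
        if bp < 0:
            raise ValueError(f"{name} cannot be negative")
    return sum(lengths.values())


# Illustrative values only: a paired-end run with two 8 bp index reads
print(total_cycles(rd1=75, rd2=75, index1=8, index2=8))  # 166
```

Leaving an index read at 0 corresponds to a run defined with no index read.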
In the Sandbox - NextSeq, Illumina
• Desktop sequencing machine, fast, flexible, high-throughput
• Independent, easy and accessible sequencing 24/7
• NextSeq training
• Detailed run protocols
• A downstream analysis pipeline, generated by LSCF bioinformatics, for immediate demultiplexing of samples
• Used daily by many Weizmann labs
RNA-Seq - Method of choice to study Gene Expression
Identification of novel transcripts
Less background noise
Greater dynamic range for detection
How to perform RNA-seq?
Many different methods for library preparation
Strand specific RNA-seq methods – which DNA strand corresponds to the sense strand of RNA
Wiley Interdisciplinary Reviews: RNA Volume 8, Issue 1, 19 MAY 2016 DOI: 10.1002/wrna.1364
How to perform RNA-seq?
Sequencing instruments work on DNA molecules:
Capture RNA molecules
Convert RNA to cDNA with defined size range
Add adapter sequences on the cDNA ends for amplification and sequencing
Fragmentation – size limitation of sequencing platform
Add 5’ and 3’ adapters
RNA-seq guidelines
RNA extraction method needs to be calibrated per sample type used – kits, beads, Trizol, etc.
Selection of Poly(A)+ transcripts – beads, primers in RT
rRNA depletion – for non-poly(A) RNAs (prokaryotic mRNAs, fragmented mRNAs from FFPE)
RNA Input for library preparation
RNA quality and purity
• Measure UV absorption - Nanodrop
• RNA has a maximum absorption at 260nm
• A260/280 – level of protein contamination in the sample. Pure RNA =2.1. Acceptable: 1.8-2.0
• A260/230 – level of salt / organic compounds contamination (guanidine salts and phenols, used in RNA isolation protocols). >1.5
• UV absorbance depends on pH of RNA solution.
• Inaccurate under 20 ng/µl (use Qubit instead)
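The two purity ratios above can be computed directly from the absorbance readings. A minimal sketch, using the thresholds stated on this slide (A260/280 of at least 1.8, A260/230 above 1.5); the function name and the returned field names are hypothetical, and treating 1.8 as an inclusive lower bound is an assumption:

```python
def rna_purity(a260, a280, a230):
    """Compute A260/280 and A260/230 and compare them to the slide thresholds."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("A280 and A230 must be positive")
    ratio_280 = a260 / a280  # low values suggest protein contamination
    ratio_230 = a260 / a230  # low values suggest salt / organic contamination
    return {
        "A260/280": round(ratio_280, 2),
        "A260/230": round(ratio_230, 2),
        "protein_ok": ratio_280 >= 1.8,        # assumption: 1.8 is inclusive
        "salt_organics_ok": ratio_230 > 1.5,
    }


# Illustrative readings, not measured data
print(rna_purity(a260=1.0, a280=0.5, a230=0.5))
```

Because UV absorbance depends on the pH of the RNA solution, the same sample can give different ratios in different buffers, so the flags are only a first check.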
RNA Input for library preparation
RNA integrity
Degraded RNA will not perform well in downstream applications!
Run the RNA on a 1% agarose gel:
28S rRNA band should be ~2-fold more intense than the 18S band
Equal intensity indicates some degradation
Higher molecular weight bands can indicate DNA contamination
Smearing below rRNA indicates poor RNA quality
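The gel rule of thumb above (28S about twice as intense as 18S; equal intensity suggests some degradation) can be written as a small check on quantified band intensities. This is only a sketch: the function name and both cutoffs (`intact_cutoff`, `degraded_cutoff`) are assumptions chosen for illustration, not values from the slides.

```python
def describe_rrna_ratio(intensity_28s, intensity_18s,
                        intact_cutoff=1.8, degraded_cutoff=1.2):
    """Classify RNA integrity from the 28S/18S band intensity ratio.

    The cutoffs are illustrative assumptions around the ~2:1 (intact)
    and ~1:1 (some degradation) guideline.
    """
    if intensity_18s <= 0:
        raise ValueError("18S intensity must be positive")
    ratio = intensity_28s / intensity_18s
    if ratio >= intact_cutoff:
        verdict = "close to 2:1, consistent with intact RNA"
    elif ratio <= degraded_cutoff:
        verdict = "close to 1:1 or lower, suggests some degradation"
    else:
        verdict = "intermediate, inspect the gel for smearing"
    return ratio, verdict


# Illustrative band intensities in arbitrary units
print(describe_rrna_ratio(2000, 1000))
```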
RNA Input for library preparation
RNA integrity
Run RNA on a Bioanalyzer / TapeStation-
RIN – RNA Integrity Number – quality score for total RNA
Mars-seq – Scalable and sensitive RNA-seq
• Library generation for 3’ RNA seq
• Developed in the lab of Ido Amit
• Low input material (1 ng of RNA)
• Stable, suitable for inexperienced users
• Suitable for a wide variety of species and applications (sorted cells, frozen tissues, etc.)
• Ultra low cost due to custom made reactions
• Simple and efficient due to early pooling of samples
• RNA data < 1 week
• A detailed quality control scheme for library evaluation at different steps prior to sequencing
Jaitin, Kenigsberg, Keren-Shaul et al., Massively Parallel single cell RNA-seq. Science 2014
Library construction - Day 1
Step 1: Reverse transcription
Step 2: Sample pooling
Step 3: Exonuclease I
Step 4: Second strand synthesis
Step 5: In Vitro Transcription
RT primer: 5' –T7 promoter-Illumina sequences XXXXXXX NNNNNNNN TTTTTTTTTTTTTTTTTTTTN 3' (BC-7bp, UMI-8bp)
Legend: RNA, cDNA, 2nd strand, aRNA
Library construction - Day 2
Step 6: DNaseI
Step 7: RNA Fragmentation
Step 8: RNA/ssDNA ligation
Step 9: Reverse transcription
Step 10: Amplification + Illumina primers addition by nested PCR (P5_rd1 forward primer, P7_rd2 reverse primer)
Library ready for Illumina sequencing (P5 - partial rd1 - insert - partial rd2 - P7)
Sequencing of Mars-seq libraries
• NextSeq® 500 High Output v2 Kit (75 cycles)
• 75 rd1, 15 rd2, no index
• Pooling up to 80 samples in one run
P5 P7 Read1 Read2
Insert to sequence
Read 1 to align to genome
Read 2 to obtain cell and molecule barcodes
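Since read 2 is 15 bp and the primer carries a 7 bp barcode and an 8 bp UMI, read 2 can be split into the two fields and used to count distinct molecules per barcode. A minimal sketch, assuming the barcode comes first in read 2 and is followed by the UMI; the function names are hypothetical, the reads are made up, and a real pipeline would also take into account the gene that read 1 aligns to and sequencing errors in the barcodes:

```python
from collections import defaultdict

BARCODE_LEN = 7  # BC-7bp, from the primer design
UMI_LEN = 8      # UMI-8bp, from the primer design


def split_read2(seq):
    """Split a read 2 sequence into (barcode, UMI); order is an assumption."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 2 is shorter than barcode + UMI")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_molecules(read2_seqs):
    """Count distinct UMIs seen for each barcode."""
    umis_per_barcode = defaultdict(set)
    for seq in read2_seqs:
        barcode, umi = split_read2(seq)
        umis_per_barcode[barcode].add(umi)
    return {bc: len(umis) for bc, umis in umis_per_barcode.items()}


# Made-up reads: the first two are PCR duplicates of the same molecule
reads = [
    "ACGTACGAAAACCCC",
    "ACGTACGAAAACCCC",
    "ACGTACGTTTTGGGG",
    "TTGCATGAAAACCCC",
]
print(count_molecules(reads))  # {'ACGTACG': 2, 'TTGCATG': 1}
```

Collapsing reads that share a barcode and UMI is what lets amplification duplicates be counted once.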
Mars-seq Workshop
• 3 days hands-on workshop
• Standard RNA material
• 1 representative per lab
[Figure: # of Mars-seq samples (axis range 0-450)]