
Page 1: Icahn School of Medicine at Mount Sinai LINCS … › pdf › DToxS_SOP_A-7.0...Ravi Iyengar (1/25/2016) Quality Assurance/Control (QA/QC) steps are indicated with green highlight

Icahn School of Medicine at Mount Sinai LINCS Center for Drug Toxicity Signatures

Standard Operating Procedure: Sequencing 3’-end, Broad-Prepared mRNA Libraries

DToxS SOP Index: A-7.0
Last Revision: 1/23/2016
Written By: Milind Mahajan, Marc Birtwistle, and Evren U. Azeloglu
Approvals (Date): Joseph Goldfarb (1/23/2016), Marc Birtwistle (1/23/2016), Eric Sobie (2/21/2016), Ravi Iyengar (1/25/2016)

Quality Assurance/Control (QA/QC) steps are indicated with green highlight. Metadata recording is indicated with yellow highlight and superscript indices.
----------------------------------------------------------------------------------------------------

1. QA/QC-verified sequencing libraries are obtained from the Broad Institute according to DToxS SOP A-6.0. This protocol describes paired-end sequencing of these libraries on an Illumina HiSeq2500 platform.

2. Perform a MiSeq run to verify library integrity. We load the library at 6 pM and run it according to the manufacturer’s protocol.

a. This requires determining the molar concentration of the mRNAseq library, as described in Step 3a below.

b. Note that the data collected here can supplement the final data set.

3. Determine the optimum library concentration for sequencing.

Note: To obtain maximum sequencing depth per lane, Illumina recommends loading an empirically determined concentration of sequencing library that generates 800-1000 raw clusters per mm² of flow cell surface area. This concentration depends on how the library was prepared, and needs to be determined once, by titration, for each type of library preparation (types relevant here include 96-, 384-, or 3×384-well plates) (QA/QC1).

a. Calculate the molar concentration of the mRNAseq library.

i. Determine the mean fragment size (bp) and use it to estimate the mean molecular weight. We use a Bioanalyzer (Agilent Technologies, Cat:G2939A) according to the vendor-supplied protocol.

ii. Measure mass concentration. We use a Qubit (Life Technologies, Cat:Q32866) as per vendor-supplied protocol.

b. Dilute the library to 2 nM with nuclease-free water (non-DEPC treated).
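The arithmetic behind Steps 3a and 3b can be sketched in Python as below. It assumes the common approximation of 660 g/mol per double-stranded base pair to turn the Bioanalyzer mean fragment size into a molecular weight; the function names and the final volume used in the example are illustrative, not part of this SOP.

```python
"""Sketch of Steps 3a-3b: library molarity and dilution to 2 nM."""

# Assumption: average molar mass of one double-stranded DNA base pair.
G_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/uL, e.g. from Qubit) to molarity (nM).

    The mean molecular weight is estimated as mean fragment size
    (e.g. from a Bioanalyzer trace) times the mass per base pair.
    """
    mean_mw = mean_fragment_bp * G_PER_MOL_PER_BP  # g/mol
    # 1 ng/uL = 1e-3 g/L; (g/L) / (g/mol) = mol/L; x1e9 converts mol/L to nM.
    return conc_ng_per_ul * 1e-3 / mean_mw * 1e9


def dilution_volumes_ul(stock_nm: float, target_nm: float,
                        final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, water volume) in uL, from C1*V1 = C2*V2."""
    if target_nm > stock_nm:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_nm * final_volume_ul / stock_nm
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    molarity = library_molarity_nm(2.0, 400)      # illustrative inputs
    print(f"{molarity:.2f} nM")                    # 7.58 nM
    print(dilution_volumes_ul(10.0, 2.0, 20.0))    # (4.0, 16.0)
```

For example, a 2 ng/µL library with a 400 bp mean fragment size works out to roughly 7.6 nM; to make 20 µL at 2 nM from a 10 nM stock, combine 4 µL of stock with 16 µL of nuclease-free water.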


c. Dilute 10 nM PhiX control (Illumina, Cat:15017397) to 0.2 nM with nuclease-free water (non-DEPC treated). Mix 0.5 µL of 0.2 nM PhiX control with 9.5 µL of 2 nM mRNA-Seq library (approximately 1:200 PhiX-to-library molar ratio).

d. Mix 3 µL of the PhiX-spiked library from the previous step with 3 µL of 0.1 N NaOH (Fisher Scientific, Cat:SS267) in a 1.5 mL tube.

e. Incubate the mixture for five minutes at room temperature.

f. Add 294 µL of HT1 buffer (part of the Illumina Cluster Generation Kit, Cat:PE-401-4001)¹.

i. This yields a 20 pM single-stranded library.

g. Repeat Steps 3d-f three more times, but add 327 µL, 494 µL, and 994 µL of HT1 buffer, respectively, to make 18 pM, 12 pM, and 6 pM single-stranded libraries.
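The volumes in Steps 3c to 3g follow from a simple mole balance: nM × µL gives fmol, and 1 pM equals 1 amol per µL. The sketch below reproduces the PhiX-to-library ratio and the HT1 volumes, treating the PhiX-spiked mix as nominally 2 nM as the SOP does (the spike actually lowers it slightly, to about 1.91 nM). The function names are hypothetical.

```python
"""Sketch of Steps 3c-3g: PhiX spike ratio and HT1 volumes.

Units: nM x uL = fmol, and 1 pM = 1 amol/uL.
"""


def phix_to_library_ratio(phix_nm: float, phix_ul: float,
                          lib_nm: float, lib_ul: float) -> float:
    """Molar ratio of PhiX to library in the spiked mix."""
    return (phix_nm * phix_ul) / (lib_nm * lib_ul)


def ht1_volume_ul(target_pm: float, library_nm: float = 2.0,
                  library_ul: float = 3.0, naoh_ul: float = 3.0) -> float:
    """HT1 volume to add after denaturation to reach target_pm.

    Defaults follow the SOP: 3 uL of nominally 2 nM library + 3 uL 0.1 N NaOH.
    """
    amol = library_nm * library_ul * 1000.0   # fmol -> amol
    total_ul = amol / target_pm               # amol / (amol/uL) = uL
    return total_ul - library_ul - naoh_ul


if __name__ == "__main__":
    print(round(1 / phix_to_library_ratio(0.2, 0.5, 2.0, 9.5)))  # 190
    for pm in (20, 18, 12, 6):
        print(pm, round(ht1_volume_ul(pm)))
    # prints: 20 294, 18 327, 12 494, 6 994 (one pair per line)
```

The 18 pM case gives 327.3 µL, which the SOP rounds to 327 µL; the PhiX spike comes out at about 1:190, consistent with the "approximately 1:200" in Step 3c.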

h. Generate clusters from these four libraries (6, 12, 18, and 20 pM), one library per lane, on two 2-lane rapid run flow cells using the cluster generation reagent kit (Cat:PE-401-4001) and Illumina’s cBot cluster generation instrument (Cat:SY-301-2002). Follow the manufacturer’s protocol.

i. Sequence these clustered flow cells, loaded with different picomolar concentrations of the library, on a HiSeq2500 according to the manufacturer’s protocol. Use the paired-end recipe with 26 bp for read-1 and 46 bp for read-2.

j. Compare the obtained cluster densities (QA/QC1). The optimal concentration for maximum sequencing depth is the one that yields 800-1000 raw clusters per mm² with less than 15% loss to automated pass filtering.

k. Note that the data generated here can supplement the final data set.

4. Generate clusters for sequencing.

a. Load 160 µL per lane of the library at the optimal pM concentration (from Step 3) into 8 lanes of the high-throughput flow cell that comes with the cluster generation kit (Illumina, Cat:PE-401-4001), using the cBot cluster generation instrument and following the manufacturer’s protocols.

i. Note: We find that running a 96-sample library in 8 lanes, or a 384-sample library in 16 lanes, gives more than adequate sequencing depth. A single 8-lane run often gives reasonable depth for a 384-sample library. Scale accordingly and/or optimize depth for your application as desired.

b. Uninterrupted completion of the cBot run suggests successful cluster generation; verify cluster density on the HiSeq2500 during sequencing.
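Scaling lanes to sample count, as the note under Step 4a suggests, is a ratio calculation if every lane carries the same pooled library and reads split evenly across samples. In the sketch below, the pass-filter read yield per lane is a HYPOTHETICAL placeholder (150 million), not a figure from this SOP; substitute the yield you observe on your own runs.

```python
"""Sketch: reads per sample when one pooled library is loaded in every lane."""
import math

# HYPOTHETICAL placeholder yield; replace with the PF reads/lane you observe.
PF_READS_PER_LANE = 150_000_000


def reads_per_sample(n_samples: int, n_lanes: int,
                     pf_reads_per_lane: float = PF_READS_PER_LANE) -> float:
    """Average reads per sample, assuming reads are split evenly across samples."""
    return n_lanes * pf_reads_per_lane / n_samples


def lanes_needed(n_samples: int, target_reads_per_sample: float,
                 pf_reads_per_lane: float = PF_READS_PER_LANE) -> int:
    """Smallest number of lanes that reaches the target average depth."""
    return math.ceil(n_samples * target_reads_per_sample / pf_reads_per_lane)


if __name__ == "__main__":
    print(reads_per_sample(96, 8))        # 12500000.0
    print(reads_per_sample(384, 16))      # 6250000.0
    print(lanes_needed(384, 5_000_000))   # 13
```

With the placeholder yield, the 96-sample/8-lane and 384-sample/16-lane configurations give the same per-lane sample load ratio the note describes; the absolute read counts only illustrate the scaling.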

5. Sequence the clustered library.

a. Operate the HiSeq2500 with the “Sequencing by Synthesis (SBS)” reagent kit (Cat:FC-401-4002)², following Illumina’s recommended procedure.

b. Set the sequencing lengths of the two reads as follows: read-1 = 26 bases and read-2 = 46 bases.

i. Note: Although only 17 bases of read-1 are needed to identify both the well barcode and the unique molecular identifier (UMI), a minimum of 26 bases is required to use the “Pass Filter” functionality of the HiSeq2500 software (pass filtering is required because the unfiltered raw data are often of low quality).

ii. Read-1 sequence is trimmed to 17 bp during post-processing for sample demultiplexing and UMI counting.

c. QA/QC2: Monitor the progress of the sequencer at the start of the run (Cycle 1), and after Cycle 15, Cycle 25, and Cycle 50. Check the following two items:

i. Cluster intensities per cycle (on the instrument data display).

ii. Image focus quality for each cycle, by visual inspection.
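Step 5b notes that read-1 is trimmed to 17 bases and then used for the well barcode and the UMI. A minimal Python sketch of that parsing step follows. The 6 + 11 split of the 17 bases is a HYPOTHETICAL placeholder, since this SOP does not state the barcode length; take the real layout from the library preparation protocol (DToxS SOP A-6.0). Per-gene UMI counting also needs the aligned read-2 and is not shown.

```python
"""Sketch: trim read-1 to 17 bases and split it into well barcode + UMI."""
from collections import defaultdict

READ1_KEEP = 17   # from the SOP: bases of read-1 used after trimming
BARCODE_LEN = 6   # HYPOTHETICAL split; the UMI is assumed to be the remaining 11 bases


def parse_read1(seq: str) -> tuple[str, str]:
    """Return (well_barcode, umi) from a raw read-1 sequence."""
    if len(seq) < READ1_KEEP:
        raise ValueError(f"read-1 has fewer than {READ1_KEEP} bases")
    trimmed = seq[:READ1_KEEP]
    return trimmed[:BARCODE_LEN], trimmed[BARCODE_LEN:]


def umis_by_barcode(read1_seqs) -> dict[str, set[str]]:
    """Group the distinct UMIs observed for each well barcode."""
    groups = defaultdict(set)
    for seq in read1_seqs:
        barcode, umi = parse_read1(seq)
        groups[barcode].add(umi)
    return dict(groups)


if __name__ == "__main__":
    reads = ["AAAAAACCCCCGGGGGG" + "T" * 9,   # hypothetical 26-base reads
             "AAAAAACCCCCGGGGGG" + "T" * 9,
             "AAAAAATTTTTGGGGGG" + "T" * 9]
    print(umis_by_barcode(reads))
```

Collapsing identical UMIs with a set is what makes duplicate molecules count once; the two identical reads above contribute a single UMI.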


6. Base calling.

a. Process the raw images on the HiSeq2500 computer using Illumina software, following the manufacturer’s protocol.

i. We transfer base calls in real time to a 200 TB primary storage server of the Genomics Core Facility.

b. QA/QC3: Evaluate base calling quality.

i. Observe cluster intensity profiles and signal-to-noise ratios.

ii. If quality is sufficient, we distribute the final base call data to the end user (in our case, the LINCS DToxS Signature Generation Core).


Metadata

1. HiSeq PE Flow cell and Cluster Kit v4 (Illumina, Cat:PE-401-4001): Lot number

2. HiSeq SBS v4 50 Cycle Kit (Illumina, Cat:FC-401-4002): Lot number
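Recording the lot numbers above lends itself to a small structured record. This sketch is illustrative only: the class and function names are hypothetical, and the lot values in the usage example are placeholders rather than real lots.

```python
"""Sketch: a structured record for the reagent lot numbers listed above."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ReagentLot:
    index: int      # superscript metadata index used in the procedure text
    name: str
    catalog: str    # Illumina catalog number, as given in this SOP
    lot: str        # read from the kit label on the day of the run

    def __post_init__(self) -> None:
        if not self.lot.strip():
            raise ValueError(f"missing lot number for metadata item {self.index}")


def metadata_lines(lots: list[ReagentLot]) -> list[str]:
    """Format records in the same style as the Metadata list, ordered by index."""
    return [f"{item.index}. {item.name} (Illumina, Cat:{item.catalog}): Lot number {item.lot}"
            for item in sorted(lots, key=lambda item: item.index)]


if __name__ == "__main__":
    lots = [
        ReagentLot(1, "HiSeq PE Flow cell and Cluster Kit v4", "PE-401-4001", "LOT-A"),  # placeholder
        ReagentLot(2, "HiSeq SBS v4 50 Cycle Kit", "FC-401-4002", "LOT-B"),              # placeholder
    ]
    print("\n".join(metadata_lines(lots)))
```

Rejecting a blank lot at construction time keeps an unrecorded kit from silently entering the run metadata.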


Quality Assurance/Control (QA/QC) Steps

QA/QC1: Optimal library concentration for sequencing:

1. Illumina recommends 800-1000 raw clusters per mm² for maximum read depth. This requires loading an adequate concentration of the sequencing library onto the flow cell; note, however, that too high a concentration leads to “over-clustering”, in which the HiSeq2500 software rejects clusters through the pass filter.

2. The goal of library concentration optimization is to determine the lowest library concentration at which the numbers of both raw and pass-filtered clusters are between 800 and 1000 per mm². This can be seen in the figure below: the numbers of raw and pass-filtered clusters increase with library concentration up to 18 pM, above which the software automatically rejects additional clusters as low-quality artifacts. The cluster density is observable in the “Data by Lane” window (see below).

[Figure: Titration of RNA-Seq library to determine optimum pM concentration required for maximum pass filter reads. x-axis: pM concentration of RNA-Seq library; y-axis: cluster density (per mm²). Red line: raw clusters; blue line: pass filter clusters.]
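The selection rule in Step 3j and QA/QC1 can be written as a small function: among the titrated concentrations, keep those whose raw and pass-filtered densities both fall within 800-1000 per mm² with less than 15% pass-filter loss, and pick the lowest. The titration values in the usage example are made up for illustration; they are not the data in the figure.

```python
"""Sketch of the QA/QC1 selection rule for the titration runs."""

# Targets stated in this SOP (clusters per mm^2) and the 15% PF-loss limit.
DENSITY_RANGE = (800, 1000)
MAX_PF_LOSS = 0.15


def pf_loss(raw: float, pf: float) -> float:
    """Fraction of raw clusters rejected by the pass filter."""
    return (raw - pf) / raw


def optimal_concentration(titration: dict[float, tuple[float, float]]):
    """Lowest pM whose raw and PF densities are in range with < 15% PF loss.

    titration maps library concentration (pM) -> (raw density, PF density).
    Returns None if no concentration qualifies.
    """
    lo, hi = DENSITY_RANGE
    qualifying = [
        pm for pm, (raw, pf) in titration.items()
        if lo <= raw <= hi and lo <= pf <= hi and pf_loss(raw, pf) < MAX_PF_LOSS
    ]
    return min(qualifying) if qualifying else None


if __name__ == "__main__":
    # Made-up densities for the four titration libraries (not real data).
    runs = {6: (420, 400), 12: (700, 660), 18: (900, 820), 20: (1100, 850)}
    print(optimal_concentration(runs))  # 18
```

Returning None rather than the closest candidate makes a failed titration explicit, which matches the SOP's requirement to determine the concentration empirically.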


QA/QC2: Monitoring the sequencing run:

1. The HiSeq2500 monitor displays live quality control metrics, including base-calling ratios (colored traces on the left represent individual nucleotide bases), the number of raw clusters, and pass filter ratios. These should be tracked for each run.

2. If any deviation is observed in QC parameter values, take corrective action with the help of an Illumina Service Engineer and Field Application Scientist. The run is aborted if the corrective measures prove inadequate.
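Per-lane tracking during a run can be sketched as a check of each lane against the same targets used for titration. The 85% pass-filter threshold is borrowed from the 15% loss limit in Step 3j; the class, function, and lane values are hypothetical, and in practice the metrics are read from the HiSeq2500 display.

```python
"""Sketch: flag lanes whose live metrics drift from the QA/QC targets."""
from typing import NamedTuple


class LaneMetrics(NamedTuple):
    lane: int
    raw_density: float   # raw clusters per mm^2, as shown on the run display
    pf_fraction: float   # fraction of clusters passing filter, 0-1


def flag_lanes(lanes, density_range=(800, 1000), min_pf=0.85):
    """Return {lane: [reasons]} for lanes that miss a target."""
    lo, hi = density_range
    flagged = {}
    for m in lanes:
        reasons = []
        if not lo <= m.raw_density <= hi:
            reasons.append("raw cluster density outside target range")
        if m.pf_fraction < min_pf:
            reasons.append("pass-filter fraction below threshold")
        if reasons:
            flagged[m.lane] = reasons
    return flagged


if __name__ == "__main__":
    # Hypothetical values for two lanes.
    lanes = [LaneMetrics(1, 910, 0.91), LaneMetrics(2, 1150, 0.78)]
    print(flag_lanes(lanes))
```

A flagged lane is a prompt for the corrective actions described above, not an automatic abort.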

QA/QC3: Evaluating base calling quality: We filter out data that has a base calling quality score lower than Q30, which corresponds to an error rate of 0.1%.
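Quality scores are Phred-scaled, Q = -10 log10(P_error), so Q30 means a 1-in-1000 chance of a wrong base call. The sketch below decodes quality strings in the Phred+33 encoding used by Illumina 1.8+ FASTQ files and shows two ways of applying the Q30 cut-off; which one matches "filter out data" is an assumption, since the SOP does not say whether filtering is per base or per read.

```python
"""Sketch: Phred quality decoding and a Q30 cut-off for FASTQ quality strings."""

PHRED_OFFSET = 33  # Phred+33 encoding (Illumina 1.8+ / Sanger FASTQ)


def phred_scores(qual: str) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def error_probability(q: float) -> float:
    """Invert Q = -10*log10(P): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def fraction_at_least(qual: str, threshold: int = 30) -> float:
    """Fraction of bases at or above the threshold (a '% >= Q30' style metric)."""
    scores = phred_scores(qual)
    return sum(q >= threshold for q in scores) / len(scores)


def read_passes(qual: str, threshold: int = 30) -> bool:
    """Hypothetical read-level rule: keep a read only if every base is >= Q30."""
    return min(phred_scores(qual)) >= threshold


if __name__ == "__main__":
    print(phred_scores("I?5"))                      # [40, 30, 20]
    print(fraction_at_least("I?5"))                 # 0.6666666666666666
    print(read_passes("II?"), read_passes("I?5"))   # True False
```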

Clusters with optimum focus quality: Clusters are close to round in shape, with good signal to noise.

Clusters with bad focus quality: Clusters look fuzzy, with variable intensity and irregular shape.

Fluorescence intensity of all bases throughout the run. Note that the intensity of the first 4-10 bases (cycles) of all reads is somewhat variable, owing to the low complexity of the sequence. In read-1, intensities of all bases are visible until cycle 17, with near-uniform intensities from cycle 4. From cycle 18 to 26, only thymine (T) is seen consistently, which is expected since this is mRNA sequencing with poly(A) tail enrichment.
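The poly(T) stretch described above can also be checked from read-1 sequences by computing the base composition at each cycle. This is a minimal sketch on hypothetical reads; it works on sequences already converted from base calls (for example, to FASTQ), which is an assumption about the downstream format.

```python
"""Sketch: per-cycle base composition of read-1, e.g. to confirm the poly(T) stretch."""
from collections import Counter


def base_composition(reads, cycle: int) -> dict[str, float]:
    """Fraction of each base at a 1-based cycle among reads long enough to reach it."""
    counts = Counter(r[cycle - 1] for r in reads if len(r) >= cycle)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGTN"}


def t_fraction_by_cycle(reads, first: int = 18, last: int = 26) -> dict[int, float]:
    """Fraction of T at each cycle in [first, last] (read-1 cycles 18-26 per the SOP)."""
    return {c: base_composition(reads, c).get("T", 0.0) for c in range(first, last + 1)}


if __name__ == "__main__":
    # Hypothetical 26-base read-1 sequences: 17 barcode/UMI bases + poly(T).
    reads = ["ACGTACGTACGTACGTA" + "T" * 9,
             "TTGCAAGCTAGCTTAGC" + "T" * 9]
    print(t_fraction_by_cycle(reads))
```

On real data, a T fraction well below 1.0 across cycles 18-26 would be worth comparing against the intensity traces on the instrument.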

According to Illumina guidelines, the optimum density of pass-filter clusters on a flow cell is 800-1000 per mm². Cluster numbers per lane should be monitored for each run even though library concentrations have been optimized.