Affymetrix Microarray and Illumina/ Solexa NextGen Sequencing Yuannan Xia, Ph.D Genomics Core Research Facility 10.27.2009


Affymetrix GeneChip Microarray System

1. High-density oligo array: a single array contains 1 – 6 million features and generates 1 – 6 million probe hybridization data points, which are summarized to expression values for 20K – 50K genes.

2. Hybridization Oven 640: hybridizes samples to the array

3. Fluidics Station 450: washing and staining

4. Scanner 3000 7G: confocal laser scanning; high pixel resolution at the 0.5 micron level

5. Data generation and processing software: GCOS

Software pipeline for data mining and annotation: Bioconductor, Rosetta, AffyMiner, Ingenuity, NetAffx
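To make the idea of summarizing millions of probe-level data points into per-gene values concrete, here is a minimal sketch. It takes the median of log2 probe intensities within each probe set. This is an illustrative simplification, not the algorithm used by GCOS or Bioconductor, and the probe set names and intensities are made up.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_probes(probe_intensities):
    """Collapse (probe_set_id, intensity) pairs to one log2 value per probe set.

    Simplified stand-in for real summarization methods: no background
    correction and no normalization, just a median of log2 intensities.
    """
    by_set = defaultdict(list)
    for probe_set_id, intensity in probe_intensities:
        by_set[probe_set_id].append(math.log2(intensity))
    return {probe_set: median(values) for probe_set, values in by_set.items()}


# Hypothetical probe-level readings for two made-up probe sets.
probes = [
    ("geneA_at", 256.0), ("geneA_at", 512.0), ("geneA_at", 1024.0),
    ("geneB_at", 64.0), ("geneB_at", 64.0),
]
gene_values = summarize_probes(probes)
print(gene_values)
```

A real array has roughly a dozen or more probes per probe set, so a robust statistic such as the median keeps one misbehaving probe from dominating the gene-level value.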

Affymetrix Expression Arrays

| Array type | Number of transcripts | Number of genes |
| --- | --- | --- |
| Human U133 Plus 2.0 | >47,000 | 38,500 |
| Mouse 430 2.0 | >39,000 | >34,000 |
| Rat 230 2.0 | >31,000 | >28,000 |
| Arabidopsis ATH1 | 22,500 | >24,000 |
| Drosophila | 18,500 | 18,000 |
| Wheat | 61,127 | 55,052 |
| Rice | 51,279 | |
| Soybean | 58,000 | 58,000 |
| Barley | 25,500 | 22,000 |

Illumina/Solexa Genome Analyzer II System

Flow cell – A glass slide with 8 channels (lanes) and 16 manifold ports for performing all PCR and sequencing reactions inside each channel.

Cluster generation station – Performs PCR bridge amplification to generate clusters inside the channels of the flow cell and prepares the flow cell for sequencing.

GAII and Paired End Module – Perform sequencing, imaging, and cluster modification for paired-end read 2 sequencing.

Two Key Chemistries Used in Solexa Sequencing Technology

1. PCR bridge amplification of individual templates in a shotgun library to generate clusters (DNA polymerase colonies)

High cluster density: 10 – 20 million/lane; 80 – 150 million/run

2. Reversible terminator sequencing chemistry

Allows incorporation of only ONE nucleotide per cycle

Generates accurate (>99.5%) sequences: 300 – 800 megabases/lane; 3 – 6 gigabases/run
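The yield figures follow from cluster density, read length, and the 8 lanes of a flow cell. The sketch below works through that arithmetic. The 36-base read length is an assumption chosen for illustration (the slides do not state one), and the function name is hypothetical.

```python
LANES_PER_FLOW_CELL = 8  # the flow cell has 8 channels (lanes)
READ_LENGTH_BP = 36      # ASSUMPTION: illustrative read length, not stated in the slides


def yield_megabases(clusters_per_lane, read_length_bp, lanes=1):
    """Sequence yield in megabases for single reads: clusters x read length x lanes."""
    return clusters_per_lane * read_length_bp * lanes / 1e6


low_lane = yield_megabases(10_000_000, READ_LENGTH_BP)
high_lane = yield_megabases(20_000_000, READ_LENGTH_BP)
low_run = yield_megabases(10_000_000, READ_LENGTH_BP, LANES_PER_FLOW_CELL)
high_run = yield_megabases(20_000_000, READ_LENGTH_BP, LANES_PER_FLOW_CELL)

print(f"per lane: {low_lane:.0f} - {high_lane:.0f} Mb")
print(f"per run:  {low_run / 1000:.2f} - {high_run / 1000:.2f} Gb")
```

Under this assumed read length, 10 – 20 million clusters per lane give 360 – 720 Mb per lane and about 2.9 – 5.8 Gb per run, which is roughly in line with the quoted ranges.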

Bridge Amplification of Individual Templates by PCR

Cluster Generation

Sequencing by Synthesis Using Reversible Terminators

• All 4 bases with reversible terminators
• 4 labeling colors
• Terminators can be removed
• Add all 4 nucleotides in one reaction
• No problem with homopolymer repeats
• Higher accuracy

Steps of Sequencing by Synthesis

A. Extend the first base (T), read, and deblock.

B and C. Repeat step A to extend the strand.

D. Generate the base calls.
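The extend, read, deblock loop can be illustrated with a toy simulation. This is a conceptual sketch only: it ignores strand orientation, chemistry, and imaging, and the function name is hypothetical. It shows why a reversible terminator yields exactly one base per cycle, even within a homopolymer run.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Simulate reversible-terminator cycles on one template strand.

    Each cycle incorporates exactly one blocked, labeled nucleotide
    (the complement of the next template base), records its label,
    and then deblocks so the next cycle can extend by one more base.
    """
    read = []
    for position in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[position]]  # extend by ONE base
        read.append(incorporated)                      # read the label
        # Deblocking happens here; the loop then moves to the next position.
    return "".join(read)


print(sequence_by_synthesis("AACGT", 5))
print(sequence_by_synthesis("AAAA", 2))
```

Because the terminator stops extension after a single base, the homopolymer template `AAAA` still advances one position per cycle, so the run length is counted directly rather than inferred from signal strength.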

Base Calling From Image Raw Data

The identity of each base of a cluster is read off from sequential images.

[Figure: clusters a and b at image positions (xa, ya) and (xb, yb), followed through cycles 1 – 9 to give read a and read b]
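As a rough illustration of reading a base off each sequential image, the sketch below calls the brightest of the four color channels at a cluster's position in every cycle. It is a deliberate simplification and not the Illumina pipeline's method, which also has to correct for effects such as channel crosstalk and phasing. The intensity values are invented.

```python
CHANNELS = "ACGT"  # order of the four intensity values in each tuple


def call_bases(intensities_per_cycle):
    """Return a read by picking the brightest channel in each cycle."""
    read = []
    for cycle_intensities in intensities_per_cycle:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle_intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical (A, C, G, T) intensities for one cluster over 9 cycles.
cluster_a = [
    (0.1, 0.0, 0.1, 0.9),
    (0.0, 0.1, 0.8, 0.1),
    (0.1, 0.9, 0.0, 0.1),
    (0.0, 0.1, 0.1, 0.7),
    (0.9, 0.1, 0.0, 0.0),
    (0.1, 0.8, 0.1, 0.0),
    (0.0, 0.0, 0.9, 0.1),
    (0.8, 0.1, 0.1, 0.0),
    (0.1, 0.0, 0.0, 0.9),
]
read_a = call_bases(cluster_a)
print(read_a)
```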

Solexa Sequencing Applications at UNL

Genomic Resequencing
• Yeast genome (V. Gladyshev; AGP Corn Processing)
• Fungus genome, Aspergillus (S. Harris)

ChIP Sequencing
• Arabidopsis ChIP DNA (Fromm; Cerutti)

mRNA Sequencing - Transcriptome
• Arabidopsis transcriptome (H. Cerutti)
• Human KSHV cell transcriptome (C. Wood)
• Chlorella/virus transcriptome (J. Van Etten)
• Mole rat transcriptome (V. Gladyshev and D. Fomenko)
• Fungus Aspergillus transcriptome (S. Harris)

Paired End Sequencing
• Arabidopsis mitochondrial genome (S. Mackenzie)

Small RNA Sequencing
• Several UNL faculty have expressed strong interest (Y. Bin, H. Cerutti, J. Mower, J. Alfano)

Genomic Resequencing

Data from resequencing of 19 yeast genomes

Chromatin Immunoprecipitation Sequencing (ChIP-seq)

Genome-wide analysis
• Gene regulation and control
• Epigenetic modifications
• DNA-protein interactions

[Figure: ChIP-seq workflow: nucleus, crosslink, IP, sequencing, map binding sites]

Transcriptome Analysis – mRNA-Seq

• Relative expression of transcripts
• Analysis of splice variants/coding SNPs
• Analysis of non-coding RNAs
• Transcript discovery
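Relative expression from mRNA-Seq is usually reported after normalizing read counts for transcript length and sequencing depth. One widely used measure is RPKM (reads per kilobase of transcript per million mapped reads). The slides do not say which measure the facility uses, so the sketch below is an assumption offered for illustration, with invented counts.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript of 2,000 bp sequenced at two depths.
shallow = rpkm(read_count=500, transcript_length_bp=2000, total_mapped_reads=10_000_000)
deep = rpkm(read_count=1000, transcript_length_bp=2000, total_mapped_reads=20_000_000)
print(shallow, deep)
```

Both calls give the same value because doubling the sequencing depth doubles the read count for an unchanged expression level, which is the point of the normalization.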

Paired End and Mate Pair Sequencing

Provides long-range information
– Repeat sequences
– Characterize copy number variants & rearrangements
– De novo assembly

Increases output per flow cell

Workflow of the Service

1. Genomic DNA, total RNA, ChIP DNA (experimental design, QC)

2. DNA shotgun library preparation (SR, PE, cDNA)

3. Cluster generation (35 PCR amplification cycles)

4. Sequencing of clusters on GAII (1 TB machine: sequencing, imaging, image processing, base calling)

5. Data analysis on a remote server at the Bioinformatics Core Facility (8 TB machine: base calling, read alignment using Illumina pipeline software)

Cost of Gene Expression Profiling

Microarray

• $500 - $650/array/sample

• $3000 - $4000 for an experiment with 2 treatments and 3 replicates (6 samples, 6 arrays)

Illumina

• $1300/lane/sample (400 Mb sequence), $2100/2 lanes/sample (800 Mb)

• $4200 for an experiment with 2 treatments, 2 samples, and 4 flow cell lanes (without replicates)
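The experiment totals can be reproduced from the per-unit prices quoted on this slide. The sketch below is only that arithmetic; the function names are hypothetical, and the prices are the 2009 figures given above.

```python
# Per-sample prices quoted on the slide (US dollars).
ARRAY_PRICE_RANGE = (500, 650)
ILLUMINA_PRICE_BY_LANES = {1: 1300, 2: 2100}  # lanes per sample -> price per sample


def microarray_cost(samples, price_per_array):
    """One array per sample."""
    return samples * price_per_array


def illumina_cost(samples, lanes_per_sample):
    """Per-sample price depends on how many lanes the sample occupies."""
    return samples * ILLUMINA_PRICE_BY_LANES[lanes_per_sample]


# 2 treatments x 3 replicates = 6 samples on 6 arrays.
array_low = microarray_cost(6, ARRAY_PRICE_RANGE[0])
array_high = microarray_cost(6, ARRAY_PRICE_RANGE[1])

# 2 treatments, 1 sample each, 2 lanes per sample = 4 lanes, no replicates.
illumina_total = illumina_cost(2, 2)

print(array_low, array_high, illumina_total)
```

The microarray range works out to $3000 - $3900, which the slide rounds to $4000, and the Illumina design comes to $4200. The sequencing experiment therefore costs slightly more while including no replicates.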

Challenges

More than 40 billion nucleotides of sequence have been generated, and this will double soon. Solutions are needed for:

- Sample preparation (e.g. small RNA libraries, ChIP pulldown)

- Further extracting sequencing data

- Biological annotations

- Data storage and management

- Drafting publications

Budget:

- Maintaining both the Affymetrix system and the Illumina Solexa system is expensive.

- Cost of upgrading the systems.

ACKNOWLEDGMENTS

Dr. Mike Fromm

Drs. Jean-Jack Riethoven

Ms. Mei Chen