sage- serial analysis of gene expression

Serial Analysis of Gene Expression (SAGE) Technology By: Dr. Ashish C Patel Assistant Professor Vet College, AAU, Anand

Upload: aashish-patel

Post on 13-Apr-2017


Category:

Science



TRANSCRIPT

Page 1: SAGE- Serial Analysis of Gene Expression

Serial Analysis of Gene Expression (SAGE) Technology

By: Dr. Ashish C Patel, Assistant Professor, Vet College, AAU, Anand

Page 2: SAGE- Serial Analysis of Gene Expression

Serial Analysis of Gene Expression

It is believed that the majority of biological phenomena found in a variety of organisms can be explained by the quantity of gene products.

Cellular functions under given conditions at a given time can be understood by measuring which genes' mRNAs are present and how many copies of each exist at that point in time.

Each cell contains mRNAs of more than 10,000 different genes; the copy number per gene ranges from one to more than 10,000, for a total of up to half a million mRNA transcript copies. It is therefore practically impossible to determine them all one by one.

Page 3: SAGE- Serial Analysis of Gene Expression

Large-scale random cDNA sequencing by EST projects was very useful for the identification of unknown genes expressed in given cells or tissues (Adams et al., 1991).

[Figure: EST workflow. Labels: mRNA species 1 … n; cDNA; RE; plasmid insertion; cDNA clones; assemble EST1…n; assemble EST1…n of all seq. projects. Annotation: "Hence, sequencing = n x n times" (all steps).]

Page 4: SAGE- Serial Analysis of Gene Expression

• However, this approach was not designed to quantify expressed genes.

• The body mapping project (Okubo et al., 1992) attempted to construct gene expression profiles of a number of cells and tissues by random sequencing of a 3’-directed cDNA library.

• Fragments of about 300 bp from these 3’-regions were called gene signatures, and each represented a particular mRNA species.

• By sequencing 1000 or more cDNA clones, they could make a rough pattern of gene expression and identify mRNAs of the highly abundant class.

• However, both the EST and body mapping projects share an expected weakness: one sequencing reaction yields only one cDNA sequence.

• Mainly because of this low throughput, the profiles obtained by the body mapping project unavoidably fell far short of what was expected and demanded.

Page 5: SAGE- Serial Analysis of Gene Expression

• The more recent hybridization-based methods (DNA microarrays), which use immobilized cDNAs or oligonucleotides, can potentially examine the expression patterns of a relatively large number of genes, but they can only examine expressed sequences that have already been identified.

• In contrast, the SAGE method allows for a quantitative and simultaneous analysis of a large number of transcripts in any particular cells or tissues, without prior knowledge of the genes.

• Like the body mapping procedure, this method takes advantage of the 3’-portion of mRNA as the gene tag, but in a much shorter form (9–10 bp). These tags can be serially connected before cloning into a plasmid vector.

• Since the resulting plasmid clones contain multiple tags, sequences of several dozens of mRNAs can be obtained by a single sequencing reaction.

Page 6: SAGE- Serial Analysis of Gene Expression

• The rapid, cost-saving sequencing made possible by this original device allows the quantification and identification of a large number of cellular transcripts.

Page 7: SAGE- Serial Analysis of Gene Expression

• SAGE is based mainly on two principles: representation of mRNAs (cDNAs) by short sequence tags, and concatenation of these tags for cloning to allow efficient sequencing analysis.

• A hypothetical eukaryotic cell that contains seven mRNA molecules of four species is depicted.

• To determine the gene expression profile of this cell, one would have to conduct several cDNA sequencing reactions.

• However, if each mRNA species can be represented by a short unique sequence stretch (such as a 9 bp tag), the purpose would be attained by sequencing the tags, because a sequence stretch as short as 9 bp can distinguish 4^9 (262,144) transcripts, provided the nucleotide distribution throughout the genome is random.

• If these tags could be connected into one long DNA molecule, the sequencing reaction would be needed only once.
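The tag-capacity arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the function name `distinct_tags` is made up for this example, and it assumes, as the slide does, that all four bases are equally likely at every position.

```python
def distinct_tags(tag_length: int, alphabet_size: int = 4) -> int:
    """Number of different sequences of a given length over A, C, G, T."""
    return alphabet_size ** tag_length


# Compare the 9-10 bp tags of the original method with the longer
# 14 bp and 21 bp tag formats mentioned later in the slides.
for n in (9, 10, 14, 21):
    print(f"{n} bp tag: {distinct_tags(n):,} possible sequences")
```

Real transcripts are not random sequences, so the number of transcripts a tag length can actually separate is lower than this upper bound.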

Principle of SAGE

Page 8: SAGE- Serial Analysis of Gene Expression

Figure: The principle of SAGE. A hypothetical eukaryotic cell that contains seven mRNA molecules of four species is shown as a model. The boxed tags are specific to each mRNA species.

Page 9: SAGE- Serial Analysis of Gene Expression

SAGE Scheme

The SAGE method allows for a quantitative and simultaneous analysis of a large number of transcripts in any particular cells or tissues.

[Figure: mRNA species 1, 2 and 3, each with a poly(A) tail and a 9–10 bp tag. The tags are extracted, concatenated in a plasmid, and cloned.]

Page 10: SAGE- Serial Analysis of Gene Expression

SAGE Scheme

[Figure: The inserted sequence is isolated from the plasmid and sequenced, giving the mRNA tags of species 1, 2 and 3 (TAGCGG.., ATGCGGC.., TATTTTAGC…). Each tag is searched against the human genome with the BLAST service and matched to Annotated Gene 1, Annotated Gene 12 and Annotated Gene 34.]

Result: genes 1, 12 and 34 are expressed at a certain time, say during mitosis.
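The tag-to-gene step on this slide can be sketched as a simple table lookup. The sketch below is not the BLAST service itself: the lookup table `TAG_TO_GENE`, the 10 bp tag sequences and the function `annotate` are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical tag-to-gene table; a real study would build this from a
# reference database rather than hard-code it.
TAG_TO_GENE = {
    "AAACCCGGGT": "gene 1",
    "TTTGGGCCCA": "gene 12",
    "GATTACAGAT": "gene 34",
}


def annotate(tags, tag_to_gene):
    """Count observed tags per gene; unmatched tags are kept as 'unknown:<tag>'."""
    counts = Counter()
    for tag in tags:
        counts[tag_to_gene.get(tag, f"unknown:{tag}")] += 1
    return counts


# Made-up tags read from one sequenced insert.
observed = ["AAACCCGGGT", "AAACCCGGGT", "TTTGGGCCCA", "CCCCCCCCCC"]
for gene, n in annotate(observed, TAG_TO_GENE).items():
    print(gene, n)
```

Keeping unmatched tags instead of dropping them matters in practice, because tags with no database match may point to genes that have not yet been described.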

Page 11: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: An oligo(dT) primer (TTTTT) anneals to the poly(A) tail (AAAAA) of the mRNA, giving an mRNA-cDNA hybrid; the RNA is removed by RNase H, followed by ds cDNA synthesis.]

Double-stranded cDNA is synthesized from mRNA using a biotinylated oligo(dT) primer, because it primes with high efficiency at the 3’ poly(A) region present in most eukaryotic mRNAs.

Page 12: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: The ds cDNA is cut, leaving a 5’ GTAC cohesive end as drawn; the 3’ fragments are bound to streptavidin beads and divided in half.]

The cDNA is then cleaved with a restriction enzyme (called the anchoring enzyme; here NlaIII).

The cDNA, now with a cohesive end at its 5’ terminus, is immobilized by binding to streptavidin-coated beads.
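To make the role of the anchoring enzyme concrete, here is a minimal sketch of where a tag comes from in a cDNA sequence. NlaIII recognizes CATG; because the bead-bound fragment is the 3’ end, the tag is taken next to the CATG site closest to the poly(A) tail. The function `sage_tag`, the 10 bp default and the example sequence are assumptions made for illustration only.

```python
ANCHOR_SITE = "CATG"  # NlaIII recognition sequence


def sage_tag(cdna: str, tag_length: int = 10):
    """Return the bases that follow the 3'-most CATG of a cDNA (sense strand, 5'->3').

    Returns None when there is no CATG site or fewer than tag_length
    bases remain after it.
    """
    cdna = cdna.upper()
    pos = cdna.rfind(ANCHOR_SITE)  # last, i.e. 3'-most, occurrence
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Made-up cDNA with two CATG sites followed by a poly(A) stretch.
example_cdna = "GGCATG" + "AAACCCGGGTTT" + "CATG" + "ACGTACGTAGCTAG" + "AAAAAAAA"
print(sage_tag(example_cdna))
```

Only the site nearest the 3’ end yields a tag, which is why each transcript is expected to contribute one tag position.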

Page 13: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: Linkers A and B, each carrying a recognition site for the tagging enzyme (TE; BsmFI or FokI), are joined to the CATG/GTAC ends of the cDNA. Cleavage with the tagging enzyme (e.g. BsmFI) releases the linker together with a short stretch of cDNA (shown as N…) with an overhanging end, which T4 DNA polymerase converts to a blunt end.]

Two independent linkers are ligated, via the NlaIII cohesive termini, to each half.

Page 14: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: The blunt-ended linker-tag molecules from the two pools are ligated in a tail-to-tail orientation, and the resulting linker A-ditag-linker B molecules are amplified with primers A and B.]

The two portions are mixed again and ligated. Because the 5’ ends of the linkers are blocked by an amino group, only the mRNA-derived termini can be ligated, in a tail-to-tail orientation.

Page 15: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: After 1 round of amplification, the linker-flanked ditags are cut at the anchoring enzyme (AE) sites, releasing ditags with CATG/GTAC ends; the ditags are then isolated.]

The amplified product is cleaved by NlaIII, the anchoring enzyme.

Ditag fragments flanked at both ends by NlaIII cohesive termini are isolated and ligated to obtain concatemers.

Page 16: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: Isolated ditags with CATG/GTAC ends are concatenated; the concatemer, with a CATG site between successive ditags, is inserted into a plasmid and cloned.]

Any number (n) of species can be concatenated.

1 mRNA species gives 2 ds cDNA joined by palindromic sequences.
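Reading tags back out of a sequenced concatemer can be sketched as follows. This is a simplified illustration, not the published software: it assumes 10 bp tags, assumes no ditag contains an internal CATG, and uses made-up sequences. Because the two tags of a ditag are joined tail to tail, the second tag is read as the reverse complement of the ditag's last bases. Identical ditags are kept only once, on the usual reasoning that repeats of a whole ditag are more likely PCR copies than independent transcripts.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tags_from_concatemer(concatemer: str, tag_length: int = 10, anchor: str = "CATG"):
    """Split a concatemer at the anchoring-enzyme site and return the tags.

    Ditags shorter than two tags are skipped and duplicate ditags are
    collapsed before the two tags are read from each ditag.
    """
    pieces = concatemer.upper().split(anchor)
    ditags = [p for p in pieces if len(p) >= 2 * tag_length]
    unique_ditags = list(dict.fromkeys(ditags))  # keeps first-seen order
    tags = []
    for ditag in unique_ditags:
        tags.append(ditag[:tag_length])                        # left tag
        tags.append(reverse_complement(ditag[-tag_length:]))   # right tag
    return tags


# Made-up example: two different ditags, the first one present twice.
ditag_1 = "AAACCCGGGT" + reverse_complement("TTTGGGCCCA")
ditag_2 = "GATTACAGAT" + reverse_complement("ACGTACGTAG")
concatemer = "CATG" + ditag_1 + "CATG" + ditag_2 + "CATG" + ditag_1 + "CATG"
tags = tags_from_concatemer(concatemer)
print(tags)
```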

Page 17: SAGE- Serial Analysis of Gene Expression

SAGE procedure

[Figure: A concatemer cloned in a plasmid. Ditags are separated by CATG/GTAC sites, and the successive tags correspond to mRNA species no. 1, no. 2, no. 3, … no. n.]

Page 18: SAGE- Serial Analysis of Gene Expression

• As a tool for the study of gene expression, SAGE has been used to analyze a variety of biological phenomena. The total number of tags analyzed by this method was close to five million up to the year 2000.

• Table 1 shows the highly diverse types of cells and tissues studied under a variety of physiological and pathological conditions. The total number of tags collected varied between studies.

Page 19: SAGE- Serial Analysis of Gene Expression
Page 20: SAGE- Serial Analysis of Gene Expression

Cancer studies (Lal et al., 1999)

• By comparing the gene expression profiles derived from cancer and normal tissue of interest, a large number of genes were identified as tumor specific.

• Northern blot hybridization against a number of independently isolated tissue samples of similar nature was usually performed to confirm the differential expression of these genes.

• About half of the overrepresented genes identified by SAGE were reproducibly present in these samples, while the behavior of the other half was quite different. This may reflect the heterogeneity among tumors from different individuals.

Page 21: SAGE- Serial Analysis of Gene Expression

Immunological studies

• A few SAGE analyses have been directly applied to the study of immunological phenomena.

• Chen et al. (1998) reported the changes in gene expression in rat mast cells before and after they were stimulated through high-affinity receptors for immunoglobulin E.

• Genes that had not previously been associated with mast cells included macrophage migration inhibitory factor and the receptors for growth hormone-releasing factor and melatonin.

• Many other differentially expressed genes were related to cell structure and cell motility, and numerous unknown genes showed no database match.

Page 22: SAGE- Serial Analysis of Gene Expression

Yeast

• Yeast is widely used to clarify the biochemical and physiologic parameters underlying eukaryotic cellular functions.

• The entire genome sequence has been determined (Goffeau, 1997) and the number of genes has been estimated to be about 6300.

• The total number of mRNA molecules has also been estimated to be 15 000 per cell (Hereford and Rosbash, 1977).

• So, yeast was chosen as a model organism to evaluate the power of the SAGE technology.

Page 23: SAGE- Serial Analysis of Gene Expression

Drawbacks, problems and technical modifications

• Technical problems include the need for a relatively large amount of mRNA and the relative difficulty of constructing tag libraries, among others.

• MicroSAGE (Datson et al., 1999) requires 500–5000-fold less starting input RNA, and is simplified by the incorporation of a ‘one-tube’ procedure for all steps from RNA isolation to tag release.

• SAGE-lite, another similarly devised protocol, also allows the global analysis of transcription from less than 100 ng of total starting RNA (Peters et al., 1999).

Technical difficulty of the procedure:

• In the original SAGE protocol, the major products of PCR are often linker-dimers. To minimize contaminating linker molecules, biotinylated PCR primers were introduced; these generate biotinylated ditag products, thus allowing removal of the unwanted linkers by binding to streptavidin beads at a later stage.

Page 24: SAGE- Serial Analysis of Gene Expression

• Simply introducing a heating step at the final ligation step yields cloned concatemers with an average of 67 tags, compared with 22 tags obtained by the original protocol.

• A major problem of the SAGE approach is how to further analyze the unknown tags.

• A conventional oligonucleotide-based plaque lift method was employed successfully for the isolation and cloning of a number of genes.

• However, with oligonucleotides of only 13–14 bp in length it is almost impossible to discriminate a one-base mismatch by temperature-regulated DNA–DNA hybridization, which results in numerous false positives.

• An RT-PCR-based method was developed to analyze the corresponding genes; this approach uses the identified tag sequences and oligo-dT as PCR primers.

Page 25: SAGE- Serial Analysis of Gene Expression

• Matsumura et al. (1999) reported a procedure to recover a longer cDNA fragment by PCR using the SAGE tag sequence as a primer, thereby facilitating the analysis of unknown genes identified by tag sequence in SAGE.

• Sequencing error: The sequencing error rate affects a SAGE experiment; results can be improved by using phred scores and discarding ambiguous sequences.

• Short SAGE tags comprise 14 bp and long SAGE tags comprise 21 bp.

• About 12% of C. elegans tags are not unambiguously identified using 14 bp tags (McKay et al., 2003). Empirical data suggest that Long SAGE gives far greater resolution, but at an increased cost.
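The effect of tag length on ambiguity can be illustrated with a toy calculation. The sketch below assumes that the 14 bp and 21 bp tag lengths include the 4 bp CATG site, uses three made-up transcripts (`TOY_TRANSCRIPTS`), and defines a hypothetical helper `ambiguous_fraction`; it is not the analysis performed in the cited study.

```python
from collections import defaultdict


def ambiguous_fraction(transcripts, tag_length, anchor="CATG"):
    """Fraction of distinct tags that map to more than one transcript.

    The tag is taken from the 3'-most anchor site and, by assumption,
    includes the anchor sequence itself.
    """
    genes_by_tag = defaultdict(set)
    for name, seq in transcripts.items():
        pos = seq.rfind(anchor)
        if pos == -1 or len(seq) < pos + tag_length:
            continue  # no site, or transcript too short for this tag length
        genes_by_tag[seq[pos:pos + tag_length]].add(name)
    if not genes_by_tag:
        return 0.0
    shared = sum(1 for genes in genes_by_tag.values() if len(genes) > 1)
    return shared / len(genes_by_tag)


# Made-up transcripts: a and b share the first 10 bases after CATG
# and differ only further downstream.
TOY_TRANSCRIPTS = {
    "gene_a": "GG" + "CATG" + "AAACCCGGGT" + "ACGTTTA" + "AAAA",
    "gene_b": "TT" + "CATG" + "AAACCCGGGT" + "TTGCAAC" + "AAAA",
    "gene_c": "CC" + "CATG" + "GATTACAGAT" + "CCGGAAT" + "AAAA",
}

for length in (14, 21):
    print(length, ambiguous_fraction(TOY_TRANSCRIPTS, length))
```

In this toy set the short tag cannot separate gene_a from gene_b, while the long tag can, which mirrors the resolution gain described on the slide.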

Page 26: SAGE- Serial Analysis of Gene Expression

SAGE Data Analysis Strategies

• The sequence files generated by the automated sequencer are analyzed using the SAGE2000 software (www.sagenet.org).

• The three steps involved in obtaining a differential gene expression list are as follows:

(1) Interpret the SAGE tags from the sequence data files by using the SAGE2000 software for extracting ditags and checking for duplicate ditags;

(2) Download a reference sequence database from the NCBI Web site (SAGEmap, www.ncbi.nlm.nih.gov); and

(3) Associate the tags with the expressed gene database.

The relative transcript abundance can then be calculated by dividing the count of each unique tag by the total number of tags sequenced, and the fold change can be determined from the ratio of tag counts between libraries.
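The two calculations named above can be written out as a short sketch. The function names and the example counts are invented for illustration, and the `pseudocount` parameter is an added assumption (not from the slides) that avoids division by zero when a tag is absent from one library.

```python
def relative_abundance(tag_counts):
    """Divide each tag's count by the total number of tags in the library."""
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}


def fold_change(count_a, total_a, count_b, total_b, pseudocount=0.5):
    """Ratio of normalized tag frequencies, library B over library A.

    pseudocount is an assumption of this sketch; set it to 0 to get the
    plain ratio when both counts are non-zero.
    """
    freq_a = (count_a + pseudocount) / total_a
    freq_b = (count_b + pseudocount) / total_b
    return freq_b / freq_a


# Made-up tag counts for one small library.
library = {"AAACCCGGGT": 3, "GATTACAGAT": 1}
print(relative_abundance(library))
print(fold_change(10, 1000, 50, 1000, pseudocount=0))
```

Normalizing by library size before taking the ratio matters whenever the two libraries were sequenced to different depths.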

Page 27: SAGE- Serial Analysis of Gene Expression

• The initial analysis is usually limited to tags with a predefined ratio of greater than 5-fold and P ≤ 0.05.

• The rates of false positives associated with different probability values have been computed by a Monte Carlo test to validate confidence intervals.

• Depending on the preliminary results, the SAGE data can be reanalyzed by varying the P values and the fold-change thresholds.
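A Monte Carlo test combined with the fold-change and P thresholds might look like the sketch below. It is not the procedure built into the SAGE2000 software: the null model (each copy of a tag lands in library A with probability proportional to library size), the function names, and the handling of zero counts are all assumptions made for illustration.

```python
import random


def monte_carlo_p(count_a, total_a, count_b, total_b, n_sim=10000, seed=0):
    """Estimate how often chance alone gives a frequency difference at
    least as large as the observed one, given the combined tag count."""
    rng = random.Random(seed)
    k = count_a + count_b
    share_a = total_a / (total_a + total_b)
    observed = abs(count_a / total_a - count_b / total_b)
    hits = 0
    for _ in range(n_sim):
        sim_a = sum(rng.random() < share_a for _ in range(k))
        sim_b = k - sim_a
        if abs(sim_a / total_a - sim_b / total_b) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one keeps the estimate above zero


def is_candidate(count_a, total_a, count_b, total_b, min_fold=5.0, max_p=0.05):
    """Apply the 'greater than 5-fold and P <= 0.05' filter to one tag."""
    freq_a = max(count_a, 1) / total_a  # assumption: treat a zero count as 1
    freq_b = max(count_b, 1) / total_b
    fold = max(freq_a / freq_b, freq_b / freq_a)
    if fold <= min_fold:
        return False
    return monte_carlo_p(count_a, total_a, count_b, total_b) <= max_p


# Made-up counts: 2 tags in 10,000 versus 40 tags in 10,000.
print(is_candidate(2, 10000, 40, 10000))
```

Exposing `min_fold` and `max_p` as parameters reflects the point above that the data can be reanalyzed with different thresholds.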

Page 28: SAGE- Serial Analysis of Gene Expression

SAGEmap

Page 29: SAGE- Serial Analysis of Gene Expression

http://www.sagenet.org/

Page 30: SAGE- Serial Analysis of Gene Expression

Sage resources

Page 31: SAGE- Serial Analysis of Gene Expression

Sage data

Page 32: SAGE- Serial Analysis of Gene Expression

SAGE APPLICATION

• SAGE is useful in comparative expression studies to identify differences in gene expression between two or more cellular sources of RNA.

• Gene discovery

• Determining changes in gene expression as a consequence of an experimental treatment (e.g. carcinogen, hormone)

• Provides quantitative data on both known and unknown genes

• Analyzes all transcripts (transcriptome) without prior selection of known genes

• Analysis of cardiovascular gene expression

• Gene expression in carcinogenesis

• Substance abuse studies

• Cell, tissue and developmental stage profiling

• Profiling of human diseases

Page 33: SAGE- Serial Analysis of Gene Expression

SAGE – Advantages & Disadvantages

Advantages

• No hybridizing, so no cross-hybridizing can occur.

• Can help identify new genes by using the tag as a PCR primer.

Disadvantages

• Cost and time required to perform so many PCR and sequencing reactions.

• Type IIS restriction enzyme can yield fragments of the wrong length depending on temperature.

• Multiple genes could have the same tag.

• As with microarrays, mRNA levels may not represent protein levels in a cell.

Page 34: SAGE- Serial Analysis of Gene Expression

Microarray Vs. SAGE