1 mbg-487 microarrays - i 2 human genome project

Post on 21-Jan-2016

TRANSCRIPT

1

MBG-487

Microarrays - I

2

HUMAN GENOME PROJECT

3

Knowledge about the effects of DNA variations among

individuals can lead to revolutionary new ways to

diagnose, treat, and someday prevent the thousands

of disorders that affect us. Besides providing clues to

understanding human biology, learning about

nonhuman organisms' DNA sequences can lead to an

understanding of their natural capabilities that can be

applied toward solving challenges in health care,

agriculture, energy production, environmental

remediation, and carbon sequestration.

What are some practical benefits to learning about DNA?

4

Genome

• The complete complement of an organism’s genes; an organism’s genetic material.

5

•identify all the approximately 20,000-25,000 genes in

human DNA,

•determine the sequences of the 3 billion chemical base

pairs that make up human DNA

•store this information in databases,

•improve tools for data analysis,

•transfer related technologies to the private sector, and

•address the ethical, legal, and social issues (ELSI) that

may arise from the project.

GOALS OF HUMAN GENOME PROJECT

6

Drosophila melanogaster

Caenorhabditis elegans

Arabidopsis thaliana

Saccharomyces cerevisiae

E. coli

Mus musculus

Bacteriophage

Fugu rubripes

Homo sapiens

7

Genome sequencing helps in:

• identifying new genes (“gene discovery”)

• looking at chromosome organization and structure

• finding gene regulatory sequences

• comparative genomics

These in turn lead to advances in:

• medicine

• agriculture

• biotechnology

• understanding evolution and other basic science questions

8

Some current and potential applications of genome

research include:

• Molecular medicine

• Energy sources and environmental applications

• Risk assessment

• Bioarchaeology, anthropology, evolution, and human migration

• DNA forensics (identification)

• Agriculture, livestock breeding, and bioprocessing

9

Molecular Medicine

• Improved diagnosis of disease

• Earlier detection of genetic predispositions to disease

• Rational drug design

• Gene therapy and control systems for drugs

• Pharmacogenomics "custom drugs"

10

Bioarchaeology, Anthropology, Evolution, and

Human Migration

•Study evolution through germline mutations in lineages

•Study migration of different population groups based

on female genetic inheritance

•Study mutations on the Y chromosome to trace

lineage and migration of males

•Compare breakpoints in the evolution of mutations

with ages of populations and historical events

11

• Understanding genomics will help us understand human

evolution and the common biology we share with all of

life.

• Comparative genomics between humans and other

organisms such as mice has already led to the identification

of similar genes associated with diseases and traits.

• Further comparative studies will help determine the yet-

unknown function of thousands of other genes.

12

Genes (i.e., protein coding)

But. . . only <2% of the human genome encodes proteins

Other than protein coding genes, what is there?

• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)

• structural sequences (scaffold attachment regions)

• regulatory sequences

• “junk” (including transposons, retroviral insertions, etc.)

It’s still uncertain/controversial how much of the genome is composed of any of these classes

The answers will come from experimentation and bioinformatics.

What’s in a genome?

13

Why sequence is not enough

• Identifying genes and control regions is not enough to decipher the inner workings of the cell:

• We need to determine the function of genes.

• We would like to determine which genes are activated in

which cells and under which conditions.

• We would like to know the relationships between genes

(protein-DNA, protein-protein interactions etc.).

• We would like to model the various dynamic systems in

the cell.

14

• transcription

• post transcription (RNA stability)

• post transcription (translational control)

• post translation (not considered gene regulation)

usually, when we speak of gene regulation, we are referring to transcriptional regulation

the “transcriptome”

Genes can be regulated at many levels

DNA → (TRANSCRIPTION) → RNA → (TRANSLATION) → PROTEIN

The “Central Dogma”

15

• high throughput assays

• robotics

• high speed computing

• statistics

• bioinformatics

Because of the vast amounts of data that are generated, we need new approaches

16

High-throughput Technologies and ‘OMICS’ Science

17

Terms and Definitions

Genomics

Analysis of an organism's genome – identification of single genes and their function

Functional genomics

Global and dynamic survey of gene expression; detection of functional relationship

Proteomics

Analysis of protein-sequences, expression-patterns and protein-interactions of a given organism

Bioinformatics

Computer-aided processing of biological data: detection of complex interrelations; interpretation and conclusion; structuring, saving, and searching

18

Functional genomics

The ability to analyze genome-wide patterns of gene expression and the mechanisms by which gene expression is coordinated.

19

Functional Genomics

20

Functional Genomics

21

High-throughput analysis

22

Idea: finding which genes are expressed by measuring the mRNA amount in the cell (or other materials).

Finding gene expression

23

Microarrays can show us when

and where genes are expressed.

But what regulates this

expression?

24

One way of looking at the transcriptome is with DNA microarrays. With microarrays, the expression of thousands of genes can be assessed in a single experiment.

cDNAs or oligonucleotides representing all genes in the genome are deposited on a glass slide using a robotic arrayer:

Looking at the transcriptome: DNA

microarrays

Benfey, P. and Protopapas, A. Genomics. 2005. New Jersey: Pearson Prentice Hall. pp. 131-2

25

26

Why use microarrays?

•Each cell type expresses ~10,000-20,000 genes

•Physiological and pathophysiological responses are

linked to changes in gene expression

•Knowledge of gene expression variation at different

states may create new hypotheses about gene

function and underlying mechanisms

27

Microarray Technology

• Microarray:

– New technology (first paper: 1995)

– Allows study of thousands of genes at the same time

– Glass slide of DNA molecules

• Molecule: string of bases (25 bp – 500 bp)

• Uniquely identifies the gene or unit to be studied

http://kbrin.a-bldg.louisville.edu/CECS694/

28

Fabrication of Microarrays

• Size of a microscope slide

Images: http://www.affymetrix.com/

29

Differing Conditions

• Ultimate Goal:

– Understand expression level of genes under different conditions

• Helps to:

– Determine genes involved in a disease

– Determine pathways to a disease

– Serve as a screening tool

30

Gene Conditions

• Cell types (brain vs. liver)

• Developmental (fetal vs. adult)

• Response to stimulus

• Gene activity (wild vs. mutant)

• Disease states (healthy vs. diseased)

31

Expressed Genes

• Genes under a given condition

– mRNA is extracted from cells

– mRNA is labeled

– The labeled mRNA represents the mRNA present in a given condition

– Labeled mRNA will hybridize (base pair) with the corresponding sequence on the slide
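Hybridization by base pairing can be pictured as a complementarity check between a probe on the slide and a labeled target. The snippet below is a toy sketch only: it assumes both sequences are written as DNA letters, and it demands an exact reverse-complement match, which real hybridization does not require. The function names `reverse_complement` and `hybridizes` are illustrative and do not come from the slides.

```python
# Toy model of probe/target base pairing (A-T, G-C).
# Assumption: sequences are upper-case DNA strings; exact matching only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """True if the target contains a stretch fully complementary to the probe."""
    return reverse_complement(probe) in target


if __name__ == "__main__":
    print(hybridizes("ATGC", "TTGCATAA"))
    print(hybridizes("ATGC", "TTTTTTTT"))
```

In this simplified picture a spot lights up only when the sample contains a sequence complementary to the probe printed at that spot.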

32

33

34

35

36

37

38

39

40

DNA microarray Probes

Production of high-density DNA microarrays is complex and requires:

- sequence information of the organism

- gene transcript analysis

- gene clustering and annotation

- probe design

cDNA: reverse-transcribed from the cellular mRNA population. cDNA libraries (~10^5 clones) represent a snapshot of cellular gene expression.

PCR samples for probe generation (300-800 nt): amplified DNA needs purification from enzymes and nucleotides; such contaminants can interfere with the microarray analysis.

Oligonucleotides: 50-70 nt, or multiple 25mers; less time and effort; precision.

Surface chemistry: to facilitate the attachment of probes to the slide

41

Chip design and content

Standard size: 1" x 3" (2.54 x 7.62 cm) glass slide

DNA fragments (corresponding to a particular gene) are spotted onto the array’s surface along a defined grid

Spot size: ~100 μm / >20,000 individual samples

Microarray platforms: full genome chips

Affymetrix: Gene Chips: A, B, C sets:

– in situ 25mer oligos, 16 probes/gene

– one color, biotinylated targets, post labeling with SA-PE

– closed system

Agilent: 22k, 44k

– 60mer oligos, 1 probe/gene

– two color labeling (Cy dyes)

– open source

42

MIAME - Minimum Information About a Microarray Experiment

• to enable the interpretation of the results

• to potentially reproduce the experiment and verify the conclusions

• to make microarray data available to the scientific community

MIAME principles

Experiment Design

– The goal of the experiment

– Keywords, e.g. time course, cell type comparison

– Experimental factors: parameters or conditions tested

Samples used, extract preparation and labeling

– The origin of each biological sample

– Manipulation of samples and protocols used

Hybridization procedures and parameters

Measurement data and specifications

Data extraction and processing protocols

– Image scanning hardware and software

– Processing procedures and parameters

– Normalization, transformation and data selection

Array Design:

– General array design, including the platform type

– Array feature and annotation

43

Microarray Flow

44

Sample Preparation

45

Two major technologies

• cDNA arrays

- probes are placed on the slides

- allows comparison of different cell types

• Oligonucleotide arrays

- partial sequences are printed on the array

- measure values in one tissue type

46

Two Different Types of Microarrays

• Custom spotted arrays (up to 20,000 sequences)

– cDNA

– Oligonucleotide

• High-density (up to 100,000 sequences) synthetic oligonucleotide arrays

– Affymetrix (25 bases)

47

Custom Arrays

• Mostly cDNA arrays

• 2-dye (2-channel)

– RNA from two sources (cDNA created)

• Source 1: labeled with red dye

• Source 2: labeled with green dye

48

Two Channel Microarrays

• Microarrays measure gene expression

• Two different samples:

– Control (green label)

– Sample (red label)

• Both are washed over the microarray

– Hybridization occurs

– Each spot is one of 4 colors

49

50

cDNA microarray experiments

mRNA levels compared in many different contexts

• Different tissues, same organism (brain v. liver)

• Same tissue, same organism (treatment v. control, tumor v. non-tumor)

• Same tissue, different organisms (wild type v. knockout, transgenic, or mutant)

• Time course experiments (effect of treatment, development)

• Other special designs (e.g. to detect spatial patterns).

51

cDNA Microarray

• Measure the relative levels of expression

• Parallel analysis

• Competitive hybridization

• Need cDNA library

mRNA cDNA

Reverse Transcription

52

PCR Amplification

Printing

Hybridization

Laser Scan

Labeling

Samples

Reverse Transcription

Expression Data

53

54

Exponential Amplification of a Gene

Return

55

Labeling and Hybridization of

Sample cDNAs

Return

56

57

cDNA microarrays

Compare the genetic expression in two samples of cells

PRINT

cDNA from one gene on each spot

SAMPLES

cDNA labelled red/green

e.g. treatment / control

normal / tumor tissue

58

HYBRIDIZE

Add equal amounts of labelled cDNA samples to microarray.

SCAN

Laser Detector

59

Looking at the transcriptome: DNA

microarrays

extract mRNA

make labeled cDNA

hybridize to microarray

cell type A

cell type B

more in “A ”

more in “B”

equal in A & B

60

61

Microarrays provide a means to measure gene expression

62

63

64

(Slide source: http://www.bsi.vt.edu/)

65

Microarray Image Analysis

• Microarrays detect gene interactions: 4 colors:

– Green: high control

– Red: high sample

– Yellow: equal

– Black: none

• Problem is to quantify image signals
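The four-color reading of a spot can be turned into a simple rule on the two channel intensities. The following is a minimal sketch, assuming background-corrected red (sample) and green (control) intensities; the detection floor of 100 and the two-fold cutoff are illustrative assumptions, not values given in the slides, and `spot_color` is a hypothetical helper name.

```python
def spot_color(red, green, floor=100.0, fold=2.0):
    """Classify a two-channel spot as red, green, yellow, or black.

    red   -- sample channel intensity (background-corrected)
    green -- control channel intensity (background-corrected)
    floor -- assumed detection limit; below it a channel counts as absent
    fold  -- assumed ratio needed to call one channel dominant
    """
    if red < floor and green < floor:
        return "black"    # no signal in either channel
    if red >= fold * green:
        return "red"      # higher in sample
    if green >= fold * red:
        return "green"    # higher in control
    return "yellow"       # roughly equal in both


if __name__ == "__main__":
    for r, g in [(5000, 400), (400, 5000), (3000, 2800), (20, 30)]:
        print(r, g, spot_color(r, g))
```

A hard threshold like this only mimics what the eye sees in the image; the quantification slides that follow replace it with a continuous log ratio.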

66

Information Extraction

• Spot intensities

– mean (pixel intensities)

– median (pixel intensities)

• Background values

– Local

– Morphological opening

– Constant (global)

– None

— Quality Information

Take the average

Speed Group Microarray Page

http://stat-www.berkeley.edu/users/terry/zarray/Html/image.html

Signal

Background
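To make the extraction step concrete, a spot's signal can be summarized from its foreground pixels and corrected by a local background estimate. This is a minimal sketch under stated assumptions: the pixel lists are taken as already segmented, the median is used by default, and negative results are clipped to zero, which is one common choice among several. The name `spot_signal` is hypothetical.

```python
from statistics import mean, median


def spot_signal(foreground_pixels, background_pixels, summary=median):
    """Summarize a spot as summary(foreground) - summary(local background).

    summary -- a function such as statistics.median or statistics.mean
    Negative differences are clipped to 0.0 (an assumption of this sketch).
    """
    fg = summary(foreground_pixels)
    bg = summary(background_pixels)
    return max(fg - bg, 0.0)


if __name__ == "__main__":
    fg = [1200, 1250, 1300, 5000]   # one bright outlier pixel
    bg = [100, 110, 90, 100]
    print("median-based:", spot_signal(fg, bg))
    print("mean-based:  ", spot_signal(fg, bg, summary=mean))
```

The example shows why the choice between mean and median matters: a single saturated pixel shifts the mean-based signal far more than the median-based one.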

67

Data verification

• Gene expression ratio?

Low High Expression level

Gene A Gene B

Sample 2

Sample 1

68

Quantification of expression

For each spot on the slide we calculate

Red intensity = Rfg - Rbg

(fg = foreground, bg = background) and

Green intensity = Gfg - Gbg

and combine them in the log (base 2) ratio

Log2( Red intensity / Green intensity)
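The calculation for a single spot can be sketched as follows. The function name `log2_ratio` and the decision to return `None` when a corrected intensity is not positive are assumptions of this sketch; the slides do not say how such spots are handled.

```python
import math


def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Log2(Red intensity / Green intensity) for one spot.

    Red intensity   = r_fg - r_bg
    Green intensity = g_fg - g_bg
    Returns None when either corrected intensity is <= 0, because the
    log ratio is undefined there (an assumption of this sketch).
    """
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


if __name__ == "__main__":
    print(log2_ratio(1100, 100, 600, 100))   # red twice as strong as green
    print(log2_ratio(600, 100, 1100, 100))   # green twice as strong as red
    print(log2_ratio(100, 100, 500, 100))    # red at background level
```

On the log2 scale a two-fold increase and a two-fold decrease are symmetric around zero, which is the main reason the ratio is log-transformed.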

69

Data Normalization

• Purpose

Adjust bias from variation in microarray technology.

E.g. differences between labeling, scanner setting, spatial positions

• Within-array normalization

Logarithmic transformation of the ratio, then subtraction of the mean log ratio

Red     Green   Difference (G - R)   Ratio (G/R)   Log2 ratio   Centered log2 ratio
16500   15104   -1396                0.915         -0.128       -0.048
357     158     -199                 0.443         -1.175       -1.095
8250    8025    -225                 0.973         -0.039        0.040
978     836     -142                 0.855         -0.226       -0.146
65      89       24                  1.369          0.453        0.533
684     1368     684                 2.000          1.000        1.080
13772   11209   -2563                0.814         -0.297       -0.217
856     731     -125                 0.854         -0.228       -0.148
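The within-array normalization can be reproduced from the Red and Green columns of the table. This sketch follows the table's own convention of taking the ratio as Green/Red; the variable names are illustrative. Small differences in the third decimal place relative to the table are expected, since the table appears to center already-rounded log ratios.

```python
import math

# Red and Green intensities copied from the table above.
red = [16500, 357, 8250, 978, 65, 684, 13772, 856]
green = [15104, 158, 8025, 836, 89, 1368, 11209, 731]

# Step 1: logarithmic transformation of the ratio (G/R, as in the table).
log_ratios = [math.log2(g / r) for r, g in zip(red, green)]

# Step 2: subtract the mean log ratio so the array is centered on zero.
mean_log = sum(log_ratios) / len(log_ratios)
centered = [x - mean_log for x in log_ratios]

if __name__ == "__main__":
    print("mean log2 ratio: %.3f" % mean_log)
    for r, g, lr, c in zip(red, green, log_ratios, centered):
        print("%6d %6d %7.3f %7.3f" % (r, g, lr, c))
```

After centering, the log ratios on one array average to zero, so a global dye or scanner bias no longer shifts every gene in the same direction.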

70

Gene Expression Data

On p genes for n slides: p is O(10,000), n is O(10-100), but growing.

Genes

Slides

Gene expression level of gene 5 in slide 4

= Log2( Red intensity / Green intensity)

gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.

71

• Microarray data converted to an n x p table

(p – gene number, n – sample number)

         Sample 1   Sample 2
Gene 1   1.04       2.08
Gene 2   3.2        10.5
Gene 3   3.34       1.05
Gene 4   1.85       0.09

Microarray gene expression data

72

Statistical Analysis

• Differences in ratios due to

– random variation

– meaningful changes

• Convention

– ratio >= 2 or ratio <= ½

• Analysis of variance (ANOVA)

– 4 and 10 replicates of each treatment

– statistical significance
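The two-fold convention can be written as a small rule on the expression ratio. A minimal sketch, assuming the ratio has already been normalized; the labels "up", "down", and "unchanged" and the function name `fold_change_call` are illustrative choices, not terms from the slides.

```python
def fold_change_call(ratio, threshold=2.0):
    """Apply the convention: ratio >= 2 or ratio <= 1/2 is a change."""
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    for ratio in (2.0, 1.369, 0.855, 0.5, 0.443):
        print(ratio, fold_change_call(ratio))
```

A fixed cutoff says nothing about random variation, which is why the slide pairs it with replicates and ANOVA to judge statistical significance.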

73

Single Color Microarrays

• Prefabricated

– Affymetrix (25mers)

• Custom

– cDNA (500 bases or so)

– Spotted oligos (70-80 bases)

74

Single Color Microarrays

• Expressed sequences washed over chips

• Expressed genes hybridize

• Light passed under to see intensity (or hybridized oligos show dark color)

75

Affymetrix GeneChip System

• Large number of genes and ESTs

• Several species

• Oligonucleotide arrays for expression monitoring are

designed and synthesized based on sequence

information alone, without the need for physical

intermediates such as clones, PCR products, cDNAs.

• Printed oligos are of the same length, allowing for

equal hybridization.

76

Affymetrix Technology

DESOKY, 2003

77

Affymetrix Technology

Biotin (one dye) instead of 2 colors

One treatment per chip

• For two conditions, need two slides

• Compare patterns of both slides to get results

11, 16, or 20 gene marker (probe) pairs per gene

DESOKY, 2003

78

Affymetrix Technology

DESOKY, 2003

79

Affymetrix Genechip: experimental steps

80

81

Lithography

• It is a printing technology.

• Lithography was invented by Alois Senefelder in Germany in 1798.

• The printing and non-printing areas of the plate are all at the same level, as opposed to intaglio and relief processes in which the design is cut into the printing block.

• Lithography is based on the chemical repellence of oil and water.

82

Affymetrix Technology

Light-directed synthesis of DNA chips

• Synthetic linkers modified with photochemically removable protecting groups are attached to a glass substrate, and light is directed through a photolithographic mask to specific areas on the surface to produce localized photodeprotection.

• The first of a series of chemical building blocks, hydroxyl-protected deoxynucleosides, is incubated with the surface, and chemical coupling occurs at those sites that have been illuminated in the preceding step.

• Next, light is directed to different regions of the substrate by a new mask, and the chemical cycle is repeated.

• Current technology allows for 300,000 polydeoxynucleotides in a 1.28 x 1.28 cm array.

83

Affymetrix Array Construction

STROMBERG, 2003

84

85

86

PM to maximize hybridization

MM to ascertain the degree of cross-

hybridization

Affymetrix Design of probes

87

Affy Tech – Number of Features

Multiple oligo probes

25-mers

Features

5’ 3’ Gene Sequence

– Use multiple oligos per gene

– Redundancy improves detection and quantification of the target gene

DESOKY, 2003

88

Affy Tech – Mismatches for Control

Multiple oligo probes

25-mers

Perfect Match

Mismatch

5’ 3’ Gene Sequence

• Each probe has a “control” – a DNA sequence which differs only slightly from the feature

• In a 25-mer, the mismatch sequence differs in the 13th position (A-T or G-C)
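Deriving a mismatch probe from its perfect-match partner can be sketched as follows, assuming the swap at the 13th position is to the complementary base (A with T, G with C) as the slide indicates. The function name `mismatch_probe` and the example sequence are made up for illustration.

```python
# Base swaps used at the central position (A<->T, G<->C).
SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm):
    """Return the MM probe for a 25-mer PM probe: identical except base 13."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer, got length %d" % len(pm))
    middle = 12  # 13th position, counted from zero
    return pm[:middle] + SWAP[pm[middle]] + pm[middle + 1:]


if __name__ == "__main__":
    pm = "ACGTACGTACGTACGTACGTACGTA"   # illustrative sequence only
    mm = mismatch_probe(pm)
    print(pm)
    print(mm)
```

Placing the single change in the middle of the probe disrupts hybridization the most, so the MM signal mainly reflects background and cross-hybridization.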

DESOKY, 2003

89

90

Probe Tiling Strategy

• Gene expression monitoring with oligonucleotide arrays. Expression probe and array design. Oligonucleotide probes are chosen based on uniqueness criteria and composition design rules. For eukaryotic organisms, probes are chosen typically from the 3´ end of the gene or transcript (nearer to the poly(A) tail) to reduce problems that may arise from the use of partially degraded mRNA. The use of the PM minus MM differences averaged across a set of probes greatly reduces the contribution of background and cross-hybridization and increases the quantitative accuracy and reproducibility of the measurements.
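The "PM minus MM differences averaged across a set of probes" can be written out directly. This is a minimal sketch of that average difference only, with made-up intensities; it is not presented as the full Affymetrix analysis procedure, and `average_difference` is a hypothetical helper name.

```python
def average_difference(pm_intensities, mm_intensities):
    """Mean of (PM - MM) over the probe pairs of one probe set."""
    if len(pm_intensities) != len(mm_intensities):
        raise ValueError("each PM probe needs exactly one MM partner")
    if not pm_intensities:
        raise ValueError("probe set is empty")
    diffs = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    pm = [1200, 900, 1500, 1100]   # illustrative intensities
    mm = [300, 250, 400, 350]
    print(average_difference(pm, mm))
```

Because background and cross-hybridization contribute to both members of a probe pair, subtracting MM from PM removes much of that shared signal before averaging.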

91

PM

MM

Probe set

Probe pair

STROMBERG, 2003

92

Affymetrix Data

• Each gene labeled as “present”, “marginal”, or “absent.”

– Present: gene expressed and reliably detected in the RNA sample

• Label chosen based on a p-value

93

94

Why Probe redundancy?

• Use of multiple independent detectors for the same molecule improves signal-to-noise ratios (due to averaging over the intensities of multiple array features), improves the accuracy of RNA quantification (averaging and outlier rejection), increases the dynamic range, mitigates effects due to cross-hybridization, and drastically reduces the rate of false positives and miscalls.

• An additional level of redundancy comes from the use of mismatch (MM) control probes that are identical to their perfect match (PM) partners except for a single base difference in a central position. The MM probes act as specificity controls that allow the direct subtraction of both background and cross-hybridization signals, and allow discrimination between ‘real’ signals and those due to non-specific or semi-specific hybridization (hybridization of the intended RNA molecules produces more signal for the PM probes than for the MM probes, resulting in consistent patterns that are highly unlikely to occur by chance).

95

96

97

98

Gene Expression Data

Gene expression data on p genes for n samples

Genes

mRNA samples

Gene expression level of gene i in mRNA sample j

= Log(Red intensity / Green intensity) for two-color arrays

or Log(Avg. PM - Avg. MM) for Affymetrix arrays

gene   sample1   sample2   sample3   sample4   sample5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

99

100

101

What is gene expression?

Gene expression = degree of expression of a gene in a particular experiment (protein)

genes

Experiments (over time)

Baseline expression

Higher expression compared to baseline

Lower expression compared to baseline

Spellman et al Mol. Biol. Cell 1998

102

Looking at the transcriptome: microarrays

genes

conditions

condition 1 condition 2

condition 3

statistical processing and analysis

103

104

Microarrays yield information

Image: bioinfo.mbb.yale.edu/~mbg/ fun3/microarray-mona/

105

Are they important for clinical use?

High-throughput Technologies and ‘OMICS’ Science

106

Adrenal Gland

Endometrium

Pancreas

Brain

Breast

Uterus

Esophagus

Gall Bladder

Kidney

Liver

Lung

Ovary

Skin

Bone

Stomach

Thyroid

Head & Neck

Prostate

Germ Cell

Soft Tissue

Lymph

Cervix

Bladder

GIST

Colon

Why can gene expression profiles classify cancer types?

Cancers of different origins are derived from cells that pass through different developmental stages.

Expression profiles of cells coming from different developmental stages differ from each other.

107

Revolution of Breast Cancer Classification

DNA Chip Analysis

108

(Baselga and Norton, 2002)

Breast Cancer Classification

109

Sorlie et al., 2001

Breast Cancer Classification

110

Portrait of Breast Cancer

Sørlie et al. Proc Natl Acad Sci U S A. 2001 Sep 11;98(19):10869-10874.

Basal–like

HER-2

“Normal”

Luminal B

Luminal A

111

Subtypes of breast cancer identified by gene expression patterns (Sorlie et al, PNAS, 98: 10869-74, 2001)

Gene expression profiles provide classification of

the sub-types of cancers with different clinical

outcome.

Two ER-positive subgroups:

• Luminal A (best clinical outcome)

• Luminal B

Three ER-negative subgroups:

• “Normal” breast-like

• ERBB2+ (ERBB2 amplic high expression)

• Basal-like (worst clinical outcome)

112

Molecular Grading of Breast Cancer

Sotiriou C, et al.. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.

• Gene expression profile data reveal two molecular grades in breast cancer.

• Histologic grade 2 cases are distributed between molecular grades 1 and 2.

• Molecular grading performs better in multivariate analyses than traditional prognostic factors such as ER/PR and HER2.

113

Gene-based breast cancer test

MammaPrint Array

FDA Approved

MammaPrint is a DNA microarray-based test that measures the activity of 70 genes.

- The test measures the expression of each of these genes in a sample of the woman's breast cancer and uses a specific calculation to determine whether the risk of the patient's cancer spreading to other sites is low or high.

- It helps guide who should be treated….

114

“MammaPrint is a DNA microarray-based test that measures the activity of 70 genes... The test measures each of these genes in a sample of a woman's breast-cancer tumor and then uses a specific formula to determine whether the patient is deemed low risk or high risk for the spread of the cancer to another site.”

FDA Approves Gene-Based Breast Cancer Test*

115

GENERATING A GENE SIGNATURE BY DNA MICROARRAY ANALYSIS - MammaPrint ARRAY

Primary breast tumors from 78 young, lymph node-negative patients were used:

- The gene expression profiles of the 34 patients who developed distant metastases within 5 years were compared with those of the 44 patients who remained disease-free for 5 years.

- The analyses yielded a 70-gene expression set that allows breast tumors to be classified into good-prognosis and poor-prognosis groups.

116

Intra-operative Cancer Detection

“The rapid RT-PCR assay has found a breast cancer stem cell related mRNA signature in the sentinel

lymph node”
