From motif search to gene expression analysis

Upload: lada

Post on 19-Jan-2016


Page 1: From motif search to gene expression analysis

From motif search to gene expression analysis

Page 2: From motif search to gene expression analysis

Protein Motifs

Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile:

P[ED]XK[RW][RK]X[ED]

or as a position weight matrix (PWM)
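A consensus such as P[ED]XK[RW][RK]X[ED] maps almost directly onto a regular expression. The sketch below is a minimal illustration, assuming that X stands for any amino acid; the function names and the test sequence are made up for this example and do not come from the slides.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus motif into a regular expression.

    Assumption: 'X' means "any amino acid". Bracketed groups such as [ED]
    already mean "one of these residues" in regex syntax.
    """
    return consensus.replace("X", ".")


def find_motif(consensus: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    # The lookahead lets overlapping occurrences be reported as well
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical protein fragment, for illustration only
hits = find_motif("P[ED]XK[RW][RK]X[ED]", "MAPEAKRKLDGG")
print(hits)
```

A consensus treats every allowed residue at a position as equally likely; a PWM additionally stores how often each residue occurs at each position.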

Page 3: From motif search to gene expression analysis

Protein Domains

• In addition to short motifs, proteins are characterized by domains.

• Domains are long motifs (30-100 aa) and are considered the building blocks of proteins (evolutionary modules).

The zinc-finger domain

Page 4: From motif search to gene expression analysis

Some domains can be found in many proteins with different functions:

Page 5: From motif search to gene expression analysis

...while other domains are found only in proteins with a certain function.

MBD = Methylated DNA Binding Domain

Page 6: From motif search to gene expression analysis

Varieties of protein domains

Page 228

Extending along the length of a protein

Occupying a subset of a protein sequence

Occurring one or more times

Page 7: From motif search to gene expression analysis

Pfam

A database that contains a large collection of multiple sequence alignments of protein domains

Based on profile hidden Markov models (HMMs).

Compared with a PWM, a profile HMM also models the transitions between neighboring columns of the alignment (match, insert, and delete states), so it can represent insertions and deletions and is thus much more powerful.

http://pfam.sanger.ac.uk/

Page 8: From motif search to gene expression analysis

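Profile HMM (Hidden Markov Model) can accurately represent an MSA

[Figure: profile HMM for alignment columns 16-19, with match states M16-M19, insert states I16-I19, and delete states D16-D19]

Alignment (columns 16-19):

16 17 18 19
D  R  T  R
D  R  T  S
S  -  -  S
S  P  T  R
D  R  T  R
D  P  T  S
D  -  -  S
D  -  -  S
D  -  -  S
D  -  -  R

Match-state emission probabilities:

M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6

Transition probabilities labeled in the figure: 50% / 50% where the path splits between match and delete, 100% elsewhere.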
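The match-state emission probabilities on this slide can be recomputed from the alignment columns. The sketch below is only an illustration: it uses plain relative frequencies, whereas real profile HMM software also applies pseudocounts and sequence weighting. The function name is hypothetical.

```python
from collections import Counter

# Alignment columns 16-19 as read from the slide; '-' marks a deletion
alignment = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]


def column_profile(rows):
    """For each column, return (emission probabilities, fraction of gaps)."""
    profile = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        # Relative frequency of each residue among the non-gap characters
        emissions = {aa: n / len(residues) for aa, n in counts.items()}
        gap_fraction = column.count("-") / len(column)
        profile.append((emissions, gap_fraction))
    return profile


for position, (emissions, gaps) in zip(range(16, 20), column_profile(alignment)):
    print(position, emissions, f"deleted in {gaps:.0%} of sequences")
```

Columns 17 and 18 are gapped in half of the sequences, which corresponds to the 50% / 50% split between the match and delete paths.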

Page 9: From motif search to gene expression analysis

Gene Expression Analysis

Page 10: From motif search to gene expression analysis

Gene Expression

DNA → RNA → protein

Page 11: From motif search to gene expression analysis

Gene Expression

[Figure: mRNA molecules with poly(A) tails, labeled mRNA gene1, mRNA gene2, and mRNA gene3]

Page 12: From motif search to gene expression analysis

Studying Gene Expression 1987-2011

Spotted microarray (first high throughput gene expression experiments)

DNA chips

RNA-seq (Next Generation Sequencing)

Page 13: From motif search to gene expression analysis

Classical versus modern technologies to study gene expression

Classical methods (spotted microarray, DNA chips)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes

Next-generation RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts

Page 14: From motif search to gene expression analysis

Classical Methods

1. Spotted Microarray

Two-channel cDNA microarrays

2. DNA Chips

One-channel microarrays (Affymetrix, Agilent)

Page 15: From motif search to gene expression analysis

http://www.bio.davidson.edu/courses/genomics/chip/chip.html

Page 16: From motif search to gene expression analysis

Experimental Protocol: Two-channel cDNA arrays

1. Design an experiment (probe design)
2. Extract RNA molecules from cell
3. Label molecules with fluorescent dye
4. Pour solution onto microarray
   – Then wash off excess molecules
5. Shine laser light onto array
   – Scan for presence of fluorescent dye
6. Analyze the microarray image

Page 17: From motif search to gene expression analysis

Transforming raw data to ratio of expression

Expression ratio: $\log_2 \frac{\text{Cy5}}{\text{Cy3}}$

The ratio of expression is indicated by the intensity of the color:

Red = high mRNA abundance in the experiment sample

Green = high mRNA abundance in the control sample
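The log-ratio transformation can be sketched in a few lines. The spot intensities below are hypothetical values chosen for illustration; they are not data from the slides, and real pipelines also apply background correction and normalization first.

```python
import math


def log_ratio(cy5: float, cy3: float) -> float:
    """log2(Cy5 / Cy3): positive = higher in the experiment sample (red),
    negative = higher in the control sample (green)."""
    return math.log2(cy5 / cy3)


# Hypothetical spot intensities as (Cy5, Cy3) pairs
spots = {
    "geneA": (800.0, 200.0),
    "geneB": (150.0, 600.0),
    "geneC": (300.0, 300.0),
}

for gene, (cy5, cy3) in spots.items():
    print(gene, log_ratio(cy5, cy3))
```

Using the logarithm makes the scale symmetric: a fourfold increase gives +2 and a fourfold decrease gives -2, while equal abundance gives 0.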

Page 18: From motif search to gene expression analysis

One-channel DNA chips

• Each sequence is represented by a probe set colored with one fluorescent dye

• Target hybridizes to complementary probes only

• The fluorescence intensity is indicative of the expression of the target sequence

Page 19: From motif search to gene expression analysis

Affymetrix Chip

Page 20: From motif search to gene expression analysis

RNA-seq

Page 21: From motif search to gene expression analysis

NEXT…

Clustering genes according to their expression profiles.

[Figure: expression matrix with axes labeled Genes and Experiments]

Page 22: From motif search to gene expression analysis

WHY? What can we learn from the clusters?

• Identify gene function
  – Similar expression can suggest similar function

• Diagnostics and Therapy
  – Differences in gene expression can indicate a disease state
  – Genes which change expression in a disease can be good candidates for drug targets

Page 23: From motif search to gene expression analysis

HOW? Different clustering approaches

• Unsupervised
  - Hierarchical clustering
  - Partition methods (K-means)

• Supervised methods
  - Analysis of variance
  - Discriminant analysis
  - Support Vector Machine (SVM)

Page 24: From motif search to gene expression analysis

Clustering

Clustering organizes things that are close into groups.

- What does it mean for two genes to be close?

- Once we know this, how do we define groups?

Page 25: From motif search to gene expression analysis

What does it mean for two genes to be close?

We need a mathematical definition of distance between the expression of two genes

$\text{Gene1} = (E_{11}, E_{12}, \dots, E_{1N})'$

$\text{Gene2} = (E_{21}, E_{22}, \dots, E_{2N})'$

For example, the Euclidean distance between gene 1 and gene 2:

$d(\text{Gene1}, \text{Gene2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$
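The Euclidean distance between two expression profiles can be computed directly from this definition. The two profiles below are made-up numbers used only to show the arithmetic.

```python
import math


def euclidean_distance(gene1, gene2):
    """Square root of the summed squared differences over the N experiments."""
    if len(gene1) != len(gene2):
        raise ValueError("both genes must be measured in the same N experiments")
    return math.sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))


# Hypothetical expression values over N = 3 experiments
gene1 = [1.0, 2.0, 3.0]
gene2 = [4.0, 6.0, 3.0]
print(euclidean_distance(gene1, gene2))  # sqrt(9 + 16 + 0) = 5.0
```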

Page 26: From motif search to gene expression analysis

Once we know this, how do we define groups?

Hierarchical Clustering

Michael Eisen, 1998: Generate a tree based on similarity (similar to a phylogenetic tree)

Each gene is a leaf on the tree

Distances reflect similarity of expression

[Figure: clustered expression matrix with axes labeled Genes and Experiments; one gene cluster is highlighted]
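Building such a tree can be sketched as agglomerative clustering: start with every gene as its own leaf and repeatedly merge the two closest clusters. The code below is a minimal illustration under stated assumptions: Euclidean distance and average linkage are choices made for this example, the expression profiles are invented, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    pairs = [(g, h) for g in cluster_a for h in cluster_b]
    return sum(euclidean(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)


def hierarchical_clustering(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Read from first to last, the merges describe the tree from the leaves up.
    """
    clusters = [(gene,) for gene in profiles]  # every gene starts as a leaf
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        merges.append((a, b, average_linkage(a, b, profiles)))
        # Replace the two clusters by their union (a new internal node)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log-ratio profiles over three experiments
profiles = {
    "gene1": [2.0, 2.1, -1.0],
    "gene2": [1.9, 2.0, -1.1],
    "gene3": [-2.0, -1.8, 1.5],
    "gene4": [-2.1, -2.0, 1.4],
}

for a, b, distance in hierarchical_clustering(profiles):
    print(a, "+", b, f"at distance {distance:.2f}")
```

With these profiles, gene1 and gene2 are joined first, then gene3 and gene4, and the final merge joins the two pairs at a much larger distance.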

Page 27: From motif search to gene expression analysis

Internal nodes represent different functional groups (A, B, C, D, E)

One gene may belong to more than one cluster

[Figure axis label: genes]

Page 28: From motif search to gene expression analysis

Clusters can be presented by graphs

Page 29: From motif search to gene expression analysis

What can we learn from clusters with similar gene expression?

Page 30: From motif search to gene expression analysis

EXAMPLE: hnRNP A1 and SRp40

hnRNP A1 and SRp40 are not clear homologs based on BLAST E-value, but they have a very similar gene expression pattern in different tissues

Page 31: From motif search to gene expression analysis

Are hnRNP A1 and SRp40 functional homologs?

[Figure: diagram labeled SRp40, hnRNP A1, and SF]

YES!

Page 32: From motif search to gene expression analysis

What can we learn from clusters with similar gene expression?

• Similar expression between genes
  – The genes have similar function
  – One gene controls the other
  – All genes are controlled by a common regulatory gene

Page 33: From motif search to gene expression analysis

How can we use microarrays for diagnostics?

Page 34: From motif search to gene expression analysis

Gene-Expression Profiles in Hereditary Breast Cancer

cDNA Microarrays: Parallel Gene Expression Analysis

• Breast tumors studied: BRCA1, BRCA2, and sporadic tumors

• Log-ratio measurements of 3226 genes for each tumor after initial data filtering

RESEARCH QUESTION

Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?

Page 35: From motif search to gene expression analysis

How can microarrays be used as a basis for diagnostics?

5 Breast Cancer Patients

         Patient 1   Patient 2   Patient 3   Patient 4   Patient 5
Gen1         +           -           -           +           +
Gen2         +           +           -           +           -
Gen3         -           +           +           +           -
Gen4         +           +           +           -           -
Gen5         -           -           +           -           +

Page 36: From motif search to gene expression analysis

How can microarrays be used as a basis for diagnostics?

         Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1         +           -           +           -           +
Gen3         -           +           +           +           -
Gen4         +           +           -           +           -
Gen2         +           +           +           -           -
Gen5         -           -           -           +           +

Figure labels: Informative Genes; BRCA1, BRCA2
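The reordering on this slide can be mimicked by searching for genes whose +/- pattern separates the two patient groups. The sketch below is an illustration only: the grouping (patients 1, 2, 4 as BRCA1 and patients 3, 5 as BRCA2) is an assumption read off the reordered column order, and the function name is hypothetical.

```python
# +/- calls from the slide; string positions correspond to patients 1-5
calls = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}

# Assumption: grouping inferred from the reordered table, not stated in the slides
brca1_patients = [0, 1, 3]  # 0-based indices of patients 1, 2, 4
brca2_patients = [2, 4]     # 0-based indices of patients 3, 5


def informative_genes(calls, group_a, group_b):
    """Genes whose call is uniform within each group and differs between groups."""
    selected = []
    for gene, row in calls.items():
        a_values = {row[i] for i in group_a}
        b_values = {row[i] for i in group_b}
        if len(a_values) == 1 and len(b_values) == 1 and a_values != b_values:
            selected.append(gene)
    return selected


print(informative_genes(calls, brca1_patients, brca2_patients))
```

Under this assumed grouping, only Gen2 and Gen5 separate the two groups perfectly; the remaining genes vary within a group and carry no diagnostic signal here.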

Page 37: From motif search to gene expression analysis

Specific Examples: Cancer Research

Ramaswamy et al., 2003, Nat Genet 33:49-54

Hundreds of genes that differentiate between cancer tissues in different stages of the tumor were found. The arrow shows an example of tumor cells which were not detected correctly by histological or other clinical parameters.

Page 38: From motif search to gene expression analysis

Supervised approaches for predicting gene function based on microarray data

Support Vector Machine

• An SVM would begin with a set of genes that have a common function (red dots). In addition, a separate set of genes that are known not to be members of the functional class (blue dots) is specified.

Page 39: From motif search to gene expression analysis

• Using this training set, an SVM would learn to differentiate between the members and non-members of a given functional class based on expression data.

• Having learned the expression features of the class, the SVM could recognize new genes as members or as non-members of the class based on their expression data.

[Figure: a new, unclassified gene marked "?"]
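The train-then-classify workflow can be sketched with a small linear SVM. This is a minimal illustration, not the method used in the cited studies: it fits a linear boundary by sub-gradient descent on the regularized hinge loss, whereas practical SVM software uses dedicated solvers and often non-linear kernels. The expression values, hyperparameters, and function names are all assumptions made for this example.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss.

    labels must be +1 (member of the class) or -1 (non-member).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization shrinks the weights a little on every step
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin: move the boundary
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w, b, x):
    """Return +1 (member) or -1 (non-member) for an expression vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical training set: log ratios of each gene in two experiments
members = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9]]              # "red dots"
non_members = [[-1.5, -2.0], [-2.2, -1.7], [-1.9, -2.4]]    # "blue dots"
samples = members + non_members
labels = [1] * len(members) + [-1] * len(non_members)

w, b = train_linear_svm(samples, labels)
print(predict(w, b, [2.1, 1.7]))    # new gene resembling the members
print(predict(w, b, [-2.0, -1.8]))  # new gene resembling the non-members
```

The same two functions apply unchanged when each vector is a patient's expression profile and the labels are tumor classes.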

Page 40: From motif search to gene expression analysis

Using SVMs to diagnose tumors based on expression data

Each dot represents a vector of the expression pattern taken from a microarray experiment, for example the expression pattern of all genes from a cancer patient.

Page 41: From motif search to gene expression analysis

How do SVMs work with expression data?

In this example, red dots can be primary tumors and blue dots are from the metastasis stage. The SVM is trained on data which was classified based on histology.

[Figure: an unknown tumor marked "?"]

After training the SVM we can use it to diagnose the unknown tumor.

Page 42: From motif search to gene expression analysis

Projects 2012-13

Page 43: From motif search to gene expression analysis

Instructions for the final project

Introduction to Bioinformatics 2012-13

Key dates

13.12 List of suggested projects published
** You are highly encouraged to choose a project yourself or find a relevant project which can help in your research

22.1 Submission of project overview (one page)
- Title
- Main question
- Major tools you are planning to use to answer the questions

Final week: meetings on projects

12.3 Poster submission

20.3 Poster presentation

Page 44: From motif search to gene expression analysis

2. Planning your research

After you have described the main question or questions of your project, you should carefully plan your next steps:

A. Make sure you understand the problem and read the necessary background to proceed.

B. Formulate your working plan, step by step.

C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use at the first step. When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records; don't present raw data in your final project. Your initial results should guide you towards your next steps.

D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember, NO is also an answer as long as you are sure it is NO. Also remember this is a course project, not only a HW exercise.

Page 45: From motif search to gene expression analysis

3. Summarizing final project in a poster (in pairs)

Prepare in PPT, poster size 90-120 cm

Title of the project

Names and affiliation of the students presenting

The poster should include 5 sections:
- Background: should include a description of your question (can add a figure)
- Goal and Research Plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends), and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the references of papers/databases/tools used for your project

Examples of posters will be presented in class