From motif search to gene expression analysis

Upload: phillip-miles

Post on 28-Dec-2015


Finding TF targets using a bioinformatics approach

Scenario 1 : Binding motif is known (easier case)

Scenario 2 : Binding motif is unknown (hard case)


Are common motifs the right thing to search for?

Solutions:

- Search for motifs which are enriched in one set but not in a random set

- Use experimental information to rank the sequences according to their binding affinity and search for enriched motifs at the top of the list


ChIP-Seq

Sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.

Finding the p53 binding motif in a set of p53 target sequences which are ranked according to binding affinity

[Figure: ChIP-Seq target sequences ordered from best binders to weak binders]


A word-search approach to find an enriched motif in a ranked list

[Figure: ranked sequences list scanned with candidate k-mers (CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC); occurrences of the k-mer CTGTGA are marked along the list]


The minimal hypergeometric (mHG) statistic is used to find enriched motifs. It depends on:

- The total number of input sequences
- The number of sequences containing the motif
- The number of sequences at the top of the list
- The number of sequences containing the motif among the top sequences

[Figure: ranked sequences list with occurrences of CTGTGA marked]
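The four quantities listed on this slide can be turned into a small Python sketch of the mHG score. This is an illustration only, not the implementation of any particular tool: the symbols N, B, n and b, the function names and the toy sequences are assumptions introduced here. For every possible cutoff n, the hypergeometric tail probability of seeing at least b motif-containing sequences among the top n is computed, and the smallest value over all cutoffs is kept.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n of N sequences are drawn and B of the N contain the motif."""
    favourable = sum(comb(n, i) * comb(N - n, B - i) for i in range(b, min(n, B) + 1))
    return favourable / comb(N, B)


def mhg_score(ranked_sequences, kmer):
    """Return (smallest tail probability, cutoff n at which it occurs)."""
    hits = [kmer in seq for seq in ranked_sequences]  # ranked best binder first
    N, B = len(hits), sum(hits)
    if B == 0:
        return 1.0, 0
    best_p, best_n, b = 1.0, 0, 0
    for n, hit in enumerate(hits, start=1):
        b += hit  # number of motif-containing sequences among the top n
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Hypothetical toy list: the k-mer occurs only in the three top-ranked sequences.
RANKED = ["AACTGTGAT", "CTGTGACC", "GGCTGTGA", "ACGTACGT",
          "TTTTAAAA", "GGGCCCAA", "ACACACAC", "TGTGTGTG"]
SCORE, CUTOFF = mhg_score(RANKED, "CTGTGA")
```

Because the minimum is taken over many cutoffs, the resulting score is not itself a p-value and still needs a correction for the multiple cutoffs tested.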


The enriched motifs are combined to get a position-specific scoring matrix (PSSM) which represents the binding motif
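One common way to combine equal-length k-mers into a PSSM is to count the bases at each position and convert the frequencies to log-odds scores. The sketch below assumes a uniform background of 0.25 per base and a pseudocount of 1; these parameters, the function names and the choice of which k-mers count as enriched are illustrative assumptions, not values taken from the slides.

```python
from math import log2


def build_pssm(kmers, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Return one {base: log-odds score} dictionary per motif position."""
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("all k-mers must have the same length")
    total = len(kmers) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        column = [k[pos] for k in kmers]
        pssm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores of a candidate binding site."""
    return sum(column[base] for column, base in zip(pssm, site))


# Hypothetical set of enriched k-mers, picked from the candidate list for illustration.
ENRICHED = ["CTGTGA", "CTGTGC", "ATGTGA", "CTGTAC"]
PSSM = build_pssm(ENRICHED)
```

A site that resembles the enriched k-mers, such as `CTGTGA`, receives a higher score from `score_site` than an unrelated site such as `AAAAAA`.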


Protein Motifs

Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile:

P[ED]XK[RW][RK]X[ED]

or as a position weight matrix (PWM)
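A consensus such as P[ED]XK[RW][RK]X[ED] maps directly onto a regular expression: bracketed groups are already character classes, and X stands for any amino acid. The following sketch uses Python's standard `re` module; the helper names and the example protein sequence are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def consensus_to_regex(consensus):
    """Translate a consensus pattern into a compiled regular expression."""
    return re.compile(consensus.replace("X", f"[{AMINO_ACIDS}]"))


def find_motif(consensus, protein):
    """Return (start position, matched text) for each non-overlapping match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical protein fragment containing one occurrence of the motif.
MATCHES = find_motif("P[ED]XK[RW][RK]X[ED]", "MSPEAKRKLDGG")
```

Positions are 0-based, and `finditer` reports non-overlapping matches only.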


Gene Expression Analysis

Gene Expression

DNA → RNA → protein

Gene Expression

[Figure: polyadenylated mRNA transcripts (poly-A tails) of gene1, gene2 and gene3]

Studying Gene Expression 1987-2013

- Microarray (first high-throughput gene expression experiments)

- DNA chips

- RNA-seq (next-generation sequencing)

Classical versus modern technologies to study gene expression

Classical methods (spotted microarrays, DNA chips)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes

Next-generation RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts

Experimental protocol: two-channel cDNA arrays

http://www.bio.davidson.edu/courses/genomics/chip/chip.html

One-channel DNA chips

• Each sequence is represented by a probe set colored with one fluorescent dye

• Target hybridizes to complementary probes only

• The fluorescence intensity is indicative of the expression of the target sequence


Affymetrix Chip


RNA-seq

NEXT…

Clustering the data according to expression profiles.

[Figure: expression matrix; axes: genes, expression in different conditions]

WHY? What can we learn from the clusters?

• Identify gene function
  – Similar expression can imply similar function

• Diagnostics and therapy
  – Differences in gene expression can indicate a disease state
  – Genes which change expression in a disease can be good candidates for drug targets

A molecular signature of metastasis in primary solid tumors

Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues in different stages of the tumor were found. The arrow shows an example of tumor cells which were not detected correctly by histological or other clinical parameters.

Ramaswamy et al., 2003, Nat Genet 33:49-54

HOW? Different clustering approaches

• Unsupervised
  - Hierarchical clustering
  - K-means

• Supervised methods
  - Support Vector Machine (SVM)


Clustering

Clustering organizes things that are close into groups.

- What does it mean for two genes to be close?

- Once we know this, how do we define groups?


What does it mean for two genes to be close?

We need a mathematical definition of the distance between the expression patterns of two genes.

Gene1 = (E_{11}, E_{12}, …, E_{1N})'
Gene2 = (E_{21}, E_{22}, …, E_{2N})'

Euclidean distance: $d(\text{Gene1}, \text{Gene2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$

[Figure: expression profiles of Gene 1 and Gene 2 across conditions 01-22, used as an example of the distance between gene 1 and gene 2]

Clustering the genes according to expression

Hierarchical Clustering

Generate a tree based on the distances between genes (similar to a phylogenetic tree).

Each gene is a leaf on the tree.
Distances reflect the similarity of their expression patterns.

[Figure: clustered expression matrix with a gene tree; axes: genes, expression in different conditions; label: Gene Cluster]

Clustering the genes according to gene expression

Genes

GENE a    1, -1,  1,  1,  1, -1, -1, -1
GENE b    1,  1, -1,  1,  1,  1, -1,  1
GENE c    1, -1,  1, -1,  1, -1, -1, -1
GENE d   -1,  1, -1,  1,  1,  1, -1, -1

Distance Table

Distances (Euclidean distance)*

      a      b      c      d
a     0      4      2      4
b     4      0      4.47   2.83
c     2      4.47   0      4.47
d     4      2.83   4.47   0

Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47

* Can be calculated using different distance metrics
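The distance table for genes a-d can be recomputed with a few lines of Python, and the same distances can drive a simple tree-building procedure. The sketch below uses single-linkage agglomerative clustering (repeatedly merging the two closest clusters); the slides do not say which linkage rule is used, so that choice and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import dist

# Expression profiles of genes a-d as given on the slide.
PROFILES = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}


def distance_table(profiles):
    """Euclidean distance for every unordered pair of genes."""
    return {
        frozenset((g, h)): dist(profiles[g], profiles[h])
        for g, h in combinations(profiles, 2)
    }


def single_linkage(profiles):
    """Return the merges as (cluster, cluster, distance), closest first."""
    table = distance_table(profiles)

    def linkage(x, y):
        # Single linkage: distance between the closest members of two clusters.
        return min(table[frozenset((g, h))] for g in x for h in y)

    clusters = [(gene,) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((x, y, linkage(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges


TABLE = distance_table(PROFILES)
MERGES = single_linkage(PROFILES)
```

With these profiles, genes a and c are merged first (distance 2), then b and d, and finally the two pairs are joined, which is the order of branching one would read off the resulting tree.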


Analyzing the clusters of genes

Cluster 2

Cluster 3

Cluster 4

What can we learn from clusters with similar gene expression?

EXAMPLE: hnRNP A1 and SRp40

hnRNP A1 and SRp40 are not clear homologs based on BLAST E-value, but have very similar gene expression patterns in different tissues

Are hnRNP A1 and SRp40 functional homologs?

YES!

[Figure: SRp40 and hnRNP A1 (labels: SF)]

What else can we learn from clusters with similar gene expression?

• Similar expression between genes

– The genes have similar function

– One gene controls the other

– All genes are controlled by a common regulatory gene


How can gene expression help in diagnostics?

How can gene expression help in diagnostics?

RESEARCH QUESTION

Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?

HERE we want to cluster the patients, not the genes!

[Figure: expression matrix; axes: genes, different patients (BRCA1 or BRCA2)]

How can gene expression be applied for diagnostics?

5 breast cancer patients

        Patient 1   Patient 2   Patient 3   Patient 4   Patient 5
Gen1        +           -           -           +           +
Gen2        +           +           -           +           -
Gen3        -           +           +           +           -
Gen4        +           +           +           -           -
Gen5        -           -           +           -           +

How can gene expression be applied for diagnostics?

        Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1        +           -           +           -           +
Gen3        -           +           +           +           -
Gen4        +           +           -           +           -
Gen2        +           +           +           -           -
Gen5        -           -           -           +           +

[Figure labels: informative genes; BRCA1, BRCA2]

Two-way clustering = clustering the patients and the genes
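The reordered table can be reproduced with a short Python sketch. It is not a clustering algorithm: it assumes the patient grouping is already known (patients 1, 2 and 4 as BRCA1, patients 3 and 5 as BRCA2, which is an inference from the column order on the slide) and simply picks the genes whose +/- call is constant within each group and differs between the groups. All names in the code are hypothetical.

```python
PATIENTS = ["patient1", "patient2", "patient3", "patient4", "patient5"]

# +/- expression calls per gene, in the patient order above (from the slide).
CALLS = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}

# Assumed grouping, inferred from the column order of the reordered table.
GROUPS = {
    "BRCA1": ["patient1", "patient2", "patient4"],
    "BRCA2": ["patient3", "patient5"],
}


def informative_genes(calls, patients, groups):
    """Genes that are uniform within every group and differ between groups."""
    selected = []
    for gene, row in calls.items():
        by_patient = dict(zip(patients, row))
        per_group = [{by_patient[p] for p in members} for members in groups.values()]
        uniform = all(len(values) == 1 for values in per_group)
        distinct = len(set.union(*per_group)) == len(per_group)
        if uniform and distinct:
            selected.append(gene)
    return selected


def reorder(calls, patients, groups, informative):
    """Group the patient columns and move the informative genes to the bottom."""
    column_order = [p for members in groups.values() for p in members]
    row_order = [g for g in calls if g not in informative] + informative
    return {
        gene: "".join(dict(zip(patients, calls[gene]))[p] for p in column_order)
        for gene in row_order
    }


INFORMATIVE = informative_genes(CALLS, PATIENTS, GROUPS)
REORDERED = reorder(CALLS, PATIENTS, GROUPS, INFORMATIVE)
```

Under the assumed grouping, Gen2 and Gen5 come out as the informative genes, and the Gen1 row becomes `+-+-+`, as in the reordered table.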

Supervised approaches for diagnostics based on expression data

Support Vector Machine (SVM)

• An SVM would begin with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots).

Each dot represents a vector of the expression pattern taken from the microarray experiment of a patient.

How do SVMs work with expression data?

The SVM is trained on data which was classified based on histology.

After training the SVM to separate the BRCA1 from the BRCA2 tumors given the expression data, we can then apply it to diagnose an unknown tumor for which we have the equivalent expression data.

[Figure: the trained SVM applied to an unknown tumor, marked "?"]
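The train-then-diagnose workflow can be sketched with a small linear SVM written in plain Python. This is a minimal illustration under stated assumptions: the expression values of two genes per patient are invented toy numbers, the labels +1 (BRCA1) and -1 (BRCA2) are a convention chosen here, and the training routine is a simple subgradient descent on the regularized hinge loss rather than the solver of any particular SVM library.

```python
def decision(w, b, x):
    """Signed score of sample x relative to the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, epochs=1000, lr=0.01, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + max(0, 1 - y * (w.x + b))."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * decision(w, b, x) < 1  # inside the margin or misclassified
            w = [wi * (1 - lr * lam) for wi in w]  # regularization step
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def diagnose(w, b, x):
    return "BRCA1" if decision(w, b, x) >= 0 else "BRCA2"


# Hypothetical training set: expression of two genes in six diagnosed tumors.
SAMPLES = [[3.0, 1.0], [2.5, 0.5], [3.5, 1.5],   # diagnosed as BRCA1 (+1)
           [1.0, 3.0], [0.5, 2.5], [1.5, 3.5]]   # diagnosed as BRCA2 (-1)
LABELS = [1, 1, 1, -1, -1, -1]

W, B = train_linear_svm(SAMPLES, LABELS)
UNKNOWN_TUMOR = [4.0, 0.5]  # hypothetical undiagnosed sample
DIAGNOSIS = diagnose(W, B, UNKNOWN_TUMOR)
```

Real expression data has thousands of genes per patient rather than two, but the workflow is the same: fit the separating hyperplane on diagnosed samples, then evaluate the sign of the decision function for the unknown tumor.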


Projects 2013-14

Instructions for the final project
Introduction to Bioinformatics 2013-14

Key dates

12.12 List of suggested projects published
** You are highly encouraged to choose a project yourself or find a relevant project which can help in your research

9.1 Submission of project overview (one page)
- Title
- Main question
- Major tools you are planning to use to answer the questions

Final week: meetings on projects
12.3 Poster submission
19.3 Poster presentation

2. Planning your research

After you have described the main question or questions of your project, you should carefully plan your next steps.

A. Make sure you understand the problem and read the necessary background to proceed.
B. Formulate your working plan, step by step.
C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use in the first step. When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not only a homework exercise.

3. Summarizing the final project in a poster (in pairs)

Prepare the poster in PPT, poster size 90-120 cm, with:
- Title of the project
- Names and affiliations of the students presenting

The poster should include 5 sections:
- Background: should include a description of your question (can add a figure)
- Goal and Research Plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the references of papers/databases/tools used for your project

Examples of posters will be presented in class