
Posted on 28-Dec-2015


From motif search to gene expression analysis

Finding TF targets using a bioinformatics approach

Scenario 1: The binding motif is known (easier case)

Scenario 2: The binding motif is unknown (hard case)

Are common motifs the right thing to search for?

Solutions:

- Search for motifs that are enriched in the set of target sequences but not in a random set

- Use experimental information to rank the sequences according to their binding affinity, and search for enriched motifs at the top of the list
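The first solution can be sketched as a one-sided hypergeometric test. Pool the target and background sequences, then ask whether the sequences that contain a candidate motif are over-represented in the target set. This is an illustrative assumption, not necessarily the statistic used in the course: matching is by exact substring (no reverse complement, no mismatches), the sequences are toy data, and `motif_enrichment` and `hypergeom_tail` are hypothetical helper names.

```python
from math import comb

def hypergeom_tail(b: int, N: int, B: int, n: int) -> float:
    """P(X >= b) when n sequences are drawn from N, of which B contain the motif."""
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return hits / comb(N, n)

def motif_enrichment(motif: str, target: list[str], background: list[str]) -> float:
    """One-sided p-value that `motif` occurs in more target sequences than expected
    if target and background sequences were exchangeable (exact substring match)."""
    pooled = target + background
    N = len(pooled)                        # all sequences
    B = sum(motif in s for s in pooled)    # sequences containing the motif
    n = len(target)                        # size of the target set
    b = sum(motif in s for s in target)    # target sequences containing the motif
    return hypergeom_tail(b, N, B, n)

# Hypothetical toy data: every target sequence contains CTGTGA, no background sequence does.
target = ["AACTGTGAT", "CTGTGAGG", "TTCTGTGA", "GCTGTGAC"]
background = ["AAAAAAA", "CCCCGGG", "TTTTACG", "GGGATCC"]
print(motif_enrichment("CTGTGA", target, background))  # 1/70, about 0.0143
```

A motif that appears only in the background gets a p-value of 1, so it is never reported as enriched.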

ChIP-Seq

Sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.

[Figure: ChIP-Seq regions ranked from best binders to weak binders]

Finding the p53 binding motif in a set of p53 target sequences which are ranked according to binding affinity

[Figure: ranked sequence list and candidate k-mers (e.g. CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC); CTGTGA recurs along the ranked list]

A word-search approach is used to search for enriched motifs in a ranked list.

Quantities used for each candidate motif:

- The total number of input sequences
- The number of sequences containing the motif
- The number of sequences at the top of the list
- The number of sequences containing the motif among the top sequences
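In symbols, let N, B, n and b be the four quantities listed above, in that order. These are our own labels, following the usual mHG notation. Let b_n be the number of motif-containing sequences among the top n. The chance of seeing at least b motif-containing sequences in the top n is a hypergeometric tail. The minimum hypergeometric (mHG) score then scans every cutoff n and keeps the smallest tail:

```latex
\mathrm{HGT}(b;\,N,B,n) = \sum_{i=b}^{\min(n,B)} \frac{\binom{B}{i}\binom{N-B}{n-i}}{\binom{N}{n}},
\qquad
\mathrm{mHG} = \min_{1 \le n < N} \mathrm{HGT}(b_n;\,N,B,n)
```

A small mHG value means the motif is concentrated at the top of the ranking for some cutoff. Turning it into a calibrated p-value requires a further correction for having scanned many cutoffs, which is not shown here.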

[Figure: occurrences of CTGTGA along the ranked sequence list]

The method uses the minimum hypergeometric (mHG) statistic to find enriched motifs.
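The ranked word search can be sketched as follows. The sketch makes these assumptions:

- The sequences are already sorted from best to weakest binder.
- Every DNA k-mer is a candidate word.
- A k-mer "occurs" in a sequence if it appears as an exact substring.

`hgt`, `mhg` and `word_search` are hypothetical helper names, not the API of any specific tool. The raw mHG score is used only to rank words; real tools also correct it for scanning many cutoffs.

```python
from itertools import product
from math import comb

def hgt(b: int, N: int, B: int, n: int) -> float:
    """Hypergeometric tail P(X >= b): top n of N ranked sequences, B of which carry the motif."""
    return sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1)) / comb(N, n)

def mhg(labels: list[int]) -> tuple[float, int]:
    """Minimum HGT over all cutoffs n = 1..N-1 of a ranked 0/1 occurrence list.
    Returns (mHG score, cutoff n at which the minimum is reached)."""
    N, B = len(labels), sum(labels)
    best, best_n, b = 1.0, 0, 0
    for n in range(1, N):
        b += labels[n - 1]              # motif-containing sequences among the top n
        p = hgt(b, N, B, n)
        if p < best:
            best, best_n = p, n
    return best, best_n

def word_search(ranked_seqs: list[str], k: int) -> list[tuple[str, tuple[float, int]]]:
    """Score every DNA k-mer that occurs in the list; smallest (best) mHG first."""
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        labels = [int(kmer in s) for s in ranked_seqs]
        if any(labels):
            scores[kmer] = mhg(labels)
    return sorted(scores.items(), key=lambda item: item[1][0])

# Hypothetical ranked list, strongest binders first.
ranked = ["ACTGTGAT", "GCTGTGAA", "TCTGTGAC", "AAAATTTT", "GGGGCCCC", "TTTTAAAA"]
print(word_search(ranked, 6)[0])  # ('CTGTGA', (0.05, 3))
```

Enumerating all 4^k words is fine for k = 6 (4096 words). Longer motifs would need a smarter candidate set.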

The enriched motifs are combined into a PSSM (position-specific scoring matrix) that represents the binding motif.
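One simple way to combine enriched k-mers into a PSSM is sketched below. It assumes the k-mers are already aligned and of equal length, uses a pseudocount of 1 and a uniform background of 0.25 per base, and computes log2 odds. Real tools merge and align similar words in more elaborate ways. `pssm_from_kmers` and `score_site` are hypothetical names.

```python
from math import log2

ALPHABET = "ACGT"

def pssm_from_kmers(kmers: list[str], pseudocount: float = 1.0,
                    background: float = 0.25) -> list[dict[str, float]]:
    """Log-odds PSSM from equal-length k-mers that are assumed to be already aligned.
    Entry = log2(frequency with pseudocounts / uniform background frequency)."""
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("all k-mers must have the same length")
    matrix = []
    for pos in range(length):
        column = [k[pos] for k in kmers]
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({base: log2((column.count(base) + pseudocount) / total / background)
                       for base in ALPHABET})
    return matrix

def score_site(matrix: list[dict[str, float]], site: str) -> float:
    """Sum of per-position log-odds scores; higher means more motif-like."""
    return sum(column[base] for column, base in zip(matrix, site))

# k-mers taken from the candidate list above; weighting them equally is an assumption.
pssm = pssm_from_kmers(["CTGTGA", "CTGTGC", "ATGTGA", "CTGTGA"])
print(round(score_site(pssm, "CTGTGA"), 2), round(score_site(pssm, "ACGTAC"), 2))  # 7.29 0.64
```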

Protein Motifs

Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile:

P[ED]XK[RW][RK]X[ED]

or as a PWM (position weight matrix).
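A consensus such as P[ED]XK[RW][RK]X[ED] maps almost directly onto a regular expression. The sketch assumes X means "any amino acid" and a bracketed group means "one of these residues". It also assumes the input is a clean one-letter protein sequence, because `.` would match any character. `find_motif` is a hypothetical helper, and the protein sequence is made up.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Translate the consensus notation into a Python regular expression.
    Assumed notation: X = any amino acid, [ED] = one of the bracketed residues."""
    return consensus.replace("X", ".")

def find_motif(consensus: str, protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every hit, including overlapping hits."""
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]

# Hypothetical protein sequence containing one match (PEAKRRGD at position 2).
print(find_motif("P[ED]XK[RW][RK]X[ED]", "MSPEAKRRGDLL"))  # [(2, 'PEAKRRGD')]
```

The zero-width lookahead lets overlapping occurrences be reported. A plain `finditer` on the bare pattern would skip them.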

Gene Expression Analysis

Gene Expression

[Figure: DNA → RNA → protein]

[Figure: mRNAs of gene1, gene2 and gene3, each with a poly-A tail]

Studying Gene Expression 1987-2013

- Microarrays (the first high-throughput gene expression experiments)
- DNA chips
- RNA-seq (next-generation sequencing)

Classical versus modern technologies to study gene expression

Classical methods (spotted microarrays, DNA chips):
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes

Next-generation RNA sequencing:
- Does not require prior knowledge
- Good for discovering new transcripts

Experimental protocol: two-channel cDNA arrays

http://www.bio.davidson.edu/courses/genomics/chip/chip.html

One-channel DNA chips

• Each sequence is represented by a probe set, and the sample is labeled with a single fluorescent dye
• The target hybridizes only to complementary probes
• The fluorescence intensity indicates the expression level of the target sequence

Affymetrix Chip

RNA-seq


Clustering the data according to expression profiles.

[Figure: genes versus expression in different conditions]

WHY? What can we learn from the clusters?

• Identify gene function
– Similar expression can suggest similar function

• Diagnostics and therapy
– Differences in gene expression can indicate a disease state
– Genes whose expression changes in a disease can be good candidates for drug targets

A molecular signature of metastasis in primary solid tumors (Ramaswamy et al., 2003, Nat Genet 33:49-54)

Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues at different stages of the tumor were found. The arrow (in the figure) shows an example of tumor cells that were not detected correctly by histological or other clinical parameters.

HOW? Different clustering approaches

• Unsupervised
– Hierarchical clustering
– K-means

• Supervised methods
– Support Vector Machine (SVM)
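K-means is only named on the slide. Assuming the standard Lloyd's algorithm with Euclidean distance, it can be sketched as below. Each gene is a point whose coordinates are its expression values in the different conditions. The caller supplies the k starting centers. Real analyses typically use random or k-means++ starts and several restarts. The expression values are hypothetical.

```python
from math import dist

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means: alternate between assigning each point to its nearest center
    (Euclidean distance) and moving each center to the mean of its points."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                          for p in points]
        if new_assignment == assignment:   # converged: no point changed cluster
            break
        assignment = new_assignment
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # keep the old center if a cluster is empty
                centers[j] = [sum(values) / len(members) for values in zip(*members)]
    return assignment, centers

# Hypothetical expression profiles of six genes in two conditions.
genes = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(genes, initial_centers=[genes[0], genes[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Unlike hierarchical clustering, k-means needs the number of clusters in advance and returns a flat partition rather than a tree.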

Clustering

Clustering organizes things that are close into groups.

- What does it mean for two genes to be close?

- Once we know this, how do we define groups?

What does it mean for two genes to be close?


We need a mathematical definition of the distance between the expression patterns of two genes.

Gene 1: $E_1 = (E_{11}, E_{12}, \ldots, E_{1N})'$

Gene 2: $E_2 = (E_{21}, E_{22}, \ldots, E_{2N})'$

Euclidean distance: $d(E_1, E_2) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$

For example, the distance between gene 1 and gene 2:

[Figure: expression of gene 1 and gene 2 across 22 conditions (01-22)]
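The Euclidean distance formula translates directly into code. The vectors below are hypothetical. The standard library's `math.dist` (Python 3.8+) computes the same quantity.

```python
from math import sqrt

def euclidean_distance(gene1: list[float], gene2: list[float]) -> float:
    """sqrt(sum_i (E1i - E2i)^2) over the N conditions measured for both genes."""
    if len(gene1) != len(gene2):
        raise ValueError("both genes must be measured in the same N conditions")
    return sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```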

Clustering the genes according to expression

Generate a tree based on the distances between genes (similar to a phylogenetic tree).

Each gene is a leaf on the tree; distances reflect the similarity of the genes' expression patterns.

Hierarchical Clustering

[Figure: hierarchical clustering tree of genes by expression in different conditions, with a gene cluster marked]

Clustering the genes according to gene expression

Gene expression values:

GENE a:  1, -1,  1,  1,  1, -1, -1, -1
GENE b:  1,  1, -1,  1,  1,  1, -1,  1
GENE c:  1, -1,  1, -1,  1, -1, -1, -1
GENE d: -1,  1, -1,  1,  1,  1, -1, -1

Distance table: Euclidean distances* between genes

      a     b     c     d
a     0     4     2     4
b     4     0     4.47  2.83
c     2     4.47  0     4.47
d     4     2.83  4.47  0

* Can be calculated using different distance metrics.
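Here is a minimal sketch of hierarchical (agglomerative) clustering on genes a-d above. It first rebuilds the distance table (for example, D(b, d) = √8 ≈ 2.83). It then repeatedly merges the two closest clusters. Average linkage is an assumption: the distance between two clusters is the mean of the gene-to-gene distances. The slides do not say which linkage was used, and single or complete linkage would be equally valid choices.

```python
from itertools import combinations
from math import dist

# Expression vectors of genes a-d from the example above.
GENES = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}

def distance_table(genes):
    """Euclidean distance for every pair of genes, keyed by frozenset({name1, name2})."""
    return {frozenset(pair): dist(genes[pair[0]], genes[pair[1]])
            for pair in combinations(genes, 2)}

def cluster_distance(table, c1, c2):
    """Average linkage: mean distance over all gene pairs across the two clusters."""
    pairs = [table[frozenset((g1, g2))] for g1 in c1 for g2 in c2]
    return sum(pairs) / len(pairs)

def hierarchical_clustering(genes):
    """Agglomerative clustering: repeatedly merge the two closest clusters until one is left.
    Returns the merges in order as (cluster1, cluster2, merge distance)."""
    table = distance_table(genes)
    clusters = [(name,) for name in genes]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(table, *pair))
        merges.append((c1, c2, cluster_distance(table, c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

for c1, c2, d in hierarchical_clustering(GENES):
    print(c1, "+", c2, "at", round(d, 2))
# ('a',) + ('c',) at 2.0
# ('b',) + ('d',) at 2.83
# ('a', 'c') + ('b', 'd') at 4.24
```

The merge order (a with c, then b with d, then everything) is the tree that would be drawn for these four genes.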


Analyzing the clusters of genes

[Figure: gene clusters 2, 3 and 4]

What can we learn from clusters with similar gene expression?

EXAMPLE: hnRNP A1 and SRp40

hnRNP A1 and SRp40 are not clear homologs based on BLAST E-value, but they have very similar gene expression patterns across different tissues.

Are hnRNP A1 and SRp40 functional homologs?

[Figure: splicing factors (SF), including SRp40 and hnRNP A1]

YES!

What else can we learn from clusters with similar gene expression?

• Similar expression between genes may indicate that:
– The genes have similar functions
– One gene controls the other
– All the genes are controlled by a common regulatory gene

How can gene expression help in diagnostics?

Different patients (BRCA1 or BRCA2)

RESEARCH QUESTION

Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?

HERE we want to cluster the patients, not the genes!

[Figure: genes versus patients]

How can gene expression be applied to diagnostics?

5 breast cancer patients:

        Patient 1  Patient 2  Patient 3  Patient 4  Patient 5
Gen1    +          -          -          +          +
Gen2    +          +          -          +          -
Gen3    -          +          +          +          -
Gen4    +          +          +          -          -
Gen5    -          -          +          -          +

The same data with patients and genes reordered:

        Patient 1  Patient 2  Patient 4  Patient 3  Patient 5
Gen1    +          -          +          -          +
Gen3    -          +          +          +          -
Gen4    +          +          -          +          -
Gen2    +          +          +          -          -
Gen5    -          -          -          +          +

[Figure labels: informative genes; BRCA1 and BRCA2 patient groups]

Two-way clustering = clustering both the patients and the genes
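The reordering above can be reproduced with a deliberately simple toy rule. This rule is our assumption, not a published biclustering algorithm:

- Each gene's +/- pattern splits the patients into two groups.
- The split shared by the most genes defines the patient clusters.
- The genes that produce that split are the informative genes.

The data are copied from the 5-gene x 5-patient example above. `two_way_cluster` is a hypothetical name, and the code does not decide which group is BRCA1 and which is BRCA2.

```python
from collections import Counter

# Expression signs (+1 / -1) from the 5-gene x 5-patient example above.
PATIENTS = ["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5"]
EXPRESSION = {
    "Gen1": [+1, -1, -1, +1, +1],
    "Gen2": [+1, +1, -1, +1, -1],
    "Gen3": [-1, +1, +1, +1, -1],
    "Gen4": [+1, +1, +1, -1, -1],
    "Gen5": [-1, -1, +1, -1, +1],
}

def split_of(values):
    """Two-group partition of patient indices induced by one gene's +/- pattern.
    A pattern and its mirror image (all signs flipped) give the same split."""
    plus = frozenset(i for i, v in enumerate(values) if v > 0)
    minus = frozenset(range(len(values))) - plus
    return frozenset({plus, minus})

def two_way_cluster(expression, patients):
    """Pick the patient split shared by the most genes; those genes are 'informative'."""
    splits = {gene: split_of(values) for gene, values in expression.items()}
    best_split, _ = Counter(splits.values()).most_common(1)[0]
    informative = sorted(g for g, s in splits.items() if s == best_split)
    groups = sorted(sorted(patients[i] for i in group) for group in best_split)
    return informative, groups

informative, groups = two_way_cluster(EXPRESSION, PATIENTS)
print(informative)  # ['Gen2', 'Gen5']
print(groups)       # [['Patient 1', 'Patient 2', 'Patient 4'], ['Patient 3', 'Patient 5']]
```

Gen2 and Gen5 have mirror-image patterns, so they induce the same split: Patients 1, 2 and 4 versus Patients 3 and 5. This matches the column order of the reordered table.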

Supervised approaches for diagnostics based on expression data

Support Vector Machine (SVM)

• An SVM would begin with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots).

Each dot represents a vector of the expression pattern taken from the microarray experiment of a patient.

How do SVMs work with expression data? The SVM is trained on data that were classified based on histology.

[Figure: an unclassified sample, marked "?"]

After training the SVM to separate the BRCA1 tumors from the BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor for which we have the equivalent expression data.
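The train-then-diagnose workflow can be sketched with a linear soft-margin SVM written from scratch. It uses plain full-batch sub-gradient descent on the hinge loss. In practice one would more likely use a library implementation such as scikit-learn's `SVC`. The expression values, the three genes, the labels and the hyperparameters (`lam`, `lr`, `epochs`) are all hypothetical, chosen only to give a small separable example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM trained by full-batch sub-gradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w.x_i + b)).  Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]           # gradient of the regularizer
        grad_b = 0.0                              # the bias is not regularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                        # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """+1 or -1 depending on which side of the hyperplane w.x + b = 0 the sample falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical training data: expression of 3 genes per patient; +1 = BRCA1, -1 = BRCA2.
X_TRAIN = [[2.0, 0.5, 1.0], [1.8, 0.2, 0.9], [2.2, 0.4, 1.1],
           [0.3, 1.9, 1.0], [0.5, 2.1, 0.8], [0.2, 1.7, 1.2]]
Y_TRAIN = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_TRAIN, Y_TRAIN)
print(predict(w, b, [1.9, 0.3, 1.0]))  # diagnose an unknown tumor
```

With real microarray data there are far more genes than patients. In that setting, a linear kernel with proper regularization and cross-validation matters more than the solver.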

Projects 2013-14

Key dates

- 12.12: Lists of suggested projects are published. You are highly encouraged to choose a project yourself or to find a relevant project that can help in your research.
- 9.1: Submission of a one-page project overview, including:
  - Title
  - Main question
  - Major tools you are planning to use to answer the question
- Final week: meetings on projects
- 12.3: Poster submission
- 19.3: Poster presentation

Instructions for the final project

Introduction to Bioinformatics 2013-14

2. Planning your research

After you have described the main question or questions of your project, you should carefully plan your next steps:

A. Make sure you understand the problem, and read the necessary background before proceeding.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding on the relevant tools to use in the first step. When running a tool, summarize the results and extract the information you need to answer your question. It is recommended to save the raw data for your records, but do not present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, summarize your findings and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.

3. Summarizing the final project in a poster (in pairs)

Prepare the poster in PPT, poster size 90-120 cm, including:
- Title of the project
- Names and affiliations of the students presenting

The poster should include 5 sections:
- Background: a description of your question (you can add a figure)
- Goal and research plan: describe the main objective and the research plan
- Results (main section): present your results in 3-4 figures, describe each figure (figure legends), and give a title to each result
- Conclusions: summarize the conclusions of your project in points
- References: list the references to papers/databases/tools used in your project

Examples of posters will be presented in class
