Identification of Circadian Clock Genes by Data Mining Microarray Data

The Identification of Circadian Clock Genes by Data Mining Microarray Data

Atreyi Banerjee and Martin Hunt

The University of Leicester

June 27, 2008

Outline

• Introduction

• How to find circadian clock genes

• Promoter Analysis

What is a circadian rhythm?

• Circadian: circa (about) + dies (a day)

• A circadian rhythm is a self-sustained cycle with a period of about 24 hours that controls rest/activity, time awareness, photosynthesis, etc.

• Common among eukaryotes (Neurospora, Drosophila, mammals)

• The term is reserved for living organisms (daily traffic congestion is not a circadian rhythm)

• Circannual: 1-year period (e.g. migration)

Circadian rhythm properties

• Circadian rhythm properties are conserved across the plant and animal kingdoms

• Basic properties of a circadian rhythm:

• Endogenous free-running period of about 24 hours

• Synchronization by external stimuli (entrainment)

• Period is largely unchanged by temperature

• Advantage: we can learn from studying simple organisms (Drosophila, Neurospora, mouse)

• Mechanisms are similar but the genes are different

• The main cycling genes: PER, TIM, CLK, CYC, BMAL

Drosophila

• Affymetrix GeneChip (DrosGenome1) assay

• Identifying circadian genes

• Clustering and heatmap

• Promoter analysis

Drosophila circadian oscillator

Circadian clock control in Drosophila

ADD REFERENCE

Experimental design

• Drosophila entrained in a 12:12 hour light:dark (LD) cycle

• Then left in complete darkness and analysed every 4 hours

• The final dataset included replicates of 4 chips at CT0, CT4, CT8, CT12, CT16 and CT20

Promoter analysis

• Aim: to detect genes that share the same regulatory mechanism

• Extracting the 5’ untranslated region of the genes

• Finding the over-represented motifs in the sequences

• Finding the cis-regulatory modules (combinations of binding sites) in sets of co-expressed or co-regulated genes

• Getting the putative transcription factor binding sites (TFBS)

• Functional analysis
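The idea of an over-represented motif can be illustrated with a small Python sketch. It is not the TOUCAN workflow used in the talk: the function names, the toy sequences, and the choice of a hypergeometric test are all assumptions made for illustration, and the motif CACGTG is only an example string.

```python
from math import comb


def hypergeom_upper_tail(k, n_fg, n_hits_total, n_total):
    """P(X >= k) when n_fg sequences are drawn from n_total,
    of which n_hits_total contain the motif."""
    top = min(n_fg, n_hits_total)
    favourable = sum(
        comb(n_hits_total, i) * comb(n_total - n_hits_total, n_fg - i)
        for i in range(k, top + 1)
    )
    return favourable / comb(n_total, n_fg)


def motif_enrichment(motif, foreground, background):
    """Count sequences containing the motif in a co-expressed set
    (foreground) and a control set (background), and test enrichment."""
    fg_hits = sum(motif in seq for seq in foreground)
    bg_hits = sum(motif in seq for seq in background)
    n_total = len(foreground) + len(background)
    p_value = hypergeom_upper_tail(
        fg_hits, len(foreground), fg_hits + bg_hits, n_total
    )
    return fg_hits, bg_hits, p_value


# Toy data (hypothetical sequences, not from the talk's dataset)
toy_foreground = ["AACACGTGTT", "CACGTGAA", "TTCACGTG"]
toy_background = ["AAAAAA", "TTTTTT", "GGGGGG"]
print(motif_enrichment("CACGTG", toy_foreground, toy_background))
```

A real analysis would scan with position weight matrices rather than exact strings and would correct for testing many motifs; this sketch only shows the foreground-versus-background comparison.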

Effects of clock mutations on enhancers regulating circadian gene expression

Stempfl, T. et al. Genetics 2002;160:571-593

TOUCAN software

• An interactive Java display

• Maps genes onto the sequence set space

• Flexibility of using any identifier (Affy ID, EMBL, RefSeq, etc.)

• Performs statistical tests for finding regulatory sequences, selecting parts of sequences, and finding CpG islands in metazoan genomes

Predict instances of known motifs with MotifScanner

The significant motifs found in each cluster

Predict cis-regulatory modules with MotifSampler

The co-expression of Dorsal 2 and Myf showing

The cis-regulatory modules in each cluster

The cis-regulatory module in genes listed with p-values

Genscan output of cluster 1

List of unknown TFBS found in each cluster

de novo discovery of unknown TFBS

• The MotifSampler tool in TOUCAN was used to find unknown motifs, which could be novel transcription factor binding sites

• The 5’ UTR sequences were also extracted from Ensembl BioMart

• The over-represented TFBS were extracted from MATCH and OTFBS

• Dorsal 2 and Myf were over-represented modules

• ARNT, also found in cycle (an important clock gene), was located

• Genscan predicted genes in each cluster

Identifying circadian genes: an outline

Microarray experiment

↓

Data (spreadsheet)

↓

Process data in R

↓

Data analysis in R

↓

List of circadian genes

Identifying circadian genes: an outline

Four methods considered, all of which were implemented in R:

GeneCycle based

• The Fisher Method (Wichert et al. 2004)

• The Robust Method (Ahdesmaki et al. 2005)

“Sine wave” based

• The M&R Method (McDonald & Rosbash 2001)

• The Sine Method
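The slides do not spell out how the "sine wave" based methods work, so the following Python sketch shows only one plausible reading: fitting a sine wave with a fixed 24 hour period to a single time series by linear least squares. The function name `fit_24h_sine`, its parameters, and the use of NumPy are assumptions, and this is not claimed to be the M&R Method or the Sine Method from the talk.

```python
import numpy as np


def fit_24h_sine(times_h, values, period_h=24.0):
    """Least-squares fit of a*sin(wt) + b*cos(wt) + c with a fixed period.

    Returns (amplitude, peak_time_h, r_squared).
    """
    t = np.asarray(times_h, dtype=float)
    y = np.asarray(values, dtype=float)
    w = 2.0 * np.pi / period_h

    # Design matrix: sine term, cosine term, constant offset
    design = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    fitted = design @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    amplitude = float(np.hypot(coef[0], coef[1]))
    # a*sin + b*cos = A*cos(wt - phi) with phi = atan2(a, b); peak at t = phi / w
    peak_time_h = float((np.arctan2(coef[0], coef[1]) / w) % period_h)
    return amplitude, peak_time_h, r_squared


# Synthetic example: amplitude 3, peak at CT8, sampled at CT0..CT20
ct = np.arange(0, 24, 4)
synthetic = 3.0 * np.cos(2.0 * np.pi * (ct - 8.0) / 24.0) + 5.0
print(fit_24h_sine(ct, synthetic))
```

A gene could then be called rhythmic when the fit quality exceeds some threshold, but the choice of threshold is exactly the kind of detail on which methods differ.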

The Fisher Method

Implemented by the R package GeneCycle, based on Fourier methods and Fisher’s g test

Time series: CT0 = 1.2, CT4 = 4.9, CT8 = 9.5, CT12 = 0.4, CT16 = 1.5, CT20 = −42

→ Fisher’s g test → p-value = 0.3213

Repeat this process for each time series
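To make the step from one time series to one p-value concrete, here is a minimal Python sketch of Fisher's g test. It is an independent illustration rather than the GeneCycle code used in the talk: the function name `fisher_g_test`, the use of NumPy, and the choice to exclude the zero and Nyquist frequencies are assumptions, so it should not be expected to reproduce the p-value shown on the slide.

```python
import math

import numpy as np


def fisher_g_test(series):
    """Fisher's g statistic and exact p-value for one time series.

    g is the largest periodogram ordinate divided by the sum of all
    ordinates at the Fourier frequencies (zero and Nyquist excluded).
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    m = (n - 1) // 2  # number of Fourier frequencies used
    if m < 1:
        raise ValueError("need at least 3 time points")

    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    spec = power[1 : m + 1]
    total = spec.sum()
    if total == 0:
        raise ValueError("constant series has no periodogram")

    g = float(spec.max() / total)
    # Exact null distribution: alternating sum over j = 1..floor(1/g)
    upper = min(m, int(math.floor(1.0 / g)))
    p = sum(
        (-1) ** (j - 1) * math.comb(m, j) * (1.0 - j * g) ** (m - 1)
        for j in range(1, upper + 1)
    )
    return g, min(max(p, 0.0), 1.0)


# The example series from the slide (CT0 .. CT20)
slide_series = [1.2, 4.9, 9.5, 0.4, 1.5, -42.0]
print(fisher_g_test(slide_series))
```

In a full analysis the function would be called once per gene, giving one p-value per time series. With only six time points very few frequencies are available, so the test has little power, which is one reason to compare several methods.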

The Fisher Method: FDR

Oops! We have carried out over 6000 tests, so some small p-values will occur by chance alone. The solution: false discovery rate (FDR) control, implemented by the R package fdrtool

Definition

The FDR is the proportion of false positives we expect among the results we call significant

0.011, 0.021, 0.042, 0.045, 0.056, 0.065, 0.066, . . .
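As an illustration of FDR control, the sketch below computes Benjamini-Hochberg adjusted p-values in Python. This is a swapped-in technique: the talk used the R package fdrtool, which estimates FDR differently, and the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)

    # Scale each sorted p-value by n / rank
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working back from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]

    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical p-values for four genes
example_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example_p))
```

Genes whose adjusted value falls below a chosen level, say 0.05, would be reported, with the expectation that about that proportion of the reported genes are false positives.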

The Robust Method

Also implemented by the R package GeneCycle

The M&R Method

The Sine Method

Heatmap: The Fisher Method

heatmap of Fisher method

Heatmap: The Robust Method

The Numbers

How many genes are in common between methods, etc.

Fisher vs Sine Methods

What’s so different about them?

Conclusions

• Why only use sine waves as a model?

• Is FDR really better than multiple testing?

• Why use GeneCycle?

Conclusions

• All methods find some circadian clock genes

• . . . and some false positives

• Best approach: use many methods

• There is always a new, better method around the corner . . .