About genewise regulation analysis with regression
Janne Nikkilä
Contents of the talk
• Biological background of gene expression regulation
• Previous work in statistical modeling of gene regulation
• Our motivation
• Our analysis approach
• Some results
• Discussion
Biological background for gene expression regulation
• Gene expression is regulated at several stages:
  – DNA unpacking (demethylation, histone acetylation)
  – Transcription
  – Alternative RNA splicing
  – mRNA degradation
  – Translation initiation
  – Protein processing and degradation
• Transcription is believed to be the most important of these stages
Biological background for gene transcription regulation
• During transcription, an mRNA molecule corresponding to the coding DNA sequence is formed
• Transcription initiation is mainly controlled by the binding of specific protein complexes, transcription factors (tfs), to the promoter region of the gene
• Tfs may enhance transcription, suppress it, or do both
• Because tfs are proteins, which are themselves coded by genes, tf activities can be analyzed by studying the expression of the genes that code the tfs
Analysis methods used in the literature
• Modelling of gene interactions by:
  – Boolean networks
  – Differential equations
  – Linear regression
  – Clustering
  – Probabilistic models (e.g. Bayesian networks)
Our motivation
• None of the previous methods seems to work adequately
• This may be due to the methods, to the quality of the data, or to the combined effect of the two
➔ We wish to find some evidence that gene regulation mechanisms can be inferred from gene expression data
A simple approach
• Study the expression of one gene at a time and try to explain it with a weighted sum of the activities of a set of candidate transcription factors
  – Intuitive interpretation of the setup and the results
• Use regression as the model
  – Easy to interpret, computationally feasible
• Evaluate the results statistically
  – Somewhat quantitative interpretation of the results
Data
• Expression data
  – 300 different knockout mutations of yeast (300 arrays)
  – Over 6000 yeast genes on each cDNA array
• Binding data
  – Binding activity of 147 transcription factors to all yeast genes (147 arrays)
  – Roughly the same genes on the arrays as above
  – Used to choose a set of candidate tfs for each gene
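The slides say the binding data are used to choose candidate tfs for each gene, but not how. A minimal sketch, assuming the binding activity of each tf-gene pair is summarized as a p-value and a tf becomes a candidate when that p-value falls below a cutoff. The function name `candidate_tfs`, the cutoff of 0.001 and the gene and tf names are all hypothetical; if binding were recorded as an enrichment ratio instead, the comparison would simply be flipped.

```python
from typing import Dict, List

# tf name -> {gene name -> binding p-value}; the p-value encoding is an assumption.
BindingTable = Dict[str, Dict[str, float]]


def candidate_tfs(binding: BindingTable, gene: str, cutoff: float = 0.001) -> List[str]:
    """Return the tfs whose binding p-value for `gene` is below `cutoff`.

    The cutoff of 0.001 is a hypothetical choice, not taken from the talk.
    """
    hits = []
    for tf, per_gene in binding.items():
        p = per_gene.get(gene)
        # A gene missing from a tf's binding array is skipped for that tf.
        if p is not None and p < cutoff:
            hits.append(tf)
    # Strongest binding evidence (smallest p-value) first.
    return sorted(hits, key=lambda tf: binding[tf][gene])


# Hypothetical toy table: three tfs, two genes.
binding = {
    "TF_A": {"GENE_1": 1e-5, "GENE_2": 0.20},
    "TF_B": {"GENE_1": 0.03, "GENE_2": 1e-4},
    "TF_C": {"GENE_1": 5e-4},
}
print(candidate_tfs(binding, "GENE_1"))  # ['TF_A', 'TF_C']
```

The resulting list is what a per-gene regression would then use as its explanatory variables.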
Preprocessing of the data
• The standard quantity provided by cDNA arrays is the log-ratio of the sample and control intensities of each spot
  – Ratios against a control may hinder the discovery of normal regulation mechanisms
• Could the sample and control log-intensities be used separately?
  – Not possible because of spotwise variation
➔ Only the arraywise and genewise averages were removed, and the standard log-ratios were used in the analysis
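Removing the arraywise and genewise averages amounts to centering the log-ratio matrix along both axes. A minimal plain-Python sketch, assuming a complete genes x arrays matrix with no missing spots (real cDNA data usually need missing-value handling first); the name `remove_array_and_gene_means` is hypothetical. Array means are removed first and gene means second; for a complete matrix this single pass leaves every row and every column with zero mean.

```python
from statistics import fmean
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = arrays


def remove_array_and_gene_means(log_ratios: Matrix) -> Matrix:
    """Center a complete genes x arrays log-ratio matrix along both axes."""
    n_arrays = len(log_ratios[0])
    # Arraywise averages: one mean per column.
    array_means = [fmean(row[j] for row in log_ratios) for j in range(n_arrays)]
    centered = [[x - array_means[j] for j, x in enumerate(row)] for row in log_ratios]
    # Genewise averages: one mean per row of the array-centered matrix.
    result = []
    for row in centered:
        gene_mean = fmean(row)
        result.append([x - gene_mean for x in row])
    return result


print(remove_array_and_gene_means([[1.0, 2.0, 3.0], [3.0, 4.0, 8.0]]))
# [[0.5, 0.5, -1.0], [-0.5, -0.5, 1.0]]
```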
The regression model
• The expression of a gene, y, is modelled as a weighted sum of x, the expressions of a set of transcription factor genes, plus an error term e
• The error e is assumed to be normally distributed
• As a result, each transcription factor gene is assigned a regression coefficient, which denotes its role in the regulation of the gene
• The model was fitted with a robust fitting method
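Written out, the model for one gene is $y_t = \sum_j \beta_j x_{jt} + e_t$ over arrays $t$, where $x_{jt}$ is the expression of candidate tf $j$ on array $t$ and $\beta_j$ is its coefficient (this notation is ours, not the slide's). The talk only says that a robust fitting method was used, so the sketch below swaps in one common choice, Huber M-estimation by iteratively reweighted least squares (IRLS), with the tuning constant 1.345 and a MAD-based scale as assumed settings. No intercept is fitted, on the assumption that the genewise averages have already been removed. It returns coefficients only, not their significances, and all function names are hypothetical.

```python
from statistics import median
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_ls(X: Matrix, y: List[float], w: List[float]) -> List[float]:
    """Weighted least squares via the normal equations (X^T W X) beta = X^T W y."""
    p = len(X[0])
    xtwx = [[sum(wi * row[i] * row[j] for row, wi in zip(X, w)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * row[i] * yi for row, yi, wi in zip(X, y, w)) for i in range(p)]
    return solve(xtwx, xtwy)


def huber_irls(X: Matrix, y: List[float], k: float = 1.345,
               max_iter: int = 50, tol: float = 1e-8) -> List[float]:
    """Huber M-estimate of the coefficients by IRLS, starting from ordinary least squares."""
    beta = weighted_ls(X, y, [1.0] * len(y))
    for _ in range(max_iter):
        resid = [yi - sum(b * xj for b, xj in zip(beta, row)) for row, yi in zip(X, y)]
        # Robust scale: median absolute deviation, rescaled to match a normal sd.
        med = median(resid)
        scale = median(abs(r - med) for r in resid) / 0.6745
        if scale == 0.0:
            break  # MAD is zero (e.g. an exact fit to most arrays); stop reweighting
        # Huber weights: 1 for small residuals, k*scale/|r| for large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_beta = weighted_ls(X, y, w)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Hypothetical toy data: 13 arrays, 2 candidate tfs, y = 2*x1 - x2 exactly,
# except that the array with x1 = 0 is corrupted by +30.
X = [[float(i), float((i * 7) % 5 - 2)] for i in range(-6, 7)]
y = [2.0 * x1 - x2 for x1, x2 in X]
y[6] += 30.0
print(weighted_ls(X, y, [1.0] * len(y)))  # ordinary least squares, pulled off by the outlier
print(huber_irls(X, y))                    # close to [2.0, -1.0]
```

The downweighting of large residuals is what makes a single aberrant knockout array unable to drag the coefficients far from the bulk of the data, which is the usual motivation for a robust fit here.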
Statistical analysis of the results
• Data for a subset of nine genes: some confirmed transcription factors for each gene, the binding activities of 50 tfs, and the significances of the same 50 tfs in the regression model
• Tests:
  – Do the binding activities and the regression model produce the same kind of information about the roles of the tfs for each gene? -> no statistical significance
  – Are the confirmed tfs found among the most significant ones in either the binding data or the regression? -> no statistical significance
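The slides do not name the tests that were used. One plausible sketch, assuming (i) a Spearman rank correlation between the binding and regression significances of the tfs, with a permutation p-value, for the first question, and (ii) a one-sided hypergeometric test on how many confirmed tfs fall among the top-ranked tfs, for the second. The function names, the number of permutations and the toy scores are hypothetical.

```python
import random
from math import comb, sqrt
from statistics import fmean
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = fmean(ra), fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b)


def permutation_pvalue(a: Sequence[float], b: Sequence[float],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the Spearman correlation of a and b."""
    rng = random.Random(seed)
    observed = abs(spearman(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def overlap_pvalue(n_total: int, n_confirmed: int, n_top: int, n_overlap: int) -> float:
    """P(X >= n_overlap): chance that at least n_overlap of the n_confirmed tfs
    land among the n_top best-ranked of n_total tfs (hypergeometric tail)."""
    upper = min(n_confirmed, n_top)
    hits = sum(comb(n_confirmed, i) * comb(n_total - n_confirmed, n_top - i)
               for i in range(n_overlap, upper + 1))
    return hits / comb(n_total, n_top)


# Hypothetical significance scores for eight tfs of one gene.
binding_sig = [0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.5]
regression_sig = [0.2, 0.6, 0.1, 0.9, 0.4, 0.3, 0.8, 0.5]
print(spearman(binding_sig, regression_sig))
print(permutation_pvalue(binding_sig, regression_sig))
# 3 confirmed tfs among 50, 1 of them in the top 10:
print(overlap_pvalue(50, 3, 10, 1))  # about 0.496
```

A rank-based statistic avoids assuming any particular scale relation between binding p-values and regression significances, which is why it is used in this sketch.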
[Slides 12-15: figures only; no text was transcribed from these slides.]
Discussion
• No linear association between the regulator genes and the regulated genes could be found in this data set
• The biggest problem is perhaps the type of data: cDNA data without a time dimension -> switching to Affymetrix and/or time-series data might help
• Another problem may be the oversimplified model, but with this kind of data, statistical models of gene interactions seem to be fruitless