prediction of proteins that participate in learning process by machine learning


Post on 12-Jan-2016


Dan Evron, Miri Michaeli

Project Advisors: Dr. Gal Chechik, Ossnat Bar Shira


Biological Background

• A synapse is a junction between 2 neurons.

• How does synaptic transmission work?


Hebbian theory

Donald Hebb:

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased."


Synaptic Plasticity

• Synaptic plasticity is the ability of a synapse to change in strength through molecular alterations.

• What kind of alterations happen during synaptic plasticity?


Synaptic Plasticity changes

• Presynaptic release probability.

• The number of postsynaptic receptors.

• Properties of postsynaptic receptors.

Examples:

• Change in the probability of glutamate release.

• Insertion or removal of postsynaptic AMPA receptors.

• Phosphorylation and dephosphorylation inducing a change in AMPA receptor conductance.


What is the connection to learning and memory?

Synaptic plasticity is one of the important neurochemical foundations of learning and memory.


Learning in Aplysia

• Habituation

• Sensitization

• Classical conditioning

• All are found in the gill withdrawal reflex!

• Kandel’s work connects organism-level learning to cellular-level learning!


And what about us?

• In mammals:

– Many of the pathways are far from understood.

– The nervous system is much bigger and more complex.

– Research shows that many principles are the same (LTP/LTD in the hippocampus).


Project Idea & Goal

Biological research has found many proteins which are connected to biological pathways involved in learning in the neuron and synapse. Yet, pathways are far from understood and many components are missing.

Our goal is to find candidate proteins that may take part in these pathways and have not been discovered yet.


How will we do that?

1. Collect numerical data on organism proteins.

2. Collect ontologies about synaptic plasticity.

3. Label each gene as related / non-related to synaptic ontologies (according to the data).

4. Use an SVM as the classifier.

5. Search for false positive genes in the results.

6. Publish a great article and win a Nobel prize! (or just dream about it…)


Our research organism is… Mus musculus

AKA...

The house mouse!


Tools & Databases

GEO (Gene Expression Omnibus)

MGI (Mouse Genome Informatics)

GO (Gene Ontology)

MPPDB (Mouse Protein-Protein Interaction Database)

SynDB (Synapse Database)


Tools & Databases

• Classifier: SVM (Support Vector Machine)


The project had 2 main phases:

• Phase 1:
– Work only on PPI data
– Create a baseline for further work

• Phase 2:
– Increase our PPI data
– Another data type: gene expression
– Combine the PPI and GE data
– Try to improve prediction!


Phase 1:

• Extract PPI data from BioGRID

• Label the matrix for each ontology

• Perform SVM algorithm on the sets

• Calculate baseline


Phase 1 - results

• Most ontologies had only a few related genes, which is problematic.

• Baseline:

[Figure: bar chart "baseline SVM prediction" (scale 0 to 90) per ontology. Ontologies shown: endoplasmic reticulum, ion channel activity, G protein coupled receptor protein signaling pathway.]


Phase 2

Will another type of data improve the results?

Gene expression


Step 1 - extracting data

– Representative set of mouse proteins from MGI.

– Gene Expression data from experiments related to synaptic and neuronal learning.

– Mouse protein-protein interaction (PPI) data from several databases.

– Gene ontologies from GO.

– Synaptic ontologies from SynDB.


Step 2 – processing data

• Each gene expression data set comes in a separate file; the files need to be combined.

• Normalize the gene expression data.

• Create the PPI matrix.

• Convert the PPI proteins to genes.
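The processing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the slides do not say which normalization was used, so a per-gene z-score is assumed here, and the gene names, protein IDs and the `protein_to_gene` mapping are hypothetical toy values.

```python
from statistics import mean, pstdev

def zscore_profiles(expression):
    """Normalize each gene's expression profile to mean 0 and unit variance (assumed method)."""
    normalized = {}
    for gene, values in expression.items():
        mu, sigma = mean(values), pstdev(values)
        # A flat profile has zero variance; map it to zeros instead of dividing by 0.
        normalized[gene] = [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    return normalized

def proteins_to_genes(pairs, protein_to_gene):
    """Translate protein-level pairs to gene-level pairs, dropping unmapped proteins."""
    return [
        (protein_to_gene[p], protein_to_gene[q])
        for p, q in pairs
        if p in protein_to_gene and q in protein_to_gene
    ]

def ppi_matrix(gene_pairs, genes):
    """Build a symmetric 0/1 interaction matrix over a fixed gene order."""
    index = {g: i for i, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for a, b in gene_pairs:
        if a in index and b in index:
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return matrix

# Hypothetical toy inputs.
expression = {"geneA": [1.0, 2.0, 3.0], "geneB": [5.0, 5.0, 5.0]}
protein_to_gene = {"P1": "geneA", "P2": "geneB", "P3": "geneC"}
protein_pairs = [("P1", "P3"), ("P2", "P9")]  # P9 has no gene mapping

normalized = zscore_profiles(expression)
gene_pairs = proteins_to_genes(protein_pairs, protein_to_gene)
matrix = ppi_matrix(gene_pairs, ["geneA", "geneB", "geneC"])
print(normalized)
print(matrix)
```

Each row of the resulting matrix can then serve as the PPI feature vector of one gene.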


Step 3 - combine the data

According to the list of genes:
– A matrix that combines PPI & GE, where each gene has at least one data type (“union”).
– A matrix that combines PPI & GE, where each gene has both data types (“intersect”).
– PPI matrices from the two matrices mentioned above.
– GE matrices from the two matrices mentioned above.
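A small sketch of the "union" and "intersect" combinations, under stated assumptions: features are stored as one vector per gene, and in the union case a missing data type is filled with zeros. The slides do not say how missing data was handled, so the zero fill and the toy gene names are illustrative only.

```python
def combine(ge, ppi, mode):
    """Concatenate GE and PPI feature vectors per gene.

    mode="union": keep genes with at least one data type (missing part zero-filled).
    mode="intersect": keep only genes that have both data types.
    """
    ge_width = len(next(iter(ge.values())))
    ppi_width = len(next(iter(ppi.values())))
    if mode == "union":
        genes = sorted(set(ge) | set(ppi))
    elif mode == "intersect":
        genes = sorted(set(ge) & set(ppi))
    else:
        raise ValueError("mode must be 'union' or 'intersect'")
    return {
        g: ge.get(g, [0.0] * ge_width) + ppi.get(g, [0.0] * ppi_width)
        for g in genes
    }

# Hypothetical toy inputs: geneA has only GE, geneC has only PPI, geneB has both.
ge = {"geneA": [0.1, 0.2], "geneB": [0.3, 0.4]}
ppi = {"geneB": [1.0, 0.0, 1.0], "geneC": [0.0, 1.0, 0.0]}

union = combine(ge, ppi, "union")
intersect = combine(ge, ppi, "intersect")
print(sorted(union))      # genes in the union matrix
print(sorted(intersect))  # genes in the intersect matrix
```

The PPI-only and GE-only matrices for each gene list are then just the corresponding column slices of these combined vectors.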


Step 4 - labeling the data

• For each set and each ontology, we labeled the genes (related / non-related).
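The labeling step might look like the sketch below. It assumes the ontology annotations are available as a mapping from gene to a set of terms; that layout, the gene names and the +1 / -1 encoding are assumptions made for illustration.

```python
def label_genes(genes, annotations, term):
    """Return +1 for genes annotated with `term` (related), -1 otherwise (non-related)."""
    return {g: (1 if term in annotations.get(g, set()) else -1) for g in genes}

# Hypothetical toy annotations.
annotations = {
    "geneA": {"ion channel activity", "synaptic vesicle"},
    "geneB": {"endosome"},
}
genes = ["geneA", "geneB", "geneC"]  # geneC has no annotation at all

labels = label_genes(genes, annotations, "ion channel activity")
n_related = sum(1 for v in labels.values() if v == 1)
print(labels, n_related)
```

Counting the related genes per ontology, as in the last line, is a quick way to spot ontologies with too few positives to train on.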


Step 5 - perform SVM algorithm on the sets
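The slides do not state which SVM implementation, kernel or parameters were used. Purely as an illustration of the idea, the sketch below trains a linear SVM (hinge loss with an L2 penalty) by plain sub-gradient descent; `lam`, `eta`, `epochs` and the toy data are hypothetical choices.

```python
def decision(w, b, x):
    """Signed score of x: positive means 'related', negative means 'non-related'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200):
    """Minimize lam/2 * |w|^2 + hinge loss with per-example sub-gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision(w, b, xi)
            # Gradient step on the L2 penalty: shrink the weights slightly.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Margin violated: move the separator toward classifying xi correctly.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

# Hypothetical, linearly separable toy features with labels +1 / -1.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

The value returned by `decision` is the score that later gets thresholded to draw the ROC curve and ranked to find false positive candidates.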


Step 6 - process the results

• Evaluate prediction success (AUC).

• Find potential false positive candidates.
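Finding false positive candidates can be sketched as below: genes labeled non-related that the classifier nevertheless scores as related, ranked by score. The threshold of 0, the score values and the gene names are hypothetical.

```python
def false_positive_candidates(scores, labels, threshold=0.0):
    """Genes labeled non-related (-1) whose classifier score exceeds the threshold."""
    hits = [(g, s) for g, s in scores.items() if labels[g] == -1 and s > threshold]
    # Highest-scoring candidates first.
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical classifier scores and ontology labels.
scores = {"geneA": 1.4, "geneB": 0.3, "geneC": -0.8, "geneD": 0.9}
labels = {"geneA": 1, "geneB": -1, "geneC": -1, "geneD": -1}

candidates = false_positive_candidates(scores, labels)
print(candidates)
```

These candidates are the interesting output of the project: genes not annotated to the ontology that look, to the classifier, like genes that are.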

So how did we do?

We have to build a ROC curve first.


What is ROC?

ROC = Receiver Operating Characteristic.

• Our SVM builds a ROC curve, a graphical plot of sensitivity (true positive rate) against 1 − specificity (false positive rate) as the decision threshold varies.

• After classification, the SVM run calculates the AUC of the resulting ROC curve.
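To make the ROC curve concrete, here is a minimal sketch that sweeps the decision threshold over classifier scores and records one (false positive rate, true positive rate) point per distinct score. The scores and labels are hypothetical toy values, not project results.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, threshold from high to low."""
    positives = sum(1 for l in labels if l == 1)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

# Hypothetical scores with labels +1 (related) / -1 (non-related).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, -1, 1, -1, -1]

points = roc_points(scores, labels)
print(points)
```

The curve always starts at (0, 0), where nothing is predicted as related, and ends at (1, 1), where everything is.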


What is AUC?

• AUC = Area Under the Curve.

• The AUC is a way to evaluate the learning model over all decision thresholds: it is the probability that a randomly chosen related gene gets a higher score than a randomly chosen non-related gene.

• The AUC ranges from 0 to 1. A value of 0.5 means the classifier ranks no better than tossing a coin, 1 indicates perfect ranking, and values below 0.5 are worse than chance.

• The AUC enables us to examine and compare SVM results.
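One way to compute the AUC directly from scores, sketched here with hypothetical toy values, is to count the fraction of (related, non-related) pairs in which the related gene gets the higher score, with ties counted as one half. This pairwise count equals the area under the ROC curve.

```python
def auc(scores, labels):
    """Fraction of (related, non-related) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores with labels +1 (related) / -1 (non-related).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, -1, 1, -1, -1]

print(auc(scores, labels))                  # 8 of 9 pairs are ranked correctly
print(auc([1.0, 1.0, 1.0, 1.0], [1, 1, -1, -1]))  # uninformative scores give 0.5
```

The double loop is quadratic in the number of genes, which is fine for a sketch; for large gene lists a rank-based formula gives the same value faster.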


Results

• Intersect of the data:
– The size of all 3 matrices is similar, which enables comparison.
– Average AUC:
  GE alone: 75%
  PPI alone: 63%
  GE + PPI: 75%


Results - intersect

[Figure: "Comparison of AUC in GE, PPI and GE+PPI". Bar chart of AUC (y-axis, 0.00 to 0.90) per GO term (x-axis) for three series: PPI, GE, GE + PPI. GO terms shown: endosome, mitochondrial inner membrane, G-protein coupled receptor activity, G-protein coupled receptor protein, ion channel activity, voltage-gated ion channel activity.]


Results

• Union of the data:
– Close to reality in number of genes (14K in matrices, 15K in representative list).
– Average AUC in GE alone = GE + PPI = 74%.
– The matrices size issue.
– PPI alone corresponded to different GO categories, so it cannot be compared.


Results - union

[Figure: "Comparison of AUC in GE and GE+PPI". Bar chart of AUC (y-axis, 0.00 to 1.00) per GO term (x-axis) for two series: GE, GE + PPI. GO terms shown: endoplasmic reticulum membrane, mitochondrial respiratory chain, calcium channel activity, extracellular ligand-gated ion channel activity, cation channel activity, synaptic vesicle, neurotransmitter receptor activity.]


Conclusions

• We can compare different types of data only within the “intersect” matrices.

• In intersect, the PPI data sets the size, so we have the same GO categories.

• In union, the GE size took over the PPI data, and that is the reason for the different GO categories (the GO categories in both PPI matrices are the same).

• PPI did not contribute to prediction!

(bad news…)


The good news…

• Still, an average AUC of 75% is a nice result!

• We found several false positive genes that may be related to synaptic plasticity and have not been discovered yet as such.

Examples:
– Neurogranin (NRGN)
– CADPS


Neurogranin (NRGN)

Acts as a "third messenger" substrate of protein kinase C-mediated molecular cascades during synaptic development and remodeling. Binds to calmodulin in the absence of calcium.


Ca++-dependent secretion activator (CADPS)

Calcium-binding protein involved in exocytosis of vesicles filled with neurotransmitters and neuropeptides. Probably acts upstream of fusion in the biogenesis or maintenance of mature secretory vesicles.


Next steps

• Computationally:
– Improve the classification by adding new types of data and/or by a different representation of the data.

• Biologically:
– Explore, through biological experiments, the proteins we have found (the FP list).