Page 1: Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Human Genome Center, Inst. Med. Sci., University of Tokyo Kenta Nakai

Prediction of Subcellular Localization of Proteins

~ Past, Present, and Future ~

Human Genome Center, Inst. Med. Sci.,

University of Tokyo

Kenta Nakai

Swiss-Prot 20 Years

Page 2:

20 Years Ago..

• I became a graduate student in Prof. Minoru Kanehisa’s lab

• I wanted to write a program that interprets the information encoded in DNA sequences

• But biology is full of exceptions

Page 3:

Diagnosis System of Bacterial Infections (MYCIN 1974)

Enter Information about the patient. (Name, Age, Sex, and Race)

Are there any positive cultures obtained from SALLY? …

Has SALLY recently had symptoms of persistent headache or other abnormal neurologic symptoms (dizziness, lethargy, etc.)? …

INFECTION-1 is MENINGITIS

  + <ITEM-1> MYCOBACTERIUM-TB [from clinical evidence only]

+ …

[REC-1] My preferred therapy recommendation is as follows:

  1) ETHAMBUTAL

  Dose: 1.28g (13.0 100mg-tablets) q24h PO for 60 days [calculated on basis of 25 mg/kg] then 770 mg (7.5 100mg-tablets) q24h PO ..

Page 4:

Knowledge Base for Automatic Reasoning

• Knowledge is represented as a collection of “if-then” rules, which are chained to make the system solve a realistic problem

Rule 123

If: the gram stain of the organism is negative

and: the aerobicity of the organism is anaerobic

and: the morphology of the organism is rod

then: the genus of the organism is bacteroides

with a certainty factor of 0.6

Working Memory

  Name: Sally

  Age: 42 years

  Sex: Female

Race: …
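
The rule and working memory above suggest how a single if-then rule might fire. The following is only a minimal sketch: the `Rule` class, the `apply_rule` helper, the organism facts, and the convention of scaling the rule's certainty factor by the weakest premise are illustrative assumptions, not MYCIN's actual code (MYCIN was written in Lisp and chained its rules backward from a goal).

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    premises: dict      # attribute -> required value
    conclusion: tuple   # (attribute, value)
    cf: float           # certainty factor attached to the rule


# Rule 123 from the slide, encoded as data.
RULE_123 = Rule(
    name="Rule 123",
    premises={
        "gram stain": "negative",
        "aerobicity": "anaerobic",
        "morphology": "rod",
    },
    conclusion=("genus", "bacteroides"),
    cf=0.6,
)


def apply_rule(rule, memory):
    """Return (attribute, value, cf) if every premise matches, else None.

    memory maps attribute -> (value, certainty). Assumption: the certainty
    of the conclusion is the rule's CF times the weakest premise certainty.
    """
    weakest = 1.0
    for attribute, required in rule.premises.items():
        value, certainty = memory.get(attribute, (None, 0.0))
        if value != required:
            return None
        weakest = min(weakest, certainty)
    attribute, value = rule.conclusion
    return attribute, value, rule.cf * weakest


# Hypothetical working memory for one organism (not taken from the slide).
organism = {
    "gram stain": ("negative", 1.0),
    "aerobicity": ("anaerobic", 1.0),
    "morphology": ("rod", 1.0),
}

result = apply_rule(RULE_123, organism)
if result is not None:
    # Store the new conclusion so that later rules could build on it.
    organism[result[0]] = (result[1], result[2])
```

Writing each conclusion back into the working memory is what lets rules be chained: a later rule can use "genus" as one of its premises.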

Page 5:

Expert Systems

Knowledge Base

Inference Engine

Page 6:

Sample Problem

Page 7:

Prediction of Subcellular Localization

Page 8:

Typical Sorting Signals

Signal Function            Example
Import into nucleus        -P-P-K-K-K-R-K-V-
Export from nucleus        -L-A-L-K-L-A-G-L-D-I-
Import into mitochondria   <-MLSLRQSIRFFKPATRTLCSSRYLL-
Import into plastid        <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG-
Import into peroxisomes    -S-K-L->
Import into ER             <-MMSFVSLLLVGILFWATEAEQLTKCEVFN-
Return to ER               -K-D-E-L->
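
As a rough illustration of how the short signals in this table could be checked by a program, the sketch below looks for exact copies of four of them. It assumes that the `->` marker denotes the C-terminus; the function name `find_sorting_signals` and the test sequences are hypothetical. Real sorting signals are degenerate, so exact string matching like this would miss most genuine cases.

```python
# Example signals copied from the table; real signals vary between proteins.
C_TERMINAL_SIGNALS = {
    "SKL": "import into peroxisomes",
    "KDEL": "return to ER",
}
INTERNAL_SIGNALS = {
    "PPKKKRKV": "import into nucleus",
    "LALKLAGLDI": "export from nucleus",
}


def find_sorting_signals(sequence):
    """Return the functions of all table signals found in the sequence."""
    seq = sequence.upper()
    hits = []
    # C-terminal signals must sit at the very end of the chain.
    for motif, function in C_TERMINAL_SIGNALS.items():
        if seq.endswith(motif):
            hits.append(function)
    # Internal signals may occur anywhere in the chain.
    for motif, function in INTERNAL_SIGNALS.items():
        if motif in seq:
            hits.append(function)
    return hits


# Hypothetical sequences, made up for illustration only.
print(find_sorting_signals("MAAAPPKKKRKVAAA"))
print(find_sorting_signals("MGGGKDEL"))
```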

Page 9:

Amino Acid Composition

• Another good clue for prediction

• Suited for machine learning

Outer membrane proteins and periplasmic proteins of Gram-negative bacteria
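
Amino acid composition is simple to compute, which is part of why it suits machine learning. A minimal sketch, assuming the 20 standard residues in alphabetical one-letter order; the function name `composition` is hypothetical and this is not the feature code of any PSORT program:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def composition(sequence):
    """Return the fraction of each standard amino acid as a 20-element list."""
    counts = Counter(sequence.upper())
    # Non-standard letters (e.g. X) are ignored in the total.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each protein becomes a fixed-length numeric vector regardless of its length, so any standard classifier can be trained on it.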

Page 10:

PSORT (I)

• Nakai & Kanehisa, 1991, 1992

• Expert system using about 100 “If-then” rules

[Figure: PSORT decision tree. Decision nodes: signal peptide, signal cleavage site, TMS, TMS in mature part, topology, apolar, MTS, NLS, SKL, KDEL, GPI, and the GY and KK motifs (specific signals). Site labels: ERM, PM, LSM, ERL, LSL, OT, MT, NC, PX, GG, CP, OM, IT, MX, IM.]

Page 11:

Papers and the web server

• Nakai & Kanehisa, Proteins 1991

– cited 295 times

• Nakai & Kanehisa, Genomics 1992

– cited 961 times

– 34 in 2006

• Web server since 1993

Page 12:

Limitations of PSORT

• Relatively low accuracy, possibly because of the complexity of the sorting mechanisms

• It is difficult to optimize the certainty parameters assigned to each rule

• It is tedious to update the knowledge base as the training data grow

Page 13:

PSORT II

• Nakai & Horton, 1997, 1999 (cited 638 times)

• Machine learning

• kNN (k-nearest neighbor) method

[Figure: k-nearest-neighbor classification of a query point Q with k = 3]
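
The k-nearest-neighbor idea can be sketched in a few lines. This is a generic illustration, not PSORT II's code: the two-dimensional feature vectors, the labels, and the use of Euclidean distance are assumptions, since the slide does not say which features or distance measure PSORT II uses.

```python
import math
from collections import Counter


def knn_predict(query, training, k=3):
    """Label a query point by majority vote among its k nearest neighbors.

    training is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two made-up features per protein.
training = [
    ((0.0, 0.0), "cytoplasm"),
    ((0.1, 0.2), "cytoplasm"),
    ((1.0, 1.0), "nucleus"),
    ((0.9, 1.1), "nucleus"),
    ((1.2, 0.8), "nucleus"),
]

print(knn_predict((1.0, 0.9), training, k=3))
```

Adding newly characterized proteins to `training` updates the predictor without any hand-tuning of rules or certainty parameters.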

Page 14:

iPSORT: Bannai et al. 2002

Rule 1

A protein has an SP if the sum of hydropathy index values within [6,25] exceeds 18.3

Rule 2

A protein has either an mTP or a cTP if it contains fewer than 3 D/E residues within [1,30] and if it contains a motif similar to 11212111, where 2=(I,R),3=(D,E,H,K,N),1=otherwise

Rule 3

A protein has an mTP if it satisfies Rule 2, if the sum of isoelectric point values within [1,15] exceeds 93, and if it contains a motif similar to 12211221, where 2=(K,R),3=(I,P),1=otherwise
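
Rule 1 is simple enough to sketch directly. The slide does not say which hydropathy index is meant; the code below assumes the Kyte-Doolittle scale and reads [6,25] as 1-based, inclusive residue positions. The function name `has_signal_peptide` is hypothetical, and this is not the iPSORT implementation.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slide does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_signal_peptide(sequence, start=6, end=25, threshold=18.3):
    """Apply Rule 1: sum hydropathy over positions start..end (1-based)."""
    window = sequence.upper()[start - 1:end]
    # Unknown residues contribute 0 in this sketch.
    total = sum(KYTE_DOOLITTLE.get(residue, 0.0) for residue in window)
    return total > threshold


# The ER import example from the sorting-signal slide, and a made-up
# hydrophilic sequence for contrast.
print(has_signal_peptide("MMSFVSLLLVGILFWATEAEQLTKCEVFN"))
print(has_signal_peptide("M" + "D" * 30))
```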

Page 15:

PSORTb and PSORT.ORG

• Gardy et al. 2003, 2004

– Contribution from a Canadian group (Brinkman lab)

• Update for bacterial proteins

Page 16:

WoLF-PSORT

• Horton et al. 2006

• Latest PSORT update for eukaryotic proteins

• WoLF: Women only Love Fools!?

Page 17:

Current Dilemma

• More data are necessary to improve the training process

• The practical value of prediction methods decreases as experimental data grow

• Moreover, the more we investigate, the more exceptions we find

Page 18:

It’s a General Problem

• Gene Finding

• Prediction of Protein Structure

• …

• We come to know the answer to a problem before we learn how to solve it

Similarity search against the data of typical model organisms will be sufficient in many cases

Page 19:

New Generation Predictors

• Should be useful to engineer proteins for their targeting sites

• Should complement errors of proteome analyses (i.e., isoforms with differential localization)

• Comprehensively example-based rather than statistical feature-based (such as amino acid composition)

Page 20:

Biology is like Linguistics

• Both arose naturally and are full of exceptions

• There may not exist “general principles”

Page 21:

Future of Sequence Analysis

• It will become “DNA linguistics”

• Large dictionaries (databases) will contain both general cases and exceptions

• Such databases may be a sort of knowledge base that can be used to simulate the subcellular processes

Page 22:

Past, Present, and Future

• Past

– Expert system-based predictions

• Present

– Machine learning-based predictions

• Future

– Combination of both?

– Revival of knowledge bases to simulate cellular processes?

Page 23:

Acknowledgments

• Minoru Kanehisa

• Paul Horton

• Hideo Bannai, Satoru Miyano

• Jennifer Gardy, Fiona Brinkman

• And all the other people who contributed to the PSORT project!