Protein Structure Database for Structural Genomics Group Jessica Lau December 13, 2004 M.S. Thesis Defense


Page 1

Protein Structure Database for Structural Genomics Group

Jessica Lau

December 13, 2004

M.S. Thesis Defense

Page 2

• Bioinformatics is
– Analysis of biological data: gene expression, DNA sequence, protein sequence.
– Data mining and management of biological information through database systems.

• At the Northeast Structural Genomics Consortium (NESG), database management systems play a large role in daily operation
– Data collection and mining of experimental results
– Track target progress: status milestones
– Exchange information with the rest of the world

• My thesis presents work in database management systems at the NESG.
– Part 1: ZebaView
– Part 2: Worm Structure Gallery
– Part 3: Prototype of NESG Structure Gallery

Page 3

• ZebaView is the official target list of the Northeast Structural Genomics Consortium

• Displays a summary table of NESG targets.
– Status milestones
– Protein properties: DNA and protein sequences, molecular weight, isoelectric point

• New targets are curated and then uploaded to SPiNE.

• 11,284 targets from 88 organisms.
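As a rough illustration of how a protein property such as molecular weight could be derived from a target's protein sequence, the Python sketch below sums standard average residue masses and adds one water for the free termini. The function name and the error handling are hypothetical; how ZebaView itself computes these values is not stated on the slide.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N- and C-terminal H and OH


def molecular_weight(sequence: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified chain."""
    sequence = sequence.strip().upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[r] for r in sequence) + WATER_MASS
```

The sketch ignores post-translational modifications and disulfide bonds, so it is only a first approximation for a summary table.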

Page 4

Family View

NESG Families

• Unfolded
• Membrane
• Core 50
• NF-κB

Page 5

Target Summary Statistics

[Chart: In PDB / Cloned, Prokaryotic vs. Eukaryotic. x-axis: Organism (H. sapiens (H), D. melanogaster (F), S. cerevisiae (Y), C. elegans (W)); y-axis: Percentage In PDB/Cloned, 0 to 35; series: Prokaryotic, Eukaryotic]

[Chart: Success of soluble targets: Prokaryotic vs. Eukaryotic. x-axis: Organism (D. melanogaster (F), S. cerevisiae (Y), H. sapiens (H), C. elegans (W)); y-axis: Percentage of Soluble/Cloned, 0 to 90; series: Prokaryotic, Eukaryotic]

Status milestones: Selected, Cloned, Expressed, Soluble, Purified, X-ray or NMR data collection, In PDB

• 4,418 targets cloned
• 141 structures
• 3.4% successful targets
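The ratios plotted on this slide (In PDB / Cloned, Soluble / Cloned) can be sketched as a milestone funnel. The Python below is a minimal illustration, assuming each target is stored with the furthest milestone it reached and that reaching a later milestone implies passing all earlier ones; the target IDs and data are made up, and the function names are hypothetical.

```python
from collections import Counter

# Milestone order as listed on the slide.
MILESTONES = [
    "Selected", "Cloned", "Expressed", "Soluble", "Purified",
    "X-ray or NMR data collection", "In PDB",
]


def count_reached(furthest_by_target):
    """Count targets that reached each milestone (cumulative funnel)."""
    rank = {name: i for i, name in enumerate(MILESTONES)}
    counts = Counter()
    for furthest in furthest_by_target.values():
        # A target counts toward its furthest milestone and every earlier one.
        for name in MILESTONES[: rank[furthest] + 1]:
            counts[name] += 1
    return {name: counts[name] for name in MILESTONES}


def percent(numerator, denominator):
    """Percentage, returning 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


# Hypothetical toy data: target ID -> furthest milestone reached.
toy = {
    "T1": "In PDB", "T2": "Soluble", "T3": "Cloned",
    "T4": "Selected", "T5": "Purified",
}
reached = count_reached(toy)
in_pdb_per_cloned = percent(reached["In PDB"], reached["Cloned"])
soluble_per_cloned = percent(reached["Soluble"], reached["Cloned"])
```

Grouping the same counts by organism or by prokaryotic versus eukaryotic origin would give the per-category bars shown in the charts.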

Page 6

GO, Cellular Localization, and SignalP

• Search for targets that have
– any of the three GO ontologies defined
– no GO ontologies defined at all

116 NESG structures do not have Molecular Function defined
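The two search modes above, plus the "missing Molecular Function" count, amount to simple filters over per-target annotation. The Python sketch below assumes each target is a dictionary with one list of GO terms per ontology; the field names, target IDs, and terms are hypothetical and do not reflect the actual SPiNE schema.

```python
# The three GO ontologies; the key names are an assumption for this sketch.
GO_ONTOLOGIES = ("molecular_function", "biological_process", "cellular_component")


def has_any_go(target):
    """True if at least one of the three ontologies has a term assigned."""
    return any(target.get(ontology) for ontology in GO_ONTOLOGIES)


def has_no_go(target):
    """True if none of the three ontologies has a term assigned."""
    return not has_any_go(target)


def missing_ontology(targets, ontology):
    """Targets with no term assigned in the given ontology."""
    return [t for t in targets if not t.get(ontology)]


# Hypothetical toy records.
targets = [
    {"id": "T1", "molecular_function": ["binding"], "biological_process": []},
    {"id": "T2", "cellular_component": ["membrane"]},
    {"id": "T3"},
]
annotated = [t["id"] for t in targets if has_any_go(t)]
unannotated = [t["id"] for t in targets if has_no_go(t)]
no_function = [t["id"] for t in missing_ontology(targets, "molecular_function")]
```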

Page 7

LOCTarget

• Secretory proteins require formation of disulfide bonds
• Oxidative folding is needed for proper native folding

• 2,132 “Extracellular” NESG targets

Bovine ribonuclease A has four disulfide bonds to stabilize its 3-D structure. Mahesh Narayan et al. (2000) Acc. Chem. Res., 33 (11), 805-812.

Page 8

SignalP

• mRNAs are translated into proteins that carry a signal peptide for cellular localization
• The signal peptide is cleaved once the protein reaches its destination

• SignalP predicts the cleavage site of the signal peptide
• Removal of the signal peptide gives the proper native fold

Lodish et al. Molecular Cell Biology 4th edition, Figure 7.1 (2000)
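Once a cleavage position has been predicted, deriving the mature sequence is a simple slice. The Python sketch below assumes the prediction is available as the number of residues in the signal peptide; it does not model SignalP's actual output format, and the function name and example sequence are made up for illustration.

```python
def mature_sequence(precursor: str, signal_length: int) -> str:
    """Return the sequence left after removing the first `signal_length` residues."""
    if not 0 <= signal_length <= len(precursor):
        raise ValueError("cleavage position lies outside the sequence")
    return precursor[signal_length:]


# Made-up precursor: a 14-residue signal peptide followed by 6 residues.
precursor = "MKKLLLAVAVSAFA" + "QDTPEK"
mature = mature_sequence(precursor, 14)
```

A construct designed from `mature` rather than `precursor` would omit the signal peptide.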

Page 9

Part 2 – Worm Structure Gallery

Page 10

Caenorhabditis elegans

– Widely studied model organism

• 2-3 week life span, small size (1.5 mm long), ease of laboratory cultivation, transparent body

• Small genome, yet has complex organ systems similar to those of higher organisms: digestive, excretory, neuromuscular, reproductive systems

Donald Riddle et al., C. elegans II (1997)

Altun ZF and Hall DH, Atlas of C. elegans Anatomy, WormAtlas (2002-2004)

Page 11

System Components

• 22,653 C. elegans proteins
– 42 experimentally determined
   • 4 are from NESG
– 24 homology models
   • 14 are from NESG
– 960 C. elegans proteins potentially modeled

• UniProt: Pfam domain, Gene name, ORF name
• PDB Coordinates
• Structure Validation Report
• Sequence similarities to proteins in PDB

Page 12

Protein Structure Validation Software

• Suite of quality validation software
– PROCHECK
   • Quality of experimental data
   • Distribution of φ, ψ angles in Ramachandran plot
– MolProbity Clashscore
   • Number of H atom clashes per 1,000 atoms

• Scored with respect to a set of scores from 129 high-resolution X-ray crystal structures
   • < 500 residues, resolution <= 1.80 Å, R-factor <= 0.25 and R-free <= 0.28

Bhattacharya, A. et al., to be published
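The two quantities on this slide can be written down directly: a clashscore normalized per 1,000 atoms, and a score expressed relative to a reference set. The Python sketch below is one plausible reading, not the validation suite's actual implementation. It assumes a standard Z-score against the sample mean and standard deviation of the reference scores, and it flips the sign so that a negative value means "worse than the reference set" when lower raw scores are better; that sign convention and all numbers are assumptions.

```python
from statistics import mean, stdev


def clashscore(n_clashes: int, n_atoms: int) -> float:
    """Number of clashes per 1,000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_clashes / n_atoms


def z_score(value: float, reference: list, higher_is_better: bool = False) -> float:
    """Z-score of `value` against a reference set of raw scores.

    Assumption: the sign is flipped when lower raw scores are better,
    so a negative result always means worse than the reference mean.
    """
    z = (value - mean(reference)) / stdev(reference)
    return z if higher_is_better else -z


# Hypothetical numbers, for illustration only.
reference_clashscores = [8.0, 10.0, 12.0]   # stand-in for the 129 structures
score = clashscore(n_clashes=30, n_atoms=2000)
z = z_score(score, reference_clashscores)
```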

Page 13

Homology Modeling Automatically (HOMA)

• Algorithm based on alignment between query and template sequences.
– Regions of conserved residues form a set of constraints for modeling

• Sequence identity of 40% or more

• Good quality template
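The 40% sequence identity requirement can be checked on a pairwise alignment before a template is accepted. The Python sketch below is a minimal illustration and not HOMA's own code: it assumes the alignment is given as two equal-length gapped strings and defines identity over aligned (non-gap) columns only, which is one of several common conventions. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip gapped columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def meets_identity_cutoff(aligned_query: str, aligned_template: str,
                          cutoff: float = 40.0) -> bool:
    """True if the pair reaches the identity cutoff (40% by default)."""
    return percent_identity(aligned_query, aligned_template) >= cutoff


# Made-up alignment: 5 identical residues out of 6 aligned columns.
identity = percent_identity("ACDE-FG", "ACDQAFG")
```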

Page 14

Bad alignment → Bad model

Page 15

Poor quality template → Poor quality model

Page 16

Quality scores of 3-D structures

[Chart: Quality Z-scores, Homology Models vs. Experimentally Determined Structures. x-axis: Procheck (all) z-score, -10 to 2; y-axis: MolProbity Clashscore z-score, -45 to 5; series: Homology Models, Experimentally Determined Structures]

Page 17

Search

• Search for C. elegans proteins in local database.

• Keyword: “Ubiquitin” in any field

Results:
72 C. elegans proteins
2 Experimentally determined structures
1 Homology model
11 Potential models

Results:
152 C. elegans proteins
2 Experimentally determined structures
1 Homology model
19 Potential models
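A keyword search "in any field" can be sketched as a case-insensitive substring match across every value of a record. The Python below is only an illustration of that idea; the record fields, ORF names, and descriptions are hypothetical, and the gallery's actual query logic is not described on the slide.

```python
def search_any_field(records, keyword):
    """Return records in which any field contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        record for record in records
        if any(needle in str(value).casefold() for value in record.values())
    ]


# Hypothetical toy records.
proteins = [
    {"orf_name": "ORF-A", "gene_name": "gene-a", "description": "Ubiquitin-like protein"},
    {"orf_name": "ORF-B", "gene_name": "gene-b", "pfam": "ubiquitin family"},
    {"orf_name": "ORF-C", "gene_name": "gene-c", "description": "protein kinase"},
]
hits = search_any_field(proteins, "Ubiquitin")
hit_names = [p["orf_name"] for p in hits]
```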

Page 18

System Architecture

• Java, Tomcat, MySQL, Perl.

Three-tier architecture

• Client: Web browser

• Application: JSP, Logic components, Data access components

• Data: MySQL
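The separation between the three tiers can be sketched in a few functions. The example below is written in Python with an in-memory SQLite database purely as a stand-in: the actual system uses JSP and Java components on Tomcat with MySQL, and the table layout, function names, and sample rows here are all hypothetical.

```python
import sqlite3
from html import escape


# Data tier: SQLite stands in for MySQL in this sketch.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (orf_name TEXT PRIMARY KEY, description TEXT)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [("ORF-A", "hypothetical ubiquitin-like protein"),
         ("ORF-B", "hypothetical kinase")],
    )
    return conn


# Application tier, data access component: the only code that issues SQL.
def find_proteins(conn, keyword):
    rows = conn.execute(
        "SELECT orf_name, description FROM protein "
        "WHERE description LIKE ? ORDER BY orf_name",
        (f"%{keyword}%",),  # parameterized to avoid SQL injection
    )
    return rows.fetchall()


# Application tier, logic component: validates input and shapes results.
def search(conn, keyword):
    keyword = keyword.strip()
    if not keyword:
        return []
    return [{"orf_name": n, "description": d} for n, d in find_proteins(conn, keyword)]


# Presentation (the role a JSP page plays): renders HTML for the browser client.
def render(results):
    items = "".join(
        f"<li>{escape(r['orf_name'])}: {escape(r['description'])}</li>" for r in results
    )
    return f"<ul>{items}</ul>"


conn = open_database()
page = render(search(conn, "Ubiquitin"))
```

Keeping SQL inside the data access function means the logic and presentation code would not need to change if the storage backend did.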

Page 19

Part 3 – NESG Structure Gallery

Page 20

• Structure files submitted by individual groups

• Structure information is entered into SPiNE manually

• Manually run PSVS and MolScript

• Structure files submitted by automated pipeline

• ADIT integrated with SPiNE for uniform format

• PSVS and images automatically generated

• Structure information from PSVS directly into SPiNE

• Archives structure files.

Page 21

• Downloads
– Structure Validation Report
– Structure related files
   • Atomic coordinates
   • NMR constraints
   • NMR peak lists
   • Chemical shifts
   • Structure factor

• Annotation
– Functional annotation provided by other NESG members
– UniProt
– PDB coordinates file

• Reusing Java components from Worm Structure Gallery

Page 22

– Enhance ZebaView performance to handle increased load and functionalities

– Integrate annotation from other protein and structure databases.

– Make modules available for other Java-based applications within structural genomics.

– Develop a gallery for other organisms: yeast, fruit fly, human

– Continue specifications for the new NESG Structure Gallery

Page 23

Advisor: Dr. Gaetano Montelione

Thanks to everyone at the Protein NMR lab and NESG!

Aneerban Bhattacharya
John Everett

All the scientists who solved the structures!