Presentation 2012 05_03

Ph.D. Day 2012/05/16. METAL PDB: A DATABASE OF METALLOPROTEINS. XXVI cycle of “International Doctorate in Mechanistic and Structural Systems Biology”. Serena Lorenzini, 2nd year Ph.D. student. Tutor: Claudia Andreini

Upload: serena-lorenzini

Post on 20-Jan-2017


Ph.D. Day
2012/05/16

METAL PDB
A DATABASE OF METALLOPROTEINS

XXVI cycle of “International Doctorate in Mechanistic and Structural Systems Biology”

Serena Lorenzini, 2nd year Ph.D. student

Tutor: Claudia Andreini

Biological Databases
Why?

1. Biology has increasingly turned into a data-rich science.

2. To make biological data available to scientists. A particular type of information should be available in one single place (book, site, database). Collecting data from the literature is TIME-CONSUMING!

3. To organize data in order to produce knowledge.

4. To make biological data available in computer-readable form. Analysis of biological data almost always involves computers. Having the data in computer-readable form is a necessary first step.

Biological Databases
Milestones

1. "Atlas of Protein Sequence and Structure" by Margaret Dayhoff and colleagues, 1965 (PIR database). 65 sequences.
2. Protein Data Bank (PDB), a joint effort between CCDC and BNL, 1971. 9 structures.
3. GenBank, December 1982. 606 sequences.

… Data require algorithms to be analyzed
4. The FASTA algorithm is published by Pearson and Lipman, 1985.
5. The BLAST program (Altschul et al.) is implemented, 1990.

... The 'omics era
6. The E. coli genome is published, 1997.

Today: 1783 biological databases (NAR database issue 2012 [1])

1. http://www.oxfordjournals.org/nar/database/cap/

Biological Databases and Metalloproteins
A troubled relationship

Not informative (lack of bio-inorganic background)
Out of date (difficult to update)

The problem: exceptional variability
Lack of a formal description for metals in proteins

Few dedicated resources (10 of 1783 databases found using the “metal” keyword in the NAR database issue).

At least indicative.

Solution
3D models of metal sites

Advantages:

Automatic extraction of 3D models

Formal description of features

Systematic organization of data

Easy update

Metal sites must be thought of as functional units composed of the metal and its LOCAL environment

Basis for database architecture
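
The idea of a metal site as the metal plus its local environment can be sketched as a simple distance filter. This is only an illustration, not the Metal PDB extraction procedure: the `Atom` record, the `extract_site` function, the residue labels and the 5 Å cutoff are all assumptions made for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str   # e.g. "ZN", "N", "O"
    residue: str   # hypothetical label format, e.g. "HIS94"
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the coordinates."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def extract_site(metal: Atom, atoms: list[Atom], cutoff: float = 5.0) -> list[str]:
    """Return the sorted labels of residues with at least one atom within
    `cutoff` of the metal. The cutoff value is an assumption."""
    residues = {
        a.residue
        for a in atoms
        if a is not metal and distance(metal, a) <= cutoff
    }
    return sorted(residues)


# Toy coordinates, invented for the example.
zn = Atom("ZN", "ZN1", 0.0, 0.0, 0.0)
atoms = [
    zn,
    Atom("N", "HIS94", 2.1, 0.0, 0.0),
    Atom("N", "HIS96", 0.0, 2.0, 0.0),
    Atom("N", "HIS119", 0.0, 0.0, 2.1),
    Atom("O", "GLU200", 9.0, 9.0, 9.0),  # hypothetical distant residue
]
site = extract_site(zn, atoms)
```

Here `site` holds only the three histidines, since the distant residue falls outside the cutoff.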

Metal PDB Architecture

First level: automatically filled
1- Information on the entire structures from multiple resources
2- Information on metal sites

Second level: manually filled
Functional information on metal sites

[diagram labels: PDB, Pfam, SCOP, CATH, EC-PDB, GO, …]

FIRST LEVEL
Information on the entire protein structure

PDB code
Resolution
Protein name
UniProt code
Cluster at 50% sequence identity
CATH domain(s)
SCOP domain(s)
Pfam domain(s)
Enzyme Classification number(s)
Taxonomy names
Organism of expression

Metal PDB Architecture

FIRST LEVEL
Information on the metal site only

Metal type(s)
Nuclearity
Coordination number
Bond distances
Coordination geometry [1]
Ligands
Proximal residues
Binding pattern
Conservation rates of residues
Secondary structure pattern
H bonds
...

1. Andreini C., Cavallaro G., Lorenzini S., “FindGeo: a tool for determining metal coordination geometry”, Bioinformatics, April 2012
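
Some of the site-level features listed above, such as coordination number and bond distances, follow directly from donor-atom coordinates. The sketch below shows one way this could look. `MetalSiteRecord`, `describe_site`, the ligand labels and the 3.0 Å bond cutoff are hypothetical and not taken from Metal PDB or FindGeo.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MetalSiteRecord:
    """Hypothetical container for a few first-level metal-site features."""
    metal: str
    nuclearity: int
    bond_distances: dict = field(default_factory=dict)  # ligand label -> distance

    @property
    def coordination_number(self) -> int:
        # In this sketch, every donor kept in bond_distances counts as a ligand.
        return len(self.bond_distances)


def describe_site(metal, metal_xyz, donors, max_bond=3.0):
    """Build a record from a metal position and candidate donor atoms.

    `donors` maps a ligand label to the (x, y, z) of its donor atom.
    `max_bond` is an assumed cutoff for treating a donor as coordinated.
    """
    record = MetalSiteRecord(metal=metal, nuclearity=1)
    for label, xyz in donors.items():
        d = math.dist(metal_xyz, xyz)
        if d <= max_bond:
            record.bond_distances[label] = round(d, 2)
    return record


# Toy coordinates, invented for the example.
record = describe_site(
    "ZN",
    (0.0, 0.0, 0.0),
    {
        "HIS94": (2.0, 0.0, 0.0),
        "HIS96": (0.0, 2.1, 0.0),
        "HIS119": (0.0, 0.0, 2.1),
        "ASP150": (3.0, 3.0, 0.0),  # hypothetical residue, too far to coordinate
    },
)
```

With these toy values the record keeps three ligands and drops the fourth candidate, which lies beyond the assumed cutoff.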

Metal sites
[figure: a metal site with ligands His 94, His 96 and His 119]

Metal PDB Interface
[screenshots: data from First Level]

Metal PDB Architecture: from First to Second Level

Problem: There are 151683 metal sites in the PDB. 28193 PDB entries actually bind metals.

[figure labels: Superfamily 1: zincins; Superfamily 2: endostatins; Cluster 1, Cluster 2, Cluster 3, Cluster 4]

Solution: Create clusters of equivalent sites (same function) which can be annotated together [1, 2]

1. Andreini et al., “Structural analysis of metal sites in proteins: non-heme iron sites as a case study”, J Mol Biol, 2009
2. Andreini et al., “Minimal functional sites allow a classification of zinc sites in proteins”, PLoS ONE, 2011

Characterization of a Database: Metal PDB

Type of data: Metal-containing 3D sub-structures

Data entry and quality control: Appointed curators add, remove and update data

Primary or derived data: Secondary database (results of analysis of primary databases; links to other data items; combination of data)

Technical design: Relational database (SQL)

Maintainer status: Academic group

Availability: Publicly available, no restrictions
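
To illustrate how the two-level, relational design could map onto tables, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names, the placeholder PDB code `XXXX` and the inserted rows are invented for the example and do not reflect the actual Metal PDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for whole-structure information and one
# for metal sites, with a column reserved for manually curated annotation.
conn.executescript("""
CREATE TABLE structure (
    pdb_code     TEXT PRIMARY KEY,
    resolution   REAL,
    protein_name TEXT
);
CREATE TABLE metal_site (
    site_id             INTEGER PRIMARY KEY,
    pdb_code            TEXT NOT NULL REFERENCES structure(pdb_code),
    metal               TEXT NOT NULL,
    nuclearity          INTEGER NOT NULL,
    coordination_number INTEGER,
    function_note       TEXT  -- second level: filled in manually
);
""")

# Placeholder rows, not real PDB data.
conn.execute(
    "INSERT INTO structure VALUES (?, ?, ?)",
    ("XXXX", 2.0, "example protein"),
)
conn.executemany(
    "INSERT INTO metal_site (pdb_code, metal, nuclearity, coordination_number) "
    "VALUES (?, ?, ?, ?)",
    [("XXXX", "ZN", 1, 4), ("XXXX", "FE", 2, 6)],
)

# Join structure-level and site-level information for zinc sites.
rows = conn.execute(
    "SELECT s.pdb_code, m.metal, m.coordination_number "
    "FROM structure AS s JOIN metal_site AS m ON m.pdb_code = s.pdb_code "
    "WHERE m.metal = ?",
    ("ZN",),
).fetchall()
```

Keeping the curated annotation in its own nullable column lets the automatically filled data be refreshed without touching what curators have entered.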

THANK YOU FOR YOUR ATTENTION

Thanks to Professor Ivano Bertini and Professor Lucia Banci

Thanks to
Dr. Claudia Andreini
Dr. Gabriele Cavallaro
Prof. Antonio Rosato
Tech. Enrico Morelli

Metal PDB Architecture: from First to Second Level

What are equivalent sites?

[figure: 1 cluster of structural similarity, 2 clusters of functional similarity]

Metal PDB Architecture: from First to Second Level

What are equivalent sites?

1. Automatically extract all sites
2. Single linkage clustering to create groups of sites (CATH, SCOP, PFAM, CLUSTER50%)
3. Structural alignment among sites in the same group (reference template is the longest chain)
4. Single linkage clustering to create sub-groups of structural similarity
5. Cluster equivalent (nuclearity and type) metal sites among members of the same structural similarity group

EQUIVALENT SITES ARE DEFINED
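
Steps 2 and 5 above can be sketched in a few lines. Single linkage clustering puts two sites in the same group whenever a chain of pairwise links connects them. In this sketch a "link" is assumed to mean sharing at least one domain annotation. The structural alignment of steps 3 and 4 is skipped, and the site identifiers and annotations are toy data.

```python
from collections import defaultdict


def single_linkage(items, linked):
    """Group items so that any two items connected by a chain of linked
    pairs end up in the same cluster (implemented with union-find)."""
    parent = {i: i for i in items}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in items:
        for b in items:
            if a < b and linked(a, b):
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in items:
        clusters[find(i)].append(i)
    return sorted(sorted(c) for c in clusters.values())


# Toy sites: annotations, metal types and nuclearities are invented.
sites = {
    "s1": {"domains": {"cathA"}, "metal": "ZN", "nuclearity": 1},
    "s2": {"domains": {"cathA", "pfamB"}, "metal": "ZN", "nuclearity": 1},
    "s3": {"domains": {"pfamB"}, "metal": "FE", "nuclearity": 2},
    "s4": {"domains": {"scopC"}, "metal": "ZN", "nuclearity": 1},
}

# Step 2 (assumed link criterion): two sites share at least one annotation.
groups = single_linkage(
    list(sites),
    lambda a, b: bool(sites[a]["domains"] & sites[b]["domains"]),
)


def equivalent_sites(group):
    """Step 5: within one group, collect sites with the same metal type
    and nuclearity."""
    by_key = defaultdict(list)
    for s in group:
        by_key[(sites[s]["metal"], sites[s]["nuclearity"])].append(s)
    return sorted(by_key.values())


equivalent = [equivalent_sites(g) for g in groups]
```

In the toy data, `s1` and `s3` share no annotation but still fall into one group because both are linked to `s2`, which is the chaining behaviour that distinguishes single linkage. The final split by metal type and nuclearity then separates `s3` from the two zinc sites.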