Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK Matthew T. Dougherty NCMI - Baylor College of Medicine Houston, Texas

Post on 21-Dec-2015


Page 1: Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK

Digital Archives for Molecular Microscopy

A community database for biological research

Christoph Best
European Bioinformatics Institute, Cambridge, UK

Matthew T. Dougherty
NCMI - Baylor College of Medicine, Houston, Texas

Page 2:

Bioimage Informatics
Informatics in support of biological imaging

Why?
  Image data rapidly increasing
    (Confocal) fluorescence microscopy (Cellular Biology)
    EMDB: electron microscopy (Structural Biology)
    High-throughput methods (Genome Biology)
  Enabling science by making data accessible, reliable, and understandable

Standards & Conventions
Public Databases
Quality assessment
Open Microscopy Environment

Credits: S. Haertel, U. Chile; J. Swedlow, U. Dundee; EMDB, EBI

Page 3:

Structural Databases at EBI

Protein Data Bank (PDB)
  Atomic structures (positions of atoms)
  PDB file format, mmCIF
  Derived from X-ray crystallography
  Long tradition, curated database
  Huge: 65,000+ entries, 3 wwPDB sites

Electron Microscopy Data Bank (EMDB)
  Part of PDB at EBI and Rutgers
  600 density maps of macromolecular structures and subcellular complexes
  Started 2002
  Curated, but limited metadata, experiment info
  XML-based

Page 4:

SCIENTIFIC BACKGROUND

Page 5:

Electron microscope

From Schweikert, 2004

Biocenter, U Helsinki

Page 6:

Page 7:

Single-particle method

Tripeptidyl-peptidase II (TPP II)
courtesy of B. Rockel, Martinsried

Molecular structure
Many images computationally combined
3D from 2D
Resolution increase by averaging
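
The gain from averaging many images can be illustrated with a toy calculation. The sketch below is not the reconstruction method used for TPP II; it only assumes that the particle images are already aligned and that each carries independent Gaussian noise. The signal, the noise level, and the function names are all made up for illustration. Under those assumptions the residual noise should fall roughly as 1/sqrt(N) for N averaged images.

```python
import random
import statistics


def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [value + rng.gauss(0.0, sigma) for value in signal]


def average_images(images):
    """Pixel-wise mean of equally sized images stored as flat lists."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def residual_noise(image, signal):
    """Standard deviation of the difference between an image and the true signal."""
    return statistics.pstdev([a - b for a, b in zip(image, signal)])


rng = random.Random(0)                          # fixed seed for repeatability
signal = [float(i % 16) for i in range(1024)]   # toy 32x32 "particle", flattened
sigma = 5.0                                     # assumed noise level

single = noisy_copy(signal, sigma, rng)
averaged = average_images([noisy_copy(signal, sigma, rng) for _ in range(100)])

print(residual_noise(single, signal))    # close to sigma
print(residual_noise(averaged, signal))  # close to sigma / 10
```

Real single-particle processing also has to find the orientation of every particle before anything can be combined, which is where most of the computing time goes.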

Page 8:

Single-particle analysis: GroEL to 4 Å

Ludtke et al., Structure, 2008

Page 9:

Data Management Issues

Initial EM images:
  O(1000), 4k x 4k -> O(10 GPixel)

Particle stacks:
  O(100,000), 256 x 256 -> O(10 GPixel)

Final data set:
  1 MVoxel (small)

Processing power:
  O(100) cores, some weeks, lab-owned clusters

Software:
  1970s FORTRAN codes, 1990s C codes
  fragmented communities, lack of standards
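
The order-of-magnitude figures above can be checked with a few lines of arithmetic. This is only a sketch: it reads "4k" as 4096 pixels and assumes 2 bytes per pixel for the storage estimate, neither of which is stated on the slide.

```python
def total_pixels(count, width, height):
    """Number of pixels in `count` images of size width x height."""
    return count * width * height


def to_gigabytes(pixels, bytes_per_pixel=2):
    """Rough storage size; 2 bytes per pixel is an assumption, not a slide value."""
    return pixels * bytes_per_pixel / 1e9


micrographs = total_pixels(1_000, 4096, 4096)   # initial EM images
particles = total_pixels(100_000, 256, 256)     # boxed-out particle stack

for name, pixels in (("micrographs", micrographs), ("particles", particles)):
    print(f"{name}: {pixels / 1e9:.1f} GPixel, about {to_gigabytes(pixels):.0f} GB")
```

Both intermediate data sets land near 10 GPixel, four orders of magnitude above the 1 MVoxel final map.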

Page 10:

Electron tomography

3D reconstruction by taking a series of images from different angles

Difficulty: nanometer accuracy

Problems:
  Limited tilt range ↔ missing wedge ⇒ distortion
  Imperfections of the tilt ↔ alignment ⇒ limited resolution

Computational reconstruction algorithms
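
The size of the missing wedge follows from the tilt range. In a single-axis tilt series each projection fills one central line of Fourier space in the plane perpendicular to the tilt axis, so tilting over ±θ leaves a fraction 1 − θ/90° of that plane unsampled. The sketch below checks this on a grid; the ±60° range and the function names are illustrative assumptions, not values from the slides.

```python
import math


def missing_wedge_fraction(max_tilt_deg):
    """Unsampled fraction of the Fourier plane for a tilt range of +/- max_tilt_deg."""
    return 1.0 - max_tilt_deg / 90.0


def sampled_fraction_on_grid(size, max_tilt_deg):
    """Count grid points inside a disc whose angle from the kx axis is within the tilt range."""
    half = size // 2
    limit = math.radians(max_tilt_deg)
    inside = covered = 0
    for iz in range(-half, half + 1):
        for ix in range(-half, half + 1):
            if ix == 0 and iz == 0:
                continue                      # origin has no defined angle
            if ix * ix + iz * iz > half * half:
                continue                      # keep only points inside the disc
            inside += 1
            if math.atan2(abs(iz), abs(ix)) <= limit:
                covered += 1
    return covered / inside


print(missing_wedge_fraction(60))            # one third of the plane is missing
print(1.0 - sampled_fraction_on_grid(129, 60))
```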

Page 11:

Tomography of eukaryotic cells

Panels: projection, slice

Dictyostelium discoideum
O. Medalia et al., Science, 2002

Page 12:

Image enhancement

Before

Cytoskeleton of Spiroplasma melliferum

J. Kürner et al., Science, 2005

Page 13:

Image enhancement

yellow: geodetic line

J. Kürner et al., Science, 2005

After

Page 14:

Automated image analysis

Manual Automatic

A. Linaroudis, Ph.D. Thesis, 2006

Automatic segmentation to identify points/lines/surfaces

Page 15:

Data Management Issues

Original data:
  60 images, 8k x 8k -> O(4 GPixel)

Reconstruction:
  8k x 8k x 256 -> O(16 GPixel) ?

Software:
  1970s algorithm in 1990s software

Visualization:
  “let's buy more memory”
  Future: web-based applications (Google Maps) ?
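
One way to read the Google Maps remark is a tiled image pyramid: the viewer fetches only the small tiles it needs at the current zoom level instead of loading a whole 8k x 8k slice into memory. The sketch below counts tiles per level under assumed parameters (256-pixel tiles, halving the resolution per level, "8k" read as 8192); it does not describe any existing EMDB service.

```python
import math


def pyramid_levels(width, height, tile=256):
    """Tiles per axis at each zoom level, from full resolution down to a single tile."""
    levels = []
    w, h = width, height
    while True:
        levels.append((math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)   # next level has half the resolution
    return levels


levels = pyramid_levels(8192, 8192)
total_tiles = sum(nx * ny for nx, ny in levels)
print(len(levels), "levels,", total_tiles, "tiles per slice")
```

The whole pyramid costs about a third more storage than the full-resolution slice alone, in exchange for a fixed, small memory footprint in the viewer.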

Page 16:

The Electron Microscopy Data Bank

contains EM-derived density maps
complementary to coordinate sets in PDB
established 2002 @ EBI (Kim Henrick)
web-based submission and retrieval
hand-curated (R. Newman)

A bit like eBay – and you won't make any money, either

Page 17:

THE ELECTRON MICROSCOPY DATA BANK

Page 18:

A Unified Data Resource for EM

NIH-funded joint project

Baylor College of Medicine, Houston (W. Chiu, M. Baker)

Rutgers University, New Jersey (H. Berman, C. Lawson)

PDBe, EBI, Cambridge, UK (K. Henrick, C. Best, R. Newman)

Baylor College of Medicine, Houston, TX

Rutgers University, Piscataway, NJ

European Bioinformatics Institute, Cambridge, UK

Page 19:

Characteristics

Curated Community Archive: PDB and EMDB
NIH, EU (in past), and BBSRC funding (+ EMBL)
Worldwide cooperation
Advisory boards and task forces from the community
Open deposition and retrieval
  → Alternative access systems by other institutions
760 entries, 26 GB data
ca. 100 entries/year
Curation both in Europe and US

Page 20:

Growth of EMDB

Page 21:

EMDep deposition system

750 entries, current rate approx. 15-20/month

Contents of an entry:
  Metadata (XML header) → experimental metadata
  Map (any format, converted to CCP4/MRC)
  Additional files

Java/Tomcat/XML
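
To make the target map format concrete: a CCP4/MRC file starts with a 1024-byte header whose first four 32-bit integers hold the grid size (NX, NY, NZ) and the data mode, and current files carry the stamp "MAP " at byte offset 208. The sketch below writes and reads back just those fields. It assumes little-endian byte order, the helper names are hypothetical, and it is not the converter used by EMDep.

```python
import struct

HEADER_SIZE = 1024   # fixed header length of the CCP4/MRC format
MAP_OFFSET = 208     # position of the "MAP " identification stamp


def make_header(nx, ny, nz, mode=2):
    """Build a minimal header; mode 2 means 32-bit float voxels."""
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<4i", header, 0, nx, ny, nz, mode)
    header[MAP_OFFSET:MAP_OFFSET + 4] = b"MAP "
    return bytes(header)


def read_header(raw):
    """Extract grid size and mode, rejecting data that lacks the MAP stamp."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header is shorter than 1024 bytes")
    if raw[MAP_OFFSET:MAP_OFFSET + 4] != b"MAP ":
        raise ValueError("missing MAP stamp (older files may omit it)")
    nx, ny, nz, mode = struct.unpack_from("<4i", raw, 0)
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}


info = read_header(make_header(100, 100, 100))
print(info, "->", info["nx"] * info["ny"] * info["nz"], "voxels")
```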

Page 22:

Unified data resource plan

Page 23:

Joint deposition system

Page 24:

EMDB search system

Java/Tomcat

Page 25:

EMDB search system

Java/Tomcat

Page 26:

EMDB Atlas pages

XSLT

Page 27:

ISSUES

Page 28:

Metadata management

Difficult: many rounds of consulting the community
Still most fields remain empty

Data harvesting
  LIMS, PIMS -> rarely used
  Processing pipelines, image processing software -> lack of standards, idiosyncrasies

Image formats: appalling lack of standards

Page 29:

Data issues

Current: Deposit final result of experiment and computation

How much of original/intermediate data should be deposited?

Issues:
  Cost / practicability
  Reproducibility of experiment
  Intellectual property (un-exploited results?)
  Usefulness

Page 30:

Non-data issues

Embargo:
  Image data can be withheld up to two years
  Allows original researcher to further exploit them
  Journals and funders must define:
    what data must be deposited
    when they are to be released

Quality Standards:
  Require community acceptance
  Technically difficult
  Data Bank does enrich/annotate, but does not do science
    → quality standards must be set by scientists

Page 31:

Image data formats

Current:
  Variety of historical ad hoc formats
  Unclear definitions, variations in different software

Need:
  Interoperability
  Standards
  Technical level? Acceptance? → Question for the community

HDF5
  Common container format to deal with numerical data
  Heavyweight library, but widely available (but Java?)
  Would at least solve low-level format problems
  Metadata format still needs to be specified

Page 32:

Ontologies

Systematic way to define
  classes of objects
  attributes of these objects
  relationships between objects

Provides framework for metadata models
Advantage: Powerful formal method
Disadvantage: Not yet widely used

Page 33:

TECHNICAL DEVELOPMENTS

Page 34:

Rich data sets

Submissions consist of
  maps (increasingly more than one)
  relations between data sets → unexpressed

XML-based standards for representing relationships between data:
  Subject-predicate-object relationships (RDF framework)

Harvesting interface to EM processing software

Web-based visualization for submission and retrieval, complex submissions assembled interactively (AJAX)
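
The subject-predicate-object idea can be shown without any RDF tooling. In the sketch below every identifier and predicate name (`map:sharpened`, `derivedFrom`, and so on) is invented for illustration; none is taken from the EMDB schema. Each relationship between data sets is a single triple, and questions about a submission become simple filters over the list.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical submission with two maps, a particle stack, and a mask
triples = [
    Triple("map:sharpened", "derivedFrom", "map:raw"),
    Triple("map:raw", "reconstructedFrom", "stack:particles"),
    Triple("mask:solvent", "appliesTo", "map:raw"),
]


def objects_of(triples, subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


def provenance(triples, start, predicates=("derivedFrom", "reconstructedFrom")):
    """Follow derivation links from `start` back to the earliest data set."""
    chain = [start]
    current = start
    while True:
        parents = [t.object for t in triples
                   if t.subject == current and t.predicate in predicates]
        if not parents or parents[0] in chain:   # stop at the root or on a cycle
            return chain
        current = parents[0]
        chain.append(current)


print(objects_of(triples, "mask:solvent", "appliesTo"))
print(provenance(triples, "map:sharpened"))
```

A real deployment would use URIs for subjects and predicates and a shared vocabulary, which is where the community agreement discussed earlier comes in.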

Page 35:

Rich data submissions

Page 36:

Possible XML representation

Page 37:

Bioimage informatics tools

Current EMDB interface:
  simple and efficient
  but must be extended to accommodate more complex experiments

OMERO interface:
  geared at labs, not public databases
  All the beauty of AJAX
  high-performance visualization

Page 38:

Bioimage informatics tools

BISQUE/BISQUIK (UCSB)

multichannel images, lab notebook, tagging, image markup

Page 39:

Current Imaging Workflow Paradigm

No Standards

Experiment? Image? Analytics? Annotations?

Jason Swedlow (U. Dundee)

Page 40:

Towards Image Informatics

Page 41:

OMERO in 2007/8/9

Jason Swedlow (Univ. Dundee)

Page 42:

CONCLUSIONS

Page 43:

A Virtual Research Community

[Diagram linking USERS with the following components]

Imaging Centers: acquisition, storage, and management of images

Databases: storage, distribution, quality assessment

Grid/cloud computing/storage, in-house storage: storage and computing engines

Software

Connections: data submission, data harvesting

Page 44:

CONCLUSIONS

Community databases are a central part of the Scientific Data Infrastructure

Image databases rapidly growing
Technical challenges: data formats, size
Standards and interoperability
Improve metadata collection
Keep the community engaged