Macromolecular Structure Middleware

Page 1: Macromolecular Structure Middleware

Research Collaboratory for Structural Bioinformatics, http://openmms.sdsc.edu

Macromolecular Structure Middleware

OpenMMS

An Ontology Driven Architecture

Page 2

Overview

The mmCIF Ontology

OpenMMS Toolkit
– Macromolecular Structure (MMS) Metamodel
– Parser, XML
– SQL / Corba Servers and Clients

Corba

UML and the future...

Page 3

How do we “Enable” Science?

Promote well-defined Macromolecular Structure (MMS) specifications

Distribution
– Open interfaces
– Now:
• flat files
• W3 browsing and searching
– Future:
• XML, SQL, CORBA

Page 4

Why OpenMMS?

Allow programmers to more easily create efficient, high-performance and robust applications.

A Java-only toolkit that creates XML, CORBA and relational DB representations of the mmCIF macromolecular structure data.

Source code is publicly available so users can easily modify the metamodel or create an entirely new one.

Page 5

What Do We Mean by an Ontology Driven Architecture?

What do we mean by an Ontology?

A bridge between our world of natural language and the world of machines.

Page 6

mmCIF Dictionary and Data Files

Based on an ontology for macromolecular structure defined by the International Union of Crystallography (IUCr)

Replaces the older 80-column PDB files

The mmCIF Dictionary contains over 140 category and 1600 item definitions

Open, extensible

Provides a well-defined reference standard for data distribution
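Every mmCIF data name has the form _category.item (for example _atom_site.Cartn_x), so a dictionary category maps naturally onto a table or struct and an item onto a column or field. The minimal sketch below splits such a name; the class name MmcifDataName is hypothetical, and the sketch assumes well-formed names and ignores the fact that mmCIF names are case-insensitive.

```java
// Hypothetical helper: split an mmCIF data name such as "_atom_site.Cartn_x"
// into its category ("atom_site") and item ("Cartn_x") parts.
// Assumes well-formed names; mmCIF case-insensitivity is ignored here.
final class MmcifDataName {
    final String category;
    final String item;

    private MmcifDataName(String category, String item) {
        this.category = category;
        this.item = item;
    }

    static MmcifDataName parse(String name) {
        int dot = name.indexOf('.');
        // Require a leading underscore, a non-empty category and a non-empty item.
        if (!name.startsWith("_") || dot < 2 || dot == name.length() - 1) {
            throw new IllegalArgumentException("expected _category.item, got: " + name);
        }
        return new MmcifDataName(name.substring(1, dot), name.substring(dot + 1));
    }
}
```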

Page 7

OpenMMS Toolkit Data Flow

Diagram components: mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, Corba Server, Applications.

Page 8

Metamodel Information Flow

Diagram: mmCIF Dictionary → mmCIF Ontology Metamodel → Metamodel Framework → generated Corba IDL, SQL Schema, XML DTD, Java Data Loaders and JDBC Loaders.

Page 9

What can OpenMMS do?

The PDBase program will load any or all PDB files into any SQL-92-compatible database (Oracle, MySQL, Sybase...)

Translate any PDB file into an XML file

Contains two Corba servers:
– The reference server will cache and serve data read from PDB flat files
– The DB server will cache and serve data read from a SQL database (very quickly...)

All source code is written in Java and publicly available.

Page 10

Some Advantages of Using an Ontology Driven Architecture

Scales to very large ontologies

More reliable and maintainable code

Transfer between representations

Scientific correctness of representation

Helps in maintaining backward compatibility

Page 11

How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview)

Diagram components: a Root node; Modules; an Interface and Structs, each containing Fields; a Visitor abstract class and a Visitor subclass.
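The Visitor classes in this metamodel are what let one description of the ontology produce several representations. The sketch below is a heavily simplified, hypothetical version: only Struct and Field nodes are modelled (Root, Module and Interface are omitted), the class names are invented, and two Visitor subclasses walk the same tree to emit an SQL-92 CREATE TABLE statement and an XML DTD element declaration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified metamodel: a MetaStruct (one mmCIF category) owns MetaFields (its items).
abstract class MetaNode {
    abstract void accept(MetaVisitor v);
}

final class MetaField extends MetaNode {
    final String name;
    final String type; // "string", "int" or "float" in this sketch
    MetaField(String name, String type) { this.name = name; this.type = type; }
    @Override void accept(MetaVisitor v) { v.visitField(this); }
}

final class MetaStruct extends MetaNode {
    final String name;
    final List<MetaField> fields = new ArrayList<>();
    MetaStruct(String name) { this.name = name; }
    MetaStruct add(MetaField f) { fields.add(f); return this; }
    @Override void accept(MetaVisitor v) {
        v.visitStructStart(this);
        for (MetaField f : fields) f.accept(v);
        v.visitStructEnd(this);
    }
}

// The Visitor abstract class: each subclass produces one representation.
abstract class MetaVisitor {
    abstract void visitStructStart(MetaStruct s);
    abstract void visitField(MetaField f);
    abstract void visitStructEnd(MetaStruct s);
}

// Visitor subclass emitting an SQL-92 CREATE TABLE statement.
final class SqlSchemaVisitor extends MetaVisitor {
    private final StringBuilder out = new StringBuilder();
    private boolean firstColumn;

    @Override void visitStructStart(MetaStruct s) {
        out.append("CREATE TABLE ").append(s.name).append(" (");
        firstColumn = true;
    }
    @Override void visitField(MetaField f) {
        if (!firstColumn) out.append(", ");
        firstColumn = false;
        out.append(f.name).append(' ').append(sqlType(f.type));
    }
    @Override void visitStructEnd(MetaStruct s) { out.append(");"); }

    private static String sqlType(String t) {
        switch (t) {
            case "int": return "INTEGER";
            case "float": return "REAL";
            default: return "VARCHAR(80)";
        }
    }
    String result() { return out.toString(); }
}

// Visitor subclass emitting XML DTD declarations (handles a single struct).
final class DtdVisitor extends MetaVisitor {
    private final StringBuilder children = new StringBuilder();
    private final StringBuilder leaves = new StringBuilder();
    private String structName;

    @Override void visitStructStart(MetaStruct s) { structName = s.name; }
    @Override void visitField(MetaField f) {
        if (children.length() > 0) children.append(", ");
        children.append(f.name);
        leaves.append("\n<!ELEMENT ").append(f.name).append(" (#PCDATA)>");
    }
    @Override void visitStructEnd(MetaStruct s) { }
    String result() { return "<!ELEMENT " + structName + " (" + children + ")>" + leaves; }
}
```

Adding another output format means adding one more Visitor subclass; the metamodel classes themselves stay unchanged.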

Page 12

mmCIF Parsers

General-purpose, low-level access to data

Parsers available in many languages

The OpenMMS toolkit includes a Java parser
– Uses the “Builder” design pattern
– An application subclasses the abstract Builder class and stores data into its own data structures
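To make the Builder arrangement concrete, here is a minimal sketch. It is not the OpenMMS parser API: MmcifBuilder, TinyMmcifParser and AtomIdCollector are hypothetical names, and the parser handles only a small mmCIF subset (data_ headers, "_name value" pairs and loop_ tables of unquoted values). The point is the division of labour: the parser reports what it reads, and the application's Builder subclass decides what to store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Builder hooks (not the actual OpenMMS API).
abstract class MmcifBuilder {
    void startDataBlock(String blockName) { }
    // dataName is e.g. "_atom_site.id"; row is 0 for items outside a loop_.
    abstract void value(String dataName, int row, String value);
}

// Minimal parser for a small mmCIF subset. Quoted strings,
// ';'-delimited text fields and comments are NOT handled.
final class TinyMmcifParser {
    static void parse(String text, MmcifBuilder builder) {
        List<String> loopNames = new ArrayList<>();
        boolean inLoopHeader = false;
        String pendingName = null;
        int column = 0, row = 0;
        for (String tok : text.trim().split("\\s+")) {
            if (tok.startsWith("data_")) {
                builder.startDataBlock(tok.substring(5));
                loopNames.clear();
                inLoopHeader = false;
            } else if (tok.equals("loop_")) {
                loopNames.clear();
                inLoopHeader = true;
                column = 0;
                row = 0;
            } else if (tok.startsWith("_")) {
                if (inLoopHeader) {
                    loopNames.add(tok);      // another column of the loop
                } else {
                    loopNames.clear();       // a plain item ends any loop
                    pendingName = tok;
                }
            } else if (pendingName != null) {
                builder.value(pendingName, 0, tok);
                pendingName = null;
            } else if (!loopNames.isEmpty()) {
                inLoopHeader = false;        // first value ends the loop header
                builder.value(loopNames.get(column), row, tok);
                if (++column == loopNames.size()) {
                    column = 0;
                    row++;
                }
            } else {
                throw new IllegalStateException("value without a data name: " + tok);
            }
        }
    }
}

// Application-specific Builder: keeps only the atom ids.
final class AtomIdCollector extends MmcifBuilder {
    final List<String> atomIds = new ArrayList<>();
    @Override void value(String dataName, int row, String value) {
        if (dataName.equals("_atom_site.id")) atomIds.add(value);
    }
}
```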

Page 13

MMS in XML

Large flat files (open and close tags)

Tables can be grouped by rows or columns

XML from SQL query
– Many requests from Web browsers don’t really need or want all the data
– Software available from DB vendors and ISVs for creating XML files from SQL result sets
– Smaller files load faster
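The difference between grouping a table by rows or by columns is easiest to see side by side. In this hypothetical sketch (the element names, including the row and v wrappers, are invented and are not the OpenMMS DTD), one category table is written both ways: row grouping keeps each record together, while column grouping keeps all values of one item together.

```java
import java.util.List;

// Hypothetical sketch: one mmCIF category as a table (column names plus rows of
// string values) written to XML either row-by-row or column-by-column.
final class CategoryXml {
    static String byRows(String category, List<String> columns, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + category + ">");
        for (List<String> row : rows) {
            sb.append("<row>");
            for (int c = 0; c < columns.size(); c++) {
                element(sb, columns.get(c), row.get(c));
            }
            sb.append("</row>");
        }
        return sb.append("</").append(category).append(">").toString();
    }

    static String byColumns(String category, List<String> columns, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + category + ">");
        for (int c = 0; c < columns.size(); c++) {
            sb.append("<").append(columns.get(c)).append(">");
            for (List<String> row : rows) element(sb, "v", row.get(c));
            sb.append("</").append(columns.get(c)).append(">");
        }
        return sb.append("</").append(category).append(">").toString();
    }

    private static void element(StringBuilder sb, String name, String text) {
        sb.append('<').append(name).append('>').append(escape(text))
          .append("</").append(name).append('>');
    }

    // Escape the characters that would break XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```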

Page 14

Relational DB Expression

SQL-92-compatible schemas for all the standard DB vendors

Fast and flexible keyword searches

The PDBase loader allows structures to be selectively loaded

Oracle instance tested:
– 14,556 structures
– 16 GB, 88 million atom records
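Selective loading can be sketched with plain JDBC: filter the incoming rows by the requested entry ids and send the rest as one batch of parameterized INSERTs. This is only an illustration in the spirit of PDBase; the AtomRow and AtomSiteLoader classes and the atom_site table and column names are assumptions, not the real PDBase schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical row type: one atom_site record belonging to one PDB entry.
final class AtomRow {
    final String entryId, id, typeSymbol;
    final double x, y, z;
    AtomRow(String entryId, String id, String typeSymbol, double x, double y, double z) {
        this.entryId = entryId; this.id = id; this.typeSymbol = typeSymbol;
        this.x = x; this.y = y; this.z = z;
    }
}

// Illustrative selective loader; table and column names are assumptions.
final class AtomSiteLoader {
    static final List<String> COLUMNS =
        Arrays.asList("entry_id", "id", "type_symbol", "Cartn_x", "Cartn_y", "Cartn_z");

    // Builds a parameterized INSERT, e.g. INSERT INTO t (a, b) VALUES (?, ?)
    static String insertSql(String table, List<String> columns) {
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + marks + ")";
    }

    // 'wanted' holds upper-case entry ids; an empty set means "load everything".
    static boolean shouldLoad(String entryId, Set<String> wanted) {
        return wanted.isEmpty() || wanted.contains(entryId.toUpperCase(Locale.ROOT));
    }

    // Sends the selected rows as one JDBC batch inside a single transaction.
    static int load(Connection conn, List<AtomRow> rows, Set<String> wanted) throws SQLException {
        conn.setAutoCommit(false);
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql("atom_site", COLUMNS))) {
            for (AtomRow r : rows) {
                if (!shouldLoad(r.entryId, wanted)) continue;
                ps.setString(1, r.entryId);
                ps.setString(2, r.id);
                ps.setString(3, r.typeSymbol);
                ps.setDouble(4, r.x);
                ps.setDouble(5, r.y);
                ps.setDouble(6, r.z);
                ps.addBatch();
                queued++;
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException ex) {
            conn.rollback();
            throw ex;
        }
        return queued;
    }
}
```

Because the statement is parameterized, the same SQL text works against any SQL-92 database with a JDBC driver; only the connection changes.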

Page 15

A very high-level (and very rough) classification of communication

Person-to-person communication
– email

Person-to-machine communication
– HTTP/HTML

Machine-to-machine communication
– CORBA, SQL, .NET, SOAP

Not communication, but data formats
– XML, mmCIF (STAR), many more…

Page 16

What is CORBA?

Common Object Request Broker Architecture

Defines a family of open software interface specifications for distributed object computing.

http://www.omg.org

Page 17

What is an Object? “A Data Structure with an Attitude”

Programs = Algorithms + Data Structures

Object-oriented programming principle: group each part of an algorithm together with the data structures it uses
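A small illustration of that principle, using a hypothetical AtomPosition class: the coordinates (the data structure) and the distance calculation (the algorithm that uses them) live in the same class, so callers never handle the raw numbers directly.

```java
// Illustrative sketch (hypothetical class): data and the algorithm that uses it, kept together.
final class AtomPosition {
    private final String name;
    private final double x, y, z;

    AtomPosition(String name, double x, double y, double z) {
        this.name = name;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    String name() { return name; }

    // Euclidean distance to another atom, in the same units as the coordinates.
    double distanceTo(AtomPosition other) {
        double dx = x - other.x, dy = y - other.y, dz = z - other.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```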

Page 18

Side View of a Distributed Application

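Diagram: a client (e.g. a Java applet) and a server (e.g. a mainframe computer server) communicate over the network (Internet, TCP/IP); on each side the application talks to a middleware layer through an IDL interface.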

Page 19

The “Hourglass” view of the Internet

Diagram, layers from top to bottom:
– Applications
– HTTP, Corba, .NET: OO high-level interface
– TCP, RTP, ...: reliable bitstream
– IP: unreliable datagrams
– ATM, Ethernet, V.90, SONET...
– Copper, glass, radio spectrum

Page 20

Where is Corba?

Inside every Java Runtime Environment

Commonly used in middle-tier and backend (e.g. database) connections

Open source and commercial implementations available

Usually buried deep inside the software
– Difficult or impossible to tell when it is being used

Page 21

What is Distributed Object Computing?

Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks.

Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.
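Location transparency can be shown without any middleware at all. In this conceptual sketch (not CORBA; every class name and the text "wire format" are invented), the client programs against the ResidueCounter interface, and the same call works whether it hits the object directly or goes through a stub that marshals the request, a transport standing in for the network and ORB, and a skeleton that unmarshals it and calls the real object.

```java
import java.util.Map;
import java.util.function.Function;

// The interface the client programs against.
interface ResidueCounter {
    int residueCount(String entryId);
}

// The real object, living in the server process.
final class LocalResidueCounter implements ResidueCounter {
    private final Map<String, Integer> counts;
    LocalResidueCounter(Map<String, Integer> counts) { this.counts = counts; }
    public int residueCount(String entryId) { return counts.getOrDefault(entryId, 0); }
}

// Server-side skeleton: decodes a request string and calls the real object.
final class ResidueCounterSkeleton implements Function<String, String> {
    private final ResidueCounter target;
    ResidueCounterSkeleton(ResidueCounter target) { this.target = target; }
    public String apply(String request) {
        String[] parts = request.split(" ", 2);
        if (parts.length != 2 || !parts[0].equals("residueCount")) {
            throw new IllegalArgumentException("unknown request: " + request);
        }
        return Integer.toString(target.residueCount(parts[1]));
    }
}

// Client-side stub: same interface, but each call is marshalled into a string and
// passed to a transport (here just a Function standing in for network plus ORB).
final class ResidueCounterStub implements ResidueCounter {
    private final Function<String, String> transport;
    ResidueCounterStub(Function<String, String> transport) { this.transport = transport; }
    public int residueCount(String entryId) {
        return Integer.parseInt(transport.apply("residueCount " + entryId));
    }
}
```

In a real ORB the stub and skeleton are generated from IDL, and the transport is IIOP over TCP/IP rather than a method call.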

Page 22

Advantages of Distributed Object Computing

Easier (and faster) for programmers to create distributed applications

Increases reliability

Increases maintainability

Increases portability

Increases extensibility

Page 23

The Alphabet Soup

OMG = Object Management Group, a consortium of 800+ companies founded in 1989.

IDL = Interface Definition Language

Page 24

Boundaries, Interfaces

The key is to focus on boundaries, interfaces and how things fit together, not on the internal details of how they’re built; assume those will be diverse & changing.

The shape of the boundary is defined in IDL.

Page 25

Boundaries, Interfaces

The interface to an object can be distributed over a network.

The glue that binds the parts together is the ORB.

The shape of the boundary is defined in IDL.

Page 26

Corba Independence

Open standard for distributed object-oriented design

Independent of hardware platform

Independent of operating system

Independent of programming language

Independent of object location

Page 27

Object Request Broker

Diagram: a Client and an Object, each exposing an IDL interface, connected through the Object Request Broker.

ORBs mediate between objects and the things that use them (clients).

Page 28

Terminology

IIOP
– The Internet Inter-ORB Protocol, defined in the CORBA specification as a vendor-independent, wire-level network protocol on top of TCP/IP. This allows ORB implementations from different vendors to interoperate.
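What "wire-level" means in practice: every IIOP message starts with a fixed 12-byte GIOP header (magic "GIOP", major and minor version, a flags octet whose bit 0 gives the byte order, with 1 meaning little-endian and called byte_order in GIOP 1.0, the message type, and the body size). Any vendor's ORB can read that header. The sketch below encodes and decodes only this header; GiopHeader is a hypothetical class name, the CDR-encoded body is out of scope, and the unsigned 32-bit size is held in a Java int for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch of the fixed 12-byte GIOP message header carried by IIOP over TCP/IP.
final class GiopHeader {
    static final int SIZE = 12;
    static final byte REQUEST = 0, REPLY = 1;

    final int major, minor;
    final boolean littleEndian;
    final byte messageType;
    final int messageSize; // size of the body that follows, excluding this header

    GiopHeader(int major, int minor, boolean littleEndian, byte messageType, int messageSize) {
        this.major = major;
        this.minor = minor;
        this.littleEndian = littleEndian;
        this.messageType = messageType;
        this.messageSize = messageSize;
    }

    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE)
                .order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        buf.put("GIOP".getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) major).put((byte) minor);
        buf.put((byte) (littleEndian ? 1 : 0)); // flags: bit 0 = byte order
        buf.put(messageType);
        buf.putInt(messageSize);                // written in the sender's byte order
        return buf.array();
    }

    static GiopHeader decode(byte[] b) {
        if (b.length < SIZE || b[0] != 'G' || b[1] != 'I' || b[2] != 'O' || b[3] != 'P') {
            throw new IllegalArgumentException("not a GIOP message");
        }
        boolean little = (b[6] & 0x01) != 0;
        int size = ByteBuffer.wrap(b, 8, 4)
                .order(little ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN)
                .getInt();
        return new GiopHeader(b[4], b[5], little, b[7], size);
    }
}
```

The receiver adapts to the sender's byte order instead of both sides converting to a fixed order, which is why the flags octet sits before the size field.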

Page 29

ORBs: Medium for Integration

Diagram: several ORBs, with clients and objects written in Java, Perl, C++, C, Ada, VB and ActiveX, interoperating over Corba / IIOP (Internet Inter-ORB Protocol).

Page 30

Corba Facilities: Industry Standards in Vertical Markets

Manufacturing

Finance

Life Sciences Research

C4I

Many others...

Page 31

Using Corba to Access Macromolecular Structure Data

No parsing of flat files

Direct access to binary data structures

Strongly typed data

Granularity of access

Indices and presence flags pre-computed

Highest performance

Page 32

OMG/LSR Macromolecular Structure Adoption Process

August 1999: RFP issued

March 2000: Initial submission

September 2000: Revised submission

February 2001: Specification adopted by the OMG

4Q 2001: OpenMMS LSR/MMS 1.0-compliant implementation; source code publicly available

February 2002: Approved as a formal OMG Available Specification

Page 33

Using the CORBA MMS Server

An excerpt from the ATOM records of a legacy PDB-format file (4hhb.ent):

    ...
    ATOM 6 CG1 VAL A 1 7.009 20.127 5.418 6.00 61.79 ...
    ATOM 7 CG2 VAL A 1 5.246 18.533 5.681 6.00 80.12 ...
    ATOM 8 N LEU A 2 9.096 18.040 3.857 7.00 26.44 ...
    ATOM 9 CA LEU A 2 10.600 17.889 4.283 6.00 26.32 ...
    ATOM 10 C LEU A 2 11.265 19.184 5.297 6.00 32.96 ...
    ATOM 11 O LEU A 2 10.813 20.177 4.647 8.00 31.90 ...
    ATOM 12 CB LEU A 2 11.099 18.007 2.815 6.00 29.23 ...
    ATOM 13 CG LEU A 2 11.322 16.956 1.934 6.00 37.71 ...
    ATOM 14 CD1 LEU A 2 11.468 15.596 2.337 6.00 39.10 ...
    ATOM 15 CD2 LEU A 2 11.423 17.268 .300 6.00 37.47 ...
    ...
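For contrast with the typed CORBA access that follows, this is roughly what every client of the legacy format has to do itself: cut each field out of fixed character columns. A minimal sketch, assuming the column positions of the PDB format description (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, occupancy 55-60, temperature factor 61-66); PdbAtomRecord is a hypothetical class, and alternate-location and insertion codes are ignored.

```java
// Hypothetical sketch: read the fixed-column fields of a legacy PDB ATOM/HETATM record.
// Comments give 1-based, inclusive PDB columns; Java substring indices are
// 0-based and end-exclusive.
final class PdbAtomRecord {
    final int serial;
    final String name, resName, chainId;
    final int resSeq;
    final double x, y, z, occupancy, tempFactor;

    private PdbAtomRecord(String line) {
        serial = Integer.parseInt(line.substring(6, 11).trim());         // 7-11
        name = line.substring(12, 16).trim();                             // 13-16
        resName = line.substring(17, 20).trim();                          // 18-20
        chainId = line.substring(21, 22).trim();                          // 22
        resSeq = Integer.parseInt(line.substring(22, 26).trim());         // 23-26
        x = Double.parseDouble(line.substring(30, 38).trim());            // 31-38
        y = Double.parseDouble(line.substring(38, 46).trim());            // 39-46
        z = Double.parseDouble(line.substring(46, 54).trim());            // 47-54
        occupancy = Double.parseDouble(line.substring(54, 60).trim());    // 55-60
        tempFactor = Double.parseDouble(line.substring(60, 66).trim());   // 61-66
    }

    static PdbAtomRecord parse(String line) {
        boolean atom = line.startsWith("ATOM  ") || line.startsWith("HETATM");
        if (!atom || line.length() < 66) {
            throw new IllegalArgumentException("not a full ATOM/HETATM record: " + line);
        }
        return new PdbAtomRecord(line);
    }
}
```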

Page 34

LSR/MMS “ATOM Record”

DsLSRMacromolecularStructure.idl excerpt:

    struct AtomSite {
        string     id;
        IndexId    type_symbol;
        AtomIndex  label;
        IndexId    label_entity;
        VectorXYZ  cartn;
        float      occupancy;
        float      b_iso_or_equiv;
    };
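Under the OMG IDL-to-Java language mapping, an IDL struct becomes a final Java class with one public field per member in IDL order, a no-argument constructor and a constructor taking all members (Helper and Holder classes are also generated and are omitted here). The sketch below approximates what that yields for AtomSite. IndexId, AtomIndex and VectorXYZ are simplified stand-ins whose real LSR/MMS definitions are not reproduced: IndexId is assumed to carry a string id and VectorXYZ three float coordinates, which matches how the example code on the next page uses them. AtomIndex is left empty.

```java
// Simplified stand-ins for the referenced IDL types (assumptions, not the LSR/MMS definitions).
final class VectorXYZ {
    public float x, y, z;
    public VectorXYZ() { }
    public VectorXYZ(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

final class IndexId {
    public String id; // assumed to be a string
    public IndexId() { }
    public IndexId(String id) { this.id = id; }
}

final class AtomIndex { /* members omitted in this sketch */ }

// Approximately what the IDL-to-Java mapping generates for struct AtomSite.
final class AtomSite {
    public String id;
    public IndexId type_symbol;
    public AtomIndex label;
    public IndexId label_entity;
    public VectorXYZ cartn;
    public float occupancy;
    public float b_iso_or_equiv;

    public AtomSite() { }

    public AtomSite(String id, IndexId type_symbol, AtomIndex label, IndexId label_entity,
                    VectorXYZ cartn, float occupancy, float b_iso_or_equiv) {
        this.id = id;
        this.type_symbol = type_symbol;
        this.label = label;
        this.label_entity = label_entity;
        this.cartn = cartn;
        this.occupancy = occupancy;
        this.b_iso_or_equiv = b_iso_or_equiv;
    }
}

// Formats one atom the same way as the client code on the next page.
final class AtomSiteFormat {
    static String line(AtomSite a) {
        return a.id + " " + a.type_symbol.id + " (" + a.cartn.x + ", " + a.cartn.y + ", " + a.cartn.z + ")";
    }
}
```

Because the fields are plain typed members, a client reads a coordinate as a.cartn.x with no parsing step at all.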

Page 35

Example Code and Resulting Output

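    // entryFactory: an EntryFactory object reference obtained through the ORB
    // (ORB initialization and lookup are omitted from this excerpt)
    Entry e = entryFactory.get_entry_from_id("4hhb");
    AtomSite[] a = e.get_atom_site_list();
    for (int i = 0; i < a.length; i++) {
        // atom id, element symbol, then the Cartesian coordinates
        System.out.println(a[i].id + " " + a[i].type_symbol.id + " ("
            + a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");
    }

produces:

    1 N (11.065, 7.352, 9.598)
    2 C (12.436, 7.764, 9.902)
    3 C (12.883, 7.09, 11.208)
    4 O (12.088, 7.0, 12.147)
    5 C (12.611, 9.264, 10.06)
    ...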

Page 36

What are the alternatives to Corba?

TCP/IP sockets
– Byte stream

DCOM, COM+, OLE, .NET (Microsoft only)
– DCOM/Corba bridges are available from several vendors

SOAP (Simple Object Access Protocol)
– XML-based

Page 37

Unified Modeling Language (UML): What Do All Those Arrows and Boxes Mean?

A schematic language for defining graphical representations of software

UML = things, relations and diagrams

9 types of diagrams

The most commonly used diagram is the “Class Diagram”

Page 38

UML Class Diagram Example

Diagram: the EntryFactory class, with methods get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported() and get_native_entry_representation(), related to an EntryIdList of * EntryId (with Identifier) and a ModificationDateList of * ModificationDate, where each ModificationDate has attributes Entry_id: EntryId and date: TimeBase::TimeT.
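The class diagram on this page can be read almost directly as code. Below is a hypothetical Java rendering: the method and attribute names come from the diagram, but the return types are assumptions, and native_formats_supported() and get_native_entry_representation() are left out because their signatures are not shown. The one concrete detail is TimeBase::TimeT, which the OMG Time Service defines as a 64-bit count of 100-nanosecond intervals since 15 October 1582 00:00 UTC, so the sketch includes a conversion to java.time.Instant.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical Java view of the diagram; return types are assumptions.
interface EntryFactoryView {
    String get_version();
    List<String> get_entry_id_list();                      // EntryIdList: * EntryId
    List<ModificationDate> get_entry_modification_dates(); // ModificationDateList: * ModificationDate
}

// One ModificationDate: an entry id plus a TimeBase::TimeT value
// (100 ns ticks since 1582-10-15T00:00Z; unsigned in IDL, held here in a signed long).
final class ModificationDate {
    // 1970-01-01T00:00Z expressed in TimeT units.
    static final long UNIX_EPOCH_AS_TIMET = 122_192_928_000_000_000L;

    final String entryId; // "Entry_id" in the diagram
    final long date;      // "date" in the diagram, a TimeT

    ModificationDate(String entryId, long date) {
        this.entryId = entryId;
        this.date = date;
    }

    Instant toInstant() {
        long ticks = date - UNIX_EPOCH_AS_TIMET; // 100 ns ticks since 1970
        return Instant.ofEpochSecond(Math.floorDiv(ticks, 10_000_000L),
                                     Math.floorMod(ticks, 10_000_000L) * 100L);
    }
}
```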

Page 39

UML Class Diagram Basics

Diagram: a class box divided into three compartments:
– Class_Name (underlined for class instances, italics for abstract classes)
– Variables: var1: Type, var2: Type
– Methods: method1(), method2(), method3()

Details may be omitted if not important.

Page 40

UML Relationships

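Diagram: the four kinds of relationship, with multiplicities such as * and 0..1 marked at the ends of the lines:
– Dependency
– Association
– Generalization (Inheritance)
– Aggregation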

Page 41

UML Example

Diagram: the same EntryFactory class diagram as on page 38.

Page 42

XMI: XML Metadata Interchange

UML is a graphical representation; we need some way to exchange UML models between applications

XMI is used to store and transmit UML models

XML-based

Defines XML tags for classes, relationships between classes, etc.

Page 43

OMG MDA (Model Driven Architecture)

Platform Independent Models (PIMs) that define the interface are defined in UML

The PIMs are translated to Platform Specific Models (PSMs) such as Corba, SOAP, .NET or XML Schemas

The Corba servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML
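A toy illustration of the PIM-to-PSM idea, not an OMG tool and not the OpenMMS generator: one platform-independent class description (PimClass, with invented PIM type names "String", "Integer" and "Real") is translated into two platform-specific forms, a CORBA IDL struct and an XML Schema complexType. Real MDA tools work from full UML models (typically exchanged as XMI), but the translation step has the same shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical platform-independent description of one class.
final class PimClass {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>(); // keeps declaration order
    PimClass(String name) { this.name = name; }
    PimClass attr(String attrName, String pimType) { attributes.put(attrName, pimType); return this; }
}

final class PsmGenerators {
    // PIM -> CORBA IDL struct
    static String toIdl(PimClass c) {
        StringBuilder sb = new StringBuilder("struct " + c.name + " {");
        for (Map.Entry<String, String> a : c.attributes.entrySet()) {
            sb.append(' ').append(idlType(a.getValue())).append(' ').append(a.getKey()).append(';');
        }
        return sb.append(" };").toString();
    }

    // PIM -> XML Schema complexType
    static String toXsd(PimClass c) {
        StringBuilder sb = new StringBuilder("<xs:complexType name=\"" + c.name + "\"><xs:sequence>");
        for (Map.Entry<String, String> a : c.attributes.entrySet()) {
            sb.append("<xs:element name=\"").append(a.getKey())
              .append("\" type=\"").append(xsdType(a.getValue())).append("\"/>");
        }
        return sb.append("</xs:sequence></xs:complexType>").toString();
    }

    private static String idlType(String t) {
        switch (t) {
            case "String": return "string";
            case "Integer": return "long";   // IDL long is 32 bits
            case "Real": return "float";
            default: throw new IllegalArgumentException("no IDL mapping for " + t);
        }
    }

    private static String xsdType(String t) {
        switch (t) {
            case "String": return "xs:string";
            case "Integer": return "xs:int";
            case "Real": return "xs:float";
            default: throw new IllegalArgumentException("no XSD mapping for " + t);
        }
    }
}
```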

Page 44

MDA Platform Independent to Platform Dependent Translation

Diagram: UML (platform independent) translated into Corba, SOAP, XML and .NET (platform dependent).

Page 45

Thanks and Acknowledgments

Phil Bourne, John Westbrook, David Benton, Karl Konnerth, Lynn TenEyck