(near term) develop database requirements to yield schema and interfaces

Post on 23-Jan-2016


1. (near term) Develop Database Requirements to Yield Schema and Interfaces

2. MoBIoS: Database Management for Data in Metric Spaces

Daniel P. Miranker

Univ. of Texas

What we know for sure: Exploit Commodity Architecture

DB

Curating New Content

Computing Grid

Web App Server

External Data/DB Sources

Users

Repository Schema and Interface Definitions

Issue:

• Database organization and data interchange should be addressed simultaneously

• Once established, difficult to change

Best to get this right the first time.

What we know for sure:

DB Schema

Curating New Content

Computing Grid

Web App Server

1. Data transfer: XML & Nexus files

2. Curate (manage quality)

Users

Both 1 & 2 impact the schema (data provenance)

XML and Bioinformatics

• Taxonomic Markup Language (TML)

• PhyloML

• BEAST: Bayesian Evolutionary Analysis Sampling Trees

• AGAVE: Architecture for Genomic Annotation, Visualization and Exchange

Answers Start with a Requirements Analysis

• Who

• What

• Why

• How

“Use cases”: specific examples of what is to be accomplished

A Head Start

Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03]

• Did a requirements analysis

• Proof of concept for a correctly normalized database schema

1 evolutionary (tree)-edge = 1 row in the database
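The one-edge-per-row idea can be sketched with SQLite from Python. This is only an illustration: the table names (`node`, `edge`), the column names, and the helper `leaf_taxa_below` are assumptions made for the example, not the schema from the BIBE ’03 paper.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    tree_id INTEGER NOT NULL,
    taxon   TEXT                      -- NULL for internal nodes
);
CREATE TABLE edge (
    tree_id   INTEGER NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES node(node_id),
    child_id  INTEGER NOT NULL REFERENCES node(node_id),
    length    REAL,
    PRIMARY KEY (tree_id, child_id)   -- each node has at most one parent
);
""")

# The tree ((A,B),C): node 1 is the root, node 2 is the (A,B) clade.
nodes = [(1, 1, None), (2, 1, None), (3, 1, "A"), (4, 1, "B"), (5, 1, "C")]
edges = [(1, 1, 2, 0.5), (1, 2, 3, 0.1), (1, 2, 4, 0.2), (1, 1, 5, 0.7)]
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", edges)


def leaf_taxa_below(conn, tree_id, parent_id):
    """Return the sorted taxa of all leaves below parent_id (recursive query)."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(node_id) AS (
            SELECT child_id FROM edge WHERE tree_id = ? AND parent_id = ?
            UNION ALL
            SELECT e.child_id
            FROM edge e JOIN sub s ON e.parent_id = s.node_id
            WHERE e.tree_id = ?
        )
        SELECT n.taxon
        FROM sub JOIN node n ON n.node_id = sub.node_id
        WHERE n.taxon IS NOT NULL
        ORDER BY n.taxon
        """,
        (tree_id, parent_id, tree_id),
    ).fetchall()
    return [r[0] for r in rows]
```

With each edge stored as its own row, a clade query becomes an ordinary recursive SQL query, for example `leaf_taxa_below(conn, 1, 2)` for the clade rooted at node 2.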

Who is interested in using Phylogenies?

• Casual Users

• Visualization

• Study Development

• Super-tree algorithms

• Simulation Studies

• Parameter Derivation

• Comparative Genomics

Super-Tree Algorithms Use-Cases

Construct phylogenies by assembling existing studies

Collect those studies by:

• Determine minimum spanning clade for a set of taxa

• Find all phylogenies sufficiently similar to a given phylogeny
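As a rough sketch of the first use case, the minimum spanning clade of a set of taxa is the subtree rooted at their deepest common ancestor. The function below assumes the tree is available as a child-to-parent mapping; that representation and the name `minimum_spanning_clade` are assumptions made for illustration.

```python
def minimum_spanning_clade(parent, taxa):
    """Return the deepest node whose subtree contains every node in taxa.

    parent maps each child to its parent; the root has no entry.
    """
    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")

    # Ancestors of the first taxon, ordered from the taxon up to the root.
    candidates = path_to_root(taxa[0])
    for taxon in taxa[1:]:
        ancestors = set(path_to_root(taxon))
        # Keep only the ancestors shared with this taxon, preserving order.
        candidates = [n for n in candidates if n in ancestors]
    return candidates[0]


# The tree ((A,B),C) as a child -> parent mapping.
example_parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
```

For the example tree, `minimum_spanning_clade(example_parent, ["A", "B"])` returns the internal node `"AB"`, while adding `"C"` to the set pushes the answer up to the root.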

Requirements of Phylogenetic Databases

The MoBIoS Project

Molecular Biological Information System

Daniel P. Miranker

University of Texas

MoBIoS – A Simple Idea

Organize the Storage Manager Around Metric Space Indexing

Relational Databases: B+ trees, 1-dimensional

Spatial Databases: R & K-D trees, 2 & 3 dimensions

Metric Databases: VP, M & GNAT trees, no dimensions or very high dimensions
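To make the metric-index idea concrete, here is a minimal vantage-point (VP) tree sketch with a range query. It is not the MoBIoS implementation; the class and function names are assumptions, and Hamming distance on equal-length strings stands in for a biological metric.

```python
import random


def hamming(x, y):
    """Hamming distance between two equal-length strings (a true metric)."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, d, rng=None):
    """Recursively partition points around randomly chosen vantage points."""
    points = list(points)
    if not points:
        return None
    rng = rng or random.Random(0)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    dists = [d(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, dist in zip(points, dists) if dist <= radius]
    outside = [p for p, dist in zip(points, dists) if dist > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, d, rng),
                  build_vp_tree(outside, d, rng))


def range_search(node, query, r, d, out=None):
    """Collect every indexed point p with d(query, p) <= r."""
    if out is None:
        out = []
    if node is None:
        return out
    dq = d(query, node.point)
    if dq <= r:
        out.append(node.point)
    # Triangle inequality: a match inside the ball implies dq - r <= radius,
    # and a match outside the ball implies dq + r > radius.
    if dq - r <= node.radius:
        range_search(node.inside, query, r, d, out)
    if dq + r > node.radius:
        range_search(node.outside, query, r, d, out)
    return out
```

The only property of the data that the index relies on is the triangle inequality, which is why no coordinates or dimensions are needed. Whole subtrees are skipped whenever one of the two pruning tests fails.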

Biological queries conducted with sequential scans.

• Sequence (BLAST)

• Phylogenies (Tree of Life)

• Mass Spectra (Proteomics)

• Ligand Docking (Rational Drug Design)

Metric Space is

• a pair, M = (D, d), where

• D is a set of points

• d is a [metric] distance function with the following properties:

– d(x, y) = d(y, x) (symmetry)

– d(x, y) > 0 for x ≠ y, d(x, x) = 0 (non-negativity)

– d(x, y) <= d(x, z) + d(z, y) (triangle inequality)
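The three properties can be checked by brute force on a small sample of points. The sketch below is illustrative only: the helper `satisfies_metric_axioms` is a hypothetical name, and Hamming distance is used as a convenient example of a distance function on sequences.

```python
from itertools import product


def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def satisfies_metric_axioms(points, d):
    """Check symmetry, non-negativity and the triangle inequality on a sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) != d(y, x):            # symmetry
            return False
        if x == y and d(x, y) != 0:       # d(x, x) = 0
            return False
        if x != y and d(x, y) <= 0:       # d(x, y) > 0 for distinct points
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):   # triangle inequality
            return False
    return True


def squared_difference(a, b):
    """Not a metric: it violates the triangle inequality."""
    return (a - b) ** 2
```

Passing such a check on a sample does not prove that a function is a metric, but a single failing triple is enough to show that it is not, as with `squared_difference` on the points 0, 1 and 2.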

Can Biology Be Modeled by Metrics?

• Metrics already exist for:

– Phylogenetic trees

– Ligand docking

• First Biologically Effective Metric Model of Amino Acid Substitution [Xu & Miranker 03]

In effect, precisely the phylogenetic relationships among sequences are exploited to form a database index.

• Metrics for proteomic mass-spectra underway

MoBIoS Architecture (Molecular Biological Information System)

phylogenies

First Application (with Randy Linder)

Compared:

{entire Arabidopsis genome} x {“entire” rice genome}

To determine conserved pairs of primer pairs, in O(m log n).

Will repeat the study soon, faster.

When biological data is put into an RDBMS

• Primary data is stored in text or blob fields

– Annotations may be relational

• Data retrieval

– Filter DB, sequential dump, O(n), to utilities

• E.g. BLAST, TreeBASE, Sequest

Organism   Function   Sequence (BLOB)
Yeast      membrane   AACCGGTTT
Yeast      mitosis    TATCGAAA
E. coli    membrane   AGGCCTA
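The filter-then-dump pattern can be sketched with SQLite. The table name `gene`, the helper names, and the simple substring test are assumptions made for this example; the substring test is only a stand-in for an external utility such as BLAST, not a model of how such a tool works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (organism TEXT, function TEXT, sequence BLOB)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        ("Yeast", "membrane", b"AACCGGTTT"),
        ("Yeast", "mitosis", b"TATCGAAA"),
        ("E. coli", "membrane", b"AGGCCTA"),
    ],
)


def contains_motif(sequence, motif):
    """Stand-in for an external sequence utility: exact substring test."""
    return motif in sequence


def filter_then_scan(conn, function, motif):
    """Filter on a relational annotation, then scan every surviving sequence."""
    # The database can only use the relational column; the BLOB is opaque to it.
    rows = conn.execute(
        "SELECT organism, sequence FROM gene WHERE function = ?", (function,)
    )
    # Sequential O(n) pass over the dumped rows, done outside the database.
    return [
        (organism, sequence.decode())
        for organism, sequence in rows
        if contains_motif(sequence, motif.encode())
    ]
```

Calling `filter_then_scan(conn, "membrane", "CCGG")` narrows the rows by annotation first, but every remaining sequence still has to be read and tested one by one, since the database has no index over the sequence content itself.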

Homework: Due tomorrow morning

1. Who are you (generically)?

2. Use case involving the database

Don’t know: A General Web Service

DB Schema

Curating New Content

Computing Grid

Web App Server

ToL Infrastructure @ SDSC

Computing Grid
