(near term) develop database requirements to yield schema and interfaces


Post on 23-Jan-2016


TRANSCRIPT

Page 1: (near term)  Develop Database Requirements to Yield Schema and Interfaces

1. (near term) Develop Database Requirements to Yield Schema and Interfaces

2. MoBIoS: Database Management for Data in Metric Spaces

Daniel P. Miranker

Univ. of Texas

Page 2: (near term)  Develop Database Requirements to Yield Schema and Interfaces

What we know for sure: Exploit Commodity Architecture

DB

Curating New Content

Computing Grid

Web App Server

External Data/DB Sources

Users

Page 3: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Repository Schema and Interface Definitions

Issue:

• Database organization and data interchange should be addressed simultaneously

• Once established, difficult to change

Best to get this right the first time.

Page 4: (near term)  Develop Database Requirements to Yield Schema and Interfaces

What we know for sure:

DB Schema

Curating New Content

Computing Grid

Web App Server

Users

1. Data transfer: XML & Nexus files

2. Curate (manage quality)

Both 1 & 2 impact the schema (data provenance)

Page 5: (near term)  Develop Database Requirements to Yield Schema and Interfaces

XML and Bioinformatics

• Taxonomic Markup Language (TML)

• PhyloML

• BEAST: Bayesian Evolutionary Analysis Sampling Trees

• AGAVE: Architecture for Genomic Annotation Visualization and Exchange

   

Page 6: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Answers Start with a Requirements Analysis

• Who

• What

• Why

• How

“Use cases”: specific examples of what is to be accomplished

Page 7: (near term)  Develop Database Requirements to Yield Schema and Interfaces

A Head Start

Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03]

• Did a requirements analysis

• Proof of concept for a correctly normalized database schema

1 evolutionary (tree)-edge = 1 row in the database
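The one-edge-per-row idea can be sketched with SQLite from Python. The table and column names below (`node`, `edge`, `tree_id`, `length`) and the toy tree are illustrative assumptions, not the schema from the BIBE ’03 paper.

```python
import sqlite3

# Hypothetical schema: every tree edge (parent -> child) is exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    tree_id INTEGER NOT NULL,
    label   TEXT                 -- taxon name for leaves, NULL for internal nodes
);
CREATE TABLE edge (
    tree_id   INTEGER NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES node(node_id),
    child_id  INTEGER NOT NULL REFERENCES node(node_id),
    length    REAL,
    PRIMARY KEY (tree_id, child_id)  -- a node has at most one parent per tree
);
""")

# Toy tree ((A,B),C): node 1 is the root, node 2 is the parent of A and B.
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(1, 1, None), (2, 1, None), (3, 1, "A"), (4, 1, "B"), (5, 1, "C")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 1.0), (1, 2, 3, 0.5), (1, 2, 4, 0.5), (1, 1, 5, 1.5)])


def leaves_below(conn, tree_id, root_id):
    """Return the taxon labels in the clade rooted at root_id, sorted by name."""
    rows = conn.execute("""
        WITH RECURSIVE sub(node_id) AS (
            SELECT ?
            UNION ALL
            SELECT e.child_id
            FROM edge e JOIN sub s ON e.parent_id = s.node_id
            WHERE e.tree_id = ?
        )
        SELECT n.label
        FROM sub JOIN node n ON n.node_id = sub.node_id
        WHERE n.label IS NOT NULL
        ORDER BY n.label
    """, (root_id, tree_id)).fetchall()
    return [r[0] for r in rows]
```

Because each edge is a row, a clade can be retrieved with an ordinary recursive query rather than by parsing a tree stored as text: `leaves_below(conn, 1, 2)` walks the two edges under node 2.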

Page 8: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Who is interested in using Phylogenies?

• Casual Users

• Visualization

• Study Development

• Super-tree algorithms

• Simulation Studies

• Parameter Derivation

• Comparative Genomics

Page 9: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Super-Tree Algorithms Use-Cases

Construct phylogenies by assembling existing studies

Collect those studies by:

• Determining the minimum spanning clade for a set of taxa

• Finding all phylogenies sufficiently similar to a given phylogeny
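As a rough illustration of the first use case, the minimum spanning clade of a set of taxa is the subtree rooted at their deepest common ancestor. The sketch below assumes the tree is available as a child-to-parent map (for example, loaded from edge rows); the function names and the toy tree are hypothetical.

```python
def path_to_root(parent, node):
    """Return [node, parent(node), ..., root] using a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def minimum_spanning_clade(parent, taxa):
    """Return the deepest node whose subtree contains every taxon in `taxa`.

    Assumes all taxa belong to the same tree described by `parent`.
    """
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    # Ancestors of the first taxon, ordered from the taxon up to the root.
    candidates = path_to_root(parent, taxa[0])
    for taxon in taxa[1:]:
        ancestors = set(path_to_root(parent, taxon))
        # Keep only common ancestors; order (deepest first) is preserved.
        candidates = [n for n in candidates if n in ancestors]
    return candidates[0]


# Toy tree ((A,B)X,C)root expressed as child -> parent.
toy_parent = {"X": "root", "C": "root", "A": "X", "B": "X"}
```

With this toy tree, the clade spanning A and B is rooted at `X`, while any set that includes C is only spanned by `root`.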

Page 10: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Requirements of Phylogenetic Databases

Page 11: (near term)  Develop Database Requirements to Yield Schema and Interfaces

The MoBIoS Project

Molecular Biological Information System

Daniel P. Miranker

University of Texas

Page 12: (near term)  Develop Database Requirements to Yield Schema and Interfaces

MoBIoS – A Simple Idea

Organize the Storage Manager Around Metric Space Indexing

Relational Databases: B+ trees; 1 dimensional

Spatial Databases: R & K-D trees; 2 & 3 dimensions

Metric Databases: VP, M & GNAT trees; no dimensions, or very high dimensions
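To make the metric-index idea concrete, here is a minimal vantage-point (VP) tree sketch. It is not the MoBIoS storage manager; the class and function names are assumptions, and Hamming distance on equal-length strings stands in for a biological metric. The index uses only pairwise distances, never coordinates.

```python
import random


def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, dist, rng=None):
    """Recursively partition points by distance to a randomly chosen vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def range_query(node, query, r, dist, out=None):
    """Collect all indexed points within distance r of query."""
    if out is None:
        out = []
    if node is None:
        return out
    d = dist(node.point, query)
    if d <= r:
        out.append(node.point)
    # Triangle inequality: a match p satisfies d - r <= dist(vp, p) <= d + r,
    # so a branch is skipped when that interval cannot overlap it.
    if d - r <= node.radius:
        range_query(node.inside, query, r, dist, out)
    if d + r > node.radius:
        range_query(node.outside, query, r, dist, out)
    return out
```

A range query such as `range_query(build_vp_tree(seqs, hamming), "AACC", 1, hamming)` returns the same set as a sequential scan, but whole branches are pruned whenever the triangle inequality rules them out.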

Page 13: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Biological queries conducted with sequential scans.

• Sequence (BLAST)

• Phylogenies (Tree of Life)

• Mass Spectra (Proteomics)

• Ligand Docking (Rational Drug Design)

Page 14: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Metric Space is

• a pair, M = (D, d), where

• D is a set of points

• d is a [metric] distance function with the following properties:

– d(x, y) = d(y, x) (symmetry)

– d(x, y) >= 0, and d(x, y) = 0 if and only if x = y (non-negativity)

– d(x, y) <= d(x, z) + d(z, y) (triangle inequality)
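The three properties can be checked by brute force on a small sample of points. The sketch below is only a sanity check, not a proof; `is_metric_on` is a hypothetical helper, and Hamming distance and squared difference are chosen purely as examples of a metric and a non-metric.

```python
from itertools import product


def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))


def squared_difference(x, y):
    """Symmetric and non-negative, but violates the triangle inequality."""
    return (x - y) ** 2


def is_metric_on(points, d):
    """Return True if d satisfies the metric properties on this finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) != d(y, x):            # symmetry
            return False
        if d(x, y) < 0:                   # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):    # zero distance only for identical points
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):   # triangle inequality
            return False
    return True
```

Squared difference fails on the sample `[0, 1, 2]` because d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2, which is exactly the property a metric index relies on for pruning.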

Page 15: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Can Biology Be Modeled by Metrics?

• Metrics already exist for:

– Phylogenetic trees

– Ligand docking

• First Biologically Effective Metric Model of Amino Acid Substitution [Xu & Miranker 03]. In effect, precisely the phylogenetic relationships among sequences are exploited to form a database index.

• Metrics for proteomic mass-spectra underway

Page 16: (near term)  Develop Database Requirements to Yield Schema and Interfaces

MoBIoS Architecture (Molecular Biological Information System)

phylogenies

Page 17: (near term)  Develop Database Requirements to Yield Schema and Interfaces

First Application (with Randy Linder)

Compared:

{entire Arabidopsis genome} x {“entire” rice genome}

to determine conserved pairs of primer pairs,

in O(m log n). The study will be repeated soon, faster.

Page 18: (near term)  Develop Database Requirements to Yield Schema and Interfaces

When biological data is put into an RDBMS

• Primary data is stored in text or blob fields

– Annotations may be relational

• Data retrieval

– Filter DB, then sequential dump, O(n), to utilities

• E.g. BLAST, TreeBASE, Sequest

Organism   Function   Sequence (BLOB)
Yeast      membrane   AACCGGTTT
Yeast      mitosis    TATCGAAA
E. Coli    membrane   AGGCCTA
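The filter-then-scan pattern described on this slide can be sketched as follows, using the three example rows above. The table name `gene` and the `contains_motif` helper are assumptions; the helper is only a stand-in for an external utility such as BLAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (organism TEXT, function TEXT, sequence BLOB)")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    ("Yeast", "membrane", b"AACCGGTTT"),
    ("Yeast", "mitosis", b"TATCGAAA"),
    ("E. Coli", "membrane", b"AGGCCTA"),
])


def contains_motif(sequence, motif):
    """Hypothetical stand-in for an external sequence utility."""
    return motif in sequence


def filter_then_scan(conn, function, motif):
    """Filter on the relational annotation, then scan every surviving BLOB."""
    # The RDBMS can only use the annotation columns to narrow the search.
    cursor = conn.execute(
        "SELECT organism, sequence FROM gene WHERE function = ? ORDER BY rowid",
        (function,),
    )
    # Each remaining sequence is handed to the utility one by one: O(n).
    return [organism for organism, seq in cursor if contains_motif(seq, motif)]
```

The database never looks inside the sequence column; `filter_then_scan(conn, "membrane", b"CCGG")` still has to touch every membrane sequence, which is the cost a metric-space index aims to avoid.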

Page 19: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Homework: Due tomorrow morning

1. Who are you (generically)?

2. Use case involving the database

Page 20: (near term)  Develop Database Requirements to Yield Schema and Interfaces

Don’t know: A General Web Service

DB Schema

Curating New Content

Computing Grid

Web App Server

ToL Infrastructure @ SDSC

Computing Grid