1
MAGE-OM and ArrayExpress database model
Ugis Sarkans, EBI
2
Outline
• what is MAGE-OM
• what is ArrayExpress
• what language is used for modeling
• MAGE-OM structure
• ArrayExpress status and future
• MAGE future developments
3
MAGE-OM
• MicroArray Gene Expression Object Model
– also: MAGE-ML (.. Markup Language), MAGE-STK (.. Software ToolKit)
• Merging of MAML (MicroArray Markup Language) and GEML (Gene Expression Markup Language)
4
MAGE: brief history
• December 2000 - initial submissions of proposals to OMG (Object Management Group):
– EBI (on behalf of MGED) - MAML
– Rosetta (on behalf of GEML community) - GEML + some IDLs
– NetGenics - IDLs
• Decision to proceed with a joint submission
• Decision to comply with Model Driven Architecture (MDA) principles
• October 2001 - joint submission to OMG (Rosetta and MGED)
5
Model Driven Architecture
• Platform Independent Model (UML)
– most of the time spent on this
• Platform Specific Models
– XML
  • UML (refined from PIM)
  • DTD (generated plus hand modifications)
– CORBA (not for MAGE)
  • UML (refined from PIM)
  • IDL (hopefully generated)
– ….
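To illustrate the PIM-to-PSM step, here is a minimal sketch that derives two platform-specific outputs (a DTD fragment and SQL table definitions) from one platform-independent description. The dictionary format, the class and attribute names, and the functions `to_dtd` and `to_sql` are all assumptions made for this example; they are not the actual MAGE generators, which work from the UML model.

```python
# Hypothetical platform-independent model: class name -> list of (attribute, type).
# The classes and attributes are invented for illustration only.
PIM = {
    "Experiment": [("name", "string"), ("description", "string")],
    "ArrayDesign": [("identifier", "string"), ("feature_count", "int")],
}

# Assumed mapping from model types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER"}


def to_dtd(model):
    """Emit one empty element per class, with one CDATA attribute per model attribute."""
    lines = []
    for cls, attrs in model.items():
        lines.append(f"<!ELEMENT {cls} EMPTY>")
        for name, _type in attrs:
            lines.append(f"<!ATTLIST {cls} {name} CDATA #IMPLIED>")
    return "\n".join(lines)


def to_sql(model):
    """Emit one CREATE TABLE statement per class, with a surrogate primary key."""
    statements = []
    for cls, attrs in model.items():
        cols = ["id INTEGER PRIMARY KEY"]
        cols += [f"{name} {SQL_TYPES[type_]}" for name, type_ in attrs]
        statements.append(f"CREATE TABLE {cls} ({', '.join(cols)});")
    return "\n".join(statements)


if __name__ == "__main__":
    print(to_dtd(PIM))
    print(to_sql(PIM))
```

Both outputs come from the same model, so a change to the PIM propagates to every platform-specific form; hand modifications, as mentioned for the DTD, would be applied after generation.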
6
ArrayExpress
• first version (object model) - 1999, in collaboration with German Cancer Research Centre (DKFZ)
• second version (object model) - end of 2000, prototype development funded by Incyte
7
ArrayExpress (2)
• implementation - first half of 2001 - Oracle schema, data loader (from MAML), prototype Web interface, a few datasets loaded
• decision to use MAGE-OM as basis for further development
• EU funding - 2002-2004, 8 new positions
8
ArrayExpress - features
• MIAME-compliant
• able to import MAML (MAGE-ML) formatted data
• can deal with both raw and processed data
• independence of:
– experimental platforms
– image analysis methods
– data normalization methods
• object model-based query mechanism
• supports upcoming OMG standard for expression data
9
Unified Modeling Language
• graphical language for describing software systems (and more ..)
• notation - yes
• methodology - no
10
UML diagram types
• class
• state
• collaboration
• sequence
• ……..
11
State diagram
12
Sequence diagram
13
Collaboration diagram
14
Class diagram
15
Class diagrams - notation
• classes
• attributes
– types
• operations
• relationships
– subclass relationship
– aggregate relationship
– association
  • role names
  • cardinalities
  • navigation
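The notation elements listed above can be mirrored in code. The following is a minimal sketch using Python dataclasses; the classes, attributes, and role names are illustrative assumptions and are not a transcription of the MAGE-OM diagrams.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Describable:              # a class; here it acts as the superclass
    name: str                   # attribute with a type


@dataclass
class Feature(Describable):     # subclass relationship (inheritance)
    row: int = 0
    column: int = 0


@dataclass
class Contact(Describable):
    email: str = ""


@dataclass
class ArrayDesign(Describable):
    # Aggregate relationship: role name "features", cardinality 0..*.
    features: List[Feature] = field(default_factory=list)
    # Association: role name "provider", cardinality 0..1. It is navigable
    # only from ArrayDesign to Contact; Contact holds no back-reference.
    provider: Optional[Contact] = None

    def feature_count(self) -> int:   # operation
        return len(self.features)


if __name__ == "__main__":
    design = ArrayDesign(
        name="demo design",
        features=[Feature(name="f1", row=1, column=2)],
        provider=Contact(name="demo lab", email="lab@example.org"),
    )
    print(design.feature_count())
```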
16
Annotated class diagram. Labelled elements: class, class from another package, attribute, aggregation, navigation, role name, cardinality, association name, inheritance
17
Class diagram
18
Implementation issues
• Java, C++ - “easy”
• relational databases
– classes - tables
– 1:1, 1:N - foreign key
– N:M - table
– subclass relations
  • all subclasses in the same table
  • separate table for superclass and subclasses
• XML
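The relational mapping rules above can be sketched with an in-memory SQLite database. All table and column names here are hypothetical and chosen only to show each rule; this is not the ArrayExpress schema.

```python
import sqlite3

SCHEMA = """
-- class -> table
CREATE TABLE experiment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- 1:N association -> foreign key on the "many" side
CREATE TABLE bioassay (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id)
);
-- N:M association -> separate link table
CREATE TABLE contact (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE experiment_contact (
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    contact_id    INTEGER NOT NULL REFERENCES contact(id),
    PRIMARY KEY (experiment_id, contact_id)
);
-- subclass option 1: all subclasses in the same table, with a discriminator
CREATE TABLE design_element (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL CHECK (kind IN ('feature', 'reporter')),
    name     TEXT NOT NULL,
    row_no   INTEGER,   -- used by 'feature' rows only
    sequence TEXT       -- used by 'reporter' rows only
);
-- subclass option 2: separate table for superclass and subclass
CREATE TABLE element_base (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE element_feature (
    id     INTEGER PRIMARY KEY REFERENCES element_base(id),
    row_no INTEGER NOT NULL
);
"""


def build():
    """Create the example schema in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build()
    conn.execute("INSERT INTO experiment (id, name) VALUES (1, 'demo')")
    conn.execute("INSERT INTO bioassay (name, experiment_id) VALUES ('hyb-1', 1)")
    print(conn.execute(
        "SELECT COUNT(*) FROM bioassay WHERE experiment_id = 1").fetchone()[0])
```

The single-table option avoids joins but leaves subclass-specific columns empty for other subclasses; the table-per-class option keeps columns tight at the cost of a join on the shared primary key.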
19
Tools
• Rational Rose
– bad graphical capabilities
– forward/reverse engineering
– API (VB-based)
• open source
– ArgoUML
20
UML Packages
Packages shown in the diagram: BSANE, BQS, Description, Protocol, Measurement, Audit, Treatment, Transformation, BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, ArrayManufacture, QuantitationType
21
Top level structure
22
BioAssay
23
BioMaterial
24
ArrayDesign
25
DesignElement
26
DesignElement
27
DesignElement mapping
28
Data
29
BioSequence
30
ArrayManufacture
31
Quantitations
32
HigherLevelAnalysis
33
BioEvent
34
Protocol
35
Description
36
AuditAndSecurity
37
Measurement
38
ArrayExpress: current status
• Object model (MAGE-OM) - stable
• Database schema - generated (standard SQL, we run under Oracle)
• Data loader from MAGE-ML - generated
• Web interface (queries, browsing) - under development
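As a rough illustration of what an XML-to-database loader does, here is a minimal hand-written sketch. The XML below is NOT real MAGE-ML: its element and attribute names, the `experiment` table, and the `load` function are invented for this example, and the actual loader is generated from the object model rather than written by hand.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML, not real MAGE-ML: all element and attribute names are invented.
SAMPLE = """
<Experiments>
  <Experiment identifier="E-1" name="demo experiment"/>
  <Experiment identifier="E-2" name="second experiment"/>
</Experiments>
"""


def load(xml_text, conn):
    """Insert one row per <Experiment> element; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment "
        "(identifier TEXT PRIMARY KEY, name TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [(el.get("identifier"), el.get("name")) for el in root.iter("Experiment")]
    conn.executemany("INSERT INTO experiment VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(SAMPLE, conn))
```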
39
Near future developments
• Dedicated hardware for ArrayExpress
• Good quality data coming from collaborators (annotation tools needed)
• Data uploading and Web interface made public
40
Future developments
• Integration with existing tools (Expression Profiler)
• New analytical tools
• Links with other databases
• Data curation, liaison with data providers
41
ArrayExpress architecture
Architecture diagram. Component labels: central database (experiment-centred), data warehouse, application server (Java servlets), Web server, image server, ArrayExpress, curation, MAGE-ML, API, curation tool, database
42
MAGE schedule
• OMG meeting, Dublin, November 12-16 - specification hopefully adopted
• Mechanism for incorporating changes and user feedback
• MAGE programming jamboree, EBI, December 6-11: API development, parser generation, annotation tools (MAGE STK)
43
Resources
• Web site
– links to documents
  • presentations
  • UML models – also HTML version and PNG image files of diagrams
– http://www.geml.org/omg.htm
• Mailing list
– [email protected]
– to subscribe, send the following to
subscribe lsr-ge <yourEmailAddress>
44
Acknowledgements
• Doug Bassett (Rosetta)
• Alvis Brazma (EBI)
• Steve Chervitz (Affymetrix)
• Francisco De La Vega (Applied Biosystems)
• Michael Dickson (NetGenics)
• David Frankel (IONA)
• Scott Markel (NetGenics)
• Michael Miller (Rosetta)
• Dave Nellesen (Incyte)
• Alan Robinson (EBI)
• Martin Senger (EBI)
• Paul Spellman (Lawrence Berkeley Lab)
• Jason Stewart (NCGR)
• Charles Troup (Agilent)