1 ArrayExpress Ugis Sarkans, EBI


Post on 29-Jan-2016


Page 1

ArrayExpress

Ugis Sarkans, EBI

Page 2

Overview

• Underlying standards
  – MIAME
  – MAGE*
• Data submission
• Data access
  – annotations
  – actual data
  – array design descriptions
• Some technical details
• Future developments

Page 3

What information should be exchanged?

• MIAME - Minimum Information About a Microarray Experiment
  – informal specification
  – paper published in Nature Genetics
  – goal - to initiate discussion:
    • which details are important and which may not be
  – ArrayExpress can store MIAME data (and more)

Page 4

MAGE-OM

• MAGE-OM: MicroArray Gene Expression Object Model
  – in January 2002 became an “adopted” OMG specification
  – January to August 2002 - finalization process
  – in September became an “available” specification
  – should be set in stone for the next 2 years
  – thinking about MAGE v2 started
    • user feedback
    • support for other types of functional genomics data
    • more precise handling of data manipulation

Page 5

UML Packages of MAGE

[Diagram: the MAGE-OM packages BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity and BQS, grouped by the legend: what was used, what was done, results, miscellaneous]

Page 6

MAGE-ML

• MAGE-ML: MicroArray Gene Expression Markup Language
  – generated from MAGE-OM, therefore evolved automatically
  – translation from Jan 2002 to Sep 2002 DTD quite easy

Page 7

ArrayExpress: data

• currently - 9 experiments, 4 array designs:
  – from EMBL - human, yeast
  – from Sanger - pombe
• coming:
  – array descriptions: Affymetrix, Agilent
  – labs: TIGR, Utrecht, more from Sanger, ...
  – export from existing DBs: SMD, RAD
  – tools - MAGE-ML export: Jexpress, BASE, ...
  – ILSI project
• journal requirements: Nature, Lancet, ...

Page 8

Help with MAGE-ML: MAGEstk

• MAGE-ML - the only way of getting data into ArrayExpress
• MAGEstk: MicroArray Gene Expression Software ToolKit
  – Jamboree IV in Stanford, beginning of December
  – used in MIAMExpress (MAGE-ML export)

Page 9

MAGEstk

• Programming APIs
• Mapping of MAGE-OM to language-specific OMs
• APIs are automatically generated from the OM specifications
  – get/set methods for associations
  – get/set methods for attributes
• XML <=> language-specific OM marshallers/unmarshallers - also automatically generated
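The idea of generating get/set methods and XML marshallers from a model specification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MODEL_SPEC`, `make_class`, `marshal` and `unmarshal` are hypothetical names, the two-class model is a drastic simplification rather than the real MAGE-OM, and the XML it produces is not MAGE-ML and is not what MAGEstk emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified model specification (NOT the real MAGE-OM).
MODEL_SPEC = {
    "Experiment": {"attributes": ["identifier", "name"], "associations": ["bioAssays"]},
    "BioAssay": {"attributes": ["identifier", "name"], "associations": []},
}


def _cap(field):
    """identifier -> Identifier, so accessors read getIdentifier/setIdentifier."""
    return field[0].upper() + field[1:]


def make_class(class_name, spec):
    """Generate a class with get/set methods for every attribute and association."""
    def __init__(self):
        for attr in spec["attributes"]:
            setattr(self, "_" + attr, None)
        for assoc in spec["associations"]:
            setattr(self, "_" + assoc, [])

    namespace = {"__init__": __init__}
    for field in spec["attributes"] + spec["associations"]:
        # The default argument f=field binds the current field name to each accessor.
        namespace["get" + _cap(field)] = lambda self, f=field: getattr(self, "_" + f)
        namespace["set" + _cap(field)] = (
            lambda self, value, f=field: setattr(self, "_" + f, value)
        )
    return type(class_name, (object,), namespace)


CLASSES = {name: make_class(name, spec) for name, spec in MODEL_SPEC.items()}


def marshal(obj):
    """Object -> XML: attributes become XML attributes, associations child lists."""
    class_name = type(obj).__name__
    spec = MODEL_SPEC[class_name]
    elem = ET.Element(class_name)
    for attr in spec["attributes"]:
        value = getattr(obj, "get" + _cap(attr))()
        if value is not None:
            elem.set(attr, value)
    for assoc in spec["associations"]:
        container = ET.SubElement(elem, assoc)
        for child in getattr(obj, "get" + _cap(assoc))():
            container.append(marshal(child))
    return elem


def unmarshal(elem):
    """XML -> object, driven by the same specification as marshal()."""
    obj = CLASSES[elem.tag]()
    spec = MODEL_SPEC[elem.tag]
    for attr in spec["attributes"]:
        getattr(obj, "set" + _cap(attr))(elem.get(attr))
    for assoc in spec["associations"]:
        container = elem.find(assoc)
        children = [unmarshal(c) for c in container] if container is not None else []
        getattr(obj, "set" + _cap(assoc))(children)
    return obj


# Usage: build a small object graph, write it as XML, and read it back.
experiment = CLASSES["Experiment"]()
experiment.setIdentifier("E-1")
experiment.setName("demo experiment")
bioassay = CLASSES["BioAssay"]()
bioassay.setIdentifier("B-1")
bioassay.setName("hyb 1")
experiment.setBioAssays([bioassay])

xml_text = ET.tostring(marshal(experiment), encoding="unicode")
restored = unmarshal(ET.fromstring(xml_text))
```

Because both the accessors and the marshallers are derived from the one specification, a change to the model only has to be made in one place, which is the property the slide attributes to the generated APIs.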

Page 10

MAGEstk (cont.)

• Use open-source/standard modules/packages
  – Xerces, JDBC, etc.
• Implementation in Java, C++, Perl, Python
• database access modules on top of these APIs
  – Postgres schema
  – DB access layer
• annotation tools - planned

Page 11

ArrayExpress data retrieval

• main objective - help in finding and initial exploration of data; download for detailed analysis

• data repository (now) + data warehouse (in development)

Page 12

Queries - logical structure

[Diagram: queryable entities and their query fields: Array Design (accession, name), Protocol (accession), Experiment (accession), Organisation (name), Person (last name), Array, Species, Sample, Hybridisation, ExperimentDesign, ExperimentType, ExperimentalFactor, Protocol Type]

Page 13

Query form

Page 14

Annotation browsing

Page 15

Data representation

[Diagram: in MAGE/ArrayExpress, data have three axes: BioAssays (hybridizations, data transformations), QuantitationTypes (signal intensity, ratio etc.) and DesignElements (spots, genes); in Expression Profiler, data have two axes: spots and measurements]

Page 16

Exporting data to Expression Profiler

[Diagram: BioAssayData cubes (BioAssayData1, BioAssayData2) with axes BioAssays (hybridizations, data transformations), QuantitationTypes (signal intensity, ratio etc.) and DesignElements (spots). Steps: select BioAssayData cubes; select QuantitationTypes; select BioAssays. Result: a matrix of DesignElements by (QT,BA) pairs]
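The export steps on this slide amount to slicing a three-dimensional cube into a two-dimensional matrix. A minimal sketch follows, assuming the cube is held as nested lists indexed `cube[bioassay][design_element][quantitation_type]`; that axis order, the function name `export_matrix` and all the values are assumptions made for illustration, not ArrayExpress internals. It handles a single cube, whereas the slide also allows several BioAssayData cubes to be selected.

```python
def export_matrix(cube, bioassays, quantitation_types, selected_bas, selected_qts):
    """Flatten cube[bioassay][design_element][quantitation_type] into a 2-D matrix.

    Returns (columns, rows): columns is a list of (QT, BA) pairs, and rows holds
    one list of values per design element, in the cube's own order.
    """
    ba_index = {name: i for i, name in enumerate(bioassays)}
    qt_index = {name: i for i, name in enumerate(quantitation_types)}
    # One output column per selected (QuantitationType, BioAssay) combination.
    columns = [(qt, ba) for ba in selected_bas for qt in selected_qts]
    n_design_elements = len(cube[0])
    rows = [
        [cube[ba_index[ba]][d][qt_index[qt]] for qt, ba in columns]
        for d in range(n_design_elements)
    ]
    return columns, rows


# Made-up values: 2 bioassays x 3 design elements x 2 quantitation types.
BIOASSAYS = ["hyb1", "hyb2"]
QUANTITATION_TYPES = ["signal", "ratio"]
CUBE = [
    [[100.0, 1.5], [200.0, 0.5], [50.0, 1.0]],  # hyb1
    [[110.0, 1.4], [190.0, 0.6], [55.0, 1.1]],  # hyb2
]

# Select both bioassays but only the "ratio" quantitation type.
columns, rows = export_matrix(
    CUBE, BIOASSAYS, QUANTITATION_TYPES,
    selected_bas=["hyb1", "hyb2"], selected_qts=["ratio"],
)
```

Each row of the result corresponds to one design element (spot) and each column to one (QT,BA) pair, which matches the spots-by-measurements layout that Expression Profiler works with.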

Page 17

Data export form

Page 18

Array representation - ADF format

Page 19

Experiment plan display

Page 20

ArrayExpress Infrastructure

[Diagram. Components: ArrayExpress (Oracle + Tomcat), MIAMExpress (MySQL) and Expression Profiler at EBI; other microarray databases; external bioinformatics databases; array manufacturers; LIMS; microarray software; data analysis software; local MIAMExpress installations; data pipelines. Connection labels: submissions, MAGE-ML, MAGE-ML import/export, queries, data analysis, www]

Page 21

ArrayExpress architecture

[Diagram. Components: ArrayExpress (Oracle); MAGE-OM; MAGE-ML (DTD); MAGE-ML documents; MAGEvalidator; MAGEloader; MAGEunloader; error.log; Castor object/relational mapping; Tomcat with Java servlets; Velocity template engine; web page templates]

Page 22

ArrayExpress: other technical details

• Data matrices - stored in NetCDF format:
  – binary format for efficient storage of multidimensional arrays
• Arrays - stored as ADF spreadsheets (in addition to normal MAGE structures)

Page 23

In development

• Immediate:
  – interface efficiency improvements
  – BioAssays - graphical display
  – better integration with Expression Profiler
• Medium-term:
  – user management
    • non-public data (e.g., for reviewers)
  – MAGE-ML export
• Curation tool

Page 24

Data warehouse - for gene- and data-driven queries

[Diagram: measurement values (ratio, absolute change, confidence measure) surrounded by five "Properties" blocks. Fields shown: name, design element type; name, biological entity type; species, sample type; bioassay type; performer lab, exper. type; array design name, platform type, provider]

Page 25

Microarray Informatics team at EBI

Alvis Brazma - group leader

ArrayExpress Curation MIAMExpress

•Ugis Sarkans

•Gonzalo Garcia
•Helen Parkinson
•Mohammadreza Shojatalab

Expression Profiler

•Jaak Vilo

Research, students

•Thomas Schlitt
•Katja Kivinen
•Johan Rung
•Patrick Kemmeren
•Misha Kapushesky
•Lev Soinov

•Koichi Tazaki

•Anastasia Samsonova

•Susanna Sansone

•Philippe Rocca-Serra

•Ele Holloway

•Niran Abeygunawardena

•Ahmet Oezcimen

•Gaurab Mukherjee
•Sergio Contrino

•Anjan Sharma

•Aurora Torrente