Post on 29-Jan-2016


1

ArrayExpress

Ugis Sarkans, EBI

2

Overview

• Underlying standards
  – MIAME
  – MAGE*
• Data submission
• Data access
  – annotations
  – actual data
  – array design descriptions
• Some technical details
• Future developments

3

What information should be exchanged?

• MIAME - Minimum Information About a Microarray Experiment
  – informal specification
  – paper published in Nature Genetics
  – goal - to initiate discussion:
    • which details are important and which may not be
  – ArrayExpress can store MIAME data (and more)

4

MAGE-OM

• MAGE-OM: MicroArray Gene Expression Object Model
  – in January 2002 became an “adopted” OMG specification
  – January to August 2002 - finalization process
  – in September 2002 became an “available” specification
  – should be set in stone for the next 2 years
  – thinking about MAGE v2 started
    • user feedback
    • support for other types of functional genomics data
    • more precise handling of data manipulation

5

UML Packages of MAGE

[Diagram: MAGE-OM packages, grouped in the figure under the labels "what was used", "what was done", "results" and "miscellaneous"]

Packages shown: BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity, BQS

6

MAGE-ML

• MAGE-ML: MicroArray Gene Expression Markup Language
  – generated from MAGE-OM, therefore evolved automatically
  – translation from the Jan 2002 to the Sep 2002 DTD was quite easy

7

ArrayExpress: data

• currently - 9 experiments, 4 array designs:
  – from EMBL - human, yeast
  – from Sanger - pombe
• coming:
  – array descriptions: Affymetrix, Agilent
  – labs: TIGR, Utrecht, more from Sanger, ...
  – export from existing DBs: SMD, RAD
  – tools - MAGE-ML export: Jexpress, BASE, ...
  – ILSI project
• journal requirements: Nature, Lancet, ...

8

Help with MAGE-ML: MAGEstk

• MAGE-ML - the only way of getting data into ArrayExpress
• MAGEstk: MicroArray Gene Expression Software ToolKit
  – Jamboree IV in Stanford, beginning of December
  – used in MIAMExpress (MAGE-ML export)

9

MAGEstk

• Programming APIs
• Mapping of MAGE-OM to language-specific OMs
• APIs are automatically generated from the OM specifications
  – get/set methods for associations
  – get/set methods for attributes
• XML <=> language-specific OM marshallers/unmarshallers - also automatically generated
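
To make the idea of generated get/set methods and XML marshallers concrete, here is a minimal hand-written sketch in Python. It is not the MAGEstk API: the class names, method names, and XML element and attribute names (`Experiment`, `Providers`, `Person`, `lastName`) are assumptions chosen for illustration and do not follow the actual MAGE-ML DTD.

```python
import xml.etree.ElementTree as ET


class Person:
    """Hypothetical stand-in for a generated class (not the real MAGEstk API)."""

    def __init__(self, identifier, last_name):
        self._identifier = identifier
        self._last_name = last_name

    # get/set methods for attributes
    def get_identifier(self):
        return self._identifier

    def get_last_name(self):
        return self._last_name

    def set_last_name(self, last_name):
        self._last_name = last_name


class Experiment:
    """Hypothetical stand-in for a generated class with one association."""

    def __init__(self, identifier, name):
        self._identifier = identifier
        self._name = name
        self._providers = []  # association: Experiment -> Person

    def get_identifier(self):
        return self._identifier

    def get_name(self):
        return self._name

    def set_name(self, name):
        self._name = name

    # get/set methods for associations
    def get_providers(self):
        return list(self._providers)

    def set_providers(self, providers):
        self._providers = list(providers)


def marshal_experiment(experiment):
    """Object model -> XML text (element names are illustrative only)."""
    root = ET.Element(
        "Experiment",
        {"identifier": experiment.get_identifier(), "name": experiment.get_name()},
    )
    providers = ET.SubElement(root, "Providers")
    for person in experiment.get_providers():
        ET.SubElement(
            providers,
            "Person",
            {"identifier": person.get_identifier(), "lastName": person.get_last_name()},
        )
    return ET.tostring(root, encoding="unicode")


def unmarshal_experiment(xml_text):
    """XML text -> object model, the inverse of marshal_experiment."""
    root = ET.fromstring(xml_text)
    experiment = Experiment(root.get("identifier"), root.get("name"))
    experiment.set_providers(
        [Person(p.get("identifier"), p.get("lastName"))
         for p in root.findall("Providers/Person")]
    )
    return experiment


# Round trip: build an object, write it as XML, read it back.
demo = Experiment("EXP-1", "demo experiment")
demo.set_providers([Person("PER-1", "Smith")])
demo_xml = marshal_experiment(demo)
demo_copy = unmarshal_experiment(demo_xml)
```

In a generated toolkit the classes and both conversion functions would be produced from the object model rather than written by hand; the sketch only shows the shape of the result.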

10

MAGEstk (cont.)

• Use open-source/standard modules/packages
  – Xerces, JDBC, etc.
• Implementation in Java, C++, Perl, Python
• database access modules on top of these APIs
  – Postgres schema
  – DB access layer
• annotation tools - planned

11

ArrayExpress data retrieval

• main objective - help in finding and initial exploration of data; download for detailed analysis

• data repository (now) + data warehouse (in development)

12

Queries - logical structure

[Diagram: entities available for querying, with their searchable fields in parentheses]

Array Design (accession, name), Protocol (accession), Experiment (accession), Organisation (name), Person (last name), Array, Species, Sample, Hybridisation, ExperimentDesign, ExperimentType, ExperimentalFactor, Protocol Type
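
As a rough illustration of querying over such a logical structure, the sketch below models two of the entities as Python dataclasses and filters experiments by optional criteria. The field layout, the links between entities, the function name `find_experiments` and all sample values (including the accessions) are assumptions made for the example; they are not taken from the ArrayExpress schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArrayDesign:
    accession: str
    name: str


@dataclass
class Experiment:
    accession: str
    experiment_type: str
    species: str
    submitter_last_name: str
    array_designs: List[ArrayDesign] = field(default_factory=list)


def find_experiments(
    experiments: List[Experiment],
    species: Optional[str] = None,
    experiment_type: Optional[str] = None,
    array_design_name: Optional[str] = None,
) -> List[Experiment]:
    """Return experiments matching every criterion that was supplied."""
    results = []
    for exp in experiments:
        if species is not None and exp.species != species:
            continue
        if experiment_type is not None and exp.experiment_type != experiment_type:
            continue
        if array_design_name is not None and not any(
            design.name == array_design_name for design in exp.array_designs
        ):
            continue
        results.append(exp)
    return results


# Made-up sample records, for illustration only.
sample_experiments = [
    Experiment("EXP-1", "time course", "Saccharomyces cerevisiae", "Smith",
               [ArrayDesign("ARR-1", "yeast array")]),
    Experiment("EXP-2", "dose response", "Homo sapiens", "Jones",
               [ArrayDesign("ARR-2", "human array")]),
]

human_hits = find_experiments(sample_experiments, species="Homo sapiens")
yeast_array_hits = find_experiments(sample_experiments, array_design_name="yeast array")
```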

13

Query form

14

Annotation browsing

15

Data representation

[Diagram: two data layouts compared]

• in Expression Profiler - a matrix with two axes:
  – spots
  – measurements
• in MAGE/ArrayExpress - a cube with three axes:
  – BioAssays (hybridizations, data transformations)
  – QuantitationTypes (signal intensity, ratio etc.)
  – DesignElements (spots, genes)

16

Exporting data to Expression Profiler

[Diagram: BioAssayData cubes (BioAssayData1, BioAssayData2) with axes BioAssays (hybridizations, data transformations), QuantitationTypes (signal intensity, ratio etc.) and DesignElements (spots)]

Steps shown:
• select BioAssayData cubes
• select QuantitationTypes
• select BioAssays

Axes of the exported matrix: DesignElements, (QT,BA) pairs
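
The export can be pictured as flattening a three-dimensional cube into a two-dimensional matrix whose columns are the chosen (QT,BA) pairs. The sketch below does this with plain Python lists. The index order `cube[bioassay][design_element][quantitation_type]`, the function name `flatten_cube` and all sample values are assumptions for illustration, not the ArrayExpress implementation.

```python
def flatten_cube(cube, bioassays, quantitation_types, design_elements, selected_pairs):
    """Flatten a 3-D cube into rows of (design element, values).

    cube[b][d][q] holds the value for bioassay b, design element d and
    quantitation type q (this axis order is an assumption of the sketch).
    selected_pairs is a list of (quantitation type, bioassay) names; each
    pair becomes one column of the resulting matrix.
    """
    ba_index = {name: i for i, name in enumerate(bioassays)}
    qt_index = {name: i for i, name in enumerate(quantitation_types)}

    header = ["{}:{}".format(qt, ba) for qt, ba in selected_pairs]
    rows = []
    for d, element in enumerate(design_elements):
        values = [cube[ba_index[ba]][d][qt_index[qt]] for qt, ba in selected_pairs]
        rows.append((element, values))
    return header, rows


# Made-up example: 2 bioassays x 3 design elements x 2 quantitation types.
bioassays = ["hyb1", "hyb2"]
quantitation_types = ["signal", "ratio"]
design_elements = ["spot1", "spot2", "spot3"]
cube = [
    [[100.0, 1.5], [200.0, 0.5], [50.0, 2.0]],   # hyb1
    [[110.0, 1.4], [190.0, 0.6], [55.0, 1.9]],   # hyb2
]

# Keep only the ratio measurement from both hybridizations.
header, rows = flatten_cube(
    cube, bioassays, quantitation_types, design_elements,
    selected_pairs=[("ratio", "hyb1"), ("ratio", "hyb2")],
)
```

Here `header` is `["ratio:hyb1", "ratio:hyb2"]` and the first row is `("spot1", [1.5, 1.4])`: one row per design element, one column per selected pair.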

17

Data export form

18

Array representation - ADF format

19

Experiment plan display

20

ArrayExpress Infrastructure

[Diagram: data flow between ArrayExpress at EBI and external systems]

Components shown:
• ArrayExpress (Oracle + Tomcat)
• MIAMExpress (MySQL)
• Expression Profiler
• Local MIAMExpress installations
• Other microarray databases
• External bioinformatics databases
• Array manufacturers
• LIMS
• Microarray software
• Data analysis software

Connection labels: Submissions, Queries, Data analysis, www, MAGE-ML, MAGE-ML import/export, Data pipelines

21

ArrayExpress architecture

[Diagram: components of the ArrayExpress system]

Labels shown: ArrayExpress (Oracle), Tomcat, Java servlets, Velocity template engine, Web page templates, Castor, object/relational mapping, MAGE-OM, MAGE-ML (DTD), MAGE-ML (doc), MAGEvalidator, MAGEloader, MAGEunloader, error.log

22

ArrayExpress: other technical details

• Data matrices - stored in NetCDF format:
  – binary format for efficient storage of multidimensional arrays
• Arrays - stored as ADF spreadsheets (in addition to normal MAGE structures)

23

In development

• Immediate:
  – interface efficiency improvements
  – BioAssays - graphical display
  – better integration with Expression Profiler
• Medium-term:
  – user management
    • non-public data (e.g., for reviewers)
  – MAGE-ML export
• Curation tool

24

Data warehouse - for gene- and data-driven queries

[Diagram: data warehouse schema; five boxes are labelled "Properties"]

Attribute labels shown: ratio, absolute change, confidence measure; name, design element type; species, sample type; bioassay type; performer lab, exper. type; array design name, platform type, provider; name, biological entity type

25

Microarray Informatics team at EBI

Alvis Brazma - group leader

Areas: ArrayExpress, Curation, MIAMExpress, Expression Profiler, Research and students

Members:
• Ugis Sarkans
• Gonzalo Garcia
• Helen Parkinson
• Mohammadreza Shojatalab
• Jaak Vilo
• Thomas Schlitt
• Katja Kivinen
• Johan Rung
• Patrick Kemmeren
• Misha Kapushesky
• Lev Soinov
• Koichi Tazaki
• Anastasia Samsonova
• Susanna Sansone
• Philippe Rocca-Serra
• Ele Holloway
• Niran Abeygunawardena
• Ahmet Oezcimen
• Gaurab Mukherjee
• Sergio Contrino
• Anjan Sharma
• Aurora Torrente
