1 miame the miame website: © 2002 norman morrison for manchester bioinformatics

1

MIAME

The MIAME website:

http://www.mged.org

© 2002 Norman Morrison for Manchester Bioinformatics.

2

Overview

• Why capture meta-data?• The data capture challenges

– What to capture?– How to capture it?– Who agrees what to capture?

3

Post-genome data

bioinformatics

genometranscriptome

proteome

interactome

metabolometextome

mobileome

phenome

4

Why meta-data?

• Genome data is static• Post-genome is very state-dependant

– Transcriptome = no. of cell types * no. no of environmental conditions– Annotation matters– Data comparisons matter– Learn from the gene debacle

• Protein-tyrosine phosphatase, non-receptor type 6, Protein-tyrosine phosphatase 1C, PTP-1C, Hematopoietic cell protein-tyrosine phosphatase, SH-PTP1, Protein-tyrosine phosphatase SHP-1

• LARD, death receptor 3 beta, WSL-1R protein, lymphocyte associated receptor of death, death receptor 3

• We need repositories

5

Microarray Repositories

• A repository is a primary source of data generated by experimentalists. Its main role is to enforce standards and quality thresholds and to make data widely available.

• Needs standards.

6

Microarray Repositories II

• Repositories allow for easier data exchange between groups

• Ensure that key details are kept

• BUT:

– What should be captured and how

• Requires international cooperation

– Minimal Information for the Annotation of Microarray experiments (MIAME)

– Developed within MGED

7

MIAME – Major Sections

• Array design– Reporters– Features– Control elements

• Experimental design– Experiment type– Sample details– Hybridisations– Measurements

8

The Six Parts of MIAME

1. Experimental design: the set of hybridization experiments as a whole

2. Array design: each array used and each element (spot, feature) on the array

3. Samples: samples used, extract preparation and labeling

4. Hybridizations: procedures and parameters

5. Measurements: images, quantification and specifications

6. Normalization controls: types, values and specifications

9

MIAME Glossary

10

Value of audit

• Based on (qualifier, value, source)

• Qualifier: cell type• Value: epithelial• Source: Gray’s anatomy (38th ed.)

or• Qualifier: treatment• Value: 15heat shock• Source: Smith and Jones, Nature Genet. (1992)

11

MIAME definitions

• Available from www.mged.org• A minimum document to be read• All details mentioned in MIAME should be captured somewhere

– Know where they are

• Latest draft: Version 1.1 (Draft 5, March 5, 2002)– Discussed at MGED IV

• See also: A. Brazma, et al., Nature Genetics, vol 29 (December 2001), pp 365 - 371

12

MIAME part 1:– array description

• In principle this is someone else’s problem– (e.g. Affymetrix, Clonetech, etc.)

• Three levels of array design elements: – feature – the location on the array– reporter – the nucleotide sequence present in a particular location on the array– composite sequence – a set of reporters used collectively to measure an expression

of a particular gene, exon, or splice-variant

• Array design has 5 parts:1.1 Array related information1.2 Reporter information1.3 Feature information1.4 Composite sequences1.5 Control elements

13

MIAME part 2:Experimental design

• This is your problem• Experimental design has four parts

2.1 Experimental design

2.2 Sample

2.3 Hybridisation

2.4 Measurements

14

2.1 Experimental design

• Design and purpose of the set of hybridisations• Author, lab and contact• Experiment type• Experimental factors• Number of hybridisations• Common reference• QC steps• Experiment description (plus refs)• Anything else

15

2.2 Sample

• Biosource properties – organism, contact, cell type, sex,….• Biomaterial manipulation – growth conditions, in vivo treatment,

compound• Sample labelling – label used, amount, method• Spiked controls – feature, type• Anything else

16

2.3 Hybridisation

• Relationship between samples and arrays• Protocol – full description • Anything else

17

2.4 Measurement

• Raw data – scanner files, scanning protocol• Scanning protocol – parameter settings• Analysis and quantification – analysis output, protocol – e.g.

algorithms• Normalisation – strategies and algorithms, final gene expression

table• Anything else

18

MAGE, ontologies and maxd

The MAGE website:

http://www.mged.org

© 2002 Norman Morrison for Manchester Bioinformatics.

19

Outline

• MIAME is useful, but …..– How can we represent it computationally?– How can we use it to share and exchange data?– The wonderful world of XML– The evil that is free text

• ontologies and controlled vocabularies

• Maxd – MIAME supportive, MAGE-ML compliant analysis of microarray data.

MAGE-ML, MAGE-OM

• MIAME sets a standard for what knowledge (meta-data) to capture

• But how to do it?• Need a knowledge model – a schema to represent the

knowledge and the relationships between them.

Knowledge capture

• UML – Universal Modelling Language provides a methodology for capturing knowledge in ways that are computationally tractable (cf database schemas)

• MAGE-OM is the MGED approved UML model which attempts to capture the concepts in MIAME

XML

• A UML diagram is not useful by itself• MAGE-ML is an attempt to capture MAGE-OM in XML

(eXtended Markup Language) – the next generation HTML• MAGE-ML provides a structure for a text document (marked up

with tags) which describes a microarray experiment

MAGE-ML

• MAGE-ML is not nice!– Complex– Not easily human readable– Needs software tools to help create it– Very rich

• MAGE-ML is the standard we have to work to.

ArrayExpress

• ArrayExpress is the new public microarray data repository based at the EBI

• Provides tools to help create MAGE-ML• Experiments will not be entered unless the annotation is of a

high quality

Making MAGE useable

• For a repository we need a relational database – not an object model

• We have created a relational implementation of the MAGE-OM which is MIAME compliant (based on an early UML diagram for arrayexpress) - maxdSQL

Data repositories

• Relational version of MAGE-OM

Outstanding issues – free text

• MAGE provides a structure for the knowledge – not a prescription for what gets put in

• How to control what people put in the free text areas of MIAME/MAGE (the mickey mouse / . problem)

• How do we define what is meant in ways that other people/software understand

Solution 1

• Controlled vocabularies– Agreed lists of terms (and definitions) that a community agree to use

• Pros: technical simple, easy to implement• Cons: limiting, how to get agreement?, terms on there own are

not very descriptive

Solution 2

• Ontologies– Can be thought of as a set of agreed terms and the relationships between

them (a taxonomy is a simple ontology in which the only relationship allowed is an is-a relationship)

• Pros: a very rich and powerful infrastructure• Cons: complex• Many developments – a space to watch

– Chris Stoeckert and Helen Parkinson http://www.cbil.upenn.edu/Ontology/MGED_ontology.html

1 miame the miame website: © 2002 norman morrison for manchester bioinformatics

Documents