Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN)


DESCRIPTION

Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN). Version: 0.03, 17/09/2010. Lynette Woodburn, Atlas of Living Australia. TIP: Each slide in this presentation comes with accompanying Notes.

TRANSCRIPT

Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN)

Lynette Woodburn, Atlas of Living Australia

Version: 0.03, 17/09/2010

TIP

Each slide in this presentation comes with accompanying Notes.

You can’t see them if you display this presentation in ‘Slide Show’ mode.

If you’d like to see the Notes:

• view the presentation in ‘Normal’ mode, and

• expand the pane below the slide (the Notes pane) to see extra text.

Only then will you have a chance of understanding all the crazy diagrams.

Towards a data model for AMRiN

Requirement

a standard set of data fields for all micro-organisms

. to support the sharing and integration of data through AMRiN

. to pre-configure BioloMICS

Options

. choose an existing set

. develop something new

Recommendation

. surprise!

1. Requirements

2. Options

3. Recommendation

AMRiN

AMRiN community


1. Requirements

2. Options

- existing: CABRI, MCL

3. Recommendation

Common Access to Biological Resources and Information (CABRI)

http://www.cabri.org/

a European organization of partner collections who contribute data to searchable ‘catalogues’ covering

• bacteria & archaea

• fungi & yeasts

• animal & human cell lines

• plant cell lines

• hybridomas

• phages

• plasmids

• plant cell viruses

• genomic libraries

CABRI’s sets of data elements (elements per set)

• bacteria & archaea: 26

• fungi & yeasts: 23

• animal cell lines: 29

• plant cell lines: 17

• hybridomas: 15

• phages: 33

• plasmids: 30

• plant cell viruses: 12

• genomic libraries: 7

Original_host_plant

Doubling_time

Lysogenicity

Isolated_from

Morphology

Common Access to Biological Resources and Information (CABRI)

For each different kind of biological resource, CABRI defines nested sets of data elements: Mandatory, Recommended, Full

CABRI : bacteria & archaea

Strain_number, Other_collection_numbers, Restrictions, Organism_type, Name, Infrasubspecific_names, Status, History, Conditions_for_growth, Form_of_supply

Serovar, Other_names, Isolated_from, Geographic_origin, Mutant, Genotype, Literature

Sexual_state, Pathogenicity, Enzyme_production, Metabolite_production, Applications, Catalogue_entry, Remarks, Price_code, Plasmids

Mandatory Recommended Full

CABRI : fungi & yeasts

Strain_number, Other_collection_numbers, Name, Status, Organism_type, History, Restrictions, Form_of_supply, Conditions_for_growth

Misapplied_names, Race, Substrate, Geographic_origin, Literature, Applications, Mutant, Sexual_state

Price_code, Remarks, Pathogenicity, Metabolite_production, Enzyme_production, Genotype

Mandatory Recommended Full

CABRI : animal & human cell lines

Accession_number, Cell_line_name, Brief_description, Description, Depositor, Bibliographic_references, Morphology, Culture_conditions, Viruses, Properties, Release_conditions, Hazard, Passage_number

Species_validation

Tumorigenicity, Karyology, Freezing_medium, Sterility, Validation_assays, Further_bibliography, Comments, Storage, Doubling_time, Mycoplasma, Fingerprint, Cytogenetics, Karyotype, Comments, Research_council_deposit, BIOMED_1

Mandatory Recommended Full
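The nesting of CABRI's Mandatory, Recommended and Full sets can be sketched in a few lines of Python. This is only an illustration, not CABRI's own specification: the element names are copied from the "CABRI : bacteria & archaea" slide, but the assumption that the three groups of names correspond, in order, to Mandatory, Recommended and Full is ours, and the helper names (`nested_levels`, `missing_elements`) are hypothetical.

```python
# Element names copied from the "CABRI : bacteria & archaea" slide.
# ASSUMPTION: the three groups correspond, in order, to the
# Mandatory, Recommended and Full columns of the slide.
MANDATORY = [
    "Strain_number", "Other_collection_numbers", "Restrictions",
    "Organism_type", "Name", "Infrasubspecific_names", "Status",
    "History", "Conditions_for_growth", "Form_of_supply",
]
RECOMMENDED_ONLY = [
    "Serovar", "Other_names", "Isolated_from", "Geographic_origin",
    "Mutant", "Genotype", "Literature",
]
FULL_ONLY = [
    "Sexual_state", "Pathogenicity", "Enzyme_production",
    "Metabolite_production", "Applications", "Catalogue_entry",
    "Remarks", "Price_code", "Plasmids",
]


def nested_levels(mandatory, recommended_only, full_only):
    """Build cumulative sets so that Mandatory <= Recommended <= Full."""
    mandatory_set = set(mandatory)
    recommended_set = mandatory_set | set(recommended_only)
    full_set = recommended_set | set(full_only)
    return {
        "Mandatory": mandatory_set,
        "Recommended": recommended_set,
        "Full": full_set,
    }


def missing_elements(record, level_elements):
    """Return the elements of a level that a record (a dict) does not supply."""
    return sorted(level_elements - set(record))


LEVELS = nested_levels(MANDATORY, RECOMMENDED_ONLY, FULL_ONLY)

if __name__ == "__main__":
    for level, elements in LEVELS.items():
        print(level, len(elements))
```

Under that assumption the Full set for bacteria & archaea holds 26 elements, which agrees with the per-set count given earlier in the presentation.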

CABRI’s sets of data elements: 192 elements in total across the 9 sets

Sharing data about one kind of biological resource is easy

eg. phages

eg. plasmids


Sharing data about multiple kinds of biological resources is hard

Other_culture_collection_numbers

Other_collection_numbers

What is the prospect of deriving a common model from CABRI for describing several different kinds of biological resources?

133 distinct data elements … distributed across 9 sets

• bacteria & archaea

• fungi & yeasts

• animal cell lines

• plant cell lines

• hybridomas

• phages

• plasmids

• plant cell viruses

• genomic libraries

each of 92 elements is found in only one set

CABRI as a common model ?

only 41 elements are found in more than one set

CABRI as a common model ?

27 data elements are found in two sets

10 in three sets

4 in four sets

No elements are found in more than 4 sets
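The figures quoted on these slides can be cross-checked against one another. The short Python sketch below uses only numbers stated in the presentation (the variable names are ours) and confirms that the 133 distinct elements, the 41 shared elements and the 192 per-set entries are mutually consistent.

```python
# Number of distinct data elements, keyed by how many CABRI sets contain them.
# All figures are taken from the slides; the variable names are illustrative.
ELEMENTS_BY_SET_COUNT = {1: 92, 2: 27, 3: 10, 4: 4}

# Elements per set, as listed for the 9 CABRI sets.
ELEMENTS_PER_SET = [26, 23, 29, 17, 15, 33, 30, 12, 7]

# Distinct elements: every element counted once, whatever its spread.
distinct_elements = sum(ELEMENTS_BY_SET_COUNT.values())

# Shared elements: those found in more than one set.
shared_elements = sum(n for k, n in ELEMENTS_BY_SET_COUNT.items() if k > 1)

# Per-set entries: an element found in k sets is counted k times.
total_entries = sum(k * n for k, n in ELEMENTS_BY_SET_COUNT.items())

if __name__ == "__main__":
    print(distinct_elements, shared_elements, total_entries)
```

An element found in k sets contributes k entries, so 92 + 2×27 + 3×10 + 4×4 = 192, the same total obtained by summing the nine per-set counts.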

Distribution of data elements across CABRI sets

Count of data elements found in only one set:

• bacteria & archaea: 6

• fungi & yeasts: 3

• animal cell lines: 22

• hybridomas: 7

• phages: 14

• plant cell lines: 12

• plant cell viruses: 9

• plasmids: 13

• genomic libraries: 6

Counts of data elements shared across two, three or four sets (set groupings not recoverable from this transcript): 11 4 12 2 1 2 2 1 1 1 3 1

CABRI data element ‘themes’

• bacteria & archaea

• fungi & yeasts

• animal cell lines

• plant cell lines

• hybridomas

• phages

• plasmids

• plant cell viruses

• genomic libraries

. ID of item in collection

. name / classification of item

. item admin

. handling & distribution regulations

. care / maintenance

. characteristics

. literature

. origin

CABRI : comparison of elements across sets

• different names, same meaning (definition)

Accession_number, Strain_number

History, History_of_deposit

Bibliographic_references, Reference_paper, Literature, Reference, Further_bibliography

Restricted_distribution, Release_conditions,Restrictions, Distribution

Morphology, Morphology_and_growth

….
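One way to reconcile ‘different names, same meaning’ is a lookup from each synonym to a single agreed name. The Python sketch below is only an illustration: the synonym groups are those listed on the slide, but taking the first name in each group as the canonical one is an arbitrary assumption, and the function name `normalize` and the sample record are hypothetical.

```python
# Synonym groups as listed on the slide.
# ASSUMPTION: the first name in each group is used as the canonical name;
# CABRI itself does not designate one.
SYNONYM_GROUPS = [
    ["Accession_number", "Strain_number"],
    ["History", "History_of_deposit"],
    ["Bibliographic_references", "Reference_paper", "Literature",
     "Reference", "Further_bibliography"],
    ["Restricted_distribution", "Release_conditions", "Restrictions",
     "Distribution"],
    ["Morphology", "Morphology_and_growth"],
]

# Map every synonym (including the canonical name) to its canonical name.
CANONICAL = {name: group[0] for group in SYNONYM_GROUPS for name in group}


def normalize(record):
    """Rename a record's elements to canonical names.

    Values are collected in lists because two synonyms can occur in the
    same record (for example Bibliographic_references and
    Further_bibliography in the animal cell lines set).
    """
    normalized = {}
    for name, value in record.items():
        key = CANONICAL.get(name, name)  # unknown names pass through unchanged
        normalized.setdefault(key, []).append(value)
    return normalized


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = {"Strain_number": "EX 1", "Literature": "ref A", "Hazard": "none"}
    print(normalize(example))
```

A lookup like this handles renaming only. It does nothing for elements that share a name but differ in meaning, which need a definition-by-definition comparison.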

CABRI : comparison of elements across sets

• same name, different meanings

Type

. phages: type of element (phage, transposon, minitransposon, IS element, …)

. plasmids: type of element (plasmid, phasmid, cosmid, shuttle vector, transposon, minitransposon, IS element, …)

. genomic libraries: type of library (PAC, BAC, YAC, PI, cDNA, …)

Brief_description

. hybridomas: listing of species, strain, antibody specificity

. animal cell lines: listing of species, strain, tissue, tumour, pathology, transformed/transfected

CABRI : comparison of data element sets

• varying levels of scope

Conditions_for_growth (bacteria & archaea, fungi & yeasts) covers:

. culture medium

. atmospheric and light conditions

. temperature conditions

. additional remarks on cultivation

Narrower elements in other sets:

. Medium (plasmids, phages)

. Medium_1 (plant cell lines)

. Light_regime (plant cell lines)

. Light_conditions (plant cell lines)

. Temperature (plant cell lines)

. Humidity (plant cell lines)

CABRI : fitness for our purpose

• 9 sets of data elements (but does not cover algae)

. good for sharing information about one kind of organism

• few elements common to several sets

. hard to share information about more than one kind of organism

• does not lend itself to the derivation of a common set

. elements of ‘different names, same meaning’

. elements of ‘same name, different meanings’

. elements with meanings of varying scope

• has international acceptance / presence (but no longer funded?)

1. Requirements

2. Options

- existing: CABRI, MCL

3. Recommendation

Microbiological Common Language (MCL)

• a new data exchange standard for microbiological information

Research in Microbiology, 161(6), 439-445

http://www.straininfo.net/projects/mcl

• a pluggable framework, easily extended

• has the same ancestor as CABRI (MINE)

• underpins StrainInfo (www.straininfo.net)

“ a world-wide, virtual catalog integrating the information from BRC [Biological Resource Centres] catalogs with related information”

CABRI compared with MCL

. CABRI: partitioned by kind of biological resource

. MCL: partitioned by workflow step

Sample, Isolation, Culture, Deposit, Medium, Publication, Strain

The abstract model of Microbiological Common Language (MCL)

… follows the logical flow from sampling to subsequent deposits

mcl : Sample

sampleDate

sampleCultureStrainNumber

sampleCollector, sampleCollectorInstitute

comments

sampleDescription, sampleLocationDescription

sampleLocationCountry, sampleLocationPlace

sampleAlt, sampleLat, sampleLong

sampleHabitatEnvoTerm, sampleHabitat

sampleCulture

Sample

mcl : Culture

Culture

[otherStrainNumbers]

id

cultureLastUpdateDate, otherStrainNumber, strainNumber

catalogURL

speciesName

history, isolationDate, isolator, isolatorInstitute, isolationMethod

typeStrainOfSpecies, typeStrainOf

typeStrainOfGenus

comments

minimalGrowthTemperature, [growthTemperature]

optimalGrowthTemperature, maximalGrowthTemperature

oxygenRelationship

nomenclaturalPublication, publication

environmentPublication, historyPublication, taxonomicPublication

hasSample, recommendMedium

some Object Properties

. Culture to Sample: hasSample

. Culture to Medium: recommendMedium

. Culture to Publication: publication, nomenclaturalPublication, environmentPublication, historyPublication, taxonomicPublication

mcl : Medium

mediumName, mediumNumber, mediumURL, mediumDescription, comments

mcl : Publication

dcterms:bibliographicCitation, dc:title, dc:creator, prism:publicationName, prism:volume, prism:number, prism:startingPage, prism:pageRange, dcterms:issued
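To make the shape of the MCL model more concrete, the classes and Object Properties named on these slides can be sketched as Python dataclasses. This is a rough illustration, not the MCL schema itself: only a handful of the listed elements are included, every field type is an assumption, the `dc:`, `dcterms:` and `prism:` prefixes are dropped to give valid Python names, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # A few of the mcl:Sample elements; types are assumptions.
    sampleDate: Optional[str] = None
    sampleLocationCountry: Optional[str] = None
    sampleHabitat: Optional[str] = None


@dataclass
class Medium:
    mediumName: Optional[str] = None
    mediumNumber: Optional[str] = None
    mediumURL: Optional[str] = None


@dataclass
class Publication:
    # Namespace prefixes (dc:, dcterms:, prism:) omitted for valid Python names.
    title: Optional[str] = None            # dc:title
    creator: Optional[str] = None          # dc:creator
    publicationName: Optional[str] = None  # prism:publicationName
    issued: Optional[str] = None           # dcterms:issued


@dataclass
class Culture:
    strainNumber: str
    speciesName: Optional[str] = None
    otherStrainNumbers: List[str] = field(default_factory=list)
    # Object Properties: links from a Culture to the other classes.
    hasSample: Optional[Sample] = None
    recommendMedium: List[Medium] = field(default_factory=list)
    publication: List[Publication] = field(default_factory=list)


if __name__ == "__main__":
    # Invented example values, for illustration only.
    culture = Culture(
        strainNumber="EX 0001",
        speciesName="Examplea hypothetica",
        hasSample=Sample(sampleLocationCountry="Australia", sampleHabitat="soil"),
        recommendMedium=[Medium(mediumName="Example agar")],
    )
    print(culture.hasSample.sampleLocationCountry)
```

Because the links live in the Object Properties rather than in the classes themselves, a further class could be added by giving `Culture` one more property, leaving the existing classes untouched.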

MCL : fitness for our purpose

• MCL offers a broadly-applicable suite of data elements

. data elements are grouped according to workflow steps, not organism type

. applicable to algae and cyanobacteria

. the Strain concept supports the logical linking of related cultures

• the model is modular and easily extensible

. model cohesion is achieved through Object Properties

. links easily with genomic standards (see StrainInfo)

• born and raised in Europe (StrainInfo), but now going global

. Asian biorepositories network is considering adoption

. we’re invited to contribute to ongoing development

• primarily devised (custom-built) as a data exchange standard

1. Requirements

2. Options

3. Recommendation

Recommendation : dip a toe into the water

• MCL, custom-built for describing microbiological data, deserves consideration

Proposal

undertake a pilot, involving a small group of AMRiN participants,

to assess the suitability of MCL for AMRiN’s purpose.

AMRiN

AMRiN community

AMRiN participants’ input

map local elements to MCL elements

Note: some MCL elements may not have a local equivalent

identify local elements to be kept ‘private’

identify other local elements to be shared;

provide English definitions to enable reconciliation with other participants’ elements

Pilot assessment

• Coverage?

• What additional common elements exist amongst the set to be shared?

How much orange overlaps purple?

How much purple overlaps purple?

• Other assessment criteria?
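The two pilot questions above (coverage of MCL, and common elements among those to be shared) could be measured along the following lines. This is a hedged sketch only: the MCL element names are a small subset taken from the earlier slides, while the participant names, local element names and the functions `mcl_coverage` and `common_shared_elements` are all hypothetical. Shared elements are matched by name here for simplicity, whereas the proposal relies on English definitions for reconciliation.

```python
from collections import Counter

# A small subset of MCL element names taken from the earlier slides.
MCL_ELEMENTS = {
    "strainNumber", "speciesName", "isolationDate",
    "sampleLocationCountry", "mediumName",
}

# HYPOTHETICAL participant input: local-to-MCL mappings, plus local
# elements to be shared and local elements to be kept private.
PARTICIPANTS = {
    "collection_a": {
        "mapped": {"acc_no": "strainNumber", "taxon": "speciesName",
                   "country": "sampleLocationCountry"},
        "shared": {"host_plant", "toxin_profile"},
        "private": {"freezer_slot"},
    },
    "collection_b": {
        "mapped": {"id": "strainNumber", "name": "speciesName",
                   "medium": "mediumName", "isolated_on": "isolationDate"},
        "shared": {"host_plant"},
        "private": {"supplier_contact"},
    },
}


def mcl_coverage(mapped):
    """Fraction of the MCL elements that have a local equivalent."""
    covered = set(mapped.values()) & MCL_ELEMENTS
    return len(covered) / len(MCL_ELEMENTS)


def common_shared_elements(participants, min_participants=2):
    """Shared (non-MCL) elements offered by at least min_participants."""
    counts = Counter(
        name for p in participants.values() for name in p["shared"]
    )
    return {name for name, n in counts.items() if n >= min_participants}


if __name__ == "__main__":
    for name, data in PARTICIPANTS.items():
        print(name, mcl_coverage(data["mapped"]))
    print(common_shared_elements(PARTICIPANTS))
```

Private elements are carried in the input but deliberately ignored by both functions, since they are not offered for sharing.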

Pulling the pieces together

Please consider the foregoing proposal.

Does it seem reasonable to you?

Do you think there’s a better way?
