Source-to-Output Repositories

Aspects of repositories use: employing an ecology-based approach to report on user requirements in chemistry and the biosciences

Panayiota Polydoratou
Dagmar Biegon

ECDL2007 Workshop: Towards an European repository ecology, 21st September 2007, Budapest

Outline

StORe project
The ecology approach - Concepts
Example
Observations

StORe project (http://jiscstore.jot.com/WikiHome)

Funded: under the Joint Information Systems Committee (JISC – www.jisc.ac.uk) Digital Repositories Programme and by the Consortium of Research Libraries in the British Isles (CURL – www.curl.ac.uk)

Duration: 2 years (2005-2007)

Partners:
University of Edinburgh (lead)
University of York, representing White Rose Partnership
University of Birmingham
London School of Economics
University of Manchester
Imperial College London
University College London
UK Data Archive (UKDA)
Johns Hopkins University

Seven disciplines: Archaeology, Astronomy, Biochemistry, Biosciences, Chemistry, Physics, Social Policy

StORe project (http://jiscstore.jot.com/WikiHome)

Aimed to:

Address the area of interactions between output repositories of research publications and source repositories of primary research data.

Determine the functionality that is required by researchers in both types of repository.

Identify options for increasing the value of using primary data in source repositories at the point where researchers submit to or download papers from output repositories.

StORe project (http://jiscstore.jot.com/WikiHome)

Method:

Surveys of researchers to identify workflows and norms in the use of source and output repositories

Deliverables:

A generic technical specification of the functional enhancements to source and output repositories that were identified by researchers participating in the survey

Pilot middleware built using this generic technical specification and demonstrated through the linking of holdings in the UK Data Archive to research papers stored in output repositories

Characteristics of research in chemistry and the biosciences 1

Interdisciplinary research
Strong bonds with the biological, engineering, environmental, materials, physics and computing sciences

Diverse sets of research data, e.g. experimental, observational, computational

Publication in print journals usually requires the deposition of data to support the paper

Characteristics of research in chemistry and the biosciences 2

Bioscientists, especially those involved in research at gene and protein level, usually generate large data sets which have to be organised and stored in specific repositories.

Example: Human Genome Project
Humans have approximately 21,000 genes
The number of DNA base pairs (about 3 billion) translates into a huge number of possible combinations in DNA and proteins.

The majority of today’s gene and protein researchers rely on searching publicly accessible, Internet-enabled databases, where the data generated by other research groups is stored
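To give a feel for the "huge number of possible combinations", here is a minimal sketch that estimates how many decimal digits the count of possible sequences would have. It is an illustration only, under assumptions not stated on the slide: four bases per position, positions chosen independently, and a sequence length of about 3 billion. The function name is hypothetical.

```python
import math

# Assumptions for illustration only (not stated on the slide):
# four bases (A, C, G, T), independent positions, about 3 billion positions.
BASES = 4
GENOME_LENGTH = 3_000_000_000


def decimal_digits_of_power(base: int, exponent: int) -> int:
    """Return the number of decimal digits of base**exponent, using logarithms
    so that the power itself never has to be computed."""
    return math.floor(exponent * math.log10(base)) + 1


digits = decimal_digits_of_power(BASES, GENOME_LENGTH)
print(f"4^{GENOME_LENGTH:,} has about {digits:,} decimal digits")
```

Under these assumptions the count of possible sequences is a number with roughly 1.8 billion digits, which suggests why researchers depend on shared, searchable databases rather than on local records.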

Ecology-based approach – Concepts 1

Scale = levels in an ecological view
Entities = tangible “things” in a repository ecosystem
Species = particular type of entities
Interactions = activities between different species
Resources = content of the interactions

Ecology-based approach – Concepts 2

SCALE

Levels of granularity – could be from Organism to Population to Community to Ecosystem
StORe perspective = The users of a system or service represent organisms or a population in an ecosystem.

Ecology-based approach – Concepts 3

ENTITY

Many things can be entities, from humans to repositories to a service, etc.
StORe perspective:
humans = academic and research staff and PhD students, the business analysts, and the system developers.
repositories = several domain-specific source and output repositories and the consequent service that they provide as facilitators of free and open access to information.

Ecology-based approach – Concepts 4

Species = particular type of entities
Researchers at different education levels, in different disciplines, with different roles
Repositories of different types (e.g., source/output, institutional/commercial, general/subject specific, etc.)
Service (provision of a support system for research, facilitation of open access to information, etc.)

Ecology-based approach – Concepts 5

Interactions
Between humans (amongst researchers – indicative of different patterns of research norms)
Between humans and machines (differences from the point of submission to a source/output repository to the point of searching and retrieving information from a source/output repository)
Between machines (bidirectional links between source and output repositories)
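The machine-to-machine interaction can be sketched as a small link registry, assuming the links between source and output repositories are meant to be navigable in both directions. This is a hypothetical illustration, not the StORe middleware: the class name, method names and identifiers are all invented for the example.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical registry of links between source datasets and output papers."""

    def __init__(self) -> None:
        self._papers_by_dataset = defaultdict(set)
        self._datasets_by_paper = defaultdict(set)

    def link(self, dataset_id: str, paper_id: str) -> None:
        # Record both directions at once so either repository can follow the link.
        self._papers_by_dataset[dataset_id].add(paper_id)
        self._datasets_by_paper[paper_id].add(dataset_id)

    def papers_for(self, dataset_id: str) -> set:
        # Return a copy so callers cannot modify the registry by accident.
        return set(self._papers_by_dataset.get(dataset_id, set()))

    def datasets_for(self, paper_id: str) -> set:
        return set(self._datasets_by_paper.get(paper_id, set()))


# Made-up identifiers, for illustration only.
registry = LinkRegistry()
registry.link("source:dataset-001", "output:paper-42")
registry.link("source:dataset-001", "output:paper-43")

print(sorted(registry.papers_for("source:dataset-001")))
print(sorted(registry.datasets_for("output:paper-42")))
```

Storing each link in both lookup tables means a reader of a paper can reach the underlying data, and a user of the data can reach the papers that draw on it.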

Ecology-based approach – Concepts 6

Resource
Content of interactions between humans, between humans and machines, and between machines.
Primarily information (can take the form of processed or unprocessed data, exchange of data and conversations amongst researchers, etc.)

Ecology-based approach – Concepts 7

Factors
Users’ ability/experience with systems
Availability of data (including both present and long term archiving)
System’s functionality
Clear demonstration of potential benefits and take-up by the target community/population
Requirements imposed by third parties (e.g., funding bodies, legal aspects of availability and access to information, etc.)

Example*
What sort of thing (repository or service) is this?
What does it relate to (other repositories or services)?
What does it depend on?
How adaptable is it?
What helps it to thrive?

*Example questions based on the model presented in: R. John Robertson (2007). The repository ecology: an approach to understanding repository and service interactions. Presentation at OAI5. (http://indico.cern.ch/contributionDisplay.py?contribId=11&sessionId=10&confId=5710)

Depositing to a source repository based on a use case

Observations

Scale
Organism level – a researcher submitting data to a source repository

Species
Researchers (a bioscientist interacting with other bioscientists, groups, institutions, system administrators)
Repositories (GenBank, PubMed)
Service (Making research available)

Observations

Interactions
Humans (interactions to decide which source repository to submit to, obtain clearance from research groups and institutions, check legal aspects of deposition)
Humans – machines (form for submission, filling in associated metadata, lab book data)
Machines to machines (source to output repository, primary data to publication, assorted metadata, mapping and conversion of file formats)

Resources
Primary data, published data, files in various formats, metadata associated with the files
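The deposit example can also be written down as data, which makes the species, interactions and resources easy to list and compare. The sketch below is a hypothetical encoding of the slide's observations: the class names, field names and the choice of which entities take part in each interaction are assumptions made for illustration, including the treatment of GenBank as the source side and PubMed as the output side.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    species: str  # e.g. "researcher" or "repository"


@dataclass
class Interaction:
    kind: str  # "human-human", "human-machine" or "machine-machine"
    participants: tuple
    resources: list = field(default_factory=list)


# Entities named on the slides; their pairing below is an assumption.
researcher = Entity("bioscientist", "researcher")
group = Entity("research group", "researcher")
genbank = Entity("GenBank", "repository")
pubmed = Entity("PubMed", "repository")

deposit = [
    Interaction("human-human", (researcher, group), ["clearance to deposit"]),
    Interaction("human-machine", (researcher, genbank), ["primary data", "metadata"]),
    Interaction("machine-machine", (genbank, pubmed), ["metadata", "converted files"]),
]


def resources_by_kind(interactions):
    """Group the resources exchanged by kind of interaction."""
    summary = {}
    for interaction in interactions:
        summary.setdefault(interaction.kind, []).extend(interaction.resources)
    return summary


for kind, resources in resources_by_kind(deposit).items():
    print(kind, "->", ", ".join(resources))
```

Even this single deposit needs three kinds of interaction and several resources, which hints at how quickly a full ecological map of a service could grow.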

Observations

Factors
Easy and quick to use system
Sustainable system
Perceived value/quality of the service and take-up by the community it is intended for (e.g., reliability and availability of data, etc.)
Institutional/funding bodies’ requirements

Observations

A use case may be relatively simple to represent, but in practice several factors affect the use and evolution of a system/service

An ecology-based representation may prove more beneficial for mapping, but it is potentially more difficult to represent all the species and possible interactions that could take place in, for example, a single deposition of data.

Some aspects, e.g. interactions, may be easier to represent with an ecology-based approach

Thank you!

Panayiota Polydoratou & Dagmar Biegon

References

Human Genome Project (http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml)

StORe project (http://jiscstore.jot.com/WikiHome)

R. John Robertson (2007). The repository ecology: an approach to understanding repository and service interactions. Presentation at OAI5. (http://indico.cern.ch/contributionDisplay.py?contribId=11&sessionId=10&confId=5710)

UK Data Archive (http://www.data-archive.ac.uk/)