Zaven Akopov | “Research Data: What can libraries do?” | Helmholtz Open Access Workshop, June 11, 2013
Zaven Akopov Deutsches Elektronen-Synchrotron DESY
Research data: what can libraries do?
Contents
• Data preservation as an intrinsic part of Open Data
• HEP data: challenges and specifics
• Requirements for documentation and long-term storage
• Types of documents
• High-level (secondary) data
• Assignment of metadata and long-term storage in Inspire
Data Preservation as an intrinsic part of Open Data
• Data management cycle in HEP:
  data taking – initial storage – data processing – storage of processed data (high-level data) – physics analysis (software) – publication of papers (interpretation)
• HEP data were initially not planned for open access, but experiments have a limited lifetime …
• Providing infrastructure for data preservation may pave the way to sustainable (open) access to data
HEP Research Data: Motivation
• Accelerators, detectors
• Unique experimental data, usually not reproducible in other labs
• Large resources and investments go into building detectors and providing manpower for data analysis
• By the end of experimental data taking, a substantial amount of data is still not analyzed
• The data can also be processed and analyzed with new methods, models, and approaches developed over time
• Bottom line: HEP data should be made accessible and be preserved
Efforts and models
• DPHEP working group, active since 2009: www.dphep.org
• 4 “levels” of HEP research data and its preservation have been identified
Efforts and models
• Most collaborations and labs plan the “Level 4 preservation” (raw data)
• The requirements of “Level 1” and, partially, of “Level 2” can and should be fulfilled using a high-end bibliographic system with metadata assignment, etc.
• This is where the Library can help to close the gap in the complete data preservation cycle:
  • Identifiable data
• Focus on data access: preservation is an integral part of data management.
Specific cases
• Providing the necessary know-how for re-use requires not only the (primary) data and the analysis software, but also the associated documentation on data taking and analysis (technical guides, internal notes, …)
• … which provides the basis for the corresponding publications, but also substantially more additional information, e.g.:
  • Details of the data analysis methods (software, simulation, …)
  • Detectors and their components (pedestals, operating parameters, …)
• Need to preserve: secondary data (tables, ROOT scripts, code, plots), i.e. simplified data from Level 2
Long-term storage
• Example: HERA (DESY)
  • By the end of running, storage was on collaboration/IT structures (servers run by the staff or IT)
  • Websites, AFS space, proprietary structures, etc.
• Lack of a real bibliographic system (metadata, complex search engine, …)
• No consistent strategy and no sustainability: these structures would not be preserved by DESY/IT
-> provide infrastructure
Why INSPIRE?
• Ingestion of the technical documents and analysis notes, with the possibility to interlink them with the actual publications (based on …, superseded by …)
• The documents are preserved and do not depend on the life expectancy of the specific experiment and its IT infrastructure
• Many Inspire features such as fulltext search, data object citation, etc.
• The secondary data (high-level data) provide added value to the existing publications
INSPIRE Features
• Google-like speed for up to 2M records
• Combined search of metadata, references, fulltext
• Scalability
• Flexible metadata (multimedia, secondary data)
• Personalisation (claim your paper)
• One-stop shop for HEP information
• Fulltext repository
• Integration of research data
Long-term access
• Access control for the documents (notes)
  • Working together with the labs to develop effective access control strategies (short/long-term)
  • Simple user accounts are live
  • Curator accounts are live
• Further options
  • Flexible user accounts (based on author lists, external authentication via SSO), e.g. arXiv account
Summary
• Collaboration of the DESY Library (Inspire) with DPHEP
  • HERMES, ZEUS, H1 experiments
  • Internationally: DØ, CDF (Fermilab), BaBar (Stanford)
• First stage completed for all HERA experiments and DØ (Fermilab): all of the internal documentation is stored in Inspire
• Second stage completed: collaboration curator accounts with modification rights are also live and a success
• Test phase: ZEUS and BaBar preliminary notes harvested; the rest should follow
• High-level data in Inspire: HEPData, plots, etc. Inspire provides long-term preservation.