P. Doorn (DANS, Netherlands) - Building data infrastructures for humanities
DESCRIPTION
Presentation by Peter Doorn (Data Archiving and Networked Services, DANS) given at the infoclio.ch conference in Bern on 16 September 2010.
TRANSCRIPT
Building national and international data infrastructures for humanities research
Peter Doorn - director, Data Archiving and Networked Services (DANS); co-ordinator, “Preparing DARIAH” (Digital Research Infrastructure for the Arts and Humanities)
Presentation for Digitale Forschungsinfrastrukturen für die Geschichtswissenschaften, Bern, 16 September 2010
Driven by data
Contents:
1. What is a data/research infrastructure?
2. The changing needs of the researchers
3. Setting up data infrastructures in the Netherlands, 1964 – 2010
4. The next steps
5. DARIAH and other international initiatives
1. What is a data/research infrastructure?
In the natural sciences: something concrete, something physical....
A building, a telescope, a particle accelerator, a nuclear icebreaker....
Research Infrastructures (R.I.)
• R.I. in general: permanent and physical
• R.I. for the arts and humanities?
– Cultural heritage in all forms is the main source of humanities research
– Libraries, archives and museums are the traditional “laboratories” for the humanities
• In the digital age, essential for innovative humanities research is:
– Access to digitised heritage data (data bases, text corpora, speech, image collections, etc.)
– Tools to process this information
• The most important new research infrastructure for the humanities is therefore a digital one
What kind of infrastructure do humanities scholars need?
From Humanities computing to e-humanities
Roots go back to the 1960s:
• text analysis, e.g. bible studies
• quantitative social and economic history
• computer linguistics
• digital archaeology
E-humanities as analogy of e-science: ‘science increasingly done through distributed global collaborations enabled by the Internet, using very large data collections, large-scale computing resources and high performance visualisation.’
3. Setting up data infrastructures in the Netherlands, 1964 – 2010
1989: Netherlands Historical Data Archive
• Initiative by Low Countries Association for History and Computing
• Started with feasibility study, followed by inventory of databases and a pilot
• Until 1995 on a project basis, supported with digitization projects
• Organizational form flexible
2004: Electronic Depot of Dutch Archaeology
• Idea came up at a conference of historians and archaeologists in 2003
• 1980: computer used during excavation
• Initiative by university archaeologists, data archive and state archaeological service
• Started as a series of projects, since 2005 hosted by DANS
DANS created in 2005
• Merger of earlier existing data infrastructures
• Serving humanities and social sciences
• Mission: providing permanent access to research data
• Funded by Academy of Arts and Sciences (KNAW) and Dutch Organisation for Research (NWO)
• Budget: 2.5 M€ + 1 M€ projects. Staff grew from 10 to almost 40 (including projects).
What do we do?
• Archive data and provide access
• Data projects in connection with researchers
• Data Seal of Approval
• Persistent Identifiers
• Symposia and Publications
• Subsidize “small data projects”
Three sections:
• Archive
• Infrastructure
• Software Development
Datasets according to disciplines
www.dans.knaw.nl
Electronic Archiving System for searching and depositing data
Dataset description
Download data after login
Data
Documentation
Publications
Download statistics visible to all registered users
Digitization Population Censuses
www.volkstellingen.nl
Spreadsheets are look-alikes of original published tables
Mapping the census data for the Dutch municipalities
Shipping in the “Golden Age”
Journal entries, 26-29 September 1758
Ship’s name: Noordbeveland
Month: September
Year: 1758
Day: Tuesday
Date: 26th
Weather on board
Wind
Peculiarities
Dutch Shipping Routes 1750-1850
Courtesy of CLIWOC project, KNMI
The research data:
• can be found on the Internet
• are accessible (clear rights and licenses)
• are in a usable format
• are reliable
• can be referred to (persistent identifier)
www.datasealofapproval.org
5 criteria, 16 guidelines
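A minimal sketch of how the five criteria could be recorded as a checklist for one dataset. The class name `DatasetCheck`, its field names and the helper `unmet_criteria` are hypothetical and chosen only for illustration; this is not the official Data Seal of Approval assessment procedure, and it does not cover the 16 guidelines.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetCheck:
    """One yes/no answer per criterion (field names are illustrative)."""
    findable_on_internet: bool
    accessible_with_clear_rights: bool
    usable_format: bool
    reliable: bool
    citable_with_persistent_id: bool


def unmet_criteria(check: DatasetCheck) -> list[str]:
    """Return the names of the criteria that are not satisfied."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


# Example: a dataset in an unusable format and without a persistent identifier.
example = DatasetCheck(
    findable_on_internet=True,
    accessible_with_clear_rights=True,
    usable_format=False,
    reliable=True,
    citable_with_persistent_id=False,
)
print(unmet_criteria(example))
```

A flat list of booleans keeps the checklist readable; a real assessment would presumably attach evidence and guideline references to each answer.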
4. The next steps
Broaden DANS into a discipline-independent data organisation.
Many DANS activities are independent of discipline:
• Data Quality Guidelines: Data Seal of Approval
• Resolver for Persistent Identifiers
• Selection criteria for data preservation
• Deposit and Access Licenses, Intellectual Property Rights, Privacy
• Standards (Archival file formats, metadata)
• Storage, conversion, backup, documentation services
New DANS strategy
– In line with National Coalition for Digital Preservation
– Build bridges between e-science and digital humanities
– Connect to other data infrastructures and initiatives in the technical, natural and life sciences
– Step by step approach
– Many large-scale facilities on the National Roadmap have a data function
www.dariah.eu
5. DARIAH and other international initiatives
European infrastructure challenges
• In spite of some achievements, existing research infrastructures are primarily national... if they are there at all!
• European activities are until now funded on a project basis and carried out as voluntary activities by national partners
• Stable, pan-European research infrastructures for the arts and humanities hardly exist
• Increasing internationalisation of humanities research places new requirements on such infrastructures
• DARIAH is the only ESFRI proposal for the arts and humanities
Science Case for DARIAH
• Changing research practice in a networked world:
• Digital resources (data & tools) form the laboratory of the scholar in the arts and humanities
• Computational technologies and methods of analysis
• Resources on the web are highly distributed
• The scale of research goes up: networked projects
• European projects have no continuity
• The existing structures are too weak (ad hoc networks, no permanence) and national in scope
• Answer: strong European data infrastructure, providing continuity and support for digital A&H research and access to digital resources
DARIAH Mission
The mission of DARIAH is to enhance and support digitally enabled research across the humanities and arts. DARIAH aims to develop and maintain an infrastructure in support of ICT-based research practices, working with communities of practice to:
• Explore and apply ICT-based methods and tools to enable new research questions to be asked and old questions to be answered in new ways
• Link and provide access to distributed digital source materials of many kinds
• Exchange knowledge, expertise, methodologies and practices across domains and disciplines
DARIAH Partners
• 14 members in 10 countries: Croatia, Cyprus, Denmark, France, Germany (2), Greece (2), Ireland, Netherlands, Slovenia, United Kingdom (3)
• Associate members: Italy, Spain, Sweden
• Aspiring partners: Austria, Switzerland
• Other prospective partners in: Bulgaria, FYROM (Macedonia), Hungary, Lithuania, Norway, Serbia, Romania
Map legend: Members, Associate, Aspiring, Prospective
Preparation Project: Overview of the Work Packages
1. Project management
2. Dissemination
3. Strategic work
4. Financial work
5. Governance and logistical work
6. Legal work
7. Technical reference architecture
8. Technical: Conceptual modelling
Preparing DARIAH: time schedule
• October 2006: Publication ESFRI Roadmap
• December 2006: Publication relevant FP7 call
• May 2007: Deadline Capacities call ESFRI projects
• Q3 2008: Agreement EC funding
• Q4 2008: Start “Preparing DARIAH”
• Q4 2009: Funders’ meeting
• Q3 2010: DARIAH conference
• Q1 2011: Start construction DARIAH
Financial Commitment?
DARIAH Virtual Competency Centers (Hubs)
Research & Education: supporting research groups and centres in the 'digital humanities'; knowledge exchange and education, postgraduate programmes and researcher exchange
e-Infrastructure: service provision, systems & tools, connecting resources
Advocacy & Promotion (& Management): PR, encourage collaboration, community building, website, administration, demonstrate value and impact
Content & Legal: supporting scholarly data creation, access, curation and preservation; rights management, IPR licences, quality assurance
The VCC concept
DARIAH Governance and Costs in Construction Phase
Governance structure of ERIC
Relations to other projects and networks
19-21 October, Vienna: SDH DARIAH-CLARIN conference