Populating the infrastructure: the case of the Netherlands

Hans Bennis, executive board of CLARIN-NL, Meertens Institute (KNAW)
CLARIN COORDINATORS, BUDAPEST, June 29-30


Page 1

Populating the infrastructure: the case of the Netherlands

Hans Bennis, executive board of CLARIN-NL

Meertens Institute (KNAW)

CLARIN COORDINATORS, BUDAPEST, June 29-30

Page 2

the start in 2009

• 9 million Euro for CLARIN-NL for the period 2009-2015 (requested amount: 25 million Euro)
• concentration on text (language data for humanities research)
• audio and video are left out, in contrast to the original proposal
• social sciences are not included, in contrast to the original proposal
• organizational structure: director, executive board, board, advisory panels (national and international)
• a substantial part of the money will be spent in programmatic form through Calls
• important goal / ambition: create broad support for CLARIN in humanities research in the Netherlands

Page 3

Projects 2009

• technical projects (centers, metadata, web services, workflow, etc.)
• centers: Max Planck Institute for Psycholinguistics (MPI, Nijmegen), Meertens Institute (Amsterdam), DANS (Den Haag) and Institute for Dutch Lexicology (INL, Leiden)
• user survey
• Call-1 (Demonstrator Projects or Resource Curation projects)
• 12 projects (approximately € 60,000 each)
  – demonstrator projects
  – data curation projects

Page 4

Call-1 Projects

1) AAM-LR [UNijmegen/MPI] – Automatic annotation of language resources
2) Adelheid [UNijmegen/MPI] – Lemmatizer for Historical Dutch
3) Adept [UGroningen/Meertens] – Dialect Analysis
4) Duelme-LMF [UUtrecht/INL] – Multi-word expressions
5) INTER-VIEWS [UNijmegen/DANS] – Interviews of life-history of veterans
6) MIMORE [UUtrecht/Meertens] – Dialect morphosyntax
7) SignLinC [UNijmegen/MPI] – Sign Language

Page 5

Call-1 (more)

8) TDS Curator [UUtrecht/DANS] – Typological Database
9) TICCLops [UTilburg/INL] – Text Clean-up
10) TQE [UNijmegen/MPI] – Transcription evaluation
11) WFT-GTB [Fryske Akademy/INL] – Integration of Dutch and Frisian dictionaries
12) CKCC [UUtrecht, Huygens Institute, DANS] – Correspondence of scholars in the 17th century

Page 6

Demonstration of the Microcomparative Morphosyntactic Research Tool

MIMORE

Sjef Barbiers, Matthijs Brouwer, Jan Pieter Kunst, Folkert de Vriend

Meertens Instituut, 2011

Page 7

Opening screen MIMORE

Page 8

Research question

• The Standard Dutch [non-neuter] relative pronoun and distal demonstrative has the form ‘die’ (that, those).
• We know that there are dialects that have ‘dien’ as a relative pronoun and/or as a distal demonstrative.
• We would like to know if there is a correlation between ‘dien’ as a relative pronoun, ‘dien’ as a demonstrative preceding a noun, and ‘dien’ as a demonstrative in elliptical constructions.
• The linguistic question behind this search is what the ‘-n’ on ‘die’ is: case, phonologically determined, etc.?

Page 9

Optional restrictions on the search

Page 10

Search 1: DynaSAND with text string and tag constructor: ‘dien’ as relative pronoun

Page 11

Elements of search result

Page 12

Specification of data resource

Page 13

Corresponding sound fragment

Page 14

Search 2: GTRP with demonstrative + N in test item

Page 15

Elements of search result

Page 16

Result of search 3: demonstrative ‘dien’ in elliptical nominal groups in DIDDD

Page 17

Available operations on search results

Page 18

Map combining three search results

Page 19

Map combining two search results

Page 20

Frequency maps

Page 21

Creating the intersection of two sets of search results
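The intersection step can be illustrated with a minimal sketch. It assumes each search result is reduced to a set of location identifiers. The function name `intersect_results` and the location codes are hypothetical placeholders; they are not MIMORE's actual interface or data.

```python
def intersect_results(results_a, results_b):
    """Return the location codes that occur in both result sets, sorted."""
    return sorted(set(results_a) & set(results_b))


# Hypothetical location codes (placeholders, not actual MIMORE data).
relative_dien = {"loc01", "loc02", "loc05"}       # 'dien' as relative pronoun
demonstrative_dien = {"loc02", "loc03", "loc05"}  # 'dien' as demonstrative + N

both = intersect_results(relative_dien, demonstrative_dien)
print(both)
```

Locations that survive the intersection are those where both uses of ‘dien’ are attested, which is the kind of overlap the research question asks about.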

Page 22

Export as Excel-file

Page 23

Data exported

Page 24

Complex search: more than one database, string of tags

Page 25

CALL-2 (2011)

1) Arthurian Fiction [UUtrecht] – Curation of two databases for literary research
2) C-DSD [UUtrecht/Meertens] – Curation of Folksong Database
3) COAVA [Meertens] – Bringing together five linguistic databases (language variation/acquisition)
4) INPOLDER [UNijmegen/Meertens] – Syntactic analysis of historical Dutch
5) IPROSLA [UNijmegen/UAmsterdam/MPI] – Sign language databases

Page 26

CALL-2 (more)

6) NEHOL [UNijmegen] – Curation of Negerhollands database
7) VU-DNC [VU-Amsterdam] – Corpus of Dutch newspapers
8) WAHSP [UUtrecht] – Text mining in large historical databases
9) WIP [NIOD] – Data curation of Dutch Second World War database

Page 27

developments

• collaboration with the CATCH programme (a programme to finance projects for teams of ICT developers, humanities scholars and cultural heritage institutions)
  – CLAVAS – vocabularies
  – Persistent Identifiers
• Data Curation Service (>2011)
• Call 3 (call open now; projects in 2012)
• Agreement with Dutch Science Foundation (NWO) and Royal Netherlands Academy of Science (KNAW) with respect to CLARIN-norm for databases/tools in humanities
• CLARIN-NL + DARIAH-NL => CLARIAH – Dutch Roadmap
