CLARIN: Goals and Structure of the Project

Steven Krauwer, CLARIN Coordinator

Utrecht institute of Linguistics UiL-OTS (NL)

Steven Krauwer CLARIN - Riga 03-11-2008 2

Overview

• Problem & Mission
• Some why-questions
• Some who-questions
• Overall plan
  – Technical dimension
  – Language dimension
  – User dimension
  – Governance and legal dimension
• What CLARIN is NOT about
• How we work
• Funding
• Structure
• To conclude

The problem

• Much data in digital archives is language based

• Only known to insiders

• Archives mostly unconnected

• Every archive has its own standards for storage and access

• Normally only simple retrieval of files (text, audio or video documents)

• Social sciences and humanities researchers are not language or speech technologists

• They are often not aware of the potential benefits of using language and speech technology

• Available tools are hard to use for non-specialists

The CLARIN Mission

What:

• Create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially social sciences and humanities (SSH)

How:

• Unite existing digital archives into a federation of archives with unified web access

• Provide existing language and speech technology tools as web services operating on language data in archives
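
The idea of a federation with unified access can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two archives, their field names, the `Record` type and the `federated_search` function are hypothetical and do not describe CLARIN's actual design. Each archive keeps its own metadata conventions, and a small adapter maps them onto one common record so that a single query reaches all archives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Common metadata record shared across the (hypothetical) federation."""
    archive: str
    title: str
    language: str
    media_type: str


# Two made-up archives, each with its own field names for the same information.
ARCHIVE_A = [
    {"name": "Latvian folk songs", "lang": "lv", "kind": "audio"},
    {"name": "Dutch parliamentary debates", "lang": "nl", "kind": "text"},
]
ARCHIVE_B = [
    {"dc_title": "Riga oral history interviews", "dc_language": "lv", "format": "video"},
]


def from_archive_a(item):
    """Adapter: map archive A's fields onto the common record."""
    return Record("A", item["name"], item["lang"], item["kind"])


def from_archive_b(item):
    """Adapter: map archive B's fields onto the common record."""
    return Record("B", item["dc_title"], item["dc_language"], item["format"])


# The federation is simply the list of archives paired with their adapters.
FEDERATION = [(ARCHIVE_A, from_archive_a), (ARCHIVE_B, from_archive_b)]


def federated_search(language):
    """Return the records in the given language from every archive."""
    results = []
    for items, adapter in FEDERATION:
        for item in items:
            record = adapter(item)
            if record.language == language:
                results.append(record)
    return results


hits = federated_search("lv")
for hit in hits:
    print(hit.archive, hit.title, hit.media_type)
```

The archives themselves stay untouched; only the adapters know about local conventions, which is one possible way to connect archives that each have their own standards.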

Why a European infrastructure?

• too much fragmentation

• lack of coordination across countries

• lack of visibility

• lack of interoperability

• lack of sustainability

• expertise exists but not in all countries

• language independent tools can be shared

• language dependent tools can often be ported

• most countries not able to bear the cost

Why now?

• Exponential growth of digital data

• Increasing maturity of language and speech technology:
  – high speed
  – large volumes
  – new research questions

• Growing interest at EU level in research infrastructures (RI)

• RI Roadmap published in 2006 by ESFRI

• includes 35 accepted proposals for RIs

• CLARIN is one of them

• all of them will get funding for a 1-3 year preparatory phase

Who we are and where we come from

• The CLARIN consortium now has 32 partners from 22 EU and associated countries (and more on the waiting list)

• The CLARIN community has 142 members in 32 countries (Oct 2008)

• CLARIN is based on 4 earlier initiatives with many participants:
  – LangWeb
  – EARL
  – TELRI
  – (and later) DAM-LR

Who else do we need?

• Both our membership and our consortium are quite unbalanced:
  – Speech & multimodality underrepresented
  – Humanities other than linguistics underrepresented
  – Social sciences underrepresented
  – Some countries still missing

• There is no money to extend the consortium but we have to fill these gaps

Overall plan for CLARIN

Preparatory phase:
• 2008-2010
• Put everything in place

Construction phase:
• 2011-2015
• Build and populate with tools and resources

Exploitation phase:
• 2016-….
• CLARIN in full service

Budget:
• Prep phase
  – 4.1 M€ from EC
  – ??? from countries

• Estimated budget until 2020: ca 200 M€

4-dimensional approach in the preparatory phase

First 3 years dedicated to the design:

• The technical dimension

• The language dimension

• The user dimension

• The governance and legal dimension

Technical

• Technical specification of the infrastructure

• Construction of a prototype

• Validation on rich variety of
  – languages (>20)
  – resources
  – services

• Federation of existing archives

• Based on existing resources, tools

• Strong focus on interoperability standards

• Conversion of existing resources

• Encapsulation of existing tools
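
Encapsulation of an existing tool might look like the following minimal sketch. It assumes a JSON request and response format invented for this example; `legacy_tokenize` stands in for an existing tool and `tokenizer_service` is a hypothetical service entry point, neither taken from the CLARIN specification. The tool itself is left unchanged and only a thin wrapper with a uniform interface is added around it.

```python
import json


def legacy_tokenize(text):
    """Stand-in for an existing tool; a real one might be a command-line program."""
    return text.split()


def tokenizer_service(request_body):
    """Hypothetical service entry point: JSON request in, JSON response out."""
    request = json.loads(request_body)
    tokens = legacy_tokenize(request["text"])
    response = {
        "tool": "tokenizer",
        # "und" is the ISO 639 code for an undetermined language.
        "language": request.get("language", "und"),
        "tokens": tokens,
        "count": len(tokens),
    }
    return json.dumps(response)


response = tokenizer_service(json.dumps({"text": "CLARIN is open", "language": "en"}))
print(response)
```

Because callers only see the request and response format, the wrapped tool could later be replaced or ported to another language without changing the interface.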

Languages

• Cover all languages spoken or studied in participating countries

• Representational and descriptive standards should be adequate and validated for all languages

• Same minimal coverage of basic resources and tools for all languages

• BLARK (Basic Language Resource Kit) to be defined and implemented (funds from other sources needed)

Language activities

• Survey of resources and tools, including:
  – encoding and annotation data
  – quality indicators

• taxonomies and ontologies

• agreeing on common standards

Focus on
• integration of tools
• interoperability
• usage scenarios
• creating missing essential resources
• validating specifications and prototype

User

• Users are SSH scholars (including linguists, translation experts)

• Do WE know what they need?

• Do THEY know what they need?

• Actions:
  – analyze past and ongoing SSH projects
  – user consultation
  – launch typical example projects to show potential
  – expertise centers
  – awareness actions

Legal

IPR issues

• aim at open source, but IPR for existing and future non-open resources must be accommodated

• federation of archives requires authentication, authorization and trust between archives

• aim at limited number of template license agreements for most common cases

• respect national legislation

• address ethical issues

Governance and Funding

Agree on e.g.:

• Who is going to pay for the construction and exploitation of the infrastructure

• How will it be managed

• How will it be coordinated with national policies

Actions:

• Analyse best practice in funding and management of transnational projects

• Prepare agreement between (now) 22 countries about long term joint funding of CLARIN

What CLARIN is NOT about

• building the infrastructure – we are just preparing it

• creating new resources – at this stage we want to use what is there and adapt it if necessary

• creating new applications – except maybe some essential tools or demonstrators

• focusing on the big languages – we find all languages equally important

• strengthening European industry – our target audience is SSH researchers, but we don’t want to exclude anyone

How we work (1)

Work packages:

• WP1: Management and coordination

• WP2: Designing the infrastructure and building the prototype

• WP3: Humanities overview

• WP5: Language resources and technology overview

• WP6: Dissemination

• WP7: IPR and business models

• WP8: Construction and exploitation agreement

How we work (2)

[Diagram of how the work packages relate. Labels: WP8 Org & Legal Framework; WP7 IPR, A&A, licensing; WP5 LRT Exploration; WP2 Infrastructure Prototype; WP3 Humanities Projects]

How we work (3)

• Most tasks executed in Working Groups

• WGs consist of project partners & other experts (CLARIN is open!)

• Some WGs do work (e.g. build prototype), others create consensus

• Participation by others essential as e.g. standards cannot be imposed by a small group

• Unfortunately no EC funding available for WG participation – only reward is influence!

Funding & what to use it for

• From EC: 4.1 M€, used for generic, language independent tasks

• From countries: ??? M€, to be used for preparing CLARIN at the national level in every country:
  – build and organize local national CLARIN community
  – support for participation in working groups (e.g. travel)
  – validation tasks for own language(s)
  – creation or adaptation of essential resources
  – pilots and demonstrators & humanities projects
  – (co-)organisation of local or international events
  – preparing for future role (expertise centers, repositories)

Structure

• Executive Board, consisting of the 7 WP leaders plus a special representative to liaise with the humanities community (among others through the DARIAH sister project)

• Boards:
  – Scientific Board
  – Strategic Coordination Board
  – International Advisory Board

• Meetings (virtual or face to face):
  – Consortium meetings
  – Member meetings
  – Working group meetings

More info

• CLARIN Website: http://www.clarin.eu

• CLARIN Office: [email protected]

• CLARIN Newsletter: http://www.clarin.eu/newsletter

• CLARIN Members: http://www.clarin.eu/members