CLARIN: Goals and Structure of the Project
Steven Krauwer, CLARIN Coordinator
Utrecht Institute of Linguistics UiL-OTS (NL)
Steven Krauwer CLARIN - Riga 03-11-2008 2
Overview
• Problem & Mission
• Some why-questions
• Some who-questions
• Overall plan
– Technical dimension
– Language dimension
– User dimension
– Governance and legal dimension
• What CLARIN is NOT about
• How we work
• Funding
• Structure
• To conclude
The problem
• Much data in digital archives is language based
• Only known to insiders
• Archives mostly unconnected
• Every archive has its own standards for storage and access
• Normally only simple retrieval of files (text, audio or video documents)
• Social sciences and humanities researchers are not language or speech technologists
• They are often not aware of the potential benefits of using language and speech technology
• Available tools are hard to use for non-specialists
The CLARIN Mission
What:
• Create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially social sciences and humanities (SSH)
How:
• Unite existing digital archives into a federation of archives with unified web access
• Provide existing language and speech technology tools as web services operating on language data in archives
Why a European infrastructure?
• too much fragmentation
• lack of coordination across countries
• lack of visibility
• lack of interoperability
• lack of sustainability
• expertise exists but not in all countries
• language independent tools can be shared
• language dependent tools can often be ported
• most countries not able to bear the cost
Why now?
• Exponential growth of digital data
• Increasing maturity of language and speech technology:
– high speed
– large volumes
– new research questions
• Growing interest at EU level in research infrastructures (RI)
• RI Roadmap published in 2006 by ESFRI
• includes 35 accepted proposals for RIs
• CLARIN is one of them
• all of them will get funding for a 1-3 year preparatory phase
Who we are and where we come from
• The CLARIN consortium now has 32 partners from 22 EU and associated countries (and more on the waiting list)
• The CLARIN community has 142 members in 32 countries (Oct 2008)
• CLARIN is based on 4 earlier initiatives with many participants:
– LangWeb
– EARL
– TELRI
– (and later) DAM-LR
Who else do we need?
• Both our membership and our consortium are quite unbalanced:
– Speech & multimodality underrepresented
– Humanities other than linguistics underrepresented
– Social sciences underrepresented
– Some countries still missing
• There is no money to extend the consortium but we have to fill these gaps
Overall plan for CLARIN
Preparatory phase:
• 2008-2010
• Put everything in place
Construction phase:
• 2011-2015
• Build and populate with tools and resources
Exploitation phase:
• 2016-…
• CLARIN in full service
Budget:
• Prep phase
– 4.1 M€ from EC
– ??? from countries
• Estimated budget until 2020: ca 200 M€
4-dimensional approach in the preparatory phase
First 3 years dedicated to the design:
• The technical dimension
• The language dimension
• The user dimension
• The governance and legal dimension
Technical
• Technical specification of the infrastructure
• Construction of a prototype
• Validation on a rich variety of:
– languages (>20)
– resources
– services
• Federation of existing archives
• Based on existing resources and tools
• Strong focus on interoperability standards
• Conversion of existing resources
• Encapsulation of existing tools
Languages
• Cover all languages spoken or studied in participating countries
• Representational and descriptive standards should be adequate and validated for all languages
• Same minimal coverage of basic resources and tools for all languages
• BLARK (Basic Language Resource Kit) to be defined and implemented (funds from other sources needed)
Language activities
• Survey of resources and tools, including:
– encoding and annotation data
– quality indicators
• taxonomies and ontologies
• agreeing on common standards
Focus on:
• integration of tools
• interoperability
• usage scenarios
• creating missing essential resources
• validating specifications and prototype
User
• Users are SSH scholars (including linguists, translation experts)
• Do WE know what they need?
• Do THEY know what they need?
• Actions:
– analyze past and ongoing SSH projects
– user consultation
– launch typical example projects to show potential
– expertise centers
– awareness actions
Legal
IPR issues:
• aim at open source, but IPR for existing and future non-open resources must be accommodated
• federation of archives requires authentication, authorization and trust between archives
• aim at limited number of template license agreements for most common cases
• respect national legislation
• address ethical issues
Governance and Funding
Agree on e.g.:
• Who is going to pay for the construction and exploitation of the infrastructure
• How will it be managed
• How will it be coordinated with national policies
Actions:
• Analyse best practice in funding and management of transnational projects
• Prepare agreement between (now) 22 countries about long term joint funding of CLARIN
What CLARIN is NOT about
• building the infrastructure – we are just preparing it
• creating new resources – at this stage we want to use what is there and adapt it if necessary
• creating new applications – except maybe some essential tools or demonstrators
• focusing on the big languages – we find all languages equally important
• strengthening European industry – our target audience is SSH researchers, but we don’t want to exclude anyone
How we work (1)
Work packages:
• WP1: Management and coordination
• WP2: Designing the infrastructure and building the prototype
• WP3: Humanities overview
• WP5: Language resources and technology overview
• WP6: Dissemination
• WP7: IPR and business models
• WP8: Construction and exploitation agreement
How we work (2)
[Diagram of the work packages. Labelled boxes: WP2 Infrastructure Prototype; WP3 Humanities Projects; WP5 LRT Exploration; WP7 IPR, A&A, licensing; WP8 Org & Legal Framework. The numbers 1 to 8 also appear in the figure.]
How we work (3)
• Most tasks executed in Working Groups
• WGs consist of project partners & other experts (CLARIN is open!)
• Some WGs do work (e.g. build prototype), others create consensus
• Participation by others essential as e.g. standards cannot be imposed by a small group
• Unfortunately no EC funding available for WG participation – only reward is influence!
Funding & what to use it for
• From EC: 4.1 M€, used for generic, language independent tasks
• From countries: ??? M€, to be used for preparing CLARIN at the national level in every country:
– build and organize local national CLARIN community
– support for participation in working groups (e.g. travel)
– validation tasks for own language(s)
– creation or adaptation of essential resources
– pilots and demonstrators & humanities projects
– (co-)organisation of local or international events
– preparing for future role (expertise centers, repositories)
Structure
• Executive Board, consisting of the 7 WP leaders plus a special representative to liaise with the humanities community (among others through the DARIAH sister project)
• Boards:
– Scientific Board
– Strategic Coordination Board
– International Advisory Board
• Meetings (virtual or face to face):
– Consortium meetings
– Member meetings
– Working group meetings
More info
• CLARIN Website: http://www.clarin.eu
• CLARIN Office: [email protected]
• CLARIN Newsletter: http://www.clarin.eu/newsletter
• CLARIN Members: http://www.clarin.eu/members