PREVIOUS EVENTS
• Panel on International Co-operation (LREC - Granada)
• Panel of the Funding Agencies (LREC - Granada)
• Post-LREC Workshop on “Multilingual Information Management: Current Levels and Future Abilities” (Closing Session - Granada)
• International Conference on “New Vistas in Transatlantic Scientific and Technical Cooperation” (Washington DC - June ’98)
• HLT Session
• LREC (Athens) 2000 - Panel on International Cooperation
• COLING (Saarbruecken) 2000 - Panel on International Cooperation
INTERNATIONAL CO-OPERATION
In previous discussions (Granada, Washington, 1998; Athens, 2000), the following areas of HLT have emerged as being in urgent need of international co-operation:
– standards (de facto, best practices);
– Language Resources (LR);
– evaluation;
– core technologies;
– selected vertical domains.
LR are a central infrastructural component and a central issue for international coordination
LR are essential components of HLT activity, supporting research, system development and training, and evaluation in both the mono- and multilingual context;
Political/cultural aspects of LR: “endangered languages”;
A key condition for integrating different technologies and languages is that LR be shared among different sectors and applications;
The richness of the multilingual capabilities associated with a language depends on the number of languages for which adequate LR exist;
The high cost and effort of producing LR should be shared to make them more affordable. The creation of multilingual LR requires agreement on a coordination policy, to ensure the reuse of existing monolingual resources and to facilitate access to native speakers of the various languages.
MULTILINGUAL LR
In particular, the production of multilingual LR poses:
• research issues and challenges;
• organisational problems:
who is responsible for promoting the co-operation of R&D communities speaking different languages, and how?
The situation differs for:
• types of LR: corpora, lexicons, etc.;
• large/general multilingual LR;
• application-specific LR;
• customisation;
• different types of information (data vs. analytical/interpretative features);
• different phases of LR development (research standards; specifications; construction; maintenance; acquisition; (pre)tuning; customisation; updating; technology transfer; etc.);
• DIFFERENT strategies in various continents; ROLES/RESPONSIBILITIES of various actors.
CHALLENGES
• MODALITIES
“Structured” international co-operation (e.g. the US-EU Transatlantic Agreement: current/past initiatives, lessons to be learned, perspectives/opportunities for the future)
Other forms of international co-operation: experiences; advantages/disadvantages/recovery measures/consequences; suggestions for the short/medium/long term.
• ETC.
Why is international co-operation more “difficult” in HLT than in other sciences, and why does it need institutional support?
What are the consequences of the links with social goals, national identities and commercial interests which characterize HLT?
In which areas is international co-operation necessary / appropriate / inappropriate?
Which activities need international co-operation (for instance, evaluation, multilingual LR etc.)?
What can we learn from recent experiences of co-operation supported by the Funding Agencies in the frame of the “Transatlantic Scientific and Technical Co-operation Agreement”? What are the obstacles? Which strategies have been successful?
What kind of initiatives for co-operation with other countries (Asian, South American, etc.) can be taken, and what could their possibilities / benefits / priorities be?
European Language Resources Association (ELRA): An Improved Infrastructure for Data Sharing
A centralized not-for-profit organization for the collection, distribution and validation of speech, text and terminology resources and tools:
A Repository Center:
– Technical & logistic issues
– Commercial issues (prices, fees, royalties)
– Legal issues (licensing, IPR)
– Information dissemination
An Association of users of Language Resources
An operational company: European Language Resource Distribution Agency (ELDA)
The Association
• Membership Drive:
– ELRA is open to European & non-European institutions
– Resources are available to members & non-members
– Pay per resource
• Some of the benefits of becoming a member:
– Substantial discounts on LR prices (over 70%)
– Legal and contractual assistance with respect to LR matters
– Access to validation and production manuals (quality assessment)
– Figures and facts about the market (results of ELRA surveys)
– Newsletter and other publications
ELRA CATALOGUE
-- Identification of LRs
[Chart: growth of the ELRA catalogue, number of Speech, Written and Terminology resources, 03/96 to 06/00]
Legal Issues - Licensing
[Diagram: provider-user agreements linking Providers/Owners, ELRA, VARs and End-Users through a Distribution Agreement, a VAR Agreement and an End-User Agreement]
DISTRIBUTION ACTIVITIES
[Chart: number of resources distributed to members and non-members, with totals, per 12-month period: 1/10/96-30/09/97, 1/10/97-30/09/98, 1/1/99-31/12/99]
Other Technical ACTIVITIES
Market analysis & surveys
LR production, packaging & funding
Validation of resources - Validation networks
Language Resources for Evaluation purposes
Some aspects of LREC’2000
Connecting industrial and academic partners
Some facts on LREC’2000 Participants
Total number of participants (Conference & Workshops): 600
Status / Nb of participants
Full: 276
Member: 124
Student: 98
Others: 104
Some facts on LREC’2000
Workshop / Nb of participants
WS 1. From Spoken Dialogue to Full Natural Interactive Dialogue. Theory, Empirical Analysis and Evaluation: 53
WS 2. Very Large Telephone Speech Databases: 25
WS 3. Meta-Descriptions and Annotation Schemas for Multimodal/Multimedia Language Resources: 67
WS 4. Terminology Resources and Computation: 40
WS 5. Workshop on the Evaluation of Machine Translation: 37
WS 6. Information Extraction Meets Corpus Linguistics: 74
WS 7. Language Resources and Tools in Educational Applications: 23
WS 8. Data Architectures and Software Support for Large Corpora DATA: Towards an American National: 61
WS 9. Developing Language Resources for Minority Languages. Reusability and Strategic Priorities: 38
WS 11. Using Evaluation within HLT Programs: Results and Trends: 46
International Standards for Language Engineering (ISLE)
Project Participants
EU: Consorzio Pisa Ricerche (I, CO); University of Southern Denmark (DK, CR); Université de Genève (CH, CR)
USA: University of Pennsylvania – Computer and Information Sciences; University of Pennsylvania – Linguistic Data Consortium; New York University; Information Sciences Institute – University of Southern California
Objectives
To develop HLT standards within EU-US international research cooperation
To build on the successful EAGLES (Expert Advisory Group for Language Engineering Standards)
To tackle innovative areas where standards are strongly and urgently required
To support HLT RTD and national projects, and the HLT industry
To promote EAGLES as an internationally active body for HLT standardisation
To contribute to all IST thematic programmes
Organisation
Work is organised in 3 EAGLES Working Groups and several subgroups of experts, drawn from academia and industry,
to build consensus through international workshops
ISLE standards and guidelines will be validated in RTD and national projects, and disseminated widely with exemplary data,
to yield maximum impact for minimum cost and enhance the user experience of the information society through standards-based HLT
Multilingual Computational Lexicons - CLWG
extend EAGLES work on lexical semantics, necessary to establish inter-language links
design standards for multilingual lexicons
develop a prototype tool to implement lexicon guidelines and standards
create exemplary EAGLES-conformant sample lexicons and tag exemplary corpora for validation purposes
develop standardised evaluation procedures for lexicons
Natural Interaction and Multimodality - NIMM WG
A rapidly innovating domain urgently requiring early standardisation. ISLE will develop guidelines for:
the creation of NIMM data resources
multilayer annotation of NIMM data, including spoken dialogue in NIMM contexts
meta-data descriptions of multimodal resources
a specification, and a first implementation, of the extension of annotation tools to multimodal capability
Evaluation of HLT Systems - E WG
Evaluation methodology of HLT products and systems based on ISO standards
accompanied by practical case studies
quality models for Machine Translation systems
maintenance of previous guidelines in an ISO-based framework (ISO 9126, ISO 14598)
’86 Grosseto Workshop
• LEX Projects (Acquilex)
• EuroWordNet
Common Top Ontologies
Common Basic Concepts
ILI: Interlingual Index to English WordNet
Italian, Spanish, Dutch, +
WordNet Association announced in Athens (2nd LREC, June 2000)
SpeechDat, SpeechDat-Car, SpeeCon, SALA, etc.
PAROLE/SIMPLE
A set of comparable corpora (20M words) and computational lexica (20K entries)
Encoded at morpho, syntactic, semantic level according to common specifications compatible with EAGLES recommendations
For all the EC languages
Subsidiarity principle
Initial harmonized nuclei (financially supported by the EC)
to be enlarged to “real-life” size through national funds (already 7 national projects)
ENABLER: European National Activities for Basic Language Engineering & Resources
Survey of existing national activities
Fostering common research and compatibility of LR
Suggestions for, and contribution to, international cooperation
-- A new Initiative
• Identification of existing resources (Universal Catalogue)
• The Basics (e.g. standards, tools, evaluation procedures, …)
INITIAL TOPICS for an INTERNATIONAL COOPERATION
“STRUCTURE” in the FIELD OF LR
• to identify the LR available for different languages
• to define a “minimal” set of “basic” LR to be promoted for as many languages as possible
• to establish an international effort towards WIPO to promote adequate legislation for the provision of LR
• to establish for written LR an initiative parallel to COCOSDA, and an umbrella for the two initiatives
To set up a truly international umbrella organization for LR
To define: The form, nature, ...
To formulate: A complete workplan
To identify: Possible affiliation/funding sources
We will call a meeting within the framework of the ELSNET LR Task Group and ENABLER before the end of the year, taking into account the overview emerging from today’s survey.