HLT: Current and Recent Research & Development in the Netherlands
2001-2008
Jan Odijk
24 Nov 2008
Overview
• Earlier History
• Instruments & Programmes
• Players and their Projects
• Concluding Remarks
Earlier History
• MT Projects in the 80’s
– Eurotra (EU, 1985-ca. 1990)
– Distributed MT (BSO, 1984-ca. 1990)
– Rosetta (Philips, 1985-1992)
Emerging Community
Earlier History
• CLIN (Computational Linguistics in the Netherlands) initiated in 1990
– After TIN (Linguistics in the Netherlands)
– But no association
• Yearly informal conference
– no or little pre-selection
• Yearly Proceedings
– Selection of reviewed articles
• http://www.let.rug.nl/~vannoord/clin/clin.html
Earlier History
• Community further strengthened by common projects in the 90’s
– Priority Programme LST centered around public transportation information services (OVIS)
– Corpus Gesproken Nederlands (Spoken Dutch Corpus), together with Flanders
Overview
• Earlier History
• Instruments & Programmes
• Players and their Projects
• Concluding Remarks
Instruments
• NWO Pionier
• NWO Vernieuwingsimpuls
– Veni, Vidi, Vici
– http://www.nwo.nl/nwohome.nsf/pages/NWOA_4YJDQ3
• EZ Bsik
– Increase knowledge and research capacity for 5 selected areas (incl. ICT)
– For mixed public/private consortia that bundle knowledge, expertise and innovative capacity
– http://www.senternovem.nl/BSIK/
• EC (IST)
Overview
• Earlier History
• Instruments & Programmes
• Players and their Projects
• Concluding Remarks
IMIX
• Name: Interactive Multimodal Information Extraction
• Duration: 2001-2008
• Budget: 2M€
– 4 small programmes, 3 post-doc projects
– demonstrator
• URL: http://www.nwo.nl/IMIX
• Funding: NWO
IMIX Goals
• Aims to develop knowledge and technology needed
• To find specific answers to specific questions in Dutch-language documents
• using multiple modalities at the input and output sides
STEVIN
• Name: STEVIN
• Duration: 2004-2011
• Budget: 11.4M€
– 19 R&D projects
– 14 demonstration projects
– Networking activities, educational activities, …
• Funding: Netherlands 2/3 & Flanders 1/3
• URL: http://taalunieversum.org/taal/technologie/stevin/
STEVIN Goals
• contribute to the further progress of HLT for the Dutch language
– realise an appropriate digital language infrastructure for the Dutch language
– carry out strategic research in the domains of language and speech technology, in particular in areas for which there is a large demand from specific applications and technologies;
– create networks and core research areas;
– promote the embedding of research and educate new generations of experts;
– encourage demand and knowledge transfer.
CATCH
• Name: Continuous Access to Cultural Heritage
• Duration: 2005-??
• Budget: 6M€ until 2008 and 3M€ in 2008
– Currently 10 running projects
• URL: http://www.nwo.nl/catch
• Funding: NWO + OCW. Cultural heritage institutes contribute in kind (2.8M€ so far)
CATCH Goals
• Aims to develop generic methods and techniques
– cutting across the areas of the humanities and computer science,
– aiming to facilitate an interaction with cultural heritage institutions.
IOP MMI
• Name: Innovation-oriented Research Programme (IOP) Man Machine Interaction
• Duration: 1999-2003 (phase 1); 2004-2007
• Budget: ??
• URL: http://www.senternovem.nl/iopmensmachineinteractie/index.asp
• Funding: EZ (Min. of Economic Affairs)
IOP MMI Goals (phase 2)
• focus on the Design, Implementation and Evaluation of Intelligent Systems
– Which dynamic knowledge (of one another) should systems and users acquire and apply in order to optimally achieve their goals
CLARIN-NL(?)
• Name: Common Language Resource and Technology Infrastructure - Netherlands
• Duration: 2009-2014
• Budget: 22M€ requested
• URL:
• Funding: OC&W (ESFRI)
CLARIN-NL Goals
• CLARIN-NL aims to design, construct, validate, and exploit
– a research infrastructure that is needed to provide a sustainable and persistent eScience working environment
– for researchers in the Humanities and Social Sciences (HSS)
– who want to make use of language resources and technology.
Overview
• Earlier History
• Instruments & Programmes
• Players and their Projects
• Concluding Remarks
Amsterdam (UvA)
• Institute: Informatics Institute
• Core Topics:
– Information Retrieval
– Question Answering
• Key people
– Maarten de Rijke
Amsterdam (UvA)
• QASSIR (NWO, 2006-2010)
– Question Answering as Semistructured Information Retrieval
• EfFoRT (NWO, 2006-2010)
– Effective Focused Retrieval Techniques
– UvA, (Twente)
• MultiMATCH (EU, 2006-2009)
– Multilingual/Multimedia Access To Cultural Heritage
– 11 partners from Europe incl. UvA
Amsterdam (UvA)
• MuNCH (CATCH, 2005-2009)
– Multimedia aNalysis for Cultural Heritage
– UvA, Beeld & Geluid (B&G), Digitaal Erfgoed Nederland
• MuSeUM (CATCH, 2005-2009)
– Multiple-collection Searching Using Metadata
– UvA, Gemeentemuseum Den Haag, Rijksbureau voor Kunsthistorische Documentatie, Municipal Archives Rotterdam
• A Model Checking Approach to Query Evaluation on XML Documents (NWO, 2004-2008)
Amsterdam (UvA)
• FactMine (IMIX, 2004-2007)
– Fact and Ontology Mining for Question Answering
– UvA, DFKI, Antwerpen, Erasmus MC
• AID (Dutch Government, 2004-2008)
– Adaptive Information Disclosure
Amsterdam (UvA)
• ITEQA (NWO, 2004-2007)
– Inference for Temporal Question Answering
• DuOMAn (STEVIN, 2008-2011)
– Dutch Online Media Analysis
– UvA, Groningen, Gent, TrendLight, GridLine
• DAESO (STEVIN), Cornetto (STEVIN), KYOTO (EU IST), CLARIN-NL
Amsterdam (UvA)
• Institute: ILLC
• Topics
– Data Oriented Parsing (DOP)
• Key Persons:
– Remko Scha
– Rens Bod
– Khalil Sima’an
Amsterdam (UvA)
• U-DOP (NWO VICI, 2006-2011)
– Unsupervised Learning with the DOP Model
• DOP and Unsupervised Grammar Induction (NWO, 2004-2007)
– Unsupervised stochastic grammar induction from unlabeled data
• DOP and Learning Stochastic Tree-Grammars (NWO, 2003-2006)
Amsterdam (VU)
• Department: Language and Communication
• Core Topics:
– Computational Lexicology
• Key people
– Piek Vossen
Amsterdam (VU)
• Cornetto (STEVIN, 2006-2008)
– Combinatorial and Relational Network as Toolkit for Dutch Language Technology
– VU, UvA, Leuven, Irion
• KYOTO (EU FP7 ICT, 2008-2011)
– Knowledge Yielding Ontologies for Transition-based Organization
– VU, 8 other European partners
• CLARIN
Groningen
• Department: Centre for Language and Cognition/ Computational Linguistics
• Core Topics:
– Syntax and Parsing
• Key people
– John Nerbonne
– Gertjan van Noord
– Gosse Bouma
Groningen
• Alpino (NWO PIONIER, 2000-2005)
– Algorithms for Linguistic Processing
• QADR (IMIX, 2004-2008)
– Question Answering for Dutch using Dependency Relations
– Groningen, Spectrum
• COREA (STEVIN, 2005-2007)
– Coreference Resolution for Extracting Answers
– Groningen, Antwerpen, Language and Computing
Groningen
• LASSY (STEVIN, 2006-2009)
– Large Scale Syntactic Annotation of written Dutch
– Groningen, Leuven
• SCRATCH (CATCH, ??-??)
– SCRipt Analysis Tools for the Cultural Heritage
– Groningen, Nationaal Archief
• D-COI (STEVIN), IRME (STEVIN), DAISY (STEVIN), DuOMAn (STEVIN), PaCo-MT (STEVIN), CLARIN, CLARIN-NL
Nijmegen
• Institute: Centre for Language and Speech Technology
• Core Topics:
– Speech Processing
– Language Resource Development
• Key people
– Lou Boves, Nelleke Oostdijk, Helmer Strik, Henk van den Heuvel
Nijmegen
• NORISC (IMIX, 2004-2007)
– Next generatiOn template based Recognition for Interactive man-machine Speech Communication
• COMIC (EU IST, 2002-2005)
– COnversational Multimodal Interaction with Computers
• MATIS (IOP-MMI, ??-??)
– Multimodal Access to Transaction and Information Services
Nijmegen
• D-COI (STEVIN, 2005-2006)
– Dutch Language Corpus Initiative
– Nijmegen, Tilburg, Twente, Groningen, Utrecht, Leuven, Polderland
• SoNaR (STEVIN, 2008-2011)
– STEVIN Nederlandstalig Referentiecorpus
– Nijmegen, Tilburg, Twente, Utrecht, Leuven, Gent
• JASMIN-CGN (STEVIN, 2005-2007)
– Extension of CGN with speech of children, non-natives, elderly and human-machine interaction
– Nijmegen, Leuven, Talkinghome
Nijmegen
• ACORNS (EU, 2008-2010)
– Acquisition of Communication and Recognition Skills
– “Intends to […] create an artificial agent that is capable of acquiring human verbal communication behaviour”
– Nijmegen, 5 other European partners
• Avoiding the ham in hamster (NWO VENI, 2006-2010)
– Modelling the use of non-segmental information in human spoken-word recognition
Nijmegen
• BATS (ICTRegie, IBBT, 2008-2012)
– Topic and Speaker Tracking in Broadcast Archives
– Nijmegen, Leuven
• SPEX
– Speech Processing Expertise Centre
– Collection, Annotation & Validation
– ELRA Speech Validation Centre
Nijmegen
• Autonomata TOO (STEVIN, 2008-2010)
– Autonomata Transfer of Output
– Nijmegen, Gent, Utrecht, TeleAtlas, Nuance
• DISCO (STEVIN, 2008-2011)
– Development and Integration of Speech technology into COurseware for language learning
– Nijmegen, Linguapolis Antwerpen, Taal- & Communicatiecentrum Nijmegen, Polderland
• Autonomata (STEVIN), MIDAS (STEVIN), NBest (STEVIN), Praat (STEVIN), SPRAAK (STEVIN), CLARIN, CLARIN-NL, A Propos (IOP-MMI)
Soesterberg
• Institute: TNO Defense & Security
• Core Topics:
– Speech Processing
– Speech Technology Evaluation
• Key people
– David van Leeuwen
Soesterberg
• NBest (STEVIN, 2006-2008)
– Northern and Southern Dutch Benchmark Evaluation of Speech recognition Technology
– TNO Soesterberg, Nijmegen, Twente, Leuven, Gent, Delft
• SPRAAK (STEVIN)
Tilburg
• Institute: Tilburg Centre for Creative Computing, Induction of Linguistic Knowledge (ILK) Research group
• Core Topics:
– Machine-learning
– Memory-Based learning
• Key people
– Antal van den Bosch
Tilburg
• ROLAQUAD (IMIX, 2004-2008)
– Robust Language Understanding in Question-Answering Dialogue
– Tilburg, Textkernel
• Implicit Linguistics (NWO VICI, 2005-2009)
– Machine Learning of Text-to-Text Processing
• A Propos (IOP MMI, 2006-2009?)
– Proactive Personalization for Professional Document Writing
– Tilburg, Nijmegen, industrial partners
Tilburg
• MITCH (CATCH, 2007-2009?)
– Mining for Information in Texts from the Cultural Heritage
– National Museum of Natural History, Tilburg
• D-COI (STEVIN), CLARIN, CLARIN-NL, SoNaR (STEVIN)
Tilburg
• Institute: Communication and Cognition
• Core Topics:
– Communication and Cognition
– Language Generation
– Multimodality
• Key people
– Emiel Krahmer
Tilburg
• IMOGEN (IMIX, 2004-2008)
– Interactive Multimodal Output Generation
– Tilburg and Twente
• Bridging the gap between psycholinguistics and computational linguistics (NWO VICI, 2007-2011)
– Generation of referring expressions
• DAESO (STEVIN, 2006-2009)
– Detecting and Exploiting Semantic Overlap
– Tilburg, Antwerpen, Amsterdam (UvA), Textkernel
Tilburg
• TUNA (EPSRC (UK), 2003-2007)
– Towards a Unified Algorithm for the Generation of Referring Expressions
– Aberdeen, Open University (UK), Tilburg
• FOAP (NWO Vidi, 2003-2007)
– Functions of Audio-Visual Prosody
Tilburg
• Institute: Communication Information Sciences
• Core Topics:
– Computational Semantics and Pragmatics
– Dialogue Theory
– Multimodal Interaction
• Key people
– Harry Bunt
Tilburg
• Paradime (IMIX, 2004-2008)
– Parallel Agent-based Dialogue Management Engine
Twente
• Institute: Human Media Interaction
• Core Topics
– Multimodality
– Speech Recognition
• Key people
– Franciska de Jong, Anton Nijholt
– Arjan van Hessen, Roelof Ordelman
Twente
• AMI (EU IST, 2004-2006)
– Augmented Multi-party Interaction
– IDIAP (CH) and 11 other partners incl. Twente
• M4 (EU IST, 2002-2005)
– MultiModal Meeting Manager
– Sheffield and 8 other partners incl. Twente
• DRUID (Telematica ??-2003)
– Multimedia Indexing & Retrieval on the basis of Image Processing & Language and Speech Technology
– TNO TPD, TNO TM, Twente, CWI
Twente
• SAFIR (EU, 2003-2007)
– Speech Automatic Friendly Interface Research
– 17 European partners incl. Twente
• Angelica (NWO Meervoud (?), 2003-2007)
– A Natural-Language Generator for Embodied, Lifelike Conversational Agents
Twente
• Pidgin (EZ, 2002-2004)
– Self-learning Cross Lingual Interface
– Twente, Irion, Carp, New Law Facilities
• Choral (CATCH, 2005-2009)
– Access to Oral History
– Twente, Municipal Archives Rotterdam, Radio Rijnmond, Erasmus University Rotterdam
• MultimediaN (Bsik, 2004-2008)
– MultimediaN/N5
– Multiple research groups and 20+ industrial partners
Twente
• VIDIAM (IMIX, 2005-2008)
– Dialog Management and the Visual Channel
• AMIDA (EU, 2006-2009)
– Augmented Multi-party Interaction with Distance Access
– AMI consortium
Twente
• MESH-EU (EU, 2006-2009)
– Multimedia Semantic Syndication for Enhanced News Services
– Twente and 10 other European partners
• MediaCampaign (EU, 2006-2009)
– Discovering, inter-relating and navigating cross-media campaign knowledge
– Twente and 7 other European partners
• D-COI (STEVIN), NBest (STEVIN), SPRAAK (STEVIN), SoNaR (STEVIN), CLARIN, CLARIN-NL
Utrecht
• Institute: UIL-OTS
• Core Topics
– Networks
– Language Technology for e-Learning
– Linguistic Resources
– LR and LT Infrastructure
• Key people
– Jan Odijk
– Steven Krauwer
– Paola Monachesi
– Gerrit Bloothooft
Utrecht
• ELSNET (EU, 1991-Current)
– A Europe-based forum dedicated to human language technologies aiming to advance R&D in human language technologies in Europe
• LT4eL (EU IST, 2005-2008)
– Language Technology for eLearning
– Utrecht and 11 other European partners
• LTfLL (EU IST, 2008-2011)
– Language Technology for LifeLong Learning
– Open University and 8 other European partners incl. Utrecht
• IRME (STEVIN, 2005-2007)
– Identification and lexical Representation of Multiword Expressions
– Utrecht, Groningen, Van Dale
Utrecht
• CLARIN (EU ESFRI, 2008-2010)
– Common LAnguage Resource and technology Infrastructure
– Utrecht, Nijmegen, and 35+ other partners
• FlaReNet (EU IST, 2008-2011)
– Fostering Language Resources Network
– Pisa, Utrecht, ILSP Athens, ELDA, LIMSI, Vienna, Barcelona
• CLARIN-NL (submitted to OC&W, 2009-2014)
– Common LAnguage Resource and technology Infrastructure, Netherlands part
– Utrecht, Nijmegen, UvA, VU, Twente, Groningen, Tilburg and many others
• ISLE (EU IST), Autonomata (STEVIN), D-COI (STEVIN), SoNaR (STEVIN), Autonomata TOO
Overview
• Earlier History
• Instruments & Programmes
• Players and their Projects
• Concluding Remarks
Concluding Remarks
• Solid Community
– Largely complementary, some overlap
– But always close collaboration
• Overall the situation in the Netherlands for LST research and development is pretty good
– esp. in comparison to other countries
• The situation is getting more difficult
– large thematic programmes instead of specific technologies
• The options for fundamental research are limited and decreasing
Thank You
For Your Attention
Do NOT Go Beyond This Slide
Older or Marginally Related
• CHOICE (CATCH)
– semi-automatic semantic annotation and employing context information for ensuring continuous access to the cultural riches.
– B&G, VU, Telematica, MPI, ICN
Older or Marginally Related
• STITCH (CATCH)
– Semantic Interoperability to Access Cultural Heritage
– KB, VU, MPI