CKL --- Center for Computational Linguistics


Project MŠMT LC536 (LC05)
Univerzita Karlova v Praze, ÚFAL MFF
Západočeská univerzita Plzeň, KKY FAV
Masarykova Univerzita Brno, FI
Ústav pro jazyk český AV ČR Praha
http://www.centrumkomputacnilingvistiky.cz

Jan 31, 2011, ÚFAL MFF UK

Center for Computational Linguistics (LC536)

Center’s Advisory Board Meeting, 31.1.2011

MFF UK, Malostranské nám. 25, Room S1, 4th floor

10:00 Introduction to the Center, history, results (Jan Hajic)
10:25 Charles University research and results (Jan Hajic)
10:40 Break
11:00 Institute for Czech Language research and results (Karel Oliva)
11:15 Masaryk University research and results (Karel Pala)
11:30 University of West Bohemia research and results (Pavel Ircing)

The Center

Goals:
– Research in all areas of computational linguistics and speech
– Close cooperation in speech and language
– Create annotated data
– Algorithms and SW tools for NL analysis and generation
– Create and integrate lexical resources

History of the Center

Former Center for Computational Linguistics (program MŠMT LN)
– 2000-2004
– UK, ÚJČ, ZČU: fundamental research type (B)
Now: Center for Computational Linguistics, (again) fundamental research, MŠMT LC
– Masaryk University in Brno added, now 4 sites

The Center: some figures

Budget and timeframe
– 2.9 mil. €, 2005-2009 [-2011] (6 yrs + 9 mos)
Personnel (2010):
– 1 PI (professor)
– 7 Co-PIs and key persons (full/assoc. prof.)
– 11 postdocs (Ph.D.); 9 of them graduated with CKL support
– 24 graduate students
Reduced to about 2/3 for 2011

The sites (1)

UK Praha (ÚFAL MFF / Charles University)
– Formal language theory and algorithms
– SW tools for NLU / NLG
– Raw and annotated data (incl. parallel)
ZČU Plzeň, KKY FAV (University of West Bohemia in Pilsen)
– Speech recognition and TTS
– Data collection and annotation

The sites (2)

MU Brno, FI, NLP lab (Masaryk University)
– Lexical issues
– Lexical databases, incl. SW
ÚJČ AV ČR (Institute of the Czech Language, Academy of Sciences of the CR)
– Digitization of historical data
– Lexical databases

2005

Start of work, after some “gap”
– Apr. 1, 2005 (after a three-month vacuum)
– [Got back the name…]
– Reduced budget for 2005 (300k €)
  Durable equipment / future computing cluster
– Cooperation:
  EU grant proposals
  Continuing work on Malach (U.S.)
  Start of the PIRE NSF project (JHU, Brown Univ.)

2006

First full year
– Prague Dependency Treebank v2.0 finished (published at LDC)
– Speech reconstruction project (UK, specification with PIRE/JHU)
– Lexical issues (UK, MU, ÚJČ)
– Speech (ASR, TTS: ZČU)
– IR: CLEF test collection, CLEF shared task, 1st part
– Digitization of historical material (ÚJČ)
– Start of EU Integrated Project “Companions”: UK, ZČU
– More international cooperation: EU, USA (JHU, Brown, Univ. of Pennsylvania)
– Organization of Treebanks and Linguistic Theories, Dec. 2006 (UK)
– 40 “results” in the government database (“RIV”)

2007: Mid-project

– Lexical resources, new Czech language lexical database (MU+ÚJČ)
– Added more students for English work, translation
  English annotation specification, annotation (ZČU, UK)
– Integration of ASR and TTS with NLU/NLG (UK, ZČU)
  In the “Companions” project
– SW tools for analysis and generation
  Speech, language (UK, MU, ZČU)
– International collaboration
  EU (3 projects, 6th FP: UK, UK+ZČU), USA (UK, UK+ZČU)
– Local organisation of ACL 2007 and EMNLP 2007
  Still (2011) holds the attendance record (~1100 participants)
– 66 results in “RIV” (16 journals, 39 in-proc., 5 SW/data etc.)

2008: Slightly modified goals (stress on MT)

– Lexical resources (MU, UK, ÚJČ)
  SW tools
– Semantics
  Detection of plagiarism (MU)
  NLU (UK, MU), NLG (UK)
– New algorithms for ASR
  Prosody, language modeling, speech reconstruction
– Data acquisition, annotation, corpus tools
– Research (incl. data annotation) for machine translation
  The TectoMT SW and data platform
– Theoretical formal linguistics, language usage
Results (RIV): 64 (13 journal articles, 32 in-proc., 5 books, 5 SW tools/data resources etc.)

2009: Should have been the last year of CKL…

– Application for extension for 2010-11
  Granted for 2010
– Research: English data, MT, ASR, dialog
  Work on the parallel Czech-English treebank (PTB)
  Companions project: integration work
– Tight cooperation between UK and ZČU
  PIRE project: workshops, students from US at UK
  Euromatrix EU project on MT extended (-2012)
– Organization of the CoNLL 2009 shared task
– Organization of session at FET 2009 (EU conference)
– Results: 62; journals: 8, in-proc.: 42, 3 books etc.

2010: Last fully funded year; extension to 2011 granted in Nov.

– Continuation of research along the same lines
  Wrap-up in data annotation: PCEDT, PDTSx
  Departures of people due to uncertainty
– International cooperation:
  Companions project finished (Nov. 2010)
  PIRE continuing towards 2011, EuromatrixPlus renewed (UK)
  New projects in 2010:
  – Univ. of Pennsylvania: discourse representation, annotation (UK)
  – Khresmoi (EU IP): medical IR and IE, UK
  – Faust (STREP, machine translation, UK)
  – META-NET network of excellence in MT / data sharing
  Chairing the ACL 2010 conference (Uppsala, Sweden)
– Results (prelim.): ~60 (12 journal articles, ~40 in-proc.)

Quantitative Summary of Results

RIV 2005-2009 (2010 pending)
– 274 records (+ ~60 in 2010)
Mostly papers in proceedings of conferences and workshops
– ACL, EACL, NAACL, Coling, CoNLL; workshops
– > 95% international, > 85% abroad
Some journal articles
– LNCS, IEEE Transactions, LRE, Czech linguistic journals (PBML, SaS, now in WoS)
Software and data
– Mostly “open source”; training, shared task (evaluation)

Most valued publications

Papers
– Semi-supervised POS tagging (EACL 2009)
  Best results in POS tagging so far, incl. English
  Now taggers available in 5 languages
– Extension of HVS Semantic Parser by Allowing Left-Right Branching (ICASSP 2008)
  New result, drawing from S. Young’s work
– Large-scale Semantic Networks: Annotation and Evaluation (NAACL 2009)
  In cooperation with Google Research (Zurich, K. Hall)
– CoNLL 2009 Shared Task (CoNLL 2009)
  Overall task and system description
Book
– Valenční slovník českých sloves (Valency Lexicon of Czech Verbs, Karolinum Press)
  Electronic version available

Most valued data

Corpora (language databases, publicly available)
– Prague Dependency Treebank 2.0, Linguistic Data Consortium 2006
– Prague Czech-English Dependency Treebank, to appear in 2011
  Penn Treebank & translation to Czech, with semantic annotation, ~PDT style
– Czech Wordnet 1.0 (ELRA, 2008)
– Sign Language, Audiovisual (ELRA, 2008)
Test / shared task collections
– CLEF 2006, 2007
  Multilingual cross-language search competitions
– Machine Translation Open Competition: EuroMatrix/Plus 2006-10
  Czech-English, German, French, Italian, Hungarian, Spanish
– CoNLL Shared Task 2007, 2009
  Dep. parsing, semantic role labeling (unified for 7 languages)

Most valued SW tools

Software
– Corpus manager (client/server) Bonito/Manatee
  Worldwide use: ČNK, SNK; Hu, Hr, GB
– Word Sketch Engine
  Commercial use (Lexical Computing)
– ComPOST
  State-of-the-art POS tagger (Cz, En, Dutch, Swedish, Icelandic)
– Syntactic dependency parser “MST” (Czech)
  With Univ. of Pennsylvania
– Improved Czech ASR and emotional TTS
  Used in the Companions project
– NLG and Dialogue Manager w/ knowledge base
  Also for the Companions project
– The TectoMT SW and data handling platform
  MT, dialogue systems (now any NLU/NLG processing -> “Treex”)

The Center provided…

Material benefits
– 3/4 of budget: personnel (mainly graduate students)
– Generous travel money
– Small equipment
– Durable equipment: clusters (30-200 CPUs)
  Only in 2005/6; need for renewal
– Small indirect costs (< 12%, contribution of inst.)
“Intangible” benefits
– (Sub)teams, even across institutions, flexible assignment of people to projects
– Dissertations, one assoc. professor promotion

The Center had to work under certain “restrictions”

Employment of graduate students, postdocs, supervision of graduate students
– Now at all four sites (2009: 10/4/9/1)
Requirement: at least one site… → Check
Requirement: Participation of students (Bc./Mgr./Ph.D.)
– Total: 41 students → Check
– 7 nationalities
After graduation, students went to (e.g.):
– Petr Němec (UK): TextKernel, Hol.; Kiril Ribarov (UK): ČEZ
– Jan Romportl, Aleš Pražák: SpeechTech (spinoff, ZČU)
– Vladimír Kadlec (MU Brno): Acision (GB)
– Petr Pajas (UK): Google (Zurich)
– Václav Novák (UK): Ministry of the Interior, then a small startup
– Former CKL (LN, 00-04): M. Čmejrek, J. Cuřín (UK): IBM Research (Yorktown, Prague)

“Restrictions” (cont’d)

Requirement: integration into the EU “research space”
– 9 EU projects, 6th and 7th FP
– All types: IP, STREP, NoE; SSA, Dig. Libraries
  Companions (IP): ZČU, UK
  Khresmoi (IP): UK
  EuroMatrix, EuroMatrixPlus, Faust (STREP): UK
  Flarenet, META-NET (NoE): UK
  Clarin (SSA): UK, MU, ÚJČ
  KYOTO (Dig. Libraries): MU
USA
– Malach (till 2007; UK, ZČU): USC, JHU, IBM, UMD
– PIRE: speech recognition and machine translation (UK, indirectly ZČU): JHU, Brown Univ.
– Discourse: Univ. of Pennsylvania
– Treebanking: Univ. of Colorado → Check

EU Project “Companions”

Goal
– Intelligent conversational companion
  Over photographs (Cz), “how was your day” (En)
Technologies
– ASR, emotional TTS
– Natural language understanding, NL generation
– Naturalness of dialogue: “user studies” / “evaluation”
CKL
– UK/ZČU: ASR, TTS, NLU, NLG, dialogue management

The Companions project

Companions: System Diagram

Other project demos

Semantic annotation (UK)

Některé kontury problému se však po oživení Havlovým projevem zdají být jasnější.
(“However, after being revived by Havel’s speech, some contours of the problem seem to be clearer.”)

PDT 2.0: Annotation layers

“Byl by šel do lesa” (“he would have gone to the forest”)

Linked layers of annotation

Stand-off annotation

Schema (Relax NG)

z-layer
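Linked stand-off layers can be pictured as separate node sets that point to the layer below by ID instead of copying its text. The following is a minimal Python sketch under stated assumptions: the class names, attribute names, node IDs and tree shapes are hypothetical simplifications, not the actual PDT 2.0/PML format. It encodes the slide’s example sentence on four layers (w: tokens, m: morphology, a: surface syntax, t: deep syntax) and follows the links from a t-node down to the surface tokens.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified model of linked stand-off annotation layers.
# Each layer keeps its own nodes and refers to the layer below only by ID,
# so no layer copies text from another. Names are illustrative, not PML.

@dataclass
class WNode:                 # w-layer: raw token
    token: str

@dataclass
class MNode:                 # m-layer: form and lemma, linked to w-layer token(s)
    form: str
    lemma: str
    w_refs: List[str]

@dataclass
class ANode:                 # a-layer: surface-syntax node, linked to one m-layer unit
    m_ref: str
    parent: Optional[str]    # ID of the governing a-node (None = root, simplified)

@dataclass
class TNode:                 # t-layer: deep-syntax node with lexical + auxiliary a-links
    t_lemma: str
    functor: str
    lex_ref: str
    aux_refs: List[str] = field(default_factory=list)

# "Byl by šel do lesa" ("he would have gone to the forest")
W = {"w1": WNode("Byl"), "w2": WNode("by"), "w3": WNode("šel"),
     "w4": WNode("do"), "w5": WNode("lesa")}
M = {"m1": MNode("Byl", "být", ["w1"]), "m2": MNode("by", "být", ["w2"]),
     "m3": MNode("šel", "jít", ["w3"]), "m4": MNode("do", "do", ["w4"]),
     "m5": MNode("lesa", "les", ["w5"])}
A = {"a3": ANode("m3", None), "a1": ANode("m1", "a3"), "a2": ANode("m2", "a3"),
     "a4": ANode("m4", "a3"), "a5": ANode("m5", "a4")}
# Auxiliary words (conditional "byl by", preposition "do") get no t-node of their own;
# they are reached through aux_refs. Generated nodes (e.g. the unexpressed subject) are omitted.
T = {"t1": TNode("jít", "PRED", "a3", ["a1", "a2"]),
     "t2": TNode("les", "DIR3", "a5", ["a4"])}

W_ORDER = list(W)  # surface word order = insertion order of the w-layer dict

def surface_tokens(t_id: str) -> List[str]:
    """Follow t -> a -> m -> w links and return the surface tokens in text order."""
    node = T[t_id]
    w_ids: List[str] = []
    for a_id in [node.lex_ref] + node.aux_refs:
        w_ids.extend(M[A[a_id].m_ref].w_refs)
    return [W[w_id].token for w_id in sorted(w_ids, key=W_ORDER.index)]
```

For example, `surface_tokens("t1")` yields `["Byl", "by", "šel"]`: one deep-syntax node (the verb “jít”) covers three surface tokens, which is the point of keeping the layers separate and linked rather than merged.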

Speech reconstruction (UK, ZČU)

● Goal: “Translation”

Spoken (verbatim): SEM NEMOH SEM TO JIM DÁT TEN VOBRAZ

Gloss: ‘m couldn’t ‘m that them give the paintin’

Reconstructed: Ten obraz jsem jim nemohl dát.

English: I could not give them the painting.

Generation

● Annotation
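Going from the spoken transcript to the reconstructed sentence combines several edit types. The repeated SEM and the word TO are dropped, reduced or colloquial forms are replaced (SEM → jsem, NEMOH → nemohl, VOBRAZ → obraz), and word order changes. A minimal Python sketch, assuming a hand-made word alignment, shows how these edit types can be read off such an alignment. The alignment format, the `classify_edits` function, and the choice of which SEM is kept are hypothetical illustrations, not the project’s annotation format.

```python
from typing import List, Optional, Tuple

# Hypothetical token alignment between a verbatim spoken transcript and its
# manually reconstructed version (the example sentence from this slide).
SPOKEN = ["SEM", "NEMOH", "SEM", "TO", "JIM", "DÁT", "TEN", "VOBRAZ"]
EDITED = ["Ten", "obraz", "jsem", "jim", "nemohl", "dát"]

# ALIGNMENT[i] = index in EDITED for SPOKEN[i], or None if the token was dropped.
# Assumption: the first "SEM" is kept and the repeated one is deleted.
ALIGNMENT: List[Optional[int]] = [2, 4, None, None, 3, 5, 0, 1]

def classify_edits(spoken, edited, alignment):
    """Derive deletions, substitutions, insertions and reordering from the alignment."""
    deletions = [spoken[i] for i, j in enumerate(alignment) if j is None]
    # A substitution is an aligned pair whose forms differ beyond letter case.
    substitutions: List[Tuple[str, str]] = [
        (spoken[i], edited[j])
        for i, j in enumerate(alignment)
        if j is not None and spoken[i].lower() != edited[j].lower()
    ]
    targets = [j for j in alignment if j is not None]
    # Edited tokens that no spoken token maps to were inserted by the annotator.
    insertions = [tok for k, tok in enumerate(edited) if k not in targets]
    # Word order changed if the aligned target positions are not increasing.
    reordered = targets != sorted(targets)
    return {"deleted": deletions, "substituted": substitutions,
            "inserted": insertions, "reordered": reordered}
```

On this example, `classify_edits(SPOKEN, EDITED, ALIGNMENT)` reports SEM and TO as deleted, three substitutions, no insertions, and a reordering. Because “all changes are allowed” in the edited transcript, an alignment of this kind is one plausible way to keep the link between the spoken and the edited version.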

Speech Reconstruction Annotation

Edited transcript
– All changes allowed
– Manual annotation
– Large data
  Malach data
  Companions proj. dialogues (> 100h)