Center for Computational Learning Systems
• Independent research center within the Engineering School
• NLP people at CCLS: Mona Diab, Nizar Habash, Martin Jansche, Rebecca Passonneau, Owen Rambow
• We are part of “The NLP Group” but not of the CS department
• What we do:
  o Researchers
  o Work with Kathy and Julia
  o Our own projects
  o Sometimes teach
  o Supervise students (PhD, Masters, independent studies)
• Some of us are in CEPSR, some in the Interchurch Center
• Some NLP Group meetings will take place in Interchurch Center
CLiMB 2: Computational Linguistics for Metadata Building, Phase 2
• Becky Passonneau (with University of Maryland)
• Interactive workbench for image cataloguers/indexers: Use NLP to extract descriptive terms from scholarly text
• Mellon Foundation
• http://www.umiacs.umd.edu/~climb/
Automated Readers Advisor, Heiskell Talking Books and Braille Library (NYPL)
• Becky Passonneau
• Replace some of librarians’ tasks in current over-the-phone borrowing system with automated dialogue system
• Use Wizard-of-Oz paradigm for data collection
• Joint project with CCNY (Esther Levin)
• http://www.cs.columbia.edu/~becky/pubs/WozVariant.ppt
Tracking Emergent Narrative Skills (TENS)
• Becky Passonneau
• Current data set: ten-year-olds retelling silent movies
• Develop quantitative methods to compare semantic and pragmatic content (e.g., adapt Pyramid Method for evaluating summary content)
• Joint project with University of Connecticut (Elena Levy)
Arabic NLP
• CADIM Group: Mona Diab, Nizar Habash, Owen Rambow
• Focus on Standard Arabic AND the dialects
• NLP tools for Arabic:
  o Morphological analysis (exists)
  o Morphological tagging (exists, best-performing)
    ▪ Tokenization
    ▪ POS tagging (best-performing)
    ▪ Diacritization (best-performing)
  o Word-sense disambiguation (in progress)
  o Sentence-boundary detection for ASR (in progress)
  o Parsing (initial research)
  o Named-entity recognition (joint with Fair Isaac, in progress)
  o …
Machine Translation
• Nizar Habash
• Focus: Arabic-English MT
• Different hybrid MT approaches explored
  o Linguistic preprocessing for Statistical MT
    ▪ Morphological and Syntactic preprocessing
  o Adding statistical resources to rule-based MT systems
    ▪ Automatically extracted phrase tables combined with Generation-Heavy MT
• Columbia's first participation in NIST MTEval (2006)
Word Sense Modeling and Disambiguation
• Mona Diab
• Using corpora (including multilingual parallel and similar) for unsupervised learning
• Arabic WordNet
• Arabic PropBank
Email Summarization: Social Networks
• Aaron Harnly (PhD student) and Owen Rambow, with Kathy McKeown
• Study interaction between:
  o Email-intrinsic factors
    ▪ Language in email (lexicon, syntax, …)
    ▪ Email genre
  o Structure of dialog
    ▪ Threads
    ▪ Speech acts
  o Relation among people
    ▪ Roles in organization
    ▪ Social networks
• Use to predict one factor from others
• Use in high-level summaries of large amounts of email communication
Multilingual Metagrammars
• Owen Rambow (with University of Pennsylvania)
• Goal: high-level abstract representation of syntax of (many/all) natural languages, from which we can automatically generate grammars that can be used for NLP
• Have: Universal Grammar component and language-specific modules for Korean, German, Yiddish
• Next: Icelandic, Mainland Scandinavian, English, Kashmiri, …