October 2005 CSA3180 NLP 1
CSA3180 Natural Language Processing
Introduction and Course Overview
Acknowledgement
• Material for some of these slides is taken from J. Nivre, University of Gothenburg, Sweden
Why Language and Computers
• Engineering
  – NLP is concerned with the design and implementation of effective NL input and output components for computational systems (Robert Dale 2000)
• Scientific
  – The use of computers for linguistic research and applications
NLP is Interdisciplinary
• Linguistics
  – Theoretical
  – Applied
• Computer Science
  – Algorithms
  – Compiling Techniques
• Artificial Intelligence
  – Understanding, reasoning
  – Intelligent Action
Uszkoreit’s (2000) Five Points
• Solving the human language puzzle
  – by implementing complex theories directly
• Teaching computers to communicate with people
  – by exploiting natural modes of communication
• Friendly software should listen and speak
  – through development of multimodal communication
• Machines can help people communicate with each other
  – by developing multilingual applications
• Language is the fabric of the web
  – through language technology for knowledge management
Application Areas
• Document Processing
  – Classification
  – Summarisation
  – Information Extraction
• Question Answering
  – Information Retrieval
  – Dialogue
• Multilinguality
  – Machine Translation
  – Translation tools
• Multimodality
  – speech
  – intonation
  – image
Basic Problems
• Analysis
  – Conversion of NL input to internal representations
• Generation
  – Conversion of internal representations to NL output
• Issues
  – What kind of input/output/representations
  – Evaluation
  – Learning
Levels of Linguistic Knowledge
• Phonetics/Phonology: sound structure
• Morphology: word structure
• Syntax: sentence structure
• Semantics: meanings
• Pragmatics: use of language in context
• Discourse: paragraphs, texts, dialogues
Ambiguity
• Morpho-Syntactic: We saw her duck
• Lexical Semantic: They went to the bank
• Structural semantic: Young men and women
• Referential: She did it
• Pragmatic: Can you pass the salt
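The morpho-syntactic case can be made concrete with a small sketch. The toy grammar and lexicon below are illustrative assumptions, not part of the course material: "her" is either a determiner or a pronoun, and "duck" is either a noun or a bare verb. A CYK-style chart then counts how many parse trees the grammar assigns to a sentence.

```python
from collections import defaultdict

# Toy lexicon (assumed for illustration): ambiguous words carry several tags.
LEXICON = {
    "we": {"NP"},
    "saw": {"V"},
    "her": {"Det", "NP"},   # possessive determiner or object pronoun
    "duck": {"N", "VB"},    # noun or bare verb
}

# Toy binary rules (parent, left child, right child), also assumed.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),    # saw [her duck]
    ("VP", "V", "SC"),    # saw [her] [duck]
    ("SC", "NP", "VB"),   # small clause: pronoun + bare verb
    ("NP", "Det", "N"),
]


def count_parses(words, start="S"):
    """Count the parse trees for `words` rooted in `start` (CYK-style)."""
    n = len(words)
    # chart[i, j, symbol] = number of trees for words[i:j] rooted in symbol
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i, i + 1, tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


print(count_parses("we saw her duck".split()))
```

Under this grammar the sentence receives two analyses, one for each reading of "her duck". A realistic grammar would need far more rules and would typically produce many more readings.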
Ways of Studying NLP
• By Application: MT, IE, IR etc.
• By Approach: rational vs. empirical
• By Linguistic Level: morphology, syntax etc.
• By Algorithm
Algorithms
• State Machines
  – automata and transducers
• Rule Systems
  – regular and context free grammars
• Search
  – top-down/bottom-up parsing
• Probabilistic algorithms
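As a minimal sketch of the first item, a deterministic finite-state automaton can be written as a transition table plus a set of accepting states. The language chosen here (strings of the form "baa!", "baaa!", and so on) and all names in the code are assumptions made for illustration.

```python
# Transition table: (current state, input symbol) -> next state.
# State numbering and the example language are assumed for illustration.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further a's
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING_STATES = {4}


def accepts(string):
    """Return True if the automaton ends in an accepting state."""
    state = START_STATE
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING_STATES


for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, accepts(candidate))
```

A transducer extends the same table so that each transition also emits an output symbol, which is the basis for finite-state morphological processing.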
Approach in this Course
Part I – Algorithms
• Words [3]
  – Finite State Algorithms
  – Morphological Processing
• Sentences [3]
  – Parsing
  – (Generation)
• Texts [3]
  – Tagging
  – Chunking
Approach in this Course
Part II – Topics and Tools
• Semantics [6]
• Statistics [6]
• Information Extraction [6]
• Machine Translation [4]
• Information Retrieval [3]
Course Information
• Course Website: www.cs.um.edu.mt/~mros/csa3180
• Reference Text: Jurafsky and Martin
• Tools
  – Prolog: SWI Prolog
  – NLTK: nltk.sourceforge.net