Introduction to NLP Tools
09/23/2003
Motivation
• Machine Translation
– From English to French
• What’s needed?
Motivation Cont’d (1)
• Syntactic parser
• Part-Of-Speech Tagger
– Example: NP -> adj noun
• Morphological Analyzer
– Example: “tools” -> “tool”
– Example (tokenization): “Who is he?” -> “Who is he ?”
• Semantic Analyzer
– Word sense disambiguation (“wash dishes”)
– Choose the correct translation
Motivation Cont’d (2)
• Lexicons
– Information about each word: How many senses does it have? What are its possible translations?
• Corpus
– Useful for learning (training) a tool
– Useful for evaluation
Outline
• Lexicons
• Text corpora
• Morphological tools
• Part-Of-Speech (POS) taggers
• Syntactic parsers
• Semantic knowledge bases and semantic parser
• Speech tools
Lexicons
• Definition
– A repository for words
• Lexicons in LDC (Linguistic Data Consortium)
– Creating and sharing linguistic resources: data, tools and standards
• CELEX
• WordNet
CELEX
• Dutch Center for Lexical Information
• Lexical databases of English, Dutch and German
• 21,000 nouns, 8,000 adjectives and 6,000 verbs
• English:
– English Orthography, Lemmas
– English Phonology, Lemmas
– English Morphology, Lemmas
– English Syntax, Lemmas
– English Frequency, Lemmas
– English Orthography, Wordforms
– English Phonology, Wordforms
– English Morphology, Wordforms
– English Frequency, Wordforms
– English Corpus Types
– English Frequency, Syllables
WordNet
• A database of lexical relations
• Inspired by current psycholinguistic theories of human lexical memory
• Synset: a set of synonyms, representing one underlying lexical concept
– Example: fool {chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug}
• Relations link the synsets: hypernym, Has-Member, Member-Of, Antonym, etc.
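A toy sketch of the synset idea follows. Each synset is modeled as a set of lemmas, and an index from each lemma to the synsets that contain it lets any member reach its synonyms. Only the single synset shown above is included. The names `build_index` and `synonyms` are hypothetical and are not the WordNet API. Relations such as hypernymy are left out.

```python
from collections import defaultdict

# The one noun synset from the slide: synonyms standing for one concept.
FOOL_SYNSET = frozenset({
    "chump", "fish", "fool", "gull", "mark", "patsy", "fall guy",
    "sucker", "schlemiel", "shlemiel", "soft touch", "mug",
})


def build_index(synsets):
    """Map every lemma to the list of synsets that contain it."""
    index = defaultdict(list)
    for synset in synsets:
        for lemma in synset:
            index[lemma].append(synset)
    return index


def synonyms(index, word):
    """Return the other lemmas that share at least one synset with `word`."""
    result = set()
    for synset in index.get(word, []):
        result |= synset
    result.discard(word)
    return result


index = build_index([FOOL_SYNSET])
print(sorted(synonyms(index, "patsy")))
```

In real WordNet, a word such as "fool" belongs to several synsets, one per sense. That is why the index maps each lemma to a list of synsets rather than a single one.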
WordNet Cont’d
• Example
pu-erh.cs.utexas.edu$ wn bike -partn
Part Meronyms of noun bike
2 senses of bike
Sense 1
motorcycle, bike
HAS PART: mudguard, splashguard
Sense 2
bicycle, bike, wheel
HAS PART: bicycle seat, saddle
HAS PART: bicycle wheel
HAS PART: chain
HAS PART: coaster brake
HAS PART: handlebar
HAS PART: mudguard, splashguard
HAS PART: pedal, treadle, foot lever
HAS PART: sprocket, sprocket wheel
• Example
pu-erh.cs.utexas.edu$ wn bike
Information available for noun bike
-hypen Hypernyms
-hypon, -treen Hyponyms & Hyponym Tree
-synsn Synonyms (ordered by frequency)
-partn Has Part Meronyms
-meron All Meronyms
-famln Familiarity & Polysemy Count
-coorn Coordinate Sisters
-simsn Synonyms (grouped by similarity of meaning)
-hmern Hierarchical Meronyms
-grepn List of Compound Words
-over Overview of Senses
Information available for verb bike
-hypev Hypernyms
-hypov, -treev Hyponyms & Hyponym Tree
-synsv Synonyms (ordered by frequency)
-famlv Familiarity & Polysemy Count
-framv Verb Frames
-simsv Synonyms (grouped by similarity of meaning)
-grepv List of Compound Words
-over Overview of Senses
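The `-partn` output above is a small graph. Each sense of "bike" is a synset, and each HAS PART edge points to another synset. The sketch below holds exactly that data as plain Python lists and does not call WordNet. The helper names are hypothetical, and the printed layout only approximates what `wn` prints.

```python
# Part meronyms of noun "bike", copied from the `wn bike -partn` output above.
# Each sense is a (synset, list of HAS PART synsets) pair.
BIKE_PART_MERONYMS = [
    (["motorcycle", "bike"],
     [["mudguard", "splashguard"]]),
    (["bicycle", "bike", "wheel"],
     [["bicycle seat", "saddle"],
      ["bicycle wheel"],
      ["chain"],
      ["coaster brake"],
      ["handlebar"],
      ["mudguard", "splashguard"],
      ["pedal", "treadle", "foot lever"],
      ["sprocket", "sprocket wheel"]]),
]


def format_part_meronyms(word, senses):
    """Render the data in roughly the layout that `wn` uses."""
    lines = [f"Part Meronyms of noun {word}", f"{len(senses)} senses of {word}"]
    for number, (synset, parts) in enumerate(senses, start=1):
        lines.append(f"Sense {number}")
        lines.append(", ".join(synset))
        for part in parts:
            lines.append("    HAS PART: " + ", ".join(part))
    return "\n".join(lines)


def senses_with_part(senses, part_lemma):
    """Return the 1-based sense numbers that have `part_lemma` as a part."""
    return [number for number, (_, parts) in enumerate(senses, start=1)
            if any(part_lemma in part for part in parts)]


print(format_part_meronyms("bike", BIKE_PART_MERONYMS))
print(senses_with_part(BIKE_PART_MERONYMS, "mudguard"))
```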
Corpus
• Definition
– Collections of text and speech
• LDC
• Penn Treebank
• DSO
• Hansard
Some of the Top Corpora from LDC
• TIPSTER
– Information Retrieval and Data Extraction datasets
– TIPSTER project, TREC project
• TIMIT Acoustic-Phonetic Continuous Speech Corpus
– A corpus of read speech designed to provide speech data for the acquisition of acoustic-phonetic knowledge
– Useful for the development and evaluation of automatic speech recognition systems
• ECI (European Corpus Initiative Multilingual Corpus)
– A multilingual electronic text corpus
• NTIMIT
– A phonetically balanced, continuous-speech, telephone-bandwidth speech database
Penn Treebank
• A collection of corpora
• Tagged with POS, syntactic roles, predicate/argument structure, and disfluency annotation
• How they are made
– Hand correction of the output of an error-prone automatic process
• 3 million words
– 1 million words tagged with predicate/argument structure for extracting semantic knowledge
Penn Treebank Cont’d
• Corpora
– Wall Street Journal
– ATIS (Air Travel Information System)
– Brown Corpus
– IBM Manual Sentences
– Library of America Texts: Mark Twain, Henry Adams, Herman Melville ...
– MUC-3 Messages
• Example (parsed text):
( (S (NP-SBJ Rally 's)
(VP operates
and
franchises
(NP (NP (QP about 160)
fast-food restaurants)
(PP-LOC throughout
(NP the U.S))))))
• Example (POS-tagged text; brackets mark base noun phrases):
Seeking/VBG to/TO block/VB
[ the/DT investors/NNS ]
from/IN buying/VBG
[ more/JJR shares/NNS ]
./.
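Both example formats are easy to read programmatically. The sketch below is a minimal reader, not the Treebank's own tools. It assumes that every bracketed node starts with its label, and it uses a copy of the parsed example with balanced brackets. For the tagged text, it splits each token at its last slash and skips the chunk brackets.

```python
import re

# A balanced copy of the parsed example (skeletal bracketing, no POS tags).
TREE = """( (S (NP-SBJ Rally 's)
     (VP operates and franchises
         (NP (NP (QP about 160) fast-food restaurants)
             (PP-LOC throughout (NP the U.S))))))"""

# The POS-tagged example: word/TAG tokens, with [ ] around noun-phrase chunks.
TAGGED = ("Seeking/VBG to/TO block/VB [ the/DT investors/NNS ] "
          "from/IN buying/VBG [ more/JJR shares/NNS ] ./.")


def parse_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # skip "("
        node = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse_node())
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return node

    return parse_node()


def leaves(node):
    """Collect the words of a tree; the first atom of each node is its label."""
    words = []
    for i, child in enumerate(node):
        if isinstance(child, list):
            words.extend(leaves(child))
        elif i > 0:
            words.append(child)
    return words


def parse_tagged(text):
    """Return (word, tag) pairs, skipping the chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue
        word, _, tag = token.rpartition("/")   # split at the last slash
        pairs.append((word, tag))
    return pairs


tree = parse_tree(TREE)
print(tree[0][0])                     # the unlabeled outer node wraps the S node
print(" ".join(leaves(tree)))
print(parse_tagged(TAGGED)[:3])
```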
DSO
• Word Sense Corpus
– Contains sentences in which about 192,800 word occurrences have been tagged with WordNet senses
– Taken from the Brown corpus and the Wall Street Journal corpus
– 121 nouns and 70 verbs
Hansard
• Official records (Hansards) of the 36th Canadian Parliament, in both English and French
• 1.3 million pairs of aligned sentences of English and French
– Example:
French: Comme il est 14 h 30, la Chambre s'ajourne jusqu'à lundi prochain, à 11 heures, conformément au paragraphe 24(1) du Règlement.
English: It being 2.30 p.m., the House stands adjourned until Monday next at 11 a.m., pursuant to Standing Order 24(1).
• Useful for Machine Translation
Morphological Tools
• PC-KIMMO
– A two-level morphological parser
• Porter Stemmer
• Penn Treebank Tokenizer
– Separates a document into words
– “dog?” -> “dog ?”
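The tokenizer's job is mostly to pull punctuation off the words it is attached to. The sketch below is a deliberately rough approximation built on two regular expressions, not the actual Penn Treebank tokenizer. That tokenizer has many more rules, for example for quotes and contractions such as "n't". This version splits off ?, ! and ; everywhere, commas and colons that are not followed by a digit, and a sentence-final period.

```python
import re


def tokenize(text):
    """Rough Treebank-style tokenization: pad some punctuation with spaces, then split."""
    # Split off ? ! ; anywhere, and , : unless a digit follows (keeps "1,000" whole).
    text = re.sub(r"([?!;]|[,:](?!\d))", r" \1 ", text)
    # Split off only a final period, so abbreviations like "p.m." stay intact mid-sentence.
    text = re.sub(r"\.\s*$", " .", text)
    return text.split()


print(" ".join(tokenize("Who is he?")))
print(tokenize("dog?"))
```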
Porter Stemmer
• Simple algorithm that uses a set of cascaded rewrite rules
– Example: ATIONAL -> ATE (relational -> relate)
• Stem
– The main morpheme of the word, supplying the main meaning
• Fast
• Used very widely in Information Retrieval
– Run the stemmer on both the query keywords and the words in the documents
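Below is a partial sketch of the cascade. It runs Porter's step 1a (plural suffixes), then a handful of step 2 rules, including ATIONAL -> ATE. In Porter's algorithm, each step 2 rule applies only when the remaining stem has measure m > 0, where m counts vowel-consonant sequences. Within a step, only the rule with the longest matching suffix is considered. Steps 1b, 1c and 3 to 5 are omitted, so this is not a complete Porter stemmer. The function names are our own.

```python
def is_consonant(word, i):
    """Porter's definition: any letter other than a, e, i, o, u, and other
    than a 'y' that follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m: the number of vowel-consonant sequences in [C](VC)^m[V]."""
    m = 0
    previous_was_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if previous_was_vowel:
                m += 1
            previous_was_vowel = False
        else:
            previous_was_vowel = True
    return m


# Step 1a: plural suffixes, with no condition on the stem.
STEP_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
# A subset of step 2; each rule fires only when the remaining stem has m > 0.
STEP_2 = [("ational", "ate"), ("tional", "tion"), ("izer", "ize"),
          ("ization", "ize"), ("iveness", "ive"), ("fulness", "ful")]


def apply_step(word, rules, condition):
    """Apply the rule with the longest matching suffix if `condition(stem)` holds.
    At most one rule per step is tried, even when its condition fails."""
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            return stem + replacement if condition(stem) else word
    return word


def stem(word):
    """Cascade: the output of step 1a is the input of step 2."""
    word = apply_step(word.lower(), STEP_1A, lambda s: True)
    return apply_step(word, STEP_2, lambda s: measure(s) > 0)


for w in ["relational", "rational", "conditional", "caresses", "ponies"]:
    print(w, "->", stem(w))
```

"Rational" is left unchanged because its stem "r" has m = 0. This is how the measure condition keeps the rules from over-stripping short words.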
Part-Of-Speech (POS) Taggers
• Part-Of-Speech: noun, verb, pronoun, etc.
• Brill’s Tagger
• HMM Tagger
• MXPOST
Brill’s Tagger
• Transformation-Based Learning (TBL) tagger
• /projects/nlp/brill-pos-tagger
• First labels every word with its most likely tag
• Then uses learned TBL rules to correct mistakes
– Example: change NN to VB when the previous tag is TO
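Below is a minimal sketch of the two stages. First, every word gets its most likely tag. Then each transformation rule is applied across the sentence. The lexicon is hypothetical; in Brill's tagger it would come from a tagged training corpus. Unknown words simply default to NN here, which is a simplification. The only rule is the one from the slide.

```python
# Hypothetical "most likely tag" lexicon (normally counted from a tagged corpus).
MOST_LIKELY_TAG = {
    "is": "VBZ", "expected": "VBN", "to": "TO", "race": "NN", "tomorrow": "NN",
}

# Learned transformations: (from_tag, to_tag, required previous tag).
RULES = [("NN", "VB", "TO")]


def initial_tags(words):
    """Stage 1: give every word its most likely tag (NN if the word is unknown)."""
    return [MOST_LIKELY_TAG.get(w.lower(), "NN") for w in words]


def apply_rules(tags, rules):
    """Stage 2: apply each rule in order, scanning the sentence left to right."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


words = "is expected to race tomorrow".split()
print(list(zip(words, apply_rules(initial_tags(words), RULES))))
```

Here "race" starts out as NN, its most frequent tag, and the rule corrects it to VB because it follows "to"/TO.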
HMM Tagger
• Also called a Maximum Likelihood Tagger
• Xerox PARC's HMM tagger: ftp://parcftp.xerox.com/pub/tagger/
• Chooses the tag sequence with the maximum probability given the words seen
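The maximum-probability tag sequence is usually found with the Viterbi algorithm. P(words) is the same for every candidate sequence, so maximizing P(tags | words) is equivalent to maximizing P(tags) × P(words | tags). A bigram HMM factors that product into start, transition and emission probabilities. The three-tag model below is hypothetical, and every number in it is invented; it is not the Xerox tagger's model. Real taggers also work with log probabilities and smooth the probabilities of unseen words.

```python
# A hypothetical toy HMM; every number below is invented for illustration.
TAGS = ["TO", "NN", "VB"]
START = {"TO": 0.4, "NN": 0.4, "VB": 0.2}            # P(first tag)
TRANS = {                                            # P(tag | previous tag)
    "TO": {"TO": 0.01, "NN": 0.19, "VB": 0.80},
    "NN": {"TO": 0.30, "NN": 0.40, "VB": 0.30},
    "VB": {"TO": 0.30, "NN": 0.50, "VB": 0.20},
}
EMIT = {                                             # P(word | tag)
    "TO": {"to": 1.0},
    "NN": {"race": 0.001, "tomorrow": 0.002},
    "VB": {"race": 0.0003},
}


def viterbi(words):
    """Return the tag sequence maximizing P(tags) * P(words | tags)."""
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]                                      # back[i][t]: best previous tag
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] * TRANS[p][t])
            best[i][t] = best[i - 1][prev] * TRANS[prev][t] * EMIT[t].get(words[i], 0.0)
            back[i][t] = prev
    tag = max(TAGS, key=lambda t: best[-1][t])       # best final tag
    path = [tag]
    for i in range(len(words) - 1, 0, -1):           # follow the back-pointers
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["to", "race", "tomorrow"]))
```

With these numbers, "race" is emitted more readily by NN than by VB. However, the strong TO -> VB transition outweighs that, so the tagger picks VB.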
MXPOST: Maximum Entropy POS Tagger
• The Maximum Entropy model is a framework that integrates many information sources (called features) for classification
• Each candidate tag is a class
• Given the features of a word (the surrounding words, its morphological features, the surrounding tags, etc.), decide which class it belongs to
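A maximum entropy model scores each class t for a context h as p(t | h) = exp(Σ_j λ_j f_j(h, t)) / Z(h). Each f_j is a binary feature, each λ_j is a learned weight, and Z(h) normalizes over all candidate tags. The sketch below computes this distribution for a single word. The feature set is a small assumption (the word, its last two letters and the previous tag), and the weights are invented. MXPOST itself uses a richer feature set, weights learned from a corpus, and a search over the whole sentence.

```python
import math

# Hypothetical weights (lambdas) for (feature, tag) pairs; a real model
# learns many thousands of these from a tagged corpus.
WEIGHTS = {
    ("word=race", "NN"): 1.0,
    ("word=race", "VB"): 0.9,
    ("suffix=ce", "NN"): 0.8,
    ("prev_tag=TO", "NN"): 0.3,
    ("prev_tag=TO", "VB"): 2.0,
}
TAGS = ["NN", "VB", "JJ"]   # each candidate tag is a class


def features(words, i, prev_tag):
    """Assumed feature set: the word, its last two letters, and the previous tag."""
    return [f"word={words[i]}", f"suffix={words[i][-2:]}", f"prev_tag={prev_tag}"]


def tag_distribution(active):
    """p(tag | context) = exp(sum of weights of the active features) / Z."""
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in active) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}


dist = tag_distribution(features(["to", "race"], 1, "TO"))
best = max(dist, key=dist.get)
print(best, {t: round(p, 3) for t, p in dist.items()})
```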
Syntactic Parsers
• Collins’ Parser
• XTAG
• MXPOST: Maximum Entropy Parser
Collins’ Parser
• Context-free grammar
• Uses frequencies to resolve ambiguities
• To get some idea of this parser
– Web-based chart parser
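Collins' parser is a lexicalized statistical model and is considerably richer than a plain grammar. The sketch below shows only the basic idea behind "use frequencies to resolve ambiguities." It uses a probabilistic CFG, where each rule carries a probability (as if estimated from treebank counts) and a tree's probability is the product of its rules. The example scores two parses of a PP-attachment ambiguity. The rule probabilities are invented, and this is not Collins' model.

```python
from math import prod

# Hypothetical rule probabilities P(rhs | lhs). Lexical rules are omitted
# because both parses below use the same words and tags.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("N",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
}

# "Kris saw the man with a telescope". Trees are (label, [children]);
# a plain string child is a part-of-speech tag standing for a word.
VP_ATTACH = ("S", [("NP", ["N"]),
                   ("VP", ["V",
                           ("NP", ["Det", "N"]),
                           ("PP", ["P", ("NP", ["Det", "N"])])])])
NP_ATTACH = ("S", [("NP", ["N"]),
                   ("VP", ["V",
                           ("NP", [("NP", ["Det", "N"]),
                                   ("PP", ["P", ("NP", ["Det", "N"])])])])])


def rules_of(tree):
    """Yield (lhs, rhs) for every rule used in the tree."""
    label, children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules_of(child)


def tree_prob(tree):
    """Probability of a tree under a PCFG: the product of its rule probabilities."""
    return prod(RULE_PROB[rule] for rule in rules_of(tree))


best = max([VP_ATTACH, NP_ATTACH], key=tree_prob)
print(round(tree_prob(VP_ATTACH), 4), round(tree_prob(NP_ATTACH), 4), best is VP_ATTACH)
```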
XTAG
• An on-going project to develop a wide-coverage grammar for English
• Uses a lexicalized Tree Adjoining Grammar (TAG) formalism
– A mildly context-sensitive grammar formalism
• Consists of a parser, an X-windows grammar development interface and a morphological analyzer
• /projects/nlp/xtag/
XTAG Cont’d
Semantic Knowledge Bases and Semantic Parser
• Analyze what a sentence says
• WordNet
• Penn Treebank
• Web-based Semantic Parser
WordNet
• Represents lexical relations
• Useful in word sense disambiguation
Penn Treebank
Predicate: fool(Kris)
Semantic Parser
• A web-based chart parser enriched with semantic constraints
• Example:
– Input: My dog has fleas.
– Output: has(my(dog),fleas)
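The output has(my(dog),fleas) can be reproduced by composing meanings bottom-up over a parse tree. In this sketch both the parse and the three composition rules are assumptions written by hand to cover only this sentence's shapes. It does not show how the web-based semantic parser actually works. A possessive plus a noun becomes possessive(noun), a verb phrase waits for its subject, and the sentence node builds verb(subject,object).

```python
# A hand-written parse of "My dog has fleas." as (label, children) tuples;
# leaves are (tag, word) pairs.
TREE = ("S", [
    ("NP", [("PRP$", "my"), ("NN", "dog")]),
    ("VP", [("VBZ", "has"), ("NP", [("NNS", "fleas")])]),
])


def meaning(node):
    """Compose a predicate-argument term bottom-up (hypothetical rules)."""
    label, children = node
    if isinstance(children, str):          # a word means itself
        return children
    parts = [meaning(child) for child in children]
    if label == "NP":                      # modifier + noun -> modifier(noun)
        return parts[0] if len(parts) == 1 else f"{parts[0]}({parts[1]})"
    if label == "VP":                      # keep verb and object until the subject is known
        return parts
    if label == "S":                       # verb(subject,object)
        subject, (verb, obj) = parts
        return f"{verb}({subject},{obj})"
    raise ValueError(f"no composition rule for {label}")


print(meaning(TREE))
```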
Speech Tools
• ISIP
• EPOS
• CSLU Toolkit
ISIP
• ISIP (Institute for Signal and Information Processing) public domain speech recognition system
• Open research software
• Online courses, tutorials, dictionaries, databases
• Build your own speech recognition system
EPOS
• A language-independent, rule-driven Text-to-Speech (TTS) system
• Supports several main speech generation algorithms
CSLU Toolkit
• Basic framework and tools for people to build, investigate and use interactive language systems
• Speech recognition, natural language understanding, speech synthesis and facial animation technologies
• Easy to use; has spread from higher education into homes
Thanks!