LING 388: Language and Computers
Sandiway Fong
Lecture 9: 9/21

Administrivia
• Homework 2 Review
• Homework 3
– out today
– you’ll need file WSJ9_040.txt from the course homepage
– due next Thursday

Homework Question 1
(6pts) Give the complete (i.e. all answers) step-by-step computation tree for
?- mem([1,2,3],X).
given the database
mem([X|_],X).
mem([_|L],X) :- mem(L,X).
• Hint:
– using ?- trace. will help, but you will need to list out the matches and variable bindings at each step
– see Lecture 5 slides for app/3 to see what format you should use

Homework Question 1
mem([X|_],X).
mem([_|L],X) :- mem(L,X).
• ?- mem([1,2,3],X).
  – match base case mem([X'|_],X') when X'=1 and X=X'
    Answer: X=1
  – match recursive case mem([_|L],X') when [2,3]=L and X=X'
    • ?- mem([2,3],X').
      – match base case mem([X''|_],X'') when X''=2 and X'=X''
        Answer: X=2
      – match recursive case mem([_|L'],X'') when [3]=L' and X'=X''
        • ?- mem([3],X'').
          – match base case mem([X'''|_],X''') when X'''=3 and X''=X'''
            Answer: X=3
          – match recursive case mem([_|L''],X''') when []=L'' and X''=X'''
            • ?- mem([],X''').
              No match
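
The three answers in the tree can be checked mechanically. The following is a minimal sketch, assuming SWI-Prolog; all_members/2 is a hypothetical helper name introduced here for illustration and is not part of the homework. It collects every answer to mem/2 in the order in which Prolog finds them.

```prolog
% mem/2 exactly as given in the homework database.
mem([X|_], X).
mem([_|L], X) :- mem(L, X).

% all_members/2 (hypothetical helper, not from the lecture):
% gathers every answer to mem(List, X), in the order Prolog finds them.
all_members(List, Answers) :-
    findall(X, mem(List, X), Answers).
```

With this loaded, the query ?- all_members([1,2,3], A). should give A = [1,2,3], matching the order X=1, X=2, X=3 in the tree; the final branch ?- mem([],X). contributes nothing because neither clause matches the empty list.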

Homework Question 2
• Database
addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
modal(should).   % "should is a modal"
modal(would).    % "would is a modal"
modal(could).    % "could is a modal"
modal(may).      % "may is a modal"
• (4pts) Modify the definition of addNT/2 to accept only modals
• Demonstrate your program works correctly for:
?- addNT(should,X).
?- addNT(would,X).
?- addNT(john,X).
• Submit both your program and queries as your answer
– put everything together, not in separate files!

Homework Question 2
• Idea:
– make addNT/2 be true only if word W is a modal
– i.e. call modal(W) as a sub-query of addNT/2
• Revised definition of addNT/2
addNT(W,Wnt) :-
    modal(W),
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).

Homework Question 2
• (4pts) Further modify your definition of addNT/2 to exclude the ungrammatical case:
– should → shouldn’t
– would → wouldn’t
– could → couldn’t
– may → *mayn’t
– i.e.
?- addNT(may,X).
No
• Idea:
– make sure W cannot be may
– i.e. call \+ W = may as a sub-query of addNT/2
• Revised definition of addNT/2
addNT(W,Wnt) :-
    \+ W = may,
    modal(W),
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
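
One consequence of placing \+ W = may first: if W is unbound when addNT/2 is called, W = may succeeds, so the negation fails and the query has no answers. The sketch below, assuming SWI-Prolog, shows one possible reordering that calls modal(W) before the test, so the predicate can also enumerate the modals. The name addNT_enum/2 is hypothetical and is not the lecture's solution.

```prolog
:- use_module(library(lists), [append/3]).

modal(should).
modal(would).
modal(could).
modal(may).

% addNT_enum/2 (hypothetical variant): modal(W) runs first, so W is
% bound by the time it is compared with may.
addNT_enum(W, Wnt) :-
    modal(W),                      % binds W if it was unbound
    W \= may,                      % exclude the ungrammatical *mayn't
    atom_chars(W, L),
    append(L, [n, '\'', t], Lnt),
    atom_chars(Wnt, Lnt).
```

Under these assumptions, ?- addNT_enum(W, X). should enumerate should, would and could with their n’t forms, while ?- addNT_enum(may, X). and ?- addNT_enum(john, X). should both fail.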

Homework Question 2
• (6pts) Extra Credit Question
• Notice that the following query doesn’t work:
?- addNT(X,'shouldn\'t').
ERROR: atom_chars/2: Arguments are not sufficiently instantiated
• Write the corresponding “subtract n’t” rule, call it subNT/2, for removing the n’t suffix:
?- subNT(X,'shouldn\'t').
X = should

Homework Question 2
• Original definition
addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
• Query
?- addNT(should,X).
instantiates W = should
• Definition becomes
addNT(should,Wnt) :-
    atom_chars(should,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
• Query
?- addNT(X,'shouldn\'t').
instantiates Wnt = 'shouldn\'t'
• Definition becomes
addNT(W,'shouldn\'t') :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars('shouldn\'t',Lnt).
• Problem! The first sub-goal atom_chars(W,L) is called with both W and L unbound; atom_chars cannot operate without either an atom or a list supplied

Homework Question 2
• Original definition
addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
• Reverse the order of the sub-goals in the original definition
subNT(W,Wnt) :-
    atom_chars(Wnt,Lnt),
    append(L,[n,'\'',t],Lnt),
    atom_chars(W,L).
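
Since addNT/2 needs its first argument bound and subNT/2 needs its second, the two can be wrapped in a single predicate that picks a direction by checking which argument is instantiated. This is only a sketch, assuming SWI-Prolog; nt/2 is a hypothetical name that does not appear in the lecture.

```prolog
:- use_module(library(lists), [append/3]).

% addNT/2 and subNT/2 as defined on the slides.
addNT(W, Wnt) :-
    atom_chars(W, L),
    append(L, [n, '\'', t], Lnt),
    atom_chars(Wnt, Lnt).

subNT(W, Wnt) :-
    atom_chars(Wnt, Lnt),
    append(L, [n, '\'', t], Lnt),
    atom_chars(W, L).

% nt/2 (hypothetical wrapper): use addNT/2 when the base word is known,
% otherwise fall back to subNT/2, which needs the n't form to be bound.
nt(W, Wnt) :-
    (   nonvar(W)
    ->  addNT(W, Wnt)
    ;   subNT(W, Wnt)
    ).
```

For example, ?- nt(should, X). should give X = 'shouldn\'t' and ?- nt(X, 'couldn\'t'). should give X = could. If both arguments are unbound, the call still raises an instantiation error, because subNT/2 then calls atom_chars/2 with nothing supplied.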

Homework Question 3
(6pts) Define a predicate pallindrome/1 that is true when a word can be spelt the same forwards or backwards
• Definition
reverse([],[]).
reverse([X|L],R) :-
    reverse(L,LR),
    append(LR,[X],R).
• Examples:
– radar
– redivider
– abba
• Idea:
– [r,a,d,a,r] reversed is [r,a,d,a,r], i.e. the same list!
pallindrome(W) :-
    atom_chars(W,L),
    reverse(L,L).
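
The definition can be tried on the example words as follows. This is a minimal sketch, assuming SWI-Prolog; pallindromes_in/2 is a hypothetical helper added here for illustration, and the import list is restricted so that the local reverse/2 from the slide does not clash with the library version.

```prolog
:- use_module(library(lists), [append/3, member/2]).

% reverse/2 and pallindrome/1 as defined on the slide.
reverse([], []).
reverse([X|L], R) :-
    reverse(L, LR),
    append(LR, [X], R).

pallindrome(W) :-
    atom_chars(W, L),
    reverse(L, L).

% pallindromes_in/2 (hypothetical helper): keeps only those words of a
% list for which pallindrome/1 succeeds.
pallindromes_in(Words, Ps) :-
    findall(W, (member(W, Words), pallindrome(W)), Ps).
```

For example, ?- pallindromes_in([radar, prolog, abba, redivider], Ps). should give Ps = [radar, abba, redivider].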

Homework 3

Data File
• please use a computer with Microsoft Word for this homework
– machines in SBS RI Computer Lab 224 (or any other lab) can be used
• in Microsoft Word
– load file WSJ9_040.txt from the course homepage
• Wall Street Journal articles (July 28th–August 1st 1989)
• this is the text file you will use for searching
• contains almost 14,500 lines

Last Time
• introduced the notion of a regular expression
– pattern matching
– important in document searching
• varieties
– grep
  • “global regular expression print”
– Microsoft Word’s Find with wildcard
  • somewhat limited form of regular expression search

Microsoft Word’s Find
• basic wildcards
– ? and *
  • ? any single character
  • * zero or more characters
– @
  • one or more of the preceding character
– < >
  • < beginning of a word
  • > end of a word
– [ ]
  • range of characters
  • e.g. [aeiou], [a-z], [A-z], [0-9]
– more wildcards can be found in the help documentation

Ordinals
• n-th
– expression
  • [0-9]@th>
  • “one or more occurrences of a character in the range 0 to 9, followed by th and the word boundary”
– finds
  • In 4th quarter
  • 17th-largest
  • 17th-largest (part of above string)
  • its 100,000th case (matches 3 times)

Ordinals
• 1st
– expression
  • 1st>
– finds
  • 21st anniversary concert
  • 1st American
• 2nd
– expression
  • 2nd>
– finds
  • 2nd-Period
• 3rd
– expression
  • 3rd>
• combining expressions
– [23][nr]d>
  • “2 or 3, followed by n or r, followed by d, and a word boundary”
  • works since 2rd and 3nd won’t be present
– [123][snr][td]>
  • 1st, 2nd, and 3rd
– [0-9]@[tsnr][htd]>
  • “one or more occurrences of a character in the range 0 to 9, followed by one of t, s, n, or r, followed by one of h, t, or d and the word boundary”
  • 1st, 2nd, 3rd, and 4th and so on...
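
The logic of the combined pattern can also be written out in Prolog, using the same atom_chars/append style as the Homework 2 review. This is only a rough sketch, assuming SWI-Prolog: ordinal/1 and digit_char/1 are hypothetical names, and the predicate tests a whole token rather than searching inside text, so it is not equivalent to Word’s Find.

```prolog
:- use_module(library(lists)).

% digit_char/1 (hypothetical helper): true for the characters '0'..'9'.
digit_char(C) :-
    memberchk(C, ['0','1','2','3','4','5','6','7','8','9']).

% ordinal/1 (hypothetical): mirrors [0-9]@[tsnr][htd] applied to a
% whole token: one or more digits, then one of t/s/n/r, then one of h/t/d.
ordinal(Token) :-
    atom_chars(Token, Cs),
    append(Digits, [A, B], Cs),          % split off the last two characters
    Digits = [_|_],                      % "@": at least one digit
    forall(member(D, Digits), digit_char(D)),
    memberchk(A, [t, s, n, r]),
    memberchk(B, [h, t, d]).
```

Under these assumptions, ?- ordinal('17th'). and ?- ordinal('21st'). succeed and ?- ordinal(th). fails. Like the Word expression, the sketch overgenerates: ?- ordinal('3sd'). also succeeds, which is harmless only because such strings are not expected in the text. Unlike Word’s Find, ?- ordinal('100,000th'). fails, since the comma makes the whole token a non-match.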

Exercise 1
• Since this is the Wall Street Journal
• Try to find occurrences of $X millions
• Regular Expression
– $[0-9]@ millions
– You need to use Find
– Turn on Wildcard searching

Homework Exercise 1
• Numbers (3pts)
– Question 1 (2pts)
  • Give a single Microsoft Word regular expression for finding occurrences of $number million
  • where number is not a whole number of millions, e.g.
    – $25.3 million
    – $826.7 million
    – $3.1 million
– Question 2 (1pt)
  • How many are there of this kind in the document?

Homework Exercise 2
• Words (5pts)
– Question 1 (2pts)
  • Give a single Microsoft Word regular expression for finding occurrences of the following terms
    – spokesman
    – spokesmen
    – spokeswoman
    – spokeswomen
– Question 2 (1pt)
  • How many are there in the document?
– Question 3 (1pt)
  • What does your expression assume about words in English?
– Question 4 (1pt)
  • Why is the answer different from a simple spokes* search?

Homework Exercise 3
• Document Structure (8pts)
• Many articles contain a header line naming their author(s), e.g.
– @  By Carrie Dolan
– @  By Ron Winslow and Michael Waldholz
– @  By Jeffrey H. Birnbaum
– Note: @ and By are separated by exactly 2 spaces
– Question 1 (2pts)
  • Give a Microsoft Word regular expression to find header lines with first authors who use a middle initial
– Question 2 (1pt)
  • How many such articles are there in the document?

Homework Exercise 3
• Document Structure
– Limitations of Microsoft Word
  • Task: find articles authored by two people, e.g.
  • @  By Ron Winslow and Michael Waldholz
  • Expression: \@  By*<and>
  • careful with that *!
  • not the results we want!

Homework Exercise 3
• Document Structure
– Question 3 (4pts)
  • Devise multiple regular expressions to find two-author header lines
  • @  By Ron Winslow and Michael Waldholz
  • @  By S. Karene Witcher and Jeffrey A. Trachtenberg
  • HINT: to overcome Microsoft Word’s behavior you will have to break up the search into multiple cases
– Question 4 (1pt)
  • How many such header lines are there in the document?

Summary
• Total: 16 pts
– Exercise 1: Numbers (3pts)
– Exercise 2: Words (5pts)
– Exercise 3: Document structure (8pts)