
LING 388: Language and Computers

Sandiway Fong

Lecture 9: 9/21


Administrivia

• Homework 2 Review

• Homework 3
– out today
– you’ll need the file WSJ9_040.txt from the course homepage
– due next Thursday


Homework Question 1

(6pts) Give the complete (i.e. all answers) step-by-step computation tree for

?- mem([1,2,3],X).

given the database

mem([X|_],X).
mem([_|L],X) :-
    mem(L,X).

• Hint: running ?- trace. will help, but you will need to list out the matches and variable bindings at each step
– see the Lecture 5 slides for app/3 for the format you should use


Homework Question 1

Database:

mem([X|_],X).
mem([_|L],X) :- mem(L,X).

?- mem([1,2,3],X).
• match base case mem([X'|_],X') when X'=1 and X=X'
  Answer: X=1
• match recursive case mem([_|L],X') when L=[2,3] and X=X'
  ?- mem([2,3],X').
  • match base case mem([X''|_],X'') when X''=2 and X'=X''
    Answer: X=2
  • match recursive case mem([_|L'],X'') when L'=[3] and X'=X''
    ?- mem([3],X'').
    • match base case mem([X'''|_],X''') when X'''=3 and X''=X'''
      Answer: X=3
    • match recursive case mem([_|L''],X''') when L''=[] and X''=X'''
      ?- mem([],X''').
      No match: neither clause matches the empty list, so there are no more answers
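The tree can be checked by loading the database and collecting every answer with the standard findall/3; the answers come back in the same left-to-right order as the branches above. A minimal sketch (all_members/2 is a hypothetical helper, not part of the homework):

```prolog
% mem(?List, ?X): X is an element of List (the Homework Question 1 database).
mem([X|_], X).          % base case: X is the head of the list
mem([_|L], X) :-        % recursive case: skip the head ...
    mem(L, X).          % ... and look for X in the tail

% all_members(+List, -Xs): hypothetical helper that collects every answer
% of mem(List, X), in the order Prolog finds them by backtracking.
all_members(List, Xs) :-
    findall(X, mem(List, X), Xs).
```

?- all_members([1,2,3],Xs). gives Xs = [1,2,3], one element per Answer node, and ?- mem([],X). fails, matching the final No match node.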


Homework Question 2

• Database

addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).

modal(should).    % "should is a modal"
modal(would).     % "would is a modal"
modal(could).     % "could is a modal"
modal(may).       % "may is a modal"

• (4pts) Modify the definition of addNT/2 to accept only modals

• Demonstrate that your program works correctly for:
– ?- addNT(should,X).
– ?- addNT(would,X).
– ?- addNT(john,X).

• Submit both your program and your queries as your answer
– put everything together, not in separate files!


Homework Question 2

• Database

addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).

modal(should).    % "should is a modal"
modal(would).     % "would is a modal"
modal(could).     % "could is a modal"
modal(may).       % "may is a modal"

• Idea:
– make addNT/2 true only if word W is a modal
– i.e. call modal(W) as a sub-query of addNT/2

• Revised definition of addNT/2

addNT(W,Wnt) :-
    modal(W),
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
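Putting the pieces together, a complete file for this answer might look like the following sketch. The glosses on the modal/1 facts become Prolog comments so the file loads as-is; the comments trace what each sub-goal does for W = should.

```prolog
% addNT(+W, -Wnt): Wnt is W with n't attached, provided W is a modal.
addNT(W, Wnt) :-
    modal(W),                       % sub-query: fails for non-modals
    atom_chars(W, L),               % should -> [s,h,o,u,l,d]
    append(L, [n,'\'',t], Lnt),     % append the characters n, ', t
    atom_chars(Wnt, Lnt).           % character list -> atom shouldn't

modal(should).    % "should is a modal"
modal(would).     % "would is a modal"
modal(could).     % "could is a modal"
modal(may).       % "may is a modal"
```

?- addNT(should,X). binds X to the atom shouldn't, ?- addNT(would,X). binds it to wouldn't, and ?- addNT(john,X). fails because there is no fact modal(john).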


Homework Question 2

• (4pts) Further modify your definition of addNT/2 to exclude the ungrammatical case:
– should → shouldn’t
– would → wouldn’t
– could → couldn’t
– may → *mayn’t
– i.e. ?- addNT(may,X). should answer No

• Idea:
– make sure W cannot be may
– i.e. call \+ W = may as a sub-query of addNT/2

• Revised definition of addNT/2

addNT(W,Wnt) :-
    \+ W = may,
    modal(W),
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).
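The order of the first two sub-goals matters once W is unbound. If W is a variable, W = may succeeds, so \+ W = may fails and ?- addNT(W,Wnt). returns no answers at all. A possible variant (a suggestion, not the required homework answer) runs modal(W) first, so W is always bound before it is compared with may:

```prolog
% Variant of the revised addNT/2 with the two checks swapped.
addNT(W, Wnt) :-
    modal(W),                       % bind (or check) W against the modal facts
    \+ W = may,                     % reject the ungrammatical *mayn't
    atom_chars(W, L),
    append(L, [n,'\'',t], Lnt),
    atom_chars(Wnt, Lnt).

modal(should).
modal(would).
modal(could).
modal(may).
```

With this ordering ?- addNT(may,X). still fails, and findall(W, addNT(W,_), Ws) gives Ws = [should,would,could]; with the slide's ordering the same findall/3 query gives [].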


Homework Question 2

• (6pts) Extra Credit Question

• Notice that the following query doesn’t work:

?- addNT(X,'shouldn\'t').
ERROR: atom_chars/2: Arguments are not sufficiently instantiated

• Write the corresponding “subtract n’t” rule, call it subNT/2, for removing the n’t suffix:

?- subNT(X,'shouldn\'t').
X = should


Homework Question 2

• Original definition

addNT(W,Wnt) :-
    atom_chars(W,L), append(L,[n,'\'',t],Lnt), atom_chars(Wnt,Lnt).

• Query ?- addNT(should,X). instantiates W = should

• Definition becomes

addNT(should,Wnt) :-
    atom_chars(should,L), append(L,[n,'\'',t],Lnt), atom_chars(Wnt,Lnt).

• Query ?- addNT(X,'shouldn\'t'). instantiates Wnt = 'shouldn\'t'

• Definition becomes

addNT(W,'shouldn\'t') :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars('shouldn\'t',Lnt).

Problem! The first sub-goal, atom_chars(W,L), is called with both W and L unbound, and atom_chars/2 cannot operate unless either the atom or the character list is supplied.


Homework Question 2

• Original definition

addNT(W,Wnt) :-
    atom_chars(W,L),
    append(L,[n,'\'',t],Lnt),
    atom_chars(Wnt,Lnt).

• Reverse the order of the sub-goals in the original definition

subNT(W,Wnt) :-
    atom_chars(Wnt,Lnt),
    append(L,[n,'\'',t],Lnt),
    atom_chars(W,L).
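Both the failure and the fix can be checked directly. The sketch below loads the original addNT/2 (without the modal check) next to subNT/2; reverse_mode_error/1 is a hypothetical helper that uses the standard catch/3 to capture the error raised when addNT/2 is run backwards.

```prolog
% Original addNT/2, kept here only to show the failing direction.
addNT(W, Wnt) :-
    atom_chars(W, L),
    append(L, [n,'\'',t], Lnt),
    atom_chars(Wnt, Lnt).

% subNT(-W, +Wnt): the same sub-goals in reverse order, so every call
% to atom_chars/2 has at least one argument instantiated.
subNT(W, Wnt) :-
    atom_chars(Wnt, Lnt),           % shouldn't -> [s,h,o,u,l,d,n,'\'',t]
    append(L, [n,'\'',t], Lnt),     % L is whatever precedes n't
    atom_chars(W, L).               % [s,h,o,u,l,d] -> should

% reverse_mode_error(-E): hypothetical helper; E is the error term thrown
% by calling addNT/2 with only its second argument instantiated.
reverse_mode_error(E) :-
    catch(addNT(_, 'shouldn\'t'), E, true).
```

?- subNT(X,'shouldn\'t'). gives X = should, and ?- subNT(X,john). fails because john does not end in n't. reverse_mode_error(E) binds E to a term of the form error(instantiation_error,_), which is the error the slide reports as "Arguments are not sufficiently instantiated".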



Homework Question 3

(6pts) Define a predicate pallindrome/1 that is true when a word can be spelt the same forwards or backwards

• Definition

reverse([],[]).
reverse([X|L],R) :-
    reverse(L,LR),
    append(LR,[X],R).

• Examples:
– radar
– redivider
– abba

• Idea:
– [r,a,d,a,r] reversed is [r,a,d,a,r]
– i.e. the same list!

pallindrome(W) :-
    atom_chars(W,L),
    reverse(L,L).
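A runnable version of this answer, as a sketch. The only change is that the slide's reverse/2 is renamed nrev/2 (a hypothetical name), because many Prolog systems, SWI-Prolog included, already provide a reverse/2 in their list library; the logic is unchanged.

```prolog
% nrev(+List, ?Reversed): naive reverse, identical to the slide's reverse/2.
nrev([], []).
nrev([X|L], R) :-
    nrev(L, LR),                    % reverse the tail first
    append(LR, [X], R).             % then put the old head at the end

% pallindrome(+W): the characters of W read the same in both directions,
% i.e. the character list equals its own reversal.
pallindrome(W) :-
    atom_chars(W, L),
    nrev(L, L).
```

radar, redivider and abba all succeed; should fails because [s,h,o,u,l,d] reversed is [d,l,u,o,h,s].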


Homework 3


Data File

• Please use a computer with Microsoft Word for this homework
– machines in SBS RI Computer Lab 224 (or any other lab) can be used

• In Microsoft Word, load the file WSJ9_040.txt from the course homepage

• Wall Street Journal articles (July 28th–August 1st 1989)

• This is the text file you will use for searching

• Contains almost 14,500 lines


Last Time

• Introduced the notion of a regular expression
– pattern matching
– important in document searching

• Varieties
– grep: “global regular expression print”
– Microsoft Word’s Find with wildcards: a somewhat limited form of regular expression search


Microsoft Word’s Find

• Basic wildcards: ? and *
– ? any single character
– * zero or more characters

• @
– one or more of the preceding character

• < >
– < beginning of a word
– > end of a word

• [ ]
– range of characters, e.g. [aeiou], [a-z], [A-z], [0-9]

• More wildcards can be found in the help documentation


Ordinals

• n-th
– expression: [0-9]@th>
– “one or more occurrences of a character in the range 0 to 9, followed by th and the word boundary”
– finds:
  • In 4th quarter
  • 17th-largest
  • 17th-largest (part of above string)
  • its 100,000th case (matches 3 times)
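Word's Find does the matching itself, but it can help to spell out what the pattern demands. The sketch below is a Prolog illustration (not how Word works internally) of [0-9]@th> applied to one whole token; nth_ordinal/1 and all_digits/1 are hypothetical names, and unlike Find the sketch does not search inside running text.

```prolog
% nth_ordinal(+Token): Token is one or more digits followed by "th",
% mirroring [0-9]@th> when the pattern must cover the whole token.
nth_ordinal(Token) :-
    atom_chars(Token, Cs),
    append(Digits, [t,h], Cs),      % split off the final "th"
    Digits \== [],                  % @ : at least one digit is required
    all_digits(Digits).             % [0-9] : each one is a digit

% all_digits(+Cs): every character is in the range 0 to 9.
all_digits([]).
all_digits([C|Cs]) :-
    char_code(C, Code),
    Code >= 0'0, Code =< 0'9,
    all_digits(Cs).
```

?- nth_ordinal('17th'). succeeds, while ?- nth_ordinal(th). fails because @ needs at least one digit. A whole token such as '100,000th' is rejected here because of its comma, whereas Find, which searches within the text, still locates matches inside it.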


Ordinals

• 1st
– expression: 1st>
– finds: 21st anniversary concert, 1st American

• 2nd
– expression: 2nd>
– finds: 2nd-Period

• 3rd
– expression: 3rd>

• Combining expressions
– [23][nr]d>
  • “2 or 3, followed by n or r, followed by d, and a word boundary”
  • works since 2rd and 3nd won’t be present
– [123][snr][td]>
  • 1st, 2nd, and 3rd
– [0-9]@[tsnr][htd]>
  • “one or more occurrences of a character in the range 0 to 9, followed by one of t, s, n, or r, followed by one of h, t, or d, and the word boundary”
  • 1st, 2nd, 3rd, 4th, and so on...
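The note that these combined patterns work only because forms like 2rd and 3nd are absent can be made concrete. Extending the same per-token idea (again a hypothetical Prolog illustration, not Word's implementation) to [0-9]@[tsnr][htd] shows that each bracketed class is checked independently, so the pattern accepts every real ordinal suffix but also non-words.

```prolog
% ordinal_like(+Token): one or more digits, then one of t,s,n,r,
% then one of h,t,d, covering the whole token ([0-9]@[tsnr][htd]).
ordinal_like(Token) :-
    atom_chars(Token, Cs),
    append(Digits, [S1,S2], Cs),    % the last two characters form the suffix
    Digits \== [],                  % @ : at least one digit
    all_digits(Digits),
    memberchk(S1, [t,s,n,r]),       % [tsnr]
    memberchk(S2, [h,t,d]).         % [htd]

% all_digits(+Cs): every character is in the range 0 to 9.
all_digits([]).
all_digits([C|Cs]) :-
    char_code(C, Code),
    Code >= 0'0, Code =< 0'9,
    all_digits(Cs).
```

1st, 22nd, 103rd and 4th are all accepted, but so is 2rd, since nothing ties the suffix letters to the final digit; the pattern is safe only because such strings do not occur in the text.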


Exercise 1

• Since this is the Wall Street Journal, try to find occurrences of $X millions

• Regular expression
– $[0-9]@ millions
– you need to use Find with Wildcard searching turned on


Homework Exercise 1

• Numbers (3pts)
– Question 1 (2pts)
  • Give a single Microsoft Word regular expression for finding occurrences of $number million, where number is not a whole number of millions, e.g.
    $25.3 million
    $826.7 million
    $3.1 million
– Question 2 (1pt)
  • How many are there of this kind in the document?


Homework Exercise 2

• Words (5pts)
– Question 1 (2pts)
  • Give a single Microsoft Word regular expression for finding occurrences of the following terms: spokesman, spokesmen, spokeswoman, spokeswomen
– Question 2 (1pt)
  • How many are there in the document?
– Question 3 (1pt)
  • What does your expression assume about words in English?
– Question 4 (1pt)
  • Why is the answer different from a simple spokes* search?


Homework Exercise 3

• Document Structure (8pts)

• Many articles contain a header line naming their author(s), e.g.
– @  By Carrie Dolan
– @  By Ron Winslow and Michael Waldholz
– @  By Jeffrey H. Birnbaum
– Note: @ and By are separated by exactly 2 spaces

– Question 1 (2pts)
  • Give a Microsoft Word regular expression to find header lines whose first author uses a middle initial
– Question 2 (1pt)
  • How many such articles are there in the document?


Homework Exercise 3

• Document Structure
– Limitations of Microsoft Word

• Task: find articles authored by two people, e.g.
– @  By Ron Winslow and Michael Waldholz
– Expression: \@  By*<and>
– careful with that *! Not the results we want!


Homework Exercise 3

• Document Structure
– Question 3 (4pts)
  • Devise multiple regular expressions to find two-author header lines, e.g.
    @  By Ron Winslow and Michael Waldholz
    @  By S. Karene Witcher and Jeffrey A. Trachtenberg
  • HINT: to overcome Microsoft Word’s behavior, you will have to break up the search into multiple cases
– Question 4 (1pt)
  • How many such header lines are there in the document?


Summary

• Total: 16 pts
– Exercise 1: Numbers (3pts)
– Exercise 2: Words (5pts)
– Exercise 3: Document structure (8pts)