ishs04
Computational Approach To Recognizing Wordplay In Jokes
Julia M. Taylor & Lawrence J. Mazlack
Applied Artificial Intelligence Laboratory
University of Cincinnati
Introduction
Computational recognition of humor is difficult:
- Requires natural language understanding
- Requires world knowledge
This is an initial investigation into computational humor recognition using wordplay. The approach:
- Learns statistical patterns of text
- Recognizes utterances similar in pronunciation to a given word
- Determines if the found utterances transform a text into a joke
Subclass of Humor: Joke
A joke is a short humorous piece of literature in which the funniness culminates in the final sentence.
Most jokes have:
- Setup: the first part of the joke, which establishes certain expectations
- Punchline: a much shorter part of the joke that causes some form of conflict
  - Forces another interpretation
  - Violates an expectation
“Is the doctor home?” the patient asked in his bronchial whisper. “No,” the doctor’s young and pretty wife whispered in reply. “Come right in.”
Computational Humor
Computational humor generators, examples:
- Light bulb joke generator
- Joke generator that focuses on witticisms based around idioms
- Generator of humorous parodies of existing acronyms
- Generator of a humorous sentence based on an alphanumeric password
- Pun generators
Computational humor recognizer
Wordplay Jokes
Wordplay jokes depend on words that are similar in sound but have different meanings:
- Same pronunciation, same spelling
- Same pronunciation, different spelling
- Similar pronunciation, different spelling
The difference in meaning creates conflict or breaks a prediction.
Nurse: I need to get your weight today.
Impatient patient: Three hours and twelve minutes.
weight=wait
Statistical Language Recognition: N-grams
A model that uses conditional probability to predict the Nth word based on the N-1 previous words.
- Probabilities depend on the training corpus
- Example: find the word with the largest P(word | "is")
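The bigram case of this prediction can be sketched with conditional probabilities estimated from raw counts. The corpus string below is an invented toy example, not the authors' training data:

```python
from collections import Counter

def next_word_probs(corpus, prev):
    """Estimate P(word | prev) = count(prev word) / count(prev) from a corpus."""
    words = corpus.lower().split()
    pairs = Counter(zip(words, words[1:]))              # bigram counts
    total = sum(c for (a, _b), c in pairs.items() if a == prev)
    return {b: c / total for (a, b), c in pairs.items() if a == prev}

corpus = "the doctor is in the doctor is out the nurse is in"
probs = next_word_probs(corpus, "is")
best = max(probs, key=probs.get)  # the model's prediction for the word after "is"
```

With this corpus, "in" follows "is" twice and "out" once, so the model predicts "in" with probability 2/3, illustrating how the prediction depends entirely on the training corpus.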
A newspaper reporter goes around the world with his investigation. He stops people on the street and asks them: “Excuse me, what is your opinion of the meat shortage?” An American asks: “What is ‘shortage’?” A Russian asks: “What is ‘opinion’?” A Pole asks: “What is ‘meat’?” A New York taxi-driver asks: “What is ‘excuse me’?”
The Parisian Little Moritz is asked in school: “How many deciliters are there in a liter of milk?” He replies: “One deciliter of milk and nine deciliters of water.” – In France, this is a good joke; in Hungary, this is a good milk.
Possible Methods for Joke Recognition
- Determine if a given text is a joke
- Given a joke, determine the punchline location
Hotel clerk: Welcome to our hotel.
Max: Thank you. Can you tell me what room I'm in?
Clerk: The lobby.
Restricted Domain: Knock Knock Jokes
Line1: "Knock, Knock"
Line2: “Who’s there?”
Line3: any phrase
Line4: Line3 followed by “who?”
Line5: One or several sentences containing one of the following
Type1: Line3
Type2: A wordplay on Line3
Type3: A meaningful response to a wordplay of Line4
Restricted Domain: Knock Knock Jokes
Type1: Line3
--Knock, Knock
--Who's there?
--Water
--Water who?
--Water you doing tonight?
Type2: A wordplay on Line3
--Knock, Knock
--Who's there?
--Ashley
--Ashley who?
--Actually, I don't know.
Type3: A meaningful response to a wordplay of Line4
--Knock, Knock
--Who's there?
--Tank
--Tank who?
--You are welcome.
Script-based Semantic Theory of Humor
- The text is compatible with 2 different scripts
- The 2 scripts are opposite
--Knock, Knock
--Who's there?
--Water
--Water who?
--Water you doing tonight?
The scripts overlap in the phonetic representation of water and what are.
The scripts differ in meaning.
Experimental Design
Definitions:
- Wordplay: a word (or words) that sounds similar to another word but has a different spelling (and meaning)
  - What are is a wordplay on water
- Keyword: the word the wordplay is based on (Line3)
  - Water is a keyword
Only Type1 jokes are recognized.
Experimental Design
--Knock, Knock
--Who's there?
--Water
--Water who?
--Water you doing tonight?
Step 1: joke format validation
Step 2: computational generation of sound-alike sequences
Step 3: validation of the meaning of a chosen sound-alike sequence
Step 4: last sentence validation with the sound-alike sequence
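The four steps can be sketched as a driver routine. The three callback parameters are hypothetical stand-ins for Steps 2 through 4, which the following slides describe; they are not the authors' actual interfaces:

```python
def is_knock_knock_joke(lines, generate_wordplays, is_meaningful, validates_last):
    """Four-step recognizer sketch for Type1 Knock Knock jokes."""
    # Step 1: joke format validation
    if len(lines) != 5:
        return False
    l1, l2, l3, l4, l5 = [l.strip() for l in lines]
    if l1.lower() != "knock, knock" or l2.lower() != "who's there?":
        return False
    if l4.lower() != (l3 + " who?").lower():
        return False
    # Step 2: generate sound-alike sequences for the keyword (Line3)
    for wordplay in generate_wordplays(l3):
        # Step 3: validate the meaning of the chosen sound-alike sequence
        if not is_meaningful(wordplay):
            continue
        # Step 4: validate the last sentence with the sound-alike sequence
        if validates_last(wordplay, l5):
            return True
    return False
```

On each failed validation the driver falls back to the next candidate from Step 2, matching the "otherwise, return to Step 2" rule on the later slides.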
Experimental Design
Training set:
- 66 Knock Knock jokes
  - Used to enhance the similarity table of letters
  - Used to select N-gram training texts
- 66 texts containing the wordplays from the 66 training jokes
Test set:
- 130 Knock Knock jokes
- 66 non-jokes with a structure similar to Knock Knock jokes
  - Example last line: Water is cold.
Experimental Design
Similarity Table
- Contains combinations of letters that sound similar
- Based on a similarity table of cross-referenced English consonant pairs
- Modified by:
  - translating phonemes to letters
  - adding vowels that are close in sound
  - adding other combinations of letters that may be used to recognize wordplay
Segment of similarity table
Experimental Design
Training texts were entered into an N-gram database.
Wordplay validation: bigram table
- Pairs of words from the training texts with counts of their occurrences
  (training texts 1) (texts were 1) (were entered 1) (entered into 1) ...
Punchline validation: trigram table
- Three words in a row from the training texts with counts of their occurrences
  (training texts were 1) (texts were entered 1) (were entered into 1)
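Building both count tables is a straightforward pass over the training texts; a minimal sketch, using the slide's own example sentence:

```python
from collections import Counter

def ngram_tables(training_texts):
    """Build the bigram and trigram count tables from a list of training texts."""
    bigrams, trigrams = Counter(), Counter()
    for text in training_texts:
        words = text.lower().split()
        bigrams.update(zip(words, words[1:]))               # adjacent word pairs
        trigrams.update(zip(words, words[1:], words[2:]))   # three words in a row
    return bigrams, trigrams

bigrams, trigrams = ngram_tables(["Training texts were entered into the database"])
```

The bigram table then answers Step 3 queries like "is (what, are) attested?" and the trigram table answers the Step 4 queries over the last sentence.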
Step 1: Joke Format Validation
Line1: “Knock, Knock”
Line2: “Who’s there?”
Line3: any phrase
Line4: Line3 followed by “who?”
Line5: One or several sentences containing Line3
--Knock, Knock
--Who's there?
--Water
--Water who?
--Water you doing tonight?
With the wordplay substituted for the keyword:
--Knock, Knock
--Who's there?
--Water
--Water who?
--What are you doing tonight?
Step 2: Generation of Wordplay Sequences
- Repetitive letter replacements of Line3
- The similarity table is used for letter replacements
- Resulting utterances are ordered according to their similarity with Line3
- Utterances with the highest similarity are checked for decomposition into several words
Words have to be in Webster's Second International (234,936 words)
Segment of the similarity table (letter, replacement, similarity):
a    e    .23
a    o    .23
e    a    .23
e    i    .23
e    o    .23
en   e    .23
k    sh   .11
l    r    .56
r    m    .44
r    re   .23
t    d    .39
t    th   .32
t    z    .17
w    m    .44
w    r    .42
w    wh   .23
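The replacement process can be sketched with a subset of the table above. Multiplying the similarity scores of successive replacements is one simple scoring choice, not necessarily the authors', and the WORDS set is a tiny stand-in for Webster's Second International:

```python
# Entries from the similarity table, as letter -> {replacement: similarity}.
SIM = {"a": {"e": .23, "o": .23}, "e": {"a": .23, "i": .23, "o": .23},
       "t": {"d": .39, "th": .32}, "w": {"m": .44, "r": .42, "wh": .23},
       "r": {"m": .44, "re": .23}, "l": {"r": .56}}

def single_replacements(word):
    """All utterances reachable by replacing one letter, with a similarity score."""
    out = []
    for i, ch in enumerate(word):
        for repl, score in SIM.get(ch, {}).items():
            out.append((word[:i] + repl + word[i + 1:], score))
    return out

def expand(word, depth):
    """Repetitive replacement up to `depth` steps; keeps the best score seen."""
    seen, frontier = {}, {word: 1.0}
    for _ in range(depth):
        nxt = {}
        for w, s in frontier.items():
            for cand, cs in single_replacements(w):
                score = s * cs
                if score > seen.get(cand, 0.0):
                    seen[cand] = nxt[cand] = score
        frontier = nxt
    return seen  # candidate utterance -> similarity; highest checked first

WORDS = {"what", "or", "are"}  # tiny stand-in for Webster's Second International

def decompose(utterance):
    """Split an utterance into two dictionary words, if possible (simplified)."""
    for i in range(1, len(utterance)):
        if utterance[:i] in WORDS and utterance[i:] in WORDS:
            return utterance[:i] + " " + utterance[i:]
    return None

forms = expand("water", 3)
# "whator" is reachable via w->wh, e->o; "whatare" via w->wh, e->a, r->re
```

Ordering `forms` by score determines which sound-alike sequence is tried first; each candidate that decomposes into dictionary words is handed to Step 3.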
Decomposition of whator is what or
what or is different from water
Wordplay found; return what or
Step 3: Wordplay Validation
Check if the wordplay is meaningful:
- If the wordplay is at least two words:
  - Decompose the wordplay into word pairs: what or
  - Check if each word pair is in the bigram database: no
- If the wordplay is one word:
  - Check that the word is in the dictionary
If the wordplay is meaningful, go to Step 4. Otherwise, return to Step 2.
Step 2: Generation of Wordplay Sequences (revisited)
Decomposition of whatare is what are
what are is different from water
Wordplay found; return what are
Step 3: Wordplay Validation
Check if the wordplay is meaningful:
- The wordplay is at least two words:
  - Decompose the wordplay into word pairs: what are
  - Check if the word pair is in the bigram database: yes
Proceed to Step 4.
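The Step 3 check reduces to a lookup. The bigram counts and dictionary below are toy stand-ins for the tables the authors built from the 66 training texts and Webster's Second International:

```python
BIGRAMS = {("what", "are"): 3, ("you", "doing"): 2}   # toy counts, not the authors' data
DICTIONARY = {"water", "what", "are", "or", "meter"}  # toy dictionary

def wordplay_is_meaningful(wordplay):
    """Step 3: a multi-word wordplay needs every adjacent word pair in the
    bigram table; a one-word wordplay only needs to be in the dictionary."""
    words = wordplay.split()
    if len(words) == 1:
        return words[0] in DICTIONARY
    return all((a, b) in BIGRAMS for a, b in zip(words, words[1:]))
```

With these tables, "what or" is rejected (no ("what", "or") bigram) and "what are" is accepted, matching the two passes through Step 3 above.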
Step 4: Last Sentence Validation with Wordplay
The wordplay is meaningful. It could occur:
- At the beginning of the last sentence:
  What are you doing?
- In the middle of the last sentence:
  Please tell me what are you doing?
- At the end of the last sentence:
  The question started with "what are".
Step 4: Last Sentence Validation with Wordplay
At the beginning of the last sentence:
- If the wordplay is at least 2 words:
  - What are you doing?
  - Check if (what are you) and (are you doing) are in the trigram table.
- If the wordplay is only one word:
  - Meter you doing?
  - Check if (meter you doing) is in the trigram table.
If at least one of the needed sequences is not in the trigram table, return to Step 2.
Otherwise, the text is a joke.
Step 4: Last Sentence Validation with Wordplay
In the middle of the last sentence:
- If the wordplay is at least 2 words:
  - Please tell me what are you doing?
  - Check if (tell me what), (me what are), (what are you) and (are you doing) are in the trigram table.
- If the wordplay is only one word:
  - Please tell me meter you doing?
  - Check if (tell me meter) and (meter you doing) are in the trigram table.
If at least one of the needed sequences is not in the trigram table, return to Step 2.
Otherwise, the text is a joke.
Step 4: Last Sentence Validation with Wordplay
At the end of the last sentence:
- If the wordplay is at least 2 words:
  - The sentence ended with what are?
  - Check if (ended with what) and (with what are) are in the trigram table.
- If the wordplay is only one word:
  - The sentence ended with meter?
  - Check if (ended with meter) is in the trigram table.
If at least one of the needed sequences is not in the trigram table, return to Step 2.
Otherwise, the text is a joke.
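The three cases on the preceding slides share one rule: after substituting the wordplay for the keyword, every trigram of the last sentence that overlaps the wordplay must appear in the trigram table. A sketch under that reading, with a toy trigram table standing in for the authors' trained one:

```python
TRIGRAMS = {("what", "are", "you"), ("are", "you", "doing"),
            ("tell", "me", "what"), ("me", "what", "are")}  # toy trigram table

def last_sentence_valid(sentence, wordplay):
    """Step 4: require every trigram overlapping the wordplay to be attested."""
    words = sentence.lower().strip("?.!").split()
    wp = wordplay.split()
    # locate the wordplay inside the (already substituted) last sentence
    starts = [i for i in range(len(words) - len(wp) + 1)
              if words[i:i + len(wp)] == wp]
    if not starts:
        return False
    span = range(starts[0], starts[0] + len(wp))
    tris = [tuple(words[j:j + 3]) for j in range(len(words) - 2)]
    needed = [t for j, t in enumerate(tris)
              if any(k in span for k in range(j, j + 3))]
    return bool(needed) and all(t in TRIGRAMS for t in needed)
```

For "Please tell me what are you doing?" this demands exactly the slides' four trigrams (tell me what), (me what are), (what are you), (are you doing), while the leading (please tell me) is ignored because it does not touch the wordplay.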
Results
66 training jokes:
- 59 jokes were recognized
- 7 unrecognized (no wordplay found)
66 non-jokes:
- 62 correctly recognized as non-jokes
- 1 with a found wordplay that makes sense
- 3 incorrectly recognized as jokes
130 test jokes:
- 8 jokes were not expected to be recognized
- 12 identified as jokes with the expected wordplay
- 5 identified as jokes with an unexpected wordplay
- 80 with the expected wordplay found
Possible Enhancements
Improve last sentence validation:
- Increase the size of the text used for N-gram training
- Use a parser
- Use N-grams with stemming
Improve the wordplay generator:
- Use phoneme comparison
Use a wider domain:
- All types of Knock Knock jokes
- Other types of wordplay jokes
Conclusion
Initial investigation into computational humor recognition using wordplay.
The program was designed to:
- Recognize wordplay in jokes: 67%
- Recognize jokes containing wordplay: 12%