
UNIVERSITY OF CINCINNATI

Date: May 24, 2004

I, Julia Michelle Taylor, hereby submit this work as part of the requirements for the degree of Master of Science in Computer Science.

It is entitled: Computational Recognition of Humor in a Focused Domain

This work and its defense approved by:

Chair: Dr. Lawrence Mazlack
Dr. Carla Purdy
Dr. John Schlipf
Dr. Michele Vialet


Computational Recognition of Humor in a Focused Domain

A thesis submitted to the Division of Research and Advanced Studies of the University of Cincinnati in partial fulfillment of the requirements for the degree of Master of Science in the Department of Electrical and Computer Engineering and Computer Science of the College of Engineering, 2004, by Julia Taylor, B.S., University of Cincinnati, 1999; B.A., University of Cincinnati, 1999. Committee Chair: Dr. Lawrence Mazlack


Abstract

With advancing developments in artificial intelligence, humor researchers have begun to look at approaches for computational humor. Although there appears to be no complete computational model for recognizing verbally expressed humor, it may be possible to recognize jokes based on statistical language recognition techniques. This is an investigation into computational humor recognition. It considers a restricted set of all possible jokes that have wordplay as a component and examines the limited domain of Knock Knock jokes. The method uses Raskin's Theory of Humor for its theoretical foundation. The original phrase and the complementary wordplay have two different scripts that overlap in the setup of the joke. The algorithm deployed learns statistical patterns of text in N-grams and provides a heuristic focus for a location where wordplay may or may not occur. It uses a wordplay generator to produce an utterance that is similar in pronunciation to a given word, and a wordplay recognizer determines whether the utterance is valid using N-grams. Once a possible wordplay is discovered, a joke recognizer determines whether the found wordplay transforms the text into a joke.
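To make the N-gram idea in the abstract concrete, the following is a minimal sketch of the kind of validity check a wordplay recognizer could perform: a punchline whose wordplay has been replaced by its intended reading is accepted only if every adjacent word pair is attested in a training corpus. The toy corpus, the threshold, and all function names below are illustrative assumptions introduced here, not the thesis's actual algorithm (which is developed in Chapters 6-8 and Appendix D).

```python
from collections import Counter
from itertools import islice

def bigrams(tokens):
    """Yield consecutive word pairs from a list of tokens."""
    return zip(tokens, islice(tokens, 1, None))

def train_bigram_counts(corpus_sentences):
    """Count word bigrams seen in a tokenized training corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(bigrams(sentence))
    return counts

def reads_as_ordinary_text(tokens, counts, threshold=1):
    """Crude validity test: accept the reinterpreted punchline only if
    every adjacent word pair was seen at least `threshold` times."""
    return all(counts[pair] >= threshold for pair in bigrams(tokens))

# Toy stand-in for a training corpus; the thesis trains on much larger
# texts (Appendix A), so this is only an illustration of the mechanics.
corpus = [
    "what are you waiting for".split(),
    "i wonder what are you waiting for".split(),
]
counts = train_bigram_counts(corpus)

# Punchline "Water you waiting for?" with the wordplay "water"
# replaced by its intended reading "what are":
print(reads_as_ordinary_text("what are you waiting for".split(), counts))  # True
print(reads_as_ordinary_text("water you waiting for".split(), counts))     # False
```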
Acknowledgments

I would like to express my sincere gratitude to Dr. Lawrence Mazlack, who made this project not only possible, but also very enjoyable. His advice, patience, ideas, and many late evenings of arguments and inventions are only a few reasons in a very long list. Thank you!

I would like to thank the thesis committee, Dr. John Schlipf, Dr. Michele Vialet, and Dr. Carla Purdy. This work has greatly benefited from your suggestions.

Thanks are due to the Electronic Text Center at the University of Virginia Library for permission to use their texts in the experiments. To Dr. Graeme Ritchie, thank you for your comments in the initial stage of the project and for making your research available. I would also like to thank Adam Hoffman for allowing the flexibility in time that made it possible to complete this thesis. The list would not be complete without G.I. Putiy, who has been an inspiration for many years.

I would like to thank my parents, Michael and Tatyana Slobodnik, and my brother Simon for their love, encouragement, and support in too many ways to describe.

Last but not least, a sincere thank you to my husband, Matthew Taylor, without whose love, help, understanding, and support I would be completely lost.


Table of Contents

List of Tables
1 Introduction
2 Background
  2.1 Theories of Humor
    2.1.1 Incongruity-Resolution Theory
    2.1.2 Script-based Semantic Theory of Humor
    2.1.3 General Theory of Verbal Humor
    2.1.4 Veatch's Theory of Humor
  2.2 Wordplay Jokes
  2.3 Structure of Jokes
    2.3.1 Structural Ambiguity in Jokes
      2.3.1.1 Plural and Non-Count Nouns as Ambiguity Enablers
      2.3.1.2 Conjunctions as Ambiguity Enablers
      2.3.1.3 Construction "A Little" as Ambiguity Enabler
      2.3.1.4 Can, Could, Will, Should as Ambiguity Enablers
    2.3.2 The Structure of Punchline
  2.4 Computational Humor
    2.4.1 LIBJOG
    2.4.2 JAPE
    2.4.3 Elmo
    2.4.4 WISCRAIC
    2.4.5 Ynperfect Pun Selector
    2.4.6 HAHAcronym
    2.4.7 MSG
    2.4.8 Tom Swifties
    2.4.9 Jester
    2.4.10 Applications in Japanese
3 Statistical Measures in Language Processing
  3.1 N-grams
  3.2 Distant N-grams
4 Possible Methods for Joke Recognition
  4.1 Simple Statistical Method
  4.2 Punchline Detector
  4.3 Restricted Context
5 Experimental Design
6 Generation of Wordplay Sequences
7 Wordplay Recognition
8 Joke Recognition
  8.1 Wordplay in the Beginning of a Punchline
  8.2 Wordplay at the End of a Punchline
  8.3 Wordplay in the Middle of a Punchline
9 Training Text
  9.1 First Approach
  9.2 Second Approach
  9.3 Third Approach
  9.4 Fourth Approach
  9.5 Fifth Approach
10 Experimentation and Analysis
  10.1 Training Set
  10.2 Alternative Training Set Data Test
  10.3 General Joke Testing
    10.3.1 Jokes in the Test Set with Wordplay in the Beginning of a Punchline
    10.3.2 Jokes in the Test Set with Wordplay in the Middle of a Punchline
  10.4 Testing Non-Jokes
11 Summary
12 Possible Extensions
13 Conclusion
Bibliography
Appendix A: Training Texts
Appendix B: Jokes Used in the Training Set
Appendix C: Jokes Used in the Test Set
Appendix D: KK Recognizer Algorithm Description
Appendix E: A Table of Similarity of English Consonant Pairs Using the Natural Classes Model, Developed by Stefan Frisch
Appendix F: Cost Table Developed by Christian Hempelmann


List of Tables

Table 1: The three-level scale
Table 2: Subset of entries of the Similarity Table, showing the similarity of sounds between different letters in words
Table 3: Examples of strings received after replacing one letter of the word "water", and their similarity value to "water"
Table 4: Training joke results
Table 5: Unrecognized jokes in the training set
Table 6: Results of the joke test set
Table 7: Non-joke results


1 Introduction

Thinkers from the ancient times of Aristotle and Plato to the present day have strived to discover and define the origins of humor. Most commonly, early definitions of humor relied on laughter: what makes people laugh is humorous. Recent works on humor separate laughter and treat it as its own distinct category of response. Today there are almost as many definitions of humor as theories of humor; as in many cases, definitions are derived from theories [Latta, 1999]. Still, we are unsure of the complete dimensions of the concept [Keith-Spiegel, 1972]. Some researchers say not only that there is no definition that covers all aspects of humor, but also that humor is impossible to define [Attardo, 1994].

Humor is an interesting subject to study not only because it is difficult to define, but also because sense of humor varies from person to person. Not only does it vary from person to person; the same person may find something funny one day but not the next, depending on what mood the person is in or what has happened to him or her recently. These factors, among many others, make humor recognition challenging.

Although most people are unaware of the complex steps involved in humor recognition, a computational humor recognizer has to consider all of these steps in order to approach the same ability as a human being.

A common form of humor is verbal, or verbally expressed, humor. Verbally expressed humor can be defined as humor conveyed in language, as opposed to physical or visual humor, but not necessarily playing on the form of the language [Ritchie, 2000]. Verbally expressed humor is easier to analyze computationally, as it involves reading and understanding texts. While understanding the meaning of a text may be difficult for a computer, reading it is not an issue.
One of the subclasses of verbally expressed humor is the joke. Hetzron [1991] defines a joke as a short humorous piece of literature in which the funniness culminates in the final sentence. Most researchers agree that jokes can be broken into two parts, a setup and a punchline. The setup is the first part of the joke, usually consisting of most of the text, and it establishes certain expectations. The punchline is a much shorter portion of the joke, and it causes some form of conflict: it can force another interpretation on the text, violate an expectation, or both [Ritchie, 1998]. As most jokes are relatively short and therefore do not carry a lot of information, it should be possible to recognize them computationally.

Computational recognition of jokes seems to be possible, but it is not easy. An intelligent joke recognizer requires world knowledge to understand most jokes.

Computational work in natural language has a long history. Areas of interest have included translation, understanding, database queries, summarization, indexing, and retrieval. There has been very limited success in achieving true computational understanding.

A focused area within natural language understanding is verbally expressed humor. As Ritchie [1998] states, "It will probably be some time before we develop a sufficient understanding of humour, and of human behaviour, to permit even limited form of jokes to lubricate..."
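The two-part structure described above can be made concrete for the Knock Knock jokes studied in this thesis. The small sketch below is purely illustrative: the Joke type and the example joke are assumptions introduced here, not material from the thesis. It only shows that in a Knock Knock joke the setup already names the word that the punchline reuses as wordplay, which is what a recognizer can exploit.

```python
from dataclasses import dataclass

@dataclass
class Joke:
    """A joke split into the two parts described above: a setup that
    builds expectations and a punchline that forces a second reading."""
    setup: str
    punchline: str

# Illustrative example only: the Knock Knock template fixes the setup,
# so the word "water" introduced there reappears as wordplay in the
# punchline ("Water" heard as "What are").
kk = Joke(
    setup="Knock, knock. Who's there? Water. Water who?",
    punchline="Water you waiting for?",
)
print(kk.punchline)
```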