
Cognitive Systems Monographs 37

Aditya Joshi • Pushpak Bhattacharyya • Mark J. Carman

Investigations in Computational Sarcasm

Cognitive Systems Monographs

Volume 37

Series editors

Rüdiger Dillmann, University of Karlsruhe, Karlsruhe, Germany

Yoshihiko Nakamura, Tokyo University, Tokyo, Japan

Stefan Schaal, University of Southern California, Los Angeles, USA

David Vernon, University of Skövde, Skövde, Sweden

The Cognitive Systems Monographs (COSMOS) publish new developments and advances in the fields of cognitive systems research, rapidly and informally but with a high quality. The intent is to bridge cognitive brain science and biology with engineering disciplines. It covers all the technical contents, applications, and multidisciplinary aspects of cognitive systems, such as Bionics, System Analysis, System Modelling, System Design, Human Motion, Understanding, Human Activity Understanding, Man-Machine Interaction, Smart and Cognitive Environments, Human and Computer Vision, Neuroinformatics, Humanoids, Biologically motivated systems and artefacts, Autonomous Systems, Linguistics, Sports Engineering, Computational Intelligence, Biosignal Processing, or Cognitive Materials as well as the methodologies behind them. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops.

Advisory Board

Heinrich H. Bülthoff, MPI for Biological Cybernetics, Tübingen, Germany
Masayuki Inaba, The University of Tokyo, Japan
J.A. Scott Kelso, Florida Atlantic University, Boca Raton, FL, USA
Oussama Khatib, Stanford University, CA, USA
Yasuo Kuniyoshi, The University of Tokyo, Japan
Hiroshi G. Okuno, Kyoto University, Japan
Helge Ritter, University of Bielefeld, Germany
Giulio Sandini, University of Genova, Italy
Bruno Siciliano, University of Naples, Italy
Mark Steedman, University of Edinburgh, Scotland
Atsuo Takanishi, Waseda University, Tokyo, Japan

More information about this series at http://www.springer.com/series/8354


Aditya Joshi • Pushpak Bhattacharyya • Mark J. Carman

Investigations in Computational Sarcasm


Aditya Joshi
IITB-Monash Research Academy
Indian Institute of Technology Bombay
Mumbai, Maharashtra
India

Pushpak Bhattacharyya
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
Mumbai, Maharashtra
India

Mark J. Carman
Faculty of Information Technology
Monash University
Melbourne, VIC
Australia

ISSN 1867-4925
ISSN 1867-4933 (electronic)
Cognitive Systems Monographs
ISBN 978-981-10-8395-2
ISBN 978-981-10-8396-9 (eBook)
https://doi.org/10.1007/978-981-10-8396-9

Library of Congress Control Number: 2018932175

© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Sarcasm is defined as verbal irony that is intended to mock or ridicule. Existing sentiment analysis systems show a degraded performance in case of sarcastic text. Hence, computational sarcasm has received attention from the sentiment analysis community. Computational sarcasm refers to computational techniques that deal with sarcastic text. This monograph presents our investigations in computational sarcasm based on the linguistic notion of incongruity. For example, the sentence ‘I love being ignored’ is sarcastic because the positive word ‘love’ is incongruous with the negative phrase ‘being ignored.’ These investigations are divided into three parts: understanding the phenomenon of sarcasm, sarcasm detection, and sarcasm generation.

To first understand the phenomenon of sarcasm, we consider two components of sarcasm: implied negative sentiment and presence of a target. To understand how implied negative sentiment plays a role in sarcasm understanding, we present an annotation study which evaluates the quality of a sarcasm-labeled dataset created by non-native annotators. Following this, in order to show how the target of sarcasm is important to understand sarcasm, we first describe an annotation study which highlights the challenges in distinguishing between sarcasm and irony (since irony does not have a target while sarcasm does) and then present a computational approach that extracts the target of a sarcastic text.

We then present our approaches for sarcasm detection. To detect sarcasm, we capture incongruity in two ways: ‘intra-textual incongruity,’ where we look at incongruity within the text to be classified (i.e., the target text), and ‘context incongruity,’ where we incorporate information outside the target text. To detect incongruity within the target text, we present four approaches: (a) a classifier that captures sentiment incongruity using sentiment-based features (as in the case of ‘I love being ignored’), (b) a classifier that captures semantic incongruity (as in the case of ‘A woman needs a man like a fish needs a bicycle’) using word embedding-based features, (c) a topic model that captures sentiment incongruity using sentiment distributions in the text (in order to discover sarcasm-prevalent topics such as work and college), and (d) an approach that captures incongruity in the language model using sentence completion. The approaches in (a) and (c) incorporate sentiment incongruity relying on sentiment-bearing words, whereas the approaches in (b) and (d) tackle other forms of incongruity where sentiment-bearing words may not be present.
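The idea of sentiment incongruity can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the two word sets are hypothetical and chosen for the examples, and the function `has_sentiment_incongruity` is an illustrative name, not the feature set or classifier developed in this monograph.

```python
import re

# Toy polarity word sets; the entries are assumptions chosen for the examples.
POSITIVE = {"love", "great", "enjoy"}
NEGATIVE = {"ignored", "awful", "late"}


def has_sentiment_incongruity(text):
    """Return True when the text mixes positive and negative lexicon words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & POSITIVE) and bool(words & NEGATIVE)


if __name__ == "__main__":
    for sentence in ("I love being ignored",
                     "I love this pizza",
                     "A woman needs a man like a fish needs a bicycle"):
        print(sentence, "->", has_sentiment_incongruity(sentence))
```

The sketch flags ‘I love being ignored’ but not the fish-and-bicycle sentence, which contains no sentiment-bearing words. This is the gap that approaches such as (b) and (d) are meant to address.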

On the other hand, to detect sarcasm using contextual incongruity, we describe two approaches: (a) a rule-based approach that uses historical text by an author to detect sarcasm in the text generated by them and (b) a statistical approach that uses sequence labeling techniques for sarcasm detection in dialogue. The approach in (a) attempts to detect sarcasm that requires author-specific context, while that in (b) attempts to detect sarcasm that requires conversation-specific context. Finally, we present a technique for sarcasm generation. In this case, we use a template-based approach to synthesize incongruity and generate a sarcastic response to user input. The output of our sarcasm generation system obtains high scores on three quality parameters: coherence, grammaticality, and sarcastic nature. Also, the human evaluators are able to sufficiently distinguish the output of our system from that of a general-purpose chatbot.

Our investigations demonstrate how evidence of incongruity (such as sentiment incongruity and semantic incongruity) can be modeled using different learning techniques (such as classifiers and topic models) for sarcasm detection and sarcasm generation. In addition, our findings establish the promise of novel problems, such as sarcasm target identification and sarcasm-versus-irony classification, and provide insights for future research in sarcasm detection.


Contents

1 Introduction
  1.1 Sentiment Analysis (SA)
    1.1.1 Challenges
    1.1.2 Research Problems
    1.1.3 Applications
  1.2 Sarcasm and Computational Sarcasm
  1.3 Motivation
    1.3.1 Turing Test-Completeness
    1.3.2 Impact on Sentiment Classification
  1.4 Prevalence of Sarcasm
    1.4.1 In Popular Culture
    1.4.2 On the Web
  1.5 Sarcasm Studies in Linguistics
  1.6 Incongruity for Sarcasm
  1.7 Contribution
  1.8 Literature Survey
    1.8.1 Problem Definition
    1.8.2 Datasets
    1.8.3 Other Datasets (Dialogues, Syntactic Patterns, etc.)
    1.8.4 Approaches
    1.8.5 Rule-Based Approaches
    1.8.6 Statistical Approaches
    1.8.7 Deep Learning-Based Approaches
    1.8.8 Shared Tasks and Benchmark Datasets
    1.8.9 Reported Performance
    1.8.10 Trends in Sarcasm Detection
    1.8.11 Pattern Discovery
    1.8.12 Role of Context in Sarcasm Detection
    1.8.13 Issues in Sarcasm Detection
    1.8.14 Issues with Annotation
    1.8.15 Issues with Sentiment as a Feature
    1.8.16 Dealing with Dataset Skews
    1.8.17 Sentiment Analysis at IIT Bombay
    1.8.18 Summary
    1.8.19 Monograph Organization

2 Understanding the Phenomenon of Sarcasm
  2.1 Impact on Cross-Cultural Annotation
    2.1.1 Motivation
    2.1.2 Experiment Description
    2.1.3 Analysis
  2.2 Sarcasm-versus-irony Classification
    2.2.1 Motivation
    2.2.2 Experiment Description
    2.2.3 Analysis
  2.3 An Approach for Identification of the Sarcasm Target
    2.3.1 Motivation
    2.3.2 Architecture
    2.3.3 Experiment Description
    2.3.4 Results and Analysis
  2.4 Summary

3 Sarcasm Detection Using Incongruity Within Target Text
  3.1 Sentiment Incongruity as Features
    3.1.1 Motivation
    3.1.2 Sentiment Incongruity-Based Features
    3.1.3 Experiment Setup
    3.1.4 Results
    3.1.5 Error Analysis
  3.2 Semantic Incongruity as Features
    3.2.1 Motivation
    3.2.2 Word Embedding-Based Features
    3.2.3 Experiment Setup
    3.2.4 Results
    3.2.5 Error Analysis
  3.3 Sentiment Incongruity Using Topic Model
    3.3.1 Motivation
    3.3.2 Model
    3.3.3 Experiment Setup
    3.3.4 Results
    3.3.5 Application to Sarcasm Detection
  3.4 Language Model Incongruity Using Sentence Completion
    3.4.1 Motivation
    3.4.2 Approach
    3.4.3 Experiment Setup
    3.4.4 Results
    3.4.5 Discussion
    3.4.6 Error Analysis
  3.5 Summary

4 Sarcasm Detection Using Contextual Incongruity
  4.1 Contextual Incongruity in a Monologue
    4.1.1 Motivation
    4.1.2 Architecture
    4.1.3 Experiment Setup
    4.1.4 Results
    4.1.5 Error Analysis
  4.2 Contextual Incongruity in Dialogue
    4.2.1 Motivation
    4.2.2 Architecture
    4.2.3 Conversational Sarcasm Dataset
    4.2.4 Paradigm 1: Traditional Models
    4.2.5 Paradigm 2: Deep Learning-Based Models
    4.2.6 Results
    4.2.7 Error Analysis
  4.3 Summary

5 Sarcasm Generation
  5.1 Motivation
  5.2 Architecture
    5.2.1 Input Analyzer
    5.2.2 Generator Selector
    5.2.3 Sarcasm Generator
  5.3 Evaluation
    5.3.1 Experiment Details
    5.3.2 Results
  5.4 Summary

6 Conclusion and Future Work
  6.1 Summary
  6.2 Conclusion
  6.3 Future Work

References

About the Authors

Aditya Joshi successfully defended his Ph.D. thesis at IITB-Monash Research Academy, Mumbai, a joint Ph.D. program run by the Indian Institute of Technology Bombay (IIT Bombay) and Monash University, Australia, since January 2013. His primary research focus is computational sarcasm, and he has explored different ways in which incongruity can be captured in order to detect and generate sarcasm. In addition, he has worked on innovative applications of natural language processing (NLP) such as sentiment analysis for Indian languages, drunk-texting prediction, news headline translation, and political issue extraction. The monograph is an outcome of his Ph.D. research.

Dr. Pushpak Bhattacharyya is the current president of the Association for Computational Linguistics (ACL) (2016–2017). He is the Director of the Indian Institute of Technology Patna (IIT Patna) and Vijay and Sita Vashee Chair Professor in the Department of Computer Science and Engineering at Indian Institute of Technology Bombay (IIT Bombay). He was educated at the Indian Institute of Technology Kharagpur (IIT Kharagpur) (B.Tech.), Indian Institute of Technology Kanpur (IIT Kanpur) (M.Tech.), and IIT Bombay (Ph.D.).

He has been a Visiting Scholar and Faculty Member at the Massachusetts Institute of Technology (MIT), Stanford, UT-Houston, and University Joseph Fourier (France). His research areas include natural language processing, machine learning, and artificial intelligence (AI). Loved by his students for his inspiring teaching and mentorship, he has guided more than 250 students (Ph.D., masters, and bachelors). He has published over 250 research papers, is the author of the textbook ‘Machine Translation,’ and has led government and industry projects of international and national importance. His significant contributions in the field include multilingual lexical knowledge bases and projection. He is a fellow of the National Academy of Engineering and recipient of the IIT Bombay's Patwardhan Award and the Indian Institute of Technology Roorkee's (IIT Roorkee) VNMM award, both for technology development. He has also received IBM, Microsoft, Yahoo, and United Nations faculty grants.


Dr. Mark J. Carman is a Senior Lecturer at the Faculty of Information Technology, Monash University, Australia. He obtained a Ph.D. from the University of Trento, Italy, in 2004. His research and interests span from theoretical studies (e.g., investigating statistical properties of information retrieval measures) to practical applications (e.g., technology for assisting police during digital forensic investigations). He has authored a large number of publications in prestigious venues, including full papers at SIGIR, KDD, IJCAI, CIKM, WSDM, CoNLL, and ECIR and articles in TOIS, IR, JMLR, ML, PR, JAIR, and IP&M.


Chapter 1
Introduction

The rise of Web 2.0¹ enabled Internet users to generate content, which often contained emotion. Considering the value of this content, automatic prediction of sentiment, i.e., sentiment analysis, became a popular area of research in natural language processing. A recent advancement in sentiment analysis research is the focus on a challenge to sentiment analysis, namely sarcasm.

Sarcasm is a peculiar form of sentiment expression where words of a certain polarity are used to imply a different polarity, with an intention to mock or ridicule. While sarcasm is often used as a device to express humor, its prevalence makes it important for sentiment analysis. In 2014, a BBC story stated that the US Secret Service was also seeking a sarcasm detection system.² Similar interest in sarcasm has led to work on computational approaches to process sarcasm over the last few years. We refer to them collectively as ‘computational sarcasm.’ There are several facets of computational sarcasm, analogous to natural language processing. Just as natural language processing covers a broad spectrum of approaches to natural language generation and several detection problems (such as sentiment detection and part-of-speech prediction), computational sarcasm covers similar problems such as sarcasm generation and sarcasm detection. This monograph takes an in-depth look into the problem of computational sarcasm.

One might argue that computational sarcasm in text alone is insufficient since sarcasm is understood through non-verbal cues. For example, rolling one’s eyes is a common indicator of insincerity that often accompanies sarcasm. The importance of non-verbal cues is beyond doubt. However, social media today relies heavily on text, and the volume of sarcastic content on social media is high. Therefore, it is natural that the current focus of computational sarcasm is textual data. In fact, several indicators of non-verbal cues exist in the form of hashtags, emoticons, etc. Therefore, computational sarcasm in text is a viable task in itself. This monograph describes our investigations in computational sarcasm of text. Our investigations keep in focus prior work in sarcasm detection, while building upon it.

¹ https://en.wikipedia.org/wiki/Web_2.0
² http://www.bbc.com/news/technology-27711109

© Springer Nature Singapore Pte Ltd. 2018
A. Joshi et al., Investigations in Computational Sarcasm, Cognitive Systems Monographs 37, https://doi.org/10.1007/978-981-10-8396-9_1

This chapter builds the foundation of this monograph and is organized as follows. We first introduce sentiment analysis (SA) in Sect. 1.1 followed by computational sarcasm in Sect. 1.2. We then give the motivation behind computational sarcasm in Sect. 1.3. We discuss prevalence of sarcasm in popular culture and social media in Sect. 1.4. We describe linguistic theories of sarcasm in Sect. 1.5 and, specifically, the notion of incongruity in Sect. 1.6. Incongruity forms the foundation of our work. In Sect. 1.7, we specify our contribution. Section 1.8 describes prior work in computational sarcasm, specifically sarcasm detection. Finally, the monograph organization is in Sect. 1.8.19.

1.1 Sentiment Analysis (SA)

Sentiment analysis (SA) refers to the research area of analyzing sentiment in text. Opinion mining (OM) has also been used as a synonym for sentiment analysis in past literature (Pang and Lee 2008). SA is the task of automatically predicting polarity in text. For example, the sentence ‘The pizza is delicious’ should be labeled as positive, while the sentence ‘The pizza tastes awful’ should be labeled as negative. The value of SA arises from the opportunity to understand preferences of individuals and communities, using user-generated content on the Web. To put computational sarcasm in perspective of sentiment analysis, we now highlight the challenges, research problems, and applications of SA.

1.1.1 Challenges

Several challenges to SA are well-known (Pang and Lee 2008). The first challenge is negation. A negation marker can make sentiment prediction difficult. For example, the word ‘not’ in the sentence ‘I do not like this phone’ negates the sentiment of the verb ‘like,’ making the sentence negative. However, in the sentence ‘I do not like this phone but it’s still the best in its category,’ the negation word ‘not’ negates only the portion before the word ‘but.’ The scope of a negation marker has been studied in the past (Harabagiu et al. 2006). The second challenge to SA is domain dependence. The sentiment of words may differ depending on the domain. For example, the word ‘unpredictable’ is positive for a movie review (e.g., ‘The plot of the movie is unpredictable’) but negative for an automobile review (e.g., ‘The steering of a car is unpredictable’). Domain-specific sentiment is a long-studied subarea of SA (Fahrni and Klenner 2008). Similarly, polysemous words may carry different sentiment in different contexts. The word ‘deadly’ may occur in the positive sentence ‘Shane Warne is a deadly spinner’ and also in the negative sentence ‘There are deadly snakes in the Amazon forest.’ Classifiers that incorporate the polysemous nature of words have also been reported in the past (Balamurali et al. 2011). Another challenge is thwarted expectations. An example of thwarting is: ‘This city is polluted, has really bad traffic problems, and the weather sucks. However, I grew up here and I love the city.’ The second sentence reverses the sentiment expressed by the first, although, in terms of word count, negative words (‘polluted,’ ‘bad,’ and ‘sucks’) outnumber the positive word (‘love’).
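The negation and thwarting examples above can be made concrete with a minimal Python sketch of a word-counting polarity scorer. Everything in it is an assumption made for illustration: the lexicon entries, the list of negators, and the rule that a negation scope ends at ‘but,’ ‘however,’ or punctuation. It does not reproduce the methods of the works cited above.

```python
import re

# Tiny illustrative lexicon; the entries are assumptions, not a published resource.
LEXICON = {"like": 1, "love": 1, "best": 1, "delicious": 1,
           "awful": -1, "polluted": -1, "bad": -1, "sucks": -1}
NEGATORS = {"not", "never", "no"}
SCOPE_BREAKERS = {"but", "however"}  # assumed to end a negation scope
PUNCTUATION = set(".,;!?")


def polarity_score(text):
    """Sum word polarities, flipping those that fall inside a negation scope."""
    score = 0
    negated = False
    for token in re.findall(r"[a-z']+|[.,;!?]", text.lower()):
        if token in NEGATORS:
            negated = True
        elif token in SCOPE_BREAKERS or token in PUNCTUATION:
            negated = False
        elif token in LEXICON:
            score += -LEXICON[token] if negated else LEXICON[token]
    return score


if __name__ == "__main__":
    print(polarity_score("I do not like this phone"))
    print(polarity_score(
        "I do not like this phone but it's still the best in its category"))
    print(polarity_score(
        "This city is polluted, has really bad traffic problems, and the "
        "weather sucks. However, I grew up here and I love the city."))
```

Under these assumptions, the first sentence scores -1, the ‘but’ sentence scores 0 because the negated ‘like’ and the un-negated ‘best’ cancel out, and the thwarting example scores -2 even though its overall sentiment is positive. The last case shows why word counting alone cannot handle thwarted expectations.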

Thus, it can be seen that, in addition to approaches for sentiment analysis (in general), there have been explorations that focus on specific challenging aspects (in particular) such as polysemy and domain adaptation. Computational sarcasm is a similar endeavor that focuses on a specific challenge to sentiment analysis.

1.1.2 Research Problems

Research in SA spans several decades now and has spawned multiple problems within the umbrella of SA. Each of these research problems has a large volume of work and several innovations. Some of these research problems are:

1. Sentiment detection on its own deals with prediction of positive or negative polarity. Both rule-based and statistical approaches have been reported for sentiment detection. Joshi et al. (2011), for example, describe a rule-based sentiment detector that relies on a sentiment lexicon and a set of rules to account for words of a certain polarity and constructs such as negations and conjunctions. Similarly, many statistical sentiment detection approaches use features based on unigrams, bigrams, etc.

2. Subjectivity detection deals with prediction of a text as subjective or objective. In other words, subjectivity detection is concerned with distinguishing between text that contains sentiment and text that does not. This means that we wish to distinguish between fact and opinion in case of subjectivity detection.

3. Since a long document may contain some portions containing sentiment and some without, it is useful to separate the two. This resulted in ‘subjectivity extraction’ as an area of research. Subjectivity extraction deals with identification of the subset of sentences in a document that carry sentiment. Such sentences are referred to as subjective sentences, while the ones without sentiment are referred to as objective sentences. Pang and Lee (2004a) is a fundamental work in the area of subjectivity extraction where a minimum-cut algorithm is used to identify subjective extracts, i.e., subsets of subjective sentences. For example, in case of the text ‘The film is about a dog who befriends a boy. The story is very absurd but the actors do a great job. The lead is played by a new actor. He is a true find!’, the goal of subjectivity extraction would be to extract the sentences ‘The story is very absurd but the actors do a great job. He is a true find!’ while discarding the other sentences. This may be useful because objective sentences do not contribute to the sentiment of the entity.
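The input and output of subjectivity extraction, as described in item 3, can be sketched in a few lines of Python. This is a deliberately naive cue-word filter and not the minimum-cut formulation of Pang and Lee (2004a). The cue-word set, the sentence splitter, and the function names are all assumptions introduced only to reproduce the film example.

```python
import re

# Hypothetical cue words marking opinionated sentences (an assumption for this sketch).
SUBJECTIVE_CUES = {"absurd", "great", "true", "awful", "delicious", "love"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def subjective_extract(text):
    """Keep only the sentences that contain at least one cue word."""
    kept = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & SUBJECTIVE_CUES:
            kept.append(sentence)
    return " ".join(kept)


if __name__ == "__main__":
    review = ("The film is about a dog who befriends a boy. "
              "The story is very absurd but the actors do a great job. "
              "The lead is played by a new actor. He is a true find!")
    print(subjective_extract(review))
```

With this hand-picked cue set, the sketch keeps the second and fourth sentences of the film review and discards the two objective ones. A realistic system would replace the cue-word test with a learned per-sentence subjectivity score.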
