
Understanding Satirical Articles Using Common-Sense

Dan Goldwasser
Purdue University
Department of Computer Science
[email protected]

Xiao Zhang
Purdue University
Department of Computer Science
[email protected]

Transactions of the Association for Computational Linguistics, vol. 4, pp. 537–549, 2016. Action Editor: Timothy Baldwin. Submission batch: 1/2016; Revision batch: 5/2016; Published 12/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

Automatic satire detection is a subtle text classification task, for machines and at times even for humans. In this paper we argue that satire detection should be approached using common-sense inferences, rather than traditional text classification methods. We present a highly structured latent variable model capturing the required inferences. The model abstracts over the specific entities appearing in the articles, grouping them into generalized categories, thus allowing the model to adapt to previously unseen situations.

1 Introduction

Satire is a writing technique for passing criticism using humor, irony or exaggeration. It is often used in contemporary politics to ridicule individual politicians, political parties or society as a whole. We restrict ourselves in this paper to such political satire articles, broadly defined as articles whose purpose is not to report real events, but rather to mock their subject matter. Satirical writing often builds on real facts and expectations, pushed to absurdity to express humorous insights about the situation. As a result, the difference between real and satirical articles can be subtle and often confusing to readers. With the recent rise of social media outlets, satirical articles have become increasingly popular and have famously fooled several leading news agencies.1 These misinterpretations can often be attributed to careless reading, as there is a clear line between unusual events finding their way to the news and satire, which intentionally places key political figures in unlikely humorous scenarios. The two can be separated by carefully reading the articles, exposing the satirical nature of the events described in such articles.

1 https://newrepublic.com/article/118013/satire-news-websites-are-cashing-gullible-outraged-readers

Satirical excerpt: Vice President Joe Biden suddenly barged in, asking if anyone could "hook [him] up with a Dixie cup" of their urine. "C'mon, you gotta help me get some clean whiz. Shinseki, Donovan, I'm looking in your direction" said Biden.

Real excerpt: "Do you want to hit this?" a man asked President Barack Obama in a bar in Denver Tuesday night. The president laughed but didn't indulge. It wasn't the only time Obama was offered weed on his night out.

Figure 1: Examples of real and satirical articles. Top: satirical news excerpt. Bottom: real news excerpt.

In this paper we follow this intuition. We look into the satire detection task (Burfoot and Baldwin, 2009), predicting if a given news article is real or satirical, and suggest that this prediction task should be defined over common-sense inferences, rather than looking at it as a lexical text classification task (Pang and Lee, 2008; Burfoot and Baldwin, 2009), which bases the decision on word-level features.

To further motivate this observation, consider the two excerpts in Figure 1. Both excerpts mention top-ranking politicians (the President and Vice President) in a drug-related context, and contain informal slang utterances, inappropriate for the subjects' position. The difference between the two examples is apparent when analyzing the situation described in the two articles: the first example (top) describes the Vice President speaking inappropriately in a work setting, clearly an unrealistic situation. In the second (bottom), the President is spoken to inappropriately, an unlikely, yet not unrealistic, situation. From the perspective of our prediction task, it is advisable to base the prediction on a structured representation capturing the events and their participants, described in the text.

The absurdity of the situation described in satirical articles is often not unique to the specific individuals appearing in the narrative. In our example, both politicians are interchangeable: placing the president in the situation described in the first excerpt would not make it less absurd. It is therefore desirable to make a common-sense inference about high-ranking politicians in this scenario.

We follow these intuitions and suggest a novel approach for the satire prediction task. Our model, COMSENSE, makes predictions by making common-sense inferences over a simplified narrative representation. Similarly to prior work (Chambers and Jurafsky, 2008; Goyal et al., 2010; Wang and McAllester, 2015) we represent the narrative structure by capturing the main entities (and tracking their mentions throughout the text), their activities, and their utterances. The result of this process is a Narrative Representation Graph (NRG). Figure 2 depicts examples of this representation for the excerpts in Figure 1.

Given an NRG, our model makes inferences quantifying how likely each of the represented events and interactions is to appear in a real or satirical context. Annotating the NRG for such inferences is a challenging task, as the space of possible situations is extremely large. Instead, we frame the required inferences as a highly-structured latent variable model, trained discriminatively as part of the prediction task. Without explicit supervision, the model assigns categories to the NRG vertices (for example, by grouping politicians into a single category, or by grouping inappropriate slang utterances, regardless of specific word choice). These category assignments form the infrastructure for higher-level reasoning, as they allow the model to identify the commonalities between unrelated people, their actions and their words. The model learns common-sense patterns leading to real or satirical decisions based on these categories. We express these patterns as parametrized rules (acting as global features in the prediction model), and base the prediction on their activation values. In our example, these rules can capture the combination $(E_{Politician}) \wedge (Q_{slang}) \rightarrow \text{Satire}$, where $E_{Politician}$ and $Q_{slang}$ are latent variable assignments to entity and utterance categories, respectively.

Our experiments look into two variants of satire prediction: using full articles, and the more challenging sub-task of predicting if a quote is real given its speaker. We use two datasets collected six years apart: the first was collected in 2009 (Burfoot and Baldwin, 2009), and an additional dataset was collected recently. Since satirical articles tend to focus on current events, the two datasets describe different people and world events. To demonstrate the robustness of our COMSENSE approach we use the first dataset for training, and the second as out-of-domain test data. We compare COMSENSE to several competing systems, including a state-of-the-art Convolutional Neural Network (Kim, 2014). Our experiments show that COMSENSE outperforms all other models. Most interestingly, it does so by a larger margin when tested over the out-of-domain dataset, demonstrating that it is more resistant to overfitting compared to other models.

2 Related Work

The problem of building computational models dealing with humor, satire, irony and sarcasm has attracted considerable interest in the Natural Language Processing (NLP) and Machine Learning (ML) communities in recent years (Wallace et al., 2014; Riloff et al., 2013; Wallace et al., 2015; Davidov et al., 2010; Karoui et al., 2015; Burfoot and Baldwin, 2009; Tepperman et al., 2006; Gonzalez-Ibanez et al., 2011; Lukin and Walker, 2013; Filatova, 2012; Reyes et al., 2013). Most work has looked into ironic expressions in shorter texts, such as tweets and forum comments. Most related to our work is Burfoot and Baldwin (2009), which focused on satirical articles. In that work the authors suggest a text classification approach for satire detection. In addition to using bag-of-words features, the authors also experiment with semantic validity features which pair entities mentioned in the article, thus capturing combinations unlikely to appear in a real context. This paper follows a similar intuition; however, it looks into structured representations of this information, and studies their advantages.

Our structured representation is related to several recent reading comprehension tasks (Richardson et al., 2013; Berant et al., 2014) and work on narrative representation, such as event-chains (Chambers and Jurafsky, 2009; Chambers and Jurafsky, 2008), plot-units (Goyal et al., 2010; Lehnert, 1981) and Story Intention Graphs (Elson, 2012). Unlike these works, narrative representation is not the focus of this work, but rather provides the basis for making inferences, and as a result we choose a simpler (and more robust) representation, most closely resembling event chains (Chambers and Jurafsky, 2008).

Making common-sense inferences is one of the core missions of AI, applicable to a wide range of tasks. Early work (Reiter, 1980; McCarthy, 1980; Hobbs et al., 1988) focused on logical inference, and manual construction of such knowledge repositories (Lenat, 1995; Liu and Singh, 2004). More recently, several researchers have looked into automatic common-sense knowledge construction and expansion using common-sense inferences (Tandon et al., 2011; Bordes et al., 2011; Socher et al., 2013; Angeli and Manning, 2014). Several works have looked into combining NLP with common-sense (Gerber et al., 2010; Gordon et al., 2011; LoBue and Yates, 2011; Labutov and Lipson, 2012; Gordon et al., 2012). Most relevant to our work is a SemEval-2012 task (Gordon et al., 2012), looking into common-sense causality identification.

In this work we focus on a different task, satire detection in news articles. We argue that this task is inherently a common-sense reasoning task, as identifying the satirical aspects in narrative text does not require any specialized training, but instead relies heavily on common expectations of normative behavior and deviation from it in satirical text. We design our model to capture these behavioral expectations using (weighted) rules, instead of relying on lexical features as is often the case in text categorization tasks. Other common-sense frameworks typically build on existing knowledge bases representing world knowledge; however, specifying in advance the behaviors commonly associated with people based on their background and situational context, to the extent it can provide good coverage for our task, requires considerable effort. Instead, we suggest to learn this information from data directly, and our model learns jointly to predict and represent the satirical elements of the article.

Figure 2: Narrative Representation Graph (NRG) for two article snippets. (a) NRG for a satirical article; (b) NRG for a real article. The graphs contain animate entity, predicate, and argument/modifier nodes connected by typed edges (e.g., A0, A1, TMP, LOC, MNR, NEG, CoRef, Quote).

3 Modeling

Given a news article, our COMSENSE system first constructs a graph-based representation of the narrative, denoted Narrative Representation Graph (NRG), capturing its participants, their actions and utterances. We describe this process in more detail in Section 3.1. Based on the NRG, our model makes a set of inferences, mapping the NRG vertices to general categories abstracting over the specific NRG. These abstractions are formulated as latent variables in our model. The system makes a prediction by reasoning over the abstract NRG, decomposing it into paths, where each path captures a partial view of the abstract NRG. Finally, we associate the paths with the satire decision output. The COMSENSE model then solves a global inference problem, formulated as an Integer Linear Program (ILP) instance, looking for the most likely explanation of the satire prediction output, consistent with the extracted patterns. We explain this process in detail in Section 3.2.

NRG Abstraction as Common-Sense The main goal of the COMSENSE approach is to move away from purely lexical models, and instead base its decisions on common-sense inferences. We formulate these inferences as parameterized rules, mapping elements of the narrative, represented using the NRG, to a classification decision. The rules' ability to capture common-sense inferences hinges on two key elements. First, the abstraction of NRG nodes into typed narrative elements allows the model to find commonalities across entities and their actions. This is done by associating each NRG node with a set of latent variables. Second, constructing the decision rules according to the structure of the NRG graph allows us to model the dependencies between narrative elements. This is done by following the paths in the abstract NRG, generating rules by combining the latent variables representing nodes on the path, and associating them with a satire decision variable.

Computational Considerations When setting up the learning system, there is a clear expressivity/efficiency tradeoff over these two elements. Increasing the number of latent variables associated with each NRG node would allow the model to learn a more nuanced representation. Similarly, generating rules by following longer NRG paths would allow the model to condition its satire decision on multiple entities and events jointly. The added expressivity does not come without price. Given the limited supervision afforded to the model when learning these rules, additional expressivity would result in a more difficult learning problem which could lead to overfitting. Our experiments demonstrate this tradeoff, and in Figure 4 we show the effect of increasing the number of latent variables on performance. An additional concern with increasing the model's expressivity is computational efficiency. Satire prediction is formulated as an ILP inference process jointly assigning values to the latent variables and making the satire decision. Since ILP is exponential in the number of variables, increasing the number of latent variables would be computationally challenging. In this paper we take a straightforward approach to ensuring computational tractability by limiting the length of NRG paths considered by our model to a constant size c=2. Assuming that we have m latent categories associated with each node, each path would generate $m^c$ ILP variables (see Section 3.3 for details), hence the importance of limiting the length of the path. In the future we intend to study approximate inference methods that can help alleviate this computational difficulty, such as using LP-approximation (Martins et al., 2009).
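To make the growth concrete with a small worked example (the numbers are chosen for illustration only, not the paper's actual configuration): with $m = 3$ latent categories per node and path length $c = 2$, each NRG path already yields $m^c = 3^2 = 9$ category combinations, each of which is further paired with the SATIRE/REAL output variables introduced in Section 3.3.2. Longer paths or more categories multiply this count, which is why both are kept small.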

3.1 Narrative Representation Graph for News Articles

The Narrative Representation Graph (NRG) is a simple graph-based representation for narrative text, describing the connections between entities and their actions. The key motivation behind the NRG was to provide the structure necessary for making inferences, and as a result we chose a simple representation that does not take into account cross-event relationships and nuanced differences between some of the event argument types. While other representations (Mani, 2012; Goyal et al., 2010; Elson, 2012) capture more information, they are harder to construct and more prone to error. We will look into adapting these models for our purpose in future work.

Since satirical articles tend to focus on political figures, we design the NRG around animate entities that drive the events described in the text, their actions (represented as predicate nodes), their contextualizing information (location modifiers, temporal modifiers, negations), and their utterances. We omitted from the graph other non-animate entity types. In Figure 2 we show an example of this representation.

Similar in spirit to previous work (Goyal et al., 2010; Chambers and Jurafsky, 2008), we represent the relations between the entities that appear in the story using a Semantic Role Labeling system (Punyakanok et al., 2008) and collapse all the entity mentions into a single entity using a Co-Reference resolution system (Manning et al., 2014). We attribute utterances to their speaker based on a previously published rule-based system (O'Keefe et al., 2012).

Formally, we construct a graph G = {V, E}, where V consists of three types of vertices: ANIMATE ENTITY (e.g., people), PREDICATE (e.g., actions) and ARGUMENT (e.g., utterances, locations). The edges E capture the relationships between vertices. The graph contains several different edge types. COREF edges collapse the mentions of the same entity into a single entity, ARGUMENT-TYPE edges connect ANIMATE ENTITY nodes to PREDICATE nodes (these edges are typed according to their semantic roles), and PREDICATE nodes to argument nodes (modifiers). Finally, we add QUOTE edges connecting ANIMATE ENTITY nodes to utterances (ARGUMENT).
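To make the representation concrete, the following is a minimal sketch of an NRG as a typed graph in Python. The class and field names are illustrative assumptions; the paper does not publish an implementation.

```python
from dataclasses import dataclass, field

# Vertex and edge types of the NRG, as described above.
VERTEX_TYPES = {"ANIMATE_ENTITY", "PREDICATE", "ARGUMENT"}
EDGE_TYPES = {"COREF", "ARGUMENT-TYPE", "QUOTE"}

@dataclass
class Vertex:
    vid: int
    vtype: str   # one of VERTEX_TYPES
    text: str    # surface span, e.g. "a man", "laugh", "Do you want to hit this?"

@dataclass
class NRG:
    vertices: dict = field(default_factory=dict)  # vid -> Vertex
    edges: list = field(default_factory=list)     # (src_vid, dst_vid, etype, label)

    def add_vertex(self, vid, vtype, text):
        assert vtype in VERTEX_TYPES
        self.vertices[vid] = Vertex(vid, vtype, text)

    def add_edge(self, src, dst, etype, label=None):
        # label carries the semantic role (e.g. "A0", "TMP", "LOC") on ARGUMENT-TYPE edges.
        assert etype in EDGE_TYPES
        self.edges.append((src, dst, etype, label))

# Example: a fragment of the real-article snippet from Figure 2(b).
g = NRG()
g.add_vertex(0, "ANIMATE_ENTITY", "a man")
g.add_vertex(1, "PREDICATE", "ask")
g.add_vertex(2, "ARGUMENT", "Do you want to hit this?")
g.add_edge(0, 1, "ARGUMENT-TYPE", "A0")  # "a man" is the agent of "ask"
g.add_edge(0, 2, "QUOTE")                # the utterance is attributed to "a man"
```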

3.2 Satire Prediction using the Narrative Representation Graph

Satire prediction is inherently a text classification problem. Such problems are often approached using a Bag-of-Words (BoW) model which ignores the document structure when making predictions. Instead, the NRG provides a structured representation for making the satire prediction. We begin by showing how the NRG can be used directly and then discuss how to enhance it by mapping the graph into abstract categories.

Directly Using NRG for Satire Prediction We suggest a simple approach for extracting features directly from the NRG, by decomposing it into graph paths, without mapping the graph into abstract categories. This simple, word-based representation for prediction, structured according to the NRG (denoted NARRLEX), generates features by using the words in the original document corresponding to the graph decomposition. For example, consider the path connecting "a man" to an utterance in Figure 2(b). Simple features could associate the utterance's words with that entity, rather than with the President. The resulting NARRLEX model generates Bag-of-Words features based on words corresponding to NRG path vertices, conditioned on their connected entity vertex.
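As an illustration, here is a minimal sketch of how NARRLEX-style features might be generated from the NRG structure sketched above: the words of each vertex reachable from an entity vertex are emitted as bag-of-words features conditioned on that entity. The feature naming scheme is an assumption, not the authors' exact one.

```python
def narrlex_features(g):
    """Emit bag-of-words features for NRG paths, conditioned on the entity vertex.

    g: an NRG instance as sketched above. Returns a dict feature -> count.
    """
    feats = {}
    for src, dst, etype, label in g.edges:
        src_v, dst_v = g.vertices[src], g.vertices[dst]
        if src_v.vtype != "ANIMATE_ENTITY":
            continue
        entity = src_v.text.lower().replace(" ", "_")
        for word in dst_v.text.lower().split():
            # e.g. "a_man|QUOTE|hit" rather than a plain unigram "hit".
            key = f"{entity}|{etype}|{word}"
            feats[key] = feats.get(key, 0) + 1
    return feats
```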

Using Common-Sense for Satire Prediction Unlike the NARRLEX model, which relies on directly observed information, our COMSENSE model performs inference over higher-level patterns. In this model the prediction is a global inference process, taking into account the relationships between NRG elements (and their abstraction into categories) and the final prediction. This process is described in Figure 3.

First, the model associates each NRG vertex with a high-level category that can be reused even when other, previously unseen, entities are discussed in the text. We associate a set of Boolean variables with each NRG vertex, capturing a higher-level abstraction over this node.

We define three types of categories corresponding to the three types of vertices, and denote them E, A, Q for Entity category, Action category and Quote category, respectively. Each category variable can take k different values. As a convention we denote X = i as a category assignment, where X ∈ {E, A, Q} is the category type, and i is its assignment. Since these category assignments are not directly observed, they are treated as latent variables in our model. This process is exemplified at the top right corner of Figure 3.

Combinations of category assignments form patterns used for determining the prediction. These patterns can be viewed as parameterized rules. Each weighted rule associates a combination with an output variable (SATIRE or REAL). Examples of such rules are provided in the middle of the right corner of Figure 3. We formulate the activations of these rules as Boolean variables, whose assignments are highly interconnected. For example, the variables representing the rules (E = 0) → SATIRE and (E = 0) → REAL are mutually exclusive, since assigning a T value to either one entails a satire (or real) prediction. To account for this interdependency, we add constraints capturing the relations between rules.

The model makes predictions by combining the rule weights and predicting the top scoring output value. The prediction can be viewed as a derivation process, mapping article entities to categories (e.g., ENTITY("A MAN") → (E=0) is an example of such a derivation), with combinations of categories composing into prediction patterns (e.g., (E=0) → SATIRE). We use an ILP solver to find the optimal derivation sequence. We describe the inference process as an Integer Linear Program in the following section.


"do you want t o hit t his?"

Quote

A0

a man Pr esident Bar ak Obama

Laugh

Ar gument s and modi f i er s

Pr edi cat es

Ani mat eEnt i t i es E=1

Q=1

A=0

E=0

Commonsense Predict ion Rules

Latent Category AssignmentsEntity("a man") (E=0)Entity("president Barak Obama") (E=1) Predicate("laugh") (A=0)Quote("Do you want to hit This?") (Q=1)

(E=0) SATIRE(E=0) SATIRE(A=0) SATIRE(Q=1) SATIRE(E=1) (A=0) SATIRE(E=0) (Q=1) SATIRE

(E=0) REAL(E=0) REAL(A=0) REAL(Q=1) REAL(E=1) (A=0) REAL(E=0) (Q=1) REAL

Figure 3: Extracting Common-sense prediction rules.

3.3 Identifying Relevant Interactions using Constrained Optimization

We formulate the decision as a 0-1 Integer Linear Programming problem, consisting of three types of Boolean variables: category assignment indicator variables, indicator variables for common-sense patterns, and finally the output decision variables. Each indicator variable is also represented using a feature set, used to score its activation.

3.3.1 Category Assignment Variables

Each node in the NRG is assigned a set of competing variables, mapping the node to different categories according to its type.

• ANIMATE ENTITY Category Variables, denoted $h_{i,j,E}$, indicating the Entity category i for NRG vertex j.

• ACTION Category Variables, denoted $h_{i,j,A}$, indicating the Action category i for NRG vertex j.

• QUOTE Category Variables, denoted $h_{i,j,Q}$, indicating the Quote category i for NRG vertex j.

The number of possible categories for each variable type is a hyper-parameter of the model.

Variable activation constraints Category assignments to the same node are mutually exclusive (a node can only have a single category). We encode this fact by constraining the decision with a linear constraint (where X ∈ {E, A, Q}):

$$\forall j:\ \sum_i h_{i,j,X} = 1.$$

Category Assignment Features Each decision variable decomposes into a set of features, $\phi(x, h_{i,j,X})$, capturing the words associated with the j-th vertex, conditioned on X and i.

3.3.2 Common-sense Patterns Variables

We represent common-sense prediction rules using an additional set of Boolean variables, connecting the category assignment variables with the output prediction. The space of possible variables is determined by decomposing the NRG into paths of size up to 2, and associating two Boolean variables with the category assignment variables corresponding to the vertices on these paths. One of the variables associates the sequence of category assignment variables with a REAL output value, and one with a SATIRE output value.

• Single Vertex Path Pattern Variables, denoted by $h^B_{h_{i,j,X}}$, indicating that the category assignment captured by $h_{i,j,X}$ is associated with output value B (where B ∈ {SATIRE, REAL}).

• Two Vertex Path Pattern Variables, denoted by $h^B_{(h_{i,j,X_1}),(h_{k,l,X_2})}$, indicating that the pattern captured by the category assignments along the NRG path of $h_{i,j,X_1}$ and $h_{k,l,X_2}$ is associated with output value B (where B ∈ {SATIRE, REAL}).

Decision Consistency Constraints It is clear that the activation of the common-sense pattern variables entails the activation of the category assignment variables corresponding to the elements of the common-sense patterns. For readability we only write the constraint for the Single Vertex Path variables:


$$h^B_{h_{i,j,X}} \Rightarrow h_{i,j,X}.$$
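For concreteness, implications between Boolean indicators have a standard linear encoding in a 0-1 ILP (the paper does not spell this out, but it is the usual construction and is the form used in the solver sketch below):

$$h^B_{h_{i,j,X}} \Rightarrow h_{i,j,X} \quad\Longleftrightarrow\quad h^B_{h_{i,j,X}} \le h_{i,j,X}.$$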

Features Similar to the category assignment variable features, each decision variable decomposes into a set of features, $\phi(x, h^B_{h_{i,j,X}})$. These features capture the words associated with each of the category assignment variables (in this example, the words associated with the j-th vertex), conditioned on the category assignments and the output prediction value (in this example, X, i and B). We also add a feature $\phi(h_{i,j,X}, B)$ capturing the connection between the output value B and the category assignment.

3.3.3 Satire Prediction Variables

Finally, we add two more Boolean variables corresponding to the output prediction: $h_{Satire}$ and $h_{Real}$. The activation of these two variables is mutually exclusive; we encode that by adding the constraint:

$$h_{Satire} + h_{Real} = 1.$$

We ensure the consistency of our model by adding constraints forcing agreement between the final prediction variables and the common-sense pattern variables:

$$h^B_{h_{i,j,X}} \Rightarrow h^B.$$

Overall Optimization Function

The Boolean variables described in the previous section define a space of competing inferences. We find the optimal output value derivation by finding the optimal set of variable assignments, solving the following objective:

$$\max_{y,h} \sum_i h_i \, w^T \phi(x, h_i, y) \qquad \text{s.t. } C;\ \forall i: h_i \in \{0,1\}, \qquad (1)$$

where $h_i \in H$ is the set of all variables defined above and C is the set of constraints defined over the activation of these variables. w is the weight vector, used to quantify the feature representation of each h, obtained using a feature function $\phi(\cdot)$.

Note that the Boolean variable acts as a 0-1 indicator variable. We formalize Eq. (1) as an ILP instance, which we solve using the highly optimized Gurobi toolkit.3

3 http://www.gurobi.com/
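To make the inference step concrete, below is a minimal sketch of how the 0-1 ILP of Eq. (1) could be assembled with Gurobi's Python API (gurobipy). It only instantiates the single-vertex path patterns, assumes the per-variable scores $w^T\phi$ have already been computed, and uses illustrative names throughout; it is not the authors' implementation, and two-vertex path patterns would be added analogously.

```python
import gurobipy as gp
from gurobipy import GRB

def predict(nodes, score, K=2):
    """Solve the satire/real inference for one article.

    nodes: list of (vertex_id, vertex_type) pairs, vertex_type in {"E", "A", "Q"}
    score: dict mapping a variable key (see below) to its score w^T phi(x, h, y)
    K:     number of latent categories per vertex type (a model hyper-parameter)
    """
    m = gp.Model("comsense")
    m.Params.OutputFlag = 0

    # Category assignment indicators h[i, j, X].
    h = {(i, j, X): m.addVar(vtype=GRB.BINARY, name=f"h_{i}_{j}_{X}")
         for (j, X) in nodes for i in range(K)}
    # Mutual exclusivity: each vertex is assigned exactly one category.
    for (j, X) in nodes:
        m.addConstr(gp.quicksum(h[i, j, X] for i in range(K)) == 1)

    # Output indicators, exactly one of which is active.
    out = {B: m.addVar(vtype=GRB.BINARY, name=f"out_{B}") for B in ("SATIRE", "REAL")}
    m.addConstr(out["SATIRE"] + out["REAL"] == 1)

    # Single-vertex pattern indicators, tied to category assignments and the output.
    p = {}
    for (j, X) in nodes:
        for i in range(K):
            for B in ("SATIRE", "REAL"):
                v = m.addVar(vtype=GRB.BINARY, name=f"p_{i}_{j}_{X}_{B}")
                m.addConstr(v <= h[i, j, X])  # pattern => category assignment
                m.addConstr(v <= out[B])      # pattern => output value
                p[i, j, X, B] = v

    # Objective: total score of all active indicators.
    m.setObjective(
        gp.quicksum(score.get(k, 0.0) * v for k, v in {**h, **p}.items()),
        GRB.MAXIMIZE)
    m.optimize()
    return "SATIRE" if out["SATIRE"].X > 0.5 else "REAL"
```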

4 Parameter Estimation for COMSENSE

The COMSENSE approach models the decision as interactions between high-level categories of entities, actions and utterances. However, the high-level categories assigned to the NRG vertices are not observed, and as a result we view it as a weakly supervised learning problem, where the category assignments correspond to latent variable assignments. We learn the parameters of these assignments by using a discriminative latent structure learning framework.

The training data is a collection $D = \{(x_i, y_i)\}_{i=1}^n$, where $x_i$ is an article, parsed into an NRG representation, and $y_i$ is a binary label, indicating if the article is satirical or real.

Given this data we estimate the model's parameters by minimizing the following objective function:

$$L_D(w) = \min_w \frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n} \xi_i \qquad (2)$$

$\xi_i$ is the slack variable, capturing the margin violation penalty for a given training example, and defined as follows:

$$\xi_i = \max_{y,\mathbf{h}} \left[ f(x, \mathbf{h}, y, w) + cost(y, y_i) \right] - \max_{\mathbf{h}} f(x, \mathbf{h}, y_i, w),$$

where $f(\cdot)$ is a scoring function, similar to the one used in Eq. 1. The cost function is the margin that the true prediction must exceed over the competing label, and it is simply defined as the difference between the model prediction and the gold label. This formulation is an extension of the hinge loss for latent structure SVM. $\lambda$ is the regularization parameter controlling the tradeoff between the $l_2$ regularizer and the slack penalty.

We optimize this objective using the stochastic sub-gradient descent algorithm (Ratliff et al., 2007; Felzenszwalb et al., 2009). We can compute the sub-gradient as follows:

$$\nabla L_D(w) = \lambda w + \sum_{i=1}^{n} \Phi(x_i, y_i, y^*)$$
$$\Phi(x_i, y_i, y^*) = \phi(x_i, \mathbf{h}^*, y_i) - \phi(x_i, \mathbf{h}^*, y^*),$$

where $\phi(x_i, \mathbf{h}^*, y^*)$ is the set of features representing the solution obtained after solving Eq. 1 (modified to accommodate the margin constraint) and making a prediction. $\phi(x_i, \mathbf{h}^*, y_i)$ is the set of features representing the solution obtained by solving Eq. 1 while fixing the outcome of the inference process to the correct prediction (i.e., $y_i$). Intuitively, it can be considered as finding the best explanation for the correct label using the latent variables $\mathbf{h}$.

In the stochastic version of the sub-gradient descent algorithm we approximate $\nabla L_D(w)$ by computing the sub-gradient of a single example and making a local update. This version resembles the latent-structure perceptron algorithm (Sun et al., 2009). We repeatedly iterate over the training examples and, for each example, if the current w leads to a correct prediction (and satisfies the margin constraint), we only shrink w according to $\lambda$. If the model makes an incorrect prediction, the model is updated according to $\Phi(x_i, y_i, y^*)$. The optimization objective $L_D(w)$ is not convex, and the optimization procedure is only guaranteed to converge to a local minimum.
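A minimal sketch of this training loop is given below, assuming the ILP inference above is wrapped in two routines (loss-augmented inference and gold-constrained inference) and that features are sparse dictionaries. All names are illustrative placeholders, not the authors' code.

```python
import random

def train_comsense(data, predict_with_loss, predict_gold, features,
                   lam=1e-3, lr=0.01, epochs=10):
    """Stochastic sub-gradient training of the latent-structure objective (Eq. 2).

    data:              list of (x, y) training pairs
    predict_with_loss: loss-augmented inference, returns (y*, h*) maximizing
                       f(x, h, y, w) + cost(y, y_gold)
    predict_gold:      inference with y fixed to the gold label, returns h*
    features:          phi(x, h, y) as a sparse dict {feature_name: value}
    """
    w = {}
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            # With cost() included, the gold label wins only when the margin holds.
            y_star, h_star = predict_with_loss(x, y, w)
            if y_star != y:
                # Best explanation of the gold label using the latent variables.
                h_gold = predict_gold(x, y, w)
                phi_gold = features(x, h_gold, y)
                phi_pred = features(x, h_star, y_star)
                # Sub-gradient step: move toward gold features, away from prediction.
                for f in set(phi_gold) | set(phi_pred):
                    grad = phi_pred.get(f, 0.0) - phi_gold.get(f, 0.0)
                    w[f] = w.get(f, 0.0) - lr * grad
            # l2 shrinkage from the regularizer, applied at every example.
            for f in w:
                w[f] *= (1.0 - lr * lam)
    return w
```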

5 Empirical Study

We design our experimental evaluation to help clarify several questions. First, we want to understand how our model compares with traditional text classification models. We hypothesize that these methods are more susceptible to overfitting, and design our experiments accordingly. We compare the models' performance when using in-domain data (test and training data are from the same source), and out-of-domain data, where the test data is collected from a different source. We look into two tasks. One is the satire detection task (Burfoot and Baldwin, 2009). We also introduce a new task, called "did I say that?", which only focuses on utterances and speakers.

The second aspect of our evaluation focuses on the common-sense inferences learned by our model. We examine how the size of the set of categories impacts the model performance. We also provide a qualitative analysis of the learned categories using a heat map, capturing the activation strength of learned inferences over the training data.

Prediction tasks We look into two prediction tasks: (1) Satire Detection (denoted SD), a binary classification task in which the model has access to the complete article; (2) "Did I say that?" (denoted DIST), a binary classification task consisting only of entity mentions (and their surrounding context in text) and direct quotes. The goal of the DIST task is to predict if a given utterance is likely to be real, given its speaker. Since not all documents contain direct quotes, we only use a subset of the documents in the SD task.

Datasets In both prediction tasks we look into two settings: (1) in-domain prediction, where the training and test data are collected from the same source, and (2) out-of-domain prediction, where the test data is collected from a different source. We use the data collected by Burfoot and Baldwin (2009) for training the model in both settings, and its test data for in-domain prediction (denoted TRAIN-SD'09, TEST-SD'09, TRAIN-SD'09-DIST, TEST-SD'09-DIST, respectively, for training and testing in the SD and DIST tasks). In addition, we collected a second dataset of satirical and real articles (denoted SD'16). This collection of articles contains real articles from cnn.com and satirical articles from theonion.com, a well known satirical news website. The articles were published between 2010 and 2015, appearing in the political sections of both news websites. Following other work in the field, all datasets are highly skewed toward the negative class (real articles), as this better characterizes a realistic prediction scenario. The statistics of the datasets are summarized in Table 2.

Evaluated Systems We compare several systems, as follows:

ALLPOS: Always predict Satire.

BB'09: Results by Burfoot and Baldwin (2009).

CONV: Convolutional NN. We followed Kim (2014), using pre-trained 300-dimensional word vectors (Mikolov et al., 2013).

LEX: SVM with unigram (LEX_U) or both unigram and bigram (LEX_U+B) features.

NARRLEX: SVM with direct NRG-based features (see Sec. 3.2).

COMSENSE: Our model. We denote the full model as COMSENSE_F, and COMSENSE_Q when using only the entity+quote based patterns.

We tuned all the models' hyper-parameters using a small validation set, consisting of 15% of the training data. After setting the hyper-parameters, the model was retrained using the entire dataset. We used SVM-light5 to train our lexical baseline systems (LEX and NARRLEX). Since the data is highly skewed towards the negative class (REAL), we adjust the learner's objective function cost factor for positive examples to outweigh negative examples. The cost factor was tuned using the validation set.

5 http://svmlight.joachims.org/

5.1 Experimental Results

Since our goal is to identify satirical articles, given significantly more real articles, we report the F-measure of the positive class. The results are summarized in Tables 1 and 3. We can see that in all cases the COMSENSE model obtains the best results. We note that in both tasks, when learning in the out-of-domain setting, performance drops sharply; however, the gap between the COMSENSE model and other models increases in these settings, showing that it is less prone to overfitting.
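For reference, the reported numbers follow the standard definitions of precision, recall, and (balanced) F-measure, computed over the satire (positive) class:

$$P = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}, \qquad F = \frac{2PR}{P+R}.$$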

Interestingly, for the satire detection (SD) task, the COMSENSE_Q model performs best in the in-domain setting, and COMSENSE_F gives the best performance in the out-of-domain setting. We hypothesize that this is due to a phenomenon we call "overfitting to document structure". Lexical models tend to base the decision on word choices specific to the training data, and as a result, when tested on out-of-domain data, which describes new events and entities, performance drops sharply. Instead, the COMSENSE_Q model focuses on properties of quotations and entities appearing in the text. In the SD'09 datasets, this information helps focus the learner, as the real and satire articles are structured differently (for example, satire articles frequently contain multiple quotes). This structure is not maintained when working with out-of-domain data, and indeed in these settings the model benefits from the additional information offered by the full model.

Number of Latent Categories Our COMSENSE model is parametrized with the number of latent categories it considers for each entity, predicate and quote. This hyper-parameter can have a strong influence on the model performance (and running time). Increasing it adds to the model's expressivity, allowing it to learn more complex patterns, but also defines a more complex learning problem (recall our non-convex learning objective function). We focused on the DIST task when evaluating different configurations as it converged much faster than the full model. Figure 4 plots the model behavior when using different numbers of latent categories. Interestingly, the number of entity categories saturates faster than the number of quote categories. This can be attributed to the limited text describing entities.

Figure 4: Different numbers of latent categories (F-score on the DIST task as a function of the number of quote categories). EV denotes the number of entity categories used, and Quote Vars denotes the number of quote categories used.

Visualizing Latent COMSENSE Patterns Given the assignment to latent categories, our model learns common-sense patterns for identifying satirical and real articles based on these categories. Ideally, these patterns could be extracted directly from the data; however, providing the resources for this additional prediction task is not straightforward. Instead, we view the category assignments as latent variables, which raises the question: what are the categories learned by the model?

In this section we provide a qualitative evaluation of these categories and the prediction rules identified by the system using the heat map in Figure 5. For simplicity, we focus on the DIST task, which only has categories corresponding to entities and quotes.

Task: SD
             In-domain (SD'09+SD'09)     Out-of-domain (SD'09+SD'16)
System       P      R      F             P      R      F
ALLPOS       0.063  1      0.118         0.121  1      0.214
BB'09        0.945  0.690  0.798         -      -      -
CONV         0.822  0.531  0.614         0.517  0.310  0.452
LEX_U        0.920  0.690  0.790         0.298  0.579  0.394
LEX_U+B      0.840  0.720  0.775         0.347  0.367  0.356
NARRLEX      0.690  0.590  0.630         0.271  0.425  0.330
COMSENSE_Q   0.839  0.780  0.808         0.317  0.706  0.438
COMSENSE_F   0.853  0.700  0.770         0.386  0.693  0.496

Table 1: Results for the SD task.

Data                 REAL   SATIRE
TRAIN-SD'09          2505   133
TEST-SD'09           1495   100
TEST-SD'16           3117   433
TRAIN-SD'09-DIST     1160   112
TEST-SD'09-DIST      680    85
TEST-SD'16-DIST      1964   362

Table 2: Dataset statistics.

Figure 5: Visualization of the categories learned by the models. Color coding captures the activation strength of manually constructed topical word groups, according to each latent category. Darker colors indicate higher values. Ei (Qi) indicates an entity (quote) variable assigned the i-th category.

(a) Prediction Rules These patterns are expressed as rules, mapping category assignments to output values. In the DIST task, we consider combinations of entity and quote category pairs, denoted Ei,Qj, in the heat map. The top part of Figure 5, in red, shows the activation strength of each of the category combinations when making predictions over the training data. Darker colors correspond to larger values, which were computed as:

$$cell(C_E, C_Q, B) = \frac{\sum_j h^B_{(h_{C_E,j,E}),(h_{C_Q,j,Q})}}{\sum_{j,k,l} h^B_{(h_{k,j,E}),(h_{l,j,Q})}}$$

Intuitively, each cell value in Figure 5 is the number of times each category pattern appeared in REAL or SATIRE output predictions, normalized by the overall number of pattern activations for each output.
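A minimal sketch of this normalization, assuming the active pattern variables from the training-set predictions have already been collected into a list of (entity category, quote category, output) triples; the function and argument names are illustrative, not the authors' analysis code.

```python
from collections import defaultdict

def heatmap_cells(activations, num_E=2, num_Q=3):
    """Normalized activation strength of each (entity, quote) category pair.

    activations: list of (C_E, C_Q, B) triples, one per active pattern variable,
                 where B is "SATIRE" or "REAL".
    Returns {(C_E, C_Q, B): value}, each value normalized by the total number of
    pattern activations for output B (the quantity plotted in Figure 5).
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for ce, cq, b in activations:
        counts[(ce, cq, b)] += 1
        totals[b] += 1
    return {(ce, cq, b): counts[(ce, cq, b)] / totals[b]
            for ce in range(num_E) for cq in range(num_Q)
            for b in ("SATIRE", "REAL") if totals[b]}
```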

We assume that different patterns will be associated with satirical and real articles, and indeed we can see that most entities and quotes appearing in REAL articles fall into a distinctive category pattern, E0, Q0. Interestingly, there is some overlap between the two predictions in the most active SATIRE category (E1, Q0). We hypothesize that this is due to the fact that the two article types have some overlap.

Task: DIST
             In-domain (DIST'09+DIST'09)   Out-of-domain (DIST'09+DIST'16)
System       P      R      F               P      R      F
ALLPOS       0.110  1      0.198           0.155  1      0.268
LEX_U        0.837  0.423  0.561           0.407  0.328  0.363
COMSENSE_Q   0.712  0.553  0.622           0.404  0.561  0.469

Table 3: Results for the DIST task.

(b) Associating topic words with learned categories In order to understand the entity and quote categories emerging from the training phase, we look at the activation strength of each category pattern with respect to a set of topic words. We manually identified a set of entity types and quote topics which are likely to appear in political articles. We associate a list of words with each one of these types. For example, the entity topic PRESIDENT was associated with words such as president, vice-president, Obama, Biden, Bush, Clinton. Similarly, we associated with the quote topic PROFANITY a list of profanity words. We associate seven types with quote categories corresponding to style and topic, namely PROFANITY, DRUGS, POLITENESS, SCIENCE, LEGAL, POLITICS, CONTROVERSY, and another set of seven types with entity types, namely PRESIDENT, LIBERAL, CONSERVATIVE, ANONYMOUS, POLITICS, SPEAKER, LAW ENFORCEMENT.

In the bottom left part of Figure 5 (in blue), we show the activation strength of each category with respect to the set of selected quote topics. Intuitively, we count the number of times the words associated with a given topic appeared in the text span corresponding to a category assignment pair, separately for each output prediction. We normalize this value by the total number of topic word occurrences, over all category assignment pairs. Note that we only look at the text span corresponding to quote vertices in the NRG. We provide a similar analysis for entity categories in the bottom right part of Figure 5 (in green). We show the activation strength of each category with respect to the set of selected entity topic words. As can be expected, we can see that profanity words are only associated with satirical categories, and even more interestingly, when words appear in both satirical and real predictions, they tend to fall into different categories. For example, the topic words related to DRUGS can appear in real articles discussing alcohol and drug policies, but topic words related to drugs also appear in satirical articles portraying politicians using these substances. While these are only qualitative results, we believe they provide strong intuitions for future work, especially considering the fact that the activation values do not rely on direct supervision, and only reflect the common-sense patterns emerging from the learned model.

6 Summary and Future Work

In this paper we presented a latent variable model for satire detection. We followed the observation that satire detection is inherently a semantic task and modeled the common-sense inferences required for it using a latent variable framework.

We designed our experiments specifically to examine if our model can generalize better than unstructured lexical models by testing it on out-of-domain data. Our experiments show that in these challenging settings, the performance gap between our approach and the unstructured models increases, demonstrating the effectiveness of our approach.

In this paper we restricted ourselves to a limited narrative representation. In the future we intend to study how to extend this representation to capture more nuanced information.

Learning common-sense representations for prediction problems has considerable potential for NLP applications. As the NLP community considers increasingly challenging tasks focusing on semantic and pragmatic aspects, the importance of finding such common-sense representations will increase. In this paper we demonstrated the potential of common-sense representations for one application. We hope these results will serve as a starting point for other studies in this direction.

References

Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural logic inference for common sense reasoning. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Abby Vander Linden, Brittany Harding, Brad Huang, Peter Clark, and Christopher D. Manning. 2014. Modeling biological processes for reading comprehension. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Proc. of the National Conference on Artificial Intelligence (AAAI).

Clint Burfoot and Timothy Baldwin. 2009. Automatic satire detection: Are you having a laugh? In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL).

David K. Elson. 2012. DramaBank: Annotating agency in narrative discourse. In Proc. of the International Conference on Language Resources and Evaluation (LREC).

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. 2009. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(1).

Elena Filatova. 2012. Irony and sarcasm: Corpus generation and analysis using crowdsourcing. In Proc. of the International Conference on Language Resources and Evaluation (LREC).

Matt Gerber, Andrew S. Gordon, and Kenji Sagae. 2010. Open-domain commonsense reasoning using discourse relations from a corpus of weblog stories. In Proc. of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading.

Roberto Gonzalez-Ibanez, Smaranda Muresan, and Nina Wacholder. 2011. Identifying sarcasm in Twitter: a closer look. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Andrew S. Gordon, Cosmin Adrian Bejan, and Kenji Sagae. 2011. Commonsense causal reasoning using millions of personal stories. In Proc. of the National Conference on Artificial Intelligence (AAAI).

Andrew S. Gordon, Zornitsa Kozareva, and Melissa Roemmele. 2012. SemEval-2012 task 7: Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proc. of the Sixth International Workshop on Semantic Evaluation.

Amit Goyal, Ellen Riloff, and Hal Daume III. 2010. Automatically producing plot unit representations for narrative text. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Jerry R. Hobbs, Mark Stickel, Paul Martin, and Douglas Edwards. 1988. Interpretation as abduction. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Jihen Karoui, Farah Benamara, Véronique Moriceau, Nathalie Aussenac-Gilles, and Lamia Hadrich Belguith. 2015. Towards a contextual pragmatic model to detect irony in tweets. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Igor Labutov and Hod Lipson. 2012. Humor as circuits in semantic networks. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Wendy G. Lehnert. 1981. Plot units and narrative summarization. Cognitive Science, 5(4):293–331.

Douglas B. Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38.

Hugo Liu and Push Singh. 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4):211–226.

Peter LoBue and Alexander Yates. 2011. Types of common-sense knowledge needed for recognizing textual entailment. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Stephanie Lukin and Marilyn Walker. 2013. Really? Well. Apparently bootstrapping improves the performance of sarcasm and nastiness classifiers for online dialogue. In Proc. of the Workshop on Language Analysis in Social Media.

Inderjeet Mani. 2012. Computational Modeling of Narrative. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Andre F. T. Martins, Noah A. Smith, and Eric P. Xing. 2009. Polyhedral outer approximations with application to natural language parsing. In Proc. of the International Conference on Machine Learning (ICML).

J. McCarthy. 1980. Circumscription: a form of non-monotonic reasoning. Artificial Intelligence, 13(1,2).

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In The Conference on Advances in Neural Information Processing Systems (NIPS).

Tim O'Keefe, Silvia Pareti, James R. Curran, Irena Koprinska, and Matthew Honnibal. 2012. A sequence labelling approach to quote attribution. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.

V. Punyakanok, D. Roth, and W. Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2).

Nathan D. Ratliff, J. Andrew Bagnell, and Martin Zinkevich. 2007. (Approximate) subgradient methods for structured prediction. In Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS).

Raymond Reiter. 1980. A logic for default reasoning. Artificial Intelligence, 13(1):81–132.

Antonio Reyes, Paolo Rosso, and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239–268.

Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In The Conference on Advances in Neural Information Processing Systems (NIPS).

Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, and Junichi Tsujii. 2009. Latent variable perceptron algorithm for structured classification. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI).

Niket Tandon, Gerard De Melo, and Gerhard Weikum. 2011. Deriving a web-scale common sense fact database. In Proc. of the National Conference on Artificial Intelligence (AAAI).

Joseph Tepperman, David R. Traum, and Shrikanth Narayanan. 2006. "Yeah right": Sarcasm recognition for spoken dialogue systems. In Proc. of Interspeech.

Byron C. Wallace, Do Kook Choe, Laura Kertz, and Eugene Charniak. 2014. Humans require context to infer ironic intent (so computers probably do, too). In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Byron C. Wallace, Do Kook Choe, and Eugene Charniak. 2015. Sparse, contextually informed models for irony detection: Exploiting user communities, entities and sentiment. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. 2015. Machine comprehension with syntax, frames, and semantics. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).
