[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...
The Mental State of Influencers
D.B. Skillicorn
School of Computing
Queen’s University
Kingston, Canada
C. Leuprecht
School of Policy Studies
Queen’s University
Kingston, Canada
Abstract—Most analysis of influence looks at the mechanisms used, and how effectively they work on the intended audience. Here we consider influence from another perspective: what do the language choices made by influencers enable us to detect about their internal mental state, strategies and assessments of success? We do this by examining the language used by the U.S. presidential candidates in the high-stakes attempt to get elected. Such candidates try to influence potential voters, but must also pay attention to the parallel attempts by their competitors to influence the same pool. We examine seven channels: persona deception, the attempt by each candidate to seem as attractive as possible; nouns, as surrogates for content; positive and negative language; and three categories that have received little attention, verbs, adverbs, and adjectives. Although the results are preliminary, several intuitive and expected hypotheses are supported, but some unexpected and surprising structures also emerge. The results provide insights into related influence scenarios where open-source data is available, for example marketing, business reporting, and intelligence.
I. INTRODUCTION
We address the problem of language-based influence in settings where there is a set of influencers and an audience to be influenced. Some examples of such settings are shown in Table I. However, unlike most work on influence, we do not consider the process from the perspective of the audience (how well does it work?) nor from the perspective of the influencer (how can I do it better?). Rather we consider what the choices, conscious and subconscious, of the influencer reveal about him/her/them and, to some extent, how this changes over time. In other words, we use the influencing language as a lens into the thinking of the influencer, rather than as a channel for influence itself. In fact, because much language choice is partly or completely subconscious, we use language patterns to access aspects of influencers and their organizations that may not even be obvious to themselves.
This kind of influencing language cannot be modelled as a one-way channel from influencer to influenced. An influencer is almost always concerned with the audience’s reaction, whether in real-time in a speech, or at longer time scales using polling or sentiment analysis. Feedback is an essential part of the process.
Influencer    Audience      Setting
Businesses    Consumers     Advertising
Terrorists    Sympathizers  Recruitment
Politicians   Voters        Campaigns
TABLE I. INFLUENCE SETTINGS
The main constraints on an influencer are of four different kinds:
1) Intrinsic capabilities, such as linguistic ability and personality;
2) Goals, the purpose to be served by the communication;
3) Responses, judgements about the effectiveness on the target audience; and
4) The language of competitors within the same sphere of influence.
The first three drivers are not very different from those posited by systemic functional linguists for all communication [10, 16].
We examine these drivers by exploring influence language using seven different “channels”:
1) Persona deception, the attempt by individuals or organizations to appear better, wiser, smarter, and/or more experienced or more qualified than they really are.
2) Content, which measures how each influencer chooses among particular topics to appeal to audience interests or concerns, and to gain an edge over competitors.
3) Positivity, the attempt to appeal emotionally to an audience in an uplifting way.
4) Negativity, both as a signal of openness by the influencer, and as a way to denigrate competitors.
5) Verb usage, which measures how each influencer chooses among different families of verbs, and among verb tenses.
6) Adverb usage, which measures how each influencer modifies the chosen verbs.
7) Adjective usage, which measures how each influencer modifies the chosen nouns (if at all).
These channels may play slightly different roles in different settings. For example, businesses use a form of persona deception to build up their brands, but are rarely negative, even about competitors. Politicians use persona deception to present themselves as ready for the position for which they are campaigning, but are quite willing to be negative about their competitors.
For some of these channels, there are plausible hypotheses that can be tested. For example, we expect that politicians will try as hard as they are able to present themselves as well-suited for the position they are campaigning for, and better suited than their competitors. Hence we might expect high levels of persona deception and at least a certain amount of negativity in campaign speeches. We might expect broad consistency in the choice of nouns, and general agreement among competitors, perhaps even more strongly within subdomains such as candidates from the same political party or industries selling the same products. We might expect high levels of positivity to maintain an upbeat relationship with the audience.
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.160
On the other hand, there are few plausible hypotheses about the use of verbs, adjectives, and adverbs in influence (other than those implied by their participation in other kinds of language). Here we provide exploratory information about how these word classes are actually used that may eventually lead to more robust hypotheses.
The contribution of this paper is to provide empirical evidence for the use of these seven classes of markers in the U.S. presidential campaigns of 2008 and 2012; and to provide partial support for some of the hypotheses mentioned above. The methodology is also of some interest, since it is lightweight and generalizes to other domains in which influence is crucial. This approach can be applied to other open-source influence scenarios where the language used is available for analysis, including marketing and some kinds of advertising, business reporting, and intelligence. However, the focus of the approach is what language tells us about the source organizations and individuals, not the population segment intended to be influenced.
In Section II, we explain the word lists that we use to define each of the seven channels. In Section III we comment on related work. In Section IV we lay out the experimental framework. Section V presents the results of the analysis. Finally, in Section VI we draw some conclusions and suggest some open problems raised by the results.
II. CHANNELS OF INFLUENCE
We consider political speeches, and extract from them patterns of language use on seven different channels.
Persona deception, the attempt to appear better than one is, is a form of deception and is captured well by Pennebaker’s deception model [12]. This model characterizes deception as causing changes in the frequencies of 86 words, in four categories. In deceptive text:
1) First-person singular pronouns (“I”, “me”, “myself”) decrease in frequency;
2) Exclusive words, words that signal an increase in content complexity (“but”, “or”), decrease in frequency;
3) Negative-emotion words (“afraid”, “disappointed”) increase in frequency;
4) Action verbs (“go”, “lead”) increase in frequency.
Since the meaning of “increase” and “decrease” depends on some baseline, this model can only be used to rank a group of cognate documents by their relative deceptiveness, not label an individual document as deceptive or not. For example, in business writing, first-person singular pronouns are rare, so a mixture of such documents and, say, blogs would produce non-comparable deceptiveness scores.
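The relative nature of the deception model can be illustrated with a small sketch. It is not the procedure used in this paper (which applies SVD to per-word frequencies); it is a simplified ranking that sums signed, corpus-relative z-scores per category. The word sets below are only the examples quoted above, not the full 86-word list, and the names `CATEGORIES`, `category_rates`, and `rank_by_relative_deceptiveness` are hypothetical.

```python
import re
from statistics import mean, pstdev

# Illustrative subsets only: the example words quoted in the text,
# NOT the full 86-word list of the deception model.
# The sign is +1 when an increase signals deception, -1 when a decrease does.
CATEGORIES = {
    "first_person": ({"i", "me", "myself"}, -1),
    "exclusive": ({"but", "or"}, -1),
    "negative_emotion": ({"afraid", "disappointed"}, +1),
    "action_verbs": ({"go", "lead"}, +1),
}

def category_rates(text):
    """Rate of use of each category, as a fraction of all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return {name: sum(t in words for t in tokens) / total
            for name, (words, _) in CATEGORIES.items()}

def rank_by_relative_deceptiveness(docs):
    """Rank documents against each other; scores mean nothing in isolation."""
    rates = [category_rates(d) for d in docs]
    scores = [0.0] * len(docs)
    for name, (_, sign) in CATEGORIES.items():
        column = [r[name] for r in rates]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # this category does not separate the documents
        for i, value in enumerate(column):
            scores[i] += sign * (value - mu) / sd
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Because every score is computed against the corpus mean, adding documents from a different genre shifts the baseline and changes every score, which is why the model only ranks cognate documents.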
For models based on parts of speech, we extract document content using the QTagger [4], a part-of-speech aware tagger that uses Lingpipe [1] as a front-end. This enables all words tagged as nouns, as verbs, as adverbs, and as adjectives to be extracted and their frequencies counted.
Positivity and negativity are computed based on lists of positive and negative words created by Loughran and McDonald [9] for business settings. Since political speeches often have a significant economic component, these lists were judged to be more appropriate than the more general lists used in sentiment analysis, which are targeted at consumer products, books, and movies.
III. RELATED WORK
There is a large literature on sentiment analysis (for example, [5, 7, 11, 14, 17]) including attempts to predict election outcomes [15]. That work addresses the converse of the problem we address here: how to understand the reaction of an audience to the effects of an influencer.
There is also a truly huge literature on the interaction between politics and language, addressing issues such as how the choice of particular words frames a political discourse, how words change with political changes, and how politicians can be better salespeople. This work is orthogonal to the problem we address here, since it is primarily concerned with improving the process of influencing.
The work closest to ours is work by psychologists who have investigated the relationship between language use and mental state, for example Chung and Pennebaker’s work [2, 3] on how words reveal psychological profiles and health, Newman and Pennebaker’s work on deception [12], and Tausczik and Pennebaker’s recent survey [13]. Attempts to identify ideology from language usage, for example Koppel et al. [8], are also relevant.
IV. EXPERIMENTAL FRAMEWORK
The campaign speeches of presidential candidates in the period from January 2008 to the election (McCain, Clinton, Obama) and from January 2012 to early April 2012 (Gingrich, Paul, Romney, Santorum, Obama) were collected. These speeches were, of course, often the product of many authors, so the insight they provide is insight into the thinking of each campaign more than each candidate. That said, the individuals who delivered each speech do make alterations on-the-fly, and speeches obviously based on the same script differ substantially from one day to the next – so some insight into individuals is possible.
The complete set of speeches and either a specific word list or selected parts-of-speech tags were given to the QTagger, which extracted a document-word matrix in which each entry represents the frequency of (the tagged version of) each word in each speech.
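The shape of such a document-word matrix can be sketched as follows. This is only an illustration: it counts untagged, lower-cased tokens, whereas the QTagger counts words together with their part-of-speech tags. The function name `document_word_matrix` and the tokenizing regular expression are assumptions made for the example.

```python
import re
from collections import Counter

def document_word_matrix(speeches, word_list=None):
    """Return (vocab, matrix) with one row per speech and one column per word.

    Simplification: tokens are untagged, so this does not distinguish,
    for example, "mine" as a noun from "mine" as a pronoun.
    """
    counts = [Counter(re.findall(r"[a-z']+", s.lower())) for s in speeches]
    if word_list is not None:
        vocab = sorted(word_list)                 # fixed marker list
    else:
        vocab = sorted(set().union(*counts))      # every word that occurs
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix
```

Passing a `word_list` corresponds to the word-list models (deception, positive, negative); omitting it corresponds loosely to the part-of-speech models, with the tagging step left out.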
Each row of the matrix is normalized by dividing by the row sum to reduce the effect of the difference in speech lengths. Each column of the matrix is normalized to z-scores using only the non-zero entries (subtracting the column mean from each column entry and dividing by the column standard deviation). The effect of this is that positive values in the normalized matrix represent greater than typical rates of use, negative values represent lower than typical rates of use, while absence and median use are conflated to small magnitude entries. This conflation is problematic but justifiable, and there is no reasonable workaround.
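A minimal sketch of this two-step normalization, assuming NumPy is available. The text does not say whether the population or sample standard deviation is used; the population form is assumed here, and zero entries are left at zero so that absence sits near typical use, as described above. The function name `normalize` is hypothetical.

```python
import numpy as np

def normalize(counts):
    """Row-normalize by row sum, then z-score each column over non-zero entries."""
    A = np.asarray(counts, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # an empty speech stays all-zero
    A = A / row_sums

    Z = np.zeros_like(A)                   # absent words remain exactly 0
    for j in range(A.shape[1]):
        nonzero = A[:, j] != 0
        if nonzero.sum() < 2:
            continue                       # too few entries to define a spread
        column = A[nonzero, j]
        sd = column.std()                  # population standard deviation (assumed)
        if sd > 0:
            Z[nonzero, j] = (column - column.mean()) / sd
    return Z
```

A zero in the result can therefore mean either that the word was not used or that it was used at exactly the mean rate, which is the conflation the text points out.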
In the deception model, the signs of those columns corresponding to words where a decrease in frequency is the signal of an increase in deception are reversed, so that positive entries always correspond to increasing deception.
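The sign reversal can be sketched as a column-wise operation on the normalized matrix, assuming NumPy. The function name `orient_for_deception` and the argument `decreasing_words` (the set of markers whose decrease signals deception, that is, the first-person singular pronouns and exclusive words) are hypothetical.

```python
import numpy as np

def orient_for_deception(Z, vocab, decreasing_words):
    """Flip columns so that positive entries always indicate more deception."""
    Z = np.array(Z, dtype=float)           # work on a copy
    flip = np.array([w in decreasing_words for w in vocab], dtype=bool)
    Z[:, flip] *= -1.0
    return Z
```

For example, a speech that uses “I” less often than is typical has a negative z-score in that column, which becomes positive after the flip.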
If the data matrix has n rows (1 per document) and m columns (1 per word), then each document can be considered as a point in m-dimensional space. Understanding of the relationships among documents can be increased by, for example, clustering in this space to find groups of similar documents.
However, m is typically large and there are advantages in projecting the m-dimensional space into one of lower dimension. Doing so removes some of the noise associated with polysemy, and also makes it possible to visualize the relationships among the documents.
We do this using singular value decomposition (SVD) [6]. If A is the data matrix (n × m), then the SVD is given by:
$$A = U S V'$$
where U is n × m, S is a diagonal matrix whose entries, called singular values, are non-increasing, V is m × m, and the superscript dash indicates transposition. Both U and V are orthogonal matrices.
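As an illustration, the decomposition can be computed with NumPy's `numpy.linalg.svd`. Note one difference from the shapes stated above: with `full_matrices=False`, NumPy returns the reduced form, in which U is n × r and V′ is r × m with r = min(n, m). The wrapper name `svd_of_data_matrix` is hypothetical.

```python
import numpy as np

def svd_of_data_matrix(A):
    """Return U, the singular values s, and V' (Vt) such that A = U diag(s) Vt."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U, s, Vt

# Small made-up matrix: 4 documents, 3 words.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0],
              [2.0, 2.0, 2.0]])
U, s, Vt = svd_of_data_matrix(A)
reconstructed = U @ np.diag(s) @ Vt        # equals A up to rounding error
```

The singular values come back sorted from largest to smallest, matching the non-increasing ordering described in the text.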
One interpretation of SVD is that it rotates the space so that the greatest variation is aligned with the axis given by the first row of V′, the second-greatest uncorrelated variation with the second row of V′, and so on. The rows of U are then the coordinates of each document in this transformed space.
Because of the ordering of the axes, the decomposition can be truncated after some k dimensions (a form of projection) so that
$$A \approx U_k S_k V_k'$$
where U_k is n × k, S_k is k × k and V_k′ is k × m. The magnitudes of the singular values indicate how much variation is captured in each new direction and so estimate how much is being ignored by truncation at any choice of k. The role of the truncated SVD, therefore, is to create a data-dependent projection of the data along axes that show the ways in which the documents vary.
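A short sketch of the truncation, assuming NumPy. The share of variation retained is measured here by the squared singular values, which is one common convention and an assumption on our part; the text only says that the magnitudes of the singular values indicate it. The function name `truncate_svd` is hypothetical.

```python
import numpy as np

def truncate_svd(A, k):
    """Rank-k approximation of A and the share of variation it retains."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    total = float((s ** 2).sum())
    # Assumed measure: squared singular values as the share of variation.
    captured = float((s[:k] ** 2).sum()) / total if total > 0 else 0.0
    return A_k, captured
```

For the 2 × 2 matrix with rows (3, 0) and (0, 1), truncating at k = 1 keeps the first direction only and retains 9/10 of the variation under this measure.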
If the data consists of a single underlying factor then the use of SVD enables this to be detected as an almost-linear structure in the low-dimensional space. Because of the inherent symmetry between documents and words – if A = USV′ then A′ = VSU′ – points corresponding to both documents and words can be plotted in the same space (appropriately scaled). In such a plot, documents can be considered as being drawn towards the points corresponding to words that occur within them with higher frequency and, symmetrically, words as being drawn to documents in which they feature. The first k columns of both the U and V matrices can be plotted in k dimensions, with a point corresponding to each document and to each tagged word. The relationships among all of these can then be observed in the plot: both documents and words may be clustered. More importantly, the clusters agree in the sense that a cluster of documents in the same region as a cluster of words indicates why the documents cluster (they use similar words) and, symmetrically, why the words cluster (they appear in similar documents).
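The joint document and word coordinates can be sketched as follows, assuming NumPy. The scaling mentioned in the text is not specified, so none is applied here. The second helper mirrors the labelling rule stated in the figure captions (words are labelled when their distance from the origin exceeds twice the median). Both function names are hypothetical.

```python
import numpy as np

def joint_coordinates(A, k=2):
    """First k columns of U (documents) and of V (words) as plot coordinates."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    doc_xy = U[:, :k]          # one row per document
    word_xy = Vt[:k, :].T      # one row per word
    return doc_xy, word_xy

def words_to_label(word_xy, vocab, factor=2.0):
    """Words whose distance from the origin exceeds factor times the median."""
    dist = np.linalg.norm(np.asarray(word_xy, dtype=float), axis=1)
    threshold = factor * np.median(dist)
    return [w for w, d in zip(vocab, dist) if d > threshold]
```

Plotting `doc_xy` and `word_xy` on the same axes gives the kind of overlay used in the figures, in which a group of documents lying near a group of words suggests which words account for the grouping.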
V. RESULTS
Throughout this section, the results are coded as shown in Table II. In all of the figures, distance from the origin corresponds to increased significance, and directions indicate different kinds of variation.
Politician      Symbol
Clinton (2008)  magenta star
McCain (2008)   cyan star
Obama (2008)    red star
Gingrich        green circle
Paul            yellow circle
Romney          blue circle
Santorum        black circle
Obama (2012)    red square
TABLE II. SYMBOL CODING FOR RESULTS
The number of markers used in each experiment is given in Table III. Each marker represents a particular word with a particular part-of-speech tag; hence, although the deception model uses 86 words, 98 tagged words were counted. Clearly spurious words were removed, but the quality of parsing did not make it possible to determine every spurious possibility (for example, “mine” as a noun instead of a pronoun). For all models, any word that occurred in more than 1 document was counted.
Model       Number of markers
Deception   98
Nouns       5169
Positive    246
Negative    763
Verbs       3299
Adverbs     453
Adjectives  1428
TABLE III. NUMBER OF TAGGED WORDS USED FOR EACH MODEL
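The rule that only words occurring in more than one document are counted can be sketched as a column filter on the raw count matrix, assuming NumPy. The function name `keep_shared_words` and the parameter `min_docs` are hypothetical.

```python
import numpy as np

def keep_shared_words(counts, vocab, min_docs=2):
    """Drop columns for words that occur in fewer than min_docs documents."""
    A = np.asarray(counts)
    doc_freq = (A > 0).sum(axis=0)         # number of documents using each word
    keep = doc_freq >= min_docs
    kept_vocab = [w for w, flag in zip(vocab, keep) if flag]
    return A[:, keep], kept_vocab
```

Applied before normalization, this removes words that cannot contribute to relationships among documents because only one document uses them.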
A. Persona deception
The deception model produces the most puzzling results of all of the marker sets. Typically, this model produces a single-factor structure, ranking documents from most to least deceptive. In this case, the structure is more complex (see Figure 1).
[Plot not reproduced: deception-model words plotted on axes UV1 and UV2.]
Fig. 1. Speech distribution based on the deception model. The axes (here and in subsequent figures) are the directions of maximal variation (UV1) in both the documents and words, and the next-largest uncorrelated variation (UV2).
In each direction from the center, there are different mixtures of words from the four word classes: first-person singular pronouns, exclusive words, negative-emotion words, and action verbs. This might suggest that different candidates have their own individual patterns of use of these words, and so an individual style of persona deception – but this turns out not to be the case. Rather, individuals over time change the words they use, so that their trajectories in this space tend to be roughly circular or spiral. These trajectories are shown in Figures 2–9.
B. Nouns
When we consider the content of speeches, as determined from the nouns that they use, it is clear that each candidate has a central topic focus. They tend to spend most of their time on particular topics (or, more accurately, topic blends) and although they do talk about other issues, they do so only fleetingly.
Figure 10 shows the relative topic mix of all of the candidates in 2008 and 2012. The greater number of speeches from the 2008 campaign dominate but, despite this, seem to cover a wider range of topics than those of the 2012 campaign so far. Throughout, McCain shows a much wider variety of language use than any other candidate. Figures 11–18 show the trajectory of content for each individual candidate. Although the lack of data for the 2012 campaign is limiting, it is clear that each candidate has a topic mixture about which they talk; and occasional deviations are almost always followed by an immediate return to this topic center.
The actual topics can be determined by plotting the words, as shown in Figure 19. The extremal, and so most interesting, words are of four kinds: those associated with the economy (left), energy (bottom), security (lower right), and patriotism (upper right), with these latter two (unsurprisingly) blending somewhat.
[Figs. 2–9: plots not reproduced; each shows one candidate’s numbered speeches on axes U1 and U2.]
Fig. 2. Deception–Gingrich
Fig. 3. Deception–Paul
Fig. 4. Deception–Santorum
Fig. 5. Deception–Romney
Fig. 6. Deception–Obama (2012)
Fig. 7. Deception–Obama (2008)
Fig. 8. Deception–Clinton
Fig. 9. Deception–McCain
C. Positive words
The most striking fact about the use of positive words is that they separate into two almost orthogonal groups as shown in Figure 20. The most significant words in one group are words such as “innovation”, “ingenuity”, and words with the root “invent”; in the other group are words such as “stability”, “stabilizing”, and “rebound”. In other words, one group has a positive sense that emphasizes novelty, while the other has a positive sense that emphasizes a re-creation of some aspect of the past. Thus two very different mindsets, both in a sense positive, are induced from the data. Individuals do not seem to be strongly associated with one or other of these word groups – rather, they change from one viewpoint to the other in different speeches, rarely using both in a single speech. Figure 21 shows the positions corresponding to each speech, overlaid on the words used.
[Plot not reproduced: speeches on axes U1 and U2.]
Fig. 10. Candidates’ topic focus (stars: 2008, circles: 2012 Republicans, square: 2012 Obama)
D. Negative words
The negative words form four groups, as shown in Figure 22: words related to crime and violence at the top left (“crimes”), words related to hostility at the lower left (“suspicion”); political loss words at the lower right (“weakened”, “defeats”), and economic negative words at the right (“crisis”, “cut”).
Unsurprisingly, speeches are most often associated with the negative economic words, in both election cycles, with an occasional foray into the negative crime and violence words. The hostility words are almost exclusively associated with the 2012 cycle. Figure 23 shows the speeches overlaid on the word usage.
E. Verbs
The verbs form a one-dimensional continuum (with an orthogonal branch associated entirely with speeches by McCain). At one end are verbs such as “get”, “invest”, “cut”, “give”, “pay”, “need”, and “afford”, which seem to be determined by economic content; at the other are words such as “belongs”, “deprived”, “command”, “seek”, “tolerate”, “detest”, “involve”, and “destabilize” that seem to be associated with an international focus, with an emphasis on hostile relationships. The verbs are shown in Figure 24. Past tenses are rare, suggesting that candidates attempt to engage the audience in the present and future, rather than the past. This sheds some further light on the split of positive words – the intent seems not to be to return to some positive past but to recreate a positive past.
Figure 25 shows the speeches overlaid with the verbs. Unsurprisingly, the majority of speeches are associated with the economic verbs.
F. Adverbs
The pattern of use of adverbs is shown in Figure 26. Despite appearances, this is a single factor structure – the vertical branching is almost entirely due to McCain’s prolific and divergent use of adverbs at the right-hand end of the spectrum. The spectrum itself is hard to characterize, except that adverbs at the left-hand end seem to be positive-sounding and to have a sense of motion (“swiftly”, “boldly”, “briefly”). Significant adverbs at the right-hand end of the spectrum include “irresponsibly”, “intentionally”, and “illegally”, which arguably have negative connotations; but also “humanely” and “occasionally”.
The speeches overlaid with the adverbs are shown in Figure 27.
[Figs. 11–18: plots not reproduced; each shows one candidate’s numbered speeches on axes U1 and U2.]
Fig. 11. Nouns–Gingrich
Fig. 12. Nouns–Paul
Fig. 13. Nouns–Santorum
Fig. 14. Nouns–Romney
Fig. 15. Nouns–Obama (2012)
Fig. 16. Nouns–Obama (2008)
Fig. 17. Nouns–Clinton
Fig. 18. Nouns–McCain
G. Adjectives
The pattern of use of adjectives is shown in Figure 28. There are three strong clusters of significant adjectives: those associated with energy (upper left), those associated with security (right-hand side), and those associated with patriotism (and especially veterans) at the bottom of the figure. As expected, there is general agreement between this structure and the structure of the nouns except that there are no significant adjectives associated with the economic nouns.
Figure 29 shows the speeches overlaid with the adjectives, making it clear how popular energy-related adjectives are.
[Figs. 19–20: plots not reproduced; each shows labelled words on axes V1 and V2.]
Fig. 19. Nouns labelled when their distance from the origin exceeds twice the median
Fig. 20. Positive words labelled when their distance from the origin exceeds twice the median
VI. DISCUSSION AND CONCLUSIONS
Persona deception is an important part of politics since candidate success usually depends on convincing independents or swing voters. Neither particular policies nor factual deception seem to have much effect, perhaps because voters have become cynical about both. Our results are puzzling, since there is no simple spectrum of these speeches from most to least persona deceptive.
We suggested that an intuitive hypothesis is that candidates will talk about the same issues, and so tend to use the same nouns, with perhaps even greater agreement for candidates from the same party. This appears to be broadly supported by the data, although each candidate carves out a region of the space of topics where they tend to concentrate. Over time, they make occasional forays into other topic mixes, but the path of topics chosen over time tends to be star-like. The candidates in the 2008 election cycle covered a much broader range of topics, but it is not yet clear whether this is an inherent difference or an artifact of the smaller dataset available for 2012. The 2008 election cycle was also anomalous because there was no incumbent in either party; this may have promoted a more wide-ranging set of topics since nobody was running on an existing presidential record.
[Figs. 21–22: plots not reproduced; Fig. 21 uses axes UV1 and UV2, Fig. 22 uses axes V1 and V2.]
Fig. 21. Speeches and positive words overlaid
Fig. 22. Negative words labelled when their distance from the origin exceeds twice the median
We also suggested that election speeches would be marked by positivity, as candidates attempt to make voters feel good about the prospect of themselves as successfully elected. This is not supported as well as expected: there is a considerable range of positivity. This may be the result of inherent limitations on the ability of each candidate to be positive throughout a gruelling process. The surprising discovery here is that positivity divides strongly into two factors, one characterized by novelty and the other by re-creation. These do not separate parties or candidates but are different viewpoints taken by individual candidates at different times.
Fig. 23. Speeches and negative words overlaid. [Plot not reproduced; axes UV1 and UV2.]
Fig. 24. Verbs labelled when their distance from the origin exceeds twice the median. [Plot not reproduced; axes V1 and V2.]
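The labelling rule stated in the figure captions (a word is labelled when its distance from the origin exceeds twice the median distance) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `words_to_label`, the `factor` parameter, and the coordinates are hypothetical, and the sketch assumes each word already has a two-dimensional position such as its (V1, V2) coordinates.

```python
import math
from statistics import median


def words_to_label(coords, factor=2.0):
    """Return the words whose distance from the origin exceeds
    factor * (median distance over all words), sorted alphabetically.

    coords maps each word to its (x, y) position in the plot.
    """
    distances = {word: math.hypot(x, y) for word, (x, y) in coords.items()}
    cutoff = factor * median(distances.values())
    return sorted(word for word, d in distances.items() if d > cutoff)


# Hypothetical coordinates for illustration only; not values from the paper.
example = {
    "crisis": (0.09, -0.05),
    "cut": (0.01, 0.01),
    "threat": (-0.02, 0.01),
    "bad": (0.01, -0.02),
    "abuse": (0.00, 0.12),
}
print(words_to_label(example))
```

Because the cutoff is relative to the median, the rule adapts to the scale of each plot: only words far from the dense cluster near the origin receive labels, which keeps crowded plots readable.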
Particular classes of verbs, for example verbs of persuasion or action verbs, have been previously considered. Our purely inductive approach has not. Our results show a strong single-factor structure to verb use, but the characteristics that define the spectrum of verbs are not obvious, and further work is needed here. Unsurprisingly, the spectrum of adverbs that modify these verbs, which also has a single-factor structure, is also difficult to understand. The structure of adjectives matches, to some extent, the structure of nouns; interestingly, the content category of nouns for which matching adjectives are rare is that of economics.
Fig. 25. Speeches and verbs overlaid. [Plot not reproduced; axes UV1 and UV2.]
Fig. 26. Adverbs labelled when their distance from the origin exceeds twice the median. [Plot not reproduced; axes V1 and V2.]
Politicians are good surrogates for more general influencers because they are highly motivated and well-funded, because they must attempt to influence many people in varied settings, and because they cannot appeal only to those who are already willing to be influenced. The approach, and perhaps many of the results, presented here can be applied to other influence settings such as advertising and terrorism. Recall that our goal is not so much to understand the process, although we do gain some insight, but to extract information about the influencers from their use of the process.
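One way to make the "single-factor structure" reported above for verbs and adverbs concrete is to ask what share of the total squared variation in a speech-by-word count matrix is captured by its leading factor. The sketch below is an assumption-laden illustration rather than the authors' procedure: it assumes the plots come from a singular value decomposition of such a matrix (suggested by the V1/V2 axis labels), the function name `leading_factor_share` and the toy matrices are hypothetical, and no normalization of the counts is applied.

```python
import math


def leading_factor_share(matrix, iterations=200):
    """Estimate sigma_1^2 / sum_i sigma_i^2 for a matrix given as a list of rows.

    The sum of all squared singular values equals the squared Frobenius norm,
    and sigma_1^2 is the largest eigenvalue of A^T A, found by power iteration.
    The all-ones starting vector is suitable for non-negative count matrices;
    for other matrices it could be orthogonal to the leading factor.
    """
    n_cols = len(matrix[0])
    total = sum(x * x for row in matrix for x in row)
    if total == 0.0:
        return 0.0
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # One power-iteration step: v <- normalize(A^T (A v)).
        av = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        w = [sum(row[j] * s for row, s in zip(matrix, av)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A^T A v = |A v|^2 approximates sigma_1^2.
    av = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return sum(s * s for s in av) / total


# Hypothetical toy matrices (rows: speeches, columns: words).
rank_one = [[2, 4], [1, 2], [3, 6]]   # every row is a multiple of (1, 2)
two_factor = [[1, 0], [0, 1]]         # two equally strong, unrelated factors
print(leading_factor_share(rank_one), leading_factor_share(two_factor))
```

A share close to 1 indicates that a single spectrum accounts for nearly all of the variation, as in the rank-one toy matrix, while a share near 1/k indicates k comparably strong factors.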
Fig. 27. Speeches and adverbs overlaid. [Plot not reproduced; axes UV1 and UV2.]
Fig. 28. Adjectives labelled when their distance from the origin exceeds twice the median. [Plot not reproduced; axes V1 and V2.]
Fig. 29. Speeches and adjectives overlaid. [Plot not reproduced; axes UV1 and UV2.]
REFERENCES
[1] Alias-i. Lingpipe 4.1.0. http://alias-i.com/lingpipe, 2008.
[2] C.K. Chung and J.W. Pennebaker. Revealing dimensions of thinking in open-ended self-descriptions: An automated meaning extraction method for natural language. Journal of Research in Personality, 42:96–132, February 2008.
[3] C.K. Chung and J.W. Pennebaker. The psychological function of function words. In K. Fiedler, editor, Frontiers in Social Psychology. Psychology Press, in press.
[4] J.L. Creasor and D.B. Skillicorn. QTagger: Extracting word usage from large corpora. Technical Report 2012-587, Queen’s University, School of Computing, 2012.
[5] N. Godbole, M. Srinivasaiah, and S. Skiena. Large-scale sentiment analysis for news and blogs. In ICWSM 2007, 2007.
[6] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
[7] H. Kanayama, T. Nasukawa, and H. Watanabe. Deeper sentiment analysis using machine translation technology. In Proceedings of the 20th International Conference on Computational Linguistics, 2004.
[8] M. Koppel, N. Akiva, E. Alshech, and K. Bar. Automatically classifying documents by ideological and organizational affiliation. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI 2009), pages 176–178, 2009.
[9] T. Loughran and B. McDonald. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance, 66:35–65, 2011.
[10] C. Matthiessen and M.A.K. Halliday. Systemic functional grammar: A first step into the theory. Macquarie University Working Paper, 1997.
[11] T. Nasukawa and J. Yi. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture, pages 70–77, 2003.
[12] M.L. Newman, J.W. Pennebaker, D.S. Berry, and J.M. Richards. Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29:665–675, 2003.
[13] Y.R. Tausczik and J.W. Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29:24–54, 2010.
[14] P.C. Tetlock. Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 62:1139–1168, 2007.
[15] A. Tumasjan, T.O. Sprenger, P.G. Sandner, and I.M. Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pages 178–185, 2010.
[16] C. Whitelaw and S. Argamon. Systemic functional features in stylistic text classification. In Proceedings of AAAI Fall Symposium on Style and Meaning in Language, Art, Music, and Design, Washington, DC, October 2004.
[17] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal taxonomies for sentiment analysis. In Second Midwest Computational Linguistic Colloquium (MCLC 2005), 2005.