[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...

The Mental State of Influencers

D.B. Skillicorn, School of Computing, Queen’s University, Kingston, Canada


C. Leuprecht, School of Policy Studies, Queen’s University, Kingston, Canada


Abstract—Most analysis of influence looks at the mechanisms used and how effectively they work on the intended audience. Here we consider influence from another perspective: what the language choices made by influencers enable us to detect about their internal mental state, strategies, and assessments of success. We do this by examining the language used by U.S. presidential candidates in the high-stakes attempt to get elected. Such candidates try to influence potential voters, but must also pay attention to the parallel attempts by their competitors to influence the same pool. We examine seven channels: persona deception, the attempt by each candidate to seem as attractive as possible; nouns, as surrogates for content; positive and negative language; and three categories that have received little attention: verbs, adverbs, and adjectives. The results are preliminary, but several intuitive and expected hypotheses are supported, and some unexpected and surprising structures also emerge. The results provide insights into related influence scenarios where open-source data is available, for example marketing, business reporting, and intelligence.

I. INTRODUCTION

We address the problem of language-based influence in settings where there is a set of influencers and an audience to be influenced. Some examples of such settings are shown in Table I. However, unlike most work on influence, we consider the process neither from the perspective of the audience (how well does it work?) nor from the perspective of the influencer (how can I do it better?). Rather, we consider what the choices, conscious and subconscious, of the influencer reveal about him/her/them and, to some extent, how this changes over time. In other words, we use the influencing language as a lens into the thinking of the influencer, rather than as a channel for influence itself. In fact, because much language choice is partly or completely subconscious, we use language patterns to access aspects of influencers and their organizations that may not even be obvious to themselves.

This kind of influencing language cannot be modelled as a one-way channel from influencer to influenced. An influencer is almost always concerned with the audience’s reaction, whether in real time in a speech, or at longer time scales using polling or sentiment analysis. Feedback is an essential part of the process.

TABLE I. INFLUENCE SETTINGS

Influencer     Audience       Setting
Businesses     Consumers      Advertising
Terrorists     Sympathizers   Recruitment
Politicians    Voters         Campaigns

The main constraints on an influencer are of four different kinds:

1) Intrinsic capabilities, such as linguistic ability and personality;

2) Goals, the purpose to be served by the communication;

3) Responses, judgements about effectiveness with the target audience; and

4) The language of competitors within the same sphere of influence.

The first three drivers are not very different from those posited by systemic functional linguists for all communication [10, 16].

We examine these drivers by exploring influence language using seven different “channels”:

1) Persona deception, the attempt by individuals or organizations to appear better, wiser, smarter, and/or more experienced or more qualified than they really are.

2) Content, which measures how each influencer chooses among particular topics to appeal to audience interests or concerns, and to gain an edge over competitors.

3) Positivity, the attempt to appeal emotionally to an audience in an uplifting way.

4) Negativity, both as a signal of openness by the influencer and as a way to denigrate competitors.

5) Verb usage, which measures how each influencer chooses among different families of verbs, and among verb tenses.

6) Adverb usage, which measures how each influencer modifies the chosen verbs.

7) Adjective usage, which measures how each influencer modifies the chosen nouns (if at all).

These channels may play slightly different roles in different settings. For example, businesses use a form of persona deception to build up their brands, but are rarely negative, even about competitors. Politicians use persona deception to present themselves as ready for the position for which they are campaigning, but are quite willing to be negative about their competitors.

For some of these channels, there are plausible hypotheses that can be tested. For example, we expect that politicians will try as hard as they are able to present themselves as well suited for the position they are campaigning for, and better suited than their competitors. Hence we might expect high levels of persona deception and at least a certain amount of negativity in campaign speeches. We might expect broad consistency in the choice of nouns, and general agreement among competitors, perhaps even more strongly within subdomains such as candidates from the same political party or industries selling the same products. We might expect high levels of positivity to maintain an upbeat relationship with the audience.

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.160

On the other hand, there are few plausible hypotheses about the use of verbs, adjectives, and adverbs in influence (other than those implied by their participation in other kinds of language). Here we provide exploratory information about how these word classes are actually used, which may eventually lead to more robust hypotheses.

The contribution of this paper is to provide empirical evidence for the use of these seven classes of markers in the U.S. presidential campaigns of 2008 and 2012, and to provide partial support for some of the hypotheses mentioned above. The methodology is also of some interest, since it is lightweight and generalizes to other open-source influence scenarios where the language used is available for analysis, including marketing and some kinds of advertising, business reporting, and intelligence. However, the focus of the approach is what language tells us about the source organizations and individuals, not about the population segment intended to be influenced.

In Section II, we explain the word lists that we use to define each of the seven channels. In Section III, we comment on related work. In Section IV, we lay out the experimental framework. Section V presents the results of the analysis. Finally, in Section VI, we draw some conclusions and identify some open problems raised by the results.

II. CHANNELS OF INFLUENCE

We consider political speeches and extract from them patterns of language use on seven different channels.

Persona deception, the attempt to appear better than one is, is a form of deception and is captured well by Pennebaker’s deception model [12]. This model characterizes deception as causing changes in the frequencies of 86 words in four categories. In deceptive text:

1) First-person singular pronouns (“I”, “me”, “myself”) decrease in frequency;

2) Exclusive words, words that signal an increase in content complexity (“but”, “or”), decrease in frequency;

3) Negative-emotion words (“afraid”, “disappointed”) increase in frequency;

4) Action verbs (“go”, “lead”) increase in frequency.

Since the meaning of “increase” and “decrease” depends on some baseline, this model can only be used to rank a group of cognate documents by their relative deceptiveness, not to label an individual document as deceptive or not. For example, first-person singular pronouns are rare in business writing, so a mixture of such documents and, say, blogs would produce non-comparable deceptiveness scores.
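To make the baseline dependence concrete, the following minimal Python sketch ranks a small group of documents using the four categories. It is illustrative only: the word lists are short hypothetical stand-ins for the 86-word model, tokenization is a plain whitespace split, and summing per-category z-scores computed within the group is our assumption rather than Pennebaker’s exact scoring. Because every score is measured against the group’s own mean, the scores always sum to zero, which is why they can rank cognate documents but cannot label one in isolation.

```python
import numpy as np

# Hypothetical, abbreviated word lists; the real model uses 86 words.
# The sign records how each category moves in deceptive text:
# -1 = frequency decreases, +1 = frequency increases.
CATEGORIES = {
    "first_person": ({"i", "me", "myself", "my", "mine"}, -1),
    "exclusive": ({"but", "or", "except", "without", "however"}, -1),
    "negative_emotion": ({"afraid", "disappointed", "hate", "fear"}, +1),
    "action_verbs": ({"go", "lead", "run", "walk", "move"}, +1),
}


def category_rates(text):
    """Fraction of a document's tokens that fall in each category."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return np.array([sum(tok in words for tok in tokens) / n
                     for words, _ in CATEGORIES.values()])


def deception_ranking(docs):
    """Rank a group of cognate documents from most to least deceptive.

    Scores are only meaningful relative to the group: each category rate
    is z-scored against the group's own mean and standard deviation.
    """
    rates = np.array([category_rates(d) for d in docs])
    std = rates.std(axis=0)
    std[std == 0] = 1.0  # a category with no variation contributes 0
    z = (rates - rates.mean(axis=0)) / std
    signs = np.array([s for _, s in CATEGORIES.values()])
    scores = (z * signs).sum(axis=1)
    return np.argsort(-scores), scores
```

For example, with the three documents "i think i will but maybe", "we go lead run and fear", and "i go", the second document ranks first: it has no first-person or exclusive words but several action verbs and a negative-emotion word.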

For models based on parts of speech, we extract document content using the QTagger [4], a part-of-speech-aware tagger that uses Lingpipe [1] as a front-end. This enables all words tagged as nouns, verbs, adverbs, and adjectives to be extracted and their frequencies counted.

Positivity and negativity are computed from lists of positive and negative words created by Loughran and McDonald [9] for business settings. Since political speeches often have a significant economic component, these lists were judged to be more appropriate than the more general lists used in sentiment analysis, which are targeted at consumer products, books, and movies.

III. RELATED WORK

There is a large literature on sentiment analysis (for example, [5, 7, 11, 14, 17]), including attempts to predict election outcomes [15]. That work addresses the converse of the problem we address here: how to understand the reaction of an audience to the efforts of an influencer.

There is also a truly huge literature on the interaction between politics and language, addressing issues such as how the choice of particular words frames a political discourse, how words change with political changes, and how politicians can be better salespeople. This work is orthogonal to the problem we address here, since it is primarily concerned with improving the process of influencing.

The work closest to ours is by psychologists who have investigated the relationship between language use and mental state, for example Chung and Pennebaker’s work [2, 3] on how words reveal psychological profiles and health, Newman and Pennebaker’s work on deception [12], and Tausczik and Pennebaker’s recent survey [13]. Attempts to identify ideology from language usage, for example Koppel et al. [8], are also relevant.

IV. EXPERIMENTAL FRAMEWORK

The campaign speeches of presidential candidates in the period from January 2008 to the election (McCain, Clinton, Obama) and from January 2012 to early April 2012 (Gingrich, Paul, Romney, Santorum, Obama) were collected. These speeches were, of course, often the product of many authors, so the insight they provide is insight into the thinking of each campaign more than of each candidate. That said, the individuals who delivered each speech do make alterations on the fly, and speeches obviously based on the same script differ substantially from one day to the next, so some insight into individuals is possible.

The complete set of speeches and either a specific word list or selected part-of-speech tags were given to the QTagger, which extracted a document-word matrix in which each entry represents the frequency of (the tagged version of) each word in each speech.
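A minimal sketch of the matrix-building step follows. It assumes plain lower-cased whitespace tokenization in place of QTagger (whose interface is not described here) and ignores part-of-speech tags; `document_word_matrix` is a hypothetical helper, not part of any tool named in this paper.

```python
import numpy as np


def document_word_matrix(speeches, vocabulary):
    """Count each vocabulary word in each speech.

    Rows follow the order of `speeches`, columns the order of `vocabulary`.
    Tokens are lower-cased and stripped of surrounding punctuation; this
    whitespace split is a stand-in for the part-of-speech-aware QTagger.
    """
    column = {word: j for j, word in enumerate(vocabulary)}
    counts = np.zeros((len(speeches), len(vocabulary)))
    for i, speech in enumerate(speeches):
        for token in speech.lower().split():
            j = column.get(token.strip('.,;:!?"()'))
            if j is not None:
                counts[i, j] += 1
    return counts
```

For example, the speeches "Jobs, jobs and more jobs." and "The economy needs jobs" with the word list ["jobs", "economy", "taxes"] give the rows [3, 0, 0] and [1, 1, 0].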

Each row of the matrix is normalized by dividing by the row sum to reduce the effect of differences in speech length. Each column of the matrix is normalized to z-scores using only the non-zero entries (subtracting the column mean from each column entry and dividing by the column standard deviation). The effect of this is that positive values in the normalized matrix represent greater than typical rates of use and negative values represent lower than typical rates of use, while absence and median use are conflated as small-magnitude entries. This conflation is problematic, although justifiable, and there is no reasonable workaround.
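The two normalization steps can be sketched as follows. This assumes, consistent with the description above, that zero entries are left at zero and only the non-zero entries of each column are standardized; the guard for a column whose non-zero entries are all equal is our own addition.

```python
import numpy as np


def normalize(counts):
    """Row-normalize by speech length, then z-score each column using only
    its non-zero entries. Zero entries (word absent) are left at zero."""
    rates = np.asarray(counts, dtype=float)
    row_sums = rates.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # leave empty speeches as all zeros
    rates = rates / row_sums
    result = np.zeros_like(rates)
    for j in range(rates.shape[1]):
        present = rates[:, j] != 0
        if not present.any():
            continue
        values = rates[present, j]
        std = values.std()
        if std == 0:
            std = 1.0                      # identical rates all map to 0
        result[present, j] = (values - values.mean()) / std
    return result
```

In `normalize([[2, 0], [1, 1], [3, 1]])`, the third speech’s exactly typical use of the first word becomes 0, just like the first speech’s absence of the second word, which illustrates the conflation of absence and typical use.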

In the deception model, the signs of the columns corresponding to words for which a decrease in frequency signals an increase in deception are reversed, so that positive entries always correspond to increasing deception.
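The sign reversal amounts to multiplying each column by plus or minus one. This sketch assumes the normalized matrix is already available and that each column carries a flag marking words from the categories that decrease under deception (first-person singular pronouns and exclusive words); `orient_for_deception` is a hypothetical name.

```python
import numpy as np


def orient_for_deception(z, decreasing):
    """Flip the sign of columns whose words decrease under deception,
    so that positive entries always indicate more deception.

    z          -- normalized document-word matrix (documents x words)
    decreasing -- one boolean per column, True for first-person pronouns
                  and exclusive words
    """
    signs = np.where(np.asarray(decreasing), -1.0, 1.0)
    return np.asarray(z, dtype=float) * signs  # broadcasts across rows
```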

If the data matrix has n rows (one per document) and m columns (one per word), then each document can be considered as a point in m-dimensional space. Understanding of the relationships among documents can be increased by, for example, clustering in this space to find groups of similar documents.

However, m is typically large, and there are advantages in projecting the m-dimensional space into one of lower dimension. Doing so removes some of the noise associated with polysemy, and also makes it possible to visualize the relationships among the documents.

We do this using singular value decomposition (SVD) [6]. If A is the data matrix (n × m), then the SVD is given by:

$$A = U S V'$$

where U is n × m, S is a diagonal matrix whose entries, called singular values, are non-increasing, V is m × m, and the superscript dash indicates transposition. Both U and V are orthogonal matrices.

One interpretation of SVD is that it rotates the space so that the greatest variation is aligned with the axis given by the first row of V′, the second-greatest uncorrelated variation with the second row of V′, and so on. The rows of U are then the coordinates of each document in this transformed space.

Because of the ordering of the axes, the decomposition can be truncated after some k dimensions (a form of projection) so that

$$A \approx U_k S_k V_k'$$

where $U_k$ is $n \times k$, $S_k$ is $k \times k$, and $V_k'$ is $k \times m$. The magnitudes of the singular values indicate how much variation is captured in each new direction, and so estimate how much is being ignored by truncation at any choice of k. The role of the truncated SVD, therefore, is to create a data-dependent projection of the data along axes that show the ways in which the documents vary.
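A minimal NumPy sketch of truncation follows. `numpy.linalg.svd` returns $V'$ directly (as `Vt`) with the singular values in non-increasing order; measuring the captured variation by the share of squared singular values is one common convention, assumed here rather than taken from the text.

```python
import numpy as np


def truncated_svd(A, k):
    """Truncate the SVD of A after k dimensions.

    Returns U_k (n x k), the k largest singular values, V_k' (k x m), and the
    fraction of total squared variation captured by the k retained axes.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is non-increasing
    captured = (s[:k] ** 2).sum() / (s ** 2).sum()
    return U[:, :k], s[:k], Vt[:k, :], captured
```

The rank-k approximation is then `Uk @ np.diag(sk) @ Vtk`. Keeping all axes reproduces A exactly, and the squared Frobenius error of a smaller truncation equals the sum of the discarded squared singular values, which is what makes the singular values a guide to how much is being ignored.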

If the data consists of a single underlying factor, then the use of SVD enables this to be detected as an almost-linear structure in the low-dimensional space. Because of the inherent symmetry between documents and words (if $A = U S V'$ then $A' = V S U'$), points corresponding to both documents and words can be plotted in the same space (appropriately scaled). In such a plot, documents can be considered as being drawn towards the points corresponding to words that occur within them with higher frequency and, symmetrically, words as being drawn to documents in which they feature. The first k columns of both the U and V matrices can be plotted in k dimensions, with a point corresponding to each document and to each tagged word. The relationships among all of these can then be observed in the plot: both documents and words may be clustered. More importantly, the clusters agree, in the sense that a cluster of documents in the same region as a cluster of words indicates why the documents cluster (they use similar words) and, symmetrically, why the words cluster (they appear in similar documents).
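The joint plot can be sketched by scaling both sets of coordinates by the square roots of the singular values, so that document and word coordinates multiply back to $U_k S_k V_k'$. This scaling is an assumption, since the text says only that the points are “appropriately scaled”. The hypothetical `labelled` helper implements the rule stated in the captions of Figures 19, 20, and 22: a point is labelled when its distance from the origin exceeds twice the median distance.

```python
import numpy as np


def joint_coordinates(A, k=2):
    """Place documents and words in the same k-dimensional space.

    Each side gets the square root of the singular values, so that the
    product of document and word coordinates reproduces U_k S_k V_k'.
    This particular scaling is an assumption; other choices are common.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s[:k])
    documents = U[:, :k] * root      # one row per document
    words = Vt[:k, :].T * root       # one row per word
    return documents, words


def labelled(points, factor=2.0):
    """Indices of points farther from the origin than `factor` times the
    median distance from the origin."""
    dist = np.linalg.norm(points, axis=1)
    return np.flatnonzero(dist > factor * np.median(dist))
```

Plotting `documents` and `words` on the same axes, and annotating only the indices returned by `labelled(words)`, gives a figure in the style of those in Section V.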

V. RESULTS

Throughout this section, the results are coded as shown in Table II. In all of the figures, distance from the origin corresponds to increased significance, and directions indicate different kinds of variation.

TABLE II. SYMBOL CODING FOR RESULTS

Politician        Symbol
Clinton (2008)    magenta star
McCain (2008)     cyan star
Obama (2008)      red star
Gingrich          green circle
Paul              yellow circle
Romney            blue circle
Santorum          black circle
Obama (2012)      red square

The number of markers used in each experiment is given in Table III. Each marker represents a particular word with a particular part-of-speech tag; hence, although the deception model uses 86 words, 98 tagged words were counted. Clearly spurious words were removed, but the quality of parsing did not make it possible to detect every spurious possibility (for example, “mine” as a noun instead of a pronoun). For all models, any word that occurred in more than one document was counted.

TABLE III. NUMBER OF TAGGED WORDS USED FOR EACH MODEL

Model         Number of markers
Deception     98
Nouns         5169
Positive      246
Negative      763
Verbs         3299
Adverbs       453
Adjectives    1428
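The counting rule can be sketched as below, assuming each speech is already available as a list of (word, tag) pairs. The tags in the example (`NN`, `NNS`, `PRP`) are Penn Treebank-style tags chosen for illustration; they are not necessarily what QTagger produces. Keeping (word, tag) pairs distinct is what allows one word list, such as the 86 deception words, to yield more tagged markers, such as the 98 counted here.

```python
from collections import Counter


def tagged_markers(tagged_speeches, allowed_tags):
    """Return the sorted (word, tag) markers that occur in more than one speech.

    tagged_speeches -- one list of (word, tag) pairs per speech
    allowed_tags    -- the tags a model keeps, e.g. noun tags
    """
    document_frequency = Counter()
    for speech in tagged_speeches:
        # A set, so each marker counts at most once per speech.
        document_frequency.update({(word.lower(), tag)
                                   for word, tag in speech if tag in allowed_tags})
    return sorted(m for m, n in document_frequency.items() if n > 1)
```

For instance, if “mine” is tagged as a noun in two speeches and as a pronoun in only one, the noun marker is kept and the pronoun marker is dropped.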

A. Persona deception

The deception model produces the most puzzling results of all of the marker sets. Typically, this model produces a single-factor structure, ranking documents from most to least deceptive. In this case, the structure is more complex (see Figure 1).


[Plot: speeches and deception-model words on axes UV1 and UV2]

Fig. 1. Speech distribution based on the deception model. The axes (here and in subsequent figures) are the direction of maximal variation (UV1) in both the documents and words, and the next-largest uncorrelated variation (UV2).

In each direction from the center, there are different mixtures of words from the four word classes: first-person singular pronouns, exclusive words, negative-emotion words, and action verbs. This might suggest that different candidates have their own individual patterns of use of these words, and so an individual style of persona deception, but this turns out not to be the case. Rather, individuals change the words they use over time, so that their trajectories in this space tend to be roughly circular or spiral. These trajectories are shown in Figures 2–9.
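The circular or spiral shape of these trajectories is read off the plots. One hypothetical way to quantify it, not used in this paper, is the total signed angle that a candidate’s chronologically ordered speeches sweep around their own centroid: a value near a multiple of 2π indicates loops, and a value near zero indicates no consistent rotation. The sketch assumes 2-D coordinates from the truncated SVD, with speeches already sorted by date.

```python
import numpy as np


def winding_angle(points):
    """Total signed angle (radians) swept by a trajectory around its centroid.

    points -- one 2-D position per speech, in chronological order.
    A magnitude near 2*pi*r suggests about r roughly circular loops.
    """
    p = np.asarray(points, dtype=float)
    centred = p - p.mean(axis=0)
    angles = np.arctan2(centred[:, 1], centred[:, 0])
    steps = np.diff(angles)
    steps = (steps + np.pi) % (2 * np.pi) - np.pi   # wrap each step into [-pi, pi)
    return steps.sum()
```

Wrapping each step assumes consecutive speeches are less than half a turn apart as seen from the centroid; a trajectory that crosses close to its own centroid can violate this, so the measure is only a rough summary of what the plots show.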

B. Nouns

When we consider the content of speeches, as determined from the nouns that they use, it is clear that each candidate has a central topic focus. Candidates tend to spend most of their time on particular topics (or, more accurately, topic blends), and although they do talk about other issues, they do so only fleetingly.

Figure 10 shows the relative topic mix of all of the candidates in 2008 and 2012. The more numerous speeches from the 2008 campaign dominate the figure but, despite this, seem to cover a wider range of topics than those of the 2012 campaign so far. Throughout, McCain shows a much wider variety of language use than any other candidate. Figures 11–18 show the trajectory of content for each individual candidate. Although the lack of data for the 2012 campaign is limiting, it is clear that each candidate has a topic mixture about which they talk, and occasional deviations are almost always followed by an immediate return to this topic center.

The actual topics can be determined by plotting the words, as shown in Figure 19. The extremal, and so most interesting, words are of four kinds: those associated with the economy (left), energy (bottom), security (lower right), and patriotism (upper right), with these latter two (unsurprisingly) blending somewhat.

[Plot: trajectory of Gingrich’s speeches on axes U1 and U2, numbered points]

Fig. 2. Deception–Gingrich

[Plot: trajectory of Paul’s speeches on axes U1 and U2, numbered points]

Fig. 3. Deception–Paul

[Plot: trajectory of Santorum’s speeches on axes U1 and U2, numbered points]

Fig. 4. Deception–Santorum

[Plot: trajectory of Romney’s speeches on axes U1 and U2, numbered points]

Fig. 5. Deception–Romney

[Plot: trajectory of Obama’s 2012 speeches on axes U1 and U2, numbered points]

Fig. 6. Deception–Obama (2012)

[Plot: trajectory of Obama’s 2008 speeches on axes U1 and U2, numbered points]

Fig. 7. Deception–Obama (2008)

[Plot: trajectory of Clinton’s speeches on axes U1 and U2, numbered points]

Fig. 8. Deception–Clinton

[Plot: trajectory of McCain’s speeches on axes U1 and U2, numbered points]

Fig. 9. Deception–McCain

C. Positive words

The most striking fact about the use of positive words is that they separate into two almost orthogonal groups, as shown in Figure 20. The most significant words in one group are words such as “innovation”, “ingenuity”, and words with the root “invent”; in the other group are words such as “stability”, “stabilizing”, and “rebound”. In other words, one group has a positive sense that emphasizes novelty, while the other has a positive sense that emphasizes a re-creation of some aspect of the past. Thus two very different mindsets, both in a sense positive, are induced from the data. Individuals do not seem to be strongly associated with one or the other of these word groups; rather, they change from one viewpoint to the other in different speeches, rarely using both in a single speech. Figure 21 shows the positions corresponding to each speech, overlaid on the words used.


[Plot: all candidates’ speeches on axes U1 and U2]

Fig. 10. Candidates’ topic focus (stars: 2008, circles: 2012 Republicans, square: 2012 Obama)

D. Negative words

The negative words form four groups, as shown in Figure 22: words related to crime and violence at the top left (“crimes”), words related to hostility at the lower left (“suspicion”), political loss words at the lower right (“weakened”, “defeats”), and economic negative words at the right (“crisis”, “cut”).

Unsurprisingly, speeches are most often associated with the negative economic words, in both election cycles, with an occasional foray into the negative crime and violence words. The hostility words are almost exclusively associated with the 2012 cycle. Figure 23 shows the speeches overlaid on the word usage.

E. Verbs

The verbs form a one-dimensional continuum (with an orthogonal branch associated entirely with speeches by McCain). At one end are verbs such as “get”, “invest”, “cut”, “give”, “pay”, “need”, and “afford”, which seem to be determined by economic content; at the other are words such as “belongs”, “deprived”, “command”, “seek”, “tolerate”, “detest”, “involve”, and “destabilize”, which seem to be associated with an international focus, with an emphasis on hostile relationships. The verbs are shown in Figure 24. Past tenses are rare, suggesting that candidates attempt to engage the audience in the present and future rather than the past. This sheds some further light on the split of positive words: the intent seems to be not to return to some positive past but to recreate one in the present.

Figure 25 shows the speeches overlaid with the verbs. Unsurprisingly, the majority of speeches are associated with the economic verbs.

F. Adverbs

The pattern of use of adverbs is shown in Figure 26. Despite appearances, this is a single-factor structure: the vertical branching is almost entirely due to McCain’s prolific and divergent use of adverbs at the right-hand end of the spectrum. The spectrum itself is hard to characterize, except that adverbs at the left-hand end seem to be positive-sounding and to have a sense of motion (“swiftly”, “boldly”, “briefly”). Significant adverbs at the right-hand end of the spectrum include “irresponsibly”, “intentionally”, and “illegally”, which arguably have negative connotations, but also “humanely” and “occasionally”.

[Plot: trajectory of Gingrich’s speeches on axes U1 and U2, numbered points]

Fig. 11. Nouns–Gingrich

[Plot: trajectory of Paul’s speeches on axes U1 and U2, numbered points]

Fig. 12. Nouns–Paul

[Plot: trajectory of Santorum’s speeches on axes U1 and U2, numbered points]

Fig. 13. Nouns–Santorum

[Plot: trajectory of Romney’s speeches on axes U1 and U2, numbered points]

Fig. 14. Nouns–Romney

[Plot: trajectory of Obama’s 2012 speeches on axes U1 and U2, numbered points]

Fig. 15. Nouns–Obama (2012)

[Plot: trajectory of Obama’s 2008 speeches on axes U1 and U2, numbered points]

Fig. 16. Nouns–Obama (2008)

[Plot: trajectory of Clinton’s speeches on axes U1 and U2, numbered points]

Fig. 17. Nouns–Clinton

[Plot: trajectory of McCain’s speeches on axes U1 and U2, numbered points]

Fig. 18. Nouns–McCain

The speeches overlaid with the adverbs are shown in Figure 27.

G. Adjectives

The pattern of use of adjectives is shown in Figure 28. There are three strong clusters of significant adjectives: those associated with energy (upper left), those associated with security (right-hand side), and those associated with patriotism (and especially veterans) at the bottom of the figure. As expected, there is general agreement between this structure and the structure of the nouns, except that there are no significant adjectives associated with the economic nouns.

[Plot: nouns on axes V1 and V2, extremal nouns labelled]

Fig. 19. Nouns labelled when their distance from the origin exceeds twice the median

[Plot: positive words on axes V1 and V2, extremal words labelled]

Fig. 20. Positive words labelled when their distance from the origin exceeds twice the median

Figure 29 shows the speeches overlaid with the adjectives, making it clear how popular energy-related adjectives are.

VI. DISCUSSION AND CONCLUSIONS

Persona deception is an important part of politics, since candidate success usually depends on convincing independents or swing voters. Neither particular policies nor factual deception seem to have much effect, perhaps because voters have become cynical about both. Our results are surprising, since there is no simple spectrum of these speeches from most to least persona-deceptive.

[Figure; axes UV1 and UV2] Fig. 21. Speeches and positive words overlaid

[Figure; axes V1 and V2] Fig. 22. Negative words labelled when their distance from the origin exceeds twice the median

One intuitive hypothesis, suggested earlier, is that candidates will talk about the same issues, and so tend to use the same nouns, with perhaps even greater agreement for candidates from the same party. This appears to be broadly supported by the data, although each candidate carves out a region of the space of topics in which they tend to concentrate. Over time, they make occasional forays into other topic mixes, but the path of topics chosen over time tends to be star-like. The candidates in the 2008 election cycle covered a much broader range of topics, but it is not yet clear whether this is an inherent difference or an artifact of the smaller dataset available for 2012. The 2008 election cycle was also anomalous because there was no incumbent in either party; this may have promoted a more wide-ranging set of topics, since nobody was running on an existing presidential record.
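
The overlaid plots (Figs. 21, 23, 25, 27, 29) place speeches and words in one two-dimensional space. The word-only plots (Figs. 22, 24, 26, 28) label just the words that lie far from the origin. The following is a minimal sketch of how such coordinates and labels could be produced. It assumes a truncated singular value decomposition of a speech-by-word frequency matrix, which the UV/V axis names and the Golub and van Loan reference suggest but do not confirm. The column standardization and the helper names `project_speeches_and_words` and `words_to_label` are illustrative assumptions, not the paper's procedure. Only the twice-the-median labelling rule is taken from the figure captions.

```python
import numpy as np


def project_speeches_and_words(freq, k=2):
    """Place speeches (rows) and words (columns) in a shared k-dimensional space.

    Assumption: each word column is centred and scaled to unit variance
    before the SVD; the paper's own normalization may differ.
    Returns (speech coordinates U_k * S_k, word coordinates V_k).
    """
    a = np.asarray(freq, dtype=float)
    a = a - a.mean(axis=0)
    std = a.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    a = a / std
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k, :].T


def words_to_label(word_xy, words, factor=2.0):
    """Return the words whose distance from the origin exceeds
    `factor` times the median distance (the captions' labelling rule)."""
    dist = np.linalg.norm(word_xy, axis=1)
    cutoff = factor * np.median(dist)
    return [w for w, d in zip(words, dist) if d > cutoff]


# Usage on a toy matrix (hypothetical counts: 4 speeches x 4 words).
freq = [[3, 0, 1, 0],
        [2, 1, 0, 0],
        [0, 4, 2, 1],
        [1, 3, 3, 0]]
words = ["jobs", "energy", "veterans", "iran"]
speech_xy, word_xy = project_speeches_and_words(freq)
print(speech_xy.shape, word_xy.shape)  # (4, 2) (4, 2)
print(words_to_label(word_xy, words))
```

Because the speech coordinates equal the standardized matrix multiplied by the word coordinates, a speech lies toward the words it uses unusually often. This is what makes the overlaid plots readable. The median-based cutoff adapts to each word class, so only the most distinctive words are labelled.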

[Figure; axes UV1 and UV2] Fig. 23. Speeches and negative words overlaid

[Figure; axes V1 and V2] Fig. 24. Verbs labelled when their distance from the origin exceeds twice the median

We also suggested that election speeches would be marked by positivity, as candidates attempt to make voters feel good about the prospect of electing them. This is less well supported than expected: there is a considerable range of positivity across speeches. This may reflect inherent limits on each candidate's ability to remain positive throughout a gruelling process. The surprising discovery here is that positivity divides strongly into two factors, one characterized by novelty and the other by re-creation. These factors do not separate parties or candidates; rather, they are different viewpoints taken by individual candidates at different times.
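
Two claims in this section can be made quantitative by measuring how much of a matrix's total variation each singular value captures. The first is that positivity "divides strongly into two factors"; the second, discussed below, is that verb and adverb use has "a strong single-factor structure". A minimal sketch follows. It assumes that factors correspond to the leading singular vectors of a column-centred speech-by-word matrix. The function names and the 0.8 cutoff are illustrative assumptions, not the paper's criterion.

```python
import numpy as np


def singular_value_shares(matrix):
    """Fraction of total squared singular value captured by each factor,
    largest first. After column centring this equals each factor's share
    of the total variance."""
    a = np.asarray(matrix, dtype=float)
    a = a - a.mean(axis=0)  # assumption: columns are centred before the SVD
    s = np.linalg.svd(a, compute_uv=False)
    energy = s ** 2
    return energy / energy.sum()


def dominant_factor_count(shares, threshold=0.8):
    """Smallest number of leading factors whose shares sum to at least
    `threshold` (0 < threshold <= 1); 0.8 is an illustrative cutoff."""
    return int(np.searchsorted(np.cumsum(shares), threshold) + 1)


# Two orthogonal usage patterns across four speeches (hypothetical data):
# the first three words follow pattern p, the last two follow pattern q.
p = np.array([1.0, -1.0, 1.0, -1.0])
q = np.array([1.0, 1.0, -1.0, -1.0])
two_factor = np.column_stack([p, 2 * p, p, q, q])
one_factor = np.outer(p, [1.0, 2.0, 3.0])

print(dominant_factor_count(singular_value_shares(one_factor)))  # 1
print(dominant_factor_count(singular_value_shares(two_factor)))  # 2
```

In the toy data, the two-factor matrix splits its variance 0.75/0.25 between two patterns, while the one-factor matrix puts essentially all of it in the first. A single-factor word class would resemble the latter case.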

[Figure; axes UV1 and UV2] Fig. 25. Speeches and verbs overlaid

[Figure; axes V1 and V2] Fig. 26. Adverbs labelled when their distance from the origin exceeds twice the median

Particular classes of verbs, for example verbs of persuasion or action verbs, have been considered previously, but not with a purely inductive approach such as ours. Our results show a strong single-factor structure in verb use, but the characteristics that define the spectrum of verbs are not obvious, and further work is needed here. Unsurprisingly, the spectrum of adverbs that modify these verbs, which also has a single-factor structure, is equally difficult to interpret. The structure of adjectives matches, to some extent, the structure of nouns; interestingly, economics is the content category of nouns for which matching adjectives are rare.

Politicians are good surrogates for more general influencers because they are highly motivated and well-funded, because they must attempt to influence many people in varied settings, and because they cannot appeal only to those who are already willing to be influenced. The approach, and perhaps many of the results, presented here could be applied to other influence settings such as advertising and terrorism. Recall that our goal is not so much to understand the process, although we do gain some insight into it, but to extract information about the influencers from their use of the process.

REFERENCES

[1] Alias-i. LingPipe 4.1.0. http://alias-i.com/lingpipe, 2008.

[2] C.K. Chung and J.W. Pennebaker. Revealing dimensions of thinking in open-ended self-descriptions: An automated meaning extraction method for natural language. Journal of Research in Personality, 42:96–132, February 2008.


[Figure; axes UV1 and UV2] Fig. 27. Speeches and adverbs overlaid

[Figure; axes V1 and V2] Fig. 28. Adjectives labelled when their distance from the origin exceeds twice the median

[3] C.K. Chung and J.W. Pennebaker. The psychological function of function words. In K. Fiedler, editor, Frontiers in Social Psychology. Psychology Press, in press.

[4] J.L. Creasor and D.B. Skillicorn. QTagger: Extracting word usage from large corpora. Technical Report 2012-587, Queen’s University, School of Computing, 2012.

[5] N. Godbole, M. Srinivasaiah, and S. Skiena. Large-scale sentiment analysis for news and blogs. In ICWSM 2007, 2007.

[Figure; axes UV1 and UV2] Fig. 29. Speeches and adjectives overlaid

[6] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.

[7] H. Kanayama, T. Nasukawa, and H. Watanabe. Deeper sentiment analysis using machine translation technology. In Proceedings of the 20th International Conference on Computational Linguistics, 2004.

[8] M. Koppel, N. Akiva, E. Alshech, and K. Bar. Automatically classifying documents by ideological and organizational affiliation. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI 2009), pages 176–178, 2009.

[9] T. Loughran and B. McDonald. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance, 66:35–65, 2011.

[10] C. Matthiessen and M.A.K. Halliday. Systemic functional grammar: A first step into the theory. Macquarie University Working Paper, 1997.

[11] T. Nasukawa and J. Yi. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture, pages 70–77, 2003.

[12] M.L. Newman, J.W. Pennebaker, D.S. Berry, and J.M. Richards. Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29:665–675, 2003.

[13] Y.R. Tausczik and J.W. Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29:24–54, 2010.

[14] P.C. Tetlock. Giving content to investor sentiment: the role of media in the stock market. Journal of Finance, 62:1139–1168, 2007.

[15] A. Tumasjan, T.O. Sprenger, P.G. Sandner, and I.M. Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pages 178–185, 2010.

[16] C. Whitelaw and S. Argamon. Systemic functional features in stylistic text classification. In Proceedings of AAAI Fall Symposium on Style and Meaning in Language, Art, Music, and Design, Washington, DC, October 2004.

[17] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal taxonomies for sentiment analysis. In Second Midwest Computational Linguistic Colloquium (MCLC 2005), 2005.
