conspiracy, complaints, and fraud: the language of reasons

The language of reasons Tyler Schnoebelen Conspiracy, complaints, and fraud

Upload: idibon1

Post on 12-Apr-2017


Category:

Law



TRANSCRIPT

Page 1: Conspiracy, complaints, and fraud: The language of reasons

The language of reasons

Tyler Schnoebelen

Conspiracy, complaints, and fraud

Page 2: Conspiracy, complaints, and fraud: The language of reasons

2

Hi! Welcome to the slides for this talk (also check out the Notes). Basically, this talk is about:

1) Showing how computational linguistics solves business problems

2) Identifying markers of fraud using language data
• For company-internal fraud/compliance investigators
• For government/regulatory/consumer advocacy

3) Detecting and using rationalization and reason-giving
• The importance of emotion
• The case of because in consumer complaints and conspiracy forum posts

Page 3: Conspiracy, complaints, and fraud: The language of reasons

3

Fraud

Page 4: Conspiracy, complaints, and fraud: The language of reasons

4

Page 5: Conspiracy, complaints, and fraud: The language of reasons

5

The Association of Certified Fraud Examiners looked at 1,483 fraud cases reported in 2014

They estimate global fraud loss is at least 5% of revenue for companies

Estimate of losses to fraud, worldwide: $3.7 trillion

Page 6: Conspiracy, complaints, and fraud: The language of reasons

6

Page 7: Conspiracy, complaints, and fraud: The language of reasons

7

Page 8: Conspiracy, complaints, and fraud: The language of reasons

8

Financial statement fraud is much more expensive

Page 9: Conspiracy, complaints, and fraud: The language of reasons

9

By industry

Page 10: Conspiracy, complaints, and fraud: The language of reasons

10

For the big dollars, look to the top

Page 11: Conspiracy, complaints, and fraud: The language of reasons

11

What are the red flags for fraudsters?

Page 12: Conspiracy, complaints, and fraud: The language of reasons

12

Losses are rarely recovered

Page 13: Conspiracy, complaints, and fraud: The language of reasons

13

Detecting deception

Page 14: Conspiracy, complaints, and fraud: The language of reasons

14

Prior work tends to be “word lists” or experiments

L&Z used 29,663 transcribed quarterly earnings calls
• 16,577 CEO Q&A responses
• 14,462 CFO Q&A responses

L&Z keep track of which quarterly financial statements were later restated (so during the original call, the executives knew something was amiss)

Depending on the strictness of the restatement criterion, 14%, 7%, or 5% of the calls had deception in them.

Larcker & Zakolyukina (2010)

Page 15: Conspiracy, complaints, and fraud: The language of reasons

15

Larcker & Zakolyukina (2010)

CEOs CFOs

References to general knowledge (you know) more more

Non-extreme positive emotion words fewer fewer

References to shareholder value/value creation fewer fewer

Self-references fewer

3rd person plural/impersonal pronouns more fewer

Extreme negative emotion words fewer

Extreme positive emotion words more

Certainty words fewer more

Hesitation words fewer more
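A minimal sketch of how word-list features like the ones above could be counted per Q&A response. The category names and the tiny word lists are hypothetical stand-ins for illustration; they are not the lists L&Z used.

```python
import re
from collections import Counter

# Hypothetical mini word lists, for illustration only (not L&Z's lists).
WORD_LISTS = {
    "general_knowledge": {"you know", "everybody knows"},
    "self_reference": {"i", "me", "my", "mine", "myself"},
    "hesitation": {"um", "uh", "er"},
}

def category_rates(text):
    """Return hits per 100 words for each word-list category."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    total = len(words) or 1          # avoid dividing by zero on empty text
    word_counts = Counter(words)
    rates = {}
    for category, entries in WORD_LISTS.items():
        hits = 0
        for entry in entries:
            if " " in entry:
                # Multi-word entries are matched as phrases.
                hits += len(re.findall(r"\b" + re.escape(entry) + r"\b", lowered))
            else:
                hits += word_counts[entry]
        rates[category] = 100.0 * hits / total
    return rates

rates = category_rates("Um, you know, I think my team did well")
```

Comparing these rates between calls that were later restated and calls that were not is the kind of comparison the table summarizes as "more" or "fewer".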

Page 16: Conspiracy, complaints, and fraud: The language of reasons

16

Text analytics

Page 17: Conspiracy, complaints, and fraud: The language of reasons

17

Page 18: Conspiracy, complaints, and fraud: The language of reasons

Text analytics

18

Linguistics: Scientific study of language

Machine Learning: Automatically train computers to make human-like decisions

Natural Language Processing: Enable machines to automatically derive meaning from natural language input

● Compliance monitoring
● Enterprise search
● E-Communications surveillance
● Technology assisted review
● Sentiment analysis
● Deception detection
● Text summarization

Page 19: Conspiracy, complaints, and fraud: The language of reasons

Fraud and compliance in digital communications

19

Investigation stages: Early case assessment, Relevancy filtering, Risk Scoring

Models: Key entities; Strategic communications; Spam, Newsletters; Near de-duplication; Fraud diamond; Sentiment; Personal communications

Data volume: 100%, 30%, 10%, < 1%

Page 20: Conspiracy, complaints, and fraud: The language of reasons

Top-down vs. bottom-up text analytics

20

“Bribe” … “Tea money” “Facilitation payment” “Backhander”

Top-down (Search)
● Rules-based
● False positives/negatives
● Brittle

Bottom-up (Discovery)
● Statistical
● Highly accurate
● Adaptive
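A minimal sketch of the top-down (search) approach, to show where the false positives/negatives come from. The keyword list reuses the terms on the slide; the three messages are made up for illustration.

```python
import re

# Search terms from the slide; a real list would be much longer.
KEYWORDS = ["bribe", "tea money", "facilitation payment", "backhander"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def keyword_flag(message):
    """Top-down rule: flag a message if any listed term appears in it."""
    return PATTERN.search(message) is not None

# Made-up messages for illustration.
messages = [
    "He wants tea money before he signs off.",            # flagged
    "Compliance training covers how to report a bribe.",  # flagged, but innocent
    "Bring the envelope, he expects a little gift.",      # missed euphemism
]
flags = [keyword_flag(m) for m in messages]
```

The second message is a false positive and the third a false negative: the rule only knows the words it was given, which is the brittleness the slide points to.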

Page 21: Conspiracy, complaints, and fraud: The language of reasons

21

Page 22: Conspiracy, complaints, and fraud: The language of reasons

Comparing rules vs. machine learning

22

High accuracy on complex task after only 1 day of work

Project goal: Uncover key documents relevant to Energy Regulation out of 200,000 messages that matched raw keywords

Page 23: Conspiracy, complaints, and fraud: The language of reasons

23

Flexible Ontology

23

Develop rich ontology for investigative analytics and insights at scale

Client’s Questions

Known Areas of Interest

Pressure

Rationalization

Names

Opportunity

Capability

Emotions

Topic Modeling

Themes

Page 24: Conspiracy, complaints, and fraud: The language of reasons

24

Data gets smarter and more accurate through adaptive system

Adaptive System Structured Data Reports

Action
• Annotation suggestions
• Document priority
• Shortest path for coverage
• Error detection

Machine Learning

Optimization Prediction Engine

Human Review

4 5 6

6

Page 25: Conspiracy, complaints, and fraud: The language of reasons

Idibon’s models drive more accurate, scalable investigations of fraud

25

1. Identify indicative language
• Identify and extract indicators correlated with fraud
• Gather data from disparate structured, unstructured, public, and private data sources

2. Model fraud within the organization
• Score and rank individual custodians by likelihood of fraud
• Summarize indicators of fraud by department or scheme

3. Scale across people and clients
• Model fraud using documents from multiple custodians
• Build replicable models for different client types

4. Monitor and track risk
• Model on-going risks in client interactions
• Track known liability or non-compliance issues

Page 26: Conspiracy, complaints, and fraud: The language of reasons

Detecting fraud requires a variety of models

Strategic Communications: Automatically identify important communications based on the language used in emails with a BCC recipient

Fraud Triangle and Fraud Diamond: Identify messages containing indicators of Motive, Opportunity, Rationalization and Capability to risk-rank actors and their communications

Key Entities: Discover people, places, organizations, and other entities mentioned in communications to uncover hidden relationships

Personal Messages: Flag messages that are intimate in nature and that may contain evidence of illicit behavior or collusion

Sentiment Analysis: Categorize communications as positive, neutral, or negative

Taboo Words and Obscenity: Identify emotionally charged language that may reflect behaviors and events of interest

Page 27: Conspiracy, complaints, and fraud: The language of reasons

Find needles in haystacks: quickly hone in on relevant areas of the data

• enron report merger: Corporate communications about mergers that you probably DON’T care about

• legal f&j citizens: “I also find the advance ethical waiver language repugnant, but could agree to it if the other modifications mentioned could be made.”

• employees enron bankruptcy: “Michelle, here is a suggested revision to Section 3.4 B … If a terminated employee who is entitled to receive a severance benefit … the severance benefit payable under the Plan shall be reduced and offset”

• time good back: Lots of irrelevant stuff about home, weekends, Thanksgiving, etc.

Page 28: Conspiracy, complaints, and fraud: The language of reasons

Sentiment analysis and automatic topic discovery reveal significant communications

28

Negative: Antitrust issues, M&A, Insider Trading

Positive: Product Releases, Employment

Page 29: Conspiracy, complaints, and fraud: The language of reasons

29

The Fraud Triangle (and briefly, the Fraud Diamond)

Page 30: Conspiracy, complaints, and fraud: The language of reasons

30

The Fraud Triangle

Rationalization

Opportunity

FRAUD SCORE

Pressure

Page 31: Conspiracy, complaints, and fraud: The language of reasons

31

The Fraud Triangle

Rationalization

Opportunity

FRAUD SCORE

Pressure

Pressure:

Incentives, wants, needs (e.g., gambling debts)

Page 32: Conspiracy, complaints, and fraud: The language of reasons

32

The Fraud Triangle

Rationalization

Opportunity

FRAUD SCORE

Pressure

Opportunity:

Weaknesses in the system that allow fraud to happen

Page 33: Conspiracy, complaints, and fraud: The language of reasons

Capability (makes it a Fraud Diamond)

Personal traits and abilities
• Effective lying
• Immunity to stress
• Intelligence
• Confidence

Page 34: Conspiracy, complaints, and fraud: The language of reasons

34

Page 35: Conspiracy, complaints, and fraud: The language of reasons

35

Page 36: Conspiracy, complaints, and fraud: The language of reasons

36

But let’s return to the peaky point

Rationalization

Opportunity

FRAUD SCORE

Pressure

Rationalization:

Committing fraud is worth the risk

Page 37: Conspiracy, complaints, and fraud: The language of reasons

37

Page 38: Conspiracy, complaints, and fraud: The language of reasons

38

When and how do people give reasons?

Page 39: Conspiracy, complaints, and fraud: The language of reasons

39

Because because because because

Page 40: Conspiracy, complaints, and fraud: The language of reasons

40

Page 41: Conspiracy, complaints, and fraud: The language of reasons

41

Conditions:
• “Excuse me, I have (5 or 20) pages. May I use the Xerox machine?” (no-because)
• “Excuse me, I have (5 or 20) pages. May I use the Xerox machine, because I’m in a rush?” (because)
• “Excuse me, I have (5 or 20) pages. May I use the Xerox machine, because I have to make copies?” (because-empty)

The idea here is that the because-empty clause offers no information.
• For 5 pages: because = because-empty >> no-because
• Though when stakes are higher (20 pages): because > because-empty > no-because

Langer, Blank and Chanowitz (1978)

Page 42: Conspiracy, complaints, and fraud: The language of reasons

42

• “Given” information comes before “new”, so usually people say “such and such happened because of X” rather than “Because of X, such and such happened”
  • Given: what’s been said already, inferable, familiar, expected
  • Easier to process new information when it’s framed
  • See Chafe (1984) and lots of others

• “causal clauses are primarily used to back up a previous statement that the hearer may not accept or may not find convincing” (Diessel 2006)

• Conversation analysts find becauses offered by either speaker right before a disagreement

• In English speech, they are surrounded by pauses, hesitations, excuses, mitigations, indirectness, partial agreement, polarity reversals (see Ford & Mori 1994)

Quick lit review

Page 43: Conspiracy, complaints, and fraud: The language of reasons

43

Two main coherence relations: cause-consequence and argument-claim

Causality and Subjectivity are key

Consider:

The sun was shining CONNECTIVE the temperature rose quickly (Causality)

The neighbors’ lights are out CONNECTIVE they are not at home (Subjectivity)

Some languages use different connectives

Language   Causality   Subjectivity
Dutch      doordat     want
French     parce que   puisque
German     weil        denn

Sanders (2003)

Page 44: Conspiracy, complaints, and fraud: The language of reasons

44

Children learning English learn things in this order (Bloom et al 1980):

Additive < Temporal < Causal < Adversative

and < and then < because < so < but

That is, causal connectives are seen as more complex (see also Piaget 1924/1969, Katz & Brent 1968, Clark 2003, Evers-Vermeul 2005)

BUT causally connected information is remembered better

And causal relations are read faster: reading time decreases when causality increases

More Sanders (2003)

Page 45: Conspiracy, complaints, and fraud: The language of reasons

45

Digression! A new construction!

Page 46: Conspiracy, complaints, and fraud: The language of reasons

46

Page 47: Conspiracy, complaints, and fraud: The language of reasons

47

24k “because X” tweets

Page 48: Conspiracy, complaints, and fraud: The language of reasons

48

Because X is mostly playful but has strong affective underpinnings

Page 49: Conspiracy, complaints, and fraud: The language of reasons

49

Because and emotions

Page 50: Conspiracy, complaints, and fraud: The language of reasons

In soap operas, guess what the word most associated with because is?

Page 51: Conspiracy, complaints, and fraud: The language of reasons

In the British Parliament, one of the words most associated with because…

Page 52: Conspiracy, complaints, and fraud: The language of reasons

Affect and emotion are bound up in discussions of reasoning and cognition
• Damasio (1994)
• Kahneman (2003)
• Matthews and Wells (1994)
• Zajonc (1980)
• Loewenstein et al. (2001)
• LeDoux (1998)

Reasoning needs emotions

Page 53: Conspiracy, complaints, and fraud: The language of reasons

“A sophisticated well-being monitor and guidance system that serves both attention-regulatory and motivational functions” (Smith and Kirby 2000: 90).

What are emotions?

Page 54: Conspiracy, complaints, and fraud: The language of reasons

54

The need to convey and assess feelings, moods, dispositions, and attitudes is as critical as describing events.

We don’t just need to know predications, we need to know affective orientation to the predication.

(See the appendix for lots of ways that other languages encode emotional information)

Emotions are expressed in language

Page 55: Conspiracy, complaints, and fraud: The language of reasons

55

Consumer complaints about banks and credit

agencies

Page 56: Conspiracy, complaints, and fraud: The language of reasons

56

An act or practice is unfair when:

(1) It causes or is likely to cause substantial injury to consumers;

(2) The injury is not reasonably avoidable by consumers;

(3) The injury is not outweighed by countervailing benefits to consumers or to competition.

An act or practice is deceptive when

(1) The act or practice misleads or is likely to mislead the consumer;

(2) The consumer’s interpretation is reasonable under the circumstances;

(3) The misleading act or practice is material.

UDAAP (Unfair, Deceptive, or Abusive Acts or Practices)

Page 57: Conspiracy, complaints, and fraud: The language of reasons

57

Whoa.

Page 58: Conspiracy, complaints, and fraud: The language of reasons

58

Page 59: Conspiracy, complaints, and fraud: The language of reasons

59

Consumers detect fraud, too

Data source: Consumer Financial Protection Bureau
• 21,206 consumer narratives
• About banks and credit agencies
• 25% have the word “because” in them

(Limiting this study to because; also worth looking at are becuase, cuz, since, therefore, etc.)

Companies/governments want to detect fraud
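A minimal sketch of how a figure like "25% have the word because" could be computed from a list of narratives. The four sample narratives are made up, and the whole-word, case-insensitive match is an assumption about how "has the word because" was defined.

```python
import re

# Whole-word match; variants such as "becuase" or "cuz" are deliberately not counted.
BECAUSE = re.compile(r"\bbecause\b", re.IGNORECASE)

def because_share(narratives):
    """Fraction of narratives that contain "because" at least once."""
    if not narratives:
        return 0.0
    with_because = sum(1 for text in narratives if BECAUSE.search(text))
    return with_because / len(narratives)

# Made-up narratives for illustration.
sample = [
    "I closed the account because of the fees.",
    "They never responded to my letter.",
    "BECAUSE of their error my credit score dropped.",
    "The payment was applied late.",
]
share = because_share(sample)
```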

Page 60: Conspiracy, complaints, and fraud: The language of reasons

60

Complaints with-because are much longer

Median word count
• Because-narratives: 244
• No-because narratives: 106

Page 61: Conspiracy, complaints, and fraud: The language of reasons

61

Becauses per complaint

• Three or more: 11%
• Two becauses: 21%
• Single "because": 68%
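A small sketch of how the per-complaint breakdown could be tallied. The bucket labels are hypothetical names, and narratives without any "because" are assumed to be left out of the denominator.

```python
import re
from collections import Counter

BECAUSE = re.compile(r"\bbecause\b", re.IGNORECASE)

def because_buckets(narratives):
    """Share of because-narratives with one, two, or three-plus becauses."""
    buckets = Counter()
    for text in narratives:
        n = len(BECAUSE.findall(text))
        if n == 0:
            continue  # only narratives with at least one "because" are bucketed
        label = "single" if n == 1 else "two" if n == 2 else "three_or_more"
        buckets[label] += 1
    total = sum(buckets.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in buckets.items()}

# Made-up narratives for illustration.
shares = because_buckets([
    "because a",
    "because a and because b",
    "no reason given",
    "because, because, and because again",
    "because y",
])
```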

Page 62: Conspiracy, complaints, and fraud: The language of reasons

62

Becauses happen much more in:
• Bank account or service
• Mortgage

And less often (proportionally) in:
• Credit reporting
• Debt collection

The categories most/least because-y

Page 63: Conspiracy, complaints, and fraud: The language of reasons

63

Result: We strongly suggest someone look into Citimortgage’s business practices,

Cause: because at best they are completely incompetent, and at worst they are committing acts of fraud

Both Result-Cause and Cause-Result can happen

But as in most studies, Result-Cause accounts for the vast majority (here ~95%)

Structure of becauses

Page 64: Conspiracy, complaints, and fraud: The language of reasons

64

“Verifiable if you just had a transcript”
Objective-Result / Objective-Cause
They said I owed $10,000
because I didn’t pay my bill for 3 months

“Not-verifiable even if you had a transcript”
Subjective-Result / Subjective-Cause
I am near tears
because I don’t know what to do

Krippendorff’s alpha (inter-annotator agreement): 0.85
• That’s very good agreement
• Highest for Objective-Cause
• Lowest for Objective-Result (what exactly is the scope?)
• All easily distinguishable: collapsing categories does not result in a higher alpha value

3 annotators, 4 annotation types
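For readers who want to see what goes into the agreement number, here is a minimal sketch of Krippendorff's alpha for nominal labels, built from the standard coincidence-matrix definition. The labels OR/OC/SR/SC and the sample annotations are made up; this is not the script used for the 0.85 figure.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists; each inner list holds the labels the
    annotators gave to one item (shorter if an annotator skipped it).
    """
    coincidences = Counter()  # weighted counts of ordered label pairs
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        for a, b in permutations(range(m), 2):
            coincidences[(labels[a], labels[b])] += 1.0 / (m - 1)

    totals = Counter()  # marginal total per label
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        return 1.0  # no disagreement is possible (one label, or nothing to pair)
    return 1.0 - (n - 1) * observed / expected

# Made-up annotations: 3 annotators, 4 label types.
annotations = [
    ["OR", "OR", "OR"],
    ["OC", "OC", "OC"],
    ["SR", "SR", "OR"],
    ["SC", "SC", "SC"],
]
alpha = krippendorff_alpha_nominal(annotations)
two_rater_alpha = krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]])
```

Checking whether "collapsing categories" helps amounts to relabeling (for example, mapping OR and OC to a single label) and recomputing alpha.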

Page 65: Conspiracy, complaints, and fraud: The language of reasons

65

The Idibon team! Thanks to Jason and Nick

Page 66: Conspiracy, complaints, and fraud: The language of reasons

66

40% are Subjective-Result + Subjective-Cause
33% are Objective-Result + Objective-Cause
17% are Objective-Result + Subjective-Cause
10% are Subjective-Result + Objective-Cause

A preference for matching types

Page 67: Conspiracy, complaints, and fraud: The language of reasons

67

If you talk about your home, you aren’t objective

Subjective-Causes vs. Objective-Causes

Page 68: Conspiracy, complaints, and fraud: The language of reasons

68

There’s really no difference between Subjective Results and Objective Results

There’s also no difference between Subjective Results and Objective Causes

Each of these tends to have a median of about 66 characters

But Subjective Causes are quite a bit different: a median of 84 characters (significant, p = 0.009303 by Wilcoxon test)

Affective information gets length
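A sketch of the kind of test reported above, assuming it is the two-sample Wilcoxon rank-sum (Mann-Whitney U) test comparing clause lengths. This version uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from a statistics package. The sample lengths are made up.

```python
import math

def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for xs, p-value).
    """
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every value is identical, so there is nothing to test
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Made-up clause lengths in characters.
subjective_causes = [84, 90, 77, 102, 95, 88]
objective_causes = [66, 60, 71, 58, 69, 64]
u_stat, p_value = rank_sum_test(subjective_causes, objective_causes)
```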

Page 69: Conspiracy, complaints, and fraud: The language of reasons

69

because I found they have dealt fraudulently with many, many consumers

because the matter has not been handled in accordance with the law

BECAUSE NOW SPRINGLEAF FINANCIAL WOULD NOT WORK WITH THE NEW TRUSTEE OF THE TRUST

because Nationstar has dragged its feet in the face of its SIGNIFICANT error

Some examples of Subjective-Causes

Page 70: Conspiracy, complaints, and fraud: The language of reasons

70

• Breakdown in process (repeated attempts, again, for more than, once again, over and over, again and again)

• Unresponsiveness (nothing happened, did not respond)

• Misrepresentation (deceived, lied, misled, scam, told me that)

• Omission (did not tell me, failed to reveal, failed to bring to my attention)

• Emotion (my fear is that, i am angry that, frustrating)

• Subjective terms (patiently, unfair, not fair, unreasonable, struggling, sickening, absurd, allowed to do this, tedious)

• Dialogue acts (request, deny, thank, complain, refuse, accept)

• Mortgage processes (refinance, modification, refer, appeal, assistance)

Concepts in the cause and result clauses

Page 71: Conspiracy, complaints, and fraud: The language of reasons

71

Page 72: Conspiracy, complaints, and fraud: The language of reasons

72

Fraud: How do people treat their companies?

Complaints: How do companies treat their consumers?

Now: How do people treat each other?

Page 73: Conspiracy, complaints, and fraud: The language of reasons
Page 74: Conspiracy, complaints, and fraud: The language of reasons
Page 75: Conspiracy, complaints, and fraud: The language of reasons

Finding healthy communities (supportive)

Page 76: Conspiracy, complaints, and fraud: The language of reasons

And unhealthy ones (toxic)

Page 77: Conspiracy, complaints, and fraud: The language of reasons
Page 78: Conspiracy, complaints, and fraud: The language of reasons

78

Basically all of Reddit, Jan - May 2015
• 266m posts
• 96k forums (“subreddits”)

Most popular:
• /r/AskReddit (21m posts)
• /r/leagueoflegends (5m)
• /r/funny (4m)
• /r/pics (3m)
• /r/nfl (3m)
• /r/nba (3m)

Data details

Page 79: Conspiracy, complaints, and fraud: The language of reasons

79

Distribution of “because” across subreddits

% of posts with because across subreddits with 50k+ posts (758 subreddits)
• Median: 5.44%
• Top quartile: 7.25%
• Bottom quartile: 3.95%
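A minimal sketch of how a median and quartile summary like this could be produced. It assumes "top quartile" and "bottom quartile" mean the 75th and 25th percentile cut points, and the input format (a dict of counts per forum) is hypothetical.

```python
import statistics

def because_rate_summary(counts_by_forum, min_posts=50_000):
    """Median and quartile cut points of the % of posts containing "because".

    `counts_by_forum` maps a forum name to (posts_with_because, total_posts).
    Forums with fewer than `min_posts` posts are left out, as on the slide.
    """
    rates = [
        100.0 * with_because / total
        for with_because, total in counts_by_forum.values()
        if total >= min_posts
    ]
    bottom, median, top = statistics.quantiles(rates, n=4)
    return {"bottom_quartile": bottom, "median": median, "top_quartile": top}

# Made-up counts, with a low threshold so the toy data qualifies.
toy = {f"forum{i}": (i, 100) for i in range(1, 8)}
toy["tiny"] = (40, 50)  # excluded: fewer than min_posts posts
summary = because_rate_summary(toy, min_posts=100)
```

The 50k-post floor matters: small forums produce noisy percentages (the toy "tiny" forum would otherwise show 80%).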

Page 80: Conspiracy, complaints, and fraud: The language of reasons

80

• /r/changemyview (21%)
• /r/DebateAChristian (19%)
• /r/PurplePillDebate (18%)
• /r/DebateReligion (17%)
• /r/AgainstGamerGate (17%)
• /r/truegaming (17%)
• /r/DebateAnAtheist (17%)
• /r/philosophy (16%)
• /r/raisedbynarcissists (16%)
• /r/PoliticalDiscussion (16%)
• /r/listentothis (15%)
• /r/relationship_advice (15%)
• /r/relationships (15%)
• /r/Anxiety (14%)
• /r/ADHD (14%)

Examples of most-because-y subreddits

Page 81: Conspiracy, complaints, and fraud: The language of reasons

81

• /r/podemos (0%)
• /r/newsokur (0%)
• /r/sweden (0%)
• /r/gonewild (1%)
• /r/randomsuperpowers (1%)
• /r/ACTrade (1%)
• /r/GlobalOffensiveTrade (1%)
• /r/millionairemakers (1%)
• /r/SVExchange (1%)
• /r/PercyJacksonRP (1%)
• /r/YamkuHighSchool (1%)
• /r/XMenRP (2%)
• /r/hardwareswap (2%)
• /r/rwbyRP (2%)
• /r/thebutton (2%)

Examples of the least because-y

Page 82: Conspiracy, complaints, and fraud: The language of reasons

82

Page 83: Conspiracy, complaints, and fraud: The language of reasons

83

Page 84: Conspiracy, complaints, and fraud: The language of reasons

84

Page 85: Conspiracy, complaints, and fraud: The language of reasons

85

This presentation was helped by insights from Jana Thompson, one of our NLP Engineers, and Charissa Plattner, one of our summer interns

Co-conspirators!

Page 86: Conspiracy, complaints, and fraud: The language of reasons

86

385k posts

30k have “because” (7.81%)

Posts with “because” tend to score higher for “controversiality”

They are also significantly longer (p < 2.2e-16 by Wilcoxon rank sum test)

/r/conspiracy

Page 87: Conspiracy, complaints, and fraud: The language of reasons

87

Counting "deleted" and "AutoModerator" as real users, there are 32,024 different users who posted in /r/conspiracy from Jan-May 2015.

1,064 of them have 50 or more posts.

Among those, the median % of posts with because is 7.19%
• Top quartile: 11.43%
• Bottom quartile: 4.02%

A view of authors

Page 88: Conspiracy, complaints, and fraud: The language of reasons

88

Those who pay decent rent are doing so because they've been living in a rent controlled area for a LONG time.
• This is preceded by a paragraph all about rent prices
• All Caps Evaluative

So, because it's minor at first that would possibly embolden them? You can't be serious...
• So vs. oh, the importance of questions and rhetoric
• Preposed because (given/new)

Slaves? Are we literally whipped bloody when we don't do as master says (or just because he wants to).
• Adversative: ends with, “Do you have any clue what slavery really is?”

Some examples from big-because users

Page 89: Conspiracy, complaints, and fraud: The language of reasons

89

There are 384,839 posts in this time frame. They roll up to 222,818 "parent_id" threads.

For threads that have 50+ posts (there are only 144 of them), the median % of posts with "because" is 5.61%.
• Top quartile: 8.14%
• Bottom quartile: 3.33%

For threads that have 15-49 posts (1,181 of them), the median % of posts with "because" is 5.88%.
• Top quartile: 10.53%
• Bottom quartile: 0%

A view of threads

Page 90: Conspiracy, complaints, and fraud: The language of reasons

90

Page 91: Conspiracy, complaints, and fraud: The language of reasons

91

Page 92: Conspiracy, complaints, and fraud: The language of reasons

92

• JFK (head autopsy paper wound jfk)
• 9/11 buildings (building collapse steel fire wtc)
• aliens (humans earth life evolution aliens)
• 9/11 (9 11 bin laden attacks)
• space (earth moon nasa gravity apollo)

They avoid…
• media (conspiracy media news government propaganda)
• US politics (law vote obama federal president congress)
• More JFK (don't kennedy)
• moderation (reddit post comments mods banned)
• family/harm (children school kids mother abuse)

Where do authors who like because go?

Page 93: Conspiracy, complaints, and fraud: The language of reasons

93

The because-irrific authors use a median 901 characters per post

The least-because-y use 615 characters per post

Within because posts…

Page 94: Conspiracy, complaints, and fraud: The language of reasons

94

Are because users just wordy?

Or is it that because users hang out in threads where there’s just a lot more because?

Answer: Basically some topics are just wordier than some others (see next two slides about length)

What is driving length?

Page 95: Conspiracy, complaints, and fraud: The language of reasons

95

Length of posts by topic/author disposition (longest)

Rank   Everyone                     Prolific becausers           Because avoiders
1      JFK (2089 char)              9/11 (2747 char)             More JFK (2157 char)
2      9/11 (1834 char)             JFK (2464 char)              aliens (1321 char)
3      More JFK (1784 char)         More JFK (2130 char)         reality (1270 char)
4      9/11 buildings (1489 char)   9/11 buildings (1962 char)   9/11 (1113 char)
5      aliens (1313 char)           aliens (1800 char)           religion (917 char)

Page 96: Conspiracy, complaints, and fraud: The language of reasons

96

Length of posts by topic/author disposition (shortest)

Rank   Everyone                     Prolific becausers           Because avoiders
25     criticism (534 char)         moderation (695 char)        climate change (392 char)
24     moderation (564 char)        criticism (744 char)         moderation (439 char)
23     media (653 char)             meta-conspiracy (816 char)   criticism (440 char)
22     meta-conspiracy (666 char)   media (900 char)             race (459 char)
21     race (721 char)              internet (913 char)          food/health (489 char)

Page 97: Conspiracy, complaints, and fraud: The language of reasons

97

7.8% of posts in /r/conspiracy have “because”

16,069 of the posts in /r/conspiracy have language around fraud

So we’d expect about 1,255 of those posts to also have “because”

Instead we find 3,491 (21.7% of them).

What about claims about fraud, illegality, bamboozlement, etc?
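The arithmetic behind the comparison, as a short sketch. The three input numbers come from the slides; the function name is just for illustration, and "expected" here means expected if having fraud language and having "because" were independent.

```python
def expected_overlap(n_topic_posts, base_rate):
    """Posts expected to contain both features if the two were independent."""
    return n_topic_posts * base_rate

fraud_posts = 16_069    # posts with language around fraud
because_rate = 0.0781   # share of all /r/conspiracy posts with "because" (7.81%)
observed = 3_491        # posts with both

expected = expected_overlap(fraud_posts, because_rate)  # about 1,255
ratio = observed / expected                             # observed vs. expected
observed_share = observed / fraud_posts                 # share of fraud posts with "because"
```

So posts that talk about fraud carry "because" a bit under three times as often as the forum-wide rate would predict.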

Page 98: Conspiracy, complaints, and fraud: The language of reasons

98

Wrapping up

Page 99: Conspiracy, complaints, and fraud: The language of reasons

99

1) Showing how computational linguistics solves business problems

2) Identifying markers of fraud using language data
• For company-internal fraud/compliance investigators
• For government/regulatory/consumer advocacy

3) Detecting and using rationalization and reason-giving
• The importance of emotion
• The case of because

Your thoughts on next steps?

Reviewing where we’ve been

Page 100: Conspiracy, complaints, and fraud: The language of reasons

100

There are links between rationalization and because usage that can help with applications of the fraud diamond/triangle

The different ways people use/don’t use because can help us understand the psychological state of fraudsters and the information of people who may be encountering it

On because

Page 101: Conspiracy, complaints, and fraud: The language of reasons

101

Page 102: Conspiracy, complaints, and fraud: The language of reasons


Page 103: Conspiracy, complaints, and fraud: The language of reasons

103

Page 104: Conspiracy, complaints, and fraud: The language of reasons

Processing millions of SMS in 12 African languages

• Intent of sender (i.e. report a problem, ask a question or make a suggestion)

• Categorization (i.e. orphans and vulnerable children, violence against children, health, nutrition)

• Language detection (i.e. English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)

• Location (i.e. village names)

Page 105: Conspiracy, complaints, and fraud: The language of reasons

105

Page 106: Conspiracy, complaints, and fraud: The language of reasons

Understand language data like never before

106

Thank you!

[email protected]

twitter.com/idibon

idibon.com

Page 107: Conspiracy, complaints, and fraud: The language of reasons

107

• Given-then-new information (result-then-cause in his small corpus, too)
  • Given as what’s been said, inferable, familiar, expected
  • New as unfamiliar, unexpected, unpredictable

• The rare times that because is initial, it acts as a guidepost for information flow
  • Like however, anyway, for example, on the other hand
  • “A guidepost par excellence is ‘meanwhile, back at the ranch’.”
  • People as orienting the information for upcoming clauses
  • A more general strategy of giving a frame

• Third case (That in itself was scary, cause I never fainted before) is sequential and meant to add to the first assertion
  • An “afterthought”

Chafe (1984)

Page 108: Conspiracy, complaints, and fraud: The language of reasons

108

Ordering is about functional and cognitive pressures (draws on Hawkins 1994, 2004):
• Syntactic parsing
• Discourse pragmatics
• Semantics

Result-then-cause order violates iconicity of sequence, yet it is the most attested
• “causal clauses are primarily used to back up a previous statement that the hearer may not accept or may not find convincing” (Diessel 2006)

Diessel (2008)

Page 109: Conspiracy, complaints, and fraud: The language of reasons

109

Because occurs when agreement is at-issue (Ford 1993)

Instead of focusing on information flow, they focus on speaker interaction and see it as occurring where there is actual/incipient disagreement

Thus, conversation analysts find becauses offered by either speaker right before a dispreferred turn

In English, they are surrounded by pauses, hesitations, excuses, mitigations, indirectness, partial agreement, polarity reversals

Ford and Mori (1994)

Page 110: Conspiracy, complaints, and fraud: The language of reasons

110

The real point of their paper is that there are two Japanese becauses, but they function differently:

• datte: glossed as ‘no for the reason that’, is immediate and clear (strong disagreement); it isn’t about getting information but about getting a justification

• kara: more like English, shifts towards alignment; also used if a reference is unclear, a term is unknown, or if the speaker is assuming something of the recipient that they don’t actually know

If you want to give someone a datte response in English, you have to use turn onset, stress, intensifiers, choice of evaluative language, directness of disagreeing, and non-verbal expressions

Ford and Mori (1994), cont’d

Page 111: Conspiracy, complaints, and fraud: The language of reasons

111

John came back because he loved her.
• One event causes another

John loved her, because he came back.
• Illustrates the speaker’s reasoning, “epistemic”; English since, French puisque, German denn

What are you doing tonight, because there’s a good movie on.
• A “speech act”

Subjective relations are often derived from objective relations (see also Traugott 1995)

Sweetser (1990)

Page 112: Conspiracy, complaints, and fraud: The language of reasons

Tongan

si’i and si’a

Different determiners express sympathy to the DP they head (Hendrick, 2005)

Page 113: Conspiracy, complaints, and fraud: The language of reasons

Navajo

=go

Emotional evaluation in narrative (Mithun 2008)

Page 114: Conspiracy, complaints, and fraud: The language of reasons

Korean

Evidentials and psych predicates

Non-evidential sentences are more assertive/informational, evidential sentences about the speaker are more “expressive” and “spontaneous” (Chung 2010)

Page 115: Conspiracy, complaints, and fraud: The language of reasons

East Caucasian lgs

Case for emotion experiencers ≠ perception experiencers

Van den Berg (2005)

Page 116: Conspiracy, complaints, and fraud: The language of reasons

Thai thîi Complementizer for verbs of emotion/evaluation (Singhapreecha, 2010)

Page 117: Conspiracy, complaints, and fraud: The language of reasons

For strangers on the phone, because is used mostly for vices, holidays, money, travel, wars

117

Page 118: Conspiracy, complaints, and fraud: The language of reasons
Page 119: Conspiracy, complaints, and fraud: The language of reasons

1.4%

Page 120: Conspiracy, complaints, and fraud: The language of reasons
Page 121: Conspiracy, complaints, and fraud: The language of reasons

Top 3 categories in Nigeria
• Employment: 9.69%
• U-report support: 17.68%
• Health: 39.44%

Page 122: Conspiracy, complaints, and fraud: The language of reasons

122

Are becausers drawn to some topics more than others?

Observed / expected (O/E) posts in each topic. For example, in JFK there are 69 posts by big becausers where 56 would be expected, and 0 posts by because-avoiders where 13 would be expected.

Topic            O/E big becausers   O/E because-avoiders
JFK              69 / 56             0 / 13
9/11 buildings   408 / 366           44 / 86
media            357 / 394           130 / 93
moderation       442 / 489           154 / 114
aliens           133 / 123           19 / 29
food/health      231 / 214           33 / 51
More JFK         38 / 41             13 / 10
internet         103 / 112           35 / 26
vaccines         263 / 247           43 / 59
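To make the observed/expected table easier to read, here is a small sketch that turns each pair into a ratio and lists the topics where a group posts more than expected. The counts are copied from the table; the function names are hypothetical, and how the expected counts were derived is not shown in the slides.

```python
# (observed, expected) post counts copied from the slide's table.
BIG_BECAUSERS = {
    "JFK": (69, 56), "9/11 buildings": (408, 366), "media": (357, 394),
    "moderation": (442, 489), "aliens": (133, 123), "food/health": (231, 214),
    "More JFK": (38, 41), "internet": (103, 112), "vaccines": (263, 247),
}
BECAUSE_AVOIDERS = {
    "JFK": (0, 13), "9/11 buildings": (44, 86), "media": (130, 93),
    "moderation": (154, 114), "aliens": (19, 29), "food/health": (33, 51),
    "More JFK": (13, 10), "internet": (35, 26), "vaccines": (43, 59),
}

def oe_ratios(table):
    """Observed/expected ratio per topic; above 1 means over-represented."""
    return {topic: obs / exp for topic, (obs, exp) in table.items()}

def over_represented(table):
    """Topics with ratio above 1, largest ratio first."""
    ratios = oe_ratios(table)
    return sorted((t for t, r in ratios.items() if r > 1.0), key=lambda t: -ratios[t])

becauser_topics = over_represented(BIG_BECAUSERS)
avoider_topics = over_represented(BECAUSE_AVOIDERS)
```

The two lists do not overlap: big becausers are over-represented in JFK, 9/11 buildings, aliens, food/health, and vaccines, while because-avoiders are over-represented in media, moderation, internet, and More JFK.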

Page 123: Conspiracy, complaints, and fraud: The language of reasons

123

Basically the same list is top, except vaccines pop up a few spots and aliens drop down a few spots

• More JFK (don't kennedy)
• JFK (head autopsy paper wound jfk)
• 9/11 (9 11 bin laden attacks)
• vaccines (vaccines children disease autism polio)
• 9/11 buildings (building collapse steel fire wtc)

Let’s remove the authors who like because