Real-Time Text Analytics for Event Detection in the Financial World
Gaining Value from Big Data
Volker Stümpflen
April 2015
Winner
Information Delay – A Big Data Problem
Markets are driven by news (and sentiment)
• The S&P 500 alone lost $136.5 billion in market value within six minutes
• Cost of a second: $380 million
Hacked AP tweet
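The per-second figure comes from spreading the S&P 500 loss evenly over the six minutes:

```latex
\frac{\$136.5 \times 10^{9}}{6 \times 60\,\mathrm{s}}
  = \frac{\$136.5 \times 10^{9}}{360\,\mathrm{s}}
  \approx \$3.79 \times 10^{8}\,\mathrm{s}^{-1}
  \approx \$380\ \text{million per second}
```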
Use Case: Baader Bank AG
• A leading German investment bank, market maker and sales trader
• Missed relevant information in real time
• 780,000 financial instruments: from stocks to derivatives; increasing
• N traders and analysts: decreasing time for increasing information; constant and small
• 500,000 news items per day (~4 bn sentences per year): from news agencies to social media channels; strongly increasing
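To show what these volumes mean as processing load, here is a back-of-the-envelope estimate. It is written in Python for brevity, although the system described later is built in Scala/Akka. The estimate assumes the yearly volume is spread evenly over 365 days. The 5 ms per sentence and the 10x burst factor in the example call are hypothetical values, not measurements.

```python
# Back-of-the-envelope load estimate from the volumes on the slide.
# Assumption: the yearly volume is spread evenly over a 365-day year.
SENTENCES_PER_YEAR = 4_000_000_000   # "~4 bn sentences p.a."
NEWS_PER_DAY = 500_000               # "500.000 news p.d."
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_sentence_rate() -> float:
    """Average number of sentences arriving per second."""
    return SENTENCES_PER_YEAR / SECONDS_PER_YEAR


def sentences_per_news_item() -> float:
    """Average number of sentences per news item implied by the two volumes."""
    return SENTENCES_PER_YEAR / (NEWS_PER_DAY * 365)


def cores_needed(ms_per_sentence: float, peak_factor: float = 1.0) -> float:
    """Cores needed to keep up if one core handles one sentence in ms_per_sentence.

    peak_factor (hypothetical) scales the average rate to model news bursts.
    """
    return average_sentence_rate() * peak_factor * ms_per_sentence / 1000.0


if __name__ == "__main__":
    print(f"{average_sentence_rate():.0f} sentences/s on average")
    print(f"{sentences_per_news_item():.1f} sentences per news item")
    # 5 ms per sentence and a 10x burst are hypothetical example values.
    print(f"{cores_needed(5.0, peak_factor=10.0):.1f} cores needed")
```

On average this comes to roughly 127 sentences per second, or about 22 sentences per news item. A system that must also absorb bursts well above this average benefits from scaling horizontally.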
Market-Moving Event Types
Derived from Big Data analytics of a news corpus
• Reuters 2008 news corpus
• S&P 500 companies
• Price change >= 1% in less than 1 minute
Event                               Rel. Freq.
CDS Price Move                      1
Analyst Forecast                    1
Business Climate Change             1
CEO Search                          1
Company Forecast                    1
Customer Problems                   1
Debt Financing                      1
Equity Financing                    1
Fraud Investigation                 1
Government Decision (no bailout)    1
Incorporation Change                1
Legal Settlement                    1
M&A                                 1
Restructuring                       1
Supply Chain                        1
Trading Halt                        1
Asset Liquidation                   2
Stocks Fall (Peers)                 2
Dividend Change                     3
Broker Rating                       9
Quarterly Results                   10
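The labelling criterion (a price change of at least 1% in less than one minute) can be applied to a price series roughly as follows. This sketch rests on several assumptions. Ticks are (timestamp in seconds, price) pairs sorted by time. The change is measured between any two ticks less than 60 seconds apart. The function name and tick format are hypothetical.

```python
from typing import List, Optional, Tuple

Tick = Tuple[float, float]  # (timestamp in seconds, price); assumed sorted by time


def find_market_move(ticks: List[Tick],
                     threshold: float = 0.01,
                     window_s: float = 60.0) -> Optional[Tuple[float, float, float]]:
    """Return (t_start, t_end, relative_change) for the first pair of ticks that are
    less than window_s seconds apart and whose price differs by at least
    threshold (up or down), or None if no such move exists."""
    for i, (t0, p0) in enumerate(ticks):
        for t1, p1 in ticks[i + 1:]:
            if t1 - t0 >= window_s:
                break  # ticks are sorted, so every later tick is outside the window too
            change = (p1 - p0) / p0
            if abs(change) >= threshold:
                return (t0, t1, change)
    return None
```

For example, a drop from 100.0 to 98.9 within 50 seconds is flagged as a change of about -1.1%. A 2% rise spread over 61 seconds is not flagged.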
Three Information Classes
• Common events
  • E.g. “U.K. Economy Grew 0.6% in Fourth Quarter, Revised From 0.5%”
  • Structured and accessible with simple RegEx
  • Immediately reflected in the market
• “Grey Swans”
  • E.g. “The Swiss National Bank scrapped the cap on the franc.”
  • Expected market movers, but each individual event is less frequent
  • Together they make up approx. 50% of all events
• “Black Swans”
  • E.g. “Zombie virus epidemic”
  • Unexpected market movers, typically catastrophes
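For the first class, one regular expression per headline template is often enough. Below is a minimal sketch for the GDP headline quoted above. The pattern and field names are illustrative assumptions, not the system's actual rules. The Grey Swan sentence does not fit such a template.

```python
import re
from typing import Dict, Optional

# Illustrative template for one family of macro headlines, e.g.
# "U.K. Economy Grew 0.6% in Fourth Quarter, Revised From 0.5%".
GDP_GROWTH = re.compile(
    r"^(?P<region>[\w.]+) Economy Grew (?P<value>\d+(?:\.\d+)?)% "
    r"in (?P<quarter>First|Second|Third|Fourth) Quarter"
    r"(?:, Revised From (?P<previous>\d+(?:\.\d+)?)%)?$"
)


def parse_gdp_headline(headline: str) -> Optional[Dict[str, object]]:
    """Turn a matching headline into a structured event, or return None."""
    m = GDP_GROWTH.match(headline)
    if m is None:
        return None
    return {
        "region": m["region"],
        "quarter": m["quarter"],
        "growth_pct": float(m["value"]),
        # The optional "Revised From" clause carries the previous estimate.
        "revised_from_pct": float(m["previous"]) if m["previous"] else None,
    }
```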
The Way We Look at Events
Information is always connected
• The economy consists of
  • Entities like companies, people, products, locations, catastrophes, ...
• Events occur when entities do something with each other
• The simplest event is
  • Entity A – is doing something with – Entity B
• Complex events are
  • Superpositions of simple events
  • And/or indirect effects of simple events (guilt by association)
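This view of events can be sketched in a few lines. A simple event is an (agent, action, patient) triple, and the triples form an entity network. Indirect effects ("guilt by association") then become entities a few hops away. The triples below paraphrase the franc example for illustration only; the "hit" relation in particular is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    """Simplest event: entity A (agent) is doing something (action) with entity B (patient)."""
    agent: str
    action: str
    patient: str


def build_network(events: List[Event]) -> Dict[str, Set[str]]:
    """Undirected entity network: two entities are linked if they take part in one event."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        graph[e.agent].add(e.patient)
        graph[e.patient].add(e.agent)
    return graph


def associated(graph: Dict[str, Set[str]], entity: str, max_hops: int = 2) -> Set[str]:
    """Entities within max_hops of entity: candidates for indirect effects."""
    seen = {entity}
    frontier = {entity}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph.get(node, set())} - seen
        seen |= frontier
    return seen - {entity}


# A complex event is modelled here simply as a list of simple events.
# Illustrative triples paraphrasing the franc example ("hit" is hypothetical).
events = [
    Event("Swiss National Bank", "scrapped", "franc cap"),
    Event("franc cap", "hit", "FXCM"),
    Event("Customers", "owe", "FXCM"),
]
graph = build_network(events)
```

Here `associated(graph, "Swiss National Bank", 3)` reaches FXCM's customers, even though no single event links them to the central bank.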
A Recent Example
Cap on the franc scrapped
FXCM had serious losses
Networks – The Most Natural Way to Look At It
Networks With Predicate-Argument Structures
Figure: network of predicate-argument structures (PAS) for the franc example:
Swiss National Bank → scrapped → the cap on the franc
Customers → owe → FXCM (225 million francs)
Wanted
• Software that ...
  • extracts PAS from raw, unstructured text
  • is fast (PAS within a few ms per sentence)
  • has high precision for agent, patient and beneficiary
  • is easy to extend for domain-specific language, e.g. *EHEALTH SEES YEAR ADJ. EPS 34C-41, EST. 38C
  • is inherently multilingual
• Nothing available did all of that, so we built our own!
Preprocessing
Input: "The SNB scrapped the cap on the franc. Markets are stunned."
• Sentence splitting: "The SNB scrapped the cap on the franc" | "Markets are stunned"
• Tokenization: The | SNB | scrapped | the | cap | on | the | franc | Markets | are | stunned
• Part-of-speech tagging: The/DT SNB/NNP scrapped/VBD the/DT cap/NN on/IN the/DT franc/NN Markets/NNS are/VBP stunned/VBN
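The three preprocessing steps can be sketched with the Python standard library. The splitter and tokenizer are naive regular expressions. The tagger is a hypothetical lookup table covering only this example; production systems typically use trained models for all three steps.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon covering only the slide's example; real taggers are trained models.
LEXICON = {
    "the": "DT", "snb": "NNP", "scrapped": "VBD", "cap": "NN", "on": "IN",
    "franc": "NN", "markets": "NNS", "are": "VBP", "stunned": "VBN", ".": ".",
}


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace.

    Naive: it would also split after abbreviations such as "approx.".
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Words and individual punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Look up each token's tag; unknown words default to NN."""
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]


text = "The SNB scrapped the cap on the franc. Markets are stunned."
for sentence in split_sentences(text):
    print(pos_tag(tokenize(sentence)))
```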
Known Concept Identification
• Includes named entity recognition (NER)
• Uses machine learning techniques, context-free grammars (CFGs) and pattern matching
• Easy to extend
Example: "Customers owe FXCM approx. 225 million francs."
• to owe: owe/owes/owing/owed/owed; takes arg0, arg1, arg2
• FXCM: Forex Capital Markets Ltd (NYSE: FXCM)
• Currency value: value 225,000,000; currency Swiss Franc
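The slide lists three techniques. The sketch below covers only dictionary lookup and pattern matching, with no machine learning, and uses entries taken from the example above. The knowledge-base layout, the function names and the restriction to "thousand/million/billion" amounts are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical miniature knowledge base built from the slide's example.
COMPANIES = {"FXCM": {"name": "Forex Capital Markets Ltd", "ticker": "NYSE: FXCM"}}
VERBS = {form: {"lemma": "owe", "args": ["arg0", "arg1", "arg2"]}
         for form in ("owe", "owes", "owing", "owed")}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}
# "franc(s)" is mapped to Swiss Franc as on the slide; real systems need context here.
CURRENCIES = {"franc": "Swiss Franc", "francs": "Swiss Franc"}

AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s+(thousand|million|billion)\s+(\w+)")


def find_amounts(sentence: str) -> List[Dict[str, object]]:
    """Pattern matching for amounts such as '225 million francs'."""
    amounts = []
    for number, scale, unit in AMOUNT.findall(sentence):
        currency = CURRENCIES.get(unit.lower())
        if currency is not None:
            amounts.append({"value": round(float(number) * SCALES[scale]),
                            "currency": currency})
    return amounts


def identify_concepts(sentence: str) -> Dict[str, object]:
    """Dictionary lookup for companies and verb frames, plus amount patterns."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return {
        "companies": {t: COMPANIES[t] for t in tokens if t in COMPANIES},
        "verbs": {t: VERBS[t.lower()] for t in tokens if t.lower() in VERBS},
        "amounts": find_amounts(sentence),
    }
```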
Chunk Parsing
• A chunk is (working definition) a sequence of consecutive tokens grouped by some notion of syntactic or semantic function or dependency.
Example: "The decision of the SNB greatly surprised the markets"
Tokens: The/DT decision/NN of/IN the/DT SNB/NNP greatly/RB surprised/VBD the/DT markets/NNS
Chunking starts from 1-token chunks and applies CFG rules, yielding three noun chunks and one verb chunk.
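The bottom-up procedure on the slide (start from 1-token chunks, then apply CFG rules) can be sketched as repeated rule reduction over adjacent chunks. The rule set below is hypothetical. It was chosen so that the example sentence first yields three noun chunks and one verb chunk. A final, recursive rule (NOUN CHUNK → NOUN CHUNK IN NOUN CHUNK) then merges "The decision" and "the SNB" into one noun chunk.

```python
from typing import List, Sequence, Tuple

Chunk = Tuple[str, List[str]]  # (label, words)

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# Hypothetical rule set, tried in priority order: (result label, right-hand side).
# "N" matches any noun tag, "V" any verb tag; other symbols match labels exactly.
RULES: List[Tuple[str, Sequence[str]]] = [
    ("NN", ["N", "N"]),                                   # noun compound
    ("NOUN CHUNK", ["DT", "N"]),                          # "the markets"
    ("NOUN CHUNK", ["N"]),                                # bare noun
    ("VERB CHUNK", ["RB", "V"]),                          # "greatly surprised"
    ("VERB CHUNK", ["V"]),
    ("NOUN CHUNK", ["NOUN CHUNK", "IN", "NOUN CHUNK"]),   # recursive "X of Y"
]


def matches(symbol: str, label: str) -> bool:
    return (symbol == label
            or (symbol == "N" and label in NOUN_TAGS)
            or (symbol == "V" and label in VERB_TAGS))


def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    # Start: each token is a 1-token chunk labelled with its POS tag.
    chunks: List[Chunk] = [(tag, [word]) for word, tag in tagged]
    applied = True
    while applied:
        applied = False
        for label, rhs in RULES:
            n = len(rhs)
            for i in range(len(chunks) - n + 1):
                window = chunks[i:i + n]
                if all(matches(s, c[0]) for s, c in zip(rhs, window)):
                    words = [w for _, ws in window for w in ws]
                    chunks[i:i + n] = [(label, words)]  # reduce the matched span
                    applied = True
                    break
            if applied:
                break  # restart from the highest-priority rule
    return chunks
```

Each reduction either shortens the chunk list or turns a POS label into a chunk label, so the loop always terminates.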
Mixed Semantic and Syntactic Analysis
• Semantic concepts recognized during chunking are attached to special “Known Concept” chunks
• Subsequent CFG rules can recursively check for properties such as “is a company”
Example: Forex/NNP Capital/NNP Markets/NNP Ltd/NNP
The four tokens are grouped into a KNOWN CONCEPT chunk (FXCM: Forex Capital Markets Ltd, NYSE: FXCM), which chunking then embeds in a NOUN CHUNK.
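A minimal sketch of the idea follows. Token spans found in a knowledge base are replaced by KNOWN CONCEPT chunks that carry their semantic annotation. A recursive check such as "is a company" can then look inside any chunk that embeds them. The `Chunk` class, the KB layout and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    label: str                                   # "TOKEN", "KNOWN CONCEPT", "NOUN CHUNK", ...
    words: List[str]
    children: List["Chunk"] = field(default_factory=list)
    concept: Optional[Dict[str, str]] = None     # semantic annotation, if any


# Hypothetical knowledge base entry taken from the slide's example.
KB: Dict[Tuple[str, ...], Dict[str, str]] = {
    ("Forex", "Capital", "Markets", "Ltd"): {"type": "company", "id": "FXCM", "ticker": "NYSE: FXCM"},
}


def mark_known_concepts(tokens: List[Chunk]) -> List[Chunk]:
    """Replace the longest token spans found in KB by KNOWN CONCEPT chunks."""
    out: List[Chunk] = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):      # try the longest span first
            key = tuple(t.words[0] for t in tokens[i:j])
            if key in KB:
                out.append(Chunk("KNOWN CONCEPT", list(key), tokens[i:j], KB[key]))
                i = j
                break
        else:                                    # no KB entry starts at position i
            out.append(tokens[i])
            i += 1
    return out


def is_company(chunk: Chunk) -> bool:
    """Recursive check: is this chunk, or any chunk nested inside it, a known company?"""
    if chunk.concept is not None and chunk.concept.get("type") == "company":
        return True
    return any(is_company(child) for child in chunk.children)
```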
Detecting Predicate-Argument Structures
Example: "The decision of the SNB greatly surprised the markets"
Tokens: The/DT decision/NN of/IN the/DT SNB/NNP greatly/RB surprised/VBD the/DT markets/NNS
Starting from 1-token chunks, CFG rules build noun and verb chunks, and known concepts are marked.
Detected PAS: agent "The decision of the SNB", predicate "surprised", patient "the markets"
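Given chunks like those on this slide, a first approximation of PAS detection works as follows. The noun chunk before a verb chunk becomes the agent, and the noun chunk after it becomes the patient. This sketch is deliberately simple, handles active voice only, and uses a hypothetical function name. Passives, beneficiaries (e.g. the arg2 of "owe") and clause boundaries would need further rules.

```python
from typing import Dict, List, Optional, Tuple

Chunk = Tuple[str, List[str]]  # (label, words), as produced by a chunker


def extract_pas(chunks: List[Chunk]) -> Optional[Dict[str, str]]:
    """Map NOUN CHUNK - VERB CHUNK - NOUN CHUNK to agent / predicate / patient.

    Active voice only. The predicate is the last word of the verb chunk,
    so adverbs such as "greatly" are dropped.
    """
    for i, (label, words) in enumerate(chunks):
        if label != "VERB CHUNK":
            continue
        before = [c for c in chunks[:i] if c[0] == "NOUN CHUNK"]
        after = [c for c in chunks[i + 1:] if c[0] == "NOUN CHUNK"]
        if before and after:
            return {
                "agent": " ".join(before[-1][1]),
                "predicate": words[-1],
                "patient": " ".join(after[0][1]),
            }
    return None
```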
Implemented in Scala/Akka
A horizontally scalable real-time solution
Figure: system architecture with four servers (SRV 1 to SRV 4) running ClusterNodes with ElasticSearch DBs and services; two AMQPFetcher components fetching news from Reuters and Bloomberg via AMQP; and two frontends with failover, serving Baader over the Internet.
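The fan-out in this architecture works as follows: fetchers enqueue incoming news, and a pool of workers processes it in parallel. The sketch below mimics that pattern in a single Python process. It is only an analogy. The slide's system uses Scala/Akka cluster nodes fed via AMQP, whereas here threads and `queue.Queue` stand in for them. The sketch shows the structure, not the performance, since Python threads do not run CPU-bound work in parallel.

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel telling a worker to shut down


def run_pipeline(items: List[str], process: Callable[[str], str], workers: int = 4) -> List[str]:
    """Fan items out to a pool of workers and collect their results (order not preserved)."""
    inbox: queue.Queue = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = inbox.get()
            if item is _STOP:
                break
            out = process(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:          # fetcher side: enqueue incoming news
        inbox.put(item)
    for _ in threads:           # one sentinel per worker, queued after all items
        inbox.put(_STOP)
    for t in threads:
        t.join()
    return results
```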
Applications
Real-Time Event Detection
Mood Propagation Networks
Systemic Mood
• Inferring sentiment channels
• Calculating positive and negative sentiment flow
• Similar to metabolic networks in biology
Figure (arbitrary example): network of companies and topics (Apple, Samsung, Microsoft, Sony, Google, Motorola, Foxconn, XYZ, China, Rare Earths). The edge "Apple sues Samsung in Australia" is annotated with acting company (Apple), negative relation (legal action), receiving company (Samsung) and location of relation (Australia).
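The slide does not spell out the propagation algorithm. As an illustration, the sketch below uses a simple spreading-activation scheme. Sentiment injected at seed nodes flows along signed, weighted edges for a fixed number of hops and is damped at each hop. The edge weights, decay factor and hop limit are assumptions, not the system's parameters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, signed weight; negative = negative relation)


def propagate(edges: List[Edge], seeds: Dict[str, float],
              hops: int = 2, decay: float = 0.5) -> Dict[str, float]:
    """Spread seed sentiment along signed edges for a fixed number of hops.

    Each hop multiplies the flowing value by the edge weight and by decay;
    contributions arriving at a node are summed into its total.
    """
    out_edges: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for src, dst, w in edges:
        out_edges[src].append((dst, w))
    total: Dict[str, float] = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt: Dict[str, float] = defaultdict(float)
        for node, value in frontier.items():
            for dst, w in out_edges[node]:
                nxt[dst] += value * w * decay
        for node, value in nxt.items():
            total[node] += value
        frontier = nxt
    return dict(total)
```

For example, take edges A→B (+1) and B→C (-1) with decay 0.5. A positive seed of 1.0 at A ends up as +0.5 at B and -0.25 at C after two hops.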
Systemic Mood – The Fukushima Example
“Activation” of renewable energy companies
Before 3/11/2011: no interest in renewable energy
After 3/11/2011
Conclusion
• Real-time news analytics system
• PAS pipeline based on
  • machine learning techniques, context-free grammars (CFGs) and pattern matching
• Native transformation into an associative network
  • Used, e.g., to infer sentiment propagation
• Benefits for Baader Bank AG
  • Comprehensive news analytics
  • Smart detection of market-moving events
  • Losses due to missed news completely eliminated
  • Substantially increased trading profit
Clueda AG
Contact
Dr. Volker Stümpflen
T +49 89 4161402 10
M +49 0176 57 288282
Clueda AG
Elsenheimerstraße 59
D-80687 Munich
www.clueda.com