fraud detection in financial services · community detection and influencer analysis churn risk...
TRANSCRIPT
Hans Viehmann
Product Manager EMEA
ORACLE Corporation
Brighton, December 3rd, 2019
@SpatialHannes
Fraud Detection in Financial Services… using Graph Analysis and Machine Learning
Copyright © 2019 Oracle and/or its affiliates
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Safe Harbor
Copyright © 2019 Oracle and/or its affiliates
Analysis of Relationships, Dependencies and Behavioural Patterns
Copyright © 2019 Oracle and/or its affiliates
https://www.information-age.com/gartner-data-and-analytics-technology-trends-123479234/
Copyright © 2019 Oracle and/or its affiliates
Graph Data Models
Copyright © 2019 Oracle and/or its affiliates
Property Graph Model
Financial Retail, Marketing Public Safety Smart Manufacturing
• Path Analytics
• Graph Analytics
• Detect patterns and anomalies
• Data federation
• Knowledge representation
• Semantic Web
RDF Graph Model
Life Sciences Health Care Publishing Finance
Use CasesGraph Model Industry Domain Shipping for 12+ years
Shipping for 3+ years
Graph Data Model
What is a graph?
Data model representing entities as vertices and relationships as edges
Optionally including attributes
Also known as „linked data“
What are typical graphs?
Social Networks
LinkedIn, Facebook, Google+, Twitter, ...
Physical networks, Supplier networks,...
Knowledge Graphs
Apple SIRI, Google Knowledge Graph, ...
E
A D
C B
F
Copyright © 2019 Oracle and/or its affiliates
Why are graphs so popular?
Easy data modeling
„whiteboard friendly“
Flexible data model
No predefined schema, easily extensible
Particularly useful for sparse data
Insight from graphical representation
Intuitive visualization
Enabling new kinds of analysis
Overcoming some limitations in relational technology
Additional perspective for Machine Learning
E
A D
C B
F
Copyright © 2019 Oracle and/or its affiliates
Romanian Police Force
Creating Knowledge Graphs from all kinds of content
Social media networks, documents, images, audio, video, structured data
Using machine learning (text analysis, classification, entity extraction, face recognition, speech2text, ...)
Enabling relationship analysis and semantic search
bigCONNECT platform built by mWARE
Running on Big Data Applicance, Big Data Cloud Service or commodity Hadoop
BIG DATA since 2012
Copyright © 2019 Oracle and/or its affiliates
Examples for Graph Analytics
Community detection and influencer analysisChurn risk analysis/targeted marketing, HR Turnover analysis
Product recommendationCollaborative filtering, clustering
Anomaly detectionSocial Network Analysis (spam detection), fraud detection in healthcare
Path analysis and reachabilityOutage analysis in utilities networks, vulnerability analysis in IP networks, „Panama Papers“
Pattern matchingTax fraud detection, data extraction
Copyright © 2019 Oracle and/or its affiliates
Graph Analysis in Financial Services
Copyright © 2019 Oracle and/or its affiliates
Banco de Galicia
Customer profitability analysis
Part of larger Hadoop/Big Data project
Analysis of banking transactions
Focus on corporate customers
Identification of undesired behaviouralpatterns, eg.
Customers using other banks to make large numbers of transactions
Many of which flow back to Banco Galicia
Increase fees, terminate contracts, or move activities to Banco Galicia
Copyright © 2019 Oracle and/or its affiliates
Paysafe
Providing online payment solutions
Real-time payments, e-Wallets
1bn revenue/yr
500000 payments/day
Strong demand for fraud detection
Only feasible with graph data
In real-time, upon money movement
During account creation
In investigation, visualizing payment flows
Analysis of payment flows
Identifying suspicious patterns
Copyright © 2019 Oracle and/or its affiliates
Using graph algorithms for initial assessmentFollowed by interactive analysis with visualization and PGQL
Copyright © 2019 Oracle and/or its affiliates
Moving towards graph analysis with machine learning
Rule Engine:Takes
decision to process or fail
payment
Graph QueryExample: Is there fraudster in 3
payments distance?
Graph Query Example: Do we have linked by password
customer in 3 payments distance?
Example: Pass fraud probability as fact to the rule engine
Graph Database
Machine Learning
Copyright © 2019 Oracle and/or its affiliates
Anomaly Detection using Graph Analysis
Example: Finding anomalies in healthcare billing data
Medical providers and their operations
Providers of the same specialty are close to each other in the graph
Closely connected by common services
a provider vertex exceptionally close to vertices of a different specialty should be an anomaly
Using closeness as a metric
eg. Hop-distance, ...
X
Doctors900,000 HCPCS
6,000Edges9,000,000
Copyright © 2019 Oracle and/or its affiliates
Using Personalized Pagerank to find outliers and anomalies
Variant of Pagerank algorithm that requires a set of starting vertices
Random walks (with restart) from the starting vertices
Computes a new probability of visiting each vertex in the graph biased by the vertices on the starting set
Personalized Pagerank score → a natural relative distance (or closeness) with respect to the vertices from the starting set
Algorithm generates regular pagerank values when starting set contains all vertices in the graph
Starting set of vertices
Copyright © 2019 Oracle and/or its affiliates
Anomaly Detection Procedure
Example: Finding anomalies in healthcare billing data
Medical providers and their operations
Providers of the same specialty are close to each other in the graph
Closely connected by common services
a provider vertex exceptionally close to vertices of a different specialty should be an anomaly
Using closeness as a metric
eg. Hop-distance, ...
X
DoctorsHCPCS
Same specialty(starting set)
Anomalous (other specialty)
Specialty Actions
Copyright © 2019 Oracle and/or its affiliates
Combining Graph Analytics and Machine Learning
Graph Analytics
Compute graph metric(s)
Explore graph or computenew metrics using ML result
Machine Learning
Build predictive modelusing graph metric
Build model(s) and score or classify data
Add to structured data
Add to graph
Copyright © 2019 Oracle and/or its affiliates
Encoding similarity for use in machine learning
Graph captures fine-grained relationship between data entities
As before, closeness can be defined and measured on the graph
Providing numeric representation of your data that retains the distance information
RawData
MLModel
Graph Representation
Numeric Representation (N-dimensional vector)
Copyright © 2019 Oracle and/or its affiliates
Encoding similarity for use in machine learning
Different approaches available
eg. exploiting techniques from modern NLP (natural language processing)
Used Word2Vec in our example
a ML technique that learns closeness between words from large number of sentences
Perform many random walks on the graph
Apply W2V technique on random walk traces, treating vertices as words
KDD‘14
Copyright © 2019 Oracle and/or its affiliates
Using Word2Vec on a large text corpus
Word2Vec – Mikolov et al., 2013, image: Steven Skiena, Stony Brook Univ.Copyright © 2019 Oracle and/or its affiliates
Deepwalk – Translate graphs to a vector space
Copyright © 2019 Oracle and/or its affiliates
Practical example – Student classification
Can you predict a student’s major or department just by looking at the classmates in the course that (s)he is taking?
Very similar to customer segmentation problem
Student => Customer
Course taking => Item or service purchase
Department => Segment label
Copyright © 2019 Oracle and/or its affiliates
CS
ME
10.003
10.004
10.005
11.103
11.213
12.118
students courses
Evaluation – Comparison
1. CNN trained on “standard” features (e.g., student age, courses taken, …)
2. Use PPR and predict the department of the highest-scoring vertex
3. Train a CNN on vertex embeddingsextracted with DeepWalk
4. Add “standard” features beside graph embeddings
Copyright © 2019 Oracle and/or its affiliates
CS
ME
10.003
10.004
10.005
11.103
11.213
12.118
students courses
Results
(Result #1) Graph-based prediction gives better result than naïve application of ML (e.g. CNN) on basic student features (e.g. age, gender, background, …)
(Result #2) Deep-Walk preserves information from graph representation
(Result #3) Deep-Walk allows to combined graph data with other features
CNN on Original Features
PPR (Graph Algorithm)
CNN on Extracted Graph Features(from deep-walk)
CNN on Original + Graph Features
Copyright © 2019 Oracle and/or its affiliates
Enabling Spatial and Graph use cases on every platform
Oracle DatabaseSpatial and Graph Option
Oracle Big DataSpatial and Graph
CloudServices
Database Cloud Service,Exadata Cloud Service,
Graph Cloud Service (planned)Big Data Appliance,Commodity Hadoop,
Spark
Exadata,Non-Engineered Systems
Copyright © 2019 Oracle and/or its affiliates
Oracle‘s Property Graph Technologies – Product Packaging
Available for Big Data platform
Hadoop, HBase, Oracle NoSQL
Supported both on BDA and commodity hardware
CDH and Hortonworks
Database connectivity through Big Data Connectors or Big Data SQL
Included in Big Data Cloud Service
Available since Oracle 12.2 (EE)
Using tables for graph persistence
In-database graph analytics
Sparsification, shortest path, page rank, triangle counting, WCC, sub graph generation…
SQL queries possible
Integration with Spatial, Text, Label Security, RDF Views, etc.
PGQL-to-SQL converter
Oracle Big Data Spatial and Graph Oracle Spatial and Graph (DB option)
Copyright © 2019 Oracle and/or its affiliates
Categories of Graph Analysis
Compute values on vertices and edges
Traversing graph or iterating over graph (usually repeatedly)
Procedural logic
Examples:
Shortest Path, PageRank, Weakly Connected Components, Centrality, ...
Based on description of pattern
Find all matching sub-graphs
Computational Graph Analytics Graph Pattern Matching
:Person{100}name = ‘Amber’age = 25
:Person{200}name = ‘Paul’age = 30
:Person{300}name = ‘Heather’age = 27
:Company{777}name = ‘Oracle’location = ‘Redwood City’
:worksAt{1831}startDate = ’09/01/2015’
:friendOf{1173}
:knows{2200}
:friendOf {2513}since = ’08/01/2014’
Copyright © 2019 Oracle and/or its affiliates
Oracle Graph Analytics Architecture
Scalable and Persistent Storage
Graph Storage Management
Graph Analytics In-memory Analytic Engine
Blueprints & SolrCloud / Lucene
RE
ST
We
b S
erv
ice
Py
tho
n, P
erl, P
HP
, Ru
by,
Jav
ascrip
t, …
Java APIs
Java APIs/JDBC/SQL/PLSQL
Vis
ua
liza
tio
nR
Inte
gra
tio
n (
OA
Ag
rap
h)
Sp
ark
inte
gra
tio
n
Copyright © 2019 Oracle and/or its affiliates
Analytical vs. Transactional System
Three-tier
Graph analysis and traversal queries in-memory
Graph updated in-memory periodically
Two-tier
Graph traversal queries in Oracle Database
Graph updates available to queries in real-time
Shell, Notebook,Application, PGViz
Client Graph
Store
In-memory Engine
Graph AnalysisGraph Traversal
Graph
Store
Shell, Notebook,Application, PGViz
Client
Graph Traversal
Copyright © 2019 Oracle and/or its affiliates
Analyzing the Marvel Graph
g = session.readGraphWithProperties(“config.json”)
analyst.pagerank(g)
analyst.vertexBetweennessCentrality(g)
g.publish(VertexProperty.ALL, EdgeProperty.ALL)
Copyright © 2019 Oracle and/or its affiliates
Pattern matching in Property Graphs using PGQL
Finding a given pattern in graph
Fraud detection
Anomaly detection
Subgraph extraction
...
SQL-like syntax but with graph pattern description and property access
Interactive (real-time) analysis
Supporting aggregates, comparison, such as max, min, order by, group by
Proposed for standardization by Oracle
Specification available on-line
Open-sourced front-end (i.e. parser)
https://github.com/oracle/pgql-lang
Copyright © 2019 Oracle and/or its affiliates
Basic graph pattern matching
Find all instances of a given pattern/template in the data graph
SELECT v3.name, v3.ageFROM socialNetworkGraph
MATCH (v1:Person) –[:friendOf]-> (v2:Person) –[:knows]-> (v3:Person)WHERE v1.name = ‘Amber’
Query: Find all people who are known by friends of ‘Amber’.
socialNetworkGraph
100:Personname = ‘Amber’age = 25
200
:Personname = ‘Paul’age = 30
300
:Personname = ‘Heather’age = 27
777:Companyname = ‘Oracle’location = ‘Redwood City’
:worksAt{1831}startDate = ’09/01/2015’
:friendOf{1173}
:knows{2200}
:friendOf {2513}since = ’08/01/2014’
Copyright © 2019 Oracle and/or its affiliates
Basic graph pattern matching
Find all instances of a given pattern/template in the data graph
SELECT v3.name, v3.ageFROM socialNetworkGraph
MATCH (v1:Person) –[:friendOf]-> (v2:Person) –[:knows]-> (v3:Person)WHERE v1.name = ‘Amber’
Query: Find all people who are known by friends of ‘Amber’.
socialNetworkGraph
100:Personname = ‘Amber’age = 25
200
:Personname = ‘Paul’age = 30
300
:Personname = ‘Heather’age = 27
777:Companyname = ‘Oracle’location = ‘Redwood City’
:worksAt{1831}startDate = ’09/01/2015’
:friendOf{1173}
:knows{2200}
:friendOf {2513}since = ’08/01/2014’
socialNetworkGraph
100:Personname = ‘Amber’age = 25
200
:Personname = ‘Paul’age = 30
300
:Personname = ‘Heather’age = 27
777:Companyname = ‘Oracle’location = ‘Redwood City’
:worksAt{1831}startDate = ’09/01/2015’
:friendOf{1173}
:knows{2200}
:friendOf {2513}since = ’08/01/2014’
Copyright © 2019 Oracle and/or its affiliates
PGQL Examples
SELECT e
MATCH ()-[e]->()
How to get started quickly
... once we have the Graph Cloud Service
Copyright © 2019 Oracle and/or its affiliates
Requirements
Technology to interact with the data
Data modeling tool to convert tabular data
Graph database and analytics environment
Copyright © 2019 Oracle and/or its affiliates
Graph Cloud Service (planned)
“One-click” deployment: no installation, zero configurationAutomated failure detection and recovery
Automated graph modelerEasily convert your relational data into property graphs
Pre-built Algorithms, Flows and SQL-like graph query languageJava, Groovy
Rest APIs
Rich User InterfaceLow code / zero code features
Notebook support and powerful data visualization features
Copyright © 2019 Oracle and/or its affiliates
Converting relational schema to a graph
Copyright © 2019 Oracle and/or its affiliates
Interactive analysis with Notebooks
Copyright © 2019 Oracle and/or its affiliates
Graph visualization
Copyright © 2019 Oracle and/or its affiliates
Graph capabilities in Oracle Products and Cloud Services
Graph databases are powerful tools, complementing relational databases
Especially strong for analysis of graph topology and connectednessGraph analytics offer new insight
Especially relationships, dependencies and behavioural patternsOracle Property Graph technology offers
Comprehensive analytics through various APIs, integration with relational database
Scaleable, parallel in-memory processing
Secure and scaleable graph storage using Hadoop platform or Oracle DatabaseAvailable both on-premise or in the Cloud already today
Copyright © 2019 Oracle and/or its affiliates
Confidential – Oracle Internal/Restricted/Highly Restricted
„Whenever you‘re analyzing relationships, think graphs!“
Key takeaway for today ...
Resources
Oracle Property Graph Technologies OTN product page:https://www.oracle.com/database/technologies/spatialandgraph/property-graph-features.html
White papers, software downloads, documentation and videos
Oracle Labs Tutorials https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html
Blog post series on setting up Graph Analysis on Oracle Cloudhttps://blogs.oracle.com/oraclespatial/how-to-enable-oracle-database-cloud-service-with-property-graph-capabilities
Free cloud credits available on http://cloud.oracle.com
Blog – examples, tips & tricks: blogs.oracle.com/bigdataspatialgraph
@OracleBigData, @SpatialHannes, @JeanIhm Oracle Spatial and Graph Group
Copyright © 2019 Oracle and/or its affiliates
Where does the community meet?
Copyright © 2019 Oracle and/or its affiliates
Thank you
Hans Viehmann
@SpatialHannes
Copyright © 2019 Oracle and/or its affiliates