Powering Real-Time Decision Engines in Finance and Healthcare Using Open Source Software


Upload: greg-makowski

Post on 18-Aug-2015


Category:

Data & Analytics



© 2015 ligaDATA, Inc. All Rights Reserved.

Powering Real-time Decisioning for Financial & Healthcare using Open Source

August 2015

Community @ http://Kamanja.org


In 2014, Barclays embarked on transforming how it leverages its data using Open Source & Big Data technologies.


To achieve this goal with Barclays we needed to:

1.  Create a framework to adopt Open Source Software

2.  Find a catalyst to attract and retain talent


Open Source in Financial Services
Good enough for Internet companies isn't good enough!

Marissa Mayer of Yahoo won’t have to go in front of the Senate to explain why 100,000 records were lost – Barbara Desoer of Citibank would.

What is different about Financial Services?
•  Regulatory requirements demand 100% data protection
•  Security & data governance
•  Auditability
•  Lineage
•  ZERO data loss
•  Integration with legacy ecosystem
•  Skillset


OSS – Adoption Chasm
Why have Financial Services not adopted OSS more aggressively?

A modified “Crossing the Chasm” view for OSS: Creators → Contributors → Users

•  Creators: technology organizations with rich resources, solving a problem and creating a competitive advantage
•  Contributors: technology organizations taking a risk while solving a problem
•  Users: lower technology skillset, low risk tolerance, solving a problem


Barclays Open Source Software (BOSS)

Establish the BOSS framework for the consumption of and contribution to open source software (OSS) at scale in Barclays.

Maturing capability: Consumption → Contribution → Creation
•  Consumption: evaluation & consumption of OSS
•  Contribution: contribution to OSS by enhancing existing open source projects (documentation, fixes, enhancements)
•  Creation: initiation of a new OSS project, championing and facilitating OSS community development and consumption

Diagram markers: Barclays Current Focus, Step Change, Pioneering Target

BOSS optimises Consumption, enables Contribution and Creation
•  Input from internal and external stakeholders influenced the BOSS framework definition
•  OSS advisory board to steer and drive
•  Pre-approved license types per use case (consumption and contribution)
•  Invest in enabling technology: GitHub, Black Duck, Sonatype
•  No new governance steps; leverage and streamline existing controls instead of creating new ones


BOSS – Collective Thought Process

The BOSS framework is designed based on guidance and feedback received from key representatives within Barclays and from leading open source contributors and fellow banks.

Internal
•  Business Units: Retail, Investment, Cards
•  Control Functions: Legal, Risk, Security, Sourcing
•  Technology: Data, Design, Infra

External
•  Leading open source contributors and fellow banks


Attracting and Retaining Talent

Millennial developers …
•  Grew up using OSS
•  Unaware of Closed Source software
•  Want to engage, share and contribute

Real-time using Kamanja was selected as a capability big enough and important enough to build a Center of Excellence around.


Deriving Decisions from Big Data

•  BATCH: cross section of events; analytics, MI; offline, longer cycle
•  REAL-TIME: individual events; decisioning, detection; in-context and online


Real-time Decision Requirement

Customer-centric product design requires real-time decisions.

END-TO-END CAPABILITY: Triggers → Real-time Decisions → Actions
•  Tracking & analyzing (processing) streams of information (real-time) about things that happen (events)
•  Deriving an opportunity or threat
•  Scoring, notifications, alerts, transactional updates


LigaDATA introduced Kamanja – an open source real-time decisioning project, hardened for Financial Services & Healthcare requirements and scalable to IoT-level data volumes, enabling low-latency use cases.

USE CASES
•  Customer churn/retention
•  Risk Analysis
•  Customer Contact
•  Cyber Crime
•  Fraud
•  Security & Compliance
•  Audit & Governance
•  Marketing
•  Telephony Interception
•  Real-Time Offer


Uses of Real-Time Decisioning

Complex Event Processing (CEP)
•  A few to possibly hundreds of concurrent data streams
•  Apply rule logic, select, aggregate
•  Decide action on elements in stream

Enterprise Applications, During …
•  customer call or chat: recommendations to improve service
•  card transaction: offer credit increase
•  web application: pre-approval
•  web transaction: recommend other product(s)
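The CEP steps listed on this slide (apply rule logic, select, aggregate, decide an action per stream element) can be sketched in a few lines of Python. This is an illustrative sketch only, not Kamanja's API: the event fields, the `decide` function, the window size, and the amount threshold are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque

# Hypothetical rule parameters, chosen only for illustration.
WINDOW_SIZE = 3          # number of recent transactions kept per card
AMOUNT_THRESHOLD = 1000  # alert when the windowed total exceeds this


def decide(events):
    """Yield (event, action) pairs for a stream of events."""
    recent = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))
    for event in events:
        # Select: ignore anything that is not a card transaction.
        if event["type"] != "card_txn":
            continue
        # Aggregate: total of the last WINDOW_SIZE amounts for this card.
        window = recent[event["card"]]
        window.append(event["amount"])
        total = sum(window)
        # Decide: apply the rule to the element just seen.
        action = "alert" if total > AMOUNT_THRESHOLD else "pass"
        yield event, action


stream = [
    {"type": "card_txn", "card": "A", "amount": 400},
    {"type": "login", "card": "A", "amount": 0},
    {"type": "card_txn", "card": "A", "amount": 500},
    {"type": "card_txn", "card": "B", "amount": 50},
    {"type": "card_txn", "card": "A", "amount": 300},
]
actions = [action for _, action in decide(stream)]
print(actions)
```

Because `decide` is a generator, each decision is produced as soon as its event arrives, which is the property that separates in-stream decisioning from a batch pass over stored data.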


Case Study of a Modeling Department

Monitor $80B of consumer bank transactions per year to detect fraud (between 1,400 banks)

PAIN POINT: ~2 months to deploy (the modeling group was different from the deployment group)

INDUSTRY REVIEW to answer:
•  How common is it to use many algorithms or tools in a project?
•  What is an easier way to deploy models?


http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html

Independent use of tools


http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html

Tools used in combination


PMML Diagram (Predictive Model Markup Language)

Training Project Phase (PMML Producer)
•  Training & test data (batch) → Data Mining Tool → File, Save As PMML → PMML File (full model specification)

Production Scoring Project Phase (PMML Consumer)
•  PMML File + scoring data (real-time streaming) → Scoring Engine (Kamanja) → output data has new score field
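To make the producer/consumer split concrete, here is a minimal sketch of the consumer side using only the Python standard library. It assumes a toy PMML 4.2 `RegressionModel` with two numeric predictors; the field names (`amount`, `hour`, `score`) and coefficients are invented for illustration, and the helper functions are hypothetical. A real PMML consumer such as Kamanja or JPMML handles far more of the specification (transformations, many model types, missing-value handling).

```python
import xml.etree.ElementTree as ET

PMML_NS = {"p": "http://www.dmg.org/PMML-4_2"}

# A toy model file, as a data mining tool might save it (values are made up).
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="toy regression, illustration only"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="hour" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="hour"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.01"/>
      <NumericPredictor name="hour" coefficient="-0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_model(pmml_text):
    """Read the intercept and coefficients from a simple RegressionModel."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", PMML_NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", PMML_NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Return a copy of the record with a new 'score' field added."""
    scored = dict(record)
    scored["score"] = intercept + sum(
        coef * record[name] for name, coef in coefficients.items()
    )
    return scored


intercept, coefficients = load_linear_model(PMML_DOC)
result = score({"amount": 100.0, "hour": 2.0}, intercept, coefficients)
print(result)
```

The point of the sketch is the hand-off: the scoring side needs only the PMML file, not the tool or language that trained the model, which is what lets the modeling group and the deployment group work independently.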


Given industry fragmentation, PMML is a solution

PMML Producers (18 companies)
•  R (Rattle, PMML)
•  RapidMiner
•  KNIME
•  Spark (MLlib) (Open Source)
•  Weka
•  SAS Enterprise Miner

PMML Consumers (12 companies)
•  Zementis
•  SAS
•  IBM SPSS
•  KNIME
•  Microstrategy
•  Kamanja
•  JPMML

PREDICTIVE: Naïve Bayes, Neural Net, Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees

DESCRIPTIVE / OTHER: Association Rules, Cluster, K-Nearest Neighbors, Text Models, model ensembles & composition (e.g. Gradient Boosting)


OSS Technology Stack Integration with Kamanja

•  Real Time Computing: Kamanja (PMML/Java/Scala Consumer), high level languages / abstractions
•  Compute Fabric: Cloud, EC2, Internal Cloud
•  Security: Kerberos
•  Real Time Streaming: Kafka, MQ, Spark*
•  Data Store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
•  Resource Management: Zookeeper, Yarn*, Mesos*
•  High Level Languages / Abstractions: MLlib* (PMML Producer)


Performance Characteristics

Performance
•  Throughput of million messages/second
•  Uses commodity hardware

Scalability
•  Linear horizontal scalability
•  Data partitioning support
•  Runtime multi-model optimizations to support thousands of models
•  Consistent performance on hundreds of models and thousands of rules

Built for IoT data volumes
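One common way data partitioning supports horizontal scaling is key-based routing: every message for a given key goes to the same partition, so per-key state stays on one node and more nodes can be added to take more keys. The sketch below illustrates that general idea only; it is not taken from Kamanja, and the function names, the `customer_id` field, and the partition count are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(messages, num_partitions):
    """Group messages by partition, preserving arrival order per partition."""
    partitions = defaultdict(list)
    for message in messages:
        index = partition_for(message["customer_id"], num_partitions)
        partitions[index].append(message)
    return partitions


messages = [
    {"customer_id": "c-1", "amount": 10},
    {"customer_id": "c-2", "amount": 20},
    {"customer_id": "c-1", "amount": 30},
]
routed = route(messages, num_partitions=4)
for index, batch in sorted(routed.items()):
    print(index, batch)
```

A cryptographic hash is used here instead of Python's built-in `hash` because the built-in is randomized per process for strings, which would send the same key to different partitions on different nodes.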


Medical Company use of Kamanja

•  Clinicians (knowledge experts) develop heuristic-based rule set models
•  The initial model was COPD (Chronic Obstructive Pulmonary Disease) risk assessment
•  Support of referenced Beneficiary, HL7, Inpatient Claim, and Outpatient Claim
•  Models are expressed with a domain specific language (DSL) they developed
•  DSL models are transformed to PMML for Kamanja
•  Models consume current + prior related messages over a “look back period”
    •  Save the “assertions” of a patient in the database (beyond standard PMML)
    •  “State” can evolve over time
•  The “Medical Company” plans to integrate the DSL with their ontology data modeling effort
•  Goal is to generate new models as their “medical world” ontology evolves
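The "look back period" and saved "assertions" described on this slide can be illustrated with a small sketch. Everything in it is an assumption made for the example: the one-year window, the two-claim threshold, the `RESP` code, and the assertion name are invented and carry no clinical meaning, and the in-memory dictionaries merely stand in for a message store and a database.

```python
from collections import defaultdict
from datetime import date, timedelta

LOOK_BACK = timedelta(days=365)  # hypothetical look-back period
MIN_RELATED_CLAIMS = 2           # hypothetical rule threshold

history = defaultdict(list)      # patient_id -> [(claim_date, code)]
assertions = defaultdict(set)    # patient_id -> saved assertions


def process(patient_id, claim_date, code):
    """Handle one message using the current and prior related messages."""
    history[patient_id].append((claim_date, code))
    cutoff = claim_date - LOOK_BACK
    related = [
        c for d, c in history[patient_id]
        if cutoff <= d <= claim_date and c == "RESP"
    ]
    # Saved assertions persist, so a patient's state evolves over time.
    if len(related) >= MIN_RELATED_CLAIMS:
        assertions[patient_id].add("elevated_respiratory_risk")
    return set(assertions[patient_id])


first = process("p1", date(2014, 1, 10), "RESP")
second = process("p1", date(2014, 6, 1), "OTHER")
third = process("p1", date(2014, 9, 15), "RESP")
print(first, second, third)
```

The rule only fires on the third message, because it needs two related claims inside the window; a claim older than the look-back period would not count toward the threshold.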


Try out


Community @ http://Kamanja.org