analytics modernization - tibco software...aws s3, aws emr, aws glue agile, open architecture...


Posted on 13-Jul-2020


Page 1

Analytics Modernization

Ravi Hubbly, President

[email protected]

Page 2

Explore Digits Inc

● Data Engineering and Data Science solutions
● 5+ years of experience with TIBCO Data Science and AWS
● Experience providing TIBCO-based solutions supporting initiatives such as:
○ Identifying factors contributing to health issues in long-term space travel
○ Defining policies to immediately manage disease outbreaks
○ Optimizing monetary flow in the economy

Agenda

1. Challenges mature organizations face in taking the AI/ML journey.

2. Best practices and solutions that can address these challenges.

3. Some key discriminators you may want to consider.


Page 3

Journey from Descriptive Analytics to AI/ML

Use Case | Machine Learning | Artificial Intelligence
Space Agency | Predict whether the person talking is an astronaut | Detect that an astronaut is under stress and recommend actions to reduce the stress
Global Health Agency | Detect the cause of an epidemic | Suggest policies to control its spread

Analytics: Report the number of services customers have used
Advanced Analytics: Cluster customers based on patterns of services used
Machine Learning: Predict which services a customer will use
Artificial Intelligence: Detect a customer need and recommend a solution

Descriptive & Diagnostic Analytics: How are we doing?
Predictive Analytics: What might happen in the future?
Prescriptive Analytics: Suggest a course of action

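As a minimal sketch of the first two rungs of this progression, the Python below works on hypothetical (customer, service) usage records. The descriptive step counts distinct services per customer. The second step is a deliberately crude stand-in for clustering: it groups customers who used exactly the same set of services. A real advanced-analytics step would use a clustering algorithm such as k-means on richer features; the record layout and function names here are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical usage records: (customer_id, service) pairs.
USAGE = [
    ("c1", "billing"), ("c1", "support"),
    ("c2", "support"), ("c2", "billing"),
    ("c3", "claims"), ("c3", "claims"),  # repeated use of one service
]

def services_by_customer(records):
    """Collect the set of distinct services each customer has used."""
    services = defaultdict(set)
    for customer, service in records:
        services[customer].add(service)
    return services

def count_services(records):
    """Descriptive analytics: number of distinct services per customer."""
    return {c: len(s) for c, s in services_by_customer(records).items()}

def group_by_pattern(records):
    """Crude stand-in for clustering: identical service sets share a group."""
    groups = defaultdict(list)
    for customer, services in services_by_customer(records).items():
        groups[frozenset(services)].append(customer)
    return dict(groups)
```

Here `count_services(USAGE)` answers "how are we doing?", while `group_by_pattern(USAGE)` places c1 and c2 together because they used the same two services.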

Page 4

As-Is Fraud Prevention Environment

Model Development & Investigation Environment: Teradata, SAS, Tools
Requirements → Implementation
Execution Environment: Hadoop (data files, Hive metastore, MapReduce jobs written as Pig programs) with IBM SPSS + IBM Rules
Data sources: Payment System, Claims, Provider, Beneficiary, Drug Quality, Third Party data
Hosting: Data Center

Challenge #1
Ineffective model deployment due to two different environments/solutions

Challenge #2
Significant effort in managing data pipelines

Challenge #3
Significant O&M effort for the data environment based on Hadoop & IBM

Challenge #4
Scalability in handling thousands of models, especially ML- and AI-based models such as deep neural networks

Challenge #5
Limiting legacy tools, e.g., SAS is legacy and limits capability; IDR only supports RDBMS-based analytics

Challenge #6
Fixed, expensive on-premise data center


Page 5

What is the Biggest Challenge in Accelerating Data Analytics?

Data Sources: Multiple Databases, Analytics Data, External Feeds, Mainframe, Apps data
Getting the data: 80%-90% of the effort
Usage Patterns: ML/AI, Analytics

“It’s impossible to overstress this: 80%-90% of the work in any data project is in getting the data.”

Data sharing is held back by:
Legacy data pipelines built since the 1990s
Legacy, inflexible, and proprietary data repositories
Significant investment in legacy analytics tools such as SAS and client-server-based BI tools

Proprietary & Confidential.

Page 6

And It Impacts Your Entire Organization

Data Sources: Multiple Databases, Analytics Data, External Feeds, Mainframe, Apps data
Data preparation: Discovering, Structuring, Cleaning, Enriching, Validating, Deploying
Roles: Data Analyst, Data Engineer, Data Scientist
Usage: Analysis; Data Warehouse and Data Lake; AI/ML; Business Intelligence
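The six preparation activities named on this slide can be sketched as a small Python pipeline over a hypothetical claims-like feed, one function per activity. The column names, the region lookup table, and the single validation rule are illustrative assumptions; in practice each step would be handled by data preparation tooling rather than hand-written functions.

```python
import csv
import io

# Hypothetical raw feed with inconsistent formatting and a missing value.
RAW = """member_id,state,visit_date,amount
m1, va ,2019-01-05,120.50
m2,MD,2019-01-07,
m3,va,2019-02-11,88.00
"""

# Hypothetical reference table used in the enriching step.
STATE_REGION = {"VA": "South", "MD": "South"}

def discover(text):
    """Discovering: look at which columns the feed contains."""
    return next(csv.reader(io.StringIO(text)))

def structure(text):
    """Structuring: parse the raw text into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleaning: trim and normalize fields; drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append({
            "member_id": row["member_id"].strip(),
            "state": row["state"].strip().upper(),
            "visit_date": row["visit_date"].strip(),
            "amount": float(row["amount"]),
        })
    return cleaned

def enrich(rows):
    """Enriching: add a derived attribute from the reference table."""
    return [{**row, "region": STATE_REGION.get(row["state"], "Unknown")} for row in rows]

def validate(rows):
    """Validating: enforce a simple business rule before publishing."""
    for row in rows:
        if row["amount"] < 0:
            raise ValueError(f"negative amount for {row['member_id']}")
    return rows

def deploy(rows):
    """Deploying: serialize the prepared records back to CSV text."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

prepared = validate(enrich(clean(structure(RAW))))
```

Chaining the functions makes the point of the slide visible: most of the code exists to get the data into shape before any analysis starts.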

Page 7

Moving to The Cloud Aggravates The Problem

Legacy Tools Are Insufficient to Tackle Data Challenges in the Cloud

Not integrated with Cloud platforms

Unable to handle diverse data in the cloud

Lack self-service for business users

Rigid design, poor scalability, costly

Poor Analytics Outcome in the Cloud


Page 8

New Tools and Migration Process Needed for the Cloud

Over time:
1. Build out a foundation that is “Cloud & Big Data Enabled”
2. Migrate data and analytics to reap the benefits of low cost and high performance (key)
3. Improve analytics, including self-service and collaboration
4. Operationalize AI and ML

Data Environment: an intelligent, powerful data foundation architected for the cloud
Analytics Environment: leveraging open source and/or a best-of-breed, ready-to-use analytics platform
Data and Analytics Migration Services: automated migration of legacy analytics and data pipelines to modern platforms
User enablement for accelerated transition

Page 9

Nationwide Healthcare Quality Improvement

Foundational Principles:

• Enable innovation
• Data-driven decision and policy making
• Foster learning organizations
• Eliminate disparities; strengthen infrastructure and data systems

Goals

Make care safer

Strengthen person and family engagement

Promote effective communication and coordination of care

Promote effective prevention and treatment

Promote best practices for healthy living

Make care affordable


Page 10

Continuous Improvement Activities

Root Cause Analysis: Identify challenges/problems in real time and effectively apply problem-solving techniques.

Measurement Strategies: Use data to develop healthcare quality measures and/or a measurement strategy.

Data Driven: Collect, aggregate, analyze, display, and openly share data (i.e., to achieve or work toward transparency).

Data Collection: Large and diverse data sets, both internal and external.

Data Analytics & Reporting: Monitor, educate, and develop policies.

Transparency: Collaboratively share methods and results.

Copyright © 2019, Explore Digits Inc. All rights reserved. 10

Page 11

Addressing Hospital Readmissions: Real-World Scenario

Identify the 60% of the population that needs to be targeted for readmission improvement.

Understand the network/dependencies and conduct root cause analysis.

Contact potential healthcare providers; convey goals and possible causes.

Recruit providers to participate in the program.

Monitor progress during the program period; this is why access to claims data closer to real time becomes important.
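Finding where readmissions occur is the first analytic step in this scenario. A minimal Python sketch, assuming hypothetical inpatient stay records of (patient_id, provider_id, admit_date, discharge_date) and a 30-day window (a commonly used readmission window, but an assumption here): it flags each stay that begins within the window after the same patient's previous discharge and attributes it to the discharging provider, which gives a starting list of providers to contact. Real quality measures add exclusions, such as planned readmissions and transfers, that are omitted here.

```python
from collections import Counter, defaultdict
from datetime import date

def flag_readmissions(stays, window_days=30):
    """Return (patient_id, attributed_provider, readmit_date) for each stay that
    starts within `window_days` of the same patient's previous discharge."""
    by_patient = defaultdict(list)
    for patient_id, provider_id, admit, discharge in stays:
        by_patient[patient_id].append((admit, discharge, provider_id))
    flagged = []
    for patient_id, history in by_patient.items():
        history.sort()  # chronological order by admission date
        for (_, prev_discharge, prev_provider), (admit, _, _) in zip(history, history[1:]):
            gap_days = (admit - prev_discharge).days
            if 0 <= gap_days <= window_days:
                # Attribute the readmission to the provider that discharged the patient.
                flagged.append((patient_id, prev_provider, admit))
    return flagged

def readmissions_by_provider(stays, window_days=30):
    """Count flagged readmissions per attributed provider."""
    return Counter(provider for _, provider, _ in flag_readmissions(stays, window_days))

# Hypothetical stays for two patients.
STAYS = [
    ("p1", "h1", date(2019, 1, 1), date(2019, 1, 5)),
    ("p1", "h2", date(2019, 1, 20), date(2019, 1, 25)),  # 15 days after discharge
    ("p2", "h1", date(2019, 3, 1), date(2019, 3, 3)),
    ("p2", "h1", date(2019, 6, 1), date(2019, 6, 2)),    # 90 days later: not flagged
]
```

Keeping `window_days` as a parameter makes it easy to rerun the same logic as fresher claims data arrives during the monitoring period.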


Page 12

Legacy Environments

On-premise Unix-based workbench with the following major capabilities:

Centralized file system: data is made available in SAS data format; limited file storage capacity.

SAS server tool: limited processing capability; used mainly to extract data.

Observations

80-90% of the effort is data preparation.

Based on conversations, only limited statistical and analytics algorithms are in use.

Thousands of legacy SAS modules measure and monitor healthcare quality.


Page 13

Drawbacks of Legacy Environments

Need for diverse compute and consolidated data storage capability

Workbench performance and capacity issues

Low productivity

Expensive investments in local data centers

Need for closer-to-real-time data

Delayed data makes interaction with impacted users ineffective

Need for more data for effective analytics

Not built for modern ML/AI

New employees are not skilled in legacy tools, hence expensive labor

Need to close gaps in reporting and data visualization tools


Page 14

Data & Analytics Modernization

Business Goals

Improve the analytic experience for users through advanced functionality

Modernize current analytic platforms and data repositories

Increase collaboration and data sharing for internal/external stakeholders

Improve data timeliness

Simplify/automate data access

Reduce data duplication, mature data governance, and increase data quality and security

Objectives

Establish a Centralized Data Repository (CDR) and Analytics Platform (AP) in the cloud: create a centralized repository that includes enterprise data, etc.

Make program-specific datasets available for analytics and for sharing with multiple consumers

Implement an analytics toolset

Move analytic operations

Assist the transition to the cloud

Page 15

Solution Mapped to Needs

Sources: Databases, Log Files, Spreadsheets, IoT Sensors, Apps
Cloud Platform: Data Lake, Processing Engine, Data Warehouse, ETL/ELT, Access Control
Consumers: AI & ML, BI Reporting, Enterprise Data Service, Analysis
Solution components: Central Data Repository, OAP, ML-based Advisor, Analytics (SAS) Migration Service, Database Migration Service, Data Pipeline Migration Service

Page 16

Modernization Vision: Cloud-Based Data and Analytics Solution

Data Repository
• Data objects (Amazon S3)
• Hive metastore on AWS (AWS Glue Data Catalog planned)
• Services: Data De-Identification, Data Catalog, Data Hydration, Data Preparation, Billing
• Source data: Clinical, Provider, Patients, Vendors, Enrollment

Program clusters (each in its own VPC, with an Apache HDFS/Spark cluster and local storage on Amazon S3; Apache Hive is shown for two of them)
• Enterprise Cluster / Program 1: TIBCO Data Science Portal
• Program 2: Reporting + Data Prep Portal
• Program 3: TIBCO + Reporting Portal

On-demand compute: Amazon EMR
Cross-cutting services: AWS IAM, AWS Lambda, AWS DMS
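The diagram lists a Data De-Identification Service without describing its rules. A minimal sketch of one common approach, with hypothetical field names: drop direct identifiers, replace the member ID with a keyed HMAC-SHA256 pseudonym so records from different sources can still be joined, and generalize the birth date to a year. This illustrates the idea only; it is not a compliance-grade de-identification method, and the key handling is an assumption.

```python
import hashlib
import hmac

# Hypothetical field names; a real service would drive these from configuration.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}

def pseudonymize(value, key):
    """Keyed hash (HMAC-SHA256): stable for joins; without the key,
    pseudonyms cannot be recomputed from known IDs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def de_identify(record, key):
    """Return a copy of `record` with identifiers dropped, hashed, or generalized."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "member_id":
            out["member_key"] = pseudonymize(value, key)
        elif field == "birth_date":
            out["birth_year"] = value[:4]  # ISO date "YYYY-MM-DD" -> "YYYY"
        else:
            out[field] = value
    return out

# Assumption: in practice the key would come from a managed secrets store.
SECRET_KEY = b"replace-with-a-managed-secret"
```

Because the same member ID always maps to the same `member_key` under one key, program clusters can join de-identified datasets without ever seeing the original identifier.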

Page 17

Modernized Architecture

Data repository: AWS Lake Formation, including Amazon S3 for storage, the AWS Glue Data Catalog (Hive), AWS Glue ETL or AWS DMS, and EDOS tools
Programs use a customized AWS Lake Formation setup or connect to an existing environment.

• Ingest: internal data sources and external data feeds pass through de-identification and data integration
• Central: Central Data, Central Data Catalog, Central Amazon EMR Cluster, Data APIs, Presto SQL Engine
• Program 1 Cluster (VPC, IAM): Local Data, Local Data Catalog, Data APIs, Presto SQL Engine
• Program 2 Cluster (VPC, IAM): Local Data, Data Catalog, Data APIs, Amazon Redshift
• Access: TIBCO Data Science and reporting tools, controlled through IAM

Benefits:
• Eliminate the need to replicate data or download data to machines
• Reduce data centers and data movement
• Improve data quality

Page 18

Open Analytics Platform

Vision
Increase productivity with data science. Provide a self-service data analytics solution that is scalable, affordable, and leverages big data.

Challenges
• Time to data insight is too long
• Data scientist skills are expensive
• 80% of time is spent finding and cleaning data
• Analytics limited to sample data sets
• Applying analytics in the operational environment requires recoding
• Significant dependency on IT teams, causing delays
• Lack of collaboration in data science, resulting in data analytics silos
• Volume, velocity, and variety of data

Solution
TIBCO Platform is part of an integrated, best-of-breed big data analytics solution on AWS.
• Works with Apache Hadoop and Apache Spark
• 80%-ready solution for most customers
• Allows focus on the mission and the benefit of continuous innovation

Page 19

Open Analytics Platform: Tool Choices

Multiple tools to support various needs and offer cost savings

Data Visualization: Arcadia
Data Transformation: Trifacta
Advanced Modeling: TIBCO
Analytics Toolset: SAS Viya
Bring-Your-Own-Analytics (BYOA): open source tools

Page 20

OAP: Open Analytics Platform Capabilities

Core capabilities, with technology and benefits:

Data Storage and Processing
▪ Technology: AWS S3, AWS EMR, AWS Glue
▪ Benefits: Agile, open architecture solution for storing and processing massive volume, velocity, and variety of data. Enables data integration, streaming ingest, and storing and processing of unstructured data.

Event Framework
▪ Technology: AWS Lambda; TIBCO Data Science
▪ Benefits: Real-time processing of new data on event triggers; microservice-based bots that push analytics in real time for maximum benefit.

Data Governance
▪ Technology: AWS Glue Data Catalog
▪ Benefits: Smart data catalog that automatically discovers, organizes, and surfaces high-quality information, making it easy to find and use data.

Predictive Transformation
▪ Technology: TIBCO Data Science
▪ Benefits: Self-service data preparation using a suggestive approach to rapidly develop data processing jobs.

Data Visualization
▪ Technology: Arcadia Data/Tableau
▪ Benefits: Self-service approach to rapidly building and sharing interactive data visualizations. Link analytics tools give greater insight into data relationships to isolate suspicious entities. Geospatial analytics perform iterative spatial analysis to solve complex problems related to the physical world.

Advanced Analytics & AI/ML
▪ Technology: TIBCO Data Science
▪ Benefits: Provides a collaborative, visual environment to create and deploy analytics workflows and predictive models.
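The Event Framework capability pairs AWS Lambda with event triggers. A minimal sketch of a Python Lambda handler for Amazon S3 object-created notifications: it reads the bucket name and object key from each record (keys arrive URL-encoded, so they are decoded with `urllib.parse.unquote_plus`) and passes them to `score_new_object`, a hypothetical placeholder for whatever analytics step should run on new data. Configuring the bucket notification that invokes the function is assumed to happen separately.

```python
import urllib.parse

def score_new_object(bucket, key):
    """Hypothetical placeholder for the analytics step run on new data."""
    return {"bucket": bucket, "key": key, "status": "queued"}

def lambda_handler(event, context):
    """Entry point for S3 event notifications delivered to AWS Lambda."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append(score_new_object(bucket, key))
    return results

# Trimmed-down example event containing only the fields the handler reads.
SAMPLE_EVENT = {
    "Records": [
        {"s3": {"bucket": {"name": "claims-landing"},
                "object": {"key": "2020/07/new+claims.csv"}}}
    ]
}
```

Calling `lambda_handler(SAMPLE_EVENT, None)` locally exercises the same code path Lambda would use, which makes the handler easy to test before deployment.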

Page 21

AI/ML Environment

Inputs: Data Sources, External Data, Digital Exhaust
Common Analytic AI Platform
• Investigation
• Data Discovery & Governance: Enterprise Metadata Search, Infrastructure Logs, Knowledge Management, Collaboration, Crowd Sourcing
• Model Development → Deploy → Operationalized Production-Ready AI Services (AI bots, AI Engine)
• Monitoring & Governance & Control
Users: Scientist, Analyst, Business/Operations

Challenge #1 addressed
Effective model deployment, because deployment is seamless

Challenge #2 addressed
The compute-to-data pattern eliminates data silos and hence the effort of getting data

Challenge #3 addressed
AWS managed services reduce the infrastructure O&M burden

Challenge #4 addressed
Distributed clusters such as Apache Spark, together with Amazon SageMaker, provide a highly scalable, high-performance execution engine

Challenge #5 addressed
Modern tools help bring in a new generation of data scientists

Challenge #6 addressed
Replaces the fixed, expensive on-premise data center

Page 22

SAS to Python

Proprietary & closed source (SAS) vs. open source (Python):
• Licensing: recurring licensing costs on usage and any expansion, vs. no license fee on usage or expansion
• Skills: specialized skillset or training needed for resources, vs. widely adopted, with resources that have the needed skillset easy to find
• Deployment: constrained by licensing and hard to deploy anywhere without first licensing the software, vs. easy deployment across in-house or cloud infrastructures

Page 23

Problem: Manually convert SAS to Python
• Manual effort by teams of data scientists
• Resources must know both SAS and Python
• Time-consuming and error-prone
• Very expensive in terms of resources

Solution: Automate conversion
• We use a solution that automatically* converts code written in the SAS language to open source Python 3.x, with the goal of enabling data scientists to use the modern machine learning and deep learning packages available via Python.
• Written by a team of data scientists and engineers to convert SAS to Python correctly.

* 80% automation is usually seen, and it increases with familiarity with the SAS code being converted.

Page 24

Simple language constructs
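As a hypothetical illustration of converting simple SAS language constructs (not output from the conversion tool described above), the sketch shows a small SAS DATA step with IF/THEN/ELSE logic, kept as comments, next to a hand-written Python equivalent. Two SAS semantics a correct conversion has to preserve are noted in the comments: a character variable's length is fixed by its first use, and missing numeric values compare lower than any number.

```python
# Hypothetical SAS input (a DATA step with IF/THEN/ELSE):
#
#   data work.claims_flagged;
#     set work.claims;
#     length risk $6;  /* without this, 'MEDIUM' would be truncated to 4 characters */
#     if amount > 1000 then risk = 'HIGH';
#     else if amount > 100 then risk = 'MEDIUM';
#     else risk = 'LOW';
#   run;

def assign_risk(amount):
    """Python equivalent of the IF/THEN/ELSE chain above."""
    if amount is None:
        # SAS treats a missing numeric as smaller than any number,
        # so both comparisons are false and the ELSE branch applies.
        return "LOW"
    if amount > 1000:
        return "HIGH"
    elif amount > 100:
        return "MEDIUM"
    return "LOW"

def flag_claims(claims):
    """Equivalent of SET: process each input row and add the new column."""
    return [{**claim, "risk": assign_risk(claim["amount"])} for claim in claims]
```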


Page 25

Datalines
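DATALINES lets a SAS DATA step read data embedded in the program itself. A hypothetical example and one way to express it in Python with only the standard library: the inline block becomes a string literal, and list input becomes a whitespace split. SAS list-input details such as the default 8-character length for `$` variables are not reproduced; `read_datalines` is an illustrative helper name.

```python
# Hypothetical SAS input: list input with inline data.
#
#   data work.scores;
#     input name $ score;
#     datalines;
#   Ann 91
#   Bob 78
#   Cy 85
#   ;
#   run;

DATALINES = """\
Ann 91
Bob 78
Cy 85
"""

def read_datalines(text):
    """Parse whitespace-delimited list input: a character field, then a numeric one."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, score = line.split()
        rows.append({"name": name, "score": float(score)})
    return rows

scores = read_datalines(DATALINES)
```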


Page 26

Statistical Calculations
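For statistical procedures, a conversion maps the procedure's options onto equivalent library calls. Below is a hypothetical PROC MEANS request with a standard-library Python equivalent. `proc_means` is an illustrative helper name, and a pandas-based conversion would be more typical for real data volumes.

```python
import statistics

# Hypothetical SAS input:
#
#   proc means data=work.scores n mean std min max;
#     var score;
#   run;

def proc_means(values):
    """Summary statistics similar to PROC MEANS for one numeric variable."""
    present = [v for v in values if v is not None]  # missing values are excluded
    return {
        "N": len(present),
        "Mean": statistics.mean(present),
        # Sample standard deviation (n - 1 divisor), matching the PROC MEANS default VARDEF=DF.
        "Std Dev": statistics.stdev(present),
        "Minimum": min(present),
        "Maximum": max(present),
    }

summary = proc_means([91.0, 78.0, 85.0, None])
```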


Page 27

ETL code
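ETL logic in SAS often relies on sorted MERGE steps. Below is a hypothetical inner join written as a MERGE with IN= flags, followed by a plain-Python equivalent. The equivalence holds only under the assumptions stated in the docstring (at most one member row per key, no overlapping non-key columns); `inner_merge` is an illustrative helper name.

```python
# Hypothetical SAS input: an inner join written as a MERGE with IN= flags.
#
#   proc sort data=work.claims;  by member_id; run;
#   proc sort data=work.members; by member_id; run;
#   data work.claims_enriched;
#     merge work.claims(in=c) work.members(in=m);
#     by member_id;
#     if c and m;
#   run;

def inner_merge(claims, members, key="member_id"):
    """Keep claims whose key also appears in members, adding member attributes.

    Assumes members has at most one row per key and the two inputs share no
    non-key columns; SAS MERGE behaves differently for many-to-many matches.
    """
    lookup = {member[key]: member for member in members}
    merged = []
    # sorted() is stable, like PROC SORT's default EQUALS behavior within BY groups.
    for claim in sorted(claims, key=lambda row: row[key]):
        member = lookup.get(claim[key])
        if member is not None:  # equivalent of `if c and m;`
            merged.append({**claim, **member})
    return merged
```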


Page 28

Questions?