his conference 2017 - ipa conference 2017/analytical ecosystem... · his conference 2017 © 2016...

6
HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated? Brian Grant The Changing Environment What are the new Technologies? How does It all fit together? Conclusions Agenda © 2016 Teradata 2 3 Evolution in The Use of Information OPERATIONALIZING WHAT IS happening now? ACTIVATING MAKE it happen! Link to Operational Systems Automated Linkages REPORTING WHAT happened? PREDICTING WHAT WILL happen? ANALYZING WHY did it happen? Batch Reports Ad Hoc, BI Tools Predictive Models Insights Action s Does this process change with Big Data? No. Do the tools you need change? Yes. 4 How is Data Changing? BIG DATA WEB Terabytes CRM Gigabytes Megabytes ERP Petabytes Increasing Data Variety and Complexity User Generated Content Mobile Web SMS/MMS Sentiment External Demographics HD Video Speech to Text Product/ Service Logs Social Network Business Data Feeds User Click Stream Web Logs Offer History Dynamic Pricing A/B Testing Affiliate Networks Search Marketing Behavioral Targeting Dynamic Funnels Payment Record Support Contacts Customer Touches Purchase Detail Purchase Record Segmentation Offer Details 5 The Data Mart Era ”Just Give Me Any Old Data – And Fast!” The EDW Era “Give me integrated, high quality data that enables me to optimise end-to-end business processes cost- effectively.” The Logical Data Warehouse Era “Centralise the data that are widely re-used and shared - but integrate all of the data and the analytics.” How have Enterprise Analytical Architectures evolved? The Changing Environment What are the new Technologies? How does It all fit together? Conclusions Agenda © 2016 Teradata 6

Upload: others

Post on 27-Oct-2019

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

1

What is the Analytical Ecosystem?And why is it making my life more complicated?

Brian Grant

2

The Changing Environment

What are the new Technologies?

How does It all fit together?

Conclusions

Agenda

© 2016 Teradata2

3

Evolution in The Use of Information

OPERATIONALIZINGWHAT IS

happening now?

ACTIVATINGMAKE it happen!

Link to Operational Systems

Automated Linkages

REPORTINGWHAT

happened?

PREDICTINGWHAT WILL happen?

ANALYZINGWHY

did it happen?

Batch Reports

Ad Hoc, BI Tools

Predictive Models

Insights

Actions

Does this process change with Big Data? No.Do the tools you need change? Yes.

4

How is Data Changing?

BIG DATA

WEBTerabytes

CRMGigabytes

Megabytes ERP

Petabytes

Increasing Data Variety and Complexity

User Generated Content

Mobile Web

SMS/MMS

Sentiment

External Demographics

HD Video

Speech to Text

Product/Service Logs

Social Network

Business Data Feeds

User Click Stream

Web Logs

Offer History Dynamic Pricing

A/B Testing

Affiliate Networks

Search Marketing

Behavioral Targeting

Dynamic FunnelsPayment Record Support Contacts

Customer TouchesPurchase

Detail

Purchase Record

Segmentation

Offer Details

5

The Data Mart Era

”Just Give Me Any Old Data – And Fast!”

The EDW Era

“Give me integrated, high quality data that

enables me to optimise end-to-end business

processes cost-effectively.”

The Logical Data Warehouse Era

“Centralise the data that are widely re-used

and shared - but integrate all of the data

and the analytics.”

How have Enterprise Analytical Architectures evolved?

6

The Changing Environment

What are the new Technologies?

How does It all fit together?

Conclusions

Agenda

© 2016 Teradata6

Page 2: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

7

• High performance data access– Random and sequential

– An access language + API

• High availability– Recovery following errors

• Data model for business applications– Isolates schema from application

– Relationships enforced between data attributes

– Logical and physical data designs

• ACID properties

• Shared resource, concurrency

• Data controlled by database– Data types

– Secure access controls

What is a Database?Atomicity

apply all changes or none

Consistencyrollback on errors

Isolationone update at a time

Durabilitytransactions survive crashes

8

So what is Hadoop?

• Open source Data Store Optimized to handle

Massive amounts of data

Variety of data (Structured/Unstructured/Semi-structured)

Contributors: Yahoo!, IBM, Google

• Growing list of supporting tools

• Great Performance for certain Use Cases

• Reliability through replication

9

Hadoop vs RDBMS: Complementary Capabilities

Observations…

• Technology changing

• Different perspectives

Focus really needs to be…

• Capabilities

• Client Skill Sets

• Requirements0

1

2

3

4

5

Multi-tenant /

Concurrency

Service Levels

Development Effort

Complex SQL

Workload

Management

Relational Processing

Grain of Security

System Management &

DBAMaturity of Optimiser

Streaming

Glacial History

Multi-Structured

In-System Analytics

Non-Relational

Intensive Processing

Schema-on-Read

Scalability

Hadoop

Teradata

10

Usage Patterns Data Warehouse extended by Data Lake

Usage Pattern Example

• Secure data• Integrated data• Quality data• Scrubbed data

• Privacy requirements• Product-plan analysis• Regulatory reporting• Enterprise reporting• Financial reporting

• Enriched data• Historical data

• Dollar-chain rules• YoY trend analysis

• Descriptive analytics• Predictive analytics

• Product forecasting• Campaign mgmt

• Machine learning• Document mgmt• Discovery analysis• Streaming data

• Fraud detection• Contract mgmt• NPS drivers• Sensor

• Unstructured data• Raw data• Schema evolution• Natural Language

• MRI / CT images• CEP analysis• Evolving protocols• Google like search

Best at…

Better at…

Good at…

Better at…

Best at…

Data Warehouse

Data Lake

(Hadoop)

11

How am I supposed to remember all of that?

13/10/2017

Remember “Schema on Read”

12

1950: Alan Turing defines the imitation game

“Can machines think?”

“And how would we know?”

Page 3: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

13

New Analytical Tools

Data

Mining

Machine Learning

Many people associate

“Machine Learning” with

”Deep Learning”…

Deep Learning is

one “supervised”

approach to Machine

Learning.

14

The Changing Environment

What are the new Technologies?

How does It all fit together?

Conclusions

Agenda

© 2016 Teradata14

15

Blended Architecture

• IDW AND Data Lake

• On Premise AND Cloud

• Open Source AND Proprietary

• File System AND Database

• Agile AND Waterfall

• BI AND Advanced Analytics

• ETL Tools AND Roll Your Own

The blended architecture era:

from “OR” to “AND”

Blended architecture is using the right tools and technologies for the right purpose

© 2016 Think Big, A Teradata Company16

The Challenge is selecting the right technology

© 2016 Teradata

Enough time…

Enough money…

Work magic with technology…

What is the time to value?

What is the cost of re-engineering?

What is the optimal solution?

17

REAL TIME

Acquisition Analytics Access

EMERGING

Data Engines

CONVENTIONAL

MULTI GENRE

DATAWAREHOUSE

IN MEMORY

DATA LAKE

No SQL

COMPUTE CLUSTER

OPERATIONAL

Business

Intelligence

Languages

Integrated

Development

INGEST

Users

Operational

Systems

Customers

Partners

Engineers

Data

Scientists

Business

Analysts

Knowledge

Workers

Marketing

Executives

Platform Services DEVELOPMENTDATA OPERATIONS

PRIVATE HYBRIDCloud Deployment PUBLIC

Sources

ERP

SCM

CRM

Video

Machin

e

Logs

Text

Web

and

Social

APP FRAMEWORK

VIRTUAL QUERY

Sensors

18

Data

Analytic

methods

Analytic

processesDecision

Evaluation Action

Loosely coupled

Humangenerated

Businessgenerated

Interactiongenerated

Machinegenerated

Reporting

Operationalintelligence

Path

Graph

Machine learning

Pattern

Analysis

Event

detection

Degrees of

Integration

Tightly coupled

Noncoupled

Predictive

Threshold

analysis

Offer

options

Next best action

Monitor

Replace

Event execution

Time series

analysis

Repair

Affinity

analysis

Migrate From Projects To FrameworksShift focus from a single, tactical exercise to a comprehensive framework

Page 4: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

19

Typical Usage Patterns

20

Hadoop for Active Data ArchiveActive data archive for better data management

Impact

• Reduced storage costs for data variety

• Perform adhoc analytics on the multiple versions of data

• Retrieve data in minutes ( vs. days with tape archives )

• Reduced load and improved performance of DW/Databases

Situation

High performance storage is expensive. A large integrated pharmacy HC provider deals with a variety of data with different business value. All data cannot be store on the same system. Ever expanding

data is only adding to this challenge.

Problem

Long terms storage data cannot be queried and it takes a long time for retrieval. No analysis can be performed on the archived data. Losing out on business value from this valuable data.

Solution

Used Hadoop to store all the data coming in from weblogs, medical data, JSON files. Hadoop also serves as a enrichment layer to enhance data for high-end analytics consumption. The complete

solution provides easy movement of data from Hadoop, Aster and Teradata.

21

Analyze Customer Web InteractionsCapture, Refine, Store ClickStream Data

Impact

• Reduced data inconsistencies and improved performance

• Capture and curate ALL the data and prepare for analysis

• Perform ad hoc analytics on multi-level interactions

• Improves the marketing campaigns and the customer support process

Situation

Customers interact with public websites of large PC vendor for various purposes — resulting in huge volumes of raw data. Because of its nature, the data structure and format is not always consistent and because of the volumes, processing the

amount of data is difficult.

Problem

Inconsistencies like file errors, corrupted file compressions in the raw omniture data makes the capturing and analysis process error prone. The volume, velocity (70files/hr, 1M files) adds to the complexity.

Solution

Teradata Big Analytics solution to provide a landing and staging area for in-coming data at high velocity. Hadoop nodes to curate the data, check for data consistency, and prepare the data for consumption by higher end analytic

platforms.

22

Document Classification

Impact

• Enabling near real-time processing of messages and documents

• Real-time querying by end users via Elastic search ( in seconds)

• Combine real-time and batch processing for better results

SituationCurrent Mayo Clinic architecture has diverse architectures and varied applications to manage clinical

documents for storing, indexing, viewing- resulting in high latencies and inefficiencies

ProblemMayo Clinic’s NLP technology has been used to implement various research and clinical use cases which

leverage unstructured documents, but these have been constrained in deployment due to RDBMS technology limitations.

Solution

Teradata Appliance for Hadoop platform to design applications to ingest ALL documents types(radiology, operation, ECG/EKG, notes), make data available for free text search while making themsimultaneously available for batch processing.

Enabling Real-time Processing of Clinical Records

23

Telematics Geospatial Analytics for Better Risk Management

Impact

• Quickly analyze data for informed decisions and ad hoc reporting

• Streamlined process to calculate vehicle and fleet scores

• Cost effectively quantify, adjust and manage risk premiums

Situation

A large diversified insurance provider needed to accurately calculate risk scores and adjust risk premiums for itsenterprise fleets based on vehicle data, driver behavior, GPS data, weather data, traffic and DW data. Current custom

developed applications limits the effectiveness of these scores.

Problem

Lacks infrastructure and system to handle the huge volumes of real time data. No ad-hoc reporting systems to combine,enrich and analyze the data. Limited storage capacity limits the amount of data that can be captured, refined and

stored.

Solution

Used Big Analytics to design a platform to streamline the ingestion process for telematics data from multiplesources, data types, structure, and frequency and combine with other data sources to perform meaningful

analytics.

24

Cyber Analytics

Impact• Improve response times drastically to network traffic events

• Improve effectiveness of controls, fraud detection, and DLP efforts

• Proactively & automatically respond to malicious events or risks

SituationCurrent network traffic solutions are not real-time. They either employ a deep packet inspection later or

try to analyze one packet at a time in the hope of catching bad apples.

ProblemIneffective detection techniques with only 0.001% of the corporate traffic which might be infested. Today’s

APT ( advanced persistent threat type viruses ) make it even more difficult to get detected with the current processing architectures

Solution

Teradata Appliance for Hadoop is critical part of the UDA solution architecture that can processenormous volumes of data in near real-time and the ability to point accurately malicious code,predictively preventing malicious attacks

Network Analysis for Advanced security threats

Page 5: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

25

Analytics of Things (AoT) Overview

26

AoT – Connecting Sensor Data for Business Outcome

Right data, right insights, right actions driving business outcomes – revenue growth, customer satisfaction, operational efficiency

Sensor Data Other Data Sources Analytics Business Actions

Business Benefits

27

Our Most Frequently used Capabilities in AoT

• Smart Assets

• Service 360

• Connected Factory

• Connected Supply Chain

• Health & Fitness

• Smart Home

• Connected Car

• Usage based Insurance

• Smart Infrastructure

• Smart Utilities

• Smart Citizen

• Smart Mobility

IoT Data

• Sensor Data Qualification (AoT accelerators)

Industrial AoT Consumer AoT Smart City

28

Operating

Conditions

External

Conditions

Traffic

Location

nPath

cluster

graph regression

machine learning

geo-spatial

affinityEDW

Master Data

Asset Data

Serv ice Records

Manuals

Warranty Details

Decision on Edge

Auto shut down

Auto back up initiation

Change operating condition

Centralized Decisions

Preventive maintenance

Planned maintenance

Spares network

Technician Planning

Failure Rates

Warranty Planning

Asset Health

Monitor asset health

Monitor utilization

Smart Assets – Solution

29

Case Study: Predictive model for engine failures on trains

Business Objective• Avoid unplanned train downtime through prediction model

• Repair trains before failure occurs, secure uninterrupted process

• Pre-Dispatch required spare parts in time

• Avoid unplanned downtime, penality, process interruption• Save time on failure analysis

Data Challenge• Sensor data from engines and data from maintenance management system

• Understand patterns to failure (pattern analysis)

• Understand correlation of errors / sensor readings around a failure

• Based on this insight, build a predictive model• Example: daily pattern of temperature readings mid – low – mid often occurs 3 days

prior to an engine problem

Solution• Data structure to support ingestion, transformation, storage of data for

analytics

− Leverage Teradata analytics capability for predictive modeling

− Scoring of predictive model to predict failures

Opportunity to Impact

30

• Are you responsible for the uptime of an asset used by end customers?

• Does a downtime affect your production, revenue or customer service?

• Will you benefit from predicting a failure of a remote asset?

• Can you improve your product quality by understanding product behavior in the field?

Where might Smart Assets Help You?

Page 6: HIS Conference 2017 - IPA Conference 2017/Analytical Ecosystem... · HIS Conference 2017 © 2016 Teradata 1 What is the Analytical Ecosystem? And why is it making my life more complicated?

HIS Conference 2017

© 2016 Teradata

31

The Changing Environment

What are the new Technologies?

How does It all fit together?

Conclusions

Agenda

© 2016 Teradata31 32

Blended Architectures are no longer an option; They are a requirement

Data Warehouse + Data Lake

On Premise + Cloud

RDBMS + HDFS

Commercial + Open Source

32

333333 © 2016 Teradata