HIS Conference 2017
© 2016 Teradata
What is the Analytical Ecosystem?
And why is it making my life more complicated?
Brian Grant
Agenda
The Changing Environment
What are the new Technologies?
How does it all fit together?
Conclusions
Evolution in the Use of Information
REPORTING: WHAT happened? (Batch Reports)
ANALYZING: WHY did it happen? (Ad Hoc, BI Tools)
PREDICTING: WHAT WILL happen? (Predictive Models)
OPERATIONALIZING: WHAT IS happening now? (Link to Operational Systems)
ACTIVATING: MAKE it happen! (Automated Linkages)
Insights
Actions
Does this process change with Big Data? No. Do the tools you need change? Yes.
How is Data Changing?
ERP: Megabytes
CRM: Gigabytes
WEB: Terabytes
BIG DATA: Petabytes
Increasing Data Variety and Complexity
User Generated Content
Mobile Web
SMS/MMS
Sentiment
External Demographics
HD Video
Speech to Text
Product/Service Logs
Social Network
Business Data Feeds
User Click Stream
Web Logs
Offer History
Dynamic Pricing
A/B Testing
Affiliate Networks
Search Marketing
Behavioral Targeting
Dynamic Funnels
Payment Record
Support Contacts
Customer Touches
Purchase Detail
Purchase Record
Segmentation
Offer Details
How have Enterprise Analytical Architectures evolved?
The Data Mart Era
“Just Give Me Any Old Data – And Fast!”
The EDW Era
“Give me integrated, high quality data that enables me to optimise end-to-end business processes cost-effectively.”
The Logical Data Warehouse Era
“Centralise the data that are widely re-used and shared - but integrate all of the data and the analytics.”
What is a Database?
• High performance data access
– Random and sequential
– An access language + API
• High availability
– Recovery following errors
• Data model for business applications
– Isolates schema from application
– Relationships enforced between data attributes
– Logical and physical data designs
• ACID properties
– Atomicity: apply all changes or none
– Consistency: rollback on errors
– Isolation: one update at a time
– Durability: transactions survive crashes
• Shared resource, concurrency
• Data controlled by database
– Data types
– Secure access controls
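The atomicity and consistency points above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for any relational database; the accounts table, the transfer function and the amounts are hypothetical and not taken from the presentation.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so that a later failure has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule held by the database
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])

first = transfer(conn, "a", "b", 30)    # valid: both updates are applied
second = transfer(conn, "a", "b", 500)  # overdraft: neither update survives
final = balances(conn)
```

The second transfer credits account b before the debit fails, yet the final balances show no trace of it: all changes are applied or none.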
So what is Hadoop?
• Open source data store optimized to handle
– Massive amounts of data
– Variety of data (Structured/Unstructured/Semi-structured)
• Contributors: Yahoo!, IBM, Google
• Growing list of supporting tools
• Great Performance for certain Use Cases
• Reliability through replication
Hadoop vs RDBMS: Complementary Capabilities
Observations…
• Technology changing
• Different perspectives
Focus really needs to be…
• Capabilities
• Client Skill Sets
• Requirements
[Radar chart: Hadoop vs Teradata, each scored 0 to 5 on the following axes: Multi-tenant / Concurrency; Service Levels; Development Effort; Complex SQL; Workload Management; Relational Processing; Grain of Security; System Management & DBA; Maturity of Optimiser; Streaming; Glacial History; Multi-Structured; In-System Analytics; Non-Relational Intensive Processing; Schema-on-Read; Scalability]
Usage Patterns: Data Warehouse extended by Data Lake
Data Warehouse: Best at…
Usage Pattern: Secure data; Integrated data; Quality data; Scrubbed data
Example: Privacy requirements; Product-plan analysis; Regulatory reporting; Enterprise reporting; Financial reporting
Data Warehouse: Better at…
Usage Pattern: Enriched data; Historical data
Example: Dollar-chain rules; YoY trend analysis
Good at…
Usage Pattern: Descriptive analytics; Predictive analytics
Example: Product forecasting; Campaign mgmt
Data Lake (Hadoop): Better at…
Usage Pattern: Machine learning; Document mgmt; Discovery analysis; Streaming data
Example: Fraud detection; Contract mgmt; NPS drivers; Sensor
Data Lake (Hadoop): Best at…
Usage Pattern: Unstructured data; Raw data; Schema evolution; Natural Language
Example: MRI / CT images; CEP analysis; Evolving protocols; Google like search
How am I supposed to remember all of that?
13/10/2017
Remember “Schema on Read”
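As a rough illustration of what “Schema on Read” means in practice: the raw records are stored exactly as they arrive, and each consumer applies its own schema only when reading. The sketch below is a minimal Python example; the record contents, field names and the read_with_schema helper are all hypothetical.

```python
import json

# Raw records are landed as-is; no structure is enforced at write time.
RAW_EVENTS = [
    '{"patient_id": "p1", "heart_rate": "72", "device": "monitor-a"}',
    '{"patient_id": "p2", "heart_rate": 88}',
    '{"patient_id": "p3", "note": "free text, no reading"}',
]

def read_with_schema(raw_lines, schema):
    """Apply `schema` (field name -> converter) to raw JSON lines at read time."""
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Missing fields become None instead of rejecting the record.
            row[field] = convert(value) if value is not None else None
        yield row

# Two consumers read the same raw data through different schemas.
vitals_schema = {"patient_id": str, "heart_rate": int}
notes_schema = {"patient_id": str, "note": str}

vitals = list(read_with_schema(RAW_EVENTS, vitals_schema))
notes = list(read_with_schema(RAW_EVENTS, notes_schema))
```

A schema-on-write database would instead have rejected or reshaped the third record at load time, before any consumer had decided what it needed.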
1950: Alan Turing defines the imitation game
“Can machines think?”
“And how would we know?”
New Analytical Tools
Data Mining
Machine Learning
Many people associate “Machine Learning” with “Deep Learning”…
Deep Learning is one “supervised” approach to Machine Learning.
Blended Architecture
• IDW AND Data Lake
• On Premise AND Cloud
• Open Source AND Proprietary
• File System AND Database
• Agile AND Waterfall
• BI AND Advanced Analytics
• ETL Tools AND Roll Your Own
The blended architecture era:
from “OR” to “AND”
Blended architecture is using the right tools and technologies for the right purpose
© 2016 Think Big, A Teradata Company
The Challenge is selecting the right technology
Enough time…
Enough money…
Work magic with technology…
What is the time to value?
What is the cost of re-engineering?
What is the optimal solution?
[Architecture diagram. Recoverable labels:
Sources: ERP, SCM, CRM, Video, Machine Logs, Text, Web and Social, Sensors
Stages: Acquisition, Analytics, Access
Data Engines (Conventional and Emerging): Multi Genre, Data Warehouse, In Memory, Data Lake, NoSQL, Compute Cluster, Operational, Real Time
Other components: Ingest, App Framework, Virtual Query, Business Intelligence, Languages, Integrated Development
Users: Operational Systems, Customers, Partners, Engineers, Data Scientists, Business Analysts, Knowledge Workers, Marketing, Executives
Platform Services: Development, Data, Operations
Cloud Deployment: Private, Hybrid, Public]
Migrate From Projects To Frameworks
Shift focus from a single, tactical exercise to a comprehensive framework
[Framework diagram. Recoverable labels:
Stages: Data, Analytic methods, Analytic processes, Decision, Evaluation, Action
Data: Human generated, Business generated, Interaction generated, Machine generated
Degrees of Integration: Tightly coupled, Loosely coupled, Noncoupled
Methods and processes: Reporting, Operational intelligence, Path, Graph, Machine learning, Pattern Analysis, Event detection, Predictive, Threshold analysis, Time series analysis, Affinity analysis
Decisions and actions: Offer options, Next best action, Monitor, Replace, Repair, Event execution]
Typical Usage Patterns
Hadoop for Active Data Archive
Active data archive for better data management
Impact
• Reduced storage costs for data variety
• Perform ad hoc analytics on the multiple versions of data
• Retrieve data in minutes (vs. days with tape archives)
• Reduced load and improved performance of DW/Databases
Situation
High performance storage is expensive. A large integrated pharmacy HC provider deals with a variety of data with different business value. All data cannot be stored on the same system. Ever-expanding data is only adding to this challenge.
Problem
Long-term storage data cannot be queried and takes a long time to retrieve. No analysis can be performed on the archived data, so the business value of this data is lost.
Solution
Used Hadoop to store all the data coming in from weblogs, medical data, JSON files. Hadoop also serves as an enrichment layer to enhance data for high-end analytics consumption. The complete solution provides easy movement of data between Hadoop, Aster and Teradata.
Analyze Customer Web Interactions
Capture, Refine, Store ClickStream Data
Impact
• Reduced data inconsistencies and improved performance
• Capture and curate ALL the data and prepare for analysis
• Perform ad hoc analytics on multi-level interactions
• Improves the marketing campaigns and the customer support process
Situation
Customers interact with the public websites of a large PC vendor for various purposes, resulting in huge volumes of raw data. Because of its nature, the data structure and format are not always consistent, and because of the volumes, processing the data is difficult.
Problem
Inconsistencies like file errors and corrupted file compressions in the raw Omniture data make the capturing and analysis process error-prone. The volume and velocity (70 files/hr, 1M files) add to the complexity.
Solution
Teradata Big Analytics solution to provide a landing and staging area for incoming data at high velocity. Hadoop nodes to curate the data, check for data consistency, and prepare the data for consumption by higher end analytic platforms.
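The curation step described above (catching file errors and corrupted compression before analysis) might look roughly like the following Python sketch. It is an illustration only, not the Teradata solution: the file names, the tab-separated layout and the curate function are assumptions, and the "files" are held in memory to keep the example self-contained.

```python
import gzip
import zlib

def curate(files):
    """Split {name: gzip bytes} into readable files and a quarantine list."""
    good, quarantined = {}, []
    for name, payload in files.items():
        try:
            text = gzip.decompress(payload).decode("utf-8")
        except (OSError, EOFError, zlib.error, UnicodeDecodeError):
            # Corrupted or truncated compression: set aside, do not fail the batch.
            quarantined.append(name)
            continue
        good[name] = text.splitlines()
    return good, quarantined

# Hypothetical landing-area contents: one valid file, one truncated, one not gzip.
valid = gzip.compress(b"u1\t/home\nu2\t/cart\n")
landing_area = {
    "hit_001.tsv.gz": valid,
    "hit_002.tsv.gz": valid[:10],           # header only, stream cut off
    "hit_003.tsv.gz": b"not a gzip file",
}

good, quarantined = curate(landing_area)
```

Quarantining bad files instead of aborting keeps a high-velocity feed moving, and the quarantine list can be reported back to the source system.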
Document Classification
Enabling Real-time Processing of Clinical Records
Impact
• Enabling near real-time processing of messages and documents
• Real-time querying by end users via Elasticsearch (in seconds)
• Combine real-time and batch processing for better results
Situation
The current Mayo Clinic environment has diverse architectures and varied applications to manage clinical documents for storing, indexing and viewing, resulting in high latencies and inefficiencies.
Problem
Mayo Clinic’s NLP technology has been used to implement various research and clinical use cases which leverage unstructured documents, but these have been constrained in deployment due to RDBMS technology limitations.
Solution
Teradata Appliance for Hadoop platform to design applications to ingest ALL document types (radiology, operation, ECG/EKG, notes) and make the data available for free text search while simultaneously making it available for batch processing.
Telematics Geospatial Analytics for Better Risk Management
Impact
• Quickly analyze data for informed decisions and ad hoc reporting
• Streamlined process to calculate vehicle and fleet scores
• Cost effectively quantify, adjust and manage risk premiums
Situation
A large diversified insurance provider needed to accurately calculate risk scores and adjust risk premiums for its enterprise fleets based on vehicle data, driver behavior, GPS data, weather data, traffic and DW data. Current custom developed applications limit the effectiveness of these scores.
Problem
Lacks infrastructure and systems to handle the huge volumes of real time data. No ad hoc reporting systems to combine, enrich and analyze the data. Limited storage capacity limits the amount of data that can be captured, refined and stored.
Solution
Used Big Analytics to design a platform to streamline the ingestion process for telematics data from multiple sources, data types, structures, and frequencies and combine it with other data sources to perform meaningful analytics.
Cyber Analytics
Network Analysis for Advanced security threats
Impact
• Improve response times drastically to network traffic events
• Improve effectiveness of controls, fraud detection, and DLP efforts
• Proactively & automatically respond to malicious events or risks
Situation
Current network traffic solutions are not real-time. They either employ a deep packet inspection later or try to analyze one packet at a time in the hope of catching bad apples.
Problem
Detection techniques are ineffective when only 0.001% of the corporate traffic might be infected. Today’s APTs (advanced persistent threat type viruses) are even more difficult to detect with the current processing architectures.
Solution
Teradata Appliance for Hadoop is a critical part of the UDA solution architecture that can process enormous volumes of data in near real-time and accurately pinpoint malicious code, predictively preventing malicious attacks.
Analytics of Things (AoT) Overview
AoT – Connecting Sensor Data for Business Outcome
Right data, right insights, right actions driving business outcomes – revenue growth, customer satisfaction, operational efficiency
[Diagram labels: Sensor Data; Other Data Sources; Analytics; Business Actions; Business Benefits]
Our Most Frequently used Capabilities in AoT
Industrial AoT
• Smart Assets
• Service 360
• Connected Factory
• Connected Supply Chain
Consumer AoT
• Health & Fitness
• Smart Home
• Connected Car
• Usage based Insurance
Smart City
• Smart Infrastructure
• Smart Utilities
• Smart Citizen
• Smart Mobility
IoT Data
• Sensor Data Qualification (AoT accelerators)
Smart Assets – Solution
[Solution diagram. Recoverable labels:
Inputs: Operating Conditions, External Conditions, Traffic, Location
Analytic functions: nPath, cluster, graph, regression, machine learning, geo-spatial, affinity
EDW: Master Data, Asset Data, Service Records, Manuals, Warranty Details
Decision on Edge: Auto shut down, Auto back up initiation, Change operating condition
Centralized Decisions: Preventive maintenance, Planned maintenance, Spares network, Technician Planning, Failure Rates, Warranty Planning
Asset Health: Monitor asset health, Monitor utilization]
Case Study: Predictive model for engine failures on trains
Business Objective
• Avoid unplanned train downtime through prediction model
• Repair trains before failure occurs, secure uninterrupted process
• Pre-dispatch required spare parts in time
• Avoid unplanned downtime, penalty, process interruption
• Save time on failure analysis
Data Challenge
• Sensor data from engines and data from maintenance management system
• Understand patterns to failure (pattern analysis)
• Understand correlation of errors / sensor readings around a failure
• Based on this insight, build a predictive model
• Example: daily pattern of temperature readings mid – low – mid often occurs 3 days prior to an engine problem
Solution
• Data structure to support ingestion, transformation, storage of data for analytics
− Leverage Teradata analytics capability for predictive modeling
− Scoring of predictive model to predict failures
Opportunity to Impact
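To make the mid, low, mid temperature example concrete, here is a minimal Python sketch of that kind of pattern search. It is not the model from the case study: the band thresholds, the sample readings and both function names are assumptions made for illustration, and a real solution would run the search inside the analytics platform over far more data.

```python
def to_band(temp, low_max, mid_max):
    """Bucket one daily reading into 'low', 'mid' or 'high'."""
    if temp <= low_max:
        return "low"
    if temp <= mid_max:
        return "mid"
    return "high"

def find_pattern(daily_temps, pattern=("mid", "low", "mid"),
                 low_max=70.0, mid_max=90.0):
    """Return the index of the last day of every window that matches `pattern`."""
    bands = [to_band(t, low_max, mid_max) for t in daily_temps]
    n = len(pattern)
    return [
        start + n - 1
        for start in range(len(bands) - n + 1)
        if tuple(bands[start:start + n]) == pattern
    ]

# Hypothetical daily engine temperatures, one value per day.
temps = [95, 80, 65, 85, 92, 96, 99, 80, 60, 75]
hits = find_pattern(temps)

# Per the slide's example, the pattern tends to precede a problem by 3 days,
# so each hit suggests a day by which the engine should be inspected.
inspect_by = [day + 3 for day in hits]
```

Matches found this way would then be checked against the maintenance records to see how often a failure actually followed, which is the correlation step the slide describes.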
Where might Smart Assets Help You?
• Are you responsible for the uptime of an asset used by end customers?
• Does downtime affect your production, revenue or customer service?
• Will you benefit from predicting a failure of a remote asset?
• Can you improve your product quality by understanding product behavior in the field?
Blended Architectures are no longer an option; they are a requirement
Data Warehouse + Data Lake
On Premise + Cloud
RDBMS + HDFS
Commercial + Open Source