creating a modern data architecture for digital transformation
Post on 16-Apr-2017
Creating a Modern Data Architecture for Digital Transformation
Rich Cullen, Manager – Solutions Architecture, UK & NEUR
Building Blocks – The New Enterprise Stack
LAYER | TRADITIONAL | MODERNISED
APPS | On-Premise, Monoliths | SaaS, Microservices
DATABASE | Relational | Non-Relational
EDW | Teradata, Oracle, etc. | Hadoop
COMPUTE | Scale-Up Server | Containers / Commodity Server / Cloud
STORAGE | SAN | Local Storage & Data Lakes
NETWORK | Routers and Switches | Software-Defined Networks
Challenges of Digital Transformation
Growth in Data Silos
Lack Real-Time Insight
Existing Systems Overwhelmed
• Single View
• Event Sourcing
• CQRS
• Data Domains
• Polyglot Processing
• Data Lake
• Microservices
• Containers
• Continuous Delivery
• Data-as-a-Service
Modern Approaches & Architecture Patterns
• AKA: Data Hub, 360 Degree View, Multi-Channel display
• A system that gathers data…
• …from multiple, disconnected sources…
• …and aggregates to provide a single view
• Foundation for analytics – cross-sell, upsell, churn risk
What is a Single View?
• Customer
• Product
• Employee
• Asset
• Risk
• City
• Anything meaningful to a business
A Single View… of what?
Why Not Use The Usual Tech – Relational Databases?
Database MUST simultaneously handle source systems complexity
Untenable change management
Complex data access
[Diagram: source systems (…, Mobile App, Web, Call Centre, CRM, Social Feed) aggregated into a Single View]
COMMON FIELDS: CustomerID | eMail |
DYNAMIC FIELDS: Can vary from record to record: location, action
Solution: Aggregate With A Dynamic Schema
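A minimal sketch of the idea on this slide, assuming plain Python dictionaries stand in for documents. The source names, sample records and the `build_single_view` helper are hypothetical and exist only for illustration; only the common fields (CustomerID, eMail) and the dynamic fields (location, action) come from the slide.

```python
# Hypothetical records arriving from different source systems.
SOURCE_RECORDS = [
    {"source": "web", "CustomerID": 1, "eMail": "ann@example.com", "action": "viewed quote"},
    {"source": "mobile", "CustomerID": 1, "eMail": "ann@example.com", "location": "London"},
    {"source": "crm", "CustomerID": 2, "eMail": "bob@example.com", "action": "opened ticket"},
]

# Fields every record is expected to carry, as named on the slide.
COMMON_FIELDS = ("CustomerID", "eMail")


def build_single_view(records):
    """Merge per-source records into one document per customer."""
    views = {}
    for record in records:
        key = record["CustomerID"]
        if key not in views:
            # Start a new customer document from the common fields.
            views[key] = {field: record[field] for field in COMMON_FIELDS}
            views[key]["events"] = []
        # Dynamic fields differ from record to record, so they are stored
        # as-is rather than forced into a fixed set of columns.
        dynamic = {k: v for k, v in record.items() if k not in COMMON_FIELDS}
        views[key]["events"].append(dynamic)
    return views


single_view = build_single_view(SOURCE_RECORDS)
```

Because each entry in `events` keeps whatever fields its source supplied, a new source system with new fields can be added without changing the existing documents.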
• Flexible data model
• Rich query, aggregation, search & reporting
• High availability
• Predictable scalability
• Flexible deployment model
Single View – Required Database Capabilities
Single View – High Level Data Flow
[Diagram labels]
Sources: Web App, CRM App, Mainframe System
Load: Batch or real-time; Update Queue; Validation
Single view store: Documents
Operations: … Group, Filter, Sort, Count, Average, Deviations
Real-Time Access: Customer Service App, Churn Analytics, Risk Model
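The operations named in the data-flow diagram (group, filter, sort, count, average, deviations) can be sketched over a handful of documents. This is an illustration in plain Python, not the database's own aggregation syntax; the field names, sample values and the `summarise_by_region` helper are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical single-view documents.
DOCUMENTS = [
    {"CustomerID": 1, "region": "UK", "status": "active", "premium": 100.0},
    {"CustomerID": 2, "region": "UK", "status": "active", "premium": 300.0},
    {"CustomerID": 3, "region": "DE", "status": "active", "premium": 250.0},
    {"CustomerID": 4, "region": "DE", "status": "lapsed", "premium": 50.0},
]


def summarise_by_region(documents):
    """Filter, group, count, average, take the deviation, then sort."""
    # Filter: keep only active customers.
    active = [doc for doc in documents if doc["status"] == "active"]

    # Group: collect premiums per region.
    groups = defaultdict(list)
    for doc in active:
        groups[doc["region"]].append(doc["premium"])

    # Count, average and (population) standard deviation per group.
    summary = [
        {
            "region": region,
            "count": len(values),
            "average": mean(values),
            "deviation": pstdev(values),
        }
        for region, values in groups.items()
    ]

    # Sort: highest average premium first.
    return sorted(summary, key=lambda row: row["average"], reverse=True)


summary = summarise_by_region(DOCUMENTS)
```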
• Flexible data model
• Rich query, aggregation, search & reporting
• High availability
• Predictable scalability
• Flexible deployment model
Why MongoDB for Single View?
Single View of Customer
Insurance leader generates coveted single view of customers in 90 days – “The Wall”
Problem
• No single view of customer, leading to poor customer experience and churn
• 145 years of policy data, 70+ systems, 24 separate 800 numbers, 15+ front-end apps that are not integrated
• Spent 2 years and $25M trying to build a single view with Oracle – failed
Solution
• Built “The Wall,” pulling in disparate data and serving single view to customer service reps in real time
• Flexible data model to aggregate disparate data into single data store
• Expressive query language and secondary indexes to serve any field in real time
Results
• Prototyped in 2 weeks
• Deployed to production in 90 days
• Decreased churn and improved ability to upsell/cross-sell
• Centralised repository for data collected from operational systems
• Exploratory analytics
• Extension of EDW: often based on Hadoop
• 50% of organisations invested in data lakes*
* Gartner
What is a Data Lake?
http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html
“Thru 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.”
Nick Heudecker, Research Director, Data Management & Integration
• Unify analytics with operational applications
• Create smart, contextually aware, data-driven apps & insights
• Integrate operational database with data lake
How To Avoid Being In The 70%
• Smart/native integration with the data lake
• Powerful real-time analytics
• Flexible, governed data model
• Scale with the data lake
• Sophisticated management & security
• MongoDB provides all these capabilities
Operational Database Requirements
[Diagram labels]
Data sources: Sensors, User Data, Clickstreams, Logs
Ingest: Message Queue
Operational database (Real-Time Access): Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations
Data lake (Batch Processing, Batch Views): Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model
Data lake contents: Raw Data, Processed Events, Distributed Processing Frameworks
Analytics: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
Applications: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards
Design Pattern: Operationalised Data Lake
Configure where to land incoming data
Raw data processed to generate analytics models
MongoDB exposes analytics models to operational apps and handles real-time updates
Compute new models against MongoDB & HDFS
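The build-up of this pattern (land incoming data, process raw data into analytics models, expose the models to operational apps with real-time updates) can be sketched as a small loop. This is a toy illustration under loud assumptions: a Python list stands in for append-only lake storage, a dictionary stands in for the operational database, and the event shapes, the "engaged"/"casual" segments and every function name are hypothetical.

```python
from collections import Counter

data_lake = []        # stand-in for append-only storage in the data lake
operational_db = {}   # stand-in for the operational database, keyed by customer


def land_event(event):
    """Land an incoming event in the lake; events are appended, never edited."""
    data_lake.append(event)


def compute_profiles(events):
    """Batch step: scan all raw events and derive a simple per-customer model."""
    clicks = Counter(e["customer"] for e in events if e["type"] == "click")
    return {
        customer: {"clicks": n, "segment": "engaged" if n >= 2 else "casual"}
        for customer, n in clicks.items()
    }


def publish_profiles(profiles):
    """Expose the batch-computed model to operational apps."""
    for customer, profile in profiles.items():
        operational_db.setdefault(customer, {}).update(profile)


def record_login(customer, when):
    """Real-time step: update one customer's document in place."""
    operational_db.setdefault(customer, {})["last_login"] = when


for raw in [
    {"customer": "a", "type": "click"},
    {"customer": "a", "type": "click"},
    {"customer": "b", "type": "click"},
    {"customer": "a", "type": "purchase"},
]:
    land_event(raw)

publish_profiles(compute_profiles(data_lake))
record_login("a", "2017-04-16T09:00:00")
```

The final step on the slide, computing new models against both stores, would correspond to re-running `compute_profiles` with input drawn from `operational_db` as well as `data_lake`.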
Problem
• Existing EDW with nightly batch loads
• No real-time analytics to personalize user experience
• Application changes broke ETL pipeline
• Unable to scale as services expanded
Solution
• Microservices architecture running on AWS
• All application events written to Kafka queue, routed to MongoDB and Hadoop
• Events that personalize real-time experience (i.e. triggering email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop, updated customer profiles written back to MongoDB
Results
• 2x faster delivery of new services after migrating to new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability
UK’s Leading Price Comparison Site
Out-pacing Internet search giants with continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud
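The routing rule in this case study (every event goes to the data lake, and only events that personalise the real-time experience also go to the operational database) can be sketched as follows. This is an assumption-laden illustration: plain lists stand in for the Kafka queue, MongoDB and Hadoop, and the event type names and the `route` function are hypothetical.

```python
# Hypothetical event types that drive the real-time experience,
# loosely named after the examples in the case study.
REALTIME_TYPES = {"email_trigger", "additional_question", "offer"}


def route(events):
    """Split a stream of events between the operational store and the lake."""
    operational, lake = [], []
    for event in events:
        # All event data is kept for later aggregation and analysis.
        lake.append(event)
        # Only personalisation events are also needed at low latency.
        if event["type"] in REALTIME_TYPES:
            operational.append(event)
    return operational, lake


queue = [
    {"type": "page_view", "user": "u1"},
    {"type": "offer", "user": "u1"},
    {"type": "email_trigger", "user": "u2"},
    {"type": "page_view", "user": "u2"},
]

to_operational, to_lake = route(queue)
```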
• Development agility
• Data re-use
• Operational efficiency
• Corporate governance and data lineage
• Cost accountability
Standardising the Database Environment
[Diagram labels: App1, App2, App3; API Access Layer; Operational Data (Customers, Products, Accounts, Transactions); Physical Infrastructure]
• Shared, multi-tenant database accessible via a common API
• Exposes CRUD, search, geospatial, graph, analytics
• Each data domain isolated into its own replica set
• Logically managed as one service, UI for self-service provisioning & scaling
Data-as-a-Service High Level Architecture
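A minimal sketch of the access-layer idea, assuming one in-memory dictionary per data domain stands in for that domain's isolated replica set. The `DataService` class and its method names are hypothetical, and only the CRUD part of the API is shown.

```python
class DataService:
    """One common API in front of separately stored data domains."""

    def __init__(self, domains):
        # Each domain gets its own store, so domains stay isolated.
        self._stores = {name: {} for name in domains}

    def _store(self, domain):
        if domain not in self._stores:
            raise KeyError(f"unknown data domain: {domain}")
        return self._stores[domain]

    def create(self, domain, key, document):
        self._store(domain)[key] = dict(document)

    def read(self, domain, key):
        return self._store(domain).get(key)

    def update(self, domain, key, changes):
        self._store(domain)[key].update(changes)

    def delete(self, domain, key):
        self._store(domain).pop(key, None)


# Domain names taken from the slide; several apps would share this one service.
service = DataService(["Customers", "Products", "Accounts", "Transactions"])
service.create("Customers", 1, {"name": "Ann"})
service.update("Customers", 1, {"eMail": "ann@example.com"})
service.create("Products", 1, {"title": "Home insurance"})
```

Keys are scoped to a domain, so customer 1 and product 1 do not collide even though every application goes through the same interface.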
Patterns for Modern Data Architectures
Challenges: Existing Systems Overwhelmed, Growth in Data Silos, Lack Real-Time Insight
Patterns: Single View, Data-as-a-Service, Operationalised Data Lake