leveraging data virtualization - axians · data sources – readily available to your data analysts...
Leveraging Data Virtualization
To maximize advanced analytics & machine learning potential
Vincent Fages-Gouyou,
Director of Product Management EMEA
Agenda
1. What are Advanced Analytics?
2. The Data Challenge
3. The Rise of Logical Data Architectures
4. Tackling the Data Pipeline Problem
5. Real-time Machine Learning with Data Virtualization
6. Key Takeaways
7. Q&A
8. Next Steps
Analytics Value Escalator
The Analytics Chasm
The Key Ingredient for Advanced Analytics is Data
Input data for a data science project may come from a variety of systems and in a variety of formats:
• Files (CSV, logs, Parquet)
• Relational databases (EDW, operational systems)
• NoSQL systems (key-value pairs, document stores, time series, etc.)
• SaaS APIs (Salesforce, Marketo, ServiceNow, Facebook, Twitter, etc.)
In addition, the Big Data community has embraced data science as one of its pillars,
for example through Spark and SparkML and through architectural patterns like the Data Lake.
Typical Data Science Workflow
A typical workflow for a data scientist is:
1. Gather requirements for the business problem
2. Identify relevant Data
3. Cleanse Data into a useful format
4. Analyze Data
5. Prepare input for algorithms
6. Execute Data Science algorithms (ML, AI, etc.)
Iterate 1 to 6
7. Visualize and Share
Typical Data Science Workflow
• Finding and preparing the data: 80%
• Analysis: 10%
• Visualizing data: 10%
Data Access – A Wild West Expedition?
Finding the right data
Getting access
Understanding heterogeneous technologies (NoSQL, REST APIs, etc.)
Transforming into a usable format
Combining multiple sources
Profiling / sanitizing data
Preparing for ML / AI algorithms
Sharing data, processes and results
Photo by Jasper van der Meij on Unsplash
Data Lakes – The Solution?
High initial investment with unclear value
Replication, replication, replication… with limited value added
Data scientists alone can’t manage it efficiently, without additional support
Large processing capabilities
Agility
Lower cost of storage
Without governance & management, it can become a “Data Swamp”
Photo by Aaron Burden on Unsplash
Data Virtualization
Shared Data Infrastructures
Security, access control & audit
A single Data Delivery platform for Data Science, Analytics and APIs
Maximize value out of your existing technologies (RDBMS, Hadoop, Cloud, etc.)
Optimized Investments
Shorten Time-to-Data
Photo by Tiago Gerken on Unsplash
Rise of Logical Architectures
The evolution of Analytical Architectures: Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs, Gartner April 2018
Logical Data Warehouse: the Path to the Future
“Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”. Henry Cook, Gartner, April 2018
Data Scientist Workflow Steps
1. Identify useful data
2. Modify data into a useful format
3. Analyze data
4. Prepare for ML algorithm
5. Execute data science algorithms (ML, AI, etc.)
6. Share with business users
Agile Information Architecture
Visualization, ML / AI, Data Science, Data Quality
Data Access: Data Virtualization, Data Services
Security
Governance, Metadata Management, Data Mart
Data Sources: Data Warehouse, NoSQL, RDBMS
Data Service: Federation, Transformation, Abstraction
Dynamic Query Optimization: Cost Based Optimizer, Query Rewriting, Caching, MPP
Security & Governance
Lifecycle Management
Data Catalog: Discover, Collaborate, Query, Categorize
Gartner, Adopt the Logical Data Warehouse Architecture
to Meet Your Modern Analytical Needs, May 2018
“When designed properly, Data Virtualization can speed data
integration, lower data latency, offer flexibility and reuse, and
reduce data sprawl across dispersed data sources.
Due to its many benefits, Data Virtualization is often the first step
for organizations evolving a traditional, repository-style data
warehouse into a Logical Architecture”
Benefits of a Virtual Data Layer
§ A Virtual Layer improves decision making and shortens development cycles
• Surfaces all company data from multiple repositories without the need to replicate all data into a lake
• Eliminates data silos: allows for on-demand combination of data from multiple sources
§ A Virtual Layer broadens usage of data
• Improves governance and metadata management to avoid “data swamps”
• Decouples consumers from data source technology: access is normalized via SQL or web services
• Allows controlled access to the data with fine-grained security controls
§ A Virtual Layer offers performant access
• Leverages the processing power of the existing sources controlled by Denodo’s optimizer
• Processing of data for sources with no processing capabilities (e.g. files)
• Caching and ingestion engine to persist data when needed
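As a rough illustration of on-demand combination of sources, the sketch below joins a relational table with a flat file at query time, without copying either into a new store. It uses only the Python standard library and is not Denodo code; the table, the CSV content, the sample values and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database holding bike trips per day.
def make_trip_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (day TEXT PRIMARY KEY, trip_count INTEGER)")
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?)",
        [("2017-07-01", 52000), ("2017-07-02", 48000), ("2017-07-03", 61000)],
    )
    return conn

# Hypothetical source 2: a flat file with no query engine of its own.
WEATHER_CSV = """day,max_temp_f,precip_in
2017-07-01,88,0.00
2017-07-02,84,0.35
2017-07-03,90,0.00
"""

def virtual_trips_with_weather(conn, weather_file):
    """Combine both sources at query time; nothing is persisted."""
    weather = {row["day"]: row for row in csv.DictReader(weather_file)}
    query = "SELECT day, trip_count FROM trips ORDER BY day"
    for day, trip_count in conn.execute(query):
        w = weather.get(day)
        if w is None:
            continue  # inner-join semantics: skip days with no weather record
        yield {
            "day": day,
            "trip_count": trip_count,
            "max_temp_f": int(w["max_temp_f"]),
            "precip_in": float(w["precip_in"]),
        }

rows = list(virtual_trips_with_weather(make_trip_db(), io.StringIO(WEATHER_CSV)))
for row in rows:
    print(row)
```

A consumer only sees the combined rows; whether a value came from the database or the file is hidden behind the function, which is the decoupling the slide describes.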
Demonstration
Accelerating the Machine Learning Data Pipeline with Data Virtualization
https://flic.kr/p/x8HgrF
Can we predict the usage of the NYC bike
system based on data from previous years?
Data Sources – Citibike
https://flic.kr/p/CYT7SS
There are external factors to consider.
Which ones?
Data Sources – NWS Weather Data
What We’re Going To Do…
1. Connect to data and have a look
2. Format the data (prep it) so that we can look for significant factors
• e.g. bike trips on different days of week, different months of year, etc.
3. Once we’ve decided on the significant attributes, prepare that data for the ML algorithm
4. Using Python, read the 2017 data and run it through our ML algorithm for training
5. Read the 2018 data, test the algorithm
6. Save the results and load them into the Denodo Platform
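Steps 2 to 5 can be sketched in a few lines of plain Python. This is not the demo's actual code: the trip counts are synthetic stand-ins for the real 2017 and 2018 data, the "algorithm" is simply a mean per (month, weekend) bucket, and all function names are hypothetical. Step 6, loading results back into the platform, is not covered.

```python
from collections import defaultdict
from datetime import date, timedelta
import math

def synthetic_daily_trips(year):
    """Stand-in for the real trip data: a seasonal curve plus a weekend effect."""
    day = date(year, 1, 1)
    while day.year == year:
        day_of_year = day.timetuple().tm_yday
        seasonal = 30000 + 20000 * math.sin((day_of_year - 100) * 2 * math.pi / 365)
        weekend_effect = -5000 if day.weekday() >= 5 else 0
        yield day, round(seasonal + weekend_effect)
        day += timedelta(days=1)

def features(day):
    """Steps 2-3: derive the attributes assumed to be significant."""
    return (day.month, day.weekday() >= 5)

def train(rows):
    """Step 4: 'training' here is the mean trip count per feature bucket."""
    totals = defaultdict(lambda: [0, 0])
    for day, trips in rows:
        bucket = totals[features(day)]
        bucket[0] += trips
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def mean_absolute_error(model, rows):
    """Step 5: evaluate on data the model has not seen."""
    errors = [abs(model[features(day)] - trips) for day, trips in rows]
    return sum(errors) / len(errors)

model = train(synthetic_daily_trips(2017))
mae_2018 = mean_absolute_error(model, synthetic_daily_trips(2018))

# Baseline for comparison: always predict the 2017 global mean.
trips_2017 = [trips for _, trips in synthetic_daily_trips(2017)]
global_mean = sum(trips_2017) / len(trips_2017)
baseline_mae = mean_absolute_error(
    defaultdict(lambda: global_mean), synthetic_daily_trips(2018)
)

print(f"bucket model MAE: {mae_2018:.0f}, global-mean baseline MAE: {baseline_mae:.0f}")
```

Comparing against the global-mean baseline is a quick way to check that the chosen attributes carry signal before investing in a heavier algorithm.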
A Modern Data Virtualization Architecture
DATA CONSUMERS
• SQL Queries (JDBC, ODBC, ADO.NET)
• Web Services (SOAP, REST, OData)
• Web-based catalog & search
• Secure delivery (SSL/TLS)
DATA VIRTUALIZATION
• Execution Engine & Optimizer
• MPP Processing
• Relational Cache
• Metadata Repository
• Corporate Security
• Monitoring & Auditing
DISPARATE DATA SOURCES
Demo
McCormick Spice
McCormick Spice (Cont’d)
Semantics & Discovery
API Management and Runtime
Data Services (Data Virtualization)
System 1, System n, External API $
Governance and Security (alongside all layers)
McCormick Spice (Cont’d)
Benefits
✓ Timely information
✓ No replication
✓ No need to validate information
✓ Better staging for learning
Approach
1. The model requests specific modifications or the full information
2. The model trains incrementally or fully
Flow between the Algorithms, the Enterprise Data Services, and the real-time Backend Systems and External Systems:
1 Request
2 Collect
3 Receive
4 Train
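The difference between incremental and full training in the approach above can be sketched with a deliberately tiny model. This is an illustrative assumption, not McCormick's or Denodo's implementation: the class, its method names and the sample values are hypothetical.

```python
class RunningMeanModel:
    """Hypothetical model: predicts the mean of the values seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def train_incremental(self, new_values):
        """Update using only the records received since the last request."""
        for value in new_values:
            self.count += 1
            self.mean += (value - self.mean) / self.count

    def train_full(self, all_values):
        """Discard prior state and retrain on the full information."""
        self.count = 0
        self.mean = 0.0
        self.train_incremental(all_values)

history = [120.0, 135.0, 128.0]
latest = [142.0, 131.0]

incremental = RunningMeanModel()
incremental.train_full(history)        # initial load
incremental.train_incremental(latest)  # later: request only the modifications

full = RunningMeanModel()
full.train_full(history + latest)      # alternative: request everything again

print(incremental.mean, full.mean)
```

Both paths end in the same state here; the incremental path simply needs far less data per request, which matters when the data is served in real time.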
Key Takeaways
✓ The Denodo Platform makes all kinds of data – from a variety of data sources – readily available to your data analysts and data scientists
✓ Data virtualization shortens the ‘data wrangling’ phases of analytics/ML projects, avoiding the need to write ‘data prep’ scripts in Python, R, etc.
✓ It’s easy to access and analyze the data from analytics tools such as Zeppelin or Jupyter
✓ The Denodo Platform enables centralization of data access and sharing across all data processing stages
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
GET STARTED TODAY
www.denodo.com/TestDrive
Q&A
Thanks!
www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.