TRANSCRIPT
-
Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence
-
Speakers
Ravi Shankar
Chief Marketing Officer
Pablo Alvarez
Principal Technical Account Manager
@Ravi_Shankar_ @Denodo
-
Agenda
1. About Denodo
2. Data Virtualization Overview
3. Denodo 6.0 Overview
4. Denodo 6.0 Virtual Launch
-
About Denodo
-
HEADQUARTERS
Palo Alto, CA.
DENODO OFFICES, CUSTOMERS, PARTNERS
Global presence throughout North America, EMEA, APAC, and Latin America.
CUSTOMERS
250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
LEADERSHIP
Longest continuous focus on data virtualization and data services. Product leadership. Solutions expertise.
THE LEADER IN DATA VIRTUALIZATION
Denodo provides agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data and unstructured data sources, and real-time data services at half the cost of traditional approaches.
-
Award-Winning Data Virtualization Leader
Forrester Wave: Enterprise Data Virtualization, Q1 2015
2015 Magic Quadrant for Data Integration Tools
2015 Leader in Forrester Wave: Enterprise Data Virtualization
2015 Technology Innovation Award for Information Management
2015 #1 Readers' Choice Awards for Data Virtualization Platforms
2015 Rank Companies That Matter Most in Data
2015 Big Data 50 Companies Driving Innovation
2015 Leadership Award in Big Data (for Denodo customer Autodesk)
Trend-Setting Products in Data and Information Management for 2016
2016 Premier 100 Technology Leader (for Denodo customer CIT)
-
Data Virtualization
-
The Business Need
Ready Access to Critical Information to Support Business Processes
Marketing, Sales, Executive, Support
Customers
Warranties
Channels
Products
Access to complete information
Access to related information
Access in real-time
Cross-sell / Up-sell
-
The Challenge
Data Is Siloed Across Disparate Systems
Marketing, Sales, Executive, Support
Manually access different systems
Not productive; slows down response times
IT responds with point-to-point data integration
Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL
-
The Solution
Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real-time to consumers
-
Data Virtualization
Data Abstraction Layer
1. Connects to disparate data sources
2. Combines related data into views
3. Publishes the data to applications
-
Benefits of Data Virtualization
Data Virtualization
Better Data Integration
Lower integration costs by 80%.
Flexibility to change.
Real-time (on-demand) data services.
Complete Information
Focus on business information needs.
Include web / cloud, big data, unstructured, streaming.
Bigger volumes, richer/easier access to data.
Better Business Outcome
Projects in 4-6 weeks.
ROI in
-
Case Study
Autodesk Successfully Changes Their Revenue Model and Transforms Business
Problem
Autodesk was changing its business revenue model from a conventional perpetual license model to a subscription-based license model.
Inability to deliver high-quality data in a timely manner to business stakeholders.
Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.
Solution
General-purpose platform to deliver data through logical data warehouse.
Denodo Abstraction Layer helps live invoicing with SAP.
Data virtualization enabled a culture of "see before you build".
Results
Successfully transitioned to subscription-based licensing.
For the first time, Autodesk can do single-point security enforcement and have a uniform data environment for access.
Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.
-
Denodo Platform 6.0
-
Accelerate Your Fast Data Strategy With Denodo Platform 6.0
Dynamic Query Optimizer
In the Cloud
Self Service Data Discovery and Search
Best Real-time Performance. Shortest Time to Data. Rapid Decision Making.
-
Accelerate Your Fast Data Strategy with Denodo Platform 6.0
New Release of Denodo Platform Delivers Breakthrough Performance, Accelerates Adoption, and Expedites Business Use of Data
Breakthrough Performance
Dynamic Query Optimizer delivers breakthrough performance for big data, logical data warehouse, and operational scenarios
Data Virtualization In the Cloud
Denodo Platform for AWS accelerates adoption of data virtualization
Self-Service Data Discovery and Search
Self-service data discovery and search expedites use of data by business users
-
Dynamic Query Optimizer
Delivers Breakthrough Performance for Big Data, Logical Data Warehouse, and Operational Scenarios
Dynamically determines lowest-cost query execution plan based on statistics
Factors in all the special characteristics of big data sources such as number of processing units and partitions
Can easily handle any number of incremental queries
Enables connectivity to the broadest array of big data sources such as Redshift, Impala, Spark.
Best dynamic query optimization engine.
-
How Dynamic Query Optimizer Works
Example: Mining external dimensions with EDW
Total sales by retailer and product during the last month for the brand ACME
Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension
EDW, MDM
SELECT retailer.name, product.name, SUM(sales.amount)
FROM sales
  JOIN retailer ON sales.retailer_fk = retailer.id
  JOIN product ON sales.product_fk = product.id
  JOIN time ON sales.time_fk = time.id
WHERE time.date < ADDMONTH(NOW(), -1)
  AND product.brand = 'ACME'
GROUP BY product.name, retailer.name
-
How Dynamic Query Optimizer Works
Example: Non-optimized
Execution plan: the four source queries below feed three JOINs (joined result: 10,000,000 rows), followed by GROUP BY product.name, retailer.name.
Sales (1,000,000,000 rows):
SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
FROM sales
Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer
Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
Time (30 rows):
SELECT time.date, time.id
FROM time
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
-
How Dynamic Query Optimizer Works
Step 1: Applies JOIN reordering to maximize delegation
Execution plan: the three source queries below feed two JOINs (joined result: 10,000,000 rows), followed by GROUP BY product.name, retailer.name.
Sales joined with time in a single delegated query (100,000,000 rows):
SELECT sales.retailer_fk, sales.product_fk, sales.amount
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer
Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
-
How Dynamic Query Optimizer Works
Step 2
Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push-down optimization.
Execution plan: the three source queries below feed two JOINs (joined result: 1,000 rows), followed by GROUP BY product.name, retailer.name.
Sales joined with time and pre-aggregated at the source (100,000 rows):
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
GROUP BY sales.retailer_fk, sales.product_fk
Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer
Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
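The equivalence behind partial aggregation push-down can be checked on a toy dataset. The following is a minimal sketch, assuming Python's built-in sqlite3 module and invented sample rows; the table and column names follow the slide, the time dimension is left out for brevity, and nothing here reflects how Denodo implements the optimization internally.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory star schema (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE retailer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
        CREATE TABLE sales (retailer_fk INTEGER, product_fk INTEGER, amount REAL);
        INSERT INTO retailer VALUES (1, 'North'), (2, 'South');
        INSERT INTO product VALUES
            (10, 'Anvil', 'ACME'), (11, 'Rocket', 'ACME'), (12, 'Widget', 'Other');
        INSERT INTO sales VALUES
            (1, 10, 5.0), (1, 10, 7.0), (2, 10, 1.0), (2, 11, 4.0), (1, 12, 9.0);
    """)
    return conn

# Join every fact row first, then aggregate.
NAIVE = """
    SELECT r.name, p.name, SUM(s.amount)
    FROM sales s
    JOIN retailer r ON s.retailer_fk = r.id
    JOIN product p ON s.product_fk = p.id
    WHERE p.brand = 'ACME'
    GROUP BY p.name, r.name
    ORDER BY r.name, p.name
"""

# Aggregate the fact table by its foreign keys first (the part that would be
# pushed down to the warehouse), then join the much smaller result.
PUSHED_DOWN = """
    SELECT r.name, p.name, SUM(pre.total)
    FROM (SELECT retailer_fk, product_fk, SUM(amount) AS total
          FROM sales
          GROUP BY retailer_fk, product_fk) pre
    JOIN retailer r ON pre.retailer_fk = r.id
    JOIN product p ON pre.product_fk = p.id
    WHERE p.brand = 'ACME'
    GROUP BY p.name, r.name
    ORDER BY r.name, p.name
"""

conn = build_demo_db()
naive_rows = conn.execute(NAIVE).fetchall()
pushed_rows = conn.execute(PUSHED_DOWN).fetchall()
print(naive_rows == pushed_rows)
```

Both queries return the same totals because each fact row joins to exactly one retailer and one product, so summing before or after the join gives the same result. The final GROUP BY is still needed after the join, since names (unlike keys) are not guaranteed to be unique.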
-
How Dynamic Query Optimizer Works
Step 3
Selects the right JOIN strategy based on costs for data volume estimations
Execution plan: NESTED JOIN and HASH JOIN over the three source queries below (joined result: 1,000 rows), followed by GROUP BY product.name, retailer.name.
Sales joined with time, pre-aggregated and filtered by product keys (10,000 rows):
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
  AND sales.product_fk IN (1, 2, ...)
GROUP BY sales.retailer_fk, sales.product_fk
Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer
Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
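To make the idea of a cost-based JOIN choice concrete, here is a toy sketch in Python. It is not Denodo's cost model: the cost formula (rows transferred over the network), the `selectivity` parameter, and the `max_in_list` threshold are all assumptions made for illustration. The row counts in the usage lines are taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class JoinInput:
    name: str
    estimated_rows: int

def choose_join_strategy(small, large, selectivity, max_in_list=1000):
    """Pick NESTED or HASH by comparing estimated rows transferred.

    HASH: both inputs are fetched in full.
    NESTED: the small side is fetched first and its keys are sent to the
    large side as an IN filter, so only the matching fraction
    (`selectivity`) of the large side is transferred.
    """
    hash_cost = small.estimated_rows + large.estimated_rows
    nested_cost = small.estimated_rows + round(large.estimated_rows * selectivity)
    if small.estimated_rows <= max_in_list and nested_cost < hash_cost:
        return "NESTED", nested_cost
    return "HASH", hash_cost

# 10 ACME products against 100,000 pre-aggregated sales rows; assume the
# product keys keep about 10% of those rows.
print(choose_join_strategy(JoinInput("product", 10), JoinInput("sales_agg", 100_000), 0.1))

# 100 retailers against the remaining 10,000 rows; every row matches some
# retailer, so a key filter saves nothing.
print(choose_join_strategy(JoinInput("retailer", 100), JoinInput("sales_agg", 10_000), 1.0))
```

Under these assumptions the first call prefers a nested join (the key filter cuts the transfer from 100,000 to 10,000 rows) and the second falls back to a hash join. A real optimizer would also weigh indexes, transfer rates, and source capabilities.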
-
How Dynamic Query Optimizer Works
Summary
Automatic JOIN reordering groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer.
End users don't need to worry about the optimal pairing of the tables.
The partial aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW.
Leverages the processing power of the DW, optimized for these aggregations.
Significantly reduces the data transferred through the network (from 1 B to 100 K rows).
The cost-based optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.
Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to take into consideration the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.).
-
How Dynamic Query Optimizer Works
Other relevant optimization techniques
Pruning of unnecessary JOIN branches (based on 1-to-many associations) when the attributes of the 1-side are not projected.
Relevant for vertical partitioning and fat semantic models when queries do not need attributes from all the tables.
Unnecessary tables are removed from the query (even for single-source models).
Pruning of UNION branches based on incompatible filters.
Enables detection of unnecessary UNION branches in horizontal partitioning scenarios.
Automatic data movement.
Creation of temp tables in one of the systems to enable complete delegation of a federated branch.
The target source needs to have the data movement option enabled for this option to be taken into account.
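UNION branch pruning can be pictured with a small Python sketch. It assumes a view defined as a UNION of branches, each declared to hold only a known range of years; the `Branch` type, the source names, and the year ranges are hypothetical, and the overlap test stands in for whatever filter analysis the real optimizer performs.

```python
from typing import NamedTuple

class Branch(NamedTuple):
    source: str
    min_year: int  # inclusive lower bound of the data held by this branch
    max_year: int  # inclusive upper bound

def prune_union_branches(branches, filter_min, filter_max):
    """Keep only branches whose year range overlaps the query filter.

    A branch whose range cannot satisfy the filter would return no rows,
    so it does not need to be queried at all.
    """
    return [
        b for b in branches
        if b.max_year >= filter_min and b.min_year <= filter_max
    ]

branches = [
    Branch("archive", 2000, 2010),
    Branch("warehouse", 2011, 2015),
    Branch("live", 2016, 2016),
]

# A query filtered on years 2012 to 2014 only needs the warehouse branch.
kept = prune_union_branches(branches, 2012, 2014)
print([b.source for b in kept])
```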
-
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario:
Customer Dimension: 2 M rows
Sales Facts: 290 M rows
Items Dimension: 400 K rows
The baseline was set using the same queries with all data in a Netezza appliance.
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
http://www.tpc.org/tpcds/
-
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Query Description | Returned Rows | Avg. Time Physical (Netezza) | Avg. Time Logical (Denodo) | Optimization Technique (automatically chosen)
Total sales by customer | 1.99 M | 21.0 sec | 21.5 sec | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec | 59.1 sec | Full aggregation push-down
Total sales by item brand | 31.4 K | 4.7 sec | 5.3 sec | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.1 K | 3.5 sec | 5.2 sec | On the fly data movement
-
Improved Cache Performance
Incremental Queries
Merge cached data and changed data to provide fully up-to-date results with minimum latency
Get Leads changed / added since 1:00 AM
CACHE: Leads updated at 1:00 AM
Up-to-date Leads data
1. Salesforce Leads data cached in VDP at 1:00 AM
2. Query needing Leads data arrives at 11:00 AM
3. Only new/changed leads are retrieved through the WAN
4. Response is up-to-date but query is much faster
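The four steps above can be sketched in a few lines of Python. This is an illustration only: the in-memory `REMOTE_LEADS` dictionary stands in for the Salesforce source, `fetch_changed_since` is a hypothetical helper, and deleted leads are not handled.

```python
from datetime import datetime

# Hypothetical remote "Leads" rows: id -> (name, last_modified).
REMOTE_LEADS = {
    1: ("Ann", datetime(2016, 3, 30, 0, 30)),
    2: ("Bob (updated)", datetime(2016, 3, 30, 9, 0)),
    3: ("Cid", datetime(2016, 3, 30, 10, 0)),
}

def fetch_changed_since(timestamp):
    """Stand-in for the remote call: only rows modified after `timestamp`."""
    return {k: v for k, v in REMOTE_LEADS.items() if v[1] > timestamp}

def incremental_query(cache, cached_at, fetch_delta):
    """Merge cached rows with the rows changed since the cache was loaded."""
    merged = dict(cache)                   # start from the cached snapshot
    merged.update(fetch_delta(cached_at))  # overlay new and changed rows
    return merged

# Step 1: the cache was loaded at 1:00 AM.
cached_at = datetime(2016, 3, 30, 1, 0)
cache = {
    1: ("Ann", datetime(2016, 3, 30, 0, 30)),
    2: ("Bob", datetime(2016, 3, 30, 0, 45)),
}

# Steps 2 to 4: a later query fetches only the delta and merges it.
delta = fetch_changed_since(cached_at)
result = incremental_query(cache, cached_at, fetch_changed_since)
print(sorted(delta), len(result))
```

Only two of the three leads cross the network, yet the merged result matches the current remote state.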
-
Big Data Connectivity
Big Data and Cloud Databases Connectivity
Redshift enhanced adapter as data source, cache and data movement target
Vertica enhanced as source, cache and data movement target
Apache Spark enhanced adapter
Impala enhanced as cache and data movement target
-
Data Virtualization in the Cloud
Accelerate Adoption of Data Virtualization
Ready-to-use and available on AWS Marketplace
Dynamic and elastic infrastructure
Complete with all enterprise-grade features at the lowest cost
Zero set-up requirements
Flexible rent-by-the-hour options
A wide range of capacity options
Only data virtualization platform on AWS.
-
Buying a Subscription
Customer must have an Amazon AWS account
Choose configuration required (building block + Amazon VM)
Building block by sources or number of concurrent queries and results
Click-through license agreement
Amazon provides monthly billing based on usage
Annual subscriptions billed upfront
Support included in final pricing
-
Self-Service Data Discovery and Search
Expedite Use of Data by Business Users
Search: Google-like search for data and metadata
Discover: Easy-to-use user interface to browse data and metadata as well as data lineage
Explore: Ability to view the graphical representation of entities and relationships
Advanced Query Wizard for users to create ad-hoc queries
Sandbox environment to explore the data before publishing
Data virtualization solution to search data from sources.
-
Search
Google-like Search
Global Search: enter a keyword to find views containing that data
-
Discover
Data Lineage Views
Data lineage and tree view information, including derived field transformations
-
Explore
Graphical representation of views and relationships
-
Create Ad-hoc Queries
GUI-based query creation; save as new Denodo view
Export data via CSV and HTML
-
Managing Very Large Deployments
New Resource Manager
Establish limits on resource usage, e.g. estimated memory, estimated cost, number of concurrent queries, limits on max. execution time and/or max. number of rows
Assigned to users and/or roles
Limits can be individual or global, e.g.
Individual: Each query of a user with role marketing cannot use more than 100 MB
Global: All concurrent queries from users with role marketing cannot use more than 300 MB
Possible actions if limits are surpassed:
Prevent execution
Allow execution with restricted resources
Allow execution; cancel if resources limit is surpassed
Can be dynamically assigned through custom policies (e.g. assign different plans based on time of day)
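The individual and global limits can be illustrated with a short Python sketch. The `RolePlan` type and `admit_query` function are hypothetical names, not part of any Denodo API; the 100 MB and 300 MB figures come from the marketing-role example, and only the "prevent execution" action is modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RolePlan:
    per_query_mb: int  # individual limit: memory for any single query
    global_mb: int     # global limit: memory for all concurrent queries of the role

def admit_query(plan, estimated_mb, running_mb):
    """Decide whether a new query may start under the role's plan.

    `running_mb` lists the estimated memory of the role's queries that are
    already executing.
    """
    if estimated_mb > plan.per_query_mb:
        return "reject: individual limit"
    if sum(running_mb) + estimated_mb > plan.global_mb:
        return "reject: global limit"
    return "admit"

marketing = RolePlan(per_query_mb=100, global_mb=300)

print(admit_query(marketing, 120, []))           # one query alone is too large
print(admit_query(marketing, 80, [90, 90]))      # 260 MB in total, within limits
print(admit_query(marketing, 80, [90, 90, 90]))  # 350 MB in total, over the global limit
```

The other two actions (running with restricted resources, or cancelling once a limit is crossed) would need runtime accounting rather than an admission check like this one.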
-
Managing Very Large Deployments
Enhanced Monitoring and Diagnostic Tool
Monitor operation of the system, diagnose problems, and analyze usage metrics
The new tool will also allow after-the-fact diagnosis of problems
Set the time when the problem occurred and you will see everything that was happening in an integrated, graphical manner, down to the individual query level
-
Unified Security and Governance
Enforcing Security and Governance Policies
Kerberos southbound support for databases and Web Services
Kerberos pass-through support and Kerberos constrained delegation
API for accessing view dependencies information and data lineage information
-
Agile Development
New Admin Tool
Multiple tabs and databases
Resize and organize all panels and dialogs
Manages several open tasks at the same time
VQL highlighting and autocomplete features
Graphical support for Git
-
Denodo 6.0 Fast Data Strategy Summit
March 30 US; March 31 EMEA
9:00 Welcome: Fast Data Strategy Summit. Angel Vina, CEO, Denodo
9:30 Analyst Keynote: Accelerating Fast Data Strategy with Data Virtualization. Presenter: Noel Yuhanna, Principal Research Analyst, Forrester Research
10:00 Customer Case Study: Designing Fast Data Architectures with Data Virtualization and Big Data on Cloud. Presenter: Kurt Jackson, Platform Architect, Autodesk
10:30 Experts Panel: Core Components of Fast Data Strategy - Big Data and Data Virtualization. Panelists: Noel Yuhanna, Principal Research Analyst, Forrester Research; Mark Eaton, Enterprise Architect, Autodesk; Matt Morgan, Vice President, Product and Partner Marketing, Hortonworks. Moderated by: Ravi Shankar, CMO, Denodo
11:00 Use Cases: Where Does Fast Data Strategy Fit within IT Projects? Presenter: Ravi Shankar, CMO, Denodo
12:00 Demo: How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and Operational Scenarios. Presenter: Pablo Alvarez, Principal Technical Account Manager, Denodo
12:05 Closing: Fast Data Strategy Summit. Angel Vina, CEO, Denodo
-
Denodo 6.0 Fast Data Strategy Summit
March 30 US; March 31 EMEA
Tracks
Case Studies
Intro to Data Virtualization
Technical Deep-Dive
Customer Case Study: SQLization of Hadoop - Increasing Business Adoption of Big Data
Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient
Intro: Getting Started with Data Virtualization - What Problems DV Solves
Richard Walker, VP, Sales, Denodo
Data Science: Expediting Use of Data by Business Users with Self-service Discovery and Search
Mark Pritchard, Director, Sales Engineering
Customer Case Study: Data Services - Rapid Application Development using Data Virtualization
Jay Heydt, Manager, Database Technologies, DrillingInfo
Demo: Getting Started with Data Virtualization - What Problems DV Solves
Pablo Alvarez, Principal Technical Account Manager, Denodo
Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses
Alberto Bengoa, Sr. Product Manager, Denodo
Customer Case Study: Data Virtualization in the Cloud
Avinash Desphande, Big Data and Advanced Analytics, Logitech
Enabling Fast Data Strategy: What's New in Denodo Platform 6.0
Alberto Pan, CTO, Denodo
Data Virtualization Deployments: How to Manage Very Large Deployments
Juan Lozano, Sales Engineering Manager, Denodo
Customer Case Study: TBD
TBD
Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption
Paul Moxon, Sr. Director, Strategic Technology Office, Denodo
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Alberto Pan, CTO, Denodo
Customer Case Study: TBD
TBD
Data Virtualization Maturity: Enterprise Features in Denodo Platform 6.0
Suresh Chandrasekaran, Sr. Vice President, Denodo
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Alberto Bengoa, Sr. Product Manager, Denodo
Customer Case Study: TBD
TBD
Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence Brain Trust
Claudia Imhoff, CEO, Intelligent Solutions
Partner Enablement: Architecting and Deploying Data Virtualization
-
Thanks!
www.denodo.com
Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.