Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence

Speakers

Ravi Shankar

Chief Marketing Officer

Pablo Alvarez

Principal Technical Account Manager

@Ravi_Shankar_ @Denodo

Agenda

1. About Denodo
2. Data Virtualization Overview
3. Denodo 6.0 Overview
4. Denodo 6.0 Virtual Launch

About Denodo

HEADQUARTERS: Palo Alto, CA.

DENODO OFFICES, CUSTOMERS, PARTNERS: Global presence throughout North America, EMEA, APAC, and Latin America.

CUSTOMERS

250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.

LEADERSHIP: Longest continuous focus on data virtualization and data services. Product leadership. Solutions expertise.

THE LEADER IN DATA VIRTUALIZATION

Denodo provides agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data and unstructured data sources, and real-time data services at half the cost of traditional approaches.

Award-Winning Data Virtualization Leader

Forrester Wave: Enterprise Data Virtualization, Q1 2015

2015 Magic Quadrant for Data Integration Tools

2015 Leader in Forrester Wave: Enterprise Data Virtualization

2015 Technology Innovation Award for Information Management

2015 #1 Readers' Choice Award for Data Virtualization Platforms

2015 Companies That Matter Most in Data

2015 Big Data 50: Companies Driving Innovation

2015 Leadership Award in Big Data (for Denodo customer Autodesk)

Trend-Setting Products in Data and Information Management for 2016

2016 Premier 100 Technology Leader (for Denodo customer CIT)

Data Virtualization

The Business Need

Ready Access to Critical Information to Support Business Processes

Marketing, Sales, Executive, Support

Customers

Warranties

Channels

Products

Access to complete information

Access to related information

Access in real-time

Cross-sell / Up-sell

Manually access different systems

Not productive; slows down response times

IT responds with point-to-point data integration

The Challenge

Data Is Siloed Across Disparate Systems

Marketing, Sales, Executive, Support

Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL

The Solution

Data Abstraction Layer

Abstracts access to disparate data sources

Acts as a single repository (virtual)

Makes data available in real-time to consumers

Data Virtualization

Data Abstraction Layer

1. Connects to disparate data sources

2. Combines related data into views

3. Publishes the data to applications

Benefits of Data Virtualization

Data Virtualization

Better Data Integration

Lower integration costs by 80%.

Flexibility to change.

Real-time (on-demand) data services.

Complete Information

Focus on business information needs.

Include web / cloud, big data, unstructured, streaming.

Bigger volumes, richer/easier access to data.

Better Business Outcome

Projects in 4-6 weeks.

ROI in

Case Study

Autodesk Successfully Changes Their Revenue Model and Transforms Business

Problem

Autodesk was changing its business revenue model from a conventional perpetual license model to a subscription-based license model.

Inability to deliver high quality data in a timely manner to business stakeholders.

Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.

Solution

General purpose platform to deliver data through logical data warehouse.

Denodo Abstraction Layer helps live invoicing with SAP.

Data virtualization enabled a culture of "see before you build".

Results

Successfully transitioned to subscription-based licensing.

For the first time, Autodesk can do single-point security enforcement and have a uniform data environment for access.

Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.

Denodo Platform 6.0

Accelerate Your Fast Data Strategy With Denodo Platform 6.0

Dynamic Query Optimizer

In the Cloud

Self Service Data Discovery and Search

Best Real-time Performance. Shortest Time to Data. Rapid Decision Making.

Accelerate Your Fast Data Strategy with Denodo Platform 6.0

New Release of Denodo Platform Delivers Breakthrough Performance, Accelerates Adoption, and Expedites Business Use of Data

Breakthrough Performance

Dynamic Query Optimizer delivers breakthrough performance for big data, logical data warehouse, and operational scenarios

Data Virtualization In the Cloud

Denodo Platform for AWS accelerates adoption of data virtualization

Self-Service Data Discovery and Search

Self-service data discovery and search expedites use of data by business users

Dynamic Query Optimizer

Delivers Breakthrough Performance for Big Data, Logical Data Warehouse, and Operational Scenarios

Dynamically determines lowest-cost query execution plan based on statistics

Factors in all the special characteristics of big data sources such as number of processing units and partitions

Can easily handle any number of incremental queries

Enables connectivity to the broadest array of big data sources such as Redshift, Impala, Spark.

Best dynamic query optimization engine.

How Dynamic Query Optimizer Works

Example: Mining external dimensions with EDW

Total sales by retailer and product during the last month for the brand ACME

Diagram: a sales fact table with time, product, and retailer dimensions, held in the EDW and MDM systems.

    SELECT retailer.name, product.name, SUM(sales.amount)
    FROM sales
      JOIN retailer ON sales.retailer_fk = retailer.id
      JOIN product ON sales.product_fk = product.id
      JOIN time ON sales.time_fk = time.id
    WHERE time.date < ADDMONTH(NOW(), -1)
      AND product.brand = 'ACME'
    GROUP BY product.name, retailer.name


Example: Non-optimized

Diagram: the data virtualization layer runs three JOINs and the GROUP BY product.name, retailer.name over the results of four source queries. Row counts shown: 1,000,000,000 rows (sales); 100 rows, 10 rows, and 30 rows (dimension queries); 10,000,000 rows.

Queries sent to the sources:

    SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
    FROM sales

    SELECT retailer.name, retailer.id
    FROM retailer

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'

    SELECT time.date, time.id
    FROM time
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)


Step 1: Applies JOIN reordering to maximize delegation

Diagram: two JOINs and the GROUP BY product.name, retailer.name remain in the data virtualization layer. Row counts shown: 100,000,000 rows; 100 rows; 10 rows; 10,000,000 rows.

Queries sent to the sources:

    SELECT sales.retailer_fk, sales.product_fk, sales.amount
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'


Step 2

Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push-down optimization.

Diagram: two JOINs and the GROUP BY product.name, retailer.name remain in the data virtualization layer. Row counts shown: 100,000 rows; 100 rows; 10 rows; 1,000 rows.

Queries sent to the sources:

    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
    GROUP BY sales.retailer_fk, sales.product_fk

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'


Step 3

Selects the right JOIN strategy based on costs for data volume estimations

Diagram: a NESTED JOIN and a HASH JOIN, followed by the GROUP BY product.name, retailer.name, in the data virtualization layer. Row counts shown: 10,000 rows; 100 rows; 10 rows; 1,000 rows.

Queries sent to the sources:

    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
      AND sales.product_fk IN (1, 2, ...)
    GROUP BY sales.retailer_fk, sales.product_fk

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
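The nested join in Step 3 can be sketched in Python, using two in-memory SQLite databases as stand-ins for the two sources. This is a minimal illustration under assumptions, not Denodo's implementation: the function name, sample rows, and retailer ids are hypothetical, and the time filter is left out to keep the example short.

```python
import sqlite3

# Two independent in-memory databases stand in for the MDM and EDW sources.
mdm = sqlite3.connect(":memory:")
edw = sqlite3.connect(":memory:")

mdm.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, brand TEXT)")
mdm.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Anvil", "ACME"), (2, "Rocket", "ACME"), (3, "Widget", "Other")],
)

edw.execute("CREATE TABLE sales (retailer_fk INTEGER, product_fk INTEGER, amount REAL)")
edw.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(10, 1, 5.0), (10, 1, 7.0), (11, 2, 3.0), (11, 3, 100.0)],
)


def nested_join_sales_by_product(mdm_conn, edw_conn, brand):
    """Hypothetical helper: query the small side, then filter the large side by its keys."""
    # 1. Query the small side (products of the brand) first.
    products = dict(
        mdm_conn.execute("SELECT id, name FROM product WHERE brand = ?", (brand,))
    )
    if not products:
        return []
    # 2. Push the retrieved keys into the query for the large side as an IN list,
    #    so the source only returns (and aggregates) matching sales rows.
    placeholders = ",".join("?" for _ in products)
    rows = edw_conn.execute(
        "SELECT retailer_fk, product_fk, SUM(amount) FROM sales "
        f"WHERE product_fk IN ({placeholders}) "
        "GROUP BY retailer_fk, product_fk "
        "ORDER BY retailer_fk, product_fk",
        list(products),
    ).fetchall()
    # 3. Finish the join in the virtualization layer by attaching product names.
    return [(retailer, products[pid], total) for retailer, pid, total in rows]


result = nested_join_sales_by_product(mdm, edw, "ACME")
print(result)
```

The sale of the non-ACME product never leaves the second source, which is the point of choosing a nested join when one side is small.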

How Dynamic Query Optimizer Works

Summary

Automatic JOIN reordering groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer.

End users don't need to worry about the optimal pairing of the tables.

The partial aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW.

Leverages the processing power of the DW, which is optimized for these aggregations.

Significantly reduces the data transferred through the network (from 1 B to 100 K rows).

The cost-based optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.

Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to take into account the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.)
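To make the partial aggregation push-down concrete, here is a small Python sketch comparing it with the naive plan. It is an illustration under assumptions rather than Denodo's algorithm: the sample rows, names, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (retailer_fk, product_fk, amount)
sales = [
    (1, 10, 5.0), (1, 10, 7.0), (1, 11, 2.0), (2, 10, 4.0), (2, 11, 6.0),
]
retailer = {1: "North", 2: "South"}
product = {10: "Anvil", 11: "Anvil"}  # two product ids share one name


def naive(sales, retailer, product):
    """Transfer every sales row, then join and aggregate in the DV layer."""
    totals = defaultdict(float)
    for r_fk, p_fk, amount in sales:
        totals[(retailer[r_fk], product[p_fk])] += amount
    return dict(totals), len(sales)


def pushed_down(sales, retailer, product):
    """Aggregate by foreign keys at the source, then join and re-aggregate."""
    # Done by the source: GROUP BY retailer_fk, product_fk
    partial = defaultdict(float)
    for r_fk, p_fk, amount in sales:
        partial[(r_fk, p_fk)] += amount
    # Done by the DV layer: join the dimensions and aggregate by name.
    # The second aggregation is still needed because several keys can map
    # to the same name, which is why the push-down is only "partial".
    totals = defaultdict(float)
    for (r_fk, p_fk), subtotal in partial.items():
        totals[(retailer[r_fk], product[p_fk])] += subtotal
    return dict(totals), len(partial)


naive_totals, naive_rows = naive(sales, retailer, product)
pushed_totals, pushed_rows = pushed_down(sales, retailer, product)
print(naive_totals == pushed_totals, naive_rows, pushed_rows)
```

Both plans return the same totals, but the second transfers one row per foreign-key combination instead of one row per sale.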


Other relevant optimization techniques

Pruning of unnecessary JOIN branches (based on 1 to + associations) when the attributes of the 1-side are not projected
  Relevant for horizontal partitioning and fat semantic models when queries do not need attributes for all the tables
  Unnecessary tables are removed from the query (even for single-source models)

Pruning of UNION branches based on incompatible filters
  Enables detection of unnecessary UNION branches in vertical partitioning scenarios

Automatic data movement
  Creation of temp tables in one of the systems to enable complete delegation of a federated branch
  The target source needs to have the data movement option enabled for this option to be taken into account
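Pruning UNION branches on incompatible filters can be sketched as a range-overlap check. This is a hedged illustration only: the `Branch` class, the source names, and the year ranges are hypothetical, and a real optimizer would reason over arbitrary filter conditions rather than a single year range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One branch of a UNION view, with the range of years it can contain."""
    source: str
    min_year: int  # inclusive
    max_year: int  # inclusive


def prune_branches(branches, year_from, year_to):
    """Keep only branches whose year range overlaps the query filter."""
    return [
        b for b in branches
        if b.min_year <= year_to and b.max_year >= year_from
    ]


branches = [
    Branch("archive", 2000, 2009),
    Branch("warehouse", 2010, 2014),
    Branch("live", 2015, 2016),
]

# A query filtered to 2015-2016 only needs the "live" branch.
kept = prune_branches(branches, 2015, 2016)
print([b.source for b in kept])
```

Branches that cannot contain matching rows are never queried, so their sources do no work for that request.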

Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse

Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario:

Customer Dimension: 2 M rows

Sales Facts: 290 M rows

Items Dimension: 400 K rows

The baseline was set using the same queries with all data in a Netezza appliance.

* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems. http://www.tpc.org/tpcds/

Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse

Query description | Returned rows | Avg. time, physical (Netezza) | Avg. time, logical (Denodo) | Optimization technique (automatically chosen)
Total sales by customer | 1.99 M | 21.0 sec | 21.5 sec | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec | 59.1 sec | Full aggregation push-down
Total sales by item brand | 31.4 K | 4.7 sec | 5.3 sec | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.1 K | 3.5 sec | 5.2 sec | On the fly data movement
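The relative overhead of the logical approach can be read off these timings with a little arithmetic. The sketch below only restates the figures from the comparison; the `overhead_pct` helper is a hypothetical name introduced for illustration.

```python
# (query description, avg. seconds physical, avg. seconds logical)
timings = [
    ("Total sales by customer", 21.0, 21.5),
    ("Total sales by customer and year between 2000 and 2004", 52.3, 59.1),
    ("Total sales by item brand", 4.7, 5.3),
    ("Total sales by item where sale price less than current list price", 3.5, 5.2),
]


def overhead_pct(physical, logical):
    """Extra time of the logical run, as a percentage of the physical run."""
    return (logical - physical) / physical * 100.0


overheads = [(query, round(overhead_pct(p, l), 1)) for query, p, l in timings]
for query, pct in overheads:
    print(f"{query}: +{pct}%")
```

On these four queries the logical data warehouse is between roughly 2% and 49% slower than the physical baseline, with the largest gap on the query that needs on-the-fly data movement.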

Improved Cache Performance

Incremental Queries

Merge cached data and changed data to provide fully up-to-date results with minimum latency

Diagram: the query gets the Leads changed or added since 1:00 AM from the source and merges them with the cache (Leads updated at 1:00 AM) to return up-to-date Leads data.

1. Salesforce Leads data cached in VDP at 1:00 AM

2. Query needing Leads data arrives at 11:00 AM

3. Only new/changed leads are retrieved through the WAN

4. Response is up-to-date but query is much faster
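The four steps above can be sketched as a merge of the cached snapshot with the rows changed since the cache was loaded. This is a minimal sketch under assumptions: the row layout, the `fetch_changed_since` stand-in for the remote call, and the sample leads are all hypothetical, and deleted rows are not handled.

```python
from datetime import datetime


def incremental_query(cache, fetch_changed_since, cached_at):
    """Merge cached rows with rows changed at the source since `cached_at`."""
    merged = dict(cache)  # start from the cached snapshot
    for row in fetch_changed_since(cached_at):
        merged[row["id"]] = row  # new rows are added, changed rows overwrite
    return merged


# Hypothetical data: the cache was loaded at 1:00 AM.
cached_at = datetime(2016, 3, 30, 1, 0)
cache = {
    1: {"id": 1, "status": "open", "modified": datetime(2016, 3, 29, 17, 0)},
    2: {"id": 2, "status": "open", "modified": datetime(2016, 3, 29, 18, 0)},
}
# State of the remote source when the query arrives at 11:00 AM.
source = [
    {"id": 1, "status": "open", "modified": datetime(2016, 3, 29, 17, 0)},
    {"id": 2, "status": "qualified", "modified": datetime(2016, 3, 30, 9, 30)},
    {"id": 3, "status": "open", "modified": datetime(2016, 3, 30, 10, 15)},
]


def fetch_changed_since(timestamp):
    """Stand-in for the remote call: only rows modified after `timestamp`."""
    return [row for row in source if row["modified"] > timestamp]


leads = incremental_query(cache, fetch_changed_since, cached_at)
transferred = len(fetch_changed_since(cached_at))
print(sorted(leads), transferred)
```

Only the changed and the new lead cross the WAN, yet the merged result reflects the current state of all three leads.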

Big Data Connectivity

Big Data and Cloud Databases Connectivity

Redshift enhanced adapter as data source, cache and data movement target

Vertica enhanced as source, cache and data movement target

Apache Spark enhanced adapter

Impala enhanced as cache and data movement target

Data Virtualization in the Cloud

Accelerate Adoption of Data Virtualization

Ready-to-use and available on AWS Marketplace

Dynamic and elastic infrastructure

Complete with all enterprise-grade features at the lowest cost

Zero set-up requirements

Flexible rent-by-the-hour options

A wide range of capacity options

Only data virtualization platform on AWS.

Buying a Subscription

Customer must have an Amazon AWS account

Choose configuration required (building block + Amazon VM)
  Building block by sources or number of concurrent queries & results

Click-through license agreement

Amazon provides monthly billing based on usage
  Annual subscriptions billed upfront

Support included in final pricing

Self-Service Data Discovery and Search

Expedite Use of Data by Business Users

Search: Google-like search for data and metadata

Discover: Easy-to-use user interface to browse data and metadata as well as data lineage

Explore: Ability to view the graphical representation of entities and relationships

Advanced Query Wizard for users to create ad-hoc queries

Sandbox environment to explore the data before publishing

Data virtualization solution to search data from sources.

Search

Google-like Search

Global Search: enter a keyword to find views containing that data

Discover

Data Lineage Views

Data lineage and tree view information including derived fields transformations

Explore

Graphical representation of views and relationships

Create Ad-hoc Queries

GUI-based query creation; save the result as a new Denodo view

Export data via CSV & HTML

Managing Very Large Deployments

New Resource Manager

Establish limits on resource usage, e.g. estimated memory, estimated cost, number of concurrent queries, maximum execution time and/or maximum number of rows

Assigned to users and/or roles

Limits can be individual or global, e.g.
  Individual: each query of a user with role marketing cannot use more than 100 MB
  Global: all concurrent queries from users with role marketing cannot use more than 300 MB

Possible actions if limits are surpassed:
  Prevent execution
  Allow execution with restricted resources
  Allow execution; cancel if the resource limit is surpassed

Can be dynamically assigned through custom policies (e.g. assign different plans based on time of day)
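The difference between individual and global limits can be sketched as a simple admission check, using the 100 MB and 300 MB figures from the marketing example. This is an illustration under assumptions, not the product's API: the `Query` class and `admit` function are hypothetical, and only the "prevent execution" action is modelled.

```python
from dataclasses import dataclass


@dataclass
class Query:
    user_role: str
    estimated_memory_mb: int


def admit(query, running, role="marketing",
          individual_limit_mb=100, global_limit_mb=300):
    """Return True if the query may start under the limits for `role`."""
    if query.user_role != role:
        return True  # the limits apply only to the configured role
    # Individual limit: a single query may not exceed its own budget.
    if query.estimated_memory_mb > individual_limit_mb:
        return False
    # Global limit: all concurrent queries of the role share one budget.
    in_use = sum(q.estimated_memory_mb for q in running if q.user_role == role)
    return in_use + query.estimated_memory_mb <= global_limit_mb


running = [Query("marketing", 90), Query("marketing", 80), Query("sales", 500)]

print(admit(Query("marketing", 100), running))  # fits both limits
print(admit(Query("marketing", 120), running))  # exceeds the individual limit
```

A query can pass the individual check and still be rejected when the role's concurrent queries already consume most of the shared budget.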

Managing Very Large Deployments

Enhanced Monitoring and Diagnostic Tool

Monitor operation of the system, diagnose problems, and analyze usage metrics

The new tool will also allow after-the-fact diagnosis of problems
  Set the time when the problem occurred and you will see everything that was happening in an integrated, graphical manner, down to the individual query level

Unified Security and Governance

Enforcing Security and Governance Policies

Kerberos southbound support for databases and Web Services

Kerberos pass-through support and Kerberos constrained delegation

API for accessing view dependencies information and data lineage information

Agile Development

New Admin Tool

Multiple tabs and databases

Resize and organize all panels and dialogs

Manages several open tasks at the same time

VQL highlighting and autocomplete features

Graphical support for Git

Denodo 6.0 Fast Data Strategy Summit

March 30 US; March 31 EMEA

9:00 Welcome: Fast Data Strategy Summit Angel Vina, CEO, Denodo

9:30 Analyst Keynote: Accelerating Fast Data Strategy with Data Virtualization Presenter: Noel Yuhanna, Principal Research Analyst, Forrester Research

10:00 Customer Case Study: Designing Fast Data Architectures with Data Virtualization and Big Data on Cloud Presenter: Kurt Jackson, Platform Architect, Autodesk

10:30 Experts Panel: Core Components of Fast Data Strategy: Big Data and Data Virtualization
  Panelists: Noel Yuhanna, Principal Research Analyst, Forrester Research; Mark Eaton, Enterprise Architect, Autodesk; Matt Morgan, Vice President, Product and Partner Marketing, Hortonworks
  Moderated by: Ravi Shankar, CMO, Denodo

11:00 Use cases: Where does Fast Data Strategy fit within IT Projects Presenter: Ravi Shankar, CMO, Denodo

12:00 Demo: How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and Operational Scenarios Presenter: Pablo Alvarez, Principal Technical Account Manager, Denodo

12:05 Closing: Fast Data Strategy Summit Angel Vina, CEO, Denodo

Denodo 6.0 Fast Data Strategy Summit


Tracks

Case Studies

Intro to Data Virtualization

Technical Deep-Dive

Customer Case Study: SQLization of Hadoop: Increasing Business Adoption of Big Data

Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient

Intro: Getting Started with Data Virtualization: What Problems DV Solves

Richard Walker, VP, Sales, Denodo

Data Science: Expediting Use of Data by Business Users with Self-service Discovery and Search

Mark Pritchard, Director, Sales Engineering

Customer Case Study: Data Services: Rapid Application Development using Data Virtualization

Jay Heydt, Manager, Database Technologies, DrillingInfo

Demo: Getting Started with Data Virtualization: What Problems DV Solves

Pablo Alvarez, Principal Technical Account Manager, Denodo

Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses

Alberto Bengoa, Sr. Product Manager, Denodo

Customer Case Study: Data Virtualization in the Cloud

Avinash Desphande, Big Data and Advanced Analytics, Logitech

Enabling Fast Data Strategy: What's New in Denodo Platform 6.0

Alberto Pan, CTO, Denodo

Data Virtualization Deployments: How to Manage Very Large Deployments

Juan Lozano, Sales Engineering Manager, Denodo

Customer Case Study: TBD

TBD

Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption

Paul Moxon, Sr. Director, Strategic Technology Office, Denodo

Big Data: Architecture and Performance Considerations in Logical Data Lakes

Alberto Pan, CTO, Denodo


TBD

Data Virtualization Maturity: Enterprise Features in Denodo Platform 6.0

Suresh Chandrasekaran, Sr. Vice President, Denodo

Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB

Alberto Bengoa, Sr. Product Manager, Denodo


TBD

Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence Brain Trust

Claudia Imhoff, CEO, Intelligent Solutions

Partner Enablement: Architecting and Deploying Data Virtualization

Thanks!

www.denodo.com [email protected] Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
