Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence

Upload: denodo

Post on 09-Jan-2017

  • Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence

  • Speakers

    Ravi Shankar

    Chief Marketing Officer

    Pablo Alvarez

    Principal Technical Account Manager

    @Ravi_Shankar_ @Denodo

  • Agenda

    1. About Denodo

    2. Data Virtualization Overview

    3. Denodo 6.0 Overview

    4. Denodo 6.0 Virtual Launch

  • About Denodo

    HEADQUARTERS: Palo Alto, CA.

    DENODO OFFICES, CUSTOMERS, PARTNERS: Global presence throughout North America, EMEA, APAC, and Latin America.

    CUSTOMERS: 250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.

    LEADERSHIP: Longest continuous focus on data virtualization and data services. Product leadership. Solutions expertise.

    THE LEADER IN DATA VIRTUALIZATION

    Denodo provides agile, high-performance data integration and data abstraction across the broadest range of enterprise, cloud, big data, and unstructured data sources, and real-time data services at half the cost of traditional approaches.

  • Award-Winning Data Virtualization Leader

    Forrester Wave: Enterprise Data Virtualization, Q1 2015

    2015 Magic Quadrant for Data Integration Tools

    2015 Leader in Forrester Wave: Enterprise Data Virtualization

    2015 Technology Innovation Award for Information Management

    2015 #1 Readers Choice Awards for Data Virtualization Platforms

    2015 Companies That Matter Most in Data

    2015 Big Data 50 Companies Driving Innovation

    2015 Leadership Award in Big Data (for Denodo customer Autodesk)

    Trend-Setting Products in Data and Information Management for 2016

    2016 Premier 100 Technology Leader (for Denodo customer CIT)

  • Data Virtualization

  • The Business Need

    Ready Access to Critical Information to Support Business Processes

    Consumers: Marketing, Sales, Executive, Support

    Information: Customers, Warranties, Channels, Products

    Access to complete information

    Access to related information

    Access in real-time

    Cross-sell / Up-sell

  • The Challenge

    Data Is Siloed Across Disparate Systems

    Manually access different systems

    Not productive; slows down response times

    IT responds with point-to-point data integration

    Consumers: Marketing, Sales, Executive, Support

    Sources: Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL

  • The Solution

    Data Abstraction Layer

    Abstracts access to disparate data sources

    Acts as a single repository (virtual)

    Makes data available in real-time to consumers

  • Data Virtualization

    Data Abstraction Layer

    1. Connects to disparate data sources

    2. Combines related data into views

    3. Publishes the data to applications
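    The connect, combine, and publish roles of the abstraction layer can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Denodo Platform is implemented: the two "sources" are an in-memory SQLite table and a plain function standing in for a cloud application, and every name (crm_customers, fetch_warranties, customer_360) is hypothetical.

```python
import sqlite3

# 1. Connect: two independent "sources" (both are hypothetical stand-ins).
crm = sqlite3.connect(":memory:")  # plays the role of a relational database
crm.execute("CREATE TABLE crm_customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])


def fetch_warranties():
    """Plays the role of a cloud application that returns records on request."""
    return [
        {"customer_id": 1, "product": "Laptop", "years": 3},
        {"customer_id": 1, "product": "Phone", "years": 1},
        {"customer_id": 2, "product": "Tablet", "years": 2},
    ]


# 2. Combine: the "virtual view" joins both sources at query time.
#    Nothing is copied into a central store ahead of time.
def customer_360(customer_id):
    row = crm.execute("SELECT id, name FROM crm_customers WHERE id = ?",
                      (customer_id,)).fetchone()
    if row is None:
        return None
    warranties = [w for w in fetch_warranties() if w["customer_id"] == row[0]]
    return {"id": row[0], "name": row[1], "warranties": warranties}


# 3. Publish: consumers call one entry point and never see the sources.
print(customer_360(1))
```

    A consumer such as a support application would call customer_360 without knowing that the answer comes from two systems.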

  • Benefits of Data Virtualization

    Data Virtualization

    Better Data Integration

    Lower integration costs by 80%.

    Flexibility to change.

    Real-time (on-demand) data services.

    Complete Information

    Focus on business information needs.

    Include web / cloud, big data, unstructured, streaming.

    Bigger volumes, richer/easier access to data.

    Better Business Outcome

    Projects in 4-6 weeks.

    ROI in

  • Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business

    Problem, Solution, Results

    Autodesk was changing their business revenue model from a conventional perpetual license model to a subscription-based license model.

    Inability to deliver high-quality data in a timely manner to business stakeholders.

    Evolution from a traditional operational data warehouse to a contemporary logical data warehouse deemed necessary for faster speed.

    General-purpose platform to deliver data through a logical data warehouse.

    Denodo Abstraction Layer helps live invoicing with SAP.

    Data virtualization enabled a culture of "see before you build".

    Successfully transitioned to subscription-based licensing.

    For the first time, Autodesk can do single-point security enforcement and have a uniform data environment for access.

    Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.

  • Denodo Platform 6.0

  • Accelerate Your Fast Data Strategy With Denodo Platform 6.0

    Dynamic Query Optimizer

    In the Cloud

    Self Service Data Discovery and Search

    Best Real-time Performance. Shortest Time to Data. Rapid Decision Making.

  • Accelerate Your Fast Data Strategy with Denodo Platform 6.0

    New Release of Denodo Platform Delivers Breakthrough Performance, Accelerates Adoption, and Expedites Business Use of Data

    Breakthrough Performance

    Dynamic Query Optimizer delivers breakthrough performance for big data, logical data warehouse, and operational scenarios

    Data Virtualization In the Cloud

    Denodo Platform for AWS accelerates adoption of data virtualization

    Self-Service Data Discovery and Search

    Self-service data discovery and search expedites use of data by business users

  • Dynamic Query Optimizer

    Delivers Breakthrough Performance for Big Data, Logical Data Warehouse, and Operational Scenarios

    Dynamically determines lowest-cost query execution plan based on statistics

    Factors in all the special characteristics of big data sources such as number of processing units and partitions

    Can easily handle any number of incremental queries

    Enables connectivity to the broadest array of big data sources such as Redshift, Impala, Spark.

    Best dynamic query optimization engine.

  • How Dynamic Query Optimizer Works

    Example: Mining external dimensions with EDW

    Total sales by retailer and product during the last month for the brand ACME

    Diagram: fact table (sales) with time, product, and retailer dimensions; sources: EDW and MDM

    SELECT retailer.name, product.name, SUM(sales.amount)
    FROM sales
      JOIN retailer ON sales.retailer_fk = retailer.id
      JOIN product ON sales.product_fk = product.id
      JOIN time ON sales.time_fk = time.id
    WHERE time.date < ADDMONTH(NOW(), -1)
      AND product.brand = 'ACME'
    GROUP BY product.name, retailer.name

  • How Dynamic Query Optimizer Works

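    Example: Non-optimized

    Each table is queried separately; the three JOINs and the GROUP BY run in the data virtualization layer.

    Query on sales (1,000,000,000 rows):

    SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
    FROM sales

    Query on retailer (100 rows):

    SELECT retailer.name, retailer.id
    FROM retailer

    Query on product (10 rows):

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'

    Query on time (30 rows):

    SELECT time.date, time.id
    FROM time
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)

    After the three JOINs, 10,000,000 rows reach GROUP BY product.name, retailer.name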

  • How Dynamic Query Optimizer Works

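    Step 1: Applies JOIN reordering to maximize delegation

    The sales and time tables are joined in the source, so only two JOINs remain in the data virtualization layer.

    Delegated query on sales and time (100,000,000 rows):

    SELECT sales.retailer_fk, sales.product_fk, sales.amount
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)

    Query on retailer (100 rows):

    SELECT retailer.name, retailer.id
    FROM retailer

    Query on product (10 rows):

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'

    After the two JOINs, 10,000,000 rows reach GROUP BY product.name, retailer.name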
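    One way to picture the grouping behind Step 1: tables linked by a join condition and stored in the same source can be sent to that source as a single branch. The sketch below is an assumption-laden illustration, not the optimizer's actual algorithm. The placement of tables in sources (TABLE_SOURCE) is hypothetical, since the slide does not say which dimension lives where, and delegable_branches is a made-up helper name.

```python
# Hypothetical catalog: which source holds each table of the example query.
TABLE_SOURCE = {"sales": "EDW", "time": "EDW", "retailer": "MDM", "product": "MDM"}

# Join conditions of the example query, as pairs of table names.
JOIN_EDGES = [("sales", "retailer"), ("sales", "product"), ("sales", "time")]


def delegable_branches(table_source, join_edges):
    """Group tables into branches that a single source can execute.

    Two tables end up in the same branch when a join condition links them
    and both live in the same source (union-find over same-source edges).
    """
    parent = {table: table for table in table_source}

    def find(table):
        while parent[table] != table:
            parent[table] = parent[parent[table]]  # path halving
            table = parent[table]
        return table

    for left, right in join_edges:
        if table_source[left] == table_source[right]:
            parent[find(left)] = find(right)

    branches = {}
    for table in table_source:
        branches.setdefault(find(table), []).append(table)
    return sorted(sorted(branch) for branch in branches.values())


print(delegable_branches(TABLE_SOURCE, JOIN_EDGES))
# prints [['product'], ['retailer'], ['sales', 'time']]
```

    With this placement, retailer and product stay separate even though they share a source, because no join condition links them directly; that matches the three queries on the slide.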

  • How Dynamic Query Optimizer Works

    Step 2: Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push-down optimization

    Delegated query on sales and time, now aggregated by foreign key (100,000 rows):

    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
    GROUP BY sales.retailer_fk, sales.product_fk

    Query on retailer (100 rows):

    SELECT retailer.name, retailer.id
    FROM retailer

    Query on product (10 rows):

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'

    After the two JOINs, 1,000 rows reach GROUP BY product.name, retailer.name
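    Why the push-down in Step 2 is safe can be checked on a toy data set: summing by foreign key first and re-aggregating after the join gives the same totals as joining every fact row and then summing. The sketch below uses made-up rows and names (SALES, RETAILER, PRODUCT) and plain dictionaries instead of real sources.

```python
from collections import defaultdict

# Hypothetical fact rows: (retailer_fk, product_fk, amount).
SALES = [(1, 10, 5), (1, 10, 7), (1, 11, 2), (2, 10, 4), (2, 11, 1), (2, 11, 3)]
RETAILER = {1: "North Mart", 2: "South Mart"}  # id -> name
PRODUCT = {10: "Anvil", 11: "Rocket"}          # id -> name


def join_then_aggregate(sales):
    """Non-optimized plan: join every fact row, then GROUP BY the names."""
    totals = defaultdict(int)
    for retailer_fk, product_fk, amount in sales:
        totals[(PRODUCT[product_fk], RETAILER[retailer_fk])] += amount
    return dict(totals)


def partial_aggregate(sales):
    """Pushed-down part: GROUP BY the foreign keys inside the source."""
    partial = defaultdict(int)
    for retailer_fk, product_fk, amount in sales:
        partial[(retailer_fk, product_fk)] += amount
    return dict(partial)


def join_partial(partial):
    """Remaining part: join the subtotals with the dimensions, then
    re-aggregate by name (two ids could share the same name)."""
    totals = defaultdict(int)
    for (retailer_fk, product_fk), subtotal in partial.items():
        totals[(PRODUCT[product_fk], RETAILER[retailer_fk])] += subtotal
    return dict(totals)


partial = partial_aggregate(SALES)
print(len(SALES), "fact rows reduced to", len(partial), "subtotals")
print(join_partial(partial) == join_then_aggregate(SALES))
```

    The saving comes from the first print: only the subtotals cross the network, which is the effect the slide quantifies with its row counts.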

  • How Dynamic Query Optimizer Works

    Step 3: Selects the right JOIN strategy based on costs for data volume estimations

    NESTED JOIN with product (10 rows): the product ids are passed to the sales query as an IN filter

    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'

    Delegated query on sales and time (10,000 rows):

    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
      AND sales.product_fk IN (1, 2, ...)
    GROUP BY sales.retailer_fk, sales.product_fk

    HASH JOIN with retailer (100 rows):

    SELECT retailer.name, retailer.id
    FROM retailer

    After the two JOINs, 1,000 rows reach GROUP BY product.name, retailer.name
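    The two strategies named in Step 3 can be contrasted with a small sketch. Here "nested join" is read the way the slide shows it: the small branch is fetched first and its keys are sent to the other source as an IN filter. The data, the helper names, and the threshold in choose_strategy are all hypothetical; a real cost model would also weigh indexes and transfer rates, as the summary slide notes.

```python
def hash_join(left_rows, right_rows, left_key, right_key):
    """Build a hash table on the right rows, then probe it with the left rows."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left_rows for r in index.get(l[left_key], [])]


def nested_join(small_rows, small_key, fetch_big, big_key):
    """Fetch the small side first, then ask the big source only for its keys."""
    keys = sorted({row[small_key] for row in small_rows})
    big_rows = fetch_big(keys)  # stands in for "... WHERE fk IN (keys)"
    return hash_join(big_rows, small_rows, big_key, small_key), len(big_rows)


def choose_strategy(small_side_rows, max_in_list=1000):
    """Toy rule (an assumption): use a nested join only for short key lists."""
    return "nested" if small_side_rows <= max_in_list else "hash"


# Hypothetical data: five fact rows, two ACME products.
SALES = [{"product_fk": p, "amount": a}
         for p, a in [(1, 10), (2, 20), (3, 30), (1, 5), (4, 40)]]
ACME_PRODUCTS = [{"id": 1, "name": "Anvil"}, {"id": 2, "name": "Rocket"}]


def fetch_sales(product_ids):
    """Plays the role of the remote source applying the IN filter."""
    wanted = set(product_ids)
    return [s for s in SALES if s["product_fk"] in wanted]


joined, transferred = nested_join(ACME_PRODUCTS, "id", fetch_sales, "product_fk")
print(choose_strategy(len(ACME_PRODUCTS)), "join transferred", transferred,
      "of", len(SALES), "fact rows")
```

    Both strategies return the same rows; the difference is how many fact rows leave the source before the join.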

  • How Dynamic Query Optimizer Works

    Summary

    Automatic JOIN reordering groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer

    End users don't need to worry about the optimal pairing of the tables

    The partial aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW

    Leverages the processing power of the DW, optimized for these aggregations

    Significantly reduces the data transferred through the network (from 1 B to 100 K rows)

    The cost-based optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.

    Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to take into account the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.)

  • How Dynamic Query Optimizer Works

    Other relevant optimization techniques

    Pruning of unnecessary JOIN branches (based on 1-to-many associations) when the attributes of the 1-side are not projected

    Relevant for horizontal partitioning and "fat" semantic models when queries do not need attributes from all the tables

    Unnecessary tables are removed from the query (even for single-source models)

    Pruning of UNION branches based on incompatible filters

    Enables detection of unnecessary UNION branches in vertical partitioning scenarios

    Automatic data movement

    Creation of temp tables in one of the systems to enable complete delegation of a federated branch

    The target source needs to have the data movement option enabled for this option to be taken into account
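    Pruning UNION branches on incompatible filters can be illustrated as follows, assuming each branch of the union declares the range of values it can contain. The branch definitions and the prune_union_branches helper are hypothetical and only show the principle: a branch whose declared range cannot satisfy the query filter is never queried.

```python
# Hypothetical union view: each branch holds the sales of one year
# and may live in a different system.
BRANCHES = [
    {"view": "sales_2014", "source": "archive", "min_year": 2014, "max_year": 2014},
    {"view": "sales_2015", "source": "archive", "min_year": 2015, "max_year": 2015},
    {"view": "sales_current", "source": "edw", "min_year": 2016, "max_year": 2016},
]


def prune_union_branches(branches, year_from, year_to):
    """Keep the branches whose declared range overlaps [year_from, year_to]."""
    return [b["view"] for b in branches
            if b["min_year"] <= year_to and b["max_year"] >= year_from]


# A query filtered on year = 2016 only needs one branch.
print(prune_union_branches(BRANCHES, 2016, 2016))
# prints ['sales_current']
```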

  • Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse

    Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario:

    Customer Dimension: 2 M rows

    Sales Facts: 290 M rows

    Items Dimension: 400 K rows

    The baseline was set using the same queries with all data in a Netezza appliance

    * TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems. http://www.tpc.org/tpcds/

  • Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse

    Query description                                                  Returned rows  Avg. time physical (Netezza)  Avg. time logical (Denodo)  Optimization technique (automatically chosen)
    Total sales by customer                                            1.99 M         21.0 sec                      21.5 sec                    Full aggregation push-down
    Total sales by customer and year between 2000 and 2004             5.51 M         52.3 sec                      59.1 sec                    Full aggregation push-down
    Total sales by item brand                                          31.4 K         4.7 sec                       5.3 sec                     Partial aggregation push-down
    Total sales by item where sale price less than current list price  17.1 K         3.5 sec                       5.2 sec                     On the fly data movement
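    The relative overhead of the logical data warehouse is not stated on the slide, but it follows directly from the two timing columns. The short calculation below derives it from the figures above; the overhead_pct helper is just a name chosen for this sketch.

```python
# (query description, avg. time physical in sec, avg. time logical in sec),
# copied from the comparison table.
RESULTS = [
    ("Total sales by customer", 21.0, 21.5),
    ("Total sales by customer and year between 2000 and 2004", 52.3, 59.1),
    ("Total sales by item brand", 4.7, 5.3),
    ("Total sales by item where sale price less than current list price", 3.5, 5.2),
]


def overhead_pct(physical_sec, logical_sec):
    """Extra time of the logical run, as a percentage of the physical run."""
    return round((logical_sec - physical_sec) / physical_sec * 100, 1)


for description, physical, logical in RESULTS:
    print(f"{overhead_pct(physical, logical):5.1f}%  {description}")
```

    On these figures the overhead ranges from 2.4% (full aggregation push-down) to 48.6% (on the fly data movement), with an absolute difference of at most 6.8 seconds.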

  • Improved Cache Performance

    Incremental Queries

    Merge cached data and changed data to provide fully up-to-date results with minimum latency

    Diagram: the cache holds Leads updated at 1:00 AM; the query gets Leads changed or added since 1:00 AM from the source; merging both yields up-to-date Leads data

    1. Salesforce Leads data cached in VDP at 1:00 AM

    2. Query needing Leads data arrives at 11:00 AM

    3. Only new/changed leads are retrieved through the WAN

    4. Response is up-to-date but query is much faster
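    The four steps above amount to merging a cached snapshot with a delta fetched from the source. A minimal sketch, assuming each lead carries an id and a last-modified timestamp; the records, dates, and function names are hypothetical, and deletions at the source are not handled here.

```python
from datetime import datetime

CACHED_AT = datetime(2016, 3, 30, 1, 0)  # step 1: cache loaded at 1:00 AM

# Hypothetical cache content, keyed by lead id.
CACHE = {
    1: {"id": 1, "status": "New", "modified": datetime(2016, 3, 29, 18, 0)},
    2: {"id": 2, "status": "New", "modified": datetime(2016, 3, 29, 20, 0)},
}

# Hypothetical state of the remote source at 11:00 AM.
SOURCE = [
    {"id": 1, "status": "New", "modified": datetime(2016, 3, 29, 18, 0)},
    {"id": 2, "status": "Qualified", "modified": datetime(2016, 3, 30, 9, 30)},
    {"id": 3, "status": "New", "modified": datetime(2016, 3, 30, 10, 15)},
]


def fetch_changed_since(timestamp):
    """Step 3: only new or changed leads travel over the WAN."""
    return [lead for lead in SOURCE if lead["modified"] > timestamp]


def incremental_query(cache, cached_at, fetch_delta):
    """Step 4: merge the cached rows with the delta from the source."""
    merged = dict(cache)
    for lead in fetch_delta(cached_at):
        merged[lead["id"]] = lead  # changed rows replace, new rows are added
    return merged


leads = incremental_query(CACHE, CACHED_AT, fetch_changed_since)
print(len(fetch_changed_since(CACHED_AT)), "rows fetched,", len(leads), "rows returned")
```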

  • Big Data Connectivity

    Big Data and Cloud Databases Connectivity

    Redshift enhanced adapter as data source, cache and data movement target

    Vertica enhanced as source, cache and data movement target

    Apache Spark enhanced adapter

    Impala enhanced as cache and data movement target

  • Data Virtualization in the Cloud

    Accelerate Adoption of Data Virtualization

    Ready-to-use and available on AWS Marketplace

    Dynamic and elastic infrastructure

    Complete with all enterprise-grade features at the lowest cost

    Zero set-up requirements

    Flexible rent-by-the-hour options

    A wide range of capacity options

    Only data virtualization platform on AWS.

  • Buying a Subscription

    Customer must have an Amazon AWS account

    Choose the configuration required (building block + Amazon VM)

    Building block by sources or number of concurrent queries & results

    Click-through license agreement

    Amazon provides monthly billing based on usage

    Annual subscriptions billed upfront

    Support included in final pricing

  • Self-Service Data Discovery and Search

    Expedite Use of Data by Business Users

    Search: Google-like search for data and metadata

    Discover: Easy-to-use user interface to browse data and metadata as well as data lineage

    Explore: Ability to view the graphical representation of entities and relationships

    Advanced Query Wizard for users to create ad-hoc queries

    Sandbox environment to explore the data before publishing

    Data virtualization solution to search data from sources.

  • Search

    Google-like Search

    Global Search: enter a keyword to find views containing that data

  • Discover

    Data Lineage Views

    Data lineage and tree view information including derived fields transformations

  • Explore

    Graphical representation of views and relationships

  • Create Ad-hoc Queries

    GUI Based Query creation & save as new Denodo view

    Export data via CSV & HTML

  • Managing Very Large Deployments

    New Resource Manager

    Establish limits on resource usage, e.g. estimated memory, estimated cost, number of concurrent queries, maximum execution time, and/or maximum number of rows

    Limits are assigned to users and/or roles

    Limits can be individual or global, e.g.:

    Individual: Each query of a user with role marketing cannot use more than 100 MB

    Global: All concurrent queries from users with role marketing cannot use more than 300 MB

    Possible actions if limits are surpassed:

    Prevent execution

    Allow execution with restricted resources

    Allow execution; cancel if the resource limit is surpassed

    Can be dynamically assigned through custom policies (e.g. assign different plans based on time of day)
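    The difference between individual and global limits can be made concrete with the 100 MB and 300 MB figures from the slide. The sketch below models only the "prevent execution" action; the Plan class and the admit function are hypothetical names, not part of the Denodo Resource Manager.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    role: str
    per_query_mb: int  # individual limit: applies to each query on its own
    global_mb: int     # global limit: applies to all concurrent queries of the role


def admit(plan, estimated_mb, running_mb):
    """Decide whether a new query may start.

    estimated_mb: estimated memory of the new query.
    running_mb: estimated memory of the role's queries already executing.
    """
    if estimated_mb > plan.per_query_mb:
        return "reject: individual limit"
    if sum(running_mb) + estimated_mb > plan.global_mb:
        return "reject: global limit"
    return "run"


marketing = Plan(role="marketing", per_query_mb=100, global_mb=300)

print(admit(marketing, 120, []))             # reject: individual limit
print(admit(marketing, 80, [100, 100]))      # run
print(admit(marketing, 80, [100, 100, 50]))  # reject: global limit
```

    A time-of-day policy, as mentioned on the slide, could be modeled by choosing a different Plan instance before calling admit.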

  • Managing Very Large Deployments

    Enhanced Monitoring and Diagnostic Tool

    Monitor operation of the system, diagnose problems, and analyze usage metrics

    The new tool will also allow after-the-fact diagnosis of problems

    Set the time when the problem occurred and you will see everything that was happening in an integrated, graphical manner, down to the individual query level

  • Unified Security and Governance

    Enforcing Security and Governance Policies

    Kerberos southbound support for databases and Web Services

    Kerberos pass-through support and Kerberos constrained delegation

    API for accessing view dependencies information and data lineage information

  • Agile Development

    New Admin Tool

    Multiple tabs and databases

    Resize and organize all panels and dialogs

    Manages several open tasks at the same time

    VQL highlighting and autocomplete features

    Graphical support for GIT

  • Denodo 6.0 Fast Data Strategy Summit

    March 30 US; March 31 EMEA

    9:00 Welcome: Fast Data Strategy Summit. Angel Vina, CEO, Denodo

    9:30 Analyst Keynote: Accelerating Fast Data Strategy with Data Virtualization. Presenter: Noel Yuhanna, Principal Research Analyst, Forrester Research

    10:00 Customer Case Study: Designing Fast Data Architectures with Data Virtualization and Big Data on Cloud. Presenter: Kurt Jackson, Platform Architect, Autodesk

    10:30 Experts Panel: Core Components of Fast Data Strategy - Big Data and Data Virtualization. Panelists: Noel Yuhanna, Principal Research Analyst, Forrester Research; Mark Eaton, Enterprise Architect, Autodesk; Matt Morgan, Vice President, Product and Partner Marketing, Hortonworks. Moderated by: Ravi Shankar, CMO, Denodo

    11:00 Use cases: Where does Fast Data Strategy fit within IT Projects. Presenter: Ravi Shankar, CMO, Denodo

    12:00 Demo: How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and Operational Scenarios. Presenter: Pablo Alvarez, Principal Technical Account Manager, Denodo

    12:05 Closing: Fast Data Strategy Summit. Angel Vina, CEO, Denodo

  • Denodo 6.0 Fast Data Strategy Summit

    March 30 US; March 31 EMEA

    Tracks

    Case Studies

    Intro to Data Virtualization

    Technical Deep-Dive

    Customer Case Study: SQLization of Hadoop - Increasing Business Adoption of Big Data

    Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient

    Intro: Getting Started with Data Virtualization - What problems DV solves

    Richard Walker, VP, Sales, Denodo

    Data Science: Expediting Use of Data by Business Users with Self-service Discovery and Search

    Mark Pritchard, Director, Sales Engineering

    Customer Case Study: Data Services - Rapid Application Development using Data Virtualization

    Jay Heydt, Manager, Database Technologies, DrillingInfo

    Demo: Getting Started with Data Virtualization - What problems DV solves

    Pablo Alvarez, Principal Technical Account Manager, Denodo

    Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses

    Alberto Bengoa, Sr. Product Manager, Denodo

    Customer Case Study: Data Virtualization in the Cloud

    Avinash Desphande, Big Data and Advanced Analytics, Logitech

    Enabling Fast Data Strategy: What's new in Denodo Platform 6.0

    Alberto Pan, CTO, Denodo

    Data Virtualization Deployments: How to Manage Very Large Deployments

    Juan Lozano, Sales Engineering Manager, Denodo

    Customer Case Study: TBD

    TBD

    Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption

    Paul Moxon, Sr. Director, Strategic Technology Office, Denodo

    Big Data: Architecture and Performance Considerations in Logical Data Lakes

    Alberto Pan, CTO, Denodo

    Customer Case Study: TBD

    TBD

    Data Virtualization Maturity: Enterprise Features in Denodo Platform 6.0

    Suresh Chandrasekaran, Sr. Vice President, Denodo

    Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB

    Alberto Bengoa, Sr. Product Manager, Denodo

    Customer Case Study: TBD

    TBD

    Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence Brain Trust

    Claudia Imhoff, CEO, Intelligent Solutions

    Partner Enablement: Architecting and Deploying Data Virtualization

  • Thanks!

    www.denodo.com [email protected]

    Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
