Red Hat JBoss Data Virtualization, Bill Kemp, Sr. Solutions Architect


Upload: brian-york

Post on 08-Jan-2018


DESCRIPTION

Red Hat is… Today the collaboration between Red Hat and SAP continues. Engineers from both companies are working towards a common target — enhancing the interoperability of JBoss Enterprise middleware with the existing SAP landscape. Specifically, Red Hat and SAP are collaborating on development efforts for tools that are designed to simplify the integration of SAP data and business processes with other enterprise data and applications. The aim of such integration, of course, is a more intelligent enterprise — one that can maximize the value of your data assets in accelerating business decisions. “By running tests and executing numerous examples for specific teams, we were able to prove […] not only would the solution work, but it will perform better & at a fraction of the costs.” MICHAEL BLAKE, Director, Systems & Architecture

TRANSCRIPT

RED HAT JBOSS DATA VIRTUALIZATION Bill Kemp Sr. Solutions Architect
August 28, 2014

Red Hat is… Today the collaboration between Red Hat and SAP continues. Engineers from both companies are working towards a common target: enhancing the interoperability of JBoss Enterprise middleware with the existing SAP landscape. Specifically, Red Hat and SAP are collaborating on development efforts for tools that are designed to simplify the integration of SAP data and business processes with other enterprise data and applications. The aim of such integration, of course, is a more intelligent enterprise, one that can maximize the value of your data assets in accelerating business decisions.

"By running tests and executing numerous examples for specific teams, we were able to prove […] not only would the solution work, but it will perform better & at a fraction of the costs." MICHAEL BLAKE, Director, Systems & Architecture

Innovate faster, in a smarter way
A family of lightweight, enterprise-grade products that are ideal for open hybrid cloud environments.

Red Hat JBoss Middleware
  ACCELERATE, INTEGRATE, AUTOMATE
  User Interaction: JBoss Portal
  Business Process Management: JBoss BRMS, JBoss BPM Suite
  Application Integration: JBoss A-MQ, JBoss Fuse, JBoss Fuse Service Works
  Data Integration: JBoss Data Virtualization
  Development Tools: JBoss Developer Studio
  Management Tools: JBoss Operations Network
  Foundation: JBoss EAP, JBoss Web Server, JBoss Data Grid

Agenda
  Business Problem
  Product Overview
  Customer Stories
  Competition
  Prospecting Guidance
  Pricing & Promotions

Business Challenges

Data Driven Economy
  Data is becoming the new raw material of business: an economic input almost on a par with capital and labor.
  "Every day I wake up and ask, how can I flow data better, manage data better, analyze data better?" (CIO - Wal-Mart)

Data Challenges Getting Bigger: Big Data, Cloud, and Mobile
Existing data integration approaches are not sufficient:
  Extracting and moving data adds latency and cost
  Every project solves data access and integration in a different way
  Solutions are tightly coupled to data sources
  Poor flexibility and agility

Constant Change: BI Reports, Operational Reports, Enterprise Applications, SOA Applications, Mobile Applications
Integration Complexity: How to align?
Siloed & Complex: Hadoop, NoSQL, Cloud Apps, Data Warehouse & Databases, Mainframe, XML, CSV & Excel Files, Enterprise Apps

Business Objective: Turn Data into Actionable Information
Only 28% of users have any meaningful data access.
Over 70% of BI project effort lies in the integration of source data.
  Reduce costs for finding and accessing highly fragmented data
  Improve time to market for new products and services by simplifying data access and integration
  Deliver the IT solution agility necessary to capitalize on constantly changing market conditions
  Transform fragmented data into actionable information that delivers competitive advantage

Technology Overview

What does Data Virtualization software do?
Turn Fragmented Data into Actionable Information

Data Virtualization software virtually unifies data spread across various disparate sources and makes it available to applications as a single consolidated data source.

The data virtualization software implements a 3-step process to bridge data sources and data consumers:
  Connect: Fast access to data from diverse data sources.
  Compose: Easily create unified virtual data models and views by combining and transforming data from multiple sources.
  Consume: Expose consistent information to data consumers in the right form through standard data access methods.

Easy, Real-time Information Access
  Data consumers: BI Reports, SOA Applications
  Data Virtualization Software: Consume, Compose, Connect
  Virtual Consolidated Data Source: Virtualize, Abstract, Federate
  Easy data accessibility through standard interfaces, e.g. SQL, Web Services
  Exposes non-relational sources as relational
  Read and write data in place
  Real-time access
  No data replication/duplication required

So let's define the attributes of a Data Virtualization solution. The first thing that a data virtualization product does is virtualize the data, regardless of where it is. It makes the data look as if it were in one place, so applications don't need to know where the data is, because the data virtualization software does that for you. The second thing that data virtualization does is federate the data. You're running a query which spans multiple databases or data warehouses, and you want that query to run efficiently and with optimum performance. In order to do that, you need a variety of techniques, like caching and pushdown optimization, and you need knowledge of the source databases to make this whole environment run as smoothly and efficiently as possible. Thirdly, it abstracts the data into the format of choice.
It conforms the data so that it's in a consistent format, regardless of the native structure or syntax of the data. One point I should make here is that you don't want a tool which will force you to have a particular format. What you want is a format that suits your business, rather than one which is imposed on you. So the data virtualization tool itself needs to be agile and flexible, in the sense of being able to provide a data format that suits you. And then the fourth thing you have a requirement for is to present the data in a consistent fashion. It doesn't matter whether it's a business intelligence application, a mash-up, or a regular application; whatever it is, you want to be able to present the data in a consistent format to the business and to participating applications.

Imagine if all the up-to-date data you need to take informed action were available to you on demand as one unified source. This is the capability provided by Data Virtualization software.

Data sources (Siloed & Complex): Oracle, DW, SAP, XML, CSV & Excel files, Salesforce.com

Turn Siloed Data into Actionable Information
Data consumers: Mobile Applications; BI Reports & Analytics; ESB, ETL; SOA Applications & Portals

Easy, Real-time Information Access
  Consume: Standards-based data provisioning (JDBC, ODBC, SOAP, REST, OData)
  Compose: Unified Virtual Database / Common Data Model (Virtualize, Transform, Federate)
  Connect: Native Data Connectivity
  Also shown in the diagram: Design Tools, JBoss Dashboard, Optimization, Data Transformations, Caching, Security, Metadata

The data virtualization software provides a 3-step process to connect data sources and data consumers:
  Connect: Fast access to data from disparate systems (databases, files, services, applications, etc.) with disparate access methods and storage models.
  Compose: Easily create a reusable, unified common data model and virtual data views by combining and transforming data from multiple sources.
  Consume: Seamlessly expose the unified, virtual data model and views in real time through a variety of open-standards data access methods to support different tools and applications.

JBoss Data Virtualization software implements all three steps internally while hiding the complexity of data access methods, transformations, and data merge logic from information consumers. This enables the organization to acquire actionable, unified information when they want it and the way they want it, i.e. at business speed.

Data sources (Siloed & Complex): Data Warehouse & Databases, XML, CSV & Excel Files, Enterprise Apps, Hadoop, NoSQL, Cloud Apps, Mainframe

Consider... How would your organization change
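The Connect, Compose, Consume steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for separate sources, and the schema names (`crm`, `erp`), tables, sample rows, and the helpers `build_virtual_layer` and `customer_totals` are hypothetical. It is not how JBoss Data Virtualization is implemented.

```python
import sqlite3


def build_virtual_layer():
    """Return a connection exposing one virtual view over two stand-in sources."""
    conn = sqlite3.connect(":memory:")

    # Connect: attach two independent in-memory databases as separate "sources".
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])

    # Compose: a TEMP view may reference tables in several attached databases,
    # so it can act as a unified virtual table. No rows are copied.
    conn.execute("""
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM crm.customers AS c
        JOIN erp.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return conn


def customer_totals(conn):
    # Consume: the client queries the view and never names the sources.
    return conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(customer_totals(build_virtual_layer()))
```

The consumer only sees `customer_totals`; moving `orders` to a different source would change the view definition, not the client query.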
Inconsistent, Incomplete Information
Uninformed, Delayed Decisions
Costly Business Risk and Exposure

How would your organization change:
  If data were readily reusable in place rather than requiring significant effort to build new intermediary data tiers?
  If data could be repurposed quickly into new applications and business processes?
  If all applications and business processes could get all of the information needed in the form needed, where needed and when needed?

JBoss Data Virtualization Use Cases
Self-Service Business Intelligence
  The virtual, reusable data model provides a business-friendly representation of data. Users can interact with their data without having to know the complexities of their database or where the data is stored, and multiple BI tools can acquire data from a centralized data layer. Gain better insights from Big Data by using JBoss Data Virtualization to integrate with existing information sources.

360° Unified View
  Deliver a complete view of master and transactional data in real time. The virtual data layer serves as a unified, enterprise-wide view of business information that improves users' ability to understand and leverage enterprise data.

Agile SOA Data Services
  A data virtualization layer delivers the missing data services layer to SOA applications. JBoss Data Virtualization increases agility and loose coupling through virtual data stores, without the need to touch underlying sources, and through data services that encapsulate the data access logic and allow multiple business services to acquire data from a centralized data layer.

Regulatory Compliance
  The data virtualization layer delivers data firewall functionality. JBoss Data Virtualization improves data quality via centralized access control, a robust security infrastructure, and a reduction in physical copies of data, thus reducing risk. Furthermore, the metadata repository catalogs enterprise data locations and the relationships between the data in various data stores, enabling transparency and visibility.
Enable Self-Service Business Intelligence
Shared, Reusable Logic = Lighter, Faster Client Development

BI tool centric (Microsoft, Cognos): non-sharable and duplicated in each tool
  Presentation Logic, KPI Calculations, Semantic Data Model, Data Security Policy, Data Transformation Logic, Data Integration Logic, Data Access Logic

With JBoss Data Virtualization: each BI tool keeps only its Presentation Logic; shared and reusable in the virtualization layer
  KPI Calculations, Semantic Data Model, Data Security Policy, Data Transformation Logic, Data Integration Logic, Data Access Logic

Data sources in both cases: Database, Data Warehouse, ERP App, Cloud App

360° Unified View
Complete View of Master and Transactional Data in Real-time
Data consumers: BI Reports, CRM Apps, Portal
JBoss Data Virtualization (shared & reusable): Unified Customer View, Unified Product View, Unified xBusiness View
Master Data Management Hub: Data Repository, Workflow
Operational Data Sources: Enterprise Apps, DB

Agile SOA Data Services
Shared, Reusable Logic = Lighter, Faster Service Development
Web service centric: non-sharable and duplicated in each web service
  Business Logic, Semantic Data Model, Data Security Policy, Data Transformation Logic, Data Integration Logic, Data Access Logic

With JBoss Data Virtualization: each web service keeps only its Business Logic; shared and reusable in the virtualization layer
  Semantic Data Model, Data Security Policy, Data Transformation Logic, Data Integration Logic, Data Access Logic

Data sources in both cases: Database, Data Warehouse, ERP App, Cloud App

JBoss Data Virtualization Key Business Values
Increase ROA
  Improved utilization of data assets
  Derive more value from existing investments
  Complements existing systems
Boost Agility
  Better/faster than hand coding
  Faster, less costly than batch data movement
  Data virtualization provides loose coupling
Improve Productivity
  Right data at the right time to the right people
  Decision support, BI with a complete view of information
Better Information Control
  Powerful security, Auditing, Data Firewall
  Avoid data silo proliferation
  Central data access and policy, Compliance

JBoss Data Virtualization Key Differentiators
Lowest TCO
  Cost leadership lowers the adoption barrier
  Core-based subscription provides flexibility across small to large deployments
Openness
  Open, community-based innovation
  No vendor lock-in
Cloud Ready
  Private, public and hybrid cloud deployments
Comprehensive
  Integrated with the JBoss Middleware portfolio for an end-to-end business solution
  Single-vendor support simplifies IT operations
Performance
  Fast query processing optimizations, low footprint
  Comprehensive data provisioning options
  Quick data visualization through the business dashboard

Customer Success

Self-Service BI and Hybrid data integration use case
Global Biotech Company
Self-Service Data for Self-Service Business Intelligence

Situation/Needs:
  Needed to integrate cloud application data (salesforce.com) with on-premise, real-time data (role mgmt, territory mgmt and authentication systems) for operational reporting and monitoring
  Need to ensure HIPAA compliance
  Need to support multiple BI tools
Solution:
  Used Data Virtualization to provide a unified interface to data for multiple BI tools
  Virtual views isolate BI applications from changes in the source data systems
  Single point of data access ensured security policy enforcement and HIPAA compliance
Benefits:
  Enabled business users to use the BI tools of choice while IT ensured better control of information
  Rapid development cycle through the use of common data models
  Sensitive data is protected to meet strict compliance requirements

Diagram: Portal, Spotfire, Business Objects, Crystal Reports > JBoss Data Virtualization (Consume, Compose, Connect) > Web Service, JNDI, JDBC > Role Membership, LDAP Server, Cloud CRM, Navigator Security

Regional Bank Single View of Loans Processing
Unified 360° view use case

Situation/Needs:
  Thousands of loans in process
  Management seeks visibility and control, while loan operations needs to speed up funding steps
  Loan data spread across many databases/systems
Solution:
  Consolidate all data into a virtual data mart
  Transformation of data differences
  Provide real-time data access to the management portal and loan workflow system
Benefits:
  Management gets timely information on funding needs, exposure and operating metrics
  Loan officers receive all the information needed to process the loan faster
  Sensitive data is protected

Diagram: Management Reporting, Loan Processing Workflow Mgmt. > Web Services > JBoss Data Virtualization (Consume, Compose, Connect) > Web Services > Loan Origination & Approvals, Risk Analysis, Loan Funding

Large US Bank VISA Data Security & Governance
Data firewall use case

Situation/Needs:
  VISA PCI mandates protection of cardholder info
  Can't maintain a common security policy across multiple data stores
Solution:
  Create a data firewall across many data sources
  Federate rather than replicate
  Common access policy across all sources
  Common data definitions
  Audit trail
Benefits:
  One set of data security policies
  Can prove to regulators that data is protected

Diagram: Web Portal > JBoss Data Virtualization (Consume, Compose, Connect) > Data Sources

Multinational Insurance Company SOA Data Services Layer
Agile SOA Data Services use case

Situation/Needs:
  Deploying an SOA reference architecture
  Want a common data model across all sources
  Don't want tightly bound physical data sources
  Change data sources without breaking apps/services
Solution:
  All data is accessed via data services
  Data Virtualization provides abstraction and a logical data model for the enterprise
  Expose data as Web services and SQL
Benefits:
  All applications get the same data through use of a common model
  Easier to expose data to new applications
  Easier to make changes to data sources

Diagram: SOA Applications > SOA/ESB > JBoss Data Virtualization (Consume, Compose, Connect) > Data Sources

Gain Better Insight from Big Data
Intelligent Inventory Management
Big Data integration use case

Objective: Right merchandise, at the right time and price
Problem: Cannot utilize social data and sentiment analysis with their inventory and purchase management system
Solution:
  Leverage JBoss Data Virtualization to mash up sentiment analysis data with inventory and purchasing system data
  Leverage BRMS to optimize pricing and stocking decisions

Diagram: Analytical Apps > JBoss BRMS (Data Driven Decision Management) > JBoss Data Virtualization (Consume, Compose, Connect) > Hive (Sentiment Analysis), Purchase Mgmt Application, Inventory Databases

Better Together - Big Data and Data Virtualization
Big Data is not another Silo - Customers Combine Multiple Technologies
  Combine structured and unstructured analysis: augment the data warehouse with additional external sources, such as social media
  Combine high velocity and historical analysis: analyze and react to data in motion; adjust models with deep historical analysis
  Reuse structured data for analysis: experimentation and ad-hoc analysis with structured data

Better Together - Big Data and Data Virtualization
Capture, Process and Integrate Data Volume, Velocity, Variety
  Consumers: BI Analytics (historical, operational, predictive), SOA Composite Applications
  Integrate & Analyze: Data Integration (JBoss Data Virtualization), In-memory Cache (JBoss Data Grid)
  Capture & Process: Messaging and Event Processing (JBoss A-MQ and JBoss BRMS), Hadoop
  Infrastructure: Red Hat Enterprise Linux & Virtualization, Red Hat Storage
  Inputs: Structured Data, Streaming Data, Semi-Structured Data

Product Details

JBoss Data Virtualization: Supported Data Sources
  Enterprise RDBMS: Oracle, IBM DB2, Microsoft SQL Server, Sybase ASE, MySQL, PostgreSQL, Ingres
  Enterprise EDW: Teradata, Netezza, Greenplum
  Hadoop: Apache, HortonWorks, Cloudera (more coming)
  Office Productivity: Microsoft Excel, Microsoft Access, Google Spreadsheets
  Specialty Data Sources: ModeShape Repository, Mondrian, MetaMatrix, LDAP
  NoSQL: JBoss Data Grid, MongoDB
  Enterprise & Cloud Applications: Salesforce.com, SAP
  Technology Connectors: Flat Files, XML Files, XML over HTTP, SOAP Web Services, REST Web Services, OData Services

Key New Features and Capabilities
Data connectivity enhancements
  Hadoop integration (Hive Big Data), NoSQL (MongoDB Tech Preview) and JBoss Data Grid
  OData support (SAP integration)
Developer productivity improvements
  New VDB Designer 8 and integration with JBoss Developer Studio v7
  Enhanced column-level security, VDB import/reuse, and native queries
Simplified deployment and packaging
  Requires JBoss EAP only; included with subscription
  Removes the dependency on SOA Platform
Business Dashboard
  New rapid data reporting/visualization capability

Business Dashboard: Quickly Visualize your Data

OData Support
OData (OASIS Open Data Protocol)
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=odata

Objective: The OASIS OData TC works to simplify the querying and sharing of data across disparate applications and multiple stakeholders for re-use in the enterprise, Cloud, and mobile devices. A REST-based protocol, OData builds on HTTP, AtomPub, and JSON using URIs to address and access data feed resources. It enables information to be accessed from a variety of sources including (but not limited to) relational databases, file systems, content management systems, and traditional Web sites.

Data Services v6 supports OData in two ways:
  Connect to and access OData sources
  Act as an OData server to client applications

Data Virtualization Designer
Model Driven Development
Eclipse-based graphical tool for modeling, analyzing, integrating, resolving semantic differences and testing multiple data sources to produce Relational, XML and Web Service Views that expose your business data without any programming.
  Shows structural transformations and dependencies
  Defines transformations
Metadata Repository & Governance

S-RAMP: SOA Repository Artifact Model & Protocol
An OASIS specification that defines:
  a common data model for repositories
  an interaction protocol to facilitate the use of common tooling and sharing of data
S-RAMP repository capabilities:
  Store and retrieve content and metadata
  Classification of artifacts (e.g. XSD, WSDL, VDB, ...)
  Clients interact via ATOM/REST
  XPath2-based query language
  Integration with Maven

Diagram: ATOM Binding (REST) > Core Model, Documents, Derived Models (Read Only) > JCR Storage (Modeshape + Infinispan)

Semantic Mediation & Integration
Consumers: Business Intelligence Applications, Search Applications, Web Services
Application views of information: Relational, XML, XML Document (e.g. Claims, Billing, Policies)
Semantic Data Services
  Data Dictionary: based on a logical data model or XML schema
  Support for multiple COIs
  Support for multiple versions
  Example mapped fields: bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, Location_Type
Authoritative Sources: mapped to the logical view
Multiple Internal/External Information Sources

JBoss Data Virtualization Logical Architecture
Diagram: Data Consumers, Data Sources

JBoss Data Virtualization System Flow
Flow: Tooling > VirtualDB > Engine > Server

Tooling: users create data models based on metadata that is:
  Imported from data sources
  Supplied via DDL
  Provided by the Engine
  Specified by the user
Models are packaged in a Virtual Database (VDB), together with Connector Binding Properties.
Flow: Tooling > VirtualDB > Engine > Server

VDB Internals
Virtual Databases (VDBs) are deployment archives similar to .WAR. VDBs contain:
  Source metadata and models
  View metadata and models
  System metadata
  Connection information, which is bound to sources at deployment time
VDBs are deployed to the query engine.
Diagram: Source Models, View Models, Manifesto Info, Connector Binding Properties

Flow: Tooling > VirtualDB > Engine > Server

Query Engine
The Query Engine is the core data virtualization functionality:
  Federating relational query engine
  Rule and cost based optimizer, advanced query planner, caching, hint processing
The Query Engine hosts VDBs, binds to data sources, and performs query execution and results processing.
Diagram: Data Consumer Apps > JDBC API > Query Engine (VDB) > Connector Binding (1) > Oracle DB; Connector Binding (2) > SQL Server DB; Admin Socket Transport
Flow: Tooling > VirtualDB > Engine > Server

Diagram: JBoss EAP hosting Applications, Security (JAAS), TransactionManager, and the JDV Runtime Engine (BufferMgr, Threading, Local Caches, etc.) with deployed VDBs; ODBC, JDBC and Admin Socket Transports; Profile Service; clients over ODBC, JDBC, Admin / AdminShell, RHQ; data sources via DS, JCA, Translators, Embedded DS (xxx-ds.xml, yyy-ds.xml, zzz-ds.xml)

The server runtime environment is JBoss EAP. The Teiid query engine is hosted in JBoss EAP and uses key container-provided services:
  Transaction manager
  JAAS security framework
  Container-managed data sources
  EAP management infrastructure
  EAP deployment
The Server exposes views/services to consumers and manages connections and connection pools for data sources.

Rich Security Capabilities
Multiple forms of Authentication:
  Client Authentication: LoginModules (File, LDAP); Kerberos (JDBC/ODBC); HTTP Basic, WS UsernameToken Profile (Web Services)
  PassThrough Authentication
  Source Authentication: Source credentials, Caller Identity (same credentials as client), RoleBasedCredentialMap (credentials per role), Execution payload/Custom
Authorization:
  Create, Read, Update, Delete, Execute permissions
  Row-based security
  Column masking
Additional Security:
  Transport encryption (SSL: Anon, 1-way, 2-way)
  Password encryption

Transactions Support
All scopes are handled by JBoss Transactions JTA
Three scopes:
  Global (through XAResource)
  Local (autocommit = false)
  Command (autocommit = true)
Command scope behavior is handled through txnAutoWrap={ON|OFF|DETECT}
Isolation level is set on a per-connector basis.

Customization & Extensibility
Many forms of customization available:
  Extended connectors/translators
  New connectors/translators
  User-defined functions
  Custom logging
  Administrative API
  XML-based virtual database, DDL support
  Custom metadata injection
  Embeddable engine

Performance Optimization: Load Handling
  Memory usage: the BufferManager acts as a memory manager for batches (with passivation) to ensure that memory will not be exhausted.
  Non-blocking source queries: rather than waiting for source query results, processor threads detach from the plan and pick up a plan that has work.
  Time slicing: plans produce batches for a time slice before re-queuing and allowing their thread to do other work (preemptive control only between batches).
  Caching: ResultSets, processing plans, internal materialized views, etc.

Performance Optimization: Caching & Materialized View
Diagram: an incoming query against the Virtual Database on the JBoss Data Virtualization Server first checks the result set cache (Cached? Yes: return results. No: run the query, then Save?). A virtual table is resolved either from its source tables (Oracle, SQL Server, files such as XML and text) or from a materialized table (Materialization Support).

Multiple levels of caching to meet performance requirements and manage load on source systems:
  Materialized Views
    External or internal materialized views
    Ability to override use of materialized views
  Result Set Caching
    Applied to results returned from user queries and virtual procedure calls
    Configurable time to live and maximum number of entries
  Code Table Caching
    Suited for integrating reference data with transaction/operational data, e.g. country code, state code
  Caching hints to set time-to-live, memory preference, and updatability

Performance Optimization: Query
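The result set caching behaviour listed on the caching slide above (a configurable time to live and a maximum number of entries) can be sketched as follows. This is an assumption-laden illustration, not the product's cache: the class `ResultSetCache`, its parameters, and the least-recently-used eviction policy are all hypothetical choices made for the example.

```python
import time
from collections import OrderedDict


class ResultSetCache:
    """Toy result set cache keyed by query text (hypothetical, for illustration)."""

    def __init__(self, ttl_seconds, max_entries, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()    # query -> (expires_at, rows)

    def get_or_load(self, query, loader):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None:
            expires_at, rows = entry
            if now < expires_at:
                # Cache hit: mark as most recently used and skip the source.
                self._entries.move_to_end(query)
                return rows
            # Entry is past its time to live; drop it and reload.
            del self._entries[query]

        rows = loader(query)             # cache miss: go to the source system
        self._entries[query] = (now + self._ttl, rows)
        while len(self._entries) > self._max:
            # Evict the least recently used entry (an assumed policy).
            self._entries.popitem(last=False)
        return rows
```

A caller would pass a `loader` that actually runs the query against the source; repeated calls within the time to live then place no load on that source.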
  Access patterns: criteria requirements on pushdown queries
  Pushdown: decompose the user query into source queries
    Projection minimization to remove unused select items
    Decompose aggregates over joins/unions
    Generating SQL matching Teiid system functions
  Dependent joins (can use hints): feed equi-join values from one side of the join to the other
  Partition-aware aggregation and joins
  Optional join (can use hints): removes an unused join child
  Multi-source models: allow multiple homogeneous schemas to be used through the same model
  Copy criteria: uses criteria transitivity to minimize join tuples

Performance Optimization: Query Planning
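The dependent join technique named on the query optimization slide above (feeding equi-join values from one side of the join to the other) can be sketched like this. It is a simplified illustration under stated assumptions: two separate in-memory SQLite connections stand in for two sources, and the table names, sample rows, and the helpers `make_source` and `dependent_join` are hypothetical. Teiid's actual planner is considerably more involved.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one independent in-memory database acting as a data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def dependent_join(dept_conn, emp_conn, dept_name):
    # Step 1: run the selective (independent) side first.
    depts = dept_conn.execute(
        "SELECT dept_id, dept_name FROM departments WHERE dept_name = ?",
        (dept_name,),
    ).fetchall()
    keys = sorted({dept_id for dept_id, _ in depts})
    if not keys:
        return []

    # Step 2: feed the join keys to the other source as an IN criterion,
    # so only matching rows leave that source instead of the whole table.
    placeholders = ", ".join("?" for _ in keys)
    emps = emp_conn.execute(
        f"SELECT lastname, dept_id FROM employees "
        f"WHERE dept_id IN ({placeholders}) ORDER BY lastname",
        keys,
    ).fetchall()

    # Step 3: finish the join in the engine on the reduced row sets.
    names = dict(depts)
    return [(lastname, names[dept_id]) for lastname, dept_id in emps]


if __name__ == "__main__":
    departments = make_source(
        "CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)",
        "INSERT INTO departments VALUES (?, ?)",
        [(10, "Engineering"), (20, "Sales")],
    )
    employees = make_source(
        "CREATE TABLE employees (lastname TEXT, dept_id INTEGER)",
        "INSERT INTO employees VALUES (?, ?)",
        [("Ng", 10), ("Diaz", 10), ("Okafor", 20)],
    )
    print(dependent_join(departments, employees, "Engineering"))
```

The benefit grows with the size of the dependent side: the employees source returns only rows for the requested departments rather than shipping every row to the engine.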
  Distinct phases: parse, resolve, validation, rewrite, optimization, process plan creation.
  Rewrite canonicalizes and simplifies.
  The optimization phase follows with rules/hints/costing.
  Non-federated optimization is similar to a mature RDBMS.
  The optimizer plan structure is a flexible tree, distinct from the command form and processing plans.
  Planning is typically quick and deterministic; prepared plans are recommended.

Thank You
Q&A

Additional Position Slides

Integration Technologies

Integration Technologies: When to use What?
[Chart: Data Virtualization, Service Oriented (ESB), and Extract, Transform, Load (ETL) positioned by Responsiveness (Real Time to Batch) and Data Integration Style (Data to Process)]

SOA-Centric Integration
Data Virtualization Complements SOA-Centric Integration (ESB)

Our key message is that SOA-centric approaches to implementing data integration/synchronization require large amounts of service/workflow development and result in solutions with lots of moving parts, which can benefit from a model-based data virtualization technology that requires no data integration coding.

SOA-Centric Integration:
  Multi-step process or workflow development using graphical tooling
  Data is treated as a special type of step that typically contains a SQL statement to execute against a source
  Resulting approach is static and cannot be queried
  Relational or XML data only
Data Virtualization:
  Real-time transactional access to data across multiple heterogeneous data sources for operational data needs
  Specialized, graphical tooling for easy mapping between different models of data
  On-demand, queryable access and update of real-time, up-to-date data
  Any data source

Data Virtualization Complements Extract, Transform, Load (ETL)
Our key message is that most operational data consumption problems cannot be solved with a data warehouse but instead require specific tooling and technology focusing on model-based data consumption, integration, and exchange.

ETL:
  Bulk/batch data operations for data consolidation, reporting and analysis
  Involves periodically moving/copying/consolidating large amounts of data
  No on-demand access to real-time data
  Limited data sources only (relational, structured files)
Data Virtualization:
  Real-time bi-directional access to data across multiple heterogeneous data sources for operational and analytical data needs
  No moving or copying of data required; finer-grained operational data sets
  On-demand access and update of real-time, up-to-date data
  Any data source

Additional Position Slides

Top 10 Ways Data Virtualization enables Agile business intelligence development

#1 Data Flattening - Simplified Tables
The table structures implemented in a data store might be complex for data consumers to access. This leads to complex queries for retrieving data, which complicates application development. Data virtualization can present a simpler and more appropriate table structure, simplifying application development and maintenance. Every data consumer can benefit from those simplified table structures.

#2 Tools Agnostic Common Data Model
Diagram: Data Consumers (Jaspersoft, Cognos, Business Objects, Microsoft) > Reusable, Common, Semantic Data Model (JBoss Data Virtualization Virtual DB) > Data Sources

Data Virtualization provides a unified semantic layer. What that means is that it doesn't matter what BI tool you're using. The fact is that most large organizations have multiple BI tools. In theory, it might be a good idea if they standardized on a single one, but in practice that's probably not going to happen. What the data virtualization layer allows you to do is to have a single interface which supports all of those BI tools. You shouldn't have to change the way that you have a query running in Cognos or Business Objects, or whatever tool you happen to use. You should be able to run it exactly as it runs now, hit the data virtualization layer, and that will provide the data for you.

#3 Centralized Data Transformation
Diagram: Data Consumers (Report 1, Report 2, Report 3, Report 4) > JBoss Data Virtualization (format consistency across source formats such as 123/456/7890 and 123,456,7890) > Data Sources

Particular data values in a data store might have formats that aren't suitable for some data consumers. Imagine that most data consumers want to process telephone number values as pure digits and not in the form in which the area code is separated from the subscriber number by a dash. A data virtualization server could implement this transformation once, and all the data consumers would use it.

#4 Centralized Business KPIs & Metrics Calculations
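The telephone number transformation described on the Centralized Data Transformation slide above is small enough to sketch directly. The function names `normalize_phone` and `phone_view`, and the sample rows, are hypothetical; in a data virtualization server this logic would live in a view definition rather than in Python.

```python
import re


def normalize_phone(raw):
    """Reduce a phone number in any source format to pure digits."""
    return re.sub(r"\D", "", raw)


def phone_view(rows):
    """Apply the shared transformation to (name, phone) rows from any source."""
    return [(name, normalize_phone(phone)) for name, phone in rows]


if __name__ == "__main__":
    source_rows = [
        ("Acme", "123-456-7890"),
        ("Globex", "(123) 456-7890"),
        ("Initech", "123/456/7890"),
    ]
    for name, phone in phone_view(source_rows):
        print(name, phone)
```

Because every consumer reads through the same transformation, a change to the rule (for example, adding a country code) is made in one place.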
Diagram: Data Consumers (BI App 1, BI App 2, BI App 3, BI App 4) > JBoss Data Virtualization (Net Profit, Operating Margin, Net Sales) > Data Sources

Similarly, if multiple data consumers have to access multiple data stores, each and every data consumer has to include code that is responsible for calculating business metrics, and each may apply different calculation rules to data from those data stores. The consequence is a lot of variation in business metric formulas and calculation methods. A data virtualization server centralizes the key business metric calculation code, and all data consumers share that code.

#5 Centralize Data Integration
Data Consumers: BI App 1, BI App 2, BI App 3, BI App 4
Virtual Customer Master, Virtual Master Data, Virtual Product Master
JBoss Data Virtualization
If multiple data consumers have to access multiple data stores, each and every data consumer has to include code that is responsible for integrating those data stores. The consequence is a lot of replication of data integration solutions. A data virtualization server centralizes integration code and all data consumers will share that integration code.
Data Sources
#6 Ubiquitous Data Consumption
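Returning to the integration point under #5: a "virtual customer master" can be sketched as one view defined over two separate stores. The example below uses Python's `sqlite3` with two attached in-memory databases standing in for independent data sources. The database names `crm` and `billing`, the tables, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate stores, standing in for independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.accounts (customer_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.accounts VALUES (?, ?)",
                 [(1, 250.0), (2, 0.0), (1, 100.0)])

# The integration logic lives in one place: a view spanning both stores.
# (SQLite only lets TEMP views reference tables in attached databases.)
conn.execute("""
CREATE TEMP VIEW customer_master AS
SELECT c.customer_id, c.name, SUM(a.balance) AS total_balance
FROM crm.customers AS c
JOIN billing.accounts AS a ON a.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
""")

def customer_master():
    """What any consumer would run; no join logic on the consumer side."""
    return conn.execute(
        "SELECT customer_id, name, total_balance "
        "FROM customer_master ORDER BY customer_id"
    ).fetchall()
```

Consumers call `customer_master()` (or query the view directly) and never need to know that customers and balances come from different stores.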
Data Consumers: BI App 1, BI App 2, BI App 3, BI App 4
JDBC, ODBC, SOAP, REST, XML, JMS, POJO, Hibernate
JBoss Data Virtualization
Standards-based provisioning
Different data stores might use different storage formats. For example, some of the data might be stored in a SQL database, some in Excel spreadsheets, some in indexed sequential files, some in databases that support database languages other than SQL, some in XML documents, and some might even be hidden in HTML-based web pages. A data virtualization server can offer one unified API and database language for accessing all these different storage formats, which simplifies data access for the data consumers. They only have to support one language and one API.
Data Sources
#7 Optimized Data Access
Federating relational query engine. Rule- and cost-based optimizer, advanced query planner. Multi-level caching. Pushdown queries.
When data from multiple data stores is joined, a performance question is where and how the join is processed. Is all the data first shipped to the data consumer, which then processes the join? Or is the data from one data store shipped to another, which then processes the join? Other processing strategies exist as well. A developer should not have to be concerned with this issue, so the data virtualization server takes over the task.
#8 No Data Latency
Virtual Table
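Returning to the join-placement question under #7: the difference between strategies can be made concrete by counting how many rows each data store has to ship. The sketch below is a toy model in Python, not the product's optimizer. The `Source` class, the strategy functions, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Source:
    """A stand-in for a remote data store that counts the rows it ships."""
    rows: list
    shipped: int = 0

    def scan(self, predicate=None):
        # A predicate passed here models a filter pushed down to the store.
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.shipped += len(out)
        return out

def make_sources():
    departments = Source([
        {"dept_id": 1, "dept_name": "Engineering"},
        {"dept_id": 2, "dept_name": "Sales"},
        {"dept_id": 3, "dept_name": "Finance"},
    ])
    employees = Source([
        {"lastname": "Lee", "dept_id": 1},
        {"lastname": "Okoro", "dept_id": 1},
        {"lastname": "Khan", "dept_id": 2},
        {"lastname": "Diaz", "dept_id": 2},
        {"lastname": "Novak", "dept_id": 3},
        {"lastname": "Silva", "dept_id": 3},
    ])
    return employees, departments

def join_at_consumer(employees, departments):
    """Ship every row from both stores, then filter and join locally."""
    e_rows = employees.scan()
    d_rows = departments.scan()
    wanted = {d["dept_id"] for d in d_rows if d["dept_name"] == "Engineering"}
    return [e["lastname"] for e in e_rows if e["dept_id"] in wanted]

def join_with_pushdown(employees, departments):
    """Push the filter to each store so only matching rows are shipped."""
    d_rows = departments.scan(lambda d: d["dept_name"] == "Engineering")
    wanted = {d["dept_id"] for d in d_rows}
    e_rows = employees.scan(lambda e: e["dept_id"] in wanted)
    return [e["lastname"] for e in e_rows]

def rows_shipped(strategy):
    """Run a strategy on fresh sources; return its result and rows shipped."""
    employees, departments = make_sources()
    result = strategy(employees, departments)
    return result, employees.shipped + departments.shipped
```

Both strategies return the same employees, but on this sample data the first ships all 9 rows while the pushdown version ships 3. Choosing between such plans is the kind of decision the notes say a developer should not have to make by hand.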
select e.title, e.lastname
from Employees as e JOIN Departments as d ON e.dept_id = d.dept_id
where year(e.birthday) >= 1970 and d.dept_name = 'Engineering'
A data virtualization product can integrate data live: only when a data consumer queries data is that data retrieved from the data stores and integrated. Compare this to ETL solutions, which integrate data on a schedule. The result of an ETL integration process has to be stored before it can be used for reporting. Live data integration is called on-demand data integration, whereas ETL delivers scheduled data integration.
Data Source(s)
#9 Minimize Need for Data Replication and Duplication
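Returning to the on-demand versus scheduled point under #8: the contrast can be sketched by running the slide's query both as a live view and as a stored copy, then changing the source. This uses Python's `sqlite3` as a stand-in. The sample rows and the names `live_engineers` and `etl_engineers` are invented, and because SQLite has no `year()` function the sketch substitutes `strftime('%Y', ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE Employees (title TEXT, lastname TEXT, birthday TEXT, dept_id INTEGER);
INSERT INTO Departments VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO Employees VALUES
    ('Engineer', 'Lee',  '1975-03-02', 1),
    ('Manager',  'Diaz', '1965-07-19', 1),
    ('Rep',      'Khan', '1980-01-05', 2);
""")

# The slide's query, with year() replaced by SQLite's strftime().
QUERY = """
SELECT e.title, e.lastname
FROM Employees AS e JOIN Departments AS d ON e.dept_id = d.dept_id
WHERE CAST(strftime('%Y', e.birthday) AS INTEGER) >= 1970
  AND d.dept_name = 'Engineering'
"""

# On-demand integration: a view, evaluated at query time.
conn.execute("CREATE VIEW live_engineers AS " + QUERY)
# Scheduled integration: a stored copy, as an ETL run would produce.
conn.execute("CREATE TABLE etl_engineers AS " + QUERY)

# The source changes after the "ETL run".
conn.execute("INSERT INTO Employees VALUES ('Architect', 'Okoro', '1982-11-30', 1)")

live = conn.execute(
    "SELECT lastname FROM live_engineers ORDER BY lastname").fetchall()
snapshot = conn.execute(
    "SELECT lastname FROM etl_engineers ORDER BY lastname").fetchall()
```

After the insert, `live` includes the new employee while `snapshot` stays as it was until the next load, which is the latency the slide refers to.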
Activities required to set up a physical vs. virtual data mart
Physical data mart: Define Data Structure; Define ETL Logic; Prepare HW Server; Install and Configure RDBMS; Create Database; Physical DB Design and Tuning; Load Tables and Set Up Batch Updates; Requires DBA and Developer to maintain and manage
vs.
Virtual data mart: Design Data Structure; Define Mappings; Define Virtual Tables; Enable Caching (if needed)
#10 Centralize Security
Data Sanitization
Column level masking
Access and audit control
Centralize compliance policies
Appendix
Large Investment Bank Dashboard Derivatives Trading
BI Use Case
Situation / Need: Monitor derivatives security trades to prevent rogue trades and financial loss. Trading data is spread across many databases and systems.
Solution: Consolidate all trading data into a single view. Real-time access. Transformation of data differences.
Benefits: Prevent financial loss. Saved time and cost to develop the application. Easier to manage data changes.
Dashboard Custom App; JBoss Data Virtualization (Consume, Compose, Connect); Data Sources
Large Financial Services Institution Single View of Customer
Unified 360 View Use Case
Situation / Needs: 600 different brokerage offices with 600 databases. Can't access account information from other offices. Can't manage the customer, only individual accounts.
Solution: Enable a CRM application to find customer information with a single query across all databases. Real-time access.
Benefits: Better customer management. Simpler, faster application development.
Brokerage CRM App; JBoss Data Virtualization (Consume, Compose, Connect); 600 geographically dispersed DBs
Competitive Landscape
Platform Competitors
IBM (InfoSphere Federation Server), Oracle (Oracle Data Service Integrator)
Strengths: Comprehensive offerings, but require multiple SKUs
Weaknesses (Exploit/Attack): Extremely expensive; complexity requires lots of services; proprietary
Competitive Landscape
Only Open Source Data Virtualization:
Lowest TCO for broad adoption compared to competing solutions, especially as customers are looking for ways to reduce spending.
Lower business risk due to open, community-based technology. No vendor lock-in.
Outperforms competitive solutions: faster query performance, most comprehensive data provisioning options, and simple data visualization through the dashboard.
Comprehensive Solution: JBoss Data Virtualization is fully integrated and certified with the JBoss stack. It is part of a more comprehensive offering than those from pure-play vendors, providing shorter time to value.
Pure-Play Competitors
Informatica (PowerCenter Data Virtualization Edition)
Strengths: Integrated ETL and data virtualization offering; integrated data quality support; data integration leadership
Weaknesses (Exploit/Attack): Always push ETL first; extremely expensive; proprietary
Composite Software (Composite Information Server), Denodo (Denodo Platform)
Strengths: Easy-to-use tools; broad connectivity; performance
Weaknesses: Lack of comprehensive platform; weak data provisioning support
TCO for Mass Adoption: Lower TCO and pricing compared to competing solutions, especially as customers are looking for ways to reduce spending. Core-based subscriptions are easy to understand and provide flexibility across small to large deployments.
Lower business risk due to open, community-based technology: No vendor lock-in. Note that many government organizations have a stated preference for open source products.
Outperforms competitive solutions: faster query performance, more provisioning options that simplify data consumption, and a dashboard that helps with data reporting and visualization.
Model Driven Development Data Virtualization Designer
Logical models representing virtual, unified data views
Shows structural transformations and dependencies
Defines transformations with: Selects, Joins, Criteria, Functions, Unions, User Defined
Physical models representing actual data sources
Eclipse-based graphical modeling tool for modeling, analyzing, integrating and testing multiple data sources to produce Relational, XML and Web Service Views that expose your business data without programming.
Business Dashboard Quickly Visualize your Data
JBoss Data Virtualization
Lean Virtual Data Integration: Comprehensive data federation, integration, transformation and provisioning through the creation of reusable virtual logical data models that are easily consumable through standards-based SQL (JDBC, ODBC, Hibernate) and Web Services (REST, SOAP, OData) interfaces.
Model Driven Development: Eclipse-based graphical tool that lets you map and transform data from sources to target formats, as well as resolve semantic differences, create virtual data structures at a physical or logical level, and use declarative interfaces that are compatible with and optimized for your applications.
Universal Connectivity with Big Data and Cloud: Support for Hadoop, NoSQL, and SaaS data integration along with all major enterprise RDBMS, data warehouses and files (XML, CSV, Excel), and strong extensibility support for custom connectors.