Building a Semantic Layer between DB and Hadoop?
DOAG Konferenz Nürnberg, 18.11.2015, Matthias Fuchs
About me
10+ years of experience with Oracle
Oracle Certified Professional
Exadata Certified
Oracle Engineered Systems
• Exadata
• Exalytics
• Big Data
• Exalogic
DWH, Hadoop, Monitoring, Audit
Senior Solution Architect [email protected]
Twitter: @hias222
Copyright © Capgemini 2015. All Rights Reserved
3 Hadoop_semantic_layer_15_11.pptx
Agenda
About Capgemini
Semantic Layer
Big Data: Data and Tools
Combining Database and Hadoop
Conclusion
Outlook: Analytic Views
Demo Big Data SQL
Capgemini: a strong group
[Figure: pie charts "Revenue by sector*" (Financial Services; Public Sector; Energy, Utilities & Chemicals; Manufacturing, Automotive & Life Sciences; Customer Products, Retail, Distribution & Transportation; Telecom, Media & Entertainment; Others) and "Revenue by business line*" (Consulting Services; Local Professional Services; Application Services; Other Managed Services)]
* As of H1 2015
"Cap Gemini S.A." is listed in the CAC 40 (Paris, ISIN code: FR0000125338).
Our brand is Capgemini; on the Paris stock exchange we are listed as "Cap Gemini S.A."
Revenue 2014: €10.57bn
Operating margin: €970m
Operating profit: €853m
Net profit: €580m
Net cash and cash equivalents: €1.22bn
180,000 employees in more than 40 countries work for our clients (as of July 2015)
[Map: Canada, USA, Mexico, Brazil, Argentina, Europe, Morocco, Australia, China, India, Chile, Guatemala, Singapore, Philippines, Taiwan, United Arab Emirates, Malaysia, New Zealand, Japan, South Africa, Colombia, Vietnam]
Offshore employees: 96,000
Capgemini combines strong technical expertise with in-depth industry know-how
Selected reference clients
Automotive
Public Sector
Telecom, Media & Entertainment
Manufacturing, Retail & Distribution
Financial Services
Energy, Utilities & Chemicals
Definition: Semantic Layer
A semantic layer is a business representation of corporate data that helps end users access data autonomously using common business terms. Developed and patented by Business Objects, it maps complex data into familiar business terms such as product, customer, or revenue to offer a unified, consolidated view of data across the organization.

Unified Semantic View of Information: The Oracle BI Foundation Suite allows an organization to model the complex information sources of their business as a simple, semantically unified, logical business model. It provides facilities to map complex physical data structures including tables, derived measures, and OLAP cubes into business terms - abstracting how a business user expresses calculations. It translates familiar, easy to understand business concepts into the technical details required to access the information. The Oracle BI Foundation Suite is unique in the market because it defines an enterprise semantic layer that spans across the unified enterprise view of information.
Source: http://www.oracle.com/us/obiee-11g-technical-overview-078853.pdf
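To make the definition concrete, here is a minimal Python sketch of the core mechanism: a model that maps business terms to physical SQL expressions, and a function that turns a request in business terms into a query. The model, the table and column names (`sales_fact`, `cust_dim`, `amt_net`, ...) and the helpers `to_sql` and `demo` are hypothetical. This is not how OBIEE or Business Objects implement their layers, only an illustration of the idea, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical business model: business terms -> physical SQL expressions.
SEMANTIC_MODEL = {
    "tables": "sales_fact f JOIN cust_dim c ON f.cust_id = c.cust_id",
    "dimensions": {
        "Customer": "c.cust_name",
        "Region": "c.region_cd",
    },
    "measures": {
        "Revenue": "SUM(f.amt_net)",
        "Orders": "COUNT(DISTINCT f.order_id)",
    },
}


def to_sql(model: dict, dimensions: list[str], measures: list[str]) -> str:
    """Translate a request expressed in business terms into physical SQL."""
    select = [f'{model["dimensions"][d]} AS "{d}"' for d in dimensions]
    select += [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['tables']}"
    if dimensions:
        # Dimensions become the grouping (and sort) columns.
        group_by = ", ".join(model["dimensions"][d] for d in dimensions)
        sql += f" GROUP BY {group_by} ORDER BY {group_by}"
    return sql


def demo() -> list[tuple]:
    """Run a business-term query against a tiny in-memory physical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE cust_dim (cust_id INTEGER, cust_name TEXT, region_cd TEXT);
        CREATE TABLE sales_fact (order_id INTEGER, cust_id INTEGER, amt_net REAL);
        INSERT INTO cust_dim VALUES (1, 'ACME', 'EMEA'), (2, 'Globex', 'APAC');
        INSERT INTO sales_fact VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);
    """)
    sql = to_sql(SEMANTIC_MODEL, ["Region"], ["Revenue", "Orders"])
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The end user only names "Region", "Revenue" and "Orders"; joins, physical column names and aggregation rules live in the model, which is exactly the abstraction the definition describes.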
Semantic Layer
Business Intelligence semantic layer products
Source: http://www.datasciencecentral.com/profiles/blogs/its-time-to-unleash-the-semantic-layer
Semantic layer example: OBIEE
Hadoop – Polyglot Persistence
[Diagram: data stores in a polyglot persistence landscape]
• Hadoop: SQL
• Not only SQL: document DB, key-value
• Hadoop file system: images, texts, ...
• RDBMS: relational, SQL
• Resource Description Framework: Semantic Web, triples
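As a rough illustration of polyglot persistence, the sketch below stores one hypothetical customer record in the four data models from the slide and reads the same attribute back from each. The record, the key format and the `ex:` prefixes are assumptions; plain Python structures stand in for real NoSQL or RDF stores, and SQLite stands in for the RDBMS.

```python
import json
import sqlite3

# One hypothetical customer record, represented in four storage models.
CUSTOMER = {"id": 42, "name": "ACME", "city": "Nürnberg"}


def relational_city(cust_id: int) -> str:
    """RDBMS: fixed schema, declarative SQL access."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    con.execute("INSERT INTO customer VALUES (:id, :name, :city)", CUSTOMER)
    row = con.execute("SELECT city FROM customer WHERE id = ?", (cust_id,)).fetchone()
    con.close()
    return row[0]


def key_value_city(cust_id: int) -> str:
    """Key-value store: the value is opaque; lookup works only by key."""
    store = {f"customer:{CUSTOMER['id']}": json.dumps(CUSTOMER)}
    return json.loads(store[f"customer:{cust_id}"])["city"]


def document_city(cust_id: int) -> str:
    """Document store: self-describing, possibly nested documents."""
    docs = [{"_id": CUSTOMER["id"], "name": CUSTOMER["name"],
             "address": {"city": CUSTOMER["city"]}}]
    return next(d["address"]["city"] for d in docs if d["_id"] == cust_id)


def rdf_city(cust_id: int) -> str:
    """RDF: every fact is a (subject, predicate, object) triple."""
    subject = f"ex:customer{CUSTOMER['id']}"
    triples = {(subject, "ex:name", CUSTOMER["name"]),
               (subject, "ex:city", CUSTOMER["city"])}
    return next(o for s, p, o in triples
                if s == f"ex:customer{cust_id}" and p == "ex:city")
```

Each model answers the same question with a different access pattern, which is why a semantic layer spanning several of them has to hide these differences from the business user.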
Big Data - Tools
Source: https://www.linkedin.com/pulse/100-open-source-big-data-architecture-papers-anil-madan
Highlighted in the diagram: metadata (HCatalog); engines (interactive, batch)
HCatalog
Originated as part of Hive
Creates a view (description) of the data in the Hadoop file system
Makes the data available to other applications
HCatalog supports
• RCFile, Parquet, ORC
• CSV
• JSON
• SequenceFile (key-value)
• Avro (binary)
HCatalog extensions
• InputFormat, OutputFormat
• SerDe (serializer/deserializer)
• e.g. Data Pump files
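The schema-on-read idea behind HCatalog can be sketched in a few lines: the raw files stay untouched, and a metastore entry (schema plus SerDe) decides how they are turned into typed rows at read time. The `METASTORE` dictionary, the table names and the SerDe functions below are hypothetical stand-ins; they do not use the real HCatalog or Hive SerDe interfaces.

```python
import csv
import io
import json
from typing import Any, Callable

# A table schema: ordered (column name, type conversion) pairs.
Schema = list[tuple[str, Callable[[Any], Any]]]


def csv_serde(raw: str, schema: Schema) -> list[dict]:
    """Deserialize delimited text into typed rows (schema applied at read time)."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def json_serde(raw: str, schema: Schema) -> list[dict]:
    """Deserialize one JSON object per line, keeping only the schema columns."""
    rows = []
    for line in raw.splitlines():
        if line.strip():
            obj = json.loads(line)
            rows.append({name: cast(obj[name]) for name, cast in schema})
    return rows


# Hypothetical metastore: table name -> schema and SerDe.
METASTORE = {
    "web_logs": {"schema": [("user", str), ("bytes", int)], "serde": csv_serde},
    "clicks": {"schema": [("user", str), ("bytes", int)], "serde": json_serde},
}


def read_table(table: str, raw: str) -> list[dict]:
    """Any consumer reads through the metastore instead of parsing files itself."""
    entry = METASTORE[table]
    return entry["serde"](raw, entry["schema"])
```

For example, `read_table("web_logs", "alice,512\nbob,128\n")` yields two typed rows. Supporting a new file format only needs a new SerDe registered in the metastore, which mirrors the extension points listed above.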
Processing Engines
General-purpose processing frameworks (Apache projects)
• MapReduce: the first processing framework on Hadoop, batch processing
• Tez: faster than MapReduce, interactive data processing, in-memory processing, integrated with YARN
• Spark: performance similar to Tez, can also run standalone
• Others, e.g. Flink (research origins include Humboldt University Berlin)
Many SQL frameworks run on top of these engines; the most important is Hive.
Some SQL frameworks bring their own processing engine, e.g. Impala.
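The MapReduce model from the list can be illustrated with the classic word count. This single-process Python sketch only shows the three phases (map, shuffle, reduce); on Hadoop, map and reduce tasks run in parallel on many nodes, close to the HDFS blocks, and Tez and Spark generalize the fixed map/reduce pair into DAGs of stages. The function names are illustrative, not a Hadoop API.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, `word_count(["Hive on Tez", "Hive on Spark"])` counts "hive" and "on" twice each.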
SQL access from the Hadoop perspective
[Diagram: SQL access from the Hadoop perspective]
SQL queries
SQL engines (selection): Hive, HAWQ, Impala, Drill
Processing layer: MapReduce, Spark*, Tez
Storage managers (Hadoop storage): HDFS; HBase, Kudu**
Oracle access: Big Data SQL (query via HCatalog and SerDes), OTA4H (Oracle Table Access for Hadoop)
External tools
* Spark SQL via Hive; Hive on Spark not yet ready for production
** Kudu is in beta
Big Data SQL – Oracle SQL Layer
[Diagram: Oracle Big Data SQL spanning Cloudera Hadoop, NoSQL and Exadata; further components: R Advanced Analytics, Advanced Analytics, Advanced Security, Connectors, ODI]
Hadoop and Big Data SQL
Storage layer: file system (HDFS)
Resource management: YARN + MapReduce
Processing layer: Big Data SQL
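In this stack, Big Data SQL's Smart Scan runs on the Hadoop nodes: rows are filtered and columns projected there, so only the needed data travels to the database. The sketch below contrasts both paths and counts a rough proxy for the bytes shipped. The data, the size measure and the function names are assumptions for illustration, not Big Data SQL internals.

```python
from typing import Callable

Row = dict[str, object]
Predicate = Callable[[Row], bool]

# Hypothetical rows stored on one Hadoop node.
HDFS_ROWS: list[Row] = [
    {"cust_id": i, "country": "DE" if i % 4 == 0 else "FR", "payload": "x" * 100}
    for i in range(1000)
]


def row_size(row: Row) -> int:
    """Rough size proxy: total length of the values' text representation."""
    return sum(len(str(v)) for v in row.values())


def scan_without_pushdown(rows: list[Row], predicate: Predicate, columns: list[str]):
    """Ship every full row to the database, which then filters and projects."""
    shipped = sum(row_size(r) for r in rows)
    result = [{c: r[c] for c in columns} for r in rows if predicate(r)]
    return result, shipped


def scan_with_pushdown(rows: list[Row], predicate: Predicate, columns: list[str]):
    """Filter and project on the storage side; ship only the reduced rows."""
    result = [{c: r[c] for c in columns} for r in rows if predicate(r)]
    shipped = sum(row_size(r) for r in result)
    return result, shipped
```

Both functions return the same result; only the amount of data crossing the network differs, which is where the pushdown pays off.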
Big Data SQL Roadmap
Big Data SQL
1.x – 2014
First version with Smart Scan on Hadoop and NoSQL
Optimized joins: Bloom filters applied to Hadoop data
Fan-out parallelism on Hadoop
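The Bloom filter join follows a general semi-join reduction pattern: the database builds a Bloom filter over the join keys of the (usually small, already filtered) dimension table, the Hadoop side uses it to discard fact rows that cannot match, and the database then performs the exact join. A Bloom filter may return false positives but never false negatives, so no matching row is lost. Below is a minimal Python sketch of that pattern; the filter size, the SHA-256-based hashing and all names are assumptions, not Oracle's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A small Bloom filter: k hash positions in an m-bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: object) -> Iterator[int]:
        # Derive k bit positions from salted SHA-256 digests (simple, not fast).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: object) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: object) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_join(dim_rows: list[dict], fact_rows: list[dict], key: str):
    """Filter facts with the Bloom filter first, then join exactly."""
    bf = BloomFilter()
    for d in dim_rows:
        bf.add(d[key])
    # "Hadoop side": only candidate rows are shipped.
    shipped = [f for f in fact_rows if bf.might_contain(f[key])]
    # "Database side": an exact hash join removes any false positives.
    dim_by_key = {d[key]: d for d in dim_rows}
    joined = [{**f, **dim_by_key[f[key]]} for f in shipped if f[key] in dim_by_key]
    return joined, len(shipped)
```

The filter is a few bytes to transfer, while the rows it eliminates can be the bulk of the fact table.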
2.0 – 09/2015
Storage indexes for Big Data SQL
Reduced I/O: HDFS blocks are skipped based on the storage index
Minimized user administration
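A storage index keeps a small summary per block, typically the minimum and maximum of a column, and skips every block whose range cannot satisfy the predicate. The sketch below shows that principle for an equality filter; block size, data and the names `Block`, `build_blocks` and `scan_equals` are hypothetical, and it does not reflect how Big Data SQL stores its index.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list[int]
    min_value: int
    max_value: int


def build_blocks(values: list[int], block_size: int) -> list[Block]:
    """Split a column into blocks and record a min/max summary per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equals(blocks: list[Block], target: int) -> tuple[list[int], int]:
    """Return the matching values and the number of blocks actually read."""
    hits: list[int] = []
    blocks_read = 0
    for block in blocks:
        # Storage index check: skip the block if target lies outside [min, max].
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        hits.extend(v for v in block.values if v == target)
    return hits, blocks_read
```

With sorted or clustered data, a lookup of 424 in `build_blocks(list(range(1000)), 100)` reads one of ten blocks. With randomly distributed values, almost every block's range would contain the target and little I/O would be saved.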
Future
Optimizer, columnar formats, Parquet, partition pruning, Exadata?
Semantic Layer in BI
[Diagram components: reporting/dashboards; semantic layer (in the BI tool); DB; Hive, Impala and HCatalog on Hadoop]
Semantic Layer in Hadoop
[Diagram components: DB; Hive, Impala and HCatalog on Hadoop; semantic layer (in Hadoop); reporting/dashboards]
Semantic Layer in Oracle DB
[Diagram components: DB with Big Data SQL; Hive, Impala and HCatalog on Hadoop; semantic layer (in the Oracle DB); reporting/dashboards]
Comparison of the options
Option | Security | Agility | Speed | Extensibility
BI layer external | At DB and Hadoop level | Separate implementation on DB and Hadoop | DB: high; Hadoop: depends | New tool, new layer
BI layer in the database | DB security | One-layer implementation | Big Data SQL or Hive; DB: high | One layer for many tools, Analytic Views
BI layer in Hadoop | Hadoop security (Knox, Sentry) | One-layer implementation, data lake | Depends on Spark/Tez/MapReduce etc.; streaming data | Open to Hadoop trends such as streaming and real time
Analytic Views DB 12.2
Coming in Oracle Database 12.2
New type of view based on a business model and calculation hierarchies
Data access to tables, views, external tables, Big Data SQL
Views are queried with simple SQL and MDX
MDX provider (OLE DB for OLAP) supports Excel pivot tables
Smart analytic view -> simple SQL
Analytic view example
-- Query the analytic view sales_analysis through its hierarchies
SELECT time_hierarchy.member_name     AS time,
       product_hierarchy.member_name  AS product,
       customer_hierarchy.member_name AS customer,
       sales,
       sales_ytd_pct_chg_yr_ago       AS sales_ytd_pct_chg,
       share_product_parent_sales     AS prod_share_sales
FROM   sales_analysis HIERARCHIES
       (time_hierarchy,
        product_hierarchy,
        customer_hierarchy)
-- Level filters choose the level of aggregation; level names are stored
-- in upper case unless they were created as quoted identifiers
WHERE  time_hierarchy.level_name = 'YEAR'
AND    product_hierarchy.level_name = 'DEPARTMENT'
AND    customer_hierarchy.level_name = 'REGION';
Slide annotations: SQL query; HIERARCHIES clause: new hierarchy objects; WHERE filters: level of aggregation
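To illustrate what the query above returns, this Python sketch emulates the analytic view logic on a few hypothetical fact rows: leaf members are rolled up to the YEAR, DEPARTMENT and REGION levels via hierarchy mappings, and a share-of-parent measure is computed. All data, mappings and names are assumptions, including that DEPARTMENT sits directly below the top (all products) level. In Oracle Database 12.2 this logic is declared in attribute dimensions, hierarchies and the analytic view itself, not written by hand.

```python
from collections import defaultdict

# Hypothetical leaf-level facts: (month, product category, customer city, sales).
FACTS = [
    ("2015-01", "Cameras", "Nürnberg", 100.0),
    ("2015-02", "TVs", "Paris", 300.0),
    ("2015-03", "Shirts", "Nürnberg", 100.0),
    ("2015-04", "Cameras", "Boston", 50.0),
]

# Hypothetical hierarchies: leaf member -> ancestor at the queried level.
CATEGORY_TO_DEPARTMENT = {"Cameras": "Electronics", "TVs": "Electronics", "Shirts": "Apparel"}
CITY_TO_REGION = {"Nürnberg": "Europe", "Paris": "Europe", "Boston": "North America"}


def sales_by_year_department_region() -> list[tuple[str, str, str, float, float]]:
    """Aggregate sales at YEAR/DEPARTMENT/REGION and add the share of the product parent."""
    sales: dict[tuple[str, str, str], float] = defaultdict(float)
    for month, category, city, amount in FACTS:
        key = (month[:4], CATEGORY_TO_DEPARTMENT[category], CITY_TO_REGION[city])
        sales[key] += amount
    # Product parent of a department = all products for the same year and region.
    parent: dict[tuple[str, str], float] = defaultdict(float)
    for (year, _department, region), amount in sales.items():
        parent[(year, region)] += amount
    return sorted(
        (year, dept, region, amount, amount / parent[(year, region)])
        for (year, dept, region), amount in sales.items()
    )
```

Each result row corresponds to one row of the SQL query: a time, product and customer member at the filtered levels, the aggregated sales and the department's share of its parent.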
Demo
Loading data via Big Data SQL
Transferring data