Building a Semantic Layer Between the Database and Hadoop?

DOAG Conference Nuremberg, 18.11.2015, Matthias Fuchs



About me

10+ years of experience with Oracle

Oracle Certified Professional

Exadata Certified

Oracle Engineered Systems

• Exadata

• Exalytics

• Big Data

• Exalogic

DWH, Hadoop, Monitoring, Audit

Senior Solution Architect

Twitter: @hias222

Copyright © Capgemini 2015. All Rights Reserved

3 Hadoop_semantic_layer_15_11.pptx

Agenda

About Capgemini

Semantic Layer

Big Data: Data and Tools

Combining Database and Hadoop

Conclusion

Outlook: Analytic Views

Big Data SQL Demo


Capgemini: a strong group

[Pie charts: revenue by sector* (Financial Services; Public Sector; Manufacturing, Automotive & Life Sciences; Customer Products, Retail, Distribution & Transportation; Energy, Utilities & Chemicals; Telecom, Media & Entertainment; Others) and revenue by business line* (Application Services; Other Managed Services; Local Professional Services; Consulting Services)]

“Cap Gemini S.A.” is listed in the CAC 40 (Paris, ISIN code: FR0000125338). Our brand is Capgemini; on the Paris stock exchange we are listed as “Cap Gemini S.A.”

Operating margin: EUR 970 million

Operating profit: EUR 853 million

Net profit: EUR 580 million

Net cash and cash equivalents: EUR 1.22 billion

Revenue 2014: EUR 10.57 billion

* As of H1 2015 (both charts)

In more than 40 countries, 180,000 employees work for our clients (as of July 2015)

[Map of locations: Canada, USA, Mexico, Brazil, Argentina, Europe, Morocco, Australia, China, India, Chile, Guatemala, Singapore, Philippines, Taiwan, United Arab Emirates, Malaysia, New Zealand, Japan, South Africa, Colombia, Vietnam]

Offshore employees: 96,000

Capgemini combines deep technical expertise with solid industry know-how

Selected reference clients


Automotive

Public Sector

Telecom, Media & Entertainment

Manufacturing, Retail & Distribution

Financial Services

Energy, Utilities & Chemicals


Semantic Layer: Definition


A semantic layer is a business representation of corporate data that helps end users access data autonomously using common business terms. Developed and patented by Business Objects, it maps complex data into familiar business terms such as product, customer, or revenue to offer a unified, consolidated view of data across the organization.

Unified Semantic View of Information: The Oracle BI Foundation Suite allows an organization to model the complex information sources of their business as a simple, semantically unified, logical business model. It provides facilities to map complex physical data structures including tables, derived measures, and OLAP cubes into business terms - abstracting how a business user expresses calculations. It translates familiar, easy to understand business concepts into the technical details required to access the information. The Oracle BI Foundation Suite is unique in the market because it defines an enterprise semantic layer that spans across the unified enterprise view of information.

Source: http://www.oracle.com/us/obiee-11g-technical-overview-078853.pdf
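To make the definition concrete: at its core, a semantic layer is a mapping from business terms to physical expressions, plus logic that turns a request phrased in those terms into SQL. The Python sketch below shows only that idea, under stated assumptions: the table and column names (`sales`, `products`, `customers`, `amount_sold`, ...) and the function `to_sql` are hypothetical, and real BI semantic layers such as the OBIEE repository model far more (join paths, hierarchies, security).

```python
from typing import Dict, List

# Hypothetical business model: business term -> physical SQL expression.
ATTRIBUTES: Dict[str, str] = {
    "Product": "p.prod_name",
    "Customer": "c.cust_name",
}
MEASURES: Dict[str, str] = {
    "Revenue": "SUM(s.amount_sold)",
}
# Hypothetical physical join path shared by all requests.
FROM_CLAUSE = (
    "sales s "
    "JOIN products p ON s.prod_id = p.prod_id "
    "JOIN customers c ON s.cust_id = c.cust_id"
)


def to_sql(attributes: List[str], measures: List[str]) -> str:
    """Translate a request phrased in business terms into physical SQL."""
    columns = [f'{ATTRIBUTES[a]} AS "{a}"' for a in attributes]
    columns += [f'{MEASURES[m]} AS "{m}"' for m in measures]
    sql = f"SELECT {', '.join(columns)} FROM {FROM_CLAUSE}"
    if attributes and measures:
        # Requested attributes become the grouping level of the aggregated measures.
        sql += " GROUP BY " + ", ".join(ATTRIBUTES[a] for a in attributes)
    return sql


query = to_sql(["Product"], ["Revenue"])
print(query)
```

A reporting tool only sends `["Product"]` and `["Revenue"]`; which tables hold the data stays inside the layer, so the physical source could change without touching the reports.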

Semantic Layer

Business Intelligence semantic layer products


Source: http://www.datasciencecentral.com/profiles/blogs/its-time-to-unleash-the-semantic-layer

Semantic layer example: OBIEE


Hadoop – Polyglot Persistence


[Diagram: Hadoop among other persistence models
• NoSQL ("not only SQL"): document DB, key/value
• Hadoop: file system for images, texts, ...
• RDBMS: relational, SQL
• Resource Description Framework: semantic web, triples]

Big Data - Tools


Source: https://www.linkedin.com/pulse/100-open-source-big-data-architecture-papers-anil-madan

Diagram callouts: metadata (HCatalog); engines (interactive, batch)

HCatalog


Originated as part of Hive

Creates a view (table description) over the data in the Hadoop file system

Makes the data available to other applications

HCatalog supports

• RCFile, Parquet, ORC

• CSV

• JSON

• SequenceFile (key, value)

• Avro (binary)

HCatalog extensions

• InputFormat, OutputFormat

• SerDe (serializer/deserializer)

• e.g., Data Pump files
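HCatalog keeps the table description (name, columns, storage format) apart from the files; a SerDe turns raw records into rows only when the data is read. The Python sketch below imitates that split for CSV and JSON lines. It is an illustration under assumptions only: `TableDef`, `CsvSerDe` and `JsonSerDe` are hypothetical names and do not reproduce Hive's Java SerDe interface.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol


class SerDe(Protocol):
    def deserialize(self, raw: str, columns: List[str]) -> Iterator[Dict[str, object]]:
        ...


class CsvSerDe:
    """Hypothetical SerDe for delimited text: values map to columns by position."""

    def deserialize(self, raw: str, columns: List[str]) -> Iterator[Dict[str, object]]:
        for values in csv.reader(io.StringIO(raw)):
            if values:  # skip empty lines
                yield dict(zip(columns, values))


class JsonSerDe:
    """Hypothetical SerDe for JSON lines: values map to columns by key."""

    def deserialize(self, raw: str, columns: List[str]) -> Iterator[Dict[str, object]]:
        for line in raw.splitlines():
            if line.strip():
                record = json.loads(line)
                yield {c: record.get(c) for c in columns}  # undeclared keys are ignored


@dataclass
class TableDef:
    """Catalog entry in the spirit of HCatalog: the schema lives apart from the data."""

    name: str
    columns: List[str]
    serde: SerDe

    def read(self, raw: str) -> List[Dict[str, object]]:
        # Schema on read: the raw text stays unchanged; structure is applied here.
        return list(self.serde.deserialize(raw, self.columns))


clicks_csv = TableDef("clicks_csv", ["user_id", "url"], CsvSerDe())
clicks_json = TableDef("clicks_json", ["user_id", "url"], JsonSerDe())

csv_rows = clicks_csv.read("1,/home\n2,/cart\n")
json_rows = clicks_json.read('{"user_id": 1, "url": "/home", "agent": "curl"}\n')
print(csv_rows)
print(json_rows)
```

Both definitions expose the same columns to any consumer, whatever the file format underneath; only the SerDe differs.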

Process Engines


General-purpose processing frameworks (Apache projects)

• MapReduce: the first processing framework on Hadoop, batch processing

• Tez: faster than MapReduce, interactive data processing, in-memory processing, integrated with YARN

• Spark: performance similar to Tez, can also run standalone

• Others, e.g. Flink (Humboldt University of Berlin)

Many SQL frameworks run on top of these engines; the most important is Hive.

There are also SQL frameworks with their own processing engine, e.g. Impala.
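MapReduce, the batch model the other engines are measured against, splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. A minimal in-process Python sketch of that data flow (word count) follows; it is not Hadoop's Java API and leaves out distribution, spilling to disk and fault tolerance. Tez and Spark chain stages like these into larger graphs and can keep intermediate data in memory, which accounts for a large part of their speed advantage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word of one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (Hadoop's shuffle/sort step)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values of one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hive on Tez", "Hive on Spark", "Spark standalone"])
print(counts)
```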


SQL access from the Hadoop perspective


[Diagram: SQL access on Hadoop, by layer
• SQL queries, SQL engines (selection): Hive, HAWQ, Impala, Drill, Big Data SQL
• Processing layer: MapReduce, Spark*, Tez
• Storage managers: HDFS, HBase, Kudu**
• Big Data SQL: queries via HCatalog and SerDes; OTA4H; Hadoop; external tools]

* Spark SQL via Hive; Hive on Spark not ready for production
** Kudu in beta

Big Data SQL – Oracle SQL Layer


[Diagram components: Oracle Big Data SQL; Cloudera Hadoop; NoSQL; R Advanced Analytics; Exadata; Advanced Analytics; Advanced Security; Connectors; ODI]

Hadoop and Big Data SQL


[Diagram layers
• Storage layer: file system (HDFS)
• Resource management: YARN + MapReduce
• Processing layer: Big Data SQL]

Big Data SQL Roadmap


Big Data SQL

1.x - 2014

First version with Smart Scan on Hadoop and NoSQL

Optimized joins: Bloom filters applied to Hadoop data

Fan-out parallelism on Hadoop

2.0 – 09/2015

Storage indexes for Big Data SQL

Reduced I/O: HDFS blocks are skipped based on the storage index

Minimized user administration

Future

Optimizer – Columnar – Parquet – Partition pruning - Exadata?
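The two I/O-saving techniques on the roadmap can be illustrated in a few lines. The Python sketch below assumes a simplified model that is not Oracle's implementation: each hypothetical HDFS block keeps a min/max storage-index entry for one column, so a scan can skip blocks that cannot satisfy the predicate (2.0), and a Bloom filter built from the database-side join keys drops non-matching Hadoop rows before they are shipped (1.x). Because a Bloom filter can return false positives, the database still performs the exact join afterwards.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


class BloomFilter:
    """Compact set summary: false positives are possible, false negatives are not."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


@dataclass
class Block:
    """Hypothetical HDFS block of (cust_id, amount) rows with a storage-index entry."""

    rows: List[Tuple[str, int]]
    min_amount: int = field(init=False)
    max_amount: int = field(init=False)

    def __post_init__(self) -> None:
        # Storage index: min/max kept as block metadata, so later scans can
        # decide to skip the block without reading its rows.
        amounts = [amount for _, amount in self.rows]
        self.min_amount, self.max_amount = min(amounts), max(amounts)


def smart_scan(
    blocks: List[Block], min_amount: int, join_keys: BloomFilter
) -> Tuple[List[Tuple[str, int]], int]:
    """Hadoop-side scan for amount >= min_amount; returns (rows, skipped_blocks)."""
    rows: List[Tuple[str, int]] = []
    skipped = 0
    for block in blocks:
        if block.max_amount < min_amount:  # storage index rules the block out
            skipped += 1
            continue
        for cust_id, amount in block.rows:
            if amount >= min_amount and join_keys.might_contain(cust_id):
                rows.append((cust_id, amount))
    return rows, skipped


# Database side: the customer keys taking part in the join (hypothetical data).
db_customers: Set[str] = {"C1", "C3"}
bloom = BloomFilter()
for key in db_customers:
    bloom.add(key)

blocks = [
    Block([("C1", 10), ("C2", 20)]),  # max 20 < 100: skipped via the storage index
    Block([("C1", 150), ("C2", 300)]),
    Block([("C3", 120), ("C4", 90)]),
]
shipped, skipped = smart_scan(blocks, min_amount=100, join_keys=bloom)
# The database completes the exact join, removing any Bloom false positives.
joined = [row for row in shipped if row[0] in db_customers]
print(joined, skipped)
```

Here the first block is ruled out by its storage-index entry alone, and rows for `C2` are normally dropped by the Bloom filter before they would reach the database.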


Semantic Layer in BI


[Diagram components: reporting and dashboards; semantic layer; DB; Hadoop with Hive, Impala, HCatalog]

Semantic Layer in Hadoop


[Diagram components: DB; Hadoop with Hive, Impala, HCatalog; reporting and dashboards; semantic layer]

Semantic Layer in Oracle DB


[Diagram components: DB; Hadoop with Hive, Impala, HCatalog; Big Data SQL; semantic layer; reporting and dashboards]
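With the semantic layer in the database, Hadoop data appears as (external) tables, and one set of views with business names can serve every reporting tool. The Python sketch below imitates that arrangement with the standard-library `sqlite3` module, assuming a plain table `hdfs_clicks` as a stand-in for a Big Data SQL external table; all table, column and view names are hypothetical, and none of the SQL is Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical relational table held in the database.
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, region TEXT);
    -- Stand-in for a Hadoop-backed external table (e.g. web clicks on HDFS).
    CREATE TABLE hdfs_clicks (cust_id INTEGER, url TEXT);

    INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO hdfs_clicks VALUES (1, '/home'), (1, '/cart'), (2, '/home');

    -- Semantic layer: one view with business names that every tool can query.
    CREATE VIEW "Customer Activity" AS
    SELECT c.region    AS "Region",
           c.cust_name AS "Customer",
           COUNT(*)    AS "Page Views"
    FROM customers c
    JOIN hdfs_clicks h ON h.cust_id = c.cust_id
    GROUP BY c.region, c.cust_name;
    """
)

rows = conn.execute('SELECT * FROM "Customer Activity" ORDER BY "Customer"').fetchall()
for row in rows:
    print(row)
conn.close()
```

Reports only reference `"Customer Activity"`; whether the clicks come from a database table or from Hadoop is decided once, inside the database.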

Comparison of options


Criteria: security, agility, speed, extensibility

External BI layer
• Security: at DB and Hadoop level
• Agility: separate implementation on DB and Hadoop
• Speed: DB high; Hadoop varies
• Extensibility: new tool, new layer

BI layer in the database
• Security: DB security
• Agility: one-layer implementation
• Speed: Big Data SQL or Hive; DB high
• Extensibility: one layer for many tools, Analytic Views

BI layer in Hadoop
• Security: Hadoop security (Knox, Sentry)
• Agility: one-layer implementation, data lake
• Speed: depends on the engine used (Spark/Tez/MapReduce etc.); streaming data
• Extensibility: open to Hadoop trends such as streaming and real time


Analytic Views DB 12.2

Coming in Oracle DB 12.2

New type of view based on a business model and calculation hierarchies

• Data access to tables, views, external tables, Big Data SQL
• Views are queried with simple SQL and MDX
• MDX provider (OLE DB for OLAP) supports Excel Pivot, Smart View
• Analytic View -> simple SQL


Analytic View example


SELECT time_hierarchy.member_name     AS time,
       product_hierarchy.member_name  AS product,
       customer_hierarchy.member_name AS customer,
       sales                          AS sales,
       sales_ytd_pct_chg_yr_ago       AS sales_ytd_pct_chg,
       share_product_parent_sales     AS prod_share_sales
FROM   sales_analysis
       HIERARCHIES (time_hierarchy, product_hierarchy, customer_hierarchy)
-- The level filters set the level of aggregation for each hierarchy.
WHERE  time_hierarchy.level_name = 'YEAR'
AND    product_hierarchy.level_name = 'DEPARTMENT'
AND    customer_hierarchy.level_name = 'REGION';

Slide callouts: "SQL query" (the statement above); "New hierarchy object" (the HIERARCHIES clause); "Filters: level of aggregation" (the WHERE clause).
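What the query asks of the Analytic View, facts aggregated to one level per hierarchy plus calculated measures, can be sketched in plain Python. The sketch assumes hypothetical fact rows and a one-step product hierarchy (product to department), and computes a simple percent change versus the prior year, comparable in spirit to `sales_ytd_pct_chg_yr_ago` at YEAR level; it does not model Oracle's Analytic View syntax or semantics.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (year, department, region)

# Hypothetical fact rows at the lowest level: (month, product, region, sales).
FACTS: List[Tuple[str, str, str, float]] = [
    ("2014-01", "Laptop", "EMEA", 100.0),
    ("2014-02", "Phone", "EMEA", 50.0),
    ("2015-01", "Laptop", "EMEA", 80.0),
    ("2015-06", "Laptop", "EMEA", 50.0),
    ("2015-03", "Phone", "APAC", 40.0),
]
# Hypothetical product hierarchy: product -> department.
DEPARTMENT = {"Laptop": "Computers", "Phone": "Mobile"}


def roll_up(facts: List[Tuple[str, str, str, float]]) -> Dict[Key, float]:
    """Aggregate sales to the YEAR, DEPARTMENT and REGION levels."""
    totals: Dict[Key, float] = defaultdict(float)
    for month, product, region, sales in facts:
        totals[(month[:4], DEPARTMENT[product], region)] += sales
    return dict(totals)


def pct_chg_yr_ago(totals: Dict[Key, float], key: Key) -> Optional[float]:
    """Calculated measure: percent change versus the same members one year earlier."""
    year, department, region = key
    prior = totals.get((str(int(year) - 1), department, region))
    if not prior or key not in totals:
        return None  # no (or zero) prior-year value to compare against
    return (totals[key] - prior) * 100 / prior


totals = roll_up(FACTS)
for key in sorted(totals):
    print(key, totals[key], pct_chg_yr_ago(totals, key))
```

In the Analytic View, the hierarchies and measures like these are defined once in the database, so the query itself only names the levels it wants.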


Demo


Load data via Big Data SQL

Transfer Data

The information contained in this presentation is proprietary.

Copyright © 2015 Capgemini. All rights reserved.