Pentaho Google Hangout - Simplifying Analytics Architecture for Big Data
TRANSCRIPT
PENTAHO – Simplifying Analytics Architecture for Big Data
22nd Feb, 2016
Presenter - Sandeep [email protected]
Presenter – Sameer [email protected]
Data Pipeline
Confidential information, for internal use only
Making the Big Data Blend Easy and Real
Telco Customer Experience Analytics
Pentaho Product Overview
Pentaho Product Components
Pentaho Data Integration
Pentaho Dashboards
Pentaho Data Mining / Predictive Analytics
Pentaho Enterprise and Interactive Reports
Pentaho for Big Data MapReduce & Instaview
Pentaho Analyzer
❯ Simple, easy-to-use visual data exploration
❯ Web-based thin client; in-memory caching
❯ Rich library of interactive visualizations
• Geo-mapping, heat grids, scatter plots, bubble charts, line over bar and more
• Pluggable visualizations
❯ Java ROLAP engine to analyze structured and unstructured data, with SQL dialects for querying data from RDBMSs
❯ Pluggable cache integrating with leading caching architectures: Infinispan (JBoss Data Grid) & Memcached
Pentaho Interactive Analysis & Data Discovery: Highly Flexible Advanced Visualizations
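The ROLAP and pluggable-cache bullets above can be illustrated with a minimal sketch. This is not Pentaho's engine or API: the `build_rollup_sql` function, the `CachedRolap` class, and the `sales` table are hypothetical names chosen for illustration, and SQLite stands in for the RDBMS. The idea shown is only that a dimensional request is translated into a SQL GROUP BY query and that repeated requests are answered from an in-memory cache.

```python
import sqlite3


def build_rollup_sql(table, dimensions, measure):
    """Translate a (dimensions, measure) request into a SQL GROUP BY query."""
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({measure}) FROM {table} "
        f"GROUP BY {dims} ORDER BY {dims}"
    )


class CachedRolap:
    """Hypothetical ROLAP-style helper with a plain dict as the cache."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.sql_calls = 0  # counts how often the database is actually hit

    def query(self, table, dimensions, measure):
        key = (table, tuple(dimensions), measure)
        if key not in self.cache:
            self.sql_calls += 1
            sql = build_rollup_sql(table, dimensions, measure)
            self.cache[key] = self.conn.execute(sql).fetchall()
        return self.cache[key]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", "A", 10.0), ("EMEA", "B", 5.0), ("APAC", "A", 7.0)],
)

engine = CachedRolap(conn)
by_region = engine.query("sales", ["region"], "amount")
by_region_again = engine.query("sales", ["region"], "amount")  # served from cache
```

In a real deployment the dict would be replaced by an external cache such as the ones named on the slide; the sketch only shows where such a cache would sit.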
❯ Web-based thin client
❯ Drag & drop, easy-to-use
❯ Supports any database, and data model
❯ Simple, powerful query capabilities for business users
• Filtering, formatting, group summary
❯ Powerful function library for calculated columns
❯ Sharing and distribution
❯ Row-level security
❯ Localization capabilities
Pentaho Interactive Reporting: Simple Ad Hoc Reporting for Business Users
• Rich desktop report designer, pixel perfect reports
• High-volume, highly-formatted enterprise reporting
• Run locally or publish to the server
• Broad data support – relational, big data, flat files, PDI transformations
• Output options include HTML, Excel, CSV, PDF and RTF
• 100% Java
• Embeddable / white-labeling
Pentaho Enterprise Reporting: Highly Polished Reports for Scalable Distribution
Dashboard Designer
• Web-based thin client
• Simple, drag & drop, easy-to-use
• Template-based dashboard creation
• Design rich interactivity with data
• Mash up all Pentaho and 3rd-party content
Dashboard Framework
• Create highly customized dashboards & interactive web applications
• Collection of visualization and filter control widgets
• Extensible (Java, JavaScript)
Pentaho Dashboards: Self-Service for Business Users | Customizable for App Developers
Pentaho Data Integration
Easy to Use, Highly Scalable
• Graphical ETL designer
• Data agnostic
• Structured, unstructured, web services, packaged apps (Google, SAS, SFDC, etc.), big data sources, traditional sources, JSON, XML, HL7, etc.
• Batch, low-latency & real time processing
• Scale-out architecture, deployable to PDI clusters, Hadoop clusters
• 100% Java engine; plug-in architecture for extensibility
• Workflow, alerting, monitoring
Integration, Manipulation & Enrichment
Use Cases:
Classic ETL – data warehouse creation, population & maintenance
Information Delivery – extraction from multiple data sources, transformation and streaming to a report
MapReduce Applications – implementing “code-free” transformation pipelines within Hadoop
Extensibility – adding 3rd-party functionality that automatically works within any of the above use cases.
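For readers unfamiliar with what a MapReduce transformation pipeline does, here is a minimal in-process sketch of the map, shuffle and reduce phases. It is plain Python, not Pentaho or Hadoop code; the function names and the word-count task are assumptions chosen for illustration. It shows the kind of logic that a visual, "code-free" designer would generate or configure on the user's behalf.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run_job(["big data blend", "big data pipeline"])
```

On a cluster the map and reduce calls run in parallel across nodes and the shuffle moves data over the network; the sketch keeps everything in one process to make the data flow visible.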
Pentaho Big Data Analytics: Accelerate the time to big data value
• Full continuity from data access to decisions – complete data integration & analytics for any big data store
• Faster development, faster runtime – visual development, distributed execution
• Instant and interactive analysis – no coding and no ETL required
Major sponsor of the open source project Weka
Data exploration/visualization, model construction and export, preliminary evaluation
Numerous classification/regression and clustering algorithms
Integration with Pentaho Data Integration
• Import 3rd-party models using Predictive Modeling Markup Language (PMML)
• Operationalize models inside or outside of a Hadoop Cluster
• Incorporate algorithms into Pentaho visual interface; store and version models using the Pentaho repository
Pentaho Predictive Analytics
Full Predictive Analytics Lifecycle Support
Leverage new/enhanced features
Inline Model Editing
Model Shared with other users
Data Service and Power Blending
Data Lineage Analysis
• Understanding data origins each time it’s executed
• What happens to it over time
• Where data moves
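The three lineage questions above (origins, changes over time, movement) can be sketched with a small event log. This is an illustrative model only, not Pentaho's lineage feature: `LineageEvent`, `LineageLog`, and the dataset names are hypothetical, and the sketch assumes the pipeline graph has no cycles.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One execution of a step: what it read, what it wrote, and when."""
    step: str
    inputs: list
    outputs: list
    run_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class LineageLog:
    def __init__(self):
        self.events = []

    def record(self, step, inputs, outputs):
        """Append an event each time a step is executed."""
        self.events.append(LineageEvent(step, list(inputs), list(outputs)))

    def origins(self, dataset):
        """Walk backwards to the source datasets that feed `dataset`.

        Assumes the recorded steps form an acyclic graph.
        """
        producers = [e for e in self.events if dataset in e.outputs]
        if not producers:
            return {dataset}  # nothing produced it, so it is a source
        found = set()
        for event in producers:
            for source in event.inputs:
                found |= self.origins(source)
        return found


# Hypothetical two-step pipeline.
log = LineageLog()
log.record("load_cdr", ["cdr_files"], ["staging.cdr"])
log.record("blend", ["staging.cdr", "crm.customers"], ["mart.customer_experience"])
sources = log.origins("mart.customer_experience")
```

Because every event carries a timestamp, the same log can also answer what happened to a dataset over time by filtering `log.events` on the dataset name.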
Visual MapReduce
Lifecycle Management
Minimize disruption between major versions
• Content backup & restore
• Support for backward compatible components (Spring, Java)
• Additional effort on upgrade transparency for 5.x users
Scope of capability: Backup and restore all content within the enterprise repository
• Data sources
• Schedules
• Reports and report outputs
• Transformations
• Metastore
Enhanced Enterprise Security
Putting Big Data to Work
Please write to us at [email protected]
Follow us on
Find this presentation on our YouTube/SlideShare