kognitio spark modern data platform print
![Page 1: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/1.jpg)
@Kognitio #SparkEvent
Hadoop meets Mature BI: Where the rubber meets the road for
the Modern Data Platform
Michael Hiskey
Futurist, Product Evangelist
(and VP, Marketing & Business Development)
![Page 2: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/2.jpg)
![Page 3: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/3.jpg)
Today, and the Future
Big Data
Advanced Analytics
In-memory
Modern Data Platform
Hybrid Data Ecosystem ‘Logical Data Warehouse’
Predictive Analytics
Data Scientists
Data
![Page 4: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/4.jpg)
The Data Scientist
Sexiest job of the 21st Century?
![Page 5: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/5.jpg)
Data Scientist
The Analytical Enterprise
Business Analyst
Systems Admin
![Page 6: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/6.jpg)
Remember: Decision Support Systems?
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Advanced analytics & data science
More math…a lot more math
![Page 7: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/7.jpg)
Behind the numbers
-- Outer query: per year and transaction count, how many accounts and how much spend
select Trans_Year, Num_Trans,
       count(distinct Account_ID) Num_Accts,
       sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
       cast(sum(total_spend)/1000 as int) Total_Spend,
       cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
       rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
       rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
    -- Inner query: one summary row per account per year
    select Account_ID,
           Extract(Year from Effective_Date) Trans_Year,
           count(Transaction_ID) Num_Trans,
           sum(Transaction_Amount) Total_Spend,
           avg(Transaction_Amount) Avg_Spend
    from Transaction_fact
    where extract(year from Effective_Date) < 2009
      and Trans_Type = 'D'
      and Account_ID <> 9025011
      and actionid in (select actionid from DEMO_FS.V_FIN_actions
                       where actionoriginid = 1)
    group by Account_ID, Extract(Year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;
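The query on this slide can be read as two stages: summarise each account per year, then group those summaries by year and transaction count and keep a running total of accounts. The sketch below mimics only that shape in plain Python. The toy rows, the `report` helper and the column subset are assumptions made for illustration; the filters and the two `rank()` columns are left out.

```python
from collections import defaultdict

# Hypothetical toy rows: (account_id, year, amount). Not data from the talk.
transactions = [
    (1, 2007, 100.0),
    (1, 2007, 50.0),
    (2, 2007, 200.0),
    (3, 2007, 30.0),
    (3, 2007, 20.0),
    (1, 2008, 75.0),
]

# Stage 1 (the inner query): one summary per (account, year).
per_account = defaultdict(lambda: {"num_trans": 0, "total_spend": 0.0})
for account_id, year, amount in transactions:
    summary = per_account[(account_id, year)]
    summary["num_trans"] += 1
    summary["total_spend"] += amount

# Stage 2 (the outer query): group the summaries by (year, num_trans).
groups = defaultdict(lambda: {"accounts": set(), "total_spend": 0.0})
for (account_id, year), summary in per_account.items():
    group = groups[(year, summary["num_trans"])]
    group["accounts"].add(account_id)
    group["total_spend"] += summary["total_spend"]


def report(groups):
    """Rows of (year, num_trans, num_accts, total_accts, total_spend).

    Newest year first, then ascending num_trans; total_accts is a
    running sum of num_accts within each year.
    """
    rows = []
    running = defaultdict(int)
    for year, num_trans in sorted(groups, key=lambda k: (-k[0], k[1])):
        group = groups[(year, num_trans)]
        num_accts = len(group["accounts"])
        running[year] += num_accts
        rows.append((year, num_trans, num_accts, running[year], group["total_spend"]))
    return rows


for row in report(groups):
    print(row)
```

Even at this size, the amount of bookkeeping hints at why such questions are usually pushed down to a SQL engine rather than hand-coded.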
![Page 8: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/8.jpg)
What has changed?
More connected-users?
More-connected users?
![Page 9: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/9.jpg)
Don’t be a Railroad Stoker!
Highly skilled engineering required … but the world innovated around them.
![Page 10: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/10.jpg)
The drive for deeper understanding
Axes: Analytical Complexity, Technology/Automation
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behavior modelling
Reporting & BPM
Fraud detection
Dynamic Interaction
Campaign Management
![Page 11: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/11.jpg)
Key: “Graduation”
Projects will need to graduate from the Data Science Lab and become part of Business as Usual
![Page 12: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/12.jpg)
Your goal:
PRESS HERE…and really cool Big Data stuff happens!
![Page 13: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/13.jpg)
Data flow
![Page 14: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/14.jpg)
© 20th Century Fox
![Page 15: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/15.jpg)
No need to pre-process
No need to align to schema
No need to triage
Null storage concerns
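One way to picture "no need to align to schema" is to keep records exactly as they arrive and let each reader impose its own structure at query time. The following is a minimal sketch of that idea in plain Python, not of any Hadoop API; the sample records, `read_with_schema` and `spend_schema` are hypothetical names invented for the illustration.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
raw_lines = [
    '{"user": "a", "amount": 10}',
    '{"user": "b", "amount": "7", "channel": "web"}',
    '{"user": "c"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    Keeps only the fields named in `schema`, coerces each present value
    with the given callable, and uses None for missing fields.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# The schema is chosen by the reader, per question, not by the store.
spend_schema = {"user": str, "amount": float}
rows = list(read_with_schema(raw_lines, spend_schema))
print(rows)
```

The cost of skipping the upfront work is that every read pays for parsing and coercion, which feeds into the latency concerns raised on the following slides.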
![Page 16: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/16.jpg)
Hadoop just too slow for interactive BI!
…loss of train-of-thought
“while Hadoop shines as a processing platform, it is painfully slow as a query tool”
![Page 17: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/17.jpg)
Hadoop is…
inherently disk oriented
typically low ratio of CPU to disk
Lots of these
Not so many of these
![Page 18: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/18.jpg)
Analytics needs low latency, no I/O wait
High speed in-memory processing
![Page 19: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/19.jpg)
A* Modern Data Platform Reference Architecture
*(not THE)
Application & Client Layer
All BI Tools, All OLAP Clients, Excel
Access
Analytical Platform
Near-line Storage (optional)
Persistence Layer
Hadoop Clusters
Enterprise Data Warehouses
Legacy Systems
…
Reporting
Cloud Storage
![Page 20: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/20.jpg)
© Hortonworks Inc. 2013
(another) Next-Generation Data Architecture
APPLICATIONS
Microsoft Applications
BI Tools & OLAP Clients
DATA SYSTEMS
In-memory MPP Accelerator
TRADITIONAL REPOS: RDBMS, EDW, MPP
HORTONWORKS DATA PLATFORM
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
DATA SOURCES
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (web logs, email, sensors, social media)
![Page 21: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/21.jpg)
Analytical Platform
![Page 22: Kognitio spark modern data platform print](https://reader031.vdocuments.mx/reader031/viewer/2022020122/548158655806b5d8108b4667/html5/thumbnails/22.jpg)
It’s all about getting work done
Tasks evolving:
Used to be simple fetch of value
Then was compute dynamic aggregate
Now complex algorithms!
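The three stages named on this slide can be sketched side by side on one toy data set. Everything below is illustrative: the `sales` rows and the function names are assumptions, and an ordinary least-squares slope stands in for the "complex algorithms" stage only because it is short enough to read.

```python
from collections import defaultdict

# Hypothetical toy data: (region, month, sales). Not from the talk.
sales = [
    ("north", 1, 10.0), ("north", 2, 12.0), ("north", 3, 14.0),
    ("south", 1, 20.0), ("south", 2, 19.0), ("south", 3, 18.0),
]


def fetch(region, month):
    """Stage 1: simple fetch of a stored value."""
    for r, m, value in sales:
        if (r, m) == (region, month):
            return value
    return None


def total_by_region():
    """Stage 2: dynamic aggregate, computed at query time."""
    totals = defaultdict(float)
    for region, _, value in sales:
        totals[region] += value
    return dict(totals)


def trend(region):
    """Stage 3: an algorithm; here a least-squares slope of sales over months."""
    points = [(m, v) for r, m, v in sales if r == region]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


print(fetch("north", 2))
print(total_by_region())
print(trend("north"), trend("south"))
```

Each stage touches more of the data per question than the one before it, which is the pressure on the platform that the slide points to.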