with a gpu data frame accelerate analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...mapd...
TRANSCRIPT
![Page 1: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/1.jpg)
Accelerate Analytics with a GPU Data FrameAaron WilliamsOctober 18, 2017
![Page 2: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/2.jpg)
MapD: Extreme Analytics
2
100x Faster Queries
MapD Core
The world’s fastest columnar database, powered
by GPUs
+
Visualization at the Speed of Thought
MapD Immerse
A visualization front end that leverages the speed &
rendering superiority of GPUs
![Page 3: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/3.jpg)
MapD System ArchitectureAccelerating the existing data infrastructure
3
![Page 4: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/4.jpg)
4
MAPD DEMO
![Page 5: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/5.jpg)
MapD BenchmarksBlogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to orders-of-magnitude faster than the fastest CPU databases
5
MapD Core: Comparative Query Acceleration*System Q 1 Q 2 Q 3 Q 4
BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x
ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x
Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x
BigQuery 95x 38x 6x 6x
Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x
Amazon Athena 305x 117x 37x 13x
Elasticsearch (heavily tuned) 386x 343x n/a n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x
Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x
Vertica, Intel Core i5 4670K 685x 607x 203x 132x
Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a
Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x
Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x
PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x
*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
Source: http://tech.marksblogg.com/benchmarks.html
![Page 6: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/6.jpg)
Query Compilation with LLVM
6
Traditional DBs can be highly inefficient• each operator in SQL treated as a separate function• incurs tremendous overhead and prevents vectorization
MapD compiles queries w/LLVM to create one custom function• Queries run at speeds approaching hand-written functions• LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc).• Code can be generated to run query on CPU and GPU simultaneously
10111010101001010110101101010101
00110101101101010101010101011101LLVM
![Page 7: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/7.jpg)
Keeping Data Close to ComputeMapD maximizes performance by optimizing memory use
7
SSD or NVRAM STORAGE (L3)250GB to 20TB1-2 GB/sec
CPU RAM (L2)32GB to 3TB70-120 GB/sec
GPU RAM (L1)24GB to 256GB1000-6000 GB/sec
Hot Data Speedup = 1500x to 5000xOver Cold Data
Warm DataSpeedup = 35x to 120xOver Cold Data
Cold Data
COMPUTELAYER
STORAGELAYER
Data Lake/Data Warehouse/System Of Record
Spee
d In
crea
ses
Space Increases
![Page 8: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/8.jpg)
The Status Quo: Memory Bottlenecks
8
PCIe4-16GB/s
![Page 9: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/9.jpg)
The GPU Open Analytics Initiative ModelStandard in-memory format; zero-copy interchange
9
GPU
![Page 10: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/10.jpg)
The GPU Open Analytics Initiative ModelStandard in-memory format; zero-copy interchange
10
![Page 11: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/11.jpg)
Interactive Machine LearningEmpowering the People in the Pipeline
11
Personas inAnalytics Lifecycle
(Illustrative)Business Analyst
Data Scientist
Data Engineer
IT Systems Admin
Data Scientist / Business Analyst
Data Preparation
Data Discovery& Feature
Engineering
Model & Validate
PredictOperationalize
Monitoring & Refinement
Evaluate & Decide
GPUsMapD H20.ai MapD
![Page 12: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/12.jpg)
12
GOAI DEMO
![Page 13: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/13.jpg)
Try MapDIt’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”)
13
Play with the live demos:https://www.mapd.com/demos/
Download the Community Edition:https://www.mapd.com/platform/download-community/
Join our forums:https://community.mapd.com/
Review these slides:https://www.slideshare.net/aaronrogerwilliams
![Page 15: with a GPU Data Frame Accelerate Analyticson-demand.gputechconf.com/gtc-il/2017/presentation/...MAPD DEMO. MapD Benchmarks ... System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge](https://reader034.vdocuments.mx/reader034/viewer/2022050103/5f421d8d3c4966625a6923b0/html5/thumbnails/15.jpg)