Alluxio: Unify Data at Memory Speed; 2016-11-18
![Page 1: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/1.jpg)
UNIFY DATA AT MEMORY SPEED
Haoyuan (HY) Li @ AMPLab End of Project Celebration
November 18th, 2016
![Page 4: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/4.jpg)
HISTORY
• Trex: 2012-13
• Tachyon: 2013-15
• Alluxio: 2015-
![Page 6: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/6.jpg)
FASTEST-GROWING BIG DATA PROJECT
• Fastest growing open-source project in the big data ecosystem
• 400+ contributors from 100+ organizations
• Running world’s largest production clusters
• Welcome to join the community!
![Page 7: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/7.jpg)
CURRENT STATUS
FOUNDER
Haoyuan Li, CEO
Alluxio (formerly Tachyon) co-creator; joined the AMPLab Ph.D. program in 2011
INVESTOR
TEAM
• From AMD, Dell, Google, Palantir, Uber, Yahoo; experts in distributed systems
• MSs and PhDs in CS from CMU, Stanford, UC Berkeley
• Top 10 committers of the Alluxio open source project
• We are hiring!
COMPANY
Founded 2015
![Page 8: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/8.jpg)
BIG DATA ECOSYSTEM YESTERDAY
![Page 9: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/9.jpg)
BIG DATA ECOSYSTEM TODAY
![Page 10: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/10.jpg)
BIG DATA ECOSYSTEM ISSUES
![Page 12: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/12.jpg)
BIG DATA ECOSYSTEM WITH ALLUXIO
• FUSE Compatible File System
• Hadoop Compatible File System
• Native Key-Value Interface
• Native File System
Enabling Applications to Access Data from Any Storage System at Memory Speed
• GlusterFS Interface
• Amazon S3 Interface
• Swift Interface
• HDFS Interface
![Page 13: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/13.jpg)
WHY ALLUXIO
Co-located compute and data with memory-speed access to data
Virtualized across different storage systems under a unified namespace
Scale-out architecture
File system API, software only
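One way to picture a unified namespace is as a table of mount points that maps paths in a single logical tree onto different storage systems. The Python sketch below is a toy model under that assumption, not Alluxio's implementation; the `MountTable` class, its methods, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Toy unified namespace: logical path prefixes mapped to storage URIs."""
    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize so "/logs/" and "/logs" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest-prefix match, so a nested mount overrides its parent.
        best = None
        for prefix in self.mounts:
            inside = (
                prefix == "/"
                or logical_path == prefix
                or logical_path.startswith(prefix + "/")
            )
            if inside and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {logical_path}")
        suffix = logical_path if best == "/" else logical_path[len(best):]
        return self.mounts[best] + suffix


# Hypothetical storage locations, for illustration only.
table = MountTable()
table.mount("/", "hdfs://namenode:8020/alluxio")
table.mount("/logs", "s3://example-bucket/logs")

print(table.resolve("/logs/2016/11/18.gz"))
print(table.resolve("/models/risk.bin"))
```

An application only ever sees the logical paths; which storage system backs each subtree is a property of the mount table rather than of the application code.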
![Page 14: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/14.jpg)
ALLUXIO BENEFITS
Unification: New workflows across any data in any storage system
Performance: Orders of magnitude improvement in run time
Flexibility: Choice in compute and storage – grow each independently, buy only what is needed
![Page 15: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/15.jpg)
TRUSTED BY THE WORLD’S LEADING COMPANIES
![Page 16: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/16.jpg)
ALLUXIO USE CASES
Accelerating I/O to and from remote storage
Managing data across disparate storage systems
Sharing data across workloads at memory speed
![Page 18: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/18.jpg)
ACCELERATE I/O TO/FROM REMOTE STORAGE
Baidu’s PMs and analysts run interactive queries to gain insights into their products and business
• 200+ node deployment
• 2+ petabytes of storage
• Mix of memory + HDD
ALLUXIO
Baidu File System
“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.” - Baidu
RESULTS
• Data queries are now 30x faster with Alluxio
• Alluxio cluster runs stably, providing over 50TB of RAM space
• By using Alluxio, batch queries that usually lasted over 15 minutes became interactive queries taking less than 30 seconds
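The Baidu figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes the quoted ranges are endpoints (100-150 s with Spark SQL alone, 10-15 s with Alluxio) and reads "over 15 minutes" and "less than 30 seconds" as 900 s and 30 s; the `speedup` helper is illustrative only.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of run time before to run time after."""
    return before_s / after_s


# Quoted query times in seconds (assumed to be range endpoints).
spark_sql_alone = (100, 150)
with_alluxio = (10, 15)

# Most pessimistic pairing: fastest "before" against slowest "after".
low = speedup(spark_sql_alone[0], with_alluxio[1])
# Most optimistic pairing: slowest "before" against fastest "after".
high = speedup(spark_sql_alone[1], with_alluxio[0])
# Batch-to-interactive case: 15 minutes down to 30 seconds.
batch = speedup(15 * 60, 30)

print(f"quoted query range: {low:.1f}x to {high:.1f}x")
print(f"batch to interactive: at least {batch:.0f}x")
```

Under these assumptions the quoted query times correspond to roughly a 6.7x to 15x speedup, while the 30x figure lines up with the batch-to-interactive case.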
![Page 20: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/20.jpg)
SHARE DATA ACROSS JOBS @ MEMORY SPEED
Barclays uses query and machine learning to train models for risk management
• 6-node deployment
• 1TB of storage
• Memory only
ALLUXIO
Relational Database
“Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.” - Barclays
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
![Page 22: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/22.jpg)
MANAGE DATA ACROSS STORAGE SYSTEMS
Qunar uses real-time machine learning for their website ads.
• 200+ node deployment
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
ALLUXIO
“We’ve been running Alluxio in production for over 9 months. Alluxio’s unified namespace enables different applications and frameworks to easily interact with data from different storage systems.” - Qunar
RESULTS
• Spark Streaming, Spark batch, and Flink jobs share data efficiently
• Improved the performance of their system with 15x to 300x speedups
• Tiered storage feature manages storage resources including memory, SSD, and disk
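Tiered storage across memory, SSD, and disk can be pictured as a stack of fixed-size tiers where new blocks land in the fastest tier and the least recently used block is pushed down when a tier fills up. The Python sketch below is a toy model under that assumption; the `TieredStore` class, the tier capacities, and the LRU policy are hypothetical and do not describe Alluxio's actual eviction logic.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store with LRU demotion from fast tiers to slow ones."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, block_id, data):
        # Drop any stale copy, then write into the fastest tier.
        for _name, _cap, blocks in self.tiers:
            blocks.pop(block_id, None)
        self._insert(0, block_id, data)

    def _insert(self, level, block_id, data):
        if level == len(self.tiers):
            return  # fell off the slowest tier; this toy model drops it
        _name, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        if len(blocks) > cap:
            # Demote the least recently used block to the next tier down.
            victim, victim_data = blocks.popitem(last=False)
            self._insert(level + 1, victim, victim_data)

    def locate(self, block_id):
        # Return the name of the tier holding the block, or None.
        for name, _cap, blocks in self.tiers:
            if block_id in blocks:
                blocks.move_to_end(block_id)  # mark as recently used
                return name
        return None


# Hypothetical capacities, counted in blocks.
store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 4)])
for i in range(5):
    store.put(f"b{i}", b"x")

print([store.locate(f"b{i}") for i in range(5)])
```

With these capacities the two newest blocks stay in memory and older blocks drift toward the slower tiers, which is the basic trade between speed and capacity that a memory + HDD deployment relies on.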
![Page 23: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/23.jpg)
ALLUXIO, INC PRODUCT OFFERINGS
Capability/Value (increasing):
• Alluxio Open Source (Technology Validation): Open Source
• Alluxio Community Edition (ACE) (Accelerate Adoption): Open Source + Alluxio Manager
• Alluxio Enterprise Edition (AEE) (Enterprise Deployment): Open Source + Alluxio Manager + Kerberos Authentication, Data Replication, Support
![Page 24: Alluxio: Unify Data at Memory Speed; 2016-11-18](https://reader033.vdocuments.mx/reader033/viewer/2022042723/5875bb811a28ab33128b46b9/html5/thumbnails/24.jpg)
GOING INTO THE FUTURE