Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Haoyuan Li, CEO, Alluxio, Inc.
Rebranded from Tachyon to Alluxio!
http://www.alluxio.com/blog/
About Alluxio
• Team
– Alluxio creators and top developers/committers (all top 8 committers)
• Investors
Performance Trend: Memory is Fast
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
• Memory-locality key to interactive response times
Price Trend: Memory is Cheaper
Source: jcmit.com
The Big Data Ecosystem Today
What is Alluxio?
• Alluxio: Memory Speed Virtual Distributed Storage
• Enables Virtualized Data Across Multiple Types of Storage
Open Source Community Growth
[Figure: number of contributors (from git commit history), Dec-2012 to Mar-2016, y-axis 0 to 350, with releases v0.2 through v0.8, v1.0, and v1.1 marked]
Open Source Alluxio System
• The fastest growing open source project in big data
• Over 250 contributors from over 100 organizations
Alluxio Benefits
• Flexibility
– Enable new workloads across any storage system
– Unified namespace enables applications to access data in any storage system
• Agility
– Work with the framework of your choice
– Work with the storage of your choice
• Performance
– High performance data access
• Cost
– Grow storage and compute independently
• Any application accesses any data from any storage at memory speed.
New Features and Improvements in Alluxio 1.0 and 1.1
Gene Pang @ Alluxio, Inc.
June 15, 2016 @ Alluxio Meetup (hosted by Intel)
About Me
• Gene Pang - Software Engineer @ Alluxio, Inc.
• One of the core maintainers of the Alluxio open source project
• Ph.D. @ AMPLab, UC Berkeley
• Worked at Google before UC Berkeley
• Twitter: @unityxx
Outline
• Alluxio Architecture Overview
• New Developments in Alluxio
• Performance Improvement Results in Alluxio 1.1
Alluxio Architecture Overview
Architecture Overview
• Alluxio Master (with Journal): manages metadata
• Alluxio Workers: serve data blocks
• Under File Systems: mount multiple storage systems
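One way to picture "mount multiple storage systems" is a mount table that maps a path in the unified namespace to a location in an under file system. The sketch below is only an illustration of that idea, not Alluxio's implementation: the `MountTable` class, its methods, and the example URIs are all hypothetical.

```python
from typing import Dict, Optional


class MountTable:
    """Toy mount table (hypothetical, not Alluxio code).

    Maps namespace prefixes to under-storage root URIs and resolves a
    namespace path by longest matching prefix.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, ufs_uri: str) -> None:
        # Normalize trailing slashes; keep "/" as the root mount point.
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> Optional[str]:
        best: Optional[str] = None
        for prefix in self._mounts:
            # Match whole path components only, so "/s3" does not match "/s3x".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            return None
        suffix = path[len(best):].lstrip("/")
        root = self._mounts[best]
        return f"{root}/{suffix}" if suffix else root


# Example usage with made-up storage locations.
table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")
table.mount("/s3", "s3a://bucket/data")
print(table.resolve("/s3/logs/a.txt"))
print(table.resolve("/tmp/x"))
```

Longest-prefix matching is used here so that a nested mount point takes precedence over the root mount; a real system would also need to handle unmounting, nested mount validation, and persistence of the table.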
Alluxio New Developments
Releases
Tachyon 0.8 – Oct 22, 2015
Alluxio 1.0 – Feb 23, 2016
Alluxio 1.1 – Jun 7, 2016
New Developments
• New Integrations
• Usability Improvements
• Performance Improvements
• Access Control (Alpha)
New Integrations
• Native OpenStack Swift Driver: improve performance, reduce complexity
• Alluxio to FUSE Connector: mount Alluxio to local file system
• Google Cloud Storage: manage data on Google Cloud Platform
• Aliyun Object Storage Service: manage data on Alibaba Cloud
• Google Compute Engine: deploy Alluxio on Google Cloud Platform
Access Control (Alpha)
• User/Group Support
• File System Permissions: similar to POSIX permission model
• Command-line Permission Tools: chown, chgrp, chmod
• Configuration Parameter: alluxio.security.authorization.permission.enabled
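As a reminder of what a POSIX-style permission model checks, here is a minimal sketch of the owner/group/other decision. It illustrates the general POSIX rule only and is not taken from Alluxio; the function name `can_access` and the example users and groups are hypothetical.

```python
import stat
from typing import Set

# Permission bits for each access type, in (owner, group, other) order.
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}


def can_access(mode: int, owner: str, group: str,
               user: str, user_groups: Set[str], want: str) -> bool:
    """Return True if `user` may perform `want` ('r', 'w' or 'x').

    Follows the usual POSIX order: the owner class is checked first,
    then the group class, then other. Only the first matching class
    applies, even if a later class would grant more access.
    """
    owner_bit, group_bit, other_bit = _BITS[want]
    if user == owner:
        return bool(mode & owner_bit)
    if group in user_groups:
        return bool(mode & group_bit)
    return bool(mode & other_bit)


# Example: a file with mode 640 (rw-r-----), owned by alice:analysts.
mode = 0o640
print(can_access(mode, "alice", "analysts", "alice", {"analysts"}, "w"))
print(can_access(mode, "alice", "analysts", "bob", {"analysts"}, "w"))
```

In this model, chown and chgrp change the `owner` and `group` inputs, and chmod changes `mode`.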
Usability Improvements
• Write Location Policies: configure how to write data to Alluxio
• Simplified Configuration: customize with properties
• Automatic Metadata Loading: load metadata automatically
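A write location policy decides which worker receives a new block. The slide gives no details, so the following is a sketch under stated assumptions: the `Worker` type, the two policies, and their names are illustrative and are not taken from the Alluxio code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    host: str
    free_bytes: int


def most_available_first(workers: List[Worker], block_size: int) -> Worker:
    """Pick the worker with the most free space that can hold the block."""
    candidates = [w for w in workers if w.free_bytes >= block_size]
    if not candidates:
        raise RuntimeError("no worker has enough free space")
    return max(candidates, key=lambda w: w.free_bytes)


class RoundRobin:
    """Cycle through workers, skipping any that cannot hold the block."""

    def __init__(self, workers: List[Worker]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self._workers = list(workers)
        self._next = 0

    def pick(self, block_size: int) -> Worker:
        # Try each worker at most once, starting from the saved position.
        for _ in range(len(self._workers)):
            worker = self._workers[self._next]
            self._next = (self._next + 1) % len(self._workers)
            if worker.free_bytes >= block_size:
                return worker
        raise RuntimeError("no worker has enough free space")


# Example usage with made-up workers.
workers = [Worker("a", 100), Worker("b", 500), Worker("c", 50)]
print(most_available_first(workers, 60).host)
rr = RoundRobin(workers)
print([rr.pick(60).host for _ in range(3)])
```

The two policies trade off differently: the first balances by capacity, the second spreads writes evenly across workers regardless of how much space each has left.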
Performance Improvements
• Improved Alluxio Master Scalability: fine-grained locking, efficient journaling
• Improved Alluxio Worker Scalability: improved data structures, improved locking
• Better Support for Random I/O Workloads: cache blocks during random I/O (e.g., Parquet files)
Alluxio 1.1 Performance Improvement Results
Create File Throughput (Local Journal)
[Figure: throughput vs. test duration for 1.0.1 and 1.1.0; 1.8x improvement]
Create File Throughput (Remote Journal)
[Figure: throughput vs. test duration for 1.0.1 and 1.1.0; 23x improvement]
List Directory Throughput
[Figure: throughput vs. test duration for 1.0.1 and 1.1.0; 7x improvement]
Worker Scalability
[Figure: write latency vs. number of blocks on worker for 1.0.1 and 1.1.0]
Try out Alluxio 1.1.0
http://www.alluxio.org/releases