ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
Slide 1: An HDD-Based Lustre HSM Implementation Using a Scalable Object Store: Discussion and Performance Evaluation
LAD 2015
Maria Perez Gutierrez, Thomas Favre-Bulle, James Coomer, Shuichi Ihara, Robert Triendl
Slide 2: Agenda
▶ Motivation
▶ HSM idea
▶ Why an Object Storage Target
▶ Object Storage Solution Overview: WOS
▶ WOS Copytool Implementation
▶ Test Environment & Benchmarked Operations
▶ Results
▶ Summary
Slide 3: Motivation
▶ Current situation
• Robinhood has driven stronger HSM uptake for Lustre
• Most deployments archive to tape (e.g. HPSS) or to scale-out NAS
▶ But
• Object stores are faster than tape and simpler to manage than NAS; they can also deliver simple DR
▶ So
• We implemented a new, simple copytool targeting an object store (DDN WOS)
▶ And
• We compare it with other methods
Slide 4: HSM Idea
Archive: "Copy blocks for candidate files into the archive", e.g. when last access > 14 days.
Release: "Release the Lustre blocks for those files", e.g. when the filesystem is 70% full.
[Diagram: a Lustre namespace (/home, /work, /scratch) containing Files A through N; the inode table marks Files J, K, and L as "archived", their data now held in the archive tier.]
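As an illustration of this cycle (not from the slides), the transitions that a policy engine such as Robinhood drives can also be triggered by hand with the standard Lustre HSM commands; the path and archive ID here are hypothetical:

  lfs hsm_archive --archive 1 /work/fileJ   # copy the file's data into archive 1
  lfs hsm_release /work/fileJ               # free the Lustre OST blocks, keep metadata
  lfs hsm_state /work/fileJ                 # flags should now include "released exists archived"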
Slide 5: Why an Object Store Target for HSM?
▶ Eliminate the higher latency, lower throughput, and uncertainty associated with tape
▶ Easily implement data distribution to remote sites
▶ Flexible policies for data protection and distribution
▶ Much higher scalability and lower management overhead than an HSM NAS target
▶ Object stores are optimised for lower cost at scale
Slide 6: WOS: Overview
▶ WOS is an object storage platform that exposes a PUT/GET/DELETE API
▶ Object Disk Architecture: drives are formatted with a custom WOS disk object layout rather than a Linux filesystem, so there is no fragmentation and object reads and writes are fully contiguous for maximum disk efficiency
▶ WOS storage nodes can be distributed geographically to build a global storage cloud
▶ Data is stored as objects, each with an object ID (OID) and metadata, in a flat namespace
▶ Every PUT into WOS must request a POLICY
▶ POLICIES can be created to define where data is replicated and how it is erasure-coded across drives and sites (a usage sketch follows)
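To make the PUT-plus-policy flow concrete, here is a sketch using the wos_copy helper introduced on slide 9; the file path, the policy name, and the assumption that the command prints the resulting OID are illustrative:

  wos_copy put -s /tmp/data.bin -h <WOS_IP> -p PolicyA
  # stores the file under PolicyA and yields the object ID (OID);
  # the caller must retain the OID for any later GET or DELETE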
Slide 7: WOS: Overview (continued)
[Diagram: a three-zone WOS cluster, with copies of objects A, B, and C placed across Zones 1-3 according to their policies.]
A WOS cluster is a number of zones containing nodes. A policy defines how an object is stored: the number of replicas in each zone, and the method used to secure the data. For example:
• Policy A: Zone 1 = 1, Zone 2 = 1
• Policy B: Zone 2 = 1, Zone 3 = 1
• Policy C: Zone 1 = 2, Zone 2 = 1, Zone 3 = 1
Slide 8: WOS Copytool: Data Flow
[Diagram: the HSM coordinator on the Lustre MDS sends copytool control messages to the copytool running on an agent node; for an archive, the agent reads file data from the Lustre OSSs and PUTs it into the object store, and GETs it back on restore.]
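For context (not shown on the slide), the coordinator here is the standard Lustre HSM coordinator, which must be enabled on the MDT before any copytool agent can register; a typical setup step looks like this, with <fsname> standing in for the actual filesystem name:

  lctl set_param mdt.<fsname>-MDT0000.hsm_control=enabled   # start the coordinator
  lctl get_param mdt.<fsname>-MDT0000.hsm_control           # verify it reports "enabled"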
Slide 9: WOS Copytool Implementation
▶ lhsmtool_wos
▶ Operates similarly to the POSIX copytool
▶ OPTIONS:
• -A archive_id : each copytool agent has a unique archive ID
• -c chunk_size : chunk size
• -d : run the WOS copytool in daemon mode
• -h WOS_IP
• -p WOS_POLICY
• -v : verbose
▶ Internal commands:
• wos_copy put -s INFILENAME -h WOS_IP -p WOS_POLICY
• wos_copy get -o OID -d OUTFILENAME -h WOS_IP -p WOS_POLICY
▶ After archiving, the file's xattr is populated with the OID to allow later retrieval:

  [root@testy test1C]# getfattr -n trusted.oid /lustre/testfile
  # file: lustre/testfile
  trusted.oid="CDDEjBNaBlkKSrCFqBNrgURVpMoD2D6e8hG0FYWG"
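A plausible daemon invocation assembled from the options above; the trailing mount-point argument and the concrete values are assumptions, modeled on the stock lhsmtool_posix copytool:

  lhsmtool_wos -d -v -A 1 -h <WOS_IP> -p <WOS_POLICY> /mnt/lustre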
Slide 10: Test Environment
[Diagram: four Lustre clients and four copytool agents; OSS1 and OSS2 serve a GS7K/ES7K with SS7000 expansion enclosures holding 120 NL-SAS drives and 4 SSDs; the agents reach the WOS nodes over 10GbE.]
Connectivity:
• FDR InfiniBand: 1 link from each OSS
• FDR InfiniBand: 1 link from each agent/client
• 10GbE: 1 link per agent
• 10GbE: 1 link per WOS node
Slide 11: Benchmarked Operations
[Diagram: the test environment annotated with the four measured operations: write to filesystem, read from filesystem, archive, and read from archive (restore file).]
• How fast can we move big files (throughput) and little files (software overhead per operation)?
• Aims: to keep up with the filesystem I/O, and to provide reasonable read rates for archived files
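While these operations run, per-file and coordinator-level progress can be followed with the standard Lustre HSM introspection commands (illustrative, not part of the slides):

  lfs hsm_state /mnt/lustre/bigfile    # archived/released flags for one file
  lfs hsm_action /mnt/lustre/bigfile   # the HSM action currently in progress, if any
  lctl get_param mdt.*.hsm.actions     # queued and active requests on the coordinator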
Slide 12: Results: Large Files - Throughput
▶ Using 4 clients/agents, single thread per client/agent
▶ Around 1.5 GB/s when archiving large files into WOS
[Bar chart: throughput (MB/s, scale 0-9000) by data movement type for large (100 MB) files: GPFS write, GPFS read, Lustre write, Lustre read, Archive (GPFS), Archive (Lustre), Archive read (GPFS), Archive read (Lustre).]
Slide 13: Results: Large Files - Throughput
▶ Using 4 clients/agents, multiple threads per client/agent
▶ Around 1.5 GB/s when archiving large files into WOS
▶ Aggregate reads from WOS: ~650 MB/s
[Bar chart: throughput (MB/s, scale 0-9000) by data movement type for large (100 MB) files, multithreaded case; same categories as the previous slide.]
Slide 14: Results: Medium and Large Files
[Bar chart: "4 clients: Medium & Large files", MB/s by file size and operation type; values reconstructed below.]

  File size - operation | GPFS archive (MB/s) | Lustre archive (MB/s)
  1 MB - reads          | 320                 | 317.7
  1 MB - writes         | 219                 | 617
  100 MB - reads        | 858                 | 643
  100 MB - writes       | 1781                | 1569
▶ Using 4 clients/agents, multiple threads per client/agent
▶ The WOS copytool provides higher bandwidth than the GPFS archive solution for 1 MB files
▶ For large files, the results come close to the GPFS-based solution
Slide 15: Results: Single Client
[Bar chart: "Single client - Max performance", MB/s; values reconstructed below.]

  Path                  | Read (MB/s) | Write (MB/s)
  GPFS (IB)             | 6088        | 5562
  Lustre (IB)           | 3925        | 4026
  GPFS archive (10Gb)   | 249         | 425
  Lustre archive (10Gb) | 236         | 450
▶ The WOS copytool's maximum single-client performance is aligned with the values delivered by the GPFS-based solution
Slide 16: Results: Small Files (4k) - Rates
▶ Using 4 clients/agents, with one thread per client/agent
▶ Around 400 files per second moved when archiving data
▶ The Lustre WOS copytool achieved higher rates for both archiving and restoring data
[Bar chart: files per second (scale 0-1800) by data movement type: GPFS write, GPFS read, Lustre write, Lustre read, Archive (GPFS), Archive (Lustre), Archive read (GPFS), Archive read (Lustre).]
Slide 17: WOS Copytool - Performance Scaling
[Bar chart: MB/s for 1 client vs. 4 clients, for 1 MB files and 100 MB files.]
▶ Going from 1 client to 4 clients, throughput scales by 3.56x and 3.48x for the two file sizes
Slide 18: Summary
▶ Promising data rates to the object store
▶ Next steps:
• Scalability tests: clients and copytool agents at scale
• Explore data protection options:
  o Enable DR within the object store (support a remote namespace)
  o Enable backup operations to the object store (immutable file copies)
Slide 19: Thank You
Questions?