TRANSCRIPT
Junichiro Shitami
Secure Data Reservoir: 70 Gbps fully encrypted data transfer facility
Junichiro Shitami
Goki Honjo
Kei Hiraki
Mary Inaba
The University of Tokyo
Agenda
• Background
• Our Goal
• Preliminary Evaluations
• Design of Secure Data Reservoir
• Evaluation on Real Network
• Conclusion
• Future Work
Background
• High-performance data transfer is essential
  • We have been working to efficiently utilize high-speed networks
• Data Reservoir Project
  • Started on a 1 Gbps Tokyo – US network
  • Achieved the Land Speed Record at that time
• We can get enough performance
  • for memory transfer, even on a 100G network
  • also for unencrypted storage transfer
[Chart: bandwidth-distance product (Tbit·meter/second), 2000–2018]
Change of target application
• The target applications have changed
  • Existing: physics, astronomy, graphics, video
  • Recent: bio-science, medical science
• We need fully encrypted transfer
  • for a very high-speed cell sorter called “Serendipiter”
  • bio-science is regulated by law
  • the traditionally used “scp” is very slow
[Photo: Serendipiter]
Our Goal
• Design a data transfer facility optimal for bio- and medical science
  • to enable fully encrypted storage transfer
  • using one pair of ordinary Linux servers
  • utilizing all available bandwidth on intercontinental networks
  • named “Secure Data Reservoir”
• Verify the performance
  • on a high-speed, long-distance network
  • on a real network
  • with a pacing technique
Preliminary Evaluations
• We made several preliminary evaluations
  • to inform the design of Secure Data Reservoir
• Evaluations
  • Network performance
  • RAID performance
  • Storage performance
  • Encryption performance
Network Performance
          Throughput   CPU load
Send      99 Gbps      75 %
Receive   99 Gbps      90 %
• We plan to use a Chelsio 100 Gbps NIC (T62100-LP-CR)
• Confirm basic TCP send/receive performance
  • on a back-to-back network
  • using iperf3 (see the sketch below)
• Result
  • Wire-rate performance
  • Receiver CPU load is higher than the sender’s
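For concreteness, a back-to-back check like this can be scripted around iperf3; a minimal sketch, assuming a server is already running on the peer with “iperf3 -s” (the host name, duration, and stream count are illustrative, not the talk’s actual settings):

```python
# Minimal sketch: drive an iperf3 TCP test and parse its JSON report.
# Assumes iperf3 is installed and "iperf3 -s" runs on the peer;
# "peer-host" is a placeholder, not from the talk.
import json
import subprocess

result = subprocess.run(
    ["iperf3",
     "-c", "peer-host",   # connect to the iperf3 server
     "-t", "30",          # run for 30 seconds
     "-P", "4",           # 4 parallel TCP streams (illustrative)
     "-J"],               # emit a JSON report
    capture_output=True, text=True, check=True)

report = json.loads(result.stdout)
bps = report["end"]["sum_received"]["bits_per_second"]
print(f"throughput: {bps / 1e9:.1f} Gbps")
```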
RAID Performance
                      Read        CPU load   Write       CPU load
Linux software RAID   6729 MB/s   28.2 %     2059 MB/s   15.9 %
VROC RAID             6751 MB/s   28.3 %     2087 MB/s   16.2 %
• We plan to use Intel VROC RAID
• Compare Intel VROC RAID and Linux software RAID (mdadm assembly sketched below)
  • on the same SSDs
  • using the fio benchmark
• Result
  • Almost no difference
  • We can use any SSDs regardless of VROC support
[Diagram: Intel VROC RAID vs. Linux software RAID]
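For reference, the Linux software RAID side of this comparison is assembled with mdadm; a minimal sketch, where the device paths and the RAID level are assumptions (the talk does not state them):

```python
# Minimal sketch: assemble a Linux software RAID array over 8 NVMe
# SSDs with mdadm. Device paths and RAID level are assumptions; the
# talk only says "Linux software RAID" on the same SSDs.
import subprocess

devices = [f"/dev/nvme{i}n1" for i in range(8)]
subprocess.run(
    ["mdadm", "--create", "/dev/md0",
     "--level=0",                      # striping for throughput (assumed)
     f"--raid-devices={len(devices)}",
     *devices],
    check=True)
```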
Storage Performance
        Throughput   CPU load
Read    180 Gbps     100 %
Write   95 Gbps      100 %
• We plan to use the Samsung SSD 960 PRO (NVMe)
• Confirm basic read/write performance
  • using 8 SSDs per server (Linux software RAID)
  • using the fio benchmark (see the sketch below)
• Result
  • Sufficient performance for data transfer on a 100 Gbps network
  • Write CPU load is much higher than read
[Diagram: Linux software RAID over 8 SSDs (4 × 2), on two ASUS HYPER M.2 X16 cards]
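A minimal sketch of the kind of fio run behind these numbers; every job parameter below (block size, queue depth, target device, duration) is an assumption, since the talk only names fio:

```python
# Minimal sketch: sequential-read fio run against the RAID device,
# parsing the JSON output for bandwidth. All parameters are assumed.
import json
import subprocess

result = subprocess.run(
    ["fio", "--name=seqread",
     "--filename=/dev/md0",      # software-RAID device (assumed path)
     "--rw=read",                # sequential read ("write" for writes)
     "--bs=1M",                  # 1 MiB blocks
     "--iodepth=32",             # queued async I/O
     "--ioengine=libaio",
     "--direct=1",               # bypass the page cache
     "--runtime=30", "--time_based",
     "--output-format=json"],
    capture_output=True, text=True, check=True)

job = json.loads(result.stdout)["jobs"][0]
print(f"read: {job['read']['bw'] * 1024 * 8 / 1e9:.1f} Gbps")  # 'bw' is KiB/s
```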
Encryption Performance
                         64 bytes    1024 bytes   8192 bytes   CPU load
Intel AES-NI             29.0 GB/s   80.4 GB/s    98.8 GB/s    100 %
Chelsio Crypto Offload   28.6 GB/s   81.2 GB/s    98.9 GB/s    100 %
• We plan to use the Crypto Offload function of the Chelsio NIC
• Compare Chelsio Crypto Offload and Intel AES-NI (CPU)
  • using the openssl benchmark (see the sketch below)
• Result
  • Almost no difference
  • Chelsio Crypto Offload does not reduce CPU load
[Diagram: Intel AES-NI encryption vs. Chelsio Crypto Offload]
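The openssl benchmark here is presumably `openssl speed`; a minimal sketch for AES-256 through the EVP layer, which is the path that picks up AES-NI (the cipher mode is an assumption — the talk does not name it):

```python
# Minimal sketch: OpenSSL's built-in cipher benchmark for AES-256.
# The EVP path ("-evp") uses AES-NI automatically on supporting CPUs.
# The mode (CBC here) is an assumption.
import subprocess

for size in ("64", "1024", "8192"):   # buffer sizes from the table
    subprocess.run(
        ["openssl", "speed", "-evp", "aes-256-cbc", "-bytes", size],
        check=True)
```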
Design of Secure Data Reservoir
• Design components
  • Chelsio 100G NIC
  • Samsung SSD 960 PRO (8 SSDs per server)
  • Linux software RAID
  • Intel AES-NI encryption
• Features
  • AES-256 encryption
  • Dedicated storage threads and network threads (see the sketch below)
  • Pacing technique to stabilize TCP streams
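The feature list suggests a pipeline in which a storage thread reads and encrypts data while a network thread transmits it; a minimal sketch of that split, assuming a bounded queue between the stages and AES-256-GCM as the mode (the talk specifies AES-256 and the thread split, but not this level of detail):

```python
# Minimal sketch of the storage-thread / network-thread split with
# AES-256 encryption between the stages. Chunk size, queue depth,
# and the GCM mode are assumptions; the talk names only AES-256 and
# the dedicated-thread design.
import queue
import socket
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK = 4 * 1024 * 1024            # 4 MiB read unit (assumed)
pipe = queue.Queue(maxsize=16)     # bounded hand-off between threads

def storage_thread(path: str, key: bytes) -> None:
    """Read the file and encrypt each chunk (AES-256-GCM, assumed)."""
    aead = AESGCM(key)             # key must be 32 bytes for AES-256
    with open(path, "rb") as f:
        seq = 0
        while chunk := f.read(CHUNK):
            nonce = seq.to_bytes(12, "big")   # unique nonce per chunk
            pipe.put(aead.encrypt(nonce, chunk, None))
            seq += 1
    pipe.put(None)                            # end-of-stream marker

def network_thread(sock: socket.socket) -> None:
    """Drain the queue and push ciphertext onto the TCP stream."""
    while (buf := pipe.get()) is not None:
        sock.sendall(buf)

# Wiring (illustrative):
#   threading.Thread(target=storage_thread, args=(path, key)).start()
#   threading.Thread(target=network_thread, args=(sock,)).start()
```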
System Configuration
[Diagram: two Linux servers, each with CPUs, 8 × M.2 NVMe SSDs, a storage thread, encryption, a network thread, and a 100G NIC, connected over the network]

CPU       Intel Xeon Scalable Gold 6144 (8 cores, 3.5 GHz) × 2
Memory    DDR4-2666, 192 GB
Network   Chelsio T62100-LP-CR (100 Gbps NIC)
Storage   Samsung SSD 960 PRO (512 GB) × 8, ASUS HYPER M.2 X16
OS        Linux 4.9.88
Evaluation on Real Network

[Map: folded routes from the SC18 venue in Dallas to Seattle, Tokyo, Singapore, and Los Angeles over TransPAC, JGN, and SingAREN; sender and receiver are both at the venue]

Route                  RTT      Folded RTT
Dallas – Seattle       41 ms    82 ms
Dallas – Tokyo         137 ms   274 ms
Dallas – Singapore     222 ms   444 ms
Dallas – Los Angeles   403 ms   806 ms
Experimental Setup
• 2 servers (sender, receiver) placed at the SC18 venue
  • using a folded network
• Experiments on 3 routes
  • Dallas – Tokyo
  • Dallas – Singapore
  • Dallas – Los Angeles
• Encrypted memory and storage transfer
• Sender-side traffic pacing technique (one possible implementation is sketched below)
[Diagram: folded network — sender and receiver sit side by side at the venue; data and ACKs travel out and back over VLAN A and VLAN B, so each stream traverses the long-haul path twice (hence the folded RTT)]
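The talk does not say how the sender-side pacing is implemented; one standard Linux mechanism is the SO_MAX_PACING_RATE socket option (honored by the fq qdisc / TCP internal pacing), sketched below. The constant, host, port, and rate are assumptions:

```python
# Minimal sketch of sender-side TCP pacing on Linux via
# SO_MAX_PACING_RATE. This is one common mechanism; the talk does not
# specify which pacing technique Secure Data Reservoir actually uses.
import socket

SO_MAX_PACING_RATE = 47            # from <asm-generic/socket.h> (Linux)

def open_paced_stream(host: str, port: int, rate_gbps: float) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    rate_bytes = int(rate_gbps * 1e9 / 8)      # cap in bytes per second
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate_bytes)
    sock.connect((host, port))
    return sock

# e.g. pace each of the 8 streams to a fraction of the expected total;
# "peer-host", the port, and 10 Gbps are placeholders.
stream = open_paced_stream("peer-host", 5001, rate_gbps=10.0)
```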
Result: Dallas – Tokyo

[Map: Dallas (SC18 venue) – Tokyo route; folded RTT is 274 ms]
Result: Dallas – Tokyo (RTT 274 ms)
Encrypted Memory-to-Memory Transfer
• ~5 seconds to peak throughput
• ~15 Gbps
• Throughput oscillates due to the TCP window size limitation
[Graph: throughput (Gbps) vs. time (sec)]
Result: Dallas – Tokyo (RTT 274 ms)
Encrypted Storage-to-Storage Transfer
• ~11 Gbps
• Throughput is limited by storage performance
[Graph: throughput (Gbps) vs. time (sec)]
Result: Dallas – Tokyo (RTT 274 ms)
Encrypted Storage-to-Storage Transfer × 8 (without pacing)
• ~74 Gbps
• Throughput is not stable
• The difference between streams is large
[Graph: per-stream and total throughput (Gbps) vs. time (sec), 8 streams, without pacing]
Result: Dallas – Tokyo (RTT 274 ms)
Encrypted Storage-to-Storage Transfer × 8 (with pacing)
• ~76 Gbps
• The pacing technique improves stability
• The difference between streams decreases
[Graph: per-stream and total throughput (Gbps) vs. time (sec), 8 streams, with pacing]
Result: Dallas – Singapore

[Map: Dallas (SC18 venue) – Singapore route; folded RTT is 444 ms]
Result: Dallas – Singapore (RTT 444 ms)
Encrypted Storage-to-Storage Transfer × 8 (without pacing)
• ~70 Gbps
• Throughput is not stable
• Throughput degrades due to retransmissions
[Graph: per-stream and total throughput (Gbps) vs. time (sec), 8 streams, without pacing]
Result: Dallas – Singapore (RTT 444 ms)
Encrypted Storage-to-Storage Transfer × 8 (with pacing)
• ~74 Gbps
• The pacing technique improves stability
• The difference between streams decreases
[Graph: per-stream and total throughput (Gbps) vs. time (sec), 8 streams, with pacing]
Result: Dallas – Los Angeles

[Map: Dallas (SC18 venue) – Los Angeles route; folded RTT is 806 ms]
Result: Dallas – Los Angeles (RTT 806 ms)
Encrypted Storage-to-Storage Transfer × 8
• ~40 Gbps
• Throughput oscillates due to the TCP window size limitation
[Graph: per-stream and total throughput (Gbps) vs. time (sec), 8 streams]
TCP window problem
• The standard TCP window size is at most 1 GB
  • The TCP window-scale option caps the shift at 14, so the window cannot exceed 65535 × 2^14 ≈ 1 GiB (see the worked check below)
  • This is too small for a very long-distance, high-speed network
  • like the Dallas – Los Angeles route
• We developed the LFTCP protocol
  • overcomes the TCP window size limitation
  • out of the scope of this presentation
[Graph: throughput (Gbps) vs. RTT (ms)]
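To see why the window is the bottleneck, compare the maximum TCP window with the bandwidth-delay product of the Dallas – Los Angeles path; a short worked check (the 100 Gbps target and the 806 ms folded RTT are from the talk, the window-scale cap is from RFC 7323):

```python
# Worked check: maximum TCP window vs. bandwidth-delay product (BDP).
# RFC 7323 caps the TCP window-scale shift at 14.
max_window = 65535 * 2**14           # ≈ 1.07e9 bytes, i.e. ~1 GiB

# BDP for 100 Gbps at the 806 ms folded RTT:
bdp = (100e9 / 8) * 0.806            # ≈ 1.0e10 bytes, i.e. ~10 GB

print(f"max TCP window: {max_window / 1e9:.2f} GB")   # 1.07 GB
print(f"required window (BDP): {bdp / 1e9:.2f} GB")   # 10.08 GB
# One stream's window covers only ~1/10 of this path's BDP, so a
# single TCP stream cannot keep the pipe full at this RTT.
```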
Result Summary
Total of 8 streams, encrypted storage transfer
Route                                 RTT      Peak Throughput   Average Throughput
Dallas – Tokyo (without pacing)       274 ms   90.5 Gbps         74.3 Gbps
Dallas – Tokyo (with pacing)          274 ms   83.6 Gbps         76.3 Gbps
Dallas – Singapore (without pacing)   444 ms   80.6 Gbps         70.2 Gbps
Dallas – Singapore (with pacing)      444 ms   79.6 Gbps         74.1 Gbps
Dallas – Los Angeles                  806 ms   50.6 Gbps         39.7 Gbps
• The pacing technique improves stability
  • The difference between peak and average throughput decreases
• Dallas – Los Angeles is too long
  • Total throughput is limited by the TCP window size
Comparison with Emulated Delay
• Compare the results
  • on the SC18 network (real network)
  • on an emulated-delay network
• Memory and storage transfer (1 stream)
  • due to the performance limit of the delay emulation
• The network configurations are not exactly the same
Emulated Delay Setup
• Delay is emulated by a Linux server (see the sketch below)
• Maximum throughput is ~70 Gbps
  • depends on the traffic pattern
[Diagram: the two Linux servers from the system configuration, with a third Linux server (100G NIC) in between performing delay emulation]
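A common way to emulate delay on a Linux box is the netem qdisc; a minimal sketch (whether the talk’s emulator used netem is an assumption, and “eth0” is a placeholder interface name):

```python
# Minimal sketch: add a fixed delay on the forwarding interface with
# the netem qdisc. Applying 275 ms to one direction adds 275 ms to
# the round-trip time; whether the talk used netem is an assumption.
import subprocess

subprocess.run(
    ["tc", "qdisc", "add", "dev", "eth0", "root",
     "netem", "delay", "275ms"],
    check=True)
```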
Result: RTT 275 ms (Dallas – Tokyo)
Encrypted Memory-to-Memory Transfer
• Same behavior
[Graph: throughput (Gbps) vs. time (sec), SC18 vs. emulated delay]
Result: RTT 275 ms (Dallas – Tokyo)
Encrypted Storage-to-Storage Transfer
• Average throughput is nearly the same
• The start-up behavior is different
[Graph: throughput (Gbps) vs. time (sec), SC18 vs. emulated delay]
Discussion about Emulated Delay
• Emulated delay works for 1-stream transfer
  • Large traffic causes significant performance degradation
  • The pacing technique may be useful here
• Average throughput is the same
• The start-up behavior is different
  • The impact is large on storage transfer
Conclusion
• We designed “Secure Data Reservoir”
• We achieved 70 Gbps fully encrypted data transfer
  • on a real intercontinental network
• The pacing technique improves stability
  • CPU load on the receiver side is higher than on the sender side
• Throughput may be limited by the TCP window size
  • The LFTCP protocol overcomes this limitation
Current Work
• We participated in the SCA19 Data Mover Challenge
• Hardware is critical to performance
[Bar chart: throughput (Gbps) by route — Data Mover Challenge (provided hardware): Singapore – Chicago (RTT 249 ms), Singapore – South Korea (RTT 406 ms), Singapore – Australia (RTT 334 ms), Singapore – Japan (RTT 84 ms); SC18 (dedicated hardware): Dallas – Tokyo (RTT 274 ms), Dallas – Singapore (RTT 444 ms), Dallas – Los Angeles (RTT 806 ms)]
Future Work
• Apply Secure Data Reservoir to other large-scale medical applications
  • currently targeted at “Serendipiter” (a very high-speed cell sorter)
• Improve the user interface
  • the current software is very experimental
• Further performance improvement for 400 Gbps networks