transtec & BeeGFS Whitepaper

OVERVIEW

> transtec AG and BeeGFS

For more than 35 years, transtec has been a vendor and service provider on the German market, and is now also represented in seven further European countries. Over the years, the direct-selling operation in Germany has grown into an international business that does much more than manufacture, sell and distribute hardware. Today, transtec operates highly successfully in the fields of High Performance Computing, Storage, Virtualization and Consolidation, and as such has developed into a pan-European solution provider.

BeeGFS is a parallel cluster file system developed and maintained by ThinkParQ. Its developers focused on performance, easy installation and easy management: things a lot of our customers ask for. We therefore decided to create this paper to give customers and interested readers an overview of a possible configuration and the performance achievable with a BeeGFS solution.

transtec is a registered gold partner of ThinkParQ and installed the first Petabyte BeeGFS system worldwide.

> About

This document presents a summary of the results of a series of benchmarks. The goal was to measure the data streaming and I/O operations throughput of a possible turn-key solution from transtec AG.

The benchmarks were executed on transtec hardware. For the BeeGFS storage and metadata services, we used two transtec CALLEO Application Server 4280H machines. As compute nodes, a CALLEO High-Performance Server 2880 was used; this is a blade system that holds four server nodes. The BeeGFS management service ran on a central server that is also used as the cluster manager.

CONTACTS

transtec AG

www.transtec.de | [email protected]

Tel. +49 (0)7121 2678 - 400

ThinkParQ BeeGFS

www.thinkparq.com | www.beegfs.com


SYSTEM DESCRIPTION

> Storage and Metadata Server system configuration

The transtec CALLEO Application Server 4280H

Link: shop.transtec.de/SA4280A336R-calleo-application-server-4280h

Each of the two servers contains the following components:

2x Intel(R) Xeon(R) CPU E5-2667 V3 @ 3.20 GHz

64 GB RAM @ 2133 MHz

2x Seagate 1200 SAS SSD, 200GB (ST200FM0053)

> Configured as RAID1 for metadata

34x Seagate Enterprise Capacity SAS HDD, 2TB (ST2000NM0034)

> Configured as 3x RAID6 with 11 disks per RAID

> 1 disk configured as global hot spare

AVAGO MegaRAID SAS 9361-8i

Mellanox FDR ConnectX-3 MT27500 InfiniBand controller

CentOS 7.2

Mellanox OFED stack (from CentOS 7.2 repo)

BeeGFS Storage and Metadata Service version 2015.03-r11

> Compute and Client Nodes configuration

The transtec CALLEO High-Performance Server 2880

Link: shop.transtec.de/calleo-high-performance-server-2880

Each of the four compute nodes contains the following components:

2x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

128 GB RAM @ 2400 MHz

Mellanox FDR ConnectX-3 MT27500 InfiniBand controller

Stateless CentOS 7.1

Mellanox OFED stack (from CentOS 7.1 repo)

BeeGFS Client Service version 2015.03-r11

> Benchmarks

The benchmark tools used in the experiment are listed below:

IOR-3.0.1: for measuring the sustained throughput of the BeeGFS storage service

mdtest-1.9.4: for measuring the performance of the BeeGFS metadata service

IOR and mdtest have been compiled with openmpi/1.10.0-gcc4.9.2.
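
As a rough sketch, assuming a module-based environment and the tools' default source layouts (the module name is taken from above; directory names and build steps are assumptions, not taken from the benchmark environment), the build could look like this:

# Load the MPI toolchain used for both tools
module load openmpi/1.10.0-gcc4.9.2

# IOR 3.0.1 ships an autotools build; point it at the MPI compiler wrapper
cd ior-3.0.1
./configure CC=mpicc && make

# mdtest 1.9.4 is a single C source file, compiled directly with mpicc
cd ../mdtest-1.9.4
mpicc -O2 -o mdtest mdtest.c -lm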


> Topology map

Schema 2: Topology map [diagram: the two BeeGFS meta and storage servers and the BeeGFS management server are connected through an Ethernet switch; the 4x compute nodes reach the storage servers through an InfiniBand switch]


CONFIGURATION AND TUNING

> BeeGFS storage server

Formatting Options Value

RAID stripe size per disk (KB) 256

RAID level 6

Disks per RAID volume 9+2

Disk array local Linux file system XFS

Partition Alignment GPT

XFS Mount Options (/etc/fstab) Value

Last File and Directory Access noatime, nodiratime

Log buffer tuning logbufs=8, logbsize=256k

Streaming performance optimization largeio, inode64, swalloc

Streaming write throughput allocsize=131072k

Write Barriers nobarrier
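
Put together, the formatting and mount options above translate into a volume setup roughly like the following (device name and mount point are placeholders; su/sw are chosen to match the 256 KB stripe size and the 9 data disks of each RAID6 volume):

# Sketch: creating and mounting one storage target (device and mount point assumed)
mkfs.xfs -d su=256k,sw=9 -L stor01 /dev/sdb
mkdir -p /data/stor01
echo "LABEL=stor01 /data/stor01 xfs noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier 0 0" >> /etc/fstab
mount /data/stor01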

IO Scheduler Options (/sys/block/<dev>/queue/) Value

Scheduler deadline

nr_requests 4096

read_ahead_kb 32768

Virtual Memory Settings (/proc/sys/vm/) Value

dirty_background_ratio 5

dirty_ratio 10

vfs_cache_pressure 50

min_free_kbytes 262144

CPU Frequency Settings Value

/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor performance
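
A minimal sketch that applies the scheduler, virtual memory and CPU frequency settings above at runtime (the block device names are assumptions; on a production system these would typically go into a boot-time script or a tuning profile):

# Sketch: applying the runtime tunables (device names assumed; one pass per RAID volume)
for dev in sdb sdc sdd; do
    echo deadline > /sys/block/$dev/queue/scheduler
    echo 4096 > /sys/block/$dev/queue/nr_requests
    echo 32768 > /sys/block/$dev/queue/read_ahead_kb
done

# Virtual memory settings from the table above
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
sysctl -w vm.vfs_cache_pressure=50
sysctl -w vm.min_free_kbytes=262144

# Pin all cores to the performance governor
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > $g
done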

BeeGFS Storage Service (/etc/beegfs/beegfs-storage.conf) Value

connMaxInternodeNum 40

tuneFileReadAheadSize 32m

tuneFileReadAheadTriggerSize 2m

tuneFileReadSize 512K

tuneFileWriteSize 512K

tuneNumWorkers 18

tuneWorkerBufSize 16m
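
These keys live in /etc/beegfs/beegfs-storage.conf (as noted in the table header); the corresponding excerpt would look roughly like this, followed by a restart of the beegfs-storage service:

connMaxInternodeNum          = 40
tuneFileReadAheadSize        = 32m
tuneFileReadAheadTriggerSize = 2m
tuneFileReadSize             = 512k
tuneFileWriteSize            = 512k
tuneNumWorkers               = 18
tuneWorkerBufSize            = 16m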


> BeeGFS meta server

Format Options Value

RAID Level 1

SSDs per RAID volume 2

Disk array local Linux file system ext4

Minimize access times for large directories -Odir_index

Large inodes -I 512

Number of inodes -i 2048

Large journal -J size=400

Extended attributes user_xattr

Partition Alignment GPT
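
Combined, the ext4 format options above correspond to a mkfs call roughly like the following (the device name is a placeholder; setting user_xattr as a default mount option via tune2fs is one possible way to apply the extended attributes entry):

# Sketch: formatting the metadata RAID1 volume (device name assumed)
mkfs.ext4 -O dir_index -I 512 -i 2048 -J size=400 /dev/sdX
tune2fs -o user_xattr /dev/sdX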

EXT4 Mount Options (/etc/fstab) Value

Last File and Directory Access noatime, nodiratime

Write Barriers nobarrier

IO Scheduler Options (/sys/block/<dev>/queue/) Value

Scheduler deadline

nr_requests 128

read_ahead_kb 128

BeeGFS Meta Service (/etc/beegfs/beegfs-meta.conf) Value

connMaxInternodeNum 32

tuneNumWorkers 0

tuneTargetChooser randomrobin
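
As with the storage service, these keys go into /etc/beegfs/beegfs-meta.conf; a corresponding excerpt would look roughly like this (to our understanding, tuneNumWorkers=0 lets the service size its worker pool automatically):

connMaxInternodeNum = 32
tuneNumWorkers      = 0
tuneTargetChooser   = randomrobin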

> BeeGFS client service

BeeGFS Client Service (/etc/beegfs/beegfs-client.conf) Value

connMaxInternodeNum 64

tuneRemoteFSync false

Parallel Network Requests Option Value

Chunk size 2M

IOR file-per-process: targets per file 3

IOR single-shared-file: targets per file 6

> BeeGFS striping settings

Command-Line:

beegfs-ctl --setpattern --chunksize=<chunk size> --numtargets=<targets per file> <path>
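
For example, with the values from the table above and the benchmark path used in the COMMANDS section, the striping would be set like this before each run series:

# file-per-process runs: 3 targets per file
beegfs-ctl --setpattern --chunksize=2m --numtargets=3 /scratch/bench

# single-shared-file runs: 6 targets per file (set before those runs)
beegfs-ctl --setpattern --chunksize=2m --numtargets=6 /scratch/bench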


RESULTS

> Benchmark Parameters

IOR (file-per-process):

8 benchmarks from 2^0 to 2^7 (1 to 128) processes, 5 repeats each

Every process writes and reads its own file

Files of variable size (960 GiB / number of processes)

Total size of data written/read = 960 GiB

Transfer size = 2 MiB

IOR (single-shared-file):

6 benchmarks from 2^1 to 2^6 (2 to 64) processes, 5 repeats each

All processes write and read the same shared file

Total size of data written/read = 640 GiB

Transfer size = 2 MiB

Mdtest:

8 benchmarks from 2^0 to 2^7 (1 to 128) processes, 5 repeats each

Total files created and statted: 1 million

MPI:

MPI was confi gured to split the running processes between all 4 nodes. Exception is

naturally the benchmarks running with 1 or 2 processes. On those benchmarks only 1

process was running per compute node


> Results IOR (file-per-process)

Chart 1: Write throughput (file-per-process) [chart: WRITE IOR (file-per-process), min/mean/max MiB/s over 1 to 128 processes]

Chart 2: Read throughput (file-per-process) [chart: READ IOR (file-per-process), min/mean/max MiB/s over 1 to 128 processes]

            ------- WRITE --------    -------- READ --------
Processes     Max      Min     Mean     Max      Min     Mean
    1       1887.0   1878.6  1882.3   2204.3   1925.9  2080.6
    2       3549.4   3544.6  3546.3   4122.2   3821.6  4010.2
    4       5690.8   5651.0  5678.9   5230.4   4943.5  5089.0
    8       5660.3   5558.3  5589.3   5155.5   4850.3  4983.6
   16       5539.9   5503.9  5520.5   5733.7   5383.4  5565.5
   32       5286.9   5242.2  5261.4   5860.6   5823.1  5844.8
   64       4757.0   4727.2  4738.1   5975.8   5824.1  5901.1
  128       4040.2   4004.1  4026.2   5915.2   5808.2  5860.6

Table 1: Read and write throughputs (MiB/s), file-per-process benchmark


> Results IOR (single-shared-file)

Chart 3: Write throughput (single-shared-file) [chart: WRITE IOR (single-shared-file), min/mean/max MiB/s over 2 to 64 processes]

Chart 4: Read throughput (single-shared-file) [chart: READ IOR (single-shared-file), min/mean/max MiB/s over 2 to 64 processes]

            ------- WRITE --------    -------- READ --------
Processes     Max      Min     Mean     Max      Min     Mean
    2       2097.1   2042.1  2059.6   3957.2   3652.5  3846.4
    4       3835.7   3763.4  3798.9   5021.3   4556.6  4769.3
    8       3960.7   3830.7  3889.1   4212.1   3762.3  3890.8
   16       4024.3   3890.6  3950.0   4871.2   3882.6  4401.0
   32       4039.6   3896.1  3992.1   4937.9   3647.2  4240.8
   64       4235.1   3995.3  4154.3   4332.5   3412.3  3957.4

Table 2: Read and write throughputs (MiB/s), single-shared-file benchmark


> Results mdtest

Chart 5: File creation throughput [chart: mdtest file creation, min/mean/max files/s over 1 to 128 processes]

Chart 6: File stat throughput [chart: mdtest file stat, min/mean/max stats/s over 1 to 128 processes]

            ------- CREATE -------    -------- STAT --------
Processes     Max      Min     Mean     Max      Min     Mean
    1         7290     5718    6731    28915    21608   26424
    2        14735    13547   13887    59287    49436   52923
    4        31131    28701   29944   118461   110927  115570
    8        53691    51219   52312   206813   199581  203559
   16        86437    79607   84207   340804   329923  336653
   32        93855    90888   93027   358610   347062  354744
   64        97402    93825   95638   292356   252831  274747
  128        86754    78000   82665   187561   183149  186103

Table 3: File creation and stat throughputs (operations per second), mdtest benchmark


COMMANDS

This section shows the commands that were used on the compute nodes to run the streaming and metadata benchmarks.

> IOR file-per-process

# file-per-process runs: doubling process counts, 5 iterations, 2 MiB transfers
for (( x=0; x <= 7; x++ ))
do
    N_PROCS=$((2**$x))
    mpirun -npernode ${N_PROCS} --bind-to none -- ./src/ior -a POSIX -i 5 -g -C -d 10 \
        -w -r -e -t 2m -F -b $((960/$N_PROCS))g -o /scratch/bench/ior | tee \
        ./outputs/ior_bench_fpp_${N_PROCS}.log
done

> IOR single-shared-file

# single-shared-file runs: all processes share one file, so no -F flag
for (( x=1; x <= 6; x++ ))
do
    N_PROCS=$((2**$x))
    mpirun -npernode ${N_PROCS} --bind-to none -- ./src/ior -a POSIX -i 5 -g -C -d 10 \
        -w -r -e -t 2m -b $((640/$N_PROCS))g -o /scratch/bench/ior | tee \
        ./outputs/ior_bench_ssh_${N_PROCS}.log
done

> Mdtest

# metadata runs: keep the total file count fixed by scaling files per directory
for (( x=0; x <= 7; x++ ))
do
    N_PROCS=$((2**$x))
    filesperdir=$((1000000/64/$N_PROCS))
    mpirun -np ${N_PROCS} ./mdtest -C -T -d /scratch/mdtest -i 5 -I ${filesperdir} -z \
        2 -b 8 -L -u -F
done