
Page 1:

CASPUR / GARR / CERN / CNAF / CSP

New results from CASPUR Storage Lab

Andrei Maslennikov
CASPUR Consortium

June 2003

Page 2:

Participated:

CASPUR : M.Goretti, A.Maslennikov(*), M.Mililotti, G.Palumbo

ACAL FCS (UK) : N. Houghton

GARR : M.Carboni

CERN : M.Gug, G.Lee, R.Többicke, A.Van Praag

CISCO : L.Pomelli

CNAF : P.P.Ricci, S.Zani

CSP Turin : R.Boraso

Nishan (UK) : S.Macfall

(*) Project Coordinator

Page 3:

Sponsors:

E4 Computer (Italy) : Loaned 6 SuperMicro servers (motherboards and assembly) - excellent hardware quality and support

Intel : Donated 12 x 2.8 GHz Xeon CPUs

San Valley Systems : Loaned two SL1000 units - good remote CE support during tests

ACAL FCS / Nishan : Loaned two 4300 units - active participation in tests, excellent support

CISCO Systems : Loaned two MDS 9216 switches with FCIP modules - active participation in tests, excellent support

Page 4:

Contents

• Goals

• Components and test setup

• Measurements:
  - SAN over WAN
  - NAS protocols
  - IBM GPFS
  - Sistina GFS

• Final remarks

• Vendors’ contact info

Page 5:

Goals for these test series

1. Feasibility study for a SAN-based Distributed Staging System

2. Comparison of the well-known NAS protocols on the latest commodity hardware

3. Evaluation of the new versions of IBM GPFS and Sistina GFS as a possible underlying technology for a scalable NFS server.

Page 6:

Feasibility study for a SAN-based Distributed Staging System

- Most large centers keep the bulk of their data on tape and use some kind of disk caching (staging, HSM, etc.) to access these data.

- Sharing datastores between several centers is frequently requested, and this means that some mechanism should be proposed that allows access to data stored on remote tapes.

- Suppose now that your centre has implemented a tape<->disk migration system, and you have to extend it so that it can access data located on remote tape drives.

Let us see how this can be achieved.

Remote Staging

Page 7:

Solution 1: To access a remote tape file, stage it on a remote disk, then copy it via the network to the local disk.

Remote Staging

[Diagram: Local Site / Remote Site - the file is staged from the remote tape to a remote disk, then copied over the network to the local disk.]

Disadvantages:

- 2-step operation: more time is needed, harder to orchestrate
- Wasted remote disk space

Page 8:

Solution 2: Use a "tape server": a process residing on a remote host that has access to the tape drive. The data are read remotely and then "piped" via the network directly to the local disk.

Remote Staging

[Diagram: Local Site / Remote Site - a tape server process on the remote host reads the tape and pipes the data over the network directly to the local disk.]

Disadvantages:

- a remote machine is needed
- the architecture is quite complex

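As a rough illustration of Solution 2, the tape server can be as simple as a remote reader process whose output is piped over the network into a local writer. A minimal sketch, assuming ssh access to a hypothetical remote data mover and the usual SCSI tape device name (host and paths are illustrative, not from the slides):

    # Remote reader streams the tape file; local writer lands it
    # directly on the staging disk.
    ssh tapeserver.remote.site "dd if=/dev/nst0 bs=256k" \
        | dd of=/stage/pool/file.dat bs=256k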

Page 9:

Solution 3: Access the remote tape drive as a native device on the SAN. Then use it as if it were a local unit attached to one of your local data movers.

Remote Staging

[Diagram: Local Site / Remote Site - the remote tape drive is attached over the SAN and used directly by the local data mover, alongside the local disk.]

Benefits:

- Makes the staging software a lot simpler: the local, field-tested solution applies.
- Best performance guaranteed (provided the remote drive can be used locally at native speed)

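With the SAN interconnect in place, no pipe is needed: the staging software addresses the remote drive exactly as it would a local one. A minimal sketch, assuming the WAN-attached FC drive shows up under an ordinary local device name (device and path are assumptions):

    # The WAN-attached FC drive looks like a plain local SCSI tape.
    mt -f /dev/nst0 rewind                             # position the remote drive
    dd if=/dev/nst0 of=/stage/pool/file.dat bs=256k    # read at native speed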

Page 10:

"Verify whether FC tape drives may be used at native speeds over the WAN, using the SAN-over-WAN interconnection middleware"

- In 2002, we had already tried to reach this goal. In particular, we used the CISCO 5420 iSCSI appliance to access an FC tape over a 400 km distance. We were able to write at the native speed of the drive, but the read performance was very poor.

- This year, we were able to assemble a setup that implements a symmetric SAN interconnection, and we used it to repeat these tests.

Remote Staging

Page 11:

Benchmark the well-known NAS protocols on modern commodity hardware.

- We run these tests on a regular basis, as we wish to know what performance we can currently count on, and how the different protocols compare on the same hardware base.

- Our test setup was visibly more powerful than last year's, so we were expecting to obtain better numbers.

- We compared two remote copy protocols, RFIO and Atrans (cacheless AFS), and two protocols that provide transparent file access, NFS and AFS.

NAS protocols

Page 12:

Evaluate the new versions of IBM GPFS and Sistina GFS asa possible underlying technology for a scalable NFS server.

- In 2002, we had already tried both GPFS and GFS.

- GFS 5.0 showed interesting performance figures, but we observed several issues with it: unbalanced performance with multiple clients, and an exponential increase of the load on the lock server as the number of clients grew.

- GPFS 1.2 showed poor performance with concurrent writing on several storage nodes.

- We used GFS 5.1.1 and GPFS 1.3.0-2 during this test session.

Scalable NFS Server

Page 13:

- High-end Linux units for both servers and clients:
  6x SuperMicro Superserver 7042M-6 and 2x HP Proliant DL380, each with:
    2x Pentium IV Xeon 2.8 GHz CPUs
    SysKonnect 9843 Gigabit Ethernet NIC (fibre)
    Qlogic QLA2300 2 Gbit Fibre Channel HBA
    Myrinet HBA

- Disk systems:
  4x Infortrend IFT-6300 IDE-to-FC arrays, each with:
    12x Maxtor DiamondMax Plus 9 200 GB IDE disks (7200 rpm)
    Dual Fibre Channel outlet at 2 Gbit
    Cache: 256 MB

Components

Page 14:

- Tape drives: 4x LTO/FC (IBM Ultrium 3580)

- Network: 12-port NPI Keystone GE switch (fibre)

28-port Dell 5224 GE switches (fibre / copper)

Myricom Myrinet 8-port switch

Fast geographical link (Rome-Bologna, 400 km), with guaranteed throughput of 1 Gbit.

- SAN: Brocade 2400, 2800 (1 Gbit) and 3800 (2 Gbit) switches

SAN Valley Systems SL1000 IP-SAN Gateways

Nishan IPS 4300 multiprotocol IP Storage Switches

CISCO MDS 9216 switches with DS-X9308-SMIP module

Components -2

Page 15:

New devices

We were loaned three new devices: from San Valley Systems, from Nishan Systems, and from CISCO Systems. All these units provide a SAN-over-IP interconnect function and are suitable for wide-area SAN connectivity.

Let me give some more detail on these units.

Components -3

Page 16:

San Valley Systems IP-SAN Gateway SL-700 / SL-1000

- 1 or 4 wirespeed Fibre Channel-to-Gigabit Ethernet channels
- Uses UDP, and hence delegates the handling of a network outage to the application
- Easy to configure
- Allows for fine-grained traffic shaping (step size 200 Kbit, 1 Gb/s to 1 Mb/s) and QoS
- Connecting two SANs over IP with a pair of SL1000 units is in all aspects equivalent to connecting these two SANs with a simple fibre cable

- Approximate cost: 20 KUSD/unit (SL-700, 1 channel); 30 KUSD/unit (SL-1000, 4 channels)

- Recommended number of units per site: 1

Page 17:

Nishan IPS 3300/4300 multiprotocol IP Storage Switch

- 2 or 4 wirespeed iFCP ports for SAN interconnection over IP
- Uses TCP and is capable of seamlessly handling network outages
- Allows for traffic shaping at predefined bandwidths (8 steps, 1 Gbit to 10 Mbit) and QoS
- Implements an intelligent router function: allows interconnecting multiple fabrics from different vendors and makes them look like a single SAN

- When interconnecting two or more separately managed SANs, maintains their independent administration

- Approximate cost: 33 KUSD/unit (6 universal FC/GE ports + 2 iFCP ports - IPS 3300); 48 KUSD/unit (12 universal FC/GE ports + 4 iFCP ports - IPS 4300)

- Recommended number of units per site: 2 (to provide redundant routing)

Page 18:

CISCO IP Storage Services Module (DS-X9308-SMIP)

- Provides integration of IP Storage Services into the Cisco MDS 9000 FC switches (one MDS 9216 switch incorporates 16 FC ports)

- 8 Gigabit Ethernet IP Storage interfaces:
  o Wire-rate FCIP on all ports simultaneously
  o Up to 24 simultaneous FCIP links per module

- Industry standard FCIP protocol uses TCP/IP to provide reliable transport

- VSAN and VLAN services increase stability and security of WAN-connected SAN elements

- Pricing provided through reseller partners (IBM, HP and others)

- Recommended number of units per site: 1

[Photos: MDS 9216 16-port modular FC switch; IP Storage Services module]

Page 19:

CASPUR Storage Lab

[Diagram: the Rome and Bologna sites are joined by a 1 Gbit WAN (400 km). Each site has a Gigabit IP network and an FC SAN, bridged by paired SL1000, IPS 4300, and MDS 9216 gateways. The Rome side hosts six SM 7042M-6 servers (interconnected via Myrinet), an HP DL380, and the FC disks and tapes; the Bologna side hosts an HP DL380.]

Page 20:

Series 1: accessing remote SAN devices

[Diagram: the HP DL380 in Bologna accesses the disks and tapes on the Rome FC SAN through the paired SL1000, IPS 4300, or MDS 9216 gateways over the 1 Gbit, 400 km WAN; the HP DL380 in Rome acts as the local counterpart.]

Page 21:

- We were able to operate the tape drives at their native speed (R and W): 15 MB/sec in the case of LTO and 25 MB/sec in the case of another, faster drive.

- In the case of disk devices we observed a small (5%) loss of performance on writes and a more visible (up to 12%) loss on reads, on all 3 units. The performance drop on reads increases with the distance between the units: from 6% at 1 m to 10-12% at 400 km.

- Several powerful devices used simultaneously over the geographical link easily grab the whole available bandwidth of the GigE link between the two appliances (seen so far for the Nishan and San Valley units only; CISCO will be tested in the near future).

- In the case of Nishan (TCP-based SAN interconnection) we witnessed a successful job completion after an emulated 1-minute network outage. A similar test will also be performed for the CISCO unit.

Distributed staging based on direct tape drive access is POSSIBLE!

Series 1 - current results

Page 22:

Series 2 – Comparison of NAS protocols

[Diagram: an SM 7042 server and an SM 7042 client connected via Gigabit Ethernet; the server attaches to two Infortrend IFT-6300 arrays over 2 Gbit FC, configured as RAID 50, R/W circa 150 MB/sec.]

Page 23:

Some settings:
- Kernels on server: 2.4.18-27 (RedHat 7.3, 8.0)
- Kernel on client: 2.4.20-9 (RedHat 9)
- AFS: cache was set up on a ramdisk (400 MB) - see the sketch below
- Used ext2 filesystem on server

Problems encountered:
- Poor array performance on reads with kernel 2.4.20-9
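For reference, one way to obtain such a RAM-backed AFS cache on a 2.4 kernel is sketched below. The exact commands used in the lab are not on the slide; the tmpfs choice, paths, and cache-manager options are assumptions:

    # Mount a 400 MB RAM-backed filesystem at the standard AFS cache
    # path, then start the cache manager with a matching size (1K blocks).
    mount -t tmpfs -o size=400m tmpfs /usr/vice/cache
    /usr/vice/etc/afsd -blocks 400000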

Series 2 - details

Page 24:

Write tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from memory on the client to the disk of the file server, including the time needed to run the "sync" command on both the client and the server at the end of the operation:

    20 x { dd if=/dev/zero of=<filename on server> bs=1000k count=1900 }

    T = Tdd + max(Tsync client, Tsync server)

Read tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from a disk on the server to memory on the client (output directly to /dev/null).

Because of the large number of files in use, and a file size comparable to the available RAM on both client and server machines, caching effects were negligible.
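Put together, the two tests amount to a few lines of shell. A minimal sketch of the procedure described above (the server mount point and host name are placeholders, not from the slides):

    # Write test: 20 x 1.9 GB from client memory to the server's disk;
    # the total time includes the slower of the two final syncs.
    for i in $(seq 1 20); do
        dd if=/dev/zero of=/mnt/server/file$i bs=1000k count=1900
    done
    sync                        # flush client-side buffers
    ssh server sync             # flush server-side buffers

    # Read test: stream the same files back into client memory.
    for i in $(seq 1 20); do
        dd if=/mnt/server/file$i of=/dev/null bs=1000k
    done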

Series 2 – more detail

Page 25:

Series 2 - current results (MB/sec)
[SM 7042 - 2 GB RAM on server and client]

                          Read   Write
AFS                         30     47
AFS cacheless (Atrans)      56     67
NFS                         82     87
RFIO                       111    111
Pure disk                  151    146

Page 26:

Series 3a – IBM GPFS

[Diagram: four SM 7042M-6 server nodes attached to 4x IFT-6300 disk arrays over the FC SAN and interconnected via Myrinet; the GPFS file system is also exported via NFS.]

Page 27:

GPFS installation:
- GPFS version 1.3.0-2
- Kernel 2.4.18-27.8.0smp
- Myrinet as the server interconnection network
- All nodes see all disks (NSDs)

What was measured:
1) Read and Write transfer rates (memory <-> GPFS file system) for large files
2) Read and Write rates (memory on NFS client <-> GPFS exported via NFS)
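For the NFS part of the measurement, the GPFS mount point is exported with the standard Linux NFS tooling. A minimal sketch (the mount point, client subnet, and host name are assumptions, not taken from the slides):

    # On a GPFS node: export the GPFS mount point and publish it.
    echo '/gpfs  192.168.0.0/24(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -a

    # On an NFS client: mount the exported file system.
    mount -t nfs gpfsnode1:/gpfs /mnt/gpfs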

Series 3a - details

Page 28:

Series 3a – GPFS native (MB/sec)

R / W speed for a single disk array: 123 / 78

            Write   Read
1 node       117     96
2 nodes      127    135
3 nodes      122    157

Page 29:

Series 3a - GPFS exported via NFS (MB/sec)

1 node exporting:
             1 client  2 clients  3 clients  9 clients
Read            35        44         44         44
Write           55        73         83         88

2 nodes exporting:
             2 clients  4 clients  6 clients
Read            60         72         85
Write           90        106        106

3 nodes exporting:
             3 clients  6 clients  9 clients
Read            84        113        120
Write          107        106        106

Page 30:

Series 3b – Sistina GFS

[Diagram: four SM 7042M-6 I/O nodes attached to 4x IFT-6300 disk arrays over the FC SAN; a fifth SM 7042M-6 runs the lock server; the GFS file system is also exported via NFS.]

Page 31:

GFS installation:
- GFS version 5.1.1
- Kernel: SMP 2.4.18-27.8.0.gfs (may be downloaded from Sistina together with the trial distribution); includes all the required drivers.

Problems encountered:
- Exporting GFS via the kernel-based NFS daemon does not work well (I/Os on NFS clients were ending in error). Sistina was able to reproduce the bug on our setup and has already partially fixed it; the error rate went down visibly. We therefore report here the results obtained with a generally less performant but stable user-space nfsd.

What was measured:
1) Read and Write transfer rates (memory <-> GFS file system) for large files
2) The same for the case (memory on NFS client <-> GFS exported via NFS)

Series 3b - details

Page 32:

Series 3b – GFS native (MB/sec)

NB: Out of 5 nodes, 1 node was running the lock server process and 4 nodes were doing only I/O.

             Write   Read
1 client      156    122
2 clients     245    230
3 clients     297    291
4 clients     300    330

R / W speed for a single disk array: 123 / 78

Page 33:

Series 3b – GFS exported via NFS (MB/sec)

1 node exporting:
             1 client  2 clients  4 clients  8 clients
Read            54        67         78         93
Write           56        64         64         61

3 nodes exporting:
             3 clients  6 clients  9 clients
Read           145        194        207
Write          164        190        185

4 nodes exporting:
             8 clients
Read           250
Write          236

NB: User-space NFSD was used

Page 34:

- We are proceeding with the test program. Currently under test: new middleware from CISCO and a new tape drive from Sony. We are also expecting a new iSCSI appliance from HP, and an LTO2 drive.

- We are open to any collaboration.

Final remarks

Page 35:

- SuperMicro servers for Italy: E4 Computer, Vincenzo Nuti - [email protected]

- FC over IP: San Valley Systems, John McCormack - [email protected]

Nishan Systems, Stephen Macfall - [email protected]

ACAL FCS, Nigel Houghton - [email protected]

CISCO Systems, Luciano Pomelli - [email protected]

Vendors’ contact info