© 2003 Hewlett-Packard Development Company, L.P.The information contained herein is subject to change without notice
Performance Measurements of a User-Space DAFS Serverwith a Database Workload
Samuel A. FinebergDon Wilson
NonStop Labs
August 27, 2003, Fineberg and Wilson, NICELI Presentation
Outline
• Background on DAFS and ODM
• Prototype client and server
• I/O tests performed
• Raw benchmark results
• Oracle TPC-H results
• Summary and conclusions
What is the Direct Access File System (DAFS)?
• Created by the DAFS Collaborative
  – Group consisting of over 80 members from industry, government, and academic institutions
  – DAFS 1.0 spec was approved in September 2001
• DAFS is a distributed file access protocol
  – Data is requested from files, not blocks
  – Based loosely on NFSv4
• Optimized for local file sharing environments
  – Systems are in relatively close proximity
  – Connected by a high-speed, low-latency network
• Built on top of direct-access transport networks
  – Initially targeted at Virtual Interface Architecture (VIA) networks
  – The Direct Access Transport (DAT) API was later generalized and ported to other networks (e.g., InfiniBand, iWARP)
Characteristics of a Direct Access Transport
• Connected model, i.e., VIs must be connected before communication can occur
• Two forms of data transport
  – Send/receive: two-sided
  – RDMA read and write: one-sided
• Both transports support direct data placement
  – Receives must be pre-posted
• Memory regions must be “registered” before they can be transferred through a DAT
  – Pins data in physical memory
  – Establishes VM translation tables for the NIC
DAFS Details
• Session based
  – DAFS clients initiate sessions with a server
  – A DAT/VIA connection is associated with a session
• RPC-like command format
  – Implemented with send/receive
  – Server “receives” requests “sent” from clients
  – Server “sends” responses to be “received” by the client
• Open/Close
  – Unlike NFSv2, files must be opened and closed (not stateless)
• Read/Write I/O “modes”
  – Inline: a limited amount of data is included in the request/response
  – Direct: the server initiates an RDMA read or write to move the data
Inline vs. Direct I/O

[Timing diagram, time flowing downward between client and server. Inline read or write: the client sends a request, the server performs the disk read or write, and the response (carrying read data inline) returns to the client. Direct write: the client sends a request, the server RDMA-reads the data from the client, performs the disk write, then sends the response. Direct read: the client sends a request, the server performs the disk read, RDMA-writes the data to the client, then sends the response.]
Oracle Disk Manager (ODM)
• File access interface spec for the Oracle Database
  – Supported as a standard feature in Oracle 9i
  – Implemented as a vendor-supplied DLL
  – Files that cannot be opened using ODM use standard APIs
• Basic commands
  – Files are created and pre-allocated, then committed
  – Files are then “identified” (opened) and “unidentified” (closed)
  – All read/write I/O uses an asynchronous “odm_io” command
    • I/Os are specified as descriptors, multiple per odm_io call
  – Multiple waiting mechanisms: wait for a specific I/O, or wait for any
  – Other commands, e.g., resizing, are synchronous
Prototype Client/Server
• DAFS Server
  – Implemented for Windows 2000 and Linux (all testing was on Windows)
  – Built on VIPL 1.0 using DAFS 1.0 SDK protocol stubs
  – All data buffers are pre-allocated and pre-registered
  – Data-driven multithreaded design
• ODM Client
  – Implemented as a Windows 2000 DLL for Oracle 9i
  – Multithreaded to decouple asynchronous I/O from Oracle threads
  – Inline buffers are copied; direct buffers are registered/deregistered as part of the I/O
  – Inline/direct threshold is set when the library is initialized
Test System Configuration
• Goal was to compare local I/O with DAFS
• Local I/O configuration
  – Single system running Oracle on locally attached disks
• DAFS/ODM I/O configuration
  – One system running the DAFS server software with locally attached disks
  – A second system running Oracle and the ODM client; files on the DAFS server are accessed over the network using ODM
• 4-processor Windows 2000 Server based systems
  – 500MHz Xeon, 3GB RAM, dual-bus PCI 64/33
  – ServerNet II (VIA 1.0 based) System Area Network
  – 15K RPM disks attached via two PCI RAID controllers, configured as RAID 1/0 (mirrored-striped)
Experiments
• Raw I/O tests
  – Odmblast: streaming I/O test
  – Odmlat: I/O latency test
  – DAFS tests used the ODM DLL to access files on the DAFS server
  – Local tests used a special local ODM library built on Windows unbuffered I/O
• Oracle database test
  – Standard TPC-H benchmark
  – SQL-based decision support code
  – DAFS tests used the ODM DLL to access files on the DAFS server
  – Local tests ran without ODM (Oracle uses Windows unbuffered I/O directly)
Odmblast
• ODM-based I/O stress test
  – Intended to present a continuous load to the I/O system
  – Issues many simultaneous I/Os (to allow for pipelining)
• In our experiments, Odmblast streams 32 I/Os to the server
  – 16 I/Os per odm_io call
  – Waits for the I/Os from the previous odm_io call
• I/Os can be reads, writes, or a random mix
• I/Os can be at sequential or random offsets
Odmblast read comparison
[Chart: bandwidth (MB/sec, 0 to 250) vs. I/O size (0 to 1,000,000 bytes) for Local Seq Rd, Local Rand Rd, DAFS Seq Rd, and DAFS Rand Rd.]
Odmblast write comparison
[Chart: bandwidth (MB/sec, 0 to 100) vs. I/O size (0 to 1,000,000 bytes) for Local Seq Wr, Local Rand Wr, DAFS Seq Wr, and DAFS Rand Wr.]
Odmlat
• I/O latency test
  – How long does a single I/O take?
    • (not necessarily related to aggregate I/O rate)
  – For these experiments, <16K = inline, ≥16K = direct
  – Derived the components that make up I/O time using linear regression
  – More details in the paper
Odmlat performance
[Chart: time per operation (microseconds, 0 to 3000) vs. bytes per I/O operation (0 to 65536) for read time and write time.]
Oracle-based results
• Standard database benchmark: TPC-H
  – Written in SQL
  – Decision support benchmark
  – Multiple ad-hoc query streams with an “update thread”
  – 30GB database size
• Oracle configuration
  – All I/Os 512-byte aligned (required for unbuffered I/O)
  – 16K database block size
  – Database files distributed across two NTFS file systems
• Measurements
  – Compared average runtime for local vs. DAFS-based I/O
  – Skipped the official “TPC-H power” metric
  – Varied the inline/direct threshold for DAFS-based I/O
Oracle TPC-H Performance
[Bar chart of total run time (hrs:min): local 13:23, DAFS 16K-direct 17:13, DAFS 16K-inline 14:39.]
Oracle TPC-H Operation Distribution
[Pie chart: 16 KByte reads 79.4%, >16 KByte reads 19.1%, >16 KByte writes 1.1%, 16 KByte writes 0.3%.]
Oracle TPC-H CPU Utilization
[Line chart: % CPU used (0 to 100) vs. elapsed time (0 to 1200 minutes) for the DAFS client (inline I/O), DAFS server (inline I/O), DAFS server (direct I/O), DAFS client (direct I/O), and local I/O.]
TPC-H Summary
• Local I/O still faster
  – Limited ServerNet II bandwidth
  – Memory registration or copying overhead
  – Windows unbuffered I/O is already very efficient
• DAFS still has more capabilities than local I/O
  – Capable of cluster I/O (RAC)
• Memory registration is still a problem with DATs
  – Registration caching can be problematic
    • Cannot guarantee address mappings will not change
    • ODM has no means of notifying the NIC of mapping changes
  – Need either better integration of the I/O library with Oracle, or better integration of the OS with the DAT
• Transparency is expensive!
Conclusions
• The DAFS server / ODM client achieved performance close to the limits of our network
  – Local SCSI I/O was still faster
• Running a database benchmark, DAFS TPC-H performance was within 10% of local I/O
  – Also provides the advantages of a network file system (i.e., clustering support)
• Limitations of our tests
  – ServerNet II bandwidth was inadequate; no support for multiple NICs
  – Needed to do client-side registration for all direct I/Os
• The TPC-H benchmark was not optimally tuned
  – Needed to bring the client CPU closer to 100%
    • More disks, fewer CPUs, other tuning
  – CPU offload is not a benefit if I/O is the bottleneck