The Panasas® Parallel Storage Cluster
Acknowledgement: Some of the material presented is under copyright by Panasas Inc.
What Is It?
What is the Panasas ActiveScale Storage Cluster?
•A complete hardware and software storage solution
•Implements
•An Asynchronous, Parallel, Object-based, POSIX compliant Filesystem
•A Global Namespace
•Strict client cache coherency
Physically, How Is It Organized?
•A shelf is 4U high and contains slots for 11 blades
•0-3 DirectorBlades per shelf, with the remaining slots for StorageBlades
•Typical configuration: 1 DirectorBlade and 10 StorageBlades
Terminology
Metadata – The information that describes the data contained in files
•Size, create time, modify time, location on disk, permissions
Block Based Filesystem – A filesystem in which the client accesses files based on their physical location on disk.
File Based Filesystem – A filesystem in which a client requests a file by name.
Object Based Filesystem – The filename is abstracted into an identifier. We will discuss this later.
RAID – Multiple disks arranged into one logical disk, tuned for redundancy or speed.
JBOD – (Just a Bunch Of Disks) Multiple disks accessed directly rather than in an array.
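The metadata fields listed above are visible on any POSIX filesystem. A minimal sketch in Python (the file name is made up for illustration; location on disk is managed below the POSIX layer, so it is not shown):

```python
import os
import stat
import time

# Create a small file so the example is self-contained.
with open("example.txt", "w") as f:
    f.write("hello")

# os.stat returns the metadata described above: size, timestamps,
# and permissions.
info = os.stat("example.txt")
print("size bytes: ", info.st_size)                 # 5
print("modified:   ", time.ctime(info.st_mtime))
print("permissions:", stat.filemode(info.st_mode))  # e.g. -rw-r--r--
```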
Direct Attached Storage (local filesystem)
Private storage for a host operating system
IDE-connected internal hard drives
Serial ATA or SCSI attached drives
USB drives
Examples: ext3, reiserFS, NTFS, ufs, FAT32
This discussion is mostly about distributed file systems
Problems of scale require lots of storage computers working together
Network Attached Storage
File Server exports storage at the file level
NFS/CIFS are widely deployed
NFS is the only official file system standard
Scalability limited by server hardware
Moderate number of clients (10’s to 100’s)
Moderate amount of storage (few TB)
A nice model until it runs out of steam
“Islands of storage”
Bandwidth to a file limited by its server
NetApp (ONTAP 7.x), Sun, HP, SnapServer, EMC Celerra, StorEdge NAS, IBM TotalStorage NAS, whitebox Linux
[Diagram: clients accessing files through a single NAS head]
Clustered NAS
More scalable than single-headed NAS
Multiple NAS heads share back-end storage
“In-band” NAS head still limits performance and drives up cost
Two primary architectures
Forward requests to “owner Head”
Export NAS from shared file system
NFS does not provide a good mechanism for dynamic load balancing
Clients permanently mount a particular Head
GPFS, Isilon OneFS, IBRIX, Polyserve, NetApp GX, BlueArc, Exanet ExaStore, ONStor, Pillar Data, IBM/Transarc AFS, IBM DFS
[Diagram: multiple NAS heads in front of shared back-end storage]
Storage Area Network
Common management and provisioning for host storage
Block Devices (JBOD or RAID) accessible via iSCSI or FC network
Wire-speed/RAID-speed performance potential
Proprietary solutions for shared file systems
Scalability limited by block management on metadata server (e.g., 32 nodes)
NAS access provided by “file head” that re-exports the SAN file system
Asymmetric (pictured) or Symmetric implementations
[Diagram: clients and metadata server(s) attached to storage over the SAN]
Object Based Storage Clusters
Block and file interfaces replaced with an object abstraction.
Block management pushed all the way out to the disks.
Allows for parallel and direct access to disks
Requires non-standards based Client
Lustre, Panasas
[Diagram: OSD clients perform data I/O directly to object (OSD) storage; the metadata server sits outside the data path]
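The object abstraction can be contrasted with block addressing in a toy sketch. Every name here is illustrative, not any real OSD protocol or API:

```python
# Block-based: the client must know physical block numbers and
# manage the file's layout itself.
blocks = {}
blocks[4096] = b"first 4 KB of the file"
blocks[8192] = b"next 4 KB of the file"

# Object-based: the client holds only an opaque object identifier;
# the storage device manages the physical layout and per-object
# attributes internally.
objects = {}

def create_object(obj_id, data, attrs):
    objects[obj_id] = {"data": data, "attrs": attrs}

def read_object(obj_id, offset, length):
    return objects[obj_id]["data"][offset:offset + length]

create_object(0xA1B2, b"file contents live behind an ID", {"owner": "alice"})
print(read_object(0xA1B2, 0, 4))  # b'file'
```

Because the device owns the layout, many clients can address the same object in parallel without coordinating block allocations, which is the property the cluster exploits.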
pNFS: Standard Storage Clusters
pNFS is an extension to the Network File System v4 protocol standard
Allows for parallel and direct access
From Parallel Network File System clients
To Storage Devices over multiple storage protocols
Moves the Network File System server out of the data path
[Diagram: pNFS clients perform data I/O directly to block (FC), object (OSD), or file (NFS) storage; the NFSv4.1 server sits outside the data path]
RAID
Redundant Array of Independent Drives
Many physical disks bound together with hardware or software.
Multiple layouts to accommodate performance and fault tolerance requirements.
Used to create larger filesystems out of standard drive technology.
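The redundancy idea can be sketched with RAID-5-style XOR parity, a minimal model of how one lost disk is rebuilt from the survivors (not any vendor's actual format):

```python
# Parity is the byte-wise XOR of the data strips; XOR-ing the
# surviving strips with the parity recovers any single lost strip.

def xor_strips(strips):
    out = bytearray(len(strips[0]))
    for s in strips:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data disks
parity = xor_strips(data)            # the parity disk

# Disk 1 fails: rebuild its strip from the survivors plus parity.
rebuilt = xor_strips([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```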
Comparing Technology
How does an object-based, parallel filesystem compare to traditional storage solutions?
vs. Direct Attached Storage
oSeparate control and data paths. Metadata and data workloads are distributed.
oMultiple access points for redundancy and scalability
oNo need to balance expensive server resources between applications and storage access
vs. Network Attached Storage
oScalability and ease of management in very large installations
vs. Storage Area Networks
oClients access storage directly, no intermediary gateway
oAll communication is IP based, choose your infrastructure
Low cost, high bandwidth Gigabit or 10-Gigabit Ethernet
Higher cost, low latency Infiniband
Panasas Object-Based Storage Cluster
Consists of two primary components
Object Storage Devices (OSD): StorageBlades
MetaData Manager: DirectorBlades
Directors implement file system semantics
Access control, cache consistency, user identity, etc.
Directors have rights to perform these object operations
Create, delete, create group, delete group
Get attributes and set attributes
Clone group, copy-on-write support for snapshots
Clients perform direct I/O with these object operations
Read, write
Get attributes, set (some) attributes
Panasas StorageBlade (OSD)
Balanced storage device
CPU, SDRAM, GE NIC and 2 spindles, 2x2TB SATA
Commodity parts drive low cost
Performance scales with capacity
Single Seamless Namespace!
DirectFLOW Client
DirectFLOW client is a kernel-loadable FS module
Implements standard Vnode interface
Uses native Panasas network protocols (RPC and iSCSI)
Caches data, directories, attributes, capabilities
Responds to callbacks for cache consistency
Does RAID I/O directly to StorageBlades w/ iSCSI/OSD
DirectorBlades
Metadata manager
Realm Control – admit blades, start/stop services, failover
File Manager – access control, cache consistency, file system semantics
Storage Manager – file virtualization (maps), recovery, reconstruction
Management console
Web-based GUI or Command Line Interface (CLI)
Status, charts, reporting
Storage management
Gateway function (NFS/CIFS) collocated on DirectorBlade
Fast processor and large main memory
Multiple DirectorBlades allow service replication for fault tolerance
Environment
AC Power
Each shelf has dual power supplies and battery
Automatic graceful shutdown if you lose AC power
Masks brownouts and short (5-sec) power glitches
Thermal
800 Watts in 4U!
Power supplies and batteries have fans that cool the shelf
Blades, power supplies, batteries, network cards all monitor temperature
Warnings generated near temperature limit
Unilateral blade shutdown if a blade gets very hot
Graceful shutdown of a whole shelf if multiple blades are hot
Bladesets and Volumes
Bladeset is a storage (OSD) failure domain
Single OSD failure results in degraded operation and reconstruction
Two OSD failures result in data unavailability
Bladesets can be expanded or merged (but not unmerged) for growth
Capacity balancing occurs within a bladeset
Volume is a file hierarchy with a quota
One or more volumes compete for space within a bladeset
No physical boundaries between volumes, except quota limits
Volume is the unit of DirectFlow metadata work
Each DirectorBlade manages one or more volumes
NFS/CIFS gateway workload is orthogonal to DirectFlow metadata
All DirectorBlades provide uniform/symmetric NFS/CIFS access
What Problems Does It Solve?
It’s all about removing the bottlenecks in traditional storage
No RAID engine bottleneck
oClient driven RAID scales as the number of clients increases
oMultiple Volumes or DirectorBlades for Scalable Reconstruction
No Network Uplink bottleneck
o10GigE port or 4-Port Gig-E Link Aggregation Group per Shelf
Flexible, per-file layouts (SDK required)
oRAID1/5 for large streaming I/O
oRAID10 for N-to-1 writes or random I/O
oCustomizable stripe width and depth to control the number of spindles and the parity overhead
Global Namespace
Single, web browser based management interface of 100’s of TBs
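Striping, width, and depth can be illustrated with a toy offset-to-blade mapping. The stripe unit and width below are arbitrary example values, not Panasas defaults:

```python
STRIPE_UNIT = 64 * 1024   # bytes stored on one blade per stripe unit
STRIPE_WIDTH = 8          # number of StorageBlades the file spans

def locate(offset):
    """Map a file byte offset to (blade, stripe, offset_in_unit)."""
    unit = offset // STRIPE_UNIT
    return unit % STRIPE_WIDTH, unit // STRIPE_WIDTH, offset % STRIPE_UNIT

print(locate(0))           # (0, 0, 0)      first blade, first stripe
print(locate(64 * 1024))   # (1, 0, 0)      next blade, same stripe
print(locate(600 * 1024))  # (1, 1, 24576)  wrapped onto the second stripe
```

A wider stripe engages more spindles per file (more bandwidth), while a narrower one reduces the number of blades a single failure touches.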
Customizing For Your Environment
Pick your protocol
DirectFLOW, NFS, CIFS, any combination at one time
oMore Director Blades for NFS / CIFS performance
oMore Storage Blades for DirectFLOW performance
Interactive vs. Batch Processing
ActiveStor 5000 w/ larger cache sizes on Storage Blades for Interactive work
Fault tolerance
Configurable spares for multiple sequential Storage Blade failures
Configurable bladeset sizes for simultaneous Blade failure risk mitigation
Redundant network links
Storage Capacity Options
Smaller Capacity Blades
oMore spindles, less data to reconstruct, more shelves
Larger Capacity Blades
oFewer shelves, reduced double-disk failure risk
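The bladeset-size trade-off above can be made concrete with a toy probability model. It assumes independent blade failures with a made-up per-blade probability; real failure statistics are more complicated, so treat this only as an illustration of the trend:

```python
from math import comb

p = 0.001  # illustrative chance a blade fails during one rebuild window

def p_double_failure(n, p):
    """Probability of >= 2 failures among n blades (binomial model)."""
    p0 = (1 - p) ** n                          # no failures
    p1 = comb(n, 1) * p * (1 - p) ** (n - 1)   # exactly one failure
    return 1 - p0 - p1

# Larger bladesets expose more blades to the same rebuild window, so
# the chance of a second, data-losing failure grows roughly
# quadratically with bladeset size.
for n in (10, 30, 60):  # e.g. one, three, and six shelves of StorageBlades
    print(f"{n} blades: {p_double_failure(n, p):.2e}")
```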
Logically, How Does Data Flow?
[Diagram: Linux clients with the DirectFLOW filesystem client, and NFS/CIFS clients, connect over the IP network to DirectorBlades (NFS/CIFS gateway and metadata manager) and StorageBlades]
Logically, How Does Data Flow?
[Diagram: each of six shelves holds 10 StorageBlades plus one DirectorBlade (DB1-DB6); DirectFLOW and NFS/CIFS clients reach them over the IP network]
An Example Six Shelf System
Logically, How Does Data Flow?
[Diagram: the same six shelves, with the StorageBlades grouped into three bladesets]
An Example Six Shelf System, with Three Bladesets
Logically, How Does Data Flow?
[Diagram: the three bladesets now hold eight volumes (Vol1-Vol8); each volume is managed by one DirectorBlade, and some DirectorBlades manage more than one volume]
An Example Six Shelf System, with Three Bladesets and Eight Volumes
How Do I Manage 100’s of TB?
All from a single http:// or command line interface
PanActive Manager: Single GUI for entire namespace management
Simple out-of-box experience
Seamlessly adopt new blades
Capacity & load balancing
Volumes and quotas
Snapshots
1-touch reporting capabilities for capacity trends, asset ID, and performance
Email and/or pager notification of errors and warnings
Scriptable CLI for all features