IBM GCG STG Lab
© 2014 IBM Corporation
Elastic Storage in OpenStack
Feng Shuo [email protected]
2014.12
Agenda
From GPFS to Elastic Storage
ES in OpenStack Deep Dive
On-going Interesting Topics
From GPFS to Elastic Storage
What is GPFS
“General Parallel File System”: IBM’s shared-disk, parallel cluster file system.
Supports AIX, Linux and Windows, on IBM Power, IBM Z and Intel/AMD x86 platforms.
Originally designed for high-performance commercial and scientific computing applications, and widely used on today’s supercomputers.
Cluster: 2-10,000 nodes, fast reliable communication, common admin domain.
Shared disk: all data and metadata reside on storage devices accessible from any node through a block I/O interface (“disk”: any kind of block storage device).
Parallel: data and metadata flow from all of the nodes to all of the disks in parallel.
[Diagram: the three GPFS deployment models. In the Storage Area Network (SAN) model, every node attaches to shared storage over a storage network. In the Network Shared Disk (NSD) server model, NSD servers attach to the storage and application nodes reach it over a TCP/IP or InfiniBand RDMA network. In the Shared Nothing Cluster (SNC) model, each node uses its own local storage and nodes communicate over a TCP/IP or InfiniBand network.]
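A minimal sketch of how an administrator might inspect such a cluster with standard GPFS administration commands (the file system name gpfs0 is only an example):

# Show the cluster members and their roles
mmlscluster
# Show the GPFS daemon state on every node
mmgetstate -a
# List the NSDs (the shared "disks") and the servers that export them
mmlsnsd
# Show the attributes of one file system (block size, replication, pools, ...)
mmlsfs gpfs0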
GPFS Architecture Highlights & Features
Architecture for massive storage
– A single file system built from many disks or LUNs
– Fully distributed data and metadata
– Multiple nodes share the storage and access it in parallel
Architecture for high throughput
– Striping across disks (can span all disks in the file system)
– Large block sizes supported (up to 16 MB)
– Optimized for concurrent writes to many files
Architecture for high reliability
– At the node level, automatic log recovery after a node failure guarantees consistency
– At the data level, supports built-in RAID (GNR) as well as replication
– Online management: disks and nodes can be added or removed dynamically without remounting the file system
Industry-standard architecture
– Single namespace, fully POSIX-compliant semantics
– A complete locking mechanism protects read/write consistency
Highly scalable
– Large capacity (10+ PB, 8 billion files)
– High bandwidth (TB/s)
Highly fault tolerant
– Up to three replicas, or N+4 erasure coding (ESS only)
– Automatic node failover, including failover of NAS cluster services running on the nodes
Rich enterprise features
– Snapshots and file clones
– NAS clustering
– Native encryption
Advanced data management
– HSM, ILM (DMAPI, storage pools & filesets, storage policies); see the policy sketch after this list
– Multi-cluster (cross-cluster mounts)
– AFM: remote replication, WAN caching
Hadoop compatibility layer
– GPFS Connector for Hadoop applications (e.g. MapReduce).
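A minimal sketch of how storage pools and ILM are typically driven through the policy engine, assuming a file system named gpfs0 with pools named gold and silver (the rule text is illustrative only, not from this deck):

# Define a placement rule plus a migration rule that ages cold files to a slower pool
cat > policy.rules <<'EOF'
RULE 'placement' SET POOL 'gold'
RULE 'age_out' MIGRATE FROM POOL 'gold' TO POOL 'silver'
     WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
EOF
# Install the policy for the file system
mmchpolicy gpfs0 policy.rules
# Evaluate the migration rule (typically scheduled, e.g. from cron)
mmapplypolicy gpfs0 -P policy.rules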
Typical Deployment Models
[Diagram: the three deployment models (SAN, NSD server and Shared Nothing Cluster) shown again, mapped to typical workloads: DB clusters, NAS clusters, LTFS EE, HPC (Blue Gene, Sierra and Summit), technical computing, share-nothing DB clusters, and commercial and community big-data products.]
Elastic Storage – SDS by GPFS
Elastic Storage is software-defined storage built on GPFS
– GPFS acts as a unified data plane spanning heterogeneous storage systems and presents a single management interface to the management plane
– A unified storage namespace for many distributed applications (the nature of GPFS as a cluster file system)
– Mature enterprise features that do not depend on the underlying storage systems
[Diagram: a single Elastic Storage namespace serving users and applications, client workstations and a compute farm through POSIX, NFS, CIFS, iSCSI, the MapReduce connector, OpenStack (Cinder, Swift, Glance, Manila) and VMware (SRM, VADP, VAAI, vSphere), on top of flash, disk, tape and GSS; share-nothing clusters at Site A, Site B and Site C are linked by GPFS AFM, and the IBM InterCloud Store acts as a universal cloud gateway connecting the IBM public cloud, Amazon S3, Azure and other storage providers.]
Elastic Storage in OpenStack Deep Dive
IBM's Contributions to the OpenStack Community
– Legal support for drafting bylaws
– Improvements to stability and quality
– Community bug squashing days
– Permission building in the China market
– PowerVM driver, dynamic hypervisor support and HA enhancements
– Membership Services from HSLT
– Globalization and localization enablement: localization for Simplified Chinese and crowd-sourced translation capability
– API, quotas, Nova integration
– Drivers for Elastic Storage and IBM NAS
– Drivers for the Storwize family, XIV and DS8000
– CDMI support and generic WSGI support
Related IBM offerings shown: IBM P Series, IBM Storwize V7000, IBM XIV, Elastic Storage Server, and Elastic Storage as software.
Zoom into OpenStack Storage
GPFS/ES Cinder Driver
• Officially part of OpenStack since the Havana release
• Requires GPFS to be configured on every node that accesses the storage
• The backend implements each volume as a GPFS file
• Volume snapshots and fast clones are implemented with GPFS file clones
• Thin provisioning is supported
• Data availability is guaranteed by GPFS replication or erasure coding
IBM-NAS Cinder Driver
• Officially part of OpenStack since the Juno release
• Accesses the Elastic Storage system over NFS (configured as an NFS cluster)
• Provides functionality comparable to the standard GPFS driver
(Naturally supported by file storage)
Elastic Storage for object storage
• The installation and deployment white paper was officially released in September 2014
• Implemented by using Elastic Storage as the backend storage for Swift
• Erasure-coded fault tolerance through ESS; replication is used on non-ESS systems
• Deeper integration work is in progress
[Diagram: the OpenStack services - Nova, Cinder, Swift, Glance, Neutron, Horizon and Keystone.]
Diving Deeper into Block Storage
A sample backend configuration
– volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver
– gpfs_mount_point_base = /gpfs0/OpenStack/cinder/volumes
– gpfs_images_dir = /gpfs0/OpenStack/glance/images
– gpfs_images_share_mode = copy_on_write
– gpfs_sparse_volumes = True
– volume_backend_name=GPFS_SPARSE
– gpfs_storage_pool=silver
[Diagram: Nova compute instances, Cinder and Glance all sit on Elastic Storage (GPFS), or CNFS based on GPFS; the legend distinguishes persistent-volume control and data paths from image control and data paths.]
Each volume/image appears as a file in this directory
Volume snapshots are implemented with GPFS clones (rather than copies)
Volumes are stored as sparse files (equivalent to thin provisioning)
Specifies the GPFS storage pool the file lives in, and therefore which physical storage media it resides on (described in detail below)
Cinder, Nova and Glance can share the data in Elastic Storage and all access it uniformly as files
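A minimal usage sketch: a volume type can be bound to the backend above through its volume_backend_name and selected at creation time (the type name gpfs-sparse and the 10 GB size are arbitrary examples):

# Create a volume type and tie it to the GPFS backend defined above
cinder type-create gpfs-sparse
cinder type-key gpfs-sparse set volume_backend_name=GPFS_SPARSE
# Create a 10 GB volume on that backend; it appears as a sparse file
# under gpfs_mount_point_base
cinder create --volume-type gpfs-sparse --display-name demo-vol 10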
Diving Deeper into Block Storage
Parameters that can be specified through the Volume API (volume metadata)
– “data_pool_name” – the storage pool, as above
– “dio” – force direct I/O on or off (whatever KVM, Xen or Docker will use)
– “block_group_factor” – sets the preallocation size (a performance optimization when growing an image)
– “replicas” – how many copies of this volume/image to keep
– “write_affinity_depth” – how those copies are distributed within GPFS (FPO setup only)
– “write_affinity_failure_group” – explicitly specifies which nodes the copies are stored on (FPO setup only).
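A minimal sketch, assuming these metadata keys are passed at creation time through the standard Cinder client (the values are examples only):

# Ask for 3 replicas, a write-affinity depth of 1 and the 'silver' pool
# for this particular volume
cinder create 20 --display-name fpo-data \
    --metadata data_pool_name=silver replicas=3 write_affinity_depth=1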
A closer look at Elastic Storage data and storage layout
[Diagram: storage pools “gold” (SSD), “silver” (10k rpm SAS) and “bronze” (7200 rpm SATA); a write with allocation on Node 1 places data blocks D1-D4 across Node 1 (RG:1,0,1), Node 2 (RG:1,0,2), Node 3 (RG:1,0,3) and Node 4 (RG:2,0,1).]
These settings can be specified manually as above, or configured centrally through the Elastic Storage policy engine, for example with the rule: RULE 'default' SET POOL 'fpopool' REPLICATE(3) AND setBGF(4) AND setWAD(1)
Diving Deeper into Object Storage
Object storage built on the standard OpenStack Swift architecture
Block and object storage can be unified in a single storage system
No particular requirement on the size of a single object (mostly; currently 5 TB)
– Thanks to GPFS striping, large objects do not create storage hot spots or imbalance
ESS's mature erasure coding can replace the traditional three-replica scheme
– Up to 3x performance improvement!
– Greatly reduces network and storage cost
Every node can access every object directly
– Adding nodes increases not only capacity but also reliability
More energy efficient, with lower system cost
– Much higher storage density than traditional solutions
– Disks and nodes can be added seamlessly
– System capacity can be grown seamlessly
– Adding new nodes alone requires no rebuild or data movement
Seamless support for enterprise storage features
– Tape backup, encryption, backup, disaster recovery, automated tiering, and more
[Diagram: client applications reach a geo-distributed GPFS object store through a web load balancer and Swift protocol nodes; the GPFS tiers span SSD, fast disk, slow disk and tape.]
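A minimal sketch of the resulting workflow, assuming a deployed Swift endpoint with credentials already set in the usual OS_* environment variables (the container and object names are examples; the on-disk layout depends on the deployment):

# Upload and inspect an object through the normal Swift client
swift upload backups db1.dump
swift stat backups db1.dump
# The same object is now an ordinary file somewhere under the GPFS mount
# (the exact path depends on the ring/device configuration)
find /gpfs0 -name db1.dump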
Putting Everything Together
[Diagram: Nova VM root-filesystem disks, Glance protected snapshots/images, the Cinder backend and the Swift backend all live on Elastic Storage; volumes, images and snapshots are linked by copy-on-write clones on a raw GPFS file system, and Elastic Storage AFM provides storage DR.]
Software Defined Storage
– Commodity servers, local disks and affordable networking
– Configurable storage through GPFS
– Almost all functionality comes from GPFS
Unified Storage Platform
– Unified Data Flow
– Unified Name Space
– Unified Data Management
Optimize with Virtualization
– Locality for VM instances
– Policy-based life cycle management
– Snapshot/Clone for image management
[Diagram labels: OpenStack; HSM, ILM and Backup; Storage Data Management]
On-going Interesting Topics
Interesting topics, not only for ES but also for the community
Further strengthening storage-location awareness
– In a GPFS OpenStack environment that uses server-local storage, the first replica is already placed on local media automatically at allocation time (as long as space allows), but compared with how far Big Data workloads exploit GPFS, there is still room for optimization
– How can reads and writes be optimized further?
• The Cinder driver needs to report more precise location information (rack + node)
• Higher-level schedulers (such as HEAT) need to take this location information into account as a scheduling parameter
Further improving usability
– Allow Nova to boot directly from ES rather than via “boot from volume” or copying the image to local Nova storage (this requires work in libvirt itself and in the OpenStack/libvirt interface logic).
– Automated installation and deployment, plus integrated management (including management of the underlying storage)
Data sharing inside OpenStack – Manila (see the sketch after this list)
– Manila is OpenStack's shared file system service: https://wiki.openstack.org/wiki/Manila
– Manila provides shared file storage to VMs; the logic is similar to Cinder's, but data on Cinder volumes cannot be shared directly
– Manila can also serve platforms that Cinder struggles to support (even when sharing is not required), such as Docker
– Manila can also share data between Nova and other applications, for example big-data applications in Sahara.
Further data sharing with object storage
– SwiftOnFile: https://wiki.openstack.org/wiki/Swiftonfile
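A minimal Manila sketch, assuming a share backend and a default share type are already configured (the share name and access range are examples only):

# Create a 1 GB NFS share and allow a tenant network to mount it
manila create NFS 1 --name shared-data
manila access-allow shared-data ip 10.0.0.0/24
# Show the export location to mount inside the guests that share the data
manila show shared-data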
Thanks
For more information
– GPFS on IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html
– IBM Private, Public, and Hybrid Cloud Storage Solutions:
http://www.redbooks.ibm.com/abstracts/redp4873.html?Open
– A Deployment Guide for Elastic Storage Object:
http://www.redbooks.ibm.com/redpieces/abstracts/redp5113.html?Open
– The OpenStack documentation for configuring the GPFS volume driver:
http://docs.openstack.org/havana/config-reference/content/config_overview.html
– The GPFS support in OpenStack Cinder Havana release blog:
http://www.ibm.com/developerworks/community/blogs/storage_redbooks/entry/gpfs_support_in_openstack_cinder_havana_release?lang=en
– Manila Project: https://wiki.openstack.org/wiki/Manila
– SwiftOnFile: https://wiki.openstack.org/wiki/Swiftonfile
BACKUPS
What GPFS is not
Not a client-server file system like NFS, CIFS, or AFS/DFS: no single-server performance bottleneck or scaling limit.
No centralized metadata server like Lustre (“MDS”) or HDFS (“name node”).
[Diagram: in a client-server file system, all traffic flows through a single file server over a TCP/IP network; in a Lustre/HDFS-style design, metadata flows through a dedicated metadata server while data flows between client nodes and storage over the network.]