Designing Storage Area Networks, Second Edition - Tom Clark


"Designing Storage Area Networks, Second Edition, succinctly captures the key technologies that are driving the storage networking industry. Tom Clark's works are helping to educate the IT community to the benefits and challenges of shared storage and are recommended reading for anyone wishing to understand this exciting new technology." —Sheila Childs, VP Product Management, Legato Systems; Chairperson, SNIA

"Fibre Channel SANs have become a mainstay at the backend of the biggest corporations on the planet. The second edition of Designing Storage Area Networks brings the next wave of connection (IP) points and management into context, helping the user to quickly understand all the benefits before them." —Steve Duplessie, Founder and Senior Analyst, Enterprise Storage Group

"Designing Storage Area Networks, Second Edition, provides a practical roadmap through the ever-changing landscape of SAN technology. The new Fibre Channel, IP, and virtualization initiatives covered in this work will enable customers to implement comprehensive shared storage solutions that reduce management overhead and cost." —John Webster, Founder and Senior Analyst, Data Mobility Group

Storage Area Networks (SANs) are now recognized as the preferred solution for fulfilling institutions' and enterprises' critical data-storage needs. Whether powered by Fibre Channel or TCP/IP and Gigabit Ethernet technology, SANs far exceed the capabilities of traditional storage access methods. SANs are quickly becoming the solution of choice for organizations that require high-volume data-handling capacity.

Written for network developers, IT consultants, administrators, and managers, this updated and greatly expanded edition of the best-selling Designing Storage Area Networks goes far beyond a straight description of technical specifications and standards. The text offers practical guidelines for using diverse SAN technologies to solve existing networking problems in large-scale corporate networks. With this book you will learn how the technologies work and how to organize their components into an effective, scalable design. In doing so, you will discover today's best methods for managing storage area networks, including new troubleshooting techniques.

Designing Storage Area Networks, Second Edition, also features detailed case studies that demonstrate how SANs can solve a number of commonly encountered business challenges, including LAN-free and server-free tape backup, server clustering, and disaster recovery. As an information-systems professional, you must keep pace with this powerful, evolving technology.

Key topic coverage includes:


• Using the SNIA Shared Storage Model

• Fibre Channel layers and protocols

• Fabrics and fabric switches

• Host bus adapters

• Fibre Channel RAID and Fibre Channel JBODs

• iSCSI and IP storage protocols and products

• SAN management and problem isolation techniques

• Building extended SANs for data center and remote storage access


Table of Contents

Designing Storage Area Networks: A Practical Reference for Implementing Fibre Channel and IP SANs, Second Edition

By Tom Clark

Publisher: Addison-Wesley

Publication Date: March 28, 2003

ISBN: 0-321-13650-0

Pages: 592

Copyright

Preface

The Organization of This Book

Acknowledgments

Chapter 1. Introduction

Section 1.1. Using the SNIA Shared Storage Model

Section 1.2. Example: Carlson Companies

Section 1.3. Text Overview

Section 1.4. Chapter Summary

Chapter 2. Storage and Networking Concepts

Section 2.1. Networking in front of the Server

Section 2.2. The SCSI Architecture

Section 2.3. The Parallel SCSI Bus

Section 2.4. Network-Attached Storage

Section 2.5. Networking behind the Server

Section 2.6. Chapter Summary

Chapter 3. Fibre Channel Internals

Section 3.1. Fibre Channel Layers

Section 3.2. 1Gbps and 2Gbps Transport

Section 3.3. Physical Layer Options

Section 3.4. Data Encoding

Section 3.5. Ordered Sets

Section 3.6. Framing Protocol

Section 3.7. Classes of Service

Section 3.8. Flow Control

Section 3.9. Naming and Addressing Conventions

Section 3.10. Chapter Summary


Chapter 4. Fibre Channel SAN Topologies

Section 4.1. Point-to-Point

Section 4.2. Arbitrated Loop

Section 4.3. Fabrics

Section 4.4. Building Extended Fabrics

Section 4.5. Fabrics and Loops

Section 4.6. Chapter Summary

Chapter 5. Fibre Channel Products

Section 5.1. Transceivers

Section 5.2. Host Bus Adapters

Section 5.3. Fibre Channel RAID

Section 5.4. Fibre Channel JBODs

Section 5.5. Arbitrated Loop Hubs

Section 5.6. Switching Hubs

Section 5.7. Fabric Switches

Section 5.8. Fibre Channel-to-SCSI Bridges

Section 5.9. Fibre Channel Extension Products

Section 5.10. Chapter Summary

Chapter 6. IP SAN Technology

Section 6.1. Ethernet and TCP/IP

Section 6.2. Native IP Storage Protocols

Section 6.3. Discovery in IP SANs

Section 6.4. Quality of Service for IP SANs

Section 6.5. Security for IP SANs

Section 6.6. Wide Area Storage Networking

Section 6.7. Chapter Summary

Chapter 7. IP SAN Products

Section 7.1. Gigabit Ethernet Switches

Section 7.2. IP Routers

Section 7.3. iSCSI Adapter Cards

Section 7.4. iSCSI Storage Devices

Section 7.5. IP Storage Gateways

Section 7.6. iSCSI-to-SCSI Bridges

Section 7.7. iSNS Servers

Section 7.8. Chapter Summary

Chapter 8. SAN Software Products

Section 8.1. Server Clustering

Section 8.2. Tape Backup

Section 8.3. Data Replication

Section 8.4. Distributed File Systems and File Sharing

Section 8.5. Chapter Summary

Chapter 9. Problem Isolation in SANs

Section 9.1. Simple Problem-Isolation Techniques

Section 9.2. Fibre Channel Analyzers

Section 9.3. iSCSI Network Analyzers

Section 9.4. Performance Tools

Section 9.5. Chapter Summary

Chapter 10. Management of SANs

Section 10.1. Storage Network Management

Section 10.2. Storage Resource Management


Section 10.3. Storage Management

Section 10.4. Integration of Storage, Systems, and Enterprise Management

Section 10.5. The Common Information Model (CIM) (Bluefin)

Section 10.6. Chapter Summary

Chapter 11. Storage Virtualization

Section 11.1. What Is Storage Virtualization?

Section 11.2. In-Band and Out-of-Band Virtualization

Section 11.3. Host-Based Storage Virtualization

Section 11.4. SAN Interconnect-Based Storage Virtualization

Section 11.5. Storage-Based Virtualization

Section 11.6. Multivendor Storage Virtualization

Section 11.7. File System and NAS Virtualization

Section 11.8. Tape Virtualization

Section 11.9. Virtualization and the Data Storage Utility

Section 11.10. Chapter Summary

Chapter 12. Application Studies

Section 12.1. Post-Production Video Editing

Section 12.2. Prepress Operations

Section 12.3. LAN-Free and Server-Free Tape Backup

Section 12.4. Server Clustering

Section 12.5. Storage Consolidation

Section 12.6. Internet Service Providers

Section 12.7. Campus Storage Networks

Section 12.8. Remote Tape Vaulting

Section 12.9. Disaster Recovery

Section 12.10. Chapter Summary

Chapter 13. SAN Issues

Section 13.1. Standardization

Section 13.2. Interoperability

Section 13.3. Management

Section 13.4. Convergence

Section 13.5. Chapter Summary

Chapter 14. The Future of SAN

Section 14.1. Integration of SANs into Mainstream Networking

Section 14.2. Ubiquity of Shared Storage

Section 14.3. Virtualization

Section 14.4. Human Factors

Section 14.5. Contributing Technologies

Section 14.6. Chapter Summary

Appendix A. SAN Resources

Section A.1. Standards and Proposals

Section A.2. Fibre Channel Technical and Marketing

Section A.3. IP Storage Technical and Marketing

Section A.4. Related Web Resources

Appendix B. SAN and Related Vendors

Section B.1. SAN System Vendors

Section B.2. SAN Storage and Tape Vendors

Section B.3. Fibre Channel Products

Section B.4. IP SAN Products

Section B.5. SAN Extension


Section B.6. SAN Management and Virtualization

Section B.7. Gigabit Ethernet Vendors

Section B.8. Test Equipment and Verification Labs

Appendix C. The Standardization Process

Appendix D. The Storage Networking Industry Association (SNIA)

Section D.1. Board of Directors

Section D.2. Executive Director and Staff

Section D.3. SNIA Technology Center

Section D.4. Customer Councils

Section D.5. Committees

Section D.6. Technical Workgroups

Section D.7. Industry Forums

Section D.8. SNIA and Other Industry Associations

Section D.9. Summary

Appendix E. The SNIA Shared Storage Model

Revision history

Usage terms

Executive summary

Acknowledgments

The shared storage vision

Why a model for shared storage?

The SNIA Shared Storage Model

Where can it be done?

The services subsystem

Additional topics

Some common storage architectures

Block storage aggregation in a storage network ("SAN appliance")

Summary and conclusions

Appendix F. The SNIA Dictionary of Storage Networking Terminology

Appendix G. SAN Essays

SAN Customers and SAN Vendors

Standardization and Storage Networking Technologies

Standards Compliance versus Interoperability

Storage Resource Management

Going the Distance with Storage Data

Shared Storage for the Masses

SAN Security

Infrastructures and Applications

Disaster Recovery in an Uncertain World

Enabling iSCSI Migration

Bibliography


Copyright

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, please contact:

U.S. Corporate and Government Sales

(800) 382-3419

[email protected]

For sales outside of the U.S., please contact:

International Sales

(317) 581-3793

[email protected]

Visit Addison-Wesley on the Web: www.awprofessional.com

Library of Congress Cataloging-in-Publication Data

Clark, Tom, 1947 Aug. 12–

Designing storage area networks : a practical reference for implementing storage area networks / Tom Clark—2nd ed.

p. cm.

Includes bibliographical references and index.

ISBN 0-321-13650-0 (alk. paper)

1. Computer networks. 2. Information storage and retrieval systems. 3. Computer storage devices. 4. Internetworking (Telecommunication) 5. Fibre Channel (Standard)

I. Title.

TK5105.5.C547 2003

004.6—dc21 2003041450

Copyright © 2003 by Pearson Education, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.

For information on obtaining permission for use of material from this work, please submit a written request to:

Pearson Education, Inc.

Rights and Contracts Department

75 Arlington Street, Suite 300

Boston, MA 02116

Fax: (617) 848-7047

Text printed on recycled paper

1 2 3 4 5 6 7 8 9 10—CRS—0706050403

First printing, March 2003


Preface

This book expands and updates the content of my earlier book Designing Storage Area Networks: A Practical Reference for Implementing Fibre Channel SANs, published in the winter of 1999. At a thin 202 pages, that book was the first to provide a brief overview of storage area network (SAN) technology and was widely adopted as an introductory and training text for both vendors and customers. Because of the rapid growth and technical development within the storage industry over the past three years, however, new functionality and technology initiatives are already transforming the SAN landscape. Advances in Fibre Channel performance and switch technology, the introduction of SANs based on Transmission Control Protocol/Internet Protocol (TCP/IP) and Gigabit Ethernet, and the emergence of storage virtualization have given customers more options for addressing their data storage needs. Although some of these technical initiatives are presented in my second book, IP SANs: A Guide to iSCSI, iFCP, and FCIP Protocols for Storage Area Networks, this text focuses on the practical considerations implementers should think about in designing SAN solutions with today's diverse technologies.

Storage area networks are now recognized as the preferred solution for fulfilling a wide range of critical data storage needs for institutions and enterprises. The success of SANs in establishing real end-user value is affirmed by the adoption of the technology by all major solution providers as their flagship server and storage offerings. IBM, Hewlett-Packard, Sun, Dell, and others now provide certified SAN configurations for performance, high availability, and backup of storage data, and storage vendors such as EMC, Hitachi Data Systems, XIOtech, and others offer SAN interfaces on their premier products. Although the current market penetration of SAN-based solutions is only 20 percent of the total storage market, SANs are expected to capture the majority of the market within a few years.

The emergence of storage networking represents the fusion of two distinct technologies, each with its own priorities, vocabulary, and culture. For 20 years, the glacial pace of storage development gradually resulted in smaller, higher-capacity disks and faster channel performance, but this evolutionary inertia was always bounded by a rigid master/slave relationship between hosts and storage. By contrast, the rapid and more volatile development of networking technology has overturned previous architectures and relationships and, as demonstrated by the Internet, has delivered a global capability with innovative and enriched feature sets. Combining the more conservative outlook of storage with the more dynamic worldview of networking has resulted in a collision of cultures that has actually benefited both disciplines. Data storage has been transformed from the frumpy wallflower of data communications into a stellar attraction for investors, vendors, and technologists. Networking has gained an endless supply of high-volume data that further extends and justifies its presence throughout the enterprise. It has also gained new opportunities to create storage-specific interfaces and functionality.

This synthesis of storage and networking into a new technology has presented both challenges and opportunities for customers and vendors. Storage administrators and managers must now be conversant in networking concepts and consequences such as addressing, routing, and network convergence, and network architects and administrators must learn the mysteries of LUNs, JBODs, and RAID levels. This book is therefore written for a broad audience of IT managers, administrators, consultants, and technical staff whose responsibilities may span both storage and networking implementations. In addition, storage networking has generated new concepts and issues previously undefined by either storage or networking. The unexpected consequences of positioning storage in an open, networked environment are also examined in the text, and particularly in the applications studies.

Although storage networking is an enabling technology for dealing with the massive growth of storage data, it is often criticized for presenting persistent interoperability and management issues. Some of the problems associated with SANs are simply the byproduct of a new technology struggling to break new ground. Other problems are, unfortunately, vendor-induced in an effort to retain market share. Overcoming interoperability issues and management complexity is a prerequisite for wider-scale adoption of SAN technology, and several industry initiatives are now attempting to address the remaining obstacles. Although it may take several years for some of these efforts to bear fruit, this book discusses them to indicate where help is on the way.

In the first edition of Designing Storage Area Networks, the underlying infrastructure, or plumbing, for SANs was exclusively Fibre Channel. As the first successful gigabit serial transport, Fibre Channel pioneered the signaling and data-encoding mechanisms later adopted by Gigabit Ethernet. Had Gigabit Ethernet appeared first, it might have preempted Fibre Channel as a storage network transport. Although Ethernet has on its side the massive momentum of market presence in the rest of the data communications space, its tardy arrival in storage networking now puts it in conflict with the flourishing Fibre Channel. Although this contention has generated ongoing religious disputes among vendors, both Fibre Channel and Ethernet are, after all, simply plumbing. This book provides technical detail on both Fibre Channel and Ethernet-based SAN technologies, in the expectation that customers will make their own decisions on which transport best meets their needs. The vast majority of vendors selling SAN solutions are planning to provide both Fibre Channel and IP-based products to accommodate their customers' requirements.

The people who have direct responsibility for designing and implementing SANs have a desperate need for more detailed information on product capabilities and interoperability, information such as which version of a particular backup application works with a specific database or level of operating system. Given the lead time required for hard-copy publishing, it is impossible to capture this very useful but granular information. By the time such a text hit the stands, it would be outdated and of little value. The Storage Networking Industry Association (SNIA) is currently establishing end-user organizations that maintain online databases of customer issues and solutions. I urge any reader seeking more detailed information on SAN solutions to become involved in this SNIA-led effort and to help build the practical resources that will benefit the user community at large.

Designing a storage network requires, first of all, answering a basic question: What is the application? Customers do not, after all, spend millions of dollars on storage arrays, tape subsystems, switches, servers, and cabling for the sake of the technology itself. Applications are fostering SAN deployment, and applications are as varied as the core requirements of diverse institutions and enterprises. Although vendors may package canned solutions to fulfill common needs such as storage consolidation or tape backup, the SAN architect should avoid being driven by vendors, instead driving the vendors to solve their specific application requirements. This text attempts to define the most common applications that are more efficiently served by SAN technology, but these examples only begin to address a much broader spectrum of real-world applications that storage networking may benefit.

The Organization of This Book

This book takes you through a progression of concepts that in the end should provide a foundation for understanding storage networking infrastructures and applications. It assumes no prior knowledge of SANs and yet attempts to strike a balance between basic and advanced content. Hopefully, this mix will provide sufficient technical detail for those who need it and a meaningful overview for those who want to understand SAN technology at a more abstract level.

The first two chapters provide a framework for understanding the central concepts of shared storage. Chapter 1 reviews the SNIA Shared Storage Model, which clearly abstracts the basic layers of storage applications and underlying infrastructure. The SNIA Shared Storage Model is a useful tool for understanding SAN architectures and positioning them in relation to upper-layer application requirements. It has also proven to be a useful tool for justifying SAN acquisitions to management by explaining how a particular solution will better serve business requirements. Chapter 2 provides an overview of storage and networking concepts and explains how the fusion of these ideas has created new means to solve data storage issues.

Because Fibre Channel was the first transport to appear on the storage networking scene and continues to ship in significant volumes, the next three chapters provide a technical discussion of Fibre Channel protocols, topologies, and products. Chapter 3 reviews lower-layer physical transport, protocols, and addressing. Chapter 4 reviews Fibre Channel topologies, with emphasis on the Fibre Channel fabric switch services provided by most SAN solutions today. Chapter 5 describes the product suite developed by vendors for Fibre Channel servers, storage, interconnects, and other components. Collectively, they provide a rich toolset for the SAN architect in designing efficient solutions.

Beginning in 2000, the initiative to transport storage data over mainstream TCP/IP networks generated IP storage networking architectures and products. These products further enhance the ability of SAN designers to craft shared storage configurations that leverage IP exclusively or integrate both IP and Fibre Channel into a heterogeneous storage network. Chapter 6 reviews the protocols and issues unique to IP SANs, and Chapter 7 describes IP storage products that have been introduced to the market over the past two years.

Chapter 8 provides an overview of storage-specific applications supported by the SAN infrastructure. Server clustering for high-availability data access, tape backup for data archiving, and data replication via disk mirroring are commonly deployed as part of a SAN solution, although these storage applications in turn are intended to serve high-level customer business applications and databases.

Inevitably, the gains provided by SANs are accompanied by some pain. Although vendors attempt to debug and certify SAN configurations in advance, problems naturally arise during installation or subsequent operation. Chapter 9 explains some basic troubleshooting techniques and tools for identifying and isolating storage network problems.

Chapter 10 discusses SAN management and its unique challenges. Managers of shared storage must integrate transport management with management of data placement on disk or tape. Creating a unified management capability has been difficult, even for software vendors with ample resources to address SAN management. The Common Information Model (CIM) initiative, however, is at last getting traction in the industry and may provide the framework required for comprehensive management of storage networks.

Chapter 11 attempts to describe the amorphous entity known as "storage virtualization." Storage virtualization has suffered somewhat from excessive vendor marketing, but is a viable technology for simplifying storage administration and opening a broader market for SAN solutions. This chapter covers the theoretical capabilities of storage virtualization as well as the more limited functions currently available in shipping products.

Institutions and enterprises have been implementing SANs for a wide variety of applications. Some of these are discussed in Chapter 12, Application Studies, with examples of potential issues customers might face in the course of deploying and administering SANs. Where applicable, combinations of Fibre Channel and IP storage products are proposed, especially for addressing metropolitan and wide area SAN configurations.

Chapter 13 discusses the still unresolved issues of SAN technologies, including interoperability, management, and convergence of SANs with mainstream networking. For engineers, these issues are simply challenges to be mastered, provided that market competition does not artificially thwart their efforts.

As a summary of SAN design, Chapter 14 offers speculation on the future of storage networking. That future promises to be as dynamic as the emergence of SANs was, with the rapid development of new storage initiatives such as virtualization and IP-based SANs, and accommodation of new host interconnects such as InfiniBand.

In this edition I have included in the appendixes additional material that may be useful as reference or background information on storage networking in general. The SNIA has produced some very valuable work in the form of the SNIA Shared Storage Model and the SNIA Dictionary. I encourage vendors as well as consumers of storage networking technology to take advantage of the SNIA's ongoing work and, when possible, to participate in SNIA activities. The more closely customers and technologists align their common goals, the more quickly viable solutions can be put into customers' hands.


Acknowledgments

In writing the current edition of Designing Storage Area Networks I have been the beneficiary of the hard work done by thousands of others over the past decade of SAN development. Explaining how a technology works is trivial compared with the enormous effort required to define and produce viable solutions. I therefore first of all wish to acknowledge, collectively, the technologists, engineers, architects, implementers, systems engineers, support personnel, end users, and yes, even marketers, who have helped to shape the diversity of products and applications that have enabled storage networking. Storage networking is an enabling technology for ready access to information, which in turn holds promise for improving our common condition.

I would like to thank Karen Gettman, my editor, and Emily Frey and the staff at Addison-Wesley for managing this project and facilitating the reviews and compilation of the manuscript. Without their constant support and efficiency in handling publication issues, seeing this work through to completion would not have been possible.

I would also like to thank my peers who reviewed the text, offering suggestions and corrections to the draft manuscript. My thanks to Larry Krantz, Charles Monia, Milan Merhar, Bob Snead, and Gary Orenstein for their endurance and their input on the technical and literary aspects of this work.

My frontline editor and wife, Lou Clark, has now become accustomed to the spousal abandonment that a book project implies. The several months of lost weekends, solitary evenings, techno-babbling, and occasional grumpiness of the author would no doubt have been unbearable had it not been punctuated by the Pacific salmon and halibut runs, the alder smoke from the Lynx grill, the bounty of the eastern Washington vineyards, and the excellent food at Kanishka's Cuisine of India in downtown Redmond, Washington. Constant and enduring thanks to Lou for reminding me that man cannot, after all, live on command descriptor blocks and protocol primitives alone.

This book has repeated references to the Storage Networking Industry Association, on whose board I sit. Funded primarily by storage and storage networking vendors and staffed primarily by volunteers from member companies, the SNIA has attracted great talent and expertise to fulfill its mission of ensuring that storage networks become complete and trusted solutions across the IT community. Work on standards requirements for new storage technologies, standards compliance, interoperability, management, education, and other efforts requires consistent focus and dedication, and the people who have undertaken these tasks on behalf of the IT community are seldom publicly recognized for their work. I therefore would like to acknowledge all the volunteer technologists who have invested their time and energy in SNIA technical workgroups, committees, forums, and the Technical Council and board, as well as the companies and academic institutions who have sponsored them.

Thanks, too, to Jim Vale of Network Associates, who early on recognized the value of SANs for IP networking diagnostics and is helping to bring storage awareness to the mainstream IT community.

Finally, I would like to acknowledge the customers who have provided valuable insight into the strengths and weaknesses of SAN solutions in practical deployments. Many thanks to the members of the SNIA Customer Executive Council and the SNIA Customer Advisory Council, and in particular to Gary Johnson and Mark Price of Carlson Companies and to John Alley, Brad Basden (and friends), and Mark Licata of Microsoft Corporation.

Tom Clark


Chapter 1. Introduction

Designing storage area networks requires an understanding of the upper-layer applications that depend on storage resources as well as the storage architectures available to satisfy them. Although the specific application requirements may vary from customer to customer, a framework has been developed to help storage architects determine which storage solutions may be most appropriate. The SNIA Shared Storage Model (SSM) provides a guide to understanding the relationships between host applications, storage networks, and storage facilities.

1.1 Using the SNIA Shared Storage Model

Sharing storage resources among multiple servers or workstations requires a peer-to-peer network that joins targets to initiators. The composition of that network and the type of storage data traversing it vary from one architecture to another. Generally, shared storage architectures divide into storage area networks (SANs) and network-attached storage (NAS). For SANs, the network infrastructure may be Fibre Channel or Gigabit Ethernet, and the type of storage data being transported is block Small Computer Systems Interface (SCSI) data. For NAS, the network infrastructure is typically Ethernet (Fast Ethernet or Gigabit Ethernet), and the type of storage data carried across the network is file-based. At the most abstract level, then, the common denominator between SAN and NAS is that both enable sharing of storage resources by multiple initiators, whether block-based or file-based.
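The block/file distinction can be made concrete with a brief sketch. The following Python fragment is an illustration of the concept only, not code from this book, and the device and mount paths are hypothetical: block-level access addresses a raw volume by logical block address, which is the kind of I/O a SAN carries as SCSI traffic, while file-level access simply names a file and leaves block placement to the serving system, which is what a NAS filer provides.

```python
# A minimal sketch (not from the book) of block versus file access.
# Paths are hypothetical examples.

BLOCK_SIZE = 512  # a common logical block size; actual devices may differ


def read_block(device_path: str, lba: int) -> bytes:
    """Block access: read one logical block at a given logical block address."""
    with open(device_path, "rb") as dev:
        dev.seek(lba * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)


def read_file(file_path: str) -> bytes:
    """File access: read a named file; the filer maps it to blocks internally."""
    with open(file_path, "rb") as f:
        return f.read()


# Example usage (hypothetical): a raw SAN volume versus a file on a NAS mount.
# data = read_block("/dev/sdb", lba=2048)
# doc = read_file("/mnt/nas_share/report.doc")
```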

Understanding the role of direct-attached, SAN, and NAS solutions and fitting them into a coherent IT storage strategy can be a challenge for both technologists and the managers who sign off on major storage acquisitions. The SNIA Shared Storage Model offers a useful framework for understanding the relationships between upper-layer applications and their supporting storage infrastructures. The ability to map current storage deployments to proposed solutions helps to clarify the issues that storage architects are attempting to address and also creates a framework for projecting future requirements and solutions.

As shown in Figure 1-1, the Shared Storage Model establishes the general relationship between user applications that run on servers and hosts and the underlying storage domain. Applications may support user activity such as processing online transactions, mining databases, or serving Web content. Storage-specific applications such as management, backup, cluster serving, or disk-to-disk data replication are grouped within the services subsystem of the storage domain. The model thus distinguishes between end-user or business applications at the upper layer, and secondary applications used to monitor and support the lower-level storage domain.

Figure 1-1. The SNIA Shared Storage Model overview


The storage domain subdivides into three main categories: the file/record subsystem, the block aggregation layer, and the block subsystem. The file/record subsystem is the interface between upper-layer applications and the storage resources. Database applications such as SQL Server and Oracle use a record format as processing units, whereas most other applications expect to process files.

Whether useful information appears to the upper-layer application as records or files, both formats are ultimately stored on disk or tape as contiguous bytes of data known as blocks. The size of a data block can vary from system to system, as can the method of mapping records or files to the blocks of bytes that compose them. In all cases, the storage domain requires some means of associating blocks of data with the appropriate record or file descriptors. This function is depicted as the block aggregation layer, which may be based on the host system, within the storage network, or at the storage device. Having been identified with specific records or files, the blocks themselves are written to or read from physical storage, shown in the Shared Storage Model as the block subsystem.
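As a rough illustration of the block aggregation function just described, the following toy Python model is an assumption-laden sketch rather than any real file system or array firmware: it keeps a flat pool of fixed-size blocks (the block subsystem) and a mapping table that associates each file name with the ordered list of block addresses composing it.

```python
# A toy model (illustrative only) of block aggregation: files are split into
# fixed-size blocks, blocks live in a flat block subsystem, and a mapping
# table reassembles them on read.

BLOCK_SIZE = 4096


class BlockAggregator:
    def __init__(self) -> None:
        self.block_subsystem = {}   # block address -> block contents
        self.file_map = {}          # file name -> ordered list of block addresses
        self.next_address = 0

    def write_file(self, name: str, data: bytes) -> None:
        """Split the file into blocks and record which blocks compose it."""
        addresses = []
        for offset in range(0, len(data), BLOCK_SIZE):
            self.block_subsystem[self.next_address] = data[offset:offset + BLOCK_SIZE]
            addresses.append(self.next_address)
            self.next_address += 1
        self.file_map[name] = addresses

    def read_file(self, name: str) -> bytes:
        """Reassemble the file from its blocks in order."""
        return b"".join(self.block_subsystem[a] for a in self.file_map[name])
```

Real implementations, whether in a host file system, a volume manager, or an array controller, are far more sophisticated, but the association of file or record descriptors with blocks is the same idea.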

Also part of the storage domain, but positioned as an auxiliary subsystem, the services subsystem contains a number of storage-specific functions, including management, security, backup, availability, and capacity planning. These services may appear as integrated functions in storage products or as stand-alone software applications used to monitor and administer storage resources. A particular block aggregation method may require a unique management service. NAS devices, for example, may perform backups somewhat differently than their SAN brethren, requiring a separate type of application in the services area.

Considerable engineering effort has been invested in creating products to reliably integrate the file/record, block aggregation, and block subsystems into viable shared storage solutions. Perfecting gigabit transport of block data to ensure data integrity over very high speed serial links, for example, took years of standards development and verification testing. Development of the services subsystem was not feasible until these basic infrastructure issues were resolved. Consequently, the creation of management, security, and other auxiliary services has trailed somewhat behind the deployment of shared storage networks. Today, the lack of enhanced management services inhibits further expansion of shared storage in the market, and the focus of the storage industry is shifting from infrastructure to auxiliary services that will make it easier to install and support shared storage solutions. In using the Shared Storage Model to position your own storage solutions, you should anticipate that new products will become available in the services subsystem to enable management, security, and other functions.

With the layered architecture of the Shared Storage Model as a guide, it is now possible to insert server and storage components to clearly differentiate between direct-attached, SAN, and NAS configurations. As shown in Figure 1-2, direct-attached storage (DAS) extends from the server to the storage target by parallel SCSI cabling. This is the most common storage configuration today, although shared storage is expected to gradually displace DAS over the next few years. In this example, the left side of Figure 1-2 shows a server with logical volume management (LVM) and software RAID (redundant array of independent disks) running on the host system. As the server receives information from the application that must be written to disk, the software RAID stripes blocks of data across multiple disks in the block layer. The host thus performs the block aggregation function. Whereas software RAID executes the mechanics of striping blocks of data to multiple disks, the logical volume manager presents a coherent image of the data to the upper-layer application in the form of a volume (for example, the M: drive), directories, and subdirectories.

Figure 1-2. Direct-attached storage in the SNIA Shared Storage Model

On the right side of Figure 1-2, servers are shown that have a SCSI attachment to a disk array containing an integrated RAID controller. In this instance, the host is relieved of the task of striping data blocks via software RAID. Instead, the array itself performs this function. Consequently, it is shown overlapping the block and block aggregation layers.
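The striping performed by software RAID on the host, or by the array's RAID controller on the right side of Figure 1-2, can be pictured in a few lines. The Python below is a minimal RAID 0 illustration under simplifying assumptions (a fixed stripe unit, no parity, in-memory "disks"); it is not intended as a working volume manager.

```python
# A minimal RAID 0 sketch: consecutive stripe units of a write are distributed
# round-robin across member disks, then read back in the same order.

from typing import List

STRIPE_UNIT = 4096  # bytes per stripe unit; real arrays use configurable sizes


def stripe(data: bytes, num_disks: int) -> List[List[bytes]]:
    """Distribute stripe units of data across num_disks in round-robin order."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for i, offset in enumerate(range(0, len(data), STRIPE_UNIT)):
        disks[i % num_disks].append(data[offset:offset + STRIPE_UNIT])
    return disks


def reassemble(disks: List[List[bytes]]) -> bytes:
    """Reconstruct the original data by reading stripe units back in order."""
    pieces = []
    depth = max(len(d) for d in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)
```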

SANs alter the relationship between servers and storage targets, as shown in Figure 1-3. Instead of being bound by direct parallel SCSI cabling, servers and storage are now joined through a peer-to-peer network. As in direct-attached storage, logical volume management, software RAID, and hardware RAID still play their roles, but the connectivity between servers and storage devices now allows you to attach any server to any storage target. The exclusive ownership of a storage resource by a server, symbolized by the umbilical tether of SCSI cabling, is no longer mandatory. You can assign shared storage resources at will to designated servers, and you can alter the relationships between servers and storage dynamically to accommodate changing application requirements.

Figure 1-3. Changing the relationship between servers and storage via a SAN


Replacing direct-attached connections with a more flexible network interconnection enables new storage solutions. For example, you can consolidate storage, share resources for both tape and disk, and cluster multiple servers for high availability. Depending on the SAN topology you use, you can also scale networked storage to higher populations of servers and storage devices. A large disk array, for example, can support tens or hundreds of servers in a single SAN, amortizing the cost of the array over more hosts and streamlining storage administration.
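One way to picture the "assign storage at will" flexibility described in the preceding paragraphs is as a LUN-masking table: a mapping from server identifiers to the logical units each server may see. The sketch below uses made-up World Wide Names and LUN labels purely for illustration; it does not represent any vendor's management interface.

```python
# A hedged sketch of dynamic storage assignment: reassigning capacity becomes
# a table update rather than a recabling exercise. Identifiers are fictional.

lun_masking = {
    "10:00:00:00:c9:aa:bb:01": {"lun0", "lun1"},   # database server's HBA
    "10:00:00:00:c9:aa:bb:02": {"lun2"},           # backup server's HBA
}


def reassign(table: dict, lun: str, from_wwn: str, to_wwn: str) -> None:
    """Move a LUN from one server to another by editing the masking table."""
    table[from_wwn].discard(lun)
    table.setdefault(to_wwn, set()).add(lun)


# Example: hand lun1 from the database server to the backup server.
reassign(lun_masking, "lun1", "10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02")
```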

The SAN infrastructure can be Fibre Channel, Gigabit Ethernet, or, with the appropriate IP storage switches, both. To accurately represent actual customer deployments, this model could also depict multiple SANs servicing various upper-layer applications and providing attachment to common or distinct storage resources, as shown in Figure 1-4. For heterogeneous environments, it might be useful to call out which SANs use Fibre Channel fabrics exclusively and which involve a mix of Fibre Channel and IP components.

Figure 1-4. Multiple SANs within the Shared Storage Model

The Shared Storage Model positions NAS devices partly in the file/record subsystem layer, extending down to the block subsystem. NAS devices serve up files and so naturally include the block aggregation functions required to put data to disk. As shown in Figure 1-5, a NAS device is essentially a file server with its own storage resources. NAS devices typically use NFS (Network File System) or CIFS (Common Internet File System) protocols to transport files over the local area network (LAN) to clients. For NAS, the internal SCSI access transport between the NAS intelligence (head) and its physical disk drives is transparent to the user. The NAS disk drive banks may be integrated drive electronics (IDE) at the low end, SCSI attached, or Fibre Channel. Network Appliance, for example, uses Fibre Channel arbitrated loop disk drives for mass storage. This is in effect a SAN behind the NAS device, efficiently serving up blocks of data that the NAS head assembles into files for NFS or CIFS transport.

Figure 1-5. The role of NAS in the Shared Storage Model

The Shared Storage Model in Figure 1-6 captures the general relationship among direct-attached, SAN, and NAS solutions. Because many enterprises have a mix of direct-attached, SAN, and NAS storage solutions, this is a useful framework for associating specific upper-layer applications with current and planned storage configurations. Applications currently running on direct-attached storage, for example, can be redrawn with SAN or NAS components, or multiple direct-attached storage arrays can be redrawn with consolidated SAN-based arrays and SAN-attached hosts.

Figure 1-6. Using the Shared Storage Model to show a mix of DAS, SAN, and NAS solutions


The practical value of applying the SNIA Shared Storage Model is in defining the current user applications and supporting storage resources and mapping those to new infrastructures. This in turn provides a comprehensive overview of customer storage requirements and options and can be used as a blueprint for further development of the network.

1.2 Example: Carlson Companies

An excellent example of this practical application of the model is provided by the IT staff of Carlson Companies. Carlson Companies is one of the largest privately held companies in the United States, with more than 180,000 employees in more than 140 countries. Its IT division, Carlson Shared Services, acts as a service provider to its internal clients and consequently must support a spectrum of user applications and services.

In formulating a global IT strategy for networking and storage, the Carlson storage architects began with their current applications and the supporting server/storage infrastructure. Originally, Carlson's business applications were supported on a combination of direct-attached storage arrays and internal storage. As shown in Figure 1-7, database applications ran on servers attached to large storage arrays, and other applications ran on servers with internal storage.

Figure 1-7. Carlson Companies' storage access prior to installing shared storage


Overlaid on the SNIA Shared Storage Model, Carlson Companies' original storage configuration clearly demarcates which applications access data via records, which via files, and which via device-level block aggregation or host-supported block aggregation. The first step in determining which applications might be better served by shared storage and which shared storage technologies are most appropriate for specific upper-layer applications is to diagram the upper-layer applications and their storage access methods.

By using the Shared Storage Model, the storage architects clarified the relationship between application requirements and storage resources and also revealed the administrative overhead incurred by dedicated storage units. The system required four large storage frames and six additional arrays to support Carlson's data center applications. Each storage unit required separate administration as well as maintenance contracts from the supplying vendor. Because storage consolidation is one of the key benefits of SANs, the Shared Storage Model offered a framework for modifying the components in the block subsystem to reduce the number of storage devices while still serving Carlson's business applications.

As shown in Figure 1-8, analysis of Carlson's former storage deployment led to a proposed shared storage configuration that leveraged both SAN and NAS technologies to streamline storage administration and enable more efficient use of storage capacity. This design replaces the four large direct-attached storage frames with a single SAN-attached storage array supporting more than 10 terabytes of data. In this example, Oracle and other Carlson custom applications now share a single, highly available storage resource that is more easily administered and offers more economical maintenance support compared with the four direct-attached storage frames it replaced.

Figure 1-8. Carlson Companies' storage access after installing SAN and NAS storage


Shared directories and other internal applications have moved from direct-attached, dedicated storage to large NAS filers. In this instance, NAS was chosen for its cross-platform support and for its ability to serve remote NAS access for Carlson clients. Infrastructure applications that have no compelling reason to reside on either NAS or SAN remain on internal storage, as shown on the right side of the diagram.

Remote storage access is also a significant factor for organizations with geographically dispersed sites. In Carlson Companies' global IT strategy, safeguarding corporate data generated at remote locations is essential for business continuance. Relying on each remote office to back up its own data locally offers no guarantee that backups are successfully completed or that data can be restored in the event of failure. Consequently, Carlson decided to streamline backup operations by bringing remote backup operations into its central data center.

As shown in Figure 1-9, the Shared Storage Model is used to define four separate remote office backup scenarios. Remote offices with no local file servers have valuable corporate data on internal drives of laptops and desktop workstations. Typically (and unfortunately) for most enterprise networks, this data is rarely backed up, resulting in loss of productivity if a laptop or desktop disk drive fails. For these sites, Carlson uses software that enables block change backup, meaning that only the data that has changed since the last backup is sent to the data center backup facility. This arrangement reduces the amount of data that must be sent across the wide area network and accomplishes the goal of securing dispersed corporate data assets. Similarly, remote sites with a local file server can be backed up using block changes, again reducing the amount of traffic generated over the WAN.

Figure 1-9. Carlson Companies' remote backup scenarios
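The block change backup used in the first two scenarios can be sketched briefly. This simplified Python example is a generic illustration of the technique, not the actual product Carlson deploys: each fixed-size block of a volume image is hashed, and only the blocks whose hashes differ from the previous backup are shipped across the WAN, which is what keeps the traffic low.

```python
# A simplified sketch of block change (changed-block) backup.

import hashlib
from typing import List, Tuple

BLOCK_SIZE = 64 * 1024  # 64 KB backup blocks; real granularity is product-specific


def block_hashes(path: str) -> List[str]:
    """Hash every fixed-size block of the file or volume image at path."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def changed_blocks(path: str, previous: List[str]) -> List[Tuple[int, bytes]]:
    """Return (block index, block data) for blocks changed since the last backup."""
    changed = []
    current = block_hashes(path)
    with open(path, "rb") as f:
        for index, digest in enumerate(current):
            old = previous[index] if index < len(previous) else None
            if digest != old:
                f.seek(index * BLOCK_SIZE)
                changed.append((index, f.read(BLOCK_SIZE)))
    return changed
```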


In the remaining two scenarios, a remote office may have a local NAS or SAN installed. For NAS, Carlson is leveraging vendor-supplied file backup utilities to replicate data from the remote site to the data center NAS equipment. At the data center, the NAS-based data can then be centrally archived to tape. For remote SAN installations, Carlson is using IP storage technology to transfer locally generated Fibre Channel SAN data to the corporate SAN. In Carlson's case, the data center SAN is built with an IP SAN core, thereby facilitating IP-based block backup between remote and central SANs.

For Carlson Companies, the SNIA Shared Storage Model offers a coherent framework for analyzing current and future storage requirements for both the data center and its dispersed remote locations. For storage architects, the Shared Storage Model is a tool for defining application and storage relationships while enabling drill-down to more detailed storage-specific issues. For CIOs and IT executives, the model provides a basis for understanding the composition of their storage deployments and seeing how proposed acquisitions contribute to broader strategic IT goals. Additional information on the SNIA Shared Storage Model is available in Appendix E as well as on the SNIA Web site at www.snia.org.

1.3 Text Overview

The first edition of this work attempted to provide a comprehensive overview of SAN technology for both technical and nontechnical readers. Because SANs were relatively new arrivals on the storage scene, it was possible to accomplish this goal in a fairly short work by using a dense writing style that one reader has called "information compression." The current work necessarily includes additional discussion of new SAN technologies that have emerged over the past three years, but it attempts to preserve an information compression ratio that delivers more knowledge in fewer pages. Because few people in today's information technology space have the time to read a work cover to cover, it is hoped that this edition of Designing SANs will be a useful reference for readers who need ready access to specific information on various SAN topics.

The chapter progression begins with lower-level technical overviews of storage and networking primarily related to the underlying SAN plumbing that supports higher-level applications. This discussion includes both abstract standards and the products developed from them. SAN infrastructure components that were previously based on Fibre Channel alone now include IP-based solutions, so additional space in the current work is devoted to heterogeneous SAN design options. Following the infrastructure discussion, middle-layer SAN issues, such as storage-specific applications, management, and virtualization, are reviewed. With these additional concepts in place, it is then possible to discuss concrete implementations of SANs in customer environments. The text concludes with a general overview of outstanding issues yet to be resolved in the areas of standardization, interoperability, and management, along with speculation on the future direction of storage area networking as the adoption of SANs expands throughout the market.

Chapter 2 provides an overview of both storage and networking concepts and explains how the fusion of these technologies has created new means to solve data storage issues. As in any dialectical synthesis, the resulting product has new attributes quite distinct from its constituent parts. Storage has thus placed new demands on networking, and networking has fundamentally altered storage relationships. As a fusion technology, storage networking requires a basic understanding of networking and storage—a need addressed by this chapter—as well as new concepts addressed by later chapters.

Storage area networking has deep roots in Fibre Channel technology. Fibre Channel was the first serial transport to successfully solve gigabit transport, data encoding, optics, transport protocol, and topology issues required for high-performance shared storage applications. Chapter 3, Fibre Channel Internals, reviews the lower-level operation of the Fibre Channel physical and protocol layers, including framing, class of service, flow control, and naming and addressing conventions.

Chapter 4 provides the foundation for understanding Fibre Channel topologies that have been employed for SANs. Today, most Fibre Channel SANs are based on switched fabrics. Chapter 4 therefore focuses on fabrics and fabric switch features and devotes less space to point-to-point and arbitrated loop configurations. Arbitrated loop, however, is still commonly used for JBODs ("just a bunch of disks") and as back-end storage for some NAS products, and so this chapter also discusses loop principles.

Chapter 5 examines the Fibre Channel products that have been created to accommodate server and storage attachment to a SAN. This spectrum of products includes transceivers and cabling, host bus adapters (HBAs), Fibre Channel-attached storage and tape, loop hubs, and fabric switches. Because extension of native Fibre Channel over wide areas using tunneling or dark fiber is dependent on the same Fibre Channel behavior demonstrated in a local environment, native Fibre Channel extension and IP tunneling products are also grouped into this chapter.

Over the past two years, storage over IP has entered the SAN market to complement or sometimes compete with Fibre Channel solutions. Chapter 6 provides a brief overview of IP SAN technology. Native IP storage protocols include Internet Fibre Channel Protocol (iFCP) and Internet SCSI (iSCSI). As with Fibre Channel, IP storage must provide discovery and naming mechanisms, and these are reviewed in the sections on Service Locator Protocol (SLP) and Internet Storage Name Service (iSNS). Because IP SANs directly converge mainstream IP networking with storage, additional IP services, such as Quality of Service (QoS) and IP Security that can be applied to storage, are discussed. IP is also an enabling technology for metro and wide area SANs, and Chapter 6 concludes with a review of buffering and speed-of-light latency issues that a storage architect must consider when designing SANs over distance.

Chapter 7 examines various classes of IP SAN products that have been or will be introduced into the market. Much of an IP SAN infrastructure can be borrowed from mainstream IP networking. Gigabit Ethernet switches and IP routers, for example, are not specific to SANs but can be incorporated into SAN design without modification. Similarly, TCP offload engines (TOEs) have been developed to solve host processing issues historically associated with TCP but also have become enablers for high-performance IP SANs. IP storage gateways can perform Fibre Channel-to-iSCSI or Fibre Channel-to-iFCP conversion, or both. As in the discussion of Fibre Channel products, the focus here is on product functionality and not specific vendor implementation. Although it is tempting to review and compare individual vendor products, the market is moving too quickly for a book to capture capabilities of specific vendor products.

After establishing a foundation for understanding SAN plumbing options in the preceding chapters, Chapter 8 begins a higher-level discussion of SAN software applications. This chapter reviews storage-specific applications, including server clustering, backup, data replication, distributed file systems, and file sharing. Some of these solutions are software only; others can be embedded in storage hardware products.

Chapter 9 examines problem isolation in SANs. Problems can and do occur eventually and can occur at any layer in a SAN configuration. This chapter looks at some simple problem isolation techniques as well as hardware and software tools that facilitate identifying and isolating faults in a SAN.

Management of SANs is a broad topic because it includes management of the SAN transport as well as data placement and resource management. Chapter 10 discusses the major categories of SAN management and management protocols that can be used. This chapter also includes an overview of the Common Information Model (CIM) initiative. Although CIM has been under construction for some time, it has recently received strong industry support in the hopes that it will enable a coherent and unified solution for managing both data transport and data placement from a single platform.


Chapter 11 takes the reader into the Twilight Zone technology of storage virtualization. Like a quantum particle in a transient state, storage virtualization can both exist and not exist in the same moment. Elementary storage virtualization such as RAID certainly exists and has provided significant value for storage applications for some time. Advanced, application-aware storage virtualization has yet to appear and will require significant engineering effort to bring into being. The vendor and marketing hype aside, storage virtualization holds great promise for SANs and will enable customers to implement shared storage solutions throughout their networks.

Chapter 12 covers application studies of SAN solutions. From storage-intensive operations, such as full-motion video editing and prepress graphic creation, to more ubiquitous customer requirements for streamlined tape backup, SANs have successfully addressed a wide range of business needs. In discussing storage consolidation, server clustering, content distribution, disaster recovery, and other applications, the material in Chapter 12 attempts to offer examples that accommodate Fibre Channel, storage over IP, or both in a heterogeneous solution. Where possible, the SAN infrastructure is presented in a technology-neutral way so that the SAN architect can thereafter overlay the customer transport of choice.

Chapter 13 reviews some of the main areas of concern for SANs, including standardization, interoperability, and management. These are mutually dependent and overlapping issues that vendors and customers together must address. Standardization of a technology, for example, has little value for customers if it does not result in interoperability of vendor products. Similarly, even the most sophisticated management platform would be ineffectual if its cycles were consumed reporting errors due to standards violations or interoperability conflicts.

Chapter 14 is my opportunity to engage in wild speculation about the future of SAN technology. No one has a crystal ball in such a fast-paced technology, and the storage industry itself is being shaped by the push of emerging technologies and the pull of customer demands. Although InfiniBand may help transform the high-performance server market, storage virtualization and IP storage networking are now the main forces driving SANs to ubiquitous adoption.

This edition of Designing SANs also includes supplementary material in the appendixes to provide background and reference information on storage networking. In addition to resource and vendor listings, material is provided on the standardization process behind SAN development, including the work of NCITS/ANSI T10 and T11 groups and the Internet Engineering Task Force (IETF). The structure and activity of the SNIA are also included, as is the work of the SNIA Technical Council on the Shared Storage Model and the SNIA Education Committee's SNIA Dictionary. Some of my previously published articles on storage networking are grouped in Appendix G, SAN Essays.


1.4 Chapter Summary

Using the SNIA Shared Storage Model

• The SNIA Shared Storage Model provides a framework for understanding the relationship between applications and underlying storage resources.

• At the high level, the SSM separates applications from the supporting storage domain.

• The storage domain includes the file/record subsystem, block aggregation layer, block subsystem, and auxiliary services subsystem.

• The file/record subsystem defines whether data is passed to the upper-layer application as files or as records.

• Database applications typically use records.

• Files and records ultimately reside on storage disks as data blocks.

• A data block is a contiguous group of data bytes stored on disk.


• Data blocks are grouped into files or records at the block aggregation layer.

• Block aggregation can be performed on the host, within the SAN, or at the storage device.

• The block subsystem is composed of physical disk or tape devices that store data blocks.

• The services subsystem includes discovery, monitoring, management, security, backup, availability, capacity planning, and other storage-specific functions that enhance the operation of the storage domain.

• A SAN resides between the file/record subsystem and the block subsystem.

• SANs enable connection between multiple servers and storage resources in a peer-to-peer network.

• SANs serve data blocks to the host; NAS serves files to the host.

• The SAN infrastructure may be Fibre Channel, IP SAN, or some combination.

• A NAS device may span the file/record subsystem to the lower-level block subsystem.

• The Shared Storage Model can be used to overlay customer applications and storage and to facilitate planning for shared storage alternatives.

• The SSM is a convenient tool for both data center and remote site SAN design.


Chapter 2. Storage and Networking Concepts

Storage and networking have evolved as two distinct technologies. Storage, as embodied by the concepts of initiators and targets, assumes a master/slave relationship between connected devices. Networking, by contrast, assumes more egalitarian peer-to-peer relationships between connected devices. The focus of storage technology has been efficient data placement, whereas the focus of networking has been efficient data transport.

By converging storage with networking, SAN technology must provide high-performance transport of data from A to B as well as efficient placement of data after it has safely arrived. In addition, SANs must resolve the contradiction created by thrusting master servers and storage slaves into a democratic peer infrastructure. This chapter examines the basic concepts of networking and storage to provide a foundation for understanding what occurs when they are combined.

2.1 Networking in front of the Server

The front of the server faces outward toward the user community via a network interface. Typically, you network dispersed computer resources across an enterprise by using Ethernet as the physical transport and TCP/IP as the networking protocol. The nearly universal adoption of Ethernet and IP is the result of a long natural selection process against competing transports and protocols. Token Ring, for example, provided an orderly and stable networking solution, but it could not survive the economies and simplicity offered by Ethernet. SNA once dominated enterprise data communications networks, but in the end it succumbed to open systems IP networking.

Networks can be defined by protocol (for example, an IP network) or by geography (for example, a local, metropolitan, or wide area network). Regardless of the specific means employed to move data from one device to another, network infrastructures generally follow a hierarchy that extends from the physical layer up through the end-user application.

The abstract reference for this hierarchy is the OSI (Open System Interconnection) Reference Model. Although individual network protocols may deviate from the OSI Reference Model, it is a useful structure for understanding the basic features of networking.

As shown in Table 2-1, the OSI model divides networking into seven functional layers. Although these layers are interdependent, the model gives you the flexibility to substitute one implementation for another (such as Ethernet for FDDI) within a layer. In this way, the upper-layer functions can be supported by a wide variety of lower-layer implementations. An e-mail carried in an IP datagram, for example, can be consecutively transmitted over Ethernet, Frame Relay, Asynchronous Transfer Mode (ATM), and back to Ethernet at the destination. Layers 1 and 2 in this case would change at each step, and the content of the upper layers would be preserved during transit.

As user data is passed down by an application for transmission across the network, it is wrapped in successive envelopes of information, roughly corresponding to the layers represented in Table 2-1.

A file transfer, for example, would acquire an information envelope on the file's format (layer 6), information useful for maintaining the file transfer session between sender and recipient (layer 5), information for acknowledging receipt of blocks of the file (layer 4), network addressing information (layer 3), information about the type of media on which the data was originated (layer 2), and encoding to place the data on the physical transport (layer 1). At the receiving end, each envelope would be verified and removed by the target until the data that composed the original file could be passed to the upper layers for reconstruction.

Table 2-1. The OSI Reference Model

Layer Number  Layer Name    Description
7             Application   Application layer: Interface for e-mail, file transfer, etc.
6             Presentation  Application data formatting: application-specific
5             Session       End-to-end session control: NetBIOS, application-specific
4             Transport     Transmission control: TCP, UDP
3             Network       Routing protocol: e.g., IP
2             Data Link     Access method: Ethernet, Token Ring, FDDI
1             Physical      Physical transport: twisted pair, fiber optics, coax
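
To make the layering concrete, the following Python sketch mimics how each layer wraps the payload from the layer above in its own envelope, and how the receiver strips the envelopes off in reverse order. The header fields and layer labels are invented for illustration; they are not real protocol headers.

    # Minimal sketch of layered encapsulation (illustrative headers only).
    def encapsulate(payload: bytes) -> bytes:
        envelopes = [
            b"L6:format=text;",                 # presentation: data format
            b"L5:session=42;",                  # session: session identifier
            b"L4:seq=1;",                       # transport: sequencing/acknowledgment
            b"L3:src=10.0.0.1,dst=10.0.0.2;",   # network: addressing
            b"L2:media=ethernet;",              # data link: access method
        ]
        frame = payload
        for header in envelopes:                # wrap in successive envelopes
            frame = header + frame
        return frame                            # layer 1 would encode this bit stream

    def decapsulate(frame: bytes) -> bytes:
        while frame[:1] == b"L":                # receiver strips each envelope in turn
            frame = frame.split(b";", 1)[1]
        return frame

    original = b"file contents"
    assert decapsulate(encapsulate(original)) == original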


Network protocols such as TCP/IP enable data transfer for both local area and wide area networks. In complex networks, data wrapped in network protocols may pass through multiple LAN and WAN segments before arriving at the final destination. Each routing step through a network segment is generically referred to as a hop. As shown in Figure 2-1, a single LAN segment enables multiple users to access shared resources such as file servers or printers. Multiple LAN switches can be connected to scale to higher concentrations of devices.

Figure 2-1. A LAN segment using switched Ethernet

Local area networks are typically built to link computers at a single site, such as an office building or data center. LANs provide connectivity over a variety of physical links, including twisted-pair wiring, fiber optics, coaxial cable, and wireless transmitters. These media correspond to the first layer of the OSI Reference Model.

The most prevalent LAN transport is Ethernet over twisted-pair copper wiring. Ethernet transmission rates are 10Mbps (megabits per second), 100Mbps (also known as Fast Ethernet), and 1,000Mbps or 1Gbps (Gigabit Ethernet). 10Mbps and 100Mbps speeds are most commonly used for attachment of individual PCs or workstations, with effective data rates of slightly more than 1MBps (megabytes per second) and 12MBps, respectively. Gigabit Ethernet is used primarily for high-performance core networks, and for attachment to file and application servers. The original 10Mbps and 100Mbps Ethernet LANs often supported multiple devices on a single shared bandwidth by use of hubs. Today, most Ethernet deployments use Ethernet switches for dedicated bandwidth to each device.
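
The effective data rates quoted above follow from simple arithmetic: divide the line rate in megabits per second by 8 bits per byte, then allow for framing and protocol overhead. A rough sketch, assuming no particular protocol mix:

    # Rough conversion only; actual throughput depends on frame sizes and protocol overhead.
    def raw_mbytes_per_sec(line_rate_mbps: float) -> float:
        return line_rate_mbps / 8.0             # 8 bits per byte

    for rate in (10, 100, 1000):
        print(f"{rate} Mbps line rate -> {raw_mbytes_per_sec(rate):.2f} MBps before overhead")
    # 10 Mbps -> 1.25 MBps and 100 Mbps -> 12.5 MBps, consistent with the
    # "slightly more than 1MBps" and "12MBps" figures cited in the text.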

Ethernet is ubiquitous, and modern office complexes are prewired with both phone and Ethernet jacks in every office and cubicle. These local user connections are typically run to Ethernet switches, which may, in turn, use Gigabit Ethernet uplinks to tie the local segment into a Gigabit Ethernet backbone. High-port-count (128, 256, or higher) backbone switches enable enterprise networks to allocate connections to users and servers and to manage bandwidth based on application requirements. In addition, most vendors of Gigabit Ethernet switches offer 10Gbps Ethernet ports for interswitch links. High-performance 10Gbps Ethernet links allow scaling to hundreds or thousands of devices in a single network.

As shown in Figure 2-2, if you want to scale networks to efficiently support higher populations of devices, you must manage bandwidth. If you restrict the network to a single speed, bottlenecks will occur as more users attempt to access shared resources. Because Ethernet offers deterministic bandwidth in powers of ten, it is possible to design scalable nonblocking infrastructures. In this example, users are attached by Fast Ethernet, and the departmental LAN switches are attached to the core by Gigabit Ethernet. The core itself can be built with one or more 10Gb interswitch links to provide a high-performance LAN backbone.

Figure 2-2. Scaling the network using departmental and core switches
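
One way to reason about the bandwidth hierarchy in Figure 2-2 is to compute the oversubscription ratio of an edge switch: the aggregate bandwidth of its user ports divided by the bandwidth of its uplink. The port counts and rates below are hypothetical, not taken from the figure.

    # Hypothetical numbers for illustration.
    def oversubscription_ratio(user_ports: int, user_rate_mbps: int, uplink_mbps: int) -> float:
        return (user_ports * user_rate_mbps) / uplink_mbps

    # 24 Fast Ethernet users behind a single Gigabit Ethernet uplink:
    print(oversubscription_ratio(24, 100, 1000))   # 2.4 -- acceptable for bursty office traffic

    # A strictly nonblocking design would keep this ratio at or below 1.0,
    # for example by adding uplinks or using 10Gbps interswitch links at the core.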

Ethernet LANs are normally limited to a single building or campus. To share computer resources with remote sites, wide area networks (WANs) use telecommunications lines, microwaves, or satellites to transmit data.

Routers are an essential ingredient of both LANs and WANs. As shown in Figure 2-3, routers sit on the local LAN segment and move data from one local segment to another and to the telecommunications carrier (AT&T, Qwest, etc.) for remote sites. This connection to the carrier is often depicted using a cloud symbol because any number of microwaves, satellites, or optical switched networks could be used to move data from the local site to the remote. On the other side of the cloud, a communications line connects to a remote router, which in turn serves the remote computer users. As with Ethernet, WAN topologies provide access methods to ensure the orderly transmission of data from one site to another. Various communications methods for the WAN are specified by Point-to-Point Protocol (PPP), Frame Relay, ATM, and Packet over SONET (POS).

Figure 2-3. Routers in LAN and WAN configurations

In Figure 2-3, the connectivity provided by routers enables users in one local network segment to access networked resources in another local segment.

The ability to segment a single network into multiple parts improves the performance and efficiency of each segment while still allowing communication between them. Segmentation also insulates the network as a whole from data-link layer broadcast storms generated within a single part of the network. Traffic within a single network segment is handled at layer 2, whereas layer 3 route processing is required only for traffic destined for another segment. If, alternatively, the entire network were one large layer 2 segment, it would be vulnerable to disruptions. Layer 2 LAN bridging of the 1980s exposed this vulnerability as networks grew to larger populations, and bridged networks were eventually displaced by layer 3 IP routers. The significance of network segmentation for SANs is discussed in Chapter 4 (on Fibre Channel topologies) as well as Chapter 7 (on IP storage products).

Several fundamental principles of networking are common to both LAN and WAN technologies. Networking principles include the following:

• Serial transport

• Access method

• Addressing

• Packetizing of data

• Routing of packets

• Upper-layer protocol support

Because these networking fundamentals also form the foundation of storage area networks, it is important to understand how each contributes to the network infrastructure and enables the paradigm shift now occurring in server-to-storage connectivity.

2.1.1 Serial Transport

Networking is a serial transport. The digital 1's and 0's that compose all data are transmitted sequentially, one bit after another, from source to destination. Serial transmission enables data to be shipped over longer distances with fewer resources at a faster data rate. Serial transmission over fiber-optic cabling, for example, can extend tens of kilometers; over satellite, hundreds of kilometers. Alternatively, a parallel transport (such as a parallel printer interface) must use multiple lines to send groups of bits concurrently. Because of electrical and clocking considerations, parallel transmission is severely limited in distance, usually to less than 25 meters.

A serial link between networked nodes requires, at minimum, a transmit lead to send data and a receive lead to accept data. Early serial links such as RS-232 required control and ground leads as well as the transmit and receive wires. Current serial architectures such as Fibre Channel and Gigabit Ethernet require only transmit and receive links. The clocking and flow control functionality is now integrated into the bit stream itself. This allows a streamlined cabling plant while also enabling full-duplex operation. Both transmit and receive links can be active at the same time, although storage applications typically run in a nearly half-duplex mode. During a write operation, for example, the initiator streams frames on the transmit link while the receive link is used for acknowledgments and status.

2.1.2 Access Method

A networked device requires an access method to gain control of the transport media. Ethernet uses carrier sense and collision detection, whereas Token Ring provides a token that can be claimed by only one user at a time. Fibre Channel arbitrated loop is a shared transport and uses an arbitration process to determine which node has access to the loop. In switched environments such as Fibre Channel fabrics and switched Ethernet, each device has dedicated bandwidth and can send data at will without negotiating for access to the transmission media. To be backward-compatible with traditional shared Ethernet, switched Gigabit Ethernet retains the standard Ethernet carrier sense and collision detection mechanism, but it is really not required in switched environments.


2.1.3 Addressing

Each device on a network must have a unique identity. This identity is established by a unique address, and, depending on the upper-layer protocols that a device supports, a single networked computer may have a unique address corresponding to each protocol layer.

An Ethernet interface card in a PC, for example, has a unique MAC (media access control) address that is assigned by the manufacturer. The 6-byte MAC address is used when data is exchanged between computers on the same LAN segment. If a user wishes to communicate with another device on a different segment, a network protocol such as IP is required. Each device communicating via a network protocol must have, in addition to a unique MAC address, a unique network address. IP provides an addressing scheme for billions of unique network addresses, although you must administer address allocation to avoid duplications and guarantee proper routing of data.

Layer 2 and layer 3 addressing is used for data transport over the network. A networked device may also have other identifiers, such as an alias or name. Fibre Channel, for example, provides a 64-bit World-Wide Name (WWN) as a unique identifier for each Fibre Channel node, whereas iSCSI specifies an extremely long 255-byte name. These name identifiers are not used for routing traffic across the network but instead are meant to preserve the identity of a node in the event its layer 2 or layer 3 address is changed.
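
The different identifiers can be pictured as layered attributes of a single node. The sketch below formats a 6-byte MAC address and a 64-bit Fibre Channel WWN from raw bytes; the sample values and the record layout are invented for illustration.

    # Illustrative address formatting; the byte values are made up.
    def format_id(raw: bytes) -> str:
        return ":".join(f"{b:02x}" for b in raw)

    node = {
        "mac": bytes([0x00, 0x10, 0x5A, 0x12, 0x34, 0x56]),              # layer 2 address (6 bytes)
        "ip":  "192.168.10.5",                                           # layer 3 address
        "wwn": bytes([0x21, 0x00, 0x00, 0xE0, 0x8B, 0x01, 0x02, 0x03]),  # 64-bit name identifier
    }
    print(format_id(node["mac"]))   # 00:10:5a:12:34:56
    print(node["ip"])
    print(format_id(node["wwn"]))   # 21:00:00:e0:8b:01:02:03 -- persists even if the
                                    # layer 2 or layer 3 address changes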

2.1.4 Packetizing of Data

Data is sent across the network in discrete packets, or frames. A large graphics file, for example, must be divided into multiple small packets for transport across the network. Each packet contains a portion of the original file as well as sequencing and source/destination addressing in a packet header. At the receiving end, the network-specific addressing and sequence information is removed, and the data is reassembled to re-create the original file.

Packetizing of data is necessary to preserve data integrity and to provide optimal utilization of the network. If a large block of data were simply dumped onto the network in its entirety, the resources of each hop through the network would be consumed while the block was being forwarded to the next destination. In addition, any unrecoverable bit corruption in the block would force a retransmission of the entire block, further burdening network resources. By breaking large blocks of data into smaller units, you enable your network equipment to handle multiple transactions concurrently, treating each packet as a unit of information and multiplexing packets from different sources to different destinations. Corruption of a single packet will, in the case of TCP/IP, trigger recovery for that packet alone or will, in the case of Fibre Channel, initiate a retransmission of the sequence of frames.
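
A toy example of packetizing follows: a payload is split into MTU-sized chunks, each carrying a small header with source, destination, and sequence number so the receiver can reassemble the original data and request retransmission of a single lost packet rather than the whole file. The header format is invented for the sketch.

    # Minimal packetize/reassemble sketch; not a real framing format.
    def packetize(data: bytes, mtu: int, src: str, dst: str):
        packets = []
        for seq, offset in enumerate(range(0, len(data), mtu)):
            header = f"{src}>{dst}#{seq}|".encode()       # source, destination, sequence number
            packets.append(header + data[offset:offset + mtu])
        return packets

    def reassemble(packets) -> bytes:
        def seq_of(p):
            return int(p.split(b"#", 1)[1].split(b"|", 1)[0])
        return b"".join(p.split(b"|", 1)[1] for p in sorted(packets, key=seq_of))

    file_data = bytes(4000)                               # a 4KB "file"
    frames = packetize(file_data, mtu=1500, src="hostA", dst="hostB")
    assert reassemble(frames) == file_data and len(frames) == 3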

2.1.5 Routing of Packets

Networks are composed of multiple segments that are joined by routers or switches. Physical segmentation of a network is required if you want to avoid overloading the transport with too many users and thus degrading performance. Users on a single shared LAN segment can communicate directly with one another. If a user wishes to communicate with someone on a different LAN segment, a router or switch must forward the data. Managing the bandwidth available on a single segment and allocating sufficient router or switch ports for access to the rest of the network are fundamental challenges of network design.

In addition to getting packets across multiple segments, routing enables the network to create redundant links between those segments. A meshed network refers to a topology that provides multiple data paths between its participants. If a single link goes down, a meshed network can route data around the failure and still get data to its final destination.

Routing in IP networks is most commonly accomplished by using the Open Shortest Path First (OSPF) routing protocol. When you use OSPF, network equipment can monitor the status and capacity of links and determine the optimum path at any given moment through the network. Fibre Channel fabrics use a subset of the OSPF protocol called Fabric Shortest Path First (FSPF).
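
The core idea behind both OSPF and FSPF is a link-state shortest-path computation. The sketch below runs Dijkstra's algorithm over a small, made-up fabric of three switches with assumed link costs; real implementations also handle equal-cost paths, link-state flooding, and topology changes.

    import heapq

    # Dijkstra's shortest path over a hypothetical three-switch fabric.
    def shortest_path(graph, start, goal):
        queue, visited = [(0, start, [start])], set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return cost, path
            if node in visited:
                continue
            visited.add(node)
            for neighbor, link_cost in graph[node].items():
                if neighbor not in visited:
                    heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
        return None

    fabric = {
        "switch1": {"switch2": 1, "switch3": 4},
        "switch2": {"switch1": 1, "switch3": 1},
        "switch3": {"switch1": 4, "switch2": 1},
    }
    print(shortest_path(fabric, "switch1", "switch3"))   # (2, ['switch1', 'switch2', 'switch3'])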

2.1.6 Upper-Layer Protocol Support

Network topologies and protocols provide the communications infrastructure for upper-level applications. The network protocol layer is responsible only for moving data from one point to another. What is actually done with the data after it arrives is the responsibility of upper-level protocols. IP, for example, routes packets through the network to the intended destination. Sitting above IP, the TCP (or similar) layer formats the data for hand-off to the application. For SANs, the dominant upper-layer protocol employed is a variant of the SCSI protocol that is optimized for moving blocks of data to and from disk.

For Fibre Channel SANs, the upper-layer protocol is the Fibre Channel Protocol (FCP). For IP-based SANs, the upper-layer protocol may be Internet Fibre Channel Protocol (iFCP), which puts FCP over TCP/IP, or Internet SCSI (iSCSI), which wraps serial SCSI in TCP/IP. These upper-layer protocols encapsulate basic SCSI read and write commands, status, and data.

2.2 The SCSI Architecture

The SCSI protocol was developed to provide efficient transport of data between computers and peripheral devices such as disks, printers, scanners, and other resources. SCSI sits below the file/record layer in the SNIA Shared Storage Model and receives requests to send or retrieve blocks of data from a peripheral device. Suppose, for example, that an application initiates a write request to the operating system to store data. For the SCSI protocol layer, this write request is interpreted as a command to write a certain number of data blocks to a specified location. As a mediator between the operating system and storage, SCSI is not responsible for how the blocks are assembled for transport or how they are actually placed on disk. As SCSI sends blocks to a destination, the target may be a single physical drive or a RAID controller that will stripe the blocks over multiple physical drives. The responsibility of the SCSI protocol is simply to ensure that the write task is completed and to report success to the operating system, no matter how the physical storage is configured.

SCSI targets are identified by the operating system in a three-part bus/target/LUN descriptor. The bus designator is one of several SCSI interfaces installed on a host system. A traditional parallel SCSI adapter card may represent a single bus, with that bus supporting multiple daisy-chained disks. Alternatively, a Fibre Channel host bus adapter (HBA) or iSCSI network interface may be viewed by the operating system as a SCSI bus. Multiple installed cards would be seen as multiple bus numbers. The target represents a single storage resource on a bus daisy chain, whereas the LUN (logical unit number) designation identifies the SCSI client within the target. A single physical disk, for example, may have one logical unit and consequently one logical unit number. A RAID controller attached to a bus may represent a single target but have multiple logical units and multiple LUNs assigned.

For configuration and management purposes, the operating system represents SCSI targets in the bus/target/LUN triad nomenclature, but what the user and user application see are logical identifiers, such as the "E:" drive. Thus, the mapping between the bus/target/LUN designation and the logical drive identifier provides the portal between physical devices and the upper-layer file system.

The bus/target/LUN identifier may be further mapped to the addressing requirements of a specific transport. The Fibre Channel Protocol, for example, maps bus/target/LUN to a device ID/LUN pair. Consequently, the representation of physical storage has two components. One, directed at the operating system, establishes a familiar addressable entity based on the SCSI triad. The other, directed at the specific storage transport, accommodates the addressing requirements of that topology.
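
The two-sided mapping can be pictured as a small table kept by the host: each logical drive points to its SCSI bus/target/LUN triad and, for a Fibre Channel transport, to a device ID/LUN pair. The field names and values below are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class ScsiAddress:
        bus: int       # which SCSI interface (parallel adapter, Fibre Channel HBA, or iSCSI NIC)
        target: int    # storage resource on that bus
        lun: int       # logical unit within the target

    # Hypothetical host-side mapping table.
    device_map = {
        "E:": {"scsi": ScsiAddress(bus=0, target=3, lun=0),
               "fcp":  {"device_id": 0x011300, "lun": 0}},   # transport-specific address
        "F:": {"scsi": ScsiAddress(bus=0, target=3, lun=1),
               "fcp":  {"device_id": 0x011300, "lun": 1}},   # same target, second LUN
    }
    print(device_map["E:"]["scsi"])   # ScsiAddress(bus=0, target=3, lun=0)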

The relationship between SCSI initiators and targets is defined in the SCSI Architectural Model (SAM-2) shown in Figure 2-4. For networked storage, additional standards documents further define serial SCSI implementations. Serial SCSI implementations such as Fibre Channel and iSCSI are a component of the SAM-2 definitions for SCSI-3 commands.

Figure 2-4. SAM-2 SCSI-3 standards functional layers


The SCSI architecture defines the relationship between initiators (hosts) and targets (for example, disks) as a client/server exchange. The SCSI-3 application client resides in the host and represents the upper-layer application, file system, and operating system I/O requests. The SCSI-3 device server sits in the target device, responding to requests. The client/server requests and responses are exchanged across some form of underlying transport, which is governed by the appropriate SCSI-3 service delivery protocol for that transport, such as the FCP protocol or iSCSI for gigabit serial links. Thus, the SCSI-3 commands that serve I/O requests from the host application are differentiated from the SCSI-3 transport protocols that actually move data via the service delivery subsystem.

An initiator may have multiple requests pending to a target. The client/server model must therefore accommodate concurrent request-response exchanges and track the status of each. As shown in Figure 2-5, multiple requests generate multiple instances of the application client and multiple transactions for the device server. Status and diagnostic functions can be supervised through task management between the two entities.

Figure 2-5. SCSI client/server model. The delivery subsystem could be parallel cabling, FCP, or iSCSI


On the initiator, juggling multiple transactions to one or more targets requires context switching: the ability to quickly switch from one job to another as SCSI responses are processed. An initiator such as a server, for example, could issue a write request to a target. As the server waits for the target to prepare its buffers to receive data, it may switch contexts to another pending task and so maximize throughput. If SCSI tasks were performed only consecutively, time would be wasted waiting for each individual write or read to complete.

The failure to handle multiple threads concurrently may result in additional overhead in the form of retries or SCSI timeouts. The SAM-2 architecture does not define how this is to be implemented but indicates it must be addressed. Typically, context switching is performed by the host adapter card, whether it is parallel SCSI, Fibre Channel, or iSCSI.

The SCSI-3 Architectural Model is structured so that the I/O requests from the host system can be served without regard to the underlying service delivery subsystem. A single file server could therefore conduct I/O operations against a variety of target types. A server could, for example, have direct-attached SCSI targets as well as serial SCSI targets over a gigabit interface.

Reads and writes of data between SCSI initiators and targets are performed with a series of SCSI commands, delivery requests, delivery actions, and responses. SCSI commands and parameters are specified in the command descriptor block (CDB). The CDB is part of a command frame sent from initiator to target. For improved performance on write operations, the frame may also contain data to be written to the media. Serial SCSI transport protocols such as Fibre Channel and iSCSI simply encapsulate CDBs as their payload. The CDB is encapsulated within the Fibre Channel Protocol information units (IUs), whereas in iSCSI it is carried in the iSCSI protocol data unit (PDU).

The first byte of a CDB is an opcode that specifies the type of operation the target is to perform. A SCSI write to disk, for example, triggers the creation of an application client on the initiator (such as the HBA), which in turn issues a SCSI command request to the target to prepare its buffers for data reception. The target device server issues a delivery action response when its buffers are ready. The initiator responds by sending blocks. Depending on the lower-layer delivery subsystem, the blocks may be transported as bytes in parallel (such as over LVD SCSI cabling) or segmented into frames for serial transport (such as Fibre Channel or iSCSI).

From the standpoint of the application or operating system, the write was conducted as a single transaction. In reality, a single write may cause multiple delivery requests and delivery action exchanges before all data is finally sent to the target, as shown in Figure 2-6.

Figure 2-6. A SCSI write operation with multiple data delivery actions to complete a single command/response pair
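
As a concrete illustration of a CDB, the sketch below packs a 10-byte WRITE(10) command: opcode 0x2A, a 4-byte logical block address, and a 2-byte transfer length in blocks, all big-endian. The flag, group, and control bytes are left at zero for simplicity; a real initiator would hand this CDB to its transport (an FCP information unit or an iSCSI PDU) rather than print it.

    import struct

    # Build a 10-byte WRITE(10) command descriptor block (simplified: flags left clear).
    def build_write10_cdb(lba: int, num_blocks: int) -> bytes:
        return struct.pack(">BBIBHB",
                           0x2A,         # opcode: WRITE(10)
                           0x00,         # flags (DPO/FUA and protection bits) cleared
                           lba,          # starting logical block address (4 bytes)
                           0x00,         # group number
                           num_blocks,   # transfer length in blocks (2 bytes)
                           0x00)         # control byte

    cdb = build_write10_cdb(lba=0x1000, num_blocks=8)
    print(len(cdb), cdb.hex())           # 10 2a000000100000000800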


In a read operation, the SCSI command block reverses the sequence of data delivery requests and acknowledgments, although it is assumed that because the initiator issued the read command, its buffers are ready to receive the first set of data blocks. The number of blocks sent in a single phase of write or read transactions is negotiated between the initiator and target and is based on the buffering capacity of each. High-performance disk arrays, for example, typically provide large buffers to accommodate larger transfers and thus increase productivity.


2.3 The Parallel SCSI Bus

The most common method for connecting disk and tape devices to file and application servers has been via traditional SCSI bus cabling. Although SCSI is really a block protocol, the term is also used to refer to the parallel cabling scheme that evolved with the protocol.

The original SCSI physical layer transport was a parallel cable with 8 data lines and a number of control lines. Transmitting 8 bits of data during each transmit clock provides a relatively high bandwidth, but electrical issues restrict the total distance allowed by most SCSI implementations to 15–25 meters. The SCSI parallel bus architecture has evolved over time, with higher bandwidth provided by wider data paths (16 data lines and 32 data lines) and faster clock rates.

One of the difficulties presented by parallel bus architecture is a phenomenon known as skew. If 8 or 16 bits of data are sent simultaneously in parallel, small differences in propagation delay along each data line may occur, and not all bits may arrive at the destination at precisely the same moment. Skew refers to the difference in arrival time for each bit comprising a data word. The skew window is the time difference within which all bits must arrive. As shown in Figure 2-7, the greater the differences in propagation delay, the wider the window must be to ensure that all data bits are captured. One factor in skew is the differential cable propagation delay between each data bit—that is, the difference in the amount of time taken by a signal to travel from one end of the cable to the other, as a function of bit position. For a given cable length, an increase in clock rate requires a reduction in the skew window and the minimizing of differential propagation delay.


Figure 2-7. Timing skew on parallel bit transmission
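
The pressure on the skew window can be seen with some rough arithmetic: the faster the clock, the shorter the bit period, and the less time there is for all the parallel bits to arrive. The clock rates below are representative values chosen for illustration, not figures from the text.

    # Rough arithmetic with representative clock rates.
    def bit_period_ns(clock_mhz: float) -> float:
        return 1000.0 / clock_mhz        # period in nanoseconds

    for clock_mhz in (5, 20, 80):
        period = bit_period_ns(clock_mhz)
        print(f"{clock_mhz} MHz clock -> {period:.1f} ns bit period; "
              f"total skew must fit well inside that window")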

Additionally, parallel SCSI requires termination of any unused ports. As multiple SCSI devices are daisy-chained together, end devices must be terminated to avoid erratic signal interference. Proper cabling and termination are critical for stable parallel SCSI operation. Marginal components or improper termination can cause data corruption or transaction failures.

File servers may house disk drives internally and provide either parallel SCSI cabling or a parallel SCSI backplane for installing disks. Current data requirements for most enterprises, however, make internal storage impractical.

To accommodate hundreds of gigabytes or terabytes of data, it is necessary to deploy multiple external disk enclosures, which can be connected by parallel SCSI cabling to multiple SCSI host bus adapters installed in the server, as shown in Figure 2-8.

Figure 2-8. Server with multiple SCSI adapter cards installed

The movement of data from internal storage to external storage has been a driving force in finding alternative solutions to parallel SCSI cabling. For enterprise-class storage requirements, the restrictions in terms of distance and number of devices supported on a SCSI bus make it difficult to support larger storage capacities. The primary weakness of parallel SCSI for enterprise applications, however, is that it binds storage resources to a single server. Storage resources cannot be shared easily by multiple servers, and unused storage capacity behind one server cannot be utilized by another. For enterprises that must support and manage tens or hundreds of terabytes of data, a networked storage solution is needed.

For direct-attached storage configurations, the limitations of parallel SCSI are being addressed by Serial Attached SCSI (SAS). The SAS specification is being developed by ANSI T10 as the next-generation direct-attached storage interface. Although SAS is not intended as a replacement for networked storage, it will provide efficiencies in local storage connectivity and will be immune to the skew and distance limitations of parallel cabling.


2.4 Network-Attached Storage

Although the relationship between NAS and SAN has been somewhat obscured by vendor marketing, the distinction between them is fairly straightforward. NAS delivers storage data in the form of files; SANs deliver storage data as blocks. NAS therefore uses file transport protocols such as NFS and CIFS, whereas SANs use SCSI, a block transport protocol.

Because data blocks are the raw material from which files are formed, NAS also has a block component. These blocks are addressed on a per-file basis using metadata (directory information) to define the file-block relationship. The block access methods of a NAS device, however, are typically hidden in the NAS enclosure. To the outside world, the NAS device is a server of files and directories.
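
Conceptually, the NAS head's metadata is what ties each exported file to the blocks that compose it, as in the toy structure below. The paths and block numbers are invented; clients only ever see the file-level view.

    # Illustrative file-to-block metadata kept inside the NAS device.
    file_metadata = {
        "/exports/reports/q3.doc": {"size": 24576, "blocks": [1001, 1002, 1003]},
        "/exports/reports/q4.doc": {"size": 8192,  "blocks": [1004]},
    }

    def blocks_for(path: str):
        return file_metadata[path]["blocks"]      # hidden behind NFS/CIFS; never exposed to clients

    print(blocks_for("/exports/reports/q3.doc"))  # [1001, 1002, 1003]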

NAS accomplishes the central goal of storage networking: the sharing of storage resources through the separation of servers and storage over a common network. Like SAN-based storage, NAS overcomes the limitations of parallel SCSI and enables a more flexible deployment of shared storage. In redundant configurations, NAS can also provide highly available, nondisruptive storage access.

As shown in Figure 2-9, a NAS architecture includes disk arrays for data placement, a NAS processor, and an external interface to the user network. This architecture has a number of implications. Whereas the block data transport of a SAN normally occurs over a dedicated storage network (or virtual LANs within a Gigabit Ethernet network), the file transport of a NAS device assumes direct connectivity to the end-user network. Thus, NAS performance is determined in part by the bandwidth available on the messaging network. In addition, although the NAS processor is optimized for file transport, it is essentially a thin server sitting on a LAN and acting as a front end to storage arrays. To differentiate NAS from a common network-attached file server, vendors of NAS products have attempted to make the thin server component so thin that it is invisible from the user perspective. The term "NAS" is therefore always linked in marketing literature to the concept of "appliance"—a device that is simply plugged in to the network and requires little administration. Finally, the connection between the NAS intelligence and its disks is immaterial from a user perspective. Although SAN storage is predicated on high-performance gigabit interfaces for storage (either directly or via bridge products), a NAS device can rely on parallel SCSI, IDE, or Fibre Channel for storage connectivity.

Figure 2-9. A NAS appliance can use any type of physical storage

The challenge for NAS vendors is to make their products more appliance-like and to reduce the overhead of NFS/CIFS protocols over TCP/IP. To deal with latency, some NAS vendors support the Direct Access File System (DAFS) protocol over the Virtual Interface (VI) architecture. DAFS relies on remote direct memory access (RDMA) techniques to move file data directly to system memory. If a NAS controller can efficiently place file data directly into the memory of the client, transport latency is offset by the higher performance of file data placement.

Another trend in NAS technology is the separation of storage from the NAS processor or head, so that storage capacity can be more easily expanded using SAN techniques or shared with SAN-attached servers. As shown in Figure 2-10, a NAS head still performs its traditional role of file serving, but its storage is no longer hidden within the NAS enclosure. This lets you carve out part of the storage capacity for file requests while leaving the remainder available for block access.

Figure 2-10. A NAS head with shared SAN storage

The top end of NAS product offerings has used Fibre Channel arbitrated-loop disk drives for block storage. This approach results in higher performance and scalable storage capacity, but it presents vulnerabilities in the event of loop failure. Today, embedded-loop switch technology is used on the backplane of some NAS storage arrays to bypass failures and maintain stable operation. This technology was initially developed by Vixel Corporation and adopted by Network Appliance for its NAS filers.

At the low end of NAS products, vendors have introduced economical NAS appliances using internal IDE disk drives. The basic functionality of NFS or CIFS file sharing is maintained, although these systems tend to be limited in total capacity and give somewhat lower performance compared with the enterprise-class NAS products.


2.5 Networking behind the Server

As illustrated in Figure 2-11, storage area networks introduce network relationships behind the server. Replacing the dedicated parallel SCSI cabling bond between servers and storage with peer-to-peer networking does two things: It not only provides more flexibility in the deployment of storage resources but also frees storage from the exclusive ownership by a single server. In a SAN topology, any storage resource can be assigned to any server. Although these new relationships must be managed, SANs enable more efficient use of storage resources and lower administrative costs through storage consolidation.


Figure 2-11. A simple storage area network

SANs represent, after WANs and LANs, a third type of network. A SAN may be isolated from the mainstream messaging network through protocol segregation (Fibre Channel) or network segregation (virtual LAN, or VLAN). In either case, the purpose of the SAN is to provide high-performance access to shared storage resources using SCSI block transactions.

The SAN infrastructure has provided a foundation for new storage applications such as server clustering, LAN-free and server-free tape backup, and storage consolidation. To serve these, a new generation of storage applications, management platforms, and products has been developed. Along with that comes a market demand for people who have the skills to deploy and manage SAN environments.

With current technology, you can build the SAN infrastructure using Fibre Channel, Gigabit Ethernet, or a combination of both. Fibre Channel provides both 1Gbps and 2Gbps speeds and so offers an advantage in raw bandwidth for device attachment. Gigabit Ethernet provides 1Gbps for device attachment and 10Gbps for interswitch links, and it offers an advantage in ubiquity and relatively lower plumbing costs.

The combination of a high-speed network transport and native SCSI protocol results in an efficient means to deploy servers, disk arrays, and tape subsystems, and it frees all components from the constraints of parallel SCSI architecture. At the same time, the fusion of storage and networking has created new relationships that must be managed. If a storage resource on a SAN can be assigned to any server, it can also be inadvertently assigned to unintended servers. Concurrent with the development of SAN technology, new tools such as storage zoning, LUN masking, and storage virtualization are required to balance the productivity gains SANs provide with the disruptions they have created for traditional storage management.

2.6 Chapter Summary

Networking in front of the Server

• SANs bring new networking concepts to the server/storage model.

• Moving data serially allows higher speeds and longer distances between servers and storage.

• Access to the physical transport involves low-level protocols to ensure data integrity.

• You must use network addressing to uniquely identify each participant and to properly route data across the infrastructure.

• Large blocks of data must be segmented into packets or frames for orderly shipment through the network.

• Routers and switches route data packets between network segments.


• Networking facilitates movement of data whose source and destination are ultimately upper-level application interfaces.

The SCSI Architecture

• Applications view storage resources as abstract representations of the underlying physical devices.

• I/O requests are generated by applications, file systems, and operating systems.

• Logical storage identifiers (such as E: drive) are mapped to the SCSI addressing scheme of bus/target/LUN.

• The supporting SCSI infrastructure may also provide additional mapping between the conventional bus/target/LUN addressing and the address scheme required for that topology.

• The SCSI Architectural Model (SAM-2) separates SCSI commands from SCSI delivery protocols and underlying interconnections.

• SAM-2 defines SCSI relationships in a client/server model.

• Initiators are host systems that generate I/O requests.

• Targets are storage devices that process those requests.

• An application client resides in the initiator and is generated for every I/O operation.

• The device server resides in the target device and serves requests.

• Multiple instances of the application client may be running concurrently to support multiple outstanding I/Os.

• I/O is performed through a command/response sequence.

• SCSI commands and parameters are contained in the command descriptor block.

• The command descriptor block is encapsulated within the Fibre Channel Protocol information unit or iSCSI protocol data unit.

• Multiple data delivery requests and acknowledgments can be exchanged within a single command/response pair.

• The efficiency of the SCSI protocol depends on the stability of the supporting interconnect infrastructure.

The Parallel SCSI Bus

• SCSI cabling provides parallel wires for simultaneous transfer of data bits.

• The maximum SCSI cabling distance is 25 meters.

• The maximum device population for SCSI cabling is 15 devices on a string.

• Skew refers to the window of time required to capture all data bits in a parallel transmission.

Network-Attached Storage

• NAS serves storage data as files; SANs serve storage data as blocks.


• NAS resources are shared using NFS or CIFS file transport protocols.

• NAS enables cross-platform access to common data.

• The NAS architecture is based on a thin server processor with attached storage resources.

• NAS products may use Fibre Channel or any SCSI interconnect as a back-end architecture for higher-speed access to disks.

Networking behind the Server

• SANs are a new type of network optimized for storage.

• SANs may use Fibre Channel, Gigabit Ethernet, or a combination of both.

• SANs enable new storage models such as server clustering and LAN-free tape backup.

• SANs require new management tools to supervise new storage relationships.


Chapter 3. Fibre Channel Internals

As the first successful gigabit serial transport, Fibre Channel was soon adopted as a viable architecture for block I/O applications. The fibre of Fibre Channel connotes support for both fiber-optic and copper cabling. The channel refers to its role as a gigabit, high-performance conduit between computing resources. This chapter presents an overview of the various layers that compose the Fibre Channel architecture and provides a basis for understanding Fibre Channel products and functionality.

3.1 Fibre Channel Layers

Fibre Channel is a standards-based networking architecture. The standards provide definitions for physical-layer attributes, transport controls, and upper-level support of TCP/IP, SCSI-3, High Performance Parallel Interface (HiPPI), and other protocols. Fibre Channel is a gigabit transport, with current implementations at 1Gbps and 2Gbps. The governing body for Fibre Channel standards is the NCITS/ANSI T11 committee. Resources for the various Fibre Channel documents are listed in Appendix A.

Fibre Channel standards define a multilayered architecture for moving data across the network. As listed in Table 3-1, these layers are numbered from FC-0 to FC-4. The top layer, FC-4, establishes the interface between Fibre Channel and upper-level applications. The serial SCSI protocol, for example, must associate Fibre Channel devices with storage resources that can be accessed by the operating system. For host bus adapters, this function is typically fulfilled by the device driver provided by the vendor. FC-3 is under development and may include facilities for data encryption and compression. The FC-2 layer defines how blocks of data handed down by the upper-level application will be segmented into sequences of frames for hand-off to the transport layers. This layer also includes various classes of services and flow control mechanisms. The lower two layers—FC-1 and FC-0—focus on the actual transport of data across the network. FC-1 provides facilities for encoding and decoding data for shipment at gigabit speeds and defines the command structure for accessing the media. FC-0 establishes standards for various media types, allowable lengths, and signaling.

Collectively, the Fibre Channel layers fall within the lower levels of the OSI model and can be considered a link layer network. Fibre Channel assumes a single unpartitioned network and homogeneous address space for the entire network fabric. Although theoretically this address space can be quite large (15.5 million addresses in a single switched fabric), in practice Fibre Channel fabrics commonly support tens of devices in a single SAN or, in some data center applications, a few hundred devices.
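
The address space figure follows from the 24-bit Fibre Channel port ID, which is commonly broken into one-byte domain, area, and port fields. Reserving a number of domain IDs for fabric services leaves on the order of the 15.5 million addresses cited above; the 239-domain accounting below is one common convention, shown here as a quick arithmetic check.

    # Arithmetic behind the fabric address space figure.
    total_addresses = 2 ** 24                    # 24-bit port ID
    usable_domains = 239                         # common accounting; remaining IDs reserved
    usable_addresses = usable_domains * 256 * 256
    print(f"{total_addresses:,}")                # 16,777,216
    print(f"{usable_addresses:,}")               # 15,663,104 -- roughly 15.5 million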

Fibre Channel's layered architecture is implemented on three transport topologies: point-to-point, arbitrated loop, and switched fabric. Point-to-point is a dedicated connection between two devices only, typically a server and a disk. Arbitrated loop is a shared medium, similar to Token Ring or Fiber Distributed Data Interface (FDDI), and employs a special superset of commands to control access to the medium by multiple devices. A Fibre Channel fabric is one or more switches providing higher-level services and switched bandwidth of 100MBps or 200MBps (200MBps and 400MBps full duplex) per port. Each of these topologies is discussed in detail in Chapter 4.

Table 3-1. Fibre Channel Layers

Layer Number  Function                        Content
FC-4          Upper-layer protocol interface  For example, SCSI-3, IP
FC-3          Common services                 Under construction
FC-2          Data delivery                   Framing, flow control, service class
FC-1          Ordered sets/byte encoding      8b/10b encoding, link controls
FC-0          Physical interface              Optical/electrical, cable plant



3.2 1Gbps and 2Gbps Transport

To move data bits with integrity over a physical medium, a reference clock must be provided to ensure that each bit is received as it was transmitted. In parallel topologies, a separate clock or strobe line serves this function. As data bits are launched in parallel from the source, the strobe line toggles high or low to signal the receiving end that a full byte (or multiple bytes for 16- and 32-bit wide parallel cabling) has been sent.

Because serial data transports can have only two lines (transmit and receive), clocking cannot be achieved by a separate line. The serial data itself must carry the reference timing; in other words, clocking must be embedded in the bit stream. Different serial transports accomplish embedded clocking by different means. Fibre Channel uses a byte encoding scheme and CDR (clock and data recovery) circuitry to recover clock and thus determine the data bits that compose bytes and words.

At gigabit speeds, maintaining valid signaling (and therefore valid data recovery) is critical for data integrity. Fibre Channel standards allow for a single bit error to occur only once in a trillion bits (a bit error rate of 10^-12). In practice, this amounts to a maximum of one bit error every 16 minutes, although actual occurrence of bit errors is far less frequent. One of the challenges for Fibre Channel vendors is to improve on this bit error rate. Reducing the rate to 10^-13 would reduce the statistical occurrence to one bit error every 2.5 hours. Such reductions are possible if attention is given to the hardware interface design, cable quality, cable length, and the system's jitter budget.
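
The error-interval figures can be checked with simple arithmetic, assuming a 1.0625-Gbaud Fibre Channel link running continuously:

    # Time between bit errors at a given bit error rate on a 1.0625 Gbaud link.
    line_rate_baud = 1.0625e9

    def seconds_between_errors(bit_error_rate: float) -> float:
        return 1.0 / (bit_error_rate * line_rate_baud)

    print(seconds_between_errors(1e-12) / 60)     # ~15.7 minutes, i.e. roughly 16 minutes
    print(seconds_between_errors(1e-13) / 3600)   # ~2.6 hours, i.e. roughly 2.5 hours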

Jitter refers to any deviation in timing that a bit stream suffers as it traverses the physical medium and the circuitry on board the end devices. A certain amount of deviation from the original signaling will occur naturally as a serial bit stream propagates over fiber-optic or copper cabling. Signaling can also be affected by transients from power supplies, board design, and other sources of electromagnetic interference (EMI). Recognizing that jitter cannot be eliminated, Fibre Channel standards have attempted to set guidelines for how much jitter individual components can contribute. If each product abides by the recommended jitter budget, timing deviations in gigabit signaling should not push the system beyond the recommended bit error rate of 10^-12.

Measuring the jitter contribution of any component requires expertise and expensive equipment. Because test equipment itself may contribute to jitter, research for better techniques and test facilities is ongoing. The ANSI T11.2 Methodologies for Jitter Specification (MJS) group is one official body that is attempting to define valid, unintrusive methods for measuring and defining jitter limits.

Jitter is typically represented in graphical form by an eye diagram on a test scope, as illustrated in Figure 3-1. The cross-over points or intersections forming the "eye" represent signaling transitions to high or low voltages. Ideally, all transitions should occur at precisely the same interval. If that were the case, the clock and data recovery circuitry could recover all data bits with no bit errors whatsoever. In reality, some deviation will always be observed. If the jitter is too extreme, the CDR will miss one or several bits, resulting in the corruption of Fibre Channel words or data within frames.

Figure 3-1. Eye diagram showing signal transitions that may result in bit errors


Jitter is an important element in SAN design because every component can potentially contribute to jitter or jitter accumulation. A faulty transceiver, substandard fiber-optic cabling or connectors, cable distances that exceed guidelines, improperly shielded copper components, or simply bad product design can introduce system instability at the physical layer. Despite what you will read in the marketing material of some vendors, no Fibre Channel product is jitter-free. Consequently, selection of low-jitter components should be based, when possible, on independent verification in addition to the vendor's product specification.

Because 2Gbps Fibre Channel doubles the clock rate of 1Gbps Fibre Channel, it is essential to maintain signal integrity through the cable plant and SAN equipment. Fortunately, vendors have been able to apply their considerable experience with 1Gbps jitter issues to produce stable 2Gbps products. In addition, the installed base of 1Gbps Fibre Channel products has required support for auto-negotiation circuitry on 2Gbps Fibre Channel switches and HBAs. When a 1Gbps device is plugged in to a 2Gbps switch port, for example, the CDR of the switch port senses the incoming clock and automatically adjusts its own baud rate. This enables a mix of 1Gbps and 2Gbps products in the same SAN. In addition to end devices, 2Gbps capability is useful for maximizing throughput of interswitch links.

3.3 Physical Layer Options

Fibre Channel can be run over optical or copper media. Because of its noise immunity, fiber-optic cabling is preferred, but copper is also widely used, particularly for small form factor Fibre Channel disk drives. Mixing fiber-optic and copper components in the same environment is supported, although not all products provide that flexibility.

For Fibre Channel transport, copper cabling typically is twinaxial and is specified as either intracabinet or intercabinet. Intracabinet copper assumes that all connections will be made within a single enclosure, such as a 19-inch rack. This reduces exposure to EMI and potential ground loop problems. Intracabinet copper uses unequalized cabling to distances of 13 meters. Intercabinet copper incorporates active components on the signal generation circuitry to reduce EMI and drive the signal longer distances. Intercabinet copper uses equalized or balanced cabling to extend up to the maximum of 30 meters.

Fiber-optic cabling is normally referred to by mode, or the frequencies of light waves that are carried by a particular cable type. Multimode cabling is used with shortwave laser light and has either a 50µm (micrometer) or 62.5µm core with 125µm cladding. The reflective cladding around the core restricts light to the core. The 50µm or 62.5µm diameter is sufficiently large for injected light waves to be reflected off the core interior, as shown in Figure 3-2. Because a shortwave laser beam is composed of hundreds of light modes that reflect off the core at different angles, a dispersion effect reduces the total distance at which the original signal can be reclaimed. Multimode fiber supports 175 meters with 62.5µm/125µm cable, and 500 meters with 50µm/125µm cable.

Figure 3-2. Multimode fiber


Single-mode fiber is constructed with a 9µm core and 125µm cladding. Single-mode is used to carry longwave laser light. With a much smaller diameter core and a single-mode light source, single-mode fiber supports much longer distances, currently as much as 10km at gigabit speeds.

Both optic cable types are protected by an exterior coating, usually in a distinctive orange color, and may use standard (SC) fiber-optic connectors or small form factor (SFF) connectors for attachment to transceivers. Fiber-optic cable should not be bent to a radius tighter than about 3 inches, so some attention should be given to strain relief at connection points and routing of cabling within cabinets.

The selection of copper or optical media is determined by a number of practical considerations. Copper may present EMI issues but is sometimes mandated by other concerns. Some Fibre Channel products provide only copper DB-9 interfaces. Copper adapters are also less expensive than fiber-optic ones, and that drives copper into price-sensitive configurations. Fiber-optic components and cabling, however, eliminate most of the EMI and ground loop problems that may occur with copper, and fiber optic is the preferred medium for stable SAN construction.

At either end of the cable plant, transceivers or adapters are used to bring the gigabit bit stream onto the circuit boards of host bus adapters or controller cards. A number of form factors and interface types are available for both copper and optical media. For copper, this connection is fairly straightforward: electrical in (for example, from the copper cable plant), electrical out (for example, onto the copper traces of the controller's circuit board). For optical media, the optical pulses must be converted to electrical—that is, copper—and vice versa. Fiber optic transceivers fulfill this role.

One of the first adapters developed for Fibre Channel applications was the gigabaud link module, or GLM. GLMs are available in both fiber-optic and copper versions, although their popularity has declined as new form factors have emerged. GLMs are semipermanent modules. A connector latches the GLM onto the circuit board of the HBA or controller, and the circuit board, in turn, is enclosed in a server chassis, disk cabinet, switch, or hub. Maintenance is an issue with GLMs because to reseat the module, you must power off and open the enclosure.

From a product design standpoint, GLMs save the HBA or controller architect several design steps by incorporating, in addition to the optical or copper interface, clock and data recovery components and serializing/deserializing (serdes) circuitry. The connection between the GLM and its host card is a parallel interface. Encoded bytes can be sent or received via the connector with no additional parallel/serial conversion. In the end, however, this convenience for the architect has been outweighed by the inconvenience GLMs pose for maintenance and replacement.

Until recently, the most widely used transceiver module for Fibre Channel was the gigabit interface converter, or GBIC. The GBIC form factor was first developed by Compaq, Sun, Amp, and Vixel Corporation and was a de facto standard in the industry. GBICs lack the serializing/deserializing capability of GLMs but are modular, hot-swappable devices. This lets you insert or remove GBICs without powering down or opening the supporting chassis, and it facilitates replacement of one medium with another, such as a copper GBIC with an optical GBIC. Using removable media in Fibre Channel switches and hubs offers greater flexibility in SAN design, primarily because it lets you mix various media and interface types and make changes to the topology with less disruption to the system.

Optical GBICs are available in both shortwave (to 500m) and longwave (to 10km) versions and have standard dual SC connectors for attaching cabling. Copper GBICs are available in passive (to 13m) and active (to 30m) versions and use DB-9 or HSSDC (high-speed serial direct connect) cable interfaces. The HSSDC interface was developed by Amp and provides quicker cable attachment and removal compared with the lock-down screw design of the DB-9 connector.
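
To make the distance figures above concrete, the following sketch (Python, with hypothetical names) collects the nominal limits cited in this chapter into a simple lookup that a cable-plant checker might use; the values are illustrative and always yield to vendor specifications.

    # Nominal media limits quoted in this chapter (illustrative only).
    MAX_DISTANCE_M = {
        "copper-passive": 13,        # intracabinet / passive GBIC
        "copper-active": 30,         # intercabinet / active GBIC
        "optical-shortwave": 500,    # 50um/125um multimode
        "optical-longwave": 10_000,  # 9um/125um single-mode
    }

    def link_is_within_spec(media: str, distance_m: float) -> bool:
        """Return True if a cable run fits the nominal limit for its media type."""
        if media not in MAX_DISTANCE_M:
            raise ValueError(f"unknown media type: {media}")
        return distance_m <= MAX_DISTANCE_M[media]

    print(link_is_within_spec("optical-shortwave", 350))  # True
    print(link_is_within_spec("copper-passive", 25))      # False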

Concurrent with the development of 2Gbps Fibre Channel products, vendors have introduced SFF optical and copper connectors. These can be fixed or removable and enable vendors to create much higher port density in the same chassis footprint. A 1U (one 19-inch rack space) Fibre Channel switch design that previously could support only 16 standard GBICs can now support 32 ports. Small form factor optical and copper connectors also enable much higher port density in Fibre Channel director switches, which typically support 64 to more than 200 ports in a single chassis.

If a SAN design requires an all-optical solution, you can accommodate copper-only devices by using media interface adapters, or MIAs. MIAs typically provide a DB-9 connector on one end, and a dual SC optical connector on the other. Power to drive the MIA's transceiver function can be drawn through the DB-9 connector (if the supporting electronics supply power and ground pins) or via an AC adapter. MIAs introduce an additional component, and therefore another potential point of failure, into the topology, but they are a viable means to overcome copper's EMI and distance limitations.

Because fiber optics pose a potential laser safety issue, early implementations used an open fiber control (OFC) mechanism to shut down laser transmission when a cable was removed or broken. The OFC feature is based on a signaling handshake between two devices. If a receiver loses signal, it initially shuts off its transmitter and then begins a series of low-intensity signals to its partner. If the partner also responds with low-intensity signals (for example, if the cable is reattached), both sides boost their laser output to resume normal activity. The time required to complete an OFC handshake is unsuited to some topologies, particularly for arbitrated loop. The development of new laser technology that can support full-speed signaling at safer, low intensities has resulted in a class of optical connectors known as non-OFC transceivers. Currently, most transceivers used for SANs are based on the non-OFC standard. Still, however tempting, it is not advisable to stare directly into any laser source.

3.4 Data Encoding

Proper transmission and reception of a digital bit stream would not be possible if raw bytes of data were simply herded into a shift register and shoved one bit at a time onto the transport. Among all the possible combinations of 1's and 0's in an 8-bit byte, and the possible combinations of bytes that could be sent one after the other, bit streams with sustained 1's or 0's could occur. A raw stream of hex FF bytes, for example, would appear as a sustained DC positive voltage on the transmission line. Without a byte encoding scheme to force transitions from positive to negative, it would be impossible to detect when one bit ended and another began—in other words, it would be impossible to recover the clocking used to issue the stream.

This problem is exacerbated at gigabit and multigigabit speeds. At more than a billion bits per second and a standards-enforced bit error rate of 10^-12, the Fibre Channel physical transport must ensure that every bit is properly recovered for reassembly into valid bytes. This is accomplished with an efficient encoding scheme that ensures that no sustained highs or lows will occur in the transport signal.

First developed by IBM, the 8b/10b encoding algorithm converts each 8-bit byte into two possible 10-bit characters. Each of the two 10-bit products of this conversion should have no more than six total 1's or 0's, and about half will have an equal number of 1's and 0's. The 10-bit characters with more 1's than 0's have positive disparity, whereas those with more 0's than 1's have negative disparity. An even balance of 1's and 0's results in neutral disparity.

In processing all 256 possible data bytes, the 8b/10b mechanism allows a maximum of only six bits of the same type per character, with no more than four of the same bit type to occur sequentially (see Figure 3-3). The hex byte x'FF', for example, is encoded from its original binary 1111 1111 into either 101011 0001 or 010100 1110. Both encoding results have neutral disparity, and neither has more than four of the same bit type in sequence.
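
The disparity and run-length rules just described can be checked mechanically. The short Python sketch below (illustrative only; the character strings are taken from the examples in the text) counts 1's versus 0's and the longest run of identical bits for the two encodings of x'FF' and for the positive-disparity form of K28.5.

    def disparity(bits: str) -> int:
        """Count of 1s minus count of 0s: >0 positive, <0 negative, 0 neutral."""
        return bits.count("1") - bits.count("0")

    def longest_run(bits: str) -> int:
        """Length of the longest run of identical bits in the character."""
        best = run = 1
        for prev, cur in zip(bits, bits[1:]):
            run = run + 1 if cur == prev else 1
            best = max(best, run)
        return best

    for label, ch in [("x'FF' form 1", "1010110001"),
                      ("x'FF' form 2", "0101001110"),
                      ("K28.5 (positive form)", "0011111010")]:
        print(label, "disparity:", disparity(ch), "longest run:", longest_run(ch))

Both x'FF' encodings come out neutral with no run longer than three bits, while the K28.5 character shows the five-bit run that sets it apart from encoded data.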

Figure 3-3. 8b/10b encoding logic

Within the possible combinations that can be generated with 10 bits, a number have no relation to the standard 256 data bytes. Fibre Channel uses one of these nondata characters as a special command character. Known as the K28.5 command character, it has a positive disparity composition of 001111 1010 and a negative disparity of 110000 0101. In both instances, the special character has two bits of one type followed by five of the opposite, something that immediately distinguishes the command character from ordinary encoded data characters.

This unique comma sequence is used by data recovery circuitry as a trigger for defining the boundaries of 10-bit characters. When the comma sequence is detected in a serial bit stream, data recovery can begin picking up entire characters simply by counting the bits to the beginning of the next 10-bit character.

Given that the conversion of 8-bit data bytes produces two 10-bit results, how does the 8b/10b encoder decide which result to use? If the choice were arbitrary, the sequential shipment of some 10-bit characters would, in combination, result in an imbalance of the same bit types (including instances of five bits in sequence) appearing on the bit stream. As a result, the clock and data recovery circuitry at the receiving end would not be able to retrieve valid characters.

The 8b/10b encoder resolves this problem by monitoring the disparity of the previously processed character. If a positive disparity character is transmitted, the next character issued should have negative disparity. By monitoring the running disparity of the preceding event, the encoder can ensure that an overall balance of 1's and 0's is maintained in the serial stream. Data bytes that encode to two equally balanced results (an equal number of 1's and 0's) have one result that is used when the current running disparity is positive, and another when the current running disparity is negative.

8b/10b notation is expressed in dotted decimal subsets of the original data byte before it undergoes encoding. A hex xD7, for example, has a binary bit pattern of 1101 0111. The notation scheme divides this pattern into the first three bits, 110 (decimal 6), followed by the byte's last five bits, 10111 (decimal 23). This division reflects the internal operation of the 8b/10b encoder, because these units are processed in smaller 3b/4b and 5b/6b modules and then swapped to produce the 10-bit results. Consequently, 8b/10b notation refers to xD7 as D23.6. The only intuitive aspect of this notation is the D, which stands for "data." The K in the unique K28.5 command character indicates that it is a control character (Kontrol in German). It is distinguished from D28.5 (xBC) by virtue of its exclusive comma sequence.
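
The Dxx.y naming rule lends itself to a one-line computation. The following hypothetical Python helper splits a byte into its low five bits and high three bits to reproduce the xD7 to D23.6 and xBC to K28.5/D28.5 examples above.

    def encoding_name(byte: int, control: bool = False) -> str:
        """Return the 8b/10b name of a byte, e.g. 0xD7 -> 'D23.6'."""
        low5 = byte & 0x1F          # the five low-order bits: the number before the dot
        high3 = (byte >> 5) & 0x07  # the three high-order bits: the number after the dot
        return f"{'K' if control else 'D'}{low5}.{high3}"

    print(encoding_name(0xD7))                # D23.6
    print(encoding_name(0xBC, control=True))  # K28.5
    print(encoding_name(0xBC))                # D28.5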

The practical impact of 8b/10b encoding on general SAN design becomes clear when we examine Fibre Channel protocol, addressing, and hardware issues. For all the effort to maintain a more even distribution of 1's and 0's in the gigabit stream, some quite valid sequences of encoded characters nonetheless stress the ability of the recovery circuitry to capture data. The 8b/10b algorithm is also responsible for the address space allocated to arbitrated loop, as discussed later.

Gigabit Ethernet adopted the physical layer and byte encoding scheme initially developed by Fibre Channel engineers. Below the link layer, then, Fibre Channel and Gigabit Ethernet are very similar and so share common characteristics such as jitter, CDR, and 8b/10b encoding. Gigabit Ethernet, however, has elected not to follow Fibre Channel's lead on 2Gbps transport and instead has focused on 10Gbps. Having originally done the hard pioneering work on 1Gbps technology, Fibre Channel vendors are now letting Ethernet engineers pave the way to 10Gbps and will produce the Fibre Channel variant after the 10Gbps Ethernet standard has proved successful.

3.5 Ordered Sets

To move data across the network, Fibre Channel uses a concise command syntax referred to as ordered sets. These transmission words are composed of four encoded bytes, or 40 bits, the first byte of which is the K28.5 special character. Ordered set recognition logic triggers on the K28.5 character, and the three following data bytes define the function provided by the ordered set.

Ordered sets fall into three general categories: frame delimiters, primitive signals, and primitive sequences. One subset is used to indicate the start and end of data frames and the class of service used to ship the frame. The start of frame (SOF) delimiter K28.5 D21.5 D22.2 D22.2, for example, is used in a frame header to initiate a connectionless level of service. Following the SOF ordered set, as many as 2,112 bytes of data would compose the body of the frame. An end of frame (EOF) ordered set—for example, K28.5 D21.4 D21.6 D21.6—would indicate that the transmission of this frame was complete. Fibre Channel defines 11 different types of SOF ordered sets, and 8 EOF words. The specific SOF or EOF used informs the recipient as to what further action (such as an acknowledgment), if any, is required.

A group of ordered sets known as primitive signals is used to indicate actions or events on the transport. The IDLE primitive, for example, has the notation K28.5 D21.4 D21.5 D21.5 and is used as a fill word between data frames and during periods of inactivity. The Close primitive (CLS) is used in arbitrated loop configurations to indicate the end of a transaction. Only a single occurrence of a primitive signal will trigger a response. If a primitive signal is missed or is corrupted in transport (for example, by excessive jitter on a link), logic on the end nodes should time out on the transaction and recover via higher-level protocols.
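
As a rough illustration of ordered set recognition, the Python sketch below (the helper names and the tiny lookup table are assumptions; only the ordered sets quoted in this section are included) treats a transmission word as an ordered set whenever its first character is K28.5 and then keys on the remaining three characters.

    # Only the examples cited in the text; the full tables live in the standards.
    ORDERED_SETS = {
        ("D21.5", "D22.2", "D22.2"): "SOF (connectionless class of service)",
        ("D21.4", "D21.6", "D21.6"): "EOF",
        ("D21.4", "D21.5", "D21.5"): "IDLE (fill word)",
    }

    def classify_word(word):
        """word is a 4-tuple of character names, e.g. ('K28.5', 'D21.4', 'D21.5', 'D21.5')."""
        if word[0] != "K28.5":
            return "data word"
        return ORDERED_SETS.get(tuple(word[1:]), "other ordered set")

    print(classify_word(("K28.5", "D21.4", "D21.5", "D21.5")))  # IDLE (fill word)
    print(classify_word(("D23.6", "D21.4", "D21.5", "D21.5")))  # data word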

Primitive sequences are a group of ordered sets used to indicate or initiate state changes on the transport. Unlike primitive signals, primitive sequences require that at least three consecutive words be recognized before any action is taken. A Fibre Channel disk drive, for example, would issue a sequence of loop initialization primitives (LIPs) when first inserted into an arbitrated loop topology. Downstream nodes would not respond to the primitive sequence until three consecutive LIPs had been received.
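
The difference between primitive signals and primitive sequences can be expressed as a small state machine: a signal acts on one occurrence, a sequence only after three identical words in a row. The following Python sketch is a simplified illustration of that rule, not an implementation of any product's logic.

    def primitive_sequence_recognized(words, sequence_word, threshold=3):
        """Return the index at which `threshold` consecutive copies appear, or None."""
        streak = 0
        for i, w in enumerate(words):
            streak = streak + 1 if w == sequence_word else 0
            if streak == threshold:
                return i
        return None

    stream = ["IDLE", "LIP", "IDLE", "LIP", "LIP", "LIP", "IDLE"]
    print(primitive_sequence_recognized(stream, "LIP"))  # 5: acted on only after 3 in a row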

Although most frame delimiting ordered sets are common to all Fibre Channel topologies, supersets of primitive signals and primitive sequences have been defined specifically for arbitrated loop and Fibre Channel fabric operation. A shared topology such as arbitrated loop requires special commands for gaining exclusive control of the transport. A dedicated connection to a fabric, on the other hand, assumes that the transport is always available and so requires no negotiation for access. The ordered sets unique to each topology are discussed in Chapter 4.

3.6 Framing Protocol

To move data from one Fibre Channel-attached device to another, the data blocks handed down by the upper-layer protocol of the sender must be organized into discrete packets for shipment across the transport. In Fibre Channel nomenclature, data packets are referred to as frames.

As shown in Table 3-2, frames are always prefaced with an ordered set start of frame (SOF) delimiter. This is a single 4-byte word that defines the class of service used and specifies whether the frame is the first in a series or simply one in a series of related frames. Following the SOF, a 24-byte frame header contains destination and source addressing, as well as control fields that indicate the frame's content (control information or data type) and position within a series of sequential frames.

Table 3-2. Fibre Channel Frame Format

SOF | Header | Data Field | CRC | EOF

After the header, the data unit itself may be from 0 to 2,112 bytes. This variable-length data framing technique allows Fibre Channel to accommodate various application requirements with a reasonable balance between frame overhead and payload. Because Fibre Channel frame construction is based on multiples of 4-byte transmission words, the user data must be padded with additional fill bytes if the total byte count is not evenly divisible by 4. A 509-byte payload, for example, would need 3 additional fill bytes for proper frame alignment. Data integrity within the frame is verified with a 32-bit cyclic redundancy check (CRC). The CRC calculation is performed before the data is run through the 8b/10b encoder, and the 4-byte CRC itself is later encoded along with the rest of the frame contents. Following the CRC, an end of frame (EOF) ordered set is appended to notify the recipient that the frame is complete. The specific EOF used is determined by the class of service and by whether the frame is one of a series or the last in a series of frames.

In addition to the standard 24-byte frame header, optional headers can be used for applications that require extensive control fields. To keep the total frame size within 2,148 bytes, the optional headers must occupy part of the 2,112-byte data space, and this may reduce the data payload by as much as 64 bytes per frame. You should recognize the presence of optional headers when calculating data throughput in Fibre Channel environments. For most applications, it is sufficient to simply count the bytes between SOF and EOF and deduct 28 bytes (normal header + CRC) to determine the data payload.
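
The padding and payload arithmetic described in the last two paragraphs is easy to capture in code. The Python sketch below (hypothetical helper names) pads the data field to a 4-byte boundary and backs the payload out of a SOF-to-EOF byte count by deducting the 24-byte header and 4-byte CRC.

    def fill_bytes(payload_len: int) -> int:
        """Fill bytes needed so the data field is a whole number of 4-byte words."""
        return (-payload_len) % 4

    def payload_from_frame(bytes_between_sof_and_eof: int, optional_header_len: int = 0) -> int:
        """Estimate payload: deduct the 24-byte header, 4-byte CRC, and any optional headers."""
        return bytes_between_sof_and_eof - 24 - 4 - optional_header_len

    print(fill_bytes(509))          # 3, as in the example above
    print(payload_from_frame(540))  # 512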

After frames are assembled, they are delivered via a protocol hierarchy of sequences and exchanges. A sequence can include one or more related frames—for example, a block of data written to disk via multiple frames—whereas an exchange can include one or more unrelated sequences. Two communicating devices, in turn, can have several exchanges established at the same time, with unique exchange IDs and sequence IDs separating the traffic. This nested structure of exchanges and sequences of frames maximizes utilization of the link between two devices, with minimal overhead for setting up and tearing down logical connections.

To further minimize overhead in frame transport, Fibre Channel limits error recovery to the sequence level. If a frame fails the CRC check, for example, it is more efficient to recover by reissuing the entire sequence of frames at gigabit speed than to embed the logic required to track and recover individual frames. Reliance on sequence-level recovery is also a testimony to the link integrity expected for the Fibre Channel transport. If an extremely low bit error rate is maintained by the Fibre Channel infrastructure, few sequence recoveries are required.

3.7 Classes of Service

To ensure proper data transport, applications may require different levels of delivery guarantees, bandwidth, and connectivity between communicators. Data streaming for a tape backup application, for example, implies a sustained, dedicated connection and full-bandwidth availability. Alternatively, with intermittent activity, online transaction processing (OLTP) may not require dedicated bandwidth but would benefit from acknowledgments for each transfer. To accommodate these requirements, Fibre Channel standards provide five classes of delivery service. Not all of these classes are currently implemented in shipping products.

Class 1 service defines a dedicated connection between two devices—for example, a file server and a disk array—with acknowledgment of frame delivery. The communicating partners establish a Class 1 connection and then assume full bandwidth and a constant circuit until the connection is taken down. Because the Class 1 circuit usually excludes others from intervening in the conversation, there is no danger that one of the partners will be busied out during an exchange. The dedicated connection also ensures that all frames will be received in the same order they were sent, something that speeds reassembly of data blocks at the FC-2 level.

Class 1 service places an additional design burden on the Fibre Channel switches. A Class 1 connection between a server and a disk that are communicating via a switch consumes switch routing resources. The switch must maintain a dedicated channel between the communicating ports and is not allowed to intervene. Thus, multiple Class 1 connections by multiple pairs would quickly impede the switch's ability to serve other requests to those ports. Analogous to a phone switch, Class 1 service provides a camp-on facility, which, if supported by the switch, allows a contending request to be queued against a busy port. As soon as the current Class 1 connection is ended, the new request can be served. You need additional buffering on the switch as well as logic to monitor the queue and timeout values. Class 1 service also defines a stacked connect feature, which allows a Class 1 participant to queue additional connect requests to the switch. Because of the added complexity and cost associated with Class 1 functionality, these features are not widely implemented by Fibre Channel switch vendors.

Class 1 service can also be implemented in a nonswitched topology such as arbitrated loop. Typically, however, loop implementations employ the more democratic Class 3 service.

Class 2 service does not require a dedicated connection between talking pairs, but it does provide acknowledgment of frame delivery. In switch environments, the connectionless nature of Class 2 allows the switch to forward frames as resources allow without dedicating bandwidth to a transaction. Traffic congestion or multiple routes can therefore result in frames being received in random order, something that would require additional processing overhead for reassembly into data blocks. To alleviate this possibility, a Class 2 connection can request in-order delivery. In arbitrated loop environments, a Class 2 transfer ensures in-order delivery of frames because only the communicating partners occupy the loop for the duration of the transaction.

Class 2 is suited to mission-critical applications that demand a high degree of data integrity at the transport level. Acknowledgment of frame delivery and notification of nondelivery add an extra validation without involving upper-level protocols. Class 2 is also suited to smaller data transactions with only bursty traffic requirements. Intermittent updates to a relational database, for example, do not require a dedicated connection but may need verification via frame acknowledgments.

Like Class 2, Class 3 service is connectionless, but unlike Class 2, there is no notification of delivery. Class 3 is similar to datagram services in LAN topologies (for example, User Datagram Protocol/Internet Protocol, or UDP/IP) and relies on upper-level protocols to recover if a transport error occurs. To a certain extent, Class 3 service sacrifices reliability for reduced overhead and higher performance. In well-constructed SANs, the reliability factor may not be a pressing issue. Class 3, for example, is widely used in arbitrated loop environments, even for mission-critical applications. This is possible because of the temporary dedicated connections between partners that a loop topology provides. Frames are received in order, and, with the selection of appropriate products, the low occurrence of bit errors results in reliable frame transport. In a heavily burdened switch topology, Class 3's inherent laissez-faire attitude toward frame delivery is more apparent. A switch may discard Class 3 frames under congested conditions and, because there is no acknowledgment mechanism, trigger a more lengthy recovery chain reaction at the FC-4 upper-level protocol.

You should consider this difference between loop and switch Class 3 handling when designing SANs for specific applications. Some Class 3 implementations are more reliably deployed on loop segments alone, whereas others, depending on bandwidth allocation requirements, can be reasonably configured with a combination of loop segments and switch topologies.

For multiclass applications, Fibre Channel provides an optional intermix feature. Intermix allows any unused bandwidth in a Class 1 connection to be utilized by Class 2 and Class 3 frames. The operative descriptor is unused, because Class 1 by definition guarantees full bandwidth when Class 1 frames are present. If implemented, intermix is useful for applications such as tape backup because Class 2 or 3 frames could be used to set up the next transaction while a Class 1 connection is still active.

Two additional delivery classes have been defined, but not widely implemented, for specialized applications. Class 4 service introduces the concept of virtual circuits to Fibre Channel architecture. Like Class 1, Class 4 service is connection-oriented. But instead of allocating the entire bandwidth between communicating pairs, Class 4 lets you assign fractional bandwidth as well as different quality of service (QoS) parameters for each connection. Additionally, you can establish multiple virtual circuits between a single pair or multiple pairs of communicators. The mandated quality of service for each virtual circuit ensures that time-sensitive applications, such as real-time video, always have bandwidth available for transmission. The functionality described by Class 4 service poses considerable design challenges to switch architects. The switch must maintain and monitor potentially hundreds of virtual circuits, with variable QoS parameters for each one. This complexity may explain why Class 4 capability exists primarily as a Fibre Channel standard and not as a standard feature of switch products. Fibre Channel Class 5 is as yet undefined.

Class 6 service has been defined to provide multicast service with acknowledged delivery. Multicast is a requirement for video broadcast applications based on a central video server and multiple video recipients. This function can also be provided by Class 3 datagram service if acknowledgment is not required. Depending on the number of target devices in a multicast group, the number of acknowledgments flooding back to an initiator might overwhelm the topology. Class 6 alleviates this problem by placing a multicast server into the configuration. The multicast server is the focal point for all acknowledgments and, for each multicast transaction, returns a single confirmation to the initiator.

Although Class 4 and Class 6 services are described in granular detail in Fibre Channel documentation, this does not mean that these features have been implemented in tangible products. Fibre Channel switches, for example, typically list support for Class 2 and Class 3 services only. Support for other functionality is driven, like everything else, by market demand. The fact that Class 4 and Class 6 have emerged as Fibre Channel standards, however, reflects the influence of market requirements on the technology. It also indicates that cooperation between customers and vendors is driving the higher functionality seen in products.

An additional delivery class, known as Class F, is specifically for switch-to-switch communications. Similar to Class 2, Class F traffic is connectionless and provides acknowledgment of delivery or notification of delivery failure. Class F is restricted to Fibre Channel switch expansion ports (E_Ports) and is used to exchange routing, name service, and notification information between switches joined into a single fabric. E_Ports are discussed later in this chapter.

3.8 Flow Control

To prevent a target device from being overwhelmed with frames, Fibre Channel provides several flow control mechanisms based on a system of credits. Each credit represents a device's ability to accept an additional frame. If the recipient issues no credits to the sender, no frames can be sent. Pacing frame transport on the basis of credits prevents loss of frames and reduces the frequency of entire sequences being retransmitted across the link.

In practice, this credit scheme is based on the number of buffers that an end node maintains for storing incoming data. Memory on-board a host bus adapter, for example, may be allocated as receive buffers to interface between the deserializing and decoding functions of FC-1 and the frame reassembly function of FC-2. The receive buffers are filled as frames are handed up by FC-1, and they are emptied as FC-2's assembly line pulls individual frames for reconstruction into data blocks.

For efficient utilization of the Fibre Channel transport, it is useful if multiple frames can be sent consecutively. This is accomplished by granting sufficient credits before a transaction begins and by using Fibre Channel's full duplex capability to send additional credits while frames are still being received and processed.

Transmission credit is initially established when two communicating nodes log in and exchange communication parameters. End-to-end credit (EE_Credit) is used by Class 1 and Class 2 service between two end nodes and is monitored by the nodes themselves. An intervening switch does not participate in EE_Credit. After an initial credit level is granted, credits are replenished by acknowledgments issued by the receiver to the sender. The sender decrements the EE_Credit by 1 for each frame issued and increments it only when an ACK is received.

Buffer-to-buffer credit (BB_Credit) is used by Class 2 and Class 3 service and relies on the receiver-ready (R_RDY) ordered set to replenish credits. An end node attached to a switch will establish its BB_Credit during login to the fabric. A communicating partner on the far side of the switch will establish its own (and possibly different) BB_Credit to the switch during login. BB_Credit thus has no end-to-end component. The sender decrements the BB_Credit by 1 for each frame sent, and increments BB_Credit by 1 for each R_RDY received. The initial value of the BB_Credit must be nonzero.
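
A minimal sketch of the BB_Credit bookkeeping described above, in Python with illustrative names: the sender spends one credit per frame and regains one per R_RDY, stalling when credit reaches zero.

    class BBCreditSender:
        """Toy model of a sending port's BB_Credit counter (illustrative only)."""

        def __init__(self, initial_credit: int):
            if initial_credit <= 0:
                raise ValueError("initial BB_Credit must be nonzero")
            self.credit = initial_credit

        def can_send(self) -> bool:
            return self.credit > 0

        def send_frame(self) -> None:
            if not self.can_send():
                raise RuntimeError("no credit: wait for R_RDY")
            self.credit -= 1   # one receive buffer consumed at the far end

        def receive_r_rdy(self) -> None:
            self.credit += 1   # receiver has freed a buffer

    sender = BBCreditSender(initial_credit=2)
    sender.send_frame()
    sender.send_frame()
    print(sender.can_send())   # False: must wait for an R_RDY
    sender.receive_r_rdy()
    print(sender.can_send())   # True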

In arbitrated loop, an alternative BB_Credit scheme is employed. Alternative BB_Credit assumes that the BB_Credit of a device is zero. The target node does not establish a positive BB_Credit until it is opened for a transaction by an initiator. The target then issues a number of R_RDYs equivalent to the receive buffers it has available. The differentiating feature of alternative BB_Credit is that an initiator does not have to maintain a table of BB_Credit values for potentially more than 100 targets on the same loop. By simply setting the initial value to zero and relying on R_RDYs to establish credit, you dramatically reduce the overhead on loop devices.

3.9 Naming and Addressing Conventions

The unique identity of participants in a Fibre Channel environment is maintained through a hierarchy of fixed names and assigned address identifiers. In Fibre Channel terminology, a communicating device is a node. A host bus adapter in a server, for example, constitutes a Fibre Channel node. Normally, the node has only one physical interface, known as a node port, or N_Port. If an HBA or controller is multiported (some have as many as four physical ports per card), it is possible for a single node to have multiple N_Ports.

Each node has a fixed 8-byte node name assigned by the manufacturer. If the manufacturer has registered with the Institute of Electrical and Electronics Engineers (IEEE) for a range of addresses, the node name will be globally unique and so is normally referred to as a World-Wide Name (WWN). An N_Port within a parent node is also assigned a unique 8-byte port name that typically follows the IEEE format. This naming convention allows each node and its associated N_Ports to be unique and accessible, even in complex SANs. Standards allow for several IEEE formats, as well as 64-bit formats for IP and locally administered names.

Like the 48-bit MAC address scheme in Ethernet and Token Ring, the Fibre Channel naming convention allows either global or locally administered uniqueness to be assigned to a device. Unlike the LAN MAC address, however, the administered name or WWN is not used for transporting frames across the network. In addition to a Fibre Channel name, a communicating device is dynamically assigned a 24-bit port address, or N_Port ID, that is used for frame routing (see Figure 3-4). The 24-bit addresses of two communicating partners are embedded in the frame header for both the destination identifier (D_ID) and source identifier (S_ID).

Figure 3-4. Server with a dual-ported host bus adapter

This dual name and network address system has several benefits. The 64-bit name provides a unique identity throughout the SAN. Using 64-bit identifiers for routing data, however, would fatten the frame header and incur more processing overhead. By using a shorter, 24-bit port address, this arrangement optimizes the frame header and routing logic for high-speed switching of frames. The 24-bit format still allows for more than 16 million addresses, an address space far larger than any practical SAN design.

In addition to optimizing frame routing, the 24-bit port address strategy allows the topology itself to assign addresses and thereby removes the need for manual administration of addresses. In fabric environments, the switch is responsible for assigning a 24-bit address to each device as it logs on. The switch maintains a table of the device's WWN and the assigned 24-bit address in the Simple Name Server (SNS). In arbitrated loop, the loop devices themselves perform an address selection routine that ensures that every device on the loop has a unique address. Dynamic addressing removes the element of human error in address maintenance and provides flexibility for adds, moves, and changes in the SAN. At the same time, Fibre Channel's self-addressing mechanism places additional burdens on fabric switch design, particularly as SANs scale to large sizes or are extended over distance.

An N_Port can be attached directly to another N_Port, as in a point-to-point topology. In a fabric, the N_Port is attached to a fabric port, or F_Port. Each switch port also has a 64-bit port name, and the switch itself, like a node, has a 64-bit fabric name. The unique identity of each switch and switch port, as well as each attached node port, is thus established for the entire SAN.

Connections between fabric switches are maintained by expansion ports, or E_Ports. If the switch vendor provides ports that can be used for either node attachment or expansion, a generic port, or G_Port, definition is used. If in addition the G_Port can support loop devices, it is referred to as a GL_Port. Finally, a switch can also allow attachment of public arbitrated loop devices on fabric loop ports, or FL_Ports. The attached arbitrated loop ports are referred to as node loop ports, or NL_Ports. The connection between the NL_Port and the FL_Port requires higher functionality and protocol support on the part of the NL_Port, including logon facilities and registration with the switch's SNS. These fabric services can also be provided by a special NL_Port on an arbitrated loop if no fabric switch is present. This special port is defined as a fabric/node loop port, or F/NL_Port.

A freestanding arbitrated loop is configured with multiple NL_Ports on a shared medium. If the NL_Ports are connected through an arbitrated loop hub, the hub plays no active role in frame routing decisions. Hub ports therefore have no particular designation because the hub's existence is transparent to the attached devices. The trend in hub design is toward adding intelligence, particularly in loop management and automatic recovery features. If for management purposes a vendor engineers a Fibre Channel controller in the hub architecture, it appears as an additional NL_Port to the topology.

For native Fibre Channel extension over distance, a bridge can be connected to a fabric switch to translate between Fibre Channel and a LAN or WAN. A bridge port, or B_Port, designation is used to refer to the bridge interface to the fabric and, as with E_Port connections, Class F traffic is used for communications between the fabric switch and the bridge.

3.10 Chapter Summary

FC Layers

• Fibre Channel is a standards-based, layered architecture.

• FC-0 defines the physical interface, which may be optical or copper.

• FC-1 provides low-level link controls and data encoding for gigabit transport.

• FC-2 defines segmentation and reassembly of data via frames, flow control, and classes of service.

• FC-3 is being developed for common services.

• FC-4 is the upper-layer protocol interface between Fibre Channel and IP, SCSI-3, and other protocols.

Gigabit and 2Gbps Transport

• Fibre Channel supports 1Gbps and 2Gbps transmission rates.

• Fibre Channel standards enforce a maximum bit error rate of 10^-12.

• Timing deviation is referred to as jitter. Fibre Channel network components are assigned a jitter budget; if the budget is exceeded, excessive errors may occur.

• 10Gbps Fibre Channel will be based on 10Gbps Ethernet transport standards.

Physical Layer Options

• Fibre Channel supports both optical and copper media.

• The maximum distance for intracabinet copper is 13 meters.

• The maximum distance for intercabinet copper is 30 meters.

• Shortwave laser supports up to 500 meters on 50µm/125µm multimode cable.

• Longwave laser supports up to 10 kilometers on 9µm/125µm single-mode cable.

• The most commonly used transceivers are GBICs and small form factor transceivers.

Data Encoding

• For reliable transport at gigabit speeds, data must be encoded.

• 8b/10b encoding converts 8-bit bytes into 10-bit characters.

• The special K28.5 character is used to indicate Fibre Channel commands.

• Running disparity is used to maintain balanced signaling.

Ordered Sets

• A Fibre Channel word is four 10-bit characters.

• Words with K28.5 in the first character position are ordered sets.

• Ordered sets are used as frame delimiters, primitive signals, and primitive sequences for low-level transport protocol.

Framing Protocol

• Data is segmented into frames for transport.

• The maximum frame size is 2,148 bytes, with 2,112 bytes of payload.

• Frames are transmitted as sequences of related frames.

• An exchange can include multiple sequences.

Class of Service

• Class 1 service is a dedicated connection between two communicators with acknowledgment of frame delivery.

• Class 2 service is connectionless but provides acknowledgment.

• Class 3 service is connectionless and provides no notification of delivery.

• Class 4 allows fractional bandwidth for virtual circuits.

• Class 6 provides multicast with acknowledgment.

• Class F is used for switch-to-switch communications in a fabric.

Flow Control

• Flow control is maintained by a credit scheme between communicators and prevents a target from being overwhelmed with frames.

• End-to-end credit is monitored by the communicating nodes and is replenished by acknowledgments.

• Buffer-to-buffer credit is replenished by receiver-ready (R_RDY) primitives.

• An alternative buffer-to-buffer credit is used by arbitrated loop.

Naming and Addressing Conventions

• A node is a communicating device.

• A node is identified by a unique 8-byte node name, or World-Wide Name.

• A node may contain multiple ports, or N_Ports, each of which has a unique 8-byte port name.

• Each N_Port has a 24-bit port address, or N_Port ID, used for frame routing.

• A Fibre Channel fabric switch port is an F_Port for attachment to N_Ports, or an FL_Port when attached to arbitrated loop NL_Ports.

• An E_Port is a fabric switch expansion port used to connect fabrics.

• A G_Port can support either node or expansion port attachment.

• A GL_Port can support node, loop, or expansion port attachment.

• A B_Port is a bridge port used to connect fabrics to non-Fibre Channel interconnects or networks.

Chapter 4. Fibre Channel SAN Topologies

Fibre Channel architecture has evolved three distinct physical topologies to support connectivity between networked devices: point-to-point, arbitrated loop, and fabric. The first Fibre Channel configurations were built on a dedicated, point-to-point connection between two nodes. Participants in a point-to-point topology establish an initial connection via login and then assume full bandwidth availability. Arbitrated loop allows more than two devices to communicate over a shared medium. An initiator in a loop environment must negotiate for access to the medium before launching a transaction. A Fibre Channel fabric topology provides multiple, concurrent, point-to-point connections via link-level switching, and so it entails a much higher level of complexity.

The vast majority of Fibre Channel SAN configurations being shipped today by major storage solutions providers are based on fabric switches. This chapter reviews the internal operations of point-to-point, loop, and fabric topologies, focusing on the requirements of the more commonly deployed switched fabrics.

4.1 Point-to-Point

A point-to-point topology is a simple, direct connection between two N_Ports. As shown in Figure 4-1, the transmit lead of one N_Port is connected via copper or optical cabling to the receive lead of its partner. The partner's transmit, in turn, is cabled to the other N_Port's receive. This cabling scheme creates dedicated bandwidth between the pair, with throughput of up to 100MBps in each direction for 1Gbps Fibre Channel, and 200MBps in each direction for 2Gbps Fibre Channel.

Figure 4-1. A point-to-point Fibre Channel link between a server and a disk

Before data transactions can occur, the two N_Ports must perform an N_Port login to assign N_Port addresses or N_Port IDs. Thereafter, a persistent connection is maintained, with utilization of the dedicated link determined by the application.

Although it is conceivable to have an application requiring simultaneous full duplex transfers—for example, 200MBps total throughput at 1Gbps—in practice only one side of the link sees any real traffic at a given moment. Depending on the application, a server and disk in a point-to-point configuration may be performing either reads or writes of data, or both concurrently. From the server's standpoint, an ongoing read operation would initiate incoming frames on the server's receiver, with only ACKs (for Class 1 or 2 service) leaving the server's transmitter. Even then, the 100MBps available bandwidth on the server's receive link is not likely to saturate. Link utilization in point-to-point configurations is determined by the performance of the Fibre Channel controllers at either end, along with the buffering available to queue up data to be transmitted or received.

The original point-to-point configurations were based on quarter-speed, 266Mbps bandwidth with an effective throughput of 25MBps in each direction. Distributed primarily by Sun Microsystems, quarter-speed implementations used fiber-optic GLMs as the transceiver interface to the host bus adapter and disk controller logic. Tens of thousands of these systems were shipped to customers during the mid-1990s, creating a large but unadvertised base of Fibre Channel products in production environments long before SANs became popular. This legacy base has provided valuable experience for improving both the physical transport and protocol support.

As server and disk performance has increased, Fibre Channel throughput has kept pace with 1Gbps and 2Gbps implementations. At the same time, advances in fabric and arbitrated loop technology have enabled more flexibility and functionality than point-to-point provides. A point-to-point configuration is still viable for simple configurations, but for growth of the SAN it is important to select the proper HBA and controller components. If the vendor includes device drivers for point-to-point, arbitrated loop, and fabric attachment, you can accommodate additional devices on the SAN with minimal pain.

4.2 Arbitrated Loop

Compared with point-to-point configurations, arbitrated loops provide more flexibility and support for more devices. Although most SAN offerings use fabric switches for connectivity, arbitrated loop is still commonly used for JBODs (just a bunch of disks) and for mass storage to support NAS processors.

Arbitrated loop is a shared, gigabit transport. As with shared Ethernet or Token Ring segments, the functional bandwidth available to any individual loop device is determined by the total population on the segment and the level of activity of the other participants: more active talkers, less available bandwidth. A 1Gbps arbitrated loop with 50 equally active nodes, for example, would provide 100MBps/50, or only 2MBps functional bandwidth per node. Arbitrated loop would therefore not be a popular choice for SANs were it not for the fact that a typical storage network has relatively few active contenders for bandwidth. Although a single loop may have more than a hundred disk drives, there are usually no more than four to six initiators making requests to those drives. Thus, you can design large configurations on a single loop without dividing the bandwidth down to the level of ordinary Ethernet.
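
The bandwidth arithmetic in the preceding paragraph reduces to a single division; the tiny Python sketch below simply restates the 100MBps/50 example and the few-initiator case.

    def per_node_bandwidth_mbps(link_mbps: float, active_nodes: int) -> float:
        """Functional bandwidth per node if all active nodes share the loop equally."""
        return link_mbps / active_nodes

    print(per_node_bandwidth_mbps(100, 50))  # 2.0, the 50-node example above
    print(per_node_bandwidth_mbps(100, 6))   # ~16.7 with only six active initiators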

Because the transport is shared, some means must be provided for orderly access to the media. In arbitrated loop, you gain media access through an arbitration protocol. After an NL_Port has arbitrated and won control of the transport, it has the full bandwidth available for its transaction. When the transaction is complete, the NL_Port closes the temporary connection, making the transport available to others.

4.2.1 Loop Physical Topology

Arbitrated loop is a true physical loop, or ring, created by tying the transmit lead of one NL_Port to the receive lead of its downstream neighbor. The neighbor's transmit is, in turn, connected to the receiver of yet another NL_Port, and so on, until the circle completes at the original NL_Port's receiver. In this way, a continuous data path exists through all the NL_Ports, allowing any device to access any other device on the loop, as illustrated in Figure 4-2.

Figure 4-2. A daisy chain arbitrated loop between a server and two storage arrays

The first arbitrated loops were built in a daisy chain configuration, using copper or fiber-optic cabling to create the loop of NL_Ports. Several problems quickly arose. Powering off or disconnecting a single node would break the chain and thus crash the loop. A break in cabling or a faulty transceiver anywhere along the loop would also halt loop traffic and entail tedious troubleshooting to locate the problem. As with the problems encountered in hardwired Token Ring topologies, the overhead and risks associated with dispersed loop cabling promoted the development of centralized arbitrated loop hubs.

Arbitrated loop hubs provide a physical star topology for a loop configuration, bringing each NL_Port's transmit and receive leads to a common location. The internal architecture of a hub completes the connections between transmitters and receivers on a port-by-port basis via mux circuitry, and it finishes the loop by connecting the transmitter of the last hub port (say, port 8) to the receiver of the first (say, port 1). One of the most useful features of a loop hub is bypass circuitry at each port, which allows the loop to circumvent a disabled or disconnected node while maintaining operation. Most unmanaged arbitrated loop hubs also validate proper gigabit signaling before allowing a device to insert into the loop, whereas managed hubs provide additional functionality. These features are described in more detail in Chapter 5.

Because an arbitrated loop hub supplies a limited number of ports, building larger loops may require linking multiple hubs. This is called hub cascading. As shown in Figure 4-3, a cascade is simply a normal cable connection between a port on one hub and a port on another. No special cable is required, although to minimize potential ground loop and noise problems, fiber-optic cabling is recommended instead of copper. Cascading consumes one port on the first and last hubs in a chain, and two ports on intervening ones. Cascading three 8-port hubs, for example, would yield 20 usable ports, with one-fifth of the total ports sacrificed to achieve the cascade. Depending on the vendor, hubs can be cascaded to the arbitrated loop maximum of 127 ports, although the advisability of doing so should be application-driven. Just because you can build some configurations does not mean that you should.
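
The port arithmetic for cascaded hubs can be written down directly. The following Python sketch (hypothetical helper) charges one port to each end hub and two to each intervening hub, reproducing the three-hub example above.

    def usable_ports(hub_count: int, ports_per_hub: int) -> int:
        """Ports left for devices after reserving ports for the cascade links."""
        if hub_count == 1:
            return ports_per_hub
        return hub_count * ports_per_hub - 2 * (hub_count - 1)

    print(usable_ports(3, 8))  # 20, as in the three-hub example above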

Figure 4-3. Cascaded arbitrated loop hubs

Cascading one hub to another extends the loop through the additional ports on the downstream hub. You can achieve a similar effect by inserting a JBOD into a hub port. Although the link between the hub and the JBOD consists of a single cable pair, the JBOD itself is composed of a series of arbitrated loop disks daisy chained (transmit to receive) together. The transmit and receive leads of the JBOD interface to the hub represent an entire cluster or loop segment of multiple NL_Ports and not just a single NL_Port. The loop is thus extended through the JBOD enclosure, and the loop population is increased by the number of drives in the JBOD chassis. This is an important consideration when you calculate hub port requirements and optimal loop size.

Arbitrated loop standards provide address space for up to 127 devices (126 NL_Ports and 1 FL_Port) on one loop. Fibre Channel specifications allow for 10km runs over single-mode cabling and longwave fiber-optic transceivers. You are advised not to combine these two concepts. Even a few 10km links on a single loop can severely impede loop performance, because each 10km link incurs a 50-microsecond propagation delay in each direction. Every transaction on the loop would have to traverse the extended links, multiplying the effect of each transit delay by the number of transactions. Long haul requirements for disaster recovery or campus networks are better served with dedicated fabric switch ports.

4.2.2 Loop Addressing

An NL_Port, like an N_Port, has a 24-bit port address. If no switch connection exists, the upper two bytes of this address are zeroed to x'00 00'. This arrangement is referred to as private loop because devices on the loop have no connection to the outside world. If the loop is attached to a fabric and if an NL_Port supports fabric login, the switch assigns a positive value to the upper two bytes (and possibly the last byte). This mode is called public loop because fabric-capable NL_Ports are members of both a local loop segment and a greater fabric community, and they need a full 24-bit address for identity in the network. In the case of public loop assignment, the value of the upper two bytes represents the loop identifier and is common to all NL_Ports on the same loop that performed login to the fabric.

In both public and private arbitrated loops, the last byte of the 24-bit port address is referred to as the arbitrated loop physical address, or AL_PA. The AL_PA is acquired during initialization of the loop; in the case of fabric-capable loop devices, the switch can modify the AL_PA during login. The 1-byte AL_PA provides a compact addressing scheme and lets you include a device's identity as part of a 4-byte ordered set. In fact, an ordered set can include two AL_PAs, identifying both source and destination devices. The ordered set for open full duplex, for example, is K28.5 D17.4 AL_PD AL_PS, with AL_PD representing the destination address and AL_PS representing the source address.
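
A quick sketch of how the pieces of the 24-bit address fit together, in Python with illustrative values: a private loop device carries x'00 00' above its AL_PA, while a public loop device carries the fabric-assigned loop identifier (the identifier shown here is hypothetical).

    def loop_port_address(al_pa: int, loop_identifier: int = 0x0000) -> int:
        """Compose a 24-bit port address from a loop identifier and a 1-byte AL_PA."""
        assert 0 <= al_pa <= 0xFF and 0 <= loop_identifier <= 0xFFFF
        return (loop_identifier << 8) | al_pa

    print(f"{loop_port_address(0x01):06X}")                          # 000001 (private loop)
    print(f"{loop_port_address(0x01, loop_identifier=0x0123):06X}")  # 012301 (public loop)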

The total number of AL_PAs available for arbitrated loop addressing is 127. This number was not determined by rigorous performance testing on assorted loop topologies, nor was it calculated on theoretical throughput given various loop populations. Instead, it is based on the requirements of 8b/10b running disparity between frames.

As a frame terminates with an end of frame character, the EOF forces the current running disparity to be negative. By Fibre Channel standards, each transmission word between the end of one frame and the beginning of another frame should also leave the running disparity negative. This function is provided by the IDLE ordered set, which has a fixed format of K28.5 D21.4 D21.5 D21.5. The special K28.5 leaves running disparity positive. The D21.4 leaves the running disparity negative. The D21.5 characters used for the last two bytes are neutral disparity. The net result is a negative running disparity at the end of the IDLE transmission word.

Because the loop-specific ordered sets may include AL_PAs in the last two byte positions, negative running disparity is facilitated if these values are neutral. In the open full duplex ordered set just cited, for example, the D17.4 character following the special K28.5 would leave the running disparity negative. If the destination and source AL_PAs are neutral disparity, the Open transmission word will leave the running disparity negative. This satisfies the requirement for the next start of frame (SOF).

If all 256 possible 8-bit bytes are dispatched to the 8b/10b encoder, 134 of them will emerge with neutral disparity characters. Fibre Channel claims some of these for special purposes. The remaining 127 neutral disparity characters have been assigned as AL_PAs.

Thus, the number 127 is not a recommended load for a Fibre Channel loop transport. It is simply the maximum number (minus reserved values) of neutral disparity addresses that could be assigned for loop use. At higher Fibre Channel speeds, such as 200MBps, having 127 active loop participants may be quite reasonable or may even be considered inadequate for some needs.

Because the AL_PA values are determined on the basis of neutral disparity, a listing of hex values of AL_PAs seems to jump randomly over some byte values and not others. Listed sequentially, the hex values of AL_PAs would begin 00, 01, 02, 04, 08, 0F, 10, 17, 18, 1B, 1D,…. The gaps in the list represent byte values that, after 8b/10b encoding, result in nonneutral disparity characters. This is significant for some Fibre Channel disk drives, which allow the user to set jumpers or dip switches on a controller card to manually assign a fixed AL_PA. Typically, the jumper positions correspond only to an index of AL_PA values and do not correspond to actual hex values (in most network equipment, by contrast, addresses correspond to the actual hex values).

Arbitrated loop assigns priority to AL_PAs based on numeric value. The lower the numeric value, the higher the priority. AL_PA priority is used during arbitration to give advantage to initiators such as file servers and fabric loop ports. An FL_Port by default has the address x'00', which gives it the highest priority over all other NL_Ports. When arbitrating against other devices for access to the loop, the FL_Port will always win. This helps to ensure that a valuable resource such as a switch can quickly serve the loop and then return to fabric duties. During address selection, as shown in Figure 4-4, servers typically attempt to take the highest-priority, lowest-value AL_PAs, whereas disk arrays take lower-priority, higher-value AL_PAs. A server with an AL_PA of x'01' will have a statistically higher chance of winning arbitration against lower-priority contenders, although arbitrated loop also provides safeguards against starvation of any port.
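
The priority rule is simply "lowest AL_PA wins." The Python sketch below illustrates it with a few sample values (the specific AL_PAs are illustrative); it does not model the fairness safeguards mentioned above.

    def arbitration_winner(contending_al_pas):
        """Lowest numeric AL_PA wins arbitration."""
        return min(contending_al_pas)

    print(f"{arbitration_winner([0x01, 0xE0, 0x00]):02X}")  # 00, the FL_Port
    print(f"{arbitration_winner([0x01, 0xE0]):02X}")        # 01, the server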

Figure 4-4. AL_PA assignment on a small arbitrated loop

An NL_Port's AL_PA can change with every initialization of the loop or reset of the device. On the surface, this may seem disruptive, but dynamic address assignment by the topology itself greatly reduces administrative overhead. As anyone who has had to reconfigure an IP network can testify, offloading low-level address administration to the topology is highly desirable. Arbitrated loop initialization guarantees that each attached device will have a unique AL_PA. Potential addressing conflicts are possible only when you join two separate loops (for example, by cascading two active hubs) without initialization. Some hub vendors have responded to this problem by incorporating an initialization sequence whenever a cascade condition is sensed.

4.2.3 Loop Initialization

Loop initialization is an essential process for allowing new participants onto the loop, assigning AL_PAs, providing notification of topology changes, and recovering from loop failure. Following loop initialization, the loop enters a stable monitoring mode and begins (or resumes) normal activity. Depending on the number of NL_Ports attached to the loop, an entire loop initialization sequence may take only a few milliseconds. For Sun Solaris servers, a loop initialization may result in a message posted to the event log. For NT servers, loop initialization is largely ignored. In either case, a loop initialization on an active loop normally causes a brief suspension of activity, which resumes after initialization is complete.

A loop initialization can be triggered by a number of causes, the most common being the introduction of a new device. The new device could actually be a former participant that has been powered on, or an active device that has been moved from one hub port to another.

A number of ordered sets have been defined to cover the various conditions that an NL_Port may sense as it launches the initialization process. These ordered sets, called loop initialization primitive sequences, are referred to collectively as LIPs. An NL_Port issues at least 12 LIPs to start loop initialization. In the following examples, we assume a Fibre Channel host bus adapter installed in a file server; a short decision sketch follows the list:

� An HBA that is attached to an active loop and is power cycled will, upon bootup, start processing the incoming bit stream. The presence of a valid signal and protocol verifies that the server is on an active loop. Because the server was powered down, however, the HBA has lost the AL_PA that it was previously assigned. That previously assigned AL_PA was stored in a temporary register in the HBA, and the register was wiped clean by the power cycle. The HBA immediately begins transmitting LIP(F7, F7) onto the loop. The xF7 is a reserved, neutral disparity character. The first occurrence of xF7 indicates that the HBA recognizes that it is on an active loop. The second xF7 indicates that the HBA has no AL_PA.

� An HBA attached to an active loop is moved from one hub port to another. When the cable is unplugged from the hub and moved to the other port, the HBA temporarily loses Fibre Channel signal. Upon reinsertion, the HBA sees valid signal return and begins processing the bit stream. In this instance, the HBA still has its previously assigned AL_PA, and so it begins transmitting LIP(F7, AL_PS) onto the loop. The xF7 indicates that the HBA sees the active loop. The AL_PS is the source AL_PA of the LIP—that is, the HBA's previously assigned AL_PA. In this example, the HBA is not issuing LIPs to acquire an address but to notify the loop that a topology change has occurred.

� The receiver of the HBA is broken, or the receive cable is broken, and the server has been power cycled. In this instance, the HBA does not see valid signal on its receiver and assumes that a loop failure has occurred. It also does not recall its previously assigned AL_PA. The HBA therefore starts streaming LIP(F8, F7) onto the loop. The xF8 is another reserved, neutral disparity character that is used to indicate a loop down state. The xF7 indicates that the HBA has no AL_PA.

� In the same scenario as the preceding, if the HBA still has a previously assigned AL_PA, it will issue a LIP(F8, AL_PS). The F8 indicates that the HBA senses loop failure. The AL_PS is the source AL_PA of the alert.
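
The choice among these LIP variants reduces to two questions: does the port see a valid loop on its receiver, and does it still hold a previously assigned AL_PA? The following minimal sketch captures that decision; the function name and arguments are illustrative only and do not correspond to any real HBA driver interface.

    def choose_lip(loop_is_active, previous_alpa=None):
        """Return the two identifying bytes of the LIP an NL_Port would stream,
        following the four scenarios described above."""
        first = 0xF7 if loop_is_active else 0xF8                       # xF7 = valid loop seen, xF8 = loop failure
        second = previous_alpa if previous_alpa is not None else 0xF7  # xF7 = no AL_PA held
        return first, second

    for args in [(True, None), (True, 0x01), (False, None), (False, 0x01)]:
        f, s = choose_lip(*args)
        print("LIP(%02X, %02X)" % (f, s))
    # Prints LIP(F7, F7), LIP(F7, 01), LIP(F8, F7), LIP(F8, 01)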

Of the conditions in this list, the most insidious for arbitrated loop environments is the LIP(F8) stream. A node issuing a normal LIP(F7) will trigger, at most, a temporary suspension of loop operations until the initialization process is completed. A node issuing LIP(F8)s, however, will continue streaming "loop down" alarms as long as it cannot recognize loop activity on its receiver. If the node's transmitter is connected to an active loop, all NL_Ports will enter a suspended initialization state and will continue to forward the offender's LIP(F8) stream, as shown in Figure 4-5. Normal loop initialization cannot complete, and the loop in fact fails. This has been another challenge for vendors of arbitrated loop hubs. Some have responded with auto-recovery policies that automatically bypass a port that is streaming LIP(F8)s.

Figure 4-5. An NL_Port streaming LIP(F8)s onto an arbitrated loop

In addition to loss of signal, an NL_Port may issue LIP(F8) if no valid ordered sets are present on the loop. This may occur if an upstream node is corrupting the bit stream because of excessive jitter or a malfunction of processing logic. Other conditions may trigger LIPs, including a node's inability to successfully arbitrate for loop access. Arbitrated loop provides a fairness algorithm for media access, but if a participant is not playing fair, others on the loop may issue LIPs to reinitialize a level playing field. Arbitrated loop also provides a selective reset LIP that is directed by one NL_Port to another. How the reset is implemented is vendor-specific, but the selective reset LIP(AL_PD, AL_PS) may cause the target device to reboot. This allows one NL_Port to force a misbehaving NL_Port into a known good state.

The loop initialization process begins when an NL_Port streams at least 12 LIPs onto the loop. As each downstream device receives the LIP stream, it enters a state known as Open-Init, which suspends any current operations and prepares the device for the loop initialization procedure. The LIPs are forwarded along the loop until all NL_Ports, including the originator, are in an Open-Init condition.

At this point, the NL_Ports need someone to be in charge. Unlike Token Ring, arbitrated loop has no permanent master to monitor the topology. Loop initialization therefore provides a selection process to determine which device will be the temporary loop master. After it is selected, the loop master is responsible for conducting the rest of the initialization procedure and returning the loop to normal operation (see Figure 4-6).

Figure 4-6. Steps in the loop initialization sequence

The loop master is determined by a subroutine known as the loop initialization select master procedure, or LISM. Each loop device vies for the position of temporary master by continuously issuing LISM frames that contain a port type identifier (x'00' for FL_Port, x'EF' for NL_Port) and its 64-bit World-Wide Name (WWN). When a downstream device receives a LISM frame from an upstream partner, it first checks the identifier. If the identifier is x'00', a fabric is present and the device ceases its own LISM frame broadcast and begins issuing the FL_Port's LISM frame. If the identifier is a standard NL_Port, the downstream device compares the WWN in the LISM frame to its own. As with AL_PA priorities, the WWN with the lowest numeric value has higher priority. If the received WWN has higher priority, the device ceases its own LISM broadcast and begins transmitting the received LISM. If the received WWN has lower priority, the device throws away the received LISM and continues broadcasting its own LISM.

Eventually, a node will receive its own LISM frame, indicating that it has the highest priority and is therefore temporary loop master. It then begins transmitting a special ordered set, ARB(F0), to notify the others that a temporary master has been selected. When the ARB(F0) circles back to the loop master, the initialization can proceed to the next phase.
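
The LISM comparison itself is a simple priority test on the pair (port type identifier, WWN). The sketch below models a LISM frame as such a pair; the WWN values are invented for illustration.

    def lism_to_forward(received, own):
        """Return the LISM frame this port should continue transmitting.
        A frame is modeled as (port_type, wwn): port type x'00' for an FL_Port,
        x'EF' for an NL_Port; the numerically lower value wins in both fields."""
        return min(received, own)

    own = (0xEF, 0x200000E08B0000AA)        # this port's LISM (illustrative WWN)
    received = (0xEF, 0x200000E08B000010)   # upstream neighbor's LISM
    print(lism_to_forward(received, own) == received)   # True: adopt the received LISM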

The first task of the temporary loop master is to issue a series of four frames that will allow each participant on the loop to select a unique AL_PA. The frame format contains a 128-bit field that represents an abbreviated mapping of all possible AL_PAs. The position of each bit in the AL_PA map corresponds to the sequential list of AL_PAs, beginning with x'00' and ending with x'EF'. When the first frame is issued, all bits are initialized to 0, indicating that no AL_PAs have been selected. As each device picks an AL_PA from the map, the corresponding bit is set to 1.

The header of each AL_PA map frame contains an identifier that defines which of the loop devices is allowed to select an AL_PA. The first frame issued has an identifier of loop initialization fabric address, or LIFA. As this frame circulates the loop, only public loop devices that have previously been assigned an address by the fabric have permission to select a bit corresponding to their original AL_PA. When this frame returns to the loop master, it is reissued with a loop initialization previous address, or LIPA, identifier. Now the private loop devices that remember their previously assigned AL_PAs have an opportunity to reselect them from the map. If by chance two devices previously had the same AL_PA (for example, if two separate, active loops were hot cascaded), the first device to see the frame would be able to reselect it. The second device would see the bit already set to 1 and would have to wait for the next frame.

When the frame is issued for the third time, the identifier is changed to loop initialization hard address, or LIHA. NL_Ports that have dip switch or jumpered addresses (for example, disk drive controllers) can now attempt to select AL_PAs from the map corresponding to their hardwired addresses. If the hard-assigned AL_PA is already taken, however, a device must wait for the next addressing frame. The last frame issued has an identifier of loop initialization soft address, or LISA. This frame is for any NL_Ports that did not qualify for or were unsuccessful in the previous rounds. On a populous, previously operational loop, devices that must select from the LISA frame would find only leftovers. Typically, an initiator such as a server will attempt to select a bit corresponding to a higher-priority AL_PA; a target such as a disk may attempt to select a lower-priority AL_PA.
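
The address-selection rounds can be sketched as operations on the bit map carried in the LIFA, LIPA, LIHA, and LISA frames. In this sketch the map is modeled as a Python list of bits, one per entry in the valid AL_PA table, which is a simplification of the actual frame format.

    def try_claim(bitmap, desired_alpa, valid_alpas):
        """LIFA, LIPA, or LIHA round: claim the bit for a specific AL_PA.
        Returns the AL_PA on success, or None if the bit is already set."""
        pos = valid_alpas.index(desired_alpa)
        if bitmap[pos] == 0:
            bitmap[pos] = 1
            return desired_alpa
        return None

    def claim_soft(bitmap, valid_alpas, prefer_high_priority=True):
        """LISA round: take any free bit. An initiator typically scans from the
        high-priority (low-value) end, a target from the low-priority end."""
        positions = range(len(bitmap))
        if not prefer_high_priority:
            positions = reversed(positions)
        for pos in positions:
            if bitmap[pos] == 0:
                bitmap[pos] = 1
                return valid_alpas[pos]
        return None   # no free AL_PAs remain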

After the LISA frame returns to the temporary loop master, each loop device will have a unique AL_PA. In the original Fibre Channel standard for arbitrated loop (FC-AL-1), the loop initialization process was closed after this phase and the loop returned to normal operation. As an option, vendors could implement an additional subroutine to provide positional mapping of devices along the loop. A positional map is useful for determining how AL_PAs are physically positioned in a loop topology. Knowing which AL_PAs are on which hub ports, for example, gives you the diagnostic capability and the opportunity to fine-tune a loop configuration for optimal performance.

One problem with the positional mapping subroutine, however, is that not all arbitrated loop devices support it. The HP Tachyon chip set, for example, was widely used in host bus adapters and disk controllers and was designed before positional mapping was developed. FC-AL-2 accommodates these devices by providing bits in the LISA frame header that can be set by Tachyon or other nonparticipants. If any device reports that it cannot support positional mapping, the subroutine is abandoned and the temporary loop master closes the initialization process. Otherwise, the loop master issues a frame called a loop initialization report position, or LIRP, frame, which may contain as many as 127 bytes. The temporary loop master inserts its own AL_PA into the first byte position, increments an offset by 1, and passes the frame downstream. As each loop device receives the LIRP, it inserts its AL_PA in the next byte position, increments the offset, and forwards the frame. Eventually the frame fills with a positional map that details how loop devices are physically positioned in relationship to one another. When the positional map is complete, the loop master distributes it in a loop initialization loop position, or LILP, frame, allowing each loop device to copy and process the contents.
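
Building the positional map is a simple append-and-forward operation, sketched below. The AL_PA values are invented for illustration, and the real LIRP frame also carries an explicit offset count, which is omitted here.

    def append_position(lirp_map, my_alpa):
        """Each port appends its own AL_PA before forwarding the LIRP frame."""
        lirp_map.append(my_alpa)
        return lirp_map

    lirp_map = [0x01]                      # temporary loop master seeds the map
    for alpa in (0xE0, 0xE1, 0xE2):        # downstream devices, in physical order
        append_position(lirp_map, alpa)
    print([hex(a) for a in lirp_map])      # the LILP frame distributes this result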

As long as the positional map is used for diagnostic or optimization purposes, the failure of Tachyon or other Fibre Channel controllers to support it does not create interoperability problems. Some vendors, however, use the positional map to discover which devices are on the loop. Instead of polling the entire 127 AL_PA address space to discover targets, these implementations poll only those AL_PAs listed in the positional map. Consequently, interoperability has been an issue when older Tachyon devices are mixed with positional map-dependent devices on the same loop.

Following the LILP (if all devices support positional mapping) or LISA (if some do not), the temporary master finishes loop initialization by issuing a close (CLS) ordered set, followed by IDLEs. As each loop device receives the CLS, it leaves the open-init state and resumes normal operation. IDLEs continue to circulate around the loop until any previously suspended operations are resumed or new ones begun.

It is helpful to remember that although this discussion of the initialization process is somewhat lengthy, the actual process completes in mere milliseconds and generally is not disruptive to loop operations.

The variables introduced by loop initialization are important considerations for SAN design. In any network, failures occur most frequently during adds, moves, and changes. Proper selection of components, HBAs, disks, and arbitrated loop hubs will help to ensure that the topology change implied by loop initialization will not impact loop stability.

4.2.4 Port Login

Loop initialization allows each device to select a unique AL_PA and thus avoids addressing conflict on the loop. For example, a loop with a single server and 24 Fibre Channel disk drives will emerge from loop initialization with 25 distinct AL_PAs. Immediately following loop initialization, however, the server has no idea which other devices are on the loop. For an initiator (server) to discover targets (disks), an additional step is required. This function is provided by a port-to-port login process known as N_Port login, or PLOGI. A similar function is provided for fabric-capable devices, known as fabric login, or FLOGI. Both login processes are part of a set of extended link services that are used to exchange communication parameters and identities and thus establish logical sessions between devices on the topology.

In arbitrated loop, PLOGI is usually performed immediately following loop initialization. Because Fibre Channel disk drives do not normally communicate with one another (except in XOR drive configurations), there is no need for a disk to initiate a device discovery. A server, on the other hand, will need to discover all targets on the loop, even if the upper-layer application (such as NT Disk Administrator) assigns only a few for the server's use. In most implementations, the server attempts to establish login sessions with targets by issuing PLOGI frames addressed to each of the 126 possible NL_Port AL_PAs. The targets that accept the PLOGI from the server will return an ACC (accept) frame to the server, informing it of the target's WWN, buffer-to-buffer credit capability, maximum frame size support, and so on. By thus walking the address space, the server finds and establishes sessions with active participants on the loop.
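
Target discovery by walking the loop address space can be sketched as follows. The plogi argument stands in for whatever login routine the HBA driver actually provides and is assumed to return the target's ACC parameters, or None if no device responds at that AL_PA.

    def discover_loop_targets(plogi, valid_alpas):
        """Attempt PLOGI to each possible NL_Port AL_PA and record the sessions
        that are accepted."""
        sessions = {}
        for alpa in valid_alpas:
            if alpa == 0x00:               # skip the FL_Port address
                continue
            acc = plogi(alpa)
            if acc is not None:
                sessions[alpa] = acc       # WWN, BB_credit, max frame size, and so on
        return sessions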

The PLOGI login session is the first step in a series of interactions among loop devices that percolate up through the Fibre Channel hierarchy to FC-4's upper-layer protocol interface. At some point, an association must be made between the link-level AL_PAs and the application's logical definition of SCSI bus, target, and LUN (logical unit number) addressing. The extended link services that interface to FC-2 can determine which devices are out there; then the FC-4 protocol mapping—for example, Fibre Channel Protocol for SCSI-3—can determine what can be done with them. The responsibility for maintaining the association between lower-level AL_PAs and upper-level SCSI addressing is assumed by the device driver of the HBA or controller installed in an initiator.

4.2.5 Loop Port State Machine

Arbitrated loop introduces a new functional layer to Fibre Channel architecture. This layer, called the loop port state machine, resides between the FC-1 encoding/decoding function and FC-2's frame management and flow control functions. As shown in Figure 4-7, loop-specific functions are embodied in the loop port state machine, and the state machine in turn is embedded in silicon or microcode in Fibre Channel loop end nodes.

Figure 4-7. Loop port state machine logic

The loop port state machine monitors and performs the actions required for a device to become a loop participant: accessing the loop for transactions, being opened for transactions by other devices, and yielding control of the loop when transactions are complete. These processes are expressed in 11 different states and are controlled by a number of input and output variables. As requests are handed down by the FC-2 layer, the loop port state machine must determine its current state and decide which further actions are required to fulfill the pending request. If, for example, the upper layer has frames to send, the loop port state machine must transition from its normal monitoring state and enter an arbitrating state to gain access to the loop. When it wins arbitration, it changes to an arbitration won state and then changes to an open state as it notifies the target that a transaction is under way. When the frame transmission is complete, the loop port state machine enters a transmitted close or received close state, and finally it returns to monitoring mode.
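
A drastically simplified sketch of this behavior appears below. The real loop port state machine defines 11 states and many more inputs; only the transitions named in the preceding paragraph are modeled, and the event names are invented for illustration.

    from enum import Enum, auto

    class LoopState(Enum):
        MONITORING = auto()
        ARBITRATING = auto()
        ARBITRATION_WON = auto()
        OPEN = auto()
        TRANSMITTED_CLOSE = auto()

    # Transitions for the simple "FC-2 has frames to send" scenario described above.
    TRANSITIONS = {
        (LoopState.MONITORING, "frames_pending"): LoopState.ARBITRATING,
        (LoopState.ARBITRATING, "won_arbitration"): LoopState.ARBITRATION_WON,
        (LoopState.ARBITRATION_WON, "target_opened"): LoopState.OPEN,
        (LoopState.OPEN, "transfer_complete"): LoopState.TRANSMITTED_CLOSE,
        (LoopState.TRANSMITTED_CLOSE, "close_done"): LoopState.MONITORING,
    }

    def step(state, event):
        """Advance the state machine; unrecognized events leave the state unchanged."""
        return TRANSITIONS.get((state, event), state)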

Depending on its current state, the loop port state machine may not be able to immediately serve upper-layer requests. FC-2 might have frames pending at the same time that the loop port state machine already is in an open state—that is, another loop device has it conditioned to receive frames. Or it might be in an open state and in the midst of frame transfer when a LIP is received, in which case the loop port state machine must suspend any current activity and enter the open-init state. The logical transition from one state to another based on current inputs and variables provides an efficient mechanism for orderly conduct of the loop. Early interoperability testing of various vendors' implementations of loop port state machine logic revealed a number of timing and transition issues, but most of those issues, at least at the hardware level, have been resolved.

4.2.6 Arbitration

Because arbitrated loop is a shared transport, gaining access to the loop is a central function of the loop port state machine. Contention for access involves two components: the priority of a loop device's AL_PA, and an access variable that is toggled when a device wins arbitration. An NL_Port that observes fairness in sharing the loop will, after it has won arbitration, set its access bit to 0 and will not arbitrate again until no other devices are arbitrating. This scheme allows even low-priority devices on the loop to win arbitration and thus prevents starvation of any port. Fabric loop ports, however, do not observe fairness. If an FL_Port honored fair access, the switch might become congested as frames queued up at the FL_Port for delivery to the loop. Because an FL_Port is unfair and has the highest-priority AL_PA, it is always assured of winning arbitration whenever it needs loop access.

An arbitrate primitive, ARB(x), is transmitted whenever a loop device needs access to the loop. The actual format of the primitive is K28.5 D20.4 AL_PA AL_PA but is referred to generically as ARB(x) to indicate that each of the last two bytes contains the AL_PA of the arbitrating device. An ARB(x) can be transmitted even if another NL_Port owns the loop, but if frames are traversing the loop the ARB(x) can be issued only between frames. You transmit the ARB(x) by substituting an ARB for each current fill word (CFW). When there is no frame traffic on the loop, the CFW is normally the IDLE primitive. As each IDLE passes through the arbitrating NL_Port, it is replaced with an ARB(x) containing the AL_PA of the arbitrator. If another device already possesses the loop, the CFW will be ARB(F0). The x'F0' in this primitive has the lowest priority and is used as an indicator for the fairness algorithm. As you will see, the arbitrating device will substitute its own ARB(x) for the ARB(F0) received.

If no other device is arbitrating, the NL_Port will transmit an ARB(x) with its own AL_PA value, and, as the ARB(x) circles the loop, it will be returned to the sender. The NL_Port then transitions to the ARB_WON state and proceeds to an OPEN condition to send frames. If two or more devices are arbitrating at the same time, the NL_Port with the higher-priority AL_PA will win. When an arbitrating device receives an ARB(x) from an upstream partner, it compares the value of the AL_PA to its own. If the received ARB(x) carries a higher-priority AL_PA, the device must forward it on. If the received ARB(x) contains a lower-priority AL_PA, the device replaces the received ARB(x) with its own. In this way, only the NL_Port with the highest-priority AL_PA will receive its own ARB(x) and thus win arbitration. The contending arbitrators will continue issuing their own ARB(x)s and will continue to substitute their ARB(x) for any received lower-priority ARB(x) until one of them, too, eventually wins arbitration.
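
The substitution rule that decides which ARB survives can be sketched as a comparison on AL_PA values. Here a fill word is modeled simply as the AL_PA it carries (None standing in for an IDLE), which is a simplification of the actual ordered-set format.

    def fill_word_to_transmit(received, my_alpa, arbitrating):
        """Decide what an NL_Port retransmits for each current fill word it receives.
        Lower AL_PA value means higher priority, and x'F0' always loses."""
        if not arbitrating:
            return received                 # nonarbitrating ports forward fill words as is
        if received is None:                # IDLE: replace it with our own ARB(x)
            return my_alpa
        if my_alpa < received:              # we outrank the received ARB(x) or ARB(F0)
            return my_alpa
        return received                     # a higher-priority ARB(x) passes through

    # Receiving back an ARB carrying our own AL_PA signals that arbitration is won.
    print(fill_word_to_transmit(0x01, 0x01, True) == 0x01)   # True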

Fairness is monitored by the presence of the ARB(F0) primitive on the loop. When a loop device wins arbitration, it sets its access variable to 0. As long as the access bit is 0, the device cannot arbitrate again. The winning device also begins substituting all current fill words it receives with ARB(F0). These ARB(F0) primitives are forwarded by nonarbitrating devices around the loop. If another NL_Port begins arbitrating, it will compare the AL_PA value in any received ARB(F0) to its own. The xF0 will always have lowest priority, and consequently the contending NL_Port will substitute its own ARB(x) for the stream of ARB(F0)s it receives. When the contender's ARB(x) is received by the current arbitration winner, it informs the current loop owner that another device is still arbitrating. The current loop owner discards the received ARB(x) and again substitutes ARB(F0).

When the current owner is finished with the loop, it yields control of the loop to others. Its access variable, however, is still set to 0 and cannot be reset as long as ARB(F0)s are circulating. The next highest-priority AL_PA that is arbitrating will immediately win and will begin the same process of ARB(F0) substitution. As long as active arbitrators are contending for loop access, ARB(F0)s will continue to stream from the current winner, and this, in turn, will keep the access bits of all previous winners set to 0. The duration of this activity is called the access fairness window. The window is closed only when a current winner receives an ARB(F0). The fact that no other loop device has substituted an ARB(x) for the ARB(F0) in transit back to the current winner confirms that no other devices are arbitrating. When the current winner closes its transaction and yields control of the loop, it ceases issuing ARB(F0)s and replaces the current fill word with IDLEs. As each previous winner receives IDLEs, its access bit is reset to 1 and it is now free to arbitrate at will.

The arbitration strategy has design implications for SANs because it may be helpful to manipulate fairness to increase performance. Allowing certain file servers to be unfair will increase their access to disks vis-à-vis less critical servers.

4.2.7 The Nonbroadcast Nature of Arbitrated Loop

Shared LAN topologies such as Ethernet and Token Ring are broadcast transports in that data sent by any device on a segment is broadcast to all other devices on the same segment. Shared (nonswitched) Ethernet is an obvious example because an Ethernet transmission is sent along a common wire for all to hear. Each device along the wire processes the transmission to determine whether the destination MAC address matches its own. Token Ring employs a transmit-to-receive cabling scheme similar to that of arbitrated loop. A Token Ring device waits for a free token to pass by, marks the token "busy" and appends its data, and forwards the frame downstream. As each Token Ring device receives the frame, it examines the destination MAC address for a match. If the frame is addressed to the device, it marks the frame "copied" and forwards it on to the ring. Eventually the original frame returns to the sender. The sender sees that the frame was copied, removes the frame from the ring, and issues a new free token.

Because transmissions are visible to any device on a shared Ethernet or Token Ring segment, capturing traffic for problem diagnosis is relatively straightforward. You can plug a Sniffer, made by Network Associates, into any hub port to capture all segment activity. Multiple conversations among multiple pairs of devices can be captured on a single trace without moving the Sniffer from port to port. After the capture is complete, decoding of frames and analysis of protocol or performance problems can proceed.

Arbitrated loop, unlike shared Ethernet or Token Ring, is a nonbroadcast transport. As shown in Figure 4-8, when an NL_Port wins arbitration and opens a target for frame transmission, a frame or series of frames can be sent from initiator to target. Intervening loop devices in the path between the two will see the frames and forward them on. The target or recipient, however, removes the frames from the loop and simply issues an R_RDY or CLS primitive to the sender. The loop devices downstream from the recipient will forward the primitive back to the initiator, but they have no visibility to the frame transaction that occurred.

Figure 4-8. Frame transit in an arbitrated loop

Because frames are not broadcast throughout the loop segment, it is more challenging to capture traffic for problem diagnosis in arbitrated loops. With most Fibre Channel analyzers, you can capture only the traffic going into and out of a single port. Instead of simply plugging analyzer probes into any available hub port, you must place them in ports immediately preceding and following the monitored port, or you must configure them inline on the monitored device's link. This requirement alone is disruptive to loop traffic and may in fact mask an intermittent problem by altering the topology. Some vendors of arbitrated loop hubs have responded to these difficulties by engineering analyzer functionality into the hub itself.

During normal loop operation, the nonbroadcast nature of arbitrated loop enhances performance by removing the overhead of frame handling from at least part of the loop.

4.2.8 Design Considerations for Arbitrated Loop

Before the development of stable fabric switches, arbitrated loop was the most commonly deployed Fibre Channel storage topology. Loop hubs costing less than a thousand dollars were used to join hundreds of thousands of dollars' worth of servers and storage arrays into a SAN. The transition to fabric switches was accompanied by marketing campaigns highlighting LIPs and the vulnerability of loops to disruption. By that time, however, Fibre Channel arbitrated loop had achieved a fairly high level of interoperability, and managed loop hubs provided safeguards against disruptive behavior. Now arbitrated loop has been relegated to lower-end SAN solutions, to JBODs, and to storage backplanes of some RAID and NAS devices. From a SAN design standpoint, though, loops should still be considered, especially for price-sensitive configurations.

Arbitrated loop is not suited to some application requirements, but it provides an economical solution for a variety of shared storage needs. The following are among the application-driven criteria that you should consider when implementing arbitrated loop:

� Types of devices per loop segment

� Private and public loop support

� Total number of loop devices per segment

� Bandwidth requirements

� Distance requirements

� Managed or unmanaged environments

� High-availability requirements

Additional issues also factor into the design equation, especially because SANs must often support multiple, sometimes contending applications concurrently. Balancing the needs of each application on a common topology may be challenging, and that is why complex SANs are often constructed with a combination of shared loop segments and fabrics.

Types of Devices per Loop Segment

An arbitrated loop may support a variety of devices, including host bus adapters installed in servers, individual Fibre Channel disk drives, JBODs, Fibre Channel RAID arrays, native Fibre Channel tape subsystems, and Fibre Channel-to-SCSI bridges. At the link level, each device type simply appears as one or more AL_PAs, or as peers on a common transport. At the upper-layer protocol, the device types divide into a less egalitarian society of initiators and targets.

Because applications sit on top of the upper-layer protocol of initiators, applications typically determine the traffic patterns to and from targets. Read-intensive applications such as data mining, for example, create a traffic flow from target to initiator. Tape backup applications create a flow from initiator to target or, in the case of third-party copy backup, from target to hybrid target. For single-initiator loops, mixing multiple device types on the same topology has little impact, because the initiator is responsible for launching all transactions (such as reads and writes to disk or tape). The physical positioning of devices on the loop can therefore be optimized for the dominant application of that initiator. In multi-initiator environments, optimization via physical positioning of devices is more difficult.

Tape backup presents a special problem for multi-initiator loops. Tape subsystems connected through Fibre Channel-to-SCSI bridges or native Fibre Channel interfaces are not particularly fast (typically less than 15MBps) and so pose no bandwidth issues. A streaming tape backup, however, may not tolerate interruptions due to loop reconfiguration. Such disruptions may abort the entire process, defeating tape's prime directive of data integrity. In the more dynamic and populous loop environment implied by a multi-initiator configuration, the statistical occurrence of interruptions in the form of LIPs increases. Installing a new drive in a JBOD while a tape is backing up another array, for example, will initiate a LIP throughout the loop segment and potentially will abort the backup process. Therefore, it is strongly recommended that you segregate tape backup systems to fabric ports or restrict LIP propagation via intelligent hubs (provided that you manage the AL_PAs via hard addressing).

If multiple initiators and multiple targets share the same loop, can all servers access the same drives? The topology guarantees that they can but does not guarantee the consequences if they do. Arbitrated loop does not provide intrinsic file locking, file permission monitoring, volume ownership, or any other feature that would prevent two servers from overwriting data on the same drive. If data sharing among multiple initiators is required, you can do it using middleware, usually in the form of applications that sit between the operating system and SCSI-3. Otherwise, you must administer the servers manually to ensure that each target is owned by a single initiator.

Windows NT aggravates this problem because NT wants to own everything it sees. When an NT server boots on an arbitrated loop (or fabric), NT Disk Administrator queries and reports all Fibre Channel-attached drives. If the drive has not previously been accessed or formatted for NT, Disk Administrator immediately prompts the user to write a volume label on the disk so that it can be accessed. If the user consents, a volume label is written to the first sector of the disk, and if the disk actually belongs to a Solaris server, that will make the disk and its files unusable. This problem has been fixed with newer versions of Microsoft operating software, such as .NET.

Although the number of initiators on the same segment is important in terms of administering disk ownership, you should also consider heterogeneous operating systems and their appetites. Fibre Channel fabrics and switching hubs with port zoning (discussed later) are possible solutions for mixed OS environments.

Private and Public Loop Support

The bandwidth and population restrictions of private arbitrated loop can satisfy most requirements of applications used within departments. With one to four servers and several RAID or JBOD arrays, a customer service, engineering, or similar department would have adequate storage and response time for data access.

Support for fabric login and public loop becomes meaningful when requirements for bandwidth allocation, population, or distance exceed the capabilities of a single loop. Introducing a fabric opens new design possibilities for loop configurations, but it also mandates fabric support for HBAs, disks, and other loop devices. With a large installed base of legacy nonfabric, private loop devices in the market, some switch vendors have engineered support for both public and private loop devices on the same fabric.

For new SAN installations, it is preferable to select components that support public loop, even if the initial configuration is a stand-alone arbitrated loop. This is not a consideration for selecting arbitrated loop hubs, because they play a passive role in the loop and have no login functions. Selecting host bus adapters and storage devices with fabric login capability, however, will provide additional flexibility in evolving the SAN and will extend the life of the investment.

Total Number of Loop Devices per Segment

As discussed previously, the 126 NL_Port and 1 FL_Port capacity of a single arbitrated loop was not derived from performance calculations but is the result of neutral disparity requirements. What is a reasonable maximum number of devices that can be configured on a single loop? It depends on the application. Theoretically, a SAN design could specify a lone file server and 125 disks on a single arbitrated loop. Some current vendor implementations, in fact, have a single server with more than 90 Fibre Channel drives on a single loop. These large loop configurations are not unreasonable when only one or two file servers are present and when they are not aggressively contending for bandwidth. The obvious benefit of such large loops is storage capacity. More than a terabyte of storage can be provided by 125 ten-gigabyte Fibre Channel drives.

Arbitrated loops, especially ones with multiple initiators, normally do not exceed 20–30 devices. Most are much smaller, in the 2–10 device range, depending on whether Fibre Channel RAIDs or JBODs are used for storage. Host bus adapters are capable of driving more than 97MBps on a 1Gbps loop, or nearly the full Fibre Channel bandwidth. Fibre Channel drives can provide 15MBps–18MBps throughput each. If a server uses software RAID to stripe data blocks across multiple drives, a JBOD of 8 drives can fulfill server requests at nearly the same throughput as the HBA. From a performance standpoint, adding more JBODs to this 9-node loop will not significantly affect total throughput. It will simply make additional storage available. Adding initiators, however, may degrade performance if the new servers are equally contending for access and if the application has high bandwidth requirements (for example, video).

Bandwidth Requirements

In the example just mentioned, a configuration with one server (equipped with an efficient HBA) and an 8-drive JBOD provides enough throughput to nearly saturate a 100MBps Fibre Channel pipe on a 1Gbps loop. For the pipe to be fully utilized, the application would have to drive a sustained access to disk. In reality, this occurs only with specialized applications such as multiple streams of full-motion video. At 100MBps, arbitrated loop can support several video streams, primarily because such streams utilize the maximum 2,112-byte frame payload and require less command overhead per transaction.

Most enterprise applications do not require sustained throughput. Radiology, geological resource mapping, prepress, and other applications that require very large file transfers, for example, tend to be bursty, with periods of extremely high utilization followed by periods of inactivity. It is difficult to determine the total bandwidth requirements of such applications because averaging the total data requirement over time does not address the bandwidth needs of random bursts of data. Other applications—such as online transaction processing (OLTP), Internet service provider (ISP) Web servers, and relational database queries—may have more predictable bandwidth requirements.

Distance Requirements

Arbitrated loop is a closed ring topology, with the total circumference determined by the distances between nodes. At gigabit speeds, signals propagate through copper media at 4 nanoseconds per meter, and through fiber-optic media at 5 nanoseconds per meter. You can easily calculate the total propagation delay incurred by the loop's circumference by multiplying the lengths (both transmit and receive) of copper and fiber-optic cabling deployed by the appropriate delay factor. A single 10km link to an NL_Port, for example, would cause a 50-microsecond propagation delay in each direction, or 100 microseconds total. This is the equivalent of 1MBps of bandwidth consumed to accommodate the link. In practice, the propagation delay penalty is negligible compared with the degradation incurred by long links at the protocol level. Because all transactions must traverse a larger circumference, performance may decline by as much as 40 percent.
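
The delay arithmetic is easy to sketch. The per-meter figures below are the ones cited above; the helper simply sums the transmit and receive paths of each cable segment.

    NS_PER_METER = {"copper": 4, "fiber": 5}   # propagation delay at gigabit speeds

    def loop_delay_microseconds(segments):
        """segments is a list of (medium, one_way_meters) pairs; both the transmit
        and the receive path are counted, hence the factor of 2."""
        total_ns = sum(2 * meters * NS_PER_METER[medium] for medium, meters in segments)
        return total_ns / 1000.0

    # A single 10km fiber-optic link to an NL_Port:
    print(loop_delay_microseconds([("fiber", 10000)]))   # 100.0 microseconds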

Managed or Unmanaged Environments

Advances in the design of arbitrated loop hubs have resulted in products with various levels of management capability, from simple enclosure services to enhanced analyzer-type functionality. Managed hubs usually provide a graphical interface, typically written in Java or platform-specific programming languages. When selecting loop hubs, you should consider, in addition to port density and port cost, the management features available in various products.

Not all loop environments require management. Applications that support less critical business operations, loops supporting only a few devices, homogeneous (single-vendor) configurations, and so on, may function quite well without management. Mission-critical applications, populous loops, and heterogeneous (multivendor) configurations, however, almost demand the higher level of visibility that management provides. Fibre Channel hardware components, like networking products in general, are very stable and have a high mean time between failure (MTBF). Even so, eventually a cable will be pulled inadvertently or a host bus adapter will misbehave. When something does break, products with good management features can help to reduce downtime and restore the loop to operation.

High-Availability Requirements

With its parallel cabling scheme, the traditional SCSI architecture does not lend itself to high-availability configurations. The networking characteristics of SANs make it much easier to design and implement high availability to storage. A common configuration for arbitrated loop involves dual-provisioning HBAs in each server and installing two arbitrated loop hubs for redundant paths, as shown in Figure 4-9. Fibre Channel disk drives typically provide an "A" and a "B" channel for dual loop attachment. This configuration provides redundant data paths as well as redundant loop hubs, transceivers, cables, and power supplies. With the appropriate software running on the host, a failure of one loop automatically routes data to the standby loop.

Figure 4-9. Redundant loop configuration

High-availability configurations using dual loops are given additional reliability if managed hubs are used for each loop. Knowing the status of both loops (for example, knowing that the standby loop itself has not failed) provides a much higher level of stable operation.

4.3 Fabrics

A Fibre Channel fabric is one or more fabric switches in a single, sometimes extended, configuration. Fabrics provide full bandwidth per port—approximately 200MBps full duplex for 1Gbps links, and 400MBps full duplex for 2Gbps links. Although the introduction of new devices to an arbitrated loop further divides the shared bandwidth, adding new devices on each port of a fabric increases the aggregate bandwidth. An eight-port fabric with four initiators and four targets, for example, can support four concurrent 200MBps conversations at 2Gbps, or a total of 800MBps throughput (1600MBps if full duplex applications are available). As shown in Figure 4-10, a fabric switch is optimized for direct connections between switch F_Ports and device N_Ports.

Figure 4-10. A fabric switch provides full bandwidth per port

The switching mechanism typically used in fabrics is called cut-through. Cut-through technology is also used in Ethernet switches to speed packet routing from port to port. When a frame enters a fabric port from an attached N_Port, cut-through logic examines only the link-level destination ID (D_ID) of the frame. In Fibre Channel, the destination ID is the 24-bit port address of the target N_Port. A routing decision is based on the D_ID, and the frame is switched by internal routing logic to the appropriate port. Cut-through increases performance by reducing the time required to make a routing decision. Because the D_ID resides in the first 4-byte word of the frame header, a routing decision can be made immediately as the frame enters the fabric port. Alternatively, the store-and-forward algorithm employed by some switch technologies requires that the entire frame be buffered before a routing decision occurs.

The number of frame buffers offered on each F_Port is an important consideration for switch selection. During periods of congestion, the fabric may not be able to simply transmit a frame on the destination port, especially if two N_Ports are sending data to the same destination. Depending on the class of service, the fabric may be forced to abandon frames it cannot process. The ability to queue multiple frames reduces this possibility and enhances performance for both transmitting and receiving N_Ports.

Fabrics, unlike arbitrated loop, are not restricted to an address space based on transmission word running disparity (AL_PAs). Instead, fabric-attached N_Ports are assigned a 24-bit port address, which theoretically allows for 2^24, or more than 16 million, possible addresses. The address allocation scheme introduced by the Fibre Channel switch fabric (FC-SW) standard reduces this number somewhat to around 15.5 million. Because fabric products available in the market provide 8 to 256 ports, this reduction from 16 to 15.5 million addresses has not yet caused an uproar among end users.

As shown in Figure 4-11, the FC-SW standard divides the 3-byte port address into three 1-byte fields. The most significant byte is the domain, the next significant is the area, and the least significant is the port. The 8-bit domain field has a number of reserved addresses, leaving 239 available for assignment to individual switches. This allows each switch in an extended fabric to have a unique identifier and also limits the largest possible fabric to 239 interconnected switches. In practice, most SANs do not connect more than a few switches in a single fabric. The 8-bit area field provides 256 area addresses. You can use these addresses to identify individual FL_Ports that support loops, or groups of F_Ports. An example of the latter is a multiport card installed in a switch chassis, which can be governed by a separate part of the routing engine. Finally, the 8-bit port field provides 256 addresses for identifying attached N_Ports or NL_Ports. The division of the 24-bit address space into domain, area, and port provides enhanced switching performance by enabling a routing decision to be made on the basis of a single byte. If an incoming frame, for example, has a domain address intended for a different switch in the fabric, the routing engine can forward the frame to the appropriate interconnect or E_Port without processing the entire 24-bit address.
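
The byte-oriented routing decision can be illustrated with a short sketch that splits a 24-bit port address into its domain, area, and port fields; the address value and domain used in the example are arbitrary.

    def split_port_address(address):
        """Split a 24-bit fabric port address into (domain, area, port) bytes."""
        domain = (address >> 16) & 0xFF    # most significant byte: switch identifier
        area = (address >> 8) & 0xFF       # middle byte: FL_Port or group of F_Ports
        port = address & 0xFF              # least significant byte: attached device
        return domain, area, port

    def frame_is_local(d_id, my_domain):
        """A routing engine can forward to an E_Port on the domain byte alone."""
        return split_port_address(d_id)[0] == my_domain

    print(split_port_address(0x021F00))    # (2, 31, 0)
    print(frame_is_local(0x021F00, 0x05))  # False: forward toward domain x'02'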

Figure 4-11. Fabric addressing

Fibre Channel switch products offer ports ranging from 8 ports for departmental fabrics to 128 or more ports for larger enterprise-level directors. Because fabrics are extended via switch-to-switch links on E_Ports, a potential bottleneck could occur during periods of high activity between switches. You could resolve this problem by providing multiple E_Ports between two interconnected switches and a load-balancing algorithm to distribute traffic. Such a solution, however, might result in out-of-order delivery of frames from source to destination because a sequence of frames could take different paths through the fabric. Consequently, some implementations allow only a single active link between two switches, whereas others provide additional logic to prevent out-of-order delivery. Trunked 2Gbps E_Port connections can be used in some products to build higher-speed interswitch connections, and 10Gbps Fibre Channel will provide even higher-performance interswitch links.

You can configure a complex fabric of multiple interconnected switches in a mesh topology, with alternative data paths from one N_Port to another through the fabric cloud, as depicted in Figure 4-12. Redundant paths through the fabric are desirable, because the failure of one path will allow traffic to be routed through alternative paths. As in the example just cited, concurrently active alternative paths could result in out-of-order delivery of frames. A meshed fabric can avoid this problem by employing spanning tree, an IEEE 802.1 standard commonly used in LAN bridges and routers. The spanning tree protocol is used by network devices to determine optimum routes through a network and to hold redundant links in reserve. Alternative paths are activated only when a primary path fails. Spanning tree thus avoids duplication of active paths through the network and, in the case of a meshed fabric topology, allows in-order delivery of frames.

Figure 4-12. A partially meshed fabric of four Fibre Channel switches

Because fabrics must support N_Ports, public arbitrated loop, and switch-to-switch interconnect, they are considerably more complex than loop hubs. This is reflected in the per-port cost of fabrics compared with that of hubs, although competition and the introduction of efficient ASIC (application-specific integrated circuit) technology are driving the per-port cost down. Considering that fabrics and loop hubs sometimes tie together hundreds of thousands of dollars' worth of servers and storage and that these systems support the business of entire enterprises, Fibre Channel interconnect products are the least expensive component of any storage network.

Fabrics provide a number of services that rationalize activity on the transport and facilitate adds, moves, and changes of attached devices. Fabric login services allow N_Ports and public loop NL_Ports to register with the fabric for communication. Simple Name Server (SNS) functions streamline the process of discovery between communicating ports. And State Change Notification (SCN) allows all fabric devices to keep current with topology changes caused by the introduction or removal of devices.

4.3.1 Fabric Login

When a fabric-capable Fibre Channel node is attached to a fabric switch, it performs fabric login. Like port login, FLOGI is an extended link service command that establishes a session between two participants. In the case of FLOGI, an association or login session is created between the N_Port or public loop NL_Port and the switch. An N_Port sends a FLOGI frame containing its node name, N_Port name (World-Wide Name), and service parameters to a well-known address of xFFFFFE. A public loop NL_Port first opens the destination AL_PA x00 before issuing its FLOGI request. In both cases, the login handler in the switch (at address xFFFFFE) accepts login by returning an accept (ACC) frame to the sender. If the fabric does not support certain service parameters requested by the N_Port or NL_Port, it sets bits in the ACC frame to indicate as much.

An N_Port uses a 24-bit port address of x000000 when it logs in to the fabric. This allows the fabric to assign the appropriate port address to the device, based on the domain/area/port address format. The newly assigned address is contained in the destination (D_ID) field of the ACC response.

A similar process is used for a public loop NL_Port, except that the least significant byte is used to assign an AL_PA, and the upper two bytes constitute a fabric loop identifier. Because the NL_Port has just emerged from a loop initialization (LIP) process before performing fabric login, it will already have a previously derived AL_PA. It is up to the fabric to determine whether it will accept this AL_PA. If it does not, a new AL_PA is assigned to the NL_Port, and that causes the fabric to launch another LIP process on its local loop segment. This ensures that the fabric-assigned AL_PA does not conflict with any previously selected AL_PAs on the loop. During the subsequent loop initialization sequence, the NL_Port will now have a fabric-assigned address that it can select during the LIFA phase.

Fabric login establishes the initial relationship between the switch and an attached device. Reliance on the fabric to automatically assign a network address and to negotiate capabilities between the switch port and the device is meant to streamline the installation and management of the fabric. As discussed later, however, administrators still need to manage the fabric to avoid configurations that may inadvertently result in duplicate addressing.

4.3.2 Simple Name Server

As you saw in the discussion of arbitrated loop, initiators on the loop begin the process of target discovery by performing port logins to all 126 AL_PA device addresses. At gigabit speeds, this process occurs fairly quickly and allows each server to find target disks whenever a topology change occurs.

This process is untenable in fabric environments, however, because the potential address space is more than 15.5 million addresses. Even at gigabit speeds, this would take some time. Fabrics rationalize the process of device discovery by providing a name server function in each switch. The SNS is a database of objects that allows each fabric device to register or query useful information, including the name, address, and class of service capability of other participants. To keep the database compact, some objects—such as the type of FC-4 protocols supported by a registrant—are coded into a positional bit map.

The name server is accessed by performing a port login (PLOGI) to a well-known address of xFFFFFC. An N_Port or public NL_Port registers with the name server after performing PLOGI to it and then terminates the session. The device may register values for some or all of the database objects, but the most useful are its 24-bit port address, 8-byte port name, 8-byte node name, class of service parameters, FC-4 protocols supported, and port type (N, NL, etc.). Devices that support IP can also register their IPv4 or IPv6 IP addresses. Because the name server contains a table of both 24-bit addresses and the corresponding 64-bit WWNs, one device can find another based on either category.

The fabric name server makes it possible for a file server, for example, to begin discovering disk targets by inquiring for a list of all port addresses registered with the switch. This would relieve the file server of the need to poll 15.5 million addresses. The file server could then perform PLOGIs with each reported address to discover, via upper-layer commands, which devices were SCSI-3 targets. Alternatively, because the name server also contains information on FC-4 protocol support, the file server could PLOGI only those port addresses that reported SCSI-3 support.
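
Discovery against the name server can be sketched as a query-and-filter loop rather than an address sweep. The entry fields and the plogi helper below are assumptions for illustration, not the actual SNS object format.

    def discover_fabric_targets(sns_entries, plogi):
        """Log in only to ports that registered SCSI-3 (FCP) support with the SNS.
        Each entry is assumed to carry 'port_id', 'port_name', and 'fc4_types'."""
        targets = []
        for entry in sns_entries:
            if "SCSI-3" not in entry["fc4_types"]:
                continue                        # skip non-storage registrants
            acc = plogi(entry["port_id"])       # establish the session
            if acc is not None:
                targets.append((entry["port_id"], entry["port_name"]))
        return targets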

The name server in each Fibre Channel switch enables initiators to discover targets. When two or more fabric switches are connected via E_Port, an initiator on one switch might also need to access storage resources connected to the other switches. To discover those fabric-wide resources, the switches themselves must exchange SNS information. This exchange is performed using the Class F protocol over the expansion port connections and assumes that each switch's SNS is large enough to accommodate additional entries. In very large fabrics, this requires additional memory for SNS data and may prolong the convergence time of the fabric as each switch updates its name server.

4.3.3 State Change Notification

Because fabrics are dynamic environments and may undergo topology changes when new devices are attached or previously active devices are taken offline, it is useful if notification of changes can be provided to other participants. This function is provided by State Change Notification (SCN) and a variant, Registered State Change Notification (RSCN). SCN and RSCN are voluntary functions in that an N_Port or NL_Port must register its interest in being notified of network changes if another device alters state.

The original SCN service allowed an N_Port to send a change notification directly to another N_Port. RSCN does not allow this because a direct N_Port-to-N_Port notification would not be seen by the remainder of the registrants. As a fabric service, RSCN allows all devices that have registered for notification to be informed if a topology change occurs.

Because storage targets are passive recipients of commands from initiators, there is no need for targets to be alerted when devices enter or leave the fabric. Servers, by contrast, want to be notified when resources are no longer available or when new resources have been added. Change notification thus provides for fabrics a function that loop initialization intrinsically provides for arbitrated loop.

4.3.4 Private Loop Support

Fabrics support public arbitrated loop devices via FL_Ports. The FL_Port has the highest-priority AL_PA, x00, always wins arbitration, is always temporary loop master during loop initialization, and provides the interface between the loop segment and the rest of the fabric. To communicate in a fabric environment, however, loop devices must be fabric-capable—that is, they must perform fabric login following loop initialization.

Fibre Channel standards do not support private, nonfabric loop devices on a fabric topology. However, this has not prevented fabric vendors from devising means to accommodate the large population of private loop devices in the market. Allowing private loop-only devices to participate in a switched topology has obvious benefits for storage networks that have been steadily deploying arbitrated loops over the past several years. Although upgrading a file server with a fabric-capable HBA is affordable, performing a forklift upgrade on a private loop disk array is not. Finding a means to incorporate these resources into a fabric is therefore a very attractive option.

Private loop support on a switch implies the need for additional intelligence to monitor the activity of the private loop devices and to ensure that the existence of the fabric is transparent to them. Because the private loop devices are not aware of the fabric's presence, private loop support on a switch is sometimes described as stealth, or phantom, mode. Various implementations are available.

A segmentation strategy allows private loop devices to be dispersed on specific fabric ports, as shown in Figure 4-13. Although vendors may restrict the total number of private loop devices on a single port, the allocation of devices per port is usually determined by bandwidth and traffic considerations. Servers, for example, could be assigned to shared ports, with large JBODs (representing multiple AL_PAs) assigned to other dedicated ports. What was previously a single, continuously shared transport would thus be segmented into a series of smaller arbitrated loop segments, some of which would have only one private loop device per port. Segmentation transforms what was previously a physical arbitrated loop into a virtual arbitrated loop. Because the private loop devices are not aware of the fabric, they arbitrate, open, transfer, and close transactions as before. A server on its own port, for example, arbitrates for access as usual, but because it is the only node on that loop segment, it has no contenders and immediately wins. When it issues frames to a destination AL_PA, the switch logic intervenes and routes the frame directly to the switch port where that AL_PA resides. By this means, segmentation provides the high-speed bandwidth of a fabric without requiring fabric support. Segmentation also allows multiple conversations to occur simultaneously on the same virtual loop.

Figure 4-13. Private loop support on a fabric switch

A translational strategy allows private loop devices to communicate with fabric-capable devices. You can implement this strategy by manipulating the upper two bytes, or loop identifier, of the private loop device's port address. On the private loop segment, the loop identifier is x0000, indicating that the device is nonfabric. At the FL_Port, the fabric can proxy a loop identifier on behalf of the private loop device and create an entry in the SNS, thus allowing fabric-capable devices to address it. This manipulation implies a passive role for the private loop device, because it would require considerable overhead to allow private loop servers to discover and communicate with fabric-capable storage. Translational private loop support is thus better suited to the access of private loop storage by fabric-capable servers.
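
A hedged sketch of the address manipulation just described: the 24-bit port address is treated as a 2-byte loop identifier plus a 1-byte AL_PA, and the loop identifier value proxied at the FL_Port is purely an example.

    # Sketch of the address manipulation described above; values are examples only.

    def full_port_address(loop_identifier, al_pa):
        """Compose a 24-bit port address from the 2-byte loop identifier
        (domain and area) and the 1-byte AL_PA."""
        return (loop_identifier << 8) | al_pa

    al_pa = 0xE8                    # AL_PA the drive obtained at loop initialization

    private_view = full_port_address(0x0000, al_pa)   # what the private device sees
    proxied_view = full_port_address(0x0101, al_pa)   # loop identifier proxied at the FL_Port

    print(f"on the loop segment : {private_view:06x}")   # 0000e8
    print(f"as registered in SNS: {proxied_view:06x}")   # 0101e8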

4.3.5 Fabric Zoning

Fabric zoning allows you to segregate devices based on function, departmental separation, or potential conflicts between operating systems. Windows NT's habit of claiming all available storage resources, for example, makes it highly desirable to isolate UNIX storage from NT storage. Without some means of separating resources, putting marketing department information on the same fabric with engineering might inadvertently give engineers a peek at the overly aggressive plans of marketing managers. Zoning resolves these potential problems by assigning devices or ports to separate zones or groups, and enforcing separation via selective switching.

Zoning can be implemented on a port-by-port basis, thereby providing greater security, or by 24-bit port address or 64-bit WWN. The latter variation usually requires a dedicated server to configure and maintain discrete assignment of subgroups. Port-based zoning is less involved, because you can use filtering logic in the routing engine to maintain the assignment of ports to one or more zones.

A classic application for zoning is tape backup for heterogeneous operating systems, as illustrated in Figure 4-14. Windows NT servers and storage can be assigned to, say, zone A, whereas Sun Solaris servers and storage are assigned to zone B. This segregation prevents NT from accessing the Sun storage arrays. Both NT and Solaris servers, however, could be assigned to a common zone C, which would include the tape subsystem. Zone C would allow both servers to back up or restore from tape.

Figure 4-14. Port zoning on a fabric switch
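
One way to picture zone enforcement is as a set-membership test: two devices may communicate only if at least one zone contains both. The sketch below uses hypothetical zone and device names that mirror the NT/Solaris/tape example above.

    # Hedged sketch of zone enforcement as set membership; names are hypothetical.

    zones = {
        "zone_A": {"nt_server", "nt_storage"},
        "zone_B": {"solaris_server", "solaris_storage"},
        "zone_C": {"nt_server", "solaris_server", "tape_library"},
    }

    def can_communicate(device1, device2):
        """Two devices may talk only if at least one zone contains both."""
        return any(device1 in members and device2 in members
                   for members in zones.values())

    print(can_communicate("nt_server", "solaris_storage"))    # False: no common zone
    print(can_communicate("nt_server", "tape_library"))       # True: both in zone_C
    print(can_communicate("solaris_server", "tape_library"))  # True: both in zone_C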

Port zoning is relatively more secure than WWN-based zoning because an intruder cannot spoof port zoning by manipulating Fibre Channel frame information. The zone, however, stays with the port and not the device. So if someone inadvertently moves a server from one port to another, the server is no longer part of its former zone. In WWN-based zoning, the zoning stays with the device. This facilitates movement of devices in the fabric but is vulnerable to being spoofed.

Zoning implementations in fabric switches may be proprietary, although switch interoperability has required common tagging methods so that a zone created on one vendor's switch can be recognized by another vendor's switch. Zone merging is now part of the switch-to-switch protocol and, like SNS exchange, requires additional handshaking between switches before the fabric can become stable and operational.

4.4 Building Extended Fabrics

As SANs proliferate in enterprise networks, fabrics composed of multiple switches have become more commonplace. Showcase applications such as full-motion video editing and high-performance relational database servers are driving large fabric configurations. Extended fabrics built for disaster recovery applications employ multiple switches separated by long fiber-optic runs, DWDM (dense wave division multiplexing), or IP tunneling. The scalability promised by Fibre Channel is being challenged at every step of the technology's adoption, forcing standards to prove themselves in workable products.

Building large multiswitch fabrics has been difficult for both technical and vendor-driven reasons. The theoretical use of 239 switches in a single fabric cannot be achieved with today's fabric switch products. Even in a single-vendor solution, achieving stable fabrics of more than a dozen switches is problematic. One reason is the amount of switch-to-switch communications that must occur before the fabric can become active. Even in a fairly small multiswitch fabric, the distribution of address blocks to each switch and the exchange of routing tables and SNS information may require more than 10 minutes before a fabric achieves stability. Latency across multiple switches is also an issue, and switch vendors have recommended hop count limitations of three to seven switches to avoid timing issues. These practical issues are not dependent on Fibre Channel standards but instead are driven by the current state of the art of the products. E_Port connections (also known as interswitch links, or ISLs) have yet to achieve the level of interoperability commonplace for Ethernet switch products.

In the category of vendor-driven issues, multivendor switch interoperability has been the most contentious. For one switch to communicate with another, a switch-to-switch protocol is required. In the early stages of fabric switch development, each vendor implemented its own switch-to-switch protocol, making it impossible for customers to deploy multivendor fabrics. This approach was contrary to the open systems expectations of customers, and marketing pressure eventually pushed fabric switch vendors to cooperate in resolving interoperability problems. The Fabric Shortest Path First protocol, for example, was finally developed as a common standard when one switch vendor reverse-engineered the switch-to-switch protocol of another and took it into the NCITS/ANSI T11 standardization process. Even now, however, subtle differences in switch-to-switch protocol implementation require vendors to implement multiple compatibility modes that must be configured manually before switches can be connected.

Another obstacle to large fabrics has been the influence of marketing on the engineering recommendations given to customers. The diagram in Figure 4-15, for example, is an actual vendor recommendation for building a large fabric composed of 16-port fabric switches. The vendor in question did not simply recommend a more rational configuration based on 128-port switches, because at the time it did not have large port count switches in its product portfolio. Consequently, the proposed configuration introduced innumerable points of failure, resulted in a significant management burden, required that user ports be sacrificed to achieve interswitch links, and violated the vendor's own recommendation for hop count limitation. In practice, this proposed configuration would never have worked. That fact, however, did not prevent vendor marketing from positioning this sort of configuration as a viable SAN solution.

Figure 4-15. A fabric configuration that reflects the undue influence of marketing on systems engineering

Although little can be done about overly aggressive and misleading vendor marketing, the technical issues that large fabrics must address are not insurmountable. Standardization of E_Port connectivity has seen encouraging progress over the past several years, and the continued introduction of large port count Fibre Channel directors offers streamlined solutions for large fabric deployments.

4.4.1 E_Port Standardization

E_Port connectivity involves several Fibre Channel standards that define various processes that are performed between linked fabric switches. The standard most relevant to E_Port connectivity is NCITS/ANSI FC-SW-2, which defines the requirements for the FSPF protocol. FSPF is a subset of the more commonly used Open Shortest Path First (OSPF) standard used in IP routing. Like OSPF, FSPF is a link state routing protocol. Link state routing protocols were developed as an enhancement to distance vector routing protocols such as RIP (Routing Information Protocol), which simply determine the optimum path based on the number of hops from source to destination. In a link state routing protocol, the best route through a network is determined by the optimum number of links (least cost) offered between switches. In the case of FSPF, the relative cost of traversing multiple switches is driven by the link speed provided by E_Port connections. For OSPF, additional variables such as current utilization, or load, may be factored into the cost equation.

Optimum routes through FSPF Fibre Channel fabrics are based on a single metric expressed in the following formula: FSPF Cost = S × (1,000 × 1.0625 [Gbps]) / Line Rate [Gbps]. Consequently, the cost of a standard 1Gbps E_Port connection between two switches is 1,000. As shown in Figure 4-16, the more E_Port links a connection must traverse, the higher the cost. Because the path from switch A to switch D has least cost when routed via switch B, the more costly path through C will not be selected unless the E_Port link between switch B and switch D fails. The cost calculation becomes more interesting (and less like simple RIP) when mixed 1Gbps and 2Gbps E_Port connections are deployed.

Figure 4-16. FSPF link cost calculation based on 1Gbps interswitch links
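
The sketch below applies the cost formula and then runs a textbook least-cost path computation over a small four-switch example. The topology and link rates are illustrative only, and real FSPF floods link state records between switches rather than consulting a ready-made graph; the scaling factor S is simply treated as 1.

    import heapq

    def fspf_link_cost(line_rate_gbps, s=1.0):
        """FSPF Cost = S x 1,000 x 1.0625 / line rate (Gbps); S is treated
        here as a plain scaling factor defaulting to 1."""
        return s * 1000 * 1.0625 / line_rate_gbps

    # Illustrative fabric: every interswitch link runs at 1Gbps (1.0625 Gbaud).
    links = {
        ("A", "B"): 1.0625, ("B", "D"): 1.0625,
        ("B", "C"): 1.0625, ("C", "D"): 1.0625,
    }

    # Build an undirected cost graph from the link rates.
    graph = {}
    for (u, v), rate in links.items():
        cost = fspf_link_cost(rate)
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost

    def least_cost(graph, src, dst):
        """Dijkstra over link costs: the same idea FSPF uses to pick routes."""
        queue, seen = [(0.0, src, [src])], set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == dst:
                return cost, path
            if node in seen:
                continue
            seen.add(node)
            for nbr, link in graph[node].items():
                if nbr not in seen:
                    heapq.heappush(queue, (cost + link, nbr, path + [nbr]))
        return float("inf"), []

    # Two 1Gbps hops via B cost 2000; the three-hop path through C costs 3000
    # and would be chosen only if the B-to-D link failed.
    print(least_cost(graph, "A", "D"))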

A proposal for a version 2 of FSPF (FSPFv2) introduces the variable of latency into the link cost equation. This initiative has been prompted by initial experience with native Fibre Channel SAN extension over longer distances, in which speed-of-light latency impacts performance. By factoring in latency, it is possible to prefer a route that offers lower bandwidth but less latency over a higher-speed link with greater latency.

In addition to routing information, it is desirable to exchange zoning information over E_Port connections. The ability to merge zoning tables between Fibre Channel switches allows administrators to allocate storage resources to the appropriate hosts even as SANs scale to larger multiswitch configurations. How zoning is implemented in individual switches may be proprietary, but establishing a standards-based way to share zone information between various switch products facilitates deployment of more complex SANs.

The exchange of FSPF routing and zoning parameters over E_Port connections helps to build the framework of the fabric. The fabric itself, however, is only a convenience for communication among servers, disks, and tape devices. Another essential ingredient for E_Port connectivity is therefore exchange of Simple Name Server data so that servers can discover and establish connections with storage resources throughout the fabric. The name server standard is covered by NCITS/ANSI FC-GS-2 and for multiswitch configurations relies on compliance with FC-SW-2 for successful exchange of SNS information.

4.4.2 Principal Switch Selection

To avoid duplication of addressing between multiple Fibre Channel switches, Fibre Channel E_Port connectivity requires a process for assigning blocks of unique addresses to individual switches. In the principal switch selection process, one fabric switch assumes the role of principal switch and automatically allocates blocks of 64K addresses to the other switches in the fabric. This gives each switch a unique Domain_ID and avoids address duplication in the fabric. In a single-switch SAN, an individual switch would serve as its own principal switch. The principal switch facility was developed to automate link layer addressing in Fibre Channel fabrics. In practice, most vendor implementations allow (or require) that an administrator manually determine which switch will serve as principal and which as secondary.

Principal switch selection is predicated on an orderly attachment to the fabric. If multiple E_Port connected switches are powered up or enabled simultaneously, they will attempt to contact their peers by issuing frames to a well-known address of x'FF FF FD', which represents the fabric controller of each switch. This frame flooding continues until a preset timeout value (F_S_TOV, or fabric stability timeout value) is reached. In the end, the switch with the highest principal switch priority is determined, and address blocks are assigned.
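
The selection and address-allocation steps can be pictured with the following hedged sketch. It assumes one common convention in which a numerically lower priority value denotes higher precedence, with the switch World-Wide Name breaking ties; the switch names, priorities, and WWNs are invented for illustration.

    # Hedged sketch of principal switch selection and Domain_ID assignment.
    # A lower numeric priority is treated as higher precedence, with the switch
    # WWN breaking ties; real fabrics exchange these values in fabric frames.

    switches = [
        {"name": "sw1", "priority": 2, "wwn": "10:00:00:60:69:00:00:01"},
        {"name": "sw2", "priority": 1, "wwn": "10:00:00:60:69:00:00:02"},
        {"name": "sw3", "priority": 1, "wwn": "10:00:00:60:69:00:00:03"},
    ]

    # Pick the principal switch: best priority first, then lowest WWN.
    principal = min(switches, key=lambda s: (s["priority"], s["wwn"]))
    print("principal switch:", principal["name"])          # sw2

    # The principal switch hands out unique Domain_IDs; each Domain_ID
    # anchors a block of 64K (area plus port) addresses for that switch.
    for domain_id, sw in enumerate(switches, start=1):
        sw["domain_id"] = domain_id
        print(f'{sw["name"]}: Domain_ID {domain_id:#04x}, '
              f'addresses {domain_id:02x}0000-{domain_id:02x}ffff')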

If an active fabric is already operational, a newly introduced switch will either gracefully enter the fabric (if the switch is freshly enabled) or trigger a new principal switch selection process (if the switch is already active and assumes that it is still the principal switch). The build fabric link service is nondisruptive and enables a newly introduced switch to gain a unique Domain_ID and participate in the enlarged fabric. The reconfigure fabric link service is disruptive in that Domain_IDs may change, forcing the devices attached to a switch to log on to the switch again and acquire new addresses. Any storage traffic on the fabric will be discarded, and if the reconvergence is lengthy, higher-level SCSI timeouts may occur.

In either of these cases, the fabric will eventually stabilize and additional switch-to-switch services, such as SNS exchange and zone merging, can occur. The potential for disruptive reconfiguration of the fabric thereafter may occur if another operational switch is inserted into the fabric or if the principal switch disappears due to a broken E_Port connection. The likelihood of the latter is exacerbated when E_Port connections are stretched over distance. A break in the wide area link or transient loss of connectivity in an IP tunneled solution may cause the previously joined fabric switches at each side to split into separate SAN islands.

Administrators who have attempted to create extended or large fabrics are intimately familiar with fabric reconvergence issues. The very mechanism that is meant to simplify Fibre Channel fabric deployments may result in disruptions if the fabric is not properly managed. New switches must be either disabled or powered off before they are inserted into an operational fabric, and switch-to-switch links over wide area connections must be monitored for stability.

4.5 Fabrics and Loops

Fabric switches can support dedicated N_Port connections for full bandwidth to a single node or can support shared NL_Port connections for arbitrated loop nodes. The most common loop devices on a fabric switch are JBODs, which appear on a switch port as a series of downstream NL_Ports on a common loop. It may also be reasonable, however, to group servers on a loop segment if the bandwidth required per server is well below the 1Gbps or 2Gbps bandwidth provided by the switch.

The combination of loop and dedicated fabric nodes on a switched infrastructure provides flexibility in SAN design and enables cost-effective use of switch ports. As shown in Figure 4-17, the customer can use an arbitrated loop hub to allocate shared bandwidth for servers that have lower performance requirements while devoting full bandwidth to high-performance servers. JBODs do not have the throughput of RAID, but they offer an economical alternative to high-performance RAID storage.

Figure 4-17. Fabric switches, loop hub, and loop JBODs on a single Fibre Channel SAN

The ability to integrate loop and fabric components in a single solution is also useful for customers who have already deployed loop-based SANs in various departments and now wish to tie them together into an enterprise-wide storage network. Although switch vendors are always willing to sell more switches in each department, leveraging the investment the customer has already made lowers the deployment costs and still accommodates assignment of bandwidth as required by the storage application.

4.6 Chapter Summary

Point-to-Point

• Point-to-Point Fibre Channel is a dedicated connection between two N_Ports, typically a server and a disk.

Arbitrated Loop

• Arbitrated loop is a ring topology supporting as many as 127 attachments on a shared, 100MBps bandwidth.

• Loop hubs collapse the ring topology into a physical star configuration.

• Arbitrated loop physical addresses (AL_PAs) are dynamically assigned, 1-byte identifiers.

• Loop initialization ensures that each device has a unique AL_PA.

• Following loop initialization, initiators (servers) discover targets (disks) via port login.

• The loop port state machine monitors activity on the loop and is responsible for initialization and loop access.

• Access to the loop is gained through arbitration. A fairness algorithm ensures that all participants are allowed utilization of the loop.

• Arbitrated loop is nonbroadcast; the recipient of frames removes them from the loop.

• Total loop population should be application-driven.

• Private loop devices do not support fabric services; public loop devices will attempt fabric login to obtain a loop identifier.

• Loops support 10km links, but any long haul will incur propagation delay for all loop transactions.

• Managed loop hubs should be used for mission-critical environments.

• Redundant loops provide duplicate data paths and hardware for failover.

Fabrics

• A fabric is one or more Fibre Channel switches; each fabric port provides full bandwidth to the attached device.

• Fabric addressing allows for more than 15 million unique identifiers.

• The 3-byte port address is divided into domain, area, and port.

• Fabrics can be configured in a meshed topology to provide redundant links.

• An N_Port performs fabric login to establish a session with the fabric switch.

• An N_Port may register its World-Wide Name and service parameters with the Simple Name Server to facilitate discovery of targets.

• Registered State Change Notification allows a device to be notified if a communicating partner leaves or reenters the fabric.

• Some fabric switches may provide support for private loop devices and allow public devices to access them.

• Zoning allows segregation of devices by port or World-Wide Name.

• E_Port connections enable scaling of a fabric to provide more ports.

• The Fabric Shortest Path First protocol determines the best network path based on available bandwidth between switches.

• The principal switch ensures that each switch has a unique Domain_ID.

• Principal switch selection may be disruptive or nondisruptive, depending on how new switches are introduced to the fabric.

Fabrics and Loops

• Fabrics and loop hubs can be combined to create an integrated SAN.

• Leveraging loop attachment to a fabric enables discrete allocation of bandwidth and cost-effective utilization of switch ports.

Chapter 5. Fibre Channel Products

Fibre Channel SANs are composed of various hardware and software products, which at the same time complement one another and compete for space in the storage solutions market. The manufacturers of Fibre Channel products go to market along various paths, including large original equipment manufacturer (OEM) contracts with solutions providers (Dell, HP, IBM, Sun, and so on), arrangements with value-added reseller (VAR) and reseller channels, and direct sales to corporate accounts. With the diversity of products required to create an operational storage network, OEMs and integrators have relationships with a spectrum of vendors and must certify interoperability before solutions are taken to market.

Closer cooperation between noncompeting vendors—such as manufacturers of HBAs, switches, storage products, and software—has promoted lower-level interoperability between the essential components of a SAN interconnection. Additionally, a few vendors have made substantial investments in their own verification labs to ensure that their products will work with others. Interoperability of Fibre Channel products has also been encouraged by the Fibre Channel Industry Association (FCIA) and the Storage Networking Industry Association (SNIA). The FCIA has developed the SANmark framework to provide validation of standards compliance at various levels of SAN functionality. The SANmark Conformance Documents (SCDs) cover Fibre Channel standards compliance from lower-level physical layers and protocols to upper-layer storage applications such as clustering and backup. Products that bear the SANmark label have at least demonstrated basic standards compliance, something that hopefully translates into interoperability.

The spectrum of hardware components extends from transceivers to enterprise-class storage arrays. Transceivers, HBAs, loop hubs, and fabrics interconnect Fibre Channel-enabled RAIDs, JBODs, Fibre Channel-to-SCSI bridges, and native Fibre Channel tape subsystems. Software components include device drivers that are supplied with HBAs, management software for hubs and switches, file and volume management applications, storage resource management, failover software, and SAN-ready tape backup applications. These hardware and software categories are discussed later in this chapter, but because the Fibre Channel market, like other markets, is subject to mergers and acquisitions and changes in vendor identity, I emphasize the useful features of each product type and not the responsible vendor. In selecting particular products, the SAN designer should conduct the usual inquisition to grill vendors on their actual product features and functionality. A list of vendors and corresponding product categories is given in Appendix B.

5.1 Transceivers

Gigabit interface converters, or GBICs, are still commonly used for Fibre Channel applications. GBICs are removable media and so provide flexibility for configuring either optical or copper cabling. The selection of GBICs for SAN interconnection is as important as the selection of the proper switch or hub, because without signal integrity at the link level, no higher functions can be performed. GBICs are manufactured to a de facto standard form factor, and that enables use of various vendors' transceivers in the same device chassis. As shown in Figure 5-1, an optical GBIC provides an SC connector for multimode or single-mode fiber-optic cabling on one end, and the electrical connector to plug in to the device port on the other.

Figure 5-1. Optical GBIC with standard SC connector

First-generation optical GBICs used edge-emitting CD lasers—that is, laser components normally used in consumer CD products. For high-speed data transport, only the highest-quality CD lasers can be used. Wafer manufacturing techniques for CD lasers require separation of the discrete components for testing to ensure quality, and that contributes to manufacturing overhead and cost. In addition, compared with other kinds of lasers, CD lasers consume more power, radiate more heat, and are more susceptible to loss of calibration over time. These inherent problems of CD technology have encouraged the development of alternative laser products—in particular, vertical cavity surface emitting lasers, or VCSELs.

VCSEL technology has been in development for some time, awaiting resolution of obstacles to mass production. In the past few years, these problems have been solved one by one, allowing manufacture and packaging of second-generation laser products and, for Fibre Channel applications, manufacture of second-generation GBICs. Unlike edge-emitting CD lasers, vertical cavity lasers emit light perpendicular to the surface of the wafer substrate, as shown in Figure 5-2. VCSELs also consume less power, radiate less heat, and maintain calibration better than their CD cousins. And because VCSELs can be tested at the wafer, as opposed to the discrete component level, manufacturing and quality testing can be performed more efficiently. GBIC manufacturers have converted from CD lasers to VCSELs, and that helps to ensure that these physical-layer components will maintain calibration and therefore data integrity at the system level.

Figure 5-2. Edge-emitting CD laser and vertical cavity laser components

Another advance in GBIC technology is the development of serial ID functionality. Originally an optional feature of the GBIC specification, serial ID has been engineered into second-generation products to provide a higher level of management at the physical layer. Serial ID allows a supporting enclosure—such as a hub, switch, or HBA—to query the GBIC for inventory or status information. Inventory information includes manufacturer, date of manufacture, serial number, speed, media and lengths supported, and other useful tracking information. Depending on the vendor, status information may include current power consumption, output, and diagnostic data. Serial ID relies on the supporting enclosure to solicit this information. This implies that the hub, switch, or HBA can query the GBIC and report its findings to a management application.
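
The kind of inventory record serial ID makes available can be pictured with the short sketch below; the field names and values are illustrative stand-ins rather than an actual GBIC register map.

    # Illustrative record of serial ID inventory data; this is not a real
    # register map, just the kinds of fields the text describes.
    from dataclasses import dataclass

    @dataclass
    class GbicSerialId:
        manufacturer: str
        date_of_manufacture: str
        serial_number: str
        speed_gbps: float
        media: str
        supported_length_m: int

    def report(gbic: GbicSerialId) -> str:
        """Format the inventory the way a hub or switch might report it
        to a management application."""
        return (f"{gbic.manufacturer} s/n {gbic.serial_number} "
                f"({gbic.media}, {gbic.speed_gbps}Gbps, "
                f"up to {gbic.supported_length_m}m)")

    gbic = GbicSerialId("ExampleOpto", "2002-11", "EX123456",
                        2.125, "shortwave multimode", 300)
    print(report(gbic))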

GBICs were initially developed for 1Gbps transmission. Current products typically accommodate both 1Gbps and 2Gbps signaling and depend on the clock and data recovery circuitry on the device port to provide the appropriate clock rate. In addition, some GBIC manufacturers support both Fibre Channel 1.0625Gbps and Gigabit Ethernet 1.25Gbps signaling, enabling customers to use the same type of GBIC for different applications.

Concurrent with the development of 2Gbps Fibre Channel, vendors have achieved higher port density in the same chassis footprint by substituting new small form factor transceivers in place of the traditional GBIC. Small form factor transceivers provide the same functionality as GBICs but in a more compact package. Small form factor transceivers may be fixed (permanently attached to the device port) or, as shown in Figure 5-3, pluggable. The pluggable variety is typically referred to as SFP (small form factor pluggable).

Figure 5-3. Small form factor pluggable transceivers

Although handling the small form factor transceivers may be a bit awkward for administrators with large fingers and opposable thumbs, these devices enable very high port density for Fibre Channel switches and directors. It is also possible to mix small form factor transceivers and GBICs at each end of the cabling plant so that storage devices using GBICs can be attached to SAN switches using the more compact SFP connectors.

5.2 Host Bus Adapters

Fibre Channel host bus adapters provide the interface between the internal bus architecture of the server or workstation and the external storage network. HBAs are available for various bus types and various physical connections to the transport. Most commonly employed are HBAs with peripheral component interconnect (PCI) bus interfaces and shortwave fiber-optic transceivers. HBAs are supplied with software drivers to support various operating systems and upper-layer protocols as well as support for private loop, public loop, and fabric topologies.

Although most HBAs have a single transceiver for connection to the SAN, some dual-ported and even quad-ported HBAs exist. As discussed in Chapter 4, these multiported devices appear as a single Fibre Channel node containing two or more N_Ports, each with a unique World-Wide Name (Port_Name) and 24-bit port address. Multiported HBAs save bus slots by aggregating N_Ports, but they also pose a potential single point of failure should the HBA hang. Most HBAs offer a single Fibre Channel port, requiring that you install additional HBAs if you desire multiple links to the same or different SAN segment.

As shown in Figure 5-4, the HBA embodies all four Fibre Channel layers, FC-0 through FC-4. At the FC-0 layer, the HBA has transmit and receive functions to physically connect to the link. For fiber optics, this connector may be a standard GBIC, a fixed transceiver, or a small form factor transceiver. For copper interface, the connector may be DB-9 with four active wires or the high-speed serial direct connect (HSSDC) form factor. Behind the link interface, clock and data recovery (CDR) circuitry, serializing/deserializing functions, and an elasticity buffer and retiming circuit enable the receipt and transmission of gigabit serial data.

Figure 5-4. Host bus adapter functional diagram

The FC-1 transmission protocol requirements are met with on-board 8b/10b encoding logic for outbound data and decoding logic for incoming and error monitoring functions. For loop-capable HBAs, the FC-1 functions must be followed by a loop port state machine (LPSM) circuit, typically included with other features in a single chip, such as Tachyon or Emulex Firefly. Above the LPSM, the HBA provides the signaling protocol for frame segmentation and reassembly, class of service, and credit algorithms as well as link services for fabric and port login required by FC-2. At the FC-4 upper-layer protocol mapping level, most HBAs provide SCSI-3 software drivers for NT, UNIX, Solaris, or Macintosh operating systems.

The number of functions consolidated into one or more chips is vendor-dependent, but current HBAs are ASIC-based and collapse most functions into an integrated architecture. This arrangement has helped to bring the cost down and provides more functionality on less real estate compared with less integrated designs.

The rapid evolution of server technology to multiprocessor architectures and wider buses has also affected the development of HBAs. The 32-bit PCI bus found in some NT, UNIX, and Macintosh platforms can sustain 132MBps throughput. Two 32-bit HBAs driving 100MBps each would outperform the bus. Newer 64-bit PCI bus implementations, however, can drive 264MBps, permitting full utilization of two 64-bit PCI HBAs at 1Gbps or a single 2Gbps HBA at 200MBps. The 64-bit PCI HBAs are usually backward-compatible with 32-bit buses, and that provides flexibility in sourcing and maintaining these cards in mixed platform environments.
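
The bus arithmetic above is easy to verify. The sketch below assumes the common 33MHz PCI clock and treats 1MBps as one million bytes per second.

    # Peak PCI throughput = bus width in bytes x clock rate; 33MHz assumed.
    def pci_peak_mbps(bus_width_bits, clock_mhz=33):
        return (bus_width_bits // 8) * clock_mhz

    print(pci_peak_mbps(32))   # 132 MBps: two 100MBps (1Gbps) HBAs would exceed this
    print(pci_peak_mbps(64))   # 264 MBps: room for two 1Gbps HBAs or one 2Gbps HBA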

The design of HBAs typically includes Flash ROM for microcode. This is an important feature, because compatibility issues and the need for microcode fixes are facts of life for all network products. Having a means to upgrade microcode via a software utility is very useful and extends the life of the product. Device drivers for specific operating systems are also upgradable. In most cases, installing new microcode or device drivers requires taking an HBA off line, and that is additional incentive for redundant configurations in high-availability networks.

Although HBA vendors may sell products directly to customers, significant volumes are sold through VARs, systems integrators, and OEMs. These resellers, in turn, may integrate a model of HBA into a certified SAN configuration along with servers, switches, and storage targets. Although the HBA vendor may have supplied device drivers with the HBA to ensure interoperability in open systems environments, the reseller may modify those device drivers to suit specific application requirements. The result is that even though an HBA functions perfectly in an OEM configuration, interoperability issues surface when third-party products or other HBAs are introduced into the SAN. OEM microcode meddling may also extend to Fibre Channel switches, further exacerbating open systems interoperability issues. For the SAN architect, this state of affairs requires additional due diligence in product and vendor selection to avoid frustrating and expensive troubleshooting to resolve vendor-induced problems.

The SCSI-3 device driver supplied by the HBA vendor is responsible for mapping Fibre Channel storage resources to the SCSI bus/target/LUN triad required by the operating system (OS). These SCSI address assignments may be configured by OS utilities or by a graphical interface supplied by the manufacturer (or both). Because Fibre Channel addresses are self-configuring, the mapping between port addresses or AL_PAs (which may change) and the upper-layer SCSI device designations (which generally do not change) is maintained by the HBA and its device driver interface to the OS.
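
A hedged sketch of this persistent binding idea appears below: the WWN-to-bus/target/LUN table stays fixed while the discovered Fibre Channel addresses may change. All WWNs, addresses, and the table layout are hypothetical.

    # Hedged sketch of driver-maintained persistent binding: the self-configuring
    # Fibre Channel address may change, but the WWN-to-SCSI mapping does not.

    persistent_binding = {
        # Port_Name (WWN)            -> (bus, target, lun) seen by the OS
        "21:00:00:20:37:15:00:01": (0, 0, 0),
        "21:00:00:20:37:15:00:02": (0, 1, 0),
    }

    # Current addresses discovered at login; these can change after a
    # reconfiguration without disturbing the OS-visible triad.
    current_addresses = {
        "21:00:00:20:37:15:00:01": 0x0104E8,
        "21:00:00:20:37:15:00:02": 0x0104E4,
    }

    for wwn, triad in persistent_binding.items():
        addr = current_addresses[wwn]
        print(f"WWN {wwn} -> SCSI {triad}, FC address {addr:#08x}")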

Device drivers for IP over Fibre Channel must perform a similar function via the Address Resolution Protocol, or ARP. When upper-layer applications send data addressed to an IP destination, the HBA's device driver must resolve IP addresses into Fibre Channel addresses. Most configurations assume that all IP-attached devices reside on the same IP subnet and that no IP router engine exists in the fabric to which it is connected. If a SAN design requires concurrent use of IP and SCSI-3, some vendors require a separate card for each protocol. Others provide integrated SCSI-3 and IP support and thereby allow you to use a single HBA.

The trend in HBA development, as with other Fibre Channel products, is toward more sophisticated functionality at a reduced cost. Some host bus adapters offer add-on features such as HBA-based RAID, which offloads the task of striping of data across multiple drives from the server's CPU. Support for the Virtual Interface (VI) protocol is also being developed. VI drivers will allow applications ready access to the Fibre Channel transport without passing through traditional, CPU-intensive protocol stacks. And advanced diagnostic features—including SCSI Enclosure Services (SES) emulation and standardized management application programming interfaces (APIs)—allow HBAs to participate in umbrella SAN management platforms. You should consider such enhanced features when querying HBA vendors about their product roadmaps and cooperative development efforts with other Fibre Channel suppliers.

The Storage Networking Industry Association has promoted interoperable HBA management through development of the Common HBA API. This API is standardized through the NCITS/ANSI T11 committee and defines status reporting. This enables management frameworks to solicit common information in a multivendor HBA environment, including traffic statistics and configuration data useful for monitoring the operation of the SAN. Additional information on the Common HBA API is available on the SNIA Web site at www.snia.org.

5.3 Fibre Channel RAID

Redundant arrays of independent disks, or RAIDs, are a well-established technology in traditional storage applications. RAID standards define methods for storing data to multiple disks and imply intelligence in the form of a RAID controller. The controller can be implemented in software—for example, as an application running on a file server—but typically is a dedicated card installed in a RAID storage enclosure. RAID storage configurations may include eight to ten disks for a departmental-level array, or many more for enterprise applications.

When a server is connected to a single disk drive, reads or writes of multiple data blocks are limited by the buffering capability and rotation speed of the disk. While the disk is busy processing one or more blocks, the host must wait for acknowledgment before sending or receiving more data. You can increase throughput by dispersing data blocks across several disks in a RAID, a technique called striping. In a write operation, for example, the host can send fewer data blocks to multiple targets consecutively and thereby avoid swamping the capacity of any individual drive. This simplified RAID is called level 0. Although it boosts performance, it does not provide data security. If a single disk fails, data cannot be reconstructed from the survivors.
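
The block-dispersal idea behind RAID 0 striping fits in a few lines; the disk count and block numbering below are arbitrary.

    # RAID 0 sketch: consecutive blocks are dispersed round-robin across disks.
    NUM_DISKS = 4

    def disk_for_block(block_number, num_disks=NUM_DISKS):
        """Round-robin placement of logical blocks across the stripe set."""
        return block_number % num_disks

    for block in range(8):
        print(f"logical block {block} -> disk {disk_for_block(block)}")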

Other RAID levels introduce data integrity by either writing parity data to a dedicated drive or writing parity information on each drive in an array. RAID level 3 writes byte-level parity to a dedicated drive. RAID level 4 writes block-level parity to a parity drive. Because a dedicated drive contains the information required to reconstruct data, levels 3 and 4 pose a data integrity problem if one of the data drives and the parity drive itself fail. RAID level 5 addresses this problem by striping block-level parity information across each drive. If an individual drive fails, it can be reconstructed from the parity information contained on the other drives. Another technique, level 1, achieves full data security by sacrificing the performance gain of striping in favor of simple disk mirroring. In disk mirroring, every write operation to a primary disk is duplicated on a secondary, or mirrored, disk. If the primary disk fails, the server can switch over to the backup disk. Some RAID techniques may be combined, as in level 0+1, to provide both striping throughput and data redundancy via mirroring, as shown in Figure 5-5.

Figure 5-5. RAID 0+1 striping plus mirroring of data to two arrays
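
The parity mechanism behind RAID levels 3 through 5 can be demonstrated in miniature with XOR, as in the sketch below; the byte strings stand in for data blocks in a stripe.

    # RAID 5 idea in miniature: parity is the XOR of the data blocks in a stripe,
    # so any single lost block can be rebuilt from the survivors.
    from functools import reduce

    def xor_blocks(*blocks):
        return bytes(reduce(lambda a, b: a ^ b, byte_tuple) for byte_tuple in zip(*blocks))

    d0, d1, d2 = b"DATA-000", b"DATA-111", b"DATA-222"
    parity = xor_blocks(d0, d1, d2)

    # Simulate losing d1 and rebuilding it from the remaining blocks plus parity.
    rebuilt = xor_blocks(d0, d2, parity)
    assert rebuilt == d1
    print("reconstructed block:", rebuilt)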

The various techniques that RAID provides for redundancy and speed can be implemented by a dedicated RAID controller housed in the same enclosure as the disks, or by a RAID controller provisioned in the host system or file server. Data is passed from the operating system to the RAID controller, which then manages the striping or mirroring task. If the operating system itself manages data striping or mirroring, the host CPU must devote cycles to the task. The latter technique is called software RAID. In the next section, you will see how software RAID is implemented on JBODs.

Fibre Channel-attached RAID enclosures use an integrated RAID controller that sits behind the Fibre Channel interface. The RAID controller receives SCSI-3 requests via the loop or fabric and stores or retrieves data from the array using the appropriate RAID levels. From the standpoint of the Fibre Channel interconnect, the physical interface between the RAID controller and its disks is irrelevant. The RAID controller appears as an N_Port or NL_Port to the outside world but can use a proprietary bus, a SCSI bus, or other architecture to talk to its drives, as shown in Figure 5-6. For performance and reliability, however, RAID manufacturers can also incorporate Fibre Channel technology behind the RAID controller. This allows the use of Fibre Channel drives and the redundant topologies provided by, for example, dual-loop configurations. The use of arbitrated loop between the RAID controller and its Fibre Channel disks is invisible to the SAN, because the RAID controller still appears as a single N_Port or NL_Port to the rest of the topology.

Figure 5-6. RAID with internal SCSI bus and internal arbitrated loop RAID

Because a Fibre Channel RAID controller must support the functionality of an HBA and independent intelligence to manage striping and mirroring functions, it brings additional cost to a storage solution. Fibre Channel RAID products justify the additional expense by including high-availability and storage management features required for enterprise networking. RAID enclosures offer redundant, hot-swappable power supplies, redundant fans, and self-diagnostics. Most of these products accommodate on-line spare drives for automatic reconstruction of data if a disk failure occurs. Large systems can be scaled to storage ranging from hundreds of gigabytes to tens of terabytes, and that implies an enterprise-level budget.

For mission-critical SANs, Fibre Channel RAIDs offer a number of advantages. The advanced diagnostics support, management features, reliability, and scalability typical of such high-end systems fill in most of the checkboxes for enterprise storage selection criteria. Server processing resources benefit by offloading data redundancy tasks to the RAID controller. Additionally, when the server is freed from the overhead of software RAID, it is also liberated from exclusive ownership of the storage device. No single server is responsible for managing the striping of data to multiple disks, so any server may own part or all of the RAID controller's data. Fibre Channel RAID thus enables server clustering and other applications that are predicated on co-ownership of storage resources.

From a SAN design standpoint, Fibre Channel RAIDs simplify configurations because the RAID controller appears as a single port address to the loop or fabric. Adding drives to a RAID enclosure does not add to the population of the SAN transport, even if the RAID enclosure internally uses arbitrated loop to access its disks. For arbitrated loops, you can attach more than a terabyte of storage without consuming all available AL_PAs. For direct fabric attachment, the RAID controller and all its storage appear as a single N_Port on its own 100MBps segment, whereas a single JBOD enclosure will appear as multiple NL_Ports sharing a common bandwidth. However, JBODs do have one distinct advantage over RAIDs: cost.

5.4 Fibre Channel JBODs

A JBOD, or Just a Bunch of Disks, is an enclosure with multiple Fibre Channel disk drives inserted into a common backplane. The backplane provides the transmit-to-receive connections that bring the assembled drives into a common arbitrated loop segment; these connections include bypass electronics that let you insert or remove drives without disrupting the loop circuit. Because Fibre Channel drives provide primary and secondary NL_Ports for dual-loop attachment, the JBOD enclosure typically includes external interfaces for connecting two loops, as shown in Figure 5-7.

Figure 5-7. JBOD disk configuration with primary and secondary loop access

A JBOD brings two things—the receive lead going to the first drive of a set, and the transmit lead coming from the last drive of a set—to a Fibre Channel interface, typically DB-9 copper. When the JBOD's interface is connected to a Fibre Channel hub or switch, the connection is not to a single Fibre Channel device but to multiple independent loop devices within the enclosure. An eight-drive JBOD, for example, appears as eight AL_PAs to the interconnection. If the connection is made to a switch, the switch port must be an FL_Port because the downstream enclosure is actually a loop segment. If the connection is to an arbitrated loop hub, the population of the entire loop is increased by the number of drives in the JBOD. JBODs, unlike Fibre Channel RAIDs, have a direct impact on the topology to which they are attached.

Some JBOD-based disk arrays incorporate additional logic for enclosure management, usually a Fibre Channel controller that supports SCSI Enclosure Services queries. This adds an extra AL_PA to the configuration, something that can be addressed by an SES management workstation to solicit power, temperature, and fan status. JBODs may also include options for configuring the backplane to support dual loops to a single set of drives or, as shown in Figure 5-8, dividing the drives into separate smaller sets with a single loop attachment each.

Figure 5-8. Dividing the JBOD backplane into separate loops

How the individual disks within a JBOD are used for data is determined by a host server. You use volume administration tools for the appropriate OS to assign drives as individual logical disks or to assign a group of drives or the entire JBOD as a logical disk. In the latter case, you can use software RAID to increase performance and provide redundancy. Use of software RAID, however, implies that the server will be the exclusive owner of the JBOD, because that server alone is responsible for managing striping of data across the drives. Even if another server sits on the SAN, it will have to request the JBOD's data via the external LAN. In some systems this problem has been addressed by custom application software, but it requires an additional server on the SAN to coordinate file information (metadata) management. Generally, software RAID on JBODs offers redundancy for dedicated server-to-storage relationships, but it does not lend itself to server clustering or serverless tape backup across the SAN.

Software RAID improves performance by avoiding the latency of reading or writing to a single drive, but this gain must be balanced with the increased traffic load on the loop. In a normal disk write operation to a single drive, for example, the server must first set up the write transaction using a SCSI-3 write command to the target. On an arbitrated loop, this may require several loop accesses (tenancies of the loop) before the disk is ready to accept data.

The transfer of frames to the disk, in turn, is not normally accomplished with a single possession of the loop unless the file is fairly small. If the server is sending frames faster than the target can process them, the target will accept what it can and then close the loop. The loop must then be regained for each additional transfer of frames. A single write operation may therefore require multiple tenancies, or loop occupations, before all frames have been sent and a write response is sent by the target to the initiator.

This problem is aggravated when you use software RAID with a JBOD, because the initiator must conduct a series of small transactions to multiple targets. At the SCSI-3 level, commands queue to the targets and await response from each one before more frames can be sent. Even if the initiator can leverage transfer mode to address several targets during a single tenancy, it is still limited by disk performance and the timeout values of arbitrated loop. This incurs more protocol overhead and traffic on the loop, and that may be a concern if the loop has other active initiators. A loop with four servers, each with its own loop-attached JBOD, would suffer much higher protocol overhead than a comparable configuration with loop-attached RAIDs.

JBOD enclosures are typically marketed with eight to ten drive bays, some of which may be configured for failover. Some very large disk arrays are packaged in 19-inch rack form factors, with more than 20 disks per JBOD module and as many as four modules per rack enclosure. These high-end systems may include rack-mounted arbitrated loop hubs or switches for connecting JBOD modules to one another and to servers for single- or dual-loop configurations. Given the amount of customer data that is stored on such arrays, it is preferable to use managed hubs or fabric switches rather than unmanaged interconnects. Dual power supplies, redundant fans, and SES management options allow JBODs to be used in high-availability environments. Vendors may also provide an upgrade path to add a RAID controller card to the enclosure, something that extends the life of the original investment. You can begin with a partially populated JBOD enclosure and add disks as storage needs dictate. A RAID controller option can then be added to increase performance and offload software RAID tasks from the host.

First-generation JBODs provided a backplane with hardwired arbitrated loop connections to connect the transmit lead of one drive to the receive lead of another. If an individual loop drive failed, an entire bank of loop drives would go down as well. Current backplane design incorporates "switch-on-a-chip" architecture developed by Vixel Corporation that automatically bypasses failed drives and keeps the remaining drives operational. This design also provides visibility into the loop transport, resulting in proactive notification of impending problems that might affect loop stability.

5.5 Arbitrated Loop Hubs

Primarily because of the successful marketing tactics of fabric switch vendors, loop hubs are less commonly deployed for SANs than are higher-priced switches. Loop hubs, however, are quite adequate for many moderate-performance shared storage applications and can be combined with fabric switches to provide a more economical solution.

Hubs are available from a number of vendors in various port configurations, interface types, and levels of management. Unlike other Fibre Channel devices, a hub is a passive participant in the SAN topology. Except for special management functions that may be provided by a vendor, a hub normally has no port address. So although all loop traffic passes through the hub, no NL_Port speaks directly to the hub itself. Finding a means to maintain this unintrusive role and yet engineer advanced management features into hub technology is a challenge that hub vendors have assumed.

5.5.1 Star Topology for Arbitrated Loop

As discussed in Chapter 4, you can wire arbitrated loops into a ring by connecting the transmit lead of one device to the receive lead of another and extending this scheme until all devices are physically configured into a loop. This technique eliminates the cost of a concentrating hub, but it exposes the topology to considerable risk. If a connection fails anywhere along the ring, the entire loop goes down. Troubleshooting is complicated by the fact that there is no central point where all cabling is brought together, and, because the loop is down, no in-band diagnostics (if available) can be used. In addition, introducing an NL_Port to the configuration (or removing it) requires downing the loop for the duration of the change. Bringing down a network for an add, move, or change usually is not tolerated outside test lab environments.

As illustrated in Figure 5-9, loop hubs simplify the cable plant by concentrating physical links at a central location, and they minimize disruption by providing bypass circuitry at each port. Each link in the physical loop is brought into the hub, and that creates a star cabling configuration while maintaining the circular data path through all devices. Port bypass circuitry automatically routes around any hub port that loses Fibre Channel signaling, and that prevents a broken cable or disabled NL_Port from disrupting loop operations.

Figure 5-9. Wiring concentration with an arbitrated loop hub

Hubs are usually co-located with storage arrays or servers in 19-inch equipment racks, and that offers a further convenience for verifying status and cabling within the enclosure. Large storage arrays may have multiple JBODs or RAIDs and multiple loop hubs configured into separate or redundant loops within a single 19-inch enclosure.

5.5.2 Hub Architecture

Hub design varies from vendor to vendor, but all hubs incorporate basic features specific to arbitrated loop. A hub embodies the loop topology by completing circuits through each port and then joining the transmit of the last port to the receive of the first. When cabling from an NL_Port is plugged in to a hub port, the port must, at minimum, be able to recover valid Fibre Channel clock. If the signaling is too fast or too slow, the port will remain in bypass mode. Depending on the sophistication of the loop hub, an insertion decision will be made on the basis of the received Fibre Channel signal or a combination of valid signal and valid ordered sets via protocol decode. For most unmanaged loop hubs, if the attached NL_Port is providing valid signal, the device is allowed to insert into the loop.

As shown in Figure 5-10, a port in bypass mode shunts the bit stream it has received from its upstream neighbor directly to its immediate downstream neighbor. If a device is removed from port 4, for example, the bypass circuitry will route traffic from port 3 to port 5. Port 4's transmitter, however, may still be active. Some vendor implementations turn off the transmitter as long as the port is in bypass mode. The transmitter is then enabled only when the hub port receives valid signal from a newly attached device. Other designs leave the transmitter on so that an attached device will immediately see valid Fibre Channel signal from the hub and thus insert without delay.

Figure 5-10. Internal hub architecture with port bypass circuitry
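
Auto-bypass can be pictured as a routing decision over the ordered set of hub ports, as in the hedged sketch below; the port numbering and signal flags are invented for illustration.

    # Hedged sketch of hub port bypass: traffic flows only through ports whose
    # attached device presents valid signal; all other ports are shunted.

    def loop_order(ports):
        """Return the sequence of inserted ports, upstream to downstream;
        the hub joins the last port's transmit back to the first port's receive."""
        return [number for number, has_valid_signal in ports if has_valid_signal]

    # Port 4's device has been unplugged, so that port drops into bypass mode.
    ports = [(1, True), (2, True), (3, True), (4, False), (5, True), (6, True)]

    path = loop_order(ports)
    print(" -> ".join(str(p) for p in path) + f" -> {path[0]}")
    # 1 -> 2 -> 3 -> 5 -> 6 -> 1  (port 3's traffic is routed directly to port 5)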

Auto-bypass circuitry allows the loop topology to self-configure when devices are added or removed. When servers or storage arrays are taken off line or reenabled, the circumference of the loop circuit automatically contracts or expands through the associated ports. At the link protocol level, loop initialization handles the automatic reconfiguration of port addresses to accommodate newly attached devices. The combination of hub auto-bypass and insertion with arbitrated loop's self-configuring addressing algorithm offloads tedious administration tasks from day-to-day storage network operations and facilitates changes to the topology with minimal disruption to data traffic.

Loop hubs may offer one or more light-emitting diodes (LEDs) per port to display port status. Typically, two LEDs are provided: a green LED to indicate a link connection, and an amber LED to indicate the current bypass mode. Depending on the vendor's specification, the combination of green and amber LED states can be used to display a number of port conditions, listed in Table 5-1. Some implementations use a single, multicolored LED per port, thereby reducing the number of diagnostic display states available. Port LEDs give the operator an at-a-glance status of a device's connection state, and by simplifying troubleshooting they help reduce down time.

Table 5-1. Hub Port LED State Table

Green      Amber      Port State
Off        Off        No device attached
On         Off        Device attached and loop inserted
On         On         Device attached and bypassed
Off        On         Bad GBIC, port bypassed
Blinking   Blinking   Maintenance mode via hub management

Hub architecture is not dictated by specific Fibre Channel standards except for the implied parameters established by arbitrated loop documents (FC-AL, FC-AL-2, and so on) and physical layer FC-0 standards. As long as vendors do not violate arbitrated loop and Fibre Channel physical layer provisions, they are free to engineer variations in port density, port type, signal processing, and management features to enhance functionality. Consequently, various hub products are available that are engineered to different marketing requirements, from simple entry-level unmanaged hubs to hubs with advanced diagnostics on-board.

Port density on a single unit may vary from as few as 5 ports to as many as 32. The most common density is from 6 to 12 ports per unit. This range is adequate for most loop implementations, and by cascading hubs together you can build higher loop populations. Loop hubs are typically 1-U high units, and that allows, for example, as many as 48 ports to be supplied in 4 slots of a 19-inch rack. With some models you can configure higher densities, but usually only by providing ports on both back and front of the hub chassis, thus complicating the cabling scheme.

Port types also vary from vendor to vendor. Fixed copper ports can be found on low-cost, entry-level loop hubs, although others in the same class offer GBIC-based ports. Fixed copper ports are not a highly desirable feature, because they are prone to high EMI emissions and jitter. Cross-talk, in which signaling from one port affects its neighbor, is also a concern with copper interfaces. Additionally, the lower price of fixed copper ports is offset by the extra cost of media interface adapters (MIAs) if you wish to use fiber-optic cabling from the hub to a device. Most mid-range hubs employ GBIC-based ports because of the flexibility GBICs offer for mixing copper, shortwave, and longwave media. This design allows you to provision ports as needed and to change from one media type to another without swapping out the hub itself. GBIC-based hubs may be marketed "without media"—that is, without the additional cost of the GBICs themselves—or may be bundled with a specific configuration of optical or copper GBICs (or both) preinstalled. Because most fabric switches also use GBICs, this hub architecture simplifies parts sparing for storage networks based on both fabrics and loops.

Table 5-1. Hub Port LED State Table

Green      Amber      Port State
Off        Off        No device attached
On         Off        Device attached and loop inserted
On         On         Device attached and bypassed
Off        On         Bad GBIC, port bypassed
Blinking   Blinking   Maintenance mode via hub management
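The LED combinations in Table 5-1 amount to a simple lookup. The short Python sketch below decodes a (green, amber) pair into the corresponding port state; the state strings mirror the table, and the function name is arbitrary.

# Map (green, amber) LED states to port conditions, per Table 5-1.
LED_STATES = {
    ("off", "off"): "No device attached",
    ("on", "off"): "Device attached and loop inserted",
    ("on", "on"): "Device attached and bypassed",
    ("off", "on"): "Bad GBIC, port bypassed",
    ("blinking", "blinking"): "Maintenance mode via hub management",
}

def port_state(green, amber):
    return LED_STATES.get((green.lower(), amber.lower()), "Unknown LED combination")

print(port_state("On", "Off"))   # Device attached and loop inserted
print(port_state("Off", "On"))   # Bad GBIC, port bypassed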

5.5.3 Unmanaged Hubs

Unmanaged hubs can be used for small, single-vendor environments that are less susceptible to the dynamics of larger, more complex SAN installations. If only a single server and one or two disk arrays are involved, there is less exposure to prolonged down time if a cable or other component fails. And if a single solutions provider has supplied the server, HBA, disk arrays, hub, GBICs, and cabling, there is less demand for the kind of higher-level SAN diagnostics required by a multivendor configuration.

Unmanaged hubs are a simple, low-cost, entry-level interconnection solution. They typically provide port bypass circuitry based on valid signaling alone, and port LEDs to display insertion or bypass status. If an attached device is unplugged or powered off, an unmanaged hub will auto-bypass the port and light the appropriate port LEDs. Having no intelligence or ordered set recognition circuitry, an unmanaged hub cannot respond to protocol violations or conditions, such as a port streaming LIP(F8), that would bring the loop down. The selection of an unmanaged hub therefore implies an assumed risk. The trade-off for lower cost and simplified installation is the longer mean time to repair that might follow a loop failure.

Some loop-based SAN configurations are deployed with redundant loops to provide an alternative path if the primary loop fails. The lack of visibility to the hub and loop status via hub management is balanced against the security offered by redundant paths. Such implementations are, by and large, lower risk, because the hubs and other interconnection components rarely fail. The worst case scenario for redundant, unmanaged loops would be the quiet failure of the backup loop, followed eventually by the failure of the primary loop. Without hub management to report the backup loop failure, the redundancy would no longer be in force, and the failure of the primary loop would bring the system down. Troubleshooting would then be complicated by the fact that what appeared to be a single occurrence—system failure—was actually the result of two separate events on two separate loop topologies.

Unmanaged hubs are a logical choice for low-cost storage network solutions in which economical servers and small JBODs are used to meet budget restraints. These entry-level systems bring SAN capability within reach of small business and departmental applications, and they create the initial infrastructure that can evolve to support more sophisticated requirements.

5.5.4 Managed Hubs

Managed hubs introduce intelligence into the SAN interconnection. The degree of intelligence and management capability varies from product to product and is normally reflected in price. At the low end of the managed hub offering, basic hub status and port controls are available via Web browser, Telnet, or SNMP (Simple Network Management Protocol) management software. At the high end, advanced diagnostics and proactive management features are available on a variety of management platforms, including HP OpenView and Java-based applications.

Support for management functionality in arbitrated loop hubs has two basic components. First, at the hardware level, additional on-board circuitry is needed to monitor power supply, fan, temperature, and port status. These functions are the minimum requirement for hub management, although some implementations concentrate on port status alone. Managed hubs with advanced capabilities also provide more extensive diagnostic circuits to monitor the state of the loop, protocol activity, and more comprehensive port and hub status. Second, for these circuits to be useful, the hub must be able to report to and accept commands from an external management workstation. This is typically accomplished via an Ethernet port on the hub, over which SNMP queries and commands are sent from an NT or UNIX console. The application software used to manage a hub is provided by the hub vendor, either as a stand-alone program or as a utility that can be launched from more comprehensive SAN management applications.

Managed hub products that support SNMP or Web browsers can be managed from anywhere in an IP routed network, as shown in Figure 5-11. This is a distinct advantage for customer support organizations and network command centers, as well as resellers or VARs that have support contracts with remote customers. SNMP is preferable to browser-based management because multiple hubs can be managed from a single workstation. Browser-based management typically speaks to only one IP address, and therefore to one hub, at a time. Chapter 10 discusses SNMP and other management protocols and strategies in more detail.

Figure 5-11. Managed SAN with remote SNMP console

Most vendors provide an event log for automatic logging of hub and loop events. The event log may be queried from the hub using Telnet or SNMP, or it may be replicated on the management workstation for a permanent record of activity. An event log is a useful diagnostic tool, especially for troubleshooting intermittent problems that may occur during unattended operation of the management workstation.
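A minimal sketch of such a remote query is shown below, assuming the hub's management processor exposes standard MIB-II objects (sysDescr and sysUpTime) and that the pysnmp library's classic hlapi interface is available. The management address and community string are placeholders, not values from any particular product.

# Poll a managed hub over SNMP for basic identity and uptime information.
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

def poll_hub(address, community="public"):
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(community, mpModel=0),              # SNMPv1
            UdpTransportTarget((address, 161)),
            ContextData(),
            ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),  # sysDescr
            ObjectType(ObjectIdentity("1.3.6.1.2.1.1.3.0")),  # sysUpTime
        )
    )
    if error_indication or error_status:
        raise RuntimeError(f"SNMP query failed: {error_indication or error_status}")
    return {oid.prettyPrint(): value.prettyPrint() for oid, value in var_binds}

# Example (placeholder management address):
# print(poll_hub("192.0.2.10"))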

The graphical interfaces supplied with managed hubs vary in capability and platform support from vendor to vendor. At the low end, browser-based management may provide screens for basic port configuration and status but no active updates for visual cues on hub status. At the high end, NT or Java-based applications provide a "back of box" or portside depiction of the hub, with color-coded cues and legends for hub and port status. Pop-up alerts may be provided for significant events, as well as SNMP traps to send notification to another management platform. Depending on your management philosophy, a SAN management workstation may be co-located with the SAN or incorporated into a network operations center where LAN and WAN management is performed. Consequently, managed hub applications may provide additional interfaces to integrate into HP OpenView, Tivoli, or other comprehensive management platforms.

5.6 Switching Hubs

Switching hubs are a hybrid SAN interconnection, occupying a middle ground between shared loop hubs and fabrics. Like the private loop stealth or phantom mode offered by some fabric switches, switching hubs combine the simplicity of arbitrated loop with the high-performance bandwidth of switches. By design, switching hubs are not fabric-capable and so do not support fabric login, SNS, or state change notification. This reduces the engineering requirements and additional cost associated with fabric switches. Switching hubs are therefore more expensive than standard loop hubs, but more economical than fabrics. For configurations that require high bandwidth but no more than 126 total devices, switching hubs offer a reasonable price and performance solution.

Switching hubs typically provide 6 to 12 ports, each of which supports 1Gbps or 2Gbps throughput. The attached loop nodes are configured into one virtual loop composed of multiple loop segments. By assigning high-performance servers to their own ports and RAIDs or JBODs to other ports, switching hubs provide the aggregate bandwidth of fabric switches. The segmentation of one logical loop into smaller virtual loop segments also allows loop behavior to be modified. Loop initialization can be restricted to a single port, and that removes a source of potential disruption for sensitive applications such as streaming video or streaming tape backup. Additionally, concurrent conversations within a single virtual loop, as shown in Figure 5-12, are now possible. A server on port 3, for example, could be accessing storage on port 7 while a server on port 1 could access data on port 6. Because neither normal loop protocols—such as ARBs or OPNs—nor the data frames themselves have to traverse the entire loop, switching hubs also accommodate long hauls (up to 10km) without impacting overall loop performance.

Figure 5-12. Switching hubs allow multiple concurrent loop transactions

At the high end of the hub product offering, switching hubs support SNMP, SCSI Enclosure Services, or other management features. Depending on the vendor's design, some products offer advanced diagnostic features, including the ability, via a management interface, to capture data traffic on any port without disrupting the topology.

5.7 Fabric Switches

The deployment of fabric switches was initially hindered by cost (more than $2,000 per port) and the requirement for fabric login services for host bus adapters and disk arrays. Cost reductions due to ASIC technology, and widespread support of fabric services by other Fibre Channel vendors, have moved fabric switches into the mainstream of SAN solutions. SAN solutions, in turn, have been moved into the mainstream of enterprise networking.

As illustrated in Figure 5-13, current-generation fabric switches support 1Gbps or 2Gbps per port, provide a high-speed routing engine to switch frames from source to destination, and offer basic services for fabric login and name server functions. Products are differentiated based on vendor-specific issues such as port density, performance, and value-added functionality for ease of installation and management.

Figure 5-13. Fabric switch functional diagram


Fabric switches may support 8 to 16 ports for departmental applications, or 32 to 128 ports (or more) for larger enterprises. Cascading fabrics via expansion ports (E_Ports) allows small and medium configurations to expand as SAN requirements grow, with the caveat that each cascade consumes additional ports and that the expansion links themselves may become potential bottlenecks for fabric-to-fabric communication. Some switch products support multiple links between two switches with load balancing of traffic. This resolves the congestion issue but may introduce another problem if the source and destination N_Ports on either side receive out-of-order frames. In addition to bandwidth, switch-to-switch latency may limit the number of switches in a path. Hop count limitations imposed by vendors range from three to seven switches in any consecutive path.

Fibre Channel standards define F_Ports for attaching nodes (N_Ports), E_Ports for fabric expansion, G_Ports for supporting either N_Ports or other fabrics, and NL_Ports for loop attachment. The standards do not define how, specifically, these port types are to be implemented in hardware, so vendor designs may differ. Some products offer a modular approach, with separate port cards for each port type. Others provide ports that can be configured via management software or auto-configuration to support any port type. The latter approach offers more flexibility for changing SAN topologies, permitting redistribution or addition of devices with minimal disruption.

Performance differences between fabric products are minimal when ASIC technology is used. Non-ASIC fabrics typically incur more than 2 microseconds latency per frame switch. ASIC-based fabrics normally incur less than 1 microsecond latency. As latency drops into the nanosecond range, the fabric is essentially performing at wire speed. Fabric performance is also affected by the transmit and receive buffering capability of each port. Some products provide sufficient buffering to queue 12 frames, whereas others can queue 60 or more. Additional buffering allows the fabric to handle congestion without throwing away frames, and that speeds the end-to-end communication between source and destination.

Enhanced functionality not covered by specific Fibre Channel standards also includes support for private loop devices, proprietary fabric management, and load sharing trunking between switches.

Almost all fabric switches support some variation of zoning. Port zoning allows a port to be assigned to an exclusive group of other ports. In most implementations, a single port can be assigned to several groups, depending on application requirements. Port zoning, normally a no-cost enhancement to the switch, is an accessible means to segregate servers and their storage from other servers or to isolate different departments sharing the same switch resource. Other zoning options include zoning by node World-Wide Name (WWN) or port World-Wide Name. This more granular approach offers more flexibility in assigning zones but also incurs more cost. Zoning on WWNs may also require an external server and custom application software for managing the zones.
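Conceptually, a zone set is just a collection of named member groups, and a conversation is allowed only when its source and destination share at least one zone. The Python sketch below illustrates that check for port-based zoning; the zone names and port numbers are hypothetical, not drawn from any switch's configuration syntax.

# Hypothetical port-zoning check: two ports may communicate only if they
# share membership in at least one zone of the active zone set.
ZONES = {
    "payroll_zone": {1, 2, 7},    # servers on ports 1 and 2, storage on port 7
    "backup_zone": {3, 7, 9},     # backup server plus shared storage and tape
}

def can_communicate(port_a, port_b, zones=ZONES):
    return any(port_a in members and port_b in members
               for members in zones.values())

print(can_communicate(1, 7))   # True: both ports belong to payroll_zone
print(can_communicate(1, 9))   # False: no common zone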

To achieve interoperability in multivendor switch environments, a process called zone merging is required. The mechanics of how each vendor implements port- or WWN-based zoning may differ, but once established, zones must be represented in a common nomenclature so that other switches in the fabric can honor zoning restrictions. As with SNS information exchange, zone merging is a requisite step in the convergence of the fabric toward stable operation. The more complex the fabric, the longer it takes to merge zones and ensure that only authorized storage conversations are allowed.


Management of fabrics is normally provided out-of-band over Ethernet using SNMP or Telnet. Fabric switches may also be managed in-band through the Fibre Channel link using SES queries or proprietary protocols. Most fabric management applications are device managers; they manage the switch enclosure but have no visibility to the rest of the SAN. Fabric management graphical interfaces may include topology mapping, enclosure and port statistics, routing information, and port performance graphing. These features give IT managers a snapshot of the fabric's status and throughput. The tendency of enterprise management for the LAN and WAN is to incorporate network snapshots into trending applications (for example, Concord Network Health) that can be used for capacity planning and network analysis. As this trend extends to SANs, the management data from fabrics can be queried to provide proactive information for storage network planning.

5.7.1 Departmental Fabric Switches

Fabric switches that provide 8 to 32 ports are referred to as departmental switches to differentiate them from more robust and higher-port-count director switches. The term "departmental," however, does not mean that this class of switch is used only in stand-alone departments or small-scale, dispersed SANs. It is not uncommon for an enterprise data center to have hundreds of 8- and 16-port fabric switches mounted with servers and storage in endless rows of 19-inch racks. Although these switches reside in the same data center, they are rarely connected to each other. Instead, each switch serves a specific storage application for one or more departments in the enterprise.

Enterprise data centers may accumulate a significant quantity of departmental fabric switches simply because the supplying vendor packages servers, switches, and storage as a bundle. When more applications are required, more bundles appear. At some point, the volume of separate departmental fabrics begins to resemble the previous accumulation of direct-attached server/storage configurations and presents the same problems in terms of storage management and server administration. It also becomes difficult to reengineer the data center in a more rational distribution of servers and storage connected by a common enterprise fabric, especially in multivendor situations.

Departmental fabric switches may have redundant power supplies and swappable fans, but they do not provide the high-availability features of director-class fabric switches. Each departmental fabric switch is a FRU (field-replaceable unit), and high availability requires installation of dual data paths and redundant switches.

Consolidating an enterprise storage network by connecting multiple departmental switches in a single fabric is a challenge for both customers and vendors. Although a vendor data sheet for a departmental switch may declare support for as many as 239 switches in a fabric, practical guidelines typically call for no more than 32 switches, with no more than 7 switch-to-switch hops in any path. To create a robust meshed network with 8- and 16-port departmental switches simply consumes too many ports for interswitch links and imposes lengthy convergence times to stabilize the fabric. For high-port-count enterprise SAN requirements, it is more efficient to use director-class switches at the core, with departmental switches as fan-out for device ports.
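The port arithmetic behind that guideline is easy to check. The sketch below, under the simplifying assumption of exactly one interswitch link between every pair of switches in a full mesh, counts how many ports remain for devices; the switch counts and port counts are illustrative only.

# Compare device ports left over in a full mesh of departmental switches,
# assuming one ISL between every switch pair (each ISL uses a port on both ends).
def usable_device_ports(switch_count, ports_per_switch):
    isl_ports = switch_count * (switch_count - 1)
    total_ports = switch_count * ports_per_switch
    return max(total_ports - isl_ports, 0)

for n in (4, 8, 12):
    total = n * 16
    usable = usable_device_ports(n, 16)
    print(f"{n} x 16-port switches: {total} ports total, {usable} left for devices")
# For 12 switches, 192 ports shrink to 60 usable device ports in a full mesh.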

The best use of departmental fabric switches is in environments that need to support only a few storage applications for a limited device population. If growth is anticipated, the SAN design should include eventual attachment to a core director switch as opposed to cascading of additional departmental switches.

5.7.2 Fibre Channel Directors

At the high end of Fibre Channel fabric offerings, Fibre Channel directors provide high port counts in a single, high-availability enclosure. Directors may provide 64 to 128 or more ports (256 ports for some announced products) and so present a streamlined solution for the storage requirements of large data centers. High port count alone, however, does not automatically qualify a product for director status. With roots in data center mainframe channel extension, Fibre Channel director architecture implies high availability for every component, including redundant processors, routing engines, backplanes, and hot-swappable port cards. These high-availability features necessarily drive the cost of director-class fabric switches beyond the reach of departmental budgets and position them as a high-end data center solution.

As with departmental switches, the number of directors that can be connected in a single fabric may be limited. This is not because of the availability of ports for interswitch links but rather because of the complexity of exchanging route, zoning, and SNS data and the switch-to-switch latency that may occur as the hop count increases. The automated processes provided by Fibre Channel switch protocols to streamline small fabric configurations pose problems for scalability. In addition, some vendors do not advertise the fact that their high-port-count director chassis is the result of combining smaller modules—for example, connecting two 64-port switches in a single enclosure. This design introduces an additional hop count within the director itself and, without proper allocation of ports, may result in a blocking architecture. Due diligence with your vendors is always a good idea, particularly when significant amounts of money are involved.

As shown in Figure 5-14, you can accommodate hop count limitations by combining directors with departmental fabric switches. Trunked interswitch links between directors can reduce potential blocking, and the port fan-out supplied by departmental switches increases the total population that can be reasonably supported. This type of solution does not provide director-class availability to every port, so the SAN architect must decide which devices should be directly attached to the director and which can reside on departmental switches.

Figure 5-14. Combining directors and departmental switches in a fabric


5.8 Fibre Channel-to-SCSI Bridges

When a new technology encroaches on an older one, there is an opportunity for transitional products to bridge the gap. Fibre Channel-to-SCSI bridges perform this function for Fibre Channel SANs by allowing legacy SCSI devices to participate in storage networking. Fibre Channel-to-SCSI bridges normally provide one or two Fibre Channel interfaces for SAN attachment, and two to four SCSI ports for SCSI disk arrays or tape backup subsystems. In addition to this physical and transport conversion, the Fibre Channel-to-SCSI bridge translates serial SCSI-3 protocol to the appropriate SCSI protocol required by the legacy devices.

Vendors of these products like to call them "routers" instead of bridges, largely because of the poor reputation LAN bridge products acquired in the 1980s and their association with disruptive network broadcast storms. The Fibre Channel-to-SCSI products, however, do not actually route data at layer 3; instead, they simply convert one form of SCSI to another. In other words, they provide a bridging function.

The most common application for Fibre Channel-to-SCSI bridges is to support legacy tape backup subsystems, as illustrated in Figure 5-15. Existing SCSI tape backup and library systems represent a substantial investment whose replacement with Fibre Channel devices may be prohibitively costly. Because tape backup is an integral requirement for any storage network, Fibre Channel-to-SCSI bridges may therefore both preserve your investment in tape subsystem hardware and satisfy the need to optimize the tape backup process itself.

Figure 5-15. A Fibre Channel-to-SCSI bridge supporting SCSI-attached tape


Placing a tape subsystem behind a Fibre Channel-to-SCSI bridge also removes ownership of the subsystem from an individual server. Because the tape subsystem is now addressable by any server on the storage network, all servers can share what was previously a dedicated resource. The separation of a tape subsystem from server ownership is taken a step further by other products discussed later in this chapter.

The role of Fibre Channel-to-SCSI bridges will decline as legacy devices are eventually replaced by SAN-enabled systems. Given the large population and investment in traditional SCSI tape and disk products in use, however, Fibre Channel-to-SCSI bridges will continue to be useful options for SAN design for some time to come.


5.9 Fibre Channel Extension Products

Because organizations often have multiple regional or international sites, the ability to access storage data over distance has gained increasing importance. In addition to disaster recovery, wide area storage applications—such as consolidated tape backup, resource sharing, and content distribution—require a means to extend Fibre Channel-originated traffic beyond the central data center.

Several solutions are currently available for Fibre Channel extension, although the high ground for block storage data over distance has been claimed by IP-based storage products. For native Fibre Channel extension—transport methods that require no protocol conversion—the most common extension products rely on dark fiber and dense wave division multiplexing or on encapsulation of Fibre Channel frames using IP tunneling.

5.9.1 Fibre Channel Extension Using DWDM

Dense wave division multiplexing (DWDM) is a physical transport that leverages optical signaling to send numerous data streams concurrently over a single optical link. Light traversing an optical cable is composed of various modes or wavelengths of light. If a data stream is associated with a specific wavelength of light at the point of transmission, it can be separated from other data streams at the receiving end, as illustrated in Figure 5-16. With current technology, as many as 64 concurrent streams can be supported, each riding its own wavelength.

Figure 5-16. DWDM transports each data source on a separate wavelength

DWDM requires dedicated fiber cabling, commonly referred to as dark fiber. Dark fiber is any unused optical pair of an installed cable run. In contrast, lit fiber is already carrying Ethernet, ATM, or some other transport. Carriers with unused fiber are always willing to sell access for a price, but dark fiber is not always available in all areas. Alternatively, an organization could run its own fiber at some expense, providing it has the right of way from source to destination.

DWDM can carry any protocol but has become linked to Fibre Channel extension because of its ability to drive longer distances more efficiently than traditional longwave optics and single-mode cable. Still, even with DWDM, performance of Fibre Channel switches over distance falls dramatically beyond metropolitan distances.

Fibre Channel fabric switches were originally designed for data center applications within a fairly narrow circumference (500m with multimode cabling and shortwave optics). With adjoining equipment in close proximity, port buffer requirements are minimal. A fabric switch port, for example, can buffer 3 or 4 frames before they are placed on the cable for transmission. As the switch receives additional credits from the receiving end, it can empty its buffers onto the wire. Under congestion, a switch may have to buffer more frames, typically 32 frames, before frame discard occurs. To empty its buffers, it would have to receive as many as 32 credits from the destination device.

In long haul applications, speed-of-light latency over distance increases the wait time for credits or acknowledgments to return from the receiving end. Depending on the distance, a typical fabric switch port could send its maximum allowed frames and then sit idle for milliseconds as it waited for credits to return. Even with full gigabit bandwidth available on the link, the credit limitation hinders performance, and throughput continues to fall as the distance is extended.
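The effect of credit starvation can be estimated with simple arithmetic: with a fixed number of buffer-to-buffer credits, a port can have at most that many full frames in flight per round trip. The sketch below assumes 2KB Fibre Channel frames, roughly 100MBps of payload bandwidth on a 1Gbps link, and about 5 microseconds of fiber propagation delay per kilometer each way; the results are back-of-the-envelope figures, not vendor measurements.

# Rough estimate of credit-limited throughput over distance.
FRAME_BYTES = 2048            # approximate size of a full Fibre Channel frame
LINK_MBPS = 100.0             # ~100 MBps usable on a 1Gbps Fibre Channel link
FIBER_DELAY_US_PER_KM = 5.0   # one-way propagation delay per kilometer

def credit_limited_throughput(distance_km, credits):
    round_trip_s = 2 * distance_km * FIBER_DELAY_US_PER_KM / 1_000_000
    serialization_s = credits * FRAME_BYTES / (LINK_MBPS * 1_000_000)
    # The port can push 'credits' frames, then must wait out the round trip.
    window_s = max(round_trip_s, serialization_s)
    mbps = credits * FRAME_BYTES / window_s / 1_000_000
    return min(mbps, LINK_MBPS)

for km in (10, 100, 500):
    print(f"{km:4d} km, 4 credits: {credit_limited_throughput(km, 4):6.1f} MBps")
# Throughput drops from ~82 MBps at 10km to under 2 MBps at 500km with only 4 credits.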

Some fabric switch vendors have attempted to overcome credit starvation by pooling buffers from adjoining ports to serve the long haul port. Although this technique enables the switch port to provide better performance, the contributor ports are unusable for normal node attachment. The customer is thus asked to pay for the inadequacies of the original switch design. Given the low cost of memory, the better solution is to simply design in more buffers per port.

Fibre Channel extension with DWDM can be used for either switch-to-switch or switch-to-node connectivity. From the standpoint of the fabric switch, the intervening DWDM equipment and long haul cable plant are transparent. Consequently, E_Ports or F_Ports can be connected to the DWDM infrastructure as if it were a straight run fiber cable. For E_Port connections, switch-to-switch behavior—including principal switch selection, route information and SNS exchange, zone merging, and so on—now occurs across an extended link. As long as the link is stable, this stretched E_Port connection presents no major difficulties. If the link fails, however, the effect is the same as pulling the cable between two adjacent fabric switches. Each would undergo fabric reconfiguration, with all storage conversations suspended until each fabric stabilized.

5.9.2 Fibre Channel Extension Using IP Tunneling

Because of the higher cost of dark fiber compared with other communications services, use of the more common and more affordable IP network services is an attractive option for Fibre Channel extension. Fibre Channel over IP (FCIP) is an IP tunneling solution favored by Fibre Channel vendors because it perpetuates acquisition of Fibre Channel switches on both sides of an extended link. Alternatively, two native IP storage protocols (iFCP and iSCSI) can also be used to extend Fibre Channel-originated traffic. Because iFCP and iSCSI are more properly IP storage technologies, they are discussed in Chapter 6.

FCIP tunneling performs a simple task, although the cost of FCIP products often does not reflect it. As shown in Figure 5-17, an FCIP device typically attaches by E_Port connection to the source Fibre Channel switch. Fibre Channel frames destined for the remote end are wrapped in IP datagrams and sent across the IP network. At the receiving end, the IP datagrams are stripped off, and the original Fibre Channel frames are delivered to the E_Port of the receiving fabric switch for routing.

Figure 5-17. Using FCIP tunneling to span distance between fabric switches


Because FCIP implements a standard E_Port connection over distance, normal Fibre Channel switch-to-switch protocols are passed over the wide area link. This creates a single Fibre Channel fabric, which is now stretched over tens or hundreds of miles. Disruptions at one site may propagate to another, or a break in the wide area link may trigger a fabric reconvergence.

With the caveat of stretched E_Port connections, FCIP tunneling is suitable for applications that require connectivity only between two sites. In current product offerings, some FCIP devices may provide additional buffering to compensate for buffering limitations of fabric switches and may also provide data compression to maximize use of the available WAN bandwidth. In addition, some FCIP products incorporate WAN interfaces for connecting directly to carrier services, whereas others rely on IP routers to connect to the cloud.

FCIP is a point-to-point wide area connection. A pair of FCIP devices is therefore required for each remote link, and each remote site assumes the existence of a fabric switch in addition to end nodes.
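In spirit, an FCIP gateway treats each Fibre Channel frame as an opaque payload carried inside an IP/TCP stream to the remote gateway. The Python sketch below models only that wrapping and unwrapping step with a simple length-prefixed placeholder header; it deliberately does not reproduce the real FCIP encapsulation header defined in the FCIP standards, and the frame bytes are dummy values.

# Conceptual wrap/unwrap of Fibre Channel frames for tunneling over TCP/IP.
# The 4-byte length prefix is a stand-in, not the standardized FCIP header.
import struct

def wrap_fc_frame(fc_frame: bytes) -> bytes:
    return struct.pack("!I", len(fc_frame)) + fc_frame

def unwrap_fc_frames(stream: bytes):
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        yield stream[offset + 4: offset + 4 + length]
        offset += 4 + length

tunnel_payload = wrap_fc_frame(b"\x22\xff\xff\xfd" + b"\x00" * 20)  # dummy frame bytes
print([frame.hex() for frame in unwrap_fc_frames(tunnel_payload)])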

5.9.3 Fibre Channel WAN Bridging

Still being defined in standards, Fibre Channel WAN bridging offers another solution for SAN extension. A Fibre Channel WAN bridge transports Fibre Channel traffic over non-Fibre Channel topologies such as ATM. Connection to a local fabric is via a B_Port (bridge port) interface on the bridge. B_Ports, which provide a subset of E_Port protocols, either transparently pass through switch-to-switch Class F frames or engage in the fabric building process as an additional switch.

WAN bridging between two Fibre Channel SANs may result in a single fabric spread over distance. In this case, the WAN bridges pass E_Port traffic between fabric switches, and a common address space with a unique Domain_ID is established for the dispersed SAN. Alternatively, the WAN bridge can engage in E_Port behavior with its local fabric switch, creating an autonomous Fibre Channel region, as shown in Figure 5-18. The notion of autonomous regions is meant to overcome the inherent vulnerability of extended SANs to disruptions. If instead of being joined by an E_Port into one large fabric, each site is an autonomous region, then propagation of disruptions can be limited. Establishment of autonomous regions still requires that each fabric switch have a unique Domain_ID. Because this is usually accomplished automatically through principal switch selection, implementing autonomous regions will require manual administration of addresses for each switch.

Figure 5-18. Autonomous regions created with Fibre Channel WAN bridging


5.10 Chapter Summary

Transceivers

- Transceivers are a critical component of the Fibre Channel transport.

- Vertical Cavity Surface Emitting Laser (VCSEL) transceivers provide high reliability for Fibre Channel applications.

- Serial ID support allows GBICs to provide inventory and diagnostic status but requires management software for reporting.

- Small form factor transceivers enable high port density on Fibre Channel products.

- SFP (small form factor pluggable) transceivers provide the flexibility of removable media for accommodating shortwave, longwave, and copper cabling.

Host Bus Adapters

- HBAs provide the interface between the host computer's bus architecture and the Fibre Channel network.

- HBA hardware, firmware, and device drivers fulfill Fibre Channel physical, link-level, framing, and upper-protocol requirements.

- For storage applications, the HBA maps Fibre Channel addressing to the SCSI bus/target/LUN identifier required by the operating system.

- The Common HBA API was developed by the SNIA to simplify management of multivendor HBA environments.

Fibre Channel RAID

- RAID is both a method for data distribution among multiple disks and a hardware implementation for disk arrays.

- RAID levels represent various techniques for improving performance and providing data security.

- Striping of data across multiple disks improves performance; mirroring of data provides redundancy. Parity algorithms can also be used to maximize speed and data integrity.

- Software RAID is performed by the host system; hardware RAID is usually performed by a controller within the disk array enclosure.

- A hardware RAID device can use parallel SCSI or Fibre Channel to interface between the disks and the RAID controller.

Fibre Channel JBODs

- JBODs are an economical means to provide Fibre Channel-attached storage.

- JBODs appear as loop segments and can be configured as single or redundant loops to disks.

- Disks within a JBOD can be addressed individually or can be assigned as sets for software RAID by a server.

- The overhead of software RAID across Fibre Channel may affect overall performance.

- Embedded loop switch-on-a-chip technology enhances the reliability of JBODs based on Fibre Channel disk drives.

Arbitrated Loop Hubs

- Loop hubs facilitate wiring configurations and enable hot insertion and removal of devices.

- Port bypass circuitry and status indicators simplify network changes and elementary diagnostics.

- Unmanaged hubs are economical but cannot provide protection against protocol-level events.

- Managed hubs can include loop integrity features based on protocol recognition circuitry.

Switching Hubs

- Switching hubs are based on a hybrid architecture that combines switched bandwidth with support for private loop devices.

- Switching hubs can allow multiple concurrent conversations on a single virtual loop.

Fabric Switches

- Fabric switches provide 100MBps or 200MBps per port and can support 8 to more than 128 ports.

- Support for fabric port types (F_Port, FL_Port, E_Port) is vendor-specific.

- Port buffering allows frames to be queued and reduces frame discard under congested conditions.

- Fabric switches can also support private loop devices and port or address zoning.

- Management of fabric switches can be provided via SNMP, Telnet, SES, and other protocols.

- Departmental fabric switches provide 8 to 32 ports.

- Larger deployments with departmental switches are best implemented using core director switches.

- Director-class fabric switches are characterized by high port count, redundant processors, routers, backplanes, and hot-swappable port cards.

Fibre Channel-to-SCSI Bridges

- Fibre Channel-to-SCSI bridges are transitional products that bring legacy SCSI disks and tape subsystems into a Fibre Channel SAN.

- Fibre Channel-to-SCSI bridges allow tape subsystems to be shared by multiple servers.

- Fibre Channel-to-SCSI bridges are not routers.

Fibre Channel Extension Products

- Native Fibre Channel extension normally cannot span beyond metropolitan distances.

- Dark fiber is unused fiber pairs already deployed in long haul optical cable runs.

- Dense wave division multiplexing (DWDM) supports multiple data streams on the same optical plant by assigning each stream to a separate wavelength (mode) of light.

- In stretched E_Port configurations, switch-to-switch protocols are extended over distance and a single fabric is created.

- Fibre Channel over IP (FCIP) is a technique for encapsulating Fibre Channel frames in IP datagrams.

- FCIP devices require a fabric switch at each site.

- FCIP devices are deployed in pairs for each link to be served.

- A B_Port is a WAN bridge port connection and supports a subset of E_Port protocols.

- Creating autonomous regions requires manual administration of addresses to avoid Domain_ID duplication.


Chapter 6. IP SAN Technology

New SAN solutions based on IP protocols and Gigabit Ethernet have been introduced to complement and, in some cases, replace traditional Fibre Channel products. Consequently, SAN architects now have additional options for implementing shared storage. As usual, though, separating practical facts from vendor fiction is a necessary first step in designing a viable SAN. This chapter examines the basic features of IP storage protocols and transports and provides a foundation for understanding the capabilities that IP SAN products can offer.

6.1 Ethernet and TCP/IP

Mainstream data communications networks are based on Ethernet for physical transport and on TCP/IP for routing of data from source to destination. Ethernet and TCP/IP are paired technologies for LAN and MAN (metropolitan area network) networks, whereas IP can extend to wide area networks over point-to-point, frame relay, ATM, Packet over SONET, or switched optical networks. Because the vast majority of organizations use Ethernet and IP for data communications, the introduction of shared storage based on Fibre Channel is often outside the scope of corporate IT network strategies. IP storage resolves this exclusion by bringing block storage data back into mainstream IP networking.

Both Ethernet and TCP/IP have undergone substantial metamorphosis over the past 20 years. Ethernet began as a shared media access method over thick coaxial cable. IP began as a military-funded routing protocol that could still deliver messages even if parts of the network were blown to smithereens by nuclear attack. Twenty years later, Ethernet is now a switched physical transport that has little need for its original collision detection algorithm. And IP is now driving the Internet, the world's largest civilian network.

Along this evolutionary path, other contenders for network transports and protocols proved unfit to survive. Those with proprietary sponsorships, such as Token Ring and SNA (Systems Network Architecture), eventually succumbed to the growing customer demand for open, interoperable systems. Today, Ethernet and TCP/IP are ubiquitous. This mass of installed base generates its own gravitational momentum, making it difficult for other technologies to secure an independent foothold among customers. For SANs, the convergence of Fibre Channel and IP technologies is already under way.

6.1.1 Gigabit Ethernet Transport

For data center applications, current-generation Ethernet is based on gigabit and 10Gbps switched networks. Gigabit Ethernet switches provide 1Gbps full duplex links to end devices and may support 10Gbps interswitch links to build a high-performance core infrastructure. Typically, 1Gbps ports attach directly to file servers or to departmental Ethernet switches for fan-out of 100Mbps (Fast Ethernet) links to end-user workstations. Shared Ethernet segments based on hubs are less commonly deployed and are relegated to low-performance applications. The original 10Mbps Ethernet that once supported mission-critical enterprise applications now resides at the commodity level for home use. When 10Mbps/100Mbps Ethernet switches and adapter cards appear by the pallet load at the local Costco, you know the technology has achieved full market penetration.

Because switched Ethernet provides dedicated bandwidth to each port, the end device no longer experiences the collisions that would occur on a shared medium. So although carrier sense and collision detection is still an attribute of Ethernet, the collision recovery mechanism is rarely invoked.

The reference model for Gigabit Ethernet is defined in the IEEE 802.3z standard. Like 10/100 Ethernet, Gigabit Ethernet is a physical and data-link technology, corresponding to the lower two OSI layers, as shown in Table 6-1.

At the physical layer, Gigabit Ethernet contains both media-dependent and media-independent components. This allows the gigabit media-independent interface (GMII) to be implemented in silicon and still interface with a variety of network cabling. As with Fibre Channel, Gigabit Ethernet supports longwave and shortwave optical fiber and shielded copper, but it can also be implemented over Category 5 (CAT 5) cabling. The reconciliation sublayer passes signaling primitives between upper and lower layers, including transmit and receive status as well as backward-compatible carrier sense and collision detection signaling.

In contrast to the Fibre Channel stack, Gigabit Ethernet's 8b/10b encoding occurs at the physical layer via sublayers in the media-dependent physical group. As shown in Figure 6-1, Fibre Channel layers FC-0 and FC-1 are brought into the lower-layer physical interface, whereas traditional 802.3 Ethernet provides media access control (MAC) and logical link control (LLC), or its offspring, MAC Client, to support the upper-layer protocols.

Figure 6-1. Gigabit Ethernet uses Fibre Channel lower layers for signaling

Gigabit Ethernet uses standard Ethernet framing, as shown in Figure 6-2. The preamble and start-of-frame delimiter (SFD) are followed by the destination (DA) and source (SA) 48-bit MAC addresses of the communicating devices. Creative use of bytes within the length/type field enables enhanced functionality such as VLAN tagging, as discussed shortly. The data field can contain as many as 1,500 bytes of user data, with pad bytes if required. The CRC is part of the frame check sequence (FCS). An optional frame padding is provided by the extension field, although this is required only for gigabit half duplex transmissions.

Figure 6-2. Standard Ethernet frame format

Table 6-1. Gigabit Ethernet Physical and Data-Link Layers

OSI Reference Layer    Gigabit Ethernet Layers
Data-link layer        MAC client sublayer
                       MAC control (optional)
                       Media access control (MAC)
Physical layer         Reconciliation
                       Gigabit media-independent interface (GMII)
                       Media-dependent physical group
                       Medium-dependent interface
                       Medium


Ethernet frame addressing gets data from one device to another on the same network segment. When the data is intended for a recipient outside the local LAN segment, network-layer addressing is required. The Ethernet frame therefore serves as the outer envelope for the network packet. It may contain, in addition to network addressing (such as IP), a session control protocol (such as TCP).
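To make the framing concrete, here is a small Python sketch that assembles a minimal Ethernet II frame around an arbitrary payload, following the field order of Figure 6-2 (destination address, source address, type, data, FCS). The preamble and SFD are generated by the hardware and omitted; zlib's CRC-32 is used only as a stand-in for the FCS that a real controller appends, and the MAC addresses are placeholders.

# Build a minimal Ethernet II frame: DA, SA, type, payload, FCS stand-in.
import struct
import zlib

def mac_bytes(mac: str) -> bytes:
    return bytes(int(octet, 16) for octet in mac.split(":"))

def build_frame(dst_mac: str, src_mac: str, ethertype: int, payload: bytes) -> bytes:
    if len(payload) < 46:                       # pad toward the 64-byte minimum frame size
        payload = payload + b"\x00" * (46 - len(payload))
    header = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype)
    fcs = struct.pack("<I", zlib.crc32(header + payload) & 0xFFFFFFFF)
    return header + payload + fcs

frame = build_frame("ff:ff:ff:ff:ff:ff", "00:11:22:33:44:55", 0x0800, b"IP packet goes here")
print(len(frame), frame[:14].hex())             # 64 bytes; header shows DA, SA, and type 0800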

IP protocol over Ethernet is inserted into the data field and provides the network-layer routing information to move user data from one network segment to another. TCP provides higher-level session control for traffic pacing and the ability to recover from packet loss. Although IP can be carried in other frame formats, link-layer enhancements for Ethernet offer additional reliability and performance capability unmatched by other transports. These include link aggregation, link-layer flow control, virtual LANs, and quality of service.

IEEE 802.3ad Link Aggregation

Link aggregation, or trunking, provides high bandwidth for switched networks by provisioning multiple connections between switches or between a switch and an end device such as a server. Link aggregation also facilitates scaling the network over time, because it lets you add links to a trunked group incrementally as bandwidth requirements increase. As shown in Figure 6-3, two Gigabit Ethernet switches can share three aggregated links, for a total available bandwidth of 7.5Gbps full duplex.

Figure 6-3. Link aggregation between two Gigabit Ethernet switches

Link aggregation must resolve several issues to avoid creating more problems than it solves. In typical bridge environments, the spanning tree algorithm would, upon encountering multiple links between two devices, simply disable the redundant links and allow only a single data path. This would prevent duplication of frames and potential out-of-order delivery. Link aggregation must therefore make multiple links between two devices appear as a single path while simultaneously providing a mechanism to avoid frame duplication and ensure in-order frame delivery. You could implement this by manipulating MAC addresses (for example, assigning the same MAC address to every trunked link) or by inserting link aggregation intelligence between the MAC client and MAC layers. You would have to monitor the status of link availability, current load, and conversations through the trunk to ensure that frames are not lost or inadvertently reordered.

In-order delivery of frames is guaranteed if a conversation between two end devices is maintained across a single link in the trunk. Although link aggregation is not as efficient as simply shipping each frame over any available connection, it avoids the extra logic required for frame ordering and reassembly before delivery to the recipient. At the same time, additional transactions by other devices benefit from the availability of the aggregated interswitch links, and switch-to-switch bottlenecks are avoided.
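A common way to keep a conversation on a single trunk member is to hash the frame's address pair to a link index, so that every frame of that conversation takes the same physical path. The sketch below shows one such hash-based selection; the hashing scheme is purely illustrative, since 802.3ad leaves the distribution algorithm to the implementer.

# Pick a trunk member per conversation by hashing the MAC address pair.
# Frames of the same conversation always map to the same link, preserving order.
import zlib

def select_link(src_mac: str, dst_mac: str, link_count: int) -> int:
    key = (src_mac + dst_mac).lower().encode()
    return zlib.crc32(key) % link_count

links = 3
print(select_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", links))
print(select_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", links))  # same link every time
print(select_link("00:11:22:33:44:56", "66:77:88:99:aa:bb", links))  # may land on another link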

Link aggregation as specified in IEEE 802.3ad is almost mandatory for IP-based storage networks, particularly when multiple Gigabit Ethernet switches are used to build the SAN backbone. Along with 802.1p/Q prioritization, link aggregation can ensure that mission-critical storage traffic has an available path through the network and that multiple instances of mission-critical transactions can occur simultaneously. This requirement will be satisfied temporarily by the arrival of 10Gbps uplinks between switches, but these will inevitably be trunked to provide even higher bandwidth over time.

IEEE 802.3x Flow Control

Flow control at the data-link level helps to minimize frame loss and avoids latency due to error recovery at the higher-layer protocols. In Fibre Channel, flow control for Class 3 service is provided by a buffer credit scheme. As buffers are available to receive more frames, the target device issues receiver readys (R_RDYs) to the initiator, one per available buffer. In Gigabit Ethernet, link-layer flow control is provided by the IEEE 802.3x standard. As illustrated in Figure 6-4, the 802.3x implementation uses a MAC control PAUSE frame to hold off the sending party if congestion is detected. If, for example, receive buffers on a switch port are approaching saturation, the switch can issue a PAUSE frame to the transmitting device so that the receive buffers have time to empty. Typically, the PAUSE frame is issued when a certain high water mark is reached, but before the switch buffers are completely full. To use IEEE 802.3x flow control efficiently, all switches in the data path must support bidirectional flow control.

Figure 6-4. Sending PAUSE frames to pace incoming traffic

Because the PAUSE frame is a type of MAC control frame, the frame structure is slightly different from the conventional data frame. The length/type field is used to indicate the presence of a MAC control frame. The flow control mechanism within the control frame is a hold-off pause_time value, establishing how long the sending party should pause before continuing to send frames. The pause_time cannot be specified in fixed units such as microseconds, because this would prove too inflexible for backward compatibility and future Ethernet transmission rates. Instead, the pause_time is specified in pause_quanta, with 1 pause_quanta equal to 512 bit_times for the link speed being used. The timer value can be between 0 and 65,535 pause_quanta, or a maximum of roughly 33 milliseconds at Gigabit Ethernet speed. If the device that issued the PAUSE frame empties its buffers before the stated pause_time has elapsed, it can issue another PAUSE frame with the pause_time set to zero. This signals the transmitting device that frame transmission can resume.

Because PAUSE frames can be used between any devices and the switch ports to which they are attached and because Gigabit Ethernet allows only one device per port, there is no need to personalize the PAUSE frame with the recipient's MAC address. Instead, a universal, well-known address of 01-80-C2-00-00-01 is used in the destination address field. When a switch port receives the PAUSE frame with this address, it processes the frame but does not forward it to the network.
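Putting the pieces together, a PAUSE frame is an ordinary MAC control frame: the reserved multicast destination address, the MAC control EtherType (0x8808), the PAUSE opcode (0x0001), and a 2-byte pause_time in quanta. The Python sketch below assembles those fields; the source address is a placeholder, the FCS is omitted (hardware appends it), and the quanta-to-time conversion assumes the 1Gbps data rate.

# Assemble an IEEE 802.3x PAUSE frame (FCS omitted; added by hardware).
import struct

PAUSE_DST = bytes.fromhex("0180C2000001")   # reserved multicast address for PAUSE frames
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001

def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    body = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    body += b"\x00" * (46 - len(body))       # pad the control frame to the minimum payload size
    return PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE) + body

# At 1Gbps, one pause_quantum of 512 bit times lasts 512 nanoseconds.
quanta = 1000
print(f"Requested pause: {quanta * 512e-9 * 1e6:.1f} microseconds at 1Gbps")
frame = build_pause_frame(bytes.fromhex("001122334455"), quanta)
print(len(frame), frame[:20].hex())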

The 802.3x flow control provided by Gigabit Ethernet switches creates new opportunities for high-performance storage traffic over IP. Fibre Channel Class 3 service has already demonstrated the viability of a connectionless, unacknowledgedclass of service, provided that there is a flow control mechanism to pace frame transmission. In Fibre Channel fabrics using Class 3, as with 802.3x in Ethernet, the flow control conversation occurs between the switch port and its attached device. As the switch port buffers fill, it stops sending R_RDYs until additional buffers are freed. In Gigabit Ethernet, this function is performed with PAUSE frames, with the same practical result. In either case, buffer overruns and the consequent loss of frames is avoided, and this is accomplished with minimal impact to performance.

IEEE 802.1Q VLAN Tagging

Virtual LANs in a switched Ethernet infrastructure let you share network resources such as large Gigabit Ethernet switches while segregating traffic from designated groups of devices. As with Fibre Channel zoning, members of a virtual LAN can communicate among themselves but lack visibility to the rest of the network.


VLAN tagging was standardized in 1998 through the IEEE 802.1Q committee. You create virtual LANs by manipulating the length/type field in the Ethernet frame. To indicate that the frame is tagged, a unique 2-byte descriptor of hex 81-00 is inserted into the field. This tag type field is followed by a 2-byte tag control information field, as shown in Figure 6-5. This field carries the VLAN identifier and user priority bits, as described shortly. The 12-bit VLAN identifier allows you to assign as many as 4,096 VLANs on a single switched infrastructure.

Figure 6-5. IEEE 802.1Q VLAN tag fields
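The tag itself is only four bytes: the 0x8100 tag type followed by the priority, CFI, and VLAN ID bits. The Python sketch below inserts such a tag into an untagged frame immediately after the source address; the field packing follows the layout of Figure 6-5, the frame bytes are placeholders, and the priority value anticipates the 802.1p discussion that follows.

# Insert an IEEE 802.1Q tag (tag type 0x8100 + 16-bit tag control information)
# after the 6-byte destination and 6-byte source addresses of an untagged frame.
import struct

def tag_frame(untagged_frame: bytes, vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority << 13) | (cfi << 12) | vlan_id    # 3-bit priority, 1-bit CFI, 12-bit VID
    tag = struct.pack("!HH", 0x8100, tci)
    return untagged_frame[:12] + tag + untagged_frame[12:]

untagged = bytes(12) + struct.pack("!H", 0x0800) + b"payload"   # placeholder addresses
tagged = tag_frame(untagged, vlan_id=2, priority=5)             # VLAN 2 as a storage VLAN
print(tagged[12:16].hex())                                      # 8100a002: priority 5, VLAN 2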

From a performance standpoint, VLAN tagging is a highly efficient means to segregate network participants into communicating groups without incurring the overhead of MAC address filtering. Intervening switches use the logical VLAN identifier, rather than a MAC address, to properly route traffic from switch to switch, and this in turn simplifies the switch decision process. As long as the appropriate switch port is associated with the proper VLAN identifier, no examination of the MAC address is required. Final filtering against the MAC address occurs at the end point.

All major vendors of Gigabit Ethernet switches support the 802.1Q standard. This makes it a useful feature not only for data paths that must cross switch boundaries but also for heterogeneous switched networks. For IP storage network applications, 802.1Q facilitates separation of storage traffic from user messaging traffic, as shown in Figure 6-6.

Figure 6-6. VLANs on a common switch infrastructure. VLAN 2 is a storage network

IEEE 802.1p/Q Frame Prioritization

The 802.1Q VLAN tag control information field allocates 3 bits for user priority. The definition for these user priority bits is provided by IEEE 802.1p/Q and enables individual frames to be marked for priority delivery. The quality of service (QoS) supported by 802.1p/Q allows for eight levels of priority assignment. As shown in Figure 6-7, this ensures that mission-critical traffic will receive preferential treatment in potentially congested conditions across multiswitch networks and thus minimizes frame loss due to transient bottlenecks.

Figure 6-7. Frames with highest priority are processed first


For storage network applications, the ability to prioritize transactions in an IP-based SAN is a tremendous asset. Storage networks normally support a wide variety of applications, not all of which require high priority. Updating an online customer order or a financial transaction between banks, for example, rates a much higher priority for business operations than does a tape backup stream. The class of service provided by 802.1p/Q allows storage administrators to select the applications that should receive higher-priority transport and assign them to one of the eight available priority levels. In a multiswitch network, class of service ensures that prioritized frames have preference across interswitch links.

6.1.2 TCP/IP

Layer 3 IP network routing solves a fundamental problem of layer 2 networks. Layer 2 networks have a single address space based on MAC or a similar link layer address. Network-wide queries such as broadcasts are distributed throughout the network address space, creating a potential for broadcast storms that would bring normal network traffic to a halt. Broadcast storms were a notorious problem in the 1980s for bridged Ethernet or Token Ring networks. What was needed was a means to isolate link-level broadcasts to a single network segment and therefore eliminate network-wide disruptions. That solution was provided by IP routers, which established Cisco's dominance of the data communications market.

IP adds an address layer above the link-level MAC address. Traffic within a network segment can use link layer addressing to send frames from one device to another. Traffic destined for devices beyond the local segment is sent to an IP router or gateway, which uses the upper-layer IP address to determine where frames must be sent. Link layer broadcasts are restricted to the local segment, thus preventing network-wide broadcast storms.

In Figure 6-8, four network segments are joined by IP routers. Communications between devices within segment A, for example, would use Ethernet MAC addressing to send and receive frames. A link-level failure or broadcast storm in segment A would be blocked by the IP router and not passed to the rest of the network. Segment B, in this instance, could continue to communicate with devices in segment D using IP addressing because the rest of the network would be operational.

Figure 6-8. A network segmented by IP routers


Collectively, all networks connected by IP routers can be referred to as an IP network. The Internet, for example, is composed of hundreds of thousands of individual networks, each with its own unique IP address space.

IP Addressing

Unlike proprietary networking protocols, IP has evolved in the public realm. The IP addressing scheme therefore assumes that individual end devices not only have a unique identity within a particular network but also are unique globally.

The traditional IP address scheme (IPv4) uses a 32-bit address that is divided into network identifiers and host (device) identifiers. For convenience, the 32 bits are expressed in a dotted decimal notation—for example, 192.168.20.10. The original IPv4 addressing relied on various classes of addresses to provide address ranges for large and small networks. To distinguish between the network and host portions of an IP address, a subnet mask is used. The subnet mask sets 1 bits to indicate the network portion, and 0 bits to indicate the host portion. This arrangement makes it possible to subdivide a class of IP addresses into small portions that can be easily administered.

Although IPv4's 32-bit address space allows for more than 4,000,000,000 (four billion) unique addresses, the class-based allocation of blocks of addresses has proved to be too restrictive to satisfy the ever-increasing demand for IP connectivity. Consequently, a new IPv6 enhancement allocates 128 bits for an IP address, for a 4,000,000,000³ (four billion cubed) advantage over the IPv4 total address space. In addition, the class-based scheme for IPv4 has been replaced by a more flexible classless IP address scheme.

Classless Inter-Domain Routing (CIDR) was created in the mid-1990s as a transitional solution to IPv4 address allocation until IPv6 could be developed. CIDR essentially overturns the notion of fixed IP classes and offers a more flexible means to demarcate networks and hosts. CIDR ignores the upper bits that identify IP classes and simply indicates the number of bits that should be used for calculating the network portion of the address. The CIDR postfix /15 appended to a network address of 172.16.0.0/15 would indicate that the first 15 bits of the address should be used to identify the network portion. Because a /15 mask is shorter than the 16 bits normally associated with the Class B 172.16.0.0 address, CIDR may not be compatible with earlier network interface card (NIC) device drivers. It is especially beneficial, however, for ISPs to use in allocating addresses that scale to their customers' requirements.
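
As a quick illustration of the subnet mask and CIDR arithmetic just described, the short Python sketch below uses the standard library ipaddress module on the 172.16.0.0/15 example from the text. The module is simply a convenience for the calculation; nothing here is storage-specific.

# Sketch: how a CIDR prefix splits an address into network and host portions,
# using Python's standard ipaddress module and the /15 example from the text.
import ipaddress

net = ipaddress.ip_network("172.16.0.0/15")
print(net.netmask)            # 255.254.0.0 -> the first 15 bits are the network portion
print(net.network_address)    # 172.16.0.0
print(net.broadcast_address)  # 172.17.255.255
print(net.num_addresses)      # 131072 addresses in the /15 block (2**17)

# A host address can be tested for membership in the block:
print(ipaddress.ip_address("172.17.40.9") in net)  # True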

For all the difficulties associated with IPv4 address allocation, IP addressing offers a viable long-term solution for future storage networks. The deployment of a virtually unlimited IPv6 address space will accompany the widespread adoption of readily available storage as storage network solutions expand from the data center to consumer-oriented applications. This universality would be difficult to achieve with the more finite address space and single fabric topology provided by Fibre Channel. In addition, IPv6 standards include QoS and security features that are useful for a wide variety of networked storage solutions.

Address Resolution Protocol


When an IP packet is sent between networks, the header contains the source and destination IP addresses but no information on the destination's lower-level MAC address. A workstation may have both a 48-bit MAC address and a 32-bit IP network address, and it cannot receive frames based on the IP address alone. An IP router must therefore be able to associate a specific IP address with the appropriate MAC address in order to complete delivery. This process, called address resolution, requires a special protocol (ARP) consisting of request and reply packets.

ARP relies on broadcasts that could be potentially disruptive to the LAN. Having received a data packet addressed to the local IP subnet, for example, an IP router broadcasts an ARP request throughout the local LAN segment. The ARP request contains the MAC and IP address of the router (the source) and the IP address of the intended recipient (the destination). Every host on the LAN segment examines the broadcast request and compares the IP address it contains to its own. If there is a match, the host issues an ARP reply to the router's MAC address and inserts its own MAC address as the source of the reply. The router can then resolve the MAC-to-IP addressing and send the original data packet on the destination LAN segment using the host's proper MAC address.
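
The request/reply exchange can be reduced to a few lines of Python. The sketch below is a toy simulation, with invented hosts, addresses, and a list standing in for the LAN segment, intended only to show the matching logic: every host sees the broadcast, but only the owner of the IP address replies with its MAC address.

# Minimal simulation of the ARP request/reply exchange described above.
# Hosts, addresses, and the "LAN segment" list are hypothetical illustration values.

segment = [  # every host on the local LAN segment sees the broadcast
    {"ip": "192.168.20.10", "mac": "00:A0:C9:11:22:33"},
    {"ip": "192.168.20.11", "mac": "00:A0:C9:44:55:66"},
]

def arp_resolve(target_ip: str, router_mac: str = "00:E0:1E:AA:BB:CC"):
    """Broadcast an ARP request; only the owner of the IP address replies."""
    for host in segment:             # broadcast: every host examines the request
        if host["ip"] == target_ip:  # match against its own IP address
            return {"reply_to": router_mac, "ip": host["ip"], "mac": host["mac"]}
    return None                      # no reply; the address is not on this segment

print(arp_resolve("192.168.20.11"))
# {'reply_to': '00:E0:1E:AA:BB:CC', 'ip': '192.168.20.11', 'mac': '00:A0:C9:44:55:66'}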

IP Routing

IP routing relies on layer 3 network addressing for transporting packets between routers and relies on layer 2 addressing for delivering packets to their final destination on local LAN segments. When an IP router receives a packet from a local Ethernet segment, for example, it strips off the Ethernet header (layer 2) and uses the IP header (layer 3) to make a forwarding decision. At the receiving end, a layer 2 header must be reapplied before the packet can be sent to the appropriate destination IP address. This process allows communication using IP between heterogeneous LAN topologies—for example, between Ethernet-based and Token Ring-based hosts.

The IP routers themselves must have a means to exchange information about which IP networks or subnets they are attached to. Several routing information methods have evolved, including the original Routing Information Protocol (RIP) and Open Shortest Path First (OSPF). RIP-2, which allows for variable-length subnet masks, is a distance-vector protocol. As routers exchange information about their attached IP networks, RIP calculates optimum paths through the network based on the shortest number of hops between source and destination. Each IP router represents a single hop. In a meshed network, there may be multiple paths between a source and destination network. Calculating IP routes based on the fewest number of hops would therefore appear to be the best means to get data from A to B. In reality, the hop count method does not account for the state of the links in the network. IP routers may have different types of links between them; for example, one hop may be connected by a Fast Ethernet link (100Mbps) whereas another may be connected by a T3 communications link (45Mbps). In addition, some links may bear more traffic than others, so even though there may be fewer hops along a single path vis-à-vis another, packets forwarded along that path may face congestion and may be discarded. The limitations of distance-vector routing encouraged the development of other protocols, including OSPF.

Open Shortest Path First is a link-state protocol. Instead of simply calculating optimum paths based on number of hops, OSPF monitors the state of the links to which an IP router is attached and calculates the relative cost of moving data from one point to another based on bandwidth, current traffic load, and other link-dependent conditions. Link-state information is transmitted to neighboring IP routers so that the most efficient data paths can be selected.
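
The difference between hop counting and link-state cost calculation can be seen in a small routing sketch. The topology and link costs below are invented, and the path computation uses Dijkstra's algorithm, the classic shortest-path technique behind link-state protocols such as OSPF; with every link weighted 1 it behaves like a RIP-style hop count instead.

# Sketch: why fewest hops and lowest link cost can pick different paths.
# The topology and costs are invented; the cost metric stands in for the
# bandwidth/load-derived cost that a link-state protocol such as OSPF computes.
import heapq

topology = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 10},
    "C": {"A": 1, "E": 1},
    "E": {"C": 1, "D": 1},
    "D": {"B": 10, "E": 1},
}

def shortest_path(graph, src, dst, hop_count=False):
    """Dijkstra's algorithm; with hop_count=True every link costs 1 (RIP-style)."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, link_cost in graph[node].items():
            if nbr not in visited:
                step = 1 if hop_count else link_cost
                heapq.heappush(queue, (cost + step, nbr, path + [nbr]))
    return None

print(shortest_path(topology, "A", "D", hop_count=True))  # (2, ['A', 'B', 'D'])  fewest hops
print(shortest_path(topology, "A", "D"))                  # (3, ['A', 'C', 'E', 'D'])  lowest cost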

OSPF also introduces the concept of autonomous areas within the IP-routed network. As shown in Figure 6-9, the OSPF area hierarchy includes autonomous areas of IP routers that are ultimately linked through a common backbone area (area 0). This hierarchical scheme restricts OSPF broadcasts to specific areas and so prevents flooding of the entire network with routing information updates whenever changes occur.

Figure 6-9. OSPF backbone and autonomous areas


Another advantage of OSPF is the ability it gives you to quickly accommodate changes in the network. In network terminology, convergence refers to the reestablishment of network connections after a major change has occurred. RIP uses an update interval method that can result in prolonged link failures before new paths are broadcast. In addition, because there are no autonomous areas as in OSPF, convergence time depends on the propagation of routing information throughout the network before stability is achieved. OSPF shortens convergence time by issuing state changes as soon as link failures are detected and by restricting (via areas) the number of IP routers that must be notified of changes.

Fibre Channel routing uses a subset of the OSPF protocol. FSPF calculates the shortest path on the basis of link speed, although new proposals would also include latency over distance as a metric.

TCP Session Control

The Transmission Control Protocol (TCP) is a connection-oriented transport protocol sitting above the IP network layer. Instead of simply pushing data from source to destination on a best-effort basis, TCP first establishes a transmission connection between the communicating pair and imposes a system of datagram acknowledgments to ensure that each transmission arrived intact. TCP also provides a mechanism to recover from packet loss due to errors or network congestion.

Because TCP is optimized for potentially congested networks, it is the protocol of choice for wide area communications and is used extensively in enterprise networks and the Internet. The cost for leveraging TCP/IP's data integrity features, however, is measured in performance. Someone must do the work. In the case of a TCP/IP driver on a host computer system, the CPU must process the TCP routines as datagrams are transmitted and received, resulting in 50 percent to 80 percent CPU utilization for intensive, high-speed TCP/IP transactions. In addition, the extra transmissions required for session establishment, acknowledgments, and session teardown place more traffic on the network infrastructure. But without the transport layer delivery guarantees provided by TCP/IP, the potential burden on the network and host CPUs might be even greater, particularly if upper-layer applications were forced into constant error recovery and retransmission of larger units of data.

TCP introduces procedures for establishing communications between two network entities, segmentation of messages for hand-off to the IP layer, sequence numbers and acknowledgments to track the transmission of bytes across the link, a ramping and back-off algorithm to pace the traffic flow, and recovery routines to handle packet loss through the network. Like UDP, TCP uses port numbers to facilitate communications between upper-layer applications. The TCP port number and IP address can be combined to create unique identifiers for abstractions known as sockets. A socket represents the end point of a TCP session.
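
A socket, then, is just the pairing of an IP address and a TCP port at each end of a connection. The minimal Python sketch below opens a loopback TCP connection and echoes a small block of data; the port number and addresses are arbitrary illustration values, and real storage traffic would of course involve far more elaborate exchanges.

# Minimal sketch of the socket abstraction: an (IP address, TCP port) pair
# identifies each end point of a TCP session. Port 5050 and the loopback
# address are arbitrary illustration values.
import socket
import threading

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 5050))      # the server's socket: IP address + TCP port
srv.listen(1)

def echo_once():
    conn, peer = srv.accept()      # peer is the client's (IP address, port) pair
    with conn:
        conn.sendall(conn.recv(1024))   # echo the received bytes back

threading.Thread(target=echo_once, daemon=True).start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 5050))    # TCP connection setup (three-way handshake)
    cli.sendall(b"block of application data")
    print(cli.recv(1024))               # b'block of application data'
srv.close()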

Because TCP is a connection-oriented protocol it requires acknowledgment for each TCP segment transmitted. If TCP waited for an acknowledgment before sending the next segment in queue, however, performance would be adversely affected. To optimize flow control while maintaining acknowledged service, TCP uses a sliding window, which allows multiple segments to be sent before acknowledgments are received. The window size is negotiated during connection setup and typically reflects the buffering capability of the communicating devices. The larger the receive buffers, the greater the number of segments that can be transmitted en masse before acknowledgments are issued.

Although TCP requires an initial setup between two devices, the logical connection that is established does not specify the bandwidth available for data transport. The underlying network could be a relatively slow WAN link, an oversubscribed network, or a wide open switched gigabit infrastructure. Because the network may be unreliable, TCP makes no assumptions about how quickly it can transmit its data. Instead, it probes the network capacity by gradually increasing the number of packets sent until congestion is detected or the sliding window value is reached. This ramping algorithm allows TCP to throttle transmissions and so adjust for variable network conditions.

The TCP slow start algorithm tests the rate at which TCP can inject segments into the network by observing the latency of acknowledgments that are returned by the destination device. The more quickly acknowledgments are received, the more quickly additional segments can be issued. With slow start, a congestion window value (cwnd) is initialized to a value of one segment. As each ACK is received from the destination, the congestion window is incremented exponentially. After the first ACK is received, the congestion window is increased to two segments. When those are acknowledged, the window increases to four segments, and so on.

The sliding window algorithm depends on the buffering capacity of the destination device—that is, how quickly it can process its receive buffers and return acknowledgments. The slow start algorithm also relies on responses from the destination, but it allows the source to limit segment transmissions until confidence in the underlying network is established.

For streaming storage applications such as backup, slow start has no significant impact on performance. For storage applications that may require exchanges of smaller frames, such as on-line transaction processing, slow start may affect throughput, especially over long distances.
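
A simplified simulation makes the exponential ramp of slow start easy to see. In the Python sketch below, the receive window size and the number of round trips are invented values; real TCP stacks also factor in congestion-avoidance thresholds and loss signals that are omitted here.

# Simplified sketch of TCP slow start: the congestion window (cwnd) starts at
# one segment and doubles each round trip as ACKs return, until it reaches the
# receiver's advertised window. Window sizes and RTT count are illustrative only.

def slow_start(receive_window_segments: int, round_trips: int):
    cwnd = 1
    for rtt in range(1, round_trips + 1):
        in_flight = min(cwnd, receive_window_segments)
        print(f"RTT {rtt}: send {in_flight} segment(s)")
        cwnd = min(cwnd * 2, receive_window_segments)  # exponential growth, capped

slow_start(receive_window_segments=16, round_trips=6)
# RTT 1: send 1 segment(s) ... RTT 5 and 6: send 16 segment(s) (window limit reached)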

In the event of errors or dropped packets due to network congestion, TCP's constant monitoring of the session enables it to recover packets from the point of failure. This ensures that although individual packets may be dropped, they will be retransmitted and the data recovered. On a well-designed network, TCP recovery is rarely invoked. Nonetheless, the TCP layer at both ends of the transaction must continue monitoring sequence numbers and acknowledgments.

TCP processing can consume significant host CPU cycles, as evidenced by the sluggish behavior of any PC when downloading large files from the Internet. New technologies for offloading TCP processing from the host are now available on some Gigabit Ethernet adapter cards. TOE (TCP offload engine) chip sets remove the burden of TCP overhead from the host CPU and thus free the CPU for user processing. Chapter 7 examines the use of TOE technology for accelerating storage over IP.

TCP is a form of insurance against unforeseen network disruptions. Session control, sliding windows, slow start, packet recovery, and in-order delivery ensure that despite network difficulties, data will still arrive intact. Although this technology satisfies one of the requirements of mission-critical storage applications, it does not mean that storage data should be consigned arbitrarily to any TCP/IP network. To avoid timeouts at the upper SCSI layer and degraded performance for storage applications, the SAN architect must also ensure that the network infrastructure is properly designed for the intended load. Oversubscription of links may be permissible for some applications and anathema to others. Like any insurance, TCP recovery provides security against loss, but you may also suffer penalties if it is invoked too often.

6.2 Native IP Storage Protocols

A native IP protocol is one that maps a unique IP address to an individual networked device. In the earlier discussion on FCIP tunneling, you learned that IP addresses are assigned to the FCIP tunnel gateways but not to individual storage devices whose traffic passes through the IP tunnel. By contrast, both the Internet Fibre Channel Protocol (iFCP) and Internet SCSI (iSCSI) assign individual IP addresses to each storage participant. In this way, traffic from any device can leverage the capabilities of IP routing and TCP session control through the network, and you can use traditional IP network management tools to monitor conversations between initiators and targets.

The fact that there are two protocols in this category reflects the transitional nature of SAN plumbing. iFCP makes no mention of iSCSI and instead focuses on mapping Fibre Channel end devices to an IP transport. This lets you build SANs without Fibre Channel fabrics and overcome the problematic FCIP tunneling issues. iSCSI, for its part, makes no mention of Fibre Channel at all and instead focuses on building pristine IP SANs. iFCP therefore provides a migration path from Fibre Channel to IP SANs, whereas iSCSI attempts to create IP SANs from the ground up.

Because both iFCP and iSCSI are based on traditional TCP/IP networking, standardization is under the jurisdiction of the IETF. Standardization is not an end point but rather is an ongoing process as protocols are developed and augmented by auxiliary services. Both iFCP and iSCSI rely on IP routing protocols and TCP session control to ensure in-order delivery of data and packet-level recovery in the event of network congestion. The IP and TCP mechanisms therefore provide the foundation for transporting data reliably over IP network infrastructures.

6.2.1 Internet Fibre Channel Protocol

The Internet Fibre Channel Protocol (iFCP) is a gateway-to-gateway protocol for providing Fibre Channel fabric services to Fibre Channel end devices over a TCP/IP network. iFCP combines the fault isolation properties of autonomous regions with the scalability of the IP network. Although iFCP allows attachment to existing Fibre Channel fabric switches, an iFCP gateway also allows direct attachment of Fibre Channel end devices. As a result, you can create IP-based SANs that accommodate Fibre Channel storage and hosts but do not require Fibre Channel fabrics or fabric management.

As illustrated in Figure 6-10, an iFCP gateway has a dual personality. A Fibre Channel N_Port attached to an iFCP gateway is served by a standard F_Port interface on the gateway. This implies that the gateway can support fabric services, such as fabric login, SNS registration and queries, and State Change Notifications. Between the Fibre Channel device domain and the IP domain, iFCP must translate between Fibre Channel behavior and IP behavior. At the iFCP layer, a Fibre Channel device's 24-bit fabric address is mapped to a unique 32-bit IP address, providing native IP addressing for individual Fibre Channel initiators and targets. In place of the Fibre Channel lower-layer transport (FC-2), iFCP substitutes TCP/IP for reliable transmission over the IP network.

Figure 6-10. A simple IP SAN based on the iFCP gateway protocol

After Fibre Channel-originated data has been converted to IP format, it can be sent anywhere within the IP routed network. Discrete IP mapping thus enables convergence of SAN, LAN, and WAN storage applications using existing Fibre Channel assets. iFCP also enables use of mainstream Gigabit Ethernet switches to build the SAN core, an attractive option for customers who already have established acquisition and support channels for Gigabit Ethernet director and departmental switches.

A unique contribution of iFCP is the ability it gives you to create gateway regions that provide communications between multiple sets of Fibre Channel end devices and yet isolate potential disruptions within each region. Because iFCP gateways do not pass Fibre Channel switch-to-switch protocols from site to site, you avoid the need to create an extended fabric. Disruptive behavior within a local SAN is isolated to that gateway region and is not, as would be the case with FCIP, propagated throughout the network. The ability to preserve the autonomy of each local site and yet provide connectivity to other sites contributes to the stability of the IP SAN and restricts fabric-inspired disruptions.

Discrete mapping of individual Fibre Channel devices to IP addresses also enables any-to-any IP routing, as shown in Figure 6-11. Fibre Channel-originated storage conversations are not restricted to IP tunnels and can be routed to any desired destination in the IP network.

Figure 6-11. iFCP any-to-any communication across an IP routed network


This strategy provides flexibility in device deployment for wide area, metropolitan, and local IP SAN configurations and enables advanced disaster recovery implementations such as data replication between multiple sites.

Compared with simple FCIP tunneling, iFCP actively intervenes in the transportation of frame traffic between N_PORTs. The iFCP gateway must constantly manage Fibre Channel transactions and intercept port login requests when the boundary between Fibre Channel and IP domains is crossed. When an E_Port is attached to a Fibre Channel switch, the iFCP gateway must emulate a fabric while terminating the E_Port connection at the local site. In the network shown in Figure 6-11, for example, the Fibre Channel switch would see its locally attached iFCP gateway as a terminal point in the local fabric. All the Fibre Channel devices at the far sides of the cloud would appear to be attached directly to the local iFCP gateway. The iFCP gateway thus presents itself as a large virtual fabric switch with potentially hundreds or thousands of directly attached devices, although in reality those devices may be dispersed throughout an IP network.

The complexity of iFCP compared with FCIP tunneling has encouraged Fibre Channel switch vendors to select FCIP for SAN extension. In the process, however, the vulnerabilities of stretched E_Port connections and distributed fabrics are perpetuated. The engineering challenge for iFCP gateway design is to perform complex functions at wire speed while still abiding by both Fibre Channel and IP standards.

A native IP solution for Fibre Channel-originated traffic may actually extend the value of Fibre Channel storage assets. After Fibre Channel data has been converted to iFCP, it can benefit from established QoS and security services originally created for IP messaging. These enhancements are not yet available in Fibre Channel switch products but are readily available in IP network equipment. Consequently, the protocol conversion provided by iFCP amplifies the capabilities customers can leverage for the investment they have already made in Fibre Channel storage and hosts.

6.2.2 Internet SCSI (iSCSI)

The development of the iSCSI protocol fulfills customer requests dating back to the initial marketing of SANs. The ability to create storage networks with familiar and supportable Ethernet and IP technology was not feasible without gigabit performance and an IP protocol optimized for block storage data. Now that SANs have proven their end-user value and Gigabit Ethernet is readily available, development of iSCSI has received backing by all major storage and storage solutions providers.

The iSCSI protocol standard omits any reference to Fibre Channel because iSCSI replaces both Fibre Channel fabrics and Fibre Channel device interfaces with a native IP protocol. Whereas iFCP is intended to provide a migration path from Fibre Channel to IP SANs, iSCSI has no migration component. Both initiators and targets are iSCSI, with standard IP routing and Ethernet providing the link between them.

One of the fundamental contradictions that iSCSI has had to address is the issue of network intelligence. In Fibre Channel SANs (as well as iFCP gateways), the network interconnect must supply intelligence in the form of automatic address allocation, login, device discovery, and state change notification. This intelligence is required because storage targets are passive nodes in the network, playing a role that is analogous to their passive position on a SCSI daisy chain. IP networking, by contrast, assumes intelligence in the end nodes and little intelligence in the network itself. Specifically, current-generation IP network equipment may not be storage-aware. Because iSCSI cannot assume storage intelligence in the Gigabit Ethernet switch or IP router complex, an iSCSI device itself must have alternative means to deal with storage-specific phenomena. Some of the proactive functionality may be in the end device, whereas some may be supplied by additional boxes (such as SNS servers) attached to the network.

As with the FCP layer in Fibre Channel, iSCSI's primary task is to encapsulate SCSI commands, status, and data. At the core of an iSCSI packet, the traditional CDB (command descriptor block) performs the real work of reading or writing blocks of data to disk or tape. As shown in Figure 6-12, the SCSI CDB is carried in an iSCSI protocol data unit (PDU) in the same way it is carried by the Fibre Channel FCP layer. The iSCSI PDU contains other information to monitor the status of transactions between initiators and targets and in turn rides upon the TCP session control and IP routing layers.

Figure 6-12. The iSCSI protocol stack


One or more TCP connections are established to support an iSCSI session between initiator and target. Additional TCP connections enable more concurrent transactions to be performed, thus optimizing utilization of the protocol processing and the link.

For routing purposes, each iSCSI device has a unique IP address. If the device is moved from one part of the network to another, its IP address might necessarily change. Consistent identity of a device regardless of its network address must therefore be supplied with something comparable to a Fibre Channel World-Wide Name (WWN). For iSCSI, as many as 255 bytes can be used to assign a unique identity to an iSCSI node. This potentially very long iSCSI name allows for both WWN-type identity and human-readable names that might specify location, device type, or departmental ownership.

In Fibre Channel fabrics, the first act of an attached device is to log in to the fabric switch and receive a Fibre Channel address. iSCSI also provides a login service, but instead of being directed to the network switch, the login is performed to another iSCSI device. The network IP address has already been assigned, either through manual configuration or via a DHCP (Dynamic Host Configuration Protocol) server. The iSCSI login sequence is used to negotiate any variable parameters between an initiator and a target and may invoke a security routine to authenticate allowable connectivity. If the connection is successful, the target issues a login acceptance to the initiator; otherwise, the login is rejected and the connection broken.

After login is completed, the iSCSI session enters into normal SCSI transactions. If multiple connections for that session have been established, individual command/response pairs must flow over the same TCP connection. This connection allegiance ensures that specific read or write commands are fulfilled without the additional overhead of monitoring multiple connections to see whether a particular request is completed. A SCSI write, for example, would be performed over a single connection until all data was transmitted. Unrelated transactions, however, could simultaneously be issued on their own connections during the same session.

As shown in Figure 6-13, an iSCSI transaction between initiator and target is very similar to an FCP transaction.

Figure 6-13. Execution of an iSCSI write operation


iSCSI protocol data units are used to send commands, status, and data, with Ready to Transfer (R2T) PDUs fulfilling the role of upper-layer SCSI flow control between target and initiator. In the SCSI write operation depicted in Figure 6-13, the R2Ts are issued by the target device to solicit data as receive buffers become available. At the completion of the write, the target issues status and sense, indicating a successful transaction.
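
The command/R2T/data/status exchange in Figure 6-13 can be summarized schematically. The Python sketch below only prints the sequence of PDUs for a hypothetical write; the transfer size and target buffer size are invented, and real iSCSI PDUs carry many additional fields (task tags, sequence numbers, and so on).

# Schematic sketch of the iSCSI write exchange in Figure 6-13: the initiator
# sends a SCSI write command, the target solicits data with R2T PDUs as its
# receive buffers free up, and the transaction ends with status. Buffer and
# transfer sizes are invented for illustration.

def iscsi_write(total_bytes: int, target_buffer: int):
    print(f"Initiator -> Target: SCSI Command PDU (WRITE, {total_bytes} bytes)")
    offset = 0
    while offset < total_bytes:
        chunk = min(target_buffer, total_bytes - offset)  # target solicits what it can buffer
        print(f"Target   -> Initiator: R2T PDU (offset={offset}, length={chunk})")
        print(f"Initiator -> Target: Data-Out PDU ({chunk} bytes)")
        offset += chunk
    print("Target   -> Initiator: SCSI Response PDU (status: GOOD)")

iscsi_write(total_bytes=256 * 1024, target_buffer=64 * 1024)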

iSCSI sessions and their associated TCP connections normally remain open, awaiting additional SCSI commands from the upper-layer applications. An initiator, for example, typically has assigned disk resources in the SAN and rarely breaks connection with them unless rebooted. In an iSCSI environment, however, initiator activity may require more TCP connections for some transactions than others, and so allowance is made for selectively logging out of previously established connections. In some instances, such as downing a resource for maintenance, it may be desirable to log out of a session and all connections completely. The iSCSI logout command supplies reason codes for terminating sessions or connections within a session, in the latter case using the connection ID (CID) to specify which connection should be taken down. In the event of a connection error, the logout command can be issued on an alternative connection (or a newly established one) to clean up the problematic TCP connection.

As the iSCSI protocol has worked its way through the standardization process, security of storage data over IP networks has been a central concern. Early standards drafts provided login processes for negotiating a common security methodology including public/private key exchanges and Kerberos. An attempt to mandate inclusion of more sophisticated IPSec encryption, however, met opposition by vendors who feared that burdening every iSCSI adapter card with the cost of encryption ASICs would undermine iSCSI adoption. (IPSec is discussed in Section 6.5.) In the end, customers will determine where and at what price they need the safeguards of data encryption. Top of the line iSCSI offerings may include on-board IPSec capabilities, whereas other iSCSI devices may depend on third-party security functionality provided by IP routers or dedicated firewall products.

The two IP-based technologies that are facilitating iSCSI for SANs are TCP offload engines and 10Gbps Ethernet. TOEs are highly attractive for any TCP/IP processing and have a much broader market appeal than strictly storage-specific applications. By reducing host CPU overhead, TOEs enable iSCSI adapters to provide performance and utilization comparable to Fibre Channel HBAs. Similarly, 10Gbps Ethernet is being driven by the broader IP networking market, but it is especially suited for high-volume IP storage requirements. In addition to 10Gbps Ethernet interswitch links for building large IP SAN core infrastructures, 10Gbps Ethernet interfaces on storage arrays will enable simultaneous access for much higher populations of servers than traditional SANs. This capability helps to amortize the cost of storage access without sacrificing performance.


6.3 Discovery in IP SANs

If hosts and storage targets are plugged in to a common network, only connectivity between them has been achieved. For initiators to actually communicate with targets, a discovery mechanism is required. In Fibre Channel, the discovery process is enabled by the Simple Name Server (SNS) in each fabric switch. Because all storage targets register with the SNS, initiators can simply query the SNS to identify potential resources. Unlike Fibre Channel fabrics, IP networks do not automatically provide discovery services. Typically, such advanced services are provided by a third-party server installed somewhere in the network—for example, Domain Name System (DNS) servers that resolve uniform resource locator (URL) and IP address associations. Concurrent with the development of IP storage transport protocols, then, IP protocols to facilitate device discovery have also been required.

The two protocols under consideration for discovery in IP SANs are the Service Location Protocol (SLP) and the Internet Storage Name Server (iSNS). SLP is an IP protocol that defines how queries can be directed to a server that in turn supplies IP addresses or names for designated storage resources. The iSNS protocol also provides IP address resolution but includes additional features for zoning, authentication, and the detection of changes in the storage network configuration.

6.3.1 Service Location Protocol (SLP)

Although SLP was not designed for storage, it provides a means to perform elementary discovery of storage targets by initiators. Like FCIP, SLP offers a path of least resistance for vendors and is much simpler to implement than iSNS. The trade-off for vendor convenience, however, is that the SLP mechanism is not scalable to large deployments and does not offer the enhanced zoning and security features proposed by the iSNS protocol.

SLP uses a client/server scheme composed of user agents (UAs), service agents (SAs), and a directory agent (DA), as shown in Figure 6-14. The user agent can reside in an initiator or target and is responsible for establishing contact with service agents or a directory agent. The service agent maintains a list of storage resources and advertises their availability to interested user agents. A directory agent is a central repository of services; in effect, it is a management server installed somewhere in the network.

Figure 6-14. SLP user, service, and directory agents for iSCSI discovery


An SLP service agent maintains a list of storage resources as service URLs. A service URL may include the IP address, the TCP port number, and a device name—for example, 10.1.30.21:5003/Fred. If the IP address is included, it is assumed that the address is static and will not change over a given time. If a URL-formatted iSCSI name with no address is used, an additional step for name resolution is required. Having obtained the URL of a storage resource, an initiator would have to query a DNS server to find the current IP address of that device. An SLP listing of targets may also have a scope entry, indicating which initiators are authorized to access the storage device.
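
Pulling the pieces out of such a service URL is straightforward. The Python sketch below parses the 10.1.30.21:5003/Fred example from the text; it is a simplification of real SLP service URL syntax, shown only to make the address/port/name structure concrete.

# Sketch: pulling the IP address, TCP port, and device name out of an SLP-style
# service URL of the form shown in the text (10.1.30.21:5003/Fred). This is a
# simplification of real SLP service URL syntax.

def parse_service_url(url: str):
    address, _, name = url.partition("/")
    ip, _, port = address.partition(":")
    return {"ip": ip, "port": int(port), "name": name}

print(parse_service_url("10.1.30.21:5003/Fred"))
# {'ip': '10.1.30.21', 'port': 5003, 'name': 'Fred'}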

The SLP proposal for IP storage discovery relies on the initial service request that an initiator sends when it comes onto the network. There is no equivalent of a state change notification if a storage resource is removed from the network. The initiator would instead have to rely on iSCSI session failure for notification. And although SLP includes a scope entry to identify authorized hosts for a target, there is no equivalent of Fibre Channel zones or zone sets.

This streamlined discovery mechanism is suitable for small IP SAN installations but may be difficult to scale to enterprise-class storage networks. The lack of asynchronous notifications, zoning capability, and security features may simplify the engineering tasks of vendors who implement SLP, but it narrows their market to small iSCSI configurations.

6.3.2 Internet Storage Name Server (iSNS)

The Internet Storage Name Server protocol combines the Fibre Channel Simple Name Server with Domain Name System (DNS) functionality from IP networking. For proactive notification of resource availability, iSNS provides state change notification to designated initiators; for zoning of resources, it provides discovery domains and domain sets. Security between initiators and targets is also enabled through public key/private key administration on an iSNS server.

Because iSNS borrows database objects from SNS, it can support discovery of both iFCP gateway-attached Fibre Channel devices and iSCSI devices. For heterogeneous SANs, iSNS thus facilitates discovery and zoning of both Fibre Channel and iSCSI targets by any mix of Fibre Channel and iSCSI hosts.

The iSNS protocol is based on a client/server model that can be implemented on IP storage switches or distributed among multiple switches and dedicated iSNS server platforms. The ability to accommodate both centralized and distributed models makes iSNS capable of scaling from small installations to enterprise-wide storage networks.


Enterprise-class platforms such as Microsoft .NET can include both iSNS client and server agents.

iSNS features include facilities for registration, discovery, and management of IP storage resources as well as for zoning and state change management. The name registration service enables IP storage devices to register their attributes and address, as with the Fibre Channel SNS. Initiators can then query the iSNS to identify potential targets. Zoning functionality is provided by discovery domains (DDs), which restrict the discovery of IP storage targets to authorized initiators. State change notification alerts iSNS clients to any change in status of a registered device or reconfiguration of the client's discovery domain.

iSNS discovery domains enable a device to participate in one or more zones. Like Fibre Channel zones, DDs must be administered manually, at least for the initial establishment of functional groups within the network. By default, a new device is isolated from the storage network until a management workstation assigns it to a specific discovery domain. This arrangement prevents inadvertent access by unauthorized initiators. After a DD has been configured for the device, state change notification is used to alert authorized initiators that a new resource has been added to the domain.

iSNS also supports discovery domain sets (DDS). Analogous to a zone set in Fibre Channel, a DDS can be used to quickly reconfigure an IP SAN for different application requirements. One DDS, for example, could include a tape resource in an NT discovery domain for one configuration, whereas an alternative DD configuration could move the tape device into a UNIX discovery domain.
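
Discovery domains can be pictured as simple set membership. The Python sketch below is a hypothetical model, with invented device names and domain groupings, showing how an initiator's discovery query would return only the targets that share a domain with it, and how reassigning a device (as a DDS reconfiguration might) changes what is visible.

# Hypothetical sketch of iSNS discovery domains (DDs): an initiator's discovery
# query returns only the targets that share a domain with it. Device names and
# domain memberships are invented for illustration.

discovery_domains = {
    "DD_NT_backup": {"nt_host_1", "tape_library"},
    "DD_unix_oltp": {"unix_host_1", "disk_array_A"},
}

def discover(initiator: str):
    """Return the targets visible to an initiator through its discovery domains."""
    visible = set()
    for members in discovery_domains.values():
        if initiator in members:
            visible |= members - {initiator}
    return visible

print(discover("nt_host_1"))    # {'tape_library'}
print(discover("unix_host_1"))  # {'disk_array_A'}

# A domain set change could, for example, move the tape library into the
# UNIX domain for a different backup configuration:
discovery_domains["DD_unix_oltp"].add("tape_library")
print(discover("unix_host_1"))  # {'disk_array_A', 'tape_library'}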

As shown in Figure 6-15, an iSNS server can reside anywhere within the IP network, accessible by iFCP or iSCSI clients. One or more management workstations are used to configure and monitor the iSNS server, either by iSNS protocol or Simple Network Management Protocol (SNMP). Because iSNS provides a common resource for a variety of IP storage types, each can register with and query the iSNS server for information relevant to the functionality the device supports. An iFCP gateway, for example, could query the iSNS server for the existence of iSCSI storage targets and, if it provided iFCP to iSCSI translation, proxy additional entries that would make those resources available to iFCP-attached initiators.

Figure 6-15. A centralized iSNS server for a multiprotocol IP SAN

After devices have registered with the iSNS server, zoning information is supplied by a management workstation. With the appropriate DDs defined via management, the iSNS server can notify the clients that a reconfiguration of the network has occurred. The state change notification issued by the iSNS server will prompt any initiators to query the iSNS for available resources.

As the central repository of data for device discovery and discovery domain enforcement, the iSNS server is a logical place to host security services. As part of the iSNS registration process, for example, an IP storage device could register its X.509 Public Key Certificate with the iSNS server. After DDs are established, the iSNS server can distribute the appropriate public keys between devices within the same domain.

Compared with SLP, iSNS requires more engineering for iSCSI adapters, iSCSI targets, and iFCP gateways. That is a vendor issue, however, and should be transparent to the customer. The advanced services provided by iSNS make it more suitable than SLP for enterprise storage networking and the preferred discovery mechanism for SANs that include both Fibre Channel and iSCSI components.


6.4 Quality of Service for IP SANs

IP storage is the opportunistic beneficiary of enhanced services that have already been developed for standard IP networking. As discussed in Section 6.1.1, quality of service (QoS) mechanisms provided by Gigabit Ethernet can be applied to both iSCSI traffic and Fibre Channel-originated traffic that has been converted to iFCP. QoS enables expedited delivery of higher-priority frames over lower-priority frames and so facilitates use of a common infrastructure to carry different types of storage data. As shown in Figure 6-16, prioritizing on-line transaction processing over tape backup streams ensures that the backup process will not disrupt incoming customer orders and revenue.

Figure 6-16. Traffic prioritization through a shared Gigabit Ethernet transport

In addition to link layer traffic prioritization, IP networking standards support policy-based traffic management through Differentiated Services (DiffServ). As with header bit manipulation in IEEE 802.1p prioritization, DiffServ uses IP header bits to indicate as many as 64 different classes of service. The expanded classes of service options are rule-based and are meant to achieve various guarantees of delivery. How a vendor implements a specific class of service may vary. The result, however, should be the same, even across multivendor IP network infrastructures.

Other quality of service offerings include Multi-Protocol Label Switching (MPLS) and Resource Reservation Protocol (RSVP). Both protocols establish data paths through the network to speed traffic flows and enable QoS enforcement. MPLS uses a packet labeling scheme to ensure that every packet is processed individually and handled according to its relative priority. RSVP, by contrast, establishes reserved routes through the network to guarantee available bandwidth from source to destination. This avoids individual packet processing but requires extensive coordination between network routers to establish and tear down routes.

Whether at layer 2, layer 3, or switch-to-switch protocols, leveraging Gigabit Ethernet and IP quality of service for storage data requires end-to-end QoS protocol support by the intervening network equipment.

6.5 Security for IP SANs


Security for storage data has trailed behind the deployment of shared storage technologies. In part, this lag has resulted from the physical isolation of SANs from the rest of the data communications network. In part, security and management issues could not be addressed until SAN technology itself achieved stability and an installed base. In contrast, storage data over IP networks has focused more attention on security because that data is now potentially exposed to untrusted network segments. This sensitivity in turn has exposed the security weakness of Fibre Channel SANs, and a new industry has developed specifically to address SAN security. NeoScale, for example, has developed products that enable Fibre Channel-originated storage data to be encrypted to disk, providing a secure solution for highly sensitive information stores.

SAN security involves access, authentication, and data encryption. Rudimentary access security can be provided by zoning, LUN masking, or access control lists (ACLs). As with quality of service, IP networking brings additional tools for securing storage data. At the link layer, VLAN tagging can be used to enforce the equivalent of Fibre Channel zone segregation. The iSNS protocol discussed earlier can be used to enforce zoning through discovery domains.

Authentication and data encryption are well-defined facilities in IP networking, with specifications provided under the banner of IP Security, or IPSec. The IPSec architecture is defined in the IETF RFC 2401 standards track document, "Security Architecture for the Internet Protocol." IPSec has two main components: authentication of the identity of communicating peers in the network and data encryption. Authentication is defined in RFC 2402, "IP Authentication Header," and payload encryption is defined in RFC 2406, "IP Encapsulating Security Payload (ESP)." Because various authentication and encryption methods have evolved over time, IPSec does not specify an individual solution. The specific type of authentication or encryption to be used is negotiated by the end devices or is established by IPSec products such as firewalls or IP routers with IPSec engines.

Authentication and encryption methods are dependent on encryption keys. A key is a value that is applied in an encryption algorithm against standard data. The longer the key, the more difficult it is to decipher the newly encrypted data. The Data Encryption Standard (DES), for example, uses a 56-bit key, allowing for as many as 72 quadrillion possible keys that can be applied to a datagram. In a well-publicized challenge, however, a DES-encrypted message was cracked, resulting in a more rigorous triple DES implementation. In triple DES, data blocks are passed through three different keys to thoroughly encrypt them.

Although secret keys such as DES can be used at both source and destination to encrypt and decrypt user data, both sides must have the same list of keys to properly decipher arriving datagrams. Securely exchanging secret key lists is therefore an issue. Another approach, provided by Public Key Infrastructure (PKI) services, employs public and private key pairs for data encryption and decryption. Based on the International Telecommunications Union (ITU) X.509 standard, PKI enables the distribution of public keys over untrusted networks while relying on the security of individual hosts to maintain control over their private keys. A message that is encrypted by a remote host using your public key can be decoded locally using your private key.

The role of IPSec for storage has been a contentious issue in the standardization of iSCSI and IP storage protocols. Ideally, IPSec should be embedded in every networked device so that complete authentication and encryption would be available for every IP-based transaction. The IETF has therefore encouraged incorporation of IPSec in every IP transport protocol, including iSCSI. The reality, however, is that embedding IPSec ASICs on every adapter card would drive the cost of iSCSI NICs much higher and beyond the reach of the broad market. In addition, customers may already have firewalls or IPSec options installed in their IP routers and thus see less value in driving security down to the end device.

In the end, customers must decide which parts of their networks are untrusted and where IPSec should be applied. For storage networking applications, third-party IPSec products are now capable of wire-speed data encryption and unobtrusively appear as "bumps in the wire" when deployed. As shown in Figure 6-17, IPSec hardware can be installed on the boundary of an untrusted network segment. IPSec products see iSCSI or iFCP traffic as simply additional IP datagrams, so even though these security products are not storage-aware, they can provide a valuable service for storage data. When IP security chip sets become more commoditized, it will be possible to justify the cost of implementing security for every networked device.

Figure 6-17. IPSec for Fibre Channel-originated storage data



6.6 Wide Area Storage Networking

Traditionally, storage applications have assumed the close proximity of servers and disks. The transition from direct-attached parallel SCSI connectivity to networked Fibre Channel SANs extended the potential distance between servers and storage to 500 meters, and the introduction of IP storage potentially separates servers and storage configurations by thousands of miles.

The physical connectivity between storage resources is invisible to the upper-layer storage application. The effects of the physical connection are felt only if the SCSI operation is not completed within an expected interval or if the response time of the upper-layer application slows. A frame error on a Fibre Channel transaction, for example, may trigger a retransmission of a sequence of frames, but if the recovery is quick enough, no error is reported to the upper-layer SCSI interface. Similarly, TCP may be invoked to recover a dropped iSCSI PDU, but timely recovery will not disrupt a SCSI read or write operation.

Assuming an error-free link between two devices, two variables may adversely affect the transport: bandwidth and latency. The bandwidth must be sufficient to carry storage data at a performance level expected by the upper-layer application. High-definition video, for example, may require ~130MBps bandwidth for nondisruptive display. If sufficient bandwidth is not provided, the application may falter or fail.

The other major variable is latency. Latency in a link may be induced by network equipment or simply by speed-of-light propagation over distance. Manufacturers of network switch equipment have greatly reduced the amount of latency caused by buffering and routing frames from one port to another. Aside perhaps from quantum tunneling, however, no one has been able to overcome the physical laws of speed-of-light latency. The longer the distance between source and destination, the greater the propagation delay as signaling makes its way toward its target.

Bandwidth typically provided by carrier services is a fraction of the gigabit and multigigabit rates common to SANs. As shown in Table 6-2, available transmission rates range from T1 (1.544Mbps) to multigigabit speeds, with the most commonly deployed services for enterprise networks being T3 (45Mbps) to OC-12c (622Mbps).

Companies that own their own wide area links have a clear advantage in allocating bandwidth to suit specific applications. Companies that must rely on carriers pay substantially for higher bandwidth in the OC-12 (622Mbps) and OC-48 (2.5Gbps) range. In addition, "last mile" runs from the provider to the customer premises can pose additional expense and delays in deploying a WAN solution.

Fortunately, although SANs may run at gigabit or multigigabit speeds, most storage applications do not really require gigabit bandwidth. A tape backup stream, for example, may require only 12MBps–25MBps, although the SAN pipe provides ~200MBps. Similarly, on-line transaction processing applications may have very modest requirements for bandwidth and are not accelerated by virtue of traversing very fast links. The vendor adage that "speed sells" is true, although it is often sold into environments that do not fully utilize it.


This fact is borne out by storage over wide area connections. A T3, for example, supplies 45Mbps of bandwidth, or a payload of approximately 5MBps. That is probably the lower limit of storage over wide area applications, but it is still sufficient for incremental tape backups over distance. Low-volume asynchronous data replication may also survive at T3 speeds or at OC-3c's delivery of approximately 17MBps.

The calculation of throughput of data volume versus bandwidth is fairly straightforward. As shown in Table 6-3, transferring 70 gigabytes of storage data each hour would require an OC-3c link. A terabyte of storage data would therefore take approximately 15 hours over an OC-3 link. Similarly, it would take a little more than 2 days (over, say, a weekend) to back up a terabyte of storage over a T3 link.
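
The arithmetic behind these figures is easy to reproduce. The Python sketch below applies the same conversions used in Table 6-3 (bits to bytes, seconds to hours) and ignores protocol overhead, so the results are approximations.

# The arithmetic behind Table 6-3: megabits per second to gigabytes per hour,
# and the resulting transfer time for a given data volume. Payload overhead is
# ignored, so the figures are approximations.

def gb_per_hour(link_mbps: float) -> float:
    mbytes_per_sec = link_mbps / 8
    return mbytes_per_sec * 3600 / 1000          # MB/s -> GB/hour

def hours_to_move(gigabytes: float, link_mbps: float) -> float:
    return gigabytes / gb_per_hour(link_mbps)

print(f"OC-3c: {gb_per_hour(155):.1f} GB/hour")              # ~69.8 GB/hour
print(f"1 TB over OC-3c: {hours_to_move(1000, 155):.1f} h")  # ~14.3 hours (the "approximately 15 hours" above)
print(f"1 TB over T3:    {hours_to_move(1000, 45):.1f} h")   # ~49.4 hours, a little more than 2 days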

These calculations are predicated on the ability of the storage equipment and IP SAN interconnection to perform at the available bandwidth rate, including wire-speed Gigabit Ethernet. Therefore, the iFCP or iSCSI protocol conversion needs to scale from slower T3 links to full gigabit speeds if you want to avoid having to calculate switch latency into the equation. In addition, for T3 and T1 links it is highly desirable to have rate limiting options to pace Fibre Channel-originated traffic so that it does not flood a slower WAN link.

Table 6-2. Typical Wide Area Network Transmission Rates (Source: R. Fardal)

Services (GigE switch or router on each end)                        Link Speed (bits)       Payload Rate (bytes)

Packet over DS3 and SONET (North America)
  DS1/T1 (not recommended for most storage applications)            1.544Mbps               0.2MBps
  DS3/T3                                                            45Mbps                  4.9MBps
  OC-3c ("c" indicates concatenated clear channel)                  155Mbps                 17MBps
  OC-12c                                                            622Mbps                 67MBps
  OC-48c                                                            2,488Mbps (2.5Gbps)     270MBps
  OC-192c                                                           9,953Mbps (10Gbps)      1,078MBps

Packet over E3 and STM (Rest of World)
  E1 (not recommended for most storage applications)                2.048Mbps               0.2MBps
  E3                                                                34Mbps                  3.7MBps
  STM-1                                                             155Mbps                 17MBps
  STM-4                                                             622Mbps                 67MBps
  STM-16                                                            2,488Mbps (2.5Gbps)     270MBps
  STM-64                                                            9,953Mbps (10Gbps)      1,078MBps

Gigabit Ethernet MAN and WAN Services
  Fractional GigE over GigE ports (max rate is user-configurable)   Up to 1Gbps             Up to 108MBps
  Gigabit Ethernet                                                  1Gbps                   108MBps
  10 Gigabit Ethernet (MAN/WAN interswitch links)                   Up to 10Gbps            1,078MBps

Table 6-3. Calculating Data Throughput Based on Available Bandwidth (Source: R. Fardal)

Link               Mbps     MBps     MB/minute   MB/hour     GB/hour
T1/DS1             1.544    0.193    11.58       694.8       0.6948
T3/DS3             45       5.625    337.5       20,250      20.25
OC-3c              155      19.375   1,162.5     69,750      69.75
OC-12c             620      77.5     4,650       279,000     279
OC-48c             2,480    310      18,600      1,116,000   1,116
Fast Ethernet      100      12.5     750         45,000      45
Gigabit Ethernet   1,000    125      7,500       450,000     450

Megabits per second divided by 8 for megabytes per second
MBps times 60 for megabytes per minute
MB per minute times 60 for MB per hour
MB per hour divided by 1,000 for gigabytes per hour


Bandwidth is only part of the SAN/WAN relationship. Speed-of-light latency may have a more profound effect, especially over intercontinental distances, such as backing up corporate data from Los Angeles to Tokyo (more than 5,000 miles in each direction). Although the speed of light in a vacuum is approximately 186,000 miles per second, transmission through optical cabling reduces speed-of-light propagation to about 100,000 miles per second, or 100 miles per millisecond. Transmission between two points 100 miles apart would therefore incur a millisecond of delay in each direction, or 2 milliseconds round trip. As shown in Table 6-4, latency increases directly in proportion to distance.
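
The rule of thumb is simple enough to capture in a few lines. The Python sketch below uses the approximate figure of 100 miles per millisecond through fiber cited above; the 5,000-mile Los Angeles-to-Tokyo distance is the text's rough estimate.

# Sketch of the propagation-delay rule of thumb used above: roughly 100 miles
# per millisecond through optical fiber, so round-trip latency grows linearly
# with distance.

FIBER_MILES_PER_MS = 100   # ~100,000 miles per second in fiber

def round_trip_ms(miles: float) -> float:
    return 2 * miles / FIBER_MILES_PER_MS

print(round_trip_ms(555))    # ~11 ms with this rule of thumb; Table 6-4 lists 10 ms
print(round_trip_ms(5000))   # ~100 ms round trip for a Los Angeles-to-Tokyo backup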

These latency figures assume minimal contribution by optical switches and IP routers. State of the art wide area network equipment typically contributes less than 200 microseconds of latency, and so it has minimal impact on latency calculations. Substandard equipment, however, may affect performance, so it is advisable to monitor round-trip latency across the WAN service in order to properly calculate the impact of total latency on a proposed storage application.

Latency cannot be resolved by throwing more bandwidth on the link. Whether the transmission speed is T3 or OC-48, speed-of-light latency over a given distance is the same. The implication for storage applications is that delay-sensitive applications, such as synchronous disk mirroring, may not be able to tolerate extremely long links. Streaming applications such as tape backup or content distribution, however, are very tolerant of latency.

In the Promontory Project organized by Nishan Systems, storage applications were successfully run at gigabit speeds between Sunnyvale, California, and Newark, New Jersey. Qwest provided OC-48c links for this transcontinental IP SAN using Cisco IP routers and Ciena optical switches. As testimony to the efficiency of current wide area network equipment, packet loss through this configuration was zero for the several months that storage data was run. A combination of iSCSI and iFCP traffic, as well as conventional data pumping tools such as Intel IOmeter, was used to support tape backup and asynchronous data replication. For this technology showcase, sustained transmission of more than 100MBps in each direction was enabled by enormous buffering on the IP storage switches (256MB per long haul port).

Port buffering cannot overturn the laws of physics, but it does compensate for the effects of latency over long distances. The greater the port buffering, the greater the number of credits that can be issued for any transaction. Instead of issuing a handful of credits and then waiting tens of milliseconds for additional credits to be issued from the receiving side, an IP storage switch can issue hundreds of credits at once. The sending IP storage switch can thus stream frames across the wide area link, filling the WAN pipe at the same time that new credits make their way back from the destination.
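
The buffering requirement can be estimated from the bandwidth-delay product: the amount of data that must be in flight to keep the link full. The Python sketch below uses the OC-48c rate and a 60-millisecond transcontinental round trip consistent with Table 6-4; the 2KB-per-frame figure used to translate bytes into credits is an assumption for illustration.

# Sketch: the bandwidth-delay product determines how much data must be in
# flight (and therefore buffered) to keep a long WAN pipe full. The OC-48c rate
# and a ~60 ms transcontinental round trip follow the discussion above; the
# 2KB frame size used for the credit estimate is an assumption.

def bandwidth_delay_product_bytes(link_mbps: float, rtt_ms: float) -> float:
    return (link_mbps * 1_000_000 / 8) * (rtt_ms / 1000)

bdp = bandwidth_delay_product_bytes(link_mbps=2488, rtt_ms=60)
print(f"Bytes in flight to fill the pipe: {bdp / 1e6:.1f} MB")  # ~18.7 MB
print(f"Approx. credits at 2KB per frame: {bdp / 2048:.0f}")    # ~9,100 credits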

Utilization of the WAN link is also enhanced by increasing the number of outstanding I/Os that an initiator can manage. Concurrent I/Os minimize the bottleneck at the host processing side of the link, allowing more data to be sent in less time.


Table 6-4. Speed-of-Light Latency Increases at Roughly 1 Millisecond per 100 Miles

Distance (km)    Distance (miles)    Latency each way (ms)    Round-trip latency (ms)
  893              555                 5                        10
1,786            1,110                10                        20
2,679            1,664                15                        30
3,572            2,219                20                        40
4,465            2,774                25                        50
5,357            3,329                30                        60
6,250            3,884                35                        70
7,143            4,439                40                        80


time. [ Team LiB ][ Team LiB ]

6.7 Chapter Summary

Gigabit Ethernet

- Gigabit Ethernet is a data-link transport that borrows from both Fibre Channel and conventional 802.3 Ethernet.
- Ethernet framing is used to transport TCP/IP data over Gigabit Ethernet networks.
- 802.1Q VLAN tagging allows segregation of devices on the SAN.
- 802.1p/Q frame prioritization enables mission-critical traffic to be assigned one of eight levels of priority for SAN transport.
- 802.3x flow control provides reliable transport of storage data over connectionless protocols such as UDP/IP.
- 802.3ad link aggregation allows scalability of IP-based SANs with no loss in performance.
- Gigabit Ethernet cabling includes Category 5 UTP as well as standard multimode and single-mode fiber-optic cabling.

TCP/IP

- IP is a layer 3 network protocol that sits on top of the data-link and physical layers.
- IP routing enables communication between different networks or different segments of a single network (or both).
- IP addressing is based on a 32-bit address field, commonly represented in a dotted decimal notation.
- An IP address has a network component and a host component.
- Subnet masking is used to divide a single IP address range into smaller IP segments.
- Classless Inter-Domain Routing (CIDR) was created to overcome IP address starvation and revises the traditional IP class system.
- Address Resolution Protocol (ARP) is used to discover the MAC address associated with a particular IP address.
- Open Shortest Path First (OSPF) is a link-state protocol that calculates optimum routes by availability, bandwidth, traffic load, and other factors relating to links between routers.
- Convergence is the time required to achieve network stability after a network change has occurred.
- TCP initializes a connection between two hosts before data is transferred.
- TCP provides error recovery and reordering of out-of-sequence segments.
- The TCP/IP stack is CPU-intensive for the host system.
- TCP sliding window refers to the transmission of multiple TCP segments before acknowledgments are received.

- TCP slow start prevents TCP from streaming segments onto an unreliable network.
- TCP recovery mechanisms provide for retransmission of lost segments, handling of out-of-order delivery, and discard of duplicate segments by the receiver.
- A stable network infrastructure reduces the latency incurred by TCP recovery processes.
- For IP storage applications, network conditions that force TCP recovery mechanisms should be minimized.

Native IP Storage Protocols

- iFCP is a gateway-to-gateway protocol for integrating Fibre Channel end devices into an IP storage network.
- iFCP storage switches may replace Fibre Channel switches and provide direct connection of Fibre Channel end devices.
- iFCP provides fabric service emulation to translate between FC and IP domains.
- iFCP supports multiple TCP connections for concurrent storage transactions.
- Security for iFCP implementations can be provided by zoning, using public/private keys, and using IPSec.
- iSCSI follows the SCSI client/server model.
- iSCSI assumes that both initiators and targets have native iSCSI interfaces.
- iSCSI commands and data are issued via protocol data units (PDUs).
- PDUs are used to encapsulate SCSI command descriptor blocks (CDBs).
- iSCSI error handling includes recovery of individual PDUs, reestablishment of TCP connections, and reestablishment of iSCSI sessions.
- Security can be provided by Kerberos, public key, IPSec, or other methods.
- To provide adequate performance for SANs, iSCSI adapters require TCP off-load engines.

Discovery in IP SANs

- The Service Location Protocol (SLP) defines user agents (UAs), service agents (SAs), and directory agents (DAs).
- SLP service agents advertise resources by IP address, URL, or both.
- An SLP service agent can reside in an iSCSI storage array, switch, or server.
- SLP is intended for small and medium IP SAN configurations.
- iSNS combines Fibre Channel and DNS functions to provide device discovery for iFCP and iSCSI devices.
- iSNS can be centralized in servers or distributed in IP storage switches.
- iSNS provides device registration, zoning, and state change management for IP SANs.

- iSNS zones are called discovery domains (DDs).
- Discovery domains can be organized into optional groupings known as discovery domain sets (DDS).
- An iSNS server is managed by an external management workstation.
- iSNS can scale to large enterprise-class storage networks.
- The iSNS server can host security services such as public key distribution.
- Authentication keys can be distributed during iSNS login.
- Public key security is scalable to large IP SANs.

Quality of Service for IP SANs

- Quality of service includes traffic prioritization, bandwidth allocation, and timely delivery guarantees.
- Quality of service mechanisms can be leveraged for IP storage networking.
- The IEEE 802.1p standard is a link layer method for assigning priority to a frame.
- 802.1p is enforced through buffer queuing in IP switches.
- DiffServ enables policy-based traffic prioritization.
- Resource Reservation Protocol (RSVP) establishes guaranteed bandwidth for data flows.
- Multi-Protocol Label Switching (MPLS) uses frame labels to expedite traffic through the network.
- A label can be a header inserted in front of a standard IP datagram.

Security for IP SANs

- IEEE 802.1Q VLAN tagging enables separation of storage traffic through a switched IP network.
- VLAN tagging is a standard feature of most Gigabit Ethernet switches.
- IP Security (IPSec) includes authentication of end devices and encryption of user data.
- Authentication and encryption rely on keys.
- A key is a variable value applied in an encryption algorithm that processes blocks of data into encrypted output.
- The Data Encryption Standard (DES) provides a 56-bit key.
- Triple DES uses three separate keys to encrypt data blocks.
- Public Key Infrastructure (PKI) provides public key/private key pairing to facilitate key distribution over untrusted networks.
- IPSec can be implemented for IP SANs via firewalls, within storage switches or gateways, or on individual storage devices.

- Shared network segments require some form of IPSec solution to ensure data security.

Wide Area SANs

- IP storage enables storage data to be transported over long distances.
- Bandwidth and latency are the main variables for wide area SAN design.
- Wide area carrier services range from T1 (1.544Mbps) to OC-192c (10Gbps).
- T3 (45Mbps) is the recommended minimum bandwidth for storage applications.
- Latency is the result of speed-of-light propagation and network equipment processing.
- Current network equipment typically incurs less than 200 microseconds of latency.
- Speed-of-light latency is approximately 1 millisecond for every 100 miles of transit.
- Port buffering helps to offset the effect of speed-of-light latency.
- Concurrent I/Os increase bandwidth utilization over distance.

Chapter 7. IP SAN Products

IP storage combines traditional IP networking with shared storage technologies. The IP SAN plumbing is therefore a mix of conventional Gigabit Ethernet switches and IP routers, as well as storage-specific adapters, gateways, targets, bridges, and management servers. This chapter provides an overview of both established IP network products and the new IP storage products that are enabling the convergence of shared storage and mainstream IP networking. Chapter 12, Application Studies, discusses how these products can be applied for practical storage applications.

7.1 Gigabit Ethernet Switches

Gigabit Ethernet switches form the core infrastructure of nearly all enterprise IP networks. The term switch is somewhat misleading, because typically Gigabit Ethernet switches also perform layer 3 routing of IP packets. The term router still connotes the high processing latency common to first-generation IP routers, whereas switch implies high performance. Current Gigabit Ethernet switches are both layer 2 switches and layer 3 routers, with some products also performing layer 4 through 7 functions.

Gigabit Ethernet switches are marketed as both departmental 8- or 16-port products and data center-class 200+-port products. High port count data center Gigabit Ethernet switches may include high-availability features such as hot-swappable modules, nondisruptive microcode load, failover and redundant processors, power supplies, and fans. These larger Gigabit Ethernet switches are therefore equivalent to Fibre Channel directors in terms of high availability, although the per-port cost is typically half (or less) that of Fibre Channel products.

As with Fibre Channel switches, Gigabit Ethernet switches achieve high port density by using small form factor connectors. Unlike Fibre Channel switches, Gigabit Ethernet switches also support Category 5 cabling and so can provide even higher density through use of standard RJ-45 ports. This enables a Gigabit Ethernet director-class switch to support more than 200 ports in a single chassis.

Switch architecture is typically based on a modular design that allows for a variety of port and processor cards to be installed in the chassis. Vendors then have the opportunity to sell varying densities of ports as well as value-added functionality supported by dedicated processors. In addition to port cards (blades) that provide 1Gbps ports, the vendor may supply blades for 10Mbps/100Mbps connections to workstations or 10Gbps modules for interswitch links, as shown in Figure 7-1. With multiple link speed modules in the same chassis, the blades must be designed with sufficient buffering to avoid flooding slower ports when they receive packets from higher-performance ports.

Figure 7-1. Gigabit Ethernet switch with 10Mbps/100Mbps, 1Gbps, and 10Gbps port modules

As additional port blades are installed, the aggregate bandwidth required for switching increases. In some vendor implementations, a fully loaded switch may result in a blocking architecture if all ports are simultaneously active. The data sheet may not reflect this fact, but you should always query the vendor on the tested performance of the switch at maximum load. Because a switch might support different types of traffic through VLAN segregation, it is important to avoid disruption of storage data as additional modules and load are added to the chassis over time.

Gigabit Ethernet switches are cut-through switches. In a cut-through implementation, the switching decision is made from the destination address alone. For layer 2 switching on the same network segment, only the MAC address is read. For layer 3 switching, the destination IP address is parsed from the IP header. In either case, as soon as the intended destination is known, the rest of the packet is automatically switched to the appropriate port. A store-and-forward switch, by contrast, must buffer the entire packet before a switching decision is made. IP storage switches are necessarily store-and-forward switches, because not only must the entire packet be buffered before switching, but also the contents of the packet itself undergo protocol conversion. The challenge for IP storage switch design is to do this at wire speed so that the switch appears to have cut-through performance.
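
The latency cost of store-and-forward handling is roughly one frame serialization time per hop, which helps explain why a well-engineered store-and-forward IP storage switch can still appear to be cut-through. The sketch below is illustrative only; the 1,500-byte and 9,000-byte frame sizes are assumed examples.

# Rough sketch: added latency of buffering a full frame before forwarding.
def serialization_delay_us(frame_bytes: int, link_mbps: float) -> float:
    """Time (microseconds) to clock one frame onto or off a link."""
    return frame_bytes * 8 / link_mbps   # bits divided by bits-per-microsecond

# Assumed examples: a 1,500-byte Ethernet frame and a 9,000-byte jumbo frame
# on a 1,000Mbps (Gigabit Ethernet) link.
print(round(serialization_delay_us(1500, 1000), 1))   # ~12 microseconds
print(round(serialization_delay_us(9000, 1000), 1))   # ~72 microseconds
# Tens of microseconds per hop is small next to typical storage I/O times.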

The arrival of 10Gbps interswitch link modules in mid-2001 enabled Gigabit Ethernet switches to more easily scale to support thousands of devices. Like 1Gbps links, 10Gbps ports can be link-aggregated to provide fatter pipes between switches. Depending on traffic requirements, interswitch links can be provisioned to create a high-performance multiswitch network core, with fan-out to edge switches to provide departmental connectivity.

Because Gigabit Ethernet switches support layer 2 and layer 3 functionality, they must support both IEEE layer 2 standards and IETF IP routing standards. Layer 2 features include 802.3z compliance, 802.1Q VLAN tagging, 802.1p/Q frame prioritization, 802.3x flow control, and 802.3ad link aggregation. Layer 3 IP features include a long laundry list of IETF Requests for Comments (RFCs), including OSPF routing, RIP, and Border Gateway Protocol (BGP). In addition, vendors may offer auxiliary modules for MPLS and IP security, along with advanced support for management and auditing.

Using Gigabit Ethernet switches for IP storage eliminates the requirement for dedicated director-class storage switches to build data center SANs. From the standpoint of the Gigabit Ethernet core, storage data is only one more variant of IP traffic. Storage-specific functions are provided by iFCP or iSCSI gateways or iSCSI end devices. Customers can elect to devote one or more Gigabit Ethernet switches to storage, or they can create a VLAN for existing Gigabit Ethernet switches to support both storage and messaging traffic. If the network is properly designed and if sufficient attention is paid to application requirements, you can conveniently support block storage data on Gigabit Ethernet switching while taking advantage of layer 2 and layer 3 enhanced services.

Because data center Gigabit Ethernet switches are already built on a modular architecture, you can further integrate IP storage in the mainstream network by incorporating storage services modules in the switch chassis. In addition to simplifying deployment, integration centralizes management of IP storage assets and facilitates support of storage-specific operations such as virtualization.

7.2 IP Routers

IP routers are distinguished from switches by the presence of WAN interfaces and routing services optimized for Internet or wide area connectivity. IP router products range from middling-performance small office devices to sophisticated director-class data center units. An IP router may also be an Ethernet switch within the local environment, providing layer 2 switch ports for end device attachment. Unlike an Ethernet switch, however, an IP router provides one or more wide area ports that enable attachment to a carrier service. The WAN interfaces can be Packet over SONET (POS), ATM, ISDN, frame relay, or point-to-point and can drive bandwidth ranging from T1 to OC-192c.

For basic routing functions, an IP router is not concerned with the type of IP data it is forwarding. Higher-level services can be applied to specific types of data so that, for example, voice over IP meets its delivery requirements. For layer 3 routing, calculation of best paths via the OSPF protocol and maintenance of route tables ensures that incoming packets are forwarded expeditiously to the next hop in the network.

For wide area storage applications, the main requirement for IP routers is simply that they route efficiently and impose minimal latency. A small, low-cost IP router with a single Gigabit Ethernet port and T3 WAN interface may be entirely adequate for remote tape vaulting or resource sharing from a branch office. At the data center, director-class IP routers can receive storage streams from hundreds of remote locations or drive high-performance OC-48 links for disaster recovery between the data center and one or more recovery sites.

Some IP router models also provide options for implementing IPSec over the wide area link, or multiprotocol label switching to enforce QoS requirements. These enhanced services further simplify design and implementation of storage over distance, provided that you have the budget to acquire them.

7.3 iSCSI Adapter Cards

There has been an ongoing debate in the vendor community over what to call iSCSI adapter cards. Vendors from the IP networking community have marketed them as iSCSI NICs, because the cards may simply be enhanced Gigabit Ethernet NICs with optimized iSCSI processing. In contrast, vendors from the Fibre Channel SAN community prefer to call them iSCSI HBAs, or at least iSCSI adapters, to emphasize that such cards have significantly more value than ordinary NICs. The real issue, as always for vendors, is money. SAN hardware has always demanded premium pricing, with Fibre Channel HBAs selling in the $1,200 to $2,000 range. Commonplace IP network equipment, and especially network interface cards, has been driven to commodity pricing. A Gigabit Ethernet NIC, for example, typically sells for less than $500 with optical interface, and less than $100 with copper RJ-45 interface.

Whatever they are called, iSCSI adapters are more expensive than Gigabit Ethernet NICs and less expensive than Fibre Channel HBAs. iSCSI adapters are more expensive than traditional NICs because of the additional processing required to offload TCP and iSCSI processing from the host. TCP off-load engines (TOEs) are a prerequisite for enterprise storage applications. Burdening the host system with intensive network packet processing is unacceptable, and Fibre Channel HBAs have set the expectation that host CPU utilization should be less than 10 percent. Fortunately for iSCSI card manufacturers, the cost of TOE technology is being driven lower by the volume of TOE chips required for the much broader IP networking market.

As shown in Figure 7-2, a fully implemented iSCSI adapter card performs the same basic functions as an ordinary Gigabit Ethernet NIC and includes additional logic for TOE and iSCSI processing. At the interface to the network, the card looks very similar to a Fibre Channel HBA, reflecting Gigabit Ethernet's roots in Fibre Channel signaling. Clock and data recovery, serializing/deserializing (serdes), and 8b/10b encoding are the same. The LAN controller logic performs Ethernet-specific functions, and the TOE and iSCSI layers represent two new additions to the card architecture.

Figure 7-2. Functional diagram of an iSCSI adapter card

Although it is possible to run an iSCSI device driver over a standard Gigabit Ethernet NIC, the penalty for doing so is lost CPU cycles. A host system may expend 50 percent to 90 percent of its CPU cycles simply processing the TCP layer in each packet. In addition, the host would also have to serve the iSCSI device driver, placing an additional burden on host resources. If the specific storage application is tape backup and if it is performed during off hours, a software-only iSCSI implementation may be viable. From a cost standpoint, host connectivity would be free because iSCSI device drivers are available for download from a number of vendor Web sites. For enterprise storage applications, though, the iSCSI adapter should have at minimum an efficient TOE to reduce host network processing to less than 10 percent.
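
A commonly cited (and very rough) rule of thumb for software TCP is on the order of 1Hz of CPU per 1bps of sustained throughput. The sketch below applies that assumption, which is not a figure from this book, to show why a TOE matters; the 800Mbps traffic level and 2GHz host CPU are assumed examples.

# Back-of-envelope sketch using the rough "1Hz of CPU per 1bps" assumption
# for software TCP processing (an industry rule of thumb, not a measurement).
def cpu_utilization_pct(throughput_mbps: float, cpu_ghz: float,
                        hz_per_bps: float = 1.0) -> float:
    cycles_needed = throughput_mbps * 1_000_000 * hz_per_bps
    return 100.0 * cycles_needed / (cpu_ghz * 1_000_000_000)

# Assumed example: 800Mbps of iSCSI traffic on a 2GHz host CPU.
print(round(cpu_utilization_pct(800, 2.0), 1))  # ~40% of the CPU for TCP alone
# With a TOE handling TCP in silicon, the same traffic should leave host
# utilization in the sub-10-percent range expected of storage adapters.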

As shown in Figure 7-3, iSCSI can be implemented in several ways. In the left-hand column, software iSCSI is run over a standard NIC, which places both TCP and iSCSI processing responsibility on the host CPU as described earlier. In the middle column, software iSCSI is run over a TOE-enabled Gigabit Ethernet NIC. With the on-board TOE engine, most network processing has been offloaded, but the host must still process iSCSI exchanges. In the right-hand column, both TCP and iSCSI processing have been fully offloaded. In this case, the card hands already-processed SCSI data to the host's SCSI interface to the operating system. As might be expected, the TOE adds cost to the card, as does the additional iSCSI processing logic. Still, it should be possible to implement a fully offloaded iSCSI card for less cost than a comparable Fibre Channel HBA.

Figure 7-3. iSCSI adapter implementations

iSCSI adapter cards provide flexibility for deploying servers within the network. No longer tethered to Fibre Channel switches, servers can be connected to standard departmental or core Gigabit Ethernet switches or to remote IP routers. On the storage target side, the interface can be iSCSI or an IP storage switch that performs iSCSI-to-Fibre Channel protocol conversion.

iSCSI adapters have begun the migration from Fibre Channel SANs to IP storage networking. With iSCSI on the servers and with the SAN interconnection provided by standard IP infrastructure, at least two-thirds of the SAN components can be IP-based. When target systems also support iSCSI, the entire SAN enters mainstream IP networking.

7.4 iSCSI Storage Devices

The first iSCSI storage target was the IBM 200i disk array. This was a medium- to low-performance storage target that was more of a technology showpiece than a serious contender for market share. Because of its moderate performance and lack of customer appeal, IBM eventually withdrew active marketing of the 200i, causing much gossip in the trade press at the time.

It is unfortunate that the first iSCSI storage target was performance-challenged. As with other poorly performing early market products, the lack of rigorous engineering fostered the belief that iSCSI was unsuitable for mainstream SAN applications. Some vendors themselves have encouraged this impression, marketing their iSCSI solutions for low-end applications such as backup of laptop PCs.

In fact, there is nothing in the iSCSI protocol or product implementation that limits it to the low end of the market. Like FCP, iSCSI is simply a serial encapsulation of SCSI. With TCP offload on end systems, iSCSI can deliver high performance with minimal processor overhead. For manufacturers of storage arrays and tape subsystems, then, iSCSI is simply one more interface that serves as a front end to their storage target.

iSCSI-attached storage arrays and tape offer the same advantages and flexibility as iSCSI adapters. Whereas Fibre Channel cut the umbilical cord of parallel SCSI cabling, iSCSI cuts the tether of Fibre Channel cabling. With 1Gbps or 10Gbps Ethernet interfaces on the storage target, it can be deployed anywhere within the IP routed network, as shown in Figure 7-4.

Figure 7-4. iSCSI targets can be deployed anywhere within the IP routed network

Although engineering an iSCSI interface for a RAID or tape subsystem is fairly straightforward, the one storage device that cannot be easily converted to iSCSI is the JBOD. An iSCSI JBOD would require iSCSI interfaces on individual disk drives. At the moment there is no compelling reason for disk drive manufacturers to support iSCSI in addition to SCSI and Fibre Channel. JBOD support in a pristine iSCSI environment would therefore require an iSCSI-to-SCSI bridge or an iSCSI-to-Fibre Channel gateway.

In terms of storage functionality, the substitution of an iSCSI interface for a Fibre Channel interface does not alter the basic requirements of storage configuration or management. You still need a means to zone authorized resources, manage LUN visibility through LUN masking, provide dual or multiple pathing, support failover, and so on. In addition, with the continued development of array-based storage virtualization, this new intelligence will start appearing behind an IP-based interface. iSCSI enables storage to be more ubiquitous in the network; virtualization will make it much simpler to manage.

7.5 IP Storage Gateways

An IP storage gateway may convert Fibre Channel to iSCSI or Fibre Channel to iFCP, or both, depending on product design. An iSCSI-to-Fibre Channel storage switch is typically used to bring iSCSI-enabled hosts to Fibre Channel-attached targets. IP storage gateway products may be designed around fixed-port configurations, with ports or port blades dedicated to either Fibre Channel or Gigabit Ethernet. Alternatively, multiservice port designs allow any port to be used for either Fibre Channel or Gigabit Ethernet attachment. The latter provides more flexibility but must accommodate both Fibre Channel and Gigabit Ethernet signaling as well as protocols on a port-by-port basis.

The early market product in this case was the Cisco SN 5420, which provided a single Gigabit Ethernet port and a single Fibre Channel port with a substantial price tag. The second-generation version of this product is the SN 5428, which improves the port count to provide two ports of Gigabit Ethernet and eight ports of Fibre Channel at a more reasonable price. The underwhelming throughput of the original SN 5420 contributed to the downplay of iSCSI functionality. By virtue of integrating an eight-port Fibre Channel switch-on-a-chip, the SN 5428 offers improved performance.

iSCSI-to-Fibre Channel protocol conversion requires store-and-forward buffering, frame parsing (also called frame cracking), address translation, and the conversion of FCP commands, status, and data to iSCSI commands, status, and data. Ideally, this protocol conversion should be executed at wire speed, although not all products support full gigabit performance. As shown in Figure 7-5, a fixed-port configuration offers some number of dedicated Gigabit Ethernet ports and typically 8 or 16 Fibre Channel ports. Engineering this type of product is less difficult than designing multiservice ports, but the former is limited by the off-the-shelf components used to create it. Given that the primary function of an iSCSI-to-Fibre Channel gateway is to provide iSCSI host access to Fibre Channel targets, the port ratio is actually the opposite of what it should be. A better design would allocate more ports for iSCSI attachment and fewer to Fibre Channel, or at least an even ratio of iSCSI to Fibre Channel ports. As it is, if each Fibre Channel port is attached to a storage target, each would be grossly underutilized by the relatively few iSCSI hosts that could be installed on the two Gigabit Ethernet ports. In the example given in Figure 7-5, the 2 gigabits of iSCSI host access would leave the 12 gigabits (6 ports x 2Gbps Fibre Channel) available for storage access heavily undersubscribed.

Figure 7-5. A fixed-port iSCSI-to-Fibre Channel storage switch
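
The imbalance can be expressed as a simple subscription ratio. The sketch below is illustrative only and treats the port counts from the Figure 7-5 example as assumptions.

# Simple sketch: host-side versus storage-side bandwidth on a fixed-port gateway.
def subscription_ratio(host_ports: int, host_gbps: float,
                       storage_ports: int, storage_gbps: float) -> float:
    """Ratio of aggregate host bandwidth to aggregate storage bandwidth."""
    return (host_ports * host_gbps) / (storage_ports * storage_gbps)

# Assumed example from Figure 7-5: 2 x 1Gbps iSCSI ports feeding
# 6 x 2Gbps Fibre Channel storage ports.
print(round(subscription_ratio(2, 1.0, 6, 2.0), 2))  # 0.17: storage ~6x undersubscribed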

Other products in this class provide multiservice ports that can be configured for either Gigabit Ethernet or Fibre Channel on any port. This design presents a more difficult engineering challenge, but it overcomes the limitations of fixed-port solutions. As shown in Figure 7-6, a multiservice port design does not split out Fibre Channel and iSCSI connectivity but allows a more flexible assignment of devices. As the ratio of Fibre Channel and iSCSI end devices changes over time, individual ports can assume the appropriate personality to accommodate the change. This design, introduced by Nishan Systems in its IP storage switch, has enabled concurrent connectivity between any mix of Fibre Channel and IP storage end devices. As shown in Figure 7-6, iSCSI hosts have access to Fibre Channel targets, and Fibre Channel hosts can access iSCSI targets.

Figure 7-6. The multiservice port IP storage switch architecture pioneered by Nishan

In both of these examples, E_Port connectivity to existing Fibre Channel fabrics is provided. If the E_Port is properly implemented, this arrangement facilitates integration of new iSCSI products with Fibre Channel SAN assets.

For Fibre Channel-to-Fibre Channel extension, either iSCSI or iFCP can be used. The caveat is that not all storage applications will work if iSCSI is used as the intervening IP transport. EMC's Symmetrix Remote Data Facility (SRDF), for example, was originally engineered for Fibre Channel connectivity and expects FCP layer mapping between two storage arrays. The protocol conversion from Fibre Channel to iSCSI and back to Fibre Channel loses this necessary FCP layer association. Because iFCP preserves FCP commands, it is better suited to SRDF and other FCP-sensitive applications.

For Fibre Channel SAN extension over distance, iFCP gateways let you terminate E_Port connections at each site and thus avoid the problems associated with stretched fabrics, as discussed in Chapter 6. IP storage gateways that support both iFCP and iSCSI protocol conversion offer a single-box solution for stable SAN extension and migration from Fibre Channel to iSCSI storage networking.

7.6 iSCSI-to-SCSI Bridges

As with the Fibre Channel-to-SCSI bridge products reviewed in Chapter 5, iSCSI-to-SCSI bridges bring legacy SCSI tape and disk assets into an IP storage network. Given that 85 percent of organizations and enterprises still use direct-attached storage, the potential market for iSCSI-to-SCSI bridges is quite large. Over time, legacy SCSI disk arrays may be retired, primarily because of the relatively higher cost of maintenance and the scarcity of older SCSI disk drives. SCSI-based tape subsystems represent a substantial customer investment, however, and tend to stay on the books for a long time. iSCSI-to-SCSI bridges enhance the value of these assets, simply by virtue of sharing resources with new iSCSI servers and new SAN-enabled applications.

As shown in Figure 7-7, an iSCSI-to-SCSI bridge can be attached directly to the IP network. From the standpoint of the iSCSI servers, the tape target appears as another iSCSI resource, reported through SLP or iSNS discovery. From the standpoint of the SCSI tape device, the iSCSI servers appear as conventional SCSI hosts to which the tape device has parallel SCSI cabling attachment. Access to the tape resource is managed by a backup server installed somewhere within the IP infrastructure. This server launches backups for each server on a predetermined schedule.

Figure 7-7. Using an iSCSI-to-SCSI bridge to integrate SCSI resources in an IP SAN

The iSCSI-to-SCSI bridge may also contain an extended copy (third-party copy) agent to handle the reads and writes of data directly from iSCSI storage to the tape subsystem. This takes the iSCSI server out of the data path and reduces the total volume of traffic on the IP network. It also frees server CPU cycles for processing user requests.

iSCSI-to-SCSI bridges may also be targets of NAS appliances that support iSCSI in addition to NFS and CIFS file transport protocols. NAS appliances can thus serve files to workstations and servers while leveraging block I/O for more efficient backup of NAS storage over the IP network. In addition, some vendors are adding support for other disk interconnections such as Serial ATA (Advanced Technology Attachment), as well as simple virtualization capability so that disparate disks attached to the bridge can be managed as a single storage pool.

7.7 iSNS Servers

For larger IP SAN installations, one or more iSNS management servers can provide centralized discovery, zoning, and security services. Analogous to DNS servers, iSNS servers reside somewhere within the IP network and are accessible through a well-known address.

As shown in Figure 7-8, centralized administration of iSCSI and iFCP storage assets may be provided by an iSNS server whose policies and discovery domain assignments are managed by a management workstation. In this example, three discovery domains have been created to zone assets apart. Discovery domain A includes an iSCSI host and iSCSI storage target. Discovery domain B includes an iSCSI host as well as a Fibre Channel host and Fibre Channel storage connected by iFCP gateways. Discovery domain C includes all iSCSI and iFCP-attached hosts and an iSCSI-attached tape subsystem. Consequently, servers in discovery domains A and B can see only their assigned storage targets but can share the tape library in common.

Figure 7-8. Centralized IP SAN administration via an iSNS server

In addition, through centralized administration of security keys by the iSNS server, storage conversations between zoned initiators and targets can be authenticated and, if necessary, encrypted. Centralized iSNS management servers can be provided in stand-alone products or can be integrated into more comprehensive storage management platforms.
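
The zoning effect of discovery domains can be modeled as simple set membership: an initiator discovers only the targets with which it shares at least one domain. The sketch below is hypothetical; the device names and domain assignments loosely mirror the Figure 7-8 example and are not from any real iSNS implementation.

# Hypothetical sketch of iSNS-style discovery domains: an initiator can
# discover only targets that share at least one domain with it.
DISCOVERY_DOMAINS = {
    "DD-A": {"iscsi-host-1", "iscsi-array-1"},
    "DD-B": {"iscsi-host-2", "fc-host-1", "fc-array-1"},
    "DD-C": {"iscsi-host-1", "iscsi-host-2", "fc-host-1", "iscsi-tape-1"},
}

def visible_targets(initiator: str, targets: set[str]) -> set[str]:
    """Targets sharing a discovery domain with the given initiator."""
    visible = set()
    for members in DISCOVERY_DOMAINS.values():
        if initiator in members:
            visible |= members & targets
    return visible

# The shared tape library is visible to this host; the other domain's array is not.
print(visible_targets("iscsi-host-1", {"iscsi-array-1", "fc-array-1", "iscsi-tape-1"}))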

7.8 Chapter Summary

Gigabit Ethernet Switches

- Gigabit Ethernet switches provide layer 2 link switching and layer 3 IP routing.
- Gigabit Ethernet switch products range from 8 to more than 200 ports.
- Port density is achieved through small form factor transceivers and RJ-45 connectors.
- A Gigabit Ethernet switch can support multiple transmission speeds.
- Cut-through switching routes packets based on destination address.

- Store-and-forward switching buffers the entire packet before routing.
- 10Gbps interswitch links enable scalable, high-performance core networks.
- Gigabit Ethernet switches support both IEEE and IETF standards.
- Gigabit Ethernet directors can be used to build highly available data center SANs.

IP Routers

- An IP router supports both Ethernet and wide area connectivity.
- WAN links can include Packet over SONET, ATM, ISDN, frame relay, and point-to-point.
- WAN link speeds range from T1 (1.544Mbps) to OC-192c (10Gbps).
- IP storage traffic appears as standard IP packets to an IP router.
- IP routers can incorporate QoS and IP security functionality.

iSCSI Adapter Cards

- iSCSI can be implemented in software or offloaded in silicon to a Gigabit Ethernet NIC.
- iSCSI adapters require TCP off-load engines (TOEs) to minimize host CPU utilization.
- Pricing for iSCSI adapters is midway between standard Gigabit Ethernet NICs and Fibre Channel HBAs.
- iSCSI adapters are changing the composition of SANs from predominantly Fibre Channel to predominantly IP.

iSCSI Storage Devices

- The functionality provided by storage arrays and tapes is not changed by substitution of iSCSI for Fibre Channel at the interconnection.
- iSCSI interfaces on storage targets provide flexibility in deployment of storage assets.
- JBODs will probably remain SCSI or Fibre Channel-attached.
- iSCSI-enabled storage is suitable for both data center and low-end storage applications.

IP Storage Gateways

- IP storage gateways can provide Fibre Channel-to-iSCSI or Fibre Channel-to-iFCP protocol conversion.
- IP storage gateway architecture can employ fixed ports for Fibre Channel or Gigabit Ethernet or multiservice ports that support both.
- IP storage gateways provide E_Port connectivity to existing Fibre Channel fabrics.
- iSCSI may not be suited to data replication between Fibre Channel storage arrays.
- iFCP gateways provide protection against Fibre Channel fabric reconvergence issues.

iSCSI-to-SCSI Bridges

- iSCSI-to-SCSI bridges enable legacy SCSI tape and disk to be integrated into an IP SAN.
- Bridge products can include extended copy agents to facilitate server-free tape backup.
- iSCSI-to-SCSI bridges extend the useful life of SCSI assets by leveraging them for new SAN-enabled storage applications.

iSNS Servers

- iSNS servers allow centralized discovery, zoning, and security for IP SANs.
- iSNS server products can be third-party devices in the IP network or can be integrated into more comprehensive storage management frameworks.

Chapter 8. SAN Software Products

Storage area networking has led to the development of new software applications that leverage the potential of shared storage access. Because storage networking itself was encouraged by software application requirements for performance and resource sharing, the emergence of new storage-specific applications continues the symbiotic cycle of hardware and software evolution. High-availability storage access via server clustering, rationalized data backup and restore, disk-to-disk data replication, and file sharing utilities—all these are predicated on a SAN infrastructure that provides peer-to-peer connectivity. This chapter examines some of the SAN-specific software applications that enhance data access. Later chapters discuss storage management and storage virtualization.

8.1 Server Clustering

Server clustering involves several functional components, including failover, load sharing, and common data access. As long as each server is tied to its own storage via parallel SCSI cabling, you cannot efficiently combine the resources of multiple servers. Separating servers from storage, and then reconnecting them on a high-speed network topology, enables storage to be accessed independently. This, in turn, facilitates deployment of storage for data-centric applications and eliminates the access constraints of the server-centric model. If any server can provide a route to the same data, no server is indispensable.

As shown in Figure 8-1, common access to storage resources is supplied by the SAN interconnection. For servers on the same SAN to be clustered, however, cluster software must be run on each server to coordinate and monitor server-to-server communications.

Figure 8-1. Dual-pathed four-server cluster on a SAN

Failover for clustered servers can scale from 2 to more than 30 servers in a single configuration. Failover strategies are vendor-specific but generally require mutual monitoring of server status via a heartbeat protocol. The heartbeat protocol sends keepalive messages between the servers as well as notification of status or required action. The heartbeat can be run in-band over the Fibre Channel or Gigabit Ethernet SAN infrastructure, or out-of-band over a Fast Ethernet connection. In some implementations, a failover algorithm triggers on the loss of an individual application or discrete server component (for example, failure of a host bus adapter or LAN interface). Other servers in the cluster assume the tasks of an individual server that has suffered a failure. If multiple servers are running the same application, you need additional middleware to ensure that servers do not overwrite shared data.
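
The heartbeat logic itself is conceptually simple: each node expects periodic keepalives and declares a peer failed after several missed intervals. The following is a minimal, vendor-neutral sketch; the interval and miss-count values are assumptions, not figures from any clustering product.

import time

# Minimal sketch of heartbeat failure detection: a peer is declared failed
# after missing several consecutive keepalive intervals.
HEARTBEAT_INTERVAL_S = 1.0   # assumed keepalive period
MISSED_LIMIT = 3             # assumed misses tolerated before failover

class PeerMonitor:
    def __init__(self) -> None:
        self.last_seen: dict[str, float] = {}

    def keepalive(self, peer: str) -> None:
        """Record a heartbeat message received from a peer server."""
        self.last_seen[peer] = time.monotonic()

    def failed_peers(self) -> list[str]:
        """Peers whose heartbeats have been missing for too long."""
        deadline = HEARTBEAT_INTERVAL_S * MISSED_LIMIT
        now = time.monotonic()
        return [p for p, t in self.last_seen.items() if now - t > deadline]

# A real cluster would trigger failover (reassigning LUNs and applications)
# for any peer returned by failed_peers().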

In the example given in Figure 8-1, the SAN infrastructure supplements the high availability of the server cluster by providing dual data paths through the SAN. Each server is provisioned with two SAN interfaces, each running to a separate SAN switch. Similarly, the storage targets are dual-ported to each data path. Although failure of an individual server may trigger failover to one of the surviving servers, failure of an individual SAN switch or link still leaves an alternative path to storage data.

Coordinating the activity of multiple servers and safeguarding file access are not trivial tasks, and that is reflected in the cost of clustering solutions. Compared with the cost of lost business due to down time, however, high availability via clustering is easily justified for mission-critical networks. In addition to clustering capability, integration of dual or multiple pathing into the operating system further facilitates deployment of server cluster solutions.

8.2 Tape Backup

Tape backup is a universal requirement, and a universal problem, for all data networks, regardless of the specific topology employed. Data security via backup is not only desirable but also sometimes mandated by law (for example, in finance and banking operations). For server/storage configurations based on parallel SCSI, tape backup strategies may require a dedicated tape backup subsystem for each server (an approach that optimizes performance but increases costs) or a centralized tape backup configuration that requires a data path across a LAN (an approach that reduces cost but may hinder performance). As applications generate ever-growing storage requirements, the time required to back up data increases. You cannot accommodate this backup window via the production LAN without increasing the number of switch links to accommodate block storage as well as user bandwidth requirements. And, given the amount of data supported by an individual server, provisioning tape subsystems for every large server is cost-prohibitive.
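
The backup window can be estimated as data volume divided by effective throughput. The sketch below is illustrative only; the drive count, native drive speed, and efficiency derating are all assumed values, not specifications of any product.

# Rough sketch: backup window as data volume divided by effective throughput.
def backup_window_hours(data_gb: float, drives: int,
                        drive_mbps: float, efficiency: float = 0.7) -> float:
    """Hours to back up data_gb using parallel tape drives.

    drive_mbps is native drive throughput in MB/s; efficiency is an assumed
    derating for protocol overhead, file layout, and drive streaming gaps.
    """
    effective_mb_per_s = drives * drive_mbps * efficiency
    return (data_gb * 1000) / effective_mb_per_s / 3600

# Assumed example: 2TB of data, four drives at 30MB/s native throughput.
print(round(backup_window_hours(2000, 4, 30), 1))  # ~6.6 hours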

Traditional LAN-based tape backup is based on backup of files. As shown in Figure 8-2, a tape subsystem is attached to a backup server, which in turn sits on the LAN. Each server on the LAN may have gigabytes of direct-attached storage that needs to be secured through backup. The backup server instructs each server to initiate a backup, with the data sent over the LAN from server to backup server. This type of backup involves multiple conversions. Upon launching a backup, the target server must read blocks of SCSI data from disk, assemble the blocks into files, and packetize the files for transfer over the LAN. At the backup server, the inbound packets must be rebuilt into files, and the files, in turn, are disassembled into blocks to be written to tape. Therefore, the original data blocks that reside on the target storage undergo four steps of conversion before reappearing at the destination as blocks: blocks to files, files to packets, packets to files, and files back to blocks. Both the server and the backup server must devote considerable CPU cycles to SCSI as well as network protocol overhead.

Figure 8-2. Traditional LAN-based tape backup

On shared media LANs, this backup operation places a significant load on the LAN and may disrupt user messaging traffic. Most enterprise LANs, however, use switched Ethernet infrastructures instead of shared media. Bottlenecks may occur on switch-to-switch links, but not on individual Ethernet switches.

SAN-attached storage and tape offer efficiencies for tape backup by eliminating the block-to-file conversion overhead. Instead, blocks are read from disk and written directly to tape across the SAN. For Fibre Channel, the block transport is over a separate topology. For IP SANs, the block tape backup may be over a separate Gigabit Ethernet switched network, or over a VLAN within a larger Gigabit Ethernet complex. In both cases, by leveraging block-based SCSI transfer, you allow more data to be transported in less time and with less overhead.

Placing servers, storage, and tape on a peer-to-peer network also enables new backup applications such as server-free tape backup. As long as the server is the exclusive owner of storage, the server must always be in the data path for any type of access, including tape backup. On a SAN, however, storage is directly accessible by a tape subsystem. As shown in Figure 8-3, the server can be taken out of the backup data path, with blocks of data read from disk and written directly to tape. Server-free backup requires an extended copy (third-party copy) agent, which typically resides in the tape subsystem, in a bridge, or in a SAN switch.

Figure 8-3. Server-free tape backup using extended copy

Whereas the SAN plumbing and utilities such as extended copy facilitate efficient backup of storage data, the application software that initiates and manages backup processes varies in capabilities from vendor to vendor. Although every storage administrator recognizes the necessity of data backup, it is often difficult to verify that a backup operation was completed and that the tapes can actually be used for data restore. In addition, regular backup operations may repeatedly copy data that is unchanged over time, and that adds to the volume and duration of the backup process. Vendors of backup software may provide utilities for verification, point-in-time (snapshot) backup for active databases, changed-block-only backup, or other value-added backup services. As the volume of storage data grows, the task of securely backing up data in a reasonable time frame becomes increasingly difficult. Backup application software and tape vendors are trying to meet this challenge. Storage vendors are offering data replication, an alternative but complementary solution.

8.3 Data Replication

Tape backup has survived simply because it is the most available means to provide a copy of data on a stable medium. Data on tape cartridges can be archived, transported off site, and stored indefinitely. But even though tape ensures the integrity of storage data, it is far too slow to be used for regular data access and requires a lengthy restore process if it must be copied back to spinning media.

Ideally, an enterprise should have a synchronized copy of data available and should be able to access that copy immediately if the primary storage fails. This is the goal of data replication, which uses disk mirroring algorithms to duplicate data from one disk array to another.

Software-based data replication may require that the host perform write operations to two separate disk targets. Although this solution may be more economical than others, it incurs more overhead on the host and requires multiple data paths from host to the intended targets. Disk-based data replication, by contrast, is transparent to the host system and offloads all duplication tasks to the disk array itself. In this case, the disk arrays must be both targets and initiators, receiving data to be written while managing write operations to the secondary storage.

Each storage vendor has developed its own data replication solution. For example, EMC's Symmetrix Remote Data Facility (SRDF), Hitachi Data Systems' True Copy, Hewlett-Packard's (Compaq) Data Replication Manager, and XIOtech's REDI-SANlinks all provide a means to perform disk-to-disk data replication for disaster recovery and other business continuance applications. Data replication normally implies distance, with primary and secondary storage arrays separated by at least metropolitan area distance. Consequently, data replication must define how data mirroring will be accomplished at the array level and how wide areas will be spanned.

As shown in Figure 8-4, a data replication configuration has primary and secondary storage arrays. In the case of EMC SRDF, an active-passive configuration requires data written to the primary array to be synchronously written to the secondary array. In the event of failure of the primary, the secondary can be accessed directly. For companies with multiple data centers or sites, however, an active-active configuration enables each site to serve as both primary for local access and secondary for another site. Regional centers can thus serve as mutual data replication sites for each other, ensuring that a readily accessible copy of each site's data is always available.

Figure 8-4. Active-passive (top) and active-active data replication

Data replication can be performed with synchronous or asynchronous updates between primary and secondary storage. In synchronous mode, the write operation is not final until both arrays have signaled write completion. This guarantees that an exact copy of data is now on both arrays, although at a performance penalty. The primary storage must always wait until the secondary is finished before reporting successful write to the host. In asynchronous mode, the primary array can buffer writes intended for the secondary and initiate them only during idle periods. This improves performance but may result in loss of a true copy if the write to the secondary subsequently fails. The primary array would then be forced to break the mirror to the secondary, and possibly track changes in data until the secondary recovers.

Data replication can be performed within the local data center to provide a current and readily accessible copy of data, or it can be extended over distance to facilitate disaster recovery and business continuance scenarios. In both synchronous and asynchronous implementations, the stability of the link between primary and backup disk arrays is critical, as is the latency that would naturally occur over very long haul links. If a primary array, for example, must wait some tens of milliseconds before a synchronous write is completed, the response time between the host and the primary will be affected. At some point, the latency might be too great for synchronous replication over very long distances. Exactly where the edges of this envelope lie can be determined only in practice.
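
The latency penalty of synchronous replication can be approximated by adding one wide area round trip to the local write service time. The sketch below is illustrative only; the 1 ms array service times and the example distances are assumptions.

# Illustrative sketch: added service time for a synchronous remote write.
def sync_write_time_ms(local_write_ms: float, remote_write_ms: float,
                       distance_miles: float) -> float:
    """Host-visible write time when the primary must wait for the secondary."""
    rtt_ms = 2 * distance_miles / 100.0   # ~1 ms per 100 miles, each way
    return local_write_ms + remote_write_ms + rtt_ms

# Assumed example: 1 ms array write service time at each site.
print(sync_write_time_ms(1.0, 1.0, 100))    # metro distance: ~4 ms per write
print(sync_write_time_ms(1.0, 1.0, 2500))   # transcontinental: ~52 ms per write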

Another practical caveat is posed by the wide area services and protocols used to link primary and secondary sites. Some data replication applications, for example, were written specifically for Fibre Channel SANs and expect to leverage Fibre Channel Protocol (FCP) in the disk-to-disk storage transaction. Converting from Fibre Channel to iSCSI for the wide area link and then back to Fibre Channel at the secondary site loses too much of the FCP layer mapping and so is unworkable. Simple FCIP tunneling may be suitable for limited point-to-point extension, but it typically requires a Fibre Channel switch at each end to serve as a front end to the storage arrays. The iFCP gateway protocol enables multipoint data replication while avoiding the need for additional Fibre Channel switches for storage attachment.

8.4 Distributed File Systems and File Sharing

Although a peer-to-peer storage network enables multiple servers to share a storage resource, it does not automatically enable servers to share the same storage data. In fact, most SAN deployments are configured for a shared nothing environment, in which individual servers are assigned separate LUNs on the storage target. Each server manages its own data. In a server clustering scheme, the LUNs previously assigned to a failed server can be mapped to active servers so that data access can continue. The SAN provides the network that facilitates this deliberate reassignment of resources, but at any point in time each server has access only to its authorized LUNs.

If, alternatively, any server could access any storage resource in the SAN, data corruption would be rampant. Servers might inadvertently overwrite blocks of data, with no server able to track the random changes to the volumes exposed to common ownership. Consequently, low-level mechanisms such as zoning and LUN masking are used to enforce resource allocation and avoid this potential free-for-all.

To share the storage data itself, a layer of management is required to prevent data corruption. You must monitor the status of a file or record that is accessible to multiple servers on the SAN so that you can track changes. In read-only applications, such as multiple servers accessing display-only Web content, a simple file locking monitor may be sufficient. In active read/write applications, such as a load sharing server cluster, more sophisticated management is required. Changes to shared data must be permitted, but in an orderly fashion that synchronizes all modifications to the original data and writes a coherent version back to disk.

Data sharing is further complicated by the fact that the data is typically dispersed over multiple storage arrays as a storage pool, as shown in Figure 8-5. The complex of physical storage devices on the SAN must be presented as a single logical resource on top of which sits a common view of a file system shared by all servers in a cluster. A distributed volume manager must thus present a coherent view of the physical storage resources; a distributed file system presents a uniform view of directories, subdirectories, and files.

Figure 8-5. A distributed file system presents a common view of resources to all servers

A distributed file system must present a consistent image to the server cluster. If a new file is created by one server, other servers must be updated immediately. Similarly, a file opened by one server cannot be arbitrarily deleted by another, or else the file system will lose integrity. If multiple servers may have the same file opened, and if modifications of the file are permitted, a distributed file system must also be able to notify each server of pending changes.

This task is further complicated by the fact that a server may be working on file content based on a now-outdated version in its cache buffers. Cache coherency between multiple servers requires notification of any changes to the file and a reread of the updated file to refresh the cache. One solution to this problem is to force the server to rely on cache buffers on the storage array instead of the host system. Device Memory Export Protocol (DMEP) has been proposed as a means to maintain cache coherency on storage arrays and avoid the issue of conflicting cached versions of files on each clustered server.
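
The coherency problem can be illustrated with a generic invalidation scheme: before a write proceeds, every other server caching the file must be told to discard and reread its copy. The sketch below is a hypothetical simplification and does not represent the DMEP protocol or any shipping file system; the server names and file path are invented.

# Generic sketch (not DMEP itself): a coordinator invalidates cached copies
# on other servers before a write is allowed to proceed.
class CacheCoordinator:
    def __init__(self) -> None:
        self.cached_by: dict[str, set[str]] = {}   # file path -> caching servers

    def register_read(self, server: str, path: str) -> None:
        self.cached_by.setdefault(path, set()).add(server)

    def request_write(self, server: str, path: str) -> list[str]:
        """Return servers that must invalidate and reread before the write."""
        others = self.cached_by.get(path, set()) - {server}
        self.cached_by[path] = {server}   # writer now holds the only valid copy
        return sorted(others)

coord = CacheCoordinator()
coord.register_read("server-a", "/shared/galaxies.dat")
coord.register_read("server-b", "/shared/galaxies.dat")
print(coord.request_write("server-a", "/shared/galaxies.dat"))  # ['server-b']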

Applications that benefit from data sharing range from high-availability server clusters to processing-intensive application clusters that must digest massive amounts of data. The scientific research facility Fermilab, for example, uses Sistina's Global File System (GFS) for large server clusters that analyze data on the distribution of galaxies in the universe. With an initial server cluster of ten CPUs acting in concert, the Fermilab project must analyze more than 15 terabytes of data. The Sistina GFS presents a common view of data to the server cluster, enabling the processing to be shared across multiple platforms. In this example, the Sistina GFS also interfaces with DMEP in the target storage arrays to provide a streamlined mechanism for file locking and modification.

Although the management of distributed file systems is more complex than that of a shared nothing implementation, a distributed file system facilitates high availability, simplifies system administration, and enables high-performance processing clusters for a wide variety of compute-intensive applications. Tighter integration into operating systems will make distributed file systems and file sharing more available and further enhance the value proposition of SANs.

8.5 Chapter Summary

Server Clustering

- Networked storage enables multiple servers to assume the tasks of a failed server.
- Server clustering requires failover software on each server.
- Servers monitor the status of the cluster via a heartbeat protocol.
- Server clusters are typically provisioned with dual data paths through the SAN.

Tape Backup

- Traditional LAN-based tape backup requires multiple block-to-file conversion processes.
- SAN-based tape backup enables block transport for backup.
- Extended copy (third-party copy) agents can move blocks directly from disk to tape, taking the server out of the backup data path.
- The integrity of a tape backup process requires validation through testing or restore.

Data Replication

- Data replication provides a current copy of data on a separate disk mirror.
- Software-based data replication requires that the host system perform write operations to two separate arrays.
- Disk-based data replication performs direct disk-to-disk write operations.
- An active-passive configuration is based on a primary and a secondary storage array.
- An active-active configuration enables each array to be primary to itself and secondary to another.
- Synchronous data replication requires completion of writes to both arrays before notification of write complete to the host.
- Asynchronous data replication spools pending writes from the primary to the secondary array.
- Synchronous data replication may be sensitive to speed-of-light latency over long distances.

Distributed File Systems

- In a shared nothing configuration, each host is assigned its own LUNs.
- File sharing must be managed to safeguard against overwrites and data corruption.
- A distributed file system presents a uniform view of data to all servers in a cluster.
- Cache coherency requires that each host have the most current version of a file in its cache buffers.
- Device Memory Export Protocol (DMEP) maintains file cache on the storage arrays instead of the host systems.
- Applications such as Sistina Global File System (GFS) can interact with DMEP to provide file locking and managed updates to a distributed file system.


Chapter 9. Problem Isolation in SANs

No network is bulletproof. Even with careful selection of network equipment, cabling, and management applications, eventually something will break. The challenge then is to minimize the down time caused by a failure and keep the rest of the network operational. Because SANs represent the fusion of networking and storage, storage networks inherit both the strengths and the weaknesses of each. In addition, upper-layer applications may not always behave as advertised, further complicating diagnosis and isolation of problems. This chapter examines some of the basic problem-isolation techniques used in SAN environments and the tools available to monitor the health of the storage network.

9.1 Simple Problem-Isolation Techniques

Generally, most error conditions are encountered during changes to a network's topology. This is particularly true during initial installation of the network because new equipment is being connected for the first time. In complex SAN installations, you must install host adapters and load their appropriate device drivers, lay the cable plant, install and configure switches, position GBICs or transceivers, and properly deploy and cable disk arrays. To simplify installation, solutions providers typically preconfigure as many components as possible, but there is always an opportunity to incorrectly cable devices together or omit steps in installation. Things happen.

After the various SAN components have been configured, cabled, and powered on, you usually verify operation by testing a server's ability to access storage. If a server cannot see part or all of the assigned disks, elementary troubleshooting begins. As outlined in Figure 9-1, this process typically begins with examination of the physical cable plant, port status, and transceivers and continues through verification of the host system, interconnection, and storage target.

Figure 9-1. Troubleshooting server-to-target connectivity


You can diagnose physical-layer problems by verifying insertion or bypass status at the switch port. Port status LEDs should indicate whether an inbound signal is recognized or whether it is intermittently making and breaking contact. It is difficult to troubleshoot signaling status when copper cabling is used, because there is no immediate way to verify signal presence. For optical, non-OFC connections, disconnecting the cable at the device allows visual verification of signal presence on the SC connector and thus speeds problem identification, as shown in Figure 9-2. Short exposure to non-OFC laser is not harmful, but it is always recommended that you hold a small mirror or translucent paper over a fiber-optic connection before looking at the laser source. If no light is present on the fiber-optic cable, the likely cause is a break along the length of the cable lead. If light is present, lack of communication may have a number of other causes, including incorrect cable length or type, poor quality of cabling, dirty connectors, excessive jitter along the link, or protocol-level problems at the transport or upper layers. Using a known good cable in place of the existing run is usually the fastest means to verify the cable plant and move on to the next step in troubleshooting.

Figure 9-2. Verifying signal presence on a non-OFC fiber-optic SC connector

Because bypass mode can be triggered by loss of signal, disconnecting the cable at the switch port allows verification of laser presence on the port's receive lead. Lack of laser light may indicate a break on the cable lead running from the attached node's transmit to the port's receiver. It may also indicate a problem with the node's transceiver or internal controller logic, or it may simply mean that the attached device is powered down.

Depending on the server's operating system, failure to discover targets may be a matter of the SAN boot sequence. An operating system may need to have disk arrays up and operational before its own device drivers load. Disks that are powered on after the system is booted may not be recognized by the operating system, and in that case you may need to reboot the host to see the drives. The next logical step is to verify that the proper device drivers are installed for the host adapter cards. For preconfigured servers, this is assumed, and most vendors provide utilities for driver installation and card diagnostics. When you are provisioning multiple adapter cards in a single server, you must ensure that no resource conflicts occur and that each adapter is cabled to its appropriate SAN switch.

Although adapter card vendors generally abide by standards in implementing Fibre Channel or iSCSI device drivers, some have peculiar interpretations of the parameters allowed. Some Fibre Channel adapter cards, for example, allow only specified address ranges to be used. In a single-vendor environment, this may not cause problems, but in a multivendor environment default addressing assumptions may cause interoperability conflicts. The configuration utility supplied with the adapter card should indicate current and allowable addresses, as well as the microcode and device driver versions for the card.

For Fibre Channel fabrics, you can examine the switch SNS table to verify the successful connection of end devices to the switch. Vendors may provide a graphical interface or command line interface to interrogate the SNS. An SNS entry for a disk array, however, does not guarantee that a server will discover it. The server HBA may not have completed a process login to the array, or the fabric switch may be inadvertently zoned to exclude the array from the server's view. Similarly, in an IP SAN environment, a successful entry in the iSNS server does not guarantee server-to-target connectivity. The end devices themselves may have failed to negotiate a login, or the TCP connection between them may need to be reset.

Zoning may be configured by port or World-Wide Name (WWN). Depending on vendor equipment, the default zoning configuration may exclude all devices; in other words, an administrator must first define zones before any communications can occur. It may also be possible to inadvertently create zone conflicts, with the result that resources that are zoned together by WWN are excluded by port zoning.

Similar in impact to zoning, a switch may provide LUN masking so that only specified LUNs are visible within a zoned target device. Consequently, an initiator and a target may reside in the same zone—based, for example, on port zoning—and yet the LUNs on that target may not be visible. A careful review of switch zones and LUN masking configurations may quickly resolve the problem without the need to resort to further diagnostics.
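
The zoning and LUN masking review described above amounts to a pair of nested visibility checks. The sketch below is a hypothetical illustration only; the zone database, WWPNs, and masking table are invented for the example, and real switches and arrays expose this information through their own CLIs or APIs. An initiator can reach a LUN only if both devices share at least one zone and the target's LUN mask exposes that LUN to the initiator.

```python
# Hypothetical zone database: zone name -> set of member WWPNs.
zones = {
    "zone_db1": {"10:00:00:00:c9:aa:bb:01",   # server HBA port
                 "50:06:01:60:10:20:30:40"},  # array port
}

# Hypothetical LUN masking: target WWPN -> {initiator WWPN: visible LUNs}.
lun_masks = {
    "50:06:01:60:10:20:30:40": {
        "10:00:00:00:c9:aa:bb:01": {0, 1, 2},
    },
}

def can_see_lun(initiator_wwpn, target_wwpn, lun):
    """Return True only if zoning and LUN masking both permit access."""
    shared_zone = any(initiator_wwpn in members and target_wwpn in members
                      for members in zones.values())
    if not shared_zone:
        return False
    visible = lun_masks.get(target_wwpn, {}).get(initiator_wwpn, set())
    return lun in visible

print(can_see_lun("10:00:00:00:c9:aa:bb:01",
                  "50:06:01:60:10:20:30:40", 1))   # True
print(can_see_lun("10:00:00:00:c9:aa:bb:01",
                  "50:06:01:60:10:20:30:40", 5))   # False: LUN is masked
```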

In multiswitch configurations, status of the interswitch links may also be an issue. Fibre Channel switches typically auto-sense E_Port connections in a single-vendor fabric, but they may need to be manually configured in multivendor fabrics. Some vendors support multiple E_Port compatibility modes, both for different microcode versions of their own products and for peculiar implementations of E_Port by other vendors.

E_Port connectivity may also require that you manually configure switch addresses or designate which switch will serve as principal switch for the fabric. Address conflicts between switches may send the fabric into a perpetual fabric reconfiguration or may automatically disable E_Ports in an attempt to recover. In more complex Fibre Channel fabrics, failure of some initiators to see designated targets may be caused by exceeding the recommended hop count between multiple switches. Initiators and targets separated by multiple hops may have the appropriate entries in their respective switch's SNS table but be unable to communicate because of excessive switch-to-switch latency or failure of SNS updates to propagate through the fabric.

At the receiving end, the storage target may have its own configuration utility for establishing and assigning LUNs. You should verify array-based LUN configurations and compare them with any additional LUN masking parameters supported by the SAN switch or host adapter card.

If the end-to-end SAN plumbing appears to be properly configured and operational, there is always the magic wand of system reboot. It may not be the weapon of choice for administrators in a production environment, but since the dawn of networking, all sorts of mysterious problems have been dispatched by the on/off switch.

Although SAN vendors have attempted to provide more advanced diagnostic capabilities in their products, the ability to crack frames and provide protocol decode at multigigabit speeds would add significant cost to any product. A full diagnostic of a storage network therefore requires protocol analyzers and either in-house expertise or a contracted support organization with trained personnel. In either case, low-level diagnostics are an expensive and often unadvertised component of an enterprise SAN.

9.2 Fibre Channel Analyzers

As with network analyzers developed for LAN and WAN traffic analysis, Fibre Channel analyzers provide protocol capture and decode on the transmission link. Even with enormous buffers, it is not possible to capture more than a few seconds of traffic at 1Gbps and 2Gbps speeds. You can, however, capture protocol events that last only microseconds or milliseconds. Typically, an analyzer is used to trace a specific process, such as a fabric login problem. Instead of randomly sampling traffic in the hopes of capturing a specific event, analyzers let you trigger the start of data capture on the basis of designated ordered sets and addresses. Trigger criteria can be complex, with combinations of protocol states and source/destination address, or they can be restricted to certain bit patterns.

Through either manual control or preprogrammed triggers, the analyzer can capture several seconds, or several hundred megabytes, of activity at a time. After the capture is performed, the analyzer decodes the protocols and data and displays the results, usually on an integrated monitor. Capture is typically run against one port at a time, although some vendor implementations allow data capture on multiple ports. The latter is particularly useful for capturing a transaction from the initiator as well as the target perspective.

The maximum amount of data that can be captured varies from vendor to vendor but generally does not exceed several seconds. Because IDLEs and other primitives may occur in series, the memory needed to buffer captured data is reduced simply by counting the number of repeated occurrences and logging those to the decode display. Thousands of consecutive IDLEs are thus collapsed into a single entry, and that also makes it easier to read a trace.

After a capture is executed, the decode display will contain a sequential listing of all ordered set and frame activity. Depending on the vendor's design, the trace will appear in one or more windows, with drill-down screens for details of specific events. Trace field headers usually include a time stamp (to nanosecond granularity), a time delta from one event to the next, the analyzer port that saw the event, and ordered set and frame information, as shown in Figure 9-3. Most Fibre Channel analyzers supply SCSI frame decode by default, with IP and other protocols as options. The drill down for frame decode contains all frame fields, including routing control, source and destination IDs, frame controls, and data payload.

Figure 9-3. Generic Fibre Channel analyzer trace (courtesy of Finisar Corporation)


Analyzers may include, as integrated or accessory features, the capability to generate ordered sets and frames as well as to induce protocol or frame errors for test purposes. Data generators are particularly useful for verifying proper operation of end devices, although this type of activity is usually conducted in a test lab environment. Vendors may also offer client/server versions of their products to allow multiple diagnostic workstations to utilize a central analyzer resource.

In addition to diagnostic functions, analyzers provide performance statistics that can be used to monitor the impact of a particular application or configuration on the SAN. Because overloading a SAN segment can be one source of errors, port utilization data is very useful for understanding both link and upper protocol issues. Some fabric switches also provide performance data in graphical form, something that is useful for monitoring capacity without the need to physically insert an analyzer.

Although the cost of Fibre Channel analyzers remains high, the ongoing cost for sophisticated troubleshooting is in the training and retention of experts who can make sense of captured data. Having the tools to capture and decode data at gigabit speeds is only part of the equation; maintaining the expertise to productively use the tools is still a challenge for many enterprise networks.


9.3 iSCSI Network Analyzers

Comparable diagnostic analyzers for IP SAN environments are also available, although in the main these products are simply enhancements of traditional IP network analyzers. LAN and WAN protocol analyzers have evolved alongside IP networking, and they let you capture and decode data on virtually any link speed, including 10Gbps Ethernet.

Network analyzers capture and decode link layer and network layer protocols, as well as upper-layer protocols in some products. In this way, you can verify Ethernet MAC layer communications on local segments as well as IP and TCP sessions through the network. The unique addition for iSCSI analysis is the breakout of the iSCSI PDUs that are encapsulated within TCP/IP, as shown in Figure 9-4. In addition to TCP session status, the analyzer display decodes SCSI commands, status, and data so that you can trace both transport and SCSI protocol status.

Figure 9-4. A SNIFFER iSCSI protocol trace (courtesy of Network Associates)
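
As a rough illustration of what the iSCSI decode layer does, the sketch below parses the fixed 48-byte Basic Header Segment (BHS) that begins every iSCSI PDU and extracts the opcode and data-segment length, following the field layout defined in RFC 3720. The sample header bytes are fabricated for the example, and a commercial analyzer obviously does far more, including session tracking, digest validation, and full SCSI command decode.

```python
import struct

# Initiator-side iSCSI opcodes from RFC 3720, used to label the decode.
OPCODES = {0x00: "NOP-Out", 0x01: "SCSI Command", 0x02: "Task Mgmt Request",
           0x03: "Login Request", 0x04: "Text Request",
           0x05: "SCSI Data-Out", 0x06: "Logout Request"}

def decode_bhs(bhs: bytes) -> dict:
    """Decode the fixed 48-byte Basic Header Segment of an iSCSI PDU."""
    if len(bhs) < 48:
        raise ValueError("BHS is always 48 bytes")
    opcode = bhs[0] & 0x3F                       # low 6 bits of byte 0
    immediate = bool(bhs[0] & 0x40)              # I (immediate delivery) bit
    ahs_len = bhs[4]                             # TotalAHSLength, in 4-byte words
    data_len = int.from_bytes(bhs[5:8], "big")   # DataSegmentLength, 24 bits
    itt = struct.unpack(">I", bhs[16:20])[0]     # Initiator Task Tag
    return {"opcode": OPCODES.get(opcode, hex(opcode)),
            "immediate": immediate,
            "ahs_length_words": ahs_len,
            "data_segment_length": data_len,
            "initiator_task_tag": itt}

# Fabricated header purely for demonstration: an immediate SCSI Command
# carrying a 512-byte data segment.
sample = bytes([0x41, 0x81, 0, 0, 0, 0x00, 0x02, 0x00]) + bytes(8) + \
         struct.pack(">I", 0x1234) + bytes(28)
print(decode_bhs(sample))
```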


As with Fibre Channel analyzers, IP network analyzers typically include performance statistics so that link utilization can be monitored over time. One of the great unknowns in SAN deployment is the actual bandwidth required for various storage applications. Because part of the value proposition of IP SANs is the ability to use common IP network infrastructure for storage traffic, the ability to accurately provision bandwidth based on sampled utilization is key to optimizing performance and avoiding excessive oversubscription of links.

9.4 Performance Tools

Sampling link utilization for typical storage transactions may not identify periods of peak activity over time. A database application, for example, may require less than 20MBps per server but must contend with additional traffic in the SAN during periodic tape backups. In addition to performance monitoring utilities in SAN switches or network analyzers, then, it is useful to sample storage traffic over time to present a more comprehensive view of total utilization.

Performance statistics gathered over extended periods provide useful information for both current utilization and network capacity planning. Vendors such as Concord Communications have long provided capacity-planning capability for traditional IP networks and more recently have introduced platforms for monitoring storage networks. Concord's eHealth-SAN, for example, periodically polls SAN adapters, switches, and interfaces to solicit point-in-time port utilization statistics, as depicted in Figure 9-5. This can be done by querying the SNMP MIB provided by the hardware vendor or by using proprietary APIs. Sampled performance numbers must then be correlated to data paths through the SAN and periodically rolled up in a database. The output can be displayed as bar or pie charts, showing periods of peak utilization, potential bottlenecks, and utilization trends over a period of days, weeks, or months.

Figure 9-5. Concord eHealth-SAN historical SAN performance monitoring. Reprinted with permission from Concord Communications
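
The arithmetic behind these rollups is straightforward: sample an interface's octet counter at two points in time, convert the delta to bits, and divide by the link capacity over the interval. The sketch below shows the calculation on invented sample values; a real tool would pull the counters from the switch via SNMP or a vendor API, store the results in its database, and handle counter wrap and the usable payload rate of the specific link type.

```python
def link_utilization(octets_t0, octets_t1, interval_s, link_speed_bps):
    """Percent utilization of a link between two octet-counter samples."""
    bits = (octets_t1 - octets_t0) * 8          # octets transferred -> bits
    return 100.0 * bits / (interval_s * link_speed_bps)

# Example: two counter samples taken 300 seconds apart on a link treated
# here as a nominal 1Gbps pipe (values are illustrative only).
t0 = 2_000_000_000
t1 = t0 + 12_000_000_000
print(link_utilization(t0, t1, 300, 1_000_000_000))   # -> 32.0 (percent)
```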

Performance aggregation is also useful for monitoring service level agreements (SLAs), particularly for service providers whose clients need verification that service levels are met. Historical reporting both validates current service level compliance and provides data that may justify increasing available bandwidth for specific applications.


9.5 Chapter Summary

Simple Problem-Isolation Techniques

• Most SAN connectivity problems occur during an add, move, or change to the storage network.

• The presence of optical signaling can be verified at cable ends of both nodes and switch ports.

• Adapter cards should be verified for appropriate microcode and device driver versions.


• Vendors may require specific addresses or address ranges for proper adapter configuration.

• Successful connection to the fabric can be verified by examination of the SNS or iSNS table entries.

• Improper zoning or LUN masking settings may inadvertently prevent an initiator from discovering an authorized storage target.

• In multiswitch fabrics, E_Port compatibility modes and principal switch priority should be verified to ensure interoperability.

• Storage targets may have LUN and LUN masking configuration utilities that should be harmonized with similar utilities in adapters or switches.

Fibre Channel Analyzers

• Data capture typically takes only several seconds and may be triggered by specific protocol or frame conditions.

• Capture may be performed against one or two ports concurrently.

• Analyzers may include frame and error injection utilities.

• Analyzers may provide point-in-time performance monitoring.

iSCSI Network Analyzers

• iSCSI network analyzers provide both link and network layer protocol decode.

• iSCSI PDUs are decoded to provide analysis of storage transactions in addition to TCP/IP transport.

• iSCSI network analyzers may also provide point-in-time link utilization statistics.

Performance Tools

• Performance monitoring over time provides a comprehensive view of SAN traffic and periods of peak activity.

• Performance statistics are solicited from SAN network equipment and compiled into a database.

• Performance monitoring may indicate trends in SAN utilization over a period of days, weeks, or months.

• Performance monitoring is useful for verifying service level agreements.


Chapter 10. Management of SANs

Network management has one central objective: to maintain the stable transport of data across the network infrastructure. What is done with the data after it has arrived at its destination is usually outside the scope of traditional network management disciplines. As networking has emerged behind the server in the form of SANs, however, both the transport of data and the management of data itself are critical to stable operation. This chapter provides an overview of the unique complexities of SAN management, including storage transport and storage data management. Although the lack of a standardized management model has inhibited SAN adoption, new uniform initiatives promise to ease management burdens in the administration of multivendor SANs.

10.1 Storage Network Management

As long as the server/storage relationship was isolated from the network, storage management applications tended to be focused on the server (the direct owner of storage) and specific operating systems. Storage networking shifts the focus from the server to storage and enables storage management to address data issues apart from individual servers and independently of the operating platform. With SANs, NT and UNIX servers can now share storage resources of a single disk array, and management of the array's resources may exist independently—for example, as a storage management entity—of either an NT or a UNIX environment. This abstraction from the previous server-centric model promises far more flexibility for resource allocation, volume management, and data integrity.

As shown in Figure 10-1, management of the SAN is a hierarchy of functions that may exist as separate applications or as integrated management systems. Lower layers move status and event information up through the hierarchy, and upper layers issue commands and queries down to the appropriate agents. The management structure is built on a foundation of managed devices that create the SAN interconnection (host adapters, switches, and bridges) as well as management entities that may reside in disk arrays and tape subsystems. These devices communicate to their respective device management applications via a number of protocols, primarily Simple Network Management Protocol (SNMP), SCSI Enclosure Services, or proprietary APIs. The device management applications, in turn, may communicate to upper-level storage and storage resource managers, which may provide interfaces to enterprise-level systems management platforms. SAN hardware and software vendors have a vested interest in adapting their products to this umbrella management strategy, because cohesiveness of the different strata that compose SAN management will facilitate deployment of all products.

Figure 10-1. SAN management hierarchy

Storage network management is responsible for interfacing directly with interconnection equipment. The vendor of a managed network device supplies a console or graphical interface that, at minimum, allows port configuration (insert/bypass) and reports basic enclosure environmental status (power, fan, temperature). The feature set of a managed HBA, hub, or fabric switch is vendor-dependent. Some products offer only basic enclosure and port status, whereas others include performance and diagnostic utilities.

The application provided by the vendor to manage its product is a device manager, because typically the management scope does not extend beyond the product itself. A management utility for a host adapter card, for example, will give status, configuration, and port statistics for the adapter, but it may not provide visibility to fabric switches or other nodes with which it communicates.

The management of multiple storage network products implies the need for multiple device managers, and consequently multiple management workstations or consoles to support applications from different vendors. Maintaining multiple management consoles is not an attractive option for IT administrators. Paralleling the evolution of network management in local and wide area networking, this fragmentation of management platforms provides incentives for consolidating device managers under one storage network application.

Device management has software, firmware, and hardware components, as shown in Figure 10-2. Putting a port into bypass mode from a management console, for example, requires use of a management protocol (such as SNMP) to convey the command from the workstation to the device. A management agent on-board the device must then translate the bypass command into a hardware instruction, and the hardware, in turn, must raise or lower the appropriate leads to physically bypass the port.

Figure 10-2. Device management architecture

The controller or firmware that supports the management agent is the device's voice to the outside world. Communication between a device and its management workstation may occur through the storage transport media, and this is referred to as in-band management. Alternatively, management data may be segregated from storage data traffic via an external Ethernet, RS-232, or other interface, and this is referred to as out-of-band management. Out-of-band management through Ethernet normally uses the SNMP protocol, although Telnet and Web browser HTTP (Hypertext Transfer Protocol)-based implementations are also available. In-band and out-of-band solutions have their respective advantages and disadvantages, and some products offer both to provide complementary functions.

10.1.1 In-Band Management

As illustrated in Figure 10-3, in-band management relies on the SAN transport to carry status and configuration information between devices and a management console. The management protocol may be SNMP-based or may be a combination of APIs that interface to the management software application. In a Fibre Channel SAN, you might use the common HBA API to report host bus adapter status and use proprietary switch and storage APIs to report status and configuration information for the fabric and targets. In the case of an IP SAN, SNMP or API queries could be sent along the same infrastructure as the SAN interconnection, directed at the IP addresses of management entities within host, switch, or target devices.

Figure 10-3. In-band management over the SAN data path


In-band management simplifies SAN deployment, provided that all the devices that require management support in-band methods. Unfortunately, in Fibre Channel SANs some devices still require out-of-band access, so you need an additional Ethernet network to handle SNMP traffic.

The significant drawback of any in-band management scheme is apparent when the storage transport itself is down or disabled by excessive errors. Because all management data must move across the storage network, loss of the transport means loss of management visibility to the managed devices, and that negates the ability to detect, isolate, and recover from network problems. Thus, although in-band management fulfills a useful function under stable network conditions, it is ill equipped to deal with catastrophic physical-layer or transport-level events. In large, meshed SANs, you can somewhat obviate this vulnerability by providing redundant links. Providing alternative paths for storage data and management data, however, adds cost, consumes additional ports, and still does not eliminate the possibility that a management workstation will be rendered ineffectual if combinations of links or fabrics fail.

10.1.2 Out-of-Band Management

Out-of-band management avoids these issues by moving all management data off the production storage network. As shown in Figure 10-4, out-of-band techniques typically use Ethernet for the management data path and wrap management queries or commands in SNMP, Telnet, or Web browser HTTP protocols. Most switches and storage targets also provide a serial RS-232 console port as an alternative (but rarely used) access method.

Figure 10-4. Out-of-band management requires a separate network to carry management data

Because out-of-band implementations do not rely on the SAN transport, you can still manage host adapters, switches, and storage arrays if the fabric links are down. This is a significant advantage over in-band methods. Out-of-band management also facilitates integration of storage network management with enterprise management platforms, particularly those based on IP and SNMP. The major weakness of out-of-band network management in Fibre Channel environments is its inability to provide auto-topology mapping and other functions that require in-band communication between devices.


The most commonly used out-of-band protocols are SNMP, HTTP, and Telnet, all of which run over IP. The use of IP to carry management instructions and responses lets you manage storage network devices from anywhere in a routed IP network. Storage network management can thus be co-located with the SAN or can be centralized with other IP-based network management applications in an enterprise network operating center (NOC).

10.1.3 SNMP

SNMP (Simple Network Management Protocol) is the predominant protocol for multivendor enterprise networks and is widely supported by routers and switches in wide and local area networking. SNMP provides a command set for soliciting status (SNMP Gets) or setting operational parameters (SNMP Sets) of target devices. An enterprise-wide SNMP management platform is typically run as a graphical interface on a large UNIX or NT workstation and may poll hundreds of devices throughout the routed network. The management platform contains the SNMP manager; the managed router, hub, or switch contains an SNMP agent.

Device status information may include a variety of data points: serial number, vendor ID, enclosure status, port type, port operational state, traffic volumes, error conditions, and so on. This information is organized in a management information base (MIB), which is maintained by both the management workstation and the SNMP agent within the managed device. For LAN and WAN management, several standard MIBs have been sanctioned by the internetworking community through a series of RFCs. If a vendor wishes to include additional device information that is not specified in a standard MIB, vendor-specific parameters or status can be compiled in a proprietary enterprise MIB or MIB extension.

Device information, status, and control variables within an MIB are organized in a hierarchical data structure called structure of management information (SMI). SMI defines an information tree whose branches lead to various management information bases and whose leaves are discrete data about a device's functionality and status. SMI notation, which is carried within an SNMP query or command, is essentially an address pointing to the location of the data requested by the management workstation.

In the MIB example given in Figure 10-5, the SMI notation 1.3.6.1.4.1.2490.1.1 points along multiple branches of the management tree and terminates at a vendor's MIB extension for Fibre Channel hub variables. Below this point, additional branches may lead to specific port, environment, and diagnostic values. With the adoption of standard Fibre Channel MIBs, common features shared by all vendors migrate to standard MIBs, whereas vendor-specific functionality will remain in proprietary MIB extensions. This structure accommodates further feature enhancements without restricting vendors to the lowest common denominator management capability.

Figure 10-5. SNMP SMI notation for a Fibre Channel loop hub device
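
To make the SMI notation concrete, the sketch below issues a single SNMP Get using the open-source pysnmp package (the 4.x high-level API). The device address, community string, and the ".0" instance suffix appended to the example OID from Figure 10-5 are all assumptions made up for the illustration; a real device's leaf objects are defined by its standard and enterprise MIBs.

```python
from pysnmp.hlapi import (getCmd, SnmpEngine, CommunityData,
                          UdpTransportTarget, ContextData,
                          ObjectType, ObjectIdentity)

# Hypothetical hub at 192.168.1.1; OID and instance suffix are illustrative.
error_indication, error_status, error_index, var_binds = next(
    getCmd(SnmpEngine(),
           CommunityData('public', mpModel=1),         # SNMPv2c read community
           UdpTransportTarget(('192.168.1.1', 161)),   # agent address and port
           ContextData(),
           ObjectType(ObjectIdentity('1.3.6.1.4.1.2490.1.1.0')))
)

if error_indication:
    print("SNMP error:", error_indication)
else:
    for oid, value in var_binds:
        print(oid.prettyPrint(), "=", value.prettyPrint())
```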


In addition to providing the ability to actively query a device for status, SNMP allows a device to generate a trap, or unsolicited status information. If a preconfigured error condition or threshold is reached, the managed device will initiate an SNMP message to the management workstation as an alert. On the management platform, the icon representing the managed device typically turns yellow or red, and the application may send a page or e-mail to the network operator. Because SNMP is a protocol shared by the vendor's device management application and third-party management platforms, SNMP traps can be directed to either one. Integration into HP OpenView, for example, would allow a trap to be addressed to the OpenView management workstation, which could then automatically launch the vendor's device manager for further status or diagnosis of the problem.

Any management facility based on IP implies additional traffic on the messaging network. Although polling for all status information from multiple SNMP agents may generate noticeable overhead, SNMP management applications can reduce IP traffic by relying on traps for notification of events or by polling only for changes from an original status check.

10.1.4 HTTP

Use of HTTP (Hypertext Transfer Protocol) is a management tool that leverages the proliferation of Web browsers for managing network products. Web-based management of network resources is being extended through the Web Based Enterprise Management (WBEM) initiative, which makes browsers the common interface for managing diverse network components.

An HTTP management implementation has two components: an HTTP server that resides in firmware in the managed device, and a Web browser (such as Netscape or Explorer) that acts as a management graphical interface. By pointing the browser to the IP address of the managed device, the user can navigate a series of screens to monitor device status or change device parameters.
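
Because the embedded HTTP server speaks ordinary HTTP, the same status pages a browser displays can also be fetched by a script. The sketch below is a hypothetical example using the Python requests library; the device address, URL path, and credentials are invented, since the page layout and authentication scheme are entirely vendor-specific.

```python
import requests

# Hypothetical managed device; the /status path and credentials are invented.
DEVICE_URL = "http://192.168.1.20/status"

resp = requests.get(DEVICE_URL,
                    auth=("admin", "password"),   # HTTP Basic auth, if supported
                    timeout=5)
resp.raise_for_status()
print(resp.status_code, len(resp.text), "bytes of status HTML retrieved")
```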

Embedded HTTP capability is an entry-level management scheme and does not integrate into broader management applications as easily as SNMP or SES. Because a browser can be addressed only to one IP target at a time, the administrator would have to open multiple instances of a browser application to manage multiple HTTP-based devices. Embedded HTTP management does not facilitate a comprehensive view of all managed products. Management-by-browser is therefore more suited to small SAN installations with a single fabric to monitor. Recognizing that HTTP-based management has an appeal for the low end of the market, vendors of SAN products may provide both HTTP and SNMP management options.


Even with password protection at the managed device, HTTP presents a higher security risk than other methods. All that is required to access an HTTP-managed device is a browser (which everyone has) and the proper IP address (which can be hacked). At minimum, password protection should be provided for both read-only and read/write access, complemented by a firewall to prohibit external penetration of the customer network.

Browser-based management can also be implemented in a client/server configuration, as shown in Figure 10-6. A management workstation communicates with devices using SNMP and concurrently provides an HTTP server interface to remote browser consoles. This arrangement allows multiple remote managers to view the storage network via browsers and overcomes the single-device limitation of embedded HTTP management. By assigning read-only and read/write permissions, the administrator can control which remote managers may only monitor the network and which may modify its configuration.

Figure 10-6. Browser-based HTTP client/server management

10.1.5 Telnet

Unlike typical SNMP- or HTTP-based management solutions, a Telnet implementation is a text-only, command line interface. The user sends a Telnet command across the IP network to the managed device (for example, Telnet 192.168.1.1) and establishes a session. The vendor, hopefully, has provided one or two layers of password protection, beyond which a menu of commands is available. Telnet is usually used only to set the device's default IP address to one that can be used for SNMP access, but a command line interface alone is sometimes the tool of choice for UNIX environments.

The command set available and status information that can be queried via Telnet are vendor-dependent. Some offer only basic configuration commands, whereas others provide commands and queries for all functions that are available through the vendor's SNMP graphical interface.

10.1.6 Storage Network Management Issues

Storage network management faces several challenges previously encountered in local and wide area networking. In complex multivendor, multiplatform environments, customer requirements cannot be met with proprietary management solutions, even if a vendor offers a rich feature set for management of its own products. Open systems exert pressure on all vendors to seek common methods cooperatively while competing for market share with unique functionality. This contradiction slowly and sometimes painfully works its way through standards bodies and industry groups, with both technical and political consequences. Still, the net result of this process is always preferable to a closed, proprietary approach.

Because storage network management is shaped by the methods and capabilities of individual device managers, the configuration, status, and diagnostic features of each Fibre Channel network product are more useful if they are accessible through a common platform. Development of standard Fibre Channel MIBs, for example, simplifies the creation of comprehensive network management applications and enables high-level storage management platforms to gather data from the SAN interconnect. From the customer's standpoint, the ability to manage a variety of products from a common interface has significant value because it consolidates network management consoles and streamlines staffing and training requirements.

A common management language also facilitates network management of the SAN. A single enterprise may have NT, Solaris, AIX, HP-UX, and other operating systems spread across the network. Network management may be centralized in a NOC or may be dispersed throughout individual business units. To accommodate the variety of platforms on which management applications may be run, the trend in SAN management has been toward a standardized management interface suitable for multivendor SANs and one that can be implemented on a variety of management platforms. As discussed later in this chapter, the Common Information Model (CIM) is being developed to meet these requirements.


10.2 Storage Resource Management

In the SAN management hierarchy, storage resource management (SRM) applications are a subset of more comprehensive storage management platforms. SRM applications are available as stand-alone programs or as plug-in modules for broader management applications. They are written to either homogeneous or heterogeneous operating system environments.

The primary value offered by these programs is the ability to make all distributed disk assets visible to a single management console. Because SRM applications are focused exclusively on storage availability and utilization, the way that storage is physically connected to servers is largely transparent. Therefore, SRM applications are not unique to SANs but may also encompass internal workstation-attached, SCSI-attached, and NAS storage. The SAN-specific features of a storage resource application surface when, via the SAN interconnection, multiple servers have access to the same storage arrays and when the SRM workstation itself is SAN-attached.

Without a storage resource mechanism, disk administration is limited to individual servers. For an enterprise with hundreds or thousands of servers, and consequently hundreds or thousands of disks and arrays, it would be overwhelming to manually consolidate information on disk utilization, location of operating systems, location of common applications, the amount of disk space budgeted for users or departments, and so on. Consequently, applications are unnecessarily duplicated on multiple servers, and disk space is underutilized in some departments and overutilized in others. In addition, tape backup requirements cannot be properly sized, and sudden outages due to disk starvation cannot be anticipated.

Storage resource management addresses these shortcomings by automating the process of disk information retrieval and presenting a single view of all disk resources. SRM client software on each server periodically updates information on its assigned volumes and directories and forwards this data to the SRM manager. The SRM management platform, in turn, consolidates the status information from multiple clients in a relational database and may, depending on the vendor's implementation, provide storage policies that issue alerts when usage thresholds are exceeded. As with trending tools in local and wide area networks, SRM applications may also offer enhanced capacity planning utilities that facilitate redistribution of storage resources and provide data for accurate budget forecasts of impending storage needs.
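
The policy layer described above reduces to a periodic comparison of reported usage against thresholds. The sketch below is a simplified, hypothetical illustration of that rollup: the per-server volume reports and the 85 percent threshold are invented, and a real SRM platform would persist the data in a relational database and raise alerts through its own notification framework.

```python
# Usage reports as an SRM manager might receive them from its server clients.
reports = [
    {"server": "db01",  "volume": "/oracle", "capacity_gb": 500, "used_gb": 460},
    {"server": "web02", "volume": "D:",      "capacity_gb": 200, "used_gb": 90},
    {"server": "mail1", "volume": "/var",    "capacity_gb": 300, "used_gb": 270},
]

THRESHOLD_PCT = 85   # alert when a volume is more than 85 percent full

def check_thresholds(reports, threshold_pct=THRESHOLD_PCT):
    """Yield (server, volume, percent_used) for volumes over the threshold."""
    for r in reports:
        pct = 100.0 * r["used_gb"] / r["capacity_gb"]
        if pct > threshold_pct:
            yield r["server"], r["volume"], round(pct, 1)

for server, volume, pct in check_thresholds(reports):
    print(f"ALERT: {server}:{volume} is {pct}% full")
```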

Because SANs enable a storage-centric model, SRM applications can be leveraged more effectively for optimal use of storage than can fixed SCSI configurations. JBODs and RAIDs on a storage network, for example, offer more flexibility in redistribution of disk space between SAN-attached servers and more easily accommodate increases in the pool of storage without system down time. At a higher level, storage utilization data via SRM can be correlated with transport utilization via storage network management to determine the optimum configuration of servers and storage to meet capacity requirements and traffic patterns.


10.3 Storage Management


Storage management (also referred to as enterprise storage management) is a broader category of storage functions that may, in addition to asset tracking, include tape backup, archiving, data placement, and volume and file policy administration. A storage management application may be an umbrella, multifunctional platform, packaged as a suite of complementary software products or as stand-alone programs dedicated to specific management functions (such as tape backup). As with SRM applications, storage management is facilitated by, but not dependent on, storage area network topologies. Vendors of storage management products, however, have become SAN proponents, largely because of the greater flexibility provided by SANs over other storage configurations.

Storage management functions may monitor access to shared resources such as tape libraries or optical archives, schedule nondisruptive disk defragmentation routines, manage the growth and integrity of file systems, and oversee cross-platform access to storage. The more tightly integrated these activities are within a single application, the simpler storage administration becomes for day-to-day operations. As more storage, tape, and archive systems appear on the SAN, integration is further enhanced, because the storage management platform itself can access resources directly without having to pass through servers.


10.4 Integration of Storage, Systems, and Enterprise Management

Systems management applications consolidate the process of monitoring and controlling network resources, and they often encompass different networks within an enterprise under a single application umbrella. A systems management application may be dedicated to a specific function, such as utilization and performance of all network resources, or it may incorporate related categories across multiple networks, such as asset and accounting control for tracking resources and billing individual departments' use of network equipment and bandwidth. Systems management functions such as security management and fault management span wide area, local area, and storage networks, and so they require common interfaces to all topologies.

Enterprise management applications embrace all or most of the individual systems disciplines and provide the most comprehensive view of all corporate network resources. Sitting at the top of the management food chain, enterprise management relies on a steady diet of status, statistics, configuration, utilization, performance, and other management data fed to it by the underlying management subsystems and entities. Because storage networking introduces another rich set of data points for systems and enterprise management, initiatives have been proposed to rationalize the flow of management data between the lowest-level network structures and the upper-level, enterprise-wide management applications.

10.5 The Common Information Model (CIM) (Bluefin)

The storage networking industry has been struggling with management issues since its inception. In the evolution of a new technology, basic functionality must be developed before advanced services can be implemented.

For SAN technology in particular, the development of standardized management, quality of service, and security services has trailed far behind the deployment of infrastructure and SAN-enabled applications. Consequently, customers are handed an assortment of installation CDs and manuals for different device managers and storage utilities, and they are expected to administer a single SAN through a myriad of management consoles.

Although hardware vendors may use SNMP, HTTP, or Telnet to configure and monitor their products, value-added management may be provided by proprietary APIs. APIs enable dynamic and granular management but are vendor-specific. Proprietary APIs have thus become a tool for maintaining a competitive advantage in the market, as well as a means to exclude others from providing products that can be managed from a common platform. The situation is not improved by "gentlemen's agreements" between dominant vendors to exchange their proprietary API information so that they can manage each other's products. This simply expands the circle of market influence from individual vendors to vendor affinity groups but does not provide the customer an open systems solution.

In response to single-vendor initiatives and to customer dissatisfaction with continued proprietary offerings, the storage networking industry has rallied around an open systems management proposal based on the Common Information Model. The CIM initiative for storage networking (known as Bluefin in its draft form) will provide a standards-based alternative to proprietary SAN management and may ultimately achieve the goal of comprehensive management of multivendor storage networks from a single console.

The Common Information Model has been under construction for some time as an object-oriented framework for managing virtually anything connected to a network. CIM development in general has been led by the Distributed Management Task Force (DMTF), whereas storage network-specific CIM has been developed within the Storage Networking Industry Association.

CIM provides a common management interface under the umbrella of Web Based Enterprise Management, or WBEM. WBEM leverages the ubiquity of Internet protocols and browsers to provide enterprise-wide management of all IT resources. This enables a commonly deployed and essentially free user interface for a comprehensive and sophisticated management infrastructure.

The CIM specification maps existing management protocols such as SNMP and APIs to a structured, object-based model. Because the CIM specification is all-encompassing, it divides the world of managed entities into schemas. CIM schemas are organized frameworks that define systems, applications, networks, devices, and physical elements to be managed. SAN management thus involves the definition of schemas that are specific to storage and storage transport management.

As shown in Figure 10-7, CIM sits between managed devices and the upper-layer management application. To provide backward compatibility, SNMP and other management protocols can be accessed via an entity mapping. This enables the current generation of SNMP-based devices to be grandfathered into a CIM-based management framework. Alternatively, vendors can implement standardized APIs within their products that communicate directly to the common interface. CIM information is formatted via Extensible Markup Language (XML) and delivered to a browser using HTTP over TCP/IP. Use of standard Internet tools enables WBEM/CIM to be run on any operating system and any IP infrastructure.

Figure 10-7. Common Information Model for multivendor, cross-platform management

For storage networking, the CIM model is implemented through agents that can be embedded in SAN hardware or software products. An HBA, for example, can support a CIM agent that provides standard HBA-specific information in the structure of an object class HBA. For devices that are not CIM-compliant, a proxy server can solicit status and configuration information through standard SNMP or HTTP protocols and present that data in CIM format. Agents report to clients, which are distributed management applications that gather and rationalize the aggregated management data. The management client can implement a CIM object manager (CIMOM) to manage a diversity of storage devices and resources.
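
From a client's perspective, CIM access can look like the sketch below, which uses the open-source pywbem package to connect to a CIM object manager and enumerate instances of a class. The CIMOM address, credentials, and choice of class are assumptions for illustration; the classes actually exposed depend on the schemas the agent implements, and many vendors surface subclasses of the DMTF core classes instead.

```python
import pywbem

# Hypothetical CIMOM endpoint and credentials (5989 is the conventional
# HTTPS port for WBEM services).
conn = pywbem.WBEMConnection("https://cimom.example.com:5989",
                             ("administrator", "password"))

# Enumerate instances of a class from the DMTF core schema; a real agent
# may require a different namespace or expose vendor-specific subclasses.
for instance in conn.EnumerateInstances("CIM_ComputerSystem"):
    print(instance.classname, instance.path)
```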

Standardization of CIM for storage networking is being assumed by the SNIA. Standards development is expensive, both in terms of actual costs for technical editors and standards compliance testing, and in terms of volunteer resources from contributing companies and individuals. Achieving an open systems solution for management of SAN applications, subsystems, and devices, however, will remove one of the major impediments to SAN adoption and will enable more customers to realize the benefits of shared storage.


10.6 Chapter Summary

Storage Network Management

• Management of SANs falls within a hierarchy of management applications, extending from hardware device managers through enterprise-wide management applications.

• Device managers convey commands and status from hardware to an external management application, typically via SNMP, HTTP, or proprietary APIs.

• Storage network management relies on information from device managers and consolidates management data into a single view of the SAN interconnection.

• In-band management moves management data across the SAN transport. Out-of-band management relies on SNMP, HTTP, or other protocols to move SAN management data via a separate Ethernet LAN segment.

Storage Resource Management

• Storage resource management provides a consolidated view of distributed disk resources, available space, file locations, user storage budgets, and so on, and it facilitates capacity planning.

• SRM applications can monitor both direct-attached and SAN-attached storage resources.

Storage Management

• Storage management may include resource functions as well as tape backup management, data archiving, data placement, and file policy administration.

• Storage management applications are facilitated by the SAN's ability to share tape and storage resources.

Integration of Storage, Systems, and Enterprise Management

• Management of storage and storage networking can be integrated into higher-level systems and enterprise management applications.

• Integration is facilitated by common programming interfaces.

The Common Information Model (CIM)

• Web Based Enterprise Management (WBEM) leverages Internet protocols and browser interfaces to provide management of a diversity of network products and applications.

• The Common Information Model (CIM) provides an object-based structure for defining categories of managed devices and their attributes.

• CIM information is formatted using Extensible Markup Language (XML) and is transported via HTTP and TCP/IP.

• CIM implementations for storage networking are based on CIM agents in devices and applications, and CIM clients in management frameworks.


• A CIM client may provide a CIM object manager (CIMOM) to manage a diversity of storage resources from a single platform.

• WBEM/CIM management for storage is being led by the Storage Networking Industry Association and will simplify management of complex multivendor SANs.


Chapter 11. Storage Virtualization

Storage virtualization has been alternately praised as a panacea for all sorts of storage networking ills and denounced as a marketing fraud perpetrated by avaricious vendors on desperate and gullible customers. Although vendor greed may in fact result in marketing excesses, the potential for storage virtualization to dramatically reduce administrative costs is so great that wild enthusiasm for the technology has been difficult to restrain. If only people agreed on what it actually does. This chapter presents an overview of storage virtualization and the sometimes conflicting proposals for how it should be implemented. As the technology matures, storage virtualization will facilitate the deployment of shared storage and will significantly reduce the cost of storage administration.


11.1 What Is Storage Virtualization?

The common denominator for storage virtualization initiatives is the abstraction of storage from its physical deployment and configuration. Without this logical abstraction, the host system would have to discover and manage every physical disk or tape subsystem under its control. It would have to be aware of the data paths between the host and its storage resources and would have to accommodate the unique characteristics of each target device. Storage virtualization hides this physical complexity from applications residing on the host system, instead presenting a simplified view of storage resources. With a virtualization mechanism in place, the application sees only specified storage volumes that may in fact reside on a very complex assortment of physical storage devices on a multipathed network infrastructure.

RAID controllers are an obvious example of storage virtualization engines. As discussed in Chapter 5, a RAID controller sits between the host and the physical disk drives on which storage data is placed. Each physical disk supports a logical unit that executes SCSI read and write commands. Without a RAID controller to manage the physical drives, the host would have to attach to and manage each one individually—for example, as a JBOD. The RAID controller simplifies connectivity and management of physical drives, however, by appearing to the host as a single storage resource. The host system is not aware of the physical configuration of disks sitting behind the controller, their individual attributes, nor even how many disks are installed in the RAID chassis. When querying the RAID controller, the host sees one or more LUNs that the RAID controller has logically aggregated from its attached drives. In addition to striping data across several physical disks to enhance performance, the RAID controller may provide mirroring for data integrity. These internal RAID functions are transparent to the host, which benefits from both simplified administration and the efficiency provided by the RAID system.

Although RAID storage devices were not initially marketed under the banner of storage virtualization, they are an atomic component of a virtualized solution. Just as a single RAID controller provides a simplified logical view of its attached storage devices, multiple storage systems can be virtualized to appear as a single storage pool, as shown in Figure 11-1. As with an individual RAID controller, this superset of abstraction requires an intelligence that sits between the host and multiple downstream storage arrays. Various schemes have been offered to solve the problem of where to place the virtualization engine for greatest efficiency.

Figure 11-1. Physical and logical views of a virtualized storage network configuration
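
A minimal way to picture the virtualization engine in Figure 11-1 is as a table that maps regions of a logical volume onto extents carved from different physical arrays. The sketch below is purely conceptual; the array names, extent sizes, and block addresses are invented, and a production engine would also handle striping, mirroring, caching, and on-line remapping.

```python
# Each logical volume is an ordered list of (physical_array, start_block, length)
# extents; the host only ever sees the logical block address (LBA) space.
VOLUME_MAP = {
    "vol_finance": [("raid_array_A", 0,       500_000),
                    ("jbod_B",       100_000, 250_000),
                    ("raid_array_C", 0,       250_000)],
}

def resolve(volume, lba):
    """Translate a logical block address into (array, physical_block)."""
    offset = 0
    for array, start, length in VOLUME_MAP[volume]:
        if lba < offset + length:
            return array, start + (lba - offset)
        offset += length
    raise ValueError("LBA beyond end of logical volume")

print(resolve("vol_finance", 499_999))   # -> ('raid_array_A', 499999)
print(resolve("vol_finance", 600_000))   # -> ('jbod_B', 200000)
```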


Storage virtualization takes on the burden of configuration and administration of physical storage devices but also enables more efficient utilization of storage resources. Without a virtualization intelligence, individual hosts would be assigned separate storage arrays. In that case, one host might use only 40 percent of its assigned storage capacity, whereas another host might use 90 percent. You would not be able to assign the excess capacity of one host for use by another host, and you might purchase additional physical storage even though a surplus actually existed. Storage virtualization aggregates the total capacity in the SAN and, with the proper implementation, can automatically balance utilization across multiple hosts. This reduces the total cost of the shared storage system and helps to consolidate storage administration.

The virtualization engine also screens the host systems from changes on the back-end storage network. Because the host accesses storage via the virtualization intelligence and has no awareness of the physical devices in the storage pool, new storage can be added nondisruptively. Networking storage via SANs facilitates scaling of storage resources, and virtualization masks the physical adds, moves, and changes to the network. If virtualization is properly implemented, the ability to hide physical changes in the storage configuration should eliminate or at least minimize down time.

If myriad physical storage devices now appear as a single logical storage pool, the capacity of the individual devices that compose the pool can be more efficiently utilized. As shown in Figure 11-2, traditional direct-attached storage cannot leverage unused capacity. Because each server is the exclusive owner of its own fixed storage resources, there is no means to balance capacity among multiple servers. One server may have excess capacity, while its neighbor is exceeding its allotted storage. Whereas SAN technology facilitates sharing of storage capacity, virtualization makes the sharing transparent.

Figure 11-2. Inefficient storage utilization with direct-attached storage

As shown in Figure 11-3, the virtualization of storage into a single pool enables more efficient utilization of the total storage capacity. Available disk resources can be used on demand, and the total capacity can be resized nondisruptively.

Figure 11-3. Maximizing capacity utilization with storage virtualization

In addition, the composition of the storage pool itself can vary, depending on performance, availability, and budget requirements. To boost performance, the storage pool can present logical volumes that ultimately reside on RAID arrays with large cache memory. For availability, the RAID arrays can be mirrored to other RAID systems or to cheaper JBOD banks. For economy, the entire pool can be composed of JBODs, or even SCSI disks that are front-ended by SAN bridges. Because the potentially eclectic assortment of physical storage devices is hidden behind the virtualization engine, the complexity of storage capabilities and redundancy does not need to be managed by individual servers.

Storage virtualization strategies can be extended over distance by use of SAN extension or IP storage technology. This makes it possible, for example, to implement campus or metro storage pools that further streamline storage operations and amortize storage costs over more servers. If multiple departments within an enterprise are brought into a common virtualized storage scenario, the unique requirements of each can be accommodated by policy-driven mechanisms to enforce service level agreements. This setup represents a higher level of virtualization intelligence, because the virtualization engine must not only aggregate diverse physical storage but also determine which applications require special treatment in terms of performance and availability.

The discussion thus far may imply that these capabilities are already available in virtualization products. Although some features may be available, the higher-level functionality for automatic sizing of volumes, nondisruptive changes, and quality of service is still under construction. The ideal of application-aware virtualization, which identifies data types and automatically determines which storage policies should be applied, may not be available for some time.

11.2 In-Band and Out-of-Band Virtualization

In traditional shared storage access, the path between the server and the networked storage targets carries both the information about how data is stored (metadata) and the data itself, as shown in Figure 11-4. A file system, for example, may reside on a physical storage array, containing metadata such as file names, properties, and lists of block addresses where each file's data is stored. The rest of the array is composed of the data blocks to which the file system metadata points. The host system thus accesses metadata and the data itself in-band, through the transport path established by the SAN between server and target array.

Figure 11-4. Conventional in-band access to metadata and data over a SAN

What happens, then, when a virtualization engine abstracts both the view of metadata and the physical location of data blocks? If the virtualization engine remains in the path between servers and storage, it preserves in-band access. Alternatively, the virtualization engine can split the access path to metadata from the access path to data, creating an out-of-band implementation. As shown in Figure 11-5, the server's view of metadata shifts from physical storage to the virtualization entity, which in turn controls how data blocks are physically distributed in the storage pool. In reality, the metadata always sits somewhere on physical disk media, whether in the storage pool or on dedicated disks within the virtualization product.
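
One way to picture this metadata is as an extent map from a virtual volume to physical devices, as in the hypothetical Python sketch below. In-band and out-of-band designs differ mainly in where the lookup runs: inside an appliance or switch in the data path, or in a host-resident agent that consults a separate metadata service and then sends I/O directly to the device.

    # Simplified extent map for one virtual volume; purely illustrative.
    from dataclasses import dataclass

    @dataclass
    class Extent:
        virtual_start: int      # first virtual block covered by this extent
        length: int             # number of blocks in the extent
        device: str             # physical array or JBOD backing the extent
        physical_start: int     # starting block on that device

    volume_map = [
        Extent(0,    4096, "raid_array_1", 10000),
        Extent(4096, 4096, "jbod_2",       0),
        Extent(8192, 8192, "raid_array_1", 50000),
    ]

    def resolve(virtual_block: int) -> tuple[str, int]:
        """Translate a virtual block into (device, physical block) using the map."""
        for ext in volume_map:
            if ext.virtual_start <= virtual_block < ext.virtual_start + ext.length:
                return ext.device, ext.physical_start + (virtual_block - ext.virtual_start)
        raise ValueError("block not mapped")

    # In-band: this lookup runs in the data path for every I/O.
    # Out-of-band: a host agent performs it, then talks to the device directly.
    print(resolve(5000))        # ('jbod_2', 904)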

Figure 11-5. In-band and out-of-band virtualization solutions

Vendors of each solution are quick to point out the superior benefits of in-band or out-of-band virtualization methods. The in-band architecture is the least intrusive from the standpoint of the server. Instead of discovering and attaching to multiple storage targets across the SAN, the server now sees only a single large storage resource in its path, represented by the virtualization appliance. The in-band virtualization engine manages the disparate physical storage devices in the downstream SAN and presents a coherent image of metadata to the server. But because the virtualization engine in this case presumes to sit directly between servers and storage, it is a potential bottleneck as more servers and storage are added. Vendors must therefore demonstrate that their in-band solutions do not impede data throughput and can handle both virtualization processing and data transport at wire speed. Performance can be enhanced by incorporating the virtualization engine directly into the SAN interconnection, melding switching and virtualization functions into a single product. This both streamlines the virtualization solution and leverages the ability of a SAN to support heterogeneous storage devices.

Whether appliance-based or switch-based, in-band virtualization may be vulnerable to disruption. Because the in-band appliance now becomes the sole gateway between servers and storage, failure of the appliance can result in loss of data access for the entire SAN. A distributed virtualization solution could resolve this problem by allowing multiple appliances to share metadata information and provide failover in the event of disruption.

The out-of-band architecture avoids in-band issues by placing metadata control outside the data transport path. Individual servers, however, must have virtualization software agents installed so that I/O requests can be redirected to the out-of-band appliance. Separating data and control paths ensures that the virtualization appliance will not act as a bottleneck to data transport, although, as with in-band methods, loss of the out-of-band appliance itself can still be disruptive. Redundant out-of-band appliances with synchronized I/O control could be configured to avoid outages and provide high availability.

The debate between the in-band and out-of-band approaches is tied to another virtualization quandary: Where should the virtualization intelligence reside? The three commonly proposed sites are in the host, in the interconnection, and in the storage target. Each location has its strengths and weaknesses, and some approaches attempt to distribute virtualization support across all three.

11.3 Host-Based Storage Virtualization

Host-based virtualization can be implemented through a logical volume management (LVM) application installed on each server. As with other solutions, the host-based virtualization application hides the complexity of physical storage while presenting a logical view of storage resources to the operating system. Because the abstraction of physical storage occurs within the server itself, there is no splitting of control and data paths (and in fact no "path" per se). Host-based virtualization bypasses the bottleneck and performance issues associated with in-band virtualization. The host can be multipathed to shared storage, with any mix of storage target devices at its disposal, as illustrated in Figure 11-6.

Figure 11-6. Host-based storage virtualization

The major drawback of host-based virtualization is that virtualization on each host must be manually configured and managed separately. The more servers, the more administrative overhead this imposes. In addition, connectivity to the SAN and individual targets must also be configured on a server-by-server basis. For shops with small server farms, this level of administration may be acceptable, but then with few servers and storage targets there is less incentive to deploy virtualization in the first place.

11.4 SAN Interconnect-Based Storage Virtualization

Interconnect-based virtualization solutions move the virtualization intelligence into the heart of the SAN, either as a third-party appliance or as enhanced functionality within a Fibre Channel or Gigabit Ethernet SAN switch. The rationale behind interconnection-based virtualization is that because the storage network already provides connectivity between all servers and storage devices, it can more easily intercept and process storage requests transparently. No special agents are required on the host systems, and any mix of storage targets can be supported. All that is required is sufficient processing power on the appliance or SAN switch to manipulate metadata and still provide wire-speed data transport.

As shown in Figure 11-7, in an interconnection-based virtualization solution, servers may attach to SAN switches as usual, with dual pathing for failover or load balancing. Unlike traditional SAN attachment, however, the servers do not discover individual physical storage devices on the SAN. Instead, the virtualization intelligence in the SAN interconnection (in this example, in the SAN switches themselves) presents a logical view of a storage pool. Individual servers are assigned LUNs within the pool but are not aware of which storage devices are actually supporting the data blocks those LUNs represent. For high availability, the interconnection-based virtualization engines may provide mirroring between, for example, a RAID array and a JBOD. This is transparent to the server, which nonetheless benefits from the mirroring done on its behalf.

Figure 11-7. Network-based virtualization in a redundant SAN switch configuration

In this illustration, the switch-based virtualization is deployed in a redundant configuration, ensuring dual paths to both servers and storage as well as redundancy for the virtualization engines. For this to work, the virtualization engines must be constantly synchronized to ensure that each has a duplicate and current image of the metadata of the other. As with server clustering, a heartbeat protocol keeps both engines aligned and triggers failover if a switch or virtualization entity within it fails.
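
The synchronization requirement can be sketched as follows, assuming a hypothetical pair of engines that mirror every metadata update to the peer before acknowledging it; shipping products use their own proprietary protocols for this.

    # Hypothetical mirrored metadata update between redundant virtualization engines.
    class VirtualizationEngine:
        def __init__(self, name: str):
            self.name = name
            self.metadata = {}      # virtual volume -> physical layout
            self.peer = None        # the redundant partner engine

        def update_metadata(self, volume: str, layout: dict) -> None:
            self.metadata[volume] = layout
            if self.peer is not None:
                self.peer.metadata[volume] = layout   # synchronous copy to the peer
            # Only after the peer holds a copy would the change be acknowledged.

    engine_a = VirtualizationEngine("switch_a")
    engine_b = VirtualizationEngine("switch_b")
    engine_a.peer, engine_b.peer = engine_b, engine_a

    engine_a.update_metadata("vol_finance", {"extents": ["raid_array_1:0-4095"]})
    assert engine_b.metadata == engine_a.metadata     # either engine can take over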

Interconnect-based storage virtualization offers significant advantages and disadvantages. In its favor, network-based virtualization is completely transparent to servers, and so it readily accommodates heterogeneous server environments and minimizes server administrative tasks. It is also completely transparent to the storage targets, enabling any mix of enterprise-class or departmental RAID arrays or JBODs from any vendor.

As an in-band solution, however, interconnect-based virtualization does pose a potential bottleneck in terms of performance. Vendors must demonstrate that their interconnect-based appliances can provide all the benefits of virtualization without imposing a performance penalty. The main drawback of switch-based virtualization in particular is that it places an enormous responsibility on products that were originally designed for relatively simple switching tasks. Even at additional cost, providing a product that combines high-performance SAN switching with complex virtualization management is not a trivial task.

11.5 Storage-Based Virtualization

As storage vendors themselves will modestly admit, storage arrays are the perfect place to house virtualization technology. Enterprise-class storage arrays already provide low-level virtualization via RAID and mirroring, and many of them support higher-level functions such as storage pooling across multiple arrays. As with interconnection-based virtualization, array-based virtualization may be completely transparent to servers and may require no special agent software. Unlike interconnection-based virtualization, storage arrays reside at the end points of the SAN, not the middle, and so are less likely to generate SAN-wide bottlenecks to performance.

Storage-based virtualization from leading vendors provides direct disk-to-disk data replication that simplifies (at a cost) the implementation of high availability for applications such as business continuance and disaster recovery. EMC SRDF, HDS TrueCopy, HP DRM, and XIOtech REDI-SANlinks, for example, all provide means to replicate storage data without server intervention, thus fulfilling one major component of storage virtualization.

Some storage-based virtualization solutions provide additional functions, such as point-in-time snapshots of storage data to facilitate windowless tape backup, dynamic storage pool creation within an enclosure, and storage pooling between multiple enclosures, as shown in Figure 11-8.

Figure 11-8. Storage-based virtualization for pooling and data replication

If storage-based virtualization sounds too good to be true, it is. The major limitation of array-based virtualization is not technical but commercial in nature. Storage-based virtualization solutions are proprietary to each vendor, and so they can be implemented only in homogeneous storage environments.

Although that may be acceptable for customers who prefer to implement uniform storage products throughout their networks, most customers resist solutions that foster single-vendor lock-in. In addition, the spontaneous accumulation of storage products from multiple vendors on a departmental basis or through company acquisitions often results in a heterogeneous mix of storage investments, all of which could benefit from a vendor-neutral virtualization solution. This has driven attention from array-based virtualization to the upstream interconnection-based solutions as a means to accommodate multivendor environments.

11.6 Multivendor Storage Virtualization

Although the SNIA is attempting to define the basic characteristics of storage virtualization to determine whether standardization can be applied, to date little progress has been made toward an open systems approach to storage virtualization. This is not because of lack of interest or support by vendors and customers. Instead, the causes are the complex facets of the virtualization gem and the difficulty of retrofitting a standards-compliant framework onto storage arrays whose added value is already based on deeply proprietary technologies.

Because storage-based virtualization is inherently proprietary, heterogeneous SANs must rely on either host-based or interconnection-based solutions. Given the administrative overhead of host-based virtualization, interconnection-based solutions are by default more suitable for larger multivendor storage networks. With the caveat of potential performance bottlenecks, interconnection-based solutions can at least accommodate heterogeneous server platforms upstream and mixed environments of RAIDs and JBODs downstream with less administrative overhead. Reduction of administrative overhead and its associated costs is, after all, one of the main drivers of virtualized storage solutions.

11.7 File System and NAS Virtualization

Similar in concept to the virtualization of block-based storage, the virtualization of file systems promises to simplify administration and facilitate sharing of resources. File systems, whether supported on individual file servers or on NAS appliances, become difficult to manage as they proliferate throughout the network.

As shown in Figure 11-9, a virtualized file system configuration relies on software installed on each host. The virtualized file system is the result of merging file system metadata from multiple systems into one extended file system. This design both simplifies administration and enables more efficient use of shared resources such as network-attached storage. As with conventional NAS, NFS and CIFS protocols are used for file access across the LAN, with the further benefit that heterogeneous operating systems can be supported on the same virtualized file system.
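
A virtualized file system can be thought of as a single namespace stitched together from several back-end file servers. The Python sketch below, with invented server names and exports, shows the kind of prefix-to-server mapping a host agent might perform; it is an illustration of the idea, not a description of any particular product.

    # Illustrative global namespace: virtual paths map to the NAS export that holds them.
    namespace = {
        "/corp/engineering": "nas1.example.com:/export/eng",
        "/corp/finance":     "nas2.example.com:/export/fin",
        "/corp/archive":     "nas3.example.com:/export/old",
    }

    def locate(path: str) -> str:
        """Return the backing NFS/CIFS export for a path in the virtual namespace."""
        for prefix, export in sorted(namespace.items(),
                                     key=lambda item: len(item[0]), reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return export
        raise FileNotFoundError(path)

    # Applications see one directory tree; the agent redirects each access.
    print(locate("/corp/finance/q3/budget.xls"))      # nas2.example.com:/export/fin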

Figure 11-9. Virtualized file system based on NAS storage

Virtualization at the file system level can be combined with virtualization of block-level data. The NAS appliances shown in Figure 11-9, for example, could use SAN-attached external storage with interconnection-based virtualization engines to create a storage pool. Administration of both file systems and block data storage would thus be streamlined, reducing the cost of ongoing system management.

11.8 Tape Virtualization

Virtualization of tape subsystems relies on an in-band intervention between backup servers or appliances (such as third-party copy agents) and tape libraries. Without the assistance of virtualization intelligence, backing up multiple servers to tape requires multiple tape targets, each with its own tape drives and media control. As the population of servers increases, additional tape drives and media are required, as are additional administrators to oversee backup operations.

A tape virtualization engine may sit in the SAN in the data path between servers and tape subsystems, as shown in Figure 11-10. As with disk storage virtualization, the physical configuration of tape subsystems is transparent to the servers or extended copy agents. Instead, the tape virtualization appliance presents the image of a high-capacity, multidrive tape subsystem that can accept backup streams from multiple servers concurrently. The tape virtualization appliance can contain high-performance cache memory to absorb the backup streams before spooling them to physical tape drives. Because the individual tape drives and associated media are not dedicated to separate backups, fewer physical tape drives are required and use of media can be maximized.
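
The buffering behavior can be pictured with the short sketch below: many concurrent backup streams land in disk cache and are then spooled to a smaller number of physical drives. The class and its interface are invented for illustration and do not represent a product API.

    # Conceptual virtual tape library: many incoming streams, few physical drives.
    from collections import deque

    class VirtualTapeLibrary:
        def __init__(self, physical_drives: int):
            self.physical_drives = physical_drives
            self.cache = deque()                  # backup images waiting in disk cache

        def accept_stream(self, server: str, data_gb: int) -> None:
            """Backup servers write concurrently; data lands in cache first."""
            self.cache.append((server, data_gb))

        def spool_to_tape(self) -> None:
            """Drain the cache to the real drives at the drives' own streaming rate."""
            while self.cache:
                for _ in range(min(self.physical_drives, len(self.cache))):
                    server, data_gb = self.cache.popleft()
                    print(f"spooling {data_gb} GB from {server} to a physical drive")

    vtl = VirtualTapeLibrary(physical_drives=2)
    for host in ("mail", "erp", "fileserver", "web"):
        vtl.accept_stream(host, 100)              # four concurrent streams, two real drives
    vtl.spool_to_tape()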

Figure 11-10. Virtualizing tape backup for performance and reduced administration

Combined with disk storage virtualization, tape virtualization offers further efficiencies to shrink the physical footprint of the SAN and reduce administrative costs. Despite the somewhat surreal sensation generated by the notion of virtual disk backing up to virtual tape, ultimately the data on both disk and tape is quite real. The challenge for vendors is to implement these virtualization schemes with sufficient integrity checks to ensure accurate data storage, archival, and retrieval.

11.9 Virtualization and the Data Storage Utility

Storage virtualization will fulfill its promise when products are able to fully automate data placement. Virtualization is the key to the concept of a data storage utility, in which computing resources are simply plugged in to a network and the network itself is able to identify the type of data being generated, the performance and availability requirements typical of that data type, its quality of service requirements, the security policies that should be enforced, and the appropriate archival methods that should be applied, and then automatically provision additional capacity as the volume of data grows.

Post-production video editing, for example, has unique requirements compared with on-line transaction processing or tape backup. Video streams require high performance, both from the SAN transport and from disk storage. Data written to the longer, outer tracks of disk media requires less disk head movement and repositioning, and so it delivers data more efficiently than data written to inner tracks. Because video production represents a substantial investment of money and human resources, re-creating lost or corrupted image streams is prohibitively expensive. High availability via synchronous mirroring and point-in-time snapshots for immediate tape archiving are therefore appropriate. Video editing may require file sharing so that multiple editing workstations can manipulate the same data streams. The editing facilities themselves may be geographically separated, requiring both high-performance SAN/WAN links and data encryption over untrusted network segments. Finally, the sheer volume of data generated by digitized video requires scalable storage capacity that can seamlessly accommodate the influx of new data.

With current technology, fulfilling the requirements of particular data types is a labor-intensive and often tedious process. Each step requires manual configuration of unique products and careful crafting of processes to ensure interoperability between applications and the supporting infrastructure. The level of automated intelligence required for application-aware, policy-based virtualization is significantly more sophisticated than that required for storage pooling or tape virtualization, but it is achievable. As with any technology, the starting point is to break down individual manual processes into their constituent parts and define the means to replicate them artificially.
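
One of those constituent parts is a policy table that maps a data type to its placement and protection requirements. The sketch below shows how such rules might be expressed; the classes, tiers, and attributes are hypothetical and are not drawn from any shipping policy engine.

    # Hypothetical policy table for application-aware data placement.
    policies = {
        "video_post":  {"tier": "high_performance_raid", "mirroring": "synchronous",
                        "snapshots": True,  "encrypt_wan": True},
        "oltp":        {"tier": "cached_raid",           "mirroring": "synchronous",
                        "snapshots": True,  "encrypt_wan": False},
        "tape_backup": {"tier": "jbod",                  "mirroring": "none",
                        "snapshots": False, "encrypt_wan": False},
    }

    def provision(app_class: str, size_gb: int) -> dict:
        """Return the storage attributes a policy engine would apply to a new volume."""
        return {"size_gb": size_gb, **policies[app_class]}

    print(provision("video_post", 2000))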

The ideal data storage utility must also accommodate heterogeneous deployments of applications, operating systems, servers, interfaces, interconnections, and storage targets. Although customers are not always happy with the rainbow coalition of storage products in their networks, so far they have not been willing to return to the era of monolithic, single-vendor solutions with its accompanying vendor lock-in. Making the data storage utility a reality will therefore require much closer multivendor cooperation. As always, the onus falls on customers to make sure their vendors are sufficiently motivated to take the technology forward.

11.10 Chapter Summary

What Is Storage Virtualization?

• Storage virtualization is the logical abstraction of storage from its physical deployment and configuration.

• By hiding the complexity of physical storage, virtualization greatly simplifies and reduces the cost of storage administration.

• RAID is a low-level virtualization implementation.

• Storage virtualization enables efficient utilization of storage capacity.

• Virtualization enables nondisruptive changes to the back-end storage network.

• Both RAID and JBOD storage can be incorporated into the same storage pool.

• Virtualization can be extended over distance via SAN extension or IP storage.

• Available products may support only low-level virtualization functionality.

In-Band and Out-of-Band Virtualization

• Metadata is information about how data is stored on disk and how disk assets are organized.

• In-band virtualization sends metadata and data over the same network path.

• Out-of-band virtualization separates control metadata and data and sends them over different network paths.

• In-band solutions are unintrusive to both hosts and storage targets, but they may create performance bottlenecks.

• Out-of-band solutions avoid performance bottlenecks, but they require software drivers on each server.

• An in-band virtualization engine can reside in a third-party appliance or can be incorporated into a SAN switch.

Host-Based Storage Virtualization

• Host-based storage virtualization can be implemented via a logical volume manager (LVM).

• Host-based solutions require installation and management of virtualization software on each server.

SAN Interconnect-Based Storage Virtualization

• Interconnect-based virtualization is transparent to hosts as well as storage targets.

• Interconnect-based solutions can support heterogeneous multivendor environments.

• For high availability, redundant virtualization engines are required.

• Interconnect-based solutions must provide adequate performance to avoid becoming bottlenecks to data throughput.

Storage-Based Virtualization

• Storage arrays provide virtualization functions such as RAID and disk-to-disk data replication.

• Storage-based virtualization may allow storage pools to span multiple physical arrays.

• Storage-based virtualization is inherently proprietary, limiting its appeal to single-vendor shops.

Multivendor Storage Virtualization

• The SNIA is attempting to define standards requirements for virtualization technology.

• Virtualization in multivendor SANs can be implemented most efficiently via interconnect-based products.

File Systems and NAS Virtualization

• Metadata from separate file systems can be combined to present a larger, virtualized file system.

• File system virtualization requires software on each host system.

• Virtualization of file systems can be combined with virtualization of block storage resources.

Tape Virtualization

• Tape virtualization relies on in-band intervention between backup servers and tape subsystems.

• Tape virtualization is transparent to servers and tape devices.

• Tape virtualization enables efficient use of tape drives and tape media.

Virtualization and the Data Storage Utility

• Storage as a utility requires application-aware, policy-based virtualization intelligence.

• Different types of data may require different methods for storage placement.

• Automated policy engines can be used to enforce data placement, quality of service, performance, availability, security, capacity, and other requirements.

• In addition to intelligence, the ideal of data storage utility requires support for multivendor environments.

Chapter 12. Application Studies

Although storage networks share common components in the form of servers, storage, and interconnection devices, the configuration of a SAN is determined by the upper-layer application it supports. A post-production video editing application may have different requirements than a high-availability OLTP (on-line transaction processing) application. Server-free tape backup applications may employ unique hardware and software products that would not appear in a SAN designed to support server clustering. Because SANs offer the flexibility of networking, however, you can satisfy the needs of multiple applications within a single shared storage configuration.

The application studies in this chapter examine SAN installations that were designed to meet specific requirements. In some instances, the deployment of a new networked infrastructure provided additional opportunities for resolving unrelated issues. A SAN originally designed for a high-bandwidth application, for example, can also facilitate a more efficient tape backup solution. Where applicable, these studies indicate whether Fibre Channel or IP SAN interconnections are used, although the type of underlying plumbing is less important than the upper-layer applications being served.

12.1 Post-Production Video Editing

As one of the first applications to employ high-performance SAN technology, full-motion video editing leverages the bandwidth, distance, and shared resources that SAN technology enables. Digitized video has several unique requirements that exceed the capabilities of legacy data transports, including the sustained transmission of multiple gigabit streams and intolerance for disruption or delays. Most SAN-based video applications use the SCSI-3 protocol to move data from disk to workstations, although custom configurations have been engineered using IP for multicast and broadcast distribution.

Video applications have common high-performance transport requirements but may vary considerably in content. A video editing application can center on a workgroup configuration, as shown in Figure 12-1, allowing peer workstations to access and modify video streams from one or more disk arrays. In addition to the physical SAN topology, any application that allows data sharing must have software support for file access and locking by multiple users. A video broadcast application that serves content from a central data source to multiple feeds must have the means to support multicast across the SAN. Video used for training applications may support both editing workstations and user stations, with random access to shared video clips or to instructional modules digitized on disk.

Figure 12-1. A peer video editing SAN via a switched fabric

In this example, the bandwidth required per workstation depends on the type of video streams retrieved from and stored to disk. Standard digitized streams may require ~30MBps throughput, whereas high-definition video requires ~130MBps. In the latter case, 2Gbps Fibre Channel would support ~400MBps full duplex, or sufficient bandwidth for a high-definition video stream to be read from disk, processed, and written back concurrently.
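
The bandwidth budget can be checked with a few lines, using the stream rates quoted above and roughly 200MBps of usable throughput in each direction on a 2Gbps Fibre Channel link.

    # Rough bandwidth budget for one editing workstation on 2Gbps Fibre Channel.
    standard_stream = 30        # MBps for a standard digitized stream
    hd_stream = 130             # MBps for high-definition video
    link_per_direction = 200    # approximate usable MBps each way on 2Gbps Fibre Channel

    # Reading one HD stream while writing the processed stream back:
    print(f"read path:  {hd_stream} of {link_per_direction} MBps used")
    print(f"write path: {hd_stream} of {link_per_direction} MBps used")
    # Both directions stay under the available rate, which is why a single 2Gbps
    # link can carry a concurrent high-definition read/modify/write workflow.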

The configuration shows dual pathing between the video editing workstations and redundant SAN switches. Device drivers for the host adapter cards must therefore support failover in case a link or switch is lost, and preferably load balancing to fully utilize both paths during normal operation. In addition, the ratio of server to storage links must be adequate to support concurrent operation by all workstations. Larger storage arrays support multiple links so that a nonblocking configuration can be built.

Video editing applications are intolerant of latency or delays. For optimal jitter-free performance, video data can be written to the outer tracks of individual disks within the storage array. Although this technique reduces the total usable capacity, it requires less disk head movement for reading and writing data and thus minimizes latency in disk performance.

12.2 Prepress Operations

Prepress refers to the creation and modification of graphical images for advertisements, catalogs, and posters. Graphics can be as simple as low-resolution black and white newspaper ads or as sophisticated as large, four-color, high-resolution images that are applied to billboards or city buses.

Unlike full-motion video applications, computerized prepress data traffic is bursty. A single graphical image is read from disk, rendered for several hours by a graphic artist at a workstation, and written back to disk. The file may pass through multiple revisions, and therefore multiple workstations, as other detail, titles, and legends are added. When graphical editing is complete, the file is then read by a preprint processor for conversion from digital format to hardcopy or print negative.

Because a graphical image must go through a series of editing steps, each by a different artist, as it passes through the production process, file ownership is critical for maintaining data integrity. If the same image is inadvertently opened and modified by two artists at the same time, hours of work can be lost. Software companies that specialize in prepress operations resolve this potential problem by providing file sharing middleware. This software resides on each workstation. By intercepting calls from the operating system to the file system, the application allows file ownership to be serially transferred from one user to another as the file is read from and written back to disk.

In addition to file ownership, the amount of time it takes to read a large graphical image from disk for editing, and to write back the modified version, is an important issue for prepress. Read/write time is, for the user, down time, and the accumulated down time between edits can impact the entire production process. For large prepress operations in particular, the bandwidth supplied by a Fast Ethernet LAN is not sufficient for concurrent file transfers of image files that are often in the range of hundreds of megabytes. To increase productivity, gigabit speeds (Gigabit Ethernet or Fibre Channel) are required.

As with video applications, prepress has a voracious appetite for storage. A catalog production, for example, may require terabytes of storage for high-resolution photographic images and formatting information. A major brand's consumer catalog may have three or more editions per year, with revisions of some product images and introduction of new ones. All this data must be maintained and accessible for updates. File compression helps to reduce the overall storage requirement, but it is less effective for high-resolution images.

SANs were introduced into prepress operations primarily by vendors of file access software. As a total solution, the combination of file access middleware, higher bandwidth and shared storage via a storage network, and increased storage capacity provided by SAN-attached disk arrays addresses most of the data infrastructure issues faced by prepress operators.

As shown in Figure 12-2, graphics workstations can be attached to departmental SAN switches in each work area, with trunked links running between the departments and the processing/storage center. The distribution of users is scalable: Additional users can be accommodated with additional switches, and the population of each switch can be adjusted according to workload and bandwidth requirements. Shared storage in this configuration is also scalable, both by the addition of drives into the RAID enclosures and by attachment of new arrays over time. Specialized preprint processors can be brought into the SAN via SAN bridges. And finally, file access software on each graphics workstation ensures that a file can be modified by only one user at a time and that the identity of the current owner is known. This SAN solution also provides great efficiency by transporting graphics data using the block SCSI protocol, as opposed to file-based transports.

Figure 12-2. Prepress storage network with departmental graphics workstations

12.3 LAN-Free and Server-Free Tape Backup

Tape backup poses a number of problems for IT operations, none of them easily addressed by traditional parallel SCSI methods. As long as disk arrays are bound to individual servers, tape backup options are limited to server-attached tape subsystems or transport of backup data as files across the messaging network. Provisioning each server with its own tape backup system is expensive and requires additional overhead for administration of scheduling and tape rotation on multiple tape units. Performing backups across the production LAN allows for the centralization of administration to one or more large tape subsystems, but it burdens the messaging network with much higher traffic volumes during backup operations. In addition, scheduling backups for multiple servers to a central tape resource creates a potential conflict between the time required to back up all servers and the time available for nondisruptive access to the network. Scheduling backups during off-peak hours, such as 8:00 PM to 6:00 AM, may not provide sufficient time to back up all data. Nor is it an option for enterprises that operate across multiple or international time zones.

In Figure 12-3, three departmental servers share a common tape backup resource across the production LAN. Even with switched 100Mbps Ethernet and no competing user traffic, the maximum sustained throughput from server to tape is approximately 25GB per hour. If each server supports a modest 100GB of data, a full backup of the department's data would require 12 hours. Backups, however, are normally scheduled for incremental backup of changed files on a daily basis, with full disk backups occurring only once per month or quarter. To accommodate both full and incremental backups, the full backup routines would have to be rotated among different servers on different days, and then only during periods when full LAN bandwidth was available.
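
A quick calculation using the figures above shows how the backup window is consumed, and, anticipating the discussion later in this section, how the tape drive itself becomes the limit once the LAN is no longer the bottleneck. The 15MBps tape streaming rate is an assumed figure in the range cited below.

    # Backup window arithmetic for three servers with 100 GB each.
    total_gb = 3 * 100                                    # 300 GB to move

    lan_rate_gb_per_hr = 25                               # ~25 GB/hr over switched 100Mbps Ethernet
    tape_rate_gb_per_hr = 15 * 3600 / 1024                # 15 MBps sustained, ~53 GB/hr

    print(f"Over the LAN:             {total_gb / lan_rate_gb_per_hr:.0f} hours")
    print(f"Over a SAN, tape-limited: {total_gb / tape_rate_gb_per_hr:.1f} hours")
    # The SAN removes backup traffic from the LAN, but the full-backup time is then
    # bounded by the tape drive's streaming rate rather than by the gigabit transport.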

Figure 12-3. Tape backup across a departmental network with direct-attached storage

When the volume of data exceeds the allowable backup window and stresses the bandwidth capacity of the messaging network, either the bandwidth of the messaging network must be increased or the backup data must be partitioned from the messaging network. Installing a high-speed LAN transport, such as switched Gigabit Ethernet, can alleviate the burden on the production network but leaves the server/storage relationship unchanged. Thus, the potential conflict between user traffic and storage backup requirements can be resolved only by isolating each on a separate network, either by installing a separate SAN interconnection using Fibre Channel or Gigabit Ethernet or by creating VLANs for storage data through a common interconnection. A storage network removes backup data from the production network, provides gigabit transport, and enables other backup and storage technologies to emerge.

As a transitional configuration, the SAN in Figure 12-4 is installed solely to offload the production network. The server connectivity can be either Fibre Channel HBA or iSCSI NIC, with the appropriate Fibre Channel or IP storage SAN switch providing access to a shared tape subsystem. Because the tape subsystem appears to each server as another SCSI device on a separate SCSI bus, it is accessible to the tape backup client residing on each server. The backup scheduler instructs each server when to perform backup and what kind of backup to perform on a sequential basis. Because the backup data path is now across a dedicated interconnection, the constraints of the messaging network are removed from the backup process and the burden of backup traffic is removed from the LAN.

Figure 12-4. Transitional LAN-free backup implementation for direct-attached storage

The multigigabit performance of the SAN and the flexibility of moving backup data on its own dedicated transport, however, do not resolve every issue associated with this backup implementation. Although the SAN transport may allow backup data to move at high speed, other limiting factors include server performance, data rate of the parallel SCSI drives, the type of data being backed up, and the performance of the tape subsystem itself. The slowest component in a backup configuration is usually the tape drive, whose sustained streaming rate sets the ceiling for the entire operation. A tape unit may be able to stream only 10MBps to 15MBps and so cannot fully utilize the bandwidth made available by the Fibre Channel or Gigabit Ethernet SAN. The overall time required for full backups is thus only moderately improved by the SAN, although the scheduling itself is no longer dependent on and no longer interferes with LAN traffic patterns.

Because each of the three servers is now provisioned with a SAN interface, other options are available for reducing backup times. Tape subsystems can provide multiple SAN ports or sharing of tape drives so that multiple backup streams can be accommodated concurrently.

Further optimizing the backup routine requires several additional SAN components. Moving disk storage from parallel SCSI to SAN-attached arrays offers, among other things, the ability to remove the server from the backup data path. This is the most significant improvement from the standpoint of performance and nondisruptive backup operations. If server resources are freed from backup tasks, the servers are always available for user access. And if the backup process itself does not interfere with user access to data, the backup window is no longer defined by user activity or by the relatively slow performance of the tape subsystem. Backups can be performed at any time, provided that the backup software handles file permissions and updates and that a SAN-attached backup agent exists to buffer data from disk to tape.

The backup agent can be an extended copy (third-party copy) utility embedded in a SAN switch, in a dedicated SAN-attached backup server, in a SAN-to-SCSI bridge product, or in the tape target itself.

Figure 12-5 demonstrates an extension of the departmental tape backup solution that incorporates SAN-attached disk arrays and a third-party copy utility resident on a SAN switch. In this configuration, backup data is read directly from disk by the copy agent and then written to tape, bypassing the server. Although the SAN provides the vehicle to move the backup data, the backup software must control when and where to move it. Concurrent backup and user access to the same data are possible if the backup protocol maintains metadata (file information about the actual data) to track changes that users may make to data (such as records) as it is being written to tape.

Figure 12-5. LAN-free and server-free tape backup with SAN-attached storage

As higher-performance native SAN-attached tape subsystems become available, the ability to back up and restore over the SAN will better accommodate the growing volume of data generated by enterprises. The SNIA Interoperability Committee has demonstrated terabyte per hour backup operations at workshops held in the SNIA Technology Center, highlighting the feasibility of building high-performance backup configurations for large enterprise networks. Less attention, however, has been given to the counterpart of backup procedures: restoring tape data to disk. It is not uncommon for companies to institute complex schedules for tape backup—with periodic incremental and full backups, tape labeling and rotation, and physical transport of tapes off-site—and yet never test the validity of tape data restoration to disk. A SAN infrastructure may facilitate streamlined tape backup operations, but a comprehensive backup strategy must also verify the integrity of archived data.

12.4 Server Clustering

As enterprise applications have shifted from mainframe and midrange systems to application and file servers, the reliable access to data that the legacy systems provided (and that required decades of engineering to accomplish) has been compromised. To make their products acceptable for enterprise use, server manufacturers have responded with more sophisticated designs that offer dual power supplies, dual LAN interfaces, multiple processors, and other features to enhance performance and availability. The potential failure of an individual component within a server is thus accommodated by using redundancy. Redundancy typically implies hardware features but may also include redundant software components, including applications. Extending this strategy, redundancy can also be provided simply by duplicating the servers themselves, with multiple servers running identical applications. In the case of failure of a hardware or software module within a server, you shift users from the failed server to one or more servers in a server cluster.

The software used to reassign users from one server to another with minimal disruption to applications is necessarily complex. Clustering software written for high-availability implementations can be triggered by the failure of a hardware, protocol, or application component. The recovery process must preserve user network addressing, login information, current status, open applications, open files, and so on. Clustering software may also include the ability to balance the load among active servers. In this way, in addition to providing failover support, the servers in a cluster can be fully utilized to increase overall performance.

Small clusters can be deployed with traditional parallel SCSI cabling for shared data, but generally they are limited to two servers. SANs allow server clusters to scale to very large shared data configurations, with more than a hundred servers in a single cluster.

Because the focus of clustering is to facilitate availability, deploying a server cluster on a SAN typically includes redundant paths from multiple servers to data. Software on each server must monitor the health of hardware components and applications and must be able to inform other servers in the cluster if a failure or loss of service occurs. This heartbeat status is usually propagated over a dedicated (and sometimes redundant) LAN interface. If redundant paths to data are provided, each server must also monitor the status of each SAN connection and must redirect storage traffic if a switch segment fails. In addition, the data itself may be secured via local or remote RAID mirroring, which provides a duplicate copy if a primary storage unit fails. This tiered strategy helps to ensure the availability of servers, access to data, and the data itself.

In the site represented in Figure 12-6, a cluster of eight servers is configured with redundant data paths over two separate SAN segments. Two SAN switches are configured as primary and secondary paths between the clustered servers and RAID disk arrays. For this installation, the status heartbeat can be configured with redundant Ethernet links between each server, so that the failure of an Ethernet link will not falsely trigger a condition in which each server would attempt to assume services for others. Because the clustering software determines which components or applications on each server should be covered by failover, subsets of recovery policies can be defined within the eight-server cluster. In this configuration, all servers share a common database application, and subsets of three servers are configured for failover for specific user applications. The sample configuration can also be scaled to accommodate additional servers or storage by adding SAN switch ports via switch-to-switch links (E_Port for Fibre Channel, link aggregation for Gigabit Ethernet) or by installing director-class SAN switches.
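
The reason for redundant heartbeat links can be shown in a few lines: a node is declared failed only when it is unreachable on every heartbeat path, so the loss of a single Ethernet link cannot by itself trigger a takeover. This is a simplified, hypothetical sketch, not the actual logic of any clustering product.

    # Simplified failover decision for a cluster with redundant heartbeat links.
    heartbeat_status = {
        # node -> status of each dedicated heartbeat link
        "server3": {"hb_link_1": True,  "hb_link_2": True},
        "server5": {"hb_link_1": False, "hb_link_2": True},   # one link down: no failover
        "server7": {"hb_link_1": False, "hb_link_2": False},  # unreachable on both: failed
    }

    def nodes_to_fail_over(status: dict) -> list[str]:
        """Declare a node failed only if every heartbeat path to it is dead."""
        return [node for node, links in status.items() if not any(links.values())]

    print(nodes_to_fail_over(heartbeat_status))   # ['server7']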

Figure 12-6. A fully redundant server cluster with dual data paths

For server attachment, dual pathing is not simply a matter of installing multiple Fibre Channel HBAs or iSCSI adapters. The operating system (for example, Microsoft .NET) or device drivers on the SAN adapter cards must be capable of monitoring link availability on each SAN connection and diverting storage traffic from one path to another in the event of link or adapter card failure. Similarly, failover from one storage array to another is determined by the mirroring or data replication method used. Storage arrays that provide disk-to-disk data replication without server intervention may allow instantaneous failover (for synchronous mirroring) or delayed failover (for asynchronous mirroring).

12.5 Storage Consolidation

Companies often find themselves in a reactive mode when attempting to meet the ever-growing demand for storage. When an individual file or application server exhausts its SCSI-attached storage capacity, an administrator may be forced to buy additional server/storage pairs simply to quickly provide more storage space. Over time, a data center may accumulate hundreds or thousands of servers, each with its own direct-attached storage. This spontaneous growth in the number of servers and the number of disks results in an expensive and often painful administrative headache. You need a much larger IT staff to manage large numbers of servers and storage devices, coordinate multiple tape backups, and maintain updates to servers and applications, and that requirement inhibits the alignment of data storage with corporate business priorities.

In addition, because each server is the exclusive owner of its attached storage, utilization of storage capacity throughout the enterprise may be out of balance. One server may have excess capacity while another exceeds its capacity. Without a means to balance utilization, storage resources cannot be used effectively, resulting in unnecessary expense and administrative overhead.

The solution is to break the direct connection between servers and storage and to combine both on a common network. In this way, you can treat each element as a separate, manageable resource. You can assess server requirements independently of storage while sharing storage capacity among multiple servers. Storage area networking is thus the enabling infrastructure for consolidating both server and storage resources, reducing costs, and streamlining administration.

As shown in Figure 12-7, the spontaneous acquisition of servers and direct-attached storage imposes additional administration of multiple servers as well as multiple separate storage devices. Because storage is not shared, utilization may vary from server to server, resulting in inefficient use of your storage investment.

Figure 12-7. Inefficient capacity utilization with direct-attached storage

Placing both server and storage resources on a storage network enables you to reduce the number of servers and to consolidate dispersed storage capacity into a shared, highly available storage array. It also lets you manage more storage capacity with fewer administrators and use the familiar and more readily managed IP networking for high-performance storage access.

Depending on application requirements, the number of servers required to support database, Exchange, or other business applications can be reduced and consolidated onto more powerful platforms, as shown in Figure 12-8. Heterogeneous operating systems that previously required their own dedicated storage targets can now address their own volumes on a shared, highly available array.

Figure 12-8. Reducing the server population and providing centralized management of storage resources on the SAN

Although storage consolidation on a SAN helps you to reduce the number of servers and individual disk arrays you must administer, it is not a panacea for all storage management issues. Without storage virtualization tools, you must manually configure the multiple LUNs on an enterprise-class storage target and assign each LUN to an individual server. Capacity management is now centralized, but capacity is not automatically allocated. In heterogeneous operating system environments, you must use LUN masking to conceal, for example, LUNs used by UNIX systems from Windows-based servers. For high availability, you must support dual data paths from the servers, redundant SAN switches, and alternative pathing on storage arrays. So even though the SAN is an enabling infrastructure for reducing server and storage overhead, it presents its own demands and issues. SAN-specific requirements demand due diligence in network design, but the benefits of consolidation more than offset the severe disadvantages of the spontaneous accumulation of direct-attached storage and servers.
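
LUN masking amounts to an access table consulted per initiator, as in the sketch below; the WWPNs and LUN numbers are fabricated purely to illustrate the idea of hiding LUNs from unauthorized hosts.

    # Illustrative LUN-masking table on a shared array.
    lun_masks = {
        "10:00:00:00:c9:aa:aa:01": {0, 1},    # UNIX database server
        "10:00:00:00:c9:bb:bb:02": {2},       # Windows Exchange server
    }

    def visible_luns(initiator_wwpn: str) -> set:
        """Return the LUNs the array reports to this initiator; all others are hidden."""
        return lun_masks.get(initiator_wwpn, set())

    print(visible_luns("10:00:00:00:c9:bb:bb:02"))            # {2}
    print(1 in visible_luns("10:00:00:00:c9:bb:bb:02"))       # False: the UNIX LUN is concealed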

12.6 Internet Service Providers

Internet service providers that provide Web hosting services have traditionally implemented servers with internal or SCSI-attached storage. For smaller ISPs, internal or direct-attached disks are sufficient as long as storage requirements do not exceed the capacity of those devices. For larger ISPs hosting multiple sites, storage requirements may exceed the SCSI-attached capacity of individual servers. Network-Attached Storage (NAS) or SANs are viable options for supplying additional data storage for these configurations.

In addition to meeting storage needs, maintaining availability of Web services is critical for ISP operations. Because access to a Web site (URL) is based on Domain Name System (DNS) addressing rather than physical addressing, you can deploy redundant Web servers as a failover strategy. If a primary server fails, another server can assume access responsibility via round-robin DNS address resolution. For sites that rely on internal or SCSI-attached storage, this technique implies that each server and its attached storage must maintain a duplicate copy of data. This solution is workable so long as the data itself is not dynamic—that is, it consists primarily of read-only information. This option is less attractive, however, for e-commerce applications, which must constantly update user data, on-line orders, and inventory tracking information. The shift from read-mostly to more dynamic read/write requirements encourages the separation of storage from individual servers. With NAS or SAN-attached disk arrays, data is more easily mirrored for redundancy and is made available to multiple servers for failover operation. As Figure 12-9 illustrates, NAS provides common data access over shared or switched Ethernet, allowing multiple Web servers to exist in a failover configuration.
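
Round-robin resolution with a failed server skipped can be pictured as below. The addresses (drawn from the documentation range) and the health check are invented for illustration; basic DNS round-robin simply rotates the address list, with failure handling layered on top by the ISP.

    # Conceptual round-robin address resolution across redundant Web servers.
    from itertools import cycle

    web_servers = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    healthy = {"192.0.2.10": True, "192.0.2.11": False, "192.0.2.12": True}

    rotation = cycle(web_servers)

    def resolve_next() -> str:
        """Hand out the next healthy address in rotation, skipping failed servers."""
        for _ in range(len(web_servers)):
            candidate = next(rotation)
            if healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy web servers")

    print([resolve_next() for _ in range(4)])   # ['192.0.2.10', '192.0.2.12', '192.0.2.10', '192.0.2.12']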

Figure 12-9. A small ISP implementation using network-attached storage

SAN architecture brings additional benefits to ISP configurations by providing high-speed data access between servers and storage using block I/O instead of NFS or CIFS file protocols and by enabling scalability to much higher populations of servers. As shown in Figure 12-10, you can extend the SAN with additional switch ports to accommodate expansion of storage capacity and increased population of Web servers. This small configuration can scale to hundreds of servers and terabytes of data with no degradation of service. Backup operations can also be streamlined using server-free backup, as discussed earlier.

Figure 12-10. ISP configuration built with (a) iSCSI or (b) Fibre Channel

The upper diagram (a) in Figure 12-10 depicts a scalable ISP configuration using iSCSI for block I/O access to storage and tape. In this case, the Ethernet switch is a common interconnection both for Web traffic via the IP router and for block access to storage data. Although servers can be provisioned with dual Ethernet links to segregate file and block traffic using VLANs, some iSCSI adapters support file and block I/O on the same interface. Depending on bandwidth requirements, this solution may minimize components and wiring and simplify the configuration.

The lower diagram (Figure 12-10b) shows the functional equivalent to (a) but uses Fibre Channel instead of iSCSI. This configuration introduces additional devices and connections but fulfills the requirement for high performance and scalable access to shared storage resources.

12.7 Campus Storage Networks

The need to share storage resources over campus or metropolitan distances has been one byproduct of the proliferation of SANs on a departmental basis. Separate departments within a company, for example, may make their own server and storage acquisitions from their vendor of choice. Each departmental SAN island is designed to support specific upper-layer applications, and so the islands may be composed of various server platforms, SAN interconnections, and storage devices. It may be desirable, however, to begin linking SANs to streamline tape backup operations, share storage capacity, or share storage data itself. Creating a campus network thus requires transport of block storage traffic over distance as well as accommodation of potentially heterogeneous SAN interconnections.

Per the standard, native Fibre Channel supports distances of up to 10 kilometers over single-mode fiber-optic cabling with longwave transceivers. This is sufficient for many campus requirements, but driving longer distances requires additional equipment. The main issue with native Fibre Channel SAN extension is not the distance itself but the requirement for dedicated fiber from one site to another. Many campus and metropolitan networks may already have Gigabit Ethernet links in place, but sharing the same cable between Fibre Channel and Gigabit Ethernet simultaneously requires the additional cost of dense wavelength division multiplexing (DWDM) equipment. Consequently, it has proven to be more economical to leverage existing Gigabit Ethernet services and use FCIP or iFCP protocols to transport Fibre Channel traffic over IP.

As discussed in Chapter 4, extending Fibre Channel over distance also extends the scope of Fibre Channel switch-to-switch protocols. Connecting Fibre Channel switches builds a single layer 2 fabric, and therefore multiple sites in a campus or metro storage network must act in concert to satisfy fabric requirements. State change notifications and SNS updates can be broadcast throughout the campus network, and disruptions at one SAN site can propagate to all others. Although native Fibre Channel extension and FCIP tunneling are vulnerable to this fabric-wide behavior, the problem can be addressed with iFCP gateways, which satisfy the need for Fibre Channel SAN connectivity but preserve the autonomy of each SAN site and isolate potential disruptions.

The other legacy issue for campus SANs based on Fibre Channel is lack of interoperability in multivendor switch environments. Depending on the vintage of the Fibre Channel switches involved, older microcode versions may simply be unable to interoperate. With newer products, you may have more success in implementing standard ANSI FC-SW-2 switch interoperability, and some iFCP gateway products let you connect different vendors' fabric switches without merging them into a single fabric.

The Fibre Channel-specific extension issues are eliminated if the hosts and storage end systems throughout a campus are iSCSI-based. iSCSI appears on the campus IP network or MAN as simply additional IP data, with no inherent distance limitation, no need for DWDM or tunneling, and no special switch behavior. At each site, iSCSI servers and iSCSI storage arrays are connected to standard Gigabit Ethernet switches and can then take advantage of any available IP connectivity from site to site. In a campus environment, this is typically Gigabit Ethernet from building to building, or it may involve some mix of Gigabit Ethernet and IP over ATM or wide area transport. As with any storage application, you must administer authorized assignment of servers to campus-attached storage resources to avoid connectivity-induced chaos.

Figure 12-11 shows a campus storage network with a heterogeneous mix of Fibre Channel and iSCSI-based SANs. In this example, existing Gigabit Ethernet links connect the various buildings. Depending on bandwidth requirements, these links can be shared with messaging traffic or can be dedicated to storage. For Fibre Channel SAN connectivity, FCIP could be used, but this example shows iFCP gateways to ensure autonomy of each departmental SAN and isolation from potential fabric disruption. The administrative building is shown with aggregated Gigabit Ethernet links to the data center to provide higher bandwidth, although 10Gbps Ethernet could also be used if desired. The development center is shown with an iSCSI SAN, which requires only a local Gigabit Ethernet switch to provide connections to server, storage, and the campus.

Figure 12-11. Campus storage network with heterogeneous Fibre Channel and iSCSI SANs


This campus configuration could support multiple concurrent storage applications, such as consolidated tape backup to the data center or sharing of storage capacity between sites. Because the campus connectivity is built with Gigabit Ethernet and IP, network designers can use IPSec and IP quality-of-service mechanisms to safeguard storage data and prioritize storage traffic against other IP loads. Finally, the integration of iFCP gateways to support Fibre Channel enables any-to-any connectivity between iSCSI and Fibre Channel resources. In this way, the composition of the campus SAN solution can be altered over time as the Fibre Channel resources are maintained and new iSCSI devices are introduced.


12.8 Remote Tape Vaulting

Administrators of enterprise networks with dispersed geographical locations are often overwhelmed by tape backup requirements for remote offices and regional centers. The most common scenario is that data backup at each remote facility is entrusted to the local branch manager—meaning that the manager throws in a tape before locking the doors at night. Tapes may or may not be transported off site, and tapes are rarely tested to see whether the data can in fact be restored. This is somewhat like carrying expensive car insurance but never knowing whether the insurance company will honor a claim if one is filed. Because the local branch manager or administrative assistant is probably not excited about data integrity or the tape backup procedure, tapes may not be changed, may not be properly labeled, or may be overwritten by multiple backups. Nonetheless, data on local storage is still critical to business continuance, and there may be no copy of customer information and transactions on centralized enterprise storage.

Consolidating remote tape backup operations was not feasible as long as SAN architects were constrained by the inherent distance limitations of native Fibre Channel and the exorbitant cost of dedicated or DWDM links. But IP SAN and IP gateway solutions have largely resolved the distance issue and now enable rationalized tape backup strategies based on more affordable and available IP network services, a strategy called remote vaulting.

As shown in Figure 12-12, you can use IP network services to consolidate data backup for local branch offices to a regional center. In this example, branch offices in the Midwest are connected to a regional facility in Houston, and branch offices in other regions can be backed up to the closest facility. Branch office storage can be connected to the appropriate regional center through iSCSI or iFCP gateways. The regional centers, in turn, can leverage more efficient tape libraries for both branch and regional center data.

Figure 12-12. Remote tape vaulting from branch offices to a regional data center

To minimize the amount of backup traffic on each link, daily incremental backups from branch file servers would be made to the regional facility. Depending on the amount of data that must be regularly secured, the backup window and bandwidth requirements can be calculated using the formulas given in Table 6-3. Use of data compression and jumbo frames for streaming block backup data, however, maximizes link utilization and enables use of more economical wide area IP connections (such as T3). These features are available in IP routers as well as iFCP gateways.
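The following Python sketch gives a simplified version of that calculation (it approximates, rather than reproduces, the formulas in Table 6-3): the backup window is roughly the data volume divided by the effective link throughput, where effective throughput reflects link utilization and any compression ratio achieved on the streamed data.

```python
def backup_window_hours(data_gb, link_mbps, utilization=0.8, compression=2.0):
    """Approximate time to move an incremental backup over a WAN link.
    A simplified estimate, not the exact formulas from Table 6-3."""
    effective_mbps = link_mbps * utilization * compression
    seconds = (data_gb * 8 * 1000) / effective_mbps   # GB -> megabits
    return seconds / 3600

# Example: a 50GB nightly incremental over a T3 (~45Mbps) with 2:1 compression
print(round(backup_window_hours(50, 45), 1))   # roughly 1.5 hours
```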

Remote vaulting centralizes the management of data archiving and shifts the responsibility for backup from non-IT administrators to full-time IT staff. Although this arrangement facilitates verification of backup operations and secures dispersed corporate data, there is always the possibility that a regional center may suffer an outage. One of the benefits of remote vaulting over IP services is that backup data can be redirected through the IP network to an alternative operational facility. As discussed in the next section, regional center SANs can also be configured to provide mutual failover for both disk and tape data resources.


12.9 Disaster Recovery

Like tape backup operations, disaster recovery (DR) has been viewed as a necessary but unattractive requirement for IT storage strategies. The cost of implementing a DR solution is balanced against both the likelihood of a major disruption and the impact on business if access to corporate data is lost. Unfortunately, disaster recovery tends to move toward the top of IT priorities only after major natural or human-caused disasters. As a sad commentary on our times, disaster recovery is now a high priority for many institutions and companies, with IT administrators scrambling to implement DR strategies within the confines of budget and personnel limitations.

The scope of a DR solution is more manageable if administrators first identify the types of applications and data that are most critical to business continuance. Customer information and current transactions, for example, must be readily accessible to continue business operations. Project planning data or code for application updates is not as mission-critical, even though such code may represent a substantial investment and should be recovered at some point. Reducing the volume of data that must be accessible in the event of disaster is key to sizing a DR solution to capture what is both essential and affordable.

In addition to prioritizing corporate data, you must consider the time it will take to restore operations. What duration of outage is acceptable without risking loss of ongoing business? If a company can withstand several days of outage, tape backup and restore may be feasible. Otherwise, disk mirroring from primary to DR sites may be required. In practice, some combination of mirroring for business-critical data and tape backup for less critical data is often deployed so that full business operations can be restored in the event of catastrophe, meaning complete loss of the primary site.

Another fundamental challenge for DR strategies is to determine what distance is sufficient to safeguard corporate data. Formerly, for example, financial institutions in New York would create disaster recovery sites in nearby New Jersey or surrounding states. But metro or regional distances are no longer viewed as sufficient to ensure continuous operation, which now requires interregional separation of hundreds or thousands of miles. Similarly, primary data centers in geologically unstable locations such as San Francisco or Seattle are better served by DR facilities well away from fault lines or potentially devastating tsunamis.

Accommodating distance requirements for open systems storage has always been an issue for native Fibre Channel SAN extension. The high cost of dedicated dark fiber, distance limitations of ~50 miles, and performance problems beyond a metro circumference make native Fibre Channel extension unsuitable for robust DR scenarios. FCIP and iFCP can provide long-distance support for Fibre Channel-originated storage traffic, whereas iSCSI offers a native IP storage solution to address the distance issue.

The maximum distance allowed depends on the type of DR strategy to be implemented. As demonstrated by the Promontory Project in 2001, streaming applications such as tape backup can be sustained over thousands of miles at gigabit speeds. Latency-sensitive applications such as synchronous disk-to-disk data replication, however, may not tolerate the natural speed-of-light latency induced by extremely long wide area links. Therefore, you should verify the tolerance for latency by vendor-specific disk replication products (synchronous or asynchronous) as part of your DR site selection process.
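A quick way to gauge that tolerance is to estimate the propagation delay alone. Light in optical fiber travels at roughly 200 kilometers per millisecond, and a synchronous mirror cannot acknowledge a write until it has made at least one full round trip. The Python sketch below (the distance figures are illustrative) shows why metro distances add well under a millisecond per write while transcontinental links add tens of milliseconds before any equipment or protocol overhead is counted.

```python
def replication_rtt_ms(distance_km, fiber_km_per_ms=200.0):
    """Round-trip propagation delay over fiber; a synchronous mirror adds
    at least this much latency to every acknowledged write."""
    return 2 * distance_km / fiber_km_per_ms

for km in (80, 500, 3000):                  # metro, regional, transcontinental
    print(f"{km:>5} km -> {replication_rtt_ms(km):.1f} ms added per write")
```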

The DR configuration shown in Figure 12-13 supports both data replication and tape backup options and uses IP network services to connect the primary site to the DR site. In this example, both sites have director-class Fibre Channel switches that connect servers, storage, and tape. The iFCP gateways provide IP connectivity for the wide area link and are connected via E_Ports to the Fibre Channel directors as well as direct-connected to the storage arrays. This dual connection serves two purposes. First, it establishes a path for tape backup from the production site to the DR site. This enables less mission-critical data to be vaulted at the DR location, ready for restore on an as-needed basis. Second, the direct connection to disk enables synchronous or asynchronous disk-to-disk data replication for the most essential business data. The direct connection between the iFCP gateways and disk arrays eliminates the need to pass data replication traffic through the Fibre Channel director and so provides a more direct path from site to site.

Figure 12-13. Disaster recovery configuration using IP network services and disk-based data replication

This example can be extended to additional locations by using IP routing to provide any-to-any connectivity. In the tape vaulting diagram shown earlier in Figure 12-12, for example, multiple regional data centers can be configured for mutual DR support on a round-robin basis—for example, New York to Houston to Los Angeles to Seattle and back to New York. The primary storage array at each location could serve as a secondary mirror for its upstream neighbor. For large financial institutions, insurance companies, or other enterprises with multiple regional data centers, this design offers a means to implement an economical DR strategy using existing resources and mainstream IP network services.
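A trivial Python sketch of that round-robin pairing, using the site names from the example above: each site's primary array is mirrored at the next site in the chain, and the last site wraps around to the first.

```python
sites = ["New York", "Houston", "Los Angeles", "Seattle"]

# Map each regional center to the downstream site that holds its mirror;
# the modulo wraps the last site back to the first, closing the ring.
dr_mirror_site = {sites[i]: sites[(i + 1) % len(sites)] for i in range(len(sites))}
print(dr_mirror_site)
# {'New York': 'Houston', 'Houston': 'Los Angeles',
#  'Los Angeles': 'Seattle', 'Seattle': 'New York'}
```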


12.10 Chapter Summary

• SANs can be designed around specific upper-layer application requirements.
• A single SAN configuration can support multiple storage applications concurrently.

Post-Production Video Editing

• Video editing requires sustained transmission of storage data at multigigabit speeds and is intolerant of transmission delays.
• Video editing applications may require auxiliary file-sharing middleware.
• Digitized video streams may require ~30MBps throughput, and high-definition video may exceed ~130MBps.

Prepress Operations

• Prepress applications are characterized by bursty transfers of large graphics files and shared data requirements.
• Sequential editing of a single graphical image by multiple graphic artists requires file-locking middleware to ensure data integrity.

LAN-Free and Server-Free Tape Backup

• LAN-free tape backup can be implemented by using dedicated Fibre Channel or Gigabit Ethernet network segments.
• A SAN for backup traffic can be implemented in a mixed direct-attached disk and shared disk environment.
• Server-free backup uses third-party copy agents to enable tape backups to be performed without affecting server resources.
• Third-party copy agents can be embedded in SAN switches, SAN-attached backup servers, SAN bridges, or SAN-attached tape subsystems.

Server Clustering

• Server clustering can provide both failover and load-balancing solutions for high-availability requirements.
• Servers monitor each other's status via a heartbeat protocol, typically over Ethernet.
• SANs provide shared access to storage for both load-balancing and failover implementations.
• Server clusters are typically configured with dual data paths and redundant SAN switches to ensure high availability.

Storage Consolidation

• Direct-attached SCSI storage often results in inefficient utilization of storage capacity.
• Storage consolidation on a SAN streamlines storage administration and simplifies tape backup operations.
• LUN administration and LUN masking are used to manage assignment of storage to individual servers.
• Storage consolidation accommodates heterogeneous operating systems on a single SAN infrastructure.

Internet Service Providers

• ISP applications require high availability of large amounts of data.
• Round-robin DNS address resolution allows one server to assume transactions for another.
• A SAN allows multiple servers to access the same content for Web serving and e-commerce applications.

Campus Storage Networks

• The distance limitations and dedicated-link requirements of Fibre Channel can be overcome with IP storage gateways.
• Native IP storage protocols such as iFCP and iSCSI can be used to link multiple SANs over campus or metro distances while preserving the autonomy of each site.
• Campus SANs facilitate consolidated tape backup and storage resource sharing.

Remote Tape Vaulting

• Branch office tape backup is often unreliable and is rarely verified for integrity of tape data.
• IP storage gateways or iSCSI can be used to centralize backup operations for multiple dispersed sites.
• Bandwidth requirements can be minimized by performing incremental backups.
• Data compression and jumbo frames for IP storage further enable maximum utilization of available wide area bandwidth.

Disaster Recovery

• Disaster recovery strategy criteria include prioritization of corporate data, allowable time to restore data, and distance sufficient to safeguard business continuance.
• Current IP storage products can be used to extend DR links to hundreds or thousands of miles between primary and DR sites.
• DR strategies may include both disk-to-disk data replication for critical data and tape backup for secondary data.
• Synchronous data replication is sensitive to latency and may not be suitable for extremely long distances.
• Enterprises with multiple regional data centers can leverage IP services and data replication to implement a round-robin DR solution.

Chapter 13. SAN Issues

Storage networking has proven to be a disruptive technology, dramatically impacting applications and infrastructures as well as vendors and customers. SANs represent a positive paradigm shift from the previous generation of direct-attached storage, with storage networking itself undergoing further internal paradigm shifts with the development of storage virtualization and IP-based SANs. Because SAN technology is driven mainly by competition between vendors and the perpetual tug-of-war between vendors and myriad customers, there are few objective guideposts to shape its development. This is capitalism, after all, and not a technocracy driven by the pursuit of pristine solutions. In the end, the purchasing power of customers affirms or denies the viability of a technology and highlights its strengths and weaknesses. This chapter discusses the outstanding issues that the storage industry must address to remove the remaining obstacles to SAN adoption.


13.1 Standardization

Standardization is the foundation of the open systems environments that have displaced the monolithic and proprietary computing technologies of the 1960s and 1970s. In that former state of single-vendor monopoly, everything from networking protocols to cabling schemes was a closed system, forcing customers to depend on a single supplier that set its own, often exorbitant prices. This gave vendors tremendous power over customers, to the extent that IT administrators would often be risking their careers if they sought alternative solutions or suppliers. The monopoly over computer systems and networking was eventually broken by technologists who produced architectures and protocols that could be implemented by multiple vendors, thus fostering competition to deliver similar solutions more economically. By the mid-1980s, the standardization of Ethernet, TCP/IP, SCSI, and other technologies, and the decentralization of computing resources through the introduction of networked computing, validated the open systems initiative and made standards compliance a prerequisite for technology acquisition. Today, a customer request for quotation (RFQ) is typically accompanied by a laundry list of compliance checkboxes for IETF, ANSI, or IEEE standards, which must be fulfilled before a vendor's products will be considered.

The scope of standards-based technologies is currently limited to the lower levels of computing systems. Cable plants, interconnections, transport protocols, and management protocols may be defined by standards, but processors, operating systems, and upper-layer applications are not. Consequently, although customers can impose rigorous requirements for standards-compliant infrastructures, the criteria for application selection are more subjective. De facto or in-house standards vary from customer to customer, with a preference for, say, Windows by one customer and Solaris by another. In storage networking, upper-layer storage applications—such as server clustering, backup/restore, volume management, and storage virtualization—are predicated on a standards-compliant infrastructure, but these applications themselves are largely free from the constraints of standardization. Data can be backed up with one application (such as Veritas) but cannot be restored with another (such as Legato). This situation creates an inherent barrier to uniform operation of storage networking systems as well as "single pane of glass" management solutions that can span both heterogeneous storage infrastructures and applications.

The most interesting chess moves in the game of market dominance are played out at the top, by vendors of storage applications and operating systems, who leverage the benefits of underlying open systems architectures to build their own proprietary solutions. At this level, the marketplace, and not standards organizations, determines what is viable and compliant. Because customers are not aggressively driving open systems criteria to the upper application layers and seem content to accept de facto instead of industry standards, they should be prepared to accept the consequences: vendor lock-in, monopoly pricing, and incompatibilities due to proprietary implementations.

The standards process itself reveals a fundamental contradiction between the need for vendors to demonstrate both standards compliance to appease customers and differentiation in features or functionality against competitors. In standards organizations, this contradiction is revealed in the maneuvering by vendors to exert influence over the scope and content of works in progress. A vendor may, for example, submit a nearly finished technology to an open standards process, having already deployed it in products but needing the Good Housekeeping Seal of Approval of standards compliance. This tactic simultaneously removes the stigma of proprietary ownership while ensuring a first-to-market opportunity. Similarly, features of a proposed protocol or architecture may play to the strengths of one vendor and deliberately against the weaknesses of others. The deliberate stalling on Fibre Channel switch routing protocols in ANSI T11 is a classic and notorious example of vendor manipulation of the standards process. These technical parries and counterthrusts are delivered through e-mail reflector threads, proposals, and heated discussions that occur as a proposal struggles from draft to full standard.

In addition, even in vendor-neutral standards organizations such as the IETF, which fosters individual rather than company contribution, vendors are eager to occupy key positions in the standards hierarchy at least to monitor, if not control, standards processes. This Machiavellian behavior may seem in stark contrast to the rational and impartial conduct required for technology development, but it is only the natural consequence of technology evolution in a market economy. The missing counterweight to vendor influence is more active participation by customers, who, instead of being passive recipients of open systems technology, can ensure that the vendor community is focused on customer requirements. This would also shorten the validation cycle; otherwise, technology must spend considerable time in the market before the standards wheat is separated from the standards chaff.

Standardization of storage networking technology is especially complex. The melding of storage with networking has generated new technical challenges, not all of which lend themselves to uniform requirements that can be codified into standards. Those that do lend themselves to uniform requirements have fallen under the purview of separate standards bodies, which may have very different strategic charters. SCSI, for example, falls under the ANSI T10 Committee; Fibre Channel under ANSI T11; the Fibre Channel management MIB under IETF; iSCSI, iFCP, and FCIP under IETF; Gigabit Ethernet under IEEE; the Common Information Model under DMTF (Distributed Management Task Force) and potentially the SNIA; storage security under ANSI T11 for some components and IETF for others; InfiniBand under the IBTA (InfiniBand Trade Association); and so on. The SNIA Technical Council is analyzing storage virtualization technologies to see whether standardization is feasible, although the initial challenge has been simply to define what storage virtualization is. In addition, the SNIA may become a standards body for storage networking-specific areas such as CIM/WBEM management, something that would require coordination with DMTF.

The complex coordination required to sustain an open systems framework for storage networking may be hidden from customer view, but the practical outcome is not. The two main complaints still lodged against SANs are lack of interoperability and lack of comprehensive management tools. In addition to delays due to vendor competition and subterfuge, storage networking technology has been hampered by the lack of a single umbrella under which diverse standards initiatives can be coordinated and brought to fruition. As a volunteer, vendor-focused organization, the SNIA has contributed to this effort with technical workgroups that feed standards requirements to ANSI, IETF, and other bodies. A much higher level of synchronization is needed, though, to close the gap between requirements formulation and delivery of viable standards in real products to customers. More active participation by the consumers of shared storage technology would help to achieve this goal.


13.2 Interoperability

One of the main objectives of standardization is to promote interoperability between the products of different vendors. Standards compliance and interoperability, however, are two distinct states. No standard is so rigorously defined that it excludes variations in interpretation by engineers. Two similar products may therefore be designed to the same standard and yet fail basic compatibility testing. This was evident in the early standards development of Fibre Channel technology, which in turn encouraged the development of vendor plugfests and standards conformance test suites by the University of New Hampshire (UNH). This work has been extended by the Fibre Channel Industry Association (FCIA) into its SANmark program for pass/fail compliance testing for various levels of Fibre Channel protocols. Similarly, compliance with early drafts of the iSCSI protocol proposal has been periodically validated by UNH-sponsored plugfests, and the SNIA has invested substantial funds in developing a standards compliance test harness that can be adapted to multiple management protocols.

In addition, the SNIA Interoperability Committee hosts multimillion-dollar interoperability demonstrations at the SNIA Technology Center and Storage Networking World conferences, which provide showcase examples for customers and give vendors the opportunity to test their products with a more diverse range of hardware and software components. Aligning standards development with compliance testing accelerates product time to market and, hopefully, removes most of the burden of discovering interoperability issues from the backs of customers.


Interoperability in SANs is still problematic, however, for a number of reasons. With hundreds of vendors in the same market space, it is simply not possible to test every permutation of product interaction. An interoperability matrix must include not only lists of vendor products but also specific versions of microcode or software that have been tested, along with combinations of multiple products in the same configuration. Even large vendors with sophisticated interoperability labs have difficulty staying current with new releases of products and microcode from their peers, and the qualification process for a specific product may be in a long queue behind other newly released or untested products.

In addition, because SAN components include everything from transceivers to upper-layer storage software applications, interoperability issues may derive from higher levels not covered by existing standards or interoperability test processes. An unexpected mismatch between the upper-layer application, the operating system, middleware, device drivers, or virtualization engines may result in failure, even though the rest of the SAN system is thoroughly interoperable. This highlights the value of testing total solutions before customer installation to ensure success (and avoid embarrassment).

Another source of SAN interoperability issues is the formation of vendor affinity groups, or alliances that promote interoperability between specific vendors but lock out others. A subtle, agreed-upon change in the use of a bit in a protocol stream may or may not violate the relevant standard but may result in interoperability problems for everyone except the participants. Sometimes these collective changes are meant only to enhance functionality between a given set of products and thus add value to the solution. In other instances, they amount to a conspiracy to produce a working solution that is at best difficult for others to attach to. The excluded vendors are thus forced to reverse-engineer the modification and are always a step behind.

The more sinister roots of interoperability problems are found in the lower depths of vendor competition and are expressed in the desire to trade industry compliance for market share. A dominant vendor may, for example, refuse to participate in interoperability activity with competitors and simply declare itself to be the de facto standard. This strategy may even work for a while, until customer complaints and bad press eventually begin to impact sales. Although there are no laws in the storage industry to punish the evil doers, customer commitment to open systems interoperability exacts its own justice in the end.

As long as new storage products and technologies continue to surface in the market, it will always be a challenge to deal with interoperability issues proactively. The catch phrase "plug and play" is not really applicable to SANs, at least not for some time to come. Even vendor-tested solutions generate the occasional surprise that only seems to occur on the customer site. Customers, however, should always keep the vendors' margin-swollen feet to the flame to ensure that the vendor community, and not the customer base, bears responsibility for identifying and resolving interoperability problems.

13.3 Management

Along with interoperability, management of storage networks has trailed behind the introduction of functional SAN solutions. Because SAN management includes both transport and placement of storage data, it is considerably more difficult to simplify deployment and operation of storage networks than to streamline traditional network management. The proliferation of IP SANs will make storage transport management more accessible because of the maturity of tools for IP network monitoring, configuration, and traffic shaping. Merging transport management with storage LUN, volume, capacity utilization, and backup management is more problematic. The comprehensive framework offered by CIM/WBEM as an industry-standard approach will be several years in the making, but it holds more promise than the current effort by large vendors to create management Band-Aids by swapping their proprietary storage APIs. Customers who settle for these schemes will always be vulnerable to the politics of vendor alliances as agreements are made and broken by competitive interests.

The targets of management initiatives are typically the sundry hardware components that compose the SAN. Host bus adapters, SAN switches, hubs, bridges, tape subsystems, and disk arrays are all sources of management data. Instead of having separate management utilities for each device to monitor connectivity, errors, utilization, and other statistics, a common management framework is possible as long as all devices present management data in a uniform format. As with any upper-layer application, the management platforms that solicit and consolidate standards-compliant data from end devices are not themselves subject to application-level standards. There is no requirement, then, that management frameworks must play well together or must avoid stomping on each other when multiple frameworks are monitoring the same infrastructure. This is a problem for volume management when, for example, disk-based volume management and host-based volume management are inadvertently turned on at the same time.


One argument against standardization of SAN management is that it must be crafted around the lowest common denominator of product capabilities. Feature-rich proprietary APIs are stripped down to their bare essentials for the sake of standards conformance. SNMP network management addressed this issue long ago by allowing vendor-specific MIBs to coexist with industry-standard MIBs. The customer then has the option of leveraging richer functionality in specific products if desired, while still having uniform monitoring and management capabilities across all products. The operative words here are "has an option," because it should be up to the customer, and not vendors, to determine what level of management is appropriate for the customer's configuration.
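A minimal Python sketch of that layering, using hypothetical data: the standard objects (here, a few real MIB-II interface variables) give uniform monitoring across every device, while vendor-specific extensions (the invented "acme" values) are merged in only when a product offers them and the customer chooses to use them.

```python
# Standard MIB-II interface data that any SNMP-capable switch can report.
standard_view = {"ifOperStatus": "up", "ifInErrors": 0, "ifSpeed": 1_000_000_000}

# Hypothetical vendor-specific enterprise MIB extensions.
vendor_view = {"acmeBufferCredits": 16, "acmeTrunkUtilization": 0.42}

def management_view(standard, vendor=None):
    """Uniform monitoring from the standard objects, with richer vendor
    detail merged in only when the product (and customer) opts in."""
    view = dict(standard)
    if vendor:
        view.update(vendor)
    return view

print(management_view(standard_view))                 # lowest common denominator
print(management_view(standard_view, vendor_view))    # optional richer detail
```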

As with standardization and interoperability, customers who take a more active part in the industry discussions on storage management can help to define what is really needed and can accelerate delivery of useful management products to market. In a room full of noisy vendors, each of whom has vocal opinions about what a technology should do, things go amazingly quiet when a customer speaks up.

13.4 Convergence

The emergence of IP-based SANs, storage virtualization, InfiniBand, and hybrid NAS/SAN technologies has further complicated the SAN landscape. Just as Fibre Channel SANs are becoming lucrative, stable, and somewhat interoperable, new components are being thrown in that undermine the steady production and sale of prepackaged SAN solutions. Now storage vendors and solutions providers must create and qualify additional solutions that accommodate the advances introduced by new storage technologies.

The Fibre Channel fundamentalists advocate a straightforward approach to this problem: to simply declare that Fibre Channel is the one true SAN that will reign forever and forever. The remainder of the storage networking community, however, is not as devout, as shown by the introduction of iSCSI interfaces on enterprise-class storage arrays, multiprotocol SAN switches that support both Fibre Channel and IP, the support of SCSI block access by traditional NAS vendors, and support of file and NAS access by traditional RAID array vendors. This adaptive radiation of products promotes vendors' survival in the market by accommodating the diverse needs of customers, as opposed to bending customers around a specific architecture.

Because storage networking is itself a convergence of highly dissimilar storage and networking technologies, it is not surprising that further permutations will occur until a level of synthesis is reached. For operating systems, convergence is appearing in the form of integrated support for storage networking, as shown by support for dual pathing, virtualization, backup, and iSCSI in Windows .NET. For applications, convergence is appearing in a tighter linkage between upper-layer applications and application-aware intelligence in the SAN. For the interconnection, convergence is surfacing as multiprotocol SAN switches and SAN internetworking, as well as integrated interfaces for new InfiniBand clusters. For storage targets, concurrent support for multiple interfaces and protocols as well as virtualization intelligence enables customers to select a single product that can adapt to changing requirements.

Convergence is a transitional state. At some point, the convergence process in storage networking will generate a new, uniform architecture based on the best traits of its contributing technologies, just as on a larger scale the Internet has emerged as a distinct entity, far greater than the sum of its parts.

Although the ongoing convergence of storage networking technologies has its own noise and confusion factor, it is creating new options for customers and is broadening the market opportunity for vendors. More rigor on the standards, interoperability, and management fronts will help to remove the remaining obstacles to SAN adoption, and convergence will facilitate productive deployment of SANs for enterprises and small businesses alike.


13.5 Chapter Summary

Standardization

• Standardization of architectures, protocols, and interconnections is the foundation of open systems.
• Standards compliance is often a mandatory requirement for technology acquisition.
• Standardization applies to infrastructures but rarely to the upper-layer applications that run over those infrastructures.
• Vendors must comply with industry standards while simultaneously asserting their products' unique added value.
• The standards process is vulnerable to vendor-specific interests.
• Standards for storage networking are governed by a diversity of standards organizations.

Interoperability

• Standards compliance and interoperability are two distinct states.
• Implementations of products on the same standard may vary because of differing interpretations of the standard.
• Concurrent standards compliance and interoperability testing via plugfests helps to ensure compatibility between standards-based products.
• The diversity of SAN products and applications makes it impossible to ensure interoperability for every configuration.

Management

• Management of SANs includes data transport management and data placement management.
• Uniform storage networking management may be possible through adoption of CIM/WBEM as a common management interface.
• Upper-layer management frameworks depend on standards-compliant end devices.
• Management frameworks need not be interoperable.

Convergence

• SANs are the result of technology convergence between networking and storage architectures.
• Convergence is an ongoing process, as shown by the emergence of IP SANs, storage virtualization, InfiniBand, and NAS/SAN hybrids.
• Ultimately, convergence will generate a new uniform architecture for shared storage.
• Convergence is providing new options for customers and is expanding the potential market for SANs.

Chapter 14. The Future of SAN

The future of storage networking is being shaped by a spectrum of existing and emergent technologies and the real-world applications they are designed to serve. Foretelling the future of anything is at best an educated guess, but the history of IT has repeatedly shown the path that any successful technology will inevitably take. This chapter speculates on the future of SANs as it might unfold over the next few years.


14.1 Integration of SANs into Mainstream Networking

The historical trend of successful information technologies is toward ubiquitous deployment, enhanced functionality, and lower total cost of acquisition and operation; or, put simply, more, better, and cheaper. The Internet, for example, would not be possible without uniform and affordable technologies such as IP presentation and transport protocols, low-cost computing platforms, routed infrastructures, and ubiquitous and affordable access services (Ethernet, DSL, dial-up, and so on). At the same time, convergence does not automatically eliminate diversity. Voice over IP is gaining momentum for enterprise use but is still a minority application compared with legacy telephony systems. For voice over IP to displace classical telephony, it must meet these three basic criteria for successful technology evolution.

Ethernet began as a rather difficult and expensive technology—requiring thick coaxial cable with limited distance and device support—and was deployed primarily on midrange computing systems. In a fairly short time, it was adapted to thin coax cable and then to unshielded twisted-pair wiring, making it much more economical to deploy. When the functionality of Ethernet was enhanced by Ethernet switching, the increase in link speed from 10Mbps to 100Mbps gave Ethernet a clear performance advantage over the competing technologies of Token Ring and FDDI. The proliferation of Ethernet products simultaneously drove down the cost, eventually pushing the technology into the home at the low end while making it ubiquitous throughout enterprise networks. Just as widespread deployment drives down cost, lower costs accelerate widespread deployment. Pervasive adoption also fosters technology enhancement, and enhancements validate the viability of further deployments. More, better, cheaper becomes an escalating and ever-expanding circle of technology renewal.

Storage networking temporarily defied this evolutionary process by maintaining high margins on storage systems and interconnections, making only moderate improvements in interoperability and management, and targeting high-end enterprises with large budgets. SANs would thus be marginalized to top-tier enterprise networks were it not for the introduction of IP storage transports and more affordable storage targets based on ATA/IDE. These innovations preserve the basic value proposition of SANs in terms of resource sharing, consolidation, availability, and data archiving, but they build SANs on products that are easily acquired and supported.

The merger of SANs with mainstream networking, both for local data center applications and for distributed or wide area requirements, eventually leads to commoditization of storage networking, the bane of storage vendors and solutions providers. Instead of a captive market that can command premium pricing, vendors will find themselves in an open market driven by volume and volume pricing. Far from inhibiting improvement in functionality and quality, the mass market actually drives enhancements to differentiate one product from another. PCs, for example, now have much richer functionality and capabilities with each succeeding generation, with higher functionality available at a fraction of the previous cost point. As volumes increase and prices drop, customers benefit at the expense of vendor margins. Vendors in turn maintain profitability only by expanding market penetration, pulling more money from a wider customer base.

The appropriate analogy for what is currently happening to SAN technology is the shared printer networking market of the 1980s. For some time, sharing of printer resources was an art requiring special equipment, skills, and software. Vendors at the time made high margins at the expense of the customers' desire to streamline corporate printing operations. Salespeople sent their kids to college, paid their mortgages, and bought cars, sailboats, and other perks of deal closing, all on the basis of networked printer access. In less than a decade, however, networked printers became commoditized, with PC-attached sharable color printers now priced at less than $100. In the meantime, printer technology experienced a dramatic improvement in speed and capability. Fast laser and color inkjet printers with integrated Windows device driver support, high-resolution color photo quality, and support for multiple paper formats are sold by the pallet-load at Costco and other bulk outlets. Whether via Ethernet or USB interface, these devices can be readily shared in an enterprise or home network and either are automatically discovered by the operating system or can be easily configured by any literate end user.

What happened to the complexity of printer sharing that initially enabled such high margins? It was subsumed by printer interfaces and host systems, became embedded in silicon, or was handled by device drivers and configuration wizards. The complexity did not disappear; it merely disappeared from the customer's view. Consequently, there is no equivalent to the high-priced SAN consultant or architect in today's mainstream shared printer market, and no Printer Networking Industry Association (PNIA) to coordinate the technological and marketing initiatives of printer vendors. The enormous success of shared printing has stripped it of its former elevated status and has democratized it throughout information technology, thus expanding the foundation of information technology as a whole by absorbing the solutions opportunistically created to serve it.

The integration of SANs into mainstream data communications does not eliminate diversity in product offerings or abolish the need for high-end solutions. The delta between high-end and midrange or between midrange and low-end, however, is less pronounced as vendors attempt to capture greater market share. Absorbing volume and resource management capabilities into an operating system, for example, deprives the third-party volume management vendors of their former monopoly and forces them to innovate by differentiating value-added functionality. RAID arrays that provide the sophistication of high-end systems but employ much lower-cost ATA technology force vendors of high-end storage to enhance their systems to maintain competitiveness. Low-cost SAN bridges that accommodate commodity disks while providing basic storage pooling capability help to drive greater intelligence into higher-end products while maintaining competitive pressure to keep pricing in check. Overall, the pricing commanded by the entire spectrum of solutions gradually falls, while the functionality offered at each level increases. The trend of technology to more, better, and cheaper products thus enhances SAN solutions over time and enables a broader market to share the benefits of storage networking.

14.2 Ubiquity of Shared Storage

The proliferation of SANs into mainstream IT networking enables storage to become a common utility—ubiquitous, uniformly accessible, on demand, and highly reliable. As articulated by Compaq's original Enterprise Network Storage Architecture (ENSA), a storage utility is a single storage repository of global extent that leverages open standards to provide any-to-any connectivity between servers and storage. Combined with management and monitoring tools, ubiquitous access to storage capacity simplifies data management and more closely aligns applications to their storage requirements.

Whereas the ENSA model was formulated for enterprise networks, mainstream SANs extend the storage utility concept to the broad IT marketplace. Like a large enterprise, a small health clinic, for example, has the same requirements for ready access to client information, transaction history, a database of technical information, inventory, scheduling, and so on, and needs reliable archiving and restoration of vital data in the event of system failure. Access to business-critical information should not require knowledge of the underlying storage infrastructure nor special skills to maintain or expand storage capacity. Analogous to Internet access, a storage utility should be as simple as making a connection and letting the network infrastructure itself determine optimum routes, formatting, and presentation of information as required by the upper-layer application. The storage utility thus assumes rigorous standards compliance to simplify connectivity between disparate storage resources, automation of policies to determine appropriate storage methods and redundancy requirements, and intelligence in the storage network to proactively respond to potential disruption or failures on the back end.

As storage becomes a readily accessible utility, it also becomes vulnerable to intrusion. The physical isolation of today's SANs behind the locked doors of data centers offers only a castle-and-moat model for security. For a storage utility to be viable, more sophisticated security mechanisms are required. The iSCSI standard, for example, mandates IPSec for end-to-end device authentication and data encryption. The iSNS protocol enables secure discovery and authorized connectivity for iSCSI and Fibre Channel devices attached to gateways on IP networks. Third-party products are also appearing in the market to provide data encryption for Fibre Channel transports and encryption of data to disk. These security initiatives are an essential complement to the storage utility model.

Ubiquitous access to storage data facilitates information generation and exchange. It does not, unfortunately, govern the quality of that information. Just as the Internet is the home to both scientific data and pornography, a storage utility can streamline data transport and management but cannot ensure that storage resources are used beneficially. As with any commonly available technology, the rich functionality it provides and what people actually do with it are two different things.


14.3 Virtualization

Integration of shared storage into mainstream networking enables ubiquitous access to storage but does not guarantee that such access will be used. Convenient access has little value if storage configuration and management are still manually intensive and complex. As discussed in Chapter 11, storage virtualization promises to simplify administration of storage resources by automating configuration and policy enforcement. At the same time, virtualization solutions must be sufficiently robust to deal with changes or failures in the storage network, one of the main shortcomings of available virtualization products. If a utility pole is under maintenance, customers should not have to phone the power company individually to have service rerouted through the grid or specify how many lamps or TVs they need enabled.

In the ideal storage utility, computing platforms and storage resources are simply connected to the network, and applications are run. Users do not care and do not need to know where, specifically, those resources reside or what magic is being done on their behalf behind the scenes. Virtualization intelligence should be undetectable but highly effective, application-aware, and simple to configure and deploy. Whether they are hosted on servers, SAN switches, or storage arrays, virtualization engines should also be interoperable and capable of supporting distributed configurations across multivendor environments.

Given the complexity of automating storage allocation, quality of service, security, policy enforcement, and resiliency into a virtualization intelligence, it will take considerable engineering to produce viable products even on a proprietary basis. Standardizing that capability presents further complexity and delay but is a prerequisite for widespread adoption. How quickly the ideal storage utility is delivered to users depends on how forcefully customers demand both functionality and standards compliance and how readily vendors respond to the opportunity presented by a mass, utility-driven market.


14.4 Human Factors

The convergence of storage and networking into a single technology has generated new architectures, new concepts, and a unique vocabulary known only to (and often disputed by) the initiated. Although vendors and technologists have assumed the role of evangelists to educate customers and explain the many benefits of SANs, there is no single audience on the customer premises responsible for both storage and networking. Enterprise networks may have server administrators who are responsible for installing, configuring, and maintaining servers with direct-attached storage, but no one dedicated to overall storage administration. Similarly, the responsibility of network administrators typically stops at the front end of the server, where the domain of familiar IP networking ends and the mysterious realm of SCSI and LUNs begins.

The division of labor between server/storage administrators and traditional network administrators within an enterprise is so complete that it is often the SAN vendor that brings the two groups into the same room and makes the introductions. The eyes of the storage administrators glaze over when the discussion turns to SAN routing protocols or State Change Notifications; the network administrators stare at the ceiling when it turns to LUN masking or tape backup strategies. Hopefully, in the end some crossover occurs, with storage administrators recognizing the value of networking their storage resources, and network administrators appreciating the new sphere of influence introduced by SANs.

One of the challenges articulated by CIOs and executives responsible for overall information access within their organizations is the difficulty of integrating networked storage with their existing staff. This integration was particularly difficult when SAN solutions required expertise on Fibre Channel fabrics and diagnostic methods. Although vendor-specific training certification and the SNIA FC SAN certification program have addressed the granular aspects of architecture-specific SANs, there is as yet no comprehensive training or certification program for storage networking technology or SAN design. Consequently, although a storage administrator may be able to correctly configure a particular vendor's product, the overall SAN design may not be optimized for the application it serves.

Between the ideal of the ubiquitous, simplified storage utility and the current reality of complex and limited SANs, customers and the industry will need to invest in human resources to leverage the long-term cost benefits of shared storage solutions. The alternative to upgrading the human skills infrastructure is to simply add more servers to accommodate more direct-attached storage capacity, more server administrators to manage them, and thus significantly more cost over time.


14.5 Contributing Technologies

The future of storage networking is also being shaped by emergent technologies having only tangential associations with storage. InfiniBand, for example, is first appearing as a server architecture, despite the InfiniBand specification's declared intent to rule the IT world. InfiniBand resolves the limitations of server bus architectures and expedites delivery of application data into server memory. Although it is optimized for high-performance server cluster applications, it has not demonstrated clear end-user value for networking or storage applications. The translation mechanism between InfiniBand and other architectures is defined in the target channel adapter (TCA) specification, and TCAs may provide the link between InfiniBand clusters and Fibre Channel or IP SAN storage assets.

In the Ethernet world, the availability of 10Gbps Ethernet interfaces encourages development of iSCSI storage targets with multiple, high-performance links to serve hundreds of iSCSI initiators simultaneously. Although 10Gbps Fibre Channel may also provide high-performance connectivity, it lacks the ability to seamlessly integrate with mainstream networks and leverage 10Gbps interswitch links now available on Gigabit Ethernet switches. Scalability and simplified deployment, as well as reduced cost due to volume production, are thus giving the advantage to 10Gbps Ethernet.

Higher-capacity and significantly cheaper ATA disks are already exerting pressure on more expensive SCSI and Fibre Channel arrays for midrange applications, and new solid state technologies may displace spinning media altogether at the high end. Enterprise networks that are currently spending several million dollars for high-end arrays may find it difficult to justify the expense for 99.999% up time, when 99.99% can be had for a fraction of the cost. At the same time, the ability to recycle SCSI or ATA drives via cheap iSCSI bridges simplifies the buying decision dramatically for departmental and low-end storage needs.

Object-based storage, distributed file systems, virtual interface (VI) protocols, and remote direct memory access (RDMA) protocols—as well as technologies still in deep stealth mode—will add to the diversity (and confusion) of the storage networking market. For some vendors, these new technologies introduce chaos and threaten established market share; for others, they provide new opportunities against the incumbents. For customers who can navigate the oncoming waves of technology and can separate vendor marketing from reality, the destination will be, if not exactly the promised land of storage, at least significantly better than the one they left behind.


14.6 Chapter Summary

Integration of SANs into Mainstream Networking

• The historical trend of information technologies is toward widespread deployment, greater functionality, and lower total cost.

• The core value of SANs is reemerging on the basis of affordable and supportable mainstream technologies.

• The integration of SANs into mainstream networking fosters diversity in products to provide differentiation between vendor offerings.

Ubiquity of Shared Storage

• The proliferation of SAN technology results in ubiquitous storage access.

• Compaq's Enterprise Network Storage Architecture (ENSA) first defined the storage utility model.

• A storage utility is a single storage repository of global extent that leverages open standards to provide any-to-any connectivity between servers and storage.

• Ubiquitous access to storage requires security mechanisms to prevent unauthorized access.

Virtualization

• Ubiquitous access does not guarantee that access will be used.

• Storage virtualization complements and enhances common storage access.

• Virtualization intelligence should be undetectable.

• Standardization of virtualization technology accelerates adoption and deployment.

Human Factors

• The combination of storage and networking has spawned new architectures, concepts, and vocabulary.

• Enterprise networks typically do not have staff dedicated to storage administration.

• Training and certification should include high-level SAN concepts and design.

• To leverage the benefits and economy of SANs, customers must invest in upgrading the skillsets of storage and networking staff.

Contributing Technologies

• InfiniBand will integrate with storage networking via target channel adapter (TCA) bridges.

• 10Gbps Ethernet enables high-performance iSCSI storage arrays supporting hundreds of initiators simultaneously.

• ATA disk technology and new solid state storage will impact SAN storage at both the low and the high ends of the market.

• New technologies make the SAN landscape more complex but offer additional options and benefits to customers.

Page 207 of 407

Page 208: Designing Storage Area Networks -2nd Edition - Tom Clark


Appendix A. SAN Resources

Section A.1. Standards and Proposals

Section A.2. Fibre Channel Technical and Marketing

Section A.3. IP Storage Technical and Marketing

Section A.4. Related Web Resources

A.1 Standards and Proposals

A.1.1 SCSI and Fibre Channel Standards

www.T10.org (ANSI T10 Technical Committee)

www.T11.org (ANSI T11 Technical Committee)

www.cern.ch/HSI/FCS (European Laboratory for Particle Physics)

A.1.2 IETF Requests for Comments

www.ietf.org (Internet Engineering Task Force)

ftp://ftp.isi.edu/in-notes (University of Southern California School of Engineering Information Sciences)

A.1.3 IETF IP Storage Internet Drafts

www.ietf.org/internet-drafts (IETF Internet Drafts)

www.haifa.il.ibm.com/satran/ips (Julian Satran's IP Storage Archive)

A.1.4 Gigabit Ethernet

http://standards.ieee.org (IEEE Standards site)

www.manta.ieee.org/groups/802/3/ (IEEE Ethernet Working Group)

A.1.5 InfiniBand

www.infiniband.org (InfiniBand Trade Association)


www.fabric-io.com (InfiniBand)


A.2 Fibre Channel Technical and Marketing

http://www.fibrechannel.org (Fibre Channel Industry Association)

http://www.fibrechannel-europe.com (Fibre Channel Industry Association—Europe)

http://www.snia.org (Storage Networking Industry Association)

http://www.snia-e.org (Storage Networking Industry Association—Europe)

http://www.snia-j.org (Storage Networking Industry Association—Japan)


A.3 IP Storage Technical and Marketing

http://www.ietf.org/html.charters/ips-charter.html (IETF IP Storage Working Group)

http://www.snia.org/English/Products/Products_FS.html (SNIA IP Storage Forum)

http://www.snia-e.org (Storage Networking Industry Association—Europe)

http://www.snia-j.org (Storage Networking Industry Association—Japan)

www.iol.unh.edu/consortiums/iscsi (University of New Hampshire iSCSI Consortium)

A.4 Related Web Resources

http://www.snia.org (Storage Networking Industry Association)

http://www.snia-e.org (Storage Networking Industry Association—Europe)

http://www.snia-j.org (Storage Networking Industry Association—Japan)

www.10gea.org (10 Gigabit Ethernet Alliance)



Appendix B. SAN and Related Vendors

Section B.1. SAN System Vendors

Section B.2. SAN Storage and Tape Vendors

Section B.3. Fibre Channel Products

Section B.4. IP SAN Products

Section B.5. SAN Extension

Section B.6. SAN Management and Virtualization

Section B.7. Gigabit Ethernet Vendors

Section B.8. Test Equipment and Verification Labs

B.1 SAN System Vendors

Avid (video) www.avid.com

Bull www.bull.com

Dell www.dell.com

Driveline (video) www.drivelinenet.com

Fujitsu www.ftsi.fujitsu.com

Hewlett-Packard www.hp.com

IBM www.ibm.com

NEC www.nec.com

Sun www.sun.com

Unisys www.unisys.com


B.2 SAN Storage and Tape Vendors

ADIC www.adic.com

BoxHill www.boxhill.com

Ciprico www.ciprico.com

Compaq www.compaq.com

DataDirect www.datadirectnetworks.com

Dell Computers www.dell.com


EMC www.emc.com

Eurologic www.eurologic.com

Exabyte www.exabyte.com

Hitachi Data Systems www.hds.com

IBM www.ibm.com

JMR www.jmr.com

LSI Logic www.lsilogic.com

MTI www.mti.com

Network Appliance www.netapp.com

Overland Data www.overlanddata.com

Procom www.procom.com

Quantum www.quantum.com

Raidtec www.raidtec.com

Siemens www.sni.com

SpectraLogic www.spectralogic.com

StorageTek www.network.com

Sun Microsystems www.sun.com

Unisys www.unisys.com

XIOtech www.xiotech.com


B.3 Fibre Channel Products

Adaptec www.adaptec.com

Agilent www.agilent.com

Andiamo www.andiamo.com

ATTO Technology www.attotech.com

Brocade www.brocade.com

Chaparral www.chaparraltec.com

Crossroads www.crossroads.com

Emulex www.emulex.com

GENROCO www.genroco.com

Hewlett-Packard www.hp.com

IBM www.ibm.com

INRANGE www.inrange.com

Interphase www.iphase.com

JNI www.jni.com


LSI Logic www.lsilogic.com

McDATA www.mcdata.com

Qlogic www.qlogic.com

Troika Networks www.troikanetworks.com

Vixel Corporation www.vixel.com


B.4 IP SAN Products

3Ware www.3ware.com

Adaptec www.adaptec.com

ADVA www.san.com

Agilent www.agilent.com

Alacritech www.alacritech.com

Cisco www.cisco.com

Emulex www.emulex.com

FalconStor www.falconstor.com

Hewlett-Packard www.hp.com

Hitachi Data Systems www.hds.com

IBM www.ibm.com

Intel www.intel.com

JNI www.jni.com

Lucent www.lucent.com

Maranti Networks www.marantinetworks.com

Nishan Systems www.nishansystems.com

Rhapsody Networks www.rhapsodynetworks.com

Qlogic www.qlogic.com

SANRAD www.sanrad.com

SAN Valley www.sanvalley.com

SpectraLogic www.spectralogic.com

StoneFly Networks www.stoneflynetworks.com

Trebia www.trebia.com

Troika Networks www.troikanetworks.com

XIOTech www.xiotech.com


B.5 SAN Extension

AT&T www.att.com


ATTO Technology www.attotech.com

Brocade www.brocade.com

Chaparral www.chaparraltec.com

CNT www.cnt.com

Crossroads www.crossroads.com

Gadzoox www.gadzoox.com

InRange www.inrange.com

Lucent www.lucent.com

McDATA www.mcdata.com

Nishan Systems www.nishansystems.com

Nortel www.nortelnetworks.com

Vixel www.vixel.com


B.6 SAN Management and Virtualization

BakBone Software www.bakbone.com

BMC www.bmc.com

CommVault www.commvault.com

Computer Associates www.ca.com

CreekPath www.creekpath.com

DataCore www.datacore.com

DataDirect www.datadirect.com

EMC www.emc.com

FalconStor www.falconstor.com

Fujitsu Software www.fujitsu.com

Hitachi Data Systems www.hds.com

KOM Networks www.komnetworks.com

Legato Systems www.legato.com

Maranti Networks www.marantinetworks.com

Netreon www.netreon.com

SANavigator www.sanavigator.com

StorageApps www.storageapps.com

StoreAge www.store-age.com

Sun Microsystems www.sun.com

Tivoli www.tivoli.com

TrueSAN www.truesan.com


Veritas Software www.veritas.com

Vicom www.vicom.com

XIOTech www.xiotech.com


B.7 Gigabit Ethernet Vendors

Avaya www.avaya.com

Avici Systems www.avici.com

Cisco www.cisco.com

Extreme www.extremenetworks.com

Foundry www.foundrynetworks.com

Juniper www.juniper.net

Nortel www.nortelnetworks.com

RiverStone www.riverstonenetworks.com


B.8 Test Equipment and Verification Labs

Ancot www.ancot.com

Finisar www.finisar.com

Imation Labs www.imation.com

I-Tech www.i-tech.com

Medusa Labs www.medusalabs.com

Mier Communications www.mier.com

Network Associates www.sniffer.com

SNIA Technology Center www.snia.org

Solution Technology www.soltechnology.com

Tek-Tools www.tek-tools.com

Xyratex www.xyratex.com




Appendix C. The Standardization Process

Open systems are predicated on common standards that define how protocols and interfaces should perform. Ideally, if different vendors engineer products to the same standards, the products should function in an identical manner and should be interoperable. This homogenization of functionality does not preclude proprietary value-added features but simply establishes the common ground for base features across multivendor environments. The IETF standard for SNMP management, for example, defines common Management Information Bases (MIBs) for Ethernet, Token Ring, and so on. Vendors may also include proprietary management functionality through vendor-specific MIB extensions. A vendor that complies with the standard MIBs may supply additional proprietary MIBs, which can be loaded into the SNMP management platform so that the customer can monitor both standard and vendor-specific values of a specific product. This flexibility ensures that customers can go to multiple vendors to source similar products (thus achieving the benefit of open systems) while having the option to choose one vendor over another on the basis of value-added functionality.
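To make the split between standard and vendor-specific MIBs concrete, the following sketch shows how a management application might poll both a standard MIB-II object and a proprietary enterprise extension on the same device and present them in one view. It is illustrative only: the enterprise number (99999), the fanSpeedRPM object, and the snmp_get() stub are invented placeholders rather than a real vendor MIB or an actual SNMP library call.

    # Standard MIB-II objects live under 1.3.6.1.2.1; vendor extensions live
    # under 1.3.6.1.4.1.<enterprise-number>. The enterprise number and the
    # fanSpeedRPM object below are hypothetical.
    STANDARD_OIDS = {
        "sysDescr":  "1.3.6.1.2.1.1.1.0",   # MIB-II system description
        "sysUpTime": "1.3.6.1.2.1.1.3.0",   # MIB-II system uptime
    }
    VENDOR_OIDS = {
        "fanSpeedRPM": "1.3.6.1.4.1.99999.1.2.1.0",  # hypothetical private MIB
    }

    def snmp_get(agent: str, community: str, oid: str) -> str:
        """Placeholder for a real SNMP GET (e.g., via a library such as pysnmp).
        Returns a canned value so the sketch runs without a live device."""
        return f"<value of {oid} from {agent}>"

    def poll_device(agent: str, community: str = "public") -> dict:
        """Merge standard and vendor-specific objects into a single view, the way
        an SNMP platform reports both base and value-added data for a product."""
        return {name: snmp_get(agent, community, oid)
                for name, oid in {**STANDARD_OIDS, **VENDOR_OIDS}.items()}

    if __name__ == "__main__":
        for name, value in poll_device("switch01.example.com").items():
            print(f"{name}: {value}")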

How, specifically, the industry settles on which features are desirable for open systems implementation and which features are best assigned to vendors' unique value propositions is worked out through the standardization process. Vendors and technologists must reach a consensus on the essential requirements of a technology without burdening the proposed standard with excessive demands. Including too many requirements slows the standards process and delays introduction of solutions to the market. Failing to accommodate essential requirements, on the other hand, undermines open systems and inhibits interoperability. Therefore, reaching agreement on a balance of base requirements is never a simple task and often requires sustained discussion and debate before resolution is reached.

The standardization process may vary in detail from one standards organization to another but is commonly based on elementary democratic principles. Participation is generally open to anyone, or if membership in a standards organization is required, it is based on a minimal and reasonable fee. This policy encourages participation by vendors that have a vested interest in a technology and by technologists or academics whose interest is more objective. Any participant can introduce proposals for standards or proposed revisions, with simple peer pressure acting to discourage frivolous or poorly formulated submissions. Discussion and debate are typically conducted via readily accessible channels such as Internet reflector e-mail threads, conference calls, and periodic face-to-face meetings. In the IETF, for example, anyone with Internet access can join a working group reflector, assuming that the person has the endurance to follow the discussions and sufficient disk space to save the e-mail threads.

During a proposal or standards discussion, difficult or tangential issues that delay or divert the process may be tabled by majority agreement or by recommendation of a workgroup chair. Typically, multiple facets of a specification are discussed concurrently so that the proposal as a whole gets fleshed out simultaneously instead of sequentially. Interim revision levels of a proposal mark the overall progress to date, until eventually a 1.0 version is released as standard. These thresholds serve as checks and balances, ensuring open discussion and majority agreement before the process passes to the next level. A last call for comment offers an open forum for any additions or critique, after which a specification may undergo a final review by officials of the standards organization before a standards identifier is issued.

As shown in Figure C-1, requirements for a proposed standard may be defined by a work group within the relevant standards body or by technical work groups in other contributing organizations. The requirements definition for SNMP management of Fibre Channel devices, for example, was conducted in the SNIA technical workgroups and the renegade Fibre Alliance before being submitted to the IETF for further standards track work. Much of the requirements work for extended copy (third-party copy) was also done by the SNIA and then handed to ANSI T10 for standardization. The InfiniBand Trade Association (IBTA), by contrast, conducts its own requirements definition and standards development internally.

Figure C-1. During development, standards drafts may be validated via compliance testing


Compliance testing is an important complement to the standards process. This work may be done externally, as in the case of the University of New Hampshire's work on behalf of both Fibre Channel and iSCSI standards development. The aim of compliance testing is to validate vendor implementation of various revision levels of a proposed standard and to identify omissions in the standards draft itself. Compliance testing is done via test suites or scripts that exercise the main features of a proposed standard and verify appropriate response by vendor products. Compliance to fabric login, for example, is tested by sending the requisite login streams to a switch and monitoring the conversation between the switch and the test device. It is thus possible to verify compliance to properly formatted protocol streams as well as to test the response to invalid streams. The FCIA has codified much of the lower-level Fibre Channel protocol compliance testing in its SANmark program, giving vendors an industry-recognized stamp of approval for passing through a hierarchy of compliance testing. The SNIA Interoperability Committee has established the Interoperability Compliance Test Program (ICTP) to create test harnesses for a variety of SNIA-sponsored standards initiatives, including iSCSI adapter management, iSNS discovery, and storage-specific CIM/WBEM specifications.
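As a conceptual illustration of this kind of test suite (not SANmark, ICTP, or any actual Fibre Channel test code), the sketch below drives a device under test with one well-formed and one deliberately malformed login request and checks that each is accepted or rejected as expected. The frame payloads, the pass/fail criteria, and the send_frame() transport are invented placeholders.

    # Conceptual compliance-test pattern: send valid and invalid requests to a
    # device under test and verify its responses. Payloads and the transport
    # are placeholders, not real FLOGI frames or a vendor test harness.
    from dataclasses import dataclass

    @dataclass
    class TestCase:
        name: str
        request: bytes        # placeholder login payload
        expect_accept: bool   # whether a compliant device should accept it

    def send_frame(device: str, payload: bytes) -> dict:
        """Placeholder transport: a real harness would exchange frames with the
        switch and decode its reply. Here a non-empty, even-length payload is
        treated as well formed so the sketch runs standalone."""
        return {"accepted": len(payload) > 0 and len(payload) % 2 == 0}

    def run_suite(device: str, cases: list) -> None:
        for case in cases:
            reply = send_frame(device, case.request)
            verdict = "PASS" if reply["accepted"] == case.expect_accept else "FAIL"
            print(f"{verdict}: {case.name}")

    if __name__ == "__main__":
        run_suite("switch-under-test", [
            TestCase("well-formed login", b"\x22\x00" * 8, expect_accept=True),
            TestCase("truncated login", b"\x22", expect_accept=False),
        ])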

Achieving the status of a completed and released standard is by no means the conclusion of this extended process. Work on technology standards is an ongoing process, and as indicated in Figure C-1, release of the 1.0 version of a standard may immediately launch new work on the 2.0 version.

Although not all this activity is free from vendor influence and maneuvering, it is still an amazingly objective and democratic process and fosters cooperation and camaraderie within the technical community. Having many cooks in the soup sometimes makes dinner a bit late, but it is far better than the eclectic and proprietary fare that otherwise would be served to customers.




Appendix D. The Storage Networking Industry Association (SNIA)

Industry associations embody the contradiction between the competitive interests of vendors and their recognition that their own success is tied to the success of the industry as a whole. The appropriate homily for industry associations is "rising waters raise all ships," although occasionally a gunboat will appear, the sign of a vendor's competitive drive going unchecked. An industry association may focus primarily on marketing campaigns to raise end-user awareness of the industry's technology, or it may combine marketing and technical initiatives to promote awareness and to formulate standards requirements. The Fibre Channel Industry Association, for example, has organized activities to promote outbound messaging through Networld+Interop, Comdex, and other venues as well as technical work on the SANmark program for standards compliance. For standardization, the FCIA has worked primarily through the NCITS T11 Committee, to the extent of holding FCIA meetings and NCITS T11 sessions concurrently.

The umbrella organization for all storage networking technologies is the Storage Networking Industry Association, or SNIA. SNIA has more than 300 members, representing vendors and customers from a wide variety of storage disciplines—including management software, storage virtualization, NAS, Fibre Channel, IP storage, and disk and tape—and solutions providers who offer certified configurations and support. As with other industry associations, the SNIA is a volunteer organization with only a few paid staff positions. Its activities are funded by the monetary and personnel contributions of the membership. The general mission of the SNIA is to promote the adoption of storage networking technology as a whole, with the membership itself providing the momentum to accomplish this goal. The more the membership invests in financial and volunteer resources, the more the organization can accomplish.

D.1 Board of Directors

As shown in the organization chart in Figure D-1, the governing body of the SNIA is the Board of Directors. Board members are elected by the SNIA membership for two-year terms, and an individual board member may serve two consecutive terms. The ten elected board members are supplemented by three at-large board members appointed by the board itself. The board is responsible for establishing policies and managing the resources of the organization to fulfill SNIA's mission, and it provides oversight to the SNIA committees, industry forums, the Technical Council, the Consumer Council, the technical director, and the SNIA Technology Center.

Figure D-1. Storage Networking Industry Association organizational structure


To ensure involvement in wider SNIA activity, board members are required to chair or provide leadership in SNIA committees and subgroups. Board members are also required to participate in six face-to-face meetings each year and are expected to represent the organization at public venues such as the Storage Networking World (SNW) conferences. This volunteer activity represents a substantial contribution of time and resources for member companies that participate at the board level and reflects their commitment to the industry as a whole. Of course, board representation also provides an opportunity to promote specific vendor agendas, although board representation is sufficiently diverse to discourage overt vendor-driven initiatives.


D.2 Executive Director and Staff

Board activity is supported by a salaried executive director and staff. The executive director conducts the day-to-day operations of the organization and ensures logistical support for SNIA meetings and conference participation.

D.2.1 Board Advisors

The board may receive counsel on industry-related issues from the board advisors, typically former board members and interested parties who may attend board meetings and provide input into board discussions. Board advisors play a critical role in providing viewpoints on storage networking issues and in helping to promote the SNIA within the industry.

D.2.2 Technical Council

The technical activity and strategic technical vision of the SNIA are managed by the SNIA Technical Council. The council is composed of nine of the top experts within the storage networking community, who volunteer their time and expertise to maintain the integrity of SNIA's technical initiatives. In 2001, the technical council produced the SNIA Shared Storage Model as a guide to understanding storage networking technologies. The technical council also oversees the activity of the technical work groups in cooperation with the technical director.

D.2.3 Technical Director

The SNIA technical director (TD) is a full-time staffed position funded by the SNIA budget. The current technical director is Arnold Jones. Responsibilities of the TD include technical direction and project management of the SNIA work groups, SNIA liaison with standards bodies such as ANSI and IETF, technical input into the SNIA Technology Center, and support of SNIA committees.

D.2.4 Technical Center Director

The technical center director is also a full-time staffed position and is responsible for management of the SNIA Technology Center in Colorado Springs, Colorado. Over time, the technical center director will staff the Technology Center for a variety of SNIA-sponsored initiatives.


D.3 SNIA Technology Center

The SNIA Technology Center in Colorado Springs was launched in 2001 as a multipurpose facility. The Technology Center was made possible by a $3,500,000 grant from Compaq Computer Corporation to the SNIA. The center comprises 14,000 square feet of lab and classroom space and is operated as a vendor-neutral facility by the SNIA. Uses of the Technology Center include interoperability demonstrations, standards compliance testing, proof of concept and evaluation configurations, technology development in support of SNIA technical work group activity, and training in storage networking technology. The initial formation of the Technology Center was led by board members Gary Phillips and Dona Stever with assistance from Sheila Childs and Tom Clark.


As with other SNIA activities, the Technology Center depends on contributions of money and equipment by member companies. Network Appliance was one of the first vendors to contribute, donating more than half a million dollars' worth of equipment in the form of fully configured NetApp filers. Other vendors have been contributing sponsors, providing equipment to get the center operational. The Technology Center is a significant and practical step for the SNIA in providing its members and the customer community a venue for accelerating the adoption of storage networking.


D.4 Customer Councils

Because vendors alone do not determine the useful purposes to which technology will be put, the SNIA has organized two customer councils to solicit customer representation within the SNIA and customer input into storage networking strategies.

The Customer Executive Council is composed of volunteer CIOs, VPs, and other executives from end-user companies that have strategic and business interests in enterprise storage solutions. The Customer Advisory Council is composed of administrators, SAN engineers, architects, and support personnel who have practical, day-to-day responsibility for shared storage operations. These two end-user organizations provide both strategic and tactical input into the SNIA and thereby help to establish priorities and shape the future of storage networking.

D.5 Committees

Much of the nontechnical activity of the SNIA is conducted through committees. Committees may be chaired by SNIA board members or other volunteers, with volunteer participation by member companies. Committees are chartered with various tasks that must be performed within the vendor-neutral culture of the parent organization. Committees and work groups have face-to-face meetings at least four times a year as well as periodic conference calls to track their progress and assign tasks.

The Storage Management Initiative (SMI) Steering Committee provides marketing support to the SNIA's CIM/WBEM storage standards effort. Whereas technical work groups work on standards requirements and definitions, the SMI Steering Committee is responsible for promoting industry and customer adoption of a uniform storage management interface and for organizing support for interoperability and showcase demonstrations.

The Conference Committee is responsible for organizing SNIA participation in Storage Networking World (SNW) and other major events. SNW is the product of a cooperative agreement between the SNIA and Computerworld. SNW conferences are held twice a year, typically in Palm Desert, California, in the spring and Orlando, Florida, in the fall. Attendance at SNW events has grown from a few hundred to several thousand, and SNW has become the premier event for storage networking.

The Education Committee is responsible for creating training and certification programs for the SNIA. This activity ranges from training classes held at the SNIA Technology Center to technology certification through various partnerships. The Education Committee also produced the "SNIA Dictionary of Storage Networking Terminology." (See Appendix F)

The Finance Committee is responsible for developing the SNIA annual operating budget and for ensuring that the fiscal responsibilities of the organization are met.

The Interoperability Committee has had multiple responsibilities, including initial organization of the Technology Center, development of standards compliance test suites, and staging of interoperability demonstrations at SNW and other events. Of these activities, the visibility provided by the SNW Interoperability Lab has encouraged participation in the Interoperability Committee, with more than 65 vendors represented in recent SNWs. For test suite development, the Interoperability Committee has solicited contributions of code and programming expertise from member companies to accelerate development of standards compliance tests. Initial compliance tests are being developed for the iSCSI, iSNS, and SMI specifications.


The Marketing Committee is responsible for preparation of press releases, press and analyst reviews, and marketing material and for coordination of marketing events in support of SNIA committees and work groups.

The Membership Committee is responsible for recruitment and maintenance of the SNIA membership and for explaining the benefits of SNIA participation to vendors and customers.

The Web Committee has assumed the task of maintenance and revision of the SNIA Web site. The SNIA URL (www.snia.org) is a resource for customers and members and includes information on SNIA activity as well as technical resources such as the SNIA dictionary and source code.

Depending on their available time and resources, SNIA member companies may participate in any or all of the SNIA committees. Although committee activity is vendor-neutral and is focused on the industry as a whole, participation is a means to ensure that a company is adequately represented in the creation of policies, processes, and events that provide visibility in the market. Committee participation is also a means to monitor the state of the industry and thus shape vendor strategies to the consensus of industry peers.

D.6 Technical Workgroups

The SNIA technical work groups have been instrumental in formulating requirements for technology standards that may then be forwarded to the appropriate standards body for further work. Additional detail on the activity of each technical work group can be found on the SNIA Web site. SNIA work groups have produced open source code for the Common Information Model Object Model (CIMOM) and the Host Bus Adapter API (HBA API), both of which are available via the SNIA site. Technical work groups support a diversity of interests, from management and backup to security issues.

For IP storage, the SNIA IP Storage Work Group was formed to create requirements for standards for IP-based SANs. These requirements documents are forwarded to ANSI, IETF, or another appropriate standards organization.


D.7 Industry Forums

To accommodate new storage networking trends within the SNIA umbrella, the SNIA has created SNIA industry forums as a vehicle for organization and marketing. SNIA industry forums enjoy some autonomy within the SNIA but are chartered within the general guidelines of SNIA policy. The forum concept enables those who are creating emergent technologies and services to leverage the SNIA infrastructure and thus accelerate development without the need to create separate industry associations.

D.7.1 SNIA IP Storage Industry Forum

The first forum created under the industry forum definition was the IP Storage Forum. After some initial discussion on its scope, the IP Storage Forum now represents all vendors that are developing block storage data over IP solutions. Currently, subgroups have been created for the FCIP, iFCP, and iSCSI protocols. More than 40 SNIA member companies are enrolled in the forum, including new IP storage vendors as well as established storage networking vendors that are developing IP-based interfaces for their products. The focus of the IP Storage Forum is the marketing and promotion of IP SAN technology. It thus complements the technical work of the IP Storage Work Group.

D.7.2 SNIA Supported Solutions Forum

The SNIA Supported Solutions Forum was created to promote storage networking solutions that are interoperable and have vendor backing for customer service. As with other SNIA subgroups, the SSF is open to any SNIA member company. The focus of this forum is to configure and test multivendor solutions for specific applications and to create a support infrastructure that will streamline problem resolution for the customer. Ideally, a customer need make only a single problem call, and the vendors themselves will troubleshoot, isolate, and repair the problem without further customer intervention.

D.7.3 SNIA DAFS Forum

The DAFS (Direct Access File System) Forum is chartered to provide marketing and promotional support for the DAFS protocol and product implementations.

D.7.4 SNIA Storage Security Industry Forum

The SNIA Storage Security Industry Forum is tasked with promoting secure solutions for storage networks, including authentication and data encryption mechanisms for both Fibre Channel and IP storage networks. The establishment of this forum is an indicator of the steady penetration of storage networks into enterprise environments and the security concerns that have accompanied widespread deployment.

D.7.5 SNIA Europe Forum

The SNIA Europe Forum promotes shared storage solutions in the United Kingdom and continental Europe. SNIA-E has its own Web site (www.snia-e.org), committees, and events, including SNW Europe.

D.7.6 SNIA Japan Forum

The SNIA Japan Forum promotes shared storage solutions in Japan. SNIA-J has its own Web site (www.snia-j.org), committees, and events, including SNW Japan.

D.8 SNIA and Other Industry Associations

The SNIA has cooperative relations with other industry associations, including the Fibre Channel Industry Association, the InfiniBand Trade Association, and the University of New Hampshire iSCSI Consortium. The SNIA Interoperability Committee and FCIA have been jointly addressing the issue of mid- and upper-layer interoperability testing for the FCIA SANmark program. The FCIA's work has centered on physical-layer and transport-layer issues, whereas the SNIA has focused on application and management issues.

The SNIA Technical Center Committee and Interoperability Committee have met with the ITA to discuss InfiniBand testing, and the ITA has participated in the SNIA Interoperability Committee's Interop Lab at SNW conferences. In addition, the SNIA Interoperability Committee, SNIA IP Storage Forum members, and University of New Hampshire representatives have discussed using the SNIA Technology Center to hold IP storage plugfest events for interoperability testing.

D.9 Summary

The SNIA represents a diversity of technologies that meet on the common ground of storage networking. Software vendors, hardware vendors, solutions providers, integrators, consultants, and customers committed to shared storage can work within the SNIA to advance their individual and collective interests. As a volunteer organization, the SNIA solicits involvement by its members and interested individuals for committee and work group activity. Additional information on membership and services of the SNIA is available at www.snia.org.



Appendix E. The SNIA Shared Storage Model

Shared Storage Model

A framework for describing storage architectures

SNIA Technical Council

[email protected]

June 5th, 2001

Current status: draft SNIA TC Proposal document

Revision history

Date | By | Comments
February 23, 2001 | Wayne Rickard, John Wilkes | First Draft.
March 8, 2001 | Wayne Rickard, Garth Gibson, Dave Anderson, David Black | Second Draft, includes comments from Tech Council review.
March 9, 2001 | Wayne Rickard | Second Draft, pdf version, minor edits only.
April 3, 2001 | John Wilkes | Third draft; major editorial changes.
April 28, 2001 | Wayne Rickard | Third draft #2; incorporating review comments from SNIA BoD and first public review. This revision prepared for April SNW distribution.
May 14, 2001 | John Wilkes | Fifth revision. Merged the two divergent branches; replaced the graphics.
June 5, 2001 | John Wilkes | Revised legal language: final draft of SNIA TC Proposal document.


Usage terms

The Storage Networking Industry Association (SNIA) hereby grants permission to use this document solely as set forth below, provided that the user complies with and agrees to the terms and conditions set forth below:


The user may reproduce, distribute, or display publicly this document (in its entirety). In any reproduction, distribution, or display, the user must not remove any copyright or other proprietary notices or these terms and conditions from this document.

The user may also incorporate this document, and/or portions of this document, into other documents or works of authorship, including without limitation articles and books. For any such use, the user must include a reasonably prominent copyright notice acknowledging the SNIA's copyright in any content in this document and a reasonably prominent acknowledgment crediting the SNIA for granting permission for the use of this document. The user must not assert any claims of ownership or proprietary rights, including without limitation seeking copyright registration, of or in any content in this document; and the user must not otherwise seek to limit or restrict others' ability to use this document subject to these terms and conditions.

In any use of this document whatsoever, the user may not alter, change, or otherwise modify any of the content in this document. However, in using a diagram from this document, the user may make certain limited changes to the diagram: (a) solely as reasonably necessary to show the placement or interaction of a product (including a planned or hypothetical product) with or relative to the components shown in the diagram; (b) that are text annotations superimposed upon a diagram, solely as reasonably necessary to comment on or explain the diagram, the Shared Storage Model, or other related proposals, research, or analysis; or (c) that are simple graphical annotations (such as arrows, circles, or other geometric figures) to show interaction, relations, groupings, or subsets, solely as reasonably necessary to comment on or explain the diagram, the Shared Storage Model, or other related proposals, research, or analysis.

When making any such changes to a diagram permitted under Section 3, above, the user must include a reasonably proximate and prominent notice both of the changes and that any such changes are the sole responsibility of the user (in addition to any notices required by Section 2, above).

Notwithstanding anything to the contrary, the user must not make any change to any diagram that: (a) changes the SNIA Shared Storage Model itself (as described in this document); (b) extends the SNIA Shared Storage Model to cover products or applications for which it was not intended (as described in this document); or (c) misrepresents the SNIA Shared Storage Model (as described in this document) in any way, for example, by omitting significant portions of a diagram.

Furthermore, the user may make no use of this document that might in any way suggest the SNIA endorses, sponsors, or is affiliated with the user or any products, services, or content provided by the user. The user may not use the names "Storage Networking Industry Association" or "SNIA" to endorse or promote any product, service, or content provided by the user incorporating, based on, or otherwise derived from this document, without prior, express, written permission. However, nothing in these terms and conditions precludes a member of the SNIA from noting it is a member of the SNIA.

THIS DOCUMENT IS PROVIDED "AS IS," WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND; AND THE SNIA EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, OR NON-INFRINGEMENT, AND/OR ANY IMPLIED WARRANTY ARISING OUT OF A COURSE OF PERFORMANCE, DEALING, OR TRADE USAGE.

UNDER NO CIRCUMSTANCES SHALL THE SNIA, OR ANY OF ITS MEMBERS, OR ANY AUTHOR OF OR CONTRIBUTOR TO THIS DOCUMENT, BE LIABLE TO THE USER FOR DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, SPECIAL, EXEMPLARY, PUNITIVE, OR OTHER FORM OF DAMAGES (EVEN IF SUCH DAMAGES ARE FORESEEABLE OR THERE HAS BEEN NOTICE OR KNOWLEDGE OF THE POSSIBILITY OF SUCH DAMAGES) ARISING FROM THE USE OF THIS DOCUMENT, REGARDLESS OF WHETHER ANY CLAIM TO SUCH DAMAGES IS BASED IN CONTRACT, WARRANTY, TORT (INCLUDING NEGLIGENCE AND STRICT LIABILITY), PRODUCT LIABILITY, OR OTHER THEORY OF LIABILITY.

YOU HEREBY AGREE TO INDEMNIFY, DEFEND, AND HOLD HARMLESS THE SNIA, OR ANY OF ITS MEMBERS, OR ANY AUTHOR OF OR CONTRIBUTOR TO THIS DOCUMENT, FROM ANY AND ALL THIRD-PARTY CLAIMS OR OTHER LIABILITIES, INCLUDING WITHOUT LIMITATION REASONABLE ATTORNEYS' FEES, COSTS, AND EXPENSES, ARISING OUT OF OR IN ANY WAY RELATED TO YOUR USE OF THIS DOCUMENT.

Any person's or company's retention and/or use of this document manifests his, her, or its assent to the above terms and conditions and agreement not to make any use of this document except as expressly provided above. All rights not explicitly granted are expressly reserved to the SNIA.


Permission to use this document in ways or for purposes not allowed above, or clarifications on permissible changes to diagrams, may be requested by e-mailing [email protected]; please include the identity of the requesting individual and/or company and a brief description of the purpose, nature, and scope of the requested use.

Copyright © 2001 Storage Networking Industry Association.


Executive summary

Terms like "SAN" and "NAS" become loaded with artificial and sometimes arbitrary meaning in the absence of a single, simple model for shared storage architectures. The SNIA Technical Council has developed a framework that captures the functional layers and properties of a storage system, regardless of the underlying design, product, or installation.

Much like the OSI 7-layer model in conventional networking, the SNIA Shared Storage Model may be used to describe common storage architectures graphically, while exposing what services are provided, where interoperability is required, and the pros and cons of each potential architecture.

The model describes architectures, but it is not itself an architecture. You cannot buy it, or a system that it describes by specifying it in a bid, or a request for a bid. You cannot "build it." The model does not represent any value judgments between the architectures it describes. Instead, the model makes it possible to compare architectures, and to communicate about them in a common vocabulary.

For example, the term "virtualization" is sometimes used to refer to some or all of the aggregation, mapping, and naming functions described here. Because there is a lack of clarity in the storage industry as to the precise meaning of this term we have deliberately avoided its use in the SNIA Shared Storage Model. Instead, we offer descriptions of what is meant at the various layers of the model, and adopt appropriate terminology for each of those meanings.

By aligning on such a common "architecture vocabulary," vendors can better explain the differentiation offered by their products, customers can better structure their choices and, for the first time, reference comparisons between common solutions will be possible.

This paper provides:

• An explanation of the context and scope of the model.

• A description of the model itself.

• Examples of applying the model to a number of different existing systems, to help explain how the model is used.


Acknowledgments

The SNIA Shared Storage Model is a cooperative effort of the SNIA Technical Council. It brings together the perspectives of system architects from diverse backgrounds and perspectives in the storage industry. An effort of this scope could only be successful with support from each of the SNIA member organizations that sponsored the individuals, contributing their time and knowledge to the creation and review of this document. The SNIA Board of Directors would like to extend its gratitude to this dedicated group of individuals and their sponsoring companies:

Chief Author, Project Lead John Wilkes, Hewlett-Packard

Co-Author David Black, EMC

Co-Author Harald Skardal, Network Appliance


The Technical Council would also like to recognize the following individuals whose special contributions to the Shared Storage Model helped make the project a success: Paul Massiglia (Veritas), Paul Borrill (Veritas), Andrea Westerinen (Cisco), and Brad Stamas (StorageTek).

SNIA TC Chair, Editor & Contributor Wayne Rickard, Gadzoox Networks

Technical Council, Contributor Dave Anderson, Seagate Technology

Technical Council, Contributor Jim Carlson, IBM

Technical Council, Contributor Garth Gibson, CMU

Technical Council, Contributor Kevin Phaup, HighGround Systems

Technical Council, Contributor David Thiel, Compaq


The shared storage vision

A major objective of the storage networking movement within the computer industry is to address the needs of corporate computing environments for storage systems that offer the scale, flexibility, availability, and other attributes that are typically required by enterprise networks.

Storage is increasingly recognized as a distinct resource—one that is best thought of separately from the computer systems (hosts) that are its consumers and beneficiaries. Such storage is increasingly often shared by multiple hosts and acquired and managed independently from them. This is in contrast to the historical view (host-attached storage) that storage is an intrinsic part of a computer system, i.e., a "peripheral."

This trend towards shared storage recognizes the critical value of the information entrusted to the storage system, as well as the fact that storage represents a significant portion of the investment in a typical computing environment. In turn, the trend offers the IT community several benefits, which can be summarized as:

• increased flexibility in deploying and managing storage,

• improved quality of service, and

• increased operational efficiency.

These benefits are each highly valued in today's businesses, which must react quickly to changing opportunities while meeting rapidly rising expectations for service quality—and yet are constrained by the need to accomplish more with limited resources.

The key enabling technology for shared storage is networking technology that can provide high bandwidth, large scale, good connectivity, and long-distance connectivity at a cost that makes shared storage an attractive alternative to the historical host-attached storage model.

We use the term storage network or the acronym "SN" to refer to this concept throughout this document. We have intentionally avoided the more commonly used "SAN" and "NAS" because of their associated connotations, and because we wish to de-emphasize any particular networking technology. A storage network, in our usage, could be Fibre Channel, ESCON, Ethernet, etc., so long as its primary use is for accessing the shared storage.

Although shared storage environments can bring many benefits, they also have a number of challenges. The Storage Networking Industry Association (SNIA) was formed to communicate the benefits of this new paradigm, and to provide a forum for computer vendors, storage vendors, and the IT community to address its challenges together. The primary purpose of the SNIA Shared Storage Model is to provide a language for identifying and articulating those challenges in order to facilitate the creation of effective answers that—in turn—make possible the full benefits of the shared storage vision.

Why shared storage?

Through much of the history of computing, storage has been seen as an intrinsic part of computer systems. While storage once was regarded as a "peripheral," more recently it has come to be thought of as a storage subsystem, but still uniquely associated with a computer. The principal exceptions to this have been mainframe computer complexes and computer clusters where a modest number of cooperating computer systems share a common set of storage devices.

As business has become more dependent upon computing, it has also become more dependent upon data, the life-blood of computing. While a failed processor can usually be readily replaced and operations continued almost immediately after the replacement, a failed storage resource requires replacement followed by typically time-consuming restoration of data, all too often with some loss of recent changes to that data that requires recovery action, before operations can continue. As a result, storage and the disciplines of caring for data and the storage on which it resides have grown in visibility and importance.

In addition, the fraction of the purchase price of a computer system that is represented by the storage component has grown over time to the point that now the storage component of a computer system is often in the vicinity of half of the total price.

Beyond the purchase price of storage, the total cost of owning storage has become a significant part of the cost of maintaining the computing environment. In other words, the acquisition cost is a small portion of the total cost of ownership of storage over its lifetime.

Computing environments have grown as business has become increasingly reliant on computing. As this has happened, computer systems have grown in size and in number. Because the traditional computing model associates storage uniquely with a computer system, a computing environment with many computer systems has many storage and storage management environments to maintain and operate—one per computer system.

In responding to these trends, the IT community has come to view storage as a resource that should be purchased and managed independently of the computer systems that it serves. The IT community has also increasingly come to view storage as a resource that should be shared among computer systems. These changes allow more focused attention on storage, which is expected to lead to reduced costs and higher levels of service, and more flexibility through the sharing of the storage resource. This, in turn, allows IT professionals to provide improved quality and response time as business needs change.

As the storage system for a computing environment becomes a shared, independent resource, additional requirements emerge:

• Reliability— as required of any large, shared, critical resource

• Scalability— to match the size, performance, and physical and geographic placement of computing environments

• Manageability— to provide high levels of service, and achieve the expected reduction in operational expenses

• Standards-based interoperation— to avoid excessive vendor dependence in a large, critical component of data centers

Another key trend is the evolution in communications technology that has made available high performance communication between many nodes over distances ranging from a few inches to fully global systems at reasonable cost. (Despite this, high performance over large distances is still relatively expensive compared to local connectivity, and it is subject to speed-of-light related delays that can be significant in comparison to storage device access times.)

When all of this is considered, a structure emerges that achieves the goals of reliability, scalability, and manageability. That structure is a storage system comprising many computer systems that are the consumers of the storage system, many storage devices, and extensive management capabilities—all interconnected richly and with the necessary high performance. This is the shared storage environment.


The potential

Shared storage is a powerful concept, bringing together into single storage systems essentially unbounded collections of interconnected and geographically unconstrained storage resources and management capabilities.

This redefines storage, where traditional storage systems are very limited (both in scope and extent) and where management capabilities are typically distinct from storage systems.

The power of the shared storage environment lies in a richly interconnected set of resources and in defining the storage environment as a storage system in its own right. The former opens many avenues for exploiting connectivity. The latter enables the storage system to be the focus of invention, deployment, and operation as an entity independent of the hosts that it serves. Many aspects of ongoing activity under the "storage networking" umbrella support these claims; it is likely that more are to come given the high level of energy being applied to this field.

Some examples of potential benefits are:

• Reduced Total Cost of Ownership (TCO) of storage solutions through intelligent Storage Resource Management. For example, if three servers share a pool of storage, they need not each have spare storage to accommodate growth but can pull from a common pool. They also may share a backup resource that each can use. (A rough numerical sketch of this pooling effect follows this list.)

• Better performance or efficiency through use of network enabled storage management approaches, i.e., "server independent" storage management, consolidated storage management, data movers, 3rd party copy, etc.
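As a rough, hypothetical illustration of the pooling effect noted in the first bullet above, the capacity figures below are invented: if each server must carry enough free space to absorb its own worst-case growth, that headroom is provisioned three times over, whereas a shared pool only needs to cover the growth expected across all three servers combined.

    # Hypothetical capacity figures (in TB), invented for illustration only.
    servers = 3
    used_per_server = 1.0       # data currently stored on each server
    headroom_per_server = 0.5   # worst-case growth each server must absorb alone

    # Direct-attached storage: every server carries its own spare capacity.
    das_total = servers * (used_per_server + headroom_per_server)

    # Shared pool: growth rarely peaks on every server at once, so the pool can
    # be sized for expected combined growth rather than the sum of worst cases.
    expected_combined_growth = 0.8
    pool_total = servers * used_per_server + expected_combined_growth

    print(f"Direct-attached provisioning: {das_total:.1f} TB")  # 4.5 TB
    print(f"Shared-pool provisioning:     {pool_total:.1f} TB")  # 3.8 TB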


Why a model for shared storage?

This document presents the SNIA Shared Storage Model, which describes a set of practical, possible storage network architectures.

Such an architecture describes a particular functional partitioning of services across the physical and logical resources in a shared storage environment. In use, an architecture is used to develop a storage system design, which includes things like the quantity and size of the resources, and possibly choices of suppliers, versions, etc. A completed design contains enough information to allow construction of a real system that conforms to it.

The SNIA Shared Storage Model is deliberately simple on the surface: the goal is to make it easy to use, yet rich enough to cover a wide range of storage networks that are being—or could be—deployed. To make this easier, the model is primarily described in a graphical form, supported by a concise terminology that supports the key concepts.

A major intent of the model is to highlight the fundamental structures of a storage system that have the largest effect on the system's value proposition. These include, but are not limited to:

• The functions or services that a storage network architecture can support.

• The division of functional responsibility among the components of the system.

• Relationships between control and data flows.

• Boundary conditions where interoperability is likely to be an issue. This includes both places where interactions take place between architecture modules (and hence interoperability is required) and what kinds of interoperability must occur there.

• The implications of interface abstraction on technology evolution and marketplace competition.

Of course, the model doesn't explicitly cover all possible architectures. Instead, the intent is that the user of the model can use it to develop and describe his or her own particular mix of architectural elements and choices. Nonetheless, our experience with it to date has been that it is capable of covering a wide variety of different architectures and designs. It is likely, therefore, that the model does cover a new situation—although perhaps not in the most immediately obvious way.

Benefits of the model

The benefits from a model of this form are that it can:

• Provide a common vocabulary that people comparing and designing architectures can use to communicate with one another, and make their design efforts and documentation more precise. (The analogy here is with the "definitions of terms" that are an important deliverable from the IETF and standards bodies.) This will also make it easier for vendors to explain their offerings to customers.

• Provide a common way of recording network storage architectures, so that these architectures can be compared and contrasted with one another, making it easier for customers to compare different architectures and proposals.

Overall, the hope is that this common "vocabulary" will help to align the storage industry for the mutual benefit of all of its participants and customers.

Note that although the model describes architectures, it is not itself an architecture. You cannot buy it, or a system that it describes, by specifying it in a bid, or a request for a bid. You cannot "build it." The model does not represent any value judgments between the architectures it describes. Instead, the model makes it possible to compare architectures, and to communicate about them in a common vocabulary.

A note on the graphical conventions used in the model

Throughout the model, we have tried to be consistent about the following graphical conventions:

� 3D objects represent physical entities (hosts, switches, etc.)

� 2D objects with drop shadows represent functional entities

� Orange/ochre represents storage devices

� Dark blue represents host computer systems

� Pale blue represents storage networks

� Green represents file-level entities

� Yellow is used for caches

� Thick lines represent the major data transfer paths; thinner lines with arrowheads represent paths over which metadata flows

� Diagonal stripes indicate metadata managers

� The vertical placement is also important, particularly in the block subsystem layer

The classic storage model

All too often, the picture shown here represents the current state of conversations about storage networking: vendors, system designers, and customers try to describe what they want—or what they have—using a set of inconsistent, ad hoc, languages.


Things are made worse by there being a great many network storage components, with relatively small differences between them. This causes designs that are actually the same to be described in different ways, and different designs to be described sometimes using identical forms of words. This is clearly undesirable, and results in many problems: it's often not obvious what is being proposed or described; tradeoffs between alternatives are harder to identify than they could—or should—be; and it's harder for everybody to make high quality decisions.

These confusions are not accidental: the wide range of system architectures that have been developed exhibits a great deal of complexity, because those architectures must accommodate a great deal of information and cover many different elements and functions. Some of those elements are physical—boxes, wires, computers—and it is often the case that architectures are presented by describing the physical components in some detail, coupled with an explanation of what functions they perform. That is, the traditional approach focuses first on the physical partitioning that a particular vendor has selected, rather than on the range of options that may be possible. And because this is "box-centric" rather than "function-centric," it is all too easy to misunderstand precisely what has been included.

Figure E-1.

The SNIA Shared Storage Model is an approach to removing these difficulties. It does so by taking a slightly different approach: it first identifies the functions that can be provided, and then describes a range of different architectural choices for placing those on physical resources. As a result, the SNIA Shared Storage Model makes it easier to compare alternative architectures and designs, it lets architects think about functions independently of implementations, and it makes it simpler to anticipate new implementations or combinations of architectures, designs, and implementations.


The SNIA Shared Storage Model

This is the highest-level picture of the SNIA Shared Storage Model. It has three main components within its scope:

� The file/record layer, which includes databases and file systems.


� The block layer, which includes both low-level storage devices and block-based aggregation.

� A services subsystem, which provides functions such as the management of the other components.

Note that applications lie outside the scope of the model—they are viewed as "clients" of the storage domain, in the broadest sense.

Storage system components

The SNIA Shared Storage Model supports the following kinds of components:

� Interconnection network— the network infrastructure that connects the elements of the shared storage environment. This network may be a network that is primarily used for storage access, or one that is also shared with other uses. The important requirement is that it provides an appropriately rich, high-performance, scalable connectivity upon which a shared storage environment can be based.

Figure E-2.

� The physical-layer network technologies that are used (or have been used) for this function include Fibre Channel, Fast- and Gigabit-Ethernet, Myrinet, the VAX CI network, and ServerNet. Network protocols that are used at higher layers of the protocol stack also cover a wide range, including SCSI FCP, TCP/IP, VI, CIFS, and NFS.

� Redundancy in the storage network allows communication to continue despite the failure of various components; different forms of redundancy protect against different sorts of failures. Redundant connections within an interconnect may enable it to continue to provide service by directing traffic around a failed component. Redundant connections to hosts and/or storage enable the use of multi-path I/O to tolerate interface and connection failures; multi-path I/O implementations may also provide load balancing among alternate paths to storage. An important topology for multi-path I/O uses two completely separate networks to ensure that any failure in one network cannot directly affect the other. (A minimal sketch of such path selection appears after this component list.)

� Host computer— a computer system that has some or all of its storage needs supplied by the shared storage environment. In the past, such hosts were often viewed as external to the shared storage environment, but we take the opposite view, and will show examples of function mappings that place key components in such hosts.

� A host typically attaches to a storage network with a host bus adapter (HBA) or network interface card (NIC). These are typically supported by associated drivers and related software; both hardware and software may be considered part of the shared storage environment.

� The hosts attached to a shared storage environment may be largely unaware of each other, or they may explicitly cooperate in order to exploit shared storage environment resources. Most commonly this occurs across subsets of the hosts ("clusters"). One of the advantages of separating hosts from their storage devices in a shared storage world is that the hosts may be of arbitrary and differing hardware architecture and run different versions and types of operating system software.

� Physical storage resource— a non-host element that is part of the shared storage environment, and attached to the storage network. Examples include disk drives, disk arrays, storage controllers, array controllers, tape drives and tape libraries, and a wide range of storage appliances. (Hosts are not physical storage resources.) Physical storage resources often have a high degree of redundancy, including multiple network connections, replicated functions, and data redundancy via RAID and other techniques—all to provide a highly available service.

� Storage device— a special kind of physical-storage resource that persistently retains data.

� Logical storage resource— a service or abstraction made available to the shared storage environment by physical storage resources, storage management applications, or a combination thereof. Examples include volumes, files, and data movers.

� Storage management— functions that observe, control, report, or implement logical storage resources. Typically these functions are implemented by software that executes in a physical storage resource or host.
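
To make the multi-path I/O behavior mentioned in the interconnection-network component above more concrete, here is a minimal sketch in Python. It is illustrative only: the Path and MultipathDevice names are invented, and real multi-path drivers add path probing, error sense handling, and configurable policies.

from dataclasses import dataclass

@dataclass
class Path:
    hba: str            # host-side adapter, e.g., "hba0" (example value)
    target_port: str    # storage-side port, e.g., "ctrl-A:p1" (example value)
    healthy: bool = True

class MultipathDevice:
    """One logical device reached over several redundant physical paths."""

    def __init__(self, paths):
        self.paths = paths
        self._next = 0

    def pick_path(self):
        # Round-robin over healthy paths gives simple load balancing;
        # skipping unhealthy paths is the failover behavior described above.
        live = [p for p in self.paths if p.healthy]
        if not live:
            raise IOError("all paths to the device have failed")
        path = live[self._next % len(live)]
        self._next += 1
        return path

dev = MultipathDevice([Path("hba0", "ctrl-A:p1"), Path("hba1", "ctrl-B:p1")])
dev.paths[0].healthy = False          # simulate a failed HBA or link
print(dev.pick_path().target_port)    # I/O continues over the surviving path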

The layering scheme of the SNIA Shared Storage Model

The SNIA Shared Storage Model is a layered one. The figure shows a picture of the stack with a numbering scheme for the layers. Roman numerals are used to avoid confusion with the ISO and IETF networking stack numbers.

The layers are as follows:

� IV. Application

� III. File/record layer

� IIIb. Database

� IIIa. File system

� II. Block aggregation layer, with three function-placements:

� IIc. Host

� IIb. Network

� IIa. Device

� I. Storage devices

Figure E-3.


We will now describe each of these layers, from the topmost layer downwards.

The file/record layer

This layer packs small things such as files (byte vectors) and database tuples (records) into larger entities such as block-level volumes and storage device logical units.

The two commonest implementations seen at this level are database management systems and file systems. Both provide mechanisms for naming or indexing files or records, enforcing access controls, performing space allocation and clustering, and caching data for rapid access.

In both cases, the file/record layer sits on top of one or more volumes: large block vector stores or byte vectors provided by an underlying block store or (sometimes) file system. That is, database management systems typically offer mappings (or packings) of:

tuples or records → tables → volumes

Sometimes additional intermediate mapping layers are introduced, such as a grouping of tables together into a "table space" that sits atop one or more external volumes. This is usually done to make it easier to manipulate such mappings from inside the database management system in an environment-independent fashion.

Figure E-4.

File systems typically do mappings from:

bytes → files → volumes


Because a byte vector can be used to emulate a block vector, the volumes that a database is mapped to can sometimes be files. This is most often done for small database systems where the performance penalties of the two levels of mapping it entails are outweighed by the simpler management that results from exploiting the naming and access control mechanisms offered by the file system.
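
As a rough illustration of the bytes → files → volumes packing just described, the sketch below (Python, with hypothetical names) keeps a per-file list of volume blocks and resolves a byte offset within a file to a block on the underlying volume. Real file systems add allocation policies, persistent metadata, and caching.

BLOCK_SIZE = 4096    # assumed block size for this example

class TinyFileLayer:
    """Toy file/record layer: packs byte vectors (files) onto a block vector (volume)."""

    def __init__(self):
        self.extents = {}     # file name -> ordered list of volume block numbers
        self.next_free = 0    # trivial bump allocator over the volume

    def create(self, name, size_bytes):
        nblocks = -(-size_bytes // BLOCK_SIZE)     # ceiling division
        self.extents[name] = list(range(self.next_free, self.next_free + nblocks))
        self.next_free += nblocks

    def map_offset(self, name, byte_offset):
        # bytes -> files -> volumes: a file byte offset becomes a volume block address
        blocks = self.extents[name]
        return blocks[byte_offset // BLOCK_SIZE], byte_offset % BLOCK_SIZE

fs = TinyFileLayer()
fs.create("a.dat", 10000)              # occupies volume blocks 0-2
fs.create("b.dat", 5000)               # occupies volume blocks 3-4
print(fs.map_offset("b.dat", 4500))    # -> (4, 404)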

Secondary functionality provided by this layer may include content indexing, aggressive prefetching and write-behind techniques to improve performance, hierarchy management, and providing coherency across multiple copies in distributed systems.

In the future, we expect to see new implementations at this layer, such as file systems explicitly designed for replaying isochronous streams against multimedia objects (e.g., videos). Indeed, an http web cache might be considered a new kind of distributed file system.


Where can it be done?

The functions provided by the file/record layer can be implemented in several different places:

� Solely in the host: these are the traditional host-based file systems and databases (the left-hand column in the figure shown here). In such systems, the implementation of the file system or database resides completely in the host, and the interface to the storage device is at the block-vector level.

� In both client and server: these are the standard "network" file systems such as NFS, CIFS, etc. (The right-hand column of the figure.) Such implementations split their functions between the client (host) and the server system. (Note that this split happens inside the file system layer in the SNIA Shared Storage Model.) The client side always resides in a host computer. The server side, however, can reside in a:

� file server (sometimes database server)— typically a host computer with local attached block storage devices that may be dedicated to this function.

Figure E-5.

� NAS head— a dedicated-function computer acting as a file server and relying on external block storage devices connected through a storage network.

� storage device— such as a disk array or "smart disk."


The block layer

The block layer provides low-level storage to higher layers, typically with an access interface that supports one or more linear vectors of fixed-size blocks. In SCSI, these logical address spaces are called logical units (LUs); a single SCSI storage device may support several such logical units.

Ultimately, data is stored on "native" storage devices such as disk drives, solid-state disks, and tape drives. These devices can be used directly, or the storage they provide can be aggregated into one or more block vectors to increase or decrease their size, or provide redundancy. This aggregation can occur in many places—more on this below.

Figure E-6.

Secondary responsibilities of the block layer include a simple form of naming, such as SCSI logical unit numbers (LUNs), caching (and, in particular, non-volatile caches for write-behind data), and (increasingly) simple access control.

Block aggregation

Block aggregation comprises a powerful set of techniques that are used to serve many purposes. These include:

� Space management: constructing a large block vector by assembling several smaller ones, or packing many small block vectors into one large one, or both. (This "slicing and dicing" has historically been one of the most important functions of host-based logical volume managers.)

� Striping: apportioning load across several lower-level block vectors and the systems that provide them. The typical reason for doing this is to increase throughput by increasing the amount of parallelism available; a valuable secondary benefit may be the reduction in average latency that can result as a side effect.

� Providing redundancy for increasing availability in the face of storage device failures. This can be full redundancy (e.g., local and remote mirroring, RAID-1, -10…); or partial redundancy (RAID-3, -4, -5,…). Additional features like point-in-time copy (various versions of this are sometimes called "snapshot") can be provided, which can be used to increase the effective redundancy level of a system, and to help recover from other kinds of failures.

In practice, several of these functions are often combined together. For example, a system that can handle striping is often also capable of performing mirroring (a combination sometimes referred to as RAID-10).

Where can it be done?


The block aggregation functions can be performed at several of the storage components described in the model. Indeed, it is common to find more than one being used.

� Host-side, such as in logical volume managers, device drivers, and host bus adapters (HBAs).

� In components of the storage network itself, such as specialized "SN appliances." In addition, some HBAs are better thought of as part of the storage network.

� And, very commonly, in the storage devices themselves: disk array controllers (e.g., RAID) are classic examples of this. Modern disk drive controllers provide some level of this functionality too, such as a logical-to-physical block mapping for supporting sparing.

How is it done?

The figure shown here offers a simple, visual model of how block aggregation operates. It also illustrates the reason that block-based aggregation functions at different components can be composed together.

Figure E-7.

You can think of each kind of aggregation function as a building block that imports (uses) one or more block vectors from "below," and exports to its clients one or more block vectors at its upper interface that are constructed (i.e., "aggregated" or "virtualized") from those imported ones. The construction can embody any or all of the functions described above.

Because the interfaces to both imported and exported block vectors are the same, these building blocks can often be stacked on top of one another: for example, mirroring across two disk arrays could be performed in a host logical volume manager, and RAID 5 redundancy applied within each of the disk arrays. Each layer could in theory also be internally constructed in this same way.
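
The following Python sketch (illustrative only, not any product's interface) captures this composition property: each aggregator imports block vectors and exports the same block-vector interface, so a striping layer in a host volume manager can stack directly on mirrored exports, giving the RAID-10-style combination mentioned earlier.

class BlockVector:
    """Anything that reads and writes fixed-size blocks by block number."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks

    def read(self, bno):
        return self.blocks[bno]

    def write(self, bno, data):
        self.blocks[bno] = data

class Stripe(BlockVector):
    """Spreads consecutive blocks round-robin across member vectors."""

    def __init__(self, members):
        self.members = members

    def _map(self, bno):
        return self.members[bno % len(self.members)], bno // len(self.members)

    def read(self, bno):
        member, mbno = self._map(bno)
        return member.read(mbno)

    def write(self, bno, data):
        member, mbno = self._map(bno)
        member.write(mbno, data)

class Mirror(BlockVector):
    """Writes to every member; reads from the first (RAID-1 style redundancy)."""

    def __init__(self, members):
        self.members = members

    def read(self, bno):
        return self.members[0].read(bno)

    def write(self, bno, data):
        for m in self.members:
            m.write(bno, data)

# Stacking: host-level striping over two array-level mirrors.
volume = Stripe([Mirror([BlockVector(100), BlockVector(100)]),
                 Mirror([BlockVector(100), BlockVector(100)])])
volume.write(7, b"payload")
print(volume.read(7))    # b'payload'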

Sample architectures

This illustration shows the first application of the model to a number of different block-based storage architectures.

Figure E-8.


� direct-attach: the leftmost column illustrates this. A host is connected to some private, non-shared storage devices (e.g., disk drives attached by a local SCSI cable). In the figure, the host is shown running a logical volume manager (LVM)—perhaps capable of providing a software RAID function to provide protection against disk failure.

� storage-network attach: the second and third hosts, plus the storage network and disk array in the second column, illustrate this. This scheme introduces a storage network (the pale blue cloud) connecting one or more hosts—perhaps still running LVM software—to a disk array that is providing a further set of block aggregation functions. The disk array resources can now be shared between multiple hosts.

� storage-network aggregation: the final example embeds a block-aggregation function into the storage network in an aggregation appliance that might be providing access control and (say) striping aggregation functions.

Putting it all together—combining the block and file/record layers

This picture puts together both block-based and file-based storage architectures.

Figure E-9.


� direct-attach: the leftmost column shows local, private storage directly connected to a single host.

� storage-network attach: the second column shows a representative host connected to a (shared) disk array through a storage network (SN).

� NAS head: the third column illustrates a dedicated-function "NAS head" (file server controller) interposed between the lower-level, block-based storage network and its clients, which are connected to it through a second network (generically an arbitrary second storage network, but shown here as a LAN, as that is the most common form), and operate using client/server file system protocols. Note that block-aggregation functions can be used to support several NAS heads, as well as regular block-level hosts.

� NAS server: this is shown in the right-most (fourth) column and logically consists of a combined NAS head and its own local, private storage.

Access paths

An access path is the list of components (hardware and software) that are traversed by read and write requests to the storage system and their responses. If we restrict ourselves to avoid cycles, there are eight possible paths from the application layer to the lowest storage layer through the elements of the SNIA Shared Storage Model.

Figure E-10.

Five examples of common paths are shown in the figure here, reading from right to left:

� direct to the storage device (e.g., disk drive or disk array)

� via a file system

� via a file system that sits on top of a block aggregation function

� via a database on top of a file system on top of a block aggregation function


� via a database

Caching

Caching is designed to shorten access paths for frequently referenced items, and so improve the performance of the overall storage system. Most elements of a storage system can provide a cache, and caching can be performed at the block or file/record layer—or both. Indeed, it is common to see several caches in operation simultaneously. For example: a read cache in a file system, coupled with a write-back cache in a disk array, and a read-ahead cache in a disk drive.

Figure E-11.

The figure here illustrates this: almost any of the components in the system can be augmented with a cache. The figure also introduces a new component: a dedicated caching appliance, added to the storage network solely to provide caching functions.

Ideally, all that adding caching does is speed things up, but making this true for shared storage networks requires taking account of a multitude of detailed issues that occur in distributed systems, such as maintaining the coherency of multiple copies of data or metadata (data about data), and tolerating partial failures of the system. In particular, cache management (deciding which cache should—or does—hold what) is significantly more complicated when data may be cached in several places.
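
As a simple illustration of a cache wrapped around any block-level component, consider the hedged sketch below (Python, invented names). The same wrapper could represent a cache in a host file system, a disk array, or a dedicated caching appliance, since it depends only on a generic read/write block interface; it deliberately ignores the coherency issues just noted.

from collections import OrderedDict

class ReadCache:
    """LRU read cache layered over any object exposing read(bno)/write(bno, data)."""

    def __init__(self, backing, capacity=1024):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()        # block number -> data, in LRU order

    def _remember(self, bno, data):
        self.cache[bno] = data
        self.cache.move_to_end(bno)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block

    def read(self, bno):
        if bno in self.cache:                 # hit: served without touching the backing store
            self.cache.move_to_end(bno)
            return self.cache[bno]
        data = self.backing.read(bno)         # miss: fetch from the backing store and remember
        self._remember(bno, data)
        return data

    def write(self, bno, data):
        self.backing.write(bno, data)         # write-through keeps the backing store current
        self._remember(bno, data)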

Access control

A shared storage environment bestows the benefit of hosts being able to access their storage devices directly. But the existence of an access path should not be taken as equivalent to permission to exercise it. Access control is the set of techniques that enforce the decisions that encapsulate these permissions.

There are many different kinds of unwanted accesses possible, and the protection used against them has perforce to trade off the degree of protection against efficiency, and the complexity of enforcement. The basic goals are to provide:

� authentication ("proof that I am who I say I am"),

� authorization ("proof that I am allowed to do this"), and

� privacy ("proof that I am allowed to see the contents").


Historically, storage systems have provided little or no support for any of these, other than via simple physical security—locking up the storage devices and the hosts that access them. This is likely to change significantly in the storage network world because the number of different threat types is so much larger.

Ultimately, all approaches to access control rely on some form of secure channel being established between the provider of data (or operations on that data) and its destination. The least secure, but also easiest to implement, solution imposes simple, coarse-grained accessor checks (of the form "is this host permitted to send requests to this storage device?"); at the other extreme lie cryptographic protection mechanisms that are resistant to a wide variety of impersonation, monitoring, and replay attacks, and capable even of securely storing data on storage devices that cannot be trusted not to divulge (or lose) their contents.

Preventing unwanted accesses can be performed in several places:

� At the host: this offers convenience of implementation, easy scalability, low execution cost, and potentially fine-grained control, but it must be pervasively deployed, and the enforcement mechanism should be resistant to tampering. With support from their host operating systems, file systems and databases commonly enforce access controls on data, and similar kinds of solutions are appearing at the block vector layer.

� Networking stacks in the host can also use encryption to provide secure channels across various forms of network (e.g., IPsec). The performance of software versions of these schemes means today that they are best suited to use on relatively low–speed network links (such as a wide area network), but this is likely to change with the deployment of hardware accelerators.

� In the storage network: today, this is largely restricted to relatively low-level approaches that offer the illusion of private, dedicated sub-networks that permit a set of host and storage device ports access only to one another, hiding any other ports. (These are usually called zones in Fibre Channel, or VLANs in the Ethernet world.) Because they operate on whole ports, such solutions are quite coarse-grained. But, by analogy with the migration of functions such as load balancing into traditional IP networking components, it is reasonable to expect finer-grained controls appearing in storage network switches, such as the ability to enforce such virtual networks on per-logical-unit (LU) boundaries.

� At the storage devices: ultimately, storage devices will probably have to accept as much responsibility for enforcing access controls as the hosts do: they are, after all, the primary shared resource that storage networking is trying to make available. This has long been true of servers at the file/record layer such as file servers and NAS heads; and now solutions are appearing that perform a simple form of "LU masking" at the block layer, where only certain hosts (or host ports) are allowed access to a particular LU. (A minimal sketch of such LU masking appears after this list.)
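
A minimal sketch of this kind of block-level LU masking, with invented class names and example WWPN strings, might look like the following: the controller keeps a table of which initiator ports may address which logical units and rejects everything else.

class LuMaskingController:
    """Toy masking table: LU number -> set of permitted initiator ports (WWPNs)."""

    def __init__(self):
        self.masks = {}

    def permit(self, lu, initiator_wwpn):
        self.masks.setdefault(lu, set()).add(initiator_wwpn)

    def handle_io(self, lu, initiator_wwpn, operation):
        if initiator_wwpn not in self.masks.get(lu, set()):
            # Roughly comparable to a device rejecting the command outright.
            raise PermissionError(f"{initiator_wwpn} is not masked to LU {lu}")
        return f"{operation} accepted on LU {lu}"

ctrl = LuMaskingController()
ctrl.permit(lu=0, initiator_wwpn="21:00:00:e0:8b:05:05:04")     # example WWPN
print(ctrl.handle_io(0, "21:00:00:e0:8b:05:05:04", "READ"))     # allowed
# ctrl.handle_io(0, "21:00:00:e0:8b:aa:bb:cc", "READ")          # would be rejected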

As storage networks span ever-greater numbers of devices, and encompass greater heterogeneity of host types, host software, and distances, the importance of this issue will greatly increase.

The services subsystem

A very important part of the model is the set of services that lie "off to one side" of the critical data-access paths. The list provided in the graphic here is not meant to be exhaustive, but to give a flavor of the kinds of things that are handled by this services subsystem. Many of the services are "management" tasks, and need to be tied into the larger system-wide management processes and tools.

Figure E-12.


Although such services are vital for successful implementations, they are not further discussed here: this version of the SNIA Shared Storage Model deliberately focuses on the data-access portion, in order to allow us to communicate the model for discussion and use in a timely manner.

Refining the definitions, terminology, specifications, and interfaces associated with the services subsystem represents a major opportunity for the SNIA community. This issue will be explored in detail in future SNIA Technical Council reports.


Additional topics

This section pulls together some additional areas that the model touches on that don't readily or conveniently fit into the layered structure identified above.

Clustering

In addition to the simple block aggregation that the model has described so far, many components of the model are amenable to a form of "service aggregation"—or clustering—to improve scalability, failure tolerance, and ease of management. It is particularly prevalent in solutions to the distributed cache management problem mentioned above, although database systems that support it tend to emphasize the increased availability that can result.

Figure E-13.


In the context of a file system, this approach is often called a "cluster file system," or "distributed file system." The idea is spreading: logical volume managers are appearing with similar clustering properties.

Data versus storage

One of the issues that often arises in discussions about storage models is of the form "what's the difference between data and storage?" The approach taken in the SNIA Shared Storage Model is that the data (content) is the stuff put into storage entities (the containers). A similar distinction can be applied to "information" (the meaning of data) and "data" (the stored bytes). These definitions are recursive: every time there is a mapping, this rule can be applied.

For example:

User: data ("learning my preferences")

Application: container ("user keystroke history")

Application: data ("user keystroke history file")

File system: container ("byte vector")

File system: data ("a named file")

Volume system: container ("blocks in volume")

Volume system: data ("replicated, striped layout")

Disk array: container ("blocks in LU")

Sharing of resources and data

With host-attached storage, storage resources are accessible, managed, and organized solely by the host to which they are directly connected. No other hosts can access, manage, or organize these storage resources except through the host to which they are directly connected.

This difficulty can be alleviated in a shared storage environment: storage resources can be directly accessible by many hosts, and can be considered a common pool of storage resources. While this capability offers many benefits, e.g., in the area of flexibility, it also creates the requirement that access be controlled.

To further explore this topic, a distinction must be made between resource sharing and data sharing.



Resource sharing

Resource sharing is the shared use of just the storage system resources—those portions of the storage system that provide its containers—not the containers themselves or their contents.

Shared resources commonly include the network and physical storage resources such as storage controllers, disk arrays, tape libraries, disk drives, tape drives, etc., attached to it. A simple example is an array controller that provides a logical unit (LU) to one host and a different LU to a second host. The network, the array controller—and possibly even the array's disk drives—are shared resources, but the LUs are not shared in any way.

Resource sharing is present, at least in principle, in every shared storage environment, since the storage devices and the storage network are examples of shared resources. It is much more common than data sharing because it avoids difficult issues such as multi-copy data coherence and data format translation.

Data sharing

Data sharing is the sharing of contents: two or more client entities (hosts) access and interpret the same data. (It is sometimes called "logical sharing," or "content sharing.") Data sharing requires attention to difficult issues such as coherence and data format translation. As a result, it is relatively uncommon at the block level, but much more prevalent at the file/record level.

In the most primitive form of data sharing, multiple hosts may be given read-only access to the same logical unit (or logical volume). Some logical volume systems can handle a single writer with multiple readers—but the real difficulties occur at the file/record level, where caching is done. Most often, such block-based data sharing is supported only within the context of cluster file systems or distributed databases on a set of clustered host systems.

However, file/record-based data sharing is quite common for the case of a single, shared server providing a client/server file access interface such as NFS or CIFS. Here, the protocols required to allow multiple hosts to access the same target file system are reasonably well embedded in the client side of distributed file system implementations—although there is still room for improvement in cases where there are multiple writers for the same file.

Figure E-14.

Modular systems

The ability to compose larger systems from modules (components) offers many benefits.


� Systems can be built using interchangeable components from different vendors. This makes competitive development by multiple suppliers possible, driving quality up and prices down for customers.

� Modules can be upgraded or enhanced independently (e.g., one module at a time rather than a complete system replacement), as long as the interface rules are adhered to. This can protect investments, reduce obsolescence, and support smooth improvements in functionality over time.

� System scale can be increased by adding or enhancing components, allowing smooth incremental growth. In turn, this minimizes disruptions to the system's users, lowers the initial system cost, and lets customers take advantage of cost reductions in components over time.

Achieving all this requires that modules can be composed into larger systems and replaced individually. In turn, this requires module interfaces that have well defined functions (what they do), interface protocols (data formats), and access protocols (system call, RPC, flow control, etc.).

Figure E-15.

Three examples of physical interfaces that can be used are shown here:

� A binary programmatic interface, or API;

� A bus— e.g., the SCSI parallel bus;

� A network— e.g., Fibre Channel, Ethernet, etc.

The illustration shows how a logically layered system may be designed by stacking functional modules (shown in blue) with different interfaces (shown in orange).

Interfaces may be proprietary or open. Proprietary interfaces deliver some of the same advantages as open interfaces, but the vendor choice is likely to be more limited and new developers may not have ready access to the technical specifications required for interoperability.

Figure E-16.


It is the position of the SNIA that the full benefits of modular systems are only likely to be realized through the development of common, open interfaces that see wide adoption. As a result, SNIA promotes the development of open, industry-standard interfaces that have well-understood, well-documented specifications. It is also important that the specifications be published (which includes open discussion of changes under formal revision control), and are supported by multiple products. Industry standards processes are an important way of achieving these goals, which is why SNIA is active in supporting them.

Storage networks

The SNIA Shared Storage Model does not explicitly include details of the storage network, such as whether or not to use network switches or a particular kind of interconnection technology. This is intentional: the model is network- and transport-neutral. Although it relies only on the existence of storage networks and module interfaces to achieve the functional decompositions that it describes, it is not particular about the precise details of those networks or interfaces.

For example, much has been made of the merits of switched networks, and their importance to the shared storage environment. But in the context of the SNIA Shared Storage Model, it is worth emphasizing that this arises primarily because of their good networking properties, rather than anything specific to storage systems: smooth scalability, fault-tolerance through supporting redundant paths, support for heterogeneity in end-node types, access at a distance, and reasonable cost for high aggregate performance.

To date, most shared storage networks for the block level have been built on Fibre Channel, and most shared file/record-level networks on Ethernet. In the future, the range is likely to become broader (e.g., encompassing SONET and InfiniBand), and it seems likely that IP-based storage transport protocols will come into being for block-level storage traffic.

"SAN" versus "NAS"?

While the SNIA Shared Storage Model uses functional decomposition to show what could be built, many product vendors have taken vertical and horizontal slices of the model, explicitly specified the protocols and interconnect, and positioned the resulting products in broadly generalized categories. SAN (Storage Area Network) and NAS (Network Attached Storage) are two such categories.

We have deliberately avoided using the terms "SAN" and "NAS" until now, because the potential preconceptions carried with these terms detract from the implementation-neutral nature of the Shared Storage Model. This section encourages the user to look beyond the labels attached to implementations, consider the usage models, and apply informed best judgment.

For example, it is frequently the case that network implementation choices are presented as either/or decisions: Fibre Channel or Ethernet or InfiniBand at the hardware level? FCP or TCP/IP transport protocols? Block-level or file/record-level access protocols?

In reality, of course, a combination of the appropriate choices from this space is the best response, rather than slavish adherence to one choice for all purposes. Each has its place; each has its advantages and disadvantages.

For the purposes of the SNIA Shared Storage Model, we define a storage network (SN) as any (mostly) dedicated network that is installed (mostly) for storage traffic, whatever the hardware, API, or protocol. For example, far from being direct competitors, the "SAN" and "NAS" models actually address different functional layers of the Shared Storage Model, and thus may be implemented in a complementary manner.

As technologies change, generalized terms become more difficult to rationalize. Is iSCSI (running the block system over TCP/IP over an Ethernet network) still a SAN? Is running a file system such as DAFS (Direct Access File System) over VI over Fibre Channel a NAS?

These distinctions become less relevant in the presence of the transport neutrality of the SNIA Shared Storage Model. In turn, this allows decisions to focus on the consequences of each design choice, rather than adherence to simplistic "slogans." Thus, we encourage the end users of storage networking technology to return to the Shared Storage Model with a focus on the problem to be solved, rather than the terminology used to describe the solution. In this way, unnecessary semantic battles can be avoided.

A challenge

Many of the standards or solutions needed to make the promise of shared storage a reality do not yet exist, or exist only in primitive or proprietary forms. Simply put, most operating systems and volume managers still expect a wire, not a complex, intelligent interconnect, between them and their storage devices.

Some of the issues that need to be solved by standardization in the storage networking industry include initialization, discovery, end-to-end binding, address mapping, address distribution and management, name services, and event management. Many of these problems will need to be solved multiple times, because although the Shared Storage Model is transport neutral, the simple act of selecting a particular transport means buying into a particular set of support mechanisms (or services) that may be incomplete.

Future work by the SNIA Technical Council and Technical Working groups will focus on the "services subsystem" part of the Shared Storage Model, where many of these issues reside.

Some common storage architectures

This section explores some of the more common (and emerging) storage architectures, and maps them to the SNIA Shared Storage Model. You should be able to recognize many familiar products in these mappings, and for the first time, be able to compare a wide range of architectures against a common model.

Direct-attached block storage

Direct-attached block storage is the most common, most mature, least shared, high-performance storage design. It is characterized by:

� having only one (active) host on each storage interconnect wire,

� the absence of switches or hubs, and

� the use of a block interface protocol for operations over the interconnect.

Figure E-17.


Aggregation may occur at the device (e.g., array controller) or host (e.g., logical volume manager, software or hardware RAID implementation).

Storage network-attached block storage

Storage network-attached block storage (aka "SAN") is characterized by:

� multiple hosts and devices attached to a shared storage interconnect,

� employing a block interface protocol over that interconnect and

� by configuring each host with a consistent allocation of the shared device resources.

Here, the storage network functions primarily as a communications medium and does not usually provide any block aggregation functionality of its own.

In the case of some storage networks (e.g., Fibre Channel and ESCON-based SANs), the physical distances between hosts and storage devices can be large enough to have two or more sites separated by hundreds of meters, for better failure tolerance. Such schemes fall into this "pure SAN" category only if they do not perform protocol conversions between the sites, or require specialized peer-to-peer connectivity: that is, if the storage network is transparently extended across the site boundaries.

Figure E-18.



Block storage aggregation in a storage network ("SAN appliance")

Block storage aggregation in a SAN appliance is characterized by:

� having multiple hosts and devices attached to a shared storage interconnect,

� employing a block interface protocol over that interconnect and by

� providing block-aggregation functions in a dedicated "appliance" that is in the data path for each operation.

This is a convenient option for centralizing the control over data placement in a shared storage environment: only the SAN appliance has to be updated to change where data is placed on the back-end storage devices. It comes at the cost of adding an additional step to the data path, and runs the risk of the SAN appliance being a performance—and availability—bottleneck.

Figure E-19.


The SAN appliance may provide some of the redundancy functions that are normally associated with disk array controllers, or it may simply provide the space management functions of block aggregation.

Storage network-attached block storage with metadata server ("asymmetric block service")

Storage network-attached block storage with metadata server is characterized by:

� having multiple hosts and devices attached to a shared storage interconnect,

� employing a block interface protocol over that interconnect,

� having the hosts communicate directly to the storage devices, while

� employing a metadata server to provide the hosts with layout information ("block metadata") describing the current placement of block data on those storage devices.

By comparison to the "SAN appliance" architecture, this does not impose additional physical resources in the data-access path, but data placement changes require coherent updates of any cached copies of the metadata (layout information) held at the hosts.

Figure E-20.


Multi-site block storage

Multi-site block storage is characterized by the use of peer-to-peer protocols between like components of two or more systems at different sites to maintain data replicas at each site.

This addresses the increasing need for geographic separation and appropriate decoupling between two or more data sites. In turn, this can be used to enhance data availability in the presence of site disasters, while—with careful caching and update protocols—retaining the performance advantages of having access to a local copy of data. (This is particularly important in the presence of the larger propagation delays and lower bandwidths of long-haul networks.)

The peer-to-peer protocols can be implemented at several different levels, such as between pairs of logical volume managers, SAN appliances (e.g., remote mirroring boxes), and between storage devices themselves, such as disk arrays. The type of network used between the sites is frequently different than the network used within each site, so gateways or protocol conversion boxes may need to be employed to achieve the desired connectivity.
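
A hedged sketch of the peer-to-peer replication idea, reduced to its simplest synchronous form (Python, invented names): a write accepted at one site is copied to its peer before being acknowledged, so each site holds a current replica. Real products add asynchronous modes, journaling, and link-failure handling.

class SiteVolume:
    """One site's copy of a replicated volume."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.peer = None                       # set on both sides to pair the sites

    def write(self, bno, data):
        self.blocks[bno] = data                # local copy
        if self.peer is not None:
            self.peer.blocks[bno] = data       # synchronous copy to the remote site
        return "ack"                           # acknowledged only after both copies exist

    def read(self, bno):
        return self.blocks[bno]                # reads are satisfied locally

site_a, site_b = SiteVolume("site-A"), SiteVolume("site-B")
site_a.peer, site_b.peer = site_b, site_a
site_a.write(42, b"order-12345")
print(site_b.read(42))                         # the remote replica is already current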

Figure E-21.


File server

File servers ("NAS" systems) are characterized by:

� bundling storage devices and a file/record subsystem controller into one package,

� employing a client:server file/record protocol to access the data,

� using a network that is typically not specialized for, or dedicated to, storage traffic, such as a LAN.

Of the approaches to shared network storage, this is probably the commonest, most mature, easiest to deploy, and most capable today of supporting heterogeneous hosts. The price is that the file server can sometimes be a performance, capacity or availability bottleneck.

Some database servers exist with a similar architecture.

File server controller ("NAS head")

File server controllers are characterized by:

� decoupling storage devices from the file/record subsystem controller that provides access to them,

Figure E-22.

Figure E-23.


� employing a client:server file/record protocol to access the file/record subsystem from the client hosts, and using a network that is typically not specialized for, or dedicated to, storage traffic, such as a LAN for the host to file/record subsystem traffic,

� having the file/record subsystem controller, as well as multiple hosts and devices, attached to a shared storage interconnect that employs a block interface protocol.

This variation of the classic file server model has several potential benefits:

� the block storage devices can be used directly (as shared block storage resources) by both the file/record service and the hosts;

� the hosts can be bound to both block and file services from common resources at the same time;

� easier independent scalability of file/record subsystem performance and block-storage performance and capacity.

The cost is largely that of increased complexity of managing the larger numbers of components that are exposed compared to the integrated file server approach.

NAS/file server metadata manager ("asymmetric file service")

NAS/file server metadata managers are characterized by the following (shown applying to the left-hand host in the figure here):

� multiple hosts and devices attached to a shared storage interconnect that employs a block interface protocol;

� a separate file system metadata server that maintains layout information for files ("file metadata"), and provides this to the hosts on request;

� hosts that communicate with the file system metadata server (using an extended client:server file/record protocol) to obtain layout information for the files they wish to access, and then access the data directly across the shared storage interconnect using a block interface protocol.

This variation offers the performance advantage of direct access to the block storage devices across the storage network with the data sharing advantages of file servers. The performance advantages are largely those of high-speed data traffic, and apply only as long as the rate of metadata updates and the amount of cache coherency traffic is not "too high."

In addition, the file/record metadata server can also do double-duty and act as a traditional file/record subsystem controller ("NAS head") for hosts that do not support the extended client/server metadata protocols.
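
The two-step access pattern that defines this architecture can be sketched as follows (Python, hypothetical names): the host first asks the metadata server for a file's layout, then reads the listed blocks directly over the block storage network. Real implementations add leases or locks so cached layouts stay coherent when the layout changes.

class FileMetadataServer:
    """Hands out file layouts: file name -> list of (device, block number) extents."""

    def __init__(self):
        self.layouts = {}

    def register(self, name, extents):
        self.layouts[name] = extents

    def get_layout(self, name):
        return list(self.layouts[name])

class BlockDevice:
    def __init__(self, blocks):
        self.blocks = blocks                   # block number -> data

    def read(self, bno):
        return self.blocks[bno]

class Host:
    def __init__(self, mds, devices):
        self.mds = mds
        self.devices = devices                 # device name -> BlockDevice

    def read_file(self, name):
        data = b""
        for device, bno in self.mds.get_layout(name):    # step 1: ask for the layout
            data += self.devices[device].read(bno)       # step 2: direct block I/O
        return data

mds = FileMetadataServer()
mds.register("report.txt", [("lun0", 10), ("lun0", 11)])
host = Host(mds, {"lun0": BlockDevice({10: b"hello ", 11: b"world"})})
print(host.read_file("report.txt"))            # b'hello world'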

Figure E-24.

Object-based Storage Device (OSD), CMU NASD

Object-based Storage Devices (OSD) are characterized by:

� storage devices that take on data-layout responsibilities, exporting a large number of "byte vectors" (objects) rather than a small number of logical units; each such object is typically used to hold the contents of a single file;

� a separate metadata server that provides object access and authentication information to hosts and (optionally) the storage devices, using an extended client:server file/object interface;

� direct access from hosts to the storage devices across a shared storage network (usually of a kind that is not specialized for storage traffic, such as a LAN);

� using a file/object protocol for client operations on the object-storage devices.

The idea here is to offload data layout and security (access) enforcement responsibilities to the storage devices (whose number can easily be scaled), while retaining the semantic advantages of a shared file system, and the performance advantages of direct access from the hosts to the storage devices. As with the NAS/file server metadata managers, the OSD metadata manager can do double-duty as a file server controller ("NAS head") if necessary or desired (e.g., for backwards compatibility).

Figure E-25.



Summary and conclusions

The SNIA Shared Storage Model is a simple and powerful model for describing shared storage architectures. We believe this model will be valuable to several groups:

� Vendors— to place products in the space of architectures, and clarify product differences.

� Customers— to understand vendor offerings better.

� The storage industry as a whole— as a basis for common definitions, communication, understanding, and increased interoperability.

We hope you, too, find it useful.

Status

The SNIA Shared Storage Model is still a work in progress. The SNIA Technical Council would appreciate feedback on it, and particularly on those occasions where the model fails to cover an architecture or design, or when it could do so more elegantly. You can send us feedback via email to [email protected], or through the SNIA technical director at [email protected].

Figure E-26.


Appendix F. The SNIA Dictionary of Storage Networking Terminology

The "SNIA Dictionary of Storage Networking Terminology" is administered by the SNIA Education Committee to establish uniform definitions of storage networking terms and concepts. The online version of this dictionary is readily available at www.snia.org/education/dictionary. It is reprinted here with the permission of the SNIA. Comments and suggested additions can be forwarded to the SNIA at [email protected].

8B/10B encoding CONTEXT [Fibre Channel]

An algorithm for encoding data for transmission in which each 8-bit data byte is converted to a 10-bit transmission character. Invented and patented by IBM Corporation, 8B/10B encoding is used in transmitting data on Fibre Channel, ESCON, and Gigabit Ethernet. 8B/10B encoding supports continuous transmission with a balanced number of ones and zeros in the code stream and detects single-bit transmission errors.

access CONTEXT [Security]

The opportunity to make use of an information system resource.

access control CONTEXT [File System] [Security]

The granting or withholding of a service or access to a resource to a requestor based on the identity of the principal for which the requestor is acting.

access control list CONTEXT [File System] [Security]

A persistent list that enumerates the rights of principals (e.g., users and groups of users) to access resources. Often used in a file system context to denote a persistent list maintained by a file system and defining user and group permissions for file and directory access.

access control mechanism CONTEXT [Security]

A security safeguard designed to detect and deny unauthorized access and permit authorized access in an information system.

access fairness CONTEXT [Fibre Channel]

A process by which nodes are guaranteed access to a Fibre Channel arbitrated loop independent of other nodes' activity.

access method CONTEXT [Fibre Channel][Operating System]


The means used to access a physical transmission medium in order to transmit data.

(In IBM Corporation's OS/390 operating system and its precursors) A file organization method, such as sequential, random, indexed, etc., and the operating system software used to implement it.

access path CONTEXT [Storage System]

The combination of host bus adapter, logical unit number, route through the host-storage interconnect, controller, and logical unit used by a computer to communicate with a storage device. Some configurations support multiple access paths to a single device. cf. multi-path I/O

ACL CONTEXT [File System] [Security]

Acronym for access control list.

ACS CONTEXT [Backup]

Acronym for automated cartridge system.

active CONTEXT [Fibre Channel]

The state of a Fibre Channel Sequence Initiator between the start of transmission of the first data frame of a sequence and the completion of transmission of the last data frame in the sequence. Also, the state of a Fibre Channel Sequence Recipient between the start of reception of the first data frame of a sequence and the completion of reception of the last data frame in the sequence.

active-active (components, controllers) CONTEXT [Storage System]

Synonym for dual active components or controllers.

active copper CONTEXT [Fibre Channel]

A type of Fibre Channel physical connection that allows up to 30 meters of copper cable between adjacent devices.

active-passive (components, controllers) CONTEXT [Storage System]

Synonym for hot standby components or controllers.

active component

A system component that requires electrical power to operate. In a storage subsystem, for example, active


components might include power supplies, storage devices, fans, and controllers. By contrast, enclosures and canisters are not normally active components.

adapter

A hardware device that converts the timing and protocol of one bus or interface to another. Adapters are typically implemented as specialized hardware on system boards, or as add-in cards, to enable a computer system's processing hardware to access peripheral devices. An example adapter is a host bus adapter. Adapter is the preferred term in Fibre Channel contexts. cf. adapter card, host bus adapter, network interface card, NIC

adapter card

An adapter implemented as a printed circuit module.

adaptive array CONTEXT [Storage System]

A disk array that is capable of changing its algorithm for virtual data address to physical data location mapping dynamically (i.e., while the array is operating). AutoRAID from Hewlett-Packard, which can change a given virtual disk representation from mirrored to parity RAID, is an adaptive array.

address CONTEXT [Storage Device] [Storage System][SCSI]

A fixed length bit pattern that uniquely identifies a block of data stored on a disk or tape. Typically treated as a number to which a mapping algorithm can be applied to determine the physical location of a block of data. Applying this mapping algorithm to disk block addresses yields the cylinder, head, and relative sector number at which data may be found. Tape block addresses typically refer to the relative position of a block of data in a single linear stream of blocks.

A fixed-length bit pattern that uniquely identifies a location (bit, byte, word, etc.) in a computer memory. [SCSI] A byte whose value uniquely identifies a device connected to a SCSI bus for purposes of communication.

address identifier CONTEXT [Fibre Channel]

An address value used to identify the source (S_ID) or destination (D_ID) of a frame. The FC-SW standard includes a table of special address identifier values and their meanings.

address resolution CONTEXT [Network]

The process of determining a MAC address, given a more abstract LAN or WAN address.

Address Resolution Protocol CONTEXT [Network]

Any protocol used to obtain a mapping from a higher layer address to a lower layer address. Abbreviated ARP. The acronym ARP is most often used to refer to the Ethernet Address Resolution Protocol (below).


The protocol used by an IP networking layer to map IP addresses to lower level hardware (i.e., MAC) addresses. There are four ARP messages for IP running over Ethernet: ARP requests and replies, and reverse ARP requests and replies.

addressing CONTEXT [Storage System]

An algorithm by which areas of fixed disk, removable cartridge media, or computer system main memory are uniquely identified. cf. block addressing, C-H-S addressing, explicit addressing, implicit addressing

administration host CONTEXT [Network]

A computer that manages one or more storage subsystems (e.g., filers, disk array subsystems, tape subsystems, etc.)

administrator

A person charged with the installation, configuration, and management of a computer system, network, storage subsystem, database, or application.

advanced encryption standard CONTEXT [Security]

A cryptographic algorithm designated by NIST as a replacement for DES. Abbreviated AES. The actual algorithm selected is also known as Rijndael.

Advanced Intelligent Tape CONTEXT [Storage Device]

A tape device and media technology introduced by Sony Corporation.

AES CONTEXT [Security]

Acronym for advanced encryption standard.

agent

A program that performs one or more services (such as gathering information from the Internet), acting for or as a principal.

aggregation CONTEXT [Network] [Storage System]

The combining of multiple similar and related objects or operations into a single one. Several disk or tape data streams are sometimes aggregated into a single stream for higher performance. Two or more disks can be aggregated into a single virtual disk for increased capacity. Two or more disks can be aggregated into a RAID array for high availability. Two or more I/O requests for adjacently located data can be aggregated into a single request to minimize request processing overhead and rotational latency. cf. consolidation

AH CONTEXT [Security]

Acronym for authentication header.

AIT CONTEXT [Storage Device]

Acronym for Advanced Intelligent Tape.

algorithmic mapping CONTEXT [Storage System]

Use of an algorithm to translate from one data addressing domain to another. If a volume is algorithmically mapped, the physical location of a block of data may be calculated from its virtual volume address using known characteristics of the volume (e.g., stripe depth and number of member disks). cf. dynamic mapping, tabular mapping

alias CONTEXT [Fibre Channel]

An alternate name for an entity that is more easily human-readable. Aliases are sometimes used for grouping purposes. cf. alias address identifier

alias address identifier CONTEXT [Fibre Channel]

One or more address identifiers which may be recognized by an N_Port in addition to its N_Port identifier. Alias address identifiers are used to form groups of N_Ports so that frames may be addressed to a group rather than to individual N_Ports. cf. hunt group, multicast group

AL_PA CONTEXT [Fibre Channel]

Acronym for Arbitrated Loop Physical Address.

alternate client restore CONTEXT [Backup]

The process of restoring files to a different client than the one from which they were backed up.

alternate path restore CONTEXT [Backup]

The process of restoring files to a different directory than the one from which they were backed up.

always on CONTEXT [General][Fibre Channel]

The state of always having power applied (systems) or of being continually active (communication links).

[Fibre Channel] A state of always being powered on and/or continually active. In a Fibre Channel, ESCON, or Gigabit Ethernet context, "always on" describes the state of an operational link: it is constantly transmitting data frames, idles, or fill words. This can be contrasted with Ethernet's bursty transmissions and listening for a quiet line. For Fibre Channel management purposes, being "always on" allows link level error detection on each transmitted word.

American National Standards Institute

A coordinating organization for voluntary standards in the United States. Often abbreviated ANSI. The ANSI working committees most closely aligned with storage networking interests are called X3T10 (principally responsible for SCSI I/O interface standards), and X3T11 (principally responsible for Fibre Channel interface standards).

ANSI

Acronym for American National Standards Institute.

ANSI T10

The American National Standards Institute T10 technical committee, the standards organization responsible for SCSI standards for communication between computers and storage subsystems and devices.

ANSI T11

The American National Standards Institute T11 technical committee, the standards organization responsible for Fibre Channel and certain other standards for moving electronic data into and out of computers and intelligent storage subsystems and devices.

ANSI X3T10

The American National Standards Institute committee responsible for standards for accessing and controlling I/O devices. ANSI X3T10 is responsible for the SCSI family of standards. Often shortened to T10.

ANSI X3T11

The American National Standards Institute committee responsible for standards for high performance I/O interfaces such as Fibre Channel and HIPPI. Often shortened to T11.

API

Acronym for Application Programming Interface.

appliance

An intelligent device programmed to perform a single well-defined function, such as providing file, web, or print services. Appliances differ from general purpose computers in that their software is normally customized for the function they perform, pre-loaded by the vendor, and not alterable by the user. Appliances are generally considered to be capable of performing their specialized functions at lower cost and with higher reliability than general purpose servers. cf. filer

application I/O request
application read request
application write request CONTEXT [Storage System]

I/O requests made by storage clients, as distinguished from member I/O requests made by a storage subsystem's own control software. SNIA publications do not generally distinguish between I/O requests made by the operating environment (e.g., for paging, swapping, file system directory lookups, etc.) and those made by user applications.

application programming interface

An interface used by an application program to request services. Abbreviated API. The term API is usually used to denote interfaces between applications and the software components that comprise the operating environment (e.g., operating system, file system, volume manager, device drivers, etc.)

application response measurement

An Open Group technical standard defining function calls for transaction monitoring. The ARM standard is being advanced in both The Open Group and the Distributed Management Task Force. The latter organization has defined an object oriented information model for describing units of work (i.e., transactions).

application specific integrated circuit

An integrated circuit designed for a particular application, such as interfacing to a SCSI bus. Abbreviated ASIC.

arbitrated loop CONTEXT [Fibre Channel]

Fibre Channel interconnect topology in which each port is connected to the next, forming a loop. At any instant, only one port in a Fibre Channel Arbitrated Loop can transmit data. Before transmitting data, a port in a Fibre Channel Arbitrated Loop must participate with all other ports in the loop in an arbitration to gain the right to transmit data. The arbitration logic is distributed among all of a loop's ports.

The version of the Fibre Channel protocol used with the arbitrated loop physical topology.

arbitrated loop physical address CONTEXT [Fibre Channel]

An 8-bit value used to identify a participating device in an arbitrated loop.

arbitration

Any process by which a user of a shared resource negotiates with other users for the (usually temporary) right to use the resource. A port connected to a shared bus must win arbitration before it transmits data on the bus.

archive CONTEXT [Backup]

A consistent copy of a collection of data, usually taken for the purpose of maintaining a long-term durable record of a business or application state. Archives are normally used for auditing or analysis rather than for application recovery. After files are archived, online copies of them are typically deleted, and must be restored by explicit action. cf. backup

archiving CONTEXT [Backup]

The act of creating an archive.

ARM

Acronym for application response measurement.

A common microprocessor architecture, as well as the name of the company that created the architecture.

ARP CONTEXT [Network]

Acronym for Address Resolution Protocol.

array CONTEXT [Storage System]

A storage array, i.e., a disk array or tape array.

array configuration CONTEXT [Storage System]

Assignment of the disks and operating parameters for a disk array. Disk array configuration includes designating the array's member disks or extents and the order in which they are to be used, as well as setting parameters such as stripe depth, RAID model, cache allowance, spare disk assignments, etc. cf. configuration, physical configuration

The arrangement of disks and operating parameters that results from such an assignment.

Association_Header CONTEXT [Fibre Channel]

An optional header used to associate a Fibre Channel exchange with a process, system image, or multi-exchange I/O operation on an end system. May also be used as part of Exchange Identifier management.

ASIC

Acronym for application specific integrated circuit.

asymmetric cryptosystem CONTEXT [Security]

A cryptographic algorithm in which different keys are used to encrypt and decrypt a single message or block of stored information. One of the keys is kept secret and referred to as a private key; the other key can be freely disclosed and is called a public key.
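
As an illustration only, the key asymmetry can be sketched with textbook RSA and deliberately tiny numbers (the values below are the standard toy example; real systems use keys of thousands of bits plus padding):

    # Textbook RSA sketch: public key (e, n) encrypts, private key (d, n) decrypts.
    p, q = 61, 53
    n = p * q                          # 3233
    e = 17                             # public exponent (freely disclosed)
    d = 2753                           # private exponent: (e * d) mod ((p-1)*(q-1)) == 1

    message = 65                       # a small integer standing in for a message block
    ciphertext = pow(message, e, n)    # encrypt with the public key
    recovered = pow(ciphertext, d, n)  # decrypt with the different, secret key
    print(ciphertext, recovered == message)   # 2790 True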

asymmetric virtualization

Synonym for out-of-band virtualization. Out-of-band virtualization is the preferred term.

asynchronous I/O request CONTEXT [Storage Device] [Storage System]

A request to perform an asynchronous I/O operation.

asynchronous I/O operation CONTEXT [Storage Device] [Storage System]

An I/O operation whose initiator does not await its completion before proceeding with other work. Asynchronous I/O operations enable an initiator to have multiple concurrent I/O operations in progress.

Asynchronous Transfer Mode CONTEXT [Network]

A connection-oriented data communications technology based on switching 53-byte fixed-length units of data called cells. Abbreviated ATM. Each cell is dynamically routed. ATM transmission rates are multiples of 51.840 Mbits per second. In the United States, a public communications service called SONET uses ATM at transmission rates of approximately 155, 622, 2488, and 9953 Mbits per second, called OC-3, OC-12, OC-48, and OC-192 respectively. A similar service called SDH is offered in Europe. ATM is also used as a LAN infrastructure, sometimes with different transmission rates and coding methods than are offered with SONET and SDH. More information is available from the ATM Forum.

ATM CONTEXT [Network]

Acronym for Asynchronous Transfer Mode.

atomic operation

An indivisible operation that, from an external perspective, occurs either in its entirety or not at all. For example, database management systems that implement the concept of business transactions treat each business transaction as an atomic operation on the database. This means that either all of the database updates that comprise a transaction are performed or none of them are performed; it is never the case that some of them are performed and others not. RAID arrays must implement atomic write operations to properly reproduce single-disk semantics from the perspective of their clients. Atomic operations are required to ensure that component failures do not corrupt stored data.

attenuation

The power dissipation between an optical or electrical transmitter and a receiver. Expressed in units of decibels (dB).

audit trail CONTEXT [Network][Security]

A chronological record of system activities that enables the reconstruction and examination of a sequence of events and/or changes in an object. The term audit trail may apply to information in an information system, to message routing in a communications system, or to any transfer of sensitive material and/or information.

authentication CONTEXT [Network]

The process of determining what principal a requestor or provider of services is or represents.

CONTEXT [Security]

A security measure designed to establish the validity of a transmission, message, or originator, or a means of verifying an individual's authorization to receive information.

authentication header CONTEXT[Security]

A component of IPsec that permits the specification of various authentication mechanisms designed to provide connectionless integrity, data origin authentication, and an optional anti-replay service. Standardized by the Internet Engineering Task Force.

authorization CONTEXT [Network]

The process of determining that a requestor is allowed to receive a service or perform an operation. Access control is an example of authorization.

CONTEXT [Security]

The limiting of usage of information system resources to authorized users, programs, processes or other systems. Access control is a specific type of authorization. Authorization is formally described as controlling usage by subjects of objects.

auto swap

Abbreviation for automatic swap. cf. cold swap, hot swap, warm swap

automated cartridge system CONTEXT [Backup]

Synonym for robot.

automatic backup CONTEXT [Backup]

A backup triggered by an event (e.g., a schedule point, or a threshold reached) rather than by human action.

automatic failover

Failover that occurs without human intervention.

automatic swap

The substitution of a replacement unit (RU) in a system for a defective one, where the substitution is performed by the system itself while it continues to perform its normal function (possibly at a reduced rate of performance). Automatic swaps are functional rather than physical substitutions, and do not require human intervention. Ultimately, however, defective components must be replaced in a physical hot, warm, or cold swap operation. cf. cold swap, hot swap, warm swap, hot spare

automatic switchover

Synonym for automatic failover.

availability

The amount of time that a system is available during those time periods when it is expected to be available. Availability is often measured as a percentage of an elapsed year. For example, 99.95% availability equates to 4.38 hours of downtime in a year (0.0005 * 365 * 24 = 4.38) for a system that is expected to be available all the time. cf. data availability, high availability
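
The arithmetic in the example generalizes directly; a minimal sketch in Python (assuming, as above, a system expected to be available around the clock; the helper name is illustrative):

    # Hours of downtime per year implied by an availability percentage.
    def downtime_hours_per_year(availability_percent):
        unavailable_fraction = 1.0 - availability_percent / 100.0
        return unavailable_fraction * 365 * 24

    print(round(downtime_hours_per_year(99.95), 2))    # 4.38
    print(round(downtime_hours_per_year(99.999), 2))   # 0.09 ("five nines")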

B_Port CONTEXT [Fibre Channel]

A port on a Fibre Channel bridge device. cf. E_Port, F_Port, FL_Port, G_Port, L_Port, N_Port, NL_Port

backing store

Non-volatile memory. The term backing store is often used to contrast with cache, which is a (usually) volatile random access memory used to speed up I/O operations. Data held in a volatile cache must be replicated in or saved to a non-volatile backing store so that it can survive a system crash or power failure.

backup CONTEXT [Backup]

(noun) A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible. Also called backup copy. To be useful for recovery, a backup must be made by copying the source data image when it is in a consistent state.

(noun) A process that creates a backup (definition 1).

(verb) The act of creating a backup. cf. archive

backup client CONTEXT [Backup]

A computer system containing online data to be backed up.

backup image CONTEXT [Backup]

The collection of data that constitutes a backup copy of a given set of online data.

backup manager CONTEXT [Backup]

An application program whose purpose is to schedule and manage backup operations.

backup policy CONTEXT [Backup]

An IT installation's rules for how and when backup should be performed. Backup policies specify information such as which files or directories are to be backed up, the schedule on which backups should occur, which devices and media are eligible to receive the backups, how many copies are to be made, and actions to be performed if a backup does not succeed.

backup window CONTEXT [Backup]

An interval of time during which a set of data can be backed up without seriously affecting applications that use the data. For example, if an application accesses data from 8AM until midnight, then the window between midnight and 8AM is available for making backup copies. Offline backups require that applications not update data during the backup. Online backups typically use point-in-time copy technology to create consistent images of data for backup. If a backup uses different resources (storage devices, I/O paths, processing power) than the application, as is common with split mirror point-in-time copies, then the backup window is the time required to create the image. If the online backup shares resources with the applications using the data, as is common with copy-on-write point-in-time copies, the backup window may be increased due to resource contention.

bandwidth

The numerical difference between the upper and lower frequencies of a band of electromagnetic radiation.

Synonym for data transfer capacity.

basic input output system

A relatively small program that resides in programmable, non-volatile memory on a personal computer and that is responsible for booting that computer and performing certain operating system independent I/O operations. Abbreviated BIOS. Standard BIOS interrupts are defined to allow access to the computer's disk, video and other hardware components (for example, INT13 for disk access).

baud CONTEXT [Network]

The maximum rate of signal state changes per second on a communications circuit. If each signal state change corresponds to a code bit, then the baud rate and the bit rate are the same. It is also possible for signal state changes to correspond to more than one code bit, so the baud rate may be lower than the code bit rate.
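
A worked sketch of the relationship (the line codings below are hypothetical):

    # Bit rate implied by a baud rate, assuming each signal state change
    # carries `bits_per_change` code bits (hypothetical line codings).
    def bit_rate(baud, bits_per_change):
        return baud * bits_per_change

    print(bit_rate(1200, 1))   # 1200 bit/s: baud rate equals bit rate
    print(bit_rate(1200, 2))   # 2400 bit/s: baud rate is lower than the bit rate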

Bayonet Neil Councilman (connector) CONTEXT [Network]

A type of coaxial cable connector sometimes used in Ethernet applications. Abbreviated BNC. The specification for BNC connectors is contained in EIA/TIA 403-A and MIL-C-39012.

BB_buffer CONTEXT [Fibre Channel]

A buffer associated with buffer to buffer flow control.

BB_credit CONTEXT [Fibre Channel]

Buffer-to-buffer credit; used to determine how many frames can be sent to a recipient when buffer to buffer flow control is in use.

beginning running disparity CONTEXT [Fibre Channel]

The running disparity present at a transmitter or receiver when an ordered set is initiated.

BER CONTEXT [Network] [Storage Device]

Acronym for Bit Error Rate.

Berkeley RAID Levels CONTEXT [Storage System]

A family of disk array data protection and mapping techniques described by Garth Gibson, Randy Katz, and David Patterson in papers written while they were performing research into I/O subsystems at the University of California at Berkeley. There are six Berkeley RAID Levels, usually referred to by the names RAID Level 1, etc., through RAID Level 6. cf. RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6

best effort (class of service) CONTEXT [Fibre Channel] [Network]

A class of service which does not guarantee delivery of packets, frames, or datagrams, but for which the network, fabric, or interconnect makes every reasonable delivery effort.

big endian

A format for the storage and transmission of binary data in which the most significant bits are stored at the numerically lowest addresses, or are transmitted first on a serial link.
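
A minimal illustration using Python's standard struct module (">" selects big-endian byte order, "<" little-endian):

    import struct

    # The 32-bit value 0x0A0B0C0D laid out in each byte order.
    print(struct.pack(">I", 0x0A0B0C0D).hex())   # 0a0b0c0d  (most significant byte first)
    print(struct.pack("<I", 0x0A0B0C0D).hex())   # 0d0c0b0a  (least significant byte first)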

BIOS

Acronym for basic input output system.

bit error rate CONTEXT [Network] [Storage Device]

The probability that a transmitted bit will be erroneously received. Abbreviated BER. The BER is measured by counting the number of bits in error at the output of a receiver and dividing by the total number of bits in the transmission. BER is typically expressed as a negative power of 10.
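
The measurement reduces to a simple ratio; a sketch with hypothetical counts:

    # BER = bits received in error / total bits observed.
    bits_in_error = 3
    total_bits = 10**12
    ber = bits_in_error / total_bits
    print(f"{ber:.0e}")    # 3e-12, i.e., expressed as a negative power of 10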

bit synchronization

The process by which the receiver of a serial communication establishes its clocking used to locate code bits in a received data stream.

black CONTEXT [Security]

A designation applied to information systems in the context of security analysis, and to associated areas, circuits, components, and equipment, in which sensitive information is not processed.

blind mating

The ability of pairs of components to be connected without the electrical or optical connection points being visible. Blind mating is usually accomplished by mechanical guides (e.g., slots and rails) on the components.

block CONTEXT [Fibre Channel] [Storage Device] [Storage System]

The unit in which data is stored and retrieved on disk and tape devices. Blocks are the atomic unit of data recognition (through a preamble and block header) and protection (through a CRC or ECC).

A unit of application data from a single information category that is transferred within a single sequence.

block addressing CONTEXT [Storage Device] [Storage System]

An algorithm for uniquely identifying blocks of data stored on disk or tape media by number, and then translating these numbers into physical locations on the media. cf. C-H-S addressing

block virtualization

The act of applying virtualization (q.v.) to one or more block based (storage) services for the purpose of providing a new aggregated, higher level, richer, simpler, or more secure block service to clients. Block virtualization functions can be nested. A disk drive, RAID system, and volume manager all perform some form of block address to (different) block address mapping or aggregation. cf. file virtualization.

BNC CONTEXT [Network]

Acronym for Bayonet Neil Councilman, a type of Coaxial Cable Connector.

Specifications for BNC style connectors are defined in EIA/TIA 403-A and MIL-C-39012.

boot
booting
bootstrapping

The loading of code from a disk or other storage device into a computer's memory. Bootstrapping is an appropriate term, since a code load typically occurs in steps, starting with a very simple program (BIOS) that initializes the computer's hardware and reads a sequence of data blocks from a fixed location on a pre-determined disk into a fixed memory location. The data thus read is the code for the next stage of bootstrapping, usually an operating system loader. The loader completes the hardware setup and results in an executing operating system in memory.

bridge controller CONTEXT [Storage System]

A storage controller that forms a bridge between two external I/O buses. Bridge controllers are commonly used to connect single-ended SCSI disks to differential SCSI or Fibre Channel host I/O buses.

broadcast CONTEXT [Fibre Channel] [Network]

The simultaneous transmission of a message to all receivers (ports) connected to a communication facility. Broadcast can be contrasted with unicast (sending a message to a specific receiver) and multicast (sending a message to a select subset of receivers). In a Fibre Channel context, broadcast specifically refers to the sending of a message to all N_Ports connected to a fabric. cf. multicast, unicast

buffer

A solid state memory device or programming construct, used to hold data momentarily as it moves along an I/O path or between software components. Buffers allow devices to communicate using links with faster or slower data transfer speeds, allow devices with different native processing speeds to intercommunicate, and allow software components to intercommunicate, share data and coordinate their activities. cf. cache

buffer to buffer flow control CONTEXT [Fibre Channel]

Flow control that occurs between two directly connected Fibre Channel ports, e.g., an N_Port and its associated F_Port. A port indicates the number of frame buffers that can be sent to it (its buffer credit) before the sender is required to stop transmitting and wait for the receipt of a "ready" indication. Buffer to buffer flow control is used only when an NL_Local port is logged into another NL_Local port, or when Nx ports are logged into Fx ports.
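
A highly simplified sketch of the credit mechanism (illustrative names and logic only, not the FC-PH state machine): the sender starts with the credit granted at login, spends one credit per frame, stops at zero, and regains a credit for each "ready" (R_RDY) it receives.

    # Simplified buffer-to-buffer credit accounting (illustrative only).
    class BBCreditSender:
        def __init__(self, bb_credit):
            self.credit = bb_credit        # receive buffers advertised at login

        def send_frame(self):
            if self.credit == 0:
                raise RuntimeError("no credit: must wait for R_RDY")
            self.credit -= 1               # one receive buffer now occupied

        def receive_r_rdy(self):
            self.credit += 1               # receiver has freed a buffer

    sender = BBCreditSender(bb_credit=2)
    sender.send_frame()
    sender.send_frame()                    # credit exhausted; further sends must wait
    sender.receive_r_rdy()                 # a buffer is freed...
    sender.send_frame()                    # ...so transmission can resume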

bypass circuit CONTEXT [Fibre Channel]

A circuit that automatically removes a device from a data path (such as a Fibre Channel arbitrated loop) when valid signaling is lost.

byte CONTEXT [Fibre Channel]

An eight-bit organizational unit for data.

The unit in which data is delivered to and by applications. In Fibre Channel, bytes are organized with the least significant bit denoted as bit 0 and most significant bit as bit 7. The most significant bit is shown on the left side in FC-PH documents, unless specifically indicated otherwise.

CA CONTEXT [Security]

Acronym for certification authority

cable plant

All of an installation's passive communications elements (e.g., optical fibre, twisted pair, or coaxial cable, connectors, splices, etc.) between a transmitter and a receiver.

cache

A high speed memory or storage device used to reduce the effective time required to read data from or write data to a lower speed memory or device. Read cache holds data in anticipation that it will be requested by a client. Write cache holds data written by a client until it can be safely stored on more permanent storage media such as disk or tape. cf. buffer, disk cache, write-back cache, write-through cache

canister CONTEXT [Storage System]

An enclosure for a single disk or tape. Canisters are usually designed to mount in shelves, which supply power, cooling, and I/O bus services to the devices. Canisters are used to minimize RF emissions and to simplify insertion and removal of devices in multi-device storage subsystems. cf. shelf

carousel CONTEXT [Backup]

A media handling robot in which the media are stored in and selected from a rotating wheel.

carrier sense multiple access with collision detection CONTEXT [Network]

A physical layer data transmission protocol used in Ethernet and fast Ethernet networks. Abbreviated CSMA/CD. Carrier sense refers to arbitration for a shared link. Unlike "always on" physical protocols, carrier sense protocols require a node wishing to transmit to wait for the absence of carrier (indicating that another node is transmitting) on the link. Multiple access refers to the party line nature of the link. A large number of nodes (up to 500 in the case of Ethernet) share access to a single link. Collision detection refers to the possibility that two nodes will simultaneously sense absence of carrier and begin to transmit, interfering with each other. Nodes are required to detect this interference, and cease transmitting. In the case of Ethernet, each node detecting a collision is required to wait for a random interval before attempting to transmit again.

cascading CONTEXT [Fibre Channel]

The process of connecting two or more Fibre Channel hubs or switches together to increase the number of ports or extend distances.

catalog CONTEXT [Backup][File System]

[Backup] A stored list of backed up files and directories and the locations (media identifiers) of the backup copies. Backup managers use catalogs to determine what files must be backed up, and to determine which media must be mounted and read to perform a restore.

[File System] A persistent data structure used by some file systems to keep track of the files they manage.

CDR CONTEXT [Fibre Channel]

Acronym for clock and data recovery.

certification authority CONTEXT [Security]

In a public key infrastructure (PKI), the authority and organization responsible for issuing and revoking user certificates, and ensuring compliance with the PKI policies and procedures.

changed block
changed block point-in-time copy CONTEXT [Storage System]

Any of a class of point-in-time copy implementations or the resulting copies in which the copy and its source share storage for portions (usually blocks) of the copy that are not subsequently modified (in the source, or in the copy if the copy is writeable). Storage is physically copied only as a consequence of modifications (to the source, or to the copy if the copy is writeable). A changed block copy occupies only the storage necessary to hold the blocks of storage that have been changed since the point in time at which the copy logically occurred.

channel

[storage] The electrical circuits that sense or cause the state changes in recording media and convert between those state changes and electrical signals that can be interpreted as data bits.

[I/O] Synonym for I/O bus. The term channel has other meanings in other branches of computer technology. The definitions given here are commonly used when discussing storage and networking. cf. device channel, device I/O bus, I/O bus, host I/O bus

character CONTEXT [Fibre Channel]

In general computing usage, synonym for byte.

A 10-bit information unit transmitted and received by FC-1. 8B/10B encoding provides the mapping between 8 bits of data and a 10-bit transmission character. Some transmission characters correspond to special codes and not all 10-bit sequences represent valid transmission characters.

character cell interface

Synonym for command line interface.

check data CONTEXT [Storage System]

In a RAID array, data stored on member disks that can be used for regenerating any user data that becomes inaccessible.

checkpoint CONTEXT [Backup] [File System]

The recorded state of an application at an instant of time, including data, in-memory variables, program counter, and all other context that would be required to resume application execution from the recorded state.

An activity of a file system (such as the High Performance File System, HPFS, or the Andrew File System, AFS) in which cached metadata (data about the structures of the file system) is periodically written to the file system's permanent store. This allows the file system to maintain consistency if an unexpected stop occurs.

chunk CONTEXT [Storage System]

Synonym for strip.

chunk size CONTEXT [Storage System]

Synonym for stripe depth and strip size.

C-H-S addressing CONTEXT [Storage System]

Synonym for cylinder-head-sector addressing.

CIFS CONTEXT [File System]

Acronym for Common Internet File System.

CIM CONTEXT [Management] [Network]

Acronym for Common Information Model.

cipher CONTEXT [Security]

Any cryptographic system in which arbitrary symbols, or groups of symbols, represent units of plaintext or in which units of plaintext are rearranged, or both.

ciphertext CONTEXT [Security]

Data that has been encrypted for security reasons. cf. cleartext

circuit CONTEXT [Fibre Channel] [Network]

Synonym for communication circuit.

CKD (architecture) CONTEXT [Storage System]

Synonym for count-key-data disk architecture.

Class 1 CONTEXT [Fibre Channel]

A connection-oriented class of communication service in which the entire bandwidth of the link between two ports is dedicated for communication between the ports and not used for other purposes. Also known as dedicated connection service. Class 1 service is not widely implemented. cf. intermix

Class 2 CONTEXT [Fibre Channel]

A connectionless Fibre Channel communication service which multiplexes frames from one or more N_Ports or NL_Ports. Class 2 frames are explicitly acknowledged by the receiver, and notification of delivery failure is provided. This class of service includes end to end flow control.

Class 3 CONTEXT [Fibre Channel]

A connectionless Fibre Channel communication service which multiplexes frames to or from one or more N_Ports or NL_Ports. Class 3 frames are datagrams, that is, they are not explicitly acknowledged, and delivery is on a "best effort" basis.

Class 4 CONTEXT [Fibre Channel]

A connection-oriented class of communication service in which a fraction of the bandwidth of the link between two ports is dedicated for communication between the ports. The remaining bandwidth may be used for other purposes. Class 4 service supports bounds on the maximum time to deliver a frame from sender to receiver. Also known as fractional service. Class 4 service is not widely implemented.

Class 6 CONTEXT [Fibre Channel]

A connection-oriented class of communication service between two Fibre Channel ports that provides dedicated unidirectional connections for reliable multicast. Also known as uni-directional dedicated connection service. Class 6 service is not widely implemented.

classified information CONTEXT [Security]

Information that an appropriate agency has determined to require protection against unauthorized disclosure and has caused to be marked to indicate its classified status.

class of service CONTEXT [Networking] [Fibre Channel]

A mechanism for managing traffic in a network by specifying message or packet priority. The characteristics and guarantees of the transport layer of a Fibre Channel circuit. Fibre Channel classes of service include: connection services (Class 1), guaranteed frame delivery with end to end flow control (Class 2), packetized frame datagrams (Class 3), and quality of service sub-channels (e.g., constant sub-rate or constant latency) (Class 4). Different classes of service may simultaneously exist in a fabric. The form and reliability of delivery in Class 3 circuits may vary with the topology. The identification and grouping of data packets based on a priority label (in the packet header) or via other mechanisms (such as "per hop behavior," defined by the IETF's Differentiated Services).

cleartext

Data that is not encrypted. cf. ciphertext

CLI

Acronym for command line interface.

client

An intelligent device or system that requests services from other intelligent devices, systems, or appliances. cf. server

An asymmetric relationship with a second party (a server) in which the client initiates requests and the server responds to those requests.

client service request CONTEXT [Fibre Channel]

A request issued by a client application to a well-known service. An example is a name service query.

clock and data recovery CONTEXT [Fibre Channel]

The process (or the circuit that performs it) of extracting a receive clock embedded in an incoming serial bit stream and using that clock to recover the transmitted data bits. Abbreviated CDR.

cluster

A collection of computers that are interconnected (typically at high speeds) for the purpose of improving reliability, availability, serviceability and/or performance (via load balancing). Often, clustered computers have access to a common pool of storage, and run special software to coordinate the component computers' activities.

CMIP CONTEXT [Management] [Network]

Acronym for Common Management Information Protocol.

coaxial cable

An electrical transmission medium consisting of two concentric conductors separated by a dielectric material with the spacings and material arranged to give a specified electrical impedance. cf. triaxial cable

code balance CONTEXT [Fibre Channel]

The number of 1 bits in a 10-bit transmitted data stream divided by 10 (e.g., 1110100011 has a code balance of 6/10 = 60%).
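
The calculation in the example, as a short illustrative helper:

    # Fraction of 1 bits in a transmitted 10-bit character, given as a bit string.
    def code_balance(bits):
        return bits.count("1") / len(bits)

    print(code_balance("1110100011"))   # 0.6, i.e., 60%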

code bit CONTEXT [Fibre Channel]

A bit (binary digit) of an encoded datum. Sequences of code bits make up symbols, each of which corresponds to a data element (word, byte, or other unit). The smallest time period used by FC-0 for transmission on the media.

code byte CONTEXT [Network]

A byte of an encoded datum. Sometimes called a symbol. Code bytes are the output of encoding or encryption processes. In communication theory contexts, a code byte is often referred to as a code word. cf. data byte

code violation CONTEXT [Fibre Channel]

The error condition that occurs when a received transmission character cannot be decoded into a valid data byte or special code using the validity checking rules specified by the transmission code.

cold backup CONTEXT [Backup]

Synonym for offline backup. cf. hot backup, online backup

cold swap

The substitution of a replacement unit (RU) in a system for a defective one, where external power must be removed from the system in order to perform the substitution. A cold swap is a physical substitution as well as a functional one. cf. automatic swap, hot swap, warm swap

comma character CONTEXT [Fibre Channel]

Either of the seven bit sequences 0011111 or 1100000 in an encoded stream. A special character containing a comma.

command line interface

A form of human interface to intelligent devices characterized by non-directive prompting and character string user input. Perceived by many users to be more difficult to comprehend and use than graphical user interfaces (GUI).

Common Information Model CONTEXT [Management] [Network]

An object oriented description of the entities and relationships in a business' management environment maintained by the Distributed Management Task Force. Abbreviated CIM. CIM is divided into a Core Model and Common Models. The Core Model addresses high-level concepts (such as systems and devices), as well as fundamental relationships (such as dependencies). The Common Models describe specific problem domains such as computer system, network, user or device management. The Common Models are subclasses of the Core Model and may also be subclasses of each other.

Common Internet File System CONTEXT [Network]

A network file system access protocol originally designed and implemented by Microsoft Corporation under the name Server Message Block protocol, and primarily used by Windows clients to communicate file access requests to Windows servers. Abbreviated CIFS. Today, other implementations of the CIFS protocol allow other clients and servers to use it for intercommunication and interoperation with Microsoft operating systems.

Common Management Information Protocol CONTEXT [Management] [Network]

A network management protocol built on the Open Systems Interconnection (OSI) communication model. Abbreviated CMIP. CMIP is more complete, and therefore larger, than SNMP.

communication circuit CONTEXT [Fibre Channel] [Network]

A bidirectional path for message exchange within a Fibre Channel fabric.

In networking, a specific logical or physical path between two points over which communications occur.

communications security CONTEXT [Network][Security]

Protection of information while it is being transmitted, particularly via telecommunications. A particular focus of communications security is message authenticity. Communications security may include cryptography, transmission security, emission security, and physical security.

complex array CONTEXT [Storage System]

A disk array whose control software protects and maps data according to more complex algorithms than those of the Berkeley RAID Levels. The most common complex arrays are multi-level disk arrays, which perform more than one level of data address mapping, and adaptive arrays, which are capable of changing data address mapping dynamically.

compression CONTEXT [Backup] [File System] [Network] [Storage Device] [Storage System]

The process of encoding data to reduce its size. Lossy compression (i.e., compression using a technique in which a portion of the original information is lost) is acceptable for some forms of data (e.g., digital images) in some applications, but for most IT applications, lossless compression (i.e., compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly) is required.
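
A minimal lossless round trip using the deflate implementation in Python's standard zlib module; the decompressed output is byte-for-byte identical to the original:

    import zlib

    original = b"storage area network " * 200
    compressed = zlib.compress(original)
    restored = zlib.decompress(compressed)

    print(len(original), len(compressed))   # the compressed form is much smaller
    print(restored == original)             # True: nothing was lost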

computer security CONTEXT [Security]

Measures and controls that ensure confidentiality, integrity, and availability of information system assets, including hardware, software, firmware, and information being processed, stored, and communicated.

concatenation CONTEXT [Network] [Storage System]

A logical joining of two series of data. Usually represented by the symbol "|". In data communications, two or more data items are often concatenated to provide a unique name or reference (e.g., S_ID | X_ID). Volume managers concatenate disk address spaces to present a single larger address space.
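
A sketch of how a volume manager might resolve a block address in a concatenated volume (illustrative only; the helper name and member sizes are hypothetical):

    # Map a virtual block address to (member disk index, block on that member)
    # for a volume formed by concatenating members of the given sizes (in blocks).
    def concat_map(virtual_block, member_sizes):
        for disk_index, size in enumerate(member_sizes):
            if virtual_block < size:
                return disk_index, virtual_block
            virtual_block -= size
        raise ValueError("address beyond the end of the concatenated volume")

    members = [1000, 2000, 500]              # a 3500-block concatenated address space
    print(concat_map(250, members))          # (0, 250)
    print(concat_map(1500, members))         # (1, 500)
    print(concat_map(3400, members))         # (2, 400)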

concurrency

The property of overlapping in time. Usually refers to the execution of I/O operations or I/O requests.

concurrent
concurrent copy CONTEXT [Storage System]

A hybrid point-in-time copy mechanism for which each copy is initially a changed block copy (i.e., shares unmodified storage with its source), but over time becomes a split mirror copy (i.e., does not share any storage with its source) without changing the point in time at which the copy logically occurred, independent of whether and where modifications to the source or the copy subsequently occur. A concurrent copy occupies at least the amount of storage required to hold changed blocks and grows to occupy as much storage as the copy source.

concurrent operations

Operations that overlap in time. The concept of concurrent I/O operations is central to the use of independent access arrays in throughput-intensive applications.

conditioning

The processing of a signal for the purpose of making it conform more closely to an ideal. Power conditioning is used to minimize voltage and frequency variations in an external power source. Signal conditioning is used to reduce noise in logic or data signals.

confidentiality CONTEXT [Security]

Encryption (in a security context).

configuration CONTEXT [Storage System]

The process of installing or removing hardware or software components required for a system or subsystem to function.

Assignment of the operating parameters of a system, subsystem or device. Disk array configuration, for example, includes designating the array's member disks or extents, as well as parameters such as stripe depth, RAID model, cache allowance, etc. The collection of a system's hardware and software components and operating parameters. cf. array configuration, physical configuration

connection CONTEXT [Fibre Channel]

Short form of dedicated connection.

connection initiator CONTEXT [Fibre Channel]

An N_Port which initiates a Class 1 connection with a destination N_Port through a connect-request and receives a valid response from the destination N_Port to establish the connection.

connection recipient CONTEXT [Fibre Channel]

An N_Port which receives a Class 1 connect-request from a connection initiator and accepts the connection request by transmitting a valid response.

connectionless buffer CONTEXT [Fibre Channel]

A receive buffer used in a connectionless service and capable of receiving connectionless frames.

connectionless frame CONTEXT [Fibre Channel]

A frame used in a connectionless service (i.e., Class 1 frames with SOF(C1), Class 2, and Class 3 frames referred to individually or collectively).

connectionless integrity service CONTEXT [Security]

A security service that provides data integrity service for an individual IP datagram by detecting modification of the datagram without regard to the ordering of the datagram in a stream of datagrams.

connectionless service CONTEXT [Fibre Channel]

Communication between two N_Ports or NL_Ports without a dedicated connection.

console

A device for graphical or textual visual output from a computer system. In systems, network and device management, an application that provides graphical and textual feedback regarding operation and status, and that may accept operator commands and input influencing operation and status. Sometimes called an enterprise management console.

consolidation CONTEXT [Storage System]

The process of accumulating the data for a number of sequential write requests in a cache, and performing a smaller number of larger write requests to achieve more efficient device utilization.

continuously increasing relative offset CONTEXT [Fibre Channel]

A transmission control algorithm in which the frames containing the subblocks that comprise a block of information are transmitted strictly in the order of the subblocks. Continuously increasing relative offset simplifies reassembly and detection of lost frames relative to random relative offset.

control software CONTEXT [Storage System]

A body of software that provides common control and management for one or more disk arrays or tape arrays. Control software presents the arrays of disks or tapes it controls to its operating environment as one or more virtual disks or tapes. Control software may execute in a disk controller or intelligent host bus adapter, or in a host computer. When it executes in a disk controller or adapter, control software is often referred to as firmware.

controller CONTEXT [Storage System] [Management]

The control logic in a disk or tape that performs command decoding and execution, host data transfer, serialization and deserialization of data, error detection and correction, and overall management of device operations. The control logic in a storage subsystem that performs command transformation and routing, aggregation (RAID, mirroring, striping, or other), high-level error recovery, and performance optimization for multiple storage devices. A subclass of CIM_LogicalDevice. A CIM_Controller represents a device having a single protocol stack whose primary purpose is to communicate with, control, and reset connected devices. There are many subclasses of CIM_Controller, addressing SCSI, PCI, USB, serial, parallel, and video controllers.

controller based array
controller based disk array CONTEXT [Storage System]

A disk array whose control software executes in a disk subsystem controller. The member disks of a controller-based array are necessarily part of the same disk subsystem that includes the controller. cf. host based array.

controller cache CONTEXT [Storage System]

A cache that resides within a controller and whose primary purpose is to improve disk or array I/O performance. cf. cache, disk cache, host cache

copy on write CONTEXT [Storage System, Backup]

A technique for maintaining a point-in-time copy of a collection of data by copying only data which is modified after the instant of replicate initiation. The original source data is used to satisfy read requests for both the source data itself and for the unmodified portion of the point-in-time copy. cf. pointer remapping.
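
A much-simplified sketch of the technique at block granularity (in-memory dictionaries stand in for the on-disk volume and the snapshot's save area; class and method names are illustrative):

    # Copy-on-write point-in-time copy, sketched at block granularity.
    class CowSnapshot:
        def __init__(self, source):
            self.source = source     # the live volume: block number -> contents
            self.saved = {}          # original contents of blocks modified after the copy

        def write_source(self, block, data):
            if block not in self.saved:                  # first modification after the copy:
                self.saved[block] = self.source[block]   # preserve the original block
            self.source[block] = data

        def read_snapshot(self, block):
            # Unmodified blocks are read from the source; modified ones from the save area.
            return self.saved.get(block, self.source[block])

    volume = {0: "A", 1: "B", 2: "C"}
    snap = CowSnapshot(volume)
    snap.write_source(1, "B2")
    print(volume[1], snap.read_snapshot(1))   # B2 B -- the copy still reflects the point in time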

copyback CONTEXT [Storage System]

The replacement of a properly functioning array member by another disk, including copying of the member's contents to the replacing disk. Copyback, which is most often used to create or restore a particular physical configuration for an array (e.g., a particular arrangement of array members on device I/O buses), is accomplished without reduction of the array.

count-key-data CONTEXT [Storage Device]

A disk data organization model in which the disk is assumed to consist of a fixed number of tracks, each having a maximum data capacity. Multiple records of varying length may be written on each track of a count-key-data disk, and the usable capacity of each track depends on the number of records written to it. Count-key-data (CKD) architecture derives its name from the record format, which consists of a field containing the number of bytes of data and a record address, an optional key field by which particular records can be easily recognized, and the data itself. Count-key-data is the storage architecture used by IBM Corporation's System 390 series of mainframe computer systems. cf. fixed block architecture

covert channel CONTEXT [Security]

An unintended and/or unauthorized communications path that can be used to transfer information in a manner that violates a security policy.

credit CONTEXT [Fibre Channel]

The number of receive buffers allocated to a transmitting N_Port, NL_Port, or F_Port. The credit is the maximum number of outstanding frames that can be transmitted by that N_Port, NL_Port, or F_Port without causing a buffer overrun condition at the receiver.

CRC

Acronym for cyclic redundancy check.

CRU

Acronym for customer replaceable unit.

cryptanalysis CONTEXT [Security]

A set of operations performed in converting encrypted information to plaintext without initial knowledge of the algorithm and/or key employed in the encryption.

cryptosystem CONTEXT [Security]

A single means of encryption or decryption.

CSMA/CD

Acronym for carrier sense multiple access with collision detection.

cumulative incremental backup CONTEXT [Backup]

A backup in which all data objects modified since the last full backup are copied. To restore data when cumulative incremental backups are in use, only the latest full backup and the latest cumulative incremental backup are required. cf. differential incremental backup, full backup

current running disparity CONTEXT [Fibre Channel]

The running disparity present at a transmitter when the encoding of a valid data byte or special code is initiated, or at a receiver when the decoding of a transmission character is initiated.

customer replaceable unit

A unit, or component of a system that is designed to be replaced by "customers"; i.e., individuals who may not be trained as computer system service personnel. cf. field replaceable unit

cut through (switching) CONTEXT [Fibre Channel]

A switching technique that allows a routing decision to be made and acted upon as soon as the destination address of a frame is received.

cyclic redundancy check

A scheme for checking the correctness of data that has been transmitted or stored and retrieved. Abbreviated CRC. A CRC consists of a fixed number of bits computed as a function of the data to be protected, and appended to the data. When the data is read or received, the function is recomputed, and the result is compared to that appended to the data. Cyclic redundancy checks differ from error correcting codes in that they can detect a wide range of errors, but are not capable of correcting them. cf. error correcting code
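
The compute-append-recompute pattern, sketched with the CRC-32 function in Python's standard zlib module:

    import zlib

    data = b"block contents to protect"
    stored_crc = zlib.crc32(data)                 # computed when the data is written or sent

    print(zlib.crc32(data) == stored_crc)         # True: recomputed value matches on read
    corrupted = b"block contents to protect!"
    print(zlib.crc32(corrupted) == stored_crc)    # False: the change is detected (but not corrected)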

cylinder CONTEXT [Storage Device]

The set of tracks at the same radial position on all recording surfaces of a disk, which can therefore be accessed without repositioning the read/write heads. cf. cylinder-head-sector addressing

cylinder-head-sector addressing CONTEXT [Storage Device]

A form of addressing data stored on a disk in which the cylinder, head/platter combination, and relative sector number on a track are specified. Abbreviated C-H-S addressing. cf. block addressing
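
The conventional translation between a block address and a C-H-S triple, assuming an idealized fixed geometry (the head and sector counts below are illustrative; C-H-S sector numbers start at 1):

    HEADS = 16        # heads (recording surfaces) per cylinder -- assumed geometry
    SECTORS = 63      # sectors per track

    def chs_from_lba(lba):
        cylinder = lba // (HEADS * SECTORS)
        head = (lba // SECTORS) % HEADS
        sector = (lba % SECTORS) + 1
        return cylinder, head, sector

    def lba_from_chs(cylinder, head, sector):
        return (cylinder * HEADS + head) * SECTORS + (sector - 1)

    print(chs_from_lba(5000))                          # (4, 15, 24)
    print(lba_from_chs(*chs_from_lba(5000)) == 5000)   # True: the mapping round-trips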

D_ID CONTEXT [Fibre Channel]

Acronym for destination identifier. cf. S_ID

daemon CONTEXT [Operating System]

A process that is always running on a computer system to service a particular set of requests. For example, in UNIX, lpd is a daemon that handles printing requests. Daemons are independent processes, and not part of an application program. Application requests may be serviced by a daemon.

data availability

The amount of time that data is accessible by applications during those time periods when it is expected to be available. Data availability is often measured as a percentage of an elapsed year. For example, 99.95% availability equates to 4.38 hours of unavailability in a year (0.0005 x 365 x 24 = 4.38) for a set of data that is expected to be available all the time. cf. availability, high availability

data byte CONTEXT [Network] [Storage Device] [Storage System]

A byte of user data as presented to a storage or communication facility. Data bytes are input to processes that encode for transmission or encrypt for privacy. cf. code byte, data character

data character CONTEXT [Fibre Channel] [Network]

Any transmission character associated by the transmission code with a valid data byte.

Data Encryption Standard CONTEXT [Security]

A cryptographic data protection algorithm published by the National Institute of Standards and Technology in Federal Information Processing Standard (FIPS) Publication 46.

data frame CONTEXT [Fibre Channel]

A frame containing information meant for FC-4 (ULP) or the link application.

data manager CONTEXT [File System]

A computer program whose primary purpose is to present a convenient view of data to applications, and map that view to an internal representation on a system, subsystem or device. File systems and database management systems are the most common forms of a data manager.

data model

A repository-specific representation of an information model. A database representation of the CIM schemas is an example of a data model.

data reliability

Expressed as Mean Time to Data Loss (MTDL). The length of the statistically expected continuous span of time over which data stored by a population of identical disk subsystems can be correctly retrieved.

data stripe depth CONTEXT [Storage System]

Synonym for user data extent stripe depth.

data striping CONTEXT [Storage System]

A disk array data mapping technique in which fixed-length sequences of virtual disk data addresses are mapped to sequences of member disk addresses in a regular rotating pattern. Disk striping is commonly called RAID Level 0 or RAID 0 because of its similarity to common RAID data mapping techniques. It includes no data protection, however, so strictly speaking, the appellation RAID is a misnomer.
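
A sketch of the rotating mapping (the member count and stripe depth below are illustrative):

    # Map a virtual disk block to (member disk, block on that member) for a
    # striped (RAID 0) array; stripe_depth is expressed in blocks.
    def stripe_map(virtual_block, members, stripe_depth):
        strip = virtual_block // stripe_depth              # which strip holds the block
        offset = virtual_block % stripe_depth
        member = strip % members                           # strips rotate across members
        member_block = (strip // members) * stripe_depth + offset
        return member, member_block

    # 4 members, 8-block stripe depth: blocks 0-7 land on disk 0, 8-15 on disk 1, ...
    for vb in (0, 7, 8, 31, 32):
        print(vb, stripe_map(vb, members=4, stripe_depth=8))
    # 0 (0, 0)   7 (0, 7)   8 (1, 0)   31 (3, 7)   32 (0, 8)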

data transfer capacity

The maximum rate at which data can be transmitted. Bandwidth is sometimes expressed in terms of signaling capacity (e.g., SCSI), and sometimes in terms of data transmission capacity inclusive of protocol overhead (e.g., Fibre Channel). cf. throughput, data transfer rate

data transfer-intensive (application)

A characterization of applications. A data transfer-intensive application is an I/O-intensive application which makes large I/O requests. Data transfer-intensive applications' I/O requests are usually sequential.

data transfer rate

The amount of data per unit time actually moved across an I/O bus in the course of executing an I/O load. The data transfer capacity of an I/O subsystem is an upper bound on its data transfer rate for any I/O load. For disk subsystem I/O, data transfer rate is usually expressed in MBytes/second (millions of bytes per second, where 1 million = 10^6). cf. data transfer capacity

database management system CONTEXT [Database]

A set of computer programs with a user and/or programming interface that supports the definition of the format of a database, and the creation of and access to its data. A database management system removes the need for a user or program to manage low level database storage. It also provides security for and assures the integrity of the data it contains. Types of database management systems are relational (table-oriented) and object oriented. Abbreviated DBMS.

datagram CONTEXT [Fibre Channel] [Network]

A message sent between two communicating entities for which no explicit link level acknowledgement is expected. Datagrams are often said to be sent on a "best efforts" basis.

DBMS CONTEXT [Database]

Acronym for database management system.

decoding CONTEXT [Fibre Channel]

Validity checking of received transmission characters and generation of valid data bytes and special codes from those characters.

decryption CONTEXT [Security]

The operations performed in converting encrypted information to plaintext with full knowledge of the algorithm and key(s) used to encrypt it. Decryption is the intended method for an authorized user to decrypt encrypted information.

dedicated connection CONTEXT [Fibre Channel]

A communication circuit between two N_Ports maintained by a Fibre Channel fabric. The port resources used by a dedicated connection cannot be used for other purposes during the life of the dedicated connection.

dedicated connection service CONTEXT [Fibre Channel]

Synonym for Class 1 service.

degraded mode CONTEXT [Storage System]

Synonym for reduced mode. A mode of RAID array operation in which not all of the array's member disks are functioning, but the array as a whole is able to respond to application read and write requests to its virtual disks.

degaussing CONTEXT [Security]

A procedure that reduces magnetic flux to virtual zero by applying a reverse magnetizing field. Also called demagnetizing. Degaussing is used to ensure that no residual signal remains on magnetic media from which previously stored information could be recovered.

delimiter CONTEXT [Fibre Channel]

An ordered set used to indicate a frame boundary.

DEN CONTEXT [Network]

Acronym for directory enabled network.

denial of service CONTEXT [Security]

Result of any action or series of actions that prevents any part of an information system from functioning.

DES CONTEXT [Security]

Acronym for Data Encryption Standard

Desktop Management Interface CONTEXT [Management] [Network]

A framework defined by the Distributed Management Task Force (DMTF) for managing and tracking hardware and software components in desktop computer systems. Abbreviated DMI.

destination identifier CONTEXT [Fibre Channel]

An address contained in a Fibre Channel frame that identifies the destination of the frame.

destination N_Port CONTEXT [Fibre Channel]

The N_Port to which a frame is addressed.

device CONTEXT [Management] [Storage System]

Synonym for storage device. CIM_LogicalDevice is an object that abstracts the configuration and operational aspects of hardware. Subclasses of CIM_LogicalDevice include low-level sensors, processors, storage devices and printer hardware.

device bus device I/O bus CONTEXT [Storage System]

An I/O bus used to connect storage devices to a host bus adapter or intelligent controller. Device I/O bus is the preferred term.

device channel CONTEXT [Storage System]

A channel used to connect storage devices to a host I/O bus adapter or intelligent controller. The preferred term is device I/O bus.

device fanout CONTEXT [Storage System]

The ability of a storage controller to connect host computers to multiple storage devices using a single host I/O bus address. Device fanout allows computer systems to connect to substantially more storage devices than could be connected directly.

DHCP CONTEXT [Network]

Acronym for Dynamic Host Configuration Protocol.

differential incremental backup CONTEXT [Backup]

A backup in which data objects modified since the last full backup or incremental backup are copied. To restore data when differential incremental backups are in use, the newest full backup and all differential backups newer than the newest full backup are required. cf. cumulative incremental backup, full backup
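
The restore rule above is easy to express in code. A minimal sketch, assuming a hypothetical catalog of (timestamp, kind) records rather than any real backup product's format:

def backups_needed_for_restore(catalog):
    # catalog: list of (timestamp, kind), where kind is 'full' or 'differential'.
    fulls = [b for b in catalog if b[1] == 'full']
    if not fulls:
        raise ValueError("no full backup available")
    newest_full = max(fulls, key=lambda b: b[0])
    # All differential incremental backups newer than the newest full backup.
    diffs = [b for b in catalog
             if b[1] == 'differential' and b[0] > newest_full[0]]
    return [newest_full] + sorted(diffs, key=lambda b: b[0])

catalog = [(1, 'full'), (2, 'differential'), (3, 'differential'),
           (4, 'full'), (5, 'differential'), (6, 'differential')]
print(backups_needed_for_restore(catalog))
# [(4, 'full'), (5, 'differential'), (6, 'differential')]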

differential mirror resynchronization differential resynchronization CONTEXT [Backup]

Synonyms for incremental mirror resynchronization.

differential (signaling) CONTEXT [SCSI]

A SCSI electrical signaling technique in which each control and data signal is represented by a voltage differential between two signal lines. Differential signaling can be used over longer distances than the alternative single ended signaling. cf. single ended (signaling)

differentiated services CONTEXT [Management]

A protocol defined by the IETF for managing network traffic based on the type of packet or message being transmitted. Abbreviated DiffServ. DiffServ rules define how a packet flows through a network based on a 6-bit field (the Differentiated Services Code Point) in the IP header. The Differentiated Services Code Point specifies the "per hop behavior" (bandwidth, queuing and forward/drop status) for the packet or message.
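
As a small illustration of the 6-bit field mentioned above, the Python sketch below extracts the Differentiated Services Code Point from the second byte of an IPv4 header (the former Type of Service byte); the sample byte value is only an example:

def dscp_from_tos(tos_byte):
    # The DSCP occupies the upper six bits of the byte; the lower two bits
    # are used for Explicit Congestion Notification.
    return (tos_byte >> 2) & 0x3F

print(dscp_from_tos(0xB8))   # 46, the DSCP value commonly used for expedited forwarding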

DiffServ CONTEXT [Management]

Abbreviation for differentiated services.

digest CONTEXT [Security]

A computationally efficient function mapping binary strings of arbitrary length to binary strings of some fixed length.
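
For example, the following Python snippet uses the standard library's SHA-256 implementation as one such digest function: inputs of any length map to a fixed 256-bit (32-byte) output.

import hashlib

for message in (b"", b"storage area network", b"x" * 1_000_000):
    digest = hashlib.sha256(message).digest()
    print(len(message), len(digest), digest.hex()[:16] + "...")
# Every digest is exactly 32 bytes long, regardless of input length.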

Digital Linear Tape CONTEXT [Backup]

A family of tape device and media technologies developed by Quantum Corporation.

digital signature CONTEXT [Security]

A cryptographic process used to assure information authenticity, integrity, and nonrepudiation. Generally refers to assurances that can be externally verified by entities not in possession of the key used to sign the information. For example, a secure hash of the information encrypted with the originator's private key when an asymmetric cryptosystem is used. Some algorithms that are used in digital signatures cannot be used to encrypt data (e.g., DSA).

directory CONTEXT [File System] [Management] [Network]

A mechanism for organizing information.

A file or other persistent data structure in a file system that contains information about other files. Directories are usually organized hierarchically (i.e., a directory may contain both information about files and other directories), and are used to organize collections of files for application or human convenience. An LDAP-based repository consisting of class definitions and instances of those classes. An example of an enterprise-wide LDAP directory is Microsoft's Active Directory (AD) or Novell's NetWare Directory Service (NDS).

directory enabled network CONTEXT [Management] [Network]

An industry initiative, now part of the DMTF's mission, to map the CIM schema to an LDAP Directory. Abbreviated DEN. DEN's goals are to provide a consistent and standard data model to describe a network, its elements and its policies/rules. Policies are defined to provide quality of service or to manage to a specified class of service.

directory tree CONTEXT [File System]

A collective term for a directory, all of its files, and the directory trees of each of its subdirectories.

discard policy CONTEXT [Fibre Channel]

An error handling policy that allows an N_Port or NL_Port to discard data frames received following detection of a missing frame in a sequence.

disconnection CONTEXT [Fibre Channel]

The process of removing a dedicated connection between two N_Ports.

disk disk drive CONTEXT [Storage Device]

A non-volatile, randomly addressable, re-writable data storage device. This definition includes both rotating magnetic and optical disks and solid-state disks, or non-volatile electronic storage elements. It does not include specialized devices such as write-once-read-many (WORM) optical disks, nor does it include so-called RAM disks implemented using software to control a dedicated portion of a host computer's volatile random access memory.

disk array CONTEXT [Storage System]

A set of disks from one or more commonly accessible disk subsystems, combined with a body of control software. The control software presents the disks' storage capacity to hosts as one or more virtual disks. Control software is often called firmware or microcode when it runs in a disk controller. Control software that runs in a host computer is usually called a volume manager.

disk array subsystem CONTEXT [Storage System]

A disk subsystem which includes control software with the capability to organize its disks as disk arrays.

disk block CONTEXT [Storage Device] [Storage System]

The unit in which data is stored and retrieved on a fixed block architecture disk. Disk blocks are of fixed usable size (with the most common being 512 bytes), and are usually numbered consecutively. Disk blocks are also the unit of on-disk protection against errors; whatever mechanism a disk employs to protect against data errors (e.g., ECC) protects individual blocks of data. cf. sector

disk cache

A cache that resides within a disk. A cache that resides in a controller or host whose primary purpose is to improve disk or array I/O performance. cf. cache, controller cache, host cache

disk image backup CONTEXT [Backup] [Windows]

A backup consisting of a copy of each of the blocks comprising a disk's usable storage area. A disk image backup incorporates no information about the objects contained on the disk, and hence cannot be used for individual object restoration.

disk shadowing CONTEXT [Storage System]

Synonym for mirroring.

disk striping CONTEXT [Storage System]

Synonym for data striping.

disk subsystem CONTEXT [Storage System]

A storage subsystem that supports only disk devices.

disk scrubbing CONTEXT [Storage System]

A function which reads all of the user data and check data blocks in a RAID array and relocates them if media defects are found. Disk scrubbing can have a noticeable negative effect on application performance.

disparity CONTEXT [Fibre Channel]

The difference between the number of ones and the number of zeros in a transmission character.
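
A minimal sketch of the calculation: given a transmission character expressed as a string of bits, the disparity is the count of ones minus the count of zeros. (The 10-bit values below are arbitrary examples, not actual 8B/10B code words.)

def disparity(character_bits):
    # Disparity = (number of ones) - (number of zeros) in the character.
    return character_bits.count("1") - character_bits.count("0")

print(disparity("1010101010"))   # 0, neutral disparity
print(disparity("1101101001"))   # +2, positive disparity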

Distributed Management Task Force CONTEXT [Management]

An industry organization that develops management standards for computer system and enterprise environments. DMTF standards include WBEM, CIM, DMI, DEN and ARM. Abbreviated DMTF. The DMTF has a web site at www.dmtf.org.

DLT CONTEXT [Storage Device]

Acronym for Digital Linear Tape.

DMI

Acronym for Desktop Management Interface.

DMR

Acronym for differential mirror resynchronization.

DMTF CONTEXT [Management]

Acronym for Distributed Management Task Force.

DNS CONTEXT [Network]

Acronym for Domain Name Service.

document type definition CONTEXT [Network]

In XML, a specification of the permissible tags or "markup codes" in a document, and their meanings. Tags are delimited by the characters "<" and ">". Abbreviated DTD. When a DTD is available for a document, a universal reader (program) can parse the document and display or print it.

domain

A shared user authorization database which contains users, groups, and their security policies. A set of interconnected network elements and addresses that are administered together and that may communicate.

domain controller CONTEXT [Windows]

A Windows NT or Windows 2000 server that contains a copy of a user account database. A Windows domain may contain both a primary and a backup domain controller.

Domain Name Service CONTEXT [Network]

A computer program that converts between IP addresses and symbolic names for nodes on a network in a standard way. Abbreviated DNS. Most operating systems include a version of Domain Name Service.

DoS CONTEXT [Security]

Acronym for denial of service

double buffering

A technique often used to maximize data transfer rate by constantly keeping two I/O requests for consecutively addressed data outstanding. A software component begins a double-buffered I/O stream by making two I/O requests in rapid sequence. Thereafter, each time an I/O request completes, another is immediately made, leaving two outstanding. If a disk subsystem can process requests fast enough, double buffering allows data to be transferred at a disk or disk array's full volume transfer rate.
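
The following Python sketch mimics the technique with a thread pool standing in for the I/O subsystem: two requests for consecutively addressed chunks are kept outstanding, and a new request is issued each time one completes. The file path, chunk size, and consume callback are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024   # illustrative request size

def read_chunk(path, offset):
    # Stand-in for one I/O request for a consecutively addressed chunk.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(CHUNK)

def double_buffered_read(path, total_size, consume):
    # Keep two read requests outstanding at all times.
    with ThreadPoolExecutor(max_workers=2) as pool:
        offsets = iter(range(0, total_size, CHUNK))
        first_two = [off for off in (next(offsets, None), next(offsets, None))
                     if off is not None]
        pending = [pool.submit(read_chunk, path, off) for off in first_two]
        while pending:
            data = pending.pop(0).result()        # wait for the oldest request
            consume(data)                         # process its data...
            nxt = next(offsets, None)
            if nxt is not None:                   # ...and immediately issue another
                pending.append(pool.submit(read_chunk, path, nxt))

# Usage (path and size are hypothetical):
# double_buffered_read("/tmp/bigfile", 10 * 1024 * 1024, consume=lambda d: None)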

drive letter CONTEXT [Windows]

A single letter of the alphabet by which applications and users identify a partition of physical or virtual disk to the Windows operating system. The number of letters in the alphabet limits the number of disks that can be referenced.

driver driver software

Synonyms for I/O driver.

DSA CONTEXT [Security]

Acronym for Digital Signature Algorithm, a specific algorithm proposed by NIST in 1991 for use in digital signatures.

DTD

Acronym for document type definition.

dual active (components)

A pair of components, such as the controllers in a failure tolerant storage subsystem, that share a task or class of tasks when both are functioning normally. When one of the components fails, the other takes on the entire task. Dual active controllers are connected to the same set of storage devices, improving both I/O performance and failure tolerance compared to a single controller. Dual active components are also called active-active components.

duplicate CONTEXT [Backup]

(noun) A general term for a copy of a collection of data, including point-in-time copies. (verb) The action of making a duplicate as defined above. cf. replicate, snapshot. Any redundant component in a system.

Dynamic Host Configuration Protocol CONTEXT [Network]

An Internet protocol that allows nodes to dynamically acquire ("lease") network addresses for periods of time rather than having to pre-configure them. Abbreviated DHCP. DHCP greatly simplifies the administration of large networks and networks in which nodes frequently join and depart.

dynamic mapping CONTEXT [Storage System]

A form of mapping in which the correspondence between addresses in the two address spaces can change over time. cf. algorithmic mapping, tabular mapping

E_Port CONTEXT [Fibre Channel]

An expansion port on a Fibre Channel switch. E_Ports are used to link multiple Fibre Channel switches together into a fabric. cf. F_Port, FL_Port, G_Port, L_Port, N_Port, NL_Port

EBU CONTEXT [Standards]

Acronym for European Broadcast Union.

ECC

Acronym for error correcting code.

EE_buffer CONTEXT [Fibre Channel]

A buffer associated with end-to-end flow control.

EE_credit CONTEXT [Fibre Channel]

A credit scheme used to manage end to end flow control during the exchange of frames between two communicating devices.
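
A toy model of credit-based flow control in the spirit of this definition (the class and its methods are invented for illustration and are not taken from FC-PH): the sender consumes one credit per frame and must wait for acknowledgements to replenish its credit.

class EndToEndCredit:
    # Illustrative end-to-end credit counter for one pair of communicating ports.
    def __init__(self, initial_credit):
        self.available = initial_credit   # credits granted at login time

    def send_frame(self):
        if self.available == 0:
            return False                  # no credit: the sender must hold the frame
        self.available -= 1               # each transmitted frame consumes one credit
        return True

    def acknowledge(self):
        self.available += 1               # an acknowledgement returns one credit

credit = EndToEndCredit(initial_credit=2)
print(credit.send_frame(), credit.send_frame(), credit.send_frame())   # True True False
credit.acknowledge()
print(credit.send_frame())                                             # True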

electronic storage element CONTEXT [Storage Device]

Synonym for solid state disk.

embedded controller embedded storage controller CONTEXT [Storage System]

An intelligent storage controller that mounts in a host computer's housing and attaches directly to a host's internal I/O bus. Embedded controllers obviate the need for host bus adapters and external host I/O buses. Embedded storage controllers differ from host bus adapters in that they provide functions beyond I/O bus protocol conversion (e.g., RAID).

Encapsulating Security Payload CONTEXT [Security]

A component of IPsec that permits the specification of various confidentiality mechanisms.

encoding CONTEXT [Fibre Channel]

Generation of transmission characters from valid data bytes and special codes.

encryption CONTEXT [Security]

The conversion of plaintext to encrypted text with the intent that it only be accessible to authorized users who have the appropriate decryption key.

end of frame

A group of ordered sets that delineates the end of a frame.

end to end encryption CONTEXT [Security]

Encryption of information at its origin and decryption at its intended destination without intermediate decryption.

end to end flow control CONTEXT [Fibre Channel] [Network]

Control of message flow between the two end parties to a communication on a network. Flow control that occurs between two connected Fibre Channel N_Ports.

enterprise resource management CONTEXT [Management] [Network]

Software that manages all aspects of an organization's assets, systems, services and functions.

The management of a set of resources in the wider perspective of an organization's entire business. Managing in an enterprise context requires that entities be named uniquely and locatable within the enterprise, that heterogeneity of platforms and services may be assumed, and that the dynamic nature of the environment is taken into account.

Enterprise Systems Connection CONTEXT [Storage System]

A 200 Mbps serial I/O bus used on IBM Corporation's Enterprise System 9000 data center computers. Abbreviated ESCON. Similar to Fibre Channel in many respects, ESCON is based on redundant switches to which computers and storage subsystems connect using serial optical connections.

entry port exit port CONTEXT [Backup]

A port in a media library through which media can be inserted or removed without exposing internal library components. Also called exit port.

EOF

Acronym for end of frame.

ERM

Acronym for Enterprise Resource Management.

error correcting code

A scheme for checking the correctness of data that has been stored and retrieved, and correcting it if necessary. Abbreviated ECC. An ECC consists of a number of bits computed as a function of the data to be protected, and appended to the data. When the data and ECC are read, the function is recomputed, the result is compared to the ECC appended to the data, and correction is performed if necessary. Error correcting codes differ from cyclic redundancy checks in that the latter can detect errors, but are not generally capable of correcting them. cf. cyclic redundancy check
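
To make the compute/append/recompute/correct cycle concrete, here is a small Hamming(7,4) example in Python that corrects any single-bit error in a 7-bit code word protecting 4 data bits. This is a textbook code chosen for brevity; real disk and memory ECCs use longer codes.

def hamming74_encode(d):
    # Encode 4 data bits [d1, d2, d3, d4]; parity bits occupy positions 1, 2 and 4.
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Recompute the parity, correct a single-bit error if one is found,
    # and return the 4 data bits.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3    # 0 means no error detected
    if error_position:
        c = list(c)
        c[error_position - 1] ^= 1           # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[5] ^= 1                                 # simulate a single-bit error
print(hamming74_decode(code) == data)        # True: the error was corrected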

ESCON CONTEXT [Storage System]

Acronym for Enterprise Systems Connection.

ESP CONTEXT [Security]

Acronym for Encapsulating Security Payload

ESRM

Acronym for enterprise storage resource management.

Ethernet CONTEXT [Network]

The predominant local area networking technology, based on packetized transmissions between physical ports over a variety of electrical and optical media. Ethernet can transport any of several upper layer protocols, the most popular of which is TCP/IP. Ethernet standards are maintained by the IEEE 802.3 committee. The unqualified term Ethernet usually refers to 10 Mbps transmission on multi-point copper. Fast Ethernet is used to denote 100-Mbps transmission, also on multi-point copper facilities. Ethernet and Fast Ethernet both use CSMA/CD physical signaling. Gigabit Ethernet (abbreviated GBE) transmits at 1250 Megabaud (1 Gbit of data per second) using 8b/10b encoding with constant transmission detection.

Ethernet adapter CONTEXT [Network]

An adapter that connects an intelligent device to an Ethernet network. Usually called an Ethernet network interface card, or Ethernet NIC. cf. NIC

European Broadcast Union CONTEXT [Standards]

A European-based television (Video) standardization group coordinated with SMPTE and loosely affiliated with FC-AV. Abbreviated EBU.

EVSN CONTEXT [Backup]

Acronym for External Volume Serial Number.

exchange CONTEXT [Fibre Channel]

A set of one or more non-concurrent related sequences passing between a pair of Fibre Channel ports. An exchange encapsulates a "conversation" such as a SCSI task or an IP exchange. Exchanges may be bidirectional and may be short or long lived. The parties to an exchange are identified by an Originator Exchange_Identifier (OX_ID) and a Responder Exchange_Identifier (RX_ID).

Exchange_Identifier CONTEXT [Fibre Channel]

A generic term denoting either an Originator Exchange Identifier (OX_ID) or a Responder Exchange Identifier (RX_ID).

exchange status block CONTEXT [Fibre Channel]

A data structure which contains the state of an exchange. An originator N_Port or NL_Port has an originator exchange status block and a responder N_Port or NL_Port has a responder exchange status block for each concurrently active exchange.

exclusive connection CONTEXT [Fibre Channel]

A Class 1 dedicated connection without intermix.

exit port CONTEXT [Backup]

A port in a media library through which media can be inserted or removed without exposing internal library components. cf. entry port

expansion card expansion module

A collective term for optional adapters in the form of printed circuit modules that can be added to intelligent devices. Expansion cards include host bus adapters, network interface cards, as well as NVRAM, console, and other special purpose adapters.

expansion slot

A mounting and internal bus attachment device within an intelligent device into which expansion cards are inserted.

explicit addressing CONTEXT [Storage Device] [Storage System]

A form of addressing used with disks in which the data's address is explicitly specified in the access request. cf. implicit addressing

export (verb)

Synonym for present. To cause to appear or make available. Disk array control software exports virtual disks to its host environment. In file systems, a directory may be exported or made available for access by remote clients. To move objects, such as data, from within a system to a location outside the system, usually requiring a transformation during the move.

eXtensible Markup Language

A universal format for structured documents and data on the World Wide Web. Abbreviated XML. The World Wide Web Consortium is responsible for the XML specification. cf. http://www.w3.org/XML/.

extent CONTEXT [Storage Device] [Storage System]

A set of consecutively addressed FBA disk blocks that is allocated to consecutive addresses of a single file. A set of consecutively located tracks on a CKD disk that is allocated to a single file. A set of consecutively addressed disk blocks that is part of a single virtual disk-to-member disk array mapping. A single disk may be organized into multiple extents of different sizes, and may have multiple (possibly non-adjacent) extents that are part of the same virtual disk-to-member disk array mapping. This type of extent is sometimes called a logical disk.

A subclass or instance of the CIM_StorageExtent object. CIM models both removable and nonremovable types of storage media.

external controller external disk controller external storage controller CONTEXT [Storage System]

An intelligent storage controller that mounts outside its host computer's enclosure and attaches to hosts via external I/O buses. External storage controllers usually mount in the enclosure containing the disks they control.

external volume serial number CONTEXT [Backup]

A humanly readable volume serial number on a removable media or cartridge. Abbreviated EVSN. cf. label

eye

The region of an eye diagram that does not occur for correctly formed pulses. This is in the center of the eye diagram and distinguishes presence of signal (region above the eye) from absence of signal (region below the eye).

eye diagram

A diagram used to specify optical or electrical pulse characteristics for transmitters. The horizontal axis represents normalized time from pulse start and the vertical axis represents normalized amplitude. cf. eye opening

eye opening

The time interval across the eye, measured at a 50% normalized eye amplitude, which is error free to the specified BER.

F_Port CONTEXT [Fibre Channel]

A port that is part of a Fibre Channel fabric. An F_Port on a Fibre Channel fabric connects to a node's N_Port. F_Ports are frame routing ports, and are insensitive to higher level protocols. cf. E_Port, FL_Port, G_Port, L_Port, N_Port, NL_Port. The Link_Control_Facility within a fabric which attaches to an N_Port through a link. An N_Port uses a well-known address (hex 'FFFFFE') to address the F_Port attached to it.

F_Port name CONTEXT [Fibre Channel]

A Name_Identifier associated with an F_Port.

fabric CONTEXT [Fibre Channel]

A Fibre Channel switch, or two or more Fibre Channel switches interconnected in such a way that data can be physically transmitted between any two N_Ports on any of the switches. The switches comprising a Fibre Channel fabric are capable of routing frames using only the D_ID in an FC-2 frame header.

fabric login CONTEXT [Fibre Channel]

The process by which a Fibre Channel node establishes a logical connection to a fabric switch.

Fabric Name CONTEXT [Fibre Channel]

A Name_Identifier associated with a fabric.

failback

The restoration of a failed system component's share of a load to a replacement component. For example, when a failed controller in a redundant configuration is replaced, the devices that were originally controlled by the failed controller are usually failed back to the replacement controller to restore the I/O balance, and to restore failure tolerance. Similarly, when a defective fan or power supply is replaced, its load, previously borne by a redundant component, can be failed back to the replacement part.

failed over

A mode of operation for failure tolerant systems in which a component has failed and its function has been assumed by a redundant component. A system that protects against single failures operating in failed over mode is not failure tolerant, since failure of the redundant component may render the system unable to function. Some systems (e.g., clusters) are able to tolerate more than one failure; these remain failure tolerant until no redundant component is available to protect against further failures.

failover

The automatic substitution of a functionally equivalent system component for a failed one. The term failover is most often applied to intelligent controllers connected to the same storage devices and host computers. If one of the controllers fails, failover occurs, and the survivor takes over its I/O load.

failure tolerance

The ability of a system to continue to perform its function (possibly at a reduced performance level) when one or more of its components has failed. Failure tolerance in disk subsystems is often achieved by including redundant instances of components whose failure would make the system inoperable, coupled with facilities that allow the redundant components to assume the function of failed ones.

fanout CONTEXT [Storage System]

Synonym for device fanout.

fast SCSI CONTEXT [SCSI]

A form of SCSI that provides 10 megatransfers per second. Wide fast SCSI has a 16-bit data path, and transfers 20 MBytes per second. Narrow fast SCSI transfers 10 MBytes per second. cf. wide SCSI, Ultra SCSI, Ultra2 SCSI, Ultra3 SCSI

fast mirror resynchronization CONTEXT [Backup]

A technique for reducing the time required to synchronize a split mirror with the set of storage devices from which it was split. Fast mirror resynchronization requires that a list of changes to the original set of data since the moment of splitting be kept. When the split mirror is rejoined to the original set of volumes, only the data items identified in the list are copied from the original to the split mirror rather than the entire contents of the devices.
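
A highly simplified sketch of the bookkeeping involved (the class, its methods, and the block numbers are invented for illustration): changed blocks are recorded while the mirror is split, and only those blocks are copied when the mirror is rejoined.

class SplitMirror:
    # Track blocks written on the original side while a mirror is split.
    def __init__(self):
        self.changed_blocks = set()            # the "list of changes" kept since the split

    def write_original(self, block_number):
        self.changed_blocks.add(block_number)  # note the block for later resynchronization

    def resynchronize(self, copy_block):
        for block in sorted(self.changed_blocks):
            copy_block(block)                  # copy only changed blocks, not the whole device
        self.changed_blocks.clear()

mirror = SplitMirror()
for block in (7, 42, 7, 100):
    mirror.write_original(block)
mirror.resynchronize(lambda b: print("copying block", b))   # copies blocks 7, 42 and 100 only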

fault tolerance

Synonym for failure tolerance.

FBA CONTEXT [Storage Device]

Acronym for fixed block architecture.

FC-PH CONTEXT [Fibre Channel]

The Fibre Channel physical standard, consisting of FC-0, FC-1, and FC-2.

FC-0 CONTEXT [Fibre Channel]

The Fibre Channel protocol level that encompasses the physical characteristics of the interface and data transmission media. Specified in FC-PH.

FC-1 CONTEXT [Fibre Channel]

The Fibre Channel protocol level that encompasses 8B/10B encoding, and transmission protocol. Specified in FC-PH.

FC-2 CONTEXT [Fibre Channel]

The Fibre Channel protocol level that encompasses signaling protocol rules and the organization of data into frames, sequences, and exchanges. Specified in FC-PH.

FC-3 CONTEXT [Fibre Channel]

The Fibre Channel protocol level that encompasses common services between FC-2 and FC-4. FC-3 contains no services in most implementations.

FC-4 CONTEXT [Fibre Channel]

The Fibre Channel protocol level that encompasses the mapping of upper layer protocols (ULP) such as IP and SCSI to lower protocol layers (FC-0 through FC-3). For example, the mapping of SCSI commands is an FC-4 ULP that defines the control interface between computers and storage.

FC-AE CONTEXT [Fibre Channel]

Acronym for Fibre Channel Avionics Environment.

FC-AL CONTEXT [Fibre Channel]

Acronym for Fibre Channel Arbitrated Loop.

FC-AV CONTEXT [Fibre Channel]

Acronym for Fibre Channel Audio Video.

FC-GS2 CONTEXT [Fibre Channel]

Acronym for Fibre Channel Generic Services.

FC-SB FC-SB2 CONTEXT [Fibre Channel]

Acronym for Fibre Channel Single Byte (command set).

FC-SW FC-SW2 CONTEXT [Fibre Channel]

Acronym for Fibre Channel Switched (fabric interconnect).

FC-VI CONTEXT [Fibre Channel]

Acronym for Fibre Channel Virtual Interface.

FCA CONTEXT [Fibre Channel]

Acronym for Fibre Channel Association.

FCP CONTEXT [Fibre Channel]

Acronym for Fibre Channel Protocol.

FCSI CONTEXT [Fibre Channel]

Acronym for Fibre Channel Systems Initiative.

FDDI CONTEXT [Network]

Acronym for Fiber Distributed Data Interface.

FIM CONTEXT [Backup]

Acronym for frozen image method.

FDDI adapter CONTEXT [Network]

An adapter that connects an intelligent device to an FDDI network. Both FDDI-fiber adapters that connect to optical fiber FDDI networks and FDDI-TP adapters that connect to twisted copper pair FDDI networks exist. Although network interface cards are usually referred to as NICs rather than as adapters, the term FDDI adapter is more common than FDDI NIC. cf. adapter, NIC

Federal Information Processing Standard CONTEXT [Security]

Standards (and guidelines) produced by NIST for government-wide use in the specification and procurement of Federal computer systems.

Federated Management Architecture Specification CONTEXT [Management] [Network]

A specification from Sun Microsystems Computer Corporation that defines a set of Java APIs for heterogeneous storage resource and storage network management. This specification is a central technology of Jiro.

Fiber Distributed Data Interface CONTEXT [Network]

An ANSI standard for token ring Metropolitan Area Networks (MANs), based on the use of optical fiber cable to transmit data at a rate of 100 Mbits/second. Both optical fiber and twisted copper pair variations of the FDDI physical standard exist. FDDI is a completely separate set of standards from Fibre Channel. The two are not directly interoperable.

fibre CONTEXT [Fibre Channel]

A general term used to cover all transmission media specified in FC-PH. The X3T11 standardization committee's preferred spelling of the name of Fibre Channel technology.

Fibre Channel CONTEXT [Fibre Channel]

A set of standards for a serial I/O bus capable of transferring data between two ports at up to 100 MBytes/second, with standards proposals to go to higher speeds. Fibre Channel supports point to point, arbitrated loop, and switched topologies. Fibre Channel was completely developed through industry cooperation, unlike SCSI, which was developed by a vendor and submitted for standardization after the fact.

Fibre Channel Association CONTEXT [Fibre Channel]

A former trade association incorporated in 1993 to promote Fibre Channel technology in the market. Abbreviated FCA. Separate FCA Europe and FCA Japan organizations also exist. In 1999, FCA merged with FCLC to form the FCIA.

Fibre Channel Avionics Environment CONTEXT [Fibre Channel]

A technical committee and industry group whose goal is to standardize Fibre Channel for avionics, defense, and other mobile applications.

Fibre Channel Audio Video CONTEXT [Fibre Channel]

A Fibre Channel standards activity defining the transport of digital audio and video data over Fibre Channel networks. Abbreviated FC-AV.

Fibre Channel Arbitrated Loop CONTEXT [Fibre Channel]

A form of Fibre Channel network in which up to 126 nodes are connected in a loop topology, with each node's L_Port transmitter connecting to the L_Port receiver of the node to its logical right. Nodes connected to a Fibre Channel Arbitrated Loop arbitrate for the single transmission that can occur on the loop at any instant using a Fibre Channel Arbitrated Loop protocol that is different from Fibre Channel switched and point to point protocols. An arbitrated loop may be private (no fabric connection) or public (attached to a fabric by an FL_Port).

Fibre Channel Community Fibre Channel Loop Community

A former trade association incorporated in 1995 to promote Fibre Channel Arbitrated Loop technology for storage applications. Abbreviated FCLC. Name changed to Fibre Channel Community in 1997 to reflect changing goals and interests of the organization. In 1999, FCLC merged with FCA to form the FCIA.

Fibre Channel Generic Services CONTEXT [Fibre Channel]

An ANSI standard that specifies several Fibre Channel services such as the Name Server, Management Server, Time Server and others. Abbreviated FC-GS-2.

Fibre Channel Industry Association CONTEXT [Fibre Channel]

The industry association resulting from the 1999 merger of the Fibre Channel Association and the Fibre Channel Community.

Fibre Channel Name CONTEXT [Fibre Channel]

A Name_Identifier that is unique in the context of Fibre Channel. Essentially unused; most Fibre Channel name identifiers are World Wide Names that are unique across heterogeneous networks.

Fibre Channel Protocol CONTEXT [Fibre Channel]

The serial SCSI command protocol used on Fibre Channel networks. Abbreviated FCP. FCP standardization is the responsibility of the X3T10 committee.

Fibre Channel Service Protocol CONTEXT [Fibre Channel]

A FC-4 protocol that defines all services independently of topology or fabric type.

Fibre Channel Single Byte (command set) CONTEXT [Fibre Channel]

The industry standard command protocol for ESCON over Fibre Channel. Abbreviated FC-SB. A second version is known as FC-SB2.

Fibre Channel Switched (fabric interconnect) CONTEXT [Fibre Channel]

The standard governing the form of Fibre Channel network in which nodes are connected to a fabric topology implemented by one or more switches. Abbreviated FC-SW. Each FC-SW node's N_Port connects to an F_Port on a switch. Pairs of nodes connected to a FC-SW network can communicate concurrently. cf. fabric, Fibre Channel Arbitrated Loop

Fibre Channel Systems Initiative CONTEXT [Fibre Channel]

An industry association sponsored by Hewlett-Packard, IBM and SUN with the goals of creating Fibre Channel profiles and promoting use of Fibre Channel for computer systems applications. Abbreviated FCSI. FCSI was formed in 1993, and dissolved in 1995.

Fibre Channel Virtual Interface CONTEXT [Fibre Channel]

A proposed standard for application-level distributed interprocess communication based on Intel Corporation's V1.0 Virtual Interface (VI) Architecture; formerly known as VIA. Abbreviated FC-VI.

Fibre Connect CONTEXT [Fibre Channel]

IBM Corporation's implementation of ESCON over Fibre Channel. Abbreviated FICON. Later standardized as Fibre Channel Single Byte Command Set.

FICON CONTEXT [Fibre Channel]

Acronym for Fibre Connect.

field replaceable unit

A unit, or component of a system that is designed to be replaced "in the field"; i.e., without returning the system to a factory or repair depot. Field replaceable units may be customer-replaceable, or their replacement may require trained service personnel. cf. customer replaceable unit

file CONTEXT [File System]

An abstract data object made up of (a.) an ordered sequence of data bytes stored on a disk or tape, (b.) a symbolic name by which the object can be uniquely identified, and (c.) a set of properties, such as ownership and access permissions, that allow the object to be managed by a file system or backup manager. Unlike the permanent address spaces of storage media, files may be created and deleted, and in most file systems may expand or contract in size during their lifetimes.

file server CONTEXT [File System]

A computer whose primary purpose is to serve files to clients. A file server may be a general purpose computer that is capable of hosting additional applications or a special purpose computer capable only of serving files. cf. filer

file system CONTEXT [File System]

A software component that imposes structure on the address space of one or more physical or virtual disks so that applications may deal more conveniently with abstract named data objects of variable size (files). File systems are often supplied as operating system components, but are also implemented and marketed as independent software components.

file system virtualization

The act of aggregating multiple file systems into one large virtual file system. Users access data objects through the virtual file system; they are unaware of the underlying partitioning. The act of providing additional new or different functionality, e.g., a different file access protocol, on top of one or more existing file systems.

file virtualization CONTEXT [File System]

The use of virtualization to present several underlying file or directory objects as one single composite file. The use of virtualization to provide HSM-like properties in a storage system. The use of virtualization to present an integrated file interface when file data and metadata are managed separately in the storage system. cf. block virtualization

filer CONTEXT [File System]

An intelligent network node whose hardware and software are designed to provide file services to client computers. Filers are pre-programmed by their vendors to provide file services, and are not normally user programmable. cf. appliance, file server

firmware

Low-level software for booting and operating an intelligent device. Firmware generally resides in read-only memory (ROM) on the device.

fill byte fill word CONTEXT [Fibre Channel]

A transmission word that is an idle or an ARBx primitive signal. Fill words are transmitted between frames, primitive signals, and primitive sequences to keep a fibre channel network active.

flash memory

Non-volatile solid state memory that can be electrically erased and reprogrammed in place. Flash memory is commonly used to hold firmware in intelligent devices and as the storage medium in solid state disks.

FIPS CONTEXT [Security]

Acronym for Federal Information Processing Standard.

fixed block architecture

A model of disks in which storage space is organized as linear, dense address spaces of blocks of a fixed size. Abbreviated FBA. Fixed block architecture is the disk model on which SCSI is predicated. cf. count-key-data

FL_Port CONTEXT [Fibre Channel]

A port that is part of a Fibre Channel fabric. An FL_Port on a Fibre Channel fabric connects to an arbitrated loop. Nodes on the loop use NL_Ports to connect to the loop. NL_Ports give nodes on a loop access to nodes on the fabric to which the loop's FL_Port is attached. cf. E_Port, F_Port, G_Port, L_Port, N_Port, NL_Port

FLOGI CONTEXT [Fibre Channel]

Acronym for fabric login.

formatting CONTEXT [Storage Device] [Storage System]

The preparation of a disk for use by writing required information on the media. Disk controllers format disks by writing block header and trailer information for every block on the disk. Host software components such as volume managers and file systems format disks by writing the initial structural information required for the volume or file system to be populated with data and managed.

frame CONTEXT [Fibre Channel]

An ordered vector of words that is the basic unit of data transmission in a Fibre Channel network. A Fibre Channel frame consists of a start of frame word (SoF) (40 bits); a Frame Header (8 Words or 320 bits); data (0 to 524 Words or 0 to 2192 ten-bit encoded bytes); a CRC (one word or 40 bits); and an end of frame (EoF) (40 bits). cf. data frame

frame content CONTEXT [Fibre Channel]

The information contained in a frame between its start-of-frame and end-of-frame delimiters, excluding the delimiters.

FRU

Acronym for field replaceable unit.

FSP CONTEXT [Fibre Channel]

Acronym for Fibre Channel Service Protocol.

FMR CONTEXT [Backup]

Acronym for fast mirror resynchronization.

full backup CONTEXT [Backup]

A backup in which all of a defined set of data objects are copied, regardless of whether they have been modified since the last backup. A full backup is the basis from which incremental backups are taken. cf. cumulative incremental backup, differential incremental backup

full duplex

Concurrent transmission and reception of data on a single link.

full volume transfer rate

The average rate at which a single disk transfers a large amount of data (e.g., more than one cylinder) in response to one I/O request. The full-volume data transfer rate accounts for any delays (e.g., due to inter-sector gaps, inter-track switching time and seeks between adjacent cylinders) that may occur during the course of a large data transfer. Full volume transfer rate may differ depending on whether data is being read or written. If this is true, it is appropriate to speak of full volume read rate or full volume write rate. Also known as spiral data transfer rate.

frozen image CONTEXT [Backup]

Synonym for point-in-time copy.

frozen image method CONTEXT [Backup]

A method by which a frozen image of a set of data can be generated. Abbreviated FIM. Split mirrors and copy-on-write snapshots are the two common methods of generating frozen images.

G_Port CONTEXT [Fibre Channel]

A port on a Fibre Channel switch that can function either as an F_Port or as an E_Port. The functionality of a G_Port is determined during port login. A G_Port functions as an F_Port when connected to an N_Port, and as an E_Port when connected to an E_Port. cf. E_Port, F_Port, FL_Port, L_Port, N_Port, NL_Port

GBE CONTEXT [Network]

Acronym for Gigabit Ethernet.

Gb Gbit gigabit

Shorthand for 1,000,000,000 (10^9) bits. Storage Networking Industry Association publications typically use the term Gbit to refer to 10^9 bits, rather than 1,073,741,824 (2^30) bits. For Fibre Channel, 1,062,500,000 bits per second.

GB GByte

Synonym for gigabyte. Shorthand for 1,000,000,000 (10^9) bytes. The Storage Networking Industry Association uses GByte to refer to 10^9 bytes, as is common in I/O-related applications rather than the 1,073,741,824 (2^30) convention sometimes used in describing computer system random access memory.
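
The difference between the two conventions is easy to quantify; a quick check in Python (purely illustrative arithmetic):

decimal_gbyte = 10 ** 9    # SNIA / I/O convention: 1,000,000,000 bytes
binary_gbyte = 2 ** 30     # memory convention:     1,073,741,824 bytes

print(binary_gbyte - decimal_gbyte)                # 73741824 bytes
print(round(binary_gbyte / decimal_gbyte - 1, 4))  # 0.0737, i.e. about 7.4% larger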

GBIC CONTEXT [Fibre Channel]

Acronym for gigabit interface converter.

geometry

The mathematical description of the layout of blocks on a disk. The primary aspects of a disk's geometry are the number of recording bands and the number of tracks and blocks per track in each, the number of data tracks per cylinder, and the number and layout of spare blocks reserved to compensate for media defects.
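
For a simple, uniform geometry, the classical cylinder/head/sector layout maps to a linear block address as in the Python sketch below. The geometry values shown are hypothetical, and modern disks with banded (zoned) recording do not expose a uniform geometry like this.

def chs_to_block(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    # Classical CHS-to-linear-block mapping; sectors are numbered from 1.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Hypothetical geometry: 16 heads per cylinder, 63 sectors per track.
print(chs_to_block(cylinder=2, head=3, sector=1,
                   heads_per_cylinder=16, sectors_per_track=63))   # 2205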

gigabaud link module CONTEXT [Fibre Channel]

A transceiver that converts between electrical signals used by host bus adapters (and similar Fibre Channel devices) and either electrical or optical signals suitable for transmission. Abbreviated GLM. Gigabaud link modules allow designers to design one type of device and adapt it for either copper or optical applications. Gigabaud link modules are used less often than gigabit interface converters because they cannot be hot swapped. cf. gigabit interface converter

gigabit

Synonym for Gbit.

Gigabit Ethernet CONTEXT [Network]

A group of Ethernet standards in which data is transmitted at 1 Gbit per second. Gigabit Ethernet carries data at 1250 Megabaud using an adaptation of the Fibre Channel Physical Layer (8b/10b encoding). Abbreviated GBE. GBE standards are handled by IEEE 802.3z.

gigabit interface converter CONTEXT [Fibre Channel]

A transceiver that converts between electrical signals used by host bus adapters (and similar Fibre Channel and Ethernet devices) and either electrical or optical signals suitable for transmission. Abbreviated GBIC. Gigabit interface converters allow designers to design one type of device and adapt it for either copper or optical applications. Unlike gigabaud link modules (GLMs), GBICs can be hot swapped, and are therefore gradually supplanting the former type of transceiver. cf. gigabaud link module

gigabyte

Synonym for GByte.

Gigabyte System Network

A common name for the HIPPI-6400 standard for 800 MByte per second links. A network of devices that implement the HIPPI-6400 standard.

GLM CONTEXT [Fibre Channel]

Acronym for gigabaud link module.

graphical user interface

A form of user interface to intelligent devices characterized by pictorial displays and highly structured, forms oriented input. Valued for perceived ease of use compared with character cell interface.

group

A collection of computer user identifiers used as a convenience in assigning resource access rights or operational privileges.

GSN

Acronym for Gigabyte System Network.

GUI

Acronym for graphical user interface.

hacker CONTEXT [Security]

An unauthorized user who attempts to gain and/or succeeds in gaining access to an information system.

hard zone CONTEXT [Fibre Channel]

A zone consisting of zone members which are permitted to communicate with one another via the fabric. Hard zones are enforced by fabric switches, which prohibit communication among members not in the same zone. Well-known addresses are implicitly included in every zone.

HBA

Acronym for host bus adapter.

hierarchical storage management CONTEXT [Backup]

The automated migration of data objects among storage devices, usually based on inactivity. Abbreviated HSM. Hierarchical storage management is based on the concept of a cost-performance storage hierarchy. By accepting lower access performance (higher access times), one can store objects less expensively. By automatically moving less frequently accessed objects to lower levels in the hierarchy, higher cost storage is freed for more active objects, and a better overall cost:performance ratio is achieved.
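
A toy migration pass in the spirit of this definition (the directory names and the 90-day threshold are invented; a real HSM implementation would also leave a stub or catalog entry behind so that a migrated object can be recalled transparently):

import os, shutil, time

def migrate_inactive_files(primary_dir, secondary_dir, max_idle_days=90):
    # Move files not accessed within max_idle_days to a cheaper storage tier.
    cutoff = time.time() - max_idle_days * 24 * 3600
    os.makedirs(secondary_dir, exist_ok=True)
    for name in os.listdir(primary_dir):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and os.stat(path).st_atime < cutoff:
            shutil.move(path, os.path.join(secondary_dir, name))   # demote the inactive object

# migrate_inactive_files("/primary/projects", "/nearline/projects")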

high availability

The ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest. High availability is most often achieved through failure tolerance. High availability is not an easily quantifiable term. Both the bounds of a system that is called highly available and the degree to which its availability is extraordinary must be clearly understood on a case-by-case basis.

High Performance Parallel Interface

An ANSI standard for an 800 Mbit/second I/O interface primarily used in supercomputer networks. Abbreviated HIPPI. The subsequent 6400 Mbit per second I/O interface standard, HIPPI-6400, is more commonly referred to as the Gigabyte System Network (GSN) standard.

high speed serial direct connect CONTEXT [Fibre Channel]

A form factor that allows quick connect/disconnect for Fibre Channel copper interfaces.

HIPPI

Acronym for High Performance Parallel Interface.

host

A host computer.

host adapter

Synonym for host bus adapter.

host based array host based disk array CONTEXT [Storage System]

Synonym for volume. A disk array whose control software executes in one or more host computers rather than in a disk controller. The member disks of a host-based array may be part of different disk subsystems. cf. controller based array, volume

host based virtualization

Virtualization implemented in a host computer rather than in a storage subsystem or storage appliance. Virtualization can be implemented in host computers, in storage subsystems or storage appliances, or in specific virtualization appliances in the storage interconnect fabric.

host bus

Synonym for host I/O bus.

host bus adapter

An I/O adapter that connects a host I/O bus to a computer's memory system. Abbreviated HBA. Host bus adapter is the preferred term in SCSI contexts. Adapter and NIC are the preferred terms in Fibre Channel contexts. The term NIC is used in networking contexts such as Ethernet and token ring.

host cache CONTEXT [Storage]

A cache that resides within a host computer whose primary purpose is to improve disk or array I/O performance. Host cache may be associated with a file system or database, in which case the data items stored in the cache are file or database entities. Alternatively, host cache may be associated with the device driver stack, in which case the cached data items are sequences of disk blocks. cf. cache, controller cache, disk cache

host computer

Any computer system to which disks, disk subsystems, or file servers are attached and accessible for data storage and I/O. Mainframes, servers, workstations and personal computers, as well as multiprocessors and clustered computer complexes, are all referred to as host computers in SNIA publications.

host environment

A storage subsystem's host computer or host computers, inclusive of operating system and other required software instance(s). The term host environment is used in preference to host computer to emphasize that multiple host computers are being discussed, or to emphasize the importance of the operating system or other software in the discussion.

host I/O bus CONTEXT [Storage System]

An I/O bus used to connect a host computer's host bus adapter to storage subsystems or storage devices. cf. device I/O bus, I/O bus, channel

hot backup CONTEXT [Backup]

Synonym for online backup. cf. cold backup, offline backup

hot disk

A disk whose capacity to execute I/O requests is saturated by the aggregate I/O load directed to it from one or more applications.

hot file

A frequently accessed file. Hot files are generally the root cause of hot disks, although this is not always the case. A hot disk can also be caused by operating environment I/O, such as paging or swapping.

hot spare (disk)

A disk being used as a hot standby component.

hot standby (component, controller)

A redundant component in a failure tolerant storage subsystem that is powered and ready to operate, but which does not operate as long as a companion primary component is functioning. Hot standby components increase storage subsystem availability by allowing systems to continue to function when a component such as a controller fails. When the term hot standby is used to denote a disk, it specifically means a disk that is spinning and ready to be written to, for example, as the target of a rebuilding operation.

hot swap

The substitution of a replacement unit (RU) in a system for a defective unit, where the substitution can be performed while the system continues to perform its normal functions. Hot swaps are physical operations typically performed by humans. cf. automatic swap, cold swap, warm swap

hot swap adapter

An adapter that can be hot swapped into or out of an intelligent device. Some adapters that are called hot swap adapters should more properly be termed warm swap adapters, because the function they perform is interrupted while the substitution occurs.

HSM CONTEXT [Backup]

Acronym for hierarchical storage management.

HSSDC CONTEXT [Fibre Channel]

Acronym for high speed serial direct connect.

HTML

Acronym for HyperText Markup Language.

HTTP

Acronym for HyperText Transfer Protocol.

hub

A communications infrastructure device to which nodes on a multi-point bus or loop are physically connected. Commonly used in Ethernet and Fibre Channel networks to improve the manageability of physical cables. Hubs maintain the logical loop topology of the network of which they are a part, while creating a "hub and spoke" physical star layout. Unlike switches, hubs do not aggregate bandwidth. Hubs typically support the addition or removal of nodes from the bus while it is operating.

hub port CONTEXT [Fibre Channel]

A port on a Fibre Channel hub whose function is to pass data transmitted on the physical loop to the next port on the hub. Hub ports include loop healing port bypass functions. Some hubs have additional management functionality. There is no definition of a hub port in any Fibre Channel standard.

hunt group CONTEXT [Fibre Channel]

A set of associated N_Ports in a single node attached to the same fabric. A hunt group is assigned a special alias address identifier that enables a switch to route any frame containing the identifier to any available N_Port in the group. FC-PH does not presently specify how a hunt group can be realized.

HyperText Markup Language

The set of tags or "markup" codes that describe how a document is displayed by a web browser. Tags are delimited by the characters, "<" and ">". For example, the markup code "<p>" indicates that a new paragraph is beginning, while "</p>" indicates that the current paragraph is ending.

HyperText Transfer Protocol

An application level protocol, usually run over TCP/IP, that enables the exchange of files via the World Wide Web.

ICMP

Acronym for Internet Control Message Protocol.

idempotency

A property of operations on data. An idempotent operation is one that has the same result no matter how many times it is performed on the same data. Writing a block of data to a disk is an idempotent operation, whereas writing a block of data to a tape is not, because writing a block of data twice to the same tape results in two adjacent copies of the block.
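
The contrast in this definition is easy to demonstrate in Python (the file names are illustrative): writing a block at a fixed offset twice leaves the same result as writing it once, while appending the block twice does not.

import os

for p in ("disk_like.bin", "tape_like.bin"):   # start from a clean state
    if os.path.exists(p):
        os.remove(p)

def write_block_at(path, offset, data):
    # Idempotent: repeating this call leaves the file in the same state.
    with open(path, "r+b" if os.path.exists(path) else "w+b") as f:
        f.seek(offset)
        f.write(data)

def append_block(path, data):
    # Not idempotent: each repetition adds another copy of the block.
    with open(path, "ab") as f:
        f.write(data)

for _ in range(2):
    write_block_at("disk_like.bin", 0, b"A" * 512)
    append_block("tape_like.bin", b"A" * 512)

print(os.path.getsize("disk_like.bin"))   # 512: same result after two writes
print(os.path.getsize("tape_like.bin"))   # 1024: two adjacent copies of the block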

idle idle word CONTEXT [Fibre Channel]

An ordered set of four transmission characters normally transmitted between frames to indicate that a fibre channel network is idle.

IETF CONTEXT [Network] [Standards]

Acronym for Internet Engineering Task Force.

ignored (field) CONTEXT [Fibre Channel]

A field that is not interpreted by its receiver.

IKE CONTEXT [Network][Security]

Acronym for Internet Key Exchange

implicit addressing CONTEXT [Storage Device]

A form of addressing usually used with tapes in which the data's address is inferred from the form of the access request. Tape requests do not include an explicit block number, but instead specify the next or previous block from the current tape position, from which the block number must be inferred by device firmware. cf. explicit addressing

IMR CONTEXT [Backup]

Acronym for incremental mirror resynchronization.

in-band (transmission) CONTEXT [Fibre Channel]

Transmission of a protocol other than the primary data protocol over the same medium as the primary data protocol. Management protocols are a common example of in-band transmission.

in-band virtualization

Virtualization functions or services that are in the data path. In a system that implements in-band virtualization, virtualization services such as address mapping are performed by the same functional components used to read or write data. cf. out-of-band virtualization

incremental backup CONTEXT [Backup]

A collective term for cumulative incremental backups and differential incremental backups. Any backup in which only data objects modified since the time of some previous backup are copied. cf. cumulative incremental backup, differential incremental backup, full backup

incremental mirror resynchronization incremental resynchronization CONTEXT [Backup]

A technique for reducing the time required to synchronize a split mirror with the set of storage devices from which it was split. Incremental mirror resynchronization requires that a list of changes to the original set of data since the moment of splitting be kept. When the split mirror is rejoined to its original set of volumes, only the data items identified in the list are copied from the original to the split mirror (rather than the entire contents of the devices).

independent access array

A disk array whose data mapping is such that different member disks can execute multiple application I/O requests concurrently. cf. parallel access array

infinite buffer CONTEXT [Fibre Channel]

A term indicating that at the FC-2 level, the amount of buffering available at the Sequence Recipient is assumed to be unlimited. Buffer overrun must be prevented by each ULP by choosing an appropriate amount of buffering per sequence based on its maximum transfer unit size.

information category CONTEXT [Fibre Channel]

A frame header field indicating the category to which the frame payload belongs (e.g., Solicited Data, Unsolicited Data, Solicited Control and Unsolicited Control).

information model

A repository-independent definition of entities (i.e., objects) and the relationships and interactions between these entities. For example, the CIM schemas are an example of an information model. An information model differs from a data model, which is repository-specific.

information system CONTEXT [Security]

The entire infrastructure, organization, personnel and components for the collection, processing, storage, transmission, display, dissemination and disposition of information.

information technology

All aspects of information creation, access, use, storage, transport and management. The term information technology addresses all aspects of computer and storage systems, networks, users and software in an enterprise. Abbreviated IT.

information unit CONTEXT [Fibre Channel]

A related collection of data specified by FC-4 to be transferred as a single FC-2 sequence.

infrastructure-based virtualization

Virtualization implemented in the storage fabric, in separate devices designed for the purpose, or in network devices. Examples are separate devices or additional functions in existing devices that aggregate multiple individual file system appliances or block storage subsystems into one such virtual service, functions providing transparent block or file system mirroring functions, or functions that provide new security or management services.

inherent cost

The cost of a system expressed in terms of the number and type of components it contains. The concept of inherent cost allows technology-based comparisons of disk subsystem alternatives by expressing cost in terms of number of disks, ports, modules, fans, power supplies, cabinets, etc. Because it is inexpensively reproducible, software is generally assumed to have negligible inherent cost.

initial relative offset CONTEXT [Fibre Channel]

The relative offset of the block or sub-block transmitted by the first frame in a sequence. The initial relative offset is specified by an upper layer protocol and need not be zero.

initialization CONTEXT [Fibre Channel]

The startup and initial configuration of a device, system, piece of software or network. For FC-1, the period beginning with power on and continuing until the transmitter and receiver at that level become operational.

initiator

The system component that originates an I/O command over an I/O bus or network. I/O adapters, network interface cards, and intelligent controller device I/O bus control ASICs are typical initiators. cf. LUN, originator, target, target ID

inode CONTEXT [File System]

A persistent data structure in a UNIX or UNIX-like file system that describes the location of some or all of the disk blocks allocated to the file.

instantiation

The creation of an instance of a class or object oriented abstraction.

intelligent controller CONTEXT [Storage System]

A storage controller that includes a processor or sequencer programmed to enable it to handle a substantial portion of I/O request processing autonomously.

intelligent device

A computer, storage controller, storage device, or appliance.

Intelligent Peripheral Interface CONTEXT [Network]

A high-performance standards-based I/O interconnect. Abbreviated IPI.

intercabinet CONTEXT [Fibre Channel]

A specification for Fibre Channel copper cabling that allows up to a 30m distance between two enclosures that contain devices with Fibre Channel ports.

interconnect

A physical facility by which system elements are connected together and through which they can communicate with each other. I/O buses and networks are both interconnects.

interface connector CONTEXT [Fibre Channel]

An optical or electrical connector which connects the media to the Fibre Channel transmitter or receiver. An interface connector consists of both a receptacle and a plug.

intermix CONTEXT [Fibre Channel]

A Fibre Channel class of service that provides a full bandwidth dedicated Class 1 connection, but allows connectionless Class 2 and Class 3 traffic to share the link during intervals when bandwidth is unused.

International Standards Organization CONTEXT [Standards]

The international standards body. Abbreviated ISO. ISO-published standards have the status of international treaties.

Internet Control Message Protocol CONTEXT [Network]

A control protocol strongly related to IP and TCP, and used to convey a variety of control and error indications.

Internet Engineering Task Force CONTEXT [Network][Security] [Standards]

A large open international community of network designers, operators, vendors, and researchers concerned with evolution and smooth operation of the Internet, and responsible for producing RFCs. The standards body responsible for Internet standards, including SNMP, TCP/IP and policy for QoS. Abbreviated IETF. The IETF has a web site at www.ietf.org.

Internet Key Exchange CONTEXT [Network][Security]

A protocol used to obtain authenticated keying material. Standardized by the Internet Engineering Task Force and described in RFC 2409.

Internet Protocol CONTEXT [Network]

A protocol that provides connectionless best effort delivery of datagrams across heterogeneous physical networks. Abbreviated IP. cf. TCP, UDP

interrupt

A hardware or software signal that causes a computer to stop executing its instruction stream and switch to another stream. Software interrupts are triggered by application or other programs. Hardware interrupts are caused by external events, to notify software so it can deal with the events. The ticking of a clock, completion or reception of a transmission on an I/O bus or network, application attempts to execute invalid instructions or reference data for which they do not have access rights, and failure of some aspect of the computer hardware itself are all common causes of hardware interrupts.

interrupt switch

A human-activated switch present on some intelligent devices that is used to generate interrupts. Usually used for debugging purposes.

intracabinet CONTEXT [Fibre Channel]

A Fibre Channel specification for copper cabling that allows up to 13m total cable length within a single enclosure that may contain multiple devices.

I/O

Acronym for input/output. The process of moving data between a computer system's main memory and an external device or interface such as a storage device, display, printer, or network connected to other computer systems. I/O is a collective term for reading or moving data into a computer system's memory, and writing, or moving data from a computer system's memory to another location.

I/O adapter

An adapter that converts between the timing and protocol requirements of an intelligent device's memory bus and those of an I/O bus or network. In the context of storage subsystems, I/O adapters are contrasted with embedded storage controllers, which not only adapt between buses but also perform transformations such as device fanout, data caching, and RAID. Synonym for host bus adapter.

I/O bus

Any path used to transfer data and control information between components of an I/O subsystem. An I/O bus consists of wiring (either cable or backplane), connectors, and all associated electrical drivers, receivers, transducers, and other required electronic components. I/O buses are typically optimized for the transfer of data, and tend to support more restricted configurations than networks. In this book, an I/O bus that connects a host computer's host bus adapter to intelligent storage controllers or devices is called a host I/O bus. An I/O bus that connects storage controllers or host I/O bus adapters to devices is called a device I/O bus. cf. channel, device channel, device I/O bus, host I/O bus, network

I/O bottleneck

Any resource in the I/O path (e.g., device driver, host bus adapter, I/O bus, intelligent controller, or disk) whose performance limits the performance of a storage subsystem as a whole.

I/O driver

A host computer software component (usually part of an operating system) whose function is to control the operation of peripheral controllers or adapters attached to the host computer. I/O drivers manage communication and data transfer between applications and I/O devices, using host bus adapters as agents. In some cases, drivers participate in data transfer, although this is rare with disk and tape drivers, since most host bus adapters and controllers contain specialized hardware to perform data transfers.

I/O intensity

A characterization of applications. An I/O-intensive application is one whose performance depends strongly on the performance of the I/O subsystem that provides its I/O services. I/O-intensive applications may be data transfer intensive or I/O request intensive.

I/O load

A sequence of I/O requests made to an I/O subsystem. The requests that comprise an I/O load include both user I/O and host overhead I/O, such as swapping, paging, and file system activity.

I/O load balancing

Synonym for load balancing.

I/O load optimization

Synonym for load optimization.

I/O operation

A read, write, or control function performed to, from or within a computer system. For example I/O operations are requested by control software in order to satisfy application I/O requests made to virtual disks. cf. I/O request

I/O request

A request by an application to read or write a specified amount of data. In the context of real and virtual disks, I/O requests specify the transfer of a number of blocks of data between consecutive disk block addresses and contiguous memory locations. cf. I/O operation

I/O subsystem CONTEXT [Storage System]

A collective term for the set of devices and software components that operate together to provide data transfer services. A storage subsystem is one type of I/O subsystem.

IP CONTEXT [Network]

Acronym for Internet Protocol.

IPI

Acronym for Intelligent Peripheral Interface.

IPsec CONTEXT [Network][Security]

Acronym for IP Security

IP Security CONTEXT [Network][Security]

A suite of cryptographic algorithms, protocols and procedures used to provide communication security for IP-basedcommunications. Standardized by the Internet Engineering Task Force.

ISO CONTEXT [Standards]

Acronym for International Standards Organization.

IT

Acronym for Information Technology.

Java

An object oriented computer programming language that is similar to but simpler than C++. Java was created by Sun Microsystems Computer Corporation.

JBOD CONTEXT [Storage Device] [Storage System]

Acronym for "Just a Bunch Of Disks." Originally used to mean a collection of disks without the coordinated control provided by control software; today the term JBOD most often refers to a cabinet of disks whether or not RAID functionality is present. cf. disk array

Jini

An architecture and supporting services for publishing and discovering devices and services on a network. Jini was created by Sun Microsystems Computer Corporation.

Jiro

A Sun Microsystems Computer Corporation initiative, developed using the Java Community Process. Jiro's goal is to enable the management of heterogeneous storage networks. The core technology in Jiro is defined in the Federated Management Architecture Specification.

jitter CONTEXT [Fibre Channel]

Deviation in timing that a bit stream encounters as it traverses a physical medium.

K28.5 CONTEXT [Fibre Channel]

A special 10-bit character used to indicate the beginning of a Fibre Channel command.

KB KByte

Synonyms for kilobyte.

key CONTEXT [Security]

Usually a sequence of random or pseudorandom bits used to direct cryptographic operations and/or for producing other keys. The same plaintext encrypted with different keys yields different ciphertexts, each of which requires a different key for decryption. In a symmetric cryptosystem the encryption and decryption keys are the same. In an asymmetric cryptosystem the encryption and decryption keys are different.

key exchange CONTEXT [Security]

A cryptographic protocol and procedure in which two communicating entities determine a shared key in a fashion such that a third party who reads all of their communication cannot effectively determine the value of the key. A common approach to key exchange requires such a third party to compute a discrete logarithm over a large field in order to determine the key value and relies on the computational intractability of the discrete logarithm problem for suitably selected large fields for its security.
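
For illustration only, a toy Python example of the discrete-logarithm approach described above; the tiny prime and generator are hypothetical and far too small for real security.

p, g = 23, 5                  # public prime modulus and generator (toy values)
a, b = 6, 15                  # private values chosen by each party
A = pow(g, a, p)              # first party sends g^a mod p
B = pow(g, b, p)              # second party sends g^b mod p
shared_1 = pow(B, a, p)       # first party computes (g^b)^a mod p
shared_2 = pow(A, b, p)       # second party computes (g^a)^b mod p
assert shared_1 == shared_2   # both parties derive the same key; an eavesdropper
                              # would need a discrete logarithm to recover it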

key management CONTEXT [Security]

The supervision and control of the process by which keys are generated, stored, protected, transferred, loaded, used, revoked and destroyed.

key pair CONTEXT [Security]

A public key and its corresponding private key as used in public key cryptography (i.e., asymmetric cryptosystem).

keying material CONTEXT [Security]

A key or authentication information in physical or magnetic form.

kilobyte

1000 (10^3) bytes of data. (Common storage industry usage)

1024 (2^10) bytes of data. (Common usage in software contexts)

Which is meant is typically clear from the context in which the term is used.
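
A small worked example (illustrative only) of the two conventions:

decimal_kb = 10**3    # 1,000 bytes: common storage industry usage
binary_kb = 2**10     # 1,024 bytes: common usage in software contexts
print(binary_kb - decimal_kb)    # the conventions differ by 24 bytes
print(binary_kb / decimal_kb)    # 1.024, i.e. a 2.4% difference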

L_Port CONTEXT [Fibre Channel]

A port used to connect a node to a Fibre Channel arbitrated loop. cf. E_Port, F_Port, FL_Port, G_Port, N_Port, NL_Port

label CONTEXT [Backup]

An identifier associated with a removable media or cartridge. Labels may be humanly readable, machine readable, or both. cf. external volume serial number, media ID

LAN CONTEXT [Network]

Acronym for local area network.

LANE CONTEXT [Network]

Acronym for local area network emulation.

LAN-free backup

A disk backup methodology in which a SAN appliance performs the actual backup I/O operations, thus freeing the LAN server to perform I/O operations on behalf of LAN clients. Differentiated from serverless backup by the requirement of an additional SAN appliance to perform the backup I/O operations.

large read request large write request large I/O request

An I/O request that specifies the transfer of a large amount of data. "Large" obviously depends on the context, but typically refers to requests for 64 KBytes or more of data. cf. small I/O request

latency

Synonym for I/O request execution time, the time between the making of an I/O request and completion of the request's execution. Short for rotational latency, the time between the completion of a seek and the instant of arrival of the first block of data to be transferred at the disk's read/write head.

latent fault

A failure of a system component that has not been recognized because the failed aspect of the component has not been exercised since the occurrence of the failure. A field-developed media defect on a disk surface is a latent fault until an attempt is made to read the data in a block that spans the defect.

LBA

Acronym for logical block address.

LDAP

Acronym for Lightweight Directory Access Protocol.

LDM

Acronym for logical disk manager.

LED

Acronym for light emitting diode.

library CONTEXT [Backup]

A robotic media handler capable of storing multiple pieces of removable media and loading and unloading them from one or more drives in arbitrary order.

light emitting diode

A multimode light source based on inexpensive optical diodes. Abbreviated LED. Available in a variety of wavelengths; 1300 nanometer wavelength is typical for data communications. The practical transfer rate limit for LEDs is 266 Mbps.

Lightweight Directory Access Protocol

An IETF protocol for creating, accessing and removing objects and data from a directory. It provides the ability to search, compare, add, delete and modify directory objects, as well as modifying the names of these objects. It also supports bind, unbind and abandon (cancel) operations for a session. LDAP got its name from its goal of being a simpler form of DAP (Directory Access Protocol) than the X.500 set of standards.

link CONTEXT [General] [Fibre Channel]

A physical connection (electrical or optical) between two nodes of a network.

[Fibre Channel] Two unidirectional fibres transmitting in opposite directions and their associated transmitters and receivers.

[Fibre Channel] The full-duplex FC-0 level association between FC-1 entities in directly attached ports.

[Fibre Channel] The point to point physical connection from one element of a Fibre Channel fabric to the next.

link control frame

A frame that carries link-level control information, such as acknowledgements and link responses (e.g., busy or reject), rather than upper layer protocol data.

LIP CONTEXT [Fibre Channel]

Acronym for loop initialization primitive.

LISM CONTEXT [Fibre Channel]

Acronym for loop initialization select master.

load balancing

The adjustment of system and/or application components and data so that application I/O or computational demands are spread as evenly as possible across a system's physical resources. I/O load balancing may be done manually (by a human) or automatically (by some means not requiring human intervention). cf. I/O load optimization, load sharing

load optimization

The manipulation of an I/O load in such a way that performance is optimal by some objective metric. Load optimization may be achieved by load balancing across several components, or by other means, such as request reordering or interleaved execution. cf. load balancing, load sharing

load sharing

The division of an I/O load or task among several storage subsystem components, without any attempt to equalize each component's share of the work. Each affected component bears a percentage of a shared load. When a storage subsystem is load sharing, it is possible for some of the sharing components to be operating at full capacity, to the point of actually limiting performance, while others are underutilized. cf. I/O load balancing, load optimization

local area network CONTEXT [Network]

A communications infrastructure designed to use dedicated wiring over a limited distance (typically a diameter of less than five kilometers) to connect a large number of intercommunicating nodes. Ethernet and token ring are the two most popular LAN technologies. cf. wide area network

local area network emulation

A collection of protocols and services that combine to create an emulated local area network using ATM as the underlying network. Abbreviated LANE. Local area network emulation enables intelligent devices with ATM connections to communicate with remote LAN-connected devices as if they were directly connected to the LAN.

local F_Port CONTEXT [Fibre Channel]

The F_Port to which a particular N_Port is directly attached by a link.

logical block CONTEXT [Storage Device] [Storage System]

A block of data stored on a disk or tape, and associated with an address for purposes of retrieval or overwriting. The term logical block is typically used to refer to the host's view of data addressing on a physical device. Within a storage device, there is often a further conversion between the logical blocks presented to hosts and the physical media locations at which the corresponding data is stored. cf. physical block, virtual block

logical block address CONTEXT [Storage Device] [Storage System]

The address of a logical block. Logical block addresses are typically used in hosts' I/O commands. The SCSI disk command protocol, for example, uses logical block addresses.
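
For illustration (not part of the entry), a sketch of how an older cylinder/head/sector address reduces to the flat logical block address used in SCSI commands; the geometry values are hypothetical.

def chs_to_lba(cylinder, head, sector, heads_per_cyl, sectors_per_track):
    # sectors are conventionally numbered from 1, logical blocks from 0
    return (cylinder * heads_per_cyl + head) * sectors_per_track + (sector - 1)

print(chs_to_lba(0, 0, 1, heads_per_cyl=16, sectors_per_track=63))   # LBA 0
print(chs_to_lba(1, 0, 1, heads_per_cyl=16, sectors_per_track=63))   # LBA 1008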

logical disk CONTEXT [Storage System]

A set of consecutively addressed FBA disk blocks that is part of a single virtual disk to physical disk mapping. Logical disks are used in some array implementations as constituents of logical volumes or partitions. Logical disks are normally not visible to the host environment, except during array configuration operations. cf. extent, virtual disk

logical disk manager CONTEXT [Windows]

A name for the volume management control software in the Windows NT operating system.

logical unit CONTEXT [SCSI]

The entity within a SCSI target that executes I/O commands. SCSI I/O commands are sent to a target and executed by a logical unit within that target. A SCSI physical disk typically has a single logical unit. Tape drives and array controllers may incorporate multiple logical units to which I/O commands can be addressed. Each logical unit exported by an array controller corresponds to a virtual disk. cf. LUN, target, target ID

logical unit number CONTEXT [SCSI]

The SCSI identifier of a logical unit within a target.

logical volume CONTEXT [Storage System]

A virtual disk made up of logical disks. Also called a virtual disk, or volume set.

login server CONTEXT [Fibre Channel]

An intelligent entity within a Fibre Channel fabric that receives and executes fabric login requests.

long wavelength laser CONTEXT [Fibre Channel]

A laser with a wavelength of 1300 nm or longer; usually 1300 or 1550 nm. Widely used in the telecommunications industry.

loop initialization CONTEXT [Fibre Channel]

The protocol by which a Fibre Channel Arbitrated Loop network initializes upon power up or recovers after a failure or other unexpected condition. Usually abbreviated LIP. During a LIP, the nodes present on the arbitrated loop identify themselves and acquire addresses on the loop for communication. No data can be transferred on an arbitrated loop until a LIP is complete.

loop initialization primitive CONTEXT [Fibre Channel]

A Fibre Channel primitive used to (1) initiate a procedure that results in unique addressing for all nodes, (2) indicate a loop failure, or (3) reset a specific node.

loop initialization select master CONTEXT [Fibre Channel]

The process by which a temporary Fibre Channel arbitrated loop master is determined during loop initialization. Abbreviated LISM.

loopback CONTEXT [Fibre Channel]

An FC-1 operational mode in which information passed to the FC-1 transmitter is shunted directly to the FC-1 receiver. When a Fibre Channel interface is in loopback mode, the loopback signal overrides any external signal detected by the receiver.

loop port state machine CONTEXT [Fibre Channel]

Logic that monitors and performs the tasks required for initialization and access to a Fibre Channel arbitrated loop.

LWL CONTEXT [Fibre Channel]

Acronym for long wavelength laser.

LUN CONTEXT [SCSI]

Acronym for logical unit number.

MAC CONTEXT [Network]

Acronym for media access control.

magnetic remanence CONTEXT [Security]

A magnetic representation of residual information remaining on a magnetic medium after the medium has been degaussed.

MAN

Acronym for metropolitan area network.

Managed Object Format CONTEXT [Management]

The syntax and formal description of the objects and associations in the CIM schemas. Abbreviated as MOF. MOF can also be translated to XML using a document type definition published by the DMTF.

Management Information Base CONTEXT [Management]

The specification and formal description of a set of objects and variables that can be read and possibly written using the SNMP protocol. Abbreviated MIB. Various standard MIBs are defined by the IETF.

mandatory (provision) CONTEXT [Standards]

A provision in a standard which must be supported in order for an implementation of the standard to be compliant.

mapping CONTEXT [Storage System]

Conversion between two data addressing spaces. For example, mapping refers to the conversion between physical disk block addresses and the block addresses of the virtual disks presented to operating environments by control software.
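
As an illustration of an algorithmic mapping (a sketch, with hypothetical stripe parameters), the following converts a virtual disk block address to a member disk and member block address for a simple striped array:

def map_virtual_block(vba, members, stripe_depth):
    stripe_no = vba // (stripe_depth * members)    # which full stripe
    within = vba % (stripe_depth * members)        # offset inside the stripe
    member = within // stripe_depth                # which member disk
    member_blk = stripe_no * stripe_depth + within % stripe_depth
    return member, member_blk

print(map_virtual_block(vba=130, members=4, stripe_depth=16))   # -> (0, 34)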

mapping boundary CONTEXT [Storage System]

A virtual disk block address of some significance to a disk array's mapping algorithms. The first and last blocks of a user data space strip or check data strip are mapping boundaries.

maximum transfer unit CONTEXT [Network]

The largest amount of data that it is permissible to transmit as one unit according to a protocol specification. Abbreviated MTU. The Ethernet MTU is 1536 eight-bit bytes. The Fibre Channel MTU is 2112 eight-bit bytes.

MB MByte

Shorthand for megabyte.

Mb Mbit

Shorthand for megabit.

MBps

Acronym for megabytes per second. A measure of bandwidth or data transfer rate.

Mbps

Acronym for megabits per second. A measure of bandwidth or data transfer rate.

MD5 CONTEXT [Security]

A specific message-digest algorithm producing a 128-bit digest which is used as authentication data by an authentication service.
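
For illustration, producing a 128-bit MD5 digest with Python's standard hashlib module (the input string is arbitrary):

import hashlib

digest = hashlib.md5(b"designing storage area networks").hexdigest()
print(len(digest) * 4, "bits:", digest)   # 128 bits, shown as 32 hex characters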

mean time between failures

The average time from start of use to first failure in a large population of identical systems, components, or devices. Abbreviated MTBF.

mean time to (loss of) data availability

The average time from startup until a component failure causes a loss of timely user data access in a large population of storage devices. Loss of availability does not necessarily imply loss of data; for some classes of failures, (e.g., failure of non-redundant intelligent storage controllers), data remains intact, and can again be accessed after the failed component is replaced.

mean time to data loss

The average time from startup until a component failure causes a permanent loss of user data in a large population of storage devices. Mean time to data loss is similar to MTBF for disks and tapes, but is likely to differ in RAID arrays, where redundancy can protect against data loss due to component failures.

mean time to repair

The average time between a failure and completion of repair in a large population of identical systems, components, or devices. Mean time to repair comprises all elements of repair time, from the occurrence of the failure to restoration of complete functionality of the failed component. This includes time to notice and respond to the failure, time to repair or replace the failed component, and time to make the replaced component fully operational. In mirrored and RAID arrays, for example, the mean time to repair a disk failure includes the time required to reconstruct user data and check data from the failed disk on the replacement disk. Abbreviated MTTR.
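
MTBF and MTTR are commonly combined into a steady-state availability estimate. The following worked example (not part of the dictionary entry) uses the standard relation availability = MTBF / (MTBF + MTTR), with illustrative figures:

mtbf = 500_000    # mean time between failures, hours (illustrative)
mttr = 4          # mean time to repair, hours (illustrative)
availability = mtbf / (mtbf + mttr)
print(f"{availability:.6f}")   # 0.999992, roughly "five nines"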

meaningful (control field) CONTEXT [Standards]

In a standard, a control field or bit that must be correctly interpreted by a receiver. Control fields are either meaningful or "not meaningful," in which case they must be ignored.

media

The material in a storage device on which data is recorded. A physical link on which data is transmitted between two points.

media access control CONTEXT [Network]

Algorithms that control access to physical media, especially in shared media networks.

media defect

A flaw in a storage medium's recording surface that prevents data stored in blocks spanning the flaw from being reliably written or read.

media ID CONTEXT [Backup]

A machine readable identifier written on a piece of removable media that remains constant throughout the media's life. cf. external volume serial number, label

media manager CONTEXT [Backup]

A backup software component responsible for tracking the location, contents, and state of removable storage media.

media robot CONTEXT [Backup]

Synonym for robotic media handler.

media stacker CONTEXT [Backup]

A robotic media handler in which media must be moved sequentially by the robot. Usually services a single drive. A stacker may be able to load media into a drive in arbitrary order, but must cycle through media in sequence to do so.

megabaud

One million baud (elements of transmitted information) per second, including data, signaling, and overhead.

megabit

1,000,000 (10^6) bits. The SNIA uses the 10^6 convention commonly found in data transfer-related literature rather than the 1,048,576 (2^20) convention common in computer system random access memory and software literature.

megabyte

1,000,000 (10^6) bytes. The SNIA uses the 10^6 convention commonly found in storage and data transfer-related literature rather than the 1,048,576 (2^20) convention common in computer system random access memory and software literature.

megatransfer CONTEXT [SCSI]

The transfer of one million data units per second. Used to describe the characteristics of parallel I/O buses like SCSI, for which the data transfer rate depends upon the amount of data transferred in each data cycle. cf. SCSI, fast SCSI, Ultra SCSI, Ultra2 SCSI, wide SCSI
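
A short worked example (figures chosen for illustration): bandwidth on a parallel bus is the megatransfer rate times the bytes moved per transfer.

megatransfers_per_second = 40   # e.g., an Ultra2 SCSI data clock
bytes_per_transfer = 2          # a 16-bit (wide) parallel bus
print(megatransfers_per_second * bytes_per_transfer, "MBytes/sec")   # 80 MBytes/sec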

member member disk CONTEXT [Storage System]

A disk that is in use as a member of a disk array.

message-digest algorithm CONTEXT [Security]

An algorithm which produces a secure hash.

metadata CONTEXT [File System] [Storage System]

Data that describes data. In disk arrays, metadata consists of items such as array membership, member extent sizes and locations, descriptions of logical disks and partitions, and array state information. In file systems, metadata includes file names, file properties and security information, and lists of block addresses at which each file's data is stored.

metropolitan area network CONTEXT [Network]

A network that connects nodes distributed over a metropolitan (city-wide) area as opposed to a local area (campus) or wide area (national or global). Abbreviated MAN. From a storage perspective, MANs are of interest because there are MANs over which block storage protocols (e.g., ESCON, Fibre Channel) can be carried natively, whereas most WANs that extend beyond a single metropolitan area do not currently support such protocols.

MIB CONTEXT [Management]

Acronym for Management Information Base.

MIME CONTEXT [Network]

Acronym for Multipurpose Internet Mail Extensions.

mirror resynchronization CONTEXT [Backup]

The process of making the contents of a split mirror identical with the contents of the storage devices from which the mirror was split. Mirror resynchronization may entail copying the entire contents of the storage devices or, when fast mirror resynchronization is used, only the data items changed in the original since the instant of splitting.

mirroring CONTEXT [Storage System]

A form of storage array in which two or more identical copies of data are maintained on separate media. Also known as RAID Level 1, disk shadowing, real-time copy, and t1 copy.

mirrors mirrored disks CONTEXT [Storage System]

The disks of a mirrored array.

mirrored array CONTEXT [Storage System]

Common term for a disk array that implements RAID Level 1, or mirroring to protect data against loss due to disk or device I/O bus failure.

MLS CONTEXT [Security]

Acronym for multilevel security

modeling language

A language for describing the concepts of an information or data model. One of the most popular modeling languages in use today is UML (Unified Modeling Language). The essence of modeling languages is that they be capable of conveying the model concepts.

MOF CONTEXT [Management]

Acronym for Managed Object Format.

monitor (program)

A program that executes in an operating environment and keeps track of system resource utilization. Monitors typically record CPU utilization, I/O request rates, data transfer rates, RAM utilization, and similar statistics. A monitor program, which may be an integral part of an operating system, a separate software product, or a part of a related component, such as a database management system, is a necessary prerequisite to manual I/O load balancing.

mount

In the Network File System (NFS), a protocol and set of procedures to specify a remote host and file system or directory to be accessed. Also specified is the location of the accessed directories in the local file hierarchy.

MTBF

Acronym for mean time between failures.

MTDA

Acronym for mean time to (loss of) data availability.

MTDL

Acronym for mean time to data loss.

MTTR

Acronym for mean time to repair.

MTU CONTEXT [Network]

Acronym for maximum transfer unit.

multicast CONTEXT [Fibre Channel] [Network]

The simultaneous transmission of a message to a subset of more than one of the ports connected to a communication facility. In a Fibre Channel context, multicast specifically refers to the sending of a message to multiple N_Ports connected to a fabric.

multicast group CONTEXT [Fibre Channel] [Network]

A set of ports associated with an address or identifier that serves as the destination for multicast packets or frames that are to be delivered to all ports in the set.

multi-level disk array CONTEXT [Storage System]

A disk array with two levels of data mapping. The virtual disks created by one mapping level become the members of the second level. The most frequently encountered multi-level disk arrays use mirroring at the first level, and stripe data across the resulting mirrored arrays at the second level.

multi-level security CONTEXT [Security]

Allows users and resources of different sensitivity levels to access a system concurrently, while ensuring that only information for which the user or resource has authorization is made available. Requires a formal computer security policy model which assigns specific access characteristics to both subjects and objects.

multimode (fiber optic cable) CONTEXT [Fibre Channel] [Network]

A fiber optic cabling specification that allows up to 500-meter distances between devices.

multi-threaded

Having multiple concurrent or pseudo-concurrent execution sequences. Used to describe processes in computer systems. Multi-threaded processes are one means by which I/O request-intensive applications can make maximum use of disk arrays to increase I/O performance.

multi-path I/O CONTEXT [Storage System]

The facility for a host to direct I/O requests to a storage device on more than one access path. Multi-path I/O requires that devices be uniquely identifiable by some means other than by bus address.

Multipurpose Internet Mail Extensions CONTEXT [Network]

A specification that defines the mechanisms for specifying and describing the format of Internet message bodies. An HTTP response containing a MIME Content-Type header allows the HTTP client to invoke the appropriate application for processing the received data.

N_Port CONTEXT [Fibre Channel]

A port that connects a node to a fabric or to another node. Nodes' N_Ports connect to fabrics' F_Ports or to other nodes' N_Ports. N_Ports handle creation, detection, and flow of message units to and from the connected systems. N_Ports are end points in point to point links. cf. E_Port, F_Port, FL_Port, G_Port, L_Port, NL_Port

N_Port Name CONTEXT [Fibre Channel]

A Name_Identifier associated with an N_Port.

NAA CONTEXT [Network] [Standards]

Acronym for Network Address Authority.

Name_Identifier CONTEXT [Fibre Channel]

A 64-bit identifier, consisting of a 60-bit value concatenated with a 4-bit Network_Address_Authority_Identifier. Name_Identifiers identify Fibre Channel entities such as N_Port, node, F_Port, or fabric.
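
For illustration (the example value is hypothetical), extracting the 4-bit Network_Address_Authority identifier and the 60-bit value from a 64-bit Name_Identifier:

wwn = 0x21000020371A5B6C        # a 64-bit name written in hexadecimal
naa = (wwn >> 60) & 0xF         # top 4 bits: Network_Address_Authority identifier
value = wwn & ((1 << 60) - 1)   # remaining 60-bit value
print(naa, hex(value))          # 2 0x1000020371a5b6c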

name server CONTEXT [Fibre Channel] [Network]

An intelligent entity in a network that translates between symbolic node names and network addresses. In a Fibre Channel network, a name server translates between world wide names and fabric addresses.

naming

The mapping of address space to a set of objects. Naming is typically used either for human convenience (e.g., symbolic names attached to files or storage devices), or to establish a level of independence between two system components (e.g., identification of files by inode names or identification of computers by IP addresses).

namespace CONTEXT [File System] [Management]

The set of valid names recognized by a file system. One of the four basic functions of file systems is maintenance of a namespace so that invalid and duplicate names do not occur. In XML, a document at a specific Web address (URL) that lists the names of data elements and attributes that are used in other XML files. In CIM and WBEM, a collection of object definitions and instances that are logically consistent.

NAS CONTEXT [Network] [Storage System]

Acronym for network attached storage.

National Committee Information Technology Standards CONTEXT [Standards]

A committee of ANSI that serves as the governing body of X3T11 and other standards organizations.

National Institute of Standards and Technology CONTEXT [Security]

A non-regulatory federal agency within the U.S. Commerce Department's Technology Administration. NIST's mission is to develop and promote measurement, standards, and technology to enhance productivity, facilitate trade, and improve the quality of life. Specifically, the Computer Security Division within NIST's Information Technology Laboratory managed the Advanced Encryption Standard (AES) program.

NCITS CONTEXT [Standards]

Acronym for National Committee Information Technology Standards.

NDMP CONTEXT [Management] [Network]

Acronym for Network Data Management Protocol.

network CONTEXT [Network]

An interconnect that enables communication among a collection of attached nodes. A network consists of optical or electrical transmission media, infrastructure in the form of hubs and/or switches, and protocols that make message sequences meaningful. In comparison to I/O buses, networks are typically characterized by large numbers of nodes that act as peers, large inter-node separation, and flexible configurability. cf. channel, I/O bus, local area network, storage area network

network adapter CONTEXT [Network]

An adapter that connects an intelligent device to a network. Usually called a network interface card, or Ethernet NIC. cf. Ethernet adapter, NIC

Network Address Authority (identifier) CONTEXT [Fibre Channel]

A four-bit identifier defined in FC-PH to denote a network address authority (i.e., an organization such as CCITT or IEEE that administers network addresses).

network attached storage CONTEXT [Network] [Storage System]

A term used to refer to storage elements that connect to a network and provide file access services to computer systems. Abbreviated NAS. A NAS storage element consists of an engine, which implements the file services, and one or more devices, on which data is stored. NAS elements may be attached to any type of network. When attached to SANs, NAS elements may be considered to be members of the SAS class of storage elements. A class of systems that provide file services to host computers. A host system that uses network attached storage uses a file system device driver to access data using file access protocols such as NFS or CIFS. NAS systems interpret these commands and perform the internal file and device I/O operations necessary to execute them. cf. storage area network

Network Data Management Protocol CONTEXT [Backup]

A communications protocol that allows intelligent devices on which data is stored, robotic library devices, and backup applications to intercommunicate for the purpose of performing backups. Abbreviated NDMP. An open standard protocol for network-based backup of NAS devices. NDMP allows a network backup application to control the retrieval of data from, and backup of, a server without third-party software. The control and data transfer components of backup and restore are separated. NDMP is intended to support tape drives, but can be extended to address other devices and media in the future. The Network Data Management Task Force has a web site at http://www.ndmp.org.

Network File System (protocol) CONTEXT [File System]

A distributed file system and its associated network protocol originally developed by Sun Microsystems Computer Corporation and commonly implemented in UNIX systems, although most other computer systems have implemented NFS clients and/or servers. Abbreviated NFS. The IETF is responsible for the NFS standard.

network interface card CONTEXT [Network]

An I/O adapter that connects a computer or other type of node to a network. Abbreviated NIC. A NIC is usually a circuit module; however, the term is sometimes used to denote an ASIC or set of ASICs on a computer system board that performs the network I/O adapter function. The term NIC is universally used in Ethernet and token ring contexts. In Fibre Channel contexts, the terms adapter and NIC are used in preference to host bus adapter. cf. adapter, host bus adapter, I/O adapter

NFS CONTEXT [File System] [Storage System]

Acronym for Network File System.

NIC CONTEXT [Network]

Acronym for network interface card.

NIST CONTEXT [Security]

Acronym for National Institute of Standards and Technology

NL_Port CONTEXT [Fibre Channel]

A port specific to Fibre Channel Arbitrated Loop. An NL_Port has the same functional, logical, and message handling capability as an N_Port, but connects to an arbitrated loop rather than to a fabric. Some implementations can function either as N_Ports or as NL_Ports depending on the network to which they are connected. An NL_Port must replicate frames and pass them on when in passive loop mode. cf. E_Port, F_Port, FL_Port, G_Port, N_Port

node CONTEXT [Network] [Storage System]

An addressable entity connected to an I/O bus or network. Used primarily to refer to computers, storage devices, and storage subsystems. The component of a node that connects to the bus or network is a port.

node name

A Name_Identifier associated with a node.

normal operation normal mode

A state of a system in which the system is functioning within its prescribed operational bounds. For example, when a disk array subsystem is operating in normal mode, all disks are up, no extraordinary actions (e.g., reconstruction) are being performed, and environmental conditions are within operational range. Sometimes called optimal mode.

non-linear mapping CONTEXT [Storage System]

Any form of tabular mapping in which there is not a fixed size correspondence between the two mapped address spaces. Non-linear mapping is required in disk arrays that compress data, since the space required to store a given range of virtual blocks depends on the degree to which the contents of those blocks can be compressed, and therefore changes as block contents change. cf. algorithmic mapping, dynamic mapping, tabular mapping

non-OFC (laser) CONTEXT [Fibre Channel]

A laser transceiver whose lower-intensity output does not require special OFC mechanisms.

non-repeating ordered set CONTEXT [Fibre Channel]

An ordered set passed by FC-2 to FC-1 for transmission which has non-idempotent semantics; i.e., it cannot be retransmitted.

non-repudiation CONTEXT [Security]

Assurance that a subject cannot later deny having performed some action. For communication, this may involve providing the sender of data with proof of delivery and the recipient with proof of the sender's identity, so neither can later deny having participated in the communication. Digital signatures are often used as a non-repudiation mechanism for stored information in combination with timestamps.

non-transparent failover

A failover from one component of a redundant system to another that is visible to the external environment. For example, a controller failover in a redundant disk subsystem is non-transparent if the surviving controller exports the other's virtual disks at different host I/O bus addresses or on a different host I/O bus. cf. transparent failover

Non-Uniform Memory Architecture

A computer architecture that enables memory to be shared by multiple processors, with different processors having different access speeds to different parts of the memory. Abbreviated NUMA.

non-volatile random access memory

Computer system random access memory that has been made impervious to data loss due to power failure through the use of UPS, batteries, or implementation technology such as flash memory. Abbreviated NVRAM.

non-volatility

A property of data. Non-volatility refers to the property that data will be preserved, even if certain environmental conditions are not met. Used to describe data stored on disks or tapes. If electrical power to these devices is cut, data stored on them is nonetheless preserved.

not operational (receiver or transmitter) CONTEXT [Fibre Channel]

A receiver or transmitter that is not capable of receiving or transmitting an encoded bit stream based on rules defined by FC-PH for error control. For example, FC-1 is not operational during initialization.

NUMA

Acronym for Non-Uniform Memory Architecture.

NVRAM

Acronym for non-volatile random access memory.

NVRAM cache

A quantity of NVRAM used as a cache. NVRAM cache is particularly useful in RAID array subsystems, filers, database servers, and other intelligent devices that must keep track of the state of multi-step I/O operations even if power fails during the execution of the steps.

NVRAM card

A printed circuit module containing NVRAM.

object CONTEXT [Security]

In the context of access control, an entity to which access is controlled and/or usage of which is restricted to authorized subjects. Information system resources are often examples of objects.

object oriented (methodology)

A methodology for decomposing an entity or problem by its key abstractions, versus by its procedures or steps. The key abstractions become classes in an information or data model, and embody well-defined behaviors called methods, with a unique set of data attributes. Instances of a class are called objects. Abbreviated OO.

OC-n CONTEXT [Network]

A data rate that is a multiple of the fundamental SONET rate of 51.84 Mbits/sec. OC-3 (155 Mbits/sec), OC-12 (622 Mbits/sec), OC-48 (2488 Mbits/sec) and OC-192 (9953 Mbits/sec) are currently in common use. cf. Asynchronous Transfer Mode

OFC CONTEXT [Fibre Channel]

Acronym for open fibre control.

offline backup CONTEXT [Backup]

A form of backup in which the data being backed up is not accessed by applications for the duration of the backup.

online backup CONTEXT [Backup]

A form of backup in which the data being backed up may be accessed by applications during the backup. Online backup of a set of data is usually accomplished through the use of a frozen image of the data.

OO

Acronym for object oriented.

open CONTEXT [General] [Fibre Channel]

[General] Any system or aspect of a system whose function is governed by a readily accessible standard rather than by a privately owned specification.

[Fibre Channel] A period of time that begins when a sequence or exchange is initiated and ends when the sequence or exchange is normally or abnormally terminated.

[General] Not electrically terminated, as an unplugged cable.

open fibre control

A safety interlock system that limits the optical power level on an open optical fibre cable.

Open Group, the

A cross-industry consortium for open systems standards and their certification. UNIX, management and security standards are developed within the Open Group. The Open Group's web site is at www.opengroup.org.

open interconnect

Synonym for standard interconnect.

operating environment

A collective term for the hardware architecture and operating system of a computer system.

operation CONTEXT [Fibre Channel]

An FC-2 construct that encapsulates one or more possibly concurrent exchanges into a single abstraction for higher level protocols.

Operation_Associator CONTEXT [Fibre Channel]

A value used in the Association_Header to identify a specific operation within a node and correlate communicating processes related to that operation. The Operation_Associator is a handle by which an operation within a given node is referred to by another communicating Node. Operation_Associator is a generic reference to Originator Operation_Associator and Responder Operation_Associator.

operational (state) CONTEXT [Fibre Channel]

The state of a receiver or transmitter that is capable of receiving or transmitting an encoded bit stream based on the rules defined by FC-PH for error control. Those receivers capable of accepting signals from transmitters requiring laser safety procedures are not considered operational after power on until a signal of a duration longer than that associated with laser safety procedures is present at the fibre attached to the receiver.

optical fall time

The time interval required for the falling edge of an optical pulse to transition between specified percentages of the signal amplitude. For lasers the transitions are measured between the 80% and 20% points. For LED media the specification points are 90% and 10%.

optional (characteristic) CONTEXT [Standards]

Characteristics of a standard that are specified by the standard but not required for compliance. If an optional characteristic of a standard is implemented, it must be implemented as defined in the standard.

ordered set CONTEXT [Fibre Channel]

A transmission word (sequence of four 10-bit code bytes) with a special character in its first (leftmost) position and data characters in the remaining three positions. An ordered set is represented by the combination of special codes and data bytes which, when encoded, result in the generation of the transmission characters specified for the ordered set. Ordered sets are used for low-level Fibre Channel link functions such as frame demarcation, signaling between the ends of a link, initialization after power on, and some basic recovery actions.

originator CONTEXT [Fibre Channel]

The party initiating an exchange.

Originator Exchange_Identifier CONTEXT [Fibre Channel]

An identifier assigned by an originator to identify an exchange. Abbreviated OX_ID. An OX_ID is meaningful only to its originator.

overwrite procedure CONTEXT [Security]

The process of writing patterns of data on top of the data stored on a magnetic medium for the purpose of obliterating the data.

out-of-band (transmission) CONTEXT [Fibre Channel]

Transmission of management information for Fibre Channel components outside of the Fibre Channel network, typically over Ethernet.

out-of-band virtualization

Virtualization functions or services that are not in the data path. Examples are functions related to metadata, the management of data or storage, security management, backup of data, etc.

OX_ID CONTEXT [Fibre Channel]

Acronym for Originator Exchange_Identifier.

packet

definition coming soon

panic

A colloquial term describing a software program's reaction to an incomprehensible state. In an operating system context, a panic is usually a system call or unexpected state that causes the system to abruptly stop executing so as to eliminate the possibility that the cause of the panic will cause further damage to the system, applications, or data.

parallel access array CONTEXT [Storage System]

A disk array model in which data transfer and data protection algorithms assume that all member disks operate in unison, with each participating in the execution of every application I/O request. A parallel access array is only capable of executing one I/O request at a time. True parallel access would require that an array's disks be rotationally synchronized. In actual practice, arrays approximate parallel access behavior. Ideal RAID Level 2 and RAID Level 3 arrays are parallel access arrays. cf. independent access array

parallel (transmission)

Simultaneous transmission of multiple data bits over multiple physical lines.

parity disk CONTEXT [Storage System]

In a RAID Level 3 or 4 array, the single disk on which the parity check data is stored.

parity RAID CONTEXT [Storage System]

A collective term used to refer to Berkeley RAID Levels 3, 4, and 5.

parity RAID array CONTEXT [Storage System]

A RAID array whose data protection mechanism is one of Berkeley RAID Levels 3, 4, and 5.

partition CONTEXT [Storage System]

A subdivision of the capacity of a physical or virtual disk. Partitions are consecutively numbered ranges of blocks that are recognized by MS-DOS, Windows, and most UNIX operating systems. Synonym for the type of extent used to configure arrays. A contiguously addressed range of logical blocks on a physical media that is identifiable by an operating system via the partition's type and subtype fields. A partition's type and subtype fields are recorded on the physical media and hence make the partition self-identifying.

partitioning CONTEXT [Storage System]

Presentation of the usable storage capacity of a disk or array to an operating environment in the form of several virtual disks whose aggregate capacity approximates that of the underlying physical or virtual disk. Partitioning is common in MS-DOS, Windows, and UNIX environments. Partitioning is useful with hosts which cannot support the full capacity of a large disk or array as one device. It can also be useful administratively—for example, to create hard subdivisions of a large virtual disk.

passive copper CONTEXT [Fibre Channel]

A low-cost Fibre Channel connection that allows up to 13-meter copper cable lengths.

passphrase CONTEXT [Security]

A sequence of characters longer than the acceptable length of a password that is transformed by a password system into a virtual password of acceptable length.

password CONTEXT [Security]

A protected private alphanumeric string used to authenticate an identity or to authorize access to data.

path

The access path from a host computer to a storage device.

The combination of device address and file system directory elements used to locate a file within a file system.

Any route through an interconnect that allows two devices to communicate.

A sequence of computer instructions that performs a given function, such as I/O request execution.

path length CONTEXT [General] [Backup] [File System]

[General] The number of instructions (a rough measure of the amount of time) required by a computer to perform a specific activity, such as I/O request execution.

[Backup] [File System] The number of characters in a path name.

path name CONTEXT [File System]

The complete list of nested sub-directories through which a file is reached.

payload CONTEXT [Fibre Channel] [Network]

Contents of the data field of a communications frame or packet. In Fibre Channel, the payload excludes optional headers and fill bytes, if they are present.

PB PByte

Acronym for petabyte (10^15 bytes).

PBC CONTEXT [Fibre Channel]

Acronym for port bypass circuit.

PCI

Acronym for Peripheral Component Interconnect.

PDC CONTEXT [Windows]

Acronym for primary domain controller.

pcnfsd

A daemon that permits personal computers to access file systems via the NFS protocol.

penetration CONTEXT [Security]

An unauthorized bypassing of the security mechanisms of a system.

Peripheral Component Interconnect

A bus for connecting interface modules to a computer system. Abbreviated PCI. Variations of PCI support 32- and 64-bit parallel data transfers at 33- and 66-MHz cycle times. A 133-MHz PCI-X has been proposed by Compaq, HP, and IBM.

permanent virtual circuit CONTEXT [Fibre Channel]

definition coming soon

persistence

Synonym for non-volatility. Usually used to distinguish between data and metadata held in DRAM, which is lost when electrical power is lost, and data held on non-volatile storage (disk, tape, battery-backed DRAM, etc.) that survives, or persists, across power outages.

physical configuration

The installation, removal, or re-installation of disks, cables, HBAs, and other components required for a system or subsystem to function. Physical configuration is typically understood to include address assignments, such as PCI slot number, SCSI target ID and logical unit number, etc. cf. array configuration, configuration

physical block CONTEXT [Storage Device]

A physical area on recording media at which data is stored. Distinguished from the logical and virtual block views typically presented to the operating environment by storage devices.

physical block address

The address of a physical block. A number that can be algorithmically converted to a physical location on storage media.

physical disk CONTEXT [Storage System] [Operating System]

[Storage System] A disk. Used to emphasize a contrast with virtual disks. [Operating System] A host operating system's view of an online storage device.

physical extent CONTEXT [Storage System]

A number of consecutively addressed blocks on a physical disk. Physical extents are created by control software as building blocks from which redundancy groups and volume sets are created. Called p_extent by ANSI.

physical extent block number CONTEXT [Storage System]

The relative position of a block within a physical extent. Physical extent block numbers are used to develop higher-level constructs in RAID array striped data mapping, not for application or data addressing.

PKI CONTEXT [Security]

Acronym for public key infrastructure.

plaintext CONTEXT [Security]

Unencrypted information.

PLDA CONTEXT [Fibre Channel]

Acronym for private loop direct attach.

PLOGI CONTEXT [Fibre Channel]

Acronym for port login.

point-in-time copy CONTEXT [Backup]

A fully usable copy of a defined collection of data that contains an image of the data as it appeared at a single point in time. The copy is considered to have logically occurred at that point in time, but implementations may perform part or all of the copy at other times (e.g., via database log replay or rollback) as long as the result is a consistent copy of the data as it appeared at that point in time. Implementations may restrict point in time copies to be read-only or may permit subsequent writes to the copy. Three important classes of point in time copies are split mirror, changed block, and concurrent. Pointer remapping and copy on write are implementation techniques often used for the latter two classes. cf. snapshot

pointer copy CONTEXT [Backup]

A point in time copy made using the pointer remapping technique.

pointer remapping CONTEXT [Backup]

A technique for maintaining a point in time copy in which pointers to all of the source data and copy data are maintained. When data is overwritten, a new location is chosen for the updated data, and the pointer for that data is remapped to point to it. If the copy is read-only, pointers to its data are never modified. cf. copy on write
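
The following minimal Python sketch illustrates the technique described above; the dictionary-based block map and the class and method names are purely illustrative, not an actual implementation.

    # Illustrative sketch of pointer remapping for a read-only point in time copy.
    # A real implementation maintains on-disk metadata, not Python dictionaries.
    class PointerRemappedCopy:
        def __init__(self, source_blocks):
            self.store = dict(source_blocks)                  # physical block -> data
            self.source_map = {n: n for n in source_blocks}   # source pointers
            self.copy_map = dict(self.source_map)             # the copy shares the pointers
            self.next_free = max(source_blocks) + 1

        def write_source(self, block, data):
            # Updated data goes to a newly chosen location; only the source pointer
            # is remapped, so the copy's pointer still references the original data.
            self.store[self.next_free] = data
            self.source_map[block] = self.next_free
            self.next_free += 1

        def read_copy(self, block):
            return self.store[self.copy_map[block]]

    vol = PointerRemappedCopy({0: "A", 1: "B"})
    vol.write_source(0, "A2")
    assert vol.read_copy(0) == "A"    # the point in time copy is unchanged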

policy processor

In an intelligent device, the processor that schedules the overall activities. Policy processors are usually augmented by additional processors, state machines, or sequencers that perform the lower-level functions required to implement overall policy.

port

An I/O adapter used to connect an intelligent device (node) to an I/O bus or network.

[storage subsystems] Synonym for the head end of a device I/O bus containing the arbitration logic.

Port_ID CONTEXT [Fibre Channel]

A unique 24-bit address assigned to an N_Port. There may be at most 2^24, or about 16.7 million, N_Ports in a single Fibre Channel fabric. There may be at most 127 NL_Ports in a single loop. For a point-to-point (N_Port to N_Port) connection, there are only two. In some implementations, a device's Port_ID is derived from its World Wide Name. In other cases Port_IDs are permanently assigned in association with specific physical ports. Port_IDs may or may not survive a loop initialization process or, in the case of a switched fabric, a reconfiguration of the Fibre Channel switch.
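
As an illustration of the 24-bit address space, the short Python sketch below splits a fabric-assigned Port_ID into the 8-bit Domain, Area, and Port fields commonly used by switched fabrics; the example value is hypothetical.

    # Split a 24-bit Port_ID into its Domain, Area, and Port fields (8 bits each).
    def split_port_id(port_id):
        assert 0 <= port_id < 2**24          # at most 2^24 (about 16.7 million) addresses
        domain = (port_id >> 16) & 0xFF      # identifies the switch
        area = (port_id >> 8) & 0xFF         # identifies a group of ports on the switch
        port = port_id & 0xFF                # identifies the port (the AL_PA on a public loop)
        return domain, area, port

    print(split_port_id(0x010200))           # -> (1, 2, 0)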

port bypass circuit CONTEXT [Fibre Channel]

A circuit that automatically opens and closes a Fibre Channel arbitrated loop so that nodes can be added to or removed from the loop with minimal disruption of operations. Port bypass circuits are typically found in Fibre Channel hubs and disk enclosures.

port login CONTEXT [Fibre Channel]

The port-to-port login process by which Fibre Channel initiators establish sessions with targets.

port name CONTEXT [Fibre Channel]

A unique 64-bit identifier assigned to a Fibre Channel port.

POST

Acronym for power on self test.

power conditioning

The regulation of power supplied to a system so that acceptable ranges of voltage and frequency are maintained. Power conditioning is sometimes done by a storage subsystem, but may also be an environmental requirement.

power on self test

A set of internally stored diagnostic programs run by intelligent devices when powered on. Abbreviated POST. These diagnostic programs verify the basic integrity of hardware before software is permitted to run on it.

present (verb)

To cause to appear or to make available. For example, RAID control software and volume managers present virtual disks to host environments. Synonym for export.

primary domain controller CONTEXT [Windows]

A domain controller that has been assigned as or has negotiated to become the primary authentication server for the domain of which it is a part.

primitive sequence CONTEXT [Fibre Channel]

An ordered set transmitted repeatedly and continuously until a specified response is received.

primitive signal CONTEXT [Fibre Channel]

An ordered set with a special meaning such as an idle or Receiver_Ready (R_RDY).

private key CONTEXT [Security]

A key which is used in a symmetric cryptosystem in both encryption and decryption processes, or in an asymmetric cryptosystem for one, but not both, of those processes. A private key must remain confidential to the using party if communication security is to be maintained.

private key cryptography CONTEXT [Security]

An encryption methodology in which the encryptor and decryptor use the same key, which must be kept secret. cf. symmetric cryptosystem

private loop CONTEXT [Fibre Channel]

A Fibre Channel arbitrated loop with no fabric attachment.

private loop device CONTEXT [Fibre Channel]

A Fibre Channel arbitrated loop device that does not support fabric login.

private loop direct attach CONTEXT [Fibre Channel]

definition coming soon

process policy CONTEXT [Fibre Channel]

An error handling policy that allows an N_Port to continue processing data frames following detection of one or more missing frames in a sequence.

Process_Associator CONTEXT [Fibre Channel]

A value in the Association_Header that identifies a process or a group of processes within a node. Communicating processes in different nodes use Process_Associators to address each other. Originating processes have Originator Process_Associators; responding processes have Responder Process_Associators.

profile CONTEXT [Standards]

A proper subset of a standard that supports interoperability across a set of products or in a specific application. Profiles exist for FCP (FCSI and PLDA), IP, and other areas. A profile is a vertical slice through a standard containing physical, logical and behavioral elements required for interoperability.

proprietary interconnect proprietary I/O bus CONTEXT [Storage System]

An I/O bus (either a host I/O bus or a device I/O bus) whose transmission characteristics and protocols are the intellectual property of a single vendor, and which require the permission of that vendor to be implemented in the products of other vendors. cf. open interconnect

protected space protected space extent

The storage space available for application data in a physical extent that belongs to a redundancy group.

protocol CONTEXT [Fibre Channel] [Network] [SCSI]

A set of rules for using an interconnect or network so that information conveyed on the interconnect can be correctly interpreted by all parties to the communication. Protocols include such aspects of communication as data representation, data item ordering, message formats, message and response sequencing rules, block data transmission conventions, timing requirements, and so forth.

public key CONTEXT [Security]

A key which is used in an asymmetric cryptosystem for either the encryption or decryption process where the private key is not used, and which can be shared among a group of users without impacting the security of the cryptosystem.

public key cryptography CONTEXT [Security]

An encryption system using a linked pair of keys. What one key of the pair encrypts, the other decrypts. Either key can be used for encryption and decryption. cf. asymmetric cryptosystem

public key infrastructure CONTEXT [Security]

A framework established to issue, maintain, and revoke public key certificates accommodating a variety of security technologies.

public loop CONTEXT [Fibre Channel]

A Fibre Channel arbitrated loop with an attachment to a fabric.

public loop device CONTEXT [Fibre Channel]

A Fibre Channel arbitrated loop device that supports fabric login and services.

pull technology

The transmission of information in response to a request for that information. An example of a pull technology is polling. cf. push technology

push technology

The transmission of information from a source or initiator without the source being requested to send that information. An example of a push technology is an SNMP trap. cf. pull technology

PVC CONTEXT [Fibre Channel]

Acronym for permanent virtual circuit.

QoS

Acronym for quality of service.

quiesce (verb) CONTEXT [Backup]

To bring a device or an application to a state in which (a.) it is able to operate, (b.) all of its data is consistent and stored on non-volatile storage, and (c.) processing has been suspended and there are no tasks in progress (i.e., all application tasks have either been completed or not started).

quiescent state CONTEXT [Backup]

An application or device state in which (a.) the application or device is able to operate, (b.) all of its data is consistent and stored on non-volatile storage, and (c.) processing has been suspended and there are no tasks in progress (i.e., all tasks have either been completed or not started).

quality of service CONTEXT [Management]

A technique for managing computer system resources such as bandwidth by specifying user-visible parameters such as message delivery time. Policy rules are used to describe the operation of network elements to make these guarantees. Relevant standards for QoS in the IETF are the RSVP (Resource Reservation Protocol) and COPS (Common Open Policy Service) protocols. RSVP allows for the reservation of bandwidth in advance, while COPS allows routers and switches to obtain policy rules from a server.

RAID CONTEXT [Storage System]

Acronym for Redundant Array of Independent Disks, a family of techniques for managing multiple disks to deliver desirable cost, data availability, and performance characteristics to host environments.

A Redundant Array of Independent Disks. A phrase adopted from the 1988 SIGMOD paper "A Case for Redundant Arrays of Inexpensive Disks."

RAID 0 RAID Level 0 CONTEXT [Storage System]

Synonym for data striping.

RAID 1 RAID Level 1 CONTEXT [Storage System]

Synonym for mirroring.

RAID 2 RAID Level 2 CONTEXT [Storage System]

A form of RAID in which a Hamming code computed on stripes of data on some of an array's disks is stored on the remaining disks and serves as check data.

RAID 3 RAID Level 3 CONTEXT [Storage System]

A form of parity RAID in which all disks are assumed to be rotationally synchronized, and in which the data strip size is no larger than the exported block size.

RAID 4 RAID Level 4 CONTEXT [Storage System]

A form of parity RAID in which the disks operate independently, the data strip size is no smaller than the exported block size, and all parity check data is stored on one disk.

RAID 5 RAID Level 5 CONTEXT [Storage System]

A form of parity RAID in which the disks operate independently, the data strip size is no smaller than the exported block size, and parity check data is distributed across the array's disks.

RAID 6 RAID Level 6 CONTEXT [Storage System]

Any form of RAID that can continue to execute read and write requests to all of an array's virtual disks in the presence of two concurrent disk failures. Both dual check data computations (parity and Reed Solomon) and orthogonal dual parity check data have been proposed for RAID Level 6.

RAID Advisory Board CONTEXT [Storage System]

An organization of suppliers and users of storage subsystems and related products whose goal is to foster the understanding of storage subsystem technology among users, and to promote all aspects of storage technology in the market. cf. http://www.raid-advisory.com

RAID array CONTEXT [Storage System]

Acronym for Redundant Array of Independent Disks.

RAMdisk CONTEXT [Storage Device]

A quantity of host system random access memory (RAM) managed by software and presented to applications as a high-performance disk. RAMdisks generally emulate disk I/O functional characteristics, but unless augmented by special hardware to make their contents non-volatile, they cannot tolerate loss of power without losing data. cf. solid state disk

random I/O random I/O load random reads random writes CONTEXT [Storage System]

Any I/O load whose consecutively issued read and/or write requests do not specify adjacently located data. The term random I/O is commonly used to denote any I/O load that is not sequential, whether or not the distribution of data locations is indeed random. Random I/O is characteristic of I/O request-intensive applications. cf. sequential I/O

random relative offset CONTEXT [Fibre Channel]

A transmission control algorithm in which the frames containing the subblocks that comprise a block of information may be transmitted in any order. This complicates reassembly and detection of lost frames by comparison with continuously increasing relative offset.

rank CONTEXT [Storage System]

A set of physical disk positions in an enclosure, usually denoting the disks that are or can be members of a single array. The set of corresponding target identifiers on all of a controller's device I/O buses. Like the preceding definition, the disks identified as a rank by this definition usually are or can be members of a single array. Synonym for a stripe in a redundancy group. Because of the diversity of meanings attached to this term by disk subsystem developers, SNIA publications make minimal use of it.

RAS

Acronym for Reliability, Availability, and Serviceability.

Acronym for Remote Access Server (Windows NT dialup networking server).

raw partition

A disk partition not managed by a volume manager. The term raw partition is frequently encountered when discussing database systems because some database system vendors recommend volumes or files for underlying database storage, while others recommend direct storage on raw partitions.

raw partition backup CONTEXT [Backup]

A bit-by-bit copy of a partition image. A raw partition backup incorporates no information about the objects contained on the partition, and hence cannot be used for individual object restoration. cf. disk image backup

read/write head CONTEXT [Storage Device]

The magnetic or optical recording device in a disk. Read/write heads are used both to write data by altering the recording media's state, and to read data by sensing the alterations. Disks typically have read/write heads, unlike tapes, in which reading and writing are often done using separate heads.

real-time copy CONTEXT [Storage System]

Synonym for mirroring.

rebuild rebuilding CONTEXT [Storage System]

The regeneration and writing onto one or more replacement disks of all of the user data and check data from a failed disk in a mirrored or RAID array. In most arrays, a rebuild can occur while applications are accessing data on the array's virtual disks.

receiver CONTEXT [Fibre Channel]

An interconnect or network device that includes a detector and signal processing electronics. The portion of a Link_Control_Facility dedicated to receiving an encoded bit stream, converting the stream into transmission characters, and decoding the characters using the rules specified by FC-PH. A circuit that converts an optical or electrical media signal to a (possibly retimed) electrical serial logic signal.

receptacle

The stationary (female) half of the interface connector on a transmitter or receiver.

reconstruction CONTEXT [Storage System]

Synonym for rebuilding.

recorded volume serial number CONTEXT [Backup]

Synonym for media ID. Abbreviated RVSN.

recovery CONTEXT [Backup]

The re-creation of a past operational state of an entire application or computing environment. Recovery is required after an application or computing environment has been destroyed or otherwise rendered unusable. It may include restoration of application data, if that data had been destroyed as well. cf. restoration

red CONTEXT [Security]

In the context of security analysis, a designation applied to information systems and associated areas, circuits, components, and equipment in which sensitive information is being processed.

red/black concept CONTEXT [Security]

The separation of electrical and electronic circuits, components, equipment and systems that handle sensitive information (red) in electrical form, from those that handle information that is not sensitive (black) in the same form.

reduced mode CONTEXT [Storage System]

Synonym for degraded mode.

reduction CONTEXT [Storage System]

The removal of a member disk from a RAID array, placing the array in degraded mode. Reduction most often occurs because of member disk failure; however, some RAID implementations allow reduction for system management purposes.

redundancy

The inclusion of extra components of a given type in a system (beyond those required by the system to carry out its function) for the purpose of enabling continued operation in the event of a component failure.

redundancy group CONTEXT [Management] [Storage System]

A collection of extents organized for the purpose of providing data protection. Within a redundancy group, a single type of data protection is employed. All of the usable storage capacity in a redundancy group is protected by check data stored within the group, and no usable storage external to a redundancy group is protected by check data within it. A class defined in the CIM schema (CIM_RedundancyGroup) consisting of a collection of objects in which redundancy is provided. Three subclasses of CIM_RedundancyGroup are defined: (1) CIM_SpareGroup for sparing and failover, (2) CIM_ExtraCapacityGroup for load sharing or load balancing, and (3) CIM_StorageRedundancyGroup to describe the redundancy algorithm in use.

redundancy group stripe CONTEXT [Storage System]

A set of sequences of correspondingly numbered physical extent blocks in each of the physical extents comprising a redundancy group. The check data blocks in a redundancy group stripe protect the protected space in that stripe.

redundancy group stripe depth CONTEXT [Storage System]

The number of consecutively numbered physical extent blocks in one physical extent of a redundancy group stripe. In the conventional striped data mapping model, redundancy group stripe depth is the same for all stripes in a redundancy group.

redundant (components)

Components of a system that have the capability to substitute for each other when necessary, as, for example, when one of the components fails, so that the system can continue to perform its function. In storage subsystems, power distribution units, power supplies, cooling devices, and controllers are often configured to be redundant. The disks comprising a mirror set are redundant. A parity RAID array's member disks are redundant, since surviving disks can collectively replace the function of a failed disk.

redundant (configuration, system)

A system or configuration of a system in which failure tolerance is achieved by the presence of redundant instances of all components that are critical to the system's operation.

Redundant Array of Independent Disks CONTEXT [Storage System]

A disk array in which part of the physical storage capacity is used to store redundant information about user data stored on the remainder of the storage capacity. The redundant information enables regeneration of user data in the event that one of the array's member disks or the access path to it fails. Although it does not conform to this definition, disk striping is often referred to as RAID (RAID Level 0).

regeneration CONTEXT [Storage System]

Re-creation of user data from a failed disk in a RAID array using check data and user data from surviving members. Regeneration may also be used to recover data from an unrecoverable media error. Data in a parity RAID array is regenerated by computing the exclusive OR of the contents of corresponding blocks from the array's remaining disks. Data in a RAID Level 6 array is regenerated by choosing the more convenient of two parity algorithms and executing it.
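
A minimal sketch of the exclusive OR computation described above, assuming equal-length blocks read from the surviving members (data and parity); the block contents are illustrative.

    # Regenerate the missing block as the XOR of corresponding blocks from the
    # surviving members of a parity RAID array.
    from functools import reduce

    def regenerate(surviving_blocks):
        return bytes(reduce(lambda a, b: a ^ b, column)
                     for column in zip(*surviving_blocks))

    d0, d1 = b"\x01\x02", b"\x10\x20"
    parity = bytes(a ^ b for a, b in zip(d0, d1))
    assert regenerate([d1, parity]) == d0     # d0 rebuilt after its disk fails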

registered state change notification CONTEXT [Fibre Channel]

A Fibre Channel switch function that allows notification to registered nodes if a change occurs to other specified nodes.

rekeying CONTEXT [Security]

The process of changing the key used for an ongoing communication session.

relative offset CONTEXT [Fibre Channel]

A displacement, expressed in bytes, used to divide a quantity of data into blocks and sub-blocks for transmission in separate frames. Relative offsets are used to reassemble data at the receiver and verify that all data has arrived.
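
As a sketch of the reassembly role described above, the following hypothetical Python function places each frame's payload at its relative offset and checks that the full block arrived; the frame contents are illustrative and assumed non-overlapping.

    # Reassemble a block from (relative_offset, payload) pairs; frames may
    # arrive in any order (see random relative offset).
    def reassemble(frames, total_length):
        buffer = bytearray(total_length)
        received = 0
        for offset, payload in frames:
            buffer[offset:offset + len(payload)] = payload
            received += len(payload)
        if received != total_length:
            raise ValueError("one or more frames are missing")
        return bytes(buffer)

    frames = [(5, b"WORLD"), (0, b"HELLO")]
    assert reassemble(frames, 10) == b"HELLOWORLD"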

relative offset space CONTEXT [Fibre Channel]

A numerical range defined by a sending upper level protocol for an information category. The range starts at zero, representing the upper level-defined origin, and extends to a highest value. Relative offset values are required to lie within the appropriate relative offset space.

removable media

definition coming soon

removable media storage device

A storage device designed so that its storage media can be readily removed and inserted. Tapes, CDROMs, and optical disks are removable media devices.

repeater CONTEXT [Fibre Channel]

A circuit that uses a clock recovered from an incoming signal to generate an outbound signal.

repeating ordered set CONTEXT [Fibre Channel]

An ordered set issued by FC-2 to FC-1 for repeated transmission until a subsequent transmission request is issued by FC-2.

replacement disk CONTEXT [Storage System]

A disk available for use as a member disk, or to replace a failed one, in a RAID array.

replacement unit

A component or collection of components in a system which is always replaced (swapped) as a unit when any part of the collection fails. Abbreviated RU. Replacement units may be field replaceable, or they may require that the system of which they are part be returned to a factory or repair depot for replacement. Field replaceable units may be customer replaceable, or their replacement may require trained service personnel. Typical replacement units in a disk subsystem include disks, controller logic boards, power supplies, cooling devices, and cables. Replacement units may be cold, warm, or hot swappable.

replay attack CONTEXT [Security]

An attack in which a valid data transmission is maliciously or fraudulently repeated, either by the originator or by an adversary who intercepts the data and retransmits it.

replica CONTEXT [Backup]

A general term for a copy of a collection of data. cf. duplicate, point in time copy, snapshot

replicate CONTEXT [Backup]

(noun) A general term for a copy of a collection of data. cf. duplicate, point in time copy, snapshot

(verb) The action of making a replicate as defined above.

Request for Comment CONTEXT [Security]

Internet-related specifications, including standards, experimental definitions, informational documents, and best practice definitions, produced by the IETF.

request-intensive (application)

A characterization of applications. A request-intensive application is an I/O-intensive application characterized by a high rate of I/O requests. Request-intensive applications' I/O requests are usually randomly addressed and often specify a small amount of data for transfer.

reserved (field) CONTEXT [Fibre Channel]

In a standard, a field in a data structure set aside for future definition. Some standards prescribe implementation behavior with respect to reserved fields (e.g., originators of data structures containing reserved fields must zero fill them; consumers of data structures containing reserved fields must ignore them, etc.); others do not. A field filled with binary zeros by a source N_Port and ignored by a destination N_Port. Each bit in a reserved field is denoted by "r" in the Fibre Channel standards. Future enhancements to FC-PH may define usages for reserved fields. Implementations should not check or interpret reserved fields. Violation of this guideline may result in loss of compatibility with future implementations which comply with future enhancements to FC-PH.

resilvering CONTEXT [Backup, Storage Systems]

Synonym for mirror resynchronization.

responder CONTEXT [Fibre Channel]

Synonym for target. Used only in Fibre Channel contexts.

Responder Exchange_Identifier CONTEXT [Fibre Channel]

An identifier assigned by a responder to identify an exchange. Abbreviated RX_ID. An RX_ID is meaningful only to the responder that originates it.

restoration CONTEXT [Backup]

The copying of a backup to online storage for application use. Restoration normally occurs after part or all of an application's data has been destroyed or become inaccessible. cf. recovery

retention period CONTEXT [Backup] [File System]

[Backup] The length of time that a backup image should be kept.

[File System] In some file systems, such as that shipped with IBM Corporation's OS/390 operating system, a property of a file that can be used to implement backup and data migration policies.

retimer CONTEXT [Fibre Channel]

A circuit that uses a clock independent of the incoming signal to generate an outbound signal.

return loss CONTEXT [Fibre Channel]

The ratio of the strength of a returned signal to that of the incident signal that caused it. In electrical circuits, return loss is caused by impedance discontinuities. Optical return loss is caused by index of refraction differences.

RFC CONTEXT [Security]

Acronym for Request for Comment

robot robotic media handler CONTEXT [Backup]

A mechanical handler capable of storing multiple pieces of removable media and loading and unloading them from one or more drives in arbitrary order in response to electronic commands. cf. library

rotational latency CONTEXT [Storage Device]

The interval between the end of a disk seek and the time at which the starting block address specified in the I/O request passes the disk head. Exact rotational latencies for specific sequences of I/O operations can only be obtained by detailed disk drive simulation or measurement. The simplifying assumption that, on average, requests wait for half a disk revolution time of rotational latency works well in practice. Half of a disk revolution time is therefore defined to be the average rotational latency.
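
For example, the half-revolution approximation gives about 4.17 milliseconds of average rotational latency for a hypothetical 7,200 RPM disk:

    rpm = 7200
    revolution_time_ms = 60_000 / rpm       # 8.33 ms per revolution
    average_latency_ms = revolution_time_ms / 2
    print(round(average_latency_ms, 2))     # 4.17 ms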

row CONTEXT [Storage System]

The set of blocks with corresponding physical extent block addresses in each of an array's member physical extents. The concept of rows is useful for locking the minimal amount of data during a RAID array update so as to maximize the potential for parallel execution.

RSA CONTEXT [Security]

Acronym for both a public key algorithm and a corporation in the business of algorithm design, derived from the names of the founders (Rivest, Shamir, and Adleman).

RSCN CONTEXT [Fibre Channel]

Acronym for registered state change notification.

RU

Acronym for replaceable unit. cf. CRU, FRU

run length

The number of consecutive identical bits in a transmitted signal. For example, the pattern 0011111010 has run lengths of 2, 5, 1, 1, and 1.
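
The example in the definition can be reproduced with a few lines of Python:

    # Compute the run lengths of a bit pattern.
    from itertools import groupby

    def run_lengths(bits):
        return [len(list(group)) for _, group in groupby(bits)]

    print(run_lengths("0011111010"))    # [2, 5, 1, 1, 1]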

running disparity CONTEXT [Fibre Channel]

The cumulative disparity (positive or negative) of all previously issued transmission characters.

RVSN CONTEXT [Backup]

Acronym for recorded volume serial number.

S_ID CONTEXT [Fibre Channel]

Acronym for Source_Identifier.

S_Port CONTEXT [Fibre Channel]

A logical port inside a switch addressable by external N_Ports for service functions. An S_Port may be an implicit switch port or a separate entity such as a name server connected to and controlled by the switch. S_Ports have well known port names to facilitate early discovery by N_Ports.

SAN CONTEXT [Fibre Channel] [Network] [Storage System]

Acronym for storage area network. (This is the normal usage in SNIA documents.) Acronym for server area network, which connects one or more servers. Acronym for system area network, an interconnected set of system elements.

SAN attached storage

A term used to refer to storage elements that connect directly to a storage area network and provide file, database, block, or other types of data access services to computer systems. Abbreviated SAS. SAS elements that provide file access services are commonly called network attached storage, or NAS devices. cf. NAS

SAS

Acronym for SAN attached storage.

saturated disk

A disk whose instantaneous I/O load is as great as or greater than its capability to satisfy the requests comprising the load. Mathematically, a saturated disk's I/O queue eventually becomes indefinitely long. In practice, however, user reaction or other system factors generally reduce the rate of new request arrival for a saturated disk.

scale (verb)

In computer systems, to grow or support growth in such a way that all capabilities of the system remain in constant ratio to each other. For example, a storage subsystem whose data transfer capacity increases by the addition of buses as its storage capacity increases by the addition of disks is said to scale.

schema

A collection of information models or data models.

script

A parameterized list of primitive I/O bus operations intended to be executed in sequence. Often used with respect to ports, most of which are able to execute scripts of I/O commands autonomously (without policy processor assistance).

A sequence of instructions intended to be parsed and carried out by another program. Perl, VBScript, JavaScript and Tcl are all scripting languages.

SCSI CONTEXT [SCSI]

Acronym for Small Computer System Interface.

SCSI adapter CONTEXT [SCSI]

An adapter that connects an intelligent device to a SCSI bus. cf. HBA, host bus adapter

SCSI address CONTEXT [SCSI]

The full address used by a computer to communicate with a SCSI device, including an adapter number (required with computers configured with multiple SCSI adapters) and the target ID of the device. SCSI addresses do not include logical unit numbers, because those are not used for communication.

SCSI bus CONTEXT [SCSI]

Any parallel (multi-signal) I/O bus that implements some version of the ANSI SCSI standard. A wide SCSI bus may connect up to 16 initiators and targets. A narrow SCSI bus may connect up to eight initiators and targets. cf. initiator, target

SCSI Enclosure Services CONTEXT [SCSI]

An ANSI X3T10 standard for management of environmental factors such as temperature, power, voltage, etc. Abbreviated SES.

SCSI Parallel Interface CONTEXT [SCSI]

The family of SCSI standards that define the characteristics of the parallel version of the SCSI interface. Abbreviated SPI. Several versions of SPI, known as SPI, SPI2, SPI3, etc., have been developed. Each version provides for greater performance and functionality than preceding ones.

SCSI Trade Association

A trade association incorporated in 1996 to promote all forms of SCSI technology in the market. Abbreviated STA. cf. http://www.scsita.org/

SDH CONTEXT [Network]

Acronym for Synchronous Digital Hierarchy.

sector CONTEXT [Storage Device]

The unit in which data is physically stored and protected against errors on a fixed block architecture disk. A sector typically consists of a synchronization pattern, a header field containing the block's address, data, a checksum or error correcting code, and a trailer. Adjacent sectors are often separated by information used to assist in track centering. Most often, each sector holds a block of data. cf. disk block

secure hash CONTEXT [Security]

An algorithm that generates a digest from its input (e.g., a message). The digest has the properties that different inputs are extraordinarily unlikely to have the same fingerprint, small changes in its input lead to large changes in its output, and it is computationally intractable to generate an input that has the same fingerprint as a given input.
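
The following short Python example uses SHA-256 (chosen here only as a representative secure hash) to show that a one-character change in the input produces a completely different digest:

    import hashlib

    print(hashlib.sha256(b"storage area network").hexdigest())
    print(hashlib.sha256(b"storage area networks").hexdigest())   # entirely different digest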

Secure Sockets Layer CONTEXT [Security]

A suite of cryptographic algorithms, protocols and procedures used to provide security for communications used to access the world wide web. The characters "https:" at the front of a URL cause SSL to be used to enhance communications security. More recent versions of SSL are known as TLS (Transport Level Security) and are standardized by the Internet Engineering Task Force (IETF).

security identifier CONTEXT [Security]

definition coming soon

SEQ_ID Sequence Identifier CONTEXT [Fibre Channel]

A number transmitted with each data frame in a sequence that identifies the frame as part of the sequence.

sequence CONTEXT [Fibre Channel]

A set of Fibre Channel data frames with a common Sequence_ID (SEQ_ID), corresponding to one message element, block, or information unit. Sequences are transmitted from initiator to recipient, with an acknowledgment, if applicable, transmitted from recipient to initiator. cf. Sequence Initiator, Sequence Recipient

sequence initiative CONTEXT [Fibre Channel]

A Fibre Channel protocol feature that designates which end of an exchange has authority to send the next sequence.

Sequence Initiator CONTEXT [Fibre Channel]

An N_Port which initiates a sequence and transmits data frames to a destination N_Port. cf. Sequence Recipient

Sequence Recipient CONTEXT [Fibre Channel]

An N_Port or NL_Port which receives data frames from a Sequence Initiator and, if applicable, transmits responses (link control frames) to the Sequence Initiator.

Sequence Status Block CONTEXT [Fibre Channel]

A data structure which tracks the state of a sequence. Both Sequence Initiators and Sequence Recipients have Sequence Status Blocks for each active sequence.

sequential I/O sequential I/O load sequential reads sequential writes

An I/O load consisting of consecutively issued read or write requests to adjacently located data. Sequential I/O is characteristic of data transfer-intensive applications. cf. random I/O

serdes

Acronym for serializer deserializer.

serial (transmission) CONTEXT [Fibre Channel]

The transmission of data bits one at a time over a single link.

serial adapter

An adapter that connects an intelligent device to an RS232 or RS425 serial communications link. Serial adapters are sometimes used by storage subsystems, filers, and other intelligent devices to connect to serial consoles for management purposes. cf. adapter, host adapter

serial console

A real or emulated communication terminal used by humans to manage an intelligent device. Serial consoles connect to the devices' serial adapters.

serial SCSI CONTEXT [SCSI]

Any implementation of SCSI that uses serial data transmission (as opposed to multi-conductor parallel buses). Optical and electrical Fibre Channel, SSA and IEEE 1394 are examples of serial SCSI implementations.

serializer deserializer

A mechanism for converting data from parallel to serial form and from serial to parallel form.

server

An intelligent device, usually a computer, that provides services to other intelligent devices, usually other computers or appliances. cf. client

An asymmetric relationship with a second party (a client) in which the client initiates requests and the server responds to those requests.

server based virtualization

Virtualization implemented in a host computer rather than in a storage subsystem or storage appliance. Virtualization can be implemented either in host computers, in storage subsystems or storage appliances, or in specific virtualization appliances in the storage interconnect fabric.

serverless backup

A disk backup methodology in which either the disk being backed up or the tape device receiving the backup manages and performs actual backup I/O operations. Server-free backup frees the LAN server to perform I/O operations on behalf of LAN clients and reduces the number of trips the backup data takes through processor memory. Differentiated from LAN-free backup in that no additional SAN appliance is required to offload backup I/O operations from the LAN server.

Server Message Block (protocol) CONTEXT [Network]

A network file system access protocol designed and implemented by Microsoft Corporation and used by Windows clients to communicate file access requests to Windows servers. Abbreviated SMB. Current versions of the SMB protocol are usually referred to as CIFS, the Common Internet File System.

service incident standard

definition coming soon

service level agreement

An agreement between a service provider, such as an IT department, an internet services provider, or an intelligent device acting as a server, and a service consumer. A service level agreement defines parameters for measuring the service, and states quantitative values for those parameters. Abbreviated SLA.

SES CONTEXT [SCSI] [Standards]

Acronym for SCSI Enclosure Services

Acronym for Solution Exchange Standard.

share CONTEXT [File System]

A resource such as data or a printer device made available for use by users on other computer systems. For example, a printer or a collection of files stored in a single directory tree on a file server may be made available as a share. CIFS clients, which include most networked personal computers, typically map a share to a drive letter.

shielded enclosure CONTEXT [Security]

A room or container designed to attenuate electromagnetic radiation.

shelf CONTEXT [Storage System]

A modular enclosure for storage devices (disks and tapes). Storage shelves usually contain power supplies and cooling devices, and have pre-wired backplanes that carry power and I/O bus signals to the devices mounted in them. cf. canister

SIA

Acronym for Semiconductor Industries Association.

Acronym for SCSI Industry Association.

simple name server CONTEXT [Fibre Channel] [Network]

A service provided by a Fibre Channel switch that simplifies discovery of devices attached to the fabric.

SID

Acronym for security identifier.

Simple Network Management Protocol CONTEXT [Network] [Standards]

An IETF protocol for monitoring and managing systems and devices in a network. The data being monitored and managed is defined by a MIB. The functions supported by the protocol are the request and retrieval of data, the setting or writing of data, and traps that signal the occurrence of events.

single (component) configuration

A configuration in which the referenced component is not redundant. cf. redundant (component)

single ended (signaling) CONTEXT [SCSI]

An electrical signaling technique in which all control and data signals are represented by a voltage difference from a common ground. cf. differential

single mode (fiber optic cable) CONTEXT [Fibre Channel] [Network]

A fiber optic cabling specification that provides for up to a 10-kilometer distance between devices.

single point of failure

One component or path in a system, the failure of which would make the system inoperable. Abbreviated SPOF.

SIS CONTEXT [Management] [Standards]

Acronym for service incident standard.

SLA

Acronym for service level agreement.

Small Computer System Interface (SCSI) CONTEXT [SCSI]

A collection of ANSI standards and proposed standards which define I/O buses primarily intended for connecting storage subsystems or devices to hosts through host bus adapters. Originally intended primarily for use with small (desktop and desk-side workstation) computers, SCSI has been extended to serve most computing needs, and is arguably the most widely implemented I/O bus in use today.

small read request small write request small I/O request

An I/O, read, or write request that specifies the transfer of a relatively small amount of data. "Small" usually depends on the context, but most often refers to 8 KBytes or fewer. cf. large I/O request

SMB CONTEXT [File System] [Network]

Acronym for Server Message Block.

SMI CONTEXT [Fibre Channel] [Management] [Network]

Acronym for Structure of Management Information.

SMPTE CONTEXT [Standards]

Acronym for Society of Motion Picture and Television Engineers.

snapshot CONTEXT [Backup] [Storage System]

A fully usable copy of a defined collection of data that contains an image of the data as it appeared at the point in time at which the copy was initiated. A snapshot may be either a duplicate or a replicate of the data it represents. The CIM_Snapshot class. An optional construct that can be used to represent a storage extent that contains either a full copy of another storage extent or the changes to that extent (in the case of a delta before or delta after copy). A CIM snapshot is not equivalent to a volume- or file-based snapshot, or a point in time copy. It represents storage used to hold a copied image of an extent, or to hold changes to an extent.

SNIA CONTEXT [Network] [Standards] [Storage System]

Acronym for Storage Networking Industry Association.

sniffer CONTEXT [Security]

A software tool for auditing and identifying network traffic packets.

SNMP CONTEXT [Network] [Management]

Acronym for Simple Network Management Protocol.

SNS CONTEXT [Network]

Acronym for simple name server.

Society of Motion Picture and Television Engineers CONTEXT [Standards]

An industry association whose goal is to standardize television and motion picture industry information interchange protocols.

soft zone CONTEXT [Fibre Channel]

A zone consisting of zone members that are permitted to communicate with each other via the fabric.

Soft zones are typically implemented through a combination of name server and Fibre Channel protocol—when a port contacts the name server, the name server returns information only about Fibre Channel ports in the same zone(s) as the requesting port. This prevents ports outside the zone(s) from being discovered and hence the Fibre Channel protocol will not attempt to communicate with such ports. In contrast to hard zones, soft zones are not enforced by hardware; e.g., a frame that is erroneously addressed to a port that should not receive it will nonetheless be delivered. Well-known addresses are implicitly included in every zone. cf. zone, hard zone
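
A conceptual sketch of the name server filtering described above, assuming a simple in-memory zone table; the zone and port names are hypothetical.

    # A name server query returns only ports that share at least one zone with
    # the requesting port; other ports are simply never reported, so the
    # requester does not attempt to communicate with them.
    zones = {"zone_a": {"port1", "port2"}, "zone_b": {"port2", "port3"}}

    def name_server_query(requesting_port):
        visible = set()
        for members in zones.values():
            if requesting_port in members:
                visible |= members
        return visible - {requesting_port}

    print(name_server_query("port1"))    # {'port2'}; port3 is not discovered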

SOF CONTEXT [Fibre Channel]

Acronym for start of frame.

solicited control CONTEXT [Fibre Channel]

An information category indicated in a Fibre Channel frame header.

solicited data CONTEXT [Fibre Channel]

An information category indicated in a Fibre Channel frame header.

solid state disk CONTEXT [Storage Device]

A disk whose storage capability is provided by solid state random access memory rather than magnetic or optical media. A solid state disk generally offers very high access performance compared to that of rotating magnetic disks, because it eliminates mechanical seek and rotation time. It may also offer very high data transfer capacity. Cost per byte of storage, however, is typically quite high, and volumetric density is lower. A solid state disk includes some mechanism such as battery backup or magnetic backing store that allows its operating environment to treat it as non-volatile storage. cf. RAMdisk

Solution Exchange Standard CONTEXT [Management]

A DMTF standard that defines the exchange of support or help desk information.

Service Incident Standard CONTEXT [Management]

A DMTF standard that defines how a support or help desk incident is processed.

SONET CONTEXT [Network]

Acronym for Synchronous Optical Network.

Source Identifier CONTEXT [Fibre Channel]

A number in a Fibre Channel frame that identifies the source of the frame. Abbreviated S_ID. cf. D_ID

source N_Port CONTEXT [Fibre Channel]

The N_Port from which a frame is transmitted.

spare (disk, extent) CONTEXT [Storage System]

An object reserved for the purpose of substitution for a like object in case of that object's failure.

special character CONTEXT [Fibre Channel]

Any transmission character that is valid in the transmission code but does not correspond to a valid data byte. Special characters are used to denote special functions.

special code CONTEXT [Fibre Channel]

A code which, when encoded using the rules specified by the transmission code, results in a special character. Special codes are typically associated with control signals related to protocol management (e.g., K28.5).

SPI

Acronym for SCSI Parallel Interface.

spiral data transfer rate

Synonym for full volume transfer rate.

split I/O request

An I/O request to a virtual disk which requires two or more I/O operations to satisfy, because the virtual data addresses in the request map to more than one extent on one or more disks.

An application I/O request that is divided into two or more sub-requests by a file system or other operating system component because the amount of data requested is too large for the operating environment to handle as a unit.

split mirror split mirror copy split mirror point in time copy CONTEXT [Storage System]

Any of a class of point in time copy implementations or the resulting copies in which the storage for the copy is synchronized to the source of the copy and then split. A split mirror copy occupies as much storage as the source of the copy.

CONTEXT [Backup]

A method for generating a frozen image of a set of data. A split mirror frozen image is a set of storage devices containing a complete copy of data as of the moment of frozen image creation. When a split mirror frozen image has served its purpose, the contents of the storage devices it occupies must be resynchronized with the original data from which it was split.

SPOF

Acronym for single point of failure.

spoofing CONTEXT [Security]

Unauthorized use of legitimate identification and authentication data to mimic a subject different from the attacker. Impersonating, masquerading, piggybacking and mimicking are forms of spoofing.

SR CONTEXT [Fibre Channel]

Acronym for Sequence Recipient.

SRM CONTEXT [Management]

Acronym for storage resource management.

STA

Acronym for SCSI Trade Association.

stand-alone drive CONTEXT [Backup]

A removable media drive that is not associated with a media stacker or robot.

standard interconnect CONTEXT [Standards]

An I/O or network interconnect whose specifications are readily available to the public, and which can therefore easily be implemented in a vendor's products without license or royalty payments. Also called open interconnect.

star

A physical network configuration in which every node is connected directly to, and only to, a central point. All communications pass through the central point, which may be a hub or a switch.

start of frame CONTEXT [Fibre Channel]

A group of ordered sets that delineates the beginning of a frame.

storage area network CONTEXT [Fibre Channel] [Network] [Storage System]

A network whose primary purpose is the transfer of data between computer systems and storage elements and among storage elements. Abbreviated SAN. A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust. The term SAN is usually (but not necessarily) identified with block I/O services rather than file access services. A storage system consisting of storage elements, storage devices, computer systems, and/or appliances, plus all control software, communicating over a network. Note: The SNIA definition specifically does not identify the term SAN with Fibre Channel technology. When the term SAN is used in connection with Fibre Channel technology, use of a qualified phrase such as "Fibre Channel SAN" is encouraged. According to this definition an Ethernet-based network whose primary purpose is to provide access to storage elements would be considered a SAN. SANs are sometimes also used for system interconnection in clusters.

storage array CONTEXT [Storage System]

A collection of disks or tapes from one or more commonly accessible storage subsystems, combined with a body of control software.

storage controller CONTEXT [Storage System]

An intelligent controller to which storage devices are attached.

storage device CONTEXT [Storage Device]

A collective term for disks, tapes, disk arrays, tape arrays, and any other mechanisms capable of non-volatile data storage. This definition is specifically intended to exclude aggregating storage elements such as RAID array subsystems, robotic tape libraries, filers, and file servers.

storage device virtualization

Virtualization of storage devices such as disk, tape drive, RAID shelves, etc.

storage domain CONTEXT [Storage System]

A collection of storage resources and supporting software and interfaces that are managed as a unit.

storage element

Any device designed and built primarily for the purpose of persistent data storage and delivery. This definition is specifically intended to encompass disk drives, tape drives, RAID array subsystems, robotic tape libraries, filers, file servers, and any other types of storage devices.

storage extent

A CIM object called CIM_StorageExtent. A storage extent instance may represent either removable or non-removable media. cf. extent

storage networking

The practice of creating, installing, administering, or using networks whose primary purpose is the transfer of data between computer systems and storage elements and among storage elements.

Storage Networking Industry Association CONTEXT [Network] [Standards] [Storage System]

An association of producers and consumers of storage networking products whose goal is to further storage networking technology and applications.

storage resource management CONTEXT [Management]

Management of physical and logical storage resources, including storage elements, storage devices, appliances, virtual devices, disk volume and file resources.

storage subsystem CONTEXT [Storage System]

An integrated collection of (a.) storage controllers and/or host bus adapters, (b.) storage devices such as disks, CDROMs, tapes, media loaders and robots, and (c.) any required control software, that provides storage services to one or more computers.

storage subsystem virtualization

The implementation of virtualization in a storage subsystem.

storage virtualization

The act of abstracting, hiding, or isolating the internal function of a storage (sub) system or service from applications, compute servers or general network resources for the purpose of enabling application and network independent management of storage or data.

The application of virtualization to storage services or devices for the purpose of aggregating, hiding complexity or adding new capabilities to lower level storage resources. Storage can be virtualized simultaneously in multiple layers of a system, for instance to create HSM like systems.

storage volume

In CIM, a StorageVolume is a subclass of CIM_StorageExtent and represents an object presented to an operating system (for example, by a hardware RAID cabinet), to a file system (for example, by a software volume manager) or to another entity. Storage volumes do NOT participate in CIM_StorageRedundancyGroups. They are directly realized in hardware or are the end result of assembling and building on lower level extents.

store and forward (switching) CONTEXT [Fibre Channel] [Network]

A switching technique that requires buffering an entire frame before a routing decision is made.

streamed sequence CONTEXT [Fibre Channel]

A new sequence initiated by a Sequence Initiator in any class of service for an exchange while it already has sequences open for that exchange.

strip CONTEXT [Storage System]

A number of consecutively addressed blocks in a single extent. A disk array uses strips to map virtual disk block addresses to member disk block addresses. Also known as stripe element.

strip size CONTEXT [Storage System]

Synonym for stripe depth.

stripe CONTEXT [Storage System]

The set of strips at corresponding locations of each member extent of a disk array which uses striped data mapping. The strips in a stripe are associated with each other in a way (e.g., relative extent block addresses) that allows membership in the stripe to be quickly and uniquely determined by a computational algorithm. Parity RAID uses stripes to map virtual disk block addresses to member extent block addresses.

stripe depth CONTEXT [Storage System]

The number of blocks in a strip in a disk array which uses striped data mapping. Also, the number of consecutively addressed virtual disk blocks mapped to consecutively addressed blocks on a single member extent of a disk array.

stripe element CONTEXT [Storage System]

Synonym for strip.

stripe size CONTEXT [Storage System]

The number of blocks in a stripe. A striped array's stripe size is the stripe depth multiplied by the number of member extents. A parity RAID array's stripe size is the stripe depth multiplied by one less than the number of member extents.

striped array striped disk array CONTEXT [Storage System]

A disk array with striped data mapping but no redundancy for failure protection. Striped arrays are usually used to improve I/O performance on data that is of low value or easily replaced.

stripeset CONTEXT [Storage System]

Synonym for striped array.

striping CONTEXT [Storage System]

Short for data striping; also known as RAID Level 0 or RAID 0. A mapping technique in which fixed-size consecutive ranges of virtual disk data addresses are mapped to successive array members in a cyclic pattern. A network technique for aggregating the bandwidth of several links between the same pair of nodes. A single data stream can be spread across the links for higher aggregate bandwidth. Sometimes called port aggregation.
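
The cyclic mapping can be sketched as follows, assuming a stripe depth expressed in blocks; the parameters are illustrative.

    # Map a virtual disk block address to (member, member block) for a striped
    # (RAID 0) array with the given stripe depth and number of members.
    def map_virtual_block(virtual_block, stripe_depth, members):
        blocks_per_stripe = stripe_depth * members
        stripe_number = virtual_block // blocks_per_stripe
        offset_in_stripe = virtual_block % blocks_per_stripe
        member = offset_in_stripe // stripe_depth
        member_block = stripe_number * stripe_depth + offset_in_stripe % stripe_depth
        return member, member_block

    # With a stripe depth of 4 blocks and 3 members, virtual block 13 maps to
    # member 0, block 5 (the second block of member 0's strip in the second stripe).
    print(map_virtual_block(13, 4, 3))    # -> (0, 5)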

Structure of Management Information CONTEXT [Fibre Channel] [Management] [Network]

A notation for setting or retrieving management variables over SNMP.

SSL CONTEXT [Security]

Acronym for Secure Sockets Layer

subdirectory CONTEXT [File System]

A directory in a hierarchical directory tree whose parent is a directory.

subject CONTEXT [Security]

In the context of access control or authorization, an entity whose access or usage is controlled. Users are examples of subjects.

substitution

The assumption of a component's function in a system by a functionally equivalent component.

SVC CONTEXT [Network]

Acronym for switched virtual circuit.

swap swapping

The installation of a replacement unit in place of a defective unit in a system. Units are any parts of a system which may be either field replaceable (FRUs) by a vendor service representative or consumer replaceable (CRUs). A physical swap operation may be cold, warm, or hot, depending on the state in which the disk subsystem must be in order to perform it. A functional swap operation may be an auto swap or it may require human intervention.

switch CONTEXT [Fibre Channel] [Network]

A network infrastructure component to which multiple nodes attach. Unlike hubs, switches typically have internal bandwidth that is a multiple of link bandwidth, and the ability to rapidly switch node connections from one to another. A typical switch can accommodate several simultaneous full link bandwidth transmissions between different pairs of nodes. cf. hub

switch-back

Synonym for failback.

switch-over


Synonym for failover.

switched over (system)

Synonym for failed over.

switched virtual circuit

A virtual circuit that is established on demand and released when no longer needed, as contrasted with a permanent virtual circuit. Abbreviated SVC.

symmetric cryptosystem CONTEXT [Security]

A cryptographic algorithm in which the same key is used to encrypt and decrypt a single message or block of stored information. Keys used in a symmetric cryptosystem must be kept secret.

symmetric virtualization

Synonym for in-band virtualization. In-band virtualization is the preferred term.

Synchronous Digital Hierarchy CONTEXT [Network]

An ISO standard with 155, 622, 2488, and 9953 Mbps serial data rates in steps of 4. Abbreviated SDH. A common worldwide telecommunications methodology. SDH uses a light scrambling of data to remove only the lowest frequency elements with the goal of achieving maximum digital bandwidth use.

synchronization CONTEXT [Fibre Channel]

A receiver's identification of a transmission word boundary.

The act of aligning or making two entities equivalent at a specified point in time.

synchronous operations

Operations which have a fixed time relationship to each other. Most commonly used to denote I/O operations which occur in time sequence, i.e., a successor operation does not occur until its predecessor is complete.

Synchronous Optical Network CONTEXT [Network]

A standard for optical network elements. Abbreviated SONET. SONET provides modular building blocks, fixed overheads, integrated operations channels, and flexible payload mappings. Basic SONET provides a bandwidth of 51.840 megabits/second. This is known as OC-1. Higher bandwidths that are n times the basic rate are available (known as OC-n). OC-3, OC-12, OC-48, and OC-192 are currently in common use.

system board

A printed circuit module containing mounting devices for processor(s), memory, and adapter cards, and implementing basic computer functions such as memory access, processor and I/O bus clocking, and human interface device attachment.

system disk

The disk on which a computer system's operating software is stored. The system disk is usually the disk from which the operating system is bootstrapped (initially loaded into memory). The system disk frequently contains the computer system's swap and/or page files as well. It may also contain libraries of common software shared among several applications.

system under test

An entity being tested to verify functional behavior or determine performance characteristics. Distinguished from test system.

T1 copy CONTEXT [Storage System]

Synonym for mirroring.

T10 CONTEXT [SCSI]

The ANSI T10 technical committee, the standards organization responsible for SCSI standards for communication between computers and storage subsystems and devices.

T11 CONTEXT [Fibre Channel]

The ANSI T11 technical committee, the standards organization responsible for Fibre Channel and certain other standards for moving electronic data into and out of computers and intelligent storage subsystems and devices.

tabular mapping CONTEXT [Storage System]

A form of mapping in which a lookup table contains the correspondence between the two address spaces being mapped to each other. If a mapping between two address spaces is tabular, there is no mathematical formula that will convert addresses in one space to addresses in the other. cf. algorithmic mapping, dynamic mapping
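The contrast can be sketched in a few lines of Python; the table contents and parameters are hypothetical and only illustrate that one mapping is computed while the other must be looked up:

    # Algorithmic mapping: a formula converts virtual addresses to member addresses.
    def algorithmic_map(vba, stripe_depth=8, members=4):
        member = (vba // stripe_depth) % members
        member_block = (vba // (stripe_depth * members)) * stripe_depth + vba % stripe_depth
        return member, member_block

    # Tabular mapping: only the lookup table records the correspondence;
    # no formula reproduces it, so the table itself must be stored and consulted.
    mapping_table = {0: ("disk2", 4096), 1: ("disk0", 77), 2: ("disk1", 9)}
    def tabular_map(vba):
        return mapping_table[vba]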

tape tape drive tape transport CONTEXT [Storage Device]


A storage device that writes data sequentially in the order in which it is delivered, and reads data in the order in which it is stored on the media. Unlike disks, tapes use implicit data addressing. cf. disk

tape array CONTEXT [Storage System]

A collection of tapes from one or more commonly accessible storage subsystems, combined with a body of control software.

tape virtualization tape drive virtualization tape library virtualization

The act of creating abstracted tape devices by applying virtualization to tape drives or tape libraries.

target CONTEXT [SCSI]

The system component that receives a SCSI I/O command. cf. initiator, LUN, target ID

target ID CONTEXT [SCSI]

The SCSI bus address of a target device or controller.

TByte

Shorthand for terabyte.

TCO

Acronym for total cost of ownership.

TCP CONTEXT [Network]

Acronym for Transmission Control Protocol. cf. IP

TCP/IP CONTEXT [Network]

Shorthand for the suite of protocols that includes TCP, IP, UDP, and ICMP. This is the basic set of communication protocols used on the Internet.

tenancy CONTEXT [Fibre Channel]


The possession of a Fibre Channel arbitrated loop by a device to conduct a transaction.

terabyte

Shorthand for 1,000,000,000,000 (10^12) bytes. SNIA publications typically use the 10^12 convention commonly found in I/O literature rather than the 1,099,511,627,776 (2^40) convention sometimes used when discussing random access memory.

test system

A collection of equipment used to perform a test. In functional and performance testing, it is generally important to clearly define the test system, and distinguish it from the system under test.

third party copy CONTEXT [Backup] [Management]

A protocol for performing tape backups using minimal server resources by copying data directly from the source device (disk or array) to the target device (tape transport) without passing through a server.

threat CONTEXT [Security]

Any circumstance or event with the potential to harm an information system through unauthorized access, destruction, disclosure, modification of data, and/or denial of service.

throughput

The number of I/O requests satisfied per unit time. Expressed in I/O requests/second, where a request is an application request to a storage subsystem to perform a read or write operation.

throughput-intensive (application)

A request-intensive application.

time server

An intelligent entity in a network that enables all nodes in the network to maintain a common time base within close tolerances.

TNC CONTEXT [Network]

Acronym for Threaded Neill-Concelman, a type of coaxial cable connector. Specifications for TNC style connectors are defined in MIL-C-39012 and MIL-C-23329.


token ring (network) CONTEXT [Network]

A network in which each node's transmitter is connected to the receiver of the node to its logical right, forming a continuous ring. Nodes on a token ring network gain the right to transmit data by retaining a token (a specific unique message) when they receive it. When a node holding the token has transmitted its allotment of data, it forwards the token to the next node in the ring. A LAN protocol for token ring networks governed by IEEE Standard 802.5 that operates at speeds of 4 Mbits/second and 16 Mbits/second.

topology

The logical layout of the components of a computer system or network and their interconnections. Topology deals with questions of what components are directly connected to other components from the standpoint of being able to communicate. It does not deal with questions of physical location of components or interconnecting cables.

total cost of ownership

The comprehensive cost of a particular capability such as data processing, storage access, file services, etc. Abbreviated TCO, total cost of ownership includes acquisition, environment, operations, management, service, upgrade, loss of service, and residual value. cf. inherent cost

TPC CONTEXT [Backup] [Management]

Acronym for third party copy.

transceiver CONTEXT [Fibre Channel]

A transmitter and receiver combined in one package.

transmission character CONTEXT [Fibre Channel]

Any encoded character (valid or invalid) transmitted across a physical interface specified by FC-0. Valid transmission characters are specified by the transmission code and include data characters and special characters.

transmission code CONTEXT [Fibre Channel]

A means of encoding data to enhance its transmission characteristics. The transmission code specified by FC-PH is byte-oriented, with both valid data bytes and special codes encoded into 10-bit transmission characters.

Transmission Control Protocol

The Internet connection oriented network transport protocol. Abbreviated TCP. TCP provides a reliable delivery service.


transmission word CONTEXT [Fibre Channel]

A string of four contiguous transmission characters aligned on boundaries that are zero modulo 4 from a previously received or transmitted special character. FC-1 transmission and reception operate in transmission word units.

transmitter CONTEXT [Fibre Channel]

The portion of a Link_Control_Facility that converts valid data bytes and special codes into transmission characters using the rules specified by the transmission code, converting these transmission characters into a bit stream, and transmitting this bit stream on an optical or electrical transmission medium.

An electronic circuit that converts an electrical logic signal to a signal suitable for optical or electrical communications media.

transparent failover

A failover from one component of a system to another that is transparent to the external operating environment. Often used to refer to paired disk controllers, one of which exports the other's virtual disks at the same host bus addresses after a failure. cf. non-transparent failover

trap CONTEXT [Management]

A type of SNMP message used to signal that an event has occurred.

triaxial cable

An electrical transmission medium consisting of three concentric conductors separated by a dielectric material with the spacings and material arranged to give a specified electrical impedance. cf. coaxial cable

trojan horse CONTEXT [Security]

A computer program containing hidden code that allows the unauthorized collection, falsification, or destruction of information.

tunneling CONTEXT [Security]

A technology that enables one network protocol to send its data via another network protocol's connections. Tunneling works by encapsulating the first network protocol within packets carried by the second protocol. A tunnel may also encapsulate a protocol within itself (e.g., an IPsec gateway operates in this fashion, encapsulating IP in IP and inserting additional IPsec information between the two IP headers).


UDP CONTEXT [Network]

Acronym for User Datagram Protocol.

ULP CONTEXT [Fibre Channel]

Acronym for Upper Layer Protocol.

Ultra SCSI CONTEXT [SCSI]

A form of SCSI capable of 20 megatransfers per second. Single ended Ultra SCSI supports bus lengths of up to 1.5 meters. Differential Ultra SCSI supports bus lengths of up to 25 meters. Ultra SCSI specifications define both narrow (8 data bits) and wide (16 data bits) buses. A narrow Ultra SCSI bus transfers data at a maximum of 20 MBytes per second. A wide Ultra SCSI bus transfers data at a maximum of 40 MBytes per second.

Ultra2 SCSI CONTEXT [SCSI]

A form of SCSI capable of 40 megatransfers per second. There is no single ended Ultra2 SCSI specification. Low voltage differential (LVD) Ultra2 SCSI supports bus lengths of up to 12 meters. High voltage differential Ultra2 SCSI supports bus lengths of up to 25 meters. Ultra2 SCSI specifications define both narrow (8 data bits) and wide (16 data bits) buses. A narrow Ultra2 SCSI bus transfers data at a maximum of 40 MBytes per second. A wide Ultra2 SCSI bus transfers data at a maximum of 80 MBytes per second.

Ultra3 SCSI CONTEXT [SCSI]

A form of SCSI capable of 80 megatransfers per second. There is no single ended Ultra3 SCSI specification. Low voltage differential (LVD) Ultra3 SCSI supports bus lengths of up to 12 meters. There is no high voltage differential Ultra3 SCSI specification. Ultra3 SCSI specifications only define wide (16 data bits) buses. A wide Ultra3 SCSI bus transfers data at a maximum of 160 MBytes per second.

UML CONTEXT [Management]

Acronym for Unified Modeling Language.

unauthorized disclosure CONTEXT [Security]

The exposure of information to individuals not authorized to receive or access it.

unclassified CONTEXT [Security]

Information that is not designated as classified.


unicast CONTEXT [Network]

The transmission of a message to a single receiver. Unicast can be contrasted with broadcast (sending a message to all receivers on a network) and multicast (sending a message to a select subset of receivers).

Unicode

A standard for a 16-bit character set (each character has a 16-bit number associated with it). Unicode allows for up to 2^16, or 65,536, characters each of which may have a unique representation. It accommodates several non-English characters and symbols, and is therefore an aid to development of products with multilingual user interfaces. Unicode was designed and is maintained by the non-profit industry consortium Unicode Inc.

uninterruptible power source

A source of electrical power that is not affected by outages in a building's external power source. Abbreviated UPS. UPSs may generate their own power using gasoline generators, or they may consist of large banks of batteries. UPSs are typically installed to prevent service outages due to external power grid failure in computer applications deemed by their owners to be "mission critical."

Unified Modeling Language CONTEXT [Management]

A visual approach that uses a variety of diagrams (such as use case, class, interaction, state, activity, and others) to specify the objects of a model and their relationships. Abbreviated UML. Various tools exist for turning UML diagrams into program code.

unsolicited control CONTEXT [Fibre Channel]

An information category indicated in a Fibre Channel frame header.

unsolicited data CONTEXT [Fibre Channel]

An information category indicated in a Fibre Channel frame header.

upper layer protocol CONTEXT [Fibre Channel]

A protocol used on a Fibre Channel network at or above the FC-4 level. Abbreviated ULP. SCSI and IP are examples of ULPs.

UPS

Acronym for uninterruptible power source.


usable capacity CONTEXT [Storage Device] [Storage System]

The storage capacity in a disk or disk array that is available for storing user data. Usable capacity of a disk is the total formatted capacity of a disk minus the capacity reserved for media defect compensation and metadata. Usable capacity of a disk array is the sum of the usable capacities of the array's member disks minus the capacity required for check data and metadata.
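A worked example with hypothetical numbers (an eight-disk RAID 5 array, assuming check data of roughly one member's worth of space) illustrates the subtraction described above:

    formatted_capacity_gb = 73.0      # formatted capacity per member disk
    defect_and_metadata_gb = 1.0      # reserved per disk for defects and metadata
    members = 8

    usable_per_disk = formatted_capacity_gb - defect_and_metadata_gb   # 72.0 GB
    check_data_gb = usable_per_disk   # RAID 5 parity: about one member's capacity
    array_metadata_gb = 0.5           # hypothetical array-level metadata

    usable_array_capacity = members * usable_per_disk - check_data_gb - array_metadata_gb
    print(usable_array_capacity)      # 8 * 72 - 72 - 0.5 = 503.5 GB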

user data CONTEXT [Storage System]

Data generated by applications and users of a storage system, as distinguished from check data and metadata. cf. usable capacity, user data extent

user data extent CONTEXT [Storage System]

The protected space in one or more contiguously located redundancy group stripes in a single redundancy group. In RAID arrays, collections of user data extents comprise the virtual disks or volume sets presented to the operating environment.

user data extent stripe depth CONTEXT [Storage System]

The number of consecutive blocks of protected space in a single user data extent that are mapped to consecutive virtual disk block addresses. In principle, each user data extent that is part of a virtual disk may have a different user data extent stripe depth. User data extent stripe depth may differ from the redundancy group stripe depth of the protected space extent in which it resides.

User Datagram Protocol CONTEXT [Network]

An Internet protocol that provides connectionless datagram delivery service to applications. Abbreviated UDP. UDP over IP adds the ability to address multiple endpoints within a single network node to IP.

user identification number

A unique number that identifies an individual to a computer system. Abbreviated UID. UIDs are the result of authentication processes that use account names, passwords and possibly other data to verify that a user is actually who she represents herself to be. UIDs are input to authorization processes that grant or deny access to resources based on the identification of the requesting user.

valid data byte CONTEXT [Fibre Channel]

A string of eight contiguous bits within FC-1 which represents a value between 0 and 255.

valid frame CONTEXT [Fibre Channel]

A received frame containing a valid start of frame (SOF), a valid end of frame (EOF), valid data characters, and proper cyclic redundancy check (CRC) of the frame header and data field.


validity control bit CONTEXT [Fibre Channel]

A control bit that indicates whether a field is valid. If a validity control bit indicates that a field is invalid, the value in the field is treated as invalid.

VBA CONTEXT [Storage System]

Acronym for virtual block address.

VCI

Acronym for virtual channel identifier.

VCSEL CONTEXT [Fibre Channel]

Acronym for Vertical Cavity Surface Emitting Laser.

vendor unique CONTEXT [Standards]

Aspects of a standard (e.g., functions, codes, etc.) not defined by the standard, but explicitly reserved for private usage between parties using the standard. Different implementations of a standard may assign different meanings to vendor unique aspects of the standard.

verify verification CONTEXT [Backup]

The object-by-object comparison of the contents of a backup image with the online data objects from which it was made.

versioning CONTEXT [Backup]

The maintenance of multiple point-in-time copies of a collection of data. Versioning is used to minimize recovery time by increasing the number of intermediate checkpoints from which an application can be restarted.

Vertical Cavity Surface Emitting Laser CONTEXT [Fibre Channel]

A surface emitting laser source fabricated on a planar wafer with emission perpendicular to the wafer.

VI


Acronym for Virtual Interface (Architecture)

VIA

Acronym for Virtual Interface Architecture.

virtual block CONTEXT [Storage System]

A block in the address space presented by a virtual disk. Virtual blocks are the atomic units in which a virtual disk's storage capacity is typically presented by RAID arrays to their operating environments.

virtual block address CONTEXT [Storage System]

The address of a virtual block. Virtual block addresses are typically used in hosts' I/O commands addressed to the virtual disks instantiated by RAID arrays. SCSI disk commands addressed to RAID arrays are actually using virtual block addresses in their logical block address fields.

virtual channel identifier CONTEXT [Network]

A unique numerical tag contained in an ATM cell header. Abbreviated VCI. A VCI identifies an ATM virtual channel over which the cell containing it is to travel.

virtual circuit CONTEXT [Fibre Channel]

A set of state information shared by two communicating nodes that is independent of the particular path taken by any particular transmission.

A unidirectional path between two communicating N_Ports. Fibre Channel virtual circuits may be limited to a fraction of the bandwidth available on the physical link.

virtual device CONTEXT [Storage System]

A device presented to an operating environment by control software or by a volume manager. From an application standpoint, a virtual device is equivalent to a physical one. In some implementations, virtual devices may differ from physical ones at the operating system level (e.g., booting from a host based disk array may not be possible).

virtual disk CONTEXT [Storage System]

A set of disk blocks presented to an operating environment as a range of consecutively numbered logical blocks with disk-like storage and I/O semantics. The virtual disk is the disk array object that most closely resembles a physical disk from the operating environment's viewpoint. cf. logical disk


Virtual Interface Architecture

An API specification for direct communication among distributed applications developed by Intel, Compaq, and Microsoft. Abbreviated VIA. VIA reduces interprocess communication latency by obviating the need for applications to use processor interrupt or operating system paths to intercommunicate, while maintaining security on the communications path. VIA is interconnect neutral. cf. Fibre Channel Virtual Interface

virtual path identifier CONTEXT [Network]

An eight-bit field in an ATM cell header that denotes the virtual path over which the cell should be routed.

virtual tape CONTEXT [Storage System]

A virtual device with the characteristics of a tape.

virtualization

The act of integrating one or more (back end) services or functions with additional (front end) functionality for the purpose of providing useful abstractions. Typically virtualization hides some of the back end complexity, or adds or integrates new functionality with existing back end services. Examples of virtualization are the aggregation of multiple instances of a service into one virtualized service, or the addition of security to an otherwise insecure service. Virtualization can be nested or applied to multiple layers of a system.

virus CONTEXT [Security]

A type of programmed threat. A code fragment (not an independent program) that replicates by attaching to another program, and either damaging information directly or causing denial of service.

volatility

A property of data. Volatility refers to the certainty that data will be obliterated if certain environmental conditions are not met. For example, data held in DRAM is volatile, since if electrical power to DRAM is cut, the data in it is obliterated. cf. non-volatility, persistence

volume CONTEXT [Storage System]

Synonym for virtual disk. Used to denote virtual disks created by volume manager control software.

A piece of removable media that has been prepared for use by a backup manager (e.g., by the writing of a media ID).

volume group CONTEXT [Backup]

A collection of removable media that resides in a single location, for example, in a single robot or group of interconnected robots.

volume manager CONTEXT [Storage System]

Common term for host based control software.

volume pool CONTEXT [Backup]

A logical collection of removable media designated for a given purpose, for example, for holding the copies of a single repetitive backup job or for backing up data from a given client or set of clients. A volume pool is an administrative entity, whereas a volume group is a physical one.

volume set CONTEXT [Storage System]

Synonym for virtual disk.

VPI

Acronym for virtual path identifier.

vulnerability CONTEXT [Security]

A weakness in an information system, system security procedures, internal controls, or implementation that could be exploited.

WAN CONTEXT [Network]

Acronym for wide area network.

warm spare (disk) CONTEXT [Storage System]

A spare to which power is applied, which is not operating but which is otherwise usable as a hot spare.

warm swap

The substitution of a replacement unit (RU) in a system for a defective one, where in order to perform the substitution, the system must be stopped (caused to cease performing its function) but power need not be removed. Warm swaps are manual (performed by humans) physical operations. cf. automatic swap, cold swap, hot swap


Wave Division Multiplexing

The splitting of light into a series of "colors" from a few (sparse) to many with a narrow wavelength separation (Dense WDM, or DWDM) for the purpose of carrying simultaneous traffic over the same physical fiber (9 micron usually). Each "color" is a separate data stream.

WBEM CONTEXT [Management]

Acronym for Web Based Enterprise Management

WDM CONTEXT [Windows]

Acronym for Wave Division Multiplexing.

Acronym for Windows Driver Model.

Web Based Enterprise Management CONTEXT [Management]

Web Based Enterprise Management is an initiative in the DMTF. Abbreviated WBEM. It is a set of technologies that enables interoperable management of an enterprise. WBEM consists of CIM, an XML DTD defining the tags (XML encodings) to describe the CIM Schema and its data, and a set of HTTP operations for exchanging the XML-based information. WBEM joins the XML data description language and HTTP transport protocol with the underlying information model, CIM, to create a conceptual view of the enterprise.

well-known address CONTEXT [Fibre Channel]

An address identifier used to access a service provided by a Fibre Channel fabric. A well-known address is not subject to zone restrictions; i.e., a well-known address is always accessible, irrespective of the current active zone set.

wide SCSI CONTEXT [SCSI]

Any form of SCSI using a 16-bit data path. In a wide SCSI implementation, the data transfer rate in MBytes per second is twice the number of megatransfers per second because each transfer cycle transfers two bytes. cf. fast SCSI, Ultra SCSI, Ultra2 SCSI, Ultra3 SCSI.
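The transfer-rate arithmetic above reduces to multiplying the megatransfer rate by the bus width in bytes, as this small sketch shows (the figures simply restate the Ultra SCSI entries above):

    def scsi_mbytes_per_sec(megatransfers, wide):
        return megatransfers * (2 if wide else 1)   # wide SCSI moves 2 bytes per transfer

    assert scsi_mbytes_per_sec(20, wide=True) == 40     # wide Ultra SCSI
    assert scsi_mbytes_per_sec(40, wide=True) == 80     # wide Ultra2 SCSI
    assert scsi_mbytes_per_sec(80, wide=True) == 160    # wide Ultra3 SCSI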

wide area network CONTEXT [Network]

A communications network that is geographically dispersed and that includes telecommunications links.

Windows Driver Model CONTEXT [Windows]

A Microsoft specification for device drivers to operate in both the Windows NT and the Windows 95/98 operating systems.

Windows Internet Naming Service CONTEXT [Windows]

A facility of the Windows NT operating system that translates between IP addresses and symbolic names for network nodes and resources.

Windows Management Instrumentation CONTEXT [Windows]

The name of the Microsoft framework that supports CIM and WBEM. A set of Windows NT operating system facilities that enable operating system components to provide management information to management agents.

WINS CONTEXT [Windows]

Acronym for Windows Internet Naming Service.

WMI CONTEXT [Windows]

Acronym for Windows Management Instrumentation.

word CONTEXT [General] [Fibre Channel]

An addressable unit of data in computer memory. Typically defined to be 16 consecutively addressed bits. Most processor architectures include arithmetic and logical instructions that operate on words.

[Fibre Channel] The smallest Fibre Channel data element, consisting of 40 serial bits representing either a flag (K28.5) plus three encoded data bytes (ten encoded bits each) or four ten-bit encoded data bytes.

[Fibre Channel] A string of four contiguous bytes occurring on boundaries that are zero modulo four from a specified reference. cf. transmission word

workgroup

A group of UNIX or Windows computer system users, usually with a common mission or project, that is created for administrative simplicity.

world wide name CONTEXT [Fibre Channel]

A 64-bit unsigned Name_Identifier which is worldwide unique. cf. Fibre Channel Name

A unique 48- or 64-bit number assigned by a recognized naming authority (often via block assignment to a manufacturer) that identifies a connection or a set of connections to the network. Abbreviated WWN. A WWN is assigned for the life of a connection (device). Most networking technologies (e.g., Ethernet, FDDI, etc.) use a world wide name convention.

worm CONTEXT [Security]

An independent program that replicates from computer to computer across network connections, often clogging networks and computer systems as it spreads.

write hole CONTEXT [Storage System]

A potential data corruption problem for parity RAID technology resulting from an array failure while application I/O is outstanding, followed by an unrelated member disk failure (some time after the array has been returned to service). Data corruption can occur if member data and parity become inconsistent due to the array failure, resulting in a false regeneration when data from the failed member disk is subsequently requested by an application. Parity RAID implementations typically include mechanisms to eliminate the possibility of write holes.

write-back cache

A caching technique in which the completion of a write request is signaled as soon as the data is in cache, and actual writing to non-volatile media occurs at a later time. Write-back cache includes an inherent risk that an application will take some action predicated on the write completion signal, and a system failure before the data is written to non-volatile media will cause media contents to be inconsistent with that subsequent action. For this reason, good write-back cache implementations include mechanisms to preserve cache contents across system failures (including power failures) and to flush the cache at system restart time. cf. write-through cache

write penalty

Low apparent application write performance to independent access RAID arrays' virtual disks. The write penalty is inherent in independent access RAID data protection techniques, which require multiple member I/O requests for each application write request, and ranges from minimal (mirrored arrays) to substantial (RAID Levels 5 and 6). Many RAID array designs include features such as write-back cache specifically to minimize the write penalty.

write-through cache

A caching technique in which the completion of a write request is not signaled until data is safely stored on non-volatile media. Write performance with a write-through cache is approximately that of a non-cached system, but if the data written is also held in cache, subsequent read performance may be dramatically improved. cf. write-back cache
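The difference in completion signaling between write-back and write-through caching can be sketched as follows; the classes are hypothetical and omit the battery backing and failure handling a real controller would need:

    class DiskMedia:
        def __init__(self):
            self.blocks = {}
        def write(self, block, data):
            self.blocks[block] = data      # stands in for a slow non-volatile write

    class Cache:
        def __init__(self, media, write_back):
            self.media = media
            self.write_back = write_back
            self.dirty = []                # blocks cached but not yet on media

        def write(self, block, data):
            if self.write_back:
                self.dirty.append((block, data))   # signal completion before media write
                return "complete"                  # risk: lost if power fails before flush
            self.media.write(block, data)          # write-through: media first,
            return "complete"                      # then signal completion

        def flush(self):
            for block, data in self.dirty:
                self.media.write(block, data)
            self.dirty.clear()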

WWN CONTEXT [Fibre Channel]

Acronym for World Wide Name.

X_ID CONTEXT [Fibre Channel]


Acronym for Exchange_Identifier.

X3T10 CONTEXT [SCSI]

The ANSI X3T10 technical committee, the standards organization responsible for SCSI standards for communication between computers and storage subsystems and devices.

X3T11 CONTEXT [Fibre Channel]

The ANSI X3T11 technical committee, the standards organization responsible for Fibre Channel and certain other standards for moving electronic data in and out of computers.

XML

Acronym for Extensible Markup Language.

zero filling CONTEXT [Security]

The process of filling unused storage locations in an information system with the representation of the character denoting "0".

zeroization CONTEXT [Security]

The process of removing or eliminating the key from a cryptographic program or device.

zone CONTEXT [Fibre Channel]

A collection of Fibre Channel N_Ports and/or NL_Ports (i.e., device ports) that are permitted to communicate with each other via the fabric. Any two N_Ports and/or NL_Ports that are not members of at least one common zone are not permitted to communicate via the fabric. Zone membership may be specified by: (1) port location on a switch, (i.e., Domain_ID and port number); (2) the device's N_Port_Name; (3) the device's address identifier; or (4) the device's Node_Name. Well-known addresses are implicitly included in every zone. cf. zone set

zone set CONTEXT [Fibre Channel]

A set of zone definitions for a fabric. Zones in a zone set may overlap (i.e., a port may be a member of more than one zone). Fabric management may support switching between zone sets to enforce different access restrictions (e.g., at different times of day).
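A small sketch of the access rule shared by the zone and zone set definitions above: two device ports may communicate only if they are members of at least one common zone in the active zone set, while well-known addresses are implicitly reachable. The zone names and WWN strings are hypothetical.

    WELL_KNOWN = {"0xFFFFFC", "0xFFFFFE"}      # e.g., name server, fabric login

    active_zone_set = {
        "hr_zone":  {"wwn_hr_server", "wwn_hr_array"},
        "eng_zone": {"wwn_eng_server", "wwn_eng_array"},
        "backup":   {"wwn_hr_server", "wwn_eng_server", "wwn_tape_library"},
    }

    def can_communicate(port_a, port_b):
        if port_a in WELL_KNOWN or port_b in WELL_KNOWN:
            return True                         # well-known addresses span every zone
        return any(port_a in members and port_b in members
                   for members in active_zone_set.values())

    assert can_communicate("wwn_hr_server", "wwn_hr_array")         # same zone
    assert not can_communicate("wwn_hr_server", "wwn_eng_array")    # no common zone
    assert can_communicate("wwn_eng_server", "wwn_tape_library")    # overlapping zone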

zoning CONTEXT [Fibre Channel]

A method of subdividing a storage area network into disjoint zones, or subsets of nodes on the network. Storage area network nodes outside a zone are invisible to nodes within the zone. Moreover, with switched SANs, traffic within each zone may be physically isolated from traffic outside the zone.

Copyright (c) 2001 Storage Networking Industry Association.


Appendix G. SAN Essays

The following articles by Tom Clark originally appeared in Windows and .NET Magazine's Storage Admin online publication (www.storageadmin.com) and are reprinted here with permission of the publisher. These short essays expand on topics discussed in this book and are provided here as supplemental material.

SAN Customers and SAN Vendors

Just as politicians like to project an image of reflecting the popular will of their constituencies, vendors often present their products as simply responses to customer demands. The ideal products are the result of careful analysis of customer requirements obtained through in-depth discussions of customer needs. Vendors then apply technology to satisfy those needs, so that rather than imposing new solutions on customers, the new solutions themselves express the will of the customer. This dialectic between customer and vendor ideally drives product development and creates a cycle of innovation based on the customer's evolving business requirements.

The reality, however, is that while customers may know their immediate needs they may be less certain of what issues loom on the horizon. Circumstances may accelerate some requirements or delay others. Unexpected changes in business policies may inject new criteria that cannot be easily satisfied by existing technologies. So while vendors must listen carefully to their customers, they cannot rely solely on customer input or risk trailing behind the market. Consequently, technologists must be able to parse current customer requirements and use those results to project what future needs may be. Trailing the customer may retard technological development; getting too far ahead of the customer may result in new solutions unproductively seeking problems to solve.

The storage networking industry has been driven by both customer requirements and vendor innovation. The basic customer problems storage networking has solved were evident long before viable solutions were available. High availability data access, tape backup, storage consolidation and other issues helped define what solutions storage networking should provide. A pent-up demand moved those solutions quickly into the top tier enterprise networks. The storage industry as a whole, however, failed to respond quickly to the customer issues that arose once SAN products were deployed. Interoperability, complexity, scalability, management and cost continue to inhibit adoption of storage networking for the 85% of the market that continues to use direct-attached storage, as well as large customers that have been unable to scale and extend networked storage across the enterprise.

Trailing somewhat behind customer desires, storage virtualization is responding to customer complaints that shared storage is too complex and too difficult to manage. This is something that vendors did not foresee, or had pushed low on their priorities for product development. As a consequence, a portion of the market is waiting for vendors to catch up before more pervasive implementation will occur.

Likewise, customers early on expressed the desire to implement shared block storage over their traditional IP networks. The initial inhibitors to meeting this need included lack of a high performance topology (since Gigabit Ethernet was not available) and lack of native IP storage protocols. Pioneering vendors have accelerated development of IP storage products in recognition of the substantial investment customers have already made in infrastructure, training and support for their mainstream data communications networks. Migrating SANs into mainstream IP networks thus satisfies long-standing requests by customers for less problematic and less expensive shared storage solutions.

Storage networking will achieve its goals when customers can acquire infrastructure and applications that meet their immediate needs and fulfill the basic criteria of open standards, scalability and extensibility to meet their future ones. To realize this objective, vendors and customers must both lead the storage industry and in common draw the guidelines for its further evolution. The Storage Networking Industry Association (SNIA) has created the Consumer Council as a vehicle for this interaction so that both vendors and customers can define meaningful objectives for the industry.

Standardization and Storage Networking Technologies


The emergence of open systems has had a profound impact on the development of data communications technologies. For the past 20 years, open systems have successfully undermined the monolithic, single-vendor dominance of data processing, forcing even the largest vendors to pledge allegiance to open, multi-vendor solutions. At the heart of open systems, the standardization process insures that customers will no longer be bound to a single supplier simply in order to benefit from new technical advances. In fact, even the most attractive new technologies must first demonstrate open standards support before customers will consider them for their data networks. The widespread rejection of proprietary initiatives means that even competing vendors must find ways to cooperate if any are to be successful in the market.

Open systems standardization is embodied in a variety of organizations. Standards organizations assume the challenging task of defining the common parameters of new technologies so that new products will be compliant and interoperable. In mainstream IP networking, for example, the Internet Engineering Task Force (IETF) has become the standards authority for a wide variety of technologies based on TCP/IP, including management, security and quality of service initiatives. When evaluating IP networking products, customers will often require compliance to the relevant RFCs (Requests for Comments) that detail the behavior of protocols or features. Failure to comply with the appropriate IETF standards often means disqualification from consideration.

Although standards bodies have considerable market influence, they draw their technical expertise from an army of volunteers. Participants may be individual technologists, academics, vendor representatives or paid consultants representing corporate or vendor interests. Generally, overt company identification or influencing is discouraged, although for participants there is usually little question of where individual companies stand on issues. Discussion and decision-making for technical matters are democratic and open to any interested individual, but it is expected that any input be informed and current with the stage of development of the standard at hand.

The challenge for vendors is to create products that are standards compliant in several areas and yet offer differentiating value to the customer. Since the vendors themselves are active participants in the standards process, it is in their interest to cooperate on fundamental features that promote the technology. It is also not unusual, however, for vendors to withhold unique technical advantages from the standards process in order to maintain a competitive edge. In some instances, as in the case of Fibre Channel switch standards, this stalling tactic results in customer dissatisfaction and retards technology adoption. The era in which an individual company could arrogantly declare itself to be the standard is long gone, although this reality has not yet hit home to some vendors.

Storage networking technology involves a number of standards organizations. Fibre Channel SAN vendors rely on the NCITS/ANSI T10 committee for SCSI protocol issues and NCITS/ANSI T11 committee for Fibre Channel-specific transport issues. In addition, some SAN management components rely on out-of-band SNMP methods which are standardized within the IETF. The Fibre Channel Management Integration MIB, for example, is defined in an Internet Draft, although Fibre Channel itself falls under the standards authority of NCITS/ANSI.

With the introduction of IP storage networking, use of Gigabit Ethernet for high performance transport introduces standards defined by the IEEE 802.3 committee. The protocols for moving block storage data over IP networks are being defined in the IETF via a number of Internet Drafts, as are the discovery and management mechanisms for IP storage. In addition, encapsulation of SCSI in TCP/IP necessarily involves compliance to NCITS/ANSI T10 standards. Vendors who provide connectivity to Fibre Channel storage devices must be NCITS/ANSI T11 compliant as well.

The parfait of standards helps insure interoperability and open systems commitment from lower level physical transports to upper layer protocols essential for the SAN infrastructure. On top of these, customers apply applications which, although not subject to the same standards rigor as the infrastructure, hopefully enjoy a firm open systems foundation to run upon.


Standards Compliance versus Interoperability

Pitting standards compliance against interoperability at first glance seems contradictory. After all, standards are created to insure interoperability of products and applications. If multiple vendors engineer products to the same standards, the prospects for interoperability are greatly enhanced, reducing the customer's exposure to being locked into a proprietary solution. That customers appreciate this relationship is shown by the long lists of standards checkboxes that must be filled before an RFP for new product acquisition is awarded. No vendor wishing to compete in the marketplace can ignore the importance of standards compliance. Nor can any customer wanting to maintain vendor independence and flexibility in product selection.

Standards compliance, however, does not guarantee interoperability. Once standards are formalized through the appropriate standards organization, the guidelines for product design are clearly drawn. The problem is that vendors may interpret these guidelines differently, or the standards themselves may lack sufficient definition in some areas to insure compatibility. The result is that two products can be fully standards compliant, and yet fail to interoperate. Standards development thus requires periodic interoperability testing to verify the integrity of the standards themselves and to surface variances in vendor interpretation. The iSCSI standards initiative within the IETF, for example, has been accompanied by interoperability plugfests held at the University of New Hampshire. Additional iSCSI interoperability testing will occur at the SNIA Storage Networking World conference in Orlando in October. Without practical validation of both standards compliance and interoperability, the goal of standardization cannot be met.

Likewise, interoperability in itself does not guarantee standards conformance. Vendors often have their own interoperability labs which are used to craft multi-vendor solutions. A storage vendor, for example, may configure a tape backup solution that includes their own storage products, SAN switches, Host Bus Adapters, routers, tape subsystems and application software. The prime directive for such affinity group configurations is to debug any problems and to produce an interoperable and supportable solution. Although some level of standards compliance may be assumed, if the intention is to produce a working solution that can be taken to market, interoperability typically takes precedence over standards. As a consequence, a multi-vendor interoperable solution may perform flawlessly until the customer plugs in a third party standards compliant product. Unfortunately, this is too often the case for packaged SAN solutions. Solution providers may tweak microcode to enhance performance or functionality and in the process make it impossible for customers to add other products into the configuration. Whether intended or not, this binds the customer to a particular vendor and defeats the purpose of open systems standards.

Since enforcement of both standards compliance and interoperability is essential for the long term viability of the customer's storage network, customers themselves must encourage their vendors to provide solutions that are both interoperable and standards based. In support of this goal, the SNIA Interoperability Committee is producing standards compliance test suites that will assist vendors in meeting their dual responsibilities for standardization and interoperability. This effort will hopefully remove some of the contradictions that currently exist.

Storage Resource Management

The unprecedented growth in data is forcing administrators to find new means to manage storage capacity and utilization. Storage area networks (SANs) have provided some relief to the data explosion by enabling more servers to access larger storage arrays. SANs, however, simply provide the means to consolidate storage resources. Without storage management applications, a SAN infrastructure cannot determine what kinds of data are being stored or identify where data placement should occur. One of the major contributors to data growth is simply duplication. A business may have multiple copies of applications, databases and files spread throughout the network. To address this inefficient use of storage capacity, storage administrators are turning to storage resource management (SRM) applications to help them determine where duplication exists.

Storage resource management software was pioneered by Highground Systems, which Sun Microsystems acquired in 2000. Highground's innovative approach to storage management is based on intelligence-gathering by software agents installed on each server. The SRM software agent constantly monitors storage utilization on disk arrays under a server's control. Information on the types of applications, databases and files is reported by each server to a central management platform. By processing the information gathered from tens or hundreds of servers, the SRM management platform can identify instances of duplication wherever it exists. With this information, storage administrators can then determine how best to eliminate redundant applications and data and maximize use of existing storage capacity.
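As a rough illustration of the agent-and-collector model described above (a sketch only, not HighGround's or Sun's actual implementation), per-server agents could report content hashes to a central platform that flags data stored more than once:

    import hashlib
    from collections import defaultdict

    def agent_scan(server, files):
        """Hypothetical agent: hash each file's contents on one server."""
        return [(hashlib.sha256(data).hexdigest(), server, path)
                for path, data in files.items()]

    def find_duplicates(reports):
        """Central platform: group identical content reported by any server."""
        seen = defaultdict(list)
        for digest, server, path in reports:
            seen[digest].append((server, path))
        return {d: locs for d, locs in seen.items() if len(locs) > 1}

    reports = agent_scan("srv1", {"/data/report.doc": b"quarterly report"}) + \
              agent_scan("srv2", {"/share/copy.doc": b"quarterly report"})
    print(find_duplicates(reports))    # the same content appears on both servers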

Resource management applications are a natural complement to shared storage networks. While SANs provide an efficient means to access storage, storage resource management ensures that shared disk space is not filled with redundant files or applications. In addition, by minimizing data duplication, tape backups can be shortened. Since the ongoing cost of administering storage is often calculated on a per-gigabyte basis, anything that reduces the volume of data to manage contributes to a lower total cost of ownership. The combination of SAN technology and storage resource management helps ensure that more useful data can be administered more economically.

Resource management is also a complementary component to storage virtualization. Hiding the complexity of physical storage through virtualization has less value if the virtualized storage is glutted with redundant applications and files. Theoretically, storage resource management can be tightly coupled with storage virtualization so that a virtualization engine dynamically allocates storage capacity as needed and monitors the types of data being stored. By automatically eliminating redundancy, policy-based virtualization can contribute to lower administrative overhead and hardware costs.

One of the initial drawbacks of storage resource management was the requirement for software agents to be run on each server. Customers generally do not like to load wedge or "gasket" software onto their servers, even if it serves a useful purpose. If a server fails, for example, someone must remember to reload the proper version of SRM software. In addition, a new release of SRM agent software would require a lengthy installation on multiple servers. As with volume management, the resolution to this problem is to integrate SRM functionality into the operating system. Updates can then be more easily automated since installation occurs as part of a normal service pack upgrade.

In the long term, customers will benefit from the convergence of SAN management, volume management, storage virtualization, and storage resource management applications. As shown by Microsoft's recent announcements at the Windows Hardware Engineering Conference (WinHEC), operating system vendors are recognizing the key role that storage continues to play in enterprise networks. Providing hooks into the operating system for intelligent SRM agents will simplify the lives of storage administrators and help contain the unrelenting growth of data.


Going the Distance with Storage Data

For the past few years companies have been acquiring storage area networks to satisfy specific application requirements. In a single enterprise, separate SANs have appeared in different departments or regional locations as managers have attempted to address the high availability, resource consolidation or tape backup requirements of their own areas. With this initial deployment of SAN islands in various business units, little thought was given to how they might be joined together to streamline shared storage administration and maximize utilization of shared storage resources throughout the enterprise. Now, however, more customers are seeking means to join previously independent SANs together, often across campus, metropolitan or wide area distances. At the same time, customers may want to leverage storage over wide areas for disaster recovery, content distribution or remote tape vaulting applications. Finding a means to efficiently and economically move block storage data across long distances is thus becoming a top priority for many customers, both to satisfy budget requirements and to insure business continuance in the event of disruption or disaster.

Since current SANs are based primarily on Fibre Channel, extending storage data over distance must accommodate the peculiar requirements of Fibre Channel fabrics. Fibre Channel switches are designed for the narrow circumference of the data center or department, and so are not easily adapted to wide area connections. Connecting SAN islands over dark fiber or dense wave division multiplexing (DWDM) may work well within metropolitan environments, but the lack of sufficient buffering in fabric switches results in dramatic loss in performance beyond 50–100 miles. This problem can be addressed by using special IP tunneling gateways that use Fibre Channel over IP and typically provide much larger buffers than stand-alone fabric switches.

In both of these instances, however, the connection between fabric switches over distance creates one extended SAN. This is analogous to a link layer LAN of the 1980s, before the introduction of IP routers. Just as link layer LANs were vulnerable to broadcast storms by errant end devices, an extended SAN may be vulnerable to disruptions in the form of fabric reconvergence or state change notification broadcasts. Problems at one site can be propagated to another site, causing the extended SAN to divide again into separate SAN islands and the loss of storage connectivity.

To overcome this potential issue, the Internet Fibre Channel Protocol (iFCP) was developed. The iFCP protocol allows Fibre Channel SANs to be connected over long distances, but without building a single logical SAN. Instead, the autonomy of each SAN location is maintained, while authorized sessions between servers and storage across the wide area can be established. Disruptions at one site are not propagated across the wide area link, so other sites are unaffected.

While iFCP addresses the issue of stability for SAN extension, basic laws of physics must also be considered. Speed of light propagation delay is roughly a millisecond of latency for every 100 miles of distance. A thousand mile link between two locations, therefore, would incur about 20 milliseconds of latency round trip. Having enormous port buffers in an IP storage gateway is one way to deal with speed of light latency, particularly for streaming storage applications such as tape backup. Instead of having to periodically wait for limited buffers to be replenished, large buffers can be used to fill the wide area link with storage data as acknowledgments return from the receiving site. Additional mechanisms such as data compression, large (jumbo) IP packets, and manipulation of write operations contribute to the more efficient utilization of available wide area bandwidth.
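The numbers work out as follows, using the essay's rule of thumb of about one millisecond of one-way delay per 100 miles; the 1 Gbps link speed is a hypothetical example, and the buffer estimate is the standard bandwidth-delay product:

    distance_miles = 1000
    one_way_ms = distance_miles / 100          # ~10 ms one way
    round_trip_ms = 2 * one_way_ms             # ~20 ms round trip

    # To keep the WAN link full while acknowledgments are in flight, the sender
    # needs roughly a bandwidth-delay product of buffered data.
    link_gbps = 1.0                            # hypothetical Gigabit Ethernet link
    buffer_bytes = (link_gbps * 1e9 / 8) * (round_trip_ms / 1000)
    print(round_trip_ms, int(buffer_bytes))    # 20.0 ms, about 2.5 MB in flight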

With technology available today it is possible to span thousands of miles of distance with storage data using IP and switched optical network infrastructures. This enables more resilient disaster recovery strategies that place a backup site well beyond a potential disruption while leveraging more affordable IP network services. In addition, an enterprise with national or international storage requirements can now implement comprehensive solutions that were not possible a few years ago. Storage data can now go the distance by leveraging the best of Fibre Channel storage with the best of IP storage switch technology.


Shared Storage for the Masses

The steady penetration of storage area networks (SANs) into the market continues, but so far has been largely limited to the top tier companies and institutions. By generous estimates, less than 20% of the potential market for shared storage solutions has actually adopted SANs. This is due in part to higher acquisition costs of Fibre Channel storage and in part to the complexity and management issues associated with SAN technology. Although small and medium businesses (SMBs) can also benefit from the storage consolidation, tape backup and high availability applications that SANs enable, it is often difficult to make the transition from direct-attached to shared storage with limited budget, staff and SAN expertise. New SAN products, however, are facilitating this change by reducing acquisition costs and maximizing use of existing corporate network infrastructures.

The iSCSI host adapters, IP storage switches and iSCSI-to-SCSI bridge products now available in the market are changing the complexion of SANs. Homogeneous Fibre Channel fabrics must now accommodate IP networking, both for classic SAN extension applications such as disaster recovery and for heterogeneous mixtures of iSCSI and Fibre Channel end devices. With the introduction of native iSCSI interfaces on storage arrays in early 2003, the multi-protocol composition of SANs will shift even more dramatically towards IP. As the IP content increases, however, the market penetration of SANs will also increase. Companies that could not previously assume the burden of cost and complexity of Fibre Channel fabrics now have options for implementing SANs that conform to the rest of their data communications networks.

Thanks to large networking vendors such as Intel, the cost of iSCSI host adapters (a.k.a. storage accelerators or storage NICs) is being driven into a more affordable price range. While it was difficult to justify installing a $1200 Fibre Channel Host Bus Adapter in a $2000 server, it is more reasonable to install a $600 iSCSI card that accomplishes the same task. The iSCSI adapter, in turn, may be plugged into a standard Gigabit Ethernet switch port instead of a dedicated Fibre Channel switch, at roughly half the per-port cost. iSCSI host attachment eliminates other costs as well, such as special equipment to extend SAN connectivity over distance. Standard IP switches and routers now provide the SAN transport, regardless of physical location.
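As a rough, purely illustrative comparison, the sketch below totals the attach cost per server under the adapter prices quoted above; the absolute switch port prices are assumptions chosen only to reflect the "roughly half the per-port cost" relationship and are not vendor quotes.

```python
# Rough per-server attach cost comparison. Adapter prices follow the figures
# quoted in the text; the switch port prices are assumptions, not vendor quotes.

def attach_cost(adapter: float, switch_port: float, servers: int) -> float:
    """One adapter and one switch port per attached server."""
    return servers * (adapter + switch_port)

servers = 20
fc = attach_cost(adapter=1200, switch_port=1000, servers=servers)    # Fibre Channel
iscsi = attach_cost(adapter=600, switch_port=500, servers=servers)   # iSCSI over GbE
print(f"Fibre Channel attach for {servers} servers: ${fc:,.0f}")
print(f"iSCSI attach for {servers} servers:         ${iscsi:,.0f}")
```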

IP storage switches are still required to bring Fibre Channel storage and end devices into an IP SAN, but currently do not offer a significant cost savings over Fibre Channel fabric switches. The cost benefit for these devices is primarily in cost avoidance. IP storage switches, for example, enable both Fibre Channel and iSCSI hosts to access Fibre Channel storage targets over traditional IP networks. This solution avoids the costs of implementing and managing a Fibre Channel fabric, with its associated complexity and requirement for special diagnostic tools and expertise.

iSCSI-to-SCSI bridges open up the mass IT market to SAN solutions. These products enable customers to bring their legacy SCSI disk arrays and tape devices into a shared IP SAN. Unlike the previous generation of Fibre Channel-to-SCSI bridges, connectivity to the SAN no longer mandates more expensive Fibre Channel switches or Fibre Channel hosts. Instead, SCSI storage can be bridged into a common IP network infrastructure and shared by iSCSI-attached servers. As a value-add, some vendors are integrating storage virtualization capability into iSCSI-to-SCSI bridges so that even small and medium businesses can enjoy the benefits of storage pooling. Collectively, iSCSI adapters, IP storage switches, and iSCSI-to-SCSI storage concentrators are bringing shared storage solutions to a broad spectrum of customers.



SAN Security

Data is the currency of today's enterprises. While loss of data access can cost up to millions of dollars per outage, loss of sensitive data to a competitor or hacker can be catastrophic. Enterprise networks therefore go to great lengths to ensure data security for their mission-critical information, and in particular for the security of data on disk. Data centers typically maintain secure physical access to safeguard corporate information from outside penetration. In addition, conventional direct-attached storage itself provides inherent security, since storage arrays are bound by parallel cabling directly to servers. The introduction of SANs, however, has changed this view of secure data storage. SANs bring networking between servers and storage to solve availability and storage consolidation issues, but with peer-to-peer networking technology there is always the potential for unsecured access.

Consequently, running storage data over SANs invariably raises the issue of security. First-generation Fibre Channel SANs have relied on the implicit security provided by data center environments. When SAN resources are shared between multiple departments, however, physical isolation of the SAN may not be sufficient. Human resources and engineering, for example, may share a Fibre Channel fabric switch, with each department maintaining its own storage resources. To preserve the privacy of human resources information from the prying eyes of engineers, it is necessary to logically separate the SAN. In Fibre Channel fabric switches, this function is provided by zoning. Zoning allows a single physical SAN to be divided into logical storage access groups. In this example, human resources would reside in one zone, and engineering in another. The Fibre Channel fabric switches enforce separation of data traffic so that only zoned participants can communicate with their designated resources. Overlapping zones enable authorized resources to be grouped for specific functions. Although human resources and engineering servers are zoned apart, both could be zoned into a third access group that shares a common tape backup subsystem.
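Conceptually, zoning behaves like set membership: two devices may communicate only if they appear together in at least one zone. The sketch below models that rule in a few lines of Python; the zone and device names are hypothetical, and real fabric switches enforce zoning in the switch itself, keyed by World Wide Names or port numbers.

```python
# Conceptual model of fabric zoning: two devices may communicate only if some
# zone contains both of them. Zone and device names are hypothetical.

zones = {
    "hr_zone":     {"hr_server", "hr_array"},
    "eng_zone":    {"eng_server", "eng_array"},
    "backup_zone": {"hr_server", "eng_server", "tape_library"},  # overlapping zone
}

def can_communicate(dev_a: str, dev_b: str) -> bool:
    """True if at least one zone contains both devices."""
    return any(dev_a in members and dev_b in members for members in zones.values())

assert can_communicate("hr_server", "hr_array")        # same department
assert not can_communicate("eng_server", "hr_array")   # zoned apart
assert can_communicate("eng_server", "tape_library")   # shared backup zone
```

The overlapping backup zone mirrors the tape backup example: both departments' servers reach the shared library without ever gaining access to each other's disk resources.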

Still, zoning does not address the security of data in transit over a link. Even with zoning, it is possible for someone to sniff the link with an analyzer and capture data. This vulnerability is especially evident when storage data is run over longer unsecured distances, for example over a metropolitan link provided by a carrier. Native Fibre Channel and Fibre Channel extension via DWDM lack an integrated security option. Until recently, there has been no means to encrypt the storage data before it leaves the premises and enters the long haul.

Ironically, data security for Fibre Channel SANs can be resolved by enhanced capabilities introduced by a competing technology: IP SANs. IP-based storage networking can leverage well-established security mechanisms such as IPSec (IP Security) to provide sophisticated data encryption for storage traffic. Newly introduced IP storage switches enable Fibre Channel hosts and storage devices to connect to Gigabit Ethernet switches and IP routers. While some IP routers offer integrated IPSec capability, there are also third-party encryption devices that can be paired across an untrusted link. These products use the Data Encryption Standard (DES) or the more rigorous triple-DES algorithm to thoroughly encrypt sensitive data. Some products also provide gigabit interfaces for optimum throughput. By integrating Fibre Channel storage and servers with IP infrastructures, network administrators can ensure the security of storage data for any network segments that may be vulnerable to penetration.
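As a toy illustration only, the fragment below encrypts a block of data with triple-DES using the third-party pyca/cryptography package. It is not a description of any vendor's implementation: a real encryption appliance operates on frames in hardware and never handles keys this casually, and modern deployments would favor AES over the DES family.

```python
# Toy illustration of triple-DES encryption of data in flight. Requires the
# third-party 'cryptography' package; the key handling here is for
# demonstration only and does not reflect how an encryption appliance works.
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, modes
try:  # newer releases relocate legacy ciphers such as 3DES
    from cryptography.hazmat.decrepit.ciphers.algorithms import TripleDES
except ImportError:
    from cryptography.hazmat.primitives.ciphers.algorithms import TripleDES

key = os.urandom(24)   # three-key 3DES keying material
iv = os.urandom(8)     # DES block size is 64 bits

def encrypt_block(plaintext: bytes) -> bytes:
    """Pad to the 64-bit block size and encrypt with 3DES in CBC mode."""
    padder = padding.PKCS7(64).padder()
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(TripleDES(key), modes.CBC(iv)).encryptor()
    return encryptor.update(padded) + encryptor.finalize()

print(encrypt_block(b"SCSI write data leaving the trusted data center").hex())
```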

Security for SANs is gaining increasing attention as enterprise networks expand their disaster recovery and remote backup strategies. In addition, SAN-based storage is gradually displacing direct-attached storage over time, thus widening the scope of security concerns. The introduction of IP storage networking is alleviating those concerns by bringing enhanced security features to traditional Fibre Channel SAN technology.

Infrastructures and Applications

In a typical neighborhood, people live on a thin veneer of essential services that are for the most part hidden from view. Power lines, phone lines, water pipes, gas and sewer lines are thankfully buried underground. This collective plumbing provides the support infrastructure for daily living and becomes a focal point only when things don't work and daily living is disrupted.

This same scenario applies to the typical neighborhood of data communications. End users run a variety of applications to transact their daily business, assuming that the underlying data communications network will provide stable and continuous service. If the network goes down, daily business is disrupted, sometimes costing companies millions of dollars per hour. End users are not and should not be aware of the underlying plumbing that is so essential to their business lives. That is the responsibility of network managers and engineers who work behind the scenes to ensure that the upper layer applications run smoothly.

Database applications, online transaction processing, email, web services, and other end user applications require computing platforms, storage, and network interconnect. Storage area networking brings the storage component into the network plumbing and enables sharing of disk and tape resources by all upper layer applications. Since storage networks are a relatively new technology, there has been more initial focus on SAN infrastructures. Fibre Channel, iSCSI, IP-based SAN extension, storage virtualization, and other plumbing issues receive considerable press coverage and engender heated debate among SAN vendors. While this may be a natural side effect of an emerging technology, in the end all the customer really wants is reliable and high performance data service: Will my applications run more efficiently, and will I have continuous access to my data?

Specific storage-related applications such as volume managers, storage resource management, server clustering, data replication, and tape backup utilities benefit from the lower level storage network's performance and resource sharing capabilities. Those applications, however, should be agnostic to the underlying transport. The customer should expect that, regardless of whether the SAN infrastructure is based on Fibre Channel, iSCSI, or some combination of both, the upper layer applications will perform the same. One of the challenges for SAN hardware vendors, then, is to make their solutions transparent to the higher level storage applications. The ideal network solution is one that, like an underground water main, is invisible to the end user.

The stability and interoperability of SAN infrastructures are fostered by standardization. To maintain a low profile and minimize administrative overhead, SAN hardware needs to satisfy the customer's expectations for standards compliance. Consequently, customer hardware acquisitions through Requests for Proposals (RFPs) typically have long laundry lists of standards checkboxes that must be filled before a solution is considered. The assurance of standards compliance increases the customer's odds that an acquisition will not later create support and interoperability issues. The idea here is to verify that, once in place, the plumbing will pose minimal issues.

Ironically, these same criteria are not placed on the applications themselves. While running applications smoothly is predicated on a stable, standards-compliant network infrastructure, the applications themselves are vendor-specific or tailored for a customer's business needs. Applications thus benefit from the rigor taken to create sound plumbing, but can provide the flexibility to adapt to specific customer requirements.


Disaster Recovery in an Uncertain World

Following the September 11th terrorist attacks, the storage networking industry found itself in an awkward situation. Disaster recovery has always been one of the solutions that SAN advocates have promoted. Use of Dense Wavelength Division Multiplexing (DWDM) or new IP storage technologies to extend the reach of storage beyond primary data centers has enabled remote mirroring and remote backup of vital corporate data. Fortunately for the collective karma of the storage networking industry, few vendors have attempted to opportunistically exploit the sudden surge of interest in disaster recovery. Instead, most have simply restated the technology benefits that SANs generally provide for disaster recovery, highlighting new functionality that can enhance the design and implementation of business continuance strategies.

Like tape backup in general, disaster recovery has not been a favored career path for storage administrators. Safeguarding ever-changing corporate data is often a monotonous routine that, like paying insurance premiums, is done grudgingly but on a fixed schedule. While backing up vital corporate data to tape has been assisted by the development of larger tape libraries and higher capacity tape cartridges, the primary issue for tape backup is not scheduling or tape rotation, but restoring data from tape in the event of failure. Restoration imposes a lengthy downtime during which ongoing business may be lost. Consequently, despite the higher cost of disk mirroring to maintain a readily accessible copy of data, mirroring strategies are increasingly deployed as part of an optimum disaster recovery solution.

Intelligent controllers in higher end storage devices offer new options for disk mirroring, including the ability to have two separate storage units use each other as secondary mirrors. For disaster recovery implementations, this would allow two separate data centers to provide mutual support in the event of the failure of either center, performing on a grand scale the failover capability of server clustering. Mirroring between two remote sites, however, is not a trivial undertaking, particularly if long distances are involved. Depending on the applications being run to storage, the amount of data that must be synchronized between two locations may require sustained gigabit bandwidth in both directions. Initial attempts to use DWDM for extending distance were successful within a metropolitan circumference, but have not scaled well to regional distances. Mirroring between New York and New Jersey, for example, can be done with simple SAN extension via DWDM. Mirroring between New York and Chicago is problematic.
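The underlying problem is the write-latency penalty of synchronous mirroring, in which a write completes only after the remote copy acknowledges it. The sketch below reuses the earlier propagation rule of thumb; the distances are approximate and the one-millisecond local service time is an assumption, not a measurement of any array.

```python
# Illustrative only: added write latency from synchronous mirroring, where a
# write is not complete until the remote mirror acknowledges it. Distances are
# approximate; the 1 ms local service time is an assumption, not a measurement.

ONE_WAY_MS_PER_100_MILES = 1.0

def sync_write_ms(distance_miles: float, local_service_ms: float = 1.0) -> float:
    """Local service time plus the round trip to the remote mirror."""
    round_trip = 2 * distance_miles / 100 * ONE_WAY_MS_PER_100_MILES
    return local_service_ms + round_trip

for label, miles in [("New York to New Jersey", 20), ("New York to Chicago", 700)]:
    print(f"{label}: ~{sync_write_ms(miles):.1f} ms per mirrored write")
```

Within a metropolitan circumference the penalty is a fraction of a millisecond; at regional distances every mirrored write stretches by an order of magnitude, which is why asynchronous replication or buffered IP extension becomes attractive.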

Unfortunately for all of us, the distances now required to truly safeguard human and corporate data resources are unknown. Where previously it might have been sufficient to mirror between two buildings in the same city, now an entire city may fall victim to direct or indirect disruption from an attack. Consequently, consultants are now recommending business continuance solutions that span regions, maintaining mirrors that may be separated by hundreds or thousands of miles.

The precedent for long distance mirroring was established by businesses situated in geologically unstable areas, such as California or the Pacific Northwest. Until recently, achieving stable, high performance links between regions has been difficult. Carriers such as Qwest are now offering gigabit and multi-gigabit services that can span coast to coast. For storage applications such as remote mirroring, new IP storage technologies have demonstrated sustained gigabit performance over thousands of miles. These new solutions are enabling disaster recovery strategies no longer bounded by distance or speed considerations, and which can now concentrate on the more critical issues of preserving human and IT resources in the event of catastrophe.


Enabling iSCSI Migration

The future of Internet SCSI (iSCSI) has been the subject of rampant speculation in the storage industry for the past two years. While iSCSI represents the ultimate convergence of SANs with mainstream TCP/IP networking, some early market products failed to establish sufficient end-user value to gain market acceptance. IBM's recent withdrawal from the market of its iSCSI-based 200i TotalStorage array and Cisco's lack of success with the NuSpeed SN 5420 switch have fostered an industry attitude that iSCSI may not be ready for prime time. At the same time, however, the iSCSI standards process is nearly complete, new wire-speed iSCSI host adapters and IP storage switches from other vendors are now available, and Microsoft has announced support for iSCSI in its forthcoming .NET operating system. These mixed iSCSI messages may be confusing the industry and customers alike, but perhaps the problem is not with iSCSI itself, but with how individual vendors have marketed the technology.

Some of the early vendor marketing collateral on iSCSI positioned the technology as a viable solution only for low end storage applications. Running an iSCSI device driver on laptop PCs, for example, would enable laptops to perform block data backup over standard Ethernet networks. In part, this market positioning for the low end was based on the very limited performance of some products. Vendors with wire-speed products, by contrast, have been positioning iSCSI as a solution for data center and high performance storage applications. The fact is that there is nothing inherent in the iSCSI protocol that restricts it to either low end or high end applications. As with TCP/IP networking in general, customers get what they pay for, and those willing to pay more can get more in terms of functionality and performance.

The introduction of high performance iSCSI host adapters at a list price of between $600 and $1000 challenges the vendor hype that iSCSI is much cheaper than Fibre Channel. While iSCSI adapters are somewhat more economical than comparable Fibre Channel HBAs ($1200–$2000 price range), they are still much more expensive than standard Gigabit Ethernet NICs ($100–$500 price range). The real savings for customers, then, is not in hardware acquisition costs, but in leveraging the IP infrastructure, expertise and network management tools already in place in their corporate networks.

Although enterprise-class iSCSI storage arrays are not expected to appear in the market until 2003, the availability of iSCSI host adapters and wire-speed IP storage switches will radically change the composition of SANs through the remainder of this year. Why? Because customers will now have the flexibility to deploy servers anywhere in the IP network instead of tethering servers to dedicated Fibre Channel switches. With the ability to now use more of their familiar IP infrastructure and support staff to implement SAN solutions, customers can expand their storage networking applications without simultaneously hiring more Fibre Channel fabric experts. Finally, when iSCSI storage systems come to market, customers can simply integrate them into their existing IP SAN, side by side with Fibre Channel arrays and hosts that have already been brought into the IP SAN via IP storage switches. As shown by Carlson Companies and other customers, implementing IP-based storage networks is viable today, even as new IP storage products continue to be introduced to the market.

This heterogeneous iSCSI migration enables customers to select the best-in-class Fibre Channel storage and iSCSI adapter products without being locked into a specific technology solution. Customers can thus set their own pace for integrating IP and Fibre Channel products while taking full advantage of the functionality and performance offered by both.

Page 405 of 407

Page 406: Designing Storage Area Networks -2nd Edition - Tom Clark

[ Team LiB ]

