

Storage
Managing the information that drives the enterprise
September 2015, Vol. 14, No. 7

Fibre Channel is alive and kicking
Don't sound the death knell for Fibre Channel storage yet; research indicates widespread use with faster speeds to come.

Editor's Note / Castagna: Commodity hardware is a myth
Storage Revolution / Toigo: Back to school shopping for the real value of virtualization
Snapshot 1: Companies without DR plans offer weak excuses
Scalable Storage: Architecture for analytics: Big data storage
Snapshot 2: Disk, replication and—surprise!—tape are key DR techs
Storage Systems: Beyond block and file: Integrating object storage
Hot Spots / Sinclair: The cost of software-defined storage
Read-Write / Taneja: Hyper-convergence on the flipside


EDITOR'S LETTER / RICH CASTAGNA

Commodity hardware is a myth

Don't let software-defined storage vendors fool you—hardware is just as important as their apps.

There's no such thing as commodity storage hardware. Sure, you can choose to believe all those software-defined storage vendors out there instead of me, but even they would admit that if there really was such a thing, they probably wouldn't want their products running on it.

Still, software-defined storage (SDS) vendors try to woo new users with the cockamamie idea that hardware really doesn't matter, that their software can do it all. There are plenty of solid, innovative SDS products available that can satisfy a specific need for many companies shopping for servers and storage together. But giving it a label like commodity hardware—that suggests it somehow has less intrinsic value—doesn't make sense and certainly doesn't make the software seem any better.

If you were shopping for storage for your company's data—SAN, NAS, SDS, hyper-converged, whatever—would you feel comfortable parking all that corporate intellectual property on something called "commodity" hardware? Of course not. That's why you ask what kind of disks come with the unit, and you study the specs to make sure they meet your needs. Maybe the word "commodity" floated to the top because of a perception that there are few choices to be made when purchasing hard disk drives (HDDs), given that only a handful of vendors still make them. But you and I know that couldn't be more off-base—HDDs range from speedy 15K drives with stingy capacities to behemoths with room for 10 TB of data. There's no way any sane person would lump those two ends of the HDD spectrum into a single thing called a "commodity" when their applications and use cases couldn't be more divergent.

For solid-state storage, the notion of "commodity" is even more laughable. With a variety of architectures and technologies to choose from—single-level cell, multi-level cell, triple-level cell, 3D and so on—and implementation choices that run the gamut from SAS/SATA interfaces, PCIe, DIMM slots and just about everything in between, it would be equally insane to drop all those variations on a NAND flash theme into the same bucket.

SDS undoubtedly simplifies that hardware layer in a storage system, and admirably makes an end run around the need for specialized gear to enable sharing of storage resources. But the hardware still matters, arguably as much as the cool new software.


And SDS vendors likely know this as well, despite their rhetoric that tends to marginalize the hardware their products rely on. As SDS has established itself as a bona fide storage category and an attractive alternative to traditional storage offerings, it has also moved away from the do-it-yourself, save-big-bucks, avoid-administrative-headaches image that it projected early on.

Honestly, can you see yourself trawling the aisles of Best Buy or Fry’s, filling your shopping cart with stacks of “commodity” hard drives and servers that you’ll cobble together back at the shop?

There aren't very many companies that are willing to risk their data and application performance on a DIY project. And the truth is that SDS vendors are about as uncomfortable with that prospect as you likely are. If their software ends up running on a Spud's Servers R Us server with Spin City hard drives, their product will look as junky as the commodity stuff it's running on. That's why so many SDS vendors sell their products in software-hardware bundles. Yes, it's still software-defined storage, but here's some hardware to go with it.

VMware is the poster child for that tactic. With a lot of hoopla, they rolled out VSAN as a true software-defined storage product, but the product—and the idea of buying storage software separate from the hardware—got a lukewarm reception. That experience spawned EVO:RAIL—software-defined storage again, but a reference model that ensured front-line hardware vendors would package it with their servers, storage and other gear. So much for commodity hardware, eh?

So what does all this mean? I'm glad you asked. First off, it means what every storage pro has always known: Hardware does, indeed, make a difference. But it also means you shouldn't let vendors distract you from the importance of the hardware—it's just as important as the software it hosts.

Rich Castagna is TechTarget's VP of Editorial.



STORAGE REVOLUTION / JON TOIGO

Back to school shopping for the real value of virtualization

Virtualization vendors claim Capex and Opex savings, but high school math blows holes in that argument.

If you are reading this, your kids (or your neighbor's kids) probably just re-entered school for a new, exciting year of learning. Those who aren't cut out for algebra, geometry, trigonometry or some other hardcore 'rithmetic will end up in "business math." And while you mathletes might write it off as simple stuff, there sure seems to be a lack of basic economics skills among IT planners.

Since server virtualization reared its head back in the early Aughties, I have read about ways virtualizing servers could save us money in IT. First, it was the cost savings from consolidating a bunch of little servers, each running one app, into a smaller number of servers running many virtualized apps. That was going to save a ton of money just on reduced hardware and software spending. But with increased hypervisor license fees, the need to add 7 to 16 more I/O ports per server, the need to load up with faster processors, more cores and more memory, and to layer some flash into the mix, high-end servers have a price tag that makes IBM mainframes look like a bargain.

Okay, so the Capex math didn't quite add up. But we could depend on those cost savings for administration—the Opex savings—they said, referring to the smaller number of IT staff you would need to retain, feed and clothe to ensure a highly virtualized and agile data center. Virtualization supposedly would automate and simplify everything so much that a half-baked virtualization administrator with a $25,000 certification from VMware could do the work of an application administrator, server administrator, storage administrator, network administrator, disaster recovery planner, data security planner, data manager and archivist, and basic electrician. Businesses could finally say goodbye to those IT fat cats and domain experts: Virtualizing servers would flatten the IT hierarchy once and for all.

As it turned out, the wonderful technology of virtualization may have hidden certain complexities, but it didn't improve the manageability of either infrastructure or data. It did displace talent in some IT shops, replacing experienced staff with virtualization school drones, many of whom have no idea about IT outside the knowledge imparted in VMware school.


So, troubleshooting is inefficient, and metrics such as mean time to repair, administrative or logistical downtime and first-fix rate are abysmal. All of this produces very unflattering Opex totals for those who care to do simple business math.

The biggest boondoggle of all has been with respect to storage. Virtualization vendors blamed all of their woes on storage from the get-go. But they did not tell any of their customers that, just over the horizon, they would need to rip and replace all existing storage infrastructure to obtain even modestly acceptable performance from virtualized application workloads.

We received that news about a year ago, whether they called it EVO:RAIL or Virtual SAN or hyper-converged infrastructure. Simple business math tells us that collectively it is going to cost companies a fortune to replace the 40 to 60 exabytes of storage deployed in data centers worldwide.

Furthermore, most of the software-defined storage (SDS) and hyper-converged infrastructure from the big virtualization vendors requires a minimum of three identical nodes per server, and a clustered server model for high availability. That means storage deployment increases to a minimum of six nodes (that's basic math for three nodes of identical storage behind each of two clustered servers, or 2 x 3). That represents, conservatively, about $90,000 in SDS licenses and hardware per server for an SDS product, and potentially much more for a prefabricated hyper-converged appliance. So, I guess that's what you will be doing with the huge cost savings that virtualizing servers is supposed to deliver?
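Since the column leans on "simple business math," here is a minimal back-of-the-envelope sketch of that node and cost arithmetic; the $90,000-per-server figure is the conservative estimate cited above, and everything else is illustrative.

```python
# Back-of-the-envelope version of the node math described above.
# The per-server cost is the article's conservative $90,000 estimate;
# the geometry (2 clustered servers x 3 storage nodes each) follows the text.

def sds_cluster_estimate(clustered_servers: int = 2,
                         storage_nodes_per_server: int = 3,
                         cost_per_server_usd: float = 90_000.0) -> dict:
    """Return the total storage node count and a rough SDS license/hardware cost."""
    total_nodes = clustered_servers * storage_nodes_per_server  # 2 x 3 = 6
    total_cost = clustered_servers * cost_per_server_usd        # ~$90K per server
    return {"storage_nodes": total_nodes, "estimated_cost_usd": total_cost}

if __name__ == "__main__":
    print(sds_cluster_estimate())
    # {'storage_nodes': 6, 'estimated_cost_usd': 180000.0}
```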

Look, maybe instead of going to virtualization vendor school, we should send IT staffers back to high school business math for a refresher. Heck, they might even be able to take the course online, assuming they know how to operate a PC.

Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.



FIBRE CHANNEL

Fibre Channel is alive and kicking

Don't sound the death knell for Fibre Channel storage yet; market research indicates widespread use and ongoing development.

By Carol Sliwa

Reports of the death of Fibre Channel have resurfaced, but once again, the dire proclamations appear to be an exaggeration. Market research suggests only that the storage interconnect is heading for a slow downtrend, not the technology graveyard.

Dell’Oro Group, a market research firm that tracks storage networking, forecasts an average annual decrease through 2019 of 3% in Fibre Channel (FC) switch ports and 7% to 8% in the FC ports on host bus adapters (HBAs), which provide connectivity and I/O processing between servers and storage.

Crehan Research Inc. projects an average reduction of 5% per year through 2019 in shipments of both FC switch and HBA ports, on the heels of a 2% actual drop from 2008 through 2014.

“Fibre Channel has a large installed base, and it’s seeing a gradual decline,” said Seamus Crehan, president of the research company. “It’s not falling off a cliff.”

FC switch revenue grew from $1.8 billion to $1.9 billion from 2013 to 2014, despite the FC switch port decline from 5.9 million to 5.8 million, according to Crehan. He said users moved to 16 Gigabit per second (Gbps) FC technology, and the newer switches carried a relatively small price premium over the older 8 Gbps FC.
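To put those single-digit declines in perspective, here is a minimal sketch that compounds a steady annual drop from the 2014 shipment figures cited above; the roughly 5% rate and the 5.8 million and 2.8 million baselines come from the Crehan Research numbers in this article, and the code itself is only illustrative.

```python
# Compound a constant annual decline to project FC port shipments.
# Baselines are the 2014 figures cited in the article (millions of ports);
# the ~5% rate is Crehan Research's average projection through 2019.

def project_ports(base_millions: float, annual_decline: float, years: int) -> float:
    """Apply a fixed percentage decline for the given number of years."""
    return base_millions * (1.0 - annual_decline) ** years

if __name__ == "__main__":
    print(round(project_ports(5.8, 0.05, 5), 1))  # switch ports, 2019: ~4.5 million
    print(round(project_ports(2.8, 0.05, 5), 1))  # HBA ports, 2019:   ~2.2 million
```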



FC used for mission-critical apps

Many enterprises remain committed to FC storage networking for mission-critical workloads. Martin Littmann, CTO at Kelsey-Seybold Clinic in Houston, said his organization plans to increase its use of "proven" Fibre Channel technology. Kelsey-Seybold's storage networking consists of FC for storage arrays, Fibre Channel over Ethernet (FCoE) from the host servers to top-of-rack switches, and some InfiniBand.

Dean Flanders, head of informatics at Friedrich Miescher Institute (FMI) for Biomedical Research in Basel, Switzerland, said his organization got burned badly when trying out iSCSI several years ago and has stuck with FC ever since.

Flanders said FC may lack some reusability for networking purposes other than storage, and require in-house expertise, but he likes the fact that FC was purpose-built for data storage, doesn't drop data packets and offers Linux drivers that have been around for more than a decade.

“If it ain’t broke, don’t fix it,” Flanders wrote in an email. “There is a big difference between theory and practice, and in practice, FC works.”

Fibre Channel units sold (millions of ports)

Year: 2010 / 2011 / 2012 / 2013 / 2014 / 2019 (projected)
Switch: 6.4 / 6.3 / 6.2 / 5.9 / 5.8 / 4.4
HBA: 3.3 / 3.4 / 3.5 / 3.0 / 2.8 / 2.1

Source: Crehan Research Inc.


Flanders said he could foresee a gradual move to more Internet Protocol (IP)-based object storage, potentially leading to smaller pools of FC storage. But, he said an ultra-reliable FC SAN would still make sense, especially for the virtual machine infrastructure.

Why the FC unit shipment decline?

FC has long been a prime choice for enterprises in need of a high-speed interconnect for SANs, but the landscape started to shift with the introduction of iSCSI storage more than 10 years ago.

Ethernet networking gained acceptance for block storage and became popular for file and, later, cloud-based object storage, for a variety of reasons. It costs less, requires no dedicated switching or special training, and offers adequate performance for most business applications.

Sergis Mushell, a research director at Gartner, said FC unit shipments are declining because of the changing nature of the data that organizations save. He said they keep more unstructured data, and much of the exponentially growing data is not important enough to require highly resilient FC SANs.

“You store the data based on the perceived value of the data,” said Mushell. “You don’t put a YouTube video on the most expensive storage platform.”

Still, Mushell predicted the FC unit decline would remain in single digits for at least the next three to five years. He said storage technologies "can be on a deathbed for a long time," pointing to tape and optical drives as examples.

“All practical storage technologies have a very long tail. It goes down to some number, and then that number will continue for several years possibly, because here’s the thing: You have data stored on that technology. You have infrastructure,” said Mushell. “So, you’re going to use it as long as you can.”

'Challenge is finding growth'

New York-based 451 Research LLC conducted interviews this year with 247 large and medium-sized enterprises, and the results showed FC’s footprint remains significant. The vast majority (83%) use fabric or director-class FC switches in production.

"The challenge is finding growth," wrote Marco Coulter, vice president of storage customer insight at 451 Research, in an email. He noted that only four respondents without FC deployed in production today indicated plans to introduce it.

Among a subset of 82 enterprises that discussed storage strategy, the 67% using an FC SAN viewed it as strategic over the next five years and 40% of that group said any alternative would need to offer better performance at a lower cost to supplant it. Yet, iSCSI was the top alternative for 39% who didn’t view an FC SAN as strategic, and FCoE was next, at 17%.

"Storage in enterprises is going through dramatic architectural changes driven by flash, cloud and hyper-convergence," Coulter concluded. "In the midst of this chaos, FC SAN sits as a strategic technology already in use but not the technology of choice for greenfield deployments."


Flash storage gives FC a kick

FC has gotten a boost from flash storage. In a TechTarget survey earlier this year, eight of 11 storage vendors said the majority of their enterprise customers use FC switches and adapters with their all-flash arrays and hybrid storage systems, which combine solid-state and hard-disk drives.

Nimble Storage, which specializes in hybrid arrays, saw a need to introduce FC networking last year after initially supporting only iSCSI in its SAN products. The startup wanted to expand its market base to enterprises after originally focusing on higher-end, small and medium-sized businesses.

EMC and Hewlett-Packard said flash has spurred upgrades to 16 Gbps FC in many cases. Nearly half the FC switch ports shipped last year were 16 Gbps, and 16 Gbps HBAs started to ramp up this year, according to Crehan. He said that adoption has been faster for switches because of inter-switch links, where the bandwidth demand is greater.

From 2013 to 2014, FC HBA ports declined from 3 million to 2.8 million, as HBA revenue fell from $570 million to $537 million, according to Crehan.

Jeff Hoogenboom, vice president and general manager of Avago Technologies’ Emulex Connectivity Division, one of the two major HBA vendors, attributed the slowing of the FC market to the decline of Unix servers.

But, Hoogenboom said flash, the transition to 16 Gbps and market growth in regions such as China are reinvigorating FC. He claimed the SAN-attached rate for FC HBAs and FCoE converged network adapters (CNAs) grew 2% from 2013 to 2014, with a slight decline for the HBAs and slight increase for the newer CNAs.

Future of FC technology

At this point, FC shows no signs of hitting the end of the technology innovation road. The major FC switch makers, Brocade Communications Systems and Cisco Systems, and HBA vendors, QLogic and Avago’s Emulex division, all confirmed they are working on next-generation products that promise to double the data transfer rate to 32 Gbps.

FC ports shipped in 2014 by speed

Switch: 16 Gbps 48%, 8 Gbps 52%, 4 Gbps 1%
HBA: 16 Gbps 11%, 8 Gbps 86%, 4 Gbps 3%

Note: Figures do not add up to 100% due to rounding. Source: Crehan Research Inc.


"Cisco's philosophy is to invest in multiprotocol storage networking, and toward that end, there's a class of applications such as databases and [Microsoft] Exchange that require high-transaction I/O processing," said Nitin Garg, a senior manager of product management in Cisco's data center and enterprise switching group. "It's for this class of applications that customers have deployed Fibre Channel in the past, and we expect that customers will continue to deploy Fibre Channel for the foreseeable future."

Brocade, typically the first switch vendor to support new FC technology, expects to start shipping 32 Gbps products in 2016, according to Scott Shimomura, a director of product marketing.

He noted the latest Gen 6 FC standard also provides an option for 128 Gbps FC, mapping four lanes of 32 Gbps FC onto a single cable and optical transceiver to enable customers to aggregate traffic. He said Brocade expects to ship 128 Gbps switches in 2017. The use of 128 Gbps FC will require users to shift to quad small form-factor pluggable (QSFP) transceivers from the SFP+ optics used with 16 and 32 Gbps FC, he added.

Shimomura said he has heard reports about the impending death of Fibre Channel since 2000, just three years after the introduction of the first FC products. He said the naysayers tend to resurface just before a new FC generation is ready to hit the market.

"Every time they've predicted the death, the opposite's been true. We had record years all the way up until probably a year or two ago, when we saw a flattening."

For the most recent fiscal quarter, ending on May 2, Brocade reported SAN product revenue of $314 million, down 2% from the same quarter in 2014. The company blamed the decline on softer storage demand and operational issues at certain OEM partners. Meanwhile, Brocade's IP Networking product revenue was up 19% over the same quarter last year, to $145 million.

Marc Staimer, president of Dragon Slayer Consulting, said he foresees FC SANs being relegated to niche status within 10 years, in use only in places where IT organizations are slow to change.

"There are only two Fibre Channel switch vendors left, and there are only two HBA vendors left. That tells you it's a non-growth market," said Staimer. "Nothing ever dies in storage, but Fibre Channel's on the watch list."

Carol Sliwa is a senior writer with TechTarget's Storage Media Group.



Snapshot 1: Companies lacking DR plans offer weak excuses


What is your relative level of confidence in your organization's DR plan in the event of an emergency?

None: 2%
Low: 8%
Moderate: 49%
High: 41%

What's the main reason your company doesn't currently have a disaster recovery plan in place?*

Developing a plan now: 39%
Haven't gotten around to it: 29%
Lack the funds: 26%
Backup system suffices: 20%
Lack the staff: 17%
Don't see a need: 14%
Other: 9%

32%: Percentage of companies that don't have a disaster recovery plan in place

* Multiple selections permitted


SCALABLE STORAGE

Architecture for analytics: Big data storage

Big data analytics place new demands on storage systems and often require new or modified storage structures.

By Mike Matchett

Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer—"we just need to analyze all that data to mine its value." But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn't always been feasible.

Historically, unless data had some corporate value—possibly as a history trail for compliance, a source of strategic insight or intelligence that can optimize operational processes—it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All of that bulky low-level bigger data has little immediate value, but there might be great potential someday, so you want to keep it—once it's gone, you lose any downstream opportunity.

Big data alchemy

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.


For many years, the standard approach was to aggregate limited structured/transactional elements into data warehouses (and related architectures) to feed BI workflows, and archive older or file-based data for compliance and targeted search/recall needs. Thus, we've long supported expensive scale-up performance platforms for structured query-based analytics alongside capacity-efficient deep content stores to freeze historical and compliance data until it expires. Both are complex and expensive to implement and operate effectively.

But, that limited bimodal approach left a lot of potential data-driven value out of practical reach. Naturally, the market was ripe for innovation that could not only bring down the cost of active analysis at larger scale and faster speeds, but also fill in the gaps where value-laden data was being left unexploited. For example, archiving architectures began embedding native search-type analytics to make their captured cold data more "actively" useful. Today, after generations of performance and scalability design improvements, what was once considered a dumping ground for dying data has evolved into Web-scale object storage (e.g., AWS S3).

Likewise, the emerging Hadoop ecosystem brought HPC-inspired scale-out parallel processing onto affordable hardware, enabling rank-and-file organizations to conduct cost-efficient, high-performance data analysis on a large scale. As a first use case, Hadoop is a good place to land raw detail data and host large-scale ELT (extract, load, transform)/ETL (extract, transform, load) for highly structured BI/DW architectures. But the growing Hadoop ecosystem has also unlocked the ability to mine value from less-structured, higher-volume and faster-aggregating data streams.

Keys to big data storage success

When tasked with storing and analyzing data on a large scale, consider the following:

- While storage (and compute, memory and so on) is constantly getting cheaper, as the analytical data footprint grows over time so does cost. When you budget, account for data transmission and migration fees, lifetime data access operations (even the cost of eventual deletion) and other administrative/operational factors (e.g., storage management opex).

- Ensuring data protection, business continuity/availability and security does not get easier at scale. And placing tons of eggs into only a few baskets can create massive points of vulnerability.

- While many analytical needs have been met with batch-oriented processing, more and more analytical outputs are being applied in real time to affect the outcomes of dynamic business processes (i.e., meeting live prospect/customer needs). This operational-speed intelligence requires well-planned big data workflows that will likely cross multiple systems and will probably require judicious amounts of flash cache or in-memory processing.


Today, complex Hadoop and Hadoop-converged processing offerings (e.g., HP Haven integrating Hadoop and Vertica) are tightly marrying structured and unstructured analytical superpowers, enabling operationally focused (i.e., in business real-time) big data analytics applications.

In fact, we are seeing new and exciting examples of convergent (i.e., convenient) IT architectures combining ever larger-scale storage with ever larger-scale data processing. While the opportunity for organizations to profitably analyze data on a massive scale has never been better, the number of storage options today is bewildering. Understanding the best of the options available can be quite challenging.

Massive analytical data storage

So what does it take to store and make bigger data analytically useful to the enterprise? Obviously, handling data at larger scale is the first thing that most folks need to address. A popular approach is to leverage a scale-out design in which additional storage nodes can be added as needed to grow capacity. Scale-out products also deliver almost linear performance ramp-up to keep pace with data growth—more nodes for capacity mean more nodes serving IOPS. Certain architectures even allow you to add flash-heavy nodes to improve latency and capacity-laden ones to expand the storage pool.
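Here is a minimal sketch of that scale-out arithmetic: every node added contributes both usable capacity and I/O, so aggregate performance grows roughly in step with capacity. The per-node figures are hypothetical placeholders, not any product's specifications.

```python
# Minimal scale-out model: each added node contributes capacity *and* IOPS,
# which is why aggregate performance ramps roughly linearly with capacity.
# Per-node figures are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class NodeSpec:
    usable_tb: float  # usable capacity per node
    iops: int         # I/O operations per second per node

def cluster_totals(node: NodeSpec, node_count: int) -> tuple[float, int]:
    """Aggregate capacity (TB) and IOPS for a homogeneous scale-out cluster."""
    return node.usable_tb * node_count, node.iops * node_count

if __name__ == "__main__":
    node = NodeSpec(usable_tb=48.0, iops=20_000)
    for n in (4, 8, 16):
        tb, iops = cluster_totals(node, n)
        print(f"{n:>2} nodes -> {tb:,.0f} TB, {iops:,} IOPS")
```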

Many scale-out storage products are available as software-defined storage; in other words, they can be purchased as software and installed on more cost-effective hardware. In the real world, however, we see most folks still buying SDS as appliances, pre-loaded or converged to avoid the pain of DIY implementation.

The second thing we find with these new generations of massive analytical systems is that the analytical processing is being converged with the storage infrastructure. There is, of course, a remote I/O performance “tax” to be paid when analyzing data stored on a separate storage system, and with bigger data and intensive analytics that tax can be staggering, if not an outright obstacle.

When we look at the widely popular and growing Hadoop ecosystem (YARN, MR, Spark and so on), we see a true paradigm shift. The Hadoop Distributed File System (HDFS) has been designed to run across a scale-out cluster that also hosts compute processing. Cleverly parallelized algorithms and supporting job schedulers farm out analysis tasks to run on each node to process relevant chunks of locally stored data. By adding nodes to deal with growing scale, capacity can be increased while overall performance remains relatively constant.
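A toy sketch of the data-locality idea described above follows: the scheduler assigns each analysis task to a node that already holds the block it needs rather than pulling data across the network. The block placement and scheduling here are simplified stand-ins, not Hadoop's actual implementation.

```python
# Toy illustration of data-local scheduling: each task runs on a node that
# already stores the block it needs. A simplified stand-in for what Hadoop's
# job schedulers do, not the real implementation.

from collections import defaultdict

# block_id -> nodes holding a replica (hypothetical placement)
block_locations = {
    "blk-001": ["node1", "node2"],
    "blk-002": ["node2", "node3"],
    "blk-003": ["node1", "node3"],
}

def schedule(blocks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assign each block's task to the least-loaded node holding a replica."""
    load: dict[str, int] = defaultdict(int)
    plan: dict[str, list[str]] = defaultdict(list)
    for block, replicas in blocks.items():
        target = min(replicas, key=lambda n: load[n])  # prefer an idle local replica
        load[target] += 1
        plan[target].append(block)
    return dict(plan)

if __name__ == "__main__":
    print(schedule(block_locations))
    # {'node1': ['blk-001'], 'node2': ['blk-002'], 'node3': ['blk-003']}
```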

Since Hadoop is a scale-out platform designed to run on commodity servers, HDFS is basically software-defined storage custom-designed for big data. However, there are some drawbacks to a straight-up Hadoop implementation, including challenges with handling multiple kinds of data at the same time, mixed user/workloads with varying QoS needs, and multi-stage data flows. Within a single Hadoop cluster, it can be hard to separately scale capacity and performance. And Hadoop's native products are still maturing around enterprise data management requirements, although Hadoop vendors like Hortonworks and Cloudera continue to fill the remaining gaps.



For some use cases, the fact that enterprise networks are always getting faster means that a separate scale-out storage system closely networked to a scale-out processing system can also make sense. Instead of fully converging processing with storage, maintaining loosely coupled infrastructure domains can preserve existing data management platforms and provide for shared multi-protocol data access patterns.

Before attempting to use existing enterprise storage, consider the large-scale analytical data demands—will traditional storage platforms designed to share centralized data with many similar client workloads be able to serve a tremendous number of different small files, or a small number of tremendously large files, to each of many analytical applications at the same time? For example, HDFS is designed to support analyses that call for huge streams of long serial reads, while traditional NAS might focus on tiering and caching hot data for small-file read and write use cases.

Futures: Converged data lakes

Overall, it seems like converged (and hyper-converged) products are the future. The evolving Hadoop architecture is but one example. With the advent of container technologies, we are starting to hear about how traditional storage arrays may now be able to natively host data-intensive applications.

IT convergence is happening on several levels: through the integration of storage and compute (and networking services), and by mixing diverse data types (e.g., transactional records with unstructured documents and machine data repositories) together to support increasingly complex and demanding applications.

Managing massive storage for analytics

Here is a list of areas to pay attention to:

1. Capacity planning. Balancing vast amounts of data with a seemingly infinite scale-out infrastructure is not trivial. Capacity needs ongoing attention and planning to optimize cost while avoiding getting caught without enough space.

2. Clusters. As clusters of any kind of IT infrastructure grow to hundreds or even thousands of nodes, effective cluster management becomes increasingly important. Patching, provisioning and other tasks become difficult without world-class management.

3. Big data workflows. When designing really effective storage systems, think about data from an end-to-end lifecycle perspective by following the data from sources, to results, to content distribution and consumption (and back again).

4. Data protection. At scale it's even more important to protect data from loss or corruption, and recover from potential disasters. Look for snapshot, replication, backup and DR approaches that could address bigger data stores.



Many vendors today are pushing the idea of an enterprise big data lake in which all relevant corporate data first lands to be captured, preserved and mastered in a scale-out Hadoop cluster. Data from that master repository would then be directly available for shared access by big data analytics applications and users from across the organization.

However, some of the thorniest challenges to the data lake concept are governance and security. It's hard to track exactly what data is in a structured database, much less sunk into a huge unstructured data lake. That's important not just to help figure out what can be useful in any given analytical scenario, but also to find and perhaps mask things like credit card numbers for compliance reasons. And with multiple modes of access across an ever-changing data lake, how do you control who has access to which data and track who has accessed it? From a data quality perspective, users will want to know which data is most current, where it came from and what exactly about it has been validated.

Back to the cloudy future

We are seeing a resurgence of storage previously associated with HPC-like environments now coming into its own in enterprise data centers to support large-scale analytical processes. Some examples include Panasas' PanFS, Lustre-based arrays (e.g., DDN EXAscaler) and IBM's GPFS packaged into IBM Spectrum Scale.

Also, public cloud storage and burstable analytical processing go hand-in-hand (e.g., AWS S3 and Amazon Elastic MapReduce). Today, many cloud security regimes are better than some enterprise data centers, and cloud options are now able to meet most compliance regulations.

One perceived sticking point with the cloud is the cost and time of moving data into and across clouds, but in practice for many applications, most data only needs to be moved into the cloud once, and from then on, only manageable increments of data need be migrated (if not produced in the cloud). Of course, cloud data storage costs over time will accrue, but that can be budgeted.
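A minimal sketch of the kind of budgeting the author describes appears below: a one-time bulk ingest followed by monthly increments, with the stored total accruing a per-GB monthly charge. All rates are hypothetical placeholders, not any provider's actual pricing.

```python
# Rough cloud-storage budget model: one bulk move into the cloud, then monthly
# increments, with the stored total accruing a per-GB monthly charge.
# All rates below are hypothetical placeholders, not any provider's pricing.

def cloud_storage_budget(initial_tb: float,
                         monthly_growth_tb: float,
                         months: int,
                         storage_per_gb_month: float = 0.02,
                         ingest_per_gb: float = 0.00) -> float:
    """Cumulative ingest + storage cost (USD) over the budgeting period."""
    stored_gb = initial_tb * 1024
    total = stored_gb * ingest_per_gb              # one-time bulk ingest
    for _ in range(months):
        total += stored_gb * storage_per_gb_month  # monthly storage accrual
        stored_gb += monthly_growth_tb * 1024      # manageable monthly increments
    return round(total, 2)

if __name__ == "__main__":
    print(cloud_storage_budget(initial_tb=100, monthly_growth_tb=5, months=36))
```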

With software-defined, scale-out storage, converged infrastructure (storage and processing), virtualization, containers and cloud approaches, enterprises are now well-armed to build cost-effective scalable storage to support whatever analytical challenge they take on.

Mike Matchett is a senior analyst and consultant at Taneja Group.



Snapshot 2: Disk, replication and—surprise!—tape are key DR techs


What technologies/practices do you currently use in your DR plan?*

Disk backup: 59%
Remote replication: 51%
Off-site tape storage: 43%
Tape backup: 41%
Co-location services: 29%
Cloud backup: 28%
DR monitoring app: 19%
Online vaulting: 18%

Most important factors in judging tape, disk or online vaulting products for DR*

Capacity/scalability: 75%
Price: 72%
Recovery speed: 63%
Compatible with current backup/storage: 54%
Compatible with virtual machines: 52%
RPO/RTO: 46%
Compatible with servers: 39%
Familiar with the media: 23%

76% cite price as the most important criterion when evaluating DR management/monitoring software.

* Multiple selections permitted on both graphs


STORAGE SYSTEMS

Beyond block and file: Integrating object storage

Object storage offers scalability that traditional storage doesn't, but integration with block and file is tough.

By Chris Evans

For the past 20-plus years, block and file have been the two main external shared storage system protocols. Both protocols have been successful due to the ubiquity of the networking interfaces that drive them. In the case of block devices, that has been Fibre Channel and Ethernet (iSCSI); and for file, it has been Ethernet (CIFS/SMB and NFS).

However, block and file are not well-suited for building large-scale data repositories, mainly due to issues with data protection, indexing and addressing. RAID doesn't scale adequately, and file-based protocols start to run into issues with metadata management at petabyte-sized volumes of data and billions of files.

Object storage has emerged as an answer to storing data at the multi-petabyte level. However, many applications expect traditional SAN and NAS interfaces, so integrating an object store is not as straightforward as using block- and file-based systems. But it's being done today, and there are a variety of options available for making object storage work with your organization's key applications.

Why object storage?

Object storage has unique attributes that overcome the issues of scalability and metadata management seen in traditional storage platforms. These features include:


Dispersed and geo-dispersed data protection. Protection mechanisms are typically implemented with a form of erasure coding, also known as forward error correction, which allows lost or corrupted data to be recreated from a subset of the original content. The exact ratios of redundant to primary data are determined by the service level that needs to be applied to that content. As a protection mechanism, erasure coding is much more scalable and capacity-efficient than RAID (albeit at the cost of additional CPU overhead). Erasure coding also offers business continuity/disaster recovery (BC/DR) benefits by allowing subsets of erasure-coded data to be placed in geographically distant locations. This can protect against the failure of one (or more) of the systems in these locations. Obviously, the specific configuration of an object storage system depends on an organization's specific data protection requirements.
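To make the capacity-efficiency point concrete, here is a small sketch comparing the raw-capacity overhead of a k+m erasure-coded layout with three-way replication; the 10+4 geometry is only an illustrative example, not a recommendation.

```python
# Capacity overhead of k+m erasure coding vs. N-way replication.
# A k+m scheme stores k data fragments plus m coding fragments and can
# rebuild the data after losing any m fragments. The 10+4 geometry is
# just an illustrative example.

def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes consumed per byte of user data for a k+m layout."""
    return (k + m) / k

def replication_overhead(copies: int) -> float:
    """Raw bytes consumed per byte of user data for N-way replication."""
    return float(copies)

if __name__ == "__main__":
    print(f"10+4 erasure coding: {erasure_overhead(10, 4):.2f}x raw capacity, "
          "tolerates 4 lost fragments")
    print(f"3-way replication:   {replication_overhead(3):.2f}x raw capacity, "
          "tolerates 2 lost copies")
```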

Improved data management. With any storage system, there is always the risk of data loss or corruption. Today’s disk and solid-state storage media are reliable, but not totally error-free. That can be a problem with very large-scale data repositories. Storage media does fail and may be subject to silent corruption or unrecoverable read errors (UREs) that put data at risk. Object stores mitigate these problems using data scrubbing techniques that validate and rebuild potentially corrupt or missing data. The use of erasure coding and the typical write-once nature of object store data allows failed data to be recreated as a background task with little or no impact to production operations.

The ability of object stores to manage device failure at scale (and the fact that object stores do not have high I/O performance requirements) means that systems can use lower-cost, higher-capacity drives than their block or file counterparts. At scale, the ability to maximize capacity and reduce the cost per TB becomes a design imperative.

Detailed and extensible metadata. Block-based storage systems collect very little information about the contents of the data being stored in the system. The metadata that does exist is used to map logical concepts such as LUNs or volumes to the physical location of that data on disk. Modern block storage systems use metadata to track the application of space-saving features such as thin provisioning and data deduplication, which are infrastructure-focused rather than content-focused. File-based storage makes use of slightly more metadata, as the nature of storing files requires keeping track of permissions (ACLs), access dates/times and file owners.

Object stores offer much richer metadata capabilities, typically providing extensibility to the metadata model itself, allowing abstract key pairs (keywords and values) to be stored with each object. Object storage systems have the ability to search metadata quickly and efficiently to locate and retrieve objects from the store.

Simplified data access. At the heart of the technology, object stores use vastly simplified access methods to store and retrieve data. REST APIs based on Web protocols like HTTP allow objects to be accessed through a unique URL. The URL is built from API commands (like GET and PUT) plus a unique reference code assigned to each object—the object ID.



In terms of standards, the de facto object API is Amazon Web Services' S3 (Simple Storage Service). The S3 API format has become so ubiquitous that object-based platforms must support it to compete in today's market. The accuracy of S3 support is a key success factor for vendors and their products. Many applications use S3 support as the standard method of writing and reading application data.
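As a concrete illustration of that access model, here is a minimal Python sketch that stores and retrieves an object, along with user-defined metadata, using the boto3 S3 client against an S3-compatible endpoint; the endpoint URL, credentials, bucket and key names are placeholders.

```python
# Minimal S3-style object access with boto3 against an S3-compatible store.
# Endpoint URL, credentials, bucket, key and metadata values are placeholders.

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",  # any S3-compatible endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# PUT: store an object together with user-defined key/value metadata
with open("report.pdf", "rb") as f:
    s3.put_object(
        Bucket="research-data",
        Key="2015/09/report.pdf",
        Body=f,
        Metadata={"project": "fc-survey", "owner": "storage-team"},
    )

# GET: retrieve the object (and its metadata) by key
resp = s3.get_object(Bucket="research-data", Key="2015/09/report.pdf")
content = resp["Body"].read()
print(resp["Metadata"])  # {'project': 'fc-survey', 'owner': 'storage-team'}
```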

Versioning. REST-based APIs provide a much easier way to interact with an object store, as almost all commands operate at the object level. In addition, on most object stores, an individual object is immutable, meaning that once created, it cannot be changed. Updates to the data within an object require the user to retrieve the data, change the object and store it again into the object store. The result is a new object ID or a new version of the same object. This ability to version data provides an audit trail and archive log that allows previous versions of objects to be retrieved. On systems that provide data deduplication, the overhead of object versioning is restricted to the change in data itself.
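A short follow-on sketch of that versioning behavior in the same S3 dialect follows: enable versioning on the bucket, write the same key twice, and list the versions that accumulate. Again, the endpoint, credentials and names are placeholders.

```python
# Object versioning in the S3 dialect: once versioning is enabled, writing to
# an existing key creates a new version rather than changing the object in
# place. Endpoint, credentials and names are placeholders.

import boto3

s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

s3.put_bucket_versioning(Bucket="research-data",
                         VersioningConfiguration={"Status": "Enabled"})

s3.put_object(Bucket="research-data", Key="notes.txt", Body=b"first draft")
s3.put_object(Bucket="research-data", Key="notes.txt", Body=b"second draft")  # new version

for v in s3.list_object_versions(Bucket="research-data", Prefix="notes.txt").get("Versions", []):
    print(v["VersionId"], v["IsLatest"], v["LastModified"])
```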

Cloud-based object storage integration

All of the major cloud vendors offer some kind of object storage technology and, in many cases, it was the first storage platform they offered. AWS offers S3 and Glacier, Google Cloud Platform offers Cloud Storage and Microsoft Azure offers Blob Storage. Consuming these services is a case of writing for the APIs, all of which are, of course, subtly different. That means vendors must develop software using the API to allow object resources to be used. The alternative is to use a cloud gateway that presents a well-known interface to the end user, such as block or file. The cloud providers already offer these features to some degree. Amazon provides the AWS Storage Gateway, a locally installed software feature that emulates iSCSI storage while storing the content on S3. Microsoft acquired StorSimple in 2012 and now offers a hardware appliance based on the technology as an on-ramp to storing data in Microsoft Azure while using the standard iSCSI interface.

There is also a range of vendors that offer hardware and software products that integrate with the cloud providers. These include CTERA, FXT from Avere Systems, Nasuni, NetApp's AltaVault (based on technology acquired from Riverbed), Panzura and TwinStrata (now part of EMC). However, because these products provide the ability to consume cloud-based object stores through the familiarity of block and file protocols, they don't deliver the full benefits of object storage, per se. So what about IT departments looking to deploy in-house object stores? How does the landscape change when you want to deploy hardware on-site, rather than consuming it as a service from a cloud vendor?

11% of our readers are using object storage today; another 9.6% are "actively evaluating" it. (Source: TechTarget)


In-house object storage

There are both proprietary and open-source platforms for building object stores. The leaders in the market (according to IDC) are Cleversafe, Scality, DataDirect Networks (DDN), Amplidata and EMC. NetApp, Caringo, Cloudian and HDS are also major players. Some are deployed as appliances and some as software-only, where the customer chooses hardware of their own specification. Looking to the open-source platforms, there are options such as Ceph and OpenStack Swift. Ceph is now part of Red Hat, which offers commercial support, and SwiftStack provides support for Swift.

These products offer the common characteristics of cloud storage (protection, data and metadata management, APIs and versioning). There is widespread (usually native) protocol support among these systems. Scality supports NFS, SMB, Linux FS, REST, CDMI, S3, OpenStack Swift, Cinder and Glance, making the platform readily integrated into OpenStack environments as the persistent storage layer for all requirements.

Of course, object stores must have a good metadata engine and rich application support. Cloudian, for example, uses a modified version of the open-source Cassandra database for both metadata and transaction logging. The database can be shared and distributed across multiple nodes, providing scalability for the metadata function as object volumes increase. Hitachi's HCP is a good example of a platform with strong application support. HDS offers integrated search capabilities (Hitachi Data Discovery Suite), data ingest (with Hitachi Data Ingestor) and integration with secure file sharing through HCP Anywhere.

Most vendors have optimized their technology for performance through tiering, supporting both solid-state and traditional spinning media. DDN's WOS, for example, is capable of supporting up to three million IOPS in a single cluster, with optimization for both small- and large-file performance. Cleversafe, Cloudian and DDN all use techniques to measure the latency of each node in a cluster, retrieving data from the nodes with the lowest latency scores. This feature is particularly important in geo-dispersed configurations.
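Below is a simplified sketch of that latency-aware read selection: track a recent latency measurement per node and read from the replicas with the lowest scores. It illustrates the idea only and is not any vendor's actual algorithm; node names and latencies are hypothetical.

```python
# Simplified latency-aware replica selection: read from the nodes holding a
# copy of the object that currently report the lowest latency. An illustration
# of the idea, not any vendor's implementation.

replica_nodes = ["node-a", "node-b", "node-c", "node-d"]

# most recent per-node latency measurements in milliseconds (hypothetical)
observed_latency_ms = {"node-a": 18.0, "node-b": 4.5, "node-c": 7.2, "node-d": 11.3}

def pick_read_nodes(replicas: list[str], latency_ms: dict[str, float], needed: int) -> list[str]:
    """Choose the `needed` replicas with the lowest observed latency."""
    return sorted(replicas, key=lambda n: latency_ms.get(n, float("inf")))[:needed]

if __name__ == "__main__":
    # e.g., an erasure-coded read that only needs fragments from 2 of the 4 nodes
    print(pick_read_nodes(replica_nodes, observed_latency_ms, needed=2))
    # ['node-b', 'node-c']
```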

Many of these systems offer features that allow nodes to be added to and removed from a storage cluster without interruption to meet availability requirements. Nodes being migrated can simply be powered off, moved and re-added to the cluster, with background data management features updating and correcting any changed or corrupted data identified during the physical move. Most also provide data encryption at rest for added protection in large environments where drive replacements can be frequent. Multi-tenancy is common as well, making them suitable for service provider environments or private clouds.

Finally, some object storage platforms offer integration between private and public object stores. Cloudian and HCP, for example, offer this functionality. This allows organizations to take advantage of the cost effectiveness of the public cloud for certain types of data (such as inactive or rarely accessed content) while retaining the ability to search across on-site and cloud data.

Chris Evans is an independent consultant with Langton Blue.


Where does the majority of the value reside in a typical enterprise storage array? Is it the software, the hardware or both?

With all the attention and hype around software-de-fined storage (SDS) lately, the industry seems to be lean-ing in the direction of software. The noise isn’t just coming from small storage startups, either. EMC, HP and IBM, just to name a few, are all offering SDS products.

Despite the confusion around SDS technology, many organizations will likely add SDS products to their list of future storage initiatives, betting on the promises of SDS, all of which look enticing:

• Hardware flexibility: Separating storage software from hardware allows organizations greater choice in what hardware they deploy and when they deploy it.

• Access to newer hardware technologies sooner: This flexibility allows faster or higher-capacity storage to be integrated as soon as the hardware is available, instead of having to wait four to five years for a new array to be released.

• Simplified license management: There is no need to buy new or upgrade storage software feature licenses when procuring the next generation of hardware.

• Support for multiple generations: The ability to integrate multiple hardware revisions over the lifetime of the system allows organizations to incrementally upgrade hardware with data in place, eliminating forklift upgrades.

However, there is a potentially hidden cost related to SDS that few in the industry are talking about. The mixing and matching of hardware made possible by SDS can shift the cost—or the risk—of integrating the software and hardware to the end-user.

While many like to use the term "commodity hardware," in truth, there is no such thing. I spent a portion of my career as a storage engineer, and I still have nightmares about the transition from U160 to U320 SCSI. I remember working long hours in a lab with sets of presumably commodity SCSI hard drives, each creating a unique interaction on the SCSI bus. Drives from vendor A would work fine. Drives from vendor B would work fine. However, if drives from vendors A and B were mixed together, the entire system would break. The results would vary based on controller firmware, drive manufacturers and drive firmware revisions. After months of detailed engineering and test analysis, we would be able to release a qualified and validated system that worked.

Times and technologies have changed to some extent, and some may argue that drive standards have improved. But I would argue that new hardware technologies, such as solid state, are evolving every day. The bottom line is that if we extend the idea of storage software abstraction to its fullest, it should be able to work with any hardware. If we see that as truly desirable for SDS deployments, the number of possible technology combinations in a system over the life of the software could become endless. In this scenario, the responsibility—and cost—to validate and integrate new hardware technologies will fall primarily to IT.

Organizations that are evaluating SDS products today often recognize this challenge and ask for hardware options that have been qualified as a way to help mitigate the risk. Many SDS offerings also have an appliance option to ease the integration concern. However, some might argue that this isn't truly software-defined storage. Additionally, SDS products that target large content repository storage workloads, such as object storage, create multiple copies of data or use erasure coding to improve resiliency, reducing the risk of data loss if a non-validated hardware component is deployed.
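
As a rough illustration of where that validation burden lands, here is a minimal sketch of a qualification gate an IT team might enforce before admitting a drive into an SDS pool. The qualified list, drive attributes and pool structure are hypothetical, not part of any particular SDS product.

```python
# Check a drive's model and firmware against a list the team (or the SDS
# vendor) has validated before allowing it into the storage pool.
QUALIFIED = {
    ("VendorA", "ModelX"): {"FW12", "FW13"},
    ("VendorB", "ModelY"): {"FW07"},
}

def is_qualified(vendor, model, firmware):
    return firmware in QUALIFIED.get((vendor, model), set())

def add_drive_to_pool(drive, pool):
    if not is_qualified(drive["vendor"], drive["model"], drive["firmware"]):
        raise ValueError(f"{drive['vendor']} {drive['model']} ({drive['firmware']}) "
                         "is not on the qualified list; validate before use")
    pool.append(drive)

pool = []
add_drive_to_pool({"vendor": "VendorA", "model": "ModelX", "firmware": "FW12"}, pool)
```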

I expect some organizations to welcome the opportunity to deploy storage technology as software. These organizations may be large enough to support their own qualification efforts and procure enough hardware that the scale justifies using non-vendor-qualified components. For these deployments, SDS as a software-only deployment may make tremendous sense.

Organizations will have to make a choice based on what is best for their business. As noted, SDS players are recognizing this challenge and responding by delivering appliances or providing lists of certified components. When choosing an SDS product, it is critical to evaluate if the vendor can truly offer the benefits of SDS—whether it is delivered as software or as hardware. n

Scott Sinclair is a storage analyst with Enterprise Strategy Group in Austin, Texas.


READ/WRITE / ARUN TANEJA

The flipside of hyper-convergence
Hyper-convergence has already had a big impact on primary storage. Now some vendors are taking the concept to data protection.

As hyper-converged products mature, I am watching splits occurring among hyper-converged vendors along several lines, with data protection at the core. Data protection in general is undergoing a sea change, but hyper-convergence adds yet another dimension.

Technologies such as data deduplication, compression, continuous data protection, synthetic backup, copy data management, very rapid snapshots, and WAN optimization applied to replication are changing traditional data protection. We are also starting to see flat backup offered by a variety of primary storage vendors. This approach allows users to send backup data directly to secondary storage without the need for backup software.
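
As an illustration of one of the techniques listed above, here is a minimal sketch of fixed-size-block deduplication. It is a teaching example under simplified assumptions (fixed block size, in-memory store), not any vendor's implementation.

```python
# Identical blocks are stored once and referenced by their content hash;
# repeated data consumes no additional space.
import hashlib

BLOCK_SIZE = 4096
block_store = {}            # hash -> block bytes (stored once)

def dedupe_write(data):
    """Split data into blocks; return the list of block hashes (the 'recipe')."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)   # only new blocks consume space
        recipe.append(digest)
    return recipe

def dedupe_read(recipe):
    return b"".join(block_store[d] for d in recipe)

recipe = dedupe_write(b"A" * 8192 + b"B" * 4096)   # two identical 'A' blocks
assert dedupe_read(recipe) == b"A" * 8192 + b"B" * 4096
print(len(recipe), "block references,", len(block_store), "unique blocks stored")
```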

On the surface, products from Gridstore, Maxta, Nimboxx, Nutanix, Scale Computing, SimpliVity and VMware EVO:RAIL partners may seem quite similar, but under the covers, some fundamental differences are showing up. The most basic difference is whether the only convergence is between compute and storage, with all the other layers left alone. That is rudimentary hyper-convergence at best, and vendors such as SimpliVity argue these products don't even belong in the hyper-converged category. It is important to understand what a given vendor has converged and what else is necessary to complete your overall IT infrastructure. The area in which this varies most is data protection.

HYPER-CONVERGED VENDORS OFFER VARIED STRATEGIES

For example, SimpliVity makes a categorical statement that (other than networking) one does not need any other products to build a complete IT infrastructure. That means no specialty data deduplication arrays, WAN optimization appliances, DR products, media servers, backup software, replication software, or performance, configuration and SRM managers. Over time, I expect this list will include archiving, governance, compliance and perhaps even Hadoop-based big data analytics. And whatever else the world has to throw at it. SimpliVity offers the most holistic vision for hyper-convergence I have seen in the industry. Of course, we all know in reality you may need additional technology at any given point in time, as things evolve. But the vision is crystal clear: one straightforward way to build an IT infrastructure.

In comparison, look at Nutanix. It stays true to "primary storage" hyper-convergence, which means it wants to leave secondary storage to others. This is a vastly different strategy. I expected market maturity would bring out some differences in strategy, but these differences are stark and will lead these companies in very different directions (including who they partner with and compete against).

Encouraged by Nutanix's primary storage approach and seeing the value of convergence in general, two new companies are trying to change the world of secondary storage: Cohesity and Rubrik. While there are differences in their approaches, both are applying the principles of hyper-convergence to secondary storage. In essence, this means backup, archiving, DR, test/dev, analytics, compliance, governance and so on, on one massive scale-out storage array.

Another way to look at this is as the creation of a two-storage-array world: one for primary data (storage interacting with the main application) and one for secondary data (storage for everything else). Storage requirements for these two worlds are, of course, vastly different. Performance (IOPS, transactional throughput and low latency), data resiliency and availability are the primary drivers for primary storage, while density, immutability and simplified technology upgrades are the primary drivers for secondary storage.

SECONDARY STORAGE MARKETPLACE IS A MISHMASH

Today, the world of secondary storage is a mishmash of products from hundreds of different vendors supplying data protection, archiving, compliance, governance, cloud gateways and more. These new companies want to change that completely. This is a heady charter, but the secondary storage market could not be more ready for change.

Many organizations today struggle to store, protect and manage mountains of data, let alone extract value from it. This is a new world, and we are at the earliest stages of development. Just as hyper-convergence is transforming primary storage, the same principles are being applied to all aspects of secondary storage—in a scale-out fashion that will hopefully avoid the issues we've been dealing with on the primary side with scale-up products. The impact on data protection stalwarts, such as Commvault, EMC, IBM and Symantec, could be significant.

Data protection as a discipline has been sleeping for decades, but data management convergence may be the wakeup call it needs. Get ready for an exciting ride. n

Arun Taneja is founder and president at Taneja Group, an analyst and consulting group focused on storage and storage-centric server technologies.


TechTarget Storage Media Group

STORAGE MAGAZINE
VP Editorial: Rich Castagna
Executive Editor: Andrew Burton
Senior Managing Editor: Ed Hannan
Associate Editorial Director: Ellen O'Brien
Contributing Editors: James Damoulakis, Steve Duplessie, Jacob Gsoedl
Director of Online Design: Linda Koury

SEARCHSTORAGE.COM, SEARCHCLOUDSTORAGE.COM, SEARCHVIRTUALSTORAGE.COM
Associate Editorial Director: Ellen O'Brien
Senior News Director: Dave Raffo
Senior News Writer: Sonia R. Lelii
Senior Writer: Carol Sliwa
Staff Writer: Garry Kranz
Site Editor: Sarah Wilson
Assistant Site Editor: Erin Sullivan

SEARCHDATABACKUP.COM, SEARCHDISASTERRECOVERY.COM, SEARCHSMBSTORAGE.COM, SEARCHSOLIDSTATESTORAGE.COM
Executive Editor: Andrew Burton
Senior Managing Editor: Ed Hannan
Staff Writer: Garry Kranz
Site Editor: Paul Crocetti

STORAGE DECISIONS/TECHTARGET CONFERENCES
Editorial Expert Community Coordinator: Kaitlin Herbert

SUBSCRIPTIONS
www.searchstorage.com

STORAGE MAGAZINE
275 Grove Street, Newton, MA 02466
[email protected]

TECHTARGET INC.
275 Grove Street, Newton, MA 02466
www.techtarget.com

©2015 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

Cover image and page 7: henrik5000/iStock


Stay connected! Follow @SearchStorageTT today.