

Permabit Technology Corporation

One Alewife Center, Suite 410

Cambridge, MA 02140 USA

Phone: 617.252.9600 FAX: 617.252.9977

[email protected] www.permabit.com

WHITE PAPER

Permabit Albireo Data Optimization Software

Virtual Data Optimizer (VDO) for Linux

August 2012


Contents

Executive Summary
Market Dynamics
    Low-End NAS Market — Big Storage is Coming to Town
    Enterprise Flash — New Solutions, New Challenges
    Cloud Computing — A Growth Opportunity
Ready-to-Run Deduplication Software
Albireo VDO Data Optimization Software
Albireo VDO Architecture
    Albireo VDO Deduplication Savings
    Albireo VDO Resource Efficiency
Why Permabit?
    Innovation
    Focus
    Expertise
Conclusion
About Permabit
Find Out More

“The Albireo technology from Permabit will save an OEM 18-24 months getting to market, if they can do it at all. This stuff is so far ahead in its capabilities and performance I can’t see why you would want to do it yourself, unless you already have it baked.”

— Steve Duplessie Founder & Sr. Analyst

Enterprise Strategy Group

Executive Summary

Rampant data growth is the IT industry’s single most important issue: the volume of data created, and the cost of keeping it, affects budgets, operating costs, floor space, and capital expenditure. According to IDC, the amount of electronic data created is expected to reach 35 zettabytes by 2020 [1]. The conundrum facing every business is how to afford to store, analyze, manage, and house this data.

With recent advancements in data optimization, IT organizations are exploring this technology as a more efficient means of housing information. In fact, another study by ESG shows that data efficiency is the number one priority of storage professionals in IT today. As a direct result of this increasing customer demand, storage manufacturers and online service providers are now offering data optimization capabilities. Of the available data optimization technologies, data deduplication is the one with the greatest potential to deliver substantial and recurring impact on the cost and manageability of data growth.

The challenge for many of these manufacturers and providers is to incorporate deduplication technology that can be leveraged across storage platforms while meeting market timing demands and not derailing other high priority R&D projects. Several have made investments in deduplication technology specifically to address problems associated with backup, only to find the resulting solutions were not suitable for primary storage workflows or were not extendable to new technologies.

Albireo Virtual Data Optimizer (Albireo VDO) provides the fastest route to market for deduplication on systems running the popular Linux operating system. Powered by Permabit’s Albireo Data Optimization Software, Albireo VDO is a complete, ready-to-run solution that offers block-level deduplication services for both Linux-based storage OEMs and online service providers.

[1] Extracting Value from Chaos, IDC, June 2011


Market Dynamics

Unbridled information growth is forcing all organizations to re-think their storage strategies in light of flat IT budgets. As a result, the $25+ billion storage market is facing a sea change that promises to reset the competitive landscape. Technologies that substantially reduce overall storage costs are “must have” requirements. OEMs that provide the highest storage efficiency and most substantial top-line and bottom-line impacts are poised for the greatest success.

Low-End NAS Market — Big Storage is Coming to Town

The market for Linux-based OEM storage has grown 300% over a 3 year period. “Big Storage” providers recognize this growth as an opportunity and are moving down-market using data optimization technologies to deliver highly efficient storage at extremely low realized costs per GB. Gartner defines this as the Low-end Enterprise NAS market segment, where Linux-based storage dominates. Deduplication technology is allowing “Big Storage” to be increasingly competitive in this cost-sensitive market. The challenge facing today’s value-oriented, Linux-based storage OEM is: how can they continue to leverage open source to compete effectively with larger competitors who have data optimization capabilities?

Enterprise Flash — New Solutions, New Challenges

At the same time, on the high-end of the storage market, Enterprise Flash-based appliances have rapidly evolved to become the performance leader in IT. An array of Enterprise Flash typically accelerates overall application performance due to its I/O capabilities when compared to spinning disk. Many of these appliances also provide caching capabilities that automatically move inactive data to traditional hard drives. Mission-critical applications in the enterprise today such as database indexing, online transaction processing, desktop and server virtualization, front-end Web serving, and key infrastructure offerings (such as email and messaging) have been the targets of Enterprise Flash deployments.

Over the past few years, the cost of high-performance storage has dropped significantly. The main driver for this cost reduction has been the emergence of flash-based solid-state devices (SSDs) in enterprise storage appliance configurations. When deduplication technologies are applied to Enterprise Flash environments the effective costs become even more aligned with spinning disk storage. In addition, deduplication optimizes flash write operations because less data is being written relative to the amount of data stored, providing incremental life cycles to flash and improving data safety.

Flash vendors are beginning to offer deduplication in these environments today, and deduplication is the enabler that closes the cost and data safety gaps that have previously inhibited more widespread adoption. However, deduplication technology designed for spinning disk-based primary or backup storage does not readily apply to devices based on flash technology. Permabit Albireo for Enterprise Flash has been designed specifically to meet the demands of these I/O intensive devices using patented technology and algorithms that optimize resource efficiency in solid state environments.

Cloud Computing — A Growth Opportunity

As mentioned above, the amount of electronic data stored worldwide is on track to exceed 35 zettabytes by 2020, and as the data footprint expands, the process of storing and managing information becomes more complex. By 2015, nearly 20% of this information will be “touched” by (and as much as 10% maintained in) the cloud [2]. Cloud-based server and desktop Virtual Machine (VM) providers, in particular, stand to benefit from these growth trends. Gartner estimates that 5% of all VMs will be hosted by cloud providers by 2014 [3]. Infrastructure providers have emerged with solutions to help manage and protect this massive pool of desktops and servers.

[2] Extracting Value from Chaos, IDC, June 2011

[3] Virtual Machines Will Slow in the Enterprise, Grow in the Cloud, Gartner, March 2011

“There are dozens of Linux based NAS vendors that are looking for ways to differentiate themselves from their competitors. Building the Albireo VDO technology into their devices is an excellent way to differentiate themselves and deliver a truly usable deduplication solution.”

— George Crump Founder

Storage Switzerland

“It turns out that, like chocolate and peanut butter, data deduplication and SSDs combine to create a whole greater than the sum of its parts.”

— Howard Marks

“Data Deduplication And SSDs: Two Great Tastes That Taste Great Together”

Network Computing


Ready-to-Run Deduplication Software

Market dynamics justify storage manufacturer and service provider efforts to develop or integrate comprehensive, sub-file-level deduplication capabilities into their existing single-tier storage solutions, while providing a viable roadmap to tomorrow’s universal storage solutions.

The overarching requirement is for data optimization to increase storage efficiency without incurring a performance penalty. All differentiating features of the storage platform must remain intact, with no compromises in functionality, data ingestion, or data access performance.

Key requirements involve the following areas:

• Performance — Data optimization must be extremely efficient and maintain a level of performance that does not impede overall storage performance on read and write operations. Storage vendors have made billion dollar R&D investments to optimize their storage performance as a means of differentiating their offerings.

• Feature Set Compatibility — Data optimization software must operate in conjunction with existing storage software and not interfere with or impede existing features. Storage vendors have invested millions into storage features that are vital to the operation and market value of their respective storage solutions.

• Resource Efficiency — Cost is king, particularly in the Low-end NAS appliance space. Accordingly, data optimization software cannot increase resource requirements that then impact that cost.

Albireo VDO Data Optimization Software

Albireo VDO provides ready-to-run data deduplication capabilities for Linux-based storage, enabling OEMs to continue leveraging all of their storage solutions’ existing features, including existing Linux file systems, storage virtualization features, and data protection capabilities. Because Albireo VDO uses Permabit’s patented Albireo deduplication technology, it is able to avoid costs associated with today’s high-end enterprise deduplication solutions that typically require large amounts of system memory and proprietary PCI Express cards to achieve even a fraction of Albireo’s scalability and performance.

Albireo’s high performance data deduplication provides a truly competitive feature set for mixed applications and use cases. Albireo VDO’s straightforward block-level, content-agnostic approach to data optimization provides an effortless solution that is both transparent and non-disruptive to end-user customers. With Albireo’s record-breaking performance, Linux-based storage OEMs can extend their deduplication capabilities and out-compete even the high-end proprietary storage players by providing data optimization capabilities for mission-critical application storage while effectively leveraging Linux open source to maximize value. Since Albireo VDO is implemented in terms of the Linux device mapper, it provides the perfect solution for Linux-based storage providers who wish to leverage their existing Linux integration investments, increase margins, and accelerate time-to-market with leading-edge data optimization.

Albireo VDO Architecture

The Albireo index provides the foundation for the Albireo VDO solution. The single greatest challenge when implementing a deduplication system is in rapidly identifying duplicate information across a storage pool that can contain hundreds of billions of items. To achieve acceptable levels of performance the system must, for each new piece of data, quickly determine if that piece is identical to any previously stored piece of data. If a match is found, the storage system can then internally reference the existing item to avoid storing the same information a second time. The Albireo Index Engine can identify duplicates across large storage pools in memory more than 99.95% of the time, eliminating the largest deduplication bottleneck, disk-based fetches. Index lookup averages just 5 microseconds on flash or 10 microseconds on traditional hard disk-based storage — orders of magnitude faster than other deduplication solutions. This enables Albireo VDO to support sustainable ingestion rates of over 1 GB/sec with a single 6-core processor.
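The lookup-then-reference step described above can be pictured with a minimal Python sketch. It is not Permabit's implementation: the Albireo index is proprietary, and the class name `DedupStore`, the 4 KB block size, and the use of SHA-256 fingerprints held in an in-memory dictionary are all assumptions chosen only for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class DedupStore:
    """Toy block store that keeps one physical copy per unique block."""

    def __init__(self):
        self.index = {}     # fingerprint -> physical block number
        self.physical = []  # physical blocks actually stored
        self.logical = {}   # logical block address -> physical block number

    def write(self, lba, data):
        """Write one block; return True if it matched a stored block."""
        fingerprint = hashlib.sha256(data).digest()
        pbn = self.index.get(fingerprint)
        duplicate = pbn is not None
        if not duplicate:
            # New content: store it and remember its fingerprint.
            pbn = len(self.physical)
            self.physical.append(data)
            self.index[fingerprint] = pbn
        # Duplicate or not, the logical address references one stored copy.
        self.logical[lba] = pbn
        return duplicate

    def read(self, lba):
        return self.physical[self.logical[lba]]


store = DedupStore()
block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
results = [store.write(0, block_a), store.write(1, block_b), store.write(2, block_a)]
print(results, len(store.physical))
```

Three logical blocks are written but only two physical blocks are kept, because the third write matches the fingerprint of the first. A production index has to answer the same question across billions of entries without a disk fetch, which is the problem the paragraph above attributes to the Albireo Index Engine.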


The Albireo VDO Linux kernel module is implemented in terms of the Linux device-mapper. In the Linux kernel, the device-mapper serves as a generic framework to map one block device onto another. It forms the foundation of LVM2 and EVMS, software RAID, dm-crypt disk encryption, and additional features such as file system snapshots. Device-mapper works by processing data passed in from a virtual block device, in this case Albireo VDO, and then passing the resultant data on to another block device.

Albireo VDO can be implemented asynchronously or synchronously:

• When running in asynchronous mode, Albireo deduplication technology works in-line to find duplicates and then writes only the unique blocks to underlying storage. As part of the asynchronous data flow, VDO supports block-layer flush commands by persisting metadata to ensure file system integrity. VDO also provides data integrity from unclean shutdowns by ensuring that no more than 5 seconds of data is lost as a result of an unexpected system crash.

• In synchronous mode new blocks are always written to the underlying storage device first, before Albireo checks for duplicates, to provide the highest level of data integrity. When VDO receives deduplication advice from the Albireo Index, it removes duplicate blocks from storage.

In all cases, data optimization through block-level deduplication increases the overall capacity of the underlying device.
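The difference between the two write paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the VDO kernel module: `ToyVolume`, its fields, and the SHA-256 dictionary index are hypothetical names, and flush handling, crash recovery, overwrites, and reference counting are all left out.

```python
import hashlib


class ToyVolume:
    """Toy model of the two write paths; not the VDO implementation."""

    def __init__(self, mode):
        if mode not in ("async", "sync"):
            raise ValueError("mode must be 'async' or 'sync'")
        self.mode = mode
        self.index = {}         # fingerprint -> physical slot
        self.blocks = {}        # physical slot -> stored data
        self.mapping = {}       # logical address -> physical slot
        self.next_slot = 0
        self.device_writes = 0  # blocks sent to the underlying device

    def _store(self, data):
        slot = self.next_slot
        self.next_slot += 1
        self.blocks[slot] = data
        self.device_writes += 1
        return slot

    def write(self, lba, data):
        fingerprint = hashlib.sha256(data).digest()
        known = self.index.get(fingerprint)
        if self.mode == "async":
            # Look up first; only unique blocks reach the device.
            slot = known if known is not None else self._store(data)
        else:
            # Store first, then act on the duplicate advice.
            slot = self._store(data)
            if known is not None:
                del self.blocks[slot]  # drop the redundant copy
                slot = known
        self.index.setdefault(fingerprint, slot)
        self.mapping[lba] = slot


def run(mode):
    volume = ToyVolume(mode)
    for lba, data in enumerate([b"x" * 4096, b"y" * 4096, b"x" * 4096]):
        volume.write(lba, data)
    return volume.device_writes, len(volume.blocks)


print(run("async"), run("sync"))
```

Both modes end with two stored blocks for three logical writes. The asynchronous path sends only two blocks to the device, while the synchronous path sends all three and reclaims the duplicate afterwards, which is the trade-off between write volume and write-first safety described above.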

In addition to deduplication, Albireo VDO provides thin provisioning services for Linux. Thin provisioning allocates physical volume or file system capacity as applications write data, rather than pre-allocating all physical capacity at the time of provisioning. This allows space savings to be realized from the deduplication process, effectively making more virtual space accessible than is physically available.
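A minimal sketch of allocate-on-write behavior, assuming a hypothetical `ThinVolume` class that tracks capacity in blocks. It illustrates the general thin provisioning idea only and says nothing about how Albireo VDO lays out its storage.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical blocks are assigned on first write."""

    def __init__(self, logical_blocks, physical_blocks):
        self.logical_blocks = logical_blocks    # size presented to applications
        self.physical_blocks = physical_blocks  # capacity actually available
        self.mapping = {}                       # logical address -> physical block

    def write(self, lba):
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("address outside the logical volume")
        if lba not in self.mapping:
            if len(self.mapping) >= self.physical_blocks:
                raise RuntimeError("physical capacity exhausted")
            # First write to this address: allocate the next physical block.
            self.mapping[lba] = len(self.mapping)
        return self.mapping[lba]

    def used(self):
        return len(self.mapping)


volume = ThinVolume(logical_blocks=1000, physical_blocks=100)
for address in (5, 900, 5):
    volume.write(address)
print(volume.used(), "of", volume.physical_blocks, "physical blocks in use")
```

The volume presents 1000 logical blocks while holding only 100 physical ones, and space is consumed only by addresses that have actually been written. Combined with deduplication, fewer physical blocks are consumed per logical write, which is how more virtual space can be offered than is physically present.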

Figure 1: Albireo VDO Architecture

Minimum System Requirements

• CPU Architecture: 64-bit x86

• RAM: 350 MB

• Disk Space: 55 GB

• Linux Distribution: Debian, SuSE, Red Hat, CentOS, Ubuntu


Albireo VDO Deduplication Savings

Deduplication savings are highly dependent on the way that data is used (workflow) as well as the type of data being processed. Albireo has been tested on a wide range of popular data types including common office productivity files, backups, and VMware system images (Table 1). Albireo achieved the best results with VMware images with a deduplication rate as high as 99%. Excellent results were also achieved with the Exchange data and office files. Albireo reduced Exchange data by 86% and office files by 33%. Across the board, Albireo deduplication delivers massive cost savings.

Sample Data                                        Dedupe Rate (4 KB Chunks)
User Directories, Fixed Chunk                      2.8 : 1
User Directories, Variable Chunk                   3.9 : 1
Tar Backups, Fixed Chunk with LZ77 compression     25.1 : 1
VMware Images, Fixed Chunk                         36.3 : 1
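A dedupe rate of N : 1 can be converted into a space-savings percentage as (1 - 1/N) x 100. The short Python sketch below applies that standard conversion to the rates in the table. The function name `savings_percent` is an illustrative assumption, and the percentages quoted in the text above may come from different test runs than the table rows.

```python
def savings_percent(ratio):
    """Convert an N : 1 dedupe rate into percent of space saved."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


# Rates copied from the table above.
rates = {
    "User Directories, Fixed Chunk": 2.8,
    "User Directories, Variable Chunk": 3.9,
    "Tar Backups, Fixed Chunk with LZ77 compression": 25.1,
    "VMware Images, Fixed Chunk": 36.3,
}

for name, ratio in rates.items():
    print(f"{name}: {savings_percent(ratio):.1f}% saved")
```

The conversion flattens quickly at high ratios: moving from 2.8 : 1 to 3.9 : 1 saves roughly ten more percentage points, while moving from 25.1 : 1 to 36.3 : 1 saves only about one more.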

Albireo VDO Resource Efficiency

Albireo VDO requires a single, dedicated Intel (or compatible) CPU core, 350 MB of memory, and 52 GB of disk space to address deduplication requirements for a 1 TB storage partition. Efficiency is improved for larger configurations. For example, 32 GB of memory can be used to support a 256 TB storage partition (0.13 GB of RAM/TB of disk). Albireo VDO also requires 42 GB of physical storage for indexing along with 1 GB of physical storage per TB of logical storage for handling metadata.

Physical Capacity       1 TB      16 TB     64 TB     256 TB
Logical Capacity        10 TB     160 TB    640 TB    2.5 PB
Memory Requirements     0.35 GB   2.1 GB    8.4 GB    32 GB
Disk Requirements*      55 GB     247 GB    859 GB    3.2 TB

*Assumes 10x logical storage
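The memory-efficiency claim can be checked with simple arithmetic. The Python sketch below uses only the memory row of the table and the two storage figures stated in the paragraph above (42 GB for indexing plus 1 GB per TB of logical storage). The helper name `index_and_metadata_gb` is an assumption for illustration, and the Disk Requirements row is taken as given rather than re-derived.

```python
INDEX_GB = 42            # physical storage for indexing, as stated in the text
METADATA_GB_PER_TB = 1   # metadata per TB of logical storage, as stated in the text


def index_and_metadata_gb(logical_tb):
    """Index plus metadata storage, in GB, for a given logical capacity in TB."""
    return INDEX_GB + METADATA_GB_PER_TB * logical_tb


# Physical capacities and memory requirements copied from the table.
physical_tb = [1, 16, 64, 256]
memory_gb = [0.35, 2.1, 8.4, 32]
ram_per_tb = [m / p for m, p in zip(memory_gb, physical_tb)]

for capacity, ratio in zip(physical_tb, ram_per_tb):
    print(f"{capacity} TB physical: {ratio:.3f} GB of RAM per TB")
print(index_and_metadata_gb(10), "GB of index and metadata for 10 TB of logical storage")
```

Memory per TB falls from 0.35 GB at 1 TB to 0.125 GB at 256 TB, which is the improvement in efficiency for larger configurations that the text describes.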

Why Permabit?

Innovation

Only Permabit Albireo VDO enables OEMs to rapidly deliver high performance deduplication for Linux-based storage solutions. Albireo VDO is a plug-and-play OEM solution that flexibly integrates within the constraints of existing storage architectures and leverages existing significant R&D investments.

Focus

Permabit is an expert in the development of highly scalable, next-generation storage solutions that deploy full inline data deduplication. By offering the industry’s first embedded OEM data optimization solution, Permabit is enabling Linux-based storage OEMs to compete effectively with breakthrough technology. Emerging storage vendors can capitalize on this major market shift by introducing new storage solutions that take market share away from incumbents. Leading storage vendors can leverage Albireo to further solidify their market position.

Expertise

The Permabit track record in storage expertise and innovation is without peer for a company of its age and size. Permabit has a total of 37 patents filed and 28 patents granted, all in the storage-related field. Its MIT-educated engineers have earned multiple awards for product innovation. Since 2000, Permabit has worked to develop the latest storage technology to address the challenges of highly scalable storage. With the release of Albireo, Permabit has made its core intellectual property for data optimization available for the first time as an OEM offering to other manufacturers and service providers. The Albireo architecture is a proven technology that has been implemented in production environments as a core technology in the Permabit Enterprise Archive and Cloud Storage.

Table 1: Albireo VDO Deduplication Savings

Table 2: Albireo VDO Resource Efficiency


© 2012 Permabit Technology Corporation. All Rights Reserved. Permabit is a registered trademark and the Permabit logo, Albireo logo, Permabit Enterprise Archive, and Scalable Data Reduction are trademarks of the Permabit Technology Corporation. All other products or services mentioned may be covered by registered trademarks, trademarks, service marks, or product names as designated by the companies who market those products.


Conclusion

Data centers are dealing with explosive data growth and flat budgets. As a result, IT organizations are making storage purchase decisions based on storage efficiency and total storage costs versus simply buying “cheap capacity.” Storage vendors and online service providers who will grow and flourish in today’s business environment must adapt their existing storage solutions and/or introduce new offerings that provide greater storage efficiency and reduced operating cost.

Albireo VDO delivers advanced data optimization technology that substantially reduces effective storage costs without sacrificing existing storage features. Once deployed, Albireo VDO provides unsurpassed performance and exceptional deduplication efficiency. In addition, the process of storage allocation is greatly simplified with the introduction of thin provisioning capabilities.

By delivering a virtual block device that “just works” out-of-the-box with existing file systems and data management features, Albireo VDO offers the fastest possible route to market for Linux-based storage OEMs, both manufacturers and online service providers. Whether the OEM is delivering NAS, SAN, or unified storage solutions based on traditional hard disks or flash-based storage, Albireo VDO provides the ideal ready-to-run data efficiency solution with leading capabilities for powerful competitive differentiation.

About Permabit

Permabit is a recognized leader in data efficiency technology. We enable OEMs to leverage their R&D investment, increase margin, accelerate time to market and achieve competitive advantage. Permabit Albireo software massively improves performance and efficiency of data creation, transmission and storage. Solutions built with Albireo are being delivered by leading hardware, software and service providers.

Find Out More

To learn more about the Permabit Albireo technology, or to license our products, visit our website at www.permabit.com or call us directly at 617.252.9600.

Albireo (al-BEER-ee-oh) appears to the naked eye to be a single star but can be resolved with a telescope into a double star, consisting of a brighter yellow star and a fainter blue star.
