
Illuminata, Inc. • 4 Water Street • Nashua, NH 03060 • 603.598.0099 • 603.598.0198 • www.illuminata.com

Insight™

Gordon Haff
9 January 2009

Open Source in the Next Computing Wave

Open source software has been both benefactor and beneficiary of the “Internet wave” of computing, during which large-scale, network-connected computing architectures built from relatively standardized hardware and software components have come into their own. The Nineties may have been the years when small workgroup systems running operating systems such as NetWare and Windows NT arrived throughout the business world in a big way. But the Noughts have made distributed systems a core part of almost every datacenter.

Open source has fitted this evolution well. Linux itself came to be, in a sense, the unified Unix that had failed to be birthed through more conventional commercial partnerships.1 And Unix-style operating systems, historically closely tied to the rise of computer networks (including standards like TCP/IP that underpin the Internet), were a great technical match for an increasingly network-centric style of computing. At the same time, those computer networks provided the widespread connectivity that collaborative open source development needed to flourish. For users, these networks provided easy access to the open source software and a means to connect to, and engage with, a community of other users. Nor did it hurt that the large scale of many of these new computing infrastructures made cost a bigger issue than ever before—which helped to drive the proliferation of x86-based servers and open source software.

Today, a fresh set of trends and technologies is changing the way that we build computing systems and operate them. Two of the biggest are virtualization and cloud computing. Virtualization effectively decouples operating systems and their applications from server hardware, and thereby makes it easier to physically move them from one machine to another. Cloud computing is changing where applications run—from on-premise to out-in-the-network.

Business dynamics are also changing. Even if it’s often just a self-interested concern about their power bill, we are starting to see a greater awareness of environmental issues among those responsible for operating datacenters. The current economic climate is also forcing more systematic thinking about costs in general, including those associated with overall complexity and the security and resiliency of large distributed infrastructures.

These trends intersect in powerful ways; a new wave of computing is gathering momentum as a result. And open source is once again playing a major role.

1 We use “Unix” here in the sense of a family of modular operating systems that generally share programmatic interfaces and other conventions and approaches.

Copyright © 2009 Illuminata, Inc.

Licensed to Red Hat, Inc. for web posting. Do not reproduce. All opinions and conclusions herein represent the independent perspective of Illuminata and its analysts.



Cloud Computing’s Coming of Age

We consider cloud computing first. There’s certainly plenty of buzz about it. For our purposes here, we define cloud computing as accessing computing resources over a network—whether those resources take the form of a complete application (Software as a Service—SaaS); a developer platform such as Google Apps or Microsoft Azure; or something that’s more akin to a barebones operating system, storage, or a database (Amazon Web Services).2

As recounted by, among others, Nick Carr in his The Big Switch, cloud computing metaphorically mirrors the evolution of power generation and distribution. Industrial Revolution factories—such as those that once occupied many of the riverside brick buildings I overlook from my Nashua, New Hampshire office—built largely customized systems to run looms and other automated tools, powered by water and other sources. These power generation and distribution systems were a competitive differentiator; the more power you had, the more machines you could run, and the more you could produce for sale. Today, by contrast, power (in the form of electricity) is just a commodity for most companies—something that they pull off the grid and pay for based on how much they use.

The economic argument underpinning cloud computing has two basic parts. The first relates to how firms should generally focus their resources on those things that differentiate them and give them advantage over competitors. Computer systems—especially those devoted to mundane tasks such as email—aren’t one of those differentiators for many companies.3 The second part relates to the size and scope of computing facilities. Efficient IT operations involve a high degree of standardization, up-front design, and automated operation; applying this degree of “industrialization” to datacenters and their operation isn’t really viable at small scale.4

2 See our To Cloud or Not to Cloud for more discussion of the different forms that cloud computing takes.

3 One of the earliest examples of widespread outsourcing of a computing task was payroll. This function is certainly important, but having “better payroll” (whatever that would mean) isn’t something that advantages a company.

Open Source in the Cloud

Open source is very much part of cloud computing. The benefit of open source to the cloud providers is clear at several levels.

First, there’s the matter of cost. Open source isn’t necessarily “free as in beer” (to use the popular expression)—that is, zero cost; companies often want subscription offerings and support contracts even if the bits are nominally available for free. But it does tend to be less expensive than proprietary alternatives, even when some production-scale features are extra-cost options (as in the case of the monitoring tools in MySQL Enterprise). And this is no small consideration when you look at the size of providers like Amazon and Google, which often seem to add datacenters at the rate that many companies once added computers.

Open source software is also just a good match for this style of computing. For one thing, cloud providers—almost by definition—are technically savvy and sophisticated. Although they don’t want to reinvent every wheel, they’re generally ready, able, and willing to tweak software and even hardware in the interests of optimization. Open source software and, more broadly, open source communities with which they can engage, are therefore a good fit given that they can modify source code and otherwise participate in evolving software in a way that meets their requirements.

There are some areas of friction between open source and cloud computing. We see this in the ongoing social and community pressure on large cloud vendors such as Google to make their “fair share” of contributions to open source projects.5

4 There’s an ongoing debate over how big “big” needs to be. See our Bigness in the Cloud. But there’s general agreement that the entry point is somewhere around large datacenter scale.

5 Most “copyleft” open source licenses, such as the GPL, don’t require that code enhancements be contributed back to the community when the software is delivered only in the form of a service, as is typical with cloud computing.


Proprietary Web-based applications and services—such as those from Google, 37signals, Salesforce.com, and even some traditional software vendors—also tend to mirror certain open source strengths such as easy acquisition.

However, in the main, it’s a largely healthy and mutually beneficial relationship. Open source is widely embraced by all manner of technology companies because they’ve found that, for many purposes, open source is a great way to engage with developer and user communities—and even with competitors. In other words, they’ve found that it’s in their own interests to participate in the ongoing evolution of relevant projects rather than simply taking a version of a project private and then working on it in isolation.

Virtualization

Virtualization is the other big buzz topic in IT today. Truth be told, when it comes to enterprise computing, it’s actually of more immediate interest than cloud computing given that it’s a more developed set of technologies and its use cases are better understood.6

To better understand how server virtualization7 plays with both cloud computing and open source, it helps to think about what virtualization really is and how it is evolving. The core component of server virtualization is a hypervisor, a layer of software that sits between a server’s hardware and the operating system or systems that run on top in their isolated virtual machines (VMs). Essentially, the hypervisor presents an idealized abstraction of the server to the software above. It can also make it appear as if there are multiple such independent servers (all of which, in reality, cooperatively share the physical server’s hardware under the control of the hypervisor).

6 Although various antecedents to, and subsets of, cloud computing go back some time—think hosting providers or even timesharing.

7 At its most conceptual, virtualization is an approach to system design and management that happens in many places and at many layers in a system. For our purposes here, virtualization refers specifically to the particular approach to server virtualization described.
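As a concrete illustration of the hypervisor picture just described, the sketch below uses the libvirt management library’s Python bindings to list the guest virtual machines sharing a single physical host, along with the resources each has been allocated. It is a minimal sketch under stated assumptions: a host running a libvirt-managed hypervisor such as Xen or KVM, a “qemu:///system” connection URI, and at least one running guest; none of these details come from the report itself.

    # Sketch: enumerate the guest VMs sharing one physical server via libvirt.
    # Assumes the libvirt Python bindings and a local KVM or Xen host; adjust
    # the connection URI for the hypervisor actually in use.
    import libvirt

    conn = libvirt.open("qemu:///system")   # assumed local hypervisor URI

    for dom_id in conn.listDomainsID():     # IDs of currently running guests
        dom = conn.lookupByID(dom_id)
        state, max_mem_kb, mem_kb, vcpus, cpu_time_ns = dom.info()
        print("%-20s vCPUs=%d memory=%d MB" % (dom.name(), vcpus, mem_kb // 1024))

    conn.close()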

This ability to share a single (often underutilized) physical server is certainly a salient trait of virtualization. In fact, it’s the main reason that most companies first adopt virtualization—to reduce the number of physical servers they have to purchase to run a given number of workloads. However, looking forward, the abstraction layer that virtualization inserts between hardware and application software is at least as important whether it’s used to run multiple operating system images on a single server or not.

Historically, once an application was installed on a system, it was pretty much stuck there for life. That’s because the act of installing the application—together with its associated operating system and other components—effectively bound it to the specifics of the physical hardware. Moving the application meant dealing with all manner of dependencies and, in short, breaking “stuff.” Placing a hypervisor in the middle means that the software is now dealing with a relatively standardized abstraction of the hardware rather than actual hardware. The result is greatly increased portability.

Portability, in turn, enables lots of interesting practical uses. For example, administrators can take a snapshot of an entire running system for archive purposes or to roll back to if there’s a problem with a system upgrade. VMs can be transferred from one system to another, without interrupting users, to balance loads or to perform scheduled maintenance on a server. Ultimately, virtualization enables what is often called a virtual infrastructure or a dynamic infrastructure—by whatever name, an infrastructure in which workloads move to wherever they are most appropriately run, rather than where they happened to be installed once upon a time.
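To make those practical uses a little more tangible, here is a minimal, hedged sketch of both operations using the libvirt Python bindings. The guest name, host URIs, and snapshot label are illustrative placeholders rather than anything taken from the report, and production environments would normally drive these operations through management tooling rather than ad hoc scripts.

    # Sketch: snapshot a running guest, then live-migrate it to another host.
    # Assumes libvirt-managed hosts with shared storage for the guest's disk;
    # "webserver01" and both host URIs are hypothetical placeholders.
    import libvirt

    src = libvirt.open("qemu:///system")            # source (local) hypervisor
    dom = src.lookupByName("webserver01")           # hypothetical guest name

    # Point-in-time snapshot to roll back to if a system upgrade goes badly.
    dom.snapshotCreateXML(
        "<domainsnapshot><name>pre-upgrade</name></domainsnapshot>", 0)

    # Move the running VM to another physical server without interrupting users.
    dst = libvirt.open("qemu+ssh://host2.example.com/system")
    dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    src.close()
    dst.close()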

Open Source in Virtualization

VMware both brought proprietary server virtualization to the mainstream and has been the vendor who has most benefited from it to date. However, a variety of virtualization options are now available, enabled in part by enhancements to processor hardware from AMD and Intel that simplify some of the more difficult aspects of virtualizing x86 hardware.

Among these options are Xen and KVM. Both open source projects are part of standard Linux distributions.8 They are also available in the form of a “standalone hypervisor”—essentially a small piece of code (often embedded in flash memory) that lets a server directly boot up into a virtualized state without first installing an operating system. “Guest” operating systems can then be installed on top in the usual manner. Xen is the more widely used and mature of the two today. But Red Hat bought Qumranet—the startup behind KVM—in early September 2008 and is focusing on KVM as its strategic virtualization technology going forward; KVM has also been incorporated into the mainline Linux kernel since version 2.6.20.9
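As a small illustration of how KVM surfaces through a standard Linux system, the sketch below checks whether a host exposes the hardware virtualization extensions from Intel (VT, the “vmx” CPU flag) or AMD (AMD-V, the “svm” flag) and whether the kernel’s KVM module has created the /dev/kvm device. It is a minimal sketch for a Linux host; nothing about it is specific to any particular distribution.

    # Sketch: check whether a Linux host is ready to run KVM guests.
    # Looks for the Intel VT ("vmx") or AMD-V ("svm") CPU flags and for the
    # /dev/kvm device node that the in-kernel KVM module creates when loaded.
    import os

    def has_hw_virt_extensions():
        with open("/proc/cpuinfo") as cpuinfo:
            flags = cpuinfo.read()
        return ("vmx" in flags) or ("svm" in flags)

    def kvm_module_loaded():
        return os.path.exists("/dev/kvm")

    if __name__ == "__main__":
        print("Hardware virtualization extensions: %s" % has_hw_virt_extensions())
        print("KVM module loaded (/dev/kvm present): %s" % kvm_module_loaded())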

Virtualization has a close relationship to cloud computing, especially cloud computing implementations that provide users with an execution environment in the form of a virtual machine.10 Virtualization brings a lot of the properties you’d want a cloud computing environment to have. You want to be able to store snapshots of your environment to use in the future. Check. You want to be able to spin up applications dynamically and shut them down when they’re no longer needed. Check. You want to insulate users from details of the physical infrastructure so that you can make transparent upgrades and other changes. Check. Virtualization isn’t a universal requirement for all types of cloud computing. High performance computing in particular often uses an alternative form of virtualization that’s more about distributing a single large job to a large number of servers using certain standard protocols—sometimes called a grid. However, server virtualization is certainly an ideal complement to many cloud computing implementations.

8 Xen is also the basis for the server virtualization in Sun’s OpenSolaris and xVM. See our Virtualization Strategies: Sun Microsystems.

9 Red Hat is doing this for both business and technical reasons. See our Red Hat Makes Buy for KVM—But VDI Too.

10 Providers of other types of cloud computing, such as SaaS, may also use virtualization—but such details of their technology infrastructure are hidden from users of the service. In fact, that’s sort of the point.

Cloud computing providers have adopted open source virtualization approaches (especially Xen) for many of the same reasons that they’ve widely adopted open source in general. Amazon Elastic Compute Cloud (EC2) is an illustrative and well-known example of virtualization, paired with Linux, in the cloud.11 With EC2, you rent VMs by the hour. Each VM comes with a specific quantity of CPU, memory, and storage; currently there are five different size combinations available. Users can build their own complete VM from scratch. More commonly, they’ll start from a standard Amazon Machine Image (AMI)—an archived VM pre-loaded with an operating system, middleware, and other software.
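For a sense of what renting VMs by the hour looks like in practice, the sketch below uses boto, an open source Python interface to Amazon Web Services, to launch a single instance from an AMI. The AMI ID, key pair name, and instance size are hypothetical placeholders, and credentials are assumed to be supplied via the standard AWS environment variables.

    # Sketch: launch an EC2 virtual machine from an Amazon Machine Image (AMI)
    # using boto. The AMI ID and key pair below are placeholders; credentials
    # are read from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
    import boto

    conn = boto.connect_ec2()

    # Request one small instance built from a (hypothetical) Linux AMI.
    reservation = conn.run_instances(
        "ami-12345678",            # placeholder AMI ID
        instance_type="m1.small",
        key_name="my-keypair",     # placeholder SSH key pair
    )

    instance = reservation.instances[0]
    print("Launched instance %s (state: %s)" % (instance.id, instance.state))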

Initially, these AMIs consisted almost entirely of community-supported Linux distributions. However, one of the things that we now see happening as cloud computing evolves from a developer-centric, kick-the-tires stage to something that supports production applications and even entire businesses,12 is that some of the same concerns that are relevant to software running in an enterprise datacenter are finding their way into software running at cloud providers.

An example of this trend is AMIs with Red Hat Enterprise Linux (RHEL) and the JBoss Enterprise Application Platform (currently in a supported public beta phase). This allows enterprises running RHEL inside their firewall to run the same operating system on Amazon Web Services (AWS). They might do this as part of migrating applications to the cloud, running just new applications there, or using the cloud to handle temporary workload spikes.

11 At the end of 2008, Amazon also added support for Microsoft Windows and SQL Server to EC2.

12 See our SmugMug and Amazon S3.

Precise support policies can vary by software vendor, but—in general—running the exact same operating system and middleware stack in a remote environment as locally should allow applications that are certified and supported in one place to also be certified and supported in the other.

The Green in Green

It’s popular to talk about “green,” environmentally-sensitive computing these days. Some green computing is, indeed, green for green’s sake. It may be driven by government regulation13 or it may be part of a high-level corporate initiative undertaken for brand image or other reasons. However, much of the time, companies undertake energy efficiency and conservation projects for the most pragmatic of reasons: they can help the bottom line. Especially when bringing new IT capacity on-line, it’s profitable to factor power and cooling costs into any financial analysis.

Optimizing power use dovetails with the other two trends that we’ve discussed—virtualization and cloud computing—in important ways.

Perhaps the most obvious intersection is with virtualization in its basic guise as a way to consolidate applications onto fewer physical servers. Reducing the number of servers cuts acquisition costs, certainly. However, it’s also the case that the server with the lowest environmental impact—in all dimensions, not just power draw—is one that doesn’t exist.

More broadly, virtualization brings dynamism to an IT environment through features such as the live migration of virtual machines from one server to another and the dynamic allocation of hardware resources in response to workload changes. These in turn enable what’s often called “automation,” essentially distributed workload management in response to user-specified policies. In the open source space, Red Hat Enterprise MRG (messaging, realtime, and grid) is an example of technologies organized around an automation theme.

13 Such as the RoHS (Restriction of Hazardous Substances) directive in the European Union.

Among the benefits of automation is that, as the demands on a datacenter’s infrastructure change over the course of a day, the course of a quarter, or in response to traffic spikes, physical resources that aren’t immediately required can be turned off until they are needed. If fewer Web servers or application servers are needed at night, they needn’t be powered up all the time. There are also analogous examples in the storage arena where data that doesn’t have to be instantly available can be moved to lower-power near-line storage such as tape or MAID (massive array of idle disks).
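The policy logic involved can be quite simple in principle. The sketch below is a purely illustrative, hypothetical example of the kind of rule an automation tool might enforce (keep only as many web servers powered on as the current request rate requires, subject to a minimum), not the actual interface of Red Hat Enterprise MRG or any other product.

    # Illustrative sketch only: a toy capacity policy of the sort an automation
    # tool might enforce. The numbers are invented, and real tools would drive
    # vendor-specific power-management interfaces to act on the result.
    import math

    REQUESTS_PER_SERVER = 500   # assumed capacity of one web server (req/sec)
    MINIMUM_SERVERS = 2         # always keep some headroom online

    def servers_needed(current_request_rate):
        """How many servers the policy says should be powered on right now."""
        needed = int(math.ceil(current_request_rate / float(REQUESTS_PER_SERVER)))
        return max(needed, MINIMUM_SERVERS)

    # Example: overnight traffic of 600 req/sec keeps 2 servers on, versus the
    # 12 that a daytime peak of 5,800 req/sec would require.
    for rate in (600, 5800):
        print("%5d req/sec -> %2d servers powered on" % (rate, servers_needed(rate)))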

Automating and Integrating

In practice, most datacenters are still in quite early days when it comes to broadly applying automation. There are a variety of reasons for this. One is simply that it’s a largely new way of thinking about operating servers. After all, it took years before (just about) everyone got comfortable with letting an operating system schedule jobs on multiple processors within a single server. The adoption pattern of automation in a distributed environment will be no different. What’s more, virtualization—to say nothing of the tools that automate its use—is still quite new on the time scale of IT evolution. Short version: These things take time.

There’s also a prosaic but very real issue associated with changing IT operations for the benefit of the power bill; the budget for IT gear, software, and the people needed to run it efficiently is often (indeed, usually) separate from the budget that pays for utilities. And, however much many managers want to—in principle—make the right decisions for their company taken as a whole, in practice actions tend to be driven by individual and departmental budgets and other incentives. Which brings us back to cloud computing.

Many of the things we think of as Green IT fundamentally relate to efficiency. So, fundamentally, does cloud computing. It’s based on the premise that specialized service providers can deliver computing less expensively and with a better level of service than small operations that don’t have IT as a core competency.

Efficiency partly relates to the idea that larger scale helps to smooth out demand. If different customers consume computing at different times and in different patterns, the aggregate demand is smoothed out—and requires less infrastructure (at consequently reduced cost and power consumption) than would be needed if individual customers were to provision themselves for their own peak loads.
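A toy calculation makes the point. The hourly demand figures below are invented purely for illustration: three customers whose peaks fall at different times of day collectively need noticeably less capacity than the sum of their individual peaks.

    # Illustrative sketch: why pooled capacity beats per-customer peak provisioning.
    # Hourly demand (in server-equivalents) for three hypothetical customers whose
    # busy periods fall at different times of day.
    customer_demand = {
        "retailer":   [2, 2, 3, 8, 9, 4, 2, 2],   # daytime shopping peak
        "batch_jobs": [9, 8, 7, 2, 1, 1, 2, 6],   # overnight processing peak
        "media_site": [3, 3, 4, 4, 5, 9, 8, 3],   # evening traffic peak
    }

    # Provisioning each customer separately for its own peak:
    separate_capacity = sum(max(hours) for hours in customer_demand.values())

    # Provisioning a shared pool sized for the peak of the aggregate demand:
    aggregate = [sum(hours) for hours in zip(*customer_demand.values())]
    pooled_capacity = max(aggregate)

    print("Individually provisioned peaks: %d server-equivalents" % separate_capacity)
    print("Shared pool for aggregate peak: %d server-equivalents" % pooled_capacity)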

It also suggests an “industrialization” of the datacenter, to use a term applied by Irving Wladawsky-Berger of IBM. By this he means professional, disciplined, and efficient IT management. This suggests things like reducing complexity (in terms of the number of applications and platform types) and the use of the sort of automation tools we discussed earlier. Ultimately, one of the big things that cloud providers sell is the quality of their IT, whether manifested as high service levels, low cost, ability to scale, or some other attribute. And, even for those applications that enterprises decide to continue running internally, cloud providers provide a benchmark of what is possible.

If we talk specifically about software services delivered by an external provider, however, one final aspect of cloud computing is especially interesting in a Green IT context. When we buy software in the form of a service rather than installing it locally—say SugarCRM On-Demand or Zoho Business—we no longer need to buy, operate, or power a local server. The software is still running someplace, of course: on a service provider’s hardware. The difference is that we are now implicitly paying for all those costs as part of our software subscription. They’re no longer hidden in someone else’s budget.

Thus, cloud computing—as it develops—should help lead to more efficient IT operations and, as a sort of side effect, will make the full costs of running an application more visible.

Conclusion

A new wave of IT is now forming at the intersection of cloud computing, virtualization, and a generalized drive towards simplicity and efficiency. Cloud computing’s aim is to provide software, platforms, and infrastructure in the form of a service from providers optimized to do so at scale. The collective goal of technologies and concepts such as automation, orchestration, and virtualization is to break tight bonds between applications and the underlying physical infrastructure in order to use that infrastructure more efficiently and to change the resources available to those applications on-the-fly.

Each of these individual trends is important. But some of the biggest wins for IT departments will come when they consider the ways that these trends play off and cross-support each other. And their technology choices should reflect this—including those that involve open source software.

Open source is clearly a significant part of this new wave. Part of its role is essentially more of the same as open source projects proliferate and mature throughout software ecosystems. Open source benefits such as easy acquisition and trial, ability to modify, and general adherence to standards that have been important to end-users are also equally (or even more) important to the cloud providers delivering a new generation of software services.

However, “free and open source software” (FOSS) has never been just about viewing and modifying bits. It’s introduced new ways of thinking about how we collectively develop, consume, and pay for software. And continuing that thinking about what protections and rights the users of software have and should have may be as important a continuing role for FOSS as the actual operating systems, middleware, and applications built on the open source model.

