Containing Chaos


Post on 09-May-2015






From Nemertes Research: Data-center architects need to consider designs that limit complexity and reduce the possibility of chaotic behavior. Learn more at


<ul><li><p>Nemertes Research 2011 888-241-2685 DN1420 </p><p>Containing Chaos: How Networks Must Reduce Complexity to Adapt to the Demands of Next-Generation Data Centers </p><p>By Johna Till Johnson, President &amp; Sr. Founding Partner, Nemertes Research </p><p>Executive Summary </p><p>Data-center network requirements are changing dramatically, driven by new applications, ongoing data-center consolidation, and the increasing workload dynamism and volatility introduced by virtualization. At the same time, traditional requirements for high speed, low latency, and high reliability continue to ratchet inexorably upwards. In response to these demands, data-center architects need to consider designs that limit complexity and reduce the possibility of chaotic behavior. </p><p>The Issue </p><p>Data-center consolidation, server virtualization, and an increase in real-time, high-bandwidth applications (such as video) and performance-sensitive applications (such as Voice over IP and desktop virtualization) are driving a paradigm shift in network architecture. A few years back, servers and users shared the same campus network, had similar requirements, and therefore relied on the same networking technologies. These days, servers are increasingly virtualized and consolidated in data centers. Users, in contrast, are distributed across branches and administrative offices. </p><p>In other words, yesterday's one-size-fits-all campus LAN has bifurcated into two LANs: an access network that primarily interconnects users, and a data-center network that primarily interconnects virtualized servers and connected storage. </p><p>Servers have very different network usage characteristics than users. Typically they require bandwidth that is orders of magnitude greater, coupled with very low latency. That means data-center networks are under intense pressure to scale up performance and reduce latency, and to scale out to handle increased interconnections and bandwidth. 
</p></li><li><p>And that's not all. Virtualization introduces two new challenges: dynamism and complexity. With virtualization, it's no longer possible to predict which workloads need to communicate with which users or, more importantly, with which other workloads, or where either end of the conversation will be located. It used to be possible to engineer a network based on the expected traffic flows, giving well-traveled paths (either between users and an application, or between applications) higher bandwidth and lower latency. Now the physical location of the virtualized workload is unknown, and in fact usually varies with time. IT staffs can launch applications anywhere in the data center, and those applications can even jump to other data centers. This any-to-any behavior doesn't work well across a traditional hierarchical data-center network. </p><p>The second challenge virtualization introduces is increased complexity, both operational and architectural. If you think of each virtualized workload as an end-node, the number of end-nodes connected through each network device goes up by at least an order of magnitude in a virtualized environment versus a physical one. And complexity grows faster than linearly with scale, which means a network designed to handle virtualized workloads isn't 10 times more complex than one handling physical ones, but closer to 50-100 times. An increase in complexity translates directly into an increase in management overhead (including costs) and a decrease in reliability. Complexity also limits agility, particularly in a virtualized environment in which the goal is to build dynamic pools of compute resources. A complex network is harder to modify rapidly, meaning that the network gets in the way of rapidly provisioning resources. </p><p>The challenge facing data-center architects, therefore, lies in designing a network that scales performance while simultaneously reducing complexity. 
</p><p>Key Technology Trends and Business Drivers </p><p>To better understand these challenging data-center requirements, it helps to take a closer look at some of the critical technology trends and business drivers that produced them. </p><p>Data-Center Consolidation </p><p>First and foremost is data-center consolidation: Over the past few years, IT organizations have increasingly consolidated from dozens of data centers down to a handful (typically three), with the goal of optimizing costs by reducing real-estate footprint. This means that the remaining data centers are housing an order of magnitude more computing and storage resources, and that networks need to scale accordingly. That's even before the added impact of server virtualization (see below). </p><p>Server Virtualization </p><p>As noted, server virtualization is also a critical trend. Nearly every organization (97%) has adopted some degree of server virtualization. (Please see Figure 1.) For organizations that have fully deployed virtualization, 78% of workloads are virtualized. But most companies are still in the process of virtualizing: Just 68% of workloads, on average, are fully virtualized, meaning this is a trend that's ongoing. And as noted, virtualization injects specific challenges into an architecture, increasing performance requirements and complexity by an order of magnitude or more. </p></li><li><p>Bandwidth Increases </p><p>As all this is going on, bandwidth requirements are increasing dramatically, driven by increases in application density and type. 10-Gbit Ethernet has become the de facto standard in data-center networks. (Please see Figure 2.) And major router manufacturers have announced 100-Gbit interfaces. The bottom line? Get ready for yet another step-function increase in data-center bandwidth. 
</p><p>Figure 1: Server Virtualization Adoption </p><p>Emerging Real-Time Applications </p><p>Along with the structural changes in the data center, IT organizations are coping with a dramatic influx of real-time applications. These include growing use of video, both conferencing and streaming. (Approximately 74% of companies are deploying, planning to deploy, or evaluating streaming video.) </p></li><li><p>Another major application is desktop virtualization, deployed by 51% of companies in 2010 and projected to rise to 74% of companies by 2012. </p><p>These applications drive the need for both high bandwidth and extremely low latency in the data-center core, since the servers for these applications are increasingly instantiated as virtual machines located in the data center. </p><p>Figure 2: Growth in 10-G Ethernet </p><p>The Impact of Technology Trends on Network Design </p><p>Overall, the impact of these technology trends is to shift the fundamental job of the data center. With virtualization, the major challenge of data-center networking is to provide an interconnection capability across which administrators can create virtual machines and manage them dynamically. </p></li><li><p>These virtual machines have two main problematic characteristics. First, they're dynamic: They appear and disappear unpredictably as servers and applications are provisioned (increasingly, by the users themselves), and they move. Second, they increasingly generate traffic flows that are device-to-device, rather than client-to-server. "We're seeing a 20% increase in any-to-any traffic in the data center," says the CIO of a midsize university, who notes that a driving factor is the increased use of video-streaming applications. Users (including but not limited to students) often store videos on one server, then move them server-to-server to process them. 
</p><p>Yet another trend is the emergence of SOA, Web 2.0, and collaboration applications, which also require real-time performance and also drive server-to-server traffic flows. </p><p>In other words, data-center traffic flows are changing from statically defined, top-down (client-to-server) flows towards dynamic server-to-server flows, and the data-center architecture must change along with them. </p><p>At the same time, performance and reliability requirements continue to scale up. As the sheer volume of data-center traffic increases, due to data-center consolidation and the emergence of high-bandwidth applications, capacity is also a design factor. And applications are increasingly intolerant of delay, making latency another design factor. Finally, since data centers increasingly are consolidated, failure is no longer an option, meaning that data-center networks need to get bigger, faster, and more able to handle unpredictable workloads while becoming even more reliable than previously. </p><p>The Complexity Challenge </p><p>The challenge in re-architecting the data-center core is fundamentally this: to support the design requirements of dynamic any-to-any traffic flows and high performance, while also reducing complexity. Why worry about complexity? In any large-scale system (such as a network), increasing complexity tends to do two things: increase the cost of managing the system and decrease reliability. The catch is that to reduce complexity, one first has to understand and define it. Although there's an entire science devoted to complexity theory, there's no fixed definition of complexity, or of complex systems. A good definition of a complex system is the following: "Complex systems are built out of a myriad of simple components which interact, and exhibit behavior that is not a simple consequence of pairwise interactions, but rather, emerges from the combination of interactions at some scale." 
</p><p>For networked systems, one can begin to think about complexity in terms of the number of devices or agents in the system and the potential interconnections between them. If there are N agents in a system, it takes N*(N-1)/2 interconnections to interlink these agents directly to each other, meaning that the number of interconnections grows quadratically with N: doubling the number of agents roughly quadruples the number of interconnections. </p><p>In a data-center network, agents are switching and routing elements, and connections are the logical paths between them. Controlling complexity therefore involves minimizing the number of interactions between agents, which, as we'll see, is easier said than done. </p></li><li><p>Complexity, Chaos, and Dynamism </p><p>One consequence of complexity is that it can generate chaotic behavior. Mathematically speaking, chaotic behavior is behavior that's neither predictable nor random: infinitesimally small changes in a starting state can produce arbitrarily large changes in a later state. Obviously this is undesirable in a system (or network) that's designed to consistently deliver a specific function predictably and manageably. For example, in a networked environment, a minor difference in configuration could trigger a downstream failure that's unpredictable and thus unpreventable. These types of problems arise in virtually any complex environment (including nuclear power plants and airplanes in flight). Interestingly, chaotic behavior often arises from very simple relationships. In other words, a complex system that is constructed of simple, deterministic building blocks can nonetheless display chaotic behavior. (Surprisingly, this mathematical conception of chaos was accurately captured back in 1945 by the poet Edna St. Vincent Millay, who described it as, "Something simple yet not understood.") </p><p>The challenge of reducing complexity therefore becomes, in essence, the challenge of containing chaos. 
Some approaches to doing so can be drawn from other fields in which chaotic behavior arises; others are specific to networks. </p><p>As noted, complexity (and the propensity for chaotic behavior) increases dramatically with a system's dynamism, that is, its need to change quickly from state to state. As Duncan Watts, applied mathematician and principal research scientist at Yahoo! Research, puts it more eloquently, dynamic behavior dramatically changes the game when it comes to chaos and complexity. </p><p>How does all this apply to data-center networks? In a nutshell, they need to be architected in a way that limits complexity and reduces, or eliminates, the possibility of chaotic behavior, particularly in light of the non-deterministic nature of virtualized servers and applications. </p><p>Architecting to Control Complexity </p><p>As noted, controlling complexity in a network boils down to minimizing N, the number of networking elements. With traditional architectures, that's not exactly easy. Traditional architectures are built around a core-distribution-edge design. (Please see Figure 3.) With such an architecture, connecting from a virtual workload (V) executing in server farm A to a virtual workload in server farm C requires traversal up and down the hierarchy (six hops). Similarly, a virtual workload in server farm B is three hops away from server farm C, and eight hops away from server farm A. </p><p>"Next to the mysteries of dynamics on a network, the problems of networks we have encountered up to now are just pebbles on the seashore." (Duncan Watts, applied mathematician and principal research scientist, Yahoo! Research) </p></li><li><p>This poses three challenges. First, to scale to support an increasing number of virtual workloads, this architecture must add network elements, which is exactly the opposite of the goal. 
Second, injecting multiple switching elements between virtual machines increases the path length, and therefore the latency, between virtual machines, which can adversely affect performance. Finally, a hierarchical network design is ill equipped to handle highly dynamic endpoints, such as virtual workloads. </p><p>Figure 3: Traditional Network Architecture </p><p>The solution is to collapse the core, or flatten the traditional hierarchical structure as much as possible, ideally to provide a single hop between any pair of sites. That reduces complexity and also reduces latency. This means, for example, that if a user is processing a video-streaming application, he or she can dynamically provision a video-processing server across the data-center network from the video server with the confidence that the data-center network will inject no more than a single hop's delay. </p><p>There's an additional step that can reduce complexity even more dramatically. In other complex systems (aerospace engineering, for example), the solution to untrammeled complexity is essentially to create "black boxes" that provide a simple and predictable set of inputs and outputs to the rest of the system, thereby bounding the complexity (and curtailing potential chaos). That is, the possible number of interactions between elements within the black box and the rest of the system is reduced, since the black box appears to the outside system as a single element. </p></li><li><p>In the case of network complexity, the corresponding approach is to reduce the entire network to a single, consistently managed switchi...</p></li></ul>
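<p>As a rough illustration of the N*(N-1)/2 relationship and the black-box idea discussed above, the following Python sketch counts potential pairwise interconnections. The function names and the grouping model (a full mesh inside each black box plus a full mesh among the boxes themselves) are assumptions made purely for illustration; they are not taken from the Nemertes report and do not describe any particular vendor's architecture. </p>

```python
from typing import Iterable


def full_mesh_links(n: int) -> int:
    """Potential direct interconnections among n agents: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("number of agents must be non-negative")
    return n * (n - 1) // 2


def black_boxed_links(group_sizes: Iterable[int]) -> int:
    """Potential interconnections when agents are grouped into black boxes.

    Assumed model (hypothetical): agents interact freely inside their own
    box, while each box appears to the rest of the system as one element.
    """
    sizes = list(group_sizes)
    inside = sum(full_mesh_links(size) for size in sizes)
    between = full_mesh_links(len(sizes))
    return inside + between


if __name__ == "__main__":
    # Ten times as many agents gives about 100 times as many potential links.
    print(full_mesh_links(10), full_mesh_links(100))
    # The same 100 agents, hidden behind ten black boxes of ten agents each.
    print(black_boxed_links([10] * 10))
```

<p>Under these assumptions, 100 elements in a flat full mesh have 4,950 potential interconnections, compared with 45 for 10 elements (a factor of 110), while grouping the same 100 elements into ten black boxes of ten leaves 495. </p>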