
Large Scale Distributed Computing - Summary

Author: @triggetry

December 10, 2013

Contents

1 Preamble
2 CHAPTER 1
   2.1 Most common forms of Large Scale Distributed Computing
   2.2 High Performance Computing
      2.2.1 Parallel Programming
   2.3 Grid Computing
      2.3.1 Definitions
      2.3.2 Virtual Organizations
   2.4 Cloud Computing
      2.4.1 Definitions
      2.4.2 5 Cloud Characteristics
      2.4.3 Delivery Models
      2.4.4 Cloud Deployment Types
      2.4.5 Cloud Technologies
         2.4.5.1 Virtualization
         2.4.5.2 The history of Virtualization
   2.5 Web Application Frameworks
   2.6 Web Services
   2.7 Multi-tenancy
   2.8 BIG DATA
3 CHAPTER 2
   3.1 OS / Virtualization
      3.1.1 Batchsystems
         3.1.1.1 Common Batch processing usage
         3.1.1.2 Portable Batch System (PBS)
      3.1.2 VGE - Vienna Grid Environment
   3.2 VMs, VMMs
      3.2.1 Why Virtualization?
      3.2.2 Types of virtualization
      3.2.3 Hypervisor vs. hosted Virtualization
         3.2.3.1 Type 1 and Type 2 Virtualization
      3.2.4 Basic Virtualization Techniques
   3.3 Xen
      3.3.1 Architecture
      3.3.2 Dynamic Memory Control (DMC)
      3.3.3 Balloon Drivers
      3.3.4 Paravirtualization
      3.3.5 Domains in Xen
      3.3.6 Hypercalls in Xen
   3.4 VMWare
      3.4.1 Hosted vs. Hypervisor Architecture
   3.5 Cloud Management
   3.6 OpenNebula
   3.7 Eucalyptus
      3.7.1 Components
   3.8 Virtualization - Glossary


4 CHAPTER 3
   4.1 Self adaptable Clouds: Cloud Monitoring and Knowledge Management
      4.1.1 Traditional MAPE Loop
      4.1.2 SLA
      4.1.3 LoM2His Framework
      4.1.4 Cloud Characteristics
      4.1.5 How to make Clouds energy-efficient
      4.1.6 How to avoid SLA violations
      4.1.7 How to structure actions
   4.2 Policy Modes
   4.3 Cloud Market
   4.4 Cloud Characteristics
   4.5 Cloud Enabling Technologies
   4.6 Problems when providing virtual goods
   4.7 Resource markets in Research
   4.8 Commercial Resource Providers
   4.9 Liquidity Problems in Markets
   4.10 The importance of SLAs in markets
   4.11 Managing SLAs
   4.12 The SLA Template Lifecycle
   4.13 SLA Mapping in Double Auctions
   4.14 Consequences of few resource types
   4.15 Mapping the SLA Landscape for High Performance Clouds
   4.16 SLA Mapping Approach
5 CHAPTER 4
   5.1 Map-Reduce Overview
   5.2 Map Reduce Sequence of Actions
   5.3 Master Data Structures
   5.4 Fault Tolerance
      5.4.1 Worker Failure
      5.4.2 Master Failure
   5.5 Data Flow
   5.6 Partitioning function
   5.7 Combiner function
   5.8 Input and Output Types
   5.9 Hadoop
   5.10 Hive
      5.10.1 HiveQL
   5.11 HadoopDB
6 Some Questions


1 Preamble

This document is a collection of articles and information, mainly from Wikipedia and other online resources as well as the official slides, covering the topics of the lecture Large Scale Distributed Computing. Note that this summary is not complete and is by no means a guarantee for passing the exam. If you need the sources of the document, feel free to contact me via Twitter (see title page).

2 CHAPTER 1

2.1 Most common forms of Large Scale Distributed Computing

• High Performance Computing (HPC)

– tightly coupled

– homogeneous

– single system image

• Grid Computing

– large scale

– cross-organizational

– geographical distribution

– distributed management

• Voluntary Computing (e.g. seti@home)

• Global Distributed Computing (google, seti@home)

– loosely coupled

– heterogeneous

– single administration

• Cloud Computing

– provisioned on demand

– service guarantee

– vms and web 2.0-based

• Big Data

– NoSql DBs

– real time processing

– distributed queries

2.2 High Performance Computing

Applications such as preoperative surgery planning, modeling and simulation, etc. would need months or years to execute on a single-processor PC. This is where High Performance Computing and Parallel Processing come in: things can be done more efficiently if they are done in parallel. But there is one problem: if one worker needs 1000 days and 2 workers need 500 days to finish a job, does that mean 1000 workers need only 1 day or less? Communication between the workers is the key problem!

We distinguish between different HPC-Infrastructures:

• Supercomputers (custom processor, tightly coupled)

– vector processors (single operation on multiple data)

– scalar processors (single operation on single data)

– SISD (Single instruction, single data), e.g. a sequential computer. SISD refers to a computer architecture in which a single processor (a uniprocessor) executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

3

Page 4: Large Scale Distributed Computing - SummaryTU_Wien-Large...Large Scale Distributed Computing - Summary Author: @triggetry December 10, 2013 Contents 1 Preamble 3 2 CHAPTER 1 3 2.1

– SIMD (Single instruction, multiple data) describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously; such machines exploit data-level parallelism. SIMD is particularly applicable to common tasks like adjusting the contrast of a digital image or the volume of digital audio. Most modern CPU designs include SIMD instructions to improve the performance of multimedia use.

– MISD (Multiple instructions, single data), e.g. vector computing: a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data-parallel techniques; specifically, they allow better scaling and use of computational resources than MISD does. However, one prominent example of MISD in computing is the Space Shuttle flight control computers.

– MIMD (Multiple instructions, multiple data), e.g. multicores. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and communication switches. MIMD machines belong to either the shared-memory or the distributed-memory category; these classifications are based on how MIMD processors access memory. Shared-memory machines may be of the bus-based, extended, or hierarchical type. Distributed-memory machines may have hypercube or mesh interconnection schemes.

• Cluster (COTS-Components, loosely coupled, Beowulf Clusters)

• Grids (interconnection of computational resources across different administrative domains, virtual organizations)

Why do it in parallel?

We do parallel processing for the simple reason of speeding up the execution of programs and thus saving time.

Speedup = sequential runtime / parallel runtime
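A quick worked example in plain Python, using the worker numbers from above; the 5-day figure for 1000 workers is purely an assumed illustration of how communication overhead keeps the speedup sub-linear:

def speedup(sequential_runtime, parallel_runtime):
    return sequential_runtime / parallel_runtime

def efficiency(sequential_runtime, parallel_runtime, workers):
    return speedup(sequential_runtime, parallel_runtime) / workers

# 1 worker: 1000 days, 2 workers: 500 days -> speedup 2.0, efficiency 1.0 (ideal)
print(speedup(1000, 500), efficiency(1000, 500, 2))
# 1000 workers that still need 5 days (assumed) -> speedup 200, efficiency 0.2
print(speedup(1000, 5), efficiency(1000, 5, 1000))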

Sequential Processing

1. store program and data in memory

2. CPU gets instructions and data from memory

3. Decode instructions

4. Execute it sequentially

Symmetric Multi Processing

SMP (symmetric multiprocessing) is the processing of programs by multiple processors that share a common operating system and memory. In symmetric (or ”tightly coupled”) multiprocessing, the processors share memory and the I/O bus or data path. A single copy of the operating system is in charge of all the processors. SMP, also known as a ”shared everything” system, does not usually exceed 16 processors.

SMP systems are considered better than MPP systems for online transaction processing (OLTP), in which many users access the same database in a relatively simple set of transactions. An advantage of SMP for this purpose is the ability to dynamically balance the workload among processors (and as a result serve more users faster).

Memory Architectures

Shared Memory: Multiple CPUs can operate independently; changes in memory are visible to all CPUs (e.g. SMP).
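A minimal sketch of the shared-memory model in standard-library Python: several threads update one shared counter, and a lock provides the kind of protection mechanism discussed in Section 2.2.1 below.

import threading

counter = 0                 # shared state, visible to all threads
lock = threading.Lock()     # protects the concurrent read-modify-write

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:          # without the lock, updates could be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 - deterministic only because of the lock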

Distributed Memory

• communication network required

• processors have their own memory


• no global address space

In a distributed memory system there is typically a processor, a memory, and some form of interconnection that allows programs on each processor to interact with each other. The interconnect can be organised with point-to-point links, or separate hardware can provide a switching network. The network topology is a key factor in determining how the multi-processor machine scales. The links between nodes can be implemented using some standard network protocol (for example Ethernet), using bespoke network links (used in, for example, the Transputer), or using dual-ported memories.

Hybrid Distributed-Shared

• used in most parallel computers today

• Cache-coherent SMP nodes

• Distributed memory → multiple SMP nodes

2.2.1 Parallel Programming

• Shared Memory: In a shared memory model, parallel tasks share a global address space which they read and write to asynchronously. This requires protection mechanisms such as locks, semaphores and monitors to control concurrent access. Shared memory can be emulated on distributed-memory systems, but non-uniform memory access (NUMA) times can come into play.

– e.g. OpenMP

• Threads

• Message Passing: In a message-passing model, parallel tasks exchange data by passing messages to one another. These communications can be asynchronous or synchronous. The Communicating Sequential Processes (CSP) formalisation of message passing employed communication channels to ’connect’ processes, and led to a number of important languages such as Joyce, occam and Erlang. (A minimal MPI sketch follows this list.)

– (e.g. Message Passing Interface MPI)

• Hybrid approaches, e.g. OpenMP and MPI, or MPI and POSIX Threads
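A minimal message-passing sketch using the mpi4py bindings (an assumption; an MPI runtime and launcher are required, and the file name in the launch command is only illustrative). Run with e.g. mpiexec -n 4 python ping.py:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # id of this process
size = comm.Get_size()      # total number of processes

if rank == 0:
    # Rank 0 sends a piece of work to every other rank and collects replies.
    for dest in range(1, size):
        comm.send({"task": dest * 10}, dest=dest, tag=1)
    results = [comm.recv(source=src, tag=2) for src in range(1, size)]
    print("gathered:", results)
else:
    task = comm.recv(source=0, tag=1)            # blocking receive
    comm.send(task["task"] ** 2, dest=0, tag=2)  # send result back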

2.3 Grid Computing

2.3.1 Definitions

”..computational Grid is hardware and software infrastructure that provides dependable, consistent, and pervasive access to high-end computational capabilities” 1

”grid computing is concerned with coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations” 2

A grid 3

• uses standard, open, general purpose protocols and interfaces

• coordinates resources that are NOT subject to centralized control

• delivers non-trivial qualities of service

Overview

A grid computer consists of multiple computers of the same class clustered together. The machines are connected through a fast network and share devices like disk drives, mass storage, printers and RAM. Grid computing is a cost-efficient solution compared to supercomputing, and the operating system supports parallelism. Grid computing combines computers from multiple administrative domains to reach a common goal, to solve a single task, and may then disappear just as quickly. One of the main strategies of grid computing is to use middleware to divide and apportion pieces of a program among several computers, sometimes up to many thousands. Grid computing involves computation in a distributed fashion, which may also involve the aggregation of large-scale clusters. The size of a grid may vary from small—confined to a network of computer workstations within a corporation, for example—to large, public collaborations across many companies and networks.

1 Foster, Kesselman (1998)
2 Foster, Kesselman (2000)
3 Foster, Kesselman (2002)


”The notion of a confined grid may also be known as an intra-nodes cooperation, whilst the notion of a larger, wider grid may thus refer to an inter-nodes cooperation”. Grids are a form of distributed computing whereby a ”super virtual computer” is composed of many networked, loosely coupled computers acting together to perform very large tasks. This technology has been applied to computationally intensive scientific, mathematical, and academic problems through volunteer computing, and it is used in commercial enterprises for such diverse applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce and Web services. Coordinating applications on Grids can be a complex task, especially when coordinating the flow of information across distributed computing resources. Grid workflow systems have been developed as a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in the Grid context. 4

2.3.2 Virtual Organizations

The term virtual organization is used to describe a network of independent firms that join together, often temporarily, to produce a service or product. Virtual organization is often associated with such terms as virtual office, virtual teams, and virtual leadership. The ultimate goal of the virtual organization is to provide innovative, high-quality products or services instantaneously in response to customer demands. The term virtual in this sense has its roots in the computer industry. When a computer appears to have more storage capacity than it really possesses, it is referred to as virtual memory. Likewise, when an organization assembles resources from a variety of firms, a virtual organization seems to have more capabilities than it actually possesses.

2.4 Cloud Computing

2.4.1 Definitions

”Cloud computing is a pay-per-use model for enabling convenient, on-demand network access to a shared pool of configurable and reliable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal consumer management effort or service provider interaction.”

”Computing Power as a configurable, payable Service”

Overview

Cloud computing is a colloquial expression used to describe a variety of different computing concepts that involve a large number of computers connected through a real-time communication network (typically the Internet). Cloud computing is a jargon term without a commonly accepted, non-ambiguous scientific or technical definition. In science, cloud computing is a synonym for distributed computing over a network and means the ability to run a program on many connected computers at the same time. The popularity of the term can be attributed to its use in marketing to sell hosted services, in the sense of application service provisioning, that run client-server software at a remote location. 5

Figure 1: Cloud Computing

4 https://en.wikipedia.org/wiki/Grid_computing
5 http://en.wikipedia.org/wiki/Cloud_computing


2.4.2 5 Cloud Characteristics

• On-demand self-service

• Ubiquitous network access

• Resource pooling

• Rapid elasticity

• Pay per use

2.4.3 Delivery Models

• Cloud Software as a Service (SaaS)

– Use provider’s applications over a network

– E.g., Salesforce.com

• Cloud Platform as a Service (PaaS)

– Deploy customer-created applications to a cloud

– E.g. Google App Engine, Microsoft Azure

• Cloud Infrastructure as a Service (IaaS)

– Rent processing, storage, network capacity, and other fundamental computing resources

– E.g. Elastic Compute Cloud (EC2), Simple Storage Service (S3), SimpleDB (a short provisioning sketch follows this list)
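A minimal IaaS provisioning sketch with the AWS SDK for Python, boto3; this assumes configured credentials, and the AMI ID and bucket name are placeholders, not real resources:

import boto3

# Rent compute: launch one small virtual machine on demand.
ec2 = boto3.resource("ec2")
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print("launched:", instances[0].id)

# Rent storage: put an object into S3.
s3 = boto3.client("s3")
s3.put_object(Bucket="my-example-bucket", Key="hello.txt", Body=b"hello cloud")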

2.4.4 Cloud Deployment Types

• Private Cloud: enterprise owned or leased

– Community Cloud - shared infrastructure for specific community

• Public Cloud: sold to the public, mega-scale infrastructure

• Hybrid Cloud: composition of two or more clouds

2.4.5 Cloud Technologies

• Virtualization

• Grid technology

• Service Oriented Architectures

• Distributed Computing

• Broadband Networks

• Browser as a platform

• Free and Open Source Software

• Autonomic Systems

• Web 2.0

• Web application frameworks

• Service Level Agreements (SLAs)

2.4.5.1 Virtualization Virtualization is very important for cloud computing and as a result brings another benefit that cloud computing is famous for: scalability. Because each virtual server is allocated only as much computing power and storage capacity as the client needs, more virtual servers can be created. If the needs grow, more power and capacity can be allocated to that server, or less if needed. And because clients only pay for the computing power and capacity they actually use, this can be very affordable for most clients.

Without virtualization, cloud computing as we know it would not exist or would be in a different form. But such is now only in the realm of speculation, as virtualization is really here to make Information Technology more affordable for the world. 6

6 http://www.cloudtweaks.com/2012/12/cloud-computing-and-virtualization/


2.4.5.2 The history of Virtualization IT administrators realized very early that conventional methods of managing IT environments were no longer effective because of dynamic business requirements in agile environments. Demands for faster time to market, installation and upgrade requests, the need to quickly apply security patches to operating systems and applications, and many other management complications drove a new strategy for endpoint and hosting-server handling and management.

Various IT challenges such as low server utilization, complex server-storage migration, inefficient server deployment, agile business requirements, increased total cost of ownership, server sprawl, high-availability requirements, disaster-recovery complexity, green IT requirements, automation, and policy-driven management led to the innovation called virtualization.

Virtualization has been changing information facilities due to its capability to consolidate hardware resources and decrease energy costs. Virtualization is the foundation of a new, effective era of cloud computing, pushed by the need to spend budgets effectively, by agility, and by other challenges in the traditional environment.

Organizations already using a virtualized server environment that provides basic IT elasticity and scalability are very close to being able to evolve their facilities to take advantage of the extra benefits provided by private cloud implementations, such as self-service, automation, and faster time to market, and to deliver their Infrastructure as a Service (IaaS). 7

Figure 2: History of Virtualization

Example for Virtualization Middleware

• OpenNebula

– partly developed by the European Union’s Reservoir project

• VMWare Vcloud

• Aneka Clouds

• FOSII Infrastructure (TU Vienna)

7 http://www.cloudtweaks.com/2012/12/the-history-of-virtualization/


2.5 Web Application Frameworks

A web application framework (WAF) is a software framework designed to support the development of dynamic websites, web applications, web services and web resources. The framework aims to alleviate the overhead associated with common activities performed in web development. For example, many frameworks provide libraries for database access, templating and session management, and they often promote code reuse.
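As a small illustration of what such a framework provides (routing, sessions, templating), here is a minimal sketch using Flask; the choice of Flask is an assumption, and any comparable framework works similarly:

from flask import Flask, session, render_template_string

app = Flask(__name__)
app.secret_key = "change-me"  # required for session support

@app.route("/")
def index():
    # Session management and templating are supplied by the framework.
    session["visits"] = session.get("visits", 0) + 1
    return render_template_string("Visit number {{ n }}", n=session["visits"])

if __name__ == "__main__":
    app.run()  # development server on http://127.0.0.1:5000/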

2.6 Web Services

A web service is a method of communication between two electronic devices over the World Wide Web. A web service is a software function provided at a network address over the web or the cloud; it is a service that is ”always on”, as in the concept of utility computing.

2.7 Multi-tenancy

Multi-tenancy is one of three features of SaaS (Software as a Service) applications; the other two are scalability and configurability. Multi-tenancy is a concept that allows the same software to be shared by multiple companies (tenants). Each company only sees its own environment and is not affected by others.

1. Level 1: Ad-Hoc/Custom: Multi-tenancy is achieved by hosting a separate version of the software on separate machines for each customer. The software is customized for each customer, and thus a different version runs on each VM instance.

2. Level 2: Configurable: Customers use the same version of the software. Each customer can configure his tenant by using metadata. The same version of the software is used for all customers, but each customer still gets a separate VM instance.

3. Level 3: Configurable, Multi-Tenant-Efficient: The software is multi-tenant aware. One instance of the software serves all customers. The software can still be configured by each customer. (A minimal sketch of tenant-aware data access follows this list.)

4. Level 4: Configurable, Multi-Tenant-Efficient, Scalable: The same version of the multi-tenant-aware software runs on several VM instances. A tenant load-balancer assigns each tenant to an instance. Multiple tenants can share the same instance and migrate between instances automatically to balance the load.
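A minimal sketch of level-3 multi-tenancy: one shared instance whose data access is always scoped by a tenant identifier. The table and column names are purely illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [("acme", 10.0), ("acme", 20.0), ("globex", 99.0)])

def invoices_for(tenant_id):
    # Every query is scoped to the calling tenant, so tenants never see
    # each other's rows even though they share a single instance.
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ?", (tenant_id,))
    return [amount for (amount,) in rows]

print(invoices_for("acme"))    # [10.0, 20.0]
print(invoices_for("globex"))  # [99.0]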

Figure 3: Different levels of Multi-tenancy

The full presentation by Peter Mell is available at cloudsecurityalliance.org. 8

8 https://cloudsecurityalliance.org/csa-mitre-presos/cloud-computing-v24.ppt


2.8 BIG DATA

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves due to constant improvement in traditional DBMS technology as well as new databases like NoSQL and their ability to handle larger amounts of data. With this difficulty, new platforms of ”big data” tools are being developed to handle various aspects of large quantities of data. 9

9 http://en.wikipedia.org/wiki/Big_data


3 CHAPTER 2

3.1 OS / Virtualization

There are many different approaches to classify OS and computer architectures:

• Quantum computer vs. Chemical computer

• Scalar processor vs. Vector processor

• Non-Uniform Memory Access (NUMA) computers vs. UMA

• Register machine vs. Stack machine

• Harvard architecture vs. von Neumann architecture

• data and memory separated/not-separated

• Cellular architecture

• Local vs. Batch

Another Classification

• Multi-user - A multi-user operating system allows multiple users to use the same computer at the same time and/or at different times.

• Multiprocessing - An operating system capable of supporting and utilizing more than one computer processor.

• Multitasking - An operating system that is capable of allowing multiple software processes to run at the same time.

• Multithreading - Operating systems that allow different parts of a software program to run concurrently.

• ...

3.1.1 Batchsystems

A batch system is used to monitor and control the jobs running on a system. It enforces limits on runtime (walltime) as well as on the number of jobs running at one time (both total and per user). To run a job, the batch system allocates the resources requested in the batch script, sets up an environment to run the job in (running the user's .cshrc and .login files), and then runs the job in that environment. In this environment, standard out and standard error are redirected into files in the current working directory at the time the executable is actually run. 10

Workmode

• A program takes a set of data files as input

• process the data, and produces a set of output data files

• ”batch processing” → the input data are collected into batches of files and are processed in batches by the program.

3.1.1.1 Common Batch processing usage

• Data processing

– End-of-day (EOD) reporting, e.g. on mainframes.

• Printing

– The operator selects the documents to be printed and indicates to the batch printing software when and where they should be output and the priority of the print job. The job is then sent to the print queue, from where the printing daemon sends it to the printer.

• Databases

– Used for automated transaction processing, as contrasted with interactive online transaction processing (OLTP) applications.

10 http://www.chpc.utah.edu/docs/manuals/faq/batch.html


• Images

– operations on digital images: computer programs that let one resize, convert, watermark, or otherwise edit image files.

• Converting

– converting a number of computer files from one format to another.

• Job scheduling

– UNIX utilizes cron facilities to allow for scheduling of complex job scripts. Windows has a job scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage.

Why Scheduling? A scheduler works with the batch system to increase throughput and enforce policies on thesystem:

• Where to execute my job?

• When to execute my job?

• When can I expect the results?

• Fairness

• Efficiency

• Minimization of execution time

• Maximization of throughput

3.1.1.2 Portable Batch System (PBS) Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources. It is often used in conjunction with UNIX cluster environments. PBS is supported as a job scheduler mechanism by several meta-schedulers, including Moab by Cluster Resources (which became Adaptive Computing Enterprises Inc.) and GRAM (Grid Resource Allocation Manager), a component of the Globus Toolkit. 11
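A minimal sketch of submitting a PBS job from Python; the resource requests, file names and program in the job script are illustrative assumptions, and qsub must be available on the submit host:

import subprocess

# A small PBS job script: directives for the scheduler, then the job itself.
job_script = """#!/bin/bash
#PBS -N example-job
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00
#PBS -o example-job.out
#PBS -e example-job.err
cd $PBS_O_WORKDIR
./my_program input.dat
"""

# qsub reads the script from stdin and prints the assigned job id.
result = subprocess.run(["qsub"], input=job_script, text=True,
                        capture_output=True, check=True)
print("submitted job:", result.stdout.strip())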

Figure 4: Batch System

3.1.2 VGE - Vienna Grid Environment

??

3.2 VMs, VMMs

3.2.1 Why Virtualization?

• Consolidate workloads to reduce hardware, power, and space requirements.

• Run multiple operating systems simultaneously — as an enterprise upgrade path, or to leverage the advantages of specific operating systems.

• Run legacy software on newer, more reliable, and more power-efficient hardware.

• Dynamically migrate workloads to provide fault tolerance.

• Provide redundancy to support disaster recovery.

11 http://en.wikipedia.org/wiki/Portable_Batch_System


Figure 5: VMs

3.2.2 Types of virtualization

• Software, or full virtualization

– The hypervisor “traps” the machine operations the OS uses to read or modify the system’s status or perform input/output (I/O) operations

– Emulation of operations

– Status code consistent with the OS

• Partial virtualization or para-virtualization

– Eliminates trapping and emulating

– Guest OS knows about hypervisor

• Hardware-assisted virtualization

– hardware extensions to the x86 system architecture to eliminate much of the hypervisor overhead associated with trapping and emulating I/O operations

– Rapid Virtualization Indexing

Virtual Machine Monitor

A virtual machine monitor (VMM) is a host program that allows a single computer to support multiple, identical execution environments. All users see their systems as self-contained computers isolated from other users, even though every user is served by the same machine. In this context, a virtual machine is an operating system (OS) that is managed by an underlying control program. For example, IBM’s VM/ESA can control multiple virtual machines on an IBM S/390 system. 12

The VMM decouples the software from the hardware by forming a level of indirection between the software running in the virtual machine (the layer above the VMM) and the hardware.

Design goals for VMMs

• Compatibility

• Performance

• Simplicity

12 http://searchservervirtualization.techtarget.com/definition/virtual-machine-monitor


3.2.3 Hypervisor vs. hosted Virtualization

In computing, a hypervisor or virtual machine monitor (VMM) is a piece of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor is running one or more virtual machines is defined as a host machine. Each virtual machine is called a guest machine. The hypervisor presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems. Multiple instances of a variety of operating systems may share the virtualized hardware resources.

3.2.3.1 Type 1 and Type 2 Virtualization

• Type 1 (or native, bare-metal) hypervisors run directly on the host’s hardware to control the hardware and to manage guest operating systems. A guest operating system thus runs on another level above the hypervisor. This model represents the classic implementation of virtual machine architectures; the original hypervisors were the test tool SIMMON and CP/CMS, both developed at IBM in the 1960s. CP/CMS was the ancestor of IBM’s z/VM. Modern equivalents are Oracle VM Server for SPARC, Oracle VM Server for x86, Citrix XenServer, VMware ESX/ESXi, KVM, and the Microsoft Hyper-V hypervisor.

• Type 2 (or hosted) hypervisors run within a conventional operating system environment. With the hypervisor layer as a distinct second software level, guest operating systems run at the third level above the hardware. BHyVe, VMware Workstation and VirtualBox are examples of Type 2 hypervisors.

In other words, a Type 1 hypervisor runs directly on the hardware; a Type 2 hypervisor runs on another operating system, such as FreeBSD, Linux, or Windows. 13 More information on the differences between Type 1 and Type 2 hypervisors and how to classify Xen and KVM can be found in an article on techtarget.com. 14

Figure 6: Type 1, Type 2 Virtualization

3.2.4 Basic Virtualization Techniques

→ http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf ←

3.3 Xen

Xen is a hypervisor providing services that allow multiple computer operating systems to execute on the same computer hardware concurrently.

3.3.1 Architecture

Xen is a native, or bare-metal, hypervisor. It runs in a more privileged CPU state than any other software on the machine. Responsibilities of the hypervisor include memory management and CPU scheduling of all virtual machines (”domains”) and launching the most privileged domain (”dom0”) - the only virtual machine which by default has direct access to hardware. From dom0 the hypervisor can be managed and unprivileged domains (”domU”) can be launched. The dom0 domain is typically a modified version of Linux, NetBSD or Solaris. User domains may either be unmodified open-source or proprietary operating systems, such as Microsoft Windows (if the host processor supports x86 virtualization, e.g., Intel VT-x and AMD-V), or modified, para-virtualized operating systems with special drivers that support enhanced Xen features. On x86, Xen with a Linux dom0 runs on Pentium Pro or newer processors.

13 https://en.wikipedia.org/wiki/Hypervisor
14 http://searchservervirtualization.techtarget.com/news/2240034817/KVM-reignites-Type-1-vs-Type-2-hypervisor-debate


Xen boots from a bootloader such as GNU GRUB, and then usually loads a paravirtualized host operating system into the host domain (dom0). 15

Xen is open-source virtualization software based on paravirtualization technology. This section provides an overview of the Xen 3.0 architecture. Figure 7 shows the architecture of Xen 3.0 hosting four VMs (Domain 0, VM 1, VM 2, and VM 3). This architecture includes the Xen Virtual Machine Monitor (VMM), which abstracts the underlying physical hardware and provides hardware access for the different virtual machines. Figure 7 also shows the special role of the VM called Domain 0. Only Domain 0 can access the control interface of the VMM, through which other VMs can be created, destroyed, and managed. Management and control software runs in Domain 0. Administrators can create virtual machines with special privileges - such as VM 1 - that can directly access the hardware through secure interfaces provided by Xen. Administrators can create other virtual machines that access the physical resources provided through Domain 0's control and management interface in Xen. In this example, the guest operating systems in VM 1 and in VM 2 are modified to run above Xen and also have Xen-aware drivers to enable high performance. Near-native performance can be achieved through this approach. Unmodified guest operating systems are also supported, as discussed in the ”Xen and Intel Virtualization Technology” section of the source article. In addition, the developers of Xen 3.0 plan to include support for virtual machines with symmetric multiprocessing (SMP) capabilities, 64-bit guest operating systems, Accelerated Graphics Port (AGP), and Advanced Configuration and Power Interface (ACPI). In a virtual data center framework, CPU, memory, and I/O components need to be virtualized. Xen 3.0 is designed to enable para-virtualization of all three hardware components.

Figure 7: Xen Architecture hosting four VMs

CPU Operations The Intel x86 architecture provides four levels of privilege modes. These modes, or rings, are numbered 0 to 3, with 0 being the most privileged. In a non-virtualized system, the OS executes at ring 0 and the applications at ring 3; rings 1 and 2 are typically not used. In Xen para-virtualization, the VMM executes at ring 0, the guest OS at ring 1, and the applications at ring 3. This approach helps to ensure that the VMM possesses the highest privilege, while the guest OS executes in a more privileged mode than the applications and is isolated from the applications. Privileged instructions issued by the guest OS are verified and executed by the VMM.

Memory Operations In a non-virtualized environment, the OS expects contiguous memory. Guest operating systems in Xen paravirtualization are modified to access memory in a non-contiguous manner. Guest operating systems are responsible for allocating and managing page tables; however, direct writes are intercepted and validated by the Xen VMM.

I/O Operations In a fully virtualized environment, hardware devices are emulated. Xen para-virtualization exposes a set of clean and simple device abstractions. For example, I/O data to and from guest operating systems is transferred using a shared-memory ring architecture (memory is shared between Domain 0 and the guest domain) through which incoming and outgoing messages are sent. Modifying the guest OS is not feasible for non-open-source platforms such as the Microsoft Windows 2000 or Windows Server 2003 operating systems; as a result, such operating systems are not supported in a para-virtualization environment. The following section explains how Xen works with Intel Virtualization Technology to support unmodified operating systems.

15 https://en.wikipedia.org/wiki/Xen


3.3.2 Dynamic Memory Control (DMC)

Dynamic Memory Control (DMC) is a technology provided by the Xen Cloud Platform (XCP), starting from the 0.5 release. DMC makes it possible to:

• change the amount of host physical memory assigned to any running virtual machine without rebooting it (within limits specified by an administrator)

• start additional virtual machines on a host whose physical memory is currently full, by automatically reducing the memory allocations of existing virtual machines in order to make space (within limits specified by an administrator)

3.3.3 Balloon Drivers

Balloon drivers make it possible to temporarily remove memory from running guests so that the memory can be used by other guests.

In order to add memory to or remove memory from a running guest, DMC relies completely on the action of a balloon driver running within the guest operating system. The balloon driver works by inflating or deflating a memory balloon - a special area of memory within the guest's physical address space.

During normal operation, every physical memory page within the guest is backed by a physical memory page on the host. When XCP wants to reduce a guest's memory allocation, it asks the balloon driver running in the guest to ”inflate” its memory balloon. The balloon driver achieves this by using an OS-specific technique to allocate physical memory pages from the guest. Initially, these guest physical memory pages will be backed by host physical memory pages, just like any other pages in the guest. However, after acquiring guest physical memory pages, the balloon driver immediately informs the hypervisor that it may recycle the host physical memory pages that back them. The hypervisor immediately revokes the guest's access to these host physical memory pages and makes them available for use by other guests. Inflating a balloon measurably increases physical memory pressure within the guest. From the guest's point of view, it can no longer use the memory that has been taken away; the balloon driver appears to be rather like a long-lived process that is using some of the available guest physical memory. When XCP wants to increase a guest's memory allocation, it asks the balloon driver running in the guest to ”deflate” its memory balloon. However, in order to do this, the balloon driver must first ask the hypervisor to re-back ballooned-out guest physical pages with host physical pages. The hypervisor may refuse, and will certainly do so if there are no remaining physical pages available on the host. (However, in normal circumstances, XCP will never ask the guest to increase its memory allocation if it thinks the hypervisor cannot meet the demand.)
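DMC itself is driven through the XCP/XenServer toolstack, but the underlying ballooning idea can be sketched with the libvirt Python bindings; this sketch assumes a Xen host reachable via libvirt and a running guest named 'guest1' (both assumptions, not part of XCP's own API):

import libvirt

# Connect to the local Xen hypervisor through libvirt.
conn = libvirt.open("xen:///system")
dom = conn.lookupByName("guest1")   # hypothetical guest name

state, max_mem_kib, cur_mem_kib, vcpus, cpu_time = dom.info()
print("current allocation:", cur_mem_kib // 1024, "MiB of", max_mem_kib // 1024, "MiB")

# Shrink the running guest's allocation to 512 MiB: the balloon driver in
# the guest inflates, and the freed host pages become available to others.
dom.setMemory(512 * 1024)           # argument is in KiB

conn.close()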

3.3.4 Paravirtualization

In computing, paravirtualization is a virtualization technique that presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware. The intent of the modified interface is to reduce the portion of the guest's execution time spent performing operations which are substantially more difficult to run in a virtual environment compared to a non-virtualized environment. Paravirtualization provides specially defined 'hooks' to allow the guest(s) and host to request and acknowledge these tasks, which would otherwise be executed in the virtual domain (where execution performance is worse). A successful paravirtualized platform may allow the virtual machine monitor (VMM) to be simpler (by relocating execution of critical tasks from the virtual domain to the host domain), and/or reduce the overall performance degradation of machine execution inside the virtual guest. Paravirtualization requires the guest operating system to be explicitly ported for the para-API - a conventional OS distribution that is not paravirtualization-aware cannot be run on top of a paravirtualizing VMM. However, even in cases where the operating system cannot be modified, components may still be available that enable many of the significant performance advantages of paravirtualization; for example, the Xen Windows GPLPV project provides a kit of paravirtualization-aware device drivers, licensed under the terms of the GPL, that are intended to be installed into a Microsoft Windows virtual guest running on the Xen hypervisor. 16

An alternative to paravirtualization is hardware-based virtual machines (HVM), which require the CPU and BIOS to support special instructions for virtual machines (Intel VT-x and AMD-V). However, these extensions do not make paravirtualization obsolete, since they have a higher overhead (10-15%) compared to paravirtualization (2-3%).

Depending on the sources, configuration and applications, the overhead varies, but 2-3% compared to 10-15% CPU overhead (in favor of paravirtualization) seems realistic. 17

Overview of Xen: http://www.dell.com/downloads/global/power/ps3q05-20050191-abels.pdf

16 http://en.wikipedia.org/wiki/Paravirtualization
17 http://www.techinferno.com/2012/07/05/xen-and-debian-an-introduction-to-virtualization/3/


Figure 8: Paravirtualization

3.3.5 Domains in Xen

Figure 9: Xen, Domains

3.3.6 Hypercalls in Xen

3.4 VMWare

http://www.vmware.com/pdf/virtualization.pdf

3.4.1 Hosted vs. Hypervisor Architecture

Hosted

• installs and runs as an application

• relies on host OS for device support and physical resource management

Bare-Metal (Hypervisor) Architecture

• Lean virtualization-centric kernel

• Service console for agents and helper applications


Figure 10: Xen, Hypercalls

Figure 11: Hosted vs. hypervisor

3.5 Cloud Management

Eucalyptus, OpenNebula and Nimbus are three major open-source cloud-computing software platforms. The overall function of these systems is to manage the provisioning of virtual machines for a cloud providing infrastructure-as-a-service. These open-source projects provide an important alternative for those who do not wish to use a commercially provided cloud.

3.6 OpenNebula

OpenNebula is a distributed virtualization layer. OpenNebula is an open-source cloud computing toolkit for managing heterogeneous distributed data center infrastructures. The OpenNebula toolkit manages a data center's virtual infrastructure to build private, public and hybrid implementations of infrastructure as a service.

OpenNebula orchestrates storage, network, virtualization, monitoring, and security technologies to deploy multi-tier services (e.g. compute clusters) as virtual machines on distributed infrastructures, combining both data center resources and remote cloud resources, according to allocation policies. According to the European Commission's 2010 report, ”... only few cloud dedicated research projects in the widest sense have been initiated – most prominent amongst them probably OpenNebula ...”. The toolkit includes features for integration, management, scalability, security and accounting. It also claims standardization, interoperability and portability, providing cloud users and administrators with a choice of several cloud interfaces (Amazon EC2 Query, OGF Open Cloud Computing Interface and vCloud) and hypervisors (Xen, KVM and VMware), and can accommodate multiple hardware and software combinations in a data center. 18

• Transform a distributed infrastructure into a flexible virtual infrastructure

• Adapt it to the changing demands of the service workload

• OpenNebula is a distributed virtualization layer

18 http://en.wikipedia.org/wiki/OpenNebula


• Decouple the service from the physical infrastructure

Figure 12: Open Nebula

3.7 Eucalyptus

Eucalyptus is open-source computer software for building Amazon Web Services (AWS)-compatible private and hybrid cloud computing environments, marketed by the company Eucalyptus Systems. Eucalyptus enables pooling compute, storage, and network resources that can be dynamically scaled up or down as application workloads change. Eucalyptus Systems announced a formal agreement with AWS in March 2012 to maintain compatibility. 19

https://dspace.ist.utl.pt/bitstream/2295/584877/1/EucalyptusWhitepaperAug2009.pdf

3.7.1 Components

• Node Controller controls the execution, inspection, and termination of VM instances on the host where it runs.

• Cluster Controller gathers information about and schedules VM execution on specific node controllers, as well as managing the virtual instance network.

• Storage Controller (Walrus) is a put/get storage service that implements Amazon's S3 interface, providing a mechanism for storing and accessing virtual machine images and user data.

• Cloud Controller is the entry point into the cloud for users and administrators. It queries node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to cluster controllers.

3.8 Virtualization - Glossary

Virtual Machine

A representation of a real machine using software that provides an operating environment which can run or host a guest operating system.

Guest Operating System

An operating system running in a virtual machine environment that would otherwise run directly on a separate physical system.

19 http://en.wikipedia.org/wiki/Eucalyptus_(computing)


Figure 13: Eucalyptus employs a hierarchical design to reflect underlying resource topologies

Virtual Machine Monitor

Software that runs in a layer between a hypervisor or host operating system and one or more virtual machines, and that provides the virtual machine abstraction to the guest operating systems. With full virtualization, the virtual machine monitor exports a virtual machine abstraction identical to a physical machine, so that standard operating systems (e.g., Windows 2000, Windows Server 2003, Linux, etc.) can run just as they would on physical hardware.

Hypervisor

A thin layer of software that provides virtual partitioning capabilities and runs directly on hardware, but underneath higher-level virtualization services. Sometimes referred to as a "bare metal" approach.

Hosted Virtualization

A virtualization approach where partitioning and virtualization services run on top of a standard operating system (the host). In this approach, the virtualization software relies on the host operating system to provide the services to talk directly to the underlying hardware.

Para-virtualization

A virtualization approach that exports a modified hardware abstraction which requires operating systems to be explicitly modified and ported to run.

Virtualization Hardware Support

Industry standard servers will provide improved hardware support for virtualization. Initial hardware support includes processor extensions to address CPU and some memory virtualization. Future support will include I/O virtualization, and eventually more complex memory virtualization management.

Hardware-level virtualization

Here the virtualization layer sits right on top of the hardware, exporting the virtual machine abstraction. Because the virtual machine looks like the hardware, all the software written for it will run in the virtual machine.


Operating system–level virtualization

In this case the virtualization layer sits between the operating system and the application programs that run on the operating system. The virtual machine runs applications, or sets of applications, that are written for the particular operating system being virtualized.

High-level language virtual machines

In high-level language virtual machines, the virtualization layer sits as an application program on top of an operating system. The layer exports an abstraction of the virtual machine that can run programs written and compiled to the particular abstract machine definition. Any program written in the high-level language and compiled for this virtual machine will run in it.


4 CHAPTER 3

4.1 Self adaptable Clouds: Cloud Monitoring and Knowledge Management

Challenges

• How to monitor Cloud resources?

• How to manage and enforce SLA agreement based on monitored resource metrics?

4.1.1 Traditional MAPE Loop

Monitoring – Analysis – Planning – Execution

4.1.2 SLA

Cloud services include high performance applications requiring lots of system resources. Service provisioning in the Cloud is based on Service Level Agreements (SLA), which is a contract signed between the customer and the service provider. It states the terms of the service including the non-functional requirements of the service specified as quality of service (QoS), obligations, service pricing, and penalties in case of agreement violations. In order to guarantee an agreed SLA, the service provider must be capable of monitoring its infrastructure (host) resource metrics to enforce the agreed service terms. Traditional monitoring technologies for single machines or Clusters are restricted to locality and homogeneity of monitored objects and, therefore, cannot be applied in the Cloud in an appropriate manner. Moreover, in traditional systems there is a gap between monitored metrics, which are usually low-level entities, and SLA agreements, which are high-level user guarantee parameters.

4.1.3 LoM2His Framework

LoM2HiS is a novel framework for managing the mappings of the Low-level resource Metrics to High-level SLAs. The LoM2HiS framework is embedded into the FoSII infrastructure, which facilitates autonomic SLA management and enforcement. Thus, the LoM2HiS framework detects future SLA violation threats and can notify the enactor component to act so as to avert the threats. 20

Figure 14: LoM2HiS Framework

20 Low Level Metrics to High Level SLAs - LoM2HiS Framework: Bridging the Gap Between Monitored Metrics and SLA Parameters in Cloud Environments. Vincent C. Emeakaroha, Ivona Brandic, Michael Maurer, Schahram Dustdar. http://dsg.tuwien.ac.at/staff/vincent/pub/lom2his.pdf


Goal

• resource monitoring

• metrics mapping

• SLA violation detection

Metric Mapping Rules The run-time monitor chooses the mapping rules to apply based on the services being provisioned. These rules are used to compose, aggregate, or convert the low-level metrics to form the high-level SLA parameter. We distinguish between simple and complex mapping rules. A simple mapping rule maps one-to-one from low-level to high-level, for example mapping the low-level metric "disk space" to the high-level SLA parameter "storage". In this case only the units of the quantities are considered in the mapping rule. Complex mapping rules consist of predefined formulae for the calculation of specific SLA parameters using the resource metric. 20

Figure 15: Example for Complex Mapping Rules
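As a minimal illustration of the distinction above, the following Python sketch shows one simple rule (a one-to-one unit conversion from a low-level "disk space" metric to a high-level "storage" SLA parameter) and one complex rule (computing an "availability" SLA parameter from uptime/downtime counters). The metric names, units, and the availability formula are illustrative assumptions, not the exact rules used by LoM2HiS.

```python
# Hypothetical sketch of simple vs. complex LoM2HiS-style mapping rules.
# Metric names, units and the availability formula are illustrative assumptions.

def map_disk_space_to_storage(disk_space_mb: float) -> float:
    """Simple mapping rule: one-to-one, only the unit changes (MB -> GB)."""
    return disk_space_mb / 1024.0  # high-level SLA parameter "storage" in GB

def map_availability(uptime_s: float, downtime_s: float) -> float:
    """Complex mapping rule: a predefined formula combines several low-level metrics."""
    total = uptime_s + downtime_s
    return 100.0 * uptime_s / total if total > 0 else 0.0  # availability in percent

if __name__ == "__main__":
    low_level = {"disk_space_mb": 512000.0, "uptime_s": 86000.0, "downtime_s": 400.0}
    sla = {
        "storage_gb": map_disk_space_to_storage(low_level["disk_space_mb"]),
        "availability_pct": map_availability(low_level["uptime_s"], low_level["downtime_s"]),
    }
    print(sla)  # e.g. {'storage_gb': 500.0, 'availability_pct': 99.5...}
```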

4.1.4 Cloud Characteristics

• dynamic

• on demand: computing as utility

• unforeseen load changes

• autonomic adaptation and (re-)provisioning of resources

• very scalable

4.1.5 How to make Clouds energy-efficient

Intra-VM optimization An optimization technique based on

• VM reconfiguration

• Application migration

• VM migration

• Cloud Federation

4.1.6 How to avoid SLA violations

Knowledge Management

• How to store relevant information about the Cloud infrastructure?

• How to use/interpret it? What to infer?

Knowledge DBs Predict SLA violations before they happen. Problems:

• How to identify possible SLA violations ahead of time?

• Thresholds for the SLA parameter values where we have to react

• Tradeoff: prevention of SLA violations vs. doing nothing and paying penalties

• Consider non-SLA parameters like energy efficiency, carbon footprint


Overview of KM techniques to answer these questions

We can use several KM techniques to answer questions about how to learn and interpret information in order to detect whether an SLA would get violated:21

Rule based A rule-based system contains rules in the ”IF Condition THEN Action” format, e.g.,

IF IB less than TTIB THEN Add physical machine to VM 22
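A minimal sketch of such a rule in code, assuming a hypothetical monitoring dictionary and a placeholder action; the threshold value and the action name are illustrative assumptions, not the framework's actual implementation.

```python
# Hypothetical rule-based check: IF incoming bandwidth < threat threshold
# THEN trigger a scaling action. Metric name, threshold and action are assumptions.

TT_IB = 100.0  # threat threshold for incoming bandwidth (Mbit/s), illustrative value

def add_physical_machine_to_vm(vm_id: str) -> None:
    print(f"scaling action triggered for {vm_id}")

def apply_rules(monitored: dict, vm_id: str) -> None:
    # IF IB less than TTIB THEN Add physical machine to VM
    if monitored["incoming_bandwidth"] < TT_IB:
        add_physical_machine_to_vm(vm_id)

apply_rules({"incoming_bandwidth": 80.0}, "vm-42")
```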

Default Logic Default Logic is a version of a rule-based system whose rules are no longer simple IF-THEN rules, but can be described as: IF condition (and there are no reasons against it) THEN action, e.g.

d_1 = \frac{IB < TT_{IB} : \text{IncreaseIBshare}}{\text{IncreaseIBshare}}

The rule means: if incoming bandwidth is smaller than its threat threshold, and if there is no reason against increasing the bandwidth share, then increase the bandwidth share. Reasons against could be that the bandwidth share is already at its maximum or that other (possibly more important) services issued a request for an increase at the same time.

Contrary to ordinary rules in a rule-based system, default rules can easily express that resources cannot be increased indefinitely.
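The same rule can be sketched as a default rule in code: the conclusion is drawn only if the prerequisite holds and no blocking reason is present. The reasons checked here (share already at maximum, a competing request) are the examples named above, encoded as simple predicates; everything else is an assumption for illustration.

```python
# Hypothetical default-logic style rule: prerequisite : justification / conclusion.
# The conclusion fires only if nothing speaks against the justification.

def no_reason_against_increase(state: dict) -> bool:
    # Reasons against: share already at maximum, or a more important competing request.
    return state["ib_share"] < state["ib_share_max"] and not state["competing_request"]

def default_rule_d1(state: dict) -> dict:
    # d1 = (IB < TT_IB : IncreaseIBshare) / IncreaseIBshare
    if state["incoming_bandwidth"] < state["tt_ib"] and no_reason_against_increase(state):
        state["ib_share"] += 10.0  # increase incoming-bandwidth share (illustrative step)
    return state

state = {"incoming_bandwidth": 80.0, "tt_ib": 100.0,
         "ib_share": 50.0, "ib_share_max": 100.0, "competing_request": False}
print(default_rule_d1(state)["ib_share"])  # 60.0
```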

Situation Calculus Situation Calculus describes the world we observe in states, the so-called fluents, and situations. Fluents are first-order logic formulae that can be true or false based on the situation in which they are observed. Situations themselves are a finite series of actions. The situation before any action has been performed (the starting point of the system) is called the initial situation. The state of a situation s is the set of all fluents that are valid in s. Predefined actions can advance situations to new ones in order to work towards achieving a pre-defined goal by manipulating the fluents in this domain. For a world of three bricks lying on a table that can be stacked upon each other, fluents are quite easy to find: first, a brick can be on the table or not; second, a brick can have another brick on it or not; third, a brick x can lie on a brick y or not. Two possible actions are: stack brick x on brick y, and unstack brick y, i.e., put brick y onto the table. Now, a goal could be to have one pile of all three bricks in a specified order, with an initial situation of them being piled in the reverse order. In each state of a situation, different fluents are true (e.g., brick x lies on brick y, brick y does not lie on brick x, brick z lies on the table), and stacking or unstacking generates a new situation. Mapping this analogy to Cloud Computing is not as easy. As far as fluents are concerned, in a Cloud we have to consider the current value of each specific parameter, and whether the respective SLO is fulfilled or not. Furthermore, all the states of the Cloud itself, such as the number of running virtual machines, the number of physical machines available, etc., have to be modeled as fluents as well. Fluents for a specific application could be the predicate has value(SLAParameter p, Value v) with v ∈ ℝ, meaning that the SLAParameter p holds the value v in the current situation, and fulfills(SLO s), meaning that the specified application fulfills a certain SLO s. The predicate has value(SLAParameter p1, x) is valid for only one x ∈ ℝ in a certain situation. The possible actions are provided by our use case.

Case Based Reasoning (CBR) Case Based Reasoning is the process of solving problems based on past experience. In more detail, it tries to solve a case (a formatted instance of a problem) by looking for similar cases from the past and reusing the solutions of these cases to solve the current one. In general, a typical CBR cycle consists of the following phases, assuming that a new case was just received:

1. Retrieve the most similar case or cases to the new one.

2. Reuse the information and knowledge in the similar case(s) to solve the problem.

3. Revise the proposed solution.

4. Retain the parts of this experience likely to be useful for future problem solving.

In step 4, the new case and the found solution are stored in the knowledge base.
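A toy sketch of the four-phase cycle follows, with cases represented as vectors of monitored values and a nearest-neighbour retrieval; the case format, the similarity measure, and the example actions are assumptions for illustration only, not the representation used in the cited work.

```python
# Toy Case-Based Reasoning cycle: retrieve, reuse, revise, retain.
# Case representation and similarity measure are illustrative assumptions.
import math

case_base = [
    {"metrics": (0.9, 0.2), "action": "add VM"},         # past case: high CPU, low memory
    {"metrics": (0.3, 0.95), "action": "increase RAM"},  # past case: low CPU, high memory
]

def similarity(a, b):
    return -math.dist(a, b)  # closer vectors are more similar

def solve(new_metrics):
    best = max(case_base, key=lambda c: similarity(c["metrics"], new_metrics))  # 1. retrieve
    solution = best["action"]                                                    # 2. reuse
    # 3. revise: in a real system the proposed action would be evaluated here
    case_base.append({"metrics": new_metrics, "action": solution})               # 4. retain
    return solution

print(solve((0.85, 0.3)))  # -> "add VM"
```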

21 Towards Knowledge Management in Self-adaptable Clouds. Michael Maurer, Ivona Brandic, Vincent C. Emeakaroha and Schahram Dustdar

22 TTIB indicates the Threat Threshold value to trigger actions before an SLA would get violated. Finding such a value is not trivial.


Speculative approach

May we allocate fewer resources than agreed, but more than actually utilized at the specific point in time, and not violate SLAs?

4.1.7 How to structure actions

Escalation levels:

1. Change VM configuration

2. Migrate applications from one VM to another.

3. Migrate one VM from one PM to another or create new VM on appropriate PM.

4. Turn on/off PM.

5. Outsource to other Cloud provider.

or Do nothing! (which is sometimes also a solution)

4.2 Policy Modes

→ Global view of the Cloud infrastructure

• Green: Plenty of resources left. Over-consumption allowed.

• Green-Orange: Heavy over-consumption forbidden.

• Orange: Resource is becoming scarce, but SLA demand can be fulfilled if no over-consumption takes place. Thus, over-consumption is forbidden.

• Orange-Red: Over-consumption forbidden. Initiate outsourcing of some applications.

• Red: Over-consumption forbidden. SLA resource requirements of all consumers cannot be fulfilled. If possible, a specific choice of applications is outsourced. If not enough, applications with higher reputation points or penalties are given priority over applications with less impact. SLAs of the latter ones are deliberately broken to ensure SLAs of the former ones.

4.3 Cloud Market

In business, economics or investment, market liquidity is an asset's ability to be sold without causing a significant movement in the price and with minimum loss of value. Money, or cash, is the most liquid asset, and can be used immediately to perform economic actions like buying, selling, or paying debt, meeting immediate wants and needs. However, currencies, even major currencies, can suffer loss of market liquidity in large liquidation events. For instance, scenarios considering a major dump of US dollar bonds by China, Saudi Arabia, or Japan (each of which holds trillions of dollars in such bonds) would certainly affect the market liquidity of the US dollar and US dollar denominated assets. There is no asset whatsoever that can be sold with no effect on the market. Liquidity also refers both to a business's ability to meet its payment obligations, in terms of possessing sufficient liquid assets, and to such assets themselves. An act of exchange of a less liquid asset with a more liquid asset is called liquidation. 23

An important characteristic of Cloud markets is the liquidity of the traded good. For the market to function efficiently, a sufficient number of market participants is needed. Creating such a market with a large number of providers and consumers is far from trivial. Resource consumers will only join if they are able to find what they need quickly. Resource providers will only join if they can be fairly certain that their resources will be sold. Not meeting either of these conditions will deter providers and consumers from using the market.

4.4 Cloud Characteristics

See figure 16.

4.5 Cloud Enabling Technologies

See figure 17.

23 https://en.wikipedia.org/wiki/Market_liquidity


Figure 16: Cloud Characteristics

Figure 17: Cloud Enabling Technologies

4.6 Problems when providing virtual goods

The use of virtualization enables providers to create a wide range of resource types and allows consumers to specify their needs precisely. If the resource variability of both sides is large, consumers and providers will not meet, since their offers may differ slightly. With the help of a simulation model, we will demonstrate the problems caused by a large number of resource definitions.

4.7 Resource markets in Research

The research into resource markets can be divided into two groups when looking at their attempts at describing the tradable good. The first group consists largely of Grid market designs that did not define goods clearly:

• E.g. GRACE developed a market architecture for Grid markets and outlined a market mechanism

• E.g. The SORMA project focused more on fairness and efficient resource allocation; it also identified several requirements for open Grid markets (allocative efficiency, computational tractability, individual rationality, etc.)

The second group has simplified the computing resource good by focusing on only one aspect of it:

• In MACE, the importance of developing a definition for the tradable good was recognized and an abstraction was developed. The liquidity of goods and the likelihood that consumers and providers with common offers can meet was not addressed.

• The Popcorn market only traded Java Operations, which simplified the matching between consumers and providers


• The Spawn market was envisioned to work with CPU time slices, which makes the matching of demand and supply trivial but forces consumers to determine the number of required CPU cycles

4.8 Commercial Resource Providers

In recent years, a large number of commercial Cloud providers have entered the utility computing market, offering a number of different types of services:

• Resource providers who only provide computing resources (e.g. Amazon, Tsunamic Technologies)

• SaaS providers who sell their own resources together with their own software services (e.g. Google Apps, Salesforce.com)

• Companies that attempt to run a mixed approach, i.e. they allow users to create their own services but at the same time offer their own services (Sun N1 Grid, Microsoft Azure)

In the current market, providers only sell a single type of resource (with the exception of Amazon). This limited number of different resource types enables market creation, since all demand is channeled towards very few resource types.

4.9 Liquidity Problems in Markets

If an open Cloud market is created, in which resource specifications are left to the trader, would such a market be able to match providers and consumers? We simulated this in a double-auction market environment with varying numbers of resource types and traders. The matching probability was used as a measure to determine how attractive a market would be to providers and consumers.

The Challenges

Based on the analysis of the liquidity problems in markets, we are faced with an interesting research challenge: on the one hand, to fully exploit the potential of open markets, a large number of providers and consumers is necessary.

On the other hand, the large number of potential traders might inflate the variety of resources, which leads to the problem that the supply and the demand are spread across a wide range of resources.

To give traders few restrictions, an approach is needed which allows traders to define their resources (or requirements) freely while facilitating SLA matching.

4.10 The importance of SLAs in markets

Current adaptive SLA matching mechanisms are based on OWL and DAML-S and other semantic technologies. However, none of these approaches address the issues of the open market. In most existing approaches, user and consumer have to agree either on specific ontologies or have to belong to a specific portal. None of the approaches deal with semi-automatic definition of SLA mappings enabling negotiations between inconsistent SLA templates. 24

4.11 Managing SLAs

4.12 The SLA Template Lifecycle

4.13 SLA Mapping in Double Auctions

In a DA market, traders use public SLA templates, which describe all aspects of the computing resources (e.g. the software running on these resources, the Terms of Use, and the price). A small number of public SLA templates in the market encourages consumers to map their demand to the existing SLA templates (despite the inherent user utility loss of this method). Similarly, providers are encouraged to map their supply to the public SLA templates (despite the inherent user utility loss of this method). Once the market is operating, provisions have to be made to add new SLA templates and to remove unpopular SLA templates.

24 SLA templates represent popular SLA formats containing all attributes and parameters but without any values, and are usually used to channel demand and offer of a market. Private templates are utilized at the buyers' and traders' infrastructures and reflect the needs of the particular stakeholder in terms of the SLA parameters they use to establish a contract.


Figure 18: Managing SLAs

Figure 19: The SLA Template Lifecycle

An initial template is created in the beginning of the lifecycle (step 1). Afterwards, consumers perform SLA mappings (step 2). Based on their needs, inferred from these mappings (step 3), and the predefined adaptation method, the public SLA template is adapted (step 4). Assuming that the demand of market participants does not change, a final template is generated (step 5). If the demand has changed during a fixed time period, the process continues with step 2. In practice, the time between two iterations could correspond to a time period of one week. During that time, new SLA mappings are solicited from consumers and users.

Figure 20: The SLA Template Lifecycle


4.14 Consequences of few resource types

In a market with a single tradable SLA template, the matching probability is high, since all supply and demand is channeled into a single resource type.

In a market with 100 tradable SLA templates, about 12,000 traders would be needed to reach a matching probability of 75%.

In a market with 1000 tradable SLA templates, about 17,000 traders are needed to reach a matching probability of 75%.

4.15 Mapping the SLA Landscape for High Performance Clouds

http://www.hpcinthecloud.com/hpccloud/2011-02-07/mapping_the_sla_landscape_for_high_performance_clouds.html

4.16 SLA Mapping Approach

The SLA mapping process works as follows. In step 1, a service provider assigns his service (e.g., infrastructure resources) to a particular public SLA template. Since it is sometimes not possible to find a perfect match between a private SLA template and a public SLA template, the service provider can define mappings to bridge the differences between the two templates (step 2). In the next step, a service consumer can look for appropriate Cloud services. Since it might happen that a consumer cannot change his private template due to existing business processes, legal issues, or other reasons, he can define SLA mappings to bridge the differences between his private template and the public template assigned to the service (step 4). Both service provider and service consumer can specify two types of SLA mappings:

• Ad-hoc SLA mapping type defines a translation between a parameter existing in the user's private SLA template and the public SLA template. We distinguish simple ad-hoc mappings, i.e., mappings of different values for an SLA element (e.g., a mapping between the names "CPUCores" and "NumberOfCores" of an SLA parameter, or a mapping between two different values of a service level objective), and complex ad-hoc mappings. The latter map between different functions used for calculating a value of an SLA parameter (e.g., defining a mapping for a metric unit of a value of an SLA parameter such as "Price" from "EUR" to "USD" can translate one function for calculating the price into another one).

• Future SLA mapping type defines a wish for adding (or deleting) a new SLA parameter that is supported by the private template to a public SLA template. Unlike ad-hoc mapping, future mapping cannot be applied immediately.
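The two ad-hoc mapping kinds can be sketched as simple translation functions between a private and a public template. The parameter names and the exchange rate below are the illustrative ones from the text; the actual mapping language used in the referenced work may differ.

```python
# Hypothetical ad-hoc SLA mappings between a private and a public SLA template.
# Parameter names and the EUR->USD rate are illustrative assumptions.

def simple_mapping(private_template: dict) -> dict:
    """Simple ad-hoc mapping: rename 'CPUCores' to the public name 'NumberOfCores'."""
    public = dict(private_template)
    public["NumberOfCores"] = public.pop("CPUCores")
    return public

def complex_mapping(public_template: dict, eur_to_usd: float = 1.1) -> dict:
    """Complex ad-hoc mapping: translate the price function from EUR to USD."""
    public = dict(public_template)
    public["Price"] = round(public["Price"] * eur_to_usd, 2)
    public["PriceUnit"] = "USD"
    return public

private = {"CPUCores": 8, "Price": 100.0, "PriceUnit": "EUR"}
print(complex_mapping(simple_mapping(private)))
# e.g. {'Price': 110.0, 'PriceUnit': 'USD', 'NumberOfCores': 8}
```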


5 CHAPTER 4

Motivation In the field of large scale data processing we want to use 1000s of CPUs but do not want the hassle of managing things. Map-Reduce provides:

• Automatic parallelization and distribution

• Fault tolerance

• I/O Scheduling

• Monitoring and status updates

5.1 Map-Reduce Overview

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes are on the same local network and use similar hardware) or a grid (if the nodes are shared across geographically and administratively distributed systems, and use more heterogeneous hardware). Computational processing can occur on data stored either in a filesystem (unstructured) or in a database (structured). MapReduce can take advantage of locality of data, processing data on or near the storage assets to decrease transmission of data. The first complete end-to-end framework for MapReduce on top of Apache Hadoop was done within the Advanced Research Group of Unisys under Dr. Sumeet Malhotra.

"Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

"Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.

MapReduce allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel – though in practice it is limited by the number of independent data sources and/or the number of CPUs near each source. Similarly, a set of 'reducers' can perform the reduction phase – provided all outputs of the map operation that share the same key are presented to the same reducer at the same time, or if the reduction function is associative. While this process can often appear inefficient compared to algorithms that are more sequential, MapReduce can be applied to significantly larger datasets than "commodity" servers can handle – a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled – assuming the input data is still available. Another way to look at MapReduce is as a 5-step parallel and distributed computation:

1. Prepare the Map() input – the "MapReduce system" designates Map processors, assigns the K1 input key value each processor would work on, and provides that processor with all the input data associated with that key value.

2. Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2.

3. "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor would work on, and provides that processor with all the Map-generated data associated with that key value.

4. Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step.

5. Produce the final output – the MapReduce system collects all the Reduce output, and sorts it by K2 to produce the final outcome.

Logically these 5 steps can be thought of as running in sequence – each step starts only after the previous step is completed – though in practice, of course, they can be intertwined, as long as the final result is not affected. In many situations the input data might already be distributed ("sharded") among many different servers, in which case step 1 could sometimes be greatly simplified by assigning Map servers that would process the locally present input data. Similarly, step 3 could sometimes be sped up by assigning Reduce processors that are as much as possible local to the Map-generated data they need to process. 25

Short: The user specifies:

25http://en.wikipedia.org/wiki/MapReduce


• a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and

• a reduce function that merges all intermediate values associated with the same intermediate key
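The classic word-count example sketches these two user-supplied functions and the grouping step between them. This is a single-machine Python imitation of the MapReduce programming model, not code for any particular framework.

```python
# Single-machine sketch of the MapReduce programming model (word count).
# map_fn() emits intermediate key/value pairs, reduce_fn() merges values per key.
from collections import defaultdict

def map_fn(document: str):
    for word in document.split():
        yield word, 1                      # intermediate pair <word, 1>

def reduce_fn(word: str, counts):
    return word, sum(counts)               # merge all values for one key

def run(documents):
    groups = defaultdict(list)
    for doc in documents:                  # "map" phase
        for key, value in map_fn(doc):
            groups[key].append(value)      # shuffle: group by intermediate key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # "reduce" phase

print(run(["the quick fox", "the lazy dog"]))  # {'the': 2, 'quick': 1, ...}
```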

5.2 Map Reduce Sequence of Actions

1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (controllable by the user via an optional parameter).

2. It then starts up many copies of the program on a cluster of machines.

3. One of the copies of the program is special – the master.

4. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign.

5. The master picks idle workers and assigns each one a map task or a reduce task.

6. A worker who is assigned a map task reads the contents of the corresponding input split.

7. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function.

8. The intermediate key/value pairs produced by the Map function are buffered in memory.

9. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function.

10. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.

11. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers.

12. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together.

13. The sorting is needed because typically many different keys map to the same reduce task.

14. If the amount of intermediate data is too large for the memory, an external sort is used!

15. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function.

16. The output of the Reduce function is appended to a final output file for this reduce partition.

17. When all map tasks and reduce tasks have been completed, the master wakes up the user program.

18. At this point, the MapReduce call in the user program returns back to the user code.

Figure 21: Map Reduce Execution Model


5.3 Master Data Structures

The master keeps several data structures. For each map task and reduce task, it stores the state

• idle,

• in-progress,

• completed,

• and the identity of the worker machine (for non-idle tasks).

The master is the conduit through which the location of intermediate file regions is propagated from map tasks to reduce tasks. For each completed map task, the master stores the locations and sizes of the R intermediate file regions produced by the map task. Updates to this location and size information are received as map tasks are completed. The information is pushed incrementally to workers that have in-progress reduce tasks.
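A compact sketch of this bookkeeping, using a simple Python data class, is shown below; the field names are illustrative assumptions and do not come from the original implementation.

```python
# Illustrative sketch of the master's bookkeeping for map/reduce tasks.
# Field names are assumptions; the original implementation is not public.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Task:
    state: str = "idle"                     # idle | in-progress | completed
    worker: Optional[str] = None            # identity of the worker (non-idle tasks)
    # for completed map tasks: locations/sizes of the R intermediate file regions
    regions: List[Tuple[str, int]] = field(default_factory=list)

map_tasks = [Task() for _ in range(4)]      # M map tasks
reduce_tasks = [Task() for _ in range(2)]   # R reduce tasks

# The master assigns a map task and later records its completed output regions,
# which it then pushes incrementally to workers with in-progress reduce tasks.
map_tasks[0].state, map_tasks[0].worker = "in-progress", "worker-7"
map_tasks[0].state = "completed"
map_tasks[0].regions = [("worker-7:/tmp/part-0", 64), ("worker-7:/tmp/part-1", 48)]
```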

5.4 Fault Tolerance

Since the MapReduce library is designed to help process very large amounts of data using hundreds or thousands of machines, the library must tolerate machine failures gracefully.

• Worker Failure

• Master Failure

• Semantics in the Presence of Failures

5.4.1 Worker Failure

The master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state, and therefore become eligible for scheduling on other workers. Similarly, any map task or reduce task in progress on a failed worker is also reset to idle and becomes eligible for rescheduling. Completed map tasks are re-executed on a failure because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be re-executed since their output is stored in a global file system. When a map task is executed first by worker A and then later executed by worker B (because A failed), all workers executing reduce tasks are notified of the re-execution. Any reduce task that has not already read the data from worker A will read the data from worker B.

MapReduce is resilient to large-scale worker failures. For example, during one MapReduce operation, network maintenance on a running cluster was causing groups of 80 machines at a time to become unreachable for several minutes. The MapReduce master simply re-executed the work done by the unreachable worker machines, and continued to make forward progress, eventually completing the MapReduce operation.
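The re-scheduling rule can be summarised in a few lines of Python: on a failed worker, completed map tasks go back to idle (their local output is lost), while completed reduce tasks stay completed (their output is in the global file system). This is a paraphrase of the behaviour described above, not actual framework code; tasks are represented as plain dicts here.

```python
# Sketch of the master's reaction to a worker failure, following the rules above.
# Tasks are plain dicts: {"state": ..., "worker": ...}.

def handle_worker_failure(failed_worker, map_tasks, reduce_tasks):
    for task in map_tasks:
        # completed map output lived on the failed worker's local disk -> redo it
        if task["worker"] == failed_worker and task["state"] in ("in-progress", "completed"):
            task["state"], task["worker"] = "idle", None
    for task in reduce_tasks:
        # completed reduce output is in the global file system -> keep it
        if task["worker"] == failed_worker and task["state"] == "in-progress":
            task["state"], task["worker"] = "idle", None

maps = [{"state": "completed", "worker": "A"}, {"state": "in-progress", "worker": "B"}]
reduces = [{"state": "completed", "worker": "A"}]
handle_worker_failure("A", maps, reduces)
print(maps, reduces)  # the completed map on A is idle again, the reduce stays completed
```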

5.4.2 Master Failure

It is easy to make the master write periodic checkpoints of the master data structures. If the master task dies, a new copy can be started from the last checkpointed state. However, given that there is only a single master, its failure is unlikely; therefore the current implementation aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire.

5.5 Data Flow

Input Reader

The input reader divides the input into 16MB to 128MB splits and the framework assigns one split to each Map function. The input reader reads data from stable storage (typically a distributed file system) and generates key/value pairs. A common example will read a directory full of text files and return each line as a record.


Map function

Each Map function takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input and output types of the map can be (and often are) different from each other. → e.g., if the application is doing a word count, the map function would break the line into words and output the word as the key and "1" as the value.

Partition function

The output of all of the maps is allocated to a particular reducer by the application's partition function. The partition function is given the key and the number of reducers and returns the index of the desired reducer. A typical default is to hash the key modulo the number of reducers.

Comparison function

The input for each reduce is pulled from the machine where the map ran and sorted using the application's comparison function.

Reduce function

The framework calls the application's reduce function once for each unique key in the sorted order. The reduce can iterate through the values that are associated with that key and output 0 or more values. In the word count example, the reduce function takes the input values, sums them and generates a single output of the word and the final sum.

Output Writer

The Output Writer writes the output of the reduce to stable storage, usually a distributed file system.

5.6 Partitioning function

The users of MapReduce specify the number of reduce tasks/output files that they desire (R).

Data gets partitioned across these tasks using a partitioning function on the intermediate key.

A default partitioning function is provided that uses hashing (e.g. hash(key) mod R), which tends to result in fairly well-balanced partitions.

In some cases, however, it is useful to partition data by some other function of the key. → Sometimes the output keys are URLs, and we want all entries for a single host to end up in the same output file.

→ The user of the MapReduce library can provide a special partitioning function. E.g. using hash(Hostname(urlkey)) mod R as the partitioning function causes all URLs from the same host to end up in the same output file.
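Both the default partitioner and the hostname-based one fit in a couple of lines; in this sketch Python's urlparse stands in for the Hostname() helper mentioned in the text.

```python
# Default and custom partitioning functions (R = number of reduce tasks).
# urlparse stands in for the Hostname() helper mentioned in the text.
from urllib.parse import urlparse

def default_partition(key: str, R: int) -> int:
    return hash(key) % R                          # hash(key) mod R

def host_partition(url_key: str, R: int) -> int:
    return hash(urlparse(url_key).hostname) % R   # hash(Hostname(urlkey)) mod R

# all URLs of the same host land in the same reduce partition
print(host_partition("http://example.org/a", 4) == host_partition("http://example.org/b", 4))  # True
```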

5.7 Combiner function

In some cases, there is significant repetition in the intermediate keys produced by each map task, and the user-specified Reduce function is commutative and associative. Example → word counting:

Since word frequencies tend to follow a Zipf distribution, each map task will produce hundreds or thousands of records of the form < the, 1 >. All of these counts will be sent over the network to a single reduce task and then added together by the Reduce function to produce one number.

→ specify an optional Combiner function that does partial merging of this data before it is sent over the network.

The Combiner function is executed on each machine that performs a map task. Typically the same code is used to implement both the combiner and the reduce functions.

The only difference between a reduce function and a combiner function is how the MapReduce library handles the output of the function.

The output of a reduce function is written to the final output file.

The output of a combiner function is written to an intermediate file that will be sent to a reduce task.

Partial combining significantly speeds up certain classes of MapReduce operations.
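Extending the earlier word-count sketch, a combiner pre-aggregates the < the, 1 > records on the map side before anything crosses the (here only simulated) network; as is typical, the combiner simply reuses the reduce logic. This is an illustration of the idea, not framework code.

```python
# Word count with a combiner: partial sums are computed per map task before the
# (simulated) shuffle, reducing the number of intermediate records sent around.
from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield word, 1

def combine_or_reduce(pairs):
    sums = defaultdict(int)
    for word, count in pairs:
        sums[word] += count
    return list(sums.items())

documents = ["the cat sat on the mat", "the dog"]
map_outputs = [combine_or_reduce(map_fn(doc)) for doc in documents]  # combiner per map task
shuffled = [pair for out in map_outputs for pair in out]             # records "over the network"
print(dict(combine_or_reduce(shuffled)))  # {'the': 3, 'cat': 1, 'sat': 1, ...}
```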


5.8 Input and Output Types

The MapReduce library provides support for reading input data in several different formats. For example, text mode input treats each line as a key/value pair: the key is the offset in the file and the value is the contents of the line. Another common supported format stores a sequence of key/value pairs sorted by key. Each input type implementation knows how to split itself into meaningful ranges for processing as separate map tasks (e.g. text mode's range splitting ensures that range splits occur only at line boundaries). Users can add support for a new input type by providing an implementation of a simple reader interface, though most users just use one of a small number of predefined input types. A reader does not necessarily need to provide data read from a file. For example, it is easy to define a reader that reads records from a database, or from data structures mapped in memory. In a similar fashion, we support a set of output types for producing data in different formats, and it is easy for user code to add support for new output types.

5.9 Hadoop

Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers. The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. The entire Apache Hadoop "platform" is now commonly considered to consist of the Hadoop kernel, MapReduce and the Hadoop Distributed File System (HDFS), as well as a number of related projects – including Apache Hive, Apache HBase, and others. 26

5.10 Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix. Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.

What is Hive

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs executed on Hadoop. In addition, HiveQL supports custom map-reduce scripts to be plugged into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of the same. The underlying IO libraries can be extended to query data in custom formats. Hive also includes a system catalog, the Hive-Metastore, containing schemas and statistics, which is useful in data exploration and query optimization. In Facebook, the Hive warehouse contains several thousand tables with over 700 terabytes of data and is being used extensively for both reporting and ad-hoc analyses by more than 100 users.

→ Administration and analysis of structured data with Hadoop
→ Requests with HiveQL, execution with MapReduce
→ Data management with HDFS, metadata for tables
→ Scalability and failure management
→ Extendable (user-defined table functions, user-defined aggregation functions)

26 http://en.wikipedia.org/wiki/Apache_Hadoop


What is Hive NOT

→ Does not and cannot promise low latency on queries
→ The paradigm here is strictly one of submitting jobs and being notified when the jobs are completed, as opposed to real-time queries
→ Hive query response times, even for the smallest jobs, can be of the order of several minutes

5.10.1 HiveQL

While based on SQL, HiveQL does not strictly follow the full SQL-92 standard. HiveQL offers extensions not in SQL, including multi-table inserts and create table as select, but only offers basic support for indexes. Also, HiveQL lacks support for transactions and materialized views, and offers only limited subquery support. Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce jobs, which are submitted to Hadoop for execution.

5.11 HadoopDB

A MapReduce system such as Hadoop provides a very flexible data processing framework where data does not need to be modeled before all the analytic work can start. At run-time, the system automatically figures out how to parallelize the job into tasks, does load-balancing and failure recovery. Parallel DBMSs on the other hand have sophisticated cost-based optimization techniques built in, which allows orders of magnitude speedup in performance as compared to Hadoop. However, parallel DBMSs need to have the data modeled in a schematic way before useful programs can run, to provide the system enough information to optimize the queries.

MapReduce was designed with fault tolerance and unstructured data analytics in mind. Structured data analysis with MapReduce emerged later, such as Hive. Historically, parallel DBMSs carried the design assumption that failures do not occur very often, which is not quite the case, especially in large clusters with heterogeneous machines.

HadoopDB has two layers, the data processing layer and the data storage layer. The data storage layer runs the DBMS instances, one at each node. The data processing layer runs Hadoop as a job scheduler and communication layer. HadoopDB needs to load data from HDFS into the data storage layer before processing happens. Like Hive, HadoopDB compiles SQL queries into a DAG of operators such as scan, select, group-by, reduce, which correspond to either the map phase or the reduce phase in a traditional sense of MapReduce. These operators become the job to run in the system.

Compared to Hive, HadoopDB does more on the system-level architecture, which enables higher-level optimization opportunities. For example, Hive does not care about whether tables are collocated in the node. HadoopDB, however, detects from the metadata it collects whether the attribute to group by is also the anchor attribute used to partition the table; if so, then the group-by operator can be pushed to each node, and joins can also be done on this partitioned attribute as well. 27

Therefore, HadoopDB is one of the attempts to combine the fault tolerance of MapReduce with a parallel DBMS's advantage of query optimization. HadoopDB is:
→ A hybrid of DBMS and MapReduce technologies that targets analytical workloads
→ Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
→ An attempt to fill the gap in the market for a free and open source parallel DBMS
→ Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems
→ As scalable as Hadoop, while achieving superior performance on structured data analysis workloads

27 Hadoop.apache.org


6 Some Questions

XEN domains (important: Domain0)

Memory Types

• Shared Memory

• Distributed Memory

• Hybrid Distributed-Shared

Memory Virtualisation

Beyond CPU virtualization, the next critical component is memory virtualization. This involves sharing the physical system memory and dynamically allocating it to virtual machines. Virtual machine memory virtualization is very similar to the virtual memory support provided by modern operating systems. Applications see a contiguous address space that is not necessarily tied to the underlying physical memory in the system. The operating system keeps mappings of virtual page numbers to physical page numbers stored in page tables. All modern x86 CPUs include a memory management unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory performance.

To run multiple virtual machines on a single system, another level of memory virtualization is required. In other words, one has to virtualize the MMU to support the guest OS. The guest OS continues to control the mapping of virtual addresses to the guest memory physical addresses, but the guest OS cannot have direct access to the actual machine memory. The VMM is responsible for mapping guest physical memory to the actual machine memory, and it uses shadow page tables to accelerate the mappings. As depicted in Figure 22, the VMM uses TLB hardware to map the virtual memory directly to the machine memory to avoid the two levels of translation on every access. When the guest OS changes the virtual memory to physical memory mapping, the VMM updates the shadow page tables to enable a direct lookup. MMU virtualization creates some overhead for all virtualization approaches, but this is the area where second generation hardware assisted virtualization will offer efficiency gains.

Figure 22: Memory Virtualization
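The two-level mapping can be pictured with three small tables: the guest page table (guest-virtual to guest-physical), the VMM's table (guest-physical to machine), and the shadow table that caches the composed mapping so a lookup needs only one step. The sketch below is a conceptual model with made-up addresses, not how a real MMU or hypervisor stores these structures.

```python
# Conceptual model of memory virtualization: the shadow page table caches the
# composition of the guest page table and the VMM's guest->machine mapping.

guest_page_table = {0x1000: 0x4000, 0x2000: 0x5000}   # guest virtual -> guest "physical"
vmm_mapping      = {0x4000: 0x9000, 0x5000: 0xA000}   # guest "physical" -> machine memory

# The VMM keeps the shadow table up to date whenever the guest changes its mappings,
# so the hardware can translate guest-virtual to machine addresses in one step.
shadow_page_table = {gv: vmm_mapping[gp] for gv, gp in guest_page_table.items()}

print(hex(shadow_page_table[0x1000]))  # 0x9000: one lookup instead of two translations
```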

MapReduce (How does it work, what is it, ...?)

How do you build stateful web services (WS)?

Managing state with a web service means that some data that should have been stored in the requests is stored on the web service side. Commonly, but not always, statefulness is used due to a design flaw. Stateful example: when client A sends its first request to a stateful web service, the state is established, and for every subsequent request from client A, the web service retains the state associated with client A. Compare this to patients going to see the doctor. Statelessness is a patient going to see the doctor at a big hospital; the patient cannot choose which doctor to see, but sees a random doctor each time. Statefulness is a patient going to see a specific doctor: the patient returns to the same clinic each visit and sees the same doctor.
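The difference can be made concrete with a hypothetical counter service: the stateful variant keeps per-client data on the server side, while the stateless variant expects the client to send the running total with every request. This is only an illustration of the concept, not a recommended web service design.

```python
# Hypothetical counter service illustrating stateful vs. stateless interaction.

class StatefulCounterService:
    def __init__(self):
        self.state = {}                             # per-client state lives on the server

    def increment(self, client_id: str) -> int:
        self.state[client_id] = self.state.get(client_id, 0) + 1
        return self.state[client_id]

def stateless_increment(current_total: int) -> int:
    return current_total + 1                        # everything needed is in the request

svc = StatefulCounterService()
print(svc.increment("client-A"), svc.increment("client-A"))  # 1 2 (server remembers A)
print(stateless_increment(0), stateless_increment(1))        # 1 2 (client carries the state)
```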

NoSQL (types of NoSQL databases, advantages and disadvantages)

Big data is in, and consequently, relational databases are out. That's what everyone's saying, anyway, and it's not hard to see why: traditional databases do, after all, have a great deal of difficulty with the massive and unpredictable flows of information that go hand-in-hand with unstructured data. It's gotten to the point where there are many who believe NoSQL might replace SQL entirely, rather than simply existing as an alternative: this belief is inaccurate for a number of reasons, not the least of which is the fact that the SQL query language (or at the very least an approximation) can be utilized in NoSQL with relative ease. But that's neither here nor there.

See, while there's no denying that NoSQL databases are incredibly disruptive, with some very clear advantages in their implementation; at the same time, the technology also has a number of shortcomings and limitations. Today, I'd like to have a look at some of the chief advantages of a NoSQL implementation in light of the disadvantages. 28

Advantages

It's More Scalable NoSQL's Elastic Scaling is precisely what makes it so well-suited for big data. Relational databases tend often to 'scale up': they add larger, more powerful servers as the database load begins to increase. In the case of big data – which is likely to grow at a breakneck pace – this simply isn't a viable choice. It's thus far better to 'scale out' instead, distributing the database across multiple hosts in order to efficiently manage server load.

It's Flexible A NoSQL database is considerably less restricted than an SQL database, mainly because it's not locked into any one specific data model (this also forms the crux of one of its chief disadvantages, but more on that in a moment). Applications can store data in virtually any structure or format necessary, making change management a breeze. Ultimately, this means more up-time and better reliability. Contrast this against relational databases, which must be strictly and attentively managed, where even a minor change may result in downtime or a reduction of service.

It's Administrator-Friendly NoSQL databases tend more often than not to be considerably less complex and considerably simpler to deploy than their relational cousins. This is because, as noted by TechNirvana, they're "designed from the ground up to require less management, with automatic repair, data distribution, and simple data models." All these factors together ultimately lead to a database which requires considerably less overhead management.

It's Cost-Effective and Open-Source The servers utilized in a NoSQL implementation are typically cheap, low-grade commodity devices, as opposed to the oft-expensive servers and storage systems required in relational databases. That's not the only thing that drives down the cost, either. NoSQL is entirely open-source, meaning generally higher reliability, security, and speed of deployment.

The Cloud's the Limit NoSQL meshes naturally with cloud computing. This is due to a couple of factors. Foremost among these is that NoSQL's horizontal scaling meshes extremely well with the cloud, allowing them to take full advantage of cloud computing's elastic scaling. In addition, the ease of deployment and management within a NoSQL database (and its focus on big data) make it a prime partner for cloud computing, allowing administrators to focus more on the software side of things rather than having to worry about what hardware they're using.

Disadvantages

It Has a Very Narrow Focus One of the primary reasons that NoSQL will never wholly replace SQL is that it was never meant to do so. NoSQL databases have a very narrow focus: they are designed primarily for storage, and offer very little functionality beyond. When transactions enter the equation, relational databases are still the better choice. Further, NoSQL doesn't really do so well with data backup on its own.

Standardization and Open Source That NoSQL is open-source could at once be considered its greatest strength and its greatest weakness. The truth is, there really aren't many reliable standards for NoSQL databases quite yet, meaning that no two databases are likely to be created equal. Getting a particular implementation to play nice with existing infrastructure can thus be something of a crap-shoot, while support could end up being spotty when compared against a more traditional database implementation.

28 http://greendatacenterconference.com/blog/the-five-key-advantages-and-disadvantages-of-nosql/


Performance and Scaling > Consistency Because of the way data is stored and managed in a NoSQL database, data consistency might well end up being a concern. NoSQL puts performance and scalability first; consistency isn't really a consideration. Depending on what you're using it for, this could actually be either a crippling weakness or a powerful strength. In certain situations – such as when you're dealing with a massive onslaught of unstructured data – this is completely acceptable. In other situations, such as management of financial records, it most certainly is not.

A General Lack of Maturity While it's certainly true that NoSQL isn't exactly the new kid on the block (the underlying technology has existed for at least ten years now), widespread acceptance of NoSQL still lags; compared to traditional relational databases the technology is still relatively immature. This is reflected also in a lack of developers and administrators with the right knowledge and skills: NoSQL may be administrator- and developer-friendly, true, but that means nothing if neither administrator nor developer has the tools or understanding to address it.

Relational databases are much better established in the enterprise world, and thus enjoy more functionality, greater acceptance, and a wealth of professionals who actually understand how to manage them.

It Doesn't Play Nice with Analytics Admittedly, this is a weakness which has been addressed in recent years, with the emergence of startups like Precog. Even so, NoSQL doesn't necessarily mesh well with traditional BI applications and platforms. It might well be less complex than SQL in many areas, but where analytics is concerned, it has the very real potential to become a complicated, difficult-to-decipher behemoth.

Comparison of OpenNebula and Eucalyptus (differences, commonalities)

In some settings, such as a large organization with many users, it might be more cost effective for the organization to purchase hardware to create its own private cloud. This is where open-source cloud frameworks such as Eucalyptus, OpenNebula and Nimbus enter the picture. These software products are designed to allow an organization to set up a private group of machines as their own cloud.

These frameworks process inputs from the front-end, retrieve the needed disk images from the repository, signal a VMM to set up a VM, and then signal DHCP and IP bridging programs to set up MAC and IP addresses for the VM.

These three frameworks represent three different points of interest in the design space of this particular type of open-source cloud.

The actual setup of the physical machines and network components depends heavily on whether one is using OpenNebula, Nimbus or Eucalyptus, as these systems can have different expectations for both physical and virtual network configuration.

Eucalyptus

Eucalyptus is designed to be an open-source answer to the commercial EC2 cloud. First, there is a very strong separation between user-space and admin-space. Root access is required for everything done by the administrator on the physical machines themselves. Users are only allowed to access the system via a web interface or some type of front-end tools (e.g. euca2ools).

The software configuration also leans more toward decentralizing resources, insofar as possible. The system allows for multiple clusters, such that while there is a single head node for handling user interfaces, there can be multiple cluster controllers.

Furthermore, Eucalyptus implements a distributed storage system called Walrus which is designed to imitate Amazon's S3 distributed storage.

The highly decentralized design of Eucalyptus, with multiple clusters, distributed storage, and locally stored running virtual disks, lends itself to a large number of machines. Second, as far as possible, the internal technical details are hidden from users, catering to persons whose primary focus might not be computer science.

Figure 23 shows how a virtual machine is constructed in an Eucalyptus configuration.


Figure 23: Constructing a VM in Eucalyptus

OpenNebula

OpenNebula tends toward a greater level of centralization and customizability (especially for end-users). Specifically, the idea of OpenNebula is a pure private cloud, in which users actually log into the head node to access cloud functions.

From the administrator's perspective, the most striking customization available is in the shared file system used to store all of OpenNebula's files. In order to spawn a VM, the user provides a configuration file containing parameters which would be fed into the VMM command line. This allows for memory, processor, network and disk resources to be requested for essentially any configuration. However, the downside to this kind of customizability is that it is easy for the user to make a mistake.

Figure 24 shows how a virtual machine is constructed in an OpenNebula configuration.

→ Transform a distributed infrastructure into a flexible virtual infrastructure
→ Adapt it to the changing demands of the service workload
→ OpenNebula is a distributed virtualization layer
→ Decouple the service from the physical infrastructure

Summary Eucalyptus vs. OpenNebula

Generally speaking, Eucalyptus is geared toward a private company that wants their own cloud for their own use and wants to protect themselves from user malice and mistakes. OpenNebula is geared toward persons interested in the cloud or VM technology as its own end. Such persons would want a VM sandbox so they can try new and interesting things on the computational side. OpenNebula is also ideal for anyone that wants to stand up just a few cloud machines quickly.


Figure 24: Constructing a VM in OpenNebula

Figure 25: Eucalyptus vs. OpenNebula Table
