UNIVERSIDAD DE JAÉN Escuela Politécnica Superior de Linares
Master Thesis
______
ENERGY OPTIMIZATION IN CLOUD
COMPUTING SYSTEMS
Student: Iván Tomás Cotes Ruiz
Supervisors: Dr. Rocío Pérez de Prado
Dr. Sebastián García Galán
Department: Telecommunication Engineering Department
June, 2016
Contents
Index of tables ....................................................................................................................... 3
Index of figures ...................................................................................................................... 4
Background (summary) ......................................................................................................... 5
Objectives (summary) ............................................................................................................ 7
Conclusions (summary) ......................................................................................................... 8
1. Background .................................................................................................................. 10
2. Objectives .................................................................................................................... 11
3. Methodology ................................................................................................................ 12
3.1. State of the art ....................................................................................................... 12
3.1.1. Power model .................................................................................................. 12
3.1.2. Dynamic Voltage and Frequency Scaling (DVFS) ....................................... 13
3.1.3. Fuzzy Logic ................................................................................................... 15
3.1.4. Cloud computing types .................................................................................. 23
3.1.5. Power saving techniques in Datacenters ....................................................... 27
3.2. First stage: simulation environment ...................................................................... 32
3.2.1. CloudSim ....................................................................................................... 33
3.2.2. CloudSim with DVFS .................................................................................... 35
3.2.3. WorkflowSim ................................................................................................ 36
3.2.4. Merged simulator ........................................................................................... 43
3.2.5. Changes with WorkflowSim ........................................................................... 44
3.2.6. Changes with CloudSim ................................................................................ 45
3.2.7. Modifications and additions to achieve the proposed joint simulator ........... 46
3.2.8. Additional notes to the power model ............................................................. 48
3.3. Second stage: scheduling algorithms .................................................................... 50
3.3.1. Power aware scheduling ................................................................................ 50
3.3.2. VM scheduling .............................................................................................. 51
3.3.3. Tasks scheduling ........................................................................................... 53
3.3.4. Bag-of-tasks power aware scheduling ........................................................... 54
3.3.5. Classic schedulers adapted to power ............................................................. 55
3.3.6. Watts per MIPS scheduler for VMs .............................................................. 56
3.3.7. Fuzzy integration in WorkflowSimDVFS ..................................................... 58
3.3.8. VM scheduling FRBS .................................................................................... 61
3.3.9. Tasks scheduler FRBS ................................................................................... 62
3.3.10. Power model analytical .............................................................................. 63
3.3.11. Integration with Matlab ............................................................................. 65
4. Results and discussion ................................................................................................. 67
4.1. DVFS results ......................................................................................................... 67
4.1.1. DVFS savings ................................................................................................ 67
4.1.2. DVFS parameters evolution .......................................................................... 75
4.2. FRBS results ......................................................................................................... 78
4.2.1. FRBS simulation scenario ............................................................................. 78
4.2.2. Rules generated for both FRBS schedulers ................................................... 82
4.2.3. FRBS savings ................................................................................................ 84
5. Conclusions ................................................................................................................. 91
Bibliography ........................................................................................................................ 92
Index of tables
Table 1: basic temperature levels ........................................................................................ 16
Table 2: membership functions of air conditioner FRBS .................................................... 18
Table 3: car FRBS example ................................................................................................. 20
Table 4: CloudSim’s basic example .................................................................................... 34
Table 5: WorkflowSim communication messages between entities ................................... 38
Table 6: WorkflowSim tags meaning .................................................................................. 39
Table 7: Frequency multipliers and MIPS ........................................................................... 48
Table 8: Energy estimation example ................................................................................... 54
Table 9: Time summary (s) ................................................................................................. 68
Table 10: Overall power summary (W) ............................................................................... 70
Table 11: Avg power summary (W) .................................................................................... 72
Table 12: Energy summary (Wh) ........................................................................................ 73
Table 13: physical hosts’ configuration input parameters ................................................... 79
Table 14: physical host’s configuration calculated parameters ........................................... 81
Table 15: VM MIPS in FRBS scenario ............................................................................... 82
Table 16: basic experiments ................................................................................................ 85
Table 17: experiments with fuzzy task scheduler ................................................................ 86
Table 18: experiments with fuzzy VM scheduler ................................................................ 88
Table 19: experiments with both fuzzy schedulers ............................................................. 89
Index of figures
Figure 1: fuzzy temperature levels ...................................................................................... 17
Figure 2: FRBS objects ....................................................................................................... 20
Figure 3: distance membership functions ............................................................................ 21
Figure 4: speed membership functions ................................................................................ 21
Figure 5: acceleration membership functions ...................................................................... 21
Figure 6: car FRBS test {0.8, 0.2}....................................................................................... 22
Figure 7: CloudSim’s basic example ................................................................................... 34
Figure 8: Montage 25 DAG ................................................................................................. 36
Figure 9: WorkflowSim initialization stage ........................................................................ 42
Figure 10: WorkflowSim main stage................................................................................... 42
Figure 11: WorkflowSim ending stage ................................................................................ 43
Figure 12: Time summary (s) .............................................................................................. 69
Figure 13: Time savings (%) ............................................................................................... 69
Figure 14: Overall power summary (W) ............................................................................. 70
Figure 15: Overall power savings (%) ................................................................................. 71
Figure 16: Avg power summary (W) .................................................................................. 72
Figure 17: Avg power savings (%) ...................................................................................... 72
Figure 18: Energy summary (Wh) ....................................................................................... 73
Figure 19: Energy savings (%) ............................................................................................ 74
Figure 20: Utilization evolution .......................................................................................... 76
Figure 21: Multiplier evolution ........................................................................................... 76
Figure 22: Power evolution ................................................................................................. 77
Figure 23: VmWattsPerMipsMetric result values ............................................................... 85
Figure 24: VmWattsPerMipsMetric result savings ............................................................. 86
Figure 25: VmWattsPerMipsMetric result values (2) ......................................................... 87
Figure 26: VmWattsPerMipsMetric result savings (2)........................................................ 87
Figure 27: VmsFuzzy result values ..................................................................................... 88
Figure 28: VmsFuzzy result savings ................................................................................... 89
Figure 29: VmsFuzzy result values (2) ............................................................................... 90
Figure 30: VmsFuzzy result savings (2) .............................................................................. 90
Background (summary)
The Cloud Computing industry has been growing continuously in recent years. More and more companies offer storage or processing services in the cloud. This gives users the ability to access those services from anywhere with just an Internet connection, making tasks such as backing up important data much simpler, as well as providing access to greater processing capacity for executing tasks.
However, the growth of datacenters, which contain ever more servers with ever more resources, is increasing their energy consumption and, therefore, global energy consumption. In addition, the cooling systems needed to avoid high temperatures in datacenters are numerous and consume a large amount of power.
Techniques that reduce power consumption also achieve a reduction in costs, as well as an increase in system reliability, since higher energy consumption generates higher temperatures, and the higher the system temperature, the higher the resulting error rates. As stated in [ 27 ], for every 10 °C increase in a system's temperature, the failure rate doubles. An urgent solution is required.
As mentioned in several papers [ 1 ][ 2 ][ 3 ][ 4 ], energy consumption due to servers amounted to 0.5% of global consumption in 2008, and that consumption is expected to quadruple by 2020. Furthermore, the consumption of a supercomputer is estimated to be equivalent to the combined consumption of 22,000 households in the United States.
There is a large body of previous work related to this project. In [ 43 ], M. Seddiki et al. introduce a fuzzy rule-based power-aware scheduler for virtual machines in the CloudSim and RealCloudSim simulators, but they consider neither task scheduling nor the use of workflows.
In [ 44 ], García-Galán et al. propose a new strategy called KARP as an alternative to the Michigan approach. KARP is based on PSO, whereas Michigan is based on Genetic Algorithms. They apply this new strategy to Grid Computing, obtaining better results than with the genetic alternative. In [ 45 ] they compare different knowledge-acquisition strategies in fuzzy systems, matching KASIA against the Pittsburgh approach and KARP against the Michigan approach, and applying these techniques to Grid Computing. They show how KASIA and KARP, both based on PSO, achieve better results than their genetic alternatives. The KASIA strategy is introduced in [ 46 ], where the advantages of the PSO algorithm over genetic algorithms are shown. They also work with genetic-algorithm-based meta-schedulers in [ 47 ].
Objectives (summary)
The main goal of this project is to reduce the power and energy consumption of datacenters. Every algorithm implemented needs to be tested. To avoid the need to own a datacenter, this project is based on simulations. The choice of simulator is important, since the results obtained should be as close as possible to those of a real datacenter. If the results obtained in a simulator do not correspond to those that would be obtained in a real scenario, the analysis and conclusions are meaningless, and we cannot claim that the developed algorithms save energy in real datacenters.
With this in mind, the first stage of this project consists of building a good simulation scenario as the basis for the second stage, the design of the algorithms that reduce energy consumption. In this way, the results obtained at the end of the process will rest on a sound simulation environment. It must always be kept in mind that the simulation results will not be perfect. Real processing in a real datacenter involves many parameters, and not all of them are considered by simulators. However, simulators are matching the behavior of real datacenters ever more closely and offer researchers a great tool for testing different kinds of algorithms to improve datacenter performance.
There are several power-saving techniques for datacenters; section 3.1.5 presents some of the most important ones. The methods used in this project are based on a combination of the DVFS algorithm and the development of two rule-based expert systems that provide power-aware schedulers for both virtual machine and task scheduling. The DVFS algorithm is already implemented in the simulator chosen as the basis of the project; this document describes its use and analyzes the power and energy savings achieved with it. The two expert systems developed are based on a single-fitness optimization, in which energy is the fitness considered. Throughout this document, the parameters considered in each expert system are explained in detail and the results obtained are analyzed.
Conclusions (summary)
Section 4 of this document shows the results obtained in the two stages of the project. First, experiments are run with the simulation environment obtained and the DVFS algorithm. Section 4.1.1 shows these results, and section 4.1.2 analyzes the evolution of three parameters of the algorithm in the simulator. Table 12 and Figures 18 and 19 show that the best energy-consumption results are achieved by the dynamic governors, as expected. The results obtained for processing time and consumed power are also analyzed, showing how the minimum-power governor introduces delays in the execution, increasing the total processing time and raising the energy consumption.
Likewise, 14 experiments are run for the second stage of the project, the addition of two fuzzy rule-based expert-system schedulers. Section 4.2.1 describes the simulation scenario, showing the characteristics of the virtual and physical machines. Section 4.2.2 shows the results obtained in the 14 experiments, comparing the aforementioned schedulers with other non-rule-based algorithms. The best energy-consumption results are due to the joint use of both the rule-based virtual machine scheduler and the rule-based task scheduler, achieving a total saving of 7.23% as shown in Table 19 and Figures 29 and 30. The results for execution time and power consumption are also analyzed, but since no multi-objective optimization is performed, the time results are not optimized by the expert-system schedulers.
After running and analyzing all the experiments, we can conclude that the developed algorithms achieve a lower energy consumption, which is what we set out to achieve. Although these algorithms and results are based on experiments run in a simulator, and we cannot claim that the results obtained are exactly what would have been obtained by running these algorithms on a real system, we can state that these results are close to those of a real scenario, thanks to the features implemented in the simulator. Not including the DVFS algorithm would not make the fuzzy-logic schedulers save more or less energy, but the results obtained by including it are closer to those of a real scenario, where the processors inside the servers run this algorithm natively.
Additionally, the inclusion of Directed Acyclic Graphs (DAGs) allows us to reaffirm the authenticity of the results and the reduction of the energy consumed with these fuzzy-logic schedulers, since the order in which tasks are executed in these experiments follows a realistic dependency pattern, and their lengths are not randomly generated.
Future developments of this project will include a multi-objective optimization, using both time and energy as fitness parameters in these algorithms. As expected, the results in Table 16 show a good energy reduction, but no large reduction in processing time can be achieved. This is because all the fuzzy-logic and Pittsburgh-based algorithms use energy as the only fitness, not taking execution time into account. The runs that obtain a lower execution time also obtain a higher energy consumption. A multi-objective optimization would strike a balance between these two parameters, achieving values that satisfy both the low-execution-time and low-energy-consumption requirements of Cloud Computing systems.
As for the proposed simulator, being open source it offers researchers around the world a great opportunity to test different algorithms and check the results in time, power and energy, knowing that it can run real traces, reproducing workloads from real scenarios. This can save investment in the early stages of research, when new algorithms are to be tested. Once an algorithm obtains good results in the simulator, it can be tested in a real scenario to confirm its behavior. In any case, this open-source simulator can keep growing at the hands of any researcher who wishes to contribute to the project.
1. Background
The Cloud Computing industry has been growing continuously in recent years. More and more enterprises offer processing or data storage services in their datacenters. This lets users access those services wherever they are, making it much easier to back up and access their important data, as well as to obtain additional processing resources for their tasks.
However, the growth of datacenters, containing more clusters and servers with more resources, is increasing their energy consumption and, hence, the global amount of energy consumed. In addition, the cooling systems needed to avoid high temperatures in datacenters are numerous and consume a large amount of power as well.
Techniques that reduce power consumption also achieve a reduction in operational costs, as well as an increase in system reliability, since higher power means higher temperature and high temperatures lead to higher failure rates. As stated in [ 27 ], for every 10 °C that a system's temperature increases, the failure rate doubles. An urgent solution is needed.
As mentioned in several papers [ 1 ][ 2 ][ 3 ][ 4 ], the energy consumption due to datacenters amounted to 0.5% of global consumption in 2008, and it is expected to keep rising until it quadruples by 2020. Additionally, estimations show that the energy consumed by a supercomputer is equivalent to the energy consumed by 22,000 households in the US.
There is a wide range of previous work related to this project. In [ 43 ], M. Seddiki et al. introduce a power-aware FRBS scheduler for VMs in CloudSim and RealCloudSim, but they consider neither task scheduling nor the use of workflows.
In [ 44 ], García-Galán et al. propose a new strategy named KARP as an alternative to the Michigan approach. KARP is based on PSO, while the Michigan approach is based on Genetic Algorithms. They apply this new strategy to Grid Computing, obtaining better results than with the genetic counterpart. In [ 45 ], they compare different knowledge-acquisition strategies in fuzzy systems, matching KASIA against the Pittsburgh approach and KARP against the Michigan approach, and applying these techniques to Grid Computing. They show how KASIA and KARP, both based on PSO, achieve better results than their genetic alternatives. The KASIA strategy is introduced in [ 46 ], where the advantages of the PSO algorithm over genetic ones are shown. They also cover fuzzy meta-schedulers based on Genetic Algorithms in [ 47 ].
2. Objectives
The main goal of this project is to reduce the power and energy consumption of datacenters. Every algorithm implemented needs to be tested. To avoid the need to own a datacenter, this project is based on simulations. The choice of simulator is important, since the results obtained should be as similar as possible to those of a real datacenter. If the results obtained in a simulator do not match a real scenario at all, the analysis and conclusions are pointless, and we cannot claim that the designed algorithms save energy in real datacenters.
With this in mind, the first stage of this project consists of providing a good simulation environment as the basis for the second stage, the design of the algorithms that reduce energy consumption. This way, the results obtained at the end of the process will rest on a fairly good simulation environment. Note that the simulation results will never be perfect: real processing in a real datacenter involves multiple parameters, and not all of them are considered by simulators. However, simulators are matching real datacenters ever more closely and provide researchers with a great tool for testing different types of algorithms to improve datacenter performance.
There are several power-saving techniques for datacenters; section 3.1.5 presents some of the most important ones. The methods we use are based on a combination of the Dynamic Voltage and Frequency Scaling (DVFS) algorithm and the development of two rule-based expert systems that provide power-aware schedulers in both the VM and the task scheduling scopes. The first algorithm is already implemented in the chosen base simulator; we describe its use and analyze the power and energy savings obtained with it. The two expert systems developed are based on a single fitness function, for which energy is considered. Throughout this document, we explain in detail the parameters considered in each expert system and analyze the results obtained.
3. Methodology
In this section, the overall process of this project is explained, from the starting point to the final results, in enough detail for the reader to reproduce the experiments. It begins with a state of the art, introducing several concepts needed to understand the rest of the project. After that, the overall process is explained in execution order, from the initial situation, describing the problems encountered and the approaches used to solve them, until the simulation environment is built. The same procedure is then followed for the second stage, describing step by step everything needed to obtain the energy-saving algorithms. Results and their analysis are presented in the next section.
3.1. State of the art
In this section we introduce some concepts that need to be understood in order to fully comprehend the project.
3.1.1. Power model
To know whether the techniques developed in this project achieve the intended reduction in energy consumption, we need to be able to measure the power consumed. Each task executed on a resource lasts a certain amount of time, during which that resource consumes a certain amount of power. The relation between both parameters allows us to estimate the energy consumed, following (1).
𝐸 = 𝑃 · 𝑡 ( 1 )
This relation is of great importance in this project, as the balance between power consumption and execution time limits the energy that can be saved. A machine that can execute a task in less time usually does so because it has more resources, which tend to consume more power. If the time saved is offset by the extra power consumed, no energy has been saved at all, so this relation is at the center of the entire project.
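To make this balance concrete, here is a small illustrative sketch. The MIPS ratings and power draws are invented numbers, not measurements from this project: they only show how a faster machine can save time yet spend more energy.

```python
# Illustrative sketch of the E = P * t trade-off (equation (1)).
# All host parameters below are hypothetical example values.

def energy_wh(power_w: float, time_s: float) -> float:
    """Energy in watt-hours from average power (W) and duration (s)."""
    return power_w * time_s / 3600.0

task_length_mi = 36_000                   # task length, millions of instructions
slow = {"mips": 1000, "power_w": 100}     # slow but frugal host (made up)
fast = {"mips": 2000, "power_w": 250}     # fast but power-hungry host (made up)

for host in (slow, fast):
    t = task_length_mi / host["mips"]     # execution time in seconds
    host["energy_wh"] = energy_wh(host["power_w"], t)

# The fast host halves the time but more than doubles the power,
# so for this task it consumes MORE energy overall:
print(slow["energy_wh"], fast["energy_wh"])  # 1.0 Wh vs 1.25 Wh
```

This is exactly the situation the text warns about: halving the time only saves energy if power less than doubles.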
Execution time is easily obtained as the ratio between the task's length and the performance of the machine. The power parameter is a more delicate issue. The widely used power model considers energy consumption as the sum of two parts: a dynamic and a static component [ 28 ].
𝐸 = 𝐸𝑑 + 𝐸𝑠 ( 2 )
The dynamic component depends directly on the processor, while the static component depends on memory, I/O devices and storage. The dynamic power can easily be modified, as will be seen in section 3.1.2, but the static part cannot. This is why the static component is normally expressed in relation to the dynamic component. The dynamic power is estimated as in (3) [ 29 ], where it is related to the square of the processor voltage, the frequency of the processor, its capacitance and a constant 𝑘𝑑.
𝑃𝑑𝑦𝑛𝑎𝑚𝑖𝑐 = 𝑘𝑑𝐶𝑉2𝑓 ( 3 )
The power considered in equation (1) is this dynamic component, the part that depends on the CPU parameters. So, combining (1) and (3), we get the dynamic energy.
𝐸𝑑 = 𝑃𝑑 · 𝑡 ( 4 )
The static component of the energy is often considered a fraction of the dynamic component (5) [ 30 ][ 31 ], where 𝑘𝑠 is a constant.
𝐸𝑠 = 𝑘𝑠𝐸𝑑 ( 5 )
With this consideration, and joining equations (2) to (5) [ 28 ]:
𝐸 = 𝐸𝑠 + 𝐸𝑑 = 𝑘𝑠𝐸𝑑 + 𝐸𝑑 = (1 + 𝑘𝑠)𝐸𝑑 ∝ 𝐸𝑑(𝑉, 𝑓) ( 6 )
This is why the variation of the global energy is considered proportional to the dynamic component, the one we can vary through the CPU's parameters, voltage and frequency, while always taking into account the balance expressed in (1). The technique introduced in the next section also depends on this balance. Always keep in mind that reducing a machine's performance reduces the power consumed, but increases the execution time of the tasks the machine executes.
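The model in equations (2)-(6) can be sketched in a few lines of code. The constants k_d, C and k_s below are arbitrary placeholders, not measured values from any real processor:

```python
# Sketch of the power model in equations (2)-(6); all constants are
# illustrative placeholders, not measured CPU parameters.

def dynamic_power(k_d: float, capacitance: float, voltage: float, freq_hz: float) -> float:
    """P_dynamic = k_d * C * V^2 * f, equation (3)."""
    return k_d * capacitance * voltage ** 2 * freq_hz

def total_energy(p_dynamic: float, time_s: float, k_s: float) -> float:
    """E = (1 + k_s) * E_d with E_d = P_d * t, equations (4)-(6)."""
    e_dynamic = p_dynamic * time_s
    return (1.0 + k_s) * e_dynamic

# Lowering voltage and frequency together cuts dynamic power superlinearly,
# because voltage enters squared:
p_high = dynamic_power(k_d=1.0, capacitance=1e-9, voltage=1.2, freq_hz=2.0e9)
p_low = dynamic_power(k_d=1.0, capacitance=1e-9, voltage=1.0, freq_hz=1.5e9)
print(p_high, p_low)  # about 2.88 W vs 1.5 W
```

Note how the voltage-squared term makes joint voltage/frequency reduction so effective, which is the basis of the DVFS technique introduced next.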
3.1.2. Dynamic Voltage and Frequency Scaling (DVFS)
Modern CPUs already include this technique, which dynamically varies the voltage and frequency of the CPU to adapt to the workload, measured as CPU utilization. Its main objective is to reduce energy consumption. Its behavior depends on the selected governor, of which there are two basic types: static and dynamic. While static governors use fixed values for voltage and frequency, dynamic governors are based on thresholds, so voltage and frequency are varied according to the current utilization with respect to the configured threshold. It is important to note that the available voltage and frequency levels depend on each processor. Those values are exposed as multipliers, so there is a discrete set of values.
In this project, we work with 5 different governors, introduced below along with their differences.
Static
Performance: fixes voltage and frequency at the values that achieve maximum performance. This is the default configuration, i.e., as if the DVFS algorithm were not active. It always achieves the lowest execution time on the machine, but incurs the highest power consumption.
PowerSave: the opposite of the Performance governor. It fixes the lowest voltage and frequency values, achieving the lowest possible power consumption, but lengthening the execution and delaying the end time.
UserSpace: allows the user to select the desired voltage and frequency values. In this project, few tests have been performed with this governor, due to its dependency on the processor's available multiplier values and on the user's current selection.
Dynamic
OnDemand: compares the current CPU utilization with a preselected threshold. In the simulator used, this threshold is configurable; its default value is 95%.
Conservative: differs from the previous governor in considering two thresholds instead of one, called up_threshold and down_threshold. Their default values in the simulator are 80% and 20% respectively.
The behavior of the dynamic governors makes it possible to increase performance when needed, if CPU utilization exceeds the up_threshold, and to lower it to save power when utilization falls below the down_threshold. Note that in OnDemand both thresholds are the same value. To avoid constant variation of the multipliers, an iterator is introduced: when utilization falls below the threshold in OnDemand, the iterator counts a number of steps, and if the utilization is still under the threshold after it reaches a certain value, the multiplier is lowered one step.
A more detailed comparison of these governors is shown in the results, where we
indicate which of them succeed in reducing the energy consumption and which problems
arise with some of them.
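The threshold-and-iterator behavior described above can be sketched as follows. This is an illustrative model of an OnDemand-style governor, not the actual simulator or Linux kernel code; the class name, the multiplier values and the `patience` parameter are assumptions made for the example:

```python
# Illustrative OnDemand-style governor sketch (assumed names and values):
# the multiplier is raised to the maximum as soon as utilization exceeds
# the threshold, and lowered one step only after utilization has stayed
# below the threshold for `patience` consecutive samples.

class OnDemandGovernor:
    def __init__(self, levels, threshold=0.95, patience=3):
        self.levels = levels          # available multipliers, ascending
        self.idx = len(levels) - 1    # start at maximum performance
        self.threshold = threshold
        self.patience = patience
        self.counter = 0              # samples spent below the threshold

    def step(self, utilization):
        if utilization > self.threshold:
            self.idx = len(self.levels) - 1   # jump to maximum when busy
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience and self.idx > 0:
                self.idx -= 1                  # lower one step, then wait again
                self.counter = 0
        return self.levels[self.idx]
```

With `OnDemandGovernor([6, 8, 10, 12])`, a burst of high utilization immediately returns the top multiplier, while a sustained idle period lowers it one step every `patience` samples, mirroring the anti-oscillation iterator described in the text.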
The general objective of this algorithm is to reduce both voltage and frequency
when the utilization is low. While utilization stays below 100% there is no further delay in
the execution's end time. On the other hand, if utilization exceeds the threshold or even
reaches 100%, voltage and frequency are switched up until they reach their maximum, so
as to use the full performance of the CPU if the execution of those tasks needs it.
Now consider 𝑃𝑖𝑑𝑙𝑒 as the amount of power consumed when the CPU utilization is
0, and 𝑃𝑏𝑢𝑠𝑦 the power consumed when the utilization is 100%. This technique is based on
knowing that, even though 𝑃𝑖𝑑𝑙𝑒 and 𝑃𝑏𝑢𝑠𝑦 correspond to utilizations 𝑢 = 0 and 𝑢 = 1
respectively, power depends on (3), meaning that as this technique decreases both voltage
and frequency, power consumption is reduced. A CPU can work at a certain number of
different frequencies depending on the multiplier, and voltage is scaled with the frequency:
lower frequencies allow lower voltages to be selected on the CPU, as both parameters are
interrelated. The basic power model used in the simulator assumes that the 𝑃𝑖𝑑𝑙𝑒 and 𝑃𝑏𝑢𝑠𝑦
values were obtained experimentally on a modern CPU that supports DVFS, meaning that
it is possible to measure both parameters at different values of the multiplier, with different
frequencies and voltages, so that an array of values is obtained for each of them.
Using DVFS in the simulator we can estimate the power and energy consumption
based on a real CPU's behavior, as modern processors implement this technique. From the
𝑃𝑖𝑑𝑙𝑒 and 𝑃𝑏𝑢𝑠𝑦 values and the utilization level of the CPU, we can estimate the power
consumed by the CPU of a computer as a linear function of the CPU's utilization,
𝑃(𝑢) = 𝑃𝑖𝑑𝑙𝑒 + (𝑃𝑏𝑢𝑠𝑦 − 𝑃𝑖𝑑𝑙𝑒)𝑢 ( 7 )
while this utilization u is also used to check the thresholds of the selected governor and to
scale frequency and voltage so as to minimize the overall energy consumption.
This way, we can estimate power and energy consumption in a simulator while also
considering the savings introduced by the DVFS algorithm. This gives us a simulation
environment closer to the behavior of a real processor.
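As an illustration of how equation ( 7 ) turns the measured arrays into a power estimate, the sketch below assumes one (𝑃𝑖𝑑𝑙𝑒, 𝑃𝑏𝑢𝑠𝑦) pair per multiplier level, as described above; the wattage values are made-up assumptions, not measurements:

```python
# Linear power model of Eq. (7), sketched per multiplier level.
# Wattage values are illustrative assumptions, one entry per level.
P_IDLE = [82.7, 88.8, 94.9, 105.0]
P_BUSY = [130.0, 140.0, 153.0, 173.0]

def power(level, u):
    """Estimated power draw (W) at a frequency level and utilization u in [0, 1]."""
    return P_IDLE[level] + (P_BUSY[level] - P_IDLE[level]) * u
```

For example, at the second level and 50% utilization this interpolates halfway between the idle and busy measurements of that level.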
3.1.3. Fuzzy Logic
This type of logic is commonly used in Fuzzy Rule Based Systems (FRBS). Its main
concept is that the truth value of a variable can be a real number between 0 and 1, instead
of just true or false. The range of values covered by each level of each variable is specified
by its membership functions. This type of logic makes systems work closer to the way
humans evaluate the state of a parameter.
A basic example would be measuring the temperature of water. Classic logic based
on true/false variables only yields discrete categories for the input parameters, while fuzzy
logic allows defining a combination of the different membership functions based on the
degree of membership in each of them. In the case of water temperature, we could define
different temperature levels based on the value in ºC. For example, defining 5 membership
functions we could have the following values:
Level       Temperature
Very cold   <10 ºC
Cold        >10 ºC AND <20 ºC
Warm        >20 ºC AND <30 ºC
Hot         >30 ºC AND <40 ºC
Very hot    >40 ºC
Table 1: basic temperature levels
Using classic logic based on IF ELSE statements, a temperature of 34 ºC would be
classified as HOT according to Table 1. However, every temperature between 31 ºC and
39 ºC would likewise be classified as HOT, implying that all these values represent the
same kind of temperature. This is not how we humans think, as everyone would agree that
36 ºC is hotter than 32 ºC. This cannot be expressed with traditional logic, which is why
the concept of fuzzy logic was created.
With this new type of logic, we can define in a system different ranges of
temperature values for each specified level. This way, a value of 34 ºC belongs to both the
WARM and HOT levels, as it is an intermediate point between them. The width of each
membership function is chosen by the user who configures them. Figure 1 below shows a
possible configuration for these membership functions. A width of 20 has been set for each
of them, allowing each temperature to belong to two membership functions.
There are several ways of defining the membership functions in a FRBS. Typical
methods include triangular, pyramidal and Gaussian functions, among others. This
temperature example uses triangular functions. Some systems such as jFuzzyLogic describe
triangular and pyramidal membership functions as a set of points, while other authors, as in
[ 5 ], describe pyramidal functions as two points and two slopes. Matlab makes Gaussian
functions easy to use. Figure 1 has been generated with jFuzzyLogic and, as a simple
example, we show the code to build this antecedent.
FUZZIFY temperature
    TERM very_cold := (0, 1) (5, 1) (15, 0);
    TERM cold := (5, 0) (15, 1) (25, 0);
    TERM warm := (15, 0) (25, 1) (35, 0);
    TERM hot := (25, 0) (35, 1) (45, 0);
    TERM very_hot := (35, 0) (45, 1) (50, 1);
END_FUZZIFY
Figure 1: fuzzy temperature levels
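The membership grades drawn in Figure 1 can be reproduced with a small sketch. The interior terms are triangles and the outer terms have flat shoulders, mirroring the TERM definitions above; this is an illustrative re-implementation, not jFuzzyLogic itself:

```python
# Membership grades for the Figure 1 antecedent (illustrative sketch).

def triangle(x, a, b, c):
    """Rises linearly from a to the peak b, falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def memberships(t):
    return {
        # very_cold and very_hot keep a flat shoulder at the extremes
        "very_cold": 1.0 if t <= 5 else max(0.0, (15 - t) / 10),
        "cold": triangle(t, 5, 15, 25),
        "warm": triangle(t, 15, 25, 35),
        "hot": triangle(t, 25, 35, 45),
        "very_hot": 1.0 if t >= 45 else max(0.0, (t - 35) / 10),
    }
```

For 34 ºC this yields warm = 0.1 and hot = 0.9: the value belongs mostly to HOT but partly to WARM, exactly the overlapping behavior discussed above.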
Normally in a FRBS, the input value range is not specified as in Figure 1, with
temperature values between 0 and 50 ºC; instead, inputs are normally normalized between
0 and 1. The input value then needs to be normalized by the maximum possible input
temperature, which in this case is 50 ºC. This is done to allow a single FRBS to work in
different systems. Imagine that we have two systems that measure temperature. If one
system considers temperatures between 0 and 50 ºC, and the other one takes values
between 0 and 100 ºC, we can do two different things. As the temperature ranges are
different, we can simply build two FRBS. This, however, is like having the same engine
twice, which is not really efficient. The other option is building just one FRBS over the
range 0 to 1, and making both systems use it. The only requirement is that, prior to sending
the input parameter, system one normalizes the temperature by its maximum of 50 ºC, and
system two by 100 ºC. This way we can avoid having multiple FRBS, provided that both
systems consider the same number and type of antecedents.
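The normalization step amounts to a single division, as in this minimal sketch; the function name is an assumption:

```python
# Each system divides its reading by its own maximum before sending it
# to the shared FRBS, so one engine serves both ranges.

def normalize(value, maximum):
    return value / maximum

x1 = normalize(34, 50)    # system one (0-50 ºC)
x2 = normalize(34, 100)   # system two (0-100 ºC)
```

The same physical temperature of 34 ºC thus maps to different normalized inputs (0.68 and 0.34) of the single shared engine.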
This is the case of the antecedents, which is the name given to the input
parameters. But fuzzy logic involves other objects apart from the antecedents.
- Fuzzification:
This is the first stage. Input parameters are received here after they have been
normalized, and this engine is in charge of determining the degree of membership of the
normalized input parameter in each membership function. Following the temperature
example, a value of 27 ºC would belong to a high degree to the membership function
WARM, and would also belong, to a lower degree, to HOT. This engine determines the
degree of membership in each membership function. These membership grades are then
used by the inference engine to evaluate the rules.
- Rule base:
The rule base includes all the rules that are executed in the inference process.
Rules are defined as IF THEN clauses that take the degree of membership of some or all of
the antecedents in a specified membership function and use them to obtain a value for the
output consequent in a membership function. This is more easily understood with an
example. Following the FRBS specified in [ 5 ], we have a system with two antecedents,
the temperature and the variation of the temperature, and one consequent, the output
response of the air conditioner. Table 2 shows the membership functions of this system.
Antecedents                    Consequent
Temperature    Variation       Output
Cold           Slow            Very low
Cool           Moderate        Low
Mild           Fast            Medium
Warm                           High
Hot                            Very high
Table 2: membership functions of the air conditioner FRBS
A sample rule would be: if the measured temperature is too low, belonging to Cold,
and the variation of the temperature is high, belonging to Fast, meaning that the
temperature is dropping at a high rate, we would want the air conditioner to heat the room
fast, so the output response would be Very high. In fuzzy logic, this rule is expressed as
follows:
IF temperature IS cold AND variation IS fast THEN output IS very-high
This is an example of a possible rule. Keep in mind that there can be several rules
in the FRBS and all of them are evaluated each time the system processes the input
parameters to obtain an output value. The system therefore needs a set of rules that covers
the majority of the situations it can find when receiving the input parameters. If we have
rules for both low and high temperature values, the measured input temperature will
obviously be either high or low, but never both at the same time. What the fuzzifier does in
the first stage is obtaining the degree of membership of each input in each of its
membership functions. Most of these membership grades will be 0, so when evaluating a
set of rules, if the input temperature is very low and its degree of membership in the high
membership function is 0, the evaluation of the rules involving the high membership
function yields a null value for the output.
There are two types of rules: those whose antecedents are combined with AND
clauses and those that use OR clauses. The main difference between them is as follows:
while AND rules set the consequent value to the minimum value over all the antecedents,
OR rules set it to the maximum. In the rule shown above, if the degree of membership of
temperature in cold is 0.8 and that of variation in fast is 0.3, the degree to which output is
set to very-high will be 0.3, since AND takes the minimum. Changing the rule to OR with
the same antecedent values would yield an output grade of 0.8.
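The min/max rule activation reduces to two one-line functions, sketched here with the air conditioner grades used above (cold = 0.8, fast = 0.3):

```python
# AND rules take the minimum of the antecedent membership grades,
# OR rules take the maximum (illustrative sketch).

def and_rule(*grades):
    return min(grades)

def or_rule(*grades):
    return max(grades)

cold, fast = 0.8, 0.3
very_high_if_and = and_rule(cold, fast)   # activation under AND
very_high_if_or = or_rule(cold, fast)     # activation under OR
```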
- Inference engine:
While the rule base only stores the rules defined by the user, which are the same in
every evaluation, the input parameters change in each evaluation and the output value
varies accordingly. This evaluation of each rule with the current values of the antecedents
is performed in the inference engine, which receives from the fuzzifier the degree of
membership of each antecedent in its membership functions. After this evaluation, the
system has an output value for each rule.
- Defuzzification:
This is where all the rule output values are combined into a single normalized
output value. The typical way to merge them is the Center of Gravity (COG) method. Once
the COG has been computed, the defuzzifier obtains the normalized output value based on
the membership functions of the consequent. This normalized value then needs to be
denormalized in the corresponding system to obtain an appropriate output value from the
FRBS.
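The COG computation can be sketched as a weighted average. Real engines integrate over the clipped membership shapes of the consequent; the singleton simplification used here (each rule contributes its activation grade at the center of its output membership function) is an assumption made for brevity:

```python
# Center of Gravity over singleton consequents (simplified sketch):
# the defuzzified output is the grade-weighted average of the centers
# of the output membership functions fired by the rules.

def center_of_gravity(activations):
    """activations: list of (center, grade) pairs, one per fired rule."""
    num = sum(center * grade for center, grade in activations)
    den = sum(grade for _, grade in activations)
    return num / den if den else 0.0
```

For example, two fired rules with centers 0.5 and 1.0 and grades 0.4 and 0.6 give a COG of 0.8.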
Figure 2 shows a diagram of this set of objects used in a FRBS.
Figure 2: FRBS objects
To better understand all these concepts, here we show an example of a FRBS
consisting of a system that evaluates the distance and speed of a car at a given time and
returns the acceleration of the car. The system needs to move the car towards a wall set at a
certain distance and stop it at the wall. Table 3 shows the membership functions of the
antecedents and the consequent.
Antecedents                 Consequent
Distance    Speed           Acceleration
Close       Slow            Much-brake
Medium      Medium          Brake
Far         Fast            Keep-speed
                            Accelerate
                            Much-accelerate
Table 3: car FRBS example
In this example, both antecedents have a normalized range of [0 1], as neither
distance nor speed can be negative. A negative distance would mean that the wall has been
passed and a negative speed would mean going backwards, and both are undesired
situations. However, the output acceleration needs to be negative in those cases where the
car should brake and reduce its speed, so the normalized range of the consequent is [-1 1].
Figures 3, 4 and 5 show the membership functions listed in Table 3.
Figure 3: distance membership functions
Figure 4: speed membership functions
Figure 5: acceleration membership functions
To test this FRBS, the rules first need to be generated. According to the scenario,
the car should move fast when it is far from the wall, but gradually reduce its speed as it
gets closer to its destination. For testing purposes, 9 rules are generated that fit these
requirements:
1. IF distance IS close AND speed IS slow THEN acceleration IS keep-speed
2. IF distance IS close AND speed IS medium THEN acceleration IS brake
3. IF distance IS close AND speed IS fast THEN acceleration IS much-brake
4. IF distance IS medium AND speed IS slow THEN acceleration IS accelerate
5. IF distance IS medium AND speed IS medium THEN acceleration IS keep-speed
6. IF distance IS medium AND speed IS fast THEN acceleration IS brake
7. IF distance IS far AND speed IS slow THEN acceleration IS much-accelerate
8. IF distance IS far AND speed IS medium THEN acceleration IS accelerate
9. IF distance IS far AND speed IS fast THEN acceleration IS keep-speed
Now to see how these rules work, we will test the FRBS with input values of
{distance, speed} = {0.8, 0.2}.
Figure 6: car FRBS test {0.8, 0.2}
As can be seen, since these are AND rules, all those involving close distance or fast
speed get an output of 0. The COG is computed with the rest of the output values and, as
expected, the FRBS indicates to keep accelerating, as the distance is relatively far for the
speed at which the car is moving.
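The whole pipeline of this car example can be sketched end-to-end as follows. The membership breakpoints (evenly spaced triangles over [0, 1]) and the singleton output centers on [-1, 1] are assumptions, since the thesis defines them only graphically in Figures 3, 4 and 5; the exact number therefore differs from the jFuzzyLogic output of Figure 6, but the qualitative result for {0.8, 0.2} is the same, a positive acceleration:

```python
# End-to-end car FRBS sketch: fuzzify, 9 AND rules, singleton COG.
# Breakpoints and output centers are illustrative assumptions.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    """Three evenly spaced terms over [0, 1]:
    low = close/slow, mid = medium, high = far/fast."""
    return {"low": tri(x, -0.5, 0.0, 0.5),
            "mid": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 1.5)}

# Output singletons on [-1, 1] for the five acceleration terms.
CENTERS = {"much-brake": -1.0, "brake": -0.5, "keep-speed": 0.0,
           "accelerate": 0.5, "much-accelerate": 1.0}

# (distance term, speed term) -> acceleration term, rules 1-9.
RULES = {("low", "low"): "keep-speed",       ("low", "mid"): "brake",
         ("low", "high"): "much-brake",      ("mid", "low"): "accelerate",
         ("mid", "mid"): "keep-speed",       ("mid", "high"): "brake",
         ("high", "low"): "much-accelerate", ("high", "mid"): "accelerate",
         ("high", "high"): "keep-speed"}

def evaluate(distance, speed):
    d, s = fuzzify(distance), fuzzify(speed)
    num = den = 0.0
    for (dt, st), out in RULES.items():
        grade = min(d[dt], s[st])           # AND rule: minimum
        num += CENTERS[out] * grade
        den += grade
    return num / den if den else 0.0
```

With {distance, speed} = {0.8, 0.2}, only rules 4, 5, 7 and 8 fire (everything involving close or fast gets grade 0) and the defuzzified output is positive: keep accelerating. An input near the wall at high speed, such as {0.1, 0.9}, yields a negative output, i.e. braking.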
The output we get from a FRBS depends to a great extent on the rules configured
by the user. As it is difficult for a user to find the rule configuration that achieves the best
possible result, these FRBS are normally used together with another system that optimizes
the rules. This is the case of the Pittsburgh and KASIA approaches, among others. These
systems use meta-heuristic algorithms, such as the genetic algorithm and Particle Swarm
Optimization (PSO) respectively, to evaluate an initially randomly generated population of
members or particles and optimize them through combinations oriented towards the best
solutions found.
Here the concepts of exploration and exploitation come in. If the system is too
focused on exploring, the chances of reaching a good final value are low, as the system
does not converge towards the good solutions. On the other hand, if the system is too
focused on exploiting, there is a high chance of getting stuck in a local optimum in the case
of PSO, or of incurring elitism due to a high selective pressure. For these two reasons, a
correct balance between the two approaches needs to be found.
This way, using one of these systems we can find a set of rules that achieves a
better result than the rules we tried manually.
3.1.4. Cloud computing types
After years of research, the community has developed several different types of
Cloud Computing networks, each of them based on offering a different service to its users.
Thus, all the types are named after a service, using the nomenclature XaaS (X as a Service)
[ 23 ]. The most important among them are SaaS (Software as a Service), PaaS (Platform
as a Service), IaaS (Infrastructure as a Service) and HaaS (Hardware as a Service). There
are many different taxonomies from different enterprises, as each of them tries to define
Cloud Computing in its own way.
The mentioned types follow a layered architecture, with SaaS at the top level and
HaaS at the bottom. Their order expresses the level of privileges that the final user gets in
each of them. Here we describe what is done at each level.
- Application level
Contains the actual application that is offered to the users, which varies depending
on the enterprise. The use of this type of service brings several benefits from the point of
view of the final user. First, the final user does not need specialized knowledge to support
the system. This responsibility lies with the enterprise that offers the cloud computing
service, giving the final user the availability of the application at all times and ensuring
that the application works as intended.
- Platform level
If the final user needs more privileges in the cloud and prefers more flexibility, this
is the second layer of the stack. At this level, the final user is in charge of managing the
whole operating system instead of just making use of a built application. This gives the
flexibility of being able to work with the cloud server as if it were a local server, with the
corresponding responsibilities over the system and the applications.
The final user is given a platform with a certain performance, without being able to
decide on the VM characteristics or the physical host management. However, this may be
an advantage for some final users, as they do not need the knowledge required to manage
these VMs, which reduces the complexity of the system on their side. As in SaaS,
hardware failures remain the responsibility of the IT enterprises that offer the service,
which spares the final user from buying any hardware in case some of it stops working.
- Infrastructure level
In case the final user wants still more privileges and absolute control over a
physical host, this layer offers a whole server of the demanded characteristics, together
with the right to create VMs and decide how to divide the host's resources. Still, final users
are relieved of tasks such as security, backup and data partitioning. Three of the most
important virtualization technologies used at this layer of the cloud computing stack are
KVM [ 24 ], VMware [ 25 ] and XEN [ 26 ]. Virtualization technologies allow configuring
the whole performance of the physical host and dividing it into different VMs. These VMs
are configurable, making it possible to choose the performance capacity of each one of
them. This allows a dynamic resource management, with systems modifying the
capabilities of each VM to achieve the optimal working of the whole server and system.
Another great advantage for users of this layer is always being able to use the latest
technology. If the user prefers a local server in order to have full control over it, after some
time the hardware will become obsolete and another server will need to be bought.
However, using these IaaS solutions the final user always gets the performance being paid
for, without worrying about hardware. This allows users to compete at a much lower cost
than if they needed to regularly acquire new hardware.
- Hardware level
This level is normally managed by the IT enterprise that offers the cloud service to
the final users. In this layer, the physical resources of the Datacenter are managed,
including the physical servers, routers, switches, and the systems that provide power and
cooling. Operations such as hardware configuration, failure, power, cooling and traffic
management take place at this level.
In the case of Hardware as a Service (HaaS), hardware belonging to the IT
enterprise is installed at the client's site. An SLA indicates the responsibilities of each
party, and the client pays for the use of that hardware. This way, the final user is able to
manage the hardware, while leaving the replacement of broken parts to the IT owner.
Among the numerous advantages of this solution, some are shared with the IaaS
alternative. For example, with this strategy the final user does not need to worry about
hardware obsolescence: whenever the required level of performance increases, the IT
enterprise can replace the hardware with another with more resources, increasing the fee
the user pays to the owner of the hardware. The replaced hardware can then be used for
another client who does not need as much performance.
Additionally, in case of a hardware failure, the IT enterprise is in charge of
replacing it with new hardware, as this is one of its responsibilities. This way, the amount
of funds that final users need to invest each time a hardware upgrade is needed is much
lower than if they had to buy the hardware themselves.
Also, in case of failures, the Managed Services Provider (MSP) can do the
troubleshooting and help solve the problem. Maintenance can be delegated to them too,
making it much easier to keep the hardware working as intended.
Another advantage is scalability. If a client needs to increase the amount of
hardware in the Datacenter, the MSP can easily install additional hardware and increase
the fee. Although this is the obvious case, there is another possibility that is also important
to note. If the client had bought the hardware and the enterprise then shrinks, less hardware
is needed, but hardware already bought means money already invested. With HaaS
solutions, it is as simple as having the MSP take away some hardware and reduce the fee
paid.
The information above concerns the architecture of cloud computing. There is also
another classification of cloud computing, by deployment type.
- Public clouds
The services offered by the cloud are available to every user. These networks
normally do not require an initial investment, but lack control over the managed data.
Also, the security offered by their applications or storage services may not be enough for
some clients, who will then consider paying for a private cloud if their data is too
important to risk.
- Private clouds
These clouds are designed for the use of a single organization, which provides a
higher level of privacy than free public solutions. They offer the highest level of control
over reliability, performance and security, as they can be managed by the same
organization that pays for them. In the case of Datacenters run by the organization itself,
they may incur high costs and require large amounts of space to allocate the physical
machines within the organization. Funds also need to be invested in management and
maintenance, which increases costs further. In this case, the organization may even lose
the benefit of offloading management, partly defeating the concept that makes cloud
computing a great solution for many organizations.
- Hybrid clouds
Being a combination of the public and private alternatives, these networks try to
solve the problems encountered when running either of them alone. The infrastructure is
split into two parts, one in a public cloud and another in a private cloud, which yields
higher flexibility than running only one of them. This offers the possibility of storing the
organization's important data in the private part, while letting some applications run as a
SaaS solution in the public network.
They offer a higher level of control and data security than purely public solutions,
while still facilitating the scalability of the network by allowing the Datacenter capabilities
to be extended with external services in public clouds. Another advantage is meeting
temporary capacity needs, at those times when a high amount of resources is needed to
process a high amount of data. External resources can then be used to handle a peak of
burst data. This means that the client pays for those extra resources only when they are
needed, avoiding the need to own a huge private cloud to meet the users' SLA at all times
and avoid breaching their agreements. This way, the private Datacenter can be built to
support average workloads, letting external resources deal with additional traffic.
The difficult part of this mixed solution is deciding and optimizing the best
division of components between the two parts.
- Virtual private clouds
This is another solution to the problems found when running a solely public or
private cloud. Essentially, it consists of a platform running over a public cloud, with the
difference of using Virtual Private Networks (VPN) that allow defining custom topologies
and security settings.
3.1.5. Power saving techniques in Datacenters
There are many different techniques to achieve a reduction in the power consumed
by Datacenters. In this section we summarize some of the most important ones and
indicate the direction this project takes within the power saving scope.
1- DVFS
As shown before, DVFS [ 8 ][ 9 ] is a technique that, depending on the type of
governor used, adjusts both the frequency and voltage levels of a processor, adapting them
to the workload. This technique allows a higher performance at a higher power cost or, on
the other hand, achieves a power reduction at the cost of performance. The algorithm is
included in the Linux kernel, which makes it easy to run on servers. This technique is not a
substitute for other algorithms in the same scope, but an addition that further increases the
level of power saving in datacenters.
As the reduction in power consumption also incurs a reduction in performance, it is
not wise to set power to the minimum, as this would make it difficult to meet the
agreements set in the user's SLA. To achieve the maximum power saving, the frequency
and voltage levels need to be set to the lowest level that still fulfills the SLA, keeping both
performance and power at the minimum possible levels. In the past, when this technique
was not yet implemented, servers continuously kept both frequency and voltage at their
maximum, consuming at all times the maximum power and depending only on the
utilization. This meant an extremely high amount of power consumed in idle servers,
which is now considerably lower.
2- Power Capping and Shifting
This technique is similar to DVFS. It considers the power/performance balance, but
the algorithm works differently. First, power capping limits the maximum amount of
power that each server can consume. Then, power shifting adjusts the power of each
cluster considering the cap set in the first stage. The main difference between this
algorithm and DVFS is the network component: while DVFS adjusts the power levels
individually in each server, this technique allows setting the power levels across different
servers within a cluster. [ 6 ][ 7 ] show the effectiveness of this algorithm in the power
saving scope.
3- Server virtualization
Another typical technique to reduce power is virtualization, which allows running
several virtual machines on a single physical server. The benefits of this approach are
numerous.
First, the costs incurred by a single server are lower than those of running several
of them. Even for a server with a high amount of resources, able to run several virtual
machines, the hardware cost of this server is lower than that of several individual servers.
Another important consideration is that the normal state of a server is low
utilization, leaving the majority of its resources unused. Because of this, a big server
hosting them as virtual machines does not need the sum of the idle resources of the
individual servers, as a virtual machine will only need a high amount of resources and high
performance on rare occasions.
Additionally, the amount of power idly consumed by the single server is far lower
than that of a group of servers, as fewer processors are needed in the first case. The power
consumed by the cooling systems is also lower, helping to reduce the power consumed in
this part too.
Another great characteristic of having multiple VMs on a single server is the
possibility of consolidating the workloads of different VMs into a single one without
network traffic delays. Also, being able to migrate VMs between physical machines [ 10 ]
means that, when VMs are running on two or more physical machines with a really low
utilization, all of those VMs can be migrated onto one of the servers and the rest can be
turned off, provided that the sum of the workloads can be executed on that single server.
This eliminates the static component of the power consumed by the idle servers, which is
too high to be considered negligible. These characteristics give virtualized servers a
flexibility that helps to further reduce the power consumed in datacenters [ 11 ][ 12 ].
4- Server consolidation
From the migration characteristic another technique is derived, which helps reduce
the power consumption of a Datacenter by consolidating the workloads of different servers
into the minimum number of them running at full performance, switching the rest to a low
power state or even off to eliminate most of the idle power of the servers.
The basis of this technique is the following: the static part of the power consumed
by a server, the part that does not depend on the processor and cannot be lowered with
algorithms such as DVFS, is relatively high, between 30% and 70% depending on the
server. If we cut off this component in, for example, 7 out of 10 servers and leave only 3
running at full performance, we indeed obtain a great reduction in the power consumed.
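A back-of-the-envelope check of this claim can be sketched with the linear power model of equation ( 7 ); the 200 W busy power and the 50% static share (within the 30%-70% range cited above) are assumptions made for the example:

```python
# Consolidation saving sketch: 10 identical low-utilization servers
# consolidated onto 3 fully loaded servers, 7 turned off.
P_BUSY = 200.0        # W at 100% utilization (assumed)
P_IDLE = 100.0        # static share of 50% of P_BUSY (assumed)

def power(u):
    """Linear power model of Eq. (7)."""
    return P_IDLE + (P_BUSY - P_IDLE) * u

before = 10 * power(0.2)          # ten servers at 20% utilization
after = 3 * power(1.0) + 7 * 0.0  # three busy servers, seven off

saving = 1 - after / before       # fraction of power saved
```

Note that the consolidated work fits: ten servers at 20% utilization amount to two server-equivalents of load, which three fully loaded servers can absorb, and under these assumed numbers half of the original power is saved.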
The main problem of this technique is that, if a high amount of tasks suddenly
arrives at the Datacenter, it takes high response times and transition costs to turn on the
powered-off servers and migrate VMs between them. One of the proposed solutions to this
problem is introduced by Anagnostopoulou et al. [ 13 ], who describe a design in which
servers are switched to a low-power barely-alive state, with most of their components off
but still accessible to requests, incurring a much lower delay when they need to be brought
up. Another approach, by L. Liu et al. [ 14 ], introduces live migration of VMs between
physical machines, so that server consolidation can be performed continuously, migrating
VMs even while they are being executed without users noticing the migration. This is a
great addition, as the system does not need to stop the VMs to migrate them, but can
instead periodically run the consolidation process to optimize the power consumption,
always keeping the smallest number of servers switched on. The approach of Leverich et
al. [ 15 ] includes a mechanism to manage multicore processors, controlling the power
supply individually for the different cores within the processor.
5- Load prediction
The main problem mentioned for server consolidation is the slow response and
high delays of VM migrations and server ON/OFF switching. Adding a system that
predicts the incoming workload would allow the system to ease the effects of those high
delays by switching on the servers early enough before the traffic load increases [ 16 ]. Of
course, achieving this requires a system able to predict the load with high precision.
Additionally, load prediction can help deactivate servers to increase the power saving
when the prediction shows a low incoming load. However, a bad prediction could make
the system turn servers off just before the reception of a high traffic load, violating the
users' SLA agreements, or, on the other hand, turn all servers on in a low workload stage,
wasting energy.
6- Thermal-aware techniques
Most of the mentioned techniques do not consider the system's temperature at all.
They take time and power/energy into consideration in their optimizations, but leave
temperature out of their parameters. However, it is also important to note that higher
power leads to higher temperature and, hence, higher consumption by the dissipation
hardware. From this, some techniques are derived that seek power reduction while
considering thermal properties as well.
Moore et al. [17] consider the following. In a Datacenter, there are a number of
temperature sensors scattered around the room. Some servers are nearer to these
sensors than others, which may lead to an inefficient scenario where not all servers generate
the same amount of heat and some of them may overheat. Based on this
information, this technique proposes the use of a heat map, so that the scheduler knows
which servers are closer to or further from a sensor, taking this into account in the distribution
of tasks to try to prevent some servers from overheating more than others.
Li et al. [18] present a model to predict the temperatures near the servers inside the
Datacenter. It is based on measurements of temperature streams and airflows, predicting the
heating in different parts of the room and, hence, the heating of different servers according to
their location. It also uses sensors to measure temperatures, helping the prediction system
learn the parameters needed to build a good predictor.
Patel et al. [19] also include models to adapt the workload given to the servers based
on seasonal and diurnal temperature, as a method of heat prediction based on the time
of day and day of the year. They show an example of two Datacenters, one in the US
and the other in India. The system knows that daytime in India is nighttime in the
US, and prefers to send the traffic load there to avoid spending a large amount of energy on
thermal dissipation.
7- Workload scheduling
As Datacenters nowadays tend to be composed of several servers, the workload
received by the Datacenter needs to be scheduled to determine which physical machine will
execute each task within the workload. The global power consumed by the Datacenter
depends on this scheduling decision, making it an important step when looking for
power savings. Bad scheduling can incur not only higher power consumption, but also
longer execution times for the tasks within the workload.
There are different types of schedulers. Some are based on selecting which physical
machine will process a certain task, selecting the VM within the host. Another type, called
meta-schedulers, is in charge of optimizing the operation of the first kind. The
first type is used within a local network, where the different servers are located, and optimizes
the operation of those servers. The second type interconnects different networks, deciding where
to send different groups of tasks to be processed. This helps balance the workload across
networks, keeping all of them at a similar load level to optimize the whole
Datacenter, in those cases where it is big enough to be composed of several networks.
8- Energy aware task scheduling
Task schedulers are divided into three types: offline schedulers take task
decisions prior to execution; online schedulers take decisions dynamically
during execution; and hybrid schedulers combine both approaches, performing a
prior decision and adapting dynamically during execution. Wang et al. [20] consider a
model where users accept a given percentage decrease in the performance specified in their
SLA to allow the Datacenter to save energy.
Some authors show that schedulers should be built according to the type of load
they will work with. Others mention taking into account load balancing between the different
servers [21] and using information about the network connections [22].
From all these techniques, our work focuses on both DVFS and scheduling through
rule-based expert systems such as FRBSs. The first algorithm is implemented considering the
five governors presented in section 3.1.2. Scheduling has been divided into two parts,
considering VM scheduling in the first part and task scheduling in the second.
These schedulers will be presented in the second stage of this project.
3.2. First stage: simulation environment
At the beginning of this project, we set out knowing what we want to do and how to do
it, but to be able to begin working on it we first need a good simulation environment.
This is an absolutely necessary stage. Consider a designed algorithm that, after testing,
shows that it really achieves savings. If those tests have been performed on a simulator whose
behavior is far from that of a real Datacenter, we cannot guarantee that the designed algorithm
will be able to save energy on a real network. This is the strong reason that imposes a first
stage whose main objective is building a simulation environment as realistic as possible.
The main requirements on which we base the choice of simulator are simple but
necessary:
1. It must be able to estimate the energy consumption. If the chosen simulator
cannot provide an estimation of the energy consumed, we are unable to know whether
the designed algorithms achieve energy savings. This is an essential
characteristic of the desired simulator.
2. Also, and no less important, the simulator must be capable of simulating traces that
represent real processing. If the tasks are randomly generated, the tasks to execute
do not match a real execution in a Datacenter. We need this characteristic in the
simulator in order to have results as realistic as possible.
3. Finally, we have a personal preference for free software, so that everything we
design can afterwards be useful to the community.
With this in mind, and after studying the currently available simulators against these
three preferences, we were unable to find a platform that could satisfy all our needs. However,
we found a simulator that, even though it did not cover either of the two main characteristics
by itself, had two extensions that could offer us what we were looking for, with only one
problem: each extension covered one of the characteristics. We therefore had what we needed,
but in two separate simulators. The solution is simple to state: just join them into one
scenario. This is anything but a trivial task.
First, we present the different simulators, their behavior and functionalities, and their
similarities and differences.
3.2.1. CloudSim
CloudSim [32] is an open-source Cloud Computing simulator written in Java. It
allows simulating several types of scenarios by defining different simulation entities. Each
entity is in charge of a certain number of operations, and the entities communicate with
each other using different commands, identified by a tag. The two main entities used in
CloudSim are the Broker and the Datacenter. The Broker acts in the name of the user and
keeps all the tasks (named Cloudlets in this simulator) that the user needs to compute. The
Datacenter entity models, in fact, a Datacenter, which contains all the simulated physical
machines. As these physical machines normally have a large amount of resources, tasks are
not assigned directly to them. Virtual Machines (VMs) are created to divide those resources,
allowing several tasks to be executed in parallel on the same machine.
To understand the functioning of this simulator, a basic scenario is explained
here. This is important to know, as it is the basic behavior of the simulator. Both extensions
introduce several differences, so for now we explain only the basics.
1. The Datacenter is registered in the CIS entity. The Broker can then get a list of all
registered Datacenters, in this case only one.
2. The Broker requests its characteristics from each Datacenter on the list.
3. Upon receiving the information from all Datacenters, the Broker checks which
Datacenter is suitable for its processing requirements and chooses it.
4. The Broker requests from the Datacenter the creation of a certain number of Virtual
Machines, in which the Cloudlets will be executed.
5. The Datacenter notifies the Broker of the creation of the requested Virtual Machines, in
addition to informing it of any change in VM capabilities, in case there were not
enough MIPS left for all VMs.
6. The Broker then sends the Cloudlets step by step; in this example, only one Cloudlet is sent.
7. The Datacenter processes the task, taking into account only information related to execution
times.
8. The execution results are returned to the Broker.
9. When there are no Cloudlets left, the Broker requests the destruction of the VMs and finishes
the simulation.
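The steps above can be sketched as a tag dispatch. This is an illustrative simplification, not CloudSim's actual code (the real simulator queues SimEvent objects); only the tag IDs, taken from Table 4, are real.

```java
// Illustrative sketch of CloudSim's tag-based messaging. Assumption: the
// real simulator dispatches SimEvent objects from an event queue; here we
// only show which action each tag triggers in the Datacenter entity.
public class TagDispatch {
    // Tag IDs as listed in Table 4.
    static final int RESOURCE_CHARACTERISTICS = 6;
    static final int CLOUDLET_RETURN = 20;
    static final int CLOUDLET_SUBMIT = 21;
    static final int VM_CREATE_ACK = 32;
    static final int VM_DESTROY = 33;

    // What the Datacenter does upon receiving each tag from the Broker.
    static String datacenterAction(int tag) {
        switch (tag) {
            case RESOURCE_CHARACTERISTICS:
                return "reply with characteristics";       // steps 2-3
            case VM_CREATE_ACK:
                return "create VMs and acknowledge";       // steps 4-5
            case CLOUDLET_SUBMIT:
                return "execute Cloudlet, then return it"; // steps 6-8
            case VM_DESTROY:
                return "destroy VMs";                      // step 9
            default:
                return "ignore";
        }
    }
}
```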
This example is shown in Table 4 and Figure 7, which contain the basic messages
used for communication between both entities. The messages depend on the tag: each Tag ID
corresponds to a certain action to perform in the destination entity that receives the command.
Tags Flow

Number  Source      Destination  Tag ID  Tag
1       Datacenter  CIS          2       Register Resource
        Broker      Broker       15      Resource Characteristics Request
2       Broker      Datacenter   6       Resource Characteristics
3       Datacenter  Broker       6       Resource Characteristics
4       Broker      Datacenter   32      VM_CREATE_ACK
5       Datacenter  Broker       32      VM_CREATE_ACK
6       Broker      Datacenter   21      CLOUDLET_SUBMIT
7       Datacenter  Datacenter   41      VM_DATACENTER_EVENT
8       Datacenter  Broker       20      CLOUDLET_RETURN
9       Broker      Datacenter   33      VM_DESTROY
        Broker      Broker       -1      End_of_Simulation
Table 4: CloudSim's basic example
Figure 7: CloudSim’s basic example
This is the basic behavior of the simulator. The changes introduced by the extensions add
complexity and functionality to this scenario. Note that in this case, neither of the necessary
characteristics is covered. The Datacenter only gives an estimation of the execution time at the
end of the simulation, but does not provide a power model serving as a tool to estimate
the global power and energy that would be consumed in processing those tasks. Moreover,
this simulator allows the Cloudlets to be created freely, that is, deciding their lengths without
following any specific order or restriction. This gives the user enough freedom to specify the
desired scenario to simulate, but its behavior does not satisfy the need for a realistic
scenario. Even though this is a simulator, it is desirable to be able to simulate real
processing, rather than executing and measuring randomly generated tasks, which gives no
information regarding energy-saving capabilities unless it is done under a realistic scenario
simulation.
3.2.2. CloudSim with DVFS
This extension [33] includes the power model and DVFS algorithm explained in
sections 3.1.1 and 3.1.2. Several classes are included in order to add these functionalities to
the basic scenario explained before. The entities used in this simulator are the same as those
used in the basic version of CloudSim. There is no need to show a table of the
different messages as in the previous example, as the only variation is related to task
processing in the Datacenter entity. When the process reaches the event with Tag 41, Cloudlet
execution, instead of estimating in a single step the time needed to fully process the
Cloudlet, the event is repeated a number of times, estimating the power consumed, until the
current time reaches the end time of the Cloudlet. The amount of time elapsed between these
events is called the Scheduling Interval, set by default to 0.01 seconds.
The process is as follows. Once the Cloudlets are sent from the Broker to the
Datacenter to be executed in parallel, the Datacenter estimates the execution time of
each task which, added to the current simulation time, gives the end time of each task.
Then the execution enters a loop, which is repeated until the current simulation time exceeds
the end time of one of the tasks. As explained, the time elapsed between these iterations is the
scheduling interval. The process performed in this loop estimates the power consumed by
each task being executed: using the current value of the CPU utilization and
equation (7), the Datacenter estimates the power being consumed at
that exact moment. When all tasks have been executed, the system obtains the energy value
from the total power and time.
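The loop just described can be sketched as follows. The linear power model used here merely stands in for equation (7) of section 3.1.1 and is an assumption for illustration; the point being shown is the scheduling-interval integration of power into energy.

```java
// Sketch of the power-aware execution loop. Assumption: a linear power
// model P = Pidle + (Pmax - Pidle) * u stands in for equation (7); the
// actual model of the simulator should be substituted here.
public class EnergyLoop {
    // Instantaneous power (W) as a function of CPU utilization u in [0, 1].
    static double power(double utilization, double pIdle, double pMax) {
        return pIdle + (pMax - pIdle) * utilization;
    }

    // Integrate power over a task's lifetime, one scheduling interval at a
    // time (default interval 0.01 s), accumulating energy in joules.
    static double energy(double duration, double interval,
                         double utilization, double pIdle, double pMax) {
        int steps = (int) Math.round(duration / interval);
        double energy = 0.0;
        for (int i = 0; i < steps; i++) {
            energy += power(utilization, pIdle, pMax) * interval; // J = W * s
        }
        return energy;
    }
}
```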
This power-awareness allows us to know whether the scheduling algorithms we design save
energy or not, which is our main objective in this project. However, as already mentioned,
this extension does not base the processed tasks on a real trace, and retains the same problem
as the basic version of CloudSim.
3.2.3. WorkflowSim
This simulator, developed by Weiwei Chen [34], extends CloudSim to make it able
to work with Directed Acyclic Graphs (DAGs) [36] in an XML format called DAX. With this
addition, instead of simulating a series of tasks as Cloudlets, we can simulate and experiment
using traces of real Datacenter workflows. These DAX files are generated using a program
called Pegasus [35], which can also convert the XML files to an image format so
that it is easier to understand how they work. Figure 8 shows the graph
specifying the execution order of the workflow called Montage_25 [37].
The advantages of using workflows are numerous. On the one hand, the use of workflows
in the simulator helps determine the order in which tasks are processed, including constraints
that prevent a task from being processed before its parent tasks. This makes sure that the tasks
are processed in the intended order, in contrast to what happens with the randomly generated
tasks of basic CloudSim. This is useful when simulating a trace containing a
number of tasks that reproduce the real traffic of a real workload in a real system.
Figure 8: Montage 25 DAG (tasks ID00000 to ID00024; job types Montage::mProjectPP,
mDiffFit, mConcatFit, mBgModel, mBackground, mImgTbl, mAdd, mShrink and mJPEG)
Each task has inputs and outputs of a certain size. By being able to process these files,
our simulations are based on real workflows instead of a simple series of Cloudlets, which
establishes a base for optimizing Cloud environments.
The main difference with CloudSim consists in the use of additional entities, called
Planner, Clustering Engine, Engine and Scheduler, the last of which substitutes the Broker.
The Planner is in charge of the highest-level planning, and calls the Parser class to
transform the DAX into tasks so that the simulator can handle them. The Engine takes
care that the simulation follows the order established in the DAG, so that no
child task is executed before its parent nodes, as input data would be missing. The
Scheduler then, following a behavior similar to the Broker's, receives the Tasks from the
Engine and sends them to the Datacenter for execution.
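The Engine's ordering rule can be sketched as follows. This is an illustrative reconstruction, not WorkflowSim's actual classes: a task is released to the Scheduler only once all of its parent tasks have been returned.

```java
// Illustrative sketch (not WorkflowSim's real Engine) of the DAG ordering
// rule: release a task only when all of its parents have been processed.
import java.util.*;

public class DagEngine {
    // parents.get(t) = set of parent task IDs of task t (from the DAX file).
    private final Map<Integer, Set<Integer>> parents;
    private final Set<Integer> processed = new HashSet<>();

    public DagEngine(Map<Integer, Set<Integer>> parents) { this.parents = parents; }

    // Called when the Datacenter returns a finished task (Tag 20 path).
    public void markReturned(int taskId) { processed.add(taskId); }

    // Tasks not yet processed whose parents have all been processed.
    public List<Integer> ready() {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> e : parents.entrySet()) {
            if (!processed.contains(e.getKey())
                    && processed.containsAll(e.getValue())) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out); // deterministic release order
        return out;
    }
}
```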
It is important to know that it is here, in the Scheduler entity, where the decision is
taken as to which VM will process each task. This is one of the core parts that will be
addressed in the second stage of this project, where the energy-saving algorithms are
designed.
While this new characteristic brings the simulations nearer to the behavior of a real
Datacenter, the simulator still considers only information related to execution times, going
back to the initial problem of CloudSim.
To explain the different types of messages that the entities use to communicate with each
other, we run the simulator on the basic Montage_25 DAX file.
Table 5 shows a fragment of that communication, divided into three stages that we call
the Initialization stage (messages 1 to 11), the Main stage (steps 12 to 18) and the Ending
stage (the last steps displayed in the table). The different message types are also
shown in Table 6, ordered by Tag number, with an explanation of what each
message means. Additionally, to help understand the messages in Table 5 and the
three stages mentioned, three figures similar to Figure 7 are provided: Figure 9 shows the
Initialization stage, Figure 10 displays the entities involved in the Main stage (processing),
and Figure 11 shows how the simulation is finished.
Step  Tag   Src entity  Dst entity  Info
1     2     Datacenter  CIS         Datacenter registers in the CIS
      1000  Planner     Planner     Parse DAX file
      1000  Merger      Merger      Empty method
      15    Engine      Engine      Sends a Tag 15 message to the Scheduler
      2     Scheduler   CIS         Scheduler registers in the CIS
2     1001  Planner     Merger      Planner sends all the Cloudlets to the Merger
      15    Engine      Scheduler   Sends a Tag 6 message to the Datacenter
3     1001  Merger      Engine      Merger sends all the Cloudlets to the Engine
      6     Scheduler   Datacenter  Requests characteristics
4     6     Datacenter  Scheduler   Responds to the request
5     32    Scheduler   Datacenter  Requests VMs creation
6     32    Datacenter  Scheduler   Creation acknowledgment
7     21    Scheduler   Engine      Sends the first Cloudlet, doesn't belong to the DAG
8     21    Engine      Scheduler   Sends the Cloudlet to the Scheduler
9     1005  Scheduler   Scheduler   Only 1 Cloudlet, easy decision
10    21    Scheduler   Datacenter  Sends the scheduled Cloudlet to its VM
11    41    Datacenter  Datacenter  Cloudlet processing in the Datacenter
...                                 See Tag 41 explanation
12    20    Datacenter  Scheduler   Adds a Cloudlet_update (1005) to the future queue
13    20    Scheduler   Engine      Sends Cloudlet back to Engine
      1005  Scheduler   Scheduler   Empty Cloudlet list, no scheduling done
14    21    Engine      Engine      Gets the following row to be sent to the Scheduler
15    21    Engine      Scheduler   Sends the Cloudlet to the Scheduler
16    1005  Scheduler   Scheduler   Bag of tasks, takes decision of VM for each Cloudlet
17    21    Scheduler   Datacenter  Sends the scheduled Cloudlets to their VMs
18    41    Datacenter  Datacenter  Cloudlets processing in the Datacenter
...                                 See Tag 41 explanation
159   20    Datacenter  Scheduler   Returns the last Cloudlet
160   20    Scheduler   Engine      The Engine sets Tag -1, End of simulation
      1005  Scheduler   Scheduler   No Cloudlets, does nothing
161   -1    Engine      Scheduler   Proceeds to clear Datacenters
Table 5: WorkflowSim communication messages between entities
Tag Message
2 Register_resource
6 Resource_characteristics
15 Res_charact_request
20 Cloudlet_return
21 Cloudlet_submit
32 VM_creation
33 VM_destroy
41 VM_ datacenter_event
1000 Start_simulation
1001 Job_submit
1005 Cloudlet_update
-1 End_of_simulation
Table 6: WorkflowSim tags meaning
To clarify the purpose of each tag message, we add a further explanation here.
2: The Cloud Information Service is the entity where all Datacenters and Schedulers
must be registered before they can be accessed. And so they are, at the beginning of the
simulation.
6: In order for the Scheduler to be able to take decisions about where to schedule a VM
or a Task, it needs to have access to all the information about the Datacenter. This message,
Resource characteristics, makes the Datacenter send its characteristics to the requester,
the Scheduler. This is why there is a double message with this tag: the first to request
and the second to reply, indicating the Datacenter's characteristics.
15: This message is used when any entity accesses the Cloud Information Service to get
information about any Datacenter or Scheduler already registered. In this example, it is used
twice. The first time it is used by the Engine entity, which requests from the CIS information
about any registered Scheduler. The second time, it is this Scheduler that requests
information about the Datacenter. After the Scheduler knows of the availability of a
Datacenter, it sends the message with Tag 6 to get its characteristics.
20: After a VM has finished processing a Cloudlet, the Cloudlet is returned to the entity
that sent it to the Datacenter, the Scheduler. Then, as the Engine is the entity in charge of
making sure that the simulator follows the workflow order, the Scheduler also sends the
Cloudlet back so that the Engine can continue with the workflow order. Each time the
Engine gets a Cloudlet returned, it adds it to the processed list and checks whether
there are Cloudlets all of whose parent nodes have already been processed. If there are
any, they are sent to the Scheduler, a VM is decided for each, and they are sent to the
Datacenter to be processed.
21: This is the standard message used to pass a Cloudlet between entities. When used by
the Engine, it selects those Cloudlets whose parent nodes have already been
processed and returned, which means that the simulation is following the order established
by the workflow parsed from the DAG, and sends them to the Scheduler, which then
uses the message with Tag 1005 to make a scheduling decision. After the Scheduler
finishes deciding which VM will process each Cloudlet, it uses this message to
send them to the Datacenter, each to its corresponding VM.
32: Cloudlets are executed in VMs, but these must have been created beforehand. This is
the message sent from the Scheduler to the Datacenter to request the VM creation. The
Tag is also used in the reverse direction to express an acknowledgment of the creation.
33: This message is triggered while ending the simulation (Tag -1). It destroys all VMs
before shutting down the system, and is the last message shown before the end of the
simulation.
41: With this message, the Datacenter indicates that at least one Cloudlet is being processed
in one of its VMs. It is important to note that not only one of the submitted Cloudlets is
processed at a time: all VMs run in parallel, so the maximum number of Cloudlets that can
be executed at the same time equals the number of VMs the Datacenter has. However, as
this simulator is not power-aware, the processing stage is performed in only one step. We
need to take into account that WorkflowSim is a simulator that follows the order of a
workflow, but considers only time in Cloudlet execution. For this reason, when a Cloudlet is
executed, the simulator only needs to estimate the time that that particular Cloudlet takes to
execute, and that calculation can be made in just one step. But not only the time spent
processing the Cloudlet is considered: upon its arrival at the Datacenter, the simulator checks
the Cloudlet's input files, gets their sizes and calculates the time it takes to transfer them to
the Datacenter. This time is added to the Cloudlet's processing time since, to be fair, the
Cloudlet's execution cannot begin before the necessary files have been transferred to the
Datacenter, so this time must be considered too. However, this Tag 41 message is not this
simple in the joined simulator. After the Cloudlet has been processed, it is sent back to the
Scheduler using a message with Tag 20.
1000: Method used to start the simulation. It is here that the DAX file containing all
the task information is parsed into Cloudlets so that the simulator is able
to process them.
1001: The Engine is the entity that needs access to the whole set of
Cloudlets, as it is the entity in charge of passing the Cloudlets to the Scheduler while making
sure that the process follows the correct workflow order. However, at the beginning
of the simulation the entity that parses the DAX file is the Planner, making it necessary to
transfer all jobs to the Engine. This message is used first to pass all the jobs to
the Merger entity, and later from the Merger to the Engine. Once the Engine is in
possession of all the Jobs, it sends the Cloudlets that compose those Jobs to the
Scheduler using Tag 21, following the desired workflow order.
1005: Each time the Scheduler receives a number of Cloudlets from the Engine entity, it
needs to decide which VM is the most suitable to process each of them.
This is the second phase of Cloud scheduling, which considers a bag of Cloudlets to be
sent to a group of VMs. This step is one of the most important in the simulator, and most
of the effort to reduce the overall energy consumed is made at this point. There
are many scheduling algorithms, some of them time-based and others power-based,
whose optimization is the main objective of this DVFS-Workflow joined simulator.
After the Scheduler has decided all the matching Cloudlet-VM pairs, it again sends
a message with Tag 21 to assign them to their respective VMs in the Datacenter.
-1: The End of Simulation message is used when there are no more Cloudlets waiting to
be processed and the simulation needs to end. It erases all the Datacenters, which
triggers a Tag 33 message per VM to destroy them, as they are no longer needed.
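As an illustration of the Tag 1005 decision, the following sketch assigns a bag of Cloudlets greedily to the VM offering the earliest projected finish time. This policy and all names are illustrative assumptions, not WorkflowSim's built-in scheduler; the energy-aware schedulers designed later in this project replace exactly this decision.

```java
// Illustrative sketch of a Tag 1005 scheduling decision (assumption:
// greedy earliest-finish-time; not WorkflowSim's actual scheduler).
public class BagScheduler {
    // lengths[i] = length (MI) of Cloudlet i; vmMips[j] = MIPS of VM j.
    // Returns assignment[i] = index of the VM chosen for Cloudlet i.
    static int[] schedule(double[] lengths, double[] vmMips) {
        double[] finishTime = new double[vmMips.length]; // projected busy time per VM
        int[] assignment = new int[lengths.length];
        for (int i = 0; i < lengths.length; i++) {
            int best = 0;
            double bestFinish = Double.MAX_VALUE;
            for (int j = 0; j < vmMips.length; j++) {
                // Finish time if Cloudlet i were appended to VM j's queue.
                double finish = finishTime[j] + lengths[i] / vmMips[j];
                if (finish < bestFinish) { bestFinish = finish; best = j; }
            }
            assignment[i] = best;
            finishTime[best] = bestFinish;
        }
        return assignment;
    }
}
```

An energy-aware variant would replace the finish-time criterion with an estimate of the energy each candidate assignment consumes, which is precisely the optimization pursued later in this project.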
The process shown for steps 12 to 18 is repeated each time the Datacenter finishes
processing a Cloudlet and sends it back. The Engine then checks whether some new
Cloudlets can already be executed, sends them to the Scheduler if there are any, VMs are
decided, and the Cloudlets are sent to be processed. Whenever the process reaches the Tag 41
part, it advances time by the lapse needed for the next Cloudlet to finish. In the case of
Montage_25, the first row sends 5 Cloudlets, with IDs from 0 to 4. Seeing that the shortest
is the Cloudlet with ID 2, the simulator advances time according to the length of that
Cloudlet and the transfer time of its input files. It is then sent back, and the Engine checks
that the Cloudlet with ID 10 can now be processed too, so it is scheduled and
delivered to the Datacenter. At this moment, the Datacenter checks which Cloudlet is the
next to finish and once more advances time to that value. This time Cloudlet 0 finishes
and is returned too, making Cloudlet 8 the next available to be processed, as both of its
parent nodes have already been processed. This process is repeated until all of the
Cloudlets have been executed.
Figure 9: WorkflowSim initialization stage
Figure 10: WorkflowSim main stage
Figure 11: WorkflowSim ending stage
3.2.4. Merged simulator
The main purpose consists in having a simulator that is able to compute the consumed
energy while following the structure of a real workflow. By joining both simulators we not
only accomplish both characteristics, but also include the DVFS technique. This makes
the simulator able to adjust the CPU frequency to minimize the overall energy consumption.
The available governors are those introduced in section 3.1.2, named Performance,
PowerSave, UserSpace, OnDemand and Conservative. Performance selects and fixes the
maximum available frequency, while PowerSave chooses the lowest. UserSpace permits
the user to specify a certain frequency to be fixed, depending on the available multipliers.
The last two work with thresholds that are compared with the CPU utilization.
OnDemand uses only one threshold (user-configurable, default 95%). Whenever the
utilization is greater than the selected threshold, the frequency is scaled to the maximum
multiplier. Conversely, should the utilization be lower than the threshold, the frequency
is scaled down one step to the next lower multiplier. This down-scaling is only done once a
given counter runs out, defined in the simulator as a variable named
samplingDownFactor, so as to prevent the CPU from scaling up and down constantly, similar
to the use of a Schmitt trigger in electronics. The Conservative governor has two thresholds,
up and down (user-configurable, defaults 80% and 20% respectively). Unlike OnDemand,
this governor could set the frequency to the maximum and, if utilization then fell
to 25%, the frequency would still remain at the maximum. This means non-optimal
behavior in terms of energy consumption, as the frequency could be scaled
down, letting the utilization rise, to save energy.
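The two threshold governors can be sketched as follows. Frequencies are represented as multiplier indices (0 = lowest); the thresholds and the samplingDownFactor counter follow the defaults stated above, while everything else is an illustrative assumption about the implementation.

```java
// Illustrative sketch of the two threshold governors described above.
// Levels are multiplier indices, 0 = lowest frequency, maxLevel = highest.
public class Governors {
    // OnDemand: jump to the top multiplier when utilization exceeds the
    // threshold; otherwise step down one multiplier, but only once the
    // samplingDownFactor counter runs out (Schmitt-trigger-like damping).
    // downCounter is a one-element array so the counter persists across calls.
    static int onDemand(int level, int maxLevel, double util,
                        double upThreshold, int[] downCounter, int samplingDownFactor) {
        if (util > upThreshold) {
            downCounter[0] = samplingDownFactor; // reset damping counter
            return maxLevel;
        }
        if (--downCounter[0] <= 0) {
            downCounter[0] = samplingDownFactor;
            return Math.max(0, level - 1);       // step down one multiplier
        }
        return level;                            // counter not expired: hold
    }

    // Conservative: two thresholds; between them the frequency is held,
    // which is why it can stay at the maximum while utilization sits at 25%.
    static int conservative(int level, int maxLevel, double util,
                            double upThreshold, double downThreshold) {
        if (util > upThreshold) return Math.min(maxLevel, level + 1);
        if (util < downThreshold) return Math.max(0, level - 1);
        return level;
    }
}
```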
As mentioned before, this establishes a good base for future research on Cloud
Computing using the simulator, as we can estimate the power consumption and the
experiments are based on real workflows. Also, as modern CPUs already include the DVFS
technique, it is important that the simulator include it too, so that the CPU's
dynamic energy optimization is considered. From this point, the simulator can be used
to optimize the Scheduler's behavior to save more energy.
While Guérout et al. [33] also implement DVFS and DAG workflows in CloudSim,
theirs is an implementation parallel to WorkflowSim, as they mention. This contribution joins
both simulators, allowing WorkflowSim to be power-aware, so that all scheduling algorithms
implemented in WorkflowSim can be simulated with DVFS, Datacenter power can be
measured, and new scheduling algorithms can be added to optimize the overall energy
consumption.
In the same way, Fei Cao et al. [38] also propose a simulator including DAG
workflows and DVFS, but their proposal focuses more on scheduling than on the DVFS
implementation: in their paper they only mention the use of DVFS to obtain the optimum
frequency, without mentioning the governors used or even the availability of user
configuration of the governors.
The process of joining both simulators is explained in section 3.2.7, but it is important to
know that the base simulator taken is WorkflowSim. Then, from [33], we have added
everything necessary related to power models and DVFS. Before explaining the joining
process, we show the changes introduced with respect to both base simulators.
3.2.5. Changes with respect to WorkflowSim
The entity messages in the proposed simulator are all the same as in its base
simulator, WorkflowSim. However, this simulator being power-aware, the step at Tag 41
changes considerably, as introduced in section 3.2.2. The first thing to mention is that this
stage is no longer performed in just one step, so more than one of these messages appears
during a Cloudlet's execution. The Scheduling Interval parameter, set at Datacenter creation
in the main file, is the lapse of time between these messages with Tag 41. Its default value is
0.01, so every 0.01 seconds there is a message with Tag 41. This
indicates how often the Datacenter checks the utilization, estimates the power consumed and
applies the DVFS algorithm to check whether the Datacenter is idle and the system
can be scaled down to save energy or, on the contrary, the utilization is so high that it
surpasses the upper threshold and the system needs to be scaled up to increase its performance.
For a Cloudlet that takes 13 seconds to be fully processed, the simulator would produce 1300
messages with Tag 41, with the default Scheduling Interval of 0.01 seconds.
As in WorkflowSim, the process shown for steps 12 to 18 of Table 5 is
repeated each time the Datacenter finishes processing a Cloudlet and sends it back. The
Engine then checks whether some new Cloudlets can already be executed, sends them to the
Scheduler if there are any, VMs are decided, and the Cloudlets are sent to be processed.
Whenever the process reaches the Tag 41 part, it is repeated as explained before until one of
the Cloudlets is completed again. In the case of the Montage_25 of Figure 8, the first row
sends 5 Cloudlets, with IDs from 0 to 4. All of them are processed in parallel in different
VMs but, due to their different lengths, and the shortest being the Cloudlet with ID
2, the moment it completes it is sent back without waiting for the rest to be processed.
After the Cloudlet with ID 2 is returned, the Engine checks that the Cloudlet with ID 10
can now be processed too, so it is scheduled and delivered to the Datacenter. At this
moment, the processing of the 5 Cloudlets continues, taking into consideration that
the 4 that had been processing during the entire duration of the already-returned
shortest Cloudlet are nearly completely processed too. This is why, just
a short time later, Cloudlet 0 finishes and is returned too. This time it is Cloudlet
8, both of whose parent nodes have already been processed, that is sent to the
Datacenter. This process is repeated until all of the Cloudlets have been executed.
Apart from this point, the rest of the system works exactly as WorkflowSim. It is this
step that checks and obtains the power consumption, leaving the rest of the messages
unaltered.
3.2.6. Changes with CloudSim
WorkflowSim adds 4 new entities with respect to CloudSim. CloudSim Shutdown and
Cloud Information Service are both kept the same. The Datacenter entity is modified to allow
the execution of Workflows and power estimation, but its purpose remains the same. The
major changes concern the Broker. In CloudSim, this entity is the one that requests the VM
creation, holds the Cloudlets, sends them to and receives them from the Datacenter,
initializes the system and destroys it when the simulation is finished. Basically, it is the only
entity on the user's side. However, CloudSim only works with a group of Cloudlets to be
processed, considers time alone and does not really need more entities for that simple
purpose.
When the process gets more complicated, as in the case of WorkflowSim, dividing the
problem makes it easier. In this case, 4 new entities are introduced. The Planner is the one
that initializes the system and parses the DAX file to get the tasks. Then, those tasks are
passed on to the Merger (also called Cluster), which joins different Tasks into a Job. The
default configuration of the simulator performs no clustering: it simply wraps each Task
into its own Job. After this, the Jobs are sent to the Engine, where they are selected
following the order specified by the workflow. This is the main difference with the basic
CloudSim simulator, where the different Cloudlets were executed following a simple
sequential order.
As explained in Section 3.2.3, the Engine is the entity in charge of making sure the
workflow's order is followed, sending to the Scheduler, each time a Task is returned, the
Tasks that can now be processed, i.e. those whose parent nodes have already been
processed. Finally, the Scheduler is the entity that chooses which VM is the most suitable
for processing each Task. For this purpose, several different scheduling algorithms can be
used, but they will not be studied here.
3.2.7. Modifications and additions to achieve the proposed joint simulator
To join both simulators, the chosen method is to use WorkflowSim as the base,
adding the power classes from [ 33 ] and modifying a number of classes. The following steps
must be performed to successfully merge the simulators. To avoid modifying the basic
simulators, a new package called dvfs is created inside org.workflowsim, and the new
classes needed are stored in this package. Also, the main simulation file is taken from
WorkflowSimBasicExample1 and copied into another file that we called
WorkflowDVFSBasicNoBrite, stored in a new package named dvfs inside
org.workflowsim.examples.
Both PowerDatacenter (CloudSim [ 33 ]) and WorkflowDatacenter (WorkflowSim)
extend the base class Datacenter. But we need parameters from both Datacenter types, as the
first one defines parameters related to power and the second one allows working with DAX
files, as well as Tasks and Jobs. The first modification is therefore to make
WorkflowDatacenter extend PowerDatacenter. As Java does not allow multiple inheritance,
this method gives us a single Datacenter able to work as we intend. To avoid modifying the
behavior of the WorkflowDatacenter class, we have created a new class in the new
package, called WorkflowDVFSDatacenter, copying all contents from the
WorkflowDatacenter class and making it extend PowerDatacenter as just explained.
In addition to modifying the inheritance of this new class, we need to add several
parameters to make it power aware. In the main method of the main file, the creation of the
Datacenter object must be changed from WorkflowDatacenter to the new
WorkflowDVFSDatacenter, and datacenter.setDisableMigrations(true) must be added after
the datacenter object creation, as in the DVFS base example.
At the end of the same main method, after the simulation finishes, we need to add
some code to print the information related to the simulation result, namely these four
parameters:
1. Total execution time in seconds (not simulation time).
2. Total power sum in watts.
3. Average power in watts, obtained by dividing the previous two parameters (power sum over time).
4. The energy consumed in Wh, multiplying the average power by the time in hours.
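As a sketch, these four values can be computed as follows. All numbers here are invented for illustration, and the average is taken as the ratio of the summed power values to the time, as described above:

```java
// Hedged sketch of the four summary values printed after the simulation.
// The input numbers are made up; they do not come from any simulation run.
public class SimulationSummary {
    public static void main(String[] args) {
        double totalTimeSec = 120.0;   // 1. total execution time (s)
        double totalPowerW  = 10800.0; // 2. sum of sampled power values (W)
        double avgPowerW = totalPowerW / totalTimeSec;          // 3. average power (W)
        double energyWh  = avgPowerW * totalTimeSec / 3600.0;   // 4. energy (Wh)
        System.out.println(avgPowerW); // 90.0
        System.out.println(energyWh);  // 3.0
    }
}
```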
In the Datacenter creation, we need to add power-related information, as the
default file only created a WorkflowDatacenter. From PowerDatacenter we need to
copy all parameters related to DVFS, such as the frequencies and the governors'
information, like the thresholds. The Host class objects are changed to PowerHost objects,
as we need Hosts that can handle power. So, in the PowerHost creation we need to add
information about the power model used, as well as a Boolean indicating whether DVFS is
allowed. Also, at the end of the method, when creating the WorkflowDVFSDatacenter
object, we need to indicate the PowerVmAllocationPolicy used and change the
schedulingInterval from 0 to 0.01. If this interval is not changed, the simulator will not
advance in time and will be trapped in an infinite loop.
As both classes PowerDatacenter and WorkflowDatacenter extend the Datacenter
base class, both of them override a method called updateCloudletProcessing. After changing
the inheritance hierarchy to Datacenter -> PowerDatacenter -> WorkflowDVFSDatacenter,
the method in the most derived class prevails. However, the method we need must take
power into account, which is not the case of the WorkflowDatacenter version. For this
reason, we need to comment out this method in the WorkflowDVFSDatacenter class so that
it does not override the one belonging to the PowerDatacenter class.
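The override behavior just described can be illustrated with a minimal sketch. The classes below are stand-ins, not the CloudSim/WorkflowSim classes themselves; they only mirror the inheritance chain:

```java
// Stand-in classes mirroring the inheritance chain described in the text.
class Datacenter {
    void updateCloudletProcessing() { System.out.println("Datacenter: time only"); }
}
class PowerDatacenter extends Datacenter {
    @Override
    void updateCloudletProcessing() { System.out.println("PowerDatacenter: power-aware"); }
}
// By NOT overriding updateCloudletProcessing here (the copied
// WorkflowDatacenter override is commented out), the power-aware
// version from PowerDatacenter prevails.
class WorkflowDVFSDatacenter extends PowerDatacenter { }

public class InheritanceSketch {
    public static void main(String[] args) {
        new WorkflowDVFSDatacenter().updateCloudletProcessing();
    }
}
```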
Also, inside the WorkflowDVFSDatacenter class there is another method called
processCloudletSubmit. This method is missing a necessary line that is included in the same
method of the PowerDatacenter class, namely setCloudletSubmitted(CloudSim.clock()).
We must add this call at the end of the method in our new datacenter class in order to get
the simulator to work.
3.2.8. Additional notes to the power model
The power model used in this joint simulator is the power model explained in
section 3.1.1, describing how power varies with voltage and frequency in a typical processor.
In this simulator, however, there is no explicit dependence of power on the voltage or the
frequency; it works as follows.
We know that the power consumed varies depending on frequency, and that the
frequencies a processor can run at are obtained by multiplying the motherboard's base
frequency by the processor's frequency multiplier, so the available frequency values the
processor can work at are few and discrete.
The default scenario in CloudSim defines the different values of the frequency
multipliers in the main simulation file. When creating the Datacenter, the user needs to
specify the different multiplier values as a percentage of the total processing capacity. There
is no explicit indication of the maximum frequency the processor can run at; instead,
everything is expressed in terms of MIPS performance. In the default scenario example
there are 5 different multiplier values and a MIPS value of 1500. Table 7 shows the
different values of the multiplier as well as the MIPS that the processor would deliver at
each multiplier. Naturally, the last multiplier achieves the maximum MIPS value.
Multiplier (%)               59.925    69.93     79.89     89.89     100
Performance (MIPS)           898.875   1048.95   1198.35   1348.35   1500
Null utilization power (W)   82.75     82.85     82.95     83.10     83.25
Full utilization power (W)   88.77     92.00     95.5      99.45     103.0
Table 7: Frequency multipliers and MIPS
The last multiplier is selected when maximum performance is needed. The variation
of this multiplier value is decided by the DVFS algorithm and the chosen governor, which
is in charge of comparing the current utilization with both upper and lower thresholds to
know whether higher performance is needed or, on the contrary, the processor is rather idle
and can be scaled down to save unused power. Moreover, in addition to frequency, the
power consumed also depends on the current utilization: the higher the utilization, the more
power is consumed.
In the power model class, such as the default "PowerModelSpecPower_BAZAR", there
are two arrays of power values. The simulator handles power by defining, for each
multiplier, two power consumption values: those corresponding to null and full utilization.
These values are shown in Table 7. To determine the power consumption at a certain
moment, the simulator takes both power values for the current frequency index. Then,
using the utilization value, it estimates the power consumption as a linear interpolation
between the null and full utilization values, using Equation 8. Note that this equation
achieves the same results as Equation 7.
power = (1 − utilization) · idle_power + utilization · full_power ( 8 )
As mentioned in the "PowerModelSpecPower_BAZAR" class, the machine used is an
Intel(R) Core(TM)2 Quad Q6700 CPU @ 2.66GHz with 4GB of RAM. The power values
were measured using a Plogg wireless electricity meter. In the physical processor, both
voltage and frequency are of course adjusted by the DVFS algorithm, but in the simulator
this power model works by linear interpolation, taking only the null and full utilization
power values and estimating the rest.
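As an illustration, Equation 8 together with the Table 7 values can be sketched as follows. The class and method names are illustrative, not those of PowerModelSpecPower_BAZAR itself:

```java
// Hedged sketch of the linear-interpolation power model (Equation 8),
// using the null/full utilization values from Table 7.
public class LinearPowerModel {
    // Power at null and full utilization for each frequency multiplier index.
    private static final double[] IDLE_POWER = {82.75, 82.85, 82.95, 83.10, 83.25};
    private static final double[] FULL_POWER = {88.77, 92.00, 95.5, 99.45, 103.0};

    // Equation 8: interpolate linearly between idle and full power.
    public static double power(int freqIndex, double utilization) {
        return (1.0 - utilization) * IDLE_POWER[freqIndex]
             + utilization * FULL_POWER[freqIndex];
    }

    public static void main(String[] args) {
        // At the highest multiplier and 50% utilization:
        System.out.println(power(4, 0.5)); // 0.5*83.25 + 0.5*103.0 = 93.125
    }
}
```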
3.3. Second stage: scheduling algorithms
This second stage builds on the simulation environment obtained in the first stage.
Without that simulator we would not have the characteristics required to correctly
test the scheduling algorithms developed in this stage. These algorithms are based on finding
the optimum scheduling in two phases. The process is as follows.
Every simulation starts from one or more datacenters composed of a number
of physical machines. As stated before, users' tasks are not assigned directly to them,
because their large amount of resources would be wasted. Instead, a number of VMs of
different performance are created on the physical machines, providing a heterogeneous
simulation scenario. Then, the different tasks are sent to those VMs to be executed, and the
results are returned to the user.
This introduces the two phases of the scheduling:
1- First, the system starts from a number of physical machines where the VMs will
be created. This first phase consists of finding which physical machine is the most
suitable to host the VM creation. Depending on how this process is performed,
result parameters such as execution time and energy consumption will vary. Later on,
this step will be explained in more detail.
2- Then, once the VMs are created, the tasks can be executed. In this second phase,
the scheduler is in charge of finding which VM will execute each task. As in
the first phase, the final results obtained will greatly depend on the degree of
optimization of this scheduling.
In this section, we explain both phases in detail and how they are implemented, and
we compare them with other scheduling techniques. The results obtained are shown in
section 4.2.
3.3.1. Power aware scheduling
There are many different techniques used to schedule tasks to resources. However,
classic scheduling algorithms like Min-Min, Max-Min and Sufferage focus on minimizing
the execution time of the overall processing of all tasks. In this project, we focus on
reducing the energy consumed rather than the time. Classic Min-Min and Max-Min have
been adapted to optimize energy instead of time, to serve as a comparison for the
scheduling developed based on Fuzzy Logic.
The WorkflowSim simulator already includes a power-aware scheduler for VMs. In the
Power package, there is a class called PowerVmAllocationPolicySimpleWattPerMipsMetric
which allocates the VMs to different Hosts, considering as metric the watts/MIPS ratio of
the different Hosts. So, this algorithm chooses the Host that needs the least amount of watts
per MIPS and allocates the VM there. This way, it minimizes the overall consumption of
the system, as each VM will consume as few watts as possible.
This class computes this metric for a certain VM on all Hosts. Then it chooses
the host with the lowest metric and executes the allocateHostForVm(Vm, choosenHost)
method. During the allocation, it calls the host.vmCreate(vm) method and allocates the
necessary resources, such as Storage, RAM and BW.
However, the simulator does not include a power-aware scheduler for tasks, leaving the
second scheduling phase to the classic Min-Min or Max-Min, chosen by the user. We want
to add to the simulator a task scheduler able to work with energy rather than just time, to
try to obtain better results and reduce the overall energy consumption as much as possible.
The next sections show both of these scheduling phases in more detail, describing the
parameters considered by the simulator in the different algorithms.
3.3.2. VM scheduling
This is the first scheduling step. In this stage, all the Virtual Machines (VMs)
requested by the different users are created. The scheduler decides which physical
machine hosts the creation of each VM. The scheduler's decisions are based on
different parameters that vary depending on whether power is considered or not.
In the non-power-aware case, the scheduler chooses the physical machine for a
certain VM based on which machine has more free resources. This way, VMs are
distributed among all the available machines so that no machine is over-utilized while the
rest are left idle.
This is a sound strategy, given that in DVFS the dynamic governors work with upper
and lower thresholds. Dividing the VMs among all available machines achieves an
even utilization level in all processors, which means that all physical machines should have
a similar utilization percentage, so their processors' frequency multipliers should all be
close to each other.
Since DVFS is used to reduce the processors' energy consumption, keeping a low
utilization on all machines results in energy savings on all of them, while ensuring that the
users' deadline established in the Service Level Agreement (SLA) is met, as the utilization
of the machines is intended to be kept well below 100%.
Either way, the scheduling of a VM and its assignment to a physical machine is
requested by a user and performed by the scheduler. So, upon receiving a user request for
VM creation, the scheduler decides which physical machine should receive the VM
creation order. For this decision, in addition to the idle part of the processor, the user
requests a certain amount of MIPS, which is the common way of measuring the
performance of a VM.
The method used in CloudSim's VmAllocationPolicySimple.java class is based on
the idle PEs, which are the Processing Elements available in each physical machine. This
class tries to ensure that no machine has all its PEs in use while the rest of the machines are
totally idle. A higher number of PEs in use means a higher CPU utilization on that
machine, and if that utilization surpasses the up_threshold, the power consumption of that
CPU rises, as the DVFS algorithm increases the voltage and frequency at which the CPU
works, increasing the energy consumed.
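The idea behind this policy can be sketched as follows. This is a simplified stand-in, not CloudSim's actual implementation; the array of free PEs is an illustrative representation of the hosts' state:

```java
// Hedged sketch of the idea behind CloudSim's VmAllocationPolicySimple:
// pick the host with the most free PEs so that load spreads evenly
// across machines instead of saturating one while others stay idle.
public class MostFreePesPolicy {
    public static int chooseHost(int[] freePes) {
        int best = 0;
        for (int i = 1; i < freePes.length; i++) {
            if (freePes[i] > freePes[best]) best = i; // most idle PEs wins
        }
        return best;
    }

    public static void main(String[] args) {
        int[] freePes = {2, 5, 3};             // free PEs per host (example)
        System.out.println(chooseHost(freePes)); // host 1 has the most free PEs
    }
}
```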
On the other hand, the scheduler may be intended to perform power-aware
scheduling in order to achieve the lowest possible energy consumption. In this second
scenario, the common algorithm, and the one used in WorkflowSim's
"PowerVmAllocationPolicySimpleWattPerMipsMetric.java" class, is based on looking for
the physical machine that can achieve the lowest energy consumption for the MIPS
required by the VM to be scheduled. This decision is based on an estimation of the energy
that would be consumed on each machine.
This algorithm can be implemented in different ways depending on the programmer.
The mentioned class defines a "watts per MIPS" metric. When the scheduler receives the
request for a VM, it estimates what the energy consumption of each physical machine
would be, based on the MIPS that the user requires for that particular VM.
Using the power model from section 3.1.1, the scheduler can estimate the energy
consumption based on the voltage and frequency of each of the machines and then request
the VM creation on the machine for which the estimate is lowest. As the static energy is a
fraction of the dynamic energy, a higher dynamic energy implies a higher static energy as
well. Although the static component is not estimated, being proportional to the dynamic
part, a machine with a higher estimated dynamic energy will also have a higher static part,
and therefore a greater total energy than the other machines. So the scheduler can rely on
this proportionality to estimate the overall consumption and decide which machine will
achieve the lowest.
3.3.3. Tasks scheduling
For the explanation of the energy estimation we rely on the energy model discussed
in [ 28 ]. Using an approach similar to the one explained in this document, they define the
energy consumption as the sum of a dynamic and a static component. Although they leave
the static component as a fraction of the dynamic part, the important part for understanding
this model is the dynamic component.
Defining this dynamic energy as proportional to the product of the voltage squared
and the total number of clock cycles of the CPU for that particular task, E_d = α·V²·N_cc,
if we take into consideration the dependence of the energy on the voltage alone we can
make an estimation of the energy consumption. Taking as a reference table 2 of [ 28 ], we
can show an example of how the scheduler takes this decision.
First of all, note that each task is defined by its length in MI and its deadline in
seconds, so the first step of this estimation is to determine which VMs can meet the
deadline. Each VM is defined by its performance, measured in MIPS. So, dividing the
task's length by the performance of each VM gives the time that each VM would take to
process that task:
t_max = length / performance, [MI / MIPS] = [MI / (MI/s)] = [s]
From this first step, the scheduler can discard the VMs whose processing time for the
scheduled task would surpass the user's deadline. Additionally, the scheduler can derive a
minimum value for the performance that the VMs must have:
perf_min = length / deadline, [MI / s] = [MIPS]
After this first step, the scheduler knows which VMs can process the task while meeting
the deadline. It then estimates which VM has the lowest energy consumption for that
task. So, for each VM, it calculates the dynamic energy based on the product of the
voltage squared and the time, i.e. taking into account the variable parts of the formula and
leaving out the constants.
Imagine a task that needs to be scheduled, with a length of 25000 MI and a deadline
of 5 s. Considering that there are 4 different VMs with the voltage and performance
parameters shown in Table 8, we can estimate a value proportional to the energy that
would be consumed in the execution of that task.
First of all, the task would need a minimum performance of perf_min = 25000 / 5 =
5000 MIPS to meet the user's deadline, so the first VM, having only 4000 MIPS, would
not be enough. Equivalently, the execution time for the first VM would be t_max =
25000 / 4000 = 6.25 s, which surpasses the 5 s deadline.
Having discarded the first VM, the rest of them exceed the minimum of 5000 MIPS, so all
of them could process the task within the time limit. Now the real question is: which of
these 3 would achieve the lowest energy consumption? In Table 8, power is approximated
as proportional to the voltage squared, time is obtained as the ratio between length and
performance, and the energy is calculated as the product of power and time. We can see
that the VM working at the lowest voltage achieves the lowest energy consumption, thanks
to the squared dependence on the voltage.
Voltage (V)   MIPS    Power (W)   Time (s)   Energy (J)
0.9           4000    0.81        6.25       (doesn't meet deadline)
1.1           6000    1.21        4.17       5.0457
1.3           8000    1.69        3.125      5.2813
1.5           10000   2.25        2.5        5.625
Table 8: Energy estimation example
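The deadline filter and the V²-based estimate of this example can be sketched as follows. The proportionality constant α is dropped, so the energy values are only relative, and the voltage/MIPS arrays are the example's values, not measured data:

```java
// Hedged sketch of the two-step decision: discard VMs that miss the
// deadline, then pick the lowest relative energy E ∝ V^2 · t.
public class EnergyEstimate {
    // Returns the index of the chosen VM, or -1 if none meets the deadline.
    public static int chooseVm(double taskMi, double deadlineS,
                               double[] volts, double[] mips) {
        int best = -1;
        double bestEnergy = Double.MAX_VALUE;
        for (int i = 0; i < mips.length; i++) {
            double time = taskMi / mips[i];        // t = length / performance
            if (time > deadlineS) continue;        // discard: misses deadline
            double energy = volts[i] * volts[i] * time; // E ∝ V^2 · t
            if (energy < bestEnergy) { bestEnergy = energy; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] volts = {0.9, 1.1, 1.3, 1.5};
        double[] mips  = {4000, 6000, 8000, 10000};
        // Task of 25000 MI with a 5 s deadline, as in Table 8.
        System.out.println(chooseVm(25000, 5, volts, mips)); // VM 1 (1.1 V)
    }
}
```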
3.3.4. Bag-of-tasks power aware scheduling
In this paper, Buyya et al. [ 28 ] define a scheduling method for a bag-of-tasks,
considering the scheduling of all tasks on the different VMs in order to obtain the lowest
possible power consumption. They consider the scheduling of bags of tasks, which they
call "jobs". Their energy model consists of the sum of the dynamic energy and the static
energy, considering the static part as a percentage of the dynamic part, following the power
model introduced in section 3.1.1.
When scheduling a task, a function checks which VM has the lowest energy
consumption. When checking this, it is important to note that several VMs may be able
to handle the task's execution, but the VM choice is made based on an estimation of
the energy consumption of the task on each VM.
Note that, as the power is related to the squared voltage and a reduction of the
MIPS implies a reduction of the voltage, the energy is reduced quadratically. So, when
selecting the VM for the task's execution, the lowest MIPS also means the lowest energy
consumption.
Each task is defined by its length, measured in Millions of Instructions (MI), and a
deadline, which is the maximum time that the execution of the task may take, fixed
by the Quality of Service (QoS) of the Service Level Agreement (SLA) between the user
and the cloud provider.
As an example, a task with a length of 25000 MI and a deadline of 5 s would need a
minimum performance of perf_min = 25000 / 5 = 5000 MIPS to meet the deadline
established in the SLA.
Considering E = P · t, with P proportional to V², Table 8 shows that in the
scheduling decision, the lowest MIPS that meets the deadline results in the lowest
energy consumption. Fewer MIPS means a longer time, but also a lower voltage. As power
is reduced with the square of the voltage, the energy product is reduced with the voltage
even though the time increases. So, the lowest MIPS that meets the deadline tends to give
the lowest energy consumption.
From this paper we learn that, by choosing the VM with the lowest MIPS, we tend to
obtain the lowest energy consumption due to the squared relation of the power with the
voltage. In the simulator there is no deadline parameter, but it is important to keep time in
mind, as the energy depends on it.
3.3.5. Classic schedulers adapted to power
As mentioned, we have adapted the Min-Min and Max-Min schedulers as a means of
comparison with the fuzzy logic schedulers. Here we analyze their behavior and problems.
We start by defining a sequential power-aware algorithm, which we have called
PowerAwareSeqSchedulingAlgorithm, in the package org.workflowsim.scheduling. Tasks
are scheduled in the same order they are received by the scheduler, hence the name
sequential. The algorithm finds the VM that achieves the lowest power consumption for
each task, in that sequential order, and assigns the execution to that VM. By default, only
one task can be executed on each VM at the same time, so VMs assigned to the first tasks
are kept busy until they finish executing them.
This introduces a problem. Consider two tasks of lengths l1 = 10000 MI and l2 =
1000 MI, and two VMs of performance p1 = 100 MIPS and p2 = 1000 MIPS. If task 1 is
received before task 2, the first task to be assigned is the longest one. It would be assigned
to the second virtual machine, leaving task 2 for the first VM. The parallel time of both
executions would be:
max(l1/p2, l2/p1) = max(10000/1000, 1000/100) = max(10, 10) = 10 s
However, if the second task is received before the first one:
max(l2/p2, l1/p1) = max(1000/1000, 10000/100) = max(1, 100) = 100 s
This shows the importance of sorting the received tasks before scheduling them, as big
tasks should be assigned to big VMs to avoid this situation.
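The sorting remedy can be sketched as follows. This is a hedged illustration of the idea, not the simulator's actual PowerAwareMaxMinSchedulingAlgorithm, and it uses makespan as the stand-in objective:

```java
import java.util.Arrays;

// Hedged sketch: pair the longest tasks with the fastest VMs, as the
// Max-Min adaptation does, and measure the resulting parallel time.
public class MaxMinSketch {
    public static double schedule(double[] taskMi, double[] vmMips) {
        double[] tasks = taskMi.clone();
        double[] vms   = vmMips.clone();
        Arrays.sort(tasks); // ascending length
        Arrays.sort(vms);   // ascending performance
        double makespan = 0;
        // longest task -> fastest VM, second longest -> second fastest, ...
        for (int i = 0; i < tasks.length; i++) {
            double t = tasks[tasks.length - 1 - i] / vms[vms.length - 1 - i];
            makespan = Math.max(makespan, t);
        }
        return makespan;
    }

    public static void main(String[] args) {
        double[] tasks = {1000, 10000};  // MI, from the example above
        double[] vms   = {100, 1000};    // MIPS
        System.out.println(schedule(tasks, vms)); // 10.0 s, not 100 s
    }
}
```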
This problem is shared with the Min-Min scheduler. The adaptation of this algorithm
has been named PowerAwareMinMinSchedulingAlgorithm and placed in the
org.workflowsim.scheduling package of the simulator. The second part of this scheduler acts
in the same way as the previous one, but differs in performing an initial sorting of the tasks
by length, from shortest to longest. This causes the shortest tasks to be assigned to the
biggest VMs, wasting resources and leading to the situation explained before.
The solution to this problem is found in Max-Min. Named
PowerAwareMaxMinSchedulingAlgorithm and stored in the org.workflowsim.scheduling
package, this adaptation sorts tasks by descending length, assigning the longest tasks first.
The problem described relates to time, but these schedulers work with power. As
each task consumes some power while being processed, the total sum also increases if the
scheduling is not done optimally. However, even though both objectives run into the same
problem, the same assignment configuration does not optimize both scenarios. A
configuration that optimizes time will not obtain the lowest power consumption in a
heterogeneous system, where machines with different types of processors are mixed.
3.3.6. Watts per MIPS scheduler for VMs
As introduced before, WorkflowSim already includes a scheduling algorithm that
allocates VMs to physical machines. This is the first phase mentioned, and this algorithm
considers power in the scheduling decision. Below, we introduce a short pseudo-code to
explain how it works.
Inside the class that defines the scheduler, there is a method called
allocateHostForVm, which is the one in charge of actually evaluating, deciding and
allocating the VM to a Host. It uses another method called MetricWattPerMips to help
estimate this metric on all Hosts. From the study of the algorithm, we get the following steps:
Selects the PowerHostList.
For each host:
Gets the number of Processing Elements (PEs).
For each PE:
1. Gets the IndexFrequency.
2. Gets the MIPS of that PE at the selected indexFreq.
3. Gets the power of the Host at the selected indexFreq considering maximum
utilization.
4. Gets the power of the Host at the selected indexFreq considering null utilization.
5. Gets the power of the current PE multiplying the maximum of the host by a ratio.
6. Gets the power of the current PE multiplying the minimum of the host by a ratio. The
ratio in both 5 and 6 is calculated by dividing the MIPS of the PE by the total MIPS
of the Host. If there is only 1 PE in the Host, both PE and Host powers will coincide.
7. Gets the utilization of the PE as a ratio of the total allocated MIPS and the MIPS of
this PE. At the beginning, when there are no VMs allocated in the current Host, the
total MIPS allocated is 0, making this ratio null, and so, the PE's utilization.
8. Gets the total MIPS of the Host, considering the indexFreq.
9. Gets the VMList in the current Host. For this list, calculates the total SumMaxMips
of all the VMs.
10. Finally, calculates the metric as the difference between the maximum and minimum
power of the PE multiplied by the ratio of the last SumMaxMips to the maximum
MIPS of the Host: (p_max − p_min) · (VMList_sumMaxMips / Host_mips)
The metric of the whole Host is the accumulated sum of the metrics of all its PEs.
When all PEs have been processed, the sum is divided by the number of PEs to obtain a
mean. After all the mean metrics are obtained, the algorithm chooses the minimum metric,
which corresponds to the machine with the lowest product of the utilization ratio and the
difference in power between maximum and minimum utilization.
To understand this metric we need to consider both parameters. If a host is at 0
utilization and the difference between PE_Pmax and PE_Pmin is low (understanding these
parameters as the power consumption at maximum and minimum utilization), then we can
be sure that allocating a VM to this Host would mean a small difference between the power
consumption before and after the allocation, so this Host would be a good candidate for the
VM allocation.
The second parameter to take into account is the current utilization, shown in the
equation as the ratio between the MIPS allocated and the total MIPS. If a host has no MIPS
allocated, it is completely idle, so allocating the VM to this host would be far from raising
its utilization to 100%. Again, we search for the Host with the minimum of this ratio, as we
try to avoid leaving Hosts idle while over-utilizing others.
So this metric is composed of these two parameters, and their multiplication gives
the final metric used in this algorithm.
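A simplified sketch of the per-PE metric of step 10 follows. The names and numbers are illustrative (the power values are borrowed from Table 7), not WorkflowSim's actual fields:

```java
// Hedged sketch of the step-10 metric: the host's (Pmax - Pmin) power
// swing multiplied by how loaded the host already is. Lower is better.
public class WattPerMipsMetric {
    public static double hostMetric(double peFullPower, double peIdlePower,
                                    double allocatedMips, double hostMips) {
        double powerSwing = peFullPower - peIdlePower; // first factor
        double loadRatio  = allocatedMips / hostMips;  // second factor
        return powerSwing * loadRatio;
    }

    public static void main(String[] args) {
        // An idle host (0 MIPS allocated) gets metric 0: best candidate.
        System.out.println(hostMetric(103.0, 83.25, 0, 1500));   // 0.0
        // A half-loaded host gets a higher (worse) metric.
        System.out.println(hostMetric(103.0, 83.25, 750, 1500)); // 9.875
    }
}
```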
Once the minimum metric is found, the corresponding Host is selected as the allocation
target, and the allocation of the VM to this Host proceeds.
Using this method, the scheduler is able to find a good allocation of the VMList to
the HostList. In summary, it takes each VM in sequential order and finds the host that
achieves the minimum amount of watts consumed for the MIPS performance of that VM.
Although this is a good scheduler, it suffers from the same problem as the
PowerAwareSeqSchedulingAlgorithm introduced in section 3.3.5: the host with the best
"watts per MIPS" ratio is used for the first VM in the VMList, and this VM may not be the
biggest one. So the host with the best ratio is wasted instead of being used for the VM with
the highest MIPS value, which would reduce the power consumed by that VM; this is a
similar problem to the one in the previous section.
As this scheduler is included in WorkflowSim, it provides a good method of
comparison with the fuzzy scheduler developed.
3.3.7. Fuzzy integration in WorkflowSimDVFS
Apart from the classical time scheduling algorithms, the addition of power models to
this simulator allows us to include power-aware scheduling algorithms with which we can
optimize the overall energy consumed in the different machines that compose the Datacenter.
However, classical power-aware algorithms tend to focus on optimization based on a
single calculated parameter. This is the case of Min-Min, which chooses the VM that can
process a Task in the shortest time without considering other important parameters such as
the utilization of the machine. Power-aware scheduling algorithms, in turn, tend to focus
on selecting the VM that can achieve the lowest energy consumption for the execution of a
certain Task, through an estimation of the energy that would be consumed on each of the
available VMs. This estimation is based on the power consumed at the current utilization
and frequency levels and on the time it would take to fully process the Task, obtained as
the ratio of the Task's length to the performance of the VM. While values like utilization,
MIPS and time are present in the energy calculation itself, they are not considered in the
final decision of which VM should process each Task; only the final energy value
calculated is.
There is an alternative that can make this decision based on multiple parameters: an
FRBS, in which the system considers multiple input parameters, called antecedents, and an
output parameter, called the consequent. This way, utilization, time, power, energy, MIPS
and any other relevant value can all take part in the decision. The consequent of the system
provides a degree of suitability of the current VM to process the Task: the higher the
consequent, the more suitable the VM. The FRBS is evaluated for the Task against each
available VM, and the VM with the highest consequent is selected to process it. The process
is repeated for every Task pending to be scheduled.
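The selection loop just described can be sketched as below. The FRBS itself is treated as a black box (a stand-in lambda replaces the real jFuzzyLogic inference), and the `Candidate` record and its fields are illustrative names, not the thesis' code.

```java
import java.util.function.ToDoubleFunction;

// Illustrative sketch of the FRBS-based selection: the rule system maps a
// VM's (normalized) antecedents to one defuzzified consequent, and the VM
// with the highest consequent wins. Names here are assumptions.
public class FrbsSelection {

    /** One candidate VM's antecedent vector (utilization, time, power, ...). */
    record Candidate(double[] antecedents) {}

    /** Returns the index of the candidate whose consequent is highest. */
    static int select(Candidate[] vms, ToDoubleFunction<double[]> frbs) {
        int best = 0;
        double bestScore = frbs.applyAsDouble(vms[0].antecedents());
        for (int i = 1; i < vms.length; i++) {
            double score = frbs.applyAsDouble(vms[i].antecedents());
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        Candidate[] vms = {
            new Candidate(new double[]{0.9, 0.8}),  // busy, slow
            new Candidate(new double[]{0.2, 0.3}),  // idle, fast
        };
        // Stand-in for the real jFuzzyLogic inference: lower inputs -> higher suitability.
        ToDoubleFunction<double[]> frbs = a -> 1.0 - (a[0] + a[1]) / 2.0;
        System.out.println(select(vms, frbs)); // 1 (the idle VM)
    }
}
```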
The process just explained corresponds to the second scheduling stage in
WorkflowSimDVFS. Tasks are processed in VMs, but the VMs must have been created
beforehand. The two stages involved in the scheduling of the system are:
1. VM scheduling: a number of VMs have to be created on the physical machines that
compose the Datacenter. This first stage decides which machine is the most suitable to
allocate each VM. Useful parameters are the utilization of each machine, the MIPS
requested and the availability, among others.
2. Task scheduling: a number of Tasks have to be executed on the VMs. This second stage
decides which VM is the most suitable to process each Task. Useful parameters are the
task's length, the execution time and the power consumed, among others.
An FRBS needs a number of antecedents and a consequent to work. The consequent
is naturally used to decide the machine on which to schedule the VM or task. As antecedents,
several parameters are considered to help evaluate the resources. In the simulator, two FRBS
are developed as schedulers, one for VM allocation and one for task allocation. The
antecedents used in each of them are different, as each needs parameters that take part in its
own scheduling decision, and those parameters differ between VMs and tasks.
In the simulator, we use jFuzzyLogic [39][40] to fuzzify the antecedents, hold the
rule base and defuzzify the consequent, so the decision is based both on the input values
taken from the simulator and on the stored rules. The rules are not chosen at random: the
final result depends on them, and random rules could yield a very poor scheduler. Instead,
we use Matlab to run a system based on the Pittsburgh approach that searches for a rule
configuration achieving the best possible result for both FRBS. The integration with Matlab
is explained in a later section.
Once the Matlab program has obtained the rule base that optimizes the energy
consumption, the rules are sent to the simulator, where the schedulers use them. The
simulator therefore needs an object to store the rule base until it is used, as well as a method
for obtaining each rule and passing it on to the jFuzzyLogic class, so that the FRBS has the
rules it needs to work properly. For this purpose, we have defined a number of classes in a
package named org.workflowsim.fuzzy. The classes added to WorkflowSimDVFS are:
- FuzzyVariable: expresses an antecedent or a consequent. Contains a name and the weight
of the selected membership function.
- FuzzyRule: expresses a group of antecedents and one consequent. Contains a list of
FuzzyVariables as the antecedent list and another FuzzyVariable as the consequent.
- FuzzyEntity: expresses the rule base. Contains the list of FuzzyRules that have to be
evaluated to obtain an output.
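A minimal sketch of these three storage classes is given below. The field choices (a name plus a numeric membership weight) follow the descriptions above, but the exact layout of the thesis' actual classes is not shown in the text, so this is an assumption.

```java
import java.util.List;

// Minimal sketch of the three rule-storage classes described above;
// the field layout is illustrative, not the thesis' exact code.
public class FuzzyModel {
    /** An antecedent or consequent: a variable name plus the selected MF's weight. */
    record FuzzyVariable(String name, double weight) {}

    /** One rule: a list of antecedents implying a single consequent. */
    record FuzzyRule(List<FuzzyVariable> antecedents, FuzzyVariable consequent) {}

    /** The whole rule base as received from Matlab. */
    record FuzzyEntity(List<FuzzyRule> rules) {}

    public static void main(String[] args) {
        FuzzyRule r = new FuzzyRule(
            List.of(new FuzzyVariable("utilization", 0.2),
                    new FuzzyVariable("power", 0.3)),
            new FuzzyVariable("selection", 0.9));
        FuzzyEntity base = new FuzzyEntity(List.of(r));
        System.out.println(base.rules().size()); // 1
    }
}
```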
Two rule bases are sent from Matlab to the simulator, one containing the rules for the
VM scheduler and another for the task scheduler. Each rule base received is stored in a
different object, as they are used by different schedulers. The parameters taken as
antecedents are explained in a separate section for each FRBS.
3.3.8. VM scheduling FRBS
We have created a package named “vms” inside the package
“org.workflowsim.fuzzy”, containing two classes. WorkflowSimVmsFUZZY is the class
that communicates with Matlab to receive the rule base and store it in an object of the class
FuzzyEntity. As the rule base is encoded in JSON format, this class is in charge of decoding
the data before storing it. For this scheduler we have considered 4 variables:
MIPS requested by the VM.
Total MIPS of the host (physical machine).
Utilization of the host.
Power consumed in the host.
These parameters need to be normalized so that their values do not fall outside the
range of the Membership Functions (MF). The class GeneralParametersVmsFUZZY is a
static class that stores the values necessary for the normalization. These values are:
MIPS
Minimum MIPS requested by VMs.
Maximum MIPS requested by VMs.
Total MIPS
Minimum MIPS of all hosts.
Maximum MIPS of all hosts.
Utilization
Minimum = 0.
Maximum = 1.
Power
Minimum power of all hosts.
Maximum power of all hosts.
The formula used to normalize is as follows:
normalized value = (parameter − min value) / (max value − min value) · max range of MF
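As a small worked example of this formula (the helper name and the sample figures are ours):

```java
// The normalization formula above as a helper. mfMaxRange is the upper
// bound of the membership function's universe of discourse (e.g. 1.0).
public class Normalizer {
    static double normalize(double value, double min, double max, double mfMaxRange) {
        return (value - min) / (max - min) * mfMaxRange;
    }

    public static void main(String[] args) {
        // A host offering 2000 MIPS, when hosts span 1000..3000 MIPS,
        // maps to the middle of a [0, 1] membership-function range.
        System.out.println(normalize(2000, 1000, 3000, 1.0)); // 0.5
    }
}
```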
3.3.9. Tasks scheduler FRBS
Similarly to the VMs, a package named “tasks” is created inside the package
“org.workflowsim.fuzzy”. The class WorkflowSimTasksFUZZY also communicates with
Matlab and stores the received rule base in a FuzzyEntity object, decoding the JSON data
first. In this case, 5 parameters are considered as antecedents:
MIPS of the VM.
Power consumed by the VM's host.
Length of the task.
Time spent on processing.
Energy consumed in the total execution.
Likewise, the class GeneralParametersTasksFuzzy stores the values needed to
normalize these antecedents:
MIPS
MinMIPS of all VMs.
MaxMIPS of all VMs.
Power
MinPower of all the hosts.
MaxPower of all the hosts.
Length
MinDaxLength (minimum length from all tasks within the DAG file).
MaxDaxLength.
Time
Minimum time = MinDaxLength / MaxMIPS.
Maximum time = MaxDaxLength / MinMIPS.
Energy
Minimum energy = MinPower · MinTime.
Maximum energy = MaxPower · MaxTime.
The same formula is used to normalize the parameters as in the VMs scheduler.
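The derived bounds listed above follow from the primary ones by pairing the most and least favourable MIPS/power values, as this sketch shows (the class name and the numbers in `main` are illustrative):

```java
// Sketch of the derived normalization bounds for the task scheduler:
// time and energy extremes come from combining the extreme MIPS, power
// and task-length values. Numbers below are illustrative.
public class TaskBounds {
    final double minTime, maxTime, minEnergy, maxEnergy;

    TaskBounds(double minMips, double maxMips, double minPower, double maxPower,
               double minDaxLength, double maxDaxLength) {
        minTime = minDaxLength / maxMips;   // shortest task on the fastest VM
        maxTime = maxDaxLength / minMips;   // longest task on the slowest VM
        minEnergy = minPower * minTime;
        maxEnergy = maxPower * maxTime;
    }

    public static void main(String[] args) {
        TaskBounds b = new TaskBounds(1000, 2000, 50, 100, 10000, 40000);
        System.out.println(b.minTime + " " + b.maxTime); // 5.0 40.0
    }
}
```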
3.3.10. Power model analytical
The initial environment found in the basic simulation class provides a homogeneous
scenario: all hosts are identical, with the same capabilities, and all VMs are identical too,
with the same amount of MIPS. This initial environment is not a good scenario for
scheduling. When deciding which VM will execute a certain task, there is little to choose
between identical VMs: no matter which VM the algorithm picks, the task will take the same
time to finish. Likewise, no matter which host holds the VM, since all hosts are the same,
the power and energy consumption will be identical. This problem therefore has to be solved.
Going in reverse order, we begin with the second stage, task scheduling. In a real
scenario, a group of VMs with different MIPS performance will be found. Modifying this is
rather simple, as the amount of MIPS a VM will have is configurable by the user. A simple
algorithm gives us a group of different VMs. The following pseudocode explains how it
works.
begin
set maxMips;
set minMips;
set vmNum;
set downScale = (maxMips - minMips) / vmNum;
for (i = 0; i < vmNum; i++)
    set mips[i] = maxMips - (vmNum - i) * downScale;
end for
end
With this, the VM group spans a range of MIPS from a minimum to a maximum
value, adapting to the number of VMs available.
Done this, we now have a heterogeneous scenario in which to schedule our tasks: we
can choose which VM will process each one, with different processing times per VM, since
a task of fixed length takes a different time to execute on VMs with different MIPS
performance.
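The pseudocode above translates directly into runnable code. One detail worth noting (visible when it is executed) is that with this formula the first VM gets exactly minMips while the last one gets maxMips − downScale, so the maximum itself is approached but never assigned:

```java
// Direct, runnable translation of the MIPS-spread pseudocode above.
public class VmMipsSpread {
    static double[] spread(double maxMips, double minMips, int vmNum) {
        double downScale = (maxMips - minMips) / vmNum;
        double[] mips = new double[vmNum];
        for (int i = 0; i < vmNum; i++) {
            mips[i] = maxMips - (vmNum - i) * downScale;
        }
        return mips;
    }

    public static void main(String[] args) {
        double[] m = spread(1500, 500, 4);
        System.out.println(java.util.Arrays.toString(m)); // [500.0, 750.0, 1000.0, 1250.0]
    }
}
```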
However, moving on to the first stage, we have the problem of the identical hosts.
The first scheduler encounters a similar problem, but with a more difficult solution.
The power model from WorkflowSim models a single type of processor, with two arrays of
power values for null and full utilization, as explained. With a single processor type, the
power consumption cannot be varied, hence the homogeneous scenario of identical hosts
with the same power consumption. Obtaining a group of hosts with different power requires
another, more configurable power model.
Therefore, we have designed a power model named “PowerModelAnalytical”, placed
in the “org.cloudbus.cloudsim.power.model” package. As its name indicates, it uses an
analytical model: multiple parameters are used to calculate two basic values, the power
consumption and the MIPS performance of the host. For this purpose, we use two main
formulas: the dynamic power equation already explained, and another equation to calculate
the MIPS performance of the host. Below we show them and indicate which parameters are
needed for each calculation.
Pd = kd · C · V² · f ( 9 )
This calculates the maximum dynamic power consumed by the processor. Its
parameters express maximum values.
kd: dynamic constant, related to the number of active gates of the processor.
C: capacitance of the processor.
V: maximum voltage of the processor.
f: maximum frequency of the processor.
P = Pd / (1 − ks) ( 10 )
This gets the total power consumed, the sum of the dynamic and static components.
ks: static constant, the fraction of the total power not consumed by the processor.
MIPS = f · IPC / 10⁶ ( 11 )
This obtains the performance of the processor from the frequency and the Instructions
per Cycle (IPC). The IPC depends on the processor, and by indicating it we can correctly
model the MIPS performance of the processor.
f: maximum frequency of the processor.
IPC: Instructions per Cycle.
An additional array named percentages, taken from the main file, is passed in. This
array contains the different frequency multipliers, expressed as percentages of the maximum
frequency. Using it, we obtain the power consumption and MIPS for each frequency value.
This adapts the power model to the simulator while keeping the rest of it working with the
null and full power arrays, so nothing outside this file is modified.
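Equations (9)–(11) applied at each multiplier can be sketched as below. The class layout is illustrative (not the simulator's PowerModelAnalytical code); the constants and the host-1 parameters are the ones quoted later in section 4.2.1, under which the model conveniently yields 1500 MIPS, the value used in the DVFS scenario.

```java
// Sketch of the PowerModelAnalytical calculations: equations (9)-(11)
// evaluated at each frequency multiplier. Class layout is illustrative.
public class AnalyticalPower {
    /** Eq. (9): dynamic power Pd = kd * C * V^2 * f. */
    static double dynamicPower(double kd, double c, double v, double f) {
        return kd * c * v * v * f;
    }

    /** Eq. (10): total power P = Pd / (1 - ks). */
    static double totalPower(double pd, double ks) {
        return pd / (1.0 - ks);
    }

    /** Eq. (11): MIPS = f * IPC / 1e6. */
    static double mips(double f, double ipc) {
        return f * ipc / 1e6;
    }

    public static void main(String[] args) {
        double kd = 0.5, ks = 0.7, c = 0.5e-8, ipc = 0.5; // host 1, section 4.2.1
        double fMax = 3e9, vMax = 3.0;
        for (double pct : new double[]{59.925, 69.93, 79.89, 89.89, 100.0}) {
            double f = fMax * pct / 100.0, v = vMax * pct / 100.0;
            double p = totalPower(dynamicPower(kd, c, v, f), ks);
            System.out.printf("%.3f%% -> %.1f W, %.0f MIPS%n", pct, p, mips(f, ipc));
        }
    }
}
```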
Now we are able to obtain a group of hosts with different maximum MIPS and power
consumption, values that were fixed in the previous power model. With this, we have a
heterogeneous scenario for both VM and task scheduling, and we are able to test our fuzzy
schedulers.
3.3.11. Integration with Matlab
The next step is to use Matlab to obtain the rule base for each scheduler. To integrate
the simulator with Matlab, it is exported into a jar file. This file is used from Matlab to
evaluate the rules produced by the Pittsburgh algorithm [ 41 ]. They communicate by
encoding the rule base in JSON format and passing it to the simulator jar file as a
command-line argument. The main class of the simulator then receives the arguments and
uses the fuzzy classes explained earlier to decode and store the rules.
The Pittsburgh approach is widely known and used. Here we use pseudocode to
explain the important parameters involved and how the algorithm uses them to obtain the
final rule base. The antecedents have three membership functions each, while the consequent
has five of them to provide a more precise output. Note that each member of the population
is a complete rule base, and the evaluation consists in running the simulator with each of
them. The typical Pittsburgh algorithm proceeds as follows:
initialize population;
evaluate population, get fitness;
sort by fitness;
loop
    1) select parents for crossover; avoid pure elitism by
       making the selection probability depend on fitness;
    2) cross the parents to obtain children rule bases;
    3) mutate the children to improve exploration;
    4) evaluate the children, join them with the parents
       and sort by fitness;
end loop
At the end of each loop iteration, the joint array of parents and children is sorted, so at
the end of the execution the first position of the array contains the rule base that achieves the
lowest energy consumption.
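A compact, runnable skeleton of this loop is sketched below. It is not the thesis' Matlab implementation: each individual is reduced to a flat array of membership-function indices, the simulator run is replaced by a fitness lambda, and for brevity it uses a steady-state variant (each generation a child replaces the worst individual) rather than a full generational join.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Skeleton of the Pittsburgh loop above. Each individual is a whole rule
// base (here just an int[] of MF indices, 0..4 for the five consequent
// MFs); the fitness lambda stands in for a simulator run returning energy.
public class Pittsburgh {
    static int[][] evolve(int popSize, int ruleLen, int generations,
                          ToDoubleFunction<int[]> energy, Random rnd) {
        // initialize and evaluate the population
        int[][] pop = new int[popSize][ruleLen];
        for (int[] ind : pop)
            for (int g = 0; g < ruleLen; g++) ind[g] = rnd.nextInt(5);
        sortByFitness(pop, energy);
        for (int gen = 0; gen < generations; gen++) {
            // 1) rank-biased parent selection (better ranks favoured, never only the elite)
            int[] p1 = pop[rnd.nextInt(popSize / 2)];
            int[] p2 = pop[rnd.nextInt(popSize)];
            // 2) one-point crossover
            int cut = 1 + rnd.nextInt(ruleLen - 1);
            int[] child = new int[ruleLen];
            System.arraycopy(p1, 0, child, 0, cut);
            System.arraycopy(p2, cut, child, cut, ruleLen - cut);
            // 3) mutation, to improve exploration
            child[rnd.nextInt(ruleLen)] = rnd.nextInt(5);
            // 4) evaluate: reinsert replacing the worst, then re-sort
            pop[popSize - 1] = child;
            sortByFitness(pop, energy);
        }
        return pop; // pop[0] is the best rule base found
    }

    static void sortByFitness(int[][] pop, ToDoubleFunction<int[]> energy) {
        Arrays.sort(pop, Comparator.comparingDouble(energy::applyAsDouble));
    }

    public static void main(String[] args) {
        // Toy fitness: "energy" is the sum of genes, so the optimum is all zeros.
        ToDoubleFunction<int[]> energy = ind -> Arrays.stream(ind).sum();
        int[][] pop = evolve(10, 8, 500, energy, new Random(42));
        System.out.println(energy.applyAsDouble(pop[0]));
    }
}
```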
Below we introduce the Matlab files that run the Pittsburgh algorithm, obtain the rule
base and send it to the simulator.
- Pittsburgh algorithm: the main file to execute. There are two of them, one for VMs and
another for tasks.
- WorkflowSimDVFS: called by the Pittsburgh algorithm when evaluating the population.
This file indicates the jar file containing the simulator, so the path has to be adapted to
the actual path on the computer. It builds the command line that calls the simulator,
sending the rule base as an argument.
- encode: takes the rule base matrix as an input parameter and encodes it in JSON format.
The output string is used in the command line built by the previous file.
- executeWorkflowSimDVFS: takes the command line and executes it as a system
command. The simulator is called here and its result is returned.
- decode: takes the result from the simulator, which contains the resulting execution time
and energy consumption separated by a semicolon (;). The main file containing the
Pittsburgh algorithm uses the second value, as the fitness considered is the energy
consumption.
- orderSelection: to avoid elitism in the Pittsburgh algorithm, the selection probability of
the parents is based on their fitness. This function is used after the parents have been
selected and decides the order of crossing between them, following a linear order.
These files are divided into two folders, separating the task and VM algorithms. Using
both Matlab programs, we obtain learning algorithms for both schedulers. These algorithms
are used to find an optimum rule set for each scheduler, allowing the simulator to reach an
optimum result in terms of energy saving.
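To illustrate the hand-off, the following sketch serializes a rule matrix (one row per rule, one column per variable's MF index) into a JSON string that could be passed to the simulator jar as an argument. The JSON field names and the command shown in the comment are assumptions for illustration; the thesis does not specify the exact message format.

```java
// Illustrative sketch of the Matlab->simulator hand-off: a rule matrix is
// serialized to JSON and passed as a command-line argument to the jar.
// Field names here are assumptions, not the thesis' actual format.
public class RuleBaseJson {
    static String encode(int[][] rules) {
        StringBuilder sb = new StringBuilder("{\"rules\":[");
        for (int r = 0; r < rules.length; r++) {
            if (r > 0) sb.append(',');
            sb.append('[');
            for (int c = 0; c < rules[r].length; c++) {
                if (c > 0) sb.append(',');
                sb.append(rules[r][c]);
            }
            sb.append(']');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String json = encode(new int[][]{{0, 2, 1}, {1, 1, 4}});
        System.out.println(json); // {"rules":[[0,2,1],[1,1,4]]}
        // The Matlab side would then run something like (hypothetical command):
        //   java -jar WorkflowSimDVFS.jar '<json>'
        // and parse the "time;energy" pair back from the process output.
    }
}
```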
4. Results and discussion
In this section we show all the results obtained in the two stages of this project. To
understand the full procedure behind these results, please read section 3 of this document.
We also analyze the results and compare the different techniques that produce them.
4.1. DVFS results
This section shows the results obtained from the experiments run in the simulator using
the DVFS algorithm. We divide these results in two parts. The first part shows the savings
in time, power and energy obtained by using this algorithm. The second part analyzes the
evolution of three other parameters of the simulator considered in the DVFS algorithm,
namely utilization, multiplier and power.
4.1.1. DVFS savings
Here we show the results obtained from applying different governors with the DVFS
technique. The topology used in the basic main file comprises 20 hosts, each consisting of 1
PE of 1500 MIPS. On these machines, 20 VMs are created, each with a performance of 1000
MIPS. The scenario is left as the default one, with all machines and VMs having the same
performance. This is not a suitable scenario for optimizing the schedulers, as there is little
choice between machines with identical resources. However, as the aim here is to show the
functioning of the DVFS algorithm, the only important parameters are the maximum MIPS
and the different frequency multiplier values. It is important to note that the maximum MIPS
performance cannot be set to an arbitrary value if we want to observe the algorithm's
behavior. With a value too high, even with the multiplier scaled down to its minimum the
utilization would never surpass the threshold, and the system would simply remain at the
lowest frequency saving energy, hiding the DVFS behavior. If, on the contrary, the
maximum MIPS is set too low with respect to the MIPS of the VM, the multiplier could
never be scaled down, as the utilization would be too close to the threshold even at the
maximum multiplier. With the values indicated above, we obtain a balance of these
parameters that allows us to check the DVFS algorithm's behavior.
The simulator has been tested using twelve different workflows of three types, named
Montage, Sipht and Inspiral. There are four workflows of each type, with an increasing
number of tasks to compute. Each of the twelve scenarios is simulated
using the governors Performance, PowerSave, OnDemand and Conservative, leaving their
parameters at default values. The UserSpace governor has not been included in the
comparison, as its behavior depends on the specified frequency and on the number of
multipliers available in each CPU. For each governor, four parameters have been taken: time,
overall power consumption, average power consumption and energy consumption. For each
of these parameters, we have computed the saving percentage of the PowerSave, OnDemand
and Conservative governors with respect to the Performance governor, which represents the
maximum consumption without delaying the total simulation time.
Simulation times are shown in Table 9. As expected, the time needed for each
workflow type rises with the number of tasks. As can be seen, the OnDemand and
Conservative governors do not delay the total simulation time, as they scale the frequency
up to its maximum when the up_threshold is surpassed. PowerSave, however, reaches 100%
of the CPU's utilization at the base frequency and remains there, computing less data per
unit time than at a higher frequency. This causes a delay of tasks that would not occur if the
frequency were scaled up.
Time summary (s)
Governors % savings
DAX Perform PowSv OnDem Cons PowSv OnDem Cons
Mont_25 57,72 63,03 57,72 57,73 -9,20 0,00 -0,02
Mont_50 83,14 90,68 83,14 83,15 -9,07 0,00 -0,01
Mont_100 125,39 137,05 125,39 125,40 -9,30 0,00 -0,01
Mont_1000 1079,27 1182,19 1079,27 1079,27 -9,54 0,00 0,00
Sipht_30 4448,43 4950,44 4448,43 4448,44 -11,29 0,00 0,00
Sipht_60 4681,28 5209,60 4681,28 4681,28 -11,29 0,00 0,00
Sipht_100 4519,47 5029,53 4519,47 4519,47 -11,29 0,00 0,00
Sipht_1000 11363,74 12648,20 11363,74 11363,74 -11,30 0,00 0,00
Inspiral_30 1344,36 1496,35 1344,36 1344,36 -11,31 0,00 0,00
Inspiral_50 1420,24 1580,85 1420,24 1420,24 -11,31 0,00 0,00
Inspiral_100 1592,32 1771,44 1592,32 1592,32 -11,25 0,00 0,00
Inspiral_1000 11888,48 13229,81 11888,48 11888,48 -11,28 0,00 0,00
Table 9: Time summary (s)
Figure 12 shows more clearly than the table how the time needed tends to be higher
with the PowerSave governor. This difference grows with the time needed, so it is larger for
Sipht_1000 and Inspiral_1000.
Figure 12: Time summary (s)
Figure 13 shows this difference better, expressed as a negative time saving. The more
complex workflows, Sipht and Inspiral, have a bigger time difference and hence a larger
negative saving for this governor.
Figure 13: Time savings (%)
In Table 10 we can see the overall power needed for the Datacenter to process all of
the workflow's tasks.
Overall power summary (W)
Governors % savings
DAX Perform PowSv OnDem Cons PowSv OnDem Cons
Mont_25 1,11E+05 1,12E+05 1,10E+05 1,10E+05 -0,53 1,24 1,22
Mont_50 1,60E+05 1,61E+05 1,59E+05 1,58E+05 -0,42 0,75 1,22
Mont_100 2,42E+05 2,43E+05 2,38E+05 2,39E+05 -0,63 1,36 1,22
Mont_1000 2,08E+06 2,10E+06 2,06E+06 2,06E+06 -0,85 1,06 1,23
Sipht_30 8,58E+06 8,79E+06 8,47E+06 8,47E+06 -2,46 1,28 1,23
Sipht_60 9,03E+06 9,25E+06 8,91E+06 8,92E+06 -2,46 1,29 1,23
Sipht_100 8,72E+06 8,93E+06 8,60E+06 8,61E+06 -2,46 1,28 1,23
Sipht_1000 2,19E+07 2,25E+07 2,16E+07 2,16E+07 -2,47 1,28 1,23
Inspiral_30 2,59E+06 2,66E+06 2,56E+06 2,56E+06 -2,48 1,29 1,23
Inspiral_50 2,74E+06 2,81E+06 2,70E+06 2,70E+06 -2,48 1,28 1,23
Inspiral_100 3,07E+06 3,14E+06 3,03E+06 3,03E+06 -2,42 1,28 1,23
Inspiral_1000 2,29E+07 2,35E+07 2,26E+07 2,26E+07 -2,46 1,29 1,23
Table 10: Overall power summary (W)
Again, as in the time case, Figure 14 shows this comparison with a similar shape. The
overall power needed grows in more complex scenarios with more tasks to compute. It also
shows that, as the overall power accumulates over the whole simulation, the longer the
simulation, the larger the power consumption.
Figure 14: Overall power summary (W)
Figure 15 shows a saving similar to that of time, since this overall power depends on
time. This means that the PowerSave governor, as it takes longer to finish the full execution,
needs more overall power.
Figure 15: Overall power savings (%)
If we calculate the average power needed for the execution, normalizing by time, then
Table 11 confirms this dependence of overall power on time. Once the power is normalized
by the time needed, there are no major changes in these values.
Avg power summary (W)
Governors % savings
DAX Perform PowSv OnDem Cons PowSv OnDem Cons
Mont_25 19,28 17,75 19,04 19,05 7,93 1,24 1,23
Mont_50 19,28 17,75 19,14 19,05 7,93 0,75 1,23
Mont_100 19,28 17,75 19,02 19,05 7,93 1,36 1,23
Mont_1000 19,28 17,75 19,08 19,05 7,93 1,06 1,23
Sipht_30 19,28 17,75 19,04 19,05 7,93 1,28 1,23
Sipht_60 19,28 17,75 19,04 19,05 7,93 1,29 1,23
Sipht_100 19,28 17,75 19,04 19,05 7,93 1,28 1,23
Sipht_1000 19,28 17,75 19,04 19,05 7,93 1,28 1,23
Inspiral_30 19,28 17,75 19,03 19,05 7,93 1,29 1,23
Inspiral_50 19,28 17,75 19,04 19,05 7,93 1,28 1,23
Inspiral_100 19,28 17,75 19,04 19,05 7,93 1,28 1,23
Inspiral_1000 19,28 17,75 19,04 19,05 7,93 1,29 1,23
Table 11: Avg power summary (W)
Looking at Figure 16, the governor that needs the least average power is
PowerSave.
Figure 16: Avg power summary (W)
Figure 17 also shows that the largest savings correspond to this governor. While the
OnDemand and Conservative governors only achieve average power savings of about 1.2%,
PowerSave reaches around 8%.
Figure 17: Avg power savings (%)
However, average power does not account for time. When we calculate the energy
consumed in Table 12, knowing that E = P · t, a longer time means a larger energy.
Energy summary (Wh)
Governors % savings
DAX Perform PowSv OnDem Cons PowSv OnDem Cons
Mont_25 30,92 31,08 30,53 30,54 -0,53 1,24 1,22
Mont_50 44,53 44,72 44,20 43,99 -0,42 0,75 1,22
Mont_100 67,17 67,59 66,25 66,34 -0,63 1,36 1,22
Mont_1000 578,10 583,01 571,95 570,98 -0,85 1,06 1,23
Sipht_30 2382,79 2441,35 2352,31 2353,45 -2,46 1,28 1,23
Sipht_60 2507,52 2569,16 2475,29 2476,63 -2,46 1,29 1,23
Sipht_100 2420,84 2480,36 2389,79 2391,02 -2,46 1,28 1,23
Sipht_1000 6086,96 6237,57 6008,76 6011,99 -2,47 1,28 1,23
Inspiral_30 720,10 737,94 710,80 711,23 -2,48 1,29 1,23
Inspiral_50 760,75 779,61 751,04 751,37 -2,48 1,28 1,23
Inspiral_100 852,92 873,60 842,04 842,41 -2,42 1,28 1,23
Inspiral_1000 6368,04 6524,40 6286,17 6289,60 -2,46 1,29 1,23
Table 12: Energy summary (Wh)
Figure 18 once more shows a shape similar to that of Figures 12 and 13.
Figure 18: Energy summary (Wh)
This can be seen more easily in Figure 19: using the PowerSave governor means a
greater energy consumption than using either of the two threshold-dependent governors.
Figure 19: Energy savings (%)
After this comparison, we can see that using fixed-frequency governors is not the best
option. Either of the two threshold-dependent governors behaves dynamically: it adjusts the
frequency to a higher value when needed, avoiding any delay of the total time, but scales it
down when possible to save energy. At first sight, the PowerSave governor could seem the
best option for energy saving, as it restricts the frequency to the lowest multiplier. It is indeed
the governor that achieves the lowest average power, but all experiments take a certain
amount of time, which means that energy matters more than power. As PowerSave delays
the execution, its product of power and time exceeds the energy consumption of the dynamic
governors.
4.1.2. DVFS parameters evolution
We now analyze the evolution of three parameters of the simulator: the utilization of
the machine, the value of the frequency multiplier and the power consumption, all three as
a function of time. The purpose of showing these parameters is to justify the proper
functioning of the DVFS algorithm. To keep the amount of data in the graphs manageable,
we have used the basic Montage_25 workflow. A static governor makes no adjustments to
the frequency multiplier, so the power would only vary with the utilization of the machine;
to show the frequency adjustment we have to use a dynamic governor. In this example, the
simulation uses the on_demand governor with its two parameters at the default
configuration: the threshold set to 95% and the sampling down factor set to 100 iterations.
The algorithm works as follows. There is a single threshold, which acts as both
up_threshold and down_threshold; however, to act as a down_threshold it must first count
a number of iterations equal to the configured sampling down factor. Whenever the
utilization of the system rises above the threshold (in this case, above 0.95), since the system
is near 100% utilization and that may delay the processing of tasks, the multiplier is scaled
up to its highest value, setting the performance to its maximum. Once this happens, the
algorithm counts a number of iterations equal to the sampling down factor and then checks
the utilization. If the utilization is below the threshold, the multiplier descends one level.
This is repeated until the utilization rises above the threshold again, at which point the
multiplier returns to its maximum and the cycle begins anew.
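The logic just described can be sketched as a small state machine. This is an illustrative reconstruction of the governor's behavior as explained above, not the simulator's actual DVFS code; the parameter values in `main` are the defaults quoted in the text.

```java
// Sketch of the on_demand logic described above: one threshold acts as
// both up- and down-trigger, but stepping down requires samplingDownFactor
// consecutive iterations below the threshold. Illustrative, not the
// simulator's actual code.
public class OnDemandGovernor {
    final double threshold;         // e.g. 0.95
    final int samplingDownFactor;   // e.g. 100 iterations
    final int maxMultiplier;        // e.g. 4
    int multiplier;
    int iterationsBelow = 0;

    OnDemandGovernor(double threshold, int samplingDownFactor, int maxMultiplier) {
        this.threshold = threshold;
        this.samplingDownFactor = samplingDownFactor;
        this.maxMultiplier = maxMultiplier;
        this.multiplier = maxMultiplier;
    }

    /** Called once per simulation step with the current CPU utilization. */
    int step(double utilization) {
        if (utilization > threshold) {
            multiplier = maxMultiplier;        // jump straight to full speed
            iterationsBelow = 0;
        } else if (++iterationsBelow >= samplingDownFactor) {
            if (multiplier > 1) multiplier--;  // step down one level
            iterationsBelow = 0;
        }
        return multiplier;
    }

    public static void main(String[] args) {
        OnDemandGovernor g = new OnDemandGovernor(0.95, 100, 4);
        for (int i = 0; i < 100; i++) g.step(0.60);   // 100 idle iterations
        System.out.println(g.multiplier);             // 3 (stepped down once)
        g.step(0.99);                                 // utilization spike
        System.out.println(g.multiplier);             // 4 (back to maximum)
    }
}
```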
In this example, we can see how at the beginning the multiplier is set to its maximum
of 4. The current task load produces a utilization of 60% and a power near 95 W. After 100
iterations, the utilization is checked; as its value is just 60%, lower than 95%, the multiplier
is reduced to 3. The process repeats until the multiplier is reduced to 1, when the utilization
rises above 0.95 and the DVFS algorithm sets the multiplier back to 4. Note that, during the
period in which the multiplier is below 4, the power consumption is reduced. This does not
occur if the selected governor is Performance, which leaves the multiplier at its maximum
and wastes power.
Figure 20 shows the utilization of one machine, assuming the same load on the
machine during the full simulation. Since a lower multiplier means fewer MIPS, the
utilization increases each time the multiplier is scaled down. However, the power
consumption decreases, so lowering the multiplier when the system is idle is a way of
reducing the overall power consumption of the simulation.
Figure 20: Utilization evolution
Figure 21 shows the evolution of the frequency multiplier. As explained, it is set to its
maximum whenever the utilization exceeds the threshold, and scaled down to save power
when it drops below, taking the sampling down factor into account. Scaling the consumption
and performance of the system in this way is a double-edged sword. A high multiplier means
more MIPS, which guarantees no unnecessary delay in task execution, but the power
consumption is higher and may be wasted if the utilization is not high enough. A lower
multiplier saves energy, but the system must make sure the utilization does not reach 100%
at a low multiplier, as that would delay the tasks unnecessarily.
Figure 21: Multiplier evolution
Figure 22 shows the different values of power each time the multiplier changes. Again,
the objective is to keep the overall energy as low as possible while never allowing
unnecessary delays, as explained.
Figure 22: Power evolution
With this merged simulator we obtain a base for future research on optimizing energy
consumption while considering real workflows and remaining power-aware. We can also
choose different types of DVFS governors and compare the simulator's behavior when
optimizing, knowing the advantages of the dynamic governors.
4.2. FRBS results
As stated, the second stage of this project comprises the development of two schedulers,
both based on an expert system. As in the DVFS results section, we divide this section into
two parts. The first part shows the configuration of the simulator with respect to the
simulation scenario, with the characteristics of the hosts, VMs and tasks to be executed. In
the second part, we show the results of the experiments obtained using these FRBS
schedulers in combination with other scheduling algorithms, and analyze their efficiency.
4.2.1. FRBS simulation scenario
In this section we show the configuration of the elements of the simulator, including
characteristics of physical hosts, VMs and tasks to execute.
The scenario of these experiments contains 20 physical hosts, as in the DVFS scenario.
However, since we are now considering scheduling and the default power model of the base
simulator does not allow modeling different types of processors, with the default power
model all physical machines would have the same characteristics, offering no real choice of
which physical machine should host each VM to be created. To solve this, an alternative
power model has been added to the simulator, as explained in section 3.3.10. This power
model requires configuring multiple parameters, but allows having physical machines with
different resources and performance, providing the experimental scenario that we need.
Table 13 displays the parameters that need to be configured for each physical host.
In addition to those parameters, three others are required, which are set to the same
value in all hosts in this scenario. These additional parameters are the dynamicConstant,
set to 0.5 and required in equation (3) to calculate the dynamic component of the power
consumed by each host; the staticConstant, set to 0.7 and required to calculate the static
component and the total power; and the array of percentages, which contains the
multipliers expressed as percentages of the maximum values. This array has been set to
{59.925, 69.93, 79.89, 89.89, 100.0} in all hosts. Applying these percentages to
maxFrequency and maxVoltage yields the frequency and voltage values used when the
processor multiplier is scaled down from the maximum.
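The multiplier arithmetic above can be sketched as follows. This is an illustrative Python fragment only (the simulator itself is Java-based), and the names used here are not simulator identifiers:

```python
# Per-multiplier frequency and voltage values, derived from the percentages
# array that is common to all hosts in this scenario.
PERCENTAGES = [59.925, 69.93, 79.89, 89.89, 100.0]

def scaled_values(max_value, percentages=PERCENTAGES):
    """Apply each multiplier percentage to a maximum value."""
    return [max_value * p / 100.0 for p in percentages]

# Host 1 of Table 13: maxFrequency = 3 GHz, maxVoltage = 3 V
frequencies_ghz = scaled_values(3.0)  # lowest DVFS step: 1.79775 GHz
voltages_v = scaled_values(3.0)
```

The same helper applied to each host's maxFrequency and maxVoltage produces the per-host frequency and voltage arrays listed among the calculated parameters below.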
Values shown in Table 13 are:
1. Maximum value of frequency for the processor.
2. Maximum value of voltage of the power supply.
3. Capacitance of the processor.
4. Instructions per cycle, used to get the value of MIPS performance.
Host maxFreq (GHz) maxVolt (V) Capt (×10⁻⁸ F) IPC (Inst / Cycle)
1 3 3 0.5 0.5
2 2.9 2.95 0.625 0.575
3 2.8 2.9 0.75 0.65
4 2.7 2.85 0.875 0.725
5 2.6 2.8 1 0.8
6 2.5 2.75 1.125 0.875
7 2.4 2.7 1.25 0.95
8 2.3 2.65 1.375 1.025
9 2.2 2.6 1.5 1.1
10 2.1 2.55 1.625 1.175
11 2 2.5 1.75 1.25
12 1.9 2.45 1.875 1.325
13 1.8 2.4 2 1.4
14 1.7 2.35 2.125 1.475
15 1.6 2.3 2.25 1.55
16 1.5 2.25 2.375 1.625
17 1.4 2.2 2.5 1.7
18 1.3 2.15 2.625 1.775
19 1.2 2.1 2.75 1.85
20 1.1 2.05 2.875 1.925
Table 13: physical hosts’ configuration input parameters
The previous values are the input parameters, the values that need to be
configured in the simulator for each host. From these values, the power model derives
another set of parameters through the equations shown in sections 3.1.1 and 3.3.10. The
calculated values are:
1. Frequencies array, containing the frequency corresponding to each multiplier.
2. Voltages array, containing the different values of the power supply depending on
the multiplier.
3. DynamicPower array, indicating the dynamic component of the power consumed
by the host, the part corresponding to the processor, applied to the different values
of the multiplier.
4. StaticPower, the component of the power that does not depend on the processor.
5. idlePower array, containing the different values of the static power depending on
the multiplier.
6. maxPower, the maximum total power that the host can consume.
7. fullPower array, with the different values of the max power applying the different
multipliers.
8. maxMIPS, indicating the maximum performance that the host can achieve.
9. Mips array, considering the different multipliers applied to the maxMIPS.
To avoid listing the full arrays for each host, Table 14 shows the maximum values of
dynamicPower, staticPower, totalPower and MIPS. The calculated parameters cover a
wide range of values, producing a heterogeneous scenario suitable for scheduling
experiments.
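The derivation of these parameters can be sketched as follows. The formulas are inferred from the equations referenced in sections 3.1.1 and 3.3.10 and reproduce the first rows of Table 14; Python is used here for illustration only, and the function name is not part of the simulator:

```python
# Common scenario constants (see the configuration description above).
DYNAMIC_CONSTANT = 0.5
STATIC_CONSTANT = 0.7

def derived_params(max_freq_ghz, max_volt, capacitance, ipc):
    """Derive a host's maximum power components and MIPS from Table 13 inputs."""
    freq_hz = max_freq_ghz * 1e9
    # Dynamic component: a * C * V^2 * f, as in equation (3)
    dyn_power = DYNAMIC_CONSTANT * capacitance * max_volt ** 2 * freq_hz
    # The static component is a fixed fraction (staticConstant) of the total
    total_power = dyn_power / (1.0 - STATIC_CONSTANT)
    static_power = STATIC_CONSTANT * total_power
    # Performance: instructions per cycle times the frequency in MHz
    max_mips = ipc * max_freq_ghz * 1000
    return dyn_power, static_power, total_power, max_mips

# Host 1 of Table 13: 3 GHz, 3 V, 0.5e-8 F, 0.5 IPC
dyn, stc, tot, mips = derived_params(3.0, 3.0, 0.5e-8, 0.5)
# -> 67.5 W dynamic, 157.5 W static, 225 W total, 1500 MIPS (first row of Table 14)
```
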
Host dynPow (W) stcPow (W) totPow (W) maxMIPS
1 67.5 157.5 225 1500
2 78.87 184 262.89 1667
3 88.3 206 294.35 1820
4 95.94 223.87 319.82 1957
5 101.92 237.81 339.73 2080
6 106.35 248.14 354.5 2187
7 109.35 255.15 364.5 2280
8 111.04 259.1 370.14 2357
9 111.54 260.26 371.8 2420
10 110.95 258.88 369.83 2467
11 109.37 255.21 364.58 2500
12 106.92 249.48 356.4 2517
13 103.68 241.92 345.6 2520
14 99.75 232.75 332.5 2507
15 95.22 222.18 317.4 2480
16 90.17 210.41 300.58 2437
17 84.7 197.63 282.33 2380
18 78.87 184.03 262.9 2307
19 72.76 169.78 242.55 2220
20 66.45 155.05 221.51 2117
Table 14: physical hosts’ configuration calculated parameters
For the VMs, we again create 20 of them, allocating one per physical host. In this
case, there is no problem in selecting the MIPS performance of each VM, as a virtual
machine is a software program running on the host and is fully configurable.
However, for the DVFS algorithm to run properly on the different machines, we
cannot select the values at random. Instead, each VM's MIPS value is set to
97.5% of the minimum MIPS of its host.
For example, the first host has 1500 MIPS, as Table 14 shows; applying the first
multiplier of 59.925% we get the minimum MIPS of the host, which amounts to 898.87
MIPS. Taking 97.5% of that gives a final value of 876.40 MIPS. This way, if VM number 1
is configured with this value and allocated to host number 1, when the DVFS algorithm
scales the multiplier down to its minimum value, the utilization of the machine rises to
97.5%, surpassing the up_threshold and triggering the upscaling. Thus the DVFS
algorithm keeps working at all times, instead of leaving the multiplier at the minimum
and effectively disabling the algorithm.
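The worked example above can be written as a one-line calculation (an illustrative sketch; the names are not simulator identifiers):

```python
# 97.5% of the host's minimum-multiplier MIPS, as in the example above.
MIN_MULTIPLIER = 0.59925   # first entry of the percentages array
VM_FRACTION = 0.975        # fraction of the minimum MIPS assigned to the VM

def vm_mips(host_max_mips):
    """MIPS given to a VM so its host's lowest DVFS step runs at ~97.5% load."""
    return host_max_mips * MIN_MULTIPLIER * VM_FRACTION

# Host 1: 1500 MIPS -> minimum of 898.875 MIPS -> VM configured with 876.40 MIPS
```
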
However, even though each VM's performance is calculated from a host's
MIPS, this does not mean that the VMs will be allocated in that order. The VM scheduler
takes care of this stage; we simply configure the VM performance values to be similar
to the hosts' performance, taking into account that only one VM is created on each
host. In doing so, we try to have the DVFS algorithm executed on as many hosts as
possible, although in the end that depends on the scheduler. Table 15 shows
these values.
VM 1 2 3 4 5
MIPS 525 583 637 685 728
VM 6 7 8 9 10
MIPS 765 798 825 847 863
VM 11 12 13 14 15
MIPS 875 881 882 877 868
VM 16 17 18 19 20
MIPS 853 833 807 777 741
Table 15: VM MIPS in FRBS scenario
Once all VMs have been scheduled onto the physical hosts, task scheduling takes
place. In all the experiments run, whose results are shown in the next section, the
workflow used has been Montage_25, as in the DVFS experiments of the first stage
of this project. The task scheduler follows the order established in the
workflow, reproducing its process as if it were a real execution in a real
environment.
4.2.2. Rules generated for both FRBS schedulers
Here we show a sample of the rules that were optimized using the
Pittsburgh approach in Matlab. Antecedents use 3 Membership Functions (MF),
{low, normal, high}, and the consequent is configured with 5,
{very_low, low, normal, high, very_high}, to allow a wider range of outputs. Both
Matlab functions based on the Pittsburgh approach produced a total of 28 rules for
each scheduler.
This scheduling stage proceeds as follows. First, the scheduler holds the
group of VMs that need to be created on a group of physical hosts. Then, for each VM,
the scheduler evaluates the selection score against each host; the host with the highest
score is the one that allocates the VM. The process is repeated until all VMs are created.
The antecedents considered are introduced in section 3.3.8: vmMIPS is the MIPS value
requested by the current VM, while hostMIPS, utilization and power are parameters of
the host. Beginning with the VM scheduler, we show a sample of 5 rules.
1. If (vmMIPS IS low) AND (hostMIPS IS normal) AND (utilization IS low)
AND (power IS normal) THEN selection IS very_high
2. If (vmMIPS IS high) AND (hostMIPS IS high) AND (utilization IS normal)
AND (power IS low) THEN selection IS very_high
3. If (vmMIPS IS normal) AND (hostMIPS IS normal) AND (utilization IS
low) AND (power IS high) THEN selection IS very_low
4. If (vmMIPS IS high) AND (hostMIPS IS normal) AND (utilization IS high)
AND (power IS normal) THEN selection IS low
5. If (vmMIPS IS low) AND (hostMIPS IS normal) AND (utilization IS normal)
AND (power IS normal) THEN selection IS normal
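The greedy placement loop described above can be sketched as follows. The fuzzy inference itself is implemented with jFuzzyLogic in the Java simulator; here it is passed in as a stand-in function so that only the control flow is shown, and none of these names are taken from the project code:

```python
# Greedy VM placement: each VM goes to the host with the highest selection score.
def place_vms(vms, hosts, fuzzy_selection):
    """Assign every VM to the host whose fuzzy selection score is highest."""
    placement = {}
    for vm in vms:
        scores = {host: fuzzy_selection(vm, host) for host in hosts}
        placement[vm] = max(scores, key=scores.get)  # best-scoring host wins
    return placement
```

For instance, with a stub scorer that simply prefers the host with the larger capacity, every VM ends up on that host; the FRBS replaces the stub with the rule-base inference over vmMIPS, hostMIPS, utilization and power.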
Rules 1 and 2 yield a very high selection, one with low utilization and the other with
low power. Rule 3 shows that a high power consumption leads to a very low selection.
In Rule 4, the MIPS of the VM to create are high compared with the MIPS of the host.
Rule 5 shows a rule with a normal consequent.
The next five rules are a sample of the rule base of the task scheduler.
The antecedents are introduced in section 3.3.9: MIPS is a parameter of the currently
selected VM and length represents the number of Millions of Instructions (MI) of the task.
The power, time and energy parameters consider both the current task and the VM.
1. If (MIPS IS high) AND (power IS normal) AND (length IS low) AND (time
IS low) AND (energy IS low) THEN selection IS very_high
2. If (MIPS IS high) AND (power IS normal) AND (length IS high) AND (time
IS low) AND (energy IS low) THEN selection IS high
3. If (MIPS IS normal) AND (power IS high) AND (length IS low) AND (time
IS high) AND (energy IS high) THEN selection IS very_low
4. If (MIPS IS normal) AND (power IS low) AND (length IS high) AND (time
IS normal) AND (energy IS low) THEN selection IS normal
5. If (MIPS IS low) AND (power IS low) AND (length IS low) AND (time IS
high) AND (energy IS normal) THEN selection IS normal
Rules 1 and 2 show how low values of time and energy lead to a high selection of the
VM to process the task. Rule 3 shows the opposite: high values of power, time and energy
produce a very low selection score for that VM. Rules 4 and 5 show
combinations of high and low values, which result in a normal score in the selection of the
current VM.
4.2.3. FRBS savings
After both fuzzy schedulers were programmed and added to the simulator, we
tested their results and compared them to the other algorithms explained in this
document. First we list the algorithms tested for both the VM and task schedulers and
the combinations run.
VMs
VmWattsPerMetricMips.
VmFuzzy.
Tasks
MinMin.
MaxMin.
PowerMinMin.
PowerMaxMin.
PowerFuzzySeq.
PowerFuzzyMinMax.
PowerFuzzyMaxMax.
All possible combinations have been tested, resulting in a total of 14 simulations, all
of them following the task order established in the Montage_25 DAG. Four
parameters are obtained from each experiment: execution time, total power consumption,
average power consumption and energy consumption, the same parameters considered in
the first-stage results on the effects of the DVFS algorithm in the simulator. Each
experiment is named as VMScheduler / TasksScheduler. Since the default scenario in
the WorkflowSim simulator is VmWattsPerMetricMips / MinMin, results include a percentage
of savings with respect to this configuration. As in section 4.1, we show graphs
to facilitate the understanding of the results presented in the tables. Also, as the overall
power values are much higher than the rest, we do not include them in the graphs, as
they would make the other results harder to read.
First, we show the results obtained by running the experiments with the VM
scheduler already implemented in WorkflowSim, based on a Watts-per-MIPS metric. Here we
use the task schedulers MinMin and MaxMin, together with the power-aware variants
implemented. All the experiments consider the heterogeneous scenario
explained in section 3.3.10, instead of the default homogeneous scenario where all hosts
and VMs were identical.
The results of these 4 initial experiments, run before testing any fuzzy scheduler,
are shown in Table 16. Here we can see that, as explained, MaxMin solves the problem of
non-optimum scheduling, where the best resources are wasted on rather short tasks. For this
reason, the time and energy values are lower using MaxMin than MinMin. Contrary to
expectations, the power-aware variants achieve much worse results than their time-based counterparts.
VmWattsPerMipsMetric
Results Savings (%)
Time (s) Total (W) Avg (W) Energy (Wh) Time Total Avg Energy
MinMin 65.11 81820.60 12.57 22.73 - - - -
MaxMin 64.19 81390.27 12.68 22.61 1.41 0.53 -0.90 0.53
PowerMinMin 90.90 116921.40 12.86 32.48 -39.61 -42.90 -2.36 -42.90
PowerMaxMin 89.53 112911.19 12.61 31.36 -37.50 -38.00 -0.36 -38.00
Table 16: basic experiments
Figure 23 shows the results of these first 4 experiments. As displayed in the table,
the lowest energy values are obtained using the MaxMin algorithm.
Figure 23: VmWattsPerMipsMetric result values
When displayed as percentages of savings with respect to the default
WattsPerMips / MinMin configuration, Figure 24 also attributes the highest savings to the
MaxMin algorithm.
Figure 24: VmWattsPerMipsMetric result savings
In the second set of experiments, we introduce the fuzzy task scheduler along with
the default VM scheduler. In this case, we find no major improvements in the task
scheduling with respect to the classic MinMin scenario, which proves to be a robust
algorithm able to optimize different scenarios. Table 17 shows that PowerFuzzyMaxMax
only achieves a 0.71% reduction in energy consumption, while PowerFuzzyMinMax gets
even worse results.
VmWattsPerMipsMetric
Results Savings (%)
Time (s) Total (W) Avg (W) Energy (Wh) Time Total Avg Energy
PowerFuzzySeq 65.11 81822.62 12.57 22.73 0.00 0.00 0.00 0.00
FuzzyMinMax 64.74 82077.07 12.68 22.80 0.57 -0.31 -0.89 -0.31
FuzzyMaxMax 64.07 81240.43 12.68 22.57 1.60 0.71 -0.90 0.71
Table 17: experiments with fuzzy task scheduler
Figure 25 shows these results, but it is somewhat difficult to distinguish which
values are higher.
Figure 25: VmWattsPerMipsMetric result values (2)
However, Figure 26 shows in finer detail the better results obtained by
PowerFuzzyMaxMax in both time and energy. In this case, the power consumed by
this algorithm is the worst of the fuzzy schedulers, but the larger savings in both time and
energy make up for this power worsening thanks to the time / power balance.
Figure 26: VmWattsPerMipsMetric result savings (2)
In this third round of experiments, the VM fuzzy scheduler proves to achieve much better
results than the default Watts-per-MIPS-metric VM scheduler. Using it in combination
with MaxMin achieves a 7.07% reduction in energy consumption, which is not a
negligible improvement.
VmsFuzzy
Results Savings (%)
Time (s) Total (W) Avg (W) Energy (Wh) Time Total Avg Energy
MinMin 65.11 77127.06 11.85 21.42 0.00 5.74 5.74 5.74
MaxMin 64.19 76037.23 11.85 21.12 1.41 7.07 5.74 7.07
PowerMinMin 73.18 86693.08 11.85 24.08 -12.40 -5.96 5.74 -5.96
PowerMaxMin 75.10 88967.51 11.85 24.71 -15.35 -8.73 5.74 -8.74
Table 18: experiments with fuzzy VM scheduler
In Figure 27 we can see how the power-aware alternatives to MinMin and MaxMin
produce higher values than their classic counterparts.
Figure 27: VmsFuzzy result values
Figure 28 shows that the difference is larger in time than in energy, but still
favors the classic scheduling algorithms.
Figure 28: VmsFuzzy result savings
In this last step, both fuzzy schedulers are used. As the PowerFuzzyMaxMax
scheduler was expected to give the best results, it is the one that has been further
tested. As Table 19 shows, the combination of the VmsFuzzy scheduler and
PowerFuzzyMaxMax achieves a 7.23% reduction in energy consumption. This result
and the rules obtained by the two Matlab functions come from an iterative process,
repeated until the improvements were too small to continue. First, a rule base for
tasks was obtained. Then, using that result, vmPittsburgh was run to obtain a rule
base for the VM scheduler. This process was repeated, keeping the rule base from the
previous step, until there were no significant improvements.
VmsFuzzy
Results Savings (%)
Time (s) Total (W) Avg (W) Energy (Wh) Time Total Avg Energy
FuzzySeq 64.27 76132.00 11.85 21.15 1.29 6.95 5.74 6.95
FuzzyMinMax 66.78 79105.33 11.85 21.97 -2.56 3.32 5.74 3.32
FuzzyMaxMax 64.08 75906.93 11.85 21.09 1.58 7.23 5.74 7.23
Table 19: experiments with both fuzzy schedulers
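The iterative process described above can be outlined as follows. The actual rule bases are evolved by the Matlab Pittsburgh functions and evaluated in the Java simulator; here both optimizers and the evaluator are placeholders, so only the alternating structure is illustrated, and none of these names come from the project code:

```python
# Alternating optimization of the task and VM rule bases: each round fixes one
# rule base, re-optimizes the other, and stops when the energy improvement is
# below a tolerance (or a round limit is reached).
def co_optimize(optimize_tasks, optimize_vms, evaluate, tol=1e-3, max_iters=10):
    """Alternate task/VM rule-base optimization until improvement falls below tol."""
    task_rules, vm_rules = None, None
    best = float("inf")
    for _ in range(max_iters):
        task_rules = optimize_tasks(vm_rules)   # fix VM rules, evolve task rules
        vm_rules = optimize_vms(task_rules)     # fix task rules, evolve VM rules
        energy = evaluate(task_rules, vm_rules) # e.g. simulated energy (Wh)
        if best - energy < tol:                 # improvement too small: stop
            break
        best = energy
    return task_rules, vm_rules
```
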
In Figure 29, as in Figure 25, it is not easy to distinguish the differences between
the results obtained with each of the three fuzzy schedulers.
Figure 29: VmsFuzzy result values (2)
Once more, the savings in Figure 30 make it easier to see the better result of
PowerFuzzyMaxMax over the other two alternatives.
Figure 30: VmsFuzzy result savings (2)
Results show that the combination of both expert systems accomplishes the best results
compared to the other combinations. These results depend to a large extent on the rules
optimized with the Matlab functions running the Pittsburgh algorithm. A different set of
rules might be found using other algorithms, such as KASIA [ 42 ], and the results might
then differ.
5. Conclusions
After running and analyzing all the experiments, we can conclude that the
algorithms developed achieve a lower energy consumption, as intended. Although
these algorithms and results are based on experiments executed in a simulator and we
cannot claim that the results are exactly what a real system would produce, the
characteristics implemented in the simulator make them close to a real scenario. Not
including the DVFS algorithm would not make the fuzzy schedulers save more or less
energy, but the results obtained with it included are closer to a real scenario, where
the processors running on the servers do execute the DVFS algorithm internally.
Additionally, the inclusion of the DAG reinforces the validity of the results and of
the energy reduction obtained with these fuzzy schedulers, as the order of the tasks
executed in these experiments follows a real precedence pattern and their lengths are
not randomly generated.
Future work on this project will include a two-sided optimization, using both time
and energy as fitness parameters in these algorithms. As expected, the results in Table
19 show a good energy reduction but cannot achieve a large reduction in time. This is
because all the algorithms and the Pittsburgh approach use energy as the sole fitness
measure, not taking times into consideration; configurations that obtain lower execution
times also incur higher energy consumption. A two-sided optimization would strike a
balance between these two parameters, achieving values that satisfy both the low-time
and low-energy requirements of Cloud Computing systems.
As for the merged simulator, being open source it offers researchers all over the
world the opportunity to test different algorithms and check the results in time, power
and energy, knowing that they can execute real traces, reproducing the traffic workloads
of real datacenters. This can save considerable funds in the early stages of research,
when new algorithms are being tested. Once an algorithm obtains good results in the
simulation environment, it can be tested in a real scenario to confirm its effectiveness.
In any case, this open source simulator can continue growing in features through any
researcher who wishes to contribute to the project.
Bibliography
[ 1 ] J. G. Koomey, “Estimating total power consumption by servers in the US and the
world,” Oakland, CA: Analytics Press, February 15, 2007.
[ 2 ] A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu, P. Coteus, M. Giampapa, R. A.
Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D.
Steinmacher-Burow, T. Takken, and P. Vranas, “Overview of the blue gene/l system
architecture,” IBM Journal of Research and Development, vol. 49, no. 2-3, pp. 195–212,
2005.
[ 3 ] K. Li, “Performance analysis of power-aware task scheduling algorithms on
multiprocessor computers with dynamic voltage and speed,” IEEE Trans. Parallel Distrib.
Syst., vol. 19, no. 11, pp. 1484–1497, 2008.
[ 4 ] W. Forrest, “How to cut data centre carbon emissions,” Website, December 2008.
[Online]. Available: http://www.computerweekly.com/Articles/2008/12/05/233748/how-
to-cut-data-centre-carbon-emissions.htm
[ 5 ] Vuong, P. T., Madni, A. M., & Vuong, J. B. (2006, July). VHDL implementation for a
fuzzy logic controller. In Automation Congress, 2006. WAC'06. World (pp. 1-8). IEEE.
[ 6 ] X. Wang and M. Chen, “Cluster-level feedback power control for performance
optimization,” in Proc. IEEE 14th Int. Symp. High Performance Computer Architecture
HPCA 2008, 2008, pp. 101–110.
[ 7 ] X. Wang and Y. Wang, “Coordinating power control and performance management for
virtualized server clusters,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 2, pp. 245–259,
2011.
[ 8 ] S. Mittal, “A survey of architectural techniques for improving cache power efficiency,”
Sustainable Computing: Informatics and Systems, 2013.
[ 9 ] Q. Wu, P. Juang, M. Martonosi, L.-S. Peh, and D. W. Clark, “Formal control techniques
for power-performance management,” IEEE Micro, vol. 25, no. 5, pp. 52–62, 2005.
[ 10 ] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield,
“Live migration of virtual machines,” in Proc. 2nd conference on Symposium on Networked
Systems Design & Implementation - Volume 2, ser. NSDI’05. Berkeley, CA, USA: USENIX
Association, 2005, pp. 273–286.
[ 11 ] A. Beloglazov and R. Buyya, “Adaptive threshold-based approach for energy-efficient
consolidation of virtual machines in cloud data centers,” in Proc. 8th International Workshop
on Middleware for Grids, Clouds and e-Science, ser. MGC ’10. New York, NY, USA: ACM,
2010, pp. 4:1–4:6.
[ 12 ] A. Beloglazov and R. Buyya, “Energy efficient allocation of virtual machines in cloud
data centers,” in Proc. 10th IEEE/ACM Int Cluster, Cloud and Grid Computing (CCGrid)
Conf, 2010, pp. 577–578.
[ 13 ] V. Anagnostopoulou, S. Biswas, A. Savage, R. Bianchini, T. Yang, and F. T. Chong,
“Energy conservation in datacenters through cluster memory management and barely-alive
memory servers,” in Proceedings of the 2009 Workshop on Energy Efficient Design, 2009.
[ 14 ] L. Liu, H. Wang, X. Liu, X. Jin, W. B. He, Q. B. Wang, and Y. Chen, “GreenCloud:
a new architecture for green data center,” in 6th international conference industry session on
Autonomic computing and communications industry session. ACM, 2009, pp. 29–38.
[ 15 ] J. Leverich, M. Monchiero, V. Talwar, P. Ranganathan, and C. Kozyrakis, “Power
management of datacenter workloads using per-core power gating,” Computer Architecture
Letters, vol. 8, no. 2, pp. 48–51, 2009.
[ 16 ] S. Wang, J.-J. Chen, J. Liu, and X. Liu, “Power saving design for servers under
response time constraint,” in Proc. 22nd Euromicro Conf. Real-Time Systems (ECRTS),
2010, pp. 123–132.
[ 17 ] J. D. Moore, J. S. Chase, P. Ranganathan, and R. K. Sharma, “Making scheduling cool:
Temperature-aware workload placement in data centers.” In USENIX annual technical
conference, General Track, 2005, pp. 61–75.
[ 18 ] L. Li, C.-J. M. Liang, J. Liu, S. Nath, A. Terzis, and C. Faloutsos, “Thermocast: a
cyber-physical forecasting model for datacenters,” in 17th ACM SIGKDD international
conference on Knowledge discovery and data mining. ACM, 2011, pp. 1370–1378.
[ 19 ] Patel, C., Sharma, R., Bash, C., and Graupner, S. 2002. Energy Aware Grid: Global
Workload Placement based on Energy Efficiency. Tech. rep., HP Laboratories.
[ 20 ] Wang, L., von Laszewski, G., Dayal, J., and Wang, F. 2010. Towards Energy Aware
Scheduling for Precedence Constrained Parallel Tasks in a Cluster with DVFS. In
Conference on Cluster, Cloud and Grid Computing (CCGrid). 368-377.
[ 21 ] Merkel, A. and Bellosa, F. 2006. Balancing power consumption in multiprocessor
systems. ACM SIGOPS Operating Systems Review 40, 4, 403-414.
[ 22 ] Chen, G., He, W., Liu, J., Nath, S., Rigas, L., Xiao, L., and Zhao, F. 2008. Energy-
aware server provisioning and load dispatching for connection-intensive internet services.
In USENIX Symposium on Networked Systems Design and Implementation (NSDI). 337-
350.
[ 23 ] Gathering Clouds of XaaS! http://www.ibm.com/developer.
[ 24 ] Kernel Based Virtual Machine, www.linux-kvm.org/page/MainPage
[ 25 ] VMWare ESX Server, www.vmware.com/products/esx
[ 26 ] XenSource Inc, Xen, www.xensource.com
[ 27 ] P. Hale, “Acceleration and time to fail,” Quality and Reliability Engineering
International, vol. 2, no. 4, 1986.
[ 28 ] K. H. Kim, R. Buyya, and J. Kim, “Power aware scheduling of bag-of-tasks
applications with deadline constraints on dvs-enabled clusters,” in CCGRID, 2007, pp. 541–
548.
[ 29 ] R. Ge, X. Feng, and K. Cameron, “Performance-constrained distributed dvs scheduling
for scientific applications on power-aware clusters,” in Proceedings of the 2005 ACM/IEEE
conference on Supercomputing. IEEE Computer Society Washington, DC, USA, 2005.
[ 30 ] J. Li and J. F. Martínez, “Dynamic power-performance adaptation of parallel
computation on chip multiprocessors,” in HPCA, 2006, pp. 77–87.
[ 31 ] C. Piguet, C. Schuster, and J. Nagel, “Optimizing architecture activity and logic depth
for static and dynamic power reduction,” in Circuits and Systems, 2004. NEWCAS 2004. The
2nd Annual IEEE Northeast Workshop on, 2004, pp. 41–44.
[ 32 ] Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose, C. A., & Buyya, R. (2011).
CloudSim: a toolkit for modeling and simulation of cloud computing environments and
evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1),
23-50.
[ 33 ] Tom Guerout, Thierry Monteil, Georges Da Costa, Rodrigo N. Calheiros, Rajkumar
Buyya, Mihai Alexandru. Energy-aware simulation with DVFS. Simulation Modelling
Practice and Theory, Volume 39, pages 76-91, December 2013.
[ 34 ] Chen, W., & Deelman, E. (2012, October). Workflowsim: A toolkit for simulating
scientific workflows in distributed environments. In E-Science (e-Science), 2012 IEEE 8th
International Conference on (pp. 1-8). IEEE.
[ 35 ] Pegasus, workflow generator. Available at: https://pegasus.isi.edu/
[ 36 ] Thulasiraman, K., & Swamy, M. N. S. (1992). 5.7 Acyclic Directed Graphs. Graphs:
Theory and Algorithms, 118.
[ 37 ] Pegasus montage workflow. Available at:
https://confluence.pegasus.isi.edu/display/pegasus/Montage
[ 38 ] Fei Cao, Michelle M. Zhu, Chase Q. Wu, “Energy-Efficient Resource Management
for Scientific Workflows in Clouds”, in 2014 IEEE 10th World Congress on Services.
[ 39 ] Cingolani, Pablo, and Jesús Alcalá-Fdez. "jFuzzyLogic: a Java Library to Design
Fuzzy Logic Controllers According to the Standard for Fuzzy Control Programming".
[ 40 ] Cingolani, Pablo, and Jesús Alcalá-Fdez. "jFuzzyLogic: a robust and flexible Fuzzy-
Logic inference system language implementation." Fuzzy Systems (FUZZ-IEEE), 2012
IEEE International Conference on. IEEE, 2012.
[ 41 ] Smith S.F.: A learning system based on genetic adaptive algorithms. PhD thesis,
University of Pittsburgh.
[ 42 ] Prado, R. P., Garcia-Galan, S., & Expósito, J. M. (2011, April). KASIA approach vs.
Differential evolution in fuzzy rule-based meta-schedulers for grid computing. In Genetic
and Evolutionary Fuzzy Systems (GEFS), 2011 IEEE 5th International Workshop on (pp.
87-94). IEEE.
[ 43 ] Seddiki, M., de Prado, R. P., Munoz-Expósito, J. E., & García-Galán, S. (2014). Fuzzy
Rule-Based Systems for Optimizing Power Consumption in Data Centers. In Image
Processing and Communications Challenges 5 (pp. 301-308). Springer International
Publishing.
[ 44 ] García-Galán, S., Prado, R. P., & Expósito, J. M. (2015). Rules discovery in fuzzy
classifier systems with PSO for scheduling in grid computational infrastructures. Applied
Soft Computing, 29, 424-435.
[ 45 ] García-Galán, S., Prado, R. P., & Expósito, J. E. M. (2014). Swarm Fuzzy Systems:
Knowledge Acquisition in Fuzzy Systems and Its Applications in Grid Computing. IEEE
Transactions on Knowledge and Data Engineering, 26(7), 1791-1804.
[ 46 ] Prado, R. P., Expósito, J. M., & Yuste, A. J. (2010). Knowledge acquisition in fuzzy-
rule-based systems with particle-swarm optimization. IEEE Transactions on Fuzzy Systems,
18(6), 1083-1097.
[ 47 ] Prado, R. P., García-Galán, S., Yuste, A. J., & Expósito, J. M. (2010). A fuzzy rule-
based meta-scheduler with evolutionary learning for grid computing. Engineering
Applications of Artificial Intelligence, 23(7), 1072-1082.