Final Our Report

Uploaded by shruti-gupta, posted 07-Apr-2018


  • 8/4/2019 Final Our Report

    1/106

    SHRI G. S. INSTITUTE OF TECHNOLOGY & SCIENCE

    INDORE (M.P.)

    DECENTRALIZED LOAD BALANCING SYSTEM

    OVER LOCAL AREA NETWORK

    Major Project towards the partial fulfillment of the Degree of Bachelor

    of Engineering, Year 2009-10

    Department of Information Technology Engineering

    Guided By: Mr. MUKUL SHUKLA

    Submitted By:
    AB-48001 AAMIR MANASAWALA
    AB-48047 NIKITA PAHADIA
    AB-48052 PRIYA MISHRA


    AB-48062 SAMARTH MOD

    AB-48065 SHIPRA AGRAWAL

    SHRI G. S. INSTITUTE OF TECHNOLOGY & SCIENCE

    INDORE (M.P.)

    Certificate

    This is to certify that the project report entitled "Decentralized Load

    Balancing System over Local Area Networks", submitted by Aamir

    Manasawala, Nikita Pahadia, Priya Mishra, Samarth Mod and Shipra

    Agrawal towards the partial fulfillment of the degree of Bachelor of

    Engineering (Information Technology) of Rajiv Gandhi Proudyogiki

    Vishwavidyalaya, Bhopal, is a satisfactory account of their project work and is approved for the award of the degree.


    Internal Examiner External Examiner

    SHRI G. S. INSTITUTE OF TECHNOLOGY & SCIENCE

    INDORE (M.P.)

    Recommendation

    This is to certify that the project report entitled "Decentralized Load

    Balancing System over Local Area Networks", submitted by Aamir

    Manasawala, Nikita Pahadia, Priya Mishra, Samarth Mod and Shipra

    Agrawal towards the partial fulfillment of the degree of Bachelor of

    Engineering (Information Technology) of Rajiv Gandhi Proudyogiki

    Vishwavidyalaya, Bhopal, is a satisfactory account of their project work and

    is approved for the award of degree.

    Mr. Mukul Shukla
    Project Guide and Head, Information Technology


    Director

    SGSITS, INDORE

    ACKNOWLEDGEMENT

    Every endeavor, we understand, takes an indomitable urge, perseverance, and proper guidance, especially when it is most needed. Internally motivated to undertake some appreciable work as our degree project, we took up this project, unsure at first but hopeful, and were introduced to a kind of work we had hardly ever thought of doing before.

    We are deeply indebted to our project guide Mr. Mukul Shukla, who generously shared his wisdom and expertise with us, both as our project guide and as our teacher, providing us his valuable guidance and suggestions.

    We wish to express our deep sense of gratitude to our H.O.D. and guide Mr. Mukul Shukla and to all the faculty members of the Department of I.T. for encouraging us and giving us moral support, not only regarding this project but throughout our studies at the institute. We would like to pay special thanks to our Director and to all those people who helped us in making this project a success.


    AB-48001 AAMIR MANASAWALA

    AB-48047 NIKITA PAHADIA

    AB-48052 PRIYA MISHRA

    AB-48062 SAMARTH MOD

    AB-48065 SHIPRA AGRAWAL

    TABLE OF CONTENTS

    Chapter 1: INTRODUCTION

    1.1 Preface

    1.2 Problem Domain

    1.3 Project Objective

    1.3.1 Transparency

    1.3.2 Configurable

    1.3.3 Automatic

    1.3.4 Convenient

    1.4 Advantages of our System

    1.5 Scope and Limitations

    1.6 Organization of the project report

    Chapter 2: FUNDAMENTALS

    2.1 Load Measure

    2.1.1 Load Metrics

    2.2 Spectrum of Load Balancing

    2.2.1 Processor Level Load Balancing

    2.2.2 Network Level Load Balancing

    2.3 Features Of Good Load Balancing System

    2.3.1 Local Autonomy

    2.3.2 Centralization and decentralization of the decision and

    actions

    2.3.3 Time out period required for the sender

    2.3.4 Transferring of the files needed

    2.3.5 Distributed file system can be used

    2.3.6 Total work should be completed as fast as possible

    2.3.7 Effective distribution


    2.3.8 Fair distribution

    2.3.9 Tasks with constraints

    2.3.10 Scalability

    2.3.11 Stable

    2.3.12 Fault tolerant

    2.4 Classification Of Load Balancing Algorithms

    2.4.1 Load Balancing and Load Sharing

    2.4.2 Preemptive and Non-Preemptive Algorithm

    2.4.3 Static and Dynamic Algorithms

    2.5 Logical components of load balancing system

    2.5.1 Transfer Policy

    2.5.2 Selection policy

    2.5.3 Location Policy

    2.5.4 Information policy

    2.6 Communication Modes

    2.6.1 Shared Memory

    2.6.2 Message Passing

    Chapter 3: ANALYSIS

    3.1 Existing System

    3.2 Proposed System

    3.3 Feasibility

    3.3.1 Technical Feasibility

    3.3.2 Economic Feasibility

    3.4 Software Requirements

    3.5 Hardware requirements

    3.6 Assumptions of the Operating Environment

    3.6.1 File system

    3.6.2 No process Migration

    3.6.3 Multiprogramming

    3.6.4 Network

    Chapter 4: DESIGN

    4.1 Design of Load Balancing System

    4.1.1 Architectural Diagram Of Load Balancing System

    4.1.2 User Process

    4.1.3 The Sender Manager

    4.1.4 The Receiver Manager

    4.1.5 The Load Information Module

    4.1.6 The Task Information Module

    4.1.7 Communication

    4.1.8 Load Metrics and its implementation

    Chapter 5: IMPLEMENTATION


    5.1 Algorithm Implemented

    5.2 Communication used

    5.3 Achieving Inter Process Communication

    5.3.1 TCP/IP and UDP Communications

    5.3.1.1 Datagram Communication

    5.3.1.2 Stream Communication

    5.3.2 C/C++ Socket APIs

    5.4 K Developer for C/C++

    5.5 Linux

    5.6 Programming Languages

    5.7 Implementation Diagram

    5.8 Remote Command Execution

    5.9 Algorithms

    Chapter 6: EXPERIMENTS AND RESULTS

    6.1 Unit Based Testing

    6.2 Integration Based Testing

    6.3 Deployment

    6.4 Snap Shots

    6.5 Results Gathered

    Chapter 7: CONCLUSION

    7.1 Inferences Drawn

    7.2 Project Extensions

    References

    Appendix

    A. Source Code

    B. User Manual


    CHAPTER 1

    INTRODUCTION

    This chapter provides a short description of the software being specified and its purpose, including the problem background that is the main reason behind developing this project, and a short description of the problem statement with its relevant benefits, objectives, and goals. It also specifies the scope and limitations of the system.

    1.1 Preface

    In the mid-20th century, when computers were first interconnected, i.e., when computer networks were created, the main motivation was to be able to exchange information between computers; but as technology improved, computer networks made many other challenging tasks possible.

    With the advent of faster and swifter LAN and WAN technologies, it became convenient and effective to share hardware and software resources. LAN technologies became so efficacious that they could sometimes outperform local disk access speeds, and this was the cardinal reason for the new brand of technology, Distributed Systems.

    Now, Distributed Systems are prevailing technologies with a wide variety of applications.

    One of the purposes of a distributed system is to enable applications and services to proceed concurrently without competing for the same resources, and to exploit the available computational resources (processor, memory, and network capacities).

    1.2 Problem Domain

    A LAN consists of a group of individual (possibly autonomous) systems connected to each other and capable of exchanging data and messages. The high speed and low delay of modern LAN technologies make them so efficient that they can sometimes outperform local disk performance. It can therefore be a useful facility if these computers can share load with each other.

    We have provided a design for properly utilizing the available resources in a LAN using a Load Balancing strategy. Load balancing can concisely be defined as follows:


    Load Balancing is a technique to share (spread) load, services, or work between two or more machines in order to obtain optimal resource utilization, throughput, or response time. It is also said to provide increased reliability through redundancy.

    In this report we discuss the design of a load balancing mechanism. First we analyze the current scenario of load distribution and the available technologies; then, having considered various features, we propose our system.

    1.3 Project Objective

    The main objective of our project is to implement a Transparent, Automatic, Configurable and Convenient Decentralized Process-Level Load Balancing System that facilitates proper distribution of load over a Local Area Network, with a view to optimizing resource utilization. The project aims at sharing the workload of a highly loaded node by transferring processes to a lightly loaded node, and at realizing a Process-Level Distributed System similar to standard, well-known File-Level Distributed Systems such as NFS and AFS.

    Thus the implementation considers the following features:

    1.3.1 Transparency

    The system maintains transparency with the user by hiding all details of the daemons of our Load Balancing System. The user gets the feeling that the process is being executed on his own system. Transparency holds because the user executes a process in the same fashion irrespective of whether it is forked locally or remotely.

    For example, the Network File System is transparent because it provides access to files stored remotely on the network in the same fashion as local access. The earlier File Transfer Protocol (FTP) is considerably less transparent because it requires each user to learn how to access files through an FTP client.

    1.3.2 Configurable

    Users can configure the system as per their individual choice of sharing the load. A user can:

    a. Subscribe/unsubscribe to the feature of Load Balancing.

    b. Subscribe/unsubscribe to the transferring of load/tasks to other nodes.

    c. Subscribe/unsubscribe to the receiving of load/tasks from other nodes.

    d. Do both of the above simultaneously.

    e. Subscribe/unsubscribe for certain specific tasks only.

    For instance, a user may wish to execute all Database related tasks on his system and all

    the calculation oriented tasks on another node of the LAN.
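As a sketch of how these per-user choices might be represented (our own illustration; the struct and field names are assumptions, not the report's actual code), each node could hold a set of subscription flags that are consulted before sending or accepting work:

```c
#include <stdbool.h>

/* Hypothetical per-user configuration corresponding to choices (a)-(e). */
typedef struct {
    bool     balancing_enabled; /* (a) subscribed to load balancing      */
    bool     send_enabled;      /* (b) may transfer tasks to other nodes */
    bool     receive_enabled;   /* (c) may accept tasks from other nodes */
    unsigned task_mask;         /* (e) bitmask of accepted task classes  */
} lb_config;

/* A node accepts a task of class `class_bit` only if balancing and
 * receiving are both subscribed and the class itself is subscribed. */
bool may_receive(const lb_config *cfg, unsigned class_bit)
{
    return cfg->balancing_enabled
        && cfg->receive_enabled
        && (cfg->task_mask & class_bit) != 0;
}
```

A node that wants database tasks locally would simply leave the database class out of its `task_mask`.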

    1.3.3 Automatic

    The distribution of processes is fully automatic. Each node of the system automatically has accurate, up-to-date load information about the other nodes at every point in time, governed by the standard load-transfer protocol. Whenever a new process is submitted, a decision is made, considering the system's Transfer Policy, whether to initiate the process on another node. This whole task is automatic and requires no user command.

    1.3.4 Convenient


    As automation brings convenience, so does the system: it ensures maximum flexibility for the user with minimum user involvement. Most tasks are handled by the system, so the user is rarely involved.

    1.4 Advantages of our system

    a. Reduces response time for processes
    Processing an application on a lightly loaded node reduces its response time, because processing is faster on a lightly loaded node than on a heavily loaded one.

    b. LAN can be easily upgraded
    Consider a LAN that needs to be upgraded. Instead of replacing all processors, we can add some high-performance processors to the LAN. Our Load Balancing System can then remotely execute, on the newly added high-performance nodes, processes that earlier could not have been executed locally. This is not only cost-effective but also indirectly helps in upgrading the LAN.

    c. Improves performance
    Moving a process closer to the resources it needs to use improves performance. For example, to access a file we can move our process to the node on which the file is located, so that the process effectively performs its input/output operations locally.

    d. Solution for architecture-dependent applications
    It provides a cost-effective solution for architecture-dependent applications. For example, suppose we want to use a 64-bit application but do not have a processor that supports it. If some node has the capability to run 64-bit applications, our request is transferred to that node and the output is returned to our node in the proper format.

    1.5 Scope and Limitation

    The scope of our system comprises a fully fledged load information policy that helps determine the correct node on which a user process should be executed. After determining the best node in the LAN, the system can efficiently execute the process on a remote node and display its output and errors locally.

    The scope of our system is limited to transferring only processes that have not yet started running; a task therefore runs completely on the node where it is started, i.e., no process migration is incorporated. The task information module we present has only basic functionality to help in making selection decisions; a more sophisticated module could be a future enhancement.

    1.6 Organization of the project report

    The project report has been organized in the following manner:

    Chapter 1 presents the introduction of the project work, including the problem domain, objective, and scope of our system. We have discussed the fundamental details of load balancing in the 2nd chapter; this includes the load metric, classification, and various algorithms related to load balancing. The 3rd chapter deals with the existing systems and


    then proposes our system. In Chapter 4 we present the design, which includes the components and their placement in the architecture. Chapter 5 deals with the implementation part of the project. Test cases, results obtained for various user inputs, and graphical representations of the results are presented in Chapter 6. Finally, Chapter 7 draws the conclusion and proposes some future extensions.

    CHAPTER 2

    FUNDAMENTALS

    2.1 Load Measure

    2.1.1 Load metrics

    The load metric defines a value that characterizes the load and serves as the parameter for deciding whether or not to transfer a process. Several options are available, as described below:

    a. Number of processes
    This may be inadequate because some processes are swapped out, dead, etc.

    b. Current/average length of the CPU queue

    c. Length of the ready queue and lengths of the I/O queues
    This measure correlates well with response time and is used extensively. It also correlates with CPU utilization, particularly in an interactive environment.

    d. Residual running time of all processes
    This value is not easy to obtain, but it gives a possible estimate of the total time still to be consumed.

    e. Context switch rate

    f. Main memory utilization
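As an illustration, several of these metrics can be combined into a single scalar load index that the rest of the system compares. The following C sketch is our own, not the report's implementation; the struct fields and metric weights are arbitrary assumptions.

```c
/* Hypothetical per-node measurements drawn from the metrics above. */
typedef struct {
    int    ready_queue_len;  /* metric (c): length of the ready queue  */
    int    io_queue_len;     /* metric (c): length of the I/O queues   */
    double cpu_util;         /* CPU utilization, fraction in [0, 1]    */
    double mem_util;         /* metric (f): memory utilization, [0, 1] */
} node_metrics;

/* Weighted combination of the metrics; a higher value means a more
 * heavily loaded node. The weights are illustrative assumptions. */
double load_index(const node_metrics *m)
{
    return 1.0 * m->ready_queue_len
         + 0.5 * m->io_queue_len
         + 2.0 * m->cpu_util
         + 1.0 * m->mem_util;
}
```

A transfer policy can then compare `load_index` values of two nodes instead of reasoning about four raw quantities separately.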

    2.2 Spectrum of load balancing

    Applications of load balancing can be found in multiple fields: in networks where our application is deployed, such as DNS servers where multiple requests are load balanced onto various back-end servers, or among autonomous computers on a Local Area Network. Some applications are discussed below.

    2.2.1 Processor Level Load Balancing

    This sort of application is implemented on tightly coupled systems.

    For example, the V-System, developed at Stanford in the 1980s, is a microkernel-based, UNIX-emulating system that binds several workstations on a LAN into a distributed system.


    Load index: CPU utilization at the node.

    Information policy: state-change driven (each node broadcasts whenever its state changes significantly; the information is cached by all nodes). A state-change scheme (rather than a demand-driven one) is selected because it does not vary as much with load.

    Selection policy: only new tasks are scheduled for transfer.

    Location policy: each machine randomly selects one of the lightly loaded machines from its cache.
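The state-change-driven information policy can be sketched as follows (our illustration; the 20% significance threshold and the function name are assumptions): a node rebroadcasts its load only when it has drifted significantly from the value it last announced.

```c
#include <stdbool.h>

/* Returns true when the current load differs from the last broadcast
 * value by more than 20% of that value. A floor of 1.0 on the base
 * ensures a nearly idle node still rebroadcasts on any real change. */
bool should_broadcast(double last_sent, double current)
{
    double diff = current - last_sent;
    if (diff < 0) diff = -diff;
    double base = last_sent > 1.0 ? last_sent : 1.0;
    return diff > 0.2 * base;
}
```

Compared with a demand-driven scheme, the number of broadcasts here depends on how often state changes, not on how often other nodes ask.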

    2.2.2 Network Load Balancing

    This sort of application is implemented on Network resources, such as Media or server

    applications. It works at various levels as discussed below.

    Layer-2 load balancing

    Layer-2 load balancing, also known as link aggregation, port aggregation, EtherChannel, or Gigabit EtherChannel port bundling, bonds two or more links into a single, higher-bandwidth logical link. Aggregated links also provide redundancy and fault tolerance if each of the aggregated links follows a different physical path. Link aggregation may be used to improve access to public networks by aggregating modem links or digital lines, and it may also be used in the enterprise network to build multi-gigabit backbone links between Gigabit Ethernet switches. See also NIC teaming and the Link Aggregation Control Protocol (LACP).

    Layer-4 Load Balancing

    Layer-4 load balancing distributes requests to servers at the transport layer, over protocols such as TCP, UDP, and SCTP. The load balancer distributes network connections from clients, who know a single IP address for a service, to a set of servers that actually perform the work. Since in connection-oriented transport a connection must be established between client and server before the request content is sent, the load balancer usually selects a server without looking at the content of the request. Examples:

    a) Linux Virtual Server
    The Linux Virtual Server Project (LVS) implements layer-4 switching in the Linux kernel. This allows TCP and UDP sessions to be load balanced between multiple real servers, and thus provides a way to scale Internet services beyond a single host. HTTP and HTTPS traffic for the World Wide Web is probably the most common use, but it can also be used for more or less any service, e.g., email and the X Window System. LVS itself runs on Linux; however, it is able to load balance connections from end users running any operating system to real servers running any operating system. As long as the connections use TCP or UDP, LVS can be used. LVS offers very high performance: it is able to handle upwards of 100,000 simultaneous connections.

    b) Network Load Balancing Services (NLBS)
    NLBS is a proprietary Microsoft implementation of clustering and load balancing that provides high availability, high reliability, and high scalability. It is intended for applications with relatively small data sets that rarely change (one example would be web pages) and that do not have long-running in-memory state. Such applications are called stateless applications, and typically include Web, File Transfer Protocol (FTP), and virtual private networking (VPN) servers. Every client request to a stateless application is a separate transaction, so it is possible to distribute the requests among multiple servers to balance the load. One attractive feature of NLBS is that all servers in a cluster monitor each other with a heartbeat signal, so there is no single point of failure.

    c) Red Hat Cluster Suite
    For applications that require maximum uptime, a Red Hat Enterprise Linux cluster with Red Hat Cluster Suite is the answer. Specifically designed for Red Hat Enterprise Linux, Red Hat Cluster Suite provides two distinct types of clustering:

    a. Application/Service Failover: create n-node server clusters for failover of key applications and services.

    b. IP Load Balancing: load balance incoming IP network requests across a farm of servers.

    With Red Hat Cluster Suite, applications can be deployed in high-availability configurations so that they are always operational, bringing "scale-out" capabilities to Enterprise Linux deployments. For high-volume open-source applications such as NFS, Samba, and Apache, Red Hat Cluster Suite provides a complete, ready-to-use failover solution. For most other applications, customers can create custom failover scripts using the provided templates. Red Hat Professional Services can provide custom Red Hat Cluster Suite deployment services where required.

    Layer-7 Load Balancing

    Layer-7 load balancing, also known as application-level load balancing, parses requests at the application layer and distributes them to servers based on the content of each request. It can therefore meet quality-of-service requirements for different types of content and improve overall cluster performance. The overhead of parsing requests at the application layer is high, however, so its scalability is limited compared to layer-4 load balancing. Example: KTCPVS.

    2.3 Features of good Load balancing system

    Various features are desirable in a good and efficient load balancing system; some of them are discussed below.

    2.3.1 Local autonomy

    The administrator of a PC has the responsibility to control the use of that machine. The administrator must be able to control the environment of the machine for which he is responsible, and tasks must be carried out on his machine as he wants; this is the concept of local autonomy. A system in which a task can be transferred to a remote machine, or received from another machine, can pose a threat to the administrator's ability to carry out these responsibilities. In our view, local autonomy should be maintained by the load balancer to a very high degree.

    2.3.2 Centralization and decentralization of the decision and actions

    Centralization of decisions means that decisions are made by a single entity across the system, while decentralization means that decisions are taken by many independent nodes. The load balancer performs two kinds of operations: taking decisions, and carrying out those decisions by taking the corresponding actions.


    Types of Decision

    a. Task transferring
    This involves deciding whether a task is to be transferred for execution on a remote machine, and to which machine the particular task is to be transferred. This can be decided by analyzing the chosen load metric.

    b. Task receiving
    This involves deciding whether a task received by a machine is to be accepted. This can be decided by the administrator of that particular machine or by its present load.

    Load balancing activities, that is, making decisions and carrying them out, can happen at any of the following levels:

    1. Per task: load balancing is done per task of a machine, which is completely decentralized; each individual machine takes a decision per task.

    2. Per user: load balancing is done for all the tasks of a given user of that machine, i.e., centralized decisions for the tasks and decentralized for the users.

    3. Per machine: decisions are centralized for the users and decentralized for the machines.

    4. Per group of machines: centralized for the machines and decentralized for the clusters, i.e., groups defined by the load balancing system.

    5. Global (all the machines taking part in load balancing): totally centralized for all machines.

    2.3.3 Time out period required for the sender

    If the sender manager of a machine sends a task to a remote machine, it may happen that the remote machine crashes in the meantime and the task never gets executed. The sender therefore requires a timeout period for which it waits for the remote machine to send back the result of the task.

    2.3.4 Transferring of the files needed

    A task transferred for remote execution may need to perform file read or write operations. In that case those files must be made available to the task: either remote access can be provided, or the files can be copied to the remote machine and the work carried out there.

    2.3.5 Distributed file system can be used

    If we use a distributed file system, then the file system and the load balancing system can negotiate so that either the file is transferred to the work or the work is transferred to the place where the file is located. For this, support from the file system is needed.

    2.3.6 Total work should be completed as fast as possible

    One of the main features of a load balancing system is that the overall turnaround time should be minimal, i.e., all the tasks present should be completed as fast as possible.

    2.3.7 Effective distribution


    It can happen that a small task is transferred to a remote machine even though executing it, however slowly, on its originating machine would still be faster than transferring it to a remote machine and executing it there. Effective distribution avoids such transfers.

    2.3.8 Fair distribution

    It should not happen in a load balancing system that some processors get overloaded while others remain underloaded; the distribution of tasks should be fair and effective.

    2.3.9 Tasks with constraints

    There may be tasks that depend on the output of other tasks. In that case the dependent task cannot be transferred for remote execution, since synchronization between tasks on remote machines would be difficult or very inefficient.

    2.3.10 Scalability
    The system should be scalable: it should work well for both large and small systems, and it should scale easily when the dimensions of the system are varied.

    2.3.11 Stable

    The system should be stable: no task should be transferred so many times that no work gets done and time is wasted only in moving the task from one machine to another.

    2.3.12 Fault tolerant

    We want our system to recover if any machine in the group fails: all the tasks that were running on that machine should be recovered or reported back to their originating machines. Such a failure also reduces the computation power of the whole system, so this information needs to be propagated to all the nodes in the system.

    2.4 Classification of Load Balancing Algorithms

    We can classify load balancing systems using various criteria. These are discussed below.

    2.4.1 Load Balancing and Load Sharing

    As the names suggest, load balancing tries to equalize the load at all processors, i.e., it takes the total load as input and distributes it equally among the available machines, whereas load sharing tries only to reduce the load on the heavily loaded processors, which probably leads to a better solution. Comparing the two, load balancing moves tasks more often than load sharing, and hence involves much more overhead.

    2.4.2 Preemptive and Non-Preemptive Algorithm

    This classification is based on the question: can a task be transferred to another processor once it has started executing?

    In non-preemptive transfer (task placement) one can only transfer tasks that have not yet begun execution. Here we have to transfer environment information along with


    a. Program code and data,

    b. Environment variables,

    c. Working directory,

    d. Inherited privileges, etc.

    The method is fairly simple. In preemptive algorithms we can also transfer a task that has partially executed, but along with it we have to transfer the entire state of the task: the virtual memory image, the process control block, unread I/O buffers and messages, file pointers, timers that have been set, etc. This method may produce better results, but it proves expensive.

    2.4.3 Static and Dynamic Algorithms

    a. Static Algorithms

    Static algorithms do not consider the dynamic system state, i.e., changes in it; they use static information about the average behavior of the machines, and load distribution decisions are hard-wired into the algorithm. Clearly, only a little run-time overhead is involved in taking balancing decisions. Examples are stated below.

    1. Round Robin and Randomized Algorithms
    As the name suggests, processes are divided evenly between all processors: each new process is assigned to the next processor in round-robin order. The advantage is that this requires no inter-process communication.
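A minimal sketch of round-robin placement (our illustration; the names are assumptions): the only state needed is a cursor into the processor list, and no load information is exchanged at all.

```c
/* Cyclic assignment of new processes to processors 0..n_procs-1. */
typedef struct {
    int n_procs; /* number of processors taking part */
    int next;    /* processor that receives the next new process */
} rr_state;

/* Return the target processor for a new process and advance the cursor. */
int rr_assign(rr_state *s)
{
    int target = s->next;
    s->next = (s->next + 1) % s->n_procs;
    return target;
}
```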

    2. Central Manager Algorithm
    A central processor selects the host for each new process: when a process is created, the minimally loaded processor, judged by the overall load, is selected.

    3. Threshold Algorithm
    In this algorithm we first define a scheme under which the load of a processor is characterized by one of three levels: underloaded, medium, and overloaded, based on threshold values of a chosen load metric. Distribution decisions are then taken according to the levels of the available processors.
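The three-level scheme can be written down directly (our sketch; the threshold parameters are assumptions to be tuned per load metric):

```c
enum load_level { UNDERLOADED, MEDIUM_LOADED, OVERLOADED };

/* Characterize a processor's load against two threshold values taken
 * from the chosen load metric. */
enum load_level classify_load(double load, double low, double high)
{
    if (load < low)  return UNDERLOADED;
    if (load > high) return OVERLOADED;
    return MEDIUM_LOADED;
}
```

The distribution decision then reduces to matching newly created processes with processors whose level is `UNDERLOADED`.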

    b. Dynamic Algorithms

    In dynamic algorithms the distribution system takes the current system state into account before deciding where to move work. Such an algorithm has the potential to outperform static load distribution because it can exploit short-term fluctuations in system state, but it also incurs some overhead for state monitoring.

    1. Local Queue Algorithm
    The main feature of this algorithm is its support for dynamic process migration. All new processes are allocated statically, with process migration initiated by a host when its load falls below a threshold limit.

    2. Central Queue Algorithm


    This works on the principle of dynamic distribution: the system stores new tasks in a cyclic FIFO queue, and whenever a request for an activity is received by the queue manager, it removes the first activity from the queue and sends it to the requester.
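The queue manager's behavior can be sketched as a cyclic FIFO (our illustration; the fixed capacity and the names are assumptions made for brevity):

```c
#define QCAP 64 /* fixed queue capacity, an assumption for this sketch */

typedef struct {
    int tasks[QCAP]; /* task identifiers, oldest at `head` */
    int head, tail, count;
} central_queue;

/* Enqueue a newly arrived task; returns -1 if the queue is full. */
int cq_push(central_queue *q, int task)
{
    if (q->count == QCAP) return -1;
    q->tasks[q->tail] = task;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    return 0;
}

/* Serve a node's request for an activity: remove and return the first
 * task in FIFO order, or -1 when no work is available. */
int cq_request(central_queue *q)
{
    if (q->count == 0) return -1;
    int task = q->tasks[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return task;
}
```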

    Between the static and dynamic algorithms lies the adaptive algorithm, in which we modify the algorithm based on the system state. For example, we can stop collecting information (go static) when all nodes are busy, so as not to impose extra overhead.

    2.5 Logical components of load balancing system

    A load balancing system is composed of various logical components, each of which performs a predefined task and interacts with the other components to achieve the goal of load balancing. They are:

    a. Transfer Policy

    b. Selection Policy

    c. Location Policy

    d. Information Policy

    The following sections describe each of them in detail.

    2.5.1 Transfer Policy

The transfer policy determines whether a node is a sender or a receiver of load: basically, a sender is an overloaded node and a receiver is an under-loaded node. Various mechanisms are available to implement this policy, as follows:

Threshold-based transfer
Here we first establish a threshold, expressed in units of load (however load is measured). When a new task originates on a node, if the load on that node exceeds the threshold, the transfer policy decides that the node is a sender; when the load at a node falls below the threshold, the transfer policy decides that the node can be a receiver.

Several sub-options exist within this mechanism:

    a. Single threshold

With our load index we define a single threshold value. This scheme is simple, but it may cause too many transfers.

    b. Double thresholds

Here we define two values, high and low, i.e. a load window, and the transfer policy recognizes senders and receivers accordingly.

    c. Imbalance detected by information policy (relative policy)

If the load parameters of two nodes differ by more than d, then one of them can be designated the sender and the other the receiver.

    2.5.2 Selection policy

When the load balancing system recognizes that there is a need to move a task from one place to another, the selection policy selects which task is to be transferred. The points to take into consideration are:

a) Small size: moving a small task may take less time and fewer network resources.


b) Lowest priority.
c) Newly originated (a task that has just started).
d) Long-lived task.

Within the selection policy we need a priority policy to define a priority for each task to be performed.

Priority assignment policy
The priority we mention here is not priority in the general scheduling sense; rather, it is defined with respect to the selection policy for taking decisions: the higher a process's priority, the less likely it is to be migrated. Possible ways of assigning priority are:

Altruistic: here a remote process is given higher priority than a local one, i.e. if there is a need to unload, a local process is chosen first.

Selfish: as opposed to altruistic, here local processes are given priority; this penalizes processes arriving at a busy node.

Intermediate: in between the above two options we have an intermediate solution: assign priority based on the ratio of local to remote processes in the system. This is the policy generally used.

Finally, when designing a selection policy we must take into account whether our load distribution algorithm is non-preemptive or preemptive.

    2.5.3 Location Policy

As mentioned earlier, the principle of a load balancing system is to obtain a (nearly) equal distribution of work across machines. For this to be achieved, there is a need to locate a sender and receiver pair in the system. Therefore, once the transfer policy designates a node a sender, the location policy finds a receiver; stated otherwise, once the transfer policy designates a node a receiver, the location policy finds a sender. To achieve this, polling can be used.

Polling
One node polls other nodes to find out whether they are suitable partners for load distribution; the polling can be done by any of the following techniques:

    a) Randomly

    b) Based on information collected in previous polls

    c) On a nearest-neighbor basis

A node can poll other nodes either serially or in parallel (e.g., via multicast), but usually there is some limit on the number of polls; if that number is exceeded, the load distribution is not done.

Finally, a node can also simply broadcast a query to find a node that wants to be involved.

    2.5.4 Information policy

    The Information Policy answers the following questions:

    a) When information about the state of other nodes should be collected

    b) Where it should be collected from

    c) What information should be collected


Demand-driven
In this approach a node collects the state of the other nodes only when it becomes either a sender or a receiver (based on the transfer and selection policies).

Periodic
As the name suggests, nodes exchange load information at periodic intervals; based on the information collected, the transfer policy on a node may decide to transfer tasks. The disadvantage of this approach is that it does not adapt to the system state and imposes the same collection overhead at high system load as at low system load.

Dynamic (driven by system state)
Sender-initiated: senders look for receivers to transfer load onto.
Receiver-initiated: receivers solicit load from senders.
Symmetrically-initiated: a combination of the above two, where load sharing is triggered by the demand for extra processing power or for extra work.

State-change-driven
Here the nodes broadcast state information whenever their state changes by a certain degree. This differs from demand-driven in that a processor disseminates information about its own state, rather than collecting information about the state of other processors. The nodes may send this information either to a central collection point or to their peers.

    2.6 Communication Modes

    In many circumstances, one process needs to exchange information with another process.

    Such communication can occur in two major ways. The first takes place between

    processes that are executing on the same computer; the second takes place between

    processes that are executing on different computer systems that are tied together by a

    computer network. Communications may be implemented via shared memory, or by the

    technique of message passing, in which packets of information are moved between

    processes.

    2.6.1 Shared Memory

In the shared-memory model, processes use memory-mapping system calls to gain access to regions of memory owned by other processes. Normally, the operating system tries to prevent one process from accessing another process's memory. Shared memory requires

    that several processes agree to remove this restriction. They may then exchange

    information by reading and writing data in these shared areas. The form of the data and the

    location are determined by these processes and are not under the operating system's

    control. The processes are also responsible for ensuring that they are not writing to the

    same location simultaneously.

    2.6.2 Message Passing

In the message-passing model, information is exchanged through an inter-process communication facility. Before communication can take place, a connection must be opened. The name of the other communicator must be known, be it another process on the same CPU, or a process on another computer connected by a communications network. Each computer in a network has a host name, such as an IP name, by which it is commonly known. Similarly, each process has a process name, which is translated into an equivalent identifier by which the operating system can refer to it. The get hostid and get processid system calls do this translation. These identifiers are then passed to the general-purpose open and close calls provided by the file system, or to specific open connection and close connection system calls, depending on the system's model of communications.

    CHAPTER 3

    ANALYSIS

    3.1 Existing system

Various systems use load balancing and load sharing to achieve performance gains and proper resource utilization. Examples are Linux Virtual Server (LVS), Red Hat Cluster Suite, Network Load Balancing System (NLBS) and the V-System. We have studied these systems in brief and found that there is still some scope for work in the field of local area networks. The following sections describe the system we are proposing.

    3.2 Proposed system

Our system is based on the load balancing approach, which tries to transfer load from highly loaded nodes onto lightly loaded ones. The system starts when the user enters a command on his machine; the various modules of the system then communicate with each other to take the load balancing decisions. In this way we try to provide the load balancing service transparently: once the system has been started, all the modules run in the background, no user interaction is needed, and the user does not need to consider anything regarding load balancing decisions.

    3.3 Feasibility

Feasibility analysis determines whether a proposed system is feasible under certain parameters such as cost, time and technology; in other words, it is done to know whether the proposed system is possible under these given parameters. In the following sections we present the feasibility analysis of our system.

    3.3.1 Technical feasibility

Technical feasibility considers whether the available technology suffices for the technical needs of the proposed system. This analysis is used to establish the availability of the various tools, programming languages and development environments. We ensure the technical feasibility of our proposed system under the following points.


    a. One of our major requirements is homogeneity of various machines in the LAN

    which can be part of our Load balancing system.

    b. For Inter process communication in our system we are using message passing

    model which can be achieved using TCP sockets.

c. Our system needs a shell as the user process which presents the interface to the user of the system. For this, the shell can be modified using a programming language like C or C++.

    3.3.2 Economic feasibility

A system that can be developed technically, and that will be used if installed, must still be a good investment for the organization. Economic feasibility considers:
a. The cost to conduct a full system investigation.
b. The cost of hardware and software for the application being considered.
c. The benefits in the form of reduced costs or fewer costly errors.
Economic justification is generally the bottom-line consideration for most systems. It includes a broad range of concerns, including cost-benefit analysis. Cost-benefit analysis delineates the costs of project development and weighs them against the tangible (i.e. directly measurable) and intangible benefits of the system.

Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design, and coding. This point was kept in mind throughout the software development activity, and a conscious effort was made to test the system as thoroughly as possible. The objective was to systematically uncover different classes of errors with a minimum amount of time and effort.

    3.4 Software requirements

For successful execution of our project we need, at a minimum:
a) Linux as the operating system, of any flavor, with a kernel version higher than 2.35,
b) C++ API for kernel-level and socket-level programming,
c) Network File System configured as a distributed file system.

    3.5 Hardware requirements

    The preferred hardware configuration is:

    a. Intel(R) 845GVSR motherboard

    b. Pentium(R) 4 CPU 2.40GHz

    c. 256MB of DDR RAM

    3.6 Assumptions of the operating environment

    The design proposed here is fairly general in that it makes few assumptions about the

    operating environment. The design could be implemented in any environment that fulfills

    these assumptions. This section describes those aspects of the operating environment on

    which the design of the load balancer depends.

3.6.1 File System

Various options were considered for accessing files from a remote machine, such as copying a file from the host machine to the remote machine whenever needed, or using a distributed file system like NFS.


We are using NFS (Network File System) as the service which makes files available to remote machines whenever needed. It works by making the files to be operated upon available to a remote machine as if they were present on that machine.

Suppose a user wants to compress a file and the corresponding command gets load balanced due to the high load on his system. Then the machine that accepts the task of compressing the file will need that file to be available locally, which is achieved by NFS. If we rule out this option, making the files of every machine available to every other machine would be a cumbersome task in itself.

    3.6.2 No Process Migration

Our design does not rely upon process migration. Load balancing is done by initial placement of tasks; once a task is accepted by a machine, it runs to termination on that machine. It is possible that a task accepted under one set of circumstances could become unwelcome on its host machine later in its lifetime if circumstances change. Nevertheless, once a task is accepted it runs to completion, even if this later becomes inconvenient or undesirable; without process migration there is little alternative. In the case of a multi-user computer this is a small problem; users must accept that tasks belonging to other users will affect the response times of their tasks. In the case of a single-user workstation or personal computer the problem is more significant, if only for psychological reasons. We have considered some approaches to ameliorating the problem:

    a. Foreign tasks which become unwelcome could be suspended for a while, or

    they could have their priority lowered (but such action will increase response

    time).

    b. Tasks which become unwelcome could be aborted and an attempt made to

    automatically restart them from scratch elsewhere. But it may be difficult for

    the load balancer to know whether and how to restart a task. Alternatively,

    unwelcome tasks could be aborted and an error reported to the originating

    process, which may choose to restart the task elsewhere, but it may be difficult

    even for the originating process to know whether and how a task can safely be

    restarted. At the least, an aborted task has wasted all the process time and all

    the real time it has used, increasing response time and defeating the purpose of

    load balancing.

Neither of these alternatives is attractive, and we have not included either in the design. Tolerating foreign tasks which have become unwelcome is part of the price one pays for participating in a load balancing scheme.

Another reason for not using process migration in the load balancer is that migrating an executing process would involve sending the process's entire execution image to the destination machine; mapping the process's execution image and context with one-to-one correspondence onto the host machine is itself a very tough task.

    3.6.3 Multiprogramming

The load balancer consists of a number of independent long-running processes, sometimes called daemons. Any operating system on which this design is to be implemented must provide multiprogramming, the ability to have several processes running or ready to run.


We have taken Linux as our implementation platform, which supports multiprogramming very well. Since several modules must run simultaneously, multiprogramming support from the operating system is a must for our load balancing system to work correctly.

    3.6.4 Network

    The network connecting the load balancing machines need only support point-to-point

    communication, that is, process-to-process communication between machines. Certain

    operations of the load balancer can be made more efficient if broadcast or multicast is

    available, but these are not required for correct operation. The network must operate with

a fairly low latency or some of the benefit of load balancing will be lost. All communication between components of the load balancer, and between user processes and the load balancer, consists of reasonably short ASCII text messages.

We also consider the LAN a real-time system in which processes communicate without significant delay, so that heavy data such as large files can, if needed, be transferred to a remote machine in real time.

This assumption is made because our LAN need not be part of the Internet to avail the services of the load balancer; we need only point-to-point communication between machines. Moreover, we will be able to bound the worst-case performance of the system.

If we do not take our LAN to be operating in real time, there may be considerable delay in inter-process communication, which can defeat the overall purpose of the load balancing system.

Subject to the above requirements, we believe it would be possible to implement our design on almost any hardware, software, and network.


CHAPTER 4
DESIGN

Design is a meaningful engineering representation of something that is to be built. It can be traced to a customer's requirements and at the same time assessed for quality against a set of predefined criteria for good design. In the software engineering context, design focuses on four major areas of concern: data, architecture, interfaces, and components. Software engineers design computer-based systems, but the skills required at each level of design work are different. At the data and architectural levels, design focuses on patterns as they apply to the application to be built. At the interface level, human ergonomics often dictate the design approach. At the component level, a programming approach leads us to effective data and procedural designs. Design begins with the requirements model, which we transform into four levels of design detail: the data structure, the system architecture, the interface representation and the component-level detail.

    4.1 Design of Load Balancing System

The overall design of a broad and general load balancing system consists of five separate modules, each dedicated to a specific task, as shown in the figure. The load balancing decisions made by the sending machine (load balancing sending decisions) are different from those made by the receiving machine (load balancing receiving decisions), so there are separate entities to handle the different decisions. The per-machine entity responsible for making load balancing sending decisions is the Sending Manager, or SM. The per-machine entity responsible for making load balancing receiving decisions is the Receiving Manager, or RM. Each machine participating in load balancing will normally run both the SM and the RM (an administrator can choose whether to allow both or only one of them). Two other software entities support the SM. In making its decisions, the SM can use information about the loads on other machines; the Load Information Module, or LIM, maintains and updates load information and provides it on request to the SM. The SM can also use information about the characteristics of particular tasks; the Task Information Module, or TIM, maintains and updates this information and provides it on request to the SM. The SM communicates with the LIM and the TIM via inter-process communication (IPC). The


LIM and TIM will reply to queries from any process, so other system processes or user processes may also obtain information from them. The figure below illustrates the operation of the load balancer as explained in the following sections.

    4.1.1 Architectural Diagram of Load Balancing System

    Figure 4.1 Architecture of Load balancing system

    4.1.2 User Process

Any process which executes tasks can use the load balancer. We expect that users' shells will be the main clients of the load balancer, using it to improve the performance of the commands typed by the user. Nonetheless, any program or process can use the load balancer. Possibilities include a load balancing Make, or a graphics program computing many picture elements.


A user process wanting to execute some task or tasks subject to load balancing sends a query to the local SM. User process clients of the SM make no decisions; they know nothing about which tasks are eligible for load balancing or how placement decisions are made. This keeps the clients simple, eliminates certain duplication of overhead, and simplifies administrative control of load balancing.

Clients communicate with the SM via whatever IPC mechanism is available and convenient (in our case, TCP sockets). The SM may consult the LIM and TIM and its own information on recent placements. It makes load balancing sending decisions and replies to the user process with the names of the machines on which the tasks should be executed. Tasks which the SM decides should be placed on the originating machine are executed in the usual manner. For tasks which the SM decides should be placed on a remote machine, the user process sets up a connection to the RM on the selected machine and sends the tasks to it. The user process can then either wait for the results from the remote machine or proceed with other work.

If one of the selected receiving machines refuses to accept a task, it replies with a refusal message to the user process which sent the task. The user process then notifies its SM of the refusal and asks for a new placement decision.

    4.1.3 The Sender Manager

The SM receives requests from user processes; it makes load balancing sending decisions in response to those queries and sends the results back to the querying processes. The SM maintains information about recent placement decisions it has made and can use that information in making new decisions. The SM may query the LIM or the TIM and use their information in making decisions; the SM communicates with the LIM and the TIM through IPC. If the SM has no other duties and the local machine's load is low enough, it can try to pre-compute placement decisions for likely requests (how the SM might pre-compute placement decisions is a point of investigation we have not yet explored). Note that the SM can decide to execute a process on the local machine under the following conditions:

a. Because the local load is so low that there is no need to transfer tasks,
b. Because the task has been declared ineligible for load balancing in the TIM,
c. Because the placement module has selected the local machine.

If the SM encounters a problem of some sort, so that it cannot make decisions, it replies to all queries by saying all tasks can be executed locally; that is, it turns off load balancing. It would be an error for the SM to have a task sent somewhere it should not go, whereas turning off load balancing merely removes an enhancement to the system.

    4.1.4 The Receiver Manager

When a user process wants to execute some task or tasks subject to load balancing, it consults its local SM, which provides the identity of a machine to which the user process should send the task. The user process then uses an inter-machine IPC mechanism to establish a connection to the RM on the selected machine and sends the task to the RM. The RM examines the task and decides whether or not to accept it. If the RM accepts the task, it sends an acceptance message back to the user process, establishes an execution environment for the task, and arranges for the task to execute. If the RM does not accept


    the task, it sends a refusal message back to the user process. The receiving machine has no

    obligation to provide service to the sending process.

For each accepted task, the RM creates an execution environment, obtaining information from the user process if necessary. (In UNIX, for the load balancing shell, this involves obtaining environment variables and shell variables from the originating shell.) The connection from the user process is duplicated and passed to the execution environments. The RM then causes the tasks to execute, each in its own execution environment. The tasks execute logically in parallel, so they may compete with each other for resources. If a task requires input data it reads it from the originating user process; any results or other output are written to the user process via the connection.

The RM's acceptance decision can be based on such factors as the local load, the name of the task, the characteristics of the task, the identity of the sending user, the identity of the sending machine, the number of foreign tasks already accepted, the number of users logged on, the time of day, and more. A file of acceptance criteria, maintained by the system administrator, specifies which factors, and what values of those factors, determine acceptability. The format and contents of this file are described in a later section. The RM may communicate with the LIM or the TIM running on the same machine.

    The RM maintains a file of accounting information, containing a record for each foreign

    task accepted. Each record includes the originating machine and user, the real (wall clock)

    starting and ending times, and accounting data such as CPU time used, amount of I/O,

    memory occupancy, network traffic, and so on. Additional accounting data not directly

    associated with the execution of a task should also be recorded if at all possible; this

    includes data such as number of pages printed or typeset, number and type of tapes

    mounted, and so on.

The RM does not perform any accounting actions such as computing or assessing charges; it just writes out its accounting file. System administrators will probably arrange for the RM and accepted tasks to run under one account, for example an account named loadbal. They could use the RM's accounting file to re-assign charges from this load balancer account back to the originating user's account; it is left to system administrators to arrange this, if appropriate.

When a user's task is executed on a remote machine via the RM, the remote machine charges the task execution to the load balancing account and the remote RM records the task execution in its accounting file. This design allows users to use a remote machine (subject to the acceptance criteria in the receiver's configuration file) even without an account there. Note that the accounting is carried out in this manner even if the user does happen to have an account on the remote machine. There are several reasons for this. First, doing it the same way in all cases makes the design and implementation of the load balancer simpler. Second, the accounting file is a technical record of the RM's activities as well as an administrative accounting record. Third, the account named joe on machine Alpha may belong to a different person than does joe on machine Beta, or the accounts may belong to the same person but be subject to different administrative or accounting arrangements (for example, they may be used for different projects billed to different clients). In any case, the accounting records should be kept separate.

    4.1.5 The Load Information Module


One Load Information Module (LIM) executes on each machine. The LIM is responsible for obtaining and maintaining up-to-date information about the loads on all machines which might be considered for load balancing. (Other processes may consult the LIM about similar matters, so its duties may be a bit broader than just providing load balancing information.) To do this, it samples the load of the local machine and, when necessary, sends updates to other LIMs. It receives load updates sent by other machines and may, if necessary, request load updates from other machines. The LIM makes its information available by responding to queries. The LIM responds to three types of queries:

    a. Query for the load of the local machine,

    b. Query for the load of a single specified machine,

    c. Query for all information the LIM has.

The LIM is used by the SM and the RM running on the same machine, but may also be used by other processes. Many operating systems run some sort of remote machine information facility (e.g. rwhod in Berkeley UNIX, rstatd in Sun UNIX). At some future time, these facilities might be merged with or replaced by the LIM.

    4.1.6 The Task Information Module

Different tasks consume different types of resources and cause different kinds of load. Tasks may run for a short time or for a long time, and they may consume many resources during this interval or few. Some tasks may do much computation and little I/O; some may do much disk I/O but little network communication; and so on. These are the characteristics of tasks.

It is the responsibility of the TIM to obtain, update, maintain, and make available information on the characteristics of tasks. This information may be used by the SM and by the RM running on the same machine, and may be used by other processes as well.

    The current design of the TIM is a fairly simple prototype. Some task information will be

    obtained through hand-done characterization studies, and this data will be loaded into the

    TIM. We plan to do experiments to see whether task information is actually helpful in

    making load balance decisions. If task information is shown to be useful in making load

    balancing decisions, or some other type of decisions, then a more sophisticated TIM may

    be designed.

Queries to the TIM will specify a program, and as much information about flags and arguments as is available. The TIM will reply with information about the task characteristics for that program with those flags and arguments, if they are known. If the flags and arguments are not known, the TIM will send a "low confidence" reply. If nothing is known about the task, the TIM will send a "no information" reply.

    Having discussed the operation of each module of the load balancer, we next discuss the

    message formats by which the components communicate.

    4.1.7 Communication

All communication with and between load balancing managers will be by sending messages in ASCII text. This will make it easier for user programs to communicate with the load balancing system, especially programs written in languages other than the one the load balancer is implemented in. It will also make debugging the load balancing system easier.

To make it easier and more efficient for programs to generate, read, and understand messages, all messages must conform to a strict format. Every message begins with a message header and contains one or more sections. Each section begins with a section-header keyword, and the lines of a section have a set format. All keywords are exactly four characters long.

Every message begins with a message header identifying the sender of the message and the

    intended recipient of the message. This information is not strictly necessary; it is included

    as an error check and debugging aid. Any load balancer component receiving a message

    first checks to see that it is the intended receiver.

The first line of every message is the from line:

FROM sending_component sending_machine

    Sending_component identifies the component sending the message; it is one of

SHELL, SM, RM, LIM, or TIM.

All user processes are identified as USER. Sending_machine is the name of the host from which the message is being sent. The name can be in whatever format is convenient for the

    network being used.

The second line of every message is the to line:

TO receiving_component receiving_machine

    Every message ends with a line containing only the keyword

    END

    It is an error to have any extra text after the END.
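As a concrete illustration of this format, the helper below assembles a message with the mandatory FROM header, TO header, and terminating END line. The function name format_message and the component and host names used are illustrative, not part of the actual system:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: assembles a load-balancer message with the
 * mandatory FROM header, TO header, and terminating END line.
 * `body` is assumed to hold zero or more newline-terminated section
 * lines. Returns the message length, or -1 if it does not fit. */
int format_message(char *buf, size_t buflen,
                   const char *from_comp, const char *from_host,
                   const char *to_comp, const char *to_host,
                   const char *body)
{
    int n = snprintf(buf, buflen, "FROM %s %s\nTO %s %s\n%sEND\n",
                     from_comp, from_host, to_comp, to_host, body);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

For example, format_message(buf, sizeof buf, "SM", "hostA", "LIM", "hostA", "") would produce a message whose three lines are "FROM SM hostA", "TO LIM hostA", and "END".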

    4.1.8 Load Metric and its Interpretation

For the purpose of characterizing any node as lightly or heavily loaded, we need a load metric on which sending and receiving decisions can be based. The desirable properties of a load metric are that it is easy to fetch and calculate, easy to interpret, not instantaneous (CPU percentage usage, for example, is an instantaneous value), and that it relates to and characterizes the load we are trying to level. The choice of load metric depends on the objective of the load balancing system. Fairly sophisticated load metric values can be designed to account for various characteristics affecting system performance.

With our design, we are trying to reduce the average turnaround time and response time for the user, so a metric that relates to these is desirable. The load average as calculated by the UNIX/Linux kernel possesses these characteristics. Commands such as top and uptime display three load-average values: the average number of processes waiting in the run queue plus the number currently executing, over the past 1, 5, and 15 minutes. High load averages mean that the system is being used heavily and response times are correspondingly high.

A value of 1 therefore implies that, on average, one task was demanding the CPU; on a single-CPU system this value equates demand and supply. An acceptable load average is 3 to 7 jobs for a large system and 1 to 2 jobs for a workstation. We have therefore used the output of the top command as the load metric, with a threshold of 2, which can further be configured by the administrator.
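The check described above can be sketched as follows. getloadavg() is a glibc/BSD convenience call that returns the same 1-, 5-, and 15-minute averages that top and uptime display; the threshold of 2 matches the configurable default mentioned above. The function name is a hypothetical illustration, not part of the actual system:

```c
#include <stdlib.h>

#define LOAD_THRESHOLD 2.0  /* default threshold; configurable by the administrator */

/* Returns 1 if the 1-minute load average exceeds the threshold,
 * 0 if it does not, and -1 if the load averages cannot be read. */
int is_heavily_loaded(void)
{
    double loadavg[3];  /* 1-, 5-, and 15-minute averages */

    if (getloadavg(loadavg, 3) == -1)
        return -1;
    return loadavg[0] > LOAD_THRESHOLD ? 1 : 0;
}
```

A node reporting a load average of 1 would be considered lightly loaded (demand equals supply on a single-CPU machine), while a node reporting above the threshold would be skipped when selecting a receiver.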


    CHAPTER 5

    IMPLEMENTATION

This is the construction phase of any software project. It is the phase that actually gives meaning to all the other phases: all the steps, starting from requirement analysis, are carried out to support the implementation. The implementation of this project would be a very difficult and long task if an attempt were made to cover all possible variants and every minor intricacy involved. The implementation model represents the current mode of operation, that is, the existing or proposed allocation for all system elements. The essential model is generic in the sense that the realization of function is not explicitly indicated. Implementation is the development of the executable software as per the design specification of the desired system. The code generation step maps the system requirements specified in the design phase into programming steps that achieve the system functionality specified by the user.

The implementation phase affects both testing and maintenance profoundly. The time spent in implementation is a small percentage of the total software cost, while testing and maintenance consume the major percentage. Thus, the goal during implementation should not be to reduce the implementation cost, but to reduce the cost of the later phases, even if that means the cost of this phase has to increase. In other words, the goal during this phase is not to simplify the job of the programmer, but to simplify the job of the tester and maintainer. Keeping these facts in mind, this tool is kept as modular as possible, and proper documentation is provided for maintainability. There are many different criteria for judging a program, including readability, program size, execution time, and required memory.

    5.1 Algorithm Implemented

As per the classification of load balancing algorithms mentioned earlier in the analysis phase, our system falls into the following categories:

a. Dynamic: The Load Information Module gathers remote load information in a timely and consistent manner. As a consequence, up-to-date load information for every node is always available, which helps the LIM make correct decisions when selecting a remote node.

b. Non-preemptive: A process, once transferred, runs on that node until completion. No process migration is supported.

c. Load balancing: We have tried to balance the load between nodes rather than take a load-sharing approach.

    5.2 Communication mode used


In a programming model where components are strictly isolated from each other, some form of communication is needed for them to carry out their tasks with coordination and synchronization. Without it, components are not merely isolated from each other; they are unaware of each other. For components to form a meaningful application, they need a mode of communication between them so that they can carry out their tasks in full coordination and synchronization.

    Two basic modes of communication between different components in a distributed system

    are based on whether we use threads or processes to implement different components.

In an application that uses threads, communication between threads is possible through shared memory, where a common memory region is shared between threads. The major problem with this architecture is dealing with inconsistency of the shared memory: several threads can try to read or write it at the same time, which can leave the memory in an inconsistent state. This inconsistency can cause serious problems if not dealt with correctly. It can be avoided by stringent deadlock avoidance or prevention techniques, but this model carries a large overhead, which is why we have preferred a message-passing architecture, which has no such overhead.

Message passing using TCP sockets provides uniformity in inter-process communication, as the same set of procedures is used for local and remote communication. This form of communication is useful even over slow network links, so the performance of the system does not dip. There is also no problem of inconsistent system state, and hence no such overhead is incurred.

Each process communicates with the others through ASCII messages whose formats are predefined. Considering the above facts, we have implemented a message-passing model in our system, in which modules such as the TIM, LIM, SM, and RM communicate by passing messages to each other and then carry out their tasks accordingly.

    5.3 Achieving Inter process communication

    The UNIX input/output (I/O) system follows a paradigm usually referred to as Open-

    Read-Write-Close. Before a user process can perform I/O operations, it calls Open to

    specify and obtain permissions for the file or device to be used. Once an object has been

opened, the user process makes one or more calls to Read or Write data. Read reads data from the object and transfers it to the user process, while Write transfers data from the user process to the object. After all transfer operations are complete, the user process calls Close to inform the operating system that it has finished using that object.

    When facilities for Inter Process Communication (IPC) and networking were added to

    UNIX, the idea was to make the interface to IPC similar to that of file I/O. In UNIX, a

    process has a set of I/O descriptors that one reads from and writes to. These descriptors

    may refer to files, devices, or communication channels (sockets). The lifetime of a

    descriptor is made up of three phases: creation (open socket), reading and writing (receive

    and send to socket), and destruction (close socket).


    The IPC interface in BSD-like versions of UNIX is implemented as a layer over the

    network TCP and UDP protocols. Message destinations are specified as socket addresses;

    each socket address is a communication identifier that consists of a port number and an

    Internet address.

The IPC operations are based on socket pairs, one belonging to each communicating process. IPC is done by transmitting data in a message between a socket in one process and a socket in another process. When messages are sent, they are queued at the sending socket until the underlying network protocol has transmitted them. When they arrive, they are queued at the receiving socket until the receiving process makes the necessary calls to receive them.

    5.3.1 TCP/IP and UDP/IP communications

    There are two communication protocols that one can use for socket programming:

    datagram communication and stream communication.

5.3.1.1 Datagram communication

The datagram communication protocol, known as UDP (User Datagram Protocol), is a connectionless protocol, meaning that each time you send a datagram, you also need to send the local socket descriptor and the receiving socket's address. As you can tell, additional data must be sent each time a communication is made.

    5.3.1.2 Stream communication

The stream communication protocol is known as TCP (Transmission Control Protocol). Unlike

    UDP, TCP is a connection-oriented protocol. In order to do communication over the TCP

    protocol, a connection must first be established between the pair of sockets. While one of

    the sockets listens for a connection request (server), the other asks for a connection

    (client). Once two sockets have been connected, they can be used to transmit data in both

    (or either one of the) directions.

Since TCP sockets are connection-oriented and provide a reliable form of communication, we have implemented TCP sockets in our system for the different modules to perform inter-process communication.

We have used the C socket library functions to achieve inter-process communication in our system. Since C has simple interfaces for both stream and datagram sockets, implementing sockets was easier using the C programming language.

    In the following section we present details of Socket APIs used.

    5.3.2 C/C++ Socket APIs

We have written the majority of our code in the C++ language. The available socket library has various structures and methods for our use, as described below:

struct addrinfo: This structure is a more recent invention and is used to prepare the socket address structures for subsequent use. It is also used in host name lookups and service name lookups. Filling it in is one of the first steps when making a connection.

struct addrinfo {
    int              ai_flags;      // AI_PASSIVE, AI_CANONNAME, etc.
    int              ai_family;     // AF_INET, AF_INET6, AF_UNSPEC
    int              ai_socktype;   // SOCK_STREAM, SOCK_DGRAM
    int              ai_protocol;   // use 0 for "any"
    size_t           ai_addrlen;    // size of ai_addr in bytes
    struct sockaddr *ai_addr;       // struct sockaddr_in or _in6
    char            *ai_canonname;  // full canonical hostname
    struct addrinfo *ai_next;       // linked list, next node
};

struct sockaddr and struct sockaddr_in hold socket address information for many types of sockets.

    struct sockaddr {

    unsigned short sa_family; // address family, AF_xxx

    char sa_data[14]; // 14 bytes of protocol address

    };

    // (IPv4 only--see struct sockaddr_in6 for IPv6)

    struct sockaddr_in {

    short int sin_family; // Address family, AF_INET

    unsigned short int sin_port; // Port number

    struct in_addr sin_addr; // Internet address

    unsigned char sin_zero[8]; // Same size as struct sockaddr

    };
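As a small illustration of filling in these structures, a server preparing to listen on all local interfaces might initialize a sockaddr_in as below. The helper name make_server_addr and the port value are illustrative assumptions, not part of the actual system:

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Sketch: prepare an IPv4 socket address for a server.
 * All fields must be in network byte order where applicable. */
void make_server_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);              /* zeroes sin_zero too */
    addr->sin_family      = AF_INET;            /* IPv4 */
    addr->sin_port        = htons(port);        /* port, network byte order */
    addr->sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
}
```

The resulting structure is what a server would then pass (cast to struct sockaddr *) to bind(), described next.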

socket(): This function creates a socket and returns an integer file descriptor which is later used for all operations on the corresponding socket.

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

bind(): Once you have a socket, you might have to associate that socket with a port on your local machine; the bind() system call is used for this. Here is the synopsis for the bind() system call:

#include <sys/types.h>
#include <sys/socket.h>

int bind(int sockfd, struct sockaddr *my_addr, int addrlen);

sockfd is the socket file descriptor returned by socket(). my_addr is a pointer to a struct sockaddr that contains information about your address, namely port and IP address. addrlen is the length in bytes of that address.

connect(): The connect() call is as follows:

#include <sys/types.h>
#include <sys/socket.h>

int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);

sockfd is the socket file descriptor, as returned by the socket() call, serv_addr is a struct sockaddr containing the destination port and IP address, and addrlen is the length in bytes of the server address structure.

    listen(): The listen() call is fairly simple, but requires a bit of explanation:

    int listen(int sockfd, int backlog);

sockfd is the usual socket file descriptor from the socket() system call. backlog is the number of connections allowed on the incoming queue. Incoming connections wait in this queue until you accept() them, and this is the limit on how many can queue up.

accept(): The call is as follows:

#include <sys/types.h>
#include <sys/socket.h>

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

sockfd is the listen()ing socket descriptor. addr will usually be a pointer to a local struct sockaddr_storage. This is where the information about the incoming connection will go (and with it you can determine which host is calling you, and from which port). addrlen is a local integer variable that should be set to sizeof(struct sockaddr_storage) before its address is passed to accept(). accept() will not put more than that many bytes into addr; if it puts fewer in, it will change the value of addrlen to reflect that.

send(): The send() call has the following prototype:

int send(int sockfd, const void *msg, int len, int flags);

sockfd is the socket descriptor you want to send data on (whether it is the one returned by socket() or the one you got with accept()). msg is a pointer to the data you want to send, and len is the length of that data in bytes.

    recv(): The recv() call is similar to send in many respects:

    int recv(int sockfd, void *buf, int len, int flags);

sockfd is the socket descriptor to read from, buf is the buffer to read the information into, len is the maximum length of the buffer, and flags can again be set to 0.

    close(): This is used to close a socket

    int close(int sockfd);

    This will prevent any more reads and writes to the socket. Anyone attempting to read or

    write the socket on the remote end will receive an error.
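Tying these calls together, a minimal client-side connection routine, of the kind a sender might use to reach a remote manager, could look as follows. This is a sketch under the assumption of the usual getaddrinfo()/socket()/connect() sequence; the function name connect_to is illustrative:

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Sketch: open a TCP connection to host:port and return the connected
 * socket descriptor, or -1 on failure. */
int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int sockfd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;    /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP stream socket */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd == -1)
            continue;
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(sockfd);
        sockfd = -1;
    }

    freeaddrinfo(res);
    return sockfd;
}
```

A server follows the complementary socket(), bind(), listen(), accept() sequence; once either side holds a connected descriptor, send() and recv() carry the ASCII messages, and close() ends the conversation.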


5.4 KDevelop for C/C++

The KDevelop IDE provides the user with a complete set of tools integrated into one graphical environment. It is an enormous bonus for developers if the environment remains flexible enough to handle things separately or outside of the IDE, so they are not forced to use the IDE's features where they think other tools are more appropriate.

    Although IDEs on other platforms, especially on Microsoft operating systems, come with

    all the tools bundled into one package, it's very different on a UNIX system. With UNIX,

    the compiler, which is needed to create applications from the programming code that can

    be run on a machine, is part of the operating system. Various tools that can be used in

    conjunction with the compiler, like make or the GNU tools, are delivered as separate

    packages, and an IDE makes use of these tools internally.

The KDE Project comes with an IDE called KDevelop. This IDE can be used on any UNIX system to develop software, especially KDE software, but is not limited to it. Many experienced UNIX programmers use it for plain C and C++ programming.

    5.5 Linux

    Although there are a large number of Linux implementations, we can find a lot of

    similarities in the different distributions, if only because every Linux machine is a box

    with building blocks that are put together following your own needs and views. Installing

    the system is only the beginning of a long-term relationship. Just when you think you have

    a nice running system, Linux will stimulate your imagination and creativeness, and the

    more you realize what power the system can give you, the more you will try to redefine its

    limits.

    Linux may appear different depending on the distribution, hardware and personal taste, but

    the fundamentals on which all graphical and other interfaces are built, remain the same.

    The Linux system is based on GNU tools (Gnu's Not UNIX), which provide a set of

    standard ways to handle and use the system. All GNU tools are open source, so they can

be installed on any system. Most distributions offer pre-compiled packages of the most common tools, such as RPM packages on Red Hat and Debian packages (also called deb or dpkg) on Debian, so you need not be a programmer to install a package on your system. Most distributions also come with a complete set of development tools, allowing installation of new software purely from source code.

    A list of common GNU software:

Bash: The GNU shell

GCC: The GNU C Compiler

    GDB: The GNU Debugger

    Coreutils: a set of basic UNIX-style utilities, such as ls, cat and chmod

    Findutils: to search and find files

    Fontutils: to convert fonts from one format to another or make new fonts

    Gnome: the GNU desktop environment

    Emacs: a very powerful editor

    GNU SQL: relational database system

    5.6 Programming languages


We have used C++ to code our project. The main reason for choosing C++ was that it is a convenient systems language and provides a smooth interface between the program and the kernel by means of system calls.

    5.7 Implementation Diagram

The design diagram that we constructed earlier is now mapped to the implementation diagram. We have proposed a message protocol that is used for communication between the various modules. Given a user command as input, the system follows these internal steps:

1. The modified shell receives the command from the user input and parses the command.

2. The parsed command is sent to the Sender Manager (SM) with an enquiry about the node on which this command should be executed.

3. The Sender Manager in turn contacts the TIM, which refers to its administratively configurable structures for an answer.

4. If the TIM refuses to accept the task, the SM returns localhost to the modified shell; otherwise, it contacts the LIM to find the best possible node to execute the user command.

5. The LIM, which its dynamic algorithm (explained earlier) keeps up to date with the load of remote nodes, selects the best node for command execution. The selection policies the LIM uses to select the best node are also described earlier.

6. The LIM then returns to the SM the IP address of the node where the process should be executed. It can even return localhost.

7. If the LIM returns localhost, the SM returns localhost to the modified shell; if the LIM returns the IP address of a remote node, the SM enquires with the RM of that remote node.

8. The RM considers the SM's query, that is, whether it is ready to accept the command for execution. The RM reaches its verdict using its own heuristics (described earlier), in consultation with its own TIM and LIM, and sends it to the SM.

9. If the RM refuses to accept the command, the SM returns localhost to the modified shell; if the RM is ready to accept it, the SM returns the remote node's IP address.

10. After receiving the output from the SM, the modified shell knows where to execute the task.

11. If the reply from the SM is local, the shell immediately executes the command; if it is remote, the shell sends the full command to the RM of the remote node (at the IP address the SM returned), which is waiting to create the execution environment.

12. In the case of remote execution, at this point we have found a sender-receiver pair that is ready to remotely execute the user's input command. Given a sender-receiver pair, the way a remote command is executed is described in the next topic.
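The decision path the Sender Manager follows in these steps can be condensed into the sketch below. tim_accepts(), lim_best_node(), and rm_accepts() are hypothetical stand-ins for the actual message exchanges with the TIM, the LIM, and the remote node's RM; in the real system each would send a query message and parse the reply:

```c
#include <string.h>

/* Hypothetical stand-ins for the TIM, LIM, and RM message exchanges.
 * The IP address returned here is purely illustrative. */
static int tim_accepts(const char *cmd) { (void)cmd; return 1; }
static const char *lim_best_node(void) { return "192.168.0.5"; }
static int rm_accepts(const char *node, const char *cmd)
{ (void)node; (void)cmd; return 1; }

/* Returns the host on which the command should run: either
 * "localhost" or the IP address of the chosen remote node. */
const char *sm_choose_node(const char *command)
{
    if (!tim_accepts(command))            /* TIM refused the task      */
        return "localhost";

    const char *node = lim_best_node();   /* LIM picks the best node   */
    if (strcmp(node, "localhost") == 0)   /* LIM chose local execution */
        return "localhost";

    if (!rm_accepts(node, command))       /* remote RM refused         */
        return "localhost";

    return node;                          /* remote execution          */
}
```

Any refusal along the path falls back to local execution, so the user's command always runs somewhere.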


[Implementation diagram (Figure 5.1, below): the Modified Shell and the Sender Manager module (SM, with its TIM, LIM, LBCServer, MeasureLoad, RequestHandler, LoadDB, and Timlist configuration file) exchange messages with the Receiver Manager module (RM, with its own TIM, LIM, LBCServer, MeasureLoad, RequestHandler, LoadDB, Timlist configuration file, and SelfLoad); on the receiver side, output is redirected and the execution environment is created through Cmdfile.sh.]

SYMBOL DESCRIPTION

Module: Performs a dedicated and specific function or service assigned to it.

Message: Segments that are passed between various modules, with the specification shown in the diagram (module name and file name).

Text File: A simple text file used to store some permanent data, the temporary output of a command, or a shell script used for execution of a command.

Arrow: A direct function call or use of a global variable, without message passing through socket programming.

Dashed Arrow: Input, output, and error redirections amongst various modules.


Dashed Oval: Specifies the creation of an execution environment by exec'ing a command, or exec'ing a script file in which the command is written.

    Figure 5.1 Implementation Diagram

    5.8 Remote Command Execution

Given a sender-receiver pair (chosen by the load balancing system with the help of the TIM, LIM, and other modules), the subsequent general transactions to achieve remote command execution are as follows:

    1. The sender and the receiver establish a TCP socket connection for communication.

    2. The sender sends the command it needs to remotely execute.