
The Super Computer

PARALLEX – THE SUPER COMPUTER

A PROJECT REPORT

Submitted by

Mr. AMIT KUMAR

Mr. ANKIT SINGH

Mr. SUSHANT BHADKAMKAR

in partial fulfillment for the award of the degree

Of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE

GUIDE: MR. ANIL KADAM

AISSMS’S COLLEGE OF ENGINEERING, PUNE

UNIVERSITY OF PUNE

2007 - 2008


CERTIFICATE

Certified that this project report “Parallex - The Super Computer” is

the bonafide work of

Mr. AMIT KUMAR (Seat No. :: B3*****7)

Mr. ANKIT SINGH (Seat No. :: B3*****8)

Mr. SUSHANT BHADKAMKAR (Seat No. :: B3*****2)

who carried out the project work under my supervision.

Prof. M. A. Pradhan, HEAD OF DEPARTMENT

Prof. Anil Kadam, GUIDE


Acknowledgment

The success of any project is never limited to the individual undertaking it; it is the collective effort of the people around that individual that spells success. Several key people played roles vital to paving the way for the success of this project, and we take this opportunity to express our sincere thanks and gratitude to them.

We would like to thank all the faculty members (teaching and non-teaching) of the Computer Engineering Department of AISSMS College of Engineering, Pune. Our project guide, Prof. Anil Kadam, was very generous with his time and knowledge. We are grateful to Mr. Shasikant Athavale, who was a constant source of motivation and inspiration for us. We are also very thankful for the valuable suggestions constantly given by Prof. Nitin Talhar and Ms. Sonali Nalamwar, which proved very helpful to the success of our project. Our deepest gratitude goes to Prof. M. A. Pradhan for her thoughtful comments and gentle support throughout our academics.

We would like to thank the college authorities for providing us with full support regarding the lab, network, and related software.


Abstract

Parallex is a parallel processing cluster consisting of control nodes and execution nodes. Our implementation removes all requirements for kernel-level modifications and kernel patches to run a Beowulf-style cluster. A typical Parallex cluster can have many control nodes, and these control nodes no longer just monitor the cluster but also take part in execution if resources permit. We have removed the restrictions of kernel, architecture, and platform dependence, making our cluster work with completely different sets of CPU powers, operating systems, and architectures, without using any existing parallel libraries such as MPI or PVM.

With a radically new perspective on how a parallel system should behave, we have implemented our own distribution and parallel algorithms aimed at ease of administration and simplicity of use, without compromising efficiency. With a fully modular 7-step design we attack the traditional complications and deficiencies of existing parallel systems, such as redundancy, scheduling, cluster accounting, and parallel monitoring.

A typical Parallex cluster may consist of a few old 386s running NetBSD, some ultra-modern Intel Dual-Core machines running Linux, and some server-class MIPS processors running IRIX, all working in parallel with full homogeneity.


Table of Contents

Chapter No. Title Page No.

LIST OF FIGURES I

LIST OF TABLES II

1. A General Introduction

1.1 Basic concepts 1

1.2 Promises and Challenges 5

1.2.1 Processing technology 6

1.2.2 Networking technology 6

1.2.3 Software tools and technology 7

1.3 Current scenario 8

1.3.1 End user perspectives 8

1.3.2 Industrial perspective 8

1.3.3 Developers, researchers & scientists perspective 9

1.4 Obstacles and Why we don’t have 10 GHz today 9

1.5 Myths and Realities: 2 x 3 GHz < 6GHz 10

1.6 The problem statement 11

1.7 About PARALLEX 11

1.8 Motivation 12

1.9 Features of PARALLEX 13

1.10 Why our design is an “alternative” to parallel systems 13

1.11 Innovation 14

2. REQUIREMENT ANALYSIS 16

2.1 Determining the overall mission of Parallex 16

2.2 Functional requirement for Parallex system 16

2.3 Non-functional requirement for system 17

3. PROJECT PLAN 19


4. SYSTEM DESIGN 21

5. IMPLEMENTATION DETAIL 24

5.1 Hardware architecture 24

5.2 Software architecture 26

5.3 Description for software behavior 28

5.3.1 Events 32

5.3.2 States 32

6. TECHNOLOGIES USED 33

6.1 General terms 33

7. TESTING 35

8. COST ESTIMATION 44

9. USER MANUAL 45

9.1 Dedicated cluster setup 45

9.1.1 BProc Configuration 45

9.1.2 Bringing up BProc 47

9.1.3 Build phase 2 image 48

9.1.4 Loading phase 2 image 48

9.1.5 Using the cluster 49

9.1.6 Managing the cluster 50

9.1.7 Troubleshooting techniques 51

9.2 Shared cluster setup 52

9.2.1 DHCP 52

9.2.2 NFS 54

9.2.2.1 Running NFS 55

9.2.3 SSH 57

9.2.3.1 Using SSH 60

9.2.4 Host file and name service 65

9.3 Working with PARALLEX 65


10. CONCLUSION 67

11. FUTURE ENHANCEMENT 68

12. REFERENCES 69

APPENDIX A 70 – 77

APPENDIX B 78 – 88

GLOSSARY 89 – 92

MEMORABLE JOURNEY (PHOTOS) 93 – 95

PARALLEX ACHIEVEMENTS 96 - 97


I. LIST OF FIGURES:

1.1 High-performance distributed system.

1.2 Transistor vs. Clock Speed

4.1 Design Framework

4.2 Parallex Design

5.1 Parallel System H/W Architecture

5.2 Parallel System S/W Architecture

7.1 Cyclomatic Diagram for the system

7.2 System Usage pattern

7.3 Histogram

7.4 One frame from Complex Rendering on Parallex: Simulation of an

explosion

II. LIST OF TABLES:

1.1 Project Plan

7.1 Logic/Coverage/Decision Testing

7.2 Functional Test

7.3 Console Test cases

7.4 Black box Testing

7.5 Benchmark Results


Chapter 1. A General Introduction

1.1 BASIC CONCEPTS

The last two decades spawned a revolution in the world of computing; a move away

from central mainframe-based computing to network-based computing. Today,

servers are fast achieving the levels of CPU performance, memory capacity, and I/O

bandwidth once available only in mainframes, at cost orders of magnitude below that

of a mainframe. Servers are being used to solve computationally intensive problems

in science and engineering that once belonged exclusively to the domain of

supercomputers. A distributed computing system is the system architecture that makes

a collection of heterogeneous computers, workstations, or servers act and behave as a

single computing system. In such a computing environment, users can uniformly

access and name local or remote resources, and run processes from anywhere in the

system, without being aware of which computers their processes are running on.

Distributed computing systems have been studied extensively by researchers, and a

great many claims and benefits have been made for using such systems. In fact, it is

hard to rule out any desirable feature of a computing system that has not been claimed

to be offered by a distributed system [24]. However, the current advances in

processing and networking technology and software tools make it feasible to achieve

the following advantages:

• Increased performance. The existence of multiple computers in a distributed system

allows applications to be processed in parallel and thus improves application and

system performance. For example, the performance of a file system can be improved

by replicating its functions over several computers; the file replication allows several

applications to access that file system in parallel. Furthermore, file replication

distributes network traffic associated with file access across the various sites and thus

reduces network contention and queuing delays.

• Sharing of resources. Distributed systems are cost-effective and enable efficient

access to all system resources. Users can share special purpose and sometimes


expensive hardware and software resources such as database servers, compute servers,

virtual reality servers, multimedia information servers, and printer servers, to name

just a few.

• Increased extendibility. Distributed systems can be designed to be modular and

adaptive so that for certain computations, the system will configure itself to include a

large number of computers and resources, while in other instances, it will just consist

of a few resources. Furthermore, limitations in file system capacity and computing

power can be overcome by adding more computers and file servers to the system

incrementally.

• Increased reliability, availability, and fault tolerance. The existence of multiple

computing and storage resources in a system makes it attractive and cost-effective to

introduce fault tolerance to distributed systems. The system can tolerate the failure in

one computer by allocating its tasks to another available computer. Furthermore, by

replicating system functions and/or resources, the system can tolerate one or more

component failures.

• Cost-effectiveness. The performance of computers has been approximately doubling

every two years, while their cost has decreased by half every year during the last

decade. Furthermore, the emerging high speed network technology [e.g., wave-

division multiplexing, asynchronous transfer mode (ATM)] will make the

development of distributed systems attractive in terms of the price/performance ratio

compared to that of parallel computers. These advantages cannot be achieved easily

because designing a general purpose distributed computing system is several orders of

magnitude more difficult than designing centralized computing systems—designing a

reliable general-purpose distributed system involves a large number of options and

decisions, such as the physical system configuration, communication network and

computing platform characteristics, task scheduling and resource allocation policies

and mechanisms, consistency control, concurrency control, and security, to name just

a few. The difficulties can be attributed to many factors related to the lack of maturity

in the distributed computing field, the asynchronous and independent behavior of the


systems, and the geographic dispersion of the system resources. These are

summarized in the following points:

• There is a lack of a proper understanding of distributed computing theory—the field

is relatively new and we need to design and experiment with a large number of

general-purpose reliable distributed systems with different architectures before we can

master the theory of designing such computing systems. One interesting explanation

for the lack of understanding of the design process of distributed systems was given

by Mullender. Mullender compared the design of a distributed system to the design of

a reliable national railway system that took a century and a half to be fully understood

and mature. Similarly, distributed systems (which have been around for

approximately two decades) need to evolve into several generations of different

design architectures before their designs, structures, and programming techniques can

be fully understood and mature.

• The asynchronous and independent behavior of the system resources and/or

(hardware and software) components complicate the control software that aims at

making them operate as one centralized computing system. If the computers are

structured in a master–slave relationship, the control software is easier to develop and

system behavior is more predictable. However, this structure is in conflict with the

distributed system property that requires computers to operate independently and

asynchronously.

• The use of a communication network to interconnect the computers introduces

another level of complexity. Distributed system designers not only have to master the

design of the computing systems and system software and services, but also have to

master the design of reliable communication networks, how to achieve

synchronization and consistency, and how to handle faults in a system composed of

geographically dispersed heterogeneous computers. The number of resources

involved in a system can vary from a few to hundreds, thousands, or even hundreds of

thousands of computing and storage resources.

Despite these difficulties, there has been limited success in designing special-purpose

distributed systems such as banking systems, online transaction systems, and point-of-

sale systems. However, the design of a general purpose reliable distributed system


that has the advantages of both centralized systems (accessibility, management, and

coherence) and networked systems (sharing, growth, cost, and autonomy) is still a

challenging task. Kleinrock makes an interesting analogy between the human-made

computing systems and the brain. He points out that the brain is organized and

structured very differently from our present computing machines. Nature has been

extremely successful in implementing distributed systems that are far more intelligent

and impressive than any computing machines humans have yet devised. We have

succeeded in manufacturing highly complex devices capable of high speed

computation and massive accurate memory, but we have not gained sufficient

understanding of distributed systems; our systems are still highly constrained and

rigid in their construction and behavior. The gap between natural and man-made

systems is huge, and more research is required to bridge this gap and to design better

distributed systems. In the next section we present a design framework to better

understand the architectural design issues involved in developing and implementing

high performance distributed computing systems. A high-performance distributed

system (HPDS) (Figure 1.1) includes a wide range of computing resources, such as

workstations, PCs, minicomputers, mainframes, supercomputers, and other special-

purpose hardware units. The underlying network interconnecting the system resources

can span LANs, MANs, and even WANs, can have different topologies (e.g., bus,

ring, full connectivity, random interconnect), and can support a wide range of

communication protocols.


Fig. 1.1 High-performance distributed system.

1.2 PROMISES AND CHALLENGES OF PARALLEL AND

DISTRIBUTED SYSTEMS

The proliferation of high-performance systems and the emergence of high speed

networks (terabit networks) have attracted a lot of interest in parallel and distributed

computing. The driving forces toward this end will be

(1) The advances in processing technology,

(2) The availability of high-speed network, and

(3) The increasing research efforts directed toward the development of software

support and programming environments for distributed computing.

Further, with the increasing requirements for computing power and the diversity in

the computing requirements, it is apparent that no single computing platform will

meet all these requirements. Consequently, future computing environments need to

capitalize on and effectively utilize the existing heterogeneous computing resources.

Only parallel and distributed systems provide the potential of achieving such an

integration of resources and technologies in a feasible manner while retaining desired

usability and flexibility. Realization of this potential, however, requires advances on a


number of fronts: processing technology, network technology, and software tools and

environments.

1.2.1 Processing Technology

Distributed computing relies to a large extent on the processing power of the

individual nodes of the network. Microprocessor performance has been growing at a

rate of 35 to 70 percent during the last decade, and this trend shows no indication of

slowing down in the current decade. The enormous power of the future generations of

microprocessors, however, cannot be utilized without corresponding improvements in

memory and I/O systems. Research in main-memory technologies, high-performance

disk arrays, and high-speed I/O channels is, therefore, critical to utilize efficiently

the advances in processing technology and the development of cost-effective high

performance distributed computing.

1.2.2 Networking Technology

The performance of distributed algorithms depends to a large extent on the bandwidth

and latency of communication among work nodes. Achieving high bandwidth and

low latency involves not only fast hardware, but also efficient communication

protocols that minimize the software overhead. Developments in high-speed networks

provide gigabit bandwidths over local area networks as well as wide area networks at

moderate cost, thus increasing the geographical scope of high-performance distributed

systems.

The problem of providing the required communication bandwidth for distributed

computational algorithms is now relatively easy to solve given the mature state of

fiber-optic and optoelectronic device technologies. Achieving the low latencies

necessary, however, remains a challenge. Reducing latency requires progress on a

number of fronts. First, current communication protocols do not scale well to a high-

speed environment. To keep latencies low, it is desirable to execute the entire protocol

stack, up to the transport layer, in hardware. Second, the communication interface of

the operating system must be streamlined to allow direct transfer of data from the

network interface to the memory space of the application program. Finally, the speed


of light (approximately 5 microseconds per kilometer) poses the ultimate limit to

latency. In general, achieving low latency requires a two-pronged approach:

1. Latency reduction. Minimize protocol-processing overhead by using streamlined

protocols executed in hardware and by improving the network interface of the

operating system.

2. Latency hiding. Modify the computational algorithm to hide latency by pipelining

communication and computation. These problems are now perhaps most fundamental

to the success of parallel and distributed computing, a fact that is increasingly being

recognized by the research community.

1.2.3 Software Tools and Environments

The development of parallel and distributed applications is a nontrivial process and

requires a thorough understanding of the application and the architecture. Although a

parallel and distributed system provides the user with enormous computing power and

a great deal of flexibility, this flexibility implies increased degrees of freedom which

have to be optimized in order to fully exploit the benefits of the distributed system.

For example, during software development, the developer is required to select the

optimal hardware configuration for the particular application, the best decomposition

of the problem on the hardware configuration selected, and the best communication

and synchronization strategy to be used, and so on. The set of reasonable alternatives

that have to be evaluated in such an environment is very large, and selecting the best

alternative among these is a nontrivial task. Consequently, there is a need for a set of

simple and portable software development tools that can assist the developer in

appropriately distributing the application computations to make efficient use of the

underlying computing resources. Such a set of tools should span the software life

cycle and must support the developer during each stage of application development,

starting from the specification and design formulation stages, through the

programming, mapping, distribution, scheduling phases, tuning, and debugging

stages, up to the evaluation and maintenance stages.


1.3 Current Scenario

The current scenario of parallel systems can be viewed from three perspectives. A common concept that applies to all of them is the Total Ownership Cost (TOC). TOC is by far the most common scale on which the level of computer processing is assessed worldwide. It is defined as the ratio of the total cost of implementation and maintenance to the net throughput the parallel cluster delivers:

        TOTAL COST OF IMPLEMENTATION AND MAINTENANCE
TOC = -------------------------------------------------------------
        NET SYSTEM THROUGHPUT (IN FLOATING-POINT OPERATIONS / SEC)
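As a purely hypothetical illustration (these figures are not from the report): a cluster that costs 2,00,000 currency units to build and maintain and sustains a net throughput of 4 GFLOPS has

TOC = 2,00,000 / 4 GFLOPS = 50,000 per GFLOPS,

so halving the implementation cost, or doubling the sustained throughput, halves the TOC.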

1.3.1 End user perspectives

Various activities such as rendering, Adobe Photoshop workloads, and other desktop processes fall under this category. As the need for processing power increases day by day, hardware costs increase with it. From the end user's perspective, the parallel system aims to reduce these expenses and avoid the complexities; at this stage we are trying to implement a parallel system that is more cost-effective and user-friendly. For the end user, however, TOC is less important in most cases, because a parallel cluster is rarely owned by a single user, and in that case the net throughput of the parallel system becomes the most crucial factor.

1.3.2 Industrial Perspective

Parallel systems are extensively deployed in the corporate sector. Such systems consist of machines that may, in theory if not in practice, have to handle millions of nodes. From the industrial point of view, the parallel system aims at resource isolation and at replacing large-scale dedicated commodity hardware and mainframes. Corporate users often treat TOC as the primary criterion by which a parallel cluster is judged. As scalability increases, the cost of owning parallel clusters shoots up to unmanageable heights, and our primary aim in this area is to bring the TOC down as much as possible.


1.3.3 Developers, Researchers & Scientists Perspective

Scientific applications such as 3D simulations, large-scale scientific rendering, intense numerical calculations, complex programming logic, and large-scale implementations of algorithms (BLAS and FFT libraries) require levels of processing and calculation that no modern-day dedicated vector CPU could possibly meet. Consequently, parallel systems have proven to be the only efficient alternative for keeping pace with modern scientific advancement and research. TOC is rarely a matter of concern here.

1.4 Obstacles and Why we don’t have 10 GHz today…

Fig 1.2 Transistor vs. Clock Speed

CPU performance growth as we have known it hit a wall

The figure graphs the history of Intel chip introductions by clock speed and number of transistors. The number of transistors continues to climb, at least for now. Clock speed, however, is a different story.


Around the beginning of 2003, you’ll note a disturbing sharp turn in the previous

trend toward ever-faster CPU clock speeds. We have added lines to show the limit

trends in maximum clock speed; instead of continuing on the previous path, as

indicated by the thin dotted line, there is a sharp flattening. It has become harder and

harder to exploit higher clock speeds due to not just one but several physical issues,

notably heat (too much of it and too hard to dissipate), power consumption (too high),

and current leakage problems.

Sure, Intel has samples of their chips running at even higher speeds in the

lab—but only by heroic efforts, such as attaching hideously impractical quantities of

cooling equipment. You won’t have that kind of cooling hardware in your office any

day soon, let alone on your lap while computing on the plane.

1.5 Myths and Realities: 2 x 3GHz < 6 GHz

So a dual-core CPU that combines two 3GHz cores practically offers 6GHz of

processing power. Right?

Wrong. Even having two threads running on two physical processors doesn’t

mean getting two times the performance. Similarly, most multi-threaded applications

won’t run twice as fast on a dual-core box. They should run faster than on a single-

core CPU; the performance gain just isn’t linear, that’s all.

Why not? First, there is coordination overhead between the cores to ensure

cache coherency (a consistent view of cache, and of main memory) and to perform

other handshaking. Today, a two- or four-processor machine isn’t really two or four

times as fast as a single CPU even for multi-threaded applications. The problem

remains essentially the same even when the CPUs in question sit on the same die.

Second, unless the two cores are running different processes, or different

threads of a single process that are well-written to run independently and almost never

wait for each other, they won’t be well utilized. (Despite this, we will speculate that

today’s single-threaded applications as actually used in the field could actually see a

performance boost for most users by going to a dual-core chip, not because the extra

core is actually doing anything useful, but because it is running the adware and spyware that infest many users' systems and are otherwise slowing down the single CPU


that user has today. We leave it up to you to decide whether adding a CPU to run your spyware is the best solution to that problem.)

If you're running a single-threaded application, then the application can only make use of one core. There should be some speedup as the operating system and the application can run on separate cores, but typically the OS isn't going to be maxing out the CPU anyway, so one of the cores will be mostly idle. (Again, the spyware can share the OS's core most of the time.)
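The report does not cite it, but Amdahl's law is the standard way to quantify this non-linear gain: if a fraction P of a program's work can run in parallel on N cores, the best possible speedup is

Speedup = 1 / ((1 - P) + P / N).

For example, with P = 0.9 and N = 2 the ceiling is 1 / (0.1 + 0.45), roughly 1.82x rather than 2x, and that is before any cache-coherency or handshaking overhead is counted.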

1.6 The problem statement

So now let us summarize and define the problem statement:

• Since the growth of requirements of processing is far greater than the growth

of CPU power, and since the silicon chip is fast approaching its full capacity,

the implementation of parallel processing at every level of computing becomes

inevitable.

• There is a need to have a single and complete clustering solution which

requires minimum user interference but at the same time supports

editing/modifications to suit the user’s requirements.

• There should be no need to modify the existing applications.

• The parallel system must be able to support different platforms.

• The system should be able to fully utilize all the available hardware resources

without the need of buying any extra/special kind of hardware.

1.7 About PARALLEX

While the term parallel is often used to describe clusters, they are more

correctly described as a type of distributed computing. Typically, the term parallel

computing refers to tightly coupled sets of computation. Distributed computing is

usually used to describe computing that spans multiple machines or multiple

locations. When several pieces of data are being processed simultaneously in the same

CPU, this might be called a parallel computation, but would never be described as a

distributed computation. Multiple CPUs within a single enclosure might be used for


parallel computing, but would not be an example of distributed computing. When

talking about systems of computers, the term parallel usually implies a homogenous

collection of computers, while distributed computing typically implies a more

heterogeneous collection. Computations that are done asynchronously are more likely

to be called distributed than parallel. Clearly, the terms parallel and distributed lie at

either end of a continuum of possible meanings. In any given instance, the exact

meanings depend upon the context. The distinction is more one of connotations than

of clearly established usage.

Parallex is both a parallel and a distributed cluster because it supports both multiple CPUs within a single enclosure and a heterogeneous collection of computers.

1.8 Motivation

The motivation behind this project is to provide a cheap and easy to use

solution to cater to the high performance computing requirements of organizations

without the need to install any expensive hardware.

In many organizations, including our college, we have observed that when old systems are replaced by newer ones, the older ones are generally dumped or sold at throwaway prices. We also wanted to find a solution to effectively use this "silicon waste". These wasted resources can easily be added to our system as processing needs increase, because the parallel system is linearly scalable and hardware independent. The intent is thus to have an environmentally friendly and effective solution that utilizes all the available CPU power to execute applications faster.

1.9 Features of Parallex

• Parallex simplifies the cluster setup, configuration and management process.

• It supports machines with hard disks as well as diskless machines running at

the same time.

• It is flexible in design and easily adaptable.

• Parallex does not require any special kind of hardware.


• It is multi-platform compatible.

• It ensures efficient utilization of silicon waste (old unused hardware).

• Parallex is scalable.

How these features are achieved and details of design will be discussed in subsequent

chapters.

1.10 Why our design is an “alternative” to parallel systems

Every established technology must evolve with time, as each new generation improves on the shortcomings of the technology that came before. What we have achieved is a bare-bones semantic of a parallel system.

While studying parallel and distributed systems, we had the advantage of working with the latest technology. The parallel systems already designed by scientists are, no doubt, the work of people far more brilliant than us. Our system is unique because we actually split up the task according to the processing power of the nodes instead of just load balancing. Hence a slow node receives a smaller task than a fast one, and all nodes return their output to the master node at the same calculated time.

The difficulty we faced was deciding how much work should be given to each machine in a heterogeneous system so that all results arrive at the same time. We worked on this problem and developed a mathematical distribution algorithm, which was successfully implemented and is functional. The algorithm splits the task according to the speed of the CPUs by sending a test application to all nodes and storing the return time of each node in a file. We then worked on automating the entire system. We were using passwordless secure shell (ssh) logins and the network file system (NFS). We succeeded to some extent, but full automation of the ssh and NFS configuration was not possible; having to set up every new node manually is a drawback of ssh and NFS. To overcome this drawback we looked at an alternative, the Beowulf cluster, but after studying it we concluded that it treats all nodes as having the same configuration and sends tasks to all nodes equally.

To improve our system we decided to think differently from the Beowulf cluster and to make the system more cost-effective. We adopted the diskless-cluster concept in order to get rid of hard disks, cutting cost and improving machine reliability: storage devices affect the performance of the entire system, increase cost (through replacement of disks), and waste time in fault-finding. So we studied and patched the Beowulf server and the Beowulf distributed process space to suit our system. We made kernel images for running diskless clusters using the RARP protocol. When a node boots this kernel image in its memory, it requests an IP address from the master node (which can also be called the server); the server assigns the node its IP address and node number. With this, our diskless cluster system is up and ready for parallel computing. We then modified our various programs, including our own distribution algorithm, to fit the new design. The best part of our system is that no authorization setup is needed; everything is now automatic.

Up to this point we had been working on CODE LEVEL PARALLELISM, in which we modify the code slightly to run on our system, much as MPI libraries are used to make code executable in parallel. The next challenge was: what if we do not get source code at all, but only a binary file to execute on our parallel system? We therefore needed to enhance our system with BINARY LEVEL PARALLELISM. We studied openMosix. Once openMosix is installed and all the nodes are booted, the openMosix nodes see each other in the cluster and start exchanging information about their load levels and resource usage. Once the load increases beyond the defined level, a process migrates to another node on the network. When a process demands heavy resource usage, however, it may keep migrating from node to node without ever being serviced. This is a major design flaw of openMosix, and we are working on a solution to it.

So, our design is an ALTERNATIVE to all these problems in the world of parallel computing.
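The report does not list the source of this distribution algorithm, so the following is only a minimal sketch, in C, of the proportional-split idea it describes: the benchmark time of each node (as measured by the test application) is read from a file, each node's weight is taken as the inverse of its time, and a total work count is divided accordingly. The file format and names used here are assumptions made for illustration.

/* split.c - illustrative sketch of a proportional task split (not the
 * actual Parallex distribution algorithm). Assumes a text file where
 * each line is: <node-name> <benchmark-seconds>                       */
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 64

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <times-file> <total-work-units>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror("fopen"); return 1; }

    char name[MAX_NODES][64];
    double t[MAX_NODES], weight_sum = 0.0;
    int n = 0;

    /* weight of a node = 1 / (time it took to finish the test app) */
    while (n < MAX_NODES && fscanf(fp, "%63s %lf", name[n], &t[n]) == 2) {
        weight_sum += 1.0 / t[n];
        n++;
    }
    fclose(fp);

    long total = atol(argv[2]);
    long assigned = 0;

    for (int i = 0; i < n; i++) {
        /* a faster node (smaller t) gets a proportionally larger share */
        long share = (long)(total * (1.0 / t[i]) / weight_sum);
        if (i == n - 1)                /* give the remainder to the last node */
            share = total - assigned;
        assigned += share;
        printf("%s gets %ld of %ld work units\n", name[i], share, total);
    }
    return 0;
}

Compiled with gcc -o split split.c and run as ./split times.txt 800, a node whose test run took 2 seconds receives twice as many work units as one that took 4 seconds.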

1.11 Innovation

Firstly, our system does not require any additional hardware if the existing machines are already well connected in a network. Secondly, even in a heterogeneous environment with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for suitably adapted applications. This is because the mathematical distribution algorithm considers the relative processing powers of the nodes, distributing to each node only the amount of load that it can process in the calculated optimal time of the system; all the nodes then process their respective tasks and produce output at this calculated time. The most important point about our system is the ability to use diskless nodes in the cluster, thereby reducing hardware cost, space, and the required maintenance. Also, in the case of binary executables (when source code is not available) our system exhibits almost 20% performance gains.


Chapter 2. Requirement Analysis

2.1 Determining the overall mission of Parallex

• User base: Students, educational institutes, small to medium business

organizations.

• Cluster usage: one part of the cluster will be fully dedicated to solving the problem at hand, and an optional part will use computing resources from individual workstations. In the latter part, the parallel problems will have lower priorities.

• Software to be run on cluster: Depends upon the user base. At the cluster

management level, the system software will be Linux.

• Dedicated or shared cluster: As mentioned above it will be both.

• Extent of the cluster: computers that are all on the same subnet.

2.2 Functional Requirements for Parallex system

Functional Requirement 1

The PCs must be connected in a LAN so that the system can be used without any obstacles.

Functional Requirement 2

There will be one master (controlling) node, which will distribute tasks according to the processing speed of each node.

Services

Three services are to be provided on the master.

1. A network monitoring tool for resource discovery (IP addresses, MAC addresses, UP/DOWN status, etc.).

2. The distribution algorithm, which distributes the tasks according to the current processing speed of the nodes.

3. The Parallex master script, which sends the distributed tasks to the nodes, collects the results, integrates them, and produces the output.
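The report does not show how the network monitoring tool is implemented. Purely as an illustration (this is not the Parallex tool), a Linux master can obtain the IP/MAC pairs of hosts it has recently talked to by reading the kernel ARP cache in /proc/net/arp:

/* arpscan.c - illustrative only: list IP and MAC addresses of hosts
 * currently in the Linux ARP cache. Not the actual Parallex monitor. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/net/arp", "r");
    if (!fp) { perror("/proc/net/arp"); return 1; }

    char line[256];
    /* skip the header line printed by the kernel */
    if (!fgets(line, sizeof(line), fp)) { fclose(fp); return 1; }

    while (fgets(line, sizeof(line), fp)) {
        char ip[64], hwtype[16], flags[16], mac[64], mask[16], dev[32];
        if (sscanf(line, "%63s %15s %15s %63s %15s %31s",
                   ip, hwtype, flags, mac, mask, dev) == 6) {
            /* flags 0x2 means the entry is complete (the host answered) */
            printf("%-15s %-17s via %s  %s\n", ip, mac, dev,
                   strcmp(flags, "0x2") == 0 ? "UP" : "stale/incomplete");
        }
    }
    fclose(fp);
    return 0;
}

An UP/DOWN check in the spirit of the requirement could combine output like this with a periodic ping of each expected node.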


Functional Requirement 3

The final size of the executable code should be such that it fits within the limited memory constraints of the machine.

Functional Requirement 4

This product will only be used to speed up applications that already exist in the enterprise.

2.3 Non-Functional Requirements for system

- Performance

As described in Section 1.11, even in a heterogeneous environment with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, maintaining around 80% efficiency for suitably adapted applications. The mathematical distribution algorithm considers the relative processing powers of the nodes and gives each node only the load it can process in the calculated optimal time, so all nodes produce their output at that time. The ability to use diskless nodes in the cluster reduces hardware cost, space, and maintenance, and for binary executables (when source code is not available) the system exhibits almost 20% performance gains.

- Cost

While a system of n parallel processors is less efficient than one n times faster

processor, the Parallel System is often cheaper to build. Parallel computation is used

for tasks which require very large amounts of computation, take a lot of time, and can

be divided into n independent subtasks. In recent years, most high performance

computing systems, also known as supercomputers, have parallel architectures.


- Manufacturing costs

No extra hardware is required beyond the cost of setting up the LAN.

- Benchmarks

There are at least three reasons for running benchmarks. First, a benchmark will

provide us with a baseline. If we make changes to our cluster or if we suspect

problems with our cluster, we can rerun the benchmark to see if performance is really

any different. Second, benchmarks are useful when comparing systems or cluster

configurations. They can provide a reasonable basis for selecting between

alternatives. Finally, benchmarks can be helpful with planning.

For benchmarking we will use a 3D rendering tool named Povray (Persistence Of

Vision Ray tracer, please see the Appendix for more details).
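To connect the benchmark to the cluster, one classic way to parallelize a POV-Ray render is to give each node a band of image rows using POV-Ray's partial-render options (+SR/+ER for start and end row). The sketch below is only an illustration, not part of the report; the scene name, image size, and equal-sized bands are assumptions.

/* rowsplit.c - illustrative only: print the POV-Ray command each node
 * would run to render its own band of rows of a single frame.
 * +SR/+ER select the start and end row of a partial render.          */
#include <stdio.h>

int main(void)
{
    const int height = 480;          /* image height in rows (example)   */
    const int nodes  = 4;            /* number of render nodes (example) */

    for (int i = 0; i < nodes; i++) {
        int start = height * i / nodes + 1;      /* POV-Ray rows are 1-based */
        int end   = height * (i + 1) / nodes;
        printf("node%d: povray +Iscene.pov +W640 +H480 +SR%d +ER%d +Opart%d.ppm\n",
               i, start, end, i);
    }
    return 0;
}

In an actual Parallex run the bands would be sized by the distribution algorithm according to each node's measured speed rather than equally.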

- Hardware required

x686-class PCs (Linux, 2.6.x kernel, installed, with intranet connection)

Switch (10/100BASE-T)

Serial port connectors

100BASE-T LAN cable, RJ-45 connectors

- Software Resources Required

Linux (2.6.x kernel)

Intel Compiler suite (Noncommercial)

LSB (Linux Standard Base) Set of GNU Kits with GNU CC/C++/F77/LD/AS

GNU Krell monitor

- Number of PCs connected in the LAN

8 nodes in the LAN


Chapter 3. Project Plan

Plan of execution for the project was as follows:

Serial No. | Activity | Software Used | Number of Days
1 | Project planning: (a) choosing the domain, (b) identifying key areas of work, (c) requirement analysis | - | 10
2 | Basic installation of LINUX | LINUX (2.6x kernel) | 3
3 | Brushing up on C programming skills | - | 5
4 | Shell scripting | LINUX (2.6x kernel), GNU BASH | 12
5 | C programming in the LINUX environment | GNU C Compiler Suite | 5
6 | A demo project (Universal Sudoku Solver) to familiarize ourselves with the LINUX programming environment | GNU C Compiler Suite, INTEL Compiler Suite (non-commercial) | 16
7 | Study of advanced LINUX tools and installation of packages & RED HAT RPMs | iptraf, mc, tar, rpm, awk, sed, GNU plot, strace, gdb, etc. | 10
8 | Studying networking basics & network configuration in LINUX | - | 8
9 | Recompiling, patching and analyzing the system kernel | LINUX (kernel 2.6x.x), GNU C compiler | 3
10 | Study & implementation of advanced networking tools: SSH & NFS | ssh & OpenSSH, NFS | 7
11 | (a) Preparing the preliminary design of the total workflow of the project, (b) deciding the modules for overall execution and dividing the areas of concentration among the project group, (c) building the Stage I prototype | All of the above | 17
12 | Build Stage II prototype (replacing ssh with a custom-made application) | All of the above | 15
13 | Build Stage III prototype (making the diskless cluster) | All of the above | 10
14 | Testing & building final packages | All of the above | 10

Table 1.1 Project Plan


Chapter 4. System Design

Generally speaking, the design process of a distributed system involves three main

activities:

(1) designing the communication system that enables the distributed system resources

and objects to exchange information,

(2) defining the system structure (architecture) and the system services that enable

multiple computers to act as a system rather than as a collection of computers, and

(3) defining the distributed computing programming techniques to develop parallel

and distributed applications.

Based on this notion of the design process, the distributed system design framework

can be described in terms of three layers:

(1) network, protocol, and interface (NPI) layer,

(2) system architecture and services (SAS) layer, and

(3) distributed computing paradigms (DCP) layer. In what follows, we describe the

main design issues to be addressed in each layer.

Fig. 4.1 Design Framework


• Communication network, protocol, and interface layer. This layer describes the

main components of the communication system that will be used for passing control

and information among the distributed system resources. This layer is decomposed

into three sub layers: network type, communication protocols, and network interfaces.

• Distributed system architecture and services layer. This layer represents the

designer’s and system manager’s view of the system. SAS layer defines the structure

and architecture and the system services (distributed file system, concurrency control,

redundancy management, load sharing and balancing, security service, etc.) that must

be supported by the distributed system in order to provide a single-image computing

system.

• Distributed computing paradigms layer. This layer represents the programmer

(user) perception of the distributed system. This layer focuses on the programming

paradigms that can be used to develop distributed applications. Distributed computing

paradigms can be broadly characterized based on the computation and communication

models. Parallel and distributed computations can be described in terms of two

paradigms: functional parallel and data parallel paradigms. In functional parallel

paradigm, the computations are divided into distinct functions which are then

assigned to different computers. In data parallel paradigm, all the computers run the

same program, the same program multiple data (SPMD) stream, but each computer

operates on different data streams.
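As a generic illustration of the data-parallel (SPMD) idea just described (this is not Parallex code), every node could run the same small program and pick its own slice of the data from a node index passed on the command line:

/* spmd.c - generic data-parallel (SPMD) illustration, not Parallex code:
 * every node runs this same program, but argv tells it which slice of
 * the data range it owns.  Usage: ./spmd <node-index> <node-count> <N> */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <node-index> <node-count> <N>\n", argv[0]);
        return 1;
    }
    long idx = atol(argv[1]), count = atol(argv[2]), n = atol(argv[3]);

    /* split the range [0, n) into contiguous, nearly equal slices */
    long begin = n * idx / count;
    long end   = n * (idx + 1) / count;

    double sum = 0.0;
    for (long i = begin; i < end; i++)      /* same code, different data */
        sum += (double)i * i;

    printf("node %ld of %ld: processed [%ld, %ld), partial sum = %.0f\n",
           idx, count, begin, end, sum);
    return 0;
}

The master would launch one copy per node with a different <node-index> and combine the partial results; under the functional parallel paradigm the nodes would instead run different programs.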

With reference to Fig. 4.1, Parallex can be described as follows:


Fig. 4.2 Parallex Design


Chapter 5. Implementation Details

The goal of the project is to provide an efficient system that handles process parallelism with the help of clusters, and this parallelism reduces execution time. Currently we form a cluster of 8 nodes. Executing a heavy process on a single computer takes a long time, so we form a cluster and execute such processes in parallel by dividing each process into a number of sub-processes. Depending on the nodes in the cluster, we migrate the sub-processes to those nodes, and when execution is over the outputs they produce are brought back to the Master node. By doing this we reduce process execution time and increase CPU utilization.

5.1 Hardware Architecture

We have implemented a shared-nothing parallel architecture using a coarse-grain cluster structure. The interconnect is an ordinary 8-port switch and, optionally, a Class-B or Class-C network. It is a 3-level architecture:

1. Master topology

2. Slave Topology

3. Network interconnect

1. The Master is a machine running Linux with a 2.6.x or 2.4.x kernel (both under testing). It runs the parallel server and contains the application interface that drives the remaining machines. The master runs a network-scanning script to detect all the slaves that are alive and retrieves the necessary information about each slave. To determine the load on each slave just before the main application is processed, the master sends a small diagnostic application to each slave to estimate the load it can take at that moment. Having collected all the relevant information, it does all the scheduling and implements the parallel algorithms (distributing tasks according to processing power and current load), making use of CPU extensions (MMX, SSE, 3DNOW) depending on the slave architecture; it does everything except execute the program itself. It accepts the input task to be executed and allocates tasks to the underlying slave nodes constituting the parallel system, which execute them in parallel and return the output to the Master. The Master plays the role of a watchdog, which may or may not participate in the actual processing but manages the entire task.

2. A Slave is a single system cluster image (SSCI) node, dedicated to processing. It accepts a sub-task along with the necessary library modules, executes it, and returns the output to the Master. In our case the slaves are multi-boot-capable systems, which at one time may be diskless cluster hosts, at another may behave as general-purpose cluster nodes, and at yet another may act as normal PCs handling routine office and home tasks. In the case of diskless machines, the slave boots from a pre-created, appropriately patched kernel image.

3. The network interconnect merges the Master and Slave topologies. It makes use of an 8-port switch, RJ-45 connectors, and CAT 5 cables, in a star topology where the Master and the Slaves are interconnected through the switch.

Fig. 5.1 Parallel System H/W Architecture

Cluster Monitoring: Each slave runs a server that collects kernel, processing, I/O, memory, and CPU details from the /proc virtual file system and forwards them to the MASTER NODE (which here acts as a client of the server running on each slave); a user-space program plots them interactively on the master's screen, showing the CPU, memory, and I/O details of each node separately.
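For illustration only (this is not the Parallex monitoring server, whose source the report does not include), a slave-side collector on Linux could read figures such as the 1-minute load average and free memory straight from /proc before forwarding them to the master:

/* probe.c - illustrative slave-side probe: read the load average and
 * memory figures from the Linux /proc virtual file system and print one
 * line that a master could collect over the network. Not Parallex code. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    double load1 = 0.0;
    FILE *fp = fopen("/proc/loadavg", "r");
    if (fp) {
        if (fscanf(fp, "%lf", &load1) != 1)   /* first field: 1-minute load */
            load1 = 0.0;
        fclose(fp);
    }

    long mem_total = 0, mem_free = 0;
    fp = fopen("/proc/meminfo", "r");
    if (fp) {
        char line[128];
        long value;
        while (fgets(line, sizeof(line), fp)) {
            if (sscanf(line, "MemTotal: %ld", &value) == 1) mem_total = value;
            if (sscanf(line, "MemFree: %ld", &value) == 1)  mem_free  = value;
        }
        fclose(fp);
    }

    /* one line per sample; the real system would send this to the master */
    printf("load1=%.2f mem_total_kB=%ld mem_free_kB=%ld\n",
           load1, mem_total, mem_free);
    return 0;
}

In the spirit of the report's description, a small daemon could emit such a line every few seconds over a socket; temperature readings come from sensor interfaces that vary by machine and are therefore omitted here.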

5.2 Software Architecture

This architecture consists of two parts:

1. Master Architecture

2. Slave Architecture

Master consists of following levels.

1. Linux BIOS: Linux BIOS usually loads a Linux kernel.

2. Linux: Platform on which Master runs.

3. SSCI + Beoboot: This level extracts a single system cluster image used by

Slave nodes.

4. Fedora Core/ Red Hat: Actual Operating System running on Master.

5. System Services: essential services running on the Master, e.g. the RARP resolver daemon.

The Slave inherits from the Master, with the following levels:

1. Linux BIOS

2. Linux

3. SSCI

Fig 5.2 Parallel System S/W Architecture


Parallex is broadly divided into the following modules:

1. Scheduler: this is the heart of our system. With a radically new approach to data- and instruction-level distribution, we have implemented a heterogeneous cluster technology that is close to optimal. Tasks are allocated based on the actual measured processing capability of each node, not on the GHz figure quoted in the system's manual. Task allocation is dynamic, and the scheduling policy is based on the POSIX scheduling implementation. We are also capable of implementing preemption, which we currently do not do, since systems such as Linux and FreeBSD already provide industry-level preemption.

2. Job/instruction allocator: a set of remote-fork-like utilities that allocate the jobs to the nodes. Unlike traditional cluster technology, this job allocator can continue execution in disconnected mode, so the impact of network latency is substantially reduced during a temporary disconnection.

3. Accounting: we have written a utility, the "remote cluster monitor", which provides samples of results from all the nodes along with information about CPU load, temperature, and memory statistics. We estimate that, consuming less than 0.2% of CPU power, our network monitoring utility can sample over 1000 nodes in less than 3 seconds.

4. Authentication: all transactions between the nodes are 128-bit encrypted and do not require root privileges to run; only a common user needs to exist on all the standalone nodes. For the diskless part, even this restriction is removed.

5. Resource discovery: we run our own socket-layer resource discovery utility, which discovers any additional nodes and also reports if a resource has been lost. Any additional hardware capable of being used as part of the parallel system, such as an extra processor added to a node or a processor replaced with a dual-core one, is also reported continually.

6. Synchronizer: the central balancing component of the cluster. Since the cluster can simultaneously run both diskless and standalone nodes as part of the same cluster, the synchronizer queues the output in real time so that data is not mixed up and the results remain consistent. It does instruction dependency analysis and also uses pipelining over the network to make the interconnect more communicative.
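The diagnostic application that the master sends to gauge each node's current capability is not listed in the report; the sketch below is only a hypothetical stand-in that times a fixed CPU-bound loop and prints the elapsed wall-clock time, which is the kind of per-node figure the proportional split of Section 1.10 needs.

/* diag.c - hypothetical diagnostic probe (not the Parallex original):
 * time a fixed CPU-bound workload and report the elapsed seconds.
 * Compile with: gcc -O2 -o diag diag.c   (add -lrt on older glibc)    */
#include <stdio.h>
#include <time.h>

#define WORK 50000000L   /* fixed amount of arithmetic work */

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    volatile double acc = 0.0;             /* volatile keeps the loop alive */
    for (long i = 1; i <= WORK; i++)
        acc += 1.0 / (double)i;

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;

    /* the master stores this per-node time and weights tasks by 1/time */
    printf("%.3f\n", elapsed);
    return 0;
}

On a faster or less loaded node the printed time is smaller, so the 1/time weighting gives that node a larger share of the work.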

5.3 Description of software behavior

The end user submits the process/application to the administrator when the application is source-based, and the cluster administrator takes responsibility for explicitly parallelizing the application for maximum exploitation of the parallel architectures within the CPU and across the cluster nodes. When the application is binary only (no source), the user may submit the code directly to the master node's program acceptor, which runs the application with somewhat lower efficiency compared to source submissions handed to the administrator. The system as a whole is then responsible for minimizing processing time, which in turn increases throughput and speeds up processing.


5.3.1 Events

1. System Installation

2. Network initialization

3. Server and host configuration

4. Take input

5. Parallel execution

6. Send response

5.3.2 States

1. System Ready

2. System Busy

3. System Idle


Chapter 6. Technologies Used

6.1 General terms

We will now briefly define the general terms that will be used in further descriptions

or are related to our system.

Cluster: - Interconnection of a large number of computers working together in a closely synchronized manner to achieve higher performance, scalability and net computational power.

Master: - Server machine which acts as the administrator of the entire parallel Cluster

and executes task scheduling.

Slave: - A client node which executes the task as given by the Master.

SSCI: - Single System Cluster Image is the idea of presenting the cluster nodes as a single system image, in which each node behaves as if it were an additional processor, add-on RAM, etc. of the controlling Master computer. This is the base theory of cluster-level parallelism. Example implementations are multi-node NUMA (IBM/Sequent) multi-quad computers and SGI Altix servers. However, the idea of a true SSCI remains unimplemented for heterogeneous clusters used for parallel processing, except for supercomputing clusters such as Thunder and the Earth Simulator.

RARP: - Reverse Address Resolution

Protocol is a network layer protocol used to resolve an IP address from a

given hardware address (such as an Ethernet address / MAC Address).


BProc:-

The Beowulf Distributed Process Space (BProc) is a set of kernel modifications,

utilities and libraries which allow a user to start processes on other machines in a

Beowulf-style cluster. Remote processes started with this mechanism appear in the

process table of the front end machine in a cluster. This allows remote process

management using the normal UNIX process control facilities. Signals are

transparently forwarded to remote processes and exit status is received using the usual

wait() mechanisms.

Having discussed the basic concepts of parallel and distributed systems, the problems

in this field, and an overview of Parallex, we now move forward with the requirement

analysis and design details of our system.


Chapter 7. Testing

Logic Coverage/Decision Based: Test Cases

Sl. No. | Test Case Name         | Test Procedure                         | Pre-condition | Expected Result                                | Reference to Detailed Design
1       | Initial_frame_fail     | Initial frame not defined              | None          | Parallex should give an error & exit           | Distribution algo
2       | Final_frame_fail       | Final frame not defined                | None          | Parallex should give an error & exit           | Distribution algo
3       | Initial_final_full     | Initial & final frame given            | None          | Parallex should distribute according to speed  | Distribution algo
4       | Input_file_name_blank  | No input file given                    | None          | Input file not found                           | Parallex Master
5       | Input_parameters_blank | No parameters defined at command line  | None          | Exit on error                                  | Parallex Master

Table 7.1 Logic Coverage/Decision Testing


Initial Functional Test Cases for Parallex

Use Case        | Function Being Tested                                                | Initial System State                                                        | Input                              | Expected Output
System Startup  | Master is started when the switch is turned "on"                     | Master is off                                                               | Activate the "on" switch           | Master ON
System Startup  | Nodes are started when the switch is turned "on"                     | Nodes are off                                                               | Activate the "on" switch           | Nodes are ON
System Startup  | Nodes are assigned IPs by the master                                 | Booting                                                                     | Get boot image from Master         | Master shows that nodes are UP
System Shutdown | System is shut down when the switch is turned "off"                  | System is on and not servicing a customer                                   | Activate the "off" switch          | System is off
System Shutdown | Connection to the Master is terminated when the system is shut down | System has just been shut down                                              | --                                 | Verify from the Master side that a connection to the Slave no longer exists
Session         | System reads a customer's program                                    | System is on and not servicing a customer                                   | Insert a readable code/program     | Program accepted
Session         | System rejects an unreadable program                                 | System is on and not servicing a customer                                   | Insert an unreadable code/program  | Program is rejected; system displays an error screen; system is ready to start a new session
Session         | System accepts the customer's program                                | System is asking for entry of the RANGE of calculation                      | Enter a RANGE                      | System gets the RANGE
Session         | System breaks the task                                               | System is breaking the task according to the processing speed of the nodes  | Perform distribution algo          | System breaks the task & writes it into a file
Session         | System feeds the task to nodes for processing                        | System feeds tasks to the nodes for execution                               | Send tasks                         | System displays a menu of tasks running on the nodes
Session         | Session ends when all nodes give their output                        | System is getting the output of all nodes, displays it & ends               | Get the output from all nodes      | System displays the output & quits

Table 7.2 Functional Test Cases


Cyclomatic Complexity:

Control Flow Graph of a System:

Fig 7.1 Cyclomatic Diagram for the system

Cyclomatic complexity is a software metric developed by Thomas McCabe and used to measure the complexity of a program. It directly measures the number of linearly independent paths through a program's source code.

Computation of Cyclomatic Complexity:

In the above flow graph:
E = number of edges = 9
N = number of nodes = 7
M = E - N + 2 = 9 - 7 + 2 = 4

That is, the control flow graph has four linearly independent paths, which the test cases above must cover.


Console and Black Box Testing:

CONSOLE TEST CASES

Sr. No. | Test Procedure                        | Pre-condition                           | Expected Result                                | Actual Result
1       | Testing in Linux terminal             | Terminal variables have default values  | Xterm-related tools are disabled               | No graphical information displayed
2       | Invalid no. of arguments              | All nodes are up                        | Error message                                  | Proper usage given
3       | Pop-up terminals for different nodes  | All nodes are up                        | No. of pop-ups = no. of cores in alive nodes   | No. of pop-ups = no. of cores in alive nodes
4       | 3D rendering on a single machine      | All necessary files in place            | Live 3D rendering                              | Shows frame being rendered
5       | 3D rendering on the Parallex system   | All nodes are up                        | Status of rendering                            | Rendered video
6       | Mplayer testing                       | Rendered frames                         | Animation in .avi format                       | Rendered video (.avi)

Table 7.3 Console Test Cases


BLACK BOX TEST CASES

Sr. No. | Test Procedure                          | Pre-condition                                  | Expected Result                                        | Actual Result
1       | New node up                             | Node is down                                   | Status message displayed by NetMon tool                | Message: Node UP
2       | Node goes down                          | Node is up                                     | Status message displayed by NetMon tool                | Message: Node DOWN
3       | Nodes information                       | Nodes are up                                   | Internal information of nodes                          | Status, IP, MAC addr, RAM etc.
4       | Main task submission                    | Application is compiled                        | Next module called (distribution algo)                 | Processing speed of the nodes
5       | Main task submission with faulty input  | Application is compiled                        | Error                                                  | Display error & exit
6       | Distribution algorithm                  | Get RANGE                                      | Break task according to processing speed of the nodes  | Breaks the RANGE & generates scripts
7       | Cluster feed script                     | All nodes up                                   | Task sent to individual machines for execution         | Display shows task executed on each machine
8       | Result assembly                         | All machines have returned results             | Final result calculation                               | Final result displayed on screen
9       | Fault tolerance                         | Machine(s) go down in the middle of execution  | Error recovery script is executed                      | Task resent to all alive machines

Table 7.4 Black Box Testing


System Usage Specification Outline:

Fig 7.2 System Usage Pattern

Fig 7.3 Histogram


Runtime Benchmark:

Fig 7.4 One frame from complex rendering on Parallex: simulation of an explosion

The following is a comparison of the same application, with the same parameters, run on a standalone machine, an existing Beowulf parallel cluster, and our cluster system Parallex.

Application: POV-Ray

Hardware specifications:

NODE 0: Pentium 4, 2.8 GHz
NODE 1: Core 2 Duo, 2.8 GHz
NODE 2: AMD64, 2.01 GHz
NODE 3: AMD64, 1.80 GHz
NODE 4: Celeron D, 2.16 GHz


Benchmark Results:

Time      | Single Machine | Existing Parallel System (4 nodes) | Parallex Cluster System (4 nodes)
Real Time | 14m 44.3 s     | 3m 41.61 s                         | 3m 1.62 s
User Time | 13m 33.2 s     | 10m 4.67 s                         | 9m 30.75 s
Sys Time  | 2m 2.26 s      | 0m 2.26 s                          | 0m 2.31 s

Table 7.5 Benchmark Results

Note: the user time reported for a cluster is approximately the sum of the per-node user times.
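For reference, the real-time speedup over the standalone run works out to 884.3 s / 181.62 s ≈ 4.9x for Parallex, compared with 884.3 s / 221.61 s ≈ 4.0x for the existing parallel system.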


Chapter 8. Cost Estimation

Since the demand for processing power is growing far faster than the power of individual CPUs, and since the silicon chip is fast approaching its physical limits, the implementation of parallel processing at every level of computing becomes inevitable.

Therefore we propose that, in the coming years, parallel processing and the algorithms that support it, like the ones we have designed and implemented, will form the heart of modern computing. Not surprisingly, parallel processing has already begun to penetrate the mainstream computing market directly in the form of multi-core processors such as Intel's dual-core and quad-core chips.

Two of our primary aims, simple implementation and minimal administrative overhead, make deploying Parallex straightforward and effective. Parallex can easily be deployed in any sector of modern computing where CPU-intensive applications play an important role.

While a system of n parallel processors is less efficient than a single processor that is n times faster, the parallel system is often much cheaper to build. Parallel computation suits tasks that require very large amounts of computation, take a lot of time, and can be divided into n independent subtasks. In recent years, most high-performance computing systems, also known as supercomputers, have had parallel architectures.

Cost effectiveness is one of the major achievements of the Parallex system. We need no special or expensive hardware or software, so the system itself is inexpensive. Parallex is based on heterogeneous clusters in which raw CPU power is not an issue, thanks to our mathematical distribution algorithm; adding a few slower machines reduces efficiency by no more than about 5%.

In that sense we treat silicon waste as a challenge: outdated, slower CPUs are put back to work, which also makes the design environmentally friendly. Another feature of the system is the use of diskless nodes, which reduces the total cost by approximately 20% because the nodes need no local storage devices; a centralized storage solution is used instead. Last but not least, all of our software tools are open source.

Hence we conclude that Parallex is one of the most cost-effective systems in its genre.


Chapter 9. User Manual

9.1 Dedicated cluster setup

For the dedicated cluster with one master and many diskless slaves, all the user has to

do is install the RPMs supplied on the installation disk on the master. The BProc

configuration file will then be found at /etc/bproc/config.

9.1.1 BProc Configuration

Main configuration file:

/etc/bproc/config

• Edit with favorite text editor

• Lines consist of comments (starting with #)

• Rest are keyword followed by arguments

• Specify interface:

interface eth0 10.0.4.1 255.255.255.0

• eth0 is interface connected to nodes

• IP of master node is 10.0.4.1

• Netmask of master node is 255.255.255.0

• Interface will be configured when BProc is started

Specify range of IP addresses for nodes:

iprange 0 10.0.4.10 10.0.4.14

• Start assigning IP addresses at node 0


• First address is 10.0.4.10, last is 10.0.4.14

• The size of this range determines the number of nodes in the cluster

• Next entries are default libraries to be installed on nodes

• Can explicitly specify libraries or extract library information from an

executable

• Need to add entry to install extra libraries

librariesfrombinary /bin/ls /usr/bin/gdb

• The bplib command can be used to see libraries that will be loaded

Next line specifies the name of the phase 2 image

bootfile /var/bproc/boot.img

• Should be no need to change this

• Need to add a line to specify kernel command line

• kernelcommandline apm=off console=ttyS0,19200

• Turn APM support off (since these nodes don’t have any)

• Set console to use ttyS0 and speed to 19200

• This is used by beoboot command when building phase 2 image

Final lines specify Ethernet addresses of nodes, examples given

#node 0 00:50:56:00:00:00

#node 00:50:56:00:00:01

• Needed so node can learn its IP address from master

• First 0 is optional, assign this address to node 0

• Can automatically determine and add ethernet addresses using the

nodeadd command


• We will use this command later, so no need to change now

• Save file and exit from editor
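Putting the directives above together, a minimal /etc/bproc/config for this example setup would look roughly as follows (the values are the ones used in this section; the node lines are shown commented out, as in the examples above, until nodeadd adds the real Ethernet addresses):

# /etc/bproc/config (example assembled from the directives above)
interface eth0 10.0.4.1 255.255.255.0
iprange 0 10.0.4.10 10.0.4.14
librariesfrombinary /bin/ls /usr/bin/gdb
bootfile /var/bproc/boot.img
kernelcommandline apm=off console=ttyS0,19200
#node 0 00:50:56:00:00:00
#node 00:50:56:00:00:01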

Other configuration files

/etc/bproc/config.boot

• Specifies PCI devices that are going to be used by the nodes at boot time

• Modules are included in phase 1 and phase 2 boot images

• By default the node will try all network interfaces it can find

/etc/bproc/node_up.conf

• Specifies actions to be taken in order to bring a node up

• Load modules

• Configure network interfaces

• Probe for PCI devices

• Copy files and special devices out to node

9.1.2 Bringing up BProc

Check BProc will be started at boot time

# chkconfig --list clustermatic

• Restart master daemon and boot server

# service bjs stop

# service clustermatic restart

# service bjs start

• Load the new configuration


• BJS uses BProc, so needs to be stopped first

• Check interface has been configured correctly

# ifconfig eth0

• Should have IP address we specified in config file

9.1.3 Build a Phase 2 Image

• Run the beoboot command on the master

# beoboot -2 -n --plugin mon

• -2 this is a phase 2 image

• -n image will boot over network

• --plugin add plugin to the boot image

• The following warning messages can be safely ignored

WARNING: Didn’t find a kernel module called gmac.o

WARNING: Didn’t find a kernel module called bmac.o

• Check phase 2 image is available

# ls -l /var/clustermatic/boot.img

9.1.4 Loading the Phase 2 Image

• Two Kernel Monte is a piece of software which will load a new

Linux kernel replacing one that is already running


• This allows you to use Linux as your boot loader!

• Using Linux means you can use any network that Linux supports.

• There is no PXE bios or Etherboot support for Myrinet, Quadrics or Infiniband

• “Pink” network boots on Myrinet which allowed us to avoid buying a 1024

port ethernet network

• Currently supports x86 (including AMD64) and Alpha

9.1.5 Using the Cluster

bpsh

• Migrates a process to one or more nodes

• Process is started on front-end, but is immediately migrated onto nodes

• Effect similar to rsh command, but no login is performed and no shell is

started

• I/O forwarding can be controlled

• Output can be prefixed with node number

• Run date command on all nodes which are up

# bpsh -a -p date

• See other arguments that are available

# bpsh -h

bpcp

• Copies files to a node

• Files can come from master node, or other nodes

• Note that a node only has a ram disk by default

• Copy /etc/hosts from master to /tmp/hosts on node 0


# bpcp /etc/hosts 0:/tmp/hosts

# bpsh 0 cat /tmp/hosts

9.1.6 Managing the Cluster

bpstat

• Shows status of nodes

• up node is up and available

• down node is down or can’t be contacted by master

• boot node is coming up (running node_up)

• error an error occurred while the node was booting

• Shows owner and group of node

• Combined with permissions, determines who can start jobs on the node

• Shows permissions of the node

---x------ execute permission for node owner

------x--- execute permission for users in node group

---------x execute permission for other users

bpctl

• Control a node's status

• Reboot node 1 (takes about a minute)

# bpctl -S 1 -R

• Set state of node 0

# bpctl -S 0 -s groovy

• Only up, down, boot and error have special meaning, everything else

means not down


• Set owner of node 0

# bpctl -S 0 -u nobody

• Set permissions of node 0 so anyone can execute a job

# bpctl -S 0 -m 111

bplib

• Manage libraries that are loaded on a node

• List libraries to be loaded

# bplib -l

• Add a library to the list

# bplib -a /lib/libcrypt.so.1

• Remove a library from the list

# bplib -d /lib/libcrypt.so.1

9.1.7 Troubleshooting techniques

• The tcpdump command can be used to check for node activity during and after a

node has booted

• Connect a cable to serial port on node to check console output for errors in boot

process

• Once node reaches node_up processing, messages will be logged in

/var/log/bproc/node.N (where N is node number)
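For example, a plain tcpdump invocation on the master is enough to see whether a booting node is generating any traffic; assuming eth0 is the cluster-side interface and 10.0.4.10 is the first address from the iprange example above:

# tcpdump -i eth0 -n
# tcpdump -i eth0 -n host 10.0.4.10     (limit the capture to one node)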


9.2 Shared Cluster Setup

Once you have the basic installation completed, you'll need to configure the system.

Many of the tasks are no different for machines in a cluster than for any other system.

For other tasks, being part of a cluster impacts what needs to be done. The following

subsections describe the issues associated with several services that require special

considerations.

9.2.1 DHCP

Dynamic Host Configuration Protocol (DHCP) is used to supply network

configuration parameters, including IP addresses, host names, and other information

to clients as they boot. With clusters, the head node is often configured as a DHCP

server and the compute nodes as DHCP clients. There are two reasons to do this. First,

it simplifies the installation of compute nodes since the information DHCP can supply

is often the only thing that is different among the nodes. Since a DHCP server can

handle these differences, the node installation can be standardized and automated. A

second advantage of DHCP is that it is much easier to change the configuration of the

network. You simply change the configuration file on the DHCP server, restart the

server, and reboot each of the compute nodes.

The basic installation is rarely a problem. The DHCP system can be installed as a part

of the initial Linux installation or after Linux has been installed. The DHCP server

configuration file, typically /etc/dhcpd.conf, controls the information distributed to

the clients. If you are going to have problems, the configuration file is the most likely

source.

The DHCP configuration file may be created or changed automatically when some

cluster software is installed. Occasionally, the changes may not be done optimally or

even correctly so you should have at least a reading knowledge of DHCP

configuration files. Here is a heavily commented sample configuration file that

illustrates the basics. (Lines starting with "#" are comments.)


# A sample DHCP configuration file.

# The first commands in this file are global,

# i.e., they apply to all clients.

# Only answer requests from known machines,

# i.e., machines whose hardware addresses are given.

deny unknown-clients;

# Set the subnet mask, broadcast address, and router address.

option subnet-mask 255.255.255.0;

option broadcast-address 172.16.1.255;

option routers 172.16.1.254;

# This section defines individual cluster nodes.

# Each subnet in the network has its own section.

subnet 172.16.1.0 netmask 255.255.255.0 {

group {

# The first host, identified by the given MAC address,

# will be named node1.cluster.int, will be given the

# IP address 172.16.1.1, and will use the default router

# 172.16.1.254 (the head node in this case).

host node1{

hardware ethernet 00:08:c7:07:68:48;

fixed-address 172.16.1.1;

option routers 172.16.1.254;


option domain-name "cluster.int";

}

host node2{

hardware ethernet 00:08:c7:07:c1:73;

fixed-address 172.16.1.2;

option routers 172.16.1.254;

option domain-name "cluster.int";

}

# Additional node definitions go here.

}

}

# For servers with multiple interfaces, this entry says to ignore requests

# on specified subnets.

subnet 10.0.32.0 netmask 255.255.248.0 { not authoritative; }

As shown in this example, you should include a subnet section for each subnet on

your network. If the head node has an interface for the cluster and a second interface

connected to the Internet or your organization's network, the configuration file will

have a group for each interface or subnet. Since the head node should answer DHCP

requests for the cluster but not for the organization, DHCP should be configured so

that it will respond only to DHCP requests from the compute nodes.
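After editing the configuration file, the DHCP server must be restarted for the changes to take effect. On the Red Hat-style systems used in these examples this is typically done with the service and chkconfig commands (paths and service names can differ on other distributions); the second command makes the server start automatically at boot:

# /sbin/service dhcpd restart
# /sbin/chkconfig dhcpd on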

9.2.2 NFS

A network filesystem is a filesystem that physically resides on one computer (the file

server), which in turn shares its files over the network with other computers on the

network (the clients). The best-known and most common network filesystem is

Network File System (NFS). In setting up a cluster, designate one computer as your

NFS server. This is often the head node for the cluster, but there is no reason it has to


be. In fact, under some circumstances, you may get slightly better performance if you

use different machines for the NFS server and head node. Since the server is where

your user files will reside, make sure you have enough storage. This machine is a

likely candidate for a second disk drive or raid array and a fast I/O subsystem. You

may even want to consider mirroring the filesystem using a small high-availability

cluster.

Why use an NFS? It should come as no surprise that for parallel programming you'll

need a copy of the compiled code or executable on each machine on which it will run.

You could, of course, copy the executable over to the individual machines, but this

quickly becomes tiresome. A shared filesystem solves this problem. Another

advantage to an NFS is that all the files you will be working on will be on the same

system. This greatly simplifies backups. (You do backups, don't you?) A shared

filesystem also simplifies setting up SSH, as it eliminates the need to distribute keys.

(SSH is described later in this chapter.) For this reason, you may want to set up NFS

before setting up SSH. NFS can also play an essential role in some installation

strategies.

If you have never used NFS before, setting up the client and the server are slightly

different, but neither is particularly difficult. Most Linux distributions come with most

of the work already done for you.

9.2.2.1 Running NFS

Begin with the server; you won't get anywhere with the client if the server isn't

already running. Two things need to be done to get the server running. The file

/etc/exports must be edited to specify which machines can mount which directories,

and then the server software must be started. Here is a single line from the file

/etc/exports on the server amy:

/home basil(rw) clara(rw) desmond(rw) ernest(rw) george(rw)

This line gives the clients basil, clara, desmond, ernest, and george read/write access

to the directory /home on the server. Read access is the default. A number of other


options are available and could be included. For example, the no_root_squash option

could be added if you want to edit root permission files from the nodes.

Had a space been inadvertently included between basil and (rw), read access would

have been granted to basil and read/write access would have been granted to all other

systems. (Once you have the systems set up, it is a good idea to use the command

showmount -a to see who is mounting what.)

Once /etc/exports has been edited, you'll need to start NFS. For testing, you can use

the service command as shown here

[root@fanny init.d]# /sbin/service nfs start

Starting NFS services: [ OK ]

Starting NFS quotas: [ OK ]

Starting NFS mountd: [ OK ]

Starting NFS daemon: [ OK ]

[root@fanny init.d]# /sbin/service nfs status

rpc.mountd (pid 1652) is running...

nfsd (pid 1666 1665 1664 1663 1662 1661 1660 1657) is running...

rpc.rquotad (pid 1647) is running...

(With some Linux distributions, when restarting NFS, you may find it necessary to

explicitly stop and restart both nfslock and portmap as well.) You'll want to change

the system configuration so that this starts automatically when the system is rebooted.

For example, with Red Hat, you could use the serviceconf or chkconfig commands.
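If the NFS server is already running and you have only changed /etc/exports, a full restart is not required; the standard exportfs utility re-reads the exports file:

# /usr/sbin/exportfs -ra     (re-export everything listed in /etc/exports)
# /usr/sbin/exportfs -v      (verify what is currently exported)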


For the client, the software is probably already running on your system. You just need

to tell the client to mount the remote filesystem. You can do this several ways, but in

the long run, the easiest approach is to edit the file /etc/fstab, adding an entry for the

server. Basically, you'll add a line to the file that looks something like this:

amy:/home /home nfs rw,soft 0 0

In this example, the local system mounts the /home filesystem located on amy as the

/home directory on the local machine. The filesystems may have different names. You

can now manually mount the filesystem with the mount command

[root@ida /]# mount /home

When the system reboots, this will be done automatically.

When using NFS, you should keep a couple of things in mind. The mount point,

/home, must exist on the client prior to mounting. While the remote directory is

mounted, any files that were stored on the local system in the /home directory will be

inaccessible. They are still there; you just can't get to them while the remote directory

is mounted. Next, if you are running a firewall, it will probably block NFS traffic. If

you are having problems with NFS, this is one of the first things you should check.

File ownership can also create some surprises. User and group IDs should be

consistent among systems using NFS, i.e., each user will have identical IDs on all

systems. Finally, be aware that root privileges don't extend across NFS shared systems

(if you have configured your systems correctly). So if, as root, you change the

directory (cd) to a remotely mounted filesystem, don't expect to be able to look at

every file. (Of course, as root you can always use su to become the owner and do all

the snooping you want.) Details for the syntax and options can be found in the nfs(5),

exports(5), fstab(5), and mount(8) manpages.

9.2.3 SSH


To run software across a cluster, you'll need some mechanism to start processes on

each machine. In practice, a prerequisite is the ability to log onto each machine within

the cluster. If you need to enter a password for each machine each time you run a

program, you won't get very much done. What is needed is a mechanism that allows

logins without passwords.

This boils down to two choices—you can use remote shell (RSH) or secure shell

(SSH). If you are a trusting soul, you may want to use RSH. It is simpler to set up with

less overhead. On the other hand, SSH network traffic is encrypted, so it is safe from

snooping. Since SSH provides greater security, it is generally the preferred approach.

SSH provides mechanisms to log onto remote machines, run programs on remote

machines, and copy files among machines. SSH is a replacement for ftp, telnet, rlogin,

rsh, and rcp. A commercial version of SSH is available from SSH Communications

Security (http://www.ssh.com), a company founded by Tatu Ylönen, an original

developer of SSH. Or you can go with OpenSSH, an open source version from

http://www.openssh.org.

OpenSSH is the easiest since it is already included with most Linux distributions. It

has other advantages as well. By default, OpenSSH automatically forwards the

DISPLAY variable. This greatly simplifies using the X Window System across the

cluster. If you are running an SSH connection under X on your local machine and

execute an X program on the remote machine, the X window will automatically open

on the local machine. This can be disabled on the server side, so if it isn't working,

that is the first place to look.

There are two sets of SSH protocols, SSH-1 and SSH-2. Unfortunately, SSH-1 has a

serious security vulnerability. SSH-2 is now the protocol of choice. This discussion

will focus on using OpenSSH with SSH-2.

Before setting up SSH, check to see if it is already installed and running on your

system. With Red Hat, you can check to see what packages are installed using the

package manager.

[root@fanny root]# rpm -q -a | grep ssh


openssh-3.5p1-6

openssh-server-3.5p1-6

openssh-clients-3.5p1-6

openssh-askpass-gnome-3.5p1-6

openssh-askpass-3.5p1-6

This particular system has the SSH core package, both server and client software as

well as additional utilities. The SSH daemon is usually started as a service. As you

can see, it is already running on this machine.

[root@fanny root]# /sbin/service sshd status

sshd (pid 28190 1658) is running...

Of course, it is possible that it wasn't started as a service but is still installed and

running. You can use ps to double check.

[root@fanny root]# ps -aux | grep ssh

root 29133 0.0 0.2 3520 328 ? S Dec09 0:02 /usr/sbin/sshd

...

Again, this shows the server is running.

With some older Red Hat installations, e.g., the 7.3 workstation, only the client

software is installed by default. You'll need to manually install the server software. If

using Red Hat 7.3, go to the second install disk and copy over the file

RedHat/RPMS/openssh-server-3.1p1-3.i386.rpm. (Better yet, download the latest


version of this software.) Install it with the package manager and then start the

service.

[root@james root]# rpm -vih openssh-server-3.1p1-3.i386.rpm

Preparing... ########################################### [100%]

1:openssh-server ########################################### [100%]

[root@james root]# /sbin/service sshd start

Generating SSH1 RSA host key: [ OK ]

Generating SSH2 RSA host key: [ OK ]

Generating SSH2 DSA host key: [ OK ]

Starting sshd: [ OK ]

When SSH is started for the first time, encryption keys for the system are generated.

Be sure to set this up so that it is done automatically when the system reboots.

Configuration files for both the server, sshd_config, and client, ssh_config, can be

found in /etc/ssh, but the default settings are usually quite reasonable. You shouldn't

need to change these files.
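One exception worth knowing about relates to the X forwarding behavior described earlier: if it does not work, check that the standard OpenSSH option below is enabled in /etc/ssh/sshd_config on the server, then restart sshd.

X11Forwarding yes
# /sbin/service sshd restart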

9.2.3.1 Using SSH

To log onto a remote machine, use the command ssh with the name or IP address of

the remote machine as an argument. The first time you connect to a remote machine,

you will receive a message with the remote machines' fingerprint, a string that

identifies the machine. You'll be asked whether to proceed or not. This is normal.

[root@fanny root]# ssh amy


The authenticity of host 'amy (10.0.32.139)' can't be established.

RSA key fingerprint is 98:42:51:3e:90:43:1c:32:e6:c4:cc:8f:4a:ee:cd:86.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'amy,10.0.32.139' (RSA) to the list of known hosts.

root@amy's password:

Last login: Tue Dec 9 11:24:09 2003

[root@amy root]#

The fingerprint will be recorded in a list of known hosts on the local machine. SSH

will compare fingerprints on subsequent logins to ensure that nothing has changed.

You won't see anything else about the fingerprint unless it changes. Then SSH will

warn you and query whether you should continue. If the remote system has changed,

e.g., if it has been rebuilt or if SSH has been reinstalled, it's OK to proceed. But if you

think the remote system hasn't changed, you should investigate further before logging

in.

Notice in the last example that SSH automatically uses the same identity when

logging into a remote machine. If you want to log on as a different user, use the -l

option with the appropriate account name.

You can also use SSH to execute commands on remote systems. Here is an example

of using date remotely.

[root@fanny root]# ssh -l sloanjd hector date

sloanjd@hector's password:


Mon Dec 22 09:28:46 EST 2003

Notice that a different account, sloanjd, was used in this example.

To copy files, you use the scp command. For example,

[root@fanny root]# scp /etc/motd george:/root/

root@george's password:

motd 100% |************************ *****| 0 00:00

Here file /etc/motd was copied from fanny to the /root directory on george.

In the examples thus far, the system has asked for a password each time a command

was run. If you want to avoid this, you'll need to do some extra work. You'll need to

generate a pair of authorization keys that will be used to control access and then store

these in the directory ~/.ssh. The ssh-keygen command is used to generate keys.

[sloanjd@fanny sloanjd]$ ssh-keygen -b1024 -trsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/sloanjd/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/sloanjd/.ssh/id_rsa.

Your public key has been saved in /home/sloanjd/.ssh/id_rsa.pub.


The key fingerprint is:

2d:c8:d1:e1:bc:90:b2:f6:6d:2e:a5:7f:db:26:60:3f sloanjd@fanny

[sloanjd@fanny sloanjd]$ cd .ssh

[sloanjd@fanny .ssh]$ ls -a

. .. id_rsa id_rsa.pub known_hosts

The options in this example are used to specify a 1,024-bit key and the RSA

algorithm. (You can use DSA instead of RSA if you prefer.) Notice that SSH will

prompt you for a pass phrase, basically a multi-word password.

Two keys are generated, a public and a private key. The private key should never be

shared and resides only on the client machine. The public key is distributed to remote

machines. Copy the public key to each system you'll want to log onto, renaming it

authorized_keys2.

[sloanjd@fanny .ssh]$ cp id_rsa.pub authorized_keys2

[sloanjd@fanny .ssh]$ chmod go-rwx authorized_keys2

[sloanjd@fanny .ssh]$ chmod 755 ~/.ssh

If you are using NFS, as shown here, all you need to do is copy and rename the file in

the current directory. Since that directory is mounted on each system in the cluster, it

is automatically available.

If you used the NFS setup described earlier, root's home

directory, /root, is not shared. If you want to log in as root


without a password, manually copy the public keys to the target

machines. You'll need to decide whether you feel secure setting

up the root account like this.
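The manual copy can be done with scp, which was shown earlier. As an illustration, assuming a node named node1 on which /root/.ssh already exists:

# scp /root/.ssh/id_rsa.pub node1:/root/.ssh/authorized_keys2
# ssh node1 chmod go-rwx /root/.ssh/authorized_keys2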

You will use two utilities supplied with SSH to manage the login process. The first is

an SSH agent program that caches private keys, ssh-agent. This program stores the

keys locally and uses them to respond to authentication queries from SSH clients. The

second utility, ssh-add, is used to manage the local key cache. Among other things, it

can be used to add, list, or remove keys.

[sloanjd@fanny .ssh]$ ssh-agent $SHELL

[sloanjd@fanny .ssh]$ ssh-add

Enter passphrase for /home/sloanjd/.ssh/id_rsa:

Identity added: /home/sloanjd/.ssh/id_rsa (/home/sloanjd/.ssh/id_rsa)

(While this example uses the $SHELL variable, you can substitute the actual name of

the shell you want to run if you wish.) Once this is done, you can log in to remote

machines without a password.

This process can be automated to varying degrees. For example, you can add the call

to ssh-agent as the last line of your login script so that it will be run before you make

any changes to your shell's environment. Once you have done this, you'll need to run

ssh-add only when you log in. But you should be aware that Red Hat console logins

don't like this change.

You can find more information by looking at the ssh(1), ssh-agent(1), and ssh-add(1)

manpages. If you want more details on how to set up ssh-agent, you might look at

SSH, The Secure Shell by Barrett and Silverman, O'Reilly, 2001. You can also find


scripts on the Internet that will set up a persistent agent so that you won't need to

rerun ssh-add each time.

9.2.4 Hosts file and name services

Life will be much simpler in the long run if you provide appropriate name services.

NIS is certainly one possibility. At a minimum, don't forget to edit /etc/hosts for your

cluster. At the very least, this will reduce network traffic and speed up some software.

And some packages assume it is correctly installed. Here are a few lines from the host

file for amy:

127.0.0.1 localhost.localdomain localhost

10.0.32.139 amy.wofford.int amy

10.0.32.140 basil.wofford.int basil

...

Notice that amy is not included on the line with localhost. Specifying the host name as

an alias for localhost can break some software.

9.3 Working with Parallex

Once the master has been configured and all nodes are up, working with Parallex to

utilize all your available resources is very easy. Follow these simple steps to use the

power of all nodes that are up.

• Compile your code and place it in $PARALLEX_DIR/bin/

You can use the Makefile to do this for you.

# make main_app

• After the application is compiled without any errors, first start the networking

monitoring tool of Parallex


# netmon

• Parallex will now know which machines are up and running in your cluster.

To read information about the machines, run the following command:

# parastat

• To get a graphical representation about CPU usage and other stats about your

slave machines run the Gkrellm configuration script.

# gkrllm_config

• To run the main application on Parallex engine just run the master script

followed by the full path of the executable binary that was compiled from your

source application and a list of arguments that indicate the data set that is to be

parallelized as follows:

# parallex ../bin/my_app 1 99999999
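The steps above can be collected into a small wrapper script. The sketch below is only an illustration and assumes that the Parallex tools (netmon, parastat, parallex) are on the PATH, that netmon returns after probing the nodes (otherwise run it in the background), and that $PARALLEX_DIR is set as described earlier:

#!/bin/sh
# run_parallex.sh - sketch of the workflow in section 9.3
# Usage: ./run_parallex.sh <app_name> <range_start> <range_end>
APP=$1; START=$2; END=$3

make "$APP"                                        # build into $PARALLEX_DIR/bin/
netmon                                             # find out which machines are up
parastat                                           # show information about live nodes
parallex "$PARALLEX_DIR/bin/$APP" "$START" "$END"  # run across the whole cluster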


Chapter 10. Conclusion

There exist many solutions for running applications on distributed/parallel systems. Parallex, however, is a single complete solution that takes care of all issues related to high-performance computing, from cluster boot-up to the management of processes on remote machines.

Parallex is also unique in that it supports both dedicated and shared cluster architectures. Its ability to efficiently utilize the available computing resources means that the cluster requires no special hardware, nor does it have to be homogeneous (i.e., made up of identical machines), resulting in significant cost savings.

Parallex, in its current state, is intended for use in educational institutes and small to medium-sized businesses. However, it can easily be adapted for a range of applications, from mathematical and scientific computing to 3D rendering.

Hence, because of its simplicity, adaptability, ease of use and relatively low cost of ownership, we conclude that Parallex is a poor man's supercomputer.


Chapter 11. Future Enhancements

Handling binary-level parallelism: Given the source code, the master can successfully break the application up for parallel processing; to handle binary executables, however, we intend to use openMosix, a Linux kernel extension for single-system-image clustering. Processes originating from any one node can migrate to another node if the originating node is too busy compared to the others, and openMosix continuously attempts to optimize the resource allocation. Because openMosix implements distributed computing by extending the kernel, it is transparent to all applications. We are working on including openMosix so that we can add load balancing to parallel processing.

Compatibility with non-Unix platforms: At present Parallex can run on multiple platforms, with the restriction that all of them must be Unix based (Linux, FreeBSD, NetBSD, Plan 9, Darwin, etc.). A further restriction is that applications to be run on Parallex must be portable across all of these systems. To work with other platforms, one solution is to run a virtual machine hosting one of the supported platforms as a guest OS.


Chapter 12. References

[1] Parallel Computer Architecture: A Hardware/Software Approach. David Culler. Morgan Kaufmann Publishers, San Francisco, CA.

[2] High Performance Computing, 2nd Edition. Kevin Dowd and Charles Severance. O'Reilly & Associates, Sebastopol, CA.

[3] Sourcebook of Parallel Computing. Jack Dongarra et al. Morgan Kaufmann Publishers, San Francisco, CA.

[4] High Performance Linux Clusters. Joseph Sloan. O'Reilly Media Inc., CA.

[5] Parallel Computing on Heterogeneous Networks. Alexey L. Lastovetsky.

[6] Designing and Building Parallel Programs. Ian Foster.

[7] Tools and Environments for Parallel and Distributed Computing. Salim Hariri and Manish Parashar.

[8] Performance Tuning Techniques for Clusters. Troy Baer.

[9] Introduction to Parallel Computing. Los Alamos National Laboratory.

[10] http://bproc.sourceforge.net/bproc.html: BProc homepage.

[11] www.beowulf.org: Homepage of the Beowulf project.

[12] Beowulf Cluster Computing with Linux, Second Edition. William Gropp, Ewing Lusk and Thomas Sterling.

[13] Parallel I/O for High Performance Computing. John M. May.

[14] High Performance Computing and Beowulf Clusters. R.J. Allan, S.J. Andrews and M.F. Guest.

[15] www.kernel.org: Kernel sources.


APPENDIX A. BProc

BProc (Beowulf Distributed Process Space)

The Beowulf Distributed Process Space (BProc) is a set of kernel modifications,

utilities and libraries which allow a user to start processes on other machines in a

Beowulf-style cluster. Remote processes started with this mechanism appear in the

process table of the front end machine in a cluster. This allows remote process

management using the normal UNIX process control facilities. Signals are

transparently forwarded to remote processes and exit status is received using the usual

wait() mechanisms.

BProc:-

• Manages a single process space across machines

• Responsible for process startup and management

• Provides commands for starting processes, copying files to nodes, etc.

BProc is a Linux kernel modification which provides:-

• A single system image for process control in a cluster

• Process migration for creating processes in a cluster

In a BProc cluster, there is a single master and many slaves

• Users (including root) only log into the master

• The master’s process space is the process space for the cluster

• All processes in the cluster are

• Created from the master

• Visible on the master

• Controlled from the master


A1.0 Motivation

rsh and rlogin are a lousy way to interact with the machines in a cluster.

Being able to log into any machine in the cluster instantly necessitates a large amount

of software and configuration be present on the machine. You will need things like

shells for people to log in. You will need an up to date password database. You'll need

all the little programs that people expect to see on a UNIX system for people to be

comfortable using the system. You'll probably also need all the setup scripts and

associated configuration information to get the machines up to the point where they're

actually usable by the users. That sucks. There's an awful lot of configuration there.

With a large number of machines, it's also very easy for the users to make a mess.

Runaway processes are a problem.

The goal of BProc is to change the model of the cluster from a pile of PCs to a single machine with a collection of network-attached compute resources, and, of course, to do away with rsh and rlogin in the cluster environment.

Once we do away with interactive logins, we are left with two basic needs: a way to start processes on remote machines and, most importantly, a way to monitor and control what is going on on those machines.

BProc provides process migration mechanisms which allow a process to place

copies of itself on remote machines via a remote fork system call. When creating

remote processes via this mechanism, the child processes are all visible in the front

end's process tree.

The central idea in BProc is the idea of a distributed process ID (PID) space.

Every instance of Linux has a process space - a pool of process IDs and a process tree.

BProc takes the process space of the front end machine and allows portions of it to

exist on the other machines in the cluster. The machine distributing pieces of its

process space is the master machine and the machines accepting pieces of it to run are

the slave machines.


A2.0 Process Migration

• BProc provides a process migration system to place processes on

other nodes in the cluster

• Process migration on BProc is not

• Transparent

• Preemptive

• A process must call the migration system call in order to move

• Process migration on BProc is

• Very fast (1.9s to place a 16MB process on 1024 nodes)

• Scalable

• It can create many copies for the same process (e.g. MPI startup) very efficiently

• O(log #copies)

A2.1 Process migration does preserve

• The contents of memory and memory related metadata

• CPU State (registers)

• Signal handler state

• Process migration does not preserve

• Shared memory regions

• Open files

• SysV IPC resources

• Just about anything else that isn’t “memory”

A3.0 Running on a Slave Node

• BProc is a process management system

• All other system calls are handled locally on the slave node

• BProc does not impose any extra overhead on non-process related

system calls

• File and Network I/O are always handled locally

• Calling open() will not cause contact with the master node


• This means network and file I/O are as fast as they can be

A4.0 Implementation:

BProc consists of four basic pieces. On the master node, there are "ghost processes"

which are place holders in the process tree that represent remote processes. There is

also the master daemon which is the message router for the system and is also the

piece which maintains state information about which processes exist where. On the

slave nodes there is process ID masquerading which is a system of lying to processes

there so that they appear (to themselves) to be in the master's process space. There is

also a simple daemon on the slave side which is mostly just a message pipe between

the slave's kernel and the network.

A4.1 Ghost Processes

Code reuse is good. BProc tries to recycle as much of the kernel's existing process

infrastructure as possible. The UNIX process model is well thought out and certainly

well understood. All the details of the UNIX model have been hammered out and it

works well. Rather than try and change or simplify it for BProc, BProc tries to keep it

entirely. Rather than creating some new kind of semi-bogus process tree, BProc uses

the existing tree and fills the places which represent remote processes with light

weight "ghost" processes.

Ghost processes are normal processes except that they lack a memory space and open

files. They resemble kernel threads like kswapd and kflushd. It is possible for ghosts

to wake up and run on the front end. They have their own status (i.e. sleeping,

running) which is independent of the remote processes they represent. Most of the

time, however, they sleep and wait for the remote process to request one of the few

operations which are performed on their behalf.

Ghost processes mirror portions of the status of the remote process. The status include

information such as the process state and the amount of CPU time that it has used so

far. This aternate status is what gets presented to user space in the procfs filesystem.

This status gets updated on demand (via a request to the real process) and no more

often than every 5 seconds.


Ghosts catch and forward signals to the remote process. Since ghosts are kernel threads (not running in user space), they can catch and forward SIGKILL and SIGSTOP. There is no way to get rid of a ghost process without the remote process exiting.

Ghosts perform certain operations on behalf of the real processes they represent - in particular, fork() and wait(). If a process on a remote machine decides to fork, a new process ID must be allocated for it in the master's process space, and a new ghost should appear on the front end when the remote process forks. Having the ghost call fork() accomplishes both of these nicely. Likewise, the ghost process cleans up the process tree on the front end by performing wait()s when necessary. Finally, the ghost will exit() with the appropriate status when the remote process it represents exits. Since the ghost is a kernel thread, it can accurately reflect the exit status of the remote process, including states such as killed by a signal and core dumped.

A4.2 Process ID Masquerading

The slave nodes accept pieces of the master's process space. The problem is that although a process might move to a different machine, it should not appear (to that process) that it has left the process space of the front end. That means things like the process ID can't change, and system calls like kill() should function as if the process were still on the front end - that is, it should not be possible to send signals across process spaces to the other processes on the slave node.

Since the slave doesn't control the process space of the processes it is accepting, not all operations can be handled entirely locally either; fork() is a good example.

The solution BProc uses is to ignore the process ID that a process gets when it is created on the slave side. BProc attaches a second process ID to the process and modifies the process ID related system calls to essentially lie to the process about what its ID is.


Having this extra tag also allows the slave daemon to differentiate the process from the other processes on the system when performing process ID related system calls.
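A quick way to observe the masquerading from user space (a sketch only, assuming the bpsh utility from the BProc userland; the node number is made up) is to ask a remote shell for its own PID:

# Run a shell on slave node 4 and have it print its own PID. Because
# the PID-related system calls on the slave are masqueraded, the value
# printed should be a PID from the master's process space, and the same
# process can be seen and signalled from the master under that PID.
bpsh 4 sh -c 'echo "my PID is $$"'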

A4.3 The Daemons

The master and slave daemons are the glue connecting the ghosts and the real processes together.

A4.4 Design Principles

BProc's design is based on the following basic principles.

A4.5 Code reuse is good

BProc uses placeholders called ghosts in the normal UNIX process tree on the front end to represent remote processes. The parent-child relationships are a no-brainer that way, and so is handling signals, wait(), etc.

A4.6 Code reuse is really good

Code reuse is even more important in user space, since user-space programs seem to change very often. To avoid having to write its own set of process viewing utilities like ps and top, BProc presents all the information about remote processes in the procfs file system, just like the system does for normal processes. As long as BProc keeps up with changes in the procfs file system, all existing and future process viewing/control utilities will continue to work.

A4.7 The System must be bullet proof! (from user space)

Processes can't escape or confuse the management system. Ghosts need to properly forward all signals, including SIGKILL and SIGSTOP. There is no way for a ghost to exit without the process it represents also exiting.


A4.8 Kernels shouldn't talk on the network.

The kernel is a very bad place to screw up, so BProc tries to keep as much as possible outside of kernel space. This includes message routing and all the information about the current state of the machine.

A4.9 Minimum knowledge

If a piece of the system doesn't really need to know something, it isn't told. The master daemon is the only piece that knows where the processes actually exist. The kernel layers only have a notion of processes that are here or not here. Slaves don't even know their own node number.

In Brief:

• All processes are started from the master with process migration.
• All processes remain visible on the master - no runaways. Normal UNIX process control works for ALL processes in the cluster (see the sketch after this list).
• No need for direct interaction with the nodes: there is no need to log into a node to control what is running there.
• No software is required on the nodes except the BProc slave daemon - ZERO software maintenance on the nodes, diskless nodes without NFS root, and reliable nodes.
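As a minimal sketch of what that control looks like in practice (assuming the bpsh utility from the BProc userland; the node number and the cpumunch program are just examples), a remote job can be started, inspected and killed entirely from the master:

# Start a CPU-bound job on slave node 2, from the master.
bpsh 2 ./cpumunch &

# The job appears in the master's process tree like any local process.
ps -ef | grep '[c]pumunch'

# Normal UNIX process control works: signalling the PID shown on the
# master terminates the remote process, because the ghost forwards the
# signal. No login on the node is required.
pkill cpumunch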

A4.10 Screen Shots

Every self-respecting piece of software provides a screenshot of some kind. For BProc we have a shot of top. Note the CPU states line; cpumunch is a stupid little program that just eats up CPU time on remote nodes.

3:08pm up 2:25, 7 users, load average: 0.13, 0.07, 0.07


175 processes: 46 sleeping, 129 running, 0 zombie, 0 stopped
CPU states: 12798.7% user, 8.3% system, 0.0% nice, 0.0% idle
Mem: 128188K av, 57476K used, 70712K free, 23852K shrd, 17168K buff
Swap: 130748K av, 0K used, 130748K free

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND

1540 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1541 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1542 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1543 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1544 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1545 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:35 cpumunch

1546 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1547 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1548 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1549 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1550 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1551 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1552 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

1553 hendriks 0 0 0 0 0 RW 0 99.9 0.0 21:34 cpumunch

The processes here appear swapped out because the ghosts don't have a memory space and procfs doesn't mirror remote memory sizes.


APPENDIX - B: POV-Ray

B1.0 What is POV-Ray?

POV-Ray™ is short for the Persistence of Vision™ Raytracer, a tool for producing high-quality computer graphics. POV-Ray™ is copyrighted freeware; that is to say, the authors retain all rights and copyright over the program, but permit you to use it for no charge, subject to the conditions stated in the license, which should be in the documentation directory as povlegal.doc.

Without a doubt, POV-Ray is the world's most popular raytracer. The POV-Ray website alone sees well over 100,000 downloads per year, and this of course doesn't count the copies obtained from mirror sites, other internet sites, on CD-ROM, or from persons sharing their copies.

The fact that it is free helps a lot in this area, of course, but there's more to it than that. There are quite a few other free ray tracers and renderers available. What makes this program different?

The answers are too numerous to detail in full here. Suffice it to say that POV-Ray has the right balance of power and versatility to satisfy extremely experienced and competent users, while at the same time not being so intimidating as to completely scare new users off.

Of course, the most important factor is image quality, and in the right hands POV-Ray has it. The developers have seen images rendered with the software that at first looked like photographs - they were that realistic. (Note that photo-realism is an advanced skill, one that takes some practice to develop.)

B1.1 What is POV-Ray for Unix?


POV-Ray for Unix is essentially a version of the POV-Ray rendering engine prepared for running on a Unix or Unix-like operating system (such as GNU/Linux). It contains all the features of POV-Ray described in chapters 2 and 3 of the documentation, plus a few others specific to Unix and GNU/Linux systems. These additional features do not affect the core rendering code; they only make the program suitable for running under a Unix-based system and provide the user with Unix-specific display capabilities. For instance, POV-Ray for Unix can use the X Window System to display the image it is rendering. On GNU/Linux machines, it can also display the image directly on the console screen using the SVGA library.

POV-Ray for Unix uses the same scheme as the other supported platforms to create ray-traced images. The POV-Ray input is platform-independent: it uses text files (POV-Ray scripts) to describe the scene - camera, lights, and various objects.

B2.0 Available distributions

There are two official distributions of POV-Ray for Unix:

• Source package: contains all the source files and Makefiles required for building POV-Ray. Building the program from source should work on most Unix systems. The package uses a configuration mechanism to detect the appropriate settings for building POV-Ray on your platform. All required support libraries are included in the package. See the INSTALL file of the source package for details.

• Linux binary package: contains a compiled version of POV-Ray for x86-compatible platforms running the GNU/Linux operating system. A shell script for easy installation is also included. Further details are given in the README file of this package.

Both distributions are available for download from the POV-Ray website and the POV-Ray FTP server (ftp.povray.org).


B3.0 Configuration

All official versions of POV-Ray for Unix come with procedures for correctly installing and configuring POV-Ray. The following explanations are for reference.

B3.1.1 The I/O Restrictions configuration file

When POV-Ray starts, it reads the configuration for the I/O Restrictions feature from the povray.conf files. See the I/O Restrictions documentation for a description of these files.

B3.1.2 The main POV-Ray INI file

When starting, POV-Ray for Unix searches for an INI file containing default configuration options. The details can be found in the INI File Documentation.

B3.1.3 Starting a Render Job

Starting POV-Ray rendering of a scene file is as simple as running povray from the command line with the scene file name as an argument. This works with either a POV file or an INI file (as long as it has an associated POV file); see Understanding File Types. The scene is rendered with the current POV-Ray 3 options (see Understanding POV-Ray Options).
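For example (the scene file name is illustrative):

# Render scene.pov with the options currently in effect; by default the
# image is written to scene.png and previewed in a display window.
povray scene.pov

# The same render with a few options given explicitly on the command
# line: input file, 640x480 output, and antialiasing.
povray +Iscene.pov +W640 +H480 +A0.3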

Note: one of the more common errors new users make is turning off the display option. The Display option (+d) is ON by default. If you turn it OFF in the INI file or on the command line, POV-Ray will not display the image as it renders.

Please also note that POV-Ray for Unix writes the output file to a .png by default. There is no way to 'save the render window' after rendering is completed. If you turned file output off before starting the render and then change your mind, you will have to start the rendering all over again. We recommend that you simply leave file output on all the time.


B3.2.1 X Window display

When the X Window display is used, the rendered image is displayed in a graphics window. During rendering, the window is updated after every scanline has been rendered, or sooner if the rendering is taking a long time. To update it sooner you can click any mouse button in the window or press (almost) any key. Pressing <CTRL-R> or <CTRL-L> during rendering will refresh the whole screen. If you have the Exit_Enable or +X flag set, pressing 'q' or 'Q' at any time during the rendering will stop POV-Ray rendering and exit. The rendering will pause when complete if the Pause_When_Done (or +P) flag is set. To exit at this point, press the 'q' or 'Q' key or click any mouse button in the window.

POV-Ray 3.6 includes a color icon in the program if it was compiled with libXpm (which is available on most platforms where the X Window System is installed). Whether this icon is used for the render view window depends on the window manager being used (KDE, Gnome, fvwm, ...). POV-Ray also comes with a separate color icon (xpovicon.xpm) for use with window managers that can use external icons. For instance, to have fvwm use this icon, copy the icon file to one of the directories pointed to by PixmapPath (or ImagePath), which is defined in your $HOME/.fvwmrc. Then, add the following line to $HOME/.fvwmrc:

Style "Povray" Icon xpovicon.xpm

and restart the X Window server (restarting fvwm will not be enough). Using this icon with another window manager may require a different procedure.

Documentation of the special command-line options to configure the X Window display can be found in Special Command-Line Options.

B3.2.2 SVGAlib display

For GNU/Linux systems that don't have the X Window System installed, or for those Linux users who prefer to run on the console, it is possible to use the SVGA library to display directly to the screen.


For SVGAlib display, the povray binary must be installed as a setuid root executable. If POV-Ray does not use the SVGAlib display, first try (as root):

chown root.root povray
chmod 4755 povray

Note: doing this may have serious security implications. Running POV-Ray as root or through 'sudo' might be a better idea.

If it still doesn't work, make sure SVGAlib is installed on your machine and works properly. Anything that can at least use the 320x200x256 mode (i.e. regular VGA) should be fine, although modes up to 1280x1024x16M are possible. If you do not have root privileges or cannot have the system administrator install POV-Ray, then you must use the X Window or text display, which do not require any special system privileges to run. If you are using a display resolution lower than what you are rendering, the display will be scaled to fit as much of the viewing window as possible.

B3.3.0 Output file formats

The default output file format of POV-Ray for Unix is PNG (+fn). This can be changed at runtime by setting the Output_File_Type or +fx option. Alternatively, the default format can be changed at compile time by setting DEFAULT_FILE_FORMAT in the config.h file located in the unix/ directory. Other convenient formats on Unix systems might be PPM (+fp) and TGA (+ft). For more information about output file formats see File Output Options.
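For instance (scene and output file names are illustrative):

# Render to the default PNG output (scene.png).
povray +Iscene.pov +FN

# Render the same scene to a Targa file instead.
povray +Iscene.pov +FT +Oscene.tga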

If you are generating histogram files (see CPU Utilization Histogram) in the CSV format (comma-separated values), then the units of time are tens of microseconds (10 x 10^-6 s), and each grid block can store times up to 12 hours.

To interrupt a rendering in progress, you can use CTRL-C (SIGINT), which allows POV-Ray to finish writing out any rendered data before it quits. When a graphics display mode is used, you can also press the 'q' or 'Q' key in the rendering preview window to interrupt the trace, if the Test_Abort (or +X) flag is set.


B4.0 Rendering the Sample Scenes

POV-Ray for Unix comes with a set of shell scripts to automatically render the sample scenes shipped with POV-Ray. These shell scripts are usually installed in /usr/local/share/povray-3.6/scripts and require a bash-compatible shell. Three scripts are meant to be called by the user:

• allscene.sh: renders all stills. The syntax is:
  allscene.sh [log] [all] [-d scene_directory] [-o output_directory] [-h html_file]
  If html_file is specified, an HTML listing of the rendered scenes is generated. If ImageMagick is installed, the listing will also contain thumbnails of the rendered images.

• allanim.sh: renders all animations. The syntax is:
  allanim.sh [log] [-d scene_directory] [-o output_directory] [-h html_file]
  If ffmpeg is installed, the script will compile MPEG files from the rendered animations.

• portfolio.sh: renders the portfolio. The syntax is:
  portfolio.sh [log] [-d scene_directory] [-o output_directory]
  The portfolio is a collection of images illustrating the POV-Ray features and include files coming with the package.

If the option log is specified, a log file with the complete text output from POV-Ray is written (filename log.txt).


If scene_directory is specified, the sample scenes in this directory are rendered; otherwise the scene directory is determined from the main povray INI file (usually /usr/local/share/povray-3.6/scenes).

If output_directory is specified, all images are written to this directory; if it is not specified, the images are written into the scene file directories. If those directories are not writable, the images are written to the current directory. All other files (HTML files, thumbnails) are written there as well.

To determine the correct render options, the scripts analyze the beginning of each scene file. They search for a comment of the form

// -w320 -h240 +a0.3

in the first 50 lines of the scene. The animation script may also use an INI file with the same base name as the scene file. allscene.sh has an additional all option which, if specified, also renders scenes without such an options comment (using default options).
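A typical invocation (the output locations are illustrative) might be:

# Render all still sample scenes, keep a log, and build an HTML index
# with thumbnails in ~/povray-renders.
/usr/local/share/povray-3.6/scripts/allscene.sh log \
    -o ~/povray-renders -h ~/povray-renders/index.html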

B5.0 POV-Ray for Unix Tips

B5.1 Automated execution

POV-Ray for Unix is well suited for automated execution, for example for rendering diagrams that display statistical data on a regular basis, or similar tasks.

POV-Ray can also write its image output directly to stdout, so the image data can be piped into another program for further processing. To do this, the special output filename '-' needs to be specified. For instance:

povray -iscene.pov +fp -o- | cjpeg > scene.jpg

will pass the image data to the cjpeg utility, which writes the image in JPEG format.

The text output of POV-Ray is always written to stderr; it can be redirected to a file with (using a Bourne-compatible shell):


povray [Options] 2> log.txt

For remote execution of POV-Ray, for example in a rendering service on the web, make sure you read and comply with the POV-Ray Legal Document.

B6.0 Understanding File Types

B6.1 POV Files

POV-Ray for Unix works with two types of plain text files. The first is the standard POV-Ray scene description file. Although you may give files of this type any legitimate file name, it is easiest to give them the .pov extension. Here, scene description files are referred to as POV files.

The second type, the initialization file, is new to POV-Ray 3. Initialization files normally have the .ini extension and are referred to here as INI files.

B6.2 INI Files

An INI file is a text file containing settings for what used to be called POV-Ray command-line options. It replaces and expands on the functions of the DEF files associated with previous versions of POV-Ray. You can store a default set of options in the main POV-Ray INI file, which is searched for at the following locations:

• The location defined by the POVINI environment variable. When you want to use an INI file at a custom location, you can set this environment variable.
• ./povray.ini
• $HOME/.povray/3.6/povray.ini
• PREFIX/etc/povray/3.6/povray.ini (PREFIX is /usr/local by default)

For backwards compatibility with version 3.5, POV-Ray 3.6 also attempts to read the main INI file from the old locations when none is found at the places above:

• $HOME/.povrayrc


• PREFIX/etc/povray.ini (PREFIX is /usr/local by default)

Note: use of these locations is deprecated; they will not be available in future versions.

Any other INI file can be specified by passing the INI file name on the command line. One of the options you can set in the INI file is the name of an input file, so you can specify the name of a POV file there. This way you can customize POV-Ray settings for any individual scene file.

For instance, if you have a file called scene.pov, you can create a file scene.ini to contain settings specific to scene.pov. If you include the option Input_File_Name=scene.pov in scene.ini and then run povray scene.ini, POV-Ray will process scene.pov with the options specified in scene.ini.
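A minimal scene.ini along those lines (the option values are illustrative) could be created and used like this:

# Write per-scene defaults into scene.ini, then render with it.
cat > scene.ini <<'EOF'
Input_File_Name=scene.pov
Width=640
Height=480
Antialias=on
EOF
povray scene.ini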

Remember, though, that any options set at the command line when you activate an INI file override any corresponding options in the INI file (see Understanding POV-Ray Options). Also, any options you do not set in the INI file will be taken as last set by any other INI file, or as originally determined in povray.ini.

You can instruct POV-Ray to generate an INI file containing all the options active at the time of rendering. This way, you can pass a POV file and its associated INI file on to another person and be confident that they will be able to generate the scene exactly the same way you did. See the section titled Using INI Files for more information about INI files.

B6.2.1 INI File Sections

Sections are not files in themselves; they are portions of INI files. Sections are a means of grouping multiple sets of POV-Ray options together in a single INI file by introducing them with a section label. Consider the following INI file, taken from the POV-Ray 3 documentation:

; RES.INI


; This sample INI file is used to set resolution.

+W120 +H100    ; This section has no label.
               ; Select it with "RES"

[Low]
+W80 +H60      ; This section has a label.
               ; Select it with "RES[Low]"

[Med]
+W320 +H200    ; This section has a label.
               ; Select it with "RES[Med]"

[High]
+W640 +H480    ; Labels are not case sensitive.
               ; "RES[high]" works

[Really High]
+W800 +H600    ; Labels may contain blanks

If you select this INI file, the default resolution setting will be 120 x 100. As soon as you select the [High] section, however, the resolution becomes 640 x 480.

B7.0 Special Command-Line Options

POV-Ray for Unix supports several special command-line options not recognized by other versions. They follow the standards for programs that run under the X Window System.

-display <display_name>
    Display the preview on display_name rather than the default display. This is meant to be used to send the display to a remote host. The normal display option +d is still valid.

-geometry [WIDTHxHEIGHT][+XOFF+YOFF]


    Render the image with WIDTH and HEIGHT as the dimensions, and locate the window XOFF from the left edge and YOFF from the top edge of the screen (or, if negative, from the right and bottom edges respectively). For instance, -geometry 640x480+10+20 creates a display for a 640x480 image placed 10 and 20 pixels from the top-left corner of the screen. The WIDTH and HEIGHT, if given, override any previous +Wn and +Hn settings.

-help
    Display the X Window System-specific options. Use -H by itself on the command line to output the general POV-Ray options.

-icon
    Start the preview window as an icon.

-title <window_title>
    Override the default preview window title with window_title.

-visual <visual_type>
    Use the deepest visual of visual_type, if available, instead of the automatically selected visual. Valid visuals are StaticGray, GrayScale, StaticColor, PseudoColor, TrueColor, or DirectColor.

Note: if you are supplying a filename with spaces in it, you will need to enclose the filename itself within quotes.
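For example (the scene file, host name and window title are illustrative):

# Preview on a remote X display, in a 320x240 window offset 10,10 from
# the top-left corner of the screen, with a custom window title.
povray +Iscene.pov -display otherhost:0.0 \
    -geometry 320x240+10+10 -title "Parallex render"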


Glossary of Terms and Acronyms

3D RENDERING
Creating 3D animations or 3D scenes.

BEOWULF CLUSTER
A high-performance cluster built with commodity off-the-shelf hardware.

BINARY-LEVEL PARALLELISM
Parallelism at the instruction level.

BLAS
Basic Linear Algebra Subprograms (BLAS) is a de facto application programming interface standard for publishing libraries that perform basic linear algebra operations such as vector and matrix multiplication.

BPROC
The Beowulf Distributed Process Space (BProc) is a set of kernel modifications, utilities and libraries which allow a user to start processes on other machines in a Beowulf-style cluster.

CAT 5 CABLES
Category 5 cable, commonly known as Cat 5, is a twisted-pair cable type designed for high signal integrity. This type of cable is often used in structured cabling for computer networks such as Ethernet, and is also used to carry many other signals such as basic voice services, token ring, and ATM.

DHCP
Dynamic Host Configuration Protocol. It is used to assign IP leases to client machines.

DISTRIBUTED COMPUTING
Distributed computing is a form of computing in which a collection of independent machines appears to its users as a single coherent system.


FREEBSD
FreeBSD is a Unix-like free operating system descended from AT&T UNIX via the Berkeley Software Distribution (BSD) branch through the 386BSD and 4.4BSD operating systems. FreeBSD has been characterized as "the unknown giant among free operating systems." It is not a clone of UNIX, but works like UNIX, with UNIX-compliant internals and system APIs.

IRIX
IRIX is an operating system by Silicon Graphics, Inc.

MIPS
MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. MIPS designs are currently used primarily in embedded systems such as the Series2 TiVo, Windows CE devices, Cisco routers, and Foneras, and in video game consoles like the Nintendo 64 and the Sony PlayStation, PlayStation 2, and PlayStation Portable.

MPI
Message Passing Interface. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users.

NETBSD
NetBSD is a freely redistributable, open source version of the Unix-derivative BSD computer operating system. Noted for its portability and the quality of its design and implementation, it is often used in embedded systems and as a starting point for porting other operating systems to new computer architectures.


NFS
Network File System. A network filesystem is a filesystem that physically resides on one computer (the file server), which in turn shares its files over the network with other computers (the clients).

PLAN 9
Plan 9 from Bell Labs is a distributed operating system, primarily used as a research vehicle. It was developed as the research successor to Unix by the Computing Sciences Research Center at Bell Labs. Plan 9 is most notable for representing all system interfaces, including those required for networking and the user interface, through the filesystem rather than specialized interfaces.

POVRAY
Persistence of Vision Raytracer. A 3D rendering tool.

PVM
Parallel Virtual Machine. A tool used to run applications in parallel.

RARP
Reverse Address Resolution Protocol. It is used to resolve an IP address from a MAC address.

RPM
RPM Package Manager (originally Red Hat Package Manager, abbreviated RPM) is a package management system. The name RPM refers to two things: a software package file format, and software packaged in this format. RPM was intended primarily for Linux distributions; the RPM file format is the baseline package format of the Linux Standard Base.

RSH
Remote shell protocol. It is used for remote login into client machines.


SSI
Single System Image (SSI) clustering: presenting the collection of machines that make up a cluster as a single machine.

SSH
Secure Shell protocol. It is an encrypted replacement for RSH, used for connecting to a remote machine on the network or logging into a remote machine with a password.

TCP/IP
Transmission Control Protocol/Internet Protocol. This protocol suite is used to transmit messages reliably between computers, including computers on different networks.


“Parallex – The Super Computer” Memorable Journey

Parallex’s First Prototype with two machines

“Parallex – The Super Computer” with Diskless Machines


Display of Parallex Master

At Parallex Stall with our Project Guide Prof. Anil J. Kadam

(Representing Computer Department)


All smiles: Chief Guest and Guest of Honour of Engineering Today 2008 at the Parallex stall

Explaining our “Parallex – The Super Computer”
(Our HOD madam at extreme right)


“Parallex – The Super Computer” Achievements

• FIRST in the intercollegiate national-level event “EXCELSIOR 08” project competition & exhibition.

• FIRST in the national-level students' technical symposium and exposition “AISSMS Engineering Today 2008” Project Competition.

• SECOND in the national-level students' technical symposium and exposition “AISSMS Engineering Today 2008” Technical Paper Presentation.

• FIRST in the national-level technical event “Zion 2008” project competition.

• Finalist in many national-level project competitions.

• Letter of Recommendation from our Head of Department and support for setting up a “High Performance Computing” laboratory (letter attached on the next page).


Letter of Recommendation