cyclotron: a secure, isolated, virtual cycle-scavenging grid in the enterprise



CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2010; 22:241–260
Published online 21 August 2009 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1475

Cyclotron: a secure, isolated, virtual cycle-scavenging grid in the enterprise

Kevin Kane∗,† and Blair Dillaway

Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

SUMMARY

Cycle-scavenging grids appeal to organizations with large numbers of workstations that remain idle outside of normal working hours. This represents a potential source of grid computing cycles, but the security and isolation issues that come with the use of non-dedicated resources have slowed their adoption in the enterprise. In this paper we present Cyclotron, a prototype cycle-scavenging grid solution that leverages virtualization and a declarative security policy-based access control infrastructure, supporting flexible authorization rules and the constrained delegation of access rights, to address these requirements. Copyright © 2009 John Wiley & Sons, Ltd.

Received 27 February 2009; Revised 13 May 2009; Accepted 16 June 2009

KEY WORDS: cycle-scavenging; cycle-stealing; grid computing; security; virtualization; access control; policy language

1. INTRODUCTION

Cycle-scavenging, or cycle-stealing, grids have become popular alternatives to their dedicated cousins with the promise of large amounts of processing time without the equipment, data center, and staffing costs involved with maintaining a dedicated grid. Cycle-scavenging grids operate on a ‘best-effort’ basis, where the availability and reliability of compute nodes are unknown. This makes them ill-suited for distributed applications with a large coordination or communication overhead, but ideal for decoupled applications. Some recent studies have suggested that less than 40% of the processing capacity and less than 50% of the disk storage capacity are used on typical desktop PCs. This has proven alluring to the scientific research community, academia, and commercial and government organizations. There are many successful examples of their use in the former areas, both for research computing grids and public ‘volunteer’ computing efforts. Their use within the latter organizations has been limited by concerns over information security and isolation from other critical computing tasks [1,2].

∗Correspondence to: Kevin Kane, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, U.S.A.
†E-mail: [email protected]

There are five security and isolation characteristics of a cycle-scavenging grid solution which the authors believe are important to potential adopters. These are:

• Grid entity authentication and access control. All entities participating in the grid must be identifiable and strongly authenticated to support high-assurance authorization and auditing. This is a pre-requisite to ensuring that grid resources are only used by grid entities for authorized actions. This should include restrictions on who can submit and manage jobs, access grid software and job-related data on file shares, and install and execute grid software on the compute hosts.

• Grid job isolation. Grid job execution should not interfere with the primary function of the host nor alter the host configuration in ways which could impact the primary function.

• Grid execution platform stability. Differences in an OS’s configuration or patch level and installed shared libraries can introduce application incompatibilities. Grid users need confidence that their jobs will run successfully in the target execution environment. This argues strongly for a well-defined OS and standard library environment for executing grid applications. In some environments it is also important to be able to reproduce computational results when requested. Providing an identical execution environment is important in achieving this characteristic. It is unlikely the primary OS environment across a large number of volunteer workstations will provide such a uniform and stable execution environment.

• Controlled software deployment. To the primary user of a workstation, and the overall organization, grid job software is a potential threat. It can be malicious, introduce security vulnerabilities, or simply be buggy. Without appropriate controls, a grid infrastructure can be easily subverted to create a viral dissemination mechanism. Hence, there must be positive control over what grid executables are allowed to be deployed and executed on the cycle-scavenging compute hosts.

• Grid access to organizational resources. Grid services and executing jobs should only have access to the required grid resources, whether machine local or remote. It is undesirable to run grid jobs and services in a security context which allows for broad access to organizational resources or allows them to access resources in a manner which is indistinguishable from the actions of non-grid organizational users and applications.

In addition to the above considerations, it is important that the grid imposes a relatively modest administrative burden and does not force changes in an organization’s existing identity and access control infrastructure. Naturally, it must also be reliable and make efficient use of computation resources to be compelling.

In this paper we describe Cyclotron, a cycle-scavenging grid environment that addresses the above requirements. It offers significant improvements in grid security and isolation characteristics relative to existing solutions. Our initial experimentation provides evidence that such a system can achieve high reliability while providing efficient computation and a relatively low administrative burden.

In Section 2, we discuss other cycle-scavenging grid solutions with an emphasis on their ability to address the above requirements. In Section 3, we present the architecture of Cyclotron, detail our security mechanisms, and evaluate how each of the above requirements is fulfilled. In Section 4, we describe our experimental setup, including both the dedicated servers that run the grid and the compute nodes whose cycles were donated by employees (located in several countries) on the Microsoft corporate intranet. In Section 5, we discuss possible future work and conclude.

2. RELATED WORK

In this section, we briefly review some prominent cycle-scavenging grid systems and their approach to security and isolation. Space constraints prevent more than a cursory comparison.

Condor [3] is a grid computing system that supports both dedicated computing nodes as well as scavenging cycles from idle desktops. It operates on UNIX and Microsoft Windows systems, using a pre-existing authentication and authorization infrastructure. Jobs typically execute under the account of the submitting user, allowing jobs to access any user-accessible resources. This can be problematic in that grid jobs are indistinguishable from other user applications and can perform any user-authorized action. It also requires a highly privileged grid agent capable of starting jobs under any account. Compromise of such an agent can allow an attacker access to other organizational assets. Typically, Condor executes jobs on the workstation OS, relying on account-based isolation from the primary workstation applications. This can provide adequate isolation between well-behaved applications. Process isolation mechanisms prevent direct interaction between executing code, and ACLs can limit access to persistent data if properly configured. Still, it does not address the types of indirect interference problems that can arise between the grid and primary applications resulting from CPU contention or memory pressure.

SETI@home [4] is one of the more successful cycle-scavenging applications. It is one of several volunteer computing projects using the Berkeley Open Infrastructure for Network Computing (BOINC) [5]. BOINC uses code signing for job binary integrity, though users must trust the service to provide non-malicious code. Jobs run on the host OS under a user-configured account, which has the same issues identified for Condor. In addition, the burden is on the user to properly configure a grid account and local ACLs to provide the desired isolation.

OurGrid [6] is a peer-oriented cycle-scavenging system. It is designed to facilitate the creation of peer computing communities and provides fairly weak control over the grid participants and the jobs which can be run. A notable feature is its support for grid job sandboxing using XEN-based virtual machines. This provides good isolation between the grid and primary workstation applications. The downside is the fairly limited support provided for access to network resources. This has both positive and negative ramifications. On the positive side, it mitigates the risk of network-based attacks by hostile grid applications. On the negative side, it complicates application provisioning, results retrieval, and the ability to leverage external services.

The Digipede Network [7] is a commercial cycle-scavenging solution built around Microsoft .NET and targeting Windows hosts. Volunteer hosts run an agent that checks in with a central server for available jobs. The agent typically runs under an unprivileged account and executes jobs in this security context. This provides process-based isolation from the primary workstation applications and prevents modification of non-grid resources if ACLs are configured properly. Digipede recognizes that some grid jobs do need access to external services to access shared data sets, deliver results, or perform inter-job coordination. Its solution is to run such jobs under a common Windows domain account and authorize that account with the necessary service access rights. This reduces the ability to isolate, and distinguish between, different grid jobs.

Univa UD (formerly United Devices) Grid MP [8] targets both Linux and Windows-based hosts, and offers the Secure Execution and Automation Layer (SEAL) in which jobs are executed. SEAL validates job modules through the use of digital signatures, and provides compressed and encrypted communications. Its design requires that all grid application and data files be staged through the Grid MP servers; jobs are then executed on a host machine under a grid agent running as an unprivileged user account. This has the same types of concerns about interaction between the grid and primary workstation applications as noted previously. It also forces one to place all grid software and data in a common repository with a uniform access model. This provides a limited ability to isolate and distinguish between different grid jobs. It also limits flexibility by precluding grid job access to external services other than the Grid MP servers.

Cyclotron combines the desirable features of these solutions, such as code signing and virtualization. It then adds a standard set of images for platform stability and homogeneity for ease of deployment, and a security model that allows fine-grained control over resources without the loss of granularity or delegation of unnecessary privileges that prove problematic in other solutions.

3. CYCLOTRON DESCRIPTION

The Cyclotron grid prototype was designed to provide enhanced information security and isolation within an organizational environment. It creates a logical grid which is independent from the larger organizational compute environment. The prototype comprises a small number of grid-specific components integrated with Microsoft Windows products as depicted in Figure 1.

The Grid Manager (GM) service provides job management, scheduling, and reporting. The GM is analogous to a grid portal or head node in other systems. A grid client application allows authorized grid users to interact with the GM. Volunteer compute hosts run the Client Partition Manager (CPM), which works with Virtual Server 2005 to manage a grid guest partition (aka virtual machine) and run grid jobs. We chose Virtual Server over Virtual PC because it provides a secured management interface that can be controlled programmatically, and provides superior isolation and performance. Windows file servers provide storage services for grid executables and user data files. All grid communication is done using web service protocols, secured using WS-Security [9], and implemented on the Windows Communication Foundation (WCF) [10]. The grid utilizes the Security Policy Assertion Language (SecPAL) [11,12] as the basis for delegations and access control decisions within the grid. This includes expressing the required trust relationships, delegation of rights between grid entities, and authorization rules. A Security Token Server (STS) is provided to issue SecPAL-based security tokens to the grid users and services. The subsequent material explains each of these components in more detail and the overall system operation.

Figure 1. Cyclotron system overview.

3.1. Security Policy Assertion Language

SecPAL [11] is a declarative security policy language developed by Microsoft for use in distributed computing environments. The Microsoft research implementation [12] has a number of characteristics which make it a good match for the Cyclotron requirements:

• Flexible, logic-based security policies which can be digitally signed for secure deployment to grid components.

• Public key-based security tokens for strong authentication.

• Expression of fine-grained trust and delegated access rules.

A complete explanation of SecPAL is beyond the scope of this paper; however, in this section we provide a short overview of its relevant features. This is adequate to understand the Cyclotron security policies presented later. A restricted English grammar is used to aid readability.

The fundamental concept in SecPAL is the assertion:

    A says fact if fact1, ..., factn where c

An assertion is thus a sentence, said by a principal A, about a fact that may be dependent on the truth of one or more conditional facts and a boolean constraint c. The above assertion means that A will assert that fact is true if A believes fact1 to factn are all true. Facts consist of a subject principal, a verb phrase, and optional ‘qualifiers’ such as the intended validity time or location of use. Each of the fact components may be a constant or a variable (indicated by a leading ‘%’ in our grammar). Constraints are simple functions operating on fact variables and environment data such as the current date and time.

Facts most commonly express a binding of attributes to a principal,

    p possesses attribute-type:value

or actions a principal is authorized to make,

    p can action resource from Time1 to Time2

The second example shows an optional qualifier limiting the time period (Time1, Time2) during which the fact is asserted to be valid. Such a qualifier can be added to any assertion.

SecPAL has three verb phrases with special semantics. Cyclotron makes use of only one, can say, which is important in expressing trust relationships and delegations:

    p can say fact

As a simple example, if Bob wishes to trust the STS to assert group names for other principals he can make the assertion:

    Bob says STS can say %p possesses group:'.*'

The regular expression ‘.*’ will match any possible group attribute value. This assertion, combined with a specific STS assertion (typically in a security token), such as:

    STS says Joe possesses group:ProjectX

allows one to logically conclude that

    Bob says Joe possesses group:ProjectX

during a SecPAL evaluation. This provides a flexible mechanism for deciding what facts can satisfy policy requirements. For example, if Bob only allows members of ProjectX to read the file foo.txt, he would write the authorization policy assertion as:

    Bob says %p can read foo.txt if %p possesses group:ProjectX

If Bob believes Joe possesses group:ProjectX, then Bob believes Joe can read foo.txt. That is, Joe is a valid binding to the variable %p. Without the above trust assertion, Bob would have no basis for believing the STS assertion that Joe possesses group:ProjectX, and therefore one would not deduce that Joe can read the file.

Assertions can be grouped into tokens. A token is a WS-Trust compliant collection of assertions made and signed by a principal. In the examples that follow, a collection of assertions preceded by ‘Principal says’ means that Principal’s cryptographic signature has been applied to those assertions and protects them from modification.

A SecPAL policy evaluation consists of determining whether a query, consisting of atomic SecPAL assertions (i.e. assertions with no conditional facts or constraints) and constraints, can be deduced based on an evaluation context. This context contains a set of assertions from the relevant policies and security tokens along with environmental information such as the current time.

Within Cyclotron, we extended the SecPAL research implementation grammar to include some additional nouns and verbs allowing a more natural expression of the Cyclotron policies. These have their commonly understood meaning and are consistent with the SecPAL formal model.
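The Bob/STS/Joe deduction in this section can be illustrated with a very small evaluator. The following Python sketch is not the SecPAL implementation and covers only the single ‘can say’ pattern used in the example; the tuple encoding of assertions and the names derive_possesses and can_read are assumptions made for illustration.

```python
import re

# Assertions are encoded as plain tuples; this encoding and all function
# names are hypothetical and chosen only for this illustration.

def derive_possesses(trust_rules, token_assertions):
    """Return the 'possesses' facts the policy owner comes to believe.

    trust_rules: list of (issuer, attribute_type, value_regex), modelling
        "Owner says issuer can say %p possesses attribute_type:'regex'".
    token_assertions: list of (issuer, subject, attribute_type, value),
        modelling "issuer says subject possesses attribute_type:value".
    """
    believed = set()
    for issuer, subject, attr_type, value in token_assertions:
        for trusted_issuer, trusted_type, value_regex in trust_rules:
            if (issuer == trusted_issuer
                    and attr_type == trusted_type
                    and re.fullmatch(value_regex, value)):
                believed.add((subject, attr_type, value))
    return believed


def can_read(principal, required_group, believed):
    """Model "Bob says %p can read foo.txt if %p possesses group:ProjectX"."""
    return (principal, "group", required_group) in believed


# Bob says STS can say %p possesses group:'.*'
bob_trust = [("STS", "group", ".*")]
# STS says Joe possesses group:ProjectX
tokens = [("STS", "Joe", "group", "ProjectX")]

believed = derive_possesses(bob_trust, tokens)
joe_allowed = can_read("Joe", "ProjectX", believed)
# Without the trust assertion, the STS statement carries no weight for Bob.
joe_without_trust = can_read("Joe", "ProjectX", derive_possesses([], tokens))
```

Here joe_allowed is true only because Bob's trust rule names the STS as an acceptable issuer of group attributes; removing that rule, or presenting the same statement from a different issuer, leaves Bob with no believed facts and the read request is denied.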


3.2. Cyclotron dedicated grid services

Cyclotron required the development of three dedicated grid services:

• Grid Manager (GM). This is the central point of coordination in Cyclotron. The GM accepts jobs from users, tracks the job status, maintains the list of operating CPMs, and dispatches jobs in accordance with its scheduling algorithm. It executes under a low-privilege local service account. Its only means of authenticating to external entities is using its grid-specific SecPAL security token.

• File store(s). Application codes, data files, and OS image files are kept on network accessible file shares. These use Windows file sharing services enhanced to support web service access by grid entities. They may also be accessed by authorized domain users via the standard Windows file sharing protocol (SMB). This allows grid users to manage grid code and data files using existing tools and user interfaces. Access to these shares by grid services and jobs can only be made via the SecPAL-protected web service interface since they lack appropriate Windows domain credentials.

• Security Token Service (STS). This supports issuing the SecPAL security tokens Cyclotron relies upon for grid entity authentication. These tokens are signed using the STS’s RSA private key. The associated public key provides the basis for the grid root of trust (similar to an X.509 Certificate Authority).

A SecPAL WS-Trust profile is used to request, and return, grid security tokens. This profile supports requestor authentication using domain account credentials (e.g. Kerberos tickets). Within Cyclotron, the STS was integrated with the Windows domain Active Directory (AD) to retrieve existing account attributes, such as group membership. This information is used in conjunction with a SecPAL token issuance policy to determine what information is to be placed in the issued grid security tokens.
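As a rough illustration of the last step, the sketch below turns a requestor's directory group memberships into the role assertions that a token might carry. It is only a sketch under stated assumptions: the directory group names and the function build_token_claims are invented, the real STS applies a SecPAL token issuance policy rather than a lookup table, and the RSA signing of the token is omitted entirely. Only the role names (GridUser, AppEndorser, GridAdmin) come from the paper.

```python
# Hypothetical mapping from directory groups to grid roles. The group names
# are invented; the role names are those used in the Cyclotron policies.
GROUP_TO_ROLE = {
    "grid-users": "GridUser",
    "grid-endorsers": "AppEndorser",
    "grid-admins": "GridAdmin",
}


def build_token_claims(principal, directory_groups):
    """Return the unsigned claims an STS-like service might place in a token."""
    roles = sorted(
        {GROUP_TO_ROLE[g] for g in directory_groups if g in GROUP_TO_ROLE}
    )
    return {
        "issuer": "STS",
        "subject": principal,
        # One SecPAL-style 'possesses' assertion per granted role.
        "assertions": [f'{principal} possesses roleName:"{r}"' for r in roles],
    }


# Groups unrelated to the grid (here "payroll") contribute nothing.
claims = build_token_claims("Joe", ["grid-users", "payroll", "grid-endorsers"])
```

Deriving roles from existing group memberships is what lets the grid reuse the organization's identity infrastructure instead of maintaining a separate user database.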

3.3. Cyclotron compute node components

The heart of the grid is the volunteer workstations on which grid jobs execute. For the prototype, these may be running any currently supported version of Microsoft Windows. On each of these workstations, the CPM is responsible for controlling interactions with the grid. It makes the workstation’s availability and status known to the GM, and accepts jobs from the GM.

To join a workstation to the grid, the CPM contacts the GM after it is installed and configured. This registers the workstation/CPM with the GM and requests a CPM security token to uniquely identify the CPM instance. The GM proxies the security token request to the STS. This ensures only known CPMs can obtain tokens and provides a basis for STS trust in the supplied CPM identifying information. A potential concern is the possibility of hostile code pretending to be a CPM in order to later gain access to grid software and/or data. In our prototype system, it was assumed that any Windows domain-joined machine could be trusted to act as a grid compute host. This can be determined since such machines have valid domain machine credentials. A more sophisticated discrimination mechanism could be implemented at the GM. For example, one might restrict volunteers to specific machines or certain network addresses, or require a domain administrator to register the machine with the GM.

Once the CPM security token has been issued, the GM returns it to the volunteer workstation and also provides the current SecPAL security policy which the CPM will enforce.

Grid jobs execute in an independent OS environment within a guest virtual machine partition, supported by Virtual Server 2005. When grid jobs are not executing, no grid virtual machine is running. The workstation’s primary user may, of course, use Virtual Server to run any virtual machines they require. Management of the grid virtual machines is controlled by the CPM and requires no user interaction. This requires Virtual Server security be configured to allow the CPM, which runs under a local unprivileged account, to control Virtual Server. The use of an unprivileged account restricts CPM access to the workstation’s resources and mitigates possible damage due to bugs or exploitable flaws in the CPM.

When the CPM is assigned a grid job to execute, it starts the appropriate type of grid virtual machine, which starts the bootstrapper component, which in turn handles loading and executing the grid job. Both the bootstrapper and job run under a local, unprivileged account inside the virtual machine. By default, they can only access a restricted sub-tree of the file system and have no ability to authenticate to external services. They are supplied with appropriate SecPAL delegation tokens by the CPM if external resource access is required.

The workstation user may configure the CPM with the hours of availability during which jobs will be accepted, and may reclaim the machine at any time via a notification tray agent. If this occurs, the grid virtual machine is immediately paused in response. This does not release the memory occupied by the virtual machine, but frees the CPU for the user’s use. This approach allows us to resume the grid job’s execution once the machine again becomes available.
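The availability window and the pause/resume behaviour described above can be summarized as a small state machine. The Python sketch below is an illustrative model only; the class GridPartition, its method names, and the three-state enumeration are assumptions, and no Virtual Server API is involved.

```python
from datetime import time
from enum import Enum


class VmState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


class GridPartition:
    """Toy model of the CPM's view of the grid virtual machine (hypothetical)."""

    def __init__(self, available_from, available_until):
        self.available_from = available_from
        self.available_until = available_until
        self.state = VmState.STOPPED

    def accepts_jobs(self, now):
        """True if `now` falls inside the user-configured availability window."""
        start, end = self.available_from, self.available_until
        if start <= end:
            return start <= now < end
        # The window wraps past midnight, e.g. 18:00 to 07:00.
        return now >= start or now < end

    def start_job(self, now):
        # No grid VM runs unless a job has been accepted inside the window.
        if self.state is VmState.STOPPED and self.accepts_jobs(now):
            self.state = VmState.RUNNING
        return self.state

    def user_reclaims(self):
        # Pausing frees the CPU but keeps the VM's memory allocated.
        if self.state is VmState.RUNNING:
            self.state = VmState.PAUSED
        return self.state

    def machine_available_again(self):
        # A paused job resumes where it left off rather than restarting.
        if self.state is VmState.PAUSED:
            self.state = VmState.RUNNING
        return self.state


partition = GridPartition(time(18, 0), time(7, 0))
```

Calling partition.start_job(time(22, 30)) moves the model to RUNNING, user_reclaims() moves it to PAUSED, and machine_available_again() returns it to RUNNING, mirroring the sequence in the text.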

3.3.1. Proxy access to network

If access to the network is required by the application, its code token (described in Section 3.5.5) can include assertions to grant it access to network resources. This is currently provided and enforced through a custom TCP proxy mechanism implemented by the CPM. The proxy makes policy queries to determine if an application is allowed to use it, and the assertions in its code token are what enable this check to pass. IP packets cannot be routed directly from the virtual machine. This requires network-using applications to be aware of and use the proxy.
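The proxy's admission decision can be pictured as a check of the requested endpoint against the grants carried in the job's code token. The sketch below is a simplified stand-in for the SecPAL policy query: the grant format (a host regular expression paired with a port), the function proxy_allows, and the host name are all assumptions, since the paper does not give the form of these assertions.

```python
import re


def proxy_allows(code_token_grants, host, port):
    """Return True if any grant in the job's code token covers host:port.

    code_token_grants: iterable of (host_regex, port) pairs. This shape is
    hypothetical; the paper does not specify the assertion format.
    """
    return any(
        re.fullmatch(host_regex, host) is not None and granted_port == port
        for host_regex, granted_port in code_token_grants
    )


# Example grant: the job may reach one results server over port 443 only.
grants = [(r"results\.example\.com", 443)]
```

With these grants, a connection to results.example.com on port 443 would be admitted, while any other host or port would be refused. Because the virtual machine cannot route IP packets directly, a refused request leaves the job with no network path at all.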

3.4. Cyclotron client application

A Cyclotron client application allows grid users to interact with the GM to submit and manage jobs. This allows users to easily supply job parameters, submit new jobs, get the status of in-progress jobs, and cancel jobs if necessary. Job requests are transmitted to the GM using the HPC Basic Profile standard [13]. Mutual authentication is based on the user and GM SecPAL grid security tokens. The Cyclotron client supports user enrollment to obtain an STS-issued grid security token identifying the user and their grid roles, such as a user, an administrator, or an application endorser. The enrollment requests are authenticated using the user’s Windows domain credentials (Kerberos tickets). The STS uses this information to look up the group memberships of the user, which determine the appropriate grid roles to encode in the security token. This avoids the need for a separate administrative approach for managing grid-specific information.

The user’s security token is fully managed by the client application. This is an important usability consideration and avoids the need for grid users to learn new mechanisms for handling grid authentication. The private key associated with a user’s grid security token is stored and used by the Windows CryptoAPI. This provides strong off-line protection and reduces the likelihood of key compromise.

A separate tool was also developed to support generation of SecPAL code tokens. These are used to identify and certify virtual machine images and grid application code and provide a means of independently verifying their integrity. In our prototype, only users in the application endorser role may generate code tokens which will be trusted by the various Cyclotron services. This is one of the roles that the STS may encode in the user’s security token. Code tokens are discussed further in Section 3.5.5.

3.5. Cyclotron access control policies

We now provide a set of exemplary access control policies for Cyclotron. These represent a plausible, though necessarily simplified, organizational policy. For example, we leave out administrative access rules which would certainly be required. These additions, as well as enhancements to provide finer-grained access control, are possible and straightforward.

For the purposes of this paper, we limit the number of resource containers considered. For the grid job queues, only a single queue is used. The Cyclotron design supports multiple queues. These all exist in the virtual namespace ‘file://GridManagerQueue/’ with separate queues for each project identifier. This paper only discusses a single queue for the project ‘DefaultProject’, although we mention a second project called ‘SecondProject’ in the policy for illustration. Similarly, we use the namespace ‘file://FileStore/’ to represent the grid file sharing service. A complex sub-directory hierarchy can be used to separate out information by projects, users, and so forth. For this paper, one sub-directory is used to hold virtual machine images (VirtualMachines), another for grid application code (Jobs), and a third for grid data files (Data).

As noted in Section 3.2, the STS determines who is authorized to receive grid security tokens and which attributes are appropriate. From the perspective of the grid, the important thing is that the STS-issued security tokens are the only acceptable mechanism for authenticating entities within the grid. All grid entities must have valid grid tokens to participate in the grid. User tokens will contain their grid groups (Projects) and roles (GridUser, AppEndorser, and GridAdmin). The GM, file stores, and CPMs have tokens indicating their service type and endpoint URL.

3.5.1. Grid manager policy

Our prototype GM policy specifies the operations users and CPMs are allowed to access. It relies onSTS certifications of a user having the GridUser role and CPMs having the expected CPM serviceURL.GM says

1. STS can say %p possesses roleName:".*"2. STS can say %p possesses serviceName:".*"3. %p can call service "RegisterCPMState"

Copyright q 2009 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2010; 22:241–260DOI: 10.1002/cpe

Page 10: Cyclotron: a secure, isolated, virtual cycle-scavenging grid in the enterprise

250 K. KANE AND B. DILLAWAY

if %p possesses serviceName:"ˆhttp://.*:[0-9]+/CPM/$"4. %p can call service "GetJobConfiguration"

if %p possesses serviceName:"ˆhttp://.*:[0-9]+/CPM/$"5. %p can call service "CreateActivity"

if %p possesses roleName:"GridUser"6. %p can call service "ListJobs" if %p possesses roleName:"GridUser"7. %p can submit, list,

delete:"file://GridManagerQueue/DefaultProject/"if %p possesses roleName:"GridUser"

8. %p can submit, list,delete:"file://GridManagerQueue/SecondProject/"if %p possesses rolename:"SecondProjectUser"

9. %p can say %q can submit, list,delete %r if %p can submit, list, delete %r

Assertions 1 and 2 express the GM’s trust in the STS. Assertions 3 and 4 specify authorized CPM actions, and 5–7 specify grid user-authorized actions related to managing the job queues. Note that 5 and 7 together require that a user must be allowed to call CreateActivity and submit to the job queue in order to create a job.

Assertion 9 is an example of a delegation authorization policy. This allows a user to delegate access to their jobs and queues to another principal who is in the GridUser role. The latter is required since only GridUsers are authorized to call the service interface. This responds to the very real requirement to provide a means for users to delegate such rights if they will be unavailable for an extended time period. One would expect such delegations to be time restricted. This can be done using the timespan fact qualifiers as discussed in Section 3.1. Assertion 8 is an example of the same kind of assertion as Assertion 7 for a second project, ‘SecondProject’, existing in a separate namespace to show how users from one project cannot interact with another project’s queue.

Note that the multi-verb assertions used in this policy are a syntactic form adopted for compactness. In SecPAL, such assertions are interpreted as a group of independent assertions for each verb (submit, list, delete).
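The conditional "can call service" assertions of the GM policy can be illustrated with a small sketch. This is not SecPAL and not the Cyclotron implementation: the table `GM_CALL_RULES`, the function `can_call`, the fact tuples, and the host name `node17` are all hypothetical, and we assume that the quoted attribute values are regular expressions that must match the whole attribute value.

```python
import re

# Hypothetical encoding of GM assertions 3-6:
# service name -> (required attribute, pattern copied from the policy).
GM_CALL_RULES = {
    "RegisterCPMState": ("serviceName", r"^http://.*:[0-9]+/CPM/$"),
    "GetJobConfiguration": ("serviceName", r"^http://.*:[0-9]+/CPM/$"),
    "CreateActivity": ("roleName", r"GridUser"),
    "ListJobs": ("roleName", r"GridUser"),
}


def can_call(service, sts_facts):
    """Return True if an STS-certified (attribute, value) fact satisfies the rule."""
    rule = GM_CALL_RULES.get(service)
    if rule is None:
        return False  # no assertion grants this call, so it is denied
    attribute, pattern = rule
    # Assumption: a pattern must match the entire attribute value.
    return any(
        name == attribute and re.fullmatch(pattern, value) is not None
        for name, value in sts_facts
    )


# Facts as they might be certified by the STS (assertions 1 and 2).
user_facts = [("roleName", "GridUser")]
cpm_facts = [("serviceName", "http://node17:8080/CPM/")]
```

Under these assumptions, `can_call("CreateActivity", user_facts)` succeeds while `can_call("RegisterCPMState", user_facts)` does not, which mirrors the separation between user operations and CPM operations in the policy.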

3.5.2. File store policy

The grid FileStore policy expresses the authorized access by the grid services and users. Our exemplary policy is:

FileStore says

1. STS can say %p possesses roleName:".*"
2. STS can say %p possesses serviceName:".*"
3. %p can say %q can call service "GetFile" if %p possesses serviceName:"http://.*:[0-9]+/GM/" and %q possesses serviceName:"http://.*:[0-9]+/CPM/"
4. %p can say %q can read, list "file://FileStore/VirtualMachines/" if %p possesses serviceName:"http://.*:[0-9]+/GM/" and %q possesses serviceName:"http://.*:[0-9]+/CPM/"
5. %p can read, list, write, delete digitalContent:"file://FileStore/jobs/" if %p possesses roleName:"GridUser"
6. %p can read, list, write, delete digitalContent:"file://FileStore/data/" if %p possesses roleName:"GridUser"
7. %p can say %q can read, list, write, delete %r if %p can read, list, write, delete %r

This defines separate authorization rules for access to the virtual machine images, grid applications, and grid job data directory trees. Assertions 1 and 2 define trust in the STS similar to the GM policy.

Assertions 3 and 4 define authorized delegations which the GM is allowed to make. The first says a GM can authorize a CPM to call the GetFile interface. In effect, this rule states that only an entity which can authenticate as a CPM service will be allowed to call the GetFile web service interface. The second says a GM can authorize a CPM to read and list files in the virtual machine image store. Together, these allow the GM to authorize a registered CPM’s read access to the grid virtual machine images. In practice, this is time limited and CPMs must receive new delegations with each new job they execute. Assertions 5 and 6 are straightforward rules authorizing any user in the GridUser role to access the job and data sub-directories. It is interesting to note that these assertions are insufficient to allow a GridUser to access this information via the FileStore web service. This is because there is no policy assertion granting such users the right to call the service GetFile interface. This has no impact on the users, since they access the file store using Windows SMB file sharing based on their domain credentials. The reason these are important in the policy is that they allow us to introduce the delegation policy in Assertion 7. This authorizes a user to delegate any access rights they hold to the job and data sub-directories to another principal. As with the GM policy, such delegations will only allow access if the delegate is also authorized to call the GetFile service. As discussed above, this is limited to CPM services. The net effect is that usable delegations can only be made to CPM services. As with other delegations, these are time limited in practice to the expected job lifetime.
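The "net effect" described above, that a delegated file right is only usable by a principal who may also call GetFile, can be sketched as a conjunction of two independent checks. The function `may_fetch`, the principals `Alice` and `Bob`, and the two lookup tables are hypothetical stand-ins for facts that the SecPAL engine would derive; prefix matching on the resource path is also an assumption.

```python
def may_fetch(principal, path, call_rights, read_rights):
    """Allow a GetFile fetch only if both required rights are held."""
    # Right 1: the principal may call the GetFile web service interface.
    may_call = principal in call_rights
    # Right 2: the principal may read the requested resource.
    may_read = any(
        path.startswith(prefix) for prefix in read_rights.get(principal, ())
    )
    return may_call and may_read


# Hypothetical state after a user has delegated read access to two principals.
call_rights = {"CPMx"}  # only CPM services can be authorized to call GetFile
read_rights = {
    "Alice": ["file://FileStore/jobs/", "file://FileStore/data/"],
    "CPMx": ["file://FileStore/jobs/myapp.exe"],
    "Bob": ["file://FileStore/jobs/myapp.exe"],
}
```

Here `CPMx` can fetch `myapp.exe`, whereas `Bob` holds the same delegated read right but cannot use it through the web service, and `Alice` (a GridUser) is likewise refused despite her direct rights.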

3.5.3. CPM policy

The CPM policy provides control over what entities are authorized to call the CPM’s web service interface and expresses what grid software is authorized to run on the workstation. An exemplary policy is:

CPM says

1. STS can say %p possesses roleName:".*"
2. STS can say %p possesses serviceName:".*"
3. %p can call service "AssignJob" if %p possesses serviceName:"http://.*:[0-9]+/GM/"
4. System Loader can execute %r if %r possesses appEndorsement:"CyclotronGrid"
5. %p can say %q possesses appEndorsement:".*" if %p possesses roleName:"AppEndorser"
6. %p can say %q can connect, accept Network(Protocol=’.*’, Endpoint=’.*’) if %p possesses roleName:"AppEndorser"

As in the previous policies, the CPM policy defines those attributes it trusts the STS to assert. Assertion 3 authorizes the GM to call the AssignJob interface on the CPM, which is the only mechanism for assigning jobs to a CPM. Assertion 4 defines what grid code the CPM is allowed to load and execute. In this case, only code for which we have a corresponding code token carrying the ‘CyclotronGrid’ application endorsement is allowed. Assertion 5 asserts that only users in the AppEndorser role are trusted to assert an application endorsement. This role is restricted to those users who can be trusted to exercise adequate due diligence before endorsing any code images. This could include verifying the origin of the code files, running virus scans, and/or running tests to ensure that the code is not obviously buggy.

Finally, Assertion 6 expresses the CPM’s willingness to trust users in the AppEndorser role to define explicit network connection requirements for the grid application. In particular, this provides a mechanism to control what external endpoints a job can connect to, or accept connections from, using a given protocol type (e.g. TCP). An example of how this is used in practice is discussed in Section 4.

3.5.4. Example delegations

In the above policies there are several examples of explicit delegation authorizations. Most important to the operation of Cyclotron are delegations granting a CPM access to information about the file stores. By default, CPMs have no rights to access this information. That is, they are not authorized to call the web service interface nor are they authorized to access any of the directories containing the grid software and data. For access to the virtual machine images, the GM must provide a delegation to the individual CPMs. This takes the form of two assertions:

GM says

1. CPMx can call service GetFile from T1 to T2
2. CPMx can read, list file://FileStore/VirtualMachines/VirtualMachineX from T1 to T2

Here the time period defined by T1 and T2 would reflect the expected lifetime of the job being assigned to the CPM. ‘CPMx’ means a specific registered CPM identity and ‘VirtualMachineX’ is a specific virtual machine image.

The CPM must also be granted explicit access to job-specific data. This is typically only the job application and data shares, but could also include job-specific services. These come from the user creating the job. Note that the user does not know which CPM will execute the job at creation time. Hence, our prototype approach is for the user to authorize the GM to delegate this access to a specific CPM(s). This delegation chain is:

1. K-User says GM can say %p can read, list digitalContent:"file://FileStore/jobs/myapp.exe"
2. K-User says GM can say %p can read, list digitalContent:"file://FileStore/data/mydata.dat"
3. K-User says GM can say %p can write digitalContent:"file://FileStore/data/myresults.dat"
4. GM says CPMx can read, list digitalContent:"file://FileStore/jobs/myapp.exe"
5. GM says CPMx can read, list digitalContent:"file://FileStore/data/mydata.dat"
6. GM says CPMx can write digitalContent:"file://FileStore/data/myresults.dat"

When submitting the job, K-User (i.e. the public key of User) generates statements 1–3, encloses them in a signed token, and provides that token to the GM. When dispatching a job to a CPM, the GM generates statements 4–6, encloses them in a signed token, and provides both its token and K-User’s token to the CPM.

Here we list the asserter of each statement (the principal that says) as it is not the same for all assertions. In practice, these are time limited for the expected job lifetime. The result is that individual CPMs receive limited rights to job information they require and never hold any broadly scoped, long-term access rights.

It is also important to note that these delegations are made explicitly to the CPM expected to execute the job. They are not usable by any other entity. This limits the risk from possible man-in-the-middle attacks or wireline ‘sniffing’.
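The two-link delegation chain above (user to GM, GM to a named CPM) can be sketched as a check over a pair of statements. This is an illustrative model only: the `Grant` record, its field names, and `chain_authorizes` are hypothetical, the validity window is modelled as plain integers, and signature verification of the enclosing tokens is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    issuer: str      # the principal that "says" the statement
    subject: str     # who receives the right (for the user's grant: the re-delegator)
    verb: str
    resource: str
    not_before: int  # start of validity window (T1)
    not_after: int   # end of validity window (T2)


def chain_authorizes(user_grant, gm_grant, cpm, verb, resource, now, owner):
    """Check 'owner says GM can say %p can verb resource' plus 'GM says cpm can verb resource'."""
    return (
        user_grant.issuer == owner                    # the chain starts at the rights holder
        and user_grant.subject == gm_grant.issuer     # the user empowered this GM to re-delegate
        and gm_grant.subject == cpm                   # the GM named this specific CPM
        and user_grant.verb == gm_grant.verb == verb
        and user_grant.resource == gm_grant.resource == resource
        and user_grant.not_before <= now <= user_grant.not_after
        and gm_grant.not_before <= now <= gm_grant.not_after
    )
```

Because the GM's statement names one CPM and both statements carry a time window, a token pair captured in transit is of no use to any other principal and expires with the job, which is the property the text relies on.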

3.5.5. Example code token

The CPM policy described above only authorizes grid software to execute if it has been properly endorsed. The CPM learns of such endorsements via SecPAL code security tokens associated with code images. Cyclotron includes a tool for generating such tokens. The tool user is the assertor and the generated token is digitally signed using the private key associated with the user’s SecPAL grid security token.

Each code image is uniquely identified by its hash value. This defines the code ‘principal’ used within SecPAL assertions. For executables and libraries, we use the Microsoft Authenticode [14] SHA-1 hash value. For virtual machine images we use a flat SHA-1 hash of the virtual hard drive (VHD) file. The code token can assert a variety of attributes associated with the code image. This can include the name, version, target processor architecture, compatible OS versions, and an application endorsement.

An example of a code token’s assertions for the PhyloD application is shown below. Code tokens are generated for each of the dependent libraries as well, but are not shown here.

K-User says

1. H-PhyloD.exe possesses appEndorsement:"CyclotronGrid"
2. H-PhyloD.exe possesses appName:"Name=PhyloD.exe"
3. H-PhyloD.exe possesses appTargetEnvironment:"ProcessorArchitecture=x86,OS=Microsoft Windows NT 5.2.3790 Service Pack 2"
4. H-PhyloD.exe possesses appDependency:"Name=PhyloD.dll"

STS says K-User possesses roleName:"AppEndorser"

As noted previously, K-User should be in the AppEndorser role for this code token to be trusted. This is proven by including the statement from the STS above in the code token. Assertion 4 lists a dependency on a dynamically linked library. In operation, each time the CPM downloads a code image it also downloads the associated code token. To determine if the code image should be allowed to be executed, it first loads the code token into a SecPAL evaluation context along with the local CPM policy. It then computes the hash H for the code image and formulates the query: ‘CPM says System Loader can execute H?’. This query can only be satisfied by the SecPAL evaluation engine if: (1) the value of H is equal to the hash principal value in the associated code token (H-PhyloD.exe); (2) the code token was issued by a trusted user, i.e. they are in the AppEndorser role; and (3) the code token contains the AppEndorsement required by the authorization policy. If any of these three conditions is not satisfied, meaning the computed hash indicates modifications after the code token was generated, or the required endorsements are missing or not asserted by a trustworthy user, then authorization fails and the associated job is aborted.

For a job requiring multiple downloaded code files, each code file must independently be authorized to execute by policy. Assertions of the form of assertion 4 above will cause another check to be performed on each listed dependency. It is the responsibility of the application endorser to properly list the dependent libraries that must be shipped with the executable. Libraries from the operating system (and therefore located in the virtual machine image) do not need to be listed.
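The three conditions behind the query ‘CPM says System Loader can execute H?’ can be sketched as follows. This is a simplified stand-in, not the SecPAL engine: the token is modelled as a plain dictionary with hypothetical keys, `endorser_roles` stands for the STS role statements, and a flat SHA-1 over the image bytes is used for every image, whereas the paper uses the Authenticode hash for executables and libraries.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Flat SHA-1 digest of the image bytes, as a hex string."""
    return hashlib.sha1(data).hexdigest()


def loader_may_execute(image, token, endorser_roles, required="CyclotronGrid"):
    """Evaluate the three conditions listed in the text for one code image."""
    # (1) The computed hash equals the hash principal named in the code token.
    hash_matches = sha1_hex(image) == token["principal"]
    # (2) The token issuer holds the AppEndorser role according to the STS.
    issuer_trusted = "AppEndorser" in endorser_roles.get(token["issuer"], set())
    # (3) The token carries the endorsement required by the CPM policy.
    endorsed = required in token["endorsements"]
    return hash_matches and issuer_trusted and endorsed
```

A job with several code files would call such a check once per file, including each dependency listed in the token, and abort on the first failure.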

3.6. Job life cycle

Now that we have described the Cyclotron environment and access control mechanisms, we trace the life cycle of a job from creation through completion. The interactions between the various grid entities are depicted in Figure 3. The numbers are keyed to the discussion in the subsequent sections. This discussion primarily focuses on the elements which provide the assurance and isolation properties for the grid environment. We assume that virtual machine images have already been created, endorsed, and placed on the OS image file share.

[Figure 2. Cyclotron compute node components. Labels: Windows Host, Virt. Srv, Grid Partition, Bootstrapper, GridApp, Client Partition Manager (CPM).]

[Figure 3. Cyclotron job life cycle.]

3.6.1. Phase 1: publishing the job application and data files

1. An application endorser creates code tokens for the application code image files. They then place the code image files and associated code tokens on the grid file store. The grid user needing to run a job determines the location of the required application code and references this location in their job parameters.

2. The user either creates the necessary data file for the job and uploads it to the grid file store or determines the location of an existing data set. This location is also referenced in their job parameters.

3.6.2. Phase 2: job scheduling

3. The user schedules a new job by uploading their job parameters to the GM. This includes the job identifiers, references to application and data files, the required VHD, and so forth. In addition, the Cyclotron client will automatically generate the appropriate delegations of the user’s rights necessary to execute the job. These are expressed as time-restricted delegation authorizations for the GM and are not directly usable by any other entity.

4. The GM authenticates the user request based on their security token. It then determines if its policy authorizes job creation based on the authenticated facts from the security token. If authorized, a new job queue entry is created which makes the job information available to the GM scheduler. The GM sends a response confirming the job creation which it authenticates with its security token. This allows the user to confirm the authenticity of the response.

3.6.3. Phase 3: job dispatch and execution

5. The GM scheduler allocates jobs to available CPMs based on several criteria. Once it has determined an allocation, it contacts the CPM and assigns it a job. Along with the job parameters, the GM generates any required delegated rights necessary to execute the job as discussed in Section 3.5.4, and transmits the entire delegation chain to the CPM. Delegations typically include rights to read a VHD file, read application code files, read data files, and write results files. In some cases delegations to access other services may be necessary. The messages carrying this information are all authenticated by the GM.

6. The CPM receiving the job assignment request first checks its local policy to ensure the caller is authorized. Only the GM is authorized by our baseline policy, which prevents other entities from injecting rogue jobs into the grid system.

7. The CPM then downloads the application code package from the application file share and places the code and token files in a working directory. It then validates that these are authorized for execution as described previously. If the execution check fails, the job is aborted.

8. Only then does the CPM download the data file(s) from the data file store.

9. Once it has the required job application and data files, the CPM checks that it has the required VHD image. If the correct image is not already cached in the local file system, the CPM downloads the virtual machine image from the file store using its delegated rights. The hash of the virtual machine image is computed, and the CPM validates that the VHD is authorized to execute. This ensures that the correct VHD is being used and that it has not been corrupted or tampered with. This is also where the CPM checks to see if a cached image should be evicted due to an updated image or lack of local disk space.

10. The virtual machine is created and booted, the job application code and data are transferred, and the code initialized using parameters supplied in the job parameter file. When the job completes, results are transferred to the CPM. The virtual machine is then terminated and deleted. The original VHD image is cached for later re-use to minimize latency and network utilization. Any changes to the VHD resulting from the job execution are discarded.

11. The CPM uploads the results to the results file share using the provided delegation tokens.

12. The CPM signals to the GM that the job is complete. It is the responsibility of the GM to verify that the CPM which was assigned the job is the same one which has signaled its completion, which it can deduce from the identity token used by the CPM to signal this completion.
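Step 9 above combines caching with an integrity check. A minimal sketch of that combination is given below, under several assumptions: images are held in memory as bytes rather than as VHD files on disk, the class `VhdCache` and its `get` method are hypothetical names, the hash is re-checked on every use, and eviction for disk space as well as the policy authorization query are left out.

```python
import hashlib


class VhdCache:
    """Cache of virtual machine images keyed by their expected SHA-1 digest."""

    def __init__(self):
        self._images = {}  # expected SHA-1 hex digest -> image bytes

    def get(self, expected_sha1, download):
        """Return a verified image, calling download() only on a cache miss."""
        image = self._images.get(expected_sha1)
        if image is None:
            image = download()  # the costly transfer from the file store
        # Verify on every use so that a corrupted or tampered image is rejected.
        if hashlib.sha1(image).hexdigest() != expected_sha1:
            self._images.pop(expected_sha1, None)
            raise ValueError("VHD hash mismatch; image rejected")
        self._images[expected_sha1] = image
        return image
```

Keying the cache by the expected hash means an updated image, which has a different hash, is treated as a miss and downloaded afresh, consistent with the behaviour described in step 9.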

3.6.4. Phase 4: status query and results collection

13. The user may query the GM for job status at any time. The GM will check that the user is authorized to query the required job queue based on their grid security token as described previously. If authorized, then the GM will respond with the job status information. This message is authenticated by the GM using its security token, allowing the user to verify its origin.

14. Based on the status response message, the user can see whether the job is pending, executing, completed, or has encountered some unrecoverable error. Once the job has been completed, the user can retrieve the results from the results file share using SMB file sharing and their domain credentials.

4. EXPERIMENTAL RESULTS

The STS, GM, and File Store in our experimental system all run on a single, dedicated Dell PowerEdge 2950 server running Windows Server 2003. The volunteered compute nodes run on a variety of PC hardware platforms. Client nodes included both Intel- and AMD-based workstations, with typical memory configurations between 1 and 4 GB. Disk sizes varied, but free space pressure was rarely a concern. These were running a number of different versions of Microsoft Windows including XP, Server 2003, Vista, and Server 2008. Jobs ran inside customized virtual machine images based on Windows XP SP2. These images were approximately 2.5 GB in size.

All these images included the .NET Framework 3.0 and our bootstrapper agent, which is responsible for launching the job application code once the virtual machine has booted. The job application code can be any OS compatible executable, either managed (i.e. .NET) or native code. One virtual machine image included pre-installed binaries for the POV-Ray application. This is an example of a native code executable which requires administrative privileges to install on the target OS. By using this approach, we avoid the need to run our bootstrapper, or other installation application, under a privileged account.

4.1. Test workloads

Three workloads were prepared as test jobs:

1. PhyloD [15], which takes samples of HIV viruses as input and analyzes mutations between strains.

2. POV-Ray [16], a raytracer.

3. A distributed DES key cracking application.

These jobs each illustrate a different cycle-scavenging job paradigm. PhyloD is entirely self-contained, operates locally on a provided input data set, and generates an output file. It is written in managed code and need only be copied into a directory in the grid virtual machine and executed. Command line parameters, supplied with the job parameters, determine the data file and results file names.

POV-Ray operates in a similar manner but required a specialized virtual machine image which had the application code pre-installed by an administrator.

The DES key cracking application represents a significantly different operational paradigm. Its operation is somewhat similar to the SETI@home volunteer computing system in that a central service is responsible for determining the workload assigned to individual grid jobs and collecting the computational results. The grid DES key crack jobs were written in managed code and could simply be copied into the execution environment and run.

In operation, each DES crack job contacts the controlling service in order to obtain an encrypted message and a DES key space to search for the decryption key. This requires the grid job to be able to access the network. As noted previously, Cyclotron does not allow grid jobs network access capabilities by default. It will only provide those network access rights which a job specifically requires. This is controlled by network permissions embedded in the grid application’s code token. These allow the code token generator to assert what outgoing (connect) and incoming (accept) connections are required by the job. These identify both the allowed protocol and endpoint.

For example, if the DES key cracking service is located at host DESService on TCP port 17099, then one would need to allow the grid jobs to access DESService on that port. This is encoded in an assertion such as:

K-User says H-DESCrack.exe can connect(Protocol=TCP, Endpoint=DESService:17099)

The default policy in the CPM trusts such network connectivity assertions provided the K-User is in the AppEndorser role. The CPM functions as the policy enforcement point for the grid virtual machine’s network access and will allow only those connections which are explicitly authorized using this mechanism.

Having the required network access is necessary, but not sufficient, to allow the DES crack job to access the DES cracking service. The service requires all requests to be authenticated and only allows those in the GridUser role, or entities they delegate their right to, to access the service. The DES service policy is very similar to that already discussed for the grid File Store (Section 3.5.2).

Hence, it is required that the user delegate to the job the necessary rights to access the service. This is handled by the Cyclotron client, similar to other required delegations, with the GM handling the re-delegation to the assigned compute host. Cyclotron includes a general purpose mechanism for associating delegation credentials with web service messages employing the SOAP Message Security standard for authentication. Similar mechanisms could be added to handle authentication for other protocols.

The DES crack application follows a very traditional client–server interaction paradigm. One could easily extend this approach to support cooperative parallel processing between multiple job instances. For this type of application, a network service would need to provide discovery and rendezvous coordination for the various job instances as the grid execution environments do not have well-known network addresses which can be discovered through other means.
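The deny-by-default network enforcement described in this section can be sketched as a lookup against the grants carried in the code token. The `NetGrant` record and `proxy_allows` function are hypothetical, grants are assumed to have been accepted already (that is, asserted by a principal in the AppEndorser role), and endpoints are compared by exact string match.

```python
from typing import NamedTuple


class NetGrant(NamedTuple):
    verb: str      # "connect" (outgoing) or "accept" (incoming)
    protocol: str  # e.g. "TCP"
    endpoint: str  # "host:port"


def proxy_allows(grants, verb, protocol, host, port):
    """Deny by default; allow only a connection that matches an endorsed grant."""
    wanted = NetGrant(verb, protocol, f"{host}:{port}")
    return wanted in grants


# The grant corresponding to the DESCrack assertion quoted in the text.
des_grants = {NetGrant("connect", "TCP", "DESService:17099")}
```

With `des_grants`, an outgoing TCP connection to DESService on port 17099 is permitted, while any other port, host, protocol, or an incoming connection is refused.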

4.2. Analysis

At its peak, our grid comprised 54 hosts inside Microsoft Corporation communicating over the corporate intranet. Most systems were physically located in the Puget Sound region of Washington state, though grid users and computational hosts were also present in California, Portugal, Germany, and India. Our GM and file server were located in a lab on the Puget Sound site. We submitted 100 jobs every night which were a mix of the test workloads described above. In addition, the grid users submitted other jobs they had developed. Cyclotron was regularly observed to have a 98–100% job completion rate. Job failures typically resulted from incorrectly configured jobs, compute nodes becoming unresponsive, or application-specific failures unrelated to the grid system.

The total job execution time varied based on workload, but virtual machine image transfer time (where applicable), startup, and shutdown time added to overall latency between job submission and completion. In the worst case, virtual machine boot was 60 s, and shutdown once job execution was complete was 45 s. Transfer time of virtual machine images ranged from 5 min within Puget Sound to over 2 h internationally. As seen in step 9 of the job life cycle in Section 3.6.3, this image is cached, and so this transfer time is only incurred once per image. Images are only downloaded again if the image is changed, or if evicted from cache due to disk space pressures. This has the curious effect of introducing a very high latency on the first job to use a given image. We expect the virtual machine images to remain static over extended time periods to provide a stable execution environment. Individual job codes are transported separately and are relatively small in size.
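The effect of caching on per-job latency can be made concrete with a small calculation. The function below is only an illustration: it uses the worst-case boot (60 s) and shutdown (45 s) figures quoted above, takes the transfer time as an input, and assumes that a node runs `jobs_per_image` jobs against one cached image without eviction. The job counts used with it are hypothetical, not measurements from our grid.

```python
def per_job_overhead_seconds(transfer_s, jobs_per_image, boot_s=60, shutdown_s=45):
    """Average virtualization overhead per job when one image transfer is shared."""
    # Boot and shutdown are paid by every job; the transfer is paid once per image.
    return boot_s + shutdown_s + transfer_s / jobs_per_image
```

For a 2 h (7200 s) international transfer, the first job alone carries 7305 s of overhead, whereas spreading the same transfer over 100 jobs on that node brings the average to 177 s per job, most of which is boot and shutdown time.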

5. CONCLUSIONS AND FUTURE WORK

Cycle-scavenging grids continue to generate considerable interest and discussion. They have proven quite popular for volunteer computing and academic/research use is growing. Adoption inside commercial and governmental organizations remains limited due to many factors, including concerns over information security and impact on client machine primary business demands. Cyclotron has demonstrated the feasibility of creating cycle-scavenging environments which can offer far better assurances in these areas.

Cyclotron has shown that the use of dynamically configured virtual machines can support grid job execution while also providing strong isolation from the workstation host environment. In addition, by using centrally managed virtual machine images one can provide standard environments for grid jobs which are capable of supporting reproducible computation. Cyclotron has also demonstrated the feasibility of building a virtually isolated grid environment within an organization which provides fine-grained access control based on an easy-to-use, declarative policy language with high assurance authentication and authorization. It proved fairly simple to create grid policies that supported the needs of several different types of applications with different external resource dependencies.

Cyclotron uses Virtual Server as its virtualization platform, which follows the classic virtualization model where the Virtual Machine Monitor (VMM) operates inside the workstation host OS. Hypervisor architectures, where a virtualizing layer exists beneath all of the operating systems, including the host, are now embraced by new VMMs. This model is implemented by Microsoft Hyper-V, Xen, and VMware ESX. These allow somewhat better isolation between the host and guest partitions and tend to support better virtual device control. Moving to a hypervisor-based virtualization environment would be a valuable enhancement to Cyclotron.

SecPAL already provides the grammar for articulating policy across a wide variety of resources, so it seems feasible to extend the current usage to provide policy control over device access by the grid guest partition. This could provide a means of limiting machine resource consumption by a given grid virtual machine. This could become valuable in the future as multi-CPU machines become widely deployed. On such machines, one might wish to run multiple grid execution environments in parallel to optimize use of the available CPUs.

Cyclotron secure code loading is presently implemented by the CPM. This includes virtual machine and job image integrity validation as well as policy authorization checks. A more secure code loading system would integrate policy checks in the system loader.

The network proxy currently operates as a service in the CPM, and client applications must be aware of the proxy and use it. Attempting to directly route IP packets through the networking stack of the guest will fail. One improvement would be to add a ‘shim’ in the networking stack to intercept network calls and route them through the proxy.

An interesting addition would be high assurance attestation as to the actual grid job being run. This could potentially be used by remote sites to further limit access. We considered the use of Trusted Platform Modules (TPMs) as the basis for creating attestation chains for the execution hosts and grid applications. This could potentially provide additional protection against malicious hosts, CPMs, and application codes. This represents an interesting area for future exploration as TPM chips become more commonplace on business workstations.

ACKNOWLEDGEMENTS

The authors would like to thank the referees from the Middleware for Grid Computing 2007 (MGC2007) workshop for their comments on the initial version of this paper, the attendees at the workshop for their discussions with us, and the referees for the journal version of this paper for their comments.

REFERENCES

1. Enterprise Grid Alliance. Enterprise grid security requirements. Available at: http://www.ogf.org/gf/docs/egadocs.php [5 May 2009].

2. Selby N. Grid security: ? Available at: http://www.ogf.org/OGF20/materials/672/OGF20 GMB Nick Selby.ppt [5 May 2009].

3. Thain D, Tannenbaum T, Livny M. Distributed computing in practice: The condor experience. Concurrency and Computation: Practice and Experience 2005; 17(2–4):323–356.

4. Anderson D, Cobb J, Korpela E, Lebofsky M, Werthimer D. SETI@home: An experiment in public resource computing. Communications of the ACM 2002; 45(11):56–61. DOI: 10.1145/581571.581573.

5. Anderson D. BOINC: A system for public-resource computing and storage. Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, PA, 2004. DOI: 10.1109/GRID.2004.14.

6. Federal University of Campina Grande. OurGrid, Open source software for peer-to-peer computing. Available at: http://www.ourgrid.org [5 May 2009].

7. Digipede Technologies, LLC. The Digipede network (web site). Available at: http://www.digipede.net/products/digipede-network.html [5 May 2009].

8. Univa UD. Grid MP overview (web site). Available at: http://www.univaud.com/hpc/products/grid-mp/ [5 May 2009].

9. Nadalin A, Kaler C, Monzillo R, Hallam-Baker P. Web services security: SOAP message security 1.1 (WS-Security 1.1). Available at: http://www.oasis-open.org/committees/download.php/16790/ wss-v1.1-spe-os-SOAPMessageSecurity.pdf [5 May 2009].

10. Microsoft Corporation. Windows Communication Foundation (WCF). Available at: http://msdn.microsoft.com/en-us/netframework/aa663324.aspx [5 May 2009].

11. Becker M, Fournet C, Gordon A. SecPAL: Design and semantics of a decentralized authorization language. Technical Report MSR-TR-2006-120, Microsoft Research, September 2006.

12. Dillaway B, Hogg J. Security Policy Assertion Language (SecPAL) Specification 1.0. Available at: http://research.microsoft.com/projects/secpal/downloadSecPALSpecification.aspx [5 May 2009].

13. Dillaway B, Humphrey M, Smith C, Theimer M, Wassen G. HPC Basic Profile, Version 1.0. GFD-R-P.114. Open Grid Forum 2007.

14. Microsoft Corporation. Microsoft Authenticode Reference Guide. Available at: http://www.microsoft.com/technet/archive/security/topics/secaps/authcode.mspx?mfr=true [5 May 2009].

15. Bhattacharya T, Daniels M, Heckerman D, Foley B, Frahm N, Kadie C, Carlson J, Yusim K, McMahon B, Gaschen B, Mallal S, Mullins JI, Nickle DC, Herbeck J, Rousseau C, Learn GH, Miura T, Brander C, Walker B, Korber B. Founder effects in the assessment of HIV polymorphisms and HLA allele associations. Science 2007; 315(5818):1583–1586.

16. Persistence of Vision Raytracer Pty. Ltd. POV-Ray - The persistence of vision raytracer. Available at: http://www.povray.org [5 May 2009].
