recommending resources to cloud applications based on ...faculty.missouri.edu › calyamp ›...

Recommending Resources to Cloud Applications based onCustom Templates Composition

Ronny Bazan AntequeraUniversity of Missouri-Columbia, USA

[email protected]

Prasad CalyamUniversity of Missouri-Columbia, USA

[email protected]

Arjun Ankathatti ChandrashekaraUniversity of Missouri-Columbia, USA

[email protected]

Shivoam MalhotraIndian Institute of Technology (Varanasi), India

[email protected]

ABSTRACTEmerging interdisciplinary data-intensive applications in science andengineering fields (e.g. bioinformatics, cybermanufacturing) demandthe use of high-performance computing resources. However, data-intensive applications’ local resources usually present limited capac-ity and availability due to sizable upfront costs. The applicationsrequirements warrant intelligent resource ‘abstractions’ coupled with‘reusable’ approaches to save time and effort in deploying cyberin-frastructure (CI). In this paper, we present a novel ‘custom templates’management middleware to overcome this scarcity of resourcesby use of advanced CI management technologies/protocols to on-demand deploy data-intensive applications across distributed/federatedcloud resources. Our middleware comprises of a novel resourcerecommendation scheme that abstracts user requirements of data-intensive applications and matches them with federated cloud re-sources using custom templates in a catalog. We evaluate the accu-racy of our recommendation scheme in two experiment scenarios.The experiments involve simulating a series of user interactions withdiverse applications requirements, also feature a real-world data-intensive application case study. Our experiment results show thatour scheme improves the resource recommendation accuracy by upto 21%, compared to the existing schemes.

KEYWORDSFederated Cloud Resources, Component Abstraction Model, CustomTemplates, Recommendation Scheme

ACM Reference format:Ronny Bazan Antequera, Prasad Calyam, Arjun Ankathatti Chandrashekara,and Shivoam Malhotra. 2017. Recommending Resources to Cloud Applica-tions based on Custom Templates Composition. In Proceedings of CF’17,Siena, Italy, May 15-17, 2017, 10 pages.DOI: http://dx.doi.org/10.1145/3075564.3075582

1 INTRODUCTIONData-intensive science applications in fields such as bioinformatics,climate modeling, and particle physics are becoming increasinglymulti-disciplinary. These applications present unique requirementsin terms of deployment of distributed heterogeneous infrastructure

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected]’17, Siena, Italy© 2017 ACM. 978-1-4503-4487-6/17/05. . . $15.00DOI: http://dx.doi.org/10.1145/3075564.3075582

and use of advanced cyberinfrastructure technologies/protocols suchas high-speed data transfer protocols, end-to-end virtualization, lo-cal/remote computing, Software-defined Networking (SDN) withOpenFlow support, Network Function Virtualization (NFV), end-to-end performance monitoring [35], and federated identity & accessmanagement [2]. In most cases, such resources are shared amongfederated user sites and are handled as ‘component’ solutions thatcan be combined for transforming a local applications (frequentlyat desktop scale) to a hybrid cloud application (i.e., infrastructurecomposed of distributed/federated cloud resources) [3].

Traditional infrastructure deployment approaches use a five-stepwaterfall model involving: sequential abstraction, analysis, compo-nent deployment, recursive benchmarking and infrastructure deploy-ment steps. Such a tedious five-step approach generally requiresinvolvement of experts to accurately identify the application re-quirements, compose feasible solutions, and re-tune the componentspost-deployment to improve upon sub-optimal outcomes. Alterna-tively, automation of such a waterfall model through expert systemscan be performed using suitable ‘abstractions’ of heterogeneousresource components and a ‘reusable’ solution model. The lackof abstraction and reusability of configurations in the approachesmakes provisioning of heterogeneous resources for data-intensiveapplications quite time-consuming and prone to guesswork. Thissubsequently leads to sub-optimal and cost-prohibitive outcomes fordata-intensive application users and can impede wide-adoption ofdynamic distributed resource management.

There are existing infrastructure-level automated resource deploy-ment approaches that offer automation and reusability to some ex-tent. These solutions include: VMware Appliance [4] (vApps) fromVMware; Global Environment for Network Innovations (GENI)RSpecs [5] using XML files; and Amazon Machine Images [6] byAmazon Web Services (AWS). However, due to their proprietarynature, they assume a homogeneous deployment environment thatfeatures only their API (Application Programming Interface) thathides the complexity, and thereby, obviates the need for abstractionsto facilitate heterogeneous resource provisioning. There are alsoexisting application level approaches such as, VMware ThinApp,Citrix XenApp and Microsoft App-V that are limited in terms ofplatform and operating system independence. Thus, a suitable ab-straction, virtualization, and orchestration approach is needed tofully comprehend the reusable heterogeneous distributed resourcedeployments for data-intensive applications.

In this paper, we address the above limitations of abstractionefforts and deployment approaches for heterogeneous distributedcomputing infrastructures. More specifically, we present a novelmiddleware, shown in Fig. 1, for integrating federated/distributedcloud resources to support data-intensive application user needs. The

Figure 1: Abstraction of federated cloud infrastructurecomponents to compose solutions that integrate heterogeneous

cloud resources.

goal of the design of our middleware is to enable it to create data-intensive application resource requirement ‘abstractions’ to fosterpertinent cloud resources recommendations, coupled with ‘reusable’approaches and automated resource deployment to save time and ef-fort. Our middleware implements the Component Abstraction Modelwhich is designed to abstract and group different cloud resourcesinto categories of heterogeneous resource components through acorresponding Application Requirement Abstraction. It generatesreusable, and extensible Custom Templates of components to fulfillthe diverse data-intensive application requirements. Our middlewarebuilds upon existing technologies, and protocols and uses a “CustomTemplate Catalog” for storage and reuse of these Custom Templates.

We evaluate our middleware recommendation scheme with sim-ulated interactions and a real-world data-intensive case study. Wesimulate a series of user interactions for diverse applications re-quirements, considering ‘novice users’ (users who provide reducedor incomplete information to the KIS) and ‘expert users’ (userswho are more aware of their data-intensive application requirementsand provide more inputs to the KIS). Next, our evaluation of themiddleware implementation features a real-world data-intensive ap-plication with computing workflow requirements involving a clusterof systems/nodes. Our experiment results demonstrate significantimprovement for novice/expert users to effectively express theirdata-intensive application requirements to the KIS, and subsequentlyaccess federated cloud resources that are automatically deployed andconfigured. This whole process reduces the resource provisioningtime drastically and the guesswork in selecting appropriate cloudresource allocations. For our two experiment scenarios, we use theGENI cloud infrastructure [5], and setup a testbed with distributedheterogeneous resources that can be discovered and configured byusing the geni-lib, a GENI API capability.

A summary of the overall contributions of our middleware pro-posal is as follows:

• Knowledge Interface System (KIS): The KIS efficientlyutilizes different data structures to abstract requirements fordata-intensive application requirements, and normalizes thedata for future comparisons. It also significantly reducesthe complexity of collecting requirements from novice andexpert users, through intelligent interactions that guide themto effectively express their needs.

• Recommendation Scheme: This scheme takes into accountthe heterogeneity of distributed cloud resources, and recom-mends candidate solutions with high accuracy (up to 92%in best case scenarios). It does so by recommending customtemplates that can be customized and stored in a catalogfor future reuse.

• Resource Deployment Engine: This engine automates theprocess for cloud resources deployment in a federated-distributed cloud environment. The infrastructure is de-ployed by using any available cloud service provider (CSP)API, and software requirements of novice and expert usersare met by using containerization.

The reminder of the paper is organized as follows: Section 2presents related works. Section 3, describes our Component Abstrac-tion Model. In Section 4, we present our Recommender Algorithm.In Section 5, we present details of our middleware implementa-tion and resource deployment methodology. Section 6 discusses theperformance evaluation and Section 7 concludes the paper.

2 RELATED WORKThe related work can be broadly divided under into the followingcategories: resource abstraction models, knowledge interface & rec-ommendation schemes, and middleware for automated distributedresource provisioning.

2.1 Component abstraction modelIt is important to identify and abstract application requirements, aswell as an available pool of resources to effectively provision in-frastructure configurations. Hence, applying relevant methods forrequirements abstraction is imperative. Our literature review foundworks related to this area such as [7], where authors use componentabstraction for reducing the complexity in the planning and manage-ment processes. They break down a given problem by using heuristicrules in order to plan at the component level. In [8], the authors useabstractions for component-based software architectures by estimat-ing the assembly of reusable software components. The abstractionsalso involve making a forecast on software properties to an associ-ated architecture by using graph-theoretic approaches. In [9], authorsapply abstractions to check large asynchronous designs based oncomponent abstraction verification in isolation. This process elimi-nates the need for finding an accurate verification. Similarly, authorsin [10] present different software engineering theorems for a hier-archical abstraction model pertaining to knowledge development inthe brain.

Existing abstraction models and methodologies can be lever-aged as initial approaches for application requirement abstractionand cloud infrastructure abstraction of distributed heterogeneousresources, an area which we found is still quite under-explored.We adapted the idea of component level abstraction to abstract dis-tributed heterogeneous resources by enabling different domains suchas networking component, storage component, computation com-ponent etc. as shown in Fig. 1. Thus, the decisions are made within

and across the components, which makes resource planning andcomposition relatively easy and more efficient.

2.2 Cloud service matchingIn recent times, a considerable amount of research work has beendone on the cloud service matching problem and the related resourcerecommendation problem. Given the user requirements, the servicematching algorithms finds the best available cloud resources frommultiple cloud service providers. The definition of the best resourceis based on the user’s importance to the functional, non-functionaland cost of the resources. Some research efforts consider the problemas a multi-criteria decision making (MCDM). For example, authorsin [11] focus on recommendations for Software-as-as-Service (SaaS)using an Analytical Hierarchy Process to rank services; each ser-vice is ranked based on the weights collected by experts for eachmetric and the best service is recommended. A similar approach ispresented in [12] through the implementation of a vector of cost,performance, and mobility generated for available resources, and avector for user requirements in order to find resource similarity. Incomparison, our work considers a recommender that accounts forseveral variables that should be considered such as metrics, QoS andfeature importance that may vary from user to user.

Works that consider different variables for service matching andrecommendation include [13] and [14]. In [14], a web ontologycloud model is constructed to abstract the heterogeneity of resourcespecification among multiple cloud service providers. The obtainedsolution is based on a genetic algorithm to match both qualitativeand quantitative criteria. In [15], authors discuss the problem ofselecting the closest available service to user data in order to decreasethe transfer time and thereby, increase the response time. Workin [16] presents a cloud broker that discovers services and calculatesthe indexes for each service by considering user requirements. Itthen applies a K-NN algorithm to obtain services that match theuser requirement based on the calculated indexes. Authors in [17]developed a comprehensive user interface for cloud service selectionbased on multiple criteria and considered user preferences (i.e., cost,QoS attributes). However, this approach lacks a deployment engineand functions as the equivalent of an e-commerce website for a CSP.A different approach is presented in [18], where authors discussa system that uses a knowledge base to store available resourcesand previously recommended solutions. The knowledge base issearched for every new request to identify a similar solution thatcould be adapted to suit the new requirement. A knowledge baseapproach where user requirements are stored as a template in acatalog is also presented in [19]. In comparison, our work considerscollection of novice and expert user requirements through intelligentKIS interactions and builds on the ideas to use a catalog and acorresponding knowledge base.

2.3 Automated resource provisioning frameworksStudies on automation and provisioning of cloud resources have beenconducted and presented in recent times. Authors in [23] present aweb framework which extracts the arbitrary program of an applica-tion from GitHub and deploys it in a user-desired cloud. The toolsetconverts the application to be deployed on the cloud into virtualmachine images. However, this solution is restricted in its abilityto execute tasks in standalone systems. In [24], authors describe anarchitecture composed of four layers (Cloud infrastructure, Abstrac-tion, Orchestration, and Design) that enables automated deploymentof cloud services. It considers templates that encapsulate resource

provisioning requirements. However, they do not consider noviceand expert user interactions for requirements collection, and do notemploy a recommender scheme to suggest a suitable solution, as wedo in this paper.

In [25], the authors present MetaConfig system, a tool similarto Puppet [26] that integrates configuration management, virtualmachine allocation, and bootstrapping for virtual machine allocation.The MetaConfig system is flexible enough to include unexpectedchanges and scalability requirements, however, it does not considerconfiguration customization before the system is deployed. Somestudies on middleware to provision elastic resources for cloud appli-cation have been developed such as Celar [28] that allows users toselect and deploy certain compute resources in different cloud plat-forms through API calls using abstraction libraries (such as ApacheLibcloud [30], jclouds [31], delta-cloud [32]). Cisco UCS Direc-tor [29], a Unified Infrastructure Management middleware, alsoprovisions compute, network and storage resources automatically inlocal and remote locations using a catalog equivalent scheme viz.,‘workflows’, without a recommender feature.

Our approach improves the abstraction and automated deploy-ment process by applying intelligent catalog management and arecommender scheme that allows effective and easy to re-use exist-ing solutions to save time, avoid guesswork and enable automatedresource deployment over heterogeneous resources. Our work alsoconsiders container technology [33, 34] to foster reproducible ex-periments for data-intensive applications. Our middleware can beintegrated into emerging heterogeneous distributed computing infras-tructures and can be extended to complement other infrastructuresthat choose to adopt related standards such as OASIS Topology andOrchestration Specification for Cloud Applications (TOSCA) [27].

3 COMPONENT ABSTRACTION MODELIn this section, we present the details of our component abstractionmodel solution. Our aim is to transform the resource performanceexpectations of data-intensive applications’ (through intelligent re-quirements collection) into reusable resource recommendation tem-plates. For this, we develop a Custom Template Cycle shown inFig. 1 that involves three stages: Collection, Composition and Con-sumption. Fig. 2 presents the Custom Template Life Cycle stagesand the intermediate steps involved. The life cycle stages are, asfollows:

• Collection: In this stage, we collect and abstract the data-intensive application requirements and categorize them intodifferent resources domains.

• Composition: Depending on the requirement of the data-intensive application, we recommend either existing similarcustom template solutions from pre-existing templates orcompose new custom templates.

• Consumption: Candidate custom templates configurationsare presented to the data-intensive application user who is anovice or an expert. Upon selection, the resource allocationprocess automatically deploys cloud resources along withrequired software for user access.

3.1 CollectionThis stage collects and abstracts user requirements through the KIS,which is a web-based solution that aims to mimic scenarios to col-lect requirements for data-intensive application infrastructure. Tra-ditionally, the requirements collection is done through one-to-one

Figure 2: Custom Template Life Cycle. Our middleware solution presents three main stages: (a) Collection, that abstractsdata-intensive application requirements. (b) Composition, that recommends template solutions based on the comparison of

requirements with templates stored in a catalog or by the creation of new templates, and (c) Consumption, that enables cloudresource deployment automation.

Figure 3: KIS is based on a web questionnaire that collects general data-intensive application information, network-connectivity,storage, computation resources and software requirements. Based on users interaction with the KIS, an internal module (policy

database) pre-populates fields with pertinent values to help users decide whenever they are not sure about their inputs.

interviews and meetings between users and cyberinfrastructure engi-neers. In our approach, we present to users a questionnaire to gatherapplication requirements, as shown in Fig. 7 that correspond to Step1 of Fig. 2. Our questionnaire is divided into five sections: Generalinformation, Networking, Storage, Computation and Software re-quirements. Each section gathers specific domain information thathelps to understand application infrastructure and software needsand recommends adequate solutions that might range from virtualmachines with standard OS capabilities to distributed resources overfederated CSP software that needs heterogeneous cloud resources.

The main function of the KIS is to collect user requirements fortheir applications and to help novice users to choose compatible re-sources. The first step collects applications’ high-level requirementsand preferences to understand resource priority needs. This collected

information helps the KIS to pre-populate relevant fields with de-fault values by predicting the needs. However, users (novice andexpert) can always change the pre-populated values to personalizetheir solution. The pre-population process uses a rule-based model(stored in a database) to select the most adequate answer (basedon previous interaction with the KIS) which guarantees minimumfunctional resource compatibility for a given set of application re-quirements. The KIS additionally allows users to reuse previouslydeployed templates (select or upload template) in order to customizethem and skip the questionnaire process.

3.1.1 Application Requirement Identifier. When users final-ize their interactions with the KIS, the collected data is automaticallytranslated into an Application Requirement Identifier (ARI) structure(step 2 of Fig. 2). An ARI structure represents the meta-information

of an application’s resource requirements containing a vector con-sisting of cloud resources Features and corresponding Preconditionsthat satisfy the deployment of the application. A typical ARI struc-ture is shown in Fig. 4, where Fi represents required resource feature(i.e., Bandwidth, Storage Size, RAM size) and Pi represents thecorresponding required precondition (i.e., 20 Mbps bandwidth, 20GB HDD storage, 8 GB RAM). A single or multiple such Fi − Pipair/s (i ∈ I, where I is the set of required resources, and I = |I| is thenumber of elements in I) that may belong to any new user applicationAk . Satisfying an application’s resource requirements is subjectedto the successful fulfillment of all the features’ preconditions.

Figure 4: Application Requirement Identifier structure

3.1.2 Resource Space. The collected data is then comparedwith the cloud resource components called Resource Space (RSpace).The components within the RSpace (step 3 of Fig. 2) are catego-rized into domains, i.e., VPN (virtual private network) and networkbandwidth resources belong to the overall Network Connectivitydomain and No. of CPUs cores, RAM size belong to the overallComputation domain.

We define D as the set of all cloud resource domains whereD={D1, D2, ..., DN }, and N is the number of domain categories thecloud resources are divided into N=|D|. Each such domain consistsof a different number of cloud resources as follows:

D1 = {R11, · · · ,R

1A}where A is the no. of resources in D1

D2 = {R21, · · · ,R

2B}where B is the no. of resources in D2

...

DN = {RN1 , · · · ,RNN }where N is the no. of resources in DN

Each such resource consists of one or many resource specifica-tions and their corresponding constraints, which define the perfor-mance bounds of the respective resources and the associated cost.For simplicity, we are not showing cost associated with the resourcein the representation shown below. The RSpace data structure isrepresented as:

R11 = {(S

111,C

111), · · · , (S

11A ,C

11A )}

R12 = {(S

211,C

211), · · · , (S

21B ,C

21B)}

...

R1A = {(S

A11,C

A11), · · · , (S

A1N ,C

A1N)}

where Rpm denotes p-th resource of the m-th domain. Spmn and

Cpmn denote n-th specification, and constraint respectively of the

p-th resource belonging to them-th domain, where A to N are thenumber of constraints in R1

1 to R1A

respectively. Such a formalization

maintains the uniqueness of each domain, resource, constraint andits corresponding resource cost.

Fig. 5 shows RSpace representation of resources and their corre-sponding constraints categorized in different domains.

Figure 5: Resource Space abstraction

3.1.3 Macro Operator. The output data structure of the Col-lections stage is called as the Macro Operator (MacOps), which isa set of candidate cloud resources that could satisfy data intensiveapplication requirements. Depending on the resource specifications,and constraints, MacOps are constructed from a mapping of a featurein ARI to one or many resources in the corresponding domain D,resource couples in RSpace and are represented as:

f : (Fl , Pl ) 7→ (Dk ,Rkj ) (1)

where l ∈ I, k ∈ D, and j ∈ {A|B| · · · |N}. Here A denotesthe set with A resources. Equation (1) formulates the process ofthe ARI features translating into domains and the correspondingARI preconditions along with RSpace resource specifications andconstraints being mapped into one or multiple resources in thatdomain, forming one or multiple (Domain, Resource) pairs. Onelimiting criteria for such a mapping is that the total number of l’sshould be equal to the total number of k’s, i.e., each feature in anARI maps to one and only one domain in RSpace. However, no suchrestrictions apply for a number of resources within the domain. Thecorresponding ARI preconditions are added to the domain-resourcepairs forming MacOps as (Domain,Resource, Precondition,Cost)4-tuples.

A MacOp representation is shown in Fig. 6 that corresponds toan ARI (Feature, Precondition) couples shown in Fig. 4. For thisparticular example, we see that ARI (F1, P1) has been mapped tomultiple MacOps, viz. (D1,R1

2), and (D1,R13) denoting either can

satisfy the ARI.As we mentioned, a single (Feature, Precondition) tuple in ARI

can lead to multiple Resources in MacOps belonging to the samedomain. One such example of a MacOps generation is shown in Ta-ble 1, where ‘Network Connectivity’ domain have multiple resourceoptions that can solve the network connectivity requirements of theapplication.

Figure 6: Macro Operator

Table 1: Macro Operator structure of an exemplar application

Domain Resource PreconditionNetworking Connectivity Layer 2 100 MbpsNetworking Connectivity Layer 3 100 Mbps

Domain Resource Precondition

Storage Local Storage 20GBRemote Storage 20GB

Domain Resource PreconditionLocal Computation Dedicated Server 24 cores, 8 GB RAMLocal Computation Virtual Server 24 cores, 8 GB RAM

3.2 CompositionInitially, the (Resource, Precondition) pair of the generated MacOpsneeds to be compared with existing custom templates configurationsresiding in the catalog. However, if there is no match, a new customtemplate needs to be generated using all possible combinations ofMacOps (a set of candidate cloud resources that could satisfy data in-tensive application requirements, equation: 1) and then stored in thecatalog. This process is done through our recommender scheme thatchooses the best of the candidate solutions. The different structurespresented in the composition stage are:

3.2.1 Custom Template. One or many custom template con-figurations can be created for a given application depending on thecorresponding MacOps results. Each custom template is a com-bination of 5-tuple (Domain, Resource, Specification Constraint,Precondition and cost). Fig. 7 shows an excerpt of the custom tem-plate JSON file for an example application that contains a sampledescription of cloud resource configuration and application meta-data.

3.2.2 Catalog. As the new templates are created they are storedin a catalog for future use (step 4 in custom template life cycle). Iffor a user requirement there exists a template in the catalog whichsatisfies the application requirement, then it might be chosen by ourrecommender scheme as a candidate. Initially, the catalog will havefew templates generated for the different application requirements.However, over the time the catalog will have critical mass with newcustom templates upon deployment of new resources.

3.2.3 Policy Database. We consider Policy database whichcontains a pre-defined set of rules with inter-dependencies amongthem. They are used to provide initial resource recommendationfor the KIS. These rules initially are created based on the previ-ous deployment of diverse data-intensive applications that requiredheterogeneous resources, as well as logical rules that guaranteecompatibility and functionality of cloud resources.

3.3 ConsumptionThe last stage of the custom template life cycle focuses on theresource deployment. Once the custom template configurations aregenerated or retrieved from the catalog for a particular application’srequirement, the configurations (along with the cost) are presentedto the end user. Three different templates are selected for the user(‘green’, ‘red’ and ‘gold’). We consider the closest solution to theuser requirement as the ‘green template’, the template that presents

Figure 7: Custom template JSON configuration excerptdescribing compute, network, storage and software for:application requirements (Figure on the left), and one

recommended solution (Figure on the right).

lower cost as the ‘red template’ and the template with more resourcesfor a certain feature (i.e., memory) based on user preference as the‘gold template’. This preference is explicitly selected by the userduring the KIS interactions.

Once the user chooses a particular template (Step 8 in Fig. 2),corresponding infrastructure resources are automatically deployedand credentials for infrastructure interaction are available to the user.The deployment process utilizes the APIs for different CSPs anddocker container technologies for software deployment i.e., throughthe ‘docker hub’. Once the infrastructure is deployed, the monitoringmanager (Step 9 in Fig. 2) verifies the deployment and then collectsthe information about user account creation, IP configuration, userprivileges granted, customized software and availability and networkconnectivity. This data along with the application data is stored and isavailable as part of the template meta-data within the catalog. For thiswork, we use a simple cost per time-unit model and investigations ona more sophisticated cost model are beyond the scope of this paper.

4 RECOMMENDATION SCHEMEIn this section, we present details of our template recommendationalgorithm that finds the most closely matching templates for a givenuser’s requirement. Our algorithm extends the k-Nearest Neighbors(KNN) algorithm to find the three most closely related templates(i.e., ‘green’, ‘red’ and ‘gold’) in the catalog as shown in the Al-gorithm 1. Preferred dimension (pDimension) input is explicitlydefined by users through the KIS and it is used to prioritize resourcerequirement during template selection process. From the ARI, weconstruct a requirement vector V (reqVector) having (r1,r2,r3,....,rn)where ri is the pre-condition for each resource in network, storageand computation domain.

The MacOp data structure contains different candidate resourcesfor different domains. This is used to compose templates from allpossible combinations and to store unique templates in the catalog. Afilter process for string comparison is applied to the catalog and thereqVector to find candidates templates that contain the required cloudresources. Next, the distance between candidates and reqVector iscalculated and stored in a list (tList). A second filtering process is

Algorithm 1 Template recommendation algorithm

1: procedure DECISION(MacOp[1..m], catalog[1..n], pDimension, re-qVector)

2: /*number of neighbors*/3: k = 34: /*generate possible candidates combination of MacOp*/5: m = MacOpComb(MacOp)6: /*update catalog with generated candidates*/7: catalog = catalog + m8: /*filter catalog based on string values*/9: f = filter(catalog, reqVector)

10: /*calculate templates distance to reqVector*/11: for all t in f do12: tList = distance(t , reqVector, pDimension)13: end for14: /*filter k number of candidate templates*/15: candidateList = nearestNeighbors(k , tList , reqVector)16: /*select closest template*/17: green = closestT(candidateList ,reqVector)18: /*select max preferred dimension template*/19: gold = maxPDimension(candidateList , pDimension)20: /*select cheapest Template*/21: red =cheapestT(candidateList)22: return r ed , дold , or дr een23: end procedure

applied to the catalog i in order to find the 3 closest templates usingthe K-NN algorithm.

Following this, the reqVector is compared with n-dimensionalvector with Ri components ∈ RSpace where Ri contains: No. ofCores, RAM, Storage and Bandwidth, and so on. Then, the distancebetween Ri and reqVector is calculated using the formula 2 for Ri ∈RSpace.

Di = w ∗j=n∑j=1

Vj ∗ abs(Ri, j −Vj )+

w ∗V4 ∗max(0,Ri j −Vj ).(2)

The weights in the above equation are determined by collectingpreferences from the user and by using the Analytical HierarchyProcess described in Section 2. The weight factor w is used to scalethe distance. Note that multiplying each term byV gives more weightto the distance in the dimension which is rated highest by the user.This dimension is further referred to as the Preferred Dimension.

We consider the closest solution i.e., the one with the least dis-tance to the reqVector as the ‘green’ template (function closestT() inthe Algorithm), and the one with high value for the pDimension as‘gold’ template (function maxPDimension() in the Algorithm) andthe remaining candidate as ‘red’ template (function cheapestT() inthe Algorithm). We also consider range of resource requirements todetermine upper and lower boundaries as the KIS allows ranges forcertain resources requirements.

5 IMPLEMENTATION AND RESOURCEDEPLOYMENT

The middleware is developed using Java Struts 2 web framework.For the front-end user interface, JSP/JavaScript and JQuery are usedand entire application logic is written in Java. We use the MySQLrelational database to store the abstracted cloud model (RSpace).

From an architectural point of view, provisioning cloud servicesinvolves identifying cloud resources required for the user require-ments, deploying infrastructure in the selected CSP and installingand configuring any software that user needs in the deployed fed-erated/distributed resources. The last step is done by using Dockerfunctionality that allows software deployment on a variety of plat-forms without being constrained to software dependencies. We havemade the portal code and recommender scripts along with data struc-tures, data and GENI configuration openly available at [36].

Fig 8 shows the middleware architecture diagram (divided intothree layers) to automate application requirements abstraction andfederated cloud resources deployment. In the ‘Application Require-ment & Abstraction Layer’, users interact with the middlewarethrough the KIS user interface to identify the application require-ments. This layer also generates ARI from the user input capturedby the KIS and it is passed to the ’Resource Provisioning & Deploy-ment Layer’ where the ARI is compared with the RSpace resultingin the creation of MacOps. The MacOps are used by the ‘ResourceRecommender’ module to create/reuse custom templates and catalogthese templates. Once a user selects a particular custom template,resources will be automatically provisioned through the ‘ResourceDeployment Module’. The ‘Infrastructure Layer’ abstracts the re-sources from different CSPs and calls the correct API to interactwith a specific CSP. Infrastructure Layer receives the template to bedeployed from resource provisioning and deployment layer and callsthe relevant APIs to allocate resources on corresponding CSP infras-tructure. The ‘Monitoring Manager’ verifies that the resources aredeployed successfully. All the inter-process communication betweendifferent layers is implemented via RESTful web services.

Figure 8: System architecture diagram for applicationrequirements abstraction and hybrid cloud resource

deployment using custom template middleware.

Initially, user requirements are captured from the client side. Uponcollection of general requirements of applications, the pre-population

manager sends back the rest of the form with pre-populated valueswhich are minimum requirements for the application. The completeuser requirements are sent to solution generator which generates theARI and sends it to the MacOp generator. The MacOp generatorcompares the ARI with the RSpace and gives the possible resourcesin each component that can satisfy the user requirement. The solutiongenerator then generates all possible combinations of the solutionand creates the MacOps and inserts them into the catalog. Here, onlythe unique templates are inserted into the catalog, to avoid duplicates.Next, the solution generator contacts the recommendation scheme tooutput 3 templates (‘green’, ‘red’ and ‘gold’) that are the 3 closestsolutions to the user requirement as per Algorithm 1. The templatesare shown to the user and upon user selection, the selected template(among ‘green’, ‘red’ and ‘gold’) is sent to configuration manager fordeployment. Lastly, access information of the configured resourcesand credentials are available to the user.

We implemented our middleware to provision cloud infrastructurefor a real-world advanced manufacturing data-intensive application.Advanced manufacturing design requires synchronous collaborativework among multi-site engineering experts and a robust local infras-tructure that can run a highly scalable workflow. Small and Mediumsize manufacturer businesses are increasingly leveraging modelingand simulation processes as they cannot afford to invest in localinfrastructures. We observe that our middleware implementationreduces the product design cycles and provides the businesses withexpertise to manage the sizable upfront costs for high-performancecomputing resources and software licenses.

6 PERFORMANCE EVALUATIONImplementing our custom template middleware can provide seam-less access to the users (novice and expert) to federated cloud re-sources configured based on the requirements of the data-intensiveapplication. Particularly, users without technical or cloud platformexperience can easily interact with KIS and take advantage of ourrecommendation scheme to provision the pertinent resources. Addi-tionally, upon automated deployment of cloud resources, credentialswill be sent to the users along with access instructions. This wholeprocess reduces the resource provisioning time drastically and re-moves guess work in cloud resource allocation complexity, arisingfrom manual approaches. In addition, users can save their previoussuccessful solutions as custom templates in the catalog for portabilityand future purposes.

6.1 Evaluation of Recommendation SystemFor our evaluation experiments, we used the publicly available dis-tributed infrastructure of GENI. We setup a testbed with distributedheterogeneous resources that can be discovered and configured byour recommendation scheme using the geni-lib, a GENI API ca-pability. To mimic the different data centers, we considered differentaggregate managers (configurable computational resources) that aredistributed across many cities.

To evaluate our middleware utility, we use a Resource Provi-sion Accuracy metric, which we define as the similarity of cloudresources presented in templates, versus the similarity of the cloudresources in requirements of data-intensive applications. We eval-uate the accuracy of our recommendation system by consideringresource requirement boundaries created based on reqVector. Theboundaries are calculated based on the input ranges selected by theusers at the time of choosing cloud resources through the KIS (e.g.,

required bandwidth range: 10 - 15 Mbps). This information is usedto create additional boundary vectors.Vmin is the lower bound of therequirements, Vmax is the upper bound of the user requirements andVmid is the vector constructed using the mid-values for the features.We test our recommendation system with generated data for 60 userrequirement requests. We implement our recommendation schemefor 30 novice users (users who do not provide enough information tothe KIS) and 30 expert users (who completed most or all fields in theKIS). The recommendation scheme is also evaluated by consideringusers who explicitly specified pDimension variable (resource pref-erence selection) and users who did not specify any preference (allresources have equal importance). Finally, we compare our resultswith a prior scheme detailed in [18] that also has a recommendationapproach as mentioned in Section 2.

Fig 9(b) shows results obtained with our scheme which clearlypresents improved results in comparison with Fig 9(a). Results in9(b) are cataloged in different groups that correspond to novice andexpert users. Within those categories there are two sub-categorieswhich represent users who consider all resources with the same im-portance and users that explicitly set a pDimension to determine pref-erence or priority of some resource component. Results for noviceusers show that our recommender scheme achieves nearly 58% ac-curacy when pDimension is not applied and 71% accuracy whenpDimension is applied. pDimension represents an improvement ofnearly 13%. Similarly, results for expert users show that our recom-mender scheme achieves nearly 78% accuracy when pDimensionis not applied and 92% when pDimension is applied. pDimensionrepresent an improvement of 14%. Overall our recommender schemeaccuracy depends on the user input and accuracy is high when theuser has some level of cloud knowledge or experience, and accuracyis low otherwise. Also, pDimension effectively targets resourcesbased on the resource priority specified.

6.2 Data-intensive Application Case StudyIn this subsection, we present the effectiveness of our middlewareimplementation in the form of a case study for the advanced manu-facturing of data-intensive application.

Case study: A small-business advanced manufacturing companyintegrated our custom template middleware solution and providedtheir cloud resource requirement needs as input to the KIS. A modelrequirement input is presented below:

• Highly available cluster infrastructure resources requiredwith support of batch processing execution• Nodes require 2 cores with 4-8 GB RAM and 20 GB-40

GB local storage and bandwidth of 10-15 Mbps• Require high-throughput computing software framework

such as HTCondor available upon resource provisioning• Require customized queuing software available on master

node upon resource provisioning (stored in Github reposi-tory)

• Require binary files available on new cluster environment(sources in Github repository)• User Preference input for pDimension: The preference fea-

ture selected is RAM MemorySome of the cloud resource requirements (RAM, Storage Size

and Bandwidth) present range values. Consequently, boundariesare established through vectors (Vmin and Vmax ) and the reqVectoris constructed with Vmid vector. Some of the vector values are:Vmin = (2, 4, 20, 10), Vmid = (2, 6, 30, 12.5) and Vmax = (2, 8,

(a) (b)

Figure 9: Accuracy for recommended results: (a) This figure presents results based on Figs. 5 and 6 of work done in [18], where the recommender schema present amaximum accuracy value of 71% ; (b) Present our middleware results based on four different input data: Novice that represents Requirements Without Preference),

Novice (P) that represents: Requirements With Preference, Expert that represents: Requirements Without Preference and Expert (P) that represents Requirements WithPreference. The maximum accuracy obtained is 92%.

(a) (b)

Figure 10: The reqVector presents the following requirements: 2 Cores, RAM memory between 4 and 8, Storage capability between 20 and 40 GB and bandwidthbetween 10 and 15 Mbps. Gold, green and red templates are presented as candidate solution with normalized values. (a) pDimension is not specified hence some templates

are out of the boundaries; (b) pDimension for RAM Memory resource is specified, hence all templates tend to be close to the RAM upper bound, and all templates areinside the ranges.

40, 15). we use the evaluation metric, Resource Demandin this setof experiments. This metric is defined as the amount of resourcesneeded for different templates.

Our recommender scheme matches this requirement with differentCSP’s and presents the available resources to the user for applicationdeployment. Once the user selects one of the proposed solutions (i.e.,among the ‘green’, ‘gold’ and ‘red’ options), the infrastructure isdeployed and pre-configured with all the necessary software. Cre-dentials along with instructions to access the new infrastructure aremade available for the user. Once this process is done, the solutiontemplate is stored in the catalog for future reuse.

In order to demonstrate the effectiveness of the pDimension vari-able, we perform two experiments using the advanced manufacturingapplication implementation. In the first experiment, all resourceshave the same priority, hence a pDimension was not defined. How-ever, for the second experiment, ‘RAM’ was selected for pDimen-sion. From Fig. 10, we can observe that the pDimension variable

affects the output results because the candidate templates differ in thenumber of recommended resources (i.e., in the Resource Demand).Results in Fig. 10(a) show that the cloud resources are out of theboundaries. However, results in Fig. 10(b) show candidate templatesare found to be within the upper and lower boundaries as well asthose that are close to the upper boundary. Majority of our experi-ments reveal an interesting phenomenon related to the cost, wherethe ‘green’ template generally incurs high cost and the associatedvector is close to the upper bound. Moreover, the ‘gold’ template(template with the highest values based on RAM pDimension) incurslower cost in comparison with the ‘green’ and higher cost than the‘red’. Finally, the ‘red’ template that presents the lowest cost, as wellas the vector, is close to the lower bound.

7 CONCLUSIONIn this paper, we presented that there are substantial barriers anda clear lack of scientific approaches and middleware solutions to

help users of data-intensive applications to effectively deploy het-erogeneous distributed cloud resources. We designed a user-friendlyinterface (KIS) to overcome limitations of reusability of previouslysuccessful configurations of resource provisioning for similar ap-plications, thus demonstrating proper abstractions. Particularly, wedescribed the implementation of a custom template catalog (i.e., ourrecommendation scheme contribution) that recommends configura-tion solutions for requirements of different applications, which inturn could lead to effectively utilize time and effort in provisioningheterogeneous resources for novice and expert users. The presentedcustom template middleware (i.e., our implementation for a real-world application, code openly available in GitHub at [36]) wasevaluated through two experiment scenarios that involved simulationof novice beginner and expert interactions with the KIS. Resultsof the evaluation show that our scheme can obtain up to 71% ac-curacy for novice users, and up to 92% accuracy for expert users,thus a net improvement of 21% accuracy, compared to an existingrecommendation scheme [18].

Future work could involve enabling dynamic resource adaptationover federated cloud resources based on resource utilization mon-itoring. Also, middleware extensions can be developed to allow aprovisioned distributed cloud resource to be adapted and refinedautomatically after its provisioning in an online manner. These ex-tensions could solve cases e.g., when a user is not sure about therequired resources or the KIS inputs are scarce. Lastly, our workcould be extended to a number of other real-world applicationsto benefit data-intensive user communities in various science andengineering disciplines such as bioinformatics and neuroscience.

ACKNOWLEDGMENTSThis work was supported by the National Science Foundation un-der awards: ACI-1246001, ACI-1245795, ACI-1440582 and CNS-1429294, and Cisco Systems. Any opinions, findings, and conclu-sions or recommendations expressed in this publication are those ofthe author(s) and do not necessarily reflect the views of the NationalScience Foundation or Cisco Systems.

REFERENCES[1] A. Hanemann, J. Boote, E. Boyd, J. Durand, L. Kudarimoti, R. Lapacz, D. Swany,

S. Troche, J. Zurawski, “perfSONAR: A Service Oriented Architecture for Multi-Domain Network Monitoring”, Proceedings of International Conference on Service-Oriented Computing (ICSOC), 2005.

[2] R. Morgan, S. Cantor, S. Carmody, W. Hoehn, K. Klingenstein, “Federated Security:The Shibboleth Approach”, EDUCAUSE Quarterly, 27(4):12-17, 2004.

[3] R. Bazan Antequera, P. Calyam, S. Debroy, L. Cui, S. Seetharam, M. Dickinson,T. Joshi, D. Xu, T. Beyene, “ADON: Application-Driven Overlay Network-as-a-Service for Data-Intensive Science”, IEEE Transactions on Cloud Computing,PP(99):1, 2017.

[4] R. Schmidt, S. Grarup, “vApp: A Standards-based Container for Cloud Providers”ACM SIGOPS Operating Systems Review, 44(4):115-123, 2010.

[5] M. Berman, J. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R.Ricci, I. Seskar, “GENI: A federated testbed for innovative network experiments”,Elsevier Computer Networks,61(1):5âAS23, 2014.

[6] “Amazon Web Services: offers a suite of cloud computing services that make up anon-demand computing platform”, http://aws.amazon.com

[7] A. Botea, M. Muller, J. Schaeffer, “Using Component Abstraction for AutomaticGeneration of Macro-Actions", Proceedings of International Conference on Auto-mated Planning and Scheduling(ICAPS)181-190, 2004.

[8] G. Wei, X. Zhong-Wei, X. Ren-Zuo, “Metrics of Graph Abstraction for Component-Based Software Architecture", Proceedings of the WRI World Congress on Com-puter Science and Information Engineering,07:518-522, 2009.

[9] H. Zheng, H. Yao, T. Yoneda , “Modular Model Checking of Large Asynchro-nous Designs with Efficient Abstraction Refinement", IEEE Transactions onComputers,59(4):561-573, 2010.

[10] Y. Wang, “A hierarchical abstraction model for software engineering”, Pro-ceedings of the International workshop on The role of abstraction in softwareengineering,43-48, 2008.

[11] M. Godse, S. Mulik, “An Approach for Selecting Software-as-a- Service (SaaS)Product”, IEEE International Conference on Cloud Computing, 2009.

[12] M. Singhal, J. Ramanathan, P. Calyam, M. Skubic, “In-the-know: Recommenda-tion Framework for City-supported Hybrid Cloud Services”, IEEE/ACM Interna-tional Conference on Utility and Cloud Computing (UCC), 2014.

[13] M. Zhang, R. Ranjan, A. Haller, D. Georgakopoulos, and P. Strazdins, “Investi-gating decision support techniques for automating Cloud service selection”, IEEEInternational Conference on Cloud Computing Technology and Science (Cloud-Com), 2012.

[14] M. Zhang, R. Ranjan, M. Menzel, A. Haller, “A Declarative Recommender Systemfor Cloud Infrastructure Services Selection”, Proceedings of the Internationalconference on Economics of Grids, Clouds, Systems, and Services,102-113, 2009.

[15] H. Qian, H. Zu, C. Cao, Q. Wang, “CSS: Facilitate the cloud service selectionin IaaS platforms”, Proceedings of International Conference on CollaborationTechnologies and Systems (CTS), 2013.

[16] S. Sundareswaran, A. Squicciarini, D. Lin, “A Brokerage-Based Approach forCloud Service Selection”, Proceedings of IEEE International Conference on CloudComputing (CLOUD), 2012.

[17] Z. Gui, C. Yang, J. Xia, Q. Huang, K. Liu, Z. Li, M. Yu, M. Sun, N. Zhou, B. Jin,“A Service Brokering and Recommendation Mechanism for Better Selecting CloudServices”, PLoS ONE( Public Library of Science) 9(8):e105297, 2014.

[18] S. Soltani, P. Martin, K. Elgazzar, “QuARAMRecommender: Case-Based Rea-soning for IaaS Service Selection”, Proceedings of International Conference onCloud and Autonomic Computing (ICCAC), 2014.

[19] J. Kirschnick, J. Alcaraz, L. Wilcock, N. Edwards, “Toward an Architecture forthe Automated Provisioning of Cloud Services", IEEE Communications Magazine,48(12):124-131, 2010.

[20] J. Bobadilla, F. Ortega, A. Hernando, A. GutiÃl’Rrez, “Recommender systemssurvey”, Journal on Knowledge-Based Systems,46:109-132, 2013.

[21] V. Lopez, R. Salamanca, M. Moreno, J. Corchado, “A Knowledge-Based Recom-mender Agent to Choosing a Competition System”, Trends in Practical Applica-tions of Agents, Multi-Agent Systems and Sustainability,143-150, 2015.

[22] D. Arnett, Z. Baizal, Adiwijaya, “Recommender system based on user functionalrequirements using Euclidean fuzzy”, Proceedings of International Conference onInformation and Communication Technology (ICoICT), 2015.

[23] C. Horuk, G. Douglas, A. Gupta, C. Krintz, et. al., “Automatic and portable clouddeployment for scientific simulations”, Proc. of IEEE HPCS, 2014.

[24] J. Kirschnick, J. Alcaraz, L. Wilcock, N Edwards, “Toward an architecture for theautomated provisioning of cloud services”, Proceedings of International Confer-ence on High Performance Computing & Simulation (HPCS), 2010.

[25] T. Nielsen, C. Iversen, P. Bonnet, “Private Cloud Configuration with MetaConfig”,Proceedings of IEEE International Conference on Cloud Computing (CLOUD),2011.

[26] “Puppet: Manage IT infrastructure as code across all environments”,https://puppetlabs.com

[27] R.Qasha, J. CaÅCa, P. Watson,“Towards Automated Workflow Deployment inthe Cloud using TOSCA” Proceedings of IEEE International Conference on CloudComputing (CLOUD), 2015.

[28] I. Giannakopoulos, N. Papailiou, C. Mantas, I. Konstantinou, D. TsoumakosâAa,N. Koziris, “CELAR: Automated Application Elasticity Platform” Proceedings ofIEEE International Conference on Big Data (Big Data), 2014.

[29] “Cisco UCS Director: Cisco UCS Director automates, orchestrates, andmanages Cisco and third-party hardware. It allows a user to manage net-work components and services and allows data center staff to focus onservices.”, http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-director/index.html

[30] “Apache Cloud: it is a standard Python library that abstracts away differenceamong multiple cloud providers .i.e. it provides one interface to interact with allCSP”,http://libcloud.apache.org

[31] “Apache jCloud: it is an open source multi-cloud toolkit for the Java plat-form that gives user the freedom to create applications that are portableacross clouds while giving you full control to use cloud-specific features.”,http://jclouds.incubator.apache.org

[32] “Delta-Cloud: Deltacloud provides the API server and drivers necessary forconnecting to cloud providers. It enables management of resources in differentclouds by the use of one of three supported APIs.”, http://deltacloud.apache.org

[33] C. Boettiger, “An introduction to docker for reproducible research”, ACM SIGOPSOperating Systems Review - Special Issue on Repeatability and Sharing of Experi-mental Artifacts archive,49(1):71-79, 2015.

[34] M. Thanh, N. Quang-Hung, M. Nguyen, N. Thoai, “Using Docker in High Perfor-mance Computing Applications”, Proceedings of IEEE International Conferenceon Communications and Electronics (ICCE), 2016.

[35] A. Hanemann, J. Boote, E. Boyd, J. Durand, et. al., “PerfSONAR: A ServiceOriented Architecture for Multi-Domain Network Monitoring”, Proceedings of theInternational conference on Service-Oriented Computing,241-254, 2005.

[36] Openly accessible Github repository of Custom Template Middleware,https://github.com/vmlb/CT

recommending resources to cloud applications based on ...faculty.missouri.edu › calyamp ›...

Documents