

Computers, Environment and Urban Systems 61 (2017) 141–152

Contents lists available at ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/ceus

Building Model as a Service to support geosciences

Zhenlong Li ⁎, Chaowei Yang ⁎, Qunying Huang, Kai Liu, Min Sun, Jizhe Xia

Center of Intelligent Spatial Computing for Water/Energy Sciences, George Mason University, Fairfax, VA 22030-4444, United States

⁎ Corresponding authors. Tel.: +1 7039093676 (Z. Li), +1 7039937472 (C. Yang).
E-mail addresses: [email protected] (Z. Li), cyang3@gmu.edu (C. Yang), [email protected] (Q. Huang), [email protected] (K. Liu), [email protected] (M. Sun), [email protected] (J. Xia).

http://dx.doi.org/10.1016/j.compenvurbsys.2014.06.004
0198-9715/© 2014 Elsevier Ltd. All rights reserved.

Article info

Article history:
Received 8 August 2013
Received in revised form 4 June 2014
Accepted 26 June 2014
Available online 17 July 2014

Keywords:
Cloud computing
Web service
Geospatial data
Model Web
EarthCube
Big data

Abstract

Modeling is a fundamental methodology for simulating the past, understanding the present and predicting the future of the geospatial systems and phenomena. However, modeling in the geospatial science poses several challenges, including complex model setup, repetition in model setup, requirement for large, scalable computing resources, and management of a large amount of model output. To address these challenges, we propose Model as a Service (MaaS) by leveraging the latest advancement of cloud computing. MaaS enables various geoscience models to be published as services, and these services can be accessed through a simple web interface. MaaS automates the processes of configuring machines, setting up and running models, and managing model outputs. The computing resources are automatically provisioned by MaaS in a cloud environment. A proof-of-concept MaaS prototype is presented using a global climate change model (ModelE). Experimental results show that the MaaS prototype significantly simplifies model setup, accelerates model simulation and enhances model output by providing a web-based, on-demand, scalable modeling environment.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Models can help us better understand a natural system’s past, present and future. In the geosciences, modeling is critical for researching how the Earth system works by analyzing its fluid and solid dynamics and simulating its physical processes (Claussen et al., 2002; Hill, DeLuca, Suarez, & Da Silva, 2004). For example, climate models reconstruct past climate conditions and predict future climate changes for the next decade or century by simulating the interactions of the oceans, atmosphere, land surface, biosphere and ice (Schneider & Dickinson, 1974). Polar science models give scientists insights into the interactions of physical and biological processes in the Polar Regions and predict potential future changes (Lindsay & Zhang, 2006).

Using models for geoscience studies poses several challenges. First, geoscience models are often highly complicated; setting up a model is onerous and time consuming (Harrop, Bernardet, Govett, Smith, & Weygandt, 2008; Nefedova et al., 2006), and the setup often has to be repeated many times. Second, geoscience model simulations are computing intensive, requiring large amounts of computing resources (Allcock et al., 2002). In addition, some model simulations have strict time requirements and need to be finished within hours (e.g., weather and dust storm forecasting). Such circumstances further stress the computational requirements of model simulation. Third, “ensemble model runs” need scalable computing resources so that the runs can be conducted in parallel. An ensemble model run refers to running a model many times with different configurations to test a set of scientific experiments (e.g., evaluating a model’s sensitivity to different variables) (Murphy et al., 2004; von Deimling, Held, Ganopolski, & Rahmstorf, 2006). Finally, model simulation is often data intensive. For example, conducting a 5-day weather forecast could generate 10 terabytes of output data per day (Ferraro, Sato, Brasseur, Deluca, & Guilyardi, 2003). Thus, it is challenging to develop an effective methodology to handle the large volume of model output.

To capture the technology advancements for dealing with geoscience modeling, the National Science Foundation (NSF) EarthCube¹ program held a modeling workshop in April 2013 with 62 participants, including geoscientists, geoscience educators, modelers and technologists (Arrigo et al., 2013). The workshop concluded that an infrastructure is desired to support model interoperability, reusability, transparency and portability across geoscience communities. Thus, an infrastructure is required where geoscientists, policy makers, and the public can explore “what if” (prediction given hypothetical constraints) scenarios (Argent, 2004).

Toward forming such a modeling infrastructure, we propose a cloud-enabled Model as a Service (MaaS) framework, which addresses the aforementioned challenges by: (1) publishing geoscience models as web services to hide the complexity of model setup; (2) providing an on-demand, ready-to-go model environment, including hardware and software resources (e.g., computing, storage, and an operating system (OS) with models and dependent libraries); (3) automatically provisioning computing resources to execute multiple model runs in parallel to support many-model-run scenarios and concurrent user accesses; and (4) effectively handling model output for online visualization.

¹ https://www.earthcube.org.

The remainder of this paper details the MaaS. Section 2 reviews relevant research, whereas Section 3 details the methodologies. A proof-of-concept prototype is presented in Section 4 with experimental results demonstrating the feasibility of the MaaS. Finally, Section 5 draws conclusions and discusses future relevant research.

2. Related work

2.1. Model Web for model accessibility and interoperability

The concept of model access through web services, Model Web, has been proposed to facilitate model accessibility and interoperability. Geller and Turner (2007) first defined Model Web as an open-ended system of interoperable computer models and databases, with machine and end-user Internet access via web services. They envisioned that through Model Web geoscience communities would work as a system of independent but interactive models in three phases: (1) data interoperability for handling data heterogeneity; (2) ontology to address the domain heterogeneity; and (3) automation with the web portals for model accessibility and integration. Notable efforts have made progress toward this vision. For example, the climate dynamics community has developed modeling systems reflecting the interaction of Earth subsystems, such as ocean and atmosphere (e.g., the METAFOR project²; Nativi, Mazzetti, & Geller, 2013). Geller and Melton (2008) applied an ecological web model to assess the impacts of climate change. Nativi et al. (2013) further introduced the GEO Model Web (GMW) initiative to increase environmental model access and sharing. Model interfaces have been designed and tested as web services to improve the accessibility of model output (e.g., PCMDI: Program for Climate Model Diagnosis and Intercomparison³). Roman, Schade, Berre, Bodsberg, and Langlois (2009) described the MaaS concept as the evolution of Model Web without detailing how such a concept could be implemented with the latest information and computing techniques. By addressing the requirements to make models and their outputs configurable, accessible and interoperable, Model Web has gained recognition in the geoscience community as a useful tool for building geoscience models that combine individual components in complex workflows (Bastin et al., 2013).

The three-phase vision of the Model Web has been largely fulfilled by previous research. However, that vision primarily focused on model integration and interoperability across various disciplines. In addition, previous studies (Geller & Melton, 2008; Geller & Turner, 2007; Roman et al., 2009; Nativi et al., 2013) utilized open standards to achieve model interoperability and share a common limitation: they do not address computational issues. In fact, the underlying computing infrastructure for running these models is barely mentioned, leaving computational challenges such as computing intensity, data intensity and disruptive computing requirements unexplored.

2.2. Computing technologies for computing intensive model simulations

Traditionally, supercomputers or large-scale Grids, such as TeraGrid (Beckman, 2005) and Open Science Grid (Pordes et al., 2007), have been used to address computational challenges for geoscience models (Bernholdt et al., 2005; Chang et al., 2008; Fernández-Quiruelas, Fernández, Cofiño, Fita, & Gutiérrez, 2011; Yang, Wu, Huang, Li, & Li, 2011). However, these computing facilities require dedicated hardware and software with significant upfront investment (e.g., years to assemble). Only a few large projects have access to these supercomputers. An alternative approach is to build a loosely coupled cluster from computing resources of volunteers using BOINC (Anderson, 2004), such as SETI@home (Anderson, Cobb, Korpela, Lebofsky, & Werthimer, 2002). While such a volunteer-based computing environment works well with computing intensive projects, it is less functional for projects with Big Data (e.g., global climate simulation) and a finite deadline due to limited bandwidth resources. As a result, the limited availability and accessibility of computing resources in traditional computing paradigms hamper the advancement of geoscience and geospatial technologies.

² http://metaforclimate.eu/.
³ http://www2-pcmdi.llnl.gov/.

Cloud computing provides an alternative for scientists to lease computing resources from commercial cloud computing providers. Cloud computing, characterized by on-demand self-service, availability, scalability and measured cost (Mell & Grance, 2009), is a new computing paradigm that has matured and become widely used in the past few years. To date, cloud service models have primarily included Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and Data as a Service (DaaS). One of the most remarkable characteristics of cloud computing, especially for IaaS, is that users can provision computing resources in minutes to run tasks with automated workflows for maximum efficiency and scalability, rather than waiting in a queue for a limited computing pool as in a traditional computing paradigm. Therefore, cloud computing offers a powerful and affordable alternative for running large-scale simulations that are computationally intensive (Huang et al., 2013). Many studies have been conducted to explore the feasibility of utilizing cloud computing for geoscience models and how to best adapt to this new paradigm (Huang, Yang, Nebert, Liu, & Wu, 2010; Ostermann et al., 2010; Vecchiola, Pandey, & Buyya, 2009; Yang, Goodchild, et al., 2011; Yang, Xu, & Nebert, 2013; Yang et al., 2014). Evangelinos and Hill (2008) concluded that cloud computing could provide a potential solution to support atmosphere–ocean climate models, and Huang et al. (2013) utilized cloud computing to simulate dust storms. However, few studies have explored leveraging cloud computing to support responsive, massive, on-demand models, especially for supporting computing- and data-intensive geospatial science simulations.

This paper extends the traditional Model Web concept by integrating the latest cloud computing services (e.g., on demand and elasticity) to handle the model configuration, computing intensive and data intensive challenges. Different from the previous Model Web concept, which focuses on model integration and interoperability, the proposed MaaS is relevant to the Everything as a Service (XaaS) in the context of cloud computing. Through incorporating cloud computing, we move Model Web to a new phase to address the computability challenges over the Internet for the general geospatial science community and the public.

3. Methodologies

3.1. MaaS in the cloud

The concept of MaaS is a viable mechanism to easily access, interact with and run complex climate models and to manipulate model output through simple web interfaces. However, it is challenging to build MaaS with traditional computing infrastructure such as grid computing or HPC. First, as a web service, MaaS should expect concurrent model runs from many end users; with traditional computing infrastructure, the underlying computing resources need to be provisioned with a large fixed number of physical machines to accommodate peak requests, which is neither feasible nor cost-effective. Second, many geoscience models run in a sequential mode and utilize only part of a machine’s resources. Under such circumstances, high-end computers cannot accelerate the simulation; instead, a low-end computer (e.g., one-core CPU) is more cost effective for such models. Third, with traditional computing infrastructure, models are tightly coupled with the physical computing infrastructure, and the whole model execution and data manipulation are directly deployed on the machines. Removing existing models from or adding new models to such a modeling environment is complex and time consuming.

Fig. 1. General framework of the cloud-enabled MaaS.

Cloud computing (especially IaaS) is a promising technology to address this challenge for several reasons. First, IaaS provides scalable computing resources to handle concurrent and ensemble runs. Second, the specifications of a virtual machine (VM) (e.g., the number of CPU cores and the size of RAM) can be efficiently tailored for a specific model run based on its resource consumption characteristics. Third, IaaS provides an image-based mechanism, allowing the model environment to be “burnt” to a VM image (i.e., a snapshot of a computer software system), which serves as the foundation of MaaS. PaaS is not suitable for building MaaS because PaaS does not allow users to manage or control the VM’s operating system or storage/network (Mell & Grance, 2009). Based on these considerations, we propose to build MaaS on IaaS, serving as a new service model of cloud computing.

The general framework of MaaS consists of four layers: cloud computing platform, geoscience model image repository, MaaS middleware and MaaS users (Fig. 1). As one of the most popular and matured IaaS platforms, Amazon EC2⁴ is used herein to demonstrate the idea. Eucalyptus⁵ is an open source cloud platform compatible with Amazon EC2. Therefore, the methodology presented in this paper can be applied to building MaaS either on EC2 or Eucalyptus or other EC2-compatible platforms. The implementation architecture of the framework is presented in Section 3.2.

3.2. MaaS architecture

The overall architecture of MaaS enables users to submit a model run request, monitor the model run status and finally get the model run result through a web interface (Fig. 2). The tasks of provisioning machines and setting up and running the model are automated by MaaS.

MaaS Engine, powered by cloud computing, compiles and runs models on the Model VMs. A Model VM is a ready-to-go environment for a specific model, including the model code, dependent software libraries and other configurations. The Model VM is provisioned based on the Model VM Image, which is a system snapshot containing the whole software environment required to run a model, except for the model configuration file and input data, which are uploaded by the user for flexibility. All images are managed by MaaS in the cloud platform and can be downloaded upon request. Once a Model VM is provisioned, the following tasks are executed: upload the model configuration file and input data, compile the model, run the model, preprocess model output, and upload the output to the data server. These tasks are controlled by MaaS Server.

⁴ http://aws.amazon.com/ec2.
⁵ https://www.eucalyptus.com.

MaaS Server is a virtual agent that dispatches and monitors the model runs as well as manages the underlying computing resources automatically. It is responsible for the following: (1) interpreting model run requests using the Request Interpreter and dispatching the tasks using Task Controller; (2) controlling the life cycle of Model VM using cloud API based on the model execution status; and (3) monitoring task execution status using the Task Monitor, a background program which periodically communicates with all Model VMs to fetch the latest model execution status (e.g., model running time and size of data produced).
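One polling pass of the Task Monitor, combined with the life-cycle control in item (2), could be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: `fetch_status` and `terminate` are hypothetical callables standing in for the HTTP status query and the cloud API call, and the `"state"` key is an assumed status field.

```python
from typing import Callable, Dict, List


def monitor_once(
    vm_ids: List[str],
    fetch_status: Callable[[str], dict],   # hypothetical: asks one Model VM for its status
    terminate: Callable[[str], None],      # hypothetical: wraps the cloud API terminate call
) -> Dict[str, dict]:
    """Poll every Model VM once and terminate those whose run has finished."""
    statuses: Dict[str, dict] = {}
    for vm_id in vm_ids:
        status = fetch_status(vm_id)
        statuses[vm_id] = status
        # Life-cycle control: release the VM as soon as its model run is done.
        if status.get("state") == "finished":
            terminate(vm_id)
    return statuses
```

A background program would call `monitor_once` on a timer; keeping a single pass as a pure function makes the policy easy to test without a cloud account.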

Data Server archives and manages all model outputs and related metadata (e.g., model run parameters, output data format) in a centralized database. These data can be accessed by other researchers either by downloading via FTP or by visualizing online. The data uploading process is controlled by Task Monitor, and the data uploading strategy is discussed in Section 3.4.

Web Portal provides the graphical user interface (GUI) for MaaS, allowing users to register/login to MaaS, configure and submit model runs, check the model run status and perform other tasks. By integrating existing online visual analytic systems with Data Server and leveraging spatial web portal technologies (Li, Yang, Wu, Li, & Miao, 2011; Yang, Cao, Evans, Kafatos, & Bambacus, 2006), the Web Portal further supports model output visualization and analysis.

3.3. Model I/O

Each geoscience model is a processing unit producing output from specific input. Handling model input and output (I/O) could be very complex for MaaS because the input types and formats are heterogeneous for different models. For instance, some models, such as ModelE⁶, take one configuration file as input, while others, such as the NMM-dust model (Xie, Yang, Zhou, & Huang, 2010), may require multiple configuration files and data files. In order to handle the heterogeneity, we abstract the model input as two general types: configuration input (e.g., geographic region, spatial resolution, and time period) and data input (e.g., observation data and initial condition data). Model output is data (all data files produced by the model) in different formats, as denoted in Expression (1):

Fig. 2. Overall architecture of MaaS.

\[ \mathrm{Model}(\mathrm{ConfigInput},\ \mathrm{DataInput}) \xrightarrow{\ \mathrm{run}\ } \mathrm{DataOutput} \qquad \text{(Expression 1)} \]

This abstraction enables us to define three standardized “channels” (ConfigInput channel, DataInput channel and DataOutput channel) for each model VM to receive the model input from MaaS Server and to upload model output to Data Server (Fig. 3). A “channel” refers to a specific Model VM location (directory) where the model input files or output files are placed. The three “channels” are specified when creating the Model VM Image and registered in MaaS Server. The HTTP protocol is used for transferring data between the model VM, MaaS server and data server. This model I/O mechanism addresses the heterogeneity problem at the framework level by encapsulating the data details (types and formats) to a level that can be dealt with by, for example, OGC web processing services. The interoperability achieved by this I/O mechanism enables MaaS to support various geoscience models transparently in a unified framework. New models can be added to MaaS as long as the channels are properly defined. Section 3.4 details the mechanism for publishing new models.
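Because a channel is simply a directory on the Model VM, the three channels can be captured in a small record. The sketch below is illustrative only: the class name, the `destination` helper and the directory paths are assumptions, since the paper does not give the actual locations used in the prototype.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelChannels:
    """Directories on a Model VM that act as the three I/O channels."""
    config_input: PurePosixPath
    data_input: PurePosixPath
    data_output: PurePosixPath

    def destination(self, kind: str, filename: str) -> PurePosixPath:
        """Return where a file of the given kind belongs on the Model VM."""
        channel = {
            "config": self.config_input,
            "data": self.data_input,
            "output": self.data_output,
        }[kind]
        return channel / filename


# Hypothetical layout for a ModelE image; the real paths are not given in the paper.
modele_channels = ModelChannels(
    config_input=PurePosixPath("/maas/channels/config"),
    data_input=PurePosixPath("/maas/channels/input"),
    data_output=PurePosixPath("/maas/channels/output"),
)
```

With such a record, MaaS Server only needs to know the file kind, not the model's internal layout, to decide where an uploaded file goes.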

3.4. Mechanism for publishing new models

Supporting an array of geoscience models is an essential capability of MaaS. Even though different models may have different model run environments and model input and output, the procedure to set up and run different models is similar, starting from selecting hardware, to configuring the model run environment and handling model output (Liu, Huang, & Xia, 2013; Liu, Huang, Xia, Li, & Lostritto, 2013). The MaaS architecture provides a pluggable mechanism that enables new models to be easily plugged into the existing MaaS system with two general steps: (1) creating a model VM image for the new model and (2) registering the new model VM image into MaaS.

To create a model VM image, the software and OS requirements are analyzed (e.g., CentOS, Ubuntu), after which a “bare-metal” VM is launched with a supported OS in a cloud platform (any cloud that is compatible with Amazon EC2/Eucalyptus) using a basic VM image. The basic image is available from the cloud provider (e.g., Amazon) or the cloud platform community (e.g., Eucalyptus). After the VM is launched, the model and dependent software libraries are installed following the same procedure as on a physical machine. The next step configures the model (configuration file and input data) and conducts a test run. If the run is successful, four steps follow: (1) extract the model configuration file(s) and input data required for a model run; (2) delete them from the current VM; (3) identify the three channels for the model VM; and (4) create the model VM image based on the current VM using the built-in cloud APIs (Liu, Huang, & Xia, 2013; Liu, Huang, Xia, Li, et al., 2013).

⁶ http://www.giss.nasa.gov/tools/modelE/.

Three steps are needed to register the new model VM image in MaaS. First, the new model VM image is uploaded to the MaaS platform. Second, the three channels are registered in MaaS server to indicate where model input and output files should be placed on the model VM. Third, a new Request Interpreter is configured and plugged into MaaS server to parse the model run request for this new model.
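The registration steps above amount to storing three things per model: an image identifier, the channel locations, and a Request Interpreter. A minimal sketch of such a registry follows; every name in it (`ModelRegistry`, `interpret_modele_request`, the image ID `emi-00000000`, the request fields) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict

REQUIRED_CHANNELS = {"config_input", "data_input", "data_output"}


@dataclass(frozen=True)
class RegisteredModel:
    image_id: str                          # step 1: the uploaded model VM image
    channels: Dict[str, str]               # step 2: where input/output files live
    interpreter: Callable[[dict], dict]    # step 3: the pluggable Request Interpreter


class ModelRegistry:
    """In-memory stand-in for the model registration held by MaaS Server."""

    def __init__(self) -> None:
        self._models: Dict[str, RegisteredModel] = {}

    def register(self, name: str, image_id: str, channels: Dict[str, str],
                 interpreter: Callable[[dict], dict]) -> None:
        missing = REQUIRED_CHANNELS - set(channels)
        if missing:
            raise ValueError(f"channels not registered: {sorted(missing)}")
        self._models[name] = RegisteredModel(image_id, dict(channels), interpreter)

    def interpret(self, name: str, request: dict) -> dict:
        """Parse a model run request with the model's own interpreter."""
        model = self._models[name]
        task = model.interpreter(request)
        task["image_id"] = model.image_id
        return task


def interpret_modele_request(request: dict) -> dict:
    # Hypothetical interpreter: ModelE takes a single configuration file.
    return {"config_file": request["config_file"], "action": "compile_and_run"}


registry = ModelRegistry()
registry.register(
    "ModelE",
    "emi-00000000",  # placeholder image ID, not a real image
    {"config_input": "/maas/channels/config",
     "data_input": "/maas/channels/input",
     "data_output": "/maas/channels/output"},
    interpret_modele_request,
)
```

Rejecting a registration that lacks any of the three channels mirrors the requirement that a new model can be added only when its channels are properly defined.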

The model VM image can be published in MaaS by modelers/researchers. Registering the new model (VM image) into MaaS is conducted by the MaaS provider. Collaborations between the modelers/researchers and MaaS providers are critical for publishing a new model into MaaS as the process is complex. However, once the model is published, these complex steps are “recorded” and a ready-to-go model environment could be provisioned in minutes.

3.5. Mechanism for parallelizing ensemble model run

An ensemble run of a model is a normal practice in geoscience modeling for testing the model’s sensitivity to input parameters. For example, to test a climate model’s sensitivity to a set of parameters, the model is run hundreds of times with different parameter combinations for which we have observational data. With traditional model infrastructure such as HPC, an ensemble run can be conducted either by installing the model on one machine and conducting the runs sequentially, or by installing the model on many machines and conducting the runs in parallel. Neither approach is effective.

MaaS can conduct an ensemble run in parallel with a single request (Fig. 4). The model configuration and input data for each model run are uploaded by users using the web interface through the “channels”. For example, users zip all model configuration files (.R files) into a single package and upload it to MaaS. Upon receiving the ensemble run request, MaaS provisions one model VM for each model run concurrently. The uploaded configuration files and input data are distributed to the VMs. After the model runs start on each model VM, model outputs are post-processed and uploaded to the data server for download or visualization. As each model run finishes, its model VM is terminated to minimize resource consumption.
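The one-VM-per-run fan-out can be sketched with a thread pool in which each worker stands in for one model VM. This is a simulation under stated assumptions rather than the prototype's code: `run_on_model_vm` is a hypothetical placeholder for provisioning a VM, pushing the configuration through the channels, running the model, uploading output and terminating the VM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_on_model_vm(config_name: str) -> dict:
    """Hypothetical stand-in for one complete model run on its own model VM."""
    vm_id = f"vm-{config_name}"  # illustrative identifier only
    # A real implementation would call the cloud API and transfer files here.
    return {"config": config_name, "vm": vm_id,
            "status": "finished", "vm_terminated": True}


def run_ensemble(config_names: List[str]) -> List[dict]:
    """Start one simulated model VM per configuration and collect the results."""
    if not config_names:
        return []
    # One worker per run mirrors provisioning as many VMs as there are runs.
    with ThreadPoolExecutor(max_workers=len(config_names)) as pool:
        return list(pool.map(run_on_model_vm, config_names))
```

For example, `run_ensemble(["E1.R", "E2.R", "E3.R"])` returns three result records in submission order, one per configuration file.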

Fig. 3. Model I/O data transfer between model VM, MaaS server and data server.

Fig. 4. Mechanism for parallelizing ensemble model run (model VMs are provisioned based on the same model VM image but running with different configurations).

With this parallel mechanism, a large ensemble run can be parallelized by provisioning the same number of model VMs as the number of model runs if the computing pool is large enough (e.g., Amazon EC2). If the computing pool is limited (e.g., a private cloud), MaaS enables users to specify how many model VMs are to be provisioned. In this case, MaaS (task controller) maintains a wait list for the pending model runs, and a new run is started once a model VM finishes. This parallel mechanism is cost- and performance-effective because conducting 100 model runs sequentially on one machine for 100 h consumes the same number of machine-hours as conducting them on 100 machines for 1 h, while the latter finishes far sooner.
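The wait-list behavior and the machine-hour argument can be checked with a small simulation. The function below is a sketch under simplifying assumptions: VM provisioning time and per-hour billing granularity are ignored, runs start in submission order, and the function name and signature are illustrative.

```python
import heapq
from collections import deque
from typing import Sequence, Tuple


def simulate_wait_list(run_hours: Sequence[float], max_vms: int) -> Tuple[float, float]:
    """Return (wall-clock hours, total VM-hours) for runs started in submission order."""
    if max_vms < 1:
        raise ValueError("max_vms must be at least 1")
    pending = deque(run_hours)   # the wait list of pending model runs
    running = []                 # min-heap of finish times of active model VMs
    clock = 0.0
    makespan = 0.0
    while pending or running:
        # Start pending runs while the computing pool has free capacity.
        while pending and len(running) < max_vms:
            heapq.heappush(running, clock + pending.popleft())
        # Advance to the next VM that finishes; its slot becomes free.
        clock = heapq.heappop(running)
        makespan = clock
    return makespan, float(sum(run_hours))
```

With 100 one-hour runs, `simulate_wait_list([1.0] * 100, 1)` gives 100 h of wall-clock time and `simulate_wait_list([1.0] * 100, 100)` gives 1 h, while the total VM-hours is 100 in both cases; a pool of 10 VMs lands in between at 10 h.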

The parallelization for ensemble model runs occurs at the experiment level, which allows many model runs to be conducted concurrently to accelerate the ensemble experiment. The code-level parallelization that enables a single model to run in parallel using parallelization technologies such as MPICH2⁷ is well studied and beyond the scope of this paper. However, if a model is already MPI-enabled, such as the dust model NMM-dust (Xie et al., 2010), such a model can be incorporated into MaaS, and a cloud-based HPC cluster will be provisioned to support running the model. This is discussed in Section 5.4.

In order to support concurrent requests, MaaS uses similar mecha-nism (Fig. 5). If n users run a model concurrently, MaaS starts n modelVMs (if the computing pool is large enough) with each VM runningthese requests concurrently. When the computing pool is not largeenough to provision more model VMs, MaaS manages a waiting list. Ifa model VM has finished a current model run, MaaS terminates thisVM and starts a pendingmodel VM using a first-come first-serve policy.

7 http://www.mpich.org/.

It should be noted that each user request can also be an ensemble run request; in this case the parallelization is achieved at the levels of both the user request and the ensemble run.
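The wait-list behaviour described in this section can be sketched as a small discrete-event simulation. This is not the MaaS task controller itself; it is a minimal model that assumes all requests arrive at time 0 in a known order, that run durations are known in advance, and that VM start-up and termination take no time. The function name `simulate_fcfs` is hypothetical.

```python
import heapq
from collections import deque


def simulate_fcfs(durations, pool_size):
    """Dispatch runs onto a fixed pool of VMs, first come, first served.

    durations: run times (hours) in arrival order.
    Returns a list of (start, finish) times, one per run, in arrival order.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be >= 1")
    waiting = deque(enumerate(durations))  # the wait list
    running = []                           # min-heap of (finish_time, run_index)
    schedule = [None] * len(durations)
    now = 0.0
    while waiting or running:
        # Start pending runs while the pool has free capacity.
        while waiting and len(running) < pool_size:
            idx, dur = waiting.popleft()
            schedule[idx] = (now, now + dur)
            heapq.heappush(running, (now + dur, idx))
        # Advance time to the next VM that finishes (and is terminated).
        finish, _ = heapq.heappop(running)
        now = finish
    return schedule


if __name__ == "__main__":
    for run, (start, end) in enumerate(simulate_fcfs([3, 1, 2, 1], pool_size=2)):
        print(f"run {run}: start={start:.0f} h, finish={end:.0f} h")
```

An ensemble request fits the same model: each member of the ensemble is simply one more entry in the wait list.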

3.6. Mechanism for handling model output

Each model run generates gigabytes of data, and this volume increases to terabyte and petabyte levels over many model runs. Often, model output needs to be post-processed (e.g., format conversion, aggregation) before being managed and accessed. In the distributed simulation environment, each model output also needs to be uploaded to the data server for integration. This poses at least three challenges: post-processing is computing intensive; uploading to the data server is communication intensive; and the storage for the model VM needs to be large enough to hold the output for at least one model run.

Even though a model simulation produces large volumes of data, these data are generated progressively. For different models running on multiple VMs, this procedure occurs in a distributed manner. We introduce a progressive mechanism to accelerate data post-processing and uploading: data are post-processed in a distributed manner on the model VMs where they are generated; data are post-processed progressively as they are produced; and data are uploaded progressively once they are processed. Thus, when a model run is finished, most of the model output has already been post-processed and uploaded to the Data Server.

This progressive mechanism has several benefits. First, the wait time for post-processing and uploading is significantly reduced. Second, the computing resource consumption is minimized by terminating the VMs at almost the same time as the model run finishes. Third, the VM storage requirement for storing model output is significantly reduced.
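The first benefit can be illustrated with a simple timing model. The sketch below is an assumption-laden illustration, not the MaaS implementation: the model emits equally sized output chunks at a fixed interval, a single handler post-processes and uploads one chunk at a time, and the function name `tail_wait` and all numbers in the usage example are hypothetical.

```python
def tail_wait(n_chunks, produce_s, handle_s, progressive=True):
    """Seconds between the end of the model run and the end of the last upload.

    n_chunks:  number of output chunks produced by the run
    produce_s: simulation time (s) needed to produce one chunk
    handle_s:  time (s) to post-process and upload one chunk
    """
    run_end = n_chunks * produce_s
    if not progressive:
        # Batch handling: nothing is processed until the run has finished.
        return n_chunks * handle_s
    free_at = 0.0  # time at which the handler becomes idle
    for i in range(1, n_chunks + 1):
        ready = i * produce_s                     # chunk i is written at this time
        free_at = max(free_at, ready) + handle_s  # handle it as soon as possible
    return free_at - run_end


if __name__ == "__main__":
    # Hypothetical numbers: 120 chunks, one per hour of simulation, 4 s each to handle.
    print("progressive:", tail_wait(120, 3600, 4), "s")
    print("batch:      ", tail_wait(120, 3600, 4, progressive=False), "s")
```

When handling a chunk is faster than producing it, the VM only waits for the last chunk after the run ends, which is why it can be terminated almost immediately.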


Fig. 5. Mechanism for concurrent user requests (model VMs are provisioned based on different model VM images and run different model configurations).


4. Prototype and experiment results

Based on the MaaS framework and architecture, we developed a proof-of-concept MaaS prototype to demonstrate its feasibility.

4.1. Experimental model and a study case

ModelE (footnote 8), a global climate model developed by the National Aeronautics and Space Administration (NASA), is a representative geospatial model. To validate the sensitivity and accuracy of ModelE, scientists run the model many times with different input configurations. Running ModelE is computing and data intensive. The following study case demonstrates how ModelE is used by climatologists.

A researcher examines how climate responds to small changes in the Earth’s absorption of solar radiation by simulating 14 different scenarios with different levels of chlorine in the atmosphere. Fig. 6A shows the workflow for conducting this experiment. In this workflow, completing the first three steps is a cumbersome task. In Step 4, running the models sequentially requires a longer run time, while parallel model runs need more effort to set up the model. Furthermore, the model takes hours to days to finish a single simulation, even assuming unrestricted access to the computing resources. Once the simulations are finished (Step 6), all model outputs need to be integrated.

The following sections introduce the MaaS prototype for ModelE and analyze the effectiveness of this prototype for this study case.

4.2. Prototype

A private cloud platform is established on Eucalyptus version 2.0 (footnote 9), serving as the cloud environment for the prototype. The underlying hardware for this private cloud consists of six physical machines (8-core CPU running at 2.35 GHz, 16 GB of RAM) connected by 1 Gbps Ethernet. In total, 40 VMs (1-core CPU running at 1 GHz and 2 GB of RAM) can be provisioned in the cloud. The ModelE VM image is built on Linux Ubuntu 4.3 using the Eucalyptus API (Fig. 7).
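As a quick consistency check on the prototype figures, the sketch below compares the aggregate physical capacity with what 40 VMs of the stated size require. It assumes no CPU or memory overcommitment; the paper does not state how the remaining capacity is used, and the function name `pool_headroom` is hypothetical.

```python
def pool_headroom(hosts, cores_per_host, ram_per_host_gb,
                  vms, cores_per_vm, ram_per_vm_gb):
    """Return (spare_cores, spare_ram_gb) after placing all VMs, without overcommit."""
    total_cores = hosts * cores_per_host
    total_ram_gb = hosts * ram_per_host_gb
    used_cores = vms * cores_per_vm
    used_ram_gb = vms * ram_per_vm_gb
    return total_cores - used_cores, total_ram_gb - used_ram_gb


if __name__ == "__main__":
    # Figures from the prototype description: 6 hosts (8 cores, 16 GB each),
    # 40 VMs (1 core, 2 GB each).
    spare_cores, spare_ram = pool_headroom(6, 8, 16, 40, 1, 2)
    print(f"spare cores: {spare_cores}, spare RAM: {spare_ram} GB")
```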

8 http://www.giss.nasa.gov/tools/modelE/.

9 http://www.eucalyptus.com/.

Fig. 8A shows the Web Portal GUI after a user has logged in. The running status of the tasks submitted by this user is displayed in a dynamic table. An existing web-based visual analytic system optimized by various performance-improving techniques (Li et al., 2013; Sun et al., 2012) is incorporated into the Data Server and integrated with the Web Portal, enabling users to visualize the model output directly on the web without downloading the data. The original model output can also be downloaded from the Data Server (Fig. 8B).

4.3. MaaS evaluation

This section evaluates the efficiency of MaaS by illustrating how MaaS helps scientists in the study case (Section 4.1) in three aspects: model setup, model run, and model output handling.

4.3.1. Evaluation of model setup

An experiment is conducted to evaluate the value of MaaS by comparing the setup of ModelE with and without MaaS. Ten users with different backgrounds were selected (denoted User1, User2, …, User10). The users were provided with the same MaaS user manual and ModelE configuration file (.r file), and each set up and ran ModelE with the configuration file using MaaS. Of the ten, only User1 and User2 also set up ModelE manually, because this task is complex and is needed only when the model is first set up in MaaS. Both User1 and User2 have a geoscience background; User1 has high-level and User2 limited Linux administrative skills. The two users were provided with the same installation guide for ModelE and the same Linux machine.

The time spent setting up the model by each user (Table 1) shows that User1 and User2 spent about 1.2 and 2.3 h, respectively, on manual setup. Conversely, with MaaS, all ten users required <4 min for configuring and submitting the job. Once the job was submitted, MaaS provisioned the model environment (starting an instance, transferring input files, and compiling and starting the model) in about 4 min (3.9–4.5 min; Table 1). Thus, MaaS not only reduced the user interaction time for setting up the model from hours to minutes, but also allowed those with little system administration experience to proceed expeditiously.


Fig. 7. Physical architecture for the MaaS prototype.

Fig. 6. Workflows for simulating and analyzing 14 different scenarios using ModelE: (A) without using MaaS and (B) using MaaS.



Fig. 8. (A) The Web Portal GUI showing the running status of the submitted tasks and (B) model output progressively transmitted to data server to ensure data availability.

Table 1. Comparison of time spent on setting up the ModelE environment with/without MaaS (MaaS usage does not include the time for registering ModelE into MaaS).

          Manual setup        MaaS                MaaS
User      User interaction    User interaction    Auto provision
          time (h)            time (min)          time (min)
User1     ∼1.2                1.3                 4.3
User2     ∼2.3                1.5                 4.2
User3     –                   1.9                 4.2
User4     –                   1.5                 3.9
User5     –                   2.8                 4.3
User6     –                   1.9                 4.2
User7     –                   1.6                 4.1
User8     –                   3.4                 4.5
User9     –                   1.5                 4.2
User10    –                   2.3                 4.0
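The values in Table 1 can be summarized with a few lines of Python. The sketch below only restates the table: the variable names are hypothetical, the manual setup times are approximate in the source, and the ratio compares user interaction time only (it does not include the automatic provisioning time).

```python
from statistics import mean

# Values transcribed from Table 1.
maas_interaction_min = [1.3, 1.5, 1.9, 1.5, 2.8, 1.9, 1.6, 3.4, 1.5, 2.3]
auto_provision_min = [4.3, 4.2, 4.2, 3.9, 4.3, 4.2, 4.1, 4.5, 4.2, 4.0]
manual_setup_h = {"User1": 1.2, "User2": 2.3}  # approximate values

mean_interaction = mean(maas_interaction_min)
mean_provision = mean(auto_provision_min)

# Ratio of manual interaction time to MaaS interaction time for the two
# users who performed both setups (User1 and User2 are the first two rows).
interaction_ratio = {
    user: hours * 60 / maas_interaction_min[i]
    for i, (user, hours) in enumerate(manual_setup_h.items())
}

if __name__ == "__main__":
    print(f"mean MaaS interaction time: {mean_interaction:.2f} min "
          f"(max {max(maas_interaction_min)} min)")
    print(f"mean auto provision time:   {mean_provision:.2f} min "
          f"(range {min(auto_provision_min)}-{max(auto_provision_min)} min)")
    for user, ratio in interaction_ratio.items():
        print(f"{user}: manual setup took about {ratio:.0f}x the MaaS interaction time")
```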


4.3.2. Evaluation of model run and model output handling

4.3.2.1. Many-model-run performance. The capability and reliability of provisioning multiple model instances to support many model runs in the study case were evaluated in another experiment by provisioning 14 model runs with 14 different input configurations. These model input configurations came from 300 ensemble runs provided by NASA scientists. The simulation period was from December 1949 to January 1961, with a spatial resolution of 4° × 5° and a monthly time resolution. Fig. 9 illustrates the timeline for the 14 model runs, and Fig. 10 shows the running status in the Web Portal. All model instances were successfully provisioned (m1.small instance type) and running ModelE within 6 min (Fig. 9). The 14 model runs finished in ∼5 days, and the model instances were automatically terminated <10 s after completion of the model run.

Fig. 9. Timeline for provisioning and running the 14 model runs with a 10-year simulation period. The x-axis is the time spent on the three stages, while the y-axis is the 14 model runs (for better visualization, the timeline is not evenly scaled).

4.3.2.2. Model output handling. Each run generated 2088 megabytes (MB) of data, and a total of 28.55 GB of data were produced and uploaded. ModelE generated the data monthly, and the size of each monthly dataset was ∼16 MB. The progressive data uploading mechanism ensures that when a model run is finished, an instance needs to wait only for the upload of the last monthly dataset (16 MB) before termination, which took ∼4 s in the testing environment. Once the model runs are finished, users can analyze, visualize and compare model outputs directly through the integrated online visual analytic system without downloading and managing the data. The yearly mean “net thermal radiation” for three simulation scenarios for a selected study area from 1951 to 1960 is shown in Fig. 11.
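The output volumes reported for the 14 model runs can be cross-checked with simple arithmetic. The sketch below makes two assumptions that are not stated in the paper: that 1 GB is counted as 1024 MB, and that the throughput observed for the last monthly dataset (16 MB in about 4 s) would also hold for a hypothetical bulk upload at the end of a run. Variable names are illustrative.

```python
RUNS = 14
MB_PER_RUN = 2088        # output per model run, from the text
MONTHLY_MB = 16          # approximate size of one monthly dataset
LAST_UPLOAD_S = 4        # approximate upload time of the last monthly dataset

total_mb = RUNS * MB_PER_RUN
total_gb = total_mb / 1024                        # assumes 1 GB = 1024 MB

throughput_mb_s = MONTHLY_MB / LAST_UPLOAD_S      # implied upload throughput
# Hypothetical alternative: upload a whole run's output only after it finishes.
batch_upload_s = MB_PER_RUN / throughput_mb_s

if __name__ == "__main__":
    print(f"total output: {total_mb} MB = {total_gb:.2f} GB")
    print(f"implied throughput: {throughput_mb_s:.1f} MB/s")
    print(f"post-run wait, progressive: ~{LAST_UPLOAD_S} s; "
          f"batch (hypothetical): ~{batch_upload_s:.0f} s per run")
```

Under these assumptions the reported total of 28.55 GB is reproduced, and a non-progressive upload would keep each VM alive for several additional minutes rather than a few seconds.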


Fig. 10. 14 Model runs successfully provisioned and finished with a single request.


4.3.3. Workflow improvements for the study case using MaaS

The new workflow for running models when using MaaS is shown in Fig. 6B. With MaaS, the researcher manually conducts Steps 1 and 2, which take a few minutes. The complex and time-consuming tasks, such as configuring machines and installing and compiling models, are automated by MaaS (Step 3). The large computing resources required by model runs are rapidly provisioned by MaaS. Furthermore, all model outputs are managed by MaaS, and the researcher directly analyzes the simulated data using the online visual analytic system. Based on the above experiments, it is concluded that MaaS significantly expedites model setup, model run and model output analysis.

Fig. 11. Yearly mean “net thermal radiation” for three simulation scenarios for a selected study area (visual analytic system adopted from Li et al. (2013)).

Table 2. Geospatial modeling challenges addressed by key features of MaaS.

Challenges/MaaS features F1 F2 F3 F4 F5 F6 F7 F8 F9
Model setup complexity × × × × × ×
Computing intensive × × × ×
Scalable requirement × × ×
Data intensive × × ×
On demand research × × × × ×

10 http://www.sgi.com.

5. Conclusion and discussion

This paper proposes a MaaS framework to address geoscience modeling challenges by publishing various geoscience models as services. Methodologies for designing and implementing MaaS are presented. To test the feasibility of the framework, a prototype MaaS system is developed. Experimental results show that the proposed MaaS significantly speeds up geoscientists’ modeling activities.

5.1. Key features of MaaS

The key features of MaaS are summarized as follows. The challenges addressed by these features are summarized in Table 2.

• (F1). New geoscience models are published to MaaS by creating new model VM images and registering them in MaaS. This feature is enabled by the pluggable approach in the MaaS design and the image-based mechanism offered by cloud computing.

• (F2). The model configuration files and model input data are prepared outside of MaaS and then uploaded to the model VM through MaaS using three standard “channels.” This flexibility enables researchers to run a model many times with different configurations without rebuilding the model environment.

• (F3). Once a model VM image is created, a modeler who needs to modify the model code can easily create a new version of the VM image containing the modified model based on the existing model VM image. Once the model VM image is updated, anyone who requests to run this model will get the latest model environment. This feature enables modelers to easily test and propagate their models.

• (F4). The specifications of the model VM (e.g., CPU, RAM and disk) are specified by the user when submitting the model run request, based on the resource consumption characteristics of the model. This flexibility ensures that the model VMs are tailored for specific models, which maximizes the load on each model VM while ensuring computational efficiency. For example, if a model can only utilize one CPU core, a small VM instance with a one-core CPU is used.

• (F5). The model VM image can be downloaded directly from MaaS, allowing users to quickly run the model in other cloud environments, such as their own private cloud platform (which needs to be EC2 compatible) or Amazon EC2. This feature enables the models to be installed once and run many times on any compatible platform.

• (F6). Ensemble model runs are effectively handled by provisioning many model VMs simultaneously and conducting these model runs in parallel. Users prepare the model configuration files, upload the input data and submit the request; MaaS handles the remaining tasks.

• (F7). Concurrent model run requests are handled by provisioning many model VMs simultaneously, with each VM dedicated to one user (if the computing pool is large enough). If the computing pool is not large enough to handle all requests concurrently, MaaS handles the requests following a “first-come, first-served” policy.

• (F8). The progressive data transmission mechanism enables the model output to be “pushed” to the user as soon as it is available and also dramatically reduces the waiting time for data uploading to the data server.

• (F9). The model run request, together with the model configuration and input data, is submitted directly from a web interface. The progress of all model runs is periodically updated in the web interface. Finally, the model outputs can be downloaded from the Data Server and visualized in the web interface. This feature enables users to conduct scientific research in a web-based environment.
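Features F2, F4 and F6 together suggest what a model run request has to carry: the model name, a user-chosen VM specification, and one configuration file per ensemble member. The sketch below is a hypothetical data structure, not the MaaS API; the class and function names, the configuration file names and the 10 GB disk size are all assumptions made for illustration (the one-core, 2 GB VM and the `.r` file extension follow the prototype description).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VMSpec:
    """User-specified VM size for a model run (F4)."""
    cpu_cores: int
    ram_gb: int
    disk_gb: int


@dataclass(frozen=True)
class RunTask:
    """One model run: one model VM started from the model's image (F6)."""
    model: str
    config_file: str
    vm: VMSpec


def expand_request(model: str, config_files: Sequence[str], vm: VMSpec) -> List[RunTask]:
    """Expand an ensemble request into one task per configuration file (F2, F6)."""
    if not config_files:
        raise ValueError("at least one configuration file is required")
    return [RunTask(model, cfg, vm) for cfg in config_files]


# Hypothetical request mirroring the study case: 14 scenarios, one-core VMs.
tasks = expand_request(
    "ModelE",
    [f"scenario_{i:02d}.r" for i in range(1, 15)],
    VMSpec(cpu_cores=1, ram_gb=2, disk_gb=10),
)

if __name__ == "__main__":
    print(len(tasks), "tasks; first:", tasks[0])
```

Each resulting task could then be handed to a scheduler that provisions VMs and maintains a wait list when the pool is limited (F7).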

5.2. Advantages of MaaS

Traditional computing infrastructures (e.g., an HPC server) have limited availability, accessibility, and scalability. Traditionally, to run a model, researchers need to start by purchasing the infrastructure or waiting for account approval to access public HPC resources (e.g., SGI; footnote 10). The model configuration, data input, preprocessing and post-processing have to be done each time the HPC is allocated.

Compared to the traditional HPC server, MaaS offers several distinct advantages. First, users access the computing resources and run the model on demand without waiting. Second, MaaS enables users to publish new geoscience models to MaaS with a pluggable approach, so a published model can be reused whenever it is reconfigured or run again. Third, an HPC server cannot elastically provision computing resources to satisfy on-demand computing needs. If a researcher conducts a large ensemble experiment by running a model hundreds of times, a long wait time is required as the runs are sequential. Conversely, with MaaS, hundreds of virtual machines are provisioned in a few minutes to conduct the runs in parallel.

IaaS serves as the fundamental platform on which to build MaaS, just as IaaS serves as the foundation for PaaS and SaaS. IaaS provides an image-based mechanism, which is essential for MaaS to create a ready-to-go model environment. The on-demand, elastic and scalable computing resources offered by IaaS also enable MaaS to accommodate the computing-intensive challenge posed by the geoscience community. However, IaaS serves only as the underlying computing infrastructure (similar to bare-metal machines) for MaaS, and the aforementioned MaaS features are not provided by IaaS by default (except features F4 and F5). The mechanisms of scheduling, monitoring, model registering and input/output handling are provided by MaaS.

The support for these features on the three platforms (HPC, IaaS and MaaS) is compared in Table 3.

5.3. User roles potentially benefited from MaaS

The proposed MaaS benefits a wide range of users across the geoscience community and other domains. To run a model registered in MaaS, a geoscientist only needs to log on to the MaaS Web Portal, configure the model input, submit the model run request and monitor the running status in the Web Portal. The model output can be directly analyzed in the integrated online visual analytic application and shared with other team members. Hence, MaaS enables model users to set up and effectively run a model in minutes without dealing with complex model setup procedures or the required computing resources.

Geoscience modelers can publish their models to MaaS by creating and publishing VM images to MaaS with a pluggable approach. The published models are accessible by other users through MaaS. Thus, MaaS provides a new mechanism for modelers to share their computational framework. In addition, MaaS can expedite model development by making it easy to validate and calibrate models: modelers can reuse models already published, run a model many times with different configurations, and compare the model output directly on the Web.

Table 3. Supportability of the key features of MaaS by the three platforms (HPC, IaaS, MaaS).

Platforms/MaaS features F1 F2 F3 F4 F5 F6 F7 F8 F9
MaaS × × × × × × × × ×
IaaS × × × × ×
HPC × × ×

Finally, models help students understand important scientific processes and provide a platform to explore relevant processes (Grosslight, Unger, Jay, & Smith, 1991; Jacobson & Wilensky, 2006). MaaS can be used as a tool in the classroom, facilitating teaching and learning. For example, by logging on to MaaS through a browser, students can easily run a model with different configurations and compare the outputs.

5.4. Future research

The proposed MaaS architecture and prototype system have limitations. For the model setup, the research reported herein focused on a limited number of users (N = 10). Further research is desired to improve the MaaS architecture and development; some needs are suggested below:

• The current architecture supports parallelization at the experiment level but is not optimized for models that run on an HPC cluster, such as the NMM-dust model (Xie et al., 2010). Exploring possible approaches to automatically provision an HPC cluster in a cloud environment is essential for supporting MPI-enabled model parallelization. Huang et al. (2013) introduced detailed steps to manually provision an HPC cluster on Amazon EC2 to run the MPI-enabled NMM-dust model in parallel.

• The current MaaS architecture enables users to manually upload the model input files. A more comprehensive mechanism would be to link online datasets into the model configuration so that the model fetches data from the data server directly. Future research is needed on accepting standard geospatial web services as model input/output to support model chaining and scientific workflows (e.g., adopting the OGC Sensor Observation Service (SOS) for real-time observation data input and the Web Coverage Service (WCS) for raster data input/output).

• A pricing model is desired when implementing MaaS on a public cloud to achieve a pay-as-you-go style for using MaaS. Such a pricing model should recommend what kind of VM should be launched based on the geoscience model type and the assessed costs.
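The pricing model mentioned in the last item is left as future work in the paper. As a purely illustrative sketch of what such a recommendation could look like, the code below picks the cheapest instance type for one model run. Everything in it is an assumption: the instance catalog and prices are made up, the model is assumed to scale ideally up to the number of cores it can use, and billing is assumed to be per whole hour.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    cores: int
    ram_gb: float
    usd_per_hour: float  # hypothetical price, for illustration only


def recommend(catalog: Sequence[InstanceType], min_ram_gb: float,
              usable_cores: int, single_core_hours: float
              ) -> Tuple[InstanceType, float, int]:
    """Return (instance_type, estimated_cost_usd, billed_hours) for the cheapest option.

    Assumes ideal speedup up to `usable_cores` and billing by whole hours.
    Ties are resolved in favour of the earlier catalog entry.
    """
    best = None
    for itype in catalog:
        if itype.ram_gb < min_ram_gb:
            continue  # not enough memory for this model
        cores_used = min(itype.cores, usable_cores)
        hours = math.ceil(single_core_hours / cores_used)
        cost = hours * itype.usd_per_hour
        if best is None or cost < best[1]:
            best = (itype, cost, hours)
    if best is None:
        raise ValueError("no instance type satisfies the memory requirement")
    return best


# Made-up catalog; names and prices are not taken from any real provider.
CATALOG = [
    InstanceType("small", 1, 2, 0.05),
    InstanceType("medium", 2, 4, 0.10),
    InstanceType("large", 4, 8, 0.20),
]

if __name__ == "__main__":
    choice, cost, hours = recommend(CATALOG, min_ram_gb=2,
                                    usable_cores=1, single_core_hours=100)
    print(f"{choice.name}: about ${cost:.2f} for {hours} billed hours")
```

For a model that can use only one core, larger instances add cost without shortening the run, so the smallest adequate instance is selected, which is consistent with the example given for feature F4.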

The objective of this research is to bridge the computing infrastructure and the computing requirements of geospatial science modeling, and this can be easily broadened from the climate sciences to other environmental and urban systems. This paper aims to provide new insights and guidance for geoscientists seeking solutions to address the computing demands of geoscience models. This is achieved by presenting a MaaS framework and strategies to run the models in an innovative way that takes advantage of cloud computing. We expect this approach to benefit a broader geoscience community across environmental and urban domains in the coming decade.

Acknowledgements

Research reported is supported by NSF EarthCube, Polar CI, and Spatiotemporal Innovation Center Programs (ICER-1343759, CNS-1117300, PLR-1349259 and IIP-1338925) and NASA (NNX12AF89G). Dr. George Taylor and Mr. Shawn Dias helped proof an earlier version of the manuscript.

References

Allcock, B., Bester, J., Bresnahan, J., Chervenak, A. L., Foster, I., Kesselman, C., et al. (2002). Data management and transfer in high-performance computational grid environments. Parallel Computing, 28(5), 749–771.

Anderson, D. P. (2004). Boinc: A system for public-resource computing and storage. In Grid computing proceedings. Fifth IEEE/ACM international workshop (pp. 4–10). IEEE.

Anderson, D. P., Cobb, J., Korpela, E., Lebofsky, M., & Werthimer, D. (2002). SETI@home: An experiment in public-resource computing. Communications of the ACM, 45(11), 56–61.

Argent, R. M. (2004). An overview of model integration for environmental applications – Components, frameworks and semantics. Environmental Modelling & Software, 19(2004), 219–234.

Arrigo, J., Brown, J., Kellogg, L., Hwang, L., Peckham, P., & Tarboton, D. (2013). Workshop report. April 22–23, 2013, Boulder, Colorado. <http://geodynamics.org/cig/community/workshops/Earthcube13/ExecSummary> Last accessed 25.07.13.

Bastin, L., Cornford, D., Jones, R., Heuvelink, G., Pebesma, E., Stasch, C., et al. (2013). Managing uncertainty in integrated environmental modeling: The UncertWeb framework. Environmental Modelling & Software, 39, 116–134.

Beckman, P. H. (2005). Building the TeraGrid. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 363(1833), 1715–1728.

Bernholdt, D., Bharathi, S., Brown, D., Chanchio, K., Chen, M., Chervenak, A., et al. (2005). The earth system grid: Supporting the next generation of climate modeling research. Proceedings of the IEEE, 93(3), 485–495.

Chang, H. I., Niyogi, D., Chen, F., Kumar, A., Song, C., Zhao, L., et al. (2008). Developing a TeraGrid based land surface hydrology and weather modeling interface. In Proceedings of the TeraGrid 2008 conference.

Claussen, M., Mysak, L., Weaver, A., Crucifix, M., Fichefet, T., Loutre, M. F., et al. (2002). Earth system models of intermediate complexity: Closing the gap in the spectrum of climate system models. Climate Dynamics, 18(7), 579–586.

Evangelinos, C., & Hill, C. (2008). Cloud computing for parallel scientific HPC applications: Feasibility of running coupled atmosphere–ocean climate models on Amazon’s EC2. In The 1st workshop on cloud computing and its applications (pp. 2–34).

Fernández-Quiruelas, V., Fernández, J., Cofiño, A. S., Fita, L., & Gutiérrez, J. M. (2011). Benefits and requirements of grid computing for climate applications. An example with the community atmospheric model. Environmental Modelling & Software, 26(9), 1057–1069.

Ferraro, R., Sato, T., Brasseur, G., Deluca, C., & Guilyardi, E. (2003). Modeling the earth system. Critical computational technologies that enable us to predict our planet’s future. In Geoscience and remote sensing symposium, 2003. IGARSS’03. Proceedings. 2003 IEEE international (pp. 630–633). IEEE.

Geller, G. N., & Turner, W. (2007). The model web: A concept for ecological forecasting. In Geoscience and remote sensing symposium. IGARSS 2007. IEEE International (pp. 2469–2472). IEEE.

Geller, G. N., & Melton, F. (2008). Looking forward: Applying an ecological model web to assess impacts of climate change. Biodiversity, 9(3–4), 79–83.

Grosslight, L., Unger, C., Jay, E., & Smith, C. L. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching, 28(9), 799–822.

Harrop, C. W., Bernardet, L., Govett, M., Smith, J. S., & Weygandt, S. (2008). A workflow management system for automating weather and climate simulations. In Cires’ annual, Institute-wide symposium.

Hill, C., DeLuca, C., Suarez, M., & Da Silva, A. (2004). The architecture of the earth system modeling framework. Computing in Science & Engineering, 6(1), 18–28.

Huang, Q., Yang, C., Nebert, D., Liu, K., & Wu, H. (2010). Cloud computing for geosciences: Deployment of GEOSS clearinghouse on Amazon’s EC2. In Proceedings of the ACM SIGSPATIAL international workshop on high performance and distributed geographic information systems (pp. 35–38). ACM.

Huang, Q., Yang, C., Benedict, K., Chen, S., Rezgui, A., & Xie, J. (2013). Utilize cloud computing to support dust storm forecasting. International Journal of Digital Earth, 6(4), 338–355.

Jacobson, M. J., & Wilensky, U. (2006). Complex systems in education: Scientific and educational importance and implications for the learning sciences. The Journal of the Learning Sciences, 15(1), 11–34.

Li, Z., Yang, C., Sun, M., Li, J., Xu, C., Huang, Q., et al. (2013). A high performance web-based system for analyzing and visualizing spatiotemporal data for climate studies. Web and wireless geographical information systems (pp. 190–198). Berlin Heidelberg: Springer.

Li, Z., Yang, C. P., Wu, H., Li, W., & Miao, L. (2011). An optimized framework for seamlessly integrating OGC Web Services to support geospatial sciences. International Journal of Geographical Information Science, 25(4), 595–613.

Lindsay, R. W., & Zhang, J. (2006). Assimilation of ice concentration in an ice-ocean model. Journal of Atmospheric and Oceanic Technology, 23(5), 742–749.

Liu, K., Huang, Q., Xia, J., Li, Z., & Lostritto, P. (2013). Chapter 4 how to use cloud computing? In C. Yang, Q. Huang, Z. Li, C. Xu, & K. Liu (Eds.), Spatial cloud computing: A practical approach (pp. 73–89). CRC Press.

Liu, K., Huang, Q., & Xia, J. (2013). Chapter 5 cloud-enabling geoscience applications, 2013. In C. Yang, Q. Huang, Z. Li, C. Xu, & K. Liu (Eds.), Spatial cloud computing: A practical approach (pp. 73–89). CRC Press.

Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. <http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf> Last accessed 15.07.13.

Murphy, J. M., Sexton, D. M., Barnett, D. N., Jones, G. S., Webb, M. J., Collins, M., et al. (2004). Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430(7001), 768–772.

Nativi, S., Mazzetti, P., & Geller, G. N. (2013). Environmental model access and interoperability: The GEO Model Web initiative. Environmental Modelling & Software, 39, 214–228.

Nefedova, V., Jacob, R., Foster, I., Liu, Z., Liu, Y., Deelman, E., et al. (2006). Automating climate science: Large ensemble simulations on the TeraGrid with the GriPhyN virtual data system. In e-Science and grid computing, 2006. e-Science’06. Second IEEE international conference (pp. 32–32). IEEE.



Ostermann, S., Iosup, A., Yigitbasi, N., Prodan, R., Fahringer, T., & Epema, D. (2010). A performance analysis of EC2 cloud computing services for scientific computing. Cloud computing (pp. 115–131). Berlin Heidelberg: Springer.

Pordes, R., Petravick, D., Kramer, B., Olson, D., Livny, M., Roy, A., et al. (2007). The open science grid. Journal of Physics: Conference Series, 78(1), 012057 IOP Publishing.

Roman, D., Schade, S., Berre, A. J., Bodsberg, N. R., & Langlois, J. (2009). Model as a service (MaaS). In AGILE workshop: Grid technologies for geospatial applications, Hannover, Germany.

Schneider, S. H., & Dickinson, R. E. (1974). Climate modeling. Reviews of Geophysics, 12(3), 447–493.

Sun, M., Li, J., Yang, C., Schmidt, G. A., Bambacus, M., Cahalan, R., et al. (2012). A web-based geovisual analytical system for climate studies. Future Internet, 4(4), 1069–1085.

Vecchiola, C., Pandey, S., & Buyya, R. (2009). High-performance cloud computing: A view of scientific applications. In Pervasive systems, algorithms, and networks (ISPAN), 2009 10th international symposium (pp. 4–16). IEEE.

von Deimling, T. S., Held, H., Ganopolski, A., & Rahmstorf, S. (2006). Climate sensitivity estimated from ensemble simulations of glacial climate. Climate Dynamics, 27(2–3), 149–163.

Xie, J., Yang, C., Zhou, B., & Huang, Q. (2010). High-performance computing for the simulation of dust storms. Computers, Environment and Urban Systems, 34(4), 278–290.

Yang, C. P., Cao, Y., Evans, J., Kafatos, M., & Bambacus, M. (2006). Spatial Web portal for building spatial data infrastructure. Geographic Information Sciences, 12(1), 38–43.

Yang, C., Goodchild, M., Huang, Q., Nebert, D., Raskin, R., Xu, Y., et al. (2011). Spatial cloud computing: How can the geospatial sciences use and help shape cloud computing? International Journal of Digital Earth, 4(4), 305–329.

Yang, C., Sun, M., Liu, K., Huang, Q., Li, Z., Gui, Z., et al. (2014). Contemporary computing technologies for processing big spatiotemporal data. In Mei-Po Kwan, Douglas Richardson, Donggen Wang, & Chenghu Zhou (Eds.), Space–time integration in geography and GIScience: Research frontiers in the U.S. and China. Dordrecht: Springer.

Yang, C., Wu, H., Huang, Q., Li, Z., & Li, J. (2011). Using spatial principles to optimize distributed computing for enabling the physical science discoveries. Proceedings of the National Academy of Sciences, 108(14), 5498–5503.

Yang, C., Xu, Y., & Nebert, D. (2013). Redefining the possibility of digital earth and geosciences with spatial cloud computing. International Journal of Digital Earth, 6(4), 1–16.