1st IEEE PerCom Workshop on Pervasive Communities and Service Clouds, 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Seattle, WA, USA, 21-25 March 2011

Cloud Computing Oriented Network Operating System and Service Platform

Jianwei Yin, Yanming Ye, Bin Wu
College of Computer Science, Zhejiang University, Hangzhou, China
[email protected], [email protected], [email protected]

Zuoning Chen
National Parallel Computing Engineering Research Center, Chinese Academy of Sciences, Beijing, China
[email protected]

Abstract—In recent years, cloud computing has swept the world and has become the new direction for future network applications. Traditional operating systems lack constructivity and evolvability and offer little support for application awareness and multi-core platforms, so they cannot satisfy the requirements of cloud computing applications well. The development of a next-generation, cloud-oriented network operating system is therefore significant. In this position paper, we first explain the background of the new network operating system and then survey some existing cloud computing operating systems. After describing the challenges the project faces, we propose a prototype architecture, discuss several key technologies, and elaborate the expected typical scenario of developing and deploying applications on the proposed system.

Keywords- Cloud Computing; Microkernel; Network Operating System; Service Computing

I. INTRODUCTION

In the last two decades, the Internet has connected enterprises and individuals around the world and has made a far-reaching impact on the business operations of every company and the daily routines of every person. As one of the most popular application technologies on the Internet, web applications require vast amounts of storage capacity and computing power to meet growing business needs [1]. In particular, the emergence of the 'cloud computing' model places greater demands on how to obtain mass storage and powerful computing capacity, how to provide services more economically on the Internet, and how to make Internet services more convenient [2][3][4].

Most application systems in cloud computing are built on traditional operating systems. However, their lack of constructivity and evolvability makes it hard to meet the complexity and variability of applications in the cloud computing age. Moreover, traditional operating systems are usually overly large, complex and poorly controllable, so Internet users can hardly use the various resources efficiently to satisfy the individual needs of computation-intensive, data-intensive, network-intensive and other typical network computing models. At the same time, traditional operating systems scarcely support the now-widespread multi-core platforms and hardly provide application-driven, open and efficient resource services.

To address these deficiencies, traditional operating systems must change substantially before they can serve as network computing platforms. Under these circumstances, we received support from the National Key Science and Technology Research Program of China at the end of last year to develop a next-generation network operating system. The project will also provide a service platform, built on this operating system, to facilitate developing and deploying cloud computing applications.

In this paper, the cloud operating system (also called a cloud computing operating system or cloud computing oriented operating system) refers specifically to operating system software that differs from traditional (network) operating systems and excels at the development and deployment of all types of cloud-computing-related software; the cloud computing platform refers to a set of support software built on a given operating system that provides a convenient environment for developing and deploying cloud computing applications; and a cloud computing system refers to application software developed and deployed on existing cloud computing operating systems or platforms.

II. RELATED WORKS

As cloud computing becomes the main model for Internet applications, research institutes and business communities pay more and more attention to the design and application of cloud computing platforms. Several well-known companies have also presented their own cloud computing operating systems and platforms.

(1) Barrelfish [6][7][8][9]

The Barrelfish system was developed by Microsoft Research Cambridge in the UK and ETH Zurich in Switzerland. The system is designed to support multi-core and many-core processors, and inter-core communication is achieved through message passing. Using a built-in database, the system tracks resource usage so that it can accelerate overall processing. Unlike other Microsoft systems, Barrelfish is expected to become open source, but apart from a snapshot version, the system's core code has not yet been opened to the public.

(2) fos [10][11][12][13]

fos is a new multi-core and cloud computing operating system developed by the Carbon Research Group at MIT. fos uses a micro-kernel and supports multi-core processors by providing a single operating system image that shields the differences between multi-core and cloud environments. The system therefore has good compatibility and reliability and, through its message mechanism, can provide most of the services that traditional operating systems offer.

(3) Microsoft Azure [14][5]

Azure is Microsoft's cloud service platform, which relies on Microsoft's data centers. Azure is an integrated solution, providing both a computing infrastructure and a platform for developing applications. Azure is actually composed of a variety of different services in a common platform, such as Live Services, .NET Services, SQL Services, SharePoint Services and Dynamics CRM Services.

(4) Google App Engine (GAE) [5][15]

Google App Engine is a platform for developing and hosting web applications in Google-managed data centers. It virtualizes applications across multiple servers and data centers. The GAE platform includes five parts: the GAE web services infrastructure, the distributed storage service (Datastore), the application runtime environment, the software development kit (SDK) and the management console (Admin Console).

All of the above systems and platforms sit on the cloud side; there also exist many web operating systems on the terminal side which can integrate web and local applications easily. The better-known web operating systems include Google Chrome OS, DeviceVM Splashtop, Windows Cloud, Jolicloud and Red Flag in Mini, among others.

In summary, the development of network operating systems is trending toward micro-kernel design, virtualization and transparency. But current network operating systems are not flexible enough: they cannot adapt to the complex and volatile network computing environment and cannot meet highly flexible application requirements.

III. CHALLENGES

To support cloud computing systems, an operating system must meet many challenges. In our project, we focus on the following challenges.

1. How to provide context-aware components and applications in a heterogeneous network environment.

In a traditional service computing network, the systems on each node are homogeneous, while the requirements of actual applications are not, and this heavily impacts the performance of each node. Service computing networks generally use a distributed architecture, and advances in communication technology together with the variety of working environments lead to an ever greater distribution of information processing. Yet the vast majority of current distributed systems are homogeneous: each node in the network is made up of the same type of computing and communication hardware, or has the same operating system, programming languages and development tools. In recent years, distributed systems for heterogeneous networks have made some progress, but most of them are still based on traditional operating systems and some only implement a communication layer. These systems focus on application-layer programming and can hardly cope with the increasingly complex network environment; load balancing between nodes is also greatly restricted. In fact, the role and function of each node in a real environment differ, so the required hardware and software configurations differ as well. A design based on a traditional operating system makes the actual capability of each node similar, which heavily affects per-node performance. Therefore, in the target system we will use a micro-kernel to achieve basic scheduling and management of resources, while other system functions, basic services and software will be encapsulated into context-aware, self-adaptive pluggable components. In practical use, each node only needs to install the micro-kernel; the corresponding context-aware, self-adaptive components will then be installed automatically as needed during operation. If a component is no longer in use, it can also be uninstalled automatically to achieve on-demand configuration.
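The on-demand component model described above can be pictured as a small manager that re-evaluates context rules whenever the node's situation changes and installs or uninstalls components accordingly. The sketch below is illustrative only; ComponentManager, Component and NodeContext are hypothetical names of our own, not part of the vStar design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of an on-demand, context-aware component manager.
// It only illustrates the install/uninstall-by-context idea of Challenge 1.
public class ComponentManager {

    /** A pluggable component that can be started and stopped at runtime. */
    interface Component {
        void install();
        void uninstall();
    }

    /** Simplified node context: which workload profile is currently active. */
    record NodeContext(String workload) {}

    private final Map<String, Component> catalog = new HashMap<>();
    private final Map<String, Predicate<NodeContext>> activationRules = new HashMap<>();
    private final Map<String, Component> installed = new HashMap<>();

    /** Register a component together with the context rule that activates it. */
    void register(String name, Component c, Predicate<NodeContext> rule) {
        catalog.put(name, c);
        activationRules.put(name, rule);
    }

    /** Re-evaluate all rules whenever the node context changes. */
    void onContextChange(NodeContext ctx) {
        for (var entry : activationRules.entrySet()) {
            String name = entry.getKey();
            boolean wanted = entry.getValue().test(ctx);
            boolean present = installed.containsKey(name);
            if (wanted && !present) {            // install on demand
                Component c = catalog.get(name);
                c.install();
                installed.put(name, c);
            } else if (!wanted && present) {     // uninstall when no longer used
                installed.remove(name).uninstall();
            }
        }
    }

    public static void main(String[] args) {
        ComponentManager mgr = new ComponentManager();
        mgr.register("network-storage",
                new Component() {
                    public void install()   { System.out.println("storage component installed"); }
                    public void uninstall() { System.out.println("storage component removed"); }
                },
                ctx -> ctx.workload().equals("data-intensive"));

        mgr.onContextChange(new NodeContext("data-intensive")); // component appears
        mgr.onContextChange(new NodeContext("compute-only"));   // component is removed
    }
}
```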

2. How to overcome the difficulty of server management and maintenance and the low resource utilization in large-scale network environments with vast numbers of servers.

According to the IDC market report for China, the total number of servers is approaching one million; Tencent alone has about 100 thousand servers, Alibaba has nearly 30 thousand, Netease about 20 thousand, and so on. On one hand, such a large number of servers can provide quality services for users; on the other hand, they also cause a series of problems. First of all, how can each data center ensure effective and timely maintenance of every server and reliable operation of its systems in such a large-scale network? The traditional network management model works in a single-deployment way: one network management system can only be installed on one monitoring server. The deficiency of this approach is that the capability of management and maintenance is limited by the performance of that monitoring server. When the network size exceeds the limit of the server's capacity, we have to add other monitoring servers manually and install the network management system on each of them, which leads to a number of parameter reconfigurations. This approach therefore not only greatly increases human and material costs, but also carries the risk of a single point of failure. Secondly, in a complex network environment, because of the need for disaster recovery, load balancing and so on, doubling the number of servers does not result in a proportional increase in processing power and storage capacity, so a large amount of system resources is wasted. In the target system, we will adopt transparency computing technology to solve the difficult problem of management and maintenance, and use virtualization technology to solve the problem of low resource utilization.

3. How to overcome the extra overhead caused by overlapping API calls and achieve user-policy-driven seamless integration of the operating system and applications.

API calls are ubiquitous in software development; they facilitate rapid development but also cause a series of problems. First of all, each API call inevitably produces some overhead, which may ultimately cause a large loss of performance in systems with frequent API calls. Secondly, the API-call approach is not conducive to cross-platform portability: whenever changes happen in the underlying operating system or in the service components, the application may have to be recompiled to run properly. Finally, API calls isolate the application from the operating system, so the upper application can hardly obtain the same protection and optimization as the operating system itself. This project will study new methods of associating different programs and systems. The initial plan is to use automatic mapping technology, which can make important applications run in kernel mode and automatically compile the various related programs into a whole. This approach can speed up the execution of the programs and bridge the gap between the applications and the operating system.

4. How to extract the common features of all kinds of applications and build service components in a standard form so that application developers can achieve rapid development.

The rapid development of quality applications is an important factor in promoting the target system. In the field of software development, many techniques and methods already exist to avoid duplicated development and to accelerate delivery, such as open-source software, function libraries, framework-based development and code refactoring. Building on basic research into these technologies, we will focus on the common features of all kinds of applications and services on the target system to build a service platform in which developers can obtain or implement standard components. The expected platform will facilitate the rapid implementation of business processes and free developers from the specific implementation of various resource scheduling and services.

IV. PROTOTYPE

4.1 Architecture

In this section we describe the main architecture of the system that we intend to implement. The system has two subsystems, as shown in Fig. 1: the cloud-side operating system and service platform (C-OSSP) and the terminal-side operating system (T-OS). The difference is that the cloud-side operating system and service platform runs on the server nodes in the network and can load and configure different service components and supporting software dynamically according to the actual situation of each node, while the terminal-side operating system runs on user client devices to facilitate dynamic access to the cluster. C-OSSP is the core of the system; it supports cloud computing and transparency computing and consists of a foundation service layer (FSL) and a network service layer (NSL). The FSL is composed of the vStarOS kernel, the vStarOS Engine and vStarEnv; the NSL is composed of application service components (JTang++), service management components and integrated development toolkits (IDT). T-OS is composed of a foundation OS and web access service components.

Figure 1. Prototype architecture

The vStarOS Kernel and vStarOS Engine in the FSL are together also called the vStar Meta OS, as shown in Fig. 2.

vStarOS Kernel

The vStarOS Kernel adopts a distributed micro-kernel architecture. It is lightweight, runs in kernel mode, and provides a message protection layer, a time-sharing multiplexing mechanism, virtual processor support, basic security mechanisms and a set of kernel interfaces. The architecture is designed to favor implementation on multi-core processing platforms. In a multi-core environment, each core runs its own vStar kernel image. Within the vStarOS Kernel, the Vdriver is in charge of abstracting one or more virtual devices from each physical device. The Core driver is the management unit that runs directly on the processors. Communication between processors is the responsibility of the Communicator unit. The Coordinator is the coordination management unit, responsible for managing the global state.

Figure 2. vStar Meta OS Layered Structure
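To make the per-core, message-based structure concrete, the toy model below runs one "kernel image" per core and lets them interact only through a Communicator that owns one message queue per core. The class and method names are our own; they echo the roles named above (Communicator, per-core kernel image) rather than any published vStar interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative model (not vStar code): one kernel image per core,
// communicating exclusively through messages rather than shared state.
public class PerCoreKernelDemo {

    record Message(int fromCore, String payload) {}

    /** Hypothetical Communicator: one inbox per core, no shared memory. */
    static class Communicator {
        private final List<BlockingQueue<Message>> inboxes = new ArrayList<>();

        Communicator(int cores) {
            for (int i = 0; i < cores; i++) {
                inboxes.add(new ArrayBlockingQueue<>(16));
            }
        }

        void send(int toCore, Message m) throws InterruptedException {
            inboxes.get(toCore).put(m);
        }

        Message receive(int core) throws InterruptedException {
            return inboxes.get(core).take();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Communicator comm = new Communicator(2);

        // Core 1's kernel image: waits for a request and replies by message.
        Thread core1 = new Thread(() -> {
            try {
                Message req = comm.receive(1);
                comm.send(req.fromCore(), new Message(1, "ack: " + req.payload()));
            } catch (InterruptedException ignored) { }
        });
        core1.start();

        // Core 0's kernel image: sends a request and waits for the reply.
        comm.send(1, new Message(0, "map virtual device"));
        System.out.println(comm.receive(0).payload());
        core1.join();
    }
}
```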

vStarOS Engine

The vStarOS Engine runs on top of the vStarOS Kernel, and its main function is to build a flexible execution environment for applications through engine components. There are several types of engines, including the security engine, message engine, device engine, scheduling engine and so on. Each of them can have one or more instances, running on and bound to different kernels.

vStarEnv

The vStarEnv is the top layer of the FSL, providing infrastructure services and interfaces for the NSL. vStarEnv hides all the details of vStarOS and thus makes the system safer and more reliable. Through the virtual machine layer, the upper NSL and applications can be implemented much more flexibly. vStarEnv takes full advantage of the vStarOS core interfaces to constitute virtual machines, networks and application operating environments. The services that vStarEnv provides include meta-service management, virtual machine service, storage service, network service and computing service.
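The following sketch illustrates, under assumed names, how NSL-level code might consume vStarEnv services through a narrow facade without ever touching vStarOS interfaces directly. The VStarEnv interface and its methods are hypothetical stand-ins for the storage, computing and virtual machine services listed above, not the real vStarEnv API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hedged sketch: a facade that hides all vStarOS details from the NSL.
public class VStarEnvSketch {

    /** Facade offered to the NSL; only service-level operations are visible. */
    interface VStarEnv {
        void put(String key, byte[] data);                 // storage service
        byte[] get(String key);
        <T> T compute(Callable<T> task) throws Exception;  // computing service
        String createVm(int vcpus, int memMb);             // virtual machine service
    }

    /** Toy in-memory stand-in so the sketch can actually run. */
    static class InMemoryEnv implements VStarEnv {
        private final Map<String, byte[]> store = new HashMap<>();
        private int vmCounter = 0;

        public void put(String key, byte[] data) { store.put(key, data); }
        public byte[] get(String key)            { return store.get(key); }
        public <T> T compute(Callable<T> task) throws Exception { return task.call(); }
        public String createVm(int vcpus, int memMb) { return "vm-" + (++vmCounter); }
    }

    public static void main(String[] args) throws Exception {
        VStarEnv env = new InMemoryEnv();
        String vm = env.createVm(2, 2048);
        env.put("vm:" + vm + ":state", "running".getBytes());
        Integer answer = env.compute(() -> 6 * 7);
        System.out.println(vm + " state=" + new String(env.get("vm:" + vm + ":state"))
                + ", task result=" + answer);
    }
}
```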

JTang++

JTang++ (the NSL) is a series of common service libraries built on the FSL, which offers network infrastructure services for the upper applications based on vStarEnv and the JTang middleware series (developed by our group). JTang++ has three parts: the basic services unit, the service management unit and the integrated development toolkits. Among these, the basic services unit is the most important; it encapsulates a series of basic services to support application service provision. JTang++ can act as a simple platform that supports average cloud computing applications. Besides, JTang++ offers various types of application and service frameworks that will contribute to the rapid building of complex cloud computing platforms. By analyzing cloud computing applications, we distinguish three typical computing models: computation-intensive service frameworks, data-intensive service frameworks and network-intensive service frameworks.

Computation-intensive services mainly rely on computing resources; they include parallel computing services, search engine services and mass data query services.

Data-intensive services mainly rely on storage resources; they include non-SQL data services, network storage services and data cache services.

Network-intensive services mainly rely on network traffic bandwidth; they include cross-media services, message services and location services.

The service management unit provides a series of unified management and maintenance methods for the various types of services, mainly including service encapsulation and access, full service life-cycle management, service security maintenance, service optimization and directory service. It also contains two important sets of interfaces, a development interface and a deployment interface, which facilitate developing and deploying applications.
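As a minimal sketch of what the deployment interface of the service management unit could look like, the code below lets a developer hand over an artifact plus a few key parameters. The ServiceDescriptor and DeployInterface names, and the idea of selecting a framework by a string label, are our assumptions for illustration rather than the actual JTang++ API.

```java
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: a possible shape for the deployment interface.
public class DeploySketch {

    /** What a developer hands over: an artifact plus a few key parameters. */
    record ServiceDescriptor(String artifact, String framework, Map<String, String> params) {}

    interface DeployInterface {
        String deploy(ServiceDescriptor d);   // returns a service id
        void undeploy(String serviceId);
    }

    /** Toy in-memory implementation standing in for the real management unit. */
    static class InMemoryDeployer implements DeployInterface {
        private final Map<String, ServiceDescriptor> live = new java.util.HashMap<>();

        public String deploy(ServiceDescriptor d) {
            String id = UUID.randomUUID().toString();
            live.put(id, d);
            System.out.printf("deployed %s on %s framework%n", d.artifact(), d.framework());
            return id;
        }

        public void undeploy(String serviceId) {
            live.remove(serviceId);
        }
    }

    public static void main(String[] args) {
        DeployInterface deployer = new InMemoryDeployer();
        // The developer only supplies the artifact and a few parameters;
        // the platform is assumed to pick the matching framework.
        String id = deployer.deploy(new ServiceDescriptor(
                "order-query-service.jar", "data-intensive",
                Map.of("replicas", "3", "cache", "enabled")));
        deployer.undeploy(id);
    }
}
```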

In the cloud computing architecture, all service resources reside in the cloud and the terminal should be able to obtain on-demand services without a high-end configuration; therefore user terminals must change to accommodate the future cloud computing environment. In our prototype, we also design a terminal-side operating system (T-OS) to fit these cloud characteristics. Being a client of the C-OSSP, the T-OS is still based on the vStar Meta OS with its micro-kernel and virtualization architecture and benefits from the security, flexibility and transparency features of vStar Meta OS. The T-OS can drop some of the services provided by vStar Meta OS as needed, but the following services are essential: meta-service management, network service, display service and virtualization service. The browser becomes the most important, or even the only, means of access to services, so we retain the core part of the traditional web browser and implement some new technologies on top of it, such as push, cling, widgets and so on.

4.2 Key technologies

Virtualization is the key technology of cloud computing. In this project, virtualization will be implemented as a module that can be selectively added into the operating system; in this way, the system can greatly improve compatibility, performance and flexibility. Besides, hardware virtualization in the system is oriented toward the underlying mechanism through which the actual physical characteristics of computing resources are hidden from complex applications and end users. Flexible configuration of virtual resources, live migration of virtual machines, load balancing, rapid deployment of virtual machines or applications, fault isolation and other features all benefit from virtualization.

Several key technologies of the project revolve around the three types of network application services. For example, semantic-based integration and querying of mass heterogeneous data can support computation-intensive and data-intensive applications. This technology uses semantic ontology annotation of heterogeneous data to remove semantic ambiguity, and uses semantic inference to realize semantic associations between heterogeneous data. It then rewrites and decomposes queries over massive data into several efficient distributed queries on the different heterogeneous data sources using the Map/Reduce parallel method. Distributed-file-server-based cloud storage technology can support data-intensive applications. This technology integrates the cluster with the distributed file system and coordinates the operation of multiple storage devices in cloud storage, organizing inexpensive storage devices into a huge storage space that provides a unified service in a transparent way. Traffic control and data transmission technology that supports high-frequency access can provide solutions for network-intensive applications, enhancing their timeliness and efficiency.
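As a purely conceptual illustration of the query decomposition just described, the sketch below splits one logical query into per-source sub-queries (the map phase) and merges the partial results (the reduce phase). The source names, the Row type and the query shape are invented for this example and are not taken from the project.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch (not the project's implementation) of decomposing one
// query into per-source sub-queries and merging the partial results.
public class FederatedQuerySketch {

    record Row(String source, String entity, int value) {}

    /** Pretend data sources; in the real system these would be heterogeneous stores. */
    static final Map<String, List<Row>> SOURCES = Map.of(
            "relational-db", List.of(new Row("relational-db", "orders", 120)),
            "document-store", List.of(new Row("document-store", "orders", 80)));

    /** "Map" phase: run the rewritten sub-query against each source in parallel. */
    static List<Row> mapPhase(String entity) {
        return SOURCES.values().parallelStream()
                .flatMap(rows -> rows.stream().filter(r -> r.entity().equals(entity)))
                .collect(Collectors.toList());
    }

    /** "Reduce" phase: merge the partial results into one answer. */
    static int reducePhase(List<Row> partials) {
        return partials.stream().mapToInt(Row::value).sum();
    }

    public static void main(String[] args) {
        List<Row> partials = mapPhase("orders");
        System.out.println("total orders across sources: " + reducePhase(partials));
    }
}
```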

Furthermore, performance is an important non-functional property of an operating system. In the service management unit of the NSL, we will implement prediction-algorithm-based adaptive performance optimization for complex systems. This technology provides the appropriate hardware and software configuration according to changes in users and SLAs (Service Level Agreements), maximizing the usage of various resources to achieve optimal performance. It can be applied in both the development and the deployment phases, with the following functions: predicting performance according to the system model; adopting machine-learning-based prediction algorithms to automatically generate reference configurations; carrying out stress tests on the initial system and finding flaws by comparing against the performance predictions; and predicting the next system behavior based on analysis of online data.
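To show the shape of such a prediction-driven control loop, the sketch below uses a simple moving-average forecast in place of the machine-learning predictors the project plans to use; the class, field and threshold names are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Deliberately simple stand-in for the prediction-driven optimizer: a
// moving-average forecast of request load drives a reference configuration
// (here, just an instance count).
public class AdaptiveConfigSketch {

    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize = 5;
    private final double requestsPerInstance = 100.0;  // assumed capacity per instance

    /** Feed one observed load sample (requests/sec) into the window. */
    void observe(double load) {
        if (window.size() == windowSize) window.removeFirst();
        window.addLast(load);
    }

    /** Predict the next load as the window average (placeholder for a learned model). */
    double predictNextLoad() {
        return window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Turn the prediction into a reference configuration. */
    int recommendedInstances() {
        return (int) Math.max(1, Math.ceil(predictNextLoad() / requestsPerInstance));
    }

    public static void main(String[] args) {
        AdaptiveConfigSketch optimizer = new AdaptiveConfigSketch();
        for (double load : new double[]{80, 150, 220, 310, 290}) {
            optimizer.observe(load);
        }
        System.out.println("predicted load: " + optimizer.predictNextLoad());
        System.out.println("recommended instances: " + optimizer.recommendedInstances());
    }
}
```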

V. APPLICATION SCENARIO

The advantage of this system is the use of the micro-kernel and context-aware pluggable components. To build a specific system environment, users only need to install the micro-kernel on each node and then either choose different service components as needed to build on the micro-kernel, or leave the components to load and configure themselves dynamically based on the actual situation. In addition, various middleware (e.g. application servers) can also be encapsulated as pluggable components and then extended with a context-aware, self-configuring service layer. So, by providing only a few key parameters, users can accomplish the deployment of their applications. The typical process is shown in Fig. 3. Developers can develop software rapidly and agilely using the IDT integrated with the Eclipse tool. JTang++ has many common libraries and frameworks, so developers only need to supply some parameters to achieve a certain function or piece of software. At run time, the related libraries are mapped to (or dynamically linked with) each other as a whole. The vStar Meta OS will abstract the underlying physical devices into several virtual machines as needed and share these virtual machines across the whole community. Users can thus deploy their systems on the most suitable virtual machines without understanding the concrete hardware. If a physical device goes down, the virtual machines abstracted from it can move to similar healthy devices and keep running, and users will not even notice the change. Moreover, the virtual layer of the system and the applications will be designed with good facilities for recovery and migration. That is to say, if a virtual machine breaks down, the applications running on it will move rapidly to a quickly recovered virtual machine abstracted from the same physical device, or to a similar one abstracted from a different device.

Figure 3. Application Development and Deployment Scenario
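The failover behaviour just described can be pictured with the following hypothetical sketch: when a virtual machine goes down, its applications are re-bound to a healthy virtual machine, preferring one abstracted from the same physical device and otherwise any healthy one elsewhere in the community. None of the names below come from the prototype.

```java
import java.util.List;

// Hypothetical sketch of migrating applications off a failed virtual machine.
public class FailoverSketch {

    record VirtualMachine(String id, String backingDevice, boolean healthy) {}
    record App(String name) {}

    static VirtualMachine pickReplacement(List<VirtualMachine> pool, VirtualMachine failed) {
        // Prefer a healthy VM abstracted from the same physical device,
        // otherwise fall back to any healthy VM in the community.
        return pool.stream()
                .filter(VirtualMachine::healthy)
                .filter(vm -> vm.backingDevice().equals(failed.backingDevice()))
                .findFirst()
                .orElseGet(() -> pool.stream()
                        .filter(VirtualMachine::healthy)
                        .findFirst()
                        .orElseThrow());
    }

    public static void main(String[] args) {
        VirtualMachine failed = new VirtualMachine("vm-1", "server-A", false);
        List<VirtualMachine> pool = List.of(
                new VirtualMachine("vm-2", "server-A", true),
                new VirtualMachine("vm-3", "server-B", true));

        App app = new App("order-service");
        VirtualMachine target = pickReplacement(pool, failed);
        System.out.println(app.name() + " migrated from " + failed.id() + " to " + target.id());
    }
}
```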

Compared with traditional operating systems, the target system is more suitable for the cloud computing environment. If a server has the C-OSSP installed, it can search the network to find the cluster and request to join it. All the servers already in the cluster will then become aware of the new node and adjust themselves to the change. After that, new virtual machines will be abstracted and all the virtual machines will be shared with one another. Users benefit from the fact that their applications can move from one virtual machine to another, and even from one device to another, without user intervention.

VI. CONCLUSION AND FUTURE WORK

In this paper, we presented a prototype of a cloud-computing-oriented network operating system and service platform, which can facilitate cloud computing systems and meet the constructivity, flexibility and context-awareness challenges. The hierarchical architecture and pluggable components will make the prototype more flexible and better applicable in business fields. This work has been supported by the National Key Science and Technology Research Program of China; more than 100 researchers are working together and expect to release the first version of the prototype system at the end of the year. We look forward to achieving the research outcomes quickly and sharing them with related researchers.

ACKNOWLEDGMENT

This work has been supported by the National Key Science and Technology Research Program of China (2009ZX01043-003-003).

REFERENCES

[1] M. A. Rappa, "The Utility Business Model and the Future of Computing Services," IBM Systems Journal, vol. 43, no. 1, 2004, pp. 32–42.

[2] A. Weiss, “Computing in the Clouds,” netWorker, vol.11, no. 4, 2007, pp. 16–25.

[3] G. Boss, P. Malladi, D. Quan, L. Legregni and H. Hall, "Cloud Computing," IBM White Paper, 2007. http://download.boulder.ibm.com/ibmdl/pub/software/dw/wes/hipods/Cloud_computing_wp_final_8Oct.pdf

[4] K. Sims, "IBM introduces ready-to-use cloud computing collaboration services get clients started with cloud computing," 2007. http://www-03.ibm.com/press/us/en/pressrelease/22613.wss

[5] M. Armbrust, A. Fox, R. Griffith et al., “Above the Clouds: A Berkeley View of Cloud Computing”. Technical report, EECS Department, University of California, Berkeley, Feb 2009

[6] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: a new OS architecture for scalable multicore systems," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, 2009, pp. 29–43.

[7] A. Schüpbach, S. Peter, A. Baumann, T. Roscoe, P. Barham, T. Harris, and R. Isaacs. “Embracing diversity in the Barrelfish manycore operating system”. In Proceedings of the Workshop on Managed Many-Core Systems (MMCS) 2008. ACM, June 2008.

[8] S. Peter, A. Schüpbach, P. Barham, A. Baumann, R. Isaacs, T. Harris and T. Roscoe, "Design Principles for End-to-End Multicore Schedulers," in 2nd Workshop on Hot Topics in Parallelism, Berkeley, CA, USA, June 2010.

[9] ETH Zurich, Microsoft Research Cambridge. The Barrelfish Operating System. http://www.barrelfish.org/.

[10] D. Wentzlaff and A. Agarwal. “The Case for a Factored Operating System (fos)”, MIT CSAIL Technical Report, MIT-CSAIL-TR-2008-060, October 2008.

[11] D. Wentzlaff and A. Agarwal. “Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores”. ACM SIGOPS Operating System Review (OSR), April 2009.

[12] D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal. “A Unified Operating System for Clouds and Manycore: fos”. MIT-CSAIL-TR-2009-059, Nov. 2009, http://hdl.handle.net/1721.1/49844.

[13] D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal. “An Operating System for Multicore and Clouds: Mechanisms and Implementation”. ACM Symposium on Cloud Computing (SOCC), June 2010.

[14] Windows Azure Platform. Available: http://www.microsoft.com/windowsazure/.

[15] Google App Engine. Available: http://code.google.com/appengine/.
