
Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt

Date: 10 Sep, 2017 – Draft v 4.0


Implementing the Mobile Technologies for Data Collection in Egypt – UNECA – Nile University – CAPMAS

Project Infrastructure Report – Draft v 4.0 – 10 Sep 2017


Table of Contents

1. Introduction
2. Infrastructure Reference Architecture
3. Current Status of CPI-Related Solutions
4. Targeted Data Management Continuum
5. Current Infrastructure Architecture
6. Targeted Solution Architecture
7. Recommendations for Applications and Data Management
8. Main Recommended Components
9. Estimated High-Level Sizing and Specifications
10. Conclusion and Next Actions
11. References


1. Introduction

Realizing the advantages of using mobile technology for data collection and statistical production, the United Nations Economic Commission for Africa (ECA) is implementing a series of pilot projects on strengthening the capacity of African countries to use mobile technologies to collect data for effective policy and decision making. The pilot projects are designed to be executed by the National Statistical Office (NSO) in collaboration with a Training and Research Institute (TRI) designated by the NSO. The main partner in the project is the NSO in Egypt, called the Central Agency for Public Mobilization and Statistics (CAPMAS). CAPMAS has in turn designated Nile University as the TRI. The main objectives of the pilot project are as follows:

Strengthen the country's capacity to collect data with mobile technology;

Experiment with self-enumeration using mobile devices to collect data, and determine the suitability of such data for the production of statistics;

Strengthen the working relationship between the NSO and the TRI in statistical development.

The focus of this report is to support CAPMAS in installing and/or upgrading the technical infrastructure, including computer servers and software, needed to receive data from the project and integrate it into standard statistical processes in Egypt, as well as in acquiring handheld devices. Based on several meetings and assessment sessions with the CAPMAS team, this report describes the current infrastructure and the targeted upgrades. It concludes with sizing estimates and recommendations for Big Data components and platforms.

The main infrastructure achievement at CAPMAS is its virtualized data center, which is recommended to be upgraded further into a cloud computing platform. The National Institute of Standards and Technology (NIST) cloud reference architecture is recommended as the basis for building a private cloud computing platform for this purpose.


2. Infrastructure Reference Architecture

To standardize the infrastructure design for the project, a suitable reference architecture needs to be used. Because cloud computing provides several benefits, and the existing data center provides a solid foundation for such an approach, the National Institute of Standards and Technology (NIST) cloud computing reference architecture will be used, as detailed in reference 2. The key points follow.

The architectural components of the NIST reference architecture describe the important aspects of service deployment and service orchestration. The overall service management of the cloud is acknowledged as an important element of the architecture. Business support mechanisms address customer management issues such as contracts, accounting and pricing, and are vital to cloud computing.

The following figure presents an overview of the NIST cloud computing reference architecture, which identifies the major actors, their activities and their functions in cloud computing. The diagram depicts a generic high-level architecture and is intended to facilitate understanding of the requirements, uses, characteristics and standards of cloud computing.


The NIST cloud computing definition is widely accepted as a valuable contribution toward a clear understanding of cloud computing technologies and cloud services. It provides a simple and unambiguous taxonomy of three service models available to cloud consumers: cloud Software as a Service (SaaS), cloud Platform as a Service (PaaS), and cloud Infrastructure as a Service (IaaS). It also summarizes four deployment models describing how the computing infrastructure that delivers these services can be shared: private cloud, community cloud, public cloud, and hybrid cloud. Finally, the NIST definition provides a unifying view of five essential characteristics that all cloud services exhibit: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

The NIST cloud computing reference architecture defines five major actors: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in cloud computing. The following table briefly lists the actors defined in the NIST cloud computing reference architecture:

| Actor | Definition |
|---|---|
| Cloud Consumer | A person or organization that maintains a business relationship with, and uses service from, Cloud Providers |
| Cloud Provider | A person, organization, or entity responsible for making a service available to interested parties |
| Cloud Auditor | A party that can conduct independent assessment of cloud services, information system operations, performance and security of the cloud implementation |
| Cloud Broker | An entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers |
| Cloud Carrier | An intermediary that provides connectivity and transport of cloud services from Cloud Providers to Cloud Consumers |


Our focus in this solution will be on the private cloud model that needs to be in place at CAPMAS as the infrastructure for the mobile data collection applications as well as the back-end processing technologies. NIST defines a private cloud as one that gives a single Cloud Consumer's organization exclusive access to and usage of the infrastructure and computational resources. It may be managed either by the Cloud Consumer organization or by a third party, and may be hosted on the organization's premises (i.e. on-site private clouds) or outsourced to a hosting company (i.e. outsourced private clouds).
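To make the NIST taxonomy concrete, the following Python sketch encodes the service and deployment models as enumerations and checks the defining property of a private cloud: a single organization with exclusive access. The class and field names are hypothetical illustrations, not part of any NIST or VMware artifact; the CAPMAS values reflect the report's stated target of an on-site private cloud managed internally by CAPMAS.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceModel(Enum):
    SAAS = "Software as a Service"
    PAAS = "Platform as a Service"
    IAAS = "Infrastructure as a Service"


class DeploymentModel(Enum):
    PRIVATE = "private"
    COMMUNITY = "community"
    PUBLIC = "public"
    HYBRID = "hybrid"


class Hosting(Enum):
    ON_SITE = "on-site"        # hosted on the consumer organization's premises
    OUTSOURCED = "outsourced"  # hosted by a hosting company


@dataclass(frozen=True)
class CloudDeployment:
    # Hypothetical record describing one cloud deployment decision.
    deployment: DeploymentModel
    tenants: tuple[str, ...]  # organizations with access to the infrastructure
    managed_by: str           # the consumer organization itself or a third party
    hosting: Hosting

    def is_valid_private_cloud(self) -> bool:
        # NIST: a private cloud gives a single organization exclusive access.
        return self.deployment is DeploymentModel.PRIVATE and len(self.tenants) == 1


capmas_target = CloudDeployment(
    deployment=DeploymentModel.PRIVATE,
    tenants=("CAPMAS",),
    managed_by="CAPMAS",
    hosting=Hosting.ON_SITE,
)
print(capmas_target.is_valid_private_cloud())   # True
print(len(ServiceModel), len(DeploymentModel))  # 3 4
```

Note that management and hosting are independent of the private-cloud test: an outsourced private cloud managed by a third party still qualifies as long as only one organization uses it.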


3. Current Status of CPI-Related Solutions

Currently, CAPMAS has neither dedicated infrastructure for Consumer Price Index (CPI) processing nor back-end processing components, such as database engines or big data platforms, to handle data processing, transformation and modeling. Most of the work is done either manually or by compiling data into spreadsheets to estimate the CPI and its intermediate statistics and KPIs.

The following statistics, provided by CAPMAS, illustrate the workload of the CPI process in terms of the effort needed from the staff involved:

| KPI | Measure | Description |
|---|---|---|
| Number of researchers | 141 | Field personnel assigned to collect data from the different markets |
| Number of supervisors | 31 | Field personnel assigned to manage the field operations of researchers |
| Number of researchers per supervisor | About 5 | The average number of researchers supervised by one supervisor |
| Overall number of governorates | 27 | Governorates where field operations take place |
| Overall number of regions | 141 | Regions where the markets in which prices are collected are located |
| Overall number of markets | About 15,000 | Markets where prices are collected |
| Number of markets per region | Not specified | Number of markets per region where operations take place |
| Number of markets per researcher | One region per researcher | Number of markets assigned to a single researcher during one month |
| Number of forms per researcher | 11 | Number of forms to be completed by a researcher in one month |
| Number of products per form | 964 | Number of products the researcher needs to price on each form |
| Number of branch reviewers | 60 | Number of reviewers assigned to review the collected prices for each branch office |
| Number of head office reviewers | 20 | Number of reviewers at the head office responsible for the final review of prices collected from all field operations |
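The table implies the monthly volume the back end must absorb. As an upper bound that assumes every listed product is priced on every form, 141 researchers × 11 forms × 964 products gives 1,495,164 price observations per month. The short Python sketch below reproduces this arithmetic; the reviewer ratio additionally assumes that the 60 branch reviewers are a total across all branches sharing the load evenly, which the table does not state.

```python
# Figures taken from the CPI workload table.
RESEARCHERS = 141
SUPERVISORS = 31
FORMS_PER_RESEARCHER_PER_MONTH = 11
PRODUCTS_PER_FORM = 964
BRANCH_REVIEWERS = 60  # assumption: total across all branches


def monthly_workload() -> dict[str, float]:
    forms = RESEARCHERS * FORMS_PER_RESEARCHER_PER_MONTH
    # Upper bound: assumes every product on every form receives a price.
    price_quotes = forms * PRODUCTS_PER_FORM
    return {
        "forms_per_month": forms,
        "price_quotes_per_month": price_quotes,
        "researchers_per_supervisor": RESEARCHERS / SUPERVISORS,
        # Assumes the review load is shared evenly by all branch reviewers.
        "forms_per_branch_reviewer": forms / BRANCH_REVIEWERS,
    }


for name, value in monthly_workload().items():
    print(f"{name}: {value:,.2f}")
```

The computed ratio of about 4.55 researchers per supervisor is consistent with the "About 5" reported in the table.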


4. Targeted Data Management Continuum

The effectiveness of a mobile data collection solution for the CPI process requires an enterprise data management platform that can handle collected data in an integrated, secure and accessible way, so that a collaborative model among researchers, supervisors, CAPMAS branches, central departments and CPI departments can be achieved.

The current CPI process at CAPMAS lacks such an enterprise data management platform. Hence, most of the process is done manually through paper forms, except for the final analysis, which is conducted using Excel sheets or local desktop software; this prevents CAPMAS from realizing the value of collaborative data models. The target platform and infrastructure should fulfill the following main requirements, grouped by phase of the data management continuum:

Data Collection: automate data sourcing, review, approval and consolidation through the workflow embedded in the mobile application used by field researchers and their supervisors.


Data Aggregation: after review and approval, the data sourced from the mobile applications needs to be aggregated into the back-end database through a direct connection and predefined rules set by the CPI department.

Data Matching: extract external data and maintain master data, while providing the ability to query data using both predefined and ad-hoc queries. At the same time, enable augmenting CPI data with other data, such as spatial and geolocation data.

Data Quality: provide means for checking and validating data quality during the collection process and after collection, during back-office review and when applying standard CPI statistical analysis.

Data Persistence: retain and organize data for as long as possible, while supporting multi-structured data and keeping storage costs down.

Data Consolidation: assemble data entities integrated into the back-end systems, with flexible metadata management to ensure access by specific roles.

Data Distribution: enable analysis tools to access, retrieve and communicate data in an intuitive way suited to each level of CPI staff, and structured for branch access and top-management reporting.
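The Data Quality and Data Aggregation requirements can be illustrated with a minimal Python sketch. The record fields, the 50% month-on-month change threshold and the choice of a simple mean are all hypothetical placeholders; the actual validation and aggregation rules would be defined by the CPI department.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class PriceRecord:
    # Hypothetical record shape; real fields would come from the CPI forms.
    product_id: str
    governorate: str
    price: float
    approved: bool  # set by the supervisor review step


def quality_issues(rec: PriceRecord, previous_price: Optional[float],
                   max_change: float = 0.5) -> list[str]:
    """Return validation problems for one record (empty list means it passes)."""
    issues = []
    if rec.price <= 0:
        issues.append("non-positive price")
    # Placeholder rule: flag month-on-month changes larger than max_change.
    if previous_price is not None and previous_price > 0 and rec.price > 0:
        change = abs(rec.price - previous_price) / previous_price
        if change > max_change:
            issues.append(f"price changed by {change:.0%}")
    return issues


def aggregate(records: list[PriceRecord]) -> dict[tuple[str, str], float]:
    """Average the approved, positive prices per (product, governorate)."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for rec in records:
        if rec.approved and rec.price > 0:
            groups[(rec.product_id, rec.governorate)].append(rec.price)
    return {key: mean(prices) for key, prices in groups.items()}
```

For example, `quality_issues(PriceRecord("rice", "Giza", 20.0, True), previous_price=10.0)` returns `['price changed by 100%']`, which a supervisor could confirm or reject before the record enters aggregation.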

The new model proposed for the pilot project will address the above requirements in each area, targeting an integrated data management platform that enables data integration, collaboration and retention using recent big data management technologies. Collected data will be transferred directly to secure servers managed internally by CAPMAS, with the following features:

End-to-end encryption using existing CAPMAS telecommunication infrastructure.

Reliable simultaneous connections to CAPMAS datacentre servers.

Online/offline synchronization.

GIS Integration.

Multilanguage.

Cloud architecture that can be used by all surveys and all statistical processes.

Cloud architecture that can easily handle the self-enumeration concept.


5. Current Infrastructure Architecture

At CAPMAS, virtualized data center infrastructure is already widely used for other applications and can be leveraged for the CPI project with the modifications and upgrades described in the following sections. The current infrastructure is based on VMware virtualization technologies, as detailed in reference 3; the main points follow.

VMware Infrastructure includes the following components, as shown in the figure above:

VMware ESX Server – A production-proven virtualization layer that runs on physical servers and abstracts processor, memory, storage and networking resources so they can be provisioned to multiple virtual machines

VMware Virtual Machine File System (VMFS) – A high-performance cluster file system for virtual machines


VMware Virtual Symmetric Multi-Processing (SMP) – Enables a single virtual machine to use multiple physical processors simultaneously

VirtualCenter Management Server – The central point for configuring, provisioning and managing virtualized IT infrastructure

Virtual Infrastructure Client (VI Client) – An interface that allows administrators and users to connect remotely to the Virtual Center Management Server or individual ESX Server installations from any Windows PC

Virtual Infrastructure Web Access – A Web interface for virtual machine management and remote console access

VMware VMotion™ – Enables the live migration of running virtual machines from one physical server to another with zero downtime, continuous service availability and complete transaction integrity


VMware High Availability (HA) – Provides easy-to-use, cost effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers that have spare capacity

VMware Distributed Resource Scheduler (DRS) – Intelligently allocates and balances computing capacity dynamically across collections of hardware resources for virtual machines

VMware Consolidated Backup – Provides an easy to use, centralized facility for agent-free backup of virtual machines. It simplifies backup administration and reduces the load on ESX Server installations

VMware Infrastructure SDK – Provides a standard interface for VMware and third-party solutions to access VMware Infrastructure
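VMware HA's behaviour of restarting a failed host's virtual machines on servers with spare capacity can be illustrated with a simplified model. The sketch below is not VMware's algorithm (real HA also applies admission control, reservations and restart priorities); it only places each VM, largest first, on the surviving host with the most free memory, and reports any VM that does not fit. Host and VM names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity_gb: int  # RAM available for virtual machines
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> RAM in GB

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def restart_failed_vms(hosts: list[Host], failed: str) -> list[str]:
    """Restart the failed host's VMs on survivors; return VMs that could not be placed."""
    dead = next(h for h in hosts if h.name == failed)
    survivors = [h for h in hosts if h.name != failed]
    unplaced = []
    # Place the largest VMs first so they are not crowded out by small ones.
    for vm, ram in sorted(dead.vms.items(), key=lambda kv: kv[1], reverse=True):
        target = max(survivors, key=Host.free_gb, default=None)
        if target is not None and target.free_gb() >= ram:
            target.vms[vm] = ram
        else:
            unplaced.append(vm)
    dead.vms.clear()  # the failed host no longer runs anything
    return unplaced
```

Choosing the host with the most free memory spreads restarted VMs across the cluster, which loosely mirrors how DRS later rebalances load; a VM that fits nowhere signals that the cluster lacks the spare capacity HA depends on.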


6. Targeted Solution Architecture

While leveraging the current virtualized infrastructure through a cloud computing model is the designated approach, the target infrastructure must fulfill several roles for the mobile data collection solution to work as planned. According to reference 4, these roles include:

Support the tablet mobile application communications for the field researcher and supervisor applications.

Enable hosting and running the REST APIs and associated data services developed for the mobile application data interfacing.

Provide Big Data capabilities for long term data retention and high-performance computing.

To support the tablet mobile application communications for the field researcher and supervisor applications, the following figure shows the communications topology:

System Communication Diagram


The tablet devices are connected to a 4G broadband cellular network.

The end-to-end communication between field devices and the back-end server goes through a Virtual Private Network (VPN) tunnel to ensure data security.

Due to communication limitations, tablet devices should alternate between Online and Offline modes.

In Offline mode, the tablet device can still collect data and store it in a local database that resides on the tablet.

In Online mode, the device can synchronize the local and central databases, send and receive messages, and perform all other functions that require connectivity.
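The Offline/Online behaviour described above is essentially a store-and-forward outbox. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for the on-tablet database; the `send` callable is a hypothetical placeholder for the upload over the VPN to the CAPMAS REST APIs, returning `True` on success. The actual tablet application may use a different language and database; this only illustrates the pattern.

```python
import json
import sqlite3
from typing import Callable


def open_local_store(path: str = ":memory:") -> sqlite3.Connection:
    """Local database holding records that have not yet reached the server."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_offline(conn: sqlite3.Connection, record: dict) -> None:
    # Always write locally first, whatever the current connectivity.
    with conn:  # commits the insert
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))


def sync(conn: sqlite3.Connection, send: Callable[[dict], bool]) -> int:
    """Push unsynced rows in insertion order; stop at the first failed send."""
    sent = 0
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE synced = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        if not send(json.loads(payload)):
            break  # connection dropped; remaining rows retry on the next sync
        with conn:
            conn.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

Rows are marked as synced only after a successful send, so a dropped connection leaves the remaining rows queued. After two `save_offline` calls, `sync(conn, send)` returns 2 once `send` succeeds, and a second `sync` returns 0.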

For hosting and running the REST APIs and associated data services developed for mobile application data interfacing, the following figure shows the main components and data flow of the tablet application system:

Mobile Tablet Applications System Modules Diagram
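The report does not specify the REST API contract, so the following is only a sketch of how a submission endpoint might decode and validate a form posted by the tablet before writing it to the central database. The field names (`researcher_id`, `form_id`, `market_id`, `prices`) and the status-code choices (400 for malformed JSON, 422 for missing or invalid fields, 201 on success) are hypothetical, and the web framework that would call `decode_submission` is left open.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical required fields of a posted form.
REQUIRED_FIELDS = ("researcher_id", "form_id", "market_id", "prices")


@dataclass(frozen=True)
class FormSubmission:
    researcher_id: str
    form_id: str
    market_id: str
    prices: dict[str, float]  # product code -> collected price


def decode_submission(body: bytes) -> tuple[int, Any]:
    """Return (HTTP status, FormSubmission or error message) for a request body."""
    try:
        data = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, "body is not valid JSON"
    if not isinstance(data, dict):
        return 400, "expected a JSON object"
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        return 422, f"missing fields: {', '.join(missing)}"
    prices = data["prices"]
    # Reject non-numeric prices; bool is excluded because it subclasses int.
    if not isinstance(prices, dict) or not all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in prices.values()
    ):
        return 422, "prices must map product codes to numbers"
    return 201, FormSubmission(
        str(data["researcher_id"]), str(data["form_id"]), str(data["market_id"]),
        {str(k): float(v) for k, v in prices.items()},
    )
```

Keeping decoding separate from the HTTP layer lets the same validation run in unit tests and behind whichever server hosts the APIs.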

Providing Big Data capabilities for long-term data retention and high-performance computing will be covered in the next section.


7. Recommendations for Applications and Data Management

As described in the previous section on the tablet mobile application system components, the CAPMAS backend server is the landing space for data collected by field researchers and supervisors. To provide Big Data capabilities for long-term data retention and high-performance computing, and to receive additional data such as self-enumeration responses and integrated external sources, additional services will be integrated behind the backend server that receives tablet data. These additional services will provide the following features:

| # | Feature | Description |
|---|---|---|
| 1 | Distributed Data Management | Data is stored in distributed blocks across several nodes, enabling granular management, scalability and high-performance computing. |
| 2 | Distributed Processing | Aggregation, transformation, statistical analysis and data modeling will be implemented on a distributed application framework to enable high-performance, scalable and resilient computing. |
| 3 | Batch Loading | Ingests accumulated data in batches for low-frequency loads. |
| 4 | Streaming Loading | Ingests data as small, frequent streams in the form of a pipeline of messages or transactions. |
| 5 | In-Memory Processing | Runs data analysis on a selected set of data held in memory for faster processing and manipulation. |
| 6 | Data Science Modeling | Specialized libraries that implement machine learning, deep learning, statistical modeling, data mining and analysis operations atop the data platform. |
| 7 | Graph Analysis | Components that enable big graph implementations and network analysis models. |
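Feature 1 (distributed data management) can be illustrated by how a file is split into fixed-size blocks, each stored on several nodes. The sketch below uses a simple rotating placement that keeps each block's replicas on distinct nodes; it is a simplification, since the HDFS default policy is rack-aware (first replica on the writer's node, the other two on a different rack). Block size and replication factor are parameters here; the HDFS defaults are 128 MB and 3. Node names are hypothetical.

```python
def place_blocks(file_size_mb: int, block_size_mb: int, nodes: list[str],
                 replication: int = 3) -> list[list[str]]:
    """Split a file into fixed-size blocks and assign each block to `replication`
    distinct nodes, rotating the starting node so storage load is spread evenly."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = []
    for b in range(n_blocks):
        start = b % len(nodes)
        # Consecutive nodes from `start` guarantee distinct replicas per block.
        placement.append([nodes[(start + r) % len(nodes)] for r in range(replication)])
    return placement
```

With the eight worker nodes proposed in the sizing section, `place_blocks(1024, 128, [f"worker{i}" for i in range(1, 9)])` splits a 1,024 MB file into 8 blocks, and each worker ends up holding 3 of the 24 replicas, so the loss of any single node leaves every block readable.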


8. Main Recommended Components

Based on the previous sections on the current status and targeted requirements, several components need to be installed to achieve the needed upgrades of the existing infrastructure. The following recommended components are subject to review during the implementation of the infrastructure upgrades and setup:

- VMware vCloud Suite Extends the current virtualized infrastructure with cloud management. vCloud Suite is an integrated offering that brings together VMware's industry-leading vSphere hypervisor and the VMware vRealize Suite multi-vendor hybrid cloud management platform. VMware's new portable licensing units allow vCloud Suite to build and manage vSphere-based private clouds. It can accelerate application delivery for both traditional and container-based applications by giving developers the freedom to use the tools that make them most productive, while still ensuring that applications can be moved seamlessly from a developer laptop to production.

- Apache Hadoop Distributed File System (HDFS) Distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project.

- Apache YARN The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global Resource Manager (RM) and per-application Application Master (AM). An application is either a single job or a DAG of jobs. The Resource Manager and the Node Manager form the data-computation framework. The Resource Manager is the ultimate authority that arbitrates resources among all the applications in the system. The Node Manager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the Resource Manager/Scheduler. The per-application Application Master is, in effect, a framework specific library and is tasked with negotiating resources from the Resource Manager and working with the Node Manager(s) to execute and monitor the tasks.


- Apache Spark A fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

- Apache Hive Data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and a JDBC driver are provided to connect users to Hive.

- Apache HBase Provides random, real-time read/write access to Big Data. The project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

- Apache Oozie Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions. Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts). Oozie is a scalable, reliable and extensible system.

- Apache Tez An application framework that allows a complex directed acyclic graph (DAG) of tasks for processing data. It is currently built atop Apache Hadoop YARN. It provides expressive dataflow definition APIs, a flexible Input-Processor-Output runtime model, data-type agnosticism, simplified deployment, performance gains over MapReduce, optimal resource management, plan reconfiguration at runtime and dynamic physical data flow decisions.


- Apache Flume A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

- Apache Sqoop A tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle or a mainframe into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

- MongoDB A document database with the scalability and flexibility that you want, with the querying and indexing that you need. MongoDB stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can be changed over time. It will be used as a document store for unstructured data.

- PostgreSQL A powerful SQL-based database engine that will be used as the landing store for data collected by the mobile tablet applications, working behind the data services of the REST APIs. It provides extensive high-performance processing as well as special capabilities such as GIS data handling.
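Several of these components (YARN applications, Oozie workflows, Tez jobs) organize work as directed acyclic graphs (DAGs) of tasks. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+) to show how such a DAG is turned into execution waves, where every task in a wave can run in parallel once the earlier waves finish. The workflow is a hypothetical monthly CPI pipeline, and this is not the Oozie workflow definition format, which is XML-based.

```python
from graphlib import TopologicalSorter

# Hypothetical monthly CPI workflow: each action maps to the actions it depends on.
CPI_WORKFLOW = {
    "sqoop_import_postgres": set(),
    "flume_ingest_logs": set(),
    "hive_clean_prices": {"sqoop_import_postgres"},
    "spark_compute_indices": {"hive_clean_prices"},
    "spark_usage_stats": {"flume_ingest_logs"},
    "export_reports": {"spark_compute_indices", "spark_usage_stats"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group actions into waves: each action's dependencies all finish in
    earlier waves, so actions within one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(execution_waves(CPI_WORKFLOW), 1):
    print(i, wave)
```

A scheduler such as Oozie adds what this sketch omits: retries, time and data-availability triggers, and dispatching each action to the right engine.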


9. Estimated High-Level Sizing and Specifications

The following table lists the estimated sizing of the infrastructure required to deploy and run the aforementioned components. Sizing will be revised during implementation, taking advantage of the cloud approach deployed on top of the virtualized infrastructure at the CAPMAS data center:

| # | VM Function | Estimated Node Sizing |
|---|---|---|
| 1 | 2 x Name Nodes | 4 cores, 3.0 GHz; 16 GB RAM; 200 GB storage; Linux OS |
| 2 | 2 x Resource Scheduling Nodes | 4 cores, 3.0 GHz; 16 GB RAM; 200 GB storage; Linux OS |
| 3 | 8 x Worker Nodes | 2 cores, 3.0 GHz; 8 GB RAM; 500 GB storage; Linux OS |
| 4 | 2 x Document Services Nodes | 4 cores, 3.0 GHz; 16 GB RAM; 500 GB storage; Linux OS |
| 5 | 2 x REST APIs Hosting Nodes | 4 cores, 3.0 GHz; 16 GB RAM; 100 GB storage; Linux OS |
| 6 | 2 x Central Database Nodes | 4 cores, 3.0 GHz; 16 GB RAM; 500 GB disk space; Linux OS |
| 7 | 2 x Back Office Applications | 4 cores, 3.0 GHz; 8 GB RAM; 200 GB disk space; Windows Server |
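Summing the table gives the overall resource envelope to request from the virtualized data center. The sketch below reproduces the table's figures; the usable-HDFS estimate additionally assumes the HDFS default replication factor of 3 on the worker nodes and ignores space used by the operating system and temporary data.

```python
# (VM count, cores per VM, RAM GB per VM, storage GB per VM) from the sizing table.
SIZING = {
    "Name Nodes": (2, 4, 16, 200),
    "Resource Scheduling Nodes": (2, 4, 16, 200),
    "Worker Nodes": (8, 2, 8, 500),
    "Document Services Nodes": (2, 4, 16, 500),
    "REST APIs Hosting Nodes": (2, 4, 16, 100),
    "Central Database Nodes": (2, 4, 16, 500),
    "Back Office Applications": (2, 4, 8, 200),
}
HDFS_REPLICATION = 3  # assumption: HDFS default replication factor


def totals(sizing: dict[str, tuple[int, int, int, int]]) -> dict[str, int]:
    """Multiply per-VM resources by VM count and sum across all functions."""
    out = {"vms": 0, "cores": 0, "ram_gb": 0, "storage_gb": 0}
    for count, cores, ram, disk in sizing.values():
        out["vms"] += count
        out["cores"] += count * cores
        out["ram_gb"] += count * ram
        out["storage_gb"] += count * disk
    return out


worker_raw_gb = SIZING["Worker Nodes"][0] * SIZING["Worker Nodes"][3]
print(totals(SIZING))  # {'vms': 20, 'cores': 64, 'ram_gb': 240, 'storage_gb': 7400}
print(worker_raw_gb // HDFS_REPLICATION)  # 1333
```

With these figures the estimate comes to 20 VMs, 64 cores, 240 GB of RAM and 7,400 GB of storage, of which the 4,000 GB on the worker nodes would hold roughly 1,333 GB of unique HDFS data at a replication factor of 3.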


10. Conclusion and Next Actions

The virtualized infrastructure achieved at CAPMAS paves the way for a solid foundation for the mobile data collection solution, as well as for other potential data solutions and integration with external data sources. To leverage this achievement, two main additional layers need to be built:

Extending Virtualization to Cloud Platform

Deploying Big Data Management Platform

Next actions include starting the implementation plan for the two items above. An implementation team needs to be engaged, ensuring complete know-how transfer to the CAPMAS team, especially for the Big Data management solutions and for extending the backend capabilities to support the mobile data collection solution, which is the main focus of this pilot project.


11. References

1- UNECA – CAPMAS – Nile University Letter of Agreement (LoA).

2- Cloud Computing Reference Architecture: Recommendations of the National Institute of Standards and Technology http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=909505

3- VMware Virtualization Documentation https://docs.vmware.com/en/VMware-vSphere/index.html

4- CAPMAS Pricing Tablet Application Requirements and Design Document.

5- VMware vCloud Suite https://www.vmware.com/products/vcloud-suite.html

6- Apache Hadoop Main Page http://hadoop.apache.org/