LATENCY AWARE ELASTIC STREAMING FOR
ESTIMATING ONLINE VACANCY IN TRAFFIC DATA
Roshni P, Surekha Mariam Varghese
Mar Athanasius College of Engineering, Kothamangalam, Kerala
[email protected], [email protected]
Real-time road traffic monitoring has become an effective tool for road transport. In smart cities, trajectory data from vehicles is collected in real time. Processing this massive amount of data with low latency and minimal resource utilization is a considerable challenge. The trajectory sensor data stream can be used to predict and prevent traffic jams. An elastic, distributed system that analyzes road traffic data in real time is proposed to estimate the vacancies in road traffic. The Apache Spark parallel framework is used to implement the distributed system that processes the huge volume of trajectory data. Since the amount of data generated depends on the time and day, resources must be provisioned elastically to handle the varying workload and processing. To manage and aggregate resources elastically, the system is deployed on the Amazon EC2 cloud. Because adding or removing computational resources takes several minutes, scaling reactively results in impaired performance. The system therefore predicts the need for extra capacity using a decision tree classifier and provisions new resources in advance to meet the computational demand. The elasticity property optimizes resource utilization and increases overall performance. Elasticity is implemented with shell scripts, and vacancy estimation with multiple linear regression.
Keywords: Amazon EMR, Apache Spark, Elasticity, Over-provisioning, Under-provisioning.
1. INTRODUCTION
Real-time streaming of big data has enabled large-scale processing of data. Traffic congestion can be reduced by using real-time big data for new products and improved services built on traffic monitoring systems. For new use cases, the processing system needs to be flexible, simple and adaptable. Workload and resource use have to be kept balanced, and the system should adapt to workload variation at run time by adding or removing computational nodes and thereby redistributing the workload.
Road transport becomes more efficient with real-time road traffic monitoring. In smart cities, traffic trajectory sensors produce trajectory data streams in real time. Processing real-time traffic sensor data in a large city is challenging because of the number of vehicles present. Data generated from vehicles is used to predict traffic jams at a location. Processing real-time traffic data while streaming it at the same time is quite difficult. Real-time traffic is calculated from data collected by wireless communication devices installed in the vehicles, but the trajectory sensors generating continuous data create a bottleneck. Multiple linear regression is used to estimate traffic vacancy: correlation coefficients are computed from the multiple linear regression models established by sample items, and these coefficients yield the vacancy estimate.
International Journal of Pure and Applied Mathematics, Volume 119, No. 18, 2018, 1911-1925. ISSN: 1314-3395 (on-line version). URL: http://www.acadpubl.eu/hub/ (Special Issue)
The system proposes an optimized method that allocates the computing resources efficiently to accommodate the random nature of vehicle movement. An online, real-time approach to the problem is realized with the Apache Spark framework on Elastic MapReduce. The system processes the constant stream of location information from vehicles and addresses the major defect, sparseness, by employing the real-time data. The vehicles emit new location information every second, so the data processing should be close to the data providers, meaning the data transfer duration between the stream operator and the source should be minimized [4]. The real-time processing (geographical as well as operational) of this transport data demands a distributed stream processing engine, where each stream processing operator can be deployed on a different cloud as an individual operator node. Since traffic is observed in real time and fluctuates in a non-continuous manner, the system streams the data elastically in the cloud [5].
Cloud computing is a kind of Internet-based computing that provides shared processing resources and data to computers and other devices on demand [2]. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources [2]. Cloud computing has become a highly demanded service due to its high computing power, low cost, high performance, scalability, accessibility and availability [2]. Users access cloud computing using networked client devices such as desktop computers, laptops, tablets and smartphones, as well as any Ethernet-enabled device such as home automation gadgets. The system uses Amazon Elastic MapReduce to rapidly and cost-effectively process vast amounts of data; computing instances can be provisioned to process data at any scale [6]. Increasing the computing resources beyond the workload's requirements is called over-provisioning, and decreasing them below the requirements is called under-provisioning. The system uses the Apache Spark framework for parallel execution.
Figure 1.1 Spark Architecture
Figure 1.1 shows an Apache Spark cluster, where the driver node communicates with the executor nodes (logically similar to execution cores).
2. MOTIVATION
With the arrival of data-intensive applications that generate enormous volumes of real-time data, distributed stream processing systems have become increasingly important in road-traffic monitoring. Data stream processing is needed to process incoming data streams from large numbers of sensors in real time. Even when the incoming data rate fluctuates in a non-continuous manner, the data stream system must operate with low latency. This is almost impossible on a local workstation or laboratory machine because its computational resources are finite. A cost-efficient stream processing engine is needed: it should adapt its resource usage elastically while maintaining real-time processing. Elastic resource usage means increasing and decreasing the computational resources according to the requirements. The proposed method changes the computing environment based on the data rate of the input stream. In a cloud environment, additional computational resources can be added or removed within a few minutes, and there is no need to consider where the new resources are located. Since the number of virtual machines can be changed dynamically, the system can deal with the elastic nature of the workload by temporarily adding or removing computational nodes. Green computing is the environmentally responsible and eco-friendly use of computers and their resources [3], [7]. Reducing electronic waste and power dissipation contributes strongly to green computing, so using computing equipment in proportion to the workload helps as well. To support this, the elasticity property should be implemented wherever computations on massive amounts of data have to be performed.
3. PROPOSED SYSTEM
The increasing power of computers has brought innovations to the world of information. Analyzing massive real-time data is well suited to massively distributed computing in the cloud, which can scale elastically. The main objective of the proposed system is to make the cloud services elastic for processing the real-time data coming from traffic sensors and to support decision making for transport systems. By using the cloud environment, the additional computational resources can be varied, and there is no need to consider where the new resources are located. In addition, the number of virtual machines can be changed dynamically, so the system can deal even with situations where the data rate suddenly becomes high by temporarily adding cloud nodes. The system takes the continuously generated trajectory data, adds computational nodes in the cloud environment using an appropriate number of virtual machines, and processes the data in parallel. The main challenges in processing traffic data streams are the inhomogeneous sparseness in both the spatial and temporal dimensions, introduced by probe vehicles moving at their own will, and processing the stream data in real time with low latency [1]. The proposed system presents a real-time road traffic monitoring system that collects data from vehicles to monitor the real-time traffic scenario. Since the trajectory data continues to be generated in massive amounts, elasticity is established to meet the requirements.
The proposed system mainly consists of:
1. Real time road traffic monitoring in Apache Spark
2. Design of elastic stream processing in Amazon Web Service cloud
An Elastic Stream Processing Platform for transport data analysis is proposed to estimate the vacancies in raw traffic data. The road traffic monitoring module is executed in the Amazon Web Services cloud. The monitor checks the storage and the average CPU usage of the cluster instances to fire the trigger. Various threshold metrics are used to check whether over-provisioning or under-provisioning is needed. If the CPU usage is above a limit, the cluster is resized with more task nodes, thereby providing more storage and memory. If the usage is less than the threshold, the cluster is shrunk. The usage monitor observes the load of the stream source and analyzes its CPU and RAM usage; based on the thresholds, it decides whether to scale up or down. The monitoring data is preprocessed within the usage monitor to derive metrics such as the average system load per minute and is forwarded to the reasoner, which decides on the over-provisioning and under-provisioning scenarios.
The processing node management component takes care of allocating new processing nodes to ensure the real-time processing capabilities of the stream processing engine. It allocates cloud computing resources and deploys the processing nodes on these computational resources. The velocity estimate derived from vehicle speeds is computed to decide whether there is a block in the study or estimated region. It can take up to 10 minutes for an instance to launch in the cloud.
Figure 3.1 Proposed System
Those ten minutes lie between the moment the system detects the requirement for extra capacity and the time when that capacity is actually available, that is, ten minutes of impaired performance for computation. If capacity is predicted properly beforehand, it can be added before it is needed, ensuring that the proper capacity is always in place. The system handles this case by predicting the need for extra capacity with a decision tree classifier.
3.1. Real-time Traffic Monitoring Design
The real-time road traffic monitoring design collects real-time data from GPS wireless communication devices in vehicles to monitor the traffic scenario. The velocity or vacancy estimation problem in traffic data is addressed using multiple linear regression in [1]. The velocity is taken as the criterion for predicting the traffic congestion in a region, under the assumption that a good traffic condition always results in higher velocity.
1. The location is identified and converted to (latitude, longitude).
2. The eight neighboring regions of the location are found.
3. The vehicles in each region are identified, and the velocity of each vehicle is calculated from its distance and time.
4. The average velocity is computed over every region's velocities and is taken as the real value.
5. The multiple regression approach is used to find the estimated velocity of the region.
If there are m neighboring regions, construct the matrix X from the velocities observed at h time instants:

        | 1  v11 ... vm1 |
    X = | 1  v12 ... vm2 |
        | ...            |
        | 1  v1h ... vmh |

6. The estimate of the correlation coefficients βˆ is found as
    βˆ = (X^T X)^-1 (X^T V)
7. The model representing the correlations between ri and its neighboring regions at time t is denoted as
    vit = β0 + β1 v1t + β2 v2t + ... + βm vmt + μt
8. vit is the average velocity at the estimated region.
The velocities in the neighboring regions are also considered when estimating the vacancy at a particular region, so based on the velocity at a region one can predict whether there is more or less traffic there. In the traffic vacancy estimation problem, the first step is to find all nearby vehicles around the study region in a specific time window. The geohash method divides points on the earth's surface into grids. The pseudo code for the traffic vacancy estimation is given below [1].
Input:
locationi : longitude and latitude
scopei : range of the region
di : direction of the traffic condition estimation
∆ti : time window
ti : time of the traffic condition
Output:
ETCM : traffic condition estimation matrix
1: r(ri, ti, di) ← geohash(longitudei, latitudei, scopei, ∆ti)
2: Rnb(i) ← SearchNeighboring(r(ri, ti, di)), |Rnb(i)| = 8
3: for k = 1 to |Rnb(i)| do
4:     Rk(ri, ti, di) = { Ski(t) | r(r, ri) < ∆r ∧ |t − ti| < ∆ti ∧ d = di }
5:     vk(ri, ti, di) = (1/N) Σ Ski(t), where N = |Rk(ri, ti, di)|
6:     CkTCM(di) ← vk(ri, ti, di)
7: end for
8: while di == destimate do
9:     X ← CkTCM(di)    // sample from the two flanks
10:    βˆ = (βˆ0, βˆ1, βˆ2, ..., βˆm)^T = (X^T X)^-1 (X^T V)
11:    vˆit = βˆ0 + βˆ1 v1t + βˆ2 v2t + ... + βˆm vmt    // sample in the same direction
12: end while
13: ETCM ← vˆit
14: return ETCM
The region is divided into nine rectangular areas: the estimation region and the eight neighboring regions around it. The estimate is made from the two rectangular areas that lie in the same direction as the estimation region.
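The regression step above can be sketched with NumPy. This is a minimal illustration under assumed sample data, not the paper's implementation: the velocity values below are made up, and only the normal-equation form βˆ = (X^T X)^-1 (X^T V) and the linear model from step 7 are taken from the text.

```python
import numpy as np

# Hypothetical sample: velocities (km/h) observed in m = 2 neighboring
# regions over h = 4 time instants, plus the velocity V measured at the
# estimation region at the same instants.
neighbor_velocities = np.array([
    [30.0, 28.0],
    [25.0, 24.0],
    [40.0, 38.0],
    [35.0, 33.0],
])
V = np.array([29.0, 24.5, 39.0, 34.0])

# Build X with an intercept column: each row is [1, v1t, ..., vmt].
X = np.column_stack([np.ones(len(V)), neighbor_velocities])

# Normal equations: beta_hat = (X^T X)^-1 (X^T V).
beta_hat = np.linalg.solve(X.T @ X, X.T @ V)

# Estimated velocity at the region for a new observation of the neighbors.
new_row = np.array([1.0, 32.0, 30.0])
v_estimate = float(new_row @ beta_hat)
print(beta_hat, v_estimate)
```

With real data, the quality of this fit depends on how correlated the neighboring regions actually are with the estimation region.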
3.2. Elasticity Design
Elasticity is defined as the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible [8]. For the elastic streaming of transport data there are over-provisioning (provisioning) and under-provisioning (de-provisioning) cases: over-provisioning means allocating more resources than required, and under-provisioning means allocating fewer resources than the fixed number of resources. Adjusting to a varying workload by altering the number or use of computing resources is called "elastic computing" [9].
Elasticity can be illustrated with an example. Consider a road traffic monitoring office with five systems deployed. At peak time (say, Saturday evening) the number of vehicles is high, so more computational nodes are needed to store and process the data generated randomly by the vehicles; there is a need for over-provisioning, i.e., more worker nodes to handle the traffic load without overloading any node. Consider another scenario where there are hardly any vehicles (say, midnight): five worker nodes are not needed to process the data. The system should therefore provision the clusters elastically according to the requirements. Elasticity is expressed as thresholds that set the conditions triggering over-provisioning or under-provisioning: provisioning and de-provisioning are triggered when the average CPU utilization rises above an upper utilization threshold or falls below the lower utilization threshold, respectively. Different metrics are used as the thresholds [10].
Reconfiguration actions aim at achieving an average CPU utilization as close as possible to the target utilization threshold. Load balancing is triggered when the standard deviation of the CPU utilization is above the upper imbalance threshold. In order to enforce the elasticity rules, the system periodically collects monitoring information, including the average CPU usage, from all instances on each sub-cluster. The system then computes the average CPU usage per sub-cluster. If it is outside the allowed range, the number of instances required to cope with the current load is computed. If the sub-cluster is under-provisioned, new instances are allocated; if it is over-provisioned, the load of the surplus instances is transferred to the remaining instances by an offload function.
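The elasticity rules described above can be condensed into a small decision function, sketched below. This is an illustration only, not the deployed monitor: the threshold values and the action names are hypothetical placeholders.

```python
# Illustrative sketch of the elasticity rules; thresholds are assumed values.
UPPER_UTILIZATION = 0.90   # assumed upper CPU threshold (over-provision above)
LOWER_UTILIZATION = 0.30   # assumed lower CPU threshold (de-provision below)
UPPER_IMBALANCE = 0.25     # assumed std-dev threshold for load balancing

def decide(cpu_usages):
    """Return the action for one sub-cluster given per-instance CPU usage (0..1)."""
    avg = sum(cpu_usages) / len(cpu_usages)
    var = sum((u - avg) ** 2 for u in cpu_usages) / len(cpu_usages)
    std = var ** 0.5
    if avg > UPPER_UTILIZATION:
        return "provision"        # under-provisioned: allocate new instances
    if avg < LOWER_UTILIZATION and len(cpu_usages) > 1:
        return "deprovision"      # over-provisioned: offload and release instances
    if std > UPPER_IMBALANCE:
        return "rebalance"        # imbalanced: trigger load balancing
    return "steady"

print(decide([0.95, 0.97, 0.92]))  # high average CPU
print(decide([0.10, 0.15]))        # low average CPU
```

In the running system these decisions would be fed by the periodically collected monitoring data rather than by hard-coded lists.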
(a) Over-Provisioning Scenario
The upscaling algorithm is composed of two upscaling thresholds. The reasoner decides to scale up either when the load of the incoming queue exceeds a specific threshold or when the average CPU usage of the processing nodes exceeds 95% of the available CPU resources.
def scaleUp():
    if incomingData > specificThreshold:
        addMoreNodes()
    elif avgCPUusage() > 0.95:
        addMoreNodes()
    else:
        doNothing()
The CPU usage of the processing nodes is monitored continuously. Since the amount of traffic data recorded varies with time, the CPU load is analyzed to check whether it rises above the threshold; if it does, the scale-up function is called to add more instances to process the data [11].
(b) Under-Provisioning Scenario
The downscaling operation is triggered if the load of the incoming queue falls below a specific threshold and there are at least two processing nodes assigned to one operator node. If this is the case, the reasoner iterates through all processing nodes to select a suitable one.

def scaleDown():
    if incomingData < specificThreshold:
        if processingNodes > 1:
            scaleDownNodes()
    elif avgCPUusage() < 0.20:
        removeNodes()

The under-provisioning design starts by analyzing the incoming queue of the traffic data [13]. If the incoming queue load is less than the specified threshold and there is more than one processing node, the system scales down the number of instances.
(c) Prediction to Foresee Capacity Requirements
Amazon and other clouds cannot respond instantly to increased capacity needs: it can take more than five minutes for instances to launch, which means minutes of impaired and delayed computation. Only if capacity is predicted properly can this impairment be avoided, by resizing the cluster instances before the workload arrives. This is done by considering the capacity needed on past days at the same time. For example, to foresee whether provisioning is needed at 12:00 pm, the streamed traffic data at 12:00 pm on the past five days is analyzed to check whether it exceeded the threshold and needed a larger instance. The workload 10 minutes earlier on the same day is also taken as a criterion. From these, it is predicted whether provisioning of an instance is needed. Thus the latency of instance allocation can be decreased to an extent, thereby increasing the performance.
def capacitypred():
    currentTime()
    calculatepastcapacities()
    if capacitiesexceeds_more:
        resize_instances()
    else:
        elasticity()
The algorithm for prediction analysis is shown above. It takes the current time and analyzes the past capacities [14]. If the majority of past capacities exceeded the threshold, resizing is considered; otherwise the algorithm for provisioning or de-provisioning is applied.
3.3. Dataset
A taxi trajectory data set that contains one-day trajectories of approximately 7,648 taxis is used [12]. The dataset contains about 18 million records.
ID Timestamp Longitude Latitude
Figure 3.2 An extract from the dataset
The dataset consists of the vehicle number of the taxi, the time at which the information was sent, and the longitude and latitude of the current location from where the information was sent. These are represented as a four-tuple <ID, Timestamp, Longitude, Latitude> as in Figure 3.2.
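As an illustration, a record in this four-tuple form could be parsed as sketched below. The comma-separated layout and the timestamp format are assumptions made for the illustration; the actual file format of the dataset may differ.

```python
from collections import namedtuple
from datetime import datetime

TrajectoryPoint = namedtuple("TrajectoryPoint",
                             "taxi_id timestamp longitude latitude")

def parse_record(line):
    """Parse one <ID, Timestamp, Longitude, Latitude> record.

    The comma-separated layout and the timestamp format are assumptions
    for this sketch, not a specification of the actual dataset file.
    """
    taxi_id, ts, lon, lat = line.strip().split(",")
    return TrajectoryPoint(
        taxi_id=int(taxi_id),
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        longitude=float(lon),
        latitude=float(lat),
    )

# Hypothetical sample record in the assumed layout.
point = parse_record("366,2008-02-02 15:36:08,116.51172,39.92123")
print(point.taxi_id, point.longitude)
```

In the streaming setting, a function like this would be applied per line to each incoming record before the velocity computation.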
4. IMPLEMENTING ELASTICITY
The resource usage is continuously monitored and analyzed against the current resource-usage thresholds. There are two scenarios, namely under-provisioning and over-provisioning [15]. The pseudo codes below decide on scaling up (adding a new processing node), maintaining the current state, or scaling down (removing a processing node).
def capacitypred():
    currentTime()
    calculatepastcapacities()
    if capacitiesexceeds_more:
        resize_instances()
    else:
        elasticity()

def elasticity():
    if queueIncomingLoad > upscalingThreshold:
        scaleUp()
    elif avgCPULoad > 90:
        scaleUp()
    elif queueIncomingLoad < downscalingThreshold and processingNodes > 1:
        scaleDownNode()
    elif avgCPULoad < 30 and processingNodes > 1:
        scaleDownNode()
    else:
        doNothing()
The upscaling algorithm is composed of two upscaling thresholds: the reasoner decides to scale up either when the load of the incoming queue exceeds a specific threshold or when the average CPU usage of the processing nodes exceeds 90% of the available CPU resources. The downscaling operation is triggered if the load of the incoming queue falls below a specific threshold. Prediction of the capacity requirement before the arrival of the real-time traffic data is also added to the elasticity, to compensate for the time taken to resize the cluster instances; thereby, the latency of the computation can be reduced.
The decision about resizing considers the following steps. To predict the capacity requirement at time hh:mm:ss, the data stream from ten minutes earlier is considered. The prediction is done with a decision tree classifier; DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset [16], [17]. Here there are two target classes, 'yes' and 'no': 'yes' stands for the need for resizing and 'no' for the converse. If the outcome is 'no', the conditions for over-provisioning and under-provisioning are considered. The workloads of the past five days are used for prediction: if the threshold was exceeded on more than three of them, class 'yes' is predicted, otherwise class 'no'. Since there are five binary attributes, there are 32 possible training examples; exceeding the threshold is represented as '1' and not exceeding it as '0'. The decision tree classifier is implemented in Python using the scikit-learn library, a simple and efficient tool for data mining and data analysis built on NumPy, SciPy and matplotlib. If the prediction results in the 'no' class, the under-provisioning or over-provisioning algorithm is triggered [18], [19]. In the Amazon Web Services cloud, the instance usage is monitored using CloudWatch.
aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-6bcb57c4 \
    --statistics Average \
    --start-time `date -u '+%FT%TZ' -d '10 mins ago'` \
    --end-time `date -u '+%FT%TZ'` \
    --period 60 | jq '.Datapoints[0] | .Average'
Code 1: CloudWatch Monitoring
The output is the average CPU usage of the cluster instance over a 10-minute duration. It is compared with the threshold value (say, an upper limit of 90%). If it is greater than the threshold, the master and provisioned nodes are out of CPU and more computational resources are needed [20]; the cluster is then resized into a larger cluster by adding more task nodes. This is how over-provisioning is carried out. If the usage is below the lower threshold, the cluster is shrunk to its minimum instance type.
from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf = clf.fit(dataset, target)   # fit on the past-workload training set
prediction = clf.predict(input)  # predicts class 'yes' or 'no'
Code 2: Pseudocode for prediction
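The training table described above (five binary attributes, 32 combinations, class 'yes' when more than three of the five past days exceeded the threshold) can be generated directly, as a quick sketch of the labeling rule. The paper feeds such a table to scikit-learn's DecisionTreeClassifier; here the label function simply encodes the rule itself.

```python
from itertools import product

def label(days):
    """Label one training row: 'yes' if more than three of the five past
    days exceeded the workload threshold (1 = exceeded, 0 = did not)."""
    return "yes" if sum(days) > 3 else "no"

# All 32 combinations of five binary attributes form the training table.
training_table = [(list(days), label(days)) for days in product((0, 1), repeat=5)]

print(len(training_table))        # 32 rows
print(label((1, 1, 1, 1, 0)))     # four days exceeded -> 'yes'
print(label((1, 1, 0, 1, 0)))     # three days exceeded -> 'no'
```

A decision tree trained on this table would reproduce the majority rule exactly; the value of the classifier lies in extending the same setup to richer features, such as the workload ten minutes earlier on the same day.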
5. PERFORMANCE ANALYSIS
This section analyzes the behavior of the proposed system based on several properties such as execution time, data input and liveness. The analysis explains the need for elasticity when the data load is low or high and when the CPU usage is high. The conditions used to perform the analysis are:
1. Under-provisioning when the input load is less than a threshold (based on the instance type in the cluster).
2. Over-provisioning when the input load is greater than a threshold (this varies with the instance type in the cluster).
3. Over-provisioning when the CPU usage reaches 90% (meaning the instance has reached its saturation point and will not perform well).
The instance type used during cluster creation is m1.large, a general-purpose instance type. One master node and one core node are allocated for the cluster. A core (slave) node is an instance in the cluster that runs tasks and stores data; a master node typically runs the master components of the distributed applications installed on the cluster. More task nodes are added to the cluster when under-provisioning or over-provisioning occurs; up to 48 additional task groups can be added, and when that threshold is reached the instance type itself can be modified. The analysis is also carried out with the m1.medium instance type (when the system is under-provisioned) and by adding more task nodes (when it is over-provisioned) to show the elastic property. The resource utilization of the running cluster is shown in Figure 5.1: the resource usage in terms of CPU, memory and storage is plotted against time, with the x-axis showing the time (in one-hour intervals) and the y-axis the resources utilized in each hour for performing the computations on the traffic data.
Figure 5.1 Resource Utilization
At the starting phase, the graph shows the resource usage for computations in the early morning, when the number of vehicles is very low, so there is no need for heavy utilization of resources. Most of the time the allocated resources would be idle or in a low-usage state, resulting in wastage of these computational resources; even in a low-usage state there is still considerable power dissipation and cost, since the allocated resources have high configurations [21]. The usage of resources then gradually increases. At peak time, especially in the morning and evening, there is a large number of vehicles and thus a large amount of streaming data, so the resource usage is very high, which results in high execution times and increased latency. The system aims at reducing the latency and increasing the throughput. Under-provisioning is therefore applied where the resource usage falls below a certain threshold (say 30%): a smaller instance is allocated for processing the traffic data at these times. Similarly, more resources are added at peak times, when most of the resources are saturated under heavy load. A threshold of 80% usage is set as the condition for over-provisioning and a threshold of 30% as the condition for under-provisioning [22]. The cluster is provisioned with more task nodes when the CPU usage reaches the threshold of 85%. When more task nodes are assigned, the memory usage gradually decreases as the workload is distributed among the task nodes. Figure 5.3 shows the memory usage after the provisioning of nodes.
Figure 5.2 Capacity-Utilization Curve
Figure 5.3 Memory usage after provisioning of nodes
A comparison graph, shown in Figure 5.4, compares the memory usage before and after provisioning. It shows that the memory usage can be minimized by provisioning more task nodes; thus the latency can be decreased, and the execution time as well [23].
Figure 5.4 Comparison
6. CONCLUSION
The demand for new, efficient methods for processing large-scale heterogeneous data in real time is growing. Currently, a key challenge in big data is performing low-latency analysis on real-time data. In vehicle traffic, continuous high-speed data streams generate large data volumes. The system is deployed on a distributed and parallel computing framework with Apache Spark, and elasticity is deployed in the Amazon EC2 cloud to handle the varying storage and resource needs. The system demonstrates the need for elastic, distributed real-time analysis of heterogeneous data. It is shown that elasticity can contribute to optimized resource utilization, low latency and high throughput. Predicting the need for extra capacity reduces the time taken to launch clusters by providing the capacity when required. The road traffic monitoring also provides an efficient mechanism for predicting the congestion in a region by leveraging the real-time data. The system further discovers trends in system resource usage by predicting the workload, so as to increase the system's capacity and performance, and it contributes strongly to green computing by optimizing the use of computational resources.
7. REFERENCES
[1] F. Wang et al., Estimating online vacancies in real-time road traffic monitoring with traffic sensor data stream, Ad Hoc Networks, Volume 35, December 2015, Pages 3-13.
[2] Konstantinou, I., Angelou, E., Boumpouka, C., Tsoumakos, D., and Koziris, N. On
the elasticity of nosql databases over cloud management platforms. In Proceedings of the
20th ACM international conference on Information and knowledge management (2011),
ACM, pp. 2385–2388.
[3] Jugraj Veer Singh, Sonia Vatta, Green Computing : Eco Friendly Technology
International Journal of Engineering Research and General Science Volume 4, Issue 1,
Jan-Feb, 2016 ISSN 2091- 2730
[4] Michael Franklin, Alon Halevy From Databases to Dataspaces: A New Abstraction
for Information Management, ACM SIGMOD Record Homepage archive Volume 34
Issue 4, December 2005 Pages 27-33
[5] Gulisano, Ricardo Jimenez-Peris, Marta Patino-Martinez, Claudio Soriente, and
Patrick Valduriez. 2012. StreamCloud: An Elastic and Scalable Data Streaming System.
IEEE Trans. Parallel Distrib. Syst. 23, 12 (December 2012), 2351-2365.
DOI=http://dx.doi.org/10.1109/TPDS.2012.24
[6] Thomas Heinze, Valerio Pappalardo, Zbigniew Jerzak, and Christof Fetzer. 2014.
Auto-scaling techniques for elastic data stream processing. In Proceedings of the 8th ACM
International Conference on Distributed EventBased Systems
[7] Yingjun Wu, Kian-Lee Tan ChronoStream: Elastic Stateful Stream Computation
in the Cloud Data Engineering (ICDE), 2015 IEEE 31st International Conference
10.1109/ICDE.2015.7113328
[8] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, C. Ratti, Real-time urban
monitoring using cell phones: A case study in rome, IEEE Trans. Intell.Transp. Syst. 12
(1) (2011) 141–151, doi:10.1109/tits.2010.2074196.
[9] N. Caceres, J.P. Wideberg, G. Benitez, Review of traffic data estimations extracted from cellular networks, IET Intell. Transp. Syst. 2 (3) (2008) 179-192, doi:10.1049/iet-its:20080003.
[10] J.C. Herrera, D.B. Work, R. Herring, X.G. Ban, Q. Jacobson, A.M. Bayen, Evaluation of traffic data obtained via gps-enabled mobile phones: The mobile century field experiment, Transp. Res. Part C-Emerging Technol. 18 (4) (2010) 568-583, doi:10.1016/j.trc.2009.10.006.
[11] H. Su, K. Zheng, J. Huang, H. Jeung, L. Chen, X. Zhou, Crowdplanner: A crowd-
based route recommendation system, in: Data Engineering (ICDE), 2014 IEEE 30th
International Conference on, 2014, doi:10.1109/ICDE.2014.6816730.
[12] Trajectory Data https://www.microsoft.com/enus/research/publication/trajectory-data-
mining-an-overview/
[13] R. Frank, M. Mouton, T. Engel, Towards collaborative traffic sensing using mobile
phones (poster), in: 2012 IEEE Vehicular Networking Conference, VNC 2012, November
14- November 16, IEEE Computer Society, 2012, pp. 115–120,
doi:10.1109/VNC.2012.6407419.
[14] J. Zhou, C.L. Philip Chen, L. Chen, A small-scale traffic monitoring system in urban
wireless sensor networks, in: Proceedings of the IEEE International Conference on
Systems, Man, and Cybernetics, SMC 2013, October 13, 2013 - October 16, 2013, IEEE
Computer Society, 2013, pp. 4929–4934, doi:10.1109/SMC.2013.842.
[15] M. Whaiduzzaman, M. Sookhak, A. Gani, R. Buyya, A survey on vehicular cloud
computing, J. Network Comput Appl.325–344, doi:10.1016/j.jnca.2013.08.004.
[16] M. Gerla, Vehicular cloud computing, in: 11th Annual Mediterranean Ad Hoc
Networking Workshop, Med-Hoc-Net 2012, June 19,2012 - June 22, 2012, IEEE
Computer Society, 2012, pp. 152– 155,doi:10.1109/MedHocNet.2012.6257116.
[17] C.Y. Goh, J. Dauwels, N. Mitrovic, M.T. Asif, A. Oran, P. Jaillet, Online map-
matching based on hidden markov model for real-time traffic sensing applications,
Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on,
2012, pp. 776–781, doi:10.1109/ITSC.2012.6338627.
[18] S Rajeswari, K Suthendran, K Rajakumar and S Arumugam, “An Overview of the
MapReduce Model”, International Conference on Theoretical Computer Science and
Discrete Mathematics, Springer-LNCS,Vol.10398,pp.312-317,2016.
[19] S.Rajeswari and K. Suthendran , “Chi-Square MapReduce Model for Agricultural
Data”, Journal of Cyber Security and Mobility,Vol.7(1),pp.13-24,2018.
[20] Thulasi Mohan,Shilpa Sudheendran, Fepslin AthishMon and K. Suthendran,
“Divisioning and Replicating Data in Cloud for Optimal Performance and Security”, International Journal of Pure and Applied Mathematics,Vol.118,pp. 271-275,2018.
[21] Z. Shan, D. Zhao, Y. Xia, Urban road traffic speed estimation for missing probe
vehicle data based on multiple linear regression model, in: Intelligent Transportation
Systems - (ITSC), 2013 16th International IEEE Conference on, 2013, pp. 118–123. doi:
10.1109ITSC.2013.6728220.
[22] Hamzeh Khazaei, Saeed Zareian, Rodrigo Veleda, Marin Litoiu Sipresk: A Big Data
Analytic Platform for Smart Transportation
[23] Yisheng Lv, 'Traffic Flow Prediction with Big Data: A Deep Learning Approach', State Key Lab. of Manage. & Control for Complex Syst., Inst. of Autom., Beijing, China.