Final Year IEEE Project 2013-2014 - Parallel and Distributed Systems Project Titles and Abstracts

Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad |

Pondicherry | Trivandrum | Salem | Erode | Tirunelveli

http://www.elysiumtechnologies.com, [email protected]

13 Years of Experience

Automated Services

24/7 Help Desk Support

Experienced & Expert Developers

Advanced Technologies & Tools

Legitimate Member of all Journals

Having 1,50,000 Successive records in

all Languages

More than 12 Branches in Tamilnadu,

Kerala & Karnataka.

Ticketing & Appointment Systems.

Individual Care for every Student.

Around 250 Developers & 20

Researchers




227-230 Church Road, Anna Nagar, Madurai – 625020.

0452-4390702, 4392702, + 91-9944793398.

[email protected], [email protected]

S.P.Towers, No.81 Valluvar Kottam High Road, Nungambakkam,

Chennai - 600034. 044-42072702, +91-9600354638,

[email protected]

15, III Floor, SI Towers, Melapudur main Road, Trichy – 620001.

0431-4002234, + 91-9790464324.

[email protected]

577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641002

0422- 4377758, +91-9677751577.

[email protected]





Plot No: 4, C Colony, P&T Extension, Perumal puram, Tirunelveli-

627007. 0462-2532104, +919677733255,

[email protected]

1st Floor, A.R.IT Park, Rasi Color Scan Building, Ramanathapuram

- 623501. 04567-223225,

[email protected]

74, 2nd floor, K.V.K Complex,Upstairs Krishna Sweets, Mettur

Road, Opp. Bus stand, Erode-638 011. 0424-4030055, +91-

9677748477 [email protected]

No: 88, First Floor, S.V.Patel Salai, Pondicherry – 605 001. 0413–

4200640 +91-9677704822

[email protected]

TNHB A-Block, D.no.10, Opp: Hotel Ganesh Near Busstand. Salem

– 636007, 0427-4042220, +91-9894444716.

[email protected]




ETPL

PDS-001 Adaptive Network Coding for Broadband Wireless Access Networks

Abstract: Broadband wireless access (BWA) networks, such as LTE and WiMAX, are inherently lossy

due to wireless medium unreliability. Although the Hybrid Automatic Repeat reQuest (HARQ) error-

control method recovers from packet loss, it has low transmission efficiency and is unsuitable for delay-sensitive applications. Alternatively, network coding techniques improve the throughput of wireless

networks, but incur significant overhead and ignore network constraints such as Medium Access Control

(MAC) layer transmission opportunities and physical (PHY) layer channel conditions. The present study provides analysis of Random Network Coding (RNC) and Systematic Network Coding (SNC) decoding

probabilities. Based on the analytical results, SNC is selected for developing an adaptive network coding

scheme designated as Frame-by-frame Adaptive Systematic Network Coding (FASNC). According to

network constraints per frame, FASNC dynamically utilizes either Modified Systematic Network Coding (M-SNC) or Mixed Generation Coding (MGC). An analytical model is developed for evaluating the mean

decoding delay and mean goodput of the proposed FASNC scheme. The results derived using this model

agree with those obtained from computer simulations. Simulations show that FASNC results in both lower decoding delay and reduced buffer requirements compared to MRNC and N-in-1 ReTX, while also

yielding higher goodput than HARQ, MRNC, and N-in-1 ReTX.
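
The abstract does not reproduce the decoding-probability expressions it analyzes. As a rough illustration only, and not the paper's own analysis, the sketch below evaluates a standard result for random linear coding over a finite field GF(q): a receiver holding n coded packets with uniformly random coefficient vectors can decode k source packets only if those vectors have rank k. The function name and the default field size q = 2 are assumptions made for this example.

```python
def rnc_decoding_probability(k: int, n: int, q: int = 2) -> float:
    """Probability that n uniformly random coefficient vectors over GF(q)
    span all k source packets (i.e. the n-by-k coefficient matrix has rank k).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if n < k:
        return 0.0  # fewer packets than unknowns: decoding is impossible
    prob = 1.0
    for i in range(k):
        # The (i+1)-th column must fall outside the span of the previous i columns.
        prob *= 1.0 - float(q) ** (i - n)
    return prob


if __name__ == "__main__":
    for extra in range(4):
        print(extra, rnc_decoding_probability(k=8, n=8 + extra))
```

Each extra coded packet beyond k raises the decoding probability, which is the overhead versus reliability trade-off that adaptive schemes have to balance.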

ETPL

PDS-002 Covering Points of Interest with Mobile Sensors

Abstract: The coverage of Points of Interest (PoI) is a classical requirement in mobile wireless sensor

applications. Optimizing the sensors' self-deployment over a PoI while maintaining the connectivity between the sensors and the base station is thus a fundamental issue. This paper addresses the problem of

autonomous deployment of mobile sensors that need to cover a predefined PoI with a connectivity

constraint. In our algorithm, each sensor moves toward a PoI but must also maintain connectivity

with a subset of its neighboring sensors that are part of the Relative Neighborhood Graph (RNG). The Relative Neighborhood Graph reduction is chosen so that global connectivity can be provided locally.

Our deployment scheme minimizes the number of sensors used for connectivity, thus increasing the

number of monitoring sensors. Analytical results, simulation results and practical implementation are provided to show the efficiency of our algorithm.
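
For readers unfamiliar with the Relative Neighborhood Graph, the following minimal sketch builds it with the textbook definition: two nodes u and v stay connected unless some third node w is strictly closer to both of them than they are to each other. It is a centralized brute-force illustration under that definition, not the paper's deployment algorithm, and the function name is hypothetical.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the RNG edges of a list of 2D points as index pairs (i, j), i < j.

    Brute-force O(n^3) illustration; hypothetical helper, not from the paper.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(points[i], points[j])
        # The edge is dropped if a third node is closer to both endpoints.
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    sensors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(relative_neighborhood_graph(sensors))
```

Because the test for each edge uses only the positions of nearby nodes, a sensor can decide locally which links it must preserve while moving.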

ETPL

PDS-003 Detection and Localization of Multiple Spoofing Attackers in Wireless Networks

Abstract: Wireless spoofing attacks are easy to launch and can significantly impact the performance of

networks. Although the identity of a node can be verified through cryptographic authentication,

conventional security approaches are not always desirable because of their overhead requirements. In this paper, we propose to use spatial information, a physical property associated with each node, hard to

falsify, and not reliant on cryptography, as the basis for 1) detecting spoofing attacks; 2) determining the

number of attackers when multiple adversaries masquerade as the same node identity; and 3) localizing multiple adversaries. We propose to use the spatial correlation of received signal strength (RSS) inherited

from wireless nodes to detect the spoofing attacks. We then formulate the problem of determining the

number of attackers as a multiclass detection problem. Cluster-based mechanisms are developed to

determine the number of attackers. When the training data are available, we explore using the Support Vector Machines (SVM) method to further improve the accuracy of determining the number of attackers.



In addition, we developed an integrated detection and localization system that can localize the positions of

multiple attackers. We evaluated our techniques through two testbeds using both an 802.11 (WiFi)

network and an 802.15.4 (ZigBee) network in two real office buildings. Our experimental results show that our proposed methods can achieve over 90 percent Hit Rate and Precision when determining the

number of attackers. Our localization results using a representative set of algorithms provide strong

evidence of high accuracy of localizing multiple adversaries.

ETPL

PDS-004

Efficient Eager Management of Conflicts for Scalable Hardware Transactional

Memory

Abstract: The efficient management of conflicts among concurrent transactions constitutes a key aspect

that hardware transactional memory (HTM) systems must achieve. Scalable HTM proposals so far inherit the cache-based style of conflict detection typically found in bus-based systems, largely unaware of the

interactions between transactions and directory coherence. In this paper, we demonstrate that the

traditional approach of detecting conflicts at the private cache levels is inefficient when used in the context of a directory protocol. We find that the use of the directory as a mere router of coherence

requests restricts the throughput of conflict detection, and show how it becomes a bottleneck under high

contention. This paper proposes a scheme for conflict detection that decouples conflict detection from cache coherence in order to overcome pathological situations that degrade the performance of an eager

HTM system. Our scheme places bookkeeping metadata at the directory, introducing it as a separate

hardware module that leaves the coherence protocol unmodified. In comparison to a state-of-the-art eager

HTM system, our design handles contention more efficiently, minimizes the performance degradation of false positives for signatures of similar hardware cost, and reduces the network traffic generated.

ETPL

PDS-005 High Performance Resource Allocation Strategies for Computational Economies

Abstract: Utility computing models have long been the focus of academic research, and with the recent

success of commercial cloud providers, computation and storage are finally being realized as the fifth

utility. Computational economies are often proposed as an efficient means of resource allocation; however, adoption has been limited by poor performance and high overheads. In this paper, we

address the performance limitations of existing economic allocation models by defining strategies to

reduce the failure and reallocation rate, increase occupancy and thereby increase the obtainable utilization of the system. The high-performance resource utilization strategies presented can be used by market

participants without requiring dramatic changes to the allocation protocol. The strategies considered

include overbooking, advanced reservation, just-in-time bidding, and using substitute providers for

service delivery. The proposed strategies have been implemented in a distributed metascheduler and evaluated with respect to Grid and cloud deployments. Several diverse synthetic workloads have been

used to quantify both the performance benefits and economic implications of these strategies.

ETPL

PDS-006 Mapping a Jacobi Iterative Solver onto a High-Performance Heterogeneous Computer

Abstract: High-performance heterogeneous computers that employ field programmable gate arrays

(FPGAs) as computational elements are known as high-performance reconfigurable computers (HPRCs). For floating-point applications, these FPGA-based processors must satisfy a variety of heuristics and rules

of thumb to achieve a speedup compared with their software counterparts. By way of a simple sparse

matrix Jacobi iterative solver, this paper illustrates some of the issues associated with mapping floating-point

kernels onto HPRCs. The Jacobi method was chosen based on heuristics developed from earlier

research. Furthermore, Jacobi is relatively easy to understand, yet is complex enough to illustrate the

mapping issues. This paper is not trying to demonstrate the speedup of a particular application nor is it suggesting that Jacobi is the best way to solve equations. The results demonstrate a nearly threefold wall

clock runtime speedup when compared with a software implementation. A formal analysis shows that

these results are reasonable. The purpose of this paper is to illuminate the challenging floating-point mapping process while simultaneously showing that such mappings can result in significant speedups.

The ideas revealed by research such as this have already been and should continue to be used to facilitate

a more automated mapping process.
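
As a reminder of the kernel being mapped, the sketch below is a plain software Jacobi iteration for a small dense system Ax = b. It is illustrative only: the paper targets a sparse solver on FPGA hardware, whereas this example uses dense Python lists, a fixed iteration count, and a hypothetical function name.

```python
def jacobi_solve(a, b, iterations=50):
    """Approximate the solution of a x = b with the Jacobi method.

    Assumes a is square with a nonzero diagonal and that the iteration
    converges (for example, when a is strictly diagonally dominant).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = []
        for i in range(n):
            # Every component is updated from the previous iterate only,
            # so all n updates are independent and can run in parallel.
            off_diag = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - off_diag) / a[i][i])
        x = x_new
    return x


if __name__ == "__main__":
    print(jacobi_solve([[4.0, 1.0], [2.0, 5.0]], [9.0, 12.0]))
```

The independence of the per-row updates is what makes the method a natural candidate for pipelined floating-point hardware.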

ETPL

PDS-007 MIN-MAX: A Counter-Based Algorithm for Regular Expression Matching

Abstract: We propose an NFA-based algorithm called MIN-MAX to support matching of regular expressions (regexp) composed of Character Classes with Constraint Repetitions (CCR). MIN-MAX is

well suited for massive parallel processing architectures, such as FPGAs, yet it is effective on any other

computing platform. In MIN-MAX, each active CCR engine (to implement one CCR term) evaluates input characters, updates (MIN, MAX) counters, and asserts control signals, and all the CCR engines

implemented in the FPGA run simultaneously. Unlike traditional designs, (MIN, MAX) counters contain

dynamically updated lower and upper bounds of possible matching counts, instead of actual matching

counts, so that feasible matching lengths are compactly enclosed in the counter value. The counter-based design can support constraint repetitions of n using O(log n) memory bits rather than the O(n) bits needed in

existing solutions. MIN-MAX can resolve character class ambiguity between adjacent CCR terms and

support overlapped matching when matching collisions are absent. We developed a set of heuristic rules to assess the absence of collision for CCR-based regexps, and tested them on Snort and SpamAssassin

rule sets. The results show that the vast majority of rules are immune from collisions, so that MIN-MAX

can cost-effectively support overlapped matching. As a bonus, the new architecture also supports fast reconfiguration via ordinary memory writes rather than resynthesis of the entire design, which is critical

for time-sensitive regexp deployment scenarios.

ETPL

PDS-008 Network Traffic Classification Using Correlation Information

Abstract: Traffic classification has wide applications in network management, from security monitoring

to quality of service measurements. Recent research tends to apply machine learning techniques to flow

statistical feature based classification methods. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages: it requires no

training procedure, carries no risk of overfitting parameters, and naturally handles a huge number

of classes. However, the performance of the NN classifier can be severely affected if the training data set is small. In this paper, we propose a novel nonparametric approach for traffic classification, which can

improve the classification performance effectively by incorporating correlated information into the

classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world

traffic data sets to validate the proposed approach. The results show the traffic classification performance

can be improved significantly even under the extremely difficult circumstance of very few training samples.
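
The abstract does not say how correlated information enters the classifier. Purely as an assumed illustration, the sketch below groups flows that are believed to belong to the same application, classifies each flow with a plain nearest-neighbor rule, and assigns the majority label to the whole group. The grouping rule, feature vectors, labels, and function names are all hypothetical.

```python
from collections import Counter
from math import dist


def nearest_neighbor_label(sample, training):
    """Label of the training flow closest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    return min(training, key=lambda item: dist(sample, item[0]))[1]


def classify_correlated_flows(flows, training):
    """Assign one label to a group of flows assumed to share an application.

    Each flow votes with its own nearest-neighbor label; the majority wins.
    """
    votes = Counter(nearest_neighbor_label(flow, training) for flow in flows)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [((0.0, 0.0), "web"), ((10.0, 10.0), "p2p")]
    group = [(1.0, 1.0), (2.0, 1.0), (6.0, 6.0)]
    print(classify_correlated_flows(group, training))
```

With only two training samples the third flow is misclassified on its own, but the group vote still yields a single consistent label, which is the kind of benefit one would hope for when training data are scarce.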




ETPL

PDS-009 Online Real-Time Task Scheduling in Heterogeneous Multicore System-on-a-Chip

Abstract: Online task scheduling in heterogeneous multicore system-on-a-chip is a challenging problem due to precedence constraints and nonpreemptive task execution in the synergistic processor core. This

study first proposes an online heterogeneous dual-core scheduling framework for dynamic workloads

with real-time constraints. The general purpose processor core and the synergistic processor core are dedicated to separate schedulers with different scheduling policies, and precedence constraints among

tasks are dealt with through interaction between the two schedulers. This framework is also configurable

for low priority inversion and high system utilization. We then extend this framework to heterogeneous

multicore systems with well-known dispatcher schemas. This paper presents a real case study to show the practicability of the proposed methodology, and presents a series of extensive simulations to obtain

comparison studies using different workloads and scheduling algorithms.

ETPL

PDS-010

Scalable and Secure Sharing of Personal Health Records in Cloud Computing Using

Attribute-Based Encryption

Abstract: Personal health record (PHR) is an emerging patient-centric model of health information

exchange, which is often outsourced to be stored at a third party, such as cloud providers. However, there have been wide privacy concerns as personal health information could be exposed to those third party

servers and to unauthorized parties. To assure the patients' control over access to their own PHRs, it is a

promising method to encrypt the PHRs before outsourcing. Yet, issues such as risks of privacy exposure, scalability in key management, flexible access, and efficient user revocation, have remained the most

important challenges toward achieving fine-grained, cryptographically enforced data access control. In

this paper, we propose a novel patient-centric framework and a suite of mechanisms for data access

control to PHRs stored in semitrusted servers. To achieve fine-grained and scalable data access control for PHRs, we leverage attribute-based encryption (ABE) techniques to encrypt each patient's PHR file.

Different from previous works in secure data outsourcing, we focus on the multiple data owner scenario,

and divide the users in the PHR system into multiple security domains, which greatly reduces the key management complexity for owners and users. A high degree of patient privacy is guaranteed

simultaneously by exploiting multiauthority ABE. Our scheme also enables dynamic modification of

access policies or file attributes, and supports efficient on-demand user/attribute revocation and break-glass access under emergency scenarios. Extensive analytical and experimental results are presented which

show the security, scalability, and efficiency of our proposed scheme.

ETPL

PDS-011 Strategies for Energy-Efficient Resource Management of Hybrid Programming Models

Abstract: Many scientific applications are programmed using hybrid programming models that use both

message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using

software-controlled execution schemes that consider both the programming model and the power-aware

execution capabilities of the system. However, such approaches have focused on identifying optimal

resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, and thus the challenge, increases substantially when optimizing hybrid models

since the possible resource configurations increase exponentially. Nonetheless, with the accelerating

adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid



parallel applications on large-scale systems. In this work, we present new software-controlled execution

schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and

frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and

time requirements under different concurrency and frequency configurations. We apply our models and

methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some

performance gain (up to 7.5 percent) or negligible performance loss.

ETPL

PDS-012

Supporting HPC Analytics Applications with Access Patterns Using Data

Restructuring and Data-Centric Scheduling Techniques in MapReduce

Abstract: Current High Performance Computing (HPC) applications have seen an explosive growth in the

size of data in recent years. Many application scientists have initiated efforts to integrate data-intensive computing into computational-intensive HPC facilities, particularly for data analytics. We have observed

several scientific applications which must migrate their data from an HPC storage system to a data-intensive

one for analytics. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization must

be performed before existing data-intensive tools such as MapReduce can be used to analyze data. This

reorganization requires at least two complete scans through the data set and then at least one MapReduce

program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application, in the form of excessive I/O operations. That is, for every MapReduce phase,

a distributed read and write operation on the file system must be performed. Our contribution is to

develop a MapReduce-based framework for HPC analytics to eliminate the multiple scans and also reduce the number of data preprocessing MapReduce programs. We also implement a data-centric scheduler to

further improve the performance of HPC analytics MapReduce programs by maintaining the data locality.

We have added additional expressiveness to the MapReduce language to allow application scientists to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple

data preprocessing MapReduce programs, and 2) the data can be simultaneously reorganized as it is

migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with

Access Patterns (MRAP), we have demonstrated up to 33 percent throughput improvement in one real application, and up to 70 percent in an I/O kernel of another application. Our results for scheduling show

up to 49 percent improvement for an I/O kernel of a prevalent HPC analysis application.

ETPL

PDS-013

Thermal and Energy Management of High-Performance Multicores: Distributed and

Self-Calibrating Model-Predictive Controller

Abstract: As a result of technology scaling, single-chip multicore power density increases and its spatial

and temporal workload variation leads to temperature hot-spots, which may cause nonuniform ageing and accelerated chip failure. These critical issues can be tackled by closed-loop thermal and reliability

management policies. Model predictive controllers (MPC) outperform classic feedback controllers since

they are capable of minimizing performance loss while enforcing safe working temperature. Unfortunately, MPC controllers rely on a priori knowledge of thermal models and their complexity

exponentially grows with the number of controlled cores. In this paper, we present a scalable, fully

distributed, energy-aware thermal management solution for single-chip multicore platforms. The model-predictive

controller complexity is drastically reduced by splitting it into a set of simpler interacting controllers, each one allocated to a core in the system. Locally, each node selects the optimal frequency to



meet temperature constraints while minimizing the performance penalty and system energy. Comparable

performance with state-of-the-art MPC controllers is achieved by letting controllers exchange a limited

amount of information at runtime on a neighborhood basis. In addition, we address model uncertainty by supporting learning of the thermal model with a novel distributed self-calibration approach that matches

the controller architecture well.

ETPL

PDS-014 Topology Abstraction Service for IP-VPNs

Abstract: VPN service providers (VSP) and IP-VPN customers have traditionally maintained service

demarcation boundaries between their routing and signaling entities. This has resulted in the VPNs viewing the VSP network as an opaque entity and therefore limiting any meaningful interaction between

the VSP and the VPNs. The purpose of this research is to address this issue by enabling a VSP to share its

core topology information with the VPNs through a novel topology abstraction (TA) service which is both practical and scalable in the context of managed IP-VPNs. The TA service provides tunable visibility into the

state of the VSP's network, leading to better VPN performance. A key challenge of the TA service is to

generate TA with relevant network resource information for each VPN in an accurate and fair manner. We develop three decentralized schemes for generating TAs with different performance characteristics.

These decentralized schemes achieve improved call performance, fair resource sharing for VPNs, and

higher network utilization for the VSP. We validate the idea of the VPN TA service and study the

performance of the proposed techniques using various simulation scenarios over several topologies.

ETPL

PDS-015

A Secure Payment Scheme with Low Communication and Processing Overhead for

Multihop Wireless Networks

Abstract: We propose RACE, a report-based payment scheme for multihop wireless networks to stimulate

node cooperation, regulate packet transmission, and enforce fairness. The nodes submit lightweight

payment reports (instead of receipts) to the accounting center (AC) and temporarily store undeniable security tokens called Evidences. The reports contain the alleged charges and rewards without security

proofs, e.g., signatures. The AC can verify the payment by investigating the consistency of the reports,

and clear the payment of the fair reports with almost no processing overhead or cryptographic operations. For cheating reports, the Evidences are requested to identify and evict the cheating nodes that submit

incorrect reports. Instead of requesting the Evidences from all the nodes participating in the cheating

reports, RACE can identify the cheating nodes by requesting only a few Evidences. Moreover, an Evidence

aggregation technique is used to reduce the Evidences' storage area. Our analytical and simulation results demonstrate that RACE requires much less communication and processing overhead than the existing

receipt-based schemes with acceptable payment clearance delay and storage area. This is essential for the

effective implementation of a payment scheme because it uses micropayment and the overhead cost should be much less than the payment value. Moreover, RACE can secure the payment and precisely

identify the cheating nodes without false accusations.

ETPL

PDS-016

Analysis of Distance-Based Location Management in Wireless Communication

Networks

Abstract: The performance of dynamic distance-based location management schemes (DBLMS) in



wireless communication networks is analyzed. A Markov chain is developed as a mobility model to

describe the movement of a mobile terminal in 2D cellular structures. The paging area residence time is

characterized for arbitrary cell residence time by using the Markov chain. The expected number of paging area boundary crossings and the cost of the distance-based location update method are analyzed by using

the classical renewal theory for two different call handling models. For the call plus location update

model, two cases are considered. In the first case, the intercall time has an arbitrary distribution and the cell residence time has an exponential distribution. In the second case, the intercall time has a hyper-

Erlang distribution and the cell residence time has an arbitrary distribution. For the call without location

update model, both intercall time and cell residence time can have arbitrary distributions. Our analysis makes it possible to find the optimal distance threshold that minimizes the total cost of location

management in a DBLMS.

ETPL

PDS-017

Cluster-Based Certificate Revocation with Vindication Capability for Mobile Ad Hoc

Networks

Abstract: Mobile ad hoc networks (MANETs) have attracted much attention due to their mobility and

ease of deployment. However, their wireless and dynamic nature renders them more vulnerable to various types of security attacks than wired networks. The major challenge is to guarantee secure network

services. To meet this challenge, certificate revocation is an important integral component to secure

network communications. In this paper, we focus on the issue of certificate revocation to isolate attackers

from further participating in network activities. For quick and accurate certificate revocation, we propose the Cluster-based Certificate Revocation with Vindication Capability (CCRVC) scheme. In particular, to

improve the reliability of the scheme, we recover the warned nodes to take part in the certificate

revocation process; to enhance the accuracy, we propose a threshold-based mechanism to assess whether warned nodes are legitimate and to vindicate them before recovering them. The performance of our

scheme is evaluated by both numerical and simulation analysis. Extensive results demonstrate that the

proposed certificate revocation scheme is effective and efficient to guarantee secure communications in mobile ad hoc networks.

ETPL

PDS-018 Coloring-Based Inter-WBAN Scheduling for Mobile Wireless Body Area Networks

Abstract: In this study, random incomplete coloring (RIC) with low time-complexity and high spatial

reuse is proposed to overcome in-between wireless-body-area-networks (WBAN) interference, which can

cause serious throughput degradation and energy waste. Interference-avoidance scheduling of wireless

networks can be modeled as a problem of graph coloring. For instance, high spatial-reuse scheduling for a dense sensor network is mapped to high spatial-reuse coloring; fast convergence scheduling for a mobile

ad hoc network (MANET) is mapped to low time-complexity coloring. However, for a dense and mobile

WBAN, inter-WBAN scheduling (IWS) should simultaneously satisfy both of the following requirements: 1) high spatial-reuse and 2) fast convergence, which are tradeoffs in conventional coloring.

By relaxing the coloring rule, the proposed distributed coloring algorithm RIC avoids this tradeoff and

satisfies both requirements. Simulation results verify that the proposed coloring algorithm effectively overcomes inter-WBAN interference and invariably supports higher system throughput in various mobile

WBAN scenarios compared to conventional colorings.
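
To make the graph-coloring view of scheduling concrete, the sketch below shows a conventional greedy (complete) coloring of an interference graph, where each color stands for a transmission slot and interfering WBANs never share one. This is the kind of baseline the abstract compares against, not the proposed RIC algorithm; the data layout and function name are assumptions for illustration.

```python
def greedy_coloring(neighbors):
    """Color an interference graph so that adjacent nodes differ.

    neighbors maps each WBAN id to the set of WBAN ids it interferes with.
    Returns a dict from WBAN id to a slot index (color), starting at 0.
    """
    colors = {}
    # Visit the most constrained (highest-degree) nodes first.
    for node in sorted(neighbors, key=lambda n: len(neighbors[n]), reverse=True):
        used = {colors[m] for m in neighbors[node] if m in colors}
        color = 0
        while color in used:
            color += 1  # take the smallest slot not used by any neighbor
        colors[node] = color
    return colors


if __name__ == "__main__":
    graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(greedy_coloring(graph))
```

Fewer colors mean higher spatial reuse, since more WBANs transmit in the same slot, while the number of rounds needed to settle on a coloring determines how quickly the schedule converges under mobility.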

ETPL

PDS-019

Cross-Layer Design of Congestion Control and Power Control in Fast-Fading Wireless

Networks




Abstract: We study the cross-layer design of congestion control and power allocation with outage

constraint in interference-limited multihop wireless networks. Using a complete-convexification

method, we first propose a message-passing distributed algorithm that can attain the global optimal source rate and link power allocation. Despite the attractiveness of its optimality, this algorithm requires larger

message size than that of the conventional scheme, which increases network overheads. Using the bounds

on outage probability, we map the outage constraint to an SIR constraint and continue developing a practical near-optimal distributed algorithm requiring only local SIR measurement at link receivers to

limit the size of the message. Due to the complicated complete-convexification method, however, the

congestion control of both algorithms no longer preserves the existing TCP stack. To preserve the TCP stack, we propose a third algorithm using a successive convex

approximation method to iteratively transform the original nonconvex problem into approximated convex

problems, so that the global optimal solution can be reached in a distributed manner with message passing. Thanks to

the tightness of the bounds and successive approximations, numerical results show that the gap among the three algorithms is almost indistinguishable. Although the first two use the same type of complete-convexification

method, the numerical comparison shows that the second near-optimal scheme has a faster convergence

rate than that of the first optimal one, which makes the near-optimal scheme more favorable and applicable in practice. Meanwhile, the third optimal scheme also has a faster convergence rate than that of a previous

work using logarithm successive approximation method.

ETPL

PDS-020 Distributed Data Replenishment

Abstract: We propose a distributed data replenishment mechanism for some distributed peer-to-peer-based

storage systems that automates the process of maintaining a sufficient level of data redundancy to ensure the availability of data in the presence of peer departures and failures. The dynamics of peers entering

and leaving the network are modeled as a stochastic process. A novel analytical time-backward technique

is proposed to bound the expected time for a piece of data to remain in P2P systems. Both theoretical and simulation results are in agreement, indicating that the data replenishment via random linear network

coding (RLNC) outperforms other popular strategies. Specifically, we show that the expected time for a

piece of data to remain in a P2P system, the longer the better, is exponential in the number of peers used

to store the data for the RLNC-based strategy, while it is quadratic for the other strategies.

ETPL

PDS-021 Distributed k-Core Decomposition

Abstract: Several novel metrics have been proposed in recent literature in order to study the relative importance of nodes in complex networks. Among those, k-coreness has found a number of applications

in areas as diverse as sociology, proteinomics, graph visualization, and distributed system analysis and

design. This paper proposes new distributed algorithms for the computation of the k-coreness of a network, a process also known as k-core decomposition. This technique 1) allows the decomposition,

over a set of connected machines, of very large graphs, when size does not allow storing and processing

them on a single host, and 2) enables the runtime computation of k-cores in “live” distributed systems.

Lower bounds on the algorithms' complexity are given, and an exhaustive experimental analysis on real-world data sets is provided.
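
To make the idea of k-core decomposition by message passing concrete, the sketch below simulates synchronous rounds in a single process. Each node starts with its degree as an estimate of its coreness and repeatedly lowers it to the largest k for which at least k neighbours report an estimate of k or more. The function names and the adjacency-dictionary input are assumptions made for illustration; this is not taken from the paper, whose algorithms may differ.

```python
def local_estimate(own, neighbour_estimates):
    """Largest k <= own such that at least k neighbours have estimate >= k."""
    for k in range(own, 0, -1):
        if sum(1 for e in neighbour_estimates if e >= k) >= k:
            return k
    return 0


def coreness(adj):
    """Simulate synchronous rounds until no estimate changes.

    adj: dict mapping node -> set of neighbours (undirected graph).
    """
    est = {v: len(ns) for v, ns in adj.items()}  # initial estimate: degree
    while True:
        # Every node reads the estimates its neighbours held in the previous round.
        new = {v: local_estimate(est[v], [est[u] for u in adj[v]]) for v in adj}
        if new == est:
            return est
        est = new


# Hypothetical input: triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
result = coreness(graph)
```

In a real deployment each node would run `local_estimate` on its own host and exchange estimates with neighbours over the network; the loop above only stands in for those rounds.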




ETPL

PDS-022 Dynamic Coverage of Mobile Sensor Networks

Abstract: We study the dynamic aspects of the coverage of a mobile sensor network resulting from continuous movement of sensors. As sensors move around, initially uncovered locations may be covered

at a later time, and intruders that might never be detected in a stationary sensor network can now be

detected by moving sensors. However, this improvement in coverage is achieved at the cost that a location is covered only part of the time, alternating between covered and not covered. We characterize

area coverage at specific time instants and during time intervals, as well as the time durations that a

location is covered and uncovered. We further consider the time it takes to detect a randomly located

intruder and prove that the detection time is exponentially distributed with parameter 2λrv̅s, where λ represents the sensor density, r represents the sensor's sensing range, and v̅s denotes the average sensor

speed. For mobile intruders, we take a game theoretic approach and derive optimal mobility strategies for

both sensors and intruders. We prove that the optimal sensor strategy is for each sensor to choose its direction uniformly at random in [0, 2π). The optimal intruder strategy is to remain stationary. This solution

represents a mixed strategy which is a Nash equilibrium of the zero-sum game between mobile sensors

and intruders.
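
As a worked illustration of the exponential detection time stated above: an exponential distribution with parameter 2λrv̅s has mean 1/(2λrv̅s) and cumulative probability 1 − exp(−2λrv̅s·t). The sketch below evaluates both. The numeric values and units are hypothetical and chosen only for illustration.

```python
import math


def detection_rate(density, sensing_range, mean_speed):
    """Rate parameter 2 * lambda * r * v_s of the detection-time distribution."""
    return 2.0 * density * sensing_range * mean_speed


def mean_detection_time(rate):
    """Mean of an exponential distribution with the given rate."""
    return 1.0 / rate


def prob_detected_by(t, rate):
    """P(T <= t) for an exponentially distributed detection time T."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical values: 0.01 sensors per square metre, 10 m range, 1 m/s speed.
rate = detection_rate(density=0.01, sensing_range=10.0, mean_speed=1.0)
mean_time = mean_detection_time(rate)   # seconds
p_within_mean = prob_detected_by(mean_time, rate)
```

With these assumed values, doubling the sensor density, the sensing range, or the average speed each halves the expected detection time.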

ETPL

PDS-023 Exploiting Ubiquitous Data Collection for Mobile Users in Wireless Sensor Networks

Abstract: We study the ubiquitous data collection for mobile users in wireless sensor networks. People with handheld devices can easily interact with the network and collect data. We propose a novel approach

for mobile users to collect the network-wide data. The routing structure of data collection is additively

updated with the movement of the mobile user. With this approach, we only perform a limited

modification to update the routing structure while the routing performance is bounded and controlled compared to the optimal performance. The proposed protocol is easy to implement. Our analysis shows

that the proposed approach is scalable in maintenance overheads, performs efficiently in the routing

performance, and provides continuous data delivery during the user movement. We implement the proposed protocol in a prototype system and test its feasibility and applicability on a 49-node testbed. We

further conduct extensive simulations to examine the efficiency and scalability of our protocol with varied

network settings.

ETPL

PDS-024 Fast Channel Zapping with Destination-Oriented Multicast for IP Video Delivery

Abstract: Channel zapping time is a critical quality of experience (QoE) metric for IP-based video delivery systems such as IPTV. An interesting zapping acceleration scheme based on time-shifted

subchannels (TSS) was recently proposed, which can ensure a zapping delay bound as well as maintain

the picture quality during zapping. However, the behaviors of the TSS-based scheme have not been fully studied yet. Furthermore, the existing TSS-based implementation adopts the traditional IP multicast,

which is not scalable for a large-scale distributed system. To address these issues, this paper makes

contributions in two aspects. First, we resort to theoretical analysis to understand the fundamental

properties of the TSS-based service model. We show that there exists an optimal subchannel data rate which minimizes the redundant traffic transmitted over subchannels. Moreover, we reveal a start-up

effect, where the existing operation pattern in the TSS-based model could violate the zapping delay

bound. With a solution proposed to resolve the start-up effect, we rigorously prove that a zapping delay



bound equal to the subchannel time shift is guaranteed by the updated TSS-based model. Second, we

propose a destination-oriented-multicast (DOM) assisted zapping acceleration (DAZA) scheme for a

scalable TSS-based implementation, where a subscriber can seamlessly migrate from a subchannel to the main channel after zapping without any control message exchange over the network. Moreover, the

subchannel selection in DAZA is independent of the zapping request signaling delay, resulting in

improved robustness and reduced messaging overhead in a distributed environment. We implement DAZA in ns-2 and multicast an MPEG-4 video stream over a practical network topology. Extensive

simulation results are presented to demonstrate the validity of our analysis and DAZA scheme.

ETPL

PDS-025 Gaussian versus Uniform Distribution for Intrusion Detection in Wireless Sensor Networks

Abstract: In a Wireless Sensor Network (WSN), intrusion detection is of significant importance in many

applications in detecting malicious or unexpected intruder(s). The intruder can be an enemy in a battlefield, or a malicious moving object in the area of interest. With uniform sensor deployment, the

detection probability is the same for any point in a WSN. However, some applications may require

different degrees of detection probability at different locations. For example, an intrusion detection application may need improved detection probability around important entities. Gaussian-distributed

WSNs can provide differentiated detection capabilities at different locations but related work is limited.

This paper analyzes the problem of intrusion detection in a Gaussian-distributed WSN by characterizing

the detection probability with respect to the application requirements and the network parameters under both single-sensing detection and multiple-sensing detection scenarios. Effects of different network

parameters on the detection probability are examined in detail. Furthermore, performance of Gaussian-

distributed WSNs is compared with uniformly distributed WSNs. This work allows us to analytically formulate detection probability in a random WSN and provides guidelines in selecting an appropriate

deployment strategy and determining critical network parameters.

ETPL

PDS-026 IDM: An Indirect Dissemination Mechanism for Spatial Voice Interaction in Networked Virtual Environments

Abstract: One type of Peer-to-Peer (P2P) live streaming has not yet been significantly investigated,

namely topologies that provide many-to-many, interactive connectivity. Exemplar applications of such P2P systems include spatial audio services for networked virtual environments (NVEs) and distributed

online games. Numerous challenging problems have to be overcome in such dynamic environments, among them

providing low delay, resilience to churn, effective load balancing, and rapid convergence. We

propose a novel P2P overlay dissemination mechanism, termed IDM, that can satisfy such demanding real-time requirements. Our target application is to provide spatialized voice support in multiplayer

NVEs, where each bandwidth constrained peer potentially communicates with all other peers within its

area-of-interest (AoI). With IDM each peer maintains a set of partners, termed helpers, which may act as stream forwarders. We prove analytically that the system reachability is maximized when the loads of

helpers are balanced proportionally to their network capacities. We then propose a game-theoretic

algorithm that balances the loads of the peers in a fully distributed manner. Of practical importance in dynamic systems, we prove that our algorithm converges to an approximately balanced state from any

prior state in rapid O(log log n) time, where n is the number of users. We further evaluate our technique

with simulations and show that it can achieve near optimal system reachability and satisfy the tight

latency constraints of interactive audio under conditions of churn, avatar mobility, and heterogeneous user access network bandwidth.




ETPL

PDS-027 In-Network Estimation with Delay Constraints in Wireless Sensor Networks

Abstract: The use of wireless sensor networks (WSNs) for closing the loops between the cyberspace and the physical processes is increasingly attractive and promising for future control systems. For some real-time

control applications, controllers need to accurately estimate the process state within rigid delay

constraints. In this paper, we propose a novel in-network estimation approach for state estimation with delay constraints in multihop WSNs. For accurately estimating a process state as well as satisfying rigid

delay constraints, we address the problem through jointly designing in-network estimation operations and

an aggregation scheduling algorithm. Our in-network estimation operation performed at relays not only

optimally fuses the estimates obtained from the different sensors but also predicts the upstream sensors' estimates which cannot be aggregated to the sink before deadlines. Our estimate aggregation

scheduling algorithm, which is interference free, is able to aggregate as much estimate information as

possible from the network to the sink within delay constraints. We prove the unbiasedness of in-network estimation and theoretically analyze the optimality of our approach. Our simulation results corroborate

our theoretical results and show that our in-network estimation approach can obtain significant estimation

accuracy gain under different network settings.

ETPL

PDS-028 IP-Geolocation Mapping for Moderately Connected Internet Regions

Abstract: Most IP-geolocation mapping schemes [14], [16], [17], [18] take a delay-measurement approach, based on the assumption of a strong correlation between networking delay and geographical distance

between the targeted client and the landmarks. In this paper, however, we investigate a large region of

moderately connected Internet and find the delay-distance correlation is weak. However, we discover a more

probable rule: with high probability, the shortest delay comes from the closest distance. Based on this closest-shortest rule, we develop a simple and novel IP-geolocation mapping scheme for moderately

connected Internet regions, called GeoGet. In GeoGet, we take a large number of webservers as passive

landmarks and map a targeted client to the geolocation of the landmark that has the shortest delay. We further use JavaScript at targeted clients to generate HTTP/Get probing for delay measurement. To

control the measurement cost, we adopt a multistep probing method to refine the geolocation of a targeted

client, finally to city level. The evaluation results show that when probing about 100 landmarks, GeoGet correctly maps 35.4 percent of clients to city level, which outperforms current schemes such as GeoLim [16]

and GeoPing [14] by 270 and 239 percent, respectively, and the median error distance in GeoGet is

around 120 km, outperforming GeoLim and GeoPing by 37 and 70 percent, respectively.
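
The closest-shortest rule and multistep probing described above can be sketched as follows. This is a minimal illustration rather than GeoGet itself. The two-step region-then-city refinement, the landmark records, the host names, and the delay values are all hypothetical, and the `probe` callable stands in for the browser-side HTTP GET measurement.

```python
def closest_shortest(landmarks, probe):
    """Return the landmark whose measured delay is smallest."""
    return min(landmarks, key=lambda lm: probe(lm["host"]))


def locate(landmarks_by_region, probe):
    """Two-step probing: pick a region first, then a city inside it."""
    # Step 1 (coarse): probe one representative landmark per region.
    representatives = [lms[0] for lms in landmarks_by_region.values()]
    region = closest_shortest(representatives, probe)["region"]
    # Step 2 (fine): probe every landmark in the chosen region.
    best = closest_shortest(landmarks_by_region[region], probe)
    return region, best["city"]


# Hypothetical landmarks and delays in milliseconds, for illustration only.
LANDMARKS = {
    "north": [
        {"host": "n1.example", "region": "north", "city": "City A"},
        {"host": "n2.example", "region": "north", "city": "City B"},
    ],
    "south": [
        {"host": "s1.example", "region": "south", "city": "City C"},
        {"host": "s2.example", "region": "south", "city": "City D"},
    ],
}
DELAYS_MS = {"n1.example": 48.0, "n2.example": 55.0,
             "s1.example": 21.0, "s2.example": 9.0}

estimate = locate(LANDMARKS, DELAYS_MS.__getitem__)
```

Probing representatives first keeps the number of measurements well below the total number of landmarks, which is the motivation the abstract gives for multistep probing.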

ETPL

PDS-029 Microarchitecture of a Coarse-Grain Out-of-Order Superscalar Processor

Abstract: We explore the design, implementation, and evaluation of a coarse-grain superscalar processor in the context of the microarchitecture of the Control Processor (CP) of the Multilevel Computing

Architecture (MLCA), a novel architecture targeted for multimedia multicore systems. The MLCA

augments a traditional multicore architecture (called the lower level) with a CP (called the top-level),

which automatically extracts parallelism among coarse-grain units of computation (tasks), synchronizes these tasks and schedules them for execution on processors. It does so in a fashion similar to how

instruction-level parallelism is extracted by superscalar processors, i.e., using register renaming, Out-of-

Order Execution (OoOE) and scheduling. The coarse-grain nature of tasks imposes challenging



constraints on the direct use of these techniques, but also offers opportunities for simpler designs. We

analyze the impact of these constraints and opportunities and present novel microarchitectural

mechanisms for coarse-grain superscalar execution, including register renaming, task queue, dynamic out-of-order scheduling and task-issue. We design an MLCA system around our CP microarchitecture and

implement it on an FPGA. We evaluate the system using multimedia applications and show good

scalability for eight processors, limited by the memory bandwidth of the FPGA platform. Furthermore, we show that the CP introduces little overhead in terms of resource usage. Finally, we show scalability

beyond eight processors using cycle-accurate RTL-level simulation with an idealized memory subsystem.

We demonstrate that the CP poses no performance bottlenecks and is scalable up to 32 processors.

ETPL

PDS-030 Mobi-Sync: Efficient Time Synchronization for Mobile Underwater Sensor Networks

Abstract: Time synchronization is an important requirement for many services provided by distributed networks. A lot of time synchronization protocols have been proposed for terrestrial Wireless Sensor

Networks (WSNs). However, none of them can be directly applied to Underwater Sensor Networks

(UWSNs). A synchronization algorithm for UWSNs must consider additional factors such as long propagation delays from the use of acoustic communication and sensor node mobility. These unique

challenges make the accuracy of synchronization procedures for UWSNs even more critical. Time

synchronization solutions specifically designed for UWSNs are needed to satisfy these new requirements.

This paper proposes Mobi-Sync, a novel time synchronization scheme for mobile underwater sensor networks. Mobi-Sync distinguishes itself from previous approaches for terrestrial WSNs by considering

spatial correlation among the mobility patterns of neighboring UWSN nodes. This enables Mobi-Sync to

accurately estimate the long dynamic propagation delays. Simulation results show that Mobi-Sync outperforms existing schemes in both accuracy and energy efficiency.

ETPL

PDS-031 Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

Abstract: This paper develops and evaluates search and optimization techniques for autotuning 3D stencil

(nearest neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for

heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula,

autogenerates tunable code from it, systematically searches for the best configuration and generates the

code with optimal parameter configurations for different GPUs. This autotuning approach guarantees

adaptive performance for different generations of GPUs while greatly enhancing programmer productivity. Experimental results show that the delivered floating point performance is very close to

previous handcrafted work and outperforms other autotuned stencil codes by a large margin. Furthermore,

heterogeneous GPU clusters are shown to exhibit the highest performance for dissimilar tuning parameters leveraging proportional partitioning relative to single-GPU performance.

ETPL

PDS-032 An Iterative Divide-and-Merge-Based Approach for Solving Large-Scale Least Squares Problems

Abstract: Singular value decomposition (SVD) is a popular decomposition method for solving least

squares estimation (LSE) problems. However, for large data sets, applying SVD directly on the

coefficient matrix is very time consuming and memory demanding in obtaining least squares solutions. In



this paper, we propose an iterative divide-and-merge-based estimator for solving large-scale LSE

problems. Iteratively, the LSE problem to be solved is processed and transformed to equivalent but

smaller LSE problems. In each iteration, the input matrices are subdivided into a set of small submatrices. Each submatrix is decomposed by SVD, the results are merged, and the resulting

matrices become the input of the next iteration. The process is iterated until the resulting matrices are

small enough to be solved directly and efficiently by SVD. The number of iterations required is determined dynamically according to the size of the input data set. As a result, the

requirements in time and space for finding least squares solutions are greatly reduced. Furthermore, the

decomposition and merging of the submatrices in each iteration can be independently done in parallel. The idea can be easily implemented in MapReduce and experimental results show that the proposed

approach can solve large-scale LSE problems effectively.

ETPL

PDS-033 Buffer Management for Aggregated Streaming Data with Packet Dependencies

Abstract: In many applications, the traffic traversing the network has interpacket dependencies due to

application-level encoding schemes. For some applications, e.g., multimedia streaming, dropping a single packet may render useless the delivery of a whole sequence. In such environments, the algorithm used to

decide which packet to drop in case of buffer overflows must be carefully designed, to avoid goodput

degradation. We present a model that captures such interpacket dependencies, and design algorithms for

performing packet discard. Traffic consists of an aggregation of multiple streams, each of which consists of a sequence of interdependent packets. We provide two guidelines for designing buffer management

algorithms, and demonstrate their effectiveness. We devise an algorithm according to these guidelines and

evaluate its performance analytically, using competitive analysis. We also perform a simulation study that shows that the performance of our algorithm is within a small fraction of the performance of the best

known offline algorithm.

ETPL

PDS-034 Design and Performance Evaluation of Overhearing-Aided Data Caching in Wireless Ad Hoc Networks

Abstract: The wireless ad hoc network is a promising networking technology for providing users with Internet

access anywhere anytime. To cope with resource constraints of wireless ad hoc networks, data caching is

widely used to efficiently reduce data access cost. In this paper, we propose an efficient data caching algorithm which makes use of the overhearing property of wireless communication to improve caching

performance. Due to the broadcast nature of wireless links, a packet can be overheard by a node within

the transmission range of the transmitter, even if the node is not the intended target. Our proposed algorithm exploits the overheard information, including data requests and data replies, to optimize cache

placement and cache discovery. To the best of our knowledge, this is the first work that considers the

overhearing property of wireless communications in data caching. The simulation results show that, compared with one representative algorithm and a naive overhearing algorithm, our proposed algorithm

can significantly reduce both message cost and access delay.

ETPL

PDS-035 Dynamic Optimization of Multiattribute Resource Allocation in Self-Organizing Clouds

Abstract: By leveraging virtual machine (VM) technology which provides performance and fault

isolation, cloud resources can be provisioned on demand in a fine grained, multiplexed manner rather than



in monolithic pieces. By integrating volunteer computing into cloud architectures, we envision a gigantic

self-organizing cloud (SOC) being formed to reap the huge potential of untapped commodity computing

power over the Internet. Toward this new architecture where each participant may autonomously act as both resource consumer and provider, we propose a fully distributed, VM-multiplexing resource

allocation scheme to manage decentralized resources. Our approach not only achieves maximized

resource utilization using the proportional share model (PSM), but also delivers provably and adaptively optimal execution efficiency. We also design a novel multiattribute range query protocol for locating

qualified nodes. Contrary to existing solutions which often generate bulky messages per request, our

protocol produces only one lightweight query message per task on the Content Addressable Network (CAN). It works effectively to find for each task its qualified resources under a randomized policy that

mitigates the contention among requesters. We show that the SOC with our optimized algorithms can improve

system throughput by 15-60 percent over a P2P Grid model. Our solution also exhibits

fairly high adaptability in a dynamic node-churning environment.
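
For readers unfamiliar with proportional sharing, the sketch below shows the generic idea behind a proportional share model: a resource's capacity is divided among competing tasks in proportion to their weights. The function name, task names, and numbers are hypothetical, and the paper's actual allocation scheme (which covers multiple resource attributes and optimizes execution efficiency) is not reproduced here.

```python
def proportional_shares(capacity, weights):
    """Split `capacity` among tasks in proportion to their weights.

    weights: dict mapping task name -> non-negative weight.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {task: capacity * w / total for task, w in weights.items()}


# Hypothetical example: 8 CPU units shared by two tasks weighted 1 and 3.
shares = proportional_shares(8.0, {"task_a": 1, "task_b": 3})
```

Because shares always sum to the full capacity, no resource is left idle while any task is waiting, which is one common motivation for proportional sharing.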

ETPL

PDS-036 Enabling Efficient WiFi-Based Vehicular Content Distribution

Abstract: For better road safety and driving experience, content distribution for vehicle users through

roadside Access Points (APs) becomes an important and promising complement to 3G and other cellular

networks. In this paper, we introduce Cooperative Content Distribution System for Vehicles (CCDSV)

which operates upon a network of infrastructure APs to collaboratively distribute contents to moving vehicles. CCDSV solves several important issues in a practical system, like the robustness to mobility

prediction errors, limited resources of APs and the shared content distribution. Our system organizes the

cooperative APs into a novel structure, namely, the contact map which is based on the vehicular contact patterns observed by APs. To fully utilize the wireless bandwidth provided by APs, we propose a

representative-based prefetching mechanism, in which a set of representative APs are carefully selected

and then share their prefetched data with others. The selection process explicitly takes into account each AP's storage capacity, storage status, inter-AP bandwidth, and traffic loads on the backhaul links. We

apply network coding in CCDSV to augment the distribution of shared contents. The selection of shared

contents to be prefetched on an AP is based on the storage status of neighboring APs in the contact map

in order to increase the information utility of each prefetched data piece. Through extensive simulations, CCDSV proves its effectiveness in vehicular content distribution under various scenarios.

ETPL

PDS-037 Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems

Abstract: Most existing global-snapshot algorithms in distributed systems use control messages to

coordinate the construction of a global snapshot among all processes. Since these algorithms typically

assume the underlying logical overlay topology is fully connected, the number of control messages exchanged among all processes is proportional to the square of the number of processes, resulting in

a higher possibility of network congestion. Hence, such algorithms are neither efficient nor scalable for a

large-scale distributed system composed of a huge number of processes. Recently, some efforts have been presented to significantly reduce the number of control messages, but doing so incurs higher response

time instead. In this paper, we propose an efficient global-snapshot algorithm able to let every process

finish its local snapshot in a given number of rounds. Particularly, such an algorithm allows a tradeoff

between the response time and the message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that identical steps are executed by every process. This means that our algorithm



is able to achieve better workload balance and less network congestion. Most importantly, based on our

framework, we demonstrate that the minimum number of control messages required by a symmetrical

global-snapshot algorithm is Ω(N log N), where N is the number of processes. Finally, we also assume non-FIFO channels.

ETPL

PDS-038 Hardware Signature Designs to Deal with Asymmetry in Transactional Data Sets

Abstract: Transactional Memory (TM) systems must track memory accesses made by concurrent

transactions in order to detect conflicts. Many TM implementations use signatures for this purpose, which

summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexisting conflicts). Signatures are commonly implemented as two separate same-sized Bloom filters,

one for reads and the other for writes. In contrast, transactions frequently exhibit read and write sets of

uneven cardinality. This mismatch between data sets and filter storage introduces inefficiencies in the use of signatures that have some impact on performance. This paper presents different signature designs as

alternatives to the common scheme to deal with the asymmetry in transactional data sets in an effective

way. Basically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses only one Bloom filter to track both read and write sets, while the second

class uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a

thorough study of these alternative signature designs, including a statistical analysis of false positives and

an experimental evaluation, providing performance results and hardware area, time and energy requirements.
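
The baseline scheme described above, with separate Bloom filters for reads and writes, can be sketched in software as follows. This is only an illustration. The class and method names are hypothetical, SHA-256 is a stand-in for the hardware hash functions a real signature would use, and the filter sizes are arbitrary. Passing different sizes for the two filters mimics the asymmetric idea of giving reads and writes unequal storage.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.word = 0  # the fixed-size bit register, held in a Python int

    def _positions(self, addr):
        # Derive `hashes` bit positions from the address (software stand-in).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, addr):
        for p in self._positions(addr):
            self.word |= 1 << p

    def may_contain(self, addr):
        # True for every inserted address; may also be True for others
        # (a false positive), but never False for an inserted address.
        return all((self.word >> p) & 1 for p in self._positions(addr))


class Signature:
    """Read and write sets of one transaction, summarized in two filters."""

    def __init__(self, read_bits, write_bits):
        self.reads = BloomFilter(read_bits)
        self.writes = BloomFilter(write_bits)

    def record_read(self, addr):
        self.reads.add(addr)

    def record_write(self, addr):
        self.writes.add(addr)

    def conflicts_with_remote_write(self, addr):
        # A remote write conflicts with our earlier reads or writes.
        return self.reads.may_contain(addr) or self.writes.may_contain(addr)

    def conflicts_with_remote_read(self, addr):
        # A remote read conflicts only with our earlier writes.
        return self.writes.may_contain(addr)


# Hypothetical sizes: more bits for reads than for writes.
sig = Signature(read_bits=96, write_bits=32)
sig.record_read(0x1000)
sig.record_write(0x2000)
```

Conflicts on recorded addresses are always reported. An address that was never recorded may occasionally be reported as well, which is the false-positive cost the abstract mentions.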

ETPL

PDS-039 Improve Efficiency and Reliability in Single-Hop WSNs with Transmit-Only Nodes

Abstract: Wireless Sensor Networks (WSNs) will play a significant role at the “edge” of the future

“Internet of Things.” In particular, WSNs with transmit-only nodes are attracting more attention due to

their advantages in supporting applications requiring dense and long-lasting deployment at a very low cost and energy consumption. However, the lack of receivers in transmit-only nodes renders most existing

MAC protocols invalid. Based on our previous study on WSNs with pure transmit-only nodes, this work

proposes a simple, yet cost effective and powerful single-hop hybrid WSN cluster architecture that contains not only transmit-only nodes but also standard nodes (with transceivers). Along with the hybrid

architecture, this work also proposes a new MAC layer protocol framework called Robust Asynchronous

Resource Estimation (RARE) that efficiently and reliably manages the densely deployed single-hop

hybrid cluster in a self-organized fashion. Through analysis and extensive simulations, the proposed framework is shown to meet or exceed the needs of most applications in terms of the data delivery

probability, QoS differentiation, system capacity, energy consumption, and reliability. To the best of our

knowledge, this work is the first that brings reliable scheduling to WSNs containing both nonsynchronized transmit-only nodes and standard nodes.

ETPL

PDS-040 Improving the Reliability of MPI Libraries via Message Flow Checking

Abstract: Distributed processing through ad hoc and sensor networks is having a major impact on scale and applications of computing. The creation of new cyber-physical services based on wireless sensor

devices relies heavily on how well communication protocols can be adapted and optimized to meet



quality constraints under limited energy resources. The IEEE 802.15.4 medium access control protocol

for wireless sensor networks can support energy efficient, reliable, and timely packet transmission by a

parallel and distributed tuning of the medium access control parameters. Such a tuning is difficult, because simple and accurate models of the influence of these parameters on the probability of successful

packet transmission, packet delay, and energy consumption are not available. Moreover, it is not clear

how to adapt the parameters to the changes of the network and traffic regimes by algorithms that can run on resource-constrained devices. In this paper, a Markov chain is proposed to model these relations by

simple expressions without giving up the accuracy. In contrast to previous work, the presence of limited

number of retransmissions, acknowledgments, unsaturated traffic, packet size, and packet copying delay due to hardware limitations is accounted for. The model is then used to derive a distributed adaptive

algorithm for minimizing the power consumption while guaranteeing a given successful packet reception

probability and delay constraints in the packet transmission. The algorithm does not require any

modification of the IEEE 802.15.4 medium access control and can be easily implemented on network devices. The algorithm has been experimentally implemented and evaluated on a testbed with off-the-

shelf wireless sensor devices. Experimental results show that the analysis is accurate, that the proposed

algorithm satisfies reliability and delay constraints, and that the approach reduces the energy consumption of the network under both stationary and transient conditions. Specifically, even if the number of devices

and traffic configuration change sharply, the proposed parallel and distributed algorithm allows the

system to operate close to its optimal state by estimating the busy channel and channel access probabilities. Furthermore, results indicate that the protocol reacts promptly to errors in the estimation of

the number of devices and in the traffic load that can appear due to device mobility. It is also shown that

the effect of imperfect channel and carrier sensing on system performance heavily depends on the traffic

load and limited range of the protocol parameters.

ETPL

PDS-041 Optimal Client-Server Assignment for Internet Distributed Systems

Abstract: We investigate an underlying mathematical model and algorithms for optimizing the

performance of a class of distributed systems over the Internet. Such a system consists of a large number of clients who communicate with each other indirectly via a number of intermediate servers. Optimizing

the overall performance of such a system then can be formulated as a client-server assignment problem

whose aim is to assign the clients to the servers in such a way to satisfy some prespecified requirements

on the communication cost and load balancing. We show that 1) the total communication load and load balancing are two opposing metrics, and consequently, their tradeoff is inherent in this class of distributed

systems; 2) in general, finding the optimal client-server assignment for some prespecified requirements on

the total load and load balancing is NP-hard; and therefore, 3) we propose a heuristic via relaxed convex optimization for finding an approximate solution. Our simulation results indicate that the proposed

algorithm outperforms other heuristics, including the popular Normalized Cuts

algorithm.

ETPL

PDS-042 Resilient Self-Compressive Monitoring for Large-Scale Hosting Infrastructures

Abstract: Large-scale hosting infrastructures have become the fundamental platforms for many real-world

systems such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, it is a challenging task to achieve both scalability and high precision while monitoring

a large number of intranode and internode attributes (e.g., CPU usage, free memory, free disk, internode

network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed

monitoring by performing online data compression to reduce remote data collection cost. RCM provides

failure resilience to achieve robust monitoring for dynamic distributed systems where host and network



failures are common. We have conducted extensive experiments using a set of real monitoring data from

NCSU's virtual computing lab (VCL), PlanetLab, a Google cluster, and real Internet traffic matrices. The

experimental results show that RCM can achieve up to 200 percent higher compression ratio and several orders of magnitude less overhead than the existing approaches.

ETPL

PDS-043 Service Provision Control in Federated Service Providing Systems

Abstract: Unlike in traditional P2P systems, individual nodes of a Federated Service Providing

(FSP) system play a more active role by offering a variety of domain-specific services. The service

provision control (SPC) problem is an important problem of the FSP system and will be tackled in this

paper within a stochastic optimization framework through several steps. The first step focuses on using stochastic differential equations (SDEs) to model and analyze the dynamic evolution of the service

demand. Driven by the SDE model, the expected future performance of an FSP system is analytically

evaluated in the second step. Step three utilizes the differential evolution (DE) algorithm to identify near-optimal service-providing policies for each node. The service subscription protocol is further proposed in

step four to help every node adjust its local policy in accordance with the services provided by other

nodes. The four steps together implement a complete solution of the SPC problem and will be called the SDE-based service-provision control (SSPC) mechanism in this paper. Experimental evaluation of the

mechanism has been reported in the paper. The results show that our approach is effective in tackling the

SPC problem and may therefore be suitable for many practical applications.

ETPL

PDS-044 Social Similarity Favors Cooperation: The Distributed Content Replication Case

Abstract: This paper explores how the degree of similarity within a social group can dictate the behavior

of the individual nodes, so as to best trade off the individual benefit against the social benefit. More specifically, we

investigate the impact of social similarity on the effectiveness of content placement and dissemination. We consider three schemes that represent well the spectrum of behavior-shaped content storage strategies:

the selfish, the self-aware cooperative, and the optimally altruistic ones. Our study shows that when the

social group is tight (high degree of similarity), the optimally altruistic behavior yields the best performance for both the entire group (by definition) and the individual nodes (contrary to typical

expectations). When the group is made up of members with almost no similarity, altruism or cooperation

cannot bring much benefit to either the group or the individuals and thus, selfish behavior emerges as the preferable choice due to its simplicity. Notably, from a theoretical point of view, our “similarity favors

cooperation” argument is in line with sociological interpretations of human altruistic behavior. On a more

practical note, the self-aware cooperative behavior could be adopted as an easy-to-implement distributed

alternative to the optimally altruistic one; it has close to the optimal performance for tight social groups and the additional advantage of not allowing mistreatment of any node, i.e., its induced content retrieval

cost is always smaller than the cost of the selfish strategy.

ETPL

PDS-045 SPOC: A Secure and Privacy-Preserving Opportunistic Computing Framework for Mobile-Healthcare Emergency

Abstract: With the pervasiveness of smart phones and the advance of wireless body sensor networks

(BSNs), mobile Healthcare (m-Healthcare), which extends the operation of healthcare providers into a

pervasive environment for better health monitoring, has attracted considerable interest recently. However, the flourishing of m-Healthcare still faces many challenges, including information security and privacy

preservation. In this paper, we propose a secure and privacy-preserving opportunistic computing

framework, called SPOC, for m-Healthcare emergency. With SPOC, smart phone resources including computing power and energy can be opportunistically gathered to process the computing-intensive

personal health information (PHI) during m-Healthcare emergency with minimal privacy disclosure.

Specifically, to leverage the PHI privacy disclosure and the high reliability of PHI processing and transmission in

m-Healthcare emergency, we introduce an efficient user-centric privacy access control in SPOC

framework, which is based on an attribute-based access control and a new privacy-preserving scalar product computation (PPSPC) technique, and allows a medical user to decide who can participate in the

opportunistic computing to assist in processing his overwhelming PHI data. Detailed security analysis

shows that the proposed SPOC framework can efficiently achieve user-centric privacy access control in m-Healthcare emergency. In addition, performance evaluations via extensive simulations demonstrate

SPOC's effectiveness in terms of providing highly reliable PHI processing and transmission while minimizing

the privacy disclosure during m-Healthcare emergency.

ETPL

PDS-046 A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation

Abstract: This paper presents a secure protocol for spontaneous wireless ad hoc networks that uses a

hybrid symmetric/asymmetric scheme and the trust between users in order to exchange the initial data and to exchange the secret keys that will be used to encrypt the data. Trust is based on the first visual contact

between users. Our proposal is a complete self-configured secure protocol that is able to create the

network and share secure services without any infrastructure. The network allows sharing resources and offering new services among users in a secure environment. The protocol includes all functions needed to

operate without any external support. We have designed and developed it in devices with limited

resources. Network creation stages are detailed and the communication, protocol messages, and network

management are explained. Our proposal has been implemented in order to test the protocol procedure and performance. Finally, we compare the protocol with other spontaneous ad hoc network protocols in

order to highlight its features and we provide a security analysis of the system.

ETPL

PDS-047 Bayesian-Inference-Based Recommendation in Online Social Networks

Abstract: In this paper, we propose a Bayesian-inference-based recommendation system for online social

networks. In our system, users share their content ratings with friends. The rating similarity between a pair of friends is measured by a set of conditional probabilities derived from their mutual rating history. A

user propagates a content rating query along the social network to his direct and indirect friends. Based on

the query responses, a Bayesian network is constructed to infer the rating of the querying user. We

develop distributed protocols that can be easily implemented in online social networks. We further propose using a prior distribution to cope with cold start and rating sparseness. The proposed algorithm is

evaluated using two different online rating data sets of real users. We show that the proposed

Bayesian-inference-based recommendation is better than the existing trust-based recommendations and is comparable to Collaborative Filtering (CF) recommendation. It allows flexible tradeoffs between

recommendation quality and recommendation quantity. We further show that an informative prior

distribution is indeed helpful to overcome cold start and rating sparseness.

ETPL

PDS-048 CDS-Based Virtual Backbone Construction with Guaranteed Routing Cost in Wireless Sensor Networks

Abstract: Inspired by the backbone concept in wired networks, a virtual backbone is expected to bring substantial benefits to routing in wireless sensor networks (WSNs). Virtual backbone construction based

on a Connected Dominating Set (CDS) is a competitive approach among the existing methods used to

establish a virtual backbone in WSNs. Traditionally, CDS size was the only factor considered in the CDS-based

approach. The motivation was that a smaller CDS leads to simplified network maintenance.

However, routing cost in terms of routing path length is also an important factor for virtual backbone

construction. In our research, both factors are taken into account. Specifically, we attempt to

devise a polynomial-time constant-approximation algorithm that leads to a CDS with bounded CDS size and guaranteed routing cost. We prove that, under the general graph model, there is no polynomial-time

constant-approximation algorithm unless P = NP. Under the Unit Disk Graph (UDG) model, we propose an

innovative polynomial-time constant-approximation algorithm, GOC-MCDS-C, that produces a CDS D whose size |D| is within a constant factor of that of the minimum CDS. In addition, for each node pair

u and v, there exists a routing path with all intermediate nodes in D and path length at most 7 · d(u, v),

where d(u, v) is the length of the shortest path between u and v. Our theoretical analysis and simulation results show that the distributed version of the proposed algorithm, GOC-MCDS-D, outperforms the

existing approaches.
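
The two properties discussed in PDS-048 (a set D that is both dominating and connected, and a bounded routing stretch through D) can be checked on a small graph. The sketch below is a minimal illustration, not the GOC-MCDS-C or GOC-MCDS-D algorithm; the helper names `is_connected_dominating_set` and `hop_distance` and the example ring topology are assumptions made here for illustration.

```python
from collections import deque


def is_connected_dominating_set(adj, d):
    """True if d dominates the graph and induces a connected subgraph.

    adj: dict mapping node -> list of neighbours (undirected graph)
    d:   candidate backbone, any iterable of nodes
    """
    d = set(d)
    if not d:
        return False
    # Domination: every node outside d needs at least one neighbour inside d.
    for node in adj:
        if node not in d and d.isdisjoint(adj[node]):
            return False
    # Connectivity: breadth-first search that never leaves d.
    start = next(iter(d))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in d and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == d


def hop_distance(adj, src, dst, relays=None):
    """Hop count from src to dst; intermediate nodes must lie in relays if given."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        if u != src and relays is not None and u not in relays:
            continue  # u may be an endpoint, but it may not forward traffic
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # dst is unreachable under the relay restriction


# Example: a ring of six nodes, 0-1-2-3-4-5-0, with backbone {0, 1, 2, 3}.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
backbone = {0, 1, 2, 3}

direct = hop_distance(ring, 3, 5)                     # shortest path in the graph
via_backbone = hop_distance(ring, 3, 5, backbone)     # path relayed only by D
print(is_connected_dominating_set(ring, backbone), direct, via_backbone)
```

In this example the backbone path from node 3 to node 5 is twice as long as the shortest path, which stays within the factor of 7 quoted in the abstract.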

ETPL

PDS-049 Characterization and Management of Popular Content in KAD

Abstract: The endeavor of this work is to study the impact of content popularity in a large-scale Peer-to-

Peer network, namely KAD. Based on an extensive measurement campaign, we pinpoint several

deficiencies of KAD in handling popular content and provide a series of improvements to address such

shortcomings. Our work reveals that keywords, which are associated with content, may become popular for two distinct reasons. First, we show that some keywords are intrinsically popular because they are

common to many disparate contents: in such cases we improve KAD by introducing a simple

mechanism that identifies stopwords. Then, we focus on keyword popularity that directly relates to popular content. We design and evaluate an adaptive load balancing mechanism that is backward

compatible with the original implementation of KAD. Our scheme features the following properties: 1) it

drives the process that selects the location of peers responsible for storing references to objects, based on

object popularity; 2) it solves problems related to saturated peers that would otherwise cause a significant drop in the diversity of references to objects; and 3) if coupled with a load-aware content search

procedure, it allows for a fairer and more efficient use of peer resources.

ETPL

PDS-050 Complete EAP Method: User Efficient and Forward Secure Authentication Protocol for IEEE 802.11 Wireless LANs

Abstract: It is necessary to authenticate users who attempt to access resources in Wireless Local Area

Networks (WLANs). Extensible Authentication Protocol (EAP) is an authentication framework widely used in WLANs. Authentication mechanisms built on EAP are called EAP methods. The requirements for

EAP methods in WLAN authentication have been defined in RFC 4017. To achieve user efficiency and

robust security, lightweight computation and forward secrecy, which are not included in RFC 4017, are desired in WLAN authentication. However, none of the EAP methods and authentication protocols designed for WLANs so

far satisfies all of the above properties. This manuscript presents a complete EAP method that

utilizes stored secrets and passwords to verify users so that it can 1) fully meet the requirements of RFC

4017, 2) provide for lightweight computation, and 3) allow for forward secrecy. In addition, we also demonstrate the security of our proposed EAP method with formal proofs.

ETPL

PDS-051 Coordinated Self-Configuration of Virtual Machines and Appliances Using a Model-Free Learning Approach

Abstract: Cloud computing has a key requirement for resource configuration in a real-time manner. In such virtualized environments, both virtual machines (VMs) and hosted applications need to be

configured on-the-fly to adapt to system dynamics. The interplay between the layers of VMs and

applications further complicates the problem of cloud configuration. Independent tuning of each aspect



may not lead to optimal system-wide performance. In this paper, we propose a framework, namely

CoTuner, for coordinated configuration of VMs and resident applications. At the heart of the framework

is a model-free hybrid reinforcement learning (RL) approach, which combines the advantages of Simplex method and RL method and is further enhanced by the use of system knowledge guided exploration

policies. Experimental results on Xen-based virtualized environments with TPC-W and TPC-C

benchmarks demonstrate that CoTuner is able to drive a virtual server cluster into an optimal or near-optimal configuration state on the fly, in response to changes in workload. It improves the system

throughput by more than 30 percent over independent tuning strategies. In comparison with the

coordinated tuning strategies based on basic RL or Simplex algorithm, the hybrid RL algorithm gains 25 to 40 percent throughput improvement.

ETPL

PDS-052 Exploiting Concurrency for Efficient Dissemination in Wireless Sensor Networks

ETPL

PDS-053 Exploiting Concurrency for Efficient Dissemination in Wireless Sensor Networks

Abstract: Wireless sensor networks (WSNs) can be successfully applied in a wide range of applications. Efficient data dissemination is a fundamental service which enables many useful high-level functions

such as parameter reconfiguration, network reprogramming, etc. Many current data dissemination

protocols employ network coding techniques to deal with packet losses. The coding overhead, however, becomes a bottleneck in terms of dissemination delay. We exploit the concurrency potential of sensor

nodes and propose MT-Deluge, a multithreaded design of a coding-based data dissemination protocol. By

separating the coding and radio operations into two threads and carefully scheduling their executions, MT-Deluge shortens the dissemination delay effectively. An incremental decoding algorithm is employed

to further improve MT-Deluge's performance. Experiments with 24 TelosB motes on four representative

topologies show that MT-Deluge shortens the dissemination delay by 25.5-48.6 percent compared to a

typical data dissemination protocol while keeping the merits of loss resilience.

ETPL

PDS-054 Fault Tolerance in Distributed Systems Using Fused Data Structures




Abstract: Replication is the prevalent solution to tolerate faults in large data structures hosted on

distributed servers. To tolerate f crash faults (dead/unresponsive data structures) among n distinct data

structures, replication requires f + 1 replicas of each data structure, resulting in nf additional backups. We present a solution, referred to as fusion, that uses a combination of erasure codes and selective replication

to tolerate f crash faults using just f additional fused backups. We show that our solution achieves O(n)

savings in space over replication. Further, we present a solution to tolerate f Byzantine faults (malicious data structures), that requires only nf + f backups as compared to the 2nf backups required by replication.

We explore the theory of fused backups and provide a library of such backups for all the data structures in

the Java Collection Framework. The theoretical and experimental evaluation confirms that the fused backups are space-efficient as compared to replication, while they cause very little overhead for normal

operation. To illustrate the practical usefulness of fusion, we use fused backups for reliability in Amazon's

highly available key-value store, Dynamo. While the current replication-based solution uses 300 backup

structures, we present a solution that only requires 120 backup structures. This results in savings in space as well as other resources such as power.
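
The backup counts quoted in PDS-054 can be tabulated for concrete values of n and f. The sketch below only restates the formulas given in the abstract (nf versus f backups for crash faults, 2nf versus nf + f for Byzantine faults); the function name `backup_counts` and the sample values are assumptions for illustration, and the code says nothing about how fused backups are actually built.

```python
def backup_counts(n, f):
    """Number of additional backups needed for n data structures and f faults.

    The formulas are the ones stated in the abstract; nothing else is modelled.
    """
    return {
        "crash": {"replication": n * f, "fusion": f},
        "byzantine": {"replication": 2 * n * f, "fusion": n * f + f},
    }


# Example values chosen only for illustration.
counts = backup_counts(n=10, f=2)
for fault_model, row in counts.items():
    print(fault_model, row["replication"], row["fusion"])
```

For crash faults the replication count grows with n while the fusion count does not, which is the source of the O(n) space savings claimed in the abstract.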

ETPL

PDS-055 Feasibility of Polynomial-Time Randomized Gathering for Oblivious Mobile Robots

Abstract: We consider the problem of gathering n anonymous and oblivious mobile robots, which

requires that all robots meet in finite time at a nonpredefined point. While the gathering problem cannot

be solved deterministically without assuming any additional capabilities for the robots, randomized approaches easily allow it to be solvable. However, the randomized solutions currently known have a

time complexity that is exponential in n with no additional assumption. This fact yields the following two

questions: Is it possible to construct a randomized gathering algorithm with polynomial expected time? If it is not possible, what is the minimal additional assumption necessary to obtain such an algorithm? In

this paper, we address these questions from the aspect of multiplicity-detection capabilities. We newly

introduce two weaker variants of multiplicity detection, called local-strong and local-weak multiplicity,

and investigate whether those capabilities permit a gathering algorithm with polynomial expected time or not. The contribution of this paper is to show that any algorithm only assuming local-weak multiplicity

detection takes exponential number of rounds in expectation. On the other hand, we can obtain a constant-

round gathering algorithm using local-strong multiplicity detection. These results imply that the two models of multiplicity detection are significantly different in terms of their computational power.

Interestingly, these differences disappear if we take one more assumption that all robots are scattered (i.e.,

no two robots stay at the same location) initially. We can obtain a gathering algorithm that takes a

constant number of rounds in expectation, assuming local-weak multiplicity detection and scattered initial configurations.

ETPL

PDS-056 Finding All Maximal Contiguous Subsequences of a Sequence of Numbers in O(1) Communication Rounds

Abstract: Given a sequence A of real numbers, we wish to find a list of all nonoverlapping contiguous

subsequences of A that are maximal. A maximal subsequence M of A has the property that no proper

subsequence of M has a greater sum of values. Furthermore, M may not be contained properly within any subsequence of A with this property. This problem has several applications in Computational Biology and

can be solved sequentially in linear time. We present a BSP/CGM algorithm that solves this problem

using p processors in O(|A|/p) time and O(|A|/p) space per processor. The algorithm uses a constant number of communication rounds of size at most O(|A|/p). Thus, the algorithm achieves linear speedup

and is highly scalable. To our knowledge, there are no previously known parallel BSP/CGM algorithms to

solve this problem.
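
The problem stated in PDS-056 can be made concrete with a small sequential sketch. The code below follows the left-to-right scan commonly attributed to Ruzzo and Tompa, which is an assumption about a suitable sequential baseline and not the BSP/CGM algorithm of the paper. The function name `maximal_subsequences` is hypothetical, and this simplified version searches its candidate list with a plain loop, so it does not guarantee linear time.

```python
def maximal_subsequences(scores):
    """Return inclusive (start, end) index pairs of all maximal subsequences.

    Each candidate is stored as [start, end, L, R], where L is the running
    total before the candidate starts and R the running total at its end.
    """
    found = []
    total = 0
    for i, s in enumerate(scores):
        left = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a candidate
        cand = [i, i, left, total]
        while True:
            # Find the rightmost earlier candidate whose L is below ours.
            j = len(found) - 1
            while j >= 0 and found[j][2] >= cand[2]:
                j -= 1
            if j < 0 or found[j][3] >= cand[3]:
                found.append(cand)  # nothing to merge with
                break
            # Otherwise extend leftwards over candidate j and drop j onwards,
            # then re-examine the enlarged candidate.
            cand = [found[j][0], cand[1], found[j][2], cand[3]]
            del found[j:]
    return [(start, end) for start, end, _, _ in found]


scores = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
for start, end in maximal_subsequences(scores):
    print(start, end, scores[start:end + 1])
```

For the sample input the scan reports three disjoint subsequences: the single 4, the single 3, and the run from index 4 to index 10.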

ETPL

PDS-057 Geocommunity-Based Broadcasting for Data Dissemination in Mobile Social Networks

Abstract: In this paper, we consider the issue of data broadcasting in mobile social networks (MSNets).

The objective is to broadcast data from a superuser to other users in the network. There are two main

challenges under this paradigm, namely 1) how to represent and characterize user mobility in realistic MSNets; 2) given the knowledge of regular users' movements, how to design an efficient superuser route

to broadcast data actively. We first explore several realistic data sets to reveal both geographic and social

regularities of human mobility, and further introduce the concepts of geocommunity and geocentrality into MSNet analysis. Then, we employ a semi-Markov process to model user mobility based on the

geocommunity structure of the network. Correspondingly, the geocentrality indicating the “dynamic user

density” of each geocommunity can be derived from the semi-Markov model. Finally, considering the

geocentrality information, we provide different route algorithms to cater to the superuser that wants to either minimize total duration or maximize dissemination ratio. To the best of our knowledge, this work is

the first to study data broadcasting in a realistic MSNet setting. Extensive trace-driven simulations show

that our approach consistently outperforms other existing superuser route design algorithms in terms of dissemination ratio and energy efficiency.

ETPL

PDS-058 LOBOT: Low-Cost, Self-Contained Localization of Small-Sized Ground Robotic Vehicles

Abstract: It is often important to obtain the real-time location of a small-sized ground robotic vehicle

when it performs autonomous tasks either indoors or outdoors. We propose and implement LOBOT, a

low-cost, self-contained localization system for small-sized ground robotic vehicles. LOBOT provides

accurate real-time, 3D positions in both indoor and outdoor environments. Unlike other localization schemes, LOBOT does not require external reference facilities, expensive hardware, careful tuning or

strict calibration, and is capable of operating under various indoor and outdoor environments. LOBOT

identifies the local relative movement through a set of integrated inexpensive sensors and effectively corrects the localization drift through infrequent GPS augmentation. Our empirical experiments at various temporal and

spatial scales show that LOBOT keeps the positioning error well under an accepted threshold.

ETPL

PDS-059 Lower Bound for Node Buffer Size in Intermittently Connected Wireless Networks

Abstract: We study the fundamental lower bound for node buffer size in intermittently connected wireless

networks. The intermittent connectivity is caused by the possibility of node inactivity due to some external constraints. We find that even with infinite channel capacity and node processing speed, buffer

occupation in each node does not approach zero in a static random network where each node keeps a

constant message generation rate. Given the condition that each node has the same probability p of being

inactive during each time slot, there exists a critical value p_c(λ) for this probability from a percolation-based perspective. When p < p_c(λ), the network is in the supercritical case, and there is an achievable

lower bound (in our paper, “achievable” means that node buffer size in networks can achieve the same

order as the lower bound by applying some transmission scheme) for the occupied buffer size of each node, which is asymptotically independent of the size of the network. If p > p_c(λ), the network is in the

subcritical case, and there is a tight lower bound Θ(√n) for buffer occupation, where n is the number of

nodes in the network.

ETPL

PDS-060 On-Chip Sensor Network for Efficient Management of Power Gating-Induced Power/Ground Noise in Multiprocessor System on Chip

Abstract: Reducing feature sizes and power supply voltage allows integrating more processing units

(PUs) on multiprocessor system on chip (MPSoC) to satisfy the increasing demands of applications.

However, it also makes MPSoC more susceptible to various reliability threats, such as high temperature

and power/ground (P/G) noise. As the scale and complexity of MPSoC continuously increase, monitoring and mitigating reliability threats at runtime could offer better performance, scalability, and flexibility for

MPSoC designs. In this paper, we propose a systematic approach, on-chip sensor network (SENoC), to

collaboratively predict, detect, report, and alleviate runtime threats in MPSoC. SENoC not only detects reliability threats and shares related information among PUs, but also plans and coordinates the reactions

of related PUs in MPSoC. SENoC is used to alleviate the impacts of simultaneous switching noise in

MPSoC's P/G network during power gating. Based on the detailed noise behaviors under different

scenarios derived by our circuit-level MPSoC P/G noise simulation and analysis platform, simulation results show that SENoC helps to achieve on average 26.2 percent performance improvement compared

with the traditional stop-go method with 1.4 percent area overhead in an 8*8-core MPSoC in 45 nm. An

architecture-level cycle-accurate simulator based on SystemC is implemented to study the performance of the proposed SENoC. By applying sophisticated scheduling techniques to optimize the total system

performance, a higher performance improvement of 43.5 percent is achieved for a set of real-life

applications.

ETPL

PDS-061 Robust Tracking of Small-Scale Mobile Primary User in Cognitive Radio Networks

Abstract: In cognitive radio networks (CRNs), secondary users must be able to accurately and reliably track the location of small-scale mobile primary users/devices (e.g., wireless microphones) in order to

efficiently utilize spatial spectrum opportunities, while protecting primary communications. However,

accurate tracking of the location of mobile primary users is difficult due mainly to the CR-unique

constraint, i.e., localization must rely solely on reported sensing results (i.e., measured primary signal strengths), which can easily be compromised by malicious sensors (or attackers). To cope with this

challenge, we propose a new framework, called Sequential mOnte carLo combIned with shadow-faDing

estimation (SOLID), for accurate, attack/fault-tolerant tracking of small-scale mobile primary users. The key idea underlying SOLID is to exploit the temporal shadow fading correlation in sensing results

induced by the primary user's mobility. Specifically, SOLID augments conventional Sequential Monte

Carlo (SMC)-based target tracking with shadow-fading estimation. By examining the shadow-fading gain between the primary transmitter and CRs/sensors, SOLID 1) significantly improves the accuracy of

primary tracking regardless of the presence/absence of attack, and 2) successfully masks the abnormal

sensing reports due to sensor faults or attacks, preserving localization accuracy and improving spatial

spectrum efficiency. Our extensive evaluation in realistic wireless fading environments shows that SOLID lowers localization error by up to 88 percent in the absence of attacks, and 89 percent in the

presence of the challenging “slow-poisoning” attack, compared to the conventional SMC-based tracking.




ETPL

PDS-062 Scheduling Sensor Data Collection with Dynamic Traffic Patterns

Abstract: The network traffic pattern of continuous sensor data collection often changes constantly over time due to the exploitation of temporal and spatial data correlations as well as the nature of condition-

based monitoring applications. In contrast to most existing TDMA schedules designed for a static

network traffic pattern, this paper proposes a novel TDMA schedule that is capable of efficiently collecting sensor data for any network traffic pattern and is thus well suited to continuous data collection

with dynamic traffic patterns. In the proposed schedule, the energy consumed by sensor nodes for any

traffic pattern is very close to the minimum required by their workloads given in the traffic pattern. The

schedule also allows the base station to conclude data collection as early as possible according to the traffic load, thereby reducing the latency of data collection. We present a distributed algorithm for

constructing the proposed schedule. We develop a mathematical model to analyze the performance of the

proposed schedule. We also conduct simulation experiments to evaluate the performance of different schedules using real-world data traces. Both the analytical and simulation results show that, compared

with existing schedules that are targeted at a fixed traffic pattern, our proposed schedule significantly

improves the energy efficiency and time efficiency of sensor data collection with dynamic traffic patterns.

ETPL

PDS-063 Secure SOurce-BAsed Loose Synchronization (SOBAS) for Wireless Sensor Networks

Abstract: We present the Secure SOurce-BAsed Loose Synchronization (SOBAS) protocol to securely synchronize the events in the network, without the transmission of explicit synchronization control

messages. In SOBAS, nodes use their local time values as a one-time dynamic key to encrypt each

message. In this way, SOBAS provides an effective dynamic en-route filtering mechanism, where the

malicious data is filtered from the network. With SOBAS, we are able to achieve our main goal of synchronizing events at the sink as quickly, as accurately, and as surreptitiously as possible. With loose

synchronization, SOBAS reduces the number of control messages needed for a WSN to operate, providing

the key benefits of reduced energy consumption and reduced opportunity for malicious nodes to eavesdrop, intercept, or become aware of the presence of the network. Albeit a loose synchronization

per se, SOBAS is also able to provide 7.24 μs clock precision given today's sensor technology, which is

much better than other comparable schemes (schemes that do not employ GPS devices). Also, we show

that by recognizing the need for and employing loose time synchronization, necessary synchronization can be provided to the WSN application using half of the energy needed for traditional schemes. Both

analytical and simulation results are presented to verify the feasibility of SOBAS as well as the energy

consumption of the scheme under normal operation and attack from malicious nodes.

ETPL

PDS-064 On Data Staging Algorithms for Shared Data Accesses in Clouds

Abstract: In this paper, we study the strategies for efficiently achieving data staging and caching on a set of vantage sites in a cloud system with minimum cost. Unlike traditional research, we do not intend

to identify the access patterns to facilitate future requests. Instead, with such information

presumably known in advance, our goal is to efficiently stage the shared data items to predetermined sites at advocated time instants to align with the patterns while minimizing the monetary costs for caching and

transmitting the requested data items. To this end, we follow the cost and network models in [1] and

extend the analysis to multiple data items, each with single or multiple copies. Our results show that



under homogeneous cost model, when the ratio of transmission cost and caching cost is low, a single copy

of each data item can efficiently serve all the user requests. In the multicopy situation, we also consider

the tradeoff between the transmission cost and caching cost by controlling the upper bounds of transmissions and copies. The upper bound can be given either on per-item basis or on all-item basis. We

present efficient optimal solutions based on dynamic programming techniques to all these cases provided

that the upper bound is polynomially bounded by the number of service requests and the number of distinct data items. In addition to the homogeneous cost model, we also briefly discuss this problem under

a heterogeneous cost model with some simple yet practical restrictions and present a 2-approximation

algorithm for the general case. We validate our findings by implementing a data staging solver and conducting extensive simulation studies on the behavior of the algorithms.

ETPL

PDS-065 WILL: Wireless Indoor Localization without Site Survey

Abstract: Indoor localization is of great importance for a range of pervasive applications, attracting many

research efforts in the past two decades. Most radio-based solutions require a process of site survey, in

which radio signatures are collected and stored for further comparison and matching. Site survey involves

intensive costs in manpower and time. In this work, we study unexploited RF signal characteristics and leverage user motions to construct the radio floor plan that was previously obtained by site survey. On this

basis, we design WILL, an indoor localization approach based on off-the-shelf WiFi infrastructure and

mobile phones. WILL is deployed in a real building covering over 1600 m², and its deployment is easy and rapid since site survey is no longer needed. The experimental results show that WILL achieves

competitive performance compared with traditional approaches.

ETPL

PDS-066 Analysis of a Pool Management Scheme for Cloud Computing Centers

Abstract: In this paper, we propose an analytical performance model that addresses the complexity of

cloud centers through distinct stochastic submodels, the results of which are integrated to obtain the overall solution. Our model incorporates the important aspects of cloud centers such as pool management,

compound requests (i.e., a set of requests submitted by one user simultaneously), resource virtualization

and realistic servicing steps. In this manner, we obtain not only a detailed assessment of cloud center

performance, but also clear insights into equilibrium arrangement and capacity planning that allows servicing delays, task rejection probability, and power consumption to be kept under control.

ETPL

PDS-067 A New Progressive Algorithm for a Multiple Longest Common Subsequences Problem and Its Efficient Parallelization

Abstract: The multiple longest common subsequence (MLCS) problem, which is related to the

measurement of sequence similarity, is one of the fundamental problems in many fields. As an NP-hard

problem, finding a good approximate solution within a reasonable time is important for solving large-size problems in practice. In this paper, we present a new progressive algorithm, Pro-MLCS, based on the

dominant point approach. Pro-MLCS can find an approximate solution quickly and then progressively

generate better solutions until obtaining the optimal one. Pro-MLCS employs three new techniques: 1) a new heuristic function for prioritizing candidate points; 2) a novel $(d)$-index-tree data structure for

efficient computation of dominant points; and 3) a new pruning method using an upper bound function

and approximate solutions. Experimental results show that Pro-MLCS can obtain the first approximate

solution almost instantly and needs only a very small fraction, e.g., 3 percent, of the entire running time to



get the optimal solution. Compared to existing state-of-the-art algorithms, Pro-MLCS can find better

solutions in much shorter time, one to two orders of magnitude faster. In addition, two parallel versions of

Pro-MLCS are developed: DPro-MLCS for distributed memory architecture and DSDPro-MLCS for hierarchical distributed shared memory architecture. Both parallel algorithms can efficiently utilize

parallel computing resources and achieve nearly linear speedups. They also have a desirable

progressiveness property—finding better solutions in shorter time when given more hardware resources.
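
As a point of reference for the underlying problem, the sketch below computes a longest common subsequence of just two strings with the textbook dynamic program. It is not Pro-MLCS and does not use dominant points, heuristics, or pruning; the function name `lcs` is an illustrative choice.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one optimal answer.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table has one dimension per input sequence, so this direct approach grows exponentially with the number of sequences, which is why the multiple-sequence case calls for the approximate and progressive strategies the abstract describes.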

ETPL

PDS-068 A Novel Message Scheduling Framework for Delay Tolerant Networks Routing

Abstract: Multicopy routing strategies have been considered the most applicable approaches to achieve

message delivery in Delay Tolerant Networks (DTNs). Epidemic routing and two-hop forwarding routing

are two well-reported approaches for delay tolerant networks routing which allow multiple message replicas to be launched in order to increase message delivery ratio and/or reduce message delivery delay.

This advantage, nonetheless, is at the expense of additional buffer space and bandwidth overhead. Thus,

to achieve efficient utilization of network resources, it is important to come up with an effective message

scheduling strategy to determine which messages should be forwarded and which should be dropped when the buffer is full. This paper investigates a new message scheduling framework for epidemic and two-

hop forwarding routing in DTNs, such that the forwarding/dropping decision can be made at a node

during each contact for either optimal message delivery ratio or message delivery delay. Extensive simulation results show that the proposed message scheduling framework can achieve better performance

than its counterparts.

ETPL

PDS-069 Attribute-Aware Data Aggregation Using Potential-Based Dynamic Routing in Wireless Sensor Networks

Abstract: The resources, especially energy, in wireless sensor networks (WSNs) are quite limited. Since

sensor nodes are usually densely deployed, the data they sample are highly redundant, and data aggregation becomes an effective method to eliminate redundancy, minimize the number of transmissions,

and thereby save energy. Many applications can be deployed in WSNs and various sensors are embedded

in nodes, so the packets generated by heterogeneous sensors or different applications have different attributes.

The packets from different applications cannot be aggregated. Moreover, most data aggregation schemes employ static routing protocols, which cannot dynamically or intentionally forward packets according to

network state or packet types. The spatial isolation caused by static routing protocol is unfavorable to data

aggregation. To make data aggregation more efficient, in this paper, we introduce the concept of packet attribute, defined as the identifier of the data sampled by different kinds of sensors or applications, and

then propose an attribute-aware data aggregation (ADA) scheme consisting of a packet-driven timing

algorithm and a special dynamic routing protocol. Inspired by the concept of potential in physics and pheromones in ant colonies, a potential-based dynamic routing protocol is elaborated to support the ADA strategy.

The performance evaluation results in a series of scenarios verify that the ADA scheme can make the

packets with the same attribute spatially convergent as much as possible and therefore improve the

efficiency of data aggregation. Furthermore, the ADA scheme also offers other properties, such as scalability with respect to network size and adaptability for tracking mobile events.

ETPL

PDS-070 DCNS: An Adaptable High Throughput RFID Reader-to-Reader Anticollision Protocol




Abstract: The reader-to-reader collision problem represents a research topic of great recent interest for the

radio frequency identification (RFID) technology. Among the state-of-the-art anticollision protocols, the

ones that provide high throughput often have special requirements, such as extra hardware. This study investigates new high throughput solutions for static RFID networks without additional requirements. In

this paper, two contributions are presented: a new configuration, called Killer, and a new protocol, called

distributed color noncooperative selection (DCNS). The proposed configuration generates selfish behavior, thereby increasing channel utilization and throughput. DCNS fully exploits the Killer

configuration and provides new features, such as dynamic priority management, which modifies the

performance of the RFID readers when it is requested. Simulations have been conducted in order to analyze the effects of the innovations proposed. The proposed approach is especially suitable for low-cost

applications with a priority not uniformly distributed among readers. The experimental analysis has

shown that DCNS provides a greater throughput than the state-of-the-art protocols, even those with

additional requirements (e.g., 16 percent better than NFRA).

ETPL

PDS-071 Finite-Difference Wave Propagation Modeling on Special-Purpose Dataflow Machines

Abstract: Modeling wave propagation through the earth is an important application in geoscience. We present a framework for wave propagation modeling on special-purpose hardware, which dramatically

improves the application performance compared to conventional CPUs. We utilize custom hardware

platforms consisting of a mix of x86 CPUs and dataflow engines connected by high-bandwidth communication links. Application programmers describe their algorithms in a domain specific language

using Java syntax, with special dataflow semantics overlayed on top of the Java language. The

application-specific dataflow engines run at hundreds of MHz with massive parallelism and deliver high performance/Watt, up to 30 times more energy efficient than conventional CPUs. The power efficiency of

this approach suggests that dataflow computing may have a key role to play in the improvements in

power efficiency necessary to reach exascale computing.

ETPL

PDS-072 GKAR: A Novel Geographic $(K)$-Anycast Routing for Wireless Sensor Networks

Abstract: To efficiently archive and query data in wireless sensor networks (WSNs), distributed storage

systems and multisink schemes have been proposed recently. However, such distributed access cannot be fully supported and exploited by existing routing protocols in a large-scale WSN. In this paper, we will

address this challenging issue and propose a distributed geographic $(K)$-anycast routing (GKAR)

protocol for WSNs, which can efficiently route data from a source sensor to any $(K)$ destinations (e.g., storage nodes or sinks). To guarantee $(K)$-delivery, an iterative approach is adopted in GKAR where in

each round, GKAR will determine not only the next hops at each node, but also a set of potential

destinations for every next hop node to reach. Efficient algorithms are designed to determine the selection of the next hops and destination set division at each intermediate node. We analyze the complexity of

GKAR in each round and we also theoretically analyze the expected number of rounds required to

guarantee $(K)$-delivery. Simulation results demonstrate the superiority of the GKAR scheme in

reducing the total duration and the communication overhead for finding $(K)$ destinations, compared with existing schemes, e.g., $(K 1)$-anycast [10].




ETPL

PDS-073 Heterogeneous Resource Allocation under Degree Constraints

Abstract: In this paper, we consider the problem of assigning a set of clients with demands to a set of servers with capacities and degree constraints. The goal is to find an allocation such that the number of

clients assigned to a server is smaller than the server's degree and their overall demand is smaller than the

server's capacity, while maximizing the overall throughput. This problem has several natural applications in the context of independent tasks scheduling or virtual machines allocation. We consider both the

offline (when clients are known beforehand) and the online (when clients can join and leave the system at

any time) versions of the problem. We first show that the degree constraint on the maximal number of

clients that a server can handle is realistic in many contexts. Then, our main contribution is to prove that even if it makes the allocation problem more difficult (NP-Complete), a very small additive resource

augmentation on the servers' degree is enough to find in polynomial time a solution that achieves at least

the optimal throughput. After a set of theoretical results on the complexity of the offline and online versions of the problem, we propose several other greedy heuristics to solve the online problem and we

compare the performance (in terms of throughput) and the cost (in terms of disconnections and

reconnections) of all proposed algorithms through a set of extensive simulation results.
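
The two constraints in this problem statement can be made concrete with a small checker. The sketch below is only an illustration of the constraints, not one of the paper's algorithms; the function names and the dictionary-based data layout are hypothetical, the limits are assumed to be inclusive, and throughput is assumed to be the total demand of the assigned clients.

```python
def is_feasible(assignment, demand, capacity, degree):
    """Check an allocation against capacity and degree constraints.

    assignment: dict client -> server.
    demand: dict client -> demand.
    capacity, degree: dicts server -> limit.
    Assumption: limits are inclusive (load <= capacity, count <= degree).
    """
    load = {s: 0 for s in capacity}
    count = {s: 0 for s in capacity}
    for client, server in assignment.items():
        load[server] += demand[client]
        count[server] += 1
    return all(load[s] <= capacity[s] and count[s] <= degree[s]
               for s in capacity)


def throughput(assignment, demand):
    """Total demand served (assumed definition of throughput)."""
    return sum(demand[client] for client in assignment)
```

A server can reject a client for either reason: enough spare capacity but no free connection slot, or a free slot but too little capacity. That interaction is what makes the allocation problem harder than plain bin packing.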

ETPL

PDS-074 Lightweight Location Verification Algorithms for Wireless Sensor Networks

Abstract: The knowledge of sensors' locations is crucial information for many applications in Wireless Sensor Networks (WSNs). When sensor nodes are deployed in hostile environments, the localization

schemes are vulnerable to various attacks, e.g., wormhole attack, pollution attack, range

enlargement/reduction attack, etc. Therefore, sensors' locations are not trustworthy and need to be

verified before they can be used by location-based applications. Previous verification schemes either require group-based deployment knowledge of the sensor field, or depend on expensive or dedicated

hardware, thus they cannot be used for low-cost sensor networks. In this paper, we propose a lightweight

location verification system that performs both “on-spot” and “in-region” location verifications. The on-spot verification intends to verify whether the locations

claimed by sensors are far from their true spots beyond a certain distance. We propose two algorithms that

detect abnormal locations by exploring the inconsistencies between sensors' claimed locations and their

neighborhood observations. The in-region verification verifies whether a sensor is inside an application-specific verification region. Compared to on-spot verification, the in-region verification is tolerant of

large errors as long as the locations of sensors don't cause the application to malfunction. We study how

to derive the verification region for different applications and design a probabilistic algorithm to compute in-region confidence for each sensor. Experiment results show that our on-spot and in-region algorithms

can verify sensors' locations with high detection rate and low false positive rate. They are robust in the

presence of malicious attacks that are launched during the verification process. Moreover, compared with previous verification schemes, our algorithms are effective and lightweight because they do not rely on

the knowledge of deployment of sensors, and they don't require expensive or dedicated hardware, so our

algorithms can be used in any low-cost sensor networks.

ETPL

PDS-075 Load Rebalancing for Distributed File Systems in Clouds




Abstract: Distributed file systems are key building blocks for cloud computing applications based on the

MapReduce programming paradigm. In such file systems, nodes simultaneously serve computing and

storage functions; a file is partitioned into a number of chunks allocated in distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. However, in a cloud computing

environment, failure is the norm, and nodes may be upgraded, replaced, and added in the system. Files

can also be dynamically created, deleted, and appended. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the nodes. Emerging

distributed file systems in production systems strongly depend on a central node for chunk reallocation.

This dependence is clearly inadequate in a large-scale, failure-prone environment because the central load balancer is put under considerable workload that is linearly scaled with the system size, and may thus

become the performance bottleneck and the single point of failure. In this paper, a fully distributed load

rebalancing algorithm is presented to cope with the load imbalance problem. Our algorithm is compared

against a centralized approach in a production system and a competing distributed solution presented in the literature. The simulation results indicate that our proposal is comparable with the existing centralized

approach and considerably outperforms the prior distributed algorithm in terms of load imbalance factor,

movement cost, and algorithmic overhead. The performance of our proposal implemented in the Hadoop distributed file system is further investigated in a cluster environment.

ETPL

PDS-076 Lower and Upper Bounds for Multicasting under Distance Dependent Forwarding Cost Functions

Abstract: Assume a forwarding cost function which depends on the sender receiver separation, and

assume further that noncooperative relaying is applied. What is the minimum total forwarding cost

required for sending a message from source to one or more destinations when multicasting along optimally placed relaying nodes is applied? In this paper, I define and analyze cost function properties from which I

derive general lower bound expressions on multicasting costs. I consider a MAC layer model which does

not exploit the broadcast property of wireless communication and a MAC layer model which exploits it.

For specific cost functions, I show further that in case of optimal relay positions, multicasts can be constructed whose cost always stays below the derived lower bound expression plus an additive constant

depending on the number of destinations. For both lower and upper bounds, I define a general procedure

to check if—and if yes how—my findings can be used to derive the specific lower and upper bound expressions for a given cost function. I explain the procedure with three cost function

examples: the Euclidean distance, the energy cost function, and the expected number of retransmissions under

Rayleigh fading.

ETPL

PDS-078 PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly

Abstract: The study of genomes has been revolutionized by sequencing machines that output many short

overlapping substrings (called reads). The task of sequence assembly in practice is to reconstruct long

contiguous genome subsequences from the reads. With Next Generation Sequencing (NGS) technologies,

assembly software needs to be more accurate, faster, and more memory-efficient due to the problem complexity and the size of the data sets. In this paper, we develop parallel algorithms and compressed

data structures to address several computational challenges of NGS assembly. We demonstrate how



commonly available multicore architectures can be efficiently utilized for sequence assembly. In all

stages (indexing input strings, string graph construction and simplification, extraction of contiguous

subsequences) of our software Pasqual, we use shared-memory parallelism to speed up the assembly process. In our experiments with data of up to 6.8 billion base pairs, we demonstrate that Pasqual

generally delivers the best tradeoff between speed, memory consumption, and solution quality. On

synthetic and real data sets, Pasqual scales well with an increasing number of threads on our test machine with 40 CPU cores. Given enough cores, Pasqual is fastest in our comparison.

ETPL

PDS-079 Query-Log Aware Replicated Declustering

Abstract: Data declustering and replication can be used to reduce I/O times related to the processing of data

intensive queries. Declustering parallelizes the query retrieval process by distributing the data items

requested by queries among several disks. Replication enables alternative disk choices for individual disk items and thus provides better query parallelism options. In general, existing replicated declustering

schemes do not consider query log information and try to optimize all possible queries for a specific

query type, such as range or spatial queries. In such schemes, it is assumed that two or more copies of all

data items are to be generated and scheduling of these copies to disks is discussed. However, in some applications, generation of even two copies of all of the data items is not feasible, since data items tend to

have very large sizes. In this work, we assume that there is a given limit on disk capacities and thus on

replication amounts. We utilize existing query-log information to propose a selective replicated declustering scheme, in which we select the data items to be replicated and decide on their scheduling

onto disks while respecting disk capacities. We propose and implement an iterative improvement

algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multiway replicated declustering. Then we improve the obtained multiway replicated

declustering by efficient refinement heuristics. Experiments conducted on realistic data sets show that the

proposed scheme yields better performance results compared to existing replicated declustering schemes.

ETPL

PDS-080 RASS: A Real-Time, Accurate, and Scalable System for Tracking Transceiver-Free Objects

Abstract: Transceiver-free object tracking is to trace a moving object that does not carry any

communication device in an environment with some monitoring nodes predeployed. Among all the tracking technologies, RF-based technology is an emerging research field facing many challenges.

Although we proposed the original idea, until now there is no method achieving scalability without

sacrificing latency and accuracy. In this paper, we put forward a real-time tracking system, RASS, which can achieve this goal and is promising for applications such as safeguard systems. Our basic idea is to

divide the tracking field into different areas, with adjacent areas using different communication channels.

So, the interference among different areas can be prevented. For each area, three communicating nodes are deployed on the ceiling as a regular triangle to monitor this area. In each triangle area, we use a

Support Vector Regression (SVR) model to locate the object. This model simulates the relationship

between the signal dynamics caused by the object and the object position. It not only considers the ideal

case of signal dynamics caused by the object, but also utilizes their irregular information. As a result, it can achieve a tracking accuracy of around 1 m using just three nodes in a triangular area with 4 m

sides. The experiments show that the tracking latency of the proposed RASS system is bounded by only

about 0.26 s. Our system scales well to a large deployment field without sacrificing the latency and accuracy.




ETPL

PDS-081 Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU

Abstract: In genome projects, biological sequences are aligned thousands of times on a daily basis. The Smith-Waterman algorithm is able to retrieve the optimal local alignment with quadratic time and space

complexity. So far, aligning huge sequences, such as whole chromosomes, with the Smith-Waterman

algorithm has been regarded as unfeasible, due to huge computing and memory requirements. However, high-performance computing platforms such as GPUs are making it possible to obtain the optimal result

for huge sequences in reasonable time. In this paper, we propose and evaluate CUDAlign 2.1, a parallel

algorithm that uses GPU to align huge sequences, executing the Smith-Waterman algorithm combined

with Myers-Miller, with linear space complexity. In order to achieve that, we propose optimizations which are able to reduce significantly the amount of data processed, while enforcing full parallelism most

of the time. Using the NVIDIA GTX 560 Ti board and comparing real DNA sequences that range from

162 KBP (Thousand Base Pairs) to 59 MBP (Million Base Pairs), we show that CUDAlign 2.1 is scalable. Also, we show that CUDAlign 2.1 is able to produce the optimal alignment between the

chimpanzee chromosome 22 (33 MBP) and the human chromosome 21 (47 MBP) in 8.4 hours and the

optimal alignment between the chimpanzee chromosome Y (24 MBP) and the human chromosome Y (59 MBP) in 13.1 hours.
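
To illustrate why the space requirement can be brought down from quadratic, the sketch below computes only the optimal local alignment score with the Smith-Waterman recurrence while keeping two rows of the matrix in memory. It is a plain sequential sketch, not CUDAlign: it does not recover the alignment itself (which is where the Myers-Miller technique comes in), and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def smith_waterman_score(s: str, t: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of s and t using O(len(t)) memory."""
    prev = [0] * (len(t) + 1)   # previous row of the score matrix
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; that independence is what GPU implementations exploit for parallelism.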

ETPL

PDS-082 Scalable and Accurate Graph Clustering and Community Structure Detection

Abstract: One of the most useful measures of cluster quality is the modularity of the partition, which

measures the difference between the number of the edges joining vertices from the same cluster and the

expected number of such edges in a random graph. In this paper, we show that the problem of finding a

partition maximizing the modularity of a given graph $(G)$ can be reduced to a minimum weighted cut (MWC) problem on a complete graph with the same vertices as $(G)$. We then show that the resulting

minimum cut problem can be efficiently solved by adapting existing graph partitioning techniques. Our

algorithm finds clusterings of a comparable quality and is much faster than the existing clustering algorithms.
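
The modularity measure described above can be computed directly for a given partition. The sketch below uses the standard Newman formulation for an undirected, unweighted graph; it only evaluates a partition and does not implement the paper's reduction to a minimum weighted cut. The function name and input layout are illustrative choices.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    cluster_of: dict mapping every vertex to its cluster label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)   # sum of vertex degrees per cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    # Observed fraction of intra-cluster edges minus the fraction expected
    # in a random graph with the same degree sequence.
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14, while placing every vertex in one cluster gives 0.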

ETPL

PDS-083 Scaling Laws of Cognitive Ad Hoc Networks over General Primary Network Models

Abstract: We study the capacity scaling laws for the cognitive network that consists of the primary hybrid

network (PhN) and secondary ad hoc network (SaN). PhN is further comprised of an ad hoc network and

a base station-based (BS-based) network. SaN and PhN are overlapping in the same deployment region,

operate on the same spectrum, but are independent of each other in terms of communication requirements. The primary users (PUs), i.e., the ad hoc nodes in PhN, have the priority to access the

spectrum. The secondary users (SUs), i.e., the ad hoc nodes in SaN, are equipped with cognitive radios,

and have the functionalities to sense the idle spectrum and obtain the necessary information of primary nodes in PhN. We assume that PhN adopts one out of three classical types of strategies, i.e., pure ad hoc

strategy, BS-based strategy, and hybrid strategy. We aim to directly derive multicast capacity for SaN to

unify the unicast and broadcast capacities under two basic principles: 1) The throughput for PhN cannot be undermined in the order sense due to the presence of SaN. 2) The protocol adopted by PhN is not altered

in the interest of SaN. Depending on which type of strategy is adopted in PhN, we design the

optimal-throughput strategy for SaN. We show that there exists a threshold of the density of SUs



according to the density of PUs beyond which it can be proven that: 1) when PhN adopts the pure ad hoc

strategy or hybrid strategy, SaN can achieve the multicast capacity of the same order as if it were stand-alone;

2) when PhN adopts the BS-based strategy, SaN can asymptotically achieve the multicast capacity of the same order as if PhN were absent, if some specific conditions in terms of relations among the numbers of

SUs, PUs, the destinations of each multicast session in SaN, and BSs in PhN hold.

ETPL

PDS-084 SyRaFa: Synchronous Rate and Frequency Adjustment for Utilization Control in Distributed Real-Time Embedded Systems

Abstract: To efficiently utilize the computing resources and provide good quality of service (QoS) to the

end-to-end tasks in the distributed real-time systems, we can enforce the utilization bounds on multiple processors. The utilization control is challenging especially when the workload in the system is

unpredictable. To handle the workload uncertainties, current research favors feedback control techniques,

and recent work combines the task rate adaptation and processor frequency scaling in an asynchronous way for CPU utilization control, where task rates and the processor frequencies are tuned asynchronously

in two decoupled control loops for control convenience. Since the two manipulated variables, task rates

and processor frequencies, contribute to the CPU utilizations together with strong coupling, adjusting

them asynchronously may degrade the utilization control performance. In this paper, we provide a novel scheme to make synchronous rate and frequency adjustment to enforce the utilization setpoint, referred to

as SyRaFa scheme. SyRaFa can handle the workload uncertainties by identifying the system model online

and can simultaneously adjust the manipulated variables by solving an optimization problem in each sampling period. Extensive evaluation results demonstrate SyRaFa outperforms the existing schemes

especially under severe workload uncertainties.

ETPL

PDS-085 Anchor: A Versatile and Efficient Framework for Resource Management in the Cloud

Abstract: We present Anchor, a general resource management architecture that uses the stable matching

framework to decouple policies from mechanisms when mapping virtual machines to physical servers. In Anchor, clients and operators are able to express a variety of distinct resource management policies as

they deem fit, and these policies are captured as preferences in the stable matching framework. The

highlight of Anchor is a new many-to-one stable matching theory that efficiently matches VMs with

heterogeneous resource needs to servers, using both offline and online algorithms. Our theoretical analyses show the convergence and optimality of the algorithm. Our experiments with a prototype

implementation on a 20-node server cluster, as well as large-scale simulations based on real-world

workload traces, demonstrate that the architecture is able to realize a diverse set of policy objectives with good performance and practicality.
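
For readers unfamiliar with stable matching, the sketch below shows the classical many-to-one deferred acceptance procedure with VMs proposing to servers. It is not Anchor's algorithm: it assumes every VM occupies exactly one slot, whereas the abstract's contribution is a theory for VMs with heterogeneous resource needs. All names and the dictionary-based inputs are hypothetical.

```python
def deferred_acceptance(vm_prefs, server_prefs, capacity):
    """Classical many-to-one deferred acceptance (VMs propose).

    vm_prefs: dict vm -> list of servers, most preferred first.
    server_prefs: dict server -> list of vms, most preferred first.
    capacity: dict server -> number of slots (assumes unit-size VMs).
    Returns dict vm -> server for every matched VM.
    """
    rank = {s: {vm: i for i, vm in enumerate(prefs)}
            for s, prefs in server_prefs.items()}
    next_choice = {vm: 0 for vm in vm_prefs}
    hosted = {s: [] for s in server_prefs}
    free = list(vm_prefs)
    while free:
        vm = free.pop()
        if next_choice[vm] >= len(vm_prefs[vm]):
            continue  # vm has exhausted its list and stays unmatched
        server = vm_prefs[vm][next_choice[vm]]
        next_choice[vm] += 1
        if vm not in rank[server]:
            free.append(vm)  # server finds this vm unacceptable
            continue
        hosted[server].append(vm)
        if len(hosted[server]) > capacity[server]:
            # Evict the least preferred vm currently held by the server.
            worst = max(hosted[server], key=rank[server].__getitem__)
            hosted[server].remove(worst)
            free.append(worst)
    return {vm: s for s, vms in hosted.items() for vm in vms}
```

The policies live entirely in the preference lists, while the matching procedure stays the same, which is the sense in which a stable matching framework separates policy from mechanism.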

ETPL

PDS-086 Efficient Resource Mapping Framework over Networked Clouds via Iterated Local Search-Based Request Partitioning

Abstract: The cloud represents a computing paradigm where shared configurable resources are provided

as a service over the Internet. Adding intra- or intercloud communication resources to the resource mix

leads to a networked cloud computing environment. Following the cloud Infrastructure as a Service paradigm and in order to create a flexible management framework, it is of paramount importance to

address efficiently the resource mapping problem within this context. To deal with the inherent

complexity and scalability issue of the resource mapping problem across different administrative



domains, in this paper a hierarchical framework is described. First, a novel request partitioning approach

based on Iterated Local Search is introduced that facilitates the cost-efficient and online splitting of user

requests among eligible cloud service providers (CPs) within a networked cloud environment. Following and capitalizing on the outcome of the request partitioning phase, the embedding phase, where the actual

mapping of requested virtual to physical resources is performed, can be realized through the use of a

distributed intracloud resource mapping approach that allows for efficient and balanced allocation of cloud resources. Finally, a thorough evaluation of the proposed overall framework on a simulated

networked cloud environment is provided and critically compared against an exact request partitioning

solution as well as another common intradomain virtual resource embedding solution.

ETPL

PDS-087 Optimal Multiserver Configuration for Profit Maximization in Cloud Computing

Abstract: As cloud computing becomes more and more popular, understanding the economics of cloud computing becomes critically important. To maximize the profit, a service provider should understand

both service charges and business costs, and how they are determined by the characteristics of the

applications and the configuration of a multiserver system. The problem of optimal multiserver

configuration for profit maximization in a cloud computing environment is studied. Our pricing model takes into consideration such factors as the amount of a service, the workload of an application

environment, the configuration of a multiserver system, the service-level agreement, the satisfaction of a

consumer, the quality of a service, the penalty of a low-quality service, the cost of renting, the cost of energy consumption, and a service provider's margin and profit. Our approach is to treat a multiserver

system as an M/M/m queuing model, such that our optimization problem can be formulated and solved

analytically. Two server speed and power consumption models are considered, namely, the idle-speed model and the constant-speed model. The probability density function of the waiting time of a newly

arrived service request is derived. The expected service charge to a service request is calculated. The

expected net business gain in one unit of time is obtained. Numerical calculations of the optimal server

size and the optimal server speed are demonstrated.
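
As a minimal sketch of the M/M/m queuing model named in this abstract, the following Python computes the standard Erlang C probability that an arriving request must wait, and the resulting mean waiting time. The function names and the sample arrival and service rates are illustrative assumptions; the paper's pricing model and optimization are not reproduced here.

```python
from math import factorial


def erlang_c(m, lam, mu):
    """Probability that an arriving request has to wait in an M/M/m queue.

    m   -- number of identical servers
    lam -- arrival rate (requests per unit time)
    mu  -- service rate of one server (requests per unit time)
    """
    offered = lam / mu        # offered load in Erlangs
    rho = offered / m         # utilization of each server
    if rho >= 1:
        raise ValueError("unstable queue: need lam < m * mu")
    tail = offered**m / factorial(m) / (1 - rho)
    head = sum(offered**k / factorial(k) for k in range(m))
    return tail / (head + tail)


def mean_wait(m, lam, mu):
    """Mean time a request spends waiting before service starts."""
    return erlang_c(m, lam, mu) / (m * mu - lam)


# Illustrative numbers only: compare 2 and 4 servers under the same load.
for servers in (2, 4):
    print(servers, erlang_c(servers, 1.0, 1.0), mean_wait(servers, 1.0, 1.0))
```

Sweeping `m` and `mu` in a loop like this is one simple way to explore how server size and speed affect waiting time numerically.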

ETPL

PDS-088 Error-Tolerant Resource Allocation and Payment Minimization for Cloud System

Abstract: With virtual machine (VM) technology being increasingly mature, compute resources in cloud systems can be partitioned in fine granularity and allocated on demand. We make three contributions in

this paper: 1) We formulate a deadline-driven resource allocation problem based on the cloud

environment facilitated with VM resource isolation technology, and also propose a novel polynomial-time solution, which could minimize users' payment in terms of their expected deadlines. 2) By

analyzing the upper bound of task execution length based on the possibly inaccurate workload prediction,

we further propose an error-tolerant method to guarantee a task's completion within its deadline. 3) We validate its effectiveness over a real VM-facilitated cluster environment under different levels of

competition. In our experiment, by tuning algorithmic input deadline based on our derived bound, task

execution length can always be limited within its deadline in the sufficient-supply situation; the mean

execution length still stays at 70 percent of the user-specified deadline under severe competition. Under the original-deadline-based solution, about 52.5 percent of tasks are completed within 0.95-1.0 times

their deadlines, which still conforms to the deadline-guaranteed requirement. Only 20 percent of

tasks violate deadlines, yet most (17.5 percent) are still finished within 1.05 times their deadlines.




ETPL

PDS-089

Dynamic Resource Allocation Using Virtual Machines for Cloud Computing

Environment

Abstract: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from resource multiplexing through

virtualization technology. In this paper, we present a system that uses virtualization technology to allocate

data center resources dynamically based on application demands and support green computing by optimizing the number of servers in use. We introduce the concept of “skewness” to measure the

unevenness in the multidimensional resource utilization of a server. By minimizing skewness, we can

combine different types of workloads nicely and improve the overall utilization of server resources. We

develop a set of heuristics that prevent overload in the system effectively while saving energy. Trace-driven simulation and experiment results demonstrate that our algorithm achieves good performance.
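
The abstract does not state how skewness is computed. The sketch below assumes one plausible definition: the square root of the summed squared deviations of each resource's utilization from the server's mean utilization, normalized by that mean. The function name and the sample utilization vectors are hypothetical.

```python
from math import sqrt


def skewness(utilizations):
    """Unevenness of one server's resource utilizations (assumed definition).

    utilizations -- fractions in [0, 1], one per resource (CPU, memory, ...)
    Returns 0.0 when every resource is equally utilized.
    """
    mean = sum(utilizations) / len(utilizations)
    if mean == 0:
        return 0.0  # an idle server has no unevenness to measure
    return sqrt(sum((u / mean - 1) ** 2 for u in utilizations))


# Hypothetical servers: CPU, memory, and network utilization.
balanced = [0.5, 0.5, 0.5]
cpu_heavy = [0.9, 0.2, 0.1]
print(skewness(balanced), skewness(cpu_heavy))
```

Under this definition, placing a memory-heavy VM on the CPU-heavy server would lower its skewness, which matches the abstract's point about combining different workload types.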

ETPL

PDS-090

Performance Enhancement for Network I/O Virtualization with Efficient Interrupt

Coalescing and Virtual Receive-Side Scaling

Abstract: Virtualization is a key technology in cloud computing; it can accommodate numerous guest

VMs to provide transparent services, such as live migration, high availability, and rapid checkpointing.

Cloud computing using virtualization allows workloads to be deployed and scaled quickly through the rapid provisioning of virtual machines on physical machines. However, I/O virtualization, particularly for

networking, suffers from significant performance degradation in the presence of high-speed networking

connections. In this paper, we first analyze performance challenges in network I/O virtualization and identify two problems: conventional network I/O virtualization suffers from excessive virtual interrupts to

guest VMs, and the back-end driver does not efficiently use the computing resources of underlying

multicore processors. To address these challenges, we propose optimization methods for enhancing the

networking performance: 1) Efficient interrupt coalescing for network I/O virtualization and 2) virtual receive-side scaling to effectively leverage multicore processors. These methods are implemented and

evaluated with extensive performance tests on a Xen virtualization platform. Our experimental results

confirm that the proposed optimizations can significantly improve network I/O virtualization performance and effectively solve the performance challenges.

ETPL

PDS-091 A New Disk I/O Model of Virtualized Cloud Environment

Abstract: In a traditional virtualized cloud environment, using asynchronous I/O in the guest file system

and synchronous I/O in the host file system to handle an asynchronous user disk write exhibits several

drawbacks, such as performance disturbance among different guests and consistency maintenance across

guest failures. To address these issues, this paper introduces a novel disk I/O model for virtualized cloud systems called HypeGear, where the guest file system uses synchronous operations to deal with the guest

write request and the host file system performs asynchronous operations to write the data to the hard disk.

A prototype system is implemented on the Xen hypervisor and our experimental results verify that this new model has many advantages over the conventional asynchronous-synchronous model. We also

evaluate the overhead of asynchronous I/O at the host that our new model introduces. The result

demonstrates that it imposes little cost on the host layer.

ETPL

PDS-092 Improving Data Center Network Utilization Using Near-Optimal Traffic Engineering




Abstract: Equal cost multiple path (ECMP) forwarding is the most prevalent multipath routing used in

data center (DC) networks today. However, it fails to exploit increased path diversity that can be provided

by traffic engineering techniques through the assignment of nonuniform link weights to optimize network resource usage. To this extent, constructing a routing algorithm that provides path diversity over

nonuniform link weights (i.e., unequal cost links), simplicity in path discovery and optimality in

minimizing maximum link utilization (MLU) is nontrivial. In this paper, we have implemented and evaluated the Penalizing Exponential Flow-spliTing (PEFT) algorithm in a cloud DC environment based

on two dominant topologies, canonical and fat tree. In addition, we have proposed a new cloud DC

topology which, with only a marginal modification of the current canonical tree DC architecture, can further reduce MLU and increase overall network capacity utilization through PEFT routing.
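
To make the maximum link utilization (MLU) objective concrete, the sketch below compares an even split of one demand across two paths with an uneven split that follows link capacity. It is a toy illustration, not the PEFT algorithm: the topology, capacities, demand, and function names are all hypothetical.

```python
def link_loads(demand, paths, fractions):
    """Per-link load when `demand` is split over `paths` by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    loads = {}
    for path, fraction in zip(paths, fractions):
        for link in path:
            loads[link] = loads.get(link, 0.0) + demand * fraction
    return loads


def max_link_utilization(loads, capacities):
    """MLU: the largest load-to-capacity ratio over all loaded links."""
    return max(load / capacities[link] for link, load in loads.items())


# Hypothetical topology: two disjoint one-link paths with unequal capacity.
capacities = {"a": 10.0, "b": 30.0}
paths = [["a"], ["b"]]

even = max_link_utilization(link_loads(20.0, paths, [0.5, 0.5]), capacities)
uneven = max_link_utilization(link_loads(20.0, paths, [0.25, 0.75]), capacities)
print(even, uneven)
```

In this example the even split saturates the smaller link, while the capacity-aware split halves the MLU. That is the kind of gain nonuniform traffic splitting aims for.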

ETPL

PDS-093 Electricity Cost Saving Strategy in Data Centers by Using Energy Storage

Abstract: Electricity expenditure comprises a significant fraction of the total operating cost in data

centers. Hence, cloud service providers are required to reduce electricity cost as much as possible. In this

paper, we consider utilizing existing energy storage capabilities in data centers to reduce electricity cost under wholesale electricity markets, where the electricity price exhibits both temporal and spatial

variations. A stochastic program is formulated by integrating the center-level load balancing, the server-

level configuration, and the battery management while at the same time guaranteeing the quality-of-service experience by end users. We use the Lyapunov optimization technique to design an online

algorithm that achieves an explicit tradeoff between cost saving and energy storage capacity. We

demonstrate the effectiveness of our proposed algorithm through extensive numerical evaluations based on real-world workload and electricity price data sets. As far as we know, our work is the first to explore

the problem of electricity cost saving using energy storage in multiple data centers by considering both

the spatial and temporal variations in wholesale electricity prices and workload arrival processes.

ETPL

PDS-094 Simple and Effective Dynamic Provisioning for Power-Proportional Data Centers

Abstract: Energy consumption represents a significant cost in data center operation. A large fraction of

the energy, however, is used to power idle servers when the workload is low. Dynamic provisioning techniques aim at saving this portion of the energy, by turning off unnecessary servers. In this paper, we

explore how much gain knowing future workload information can bring to dynamic provisioning. In

particular, we develop online dynamic provisioning solutions with and without future workload information available. We first reveal an elegant structure of the offline dynamic provisioning problem,

which allows us to characterize the optimal solution in a “divide-and-conquer” manner. We then exploit

this insight to design two online algorithms with competitive ratios 2 - α and e/(e - 1 + α), respectively, where 0 ≤ α ≤ 1 is the normalized size of a look-ahead window in which future workload information is

available. A fundamental observation is that future workload information beyond the full-size look-ahead

window (corresponding to α = 1) will not improve dynamic provisioning performance. Our algorithms are

decentralized and easy to implement. We demonstrate their effectiveness in simulations using real-world traces.
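
The two competitive ratios quoted in this abstract can be evaluated directly. The short sketch below tabulates 2 - α and e/(e - 1 + α) over the look-ahead range 0 ≤ α ≤ 1; the function names are illustrative and not taken from the paper.

```python
from math import e


def _check(alpha):
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")


def ratio_first(alpha):
    """Competitive ratio 2 - alpha of the first online algorithm."""
    _check(alpha)
    return 2.0 - alpha


def ratio_second(alpha):
    """Competitive ratio e / (e - 1 + alpha) of the second online algorithm."""
    _check(alpha)
    return e / (e - 1.0 + alpha)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha:.2f}  {ratio_first(alpha):.3f}  {ratio_second(alpha):.3f}")
```

Both expressions fall to 1 at α = 1, which is consistent with the observation that information beyond the full-size look-ahead window brings no further improvement.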




ETPL

PDS-095

Harnessing the Cloud for Securely Outsourcing Large-Scale Systems of Linear

Equations

Abstract: Cloud computing economically enables customers with limited computational resources to outsource large-scale computations to the cloud. However, how to protect customers' confidential data

involved in the computations then becomes a major security concern. In this paper, we present a secure

outsourcing mechanism for solving large-scale systems of linear equations (LE) in the cloud. Because applying traditional approaches like Gaussian elimination or LU decomposition (a.k.a. direct methods) to

such large-scale LEs would be prohibitively expensive, we build the secure LE outsourcing mechanism

via a completely different approach, the iterative method, which is much easier to implement in practice and

only demands relatively simple matrix-vector operations. Specifically, our mechanism enables a customer to securely harness the cloud for iteratively finding successive approximations to the LE

solution, while keeping both the sensitive input and output of the computation private. For robust cheating

detection, we further explore the algebraic property of matrix-vector operations and propose an efficient result verification mechanism, which allows the customer to verify all answers received from previous

iterative approximations in one batch with high probability. Thorough security analysis and prototype

experiments on Amazon EC2 demonstrate the validity and practicality of our proposed design.
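
For readers unfamiliar with iterative solvers, the sketch below shows a plain Jacobi iteration, one common iterative method that needs only matrix-vector style operations per step. It is an assumption that this resembles the iteration the paper builds on, and it includes none of the paper's privacy protection or result verification. The matrix and right-hand side are made-up values.

```python
def jacobi(A, b, iterations=50):
    """Approximate the solution of A x = b by Jacobi iteration.

    A -- square matrix as a list of rows, assumed strictly diagonally dominant
    b -- right-hand side as a list
    Each step updates every component from the previous approximation only.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# Made-up 2x2 system: 4x + y = 9 and 2x + 5y = 12.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 12.0]
print(jacobi(A, b))
```

Each pass costs one sweep over the matrix, which is why such methods suit very large systems where direct factorization would be too expensive.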

ETPL

PDS-096 Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud

Abstract: Characterized by low maintenance, cloud computing provides an economical and efficient solution for sharing group resources among cloud users. Unfortunately, sharing data in a multi-owner

manner while preserving data and identity privacy from an untrusted cloud is still a challenging issue, due

to the frequent change of the membership. In this paper, we propose a secure multi-owner data sharing

scheme, named Mona, for dynamic groups in the cloud. By leveraging group signature and dynamic broadcast encryption techniques, any cloud user can anonymously share data with others. Meanwhile, the

storage overhead and encryption computation cost of our scheme are independent of the number of

revoked users. In addition, we analyze the security of our scheme with rigorous proofs, and demonstrate the efficiency of our scheme in experiments.

ETPL

PDS-097

A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-Effective

Privacy Preserving of Intermediate Data Sets in Cloud

Abstract: Cloud computing provides massive computation power and storage capacity which enable users

to deploy computation and data-intensive applications without infrastructure investment. Along the

processing of such applications, a large volume of intermediate data sets will be generated, and often

stored to save the cost of recomputing them. However, preserving the privacy of intermediate data sets becomes a challenging problem because adversaries may recover privacy-sensitive information by

analyzing multiple intermediate data sets. Encrypting ALL data sets in cloud is widely adopted in existing

approaches to address this challenge. But we argue that encrypting all intermediate data sets is neither efficient nor cost-effective because it is very time consuming and costly for data-intensive applications to

en/decrypt data sets frequently while performing any operation on them. In this paper, we propose a novel

upper bound privacy leakage constraint-based approach to identify which intermediate data sets need to be encrypted and which do not, so that privacy-preserving cost can be saved while the privacy

requirements of data holders can still be satisfied. Evaluation results demonstrate that the privacy-

preserving cost of intermediate data sets can be significantly reduced with our approach over existing



ones where all data sets are encrypted.

ETPL

PDS-098

A Truthful Dynamic Workflow Scheduling Mechanism for Commercial Multicloud

Environments

Abstract: The ultimate goal of cloud providers in providing resources is to increase their revenues. This

goal leads to a selfish behavior that negatively affects the users of a commercial multicloud environment.

In this paper, we introduce a pricing model and a truthful mechanism for scheduling single tasks considering two objectives: monetary cost and completion time. With respect to the social cost of the

mechanism, i.e., minimizing the completion time and monetary cost, we extend the mechanism for

dynamic scheduling of scientific workflows. We theoretically analyze the truthfulness and the efficiency of the mechanism and present extensive experimental results showing significant impact of the selfish

behavior of the cloud providers on the efficiency of the whole system. The experiments conducted using

real-world and synthetic workflow applications demonstrate that our solutions dominate in most cases the

Pareto-optimal solutions estimated by two classical multiobjective evolutionary algorithms.

ETPL

PDS-099 QoS Ranking Prediction for Cloud Services

Abstract: Cloud computing is becoming popular. Building high-quality cloud applications is a critical research problem. QoS rankings provide valuable information for making optimal cloud service selection

from a set of functionally equivalent service candidates. To obtain QoS values, real-world invocations on

the service candidates are usually required. To avoid the time-consuming and expensive real-world service invocations, this paper proposes a QoS ranking prediction framework for cloud services by taking

advantage of the past service usage experiences of other consumers. Our proposed framework requires no

additional invocations of cloud services when making QoS ranking prediction. Two personalized QoS

ranking prediction approaches are proposed to predict the QoS rankings directly. Comprehensive experiments are conducted employing real-world QoS data, including 300 distributed users and 500 real-

world web services all over the world. The experimental results show that our approaches outperform

other competing approaches.

ETPL

PDS-100 Cloudy with a Chance of Cost Savings

Abstract: Cloud-based hosting is claimed to possess many advantages over traditional in-house (on-premise) hosting such as better scalability, ease of management, and cost savings. It is not difficult to

understand how cloud-based hosting can be used to address some of the existing limitations and extend

the capabilities of many types of applications. However, one of the most important questions is whether cloud-based hosting will be economically feasible for my application if migrated into the cloud. It is not

straightforward to answer this question because it is not clear how my application will benefit from the

claimed advantages, and, in turn, be able to convert them into tangible cost savings. Within cloud-based

hosting offerings, there is a wide range of hosting options one can choose from, each impacting the cost in a different way. Answering these questions requires an in-depth understanding of the cost implications of

all the possible choices specific to my circumstances. In this study, we identify a diverse set of key factors

affecting the costs of deployment choices. Using benchmarks representing two different applications (TPC-W and TPC-E) we investigate the evolution of costs for different deployment choices. We consider



important application characteristics such as workload intensity, growth rate, traffic size, storage, and

software license to understand their impact on the overall costs. We also discuss the impact of workload

variance and cloud elasticity, and certain cost factors that are subjective in nature.

ETPL

PDS-101

A Highly Practical Approach toward Achieving Minimum Data Sets Storage Cost in

the Cloud

Abstract: Massive computation power and storage capacity of cloud computing systems allow scientists

to deploy computation and data intensive applications without infrastructure investment, where large

application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and

benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for the storage or

impractical to be used at runtime. In this paper, toward achieving the minimum cost benchmark, we

propose a novel highly cost-effective and practical storage strategy that can automatically decide whether a generated data set should be stored or not at runtime in the cloud. The main focus of this strategy is the

local-optimization for the tradeoff between computation and storage, while secondarily also taking users'

(optional) preferences on storage into consideration. Both theoretical analysis and simulations conducted

on general (random) data sets as well as specific real world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark,

and the efficiency is very high for practical runtime utilization in the cloud.

ETPL

PDS-102

Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production

Cloud Computing Systems

Abstract: Performance diagnosis is labor intensive in production cloud computing systems. Such systems

typically face many real-world challenges, which the existing diagnosis techniques for such distributed systems cannot effectively solve. An efficient, unsupervised diagnosis tool for locating fine-grained

performance anomalies is still lacking in production cloud computing systems. This paper proposes

CloudDiag to bridge this gap. Combining a statistical technique and a fast matrix recovery algorithm, CloudDiag can efficiently pinpoint fine-grained causes of performance problems without

requiring any domain-specific knowledge of the target system. CloudDiag has been applied in a practical

production cloud computing system to diagnose performance problems. We demonstrate the

effectiveness of CloudDiag in three real-world case studies.

ETPL

PDS-103 A Fast RPC System for Virtual Machines

Abstract: Despite the advances in high performance interdomain communications for virtual machines (VM), data intensive applications developed for VMs based on the traditional remote procedure call

(RPC) mechanism still suffer from performance degradation due to the inherent inefficiency of data

serialization/deserialization operations. This paper presents VMRPC, a lightweight RPC framework specifically designed for VMs that leverages the heap and stack sharing mechanism to circumvent

unnecessary data copy and serialization/deserialization. Our evaluation shows that the performance of

VMRPC is an order of magnitude better than traditional RPC systems and existing alternative interdomain communication optimization systems. The evaluation on a VMRPC-enhanced networked file

system across a varied range of benchmarks further reveals the competitiveness of VMRPC in IO-intensive applications.

ETPL

PDS-103 ASAP: Scalable Collision Arbitration for Large RFID Systems

Abstract: The growing importance of operations such as identification, location sensing, and object

tracking has led to increasing interests in contactless Radio Frequency Identification (RFID) systems.

Enjoying the low cost of RFID tags, modern RFID systems tend to be deployed for large-scale mobile objects. Both the theoretical and experimental results suggest that when tags are in large numbers, most

existing collision arbitration protocols do not satisfy the scalability and time-efficiency requirements of

many applications. To address this problem, we propose Adaptively Splitting-based Arbitration Protocol (ASAP), a scheme that provides efficient RFID identification for both small and large deployments of

RFID tags, in terms of time and energy cost. Theoretical analysis and simulation evaluation show that the

performance of ASAP is better than most existing collision-arbitration solutions and the time efficiency is

close to the theoretically optimal values.

ETPL

PDS-103 Attached-RTS: Eliminating an Exposed Terminal Problem in Wireless Networks

Abstract: Leveraging concurrent transmission is a promising way to improve throughput in wireless networks. Existing media access control (MAC) protocols like carrier sense multiple access always try to

minimize the number of concurrent transmissions to avoid collision, although collisions at sender sides

are harmless to the overall performance. The reason for such a conservative strategy is that those protocols cannot obtain accurate channel status (who is transmitting and receiving) with low cost. They can only

avoid potential collisions through rough channel status (idle or busy). To obtain additional information in

a cost-efficient way, we propose a novel coding scheme, Attachment Coding, to allow control information

to be “attached” to data packets. Nodes then transmit two kinds of signals simultaneously, without degrading the effective throughput of the original data traffic. Based on Attachment Coding, we propose

an Attached-RTS MAC (AR-MAC) to exploit exposed terminals for concurrent transmissions. The

attached control information provides accurate channel status for nodes in real time. Therefore, nodes can identify exposed terminals and utilize them for concurrent transmission. We theoretically analyze the

feasibility of Attachment Coding, and implement it on the GNU Radio testbed to further verify it. We also

conduct extensive simulations to evaluate the performance of Attached-RTS. The experimental results show that by leveraging Attachment Coding, AR-MAC achieves up to 180 percent in densely deployed ad

hoc networks.

ETPL

PDS-103

ESWC: Efficient Scheduling for the Mobile Sink in Wireless Sensor Networks with

Delay Constraint

Abstract: This paper exploits sink mobility to prolong the network lifetime in wireless sensor networks

where the information delay caused by moving the sink should be bounded. Due to the combinatorial

complexity of this problem, most previous proposals focus on heuristics and provably optimal algorithms remain unknown. In this paper, we build a unified framework for analyzing this joint sink mobility,

routing, delay, and so on. We discuss the induced subproblems and present efficient solutions for them.

Then, we generalize these solutions and propose a polynomial-time optimal algorithm for the original problem. In simulations, we show the benefits of involving a mobile sink and the impact of network



parameters (e.g., the number of sensors, the delay bound, etc.) on the network lifetime. Furthermore, we

study the effects of different trajectories of the sink and provide important insights for designing mobility

schemes in real-world mobile WSNs.

ETPL

PDS-103 Grouping-Proofs-Based Authentication Protocol for Distributed RFID Systems

Abstract: Along with radio frequency identification (RFID) becoming ubiquitous, security issues have

attracted extensive attention. Most studies focus on the single-reader and single-tag case to provide

security protection, which leads to certain limitations for diverse applications. This paper proposes a

grouping-proofs-based authentication protocol (GUPA) to address the security issue for simultaneous identification of multiple readers and tags in distributed RFID systems. In GUPA, distributed authentication

mode with independent subgrouping proofs is adopted to enhance hierarchical protection; an asymmetric

denial scheme is applied to grant fault-tolerance capabilities against an illegal reader or tag; and a sequence-based odd-even alternation group subscript is presented to define a function for secret updating.

Meanwhile, GUPA is analyzed to be robust enough to resist major attacks such as replay, forgery,

tracking, and denial of proof. Furthermore, performance analysis shows that compared with the known

grouping-proof or yoking-proof-based protocols, GUPA has lower communication overhead and computation load. It indicates that GUPA realizing both secure and simultaneous identification is efficient

for resource-constrained distributed RFID systems.

ETPL

PDS-103 Hint-Based Execution of Workloads in Clouds with Nefeli

Abstract: Infrastructure-as-a-Service clouds offer entire virtual infrastructures for distributed processing

while concealing all physical underlying machinery. Current cloud interface abstractions restrict users from providing information regarding usage patterns of their requested virtual machines (VMs). In this

paper, we propose Nefeli, a virtual infrastructure gateway that lifts this restriction. Through Nefeli, cloud

consumers provide deployment hints on the possible mapping of VMs to physical nodes. Such hints include the collocation and anticollocation of VMs, the existence of potential performance bottlenecks,

the presence of underlying hardware features (e.g., high availability), the proximity of certain VMs to

data repositories, or any other information that would contribute to a more effective placement of VMs to

physical hosting nodes. Consumers designate only properties of their virtual infrastructure and remain at all times agnostic to the cloud internal physical characteristics. The set of consumer-provided hints is

augmented with high-level placement policies specified by the cloud administration. Placement policies

and hints form a constraint satisfaction problem that when solved, yields the final VM-to-host placement. As workloads executed by the cloud may change over time, VM-to-host mappings must follow suit. To

this end, Nefeli captures such events, changes VM deployment, helps avoid bottlenecks, and ultimately,

improves the quality of the rendered services. Using our prototype, we examine overheads involved and show significant improvements in terms of time needed to execute scientific and real application

workloads. We also demonstrate how power-aware policies may reduce the energy consumption of the

physical installation. Finally, we compare Nefeli's placement choices with those attained by the open-

source cloud middleware, OpenNebula.

ETPL

PDS-103

Hypocomb: Bounded-Degree Localized Geometric Planar Graphs for Wireless Ad Hoc

Networks




Abstract: We propose a radically new family of geometric graphs, i.e., Hypocomb (HC), Reduced

Hypocomb (RHC), and Local Hypocomb (LHC). HC and RHC are extracted from a complete graph;

LHC is extracted from a Unit Disk Graph (UDG). We analytically study their properties including connectivity, planarity, and degree bound. All these graphs are connected (provided that the original

graph is connected) and planar. Hypocomb has unbounded degree while Reduced Hypocomb and Local

Hypocomb have maximum degree 6 and 8, respectively. To our knowledge, Local Hypocomb is the first strictly localized, degree-bounded planar graph computed using merely 1-hop neighbor position

information. We present a construction algorithm for these graphs and analyze its time complexity.

Hypocomb family graphs are promising for wireless ad hoc networking. We report our numerical results on their average degree and their impact on FACE routing. We discuss their potential applications and

pinpoint some interesting open problems for future research.

ETPL

PDS-103 Link Scheduling for Exploiting Spatial Reuse in Multihop MIMO Networks

Abstract: Multiple-Input-Multiple-Output (MIMO) has great potential for enhancing the throughput of

multihop wireless networks via spatial multiplexing or spatial reuse. Spatial reuse with Stream Control

(SC) provides a considerable improvement of the network throughput over spatial multiplexing. The gain of spatial reuse, however, is still not fully exploited. There exist large numbers of additional data streams,

which could be transmitted concurrently with those data streams scheduled by stream control at certain

time slots and vicinities. In this paper, we address the issue of MIMO link scheduling to maximize the gain of spatial reuse and thus network throughput. We propose a Receiver-Oriented Interference

Suppression model (ROIS), based on which we design both centralized and distributed link scheduling

algorithms to fully exploit the gain of spatial reuse in multihop MIMO networks. Further, we address the traffic-aware link scheduling problem by injecting nonuniform traffic load into the network. Through

theoretical analysis and comprehensive performance evaluation, we achieve the following results: 1) link

scheduling based on ROIS achieves significantly higher network throughput than that based on stream

control, with any interference range, number of antennas, and average hop length of data flows. 2) The traffic-aware scheduling is enticingly complementary to the link scheduling based on ROIS model.

Accordingly, the two scheduling schemes can be combined to further enhance the network throughput.

ETPL

PDS-103 Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers under Quality of Service Constraints

Abstract: Dynamic consolidation of virtual machines (VMs) is an effective way to improve the utilization

of resources and energy efficiency in cloud data centers. Determining when it is best to reallocate VMs from an overloaded host is an aspect of dynamic VM consolidation that directly influences the resource

utilization and quality of service (QoS) delivered by the system. The influence on the QoS is explained by

the fact that server overloads cause resource shortages and performance degradation of applications. Current solutions to the problem of host overload detection are generally heuristic based, or rely on

statistical analysis of historical data. The limitations of these approaches are that they lead to suboptimal

results and do not allow explicit specification of a QoS goal. We propose a novel approach that for any

known stationary workload and a given state configuration optimally solves the problem of host overload detection by maximizing the mean intermigration time under the specified QoS goal based on a Markov

chain model. We heuristically adapt the algorithm to handle unknown nonstationary workloads using the

Multisize Sliding Window workload estimation technique. Through simulations with workload traces



from more than a thousand PlanetLab VMs, we show that our approach outperforms the best benchmark

algorithm and provides approximately 88 percent of the performance of the optimal offline algorithm.

ETPL

PDS-103 Per-Flow Queue Management with Succinct Priority Indexing Structures for High Speed Packet Scheduling

Abstract: Priority queues are essential building blocks for implementing advanced per-flow service disciplines and hierarchical quality-of-service at high-speed network links. Scalable priority queue

implementation requires solutions to two fundamental problems. The first is to sort queue elements in real

time at ever increasing line speeds (e.g., at OC-768 rates). The second is to store a huge number of

packets (e.g., millions of packets). In this paper, we propose novel solutions by decomposing the problem into two parts, a succinct priority index (PI) in SRAM that can efficiently maintain a real-time sorting of

priorities, coupled with a DRAM-based implementation of large packet buffers. In particular, we propose

three related novel succinct PI data structures for implementing high-speed PIs: a PI, a counting priority index (CPI), and a pipelined counting priority index (pCPI). We show that all three structures can be very

compactly implemented in SRAM using only Θ(U) space, where U is the size of the universe required to

implement the priority keys (time stamps). We also show that our proposed PI structures can be

implemented very efficiently as well by leveraging hardware-optimized instructions that are readily

available in modern 64-bit processors. The operations on the PI and CPI structures take Θ(log_W U) time

complexity, where W is the processor word length (i.e., W = 64). Alternatively, operations on the pCPI

structure take amortized constant time with only Θ(log_W U) pipeline stages (e.g., only four pipeline stages for U = 16 million). Finally, we show the application of our proposed PI structures for the scalable

management of large packet buffers at line speeds. The pCPI structure can be implemented efficiently in

high-performance network processing applications such as advanced per-flow scheduling with quality-of-

service guarantee.
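
To make the Θ(log_W U) bound concrete, the following is a minimal sketch of a generic multi-level bitmap index, assuming W = 64 and U = W^levels. It is not the PI, CPI, or pCPI design from the paper, whose details the abstract does not give; the class and method names are hypothetical. Python integers stand in for 64-bit SRAM words, and the lowest-set-bit expression stands in for a hardware find-first-set instruction.

```python
W = 64  # word length assumed from the abstract (W = 64)


class PriorityIndex:
    """Hypothetical multi-level bitmap over keys 0..U-1, with U = W**levels."""

    def __init__(self, levels):
        self.levels = levels
        self.universe = W ** levels
        # Level 0 is a single root word; the last level has one bit per key.
        self.words = [[0] * (W ** d) for d in range(levels)]

    def insert(self, key):
        # Set the key's bit, then mark its word as non-empty in each ancestor.
        for d in range(self.levels - 1, -1, -1):
            word, bit = divmod(key, W)
            self.words[d][word] |= 1 << bit
            key = word

    def delete(self, key):
        # Clear the key's bit; propagate upward only while words become empty.
        for d in range(self.levels - 1, -1, -1):
            word, bit = divmod(key, W)
            self.words[d][word] &= ~(1 << bit)
            if self.words[d][word]:
                break
            key = word

    def find_min(self):
        # Follow the lowest set bit of one word per level: `levels` word reads.
        if self.words[0][0] == 0:
            return None
        idx = 0
        for d in range(self.levels):
            w = self.words[d][idx]
            idx = idx * W + ((w & -w).bit_length() - 1)
        return idx
```

Each operation touches one word per level, so the cost is log_W U word operations, and the bitmaps together hold U + U/W + ... bits, which is Θ(U). With W = 64, four levels cover 64^4 = 16,777,216 keys, which matches the "four pipeline stages for U = 16 million" figure quoted above.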

ETPL

PDS-103 POVA: Traffic Light Sensing with Probe Vehicles

Abstract: Traffic light sensing aims to detect the status of traffic lights, which is valuable for many applications such as traffic management, traffic light optimization, and real-time vehicle navigation. In

this work, we develop a system called POVA for traffic light sensing in large-scale urban areas. The

system employs pervasive probe vehicles that just report real-time states of position and speed from time to time. POVA has advantages of wide coverage and low deployment cost. The important observation

motivating the design of POVA is that a traffic light has a considerable impact on mobility of vehicles on

the road attached to the traffic light. However, the system design faces three unique challenges: 1) Probe reports are by nature discrete while the goal of traffic light sensing is to determine the state of a traffic

light at any time; 2) there may be a very limited number of probe reports in a given duration for traffic

light state estimation; and 3) a traffic light may change its state with a variable interval. To tackle the

challenges, we develop a new technique that makes the best use of limited probe reports as well as statistical features of light states. It first estimates the state of a traffic light at the time instant of a report

by applying maximum a posteriori estimation. Then, we formulate the state estimation of a light at any

time into a joint optimization problem that is solved by an efficient heuristic algorithm. We have implemented the system and tested it with a fleet of around 4,000 probe taxis and 2,000 buses in

Shanghai, China. Trace-driven experimentation and field study show that nearly 60 percent of traffic

lights have an estimation error lower than 19 percent if 20,000 probe vehicles would be employed in the

urban area of Shanghai. We further demonstrate that the estimation error rate is as low as 18 percent even



when the number of available reports is merely 1 per minute.

ETPL

PDS-103 Resisting Web Proxy-Based HTTP Attacks by Temporal and Spatial Locality Behavior

Abstract: A novel server-side defense scheme is proposed to resist the Web proxy-based distributed denial

of service attack. The approach utilizes the temporal and spatial locality to extract the behavior features of

the proxy-to-server traffic, which makes the scheme independent of the traffic intensity and frequently

varying Web contents. A nonlinear mapping function is introduced to protect weak signals from the interference of infrequent large values. Then, a new hidden semi-Markov model parameterized by

Gaussian-mixture and Gamma distributions is proposed to describe the time-varying traffic behavior of

Web proxies. The new method reduces the number of parameters to be estimated, and can characterize the dynamic evolution of the proxy-to-server traffic rather than the static statistics. Two diagnosis approaches

at different scales are introduced to meet the requirement of both fine-grained and coarse-grained

detection. Soft control is a novel attack response method proposed in this work. It converts a suspicious

traffic into a relatively normal one by behavior reshaping rather than rudely discarding. This measure can protect the quality of services of legitimate users. The experiments confirm the effectiveness of the

proposed scheme.

ETPL

PDS-103 Runtime Contention and Bandwidth-Aware Adaptive Routing Selection Strategies for Networks-on-Chip

Abstract: This paper presents adaptive routing selection strategies suitable for network-on-chip (NoC).

The main prototype presented in this paper uses contention information and bandwidth space occupancy to make routing decisions at runtime during application execution. The performance of the NoC

router is compared to other NoC routers with queue-length-oriented adaptive routing selection strategies.

The evaluation results show that the contention- and bandwidth-aware adaptive routing selection strategies are better than the queue-length-oriented adaptive selection strategies. Messages in the NoC are

switched with a wormhole cut-through switching method, where different messages can be interleaved at

flit-level in the same communication link without using virtual channels. Hence, the head-of-line blocking

problem can be solved effectively and efficiently. The routing control concept and the VLSI microarchitecture of the NoC routers are also presented in this paper.

ETPL

PDS-103 Self-Adaptive Contention Aware Routing Protocol for Intermittently Connected Mobile Networks

Abstract: This paper introduces a novel multicopy routing protocol, called Self-Adaptive Utility-based

Routing Protocol (SAURP), for Delay Tolerant Networks (DTNs) that are possibly composed of a vast

number of miniature devices, such as smart phones, with heterogeneous capacities in terms of energy resources and buffer space. SAURP is characterized by its ability to identify potential opportunities

for forwarding messages to their destinations via a novel utility function-based mechanism, in which a

suite of environment parameters, such as wireless channel condition, nodal buffer occupancy, and encounter statistics, are jointly considered. Thus, SAURP can reroute messages around nodes



experiencing high-buffer occupancy, wireless interference, and/or congestion, while taking a considerably

small number of transmissions. The developed utility function in SAURP is proved to be able to achieve

optimal performance, which is further analyzed via a stochastic modeling approach. Extensive simulations are conducted to verify the developed analytical model and compare the proposed SAURP

with a number of recently reported encounter-based routing approaches in terms of delivery ratio,

delivery delay, and the number of transmissions required for each message delivery. The simulation results show that SAURP outperforms all the counterpart multicopy encounter-based routing protocols

considered in the study.

ETPL

PDS-103 Sensor Network Navigation without Locations

Abstract: We propose a pervasive usage of the sensor network infrastructure as a cyber-physical system for navigating internal users in locations of potential danger. Our proposed application differs from

previous work in that the latter typically treats the sensor network as a medium of data acquisition, while in our

navigation application, in-situ interactions between users and sensors become ubiquitous. In addition,

human safety and time factors are critical to the success of our objective. Without any preknowledge of user and sensor locations, the design of an effective and efficient navigation protocol faces nontrivial

challenges. We propose to embed a road map system in the sensor network without location information

so as to provide users with navigation routes of guaranteed safety. We accordingly design efficient road map updating mechanisms to rebuild the road map in the event of changes in dangerous areas. In this

navigation system, each user only issues local queries to obtain their navigation route. The system is

highly scalable for supporting multiple users simultaneously. We implement a prototype system with 36 TelosB motes to validate the effectiveness of this design. We further conduct comprehensive and large-

scale simulations to examine the efficiency and scalability of the proposed approach under various

environmental dynamics.

ETPL

PDS-103 The Bodyguard Allocation Problem

Abstract: In this paper, we introduce the Bodyguard Allocation Problem (BAP) game, which illustrates the

behavior of processes with contradictory individual goals in distributed systems. In particular, the game deals with the conflict of interest between two classes of processes that maximize/minimize their distance

to a special process called the root. A solution of the BAP game represents a rooted spanning tree in

which there exists a condition of equilibrium with maximum social welfare. We analyze the inefficiency of equilibria of the game based on both a completely cooperative and noncooperative approach.

Additionally, we design two algorithms, CBAP and DBAP, that provide approximated solutions for the

BAP game. We prove that both algorithms always terminate in a configuration with equilibrium and we analyze their running time based on the approach of cooperation used. We perform experimental

simulations to compare the overall quality of equilibria obtained by the proposed algorithms.

ETPL

PDS-103 A 3.42-Approximation Algorithm for Scheduling Malleable Tasks under Precedence Constraints

Abstract: Scheduling malleable tasks under general precedence constraints involves finding a minimum



makespan (maximum completion time) by a feasible allotment. Based on the monotonous penalty

assumptions of Blayo et al. [2], this work defines two assumptions concerning malleable tasks: the

processing time of a malleable task is nonincreasing in the number of processors, while the work of a malleable task is nondecreasing in the number of processors. Additionally, the work function is assumed

herein to be convex in the processing time. The proposed algorithm reformulates the linear program of

[11], and this algorithm and associated proofs are inspired by those of [11]. This work describes a novel polynomial-time approximation algorithm that is capable of achieving an approximation ratio of

2+√2≈3.4142. This work further demonstrates that the proposed algorithm can yield an approximation

ratio of 2.9549 when the processing time is strictly decreasing in the number of the processors allocated to the task. This finding represents an improvement upon the previous best approximation ratio of

100/63+100(√6469+13)/5481≈3.2920 [12] achieved under the same assumptions.

ETPL

PDS-103 Aging-Aware Energy-Efficient Workload Allocation for Mobile Multimedia Platforms

Abstract: Multicore platforms are characterized by increasing variability and aging effects that imply

heterogeneity in core performance, energy consumption, and reliability. In particular, wear-out effects

such as negative-bias-temperature-instability require runtime adaptation of system resource utilization to time-varying and uneven platform degradation, so as to prevent premature chip failure. In this context,

task allocation techniques can be used to deal with heterogeneous cores and extend chip lifetime while

minimizing energy and preserving quality of service. We propose a new formulation of the task allocation problem for variability affected platforms, which manages per-core utilization to achieve a target lifetime

while minimizing energy consumption during the execution of rate-constrained multimedia applications.

We devise an adaptive solution that can be applied online and approximates the result of an optimal, offline version. Our allocator has been implemented and tested on real-life functional workloads running

on a timing accurate simulator of a next-generation industrial multicore platform. We extensively assess

the effectiveness of the online strategy both against the optimal solution and also compared to alternative

state-of-the-art policies. The proposed policy outperforms state-of-the-art strategies in terms of lifetime preservation, while saving up to 20 percent of energy consumption without impacting timing constraints.

ETPL

PDS-103 An Efficient Penalty-Aware Cache to Improve the Performance of Parity-Based Disk Arrays under Faulty Conditions

Abstract: The buffer cache plays an essential role in smoothing the gap between the upper level

computational components and the lower level storage devices. A good buffer cache management scheme

should be beneficial to not only the computational components, but also the storage components by reducing disk I/Os. Existing cache replacement algorithms are well optimized for disks in normal mode,

but inefficient under faulty scenarios, such as a parity-based disk array with faulty disk(s). To address this

issue, we propose a novel penalty-aware buffer cache replacement strategy, named Victim Disk(s) First (VDF) cache, to improve the reliability and performance of a storage system consisting of a buffer cache

and disk arrays. VDF cache gives higher priority to cache the blocks on the faulty disks when the disk

array fails, thus reducing the I/Os addressed directly to the faulty disks. To verify the effectiveness of the

VDF cache, we have integrated VDF into the popular cache algorithms least frequently used (LFU) and least recently used (LRU), named VDF-LFU and VDF-LRU, respectively. We have conducted intensive

simulations as well as a prototype implementation for disk arrays to tolerate one disk failure (RAID-5)

and two disk failures (RAID-6). The simulation results have shown that VDF-LFU can reduce disk I/Os to surviving disks by up to 42.3 percent in RAID-5 and 50.7 percent in RAID-6, and VDF-LRU can



reduce those by up to 36.2 percent in RAID-5 and 48.9 percent in RAID-6. Our measurement results also

show that VDF-LFU can speed up the online recovery by up to 46.3 percent in RAID-5 and 47.2 percent

in RAID-6 under spare-rebuilding mode, or improve the maximum system service rate by up to 47.7 percent in RAID-5 under degraded mode without a reconstruction workload. Similarly, VDF-LRU can

speed up the online recovery by up to 34.6 percent in RAID-5 and 38.2 percent in RAID-6, or improve

the system service rate by up to 28.4 percent in RAID-5.
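
The idea of favoring blocks from faulty disks can be sketched as a small variation on LRU. The code below is only an illustration under a strong simplifying assumption: blocks on faulty disks are never evicted while any block from a surviving disk remains. The abstract does not describe how VDF actually weighs the miss penalty, so the class name `VdfLruCache`, the `load` callback, and the strict-priority rule are all hypothetical.

```python
from collections import OrderedDict


class VdfLruCache:
    """Illustrative LRU cache that prefers to keep blocks of faulty disks."""

    def __init__(self, capacity, faulty_disks=()):
        self.capacity = capacity
        self.faulty = set(faulty_disks)
        self.blocks = OrderedDict()  # (disk, block_no) -> data, oldest first

    def access(self, disk, block_no, load):
        """Return the block, calling load(disk, block_no) only on a miss."""
        key = (disk, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[key] = load(disk, block_no)
        return self.blocks[key]

    def _evict(self):
        # Pick the least recently used block that lives on a surviving disk.
        victim = next((k for k in self.blocks if k[0] not in self.faulty), None)
        if victim is None:
            # Every cached block is on a faulty disk: fall back to plain LRU.
            self.blocks.popitem(last=False)
        else:
            del self.blocks[victim]
```

The motivation is that, in a parity-based array, a miss on a block of a failed disk must be served by reading the corresponding blocks on the surviving disks and reconstructing the data, so keeping such blocks cached saves more I/O than keeping an ordinary block.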

ETPL

PDS-103 DoMaIN: A Novel Dynamic Location Management Solution for Internet-Based Infrastructure Wireless Mesh Networks

Abstract: Wireless mesh networks (WMNs) have been deployed in many areas. There is an increasing demand for supporting a large number of mobile users in WMNs. As one of the key components in

mobility management support, location management serves the purpose of tracking mobile users and

locating them prior to establishing new communications. Previous dynamic location management schemes proposed for cellular and wireless local area networks (WLANs) cannot be directly applied to

WMNs due to the existence of multihop wireless links in WMNs. Moreover, new design challenges arise

when applying location management for silently roaming mobile users in the mesh backbone.

Considering the number of wireless hops, an important factor affecting the performance of WMNs, we propose a DoMaIN framework that can help mobile users to decide whether an intra- or intergateway

location update (LU) is needed to ensure the best location management performance (i.e., packet delivery)

among dynamic location management solutions. In addition, by dynamically guiding mobile users to perform LU to a desirable location entity, the proposed DoMaIN framework can minimize the location

management protocol overhead in terms of LU overhead in the mesh backbone. Furthermore, DoMaIN

brings extra benefits for supporting a dynamic hop-based LU triggering method that is different from previous dynamic LU triggering schemes proposed for cellular networks and WLANs. We evaluate the

performance of DoMaIN in different case studies using OPNET simulations. Comprehensive simulation

results demonstrate that DoMaIN outperforms other location management schemes and is a satisfactory

location management solution for a large number of mobile users silently and arbitrarily roaming under the wireless mesh backbone.

ETPL

PDS-103 Efficient Computation of Robust Average of Compressive Sensing Data in Wireless Sensor Networks in the Presence of Sensor Faults

Abstract: Wireless sensor networks (WSNs) enable the collection of physical measurements over a large

geographic area. It is often the case that we are interested in computing and tracking the spatial-average of

the sensor measurements over a region of the WSN. Unfortunately, the standard average operation is not robust because it is highly susceptible to sensor faults and heterogeneous measurement noise. In this

paper, we propose a computationally efficient method to compute a weighted average (which we will call

robust average) of sensor measurements, which appropriately takes sensor faults and sensor noise into consideration. We assume that the sensors in the WSN use random projections to compress the data and

send the compressed data to the data fusion centre. Computational efficiency of our method is achieved

by having the data fusion centre work directly with the compressed data streams. The key advantage of

our proposed method is that the data fusion centre only needs to perform decompression once to compute the robust average, thus greatly reducing the computational requirements. We apply our proposed method

to the data collected from two WSN deployments to demonstrate its efficiency and accuracy.




ETPL

PDS-103 E-SmallTalker: A Distributed Mobile System for Social Networking in Physical Proximity

Abstract: Small talk is an important social lubricant that helps people, especially strangers, initiate conversations and make friends with each other in physical proximity. However, due to difficulties in

quickly identifying significant topics of common interest, real-world small talk tends to be superficial.

The mass popularity of mobile phones can help improve the effectiveness of small talk. In this paper, we present E-SmallTalker, a distributed mobile communications system that facilitates social networking in

physical proximity. It automatically discovers and suggests topics such as common interests for more

significant conversations. We build on Bluetooth Service Discovery Protocol (SDP) to exchange potential

topics by customizing service attributes to publish non-service-related information without establishing a connection. We propose a novel iterative Bloom filter protocol that encodes topics to fit in SDP attributes

and achieves a low false-positive rate. We have implemented the system in Java ME for ease of

deployment. Our experiments on real-world phones show that it is efficient enough at the system level to facilitate social interactions among strangers in physical proximity. To the best of our knowledge, E-

SmallTalker is the first distributed mobile system to achieve the same purpose.
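
For readers unfamiliar with Bloom filters, the sketch below shows how a set of topics can be packed into a short fixed-size bit string and later probed for likely matches. It is a plain single-round Bloom filter, not the iterative protocol proposed in the paper, and the 128-bit size, the three hash functions, and the use of SHA-256 are assumptions chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter stored in one Python integer."""

    def __init__(self, num_bits=128, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_bytes(self):
        # Fixed-size payload, e.g. for a size-limited attribute field.
        return self.bits.to_bytes((self.num_bits + 7) // 8, "big")
```

A device would add its own topics, publish `to_bytes()`, and a nearby device would test its own topics against the received filter. A positive answer can be a false positive, which is the rate the paper's iterative protocol aims to keep low.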

ETPL

PDS-103 Formal Specification and Runtime Detection of Dynamic Properties in Asynchronous Pervasive Computing Environments

Abstract: Formal specification and runtime detection of contextual properties is one of the primary

approaches to enabling context awareness in pervasive computing environments. Due to the intrinsic dynamism of the pervasive computing environment, dynamic properties, which delineate concerns of

context-aware applications on the temporal evolution of the environment state, are of great importance.

However, detection of dynamic properties is challenging, mainly due to the intrinsic asynchrony among

computing entities in the pervasive computing environment. Moreover, the detection must be conducted at runtime in pervasive computing scenarios, which renders existing schemes inapplicable. To address

these challenges, we propose the property detection for asynchronous context (PDAC) framework, which

consists of three essential parts: 1) Logical time is employed to model the temporal evolution of environment state as a lattice. The active surface of the lattice is introduced as the key notion to model the

runtime evolution of the environment state; 2) Specification of dynamic properties is viewed as a formal

language defined over the trace of environment state evolution; and 3) The SurfMaint algorithm is

proposed to achieve runtime maintenance of the active surface of the lattice, which further enables runtime detection of dynamic properties. A case study is conducted to demonstrate how the PDAC

framework enables context awareness in asynchronous pervasive computing scenarios. The SurfMaint

algorithm is implemented and evaluated over MIPA--the open-source context-aware middleware we developed. Performance measurements show the accuracy and cost-effectiveness of SurfMaint, even

when faced with dynamic changes in the asynchronous pervasive computing environment.
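
The abstract does not say which logical clock PDAC uses. Assuming vector clocks, a common choice for asynchronous systems, the sketch below shows the two checks that underlie a lattice of global states: ordering two events and testing whether a set of local states forms a consistent global state. The function names are hypothetical, and this is not the SurfMaint algorithm.

```python
def happened_before(u, v):
    """True if the event with vector clock u causally precedes the one with v."""
    return all(a <= b for a, b in zip(u, v)) and tuple(u) != tuple(v)


def concurrent(u, v):
    """True if neither event causally precedes the other."""
    return (tuple(u) != tuple(v)
            and not happened_before(u, v)
            and not happened_before(v, u))


def is_consistent_cut(clocks):
    """clocks[i] is the vector clock of process i's latest included event.

    The cut is consistent if no process has seen more events of process i
    than process i itself has included in the cut.
    """
    n = len(clocks)
    return all(clocks[j][i] <= clocks[i][i] for i in range(n) for j in range(n))
```

Each consistent cut is one point of the lattice; advancing one process by one event moves to a neighboring point, provided the result is still consistent.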

ETPL

PDS-103 GPUs as Storage System Accelerators

Abstract: Massively multicore processors, such as graphics processing units (GPUs), provide, at a

comparable price, an order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of

system components, triggers the opportunity to redesign systems and to explore new ways to engineer

them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing



GPUs' computational power to improve the performance, reliability, or security of distributed storage

systems. In this context, we present the design of a storage system prototype that uses GPU offloading to

accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype

under two configurations: as a content addressable storage system that facilitates online similarity

detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing

applications' performance. Our results show that this technique can bring tangible performance gains

without negatively impacting the performance of concurrently running applications.
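
The hashing primitive behind content-addressable similarity detection can be illustrated on the CPU. The sketch below is a minimal example, assuming fixed-size chunks and SHA-256; the paper's prototype offloads hashing to the GPU, and the abstract does not state its chunking scheme or hash function, so both function names and parameters here are hypothetical.

```python
import hashlib


def chunk_hashes(data, chunk_size=4096):
    """Hash each fixed-size chunk; the digest serves as the chunk's address."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def shared_chunk_fraction(old, new, chunk_size=4096):
    """Fraction of chunks in `new` that already exist in `old`."""
    old_hashes = set(chunk_hashes(old, chunk_size))
    new_hashes = chunk_hashes(new, chunk_size)
    if not new_hashes:
        return 0.0
    return sum(h in old_hashes for h in new_hashes) / len(new_hashes)
```

Chunks whose digests are already stored need not be written again, which is why hashing throughput is worth accelerating in such a system.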

ETPL

PDS-103 High-Accuracy TDOA-Based Localization without Time Synchronization

Abstract: Localization is of great importance in mobile and wireless network applications. Time Difference of Arrival (TDOA) is one of the widely used localization schemes, in which the target (source)

emits a signal and a number of anchors (receivers) record the arrival time of the source signal. By

calculating the time difference of different receivers, the location of the target is estimated. In such a

scheme, receivers must be precisely time synchronized. But time synchronization adds computational cost, and brings errors which may lower localization accuracy. Previous studies have shown that existing

time synchronization approaches using low-cost devices are insufficiently accurate, or even infeasible

under high accuracy requirements. In our scheme (called Whistle), several asynchronous receivers record a target signal and a successive signal that is generated artificially. By two-signal sensing and

sample counting techniques, the time synchronization requirement can be removed, while high time

resolution can be achieved. This design fundamentally changes TDOA in the sense of releasing the synchronization requirement and avoiding many sources of errors caused by time synchronization. We

implement Whistle on commercial off-the-shelf (COTS) cell phones with acoustic signals and perform

simulations with UWB signals. In particular, we use Whistle to localize nodes of large-scale wireless

networks, and also achieve desirable results. The extensive real-world experiments and simulations show that Whistle can be widely used with good accuracy.
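
Why two signals remove the need for synchronized clocks can be seen from a short derivation, sketched here under assumptions the abstract does not spell out: the second signal is emitted from a known position, and all receivers sample at the same nominal rate. Each receiver counts the samples between the two arrivals on its own clock. If the target emits at time t0 and the reference emitter at time t1, receiver i measures an interval of (t1 - t0) + (d_ref_i - d_target_i) / c, so subtracting two receivers' intervals cancels t1 - t0 and every clock offset. The function names below are hypothetical.

```python
import math


def range_difference(n_i, n_j, d_ref_i, d_ref_j, sample_rate, speed):
    """Return d(target, i) - d(target, j) in metres.

    n_i, n_j: samples each receiver counted between the two signal arrivals.
    d_ref_i, d_ref_j: known distances from the reference emitter (assumed).
    """
    interval_i = n_i / sample_rate
    interval_j = n_j / sample_rate
    return (d_ref_i - d_ref_j) - speed * (interval_i - interval_j)


def simulated_count(target, ref, receiver, gap, sample_rate, speed):
    """Ideal, unrounded sample count a receiver would observe (for testing)."""
    d_target = math.dist(target, receiver)
    d_ref = math.dist(ref, receiver)
    return sample_rate * (gap + (d_ref - d_target) / speed)
```

The resulting range differences feed a standard TDOA solver. Time resolution is set by the sampling rate: at 44.1 kHz one sample of sound travelling at 340 m/s corresponds to roughly 7.7 mm.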

ETPL

PDS-103 Intelligent Sensor Placement for Hot Server Detection in Data Centers

Abstract: Recent studies have shown that a significant portion of the total energy consumption of many

data centers is caused by the inefficient operation of their cooling systems. Without effective thermal monitoring with accurate location information, the cooling systems often use unnecessarily low

temperature set points to overcool the entire room, resulting in excessive energy consumption. Sensor

network technology has recently been adopted for data-center thermal monitoring because of its nonintrusive nature for the already complex data center facilities and robustness to instantaneous CPU or

disk activities. However, existing solutions place sensors in a simplistic way without considering the

thermal dynamics in data centers, resulting in unnecessarily degraded hot server detection probability. In

this paper, we first formulate the problems of sensor placement for hot server detection in a data center as constrained optimization problems in two different scenarios. We then propose a novel placement scheme

based on computational fluid dynamics (CFD) to take various factors, such as cooling systems and server

layout, as inputs to analyze the thermal conditions of the data center. Based on the CFD analysis in



various server overheating scenarios, we apply data fusion and advanced optimization techniques to find a

near-optimal sensor placement solution, such that the probability of detecting hot servers is significantly

improved. Our empirical results in a real server room demonstrate the detection performance of our placement solution. Extensive simulation results in a large-scale data center with 32 racks also show that

the proposed solution outperforms several commonly used placement solutions in terms of detection

probability.

ETPL

PDS-103 ITA: Innocuous Topology Awareness for Unstructured P2P Networks

Abstract: One of the most appealing characteristics of unstructured P2P overlays is their enhanced self-* properties, which result from their loose, random structure. In addition, most of the algorithms which

make searching in unstructured P2P systems scalable, such as dynamic querying and 1-hop replication,

rely on the random nature of the overlay to function efficiently. The underlying communications network (i.e., the Internet), however, is not as randomly constructed. This leads to a mismatch between the

distance of two peers on the overlay and the hosts they reside on at the IP layer, which in turn leads to its

misuse. The crux of the problem arises from the fact that any effort to provide a better match between the

overlay and the IP layer will inevitably lead to a reduction in the random structure of the P2P overlay, with many adverse results. With this in mind, we propose ITA, an algorithm which creates a random

overlay of randomly connected neighborhoods providing topology awareness to P2P systems, while at the

same time having no negative effect on the self-* properties or the operation of the other P2P algorithms. Using extensive simulations, both at the IP router level and autonomous system level, we show that ITA

reduces communication latencies by as much as 50 percent. Furthermore, it not only reduces by 20

percent the number of IP network messages which is critical for ISPs carrying the burden of transporting P2P traffic, but also distributes the traffic load more evenly on the routers of the IP network layer.

ETPL

PDS-103 K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps

Abstract: We present an implementation of parallel K-means clustering, called Kps-means,

that achieves high performance with near-full occupancy compute kernels without imposing limits on the

number of dimensions and data points permitted as input, thus combining flexibility with high degrees of

parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups

of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with

popular numerical software packages.
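
One way sorting and prefix sums can serve the K-means update step is sketched below in serial Python. Points are sorted by cluster label so that each cluster occupies a contiguous segment, a running (prefix) sum is taken over each coordinate, and each cluster's sum is read off as a difference of two prefix values. This is an illustration of the general idea only. The abstract does not describe the actual GPU kernels, where the sort and the scan would be parallel primitives, and the function name is hypothetical.

```python
from itertools import accumulate


def update_centroids(points, labels, k):
    """points: list of equal-length tuples; labels: cluster id for each point."""
    dim = len(points[0])
    # Sort point indices by label so each cluster is one contiguous segment.
    order = sorted(range(len(points)), key=lambda i: labels[i])
    # Prefix sums per coordinate, with a leading zero for easy differencing.
    prefix = [[0.0] + list(accumulate(points[i][d] for i in order))
              for d in range(dim)]
    # Segment boundaries: cluster c occupies positions starts[c]..starts[c+1]-1.
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    starts = [0] + list(accumulate(counts))
    centroids = []
    for c in range(k):
        lo, hi = starts[c], starts[c + 1]
        if hi == lo:
            centroids.append(None)  # empty cluster: no centroid defined
        else:
            centroids.append(tuple((prefix[d][hi] - prefix[d][lo]) / (hi - lo)
                                   for d in range(dim)))
    return centroids
```

Because each centroid depends on only two prefix values per coordinate, the final step needs no atomic updates or per-cluster locks, which is what makes the formulation attractive on wide parallel hardware.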

ETPL

PDS-103 LU Factorization with Partial Pivoting for a Multicore System with Accelerators

Abstract: LU factorization with partial pivoting is a canonical numerical procedure and the main

component of the high performance LINPACK benchmark. This paper presents an implementation of the

algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion between the

computational power of the CPUs and that of the GPUs, and in the meager bandwidth of the



communication link between their memory systems. An additional challenge comes from the complexity

of the memory-bound and synchronization-rich nature of the panel factorization component of the block

LU algorithm, imposed by the use of partial pivoting. The challenges are tackled with the use of a data layout geared toward complex memory hierarchies, autotuning of GPU kernels, fine-grain parallelization

of memory-bound CPU operations and dynamic scheduling of tasks to different devices. Performance in

excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
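
For reference, the underlying numerical procedure can be written in a few lines. The following is a minimal, unblocked, single-threaded sketch of LU factorization with partial pivoting in plain Python; it is not the hybrid CPU/GPU implementation described in the abstract, and the function name and return convention are choices made for this example.

```python
def lu_partial_pivot(a):
    """Return (perm, lu). lu packs the unit-lower factor L (below the
    diagonal) and the upper factor U (on and above it), and row i of
    L*U equals row perm[i] of the input matrix."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            lu[col], lu[pivot] = lu[pivot], lu[col]
            perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]  # multiplier, stored in L's slot
            for c in range(col + 1, n):
                lu[r][c] -= lu[r][col] * lu[col][c]  # trailing submatrix update
    return perm, lu
```

The pivot search and row swap inside the column loop are the memory-bound, synchronization-heavy part that the abstract calls the panel factorization; the innermost update is the compute-heavy part that maps well to accelerators.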

ETPL

PDS-103 LvtPPP: Live-Time Protected Pseudopartitioning of Multicore Shared Caches

Abstract: A partition enforcement policy is essential in cache partitioning, and its main function is to

protect the lines and retain the cache quota of each core. This paper focuses on line protection based on the line's

generation time rather than the ID of the CPU core that the line belongs to or its position in the replacement stack. The basic idea is that when a line is live, it must be protected and retained in the

cache; when the line is "dead," it needs to be evicted as early as possible. Therefore, the

live-time protected counter (LvtP, four bits) is augmented to trace the lines' live time. Moreover, dead

blocks are predicted according to the access event sequence. This paper presents a pseudopartition approach--LvtPPP and proposes a two-cascade victim selection mechanism to alleviate dead blocks based

on the LRU replacement policy and the LvtP counter. LvtPPP also supports flexible handling of

allocation deviation by introducing a parameter $\lambda$ to adjust the generation time of the line. Evaluation results based on Simics show that LvtPPP significantly improves performance and fairness

over PIPP and UCP.
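
The abstract gives only the outline of the mechanism, so the following Python sketch is a toy model under stated assumptions, not the LvtPPP design: it assumes the four-bit counter is refreshed on a hit, counts down as the line ages, and marks the line dead at zero, and that victim selection first looks for a dead line and otherwise falls back to LRU. All names are hypothetical.

```python
LVTP_MAX = 15  # largest value a four-bit counter can hold

class Line:
    def __init__(self, tag):
        self.tag = tag
        self.lvtp = LVTP_MAX  # assumed: a fill or hit marks the line as live

def touch(line):
    """Assumed hit behavior: refresh the live-time counter."""
    line.lvtp = LVTP_MAX

def age(lines, step=1):
    """Assumed aging: counters count down and saturate at zero ("dead")."""
    for line in lines:
        line.lvtp = max(0, line.lvtp - step)

def choose_victim(lru_stack):
    """lru_stack is ordered from least to most recently used.
    Stage 1: evict the least recently used line whose counter is zero.
    Stage 2: if no line is dead, fall back to the plain LRU victim."""
    for line in lru_stack:
        if line.lvtp == 0:
            return line
    return lru_stack[0]
```

The point of the two stages is that a live line at the LRU position survives as long as some dead line can be evicted instead.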

ETPL

PDS-103 Modeling Propagation Dynamics of Social Network Worms

Abstract: Social network worms, such as email worms and Facebook worms, pose a critical security threat

to the Internet. Modeling their propagation dynamics is essential to predict their potential damages and develop countermeasures. Although several analytical models have been proposed for modeling

propagation dynamics of social network worms, two critical problems remain unsolved: temporal

dynamics and spatial dependence. First, previous models have not taken into account the different time

periods of Internet users checking emails or social messages, namely, temporal dynamics. Second, the problem of spatial dependence results from the improper assumption that the states of neighboring nodes

are independent. These two problems seriously affect the accuracy of the previous analytical models. To

address these two problems, we propose a novel analytical model. This model implements a spatial-temporal synchronization process, which is able to capture the temporal dynamics. Additionally, we find

the essence of spatial dependence is the spreading cycles. By eliminating the effect of these cycles, our

model overcomes the computational challenge of spatial dependence and provides a stronger approximation to the propagation dynamics. To evaluate our susceptible-infectious-immunized (SII)

model, we conduct both theoretical analysis and extensive simulations. Compared with previous epidemic

models and the spatial-temporal model, the experimental results show our SII model achieves greater

accuracy. We also compare our model with the susceptible-infectious-susceptible and susceptible-infectious-recovered models. The results show that our model is more suitable for modeling the

propagation of social network worms.




ETPL

PDS-103 Online Balancing Two Independent Criteria upon Placements and Deletions

Abstract: The analysis of real-world complex networks has been the focus of recent research. Detecting communities helps in uncovering their structural and functional organization. Valuable insight can be

obtained by analyzing the dense, overlapping, and highly interwoven $k$-clique communities.

However, their detection is challenging due to extensive memory requirements and execution time. In this paper, we present a novel, parallel $k$-clique community detection method, based on an innovative

technique which enables connected components of a network to be obtained from those of its

subnetworks. The novel method has an unbounded, user-configurable, and input-independent maximum

degree of parallelism, and hence is able to make full use of computational resources. Theoretical tight upper bounds on its worst case time and space complexities are given as well. Experiments on real-world

networks such as the Internet and the World Wide Web confirmed the almost optimal use of parallelism

(i.e., a linear speedup). Comparisons with other state-of-the-art $k$-clique community detection methods show dramatic reductions in execution time and memory footprint. An open-source

implementation of the method is also made publicly available.
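
The idea of obtaining the connected components of a network from those of its subnetworks can be illustrated with a union-find structure. In this Python sketch, which is an illustration and not the authors' open-source implementation, each subnetwork is an edge list processed independently, and the partial results are merged by treating every (node, representative) pair as an edge. Plain integers stand in for graph nodes here; in a $k$-clique setting the nodes of the graph being connected would be cliques, which is an assumption not detailed in the abstract.

```python
def find(parent, x):
    """Return the representative of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def components_of(edges):
    """Union-find over one edge list; returns node -> representative."""
    parent = {}
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        union(parent, u, v)
    return {node: find(parent, node) for node in parent}

def merge_components(partials):
    """Combine per-subnetwork results: each (node, representative) pair
    acts as an edge when building the components of the whole network."""
    parent = {}
    for comp in partials:
        for node, rep in comp.items():
            parent.setdefault(node, node)
            parent.setdefault(rep, rep)
            union(parent, node, rep)
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())
```

Because `components_of` touches only its own edge list, the calls for different subnetworks could run in parallel, with `merge_components` as the only step that needs all partial results.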

ETPL

PDS-103 Scalable Hypergrid k-NN-Based Online Anomaly Detection in Wireless Sensor Networks

Abstract: Online anomaly detection (AD) is an important technique for monitoring wireless sensor

networks (WSNs), which protects WSNs from cyberattacks and random faults. As a scalable and parameter-free unsupervised AD technique, the $k$-nearest neighbor (kNN) algorithm has attracted a lot of

attention for its applications in computer networks and WSNs. However, its lazy-learning nature

makes kNN-based AD schemes difficult to use in an online manner, especially when

communication cost is constrained. In this paper, a new kNN-based AD scheme based on hypergrid intuition is proposed for WSN applications to overcome the lazy-learning problem. Through redefining

anomaly from a hypersphere detection region (DR) to a hypercube DR, the computational complexity is

reduced significantly. At the same time, an attached coefficient is used to convert a hypergrid structure into a positive coordinate space in order to retain the redundancy for online updates and to tailor it for bit

operations. In addition, distributed computing is taken into account, and the position of the hypercube is

encoded in only a few bits using bit operations. As a result, the new scheme is able to work

successfully in any environment without human intervention. Finally, experiments with a real WSN data set demonstrate that the proposed scheme is effective and robust.
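
To make the hypercube idea concrete, the following Python sketch quantizes each reading into a grid cell, packs the cell indices into one integer with shifts and ORs, and flags a reading as anomalous when fewer than k stored readings fall in the block of cells around it. This is a simplified grid-counting scheme written for illustration, not the scheme proposed in the paper; the bit width, the cell size, the `lower` offset that shifts data into a non-negative coordinate space, and all function names are assumptions.

```python
from collections import Counter
from itertools import product
from math import floor

BITS = 8  # assumed number of bits per dimension in the packed cell code

def cell_code(point, lower, cell):
    """Shift the point into non-negative space, quantize it, and pack the
    per-dimension cell indices into one integer with shifts and ORs."""
    code = 0
    for x, lo in zip(point, lower):
        idx = floor((x - lo) / cell)
        if not 0 <= idx < (1 << BITS):
            raise ValueError("point outside the encodable range")
        code = (code << BITS) | idx
    return code

def build_counts(history, lower, cell):
    """Count how many stored readings fall into each cell."""
    return Counter(cell_code(p, lower, cell) for p in history)

def neighbor_codes(code, dims):
    """Codes of the (up to 3**dims) cells forming the hypercube around `code`."""
    mask = (1 << BITS) - 1
    idxs = [(code >> (BITS * (dims - 1 - d))) & mask for d in range(dims)]
    for offs in product((-1, 0, 1), repeat=dims):
        cand = [i + o for i, o in zip(idxs, offs)]
        if all(0 <= c <= mask for c in cand):
            out = 0
            for c in cand:
                out = (out << BITS) | c
            yield out

def is_anomaly(point, counts, lower, cell, k):
    """Flag the point if fewer than k stored readings fall in its hypercube."""
    code = cell_code(point, lower, cell)
    near = sum(counts[c] for c in neighbor_codes(code, len(point)))
    return near < k
```

No distances are computed at query time: the decision needs only integer codes and a handful of counter lookups, which is what makes the cell-based formulation cheap to evaluate and to transmit.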

ETPL

PDS-103 Task Allocation for Undependable Multiagent Systems in Social Networks

Abstract: Task execution of multiagent systems in social networks (MAS-SN) can be described through

agents' operations when accessing necessary resources distributed in the social networks; thus, task

allocation can be implemented based on the agents' access to the resources required for each task, with the aim of minimizing this resource access time. Currently, in undependable MAS-SN, there are deceptive

agents that may fabricate their resource status information during task allocation but not really contribute

resources to task execution. Although there are some game theory-based solutions for undependable MAS, they do not consider minimizing resource access time, which is crucial to the performance of task

execution in social networks. To achieve dependable resources with the least access time to execute tasks

in undependable MAS-SN, this paper presents a novel task allocation model based on the negotiation



reputation mechanism, where an agent's past behaviors in the resource negotiation of task execution can

influence its probability to be allocated new tasks in the future. In this model, the agent that contributes

more dependable resources with less access time during task execution is rewarded with a higher negotiation reputation, and may receive preferential allocation of new tasks. Through experiments, we

determine that our task allocation model is superior to the traditional resources-based allocation

approaches and game theory-based allocation approaches in terms of both the task allocation success rate and task execution time and that it usually performs close to the ideal approach (in which deceptive

agents are fully detected) in terms of task execution time.
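
The abstract does not give the reputation formula, so the following Python sketch only illustrates the general shape of such a mechanism under invented rules: an agent's reputation is an exponential moving average of a per-task score that rewards delivered resources and short access times, and the probability of receiving a new task is proportional to reputation. The update rate, the scoring rule, and the function names are all assumptions.

```python
def update_reputation(rep, delivered, access_time, max_time, rate=0.3):
    """Blend the old reputation with a per-task score in [0, 1].
    An agent that did not deliver the promised resources scores 0;
    otherwise a shorter access time gives a higher score."""
    if delivered:
        score = 1.0 - min(access_time, max_time) / max_time
    else:
        score = 0.0
    return (1.0 - rate) * rep + rate * score

def allocation_probabilities(reputations):
    """Assumed rule: the chance of being allocated a new task is
    proportional to the agent's current reputation."""
    total = sum(reputations.values())
    if total == 0:
        return {agent: 1.0 / len(reputations) for agent in reputations}
    return {agent: rep / total for agent, rep in reputations.items()}
```

Under these rules an agent that fabricates its resource status loses reputation after each failed task and is chosen less and less often, without ever being explicitly classified as deceptive.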

ETPL

PDS-103 Two Blocks Are Enough: On the Feasibility of Using Network Coding to Ameliorate the Content Availability of BitTorrent Swarms

Abstract: In this paper, we conduct an in-depth study on the feasibility of using network coding to

ameliorate the content availability of BitTorrent swarms. We first perform mathematical analysis on the potential improvement in the content availability and bandwidth utilization induced by two existing

network coding schemes. It is found that these two coding schemes either incur a very high coding

complexity and disk operation overhead or cannot effectively leverage the potential of improving the

content availability. In this regard, we propose a simple sparse network coding scheme that avoids both of these drawbacks. To integrate the proposed coding scheme into

BitTorrent, a new block scheduling algorithm is also developed based on the original rarest-first block

scheduling policy of BitTorrent. Through extensive simulations and performance evaluations, we show that the proposed coding scheme is very effective in terms of improving the content availability of

BitTorrent swarms when compared with some existing methods.
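
As a rough illustration of why combining only two blocks keeps coding cheap, the Python sketch below forms a coded block as the XOR of two original blocks, so a peer that already holds either one can recover the other with a single pass. Using plain XOR (coefficients of 1 over GF(2)) is a simplification chosen for this example; the abstract does not state the field or coefficients used by the proposed scheme, and the function names are hypothetical.

```python
def encode_pair(blocks, i, j):
    """Coded block = XOR of original blocks i and j (equal lengths assumed)."""
    payload = bytes(a ^ b for a, b in zip(blocks[i], blocks[j]))
    return (i, j, payload)

def decode_with(coded, known_index, known_block):
    """Recover the other block of the pair from the coded block
    and one block the peer already holds."""
    i, j, payload = coded
    if known_index not in (i, j):
        raise ValueError("known block is not part of this coded pair")
    other = j if known_index == i else i
    return other, bytes(a ^ b for a, b in zip(payload, known_block))
```

Decoding touches one coded block and one stored block, in contrast to dense coding schemes where recovering a block can require reading and combining many blocks from disk.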

ETPL

PDS-103 Virtual Batching: Request Batching for Server Energy Conservation in Virtualized Data Centers

Abstract: Many power management strategies have been proposed for enterprise servers based on

dynamic voltage and frequency scaling (DVFS), but those solutions cannot further reduce the energy consumption of a server when the server processor is already at the lowest DVFS level and the server

utilization is still low (e.g., 10 percent or lower). To achieve improved energy efficiency, request batching

can be conducted to group received requests into batches and put the processor to sleep between the

batches. However, it is challenging to perform request batching on a virtualized server because different virtual machines on the same server may have different workload intensities. Hence, putting the shared

processor to sleep may severely impact the application performance of all the virtual machines. This

paper proposes Virtual Batching, a novel request batching solution for virtualized servers with primarily light workloads. Our solution dynamically allocates CPU resources such that all the virtual machines can

have approximately the same performance level relative to their allowed peak values. Based on this

uniform level, Virtual Batching determines the time length for periodically batching incoming requests and putting the processor to sleep. When the workload intensity changes from light to moderate, request

batching is automatically switched to DVFS to increase processor frequency for performance guarantees.

Virtual Batching is also extended to integrate with server consolidation for maximized energy

conservation with performance guarantees for virtualized data centers. Empirical results based on a hardware testbed and real trace files show that Virtual Batching can achieve the desired performance with

more energy conservation than several well-designed baselines, e.g., 63 percent more, on average, than a

solution based on DVFS only.
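
The basic trade-off behind request batching can be shown with a small Python sketch: requests that arrive while the processor sleeps are held until the next periodic wake-up, which cuts the number of wake-ups at the cost of added delay bounded by the batching period. This models only fixed-period batching on a single queue, not the Virtual Batching controller itself; the millisecond time unit, the rule that a request arriving exactly at a wake-up waits for the next one, and the function names are assumptions.

```python
def batch_requests(arrivals_ms, period_ms):
    """Group arrival timestamps (integer ms) into batches released at
    multiples of period_ms; returns {release_time: [arrival, ...]}."""
    batches = {}
    for t in sorted(arrivals_ms):
        release = (t // period_ms + 1) * period_ms  # next wake-up after arrival
        batches.setdefault(release, []).append(t)
    return batches

def max_added_delay(batches):
    """Longest time any request waited for its batch to be released."""
    return max(release - t for release, ts in batches.items() for t in ts)
```

A longer period means fewer wake-ups and more sleep time but a larger worst-case delay, which is why the batching period has to be tied to the performance level the virtual machines are allowed.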
