accommodating bursts in distributed stream processing systems

32
Accommodating Bursts in Distributed Stream Processing Systems Yannis Drougas, ESRI [email protected] Vana Kalogeraki, AUEB [email protected]

Upload: breck

Post on 07-Jan-2016

36 views

Category:

Documents


4 download

DESCRIPTION

Accommodating Bursts in Distributed Stream Processing Systems. Yannis Drougas, ESRI [email protected] Vana Kalogeraki, AUEB [email protected]. Stream Processing Applications. Large class of emerging applications in which data streams must be processed online Example applications include: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Accommodating Bursts in Distributed Stream Processing Systems

Accommodating Bursts in Distributed Stream

Processing SystemsYannis Drougas, ESRI

[email protected]

Vana Kalogeraki, [email protected]

Page 2: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 2

• Large class of emerging applications in which data streams must be processed online

• Example applications include:– Traffic Monitoring– Sensor network data processing– Stock Exchange data filtering– Surveillance– Network monitoring

Stream Processing Applications

Page 3: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 3

Distributed Stream Processing SystemsHigh-volume, continuous

input streamsProcessed

result streams

On-line processing functions / continuous query operators implemented on each node:

Clustering Correlation Filtering Aggregation ...

Page 4: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 4

• Data is produced continuously, in large volumes and at high rates

• Data has to be processed in a timely manner, e.g. within a deadline

• Application input rates fluctuate notably and abruptlyE.g.:

– increasing traffic volume due to a DoS attack

– large number of messages generated when there is a time-critical event in the sensor network field

• It is important that no data is lost and that time-critical events are processed and delivered in a timely manner

Stream Processing Applications Characteristics

Clustering

Filtering

Join

Page 5: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 5

Previous Work• The majority of previous work [Tatbul@VLDB 2003,

VLDB2007, Amini@ICDCS 2006, Gu@ICDCS 2005] has focused on optimizing a given utility function– Some solutions employ data admission

– Others consider the optimal placement of tasks on nodes

• The case where load patterns can be predicted has also been studied [Borealis @ ICDE 2005]

• QoS management [Wei@RTSS2006] to predict query workload

• Common ways to deal with an overload: Admission control or resource reservation techniques [Abeni & Buttazzo @ RTSS’04] can result in under-utilization

• QoS degradation is a reactive technique and occurs only after burst has occurred [Chen @ IPDPS’05]

Page 6: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 6

Our Problem• We focus on the problem of addressing bursts

of input data rate– Can we accommodate a burst without having to reserve

resources a priori?

– How should we react to a burst?

• Benefits:– Lost data units due to bursts are minimized– No QoS degradation or data admission– No under-utilization, dynamic reservation used

• Challenges:– Highly dynamic / unpredictable environment– Multiple limiting resource types– Plan must be applied in real-time for the burst

Page 7: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 7

Roadmap

• Motivation and Background• System Architecture• Burst Handling Mechanism

– Feasible Region– Pareto Points– Online burst management through rate

assignment

• Experimental Evaluation• Conclusion

Page 8: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 8

System Architecture

Overlay network consisted of processing nodes. Built over a DHT (currently, Pastry).

The physical (IP) network.

Application layer: Execution of stream processing applications.

Page 9: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems

Synergy Node Architecture• Replica Placement places

components• Load Balancing and Load

Prediction detect hot-spots• Migration Engine alleviates

hot-spots• Application Composition and

QoS Projection instantiate applications

• Execution Scheduler provides real-time execution

• Node Monitor measures processor and bandwidth

• Discovery locates streams and components

• Routing transfers streaming data

9

Page 10: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 10

Distributed Stream Processing Application

• A stream processing application is represented as a graph. The application is executed collaboratively by peers of the system that invoke the appropriate services.

• Multiple applications can run simultaneously sharing the system resources.

• A service can be instantiated on more than one nodes.

• A service instantiation on a node is a component.

Page 11: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 11

• Each component ci is characterized by its resource requirements:

• The selectivity of the component ci is given by:

• Each node n is characterized by the availability of its resources:

System Model

Page 12: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 12

Optimization Problem• Capacity Constraints:

• Flow Conservation Constraints:

• We need to come up with a plan that satisfies the above constraints and minimizes the likelihood of missing data due to bursts. The plan should determine the rate assignment of each component rci

Page 13: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 13

Feasible Region• Assume we have Q applications

• The state of the system at any given time can be described by a point in the Q-dimensional space

• The feasible region is the set of all points (application input rates combinations) that nodes in the given distributed stream processing system can accommodate without any data unit being dropped.

• The form of linear constraints suggest that in the general case of Q applications, the feasible region is a convex polytope.

Page 14: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 14

Feasible Region - Example

Page 15: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 15

• A point p1 dominates a point p2 when for each application q,

• If the current system state is p2, one can apply the rate allocations calculated for p1

• A Pareto point is not dominated by any other point in the feasible region

Dominance & Pareto Points

• Pareto points represent optimal solutions: There is no point that is “better” than a pareto point

Page 16: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 16

• If the input rate of application q increases by , the input rate of a component of q will increase by

• In order for a stream processing system to be able to sustain such an increase, the following must hold for each node:

Burst Handling

Initial resource requirements

Additional resource requirements due to single burst

Page 17: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 17

• Assuming simultaneous bursts, the capacity constraints for each node are:

• We assume each application has equal probability for a burst to appear.

• To minimize the amount of dropped data, we wish to maximize all :

• If the current input rates are represented by p, we need to configure the system for:

Optimization Objective

Page 18: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 18

Optimization Objective - Example

• To be able to accommodate the burst, the new point must be inside the feasible region

Page 19: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 19

Our System: BARRE• Our objective is to accommodate bursts in real-

time through rate reconfiguration• This problem is difficult to solve online• Instead, our solution:– Offline: Pre-calculates a small number of component

rate assignment plans during an offline phase (using the Feasible Region & Pareto Point machinery that we described)

– Online: • Monitors application incoming rates and resource

availability during runtime• When bursts occur, component rates are re-assigned,

based on the pre-calculated plans

Page 20: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 20

Feasible Region Determination

• We don’t compute the entire feasible region• Instead, we find a set of Pareto points and

use them to construct the appropriate component rate assignment on the event of a burst

• For those Pareto points we pre-calculate their optimal rate assignment

• These component rate assignments will be used to construct the appropriate component rate allocation on the event of a burst.

Page 21: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 21

Identifying Index Points - Example

• Step 1: Find the maximum possible rate for each application– Solve a max-flow problem, with the rates of

all but one applications set to 0.

Page 22: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 22

Identifying Index Points - Example

• Step 2: Find the mid-point of the resulting plane and see how far it can go– Solve max-flow by also constraining direction

Page 23: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 23

Identifying Index Points - Example

• Step 3: For each of the resulting sides, find each mid-point and repeat previous step– The upper left and lower right points are now

dominated by the newly found index points

Page 24: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 24

Identifying Index Points - Example• Step 4: This is the resulting approximation to the feasible region

– The mid-points of the sides cannot improve without breaking the capacity constraints

Page 25: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems

Online Phase• Upon the onset of a burst, we use the pre-

calculated Pareto Points to determine the appropriate component input rate allocation plan

• We may need to adjust the entire plan as a result of the burst

25

p’ = p +δ p2 dominates p, however,when there is a burst we mayhave to change plans to p3

p1

p

p3

p2

p’

Page 26: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 26

Experimental Setup• Implementation over our Synergy middleware• Used FreePastry for service discovery and

collection of statistics• 10 unique services in the system, 5 services on a

single node• Each application included 4 to 6 services• Each result is an average of 5 runs• Comparison with:

– A burst-unaware method– Static Reservation of 20% of node resources– Simple Dynamic Adaptation, from previous work– A combination of the above

Page 27: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 27

Index Point Pre-Calculation

Index Point (Pareto Point) Database calculation time, this is why index points need to be pre-calculated.

Page 28: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 28

BARRE Operation

BARRE avoids missing data units during the burst and minimizes lost data units. Also, resulting plan is “safer”, so it prevents future drops

Page 29: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 29

Missed Data Units

BARRE results in fewer data units dropped. It can sustain up to about 80% (vs. about 20%) application rate increase without dropping any data unit.

Page 30: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 30

Data Units Delivered On Time

BARRE achieves a high percentage of data unit delivery.

Page 31: Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki

Accommodating Bursts in Distributed Stream Processing Systems 31

Conclusions

• We proposed BARRE, a dynamic reservation scheme to address bursts in a dynamic stream processing system.

• Reservation is based on application needs and readjusted according to system dynamics

• BARRE utilizes multiple nodes for each processing component, splitting the input rate among them whenever needed

Page 32: Accommodating Bursts in Distributed Stream Processing Systems

Accommodating Bursts in Distributed Stream

Processing Systems

Yannis Drougas, [email protected]

Vana Kalogeraki, [email protected]