
Load Management and High Availability in Borealis
Magdalena Balazinska, Jeong-Hyon Hwang, and the Borealis team
MIT, Brown University, and Brandeis University

Borealis is a distributed stream processing system (DSPS) based on Aurora and Medusa.

Contract-Based Load Management

Goals:
• Manage load through collaborations between autonomous participants
• Ensure an acceptable allocation, where each node's load is below its threshold

Challenges: incentives, efficiency, and customization

Approach:
1 - Offline, participants negotiate and establish bilateral contracts that:
• Fix or tightly bound the price per unit of load (e.g., a contract specifying that A will pay C $p per unit of load, or bounding the price to a small range [p, p+e])
• Are private and customizable (e.g., performance and availability guarantees, SLAs)
2 - At runtime, load moves only between participants that have a contract. Movements are based on marginal costs:
• Each participant has a private convex cost function mapping offered load (msgs/sec) to total cost (delay, $)
• Load moves when it's cheaper to pay a partner than to process locally: task t moves from A to B if the unit marginal cost of t is above p at A and below p at B (see the sketch below)

Properties:
• Simple, efficient, and low overhead (provably small bounds)
• Provable incentives to participate in the mechanism
• Experimental result: a small number of contracts and small price ranges suffice to achieve an acceptable allocation

[Diagram: participants A, B, B', and C linked by contracts at price p, a bounded range [p, p+e], and 0.8p; a convex cost function plotting total cost (delay, $) against offered load (msgs/sec); the marginal cost MC(t) of an arbitrary load(t) evaluated at A and at B]
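The movement rule fits in a few lines of code. Below is a minimal sketch, assuming a quadratic (hence convex) cost function; `marginal_cost` and `should_move_task` are hypothetical names chosen for illustration, not Borealis APIs.

```python
# Minimal sketch of the contract-based movement rule, assuming a
# quadratic (hence convex) cost function; all names and constants here
# are illustrative, not part of Borealis.

def marginal_cost(load, alpha=0.01):
    """Unit marginal cost d(cost)/d(load) for cost(load) = alpha * load**2."""
    return 2 * alpha * load

def should_move_task(task_load, load_at_a, load_at_b, p):
    """Task t moves from A to B iff t's unit marginal cost is above the
    contract price p at A and below p at B (after B absorbs the task)."""
    mc_at_a = marginal_cost(load_at_a)               # unit MC of t at A
    mc_at_b = marginal_cost(load_at_b + task_load)   # unit MC of t at B
    return mc_at_a > p and mc_at_b < p

# Example: an overloaded A and a lightly loaded B with a contract at p = 1.0.
print(should_move_task(task_load=10, load_at_a=80, load_at_b=20, p=1.0))  # True
```

Because each cost function is convex, the marginal cost rises with load, so load keeps flowing from expensive to cheap nodes until no task satisfies the rule, which is where the allocation stabilizes.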

HA Semantics and Algorithms

Goal: Streaming applications can tolerate different types of failure recovery:
• Gap recovery: may lose tuples
• Rollback recovery: produces duplicates but does not lose tuples
• Precise recovery: takes over precisely from the point of failure

Challenges: operator and processing non-determinism. Operators fall into classes with different recovery semantics:
• Repeatable: Filter, Map, Join
• Convergent: BSort, Resample, Aggregate
• Deterministic: Union, operators with timeouts

Approaches (a primary B with a secondary replica B', between upstream node A and downstream node C):
• Upstream Backup (ACK, trim): lowest runtime overhead (see the sketch below)
• Active Standby (replay): shortest recovery time
• Passive Standby (ACK, trim, checkpoint): most suitable for precise recovery

[Diagrams: for each approach, primary B, secondary B', upstream A, and downstream C, with arrows for the ACK, trim, replay, and checkpoint messages]
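To make the upstream-backup tradeoff concrete, here is a minimal sketch of an upstream output log that is trimmed by downstream ACKs; the class and method names are assumptions for illustration, not the Borealis implementation.

```python
# Minimal sketch of upstream backup: the upstream node keeps sent tuples
# in an output log, trims the log only when downstream acknowledges them,
# and replays the log to a new replica after a downstream failure.

from collections import deque

class UpstreamOutputLog:
    def __init__(self):
        self.log = deque()       # (seq, tuple) pairs awaiting acknowledgment
        self.next_seq = 0

    def send(self, tup):
        """Record every output tuple before (logically) sending it."""
        self.log.append((self.next_seq, tup))
        self.next_seq += 1

    def ack(self, seq):
        """Downstream ACKs everything up to seq: trim the log prefix."""
        while self.log and self.log[0][0] <= seq:
            self.log.popleft()

    def replay(self):
        """On downstream failure, re-send all unacknowledged tuples."""
        return list(self.log)
```

Replay after a failure re-sends every unacknowledged tuple, so this scheme gives rollback recovery: no losses, but possible duplicates unless they are eliminated downstream. The only runtime cost is logging and occasional ACKs, which is why upstream backup has the lowest runtime overhead of the three approaches.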

Network Partitions

Goal: Handle network partitions in a distributed stream processing system

Challenges:
• Maximize availability
• Minimize reprocessing
• Maintain consistency

Approach: Favor availability, and use updates to achieve consistency:
• Use connection points to create replicas and stream versions
• Downstream nodes monitor upstream nodes, reconnect to an available upstream replica, and continue processing with minimal disruptions (see the sketch below)
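A minimal sketch of a downstream node's failover logic under this approach: monitor the current upstream replica and, if it becomes unreachable, reconnect to another replica of the same named stream, resume from the last position read, and bump the output stream version. The `is_reachable` and `subscribe` calls on the replica objects are illustrative assumptions, not the Borealis API.

```python
# Minimal sketch of downstream failover across upstream replicas of the
# same named stream; favors availability over waiting for the partition
# to heal. All names here are illustrative.

class DownstreamNode:
    def __init__(self, stream_name, replicas):
        self.stream_name = stream_name
        self.replicas = replicas     # upstream replicas publishing the stream
        self.current = replicas[0]
        self.position = 0            # last position read on the input stream
        self.output_version = 0

    def on_tuple(self, tup, position):
        self.position = position     # track progress for possible failover
        # ... process tup ...

    def check_upstream(self):
        if self.current.is_reachable():
            return
        # Favor availability: switch to any reachable replica of the stream.
        for replica in self.replicas:
            if replica is not self.current and replica.is_reachable():
                self.current = replica
                # Same stream name, different version; resume from the same
                # point so processing continues with minimal disruption.
                replica.subscribe(self.stream_name, start=self.position)
                self.output_version += 1  # new version of the output stream
                return
        # No replica reachable: keep processing available inputs and retry.
```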

Load Management Demonstration Setup

All nodes process a network monitoring query over real traces of connection summaries. The query counts the connections established by each IP over 60 sec and the number of distinct ports to which each IP connected (per-window logic sketched below):
• Connection information → Group by IP, count over 60 s → Filter > 100 → IPs that establish many connections
• Connection information → Group by IP, count distinct ports over 60 s → Filter > 10 → IPs that connect over many ports
• Group by IP prefix, sum over 60 s → Filter > 100 → clusters of IPs that establish many connections

1) Three nodes (A, B, C) with identical contracts at price p and an uneven initial load distribution reach an acceptable allocation
2) As node A becomes overloaded, it sheds load to B and then to C until the system reaches an acceptable allocation
3) Load increases at node B, causing system overload
4) Node D joins the system. Load flows from node B to C and from C to D until the system reaches an acceptable allocation

[Diagrams: nodes A, B, C, and later D, with contracts at prices p and 0.8p, annotated with the allocation at each step]
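A minimal sketch of one 60-second window of this monitoring query, over an in-memory batch of records. The thresholds (> 100 connections, > 10 distinct ports, > 100 per prefix) follow the poster; the function name, the (src_ip, dst_port) record layout, and the choice of /24-style prefix are illustrative assumptions.

```python
# Minimal sketch of one 60-second window of the demo's monitoring query.

from collections import defaultdict

def heavy_hitters(window):
    """window: list of (src_ip, dst_port) connection summaries."""
    conn_count = defaultdict(int)   # connections per source IP
    ports = defaultdict(set)        # distinct destination ports per source IP
    for src_ip, dst_port in window:
        conn_count[src_ip] += 1
        ports[src_ip].add(dst_port)
    many_conns = {ip for ip, n in conn_count.items() if n > 100}
    many_ports = {ip for ip, ps in ports.items() if len(ps) > 10}
    # Group by IP prefix (here, the first three octets) and sum the counts
    # to find clusters of IPs that establish many connections.
    prefix_sum = defaultdict(int)
    for ip, n in conn_count.items():
        prefix_sum[ip.rsplit('.', 1)[0]] += n
    clusters = {pfx for pfx, n in prefix_sum.items() if n > 100}
    return many_conns, many_ports, clusters
```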

High Availability Demonstration Setup

Identical queries traverse nodes that use different high availability approaches: Passive Standby, Active Standby, Upstream Backup, and Upstream Backup with Duplicate Elimination (sketched below). Each primary has a statically assigned secondary.

1) The four primaries, B0, C0, D0, and E0, run on one laptop
2) All other nodes run on the other laptop
3) We compare the runtime overhead of the approaches
4) We kill all primaries at the same time
5) We compare the recovery time and the effects on tuple delay and duplication

Results:
• Active standby has the highest runtime overhead
• Upstream backup has the highest overhead during recovery
• Passive standby adds the most end-to-end delay

[Diagram: node A and the primary/secondary pairs B0/B1, C0/C1, D0/D1, and E0/E1, one pair per approach. Plots: tuples received, end-to-end delay, and duplicate tuples over time, with the failure marked, for Passive Standby, Active Standby, Upstream Backup, and Upstream Backup without duplicates]
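Because upstream backup replays unacknowledged tuples after a failure, it can re-deliver tuples it already sent; the fourth configuration adds a downstream filter to remove them. A minimal sketch, assuming tuples carry monotonically increasing sequence numbers that survive replay; that tagging scheme is an assumption of this sketch, not necessarily the exact Borealis mechanism.

```python
# Minimal sketch of duplicate elimination after upstream-backup replay:
# drop any tuple whose sequence number was already delivered downstream.

class DuplicateEliminator:
    def __init__(self):
        self.last_seen = -1          # highest sequence number delivered so far

    def process(self, seq, tup):
        """Return the tuple if new, or None if it is a replayed duplicate."""
        if seq <= self.last_seen:
            return None              # duplicate from replay: drop it
        self.last_seen = seq
        return tup
```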

Network Partition Demonstration Setup

1) The initial query distribution crosses computer boundaries [diagram: nodes A, B, C, and replica R spread across Laptop 1 and Laptop 2]
2) We unplug the cable connecting the laptops
3) Node C detects that node B has become unreachable
4) Node C identifies node R as a reachable alternate replica: R's output stream has the same name as B's but a different version
5) Node C connects to node R and continues processing from the same point on the stream
6) Node C changes the version of its output stream
7) When the partition heals, node C remains connected to R and continues processing uninterrupted

Results:
• End-to-end tuple delay increases while C detects the network partition and re-connects to R
• No duplications and no losses after the network partition (checked below from the sequence numbers of received tuples)

[Plots: end-to-end tuple delay, and the sequence numbers of received tuples over time, distinguishing tuples received through B from tuples received through R]
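The "no duplications and no losses" claim can be read directly off the sequence numbers in the plot: the numbers received through R continue exactly where the numbers received through B stopped. A minimal sketch of that check, with an illustrative function name and hand-picked example values:

```python
# Minimal sketch of the demo's correctness check: combining the sequence
# numbers received through B (before the partition) and through R (after),
# every number in the range must appear exactly once.

def no_loss_no_duplication(seqs_via_b, seqs_via_r):
    received = sorted(seqs_via_b + seqs_via_r)
    expected = list(range(received[0], received[-1] + 1))
    return received == expected      # each sequence number exactly once

print(no_loss_no_duplication([0, 1, 2, 3], [4, 5, 6]))  # True
```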