

2007 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, Espoo, Finland (2007.06.18-2007.06.21)

Dynamic Filter Merging for Publish/Subscribe

Sasu Tarkoma

Telecommunications Software and Multimedia Laboratory and Helsinki Institute for Information Technology (HIIT)

Helsinki University of Technology, P.O. Box 5400, 02015 TKK, Finland
Email: [email protected]

1-4244-0992-6/07/$25.00 © 2007 IEEE

Abstract-Filter-based publish/subscribe and content-based routing have been proposed for flexible information dissemination in distributed environments. A content-based router is part of an overlay structure, in which each router forwards events to neighbouring routers and local clients based on their interests. Filter merging or summarization has been proposed as an optimization strategy in this environment. These techniques combine filters to reduce the number of propagated filters and thus the size of distributed state. In this paper, we present the algorithms for generic dynamic filter merging and discuss integration with routing tables. The algorithms are based on a formal framework of merging rules. Experimental results are examined and analyzed for both desktop systems and small devices. The results indicate that dynamic filter merging is feasible given that the workload is mergeable.

I. INTRODUCTION

Publish/subscribe systems are generally seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous many-to-many information dissemination [1]. In this paradigm, clients subscribe to events by specifying their interests using filters. Event producers publish events (also known as notifications), which are matched against active subscriptions. Event filtering or matching is used to deliver information to the proper set of subscribers [2], [3], [4], [5], [6].

A content-based router is part of an overlay structure, in which each router forwards events to neighbouring routers and local clients based on their interests. Filter merging or summarization has been proposed as an optimization strategy in this environment. These techniques combine filters to reduce the number of propagated filters and thus the size of distributed state.

In this paper, we present the algorithms for generic dynamic filter merging and discuss integration with routing tables. The algorithms are based on our earlier work on formal rules for merging [7], [8]. The data structures and the filter merging framework discussed in this paper are available as Open Source¹. The new contributions of this paper are the following.

1) We define algorithms for dynamic filter merging using a formal framework we have previously proposed [7].
2) We show that removing covered filters before filter merging is useful and present modular designs for integrating filter merging with generic routing tables.
3) We present experimental results for dynamic filter merging for both desktop systems and small devices. The results indicate that run-time merging with frequent filter additions and removals is feasible given that the workload exhibits reasonable mergeability characteristics.

¹ Available at http://hoslab.cs.helsinki.fi/homepages/fuego-core/

This paper is structured as follows. Section II gives a brief background on filter merging and Section III presents the preliminaries. Section IV presents a formal model for filter merging, Section V presents the algorithm for dynamic filter merging and discusses integration with a routing table. Section VI examines experimental results. Finally, we present the conclusions in Section VII.

II. BACKGROUND

Let us assume a multi-dimensional space that is used to define all content in a publish/subscribe system. A filter is a subspace of this content space. Filter merging is a technique to find the minimum number of filters that represent a set of subscriptions defined in the content space. Filter merging approaches this by fusing and combining the filters using logical rules. Filter covering is a related technique, which is used to remove filters that are covered by other more general filters. Merging and covering are needed to reduce processing power and memory requirements both on client devices and on event routers. These techniques are typically general and may be applied to subscriptions, advertisements, and other information represented using filters.

A false negative is an event that was not matched and delivered when it should have been. Similarly, a false positive is a message that was matched, but should not have been. In publish/subscribe, false negatives should never occur and they indicate a serious error in the system. False positives may occur and it is possible to balance between efficiency and accuracy.

A filter-merging-based routing mechanism was presented in the Rebeca distributed event system [3]. The mechanism merges conjunctive filters using perfect merging rules that are predicate-specific. Merging was used only for simple predicates in the context of a stock application [9], [3]. The integration of the merging mechanism with a routing data structure was not elaborated and we are not aware of any generic solutions on this topic.

The optimal merging of filters and queries with constraints has been shown to be NP-complete [10] in the multicast environment. This work considered query merging for allocating query answers to multicast channels. Subscription partitioning and routing in content-based systems have been investigated in [11], [12] using Bloom filters [13] and R-trees for efficiently summarizing subscriptions.

Bloom filters and additional predicate indices were used in a mechanism to summarize subscriptions [14], [15]. The subscription summarization is similar to filter merging, but it is not transparent, because routers need to be aware of the summarization mechanism. Filter merging, on the other hand, does not necessarily require changes to other routers. The benefit of the summarization mechanism is improved efficiency, since a custom matching algorithm is used that is based on Bloom filters and the additional indices.

A BDD-based merging algorithm has been proposed in [16]. The exact rules for dynamic filter merging were not elaborated in this work. The algorithm removes all subscriptions that are covered by a new merger. This requires that all routers are aware of the merging technique in order to support safe unsubscriptions.

III. PRELIMINARIES

In this section, we define filters and their basic properties. We focus on the covering relation between filters and data structures that manage filters based on this relation. For filter-based routing, we observe that it is sufficient to propagate only the minimal cover set of filters, which is typically the root set of a poset (partially ordered set) or forest data structure. We then consider how filter merging can be performed on this root set using perfect or imperfect techniques.

A filter $F$ is a stateless Boolean function that accepts a notification as an argument. A filter is said to match a notification $n$ if and only if $F(n) = \mathrm{true}$. The set of all notifications matched by a filter $F$ is denoted by $N(F)$. A filter $F_1$ is said to cover a filter $F_2$, denoted by $F_1 \sqsupseteq F_2$, if and only if all notifications that are matched by $F_2$ are also matched by $F_1$, i.e., $N(F_1) \supseteq N(F_2)$. The filter $F_1$ is equivalent to $F_2$, written $F_1 \equiv F_2$, if $F_1 \sqsupseteq F_2$ and $F_2 \sqsupseteq F_1$.

The $\sqsupseteq$ relation is reflexive and transitive and defines a partial order. Filter covering may be determined efficiently for simple predicate-based filters [17] and attribute filters with disjunctions [7]. Algorithms exist for arbitrary conjunctive filters [18], and also conjunctive tree queries [19].

To our knowledge, the Siena filters poset was the first data structure for filter-based routing with optimizations using the covering relation [4]. The two central records associated with each filter in the filters poset are the forwards and subscribers sets. The former stores the outbound interface identifiers where a filter has been sent, and the latter stores the identifiers of those interfaces that installed a filter. The former can be computed at runtime. The root set of the filters poset is the set of non-covered filters. This set and the set of direct successors to root nodes are used to forward filters to neighbouring routers (Theorem 1).

Theorem 1. Only elements in the root set or the direct successors of elements in the root set may have a non-empty forwards set [20], [8].

We have proposed an alternative structure called the poset-derived forest, which stores filters by their covering property with other filters² [20], [8]. The poset-derived forest keeps only a subset of the direct successor and predecessor relations, which results in a forest structure instead of a directed acyclic graph (DAG). The forest structure allows fast updates to the structure at the price of an incomplete picture of the direct successor and predecessor relations. The filters poset and the poset-derived forest compute the minimal cover set for the input set (Definition 2).

² Demonstration available at http://www.hiit.fi/fuego/fc/demos

Definition 2: A minimal cover set or a root set of the filters poset or poset-derived forest is a set $S$ such that there do not exist two distinct elements $a, b \in S$ for which $b \sqsupseteq a$.

The motivation for filter merging is that it can give improved performance for both local and distributed operation. The minimal cover set, found at the root of a cover-based routing structure, is a natural target for filter merging techniques. If each router is able to reduce the size of the propagated filter set, the total size of the distributed state becomes significantly smaller.

We assume that a merge($F_1$, $F_2$) procedure exists that merges input filters $F_1$ and $F_2$ and returns a single merged filter $F_M$ for which $F_M \sqsupseteq F_1$ and $F_M \sqsupseteq F_2$. A merge of two or more filters is called a merger. A merger is either perfect or imperfect. A perfect merger does not result in false positives or negatives, whereas an imperfect merger may result in false positives. More formally, a merger $M$ of filters $\{F_1, \ldots, F_n\}$ is perfect if and only if $N(M) = \bigcup_i N(F_i)$. Otherwise, the merger is called imperfect [3], [21]. Filter merging is typically performed by applying merging rules [3], [8]. The rules may be hardcoded into the merging framework, or specified in external profiles or ontologies.

As a simple example, consider the ranges [10, 20] and [15, 30]. The perfect merger of these ranges is the range [10, 30]. Now consider the ranges [10, 20] and [24, 30]. These two ranges cannot be merged perfectly, but an imperfect merger is possible which is, quite naturally, the range [10, 30].

IV. FILTER MERGING

In this section we present techniques for incorporating filter merging into content-based routers. The techniques are independent of the filtering language and routing data structure used, and do not depend on the mechanism that is used to merge two input filters.

A. Motivation

Filter merging is potentially useful for event routers, because it removes redundancy from filter sets. Figure 1 illustrates the benefits of filter merging for typed-tuple-based filters. First, we have a set of four filters. There are no covering relations in this set, so they need to be propagated by the current router. Given that only filters with the same structure are merged, the top filters may be merged by applying an
existence test. The bottom filters can be combined by fusing the two ranges. There are covering relations between the merged filters, and only the top filter is propagated. In this example, the size of the propagated set was reduced from four filters to one filter.

[Fig. 1. Example of filter merging. Panel I: there are no covering relations in this set of four filters. Panel II: there are covering relations in the merged set, and the more general filter is sent. Recoverable panel content: the two range filters "Val: X in [2,14]" and "Val: X in [9,30]" are fused into "Val: X in [2,30]".]

B. Merging and Routing Tables

We propose a merging extension to the generic content-based routing table. The desired characteristics for this merging mechanism are simplicity in implementation, efficiency, and minimal requirements for the underlying routing table. In addition to accuracy, we have additional requirements with filter merging:

* Merging must be transparent for applications and routers.
* Merging must maintain the set of inserted nodes. An insert of x may result in a new merged node merge(x, y), but after the deletion of x the resulting node must cover y.

Filter merging may be applied in different places in the event router. We distinguish between three different merging scenarios and techniques: local merging, root-merging, and aggregate merging. In the first scenario, filter merging is performed within a data structure. In the second scenario, filter merging is performed on the root sets of local filters, edge/border routers, and hierarchical routers. In the third scenario, filter merging is performed on the two first levels of a peer-to-peer data structure, such as the filters poset. In this paper, we focus on the last two cases.

Figure 2 presents three router configurations with filter merging and highlights the modular structure of content-based routers. Figure 2a illustrates using a poset-derived forest with filter merging. The main idea is to merge only the root set of the poset-derived forest, i.e., the minimal cover set. This configuration is intended for local clients and edge brokers.

Figure 2b presents a modular block suitable for hierarchical routing. In this case, a non-redundant forest [20] is needed, which ensures that all redundant filters are removed. This configuration supports k slave routers and n local clients.

Figure 2c presents the routing block for peer-to-peer acyclic-graph-based routing with merging support. In this case, the filters poset is used with a merging mechanism. The block supports k neighbours and can be combined with the design from Figure 2a to form a complete router that can support both local clients and neighbouring routers. In this paper, we focus on the dynamic merging of the root set, which applies to all three configurations. The aggregate merger is a modified root merger, which merges only filters destined to the same outbound destination [7], [8].

C. Rules for Merging

We define a set of merging rules for preserving equivalence for insertions and deletions between routing data structures and their counterparts that have been extended with filter merging. If deletions are allowed, filter merging requires that the merging system keeps track of both the mergers and their components.

We first present four examples of filter merging and then formalize the merging rules. Figure 3 presents an example of filter merging. The trace shows two mergers M(A, B) and M(D, E), which cover the root set. This means that only mergers are established by the forwarding engine.

[Fig. 3. Example of two mergers. Trace (partially recoverable): 1. Add A. 2. Add B, M(A,B). 3. Add C. ... The trace ends with the two mergers M(A,B) and M(D,E).]

Figure 4 illustrates a scenario in which, after the trace in Figure 3, E is removed. This results in the removal of the merger M(D, E). The merger must be removed, because its component is removed. As a result, C and D become uncovered. The state about the merger M(D, E) must be removed, and C and D are given to the forwarding engine.

[Fig. 4. Trace: 1. Remove E. 2. Remove M(D,E). 3. C, D become uncovered.]

Figure 5 shows a later development, in which a new element E is added and merged with C, resulting in a new merger M(C, E). Now, M(C, E) covers an existing merger M(A, B). Therefore, it must also cover the components of M(A, B) and any nodes that are covered by M(A, B). The covered merger is removed. The forwarding engine establishes M(C, E) and the only root set node not covered by the merger, D, as new state. In Siena semantics this means that
any established covered state is automatically removed by neighbouring routers.

[Fig. 2. Merging extension for routing tables. Panel a): root merger over a poset-derived forest, with n local clients. Panel b): root merger over a non-redundant forest, with a master router, k slave routers, and n local clients. Panel c): aggregate merger over the filters poset (DAG), with k neighbouring peers and an interface to local clients.]

[Fig. 5. Example of a new merger. Trace: 1. Add E. 2. Add M(C,E). 3. M(A,B) covered by merger and removed.]

In the last example in Figure 6, a new filter F is added and merged with an existing merger to form a new merger M(C, E, F). This merger covers D and is given to the forwarding engine.

[Fig. 6. Example of nested merging. Trace: 1. Add F. 2. Add M(C,E,F). 3. D covered by merger.]

D. Merging Rules for Preserving Equivalency

We define a set of rules to keep the merged routing state equivalent to the non-merged state under additions and deletions. Three cases need to be considered: (1) a merger covers a node, (2) a node covers a merger, and (3) a merger covers another merger.

Let $M$ denote the set of merged nodes/filters. Each element $x \in M$ is a result of a sequence of merge operations and has a corresponding set, denoted by $CO(x)$, which contains the components of the merger $x$. Further, let $CV(x)$ denote nodes that were removed due to covering by $x$ if the merger is placed in the data structure. The sets $CO$ and $CV$ are needed in order to maintain transparent operation.

Table I presents six rules for maintaining equivalence. The rules pertain to two arbitrary input filters $F_1$ and $F_2$. The rules are presented as condition-action pairs. We assume that elements in a conjunction are evaluated from left to right. These rules do not specify the semantics or performance of the merging mechanism. They specify the requirements for equivalence. A merging algorithm needs to follow these rules. For example, rule number five does not imply that re-merging of the removed merger should not be done.

We assume that the routing data structure provides two operations: add and del. The add operation inserts a filter into the structure, and del removes a filter. Note that the del in rule four is applied to a merger, and the del in rule five is applied to a component of a merger. The del operation for a merger is only invoked internally; the client of the system that sent the components of the merger has no knowledge of its existence. When a del is performed on a node that is part of a merger's $CV$ set, the deleted node is also removed from that set.

We also define two auxiliary operations addComponent and addComponents. The addComponent(S, F) operation takes a set S and a filter F as arguments and adds F to S if there does not exist a filter in S that covers F. Similarly, any filters in S covered by F are removed from S. The addComponents(S, P) operation is similar to addComponent, but the second argument, P, is a set.

The first rule says that when a non-merged node covers another non-merged node, the covered node is removed using the del operation. The second rule states that when a merger is covered by a non-merger, the merger is removed and all of its components are also removed (Rule 6). The third rule states that when a merger covers a non-merger, the covered node is removed and added to the merger's set of covered nodes. Rule 4 specifies that when a merger covers another merger, the covered merger is removed (Rule 6) and all components of the merger and nodes covered by the merger are added to the respective sets of the covering merger. Rule 5 says that when a component of a merger is removed, all the components and
covered nodes should be returned to the data structure. After this, the merger should be removed. The final rule 6 states that when a merger is removed, all its components must also be removed.

TABLE I
MERGING RULES

1. $F_1 \sqsupseteq F_2 \land F_1 \notin M \land F_2 \notin M \Rightarrow \mathrm{del}(F_2)$
2. $F_1 \sqsupseteq F_2 \land F_1 \notin M \land F_2 \in M \Rightarrow \mathrm{del}(F_2)$
3. $F_1 \sqsupseteq F_2 \land F_1 \in M \land F_2 \notin M \Rightarrow \mathrm{del}(F_2) \land \mathrm{addComponent}(CV(F_1), F_2)$
4. $F_1 \sqsupseteq F_2 \land F_1 \in M \land F_2 \in M \Rightarrow \mathrm{addComponents}(CV(F_1), CO(F_2)) \land \mathrm{addComponents}(CV(F_1), CV(F_2)) \land \mathrm{del}(F_2)$
5. $\mathrm{del}(F) \land (\exists x \in M : F \in CO(x)) \Rightarrow (\forall x \in CO(F) \setminus \{F\} : \mathrm{add}(x)) \land (\forall x \in CV(F) : \mathrm{add}(x))$
6. $\mathrm{del}(F) \land F \in M \Rightarrow (M' = M \setminus \{F\}) \land (\forall x \in CO(F) : \mathrm{del}(x)) \land (\forall x \in CV(F) : \mathrm{del}(x))$

V. ROOT-SET MERGING ALGORITHM

In this section we present the basic root-set merging algorithm that is the building block for the various merging mechanisms presented in Figure 2: local, hierarchical, and aggregate.

Algorithm 1 presents the basic merging algorithm that tests the mergeability of the given node n against the given set S. If a merge is possible, the mergeability rules are applied and the root set is scanned for covered nodes. The algorithm maintains the $CV$ set that contains the direct successors of a merged node. This set is convenient when deleting mergers. Algorithm 1 is linear in the size of the input set $S$, the number of root nodes, and the size of the merger set $M$. Since $S$ is either $R$ or $M$, and $|R| \geq |M|$, we have $O(|R|)$ time complexity.

Algorithm 1 The merge-set algorithm.

MERGE-SET(S, R, n)
    ▷ Scan set for mergeable nodes
    for all $x \in S$ do
        if MERGEABLE(x, n) then
            $m \leftarrow$ merge(x, n)
            ADDCOMPONENTS(CO(m), CO(x) $\cup$ {n, x})
            ADDCOMPONENTS(CV(m), CV(x))
            CO(x) $\leftarrow \emptyset$
            CV(x) $\leftarrow \emptyset$
            $S \leftarrow S \setminus \{x\}$
            ▷ Scan merged set for cover
            for all $z \in M$ do
                if $m \sqsupseteq z$ then
                    ADDCOMPONENTS(CV(m), CO(z))
                    ADDCOMPONENTS(CV(m), CV(z))
                    $M \leftarrow M \setminus \{z\}$
            ▷ Scan root set for cover
            for all $y \in R$ do
                if $y \notin CV(m) \land y \notin CO(m)$ and $m \sqsupseteq y$ then
                    ADDCOMPONENT(CV(m), y)
            $M \leftarrow M \cup \{m\}$
            return TRUE
    return FALSE

Algorithm 2 gives the algorithm for the addition of a new node n. When many nodes are added to the root set after a del, this may be invoked sequentially. The algorithm first scans the merger set $M$ for covering nodes. If one or more are found, it is enough to update the $CV$ sets of the covering mergers by adding the new node to them. If there are no covering mergers, the algorithm invokes merge-set first with the set of mergers $M$ and, if that is not successful, with the set of non-covered and non-merged root nodes. We note that the algorithm may not find the best possible mergers, but this approximative approach is more favourable when support for frequent updates is required.

Algorithm 3 presents the del operation. When a root node is removed, the root-set-del is invoked first, and after that the actual del operation is performed for n. The root-set-del simply removes any mergers whose components are removed. A node is also removed from any $CV$ sets. Possible re-merging is performed when elements of the uncovered set, when non-empty, are added to the structure. Finally, the root set is scanned for nodes that are not covered by mergers and that have not yet been forwarded. Actual implementations also need to take into account any changes due to interface elimination and nodes being removed from the root set.

VI. EXPERIMENTATION WITH DYNAMIC ROOT MERGING

Figure 7 illustrates the structure of the filter merging benchmark. First, a number of filters (up to 500) are generated using a workload generator. Each filter has a unique interface. The filters are added to a poset-derived forest extended with the root merging algorithm. The merging algorithm maintains a merged root set when filters are added to and removed from the structure.

We use the simple perfect merging algorithm presented in [7], [8]³, which merges two input filters if it is possible. We note that the mechanism is not limited to this approach. Dynamic root set merging also generalizes to imperfect merging.

³ Source available at http://hoslab.cs.helsinki.fi/homepages/fuego-core/
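As a rough illustration of the two kinds of merger, the range example from Section III can be written as a minimal Python sketch. The names `Range`, `merge_perfect`, and `merge_imperfect` are hypothetical and are not taken from the framework of [7], [8]. The sketch also assumes a continuous attribute, so two adjacent integer ranges such as [10, 20] and [21, 30] are not recognized as perfectly mergeable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """A closed interval filter [lo, hi] over one numeric attribute (illustrative only)."""
    lo: float
    hi: float

    def matches(self, value: float) -> bool:
        return self.lo <= value <= self.hi

    def covers(self, other: "Range") -> bool:
        # F1 covers F2 iff every notification matched by F2 is matched by F1.
        return self.lo <= other.lo and other.hi <= self.hi


def merge_perfect(a: Range, b: Range) -> Optional[Range]:
    """Return a merger that matches exactly the union of a and b, or None."""
    if max(a.lo, b.lo) <= min(a.hi, b.hi):  # the ranges overlap or touch
        return Range(min(a.lo, b.lo), max(a.hi, b.hi))
    return None  # a gap between the ranges would introduce false positives


def merge_imperfect(a: Range, b: Range) -> Range:
    """Always return a covering range; values inside a gap become false positives."""
    return Range(min(a.lo, b.lo), max(a.hi, b.hi))
```

With these definitions, `merge_perfect(Range(10, 20), Range(15, 30))` yields `Range(10, 30)`, while `merge_perfect(Range(10, 20), Range(24, 30))` yields `None`. The imperfect merger of the second pair is again `Range(10, 30)`, which also matches values such as 22 that neither input matches.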

Page 6: [IEEE 2007 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks - Espoo, Finland (2007.06.18-2007.06.21)] 2007 IEEE International Symposium on a World

Algorithm 2 The root-set-add algorithm. scenario is the initial state for the add/remove scenario. Thenumber of filters to be added and removed in the add/remove

ROOT- SET-ADD (n) microbenchmark was 100.1 Let F be a Boolean flag2 F ~-- FALSE X1 Add or Add/Remove scenanro3 > Check if the node is covered by a merger Filters4 ForallxeCM Workload Forest Root Correctness

5 do generator Merger testing

6 if xlr n Notifications

7 then 2. Matching test8 ADDCOMPONENT(CV(X), n)9 F& TRUE Fig. 7. Add and add/remove scenario with the root merger algorithm.10 elseif -F A n Il x11 then remove merger x12 if F Correctness of operation is ensured by testing that the13 then return14 merged set does not result in false negatives and positives, and15 > Try to merge with merged set ensuring that the internal data structures used by the merging16 F <- MERGE-SET(M,R,n) algorithm are correct.17 if F Figure 8 presents the results for the variable number of19 then return attribute filters case. In this scenario, filter merging may be

20 > Try to merge with non-covered and non-merged nodes performed without significant overhead, because the root set is21 T - R \ (UxGM CV(x) U CO(x)) small due to covering and root filters have only a few attribute22 F MERGE-SET(T,R,n) filters. Merging is very useful in this case and a constant or

near constant root set size is achieved, whereas the non-mergedroot set size is linear to the number of input filters.

Algorithm 3 The root-set-del algorithm. Figure 9 presents the results for two static attribute filters.Filter merging is also beneficial in this case, but has more

ROOT-SET-DEL (n) overhead than for the previous microbenchmark. The merged1 iftheM:nCCO(z) filter set size has also a near constant size in this case. The

3 CO() {0} root set size is larger in this case compared to the previous4 CV(X) {0} case and filter merging cannot be performed as often due to5 M '-M \ {x} the static number of attribute filters.6 Forall xC M: n CV(x) Figure 10 shows the results for three static attribute filters.

8 CV(d ) CV(x) \{ In this case, the merging overhead is substantial and theroot set size is not reduced. This demonstrates the effectsof a non-mergeable workload. We observe that to preventunreasonable overhead to due ineffective merging attempts,

merging. The difference is that imperfect merging allows to the filter merging framework should be able to classify filtersbalance between size and accuracy, whereas perfect m into mergeable subsets. This can be realized in a number ofensures that there are no false positives. ways, for example, by identifying non-mergeable schemas (if

schemas are used) or by using external information pertainingThe filters were generated using the structure enforced by tohmereabilia schema. Each measurement was replicated 20 times and the to ergeabitypredicates were randomly selected form the set of predicates A Ex< > , [a, b] using a uniform distribution. The perments on a Small Devce

range for integer values was 100. We used the following We also experimented with dynamic filter merging on aequipment: an HP laptop with a 2 GHz Pentium III and 512MB small device, namely the Nokia Communicator 9500. Weof main memory, Windows XP, and Java JDK 1.4.2. The ported the desktop benchmark directly to the device's normaltwo benchmarked filter schemas were: a variable number of Java environment. The code was not optimized for the smallattribute filters (1-3), and a static number of attribute filters (2 device environment. The purpose of the experiment was toand 3). evaluate the feasibility of using the implemented filter merging

The two important benchmarks were the add scenario and system on small devices, for example, to realize filter-basedaddlremove scenario. The former consists of the creation of peer-to-peer systems.the forest data structure with the given set of input filters. We experimented with one and two static attribute filterThe latter consists of repeated insertions and deletions to the microbenchmarks. The maximum number of filters in thisdata structure. In the add/remove scenario a filter is removed experiment was 225 and each measurement was replicated 20and a new random filter with the same interface identifier is times. In the add/remove scenario 25 filters were added andcreated and added to the data structure. The state after the add removed.
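To make the root set maintenance of Algorithms 2 and 3 more concrete, the following is a minimal Python sketch, not the paper's Java implementation. It assumes one-dimensional filters given as closed integer intervals, for which a perfect merge exists exactly when two intervals overlap or are adjacent. The names `MergedRootSet`, `Merger`, `covers`, `try_merge` and `forwarded` are hypothetical; `co` and `cv` mirror the CO and CV sets. The sketch omits the branch that removes a merger covered by a new node, and it re-adds the remaining components of a dissolved merger one by one.

```python
from dataclasses import dataclass, field

def covers(a, b):
    """True if closed interval a = (lo, hi) contains interval b."""
    return a[0] <= b[0] and b[1] <= a[1]

def try_merge(a, b):
    """Perfect merge: the union of a and b if it is a single interval, else None."""
    if max(a[0], b[0]) <= min(a[1], b[1]) + 1:  # overlapping or adjacent integers
        return (min(a[0], b[0]), max(a[1], b[1]))
    return None

@dataclass
class Merger:
    filt: tuple                           # the merged filter
    co: set = field(default_factory=set)  # components merged into filt
    cv: set = field(default_factory=set)  # root nodes only covered by filt

class MergedRootSet:
    def __init__(self):
        self.roots = set()   # R: root filters as added by clients
        self.mergers = []    # M: current mergers

    def _hidden(self):
        """Roots that are represented by some merger (component or covered)."""
        hidden = set()
        for m in self.mergers:
            hidden |= m.co | m.cv
        return hidden

    def add(self, n):
        self.roots.add(n)
        # Step 1: check whether n is already covered by a merger.
        covered = False
        for m in self.mergers:
            if covers(m.filt, n):
                m.cv.add(n)
                covered = True
        if covered:
            return
        # Step 2: try to merge n into an existing merger.
        for m in self.mergers:
            merged = try_merge(m.filt, n)
            if merged is not None:
                m.filt = merged
                m.co.add(n)
                return
        # Step 3: try to merge n with a non-covered, non-merged root.
        for r in sorted(self.roots - self._hidden() - {n}):
            merged = try_merge(r, n)
            if merged is not None:
                self.mergers.append(Merger(merged, {r, n}))
                return

    def remove(self, n):
        self.roots.discard(n)
        uncovered = set()
        for m in list(self.mergers):
            if n in m.co:
                # A component disappeared: dissolve the merger.
                self.mergers.remove(m)
                uncovered |= (m.co | m.cv) - {n}
            else:
                m.cv.discard(n)
        for r in sorted(uncovered):  # possible re-merging
            self.add(r)

    def forwarded(self):
        """Filters a router would propagate: mergers plus unrepresented roots."""
        return sorted({m.filt for m in self.mergers} | (self.roots - self._hidden()))
```

For example, adding (0, 10), (5, 20), (30, 40) and (21, 25) leaves two forwarded filters, (0, 25) and (30, 40); removing (5, 20) afterwards dissolves the merger, and the three remaining filters are forwarded individually because none of them can be perfectly merged.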


[Figure: four panels (add scenario time, add scenario root set size, add/remove scenario total time (ms), add/remove scenario total ops) plotted against the number of filters (100 to 500) for the forest and the merged forest.]
Fig. 8. Add and add/remove scenario for a variable number of attribute filters.
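The workload behind measurements of this kind can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the benchmark code: the predicate set is reduced to `<`, `>` and a range predicate, a plain dictionary keyed by interface identifier stands in for the forest, and all function names are hypothetical. It shows uniform predicate selection over an integer range of 100 and the add/remove step, in which a filter is removed and a new random filter is created for the same interface identifier.

```python
import random

PREDICATES = ("<", ">", "range")  # assumed subset of the benchmark's predicate set

def random_attribute_filter(rng, value_range=100):
    """Pick a predicate uniformly and draw its operands from [0, value_range)."""
    op = rng.choice(PREDICATES)
    if op == "range":
        a, b = sorted(rng.randrange(value_range) for _ in range(2))
        return (op, a, b)
    return (op, rng.randrange(value_range))

def random_filter(rng, attributes, min_attrs, max_attrs):
    """A filter is a conjunction of attribute filters over distinct attributes."""
    count = rng.randint(min_attrs, max_attrs)
    return {name: random_attribute_filter(rng)
            for name in rng.sample(attributes, count)}

def add_scenario(rng, n_filters, attributes, min_attrs, max_attrs):
    """Build the initial structure: one random filter per interface identifier."""
    return {iface: random_filter(rng, attributes, min_attrs, max_attrs)
            for iface in range(n_filters)}

def add_remove_scenario(rng, table, rounds, attributes, min_attrs, max_attrs):
    """Repeatedly remove a filter and add a new one with the same interface id."""
    for _ in range(rounds):
        iface = rng.choice(list(table))
        del table[iface]
        table[iface] = random_filter(rng, attributes, min_attrs, max_attrs)
    return table
```

With `min_attrs=1, max_attrs=3` this corresponds to the variable schema, and with `min_attrs == max_attrs` to the static schemas. A seeded `random.Random` instance keeps the replicated measurements reproducible.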

[Figure: four panels (add scenario time, add scenario root set size, add/remove scenario total time, add/remove scenario total ops) plotted against the number of filters (100 to 500) for the forest and the merged forest.]
Fig. 9. Add and add/remove scenario for a static number of attribute filters (2).
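The correctness criterion used in these experiments, that the merged set introduces neither false negatives nor false positives, can be checked exhaustively when the value domain is small. The sketch below is a hypothetical illustration for one-dimensional interval filters over the integer range 0 to 99; the function names are assumptions and it is not the test harness used for the measurements.

```python
def matches(filt, value):
    """A filter is a closed interval (lo, hi)."""
    lo, hi = filt
    return lo <= value <= hi

def matched(filters, value):
    return any(matches(f, value) for f in filters)

def check_merge(original, merged, domain=range(100)):
    """Return (false_negatives, false_positives) of merged relative to original."""
    false_neg = [v for v in domain
                 if matched(original, v) and not matched(merged, v)]
    false_pos = [v for v in domain
                 if matched(merged, v) and not matched(original, v)]
    return false_neg, false_pos
```

A perfect merger yields two empty lists. An imperfect merger, such as replacing (0, 20) and (30, 40) with (0, 40), yields an empty false-negative list and the values 21 to 29 as false positives.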


[Figure: four panels (add scenario time, add scenario root set size, add/remove scenario total time, add/remove scenario total ops) plotted against the number of filters (100 to 500) for the forest and the merged forest.]
Fig. 10. Add and add/remove scenario for a static number of attribute filters (3).
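One way to avoid the ineffective merging attempts visible in this workload is to classify filters by schema and stop attempting merges for schemas that rarely merge. The following Python sketch is only one possible realization and is not taken from the paper: the class `MergeabilityTracker`, its thresholds, and the use of the sorted attribute names as the schema key are all assumptions made for illustration.

```python
from collections import defaultdict

class MergeabilityTracker:
    """Tracks merge success per schema and advises whether to keep trying."""

    def __init__(self, min_attempts=20, min_success_rate=0.1):
        self.min_attempts = min_attempts          # observations before judging
        self.min_success_rate = min_success_rate  # below this, skip merging
        self.stats = defaultdict(lambda: [0, 0])  # schema -> [attempts, successes]

    @staticmethod
    def schema_of(filt):
        """Assumed schema key: the sorted attribute names of the filter."""
        return tuple(sorted(filt))

    def record(self, filt, merged):
        entry = self.stats[self.schema_of(filt)]
        entry[0] += 1
        entry[1] += int(merged)

    def should_attempt(self, filt):
        attempts, successes = self.stats[self.schema_of(filt)]
        if attempts < self.min_attempts:
            return True  # not enough evidence yet
        return successes / attempts >= self.min_success_rate
```

A router could call `should_attempt` before invoking the merging step and `record` afterwards, so that filters of a schema that has proven non-mergeable are inserted into the structure without further merge attempts.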

We found the one attribute filter microbenchmark to be a very good scenario for merging, because merging is tested only in one dimension and it is thus probable. The merging mechanism has some overhead in the add scenario, but both merged and non-merged insertion times were approximately linear to the number of input filters. The size of the root set was very small for this experiment.

Figure 11 presents the results for the two attribute filters microbenchmark. The benefits of filter merging are visible also in this scenario. This was expected, because the filter structure is identical to the one used in the experiment of Figure 9. In this case the merging add has significantly more overhead than the non-merging add. Perfect merging can still be performed and the root set size stabilizes approximately at 30 filters. The add/remove scenario costs are similar for the two mechanisms with the merging system having more overhead.

[Figure: four panels (add scenario time, add scenario root set size, add/remove scenario total time, add/remove scenario total ops) plotted against the number of filters (0 to 250) for the forest and the merged forest.]
Fig. 11. Add and add/remove scenario for a static number of attribute filters (2) on the Communicator 9500.

VII. CONCLUSIONS

In this paper we presented algorithms for generic dynamic filter merging grounded on a formal filter merging framework. Filter merging is a technique to remove redundancy from filter sets. Our system assumes that covering relations are known or can be computed for filters. We showed that filter merging can be effectively performed on the minimal cover set returned by cover-based routing structures.

The presented mechanisms are useful for information processing applications, such as firewalls, auditing gateways, peer-to-peer systems, and sensor networks. The proposed framework is applicable for the merging of filters from local clients, slave routers in hierarchical operation, and neighbouring routers in peer-to-peer operation. The compositionality of routing and filtering blocks is important for scalability and extensibility. Filter merging may be separately applied to various components of a router, namely the part that manages local clients and the part that handles external traffic between neighbouring routers.

We presented performance results for dynamic merging with frequent filter additions and removals using current routing data structures. The results indicate that filter merging may be used to considerably reduce the processing overhead of neighbouring routers in a distributed environment given a mergeable workload. We presented performance results on a small device and the results suggest that dynamic merging can be performed also on resource limited systems. We observed that a non-mergeable workload presents considerable overhead to the system. This overhead can be mitigated by classifying filters by their expected mergeability property. Our future work includes investigation of run-time filter mergeability analysis.

REFERENCES

[1] P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, "The many faces of publish/subscribe," ACM Comput. Surv., vol. 35, no. 2, pp. 114-131, 2003.

[2] A. Carzaniga and A. L. Wolf, "Forwarding in a content-based network," in Proceedings of ACM SIGCOMM 2003, Karlsruhe, Germany, Aug. 2003, pp. 163-174.

[3] G. Mühl, "Large-scale content-based publish/subscribe systems," Ph.D. dissertation, Darmstadt University of Technology, September 2002. [Online]. Available: http://elib.tu-darmstadt.de/diss/000274/

[4] A. Carzaniga, J. Deng, and A. L. Wolf, "Fast forwarding for content-based networking," Department of Computer Science, University of Colorado, Tech. Rep. CU-CS-922-01, Nov. 2001. [Online]. Available: http://www.cs.colorado.edu/~carzanig/papers/index.html

[5] F. Cao and J. P. Singh, "Efficient event routing in content-based publish-subscribe service networks," in Proceedings of IEEE INFOCOM 2004. Hong Kong, China: IEEE, Mar. 2004.

[6] F. Fabret, H. A. Jacobsen, F. Llirbat, J. Pereira, K. Ross, and D. Shasha, "Filtering algorithms and implementation for very fast publish/subscribe," in Proceedings of the 20th Intl. Conference on Management of Data (SIGMOD 2001), T. Sellis and S. Mehrotra, Eds., Santa Barbara, CA, USA, 2001, pp. 115-126. [Online]. Available: http://www-caravel.inria.fr/LeSubscribe/sigmodOl.ps

[7] S. Tarkoma and J. Kangasharju, "Filter merging for efficient information dissemination," in CoopIS, 2005, pp. 274-291.

[8] S. Tarkoma, "Efficient content-based routing, mobility-aware topologies, and temporal subspace matching," Ph.D. dissertation, Department of Computer Science, University of Helsinki, 2006, available at ethesis.helsinki.fi.

[9] G. Mühl, L. Fiege, F. C. Gärtner, and A. P. Buchmann, "Evaluating advanced routing algorithms for content-based publish/subscribe systems," in The Tenth IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2002), A. Boukerche, S. K. Das, and S. Majumdar, Eds. Fort Worth, TX, USA: IEEE Press, October 2002, pp. 167-176.

[10] A. Crespo, O. Buyukkokten, and H. Garcia-Molina, "Query merging: Improving query subscription processing in a multicast environment," IEEE Trans. Knowl. Data Eng., vol. 15, no. 1, pp. 174-191, 2003.

[11] Y.-M. Wang, L. Qiu, D. Achlioptas, G. Das, P. Larson, and H. J. Wang, "Subscription partitioning and routing in content-based publish/subscribe networks," in Distributed algorithms, ser. Lecture Notes in Computer Science, D. Malkhi, Ed., vol. 2508/2002, Oct 2002.

[12] Y.-M. Wang, L. Qiu, C. Verbowski, D. Achlioptas, G. Das, and P. Larson, "Summary-based routing for content-based event distribution networks," SIGCOMM Comput. Commun. Rev., vol. 34, no. 5, pp. 59-74, 2004.

[13] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Commun. ACM, vol. 13, no. 7, pp. 422-426, 1970.

[14] P. Triantafillou and A. Economides, "Subscription summaries for scalability and efficiency in publish/subscribe systems," in Proceedings of the 1st International Workshop on Distributed Event-Based Systems (DEBS'02), J. Bacon, L. Fiege, R. Guerraoui, A. Jacobsen, and G. Mühl, Eds., 2002.

[15] P. Triantafillou and A. Economides, "Subscription summarization: A new paradigm for efficient publish/subscribe systems," in ICDCS. IEEE Computer Society, 2004, pp. 562-571.

[16] G. Li, S. Hou, and H.-A. Jacobsen, "A unified approach to routing, covering and merging in publish/subscribe systems based on modified binary decision diagrams," in ICDCS. IEEE Computer Society, 2005, pp. 447-457.

[17] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf, "Design and evaluation of a wide-area event notification service," ACM Transactions on Computer Systems, vol. 19, no. 3, pp. 332-383, Aug. 2001. [Online]. Available: http://www.cs.colorado.edu/~carzanig/papers/

[18] A. Kiani and N. Shiri, "Containment of conjunctive queries with arithmetic expressions," in CoopIS, 2005, pp. 439-452.

[19] C. Y. Chan, W. Fan, P. Felber, M. N. Garofalakis, and R. Rastogi, "Tree pattern aggregation for scalable XML data dissemination," in VLDB, 2002, pp. 826-837.

[20] S. Tarkoma and J. Kangasharju, "Optimizing Content-based Routers: Posets and Forests," The Journal of Distributed Computing, 2006, to appear.

[21] J. Antollini, M. Antollini, P. Guerrero, and M. Cilia, "Extending Rebeca to support concept-based addressing," in First Argentine Symposium on Information Systems (ASIS 2004), Sept. 2004.