A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in Enterprise Networks
Danny Bickson, Ezra N. Hoch, Nir Naaman and Yoav Tock
IBM Haifa Research Lab, Israel
Outline
Motivation
The channelization problem
Our hybrid approach
Experimental results
Conclusions
Motivation: large-scale publish-subscribe applications
Large number of information flows (topics) and subscribers
Each flow must be delivered to the subset of interested subscribers
Example: financial market data dissemination
  The publisher divides the data feed into a large number of information flows (~100K), e.g. stock symbols, futures, commodities
  Many stand-alone subscribers (~1K)
  Subscribers display interest heterogeneity: they are interested in different yet overlapping subsets of the topics
  Any single topic may be delivered to a large number of subscribers (hot / cold topics)
[Figure: system diagram with the labels Data Vendor, WAN, Publisher, Enterprise LAN, Subscribers, and multiple information flows (topics)]
Common approaches
Use unicast (point-to-point) connections
  Limitations: poor utilization of network resources (duplicate transmissions)
Use broadcast (a single multicast channel)
  Limitations: receivers must filter unwanted content
Use multicast to transmit data
  Topics are mapped into multicast groups. Each user joins the groups that cover their topic interest.
  Reduces receiver filtering
  Limitations: limited number of multicast addresses
    Network element state problem
    Receiver resources (NICs)
Our novel contribution
Create a hybrid approach that combines both multicast and unicast
  Flexible allocation of transmissions
  Topics with high interest enjoy the efficiency of multicast
  Topics with low interest are transmitted in unicast
Formalize as an optimization problem
Propose a two-step alternating method for computing the resource allocation
The Channelization Problem
n flows, with flow rates λ
k multicast groups
m users
Interest matrix W
The task: find mapping matrices X, Y that minimize the communication cost
  The cost of transmission: takes into account transmission to multiple groups
  The cost of reception: minimize excess filtering
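The two cost terms can be illustrated with a minimal Python sketch. This is one plausible reading under stated assumptions, not the authors' exact cost function: it assumes X[f][g] = 1 when flow f is sent on group g, Y[u][g] = 1 when user u joins group g, W[u][f] = 1 when user u wants flow f, and rates[f] holds λ for flow f. The function names and the numbers are hypothetical.

```python
def channelization_cost(rates, X, Y):
    """Return (transmission, reception) rate totals for a multicast-only mapping."""
    n, k, m = len(X), len(X[0]), len(Y)
    # Total rate carried by each multicast group.
    group_rate = [sum(rates[f] * X[f][g] for f in range(n)) for g in range(k)]
    # A flow mapped to several groups is transmitted once per group.
    transmission = sum(group_rate)
    # A user receives everything on each group it joins, wanted or not.
    reception = sum(Y[u][g] * group_rate[g] for u in range(m) for g in range(k))
    return transmission, reception


def wanted_rate(rates, W):
    """Total rate the users actually asked for."""
    return sum(rates[f] * W[u][f] for u in range(len(W)) for f in range(len(rates)))


# Hypothetical example: 3 flows, 2 users, 2 groups.
rates = [10, 5, 1]
W = [[1, 1, 0], [0, 1, 1]]        # user 0 wants flows 0,1; user 1 wants flows 1,2
X = [[1, 0], [1, 0], [0, 1]]      # flows 0,1 on group 0; flow 2 on group 1
Y = [[1, 0], [1, 1]]              # user 0 joins group 0; user 1 joins both groups
tx, rx = channelization_cost(rates, X, Y)
excess = rx - wanted_rate(rates, W)   # rate that receivers must filter out
print(tx, rx, excess)  # 16 31 10
```

In this example user 1 joins group 0 only to get flow 1, and so also receives flow 0 at rate 10; that unwanted traffic is the excess filtering the reception term penalizes.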
The Hybrid Channelization Problem
[Figure: flows F1 ... Fn, multicast groups G1 ... Gk and users U1 ... Um; the interest matrix W is extracted from the users' flow interests, e.g. {F1, F2}, {F1, F2, F8}, {F3, F4, F6}, {F1, Fn}]
X – flow to group map
Y – user subscription map
T – unicast transmission map
The Hybrid Channelization Problem
Modified cost function, with three terms:
  Cost of multicast reception
  Cost of multicast transmission
  Cost of unicast reception & transmission
Problem objective: find X, Y and T that minimize this cost
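As a rough illustration of how a unicast map changes the three terms, the sketch below extends the multicast-only reading of the cost with T. It is an assumption-laden example, not the formula from the slides: it assumes T[u][f] = 1 when flow f is sent to user u by unicast, and it leaves the relative weighting of the three terms to the caller. All names and numbers are hypothetical.

```python
def hybrid_cost_terms(rates, X, Y, T):
    """Return (multicast_tx, multicast_rx, unicast) rate totals."""
    n, k, m = len(X), len(X[0]), len(Y)
    group_rate = [sum(rates[f] * X[f][g] for f in range(n)) for g in range(k)]
    multicast_tx = sum(group_rate)
    multicast_rx = sum(Y[u][g] * group_rate[g] for u in range(m) for g in range(k))
    # Each unicast pair carries the flow's rate once, from sender to one receiver.
    unicast = sum(rates[f] * T[u][f] for u in range(m) for f in range(n))
    return multicast_tx, multicast_rx, unicast


def covers_interest(W, X, Y, T):
    """True if every wanted (user, flow) pair is served by unicast or a joined group."""
    for u, row in enumerate(W):
        for f, wanted in enumerate(row):
            if not wanted:
                continue
            via_group = any(X[f][g] and Y[u][g] for g in range(len(X[0])))
            if not (T[u][f] or via_group):
                return False
    return True


# Hypothetical example: user 1 gets flow 1 by unicast and leaves group 0.
rates = [10, 5, 1]
W = [[1, 1, 0], [0, 1, 1]]
X = [[1, 0], [1, 0], [0, 1]]
Y = [[1, 0], [0, 1]]
T = [[0, 0, 0], [0, 1, 0]]
print(covers_interest(W, X, Y, T))        # True
print(hybrid_cost_terms(rates, X, Y, T))  # (16, 16, 5)
```

Compared with serving the same interests by multicast alone (where user 1 would also have to join group 0), the multicast reception total drops from 31 to 16 at the price of 5 units of unicast traffic.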
Proposed Solution
Unfortunately, the hybrid problem is NP-hard
We propose a two-step heuristic solution
  First step: solve the channelization problem (multicast mapping)
  Second step:
    Choose flow-user pairs for unicast
    Remove redundant assignments from the multicast mapping
    Recalculate the cost
  Iterate until convergence, or until the unicast BW limit is exceeded
First step: channelization problem solution
We have experimented with the following algorithms
K-Means (2005) performs best
K-Means Mapping Algorithm
Input: interest matrix, topic rate vector
Basic insight
  Put “similar” topics in the same group
  “Similar” topics have a similar audience, which causes less filtering
  Take the rate into account
Iterative Clustering Algorithm (K-means)
  Init: topics are assigned into a fixed number of groups
  Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
  Cost: after each epoch, compute the total filtering cost
  Stop: cost doesn’t improve | time elapsed | max # iter.
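The Init / Move / Cost / Stop loop above can be sketched in Python as follows. This is a simplified illustration, not the K-Means (2005) algorithm itself: it assumes each user joins every group that carries at least one topic it wants, uses the total received rate as the cost, and recomputes that cost from scratch for every candidate move. The function names and the example data are hypothetical.

```python
def reception_cost(rates, W, assign, k):
    """Total rate received by all users; W[u][t] = 1 if user u wants topic t."""
    n = len(rates)
    group_rate = [0.0] * k
    for t in range(n):
        group_rate[assign[t]] += rates[t]
    cost = 0.0
    for row in W:
        joined = {assign[t] for t in range(n) if row[t]}
        cost += sum(group_rate[g] for g in joined)
    return cost


def cluster_topics(rates, W, k, max_epochs=20):
    n = len(rates)
    assign = [t % k for t in range(n)]           # Init: round-robin assignment
    best = reception_cost(rates, W, assign, k)
    for _ in range(max_epochs):                  # Stop: max number of epochs
        start = best
        for t in range(n):                       # Move: relocate one topic at a time
            for g in range(k):
                old = assign[t]
                if g == old:
                    continue
                assign[t] = g
                cost = reception_cost(rates, W, assign, k)
                if cost < best:
                    best = cost                  # keep the cheaper group
                else:
                    assign[t] = old              # undo the move
        if best >= start:                        # Stop: no improvement this epoch
            break
    return assign, best


# Hypothetical example: two users with disjoint interests, two groups.
rates = [10, 10, 1, 1]
W = [[1, 1, 0, 0], [0, 0, 1, 1]]
assign, cost = cluster_topics(rates, W, k=2)
print(assign, cost)  # [1, 1, 0, 0] 22.0
```

Topics with the same audience end up in the same group, so each user receives only what it asked for (20 + 2 = 22).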
[Figure: topics T1 to T9 arranged in groups, with topic T5 as a candidate for several groups (“?”); an interest matrix over users and topics with v / x entries, showing a user’s interest vector and a topic’s audience vector; rate vector R1, R2, ..., RK]
Second step: choosing user-flow pairs for unicast
Experimented with several heuristics
  Heavy users: all transmission to a specific heavy user is sent using unicast
  Lightweight flows: flows with low bandwidth are sent using unicast
  Greedy flows: move to unicast the flow which best minimizes the total cost
  Greedy users: move to unicast the user which best minimizes the total cost
An additional heuristic, greedy user-flow pairs: move to unicast the user-flow pair which best minimizes the total cost. Very slow, with impractical run-time.
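A minimal Python sketch of the greedy-flows idea is given below. It rests on several assumptions that are not from the slides: a flow moved to unicast is sent individually to all of its subscribers, unicast traffic is counted once at the sender and once at the receiver, users join only the groups they still need, and `unicast_limit` caps the sender's unicast rate. All names and numbers are hypothetical.

```python
def total_cost(rates, W, assign, unicast):
    """Return (total cost, unicast sender rate).

    assign[f] is the multicast group of flow f; unicast is the set of flows
    sent point-to-point instead.
    """
    n = len(rates)
    group_rate = [0.0] * (max(assign) + 1)
    for f in range(n):
        if f not in unicast:
            group_rate[assign[f]] += rates[f]
    tx = sum(group_rate)
    rx = uni = 0.0
    for row in W:
        # A user joins only groups that still carry a flow it wants.
        joined = {assign[f] for f in range(n) if row[f] and f not in unicast}
        rx += sum(group_rate[g] for g in joined)
        uni += sum(rates[f] for f in unicast if row[f])
    return tx + rx + 2 * uni, uni


def greedy_flows(rates, W, assign, unicast_limit):
    unicast = set()
    cost, _ = total_cost(rates, W, assign, unicast)
    while True:
        best = None
        for f in range(len(rates)):
            if f in unicast:
                continue
            c, uni = total_cost(rates, W, assign, unicast | {f})
            if uni <= unicast_limit and c < cost and (best is None or c < best[0]):
                best = (c, f)
        if best is None:            # no move lowers the cost within the limit
            return unicast, cost
        cost = best[0]
        unicast.add(best[1])


# Hypothetical example: a hot flow and a cold flow share one multicast group.
rates = [10, 8]
W = [[1, 1], [1, 0], [1, 0]]      # only user 0 wants the cold flow 1
print(greedy_flows(rates, W, assign=[0, 0], unicast_limit=10))  # ({1}, 56.0)
```

Moving the cold flow to unicast lowers the cost from 72 to 56 because two users stop filtering it; moving the hot flow would raise the cost, so the loop stops after one step.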
Experimental results
Construction of user-interest matrix W
  Random, uniform
  Market distribution: based on a model of NYSE stock volume
  IBM WebSphere cell: a real system
Channelization algorithms
K-Means (2005) performs best
Takes rate into account
Gradient descent on the true cost function
Effect of the interest matrix on channelization performance
The interest and rate have a significant effect on channelization performance
Some interests have patterns that are easy to “channelize”
Interests with less entropy, more order, are easier
Hybrid Algorithm Heuristics
Market dist. - Greedy users
Can use more unicast BW
WebSphere dist. - Greedy flows
Doesn’t need more than 20% unicast BW
Unicast BW limit – algorithm will use optimal amount up to the limit
Hybrid using greedy flow – unicast / multicast tradeoff
Unicast BW allocation – exact amount of unicast BW used
Every interest and rate distribution has an optimal amount of unicast BW it can use
The hybrid approach improves upon both unicast-only and multicast-only
Conclusions
We have presented a novel hybrid approach for publish-subscribe
We have shown, using extensive and realistic simulation results, that our approach reduces consumed network and host resources
K-Means (2005) performs best for channelization among the algorithms we tested
Greedy hybrid heuristics performed best in our tests
The relative competitiveness of the greedy-flows and greedy-users heuristics depends on the structure of the interest matrix and rate
~ The End ~
Real Life Messaging Load Model
Model based on statistical analysis of NYSE daily trade data
  20K topics
  500 subscribers
  Avg. ~70 flows / user
  Min 15 flows / user
  Max 115 flows / user
  Avg. message fan-out ~10.1 clients
Multicast: a message is transmitted once
Unicast: transmitter data rate is x10 that of multicast!
Backup – Model
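The x10 figure follows from the fan-out: with unicast the transmitter sends one copy of each message per interested client, so its data rate relative to sending each message once equals the rate-weighted average fan-out (~10.1 in this model). A small Python sketch with hypothetical numbers shows the calculation, and why it can differ from the plain average audience size when popular topics also have higher rates.

```python
def unicast_to_multicast_ratio(rates, W):
    """Transmitter rate with unicast divided by the rate when each message is sent once.

    W[u][f] = 1 if user u subscribes to flow f; rates[f] is the flow's message rate.
    """
    m = len(W)
    audience = [sum(W[u][f] for u in range(m)) for f in range(len(rates))]
    # Ideal multicast: every flow with at least one subscriber is sent once.
    multicast = sum(r for r, a in zip(rates, audience) if a > 0)
    # Unicast: one copy per subscriber.
    unicast = sum(r * a for r, a in zip(rates, audience))
    return unicast / multicast


# Hypothetical example: a hot, fast flow (3 subscribers) and a cold, slow one (1).
rates = [9, 1]
W = [[1, 1], [1, 0], [1, 0]]
print(unicast_to_multicast_ratio(rates, W))  # 2.8
```

Here the plain average audience is (3 + 1) / 2 = 2, but the rate-weighted fan-out is 2.8 because most messages belong to the flow with the larger audience.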
Messaging Load Model – Based on Market Research
Financial front office
  Hundreds of users, requiring stock quotes and financial information from several markets
Topic space structure
  Within each market, symbol popularity and rate are exponentially distributed (NYSE market research)
  Several different markets, with avg. popularity and size proportional to ~1/m (assumption)
  20K flows, 10 markets, 500 users
User interest
  Each user selects some markets, then selects a percentage of the symbols from each chosen market, according to the said distributions
[Figure: NYSE daily trade. Number of trades (log scale) vs. symbol rank; series: daily trade on July 7 2004, exponential fit, daily trade min/max in July. Annotation: ~10% of symbols, ~55% of trade]
[Figure: Avg. message rate. Msg/sec vs. symbols, by market and rank; Market 1, Market 2, ..., Market 10 marked]
Mapping Algorithm
Input: interest matrix, topic rate vector
Basic insight
  Put “similar” topics in the same group
  “Similar” topics have a similar audience
  A group with a homogeneous audience causes less filtering
Take the rate into account
  The cost of putting two topics in the same group
  The cost of adding a new topic to a group of topics
[Figure: interest matrix over users and topics (v / x entries), highlighting topics with identical audience and topics with similar audience; example of putting topics 1 and 2 in the same group, with per-user filtering costs R2, 0, R1, 0 and a total filtering cost of R1 + R2]
Rk – the rate of topic k
Backup – Algorithm
Iterative Clustering Algorithm (K-means)
  Init: topics are assigned into a fixed number of groups
  Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
  Cost: after each epoch, compute the total filtering cost
  Stop: time elapsed | cost does not improve | exceeded max number of iterations
[Figure: the cost of adding candidate topic 5 to topic group {1, 2, 3}, computed per user from the group audience vector and the candidate topic’s audience vector; the per-user costs shown include R1+R2+R3, R5, R1+R2+R3+R5 and 0]
The best group for topic K is the group with the lowest cost
[Figure: topics T1 to T9 arranged in groups, with topic T5 as a candidate for several groups (“?”)]