TIPC as TML
draft-maloy-tipc-01.txt
Jon Maloy, Ericsson
Steven Blake, Modularnet
Maarten Koning, WindRiver
Jamal Hadi Salim, Znyx
Hormuzd Khosravi, Intel
IETF-61, Washington DC, Nov 2004
NOKIA RESEARCH CENTER / BOSTON
TIPC
A transport protocol for cluster environments
  Connectionless and connection-oriented; reliable or unreliable
  Reliable or unreliable multicast
  Usage not limited to the ForCES context
A framework for detecting, supervising and maintaining cluster topology
Available as a portable open source code package under the BSD licence
  12000 lines of C code, 112 kbyte Linux kernel module
  Runs on 4 operating systems so far, with more to come
Proven concept, used and deployed in several Ericsson products

ForCES Protocol Framework
[Figure: CE and FE protocol stacks exchanging ForCES Protocol Messages. Each side consists of PL (ForCES Protocol), TML, and Transport (IP, TCP, RapidIO, Ethernet…).]

TIPC as L2 TML
[Figure: CE and FE protocol stacks exchanging ForCES Protocol Messages. Each side consists of PL (ForCES Protocol), TIPC TML, and L2 Transport (RapidIO, Ethernet…).]

Interface Adaptation
[Figure: the same CE and FE stacks (PL (ForCES Protocol), TIPC TML, L2 Transport (RapidIO, Ethernet…)) exchanging ForCES Protocol Messages, with two boxes labelled Interface Adaptation.]

Fulfilling Requirements (1)
Reliability
  Reliable transport in all modes
  Can be made unreliable per socket/direction
Security
  Only secure within closed networks
  No explicit authentication/encryption support yet, but planned
  Not IP-based, so no router will forward TIPC messages
Congestion Control
  At three levels: connection/transport, signalling link and carrier level
  Will give feedback to the PL layer if a connection is broken or a message is rejected
Multicast/Broadcast
  Supported

Fulfilling Requirements (2)
Timeliness
  Immediate delivery (no Nagle algorithm)
  Inter-node delivery time on the order of 100 microseconds
HA Considerations
  L2 link failure detection and failover handled transparently for the user
  Connection aborted with an error code if no redundant carrier is available
  Peer node failure detected after 0.5-1.5 seconds
Encapsulation
  24 byte extra header
  40 extra for connectionless
Priorities
  Supports 4 message importance priorities, determining congestion levels and abort/rejection levels
  Are 8 levels really needed?

Connection Directly on TIPC
[Figure: an FE containing LFB 1, LFB 2 and an FE Object, and a CE containing FB X, FB Y and a CE Object, connected over TIPC.]

Connections via FE/CE Object
[Figure: the same FE (LFB 1, LFB 2, FE Object) and CE (FB X, FB Y, CE Object), connected over TIPC.]

Connection Usage
[Figure: FE (LFB 1, LFB 2, FE Object) and CE (FB X, FB Y, CE Object) connected over TIPC by two connections.]
Control Connection: high priority, reliable in both directions
Traffic Data Connection: low priority, reliable CE->FE, unreliable FE->CE

Functional Addressing: Unicast
Function Address
  Persistent, reusable 64 bit port identifier assigned by user
  Consists of type number and instance number
Function Address Sequence
  Sequence of function addresses with same type
[Figure: Server Process, Partition A calls bind(type = foo, lower=0, upper=99); Server Process, Partition B calls bind(type = foo, lower=100, upper=199); Client Process calls sendto(type = foo, instance = 33), and the message is addressed to foo,33.]
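
The unicast lookup on this slide can be illustrated with a small model. This is only a sketch, not TIPC code: the NameTable class, the owner labels and the "first matching sequence wins" rule are assumptions made for the example, and string type names stand in for the numeric types a real stack would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Binding:
    ftype: str   # function type, e.g. "foo"
    lower: int   # first instance number of the bound sequence
    upper: int   # last instance number of the bound sequence
    owner: str   # hypothetical label for the socket that made the binding


class NameTable:
    """Toy name table mapping function addresses to bound sockets."""

    def __init__(self) -> None:
        self._bindings: List[Binding] = []

    def bind(self, ftype: str, lower: int, upper: int, owner: str) -> None:
        if lower > upper:
            raise ValueError("lower must not exceed upper")
        self._bindings.append(Binding(ftype, lower, upper, owner))

    def lookup(self, ftype: str, instance: int) -> Optional[str]:
        """Return the owner of the first sequence that contains the instance."""
        for b in self._bindings:
            if b.ftype == ftype and b.lower <= instance <= b.upper:
                return b.owner
        return None


table = NameTable()
table.bind("foo", 0, 99, "partition-A")
table.bind("foo", 100, 199, "partition-B")

# sendto(type = foo, instance = 33): instance 33 lies inside 0..99.
destination = table.lookup("foo", 33)
```

In this model the client never names a server; it only names a function address, and the table decides which bound sequence covers it.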

Address Mapping - Unicast
[Figure: FE (FE Object, LFB 1 "Meter", instance 44) and CE (CE Object, FB X "RSVP", instance 77) connected over TIPC. Each TML API call maps to a TIPC API call:]
  tml_bind(RSVP,77) -> bind(RSVP,77,77)
  tml_bind(meter,44) -> bind(meter,44,44)

Connection Setup
[Figure: FE 17 (FE Object, LFB 1 "Meter", instance 44) and CE 8 (CE Object, FB X "RSVP", instance 77) connected over TIPC. TML API calls and the TIPC API calls they map to:]
  tml_bind(RSVP,77) -> bind(RSVP,77,77)
  tml_connect(RSVP,77, CEID=8) -> connect(RSVP,77,node=8)
If instance numbers are coordinated over the whole cluster, there is no need for LFBs to know the CEID
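
The mapping from TML calls to TIPC calls shown on this slide can be sketched as a thin adapter. The sketch below is an illustration under stated assumptions: RecordingTipc is a hypothetical stand-in that only records calls, the Python signatures are invented for the example, and passing no CEID is taken to mean "let the lookup cover the whole cluster".

```python
from typing import List, Optional, Tuple


class RecordingTipc:
    """Hypothetical stand-in for the TIPC API: records calls, sends nothing."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def bind(self, ftype: str, lower: int, upper: int) -> None:
        self.calls.append(("bind", ftype, lower, upper))

    def connect(self, ftype: str, instance: int,
                node: Optional[int] = None) -> None:
        self.calls.append(("connect", ftype, instance, node))


class Tml:
    """Adapter that translates TML calls into TIPC calls."""

    def __init__(self, tipc: RecordingTipc) -> None:
        self._tipc = tipc

    def tml_bind(self, ftype: str, instance: int) -> None:
        # A single instance is bound as a sequence of length one.
        self._tipc.bind(ftype, instance, instance)

    def tml_connect(self, ftype: str, instance: int,
                    ceid: Optional[int] = None) -> None:
        # The CEID, when given, restricts the lookup to that node.
        self._tipc.connect(ftype, instance, node=ceid)


tipc = RecordingTipc()
tml = Tml(tipc)
tml.tml_bind("RSVP", 77)             # CE side
tml.tml_connect("RSVP", 77, ceid=8)  # FE side, CE known
tml.tml_connect("RSVP", 77)          # FE side, instance numbers cluster-unique
```

The last call corresponds to the remark on the slide: when instance numbers are coordinated across the cluster, the caller does not have to supply a CEID.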

Functional Addressing: Multicast
Based on Function Address Sequences
  Any partition overlapping with the range used in the destination address will receive a copy of the message
  Client defines “multicast group” per call
[Figure: Server Process, Partition A calls bind(type = foo, lower=0, upper=99); Server Process, Partition B calls bind(type = foo, lower=100, upper=199); Client Process calls sendto(type = foo, lower = 33, upper = 133), and a copy addressed to foo,33,133 is shown towards each partition.]
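
The overlap rule for multicast can be written down as a short range test. This is a minimal sketch, not TIPC code: the tuple layout and owner labels are assumptions, and the model delivers at most one copy per owner, which the slide does not state.

```python
from typing import List, Tuple

# (type, lower, upper, owner label); the layout is an assumption for this sketch
Binding = Tuple[str, int, int, str]


def overlaps(lower_a: int, upper_a: int, lower_b: int, upper_b: int) -> bool:
    """True if the closed ranges [lower_a, upper_a] and [lower_b, upper_b] intersect."""
    return lower_a <= upper_b and lower_b <= upper_a


def multicast_receivers(bindings: List[Binding], ftype: str,
                        lower: int, upper: int) -> List[str]:
    """Owners whose bound sequence overlaps the destination range."""
    receivers: List[str] = []
    for b_type, b_lower, b_upper, owner in bindings:
        if b_type == ftype and overlaps(b_lower, b_upper, lower, upper):
            if owner not in receivers:  # one copy per owner in this model
                receivers.append(owner)
    return receivers


bindings = [
    ("foo", 0, 99, "partition-A"),
    ("foo", 100, 199, "partition-B"),
]

# sendto(type = foo, lower = 33, upper = 133) overlaps both bound ranges.
receivers = multicast_receivers(bindings, "foo", 33, 133)
```

Because the destination range is chosen per call, the same client could reach only Partition A by sending to, say, the range 0..50.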

Address Mapping - Multicast
[Figure: FE (FE Object, LFBs Meter 13 and Meter 44) and CE (CE Object, FB X "RSVP", instance 77) connected over TIPC. TML calls and the TIPC calls they map to:]
  tml_mcast(meter_mc,group=X) -> sendto(meter_mc,X,X)
  tml_join(meter_mc,X) -> bind(meter_mc,X,X)   (shown twice, presumably once per Meter LFB)

Questions???

Why TIPC in ForCES?
Congestion control at three levels
  Connection level, signalling link level and media level
  Based on 4 importance priorities
Simple to configure
  Each node needs to know its own identity, that is all
  Automatic neighbour detection using multicast/broadcast
Lightweight, Reactive Connections
  Immediate connection abortion at node/process failure or overload
Topology Subscription Service
  Functional and physical topology

Functional View
[Figure: functional blocks of TIPC, listed in the order they appear on the slide.]
Infiniband, Mirrored Memory, Ethernet, SCTP, UDP
Bearer Adapter API
Sequence/Retransmission Control
Packet Bundling, Congestion Control
Fragmentation/De-fragmentation
Reliable Multicast, Neighbour Detection, Link Establish/Supervision/Failover
Address Table Distribution
Connection Supervision, Route/Link Selection
Address Subscription, Address Resolution
User Adapter API
Socket API Adapter, Port API Adapter, Other API Adapters
Node Internal

Network Topology
[Figure: Zone <1> with Cluster <1.1> and Cluster <1.2>, and Zone <2> with Cluster <2.1>, joined via Internet/Intranet. Labelled nodes: Node <1.2.3> and Slave Node <2.1.3333>.]

Location Transparency
Location of server not known by client
  Lookup of physical destination performed on-the-fly
  Efficient, no secondary messaging involved
[Figure, three builds: Client Process calls sendto(type = foo, lower = 33, upper = 133), addressed to foo,33,133. Server Process, Partition A calls bind(type = foo, lower=0, upper=99) and Server Process, Partition B calls bind(type = foo, lower=100, upper=199). The first build labels Node <1.1.1>, the second adds Node <1.1.2>, and the third shows Node <1.1.1>, Node <1.1.2> and Node <1.1.3>.]

Address Binding
Many sockets may bind to same partition
  Closest-First or Round-Robin algorithm chosen by client
Same socket may bind to many partitions
Same socket may bind to different functions
[Figure, three builds; in each, Client Process calls sendto(type = foo, lower = 33, upper = 133), addressed to foo,33,133.
  Build 1: Server Process, Partition A and Server Process, Partition A’ both call bind(type = foo, lower=0, upper=99).
  Build 2: Server Process, Partition B calls bind(type = foo, lower=100, upper=199); Server Process, Partition A+B’ calls bind(type = foo, lower=0, upper=99) and bind(type=foo, lower=100, upper=199).
  Build 3: Server Process, Partition B calls bind(type = foo, lower=100, upper=199); Server Process, Partition A calls bind(type = foo, lower=0, upper=99) and bind(type=bar, lower=0, upper=999).]
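
When several sockets are bound to the same partition, the client-selected policy decides which one receives a message. The sketch below illustrates the two policies named on the slide. It rests on assumptions: "closest" is simplified to "on the sender's own node, otherwise the first candidate", and the node strings and socket labels are hypothetical.

```python
from typing import List, Tuple

# (node address as text, socket label); both are illustrative values
Candidate = Tuple[str, str]


def closest_first(candidates: List[Candidate], local_node: str) -> str:
    """Prefer a socket on the sender's own node, else fall back to the first."""
    for node, sock in candidates:
        if node == local_node:
            return sock
    return candidates[0][1]


class RoundRobin:
    """Hand out the candidate sockets in turn."""

    def __init__(self, candidates: List[Candidate]) -> None:
        if not candidates:
            raise ValueError("at least one candidate is required")
        self._candidates = list(candidates)
        self._next = 0

    def pick(self) -> str:
        _node, sock = self._candidates[self._next]
        self._next = (self._next + 1) % len(self._candidates)
        return sock


# Two sockets bound to the same partition, as in the first build of the figure.
candidates = [("1.1.1", "partition-A"), ("1.1.2", "partition-A'")]

local_choice = closest_first(candidates, local_node="1.1.2")
rr = RoundRobin(candidates)
rotation = [rr.pick() for _ in range(3)]
```

Closest-first keeps traffic local when a local server exists, while round-robin spreads load across all sockets bound to the partition.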

Functional Topology Subscription
Function Address/Address Partition bind/unbind events
[Figure: Server Process, Partition A calls bind(type = foo, lower=0, upper=99) and Server Process, Partition B calls bind(type = foo, lower=100, upper=199). Client Process calls subscribe(type = foo, lower = 0, upper = 500) and receives the events foo,0,99 and foo,100,199.]
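
The subscription mechanism can be modelled as a service that forwards bind/unbind events to every subscriber whose range overlaps the affected sequence. This is a simplified sketch: the TopologyService class, the event tuple layout and the callback interface are assumptions, and bindings made before the subscription are not reported in this model.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str, int, int]  # (kind, type, lower, upper); assumed layout


class TopologyService:
    """Toy model of a topology subscription service."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, int, int, Callable[[Event], None]]] = []

    def subscribe(self, ftype: str, lower: int, upper: int,
                  callback: Callable[[Event], None]) -> None:
        self._subscriptions.append((ftype, lower, upper, callback))

    def _publish(self, kind: str, ftype: str, lower: int, upper: int) -> None:
        for s_type, s_lower, s_upper, callback in self._subscriptions:
            # Notify only subscribers whose range overlaps the sequence.
            if s_type == ftype and s_lower <= upper and lower <= s_upper:
                callback((kind, ftype, lower, upper))

    def bind(self, ftype: str, lower: int, upper: int) -> None:
        self._publish("bind", ftype, lower, upper)

    def unbind(self, ftype: str, lower: int, upper: int) -> None:
        self._publish("unbind", ftype, lower, upper)


events: List[Event] = []
service = TopologyService()
service.subscribe("foo", 0, 500, events.append)

service.bind("foo", 0, 99)       # Partition A
service.bind("foo", 100, 199)    # Partition B
service.bind("bar", 0, 999)      # different type: no event for this subscriber
service.unbind("foo", 100, 199)  # Partition B goes away
```

After these calls the client has seen two bind events and one unbind event for type foo, and nothing for type bar.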

Network Topology Subscription
Node/Cluster/Zone availability events
  Same mechanism as for function events
[Figure: Node <1.1.1>, Node <1.1.2> and Node <1.1.3>. Client Process calls subscribe(type = node, lower = 0x1001000, upper = 0x1001009). TIPC on two of the nodes calls bind(type = node, lower=0x1001002, upper=0x1001002) and bind(type = node, lower=0x1001003, upper=0x1001003); the client receives the events node,0x1001002 and node,0x1001003.]
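
The hexadecimal instance numbers in the figure are consistent with packing a <zone.cluster.node> address into 32 bits using 8, 12 and 12 bit fields. The slides do not state the layout, so the field widths below are an assumption, and the helper names are chosen for this sketch.

```python
from typing import Tuple

# Assumed field widths: 8-bit zone, 12-bit cluster, 12-bit node.
ZONE_BITS, CLUSTER_BITS, NODE_BITS = 8, 12, 12


def tipc_addr(zone: int, cluster: int, node: int) -> int:
    """Pack <zone.cluster.node> into one 32-bit integer."""
    if not (0 <= zone < (1 << ZONE_BITS)
            and 0 <= cluster < (1 << CLUSTER_BITS)
            and 0 <= node < (1 << NODE_BITS)):
        raise ValueError("address component out of range")
    return (zone << (CLUSTER_BITS + NODE_BITS)) | (cluster << NODE_BITS) | node


def split_addr(addr: int) -> Tuple[int, int, int]:
    """Unpack a 32-bit address into (zone, cluster, node)."""
    node = addr & ((1 << NODE_BITS) - 1)
    cluster = (addr >> NODE_BITS) & ((1 << CLUSTER_BITS) - 1)
    zone = addr >> (CLUSTER_BITS + NODE_BITS)
    return zone, cluster, node


def format_addr(addr: int) -> str:
    zone, cluster, node = split_addr(addr)
    return f"<{zone}.{cluster}.{node}>"


node_3 = tipc_addr(1, 1, 3)  # the value bound as 0x1001003 in the figure

# The subscription range 0x1001000..0x1001009 then covers nodes 0..9 of cluster <1.1>.
in_range = 0x1001000 <= node_3 <= 0x1001009
```

Under this layout a node subscription is an ordinary range subscription on type "node", which matches the slide's remark that the same mechanism serves both function and node events.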

ForCES Applied on TIPC
[Figure: Network Equipment containing a Control Element (OSPF, RIP; COPS, CLI, SNMP; Other Applications) and a Forwarding Element with LFB <IPv4F,5>, LFB <CNT,17>, LFB <IPv4F,1> and LFB <CNT,32>, connected by ForCES Protocol/TIPC.]

ForCES Applied on TIPC
[Figure: Network Equipment containing several Control Elements (OSPF, RIP; COPS, CLI, SNMP; Other Applications) and several Forwarding Elements with LFB <IPv4F,5>, LFB <CNT,17>, LFB <IPv4F,1> and LFB <CNT,32>, connected by ForCES Protocol/TIPC, with links labelled Internet.]

CONNECTIONS
Establishment based on functional addressing
  Selectable lookup algorithm, partitioning, redundancy etc.
No protocol messages exchanged during setup/shutdown
  Only payload carrying messages
Traditional TCP-style connection setup/shutdown as alternative
End-to-end flow control
SOCK_SEQPACKET
SOCK_STREAM
SOCK_RDM for connectionless and multicast
SOCK_DGRAM can easily be added if needed
  Same with “Unreliable SOCK_SEQPACKET”

CONNECTIONS
No protocol messages exchanged during setup/shutdown
  Only payload carrying messages
[Figure, three builds between Client Process and Server Process, Partition B: the client calls sendto(type = foo, instance = 117), addressed to foo,117; the server calls connect(client) and send(); the client calls connect(server).]

CONNECTIONS
Immediate “abortion” event in case of peer process crash
Immediate “abortion” event in case of peer node crash
Immediate “abortion” event in case of communication failure
Immediate “abortion” event in case of node overload
[Figure, four builds: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3> (node labels appear from the second build on), with an "abort" event shown for each case.]

Network Redundancy
Retransmission protocol and congestion control at signalling link level
Normally two links per node pair, for full load sharing and redundancy
Smooth failover in case of single link failure, with no consequences for user level connections
[Figure: Client Process and Server Process, Partition B, on Node <1.1.5> and Node <1.1.3>.]

Remaining Work
Implementation
  Reliable Multicast not fully implemented yet (exp. end of Q1)
  Re-stabilization after most recent changes
  Re-implementation of multi-cluster neighbour detection and link setup
Protocol
  Fully manual inter cluster link setup
  Guaranteeing Name Table consistency between clusters
  Slave node Name Table reduction ?????

http://tipc.sourceforge.net

QUESTIONS ??