Cloud Fabric: Myths, Missteps, and Mysteries
Radia Perlman, Intel Labs
![Page 2: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/2.jpg)
Network Protocols
• A lot of what we all know…is false!
![Page 3: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/3.jpg)
How networking tends to be taught
• Memorize these RFCs
• Nothing else ever existed
• Except possibly to make snide comments about “other teams”
![Page 4: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/4.jpg)
Things are so confusing
• Comparing technology A vs B
  – Nobody knows both of them
  – Somebody mumbles some vague marketing thing, and everyone repeats it
  – Both A and B are moving targets
![Page 6: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/6.jpg)
What about “facts”?
• What if you measure A vs B?
• What are you actually measuring? One implementation of A vs one implementation of B
![Page 7: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/7.jpg)
How I wish we’d compare
• Isolate conceptual pieces
• Try to ignore buzzwords or “which team”
![Page 8: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/8.jpg)
Some really confusing stuff
• We talk about “layer 2 solutions” vs “layer 3 solutions”…what’s that about?
![Page 9: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/9.jpg)
Basic network protocols
• Simple…an envelope in which you put your data
• Envelope contains, e.g., source, destination
• Switch has forwarding table that indicates (based on info in packet) output port or set of ports
![Page 10: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/10.jpg)
“Switch”
• Something that forwards (e.g., bridge, router, switch)
![Page 11: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/11.jpg)
What does a switch do?
• Forward based on:
  – Info in packet
    • Destination address or “label” (like MPLS, changes at each hop and represents an S-D path)
    • If need to keep things in order, other stuff in packet (e.g., TCP ports, flow ID, entropy field)
  – Forwarding table
![Page 12: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/12.jpg)
When does forwarding table get filled in?
• Proactively
• When a flow starts
![Page 13: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/13.jpg)
Seems to me…
• Proactively is better…otherwise latency while setting up a path for a new flow
![Page 14: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/14.jpg)
Info in packet
• Forwarding table indexed by
  – destination vs label vs flow
• Forwarding table gives single port or set of ports (allowing switch to choose)
• Preview: I think destination-based is best, with set of ports
![Page 15: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/15.jpg)
Destination alternatives
• Flat or hierarchical
  – Flat
    • Convenient for moving without changing address
    • Dense vs sparse: dense can be direct lookup, sparse (as in 6-byte Ethernet address) requires hash
  – Hierarchical
    • Makes forwarding table smaller
    • Either reserve certain bits for each level, or be flexible and have to do longest prefix match to find proper forwarding entry
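A minimal sketch of the two lookup styles, assuming a Python dict stands in for the hash needed by sparse flat addresses and a short list of IPv4-style prefixes stands in for a flexible hierarchy. The table contents, port numbers, and function names are invented for illustration.

```python
import ipaddress

# Flat, sparse addresses (e.g., 6-byte Ethernet): exact match via a hash table.
# Entries are hypothetical.
flat_table = {
    "00:1b:21:3c:9d:01": 3,
    "00:1b:21:3c:9d:02": 7,
}

# Hierarchical addresses with flexible level boundaries: longest prefix match.
# Prefixes and ports are hypothetical.
prefix_table = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.2.0.0/16"), 2),
    (ipaddress.ip_network("10.2.5.0/24"), 4),
]

def lookup_flat(mac):
    """Exact-match lookup; returns None if the address is unknown."""
    return flat_table.get(mac)

def lookup_longest_prefix(addr):
    """Return the port of the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, port in prefix_table:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return None if best is None else best[1]
```

The linear scan is only for readability; real routers use tries or specialized memory for the same match.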
![Page 16: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/16.jpg)
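“Label”: is a path
Figure: a path from S to D through switches R1 to R5 (node A also shown), with port numbers on each switch. Label-swapping entries shown at switches along the path, in the form (input port, label)=(output port, label):
(3,51)=(7,21) (4,8)=(7,92) (4,17)=(7,12)
(2,12)=(3,15) (2,92)=(4,8)
(1,8)=(3,6) (2,15)=(1,7)
VC=8, 92, 8, 6 (the label carried on each successive hop)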
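The label swapping in this figure can be traced with a short sketch. The table entries are the ones shown on the slide; assigning the three groups of entries to three consecutive switches, and the input ports 4, 2, and 1, are assumptions chosen so that the trace reproduces VC=8, 92, 8, 6.

```python
# Per-switch label-swapping tables: (input port, label) -> (output port, label).
# Which switch holds which table is an assumption; the entries come from the slide.
TABLES = [
    {(3, 51): (7, 21), (4, 8): (7, 92), (4, 17): (7, 12)},
    {(2, 12): (3, 15), (2, 92): (4, 8)},
    {(1, 8): (3, 6), (2, 15): (1, 7)},
]

# Port on which the packet arrives at each of the three switches (assumed).
IN_PORTS = [4, 2, 1]

def trace_labels(first_label):
    """Follow a packet hop by hop and return the label seen on each link."""
    labels = [first_label]
    label = first_label
    for table, in_port in zip(TABLES, IN_PORTS):
        _out_port, label = table[(in_port, label)]  # swap the label at this hop
        labels.append(label)
    return labels
```

Calling `trace_labels(8)` walks the entries (4,8), (2,92), and (1,8) in turn, which is why the label changes at every hop while still identifying one S-D path.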
![Page 17: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/17.jpg)
Flow-based
• Each forwarding table entry is for a single conversation…more specific than (S-D)
  – E.g., source, destination, TCP ports
![Page 18: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/18.jpg)
Some thoughts
• Dest-based vs label-based
  – Destination-based is smaller (O(n)) forwarding table than label-based (O(n²))
  – People think label-based is for traffic engineering, but can do traffic engineering with destination-based using some special destination addresses
  – ATM did label-based because
    • # of currently communicating pairs much smaller than total number of destinations
    • OK to have latency to set up a conversation
  – MPLS did it because it grew out of “tag-switching”
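A quick arithmetic sketch of the O(n) vs O(n²) comparison, assuming the worst case in which every source-destination pair's path crosses the same switch. The function names are illustrative only.

```python
def destination_entries(n):
    """Destination-based: at most one entry per destination."""
    return n

def label_entries_worst_case(n):
    """Label-based: one entry per ordered (source, destination) pair, worst case."""
    return n * (n - 1)
```

For 1,000 endnodes that is 1,000 entries against 999,000 in the assumed worst case.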
![Page 19: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/19.jpg)
More thoughts
• Flow-based vs destination-based
  – Only way to make flow-based not totally explode the forwarding table is to create entry when flow starts (incur latency)
  – Switch in better position to load-split traffic than central fabric manager
![Page 20: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/20.jpg)
Exploiting parallel paths
Figure: S and D connected through parallel switches R1a to R1e, R2a to R2e, and R3a to R3e
![Page 21: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/21.jpg)
Load splitting and keeping packets in order
• Source chooses the path
  – With a label or with choice of destination addresses for a destination (each one having a different path)
• Forwarding table based on flow
• Switch looks at other info to choose port
  – Deep packet inspection (e.g., TCP ports)
  – “entropy field”
  – Either way, deterministically choose same path for same flow
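One way a switch might "deterministically choose same path for same flow" is to hash the flow's identifying fields and use the result to pick among the allowed ports. This is a sketch, not any particular product's method: the CRC32 hash, the field choice, and the `salt` parameter are all assumptions.

```python
import zlib

def choose_port(ports, src, dst, src_port, dst_port, salt=0):
    """Pick one port from the set the forwarding table allows for this destination.

    The same flow fields always hash to the same port, so packets of one flow
    stay in order. Changing the (hypothetical) salt re-hashes all flows.
    """
    key = f"{salt}|{src}|{dst}|{src_port}|{dst_port}".encode()
    return ports[zlib.crc32(key) % len(ports)]
```

Different flows to the same destination can land on different ports, which spreads load without any per-flow forwarding table entries.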
![Page 22: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/22.jpg)
Research Suggestion
• Suppose a central place knows about all the flows
• What spreads traffic better?
  – Switches based on local output queues?
    • What about knowing about congestion k hops away?
  – Central place carefully placing all the paths for all the flows?
![Page 23: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/23.jpg)
Seems to me…
• Better to give switches choices per destination, and have them load split
• If have to keep order, can occasionally re-hash to move flows around
• I believe flows are inherently bursty
![Page 24: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/24.jpg)
Completely orthogonal concept
![Page 25: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/25.jpg)
Where does forwarding table come from?
• Distributed algorithm
• Central fabric manager
• Neither concept new…and completely orthogonal to “data plane”
• Concept of separation of control plane from data plane not new…
• I don’t believe the distributed algorithm makes switches expensive
![Page 26: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/26.jpg)
Seems to me…
• Distributed algorithm is superior, because it can react to topology changes more quickly
• But if there are very few topology changes, then perhaps less overhead with central?
![Page 28: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/28.jpg)
How do you manage a network?
• From a management console, which translates “big” commands, such as “forward based on this metric” or “traffic engineer this path” into individual commands to switches
• Protocols define which parameters are settable and readable, and which events trigger alerts
![Page 30: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/30.jpg)
To my astonishment
• That original vision degraded
• If we reinvent that vision with a new language for managing the switches, will the same vision degrade for the same reason?
![Page 31: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/31.jpg)
New topic
![Page 32: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/32.jpg)
What is Ethernet?
![Page 38: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/38.jpg)
Why this whole layer 2/3 thing?
• Perlman’s View of ISO Layers
  – 1: physical
  – 2: data link (nbr-nbr, e.g., Ethernet)
  – 3: network (create entire path, e.g., IP)
  – 4: end-to-end (e.g., TCP, UDP)
  – 5 and above: boring
![Page 40: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/40.jpg)
So…why are we forwarding Ethernet packets?
• Ethernet was intended to be layer 2
• Just between neighbors – not forwarded
• What exactly is Ethernet?
![Page 41: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/41.jpg)
Back then…
• I was layer 3 architect for DECnet
• Layer 3 calculated paths, and forwarded packets
• Layer 2 just marked beginning and end of packet, and checksum
• Then along came Ethernet
![Page 43: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/43.jpg)
The story of Ethernet
• CSMA/CD
• Spanning Tree
• TRILL
• Futures?
![Page 44: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/44.jpg)
Ethernet packet
Layout: dest | source | data
Ethernet header: 6-byte addresses – strangely large…because it allows autoconfiguration
Plus stuff like protocol type and VLAN
![Page 45: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/45.jpg)
CSMA/CD Ethernet
• CSMA/CD…shared bus, peers, no master
  – CS: carrier sense (don’t interrupt)
  – MA: multiple access (you’re sharing the air!)
  – CD: listen while talking, for collision
• Lots of papers about goodput under load being only about 60% or so because of collisions
• Limited in # of nodes (maybe 1000), distance (kilometer or so)
![Page 46: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/46.jpg)
But Ethernet hasn’t been CSMA/CD for decades
![Page 47: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/47.jpg)
How it evolved to spanning tree
• People got confused, and thought Ethernet was a network instead of a link
  – Link (layer 2) = nbr-nbr
  – Network (layer 3) = forward along a path
• Built apps on Ethernet, with no layer 3
• Router can’t forward without the right envelope
![Page 48: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/48.jpg)
Problem Statement (from about 1983)
Need something that will sit between two Ethernets, and let a station on one Ethernet talk to a station on the other
Without modifying the endnode, or Ethernet packet, in any way
Figure: stations A and C
![Page 49: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/49.jpg)
The basic concept
• Bridge just listens promiscuously, and forwards to each other port when the ether is free
• Learn (Source=S, input port). Once learned, if see a packet with destination=S, know where to forward it (rather than “all the ports”)
• This requires a tree (no loops) topology
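The learn-and-forward behavior can be sketched as follows. The class and method names are invented, and the sketch leaves out things a real bridge needs (entry aging, waiting for the ether to be free).

```python
class LearningBridge:
    """Toy transparent bridge: learns (source, input port), forwards by destination."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # station address -> port it was last seen on

    def receive(self, src, dst, in_port):
        """Return the list of ports the packet is forwarded to."""
        self.table[src] = in_port  # learn (Source=S, input port)
        out = self.table.get(dst)
        if out is None:
            # Destination not learned yet: send to all the other ports.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination is on the side the packet came from: nothing to do.
            return []
        return [out]
```

With a loop in the topology, the "all the other ports" case would make copies circulate forever, which is why the slide insists on a tree.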
![Page 50: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/50.jpg)
Figure: bridged Ethernets with stations A, C, D, E, J, and X
![Page 51: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/51.jpg)
Figure: Physical Topology (numbered bridges, with endnodes A and X)
![Page 52: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/52.jpg)
Figure: Pruned to Tree (numbered bridges, with endnodes A and X)
![Page 53: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/53.jpg)
Algorhyme
I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span
So packets can reach every LAN.
First the root must be selected,
By ID it is elected.
Least cost paths from root are traced,
In the tree these paths are placed.
A mesh is made by folks like me.
Then bridges find a spanning tree.
Radia Perlman
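The steps in the poem (elect the root by ID, trace least cost paths from it, keep only those links) can be sketched as a centralized computation. This is an illustration only: the real protocol is distributed, with bridges exchanging messages, and the choice of lowest ID as root plus the tie-breaking rule here are assumptions of the sketch.

```python
import heapq

def spanning_tree(links):
    """links: dict mapping frozenset({a, b}) -> cost, with integer bridge IDs.

    Returns (root, set of links kept in the tree).
    """
    nodes = sorted({n for link in links for n in link})
    root = nodes[0]  # root elected by ID (lowest wins in this sketch)

    neighbors = {n: [] for n in nodes}
    for link, cost in links.items():
        a, b = tuple(link)
        neighbors[a].append((b, cost))
        neighbors[b].append((a, cost))

    parent = {}
    done = set()
    heap = [(0, root, None)]  # (cost from root, node, parent)
    while heap:
        cost, node, via = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        parent[node] = via  # least cost path to root goes through `via`
        for nbr, c in neighbors[node]:
            if nbr not in done:
                heapq.heappush(heap, (cost + c, nbr, node))

    tree = {frozenset({n, p}) for n, p in parent.items() if p is not None}
    return root, tree
```

On a three-bridge triangle, one of the three links is left out of the tree, which is exactly the "unused links" cost of this approach.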
![Page 54: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/54.jpg)
Bother with spanning tree?
• Maybe just tell customers “don’t do loops”
• First bridge sold...
![Page 55: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/55.jpg)
First Bridge Sold
Figure: stations A and C
![Page 56: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/56.jpg)
Problems with spanning tree: suboptimal paths, unused links
Figure: numbered bridges with endnodes A and X
![Page 57: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/57.jpg)
Why not just use IP routers?
• World has converged to IP as layer 3, and it’s in the network stacks
![Page 58: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/58.jpg)
Why not just use IP routers?
• IP is configuration intensive, moving VMs disruptive
  – IP protocol requires every link to have a unique block of addresses
  – Routers need to be configured with which addresses are on which ports
  – If something moves, its address changes
![Page 59: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/59.jpg)
Layer 3 doesn’t have to work that way!
• CLNP / DECnet...20 byte address
  – Bottom level of routing is a whole cloud with the same 14-byte prefix
  – Routing is to 6 byte ID inside the cloud
  – Enabled by “ES-IS” protocol, where endnodes periodically announce themselves to the routers
Address layout: 14 bytes (prefix shared by all nodes in large cloud) | 6 bytes (endnode ID)
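A small sketch of the 14 + 6 byte split described here, treating the address as 20 raw bytes. The function names are invented, and real CLNP addresses have more internal structure than this shows.

```python
def split_address(addr: bytes):
    """Split a 20-byte address into (14-byte cloud prefix, 6-byte endnode ID)."""
    if len(addr) != 20:
        raise ValueError("expected a 20-byte address")
    return addr[:14], addr[14:]

def same_cloud(a: bytes, b: bytes) -> bool:
    """Two endnodes are in the same cloud when their 14-byte prefixes match."""
    return split_address(a)[0] == split_address(b)[0]
```

An endnode that moves within the cloud keeps both halves of its address, since only the 6-byte ID is used for routing inside the cloud.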
![Page 60: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/60.jpg)
Hierarchy
Figure: two panels, “One prefix per link (like IP)” and “One prefix per campus”; prefix labels shown: 2*, 25*, 28*, 292*, 22*, 293*, 2*
![Page 61: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/61.jpg)
Worst decision ever
• 1992…Internet could have adopted CLNP
• Easier to move to a new layer 3 back then
  – Internet smaller
  – Not so mission critical
  – IP hadn’t yet (out of necessity) invented DHCP, NAT, so CLNP gave understandable advantages
• CLNP still has advantages over IPv6 (e.g., large multilink level 1 clouds)
![Page 62: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/62.jpg)
Ethernet looks like a single IP link
• So Ethernet provides a large cloud in which switches can autoconfigure, and nodes (e.g., VMs) can move around transparently
• But don’t want limitations of spanning tree
![Page 63: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/63.jpg)
Next step in evolution: TRILL
![Page 64: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/64.jpg)
TRILL
• TRansparent Interconnection of Lots of Links
• Basic idea: Put Ethernet in another envelope that acts more like a layer 3 envelope, and can be routed
![Page 65: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/65.jpg)
TRILL
Figure: TRILL switches R1 to R7, with endnodes a and c
![Page 66: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/66.jpg)
TRILL packet
Layout: TRILL header | original Ethernet packet
TRILL header fields shown: last switch, 1st switch, hops
Switch addresses are 16 bits
![Page 67: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/67.jpg)
16-bit TRILL switch “nicknames”
• Allows about 64,000 (2^16) switches…many more endnodes
• TRILL autoconfigures nicknames
• Allows simple forwarding table lookup
  – Direct table lookup
  – Don’t need associative memory, or hash, or longest prefix match
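"Direct table lookup" can be pictured as a plain array with one slot per possible 16-bit nickname. The sketch below assumes a Python list stands in for that memory; the names are made up.

```python
NICKNAME_BITS = 16

# One slot per possible nickname; None means no route installed.
forwarding = [None] * (1 << NICKNAME_BITS)

def set_route(nickname, ports):
    """Install the set of output ports for an egress nickname."""
    forwarding[nickname] = tuple(ports)

def next_ports(nickname):
    """Direct index: no hash, no associative memory, no longest prefix match."""
    return forwarding[nickname]
```

The table is dense, so the nickname itself is the index.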
![Page 68: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/68.jpg)
Advantage of extra header
• Switches inside cloud don’t need to know about all the endnodes…
  – Forwarding table size of # of switches
• The outer header is like a layer 3 header, and can use all the layer 3 techniques, e.g.,
  – Shortest paths
  – Multiple paths (exploit parallelism)
  – Traffic engineering
![Page 69: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/69.jpg)
How does R1 know R2 is “last switch”?
• Orthogonal concept to rest of TRILL
• R1 needs table of (destination MAC, egress switch)
• Various possibilities
  – Edge switch learns when decapsulating data, floods if destination unknown
  – Configuration of edge switches
  – Directory that R1 queries
  – Central fabric manager pushes table
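The first possibility (learn when decapsulating, flood if unknown) might look like the sketch below. The packet is modeled as a plain dict rather than the real TRILL header format, and all names are hypothetical.

```python
class EdgeSwitch:
    """Toy edge switch keeping a (destination MAC -> egress switch) table."""

    def __init__(self, nickname):
        self.nickname = nickname
        self.mac_to_egress = {}

    def encapsulate(self, src_mac, dst_mac, payload):
        """Wrap an Ethernet packet; egress None means unknown, so flood."""
        egress = self.mac_to_egress.get(dst_mac)
        return {
            "ingress": self.nickname,  # 1st switch
            "egress": egress,          # last switch, if known
            "inner": (src_mac, dst_mac, payload),
        }

    def decapsulate(self, packet):
        """Unwrap, learning that the inner source sits behind the ingress switch."""
        src_mac, _dst_mac, _payload = packet["inner"]
        self.mac_to_egress[src_mac] = packet["ingress"]
        return packet["inner"]
```

After one exchange in each direction, both edge switches know where the other endnode lives, and core switches never learned any MAC addresses at all.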
![Page 70: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/70.jpg)
Note: TRILL is evolutionary
• Endnodes just think it’s Ethernet…no changes
• Even interworks with existing spanning tree switches
• The more switches you upgrade to TRILL, the better the bandwidth utilization
• This could have been implemented by a single vendor, without standardizing
![Page 71: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/71.jpg)
Orthogonal concept
![Page 72: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/72.jpg)
Who encapsulates/decapsulates?
• Could be
  – first switch
  – Or hypervisor
  – Or VM
  – Or application
• For “evolution”, switch
• Having endnode do it saves work for switch, easier to eliminate stale entries
![Page 73: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/73.jpg)
Algorhyme v2
I hope that we shall one day see
A graph more lovely than a tree.
A graph to boost efficiency
While still configuration-free.
A network where RBridges can
Route packets to their target LAN.
The paths they find, to our elation,
Are least cost paths to destination.
With packet hop counts we now see,
The network need not be loop-free.
RBridges work transparently.
Without a common spanning tree.
Ray Perlner
![Page 74: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/74.jpg)
Recently, a bunch of similar things invented
• NVGRE, VXLAN, …
![Page 75: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/75.jpg)
How to compare
• “Inner” packet based on flat address space
  – IP or Ethernet…
    • IP header bigger, addresses smaller, well-known how to get unique Ethernet addresses without configuring
• “Outer” header location dependent
  – TRILL header small, nickname; simple forwarding lookup
![Page 76: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/76.jpg)
What does encapsulation header address?
• Last switch?
  – Smaller forwarding tables
  – Last switch has to look at inner header to know where to forward
• Output port of last switch?
  – Can avoid making forwarding tables bigger if there is a fixed hierarchy:
    • Last switch | Port on last switch
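A sketch of the fixed hierarchy "Last switch | Port on last switch", assuming an 8-bit port field (the slide gives no widths). Core switches would index on the switch part only, so their tables do not grow with the number of ports.

```python
PORT_BITS = 8  # assumed width of the "port on last switch" field

def make_address(last_switch, port):
    """Pack (last switch, port) into one outer destination address."""
    return (last_switch << PORT_BITS) | port

def split_address(address):
    """Return (last switch, port); core switches only need the first part."""
    return address >> PORT_BITS, address & ((1 << PORT_BITS) - 1)
```

The last switch reads the port part directly and no longer has to look at the inner header.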
![Page 77: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/77.jpg)
Interesting (to me, anyway) note
• CLNP vs IP+TRILL
  – With CLNP, no need for ARP to get address on final link…it’s part of the header
  – With these encapsulation things, forwarding table inside final cloud can be smaller…with CLNP, routers have to keep track of all endnodes inside the cloud
![Page 78: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/78.jpg)
Some heresy
• Fabrics should be allowed to reorder packets…make smarter endnodes, including work of middle boxes
• Handle congestion by telling source to slow down
• Cost of making fabric “lossless” is too high
  – Congestion spreads if
    • You never drop packets
    • You backpressure, based on a few classes
![Page 79: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/79.jpg)
Protocol Folklore
• Obvious stuff everyone gets wrong
![Page 80: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/80.jpg)
What’s a Version Number?
• Version number
  – what is “new version” vs “new protocol”?
    • same lower layer multiplex info
  – therefore, must always be in same place!
  – drop if version # bigger
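The rule "must always be in same place, drop if version # bigger" amounts to a check like the one below. The first-byte position and the value of `MY_VERSION` are assumptions for the sketch, not taken from any real protocol.

```python
MY_VERSION = 2  # hypothetical highest version this implementation understands

def accept(packet: bytes) -> bool:
    """Check the version field, assumed to be the first byte in every version."""
    if not packet:
        return False
    # A bigger version may have changed everything after this field, so drop.
    return packet[0] <= MY_VERSION
```

The check only works if every future version keeps the field where old implementations expect it.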
![Page 81: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/81.jpg)
Version #
• Nobody seems to do this right
• IP, IKEv1, SSL: unspecified what to do if version # different. Most implementations ignore version number field
• SSL v3 moved version field!
![Page 82: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/82.jpg)
Parameters
• Minimize these:
  – someone has to document it
  – customer has to read documentation and understand it
• How to avoid
  – architectural constants if possible
  – automatically configure if possible
![Page 83: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/83.jpg)
Settable Parameters
• Make sure they can’t be set incompatibly across nodes, across layers, etc. (e.g., hello time and dead timer)
• Make sure they can be set at nodes one at a time and the net can stay running
![Page 85: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/85.jpg)
Example: Hello Timer
• IS-IS
  – pairwise parameters reported in “hellos”
  – So you know what to expect from that neighbor
• OSPF
  – Kind of copied IS-IS, but decided…
  – Refuse to talk if timers not identical with neighbor’s!
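The contrast can be sketched with two toy adjacency checks. The dict fields `hello` and `dead` are simplified stand-ins for the timers carried in hellos, not the real packet formats of either protocol.

```python
def isis_style_adjacency(my_timers, nbr_timers):
    """Neighbor reports its own timers; we simply adapt to what it told us."""
    return {
        "up": True,
        "declare_dead_after": nbr_timers["dead"],
    }

def ospf_style_adjacency(my_timers, nbr_timers):
    """Adjacency forms only if both sides are configured with identical timers."""
    same = (my_timers["hello"] == nbr_timers["hello"]
            and my_timers["dead"] == nbr_timers["dead"])
    return {"up": same}
```

With the first style, timers can be changed one node at a time while the net stays running; with the second, a change on one side breaks the adjacency until the other side is changed too.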
![Page 86: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/86.jpg)
Latency
• Store-and-forward vs cut-through
• Cut-through can start after the forwarding decision is made
• What field do you need to see for forwarding decision?
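A rough arithmetic sketch of why the position of the forwarding field matters. It assumes the decision needs only the destination address, counts bytes from the start of each header (the destination ends at byte 6 in Ethernet, byte 20 in IPv4, byte 40 in IPv6), and ignores lookup time and any link-layer framing in front of an IP header, so the numbers are lower bounds.

```python
# Bytes that must arrive before the destination address has been fully seen,
# counted from the start of that header.
BYTES_UNTIL_DEST = {"ethernet": 6, "ipv4": 20, "ipv6": 40}

def cut_through_wait_ns(header, link_gbps):
    """Minimum wait before a cut-through switch can make its decision."""
    return BYTES_UNTIL_DEST[header] * 8 / link_gbps

def store_and_forward_wait_ns(packet_bytes, link_gbps):
    """Store-and-forward waits for the whole packet."""
    return packet_bytes * 8 / link_gbps
```

On a 10 Gb/s link, a 1500-byte packet takes 1200 ns to receive in full, against tens of nanoseconds to reach the destination field.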
![Page 87: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/87.jpg)
IPv4 header
![Page 88: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/88.jpg)
IPv6 header
![Page 89: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/89.jpg)
Another latency mistake
• TCP has checksum in the header
• So can’t start transmitting until you see the whole packet
![Page 90: Cloud Fabric: Myths, Missteps, and Mysteries](https://reader036.vdocuments.mx/reader036/viewer/2022062410/5681658e550346895dd85ab3/html5/thumbnails/90.jpg)
Parting thoughts
• Don’t believe anything about “technology X” unless there is a plausible inherent reason for it
• Don’t get carried away by buzzwords
• Know what problem you’re solving before you start on the solution