packet switching

Instructor: Rob Nash
Readings: Chapter 3, P&D

Packet Switching

We have a limited number of hosts so far
Also, a limited geographical distance
◦ As broadcast can only take us so far
We can connect two distant nodes (or networks) via point-to-point connections
◦ But we don’t service any nodes in between
We’d like to build a global network, so we must consider hosts that aren’t directly connected.

To (Inter)Connect Two Nets

“Nature seems […] to reach many of her ends by long circuitous routes.” – Rudolph Lotze

“Packets are able to reach many different ends by (sometimes) long circuitous routes”
◦ But imagine this dilemma for a second:
◦ How are packets able to navigate an unknown topology?
Ethernet is simple: send to everybody, but again it doesn’t scale

Motivation

Your phone isn’t directly connected to all other phone users

Rather, you’re connected to a switch
An operator will provide the “directly connected” illusion by configuring a (temporary) link for use in the call
In the same vein, computer networks have packet switches
◦ For use in forwarding/switching packets

Routing is the process of building a forwarding table (4)

Borrowing from Telephony

Very broadly defined here as either:
Connection-oriented: Like a telephone call, with temporary state stored at each switch
◦ X.25
◦ ATM
Connectionless: Like the postal service, with even less recourse for problems (no RTS, etc.)
◦ IP, UDP
Also, we’ll focus on two specific examples of switching
◦ Ethernet & ATM

Switch Categorizations

Forwarding is a table lookup
◦ Given the input port and ID, what is the output port and outgoing ID?
Routing is the algorithm that builds the table
◦ A distributed algorithm by nature of the domain
◦ Should be fair
◦ Consider offering a QoS
◦ This has evolved over the history of networks

LAN Switching is an evolution of Ethernet Bridging with performance augmentations

A Bit on Terminology

CSS432: Switching and Forwarding


Switched Networks
Switch Function:
◦ Connects two or more network segments
◦ Forwards packets from input port to output port
◦ Selects a port based on address in packet header
[Figure: a switch with input ports (T3, T3, STS-1) on one side and output ports (T3, T3, STS-1) on the other]

Covers a large geographic area (> 2500 m in Ethernet)
Supports large numbers of hosts (> 1024 hosts in Ethernet)
Maintains performance (> two packets through a switch at once)
◦ And for n input ports each with buffer b, we can provide n x b queuing simultaneously
◦ Contrast this to Ethernet, where two hosts will compete for the line

Switched Network Advantages

Point-to-Point
Ethernet MAC
Rings
A switch adds the star topology to our set
◦ Also, the ability to interconnect any of the above networking technologies
◦ As switches may be connected to hosts, or other switches

Topologies

Switched networks are more scalable than shared-media networks
◦ Directly due to their ability to support many hosts at full speed (limited by memory capacity)
And, we can use a switch to combine two disparate networks
◦ A SONET STS-3 link and a few T3s
◦ Each port runs the appropriate link layer protocol

Switching (or forwarding): receiving incoming packets on an input port and selecting the appropriate output port on which to forward the data

Switching in General

How does a switch make its decision?
◦ This depends on the approach {connectionless, etc.}
◦ In general, look at the header of the packet for an identifier (could be a local ID, could be an IP address)
◦ Use this to make your decision by looking up the ID in a table, and forward accordingly

We’ll start simply with the datagram approach

Questions on this Approach?

We can provide unique identifiers to each host on the network (e.g., an address)

We also will be interested in providing identifiers to label each input and output port in a switch

Identifiers

Each packet contains enough information to enable any switch to forward it

How? Just including the complete destination address in every packet.

Each switch will use the destination address as the key in the lookup

No connection state (thus no setup)
All packets are forwarded independently
Node failure and reroute is possible

Datagrams


Datagram Switching (P.4)
[Figure: Switches 1, 2, and 3, each with ports 0-3, interconnecting Hosts A through H]

Table at Switch 2
Dest   Port
A      3
B      0
C      3
D      3
E      2
F      1
G      0
H      0
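The lookup at Switch 2 can be sketched in a few lines of Python. This is only an illustration: it assumes the table is held as a dictionary, and the names SWITCH2_TABLE and forward are hypothetical, not from the text.

```python
# Forwarding table for Switch 2, copied from the slide: destination -> output port.
SWITCH2_TABLE = {"A": 3, "B": 0, "C": 3, "D": 3, "E": 2, "F": 1, "G": 0, "H": 0}


def forward(table, packet):
    """Return the output port for a datagram, or None if the destination is unknown.

    Every datagram carries its complete destination address, so the switch
    needs no per-connection state: the address alone is the lookup key.
    """
    return table.get(packet["dest"])


# Two datagrams handled independently of each other.
for packet in ({"dest": "E", "payload": b"hi"}, {"dest": "A", "payload": b"yo"}):
    print(packet["dest"], "->", forward(SWITCH2_TABLE, packet))
```

What a real switch does with an unknown destination (drop, default port) is a design choice that this sketch leaves open by returning None.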

In a simple and static environment, one network operator may know the topology
◦ And manually install this in switches in the network
In a distributed and dynamic environment, no one operator knows the complete topology
◦ Multiple pathways, failing nodes, etc.
◦ This harder problem is routing (Section 4.2)
◦ For now: routing is an assumed background process, and forwarding is a simple lookup

Forwarding & Routing

Hosts can send packets at any time (and to anywhere)
◦ No setup or teardown
◦ All switches can immediately forward this packet, assuming a correct routing table
Hosts don’t know (or care) about the health of the intermediary network or destination node
◦ You could send a packet to a machine that just lost power
◦ Or, you could send a packet through a network whose switches just lost power
Failures may not catastrophically affect communications if alternate routes exist around failed nodes (and the network updates its tables)

Connectionless Minimalist

A connection-oriented approach
◦ With a setup, communicate, and teardown phase
◦ This may seem like TCP over IP, but we’ll see that TCP is implemented on top of the connectionless approach
Setup: establishing connection state and a path through the network
◦ Each subsequent packet will follow this path

Forwarding tables use VCIs – Virtual Circuit Identifiers – that help uniquely identify connections at a local switch

Virtual Circuit Switching



Virtual Circuit Switching (1,4)
Each switch maintains a VC table
The Input Port & VCI uniquely determine a connection
[Figure: Host A reaches Host B through Switches 1, 2, and 3 (ports 0-3 each); successive links of the circuit are labeled VCI = 5, VCI = 11, VCI = 7, and VCI = 4]

Switch   Port (in)   VCI (in)   Port (out)   VCI (out)
1        2           5          1            11
2        3           11         0            7
3        0           7          3            4
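To see how the three VC tables cooperate, here is a small Python sketch that follows one packet from Host A to Host B. The table entries come from the slide; the LINKS wiring (which output port leads to which input port on the next switch) is an assumption read off the figure, and all names are hypothetical.

```python
# VC tables from the slide: (input port, incoming VCI) -> (output port, outgoing VCI).
VC_TABLES = {
    "S1": {(2, 5): (1, 11)},
    "S2": {(3, 11): (0, 7)},
    "S3": {(0, 7): (3, 4)},
}

# Assumed wiring between switches: (switch, output port) -> (next switch, input port).
LINKS = {("S1", 1): ("S2", 3), ("S2", 0): ("S3", 0)}


def trace(switch, in_port, vci):
    """Follow a packet hop by hop, rewriting its VCI at every switch."""
    hops = []
    while True:
        out_port, out_vci = VC_TABLES[switch][(in_port, vci)]
        hops.append((switch, in_port, vci, out_port, out_vci))
        next_hop = LINKS.get((switch, out_port))
        if next_hop is None:  # no further switch: the packet leaves toward the host
            return hops
        switch, in_port = next_hop
        vci = out_vci


for hop in trace("S1", in_port=2, vci=5):
    print(hop)
```

Note that the VCI only has meaning on a single link: the packet carries 5, then 11, then 7, then 4, and no switch needs to know the full destination address.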

PVCs – “permanent Virtual Circuits”, which are long-lived (or network operator configured) table entries

Signaled: a host may set up or delete a VC dynamically and autonomously

Configuring VCs

Oracle: How do switches know what outgoing VCI they should use?
◦ This data is literally downstream of the current switch!
Answer: We fill this data in “in reverse”, after we’ve built a path from A to B.
◦ Then, a setup/connection packet from B to A is sent informing each upstream hop of the VCI it should use

We signal to set up (reserve a VCI entry) and signal to reclaim these resources when done

A Note on Setup

At least one RTT delay before any payload is communicated…
◦ Why?
Setup packets differ from payload packets
◦ Since setup contains the full GuID for the destination
◦ So, per-packet overhead is reduced relative to the datagram approach
When we do get to send data, much network topology is known in advance
◦ There is a receiver and route to that receiver, and the receiver is ready to accept data

Implications for VC Switching

Resources are reserved in advance to avoid contention

SWP is used in between node pairs along the circuit

Flow control is used to prevent congestion, and new circuits are declined if not enough resources at a switch

X.25

Popular with telephony companies in the 80s

Physical medium: POTS links or ISDN
◦ ISDN integrates speech and data on the same line
◦ Pre-DSL
From Wiki: “X.25 is today to a large extent replaced by less complex protocols, especially the Internet protocol (IP)”

Intro to X.25

http://en.wikipedia.org/wiki/Internet_protocol

We see the datagram approach is minimal and doesn’t reserve resources in advance
◦ But, it also cannot make the same guarantee that X.25 can
We can implement a QoS concept using the connection model, as we set the service level per connection
◦ QoS here: a performance or resource guarantee
  My packets shouldn’t be delayed (queued) too long
  My packets will always be accommodated at each switch

Comparing the Approaches

Frame Relay is used for VPN construction (4.1.8)

ATM is used to link telephony systems across wide areas in a point-to-point configuration

Virtual Circuits in Action

Consider a pair of Ethernets you’d like to connect

We could just place a “repeater” terminal that collects all packets on one net and broadcasts them to the other
◦ Shout louder!
◦ This forms an extended LAN
◦ The simplest version does no optimization

Note that a “bridge” here could be a host, but it meets our definition of a switch.

LAN Switches & Ethernet Bridges

Consider a shared-media example
Consider the star topology offered by switching
◦ Note that each host has its own dedicated link
In the MAC example, link contention is an issue
◦ In the switching example, I can send as much as the switch can forward (or buffer) on my own link

Switching Performance



Bridges and Extended LANs

Connecting two or more LANs
◦ Repeater
  L1 – Physical Layer
  Limitations: <= 2500m and <= 1024 nodes
◦ Bridge (or LAN switch)
  L2 – Link Layer
  No physical limitations
  Forwarding frames using MAC address
  Static configuration + partial dynamic configuration (Spanning Tree Protocol)
◦ Router
  L3 – Network Layer
  Routing IP packets using IP address
  Dynamic configuration

[Figure: a bridge with hosts A, B, C attached to Port 1 and hosts X, Y, Z attached to Port 2]


Learning Bridges
Do not forward when unnecessary
◦ Ex. A frame sent from A to B
Maintain forwarding table

Host   Port
A      1
B      1
C      1
X      2
Y      2
Z      2

Learn table entries based on source address
◦ Ex. An entry for A is registered upon receiving a frame from A
◦ Ex. When receiving a frame from B, don’t forward to Port 2
Table is an optimization; need not be complete
Entries are expired after a specific period of time
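The learning behavior can be sketched as a small Python class. This is a minimal illustration under stated assumptions: the class name, the receive method, and the 300-second expiry are hypothetical choices, and a real bridge works on MAC addresses rather than host letters.

```python
import time


class LearningBridge:
    """Learns host -> port from source addresses; floods when the destination is unknown."""

    def __init__(self, ports, ttl=300.0):
        self.ports = set(ports)
        self.ttl = ttl    # assumed expiry period in seconds
        self.table = {}   # host -> (port, time the entry was learned)

    def receive(self, src, dst, in_port, now=None):
        """Return the set of ports the frame should be forwarded on."""
        now = time.monotonic() if now is None else now
        self.table[src] = (in_port, now)       # learn from the source address
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.ttl:
            del self.table[dst]                # stale entry: forget it
            entry = None
        if entry is None:
            return self.ports - {in_port}      # unknown destination: flood
        out_port = entry[0]
        # Destination is on the segment the frame came from: do not forward.
        return set() if out_port == in_port else {out_port}


bridge = LearningBridge(ports=[1, 2])
print(bridge.receive("A", "B", in_port=1, now=0.0))  # B unknown, so the frame is flooded
print(bridge.receive("B", "A", in_port=1, now=1.0))  # A is known to be on port 1
print(bridge.receive("X", "A", in_port=2, now=2.0))  # forwarded only toward A
```

Because the table is only an optimization, flooding on a miss keeps the bridge correct even when entries are missing or have expired.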


Based on datagram switching

How could a network come to have cycles in it?
◦ Perhaps it’s a multi-site distributed net where no one administrator knows the complete topology
◦ Introduced by accident?
◦ More likely: introduced for redundancy!

However, Learning Bridges can fail if a cycle exists, so we need a strategy to address graph cycles.

Network Growth

Algorithm deactivates ports to remove cycles
◦ The spanning tree determines which bridges to use, and which bridges should “sit out”
◦ Note that a bridge may forward on some ports, but not others
Formalized in the IEEE 802.1 Specification
◦ Bridges adopt this distributed algorithm (as we’ll see)
Concept: remove edges from your graph until no cycles exist (the tree is a subset of the graph)
◦ Oddity: vertices in this graph are both hosts and switches

Spanning Tree Protocol (Link)

When the network has settled, certain bridges will be designated to forward packets over their IO ports based on their distance to the root (or ID number if a tie)

Other bridges or ports will simply be disabled

Each bridge decides the ports over which it will and will not forward frames

Spanning Further

Elect the smallest ID as the root
◦ Roots always forward over all ports
Each bridge computes the distance between it and the root
◦ Usually a per-hop count
Trades this information with its neighbors, keeping track of “best” paths
◦ I.e., shortest hop count in this context
◦ Bridges that offer the best paths become designated
Finally all bridges determine the root feeder, which is the only bridge that forwards to the root
◦ Chosen so it is closest to the root

Spanning Algorithm


STP Overview
Each bridge has unique id (e.g., B1, B2, B3)
Select a bridge with smallest id as root
Select a bridge on each LAN closest to root as designated bridge (use id to break ties)
Each bridge forwards frames over each LAN if it is a designated bridge
[Figure: extended LAN with bridges B1-B7 and LANs A-K; B1 is the root, the other bridges are annotated with their hop counts to the root (1 hop or 2 hops), and ties are broken by id (B5 < B7, B4 < B6)]


STP Details (use p. 191)
Initially, each bridge believes it is the root
When a bridge learns it is not the root, it stops generating configuration messages
◦ In steady state, only the root generates messages
When a bridge learns it is not a designated bridge, it stops forwarding configuration messages
◦ In steady state, only designated bridges forward configuration messages
If any bridge does not receive a configuration message after a period of time, it starts generating configuration messages claiming to be the root.

Bridges exchange configuration messages (Y, d, X)
◦ Y: the id of the bridge the sender believes to be the root
◦ d: #hops from X to Y
◦ X: the sending bridge id
[Figure: the same extended LAN (bridges B1-B7, LANs A-K), annotated with example configuration messages (1, 0, 1), (1, 1, 2), and (1, 1, 5)]
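How a bridge compares configuration messages can be sketched in Python. This assumes the usual ordering (smaller root id wins, then shorter distance, then smaller sender id), which happens to match Python's tuple comparison on (Y, d, X). The function names and the message sequence seen by B3 are illustrative, not taken from the slides.

```python
def better(msg_a, msg_b):
    """True if msg_a beats msg_b: smaller root id, then fewer hops, then smaller sender id."""
    return msg_a < msg_b  # tuples (Y, d, X) compare lexicographically


def update(best, msg):
    """Keep whichever configuration message is better."""
    return msg if better(msg, best) else best


def announce(my_id, best):
    """Message this bridge would send given its current best information."""
    root, dist, _sender = best
    if root == my_id:              # still believes it is the root
        return (my_id, 0, my_id)
    return (root, dist + 1, my_id)  # one hop farther from the root than the neighbor


# Bridge B3 starts out believing it is the root, then hears from its neighbors.
best = (3, 0, 3)
for msg in [(2, 0, 2), (1, 1, 2), (1, 1, 5)]:
    best = update(best, msg)
    print("heard", msg, "best", best, "would announce", announce(3, best))
```

In this sketch B3 ends up two hops from root B1. Whether it keeps forwarding on a given LAN then depends on whether a neighbor on that LAN is closer to the root, which this fragment does not model.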


Bridge Limitations
STP:
◦ It won’t forward frames over alternative paths for the sake of:
  Routing around a congested bridge
  Routing along a shorter path, like one from a node on B to another node on K
◦ Scales linearly, and uses broadcast mechanism
Bridges in general:
◦ Not scalable (“tens of”)
  STP
  Broadcast (forwarding all broadcast/multicast frames in the current practice)
◦ Homogeneous networks only (uses network’s frame header)
  Ethernet to Ethernet
  Token ring to Token ring
  ATM to ATM
Idea: Partition LANs using coloring/tiling to limit the number of network segments that will broadcast

“It is never safe to design network software under the assumption that it will run over a single Ethernet segment.”

“Bridges happen.”
◦ Drop frames if congested (rare on Ethernet alone)
◦ Frames could be reordered in an extended LAN
  Not in a singular Ethernet segment

Don’t Expect a Single LAN

Many ways to build economy & high-end switches
◦ More advanced fabrics are implemented in high-end (core) switches
The high level concepts overlap, however
One idea: Get a box and a few NICs (DMA)
◦ Not a bad experimentation setup for new protocols
◦ Or cross-protocol examination
  Not so hot for performance
Another idea: Custom Hardware
◦ A shared-memory switch (memory with dual ports)
◦ Crossbar switch
◦ Switches that attempt to self-route (3.4-3.5, Batcher & Banyan)

Switch Implementation & Perf



Workstation as a Switch (33-34)

Advantage: flexible because a workstation has a CPU.
Example
◦ 33 MHz, 32-bit I/O bus
  About 1 Gbps one way from NIC to main memory
  About 500 Mbps for a round trip between NIC and main memory
  Enough to support five 100 Mbps Ethernets
◦ What if a packet is very small, like 64 bytes?
  Suppose the workstation can process 500,000 packets per second (pps).
  Throughput: 500,000 x 64 x 8 = 256 Mbps

[Figure: a workstation used as a switch; three NICs (LAN A, LAN B, LAN C) attach to the I/O bus, which connects through the I/O controller to the CPU and main memory]
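The arithmetic for the workstation example can be checked with a few lines of Python. The function names are hypothetical; the inputs (33 MHz, 32-bit bus, 500,000 pps, 64-byte packets) are the figures from the slide.

```python
BUS_CLOCK_HZ = 33_000_000   # 33 MHz I/O bus
BUS_WIDTH_BITS = 32         # 32 bits transferred per cycle


def bus_bandwidth_bps(clock_hz, width_bits):
    """Peak one-way bandwidth of the bus in bits per second."""
    return clock_hz * width_bits


def forwarding_bandwidth_bps(clock_hz, width_bits):
    """Each forwarded packet crosses the bus twice (NIC -> memory -> NIC)."""
    return bus_bandwidth_bps(clock_hz, width_bits) // 2


def packet_limited_throughput_bps(pps, packet_bytes):
    """Throughput when the per-packet processing rate is the bottleneck."""
    return pps * packet_bytes * 8


print(bus_bandwidth_bps(BUS_CLOCK_HZ, BUS_WIDTH_BITS))         # 1056000000, about 1 Gbps
print(forwarding_bandwidth_bps(BUS_CLOCK_HZ, BUS_WIDTH_BITS))  # 528000000, about 500 Mbps
print(packet_limited_throughput_bps(500_000, 64))              # 256000000, i.e. 256 Mbps
```

For small packets the per-packet cost dominates: 256 Mbps is roughly half of what the bus alone could sustain.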



Shared Bus/Memory-Based Switch

A simple design
Shared bus or memory becomes a bottleneck. (Max. 16 bus masters)
[Figure: input ports and output ports attached to a shared bus, with shared memory and a control processor; DMA from port to port]



Crossbar Switch

Without a collision, all inputs can be delivered to their outputs at the same time
All inputs may go to the same output, which causes a collision in the output buffer.

Connection-oriented packet switching
◦ Uses signaling (Protocol Q.2931)
WAN, but more recently LANs
Runs on various physical mediums
◦ SONET
◦ Shared Media such as Wireless
◦ Shared-Media like Ethernet (with LANE)

Packets are called cells, which are fixed length (48 + 5 Bytes)

ATM is Cell Switching

LAN packets vs. ATM cells
◦ Consider also CISC vs. RISC
  In this light, certain features of ATM shine
Observations for a short and simple approach:
◦ It’s easier to build HW to do simple (short) jobs
◦ The processing of data is simpler when fixed format
  RISC ISA commonly has only a few instruction formats
  Off topic: 802.5 & Dec.Intel.Xerox Ethernet standard
  Meaning: Compatibility can be simpler with a common format
◦ Simple and short data {frames, instructions} can often be “trained” or “pipelined”

A Quick Hardware Analogy

Observation: homogeneous packet length lends itself to homogeneous switching structures
◦ Short and uniform structures can make the task of exploiting parallelism easier
  Either at the hardware level
    See simultaneous multithreading
  Or along protocol stack (simultaneous packet processing, self-organizing streams, etc.)
◦ Uniformity at higher levels tends to promote uniform hardware designs
  Since this is not custom, often cheaper to build this fast, scalable hardware

Analogy Further

Fixed length instructions help to align, fetch, prefetch, optimize, synchronize, reorder etc.
◦ See the original 360 and Robert Tomasulo
Variable length instructions are more complex by design,
◦ possibly requiring multiple cycles to fetch a longer instruction
◦ And/or more trips across the bus to and from memory

All said and done, Ethernet LANs are just as convincing due to their speed, cost, success & adoption rate

Finally…

Error detection is implemented at endpoints
◦ End-to-end but not at each switch (i.e., at data link layer)
Congestion control
◦ Admission control
  If switches are completely reserved, decline connections

ATM Features

Fixed-size cells can make this easier
One Approach: use some SONET overhead to point to the start of the cell in the payload
Another Approach: CRC every 5 bytes
◦ If you see no error, you’re likely at an ATM header
  Repeat this approach looking for the same results every 53 bytes

See p.199 for the frame format
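The CRC-hunting approach can be sketched in Python. Note the assumptions, which are not stated on the slide: the header check is taken to be the standard ATM HEC (a CRC-8 with generator x^8 + x^2 + x + 1 over the first four header bytes, XORed with 0x55, stored in the fifth byte), and the number of consecutive confirmations is an arbitrary choice. All function names are hypothetical.

```python
CELL_SIZE = 53  # 5-byte header + 48-byte payload


def crc8(data, poly=0x07):
    """Bitwise CRC-8, MSB first, initial value 0 (generator x^8 + x^2 + x + 1)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def hec(header4):
    """Assumed header check byte for the first four header bytes."""
    return crc8(header4) ^ 0x55


def header_ok(stream, i):
    """Does a plausible 5-byte header start at offset i?"""
    return i + 5 <= len(stream) and hec(stream[i:i + 4]) == stream[i + 4]


def find_cell_start(stream, confirmations=3):
    """First offset where the check passes and repeats every 53 bytes."""
    for start in range(len(stream)):
        if all(header_ok(stream, start + k * CELL_SIZE) for k in range(confirmations)):
            return start
    return None


def make_cell(header4, payload):
    """Build a 53-byte cell (illustration only; payload is zero-padded)."""
    return header4 + bytes([hec(header4)]) + payload.ljust(48, b"\x00")


# Seven bytes of line noise (zeros here) followed by four cells.
stream = bytes(7) + b"".join(make_cell(bytes(4), b"") for _ in range(4))
print(find_cell_start(stream))  # 7
```

Requiring the check to pass at several consecutive 53-byte offsets is what makes a chance match in the payload unlikely to be mistaken for a header.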

Framing in ATM

Not exhaustive
ATM offers QoS features
ATM offers flow control, LANs are “best effort”
ATMs are conservative resource-wise
◦ Connectionless protocols are minimalist
ATM can guarantee resources ahead of time
◦ Useful esp. for voice-grade guarantees
Fixed length vs. variable length packets
No broadcast (natively) vs. only broadcast

ATM & LAN Bridging

Layers were built on top of ATM to support other styles of networks and services
◦ AAL 1-2 is for voice grade guaranteed bit rates
◦ AAL 3-4 is for packet data over ATM
  This requires S&R, since MTU for Ethernet >> 53B
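A rough Python sketch of segmentation and reassembly follows. It is deliberately simplified and does not reproduce any real AAL format: it assumes zero padding in the last cell and that the receiver learns the original length by some other means. The function names are hypothetical.

```python
CELL_PAYLOAD = 48  # payload bytes per ATM cell (the other 5 bytes are header)


def segment(packet):
    """Split a packet into 48-byte cell payloads, zero-padding the last one."""
    cells = []
    for i in range(0, len(packet), CELL_PAYLOAD):
        chunk = packet[i:i + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))
    return cells


def reassemble(cells, length):
    """Concatenate the payloads in order and strip the padding using the known length."""
    return b"".join(cells)[:length]


packet = bytes(1500)           # a full-size Ethernet payload
cells = segment(packet)
print(len(cells))              # 32 cells, since 31 x 48 = 1488 < 1500 <= 32 x 48
assert reassemble(cells, len(packet)) == packet
```

Reassembly here relies on cells arriving in order, which a virtual circuit provides because every cell follows the same path.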

Segmentation & Reassembly

When packets are being discarded frequently due to lack of resources
◦ arrivalRate > sendRate + bufferSpace for some t
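The drop condition can be illustrated with a tiny Python simulation of one output port. This is a sketch under simplifying assumptions: time advances in discrete steps, rates are counted in packets per step, and the function name simulate is hypothetical.

```python
def simulate(arrivals, send_rate, buffer_space):
    """Count packets dropped at an output port.

    arrivals: packets arriving in each time step
    send_rate: packets the port can transmit per step
    buffer_space: packets that can wait in the queue between steps
    """
    queued = 0
    dropped = 0
    for arriving in arrivals:
        queued += arriving
        queued -= min(queued, send_rate)   # transmit what the link allows
        if queued > buffer_space:          # the rest must fit in the buffer
            dropped += queued - buffer_space
            queued = buffer_space
    return dropped


print(simulate([3, 3, 3], send_rate=3, buffer_space=0))  # arrivals match the link
print(simulate([5, 5, 5], send_rate=3, buffer_space=4))  # sustained overload
```

In the second run the buffer absorbs the excess for two steps and then overflows, so a short burst is tolerated while sustained overload is not.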

Switch Congestion
