the switch book by rich seifert-notes
TRANSCRIPT
Switch Book Layer 2 concepts
1/51
THE SWITCH BOOK by Rich Seifert
CHAPTER-1
Foundations Of LAN Switches:
Network Architecture:
OSI LAYER:(Open System Interconnect)
It consists of seven layers of network system functions.
1. Physical Layer:
- Transmission and reception of signals on the communications medium.
- Data is sent as bits: 0s and 1s.
- This layer is a function of the design of the physical medium (cabling).
2. Data Link Layer:
- Provides direct communication between devices.
- Communications are of two types: point-to-point and point-to-multipoint.
- It provides mechanisms for: 1. Framing 2. Addressing 3. Error Detection.
- 2 modes of operation:
1. Connectionless: (a) Just forwards the frame and does not receive acknowledgements. (b) Does not provide error control or flow control.
2. Connection-oriented: (a) Continual exchange of data with acknowledgements. (b) Provides error and flow control.
3. Network Layer:
- Station-to-station data delivery across multiple links.
- Routing of packets across the internetwork, usually through routers.
- Protocols include: IP, IPX, AppleTalk, etc.
4. Transport Layer:
- Shields the upper layers from the lower layers.
- Provides error-free, sequenced, guaranteed delivery service.
- Mechanisms: 1. Connection establishment 2. Error recovery 3. Flow control.
- Protocols: TCP, ATP, SPX, etc.
5. Session Layer:
- Establishment of communications sessions between applications.
- Deals with user authentication and access control (passwords).
6. Presentation Layer:
- Presents data in the proper format to the application layer.
- Data formats: encryption/decryption, encoding/decoding.
7. Application Layer:
- Provides APIs that allow user applications to communicate across the network.
- Functions such as FTP, mail utilities, SMTP, NFS, etc.
Data link sublayering:
1. Logical Link Control (LLC): The upper sublayer. Provides the data link service (connectionless or connection-oriented) to higher-layer clients, independent of the underlying LAN. There are 3 types of service:
1. LLC Type 1: Connectionless service
2. LLC Type 2: Connection-oriented service
3. LLC Type 3: Acknowledged connectionless service
2. Medium Access Control (MAC): The lower sublayer. Deals with the details of frame formats associated with the particular technology in use.
LLC Frame Format:
MAC Header | Dest SAP (1 byte) | Source SAP (1 byte) | CTRL (1 byte) | DATA
LLC/SNAP Format: If the SAP is set to 0xAA, then SNAP is in use.
MAC Header | Dest SAP = 0xAA | Source SAP = 0xAA | CTRL | SNAP OUI | SNAP PID | Data
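A minimal sketch of parsing this header, assuming the standard SNAP field widths (3-byte OUI, 2-byte PID); the example frame bytes are made up:

```python
def parse_llc(payload: bytes) -> dict:
    """Parse an LLC header; if DSAP/SSAP are 0xAA, a SNAP header follows."""
    dsap, ssap, ctrl = payload[0], payload[1], payload[2]
    info = {"dsap": dsap, "ssap": ssap, "ctrl": ctrl}
    if dsap == 0xAA and ssap == 0xAA:
        info["snap_oui"] = payload[3:6]   # identifies the organization
        info["snap_pid"] = payload[6:8]   # protocol id within that OUI
        info["data"] = payload[8:]
    else:
        info["data"] = payload[3:]
    return info

# Example: SNAP-encapsulated IPv4 (OUI 00-00-00, PID 08-00)
hdr = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x00]) + b"payload"
parsed = parse_llc(hdr)
```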
Addressing:
MAC Address: It's a 48-bit address used at the data link layer. It is also called the hardware address or physical address.
Bytes 1-3: OUI | Bytes 4-6: organizationally assigned portion
The OUI is used to denote the manufacturer. Force10 has the OUI 00-01-E8.
ETHERNET:
- Low-cost, high-speed communication.
Frame transmission:
- Sense the carrier.
- Wait for the interframe gap.
- Transmission takes place.
Frame reception:
- The station monitors the channel for an incoming frame.
- When the channel becomes non-idle, it starts receiving bits.
- Frames less than one slot time in length are discarded (collision fragments).
- If the frame meets the minimum length, the FCS is checked; if valid, the receiver checks the DA to see whether it matches the physical address of the receiving station.
- If it matches, the frame is forwarded to the client.
Ethernet Frame Formats:
Type encapsulation: Ethernet Version 2
Preamble/SFD | DA | SA | TYPE | DATA | FCS
Bytes: 8 | 6 | 6 | 2 | 46-1500 | 4
Length encapsulation: IEEE 802.3 (the LLC header DSAP/SSAP/CTRL plus data and pad occupy the 46-1500 byte field)
Preamble/SFD | DA | SA | LENGTH | DSAP | SSAP | CTRL | DATA | PAD | FCS
Bytes: 8 | 6 | 6 | 2 | 1 | 1 | 1 | (data+pad fills out 46-1500) | 4
Preamble: Consists of 7 bytes; allows receivers to synchronize on the incoming frame. Each byte has the value 0x55.
SFD: Consists of 1 byte; signifies the beginning of the DA. Its value is 0xD5.
DA: Destination address of the frame. It consists of 6 bytes.
SA: Source address of the frame. It consists of 6 bytes.
DATA: Consists of 46-1500 bytes. It encapsulates the higher-layer protocol information being transferred across the Ethernet.
Pad: Adds extra bytes when the data is less than 46 bytes. Without padding, such a frame would be shorter than the minimum length and would be discarded, so the pad field prevents this.
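The pad calculation above can be sketched in a couple of lines; the 46-byte figure is the minimum data-field size from the frame format:

```python
def pad_length(data_len: int, min_data: int = 46) -> int:
    """Bytes of padding needed so the data+pad field reaches the 46-byte
    minimum, keeping the whole frame at the Ethernet minimum length."""
    return max(0, min_data - data_len)

# A 20-byte payload needs 26 bytes of pad; a full payload needs none.
short_pad = pad_length(20)
full_pad = pad_length(1500)
```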
LAYER ENCAPSULATION:
PL HEADER | ETH HEADER | IP HEADER | TCP HEADER | APPLICATION DATA | ETH TRAILER | PL TRAILER
- TCP header + application data = TCP segment
- IP header + TCP segment = IP packet
- Ethernet header + IP packet + Ethernet trailer = Ethernet frame
- The physical layer carries the frame, with its own header and trailer, as a stream of bits.
A Transport PDU is called a segment or message.
A Network PDU is called a packet.
A Data Link PDU is called a frame.
A Physical Layer PDU is called a symbol stream.
PDU: Protocol Data Unit.
CHAPTER-2
TRANSPARENT BRIDGES
Transparent bridges.
Now getting into details of how things actually work…
Transparent bridges are so named because their presence and operation are transparent to
network hosts. When transparent bridges are powered on, they learn the network's
topology by analyzing the source address of incoming frames from all attached
networks.
If, for example, a bridge sees a frame arrive on Line 1 from Host A, the bridge concludes that
Host A can be reached through the network connected to Line 1. Through this
process, transparent bridges build a table.
Host address | Network number
15 | 1
17 | 1
12 | 2
... | ...
Figure 1: Transparent bridges build a table that determines a host's accessibility
The bridge uses its table as the basis for traffic forwarding. When a frame is received on one
of the bridge's interfaces, the bridge looks up the frame's destination address in its
internal table. If the table contains an association between the destination address and
any of the bridge's ports aside from the one on which the frame was received, the
frame is forwarded out the indicated port. If no association is found, the frame is
flooded to all ports except the inbound port. Broadcasts and multicasts also are
flooded in this way.
UNICAST OPERATION:
When a frame is received on any port, the bridge extracts the destination address from the frame, looks it up in the table, and determines the port to which the address maps. We have filtering and forwarding concepts.
Filtering: When a packet is received by a node, filtering is the task of:
a) Determining whether to forward the packet at all, and
b) Determining which port(s) to forward the packet to.
Filtering makes network operation more efficient by reducing the number of output ports to which the packet needs to be sent. For example: unicast packets need to go to only one output port, and that output port should be the next step on the desired path to the destination. Multicast packets need to go to a subset of ports. The forwarding table encodes this subset of ports and avoids the need to carry such information in the packet itself.
Forwarding:
Given a packet at a node, finding which output port it needs to go to is called "forwarding"; it is a per-node function, whereas routing may encompass several nodes.
The forwarding function is performed by every node in the network, including hosts, repeaters, bridges, and routers.
Forwarding is trivial in the case of a single-port node (or a dual-port node, where the destination is on the port other than the input port); in this case you don't even need addresses.
Generating the address table:
1) The address table can be built automatically by examining the source address in received frames.
2) Bridges perform a table lookup on the destination address in order to determine on which port(s) to forward the frame.
Address table aging:
If all we ever did was add learned addresses to the table and never remove them, we would have two problems:
1) The larger the table, the more time a lookup requires. Thus we have to restrict entries in the table to only those stations known to be currently active.
2) If a station moves from one port to another, the table will incorrectly indicate the old port until the station sends traffic that causes the bridge to learn its new location.
The simple solution to both problems is to age entries out of the address table when a station has not been heard from for some period of time. Thus, when we perform the table lookup for the source address, we not only make a new entry, we also flag the entry as still active. On a regular basis we check for stale entries (entries that have not been flagged as active for some period of time) and remove them from the table.
Process Model of Table Operation
1. A lookup process compares the destination address in incoming frames to the entries in the table to determine whether to discard the frame, forward it to a specific port, or flood it to all ports.
2. A learning process compares the source address in incoming frames to the entries in the table and updates the port mapping and activity indicators, or creates new entries as needed.
3. An aging process removes stale entries from the table on a regular basis.
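The three processes above can be sketched as a toy bridge table; the port numbers, MAC strings, and the 300-second age-out interval are illustrative choices, not from the text:

```python
import time

class BridgeTable:
    """Toy model of the lookup, learning, and aging processes."""

    def __init__(self, max_age=300.0):
        self.table = {}          # MAC address -> (port, last_seen)
        self.max_age = max_age

    def learn(self, src_mac, port, now=None):
        # Learning process: map the source address to the arrival port
        # and refresh its activity timestamp.
        self.table[src_mac] = (port, now if now is not None else time.time())

    def lookup(self, dst_mac, in_port):
        # Lookup process: filter, forward to one port, or flood.
        entry = self.table.get(dst_mac)
        if entry is None:
            return "flood"               # unknown DA: all ports except in_port
        port, _ = entry
        return "filter" if port == in_port else port

    def age(self, now=None):
        # Aging process: drop entries not heard from within max_age.
        now = now if now is not None else time.time()
        stale = [m for m, (_, seen) in self.table.items()
                 if now - seen > self.max_age]
        for m in stale:
            del self.table[m]

bt = BridgeTable()
bt.learn("00:01:e8:aa:bb:cc", port=1, now=0.0)
decision = bt.lookup("00:01:e8:aa:bb:cc", in_port=2)   # forward to port 1
unknown = bt.lookup("ff:ee:dd:00:11:22", in_port=2)    # flood
bt.age(now=1000.0)                                     # entry ages out
```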
Custom Filtering and Forwarding
We can add filtering and forwarding criteria beyond the defaults. Many commercial bridges allow the network administrator to program custom filter and forwarding criteria; for example, the network administrator may wish to:
1. Prevent specific users from accessing certain resources.
2. Prevent sensitive traffic from propagating beyond a set of controlled LANs.
3. Limit the amount of multicast traffic that is flooded onto certain LANs.
Implementing the bridge address table
Table operations:
Three operations need to be performed on the bridge address table: destination address lookup, source address learning, and entry aging. Considering the relative priority of these operations, the table design should be optimized for fast, real-time lookup, at the expense of slower and more complex update and aging algorithms if need be.
Search Algorithms:
1) Hash tables
2) Binary search
3) Content addressable Memories (CAM)
You can compare a CAM to the inverse of RAM. When read, RAM produces the data stored at a given address. Conversely, a CAM produces an address for a given data word. When searching for data within a RAM block, the search is performed serially, so finding a particular data word can take many cycles. A CAM searches all addresses in parallel and produces the address storing a particular word. You can use a CAM for any application requiring high-speed searches, such as networking, communications, data compression, and cache management.
Aging entries from the table
The aging process is a non-critical, low-priority task. It can be done in the background without significant performance or operational penalty. A common mechanism is to maintain two bits per entry: Valid (V) and Hit (H). The Valid (V) bit indicates that a table entry is valid; the Hit (H) bit indicates that this entry has been "hit", that is, seen as a source address, during the most recent aging cycle.
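A minimal sketch of one aging cycle using the V/H bits just described (the dictionary representation is an assumption for illustration):

```python
# Each entry keeps a Valid (V) bit and a Hit (H) bit. Seeing a source
# address sets H; each aging cycle invalidates entries whose H bit was
# never set, then clears H for the next cycle.
def aging_cycle(entries):
    for entry in entries.values():
        if not entry["H"]:
            entry["V"] = False   # not heard from: invalidate
        entry["H"] = False       # reset for the next cycle

entries = {
    "A": {"V": True, "H": True},    # active station
    "B": {"V": True, "H": False},   # silent during this cycle
}
aging_cycle(entries)
```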
The IEEE 802.1D standard
In addition to the formal description of transparent bridge operation, the standard provides:
1) An architectural framework for the operation of bridges, including formal specifications for interlayer services.
2) A formal description of the bridge address table, frame filtering, and forwarding, including static and dynamic entries, forwarding rules, and custom filter definitions.
3) A set of operating parameters for interoperation of bridged catenets.
CHAPTER-4
Principles of LAN Switches
1. Switched LAN concepts
- Access domains: the set of stations sharing a given LAN and arbitrating among themselves using whatever access control mechanism is appropriate for that LAN.
- Collision domains: in an Ethernet LAN, the set of stations contending for access to the shared Ethernet LAN forms a collision domain.
- Token domains: similarly, the set of stations contending for use of the token on a token-passing LAN forms a token domain.
- Both collision and token domains are examples of access domains.
- Each port on the switch acts as the terminus of the access domain for that particular link.
- It is the switch that separates the access domain of each port.
Segmentation and Microsegmentation:
- Segmentation is connecting a group of stations to each port of the switch (i.e., each port is connected to a shared LAN). A switch used in this manner provides a collapsed backbone.
- To overcome the drawbacks of the collapsed backbone, the concept of microsegmentation comes into play.
- Microsegmentation: the direct connection of an individual end station to each switch port.
- Interesting characteristics of microsegmentation:
1. No access contention (i.e., no collisions) when operating in full duplex mode.
2. It is possible to eliminate access control entirely when full duplex is used.
3. There is dedicated bandwidth for each station (a dedicated LAN segment per station), so the data rate of each port is independent. For example, one port can run at 10 Mbps while another runs at 100 Mbps or 1000 Mbps.
Extended distance limitations:
- Switches allow us to extend the distance coverage of a LAN.
- Using full duplex (i.e., microsegmentation), the distance constraints can be eliminated.
Increased aggregate capacity:
- A switch provides greater data-carrying capability than a shared LAN.
- Since a switch provides dedicated capacity on each port, the total LAN capacity increases with the number of switch ports. So the aggregate capacity will equal:
Capacity_agg = sum over ports 1..n of DataRate_port
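A quick worked example of that sum, for a hypothetical switch with four 100 Mbps ports and two 1000 Mbps ports:

```python
# Aggregate capacity is simply the sum of the per-port data rates.
port_rates_mbps = [100, 100, 100, 100, 1000, 1000]
capacity_agg = sum(port_rates_mbps)
```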
2. Cut-Through versus Store-and-Forward
- Store-and-forward: as the name implies, each frame is received (stored) completely, and then decisions are made regarding whether and where to forward the frame.
- This is done based on the destination address in the Ethernet frame. The destination address is the first field in the Ethernet frame.
- So in this method the switch waits up to 1.2 ms (the time to receive a maximum-length frame at 10 Mbps) for the frame to be fully received, and then the decision and forwarding are done.
- To reduce this receive-and-forward delay, the concept of cut-through comes into the picture.
- Cut-through: the switch begins transmitting the frame before the frame is fully received on the input side. Since the destination address is the first field in the Ethernet frame, as soon as the switch reads the destination address it forwards the frame toward the destination.
- The switch can receive the destination address field within 11.2 us, make the decision, and forward. So the switch need not wait for the whole frame to be received.
- Because of this advantage, cut-through mode has lower latency than store-and-forward.
- The implication was that a cut-through switch provided a 20:1 performance improvement over a store-and-forward switch. There are a number of fallacies with this conclusion:
1. Absolute latency is not a significant issue for most higher-layer protocols and applications (at least not latency on the order of a few milliseconds).
2. For those protocols that are sensitive to latency, the switch is only a small part of the problem.
3. Any latency benefit accrues only when the output port is available.
4. Cut-through operation is generally not possible for multicast or unknown destination addresses.
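The two latency figures quoted above can be reproduced for 10 Mbps Ethernet, assuming the standard 8-byte preamble/SFD, 6-byte DA, and 1518-byte maximum frame:

```python
BIT_TIME_US = 0.1   # one bit time at 10 Mbps, in microseconds

# Cut-through can act after preamble/SFD (8 bytes) plus DA (6 bytes).
cut_through_us = (8 + 6) * 8 * BIT_TIME_US

# Store-and-forward must wait for a maximum-length frame.
store_forward_us = (8 + 1518) * 8 * BIT_TIME_US   # roughly 1.2 ms
```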
CHAPTER 5
Loop resolution
Spanning tree protocol
- Frames would loop for an indefinite period of time in networks with physically redundant links.
- To prevent looping frames, STP blocks some ports from forwarding frames so that only one active path exists between any pair of LAN segments (collision domains).
- The result of STP is both good and bad.
- Good: frames do not loop infinitely, which makes the LAN usable.
- Bad: the network does not actively take advantage of some of the redundant links, because they are blocked to prevent frames from looping. Some users' traffic travels a seemingly longer path through the network, because a shorter physical path is blocked.
- However, the net result is GOOD.
Terminology
Tree topology: Think of a tree. There is a root, branches (actually, a hierarchy of
progressively smaller branches), and ultimately leaves. On a given tree, there are no
disconnected parts that are still considered part of the tree; that is, the tree encompasses
all of its leaves. In addition, there are no loops in the tree. Thus a tree is a loop-free
topology that spans all of its parts.
Root Bridge: just as a tree has a root, spanning tree has a Root Bridge. The root Bridge is
the logical center (but not necessarily the physical center) of the catenet. There is always
exactly one Root Bridge in a catenet.
Designated Bridge: the bridge responsible for forwarding traffic in the direction from
the root to a given link is known as the designated bridge for that link.
Designated Port: the port in the active topology used to forward traffic away from the
root on to the link(s) for which this bridge is the Designated Bridge.
Root Port: the port in the active topology that provides connectivity from the designated
bridge towards the root.
Bridge identifier: in order to properly configure, calculate, and maintain the spanning tree, there needs to be a way to uniquely identify each bridge in the catenet and each port within the bridge.
A bridge identifier is a 64-bit field unique to each bridge in the catenet. The bridge ID is the concatenation of a 16-bit "priority" value and a globally unique 48-bit field (a MAC address).
Bridge ID: the priority ranges from 0 to 65,535 (2^16 values); the default priority value is 32768 (0x8000).
Port identifier: each port of the bridge is assigned a port ID. Similar to the bridge ID, a port ID concatenates an 8-bit priority field and a unique 8-bit port number. The range of the priority field in the port ID is 0 to 255 (0xFF); the default value is 128 (0x80).
Link and link cost: each port on a bridge connects to a link. That link may be a high-speed LAN or, alternatively, some wide area communications technology. The STP attempts to configure the catenet such that every end station is reachable from the root through the path with the lowest cost. By default:
Link cost = 1000 / data rate in Mbps
Table: link cost recommendations
DATA RATE | RECOMMENDED LINK COST RANGE | RECOMMENDED LINK COST VALUE
4 Mbps    | 100-1000 | 250
10 Mbps   | 50-600   | 100
16 Mbps   | 40-400   | 62
100 Mbps  | 10-60    | 19
1 Gbps    | 3-5      | 4
10 Gbps   | 1-5      | 2
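The default formula can be checked against the table; note that at higher speeds the recommended values deviate from the raw formula (e.g. 19 rather than 10 for 100 Mbps), since costs must remain small positive integers:

```python
def default_link_cost(rate_mbps: float) -> float:
    """The default STP link cost formula: 1000 / data rate in Mbps."""
    return 1000 / rate_mbps

cost_10m = default_link_cost(10)     # matches the table's 100
cost_100m = default_link_cost(100)   # 10, vs. the recommended value 19
```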
Path cost: as stated earlier, the STP attempts to configure the catenet such that every
station is reachable from the root through the path with the lowest cost. The cost of a path
is the sum of the cost of the links attached to the root ports in that path, as calculated
earlier.
Calculating and maintaining the spanning tree:
The spanning tree topology for a given set of links and bridges is determined by the
bridge id, the link cost, and the port id associated with the bridges in the catenet.
Logically, we need to perform three operations:
1. Determine (elect) a root bridge.
2. Determine (elect) the designated bridge and designated port for each link.
3. Maintain the topology over time.
In practice, all of these are done in parallel, through the spanning tree algorithm operating identically and independently in each bridge.
Elect a root
To elect a root there is an election algorithm: the bridge with the numerically lowest bridge ID becomes the root bridge at any given time.
Elect the designated bridges and designated ports:
- By definition, the root bridge is the designated bridge for each link to which it attaches.
- For other links, the designated bridge is elected with the help of the cost factor: the bridge with the lowest path cost back to the root becomes the designated bridge for the link.
- If there is a tie in path cost, the bridge with the lowest-numbered bridge ID becomes the designated bridge.
- For a particular link there can be only one designated port, so the port with the lowest-numbered port ID becomes the designated port.
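The root election reduces to a numeric minimum over (priority, MAC) pairs; the bridge MAC addresses below are made-up examples:

```python
# Bridge ID = (priority, MAC address), compared numerically: priority
# first, then the MAC. The lowest ID wins the root election.
bridges = [
    (32768, "00:01:e8:00:00:03"),
    (32768, "00:01:e8:00:00:01"),   # same priority, lowest MAC
    (40960, "00:01:e8:00:00:00"),   # lower MAC, but higher priority
]

# Python tuple comparison (priority first, then the fixed-width MAC
# string) matches the "numerically lowest bridge ID" rule.
root = min(bridges)
```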
Spanning tree maintenance:
In normal (steady-state) operation, to maintain the tree, the protocol operates as follows:
- Once every Hello Time (2 seconds), the root bridge transmits a configuration message encoded as a BPDU.
- All bridges sharing links with the root bridge receive the BPDU and pass it to the STP entity within the bridge. Unlike data frames, the BPDU is not forwarded by the bridge to the end stations.
- Each designated bridge creates a new BPDU based on the BPDU received from the root bridge and then transmits that message.
- So at each tier, the designated bridges update the BPDU with their own information and transmit it to the next tier. This process continues until there are no more designated bridges.
CHAPTER 7
Full Duplex Operation
Half duplex: one device transmits while the other devices receive.
Full duplex channel: a communication channel that supports data transfer in both directions simultaneously.
Half-duplex works only if one device is transmitting and all the other devices are receiving; otherwise, collisions occur. When collisions are detected, the devices causing the collision wait for a random time before retransmitting. Half-duplex is the most common transmission method and is adequate for normal workstation and PC connections.
Full-duplex provides two-way communication on a point-to-point connection and allows each device to transmit and receive simultaneously. Full-duplex mode is typically used to connect to other switches or to connect fast access devices such as workgroup servers.
To use full-duplex communication, both ends of the connection must be configured to operate in full-duplex mode. Full-duplex operation is only possible on point-to-point Ethernet connections that use separate conductors or fibers for transmit and receive, such as 10Base-T and 100Base-FX cabling. Full-duplex operation is not possible on connections using coaxial or AUI (10Base5) cables or with most hubs.
Full duplex operation in a LAN depends on:
1. The use of dedicated media as provided by popular structured cabling (10Base-T, 1000Base-SX, 1000Base-LX, etc.).
2. The use of microsegmented (one PC to one port), dedicated LANs.
For full duplex operation to occur:
1. There must be exactly 2 devices on the LAN (switch-to-PC, PC-to-PC, etc.).
2. The physical cabling must support full duplex.
3. The Ethernet MAC must be configured to work in full duplex mode (with collision detection disabled).
Full duplex operation is a subset of half duplex, with the half-duplex access functions disabled (no carrier sense, no multiple access, no collision detection).
Implications of full duplex operation:
1. Eliminates collisions.
2. Increases aggregate channel capacity.
3. Increases the potential load on the switch.
Transmitter operation:
A full duplex transmitter sends frames following two simple rules:
1.) The station sends frame by frame; that is, it finishes sending one frame before sending the next pending frame.
2.) The transmitter sends frames separated by the interframe gap, which gives the receiver some time to perform housekeeping chores.
Receiver operation:
1.) The receiver waits for a valid SFD and then begins to assemble the data link encapsulation of the frame.
2.) The destination address is checked to see whether it matches the device; otherwise the frame is discarded.
3.) The FCS is checked, and any invalid frame is discarded.
4.) The frame length is checked, and frames shorter than the minimum length are discarded.
5.) The receiver passes up to its client all frames that have passed the previous tests.
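A minimal sketch of the receiver checks just listed, applied in the same order (DA, FCS, minimum length). CRC-32 stands in for the Ethernet FCS, the MAC and payload bytes are made up, and a real MAC would also accept broadcast/multicast DAs:

```python
import zlib

def accept_frame(frame: bytes, my_mac: bytes, min_len: int = 64) -> bool:
    """Return True if the frame passes the DA, FCS, and length checks."""
    if len(frame) < 14:
        return False                    # too short to even parse a header
    if frame[:6] != my_mac:
        return False                    # DA does not match this station
    body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "little") != fcs:
        return False                    # corrupted frame
    if len(frame) < min_len:
        return False                    # shorter than the minimum length
    return True

mac = bytes.fromhex("0001e8aabbcc")
body = mac + bytes.fromhex("0001e8112233") + bytes(50)   # DA + SA + payload
frame = body + zlib.crc32(body).to_bytes(4, "little")
ok = accept_frame(frame, mac)
```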
Full Duplex Application Environments:
Full duplex operation is most often seen in:
1.) Switch-to-switch connections: increased capacity, meets the two-station LAN requirement for full duplex operation, and supports link lengths in excess of those allowed by CSMA/CD.
2.) Server and router connections: increased capacity; such devices justify the use of dedicated switch ports, even at very high speeds.
3.) Long-distance connections: optical fiber is commonly used, as it supports long distances.
Chapter 8
LAN and Switch Flow Control
The need for flow control:
Both LANs and LAN switches are connectionless in nature. Frames are transferred
without error to a high degree of probability, but there is no absolute assurance of
success.
In the event of a bit error, receiver buffer unavailability, or any other abnormal
occurrence, a receiver simply discards the frame without providing any
notification of the fact. This allows LAN interfaces to be built at very low cost; a
connectionless system is much simpler to implement than a system that includes
mechanisms for error recovery and flow control within the data link.
Default switch behavior
A switch receives frames on its input ports and forwards them onto the appropriate output ports based on information (typically the DA) in the received frame. Depending on the traffic patterns, switch performance limitations, and available buffer memory, it is possible for frames to arrive faster than the switch can receive, process, and forward them. The default behavior of a switch is to discard frames when faced with a congestion condition.
The Effect of Frame Loss
A higher-layer protocol or application that requires reliable delivery must implement some form of error control. TCP, for example, uses a positive acknowledgment and retransmission (PAR) algorithm. In this scheme, data being transferred in one direction between stations is acknowledged in the other. The originating station does not assume that data has been successfully delivered until an acknowledgment has been received. Depending on the transport protocol, a single lost frame can incur the penalty of idling the data transfer for seconds.
Controlling flow in half duplex networks
Half Duplex with Back Pressure
Half-duplex back pressure forces retransmission of incoming packets when a half-duplex switch port is unable to receive them. When back pressure is enabled and no buffers are available to a port, the switch sends collision signals on the affected port, causing the transmitting station to resend the packets. The switch can then use this retransmission time to clear its receive buffer by sending packets already in the queue.
MAC Control
MAC Control frame format:
Preamble (7 bytes)
Start Frame Delimiter (1 byte)
Dest. MAC Address (6 bytes) = 01-80-C2-00-00-01 or a unique DA
Source MAC Address (6 bytes)
Length/Type (2 bytes) = 802.3 MAC Control (88-08)
MAC Control Opcode (2 bytes) = PAUSE (00-01)
MAC Control Parameters (2 bytes) = 00-00 to FF-FF
Reserved (42 bytes) = all zeros
Frame Check Sequence (4 bytes)
PAUSE Function
The PAUSE function is used to implement flow control on full duplex Ethernet links. PAUSE operation uses the MAC Control architecture and frame format. The operation is defined only for use across a single full duplex link; it cannot be used on a shared LAN. It may be used to control data frame flow between:
- A pair of end stations
- A switch and an end station
- A switch-to-switch link
The PAUSE function is specifically designed to prevent switches from unnecessarily discarding frames due to input buffer overflow under short-term transient overload conditions.
PAUSE operation
PAUSE operation implements a very simple stop-start form of flow control. A device wishing to temporarily inhibit incoming data sends a PAUSE frame, with a parameter indicating the length of time that the full duplex partner should wait before sending any more data frames. When a station receives a PAUSE frame, it stops sending data frames for that period.
A station that has issued a PAUSE may cancel the remainder of the pause period by issuing another PAUSE frame with a parameter of zero time.
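A sketch of building a PAUSE frame from the field values in the MAC Control frame format above (preamble/SFD and FCS omitted); the quanta-to-time conversion assumes one pause quantum equals 512 bit times, and the source MAC is a made-up example:

```python
def build_pause(src_mac: bytes, quanta: int) -> bytes:
    """Assemble the DA..Reserved portion of a PAUSE frame."""
    dst = bytes.fromhex("0180c2000001")    # reserved multicast DA
    length_type = bytes.fromhex("8808")    # 802.3 MAC Control
    opcode = bytes.fromhex("0001")         # PAUSE
    params = quanta.to_bytes(2, "big")     # pause time, 0x0000..0xFFFF
    reserved = bytes(42)                   # all zeros
    return dst + src_mac + length_type + opcode + params + reserved

def pause_seconds(quanta: int, rate_bps: int) -> float:
    """Convert a pause parameter to seconds (1 quantum = 512 bit times)."""
    return quanta * 512 / rate_bps

frame = build_pause(bytes.fromhex("0001e8aabbcc"), quanta=0xFFFF)
t = pause_seconds(0xFFFF, 100_000_000)     # maximum pause at 100 Mbps
```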
FLOW CONTROL IMPLEMENTATION ISSUES
Design implications of the PAUSE function
1) Inserting PAUSE frames in the transmit queue
An Ethernet interface normally transmits frames in the order presented by the device driver. Inserting PAUSE frames in a timely manner is important for the effective use of flow control. The transmission of a PAUSE frame cannot preempt a data transmission in progress; therefore, the interface should complete the transmission of any frame in progress, wait the interframe spacing, and then send the requested PAUSE frame.
2) Parsing received PAUSE frames
An interface must inspect and parse the fields in all incoming frames to determine when a valid PAUSE frame has been received in order to act upon it. Fields such as the DA, the Type field, the MAC Control opcode, and the FCS must be checked.
3) PAUSE timing
Following the reception of the PAUSE frame itself (i.e., starting from the end of the last bit of the received FCS), the interface has a maximum of 512 bit times to validate, decode, and act upon the PAUSE frame. If during this time the transmitter begins transmission of a frame, that frame is completed normally.
Chapter 9
LINK AGGREGATION
Why Link Aggregation?
Link Aggregation or trunking is a method of combining physical network links into a single
logical link for increased bandwidth. With Link aggregation we are able to increase the capacity
and availability of the communications channel between devices (both switches and
end stations) using existing Fast Ethernet and Gigabit Ethernet technology. Two or more Gigabit
Ethernet connections are combined in order to increase the bandwidth capability and to
create resilient and redundant links. A set of multiple parallel physical links between two devices
is grouped together to form a single logical link.
Link Aggregation also provides load balancing where the processing and communications
activity is distributed across several links in a trunk so that no single link is overwhelmed.
By taking multiple LAN connections and treating them as a unified, aggregated link, we can
achieve practical benefits in many applications.
Link Aggregation provides the following important benefits:
- Higher link availability
- Increased link capacity
- Improvements are obtained using existing hardware (no upgrading to higher-capacity link technology is necessary)
Aggregating replaces Upgrading
If the link capacity is to be increased, there are usually two possibilities: either upgrade the native link capacity or use an aggregate of two or more lower-speed links. Upgrades typically occur in factors of 10. In many cases, however, the device cannot take advantage of this increase: a 1:10 performance improvement is not achieved, and the bottleneck is just moved from the network link to some other element within the device.
Link aggregation may be less expensive than a native speed upgrade and yet achieve a similar performance level. The hardware costs for a higher-speed link and for the equivalent number of lower-speed connections have to be weighed to decide which approach is the most advantageous.
Sometimes link aggregation may even be the only means to improve performance, when the highest data rate available on the market is not sufficient.
Types of Link Aggregation
There are a number of situations where link aggregation is commonly deployed:
- Switch-to-switch connections
- Switch-to-station (server or router) connections
- Station-to-station connections
Switch-to-Switch Connections
In this scenario, multiple workgroups are joined to form one aggregated link. By aggregating multiple links, a higher-speed connection can be achieved without a hardware upgrade.
Switch-to-Station (Server or Router) Connections
Most server platforms can saturate a single 100 Mb/s link with many of the applications available today. Thus, link capacity becomes the limiting factor for overall system performance.
Station-to-Station Connections
In the case of aggregation directly between a pair of end stations, no switches are involved at all. As in the station-to-switch case, the higher performance channel is created without having to upgrade to higher-speed LAN hardware. In some cases, higher-speed NICs may not even be available for a particular server platform, making link aggregation the only practical choice for improved performance.
Physical issues in Link Aggregation
Addressing
Each network interface controller is assigned a unique MAC address, usually programmed
into a ROM during manufacturing. During initialization, the device driver reads the contents
of the ROM and transfers the address to a register within the MAC controller. In most cases,
this address is used as the source address in transmitted frames and as the destination address
in frames sent to the device. An aggregated link is to appear as a single link with a single
logical network interface and therefore has only one "virtual" MAC address. The MAC address of
one of the interfaces belonging to the aggregation provides this "virtual" address for the
logical link.
Frame Distribution (transmission of frames)
When applying WAN technologies, frames are sometimes broken into smaller units to accelerate
transmission. LAN communications channels, however, do not support sub-frame transfers: a
complete frame must be sent through a single physical link. With aggregated links, the task is
therefore to select the link on which to transmit a given frame. Sending one long frame may take
longer than sending several short ones, so short frames sent later could arrive before an earlier
long frame, and the order would have to be restored at the receiver. Thus a convention is adopted:
all frames belonging to one conversation are transmitted through the same physical link, which
guarantees correct ordering at the receiving end station; for this reason, no sequencing
information needs to be added to the frames. Traffic belonging to separate conversations can be
distributed freely across the links. The algorithm for assigning frames to a conversation
depends on the application environment and the kind of devices used at each end of the link.
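The conversation-to-link assignment described above can be sketched as a deterministic hash over address fields. This is an illustrative sketch, not the algorithm from the book or any standard: the choice of hashing over the MAC address pair and the trivial checksum are assumptions, and a real implementation would use a stronger hash and might fold in higher-layer fields.

```python
def select_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Map a conversation (identified here by its MAC address pair)
    to one physical link of the aggregation. Because the hash is
    deterministic, all frames of the same conversation go out the
    same link, preserving frame order at the receiver."""
    key = (src_mac + dst_mac).encode()
    digest = sum(key)  # trivial checksum; any deterministic hash works
    return digest % num_links
```

Frames of different conversations may land on different links, which is exactly how the aggregate gains capacity without per-frame sequencing.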
When a conversation is to be transferred to another link because the originally mapped link
is out of service (failed or configured out of the aggregation) or a new link has become available
relieving the existing ones, precautions have to be taken to avoid mis-ordering of
frames at the receiver. This can be realized either by means of a delay time the distributor
must determine somehow or through an explicit marker protocol that searches for a marker
identifying the last frame of a conversation. The distributor inserts a “marker message” behind
the last frame of a conversation. After the collector receives this “marker message” it
sends a response to the distributor, which then knows, that all frames of the conversation
have been delivered. Now the distributor can send frames of these types of conversations
via a new link without delay. If the conversation is to be transferred to a new link, because
the originally mapped link failed, this method will not work. There is no path on which the
message marker can be transferred, i.e. the distributor has to employ the timeout method.
Technology Constraints
In principle, the devices at the ends of the aggregation limit the achievable throughput. Using
an aggregation of four 100 Mb/s links instead of one 100 Mb/s link increases the total capacity,
but the throughput available on each individual link remains the same.
CHAPTER-11
Virtual LANs:Applications and Concepts
VLAN (Virtual LAN): Virtual LAN - Virtual Local Area Network. A division of a local
area network by software rather than by physical arrangement of cables. Division of the
LAN into subgroups can simplify and speed up communications within a workgroup.
Switching a user from one virtual LAN to another via software is also easier than
rewiring the hardware. The stations on the same VLAN group can communicate with
each other. With VLAN, a station cannot directly talk to or hear from stations that are not
in the same VLAN group(s)
Applications of VLAN:
1.) Software patch panel: This simple application requires only port-based VLANs. With a
centralized wiring center, connections between equipment on the LAN are made by patch
cord interconnections on a wiring panel. Moving, adding, or changing a station can thus be
achieved by reconfiguring the switch (the "software patch panel") rather than physically rewiring.
2.) LAN Security: A user on a shared LAN can create problems by sending large volumes of
traffic to targeted users, degrading their performance. By creating logical partitions
in the catenet with VLAN technology we strengthen the protection against unwanted traffic.
A port-based VLAN allows free communication among the members of a given VLAN, but does
not forward traffic between switch ports associated with members of different VLANs.
3.) User Mobility:
a.) User’s view of the network can stay consistent regardless of physical location.
b.)Network layer addresses may not need to be changed based on physical location.
c.)Mobile users are granted access privileges so that they can access their home servers.
4.) Bandwidth Preservation: VLAN technology will isolate traffic between logically
separated workgroups, thus preserving bandwidth.
VLAN Concepts:
A station can be in multiple VLANs, depending upon the capabilities of the stations and
switches deployed and the applications operating within the station. VLAN-aware devices look
at frames and classify each frame as belonging to a particular VLAN based on a set of VLAN
association rules; they just need to apply the rules and classify frames as belonging to
one VLAN or another.
VLAN Tagging:
Implicit Tags: No tag is involved; the frame is unmodified, as sent by any station
or switch. All frames sent by VLAN-unaware end stations are considered implicitly
tagged, and their VLAN association is derived from a set of VLAN association rules: it can be a
function of protocol type, data link source address, higher-layer network identifiers, etc. If
no explicit tag is present, a VLAN-aware switch must determine the
VLAN association by applying these rules.
Explicit Tags: An explicit tag is a predefined field in a frame that carries the VLAN
identifier for that frame. These tags are applied by VLAN-aware devices; a device
receiving an explicitly tagged frame does not need to re-apply the association rules.
The tag comprises:
Tagged Frame Type - indicates the type of tag; for Ethernet frames this is
currently always 0x8100.
Priority - ranges from binary 000 (0) for low priority to binary 111 (7) for high
priority.
Canonical Format Indicator - always 0 for Ethernet.
VLAN ID - identifies the VLAN number when trunking VLANs.
VLAN Awareness:
1.) Making frame forwarding decisions based on the VLAN association of a given
frame (i.e., based on the DA and also on the VLAN to which the frame belongs).
2.) Providing explicit VLAN identification within transmitted frames.
VLAN-Aware Switches:
Edge Switches: These switches sit at the boundary between the VLAN-unaware domain
and the VLAN-aware domain. An edge switch applies the association rules to every frame and
tags the frames before forwarding them into the backbone through the core switches. It also
removes the inserted tag before forwarding a frame back into the VLAN-unaware domain.
Core Switches: These switches connect VLAN-aware devices. They do not tag or untag frames;
they forward frames purely on the basis of the VLAN identifier in the tag. A core switch
maintains a table that maps VLAN identifiers to the set of ports needed to reach
the members of each VLAN; the depth of this table is fixed at 4,094 entries.
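The core switch's VID-to-port-set table can be sketched as a simple lookup. The table contents and port numbers here are hypothetical; only the structure (VLAN ID mapping to a set of egress ports, with the ingress port excluded) reflects the text above.

```python
# Hypothetical core-switch table: VLAN ID -> set of ports that
# reach members of that VLAN (at most 4094 entries in practice).
vlan_table = {
    10: {1, 2, 5},
    20: {3, 4},
}

def output_ports(vid: int, ingress_port: int) -> set:
    """Return the ports on which a frame tagged with this VID may be
    forwarded, excluding the port it arrived on. An unknown VID
    yields the empty set (the frame is filtered)."""
    return vlan_table.get(vid, set()) - {ingress_port}
```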
VLAN-Aware End Stations (Advantages):
1.) A set of stations may negotiate a dynamically created VLAN for the purpose of
carrying on a short-term audio or video conference, and the conferencing application
can tag frames for that conference with a unique VLAN identifier.
2.) A frame sent by a station will reach only members of that same VLAN.
3.) If all frames carry VLAN tags, then all switches effectively become core switches;
that is, switches make decisions based on VLAN tag information alone.
VLAN awareness in end stations (methods):
1.) Applications themselves need to be written to be VLAN aware.
2.) APIs need to be enhanced to support passing VLAN information to and from
applications.
3.) Device drivers for LAN interfaces need to be changed to allow a client to specify a
VLAN in addition to the other information needed to send frames on its behalf.
4.) VLAN tags need to be inserted within transmitted frames; this is implemented in the
device driver or in a VLAN-aware NIC.
VLAN-Unaware Switches: These switches are not capable of tagging or untagging. A
VLAN-unaware switch can still process a VLAN-tagged frame based on the addresses
in the frame.
VLAN Association Rules: (Mapping frames to VLANs)
1.) Port-based VLAN mapping: Stations within a given VLAN can freely communicate among
themselves; no communication is possible between stations connected to ports that are
members of different VLANs. This mapping is used for the software patch panel and
provides bandwidth preservation. (This is the mapping used in Force10.)
2.) MAC address VLAN mapping: The switch uses the source address to determine VLAN
membership. The same lookup process used to learn the port mapping for a station is
used to determine its VLAN mapping.
3.) Protocol-based VLAN mapping: A switch with protocol-based VLANs divides the physical
network into logical VLAN groups, one for each required protocol. When a frame is received at a
port, its VLAN membership is determined from the protocol type of the inbound packet.
Protocol-based mapping allows a station to be a member of multiple VLANs, one for each protocol
it supports (IP, IPX, AppleTalk, etc.). The VLAN mapping is a function of both the source
address and the encapsulated protocol.
4.) IP subnet-based VLAN mapping: In this type of mapping the VLANs are divided according to IP
subnets. A VLAN-aware switch needs to perform two operations to create IP subnet-based VLANs:
a.) Check whether the frame encapsulates an IP datagram.
b.) Extract the subnet portion of the IP source address in the encapsulated datagram.
5.) Application-based VLAN mapping: Here the VLANs are divided according to higher-layer
application processes; the applications might provide audio or video conferencing, group document
preparation, and so on. Application-based VLANs require that the station be VLAN aware: the
application ensures that each frame carries the VLAN identifier in an explicit tag, so that VLAN-aware
switches never need to parse frames to determine the application and can simply switch frames
based upon the VLAN identified in the tag.
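The association rules above can be sketched as a classifier that checks the rules in order of precedence. The precedence (MAC-based, then protocol-based, then the port default) and all the table contents are illustrative assumptions; real switches make the rule set and order configurable.

```python
def classify_frame(ingress_port, src_mac, ethertype,
                   mac_map, proto_map, port_default):
    """Assign a VLAN to a frame that arrived without an explicit tag.
    Rule order here is an assumption: MAC-based mapping first,
    then protocol-based, falling back to the port-based default."""
    if src_mac in mac_map:              # MAC address VLAN mapping
        return mac_map[src_mac]
    if ethertype in proto_map:          # protocol-based VLAN mapping
        return proto_map[ethertype]
    return port_default[ingress_port]   # port-based VLAN mapping
```

For example, with `proto_map = {0x0800: 10, 0x8137: 20}` an IP frame (EtherType 0x0800) lands in VLAN 10 and an IPX frame (0x8137) in VLAN 20, while an unmatched frame gets its ingress port's default VLAN.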
Chapter 12
Virtual LANs: The IEEE Standard
VLAN: Virtual Local Area Network and IEEE 802.1Q
A Virtual LAN (VLAN) is a group of devices on one or more LANs that are configured so that they
can communicate as if they were attached to the same wire, when in fact they are located on a
number of different LAN segments. Because VLANs are based on logical rather than physical
connections, they are very flexible for user/host management, bandwidth allocation, and
resource optimization.
There are the following types of Virtual LANs:
1. Port-Based VLAN: each physical switch port is configured with an access list
specifying membership in a set of VLANs.
2. MAC-based VLAN: a switch is configured with an access list mapping individual
MAC addresses to VLAN membership.
3. Protocol-based VLAN: a switch is configured with a list mapping layer 3
protocol types to VLAN membership, thereby segregating traffic from
end stations according to the protocol they use (for example, IP versus IPX).
The IEEE 802.1Q specification establishes a standard method for tagging Ethernet frames
with VLAN membership information. The IEEE 802.1Q standard defines the operation
of VLAN Bridges that permit the definition, operation and administration of Virtual LAN
topologies within a Bridged LAN infrastructure. The 802.1Q standard is intended to
address the problem of how to break large networks into smaller parts so broadcast and
multicast traffic would not grab more bandwidth than necessary. The standard also helps
provide a higher level of security between segments of internal networks
Protocol Structure - VLAN: Virtual Local Area Network and the IEEE 802.1Q
IEEE 802.1Q Tagged Frame for Ethernet (field lengths in bytes):
Preamble | SFD | DA | SA | TPID | TCI | Type/Length | Data    | CRC
   7     |  1  |  6 |  6 |  2   |  2  |     2       | 42-1496 |  4
TPID - defined value of 8100 in hex. When a frame has the EtherType equal to
8100, this frame carries the tag IEEE 802.1Q / 802.1P.
TCI - Tag Control Information field including user priority, Canonical format
indicator and VLAN ID.
• Tag-based VLAN Overview
According to the IEEE 802.1Q standard, tag-based VLANs use an extra tag in the MAC header
to identify the VLAN membership of a frame across bridges. This tag is used for VLAN
and QoS (Quality of Service) priority identification. VLANs can be created statically
by hand or dynamically through GVRP. The VLAN ID associates a frame with a specific
VLAN and provides the information that switches need to process the frame across the
network. A tagged frame is four bytes longer than an untagged frame: two
bytes of TPID (Tag Protocol Identifier, residing within the type/length field of the
Ethernet frame) and two bytes of TCI (Tag Control Information, starting after the source
address field of the Ethernet frame).
• TPID : TPID has a defined value of 8100 in hex. When a frame has the
EtherType equal to 8100, this frame carries the tag IEEE 802.1Q / 802.1P.
• Priority: The first three bits of the TCI define user priority, giving eight (2^3)
priority levels. IEEE 802.1P defines the operation for these 3 user priority bits.
• CFI: The Canonical Format Indicator is a single-bit flag, always set to zero for
Ethernet switches. CFI is used for compatibility between Ethernet-type
and Token Ring-type networks. If a frame received at an Ethernet port has
CFI set to 1, that frame should not be forwarded unmodified to an untagged port.
• VID: The VLAN ID identifies the VLAN and is the central field used by the
802.1Q standard. It has 12 bits, allowing 4,096 (2^12) values. Of these,
a VID of 0 identifies priority frames and the value 4095 (0xFFF) is reserved,
so at most 4,094 VLANs can be configured.
Note that user priority and VLAN ID are independent of each other. A frame with VID
(VLAN Identifier) of null (0) is called a priority frame, meaning that only the priority
level is significant and the default VID of the ingress port is given as the VID of the
frame.
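The TCI bit layout described above (3 priority bits, 1 CFI bit, 12 VID bits) can be shown with a few bit operations. This sketch only illustrates the field packing; the function names are mine, not from the book.

```python
def pack_tci(priority: int, cfi: int, vid: int) -> int:
    """Build the 16-bit Tag Control Information field:
    3 bits of user priority | 1 CFI bit | 12 bits of VLAN ID."""
    assert 0 <= priority <= 7 and cfi in (0, 1) and 0 <= vid <= 4095
    return (priority << 13) | (cfi << 12) | vid

def unpack_tci(tci: int):
    """Split a 16-bit TCI back into (priority, cfi, vid)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF
```

For instance, priority 7, CFI 0, VID 100 packs to 0xE064; note that priority and VID occupy independent bit fields, which is why a priority frame can carry VID 0 while still signaling a priority level.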
• How 802.1Q VLAN works
According to the VID information in the tag, the switch forwards and filters frames
among its ports; ports with the same VID can communicate with each other. The IEEE
802.1Q VLAN function comprises three tasks: the Ingress Process, the Forwarding
Process, and the Egress Process.
1. Ingress Process:
Each port is capable of passing tagged or untagged frames. The Ingress Process identifies
whether an incoming frame contains a tag and classifies the frame as belonging to a
VLAN. Each port has its own ingress rule. If the ingress rule accepts tagged frames only, the
port drops all incoming untagged frames; if it accepts all frame
types, the port admits both tagged and untagged frames:
• When a tagged frame is received on a port, it carries a tag header with an
explicit VID, and the Ingress Process passes it directly to the Forwarding
Process.
• An untagged frame carries no VID of its own. When an untagged
frame is received, the Ingress Process inserts a tag containing the PVID into the
frame. Each physical port has a default VID called the PVID (Port VID), which is
assigned to untagged frames and to priority-tagged frames (frames with a null
(0) VID) received on that port.
After the Ingress Process, every frame carries a 4-byte tag with VID information and
proceeds to the Forwarding Process.
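The ingress rules above can be sketched as a small function. Representing a frame as a dict with a `vid` key is an assumption for illustration; the drop/tag decisions follow the rules in the text.

```python
def ingress(frame: dict, pvid: int, accept_tagged_only: bool):
    """Ingress Process sketch. `frame['vid']` is None for an
    untagged frame and 0 for a priority-tagged frame. Returns the
    (possibly re-tagged) frame, or None if the frame is dropped."""
    vid = frame.get("vid")
    if vid is None:                 # untagged frame
        if accept_tagged_only:
            return None             # ingress rule: drop untagged
        frame["vid"] = pvid         # classify with the port's PVID
    elif vid == 0:                  # priority-tagged frame
        frame["vid"] = pvid         # PVID applies here as well
    return frame
```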
2. Forwarding Process:
The Forwarding Process decides whether to forward received frames according to the Filtering
Database. For a tagged frame to be forwarded out a certain port, that port must be an egress
port for the frame's VID. An egress port is an outgoing port for the
specified VLAN; that is, frames tagged with the specified VID may go through it. The
Filtering Database stores and organizes the VLAN registration information used for
switching frames to and from switch ports. It consists of static registration entries (the
Static VLAN or SVLAN table) and dynamic registration entries (the Dynamic VLAN or DVLAN
table). The SVLAN table is manually added and maintained by the administrator; the DVLAN
table is learned automatically via the GVRP protocol and cannot be created or updated by
the administrator.
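The Filtering Database's split into static (SVLAN) and dynamic (DVLAN) entries can be sketched as two tables consulted together. Taking the union of both tables for a lookup is an assumption made for illustration.

```python
class FilteringDatabase:
    """Sketch of a Filtering Database: static entries are configured
    by the administrator (SVLAN), dynamic entries are learned via
    GVRP (DVLAN) and are not administrator-editable."""
    def __init__(self):
        self.static = {}   # vid -> set of egress ports (SVLAN table)
        self.dynamic = {}  # vid -> set of egress ports (DVLAN table)

    def egress_ports(self, vid):
        # A port is an egress port for a VID if either table lists it;
        # an unknown VID yields the empty set (frame filtered).
        return self.static.get(vid, set()) | self.dynamic.get(vid, set())
```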
CHAPTER 13
Priority Operation
Priority operation adds complexity to switches, and there is no need to pay for this
complexity unless there is an application benefit to be gained. There are two situations to
consider:
1.) The catenet cannot handle the steady-state traffic load offered by its users. This occurs
when some link or switch in the catenet has inadequate capacity to support the desired
application data flows; a steady-state problem will also occur if a switch does not support
wire-speed operation at the higher data rate. Priority cannot help here; the solution is to add
capacity to the network.
2.) The catenet has sufficient capacity for the steady-state traffic load, but not for short-term
peak loads. Regardless of the design of the catenet, there can be times when the offered load
exceeds the capacity of some link or switch. This is where priorities come into play: the more
important traffic streams can be given priority over the less important ones. Priority helps
only under such overload conditions.
LAN Priority Mechanisms:
1.) Access priority: Giving a particular station priority access to a shared LAN.
(a) Static: The station is given priority all the time.
(b) Dynamic: Priority is granted on a frame-by-frame basis, depending on the applications running.
2.) User priority: The priority assigned to a given frame by the application sourcing
those frames.
For Ethernet access priority, some of the methods employed are:
1.) Shortened interframe gap: By reducing the IFG, the favored traffic gets onto the wire
sooner than other traffic.
2.) Modified backoff algorithm: When a collision occurs, the device with the shortened
backoff time will transmit its frames sooner than the other stations involved in the
collision.
3.) Lengthened preamble: The longer the preamble, the higher the priority; the device with
the longest preamble ignores collisions and continues with its frame transmission.
VLAN and Priority Tagging:
Tagged Frame Type - indicates the type of tag; for Ethernet frames this is
currently always 0x8100.
Priority - ranges from binary 000 (0) for low priority to binary 111 (7) for high
priority.
Canonical Format Indicator - always 0 for Ethernet.
VLAN ID - identifies the VLAN number when trunking VLANs.
In order to use priority mechanisms:
1.) The operating system and protocol stack have to be modified.
2.) APIs in the protocol stacks have to be modified.
3.) Protocol implementations within the end stations may have to be enhanced.
4.) Operating system code (NIC APIs, network device drivers) has to be modified.
5.) Network interfaces have to be modified.
Edge Switches: These switches sit on the boundary between the priority unaware
world and the priority aware core. They provide attachments for end stations directly.
Core Switches: These typically provide backbone interconnections between the edge
switches.
Priority Operation in switches:
If we don’t invoke any priority mechanisms, the operation of a switch is quite
straightforward; the switch handles all frames equally. The whole idea of priority is to
allow frames that are more important to jump ahead of lower priority frames in the
queue.
Switch process flow for priority operation is a three-step process:
1.) Determining frame priority on input: On receipt of a frame, the switch must
determine the priority of that frame, either from explicit priority information provided
in the frame itself or implicitly from the frame contents and a set of administrative
policy rules.
2.) Mapping input priority to class of service: Knowing the priority of the frame, the
switch must map that priority to one of the classes of service available at each output
port on which the frame is to be forwarded. Typically, each service class identifies a
particular output queue on each port.
3.) Output scheduling: Given a set of multiple output queues, the switch must apply
a scheduling algorithm to transmit frames from those queues according to the
needs of the classes of service that they represent.
Scheduling Algorithms:
1.) Strict priority: As the name implies, this interprets priority literally: higher-priority
queues are served first, and lower-priority queues only when the higher ones are empty. It is
the easiest policy to implement, but if a high-priority user offers more load than the
capacity of the output port, no frames will be transmitted from the lower-priority
queues; in the extreme case, all of their frames will be discarded.
2.) Weighted fair queuing: An alternative approach that does not exclude lower-priority
queues completely. A weight is assigned to each queue, with higher-priority
queues given greater weight than lower-priority queues; the output scheduler
then uses a round-robin algorithm tempered by the indicated weights. Weights are
usually assigned according to the bandwidth allocated to each queue. That is,
if all queues have traffic to send, the available bandwidth will be divided among
them by the ratio of their weights.
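The two scheduling policies can be contrasted in a few lines. This is a minimal sketch: real schedulers work per output port in hardware, and the weighted variant here is the simple weighted round-robin form of the idea.

```python
def strict_priority_pick(queues):
    """Serve the highest-priority non-empty queue first; `queues`
    is ordered highest priority first. A busy high-priority queue
    can therefore starve the lower ones completely."""
    for i, q in enumerate(queues):
        if q:
            return i, q.pop(0)
    return None  # all queues empty

def weighted_round_robin(queues, weights):
    """One service round: each queue may send up to its weight in
    frames, so lower-priority queues are never fully excluded."""
    sent = []
    for q, w in zip(queues, weights):
        for _ in range(w):
            if q:
                sent.append(q.pop(0))
    return sent
```

With queues `[["h1","h2","h3"], ["l1","l2"]]` and weights `[2, 1]`, one round sends two high-priority frames and one low-priority frame, dividing bandwidth 2:1 as described above.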
Indicating the priority in transmitted frames:
On input, we made a priority determination and possibly remapped that priority to a
globally consistent set of semantics. On output, we have three choices:
1.) Signal the user priority in a VLAN-style tag: This relieves the next device from
having to make an implicit priority determination from a set of administrative rules.
The tagging approach requires that the output port support tagged frames.
2.) Signal the user priority in a LAN-specific manner: This method is used when the output
port does not support tags but supports a native indication of user priority.
3.) Don't signal user priority: On Ethernet ports without tag support, there is no
choice but to forward the frame without priority; the next device to receive the
frame will need to determine its priority through implicit means.
Priority Regeneration:
The IEEE 802.1p and 802.1Q standards provide for priority regeneration. Priority
regeneration is used only when explicit priority is indicated in received frames
through a native priority field. It can be used not only to equalize
service levels among departments but also to change or override the local administrative
policy. Priority regeneration provides an easy means of migrating and merging priority-enabled
LANs into a larger catenet without having to change all of the local
administrative policies at once.
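Priority regeneration amounts to a per-port mapping table from the received user priority to the locally administered value. The specific mapping values below are purely illustrative, not recommended defaults.

```python
# Hypothetical per-port regeneration table: received user priority ->
# regenerated priority used inside the local catenet.
regen_table = {0: 0, 1: 0, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 6}

def regenerate(received_priority: int) -> int:
    """Override the explicit priority signaled in a received tagged
    frame with the value chosen by local administrative policy."""
    return regen_table[received_priority]
```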
IEEE 802.1p:
IEEE 802.1P defines a priority field that can be used by LAN switches and such at
the Ethernet level to prioritize traffic.
The prioritization specification works at the media access control (MAC) framing
layer (OSI model layer 2). The 802.1P standard also offers provisions to filter
multicast traffic to ensure it does not proliferate over layer 2-switched networks.
The 802.1p header includes a three-bit field for prioritization, which allows packets to
be grouped into various traffic classes. The IEEE has made broad recommendations
concerning how network managers can implement these traffic classes, but it stops
short of mandating the use of its recommended traffic class definitions. It can also be
defined as best-effort QoS (Quality of Service) or CoS (Class of Service) at Layer 2
and is implemented in network adapters and switches without involving any
reservation setup. 802.1p traffic is simply classified and sent to the destination; no
bandwidth reservations are established.
********************************************************