the switch book by rich seifert-notes
TRANSCRIPT
Switch Book Layer 2 concepts
1/51
THE SWITCH BOOK by Rich Seifert
CHAPTER-1
Foundations Of LAN Switches:
Network Architecture:
OSI LAYER:(Open System Interconnect)
It consists of seven layers of network system functions.
1. Physical Layer:
- Transmission and reception of signals on the communications medium.
- Data is sent as bits: 0s and 1s.
- This layer is a function of the design of the physical medium (cabling).
2. Data Link Layer:
- Provides direct communication between devices.
- Communications are of two types: point-to-point and point-to-multipoint.
- It provides mechanisms for: 1. Framing 2. Addressing 3. Error Detection.
- 2 modes of operation:
1. Connectionless: (a) Just forwards the frame and does not receive acknowledgements. (b) Does not provide error control or flow control.
2. Connection-oriented: (a) Continual exchange of data with acknowledgements. (b) Provides error and flow control.
3. Network Layer:
- Station-to-station data delivery across multiple links.
- Routing of packets across the internetwork, usually through routers.
- Protocols include: IP, IPX, AppleTalk, etc.
4. Transport Layer:
- Shields the upper layers from the lower layers.
- Provides error-free, sequenced, guaranteed delivery service.
- Mechanisms: 1. Connection establishment 2. Error recovery 3. Flow control.
- Protocols: TCP, ATP, SPX, etc.
5. Session Layer:
- Establishment of communications sessions between applications.
- Deals with user authentication and access control (passwords).
6. Presentation Layer:
- Presents data in the proper format to the application layer.
- Data formats: encryption/decryption, encoding/decoding.
7. Application Layer:
- Provides APIs that allow user applications to communicate across the network.
- Functions such as FTP, mail utilities, SMTP, NFS, etc.
Data link sublayering:
1. Logical Link Control (LLC): The upper sublayer. Provides the data link service (connectionless or connection-oriented) to higher-layer clients, independent of the underlying LAN. There are 3 types of service:
1. LLC Type 1: Connectionless service
2. LLC Type 2: Connection-oriented service
3. LLC Type 3: Acknowledged connectionless service
2. Medium Access Control (MAC): The lower sublayer. Deals with the details of frame formats associated with the particular technology in use.
LLC Frame Format:
MAC Header | Dest SAP (1 byte) | Source SAP (1 byte) | CTRL (1 byte) | DATA
LLC/SNAP Format: If the SAP is set to 0xAA, then SNAP is in use.
MAC Header | Dest SAP = 0xAA | Source SAP = 0xAA | CTRL | SNAP OUI | SNAP PID | Data
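A minimal sketch of parsing this header, assuming the standard SNAP field widths (3-byte OUI, 2-byte PID); the example frame bytes are made up:

```python
def parse_llc(payload: bytes) -> dict:
    """Parse an LLC header; if DSAP/SSAP are 0xAA, a SNAP header follows."""
    dsap, ssap, ctrl = payload[0], payload[1], payload[2]
    info = {"dsap": dsap, "ssap": ssap, "ctrl": ctrl}
    if dsap == 0xAA and ssap == 0xAA:
        info["snap_oui"] = payload[3:6]   # identifies the organization
        info["snap_pid"] = payload[6:8]   # protocol id within that OUI
        info["data"] = payload[8:]
    else:
        info["data"] = payload[3:]
    return info

# Example: SNAP-encapsulated IPv4 (OUI 00-00-00, PID 08-00)
hdr = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x00]) + b"payload"
parsed = parse_llc(hdr)
```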
Addressing:
MAC Address: It's a 48-bit address used at the data link layer. It is also called the hardware address or physical address.
Bytes 1-3: OUI | Bytes 4-6: organizationally assigned portion
The OUI is used to denote the manufacturer. Force10 has the OUI 00-01-E8.
ETHERNET:
- Low-cost, high-speed communication.
Frame transmission:
- Sense the carrier.
- Wait for the interframe gap.
- Transmission takes place.
Frame reception:
- The station monitors the channel for an incoming frame.
- When the channel becomes non-idle, it starts receiving bits.
- Frames less than one slot time in length are discarded (collision fragments).
- If the frame meets the minimum length, the FCS is checked; if valid, the receiver checks the DA to see whether it matches the physical address of the receiving station.
- If it matches, the frame is forwarded to the client.
Ethernet Frame Formats:
Type encapsulation: Ethernet Version 2
Preamble/SFD | DA | SA | TYPE | DATA | FCS
Bytes: 8 | 6 | 6 | 2 | 46-1500 | 4
Length encapsulation: IEEE 802.3 (the LLC header DSAP/SSAP/CTRL plus data and pad occupy the 46-1500 byte field)
Preamble/SFD | DA | SA | LENGTH | DSAP | SSAP | CTRL | DATA | PAD | FCS
Bytes: 8 | 6 | 6 | 2 | 1 | 1 | 1 | (data+pad fills out 46-1500) | 4
Preamble: Consists of 7 bytes; allows receivers to synchronize on the incoming frame. Each byte has the value 0x55.
SFD: Consists of 1 byte; signifies the beginning of the DA. Its value is 0xD5.
DA: Destination address of the frame. It consists of 6 bytes.
SA: Source address of the frame. It consists of 6 bytes.
DATA: Consists of 46-1500 bytes. It encapsulates the higher-layer protocol information being transferred across the Ethernet.
Pad: Adds extra bytes when the data is less than 46 bytes. Without padding, such a frame would be shorter than the minimum length and would be discarded, so the pad field prevents this.
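The pad calculation above can be sketched in a couple of lines; the 46-byte figure is the minimum data-field size from the frame format:

```python
def pad_length(data_len: int, min_data: int = 46) -> int:
    """Bytes of padding needed so the data+pad field reaches the 46-byte
    minimum, keeping the whole frame at the Ethernet minimum length."""
    return max(0, min_data - data_len)

# A 20-byte payload needs 26 bytes of pad; a full payload needs none.
short_pad = pad_length(20)
full_pad = pad_length(1500)
```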
LAYER ENCAPSULATION:
PL HEADER | ETH HEADER | IP HEADER | TCP HEADER | APPLICATION DATA | ETH TRAILER | PL TRAILER
- TCP header + application data = TCP segment
- IP header + TCP segment = IP packet
- Ethernet header + IP packet + Ethernet trailer = Ethernet frame
- The physical layer carries the frame, with its own header and trailer, as a stream of bits.
A Transport PDU is called a segment or message.
A Network PDU is called a packet.
A Data Link PDU is called a frame.
A Physical Layer PDU is called a symbol stream.
PDU: Protocol Data Unit.
CHAPTER-2
TRANSPARENT BRIDGES
Transparent bridges.
Now getting into details of how things actually work…
Transparent bridges are so named because their presence and operation are transparent to
network hosts. When transparent bridges are powered on, they learn the network's
topology by analyzing the source address of incoming frames from all attached
networks.
If, for example, a bridge sees a frame arrive on Line 1 from Host A, the bridge concludes that
Host A can be reached through the network connected to Line 1. Through this
process, transparent bridges build a table.
Host address | Network number
15 | 1
17 | 1
12 | 2
... | ...
Figure 1: Transparent bridges build a table that determines a host's accessibility
The bridge uses its table as the basis for traffic forwarding. When a frame is received on one
of the bridge's interfaces, the bridge looks up the frame's destination address in its
internal table. If the table contains an association between the destination address and
any of the bridge's ports aside from the one on which the frame was received, the
frame is forwarded out the indicated port. If no association is found, the frame is
flooded to all ports except the inbound port. Broadcasts and multicasts also are
flooded in this way.
UNICAST OPERATION:
When a frame is received on any port, the bridge extracts the destination address from the frame, looks it up in the table, and determines the port to which the address maps. We have filtering and forwarding concepts.
Filtering: When a packet is received by a node, filtering is the task of:
a) Determining whether to forward the packet at all, and
b) Determining which port(s) to forward the packet to.
Filtering makes network operation more efficient by reducing the number of output ports to which the packet needs to be sent. For example: unicast packets need to go to only one output port, and that output port should be the next step on the desired path to the destination. Multicast packets need to go to a subset of ports. The forwarding table encodes this subset of ports and avoids the need to carry such information in the packet itself.
Forwarding:
Given a packet at a node, finding which output port it needs to go to is called "forwarding"; it is a per-node function, whereas routing may encompass several nodes.
The forwarding function is performed by every node in the network, including hosts, repeaters, bridges, and routers.
Forwarding is trivial in the case of a single-port node (or a dual-port node, where the destination is on the port other than the input port); in this case you don't even need addresses.
Generating the address table:
1) The address table can be built automatically by examining the source address in received frames.
2) Bridges perform a table lookup on the destination address in order to determine on which port(s) to forward the frame.
Address table aging:
If all we ever did was add learned addresses to the table and never remove them, we would have two problems:
1) The larger the table, the more time a lookup requires. Thus we have to restrict entries in the table to only those stations known to be currently active.
2) If a station moves from one port to another, the table will incorrectly indicate the old port until the station sends traffic that causes the bridge to learn its new location.
The simple solution to both problems is to age entries out of the address table when a station has not been heard from for some period of time. Thus, when we perform the table lookup for the source address, we not only make a new entry, we also flag the entry as still active. On a regular basis we check for stale entries (entries that have not been flagged as active for some period of time) and remove them from the table.
Process Model of Table Operation
1. A lookup process compares the destination address in incoming frames to the entries in the table to determine whether to discard the frame, forward it to a specific port, or flood it to all ports.
2. A learning process compares the source address in incoming frames to the entries in the table and updates the port mapping and activity indicators, or creates new entries as needed.
3. An aging process removes stale entries from the table on a regular basis.
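The three processes above can be sketched as a toy bridge table; the port numbers, MAC strings, and the 300-second age-out interval are illustrative choices, not from the text:

```python
import time

class BridgeTable:
    """Toy model of the lookup, learning, and aging processes."""

    def __init__(self, max_age=300.0):
        self.table = {}          # MAC address -> (port, last_seen)
        self.max_age = max_age

    def learn(self, src_mac, port, now=None):
        # Learning process: map the source address to the arrival port
        # and refresh its activity timestamp.
        self.table[src_mac] = (port, now if now is not None else time.time())

    def lookup(self, dst_mac, in_port):
        # Lookup process: filter, forward to one port, or flood.
        entry = self.table.get(dst_mac)
        if entry is None:
            return "flood"               # unknown DA: all ports except in_port
        port, _ = entry
        return "filter" if port == in_port else port

    def age(self, now=None):
        # Aging process: drop entries not heard from within max_age.
        now = now if now is not None else time.time()
        stale = [m for m, (_, seen) in self.table.items()
                 if now - seen > self.max_age]
        for m in stale:
            del self.table[m]

bt = BridgeTable()
bt.learn("00:01:e8:aa:bb:cc", port=1, now=0.0)
decision = bt.lookup("00:01:e8:aa:bb:cc", in_port=2)   # forward to port 1
unknown = bt.lookup("ff:ee:dd:00:11:22", in_port=2)    # flood
bt.age(now=1000.0)                                     # entry ages out
```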
Custom Filtering and Forwarding
We can add filtering and forwarding criteria beyond the defaults. Many commercial bridges allow the network administrator to program custom filter and forwarding criteria; for example, the network administrator may wish to:
1. Prevent specific users from accessing certain resources.
2. Prevent sensitive traffic from propagating beyond a set of controlled LANs.
3. Limit the amount of multicast traffic that is flooded onto certain LANs.
Implementing the bridge address table
Table operations:
Three operations need to be performed on the bridge address table: destination address lookup, source address learning, and entry aging. Considering the relative priority of these operations, the table design should be optimized for fast, real-time lookup, at the expense of slower and more complex update and aging algorithms if need be.
Search Algorithms:
1) Hash tables
2) Binary search
3) Content addressable Memories (CAM)
You can compare a CAM to the inverse of RAM. When read, RAM produces the data stored at a given address. Conversely, a CAM produces an address for a given data word. When searching for data within a RAM block, the search is performed serially, so finding a particular data word can take many cycles. A CAM searches all addresses in parallel and produces the address storing a particular word. You can use a CAM for any application requiring high-speed searches, such as networking, communications, data compression, and cache management.
Aging entries from the table
The aging process is a non-critical, low-priority task. It can be done in the background without significant performance or operational penalty. A common mechanism is to maintain two bits per entry: Valid (V) and Hit (H). The Valid (V) bit indicates that a table entry is valid; the Hit (H) bit indicates that this entry has been "hit", that is, seen as a source address, during the most recent aging cycle.
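A minimal sketch of one aging cycle using the V/H bits just described (the dictionary representation is an assumption for illustration):

```python
# Each entry keeps a Valid (V) bit and a Hit (H) bit. Seeing a source
# address sets H; each aging cycle invalidates entries whose H bit was
# never set, then clears H for the next cycle.
def aging_cycle(entries):
    for entry in entries.values():
        if not entry["H"]:
            entry["V"] = False   # not heard from: invalidate
        entry["H"] = False       # reset for the next cycle

entries = {
    "A": {"V": True, "H": True},    # active station
    "B": {"V": True, "H": False},   # silent during this cycle
}
aging_cycle(entries)
```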
The IEEE 802.1D standard
In addition to the formal description of transparent bridge operation, the standard provides:
1) An architectural framework for the operation of bridges, including formal specifications for interlayer services.
2) A formal description of the bridge address table, frame filtering, and forwarding, including static and dynamic entries, forwarding rules, and custom filter definitions.
3) A set of operating parameters for interoperation of bridged catenets.
CHAPTER-4
Principles of LAN Switches
1. Switched LAN concepts
- Access domains: the set of stations sharing a given LAN and arbitrating among themselves using whatever access control mechanism is appropriate for that LAN.
- Collision domains: in an Ethernet LAN, the set of stations contending for access to the shared Ethernet LAN forms a collision domain.
- Token domains: similarly, the set of stations contending for use of the token on a token-passing LAN forms a token domain.
- Both collision and token domains are examples of access domains.
- Each port on the switch acts as the terminus of the access domain for that particular link.
- It is the switch that separates the access domain of each port.
Segmentation and Microsegmentation:
- Segmentation is connecting a group of stations to each port of the switch (i.e., each port is connected to a shared LAN). A switch used in this manner provides a collapsed backbone.
- To overcome the drawbacks of the collapsed backbone, the concept of microsegmentation comes into play.
- Microsegmentation: the direct connection of an individual end station to each switch port.
- Interesting characteristics of microsegmentation:
1. No access contention (i.e., no collisions) when operating in full duplex mode.
2. It is possible to eliminate access control entirely when full duplex is used.
3. There is dedicated bandwidth for each station (a dedicated LAN segment per station), so the data rate of each port is independent. For example, one port can run at 10 Mbps while another runs at 100 Mbps or 1000 Mbps.
Extended distance limitations:
- Switches allow us to extend the distance coverage of a LAN.
- Using full duplex (i.e., microsegmentation), the distance constraints can be eliminated.
Increased aggregate capacity:
- A switch provides greater data-carrying capability than a shared LAN.
- Since a switch provides dedicated capacity on each port, the total LAN capacity increases with the number of switch ports. So the aggregate capacity will equal:
Capacity_agg = sum over ports 1..n of DataRate_port
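A quick worked example of that sum, for a hypothetical switch with four 100 Mbps ports and two 1000 Mbps ports:

```python
# Aggregate capacity is simply the sum of the per-port data rates.
port_rates_mbps = [100, 100, 100, 100, 1000, 1000]
capacity_agg = sum(port_rates_mbps)
```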
2. Cut-Through versus Store-and-Forward
- Store-and-forward: as the name implies, each frame is received (stored) completely, and then decisions are made regarding whether and where to forward the frame.
- This is done based on the destination address in the Ethernet frame. The destination address is the first field in the Ethernet frame.
- So in this method the switch waits up to 1.2 ms (the time to receive a maximum-length frame at 10 Mbps) for the frame to be fully received, and then the decision and forwarding are done.
- To reduce this receive-and-forward delay, the concept of cut-through comes into the picture.
- Cut-through: the switch begins transmitting the frame before the frame is fully received on the input side. Since the destination address is the first field in the Ethernet frame, as soon as the switch reads the destination address it forwards the frame toward the destination.
- The switch can receive the destination address field within 11.2 us, make the decision, and forward. So the switch need not wait for the whole frame to be received.
- Because of this advantage, cut-through mode has lower latency than store-and-forward.
- The implication was that a cut-through switch provided a 20:1 performance improvement over a store-and-forward switch. There are a number of fallacies with this conclusion:
1. Absolute latency is not a significant issue for most higher-layer protocols and applications (at least not latency on the order of a few milliseconds).
2. For those protocols that are sensitive to latency, the switch is only a small part of the problem.
3. Any latency benefit accrues only when the output port is available.
4. Cut-through operation is generally not possible for multicast or unknown destination addresses.
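The two latency figures quoted above can be reproduced for 10 Mbps Ethernet, assuming the standard 8-byte preamble/SFD, 6-byte DA, and 1518-byte maximum frame:

```python
BIT_TIME_US = 0.1   # one bit time at 10 Mbps, in microseconds

# Cut-through can act after preamble/SFD (8 bytes) plus DA (6 bytes).
cut_through_us = (8 + 6) * 8 * BIT_TIME_US

# Store-and-forward must wait for a maximum-length frame.
store_forward_us = (8 + 1518) * 8 * BIT_TIME_US   # roughly 1.2 ms
```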
CHAPTER 5
Loop resolution
Spanning tree protocol
- Frames would loop for an indefinite period of time in networks with physically redundant links.
- To prevent looping frames, STP blocks some ports from forwarding frames so that only one active path exists between any pair of LAN segments (collision domains).
- The result of STP is both good and bad.
- Good: frames do not loop infinitely, which makes the LAN usable.
- Bad: the network does not actively take advantage of some of the redundant links, because they are blocked to prevent frames from looping. Some users' traffic travels a seemingly longer path through the network, because a shorter physical path is blocked.
- However, the net result is GOOD.
Terminology
Tree topology: Think of a tree. There is a root, branches (actually, a hierarchy of
progressively smaller branches), and ultimately leaves. On a given tree, there are no
disconnected parts that are still considered part of the tree; that is, the tree encompasses
all of its leaves. In addition, there are no loops in the tree. Thus a tree is a loop-free
topology that spans all of its parts.
Root Bridge: just as a tree has a root, spanning tree has a Root Bridge. The root Bridge is
the logical center (but not necessarily the physical center) of the catenet. There is always
exactly one Root Bridge in a catenet.
Designated Bridge: the bridge responsible for forwarding traffic in the direction from
the root to a given link is known as the designated bridge for that link.
Designated Port: the port in the active topology used to forward traffic away from the
root on to the link(s) for which this bridge is the Designated Bridge.
Root Port: the port in the active topology that provides connectivity from the designated
bridge towards the root.
Bridge identifier: in order to properly configure, calculate, and maintain the spanning tree, there needs to be a way to uniquely identify each bridge in the catenet and each port within the bridge.
A bridge identifier is a 64-bit field unique to each bridge in the catenet. The bridge ID is the concatenation of a 16-bit "priority" value and a globally unique 48-bit field (a MAC address).
Bridge ID: the priority ranges from 0 to 65,535 (2^16 values); the default priority value is 32768 (0x8000).
Port identifier: each port of the bridge is assigned a port ID. Similar to the bridge ID, a port ID concatenates an 8-bit priority field and a unique 8-bit port number. The range of the priority field in the port ID is 0 to 255 (0xFF); the default value is 128 (0x80).
Link and link cost: each port on a bridge connects to a link. That link may be a high-speed LAN or, alternatively, some wide area communications technology. The STP attempts to configure the catenet such that every end station is reachable from the root through the path with the lowest cost. By default:
Link cost = 1000 / data rate in Mbps
Table: link cost recommendations
DATA RATE | RECOMMENDED LINK COST RANGE | RECOMMENDED LINK COST VALUE
4 Mbps    | 100-1000 | 250
10 Mbps   | 50-600   | 100
16 Mbps   | 40-400   | 62
100 Mbps  | 10-60    | 19
1 Gbps    | 3-5      | 4
10 Gbps   | 1-5      | 2
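The default formula can be checked against the table; note that at higher speeds the recommended values deviate from the raw formula (e.g. 19 rather than 10 for 100 Mbps), since costs must remain small positive integers:

```python
def default_link_cost(rate_mbps: float) -> float:
    """The default STP link cost formula: 1000 / data rate in Mbps."""
    return 1000 / rate_mbps

cost_10m = default_link_cost(10)     # matches the table's 100
cost_100m = default_link_cost(100)   # 10, vs. the recommended value 19
```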
Path cost: as stated earlier, the STP attempts to configure the catenet such that every
station is reachable from the root through the path with the lowest cost. The cost of a path
is the sum of the cost of the links attached to the root ports in that path, as calculated
earlier.
Calculating and maintaining the spanning tree:
The spanning tree topology for a given set of links and bridges is determined by the
bridge id, the link cost, and the port id associated with the bridges in the catenet.
Logically, we need to perform three operations:
1. Determine (elect) a root bridge.
2. Determine (elect) the designated bridge and designated port for each link.
3. Maintain the topology over time.
In practice, all of these are done in parallel, through the spanning tree algorithm operating identically and independently in each bridge.
Elect a root
To elect a root there is an election algorithm: the bridge with the numerically lowest bridge ID becomes the root bridge at any given time.
Elect the designated bridges and designated ports:
- By definition, the root bridge is the designated bridge for each link to which it attaches.
- For other links, the designated bridge is elected with the help of the cost factor: the bridge with the lowest path cost back to the root becomes the designated bridge for the link.
- If there is a tie in path cost, the bridge with the lowest-numbered bridge ID becomes the designated bridge.
- For a particular link there can be only one designated port, so the port with the lowest-numbered port ID becomes the designated port.
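The root election reduces to a numeric minimum over (priority, MAC) pairs; the bridge MAC addresses below are made-up examples:

```python
# Bridge ID = (priority, MAC address), compared numerically: priority
# first, then the MAC. The lowest ID wins the root election.
bridges = [
    (32768, "00:01:e8:00:00:03"),
    (32768, "00:01:e8:00:00:01"),   # same priority, lowest MAC
    (40960, "00:01:e8:00:00:00"),   # lower MAC, but higher priority
]

# Python tuple comparison (priority first, then the fixed-width MAC
# string) matches the "numerically lowest bridge ID" rule.
root = min(bridges)
```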
Spanning tree maintenance:
In normal (steady-state) operation, to maintain the tree, the protocol operates as follows:
- Once every Hello Time (2 seconds), the root bridge transmits a configuration message encoded as a BPDU.
- All bridges sharing links with the root bridge receive the BPDU and pass it to the STP entity within the bridge. Unlike data frames, the BPDU is not forwarded by the bridge to the end stations.
- Each designated bridge creates a new BPDU based on the BPDU received from the root bridge and then transmits that message.
- So at each tier, the designated bridges update the BPDU with their own information and transmit it to the next tier. This process continues until there are no more designated bridges.
CHAPTER 7
Full Duplex Operation
Half duplex: one device transmits while the other devices receive.
Full duplex channel: a communication channel that supports data transfer in both directions simultaneously.
Half-duplex works only if one device is transmitting and all the other devices are receiving; otherwise, collisions occur. When collisions are detected, the devices causing the collision wait for a random time before retransmitting. Half-duplex is the most common transmission method and is adequate for normal workstation and PC connections.
Full-duplex provides two-way communication on a point-to-point connection and allows each device to transmit and receive simultaneously. Full-duplex mode is typically used to connect to other switches or to connect fast access devices such as workgroup servers.
To use full-duplex communication, both ends of the connection must be configured to operate in full-duplex mode. Full-duplex operation is only possible on point-to-point Ethernet connections that use separate conductors or fibers for transmit and receive, such as 10Base-T and 100Base-FX cabling. Full-duplex operation is not possible on connections using coaxial or AUI (10Base5) cables or with most hubs.
Full duplex operation in a LAN depends on:
1. The use of dedicated media as provided by popular structured cabling (10Base-T, 1000Base-SX, 1000Base-LX, etc.).
2. The use of microsegmented (one PC to one port), dedicated LANs.
For full duplex operation to occur:
1. There must be exactly 2 devices on the LAN (switch-to-PC, PC-to-PC, etc.).
2. The physical cabling must support full duplex.
3. The Ethernet MAC must be configured to work in full duplex mode (with collision detection disabled).
Full duplex operation is a subset of half duplex, with the half-duplex access functions disabled (no carrier sense, no multiple access, no collision detection).
Implications of full duplex operation:
1. Eliminates collisions.
2. Increases aggregate channel capacity.
3. Increases the potential load on the switch.
Transmitter operation:
A full duplex transmitter sends frames following two simple rules:
1.) The station sends frame by frame; that is, it finishes sending one frame before sending the next pending frame.
2.) The transmitter sends frames separated by the interframe gap, which gives the receiver some time to perform housekeeping chores.
Receiver operation:
1.) The receiver waits for a valid SFD and then begins to assemble the data link encapsulation of the frame.
2.) The destination address is checked to see whether it matches the device; otherwise the frame is discarded.
3.) The FCS is checked, and any invalid frame is discarded.
4.) The frame length is checked, and frames shorter than the minimum length are discarded.
5.) The receiver passes up to its client all frames that have passed the previous tests.
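A minimal sketch of the receiver checks just listed, applied in the same order (DA, FCS, minimum length). CRC-32 stands in for the Ethernet FCS, the MAC and payload bytes are made up, and a real MAC would also accept broadcast/multicast DAs:

```python
import zlib

def accept_frame(frame: bytes, my_mac: bytes, min_len: int = 64) -> bool:
    """Return True if the frame passes the DA, FCS, and length checks."""
    if len(frame) < 14:
        return False                    # too short to even parse a header
    if frame[:6] != my_mac:
        return False                    # DA does not match this station
    body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "little") != fcs:
        return False                    # corrupted frame
    if len(frame) < min_len:
        return False                    # shorter than the minimum length
    return True

mac = bytes.fromhex("0001e8aabbcc")
body = mac + bytes.fromhex("0001e8112233") + bytes(50)   # DA + SA + payload
frame = body + zlib.crc32(body).to_bytes(4, "little")
ok = accept_frame(frame, mac)
```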
Full Duplex Application Environments:
Full duplex operation is most often seen in:
1.) Switch-to-switch connections: increased capacity, meets the two-station LAN requirement for full duplex operation, and supports link lengths in excess of those allowed by CSMA/CD.
2.) Server and router connections: increased capacity; such devices justify the use of dedicated switch ports, even at very high speeds.
3.) Long-distance connections: optical fiber is commonly used, as it supports long distances.
Chapter 8
LAN and Switch Flow Control
The need for flow control:
Both LANs and LAN switches are connectionless in nature. Frames are transferred
without error to a high degree of probability, but there is no absolute assurance of
success.
In the event of a bit error, receiver buffer unavailability, or any other abnormal
occurrence, a receiver simply discards the frame without providing any
notification of the fact. This allows LAN interfaces to be built at very low cost; a
connectionless system is much simpler to implement than a system that includes
mechanisms for error recovery and flow control within the data link.
Default switch behavior
A switch receives frames on its input ports and forwards them onto the appropriate output ports based on information (typically the DA) in the received frame. Depending on the traffic patterns, switch performance limitations, and available buffer memory, it is possible for frames to arrive faster than the switch can receive, process, and forward them. The default behavior of a switch is to discard frames when faced with a congestion condition.
The Effect of Frame Loss
A higher-layer protocol or application that requires reliable delivery must implement some form of error control. TCP, for example, uses a positive acknowledgment and retransmission (PAR) algorithm. In this scheme, data being transferred in one direction between stations is acknowledged in the other. The originating station does not assume that data has been successfully delivered until an acknowledgment has been received. Depending on the transport protocol, a single lost frame can incur the penalty of idling the data transfer for seconds.
Controlling flow in half duplex networks
Half Duplex with Back Pressure
Half-duplex back pressure forces retransmission of incoming packets when a half-duplex switch port is unable to receive them. When back pressure is enabled and no buffers are available to a port, the switch sends collision signals on the affected port, causing the transmitting station to resend the packets. The switch can then use this retransmission time to clear its receive buffer by sending packets already in the queue.
MAC Control
MAC Control frame format:
Preamble (7 bytes)
Start Frame Delimiter (1 byte)
Dest. MAC Address (6 bytes) = 01-80-C2-00-00-01 or a unique DA
Source MAC Address (6 bytes)
Length/Type (2 bytes) = 802.3 MAC Control (88-08)
MAC Control Opcode (2 bytes) = PAUSE (00-01)
MAC Control Parameters (2 bytes) = 00-00 to FF-FF
Reserved (42 bytes) = all zeros
Frame Check Sequence (4 bytes)
PAUSE Function
The PAUSE function is used to implement flow control on full duplex Ethernet links. PAUSE operation uses the MAC Control architecture and frame format. The operation is defined only for use across a single full duplex link; it cannot be used on a shared LAN. It may be used to control data frame flow between:
- A pair of end stations
- A switch and an end station
- A switch-to-switch link
The PAUSE function is specifically designed to prevent switches from unnecessarily discarding frames due to input buffer overflow under short-term transient overload conditions.
PAUSE operation
PAUSE operation implements a very simple stop-start form of flow control. A device wishing to temporarily inhibit incoming data sends a PAUSE frame, with a parameter indicating the length of time that the full duplex partner should wait before sending any more data frames. When a station receives a PAUSE frame, it stops sending data frames for that period.
A station that has issued a PAUSE may cancel the remainder of the pause period by issuing another PAUSE frame with a parameter of zero time.
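A sketch of building a PAUSE frame from the field values in the MAC Control frame format above (preamble/SFD and FCS omitted); the quanta-to-time conversion assumes one pause quantum equals 512 bit times, and the source MAC is a made-up example:

```python
def build_pause(src_mac: bytes, quanta: int) -> bytes:
    """Assemble the DA..Reserved portion of a PAUSE frame."""
    dst = bytes.fromhex("0180c2000001")    # reserved multicast DA
    length_type = bytes.fromhex("8808")    # 802.3 MAC Control
    opcode = bytes.fromhex("0001")         # PAUSE
    params = quanta.to_bytes(2, "big")     # pause time, 0x0000..0xFFFF
    reserved = bytes(42)                   # all zeros
    return dst + src_mac + length_type + opcode + params + reserved

def pause_seconds(quanta: int, rate_bps: int) -> float:
    """Convert a pause parameter to seconds (1 quantum = 512 bit times)."""
    return quanta * 512 / rate_bps

frame = build_pause(bytes.fromhex("0001e8aabbcc"), quanta=0xFFFF)
t = pause_seconds(0xFFFF, 100_000_000)     # maximum pause at 100 Mbps
```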
FLOW CONTROL IMPLEMENTATION ISSUES
Design implications of the PAUSE function
1) Inserting PAUSE frames in the transmit queue
An Ethernet interface normally transmits frames in the order presented by the device driver. Inserting PAUSE frames in a timely manner is important for the effective use of flow control. The transmission of a PAUSE frame cannot preempt a data transmission in progress; therefore, the interface should complete the transmission of any frame in progress, wait the interframe spacing, and then send the requested PAUSE frame.
2) Parsing received PAUSE frames
An interface must inspect and parse the fields in all incoming frames to determine when a valid PAUSE frame has been received in order to act upon it. Fields such as the DA, the Type field, the MAC Control opcode, and the FCS must be checked.
3) PAUSE timing
Following the reception of the PAUSE frame itself (i.e., starting from the end of the last bit of the received FCS), the interface has a maximum of 512 bit times to validate, decode, and act upon the PAUSE frame. If during this time the transmitter begins transmission of a frame, that frame is completed normally.
Chapter 9
LINK AGGREGATION
Why Link Aggregation?
Link Aggregation or trunking is a method of combining physical network links into a single
logical link for increased bandwidth. With Link aggregation we are able to increase the capacity
and availability of the communications channel between devices (both switches and
end stations) using existing Fast Ethernet and Gigabit Ethernet technology. Two or more Gigabit
Ethernet connections are combined in order to increase the bandwidth capability and to
create resilient and redundant links. A set of multiple parallel physical links between two devices
is grouped together to form a single logical link.
Link Aggregation also provides load balancing where the processing and communications
activity is distributed across several links in a trunk so that no single link is overwhelmed.
By taking multiple LAN connections and treating them as a unified, aggregated link, we can
achieve practical benefits in many applications.
Link Aggregation provides the following important benefits:
- Higher link availability
- Increased link capacity
- Improvements are obtained using existing hardware (no upgrading to higher-capacity link technology is necessary)
Aggregating replaces Upgrading
If the link capacity is to be increased, there are usually two possibilities: either upgrade the native link capacity or use an aggregate of two or more lower-speed links. Upgrades typically occur in factors of 10. In many cases, however, the device cannot take advantage of this increase: a 1:10 performance improvement is not achieved, and the bottleneck is just moved from the network link to some other element within the device.
Link aggregation may be less expensive than a native speed upgrade and yet achieve a similar performance level. The hardware costs for a higher-speed link and for the equivalent number of lower-speed connections have to be weighed to decide which approach is the most advantageous.
Sometimes link aggregation may even be the only means to improve performance, when the highest data rate available on the market is not sufficient.
Types of Link Aggregation
There are a number of situations where link aggregation is commonly deployed:
- Switch-to-switch connections
- Switch-to-station (server or router) connections
- Station-to-station connections
Switch-to-Switch Connections
In this scenario, multiple workgroups are joined to form one aggregated link. By aggregating multiple links, a higher-speed connection can be achieved without a hardware upgrade.
Switch-to-Station (Server or Router) Connections
Most server platforms can saturate a single 100 Mb/s link with many of the applications available today. Thus, link capacity becomes the limiting factor for overall system performance.
Station-to-Station Connections
In the case of aggregation directly between a pair of end stations, no switches are involved at all. As in the station-to-switch case, the higher performance channel is created without having to upgrade to higher-speed LAN hardware. In some cases, higher-speed NICs may not even be available for a particular server platform, making link aggregation the only practical choice for improved performance.
Physical issues in Link Aggregation
Addressing
Each network interface controller is assigned a unique MAC address, usually programmed
into a ROM during manufacturing. During initialization, the device driver reads the contents
of the ROM and transfers the address to a register within the MAC controller. In most cases,
this address is used as the source address in transmitted frames and as the destination address
in frames sent to the device. An aggregated link is to appear as a single link with a single
logical network interface and therefore has only one "virtual" MAC address. The MAC address of
one of the interfaces belonging to the aggregation provides this "virtual" address for the
logical link.
Frame Distribution (transmission of frames)
When applying WAN technologies, frames are sometimes broken into smaller units to accelerate
transmission. LAN communications channels, however, do not support sub-frame transfers: a
complete frame must be sent through a single physical link. With aggregated links, the task is
therefore to select the link on which to transmit a given frame. Sending one long frame may take
longer than sending several short ones, so short frames sent later could arrive before an earlier
long frame, and the order would have to be restored at the receiver. Thus a convention is adopted:
all frames belonging to one conversation are transmitted through the same physical link, which
guarantees correct ordering at the receiving end station; for this reason, no sequencing
information needs to be added to the frames. Traffic belonging to separate conversations can be
distributed freely across the links. The algorithm for assigning frames to a conversation
depends on the application environment and the kind of devices used at each end of the link.
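The conversation-to-link assignment described above can be sketched as a deterministic hash over address fields. This is an illustrative sketch, not the algorithm from the book or any standard: the choice of hashing over the MAC address pair and the trivial checksum are assumptions, and a real implementation would use a stronger hash and might fold in higher-layer fields.

```python
def select_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Map a conversation (identified here by its MAC address pair)
    to one physical link of the aggregation. Because the hash is
    deterministic, all frames of the same conversation go out the
    same link, preserving frame order at the receiver."""
    key = (src_mac + dst_mac).encode()
    digest = sum(key)  # trivial checksum; any deterministic hash works
    return digest % num_links
```

Frames of different conversations may land on different links, which is exactly how the aggregate gains capacity without per-frame sequencing.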
When a conversation is to be transferred to another link because the originally mapped link
is out of service (failed or configured out of the aggregation) or a new link has become available
relieving the existing ones, precautions have to be taken to avoid mis-ordering of
frames at the receiver. This can be realized either by means of a delay time the distributor
must determine somehow or through an explicit marker protocol that searches for a marker
identifying the last frame of a conversation. The distributor inserts a “marker message” behind
the last frame of a conversation. After the collector receives this “marker message” it
sends a response to the distributor, which then knows, that all frames of the conversation
have been delivered. Now the distributor can send frames of these types of conversations
via a new link without delay. If the conversation is to be transferred to a new link, because
the originally mapped link failed, this method will not work. There is no path on which the
message marker can be transferred, i.e. the distributor has to employ the timeout method.
Technology Constraints
In principle, the devices at the ends of the aggregation limit the achievable throughput. Using
an aggregation of four 100 Mb/s links instead of one 100 Mb/s link increases the total capacity,
but the throughput available on each individual link remains the same.
CHAPTER-11
Virtual LANs:Applications and Concepts
VLAN (Virtual LAN): Virtual LAN - Virtual Local Area Network. A division of a local
area network by software rather than by physical arrangement of cables. Division of the
LAN into subgroups can simplify and speed up communications within a workgroup.
Switching a user from one virtual LAN to another via software is also easier than
rewiring the hardware. The stations on the same VLAN group can communicate with
each other. With VLAN, a station cannot directly talk to or hear from stations that are not
in the same VLAN group(s)
Applications of VLAN:
1.) Software patch panel: This simple application requires only port-based VLANs. With a
centralized wiring center, connections between equipment on the LAN are made by patch
cord interconnections on a wiring panel. Moving, adding, or changing a station can thus be
achieved by reconfiguring the switch (the "software patch panel") rather than physically rewiring.
2.) LAN Security: A user on a shared LAN can create problems by sending large volumes of
traffic to targeted users, degrading their performance. By creating logical partitions
in the catenet with VLAN technology we strengthen the protection against unwanted traffic.
A port-based VLAN allows free communication among the members of a given VLAN, but does
not forward traffic between switch ports associated with members of different VLANs.
3.) User Mobility:
a.) User’s view of the network can stay consistent regardless of physical location.
b.)Network layer addresses may not need to be changed based on physical location.
c.)Mobile users are granted access privileges so that they can access their home servers.
4.) Bandwidth Preservation: VLAN technology will isolate traffic between logically
separated workgroups, thus preserving bandwidth.
VLAN Concepts:
A station can be in multiple VLANs, depending upon the capabilities of the stations and
switches deployed and the applications operating within the station. VLAN-aware devices look
at frames and classify each frame as belonging to a particular VLAN based on a set of VLAN
association rules; they just need to apply the rules and classify frames as belonging to
one VLAN or another.
VLAN Tagging:
Implicit Tags: No tag is involved; the frame is unmodified, as sent by any station
or switch. All frames sent by VLAN-unaware end stations are considered implicitly
tagged, and their VLAN association is derived from a set of VLAN association rules: it can be a
function of protocol type, data link source address, higher-layer network identifiers, etc. If
no explicit tag is present, a VLAN-aware switch must determine the
VLAN association by applying these rules.
Explicit Tags: An explicit tag is a predefined field in a frame that carries the VLAN
identifier for that frame. These tags are applied by VLAN-aware devices; a device
receiving an explicitly tagged frame does not need to re-apply the association rules.
The tag comprises:
Tagged Frame Type - indicates the type of tag; for Ethernet frames this is
currently always 0x8100.
Priority - ranges from binary 000 (0) for low priority to binary 111 (7) for high
priority.
Canonical Format Indicator - always 0 for Ethernet.
VLAN ID - identifies the VLAN number when trunking VLANs.
VLAN Awareness:
1.) Making frame forwarding decisions based on the VLAN association of a given
frame (i.e., based on the DA and also on the VLAN to which the frame belongs).
2.) Providing explicit VLAN identification within transmitted frames.
VLAN-Aware Switches:
Edge Switches: These switches sit at the boundary between the VLAN-unaware domain
and the VLAN-aware domain. An edge switch applies the association rules to every frame and
tags the frames before forwarding them into the backbone through the core switches. It also
removes the inserted tag before forwarding a frame back into the VLAN-unaware domain.
Core Switches: These switches connect VLAN-aware devices. They do not tag or untag frames;
they forward frames purely on the basis of the VLAN identifier in the tag. A core switch
maintains a table that maps VLAN identifiers to the set of ports needed to reach
the members of each VLAN; the depth of this table is fixed at 4,094 entries.
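The core switch's VID-to-port-set table can be sketched as a simple lookup. The table contents and port numbers here are hypothetical; only the structure (VLAN ID mapping to a set of egress ports, with the ingress port excluded) reflects the text above.

```python
# Hypothetical core-switch table: VLAN ID -> set of ports that
# reach members of that VLAN (at most 4094 entries in practice).
vlan_table = {
    10: {1, 2, 5},
    20: {3, 4},
}

def output_ports(vid: int, ingress_port: int) -> set:
    """Return the ports on which a frame tagged with this VID may be
    forwarded, excluding the port it arrived on. An unknown VID
    yields the empty set (the frame is filtered)."""
    return vlan_table.get(vid, set()) - {ingress_port}
```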
VLAN-Aware End Stations (Advantages):
1.) A set of stations may negotiate a dynamically created VLAN for the purpose of
carrying on a short-term audio or video conference, and the conferencing application
can tag frames for that conference with a unique VLAN identifier.
2.) A frame sent by a station will reach only members of that same VLAN.
3.) If all frames carry VLAN tags, then all switches effectively become core switches;
that is, switches make decisions based on VLAN tag information alone.
VLAN awareness in end stations (methods):
1.) Applications themselves need to be written to be VLAN aware.
2.) APIs need to be enhanced to support passing VLAN information to and from
applications.
3.) Device drivers for LAN interfaces need to be changed to allow a client to specify a
VLAN in addition to the other information needed to send frames on its behalf.
4.) VLAN tags need to be inserted within transmitted frames; this is implemented in the
device driver or in a VLAN-aware NIC.
VLAN-Unaware Switches: These switches are not capable of tagging or untagging. A
VLAN-unaware switch can still process a VLAN-tagged frame based on the addresses
in the frame.
VLAN Association Rules: (Mapping frames to VLANs)
1.) Port-based VLAN mapping: Stations within a given VLAN can freely communicate among
themselves; no communication is possible between stations connected to ports that are
members of different VLANs. This mapping is used for the software patch panel and
provides bandwidth preservation. (This is the mapping used in Force10.)
2.) MAC address VLAN mapping: The switch uses the source address to determine VLAN
membership. The same lookup process used to learn the port mapping for a station is
used to determine its VLAN mapping.
3.) Protocol-based VLAN mapping: A switch with protocol-based VLANs divides the physical
network into logical VLAN groups, one for each required protocol. When a frame is received at a
port, its VLAN membership is determined from the protocol type of the inbound packet.
Protocol-based mapping allows a station to be a member of multiple VLANs, one for each protocol
it supports (IP, IPX, AppleTalk, etc.). The VLAN mapping is a function of both the source
address and the encapsulated protocol.
4.) IP subnet-based VLAN mapping: In this type of mapping the VLANs are divided according to IP
subnets. A VLAN-aware switch needs to perform two operations to create IP subnet-based VLANs:
a.) Check whether the frame encapsulates an IP datagram.
b.) Extract the subnet portion of the IP source address in the encapsulated datagram.
5.) Application-based VLAN mapping: Here the VLANs are divided according to higher-layer
application processes; the applications might provide audio or video conferencing, group document
preparation, and so on. Application-based VLANs require that the station be VLAN aware: the
application ensures that each frame carries the VLAN identifier in an explicit tag, so that VLAN-aware
switches never need to parse frames to determine the application and can simply switch frames
based upon the VLAN identified in the tag.
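The association rules above can be sketched as a classifier that checks the rules in order of precedence. The precedence (MAC-based, then protocol-based, then the port default) and all the table contents are illustrative assumptions; real switches make the rule set and order configurable.

```python
def classify_frame(ingress_port, src_mac, ethertype,
                   mac_map, proto_map, port_default):
    """Assign a VLAN to a frame that arrived without an explicit tag.
    Rule order here is an assumption: MAC-based mapping first,
    then protocol-based, falling back to the port-based default."""
    if src_mac in mac_map:              # MAC address VLAN mapping
        return mac_map[src_mac]
    if ethertype in proto_map:          # protocol-based VLAN mapping
        return proto_map[ethertype]
    return port_default[ingress_port]   # port-based VLAN mapping
```

For example, with `proto_map = {0x0800: 10, 0x8137: 20}` an IP frame (EtherType 0x0800) lands in VLAN 10 and an IPX frame (0x8137) in VLAN 20, while an unmatched frame gets its ingress port's default VLAN.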
Chapter 12
Virtual LANs: The IEEE Standard
VLAN: Virtual Local Area Network and IEEE 802.1Q
A Virtual LAN (VLAN) is a group of devices on one or more LANs that are configured so that they
can communicate as if they were attached to the same wire, when in fact they are located on a
number of different LAN segments. Because VLANs are based on logical rather than physical
connections, they are very flexible for user/host management, bandwidth allocation, and
resource optimization.
There are the following types of Virtual LANs:
1. Port-Based VLAN: each physical switch port is configured with an access list
specifying membership in a set of VLANs.
2. MAC-based VLAN: a switch is configured with an access list mapping individual
MAC addresses to VLAN membership.
3. Protocol-based VLAN: a switch is configured with a list mapping layer 3
protocol types to VLAN membership, thereby segregating traffic from
end stations according to the protocol they use (for example, IP versus IPX).
The IEEE 802.1Q specification establishes a standard method for tagging Ethernet frames
with VLAN membership information. The IEEE 802.1Q standard defines the operation
of VLAN Bridges that permit the definition, operation and administration of Virtual LAN
topologies within a Bridged LAN infrastructure. The 802.1Q standard is intended to
address the problem of how to break large networks into smaller parts so broadcast and
multicast traffic would not grab more bandwidth than necessary. The standard also helps
provide a higher level of security between segments of internal networks
Protocol Structure - VLAN: Virtual Local Area Network and the IEEE 802.1Q
IEEE 802.1Q Tagged Frame for Ethernet (field lengths in bytes):
Preamble | SFD | DA | SA | TPID | TCI | Type/Length | Data    | CRC
   7     |  1  |  6 |  6 |  2   |  2  |     2       | 42-1496 |  4
TPID - defined value of 8100 in hex. When a frame has the EtherType equal to
8100, this frame carries the tag IEEE 802.1Q / 802.1P.
TCI - Tag Control Information field including user priority, Canonical format
indicator and VLAN ID.
• Tag-based VLAN Overview
According to the IEEE 802.1Q standard, tag-based VLANs use an extra tag in the MAC header
to identify the VLAN membership of a frame across bridges. This tag is used for VLAN
and QoS (Quality of Service) priority identification. VLANs can be created statically
by hand or dynamically through GVRP. The VLAN ID associates a frame with a specific
VLAN and provides the information that switches need to process the frame across the
network. A tagged frame is four bytes longer than an untagged frame: two
bytes of TPID (Tag Protocol Identifier, residing within the type/length field of the
Ethernet frame) and two bytes of TCI (Tag Control Information, starting after the source
address field of the Ethernet frame).
• TPID : TPID has a defined value of 8100 in hex. When a frame has the
EtherType equal to 8100, this frame carries the tag IEEE 802.1Q / 802.1P.
• Priority: The first three bits of the TCI define user priority, giving eight (2^3)
priority levels. IEEE 802.1P defines the operation for these 3 user priority bits.
• CFI: The Canonical Format Indicator is a single-bit flag, always set to zero for
Ethernet switches. CFI is used for compatibility between Ethernet-type
and Token Ring-type networks. If a frame received at an Ethernet port has
CFI set to 1, that frame should not be forwarded unmodified to an untagged port.
• VID: The VLAN ID identifies the VLAN and is the central field used by the
802.1Q standard. It has 12 bits, allowing 4,096 (2^12) values. Of these,
a VID of 0 identifies priority frames and the value 4095 (0xFFF) is reserved,
so at most 4,094 VLANs can be configured.
Note that user priority and VLAN ID are independent of each other. A frame with VID
(VLAN Identifier) of null (0) is called a priority frame, meaning that only the priority
level is significant and the default VID of the ingress port is given as the VID of the
frame.
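The TCI bit layout described above (3 priority bits, 1 CFI bit, 12 VID bits) can be shown with a few bit operations. This sketch only illustrates the field packing; the function names are mine, not from the book.

```python
def pack_tci(priority: int, cfi: int, vid: int) -> int:
    """Build the 16-bit Tag Control Information field:
    3 bits of user priority | 1 CFI bit | 12 bits of VLAN ID."""
    assert 0 <= priority <= 7 and cfi in (0, 1) and 0 <= vid <= 4095
    return (priority << 13) | (cfi << 12) | vid

def unpack_tci(tci: int):
    """Split a 16-bit TCI back into (priority, cfi, vid)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF
```

For instance, priority 7, CFI 0, VID 100 packs to 0xE064; note that priority and VID occupy independent bit fields, which is why a priority frame can carry VID 0 while still signaling a priority level.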
• How 802.1Q VLAN works
According to the VID information in the tag, the switch forwards and filters frames
among its ports; ports with the same VID can communicate with each other. The IEEE
802.1Q VLAN function comprises three tasks: the Ingress Process, the Forwarding
Process, and the Egress Process.
1. Ingress Process:
Each port is capable of passing tagged or untagged frames. The Ingress Process identifies
whether an incoming frame contains a tag and classifies the frame as belonging to a
VLAN. Each port has its own ingress rule. If the ingress rule accepts tagged frames only, the
port drops all incoming untagged frames; if it accepts all frame
types, the port admits both tagged and untagged frames:
• When a tagged frame is received on a port, it carries a tag header with an
explicit VID, and the Ingress Process passes it directly to the Forwarding
Process.
• An untagged frame carries no VID of its own. When an untagged
frame is received, the Ingress Process inserts a tag containing the PVID into the
frame. Each physical port has a default VID called the PVID (Port VID), which is
assigned to untagged frames and to priority-tagged frames (frames with a null
(0) VID) received on that port.
After the Ingress Process, every frame carries a 4-byte tag with VID information and
proceeds to the Forwarding Process.
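The ingress rules above can be sketched as a small function. Representing a frame as a dict with a `vid` key is an assumption for illustration; the drop/tag decisions follow the rules in the text.

```python
def ingress(frame: dict, pvid: int, accept_tagged_only: bool):
    """Ingress Process sketch. `frame['vid']` is None for an
    untagged frame and 0 for a priority-tagged frame. Returns the
    (possibly re-tagged) frame, or None if the frame is dropped."""
    vid = frame.get("vid")
    if vid is None:                 # untagged frame
        if accept_tagged_only:
            return None             # ingress rule: drop untagged
        frame["vid"] = pvid         # classify with the port's PVID
    elif vid == 0:                  # priority-tagged frame
        frame["vid"] = pvid         # PVID applies here as well
    return frame
```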
2. Forwarding Process:
The Forwarding Process decides whether to forward received frames according to the Filtering
Database. For a tagged frame to be forwarded out a certain port, that port must be an egress
port for the frame's VID. An egress port is an outgoing port for the
specified VLAN; that is, frames tagged with the specified VID may go through it. The
Filtering Database stores and organizes the VLAN registration information used for
switching frames to and from switch ports. It consists of static registration entries (the
Static VLAN or SVLAN table) and dynamic registration entries (the Dynamic VLAN or DVLAN
table). The SVLAN table is manually added and maintained by the administrator; the DVLAN
table is learned automatically via the GVRP protocol and cannot be created or updated by
the administrator.
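The Filtering Database's split into static (SVLAN) and dynamic (DVLAN) entries can be sketched as two tables consulted together. Taking the union of both tables for a lookup is an assumption made for illustration.

```python
class FilteringDatabase:
    """Sketch of a Filtering Database: static entries are configured
    by the administrator (SVLAN), dynamic entries are learned via
    GVRP (DVLAN) and are not administrator-editable."""
    def __init__(self):
        self.static = {}   # vid -> set of egress ports (SVLAN table)
        self.dynamic = {}  # vid -> set of egress ports (DVLAN table)

    def egress_ports(self, vid):
        # A port is an egress port for a VID if either table lists it;
        # an unknown VID yields the empty set (frame filtered).
        return self.static.get(vid, set()) | self.dynamic.get(vid, set())
```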
CHAPTER 13
Priority Operation
Priority operation adds complexity to switches, and there is no need to pay for this
complexity unless there is an application benefit to be gained. There are two situations to
consider:
1.) The catenet cannot handle the steady-state traffic load offered by its users. This occurs
when some link or switch in the catenet has inadequate capacity to support the desired
application data flows; a steady-state problem will also occur if a switch does not support
wire-speed operation at the higher data rate. Priority cannot help here; the solution is to add
capacity to the network.
2.) The catenet has sufficient capacity for the steady-state traffic load, but not for short-term
peak loads. Regardless of the design of the catenet, there can be times when the offered load
exceeds the capacity of some link or switch. This is where priorities come into play: the more
important traffic streams can be given priority over the less important ones. Priority helps
only under such overload conditions.
LAN Priority Mechanisms:
1.) Access priority: Giving a particular station priority access to a shared LAN.
(a) Static: The station is given priority all the time.
(b) Dynamic: Priority is granted on a frame-by-frame basis, depending on the applications running.
2.) User priority: The priority assigned to a given frame by the application sourcing
those frames.
For Ethernet access priority, some of the methods employed are:
1.) Shortened interframe gap: By reducing the IFG, the favored traffic gets onto the wire
sooner than other traffic.
2.) Modified backoff algorithm: When a collision occurs, the device with the shortened
backoff time will transmit its frames sooner than the other stations involved in the
collision.
3.) Lengthened preamble: The longer the preamble, the higher the priority; the device with
the longest preamble ignores collisions and continues with its frame transmission.
VLAN and Priority Tagging:
Tagged Frame Type - indicates the type of tag; for Ethernet frames this is
currently always 0x8100.
Priority - ranges from binary 000 (0) for low priority to binary 111 (7) for high
priority.
Canonical Format Indicator - always 0 for Ethernet.
VLAN ID - identifies the VLAN number when trunking VLANs.
In order to use priority mechanisms:
1.) The operating system and protocol stack have to be modified.
2.) APIs in the protocol stacks have to be modified.
3.) Protocol implementations within the end stations may have to be enhanced.
4.) Operating system code (NIC APIs, network device drivers) has to be modified.
5.) Network interfaces have to be modified.
Edge Switches: These switches sit on the boundary between the priority unaware
world and the priority aware core. They provide attachments for end stations directly.
Core Switches: These typically provide backbone interconnections between the edge
switches.
Priority Operation in switches:
If we don’t invoke any priority mechanisms, the operation of a switch is quite
straightforward; the switch handles all frames equally. The whole idea of priority is to
allow frames that are more important to jump ahead of lower priority frames in the
queue.
Switch process flow for priority operation is a three-step process:
1.) Determining frame priority on input: On receipt of a frame, the switch must
determine the priority of that frame, either from explicit priority information provided
in the frame itself or implicitly from the frame contents and a set of administrative
policy rules.
2.) Mapping input priority to class of service: Knowing the priority of the frame, the
switch must map that priority to one of the classes of service available at each output
port on which the frame is to be forwarded. Typically, each service class identifies a
particular output queue on each port.
3.) Output scheduling: Given a set of multiple output queues, the switch must apply
a scheduling algorithm to transmit frames from those queues according to the
needs of the classes of service that they represent.
Scheduling Algorithms:
1.) Strict priority: As the name implies, this interprets priority literally: higher-priority
queues are served first, and lower-priority queues only when the higher ones are empty. It is
the easiest policy to implement, but if a high-priority user offers more load than the
capacity of the output port, no frames will be transmitted from the lower-priority
queues; in the extreme case, all of their frames will be discarded.
2.) Weighted fair queuing: An alternative approach that does not exclude lower-priority
queues completely. A weight is assigned to each queue, with higher-priority
queues given greater weight than lower-priority queues; the output scheduler
then uses a round-robin algorithm tempered by the indicated weights. Weights are
usually assigned according to the bandwidth allocated to each queue. That is,
if all queues have traffic to send, the available bandwidth will be divided among
them by the ratio of their weights.
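The two scheduling policies can be contrasted in a few lines. This is a minimal sketch: real schedulers work per output port in hardware, and the weighted variant here is the simple weighted round-robin form of the idea.

```python
def strict_priority_pick(queues):
    """Serve the highest-priority non-empty queue first; `queues`
    is ordered highest priority first. A busy high-priority queue
    can therefore starve the lower ones completely."""
    for i, q in enumerate(queues):
        if q:
            return i, q.pop(0)
    return None  # all queues empty

def weighted_round_robin(queues, weights):
    """One service round: each queue may send up to its weight in
    frames, so lower-priority queues are never fully excluded."""
    sent = []
    for q, w in zip(queues, weights):
        for _ in range(w):
            if q:
                sent.append(q.pop(0))
    return sent
```

With queues `[["h1","h2","h3"], ["l1","l2"]]` and weights `[2, 1]`, one round sends two high-priority frames and one low-priority frame, dividing bandwidth 2:1 as described above.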
Indicating the priority in transmitted frames:
On input, we made a priority determination and possibly remapped that priority to a
globally consistent set of semantics. On output, we have three choices:
1.) Signal the user priority in a VLAN-style tag: This relieves the next device from
having to make an implicit priority determination from a set of administrative rules.
The tagging approach requires that the output port support tagged frames.
2.) Signal the user priority in a LAN-specific manner: This method is used when the output
port does not support tags but supports a native indication of user priority.
3.) Don't signal user priority: On Ethernet ports without tag support, there is no
choice but to forward the frame without priority; the next device to receive the
frame will need to determine its priority through implicit means.
Priority Regeneration:
The IEEE 802.1p and 802.1Q standards provide for priority regeneration. Priority
regeneration is used only when explicit priority is indicated in received frames
through a native priority field. It can be used not only to equalize
service levels among departments but also to change or override the local administrative
policy. Priority regeneration provides an easy means of migrating and merging priority-enabled
LANs into a larger catenet without having to change all of the local
administrative policies at once.
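Priority regeneration amounts to a per-port mapping table from the received user priority to the locally administered value. The specific mapping values below are purely illustrative, not recommended defaults.

```python
# Hypothetical per-port regeneration table: received user priority ->
# regenerated priority used inside the local catenet.
regen_table = {0: 0, 1: 0, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 6}

def regenerate(received_priority: int) -> int:
    """Override the explicit priority signaled in a received tagged
    frame with the value chosen by local administrative policy."""
    return regen_table[received_priority]
```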
IEEE 802.1p:
IEEE 802.1P defines a priority field that can be used by LAN switches and such at
the Ethernet level to prioritize traffic.
The prioritization specification works at the media access control (MAC) framing
layer (OSI model layer 2). The 802.1P standard also offers provisions to filter
multicast traffic to ensure it does not proliferate over layer 2-switched networks.
The 802.1p header includes a three-bit field for prioritization, which allows packets to
be grouped into various traffic classes. The IEEE has made broad recommendations
concerning how network managers can implement these traffic classes, but it stops
short of mandating the use of its recommended traffic class definitions. It can also be
defined as best-effort QoS (Quality of Service) or CoS (Class of Service) at Layer 2
and is implemented in network adapters and switches without involving any
reservation setup. 802.1p traffic is simply classified and sent to the destination; no
bandwidth reservations are established.
********************************************************