Final Review
EE 122: Intro to Communication Networks
Fall 2006 (MW 4-5:30 in Donner 155)
Vern Paxson
TAs: Dilip Antony Joseph and Sukun Kim
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley
2
Announcements
• Additional office hours
  – Sukun next week: Friday 1-3PM
  – Dilip: likely next Weds & Thurs (along with regular Fri)
  – Me next week: regular Weds + by appointment
• Course evaluations today ~5:20PM
  – No 5-minute break during this lecture
3
Final Review
• Saturday Dec. 16, 8AM-11AM, in 2 Le Conte
  – Near Evans / Bancroft Library
• Closed book
• You can have one regular-sized (8.5”x11”) sheet of paper with notes on both sides
• No PDAs, calculators, electronic/Internet gadgets, smart cell phones, jeweler’s loupes, etc.
• No Blue Books - all answers on exam sheets
• Ensure legibility (pencil + eraser)
• Emphasis is on material since midterm
4
Fundamental Challenges for Networking
• Speed-of-light
• Desiring a pervasive global network
• Need for it to work efficiently/cheaply
• Failure of components
• Enormous dynamic range
  – “no such thing as typical”
• Disparate parties must work together
• Rapid growth/evolution
• Crooks & other bad guys
5
Avoiding Manual Configuration
• Dynamic Host Configuration Protocol (DHCP)
  – End host learns how to send packets
  – Learn IP address, DNS servers, “gateway”, what’s local
• Address Resolution Protocol (ARP)
  – For local destinations, learn mapping between IP address and MAC address
[Figure: two LANs of hosts and DNS servers joined by routers. One LAN is 1.2.3.0/23 (netmask 255.255.254.0) with hosts 1.2.3.7, 1.2.3.156, 1.2.3.48, 1.2.3.19 and example MAC address 1A-2F-BB-76-09-AD; the other is 5.6.7.0/24.]
6
Key Ideas in Both Protocols
• Broadcasting: when in doubt, shout!
• Caching: remember the past for a while
• Soft state: eventually forget the past
  – Key for robustness in the face of unpredictable change
7
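Dynamic Host Configuration Protocol
[Figure: message exchange between an arriving client and the DHCP server (203.1.2.5)]
  Client → network: DHCP discover (broadcast)
  Server → client: DHCP offer
  Client → network: DHCP request (broadcast)
  Server → client: DHCP ACK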
8
Figuring Out Where To Send Locally
• Two cases:
  – Destination is on the local network
      So need to address it directly
  – Destination is not local (“remote”)
      Need to figure out the first “hop” on the local network
• Determining if it’s local: use the netmask
  – E.g., mask destination IP address w/ 255.255.254.0
  – Is it the same value as when we mask our own address?
      Yes = local
      No = remote
[Figure: same two-LAN topology as on slide 5 (1.2.3.0/23 with netmask 255.255.254.0, and 5.6.7.0/24)]
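The netmask test above can be sketched in a few lines. This is only an illustration using Python's standard `ipaddress` module; the function name `is_local` is made up for this example, and the addresses come from the figure.

```python
import ipaddress

def is_local(own_ip, dest_ip, netmask):
    """Return True if dest_ip is on the same network as own_ip.

    Hypothetical helper: masks both addresses with the netmask and
    compares the results, as described on the slide.
    """
    own = int(ipaddress.IPv4Address(own_ip))
    dest = int(ipaddress.IPv4Address(dest_ip))
    mask = int(ipaddress.IPv4Address(netmask))
    return (own & mask) == (dest & mask)

# Host 1.2.3.7 on the 1.2.3.0/23 network from the figure
print(is_local("1.2.3.7", "1.2.3.156", "255.255.254.0"))  # local: send directly
print(is_local("1.2.3.7", "5.6.7.8", "255.255.254.0"))    # remote: send to the gateway
```

A local destination is then resolved with ARP; a remote one is sent to the gateway's MAC address.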
9
Address Resolution Protocol
• Every node maintains an ARP table
  – <IP address, MAC address> pair
• Consult the table when sending a packet
• But: what if IP address not in the table?
  – Sender broadcasts: “Who has IP address 1.2.3.156?”
  – Receiver responds: “MAC address 58-23-D7-FA-20-B0”
  – Sender caches result in its ARP table
• Link-layer protocol (RFC826)
  – Not IP (or UDP or TCP over IP) because IP requires that you already know the destination IP address
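A minimal sketch of an ARP table with soft state (entries are cached, then forgotten after a while). The class name `ArpCache`, the 60-second lifetime, and the injectable clock are assumptions for illustration, not values from the slides or from RFC 826.

```python
import time

class ArpCache:
    """Toy <IP address, MAC address> table whose entries expire (soft state)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds      # assumed entry lifetime
        self.clock = clock          # injectable so the example can be tested
        self.table = {}             # ip -> (mac, expiry time)

    def learn(self, ip, mac):
        # Called when an ARP reply arrives: cache the mapping for a while.
        self.table[ip] = (mac, self.clock() + self.ttl)

    def lookup(self, ip):
        # Returns the cached MAC, or None if the sender must broadcast a request.
        entry = self.table.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self.clock() >= expires:
            del self.table[ip]      # eventually forget the past
            return None
        return mac
```

A `None` result corresponds to the "Who has IP address ...?" broadcast step.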
10
Example: A Sending a Packet to B
How does host A send an IP packet to host B?
[Figure: hosts A and B connected through router R]
11
Security Analysis of ARP
• Impersonation
  – Any node that hears request can answer …
  – … and can say whatever they want
• Actual legit receiver never sees a problem
  – Because even though later packets carry its IP address, its NIC doesn’t capture them since not its MAC address
• Or: Man-in-the-middle attack
  – Imposter forwards everything it receives for destination but gets to inspect (& maybe alter) it first
• Does the attacker have to “win” a race?
  – Maybe not, if sender blindly believes ARP responses
• Different attack: overflow ARP table, force evictions
12
Internet Control Message Protocol
• ICMP runs on top of IP
  – Viewed as an integral part of IP
      Not viewed as a transport protocol
• Diagnostics
  – Triggered when an IP packet encounters a problem
      E.g., Time Exceeded or Destination Unreachable
  – ICMP packet sent back to the source IP address
      Includes the error information (e.g., type and code) …
      … and IP header plus 8+ byte excerpt from original packet
  – Source host receives the ICMP packet
      Inspects excerpt (e.g., protocol and ports) …
      … to identify which socket should receive the error
13
Path MTU Discovery
• MTU = Maximum Transmission Unit
  – Largest IP packet that a link supports
• Path MTU (PMTU) = minimum end-to-end MTU
  – Sender must keep datagrams no larger to avoid fragmentation
• How does the sender know what the PMTU is?
• Strategy (RFC 1191):
  – Try a desired value
  – Set DF to prevent fragmentation
  – Upon receiving Need Fragmentation ICMP …
      … oops, that didn’t work, try a smaller value
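The strategy can be sketched as a small simulation. This is an illustration, not a real implementation: `discover_pmtu` is a made-up name, the path is just a list of link MTUs, and the sketch assumes the router's Need Fragmentation message reports the MTU of the link that blocked the packet (the behavior RFC 1191 asks of routers).

```python
def discover_pmtu(link_mtus, first_try):
    """Simulate a sender probing a path with DF set.

    link_mtus: MTU of each link along the path, in order (assumed input).
    Returns the discovered path MTU and the list of sizes tried.
    """
    size = first_try
    tried = [size]
    while True:
        # The first link too small for the packet makes a router send back
        # Need Fragmentation, carrying that link's MTU (assumption above).
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking is None:
            return size, tried      # packet got through unfragmented
        size = blocking             # oops, that didn't work: try smaller
        tried.append(size)

print(discover_pmtu([1500, 1400, 576, 1500], first_try=1500))
```

Each failed probe strictly shrinks the size, so the loop ends at the minimum MTU along the path.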
14
traceroute to www.whitehouse.gov (204.102.114.49), 30 hops max, 40 byte packets
 1 cory115-1-gw.EECS.Berkeley.EDU (128.32.48.1) 0.829 ms 0.660 ms 0.565 ms
 2 cory-cr-1-1-soda-cr-1-2.EECS.Berkeley.EDU (169.229.59.233) 0.953 ms 0.857 ms 0.727 ms
 3 soda-cr-1-1-soda-br-6-2.EECS.Berkeley.EDU (169.229.59.225) 1.461 ms 1.260 ms 1.137 ms
 4 g3-8.inr-202-reccev.Berkeley.EDU (128.32.255.169) 1.402 ms 1.298 ms *
 5 ge-1-3-0.inr-002-reccev.Berkeley.EDU (128.32.0.38) 1.428 ms 1.889 ms 1.378 ms
 6 oak-dc2--ucb-ge.cenic.net (137.164.23.29) 1.731 ms 1.643 ms 1.680 ms
 7 dc-oak-dc1--oak-dc2-p2p-2.cenic.net (137.164.22.194) 3.045 ms 1.640 ms 1.630 ms
 8 * * *
 9 dc-lax-dc1--sac-dc1-pos.cenic.net (137.164.22.126) 13.104 ms 13.163 ms 12.988 ms
10 137.164.22.21 (137.164.22.21) 13.328 ms 42.981 ms 13.548 ms
11 dc-tus-dc1--lax-dc2-pos.cenic.net (137.164.22.43) 18.775 ms 17.469 ms 21.652 ms
12 a204-102-114-49.deploy.akamaitechnologies.com (204.102.114.49) 18.137 ms 14.905 ms 19.730 ms
Annotations:
• Hop 4, third probe (*): Lost Reply
• Hop 8 (* * *): Router doesn’t send ICMPs
• Hop 10: No PTR record for address
• Hop 12: Final Hop
15
Link State Routing
• Each router has a complete picture of the network
• How does each router get the global state?
  – Each router reliably floods information about its neighbors to every other router (more later)
• Each router independently calculates the shortest path from itself to every other router
  – Dijkstra’s Shortest Path Algorithm
[Figure: Hosts A-E attached to a network of nodes N1-N7, with the A-E topology map repeated once per node]
16
Dijkstra’s Algorithm
Initialization:
  S = {A};
  for all nodes v
    if v adjacent to A
      then D(v) = c(A,v);
      else D(v) = ∞;

Loop
  find w not in S such that D(w) is a minimum;
  add w to S;
  update D(v) for all v adjacent to w and not in S:
    D(v) = min( D(v), D(w) + c(w,v) );
    // new cost to v is either old cost to v or known
    // shortest path cost to w plus cost from w to v
until all nodes in S;
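The pseudocode above can be turned into a short runnable sketch. This version uses a priority queue to "find w not in S such that D(w) is a minimum", a common implementation choice rather than something the slide prescribes; the example graph and its costs are invented for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs from source.

    graph: dict mapping node -> {neighbor: link cost} (assumed format).
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    done = set()                      # the set S from the pseudocode
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)    # w not in S with minimum D(w)
        if w in done:
            continue                  # stale heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist[v]:
                dist[v] = d + cost    # D(v) = min(D(v), D(w) + c(w,v))
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical four-node topology
example = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
print(dijkstra(example, "A"))
```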
17
When to Initiate Flooding
• Topology change
  – Link or node failure
  – Link or node recovery
• Configuration change
  – Link cost change
• Periodically
  – Refresh the link-state information
  – Typically (say) 30 minutes
  – Corrects for possible corruption of the data
18
Distance Vector Routing
• Each router knows the links to its immediate neighbors
  – Does not flood this information to the whole network
• Each router has some idea about the shortest path to each destination
  – E.g.: Router A: I can get to router B with cost 11 via next hop router D
  – Routers exchange this information with their neighboring routers
      Again, no flooding the whole network
  – Routers update their idea of the best path using info from neighbors
19
Distance Vector Algorithm (cont’d)
Initialization:
  for all nodes V do
    if V adjacent to A
      D(A, V) = c(A,V);
    else
      D(A, V) = ∞;
loop:
  wait (until A sees a link cost change to neighbor V
        or until A receives update from neighbor V)
  if (D(A,V) changes by d)
    for all destinations Y through V do
      D(A,Y) = D(A,Y) + d
  else if (update D(V, Y) received from V)
    /* shortest path from V to some Y has changed */
    D(A,Y) = D(A,V) + D(V, Y);
  if (there is a new minimum for destination Y)
    send D(A, Y) to all neighbors
forever
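As a rough sketch of the "update received from neighbor" branch, the function below applies one neighbor's distance vector to a router's table. The table layout (destination mapped to a cost and next hop) and the name `dv_update` are assumptions made for this example; link-cost changes and the sending of updates are left out.

```python
INF = float("inf")

def dv_update(table, neighbor, link_cost, neighbor_vector):
    """Apply a neighbor's advertised costs; return destinations that changed.

    table: dict dest -> (cost, next_hop) for this router (assumed layout).
    neighbor_vector: dict dest -> cost as advertised by the neighbor.
    """
    changed = []
    for dest, advertised in neighbor_vector.items():
        candidate = link_cost + advertised          # D(A,V) + D(V,Y)
        cost, hop = table.get(dest, (INF, None))
        # Take a cheaper path, or track a cost change on the path in use.
        if candidate < cost or (hop == neighbor and candidate != cost):
            table[dest] = (candidate, neighbor)
            changed.append(dest)
    return changed   # these entries would be re-advertised to all neighbors

# Router A hears from neighbor D (link cost 1) that D reaches B at cost 10
table_a = {"D": (1, "D")}
print(dv_update(table_a, "D", 1, {"B": 10, "D": 0}), table_a)
```

After this update A holds "B with cost 11 via next hop D", matching the example on the Distance Vector Routing slide.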
20
Routing: Link State vs. Distance Vector
Per-node message complexity
• LS: O(e) messages
  – e: number of edges
• DV: O(d) messages, many times
  – d is node’s degree
Complexity/Convergence
• LS: O(n²) computation
• DV: convergence time varies
  – may be routing loops
  – count-to-infinity problem
21
Interdomain Routing
• Challenges of interdomain routing
  – Scale, privacy, and policy
  – Limitations of link-state and distance-vector routing
• Path-vector routing
  – Faster loop detection than distance-vector routing
  – More flexibility than shortest-path routing
• Border Gateway Protocol (BGP)
  – Incremental, prefix-based, path-vector protocol
  – Runs between Autonomous Systems (ASs)
  – Programmable import and export policies
  – Multi-step decision process for selecting “best” route
      But often skewed by Hot Potato routing
22
TCP Service Model
• Reliable, in-order, byte-stream delivery
  – and with good performance
• Challenges - the network can
  – drop packets
      Even perhaps a large number
  – delay packets
      Even perhaps for many seconds
  – deliver packets out-of-order
      Follows from possibility of arbitrary delay
  – replicate packets
      Weird, but it does sometimes happen
  – corrupt packets
  – (What’s missing?) (security)
23
TCP Header
[Figure: TCP header layout, one row per 32-bit word]
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Data
24
Timing Diagram: 3-Way Handshaking
[Figure: timeline between Client (initiator; active open via connect()) and Server (passive open via listen() and accept())]
  Client → Server: SYN, SeqNum = x
  Server → Client: SYN + ACK, SeqNum = y, Ack = x + 1
  Client → Server: ACK, Ack = y + 1
25
What if the SYN Packet Gets Lost?
• Suppose the SYN packet gets lost
  – Packet is lost inside the network, or:
  – Server discards the packet (e.g., listen queue is full)
• Eventually, no SYN-ACK arrives
  – Sender sets a timer and waits for the SYN-ACK
  – … and retransmits the SYN if needed
• How should the TCP sender set the timer?
  – Sender has no idea how far away the receiver is
  – Hard to guess a reasonable length of time to wait
  – SHOULD (RFCs 1122 & 2988) use default of 3 seconds
      Other implementations instead use 6 seconds
26
Normal Termination, One Side At A Time
• Finish (FIN) to close and receive remaining bytes
  – FIN occupies one octet in the sequence space
• Other host ack’s the octet to confirm
• Closes A’s side of the connection, but not B’s
  – Until B likewise sends a FIN
  – Which A then acks
[Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, ACK, then FIN and its ACK (connection now half-closed), then FIN and its ACK (connection now closed). Timeout at the end: avoid reincarnation; can retransmit FIN ACK if lost.]
27
Abrupt Termination
• A sends a RESET (RST) to B
  – E.g., because app. process on A crashed
• That’s it
  – B does not ack the RST
  – Thus, RST is not delivered reliably
  – And: any data in flight is lost
  – But: if B sends anything more, will elicit another RST
[Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, ACK, RST; later Data from B is answered with another RST]
28
Reasons for Retransmission
[Figure: three Packet/ACK/Timeout timing diagrams]
• ACK lost: DUPLICATE PACKET
• Packet lost
• Early timeout: DUPLICATE PACKETS
29
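RTT Estimation
• Use exponential averaging:
  $\mathrm{SampleRTT} = \mathrm{AckRcvdTime} - \mathrm{SendPacketTime}$
  $\mathrm{EstimatedRTT} = \alpha \times \mathrm{EstimatedRTT} + (1 - \alpha) \times \mathrm{SampleRTT}$
  $\alpha = 7/8$ (for one measurement per flight)
[Figure: EstimatedRTT and SampleRTT plotted against Time]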
30
Jacobson/Karels Algorithm
• Compute “slop” in terms of observed variability
  $\mathrm{Difference} = \mathrm{SampleRTT} - \mathrm{EstimatedRTT}$
  $\mathrm{Deviation} = \mathrm{Deviation} + \delta \times (|\mathrm{Difference}| - \mathrm{Deviation})$
  $\mathrm{RTO} = \mu \times \mathrm{EstimatedRTT} + \phi \times \mathrm{Deviation}$
  $\delta = 1/4$ (again, for one measurement per flight)
  $\mu = 1$
  $\phi = 4$
  – Implementations often use a coarse-grained (500 msec) timer, so resulting value is large
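The two estimators combine into a small retransmission-timeout calculator. This is a sketch under assumptions: the class name `RtoEstimator` is invented, the first sample simply seeds EstimatedRTT, Deviation starts at zero, and the coarse-grained timer is ignored. The gains are written as alpha = 7/8, delta = 1/4, mu = 1, phi = 4.

```python
class RtoEstimator:
    """EstimatedRTT, Deviation and RTO per the slide's update rules."""

    def __init__(self, first_sample, alpha=7/8, delta=1/4, mu=1.0, phi=4.0):
        self.alpha, self.delta, self.mu, self.phi = alpha, delta, mu, phi
        self.estimated_rtt = first_sample   # assumed seeding choice
        self.deviation = 0.0                # assumed initial value

    def update(self, sample_rtt):
        # Difference uses the estimate from before this sample.
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.rto()

    def rto(self):
        return self.mu * self.estimated_rtt + self.phi * self.deviation

est = RtoEstimator(first_sample=100.0)       # times in msec
for sample in (110.0, 95.0, 180.0):
    print(round(est.update(sample), 2))
```

A single outlier sample (180 msec here) raises the RTO mostly through the deviation term, which is the point of the "slop".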
31
Problem: Ambiguous Measurement
• How to differentiate between the real ACK, and ACK of the retransmitted packet?
[Figure: two Sender/Receiver timelines, each with an Original Transmission, a Retransmission, and an ACK; it is unclear which transmission SampleRTT should be measured from]
• Karn/Partridge algorithm: Measure SampleRTT only for original transmissions
  – And use exponential backoff
32
TCP State Diagram
33
Flow Control vs. Congestion Control
• Flow control keeps one fast sender from overwhelming a slow receiver
  – Controlled by advertised window
• Congestion control keeps a set of senders from overloading the network
  – Controlled by CWND
34
View from a Single Flow
• Knee – point after which
  – Throughput increases very slowly
  – Delay increases quickly
• Cliff – point after which
  – Throughput starts to decrease very fast to zero (congestion collapse)
  – Delay approaches infinity
[Figure: Throughput vs. Load and Delay vs. Load, marking the knee, the cliff, packet loss, and congestion collapse]
35
Additive Increase, Multiplicative Decrease
• How much to increase and decrease?
  – Increase linearly, decrease multiplicatively (AIMD)
  – Necessary condition for stability of TCP
• Additive increase
  – On success for last window of data, increase linearly
      One packet (MSS) per RTT
      Or: increment per ACK: CWND += MSS * (MSS / CWND)
• Multiplicative decrease
  – On loss of packet, divide congestion window in half
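The two AIMD rules fit in a few lines. A minimal sketch, assuming an MSS of 1460 bytes and a floor of one MSS on the window (both are choices made for this example, not values from the slides).

```python
MSS = 1460  # bytes; assumed segment size for the example

def on_ack(cwnd):
    """Additive increase, applied per ACK: CWND += MSS * (MSS / CWND)."""
    return cwnd + MSS * MSS / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window (floor of one MSS assumed)."""
    return max(MSS, cwnd / 2)

cwnd = 10 * MSS
for _ in range(10):          # one window's worth of ACKs, i.e. about one RTT
    cwnd = on_ack(cwnd)
print(cwnd / MSS)            # slightly under 11: about one MSS added per RTT
print(on_loss(cwnd) / MSS)   # roughly half of that
```

The per-ACK increment shrinks as CWND grows, which is why a full window of ACKs adds just under one MSS.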
36
Slow Start
Double CWND per round-trip time
Simple implementation: on each ack, CWND += MSS
[Figure: Src/Dest timeline of data (D) and ack (A) packets, with 1, 2, 4, 8 packets sent in successive round trips]
37
Fast Retransmission
• Resend a segment after 3 duplicate ACKs
  – Duplicate ACK means that an out-of-sequence segment was received
• Notes:
  – ACKs are for next expected packet
  – Packet reordering can cause duplicate ACKs
  – Window may be too small to generate enough duplicate ACKs
[Figure: timeline. segment 1 (cwnd = 1); ACK 2 (cwnd = 2); segments 2 and 3; ACK 3 (cwnd = 3); ACK 4 (cwnd = 4); segments 4, 5, 6, 7; 3 duplicate ACKs (ACK 4) lead to resending segment 4 (cwnd = 2)]
38
Repeating Slow Start After Timeout
[Figure: Window vs. t, marking a Fast Retransmission, a Timeout, and “SSThresh Set to Here”]
Slow-start restart: Go back to CWND of 1, but take advantage of knowing the previous value of CWND.
Slow start in operation until it reaches half of previous CWND, i.e., SSTHRESH
39
TCP Performance
• Time-Sequence plots provide a powerful tool for visualizing TCP behavior & performance
• Spectrum of TCP mechanisms influence performance
  – Advertised window, sender window
  – Timeout, slow start, exponential backoff
  – Acking policy (delayed; ack-splitting; SACK)
  – Fast Retransmit (avoid RTO stall)
  – Fast Recovery (full AIMD)
  – Window scaling (required for large bandwidth-delay product)
40
Summary of TCP Mechanisms
• Delayed Acknowledgment
  – Lessens overhead (40 bytes per ACK)
  – But can cause CWND to grow more slowly
• Fast Retransmit
  – NACK-based loss detection in 1 RTT
  – Avoids timeout delay
  – AIMD after subsequent Slow Start reaches SSTHRESH
• Fast Recovery
  – Avoids needing to Slow Start after Fast Retransmit
  – True AIMD
• SACK
  – Both speeds recovery and avoids unnecessary rexmit.
41
Example of Time-Sequence Plot
[Figure: time-sequence plot with MSS, Window, and RTT marked. Hollow squares = Acks; Solid squares = Data; Circles = Advertised Window]
42
Example of Time-Sequence Plot
Slope gives overall throughput (bytes/sec)
43
Same Connection - Why So Different?
Note: Receiver acks every other segment
Receiver Perspective
44
ACK-splitting
• Rule: grow window by one full-sized packet for each valid ACK received
• Send M (distinct) ACKs for one packet
• Growth factor proportional to M
[Figure: Sender/Receiver timeline over one Round-Trip Time (RTT). Data 1:1461 is answered with ACK 486, ACK 973, ACK 1461; the sender then sends Data 1461:2921, Data 2921:4381, Data 4381:5841, Data 5841:7301]
45
Page fetch from CNN.com
[Figure: Sequence Number (bytes), 0 to 60000, vs. Time (sec), 0 to 1, for Modified Client and Normal Client]
10 line change to Linux TCP
(Courtesy of Stefan Savage)
46
Mid-Transfer: Self-Clocking
Each flight of packets has the same shape!
As do the ACKs …
47
Sliding Window induces Self-Clocking
(From [JK88])
48
Mid-Transfer: Why Doesn’t CWND Grow?
49
Receiver Window = 8 MSS
Circles show advertised window
There’s also a hidden sender window
50
How Many Packets Were Lost?
51
Answer: Zero!
52
Fast Retransmission
Window stays at 5 MSS ⇒ transition to Congestion Avoidance
After pending data ack’d, slow start. CWND = 2 MSS since ACK arrival incremented it by MSS
Third dup triggers retransmission
53
Same Fast Retransmission @ Recv.
What happened here?
Reordering.
Again, arrivals much more smooth due to bottleneck shaping
54
TCP Throughput Equation
• For packets of B bytes, throughput is
  $T = \frac{w \cdot B}{RTT} = \frac{\sqrt{1.5} \cdot B}{RTT \cdot \sqrt{p}}$
  (w is the average window in packets; p is the packet loss rate)
• Implications:
  – Long-term throughput falls as 1/RTT
  – Long-term throughput falls as 1/sqrt(p)
• Non-TCP transport can use equation to provide TCP-friendly congestion control!
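As a quick numerical check of the equation, the sketch below evaluates it for a made-up connection. The function name and the sample numbers (1500-byte packets, 100 ms RTT, 1.5% loss) are illustrative assumptions.

```python
import math

def tcp_throughput(packet_bytes, rtt_seconds, loss_rate):
    """T = sqrt(1.5) * B / (RTT * sqrt(p)), in bytes per second."""
    return math.sqrt(1.5) * packet_bytes / (rtt_seconds * math.sqrt(loss_rate))

base = tcp_throughput(1500, 0.100, 0.015)
print(round(base))                                   # about 150000 bytes/sec
print(tcp_throughput(1500, 0.050, 0.015) / base)     # halving RTT doubles T
print(tcp_throughput(1500, 0.100, 0.060) / base)     # 4x the loss halves T
```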
55
Generic Router Architecture
• Input and output interfaces are connected through an interconnect
• Interconnect can be implemented by
  – Shared memory
      Low capacity routers (e.g., PC-based routers)
  – Shared bus
      Medium capacity routers
  – Point-to-point (switched) bus
      High capacity routers
      Packets fragmented into cells
      Essentially a network inside the router!
[Figure: input interfaces and output interfaces joined by an interconnect]
56
Output Queued Routers
• Only output interfaces store packets
• Advantages
  – Easy to design algorithms: only one congestion point
• Disadvantages
  – Requires an output speedup (Ro/C) of N, where N is the number of interfaces: not feasible
[Figure: input interfaces (rate C) and output interfaces (rate Ro) joined by a backplane]
57
Input Queued Routers
• Input interfaces store packets
• Easier to build since only need R ≈ C
  – Though need to implement “backpressure” to know when to send
• But harder to build efficiently due to contention and head-of-line blocking
[Figure: input interfaces (rate C) and output interfaces joined by a backplane (rate R)]
58
Head-of-line Blocking
• Cell at head of an input queue cannot be transferred, thus blocking the following cells
[Figure: Inputs 1-3 feeding Outputs 1-3. One cell cannot be transferred because of output buffer overflow; the cell behind it cannot be transferred because it is blocked by the orange cell]
• Modern high-speed routers use combination of input & output queuing, with flow control & multiple “virtual queues”
59
Simple Queuing - FIFO and Drop Tail
• Most of today’s routers
• Transmission via FIFO scheduling
  – First-in first-out queue
  – Packets transmitted in the order they arrive
• Buffer management: drop-tail
  – If the queue is full, drop the incoming packet
60
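Queuing Example
P = 1 Kbit; R = 1 Mbps, so P/R = 1 ms
Delay for packet that arrives at time t: d(t) = Q(t)/R + P/R
(P bits in the arriving packet, Q(t) bits already queued)
• packet 1, arrives at 0 ms with Q = 0: d(0) = 1ms
• packet 2, arrives at 0.5 ms with Q = 0.5 Kb: d(0.5) = 1.5ms
• packet 3, arrives at 1 ms with Q = 1 Kb: d(1) = 2ms
[Figure: packet arrivals at times 0, 0.5, 1, 7, 7.5 ms, and queue size Q(t) over time with levels 0.5 Kb, 1 Kb, 1.5 Kb, 2 Kb]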
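The delays in this example can be reproduced with a tiny FIFO simulation. A sketch only: the function name is invented, every packet is assumed to be the same size, and the buffer is assumed large enough that nothing is dropped.

```python
def fifo_delays(arrivals_ms, packet_bits=1000, rate_bps=1_000_000):
    """Per-packet delay (queueing + transmission) on a FIFO link, in ms."""
    tx_ms = packet_bits * 1000.0 / rate_bps   # P/R, here 1 ms
    link_free_at = 0.0
    delays = []
    for t in arrivals_ms:
        start = max(t, link_free_at)          # wait for earlier packets
        link_free_at = start + tx_ms
        delays.append(link_free_at - t)
    return delays

# Arrival times from the figure
print(fifo_delays([0, 0.5, 1, 7, 7.5]))
```

The first three values match d(0), d(0.5) and d(1) on the slide; the packets arriving at 7 and 7.5 ms find the queue drained again.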
61
Little’s Theorem
• Assume a system where packets arrive at rate λ
• Let d be mean delay of packet, i.e., mean time a packet spends in the system
• Q: What is N, mean # of packets in the system?
  – E.g., for a router N would give the size of the queue
• A: N = λ x d
[Figure: a system with mean arrival rate λ and mean delay d]
62
Random Early Detection (RED)
• Basic idea of RED
  – Router notices that the queue is getting backlogged
  – … and randomly drops arriving packets to signal congestion
• Packet drop probability
  – Drop probability increases with average queue length
  – If buffer is below some level, don’t drop anything
  – … otherwise, set drop probability as function of length
• RED controls average queue size, avoiding lost bursts …
  – … and distributing congestion losses in a more fair fashion
[Figure: drop Probability vs. Average Queue Length]
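One possible shape for the drop-probability function is a linear ramp between two thresholds. The slide only says the probability is a function of average queue length, so the ramp, the threshold names `min_th` and `max_th`, and the ceiling `max_p` are all assumptions made for this sketch.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of average queue length (assumed ramp)."""
    if avg_queue < min_th:
        return 0.0                  # below some level, don't drop anything
    if avg_queue >= max_th:
        return 1.0                  # assumed: drop everything when very long
    # Linear increase from 0 to max_p between the two thresholds
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (2, 10, 20):               # hypothetical average queue lengths
    print(q, red_drop_probability(q, min_th=5, max_th=15, max_p=0.1))
```

A router would compare this probability against a random number for each arriving packet.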
63
Explicit Congestion Notification
• Early dropping of packets
  – Good: gives early/refined feedback
  – Bad: costs a packet drop to give the feedback
• Explicit Congestion Notification (ECN)
  – Router instead marks the packet with an ECN bit
  – … which end system interprets as a sign of congestion
• Surmounting the challenges
  – Must be supported by both end hosts as well as routers
  – Requires two bits in the IP header and 2 TCP header bits
64
Summary of QoS
• Basic mechanism for achieving better-than-best-effort performance: scheduling
  – Multiple queues allow priority service
  – Fair queuing provides isolation between flows
• But: still need end-to-end mechanisms
  – Reservations & admission control
  – Descriptions of bursty traffic: token buckets
65
Summary of QoS, con’t
• IntServ provides per-flow performance guarantees
  – But lacks scalability
• DiffServ provides per-aggregate tiers of relative performance
  – Scalable, but not as powerful
• Neither is generally available end-to-end today
• ISPs manipulating what services receive what performance raises issues of network neutrality
66
Scheduling
• Decide when and what packet to send on output link
• Classifier partitions incoming traffic into flows each with their own FIFO queue
[Figure: Classifier feeding per-flow queues (flow 1, flow 2, … flow n) with Buffer management, drained by a Scheduler]
67
Max-Min Fairness
• Denote
  – C – link capacity
  – N – number of flows
  – r_i – arrival rate
• Max-min fair rate computation:
  1. compute C/N
  2. if there are flows i such that r_i ≤ C/N, update C and N:
     $C = C - \sum_{i \,\mathrm{s.t.}\, r_i \le C/N} r_i$ ; $N = N - k$ (for k such flows)
  3. if no, f = C/N; terminate
  4. go to 1
• Flows receive at most the fair rate, i.e., min(f, r_i)
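The four-step computation translates directly into a loop. A sketch with invented names; it returns infinity when every flow fits under its share, a corner case the slide does not spell out.

```python
def max_min_fair_rate(capacity, rates):
    """Return the fair rate f for a link of the given capacity."""
    c = capacity
    active = list(rates)
    while active:
        share = c / len(active)                       # step 1: C/N
        small = [r for r in active if r <= share]     # step 2: flows under C/N
        if not small:
            return share                              # step 3: f = C/N
        c -= sum(small)                               # C = C - sum of those r_i
        active = [r for r in active if r > share]     # N = N - k
    return float("inf")   # assumed: all demands satisfied, no flow is limited

def allocations(capacity, rates):
    f = max_min_fair_rate(capacity, rates)
    return [min(f, r) for r in rates]                 # flows get min(f, r_i)

# Hypothetical link of capacity 10 shared by flows arriving at 8, 6 and 2
print(allocations(10, [8, 6, 2]))
```

Here the flow of rate 2 is fully served, and the remaining capacity of 8 is split evenly between the two larger flows.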
68
Fair Queuing (FQ) [DKS’89]
• Conceptually, computes when each bit in the queue should be transmitted to attain max-min fairness (a “fluid flow system” approach)
• Then serve packets in the order of the transmission time of their last bits
• Provides isolation: misbehaving flow can’t impair others
• Doesn’t “solve” congestion: still need to deal with individual queues filling up
• Generalized to Weighted Fair Queuing (WFQ)
69
Characterizing Burstiness: Token Bucket
• Parameters
  – r – average rate, i.e., rate at which tokens fill the bucket
  – b – bucket depth (limits size of burst)
  – R – maximum link capacity or peak rate
• A bit is transmitted only when token is available
[Figure: regulator with tokens arriving at r bps into a bucket of b bits, output ≤ R bps; plot of bits vs. time showing the maximum # of bits sent rising with slope R up to b·R/(R-r), then with slope r]
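A token bucket regulator can be sketched as follows. The class and method names are invented for illustration, and the peak-rate limit R is not enforced here; only the token accounting (rate r, depth b) is shown, plus the burst bound from the figure.

```python
class TokenBucket:
    """Tokens accumulate at r bits/sec up to depth b; sending spends tokens."""

    def __init__(self, rate_bps, depth_bits):
        self.rate = rate_bps
        self.depth = depth_bits
        self.tokens = depth_bits     # assumed: bucket starts full
        self.last = 0.0

    def try_send(self, now, nbits):
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + self.rate * (now - self.last))
        self.last = now
        if nbits <= self.tokens:
            self.tokens -= nbits
            return True
        return False                 # not enough tokens: bits must wait

def max_burst_bits(b, peak_rate, r):
    """Bits sent at peak rate before the bucket empties: b*R/(R-r)."""
    return b * peak_rate / (peak_rate - r)

bucket = TokenBucket(rate_bps=1000, depth_bits=2000)
print(bucket.try_send(0.0, 2000), bucket.try_send(0.0, 1), bucket.try_send(1.0, 1000))
print(max_burst_bits(b=1, peak_rate=2, r=1))
```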
70
Arrival Curve: Example
• Arrival curve – maximum amount of bits transmitted during an interval of time Δt
• Use token bucket to bound arrival curve
[Figure: left, traffic rate in bps (0 to 2) vs. time (0 to 5); right, arrival curve in bits (0 to 4) vs. Δt (1 to 5), for a token bucket with (R=2, b=1, r=1)]
71
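QoS Guarantees: Per-hop Reservation
• Router: allocate bandwidth r_a, buffer space B_a such that
  – no packet is dropped
  – no packet experiences a delay larger than D
[Figure: bits vs. time. Arrival curve with slope R up to b*R/(R-r), then slope r; service line with slope r_a; D is the maximum horizontal gap and B_a the maximum vertical gap between them]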
72
Integrated Services: Required Elements
• Reservation Protocol
  – How service request gets from host to network
• Admission control algorithm
  – How network decides if it can accept flow
• Packet scheduling algorithms
  – How routers deliver service
• Architecture for solution: IntServ
  – Provides service guarantees at a per-flow granularity
73
Control Plane: Resource Reservation
[Figure, repeated on slides 73-81: network path between Sender and Receiver]
74
Control Plane: Resource Reservation
Sender sends specification of traffic profile
75
Control Plane: Resource Reservation
Path established (or perhaps admission control denies path)
76
Control Plane: Resource Reservation
The receiver signals reservation request
77
Control Plane: Admission Control
Per-flow state (soft state)
78
Control Plane: Admission Control
Per-flow state on all routers in path
79
Data Plane
Per-flow classification on each router
80
Data Plane
Per-flow classification on each router
81
Data Plane
Per-flow scheduling on each router
82
Differentiated Service (DS) Field
• DS field encodes Per-Hop Behavior (PHB)
  – E.g., Expedited Forwarding (all packets receive minimal delay & loss)
  – E.g., Assured Forwarding (packets marked with low/high drop probabilities)
[Figure: IP header layout, bit positions 0, 4, 8, 16, 19, 31]
Version | HLen | TOS | Length
Identification | Flags | Fragment offset
TTL | Protocol | Header checksum
Source address
Destination address
Data
[Figure detail: within the TOS byte, bits 0-5 are the DS Field and bits 6-7 are ECN]
83
Comparison to Best-Effort & Intserv
              | Best-Effort | Diffserv | Intserv
Service       | Connectivity; No isolation; No guarantees | Per aggregate isolation; Per aggregate guarantee | Per flow isolation; Per flow guarantee
Service scope | End-to-end | Domain | End-to-end
Complexity    | No setup | Long term setup | Per flow setup
Scalability   | Highly scalable (nodes maintain only routing state) | Scalable (edge routers maintain per aggregate state; core routers per class state) | Not scalable (each router maintains per flow state)
84
Summary of Middleboxes
• Middleboxes address important problems
  – Using fewer IP addresses
  – Blocking unwanted traffic
  – Monitoring activity
  – Shaping use of network resources
  – Improving/controlling performance (vs. network neutrality)
• Middleboxes cause problems of their own
  – Connectivity erodes
      Notion of addresses, ports weakened
      Middlebox state management can lead to connection termination
  – Harder to deploy new apps
85
Network Address Translation Example
[Figure: LAN hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT router with LAN address 10.0.0.4 and WAN address 138.76.29.7]
NAT translation table
WAN side addr     | LAN side addr
138.76.29.7, 5001 | 10.0.0.1, 3345
……                | ……
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345  D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001  D: 128.119.40.186, 80
3: Reply arrives dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80  D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80  D: 10.0.0.1, 3345
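The four steps can be mimicked with a toy translation table. This is a sketch with invented names; real NATs also track protocol, timeouts, and connection state, none of which is modeled here. Port allocation simply counts up from 5001 to match the example.

```python
class NatRouter:
    """Toy NAT: maps (LAN addr, port) pairs to ports on one WAN address."""

    def __init__(self, wan_addr, first_port=5001):
        self.wan_addr = wan_addr
        self.next_port = first_port   # assumed sequential allocation
        self.lan_to_wan = {}
        self.wan_to_lan = {}

    def outbound(self, src, dst):
        # Step 2: rewrite the source address, creating a table entry if needed.
        if src not in self.lan_to_wan:
            wan = (self.wan_addr, self.next_port)
            self.next_port += 1
            self.lan_to_wan[src] = wan
            self.wan_to_lan[wan] = src
        return self.lan_to_wan[src], dst

    def inbound(self, src, dst):
        # Step 4: rewrite the destination address using the table.
        lan = self.wan_to_lan.get(dst)
        if lan is None:
            return None               # no mapping: nothing to deliver to
        return src, lan

nat = NatRouter("138.76.29.7")
print(nat.outbound(("10.0.0.1", 3345), ("128.119.40.186", 80)))
print(nat.inbound(("128.119.40.186", 80), ("138.76.29.7", 5001)))
```

The `None` case shows why an outside host cannot start a connection to a host behind the NAT without prior state.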
86
Objections Against NAT
• Difficult to support peer-to-peer applications
  – P2P needs a host to act as a server
• Layering violation (hence messiness)
• NAT violates the end-to-end principle
  – Network nodes should not modify the packets
• Connections become brittle
• Barrier to deployment of new apps
• IPv6 is a cleaner solution
  – Better to migrate than to limp along with a hack
87
Firewalls
[Figure: a firewall between the administered network and the public Internet]
• Isolates organization’s internal net from Internet
• Allows some packets to pass, blocks others
  – (Refinement: shape some traffic, allow other unimpeded)
• Twin goals: security and policy enforcement
88
Example of Firewall Configuration
• Alice’s firewall rules
  – #1: Don’t let Trudy machines in
      Deny <src = 111.55.66.0/24, dst = 222.33.0.0/16>
  – #2: Let rest of Bob’s network in to special dsts
      Permit <src = 111.55.0.0/16, dst = 222.33.44.0/24>
  – #3: Block the rest of the world
      Deny <src = 0.0.0.0/0, dst = 0.0.0.0/0>
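Alice's three rules can be evaluated with a short matcher. The sketch assumes first-match-wins ordering, which the rule numbering suggests but the slide does not state outright; the function name and rule encoding are made up for this example.

```python
import ipaddress

# Alice's rules from the slide, in order
RULES = [
    ("deny",   "111.55.66.0/24", "222.33.0.0/16"),   # 1: Trudy's machines
    ("permit", "111.55.0.0/16",  "222.33.44.0/24"),  # 2: rest of Bob's network
    ("deny",   "0.0.0.0/0",      "0.0.0.0/0"),       # 3: rest of the world
]

def decide(src, dst, rules=RULES):
    """Return the action of the first rule matching src and dst (assumed order)."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    for action, src_net, dst_net in rules:
        if s in ipaddress.ip_network(src_net) and d in ipaddress.ip_network(dst_net):
            return action
    return "deny"   # unreachable with rule 3 present; kept as a safe default

print(decide("111.55.66.7", "222.33.44.1"))   # Trudy: denied by rule 1
print(decide("111.55.1.1", "222.33.44.1"))    # Bob to a special dst: permitted
print(decide("111.55.1.1", "222.33.5.5"))     # Bob to another dst: denied by rule 3
```

Order matters: if rule 2 came first, Trudy's machines (inside 111.55.0.0/16) would be let in.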
89
Misleading Stateless Inspection
[Figure: TCP header with the SYN flag set]
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | SYN | Advertised window
Checksum | Urgent pointer
Options (variable)
Data
Split into two fragments. First is just 8 bytes of IP payload, i.e., it ends here, after the ports and sequence number.
90
Misleading Stateless Inspection, con’t
[Figure: TCP header with the SYN flag set]
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | SYN | Advertised window
Checksum | Urgent pointer
Options (variable)
Data
Second fragment starts 8 bytes later, covering all of this (from Acknowledgment onward)
Firewall looks 14 bytes into payload, i.e., here (at the flags), which is under the control of the attacker
91
Example: Tunneling IP over Email
From: [email protected]
To: [email protected]
Subject: Here’s my IP datagram

IP-header-version: 4
IP-header-len: 5
IP-ID: 11234
IP-src: 1.2.3.4
IP-dst: 5.6.7.8
IP-payload: 0xa144bf2c0102…

Program receives this legal email, builds an IP packet corresponding to description in email body …
… and injects it into the network
92
Summary of Cryptographic Mechanisms
• Requirements for secure communication:
  – Authentication, authorization, integrity, confidentiality, non-repudiation, availability
• Workhorse for many of these: cryptography
  – Symmetric encryption: fast, but requires shared secret
  – Public key encryption: no need for shared secret
• Hash functions provide integrity and signatures
• There are a range of attacks on cryptosystems
  – However, crypto is in fact our most mature security technology
• Managing public keys: PKI
  – Digital certificates
93
Using Symmetric Keys
• Both the sender and the receiver use the same secret keys
[Figure: Plaintext → Encrypt with secret key → Ciphertext across the Internet → Decrypt with secret key → Plaintext]
94
Symmetric Key Ciphers - DES & AES
• Idea: one-time pad (XOR w/ random bits)
  – But requires as much key material as plaintext
• Data Encryption Standard (DES)
  – 56-bit key (decreased from 64 bits at NSA’s request)
  – Still fairly strong other than brute-forcing the key space
      But custom hardware can crack a key in < 24 hours
  – Today many financial institutions use Triple DES
• Advanced Encryption Standard (AES)
  – Replacement for DES standardized in 2002
  – Key size: 128, 192 or 256 bits
• How fundamentally strong are they?
  – No one knows (no proofs exist)
95
Cryptographically Strong Hashes
• Desired properties when faced with an adversary:
  – Hard to invert
      Given hash, adversary can’t find input that produces it
  – Hard to find collisions
      Adversary can’t find two inputs that produce the same hash
      ⇒ Someone cannot alter the message without modifying the digest
• Hashes let us
  – Succinctly refer to large objects
  – Obliquely refer to private objects (e.g., passwords)
      Send hash of object rather than object itself (since hard to invert)
      Can prepend a (secret) key so that hashes of known items are unpredictable
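Both uses can be tried with Python's standard `hashlib`. This sketch picks SHA-256, which is an assumption of the example (the slides discuss MD5 and SHA-1), and the helper names are invented. Prepending a key mirrors the slide's wording; the standard construction for keyed hashing is HMAC (Python's `hmac` module), which is the safer choice in practice.

```python
import hashlib

def digest(data: bytes) -> str:
    """Succinctly refer to a large object by its hash."""
    return hashlib.sha256(data).hexdigest()

def keyed_digest(secret_key: bytes, data: bytes) -> str:
    """Prepend a secret key so hashes of known items are unpredictable."""
    return hashlib.sha256(secret_key + data).hexdigest()

message = b"pay Bob 10 dollars"
print(digest(message))
print(digest(b"pay Bob 90 dollars"))              # any alteration changes the digest
print(keyed_digest(b"hypothetical-key", message))  # differs from the unkeyed hash
```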
96
Standard Cryptographic Hash Functions
• MD5 (Message Digest version 5)
  – Produces 128 bit hashes
  – Widely used (RFC 1321)
  – Broken:
      Recent work quickly finds collisions
• SHA-1 (Secure Hash Algorithm)
  – Produces 160 bit hashes
  – Widely used (SSL/TLS, SSH, PGP, IPSEC)
  – Broken:
      Recent work finds collisions, though not really quickly … yet
97
Public Key / Asymmetric Encryption
• Sender uses receiver’s public key
  – Advertised to everyone
• Receiver uses complementary private key
  – Must be kept secret
[Figure: Plaintext → Encrypt with public key → Ciphertext across the Internet → Decrypt with private key → Plaintext]
98
RSA Encryption and Decryption
• Encryption of message block m: $c = E(m, e) = m^e \bmod n$
• Decryption of ciphertext c: $m = D(c, d) = c^d \bmod n$
  – Works due to number-theoretic properties
  – Note: D(E(x, e), d) = E(D(x, d), e) = x
      I.e., D & E are inverses
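A toy numerical example makes the inverse property concrete. The primes below (61 and 53) are tiny textbook values chosen for this sketch and give no security; real RSA uses much larger moduli together with padding.

```python
# Toy parameters (assumed for illustration only)
p, q = 61, 53
n = p * q                      # 3233, the public modulus
e = 17                         # public exponent
d = 2753                       # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def E(m, e=e, n=n):
    return pow(m, e, n)        # c = m^e mod n

def D(c, d=d, n=n):
    return pow(c, d, n)        # m = c^d mod n

m = 65
c = E(m)
print(c, D(c))                 # decrypting the ciphertext recovers 65
print(E(D(m)))                 # applying D first, then E, also recovers 65
```

The second line is the property that signatures rely on: a value transformed with the private key can be checked by anyone holding the public key.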
99
RSA Crypto & Signatures
• Suppose Alice has published public key KE
• If she wishes to prove who she is, she can send a message x encrypted with her private key KD (i.e., she sends D(x,KD))
  – Recall: E(x,KE) and D(x,KD) are inverses
  – Therefore: anyone w/ public key KE can recover x, verify that Alice must have sent the message
      It provides a signature
  – Alice can’t deny it ⇒ non-repudiation
100
Summary of Our Crypto Toolkit
• If we can securely distribute a key, then
  – Symmetric ciphers (e.g., AES) offer fast, presumably strong confidentiality
• Public key cryptography does away with (potentially major) problem of secure key distribution
  – But: not as computationally efficient
      Use public key crypto to exchange a session key
  – And: also not guaranteed secure (but major result if not)
  – Strength of popular RSA algorithm rests on factoring large numbers
101
Summary of Our Crypto Toolkit, con’t
• Cryptographically strong hash functions provide major building block for integrity (e.g., SHA-1)
  – As well as providing concise digests
  – And providing a way to prove you know something (e.g., passwords) without revealing it (non-invertibility)
  – But: worrisome recent results regarding their strength
• Public key also gives us signatures
  – Including sender non-repudiation
102
Types of Attacks on Crypto Systems
• Guess the key
• Brute-force the key
• Steal the key
• Deduce the key (break the crypto algorithm)
• Replay (send a copy of an old message)
• Man-in-the-middle (intercept communication, forward it along after listening/modifying)
103
Digital Certificate
• Signed data structure that binds an entity with its corresponding public key
  – Signed by a recognized and trusted authority, i.e., Certification Authority (CA)
  – Provide assurance that a particular public key belongs to a specific entity
• Example: certificate of entity Y
    Cert = E({name_Y, K_{Y,public}}, K_{CA,private})
  – K_{CA,private}: private key of Certificate Authority
  – K_{Y,public}: public key of entity Y
  – name_Y: name of entity Y
• Your browser has a bunch of CAs wired into it
104
Managing Trust
• Trust is not particularly transitive
  – Should Alice trust Bob because she trusts Charlie …
  – … and Charlie vouches for Donna …
  – … and Donna says Eve is trustworthy …
  – … and Eve vouches for Bob’s identity?
• Two models of delegating trust
  – Rely on your set of friends and their friends
      “Web of trust” -- e.g., PGP
  – Rely on trusted, well-known authorities (and their minions)
      “Trusted root” -- e.g., HTTPS
105
Putting It All Together - HTTPS
• https = “Use HTTP over SSL/TLS”
  – SSL = Secure Socket Layer
  – TLS = Transport Layer Security
      Successor to SSL, and compatible with it
      RFC 4346
• Provides security layer (authentication, encryption) on top of TCP
  – Fairly transparent to the app
106
HTTPS Connection (SSL/TLS), con’t
• Browser (client) connects via TCP to Amazon’s HTTPS server
• Client sends over list of crypto protocols it supports
• Server picks protocols to use for this session
• Server sends over its certificate
• (all of this is in the clear)

Browser → Amazon: SYN
Amazon → Browser: SYN ACK
Browser → Amazon: ACK
Browser → Amazon: Hello. I support (TLS+RSA+AES128+SHA1) or (SSL+RSA+3DES+MD5) or …
Amazon → Browser: Let’s use TLS+RSA+AES128+SHA1
Amazon → Browser: Here’s my cert (~1 KB of data)
107
HTTPS Connection (SSL/TLS), con’t
• Browser constructs a random session key K
• Browser encrypts K using Amazon’s public key
• Browser sends E(K, {n, e}) to server
• Browser displays
• All subsequent communication encrypted w/ symmetric cipher (e.g., AES128) using key K
  – E.g., client can authenticate using a password
• (missing? Checking for revocation)

Amazon → Browser: Here’s my cert (~1 KB of data)
Browser → Amazon: E(K, {n,e})
Amazon → Browser: Agreed
(Browser and Amazon now both hold K)
Browser → Amazon: E(password …, K)
Amazon → Browser: E(response …, K)
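The key exchange above can be sketched as follows. Everything here is a toy under stated assumptions: the server key {n, e} = {3233, 17} with private exponent 2753 is a textbook-sized RSA key, `toy_cipher` is a hash-based XOR stream standing in for AES128, and the function names are hypothetical. It is not how TLS actually encodes or protects the session key.

```python
import hashlib
import secrets

# Toy server key pair (assumption): public {n, e}, private d.
SERVER_N, SERVER_E, SERVER_D = 3233, 17, 2753

def browser_make_key() -> tuple:
    """Browser: pick a random session key K and compute E(K, {n, e})."""
    k = secrets.randbelow(SERVER_N - 2) + 2        # K in 2 .. n-1
    return k, pow(k, SERVER_E, SERVER_N)

def server_recover_key(ciphertext: int) -> int:
    """Server: recover K with its private key."""
    return pow(ciphertext, SERVER_D, SERVER_N)

def toy_cipher(key: int, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR with a SHA-256-derived keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{key}:{counter}".encode()).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))
```

Only the holder of the private exponent can recover K, so an eavesdropper who sees E(K, {n, e}) in transit cannot read the later traffic (given a realistically sized key).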
108
Summary of Attacks
• Attacks that compromise a system can occur at different semantic levels
  – E.g., Buffer overflow vs. cross-site-scripting vs. social engineering
  – Automated attacks lead to worms and bots
• Denial-of-service via flooding likewise can occur at different semantic levels
  – Network layer vs. transport layer vs. application layer
  – Very hard to address if attacker has a lot of zombies
109
Host Compromise
• Tricking a host into executing on your behalf
• Can consider what is attacked (server or client) and the semantic level at which it is attacked
• Violation of program semantics:
  – E.g., buffer overflow
• Exploiting logic errors
  – E.g., cross-site scripting attacks
  – No violation of program semantics
• Social engineering
  – E.g., phishing
110
Example: Buffer Overflow

void get_cookie(char *packet) {
    . . . (200 bytes of local vars) . . .
    munch(packet);
    . . .
}

void munch(char *packet) {
    int n;
    char cookie[512];
    . . .
    /* code here computes offset of cookie in packet, stores it in n */
    strcpy(cookie, &packet[n]);
    . . .
}

Stack:
X + 200 to X    get_cookie()’s stack frame
X - 4           return address back to get_cookie()
X - 8           n
X - 520         cookie (value read from packet)
X - 524         return address back to munch()
111
Example: Buffer Overflow

void get_cookie(char *packet) {
    . . . (200 bytes of local vars) . . .
    munch(packet);
    . . .
}

void munch(char *packet) {
    int n;
    char cookie[512];
    . . .
    /* code here computes offset of cookie in packet, stores it in n */
    strcpy(cookie, &packet[n]);
    . . .
}

Stack:
X + 200 to X    Executable code
X - 4           X

• Now branches to code read in from the network
• From here on, machine falls under the attacker’s control
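The overwrite can be simulated without running any unsafe code. The sketch below models the stack region X - 524 .. X + 200 as a byte array and copies a packet into `cookie` without a bounds check, as `strcpy` would. All concrete values are assumptions for illustration: the address `X`, the original return address, the 32-bit little-endian layout, and the `0x90` bytes standing in for attacker code.

```python
X = 0x41424344               # hypothetical address, chosen to contain no zero bytes
BASE = X - 524               # lowest address in the modelled stack region
COOKIE, RET = X - 520, X - 4 # cookie[512] and the return address to get_cookie()
ORIGINAL_RET = 0x08040000    # hypothetical legitimate return address

def new_stack() -> bytearray:
    """Fresh stack region covering X - 524 .. X + 200 with a valid return address."""
    mem = bytearray(724)
    mem[RET - BASE:RET - BASE + 4] = ORIGINAL_RET.to_bytes(4, "little")
    return mem

def unsafe_strcpy(mem: bytearray, dst: int, src: bytes) -> None:
    """Copy until the first zero byte, ignoring the destination size."""
    for i, b in enumerate(src):
        if b == 0:
            break
        mem[dst - BASE + i] = b

def bounded_copy(mem: bytearray, dst: int, src: bytes, size: int) -> None:
    """Copy at most size - 1 bytes and terminate, like a length-checked copy."""
    data = src.split(b"\x00")[0][:size - 1]
    mem[dst - BASE:dst - BASE + len(data)] = data
    mem[dst - BASE + len(data)] = 0

def return_address(mem: bytearray) -> int:
    return int.from_bytes(mem[RET - BASE:RET - BASE + 4], "little")

# 512 bytes fill cookie, 4 bytes cover n, then the new return address X,
# then stand-in "code" that lands at address X and above.
PAYLOAD = b"A" * 512 + b"BBBB" + X.to_bytes(4, "little") + b"\x90" * 16 + b"\x00"
```

After `unsafe_strcpy(mem, COOKIE, PAYLOAD)` the saved return address equals `X`, which is where the payload's trailing bytes now sit; `bounded_copy` with `size=512` leaves the return address untouched.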
112
Semantic Level of Compromise, con’t
• Logic errors
• E.g., suppose your Web server passes any argument named “rev” in a URL request to a backend script called munch via the equivalent of
    sh munch $rev
  and returns its output
• Now suppose you receive the following request:
    GET /bin/TWikiUsers?rev=2%20|more%20/etc/passwd
  It decodes to: $rev = “2 |more /etc/passwd”
113
Logic Errors, con’t
• Your script is invoked as
    sh munch 2 |more /etc/passwd
  which returns as output the password file.
• “Cross-site scripting attack”
• Similar “SQL injection” attacks on backend databases
• Note: no violation of programming semantics!
  ⇒ Very hard to detect. Need to understand intended semantics.
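The decoding step and two possible defenses can be sketched with the Python standard library. The command strings are only built here, never executed; `is_valid_rev` is a hypothetical validator reflecting one guess at the intended semantics (a revision is a plain decimal number).

```python
import shlex
from urllib.parse import unquote

# The query-string value from the request above, percent-decoded.
rev = unquote("2%20|more%20/etc/passwd")

# Naive construction: the shell sees the pipe as its own syntax.
unsafe_command = "sh munch " + rev

# Defense 1: quote the argument so the shell treats it as a single word.
quoted_command = "sh munch " + shlex.quote(rev)

def is_valid_rev(value: str) -> bool:
    """Defense 2: accept only what a revision is meant to be (ASCII digits)."""
    return value.isascii() and value.isdigit()
```

Validation encodes the intended semantics directly, which is the part a generic detector cannot know.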
114
Semantic Level of Compromise, con’t
• Social engineering: misleading/fooling humans
• E.g., DNS typo attacks (register www.gooogle.com)
• Powerful technique for targeted attacks
  – E.g., find out name & mailstop of a company’s sysadmin
  – … mail an employee bogus system CD as if from them
    With a note that it contains an important security update
  – User trusts source of update, applies it
    They install a backdoor into company
• E.g., “I love you” email virus
• E.g., phishing
• General defense: user education :-(
115
Automated Compromise: Worms & Bots
• When attacker compromises a host, they can instruct it to do whatever they want
• Instructing it to find more vulnerable hosts creates a worm: a program that self-replicates across a network
  – Can spread via picking random 32-bit #s (IP addresses)
  – … but this isn’t fundamental
• As the worm repeatedly replicates, it grows exponentially fast because each copy of the worm works in parallel to find more victims
• Attacker can instead install a bot to facilitate future access to the system
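A rough sketch of why random scanning gives exponential growth at first: each infected host probes random 32-bit addresses, and the expected number of new victims per step is proportional to the current number of infected hosts. This is a deterministic expected-value model, not a measurement; the function name and every parameter value used with it are hypothetical.

```python
def simulate_worm(vulnerable: int, scans_per_step: int, steps: int,
                  address_space: int = 2**32, initial: float = 1.0) -> list:
    """Expected number of infected hosts after each time step."""
    infected = [float(initial)]
    for _ in range(steps):
        current = infected[-1]
        # Chance that one random probe hits a vulnerable, not-yet-infected host.
        hit_prob = (vulnerable - current) / address_space
        new_victims = current * scans_per_step * hit_prob
        infected.append(min(float(vulnerable), current + new_victims))
    return infected
```

While few hosts are infected, each step multiplies the count by roughly 1 + scans_per_step × vulnerable / 2^32; growth flattens only once most vulnerable hosts are already infected.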
116
Summary of Denial-of-Service
• Can occur at different semantic levels
  – Network layer vs. transport layer vs. application layer
  – Very hard to address if attacker has a lot of zombies
• Principle: attacker finds bottleneck element …
  – … and sends it more work than it can cope with
• E.g.:
  – Router’s packets-per-second processing capability
  – Link’s bits-per-second transmission capability
  – End host’s memory available for new connections …
  – … or cycles available to validate connections (cookies)
  – Server’s cycles for processing requests
• Defend via
  – Overprovisioning
  – Force sender to prove they’re not spoofing (cookies)
  – Force sender to prove they’re not a robot (CAPTCHAs)
117
Distributed Denial-of-Service (DDoS)
[Diagram: Master → Slaves 1-4 → Victim]
• Control traffic directs slaves at victim
• Slaves send streams of traffic (perhaps spoofed) to victim
  – src = random, dst = victim
118
Diffuse DDoS: Reflector Attack
[Diagram: Master → Slaves 1-4 → Reflectors 1-11 → Victim]
• Control traffic directs slaves at victim & reflectors
• Request: src = victim, dst = reflector
• Reply: src = reflector, dst = victim
• Reflectors send streams of non-spoofed but unsolicited traffic to victim
119
Defending Against Network Flooding
• How do we defend against such floods?
• Answer: basically, we don’t! Big problem today!
• Techniques exist to trace spoofed traffic back to origins, but this isn’t useful in face of a large attack
• Techniques exist to filter traffic, but a well-designed flooding stream defies stateless filtering
• Best solutions to date:
  – Overprovision - have enough raw capacity that it’s hard to flood your links
    Largest confirmed botnet to date: 1.5 million hosts
    Floods seen to date: 40+ Gbps
  – Distribute your services - force attacker to flood many points
    E.g., the root name servers
120
Transport-Level Denial-of-Service
• Recall TCP’s 3-way connection establishment handshake
  – Goal: agree on initial sequence numbers
  – Starting sequence numbers are random (rather than based on clock), to prevent attacker from guessing them to establish connections using spoofed source addresses

Client (initiator) → Server: SYN, SeqNum = x
Server → Client: SYN and ACK, SeqNum = y, Ack = x + 1   (server creates state associated with connection here)
Client → Server: ACK, Ack = y + 1
121
Flooding Defense: SYN Cookies

Client (initiator) → Server: SYN, SeqNum = x
Server → Client: SYN and ACK, SeqNum = y, Ack = x + 1
Client → Server: ACK, Ack = y + 1   (server only creates state here)

• Server: when SYN arrives, encode connection state entirely within SYN-ACK’s sequence # y
  – y = SHA-1(client_addr, client_port, ISN x, server_secret)
  – Must be cheap to generate & test
• When ACK of SYN-ACK arrives, server only creates state if seq # y in it agrees with hash
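The cookie check above can be sketched as follows. The field encoding, the truncation of SHA-1 to 32 bits, and the secret value are assumptions for illustration; deployed SYN cookies also fold in details such as a time counter and the MSS, which are omitted here.

```python
import hashlib

SERVER_SECRET = b"hypothetical-secret"  # assumption: known only to the server

def syn_cookie(client_addr: str, client_port: int, client_isn: int,
               secret: bytes = SERVER_SECRET) -> int:
    """Sequence number y for the SYN-ACK, derived from the SYN alone."""
    fields = f"{client_addr}|{client_port}|{client_isn}".encode() + b"|" + secret
    return int.from_bytes(hashlib.sha1(fields).digest()[:4], "big")

def ack_is_valid(client_addr: str, client_port: int, seq_num: int, ack_num: int,
                 secret: bytes = SERVER_SECRET) -> bool:
    """On the final ACK: recompute y and create state only if Ack == y + 1."""
    client_isn = (seq_num - 1) % 2**32      # the ACK carries SeqNum = x + 1
    expected_y = syn_cookie(client_addr, client_port, client_isn, secret)
    return ack_num == (expected_y + 1) % 2**32
```

The server stores nothing between the SYN and the ACK, so a flood of spoofed SYNs costs it one hash per packet rather than one connection-table entry per packet.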
122
Wireless Media Access
• What makes wireless links more problematic than wired links?
• It’s technically difficult to detect collisions
  – Transmitter swamps co-located receiver
• … even if we could, it wouldn’t work
  – Different transmitters have different coverage areas
  – Hidden terminals
  – Exposed terminals
• In addition, wireless links are much more prone to loss than wired links
• “Discovery” provides new infection vectors
123
Hidden Terminals
• A and C can both send to B but can’t hear each other
  – A is a hidden terminal for C and vice versa
• CSMA/CD will be ineffective – need to sense at receiver
[Diagram: nodes A, B, C in a row, each with its transmit range]
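A small sketch of the hidden-terminal condition, assuming nodes on a line and a single shared transmit radius (both simplifications; the positions and radius used with it are hypothetical). Two senders are hidden from each other with respect to a receiver if both can reach the receiver but neither can hear the other.

```python
def in_range(positions: dict, a: str, b: str, radius: float) -> bool:
    """True if nodes a and b are within transmit range of each other."""
    return abs(positions[a] - positions[b]) <= radius

def hidden_pairs(positions: dict, radius: float, receiver: str) -> list:
    """Pairs of senders that can both reach the receiver but not each other."""
    senders = [n for n in positions
               if n != receiver and in_range(positions, n, receiver, radius)]
    return [(a, b)
            for i, a in enumerate(senders)
            for b in senders[i + 1:]
            if not in_range(positions, a, b, radius)]
```

With A, B, C at positions 0, 10, 20 and a radius of 12, `hidden_pairs(..., receiver="B")` reports A and C: carrier sensing at A says the channel is idle even while C is transmitting to B.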
124
Exposed Terminals
• B, C can hear each other …
• … but can safely send to A, D
[Diagram: nodes A, B, C, D in a row, with transmit range]
125
RTS / CTS Protocols (MACA)
MACA = Multiple Access with Collision Avoidance
Overcome exposed/hidden terminal problems with contention-free protocol
1. B stimulates C with Request To Send (RTS)
2. A hears RTS and defers (to allow C to answer)
3. C replies to B with Clear To Send (CTS)
4. D hears CTS and defers to allow the data
5. B sends to C
[Diagram: nodes A, B, C, D in a row; B sends RTS, C sends CTS]
126
MACA, con’t
• If sender doesn’t get a CTS or ACK back, it assumes collision
• If other nodes hear RTS, but not CTS: send
  – Presumably, destination for sender is out of node’s range …
  – … Can cause problems when a CTS is lost
[Diagram: timeline for sender, receiver, and other node in sender’s range, showing RTS, CTS, data, ACK]
127
IEEE 802.11 Wireless LAN … etc.
• 802.11b
  – 2.4-2.5 GHz unlicensed radio spectrum
    E.g., microwave ovens, cordless phones
  – up to 11 Mbps
  – direct sequence spread spectrum (DSSS) in physical layer
    All hosts use same code
  – Widely deployed, using base stations
    Base station provides gateway from wireless nodes to another hop
    Next hop could be another base station …
    … or could be the Internet
• 802.11a
  – 5-6 GHz range
  – up to 54 Mbps
• 802.11g
  – 2.4-2.5 GHz range
  – up to 54 Mbps
• Bluetooth
  – Separate (non-802) standard for short-range wireless
  – Primarily meant for connecting nearby devices (e.g., keyboards/mice)
  – Enabled for many cell phones / laptops
    “Discoverable” …
128
Summary of Multimedia
• Data rates can be significant
  – Leads to emphasis on compression
• For real-time or interactive reception, must accommodate vagaries of variable delay (jitter)
  – Playback buffer: hold data up to maximum limit to enable smooth playback
• Voice-over-IP: convergence of global telephony network with global data network
129
Playout Buffer/Delay, con’t
[Figure: cumulative bits vs. time at source, destination, and playback; labels: delay, playout delay, playout buffer, playback delay]
• Play back data a fixed time interval after it was sent
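The fixed-offset rule can be sketched as below. Times are in milliseconds; the function name, the packet times, and the delay value used with it are hypothetical. Each packet is scheduled for playback at its send time plus the playout delay, and a packet that arrives after that instant is too late to use.

```python
def playout_schedule(packets: list, playout_delay: int) -> list:
    """For each (send_time, arrival_time), return (play_at, arrived_in_time)."""
    schedule = []
    for send_time, arrival_time in packets:
        play_at = send_time + playout_delay   # fixed interval after sending
        schedule.append((play_at, arrival_time <= play_at))
    return schedule
```

For packets sent at 0, 20, 40 ms that arrive at 40, 95, 170 ms, a 100 ms playout delay plays the first two on schedule and misses the third. A larger delay tolerates more jitter at the cost of more latency and more data held in the buffer.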