reinventing tcp/ip in windows vista with the netio stack
TRANSCRIPT
Black Hat 2006Reinventing TCP/IP in Windows Vistawith the NetIO stack
Abolade GbadegesinArchitect
Contact: aboladeg at microsoft dot com
August 3, 2006 2
Getting Started
About meResponsible for architecture of network transports inWindows11 years working on Windows networking6 years redesigning the Windows networking stack
What this talk will coverGuiding principlesNetIO architectureIntegrated and extensible network securityPerformance and scalabilityWriting networked applications
August 3, 2006 3
Reinventing TCP/IPWhat do customers want?
Rich, clean APIsRich, clean APIsSimple diagnosticsSimple diagnostics
DevelopersDevelopers
End usersEnd users
It just worksIt just worksGreat performanceGreat performance
Device supportDevice supportScalabilityScalability
PartnersPartners
AdministratorsAdministrators
Low TCOLow TCOSecuritySecurity
SimpleResilientFlexible
Diagnosable
August 3, 2006 4
Reinventing TCP/IPGuiding Principles
• Define the state of the art in networking• Design components to be extensible and
diagnosable• Raise the bar on security and resilience• Enable pervasively flexible and self-
tuning performance
End usersEnd users
It just worksIt just worksGreat performanceGreat performance
Rich, clean APIsRich, clean APIsSimple diagnosticsSimple diagnostics
DevelopersDevelopers
Device supportDevice supportScalabilityScalability
PartnersPartners
AdministratorsAdministrators
Low TCOLow TCOSecuritySecurity
August 3, 2006 5
History
• Denial-of-service exploits– Early attacks exploited
spoofing, protocol designand product vulnerabilities
– Current attacks usestateful, non-spoofedsessions from 0wnedmachines
• Penetration exploits– Code-execution
vulnerabilities more rare inlow layers like TCP
– Attacks moving farther upthe application stack!
August 3, 2006 6
Timeline
1999
2000
2001
2002
2003
Initial architectureand design decisions
Project checkinsbegin
BVT frameworkrunning
Early 20011st milestone
UDP sockets over IPv6
IPv6 neighbor discoveryOutgoing TCP connection
setup over IPv6
Incoming TCP connectionsetup over IPv6
Mid 20022nd milestone
1st TCP data exchangeEarly connection offload workIPv4 and IPv6 fragmentation
and reassembly
Dual-family socketsMulticast
Early performance work
Late 20033rd milestone
NetIO TCP/IP in winmain builds
Meet the Internet Protocols teamMeet the Internet Protocols team
August 3, 2006 8
NetIO architectural frameworkGoals and design decisions
Windows NT subsystems
“NetIO”
NDIS 6.0
Multi-packetsend and
receive
Multiprocessorscaling
Unifiedconfigurationmechanisms
Extensivediagnostic
support
Cleanextensibilityinterfaces
Simplifiedkernel-modesupport
Unified IPv4 and IPv6
Native offload capability
Advanced TCP algorithms
Consistent, compatible API improvements
August 3, 2006 9
NetIO architectural frameworkA whirlwind debugger-guided tour
• Putting the components together– modules, binding, configuration– transport and network protocols– diagnostics, tracing
• Maintaining runtime state– compartments, interfaces, addresses, routes– endpoints, ports, listeners, connections
• Handling I/O– requests, buffers, queuing– paths, neighbors– inspection, injection, callouts
August 3, 2006 10
Designing for extensible securityOne question, though…
What does all this change mean fornetwork security and network policy
solutions on Windows?
August 3, 2006 11
Designing for extensible securityThe problem: how do you infer this…
endpoint listener
connection
socket()
listen()
connect()
accept()[*]
shutdown()
closesocket()
August 3, 2006 12
Designing for extensible security…from this?
?
socket()
listen()
connect()
accept()[*]
shutdown()
closesocket()
SYN SYNACK
FIN RST ACK
August 3, 2006 13
Designing for extensible securityThree conclusions
• Security-focused components needvisibility into the operation of thethings that they secure
• Policy-enforcing components needdirect control over the things thatpolicy talks about
• Security policy must be decoupledfrom components so it can evolve atthe pace of security threats
August 3, 2006 14
Designing for extensible securityThe NetIO approach
Allow external components to cleanly observe andinfluence internal logic
August 3, 2006 15
Designing for extensible securityUnderstanding the Windows Filtering Platform
TCP UDP Raw IP
IPv4 IPv6
Windows Sockets
RPC
Neighbor Discovery
This is the networking stack…
August 3, 2006 16
Designing for extensible securityUnderstanding the Windows Filtering Platform
…and this is how WFP fits in.
August 3, 2006 17
Designing for extensible securityWhat’s in the picture?
• Core stack (TCP, UDP, IPv4, IPv6)• Built-in policy-related components
–Application Layer Enforcement–Stream inspection– IPsec
• Core filtering engine–User-mode and kernel-mode logic–Filter database
• Filtering callouts
August 3, 2006 18
Designing for extensible securityWhat layers are defined for callouts?
• RPC, IKE• Socket operations (listen, accept, connect,
port assignment)• In-order TCP data streams• Inbound & outbound TCP/UDP messages• Inbound, outbound & forwarded IP packets• ICMP messages• …and more to come
August 3, 2006 19
Designing for extensible securityWhat does WFP enable?
• Extensibility• Transparency to users and
applications• Tight integration, high performance,
scalability
August 3, 2006 20
Performance and scalability
Core TCP performanceHandling intensive workloads
Core TCP performanceFlow control 101
Receiver advertises window
Sender transmits up to window size
Receiver has to acknowledge somethingbefore sender can transmit further
Ideal window: bandwidth * delay
Bandwidth
Delay
Receive window capacity
Receiver window capacity
Core TCP performancePipes & flow control
Pipecharacteristics
100 Mbps with 10 ms delay5Mbps with 200ms delay
1Gbps with 50ms delay
Idealwindow128KB128KB~6MB
Capacity utilizedby default window
~12%~12%~1.2%
Core TCP performanceFlow control on various paths
Receiver enables window scaling by default
Continuously estimates pipe capacity and monitorsapplication reads
Auto-tunes receive window advertisements toensure the receive window doesn’t limit throughput
up to 4000% improvement over XP in throughputfor HTTP
up to 4600% improvement over XP in throughputfor file transfers with SMB 2.0 pipelining
Core TCP performanceTCP receive window auto-tuning
Command line:netsh interface tcp set global autotuninglevel
Core TCP performanceControlling auto-tuning
Group Policy and UIGpedit.msc under advancedpolicy-based QoS settings
Core TCP performanceControlling auto-tuning
August 3, 2006 27
Performance and scalability
Core TCP performanceHandling intensive workloads
Speed1 Gbps
10 Gbps100 Gbps
Budget16 _secs1.6 _secs0.16 _secs
Minimizing per-packet processingMulti-packet transmission and receptionOffload checksum computation and verificationGiant Send Offload (GSO)
but driving a connection at 100 Gbps requiresmore….
Handling intensive workloadsTackling bandwidth scalability
Neighbor Discovery
1. Initiate offload attempt
IPv4/IPv6
5. Accept offload
Synchronizes state machinesbetween OS and hardware
TCPTCP connection
IP path state
Link-layer state
TCP offload manager
IP offload manager
ND offload manager
2. Cache TCP state
3. Cache IP state
4. Cache ND state
6. Update cached state
Handling intensive workloadsTCP connection offload
Transparently and gracefully transitions stateback and forth between OS and hardware
greater than 50% reduction in CPU utilizationusing 1Gbps Ethernet for HTTP workloads
OS continuously monitors connection activityand selects suitable candidates for offload
Defines offload state composably to simplifyoffload of other protocol stacks (e.g. SSL)
Handling intensive workloadsTCP connection offload
Path characteristics1 Gbps at 500ms10Gbps at 500ms
100Gbps at 500ms
Buffer size~64MB~512MB~6GB
Packet loss probability grows steadilyRamp-up after loss takes much longer (10minutes on 1Gbps/100ms path)
Packets in flight~32 thousand~256 thousand
~3 million
Handling intensive workloadsHigh latency throughput
Slow-start phaseIncrease congestion window by 1 packet foreach cumulative acknowledgment
Congestion avoidance phaseIncrease congestion window by 1 packet foreach round trip
Congestion responseOn loss, drop window to 1 packet and set slow-start threshold to _ outstanding data
Handling intensive workloadsClassic TCP congestion control 101
Congestion avoidanceDetect congestion by sensing increased delayAssumes sufficient network buffering toproduce measurable delay variations
Congestion responseAvoid packet loss by adjusting congestionwindow in response to delay
Handling intensive workloadsDelay-based TCP congestion control 101
Loss window Delay windowCongestion window
Combine loss and delaywindows for faster ramp-
up and recovery
Handling intensive workloadsReducing ramp-up time with Compound TCP
nearly 50% reduction in transfer timeover 1Gbps path with 30ms RTT
Tries to avoid losses when runningalone and recover quickly from lossescaused by others
Designed for fairness to connectionsusing loss-based congestion control
Handling intensive workloadsCompound TCP
Writing Networked Applications
Detecting Internet connectivityOptimizing connection establishmentPort management
Wireless Hot Spot
Windows Vista PC
Wired Internet
Internet Site
1. Send query for1. Send query forknown DNS nameknown DNS name
2. Invalid response,2. Invalid response,not Internetnot Internet
3. Send request for known URL3. Send request for known URL4. Correct response,4. Correct response,hence Internethence Internet
Writing networked applicationsDetecting Internet connectivity
characterizes global Internet connectivityfor both IPv4 and IPv6
Handles DNS spoofing by wirelesshotspots and detects transparent HTTPproxies
Scales by leveraging DNS and HTTPcaching
Queries issued through NetworkLocation Awareness API, handled byNLA 2.0 service
Writing networked applicationsDetecting Internet connectivity
Wireless Hot Spot
Windows Vista PC
Wired Internet
Internet Site
Wired addressesWired addressesIPv4 site-localIPv4 site-localIPv6 link-localIPv6 link-localIPv6 globalIPv6 global
Wireless addressesWireless addressesIPv4 site-localIPv4 site-localIPv6 link-localIPv6 link-local
Server addressesServer addressesIPv4 globalIPv4 globalIPv6 globalIPv6 global
Writing networked applicationsOptimizing connection establishment
designed to optimize connection successrate across IPv4 and IPv6
Currently tries one combination at a time,will make parallel attempts in futureAddress sorting functionality availableon its own via socket I/O control
Prioritizes and sorts multiple combinationsof source and destination addresses
Writing networked applicationsWSAConnectByName and WSAConnectByList
Well-known portsWell-known ports
11……
……10241024
10251025……
……50005000 ……6553465534
50015001……
Ephemeral portsEphemeral ports Other portsOther ports
Well-known portsWell-known ports
11……
……10231023
10241024……
……4915149151 ……6553465534
4915249152……
Registered portsRegistered ports Ephemeral portsEphemeral ports
more port numbers for dynamic assignmentfewer collisions on registered port numbers
Writing networked applicationsBasic port management
May I have 100 ports?May I have 100 ports?
OK, start at …
Writing networked applicationsReserving ports at runtime for applications
May I have port 520?May I have port 520?
OK, here is a reservation token …
Writing networked applicationsReserving ports statically for services
Reserve port numbers at runtime andstaticallyOptionally randomizes port assignmentsfor increased security
Supports IANA compliance for registeredand ephemeral ports
Writing networked applicationsPort management
August 3, 2006 45
Call to Action
We’re building the foundation, and we wantyour help!
• Ensure your tools & products light up withNetIO– Test devices for compatibility with TCP window
scaling– Achieve great TCP performance by supporting
pipelining and multithreading– Extend your reach by supporting IPv4 and IPv6– Leverage new features, e.g. port reservation
and randomization
August 3, 2006 46
Call to Action (2)
We’re building the foundation, and we wantyour help!
• Innovate on NetIO to enable new scenarios– Plug into WFP to enforce your own security
policies– Use secure sockets for authentication &
authorization– Leverage kernel sockets & kernel IP helper API
in drivers
August 3, 2006 47
Resources
EmailTCP/IP: [email protected]: [email protected]
Windows Vista on MSDN and TechNethttp://msdn.microsoft.com/windowsvista/http://windowssdk.msdn.microsoft.com/http://www.microsoft.com/technet/windowsvista
/network/default.mspx
The Cable Guyhttp://www.microsoft.com/technet/community/c
olumns/cableguy/default.mspx
August 3, 2006 48
Still to Come!
Tony ChorCase Study: The Security DevelopmentLifecycle and Internet Explorer 7
16:45 – 18:30
Adrian MarinescuWindows Vista Heap ManagementEnhancements – Security, Reliabilityand Performance
15:15 – 16:30
Noel Anderson &Taroon Mandhana
WiFi in Windows Vista: A Peek Insidethe Kimono
13:45 – 15:00
August 3, 2006 4949
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.