7/9/2001 Edward Chow Content Switch 1
Introduction to Linux-based Virtual Server and Content Switch
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
[email protected]
The ppt file of this tutorial is available at http://cs.uccs.edu/~chow/pub/conf/pdcat/tutorial.ppt
Part of this work sponsored by CCL/ITRI
Outline of the Talk
• Overview of Content Delivery Networks
• Linux-based Virtual Server
• Linux-based Content Switching
Content Delivery Network (CDN)
[Figure: clients reach a single host server across ISP networks (MindSpring, PSINet, Sprint, Gloobix, QWest, @Home, UUnet). Huge request volume leads to slow response and server crash.]
Content Delivery Problems
http://www.akamai.com
Use Client Cache/Client Side Cache Server
[Figure: the same topology with a client cache and a client-side cache server added near the clients. The host server sees fewer requests and clients get a fast response.]
Use Mirror Sites
[Figure: the same topology with two mirror sites added. The host server sees fewer requests and clients get a fast response.]
Mirror selection needs improvement: guide the choice of mirror server with server load and network bandwidth measurements.
Edge Network Cache Servers
[Figure: the same topology with a client cache, a client-side cache server, mirror sites, and edge network cache servers placed inside the ISP networks. The host server sees fewer requests and clients get a fast response.]
Content Delivery Problem
• Cache Location Problem: Where to put cache servers?
• How many are needed?
• When/where/how to push/deliver the content?
• How about dynamic content?
Akamai Edge Delivery Service
• Peering Bottleneck Problem: Access traffic is evenly spread over 7400+ networks (no one over 5%; most << 1%). Need to put edge servers in many networks.
• 11/2000, 4 billion bits/day for 2800 sites.
• Source: http://www.akamai.com

Date      # of Edge Servers   # of Networks   # of Countries
11/2000   6000                335             54
6/2001    9700                650             56
Caching Dynamic Content at Web Proxies
• Active Cache Project [Cao98], Univ. Wisconsin
  – Cache Java applets to be executed at proxies
  – Choice of passing the request to the server, delivering the cached copy, or generating the content dynamically
• Edge Side Include (ESI):
  – XML tags specify ESI fragments in a web page
  – Each ESI fragment can have different cache/
Edge Side Include Example
http://www.esi.org/
<table>
  <tr>
    <td colspan="2">
      <esi:try>
        <esi:attempt>
          <esi:include src="http://www.myxyz.com/news/top.html" onerror="continue" />
        </esi:attempt>
        <esi:except>
          <!--esi
          This spot is reserved for your company's advertising.
          For more info <a href="www.myxyz.com">click here</a>
          -->
        </esi:except>
      </esi:try>
    </td>
  </tr>
</table>
Solution to First Mile Problem
• First Mile Problem: huge number of requests at the web site of a CDN
• High Bandwidth Connection
• Caching
  – End System Cache
    • Client Cache
    • Client Site Proxy Cache Server
    • Mirror Site Caches
  – Cache Servers in Internet
    • Hierarchical Cache Servers, e.g., Squid/Harvest/Adaptive Web
    • Edge Servers of Akamai
• Faster Server/Server Farm (Server Side Caching+Cluster)
• Layer4 Load balancer+Real Servers
• Content Switch+Real Servers
• Distributed Packet Rewrite
Web Server Cluster
[Figure: a load balancer or content switch in front of a cluster of real servers.]
Load balancer can run at
• Application level: Reverse Proxy
• Kernel level: Linux Virtual Server
Load balancer can distribute requests based on
• Layer 3-4 info: fixed fields/fast hash
• Layer 3-7 info: variable length/slow parsing
Comparison of Load Balancers
• Reverse Proxy runs as an application process and requires more memory/packet copying.
• Linux Virtual Server runs in kernel: no memory

Name                                  Type   Level         Layer Info
Reverse Proxy/Apache/Tomcat/Servlet   SW     Application   3-7
Linux Virtual Server                  SW     Kernel        3-4
Linux Content Switch                  SW     Kernel        3-7
Layer4 Switch (narrow def.)           HW     Embedded OS   3-4
Content/Web Switch                    HW     Embedded OS   3-7
Linux Virtual Server (LVS)
• “Virtual server is a highly scalable and highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, and the users see only a single virtual server” with Virtual IP address (VIP).
• http://www.linuxvirtualserver.org/
[Figure: Client (CIP) reaches the Load Balancer/Director (a Linux box holding the VIP) across the Internet; the Director reaches Real Server1-3 (RIP1-RIP3) over a WAN/LAN.]
CIP: Client IP Address
VIP: Virtual IP Address
RIP: Real Server IP Address
LVS-NAT Configuration (Network Address Translation)
• All return traffic goes through the Director, which makes it slow
• Director modifies IP addr/port #/checksum
• Director and real servers are on the same LAN
• No modification needed on real servers
• Port remapping: the real web server can run on port 8080
[Figure: Client (CIP) reaches the Director (VIP) across the Internet; the Director connects through a switch to Real Server1-3 (RIP1-RIP3).]
LVS-NAT Configuration Step 2. Director routes Pkt
• Based on CIP, source port #, VIP, and dst port #, the director selects one of the real servers
• Change the dst IP addr or port # of pkt.
• LVS routing/scheduling rules are set with the ipvsadm cmd
[Figure: 1. request: packet (src CIP, dst VIP) arrives at the Director. 2. Scheduling/Rewrite packet: the Director forwards it as (src CIP, dst RIP1).]
LVS-NAT Configuration Step 3. Real Server Replies
• Real server retrieves response.
• All real servers set their default gateway to the Director, like any other NAT or IP masquerade setup
• Packet will be sent back to Director.
[Figure: steps 1-2 as before. 3. Process Request: Real Server1 replies with packet (src RIP1, dst CIP).]
LVS-NAT Configuration Step 4. Director rewrites reply
• Director changes the src IP addr. (RIP1) of pkt to VIP
• Modify port # if needed.
• Modify the checksum; send back pkt.
[Figure: steps 1-3 as before. 4. Rewrite reply: the Director turns packet (src RIP1, dst CIP) into (src VIP, dst CIP).]
LVS-NAT Configuration (Network Address Translation)
• All return traffic goes through the Director, which makes it slow
• Director modifies IP addr/port #/checksum
• Director and real servers are on the same LAN
[Figure: complete LVS-NAT packet flow. 1. request (src CIP, dst VIP). 2. Scheduling/Rewrite packet (src CIP, dst RIP1). 3. Process Request (src RIP1, dst CIP). 4. Rewrite reply (src VIP, dst CIP). 5. Receive reply at the client.]
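The address rewriting in steps 2 and 4 can be sketched in a few lines of Python. This is only an illustration of the packet flow on these slides, not LVS code: the `Packet` and `NatDirector` names are hypothetical, the server choice is a plain round robin, and the checksum update is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str  # "ip:port" of the sender
    dst: str  # "ip:port" of the receiver


class NatDirector:
    """Toy model of the LVS-NAT director (hypothetical class)."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        self.real_servers = list(real_servers)
        self.connections = {}  # client endpoint -> chosen real server
        self._next = 0

    def forward_request(self, pkt):
        # Step 2: pick a real server once per client endpoint,
        # then rewrite the destination from VIP to RIP.
        rip = self.connections.get(pkt.src)
        if rip is None:
            rip = self.real_servers[self._next % len(self.real_servers)]
            self._next += 1
            self.connections[pkt.src] = rip
        return replace(pkt, dst=rip)

    def rewrite_reply(self, pkt):
        # Step 4: the reply leaves the real server with src = RIP;
        # the director replaces it with the VIP.
        return replace(pkt, src=self.vip)
```

Because the reply has to pass through `rewrite_reply`, every real server must use the director as its default gateway, which is the reason all return traffic flows through it.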
LVS-NAT Setup Commands
# make the director forward the masquerading packets
echo 1 > /proc/sys/net/ipv4/ip_forward
ipchains -A forward -j MASQ -s 172.16.0.0/24 -d 0.0.0.0/0
# Add virtual service and link a scheduler to it
ipvsadm -A -t 202.103.106.5:80 -s wlc    # Weighted Least-Connection scheduling
ipvsadm -A -t 202.103.106.5:21 -s wrr    # Weighted Round Robin scheduling
# Add real servers and select forwarding method and weight
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.2:80 -m
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.3:8000 -m -w 2
ipvsadm -a -t 202.103.106.5:21 -R 172.16.0.2:21 -m
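To make the `wlc` option concrete, here is a minimal Python sketch of weighted least-connection selection. It is a simplification under stated assumptions: the function name `pick_wlc` and the input layout are hypothetical, and the kernel scheduler's exact connection accounting is not reproduced.

```python
def pick_wlc(servers):
    """Return the server with the fewest connections per unit of weight.

    servers maps a server name to (active_connections, weight).
    Servers with weight 0 are treated as unavailable.
    """
    candidates = {name: cw for name, cw in servers.items() if cw[1] > 0}
    return min(candidates, key=lambda n: candidates[n][0] / candidates[n][1])
```

With the weights from the commands above (172.16.0.3 has weight 2, 172.16.0.2 the default weight 1), the second server can hold about twice as many connections before the first one is preferred.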
LVS-Tunnel Configuration (IP Tunneling)
• Real servers need to handle IP-over-IP packets.
• Real servers can be geographically separated, and return traffic can go through different routes.
• Security implication!
[Figure: 1. request: packet (src CIP, dst VIP) arrives at the load balancer (Linux box, VIP, RIP0). 2. Scheduling/Put packet in IP tunnel: the packet is encapsulated as (src RIP0, dst RIP2)(src CIP, dst VIP) and sent through an IP tunnel to Real Server2; Real Server1 and Real Server3 (RIP1, RIP3) have their own IP tunnels. 3. Process Request. 4. Receive reply: the reply (src VIP, dst CIP) goes to the client.]
LVS-Tunnel Setup Commands
#The load balancer (LinuxDirector), kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i

#The real server 1, kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
# insert it if it is compiled as module
insmod ipip
ifconfig tunl0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev tunl0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden
LVS-DR Configuration (Direct Routing)
• Real servers need to configure a non-ARP alias interface with the virtual IP address, and that interface must share the same physical segment with the load balancer.
• Only the Director's interface replies to VIP ARP requests.
• Director only rewrites the server MAC address; the IP packet is not changed. Fast!
[Figure: 1. request: frame (src GMAC, dst VMAC) carrying packet (src CIP, dst VIP) arrives at the Director through the router/switch. 2. Scheduling/Rewrite packet: the Director forwards it as frame (src VMAC, dst RMAC3) with the same IP packet. Real Server1-3 have MAC addresses RMAC1-RMAC3.]
GMAC: Gateway MAC address
LVS-DR Configuration Step 3. Process Request
• Real server returns the reply.
• Reply goes directly through the switch/router, not the Director.
[Figure: steps 1-2 as before. 3. Process Request: Real Server3 sends frame (src RMAC3, dst GMAC) carrying packet (src VIP, dst CIP). 4. Receive reply at the client.]
GMAC: Gateway MAC address
LVS-DR Configuration (Direct Routing)
• Real servers need to configure a non-ARP alias interface with the virtual IP address, and that interface must share the same physical segment with the load balancer.
• Load balancer only rewrites the server MAC address; the IP packet is not changed. Fast!
[Figure: complete LVS-DR packet flow. 1. request: frame (src GMAC, dst VMAC), packet (src CIP, dst VIP). 2. Scheduling/Rewrite packet: frame (src VMAC, dst RMAC3), packet unchanged. 3. Process Request: frame (src RMAC3, dst GMAC), packet (src VIP, dst CIP). 4. Receive reply at the client.]
GMAC: Gateway MAC address
LVS-DR Setup Commands
#The load balancer (LinuxDirector), kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g

#The real server 1, 172.26.20.112, kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
Persistence Handling in LVS
• Sticky connection examples:
  – FTP control (port 21), data (port 20). For passive FTP, the server tells the client the port that it listens on, and the client initiates the data connection to that port. For LVS/TUN and LVS/DR, LinuxDirector is only on the client-to-server half of the connection, so it is impossible for LinuxDirector to get the port from the packet that goes to the client directly.
  – SSL session: port 443 for secure Web servers and port 465 for secure mail servers; the key for the connection must be chosen/exchanged.
• Persistent port solution:
  – When a client first accesses the service, LinuxDirector creates a template between the given client and the selected server, then creates an entry for the connection in the hash table.
  – The template expires after a configurable time, but not until all its connections have expired.
  – Connections for any port from the client are sent to the same server until the template expires.
  – The timeout of persistent templates can be configured by users; the default is 300 seconds.
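The persistent port solution can be sketched as a small lookup table keyed by client address. This is a hedged illustration, not the LVS data structure: `PersistenceTable` and its method names are hypothetical, and the rule that a template outlives its open connections is approximated by refreshing the expiry on every new connection.

```python
class PersistenceTable:
    """Toy client-to-server templates with a timeout (hypothetical class)."""

    def __init__(self, timeout=300.0):
        self.timeout = timeout   # seconds; 300 is the default quoted above
        self.templates = {}      # client IP -> (server, expiry time)

    def lookup(self, client_ip, choose, now):
        """Return the sticky server, calling choose() only when no live
        template exists for this client."""
        entry = self.templates.get(client_ip)
        if entry is not None and now < entry[1]:
            server = entry[0]
        else:
            server = choose()
        # Simplification: each new connection refreshes the template.
        self.templates[client_ip] = (server, now + self.timeout)
        return server
```

Keying on the client IP alone, rather than IP and port, is what lets an FTP data connection or a second SSL connection land on the same real server.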
HA-LVS Configuration (High Availability)
[Figure: Client (CIP) reaches the LinuxDirector across the Internet. A Backup Director is linked to the LinuxDirector by HeartBeat. Both directors run MON and front Real Server1-3.]
1. When the Backup Director detects LinuxDirector failure through the heartbeat protocol, it “graciously negotiates” the take-over of the VIP. Provides fault tolerance.
2. Monitor the server processes running on the real servers. Route requests to server processes that are alive. Initiate restart/repair.
Performance of LVS-based Systems
“We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.” Jerry Glomph Black, Director, Internet & Technical Operations, RealNetworks
“I had basically (1024) four class-Cs of virtual servers which were loadbalanced through a LinuxDirector (two, actually -- I used redundant directors) onto four real servers which each had the four different class-Cs aliased on them.” "Ted Pavlic" <[email protected]>
LVS Usage Survey 2/15/2001 Lorn Key

Clusters                  20                      1             2                2                 2
Directors Per Cluster     2                       2             2                2                 2
Total Real Servers        170                     12            4                15                6
Routing Methods           DR/NAT                  DR            NAT              DR                NAT
Schedule Methods          RR/WLC                  WRR           LC               WLC               WLC
Types of Real Servers     RH6.2                   Linux         Win, Linux       Linux, Solaris    RH
Service Offered           WWW                     WWW/other     WWW, DB          WWW, SMTP         WWW
File System Replication   rsync                   rsync         Coda, NFS        Custom            rsync, custom
Monitoring Software       Heartbeat, ldirectord   Nanny/Pulse   Heartbeat, Mon   Nanny, Pulse      Heartbeat
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
Sponsored by Computer Comm. Lab/ITRI
Content Switch Topics
• What is a Content Switch?
• What Services it Can Provide
• Content Switch Example
• Related Technologies
• Content Switch Architecture and Basic Operations
• TCP Delay Binding and Related Improvement
• Content Switch Rule and Conflict Detection
• Conclusion
Content Switch (CS)
• Route packets based on high layer (Layer 5/7) headers and content.
• Examples:
  – Direct Web traffic based on pattern of
    • URLs, cookies: URL Switching
    • XML tag values: Web Switching
  – Can route incoming email based on email address; connect POP/IMAP based on login
• Web switches and Intel XML Director/accelerator are special cases of content switch.
What Services It Can Provide
• Enabling premium services for e-commerce, ISP, and Web hosting providers
• Load balancing and highly available server clusters: Web, E-commerce, Email, Computing, File, SAN
• Policy-based networking, differential/QoS services
• Firewall, strengthening DoS protection, cache/firewall load-balancing
• 'Flash-crowd' management
• Email spam protection, virus detection/removal
• Applet authentication/filtering
F5 VRM Solution
[Figure: F5 deployment across the Internet with three sites, Site I (newyork.domain.com), Site II (losangeles.domain.com), and Site III (tokyo.domain.com), and a user at london.domain.com with a local DNS. Components shown: 3-DNS, GLOBAL-SITE, router, BIG-IP, server array, webmaster.]
Intel Netstructure XML Director 7280
• Example of Rule:
  Server1: create */order.asp & //Amount[Value >= 10000]
Phobos In-Switch
• Only load balancing switch in a PCI card form factor
• Plugs directly into any server PCI slot
• Supports up to 8,192 servers, ensuring availability and maximum performance
• Six different algorithms are available for optimum performance: Round Robin, Weighted Percentage, Least Connections, Fastest Response Time, Adaptive and Fixed.
• Provides failover to other servers for high-availability of the web site
• U.S. Retail $1995.00
E-Commerce Example: 1. Client
Client submits via HTTP/Post (or SOAP) the following purchase in XML:
<purchase>
  <customerName>CCL</customerName>
  <customerID>111222333</customerID>
  <item>
    <productID>309121544</productID>
    <productName>IBM Thinkpad T21</productName>
    <unitPrice>5000</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>50000</subTotal>
  </item>
  <item>
    <productID>309121538</productID>
    <productName>Intel wireless LAN PC Card</productName>
    <unitPrice>200</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>2000</subTotal>
  </item>
  <totalAmount>52000</totalAmount>
</purchase>
E-Commerce Example: 2. Content Switch
• Content switch receives the packet.
• Recognizes it is an HTTP POST request from the HTTP request line
    POST /purchase.cgi HTTP/1.1
• Recognizes it is an XML document from the meta header
    content-type: TEXT/XML
• Parses the XML content
• Extracts values of tag sequences:
    52000   purchase/totalAmount
    CCL     purchase/customerName
• Rule 1 is matched and the packet is routed to one of the highSpeedServers.
    Rule 1: if (xml.purchase/totalAmount > 5000) routeTo(highSpeedServers);
    Rule 2: if (xml.purchase/customerName == CCL) routeTo(specialCustomerServers);
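The two rules above can be tried out with Python's standard XML parser. This is a sketch of the decision only, assuming the request body has already been reassembled; `route_purchase` and the fallback group `defaultServers` are hypothetical names that do not appear in the slides.

```python
import xml.etree.ElementTree as ET


def route_purchase(xml_text):
    """Apply Rule 1 and Rule 2 in order and return a server group name."""
    root = ET.fromstring(xml_text)            # root element is <purchase>
    total = int(root.findtext("totalAmount", default="0"))
    customer = root.findtext("customerName", default="")
    if total > 5000:                          # Rule 1
        return "highSpeedServers"
    if customer == "CCL":                     # Rule 2
        return "specialCustomerServers"
    return "defaultServers"                   # assumption: no rule matched
```

For the purchase shown on the previous slide, both rules hold, and strict sequential evaluation is what makes Rule 1 win.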
No Free Lunch: Penalty of Having Content Switch
• Increased packet processing time.
• For XML Director/Accelerator, it needs to parse the XML document and match tag sequences. 1-3? order of processing time

                           Layer 4 Switching     Layer 7 Switching
packet header extraction   fixed short fields    varying length long fields
switch rule matching       hash table look up    pattern matching

Size of XML Document (Bytes)   XML Content Extract Time (ms)
600                            14
7000                           21
67104                          53
Related Technologies
• Application level solution: Proxy server; Apache/Tomcat/Servlet; Microsoft NLB
• Kernel level layer 4 load balancing solution: http://www.linuxvirtualserver.org/
  – Joseph Mark's presentation
  – LVS-NAT (Network Address Translation) web page
  – LVS-IP Tunnel web page
  – LVS-DR (Direct Routing) web page
• Hardware solution: Cisco 11000, F5 (Big IP), Alteon Web Systems, Foundry Networks (ServerIron). Excellent information in: Foundry ServerIron Installation and Configuration Guide, May 2000.
• Routing table lookup: Longest prefix (Gupta/McKeown)
Basic Operations of Content Switching
[Figure: incoming packets pass through header/content extraction, packet classification (CS rule matching algorithm, using CS rules from the CS rule editor), and packet routing (load balancing, using network path info and server load status); packets are then forwarded to servers.]
CS: Content Switching
Content Switch Architecture
[Figure: content switch architecture, from Apostolopoulos, Infocom 2000.]
Content Switch Architecture
Case A: Controller finds there is an entry in its hash table. Route request to the “sticky connection” outgoing port.
[Figure: client, hash table, Real Server1.]
Content Switch Architecture
Case B: Step 1. Controller finds there is no entry in the hash table. Route request to the content switch processor.
[Figure: client, hash table, Real Server1.]
Content Switch Architecture
Case B: Step 1. Controller finds there is no entry in the hash table. Route request to the content switch processor.
Step 2. CS processor
  a. Extract content/match CS rules
  b. Route request
  c. Set up sequence # modification on the server side port
[Figure: client, hash table, CS rules, pkt modification info, Real Server1.]
Content Switch Architecture
Case B: Step 1. Controller finds there is no entry in the hash table. Route request to the content switch processor.
Step 2. CS processor
  a. Extract content/match CS rules
  b. Route request
  c. Set up sequence # modification on the server side port
Step 3. At the server side port, return pkts are modified (sequence #/IP addr/checksum) and routed back to the client.
[Figure: client, hash table, CS rules, pkt modification info, Real Server1.]
Efficient Software Architecture
• Tasks: millions of packets, with thousands of rules to match and load balancing algorithms to run.
• How to assign tasks to the (network) processors and threads?
  – Packet extraction (understand header formats, XML parsing)
  – Content switching rule matching
  – Packet routing (load balancing, bandwidth control)
• How much packet processing should controllers do?
• What can a controller do?
• A typical parallel processing problem?
TCP Delay Binding (Splicing)
[Figure: message sequence among client, content switch, and server (steps 1-11).
Client side (client and content switch): SYN(CSEQ); SYN(DSEQ) ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1) ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1).
Server side (content switch and server): SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1).
Step 11: DATA(?) 2nd request ACK(?).]
lenR: size of http request. lenD: size of return document.
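The sequence number conversion implied by the diagram can be written down directly: the client was told DSEQ by the content switch, the server chose SSEQ, so every server segment is shifted by DSEQ - SSEQ on the way to the client and every client acknowledgment is shifted back. A minimal Python sketch, with the hypothetical class name `SpliceState` and 32-bit wraparound as in TCP:

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


class SpliceState:
    """Per-connection offset between the switch's and the server's ISN."""

    def __init__(self, dseq, sseq):
        self.delta = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq):
        # DATA(SSEQ+1) from the server is delivered as DATA(DSEQ+1).
        return (seq + self.delta) % MOD

    def client_to_server_ack(self, ack):
        # ACK(DSEQ+lenD+1) from the client is forwarded as ACK(SSEQ+lenD+1).
        return (ack - self.delta) % MOD
```

The client-side sequence numbers need no conversion because the content switch reuses CSEQ when it opens the server connection.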
Improve Content Switching
• Set up CS-to-real-server connections ahead of time (persistent HTTP connections). NetScale. Reduces TCP 3-way handshake time.
• Pre-allocate server scheme (guess the real server based on the TCP SYN)
• Sequence # modification on every return pkt. Need to recompute the checksum also.
• Filter scheme (offload sequence # modification/rule matching to real servers)
• Buffering/pipelining (aggregating) requests
Pre-Allocate Server Scheme
[Figure: message sequence among client, content switch, and pre-allocated server (steps 1-6). The content switch relays each segment with the same sequence numbers on both sides: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1).]
• Guess routing decision based on IP/Port#/History
• Advantages:
  – Faster than TCP delay binding
  – Possible direct route between client and server
  – Reduced session processing overhead: no need to convert server sequence #
Degenerated to TCP Delayed Binding If Guess is Wrong
[Figure: message sequence among client, content switch, pre-allocated server, and right server (steps 1-12).
Steps 1-4, relayed unchanged between client and pre-allocated server: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1).
Step 5: DATA(SSEQ+1), FIN(CSEQ+lenR+1); server sent HTTP 404.
Connection to the right server: SYN(CSEQ); SYN(RSEQ)/ACK(CSEQ+1); ACK(RSEQ+1); DATA(CSEQ+1)/ACK(RSEQ+1).
Reply: DATA(RSEQ+1)/ACK(CSEQ+lenR+1) on the server side, DATA(SSEQ+1)/ACK(CSEQ+lenR+1) on the client side; ACK(SSEQ+lenD+1) on the client side, ACK(RSEQ+lenD+1) on the server side.]
Sequence # conversion is now needed for the right server.
Filter Process Scheme
Filter process runs on the server.
[Figure: message sequence among client, content switch, and server (steps 1-10).
Client side: SYN(CSEQ); SYN(DSEQ)/ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1)/ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1).
Server side: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1).
Step 5b: Migrate(Data, CSEQ, DSEQ).]
Pre-allocate performance plot
[Figure 3. Performance of Pre-allocate Server Scheme: plot of response time (microseconds, 0 to 500000) vs. document size (bytes, 0 to 40000) for Series 1-4.]
Series 1 - Basic scheme with no rule matching module inserted, i.e., using default IPVS.
Series 2 - Basic scheme with the rule matching module inserted.
Series 3 - Pre-allocate scheme with all hits, i.e., where all pre-allocate guesses were correct.
Series 4 - Pre-allocate scheme with all misses, i.e., where all pre-allocate guesses were wrong.
Handling multiple requests in a Keep-Alive connection
• Determine when a new request arrives
  – Verify that the previous request has been completely received
  – Request data size is > 0
• Key assumption: only one outstanding request is sent at a time by the client, i.e., requests are not pipelined
• Reuse connections
  – Once a connection is established, store its control information in a hash table keyed by real server address.
Quiz
• Web server keeps the TCP connection alive, expecting the browser to return for images and in-line media files.
• How many keep-alive connections are set up by IE5 and Netscape 4.7 for a web page with many .jpg/.gif images?
• Can these image requests be pipelined from client browser to web server?
Multiple HTTP Requests from One TCP Connection
• A keep-alive TCP connection may include multiple HTTP “GET” requests.
• Content Switch examines each “GET” request and makes a new routing decision.
• Content Switch establishes another connection with a different server based on the routing decision.
• The HTTP responses from different servers need to be interleaved and seen by the user as if they came from the same server.
• Solutions: In-order delivery (buffer requirement); out-of-order delivery (seq # tracking)?
• Problems: Should we throw away earlier html requests if later requests are received?
[Figure: NAT approach. The client requests Index.htm, cs.jpg, rocky.mid, and uccs.gif through the content switch, which fronts server1, server2, ..., server9.]
Multiple HTTP Requests from One TCP Connection
• Can servers return documents directly to the client in the keep-alive session case?
• Can an equivalent of VS-Tunnel or VS-DR be implemented using Content Switch?
[Figure: the client requests cs.gif, rocky.mid, and uccs.jpg through the content switch, which fronts server1, server2, ..., server9.]
Content Switch Rule Survey
Survey shows that existing switches support
• rules in basic (condition action) or (action condition) form
• some define the condition as a class, then specify the action in a separate statement or command
• simple single conditional term
• command line interface (to facilitate incremental update?)
• actions can include reject, forward, put in queue (for bandwidth control, scheduling)
Content Switch Rule Design
• Rule syntax is generic to support all intended features.
• Use simple C if statement syntax
    rule: if (condition) { action }
  – Easy to read
  – Allows optimization using the C compiler
• Condition consists of multiple terms of
  – variable relational_operator value, e.g.
      xml.purchase/totalAmount > 50000
      smtp.to == “[email protected]”
      cookie.name == “servlet1”
      bitmatch(64, 8, 0xff) == 64    # means TTL=64; idea from the netfilter universal filter
  – suffix(variable, string), e.g. suffix(url, “gif”)
  – regex(variable, pattern), e.g. regex(url, “/purchase”)
• Action consists of reject, forward(server | queue), loadBalance(serverGroup, loadBalancingAlgorithm)
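The condition terms above are easy to prototype. The sketch below is an assumption-laden illustration in Python rather than the C rule syntax of the slides: in particular, `bitmatch` is given an explicit `packet` argument and is assumed to take a bit offset, a field width in bits (byte-aligned here), and a mask, which matches the TTL example (the TTL is the ninth byte of an IPv4 header, bit offset 64).

```python
import re


def suffix(value, ending):
    """True when value ends with the given string, e.g. suffix(url, "gif")."""
    return value.endswith(ending)


def regex(value, pattern):
    """True when the pattern occurs anywhere in value."""
    return re.search(pattern, value) is not None


def bitmatch(packet, bit_offset, nbits, mask):
    """Return the masked field of nbits starting at bit_offset.

    Assumption: the offset and width are multiples of 8.
    """
    start = bit_offset // 8
    field = int.from_bytes(packet[start:start + nbits // 8], "big")
    return field & mask
```

A rule such as `if (suffix(url, "gif")) { forward(imageServers) }` then reduces to a call to `suffix` followed by the chosen action; `imageServers` is a made-up group name.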
Efficient CS Rule Matching
• Brute force, strict priority: rules are executed in a sequential manner.
• Efficient rule matching method:
  – Organize rules so that rules can be skipped based on existing content types.
  – Utilize compiler optimization techniques.
Simple CS Rule Editor GUI
Conflict Detection on Content Switching Rules
• Detect conflicts among rules or rule sets.
• Absolute conflict type:
    r1: if (xml.purchase/customerName == “CCL”) {routeTo(r1)}
    r2: if (xml.purchase/customerName == “CCL”) {routeTo(r2)}
• Potential conflict type:
    r1: if (xml.purchase/totalAmount > 5000) {routeTo(quickServers)}
    r2: if (xml.purchase/totalAmount > 20000) {routeTo(superServers)}
• Algorithm: Build a tree of the rules that use the same variable, check the operators and values to see whether they are the same or lead to a potential conflict, then compare the actions to decide between a conflict type and a duplication.
• Developed a conflict detection algorithm for rules with multiple-term conditions. It can be applied to policy-based rule conflict detection.
• The editor can build these trees while a user enters rules and warn about conflicts right away.
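A pairwise version of this check is small enough to sketch. The code below is not the project's tree-based algorithm: it is a simplified Python illustration that handles only single-term conditions with `==` and `>`, and the `Rule` record and `classify` function are hypothetical names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    variable: str
    op: str        # "==" or ">" in this sketch
    value: object
    action: str


def classify(a, b):
    """Return "absolute", "potential", "duplicate", or "none"."""
    if a.variable != b.variable:
        return "none"
    if (a.op, a.value) == (b.op, b.value):
        # Same condition: identical actions are a duplication,
        # different actions can never both be honored.
        return "duplicate" if a.action == b.action else "absolute"
    if a.op == ">" and b.op == ">" and a.action != b.action:
        # Two lower bounds always overlap above the larger bound.
        return "potential"
    return "none"
```

Running `classify` over every pair of rules that share a variable gives the same verdicts as the two examples on this slide; grouping rules by variable first is what the tree in the slide's algorithm provides.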
XML Tag Value Extraction
• An xmlContentExtract() function was built to extract the tag values of a list of unique tag sequences.
• It is based on Clark Cooper's expat 1.0 XML parser.
• Its arguments include the pointer to an XML document, the pointer to the array of strings (unique XML tag sequences; we follow the XSL selector syntax), and the number of sequences.
• It returns a list of structure nodes, each with the tag sequence, its attribute, and its value.
• Currently, it supports one attribute, and each tag sequence needs to be unique.
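Python ships an expat binding, so the idea behind xmlContentExtract() can be sketched without the original C code. This is an analogue under assumptions, not a port: the name `xml_content_extract` and its dictionary return value are hypothetical, and attributes are ignored.

```python
import xml.parsers.expat


def xml_content_extract(document, tag_sequences):
    """Return {tag sequence: text} for each requested sequence found.

    Tag sequences use the slash form from the slides,
    e.g. "purchase/totalAmount".
    """
    wanted = set(tag_sequences)
    path = []      # stack of currently open element names
    values = {}

    def start(name, attrs):
        path.append(name)

    def end(name):
        path.pop()

    def chars(data):
        key = "/".join(path)
        if key in wanted:
            # expat may deliver text in several chunks; concatenate them.
            values[key] = values.get(key, "") + data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(document, True)
    return values
```

Tracking the open-element stack is what turns expat's flat stream of events into the path-like tag sequences the rules refer to.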
Status of UCCS ACSD Project
• A Linux-based LVS content switch called LCS was developed.
• Sponsored by CCL/ITRI.
• Based on Linux-2.2.16-3; current release is LCS02.
• ip_forward.c, ip_masq.c, and ip_vs.c were modified to implement basic TCP delay binding.
• ip_cs.c was added for most of the content switching functions, with http header extraction and xml content extraction.
• A simple Java-based ruleEdit program was created for rule editing and conflict detection.
• A rule translation program converts the rule set into a Linux kernel module and allows dynamic replacement of rules without restarting the system.
• LCS is being ported to the Intel IXP 1200 network processor.
LCS Demo
• We set up viva.uccs.edu as a content switch and wait and ace as two real servers.
• URL Switching demo:
    http://viva.uccs.edu/~lcs1/ routes to ace.uccs.edu
    http://viva.uccs.edu/~lcs2/ routes to wait.uccs.edu
• XML Web Switching (E-commerce applications)
    http://archie.uccs.edu/~acsd/lcs/xmldemo.html
    When the 2nd subtotal tag >= 50000, route to ace.
    When the 2nd subtotal tag < 50000, route to wait.
• Let us know if you have problems accessing them. My students may be working on LCS extensions.
LCS Rule Example
R4: if (atoi(rule_fields[1].value) >= 50000) {
        return route_to("ace", NON_STICKY, saddr);
    }
R5: if ((atoi(rule_fields[1].value) > 0) && (atoi(rule_fields[1].value) < 50000)) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
R10: if (strstr(url, "lcs1") != NULL) {
        IP_RULE_MSG("server=ace\n");
        return route_to("ace", NON_STICKY, saddr);
    }
R11: if (strstr(url, "lcs2") != NULL) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
Related Load Balancing Research Results
• Modified Apache status module to report
  – Total bytes to be transferred by child processes
  – Average document transfer speed
• Modified LB-DNS to receive server status and bandwidth probing results.
• LB-DNS returns the IP address of the best server based on a weight contributed by both server load and bandwidth.
• Modified WebStone benchmark to test the performance of load balancing web server clusters.
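The slides do not give the weight formula used by LB-DNS, so the following Python sketch only illustrates the general shape of such a choice. Everything in it is an assumption: the name `best_server`, the normalized inputs (load and bandwidth both in the range 0 to 1), and the linear combination with a tunable `load_weight`.

```python
def best_server(stats, load_weight=0.5):
    """Pick the server with the highest combined score.

    stats maps a server IP to (load, bandwidth), both normalized to 0..1,
    where lower load and higher bandwidth are better. The linear formula
    is a placeholder, not the one used by LB-DNS.
    """
    def score(ip):
        load, bandwidth = stats[ip]
        return load_weight * (1.0 - load) + (1.0 - load_weight) * bandwidth

    return max(stats, key=score)
```

A DNS-based balancer would return the chosen address in its answer and refresh `stats` as new server reports and bandwidth probes arrive.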
Load Balancing Systems
[Figure: Modified Web Server 1 through Modified Web Server n report to a statistics gathering daemon, which writes the server ranking to /tmp/StatFile. The LBA (modified DNS) uses server delay and bandwidth probe results when answering requests for web pages.]
Connection Rate: LBA vs. Round-Robin
Server connection rate for 4 servers (connections/sec)

Update for LBA, per sec   load balancing system   round-robin
1                         418.2                   327.6
2                         656.6                   327.6
3                         907.9                   327.6
4                         420                     327.6
5                         636.7                   327.6
6                         322.6                   327.6
7                         711.6                   327.6
8                         420.5                   327.6
9                         638.3                   327.6
10                        670.6                   327.6
11                        683.4                   327.6
12                        899                     327.6

Round robin was run only once.
Conclusion
• Content Delivery Networks improve Internet content retrieval.
• LVS provides a low-cost layer 4 switching service for clusters.
• Linux Content Switch with generic rules can be easily configured for a wide variety of value-added services:
  – Premium services
  – Load balancing/highly available server farm
  – Firewall
  – Bandwidth control/traffic shaping
• Requires efficient SW/HW architecture and rule matching algorithms to reduce processing overhead.
• Content rule design/conflict detection are important and challenging.
• TCP delay binding can be improved.
References
• http://www.linuxvirtualserver.org/
• http://www.akamai.com/
• http://cs.uccs.edu/~chow/pub/contentsw/talk/contentswitching.ppt
• [Aron2000] Mohit Aron, “Differential and predictable QoS in web server systems,” Ph.D. dissertation, Rice University, Oct. 2000.
• [Zhang97] Lixia Zhang, Sally Floyd, and Van Jacobson, “Adaptive Web Caching,” April 25, 1997. http://www-nrg.ee.lbl.gov/floyd/web.html
• [Esi2001] Edge Side Includes, http://www.esi.org/.
• [Chow2001a] C. Edward Chow and Indira Semwal, “Web Load Balancing Through More Accurate Server Report,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Chow2001b] C. Edward Chow, Ganesh Godavari, and Jianhua Xie, “Content Switch Rules and their Conflict Detection,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Chow2001c] C. Edward Chow and Weihong Wang, “The Design and Implementation of Linux LVS-based Content Switch,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Aversa2000] Luis Aversa and Azer Bestavros, “Load Balancing a Cluster of Web Servers: Using Distributed Packet Rewriting,” Proceedings of IPCCC 2000.
• [Cao98] Pei Cao, Jin Zhang, and Kevin Beach, “Active Cache: Caching Dynamic Contents on the Web,” http://www.cs.wisc.edu/~cao/papers/active-cache.ps