13- clustering and load balancing

Upload: bindhu-pandu

Post on 06-Apr-2018

  • 8/2/2019 13- Clustering and Load Balancing

    1/33

    Clustering and Load Balancing

    Outline

    Introduction

    Linux Virtual Server

    Microsoft load balancing solution

    Introduction

    Explosive Growth of the Internet

    100% annual growth rate

    Sites receiving unprecedented workload:

    Yahoo!: 625 million views per day

    AOL Web cache system: 5 billion requests per day

    Introduction

    Load balancing is a technique to spread work between many computers, processes, disks or other resources in order to get optimal resource utilization and decrease computing time.

    A load balancer can be used to increase the capacity of a server farm beyond that of a single server.

    It can also allow the service to continue even in the face of server downtime due to server failure or server maintenance.

    A load balancer consists of a virtual server (also referred to as vserver or VIP) which, in turn, consists of an IP address and port.

    The virtual server is bound to a number of physical services running on the physical servers in a server farm.

    A client sends a request to the virtual server, which in turn selects a physical server in the server farm and directs this request to the selected physical server.
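
The relationship between a virtual server and its bound physical services can be sketched in a few lines of Python. This is only an illustration: the class names, the addresses, and the choice of round robin as the selection rule are assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class PhysicalService:
    """A service running on one physical server (addresses are hypothetical)."""
    host: str
    port: int


class VirtualServer:
    """A VIP (IP address + port) bound to a set of physical services."""

    def __init__(self, vip: str, port: int, services: list[PhysicalService]):
        if not services:
            raise ValueError("a virtual server needs at least one bound service")
        self.vip = vip
        self.port = port
        # Round robin is assumed here purely to keep the sketch short.
        self._rotation = cycle(services)

    def dispatch(self, request: str) -> tuple[PhysicalService, str]:
        """Select a physical service and direct the request to it."""
        target = next(self._rotation)
        return target, request


vserver = VirtualServer("203.0.113.10", 80, [
    PhysicalService("10.0.0.1", 8080),
    PhysicalService("10.0.0.2", 8080),
])
chosen = [vserver.dispatch(f"GET /page{i}")[0].host for i in range(4)]
print(chosen)  # ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

The client only ever sees the VIP; which physical service answered is hidden behind `dispatch`.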

    Introduction (cont.)

    Different virtual servers can be configured for different sets of physical services, such as TCP and UDP services in general.

    Application-specific virtual servers may exist to support HTTP, FTP, SSL, DNS, etc.

    The load balancing methods manage the selection of an appropriate physical server in a server farm.

    Persistence can be configured on a virtual server; once a server is selected, subsequent requests from the client are directed to the same server.

    Persistence is sometimes necessary in applications where client state is maintained on the server, but the use of persistence can cause problems in failure and other situations.

    A more common method of managing persistence is to store state information in a shared database, which can be accessed by all real servers, and to link this information to a client with a small token such as a cookie, which is sent in every client request.
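
The shared-state approach can be illustrated with a small Python sketch. An in-memory dictionary stands in for the shared database, and the function name, the cart example, and the server names are all hypothetical.

```python
import secrets
from typing import Optional

# Stand-in for the shared database that every real server can reach.
SESSION_STORE: dict[str, dict] = {}


def handle_request(server_name: str, cookie: Optional[str], item: str) -> tuple[str, list[str]]:
    """Handle a request on any real server and return (cookie, cart contents)."""
    if cookie is None or cookie not in SESSION_STORE:
        cookie = secrets.token_hex(8)  # small token handed back to the client
        SESSION_STORE[cookie] = {"cart": [], "served_by": []}
    state = SESSION_STORE[cookie]
    state["cart"].append(item)
    state["served_by"].append(server_name)
    return cookie, list(state["cart"])


# The two requests land on different servers, yet the state follows the token.
token, cart = handle_request("real-server-1", None, "book")
token, cart = handle_request("real-server-2", token, "pen")
print(cart)  # ['book', 'pen']
```

Because the state lives outside the real servers, the load balancer is free to pick a different server for every request, and a server failure does not lose the session.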

    Introduction (cont.)

    Load balancers also perform server monitoring of services in a web server farm.

    In case of failure of a service, the load balancer continues to perform load balancing across the remaining services that are UP.

    In case of failure of all the servers bound to a virtual server, requests may be sent to a backup virtual server (if configured) or optionally redirected to a configured URL.

    In Global Server Load Balancing (GSLB) the load balancer distributes load to a geographically distributed set of server farms based on health, server load or proximity.
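
The failover order described above (remaining services that are UP, then a backup virtual server, then a redirect URL) might look like the following Python sketch. The function, the service names, and the health probe are hypothetical; a real health check would typically open a connection to the service.

```python
from typing import Callable, Optional


def pick_target(
    services: list[str],
    is_up: Callable[[str], bool],
    backup_services: Optional[list[str]] = None,
    redirect_url: Optional[str] = None,
) -> tuple[str, str]:
    """Return (action, target), where action is 'forward', 'backup' or 'redirect'."""
    healthy = [s for s in services if is_up(s)]
    if healthy:
        # A real balancer would apply its load balancing method to `healthy`.
        return "forward", healthy[0]
    healthy_backup = [s for s in (backup_services or []) if is_up(s)]
    if healthy_backup:
        return "backup", healthy_backup[0]
    if redirect_url is not None:
        return "redirect", redirect_url
    raise RuntimeError("no service is UP and no fallback is configured")


status = {"web1": False, "web2": True, "spare1": True}
probe = lambda name: status.get(name, False)  # hypothetical health check

first = pick_target(["web1", "web2"], probe, ["spare1"], "http://example.com/sorry")
status["web2"] = False
second = pick_target(["web1", "web2"], probe, ["spare1"], "http://example.com/sorry")
status["spare1"] = False
third = pick_target(["web1", "web2"], probe, ["spare1"], "http://example.com/sorry")
print(first, second, third)
```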

    Introduction (cont.)

    Load balancing methods:

    Least connections

    Round robin

    Least response time

    Least bandwidth

    Least packets

    URL hashing

    Domain name hashing

    Source IP address

    Destination IP address

    Source IP - destination

    Static proximity, used for GSLB
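
Two of the methods in this list, least connections and source IP address hashing, are sketched below in Python. The function names are hypothetical and the use of SHA-256 as the hash is an assumption; actual load balancers use their own hash functions.

```python
import hashlib


def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections (first one wins a tie)."""
    return min(active, key=active.get)


def source_ip_hash(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server so the same client always reaches the same server."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["web1", "web2", "web3"]
picked = least_connections({"web1": 12, "web2": 4, "web3": 9})
sticky_a = source_ip_hash("198.51.100.7", servers)
sticky_b = source_ip_hash("198.51.100.7", servers)
print(picked)                # web2
print(sticky_a == sticky_b)  # True
```

Least connections reacts to the current load, while hashing gives a stable mapping without keeping any per-client state on the balancer.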

    Web Server Load Balancing

    One major issue for large Internet sites is how to handle the load of the large number of visitors they get.

    This is routinely encountered as a scalability problem as a site grows.

    There are several ways to accomplish load balancing.

    For example, in Wikimedia, load is balanced as follows:

    Round robin DNS distributed page requests evenly to one of three Squid cache servers.

    Squid cache servers used response time measurements to distribute page requests between seven web servers.

    In addition, the Squid servers cached pages and delivered about 75% of all pages without ever asking a web server for help.

    The PHP scripts which run on the web servers distribute load to one of several database servers depending on the type of request, with updates going to a master database server and some database queries going to one or more slave database servers.
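
The last step, routing by request type, can be sketched as follows. This is a Python illustration rather than the PHP the site used; the function name and server names are hypothetical, and treating every SELECT as eligible for a slave (called a replica in the code) is a simplification of "some database queries".

```python
import random


def choose_database(sql: str, master: str, replicas: list[str]) -> str:
    """Send reads to a replica when one exists; send everything else to the master."""
    words = sql.split()
    is_read = bool(words) and words[0].upper() == "SELECT"
    if is_read and replicas:
        return random.choice(replicas)
    return master


replicas = ["db-replica-1", "db-replica-2"]
write_target = choose_database("UPDATE page SET views = views + 1", "db-master", replicas)
read_target = choose_database("select title from page", "db-master", replicas)
print(write_target)  # db-master
```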

    Server Load Balancing and redundancy

    Alternative methods include the use of a Layer 4 router.

    Linux Virtual Server, which is an advanced open source load balancing solution for network services.

    Network Load Balancing Services, which is Microsoft's load balancing solution for network services on Windows Server.

    Many sites are turning to the multi-homed scenario: having multiple connections to the Internet via multiple providers to provide a reliable and high-throughput service.

    Linux Virtual Server

    Started in 1998, the Linux Virtual Server (LVS) project combines multiple physical servers into one virtual server, eliminating single points of failure (SPOF).

    Built with off-the-shelf components, LVS is already in use in some of the highest-trafficked sites on the Web.

    Requirements for LVS:

    The service must scale: when the service workload increases, the system must scale up to meet the requirements.

    The service must always be on and available, despite transient partial hardware and software failures.

    The system must be cost-effective: the whole system must be economical to build and expand.

    Although the whole system may be big in physical size, it should be easy to manage.

    LVS

    In LVS, a cluster of Linux servers appears as a single (virtual) server on a single IP address.

    Client applications interact with the cluster as if it were a single, high-performance, and highly-available server.

    Inside the virtual server, LVS directs incoming network connections to the different servers according to scheduling algorithms.

    Scalability is achieved by transparently adding or removing nodes in the cluster.

    High availability is provided by detecting node or daemon failures and reconfiguring the system accordingly, on-the-fly.

    For transparency, scalability, availability and manageability, LVS is designed around a three-tier architecture, as illustrated in the next figure.

    LVS architecture

    The load balancer, servers, and shared storage are usually connected by a high-speed network, such as 100 Mbps Ethernet or Gigabit Ethernet, so that the intranetwork does not become a bottleneck of the system as the cluster grows.

    IPVS

    IPVS modifies the TCP/IP stack inside the Linux kernel to support IP load balancing technologies.

    Three ways to balance load

    IPVS supports the following three ways to balance load:

    Virtual Server via NAT (VS/NAT)

    Virtual Server via Tunneling (VS/TUN)

    Virtual Server via Direct Routing (VS/DR)

    Virtual Server via NAT (VS/NAT)

    VS/NAT Workflow

    1. When a user accesses a virtual service provided by the server cluster, a request packet destined for the virtual IP address (the IP address to accept requests for virtual service) arrives at the load balancer.

    2. The load balancer examines the packet's destination address and port number. If they match a virtual service in the virtual server rule table, a real server is selected from the cluster by a scheduling algorithm and the connection is added to the hash table that records connections. Then, the destination address and the port of the packet are rewritten to those of the selected server, and the packet is forwarded to the server. When an incoming packet belongs to an established connection, the connection can be found in the hash table and the packet is rewritten and forwarded to the right server.

    3. The request is processed by one of the physical servers.

    4. When response packets come back, the load balancer rewrites the source address and port of the packets to those of the virtual service. When a connection terminates or times out, the connection record is removed from the hash table.

    5. A reply is sent back to the user.
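
The workflow above can be modelled in Python to show where the connection hash table and the two rewrites fit. This is a toy model, not IPVS code: the class and field names and all addresses are hypothetical, round robin stands in for whichever scheduling algorithm is configured, and timeouts are reduced to an explicit `close` call.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NatBalancer:
    def __init__(self, vip: str, vport: int, real_servers: list[tuple[str, int]]):
        self.vip, self.vport = vip, vport
        self._scheduler = cycle(real_servers)  # round robin assumed for brevity
        # Connection hash table: client (ip, port) -> real server (ip, port).
        self.connections: dict[tuple[str, int], tuple[str, int]] = {}

    def inbound(self, pkt: Packet) -> Packet:
        """Step 2: rewrite the destination of a request packet."""
        if (pkt.dst_ip, pkt.dst_port) != (self.vip, self.vport):
            raise ValueError("packet does not match a virtual service")
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.connections:  # new connection: schedule and record it
            self.connections[key] = next(self._scheduler)
        real_ip, real_port = self.connections[key]
        return replace(pkt, dst_ip=real_ip, dst_port=real_port)

    def outbound(self, pkt: Packet) -> Packet:
        """Step 4: rewrite the source of a response packet to the virtual service."""
        return replace(pkt, src_ip=self.vip, src_port=self.vport)

    def close(self, client_ip: str, client_port: int) -> None:
        """Remove the record when the connection terminates or times out."""
        self.connections.pop((client_ip, client_port), None)


lb = NatBalancer("203.0.113.10", 80, [("10.0.0.1", 8000), ("10.0.0.2", 8000)])
request = Packet("198.51.100.7", 3456, "203.0.113.10", 80)
to_server = lb.inbound(request)   # destination becomes 10.0.0.1:8000
again = lb.inbound(request)       # same connection, same real server
reply = Packet(to_server.dst_ip, to_server.dst_port, request.src_ip, request.src_port)
to_client = lb.outbound(reply)    # source becomes 203.0.113.10:80
lb.close(request.src_ip, request.src_port)
```

Both directions pass through the balancer, which is why its rewriting capacity bounds the size of a VS/NAT cluster.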

    An example of Virtual Server via NAT

    Packet rewriting flow

    The incoming packet for web service:

    The load balancer will choose a real server, rewrite the packet, and forward it to that server:

    Replies get back to the load balancer:

    The packet is rewritten and forwarded back to the client:

    VS-NAT advantages and disadvantages

    Advantages:

    Real servers can run any OS that supports TCP/IP

    Only one IP address is needed for the load balancer; real servers can use private IP addresses

    Disadvantages:

    The maximum number of server nodes is limited, because both request and response packets are rewritten by the load balancer. When the number of server nodes increases to around 20, the load balancer will probably become a new bottleneck

    Virtual Server via IP Tunneling (VS/TUN)

    IP tunneling (also called IP encapsulation) is a technique to encapsulate IP datagrams within IP datagrams, which allows datagrams destined for one IP address to be wrapped and redirected to another IP address.

    This technique can also be used to build a virtual server: the load balancer tunnels the request packets to the different servers, the servers process the requests, and return the results to the clients directly. Thus, the service appears as a virtual service on a single IP address.
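
Encapsulation is easier to picture with a small model. The Python sketch below wraps one datagram inside another and unwraps it on the real server; the `Datagram` type, the helper names, and the addresses are hypothetical, and no real IP headers are built.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Datagram:
    src: str
    dst: str
    payload: Union[bytes, "Datagram"]


def encapsulate(inner: Datagram, tunnel_src: str, tunnel_dst: str) -> Datagram:
    """Wrap the original datagram, unchanged, inside a new one addressed to the real server."""
    return Datagram(src=tunnel_src, dst=tunnel_dst, payload=inner)


def decapsulate(outer: Datagram) -> Datagram:
    """Recover the original datagram on the real server."""
    if not isinstance(outer.payload, Datagram):
        raise ValueError("not an encapsulated datagram")
    return outer.payload


VIP = "203.0.113.10"
request = Datagram(src="198.51.100.7", dst=VIP, payload=b"GET /")

# Load balancer: tunnel the request to a real server, possibly on another network.
tunneled = encapsulate(request, tunnel_src="192.0.2.1", tunnel_dst="192.0.2.21")

# Real server: unwrap, process, and answer the client directly from the VIP.
inner = decapsulate(tunneled)
response = Datagram(src=inner.dst, dst=inner.src, payload=b"200 OK")
```

The inner datagram still carries the client address and the VIP, which is what lets the real server reply without going back through the load balancer.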

    VS/TUN architecture

    VS-TUN workflow

    VS-TUN advantages and disadvantages

    Advantages:

    Real servers send response packets to clients directly, which can follow different network routes

    Real servers can be in different networks, LAN/WAN

    Greatly increases the scalability of the virtual server

    Disadvantages:

    Real servers must support the IP tunneling protocol

    Virtual Server via Direct Routing (VS/DR)

    The load balancer and the real servers must have one of their interfaces physically linked by an uninterrupted segment of LAN such as an Ethernet switch.

    The virtual IP address is shared by real servers and the load balancer.

    Each real server has a non-ARPing, loopback alias interface configured with the virtual IP address, and the load balancer has an interface configured with the virtual IP address to accept incoming packets.

    The workflow of VS/DR is similar to that of VS/NAT or VS/TUN. In VS/DR, the load balancer directly routes a packet to the selected server (the load balancer simply changes the MAC address of the data frame to that of the server and retransmits it on the LAN).

    When the server receives the forwarded packet, the server determines that the packet is for the address on its loopback alias interface, processes the request, and finally returns the result directly to the user.
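
The key point of direct routing, that only the frame's MAC address changes while the IP packet stays intact, can be shown with a short Python model. The `Frame` type, the helper functions, and the MAC and IP addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: bytes


def direct_route(frame: Frame, server_mac: str) -> Frame:
    """Load balancer: change only the destination MAC and retransmit on the LAN."""
    return replace(frame, dst_mac=server_mac)


def real_server_accepts(frame: Frame, own_mac: str, loopback_alias_ip: str) -> bool:
    """Real server: accept the frame if it is addressed to us and to the VIP on the loopback alias."""
    return frame.dst_mac == own_mac and frame.dst_ip == loopback_alias_ip


VIP = "203.0.113.10"
LB_MAC, SERVER_MAC = "00:00:5e:00:53:01", "00:00:5e:00:53:11"

incoming = Frame(dst_mac=LB_MAC, src_ip="198.51.100.7", dst_ip=VIP, payload=b"GET /")
forwarded = direct_route(incoming, SERVER_MAC)
accepted = real_server_accepts(forwarded, SERVER_MAC, VIP)
print(forwarded.dst_ip == VIP, accepted)  # True True
```

Since the destination IP is still the VIP, the real server must hold that address itself, and it must not answer ARP for it or it would compete with the load balancer.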

    VS/DR architecture

    VS-DR workflow

    VS-DR advantages and disadvantages

    Advantages:

    Real servers send response packets to clients directly, which can follow different network routes

    No tunneling overhead

    Disadvantages:

    Servers must have a non-ARP alias interface

    The load balancer and servers must have one of their interfaces in the same LAN segment

    Comparison

                      VS/NAT          VS/TUN        VS/DR
    Server            any             tunneling     non-ARP device
    Server network    private         LAN/WAN       LAN
    Server number     low (10~20)     high (100)    high (100)
    Server gateway    load balancer   own router    own router

    Note: those numbers are estimated based on the assumption that the load balancer and backend servers have the same hardware configuration.

    Scheduling algorithms

    Round-Robin

    Weighted Round-Robin

    Least-Connection

    Weighted Least-Connection
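
The weighted variants can be sketched in Python as below. These are simplified illustrations and not the kernel implementations: the weighted round robin here simply repeats each server according to its integer weight, and weighted least-connection is assumed to compare the ratio of active connections to weight. Function and server names are hypothetical.

```python
from itertools import cycle


def weighted_round_robin(weights: dict[str, int]):
    """Return an endless iterator that visits each server in proportion to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def weighted_least_connection(active: dict[str, int], weights: dict[str, int]) -> str:
    """Pick the server with the lowest connections-to-weight ratio."""
    candidates = [s for s in active if weights.get(s, 0) > 0]
    return min(candidates, key=lambda s: active[s] / weights[s])


wrr = weighted_round_robin({"a": 2, "b": 1})
first_six = [next(wrr) for _ in range(6)]
print(first_six)  # ['a', 'a', 'b', 'a', 'a', 'b']

# "a" has more connections but twice the weight, so it is still the better choice.
wlc_pick = weighted_least_connection({"a": 4, "b": 3}, {"a": 2, "b": 1})
print(wlc_pick)  # a
```

With all weights equal to 1, both functions reduce to the plain Round-Robin and Least-Connection behaviour.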

    The LocalNode feature

    In a virtual server of only a few nodes (2, 3 or more), it is a waste of resources if the load balancer is only used to direct packets.

    The LocalNode feature enables the load balancer not only to redirect packets, but also to process some packets locally.

    LVS cluster management software

    RedHat Cluster Server / Piranha

    LVS+Piranha Cluster Management tools.

    UltraMonkey: Open-Source Server Farm

    LVS+lvs-gui+heartbeat+ldirectord

    heartbeat+ldirectord

    heartbeat+mon

    ...

    Some sites using LVS

    UK National JANET Cache

    (wwwcache.ja.net)

    www.linux.com

    sourceforge.net

    One of the largest PC manufacturers

    www.netwalk.com

    References

    Wikipedia

    http://www.linux-vs.org