hawkeye [edocfind.com] (1)

Upload: kritikagrover3743

Post on 08-Apr-2018


TRANSCRIPT

  • 8/7/2019 hawkeye [EDocFind.com] (1)

    1/25

    www.cs.wisc.edu/condor 1

    HawkEye
    A Monitoring and Management Tool for Distributed Systems

    Todd Tannenbaum
    Department of Computer Sciences
    University of Wisconsin-Madison
    http://www.cs.wisc.edu/condor [email protected]

    5/25

    What does Condor have?

    Lots of core technology for building a distributed system
    Lots of core technology for monitoring the status of a machine
    Lots of core technology for managing a workload of tasks
    Lots of really, truly, skilled and experienced developers and researchers at building distributed systems. Some of the best. Standout state employees. Honest.
    Email for Wisconsin Gov. Scott McCallum: wisgov@gov.state.wi.us

    7/25

    One day an avid Condor user asked:

    Say, could Condor technology be used for distributed system administration?

    8/25

    Time to think

    We gathered up our experiences with our own management tasks, looked at the mature Condor technology available to us, and the HawkEye effort was born.

    Completely separate from Condor from the end user's perspective.

    Can install HawkEye, or Condor, or both

    9/25

    First Component: MONITORING

    Sysadmins first need information about what is happening on the machines they are responsible for.
        Both current and past
    Information must be consolidated and easily accessible
    Information must be dynamic

    10/25

    Condor ClassAds

    Technology for an entity to describe itself
    Simple attribute-value pairs

    [
        load_average = 1.3
        free_Swap_space_mb = 140
        number_of_processes = 92
        keyboard_idle_secs = 6
        ram = 128
        total_swap = 512
        total_memory = ram + total_swap
        busy = load_average > 1.0
    ]
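
The ad above mixes plain values with expressions that refer to other attributes. As a rough illustration only, the same idea can be sketched in Python with a dictionary whose entries are either values or callables. The `evaluate` helper and the dictionary layout are assumptions made for this sketch; they are not the Condor ClassAd library or its syntax.

```python
# Hypothetical stand-in for a ClassAd: plain values plus expressions
# (callables) that are evaluated against the ad at lookup time.
# This is NOT the Condor ClassAd library API.

def evaluate(ad, name):
    """Return attribute `name`, evaluating it first if it is an expression."""
    value = ad[name]
    return value(ad) if callable(value) else value


machine_ad = {
    "load_average": 1.3,
    "free_Swap_space_mb": 140,
    "number_of_processes": 92,
    "keyboard_idle_secs": 6,
    "ram": 128,
    "total_swap": 512,
    # Expressions refer to other attributes of the same ad.
    "total_memory": lambda ad: evaluate(ad, "ram") + evaluate(ad, "total_swap"),
    "busy": lambda ad: evaluate(ad, "load_average") > 1.0,
}

print(evaluate(machine_ad, "total_memory"))  # 640
print(evaluate(machine_ad, "busy"))          # True
```

Because expressions are evaluated at lookup time, changing `load_average` in the dictionary changes the result of `busy` on the next lookup.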

    11/25

    Condor ClassAds, cont.

    No fixed schema
    Attributes can contain values or expressions
    Serialize Ads in XML
    Open source libraries in C++ and Java to:
        Manipulate Ads and Ad attributes
        Store Ads
        Query collections of Ads
    Bindings for Perl and others on the way

    13/25

    [Diagram: one HawkEye Manager and five HawkEye Monitoring Agents]

    14/25

    [Diagram: HawkEye Monitoring Agent (Hawkeye_Startup_Agent, Hawkeye_Monitor; data from /proc, kstat) sending ClassAd updates via secure UDP to the HawkEye Manager]

    15/25

    Monitor Agent, cont.

    Updates are sent periodically
        Information does not get stale
    Updates also serve as a heartbeat monitor
        Know when a machine is down
    Out of the box, the update ClassAd has many attributes about the machine of interest for system administration
        Current prototype = 184 attributes
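
The heartbeat idea above can be sketched in a few lines of Python: remember when each machine last sent an update, and treat a machine as down once its last update is older than some timeout. The class name, the timeout value, and the machine names are assumptions for illustration; the slides do not say how HawkEye implements this.

```python
# Minimal heartbeat sketch (hypothetical names; not HawkEye's implementation).

class HeartbeatTracker:
    def __init__(self, timeout_secs):
        self.timeout_secs = timeout_secs
        self.last_seen = {}  # machine name -> time of last update (seconds)

    def record_update(self, machine, now):
        """Note that an update from `machine` arrived at time `now`."""
        self.last_seen[machine] = now

    def machines_down(self, now):
        """Machines whose last update is older than the timeout."""
        return sorted(
            machine
            for machine, seen in self.last_seen.items()
            if now - seen > self.timeout_secs
        )


tracker = HeartbeatTracker(timeout_secs=300)
tracker.record_update("node1", now=0)
tracker.record_update("node2", now=0)
tracker.record_update("node1", now=240)   # node2 stays silent
print(tracker.machines_down(now=400))     # ['node2']
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would read the clock itself.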

    16/25

    What if I want to monitor something you didn't think about?

    17/25

    Custom Attributes

    [Diagram: HawkEye Monitoring Agent (Hawkeye_Startup_Agent, Hawkeye_Monitor; data from /proc, kstat), with data from the hawkeye_update_attribute command line tool as an additional input, connected to the HawkEye Manager]

    Create your own HawkEye plugins, or share plugins with others
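
To make the plugin idea concrete, here is a hedged Python sketch of a script that measures something locally and prints it as attribute-value pairs in the style of the ClassAd slide. The slides do not describe the plugin interface or the arguments of hawkeye_update_attribute, so the attribute names, the output format, and the function names below are all assumptions.

```python
# Hypothetical custom-attribute script. The output format (one
# "name = value" pair per line) is an assumption modeled on the ClassAd
# example; it is not a documented HawkEye plugin interface.
import shutil


def format_attributes(attributes):
    """Render a dict as 'name = value' lines, quoting string values."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, str):
            value = '"' + value + '"'
        lines.append(f"{name} = {value}")
    return "\n".join(lines)


def disk_attributes(path="."):
    """Measure disk usage for `path` and return made-up attribute names."""
    usage = shutil.disk_usage(path)
    mb = 1024 * 1024
    return {
        "custom_disk_total_mb": usage.total // mb,
        "custom_disk_free_mb": usage.free // mb,
    }


print(format_attributes(disk_attributes()))
```

How such output would be handed to the agent is left open here, since the transcript only names the command line tool without showing its usage.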

    18/25

    Role of HawkEye Manager

    Store all incoming ClassAds in an indexed resident data structure
        Fast response to client tool queries about current state
        "Show me all machines with a load average > 10"
    Periodically store ClassAd attributes into a Round Robin Database
        Store information over time
        "Show me a graph with the load average for this machine over the past week"
    Speak to clients via CEDAR, HTTP
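
The two storage roles described above can be sketched together in Python: a dictionary keyed by machine name holds the latest ad for current-state queries, and a fixed-size ring per attribute keeps a bounded history. This is a simplified illustration with hypothetical names; a real round robin database also consolidates samples over time, which the `deque` below does not attempt.

```python
# Sketch of a manager-side store (hypothetical; not HawkEye's code).
from collections import deque


class ManagerStore:
    def __init__(self, history_slots=4):
        self.current = {}    # machine name -> latest ad (a dict)
        self.history = {}    # (machine, attribute) -> bounded ring of samples
        self.history_slots = history_slots

    def update(self, machine, ad):
        """Replace the stored ad for `machine` with the incoming one."""
        self.current[machine] = dict(ad)

    def query(self, predicate):
        """Names of machines whose current ad satisfies `predicate`."""
        return sorted(m for m, ad in self.current.items() if predicate(ad))

    def archive(self, attribute):
        """Append the current value of `attribute` to each machine's ring."""
        for machine, ad in self.current.items():
            if attribute in ad:
                ring = self.history.setdefault(
                    (machine, attribute), deque(maxlen=self.history_slots)
                )
                ring.append(ad[attribute])  # oldest sample drops off when full

    def series(self, machine, attribute):
        """Stored history for one attribute of one machine, oldest first."""
        return list(self.history.get((machine, attribute), ()))


store = ManagerStore(history_slots=3)
for load in [2.0, 9.0, 12.5, 11.0]:
    store.update("node1", {"load_average": load})
    store.update("node2", {"load_average": 0.5})
    store.archive("load_average")

print(store.query(lambda ad: ad["load_average"] > 10))  # ['node1']
print(store.series("node1", "load_average"))            # [9.0, 12.5, 11.0]
```

The first call answers the "all machines with a load average > 10" style of query from current state; the second returns the data a client would plot for a graph over time.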

    19/25

    Several different clients: command-line, GUI, Web-based

    20/25

    But sysadmins also sometimes have to do work

    Task: copy a new library onto the local disk of each machine.
    Just a script to copy via rcp/scp to every machine... or is it?

    21/25

    Running tasks on behalf of the sysadmin

    Submit your sysadmin tasks to HawkEye
    Tasks are stored in a persistent queue by the Manager
        Tasks can leave the queue upon completion, or repeat after specified intervals
        Tasks can have complex interdependencies via DAGMan
        Records are kept on which task ran where
    Sounds like Condor, eh? Yes, but simpler
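
The queue behavior listed above (one-shot tasks leave, repeating tasks come back, and a record is kept of what ran where) can be sketched with a small in-memory scheduler. Everything here is an assumption for illustration: the class and task names are made up, time is passed in explicitly, and persistence and DAGMan-style dependencies are left out.

```python
# In-memory task queue sketch (hypothetical; no persistence, no DAGMan).
import heapq


class TaskQueue:
    def __init__(self):
        self._heap = []     # entries: (next_run_time, sequence, name, repeat_every)
        self._seq = 0       # tie-breaker so equal times run in submit order
        self.records = []   # (run_time, task name, machine)

    def submit(self, name, start_at=0, repeat_every=None):
        if repeat_every is not None and repeat_every <= 0:
            raise ValueError("repeat_every must be positive")
        heapq.heappush(self._heap, (start_at, self._seq, name, repeat_every))
        self._seq += 1

    def run_due(self, now, machine):
        """Run every task due at or before `now`, recording where it ran."""
        while self._heap and self._heap[0][0] <= now:
            run_at, _, name, repeat_every = heapq.heappop(self._heap)
            self.records.append((run_at, name, machine))
            if repeat_every is not None:
                # Repeating tasks re-enter the queue; one-shot tasks leave it.
                self.submit(name, run_at + repeat_every, repeat_every)

    def pending(self):
        return sorted(name for _, _, name, _ in self._heap)


queue = TaskQueue()
queue.submit("copy_library")                  # runs once
queue.submit("rotate_logs", repeat_every=60)  # repeats every 60 seconds
queue.run_due(now=0, machine="node1")
queue.run_due(now=60, machine="node1")
print(queue.records)
print(queue.pending())  # ['rotate_logs']
```

"Running" a task is reduced to appending a record; a real system would dispatch the work to the machine and record the outcome.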

    22/25

    Run Tasks in response to monitoring information

    ClassAd Requirements attribute

    Example: Send email if a machine is low on disk space or low on swap space
        Submit an email task with an attribute:
        Requirements = free_disk < 5 || free_swap < 5

    Example w/ task interdependency: If load average is high and OS=Linux and console is idle, submit a task which runs top; if top sees Netscape, submit a task to kill Netscape
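
The Requirements example above amounts to a boolean expression evaluated against each machine's ad. A minimal Python sketch, assuming the ads are plain dictionaries and writing the requirement as an ordinary function rather than ClassAd syntax, might look like this. The machine names and values are invented, and the units of `free_disk` and `free_swap` are not stated in the slides.

```python
# Matching a Requirements-style predicate against machine ads.
# Hypothetical data and helper names; not the ClassAd matchmaking API.

def low_on_space(ad):
    # Mirrors: Requirements = free_disk < 5 || free_swap < 5
    return ad["free_disk"] < 5 or ad["free_swap"] < 5


def matching_machines(ads, requirements):
    """Names of machines whose ad satisfies the requirements predicate."""
    return sorted(name for name, ad in ads.items() if requirements(ad))


ads = {
    "node1": {"free_disk": 40, "free_swap": 3},
    "node2": {"free_disk": 120, "free_swap": 64},
    "node3": {"free_disk": 2, "free_swap": 80},
}

for machine in matching_machines(ads, low_on_space):
    # Stand-in for the email task from the example above.
    print(f"would send email: {machine} is low on disk or swap")
```

With this data the loop reports node1 (low swap) and node3 (low disk), while node2 does not match.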

    24/25

    Current Status

    Just beginning this project
    Initial release early summer
    Prototypes already running
    Stop in and see initial HawkEye work: Rm 3385 on Weds, 9am to 12pm

    25/25

    Thank you!

    I was an overworked sysadmin. Now I have more free time thanks to HawkEye!