8/7/2019 hawkeye [EDocFind.com] (1)
1/25
www.cs.wisc.edu/condor 1
HawkEye: A Monitoring and Management Tool for Distributed Systems
Todd Tannenbaum
Department of Computer Sciences
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor [email protected]
-
What does Condor have?
lots of core technology for building a distributed system
lots of core technology for monitoring the status of a machine
lots of core technology for managing a workload of tasks
lots of really, truly, skilled and experienced developers and researchers at building distributed systems. Some of the best. Standout state employees. Honest.
  Email for Wisconsin Gov. Scott McCallum: wisgov@gov.state.wi.us
-
One day an avid Condor user asked:
Say, could Condor Technology be used for distributed system administration??
-
Time to think
Gathered up our experiences with our own management tasks, looked at the mature Condor technology available to us, and the HawkEye effort was born.
Completely separate from Condor from the end user's perspective.
Can install HawkEye, or Condor, or both
-
First Component: MONITORING
Sysadmins first need information about what is happening on the machines they are responsible for.
  Both current and past
Information must be consolidated and easily accessible
Information must be dynamic
-
Condor ClassAds
Technology for an entity to describe itself
Simple attribute value pairs
[
  load_average = 1.3
  free_Swap_space_mb = 140
  number_of_processes = 92
  keyboard_idle_secs = 6
  ram = 128
  total_swap = 512
  total_memory = ram + total_swap
  busy = load_average > 1.0
]
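The last two attributes above are expressions over other attributes rather than fixed values. A minimal Python sketch of that idea follows. It is not the Condor ClassAd library: the dict layout, the `evaluate` function, and the rule that every string is treated as an expression are assumptions made for illustration, and only `+`, `-`, `<` and `>` are supported.

```python
import ast
import operator

# Hypothetical stand-in for a ClassAd: literal values plus expression strings.
AD = {
    "load_average": 1.3,
    "ram": 128,
    "total_swap": 512,
    "total_memory": "ram + total_swap",
    "busy": "load_average > 1.0",
}

# The only operators this sketch understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
}


def evaluate(ad, name):
    """Return the value of attribute `name`, evaluating expressions on demand."""
    value = ad[name]
    if not isinstance(value, str):
        return value  # plain literal
    return _eval_node(ad, ast.parse(value, mode="eval").body)


def _eval_node(ad, node):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # A reference to another attribute in the same ad, resolved recursively.
        return evaluate(ad, node.id)
    if isinstance(node, ast.BinOp):
        op = _OPS[type(node.op)]
        return op(_eval_node(ad, node.left), _eval_node(ad, node.right))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _OPS[type(node.ops[0])]
        return op(_eval_node(ad, node.left), _eval_node(ad, node.comparators[0]))
    raise ValueError("unsupported expression")
```

With this sketch, `evaluate(AD, "total_memory")` adds `ram` and `total_swap`, and `evaluate(AD, "busy")` compares the current `load_average` against 1.0 each time it is asked.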
-
Condor ClassAds, cont.
No fixed schema
Attributes can contain values or expressions
Serialize Ads in XML
Open source libraries in C++ and Java to:
  Manipulate Ads and Ad attributes
  Store Ads
  Query collections of Ads
Bindings for Perl and others on the way
-
[Diagram: a central HawkEye Manager surrounded by five HawkEye Monitoring Agents]
-
HawkEye Monitoring Agent
[Diagram labels: /proc, kstat; Hawkeye_Startup_Agent; Hawkeye_Monitor; HawkEye Monitoring Agent; HawkEye Manager; ClassAd Updates Via Secure UDP]
-
Monitor Agent, cont.
Updates are sent periodically
  Information does not get stale
Updates also serve as a heartbeat monitor
  Know when a machine is down
Out of the box, the update ClassAd has many attributes about the machine of interest for system administration
  Current Prototype = 184 attributes
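The heartbeat idea can be sketched in a few lines of Python: remember when each machine last sent an update, and treat a machine as down once no update has arrived within some timeout. The class name, method names, and timeout value are hypothetical, and this is an illustration of the technique, not HawkEye's implementation.

```python
import time


class HeartbeatTracker:
    """Track the last update time per machine and flag the silent ones."""

    def __init__(self, timeout_secs):
        self.timeout_secs = timeout_secs
        self.last_seen = {}  # machine name -> timestamp of latest update

    def record_update(self, machine, now=None):
        # Each periodic update doubles as a heartbeat.
        self.last_seen[machine] = time.time() if now is None else now

    def down_machines(self, now=None):
        # A machine is presumed down if its last update is older than the timeout.
        now = time.time() if now is None else now
        return sorted(
            machine
            for machine, seen in self.last_seen.items()
            if now - seen > self.timeout_secs
        )
```

The timeout would normally be set to a few update periods, so that a single lost UDP packet does not mark a healthy machine as down.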
-
What if I want to monitor something you didn't think about?
-
Custom Attributes
[Diagram labels: /proc, kstat; Hawkeye_Startup_Agent; Hawkeye_Monitor; HawkEye Monitoring Agent; HawkEye Manager; Data from hawkeye_update_attribute command line tool]
Create your own HawkEye plugins, or share plugins with others
-
Role of HawkEye Manager
Store all incoming ClassAds in an indexed resident data structure
  Fast response to client tool queries about current state
  "Show me all machines with a load average > 10"
Periodically store ClassAd attributes into a Round Robin Database
  Store information over time
  "Show me a graph with the load average for this machine over the past week"
Speak to clients via CEDAR, HTTP
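The two storage roles can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `ManagerStore` and its methods are hypothetical names, the query is a linear scan rather than a real index, and a fixed-size `deque` stands in for a round robin database by overwriting the oldest sample once its slots are full.

```python
from collections import deque


class ManagerStore:
    """Keep the latest ad per machine plus a bounded history per attribute."""

    def __init__(self, history_slots):
        self.history_slots = history_slots
        self.current = {}  # machine name -> latest ad (a dict)
        self.history = {}  # (machine, attribute) -> fixed-size deque of samples

    def update(self, machine, ad):
        # An incoming update replaces the previous ad for that machine.
        self.current[machine] = ad

    def query(self, predicate):
        # Answer "current state" questions from memory.
        return sorted(m for m, ad in self.current.items() if predicate(ad))

    def archive(self, attribute):
        # Called periodically: append one sample per machine, dropping the oldest
        # sample automatically once history_slots samples are stored.
        for machine, ad in self.current.items():
            if attribute in ad:
                ring = self.history.setdefault(
                    (machine, attribute), deque(maxlen=self.history_slots)
                )
                ring.append(ad[attribute])
```

A query such as `store.query(lambda ad: ad.get("load_average", 0) > 10)` corresponds to the first example on the slide, and the contents of `store.history[(machine, "load_average")]` are what a load-over-time graph would be drawn from.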
-
Several different clients
Command-line, GUI, Web-based
-
But sysadmins also sometimes have to do work
Task: copy a new library onto the local disk of each machine.
  Just a script to copy via rcp/scp to every machine, or is it?
-
Running tasks on behalf of the sysadmin
Submit your sysadmin tasks to HawkEye
  Tasks are stored in a persistent queue by the Manager
  Tasks can leave the queue upon completion, or repeat after specified intervals
  Tasks can have complex interdependencies via DAGMan
  Records are kept on which task ran where
Sounds like Condor, eh?
  Yes, but simpler
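The queue behaviour described above (one-shot tasks leave, repeating tasks come back, a record is kept of what ran where) can be sketched in Python. All names here are hypothetical, the queue lives only in memory, and persistence and DAGMan-style dependencies are left out, so this is an illustration of the semantics and not HawkEye's design.

```python
import heapq


class TaskQueue:
    """In-memory sketch of a task queue with optional repeat intervals."""

    def __init__(self):
        self._heap = []  # entries: (next_run_time, sequence, name, repeat_secs)
        self._seq = 0    # tie-breaker so equal run times keep submission order
        self.log = []    # records of (scheduled time, task name, machine)

    def submit(self, name, run_at, repeat_secs=None):
        heapq.heappush(self._heap, (run_at, self._seq, name, repeat_secs))
        self._seq += 1

    def run_due(self, now, machine):
        # "Run" every task whose time has come; here running just means logging.
        while self._heap and self._heap[0][0] <= now:
            run_at, _, name, repeat_secs = heapq.heappop(self._heap)
            self.log.append((run_at, name, machine))
            # Repeating tasks go back into the queue; one-shot tasks leave it.
            if repeat_secs is not None and repeat_secs > 0:
                self.submit(name, run_at + repeat_secs, repeat_secs)
```

For example, submitting a one-shot `copy_lib` task and a `check_disk` task that repeats every 10 seconds, then calling `run_due(25, "node1")`, logs `copy_lib` once and `check_disk` for times 0, 10 and 20, leaving the next `check_disk` run queued.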
-
Run Tasks in response to monitoring information
ClassAd Requirements Attribute
Example: Send email if a machine is low on disk space or low on swap space
  Submit an email task with an attribute: Requirements = free_disk < 5 || free_swap < 5
Example w/ task interdependency: If load average is high and OS=Linux and console is Idle, submit a task which runs top; if top sees Netscape, submit a task to kill Netscape
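As a rough Python illustration of the first example, the Requirements expression can be read as a predicate that is checked against each machine's ad, with the task running only where it holds. The function names and the sample ads are hypothetical, and the slide does not state the units of `free_disk` and `free_swap`, so none are assumed here.

```python
def requirements(ad):
    # Mirrors the slide's expression: free_disk < 5 || free_swap < 5
    return ad["free_disk"] < 5 or ad["free_swap"] < 5


def machines_to_notify(ads):
    """Return the machines whose ads satisfy the task's Requirements."""
    return sorted(name for name, ad in ads.items() if requirements(ad))


# Hypothetical ads, as the Manager might hold them after recent updates.
SAMPLE_ADS = {
    "node1": {"free_disk": 40, "free_swap": 300},
    "node2": {"free_disk": 3, "free_swap": 300},
    "node3": {"free_disk": 40, "free_swap": 2},
}
```

Here `machines_to_notify(SAMPLE_ADS)` selects `node2` (low on disk) and `node3` (low on swap), which are the machines the email task would run for.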
-
Current Status
Just beginning this project
Initial release early summer
Prototypes already running
Stop in and see initial HawkEye work
  Rm 3385 on Weds 9am-12pm
-
Thank you!
I was an overworked sysadmin. Now I have more free time thanks to HawkEye!