a generic approach to job tracking for distributed computing: the star approach

15
A generic approach to job tracking for distributed computing: the STAR approach by V.Fine, J.Lauret, L.Hajdu

Upload: yitro

Post on 07-Jan-2016

38 views

Category:

Documents


0 download

DESCRIPTION

A generic approach to job tracking for distributed computing: the STAR approach. by V.Fine, J.Lauret, L.Hajdu. Introduction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A generic approach to job tracking for distributed computing: the STAR approach

A generic approach to job tracking for distributed computing: the STAR

approach 

by

V.Fine, J.Lauret, L.Hajdu

Page 2: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

2

Introduction

Job tracking, i.e. monitoring bundle of jobs or individual job behavior from submission to completion, is becoming very complicated in the heterogeneous Grid environment.

This talk presents the principles of an integrating tracking solution based on components already deployed at STAR, none of which are experiment specific: a Generic logging layer and the STAR Unified Meta-Scheduler (SUMS).

Page 3: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

3

Logging overview• Most HENP experiment software includes a

logging or tracing API allowing the display it a particular format of important feedback coming from the core application.

• Typical so-called “HENP production” jobs initiate a chain of events where many components involved are often written in diverse languages, which do not offer a consistent and easily adaptable interface for logging important events

Page 4: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

4

STAR generic logging layer• The STAR Grid production environment contains a “generic

logging layer”, built on top of a “logger” family derived from the “Jakarta log4j” project. That provides us the consistency across packages and frameworks.

• “Jakarta log4j” is a community-supported project that includes the log4cxx, log4c, and log4perl packages (thereby supporting multiple computing languages) and encompassing four main component types:– loggers– appenders– layouts– filters

• These four types are necessary to empower the developers with the capabilities to log messages according to message type and level, to control at runtime how these messages are formatted, and lastly to decide where they are reported.

Page 5: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

5

STAR generic logging layer (cont.)

• STAR logger layer allows us to set independently the level of the verbosity for each STAR production “module”. This is convenient for the debugging of the code as a whole as modules and classes are uniquely identified by their message format. In addition, each developer can set the message output levels for their own modules.

Page 6: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

6

Generic logging layer and the STAR Unified Meta-Scheduler

SUMS• SUMS is a "generic" gateway to user batch-mode

analysis and allows the user to describe tasks based on an abstract job description language (SUMS’s architecture was designed around a module plug-and-play philosophy and is therefore not experiment specific).

(More on SUMS: see talk: “Meta-configuration for dynamic resource brokering: the SUMS approach” )

Page 7: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

7

SUMS job unique ID

• The logger layer allows us utilizes a unique ID generated by SUMS for each task it handles and the set of jobs it creates; The ID is used for creating and updating Job records in the administrative database along with other vital job related information. Our approach does not require users to introduce any additional key to identify and associate the job with the database tables as the tree structure of information is handled automatically.

Page 8: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

8

loggerxml

SUMSxml

SUMS

log4j

JAVA

User’s meta-job

Logger configuration

File catalog

log4perl

GRID realmGRID realm

Job_1

Job_2

Job_N

. . .

Meta-log

proxySUMS assigns a unique tag for

each job

Using the unique job id tag assigned by SUMS for the job tracking

C++

job_3

log4cxx

Job tracking

Page 9: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

9

SUMS+log4j = powerful tracking• To exploit the SUMS + log4cxx capability one

needs to make one step further by complementing the system with the dedicated logger and layout for each language the GRID infrastructure employs.

• MySQLTrackingAppender – – establish a connection with the mysql Db (or its

proxy) – execute the command provided and formated by the

logger layout component.“Appender” has no idea what concrete command it is

going to send to the MySQL server.

Page 10: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

10

SQL tracking “appender” + Tracking Layout = flexible Db

interface• It is not mandatory to use

MySQLTrackingAppender to direct the messages to MySQL Db. One can use th standard MySQLAppender from log4j kit.

• It is not mandatory to this appender with JobTrackingLayout layout either.

• But together they create a powerful tool to track the job submitted by SUMS

Page 11: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

11

JobTrackingLayout• Special layout, class derived from the “PatternLayout“

class• Doesn’t conflict with user’s will. The user is still free to

format the message the way he/she thinks fit his/her current need.

• On the other hands the SysAdmin, production manager, SUMS are allowed to add their own job logger configuration components to the job configuration file if any

“JobTrackingLayout” can find and expand the $XXX like string to the $XXX run-time env. variable that can be present in the original user’s message to add the run-time env information to the MySQL command

Page 12: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

12

DEBUG

WANRING

INFO

FATAL

Mixing the program messages with the external parameters

User’s code

Logger<<“USER’S MESSAGE”

Logger threshold

appender custom filters

layout

Joblog4j.xml

Output message

Job environment

consoleApprender mySQLApprender

SUMS

. . . . . . . .

selection patterns

Format rules

id

SUMS attaches a unique tag for each

real job

Page 13: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

13

XML configuration example:<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE configuration>

<configuration xmlns='http://logging.apache.org/'> <appender name="stdout" class="org.apache.log4j.ConsoleAppender"> <layout class="org.apache.log4j.PatternLayout"> <param name="ConversionPattern" value="%-3c{2}:%-5p - %m%n"/> </layout> </appender> <appender name="MYSQL" class="org.apache.log4j.MySQLTrackingAppender"> <layout class="org.apache.log4j.JobTrackingPatternLayout"> <param name="ConversionPattern" value="&quot;%-3c{2}&quot;)” > </layout> </appender> <root> <priority value ="FATAL" /> <appender-ref ref="stdout" /> </root> <logger name="QA"> <priority value ="DEBUG" /> <appender-ref ref="MYSQL" /> </logger></configuration>

Page 14: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

14

MySQL job tracking tables relation

Page 15: A generic approach to job tracking for distributed computing: the STAR approach

CHEP 2006 V.Fine, J.Lauret, L.Hajdu

Job Tracking for Distributed Computing: the STAR Approach

15

Conclusion

The propose approach of representing (sets of) GRID jobs submitted via SUMS system in a database does make easy to implement management, scheduling, and query operations as the user may list all previous jobs and get the details of status, time submitted, started, finished, etc…

However to be able to track jobs through GRID realm the GRID components have to equipped with the logger layer STAR alike.