
Accounting in EGEE … and beyond

John Gordon and David Kant

CCLRC, e-Science Centre

Operations Workshop, Sept 2005 - 2

History

• EDG – EU DataGrid 2001-04
  • developed DGAS, a full economic scheduling and accounting package, in Italy
  • DGAS wasn't mature enough to be deployed by the end of EDG

• LCG – 2004-…
  • wanted resource reporting across the grid
  • commissioned APEL from RAL

• SweGrid developed SGAS for Swedish supercomputing


Types of Accounting

• Job Accounting AFTER the event (APEL domain)
  • Concept of a “job” as a unit of resource consumption
  • Determination of value after job execution
  • Job usage record as a complete description of resource consumption
  • Suitable for post-paid services

• Real-Time Accounting (DGAS, SGAS domain)
  • Incremental determination of resource value while the job is being executed
  • Incremental decrement of account balance
  • Can enforce user quotas
  • Suitable for pre-paid services
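The difference between the two models can be sketched in a few lines of Python. Everything here is hypothetical: the `UsageRecord` fields, the flat per-CPU-hour rate and the `PrepaidAccount` class are illustrations, not the APEL, DGAS or SGAS data models.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Complete description of one finished job (post-paid model)."""
    job_id: str
    vo: str
    cpu_hours: float


def post_paid_charge(record: UsageRecord, rate_per_cpu_hour: float) -> float:
    # Job accounting AFTER the event: the value is determined once the job has finished.
    return record.cpu_hours * rate_per_cpu_hour


class QuotaExceeded(Exception):
    pass


class PrepaidAccount:
    """Real-time model: the balance is decremented while the job is still running."""

    def __init__(self, balance: float):
        self.balance = balance

    def charge_increment(self, cpu_hours: float, rate_per_cpu_hour: float) -> None:
        cost = cpu_hours * rate_per_cpu_hour
        # A quota can be enforced because the check happens before the job finishes.
        if cost > self.balance:
            raise QuotaExceeded(f"need {cost}, balance is {self.balance}")
        self.balance -= cost


record = UsageRecord(job_id="job-1", vo="atlas", cpu_hours=3.0)
print(post_paid_charge(record, rate_per_cpu_hour=2.0))  # 6.0

account = PrepaidAccount(balance=5.0)
account.charge_increment(1.0, 2.0)  # balance 3.0
account.charge_increment(1.0, 2.0)  # balance 1.0
try:
    account.charge_increment(1.0, 2.0)  # would cost 2.0, only 1.0 left
except QuotaExceeded:
    print("job stopped: quota exhausted")
```

The post-paid function can only run once the record is complete, whereas the prepaid account refuses further work as soon as the balance would go negative, which is what makes quota enforcement possible.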


APEL, Job Accounting Flow Diagram

[1] Build Job Accounting Records at site.

[2] Send Job Records to a central repository

[3] Data Aggregation
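The three steps of the flow can be mimicked with plain Python data structures. This is a minimal sketch: the site names, record fields and the in-memory "repository" are hypothetical stand-ins for the per-site databases and the central repository.

```python
from collections import defaultdict

# Step [1]: each site builds its own job records (hypothetical sites and fields).
site_records = {
    "SITE-A": [{"job_id": "a1", "vo": "atlas", "cpu_hours": 2.0},
               {"job_id": "a2", "vo": "cms", "cpu_hours": 1.5}],
    "SITE-B": [{"job_id": "b1", "vo": "atlas", "cpu_hours": 4.0}],
}

# Step [2]: every site sends its records to one central repository.
central_repository = []
for site, records in site_records.items():
    for rec in records:
        central_repository.append({**rec, "site": site})


# Step [3]: aggregate the raw job records into per-VO totals.
def aggregate_by_vo(records):
    totals = defaultdict(float)
    for rec in records:
        totals[rec["vo"]] += rec["cpu_hours"]
    return dict(totals)


print(aggregate_by_vo(central_repository))  # {'atlas': 6.0, 'cms': 1.5}
```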


Accounting for Grid Jobs

• Build job records at the site
  • APEL maps grid users to the resource usage on local farms

[Diagram: Consolidation of Data]
• Job records come in via R-GMA (MON)
• 1 record per grid job (millions of records expected)
• SQL query to the accounting server: 1 query per hour
• Summary data refreshed every hour (max. about 100K records per year)
• On-demand accounting pages at the GOC (home page, user queries, graphs) based on SQL queries to the summary data
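Mapping grid users to local resource usage is essentially a join on the local batch job ID. The sketch below assumes the site has one source mapping each grid submission (certificate DN and VO) to a local job ID, and a second source giving the batch system's CPU and wall-clock time per local job ID; the dictionary layouts and `GridJobRecord` are hypothetical, not APEL's actual tables.

```python
from dataclasses import dataclass


@dataclass
class GridJobRecord:
    local_job_id: str
    user_dn: str
    vo: str
    cpu_seconds: int
    wall_seconds: int


def build_grid_job_records(gatekeeper_map, batch_usage):
    """Join grid identities with local batch usage on the local job ID.

    gatekeeper_map: {local_job_id: (user_dn, vo)}       (hypothetical layout)
    batch_usage:    {local_job_id: (cpu_seconds, wall_seconds)}
    Local jobs with no grid mapping (e.g. submitted directly to the farm)
    are not grid jobs and are skipped.
    """
    records = []
    for job_id, (cpu, wall) in batch_usage.items():
        if job_id not in gatekeeper_map:
            continue
        dn, vo = gatekeeper_map[job_id]
        records.append(GridJobRecord(job_id, dn, vo, cpu, wall))
    return records


gatekeeper_map = {"101.ce": ("/C=UK/O=eScience/CN=alice", "atlas")}
batch_usage = {"101.ce": (3600, 4000), "102.ce": (60, 90)}  # 102.ce is a local job
for rec in build_grid_job_records(gatekeeper_map, batch_usage):
    print(rec)
```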

Accounting Home Page

159 Sites publishing data (9 Jan 2006)

5.5 Million Job records

~ 100K records per week (period June – Dec 2005)

http://goc.grid-support.ac.uk//


Demos of Accounting Aggregation

Global views of resource consumption.

• LCG View: http://goc.grid-support.ac.uk/gridsite/accounting/tree/treeview.php
  • Shows aggregation for each LHC VO
  • Requirements driven by the RRB / Kors Bos
  • Tier-1 and country entry points
  • LHC VOs only
  • All data normalised in units of 1000 SI2000-hours (kSI2K-hours)
  • Tabular summaries per Tier-1 / country

• GridPP View: http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridppview.php
  • Shows aggregation for an EGEE partner
  • Prototype for an EGEE view
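Normalising to units of 1000 SI2000-hours lets CPU time from fast and slow worker nodes be added together. A minimal sketch, assuming normalisation multiplies CPU hours by the node's SpecInt2000 rating and divides by 1000; how each site's rating is obtained is outside this example.

```python
def to_ksi2k_hours(cpu_seconds: float, si2000_rating: float) -> float:
    """Normalise raw CPU time to units of 1000 SI2000-hours (kSI2K-hours)."""
    cpu_hours = cpu_seconds / 3600.0
    return cpu_hours * si2000_rating / 1000.0


# 10 CPU hours on a node rated 1000 SI2000 -> 10 kSI2K-hours;
# the same 10 CPU hours on a node rated 1500 SI2000 counts as 15.
print(to_ksi2k_hours(36000, 1000))  # 10.0
print(to_ksi2k_hours(36000, 1500))  # 15.0
```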

LHC View: Data Aggregation For VOs per Tier1, per Country

Aggregation of Data for GridPP

Aggregation of Data for Tier2

Data Aggregation at Site Level

Breakdown of data per VO per month showing number of jobs (Njobs), CPU time (CPUt), wall-clock time (WCT) and record history

Total CPU Usage per VO

Gantt chart. NB: gaps across all VOs are consistent with scheduled downtimes in GocDB
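A per-VO, per-month breakdown of Njobs, CPU time and wall-clock time is a simple group-by over job records. The sketch below uses hypothetical records of the form (VO, job end date, CPU hours, wall hours); it is not the GOC's SQL schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical job records: (vo, end_date, cpu_hours, wall_hours)
jobs = [
    ("atlas", date(2005, 11, 3), 2.0, 2.5),
    ("atlas", date(2005, 11, 20), 1.0, 1.5),
    ("cms", date(2005, 11, 7), 4.0, 5.0),
    ("atlas", date(2005, 12, 1), 3.0, 3.0),
]


def monthly_breakdown(jobs):
    """Summarise per (VO, month): number of jobs, CPU time, wall-clock time."""
    summary = defaultdict(lambda: {"njobs": 0, "cpu": 0.0, "wct": 0.0})
    for vo, end, cpu, wall in jobs:
        row = summary[(vo, end.strftime("%Y-%m"))]
        row["njobs"] += 1
        row["cpu"] += cpu
        row["wct"] += wall
    return dict(summary)


for (vo, month), row in sorted(monthly_breakdown(jobs).items()):
    print(vo, month, row)
```

The summary grows with the number of (VO, month) combinations rather than with the number of jobs, which is why a summary table can stay far smaller than the raw job table.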


Batch Support in APEL

Currently Available in LCG 2.6

• OpenPBS, Torque, PBSPro and vanilla PBS: ~90% of sites in LCG/EGEE

• Load Sharing Facility (LSF, versions 5 and 6): CERN, Italy

Available in LCG 2.7

• Condor: Canada

• Sun Grid Engine: in development (Imperial College)
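Supporting a batch system mostly means parsing its accounting log into a common record. A sketch for the PBS family, assuming the usual PBS/Torque accounting-log layout (`timestamp;type;job_id;key=value ...`, with `E` marking job end and `resources_used.cput` / `resources_used.walltime` given as HH:MM:SS); the `PARSERS` registry is hypothetical, and LSF, Condor or SGE would each need their own parser producing the same fields.

```python
from typing import Dict, Optional


def hms_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS duration to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_pbs_end_record(line: str) -> Optional[Dict[str, object]]:
    """Parse one PBS/Torque accounting-log line.

    Returns None unless the line is a job-end ('E') record.
    """
    timestamp, record_type, job_id, attributes = line.rstrip("\n").split(";", 3)
    if record_type != "E":
        return None
    # Attributes are space-separated key=value pairs.
    fields = dict(item.split("=", 1) for item in attributes.split() if "=" in item)
    return {
        "job_id": job_id,
        "end_time": timestamp,
        "user": fields.get("user"),
        "group": fields.get("group"),
        "cpu_seconds": hms_to_seconds(fields.get("resources_used.cput", "00:00:00")),
        "wall_seconds": hms_to_seconds(fields.get("resources_used.walltime", "00:00:00")),
    }


# Hypothetical parser registry: one entry per supported batch system.
PARSERS = {"pbs": parse_pbs_end_record}

example = ("11/03/2005 14:22:05;E;1234.ce.example.org;user=atlas001 group=atlas "
           "queue=short resources_used.cput=01:02:03 resources_used.walltime=01:30:00")
print(PARSERS["pbs"](example))
```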


APEL Summary

• APEL is not a banking system: job accounting AFTER the event, not in real time.

• APEL is designed to build accounting records at a site
  • Supports PBS and LSF; SGE (done), Condor in development

• Middleware independent
  • Although APEL uses R-GMA in LCG/EGEE, it could quite happily use any other mechanism for transport (e.g. MySQL, web services, GridFTP)
  • Can be deployed on other grids, e.g. OSG

• Implementation is simple
  • One database per site
  • One central repository

• APEL provides high-level views of usage data
  • Can also show usage at the DN level, with access restricted via ACLs (GridSite)

• APEL has been running on the production EGEE grid for >1 year
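Middleware independence amounts to keeping record building separate from record transport. A minimal sketch of that separation, assuming a hypothetical `Transport` interface; the two classes below stand in for R-GMA, direct database inserts, web services or file shipping and are not APEL's real publisher code.

```python
import io
import json
from typing import Iterable, List, Protocol, TextIO


class Transport(Protocol):
    """Anything that can move accounting records to the central repository."""

    def send(self, records: Iterable[dict]) -> None: ...


class InMemoryTransport:
    """Test double standing in for R-GMA, a database insert, a web service, ..."""

    def __init__(self) -> None:
        self.delivered: List[dict] = []

    def send(self, records: Iterable[dict]) -> None:
        self.delivered.extend(records)


class JsonLinesTransport:
    """Writes one JSON object per line, e.g. to a file shipped later by GridFTP."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def send(self, records: Iterable[dict]) -> None:
        for record in records:
            self.stream.write(json.dumps(record, sort_keys=True) + "\n")


def publish(records: List[dict], transport: Transport) -> None:
    # The record-building side never needs to know which transport is used.
    transport.send(records)


records = [{"job_id": "a1", "vo": "atlas", "cpu_hours": 2.0}]
memory = InMemoryTransport()
publish(records, memory)
buffer = io.StringIO()
publish(records, JsonLinesTransport(buffer))
print(buffer.getvalue(), end="")
```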


DGAS vs. APEL (?)

• DGAS and APEL aims are different:
  • DGAS: focused on storing detailed accounting information and controlling authorised access to it. Provides resource and user (VO) level accounting. Can serve as a basis for economic accounting and quota management. Provides security and authorisation for information access.
  • APEL: focused on publishing accounting data and providing an easy graphical view of aggregated information. Provides accounting suited to an upper (VO) level management view. Focuses on after-the-fact, resource-oriented accounting.

• DGAS & APEL! We believe that these two packages are not competitors, although they have some (needed) overlap. If used together, they can provide what is actually needed for grid accounting and benefit from cooperation.


Issues

• Full deployment
  • political, legal, security, paranoia
  • batch system support

• Validation
  • Are all records captured?
  • Is normalisation correct?
  • Is the site meeting its commitments?

• Account other resources
  • storage, memory, network

• Standards

• Interoperability

• Global Repository


Challenges Ahead

• Recognise that accounting isn't just about “job usage”; it's about resource usage, which encompasses many things:
  • CPU usage
  • also storage and network usage

• How do we describe this data?
  • Luckily there is a GGF Usage Record which provides a generic description of resource usage
  • Are these descriptors stable? Are they sufficient to describe the data?
  • Can we get network and storage people to use the same schema?
  • CPU is consumed; storage is occupied and can be recycled

• How important is accounting?
  • Compute resource viewed as a grid currency
  • Need a guarantee that the data has not been tampered with in an unfair way
  • How does normalisation fit into this? The concept of a raw usage record has no meaning if internal scaling is applied to heterogeneous farms
  • GGF UR allows a “cost” descriptor. Do we need an agreement on cost?
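To show what a generic usage description looks like, the sketch below emits a small usage-record XML document with Python's standard `xml.etree.ElementTree`. The namespace and element names (`JobUsageRecord`, `RecordIdentity`, `LocalJobId`, `GlobalUserName`, `WallDuration`, `CpuDuration`, `Charge`) follow my reading of the GGF Usage Record format and should be checked against the specification; only a small subset is shown, and `build_usage_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace as I understand it from the GGF Usage Record format; verify against the spec.
URF = "http://schema.ogf.org/urf/2003/09/urf"
ET.register_namespace("urf", URF)


def q(tag: str) -> str:
    """Qualify a tag or attribute name with the usage-record namespace."""
    return f"{{{URF}}}{tag}"


def iso_duration(seconds: int) -> str:
    # xs:duration in its simplest form, e.g. 3723 s -> "PT3723S"
    return f"PT{seconds}S"


def build_usage_record(record_id, local_job_id, user_dn, wall_s, cpu_s, charge=None):
    root = ET.Element(q("JobUsageRecord"))
    ET.SubElement(root, q("RecordIdentity"), {q("recordId"): record_id})
    job = ET.SubElement(root, q("JobIdentity"))
    ET.SubElement(job, q("LocalJobId")).text = local_job_id
    user = ET.SubElement(root, q("UserIdentity"))
    ET.SubElement(user, q("GlobalUserName")).text = user_dn
    ET.SubElement(root, q("WallDuration")).text = iso_duration(wall_s)
    ET.SubElement(root, q("CpuDuration")).text = iso_duration(cpu_s)
    if charge is not None:
        # Optional "cost" descriptor; what the number means would need agreement.
        ET.SubElement(root, q("Charge")).text = str(charge)
    return ET.tostring(root, encoding="unicode")


print(build_usage_record("rec-1", "1234.ce", "/C=UK/O=eScience/CN=alice", 5400, 3723, charge=1.5))
```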


Challenges Ahead

• Data collection: many implementations for collecting accounting data in the LCG world:
  • APEL/DGAS in EGEE
  • SGAS in SweGrid
  • Sites that implement their own systems (FermiLab, IN2P3, SARA: multiple grid job managers from different grids feed a single Condor pool)
  • Discussion with OSG on deploying APEL with their own transport mechanism

Switching one for another doesn't resolve the problem of data sharing across the project.

• No mechanism is in place yet to share this data in a consistent way
  • GGF is working on a Resource Usage Service
  • What would the model for data sharing look like? Low level or high level?
  • Low level: sensors publishing data via a web service?
  • High level: data collected within the infrastructure, aggregated in a meaningful way, reviewed and approved before it can be passed on (FermiLab)
  • Some Tier-1 centres have concerns about data association: “LCG not EGEE”, “Will the service be separate?”


Challenges Ahead

• Usage reporting at what level?
  • Anonymous level: how much resource has been provided to each VO
  • Aggregation across VOs, countries, regions, grids, organisations
  • Granularity: summed over units of hours, days, weeks, months?

• User-level reporting?
  • If 10,000 CPU hours were consumed by the ATLAS VO, who are the users that submitted the work?
  • Data privacy laws: a grid “DN” is personal information which could be used to target an individual
  • Who has access to this data, and how do you get it?
  • Can CA policies change to support anonymous DNs and reverse DN mappings? What are the consequences?
  • Are there any lawyers in the audience?
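One possible technique for anonymous DNs with a controlled reverse mapping is keyed pseudonymisation: records are published under an HMAC of the DN, and only whoever holds the key and the reverse table can recover the real identity. This is an illustrative sketch, not a CA policy or a proposal from the talk; `DnPseudonymiser` is hypothetical and key management is assumed to be handled elsewhere.

```python
import hashlib
import hmac
from typing import Dict


class DnPseudonymiser:
    """Replace certificate DNs with stable pseudonyms.

    Anyone can aggregate usage by pseudonym, but only the holder of the key
    and the reverse table (e.g. an authorised VO manager) can recover the DN.
    """

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse: Dict[str, str] = {}

    def pseudonym(self, dn: str) -> str:
        digest = hmac.new(self._key, dn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
        self._reverse[digest] = dn
        return digest

    def reveal(self, pseudonym: str) -> str:
        # Reverse mapping: only possible for whoever holds this object's table.
        return self._reverse[pseudonym]


p = DnPseudonymiser(b"site-secret")
alias = p.pseudonym("/C=UK/O=eScience/CN=alice")
print(alias, "->", p.reveal(alias))
```

The same DN always maps to the same pseudonym under one key, so per-user totals can still be computed without exposing who the users are.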


World Wide Accounting Service for LCG

• Project involves combining results from all three peer infrastructures and presenting an aggregated view of resource usage for LHC VOs to the RRB

Peer infrastructures in LCG:
• Open Science Grid + others (Ruth Pordes, Philippe Canal, Matteo Melani)
• NorduGrid (Per Oster, Thomas Sandholm)
• LCG/EGEE (Kors Bos, Dave Kant)
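Combining results from the peer infrastructures reduces, at its simplest, to summing per-VO usage reported by each one, restricted to the LHC VOs. A minimal sketch, assuming every infrastructure already reports in the same normalised unit; the figures and the `combine_infrastructures` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

LHC_VOS = {"alice", "atlas", "cms", "lhcb"}


def combine_infrastructures(summaries: Dict[str, Dict[str, float]],
                            vos: Iterable[str] = LHC_VOS) -> Dict[str, float]:
    """Sum per-VO usage reported by each peer infrastructure.

    Assumes every infrastructure reports in the same normalised unit
    (e.g. kSI2K-hours); adding numbers in different units would be meaningless.
    """
    wanted = set(vos)
    total: Counter = Counter()
    for per_vo in summaries.values():
        total.update({vo: usage for vo, usage in per_vo.items() if vo in wanted})
    return dict(total)


# Hypothetical figures, for illustration only.
summaries = {
    "OSG": {"cms": 120.0, "cdf": 300.0},
    "NorduGrid": {"atlas": 80.0},
    "LCG/EGEE": {"atlas": 200.0, "cms": 50.0},
}
print(combine_infrastructures(summaries))  # {'cms': 170.0, 'atlas': 280.0}
```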


Resource Usage Service

• Based on emerging GGF standards and web services: GGF UR, OGSI

• An implementation exists in “Market for Computational Science” – UK e-Science project

• A use case might be: a user invokes the query service through a web browser, using SSL for client authentication, to ensure that user-level usage information is released only to the user it belongs to. A servlet sends the query to the RUS web service and gets the user data back.

[Diagram: Web Service Container holding the Service Interface, RUS WS Application, ACL and DB]
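The access rule in the use case can be sketched as a query function that filters by the authenticated client's DN, with a hypothetical ACL granting VO-wide visibility to named managers. The DN would come from SSL client authentication; `UsageStore` and its fields are illustrative assumptions, not the RUS interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UsageStore:
    records: List[dict]
    # Hypothetical ACL: DNs allowed to see every user's records in a VO.
    vo_managers: Dict[str, Set[str]] = field(default_factory=dict)

    def query(self, client_dn: str, vo: str) -> List[dict]:
        """Return only the records the authenticated client is allowed to see."""
        in_vo = [r for r in self.records if r["vo"] == vo]
        if client_dn in self.vo_managers.get(vo, set()):
            return in_vo
        # Ordinary users see only their own usage.
        return [r for r in in_vo if r["user_dn"] == client_dn]


store = UsageStore(
    records=[
        {"vo": "atlas", "user_dn": "/CN=alice", "cpu_hours": 2.0},
        {"vo": "atlas", "user_dn": "/CN=bob", "cpu_hours": 1.0},
    ],
    vo_managers={"atlas": {"/CN=manager"}},
)
print(store.query("/CN=alice", "atlas"))
```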


Possible Roadmap

Stage 1: Let's try to get some data from each of the Tier-1s: summary records describing VO usage over a finite period of time
• Before end 2005
• SweGrid, Fermilab and DGAS ARE providing data!

Stage 2: Centralised database with a web service interface (RUS) to publish/query accounting data (summary records)

• Sometime in 2006

Stage 3: Distributed databases with a complete RUS implementation including permission model.

• Sometime in early 2007


Summary

• EGEE has had a production accounting infrastructure in place since 2004 but still has a long way to go

• We are developing a central repository to sit above all the grid infrastructures to meet the requirement for global reporting on LHC Computing

• Accounting is a controversial subject

• Thank you to everyone who has cooperated