EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Information Dump

White Areas Lecture Laurence Field

30th January 2009

Overview

• What is a Grid?
• Information Models
• Glue 2.0
• The Information System
• The New BDII
• GStat 2.0

What is a Grid?

What is a Grid?

[Figure: the grid landscape, covering cross-organizational grids, intra-organizational grids, data centers, virtualization, volunteer computing, campus grids, clusters and cloud computing]

What is the problem?

• Organizations A and B are separate administrative domains
– Independent policies, systems and authentication mechanisms

• Users have local access to their local systems using local methods
• Users from A wish to collaborate with users from B
– Pool the resources
– Split tasks by specialty
– Share common frameworks

[Figure: Organization A and Organization B]

The Solution

• The users from A and B create a Virtual Organization (VO)
– Users keep their unique identity but also take on the identity of the VO

• Organizations A and B support the Virtual Organization
– Place “grid” interfaces at the organizational boundary
– These map the generic “grid” functions/information/credentials to the local security functions/information/credentials

• Multi-institutional e-Science infrastructures

[Figure: a Virtual Organization spanning Organization A and Organization B]

The Information System

[Figure: the Information System spanning Organization A and Organization B, used by users, operations and services]

Information Models

Information Model

• Abstract description of data
– Description of values which are identified by attributes
– Description of attribute groupings
– Description of relationships between groupings

• Data → Information → Knowledge
– The information model turns data into information: existence, description, state

• Describes the components in a grid infrastructure
– and hence the grid itself

• The Data Model is the implementation
– LDAP, XML, Relational etc.
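The split between the information model and the data model can be illustrated with a small example. The sketch below takes one abstract entity, a named set of attribute/value pairs, and renders it in two data models: LDIF and XML. The entity, its attribute names and the DN layout are hypothetical and do not come from any GLUE rendering.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract entity: the kind, attribute names and values are illustrative only.
ENTITY = {
    "kind": "Service",
    "attributes": {"ID": "svc-001", "Name": "example-ce", "Type": "computing"},
}


def to_ldif(entity, base_dn="o=grid"):
    """Render the entity as an LDIF record (one possible LDAP data model)."""
    attrs = entity["attributes"]
    kind = entity["kind"]
    lines = [f"dn: {kind}ID={attrs['ID']},{base_dn}", f"objectClass: {kind}"]
    # Prefix every attribute with the entity kind, e.g. ServiceName
    lines += [f"{kind}{name}: {value}" for name, value in attrs.items()]
    return "\n".join(lines) + "\n"


def to_xml(entity):
    """Render the same entity as an XML element (another data model)."""
    root = ET.Element(entity["kind"])
    for name, value in entity["attributes"].items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_ldif(ENTITY))
    print(to_xml(ENTITY))
```

Both functions carry the same information. Only the concrete data model changes, and that is why one information model can have several renderings.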

The Original MDS 2.x Schema

http://www.globus.org/toolkit/docs/2.4/mds/Schema.html

European DataGrid Project

• Found that the MDS schema was not sufficient for their needs

• Each functional area defined its own sub-schema
– Workload management, data management, fabric management, data storage and network monitoring

• Introduced the Computing Element (CE) entity, which described
– the GRAM endpoint
– the batch system
– the state behind the endpoint
– a simple description of the resource (homogeneous cluster)

• Introduced the Storage Element (SE) entity, which described
– the storage endpoint
– the Storage Element Protocol entity

Nordugrid

• The Nordugrid project started in May 2001
• Aimed to build a Nordic testbed

– for wide-area computing and data handling

http://www.nordugrid.org/documents/arc_infosys.pdf

The World Wide Testbed

• A 2002 DataTAG initiative to create a worldwide Grid testbed

• Comprised of
– 8 European sites using the EDG 1.2 release
– 9 U.S. sites using the VDT 1.1.3 release

• The EDG release contained additional information providers
– which were not available in the VDT release

• This information was essential for the Resource Broker to function

• The information providers were installed on all the U.S. sites
– An example of interoperability using the parallel deployment model

Origins and Aims

• GLUE: Grid Laboratory Uniform Environment
– Started in April 2002
– Joint activity between EU-DataTAG, US-iVDGL and EDG
– Focused on interoperability between US and EU HEP projects
– Aimed to provide a common schema to facilitate interoperation

• Initial versions
– v1.0 (released Nov 2002)
– v1.1 (released April 2003)

• HEP-driven revisions
– v1.2 (released Dec 2005)
– v1.3 (released Oct 2006)

OSG and GLUE v1.2

• Both EGEE and OSG used GLUE v1.1
– OSG (MDS + GLUE + their own Grid3 schema)
– EGEE (GLUE + their own extensions)

• Relying on custom extensions breaks interoperability
– Additional use cases need to be added to GLUE

• A proposal for GLUE v1.2 was discussed
– An incremental approach was taken
– Only make the minimal changes
– Only solve problems found in deployment
– Ensure backwards compatibility

• For non-backwards-compatible changes
– Introduced the idea of defining Glue 2.0 at a future date

GLUE v1.3

• Last-minute changes for LHC start-up
– Could not wait for Glue 2.0
– Main focus was SRM 2.x

• Meeting in October 2006 to discuss proposed changes
– 44 suggested changes: 30 accepted, 8 rejected and 5 duplicates

• Version 1.3 deployed at the beginning of 2007
– Ongoing migration with respect to usage

• No requirement for v1.4
– Suggests that things are not too bad
– No blocking issues that urgently require a schema change

• Proved useful in interoperation activities
– OSG, NDGF, gin-info, Unicore, NAREGI etc.

• Interpretation of the schema has been tightened
– The understanding of the schema has improved
– Many additional documents describe usage

GLUE 2.0

Moving into the Open Grid Forum

• Conceptual and structural changes left for Glue 2.0
– Discussion on GLUE 2.0 at the Oct 2006 meeting in London

• Decision made to define Glue 2.0 within OGF
– Improve the acceptance of GLUE by other communities
– The OGF process should not create too much overhead

• GLUE-WG started in Jan 2007 at OGF19
– Building on the 4 years of existing work

• Positive outcomes
– GLUE widely accepted within OGF and seen as an important contribution
– GridForge helped the activity coordination
– Broad viewpoints limited assumptions
– Increased participation from other projects, and hence acceptance by those projects

Glue 2.0 Introduction

• Glue Schema Working Group created in the Open Grid Forum
– Need demonstrated through the GIN activities

• Build upon existing experience
– Consolidate over 4 years of production feedback

• Focus on use cases seen, not envisaged
– Cross-Grid use cases

• Define an abstract Information Model
– And a number of renderings: LDAP, XML, Relational, CIM etc.

• Start with abstract core concepts
– Evolve into specific service types

• Ensure participation from existing production infrastructures

Glue 2.0 Key Concepts

[Figure: the Admin Domain provides a Resource, which the User Domain utilizes]

Glue 2.0 Key Concepts

[Figure: key concepts User Domain, Admin Domain, Resource, Share and Manager; relationships: Provides, Utilizes, Manages, Negotiates Share with, Defined on]

Glue 2.0 Key Concepts

[Figure: key concepts User Domain, Admin Domain, Resource, Share, Endpoint, Access Policy, Mapping Policy and Manager; relationships: Provides, Manages, Negotiates Share with, Defined on, Contacts, Maps User to, Has]

Glue 2.0 Key Concepts

[Figure: key concepts User Domain, Admin Domain, Service, Resource, Manager, Share, Endpoint, Activity, Access Policy and Mapping Policy; relationships: Provides, Manages, Runs, Negotiates Share with, Defined on, Contacts, Maps User to, Has]
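One way to internalize these relationships is to write them down as types. The sketch below is a simplified and hypothetical Python rendering of the key concepts, not the normative GLUE 2.0 class definitions. The class and field names are illustrative, Activity and "Runs" are omitted for brevity, and each policy is reduced to a plain list of user-domain names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    name: str


@dataclass
class Manager:
    name: str
    # Manager "Manages" one or more Resources
    resources: List[Resource] = field(default_factory=list)


@dataclass
class MappingPolicy:
    # Simplification: user-domain names this policy maps to a Share
    allowed: List[str]


@dataclass
class AccessPolicy:
    # Simplification: user-domain names allowed to contact an Endpoint
    allowed: List[str]


@dataclass
class Share:
    name: str
    resource: Resource             # a Share is "Defined on" a Resource
    mapping_policy: MappingPolicy  # "Maps User to" this Share


@dataclass
class Endpoint:
    url: str
    access_policy: AccessPolicy    # an Endpoint "Has" an Access Policy


@dataclass
class Service:
    name: str
    endpoints: List[Endpoint]
    shares: List[Share]
    managers: List[Manager]


@dataclass
class AdminDomain:
    name: str
    services: List[Service] = field(default_factory=list)  # "Provides"


@dataclass
class UserDomain:
    name: str


def usable_shares(user: UserDomain, service: Service) -> List[str]:
    """Names of the shares a user domain can use: it must be allowed to
    contact at least one endpoint and be mapped to the share."""
    if not any(user.name in ep.access_policy.allowed for ep in service.endpoints):
        return []
    return [s.name for s in service.shares if user.name in s.mapping_policy.allowed]
```

In this toy model, access (Endpoint plus Access Policy) and mapping (Share plus Mapping Policy) are checked separately. That mirrors the separation the figure draws between "Contacts" and "Maps User to".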

Glue 2.0 Computing Schema

[Figure: Glue 2.0 computing entities Computing Service, Computing Manager, Execution Environment, Computing Share, Computing Endpoint, Computing Activity and Application Environment; relationships: Manages, Runs, Defined on, Maps User to, Can use, Mapped to]

Glue 2.0 Storage Schema

[Figure: Glue 2.0 storage entities Storage Service, Storage Manager, Storage Resource, Storage Share, Storage Endpoint, Storage Access Protocol, Share Capacity and Storage Capacity; relationships: Manages, Defined on, Maps User to, Has, Offers]

Glue 2.0 Timeline

• Oct 2006: decision taken to move into OGF
• Jan 2007 (OGF 19): first working group meeting
• June 2008 (OGF 23): specification entered public comment
• Aug 2008: public comment period ended
• Nov 2008: started addressing comments
• Jan 2009: final specification ready?
• Mar 2009: Glue 2.0 official OGF specification?
• 1st April 2009: start work on Glue 2.1

Proposed Roll Out Plan

1. Create a hybrid schema file with both v1.3 and v2.0
– Deploy across the infrastructure
– Should have negligible side effects
– Est. 3-6 months after specification fixed

2. Update information providers
– Publish Glue 2.0 information in addition to Glue 1.3
– Deploy across the infrastructure
– Est. 4-12 months after specification fixed

3. Update software and tooling as necessary
– Est. 6-36 months after specification fixed

4. Remove Glue 1.3 providers when no longer required
– Est. 36-?? months after specification fixed

Some Statistics

• 45 phone conferences
– 1.5 hours each ~ 3 days of talking
– 5 people participating ~ 2 months FTE invested in total
– Split between projects (EGEE, WLCG, Teragrid, Nordugrid, DEISA)
– This does not include the time invested by the editor (OMII-Europe)

• 40 versions of the document
– 347 days between first conference and initial specification
– 46 pages, 12787 words
– Document updated nearly every week

• 254 attributes
– 28 objects

• Four different renderings
– LDIF, XML, Relational and CIM
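A little arithmetic confirms the effort figures above. The sketch assumes that one FTE month is roughly 160 working hours; the slide does not state this.

```python
# Figures quoted on the slide
CONFERENCES = 45
HOURS_PER_CONFERENCE = 1.5
PARTICIPANTS = 5
# Assumption (not from the slide): one FTE month is roughly 160 working hours.
HOURS_PER_FTE_MONTH = 160

talk_hours = CONFERENCES * HOURS_PER_CONFERENCE   # hours spent on calls
talk_days = talk_hours / 24                        # calendar days of talking
person_hours = talk_hours * PARTICIPANTS           # effort summed over participants
fte_months = person_hours / HOURS_PER_FTE_MONTH    # effort in FTE months

print(f"{talk_hours} h of calls = {talk_days:.1f} days; "
      f"{person_hours} person-hours = {fte_months:.1f} FTE months")
```

This gives 67.5 hours (about 2.8 days) of calls and 337.5 person-hours (about 2.1 FTE months). Both are consistent with the "~3 days" and "~2 months" quoted.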

The Information System

Globus MDS v2

• Monitoring and Discovery Service (MDS)
– http://www.globus.org/toolkit/docs/2.4/mds/

• Information Providers (IP)
– Scripts that get the information and return LDIF

• Grid Resource Information Service (GRIS)
– Daemon that runs the IPs and answers LDAP queries
– Registers to a GIIS

• Grid Information Index Service (GIIS)
– Answers LDAP queries by querying registered GRISs or GIISs

• Both the GRIS and GIIS have a 30 s cache
– To reduce load and improve performance
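The 30-second cache is a simple time-to-live (TTL) cache: within the window, repeated queries are answered from memory. The backend (running the providers, or querying registered GRISs) is only hit once the entry has expired. The sketch below is a minimal illustration, not MDS code. `backend` is a hypothetical stand-in for the real work, and the injectable `clock` exists only so that expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class QueryCache:
    """Answer repeated queries from memory for `ttl` seconds (30 s on the slide)."""

    def __init__(self, backend: Callable[[str], str], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend   # hypothetical: runs providers or queries GRISs
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def query(self, search_filter: str) -> str:
        now = self.clock()
        hit = self._entries.get(search_filter)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no load on the backend
        result = self.backend(search_filter)
        self._entries[search_filter] = (now, result)
        return result
```

The trade-off is staleness against load. A client may see information up to `ttl` seconds old, but a burst of identical queries costs the backend only one evaluation.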

Original MDS Deployment

[Figure: original MDS deployment: queries go to a top-level GIIS, which draws on site GIISs; each site GIIS draws on its GRISs, which run the information providers]

The BDII

• Berkeley Database Information Index
– Standard OpenLDAP server
– Updated by a Perl process
– Populated using LDAP URLs (ldapsearch) in GIIS mode, or from a script (Information Provider) in GRIS mode

• Why?
– Because MDS didn’t work in a distributed environment
– Originally did not scale past 4 sites
– 1 broken worker node could bring down the whole system!
– MDS was the problem, not LDAP

• BDII first used as top-level GIIS
– Now used at the site and resource level
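In GIIS mode, each source is identified by an LDAP URL that is fed to ldapsearch. The sketch below shows how such a URL maps onto an OpenLDAP `ldapsearch` invocation, using only standard flags: `-x` for simple authentication, `-LLL` for plain LDIF output, `-H` for the server URI and `-b` for the search base. The hostname and base DN in the example are hypothetical. This is not the BDII's own update code.

```python
from typing import List
from urllib.parse import urlsplit


def ldapsearch_command(ldap_url: str, search_filter: str = "(objectClass=*)") -> List[str]:
    """Turn an LDAP URL such as ldap://host:2170/mds-vo-name=site,o=grid
    into an ldapsearch argument list (server URI plus search base)."""
    parts = urlsplit(ldap_url)
    if parts.scheme != "ldap":
        raise ValueError(f"not an LDAP URL: {ldap_url}")
    base_dn = parts.path.lstrip("/")  # the DN is the URL path
    return ["ldapsearch", "-x", "-LLL",
            "-H", f"{parts.scheme}://{parts.netloc}",
            "-b", base_dn, search_filter]
```

For example, `ldapsearch_command("ldap://bdii.example.org:2170/mds-vo-name=site,o=grid")` yields an argument list that `subprocess.run(cmd, capture_output=True, text=True)` could execute, provided the OpenLDAP client tools are installed.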

Information System Architecture

[Figure: information system architecture: queries go to a top-level BDII, which draws on site BDIIs; each site BDII draws on resource BDIIs (or GRISs), which run the information providers]

BDII

• Multiple DB instances used to increase performance
– Read only, write only and one spare for queries to finish
– This functionality is enabled by the port forwarder

• List of sources to query from a local file
– Can be updated from a web page
– More than one DB is used, separate read and write

• Can also use a local LDIF file to modify the DB after population
– Can be updated from a web page

[Figure: BDII internals: three LDAP databases on ports 2171, 2172 and 2173 behind a port forwarder on port 2170; the BDII updates and modifies one database (using the FCR file), then swaps DBs so that ldapsearch queries on port 2170 reach the fresh data]
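The read/write/spare arrangement amounts to rotating three roles after every update. The sketch below is a hypothetical model of that rotation, not BDII source code. It tracks only which port plays which role; the real port forwarding and database population are out of scope.

```python
class DatabaseRotation:
    """Hypothetical model of the BDII's three-database scheme.

    read  - the DB the port forwarder (port 2170) currently sends queries to
    write - the DB being refreshed from the sources and the modify file
    spare - the previous read DB, kept so that in-flight queries can finish
    """

    def __init__(self, ports=(2171, 2172, 2173)):
        self.read, self.write, self.spare = ports

    def swap(self):
        """Called after an update completes: the freshly written DB becomes
        the read DB, the old read DB becomes the spare, and the old spare
        becomes the next write target."""
        self.read, self.write, self.spare = self.write, self.spare, self.read
```

Keeping a spare means a long-running query against the old read DB is never cut off by the next write cycle. The cost is holding three copies of the data, which is one reason the new BDII moves to a single database.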

Load Balanced BDII

[Figure: queries go to a DNS round-robin alias in front of six BDII instances, each listening on port 2170]

Freedom of Choice

• Developed to meet a requirement from the VOs
– Modifies the information to their liking
– White list and black list services
– Only the VO manager can white list and black list the services

• Generates an LDIF modify file
– Web based

• BDII can be configured to use this file
– Will modify the database after population
– For use only with top-level BDIIs

• Linked with the Site Functional Tests portal
– Can automatically remove a site if it fails a functional test
– It’s the VO’s choice
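An LDIF modify file is a list of change records that the BDII applies after population. The sketch below shows one plausible shape for black-listing, assuming each black-listed service is removed by deleting its entry. The actual FCR output format may differ, and the DN in the usage example is hypothetical.

```python
from typing import Iterable


def fcr_ldif(blacklisted_dns: Iterable[str]) -> str:
    """Emit one LDIF change record per black-listed service entry.

    Assumption: black-listing is expressed as deleting the service's entry.
    Records are separated by a blank line, as LDIF requires.
    """
    records = [f"dn: {dn}\nchangetype: delete\n" for dn in blacklisted_dns]
    return "\n".join(records)
```

For example, `fcr_ldif(["GlueServiceUniqueID=srm.example.org,mds-vo-name=site,o=grid"])` produces a single delete record. A file of such records can be applied with OpenLDAP's `ldapmodify`.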

Generic Information Provider

• Provides information about the grid service
– Outputs LDIF to stdout, in accordance with the Glue Schema

• Information can be provided by
– dynamic providers from the providers directory
– static files from the ldif directory
– dynamic plugins from the plugin directory

• Cache used to improve efficiency and reduce load

[Figure: GIP components: config file, providers, plugins, cache and static LDIF]
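A dynamic provider is simply a program that gathers some values and prints LDIF to stdout. The minimal sketch below publishes the host's logical CPU count. The DN, object class and attribute names are hypothetical placeholders, not Glue Schema attributes.

```python
import os
import sys


def provider_ldif(hostname: str) -> str:
    """Collect one dynamic value and format it as an LDIF entry.
    DN, object class and attribute names are illustrative only."""
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    return (
        f"dn: ExampleHostID={hostname},mds-vo-name=resource,o=grid\n"
        f"objectClass: ExampleHost\n"
        f"ExampleHostID: {hostname}\n"
        f"ExampleHostLogicalCPUs: {cpus}\n"
    )


if __name__ == "__main__":
    # Providers write LDIF to stdout; the GIP collects it.
    sys.stdout.write(provider_ldif("wn01.example.org"))
```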

Generic Information Provider

[Figure: GIP flow: read config file; fork providers and plugins, each writing its output to the cache (a process will time out if it exceeds the response time); wait; read provider and plugin output from the cache (use cache if fresh); read static LDIF; apply LDAP_MODIFY; print to stdout]
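The core of this flow is: run every provider in parallel with a deadline, cache what succeeds, and fall back to fresh cached output for anything that times out or fails. The sketch below illustrates that pattern. It launches provider processes from a thread pool rather than calling fork() directly. The `response_time` and `freshness` defaults and the in-memory cache are assumptions, not the GIP's actual configuration.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical in-memory cache: provider command -> (timestamp, LDIF output).
Cache = Dict[Tuple[str, ...], Tuple[float, str]]


def run_provider(cmd: List[str], timeout: float) -> str:
    """Run one provider; raises TimeoutExpired if it exceeds `timeout`."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True).stdout


def collect(providers: List[List[str]], cache: Cache,
            response_time: float = 5.0, freshness: float = 600.0) -> str:
    """Run all providers in parallel, cache successful output, and use
    cached output (only if still fresh) for providers that fail."""
    now = time.time()
    with ThreadPoolExecutor() as pool:  # waits for all providers on exit
        futures = {tuple(cmd): pool.submit(run_provider, cmd, response_time)
                   for cmd in providers}
    outputs = []
    for key, future in futures.items():
        try:
            output = future.result()
            cache[key] = (now, output)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
            cached = cache.get(key)
            if cached is None or now - cached[0] > freshness:
                continue  # nothing usable: drop this provider's output
            output = cached[1]
        outputs.append(output)
    return "".join(outputs)
```

The timeout bounds how long one slow provider can delay the whole update. The freshness limit prevents a permanently broken provider from publishing old data indefinitely.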

User Tools

• lcg-infosites and lcg-info
– Can be used to query the information system
– For more information see the User Guide: https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf

• lcg-ManageVoTag
– Used by the VOs to publish software environment tags
– Publishes to /opt/edg/var/info/<VO>/<VO>.list (ensure the VO can write here!)
– Used by plugin glite-info-dynamic-software-wrapper
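The tag file acts as a hand-off between the VO and the information system. lcg-ManageVoTag writes tags into it, and a plugin later reads them and publishes them. The exact file format and plugin logic are not described here. This sketch assumes one tag per line in `<VO>.list` and formats each tag as a value of the GLUE 1.3 attribute `GlueHostApplicationSoftwareRunTimeEnvironment`. The tag names in the test are hypothetical.

```python
from pathlib import Path
from typing import List


def read_vo_tags(list_file: Path) -> List[str]:
    """Read software tags (assumed one per line), skipping blanks and duplicates."""
    seen, tags = set(), []
    for line in list_file.read_text().splitlines():
        tag = line.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def tags_to_ldif_attrs(tags: List[str]) -> str:
    """Format tags as lines of a multi-valued LDIF attribute."""
    return "".join(f"GlueHostApplicationSoftwareRunTimeEnvironment: {t}\n" for t in tags)
```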

Observations

• Problems observed in the information system are not always due to the information system
– It is just where the problem is visible
– Many problems arise at the information provider level, due to either poor configuration or poor fabric management affecting the information providers

• Scalability and stability
– Top-level BDIIs can become oversubscribed
– BDIIs take too much time and resources (CPU/RAM) to update
– Production problems are difficult to trace; this requires more instrumentation in the code
– BDIIs don’t work with low-bandwidth connections

Investigations

• Stress testing
– ldapbench

Results

The New BDII

The New BDII v5

• Use only one LDAP database
– Reduces complexity and relies on the stability of OpenLDAP

• Only do differential updates
– Reduce the write interaction and update time

• Merge the GIP and the BDII
– Only do LDAP_ADD and LDAP_MODIFY in one place

• Remove all internal caches
– The database is the cache!

• Improved logging
– Using the standard Python logger
– More stats, which are available remotely

• Do more with less (KISS)!
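A differential update compares the newly gathered entries with what is already in the database and writes only the difference. The sketch below illustrates this, representing entries as a hypothetical DN → {attribute: [values]} mapping. The modifications it returns correspond to LDAP modify "replace" operations, where an empty value list removes the attribute. This is an illustration of the technique, not the BDII v5 code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of LDIF content: DN -> {attribute: [values]}.
Entries = Dict[str, Dict[str, List[str]]]


def diff_entries(old: Entries, new: Entries) -> Tuple[Entries, Entries, List[str]]:
    """Return (adds, modifies, deletes) needed to turn `old` into `new`.

    Unchanged entries produce no writes at all, which is the point of
    differential updates. Value order is significant in this simplified sketch.
    """
    adds = {dn: attrs for dn, attrs in new.items() if dn not in old}
    deletes = [dn for dn in old if dn not in new]
    modifies: Entries = {}
    for dn, attrs in new.items():
        if dn in old and old[dn] != attrs:
            # Replace only attributes whose values changed or are new ...
            changed = {a: v for a, v in attrs.items() if old[dn].get(a) != v}
            # ... and replace vanished attributes with no values (removal).
            changed.update({a: [] for a in old[dn] if a not in attrs})
            modifies[dn] = changed
    return adds, modifies, deletes
```

Most of the information system is static between updates. A diff therefore usually touches a small fraction of the entries, which reduces both write load and update time.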

New Architecture

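[Figure: new BDII architecture: output from providers, plugins and static LDIF files is merged into a new LDIF, compared against the existing content, and applied to a single LDAP database on port 2170 via LDAP_ADD and LDAP_MODIFY; queries go directly to that database]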

Future Work

• Reducing the network load
– Investigate the use of syncrepl
– Update static information less frequently

• Reducing the query load
– Query caching on the WN
– lcg-utils and Service Discovery API

• Failover queries
– Local cache
– Site level
– Top level
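Failover querying means trying the sources in order (local cache, then site level, then top level) and stopping at the first useful answer. In the minimal sketch below, each source is a hypothetical callable standing in for a real LDAP query. Connection errors and timeouts are skipped because they surface as subclasses of Python's `OSError`.

```python
from typing import Callable, Optional, Sequence

Source = Callable[[str], Optional[str]]


def query_with_failover(sources: Sequence[Source], search_filter: str) -> Optional[str]:
    """Try each source in order (e.g. local cache, site BDII, top-level BDII)
    and return the first non-empty answer, or None if all of them fail."""
    for source in sources:
        try:
            result = source(search_filter)
        except OSError:  # e.g. connection refused or timed out
            continue
        if result:
            return result
    return None
```

Placing the cheapest source first means most queries never reach the top-level BDIIs. This is where the query load was observed to be a problem.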

GStat 2.0

Information Validation

• It is important that information is correct
– Misconfigured sites have in the past stopped services from running grid-wide
– and caused black holes for job submission

• Information must agree with the Glue Schema
– http://forge.gridforum.org/sf/projects/glue-wg

• And be accurate
– Grid Status (gstat) does basic sanity checks for each site
– http://goc.grid.sinica.edu.tw/gstat/
– The Grid Wiki gives solutions to common problems
– http://goc.grid.sinica.edu.tw/gocwiki/FrontPage
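Basic sanity checks of this kind catch values that are missing, malformed or impossible before they mislead a job broker. The sketch below applies three such rules to one entry: required attributes, non-negative integers, and a cross-field consistency check. The attribute names and rules are hypothetical; real checks would be derived from the Glue Schema.

```python
from typing import Dict, List

# Hypothetical rules for one entry type.
REQUIRED = ["ID", "TotalSlots", "FreeSlots"]
NON_NEGATIVE_INTS = ["TotalSlots", "FreeSlots", "WaitingJobs"]


def validate(entry: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (an empty list means OK)."""
    problems = [f"missing {a}" for a in REQUIRED if a not in entry]
    numbers: Dict[str, int] = {}
    for attr in NON_NEGATIVE_INTS:
        if attr not in entry:
            continue
        try:
            value = int(entry[attr])
        except ValueError:
            problems.append(f"{attr} is not an integer: {entry[attr]!r}")
            continue
        if value < 0:
            problems.append(f"{attr} is negative: {value}")
        numbers[attr] = value
    # Cross-field check: a site cannot have more free slots than slots.
    if ("TotalSlots" in numbers and "FreeSlots" in numbers
            and numbers["FreeSlots"] > numbers["TotalSlots"]):
        problems.append("FreeSlots exceeds TotalSlots")
    return problems
```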

The Original GStat

GStat 2.0 Core Concepts

• Monitor and test the information system
• Primary goals for GStat:
– Detect faults in the information system
– Validate the information content
– Display useful information with different views

• Build a sustainable architecture
– Enabling decentralized operations
– In a federated environment

• Redesign GStat in a modular way
– Reusable components
– Multi-location (site/ROC)
– Multi-application (certification/operations)

GStat 2.0 Architecture

[Figure: GStat 2.0 architecture with Core, Validation, Monitoring and Visualization components; labels include BDII, Nagios, data, snapshot, graphs, validation scripts, results, display, entities and Glue]

Component Descriptions

• Core
– Provides an SQL DB snapshot of the BDII content
– Maintains an entity cache of what has been seen

• Validation
– Validates the information content
– Provides testing results for visualization or export

• Monitoring
– Detects faults in the information system
– Entity DB is used to configure which entities are monitored
– Depends on WLCG Nagios sensors (collaboration)
– Prepares monitoring data and graphs ready for visualization

• Visualization
– Uses entity DB to generate the main structure
– Visualizes the result of validation and monitoring
– Provides different views for different user groups
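The Core component's two jobs, an SQL snapshot of the BDII content plus an entity cache of everything ever seen, can be illustrated with SQLite. The table layout and the DN → {attribute: [values]} input format in the sketch below are hypothetical, not GStat 2.0's actual schema. Each call replaces the snapshot atomically, while the entity table only grows and has its `last_seen` timestamp refreshed.

```python
import sqlite3
import time
from typing import Dict, List

# Hypothetical representation of BDII content: DN -> {attribute: [values]}.
Entries = Dict[str, Dict[str, List[str]]]


def take_snapshot(conn: sqlite3.Connection, entries: Entries) -> None:
    """Replace the stored snapshot with `entries` and update the entity
    cache, which remembers every DN ever seen and when it was last seen."""
    now = time.time()
    conn.execute("CREATE TABLE IF NOT EXISTS snapshot (dn TEXT, attr TEXT, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS entity "
                 "(dn TEXT PRIMARY KEY, first_seen REAL, last_seen REAL)")
    with conn:  # commit the whole refresh as one transaction
        conn.execute("DELETE FROM snapshot")
        conn.executemany(
            "INSERT INTO snapshot (dn, attr, value) VALUES (?, ?, ?)",
            [(dn, attr, value)
             for dn, attrs in entries.items()
             for attr, values in attrs.items()
             for value in values])
        # Entities are never deleted here: the cache keeps what has been seen.
        conn.executemany(
            "INSERT OR IGNORE INTO entity (dn, first_seen, last_seen) VALUES (?, ?, ?)",
            [(dn, now, now) for dn in entries])
        conn.executemany(
            "UPDATE entity SET last_seen = ? WHERE dn = ?",
            [(now, dn) for dn in entries])
```

An entity whose `last_seen` stops advancing has disappeared from the BDII. Monitoring can detect this from the entity cache even though the current snapshot no longer mentions it.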

GStat 2.0 Documentation

Example Usage

Nagios

Prototype Displays

Summary

• Glue 2.0 is coming
– The transition will take time

• BDII v5 is coming
– Needs rigorous testing

• New testing methods are coming
– gstat-validate and ldapbench

• GStat 2.0 is coming
– An instance can be installed for the Cert Testbed
– Compatible with Nagios and WLCG
– Extensible: you can build things on top!

• Future work
– Focus: addressing scalability and stability