EGEE Catalogs

Peter Kunszt

EGEE Data Management Middleware

http://cern.ch/egee-jra1

Service Grids NeSC, 22-23 July 2004

EGEE is a project funded by the European Union under contract IST-2003-508833


High-level strategy for middleware

• EGEE Middleware – To re-engineer generic middleware packages
  – Incorporating experience from EDG, VDT, LCG, AliEn (product from CERN Alice experiment) and others
  – Architected for scale and performance requirements of LCG and other applications
• EGEE design team formed early to develop architecture
  – Architecture: https://edms.cern.ch/document/476451
• Fast prototyping approach
  – Short update cycles to give applications the chance to influence and give feedback


Guiding principles

• Lightweight (existing) services
  – Easily and quickly deployable
  – Re-use as much as possible
• Interoperability
  – Allow for multiple implementations
• Resilience and Fault Tolerance
• Service oriented approach
  – Follow WSRF standardization
  – No mature WSRF implementations exist to date, hence: start with plain WS; WSRF compliance is not an immediate goal
  – Aim for WS-I compliance
• Co-existence with deployed infrastructure
  – Co-existence (and convergence) with existing grid infrastructures (e.g. LCG2) is essential for the EGEE Grid service

[Diagram labels: EDG, VDT, ..., LCG, EGEE, ..., AliEn]


High-level functional decomposition

• Starting point was the ARDA roadmap document
  – Focus is upon interfaces that can be composed into useful services


EGEE Data functional interfaces

• File Catalog
  – Management of the logical namespace
• Replica Catalog
  – Tracking of file replicas
• Metadata Catalog
  – Application specific metadata
    • In particular, metadata used to select logical files
• Combined Catalog
  – Added functionality by orchestration of the 3 catalogs (providing transaction safety)
• Storage Element
  – Where the files get stored
  – SRM interface (see GGF GSM-WG)
    • Manage a Storage Resource
    • Space reservation
    • Put and retrieve files using various protocols
  – Posix-like File I/O
    • Most posix-compliant feature support
    • Abstraction over existing MSS IO mechanisms
• Data Management:
• File Transfer Service
  – Reliable transfer of files between two sites
• File Placement Service
  – Transfer and register files
  – Orchestrate File Transfer and Data Catalog services
• Data Scheduling Service
  – Event-based data transfer, using File Placement Service
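
The "transfer and register" role of the File Placement Service can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `ReplicaCatalog`, `place_file`, `always_ok` and `always_fail` are hypothetical, the catalog is an in-memory dictionary, and the transfer is simulated by a plain callable. It is not the EGEE interface.

```python
from typing import Callable, Dict, List


class ReplicaCatalog:
    """Hypothetical in-memory stand-in for a replica catalog (GUID -> SURLs)."""

    def __init__(self) -> None:
        self.replicas: Dict[str, List[str]] = {}

    def register(self, guid: str, surl: str) -> None:
        self.replicas.setdefault(guid, []).append(surl)


def place_file(guid: str, source_surl: str, dest_surl: str,
               transfer: Callable[[str, str], bool],
               catalog: ReplicaCatalog) -> bool:
    """Transfer a file, then register the new replica only if the copy succeeded."""
    if not transfer(source_surl, dest_surl):
        return False  # nothing is registered when the transfer fails
    catalog.register(guid, dest_surl)
    return True


# Simulated transfers (assumptions for illustration only).
def always_ok(src: str, dst: str) -> bool:
    return True


def always_fail(src: str, dst: str) -> bool:
    return False
```

Registering only after a successful transfer keeps the catalog from pointing at replicas that do not exist, which is one plausible reading of "orchestrate" in the bullets above.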


Files and Catalogs

[Diagram: the File Catalog holds the LFN and its GUID; the Replica Catalog holds the GUID's master SURL and further SURLs; the Metadata Catalog holds the metadata.]
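
The relationship between LFN, GUID, SURLs and metadata named in this slide can be sketched as a single record type. This is a hedged illustration in Python, not the EGEE schema: the class name `FileEntry`, its field names, and the rule that the first registered replica becomes the master SURL are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileEntry:
    """One logical file: LFN -> GUID -> replicas (SURLs), plus metadata."""
    lfn: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    master_surl: str = ""
    surls: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def add_replica(self, surl: str) -> None:
        # Assumption: the first replica registered becomes the master copy.
        if not self.master_surl:
            self.master_surl = surl
        if surl not in self.surls:
            self.surls.append(surl)
```

For example, `FileEntry(lfn="/grid/vo/run1.dat")` gets a freshly generated GUID, and each call to `add_replica` attaches one more SURL to that same GUID.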


Services: EGEE Catalogs

SOA: WS-I

Implementation status: prototype


Service Operations

File Catalog operations

• Directory operations

• Directory permissions

• Symbolic links

• List, find

• +Bulk ops (upload)

Replica Catalog operations

• GUID mappings to SURLs

• ACLs

• File ‘stat’ like metadata

• +Bulk ops (upload, delete)

Metadata Catalog operations

• Query, returning list of LFN/GUID

• Set metadata based on LFN/GUID

• Query metadata of LFN/GUID

Combined interface

• List based on LFN (including replicas, metadata)

• Add entries just based on LFN (auto entry of GUID, SURL)

• Permissions based on LFNs

http://cern.ch/pkunszt/catalogs/
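
The combined interface listed above (add by LFN with automatic GUID entry, list by LFN including replicas and metadata, metadata query returning LFNs) might look roughly like the following Python sketch. The class `CombinedCatalog` and its method names are hypothetical, the three catalogs are plain dictionaries, and permissions, bulk operations and transaction safety are left out; the real service interfaces are described at the URL above.

```python
import uuid
from typing import Dict, List, Optional


class CombinedCatalog:
    """Hypothetical facade over three in-memory catalogs."""

    def __init__(self) -> None:
        self.file_catalog: Dict[str, str] = {}                 # LFN -> GUID
        self.replica_catalog: Dict[str, List[str]] = {}        # GUID -> SURLs
        self.metadata_catalog: Dict[str, Dict[str, str]] = {}  # GUID -> metadata

    def add_entry(self, lfn: str, surl: str,
                  metadata: Optional[Dict[str, str]] = None) -> str:
        """Add an entry from the LFN alone; the GUID is generated automatically."""
        if lfn in self.file_catalog:
            raise ValueError(f"LFN already exists: {lfn}")
        guid = str(uuid.uuid4())
        self.file_catalog[lfn] = guid
        self.replica_catalog[guid] = [surl]
        self.metadata_catalog[guid] = dict(metadata or {})
        return guid

    def list_entry(self, lfn: str) -> Dict[str, object]:
        """List one LFN together with its GUID, replicas and metadata."""
        guid = self.file_catalog[lfn]
        return {
            "lfn": lfn,
            "guid": guid,
            "surls": list(self.replica_catalog[guid]),
            "metadata": dict(self.metadata_catalog[guid]),
        }

    def query(self, key: str, value: str) -> List[str]:
        """Return the LFNs whose metadata has key == value."""
        return [lfn for lfn, guid in self.file_catalog.items()
                if self.metadata_catalog[guid].get(key) == value]
```

A caller only ever handles LFNs here; the GUID and SURL bookkeeping stays behind the facade, which is the convenience the combined interface is meant to provide.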


What do you use to build your service? (i.e. How ‘standard’ is your service?)

• Widely Implemented Standard Specification (1pt)
  – All services are described through WSDL, WS-I compliant (nontrivial!)
  – X509 extensions used for authorization (VOMS)
• Implemented draft Spec (2pt)
  – GSS/GSI for delegation
• Implemented draft specification (3pt) --
• Implemented proposal (4pt) --
• Non-implemented proposal (5pt) --
• Concept (6pt) --
• TOTAL: 4
• Will use: messaging (JMS) first (+?), WS-Transactions, WS-Notification when they are available with lower rankings
• Security: Delegation portType (proposal) (+4?)


Service Dependencies

• What else does your service depend on (i.e. external dependencies)?
  – RDBMS: need a JNDI connector. Can be anything beneath that, in principle. Implementation currently exists for Oracle, MySQL.
  – Logging: log4j
• What does your implementation depend on?
  – Tomcat 4 or 5
  – Java 1.4
  – Axis 1.1
  – Security libs GSI (using CoG + GSI security libs)


AAA & Security

• What authentication mechanism do you use?
  – https: SSL/TLS / TrustManager + CoG
• What authorisation mechanism do you use?
  – GSI Delegation -- Working on Delegation portType
  – VOMS/AuthzManager
  – Work ongoing on restricted delegation
• What accounting mechanism do you use?
  – Logging, RGMA (see Abdeslem’s talk)
• Does service interaction need to be encrypted?
  – No. Still waiting on detailed req’s from users whether they really need this
• If these are not used now, will they be in the future?
  – Plugin-based extensibility planned. GSI over https used today. Extensions should talk anything that people need (WS-Security in particular)


Exploiting the Service Architecture

• What features from your ‘plumbing’ do you use in your service?
  – Factory port : no
  – Factory pattern : no
  – Logging : yes
  – Event notification : not yet
  – Meta-data : yes
  – Registry discovery/advertisement : yes
  – Other OGSI/WSRF/WS/WS-GAF characteristics?
    • No, but interested in
      – Messaging
      – Distributed Transactions, Sessions
      – Notifications and Eventing


Service Activity

• Multiple interaction or single user?
  – multiple
• Throughput (1/per day or 100/per second?)
  – Many per second.
• Typical data volume moved in
  – Lots of small simultaneous single operations
  – Bulk operations with O(>10000) entries, up to O(10^6)
• Typical data volume moved out
  – Lots of small single ops (queries for ACL, lookups)
  – Bulk listings used by scheduler


Service Failure

• Required Reliability
  – Support for many bulk operation failure policies
    • Fail on any
    • Try as many as possible
    • [implies Policy management]
    • [implies notification for asynch ops]
  – Atomic operation failures ‘straightforward’
• Required Persistence
  – State persistence not required, try to be atomic
• Required Availability
  – Deployment choice.
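
The two bulk operation failure policies named under Required Reliability ("fail on any" and "try as many as possible") can be contrasted in a short Python sketch. The names `FailurePolicy` and `bulk_apply` are hypothetical, and one simplification is assumed: "fail on any" here stops at the first failing entry and does not roll back entries already applied.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Tuple


class FailurePolicy(Enum):
    FAIL_ON_ANY = "fail_on_any"
    TRY_ALL = "try_as_many_as_possible"


def bulk_apply(items: Iterable[str],
               op: Callable[[str], None],
               policy: FailurePolicy) -> Tuple[List[str], Dict[str, str]]:
    """Apply op to each item; return (succeeded items, failures by item)."""
    done: List[str] = []
    failed: Dict[str, str] = {}
    for item in items:
        try:
            op(item)
            done.append(item)
        except Exception as exc:
            failed[item] = str(exc)
            if policy is FailurePolicy.FAIL_ON_ANY:
                break  # assumption: stop here, no rollback of earlier items
    return done, failed
```

Returning the per-item failures lets a caller of an asynchronous bulk operation be notified afterwards of exactly which entries were rejected, in line with the bracketed notes on the slide.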