
Generic policy rules and principles

Jean-Yves Nief


Talk overview

An introduction to CC-IN2P3 activity.
iRODS in production:
– Why are we using it?
– Who is using it?
– Prospects.
iRODS rules policies through examples:
– Resource Monitoring System.
– Biomedical applications:
  • Human data.
  • Animal data.
– Arts and Humanities.
– Other rules: Mass storage system interface, access rights.
– Pitfalls.
– Future usages.

20/09/10 Repository workshop - Garching


CC-IN2P3 activities



Federate computing needs of the French scientific community in:
– Nuclear and particle physics.
– Astrophysics and astroparticles.
Computing services to international collaborations:
– CERN (LHC), Fermilab, SLAC, …
Now open to biology, Arts & Humanities.


iRODS @ CC-IN2P3: why use it?

National and international collaborations.
Users spread geographically (Europe, America, Australia…).
Need for storage virtualization:
– federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
– transparent data access for end users.
– middleware working on heterogeneous operating systems.
– common logical name space.
– virtual organization (access rights, groups, etc.).
– metadata search.
– easy interface with any kind of client application (APIs, drivers).


iRODS @ CC-IN2P3: why use it?

SRB in use since 2003:
– 3 PB handled for 10 different experiments (HEP, astro, biology).
– Decommissioning: end of 2012?
Limitation:
– no centralized data management (DM).
– no enforcement of DM policy.
iRODS rule-based policy:
– adequate solution.
– from the user's point of view: virtualization of data management policy.


iRODS @ CC-IN2P3: who is using it?

Arts and Humanities (Adonis):
– Long-term data preservation.
– Web and batch job access.
Biology (phylogenetics), fluid mechanics:
– grid jobs.
Biomedical applications:
– Human and animal imagery.
High energy physics:
– Neutrino experiment.


iRODS @ CC-IN2P3: who is going to use it?

Astrophysics experiments:
– LSST …
Other biomedical, physics projects.
iRODS will be part of French NGI.
All the SRB instances to be moved to iRODS.
1 PB should be reached soon.


Rules examples: Arts and Humanities


Ex: archival and data publication of audio files (CRDO).

[Diagram labels: CRDO, CINES, CC-IN2P3, Fedora, Archive]

1. Data transfer: CRDO to CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-Commons (delayed rule).
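The untar-plus-checksum step of this workflow could be sketched as follows in Python. This is only an illustration under stated assumptions: the slides do not say which checksum algorithm or tooling is used, so SHA-256 via `hashlib`, the function name `untar_with_checksums` and the paths are all hypothetical, and in production this logic runs inside iRODS rules and micro-services rather than in a standalone script.

```python
import hashlib
import tarfile
from pathlib import Path


def untar_with_checksums(archive: Path, dest: Path) -> dict[str, str]:
    """Extract every regular file from `archive` into `dest`.

    Returns {member name: SHA-256 hex digest} so each extracted file can
    later be verified or registered together with its checksum.
    SHA-256 is an assumption; the slides do not name the algorithm.
    """
    checksums: dict[str, str] = {}
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and special files
            target = (root / member.name).resolve()
            # Refuse members that would be written outside the destination.
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            digest = hashlib.sha256()
            source = tar.extractfile(member)
            with source, open(target, "wb") as out:
                # Stream in 1 MiB chunks: hash and write in a single pass.
                for chunk in iter(lambda: source.read(1 << 20), b""):
                    digest.update(chunk)
                    out.write(chunk)
            checksums[member.name] = digest.hexdigest()
    return checksums
```

Computing the digest while the data is being written avoids reading each file a second time, which matters for large audio archives.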


Rules examples: biomedical data

Human and animal data (fMRI, PET, MEG, etc.).
Usually in DICOM format.
Main issue for human data:
– Need to be anonymized!
Need to do metadata search on DICOM files.
Rule:
1. Check for anonymization of the file: send a warning if not true.
2. Extract a subset of metadata (based on a list stored in iRODS) from DICOM files.
3. Add these metadata as user-defined metadata in iRODS.
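The three steps of this rule can be sketched in Python. To stay self-contained, the DICOM header is represented here as a plain dictionary of attribute keyword to value, as a DICOM parser might produce it. Which attributes count as identifying (`IDENTIFYING_ATTRIBUTES`), the list of attributes to index (`WANTED_ATTRIBUTES`, which according to the slide is stored in iRODS) and all function names are hypothetical.

```python
# Hypothetical: attributes that must be empty for a file to count as anonymized.
IDENTIFYING_ATTRIBUTES = ("PatientName", "PatientID", "PatientBirthDate")

# Hypothetical: attributes to index (the real list is stored in iRODS).
WANTED_ATTRIBUTES = ("Modality", "StudyDate", "Manufacturer")


def anonymization_warnings(header: dict[str, str]) -> list[str]:
    """Step 1: one warning per identifying attribute that is still filled in."""
    return [
        f"{name} is not anonymized"
        for name in IDENTIFYING_ATTRIBUTES
        if header.get(name, "").strip()
    ]


def extract_metadata(header: dict[str, str]) -> list[tuple[str, str]]:
    """Step 2: keep only the wanted attributes that are present in the header."""
    return [(name, header[name]) for name in WANTED_ATTRIBUTES if name in header]


def process(header: dict[str, str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Steps 1-3: the warnings to send, plus the (attribute, value) pairs
    to attach to the file as user-defined metadata."""
    return anonymization_warnings(header), extract_metadata(header)
```

For a header that still carries a patient name, `process` returns one warning and the attribute/value pairs for the wanted attributes that are present; the file is still indexed, since the rule only warns.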


Rules examples: resource monitoring system


[Diagram: one iRODS iCAT server with its DB, and four iRODS data servers, each running a perf script]

1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store metrics into iCAT.
5. Compute a «quality factor» for each server, stored in another table: rule engine cron task (msi).
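The slides do not give the formula behind the quality factor, so the following Python sketch only illustrates the shape of the last step: turn the stored metrics of each server into a single score and rank the servers. The metric names, the weights and the formula itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """One row of collected metrics; the fields are hypothetical examples."""
    server: str
    cpu_used: float   # percent, 0-100
    mem_used: float   # percent, 0-100
    disk_used: float  # percent, 0-100


# Hypothetical weights; the real formula is not given in the slides.
WEIGHTS = {"cpu_used": 0.5, "mem_used": 0.3, "disk_used": 0.2}


def quality_factor(m: Metrics) -> float:
    """Weighted share of free capacity: 0.0 is saturated, 1.0 is idle."""
    return sum(w * (1.0 - getattr(m, name) / 100.0) for name, w in WEIGHTS.items())


def rank_servers(rows: list[Metrics]) -> list[tuple[str, float]]:
    """One (server, quality factor) entry per server, best first."""
    scored = [(m.server, quality_factor(m)) for m in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A ranking like this is the kind of information a rule could use to pick the least loaded data server for a new transfer.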


Other rules

Mass Storage System integration:
– Using compound resources: iRODS disk cache + tapes.
– Data on disk cache replicated into MSS asynchronously (1h later) using a delayExec rule.
– Recovery mechanism: retries until success; the delay between retries is doubled at each round.
ACL management:
– Rules needed for fine-grained access rights management.
– E.g.:
  • 3 groups of users (admins, experts, users).
  • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r
  • ACLs on all other subcollections => admins + experts: r/w, users: r
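The recovery mechanism listed under Mass Storage System integration (retry until success, doubling the delay at each round) is a form of exponential backoff. A minimal Python sketch of that schedule follows; the 60-second first delay, the function name `replicate_with_retries` and the injected `replicate` and `sleep` callables are hypothetical, and the real mechanism is a delayExec rule rather than a Python loop.

```python
import time
from typing import Callable


def replicate_with_retries(
    replicate: Callable[[], bool],
    first_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `replicate` until it reports success.

    After each failure, wait, then double the delay for the next round.
    Returns the number of attempts that were needed.
    """
    delay = first_delay
    attempts = 1
    while not replicate():
        sleep(delay)
        delay *= 2  # the delay between retries is doubled at each round
        attempts += 1
    return attempts
```

Passing `sleep` as a parameter keeps the schedule testable without actually waiting. Note that, as on the slide, there is no upper bound on the delay or on the number of retries.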

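The ACL example on this slide can be read as a small decision table. The Python sketch below encodes it; the zone name `myZone`, the function name `access_for` and the choice to treat everything below a rawdata collection like the collection itself are assumptions, and in iRODS the same policy is applied by rules that set the ACLs.

```python
ZONE = "myZone"  # hypothetical zone name standing in for <zone-name>

# Access per group inside a rawdata collection, and everywhere else.
RAWDATA_ACL = {"admins": "rw", "experts": "r", "users": "r"}
DEFAULT_ACL = {"admins": "rw", "experts": "rw", "users": "r"}


def access_for(group: str, collection: str) -> str:
    """Return "rw" or "r" for `group` on `collection` (a logical path).

    A path matches /<zone-name>/*/rawdata when its first component is the
    zone and its third component is "rawdata"; deeper paths inherit
    (an assumption, the slide does not say).
    """
    parts = collection.strip("/").split("/")
    in_rawdata = len(parts) >= 3 and parts[0] == ZONE and parts[2] == "rawdata"
    table = RAWDATA_ACL if in_rawdata else DEFAULT_ACL
    return table[group]
```

With this table, an expert can read but not modify `/myZone/projA/rawdata`, and can write to any other subcollection such as `/myZone/projA/results`.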

Developments needed

Scripts/binaries:
– Metadata extraction from DICOM files.
– Registration of files into Fedora-Commons.
– …
Needed whatever storage system is used underneath.
Micro-services:
– ACLs, tar/untar of archive files, …: APIs already available, did not require a large amount of work (part of the iRODS distro).
– Resource Monitoring System: bigger development, includes modification of the iCAT schema.
Rules:
– Most of them are simple.
– Some require more work (Adonis project): more complex workflow.


Pitfalls and bugs

Writing complex rules:
– Avoid writing them directly using the .irb syntax.
– Becomes difficult to debug, especially with nested actions.
Solution: use ruleGen to generate rules in a more user-friendly manner.
Some memory leaks found in irodsReServer with Oracle as a backend: fixed in 2.4.
delayExec syntax bugs: fixed in 2.4 and 2.4.1.
Rules in configuration file at the moment:
– Must be consistent on all the iRODS servers.
Will be in the iCAT database in the future.
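Because the rules currently live in a configuration file on every server, one simple way to spot drift is to compare a digest of that file across servers. The Python sketch below is an assumption-laden illustration: how the file content is fetched from each server is left out, and the server names and the function name `group_servers_by_rules` are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_servers_by_rules(rule_files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group servers by the SHA-256 digest of their rule file.

    `rule_files` maps a server name to the raw content of its rule file.
    One group means every server agrees; more than one means drift.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for server, content in sorted(rule_files.items()):
        groups[hashlib.sha256(content).hexdigest()].append(server)
    return dict(groups)
```

If the result has more than one key, the smaller groups point at the servers whose rule file needs to be brought back in line.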


Prospects

Rules for database interaction (in progress):
– Will be used by DTM (developed at CC-IN2P3):
  • DTM manages a list of tasks to be processed by a batch cluster.
  • DTM requires a database to manage the tasks.
– Rule launched by the client will interact with the DTM database through iRODS:
  • More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
  • Database schema upgrade transparent for the client (no SQL code launched on the client side).
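The idea of keeping all SQL on the server side can be sketched in Python: the client sends only an operation name and parameters, and the proxy owns the SQL text. Here an in-memory `sqlite3` database stands in for the DTM database; the `tasks` table, the operation names and the `TaskProxy` class are hypothetical, and in the planned setup this role is played by iRODS rules with iRODS authentication.

```python
import sqlite3


class TaskProxy:
    """Server-side stand-in: the only place where SQL text exists."""

    # Named operations exposed to clients. A schema upgrade only changes
    # the SQL here; clients keep sending the same operation names.
    OPERATIONS = {
        "add_task": "INSERT INTO tasks (name, status) VALUES (?, 'queued')",
        "set_status": "UPDATE tasks SET status = ? WHERE name = ?",
        "get_status": "SELECT status FROM tasks WHERE name = ?",
    }

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")  # stand-in for the DTM database
        self.db.execute("CREATE TABLE tasks (name TEXT PRIMARY KEY, status TEXT)")

    def call(self, operation: str, *params: str) -> list[tuple]:
        """Run a named operation; unknown names are rejected, never executed."""
        if operation not in self.OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        with self.db:  # commit on success, roll back on error
            return self.db.execute(self.OPERATIONS[operation], params).fetchall()
```

A client would call, for example, `proxy.call("add_task", "job1")` and later `proxy.call("get_status", "job1")`, without ever seeing the table layout.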

Xmessaging system (part of iRODS):
– Allows messages to be exchanged between different iRODS processes or clients.
– E.g.: could be used to monitor job status in a distributed computing environment.


Acknowledgement

Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.

03/25/10 iRODS at CC-IN2P3