Generic policy rules and principles
Jean-Yves Nief
Talk overview
An introduction to CC-IN2P3 activity.
iRODS in production:
– Why are we using it?
– Who is using it?
– Prospects.
iRODS rules and policies through examples:
– Resource Monitoring System.
– Biomedical applications:
  • Human data.
  • Animal data.
– Arts and Humanities.
– Other rules: Mass storage system interface, access rights.
– Pitfalls.
– Future usages.
20/09/10 Repository workshop - Garching
CC-IN2P3 activities
Federate the computing needs of the French scientific community in:
– Nuclear and particle physics.
– Astrophysics and astroparticles.
Computing services for international collaborations:
– CERN (LHC), Fermilab, SLAC, ….
Now open to biology and Arts & Humanities.
iRODS @ CC-IN2P3: why use it?
National and international collaborations.
Users spread geographically (Europe, America, Australia…).
Need for storage virtualization:
– federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
– transparent data access for end users.
– middleware working on heterogeneous OSes.
– common logical name space.
– virtual organization (access rights, groups, etc.).
– metadata search.
– easy interface with any kind of client application (APIs, drivers).
iRODS @ CC-IN2P3: why use it?
SRB in use since 2003:
– 3 PB handled for 10 different experiments (HEP, astro, biology).
– Decommissioning: end of 2012?
Limitation:
– no centralized data management (DM).
– no enforcement of DM policy.
iRODS rule-based policy:
– adequate solution.
– from the user's point of view: virtualization of the data management policy.
iRODS @ CC-IN2P3: who is using it?
Arts and Humanities (Adonis):
– Long-term data preservation.
– Web and batch job access.
Biology (phylogenetics), fluid mechanics:
– grid jobs.
Biomedical applications:
– Human and animal imagery.
High Energy Physics:
– Neutrino experiment.
iRODS @ CC-IN2P3: who is going to use it?
Astrophysics experiments:
– LSST …
Other biomedical and physics projects.
iRODS will be part of the French NGI.
All the SRB instances are to be moved to iRODS.
1 PB should be reached soon.
Rules examples: Arts and Humanities
Ex: archival and data publication of audio files (CRDO).
[Diagram: CRDO, CINES (archive), CC-IN2P3 (Fedora)]
1. Data transfer: CRDO → CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-commons (delayed rule).
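The "automatic untar + checksum" step of this workflow can be illustrated outside iRODS. The following is only a minimal Python sketch, not the rule or micro-service used at CC-IN2P3: the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def untar_with_checksums(archive: Path, dest: Path) -> dict[str, str]:
    """Unpack the regular files of `archive` into `dest`.

    Returns {member name: checksum}. Hypothetical helper, for illustration.
    """
    dest.mkdir(parents=True, exist_ok=True)
    dest_root = dest.resolve()
    checksums: dict[str, str] = {}
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories, links and devices are skipped
            target = (dest_root / member.name).resolve()
            # Refuse members that would be written outside the destination.
            if not target.is_relative_to(dest_root):
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            source = tar.extractfile(member)
            with source, target.open("wb") as out:
                shutil.copyfileobj(source, out)
            checksums[member.name] = sha256_of(target)
    return checksums
```

The returned mapping could then be compared with checksums recorded at the sending site before the files are registered for publication.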
Rules examples: biomedical data
Human and animal data (fMRI, PET, MEG, etc.).
Usually in DICOM format.
Main issue for human data:
– Need to be anonymized!
Need to do metadata search on DICOM files.
Rule:
1. Check that the file is anonymized: send a warning if it is not.
2. Extract a subset of metadata (based on a list stored in iRODS) from the DICOM files.
3. Add these metadata as user-defined metadata in iRODS.
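The logic of this rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script used in production: the DICOM header is represented as a plain dictionary that some parser has already filled, the list of identifying fields is a short hypothetical choice, and the function names are invented for the example.

```python
from typing import Optional

# Assumption: a short, illustrative list of identifying attributes.
# A real anonymization check would cover many more fields.
IDENTIFYING_FIELDS = ("PatientName", "PatientBirthDate", "PatientAddress")


def find_identifying_fields(header: dict) -> list[str]:
    """Names of identifying fields that still carry a non-empty value."""
    return [name for name in IDENTIFYING_FIELDS
            if str(header.get(name, "")).strip()]


def extract_subset(header: dict, wanted: list[str]) -> dict[str, str]:
    """Keep only the attributes named in `wanted` (the list stored in iRODS)."""
    return {name: str(header[name]) for name in wanted if name in header}


def process_dicom_header(header: dict,
                         wanted: list[str]) -> tuple[Optional[str], dict[str, str]]:
    """Return (warning or None, metadata to attach as user-defined metadata)."""
    leaked = find_identifying_fields(header)
    warning = f"not anonymized: {', '.join(leaked)}" if leaked else None
    return warning, extract_subset(header, wanted)
```

The warning would be sent to the data owner, and each key/value pair of the returned dictionary attached to the file as user-defined metadata.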
Rules examples: resource monitoring system
[Diagram: one iRODS iCAT server (with its DB) and four iRODS data servers, each running a perf script]
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store metrics into the iCAT.
5. Compute a «quality factor» for each server, stored in another table: rule engine cron task (msi).
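The slides do not say how the quality factor is computed. Purely as an illustration of step 5, the Python sketch below turns two hypothetical metrics (CPU load and disk usage, both as fractions between 0 and 1) into a score between 0 and 1; the metric names, the weights and the formula are all assumptions and not the ones used by the Resource Monitoring System.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerMetrics:
    """Metrics reported by one data server (hypothetical fields)."""
    server: str
    cpu_load: float   # fraction of CPU in use, 0..1
    disk_used: float  # fraction of disk space in use, 0..1


def quality_factor(metrics: ServerMetrics,
                   cpu_weight: float = 0.5,
                   disk_weight: float = 0.5) -> float:
    """Assumed formula: 1 minus the weighted mean of the busy fractions."""
    busy = cpu_weight * metrics.cpu_load + disk_weight * metrics.disk_used
    score = 1.0 - busy / (cpu_weight + disk_weight)
    return max(0.0, min(1.0, score))  # clamp to the 0..1 range


def rank_servers(all_metrics: list[ServerMetrics]) -> list[ServerMetrics]:
    """Servers ordered from best to worst quality factor."""
    return sorted(all_metrics, key=quality_factor, reverse=True)
```

A ranking of this kind is one way the stored factor could later be used, for example to prefer the least loaded server when choosing where to write.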
Other rules
Mass Storage System integration:
– Using compound resources: iRODS disk cache + tapes.
– Data on disk cache replicated into the MSS asynchronously (1h later) using a delayExec rule.
– Recovery mechanism: retries until success; the delay between retries is doubled at each round.
ACL management:
– Rules needed for fine-grained access rights management.
– E.g.:
  • 3 groups of users (admins, experts, users).
  • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r
  • ACLs on all other subcollections => admins + experts: r/w, users: r
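The recovery mechanism described for the Mass Storage System replication (retry until success, doubling the delay each round) can be sketched generically. The Python below only illustrates that retry pattern and is not the delayExec rule itself; the function name and its parameters are assumptions, and the first delay is left to the caller because the slides do not give it.

```python
import time
from typing import Callable


def retry_until_success(task: Callable[[], bool],
                        first_delay: float,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `task` until it returns True; return the number of attempts.

    The wait before each new attempt is twice the previous one.
    `sleep` is injectable so the schedule can be checked without waiting.
    """
    delay = first_delay
    attempts = 1
    while not task():
        sleep(delay)
        delay *= 2  # doubled at each round
        attempts += 1
    return attempts
```

With a first delay of one hour, failed replications would be retried after 1h, 2h, 4h and so on, which keeps the load low on a mass storage system that stays unavailable for a long time.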
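The ACL example given under "ACL management" amounts to a small decision table. As a hedged illustration only (the real policy is enforced by iRODS rules and micro-services), the Python sketch below encodes that table; the function name and the zone and project names used with it are hypothetical.

```python
def access_for(group: str, path: str) -> str:
    """Access level of `group` on a logical path, following the example policy.

    /<zone-name>/*/rawdata   : admins read-write, experts and users read
    all other subcollections : admins and experts read-write, users read
    """
    parts = [part for part in path.split("/") if part]
    # parts[0] is the zone, parts[1] matches the '*' level, parts[2] the subcollection.
    in_rawdata = len(parts) >= 3 and parts[2] == "rawdata"
    if group == "admins":
        return "read-write"
    if group == "experts":
        return "read" if in_rawdata else "read-write"
    if group == "users":
        return "read"
    raise ValueError(f"unknown group: {group}")
```

For instance, `access_for("experts", "/myZone/projA/rawdata/scan1")` gives `"read"`, while the same group gets `"read-write"` on `/myZone/projA/results`.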
Developments needed
Scripts/binaries:
– Metadata extraction from DICOM files.
– Registration of files into Fedora-Commons.
– …
Needed whatever storage system is used underneath.
Micro-services:
– ACLs, tar/untar of archive files, …
  APIs already available, did not require a large amount of work (part of the iRODS distro).
– Resource Monitoring System: bigger development, includes modification of the iCAT schema.
Rules:
– Most of them are simple.
– Some require more work (Adonis project), with a more complex workflow.
Pitfalls and bugs
Writing complex rules:
– Avoid writing them directly using the .irb syntax.
– Becomes difficult to debug, especially with nested actions.
Solution: use ruleGen to generate rules in a more user-friendly manner.
Some memory leaks found in irodsReServer with Oracle as a backend: fixed in 2.4.
delayExec syntax bugs: fixed in 2.4 and 2.4.1.
Rules are in a configuration file at the moment:
– Must be consistent on all the iRODS servers.
– Will be in the iCAT database in the future.
Prospects
Rules for database interaction (in progress):
– Will be used by DTM (developed at CC-IN2P3):
  • DTM manages a list of tasks to be processed by a batch cluster.
  • DTM requires a database to manage the tasks.
– Rule launched by the client will interact with the DTM database through iRODS:
  • More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
  • Database schema upgrades transparent for the client (no SQL code launched on the client side).
Xmessaging system (part of iRODS):
– Allows messages to be exchanged between different iRODS processes or clients.
– E.g.: could be used to monitor job status in a distributed computing environment.
Acknowledgement
Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.