ceph at the digital repository of ireland - ceph day frankfurt

18
Peter Tiernan Systems and Storage Engineer Digital Repository of Ireland TCHPC Ceph at the DRI

Upload: inktank

Post on 16-Jan-2015

682 views

Category:

Technology


3 download

DESCRIPTION

Peter Tiernan, Trinity College

TRANSCRIPT

Page 1: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Peter TiernanSystems and Storage EngineerDigital Repository of Ireland TCHPC

Ceph at the DRI

Page 2: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

DRI:

The Digital Repository Of Ireland (DRI) is an interactive, national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions.

The DRI follows the Open Archival Information System (OAIS) ISO reference model and The Trusted Repository Audit Checklist (TRAC)

Page 3: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

OAIS Model:

- is concerned with all technical aspects of digital repositories- describes ‘components and services required to develop and maintain archives’- is broken down into 'Functional Entities' and 'Work Packages'. - WP8 is responsible for the ‘Archival Storage’ functional entity.

Page 4: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

OAIS Model:

Source:www.digital-preservation.com

Page 5: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

DRI Storage Requirements:

OAIS/TRAC requires the following from storage:

- Minimal conditions for performing long-term preservation of digital assets

- Long Term Preservation of digital assets, even if the OAIS (repository) itself is not permanent or present.

Page 6: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

DRI Storage Requirements:

- Open Source/Open Standards- Independence- High Availability- Dynamically Configurable- Ease of Interoperability (Interfaces, APIs)- Data Security/Placement (Replication, Erasure coding,

Placement, Tiering, Federation)- Self Contained- Commodity Hardware

Page 7: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Storage Solutions We Tested:

Page 8: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Why we didn't choose HDFS:

- Interfaces limited. Not posix compliant due to immutable nature of filesystem.- Performance geared towards large data streams. I/O of

many small files is poor. - Single point of failure and bottleneck at its Namenode.- Doesn’t provide any federation

Page 9: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Why we didn't choose iRODS:

- Default Interfaces limited. No Restful, RBD.- Single point of failure at its iCAT metadata server- Overlapping functionality with Fedora Commons

Why we didn't choose GPFS:- Default Interfaces limited. No Restful, RBD.- Data Replica limit of 2.- Closed source

Page 10: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Why we chose Ceph:

- We like its distributed, clustered architecture- Provides complete high availability on install- Scales out horizontally to massive levels- Data Security: Distributed, Replicated- Many interface options - Rich, documented, multi-level APIs- Dynamically configurable- Very good Performance for general use (many small file I/O)- Solid release schedule, new features

Page 11: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Findings:HDFS iRODS Ceph GPFS

API Yes Yes Yes Yes

Fedora 3.6.x Driver

Yes No No No

Interface: Posix No Yes Yes Yes

Interface: RBD No No Yes No

Interface: RESTful

Yes No Yes No

Dynamic Configuration

Yes Yes Yes Yes

High Availability: Data

Yes Yes Yes Yes

High Availability: Service

No No Yes Yes

Max Raw Storage (PetaByte)

>100 N/A >100 4 - 10^14

On-Read Data Checking

No Yes No No

Max Replicas 512 >2 ~2.1 Billion 2

Federation No Yes No Yes

Page 12: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

DRI Infrastructure

Page 13: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Performance

Poor performance with low number of OSDs (6) and replication.

Page 14: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Performance

Adding OSDs (26) improves replicated performance

Source: Diana Gudu, KIT

Sou

rc e: Dia

na Gud

u, KIT

Source: D

iana G

udu, K

IT

Page 15: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Features we want from Ceph:

- Asynchronous Replication- Erasure Coding - Tiering- Multi-datacenter/Rados level async replication- Micro-services

Page 16: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Other Ceph Projects:

- TCHPC: 100TB cluster used for backups

- collaboration with KIT/PSNC: performance testing, WAN scale replication testing (Sync/Async)

Page 17: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

DRI: www.dri.ieTrinity HPC: www.tchpc.tcd.ie

Trinity College Dublin: www.tcd.ie

Questions?

Page 18: Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

Links:

Ceph: www.ceph.comHDFS: hadoop.apache.orgIRODS: www.irods.orgGPFS: www.ibm.com/systems/software/gpfs/

Project Hydra: projecthydra.orgFedora Commons: www.fedora-commons.orgApache SOLR: lucene.apache.org/solr/HAProxy: haproxy.1wt.eu