Storage Foundation and Alfresco
Toni de la Fuente, Principal Solutions Engineer, Americas. Blog: blyx.com – Twitter: @ToniBlyx
Agenda
• Intro to Storage Concepts
• Hardware
• Alfresco Storage Related Solutions
  – Alfresco S3 and caching content store
  – Alfresco XAM
  – Content Store Selector
  – Replication / Geo-clusters / Redundancy
• Partner Solutions: Alf2CAS, Star Storage
• Storage Best Practices with Alfresco
• Backup and Recovery
Intro to Storage Concepts: stack (top to bottom)
File Protocol: NFS, CIFS, SMB
File System: Ext3, Ext4, ReiserFS, XFS, GFS, NTFS, FAT32, GlusterFS, OCFS, ZFS
Block Management: MDM, LVM (Logical Volume Management)
Block Protocol: SCSI, SATA, FC
RAID (HW or SW): Mirrors, Stripes
Hardware: Disks, connectors, racks, FC switches
Intro to Storage Concepts • Hard drive types and interfaces
– PATA: Parallel Advanced Technology Attachment • Also known as IDE or EIDE; older, 40-pin connector, less efficient, usually 4K–5K rpm.
– SATA: Serial ATA • Similar to PATA but with a different connector; more energy efficient; between 5K and 10K rpm.
– SCSI: Small Computer System Interface • Spins at 10K to 15K rpm; needs a controller.
– SSD: Solid State Drives • No moving parts (semiconductor memory); much faster than mechanical drives and less likely to break down.
Intro to Storage Concepts • Hard drive types and interfaces
– FC: Fibre Channel • Successor to parallel SCSI, with broader usage than disk interfaces alone; used for SANs.
– SAS: Serial Attached SCSI • Similar to SCSI, but serial rather than parallel.
– Other, end-user-oriented interfaces: • USB • FireWire • Thunderbolt
• CAS (Content-Addressable Storage): a mechanism for storing information that is retrieved based on its content, not its storage location (e.g. EMC Centera, Caringo).
• XAM (eXtensible Access Method): a SNIA standard interface for archiving to CAS.
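The CAS idea can be illustrated with a minimal in-memory sketch. It assumes SHA-256 as the addressing hash and uses a hypothetical `ContentAddressedStore` class; it does not describe how Centera, Caringo or the XAM API work internally.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical toy CAS: a blob's address is the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address depends only on the content, not on where it is stored.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored only once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read lets the store detect silent corruption.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentAddressedStore()
a = store.put(b"contract v1")
b = store.put(b"contract v1")  # same content, same address
c = store.put(b"contract v2")  # different content, different address
```

Deduplication and integrity checking come for free from content addressing, which is why CAS suits archiving.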
Intro to Storage Concepts • RAID types (SW or HW)
[Figure: RAID levels; slide annotation: "← Faster with parity"]
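Mirrors keep full copies of the data. Parity RAID levels such as RAID 5 store the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. Below is a minimal sketch of single-parity XOR, assuming equal-sized blocks. Real controllers also rotate parity across disks, and RAID 6 adds a second, different parity calculation.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks striped across three disks.
data = [b"ALFR", b"ESCO", b"DOCS"]
# Parity block stored on another disk.
parity = xor_blocks(data)

# Disk holding data[1] fails: rebuild its block from the surviving blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```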
Intro to Storage Concepts: main differences between SAN and NAS
A SAN is a shared "network" of storage
• Block access to LUNs
• Online and offline storage
• SAN device = storage array
• Zoning: data integrity and security
• Dedicated fiber network
Protocols:
• SCSI over Fibre Channel
• SCSI over IP/Ethernet (iSCSI)
• FC, InfiniBand
NAS is a file system shared over a network
• File access to data
• Online storage only
• NAS device = file server or "filer", already formatted
Protocols:
• NFS, CIFS over IP over Ethernet
Intro to Storage Concepts: who needs a SAN?
• Database servers and ECM: Oracle, SQL Server, DB2 and other database servers.
• File servers: SAN-based storage for file servers lets you expand file server resources quickly, improves their performance, and lets you manage your file-based NAS storage through the SAN.
• Backup servers: SAN-based backup is typically much faster than LAN-based backup, because backup traffic does not compete with user traffic on the LAN.
• Voice/video servers: manage large amounts of data very quickly.
• High-performance application servers: document management, customer relationship management, billing, data warehouses and other high-performance, critical applications all benefit from what a SAN provides.
Intro to Storage Concepts • Evolution
Internal Storage → Direct-Attached Storage (DAS) → Network-Attached Storage (NAS)
Hardware: HBA card, tape library, fibre cables, storage arrays
Alfresco Storage Related Solutions: Alfresco S3 Connector
• An alternative content store implementation that uses S3 directly (S3 APIs)
• Somewhat equivalent to XAM, but not identical
  – Unlike XAM, S3 doesn't offer retention policies
• Enterprise only
  – USD 10K for Alfresco Standard
  – USD 13.4K for Alfresco Enterprise
• Shipped as a single repository-side AMP
• Can only be installed into a new Alfresco instance (no migration!)
• Configuration must be done before the first start
• Can also configure a caching content store (default cache size: 50 GB)
• Only supported if Alfresco is running on Amazon EC2
• Amazon EBS is still required for database files, indexes, etc.
• Does not support S3 encryption yet
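A caching content store keeps recently read content on local disk so that repeated reads do not go back to S3. Below is a minimal sketch of that idea as a read-through LRU cache bounded by total bytes. It makes three assumptions: the backing store is any callable that returns bytes for a content URL, eviction is least-recently-used, and `CachingStore` is a hypothetical class. It is not Alfresco's own caching content store implementation.

```python
from collections import OrderedDict
from typing import Callable


class CachingStore:
    """Hypothetical read-through cache in front of a slow backing store (e.g. S3)."""

    def __init__(self, fetch: Callable[[str], bytes], max_bytes: int) -> None:
        self._fetch = fetch
        self._max_bytes = max_bytes
        self._cache: OrderedDict[str, bytes] = OrderedDict()
        self._size = 0
        self.misses = 0

    def read(self, url: str) -> bytes:
        if url in self._cache:
            self._cache.move_to_end(url)  # mark as most recently used
            return self._cache[url]
        self.misses += 1
        data = self._fetch(url)  # cache miss: go to the backing store
        self._cache[url] = data
        self._size += len(data)
        # Evict least recently used entries until back under the limit
        # (a single item larger than the limit is still kept).
        while self._size > self._max_bytes and len(self._cache) > 1:
            _, evicted = self._cache.popitem(last=False)
            self._size -= len(evicted)
        return data


# Usage with a dict standing in for S3 (hypothetical content URLs).
backing = {"a.bin": b"a" * 40, "b.bin": b"b" * 40}
cache = CachingStore(backing.__getitem__, max_bytes=60)
cache.read("a.bin")  # miss
cache.read("a.bin")  # hit
cache.read("b.bin")  # miss; 80 bytes > 60, so a.bin is evicted
cache.read("a.bin")  # miss again
```

LRU is chosen here only for simplicity. The cache size limit plays the role of the 50 GB default mentioned above.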
Alfresco Storage Related Solutions: Alfresco XAM Connector (deprecated)
• Gives Alfresco access to XAM-enabled storage devices
• New XAM connector available
• Only EMC Centera supported
• Released with 3.4, January 2011
• Enterprise only
• Still supported for existing customers
  – until November 30th 2014 or until their current subscription runs out, whichever comes first
Alfresco Storage Related Solutions: Content Store Selector
• Storage policies based on business rules
• Available since Alfresco 3.2
• Examples:
  o By type: large video files on fast, expensive drives; Office documents on slower, more cost-effective drives.
  o By business unit, by age, by usage, by ...
• Leverage Rules and Actions to drive store selection
[Figure: Policy Rules route content to SSD ($$$), FC drives ($$) or SATA drives ($). SSD = Solid State Drives, FC = Fibre Channel]
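The decision logic behind such a storage policy can be sketched as an ordered rule table in which the first match wins. The store names mirror the SSD/FC/SATA tiers in the figure. The `Rule` class, the `RULES` table, the 100 MB threshold and `select_store` are all hypothetical. In Alfresco itself, the selection is driven by rules and actions on the repository, not by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical storage policy: route matching content to `store`."""
    store: str
    mimetype_prefix: str = ""  # "" matches any mimetype
    min_size: int = 0          # bytes


# Evaluated top to bottom; the first matching rule wins.
RULES = [
    Rule(store="ssd", mimetype_prefix="video/", min_size=100 * 1024 * 1024),  # large videos
    Rule(store="fc", mimetype_prefix="video/"),                               # other videos
    Rule(store="sata", mimetype_prefix="application/vnd.openxmlformats"),     # Office documents
]
DEFAULT_STORE = "default"


def select_store(mimetype: str, size: int) -> str:
    for rule in RULES:
        if mimetype.startswith(rule.mimetype_prefix) and size >= rule.min_size:
            return rule.store
    return DEFAULT_STORE
```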
Alfresco Storage Related Solutions: Content Replication (Alfresco on-premise to Alfresco on-premise)
• Distributed repository replication
  – Selective replication of spaces and content
  – Support for full, incremental and delete
  – One source, multiple destinations
  – Replicas are read-only (update at the source only; redirect if needed)
• Benefits
  – Support geographically dispersed companies
  – Provide fast local access
  – Remove a single point of failure
  – Reduce wide area network traffic
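The difference between full, incremental and delete replication can be sketched by comparing manifests. Each side is summarized as a `{path: checksum}` map, and only the source is writable. The manifest format and `plan_replication` are hypothetical; this is not how Alfresco's replication service is implemented.

```python
from typing import NamedTuple


class ReplicationPlan(NamedTuple):
    to_copy: list[str]    # new or changed on the source
    to_delete: list[str]  # removed from the source since the last run


def plan_replication(source: dict[str, str], target: dict[str, str]) -> ReplicationPlan:
    """Compare {path: checksum} manifests; the read-only replica follows the source."""
    to_copy = sorted(p for p, digest in source.items() if target.get(p) != digest)
    to_delete = sorted(p for p in target if p not in source)
    return ReplicationPlan(to_copy, to_delete)


source = {"/a.doc": "h1", "/b.doc": "h2", "/c.doc": "h3"}
replica = {"/a.doc": "h1", "/b.doc": "old", "/z.doc": "h9"}
incremental = plan_replication(source, replica)  # copies only changed/new items
full = plan_replication(source, {})              # empty replica: copy everything
```

With one source and multiple destinations, the same comparison simply runs once per destination.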
Alfresco Storage Related Solutions: Content Replication / Geo-clusters / Redundancy
• Alfresco Cloud Sync: on-premise ↔ Cloud
  – Content oriented, not intended for storage replication
• Synchronization between Alfresco on-premise instances (not available yet)
• Alfresco Desktop Sync: from Windows or Mac desktop to Alfresco on-premise (not available yet)
Alfresco Storage Related Solutions: Geo-clusters and Redundancy
• Geo-clusters can be built by replicating the DB and the content store. Supported?
  – Low-level replication/sync
  – Some customers have this
  – Some customers use NetApp NAS storage and Oracle GoldenGate for DB replication
  – Other replication tools: EMC CLARiiON, EMC Symmetrix or IBM TotalStorage
Partner Solutions
• Xenit Alf2CAS
  – Caringo CAStor integration
  – Deprecated?
• Star Storage
  – Hitachi Content Platform (HCP)
  – Content archiving, additional storage and faster content backup
  – Alfresco Enterprise: 3.4.x, 4.0.x
  – Hitachi Content Platform (HCP): 4.x, 5.x, 6.x
Third Party / Community Solutions
• StorNext
  – Not a connector, but a solution for data life-cycle management in the background
  – Alfresco sees it as a mount point and is not aware of it
  – Runs over FC
• EMC Atmos – XAM connector for Alfresco
• Alfresco Cloud Store – Amazon S3 – https://code.google.com/p/alfresco-cloud-store/
• Amazon S3 for on-premise – https://issues.alfresco.com/jira/browse/AMZNSSS-26
• Walrus? The S3 alternative for Eucalyptus
Storage Best Practices • Content Store
– Use Content Store Selector to manage content of different sizes.
– The default content store should be faster than the others for writing, to avoid bottlenecks (content is written to the default store first and then copied to the other content stores).
– WORM disks as non-default content store (cleaner - Jefferies)
– SAN if possible
– If NAS, use a dedicated LAN if possible
– LVM if possible (scalability, snapshots)
– Empty the trash bin often
– Delete the contents of "contentstore.deleted" often
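Emptying `contentstore.deleted` is usually a scheduled job. Below is a minimal Python sketch with these assumptions: `purge_deleted` is a hypothetical helper, and you point it at your own `alf_data/contentstore.deleted` directory with a retention period long enough that your backups no longer need those files. It defaults to a dry run that only lists what it would delete.

```python
import time
from pathlib import Path


def purge_deleted(root: Path, older_than_days: int, dry_run: bool = True) -> list[Path]:
    """List (and, unless dry_run, delete) files under root older than the cutoff."""
    cutoff = time.time() - older_than_days * 86400
    victims = [p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in victims:
            p.unlink()
    return victims


# Example call (path is an assumption; adjust to your installation):
# purge_deleted(Path("/opt/alfresco/alf_data/contentstore.deleted"), 30)
```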
Storage Best Practices • Indexes (Solr or Lucene)
– Dedicated local or SAN disk
– Avoid NAS
– Keep at least 50–75% of the space free (backup and merge)
– Consider putting Lucene backups and Solr backups on a different file system
• Logs
– Put your logs directory on a different file system from the content store and indexes.
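The free-space guideline for indexes is easy to monitor. Below is a minimal sketch using Python's `shutil.disk_usage`. The function name and the 50% default (the low end of the range above) are choices made for this example, and the index path you pass in is up to your installation.

```python
import shutil


def has_index_headroom(path: str, min_free_ratio: float = 0.5) -> bool:
    """True if the file system holding `path` has at least min_free_ratio of its space free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio
```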
Backup and Recovery
• Recovery Time Objective (RTO): the amount of time it takes to get your systems back online.
• Recovery Point Objective (RPO): the last consistent data transaction prior to the disaster. In other words, if you had a disaster, how much data would be lost?
• The Disaster Recovery plan (DR) focuses on getting your business back up and running after a major outage.
• The Business Continuance plan (BCP) focuses on keeping your business running DURING the disaster.
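With periodic backups, the RPO arithmetic is simple: everything changed since the last consistent backup is lost, so in the worst case a full backup interval of changes is lost. Below is a minimal sketch under the simplifying assumption that each backup is instantaneous and consistent. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Changes made after the last consistent backup are lost."""
    return disaster - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the disaster hits just before the next backup runs."""
    return backup_interval <= rpo


# Nightly backup at 02:00, disaster at 17:30 the same day.
lost = data_loss_window(datetime(2014, 5, 1, 2, 0), datetime(2014, 5, 1, 17, 30))
```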
Backup and Recovery
• Alfresco Backup and Recovery Tool is available:
  – http://blyx.com/open-source-contributions/alfresco-bart/
• Alfresco Backup and Recovery White Paper:
  – http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
Common Questions to the SE
• Best practices for storage?
  – You got it
• NAS or SAN?
  – SAN if possible! NAS backed by a SAN is common as well. NAS is not bad, but now you know why it is different.
• Required space for DB, indexes, content store?
  – It depends on each case, but the DB and the indexes are usually about 20% of the content store space (each).
• Do you have an archiving solution?
  – Alfresco can be integrated with archiving solutions like those mentioned above and implemented with Content Store Selector.
• Do you have a backup/recovery solution?
  – http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
• Do you have a data encryption solution?
  – Yes, Alfresco Encryption at Rest: http://docs.alfresco.com/5.0/concepts/encrypted-overview.html
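The 20% rule of thumb for DB and indexes can be combined with the index free-space guideline for a first sizing estimate. Below is a minimal sketch: `estimate_storage` is a hypothetical helper, the 20% and 50% defaults come from this deck's guidelines, and real sizing depends on each case.

```python
def estimate_storage(content_store_gb: float, ratio: float = 0.20,
                     index_free_ratio: float = 0.50) -> dict[str, float]:
    """Rough sizing: DB and indexes each ~ratio of the content store."""
    database_gb = content_store_gb * ratio
    indexes_gb = content_store_gb * ratio
    # Size the index volume so the indexes fill at most (1 - index_free_ratio) of it.
    index_volume_gb = indexes_gb / (1 - index_free_ratio)
    return {
        "database_gb": database_gb,
        "indexes_gb": indexes_gb,
        "index_volume_gb": index_volume_gb,
        "total_data_gb": content_store_gb + database_gb + indexes_gb,
    }


sizes = estimate_storage(1000)  # 1 TB content store
```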
What kind of storage can I use with Alfresco?
• Any mountable volume that can be made to appear as a standard local file system (local disks, NAS, SAN, etc.)
• Amazon S3 (for Alfresco installations in AWS)
• Centera (through the now open-source connector)
• EMC Atmos (through a partner-created integration)
• CAStor (through a dated partner-created integration)
Appendix 1: Deleting content
Deleting Content
• A complex process
• You need to know this because it impacts:
  – Disk space management
  – Backup and recovery procedures (and their integrity)
  – Security and auditing
• You have a wide degree of control over what happens and when
• You need to do some work
• More info: page 24 of the Alfresco Security Best Practices Guide
  http://www.slideshare.net/toniblyx/alfresco-security-best-practices-guide
Node deletion [diagram; database: alf_node, alf_content_data, alf_content_url, alf_node_properties and other tables; filesystem: ~/alf_data/contentstore and ~/alf_data/contentstore.deleted; example content file: 2e3839d2d345.bin]
1. Start: the node is in workspace://SpacesStore and its binary is in contentstore.
2. User deletes document: the node moves to archive://SpacesStore; the binary stays in contentstore.
3. Wastebasket empties: the node is removed and its alf_content_url row is marked orphan_time = 'now'; the binary is still in contentstore.
4. contentStoreCleaner runs: the orphaned binary is moved from contentstore to contentstore.deleted.
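The cleaner step above can be sketched as a filter over content URL rows. Only orphans whose orphan_time is older than a protection window are handed to the cleaner. This is a minimal sketch with hypothetical names. The `ContentUrl` class, the `store://` URLs and the 14-day window passed in the example are illustrative only; check your Alfresco version's repository properties for the actual cleaner settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ContentUrl:
    url: str
    orphan_time: Optional[datetime]  # set when no node references the content any more


def select_for_cleanup(rows: list[ContentUrl], now: datetime, protect: timedelta) -> list[str]:
    """Orphans older than the protection window are eligible to move to contentstore.deleted."""
    return [r.url for r in rows
            if r.orphan_time is not None and now - r.orphan_time >= protect]


now = datetime(2014, 6, 1)
rows = [
    ContentUrl("store://in-use.bin", None),                     # still referenced
    ContentUrl("store://old-orphan.bin", now - timedelta(days=20)),
    ContentUrl("store://new-orphan.bin", now - timedelta(days=2)),
]
eligible = select_for_cleanup(rows, now, protect=timedelta(days=14))
```

The protection window is what gives you time to restore content from the wastebasket or from a recent backup before the binary leaves contentstore.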
Questions?