Infrastructure and Performance
An Islandoracon Workshop
Instructors: Gavin Morris & Luke Taylor
Instructor: Luke Taylor
DevOps Team Lead
discoverygarden inc.
155 Queen St. Suite 101
Charlottetown, PE C1A 4B4
discoverygarden.ca
Instructors - Gavin Morris
● Team Lead & Dev Ops at Born Digital
○ http://born-digital.com/
● Convener of the Islandora DevOps Interest Group
○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● Led the Islandora DevOps Panel: Building Islandora at the
inaugural Islandora Conference (Islandoracon) on Prince
Edward Island, Canada
● Presented Automating Islandora Upgrations, Maintenance
and Deploys at the Islandora Camp in Hartford, CT
Overview
● Intro to the stack
○ Build Types
○ Split the stack
○ Operating Systems
○ Packages
○ Services / Software
● Provisioning
○ Deploy & Config management tools
○ Pipeline
● Performance
● Scaling
● Security
● Best Practices
● The future of the stack
○ CLAW
○ ISLE
● Q&A
● Resources
Intro to the Stack - What is Islandora?
Source: http://islandora.mnpals.net/pals/islandora/object/PALSrepository%3A412/datastream/OBJ/download/2016-08_Detailed_Islandora_Introduction.pdf
Intro to the Stack - Build Types
● All in One
● 2-3 servers
● 5-7 servers
Intro to the Stack - Build Types: All in One
Recommended Minimum
● 4-6 cores
● 16GB - 32GB RAM
● 100-200GB for OS, Temp files etc
● Volume large enough for repository data
● Additional space could be required for staging content
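The recommended minimum above can be turned into a quick pre-flight check. This is only a minimal sketch: the function name and the dictionary layout are hypothetical, and the thresholds are the lower bounds from the slide (4 cores, 16 GB RAM, 100 GB for OS and temp files). Repository storage is not checked, because the slide gives no figure for it.

```python
# Lower bounds taken from the "All in One" recommended minimum.
ALL_IN_ONE_MINIMUM = {"cores": 4, "ram_gb": 16, "os_disk_gb": 100}


def shortfalls(host, minimum=ALL_IN_ONE_MINIMUM):
    """Return {resource: (actual, required)} for each resource below the minimum."""
    return {
        key: (host.get(key, 0), required)
        for key, required in minimum.items()
        if host.get(key, 0) < required
    }


if __name__ == "__main__":
    # Hypothetical host description; real numbers would come from the
    # hypervisor or cloud console.
    candidate = {"cores": 2, "ram_gb": 16, "os_disk_gb": 150}
    for name, (actual, required) in shortfalls(candidate).items():
        print(f"{name}: have {actual}, need at least {required}")
```

An empty result means the host meets every lower bound; it does not mean the host is large enough for a given collection.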
Intro to the Stack - Build Types: 2-3 servers
Web & Database Server (Minimum Requirements)
● 1-2 cores
● 4 - 16 GB RAM (*depends on platform type e.g. staging, dev)
● 150-250GB for OS, Temp files etc
● Additional space could be required for staging content
Fedora Repository Server (Minimum Requirements)
● 2-4 cores
● 8 - 32 GB RAM (*depends on collection size)
● 150-250GB for OS, Temp files etc
● Additional space / volume for repository data e.g. 2 -20 TB
Intro to the Stack - Build Types: 5-7 servers
● Public Front End
● Staff/Ingest Front End
● Read-only Fedora
● Read-write Fedora
● DB server
● Blazegraph
● Solr
● Storage mount (e.g. NFS)
Intro to the Stack - Split the Stack
● Remote Solr
○ Use Gsearch 2.8+
○ Edit fgsindex.indexBase in index.properties in Gsearch.
○ Still have to maintain a “dummy” index on the Gsearch server.
● Blazegraph
○ Used to replace Mulgara (Triplestore) for performance and stability gains
○ https://github.com/discoverygarden/trippi-sail
○ https://github.com/Smithsonian/trippi-sparql
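For the Remote Solr step, the edit to `fgsindex.indexBase` in `index.properties` is easy to script so it can be repeated on every build. The sketch below assumes the file uses plain `key = value` lines; the function name and the Solr URL in the usage example are hypothetical, and the code works on the file's text rather than on a path so that it can be tried without a Gsearch install.

```python
import re

# Matches a "fgsindex.indexBase = ..." (or ":"-separated) property line.
INDEX_BASE = re.compile(r"^([ \t]*fgsindex\.indexBase[ \t]*[=:])[^\n]*$", re.MULTILINE)


def set_index_base(properties_text, solr_url):
    """Return properties_text with fgsindex.indexBase pointing at solr_url.

    If the property is not present, it is appended at the end.
    """
    new_text, count = INDEX_BASE.subn(
        lambda match: f"{match.group(1)} {solr_url}", properties_text
    )
    if count == 0:
        new_text = properties_text.rstrip("\n") + f"\nfgsindex.indexBase = {solr_url}\n"
    return new_text


if __name__ == "__main__":
    sample = (
        "fgsindex.indexName = FgsIndex\n"
        "fgsindex.indexBase = http://localhost:8080/solr\n"
    )
    # solr.example.org is a placeholder for the remote Solr host.
    print(set_index_base(sample, "http://solr.example.org:8080/solr"))
```

Only the one property is rewritten; every other line in the file is left untouched.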
Intro to the Stack - Operating Systems
Current Stable Recommendations
● Ubuntu 14.04 LTS
● RHEL/CentOS 6.9
Needing more definitive testing
● Ubuntu 16.04 LTS (w/PHP7)
● RHEL/CentOS 7 (challenges with temporary file system)
Community Poll from Melissa Anez (Have your say!)
● Survey https://docs.google.com/forms/d/1E7NmS4944LD3E51A7SK_8MiNoOWCPnjgY8YWOUC7I-o
● Google Group topic
https://groups.google.com/forum/?hl=en#!searchin/islandora/php$20testing|sort:relevance/islandora/WftNSPr7Xi0/vlh6eJUbAwAJ
Intro to the Stack - Operating Systems packages (basic)
man vim curl perl unzip automake subversion kernel-headers
gcc zip dkms bzip2 openssh mercurial pkg-config build-essential
git wget htop cmake libtool apt-utils kernel-devel libfreetype6-dev
ntp yasm nasm rsync autoconf zlib1g-dev linux-headers Development tools
Intro to the Stack - Services / Software
● Apache 2.2 - 2.4 Web server
○ Modules include but are not limited to
■ ssl, rewrite, deflate, headers, expires, xml2enc
■ reverse proxy for multi-systems:
● proxy, proxy_http, proxy_html, proxy_connect
● Databases
○ Mysql 5.5+
○ Percona
○ Mariadb
○ Postgres
○ Recommend UTF-8 encoding
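The Apache module list on this slide can be verified against a running server. `apachectl -M` prints one loaded module per line in the form `name_module (static)` or `name_module (shared)`; the sketch below parses that listing and reports which of the modules named above are missing. The function names are hypothetical, and the listing is passed in as text so the check can be tried without Apache installed.

```python
# Module names as listed on the slide.
BASE_MODULES = ("ssl", "rewrite", "deflate", "headers", "expires", "xml2enc")
PROXY_MODULES = ("proxy", "proxy_http", "proxy_html", "proxy_connect")


def loaded_modules(listing):
    """Extract module names from `apachectl -M` style output."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        if parts and parts[0].endswith("_module"):
            names.add(parts[0][: -len("_module")])
    return names


def missing_modules(listing, required=BASE_MODULES):
    """Return the required modules that do not appear in the listing."""
    loaded = loaded_modules(listing)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    sample = (
        "Loaded Modules:\n"
        " core_module (static)\n"
        " ssl_module (shared)\n"
        " rewrite_module (shared)\n"
    )
    print(missing_modules(sample))
    # On a multi-server build, check the reverse proxy modules as well.
    print(missing_modules(sample, BASE_MODULES + PROXY_MODULES))
```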
Intro to the Stack - Services / Software
● Tomcat 7.0.52 +
○ Oracle Java JDK or OpenJDK 7/8
○ SSL & port 8443
■ Will need to compile own jks/P12/truststore (how to automate?)
○ see Gotcha section re versions above 7.0.72/8.0.39+
● Apache Solr
○ versions 4.2, 4.6.1, 4.10
○ Don’t use the schema generated by the Gsearch Ant build: it is incomplete (missing catch_all entries etc.)
○ A helpful starting point for the schema.xml and solrconfig.xml files:
https://github.com/discoverygarden/basic-solr-config
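Because the Tomcat gotcha on this slide depends on the exact version, a provisioning script can test for it before installing Gsearch. This is a minimal sketch under one stated assumption: "7.0.72/8.0.39+" is read as "7.0.72 and later in the 7.0 line, 8.0.39 and later in the 8.0 line". The slide says nothing about other release lines, so the hypothetical function below returns `None` for them rather than guessing.

```python
# First affected release in each line named on the slide (assumed inclusive).
FIRST_AFFECTED = {(7, 0): (7, 0, 72), (8, 0): (8, 0, 39)}


def parse_version(text):
    """Turn '7.0.72' into (7, 0, 72)."""
    return tuple(int(part) for part in text.strip().split("."))


def gsearch_gotcha_applies(tomcat_version):
    """True/False for the 7.0 and 8.0 lines, None for lines the slide does not cover."""
    version = parse_version(tomcat_version)
    threshold = FIRST_AFFECTED.get(version[:2])
    if threshold is None:
        return None
    return version >= threshold


if __name__ == "__main__":
    for candidate in ("7.0.52", "7.0.72", "8.0.39", "8.5.11"):
        print(candidate, gsearch_gotcha_applies(candidate))
```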
Intro to the Stack - Services / Software
● PHP 5.3.x+
○ Drupal 7.54
■ Islandora 7.x / HEAD modules
■ Additional modules e.g. ctools, imagemagick, date, views etc.
○ Composer
■ Drush
● Fedora-Commons 3.8.1
○ Triplestore (mulgara, Blazegraph)
● Fedoragsearch HEAD / 2.7.1
○ DGI GSearch Extensions https://github.com/discoverygarden/dgi_gsearch_extensions
○ XSL Transforms for Gsearch https://github.com/discoverygarden/islandora_transforms
Intro to the Stack - Services / Software
● Binaries, Derivative generation
○ Imagemagick
○ LAME (audio, mp3 etc)
○ FFMPEG 3.3 (video, from source)
○ FITS
○ EXIF
○ XPDF
○ Ghostscript 9.05 (from source)
○ Tesseract (OCR)
○ Adore-djatoka 1.1
■ On multi-system setups libraries should be additionally installed on web servers
■ Requires use of Oracle JDK 7/8
Provisioning - Deploy & Config management tools
● Puppet: DSL / Ruby; Free (up to 10 nodes); Puppet Enterprise; https://puppet.com/
● Chef: DSL / Ruby; Free; Chef Automate / Hosted; https://www.chef.io
● Ansible (Red Hat owned): DSL / Python (agentless); Free; Tower; https://www.ansible.com/
● Saltstack: DSL / Python; Salt Open; Salt Enterprise; https://saltstack.com/
● CFEngine: DSL / C; Community Edition; CFEngine Enterprise; https://cfengine.com/
● Shell Scripts: Bash / sh; Free; https://www.gnu.org/software/bash/
● Packer: DSL / JSON; Free; Builds Images; https://www.packer.io/
Provisioning - Pipeline
Example Pipeline
● Developer #1, #2, #3 (each with)
○ Web & DB server VM
○ Fedora server VM
● Development
○ Web & DB server
○ Fedora repo server
● Production
○ Web & DB server
○ Fedora repo server
● Code Up! Data Down!
○ Package & software updates, system configuration changes, data migrations, re-indexing of triplestore etc.
○ Theming, solution packs, modules, XSLTs, schemas, config etc.
● Continuous Integration w/ Testing Suites for Code & Data
Performance
● Using Solr vs SPARQL/iTQL
○ Collection Solution Pack (Display Generation)
○ Islandora OAI (Query Backend)
○ Paged Content Module (Use Solr to derive pages and sequence numbers)
○ Breadcrumbs (Breadcrumb Generation)
● Breadcrumbs - Disable if not required or use Solr
● Enable Drupal caching options (Configuration - Development - Performance)
● Memcached / Varnish
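To make the "Solr vs SPARQL/iTQL" point concrete, here is roughly what a collection-membership lookup looks like when it goes to Solr instead of the triplestore. This is a sketch, not the query the Collection Solution Pack actually issues: the field names `RELS_EXT_isMemberOfCollection_uri_ms` and `PID` are assumptions that depend on the Gsearch XSLTs and Solr schema in use, and the Solr base URL and collection PID are placeholders.

```python
from urllib.parse import urlencode


def collection_members_url(
    solr_base,
    collection_pid,
    rows=20,
    member_field="RELS_EXT_isMemberOfCollection_uri_ms",  # assumed field name
):
    """Build a Solr select URL listing the PIDs of a collection's members."""
    params = {
        "q": f'{member_field}:"info:fedora/{collection_pid}"',
        "fl": "PID",  # assumed name of the PID field in the index
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder Solr location and collection PID.
    print(collection_members_url("http://localhost:8080/solr", "islandora:root"))
```

The same question asked through the resource index means a SPARQL or iTQL round trip to Fedora for every page view, which is the cost the Solr-backed options on this slide avoid.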
Performance
“(XmlUsersFileModule) null” error
Source: /usr/local/fedora/server/logs/fedora.log
Reference:
https://issues.apache.org/jira/browse/XERCESJ-211
https://jira.duraspace.org/browse/FCREPO-1230
Fix! https://github.com/discoverygarden/fcrepo3-security-jaas
ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null
ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) javax.security.auth.login.LoginException: Login Failure: all modules ignored
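To see how often this error occurs before and after applying the fix, the log can be scanned for it. The sketch below assumes every entry in fedora.log follows the layout of the excerpt above (level, timestamp, thread in square brackets, source in parentheses, message); the function name is hypothetical.

```python
import re

# Layout inferred from the fedora.log excerpt on this slide.
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"\((?P<source>[^)]+)\) "
    r"(?P<message>.*)$"
)


def xml_users_file_errors(lines):
    """Return the timestamps of every '(XmlUsersFileModule) null' error."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if (
            match
            and match["level"] == "ERROR"
            and match["source"] == "XmlUsersFileModule"
            and match["message"] == "null"
        ):
            hits.append(match["timestamp"])
    return hits


if __name__ == "__main__":
    # Path from the slide; adjust if Fedora lives elsewhere.
    with open("/usr/local/fedora/server/logs/fedora.log") as handle:
        timestamps = xml_users_file_errors(handle)
    print(f"{len(timestamps)} occurrence(s)")
```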
Performance
● Help, too many multisites!
○ Islandora installations with Drupal multisites can cause unnecessary database connections.
● Multi-site optimization
○ https://github.com/discoverygarden/fcrepo3-security-jaas
Performance
● Islandora Jobs
○ https://github.com/discoverygarden/islandora_job
○ Faster Ingests
○ Allows you to have multiple Gearman workers processing derivatives.
● Islandora Gsearcher
○ https://github.com/discoverygarden/islandora_gsearcher
○ Updates Solr index upon ingest completion vs waiting for ActiveMQ
Security
● Directory permissions Tomcat/Drupal
● Run services using non-privileged users with no shell.
● Firewalls
○ Fail2ban (https://www.fail2ban.org)
○ Modsec (https://modsecurity.org/)
○ Ports / Rules
● Central logging
○ Syslog
○ Tripwire (https://www.tripwire.com/) (can be used for extended logging in addition to security)
○ ELK (ElasticSearch, Logstash & Kibana) https://logz.io/learn/complete-guide-elk-stack/
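The "non-privileged users with no shell" recommendation is easy to audit from `/etc/passwd`, where the seventh colon-separated field is the login shell. A minimal sketch follows; the function name is hypothetical, and the service account names in the usage example (`tomcat7`, `www-data`) are assumptions that vary by distribution and install method.

```python
# Shells that prevent interactive login.
NO_LOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


def accounts_with_login_shell(passwd_text, service_users):
    """Return {user: shell} for service accounts whose shell allows login."""
    flagged = {}
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        name, shell = fields[0], fields[6]
        if name in service_users and shell not in NO_LOGIN_SHELLS:
            flagged[name] = shell
    return flagged


if __name__ == "__main__":
    with open("/etc/passwd") as handle:
        # Assumed account names; replace with the users your services run as.
        print(accounts_with_login_shell(handle.read(), {"tomcat7", "www-data"}))
```

An empty dictionary means none of the named accounts can log in interactively; an account that does not exist at all is also not reported.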
Best Practices, Gotchas, Tips
● Gsearch issues Tomcat 7.0.72/8.0.39+
○ https://github.com/discoverygarden/gsearch.git
● Try the Islandora Deploy on Ubuntu guide
https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group/blob/master/Deployment%20Guides/Provisioning-Islandora-on-Ubuntu.md
● AWS S3 mounting as a file system
○ https://github.com/danilop/yas3fs
■ Debug mode first!
■ Make sure it re-mounts properly if system is restarted.
■ Gotcha: There may be an object size limit of 60 GB for ingested binaries e.g. video etc.
■ Mount the datastreamStore to S3 and leave objectStore on EBS for better performance
● Caution! Challenges with restoration!
○ Alternative https://bitbucket.org/nikratio/s3ql (same Gotchas apply!)
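For the "make sure it re-mounts properly" tip, a small check run at boot or before Fedora starts can catch a missing S3 mount before anything is written to the bare directory underneath. This is a sketch only: the function name is hypothetical, and the datastreamStore path in the usage example assumes a default Fedora layout under /usr/local/fedora.

```python
import os


def store_problems(path):
    """Return a list of reasons why path is not usable as a mounted store."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    if not os.path.ismount(path):
        return [f"{path} is not a mount point (did the S3 mount come back?)"]
    return []


if __name__ == "__main__":
    # Assumed location of the datastreamStore; adjust to your install.
    problems = store_problems("/usr/local/fedora/data/datastreamStore")
    for problem in problems:
        print(problem)
    raise SystemExit(1 if problems else 0)
```

A non-zero exit status lets an init script or service unit refuse to start Fedora while the mount is absent.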
The future of the stack - Islandora 7.2.x - CLAW
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/user-documentation/intro-to-claw.md
https://github.com/Islandora-CLAW/CLAW/blob/master/docs/mvp/mvp_doc.md
The future of the stack - ISLE
Islandora Enterprise (ISLE)
https://github.com/Islandora-Collaboration-Group
https://islandora-collaboration-group.github.io/
https://islandora.ca/content/islandora-together-meet-islandora-consortial-group
Q&A
Resources
● Islandora http://islandora.ca
● Islandora sandbox https://sandbox.islandora.ca/
● Vagrant up with Islandora Labs! https://github.com/Islandora-Labs/islandora_vagrant
● Please join the growing global community! http://islandora.ca/membership
● Perhaps jump on a call with one of the Islandora Interest groups?
○ https://github.com/islandora-interest-groups
○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group
● One can learn so much from the Islandora Community on Google Groups!
○ https://groups.google.com/forum/?hl=en#!forum/islandora-dev
○ https://groups.google.com/forum/?hl=en#!forum/islandora
Thank you!