Infrastructure and Performance An Islandoracon Workshop Instructors: Gavin Morris & Luke Taylor


Post on 28-Jul-2018



Infrastructure and Performance

An Islandoracon Workshop

Instructors: Gavin Morris & Luke Taylor


Instructor: Luke Taylor

DevOps Team Lead

discoverygarden inc.

155 Queen St. Suite 101

Charlottetown, PE C1A 4B4

discoverygarden.ca

[email protected]


Instructors - Gavin Morris

● Team Lead & Dev Ops at Born Digital

○ http://born-digital.com/

● Convener of the Islandora DevOps Interest Group

○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group

● Led the Islandora DevOps Panel: Building Islandora at the inaugural Islandora Conference (Islandoracon) on Prince Edward Island, Canada

● Presented Automating Islandora Upgrations, Maintenance and Deploys at the Islandora Camp in Hartford, CT


Overview

● Intro to the stack

○ Build Types

○ Split the stack

○ Operating Systems

○ Packages

○ Services / Software

● Provisioning

○ Deploy & Config management tools

○ Pipeline

● Performance

● Scaling

● Security

● Best Practices

● The future of the stack

○ CLAW

○ ISLE

● Q&A

● Resources


Intro to the Stack - What is Islandora?

Source: http://islandora.mnpals.net/pals/islandora/object/PALSrepository%3A412/datastream/OBJ/download/2016-08_Detailed_Islandora_Introduction.pdf


Intro to the Stack - Build Types ALL

All in One

2-3 servers

5-7 servers


Intro to the Stack - Build Types: All in One

Recommended Minimum

● 4-6 cores

● 16GB - 32GB RAM

● 100-200GB for OS, Temp files etc

● Volume large enough for repository data

● Additional space could be required for staging content


Intro to the Stack - Build Types: 2-3 servers

Web & Database Server (Minimum Requirements)

● 1-2 cores

● 4 - 16 GB RAM (*depends on platform type e.g. staging, dev)

● 150-250GB for OS, Temp files etc

● Additional space could be required for staging content

Fedora Repository Server (Minimum Requirements)

● 2-4 cores

● 8 - 32 GB RAM (*depends on collection size)

● 150-250GB for OS, Temp files etc

● Additional space / volume for repository data e.g. 2-20 TB
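
As a rough illustration of the sizing guidance above, the lower bounds from the slides can be written down as data and compared against a candidate host. This is only a sketch: the role names (`all_in_one`, `web_db`, `fedora`) and the `Spec` type are labels invented for the example, and the numbers are the low end of each range given in the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spec:
    cores: int
    ram_gb: int
    os_disk_gb: int  # space for OS, temp files etc. (repository data is extra)


# Low end of each range from the slides; role names are hypothetical labels.
MINIMUMS = {
    "all_in_one": Spec(cores=4, ram_gb=16, os_disk_gb=100),
    "web_db": Spec(cores=1, ram_gb=4, os_disk_gb=150),
    "fedora": Spec(cores=2, ram_gb=8, os_disk_gb=150),
}


def shortfalls(role: str, host: Spec) -> List[str]:
    """Return the names of the resources where `host` is below the minimum."""
    minimum = MINIMUMS[role]
    return [
        name
        for name in ("cores", "ram_gb", "os_disk_gb")
        if getattr(host, name) < getattr(minimum, name)
    ]


if __name__ == "__main__":
    # A Fedora server candidate with too little RAM.
    print(shortfalls("fedora", Spec(cores=2, ram_gb=4, os_disk_gb=200)))
```

The check deliberately ignores repository storage, since the slides size that by collection rather than by a fixed minimum.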


Intro to the Stack - Build Types: 2-3 servers


Intro to the Stack - Build Types: 5-7 servers

Public Front End

Read-only Fedora, DB server, Blazegraph, Solr, Read-write Fedora

Staff/Ingest Front End

Storage mount (e.g. NFS)


Intro to the Stack - Split the Stack

● Remote Solr

○ Use Gsearch 2.8+

○ Edit fgsindex.indexBase in index.properties in Gsearch.

○ Still have to maintain a “dummy” index on the Gsearch server.

● Blazegraph

○ Used to replace Mulgara (Triplestore) for performance and stability gains

○ https://github.com/discoverygarden/trippi-sail

○ https://github.com/Smithsonian/trippi-sparql
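
The remote Solr step above comes down to changing one property in Gsearch's index.properties. A minimal sketch of that edit in Python follows; the helper name `set_property` and the host `solr.example.org` are assumptions made for illustration, and the sketch only handles simple `key = value` lines.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value` (appended if absent)."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:][ \t]*).*$", re.MULTILINE
    )
    if pattern.search(text):
        # Keep the original "key = " prefix and swap only the value.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    return text.rstrip("\n") + "\n" + key + " = " + value + "\n"


if __name__ == "__main__":
    original = "fgsindex.indexBase = http://localhost:8080/solr\n"
    # solr.example.org is a placeholder for the remote Solr host.
    print(set_property(original, "fgsindex.indexBase",
                       "http://solr.example.org:8080/solr"))
```

Working on the file contents as a string keeps the sketch independent of where index.properties lives in a given Gsearch install.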


Intro to the Stack - Operating Systems

Current Stable Recommendations

● Ubuntu 14.04 LTS

● RHEL/CentOS 6.9

Needing more definitive testing

● Ubuntu 16.04 LTS (w/PHP7)

● RHEL/CentOS 7 (challenges with temporary file system)

Community Poll from Melissa Anez (Have your say!)

● Survey https://docs.google.com/forms/d/1E7NmS4944LD3E51A7SK_8MiNoOWCPnjgY8YWOUC7I-o

● Google Group topic

https://groups.google.com/forum/?hl=en#!searchin/islandora/php$20testing|sort:relevance/islandora/WftNSPr7Xi0/vlh6eJUbAwAJ


Intro to the Stack - Operating Systems packages (basic)

man vim curl perl unzip automake subversion kernel-headers

gcc zip dkms bzip2 openssh mercurial pkg-config build-essential

git wget htop cmake libtool apt-utils kernel-devel libfreetype6-dev

ntp yasm nasm rsync autoconf zlib1g-dev linux-headers Development tools
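
The package list above mixes Debian/Ubuntu names (e.g. build-essential, zlib1g-dev) with RHEL/CentOS names (e.g. kernel-devel, the "Development tools" group). One way to keep the two apart, sketched here with only a subset of the packages, is to build the install command per OS family. The split into `COMMON`, `DEBIAN_ONLY` and `RHEL_ONLY` is this example's own grouping, not something the slides specify.

```python
from typing import List

# Subset of the slide's list; the grouping is an assumption of this sketch.
COMMON = ["vim", "curl", "unzip", "git", "wget", "htop", "rsync", "ntp"]
DEBIAN_ONLY = ["build-essential", "pkg-config", "libfreetype6-dev", "zlib1g-dev"]
RHEL_ONLY = ["kernel-headers", "kernel-devel"]


def install_commands(family: str) -> List[List[str]]:
    """Return the install commands (as argument lists) for an OS family."""
    if family == "debian":
        return [["apt-get", "install", "-y"] + COMMON + DEBIAN_ONLY]
    if family == "rhel":
        return [
            ["yum", "groupinstall", "-y", "Development tools"],
            ["yum", "install", "-y"] + COMMON + RHEL_ONLY,
        ]
    raise ValueError("unknown OS family: " + family)


if __name__ == "__main__":
    for command in install_commands("rhel"):
        print(" ".join(command))
```

The commands are returned as argument lists so they can be printed for review or handed to `subprocess.run` unchanged.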


Intro to the Stack - Services / Software

● Apache 2.2 - 2.4 Web server

○ Modules include but are not limited to

■ ssl, rewrite, deflate, headers, expires, xml2enc

■ reverse proxy for multi-systems:

● proxy, proxy_http, proxy_html, proxy_connect

● Databases

○ Mysql 5.5+

○ Percona

○ Mariadb

○ Postgres

○ Recommend UTF-8 encoding


Intro to the Stack - Services / Software

● Tomcat 7.0.52 +

○ Oracle Java JDK or OpenJDK 7/8

○ SSL & port 8443

■ Will need to compile own jks/P12/truststore (how to automate?)

○ see Gotcha section re versions above 7.0.72/8.0.39+

● Apache Solr

○ versions 4.2, 4.6.1, 4.10

○ Don’t use the Gsearch Ant-generated schema: it is not complete (missing catch_all entries etc.)

○ A helpful starting point for the schema.xml & solrconfig.xml files:

https://github.com/discoverygarden/basic-solr-config
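
The Tomcat version bounds on this slide (7.0.52 as the floor, and the Gsearch gotcha from 7.0.72 / 8.0.39 onward) can be expressed as a small check. This is a sketch under assumptions: it treats the gotcha thresholds as inclusive and assumes every 8.x release from 8.0.39 up is affected, which the slides do not spell out. The status strings are invented for the example.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.0.72' into (7, 0, 72) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def tomcat_status(version: str) -> str:
    """Classify a Tomcat version against the bounds given in the slides."""
    v = parse_version(version)
    if v < (7, 0, 52):
        return "below-minimum"
    if v[0] == 7:
        return "gsearch-gotcha" if v >= (7, 0, 72) else "ok"
    # Assumption: the 8.0.39+ note applies to all later 8.x releases too.
    return "gsearch-gotcha" if v >= (8, 0, 39) else "ok"


if __name__ == "__main__":
    for candidate in ("7.0.52", "7.0.72", "8.0.38"):
        print(candidate, tomcat_status(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where "7.0.9" would sort after "7.0.72".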


Intro to the Stack - Services / Software

● PHP 5.3.x+

○ Drupal 7.54

■ Islandora 7.x / HEAD modules

■ Additional modules e.g. ctools, imagemagick, date, views etc.

○ Composer

■ Drush

● Fedora-Commons 3.8.1

○ Triplestore (mulgara, Blazegraph)

● Fedoragsearch HEAD / 2.7.1

○ DGI GSearch Extensions https://github.com/discoverygarden/dgi_gsearch_extensions

○ XSL Transforms for Gsearch https://github.com/discoverygarden/islandora_transforms


Intro to the Stack - Services / Software

● Binaries, Derivative generation

○ Imagemagick

○ LAME (audio, mp3 etc)

○ FFMPEG (video) from source 3.3

○ FITS

○ EXIF

○ XPDF

○ Ghostscript 9.05 (from source)

○ Tesseract (OCR)

○ Adore-djatoka 1.1

■ On multi-system setups libraries should be additionally installed on web servers

■ Requires use of Oracle JDK 7/8


Provisioning - Deploy & Config management tools

Puppet | DSL / Ruby | Free (up to 10 nodes) | Puppet Enterprise | https://puppet.com/

Chef | DSL / Ruby | Free | Chef Automate / Hosted | https://www.chef.io

Ansible (Red Hat owned) | DSL / Python (agentless) | Free | Tower | https://www.ansible.com/

Saltstack | DSL / Python | Salt Open | Salt Enterprise | https://saltstack.com/

CFEngine | DSL / C | Community Edition | CFEngine Enterprise | https://cfengine.com/

Shell Scripts | Bash / sh | Free | - | https://www.gnu.org/software/bash/

Packer | DSL / JSON | Free | Builds Images | https://www.packer.io/


Provisioning - Pipeline

Example Pipeline (diagram)

● Developer #1, #2, #3: each with a Web & DB server VM and a Fedora server VM

● Development: Web & DB server, Fedora repo server

● Production: Web & DB server, Fedora repo server

● Code Up! Data Down!

● Package & software updates, system configuration changes, data migrations, re-indexing of triplestore etc.

● Theming, solution packs, modules, XSLTs, schemas, config etc.

● Continuous Integration w/ Testing Suites for Code & Data


Performance

● Using Solr vs SPARQL/iTQL

○ Collection Solution Pack (Display Generation)

○ Islandora OAI (Query Backend)

○ Paged Content Module (Use Solr to derive pages and sequence numbers)

○ Breadcrumbs (Breadcrumb Generation)

● Breadcrumbs - Disable if not required or use Solr

● Enable Drupal caching options (Configuration - Development - Performance)

● Memcached / Varnish
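
To make the "Solr instead of SPARQL/iTQL" point concrete, here is a sketch of how a collection's members could be fetched from Solr with a plain select query rather than a triplestore query. It only builds the request URL. The field name `RELS_EXT_isMemberOfCollection_uri_ms` is an assumption about the Solr schema in use (check the site's own schema and Gsearch transforms), and the Solr base URL is a placeholder.

```python
from urllib.parse import urlencode


def collection_members_url(
    solr_base: str,
    collection_pid: str,
    rows: int = 20,
    start: int = 0,
    member_field: str = "RELS_EXT_isMemberOfCollection_uri_ms",  # assumed field
) -> str:
    """Build a Solr select URL listing the PIDs of a collection's members."""
    # Quote the URI so the colons inside it are not read as field separators.
    query = '{}:"info:fedora/{}"'.format(member_field, collection_pid)
    params = {"q": query, "fl": "PID", "rows": rows, "start": start, "wt": "json"}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder Solr location; islandora:root is the usual top-level collection.
    print(collection_members_url("http://localhost:8080/solr", "islandora:root"))
```

Paging with `rows` and `start` is the part that tends to matter for display generation, since the front end only needs one page of members at a time.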


Performance

“(XmlUsersFileModule) null” error

Source: /usr/local/fedora/server/logs/fedora.log

Reference:

https://issues.apache.org/jira/browse/XERCESJ-211

https://jira.duraspace.org/browse/FCREPO-1230

Fix! https://github.com/discoverygarden/fcrepo3-security-jaas

ERROR 2017-03-10 08:56:54.796 [http-8080-21] (XmlUsersFileModule) null

ERROR 2017-03-10 08:56:54.805 [http-8080-21] (AuthFilterJAAS) javax.security.auth.login.LoginException: Login Failure: all modules ignored
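
To gauge how often this error occurs before and after applying the fix, the log can be scanned for it. The sketch below counts “(XmlUsersFileModule) null” errors per day; the regular expression is inferred from the two sample lines above, so other fedora.log layouts may need a different pattern, and the function name is invented for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Layout inferred from the sample lines: LEVEL DATE TIME [thread] (module) message
LINE = re.compile(
    r"^(ERROR|WARN|INFO|DEBUG)\s+(\d{4}-\d{2}-\d{2}) \S+ \[[^\]]+\] \(([^)]+)\) (.*)$"
)


def count_null_login_errors(lines: Iterable[str]) -> Counter:
    """Count '(XmlUsersFileModule) null' errors, keyed by date."""
    per_day = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        level, day, module, message = match.groups()
        if level == "ERROR" and module == "XmlUsersFileModule" and message.strip() == "null":
            per_day[day] += 1
    return per_day


if __name__ == "__main__":
    # Path taken from the slide; adjust to the local Fedora install.
    with open("/usr/local/fedora/server/logs/fedora.log") as handle:
        for day, total in sorted(count_null_login_errors(handle).items()):
            print(day, total)
```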


Performance

● Help, too many multisites!

○ Islandora installations with Drupal multisites can cause unnecessary database connections.

● Multi-site optimization

○ https://github.com/discoverygarden/fcrepo3-security-jaas


Performance

● Islandora Jobs

○ https://github.com/discoverygarden/islandora_job

○ Faster Ingests

○ Allows you to have multiple Gearman workers processing derivatives.

● Islandora Gsearcher

○ https://github.com/discoverygarden/islandora_gsearcher

○ Updates Solr index upon ingest completion vs waiting for ActiveMQ


Security

● Directory permissions Tomcat/Drupal

● Run services using non-privileged users with no shell.

● Firewalls

○ Fail2ban (https://www.fail2ban.org)

○ Modsec (https://modsecurity.org/)

○ Ports / Rules

● Central logging

○ Syslog

○ Tripwire (https://www.tripwire.com/) (can be used for extended logging in addition to security)

○ ELK (ElasticSearch, Logstash & Kibana) https://logz.io/learn/complete-guide-elk-stack/
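
For the directory permissions item above, one simple audit is to look for anything under the Tomcat or Drupal directories that any user on the system can write to. The sketch below walks a directory tree and reports world-writable files and directories. It assumes a POSIX system, and the function name and the example path are illustrative choices, not part of the slides.

```python
import os
import stat
from typing import List


def world_writable(root: str) -> List[str]:
    """Return paths under `root` whose mode grants write access to 'other'."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # symlink modes are not meaningful; skip them
            if mode & stat.S_IWOTH:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Example path only; point this at the local Drupal or Tomcat directory.
    for path in world_writable("/var/www"):
        print(path)
```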


Best Practices, Gotchas, Tips

● Gsearch issues with Tomcat 7.0.72/8.0.39+

○ https://github.com/discoverygarden/gsearch.git

● Try the Islandora Deploy on Ubuntu guide

https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group/blob/master/Deployment%20Guides/Provisioning-Islandora-on-Ubuntu.md

● AWS S3 mounting as a file system

○ https://github.com/danilop/yas3fs

■ Debug mode first!

■ Make sure it re-mounts properly if system is restarted.

■ Gotcha: There may be an object size limit of 60 GB for ingested binaries e.g. video etc.

■ Mount the datastreamStore to S3 and leave objectStore on EBS for better performance

● Caution! Challenges with restoration!

○ Alternative https://bitbucket.org/nikratio/s3ql (same Gotchas apply!)
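
Given the possible 60 GB object size limit noted above, it may be worth scanning staged content for oversized binaries before ingest. The following is a minimal sketch: it assumes the limit is 60 GiB (the slide says "60 GB" and itself hedges that the limit "may" exist), and the function name and staging path are hypothetical.

```python
import os
from typing import List, Tuple

# Assumption: "60 GB" read as 60 GiB; adjust if the real limit differs.
LIMIT_BYTES = 60 * 1024 ** 3


def oversized_files(root: str, limit_bytes: int = LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (path, size) for every file under `root` larger than the limit."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical staging directory for content awaiting ingest.
    for path, size in oversized_files("/mnt/staging"):
        print(path, size)
```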


The future of the stack - ISLE

Islandora Enterprise (ISLE)

https://github.com/Islandora-Collaboration-Group

https://islandora-collaboration-group.github.io/

https://islandora.ca/content/islandora-together-meet-islandora-consortial-group


Q&A


Resources

● Islandora http://islandora.ca

● Islandora sandbox https://sandbox.islandora.ca/

● Vagrant up with Islandora Labs! https://github.com/Islandora-Labs/islandora_vagrant

● Please join the growing global community! http://islandora.ca/membership

● Perhaps jump on a call with one of the Islandora Interest groups?

○ https://github.com/islandora-interest-groups

○ https://github.com/islandora-interest-groups/Islandora-DevOps-Interest-Group

● One can learn so much from the Islandora Community on Google Groups!

○ https://groups.google.com/forum/?hl=en#!forum/islandora-dev

○ https://groups.google.com/forum/?hl=en#!forum/islandora


Thank you!