architecting and tuning iib/extreme scale for maximum performance and reliability, featuring tbc

CONNECT WITH US:

Architecting & Tuning IIB / eXtreme Scale for Maximum Performance and Reliability

AJ Aronoff Infrastructure Practice Director, Prolifics

Robert Gus Fort

Manager, Integration Development

CONNECT WITH US:

Agenda

Speaker Introduction

Challenge: The need for Speed & Scalability Extreme Scale Basics

Use Cases

Performance Tuning Measurement - Bottlenecks

Infrastructure

Maximizing Reliability and Reducing Downtime

References & What’s Next

2

CONNECT WITH US:

Speaker Introduction

A.J. Aronoff Infrastructure Practice Director,

Prolifics

AJ Aronoff is a Practice Director covering The IBM Integration Product Suite

(IIB, Extreme Scale, MQ, WAS, SmartCloud APM, Data Power) for Prolifics.

During his twenty plus years at Prolifics he has focused on reliability (Zero

Downtime, High Availability, Disaster Recovery), scalability, performance and

monitoring. He is a frequent speaker on IIB, MQ, Security as well as industry

specific topics.

Robert Gus Fort Manager, Integration Development

Prolifics Customer

Gus Fort learned about the need for speed and 100% reliability in

the U.S. Air Force. Since then he has been applying that same

quest for reliability and performance in his current role with a leading

automotive services retailer. His current projects use eXtreme Scale

to give customers a satisfying and rewarding experience on a high

speed and reliable system. Nearly doubling the number of stores

while reducing risk and increasing customer satisfaction

3

CONNECT WITH US: 4

5-Years Compound

Annual Growth Rate

19%

Employees

Worldwide

1,500

Global Presence

United States, United Kingdom, Germany, India

20+ Technology

Accelerators

550+ Technical

Certifications

Over 10 Technology and

Solutions Awards since 2009

including Business Agility,

Customer Integration and Digital

Experience, the first ever

Beacon Laureate for Business

Agility

Over 160 global customers

are currently Fortune

1000 companies

Best-in-class architects and

specialty experts:

BPM, Integration, Digital

Experience, Security, Testing,

Business Analytics and

Enterprise Content Management

End-to-End Project

Expertise

Rate of Repeat

Engagements*

91%

Prolifics at a Glance

Years in

Business

35+

Offices

14

Awards

Technology Expertise

Fortune 1000

*based on % revenue Source – December 2013 internal revenue metrics

CONNECT WITH US:

Extreme Scale Caching & The Need for Speed

“As the world becomes more instrumented, interconnected and intelligent, Internet-based activities, online

transactions and data volumes increase. Further, these increasing amounts of data, along with rising consumer

expectations and the need to maintain a competitive edge require fast and reliable performance…. Elastic caching is

the answer.”

- IBM Software Technical White Paper

- ftp://ftp.software.ibm.com/software/au/data/pdf/Elastic_CachingWP.pdf

3/5/2015 5

ftp://ftp.software.ibm.com/software/au/data/pdf/Elastic_CachingWP.pdf

CONNECT WITH US:

Challenge: The Need For Speed & Scalability

6

• How scalable is your System?

• When the peak load doubles does response time:

• A) stay the same, B) double C) quadruple D) fail

• Have you tested your system to destruction?

• How high can the load go before failure

• Where are the bottlenecks and single points of failure

• Focus on Extreme Scale

• Benefit - the approach taken for measurement,

monitoring, bottleneck identification will work on many

systems.

CONNECT WITH US:

Extreme Scale Basics: Traditional Cache Operation

App

App

App

App

EIS

A

A

A

A

App

Inva

lida

tion

ch

atte

r

New Server with

cold cache

Redundant copies of data at

different versions

Invalidation load

increases with

cluster size

High load on EIS

Inva

lida

tion

ch

atte

r

• Traditional in JVM cache

• Cache capacity determined by

individual JVM Size

• Invalidation load per server

increases as cluster grows

• Cold start servers hit the

information system even when

data is cached in cluster

• Lower performance as load

increases due to invalidation

chatter

• No redundancy of cached data

CONNECT WITH US:

Basics: WXS based Cache Operation

App

App

App

App

EIS

A

B

D

C

C’

D’

A

A

App A

B’

A’ Cache is

4x larger!

Cache is

5x larger!

• Cluster Coherent cache

• Cache capacity determined by total

cluster size

Size of each cache = M

# JVMs = N

Total Cache = M x N

• No invalidation chatter

• Linearly scalable

• Less load on database and

no cold start spikes

• Predictable performance as load

increases

• Cached data can be stored

redundantly

Cache cluster can be co-

located with the application or

run in it’s own tier.

CONNECT WITH US:

Extreme Scale Basics: How it Works

9

CONNECT WITH US:

Extreme Scale Basics: Architecture

10

Back End Services

Elastic Cache

Application or IIB Tier

Web Server Tier

CONNECT WITH US:

Use Case 1: Universal Application State

11

Single replacement for multiple local caches

Consistent response times

Reduces Application Server JVM heap size

Improved memory utilization - more memory for applications

Faster Application Server start-up

Removes invalidation chatter of local caches

Applications move application state to grid

Stateless applications scale elastically

Application state can be shared across data

centers for high availability

CONNECT WITH US:

Use Case 2: Side Cache Pattern

12

Client first checks the grid before using the data access layer to connect to a back end data store

If an object is not returned from the grid (a cache “miss”), the client uses the data access layer as usual to retrieve the data

The result is put into the grid to enable faster access the next time

The back end remains the system of record, and usually only a small amount of the data is cached in the grid

An object is stored only once in the cache, even if multiple clients use it

Thus, more memory is available for caching, more data can be cached, which increases the cache hit rate

Improve performance and offload unnecessary workload on backend systems

CONNECT WITH US:

Use Case 2: ESB - Side Cache Pattern

13

Easily integrates into the existing business process No code changes to the client application or

backend application Simply add the side cache mediation at the ESB

layer

Significantly reduces the load on the back-end system by eliminating redundant requests Eliminates costly MIPS by eliminating redundant

request Allows for more “REAL” work to be performed Improves overall response time Minimizes the need to scale hardware to

increase processing capacity since the back-end system no longer has to handle redundant requests

Response time from elastic cache is in milliseconds

CONNECT WITH US:

Use Case 2: ESB - Side Cache Pattern

14

Product Search Existing performance without cache

Targeted SLA

Full Load of cache data

Delta processing

Crash preparation

CONNECT WITH US:

Performance Tuning - Measurement

15

Several useful tools include: For MQ performance - the MS0P Support Pac - allows us to

see which Queues had buildup

For IIB WebAdmin - interfaces with the Flow Statistic to identify nodes in a flow indentifying slow (high elapse time), or inefficiency from a CPU usage point of view

For DP, am AJAX Web "Dashboard" - helps us to track Overall CPU, TPS, Memory and specific MPG nodes and their respective elapse times

Captured CPU time for all servers, the TPS rate (end to end as well as per product) and by performing multiple runs, using the IBM Performance harness

CONNECT WITH US:

SupportPac MS0P: MQ Explorer plug-ins

A variety of WMQ Explorer plug-ins are provided Format event messages into human-readable text

Trace route function to determine the path messages take

Export WMQ Explorer display panels to CSV

Remote management of Windows, UNIX and Linux systems

Utilities Qtune

MQIDecode turns codes into symbolic names

OAM logging utility records PCF commands sent to CMDSVR

16

CONNECT WITH US:

SupportPac MS0P: Event Message Context Menu

After installing SupportPac MS0P, new context menu items appear in WebSphere MQ Explorer. The "Format Event Message" option shows up when right-clicking on an event queue.

17

CONNECT WITH US:

SupportPac MS0P: GUI Display

18

CONNECT WITH US:

Details of Extreme Scale running inside WAS 8.5 Use PMI for monitoring

WebSphere Application Server provides a sophisticated performance monitoring infrastructure (PMI). It can be used to gather performance statistics of all components inside an application server. WebSphere eXtreme Scale takes advantage of this infrastructure to collect performance statistics relevant to the grid. Metrics are grouped within the following PMI modules:

objectGridModule

Provides metrics about transaction response time.

mapModule

Provides metrics about map and index count and time statistics. These include hit rate per map, # of entries per map, # of results per index

agentManagerModule

Provides statistics related to map-based agents. These include number of agent executions, number of partitions involved, time required to run map and reduce operations, time required to serialize agents, and related results.

queryModule

Provides statistics about query processing and execution. These include the amount of time to create a query plan and run the query, the number of times that a query has been run, and so on.

19

CONNECT WITH US:

Infrastructure: Avoid Single Points of Failure

20

Fast network: 10 gigabit +

Multiple Nics

Nearby Catalogs

Fast Disk Drives: The new slid state drives (SSDs) are incredible.

Trace route: On multiple occasions we have found very unexpected network delays. Measure early and often

CONNECT WITH US:

Erasmus: Prevention is Better than Cure

WXS 8.6 makes it easier to recover from crashes.

Log analysis was done exploring the cause of those crashes

The next few slides show how WAS 8.5 can

prevent some of those crashes

21

CONNECT WITH US:

WebSphere Application Server V8.5

Three primary goals:

Provide intelligent management and enhanced resiliency for your application server environment

Improve operation, security, control

Integration of the application server - providing a fast, flexible, and simplified application development environment that allows you to deliver rich user experience faster

22

CONNECT WITH US:

Intelligent management

23

CONNECT WITH US:

Health Management

24

Automatically detect and handle application health problems

Without requiring administrator time, expertise, or intervention

Intelligently handle health issues in a way that will maintain continuous availability

Each health policy consists of a condition, one or more actions, and a target set of processes

Includes health policies for common application problems

Customizable health conditions and health actions

WebSphere Application Server Version 8.5 can monitor servers for common health problems and take corrective action. There are several health conditions that can be defined using health policies. When a violation of a health policy is detected, an action plan can be put into effect automatically – including sending email, capturing diagnostic data, and restarting the application server. Application server restarts are smart and done in a way to prevent outage and service policy violations.

CONNECT WITH US:

Benefits of Extreme Scale running inside WAS 8.5

Catalog and container servers as managed application servers

Catalog service domain configuration

Simplified connection to the grid

Simplified grid container management

Simplified administration and management

Use PMI for monitoring

25

CONNECT WITH US:

Details of Extreme Scale running inside WAS 8.5

Catalog and container servers as managed application servers

Catalog and container servers are defined as standard application server instances within a WebSphere Application Server cell. They receive all the health/resiliency benefits of running inside WAS 8.5

For high availability reasons, use dedicated clusters for catalog and container servers. Best results can be achieved by having at least three catalog server instances that are in different physical locations

Also, use a distinct WebSphere cluster to host the grid infrastructure (container servers) to decouple it from catalog servers and applications

Catalog service domain configuration

Catalog service domains define a group of catalog servers that are associated to a grid infrastructure. Catalog service domains can be easily defined as administered resources within a WAS ND environment

Simplified connection to the grid

In a stand-alone environment, connecting to the grid requires knowledge of the host name and port of the catalog servers. In an integrated environment, this requirement can be simplified by using a catalog service domain. More specifically, the catalog service domain can be queried to obtain the list of catalog service endpoints that client uses to establish a connection to the catalog service

26

CONNECT WITH US:

Details of Extreme Scale running inside WAS 8.5

Simplified grid container management

Catalog and container servers are Grid configurations can be packaged within a Java Platform, Enterprise Edition module (Web or EJB module) and deployed as a standard enterprise archive (EAR) file. When a Java Platform, Enterprise Edition application starts, WebSphere eXtreme Scale checks for grid configuration files (objectGrid.xml and objectGridDeployment.xml) within the META-INF folder of the EJB and WEB modules. If only objectGrid.xml is found, the application server is assumed to be a grid client. If both objectGrid.xml and objectGridDeployment.xml are present, the application server acts as a container by implementing the specified deployment policies

Simplified administration and management

Catalog service domains define a group of catalog servers that are associated to a grid infrastructure. Catalog service domains can be easily defined as administered resources within a WebSphere Application Server Network Deployment environment

27

CONNECT WITH US:

Monitoring helps find problems

Monitor for operational exceptions and to maintain health of the system:

Expand the default size of error logs

Check error logs for exceptions

Archive error logs for forensic analysis

Alert on messages in DLQ or error queue

Distinguish between operational vs. Application alerts

28

CONNECT WITH US:

FFSTSummary

Why learn to use the tools that summarize FFST and error files?

“Because that’s where the errors are.”

FFST files contain useful information on serious problems http://hursleyonwmq.wordpress.com/2007/05/04/introduction-to-ffsts/

What is FFST? FFST stands for First Failure Support Technology, and is

“designed to create detailed reports for IBM Service with information about the current state of a part of a queue manager together with historical data”

FFSTSummary is a tool that produces one line summaries of these serious problems, arranged in chronological order

29

CONNECT WITH US:

Lesson Learned

Install and Use the best tools for measurement Identify bottlenecks (flows, queuries)

Use the best available hardware: Solid State Drives

Fast network: 10 gigabit +

Multiple Nics

Tune one thing at a time

It is a capital crime to theorize in advance of your data

30

CONNECT WITH US:

Learn More… Prolifics at InterConnect

31

Monday How Broadcast Music, Inc.

Devised and Enabled Enterprise

Architecture from Corporate

Strategy

12:15 PM - 1:15 PM

Integrating Salesforce.com and

Oracle ERP Using IBM WebSphere

Cast Iron

2:00 PM - 3:00 PM

Recommended Design

Considerations for Enterprise

Monitoring using SCAPM and

Netcool OMNIbus

5:00 PM - 6:00 PM

Tuesday Smarter Integration Using the IBM

SOA Foundation Stack: Best

Practices and Lessons Learned

8:00 AM - 9:00 AM

Best Practices for Monitoring Your

Cloud Environment and

Applications

9:30 AM - 10:30 AM

Applicability of IBM SOA Approach

In Manual Processes Automation

11:30 AM-11:50 AM

Leveraging Governance in the

IBM WebSphere Service Registry

and Repository for IIB and

DataPower

12:30 PM - 1:30 PM

Broadcast Music Inc. Release

Rockstars: Program-Wide DevOps

Success with UrbanCode Deploy

3:30 PM - 4:30 PM

Empowering SmartCloud APM -

Predictive Insights and Analysis: A

Use Case Scenario

5:30 PM - 6:30 PM

Wednesday Architecting and Tuning

IIB/eXtreme Scale for Maximum

Performance and Reliability,

Featuring TBC

8:00 AM - 9:00 AM

MasterCard's Modeling and

Governance of Decisions and

Processes for Improved Fraud

11:00 AM - 12:00 PM

How BMI is Revolutionizing the

Music Business Using IBM’s BPM

and Integration Technology

2:00 PM - 3:00 PM

Integrating IBM Pure Application

Systems and IBM Urbancode

Deploy: A GE Capital Case Study

2 :00 PM – 3:00 PM

Thursday Aetna’s Vision for a Healthier

World: Smarter Architecture and a

Scalable Integration Bus

9:00 AM - 10:00 AM

Meet the Expert - Delivering

Enterprise Applications: Faster.

Cheaper. Better

Thursday 12:00 PM – 12:50 PM

Using the Power of IBM Tivoli

Common Reporting to Make Smart

Decisions: The Untold Story

2:30 PM - 3:30 PM

CONNECT WITH US:

Let’s Continue the

Conversation….

A.J. Aronoff

[email protected]

Visit these useful links on the Prolifics website: Case Studies http://www.prolifics.com/resources/case-studies

Webcasts http://www.prolifics.com/resources/webcasts

Videos http://www.prolifics.com/resources/videos

Solution Briefs http://www.prolifics.com/resources/solution-briefs

Blog http://www.prolifics.com/blog

Twitter http://www.twitter.com/prolifics

Facebook http://www.facebook.com/ProlificsTech

Prolifics TV http://www.youtube.com/prolificstv

CONNECT WITH US:

Thank You

Your Feedback is

Important!

Access the InterConnect 2015

Conference CONNECT Attendee Portal

to complete your session surveys from

your smartphone, laptop or conference

kiosk.

architecting and tuning iib/extreme scale for maximum performance and reliability, featuring tbc

Technology

extreme scale benefit

tuning iib extreme scale

customer integration

high speed

prolifics aj aronoff

reliable performance

maximum performance

business agility