The John Hancock Monitoring Story, FutureStack17

We operate as John Hancock in the United States, and Manulife in other parts of the world.

The John Hancock Monitoring Story:

Implementation OR Adaptation?

What does it take to succeed with New Relic?

September 2017


Navpreet Singh, Head of Technical Resolution at John Hancock


Manulife & John Hancock

Source: http://www.manulife.com/Our-Story

A global company

22 million customers, 35,000 employees, 70,000 agents, thousands of distribution partners

Global Assets Under Management and Administration exceeded $1 trillion in the first quarter of 2017



Technology Landscape @ John Hancock

150 Year-old Business

Early IT Adopter

Using mainframe, cloud, and everything in between

Mainframe, COBOL, Microfocus…

Serverless, Microservices, In Cloud

VB, PB, Progress, VFP…

Java, .Net, Ruby, Node, Angular, React, PHP…

Windows, Linux, Solaris, AIX…

SQL Server, Oracle, DB2, MySQL…

…And every version of these!


Technology Landscape @John Hancock

600+ applications developed both in-house and with vendors

Hosted on multiple models

Thousands of IT/IS professionals


The Manulife/John Hancock Reality

Before New Relic

Disparate Monitoring Solutions

Many different approaches to monitoring applications

No monitoring software for many applications

Basic hardware monitoring for ops and vendors

But… applications talk to each other all the time!

Result: Large holes in end-to-end monitoring


Example Scenarios

Web page loading slow

Batch process running slow

Don’t know whether it is a CPU, RAM, disk, SQL, or app issue

Dev team can only access app logs; can’t capture CPU/RAM usage

Need server admin & DBA

Meet Service admin to capture CPU/RAM usage

Wait for assigned admins to respond

Takes hours to days just to obtain data before troubleshooting

Performance Issues


Example Scenarios

Web page errors

App layer / Business layer errors

SQL errors

Dev team uses app logs; limited insight

Need to bring the issue to lower regions and debug the code

Time-consuming exercise; lack of real-time trace: Web page -> App component -> SQL invoked from App

Lack of thread-level tracing detail for performance issues

Need architect / admins

Application Errors


Increased Priority Incidents = Need for Better Monitoring

Move from reactive to proactive

We needed a central monitoring standard

Resolve issues quickly

Improve understanding of application behavior

Improve visibility into applications in production

Enter New Relic


We’re All a Product of Our Environment!

What Else Was Happening When New Relic Was Being Introduced?

What Else Was Happening?

Move to Cloud

Predominantly Azure IaaS with some PaaS, App Service

Some AWS

Move to Agile

Largely Scrum, SAFe with some advanced concepts like TDD+Pairing

Push to DevOps

New Relic push aligns with DevOps and Agile

CIO/COO sets a Clear Goal!

An aggressive, clear, & unambiguous goal:

All applications in Production must be monitored by New Relic within one year

What’s Next?

What’s the right Team Structure?

Who should Own monitoring setup and responsibilities?


Monitoring Ownership

Goal: End-to-end monitoring solution which spans tiers, hardware, and software

Monitoring Servers

Ops team has clear ownership

Monitoring Applications

Not so clear


Monitoring Ownership options

1. A specialized central monitoring team focused on application monitoring

2. Ops team owns all monitoring, drives it with the application teams

3. Each app team owns setting up monitoring


Our Ownership Solution at JH: It’s a Hybrid!

Each app team owns setting up monitoring for their applications

Center of Excellence set up to drive the effort

Culture change – very important. This distinguishes adaptation from a simple software implementation

For one BU with 100+ apps, a central monitoring team was established within the BU


Engagement Methodology with App Teams

1st set of Meetings:

New Relic Buy-in

2nd set of Meetings:

App’s Tech

Proposal:

App + New Relic = Great Things!

Periodic Check-ins

Adaptation: Best Practices & Suggestions

Culture Change

Get Buy-In

Highlight the Wins & Success Stories to Top Leadership

Nurture an Internal Community

Monitoring Maturity Curve

Different types of monitoring

Alerts – Getting them right

Insights – IT Analytics

Insights – Business Analytics


Agile mindset to the project

Bias towards action

Don’t sit in a room discussing / researching until you know all the answers

Figure out enough to get started, start executing, find answers in the process – Inspect and Adapt


Progress Shared monthly with all Senior IT Leaders

Metrics showed:

# of users

Growth over a period:

% Apps by Status

Monthly growth by BU

Metrics Highlighted to Track Progress

Agent Type                               Min. Contracted   Apr    May    Jun
APM (Application Performance Monitors)   264               61     98     126
Servers                                  Unlimited         575    675    725
Mobile Apps                              250,000           0      0      298
Browser (Million Checks)                 75                1.5    8.3    11
Synthetic* (Million Checks)              1.5               1.4    1.4    0.7
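
The usage figures above can be turned into the two numbers a monthly report tends to need: how much of the contracted amount is in use, and how fast usage is growing. The sketch below is only an illustration. It assumes "Min. Contracted" is the amount to compare usage against, it leaves out Servers (unlimited) and Mobile Apps (zero in April, so growth is undefined), and the names `usage`, `utilization`, and `growth` are hypothetical.

```python
# Figures copied from the slide's usage table (Apr, May, Jun).
# Assumption: "Min. Contracted" is the baseline to measure usage against.
usage = {
    "APM": {"contracted": 264, "monthly": [61, 98, 126]},
    "Browser (million checks)": {"contracted": 75, "monthly": [1.5, 8.3, 11]},
    "Synthetic (million checks)": {"contracted": 1.5, "monthly": [1.4, 1.4, 0.7]},
}


def utilization(contracted, latest):
    """Share of the contracted amount currently in use, as a percentage."""
    return 100.0 * latest / contracted


def growth(monthly):
    """Percentage change from the first month to the last month."""
    first, last = monthly[0], monthly[-1]
    return 100.0 * (last - first) / first


for name, row in usage.items():
    used = utilization(row["contracted"], row["monthly"][-1])
    change = growth(row["monthly"])
    print(f"{name}: {used:.1f}% of contract in use, {change:+.1f}% Apr to Jun")
```

For APM this works out to a little under half of the contracted agents in use by June, with usage roughly doubling since April.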

[Chart: apps ‘In Progress’ and ‘Completed’ by month, Jan-17 to Jun-17; series labeled JH DA]


Speed Bumps?

Before You Can Live Happily Ever After…


Some speed bumps we faced?

Firewall – took a long time to resolve internally

SSL issue with older Java apps

Sweet spot – Great with tech from the last 20-30 years and upcoming technologies

IBM technologies

PMI Metrics with WebSphere

Private Locations Azure deployable image

Server Agents (& breadth)


Some Happy Endings…


Results - Success Stories

APM: A group improved page performance by 3 seconds per page load by identifying tuning opportunities in a SQL statement executed multiple times for every page load

Synthetics: A group identified that a 100+ MB static file was being served by web servers in MA instead of the Akamai CDN

SQL Server Plugin: A team identified that their Page Life Expectancy had deteriorated drastically since the DB moved to a new server, indicating inadequate RAM allocation

Insights: A team identified that uneven load distribution across servers was causing severely degraded performance

Server API+Synthetics: A team uses alerts on memory exhaustion to avoid what used to be definite downtime
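
The memory-exhaustion story comes down to an alert rule that fires early enough to act on but not on every brief spike. The sketch below shows one common shape for such a rule, assuming memory usage is sampled as a percentage at regular intervals. It is not the team's actual alert condition and not New Relic's API; `should_alert`, `threshold_pct`, and `sustained` are hypothetical names and values.

```python
def should_alert(samples, threshold_pct=90.0, sustained=3):
    """Return True when the most recent `sustained` samples are all
    at or above `threshold_pct`.

    `samples` is a list of memory-usage percentages, oldest first.
    Requiring several consecutive high readings filters out short spikes.
    """
    if len(samples) < sustained:
        return False  # not enough data to call it sustained
    return all(s >= threshold_pct for s in samples[-sustained:])


# Hypothetical readings: usage climbs and stays high, so the rule fires.
climbing = [70.0, 92.0, 95.0, 97.0]
print(should_alert(climbing))

# A spike that has already dropped back does not fire.
spike = [95.0, 96.0, 80.0]
print(should_alert(spike))
```

Tuning the threshold and the number of consecutive samples is the "getting them right" part: too low and the alert is noise, too high and it arrives after the downtime.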


Going Forward… The Journey Continues

Recently Acquired Infrastructure Product

NR Software Analysis Review

NR Expert Services

Increased Insights Retention Period

Miles to go…


Questions?