system insight without interference

Insight without Interference: Monitoring with Scala, Swagger, MongoDB and Wordnik OSS. Tony Tam, @fehguy

Upload: tony-tam

Post on 11-May-2015


DESCRIPTION

Talk at Wordnik HQ about how to monitor application performance and business goals without intrusive engineering work on your core product.

TRANSCRIPT

Page 1: System insight without Interference

Insight without Interference

Monitoring with Scala, Swagger, MongoDB and Wordnik OSS

Tony Tam

@fehguy

Page 2: System insight without Interference

Nagios Dashboard

Page 3: System insight without Interference

Monitoring?

IT Ops 101

Host Checks

System

Load

Disk Space

Network

Page 4: System insight without Interference

Monitoring?

Necessary (but insufficient)

Page 5: System insight without Interference

Why Insufficient?

•What about Services?

• Database running?

• HTTP traffic?

•Install Munin Node!

• Some (good) service-level insight

Page 6: System insight without Interference
Page 7: System insight without Interference

Your boss LOVES charts

“OH pretty colors!”

“up and to the right!”

“it MUST be important!”

Page 8: System insight without Interference

Good vs. Bad?

•Database calls avg 1ms?

• Great! DB working well

• But called 1M times per page load/user?

•Most tools are for system, not your app

•By the time you know, it’s too late

Need business metrics monitoring!

Page 9: System insight without Interference

Enter APM

•Application Performance Monitoring

•Many flavors, degrees of integration

• Heavy: transaction monitoring, code performance, heap, memory analysis

• Medium: home-grown profiling

• Light: digest your logs (failure forensics)

•What you need depends on architecture, business + technology stage

Page 10: System insight without Interference

APM @ Wordnik

•Micro Services make the System

Monolithic application

Page 11: System insight without Interference

APM @ Wordnik

API Calls are the unit of work!

Page 12: System insight without Interference

Monitoring API Calls

•Every API must be profiled

•Other logic as needed

• Database calls

• Connection manager

• etc...

•Anything that might matter!

Page 13: System insight without Interference

How?

•Wordnik-OSS Profiler for Scala

• Apache 2.0 License, available in Maven Central

•Profiling arbitrary code block:

import com.wordnik.util.perf.Profile

Profile("create a cat", {/* do something */})

•Profiling an API call:

Profile("/store/purchase", {/* do something */})
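
To make the idea of wrapping a block in a named timer concrete, here is a minimal sketch of what a Profile-style helper could look like. It is not the Wordnik implementation: `MiniProfile`, `Counter`, and `snapshot` are hypothetical names chosen for illustration, and the only behavior assumed is "time the block, aggregate by name, return the block's result".

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Profile-style helper; not the Wordnik OSS code.
object MiniProfile {
  // Aggregate per name: number of calls and total elapsed nanoseconds.
  final case class Counter(count: Long, totalNanos: Long) {
    def avgMillis: Double = if (count == 0) 0.0 else totalNanos / 1e6 / count
  }

  private val counters = mutable.Map.empty[String, Counter]

  // Run `block`, record how long it took under `name`, and return its result.
  def apply[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsed = System.nanoTime() - start
      counters.synchronized {
        val prev = counters.getOrElse(name, Counter(0L, 0L))
        counters(name) = Counter(prev.count + 1, prev.totalNanos + elapsed)
      }
    }
  }

  // Immutable copy of the current aggregates, safe to print or serialize.
  def snapshot: Map[String, Counter] = counters.synchronized(counters.toMap)
}
```

A call such as `MiniProfile("create a cat") { makeCat() }` (with `makeCat` being whatever the application does) returns the block's value unchanged, so the wrapper can be added around existing code without altering its result.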

Page 14: System insight without Interference

Profiler gives you…

•Nearly free*** tracking

•Simple aggregation

•Trigger mechanism

• Actions on time spent “doing things”:

Profile.triggers += new Function1[ProfileCounter, Unit] {
  def apply(counter: ProfileCounter): Unit = {
    // fire when a "getDb" call exceeds the 5000 duration threshold
    if (counter.name == "getDb" && counter.duration > 5000)
      wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
  }
}
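
The trigger mechanism can be pictured as a list of callbacks that run after every timed call. The sketch below is an illustration under that assumption only: `TriggerSketch`, `TimedCall`, and `durationMs` are hypothetical names, and the real `ProfileCounter` may carry different fields and units.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical illustration of a trigger list; not the Wordnik OSS code.
object TriggerSketch {
  // Assumed measurement shape: the name of the timed block and its duration in milliseconds.
  final case class TimedCall(name: String, durationMs: Long)

  // Callbacks invoked after each profiled block completes.
  val triggers = ListBuffer.empty[TimedCall => Unit]

  // Time `block`, then hand the measurement to every registered trigger.
  def profile[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally {
      val call = TimedCall(name, (System.nanoTime() - start) / 1000000L)
      triggers.foreach(t => t(call))
    }
  }
}
```

Because triggers run inline after the timed block, a slow trigger slows the caller; a production version would probably hand the measurement to a queue instead.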

Page 15: System insight without Interference

Profiler gives you…

This is intrusive on your codebase

Page 16: System insight without Interference

Accessing Profile Data

•Easy to get in code

ProfileScreenPrinter.dump

•Output where you want

logger.info(ProfileScreenPrinter.toString)

•Send to logs, email, etc.

Page 17: System insight without Interference

Accessing Profile Data

•Easier to get via API with Swagger-JAXRS

import com.wordnik.resource.util

@Path("/activity.json")
@Api("/activity")
@Produces(Array("application/json"))
class ProfileResource extends ProfileTrait

Page 18: System insight without Interference

Accessing Profile Data

Page 19: System insight without Interference

Accessing Profile Data

Inspect without bugging devs!

Page 20: System insight without Interference

Is Aggregate Data Enough?

•Probably not

•Not Actionable

• Have calls increased? Decreased?

• Faster response? Slower?

Page 21: System insight without Interference

Make it Actionable

•“In a 3 hour window, I expect 300,000 views per server”

• Poll & persist the counters

• Example: Log page views, every min

{
  "_id" : "web1-word-page-view-20120625151812",
  "host" : "web1",
  "count" : 627172,
  "timestamp" : NumberLong("1340637492247")
},
{
  "_id" : "web1-word-page-view-20120625151912",
  "host" : "web1",
  "count" : 627372,
  "timestamp" : NumberLong("1340637552778")
}
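
Persisted counters only become actionable once successive samples are compared. The following sketch assumes the `count` field is cumulative (the two sample documents are consistent with that, but the talk does not say so) and uses hypothetical names `CounterWindow`, `Sample`, `delta`, and `perMinute`.

```scala
// Hypothetical helper for turning polled counter samples into a rate.
object CounterWindow {
  // Mirrors the persisted documents: a cumulative count per host at a timestamp (epoch ms).
  final case class Sample(host: String, count: Long, timestamp: Long)

  // Views recorded between the oldest and newest sample in the window.
  def delta(samples: Seq[Sample]): Long =
    if (samples.size < 2) 0L
    else {
      val sorted = samples.sortBy(_.timestamp)
      sorted.last.count - sorted.head.count
    }

  // Average views per minute across the window, or None if the window has no duration.
  def perMinute(samples: Seq[Sample]): Option[Double] =
    if (samples.size < 2) None
    else {
      val sorted = samples.sortBy(_.timestamp)
      val elapsedMs = sorted.last.timestamp - sorted.head.timestamp
      if (elapsedMs <= 0) None else Some(delta(sorted) * 60000.0 / elapsedMs)
    }
}
```

Applied to the two sample documents, `delta` gives 627372 - 627172 = 200 views over roughly 60.5 seconds. A counter reset (for example after a restart) would produce a negative delta, which this sketch does not handle.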

Page 22: System insight without Interference

Make it Actionable

Page 23: System insight without Interference

Make it Actionable

Your boss LOVES charts

Page 24: System insight without Interference

That’s not Actionable!

•But it’s pretty

What’s missing?

APIs to track?

Low + High Watermarks

Custom Time window

Too much custom Engineering

Page 25: System insight without Interference

That’s not Actionable!

Call to Action!

Page 26: System insight without Interference

Make it Actionable

•Swagger + a tiny bit of engineering

• Let your *product* people create monitors, set goals

•A Check: specific API call mapped to a service function

{
  "name": "word-page-view",
  "path": "/word/*/wordView (post)",
  "checkInterval": 60,
  "healthSpan": 300,
  "minCount": 300,
  "maxCount": 100000
}
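
One way to read a check definition is as a low and high watermark on the number of calls seen during the health span. The sketch below evaluates a check under that reading; the interpretation of `healthSpan`, `minCount`, and `maxCount` is an assumption, and `CheckSketch`, `Health`, and `evaluate` are hypothetical names.

```scala
// Hypothetical evaluation of a check; field meanings are assumed, not confirmed by the talk.
object CheckSketch {
  // Field names follow the JSON check definition.
  final case class Check(name: String, path: String, checkInterval: Int,
                         healthSpan: Int, minCount: Long, maxCount: Long)

  sealed trait Health
  case object Healthy extends Health
  final case class Unhealthy(reason: String) extends Health

  // `callsInSpan` is the number of calls observed over the last `healthSpan` seconds.
  def evaluate(check: Check, callsInSpan: Long): Health =
    if (callsInSpan < check.minCount)
      Unhealthy(s"${check.name}: $callsInSpan calls, below low watermark ${check.minCount}")
    else if (callsInSpan > check.maxCount)
      Unhealthy(s"${check.name}: $callsInSpan calls, above high watermark ${check.maxCount}")
    else Healthy
}
```

Keeping the thresholds in data rather than code is what lets product people adjust goals without a deploy.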

Page 27: System insight without Interference

Make it Actionable

•A Service Type: a collection of checks which make a functional unit

{
  "name": "www-api",
  "checks": [
    "word-of-the-day",
    "word-page-view",
    "word-definitions",
    "user-login",
    "api-account-signup",
    "api-account-activated"
  ]
}

Page 28: System insight without Interference

Make it Actionable

•A Host: “directions” to get to the checks

{
  "host": "ip-10-132-43-114",
  "path": "/v4/health.json/profile?api_key=XYZ",
  "serviceType": "www-api"
},
{
  "host": "ip-10-130-134-82",
  "path": "/v4/health.json/profile?api_key=XYZ",
  "serviceType": "www-api"
}

Page 29: System insight without Interference

Make it Actionable

•And finally, a simple GUI

Page 30: System insight without Interference

Make it Actionable

•And finally, a simple GUI

Page 31: System insight without Interference

Make it Actionable

•Point Nagios at this!

serviceHealth.json/status/www-api?explodeOnFailure=true

•Get a 500, get an alert

Metrics from Product

Based on YOUR app

Treat like system failure
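
The "get a 500, get an alert" idea reduces to mapping aggregated check results onto an HTTP status that an HTTP-based Nagios check can act on. This is a sketch of that mapping only; `StatusSketch`, `CheckResult`, `httpStatus`, and `failing` are hypothetical names, and the semantics of `explodeOnFailure` are inferred from its name.

```scala
// Hypothetical mapping from check results to an HTTP status code.
object StatusSketch {
  // Result of one check on one host; `healthy` would come from a watermark-style evaluation.
  final case class CheckResult(host: String, check: String, healthy: Boolean)

  // 500 only when asked to fail loudly and at least one check is unhealthy; otherwise 200.
  def httpStatus(results: Seq[CheckResult], explodeOnFailure: Boolean): Int =
    if (explodeOnFailure && results.exists(!_.healthy)) 500 else 200

  // Human-readable list of failing host/check pairs for the response body.
  def failing(results: Seq[CheckResult]): Seq[String] =
    results.filterNot(_.healthy).map(r => s"${r.host}/${r.check}")
}
```

Returning 200 when `explodeOnFailure` is false would let a dashboard read the same endpoint without tripping alerts.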

Page 32: System insight without Interference

Make it Actionable

Page 33: System insight without Interference

Is this Enough?

System monitoring

Aggregate monitoring

Windowed monitoring

Object monitoring?

• Action on a specific event/object

Why!?

Page 34: System insight without Interference

Object-level Actions

•Any back-end engineer can build this

• But shouldn’t

•ETL to a cube?

•Run BI queries against production?

•Best way to “siphon” data from production w/o intrusive engineering?

Page 35: System insight without Interference

Avoiding Code Invasion

•We use MongoDB everywhere

•We use > 1 server wherever we use MongoDB

•We have an opLog record against everything we do

Page 36: System insight without Interference

What is the OpLog

•All participating members have one

•Capped collection of all write ops

[Diagram: primary, replica, replica; oplog positions t1, t2, t3 along a time axis]

Page 37: System insight without Interference

So What?

•It’s a “pseudo-durable global topic message bus” (PDGTMB)

• WTF?

•All DB transactions in there

•It’s persistent (cyclic collection)

•It’s fast (as fast as your writes)

•It’s non-blocking

•It’s easily accessible

Page 38: System insight without Interference

More about this

{
  "ts" : { "t" : 1340948921000, "i" : 1 },
  "h" : NumberLong("5674919573577531409"),
  "op" : "i",
  "ns" : "test.animals",
  "o" : { "_id" : "fred", "type" : "cat" }
},
{
  "ts" : { "t" : 1340948935000, "i" : 1 },
  "h" : NumberLong("7701120461899338740"),
  "op" : "i",
  "ns" : "test.animals",
  "o" : { "_id" : "bill", "type" : "rat" }
}
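
Each oplog entry carries an operation code in "op" ("i" for insert), a namespace in "ns" (database.collection), and the written document in "o". The sketch below shows how a consumer might narrow the stream to the inserts it cares about. It uses a simplified in-memory `OplogEntry` rather than the MongoDB driver types, and `OplogFilterSketch` and `insertsFor` are hypothetical names.

```scala
// Simplified model of oplog entries; a real consumer would read driver objects instead.
object OplogFilterSketch {
  // Operation code, namespace ("db.collection"), and the written document.
  final case class OplogEntry(op: String, ns: String, o: Map[String, Any])

  // Keep only inserts ("i") into the namespaces we care about.
  def insertsFor(namespaces: Set[String])(entries: Seq[OplogEntry]): Seq[OplogEntry] =
    entries.filter(e => e.op == "i" && namespaces.contains(e.ns))
}
```

Filtering on namespace first keeps the per-entry cost low, since most writes in a busy oplog will belong to collections the monitor ignores.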

Page 39: System insight without Interference

Tapping into the Oplog

•Made easy for you!

https://github.com/wordnik/wordnik-oss

Page 40: System insight without Interference

Tapping into the Oplog

•Made easy for you!

https://github.com/wordnik/wordnik-oss

Snapshots

Replication

Incremental Backup

Same Technique!

Page 41: System insight without Interference

Tapping into the Oplog

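•Create an OplogRecordProcessor

import java.io.IOException
import scala.collection.mutable.HashSet
import com.mongodb.BasicDBObject

class OpLogReader extends OplogRecordProcessor {
  // observer functions invoked for every oplog record
  val recordTriggers = new HashSet[Function1[BasicDBObject, Unit]]

  @throws(classOf[Exception])
  def processRecord(dbo: BasicDBObject) = {
    recordTriggers.foreach(t => t(dbo))
  }

  @throws(classOf[IOException])
  def close(string: String) = {}
}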

Page 42: System insight without Interference

Tapping into the Oplog

•Attach it to an OplogTailThread

val util = new OpLogReader
val coll: DBCollection =
  (MongoDBConnectionManager.getOplog("oplog", "localhost", None, None)).get
val tailThread = new OplogTailThread(util, coll)
tailThread.start

Page 43: System insight without Interference

Tapping into the Oplog

•Add some observer functions

util.recordTriggers += new Function1[BasicDBObject, Unit] {
  def apply(e: BasicDBObject): Unit = Profile("inspectObject", {
    totalExamined += 1
    /* do something here */
  })
}

Page 44: System insight without Interference

/* do something here */

•Like?

•Convert to business objects and act!

• OpLog to domain object is EASY

• Just process the ns that you care about

"ns" : "test.animals"

•How?

Page 45: System insight without Interference

Converting OpLog to Object

•Jackson makes this trivial

case class User(username: String, email: String, createdAt: Date)

val user = jacksonMapper.convertValue(
  dbo.get("o").asInstanceOf[DBObject],
  classOf[User])

•Reuse your DAOs? Bonus points!

•Got your objects!

Page 46: System insight without Interference

Converting OpLog to Object

•Got your objects!

Now What?

“o” is for “Object”

Page 47: System insight without Interference

Use Case 1: Alert on Action

•New account!

obj match {
  case newAccount: UserAccount => { /* ring the bell! */ }
  case _ => { /* ignore it */ }
}
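
Putting conversion and matching together, an alerting observer might look like the sketch below. Everything here is illustrative: the namespace "app.accounts", the `UserAccount` fields, and the names `AlertSketch`, `toDomain`, and `alertFor` are assumptions, and the hand-written conversion merely stands in for whatever mapper (such as Jackson) a real system would use.

```scala
// Hypothetical end-to-end path from an oplog document to an alert message.
object AlertSketch {
  sealed trait DomainObject
  final case class UserAccount(username: String, email: String) extends DomainObject
  final case class Other(ns: String) extends DomainObject

  // Assumed mapping from (namespace, document) to a domain object.
  def toDomain(ns: String, doc: Map[String, Any]): DomainObject =
    (ns, doc.get("username"), doc.get("email")) match {
      case ("app.accounts", Some(u: String), Some(e: String)) => UserAccount(u, e)
      case _ => Other(ns)
    }

  // An alert message for new accounts, nothing for everything else.
  def alertFor(obj: DomainObject): Option[String] = obj match {
    case UserAccount(username, _) => Some(s"New account: $username")
    case _ => None
  }
}
```

Returning an `Option` rather than sending the alert directly keeps the matching logic testable without a notification channel.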

Page 48: System insight without Interference

Use case 2: What’s Trending?

•Real-time activity

case o: VisitLog =>
  Profile("ActivityMonitor:processVisit", {
    wordTracker.add(o.word)
  })
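
The talk does not show what `wordTracker` does internally. One plausible shape is a sliding time window with a top-N query, sketched below. `WordTracker`, `windowMs`, and `top` are hypothetical, and unlike the call in the slide, this `add` takes an explicit timestamp so the behavior is deterministic.

```scala
import scala.collection.mutable

// Hypothetical sliding-window tracker; only one possible shape for a trending counter.
final class WordTracker(windowMs: Long) {
  // Events in arrival order: (timestamp in epoch ms, word).
  private val events = mutable.Queue.empty[(Long, String)]

  // Record a view of `word` at time `now` and drop events older than the window.
  def add(word: String, now: Long): Unit = {
    events += ((now, word))
    while (events.nonEmpty && events.head._1 < now - windowMs) events.dequeue()
  }

  // Most-viewed words in the current window, highest count first, ties broken alphabetically.
  def top(n: Int): Seq[(String, Int)] =
    events.groupBy(_._2).map { case (w, es) => (w, es.size) }.toSeq
      .sortBy { case (w, c) => (-c, w) }.take(n)
}
```

This version assumes timestamps arrive in non-decreasing order and regroups all events on every `top` call, which is fine for a sketch but would want incremental counts at real traffic levels.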

Page 49: System insight without Interference

Use case 3: External Analytics

case o: UserProfile => {
  getSqlDatabase().executeSql(
    "insert into user_profile values(?,?,?)",
    o.username, o.email, o.createdAt)
}

Page 50: System insight without Interference

Use case 3: External Analytics

Don’t mix runtime & OLAP!

Your Data pushes to Relational!

Page 51: System insight without Interference

Use case 4: Cloud analysis

case o: NewUserAccount => {
  getSalesforceConnector().create(
    Lead(Account.ID, o.firstName, o.lastName,
      o.company, o.email, o.phone))
}

Page 52: System insight without Interference

Use case 4: Cloud analysis

We didn’t interrupt core engineering!

Pushed directly to Salesforce!

Page 53: System insight without Interference

Examples

Polling profile APIs cross cluster

Page 54: System insight without Interference

Examples

Siphoning hashtags from opLog

Page 55: System insight without Interference

Examples

Page view activity from opLog

Page 56: System insight without Interference

Examples

Health check w/o engineering

Page 57: System insight without Interference

Summary

•Don’t mix up monitoring servers & your application

•Leave core engineering alone

•Make a tiny engineering investment now

•Let your product folks set metrics

•FOSS tools are available (and well tested!)

•The opLog is incredibly powerful

• Hack it!

Page 58: System insight without Interference

Find out more

•Wordnik: developer.wordnik.com

•Swagger: swagger.wordnik.com

•Wordnik OSS: github.com/wordnik/wordnik-oss

•Atmosphere: github.com/Atmosphere/atmosphere

•MongoDB: www.mongodb.org