frontend monitoring @ grammarly
Posted on 14-Aug-2015
Frontend Monitoring@Grammarly
Grammarly Products
● Web editor - single page app
● Browser extensions for
○ Chrome, Safari, Firefox
● M$ Office Add-in
● Funnels
Grammarly Products :: Load
● 1M+ active users; 0.5M+ daily active
● 10M+ users planned next year
● 30 services on 300+ servers
● 130M+ requests/day
● 3.8M+ WebSocket connections/day
What’s ur problem, bro?
The problem
● Model != Reality
● 1B websites * X browsers
● Free users: Problem? => ⌘+W | Alt+F4
● Paid users: Problem? => Let’s torture support
The problem :: CI / …
● Bugs
● Daily releases
● Performance testing
● A/B testing
The Solution...
The Solution :: Monitoring
● Monitoring != Tracking
● Monitoring:
○ all data are volatile
○ helps to assess quality in terms of
■ stability and
■ performance
○ fast problem detection and alerting
○ troubleshooting
○ different data sources incl. tracking events
Grammarly FE Monitoring :: The Saga
● Manual testing
● New Relic
● Errorception
● Sentry
● …
● Profit (Custom Solution)
FE Monitoring @ Grammarly
● Logging
○ events with context (userId, UserAgent, stacktraces, etc.)
○ special cases only, no tracing ‘blah-blah’ logs
● TS (time-series) Metrics
○ everything else :)
● Alerting
FE Monitoring
[Figure: architecture diagram. Web browser → Nginx (x2) access logs → Logstash → StatsD → Graphite (x2) and Elasticsearch (x4); dashboards in Grafana and Kibana; alerting via Sensu checks → Sensu Server → OpsGenie]
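The slides name Sensu checks feeding a Sensu server and OpsGenie, but they do not show a check itself. A Sensu check is a command whose exit code sets its status: 0 is OK, 1 is WARNING, 2 is CRITICAL, and anything higher is UNKNOWN. Below is a minimal sketch of the threshold logic such a check might contain. The metric, the thresholds, and the function names are hypothetical.

```typescript
// Sensu reads a check's exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3+ = UNKNOWN.
const CheckStatus = { OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3 } as const;
type CheckStatus = (typeof CheckStatus)[keyof typeof CheckStatus];

// Hypothetical thresholds for a "higher is worse" metric, e.g. a JS error rate in percent.
interface Thresholds {
  warning: number;
  critical: number;
}

function evaluate(value: number | undefined, t: Thresholds): CheckStatus {
  // Missing data (the metric stopped arriving) is UNKNOWN rather than a silent OK.
  if (value === undefined || Number.isNaN(value)) return CheckStatus.UNKNOWN;
  if (value >= t.critical) return CheckStatus.CRITICAL;
  if (value >= t.warning) return CheckStatus.WARNING;
  return CheckStatus.OK;
}

// In a real check script the value would be fetched from Graphite first, then:
// const status = evaluate(errorRate, { warning: 2, critical: 5 });
// console.log(`status ${status}: error rate ${errorRate}%`);
// process.exit(status);
```

The check reports "no data" as UNKNOWN instead of OK. That way a broken pipeline is visible in OpsGenie rather than looking healthy.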
Logging :: Backend
[Figure: same architecture diagram]
TS Metrics :: Backend
[Figure: same architecture diagram]
FE Monitoring
● Web browser → Nginx (HTTP) → Nginx access logs
● Logstash tails the *.log files:
○ metrics data, ~2600 RPS, ~90 GiB/day → metrics codec → StatsD (UDP)
○ logs data, ~450 RPS, ~50 GiB/day → logs codec (+source maps) → Elasticsearch
FE Monitoring in numbers
● 38M logs/day
● up to 3K logs/sec @ busy hours
● ~100 Graphite metrics
● 6 servers + 2 shared w/ backend monitoring
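The data-flow rates and the daily totals can be cross-checked. The slides do not say whether the RPS figures are averages; assuming they are daily averages, the logs rate reproduces the 38M logs/day figure, and the daily volumes give a rough size per event:

```latex
\begin{aligned}
N_{\text{logs}} &= 450\ \text{s}^{-1} \times 86\,400\ \text{s} \approx 3.89 \times 10^{7}\ \text{events/day} \\
\frac{50\ \text{GiB}}{N_{\text{logs}}} &= \frac{53\,687\,091\,200\ \text{B}}{3.888 \times 10^{7}} \approx 1380\ \text{B per event} \\
N_{\text{metrics}} &= 2600\ \text{s}^{-1} \times 86\,400\ \text{s} \approx 2.25 \times 10^{8}\ \text{requests/day} \\
\frac{90\ \text{GiB}}{N_{\text{metrics}}} &= \frac{96\,636\,764\,160\ \text{B}}{2.2464 \times 10^{8}} \approx 430\ \text{B per request}
\end{aligned}
```

On the same assumption, the busy-hour peak of 3K logs/sec is roughly 6.7 times the daily average.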
Logging :: JS Library
● legacy codebase from raven-js
● named loggers
● log levels (info, warn, error)
● default data in all events (aka MDC)
● scopes (lifetime, session, document)
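The library itself is not shown beyond its feature list. Below is a standalone sketch of how named loggers, log levels, MDC-style default data, and scopes could fit together. It is not raven-js's API. It assumes that "scopes" means the lifetime of default data, for example whether a key is cleared when a new document is opened. All class and method names are hypothetical, and the sink stands in for whatever batches events and sends them over HTTP.

```typescript
type Level = "info" | "warn" | "error";
type Scope = "lifetime" | "session" | "document";
type Context = Record<string, unknown>;

interface LogEvent {
  logger: string;
  level: Level;
  message: string;
  timestamp: number;
  data: Context;
}

// Default data attached to every event (MDC-style), grouped by how long it should live.
class DefaultData {
  private scopes: Record<Scope, Context> = { lifetime: {}, session: {}, document: {} };

  set(scope: Scope, key: string, value: unknown): void {
    this.scopes[scope][key] = value;
  }

  // Drop everything in one scope, e.g. when the user switches to another document.
  clear(scope: Scope): void {
    this.scopes[scope] = {};
  }

  // Narrower scopes override wider ones on key collisions.
  snapshot(): Context {
    return { ...this.scopes.lifetime, ...this.scopes.session, ...this.scopes.document };
  }
}

// A named logger: every event carries the logger name, a level and the merged default data.
class Logger {
  readonly name: string;
  private readonly defaults: DefaultData;
  private readonly sink: (event: LogEvent) => void;

  constructor(name: string, defaults: DefaultData, sink: (event: LogEvent) => void) {
    this.name = name;
    this.defaults = defaults;
    this.sink = sink;
  }

  info(message: string, extra: Context = {}): void { this.log("info", message, extra); }
  warn(message: string, extra: Context = {}): void { this.log("warn", message, extra); }
  error(message: string, extra: Context = {}): void { this.log("error", message, extra); }

  private log(level: Level, message: string, extra: Context): void {
    this.sink({
      logger: this.name,
      level,
      message,
      timestamp: Date.now(),
      // Per-call data (e.g. a stacktrace) wins over default data with the same key.
      data: { ...this.defaults.snapshot(), ...extra },
    });
  }
}
```

Usage: `defaults.set("lifetime", "userId", id)` once at startup, then `new Logger("editor.checks", defaults, send).error("check failed", { stack })` wherever an error is caught.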
Logging :: JS Library
[Figure: Kibana screenshot]
TS Metrics
● StatsD metrics:
○ counter (inc/dec)
○ timer: values for which StatsD calculates avg, min, max, percentiles
○ set
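StatsD aggregates timer values over each flush interval and sends Graphite only the summary. Below is a simplified sketch of that aggregation. The percentile is a nearest-rank style "upper" value in the spirit of StatsD's `upper_90`, not a copy of StatsD's code, and the function name is hypothetical.

```typescript
interface TimerStats {
  count: number;
  min: number;
  max: number;
  avg: number;
  // upper[90] = largest value among the lowest 90% of samples.
  upper: Record<number, number>;
}

function aggregateTimer(values: number[], percentiles: number[] = [90]): TimerStats | null {
  if (values.length === 0) return null; // nothing was measured in this interval
  const sorted = [...values].sort((a, b) => a - b);
  const count = sorted.length;
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  const upper: Record<number, number> = {};
  for (const pct of percentiles) {
    // Keep the lowest round(pct% of count) samples (at least 1, at most all) and take the largest.
    const n = Math.min(count, Math.max(1, Math.round((pct / 100) * count)));
    upper[pct] = sorted[n - 1];
  }
  return { count, min: sorted[0], max: sorted[count - 1], avg: sum / count, upper };
}
```

Only these few numbers per interval reach Graphite, which is why a timer stays cheap to store no matter how many samples the browsers send.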
TS Metrics :: JS Library
Metric name: ui.performance.chrome.popup.load
Cardinality is limited by Graphite storage (Whisper):
● product
● version
● browser
● region (US | World)
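Each distinct dotted name is a separate Whisper file. Every dimension encoded in the name therefore multiplies the number of series. Below is a sketch of building such names, formatting them as StatsD plaintext lines (`<name>:<value>|<type>`), and counting the resulting series. The helper names, the sanitizing rule, and the example dimension values are hypothetical. The slides do not show which position in the name holds which dimension.

```typescript
// Graphite uses "." as its path separator, so dimension values must not contain dots.
function sanitizeSegment(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9_-]+/g, "_");
}

function metricName(...segments: string[]): string {
  return segments.map(sanitizeSegment).join(".");
}

// StatsD types used here: c = counter, ms = timer, s = set.
type StatsdType = "c" | "ms" | "s";

function statsdLine(name: string, value: number | string, type: StatsdType): string {
  return `${name}:${value}|${type}`;
}

// Number of distinct series = product of the number of distinct values per dimension.
function seriesCount(dimensions: Record<string, string[]>): number {
  return Object.values(dimensions).reduce((acc, values) => acc * new Set(values).size, 1);
}
```

With a hypothetical 2 products, 3 browsers, and 2 regions, one metric already becomes 12 Whisper files. Adding a version dimension multiplies that again with every release.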
TS Metrics :: JS Library
TS Metrics :: UI
Case study
● “Creeping” versions
● Active users
● WebSocket errors
● Stability
● Performance
● Page load success/error percentage
● Bugs: …
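A percentage graph like the page-load one is derived from two counters over the same window. Graphite's `asPercent()` render function can do this server-side. Below is a client-side sketch with a hypothetical function name.

```typescript
// Share of successful page loads in one aggregation window, from two counters.
function successPercentage(successes: number, errors: number): number | null {
  const total = successes + errors;
  // No page loads at all means "no data", not 0% success; returning null avoids a false alert.
  return total === 0 ? null : (100 * successes) / total;
}
```

Reporting a ratio rather than raw error counts keeps the graph comparable as the number of users changes during the day.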
The Solution :: Adoption Problems
● Bugs in the JS monitoring code =>
○ wrong data
○ self-DDoS
● False-positive (FP) alerts even on correct data
● Developers aren’t very passionate about writing logs and metrics
● Some education is needed to promote usage
● Turning monitoring into an engineering practice
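The slides do not say how self-DDoS was avoided. A common client-side guard is a token bucket in front of the sender, so that a bug which logs in a tight loop sends at most a bounded burst. Below is a sketch of that technique. The class name, capacity, and rate are hypothetical, and over-budget events are dropped rather than queued.

```typescript
// Token bucket: allows bursts up to `capacity` events, refilled at `ratePerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly ratePerSec: number;
  private readonly now: () => number;

  // `now` returns milliseconds; injectable so the bucket can be tested with a fake clock.
  constructor(capacity: number, ratePerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true if the event may be sent, false if it should be dropped.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The number of dropped events is itself worth counting. A sudden rise in drops is a strong hint that the monitoring code, not the product, has a bug.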
The Solution :: Other Problems
● Lack of a data verification mechanism
● Graphite disk space issues
● High load as the user base grows
● Monitoring infrastructure stability
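Whisper disk use is predictable because, by default, each file is preallocated for its full retention. The layout is a 16-byte header, 12 bytes per archive, and 12 bytes per point (a 4-byte timestamp plus an 8-byte double). Below is a sketch that sizes one file from a retention string in Graphite's `storage-schemas.conf` style. The parser only handles the `<precision>:<duration>` unit form, and the `10s:1d,1m:30d` policy is a hypothetical example, not Grammarly's.

```typescript
// Seconds per unit suffix in Graphite retention strings (simplified subset).
const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400, w: 604800, y: 31536000 };

function toSeconds(spec: string): number {
  const m = /^(\d+)([smhdwy])$/.exec(spec);
  if (!m) throw new Error(`unsupported duration: ${spec}`);
  return Number(m[1]) * UNIT_SECONDS[m[2]];
}

// Size in bytes of one Whisper file for a retention like "10s:1d,1m:30d".
function whisperFileBytes(retentions: string): number {
  const pointsPerArchive = retentions.split(",").map((r) => {
    const [precision, duration] = r.split(":");
    return Math.floor(toSeconds(duration) / toSeconds(precision));
  });
  const points = pointsPerArchive.reduce((acc, n) => acc + n, 0);
  // header + archive info entries + data points
  return 16 + 12 * pointsPerArchive.length + 12 * points;
}
```

Under that example policy one series takes 622,120 bytes. Total disk grows linearly with the number of series, which is why metric-name cardinality matters.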
Near Future Plans
● Graphite disk space scaling (Cassandra)
● Client/server protocol optimization
● Simple API for getting monitoring data for tests
● Trends / New Events dashboards/facility
● Simplified ES => Graphite metrics routing
● Automatic verification of code changes with A/B testing & logs/metrics analysis
Questions?
Sergey Rudenko
Frontend engineer
fb://rudenko.sergii
@Rudeg
rudenko.sergey92@gmail.com
Aleksey Yashchenko
Backend engineer
fb://tuxslayer
@tuxslayer
tuxslayer@gmail.com