Monitoring Using Open Source Technologies
![Page 1: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/1.jpg)
Monitoring Using
Open Source Technologies
![Page 2: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/2.jpg)
Utkarsh Bhatnagar
• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation)
• An active contributor to Grafana
• Project initiator for wizzy – a user-friendly CLI tool for Grafana
GitHub - https://github.com/utkarshcmu
Email – [email protected]
GrafanaCon 2016 Speaker - https://www.youtube.com/watch?v=llRhdvV25rg
![Page 3: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/3.jpg)
Monitoring using
@
![Page 4: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/4.jpg)
PlayStation Outage!
![Page 5: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/5.jpg)
Hi, I am Jack.
About two years ago…
![Page 6: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/6.jpg)
POC on Monitoring
Requirements:
• 50,000 unique metrics from one source
• One data point per metric every minute
• Roughly 72 million data points per day
• Data retention of 60 days
• User-friendly UI with possible customization
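The daily figure in the requirements above follows directly from the metric count and the one-minute interval. A minimal sketch of that arithmetic, with the 60-day total added for scale (the function names are illustrative, not from the talk):

```python
# Back-of-the-envelope sizing for the POC requirements listed above.
MINUTES_PER_DAY = 24 * 60  # one data point per metric per minute


def points_per_day(unique_metrics: int, points_per_minute: int = 1) -> int:
    """Data points written per day across all metrics."""
    return unique_metrics * points_per_minute * MINUTES_PER_DAY


def points_retained(unique_metrics: int, retention_days: int) -> int:
    """Data points held at full resolution for the whole retention window."""
    return points_per_day(unique_metrics) * retention_days


print(points_per_day(50_000))       # 72000000
print(points_retained(50_000, 60))  # 4320000000
```

So 60 days of retention at one-minute resolution means keeping roughly 4.3 billion data points, assuming no downsampling of older data.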
![Page 7: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/7.jpg)
Monitoring Stack
METRIC SOURCE → Time Series Database → Visualization Layer
![Page 8: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/8.jpg)
Choosing the technology!
![Page 9: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/9.jpg)
POC Design & Architecture
METRIC SOURCE
![Page 10: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/10.jpg)
POC Completed!
Mission accomplished!
1 metrics source
50,000 unique metrics
72 million data points per day
![Page 11: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/11.jpg)
Metrics Onboarding
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 400,000 unique metrics
• About 600 million data points per day
Team 3 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 4 Requirements:
• 800,000 unique metrics
• About 5 billion data points per day
And more…
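Dividing each team's daily volume by its metric count shows how much denser these workloads are than the POC's one point per minute. The sketch below works that out, assuming every metric in a team reports at the same uniform rate (an assumption for illustration; the slides give only the totals):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (unique metrics, approximate data points per day) as listed above.
TEAMS = {
    "Team 1": (100_000, 200_000_000),
    "Team 2": (400_000, 600_000_000),
    "Team 3": (500_000, 2_000_000_000),
    "Team 4": (800_000, 5_000_000_000),
}


def implied_interval_seconds(unique_metrics: int, points_per_day: int) -> float:
    """Average seconds between data points for a single metric."""
    points_per_metric = points_per_day / unique_metrics
    return SECONDS_PER_DAY / points_per_metric


for team, (metrics, points) in TEAMS.items():
    interval = implied_interval_seconds(metrics, points)
    print(f"{team}: one point every {interval:.1f} s per metric")
```

Under that assumption the intervals come out between roughly 14 and 58 seconds, so every team writes more often per metric than the POC did.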
![Page 12: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/12.jpg)
POC Design & Architecture
METRIC SOURCE
![Page 13: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/13.jpg)
How to Scale?
Should he continue with Graphite?
Should he ask to reduce metrics or data points?
How to dynamically scale Graphite?
Does Grafana support other datasources?
OpenTSDB / InfluxDB / KairosDB / Prometheus?
How to scale the infrastructure to support a variable load of metrics?
Challenges:
• Multiple teams
• Millions of unique metrics
• Above 10 billion data points a day
• Process 3 million logs every minute and generate metrics
• Reprocessing of metrics and logs if needed
• Provide real-time monitoring for all of the above using Grafana!
![Page 14: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/14.jpg)
Strategy
Divide & Conquer
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
And more…
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
![Page 15: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/15.jpg)
Design & Architecture
POC METRIC SOURCE
POC works for:
1 metrics source
50,000 unique metrics
72 million data points per day
Team 1 requirements:
1 metrics source
100,000 unique metrics
200 million data points per day
TEAM 1 METRIC SOURCE
![Page 16: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/16.jpg)
Team 1 Conquered!
This strategy works! Bring it on!
![Page 17: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/17.jpg)
Strategy
Divide & Conquer
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
And more…
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
![Page 18: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/18.jpg)
Design & Architecture
POC METRIC SOURCE
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
![Page 19: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/19.jpg)
Team 2 Conquered!
![Page 20: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/20.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
![Page 21: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/21.jpg)
![Page 22: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/22.jpg)
Scaling Graphite
Clustering Graphite
CARBON RELAY
CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
. . .
GRAPHITE WEB    GRAPHITE WEB
LOAD BALANCER
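In the cluster above, the carbon relay decides which carbon cache node stores each metric. One common way to make that decision is consistent hashing, so that a given metric name always lands on the same node. The sketch below is a simplified hash ring for illustration only: it is not carbon's own implementation, the slides do not say which relay method was used, and the node names and `replicas` value are made up.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring mapping metric names to nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # First 32 bits of the MD5 digest, used as a position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def node_for(self, metric_name):
        # The first ring entry at or after the metric's position owns it;
        # past the last entry, wrap around to the start of the ring.
        idx = bisect.bisect_left(self._positions, self._position(metric_name))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])  # hypothetical node names
print(ring.node_for("team2.api.latency.p99"))
```

The useful property is that adding or removing a node only moves the metrics that belonged to that node's ring positions; everything else keeps its existing owner, which matters when each node holds its metrics' Whisper files on local disk.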
![Page 23: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/23.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
CR
G G G. . .
GW GW
LB
![Page 24: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/24.jpg)
Team 2 Conquered!
But… the happiness lasted only a month
![Page 25: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/25.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
CR
G G G. . .
GW GW
LB
![Page 26: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/26.jpg)
Scalable Alternatives to Graphite
![Page 27: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/27.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
CR
G G G. . .
GW GW
LB
![Page 28: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/28.jpg)
Team 2 Conquered!
Finally!
![Page 29: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/29.jpg)
Strategy
Divide & Conquer
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
And more…
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
![Page 30: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/30.jpg)
How to process logs at scale?
![Page 31: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/31.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
Team 3 requirements:
Over 5000 log sources
3 million logs per minute
TEAM 2 METRIC SOURCE
LOGS SOURCES
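Generating metrics from logs boils down to parsing each line, bucketing it by time, and counting. The slides do not describe the pipeline or the log format, so the sketch below is only an illustration of the idea on a single machine: the three-field log layout (`timestamp service status`) and the metric naming scheme are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(ts: str) -> int:
    """Epoch seconds of the start of the minute containing an ISO-8601 UTC timestamp."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    epoch = int(dt.timestamp())
    return epoch - epoch % 60


def logs_to_metrics(lines):
    """Count log lines per (minute, service, status) and return sorted tuples."""
    counts = Counter()
    for line in lines:
        ts, service, status = line.split()  # hypothetical three-field format
        counts[(minute_bucket(ts), f"logs.{service}.status_{status}")] += 1
    return [(bucket, name, n) for (bucket, name), n in sorted(counts.items())]


sample = [
    "2017-01-01T00:00:05Z checkout 200",
    "2017-01-01T00:00:40Z checkout 200",
    "2017-01-01T00:00:59Z checkout 500",
    "2017-01-01T00:01:10Z checkout 200",
]
for bucket, name, count in logs_to_metrics(sample):
    print(bucket, name, count)
```

At 3 million logs a minute the same count-per-bucket step would have to be partitioned across many workers, but the per-line logic stays this simple.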
![Page 32: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/32.jpg)
Team 3 Conquered!
But… one day…
![Page 33: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/33.jpg)
Design & Architecture
POC METRIC SOURCE
TEAM 1 METRIC SOURCE
TEAM 2 METRIC SOURCE
LOGS SOURCES
![Page 34: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/34.jpg)
Design & Architecture
METRIC SOURCE 1
METRIC SOURCE 2
METRIC SOURCE 3
METRIC SOURCE N
LOGS SOURCES
LB
Alerting
![Page 35: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/35.jpg)
Metrics & Logs Sources
Graphite Stats - Apps using a stats library written by Alexander Filipchik
Custom metrics - From other sources
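For custom metrics headed to Graphite, the simplest route is carbon's plaintext protocol: one line per data point in the form `path value timestamp`, sent over TCP (port 2003 by default). A minimal sketch follows; the metric path and host are placeholders, and this is not the stats library mentioned above.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point to a carbon plaintext listener."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Formatting only; send_metric() needs a reachable carbon listener.
print(format_metric("custom.checkout.orders", 42, 1483228800), end="")
```

Opening one connection per data point is fine for a demo; a real sender would normally keep the connection open and batch lines.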
![Page 36: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/36.jpg)
Lessons Learned
![Page 37: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/37.jpg)
Strategy
Divide & Conquer
![Page 38: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/38.jpg)
![Page 39: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/39.jpg)
Look for alternatives!
![Page 40: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/40.jpg)
Choose scalable components!
(Subject to effort and time)
![Page 41: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/41.jpg)
Automation
![Page 42: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/42.jpg)
Design & Architecture
METRIC SOURCE 1
METRIC SOURCE 2
METRIC SOURCE 3
METRIC SOURCE N
LOGS SOURCES
LB
Alerting
![Page 43: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/43.jpg)
Some numbers
• More than 3 million unique metrics supported
  - creation and deletion happen all the time
• More than 11 billion data points written per day
  - across all TSDBs
• Processing about 40 billion events per day
  - log and metric events in near real time (within 30 seconds)
• More than 3000 requests per minute to Grafana dashboards
  - around 7000 during outages
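Daily totals are easier to reason about as sustained per-second rates. The sketch below converts the figures above, assuming the load is spread evenly over the day (real traffic will have peaks, so these are averages, not capacity targets):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second_from_daily(per_day: int) -> int:
    """Average whole events per second for a daily total."""
    return per_day // SECONDS_PER_DAY


def per_second_from_minutely(per_minute: int) -> float:
    """Average events per second for a per-minute rate."""
    return per_minute / 60


print(per_second_from_daily(11_000_000_000))  # data points written: 127314
print(per_second_from_daily(40_000_000_000))  # events processed: 462962
print(per_second_from_minutely(3000))         # dashboard requests: 50.0
```

So the write path averages well over 100,000 data points per second, while the dashboard read path is comparatively light at about 50 requests per second.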
![Page 44: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/44.jpg)
Monitoring Stack @ Sony PlayStation
METRIC SOURCE 1
METRIC SOURCE 2
METRIC SOURCE 3
METRIC SOURCE N
LOGS SOURCES
LB
Alerting
![Page 45: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/45.jpg)
Grafana
A metrics visualization and alerting tool
![Page 46: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/46.jpg)
Supports multiple time series databases
![Page 47: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/47.jpg)
Supports multiple panel types
https://grafana.net/plugins
![Page 48: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/48.jpg)
Supports multiple notification channels for alerting
![Page 49: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/49.jpg)
Other features…
• Alert lists
• Drilldown links
• Template variables
• Dashboard snapshots
• Grafana.net community
• Grafana CLI
![Page 50: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/50.jpg)
Grafana links!
http://grafana.org/
http://docs.grafana.org/
https://github.com/grafana/grafana
https://raintank.slack.com
![Page 51: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/51.jpg)
![Page 52: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/52.jpg)
• Move • Copy • Extract • Insert • Remove
• Rows • Panels • Template variables • Dashboard tags
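Operations like these amount to edits on the dashboard's JSON document. As an illustration of what "copy a panel" involves, the sketch below works on a plain dictionary, assuming the row-based dashboard layout Grafana used at the time (`rows`, each with a `panels` list, each panel with a dashboard-unique `id`). It is not wizzy's implementation, and the function name and arguments are hypothetical.

```python
import copy


def copy_panel(dashboard, src_row, panel_index, dst_row):
    """Copy one panel into another row, giving the copy a fresh panel id."""
    panel = copy.deepcopy(dashboard["rows"][src_row]["panels"][panel_index])
    used_ids = [p["id"] for row in dashboard["rows"] for p in row["panels"]]
    panel["id"] = max(used_ids) + 1  # panel ids must stay unique per dashboard
    dashboard["rows"][dst_row]["panels"].append(panel)
    return panel


dashboard = {
    "title": "Service overview",
    "rows": [
        {"title": "Traffic", "panels": [{"id": 1, "title": "Requests", "type": "graph"}]},
        {"title": "Errors", "panels": [{"id": 2, "title": "5xx", "type": "graph"}]},
    ],
}
new_panel = copy_panel(dashboard, src_row=0, panel_index=0, dst_row=1)
print(new_panel["id"], [p["title"] for p in dashboard["rows"][1]["panels"]])
```

A move is the same operation followed by removing the original; the deep copy keeps the two panels from sharing nested settings.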
![Page 53: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/53.jpg)
Version Control
• Dashboards
• Datasources
• Orgs
• Rows
• Panels
• Template variables
• Dashboard tags
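Version control works best when the same dashboard always serializes to the same text, so that diffs show real changes rather than reordered keys. A minimal sketch of that idea, assuming dashboards are available as Python dictionaries (how they are fetched is out of scope here, and this is not how wizzy itself stores files):

```python
import json


def dashboard_to_text(dashboard: dict) -> str:
    """Serialize a dashboard with sorted keys and fixed indentation for stable diffs."""
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


a = {"title": "Service overview", "tags": ["prod"], "rows": []}
b = {"rows": [], "tags": ["prod"], "title": "Service overview"}

# Same content in a different key order produces byte-identical text.
print(dashboard_to_text(a) == dashboard_to_text(b))  # True
```

Writing one such file per dashboard into a Git repository gives history, review, and rollback for dashboards with no extra tooling.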
![Page 54: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/54.jpg)
Grafana in multiple environments
• Production
• Staging
• Testing
• Development
![Page 55: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/55.jpg)
Generate GIFs of important dashboards
• Last 24 hours
• By a dashboard tag
• Customized dashboard list
![Page 56: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/56.jpg)
Generate GIFs of important dashboards
![Page 57: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/57.jpg)
External features
• Upload dashboards to, store them in, and download them from AWS S3
• Search and download community dashboards from Grafana.net
![Page 58: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/58.jpg)
![Page 59: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/59.jpg)
wizzy links!
https://utkarshcmu.github.io/wizzy-site/
https://utkarshcmu.github.io/wizzy-site/home/
https://github.com/utkarshcmu/wizzy
https://raintank.slack.com/messages/wizzy/
![Page 60: Monitoring using Open source technologies](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ee92a31a28abcb0f8b4627/html5/thumbnails/60.jpg)
Utkarsh Bhatnagar
• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation)
• An active contributor to Grafana
• Project initiator for wizzy – a user-friendly CLI tool for Grafana
GitHub - https://github.com/utkarshcmu
Email – [email protected]
GrafanaCon 2016 Speaker - https://www.youtube.com/watch?v=llRhdvV25rg