GoCon Autumn (Story of our own Monitoring Agent in golang)
![Page 1: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/1.jpg)
Story of our own Monitoring Agent in golang
@dxhuy
LINE Corp
![Page 2: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/2.jpg)
Introduction
• @dxhuy
• Vietnamese
• Building the monitoring stack at LINE
![Page 3: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/3.jpg)
My goal today
• Join GoConference without going through the lottery
![Page 4: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/4.jpg)
My goal today
• Show that this is not 100% true
![Page 5: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/5.jpg)
Today's takeaways
→ Anatomy of a monitoring agent
→ How to design one
→ Challenges and lessons learned
![Page 6: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/6.jpg)
Monitoring Agent !?
![Page 7: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/7.jpg)
![Page 8: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/8.jpg)
![Page 9: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/9.jpg)
• A small application that runs on the host machine
• Collects host machine metrics
  • Request latency?
  • MySQL load?
  • Redis hit/miss rate?
  • ...
• Aggregates metrics (sum / avg / histogram ...)
• Sends them to a collector server → alerts / charts ...
• Examples: statsd / collectd / telegraf ...
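The aggregation step can be pictured as folding one flush interval's raw samples into a few summary values before anything is sent. This is a minimal illustration, not the agent's actual code; `Summary`, `Aggregate`, and the bucket bounds are hypothetical names chosen for this example.

```go
package main

import (
	"math"
	"sort"
)

// Summary is a hypothetical aggregate of one flush interval's samples.
type Summary struct {
	Count   int
	Sum     float64
	Avg     float64
	Max     float64 // -Inf when there were no samples
	Buckets []int   // Buckets[i] counts samples <= bounds[i]; the last slot is +Inf
}

// Aggregate folds raw samples into a Summary with non-cumulative
// histogram buckets over the given sorted upper bounds.
func Aggregate(samples []float64, bounds []float64) Summary {
	s := Summary{Max: math.Inf(-1), Buckets: make([]int, len(bounds)+1)}
	for _, v := range samples {
		s.Count++
		s.Sum += v
		if v > s.Max {
			s.Max = v
		}
		// Index of the first bound >= v; len(bounds) is the overflow bucket.
		s.Buckets[sort.SearchFloat64s(bounds, v)]++
	}
	if s.Count > 0 {
		s.Avg = s.Sum / float64(s.Count)
	}
	return s
}
```

For example, `Aggregate([]float64{1, 2, 2, 7}, []float64{1, 5})` yields a count of 4, a sum of 12, an average of 3, and bucket counts `[1 2 1]`.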
![Page 10: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/10.jpg)
![Page 11: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/11.jpg)
Not a generic log transfer tool
![Page 12: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/12.jpg)
Why not reuse existing technology?
• Scale problem
  • We need to write our own stack
• Heterogeneous environments problem
• Management problem
• Development velocity problem
![Page 13: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/13.jpg)
Let's start writing our own
![Page 14: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/14.jpg)
Language
![Page 15: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/15.jpg)
![Page 16: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/16.jpg)
![Page 17: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/17.jpg)
Features
![Page 18: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/18.jpg)
• Modularity (for user)
• Buffer (prevent data loss)
• Management friendly (for admin)
![Page 19: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/19.jpg)
Modularity
• What is modularity?
  • Easy to add new metrics from the user's point of view
  • Pluggable
![Page 20: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/20.jpg)
Modularity
• How?
  • Input: get metrics
  • Codec: understand metrics
  • Output: send metrics
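One way to picture the Input → Codec → Output split is as three stages that only agree on data types: raw bytes between input and codec, metrics between codec and output. This is a hypothetical sketch; `Metric`, `decodeLine`, `runPipeline`, and the `name value` line format are assumptions, not the agent's real types.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metric is a simplified, hypothetical stand-in for the agent's model.Metric.
type Metric struct {
	Name  string
	Value float64
}

// decodeLine plays the codec role: raw bytes in, metrics out.
// It assumes a hypothetical "name value" text line.
func decodeLine(raw []byte) ([]Metric, error) {
	f := strings.Fields(string(raw))
	if len(f) != 2 {
		return nil, fmt.Errorf("malformed line %q", raw)
	}
	v, err := strconv.ParseFloat(f[1], 64)
	if err != nil {
		return nil, err
	}
	return []Metric{{Name: f[0], Value: v}}, nil
}

// runPipeline wires input -> codec -> output. The input side pushes raw
// bytes into a channel; each stage knows only its neighbour's data type.
func runPipeline(input <-chan []byte, decode func([]byte) ([]Metric, error), output func([]Metric) error) error {
	for raw := range input {
		ms, err := decode(raw)
		if err != nil {
			continue // a real agent would log and count decode errors
		}
		if err := output(ms); err != nil {
			return err
		}
	}
	return nil
}
```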
![Page 21: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/21.jpg)
// Metric is the central model for imonD
type Metric struct {
	ProtocolVersion ProtocolVer
	Name            string
	Val             Value
	TimeStamp       time.Time
	Fingerprint     Fingerprint
	Type            MetricType
	Labels          map[string]string
}
![Page 22: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/22.jpg)
Input Plugin design
![Page 23: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/23.jpg)
Input Plugin design
• Three important things:
  • Process model
  • Plugin model
  • Collecting model (push vs pull)
![Page 24: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/24.jpg)
Process model
Single process vs. multiple processes
![Page 25: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/25.jpg)
Process model
- Adv: easy management / maintenance
- DisAdv: one bad plugin could affect the whole agent
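A common way to soften the single-process downside in Go, not necessarily what this agent does, is to run each plugin in its own goroutine and turn panics into errors. The sketch assumes a hypothetical `runIsolated` helper that receives plugins as plain functions keyed by name.

```go
package main

import (
	"fmt"
	"sync"
)

// runIsolated runs each plugin's collect function in its own goroutine and
// converts a panic into an error, so one crashing plugin does not take down
// the whole single-process agent.
func runIsolated(plugins map[string]func() error) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error)
	)
	for name, collect := range plugins {
		wg.Add(1)
		go func(name string, collect func() error) {
			defer wg.Done()
			var err error
			// Runs before wg.Done (defers are LIFO) and records the outcome.
			defer func() {
				if r := recover(); r != nil {
					err = fmt.Errorf("plugin %s panicked: %v", name, r)
				}
				mu.Lock()
				results[name] = err
				mu.Unlock()
			}()
			err = collect()
		}(name, collect)
	}
	wg.Wait()
	return results
}
```

Note the limit of this approach: `recover` only catches panics. A plugin that blocks forever, leaks goroutines, or exhausts memory still hurts every other plugin in the process, which is exactly the risk the slide points out.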
![Page 26: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/26.jpg)
Plugin model
Same language vs. embedded language
![Page 27: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/27.jpg)
Plugin model
- Adv: simple model, easier maintenance
- DisAdv: each time we add a new plugin, we need to restart the whole agent
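With a same-language plugin model, plugins are typically compiled into the agent and looked up by name from its config. The registry below is a common Go pattern shown as an assumption; `Register`, `Build`, `Kinds`, `Plugin`, and `memcachedPlugin` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin and Factory are hypothetical simplifications of the agent's plugin types.
type Plugin interface{ Name() string }

type Factory func(cfg map[string]string) (Plugin, error)

var registry = map[string]Factory{}

// Register is typically called from each plugin package's init().
func Register(kind string, f Factory) {
	if _, dup := registry[kind]; dup {
		panic("plugin registered twice: " + kind)
	}
	registry[kind] = f
}

// Build instantiates a plugin by the kind name found in the agent config.
func Build(kind string, cfg map[string]string) (Plugin, error) {
	f, ok := registry[kind]
	if !ok {
		return nil, fmt.Errorf("unknown plugin kind %q", kind)
	}
	return f(cfg)
}

// Kinds lists the registered plugin kinds in sorted order.
func Kinds() []string {
	kinds := make([]string, 0, len(registry))
	for k := range registry {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

// memcachedPlugin is a hypothetical example plugin.
type memcachedPlugin struct{ endpoint string }

func (m memcachedPlugin) Name() string { return "memcached(" + m.endpoint + ")" }

func init() {
	Register("memcached", func(cfg map[string]string) (Plugin, error) {
		return memcachedPlugin{endpoint: cfg["endpoint"]}, nil
	})
}
```

Because registration happens in `init()` at build time, a new plugin only becomes available after the binary is rebuilt and the agent restarted, which matches the disadvantage listed above.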
![Page 28: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/28.jpg)
// InputPlugin represents an input plugin.
type InputPlugin interface {
	Interval() config.Duration
	GracefulStop() error
	Name() string
	Type() InputType
}

// InputByte is an input that yields raw bytes, turned into metrics by its Decoder.
type InputByte interface {
	Decoder() codec.Decoder
	ReadBytesWithContext(ctx context.Context) ([]byte, error)
}

// InputMetrics is an input that yields already-parsed metrics.
type InputMetrics interface {
	ReadMetricsWithContext(ctx context.Context) (model.Metrics, error)
}
All plugins share the same interface
![Page 29: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/29.jpg)
Collecting model
Push vs. pull
![Page 30: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/30.jpg)
Collecting model
- Adv: less impact on the middleware, simple model
- DisAdv: the application needs to expose something to "pull" from (HTTP endpoint / file / ...)
![Page 31: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/31.jpg)
func (i *MemcachedInput) ReadMetricsWithContext(ctx context.Context) (model.Metrics, error) {
	// ...
	conn, err := net.DialTimeout("tcp", i.endpoint, i.timeout.Duration)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	// DialTimeout only bounds the connect; also bound the write and the reads.
	if err := conn.SetDeadline(time.Now().Add(i.timeout.Duration)); err != nil {
		return nil, err
	}

	_, err = conn.Write([]byte("stats\n"))
	if err != nil {
		return nil, err
	}
	// ...
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		text := scanner.Text()
		if text == "END" {
			break
		}
		// Split entries which look like: STAT time 1488291730
		entries := strings.Split(text, " ")
		if len(entries) == 3 {
			v, err := strconv.ParseInt(entries[2], 10, 64)
			if err != nil {
				log.Debug("invalid value %s", entries[2])
				continue
			}
			ms = append(ms, *model.NewMetric(
				entries[1],
				model.Value(float64(v)),
				time.Now(),
				model.GaugeType,
			))
		}
	}
	// A timeout or broken connection ends Scan() early; report it.
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	// ...
	return ms, nil
}
Pull sample: the agent contacts the server directly
![Page 32: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/32.jpg)
Codec Plugin / Output Plugin
![Page 33: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/33.jpg)
type Encoder interface {
	Encode(metrics model.Metrics) ([]byte, error)
	Name() string
}

type Decoder interface {
	Decode(input []byte) (model.Metrics, error)
	Name() string
}
Codec interface
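A codec only has to translate between bytes and metrics. The sketch below implements both interfaces with a hypothetical `name value` line format; `Metric`, `Metrics`, and `lineCodec` are simplified stand-ins for the agent's `model` types, not its actual codecs.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

type Metric struct {
	Name  string
	Value float64
}

type Metrics []Metric

type Encoder interface {
	Encode(metrics Metrics) ([]byte, error)
	Name() string
}

type Decoder interface {
	Decode(input []byte) (Metrics, error)
	Name() string
}

// lineCodec is a hypothetical codec: one "name value\n" line per metric.
type lineCodec struct{}

func (lineCodec) Name() string { return "line" }

func (lineCodec) Encode(ms Metrics) ([]byte, error) {
	var buf bytes.Buffer
	for _, m := range ms {
		if m.Name == "" || strings.ContainsAny(m.Name, " \t\r\n") {
			return nil, fmt.Errorf("invalid metric name %q", m.Name)
		}
		buf.WriteString(m.Name)
		buf.WriteByte(' ')
		buf.WriteString(strconv.FormatFloat(m.Value, 'g', -1, 64))
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func (lineCodec) Decode(input []byte) (Metrics, error) {
	var ms Metrics
	for _, line := range strings.Split(strings.TrimSpace(string(input)), "\n") {
		if line == "" {
			continue
		}
		f := strings.Fields(line)
		if len(f) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		v, err := strconv.ParseFloat(f[1], 64)
		if err != nil {
			return nil, err
		}
		ms = append(ms, Metric{Name: f[0], Value: v})
	}
	return ms, nil
}

// Compile-time checks that lineCodec satisfies both interfaces.
var (
	_ Encoder = lineCodec{}
	_ Decoder = lineCodec{}
)
```

`strconv.FormatFloat(v, 'g', -1, 64)` produces the shortest representation that parses back to the same `float64`, so values survive the round trip exactly.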
![Page 34: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/34.jpg)
// OutputPlugin represents an output plugin.
type OutputPlugin interface {
	WriteWithContext(ctx context.Context, metrics model.Metrics) error // cancellable write
	Encoder() codec.Encoder
	Interval() config.Duration
	GracefulStop() error
	WalReader() wal.LogReader
	Name() string
}
Output interface
![Page 35: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/35.jpg)
Buffer design
![Page 36: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/36.jpg)
Buffer design
• Each output maintains its own offset
• The offset is updated only when the output succeeds
![Page 37: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/37.jpg)
Buffer design
• Advantages
  • When an output fails, just roll back its index
  • Chunks are organized into segments (each segment ~1GB)
  • To clean up, just delete old segments that have already been consumed by all outputs
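The cleanup rule can be expressed as a small function: a segment is deletable once the slowest output's committed offset has passed the end of that segment. This is a sketch under assumptions; offsets are treated as abstract positions and `deletableSegments` is a hypothetical name.

```go
package main

// deletableSegments returns the start offsets of segments that every output
// has fully consumed. segStarts must be sorted; segment i covers
// [segStarts[i], segStarts[i+1]). The newest segment is never returned
// because it is still being written to.
func deletableSegments(segStarts []int64, outputOffsets map[string]int64) []int64 {
	if len(segStarts) < 2 || len(outputOffsets) == 0 {
		return nil
	}
	// The slowest output decides what is safe to delete.
	first := true
	var minOff int64
	for _, off := range outputOffsets {
		if first || off < minOff {
			minOff, first = off, false
		}
	}
	var out []int64
	for i := 0; i+1 < len(segStarts); i++ {
		if segStarts[i+1] <= minOff {
			out = append(out, segStarts[i])
		}
	}
	return out
}
```

For segments starting at 0, 1000, 2000, and 3000 with outputs committed at 2500 and 1000, only the segment starting at 0 can be removed: the lagging output still needs everything from offset 1000 on.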
![Page 38: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/38.jpg)
Buffer design
• Other concerns
  • Serialization
    • It's not hard to write your own serialization method (see link below)
  • mmap vs. file read
    • Not much difference in our case
    • mmap index management is cumbersome to write, because mappings have to be aligned to 2^n (page-size) boundaries
  • Concurrent write vs. synchronized write
    • Synchronized write for data safety
https://www.slideshare.net/dxhuy88/story-writing-byte-serializer-in-golang
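As an illustration of how little a hand-written serializer needs, here is a sketch of a fixed binary layout for one metric using `encoding/binary`. The layout (length-prefixed name, then the value's bits and a Unix-nanosecond timestamp, all big-endian) is an assumption, not the format described in the linked slides.

```go
package main

import (
	"encoding/binary"
	"errors"
	"io"
	"math"
)

type Metric struct {
	Name      string
	Value     float64
	Timestamp int64 // Unix nanoseconds
}

// marshal writes: uint16 name length | name bytes | float64 bits | int64 timestamp.
func marshal(m Metric) ([]byte, error) {
	if len(m.Name) > math.MaxUint16 {
		return nil, errors.New("metric name too long")
	}
	buf := make([]byte, 2+len(m.Name)+16)
	binary.BigEndian.PutUint16(buf, uint16(len(m.Name)))
	n := 2 + copy(buf[2:], m.Name)
	binary.BigEndian.PutUint64(buf[n:], math.Float64bits(m.Value))
	binary.BigEndian.PutUint64(buf[n+8:], uint64(m.Timestamp))
	return buf, nil
}

// unmarshal is the inverse of marshal; it rejects truncated input.
func unmarshal(b []byte) (Metric, error) {
	if len(b) < 2 {
		return Metric{}, io.ErrUnexpectedEOF
	}
	l := int(binary.BigEndian.Uint16(b))
	if len(b) < 2+l+16 {
		return Metric{}, io.ErrUnexpectedEOF
	}
	off := 2 + l
	return Metric{
		Name:      string(b[2:off]),
		Value:     math.Float64frombits(binary.BigEndian.Uint64(b[off:])),
		Timestamp: int64(binary.BigEndian.Uint64(b[off+8:])),
	}, nil
}
```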
![Page 39: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/39.jpg)
Buffer design
type LogReader interface {
	Read() (model.Metrics, error)
	Read1() (model.Metrics, error)
	CurrentOffset() int64
	SetOffset(int64) error
	Destroy() error
}

type LogWriter interface {
	Write(*model.Metrics) error
	LastOffset() int64
}
![Page 40: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/40.jpg)
Management friendly
• Monitoring the agents themselves is f**king hard
• Deploying agents at large scale is painful
![Page 41: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/41.jpg)
Potential risk
• Dying without anyone noticing
• Excessive resource consumption
• Buffer overflow
• Dirty data
• Resend storm
![Page 42: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/42.jpg)
Resend storm is awful
![Page 43: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/43.jpg)
How we solve those problems
• Expose agent state as an HTTP endpoint
  • and monitor all agents using Prometheus
• Monitor everything
  • Aliveness / CPU / Memory / Output lag
• Use a circuit breaker / jittered resends to prevent resend storms
![Page 44: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/44.jpg)
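// Close trips the breaker for autoOpenTime; it re-opens automatically afterwards.
// Assumes AutoOpenBreaker has a mu sync.Mutex field guarding state and closeTime,
// because autoOpen runs on another goroutine.
func (b *AutoOpenBreaker) Close() {
	log.Info("close breaker for %v", b.autoOpenTime)
	b.mu.Lock()
	b.state = CLOSE
	b.closeTime = time.Now()
	b.mu.Unlock()
	// time.AfterFunc avoids the ticker that `go` + time.Tick leaked on every call.
	time.AfterFunc(b.autoOpenTime, b.autoOpen)
}

func (b *AutoOpenBreaker) open() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.state = OPEN
}

func (b *AutoOpenBreaker) IsOpen() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state == OPEN
}

func (b *AutoOpenBreaker) autoOpen() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if time.Since(b.closeTime) < b.autoOpenTime {
		return // a later Close() restarted the window; its own timer will re-open
	}
	log.Info("auto open breaker after %v", b.autoOpenTime)
	b.state = OPEN
}
Circuit breaker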
![Page 45: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/45.jpg)
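func (i *Output) retry(left int, cancelCtx context.Context, f func() error) error {
	select {
	case <-cancelCtx.Done():
		return fmt.Errorf("got cancelled: %w", cancelCtx.Err())
	default: // not cancelled yet
	}

	// Jittered retry ("full jitter"): sleep a random number of seconds in
	// [0, min(capacity, base*2^attempt)), where attempt = maxRetry-left.
	m := math.Min(capacity, base*math.Pow(2.0, float64(maxRetry-left)))
	s := 0
	if int(m) > 0 { // rand.Intn panics if its argument is <= 0
		s = rand.Intn(int(m))
	}
	log.Debug("retry sleep %d second", s)
	// Unlike time.Sleep, this wait is interrupted by cancellation.
	select {
	case <-cancelCtx.Done():
		return fmt.Errorf("got cancelled: %w", cancelCtx.Err())
	case <-time.After(time.Duration(s) * time.Second):
	}

	// do some work....
}
jitter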
![Page 46: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/46.jpg)
Agent monitoring using prometheus / grafana
![Page 48: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/48.jpg)
Admin page
![Page 49: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/49.jpg)
Finally
• Golang is awesome
  • Quick to prototype, works everywhere
• Never, ever write your own agent
  • ... unless you have to
  • But it's fun, because there are a lot of problems
![Page 50: GOCON Autumn (Story of our own Monitoring Agent in golang)](https://reader034.vdocuments.mx/reader034/viewer/2022052117/5a6477777f8b9afc4d8b47ab/html5/thumbnails/50.jpg)
We're hiring