cache: capture metrics related to cache records and pruning #4476
base: master
@@ -0,0 +1,85 @@
package cache

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

const (
	instrumentationName      = "github.com/moby/buildkit/cache"
	metricCacheRecords       = "cache.records.count"
	metricCachePruneDuration = "cache.prune.duration"
)

type metrics struct {
	CacheRecords       metric.Int64ObservableGauge
	CachePruneDuration metric.Int64Histogram
	meter              metric.Meter
	regs               []metric.Registration
}

func newMetrics(cm *cacheManager, mp metric.MeterProvider) *metrics {
	m := &metrics{}

	var err error
	m.meter = mp.Meter(instrumentationName)

	m.CacheRecords, err = m.meter.Int64ObservableGauge(metricCacheRecords,
		metric.WithDescription("Number of cache records."),
	)
	if err != nil {
		otel.Handle(err)
	}

	m.CachePruneDuration, err = m.meter.Int64Histogram(metricCachePruneDuration,
		metric.WithDescription("Measures the duration of cache prune operations."),
		metric.WithUnit("ms"),
	)
	if err != nil {
		otel.Handle(err)
	}

	reg, err := m.meter.RegisterCallback(cm.collectMetrics, m.CacheRecords)
	if err != nil {
		otel.Handle(err)
	}
	m.regs = append(m.regs, reg)

	return m
}

func (m *metrics) MeasurePrune() (record func(ctx context.Context)) {
	start := time.Now()
	return func(ctx context.Context) {
		dur := int64(time.Since(start) / time.Millisecond)
		m.CachePruneDuration.Record(ctx, dur)
	}
}

func (m *metrics) Close() error {
	for _, reg := range m.regs {
		_ = reg.Unregister()
	}
	return nil
}

type cacheStats struct {
	NumRecords int64
}

func (cm *cacheManager) readStats() (stats cacheStats) {
	cm.mu.Lock()
	defer cm.mu.Unlock()

	stats.NumRecords = int64(len(cm.records))
	return stats
}

func (cm *cacheManager) collectMetrics(ctx context.Context, o metric.Observer) error {
Review comment: When does this get called?

Reply: This gets called automatically during a metrics collection interval, which is defined by the reader. So for the periodic reader, every 60 seconds by default. For the Prometheus reader, whenever the metrics are read. See the invocation of …

Review comment: Q: Does this mean that the calls are duplicated because there are 2 readers?

Reply: Just checked this. Short answer: yes. The call happens once per reader, so it runs each time a reader collects. Alternatively, if the call never happens because the metrics are never checked, it happens zero times, e.g. if the OTel collector isn't configured and the Prometheus endpoint doesn't exist or is never invoked.
	stats := cm.readStats()
	o.ObserveInt64(cm.metrics.CacheRecords, stats.NumRecords)
	return nil
}
Review comment: I was wondering if we should have a cache.records.size metric to collect the cache size, or maybe split it between cache.records.shared.size, cache.records.private.size, and cache.records.reclaimable.size?
Reply: I'm not really sure, to be honest. I do think that will likely take a bit more effort to implement, so I left it out of this initial version. The reason is that I wasn't quite sure how to determine the disk usage efficiently while taking mutable and immutable records into account. I figured immutable entries didn't need to be continuously updated, while mutable ones would have to be rechecked. I also didn't want to hold a lock during the disk-usage check or perform a potentially expensive computation to count the size.
I do think it's worth opening a new issue for cache-size metrics. I'd say all of the above are good metrics.
Reply: Oh right, that's a very good point; disk usage is indeed a resource-intensive call.