Add logic to translate metric descriptors and initial flow #247

Merged
merged 13 commits into from
Dec 22, 2021

Conversation

Contributor

@jsuereth jsuereth commented Dec 20, 2021

  • If you enable any of the end-to-end integration tests, you'll see metric descriptors sent to the dummy service.
  • Adds "known domains" configuration, so in the event we add new metric domain types (or system metrics define them), the metric name mapping logic can be configured.
  • Adds "CreateDefaults" method to metric config structure. This allows us to write methods that use the configuration without repeatedly checking for nil. Note: There's probably a better "Go" way to do this, let me know.
  • Updates createTimeSeries to call CreateTimeSeries. We'll need to figure out CreateServiceTimeSeries later.
  • Adds metric name/type/display name mappings (for updated version).
  • Add simple label-mapping (no label-descriptions possible)
  • Add constants for Summary mapping.

Not in this PR:

  • Add "legacy" flag for metric naming conventions that:
    • uses external.googleapis.com/OpenCensus/
    • sets display name to original metric name (or last part of the path).
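For reviewers, the "known domains" idea from the description can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `metricConfig` type, the `metricNameToType` function, the matching rule (first path segment equals or ends with a known domain) and the prefix value used in `main` are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// metricConfig is a hypothetical stand-in for the exporter's metric config.
type metricConfig struct {
	Prefix       string   // assumed prefix applied to names outside known domains
	KnownDomains []string // domains whose metric names are passed through unchanged
}

// metricNameToType keeps names that already live under a known domain and
// places every other name under the configured prefix.
func metricNameToType(cfg metricConfig, name string) string {
	// Treat everything before the first "/" as the domain part of the name.
	host := name
	if i := strings.Index(name, "/"); i >= 0 {
		host = name[:i]
	}
	for _, d := range cfg.KnownDomains {
		if host == d || strings.HasSuffix(host, "."+d) {
			return name
		}
	}
	return cfg.Prefix + "/" + name
}

func main() {
	cfg := metricConfig{
		Prefix:       "workload.googleapis.com", // assumed value for illustration
		KnownDomains: []string{"googleapis.com", "kubernetes.io"},
	}
	for _, n := range []string{"requests", "kubernetes.io/container/cpu"} {
		fmt.Println(n, "->", metricNameToType(cfg, n))
	}
}
```

Keeping the domain list in configuration means a new domain type only needs a config change, which is the motivation given in the description.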

@jsuereth jsuereth requested review from punya and aabmass December 20, 2021 15:36
@jsuereth jsuereth marked this pull request as ready for review December 20, 2021 15:36
@aabmass aabmass self-assigned this Dec 20, 2021
exporter/collector/metricsexporter.go (outdated)
Comment on lines 72 to 73
// Updates config object to include all defaults for metric export.
func (cfg *Config) SetMetricDefaults() {
Contributor

@aabmass aabmass Dec 20, 2021

You can add the defaults here

func createDefaultConfig() config.Exporter {

and then update the unit tests to use this method instead of &Config{}

Contributor Author

I wasn't sure how to go from config.Exporter back to Config. PTAL.

Contributor

I think what you did is fine.

The other option is to leave the createDefaultConfig returning a config.Exporter and type assert metricMapper{cfg: createDefaultConfig().(*Config)} in the tests.
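A minimal sketch of the type-assertion option, for anyone less familiar with the pattern. The `Exporter` interface here is a reduced stand-in for the collector's `config.Exporter`, and the `Config` fields and default value are assumptions made up for the example.

```go
package main

import "fmt"

// Exporter is a hypothetical, reduced stand-in for config.Exporter.
type Exporter interface {
	Validate() error
}

// MetricConfig and Config are simplified placeholders for the real structs.
type MetricConfig struct {
	Prefix string
}

type Config struct {
	MetricConfig MetricConfig
}

// Validate makes *Config satisfy the Exporter interface.
func (cfg *Config) Validate() error { return nil }

// createDefaultConfig returns the interface type, as the factory does,
// with defaults already filled in.
func createDefaultConfig() Exporter {
	return &Config{
		MetricConfig: MetricConfig{Prefix: "workload.googleapis.com"}, // assumed default
	}
}

type metricMapper struct {
	cfg *Config
}

func main() {
	// The comma-ok form avoids a panic if the concrete type ever changes.
	cfg, ok := createDefaultConfig().(*Config)
	if !ok {
		panic("createDefaultConfig did not return *Config")
	}
	mapper := metricMapper{cfg: cfg}
	fmt.Println(mapper.cfg.MetricConfig.Prefix)
}
```

In a test the single-value form `createDefaultConfig().(*Config)` is usually acceptable, since a panic there simply fails the test.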

Contributor

@aabmass aabmass left a comment

nits but LGTM

@@ -69,8 +91,12 @@ func newGoogleCloudMetricsExporter(
cfg: cfg,
client: client,
mapper: metricMapper{cfg},
mds: make(chan *metricpb.MetricDescriptor),
Contributor

Do we want any buffer for this channel? As is, one slow CreateMetricDescriptor call will block me.mds <- md

Contributor Author

In my mind CMD is optimistic. I want to give it the minimal amount of resources, and I don't care if any specific call fails. As things stand we basically have to include logic around it, but it's dubious whether we want it in the long run.

I'm happy having it on its own "thread" churning away slowly.

Contributor

To rephrase Aaron's point, the current implementation isn't really better than having the CMD call serialized with the CreateTimeSeries call. Either way, a CreateMetricDescriptor call can block the CreateTimeSeries calls. If you want CMD to be optimistic, you could have a buffered channel, but drop CMD calls if the buffer fills up. Otherwise, you probably don't need to bother with the extra goroutine.

Contributor

That makes sense, but without a buffer (or non-blocking write), L126 in pushMetrics() will block until the background goroutine reads from the channel

Contributor Author

Decided to go with a buffer of a few, and I do some pre-filtering of MDs before shoveling in the buffer. I think what we have now matches requirements.

Contributor

LGTM
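For reference, the "buffered channel, drop when full" approach suggested in this thread can be sketched like this. It is not the code that was merged: `metricDescriptor`, `exporter`, `offer` and `drain` are hypothetical names, and the buffer size of two is arbitrary.

```go
package main

import "fmt"

// metricDescriptor is a placeholder for *metricpb.MetricDescriptor.
type metricDescriptor struct {
	Type string
}

type exporter struct {
	mds chan *metricDescriptor
}

// offer queues md without blocking. It reports false when the buffer is
// full and the descriptor was dropped.
func (e *exporter) offer(md *metricDescriptor) bool {
	select {
	case e.mds <- md:
		return true
	default:
		return false
	}
}

// drain reads queued descriptors until the channel is closed and sends each
// metric type at most once. It returns how many descriptors were sent.
func (e *exporter) drain(send func(*metricDescriptor) error) int {
	seen := map[string]bool{}
	sent := 0
	for md := range e.mds {
		if md == nil || seen[md.Type] {
			continue
		}
		if err := send(md); err != nil {
			continue // optimistic: a failed call is simply skipped
		}
		seen[md.Type] = true
		sent++
	}
	return sent
}

func main() {
	e := &exporter{mds: make(chan *metricDescriptor, 2)}
	for _, t := range []string{"a", "a", "b"} {
		fmt.Println(t, "queued:", e.offer(&metricDescriptor{Type: t}))
	}
	close(e.mds)
	fmt.Println("sent:", e.drain(func(*metricDescriptor) error { return nil }))
}
```

The `select` with a `default` case is what keeps the time series path from waiting on a slow descriptor call; the cost is that a descriptor may be dropped and only sent on a later push.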

Contributor

aabmass commented Dec 21, 2021

timeSeries = append(timeSeries, me.mapper.metricToTimeSeries(monitoredResource, extraLabels, metric)...)
}
}
}

// TODO: self observability
// TODO: Figure out how to configure service time series calls.
if false {
Contributor Author

@aabmass @dashpole I added this as a reminder that I don't think we have an open bug to support service timeseries calls in this rework. Would one of you mind confirming/opening appropriately? I think David has the most context on what we need here.

Contributor

You can link to #225.

Contributor

I assigned the bug to @jsuereth and pulled into the sprint

Contributor Author

sounds good, I'll work on that next.

// prior to shutdown.
for md := range me.mds {
// Not yet sent, now we sent it.
// TODO - check to see if this is a service/system metric and doesn't send descriptors.
Contributor Author

FYI - OpenCensus had logic to NOT send metric descriptors on certain domains when we suspect they are "system" (or service) metrics. I can add that in this CL or a follow on. It's related to the design of how we want to handle CreateServiceTimeSeries vs. CreateTimeSeries calls.

cc @dashpole

Contributor

That makes sense. It would be nice if CreateServiceTimeSeries == don't create MD so we can simplify the implementation

Contributor

I'd rather not implement that logic if we don't have to

@@ -58,9 +76,14 @@ func newGoogleCloudMetricsExporter(
) (component.MetricsExporter, error) {
setVersionInUserAgent(cfg, set.BuildInfo.Version)

// TODO: map cfg options into metric service client configuration with
// map cfg options into metric service client configuration with
Contributor

nit: this comment can probably be removed entirely.

exporter/collector/metricsexporter.go (outdated)
// Not yet sent, now we sent it.
// TODO - check to see if this is a service/system metric and doesn't send descriptors.
if !me.cfg.MetricConfig.SkipCreateMetricDescriptor && md != nil && mdCache[md.Type] == nil {
err := me.exportMetricDescriptor(context.Background(), md)
Contributor

You probably want a timeout for this ctx, or is this handled automatically by the client library?

Contributor Author

I believe we should be able to configure that on gRPC clients and it'll attach to an existing timeout if one exists, or create one. However, I'm rather weak at Go, as you know :)

Contributor

I wasn't sure so looked around: https://pkg.go.dev/cloud.google.com/go#hdr-Timeouts_and_Cancellation TLDR; the client lib will set a default timeout if the context doesn't have one already.

@dashpole probably knows better, is the default reasonable for this or should we hardcode something?

Contributor

I think there is a timeout in the googlecloud exporter's config. I'd expect that to apply to the CMD call as well. Otherwise, I'd expect it to use the default timeout.

Contributor

@aabmass aabmass Dec 21, 2021

Discussed in meeting, would be better to send the context in the channel from the other goroutine. Fine to do in a follow up PR.
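One possible shape for that follow-up, sketched under assumptions: the channel element carries the caller's context alongside the descriptor, and the worker derives a bounded context before making the call. The `request` type, `exportWithTimeout`, `slowCall` and the 10ms timeout are all hypothetical and only illustrate the pattern.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// metricDescriptor is a placeholder for *metricpb.MetricDescriptor.
type metricDescriptor struct {
	Type string
}

// request pairs a descriptor with the context of the goroutine that queued it.
type request struct {
	ctx context.Context
	md  *metricDescriptor
}

// exportWithTimeout derives a bounded context from the request's context so
// that a slow call cannot stall the worker indefinitely.
func exportWithTimeout(
	req request,
	timeout time.Duration,
	call func(context.Context, *metricDescriptor) error,
) error {
	ctx, cancel := context.WithTimeout(req.ctx, timeout)
	defer cancel()
	return call(ctx, req.md)
}

// slowCall simulates an RPC that takes one second but honors cancellation.
func slowCall(ctx context.Context, _ *metricDescriptor) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runDemo sends one request through exportWithTimeout with a short timeout.
func runDemo() error {
	req := request{
		ctx: context.Background(),
		md:  &metricDescriptor{Type: "workload.googleapis.com/requests"},
	}
	return exportWithTimeout(req, 10*time.Millisecond, slowCall)
}

func main() {
	err := runDemo()
	fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded))
}
```

Because `context.WithTimeout` keeps the earlier of the parent's deadline and the new one, a deadline already set by the caller is still respected.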

@jsuereth jsuereth merged commit 20a5d3e into col-exporter-rewrite Dec 22, 2021
@jsuereth jsuereth deleted the wip-suereth/metric-descriptors branch December 22, 2021 15:51
@dashpole dashpole mentioned this pull request Feb 1, 2022
dashpole added a commit that referenced this pull request Feb 2, 2022
* Skip all fixture tests (#239)

* Initial structure for new pdata metrics exporter (#238)

* [Metrics Rewrite] add outline with todos for fragmenting work (#240)

* [Metrics Rewrite] attribute to label mapping (#243)

[Metrics Rewrite] attribute to label mapping

* [Metrics Rewrite] support for pdata Sum points (#242)

* [Metrics Rewrite] support for pdata Sum points

* update breaking-changes.md

* use concatenation instead of sprintf

* [Metrics Rewrite] support for pdata Gauge points (#244)

* Add logic to translate metric descriptors and initial flow (#247)

* Fixes from merge.

* Fix tests.

* Clean up test cases, re-disable integration tests.

* Add summary descriptors and label descriptors.

* Fix lint issues.

* Some fixes from review.

* Remove metric import.

* Fixes from review.
- Update default config method
- Simplify some of my lack-of-go expertise.

* Add unit test for metric domains.

* Fixes from review.

* Add breaking changes.

* Fixes from review.

* Update context to be TODO.

* Add support for exponential histograms and exemplars. (#251)

* Add support for exponential histograms and exemplars.

* Fixes from review.

* Fixes from review.

* Fixes from discussion.

* [Metrics Rewrite] implement monitored resource mapping (#252)

* [Metrics Rewrite] implement monitored resource mapping

* review fixes

* [Metrics Rewrite] update breaking-changes.md for monitored resource (#255)

* Add summary mapping to exporter. (#249)

* Add config to call `CreateServiceTimeSeries` (#259)

* Initial implementation of create service time series.

* Add a test case for create service timeseries.

* Add logic to auto-detect project id if not configured.

* Fix from code review

* Fix resource to be one that has retention policy for integration tests.

* Add support for histogram to metrics exporter. (#258)

BUG=210164184

* Re-enable ops-agent self-metric integration test. (#260)

* [Metrics Rewrite] add ExponentialHistogram fixture (#257)

* [Metrics Rewrite] add ExponentialHistogram fixture

* make tests deterministic

* few last changes

* close channel instead of sending a message

* Enable ops agent host metric integration test. (#264)

- There is a bug in upstream agent-metric-processor that sets incorrect units on usage metrics (GoogleCloudPlatform/opentelemetry-operations-collector#72)
- We update the expectations for inclusion of units in CreateTimeSeries
- We disable metric descriptors (for now). Given the bug in agent-metric-processor, ops-agent will likely need an upstream fix for this first.

* add a feature gate, which defaults to false, for using the re-written exporter (#267)

* Enable Basic integration tests (#266)

* Enable basic counter test.

* Enable delta counter metrics.

- Note: Delta counters are NOW fake-delta (i.e. cumulatives with limited time windows)

* Enable non-monotonic-sum integration test.

* Re-enable summary integration test and fix design issues in summary translation.

- Summary exports percentiles, not quantiles
- Percentiles should include similar double precision in the string.

* Fix recordfixtures script to use featuregate (#270)

* Skip already seen attribute keys when creating LabelDescriptors (#272)

* Reenable GKE metrics agent fixtures (#271)

* Update breaking-changes.md for googlecloudmonitoring/point_count self observability (#277)

* Move logging to use zap-logger and set up self-observability to match collector expectations. (#275)

* Enable metric prefix integration tests. (#274)

* enable workloadapis prefix integration test.

* update unknown domain metrics expect.

* Add instrumentationLibraryToLabels method to metrics exporter. (#253)

* Add instrumentationLibraryToLabels method to metrics exporter.

BUG=https://b.corp.google.com/issues/210164355

* Remove custom_metrics_domains behaviour from metrics-exporter.

* Remove dependency on go.opentelemetry.io/collector (#279)

* remove dependency on go.opentelemetry.io/collector

* add ocgrpc metrics to exporters' self-obs metrics (#280)

* Use OC stackdriver exporter to capture self observability metrics as GCM protos (#282)

* Capture ocgrpc self observability metrics (#283)

* make integrationtest not internal (#285)

* Remove internal/ prefix for integrationtest (#288)

* Add batching support to metrics-exporter. (#286)

* Add batching support to metrics-exporter.

* Retry when we fail to write metric descriptors.

* Re-enable workload metrics integration tests (#278)

* update header year for new files (#296)

* Document new CreateMetricDescriptor behavior (#294)

* reenable disabled metrics test (#299)

Co-authored-by: Aaron Abbott <aaronabbott@google.com>
Co-authored-by: Josh Suereth <Joshua.Suereth@gmail.com>
Co-authored-by: Thomas Barker <tbarker25@gmail.com>
Co-authored-by: Punya Biswal <punya@google.com>
3 participants