[Metrics Rewrite] implement monitored resource mapping #252
Branch updated from 1b0b31e to 0b14a5d, then to 99af1e6, then to feca90e.
```go
// resource keys for a given monitored resource type. For entries with multiple OTel
// resource keys, the keys' values will be coalesced in order until there is a non-empty
// value.
monitoredResourceMappings = map[string]map[string][]string{
```
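The coalescing rule described in that comment can be sketched as follows. This is a minimal illustration, not the PR's code: it uses plain string attribute keys (for example "cloud.availability_zone") in place of the semconv constants, and the label names and mapping entries in `main` are hypothetical.

```go
package main

import "fmt"

// labelMapping maps each monitored resource label to an ordered list of
// OTel resource attribute keys. It mirrors the inner map of the PR's
// map[string]map[string][]string; the entries used below are illustrative.
type labelMapping map[string][]string

// coalesce returns the value of the first key in keys that has a non-empty
// value in attrs, or "" if none of them do.
func coalesce(attrs map[string]string, keys []string) string {
	for _, k := range keys {
		if v := attrs[k]; v != "" {
			return v
		}
	}
	return ""
}

// mapLabels builds monitored resource labels from resource attributes by
// coalescing each label's candidate keys in order.
func mapLabels(attrs map[string]string, m labelMapping) map[string]string {
	labels := make(map[string]string, len(m))
	for label, keys := range m {
		labels[label] = coalesce(attrs, keys)
	}
	return labels
}

func main() {
	// Hypothetical mapping: zone is preferred, region is the fallback.
	m := labelMapping{
		"location": {"cloud.availability_zone", "cloud.region"},
		"node_id":  {"host.id", "host.name"},
	}
	attrs := map[string]string{
		"cloud.region": "us-central1",
		"host.name":    "my-host",
	}
	labels := mapLabels(attrs, m)
	fmt.Println(labels["location"]) // us-central1
	fmt.Println(labels["node_id"])  // my-host
}
```

Because the candidate keys are an ordered slice, adding a fallback (such as region after zone) is a data change rather than a code change.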
How is this intended to be used with configuration from the user? Should we allow users to continue performing MR mapping via config as well, and if so, is there a way for that to override this behavior?
Do we expect that to solely be via map[string]map[string][]string manipulation of defaults w/ config?
> Do we expect that to solely be via map[string]map[string][]string manipulation of defaults w/ config?
That would work, except we need an extra field for "discriminating" the incoming resource to one of the entries in the map. For the default logic here, this is cloud.platform, but we have special cases for differentiating k8s_{container,pod,node,cluster} and for the fallbacks.
We could provide a full escape-hatch function config that lets users (probably first-party) override this whole thing in a custom build if needed.
But for now, can we punt on this until we understand the use cases better?
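To make the "discriminator" point concrete, here is a rough sketch of how cloud.platform could select a monitored resource type, with the k8s_* special cases and generic fallbacks. The platform strings are OTel semantic-convention values for cloud.platform; the specific k8s.* checks and the fallback rule (service.name plus service.instance.id selects generic_task) are assumptions for illustration, not the PR's actual logic.

```go
package main

import "fmt"

// Monitored resource type names used by this sketch.
const (
	gceInstance  = "gce_instance"
	k8sContainer = "k8s_container"
	k8sPod       = "k8s_pod"
	k8sNode      = "k8s_node"
	k8sCluster   = "k8s_cluster"
	genericTask  = "generic_task"
	genericNode  = "generic_node"
)

// resourceType picks a monitored resource type for a set of resource
// attributes. cloud.platform is the primary discriminator; GKE resources
// are further split by which k8s.* attributes are present (assumed rule),
// and anything unrecognized falls back to generic_task or generic_node.
func resourceType(attrs map[string]string) string {
	switch attrs["cloud.platform"] {
	case "gcp_compute_engine":
		return gceInstance
	case "gcp_kubernetes_engine":
		switch {
		case attrs["k8s.container.name"] != "":
			return k8sContainer
		case attrs["k8s.pod.name"] != "":
			return k8sPod
		case attrs["k8s.node.name"] != "":
			return k8sNode
		default:
			return k8sCluster
		}
	}
	// Hypothetical fallback: a task if a service instance is identified,
	// otherwise a node.
	if attrs["service.name"] != "" && attrs["service.instance.id"] != "" {
		return genericTask
	}
	return genericNode
}

func main() {
	fmt.Println(resourceType(map[string]string{
		"cloud.platform": "gcp_kubernetes_engine",
		"k8s.pod.name":   "frontend-abc",
	})) // k8s_pod
	fmt.Println(resourceType(map[string]string{})) // generic_node
}
```

This is why a plain map override is not enough on its own: the map only says how to fill labels once the type is known, while the type selection is separate branching logic.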
```go
	instanceID: {semconv.AttributeHostID},
},
k8sContainer: {
	location: {semconv.AttributeCloudAvailabilityZone},
```
There's a comment in the spec from @dashpole about the value of this field for regional clusters. I don't quite understand how resource detection would work for regional clusters, but this may need to use zone first and then fall back to region when zone isn't populated for regional clusters. That said, I'm fine with starting with only zone until the resource detection piece is figured out.
> Though I'm good with starting with only zone for now until the resource detection piece is figured out.
Let's do this for now, but I'll leave this comment unresolved.
```go
	taskID: {semconv.AttributeServiceInstanceID},
},
genericNode: {
	location: {semconv.AttributeCloudAvailabilityZone, semconv.AttributeCloudRegion},
```
The spec says that on the last fall-through, the location should be populated with "global" if the semantic convention labels aren't populated, because writing with location set to an empty string is an error. Do you think we need to handle that case?
I think we can't use global if we default to workload.googleapis.com, right? Do we need to update the spec?
Fixed for generic_task/generic_node. For the other resources, the location really should be present, so I'll leave those empty to surface the error.
Setting the value of the "location" label to "global" is still allowed under workload.googleapis.com.
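A minimal sketch of the location behavior settled on in this thread, using a hypothetical helper `locationFor` (not the PR's code): coalesce the location keys configured for the resource type, default to "global" only for generic_task and generic_node, and otherwise leave the label empty so the write fails and surfaces the missing attribute.

```go
package main

import "fmt"

// locationFor resolves the "location" label for a monitored resource type.
// keys are the candidate OTel attribute keys in priority order. Only the
// generic_task and generic_node fallbacks default to "global"; for every
// other type an empty value is returned so the write error is surfaced.
func locationFor(resourceType string, attrs map[string]string, keys []string) string {
	for _, k := range keys {
		if v := attrs[k]; v != "" {
			return v
		}
	}
	if resourceType == "generic_task" || resourceType == "generic_node" {
		return "global"
	}
	return ""
}

func main() {
	zoneThenRegion := []string{"cloud.availability_zone", "cloud.region"}
	fmt.Println(locationFor("generic_node", map[string]string{}, zoneThenRegion))                              // global
	fmt.Println(locationFor("generic_task", map[string]string{"cloud.region": "us-west1"}, zoneThenRegion)) // us-west1
	fmt.Printf("%q\n", locationFor("k8s_container", map[string]string{}, []string{"cloud.availability_zone"})) // ""
}
```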
* Skip all fixture tests (#239)
* Initial structure for new pdata metrics exporter (#238)
* [Metrics Rewrite] add outline with todos for fragmenting work (#240)
* [Metrics Rewrite] attribute to label mapping (#243)
  [Metrics Rewrite] attribute to label mapping
* [Metrics Rewrite] support for pdata Sum points (#242)
  * [Metrics Rewrite] support for pdata Sum points
  * update breaking-changes.md
  * use concatenation instead of sprintf
* [Metrics Rewrite] support for pdata Gauge points (#244)
* Add logic to translate metric descriptors and initial flow (#247)
  * Fixes from merge.
  * Fix tests.
  * Clean up test cases, re-disable integration tests.
  * Add summary descriptors and label descriptors.
  * Fix lint issues.
  * Some fixes from review.
  * Remove metric import.
  * Fixes from review.
    - Update default config method
    - Simplify some of my lack-of-go expertise.
  * Add unit test for metric domains.
  * Fixes from review.
  * Add breaking changes.
  * Fixes from review.
  * Update context to be TODO.
* Add support for exponential histograms and exemplars. (#251)
  * Add support for exponential histograms and exemplars.
  * Fixes from review.
  * Fixes from review.
  * Fixes from discussion.
* [Metrics Rewrite] implement monitored resource mapping (#252)
  * [Metrics Rewrite] implement monitored resource mapping
  * review fixes
* [Metrics Rewrite] update breaking-changes.md for monitored resource (#255)
* Add summary mapping to exporter. (#249)
* Add config to call `CreateServiceTimeSeries` (#259)
  * Initial implementation of create service time series.
  * Add a test case for create service timeseries.
  * Add logic to auto-detect project id if not configured.
  * Fix from code review
  * Fix resource to be one that has retention policy for integration tests.
* Add support for histogram to metrics exporter. (#258)
  BUG=210164184
* Re-enable ops-agent self-metric integration test. (#260)
* [Metrics Rewrite] add ExponentialHistogram fixture (#257)
  * [Metrics Rewrite] add ExponentialHistogram fixture
  * make tests deterministic
  * few last changes
  * close channel instead of sending a message
* Enable ops agent host metric integration test. (#264)
  - There is a bug in upstream agent-metric-processor that sets incorrect units on usage metrics (GoogleCloudPlatform/opentelemetry-operations-collector#72)
  - We update the expectations for inclusion of units in CreateTimeSeries
  - We disable metric descriptors (for now). Given the bug in agent-metric-processor, ops-agent will likely need an upstream fix for this first.
* add a feature gate, which defaults to false, for using the re-written exporter (#267)
* Enable Basic integration tests (#266)
  * Enable basic counter test.
  * Enable delta counter metrics.
    - Note: Delta counters are NOW fake-delta (i.e. cumulatives with limited time windows)
  * Enable non-monotonic-sum integration test.
  * Re-enable summary integration test and fix design issues in summary translation.
    - Summary exports percentiles, not quantiles
    - Percentiles should include similar double precision in the string.
* Fix recordfixtures script to use featuregate (#270)
* Skip already seen attribute keys when creating LabelDescriptors (#272)
* Reenable GKE metrics agent fixtures (#271)
* Update breaking-changes.md for googlecloudmonitoring/point_count self observability (#277)
* Move logging to use zap-logger and set up self-observability to match collector expectations. (#275)
* Enable metric prefix integration tests. (#274)
  * enable workloadapis prefix integration test.
  * update unknown domain metrics expect.
* Add instrumentationLibraryToLabels method to metrics exporter. (#253)
  * Add instrumentationLibraryToLabels method to metrics exporter.
    BUG=https://b.corp.google.com/issues/210164355
  * Remove custom_metrics_domains behaviour from metrics-exporter.
* Remove dependency on go.opentelemetry.io/collector (#279)
  * remove dependency on go.opentelemetry.io/collector
* add ocgrpc metrics to exporters' self-obs metrics (#280)
* Use OC stackdriver exporter to capture self observability metrics as GCM protos (#282)
* Capture ocgrpc self observability metrics (#283)
* make integrationtest not internal (#285)
* Remove internal/ prefix for integrationtest (#288)
* Add batching support to metrics-exporter. (#286)
  * Add batching support to metrics-exporter.
  * Retry when we fail to write metric descriptors.
* Re-enable workload metrics integration tests (#278)
* update header year for new files (#296)
* Document new CreateMetricDescriptor behavior (#294)
* reenable disabled metrics test (#299)

Co-authored-by: Aaron Abbott <aaronabbott@google.com>
Co-authored-by: Josh Suereth <Joshua.Suereth@gmail.com>
Co-authored-by: Thomas Barker <tbarker25@gmail.com>
Co-authored-by: Punya Biswal <punya@google.com>
Implements monitored resource mapping in a new file, along with a set of unit tests. Integration tests will come in a separate PR.
There are many breaking changes here relative to the previous OC mapping; I still need to update the markdown.