Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NH-60984 Add Otel Python SDK-based histogram for response_time #201

Merged
merged 44 commits into from
Oct 27, 2023

Conversation

tammy-baylis-swi
Copy link
Contributor

@tammy-baylis-swi tammy-baylis-swi commented Sep 27, 2023

This adds a new SolarWinds MeterManager as a place to store any Otel Python meters initialized at instrumented service startup. In this iteration we add one Otel Python SDK Histogram to represent the trace.service.response_time derived/"inbound" metric in SWO.

Example graphs of how this looks are screenshot in this doc: https://swicloud.atlassian.net/wiki/spaces/NIT/pages/3792503086/PoC+Otel+Metrics+to+recreate+Inbound+Metrics#trace.service.response_time

Also adds a new SolarWinds OTLPMetricsSpanProcessor to record to the histogram at the end of every service entry span. Right now this processor duplicates some of what the InboundMetricsSpanProcessor does to set the metrics attribute sw.transaction.

Configurator is updated to initialized all of the above at instrumented service startup. Span processors are registered in order of add_span_processor calls so InboundMetricsSpanProcessor is first and OTLPMetricsSpanProcessor is second when on_end is called.

User needs to set SW_APM_EXPERIMENTAL_OTEL_COLLECTOR as per #200 for the histogram to be created and recorded to. If not set, metrics instruments are not created nor is the span processor for otlp metrics recording published.

Next steps:

@tammy-baylis-swi tammy-baylis-swi changed the title NH-60984 add OTLP meter for response_time NH-60984 PoC: Otel Python histogram measures response_time Sep 27, 2023
@tammy-baylis-swi tammy-baylis-swi force-pushed the NH-60984-add-otlp-meters branch from 81165f4 to dbb6bbf Compare October 2, 2023 18:58
@tammy-baylis-swi tammy-baylis-swi changed the title NH-60984 PoC: Otel Python histogram measures response_time NH-60984 PoC: Otel Python histogram for response_time Oct 17, 2023
@tammy-baylis-swi tammy-baylis-swi force-pushed the NH-60984-add-otlp-meters branch from 4a39ae2 to 491cb99 Compare October 19, 2023 01:35
@tammy-baylis-swi tammy-baylis-swi changed the title NH-60984 PoC: Otel Python histogram for response_time NH-60984 Add Otel Python SDK-based histogram for response_time Oct 19, 2023
@@ -5,4 +5,4 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

"""Version of SolarWinds Custom-Distro for OpenTelemetry agents"""
__version__ = "0.18.0.1"
__version__ = "0.18.0.6"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a few testreleases to create APM Python lambda layer from testpypi and do some testing there.

@tammy-baylis-swi tammy-baylis-swi marked this pull request as ready for review October 19, 2023 19:40
@tammy-baylis-swi tammy-baylis-swi requested review from cheempz and a team October 19, 2023 19:40
Base automatically changed from NH-60984-load-metrics-entrypoints to main October 26, 2023 23:02
Copy link
Contributor

@cheempz cheempz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the write up and digestable PR as always @tammy-baylis-swi! I focused on the areas not marked by TODO and left a few thoughts.

def __init__(self, **kwargs: int) -> None:
# Returns a named `Meter` to handle instrument creation.
# A convenience wrapper for MeterProvider.get_meter
self.meter = metrics.get_meter("sw.apm.sampling.metrics")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the sw.apm.sampling.metrics instrumentation scope name should only be given to our "request counter" metrics, it tells the collector to send them to sample rate hopper. For the "response time" metrics we haven't talked about what to set, but I see @cleverchuk currently uses sw.apm.request.metrics which makes sense, I'll start a discussion and get this in our spec.

Copy link
Contributor Author

@tammy-baylis-swi tammy-baylis-swi Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed sw.apm.request.metrics here in f893bff at least for now! Will name a separate meter sw.apm.sampling.metrics in WIP PR that adds gauges for counters: #207 (EDIT: done in d1ec2ca)

class SolarWindsMeterManager:
def __init__(self, *args, **kwargs):
self.meter = None
self.response_time = OtelHistogram()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor curiosity, whether the OTel NoOpHistogram can be used instead.

Copy link
Contributor Author

@tammy-baylis-swi tammy-baylis-swi Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah!!! Changed in 0d9fd86 and 65a0f7d


# support ssa and conform to Otel proto common_pb2
meter_attrs = {
"sw.service_name": self.service_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to conform to the SWO response time metric tagging, right? If so, sw.service_name isn't needed because Python doesn't support service name override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh. Removed in f432626. Past Tammy also might have added it when trying to correlate metrics with service, before you found that it was that resource attribute needed instead!

in nanoseconds (ns). OTel span start/end_time are optional."""
if not start_time or not end_time:
return 0
return int((end_time - start_time) // 1e6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the apm-proto metrics we definitely needed to convert to microseconds, but OTel metrics model seems to support nanoseconds so how about let's not do any conversion and pass in just the start/end delta as-is and see what we get :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove the conversion and record (end_time - start_time) then the scale bar gets cranked up to the Millions range! What do you think?

OTLP metrics for response_time (on NA-02) without conversion:

otlp_response_time

...verses APM-proto inbound metrics for response_time (on NA-01):

apm-proto_response_time

@tammy-baylis-swi
Copy link
Contributor Author

Thank you for first pass @cheempz ! I've addressed them and posted a question about the start/end delta conversion if you could please take another look.

Copy link
Contributor

@cheempz cheempz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for the revisit and experiments @tammy-baylis-swi! I'll follow up with other teams on the unconverted nanosecond presentation in SWO :)

@tammy-baylis-swi tammy-baylis-swi merged commit aafcd20 into main Oct 27, 2023
11 checks passed
@tammy-baylis-swi tammy-baylis-swi deleted the NH-60984-add-otlp-meters branch October 27, 2023 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants