-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NH-60984 Add Otel Python SDK-based histogram for response_time #201
Changes from 37 commits
33cc277
e2bc6fe
20b3a41
cba0107
eb31a1d
d9018b1
2b27d33
8a1bf89
bf6a139
46749ec
dbb6bbf
c49cf2d
5cd1315
a076ce2
16e76cc
3e2d928
fa97c56
26b9764
3756e0a
c9500f4
e49d034
d6c6481
773be29
334ac6d
ea9f603
25cf96c
3966526
9fc242a
5db6722
6d11be2
47e51ea
3f8d287
49e1964
d9225b0
491cb99
e166e1f
120dd2a
c9a5696
de809f8
e375a8f
0d9fd86
f893bff
65a0f7d
f432626
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# © 2023 SolarWinds Worldwide, LLC. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | ||
|
||
import logging | ||
|
||
from opentelemetry import metrics | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class SolarWindsMeterManager: | ||
"""SolarWinds Python OTLP Meter Manager""" | ||
|
||
def __init__(self, **kwargs: int) -> None: | ||
# Returns a named `Meter` to handle instrument creation. | ||
# A convenience wrapper for MeterProvider.get_meter | ||
self.meter = metrics.get_meter("sw.apm.sampling.metrics") | ||
|
||
self.response_time = self.meter.create_histogram( | ||
name="trace.service.response_time", | ||
description="measures the duration of an inbound HTTP request", | ||
unit="ms", | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -188,3 +188,15 @@ class Config: | |
@staticmethod | ||
def getVersionString(): | ||
return "No extension loaded." | ||
|
||
|
||
class OtelHistogram: | ||
@staticmethod | ||
def record(*args, **kwargs): | ||
pass | ||
|
||
|
||
class SolarWindsMeterManager: | ||
def __init__(self, *args, **kwargs): | ||
self.meter = None | ||
self.response_time = OtelHistogram() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor curiosity, whether the OTel NoOpHistogram can be used instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,190 @@ | ||
# © 2023 SolarWinds Worldwide, LLC. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | ||
|
||
import logging | ||
import random | ||
from typing import TYPE_CHECKING, Any, Tuple | ||
|
||
from opentelemetry.metrics import get_meter_provider | ||
from opentelemetry.sdk.trace import SpanProcessor | ||
from opentelemetry.semconv.trace import SpanAttributes | ||
from opentelemetry.trace import SpanKind, StatusCode | ||
|
||
from solarwinds_apm.w3c_transformer import W3CTransformer | ||
|
||
if TYPE_CHECKING: | ||
from opentelemetry.sdk.trace import ReadableSpan | ||
|
||
from solarwinds_apm.apm_config import SolarWindsApmConfig | ||
from solarwinds_apm.apm_meter_manager import SolarWindsMeterManager | ||
from solarwinds_apm.apm_txname_manager import SolarWindsTxnNameManager | ||
|
||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class SolarWindsOTLPMetricsSpanProcessor(SpanProcessor): | ||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
_HTTP_METHOD = SpanAttributes.HTTP_METHOD # "http.method" | ||
_HTTP_ROUTE = SpanAttributes.HTTP_ROUTE # "http.route" | ||
_HTTP_STATUS_CODE = SpanAttributes.HTTP_STATUS_CODE # "http.status_code" | ||
_HTTP_URL = SpanAttributes.HTTP_URL # "http.url" | ||
|
||
_HTTP_SPAN_STATUS_UNAVAILABLE = 0 | ||
|
||
def __init__( | ||
self, | ||
apm_txname_manager: "SolarWindsTxnNameManager", | ||
apm_config: "SolarWindsApmConfig", | ||
apm_meters: "SolarWindsMeterManager", | ||
) -> None: | ||
self.apm_txname_manager = apm_txname_manager | ||
self.service_name = apm_config.service_name | ||
self.apm_meters = apm_meters | ||
|
||
# TODO Assumes SolarWindsInboundMetricsSpanProcessor.on_start | ||
# is called to store span ID for any custom txn naming | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
|
||
def on_end(self, span: "ReadableSpan") -> None: | ||
"""Calculates and reports OTLP trace metrics""" | ||
# Only calculate OTLP metrics for service entry spans | ||
parent_span_context = span.parent | ||
if ( | ||
parent_span_context | ||
and parent_span_context.is_valid | ||
and not parent_span_context.is_remote | ||
): | ||
return | ||
|
||
# support ssa and conform to Otel proto common_pb2 | ||
meter_attrs = { | ||
"sw.service_name": self.service_name, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is to conform to the SWO response time metric tagging, right? If so, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahhhh. Removed in f432626. Past Tammy also might have added it when trying to correlate metrics with service, before you found that it was that resource attribute needed instead! |
||
"sw.nonce": random.getrandbits(64) >> 1, | ||
} | ||
|
||
has_error = self.has_error(span) | ||
if has_error: | ||
meter_attrs.update({"sw.is_error": "true"}) | ||
else: | ||
meter_attrs.update({"sw.is_error": "false"}) | ||
|
||
# trans_name will never be None because always at least span.name | ||
trans_name, _ = self.calculate_transaction_names(span) | ||
|
||
is_span_http = self.is_span_http(span) | ||
span_time = self.calculate_span_time( | ||
span.start_time, | ||
span.end_time, | ||
) | ||
|
||
if is_span_http: | ||
status_code = self.get_http_status_code(span) | ||
request_method = span.attributes.get(self._HTTP_METHOD, None) | ||
meter_attrs.update( | ||
{ | ||
self._HTTP_STATUS_CODE: status_code, | ||
self._HTTP_METHOD: request_method, | ||
"sw.transaction": trans_name, | ||
} | ||
) | ||
else: | ||
meter_attrs.update({"sw.transaction": trans_name}) | ||
self.apm_meters.response_time.record( | ||
amount=span_time, | ||
attributes=meter_attrs, | ||
) | ||
|
||
# This does not cache txn_name for span export because | ||
# assuming SolarWindsInboundMetricsSpanProcessor does it | ||
# for SW-style trace export. This processor is for OTLP-style. | ||
# TODO: Cache txn_name for OTLP span export? | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
|
||
# Force flush metrics after every entry span via flush of all meters | ||
# including PeriodicExportingMetricReader | ||
logger.debug("Performing MeterProvider force_flush of metrics") | ||
get_meter_provider().force_flush() | ||
|
||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
def is_span_http(self, span: "ReadableSpan") -> bool: | ||
"""This span from inbound HTTP request if from a SERVER by some http.method""" | ||
if span.kind == SpanKind.SERVER and span.attributes.get( | ||
self._HTTP_METHOD, None | ||
): | ||
return True | ||
return False | ||
|
||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
def has_error(self, span: "ReadableSpan") -> bool: | ||
"""Calculate if this span instance has_error""" | ||
if span.status.status_code == StatusCode.ERROR: | ||
return True | ||
return False | ||
|
||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
def get_http_status_code(self, span: "ReadableSpan") -> int: | ||
"""Calculate HTTP status_code from span or default to UNAVAILABLE""" | ||
status_code = span.attributes.get(self._HTTP_STATUS_CODE, None) | ||
# Something went wrong in OTel or instrumented service crashed early | ||
# if no status_code in attributes of HTTP span | ||
if not status_code: | ||
# TODO change if refactor | ||
status_code = self._HTTP_SPAN_STATUS_UNAVAILABLE | ||
return status_code | ||
|
||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
# Disable pylint for compatibility with Python3.7 else TypeError | ||
def calculate_transaction_names( | ||
self, span: "ReadableSpan" | ||
) -> Tuple[Any, Any]: # pylint: disable=deprecated-typing-alias | ||
"""Get trans_name and url_tran of this span instance.""" | ||
url_tran = span.attributes.get(self._HTTP_URL, None) | ||
http_route = span.attributes.get(self._HTTP_ROUTE, None) | ||
trans_name = None | ||
custom_trans_name = self.calculate_custom_transaction_name(span) | ||
|
||
if custom_trans_name: | ||
trans_name = custom_trans_name | ||
elif http_route: | ||
trans_name = http_route | ||
elif span.name: | ||
trans_name = span.name | ||
return trans_name, url_tran | ||
|
||
# TODO Refactor for both inbound and otlp metrics | ||
# https://swicloud.atlassian.net/browse/NH-65061 | ||
def calculate_custom_transaction_name(self, span: "ReadableSpan") -> Any: | ||
"""Get custom transaction name for trace by trace_id, if any""" | ||
trans_name = None | ||
trace_span_id = W3CTransformer.trace_and_span_id_from_context( | ||
span.context | ||
) | ||
custom_name = self.apm_txname_manager.get(trace_span_id) | ||
if custom_name: | ||
trans_name = custom_name | ||
# TODO change if refactor | ||
# Assume SolarWindsInboundMetricsSpanProcessor is | ||
# active and is doing the removal of any | ||
# custom txn name from cache if not sampled, | ||
# so not doing it here for OTLP | ||
return trans_name | ||
|
||
def calculate_span_time( | ||
self, | ||
start_time: int, | ||
end_time: int, | ||
) -> int: | ||
"""Calculate span time in milliseconds (ms) using start and end time | ||
in nanoseconds (ns). OTel span start/end_time are optional.""" | ||
if not start_time or not end_time: | ||
return 0 | ||
return int((end_time - start_time) // 1e6) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the apm-proto metrics we definitely needed to convert to microseconds, but OTel metrics model seems to support nanoseconds so how about let's not do any conversion and pass in just the start/end delta as-is and see what we get :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,4 +5,4 @@ | |
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | ||
|
||
"""Version of SolarWinds Custom-Distro for OpenTelemetry agents""" | ||
__version__ = "0.18.0.1" | ||
__version__ = "0.18.0.6" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did a few testreleases to create APM Python lambda layer from testpypi and do some testing there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the
sw.apm.sampling.metrics
instrumentation scope name should only be given to our "request counter" metrics, it tells the collector to send them to sample rate hopper. For the "response time" metrics we haven't talked about what to set, but I see @cleverchuk currently usessw.apm.request.metrics
which makes sense, I'll start a discussion and get this in our spec.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
sw.apm.request.metrics
here in f893bff at least for now! Will name a separate metersw.apm.sampling.metrics
in WIP PR that adds gauges for counters: #207 (EDIT: done in d1ec2ca)