Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NH-60984 Add Otel Python SDK-based histogram for response_time #201

Merged
merged 44 commits into from
Oct 27, 2023
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
33cc277
(WIP) Add SolarWindsMeterManager
tammy-baylis-swi Sep 23, 2023
e2bc6fe
Factor out OTLPMetricsProcessor
tammy-baylis-swi Sep 25, 2023
20b3a41
Add sw.nonce attr to response_time record
tammy-baylis-swi Sep 25, 2023
cba0107
Fix time for response_time
tammy-baylis-swi Sep 25, 2023
eb31a1d
Add txn name methods to OTLP processor
tammy-baylis-swi Sep 26, 2023
d9018b1
Always use trans_name?
tammy-baylis-swi Sep 26, 2023
2b27d33
Rename response_time to same as apm proto
tammy-baylis-swi Sep 26, 2023
8a1bf89
lint
tammy-baylis-swi Sep 27, 2023
bf6a139
Update comment
tammy-baylis-swi Sep 27, 2023
46749ec
Add sw.service_name to metrics attrs
tammy-baylis-swi Sep 27, 2023
dbb6bbf
Add response_time attrs for sw.trace_span_mode, sw.data.module
tammy-baylis-swi Oct 2, 2023
c49cf2d
Test: add sw.trace_span_mode at should_sample
tammy-baylis-swi Oct 2, 2023
5cd1315
Merge branch 'main' into NH-60984-add-otlp-meters
tammy-baylis-swi Oct 3, 2023
a076ce2
Rm attempt to set otel attrs outside resource
tammy-baylis-swi Oct 5, 2023
16e76cc
Update Resource for MeterProvider
tammy-baylis-swi Oct 5, 2023
3e2d928
Merge branch 'main' into NH-60984-add-otlp-meters
tammy-baylis-swi Oct 6, 2023
fa97c56
Test: requests counter direct otlp
tammy-baylis-swi Oct 13, 2023
26b9764
Revert "Test: requests counter direct otlp"
tammy-baylis-swi Oct 13, 2023
3756e0a
Add ObservableGauge placeholder
tammy-baylis-swi Oct 14, 2023
c9500f4
Merge branch 'NH-60984-load-metrics-entrypoints' into NH-60984-add-ot…
tammy-baylis-swi Oct 17, 2023
e49d034
Fix Iterable import and add note
tammy-baylis-swi Oct 17, 2023
d6c6481
Set meter name
tammy-baylis-swi Oct 17, 2023
773be29
Add force flush of metrics
tammy-baylis-swi Oct 17, 2023
334ac6d
Revert "Fix Iterable import and add note"
tammy-baylis-swi Oct 17, 2023
ea9f603
Revert "Add ObservableGauge placeholder"
tammy-baylis-swi Oct 17, 2023
25cf96c
Testrelease 0.18.0.2
tammy-baylis-swi Oct 17, 2023
3966526
Testrelease 0.18.0.3
tammy-baylis-swi Oct 17, 2023
9fc242a
Testrelease 0.18.0.4
tammy-baylis-swi Oct 18, 2023
5db6722
Testrelease 0.18.0.5
tammy-baylis-swi Oct 18, 2023
6d11be2
Testrelease 0.18.0.6
tammy-baylis-swi Oct 18, 2023
47e51ea
Rm debug lines
tammy-baylis-swi Oct 18, 2023
3f8d287
Some cleanup
tammy-baylis-swi Oct 18, 2023
49e1964
otlpmetricsprocessor uses config service_name
tammy-baylis-swi Oct 18, 2023
d9225b0
Merge branch 'NH-60984-load-metrics-entrypoints' into NH-60984-add-ot…
tammy-baylis-swi Oct 19, 2023
491cb99
Tidy resource attrs for meterprovider
tammy-baylis-swi Oct 19, 2023
e166e1f
Update todos
tammy-baylis-swi Oct 19, 2023
120dd2a
Add noop meter manager
tammy-baylis-swi Oct 19, 2023
c9a5696
Update changelog
tammy-baylis-swi Oct 20, 2023
de809f8
Merge branch 'NH-60984-load-metrics-entrypoints' into NH-60984-add-ot…
tammy-baylis-swi Oct 20, 2023
e375a8f
Merge branch 'main' into NH-60984-add-otlp-meters
tammy-baylis-swi Oct 26, 2023
0d9fd86
Noop SolarWindsMeterManager inits NoOpHistogram
tammy-baylis-swi Oct 27, 2023
f893bff
Rename meter for response times
tammy-baylis-swi Oct 27, 2023
65a0f7d
Fix import
tammy-baylis-swi Oct 27, 2023
f432626
Rm unneeded meter_attr
tammy-baylis-swi Oct 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions solarwinds_apm/apm_meter_manager.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# © 2023 SolarWinds Worldwide, LLC. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

import logging

from opentelemetry import metrics

logger = logging.getLogger(__name__)


class SolarWindsMeterManager:
"""SolarWinds Python OTLP Meter Manager"""

def __init__(self, **kwargs: int) -> None:
# Returns a named `Meter` to handle instrument creation.
# A convenience wrapper for MeterProvider.get_meter
self.meter = metrics.get_meter("sw.apm.sampling.metrics")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the sw.apm.sampling.metrics instrumentation scope name should only be given to our "request counter" metrics, it tells the collector to send them to sample rate hopper. For the "response time" metrics we haven't talked about what to set, but I see @cleverchuk currently uses sw.apm.request.metrics which makes sense, I'll start a discussion and get this in our spec.

Copy link
Contributor Author

@tammy-baylis-swi tammy-baylis-swi Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed sw.apm.request.metrics here in f893bff at least for now! Will name a separate meter sw.apm.sampling.metrics in WIP PR that adds gauges for counters: #207 (EDIT: done in d1ec2ca)


self.response_time = self.meter.create_histogram(
name="trace.service.response_time",
description="measures the duration of an inbound HTTP request",
unit="ms",
)
12 changes: 12 additions & 0 deletions solarwinds_apm/apm_noop.py
Original file line number Diff line number Diff line change
Expand Up @@ -188,3 +188,15 @@ class Config:
@staticmethod
def getVersionString():
return "No extension loaded."


class OtelHistogram:
@staticmethod
def record(*args, **kwargs):
pass


class SolarWindsMeterManager:
def __init__(self, *args, **kwargs):
self.meter = None
self.response_time = OtelHistogram()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor curiosity, whether the OTel NoOpHistogram can be used instead.

Copy link
Contributor Author

@tammy-baylis-swi tammy-baylis-swi Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah!!! Changed in 0d9fd86 and 65a0f7d

62 changes: 58 additions & 4 deletions solarwinds_apm/configurator.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,12 +45,17 @@
INTL_SWO_SUPPORT_EMAIL,
)
from solarwinds_apm.apm_fwkv_manager import SolarWindsFrameworkKvManager
from solarwinds_apm.apm_meter_manager import SolarWindsMeterManager
from solarwinds_apm.apm_noop import Reporter
from solarwinds_apm.apm_noop import SolarWindsMeterManager as NoopMeterManager
from solarwinds_apm.apm_oboe_codes import OboeReporterCode
from solarwinds_apm.apm_txname_manager import SolarWindsTxnNameManager
from solarwinds_apm.inbound_metrics_processor import (
SolarWindsInboundMetricsSpanProcessor,
)
from solarwinds_apm.otlp_metrics_processor import (
SolarWindsOTLPMetricsSpanProcessor,
)
from solarwinds_apm.response_propagator import (
SolarWindsTraceResponsePropagator,
)
Expand All @@ -74,9 +79,22 @@ def _configure(self, **kwargs: int) -> None:
apm_txname_manager = SolarWindsTxnNameManager()
apm_fwkv_manager = SolarWindsFrameworkKvManager()
apm_config = SolarWindsApmConfig()

if not apm_config.get("experimental").get("otel_collector") is True:
logger.debug(
"Experimental otel_collector flag not configured. Creating meter manager as no-op."
)
apm_meters = NoopMeterManager()
else:
apm_meters = SolarWindsMeterManager()

reporter = self._initialize_solarwinds_reporter(apm_config)
self._configure_otel_components(
apm_txname_manager, apm_fwkv_manager, apm_config, reporter
apm_txname_manager,
apm_fwkv_manager,
apm_config,
reporter,
apm_meters,
)
# Report an status event after everything is done.
self._report_init_event(reporter, apm_config)
Expand All @@ -87,6 +105,7 @@ def _configure_otel_components(
apm_fwkv_manager: SolarWindsFrameworkKvManager,
apm_config: SolarWindsApmConfig,
reporter: "Reporter",
apm_meters: SolarWindsMeterManager,
) -> None:
"""Configure OTel sampler, exporter, propagator, response propagator"""
self._configure_sampler(apm_config)
Expand All @@ -95,6 +114,11 @@ def _configure_otel_components(
apm_txname_manager,
apm_config,
)
self._configure_otlp_metrics_span_processor(
apm_txname_manager,
apm_config,
apm_meters,
)
self._configure_exporter(
reporter,
apm_txname_manager,
Expand Down Expand Up @@ -154,6 +178,27 @@ def _configure_metrics_span_processor(
)
)

def _configure_otlp_metrics_span_processor(
self,
apm_txname_manager: SolarWindsTxnNameManager,
apm_config: SolarWindsApmConfig,
apm_meters: SolarWindsMeterManager,
) -> None:
"""Configure SolarWindsOTLPMetricsSpanProcessor"""
if not apm_config.get("experimental").get("otel_collector") is True:
logger.debug(
"Experimental otel_collector flag not configured. Not configuring OTLP metrics span processor."
)
return

trace.get_tracer_provider().add_span_processor(
SolarWindsOTLPMetricsSpanProcessor(
apm_txname_manager,
apm_config,
apm_meters,
)
)

def _configure_exporter(
self,
reporter: "Reporter",
Expand Down Expand Up @@ -269,10 +314,19 @@ def _configure_metrics_exporter(
reader = PeriodicExportingMetricReader(exporter)
metric_readers.append(reader)

# Consolidate in #201
resource = Resource.create({"service.name": apm_config.service_name})
# Use configured Resource attributes then merge with
# custom service.name and sw.trace_span_mode
resource = trace.get_tracer_provider().get_tracer(__name__).resource
sw_resource = Resource.create(
{
"sw.trace_span_mode": "otel",
"service.name": apm_config.service_name,
}
).merge(resource)

provider = MeterProvider(
resource=resource, metric_readers=metric_readers
resource=sw_resource,
metric_readers=metric_readers,
)
metrics.set_meter_provider(provider)

Expand Down
190 changes: 190 additions & 0 deletions solarwinds_apm/otlp_metrics_processor.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
# © 2023 SolarWinds Worldwide, LLC. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

import logging
import random
from typing import TYPE_CHECKING, Any, Tuple

from opentelemetry.metrics import get_meter_provider
from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.semconv.trace import SpanAttributes
from opentelemetry.trace import SpanKind, StatusCode

from solarwinds_apm.w3c_transformer import W3CTransformer

if TYPE_CHECKING:
from opentelemetry.sdk.trace import ReadableSpan

from solarwinds_apm.apm_config import SolarWindsApmConfig
from solarwinds_apm.apm_meter_manager import SolarWindsMeterManager
from solarwinds_apm.apm_txname_manager import SolarWindsTxnNameManager


logger = logging.getLogger(__name__)


class SolarWindsOTLPMetricsSpanProcessor(SpanProcessor):
# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
_HTTP_METHOD = SpanAttributes.HTTP_METHOD # "http.method"
_HTTP_ROUTE = SpanAttributes.HTTP_ROUTE # "http.route"
_HTTP_STATUS_CODE = SpanAttributes.HTTP_STATUS_CODE # "http.status_code"
_HTTP_URL = SpanAttributes.HTTP_URL # "http.url"

_HTTP_SPAN_STATUS_UNAVAILABLE = 0

def __init__(
self,
apm_txname_manager: "SolarWindsTxnNameManager",
apm_config: "SolarWindsApmConfig",
apm_meters: "SolarWindsMeterManager",
) -> None:
self.apm_txname_manager = apm_txname_manager
self.service_name = apm_config.service_name
self.apm_meters = apm_meters

# TODO Assumes SolarWindsInboundMetricsSpanProcessor.on_start
# is called to store span ID for any custom txn naming
# https://swicloud.atlassian.net/browse/NH-65061

def on_end(self, span: "ReadableSpan") -> None:
"""Calculates and reports OTLP trace metrics"""
# Only calculate OTLP metrics for service entry spans
parent_span_context = span.parent
if (
parent_span_context
and parent_span_context.is_valid
and not parent_span_context.is_remote
):
return

# support ssa and conform to Otel proto common_pb2
meter_attrs = {
"sw.service_name": self.service_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to conform to the SWO response time metric tagging, right? If so, sw.service_name isn't needed because Python doesn't support service name override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh. Removed in f432626. Past Tammy also might have added it when trying to correlate metrics with service, before you found that it was that resource attribute needed instead!

"sw.nonce": random.getrandbits(64) >> 1,
}

has_error = self.has_error(span)
if has_error:
meter_attrs.update({"sw.is_error": "true"})
else:
meter_attrs.update({"sw.is_error": "false"})

# trans_name will never be None because always at least span.name
trans_name, _ = self.calculate_transaction_names(span)

is_span_http = self.is_span_http(span)
span_time = self.calculate_span_time(
span.start_time,
span.end_time,
)

if is_span_http:
status_code = self.get_http_status_code(span)
request_method = span.attributes.get(self._HTTP_METHOD, None)
meter_attrs.update(
{
self._HTTP_STATUS_CODE: status_code,
self._HTTP_METHOD: request_method,
"sw.transaction": trans_name,
}
)
else:
meter_attrs.update({"sw.transaction": trans_name})
self.apm_meters.response_time.record(
amount=span_time,
attributes=meter_attrs,
)

# This does not cache txn_name for span export because
# assuming SolarWindsInboundMetricsSpanProcessor does it
# for SW-style trace export. This processor is for OTLP-style.
# TODO: Cache txn_name for OTLP span export?
# https://swicloud.atlassian.net/browse/NH-65061

# Force flush metrics after every entry span via flush of all meters
# including PeriodicExportingMetricReader
logger.debug("Performing MeterProvider force_flush of metrics")
get_meter_provider().force_flush()

# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
def is_span_http(self, span: "ReadableSpan") -> bool:
"""This span from inbound HTTP request if from a SERVER by some http.method"""
if span.kind == SpanKind.SERVER and span.attributes.get(
self._HTTP_METHOD, None
):
return True
return False

# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
def has_error(self, span: "ReadableSpan") -> bool:
"""Calculate if this span instance has_error"""
if span.status.status_code == StatusCode.ERROR:
return True
return False

# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
def get_http_status_code(self, span: "ReadableSpan") -> int:
"""Calculate HTTP status_code from span or default to UNAVAILABLE"""
status_code = span.attributes.get(self._HTTP_STATUS_CODE, None)
# Something went wrong in OTel or instrumented service crashed early
# if no status_code in attributes of HTTP span
if not status_code:
# TODO change if refactor
status_code = self._HTTP_SPAN_STATUS_UNAVAILABLE
return status_code

# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
# Disable pylint for compatibility with Python3.7 else TypeError
def calculate_transaction_names(
self, span: "ReadableSpan"
) -> Tuple[Any, Any]: # pylint: disable=deprecated-typing-alias
"""Get trans_name and url_tran of this span instance."""
url_tran = span.attributes.get(self._HTTP_URL, None)
http_route = span.attributes.get(self._HTTP_ROUTE, None)
trans_name = None
custom_trans_name = self.calculate_custom_transaction_name(span)

if custom_trans_name:
trans_name = custom_trans_name
elif http_route:
trans_name = http_route
elif span.name:
trans_name = span.name
return trans_name, url_tran

# TODO Refactor for both inbound and otlp metrics
# https://swicloud.atlassian.net/browse/NH-65061
def calculate_custom_transaction_name(self, span: "ReadableSpan") -> Any:
"""Get custom transaction name for trace by trace_id, if any"""
trans_name = None
trace_span_id = W3CTransformer.trace_and_span_id_from_context(
span.context
)
custom_name = self.apm_txname_manager.get(trace_span_id)
if custom_name:
trans_name = custom_name
# TODO change if refactor
# Assume SolarWindsInboundMetricsSpanProcessor is
# active and is doing the removal of any
# custom txn name from cache if not sampled,
# so not doing it here for OTLP
return trans_name

def calculate_span_time(
self,
start_time: int,
end_time: int,
) -> int:
"""Calculate span time in milliseconds (ms) using start and end time
in nanoseconds (ns). OTel span start/end_time are optional."""
if not start_time or not end_time:
return 0
return int((end_time - start_time) // 1e6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the apm-proto metrics we definitely needed to convert to microseconds, but OTel metrics model seems to support nanoseconds so how about let's not do any conversion and pass in just the start/end delta as-is and see what we get :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove the conversion and record (end_time - start_time) then the scale bar gets cranked up to the Millions range! What do you think?

OTLP metrics for response_time (on NA-02) without conversion:

otlp_response_time

...verses APM-proto inbound metrics for response_time (on NA-01):

apm-proto_response_time

2 changes: 1 addition & 1 deletion solarwinds_apm/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,4 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

"""Version of SolarWinds Custom-Distro for OpenTelemetry agents"""
__version__ = "0.18.0.1"
__version__ = "0.18.0.6"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a few testreleases to create APM Python lambda layer from testpypi and do some testing there.