[core] Collect run-time metrics #819
Conversation
Are context switch and CPU time just always going to be incrementing during runtime? Is there an alert or check you would do on either? Like if context switches are above/below a certain value? (Same for CPU time)
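For context, psutil (assuming that is where these values come from) exposes both as cumulative counters since process start, so on their own they only ever increase; a delta or rate would be needed for any threshold-style alert. A minimal sketch:

import psutil

proc = psutil.Process()
# both are running totals since the process started, so they grow monotonically
print(proc.num_ctx_switches())  # pctxsw(voluntary=..., involuntary=...)
print(proc.cpu_times())         # pcputimes(user=..., system=..., ...)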
Starting to dig in a little; I need to spend more time with this PR.
Compare 5a8435f to 06c6ba1
A bunch of minor comments. Seems a little over-engineered, but the direction in general is good.
For logging, please use ddtrace.internal.logger.get_logger and only use log.warn, log.error, and log.debug.
Also, for the runtime metrics worker we should get one worker per process, similar to our AgentWriter.
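A minimal sketch of the suggested logging usage (the flush_metrics helper and its client argument are illustrative, not part of the PR):

from ddtrace.internal.logger import get_logger

log = get_logger(__name__)

def flush_metrics(client, metrics):
    # hypothetical helper; only the warn/error/debug levels suggested above are used
    if client is None:
        log.warn('no dogstatsd client configured, dropping %d metrics', len(metrics))
        return
    log.debug('flushing %d runtime metrics', len(metrics))
    for name, value in metrics:
        client.gauge(name, value)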
- add data structures with tests
- add collectors for psutil and platform modules
- send metrics to agent statsd
- thanks brett :) Co-Authored-By: Kyle-Verhoog <kyle.verhoog@datadoghq.com>
ddtrace/bootstrap/sitecustomize.py
Outdated
@@ -97,6 +98,8 @@ def add_global_tags(tracer):
    opts["port"] = int(port)
    if priority_sampling:
        opts["priority_sampling"] = asbool(priority_sampling)
    if runtime_metrics_enabled:
If we set a default, then we'll always have this, unless they do DD_RUNTIME_METRICS= and the empty string is falsey.
Also, we don't really use "enabled" as a value for other things, do we? We should just use True as the default.
We should be able to change this to:
opts['collect_metrics'] = asbool(get_env('runtime_metrics', True))
We need to verify if we want this to be True or False by default.
nvm, I completely forgot how get_env works: it is get_env(<integration>, <name>), so this is DD_RUNTIME_METRICS_ENABLED and the default is None.
So this is totally fine to keep as-is! Sorry about any confusion!
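For illustration only, a rough sketch of how a helper shaped like get_env(<integration>, <name>) could map onto an environment variable; this is an assumption for clarity, not the actual ddtrace implementation:

import os

def get_env(integration, name, default=None):
    # assumed mapping: DD_<INTEGRATION>_<NAME>, e.g. DD_RUNTIME_METRICS_ENABLED
    key = 'DD_{}_{}'.format(integration.upper(), name.upper())
    return os.environ.get(key, default)

# get_env('runtime_metrics', 'enabled') reads DD_RUNTIME_METRICS_ENABLED and
# returns None when the variable is unset.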
ddtrace/tracer.py
Outdated
@@ -239,8 +262,66 @@ def start_span(self, name, child_of=None, service=None, resource=None, span_type
        # add it to the current context
        context.add_span(span)

        if service and service not in self._services:
This needs to be indented under the else from L226.
self._services is a set, we can do:
if service:
    self._services.add(service)
def test_worker_flush(self):
    self.worker.start()
    self.worker.flush()
Move self.worker.stop() here to keep from double flushing.
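A sketch of the suggested ordering, assuming the worker API shown in the test above:

def test_worker_flush(self):
    self.worker.start()
    self.worker.flush()
    # stop right after the manual flush so the background worker cannot
    # flush a second time before the assertions run
    self.worker.stop()
    # ... assertions on the received metrics follow ...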
# expect all metrics in default set are received
# DEV: dogstatsd gauges in form "{metric_name}:{value}|g"
self.assertSetEqual(
Add check here to confirm values match what is cached by the collections in the worker.
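A hedged sketch of that extra check, assuming packets arrive in the "{metric_name}:{value}|g" form; recv_packets and the worker's _collected cache are illustrative names, not the PR's actual API:

received = {}
for packet in self.recv_packets():  # hypothetical helper yielding raw gauge strings
    name, rest = packet.split(':', 1)
    value, _ = rest.split('|', 1)
    received[name] = float(value)

# compare against the values cached by the worker's collectors (hypothetical attribute)
for name, cached in self.worker._collected.items():
    self.assertEqual(received[name], float(cached))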
Co-Authored-By: majorgreys <tahir@tahirbutt.com>
… into kyle-verhoog/metrics
… into kyle-verhoog/metrics
@@ -3,6 +3,6 @@
from ddtrace import tracer

if __name__ == '__main__':
    assert tracer.dogstatsd.host == "172.10.0.1"
    assert tracer.dogstatsd.port == 8120
    assert tracer._dogstatsd_client.host == "172.10.0.1"
single quotes
Looks good, shipping behind a flag too, which is nice.
Noticed some double quotes usage... I need to get a flake8 plugin set up for that.
Co-Authored-By: majorgreys <tahir@tahirbutt.com>
In the likeness of DataDog/dd-trace-rb#677, this PR brings run-time metrics collection to the Python tracer.
Overview
Metrics are periodically polled from various source modules such as gc, platform and psutil and sent to Datadog (through the Agent) via dogstatsd. Some example metrics are:
runtime.python.thread_count
runtime.python.mem.rss
runtime.python.gc.gen1_count
Metrics are tagged with static values which are obtained on startup. These include values like:
runtime.python.lang_interpreter (cpython or jython or pypy, for example)
runtime.python.lang_version (3.7.2, for example)
What this (roughly) looks like in Datadog (TODO: better data):

TODOs
datadog module detected
psutil)