-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report profiling data in v2.4 intake format; compress files #53
Conversation
This was tested with both the Ruby (agent and agentless modes) and the PHP profilers. This also introduces a breaking API change: the `ddog_ProfileExporter_build` / `ProfileExporter::build` functions now take two additional arguments -- the profiling library name and version. Other than that change, using the v2.4 intake format is transparent to the libdatadog users. Thanks to @morrisonlevi for pairing with me on this. Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.
This isn't a thing in v2.4
c2ac732
to
61ccef9
Compare
@morrisonlevi I was able to test your lz4 changes successfully with the Ruby profiler as well. I had, of course, to disable my own compression -- funnily enough the intake would still accept profiles that were gzipped by Ruby + lz4'd by libdatadog but they never showed up on the profile list ;) |
profiling/src/exporter/mod.rs
Outdated
) | ||
let mut encoder = FrameEncoder::new(Vec::new()); | ||
encoder.write_all(file.bytes)?; | ||
form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the compression format defined in the payload ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK the intake just looks for magic bits to distinguish if an uploaded file is [gzip|lz4] compressed or not. Otherwise no changes needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ivo is right, according to Florian:
Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav experiments has shown that it's a good space/CPU tradeoff for the profiler
It's also is why I picked lz4. I didn't test or verify these claims beyond "works for me."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. There are indeed magic values at the start of the payload that you can use.
As a Datadog agnostic library, this is not super friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a Datadog agnostic library, this is not super friendly.
Do you mean, if other teams at Datadog want to reuse this, or if non-datadog people want to reuse this? I'd like to understand your concerns :)
I'm clarifying with Florian whether it's important that
Or if this is fine:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still waiting on Florian for a question about name vs filename. Aside from this, with fresh I eyes I think we should set profiling library name and version at the same place we set family, which is when we build the exporter, not the request, as this information is unlikely to change.
I also don't know how many people are re-using an exporter vs just making a new one so... it may be a moot point.
profiling/src/exporter/mod.rs
Outdated
@@ -120,26 +122,46 @@ impl ProfileExporter { | |||
} | |||
|
|||
/// Build a Request object representing the profile information provided. | |||
#[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is fine for now, but we should probably work on breaking this up when we add support for the extra attributes feature.
profiling/src/exporter/mod.rs
Outdated
) | ||
let mut encoder = FrameEncoder::new(Vec::new()); | ||
encoder.write_all(file.bytes)?; | ||
form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ivo is right, according to Florian:
Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav experiments has shown that it's a good space/CPU tradeoff for the profiler
It's also is why I picked lz4. I didn't test or verify these claims beyond "works for me."
Right, that makes sense. I'll probably not have time today, but I've made and note and will come back soon-ish™️ |
I confirmed that intake doesn't care about the form field name for files other than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but then again I wrote or paired for all of it, so that's expected ^_^. I'll let someone else review it.
) -> anyhow::Result<ProfileExporter> { | ||
) -> anyhow::Result<ProfileExporter> | ||
where | ||
F: Into<Cow<'static, str>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to alias to different types ?
Also the Cow 'static
lifetime seems strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of thinking Cow<str>
as a string that gets cloned on write, think of it instead as either a Borrowed string, or an Owned one. In this case, we can borrow a 'static
string because that will, by definition, live long enough. Doing so allows us to save some memory allocations if we can show the lifetime is static (such as PHP profiler calling this from Rust). However, if we can't prove it, then we need an Owned version that copies it; this is what will happen across the C FFI.
The reason for taking an Into<Cow<_>>
is so that you can pass a &str
or a String
or anything else that the compiler knows how to convert via into
or from
, making it nicer for the caller since they don't have to wrap it into a Cow<_>
. The reason for having it repeated 3 times is so each parameter can independently do this -- if we had only one type IntoCow: Into<Cow<'static, str>>
that they all used, then they'd all have to be the same type, which isn't so nice. For instance, maybe the library name is a &str
but the version is a String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Happy to integrate this in native !
What does this PR do?
This PR changes libdatadog to report profiling data using the v2.4 intake format. It also compresses the files
includingexcluding the newevent.json
file using lz4.This was tested with both the Ruby (agent and agentless modes) and the PHP profilers.
This also introduces two breaking changes:
ddog_ProfileExporter_new
/ProfileExporter::new
functions now take two additional arguments -- the profiling library name and version.We expect that other than the API change, using the v2.4 intake format and the added compression will be transparent to the libdatadog users.
Thanks to @morrisonlevi for pairing with me on this.
Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.
Motivation
Get libdatadog users to use intake v2.4, and lay the ground work for including attributes in the reported data.
Also save some bytes over the wire for files. A few stats:
- PHP CLI: running
composer create-project symfony/symfony-demo
resulted in a 40016 byte pprof which compressed to 18139 bytes in 141661 nanoseconds. That's a compression ratio of 2.2061 and a space savings of 54.67%.- Ruby github relenv test app:
Additional Notes
(Nothing)
How to test the change?
Validate that data can still be reported to the backend, in both agent as well as agentless modes.