-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default runtime-id tag #415
Conversation
4f101d6
to
8899207
Compare
Benchmark results for collatzParameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics. See unchanged results
|
Benchmark results for BadBoggleSolver_runParameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics. See unchanged results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make this observation in a comment, but expanding on it here for no particular reason.
As far as I know, runtime-id
is used to fingerprint a precise instance of a service. The noted strategy works perfectly fine for the other profilers, since they are indeed attached to a single instance of a runtime and they can hook into runtime faculties to change the runtime-id
if that changes (e.g., atfork handlers).
On the other hand, this concept is slightly problematic for ddprof
due to the way we track process lineage/lifetime. The noted procedure is more of a profiler-id
. I just want to note this because our concept of what it means to be a runtime-id
was greatly influenced by how the existing profilers work, but this will absolutely need to be revisited in the future.
My take:
- This solution is fine because it's needed for timeline
- A more durable solution will be computed in the backend by hashing PID, container-id, and host. Why? Because this will uniquely identify processes, and that's what matters.
- Instead of PID, a better solution might be to compute something like
comm
name parity. In other words, if the parent/child PID sequence looks like bash -> bash -> python -> bash -> apache, the change sequence is 0 -> 0 -> 1 -> 2 -> 3 instead of 0 -> 0 -> 1 -> 0 -> 2. Something like that, at least. Why? Because the affected profilers can start in the middle of execution and may be restarted, so having a stable characterization is valuable.
In conclusion:
- This PR is fine, the current situation isn't our fault and the average ddprof user is basically just starting the profiler once anyway
- Some careful thought should be put into adapting this procedure for a profiler with scope of more than one service, and I think that would require some work from the backend.
- LGTM aside from the minor fixups in a separate PR. Approving now because I have nothing novel to add and I trust those comments will be addressed.
- Thank you!
ce42d47
to
df90080
Compare
Thanks for sharing. I agree, this is far from perfect. |
5c62fab
to
64cd1d6
Compare
Provide a default runtime_id This is to make it easier to browse the timeline data
64cd1d6
to
00eeec4
Compare
src/uuid.cc
Outdated
std::random_device rd; | ||
std::array<int, std::mt19937::state_size> seed_data; | ||
std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd)); | ||
std::seed_seq seq(std::begin(seed_data), std::end(seed_data)); | ||
std::mt19937 gen(seq); | ||
std::uniform_int_distribution<int> dis(0, 15); | ||
std::uniform_int_distribution<int> dis2(8, 11); | ||
for (int i = 0; i < 12; ++i) { | ||
data[i] = dis(gen); | ||
} | ||
|
||
// Set the version to 4 (UUID version 4) | ||
data[k_version_position] = k_version; | ||
for (int i = 13; i < 16; ++i) { | ||
data[i] = dis(gen); | ||
} | ||
// variant | ||
data[16] = dis2(gen); | ||
|
||
for (int i = 17; i < 32; ++i) { | ||
data[i] = dis(gen); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::random_device rd; | |
std::array<int, std::mt19937::state_size> seed_data; | |
std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd)); | |
std::seed_seq seq(std::begin(seed_data), std::end(seed_data)); | |
std::mt19937 gen(seq); | |
std::uniform_int_distribution<int> dis(0, 15); | |
std::uniform_int_distribution<int> dis2(8, 11); | |
for (int i = 0; i < 12; ++i) { | |
data[i] = dis(gen); | |
} | |
// Set the version to 4 (UUID version 4) | |
data[k_version_position] = k_version; | |
for (int i = 13; i < 16; ++i) { | |
data[i] = dis(gen); | |
} | |
// variant | |
data[16] = dis2(gen); | |
for (int i = 17; i < 32; ++i) { | |
data[i] = dis(gen); | |
} | |
static std::random_device rd; | |
using int_type = std::random_device::result_type; | |
for (int i = 0; i < 16; i += sizeof(int_type)) { | |
uint32_t x = rd(); | |
memcpy(&data[i], &x, sizeof(int_type)); | |
} | |
// variant must be 10xxxxxx | |
data[8] &= 0xBF; | |
data[8] |= 0x80; | |
// version must be 0100xxxx | |
data[6] &= 0x4F; | |
data[6] |= 0x40; | |
return uuid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to review the proposal during a live session.
[[nodiscard]] int get_version() const { return data[k_version_position]; } | ||
[[nodiscard]] std::string to_string() const; | ||
// We could make this smaller, as it is hexadecimal and 32 characters | ||
// but we are keeping one byte per element for simplicity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this simplifies things, I would say either store chars and keeps it 32 bytes or store numbers and make it 16 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it easier to reason on the values of every element.
What does this PR do?
Provide a default
runtime-id
tagMotivation
Ensure that the timeline data can be discovered for native.
Additional Notes
The
runtime_id
should come from tracingWhen we do not find it, we should still generate a default
runtime-id
.How to test the change?
I have a minor test to check the format.
I did not add an end to end test for now.