Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[crashtracker] Enable dynamic updating of metadata and config at runtime #297

Merged
merged 21 commits into from
Feb 21, 2024

Conversation

danielsn
Copy link
Contributor

@danielsn danielsn commented Jan 30, 2024

What does this PR do?

Makes the CrashtrackerMetadata and CrashtrackerConfiguration dynamically updatable, rather than fixed at crashtracker startup.

Motivation

Ruby allows the user to update tags on the fly, so we need to support that.

Additional Notes

This will also be useful for a sidecar setup, where we would want to keep the metadata local until its time to upload.

How to test the change?

Run the test_crash test in crashtracker/api, and notice that the extra tag is output in the crash report.

For Reviewers

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

@danielsn danielsn requested review from a team as code owners January 30, 2024 18:17
@github-actions github-actions bot added the profiling Relates to the profiling* modules. label Jan 30, 2024
@ivoanjo
Copy link
Member

ivoanjo commented Feb 1, 2024

The way that the "endpoint" gets configured in Ruby is also dynamic, and can also change when configuration via code is used. Would it be possible to also allow that to be reconfigured?

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable! Left a few notes :)

crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
Comment on lines 172 to 174
fn handle_posix_signal_impl(signum: i32) -> anyhow::Result<()> {
let mut receiver = match std::mem::replace(unsafe { &mut RECEIVER }, GlobalVarState::Taken) {
GlobalVarState::Some(r) => r,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that occurred to me while reviewing the PR is -- should we have something protecting handle_posix_signal_impl so that it can't get called multiple times in the same process?

Afaik if a thread is handling a signal (e.g. SIGSEGV), no other SIGSEGVs can be delivered. But I... don't think that applies to other threads in the same process.

May be worth giving it a try?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#318 to track

crashtracker/src/crash_info.rs Outdated Show resolved Hide resolved
crashtracker/src/crash_info.rs Outdated Show resolved Hide resolved
@danielsn danielsn changed the title [crashtracker] Enable dynamic updating of metadata at runtime [crashtracker] Enable dynamic updating of metadata and config at runtime Feb 15, 2024
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable! I think the biggest note I have is around the design of the APIs e.g. exposing explicit update arguments vs allowing further calls to init.

(But if you're not convinced with that option, happy to approve as-is)

crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
profiling-ffi/src/crashtracker.rs Outdated Show resolved Hide resolved
Comment on lines +85 to +91
let old = METADATA.swap(box_ptr, SeqCst);
if !old.is_null() {
// Safety: This can only come from a box above.
unsafe {
std::mem::drop(Box::from_raw(old));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth clearly documenting this design where we expect accesses to all these AtomicPtr to always be made through swap and never through load.

(Or introduce a quick wrapper class that takes care of this?)

Otherwise, when reading the code, you need to reverse engineer this important fact by checking every place where we access them.

crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
Comment on lines 125 to 128
anyhow::ensure!(
matches!(old_receiver, GlobalVarState::Unassigned),
"Error registering crash handler receiver: receiver already existed {old_receiver:?}"
old_receiver.is_null(),
"Error registering crash handler receiver: receiver already existed"
);
Copy link
Member

@ivoanjo ivoanjo Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the semantics here is to say an existing receiver already existing is an error, should we do a compare and swap on RECEIVER instead, expecting it to be null?

Right now it's kinda weird that we successfully set up the new receiver, and only then we return an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the semantics of registering a reciever

crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

crashtracker/src/crash_handler.rs Outdated Show resolved Hide resolved
Comment on lines 317 to 320
pub fn register_crash_handlers(create_alt_stack: bool) -> anyhow::Result<()> {
anyhow::ensure!(OLD_HANDLERS.load(SeqCst).is_null());
if !OLD_HANDLERS.load(SeqCst).is_null() {
return Ok(());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Not sure this is something we want to care about, but one thing we could do when the function is called is to check that our signal handlers are still active, e.g. because something else on the system may have overwritten them.

(I'm thinking of this, because there may be a situation where someone starts the crash tracker, and then discovers that the VM overwrites their signal handlers after X, and then calls register_crash_handlers again expecting libdatadog to be restored but isn't. Maybe too farfetched?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

profiling-ffi/src/crashtracker.rs Show resolved Hide resolved
profiling-ffi/src/crashtracker.rs Outdated Show resolved Hide resolved
@danielsn danielsn merged commit fba951c into main Feb 21, 2024
20 checks passed
@danielsn danielsn deleted the dsn/crashtrack-update-metadata branch February 21, 2024 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Relates to the profiling* modules.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants