-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[crashtracker] Enable dynamic updating of metadata and config at runtime #297
Conversation
The way that the "endpoint" gets configured in Ruby is also dynamic, and can also change when configuration via code is used. Would it be possible to also allow that to be reconfigured? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable! Left a few notes :)
crashtracker/src/crash_handler.rs
Outdated
fn handle_posix_signal_impl(signum: i32) -> anyhow::Result<()> { | ||
let mut receiver = match std::mem::replace(unsafe { &mut RECEIVER }, GlobalVarState::Taken) { | ||
GlobalVarState::Some(r) => r, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that occurred to me while reviewing the PR is -- should we have something protecting handle_posix_signal_impl
so that it can't get called multiple times in the same process?
Afaik if a thread is handling a signal (e.g. SIGSEGV
), no other SIGSEGVs can be delivered. But I... don't think that applies to other threads in the same process.
May be worth giving it a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#318 to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable! I think the biggest note I have is around the design of the APIs e.g. exposing explicit update
arguments vs allowing further calls to init
.
(But if you're not convinced with that option, happy to approve as-is)
let old = METADATA.swap(box_ptr, SeqCst); | ||
if !old.is_null() { | ||
// Safety: This can only come from a box above. | ||
unsafe { | ||
std::mem::drop(Box::from_raw(old)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth clearly documenting this design where we expect accesses to all these AtomicPtr
to always be made through swap
and never through load
.
(Or introduce a quick wrapper class that takes care of this?)
Otherwise, when reading the code, you need to reverse engineer this important fact by checking every place where we access them.
crashtracker/src/crash_handler.rs
Outdated
anyhow::ensure!( | ||
matches!(old_receiver, GlobalVarState::Unassigned), | ||
"Error registering crash handler receiver: receiver already existed {old_receiver:?}" | ||
old_receiver.is_null(), | ||
"Error registering crash handler receiver: receiver already existed" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the semantics here is to say an existing receiver already existing is an error, should we do a compare and swap on RECEIVER
instead, expecting it to be null
?
Right now it's kinda weird that we successfully set up the new receiver, and only then we return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the semantics of registering a reciever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM
crashtracker/src/crash_handler.rs
Outdated
pub fn register_crash_handlers(create_alt_stack: bool) -> anyhow::Result<()> { | ||
anyhow::ensure!(OLD_HANDLERS.load(SeqCst).is_null()); | ||
if !OLD_HANDLERS.load(SeqCst).is_null() { | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Not sure this is something we want to care about, but one thing we could do when the function is called is to check that our signal handlers are still active, e.g. because something else on the system may have overwritten them.
(I'm thinking of this, because there may be a situation where someone starts the crash tracker, and then discovers that the VM overwrites their signal handlers after X, and then calls register_crash_handlers
again expecting libdatadog to be restored but isn't. Maybe too farfetched?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this PR do?
Makes the
CrashtrackerMetadata
andCrashtrackerConfiguration
dynamically updatable, rather than fixed at crashtracker startup.Motivation
Ruby allows the user to update tags on the fly, so we need to support that.
Additional Notes
This will also be useful for a sidecar setup, where we would want to keep the metadata local until its time to upload.
How to test the change?
Run the
test_crash
test in crashtracker/api, and notice that the extra tag is output in the crash report.For Reviewers
@DataDog/security-design-and-guidance
.