[crashtracker] Enable dynamic updating of metadata and config at runtime #297

danielsn · 2024-01-30T18:17:43Z

What does this PR do?

Makes the CrashtrackerMetadata and CrashtrackerConfiguration dynamically updatable, rather than fixed at crashtracker startup.

Motivation

Ruby allows the user to update tags on the fly, so we need to support that.

Additional Notes

This will also be useful for a sidecar setup, where we would want to keep the metadata local until it's time to upload.

How to test the change?

Run the test_crash test in crashtracker/api, and notice that the extra tag is output in the crash report.

For Reviewers

If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.

ivoanjo · 2024-02-01T08:54:06Z

The way that the "endpoint" gets configured in Ruby is also dynamic, and can also change when configuration via code is used. Would it be possible to also allow that to be reconfigured?

ivoanjo

Seems reasonable! Left a few notes :)


ivoanjo · 2024-02-06T15:49:22Z

crashtracker/src/crash_handler.rs

 fn handle_posix_signal_impl(signum: i32) -> anyhow::Result<()> {
    let mut receiver = match std::mem::replace(unsafe { &mut RECEIVER }, GlobalVarState::Taken) {
        GlobalVarState::Some(r) => r,


One thing that occurred to me while reviewing the PR is -- should we have something protecting handle_posix_signal_impl so that it can't get called multiple times in the same process?

Afaik if a thread is handling a signal (e.g. SIGSEGV), no other SIGSEGVs can be delivered. But I... don't think that applies to other threads in the same process.

May be worth giving it a try?

#318 to track
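A minimal sketch of the kind of guard being discussed, assuming a process-wide atomic flag is enough to let only the first crashing thread proceed. The names `HANDLING_CRASH`, `claim_crash_handling`, and `handle_signal_once` are hypothetical and not part of the crashtracker code; the real fix is tracked in #318 and may look different.

```rust
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};

// Hypothetical process-wide flag: set once the first crash is being handled.
static HANDLING_CRASH: AtomicBool = AtomicBool::new(false);

/// Returns true for exactly one caller in the lifetime of the process.
/// `swap` returns the previous value, so only the first caller sees `false`.
fn claim_crash_handling() -> bool {
    !HANDLING_CRASH.swap(true, SeqCst)
}

/// Runs `handler` only if no other thread has already started handling a
/// crash. Returns whether the handler was run.
fn handle_signal_once(signum: i32, handler: impl FnOnce(i32)) -> bool {
    if !claim_crash_handling() {
        // Another thread got here first; do not touch shared crash state.
        return false;
    }
    handler(signum);
    true
}
```

The flag is never reset, which matches the assumption that a crash is terminal for the process.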

crashtracker/src/crash_info.rs

ivoanjo

Looks reasonable! I think the biggest note I have is around the design of the APIs, e.g. exposing explicit update arguments vs. allowing further calls to init.

(But if you're not convinced with that option, happy to approve as-is)

crashtracker/src/crash_handler.rs

profiling-ffi/src/crashtracker.rs

ivoanjo · 2024-02-16T10:56:20Z

crashtracker/src/crash_handler.rs

+    let old = METADATA.swap(box_ptr, SeqCst);
+    if !old.is_null() {
+        // Safety: This can only come from a box above.
+        unsafe {
+            std::mem::drop(Box::from_raw(old));
+        }
+    }


It may be worth clearly documenting this design where we expect accesses to all these AtomicPtr to always be made through swap and never through load.

(Or introduce a quick wrapper class that takes care of this?)

Otherwise, when reading the code, you need to reverse engineer this important fact by checking every place where we access them.
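One way the suggested wrapper could look, as a rough sketch: a small type that owns the `AtomicPtr` and exposes only swap-based operations, so a plain `load` is impossible from outside. `SwapCell` and its methods are hypothetical names for illustration, not types from this PR.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering::SeqCst};

/// Hypothetical wrapper: holds an optional boxed value behind an AtomicPtr
/// and only ever accesses it through `swap`, never through `load`.
pub struct SwapCell<T: Send> {
    ptr: AtomicPtr<T>,
}

impl<T: Send> SwapCell<T> {
    /// Creates an empty cell (null pointer). Usable in a `static`.
    pub const fn empty() -> Self {
        Self { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Installs `value` and hands back whatever was stored before.
    pub fn replace(&self, value: T) -> Option<Box<T>> {
        let new = Box::into_raw(Box::new(value));
        Self::from_raw(self.ptr.swap(new, SeqCst))
    }

    /// Removes the stored value, leaving the cell empty.
    pub fn take(&self) -> Option<Box<T>> {
        Self::from_raw(self.ptr.swap(ptr::null_mut(), SeqCst))
    }

    fn from_raw(p: *mut T) -> Option<Box<T>> {
        if p.is_null() {
            None
        } else {
            // Safety: non-null pointers in the cell only come from
            // Box::into_raw in `replace`, and swap gives us sole ownership.
            Some(unsafe { Box::from_raw(p) })
        }
    }
}

impl<T: Send> Drop for SwapCell<T> {
    fn drop(&mut self) {
        // Free any value still stored when the cell goes away.
        drop(self.take());
    }
}
```

Returning the old value as an `Option<Box<T>>` leaves the decision to drop or leak it with the caller, which matters if the caller is running inside a signal handler.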


ivoanjo · 2024-02-16T11:05:39Z

crashtracker/src/crash_handler.rs

    anyhow::ensure!(
-        matches!(old_receiver, GlobalVarState::Unassigned),
-        "Error registering crash handler receiver: receiver already existed {old_receiver:?}"
+        old_receiver.is_null(),
+        "Error registering crash handler receiver: receiver already existed"
    );


If the intended semantics are that an already-registered receiver is an error, should we do a compare-and-swap on RECEIVER instead, expecting it to be null?

Right now it's kinda weird that we successfully set up the new receiver, and only then we return an error.

I modified the semantics of registering a receiver
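For reference, a sketch of the compare-and-swap variant raised in the comment above, assuming registration should fail without disturbing an existing receiver. `Receiver` here is a stand-in struct and `try_register` is a hypothetical name; this illustrates the suggestion, not necessarily what was merged.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering::SeqCst};

// Stand-in for the real receiver type, which this sketch does not model.
pub struct Receiver {
    pub id: u32,
}

static RECEIVER: AtomicPtr<Receiver> = AtomicPtr::new(ptr::null_mut());

/// Installs `receiver` only if none is registered yet.
/// On failure the existing receiver is left untouched and the rejected one
/// is handed back to the caller.
pub fn try_register(receiver: Receiver) -> Result<(), Receiver> {
    let new = Box::into_raw(Box::new(receiver));
    match RECEIVER.compare_exchange(ptr::null_mut(), new, SeqCst, SeqCst) {
        Ok(_) => Ok(()),
        Err(_existing) => {
            // Safety: `new` came from Box::into_raw above and was never
            // published, so we still own it exclusively.
            let rejected = unsafe { Box::from_raw(new) };
            Err(*rejected)
        }
    }
}
```

Unlike swap-then-check, the failing call never replaces the stored pointer, so the error path has no side effects to undo.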

crashtracker/src/crash_handler.rs

ivoanjo

👍 LGTM


ivoanjo · 2024-02-19T12:23:21Z

crashtracker/src/crash_handler.rs

 pub fn register_crash_handlers(create_alt_stack: bool) -> anyhow::Result<()> {
-    anyhow::ensure!(OLD_HANDLERS.load(SeqCst).is_null());
+    if !OLD_HANDLERS.load(SeqCst).is_null() {
+        return Ok(());
+    }


Minor: Not sure this is something we want to care about, but one thing we could do when the function is called is to check that our signal handlers are still active, e.g. because something else on the system may have overwritten them.

(I'm thinking of this because there may be a situation where someone starts the crash tracker, then discovers that the VM overwrites their signal handlers after X, and calls register_crash_handlers again expecting the libdatadog handlers to be restored, but they aren't. Maybe too far-fetched?)

profiling-ffi/src/crashtracker.rs

Enable crashtracker metadata to be modified at runtime

1c5b239

danielsn requested review from a team as code owners January 30, 2024 18:17

github-actions bot added the profiling Relates to the profiling* modules. label Jan 30, 2024

danielsn added 2 commits January 31, 2024 14:58

Merge branch 'main' into dsn/crashtrack-update-metadata

039bdcd

Merge branch 'main' into dsn/crashtrack-update-metadata

f9ba3e2

ivoanjo reviewed Feb 6, 2024


danielsn added 7 commits February 12, 2024 17:59

Merge branch 'main' into dsn/crashtrack-update-metadata

5b0acd6

use atomic for RECEIVER

c19efc0

use atomic ptr for the old actions

f3b5d25

Leak the old handlers during a crash

fee3619

Better way of handling metadata

6e29ff0

Better way of handling config

2fbd711

Merge branch 'main' into dsn/crashtrack-update-metadata

9dcca97

danielsn changed the title ~~[crashtracker] Enable dynamic updating of metadata at runtime~~ [crashtracker] Enable dynamic updating of metadata and config at runtime Feb 15, 2024

danielsn added 2 commits February 15, 2024 14:38

Comment out the test

b704dca

PR Comment

25a89ea

danielsn mentioned this pull request Feb 15, 2024

Protect crashtracker against multiple signal deliveries #318

Open

ivoanjo reviewed Feb 16, 2024


danielsn added 5 commits February 16, 2024 17:31

Make init idempotent, as per PR request

8457cd6

Add context

89c87cb

better comment

d979539

Better includes

900d983

turn the test back off

2f9eee7

ivoanjo approved these changes Feb 19, 2024


typo

bd16068

danielsn mentioned this pull request Feb 21, 2024

Profiling: mechanism to check that signal handlers are still active #325

Open

danielsn added 2 commits February 20, 2024 19:36

PR comments

146e9e0

forgot to update name

e98178f

Merge branch 'main' into dsn/crashtrack-update-metadata

03dd82d

danielsn merged commit fba951c into main Feb 21, 2024
20 checks passed

danielsn deleted the dsn/crashtrack-update-metadata branch February 21, 2024 01:55
