Configure stack size #284

r1viollet · 2023-07-05T11:29:12Z

What does this PR do?

Add a parameter to configure the size of the user stack captured with each sample.

Motivation

Ensure we can unwind deeper stacks.
Discussions in #276

Additional Notes

For now, the maximum value allowed by Linux is the largest 8-byte-aligned value not exceeding USHRT_MAX (65535):
65528
If this is not enough, we should consider a different kernel API.
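
The limit quoted above can be reproduced in a few lines. This is only a sketch: the names `k_max_stack_sample_size` and `is_valid_stack_sample_size` are hypothetical and not taken from the PR, and rejecting a size of 0 is an assumption made for illustration.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical constant: largest multiple of 8 that does not exceed
// USHRT_MAX (65535). Clearing the three low bits rounds down to 8 bytes.
constexpr uint32_t k_max_stack_sample_size = USHRT_MAX & ~7u;
static_assert(k_max_stack_sample_size == 65528, "expected 65528");

// Hypothetical input check: accept only non-zero, 8-byte-aligned sizes
// that stay within the limit above.
constexpr bool is_valid_stack_sample_size(uint32_t size) {
  return size != 0 && size % 8 == 0 && size <= k_max_stack_sample_size;
}
```

With this check, 65528 and 32000 are accepted, while 65536 (too large) and 32001 (not aligned) are rejected.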

How to test the change?

A test with deep stacks was added. I will run this in CI.
https://github.com/DataDog/ddprof-build/pull/101


r1viollet · 2023-07-06T07:17:57Z

src/pevent_lib.cc

@@ -55,11 +58,34 @@ static void display_system_config(void) {
  }
 }

+int pevent_compute_min_mmap_order(int min_buffer_size_order,


@nsavoire @sanchda would you agree with computing the size of the ring buffer based on the requested sample size?

nsavoire · 2023-07-06T08:06:52Z

I find that the naming sample_stack_user used throughout the PR is not very good at telling what the parameter really is.
Perhaps something like: stack_sample_size or user_stack_sample_size


r1viollet · 2023-07-06T08:23:05Z

> I find that the naming sample_stack_user used throughout the PR is not very good at telling what the parameter really is. Perhaps something like: stack_sample_size or user_stack_sample_size

I thought the same, but I wanted to stay consistent with the perf_event_open API:

       sample_stack_user (since Linux 3.7)
              This defines the size of the user stack to dump if
              PERF_SAMPLE_STACK_USER is specified.

I guess we can forget about consistency and make it more user-friendly?
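
For context, the kernel field under discussion is set roughly as follows. This is a minimal sketch, not the code from this PR: the helper name `make_stack_sampling_attr` and the choice of a software CPU clock event are assumptions made for illustration, and a real profiler would also request user registers in order to unwind.

```cpp
#include <linux/perf_event.h>

#include <cstdint>
#include <cstring>

// Hypothetical helper: build a perf_event_attr that asks the kernel to
// copy `stack_sample_size` bytes of user stack into every sample.
perf_event_attr make_stack_sampling_attr(uint32_t stack_sample_size) {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_SOFTWARE;         // assumption: software event
  attr.config = PERF_COUNT_SW_CPU_CLOCK;  // assumption: CPU clock sampling
  // sample_stack_user is only honored when PERF_SAMPLE_STACK_USER is set.
  attr.sample_type = PERF_SAMPLE_STACK_USER;
  attr.sample_stack_user = stack_sample_size;
  return attr;
}
```

Whatever user-facing name is chosen for the option, its value ends up in `sample_stack_user`, so the kernel's alignment and size constraints apply to it.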


nsavoire · 2023-07-06T08:49:02Z

test/deep_stacks.sh

+./ddprof --config ${CONFIG_DEEP_STACK} ./test/deep_stacks | tee ./log_ddprof.txt
+
+# Check for truncated stack traces
+truncated_input=$(awk -F': ' '/datadog.profiling.native.unwind.stack.truncated_input/ { print $NF }' ./log_ddprof.txt)


Why not test that the stacks are correct?

I would need to open source prof-correctness to avoid duplicating some of the logic for this.

Yes, but in its current state this test does not test much (and is quite complicated).
In release mode, everything is probably inlined into the main function.

Trying out release mode, everything looks OK

I did add a "no inline" attribute to make sure. Thanks for mentioning this.
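
To illustrate the inlining concern, here is a minimal sketch of a recursive function that is meant to keep one real frame per call in optimized builds. It is not the code from test/deep_stacks.cc: the name `deep_call` and the 256-byte padding are hypothetical, and `__attribute__((noinline))` is specific to GCC and Clang.

```cpp
#include <cstddef>

// Hypothetical deep-stack generator. noinline asks the compiler to keep
// each call as a separate frame instead of folding it into the caller.
__attribute__((noinline)) std::size_t deep_call(std::size_t depth) {
  // Per-frame padding so that each level consumes a visible amount of stack.
  volatile char frame_padding[256];
  frame_padding[0] = static_cast<char>(depth);
  if (depth == 0) {
    return 0;
  }
  // The volatile read after the call is intended to discourage the
  // compiler from turning the recursion into a loop. It always adds 0.
  return 1 + deep_call(depth - 1) + (frame_padding[0] & 0);
}
```

Calling `deep_call(n)` returns `n`; how many frames actually remain on the stack still depends on the compiler and optimization level, which is why checking the unwound stacks themselves would be the stronger test.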

Various adjustments based on pull request #284 comments
- Adjust option naming
- Add a template configuration element to properly account for default settings
- Other minor fixes

r1viollet · 2023-07-06T11:38:29Z

The failures in CI are related to mmap failures.

r1viollet · 2023-07-06T11:48:37Z

> > I find that the naming sample_stack_user used throughout the PR is not very good at telling what the parameter really is. Perhaps something like: stack_sample_size or user_stack_sample_size
>
> I thought the same, though I wanted the consistency with the perf_event_open API
>        sample_stack_user (since Linux 3.7)
>               This defines the size of the user stack to dump if
>               PERF_SAMPLE_STACK_USER is specified.
> I guess we can forget about consistency and make it more user friendly ?

I adjusted the naming. We are now inconsistent with the perf_event_open API, but the setting is easier to understand.

- Ensure we take into account values from the template configuration when parsing a new configuration

r1viollet · 2023-07-06T13:26:04Z

> The failures in CI are related to mmap failures.

I reverted the increase in buffer size :-(

- Removal of useless volatile keyword


nsavoire · 2023-07-06T13:58:18Z

include/ddprof_defs.hpp

+// considering sample size, we adjust the size of ring buffers.
+// Following is considered as a minimum number of samples to be fit in the
+// ring buffer.
+constexpr auto k_min_number_samples_per_ring_buffer = 7;


My point was that the default config seems wasteful:

the event size is 32992 bytes (32768 + 224)

hence the buffer size order needed to accommodate 7 samples is 6, i.e. 266240 bytes with a usable size of 262144 bytes

meaning that with 7 samples in the buffer, 31200 bytes remain free

What about decreasing the default sample stack size to something like 32000 bytes? We could then fit 8 samples in the buffer, and 4352 bytes would remain for other events.

I pushed a change on default sample sizes.
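
The arithmetic in this thread can be checked with a small sketch. It assumes 4 KiB pages, a ring buffer of 2^order data pages plus one metadata page, and the 224-byte per-sample overhead quoted above. The names `RingBufferFit` and `fit_samples` are hypothetical and are not the functions added by the PR.

```cpp
#include <cstddef>

constexpr std::size_t k_page_size = 4096;       // assumption: 4 KiB pages
constexpr std::size_t k_sample_overhead = 224;  // from the thread: 32992 - 32768

struct RingBufferFit {
  int order;             // the buffer holds 2^order data pages
  std::size_t usable;    // bytes available for events
  std::size_t mapped;    // usable bytes plus one metadata page
  std::size_t samples;   // whole samples that fit in the usable area
  std::size_t leftover;  // bytes left once those samples are stored
};

// Find the smallest order whose usable size holds `min_samples` samples,
// then report how many samples really fit and what is left over.
constexpr RingBufferFit fit_samples(std::size_t stack_sample_size,
                                    std::size_t min_samples) {
  const std::size_t event_size = stack_sample_size + k_sample_overhead;
  int order = 0;
  while ((k_page_size << order) < event_size * min_samples) {
    ++order;
  }
  const std::size_t usable = k_page_size << order;
  const std::size_t samples = usable / event_size;
  return {order, usable, usable + k_page_size, samples,
          usable - samples * event_size};
}
```

Under these assumptions, `fit_samples(32768, 7)` gives order 6 with 7 samples and 31200 bytes left, while `fit_samples(32000, 7)` stays at order 6 but fits 8 samples with 4352 bytes left, matching the figures in the comment.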

- More tweaking on default sizes

nsavoire

LGTM !

r1viollet added 4 commits July 5, 2023 09:48

Add a parameter to configure the size of the sampled stack for users

c27726d

Adjust the size of the ring buffer using the user's stack sample size

36f728c

Add a test with deep stacks

b683df9

Adjust the way we encode the dynamic size of allocation events

6d3d342

r1viollet force-pushed the r1viollet/config_sample_size branch from fb7350d to 6d3d342 on July 5, 2023 14:09


Add an input check on the value of sample_stack_user parameter

ab9f23a


Minor simplification of the allocation event structure

e7249b6

r1viollet force-pushed the r1viollet/config_sample_size branch from 0e9bced to e7249b6 on July 6, 2023 07:30

r1viollet assigned sanchda and nsavoire Jul 6, 2023

r1viollet marked this pull request as ready for review July 6, 2023 07:44

r1viollet requested a review from sanchda as a code owner July 6, 2023 07:44

Adjust unit test to an allowed sample size

f136a1c

r1viollet force-pushed the r1viollet/config_sample_size branch from aa5ba50 to f136a1c on July 6, 2023 07:55