Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable a symbol name cache if ASLR is enabled #1033

Merged
merged 1 commit into from
Feb 12, 2020

Conversation

mmisono
Copy link
Collaborator

@mmisono mmisono commented Dec 26, 2019

resolve_usym() caches a bcc_symbol object using an executable name
as a key, but on the ASLR-enabled platform, symbol addresses change with
each execution. Disable (discard) a cache, in this case, to resolve
symbol names properly.

Note and known issues:

  • A cache is discarded whenever resolve_usym is called even if a pid is
    the same as the old one. This is because pid may be reused.
  • This does not check whether a binary is PIE ASLAR or not. Note that
    even if a binary is not PIE ASLR, addresses of shared libraries are
    randomized if ASLR is enabled. (If a binary is not PIE ASLR and
    resolve_usym() resolves symbol in a binary, we can utilize a cache.)
  • If ASLR is disabled on the first execution but enabled on the second
    execution, resolve_usym() for the second run will use the previous
    cache.
  • I'm not sure how much performance impact this has. If the impact is
    huge, maybe this should be an option.
  • As discussed in ustack not symbolicated if traced process exits first #246, symbolizing will fail after process termination
    (this is a separate issue). For example:
% bpftrace -e 'u:/lib/x86_64-linux-gnu/libc.so.6:*nanosleep* /comm == "sleep"/ { @[ustack] = count(); }'
Attaching 7 probes...
^C

@[
    0x7ff1917cb990
]: 3
@No such file or directory: /proc/3557/personality
[
    0x7fea4211c990
]: 3
@No such file or directory: /proc/3554/personality
[
    0x7f32bc51a990
]: 3

Closes #1031 and solves the second part of #75.

@fbs
Copy link
Contributor

fbs commented Dec 27, 2019

@no such file or directory: /proc/3557/personality

I do not like the mid output errors, makes it visually very messy. The errors aren't really informative either, it is just a confusing way of telling people that the process has exited before the stackframe was symbolized. I think it is better to aggregate them and print them last, skipping ENOENT. Or just skip them at all.

A cache is discarded whenever resolve_usym is called even if a pid is
the same as the old one
I'm not sure how much performance impact this has. If the impact is
huge, maybe this should be an option.

We should measure this, I can't imagine it being "cheap" for the larger binaries. Although I do not know what tricks bcc has, they might have some tricks to speed it up.

I think we should make this thing optional with a flag that accepts:

  • on, flush the cache to avoid breaking with ASLR
  • off, do not flush the cache
  • auto, auto detect based on kernel/program

@mmisono
Copy link
Collaborator Author

mmisono commented Dec 28, 2019

I do not like the mid output errors, makes it visually very messy.

I feel the same. I'm thinking about making output only when bt_verbose.

I'll look into the overhead. By the way, we can use BPFTRACE_NO_USER_SYMBOLS=1 to disable user symbol resolutions.

@mmisono
Copy link
Collaborator Author

mmisono commented Jan 5, 2020

This is the result of the simple benchmark.

for i in `seq 1 1 10`
do
    sudo ./src/bpftrace -c "/bin/bash" -e \
    'u:/bin/bash:readline { @start = elapsed; unroll(20) { unroll('$i') { printf("%d %s %s\n", cpu, usym(reg("ip")), ustack);}} exit();}
     END { printf("%d\n", elapsed-@start); delete(@start);}' \
    | sed '/^$/d' | tail -n1
done
cache       nocache
147954778   95647408
125798568   137093331
123463690   190338079
124233236   189265622
136969078   249505589
154593444   265685467
123442989   312219876
141076933   387775882
124469933   404608889
130294544   459923131

Time increases proportionally if the cache is disabled.

@mmisono
Copy link
Collaborator Author

mmisono commented Jan 5, 2020

I made the following changes.

  • Disable caching when ASLR is enabled on system and -c option is not given (if trace one program execution, caching is fine)
  • Introduce BPFTRACE_CACHE_USER_SYMBOLS env variable to force caching.

clang-format test failed. But I think it is much easier to read codes without newlines in this case...

Copy link
Contributor

@fbs fbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @dxu @mmarchini objections?

( This should probably moved out of bpftrace as some point, #1080 )

… given

`resolve_usym()` caches a `bcc_symbol` object using an executable name
as a key, but on the ASLR-enabled platform, symbol addresses change with
each execution. Disable (discard) a cache, in this case, to resolve
symbol names properly.

Introduce `BPFTRACE_CACHE_USER_SYMBOLS` env variable to force caching.
Caching is fine if only trace one program execution.

Note and known issues:

- A cache is discarded whenever resolve_usym is called even if a pid is
the same as the old one. This is because pid may be reused.
- This does not check whether a binary is PIE ASLAR or not. Note that
even if a binary is not PIE ASLR, addresses of shared libraries are
randomized if ASLR is enabled.  (If a binary is not PIE ASLR and
`resolve_usym()` resolves symbol in a binary, we can utilize a cache.)
- If ASLR is disabled on the first execution but enabled on the second
execution, `resolve_usym()` for the second run will use the previous
cache.
- As discussed in bpftrace#246, symbolizing will fail after process termination
(this is a separate issue). For example:

```
% bpftrace -e 'u:/lib/x86_64-linux-gnu/libc.so.6:*nanosleep* /comm == "sleep"/ { @[ustack] = count(); }'
Attaching 7 probes...
^C

@[
    0x7ff1917cb990
]: 3
@
[
    0x7fea4211c990
]: 3
@
[
    0x7f32bc51a990
]: 3
```

-----

Closes bpftrace#1031 and solves the second part of bpftrace#75.
@brendangregg
Copy link
Contributor

LGTM

@fbs fbs merged commit 4651255 into bpftrace:master Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ustack can't solve symbol after first run
3 participants