Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve unknown/undefined types for tracepoints #1485

Merged
merged 2 commits into from
Sep 8, 2020

Conversation

viktormalik
Copy link
Contributor

@viktormalik viktormalik commented Aug 26, 2020

If a type is defined by a typedef located in a non-included header, clang parser will fail. This change analyses error messages from the clang parser and retrieves the missing typedefs from BTF (if it is used). If all types cannot be resolved, it provides the user a hint: either to include BTF (running with --btf) or to include relevant headers.

Resolves #1154, resolves #730. It should also help with iovisor/kubectl-trace#109.

I was also wondering, if it would make sense to get rid of the --btf option completely. AFAIU, it is used for tracepoints only. Using BTF automatically whenever it is available would make most of the "unknown type name" errors disappear in default bpftrace setup.

Checklist
  • Language changes are updated in docs/reference_guide.md
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@viktormalik viktormalik force-pushed the btf-unknown-typedefs branch 2 times, most recently from 28435a9 to dd8b829 Compare August 26, 2020 11:11
@mmisono
Copy link
Collaborator

mmisono commented Sep 1, 2020

I found that if a struct has a member whose type is not in the BTF, then bpftrace hangs (though probably this does not happen in most cases.)

% ./src/bpftrace -e 'struct foo {aaa a;};  BEGIN { @ = ((struct foo*)0)->a}' -b
// hang

I was also wondering, if it would make sense to get rid of the --btf option completely.

I also think using BTF whenever it is available is reasonable. Is there any downside? possible slowdown?

@viktormalik
Copy link
Contributor Author

I found that if a struct has a member whose type is not in the BTF, then bpftrace hangs (though probably this does not happen in most cases.)

Correct, thanks for noticing. Even though this shouldn't happen for tracepoints, hanging is not good. I fixed it, now it terminates and shows the corresponding error message.

I also think using BTF whenever it is available is reasonable. Is there any downside? possible slowdown?

I didn't notice any slowdown, in my local setup, the run-times are almost equal with and without BTF. Also, we're using BTF for all other probe categories, where a larger number of probes can be attached at a time (thus more BTF lookups), so I don't see any reason not to use it for tracepoints, too.

Copy link
Collaborator

@mmisono mmisono left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, though it's better to have another review. I think it is OK to remove --btf.

src/clang_parser.cpp Outdated Show resolved Hide resolved
Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall with small comments, but my only concern is the string parsing of error messages. Not sure if error strings are part of clang API but I'd be worried if it wasn't. I didn't look around too hard in clang for alternative solutions so I'll take it on faith this is the best way.

bool bail_on_error)
{
error_msgs.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: callee clear could cause confusion later and is less flexible than caller clear. I'd prefer caller clear unless there's a strong reason.

// that imply an unresolved typedef of type_t. This cannot be done below in
// clang_visitChildren since clang does not have the unknown type names.
const std::string unknown_type_msg = "unknown type name \'";
for (auto &msg : diag_msgs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
for (auto &msg : diag_msgs)
for (const auto &msg : diag_msgs)

btf_and_input_files.emplace_back(CXUnsavedFile{
auto incomplete_types = get_incomplete_types(
input, input_files, args, bpftrace.btf_set_);
unsigned types_cnt = bpftrace.btf_set_.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More portable I suppose

Suggested change
unsigned types_cnt = bpftrace.btf_set_.size();
size_t types_cnt = bpftrace.btf_set_.size();

};
// If additional BTF types were found, we need to repeat the process since
// that might have introduced some new unresolved typedefs.
check_additional_types = types_cnt != bpftrace.btf_set_.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more clear like this instead

Suggested change
check_additional_types = types_cnt != bpftrace.btf_set_.size();
check_additional_types = incomplete_types.size();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, we need to check if a new incomplete type was found (one that wasn't in bpftrace.btf_set_ already), otherwise the loop could not terminate (see the comment above for example).

Comment on lines 634 to 626
auto incomplete_types = get_incomplete_types(
input, input_files, args, bpftrace.btf_set_);
unsigned types_cnt = bpftrace.btf_set_.size();
bpftrace.btf_set_.insert(incomplete_types.cbegin(),
incomplete_types.cend());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines should be in the if (process_btf) branch right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They weren't there before this PR, but I agree - bpftrace.btf_set_ should only be used when BTF is on.

@danobi
Copy link
Member

danobi commented Sep 2, 2020

I was also wondering, if it would make sense to get rid of the --btf option completely. AFAIU, it is used for tracepoints only. Using BTF automatically whenever it is available would make most of the "unknown type name" errors disappear in default bpftrace setup.

Tentative +1. I remember I tried doing this a couple times but I always hit some kind of realization that it would cause an issue. Can't seem to remember right now. It's possible that I've chipped away at reducing internal usage of --btf that there's no issue anymore.

* unknown type name 'type_t'
* return type_t.
*/
std::string ClangParser::ClangParser::get_unknown_type(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an optional instead of empty string would be cleaner API

Suggested change
std::string ClangParser::ClangParser::get_unknown_type(
std::optional<std::string> ClangParser::ClangParser::get_unknown_type(

If a type is defined by a typedef located in a non-included header,
clang parser will fail. This analyses error messages from the clang
parser and retrieves the missing typedefs from BTF (if it is used).

Add a test that requires this extension.

Resolves iovisor/bpftrace#1154.
There are two options how to handle undefined types:
- running with --btf as we are now able to parse missing type
  definitions from BTF
- including a correct header file

Resolves iovisor#730.
@viktormalik
Copy link
Contributor Author

my only concern is the string parsing of error messages. Not sure if error strings are part of clang API but I'd be worried if it wasn't. I didn't look around too hard in clang for alternative solutions so I'll take it on faith this is the best way.

I don't like this either. I tried to find an API for this, but it seems that there is none. Clang parser replaces all unknown types by int and it seems that it looses the information about the original type name. The only option could be extracting the information from Clang diagnostics (which is what we're doing here, anyway), but it doesn't seem that other parts of the diagnostics are more stable than the message itself.

@viktormalik
Copy link
Contributor Author

Tentative +1. I remember I tried doing this a couple times but I always hit some kind of realization that it would cause an issue. Can't seem to remember right now. It's possible that I've chipped away at reducing internal usage of --btf that there's no issue anymore.

The problem will come if the user tries to define a type that is also parsed from BTF. For example this program:

# bpftrace -e 'struct task_struct {int x;} i:ms:1 { printf("%d\n", curtask->x); }'

won't work because using curtask will force parsing struct task_struct from BTF and it will be then defined twice. I don't see a reason for doing this, but I may be missing something.

Anyway, I have a proposal implementation ready. Once this PR is merged, I'll open a new PR where we can discuss further.

@fbs fbs merged commit ec660c7 into bpftrace:master Sep 8, 2020
@viktormalik viktormalik deleted the btf-unknown-typedefs branch September 9, 2020 06:58
Comment on lines +636 to +638
// If additional BTF types were found, we need to repeat the process since
// that might have introduced some new unresolved typedefs.
check_additional_types = types_cnt != bpftrace.btf_set_.size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktormalik
I found this PR causes a slowdown if an unknown struct is big. For example:

before this PR

% time sudo  ./src/bpftrace --btf -e 'struct f { struct task_struct x; } BEGIN{exit();}'
Attaching 1 probe..

0.94s user 0.20s system 80% cpu 1.420 tota

after

% time sudo ./src/bpftrace --btf -e 'struct f { struct task_struct x; } BEGIN{exit();}'
Attaching 1 probe...

7.76s user 0.59s system 96% cpu 8.646 total

The problem is this commented part, and for task_struct, this loop iterates 9 times. I wonder if we can reduce the iteration. Actually, I'm not sure when we need to iterate until the condition is satisfied. At least, the above example works without any iteration. Do you have an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing this problem. The truth is that I don't have any example that would require multiple iterations, but it seems to me that it could, in theory, occur.

However, what I realized now, is that we run this loop to make sure that the Clang parser succeeds. So I believe that we can stop collecting the incomplete types once the Clang parser runs without errors. I implemented a simple fix in this branch. Could you please try it and confirm that the slowdown is gone? I'll run some tests myself and then I can open a PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it drastically improves speed.

% time sudo ./src/bpftrace --btf -e 'struct f { struct task_struct x; } BEGIN{exit();}'
Attaching 1 probe...


0.47s user 0.27s system 72% cpu 1.009 total

Please open a PR and let's discuss this further there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.
It's not that easy after all, because this simple fix breaks one regression test. The problem is that the type resolution must be run at least once. I'll come with a better fix and open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants