Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print arbitrary data safely #1107

Merged
merged 8 commits into from
Apr 14, 2020
Merged

Print arbitrary data safely #1107

merged 8 commits into from
Apr 14, 2020

Conversation

acj
Copy link
Contributor

@acj acj commented Jan 27, 2020

Fixes #659. This PR introduces a built-in function buf that sanitizes arbitrary data so that it's safe to print. The function is similar in implementation to str and shares its constraints on length. The differences mostly stem from not using CreateProbeReadStr (can't expect the buffer to end on a zero byte) and sorting out the necessary plumbing.

I couldn't find a straightforward way to pass the buffer's length to userspace independently from the buffer itself, so I've embedded it as the first byte in the buffer. Feels a bit odd, but it's simple and works.

I'm opening this as a draft because it still needs docs and perhaps another test. It's ready for initial feedback, though.

sudo src/bpftrace -e 'tracepoint:syscalls:sys_enter_sendto { printf("%s\n", buf(args->buff, args->len)); }' -c 'ping 8.8.8.8 -c1'
Attaching 1 probe...
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
\x08\x00\xb1\x85D\xc3\x00\x01\xceQ.^\x00\x00\x00\x0073\x0f\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&'()*+,-./0123456
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=20.5 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 20.474/20.474/20.474/0.000 ms

@mmisono
Copy link
Collaborator

mmisono commented Jan 27, 2020

The current behavior is (please correct me if I'm worng):

If a struct is

struct x {
    char* a;     // Type::integer
    char  b[4];  // Type::string
    int8  c[4];  // Type::array (pointee size 1)
    int   d[4];  // Type::array (pointee size 4)
}

then,

Type How to read buffer $a = expr map / printf()
x->a Type::integer y y
x->a[0] not supported n n
str(x->a) Type::string probe_read_str() y y
buf(x->a) Type::buf probe_read() n y (printf() only)
x->b Type::string probe_read() y y
x->b[0] not supported n n
str(x->b) not supported n n
buf(x->b) Type::buf probe_read()1 n y (printf() only)
x->c Type::array n n
x->c[0] Type::integer y y
str(x->c) not supported n n
buf(x->c) not supported n n
x->d Type::array n n
x->d[0] Type::integer y y
str(x->d) not supported n n
buf(x->d) not supported n n

I think this is OK for now, but it would be nice if the above operations are supported (and would be less confusing).


I think the following behavior is straightforward

struct x {
    char* a;     // Type::integer
    char  b[4];  // Type::array (pointee size 1)
    int8  c[4];  // Type::array (pointee size 1)
    int   d[4];  // Type::array (pointee size 4)
}
Type How to read buffer $x = expr map / printf()
x->a Type::integer y y
x->a[0] Type::integer y y
str(x->a) Type::string probe_read_str() y y
buf(x->a) Type::buf probe_read() y y
x->b Type::array y y 1
x->b[0] Type::integer y y
str(x->b) Type::string probe_read_str() y y
buf(x->b) Type::buf probe_read() y y
x->c Type::array y y 2
x->c[0] Type::integer y y
str(x->c) Type::string probe_read_str() y y
buf(x->c) Type::buf probe_read() y y
x->d Type::array y y 2
x->d[0] Type::integer y y
str(x->d) not supported n n
buf(x->d) Type::buf 3 probe_read() y y
  • char [] is Type::array, not Type::string
  • Type::array value is not in the stack
  • When using Type::array value as a map key/value and printf() argument, read whole data by probe_read()/probe_read_str() implicitly

Footnotes

  1. following probe_read() 2

  2. data read by probe_read() 2

  3. pointee size = 4

@fbs
Copy link
Contributor

fbs commented Jan 28, 2020

I like the idea of having buf as a way to read a chunk of binary data, (the probe_read alternative of str) and having it print "hex encoded" by makes sense too.

Being able to do printf("%x\n", buf()); feels weird to me as buf isn't a string. A new format specifiier (#659) makes more sense to me, as that can be used on others too, e.g. a string with non-printables/control chars.

And like @mmisono says, being able to read from array types would be nice too. In that case we can automatically determine the amount of bytes that need to be read instead letting the user specify it.

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/semantic_analyser.cpp Outdated Show resolved Hide resolved
@fbs
Copy link
Contributor

fbs commented Feb 7, 2020

You might want to take a look at #1104 too. As adding a new type means you'll have to update quite a bit of code.

@acj acj force-pushed the print-strings-safely branch 2 times, most recently from 6fe58c3 to 1bed0ca Compare March 2, 2020 01:06
@acj
Copy link
Contributor Author

acj commented Mar 2, 2020

I've prototyped the %r format specifier and addressed most of the earlier feedback. To keep things simple for now, args for %r must be a buffer. The codegen now uses a struct to pass the buffer and length back to user space. I've been focusing on getting the basics to work and haven't worked on memory efficiency yet.

This has basic support for using a buffer with maps, but it's not working yet (verifier error). I'll dig into this next.

@fbs if you have time, would love to get your thoughts on the changes.

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
@fbs
Copy link
Contributor

fbs commented Mar 3, 2020

the basis looks ok to me :), some notes:

  • Using %r sounds ok for now. Would be nice if it would support more than just buffers in the future, but that could be done later.
  • The codegen suite has changed in master, you probably want to rebase to avoid issues
  • The map not working is probably due all the map stuff not knowing how to handle the new type, theres quite a few places that will need updating (see Use a load instruction to access a context field [v2] #1104)
  • In the runtime test you have: struct MyStruct { int d[4]; } ... buf($s->d, 16). It would be great if we can automatically figure out that we need to read 16 bytes instead of having the user keep track of byte counts

@acj acj force-pushed the print-strings-safely branch 3 times, most recently from fcc5c02 to b3a5a07 Compare March 15, 2020 23:48
@acj acj marked this pull request as ready for review March 16, 2020 00:03
@acj
Copy link
Contributor Author

acj commented Mar 16, 2020

@fbs This is ready for another look and feels pretty good to me. I think I've addressed all of your feedback. There's more nice-to-have stuff to do, e.g. let %r work for regular strings as well, but it might be good to land this and then make adjustments based on usage feedback.

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
tests/codegen/llvm/call_buf_implicit_size.ll Outdated Show resolved Hide resolved
@@ -41,6 +41,7 @@ enum class Type
array,
// BPF program context; needing a different access method to satisfy the verifier
ctx,
buffer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment like ctx hasd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is currently the only one with a comment, so I'm not sure what's expected here. Any suggestions for what it should say?

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
tests/runtime/variable Show resolved Hide resolved
if (buffer_size > max_buffer_size)
buffer_size = max_buffer_size;

buffer_size++; // allow room for zero byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a zero byte? We already have its length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood what was happening here when I wrote the comment. I think we need the extra byte because otherwise arg.type.size will be one byte too small (because I'm using a byte to embed the buffer length), and the final print buffer will be corrupted:

https://github.com/iovisor/bpftrace/blob/aa94d9b363b70f9ed0f43d2345605bf7880f099c/src/ast/codegen_llvm.cpp#L1973

@acj
Copy link
Contributor Author

acj commented Mar 26, 2020

Rebased and fixed a copy-and-paste mistake in codegen. The equality test is failing now due to the recent type fixes (e523e2c), and there's an offset bug when I try to use a buffer as the key for a histogram map. Will look into these issues soon.

@acj acj force-pushed the print-strings-safely branch 3 times, most recently from d3743dc to a97ccdc Compare April 1, 2020 01:29
@acj
Copy link
Contributor Author

acj commented Apr 1, 2020

@fbs Okay, I think this is ready for a final pass. I'll add support for buffers as hist and lhist map keys for a followup PR if that works for you. Seems like an uncommon use case that doesn't need to block us.

@fbs
Copy link
Contributor

fbs commented Apr 5, 2020

I will take a look asap :)

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall structure looks great; thanks for all the work. Just some small comments

src/ast/irbuilderbpf.cpp Show resolved Hide resolved
src/ast/semantic_analyser.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Show resolved Hide resolved
src/ast/codegen_llvm.cpp Show resolved Hide resolved
src/bpftrace.cpp Outdated Show resolved Hide resolved
docs/reference_guide.md Outdated Show resolved Hide resolved
docs/reference_guide.md Outdated Show resolved Hide resolved
@acj
Copy link
Contributor Author

acj commented Apr 13, 2020

@danobi Thanks! I've addressed your feedback except for the type issue noted above

src/ast/async_event_types.h Outdated Show resolved Hide resolved
src/ast/async_event_types.h Outdated Show resolved Hide resolved
src/ast/codegen_llvm.cpp Show resolved Hide resolved
src/ast/async_event_types.h Outdated Show resolved Hide resolved
src/bpftrace.cpp Outdated Show resolved Hide resolved
@acj
Copy link
Contributor Author

acj commented Apr 14, 2020

@danobi Okay, I took a shot at implementing those suggestions. I was with you until here:

Then you alloca a fixed_buffer_length + sizeof(uint8_t) length buffer.

This way we can properly cache the llvm types. And also be less misleading.

Are you suggesting that I stop using a struct, and instead allocate a single buffer to store both the contents and the length? That was how my original implementation worked, but the struct was recommended in an earlier round of feedback, and imo feels cleaner as it lets us use GEP instead of calculating offsets. Maybe I misunderstood your comment, though.

@danobi
Copy link
Member

danobi commented Apr 14, 2020

Are you suggesting that I stop using a struct, and instead allocate a single buffer to store both the contents and the length?

Yeah that's what I originally meant

and imo feels cleaner as it lets us use GEP instead of calculating offsets

Didn't realize you can't use GEP on non-array types. Makes sense I guess. Let's just go with the changes you have in b2f876e, then.


The CI is failing on two things:

  • clang-format
  • Incomplete type AsyncEvent::Buf()
    • I think it's b/c clang 6 doesn't support FLAs. Not even sure an FLA is part of C++ standard. Maybe try char content[1] and see if that works.

Otherwise LGTM

@fbs
Copy link
Contributor

fbs commented Apr 14, 2020

I've just opened #1262 to get rid of llvm5 :)

@danobi
Copy link
Member

danobi commented Apr 14, 2020

I think we're good to merge. Any last comments?

@fbs
Copy link
Contributor

fbs commented Apr 14, 2020

Yes lets take 9fc6bf5 out of this PR.

Ill take a quick look at the latest change set

(woops misclicked, who puts close right next to comment)

@fbs fbs closed this Apr 14, 2020
@fbs fbs reopened this Apr 14, 2020
src/ast/async_event_types.h Outdated Show resolved Hide resolved
@fbs
Copy link
Contributor

fbs commented Apr 14, 2020

LGTM

@acj
Copy link
Contributor Author

acj commented Apr 14, 2020

Removed the two commits that referred to LLVM 5.

Any last comments?

Thanks for your feedback and patience, @fbs @danobi @mmisono. I'm very happy with where this landed. Looking forward to many more patches. :)

I hope that all is well on your end.

@fbs fbs merged commit a7b8e17 into bpftrace:master Apr 14, 2020
@fbs
Copy link
Contributor

fbs commented Apr 14, 2020

Cheers. You should join our little irc channel on oftc at some point :)

@danobi
Copy link
Member

danobi commented Apr 14, 2020

Or matrix bridge :p

@Rtoax
Copy link
Contributor

Rtoax commented May 14, 2022

Does buf() support unsigned char *data .

struct sk_buff {
  unsigned char *data;
  unsigned int data_len;
};

when

printf("%r\n", buf($skb->data, $skb->len));

get error:

bpftrace: /usr/include/llvm/IR/Instructions.h:1207: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.
Aborted

Does buf() support this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A way to print arbitrary binary data safely?
5 participants