Print arbitrary data safely #1107

acj · 2020-01-27T03:25:23Z

Fixes #659. This PR introduces a built-in function buf that sanitizes arbitrary data so that it's safe to print. The function is similar in implementation to str and shares its constraints on length. The differences mostly stem from not using CreateProbeReadStr (can't expect the buffer to end on a zero byte) and sorting out the necessary plumbing.

I couldn't find a straightforward way to pass the buffer's length to userspace independently from the buffer itself, so I've embedded it as the first byte in the buffer. Feels a bit odd, but it's simple and works.

I'm opening this as a draft because it still needs docs and perhaps another test. It's ready for initial feedback, though.

sudo src/bpftrace -e 'tracepoint:syscalls:sys_enter_sendto { printf("%s\n", buf(args->buff, args->len)); }' -c 'ping 8.8.8.8 -c1'
Attaching 1 probe...
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
\x08\x00\xb1\x85D\xc3\x00\x01\xceQ.^\x00\x00\x00\x0073\x0f\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&'()*+,-./0123456
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=20.5 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 20.474/20.474/20.474/0.000 ms

mmisono · 2020-01-27T22:15:28Z

The current behavior is (please correct me if I'm worng):

If a struct is

struct x {
    char* a;     // Type::integer
    char  b[4];  // Type::string
    int8  c[4];  // Type::array (pointee size 1)
    int   d[4];  // Type::array (pointee size 4)
}

then,

	Type	How to read buffer	`$a = expr`	map / printf()
`x->a`	Type::integer		y	y
`x->a[0]`	not supported		n	n
`str(x->a)`	Type::string	`probe_read_str()`	y	y
`buf(x->a)`	Type::buf	`probe_read()`	n	y (printf() only)
`x->b`	Type::string	`probe_read()`	y	y
`x->b[0]`	not supported		n	n
`str(x->b)`	not supported		n	n
`buf(x->b)`	Type::buf	`probe_read()`¹	n	y (printf() only)
`x->c`	Type::array		n	n
`x->c[0]`	Type::integer		y	y
`str(x->c)`	not supported		n	n
`buf(x->c)`	not supported		n	n
`x->d`	Type::array		n	n
`x->d[0]`	Type::integer		y	y
`str(x->d)`	not supported		n	n
`buf(x->d)`	not supported		n	n

I think this is OK for now, but it would be nice if the above operations are supported (and would be less confusing).

I think the following behavior is straightforward

struct x {
    char* a;     // Type::integer
    char  b[4];  // Type::array (pointee size 1)
    int8  c[4];  // Type::array (pointee size 1)
    int   d[4];  // Type::array (pointee size 4)
}

	Type	How to read buffer	`$x = expr`	map / printf()
`x->a`	Type::integer		y	y
`x->a[0]`	Type::integer		y	y
`str(x->a)`	Type::string	`probe_read_str()`	y	y
`buf(x->a)`	Type::buf	`probe_read()`	y	y
`x->b`	Type::array		y	y ¹
`x->b[0]`	Type::integer		y	y
`str(x->b)`	Type::string	`probe_read_str()`	y	y
`buf(x->b)`	Type::buf	`probe_read()`	y	y
`x->c`	Type::array		y	y ²
`x->c[0]`	Type::integer		y	y
`str(x->c)`	Type::string	`probe_read_str()`	y	y
`buf(x->c)`	Type::buf	`probe_read()`	y	y
`x->d`	Type::array		y	y ²
`x->d[0]`	Type::integer		y	y
`str(x->d)`	not supported		n	n
`buf(x->d)`	Type::buf ³	`probe_read()`	y	y

char [] is Type::array, not Type::string
Type::array value is not in the stack
When using Type::array value as a map key/value and printf() argument, read whole data by probe_read()/probe_read_str() implicitly

following probe_read() ↩ ↩²
data read by probe_read() ↩ ↩²
pointee size = 4 ↩

fbs · 2020-01-28T17:13:33Z

I like the idea of having buf as a way to read a chunk of binary data, (the probe_read alternative of str) and having it print "hex encoded" by makes sense too.

Being able to do printf("%x\n", buf()); feels weird to me as buf isn't a string. A new format specifiier (#659) makes more sense to me, as that can be used on others too, e.g. a string with non-printables/control chars.

And like @mmisono says, being able to read from array types would be nice too. In that case we can automatically determine the amount of bytes that need to be read instead letting the user specify it.

src/ast/codegen_llvm.cpp

src/ast/semantic_analyser.cpp

fbs · 2020-02-07T22:57:46Z

You might want to take a look at #1104 too. As adding a new type means you'll have to update quite a bit of code.

acj · 2020-03-02T01:26:06Z

I've prototyped the %r format specifier and addressed most of the earlier feedback. To keep things simple for now, args for %r must be a buffer. The codegen now uses a struct to pass the buffer and length back to user space. I've been focusing on getting the basics to work and haven't worked on memory efficiency yet.

This has basic support for using a buffer with maps, but it's not working yet (verifier error). I'll dig into this next.

@fbs if you have time, would love to get your thoughts on the changes.

src/ast/codegen_llvm.cpp

fbs · 2020-03-03T21:03:23Z

the basis looks ok to me :), some notes:

Using %r sounds ok for now. Would be nice if it would support more than just buffers in the future, but that could be done later.
The codegen suite has changed in master, you probably want to rebase to avoid issues
The map not working is probably due all the map stuff not knowing how to handle the new type, theres quite a few places that will need updating (see Use a load instruction to access a context field [v2] #1104)
In the runtime test you have: struct MyStruct { int d[4]; } ... buf($s->d, 16). It would be great if we can automatically figure out that we need to read 16 bytes instead of having the user keep track of byte counts

acj · 2020-03-16T00:08:36Z

@fbs This is ready for another look and feels pretty good to me. I think I've addressed all of your feedback. There's more nice-to-have stuff to do, e.g. let %r work for regular strings as well, but it might be good to land this and then make adjustments based on usage feedback.

src/ast/codegen_llvm.cpp

tests/codegen/llvm/call_buf_implicit_size.ll

fbs · 2020-03-16T14:58:40Z

src/types.h

@@ -41,6 +41,7 @@ enum class Type
  array,
  // BPF program context; needing a different access method to satisfy the verifier
  ctx,
+  buffer,


Can you add a comment like ctx hasd?

ctx is currently the only one with a comment, so I'm not sure what's expected here. Any suggestions for what it should say?

src/ast/codegen_llvm.cpp

tests/runtime/variable

fbs · 2020-03-16T15:15:12Z

src/ast/semantic_analyser.cpp

+    if (buffer_size > max_buffer_size)
+      buffer_size = max_buffer_size;
+
+    buffer_size++; // allow room for zero byte


Why do we need a zero byte? We already have its length?

I misunderstood what was happening here when I wrote the comment. I think we need the extra byte because otherwise arg.type.size will be one byte too small (because I'm using a byte to embed the buffer length), and the final print buffer will be corrupted:

https://github.com/iovisor/bpftrace/blob/aa94d9b363b70f9ed0f43d2345605bf7880f099c/src/ast/codegen_llvm.cpp#L1973

acj · 2020-03-26T01:48:22Z

Rebased and fixed a copy-and-paste mistake in codegen. The equality test is failing now due to the recent type fixes (e523e2c), and there's an offset bug when I try to use a buffer as the key for a histogram map. Will look into these issues soon.

acj · 2020-04-01T02:02:39Z

@fbs Okay, I think this is ready for a final pass. I'll add support for buffers as hist and lhist map keys for a followup PR if that works for you. Seems like an uncommon use case that doesn't need to block us.

fbs · 2020-04-05T23:02:01Z

I will take a look asap :)

danobi

Overall structure looks great; thanks for all the work. Just some small comments

src/ast/irbuilderbpf.cpp

src/ast/semantic_analyser.cpp

src/ast/codegen_llvm.cpp

src/bpftrace.cpp

docs/reference_guide.md

src/ast/codegen_llvm.cpp

acj · 2020-04-13T01:21:31Z

@danobi Thanks! I've addressed your feedback except for the type issue noted above

src/ast/async_event_types.h

src/ast/codegen_llvm.cpp

src/ast/async_event_types.h

src/bpftrace.cpp

acj · 2020-04-14T00:15:08Z

@danobi Okay, I took a shot at implementing those suggestions. I was with you until here:

Then you alloca a fixed_buffer_length + sizeof(uint8_t) length buffer.

This way we can properly cache the llvm types. And also be less misleading.

Are you suggesting that I stop using a struct, and instead allocate a single buffer to store both the contents and the length? That was how my original implementation worked, but the struct was recommended in an earlier round of feedback, and imo feels cleaner as it lets us use GEP instead of calculating offsets. Maybe I misunderstood your comment, though.

danobi · 2020-04-14T02:11:00Z

Are you suggesting that I stop using a struct, and instead allocate a single buffer to store both the contents and the length?

Yeah that's what I originally meant

and imo feels cleaner as it lets us use GEP instead of calculating offsets

Didn't realize you can't use GEP on non-array types. Makes sense I guess. Let's just go with the changes you have in b2f876e, then.

The CI is failing on two things:

clang-format
Incomplete type AsyncEvent::Buf()
- I think it's b/c clang 6 doesn't support FLAs. Not even sure an FLA is part of C++ standard. Maybe try char content[1] and see if that works.

Otherwise LGTM

fbs · 2020-04-14T13:02:28Z

I've just opened #1262 to get rid of llvm5 :)

danobi · 2020-04-14T18:18:04Z

I think we're good to merge. Any last comments?

fbs · 2020-04-14T18:57:51Z

Yes lets take 9fc6bf5 out of this PR.

Ill take a quick look at the latest change set

(woops misclicked, who puts close right next to comment)

src/ast/async_event_types.h

fbs · 2020-04-14T19:12:23Z

LGTM

acj · 2020-04-14T19:30:39Z

Removed the two commits that referred to LLVM 5.

Any last comments?

Thanks for your feedback and patience, @fbs @danobi @mmisono. I'm very happy with where this landed. Looking forward to many more patches. :)

I hope that all is well on your end.

fbs · 2020-04-14T23:11:57Z

Cheers. You should join our little irc channel on oftc at some point :)

danobi · 2020-04-14T23:29:56Z

Or matrix bridge :p

Rtoax · 2022-05-14T10:15:07Z

Does buf() support unsigned char *data .

struct sk_buff {
  unsigned char *data;
  unsigned int data_len;
};

when

printf("%r\n", buf($skb->data, $skb->len));

get error:

bpftrace: /usr/include/llvm/IR/Instructions.h:1207: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.
Aborted

Does buf() support this?

fbs reviewed Jan 28, 2020

View reviewed changes

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved

src/ast/semantic_analyser.cpp Outdated Show resolved Hide resolved

fbs mentioned this pull request Feb 11, 2020

[WIP] tracepoint __data_loc support #770

Closed

danobi mentioned this pull request Feb 21, 2020

Add uwrite builtin function #1150

Closed

acj force-pushed the print-strings-safely branch 2 times, most recently from 6fe58c3 to 1bed0ca Compare March 2, 2020 01:06

fbs reviewed Mar 3, 2020

View reviewed changes

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved

fbs reviewed Mar 3, 2020

View reviewed changes

src/ast/codegen_llvm.cpp Outdated Show resolved Hide resolved

acj force-pushed the print-strings-safely branch 3 times, most recently from fcc5c02 to b3a5a07 Compare March 15, 2020 23:48

acj marked this pull request as ready for review March 16, 2020 00:03

acj force-pushed the print-strings-safely branch from b3a5a07 to bbb1f71 Compare March 16, 2020 00:32

fbs requested changes Mar 16, 2020

View reviewed changes

acj force-pushed the print-strings-safely branch from ad1adf9 to c603ee5 Compare March 26, 2020 01:42

acj force-pushed the print-strings-safely branch 3 times, most recently from d3743dc to a97ccdc Compare April 1, 2020 01:29

danobi reviewed Apr 10, 2020

View reviewed changes

acj added 3 commits April 12, 2020 18:27

Add CreateProbeRead that takes an llvm::Value

2b89338

Add buf builtin function

1a6ce4b

Implement buf call in codegen

49c1ea8

acj added 3 commits April 12, 2020 20:50

Implement %r format specifier

7954af7

Add tests for buf call

e4ebc02

Mention buf built-in function in docs

1a678fd

acj force-pushed the print-strings-safely branch from a97ccdc to f9f033f Compare April 13, 2020 00:55

acj commented Apr 13, 2020

View reviewed changes

src/ast/codegen_llvm.cpp Show resolved Hide resolved

danobi reviewed Apr 13, 2020

View reviewed changes

acj force-pushed the print-strings-safely branch from b2f876e to 9fc6bf5 Compare April 14, 2020 11:22

fbs closed this Apr 14, 2020

fbs reopened this Apr 14, 2020

fbs requested changes Apr 14, 2020

View reviewed changes

src/ast/async_event_types.h Outdated Show resolved Hide resolved

Improve types for AsyncEventTypes::Buf

5971ca6

acj force-pushed the print-strings-safely branch from 9fc6bf5 to 5971ca6 Compare April 14, 2020 19:24

fbs approved these changes Apr 14, 2020

View reviewed changes

fbs merged commit a7b8e17 into bpftrace:master Apr 14, 2020

danobi mentioned this pull request Apr 16, 2020

Create changelog entries for merged PRs #1261

Merged

acj deleted the print-strings-safely branch December 22, 2020 01:14

jb55 mentioned this pull request Dec 30, 2020

eBPF Linux tracepoints bitcoin/bitcoin#19866

Merged

3 tasks

0xB10C mentioned this pull request May 26, 2021

tracing: first tracepoints and documentation on User-Space, Statically Defined Tracing (USDT) bitcoin/bitcoin#22006

Merged

3 tasks

Rtoax mentioned this pull request May 14, 2022

buf() print unsigned char * error #2215

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Print arbitrary data safely #1107

Print arbitrary data safely #1107

acj commented Jan 27, 2020

mmisono commented Jan 27, 2020

fbs commented Jan 28, 2020

fbs commented Feb 7, 2020

acj commented Mar 2, 2020

fbs commented Mar 3, 2020

acj commented Mar 16, 2020

fbs Mar 16, 2020

acj Mar 17, 2020

fbs Mar 16, 2020

acj Mar 17, 2020

acj commented Mar 26, 2020

acj commented Apr 1, 2020

fbs commented Apr 5, 2020

danobi left a comment

acj commented Apr 13, 2020

acj commented Apr 14, 2020 •

edited

Loading

danobi commented Apr 14, 2020

fbs commented Apr 14, 2020

danobi commented Apr 14, 2020

fbs commented Apr 14, 2020 •

edited

Loading

fbs commented Apr 14, 2020

acj commented Apr 14, 2020

fbs commented Apr 14, 2020

danobi commented Apr 14, 2020

Rtoax commented May 14, 2022

Print arbitrary data safely #1107

Print arbitrary data safely #1107

Conversation

acj commented Jan 27, 2020

mmisono commented Jan 27, 2020

Footnotes

fbs commented Jan 28, 2020

fbs commented Feb 7, 2020

acj commented Mar 2, 2020

fbs commented Mar 3, 2020

acj commented Mar 16, 2020

fbs Mar 16, 2020

Choose a reason for hiding this comment

acj Mar 17, 2020

Choose a reason for hiding this comment

fbs Mar 16, 2020

Choose a reason for hiding this comment

acj Mar 17, 2020

Choose a reason for hiding this comment

acj commented Mar 26, 2020

acj commented Apr 1, 2020

fbs commented Apr 5, 2020

danobi left a comment

Choose a reason for hiding this comment

acj commented Apr 13, 2020

acj commented Apr 14, 2020 • edited Loading

danobi commented Apr 14, 2020

fbs commented Apr 14, 2020

danobi commented Apr 14, 2020

fbs commented Apr 14, 2020 • edited Loading

fbs commented Apr 14, 2020

acj commented Apr 14, 2020

fbs commented Apr 14, 2020

danobi commented Apr 14, 2020

Rtoax commented May 14, 2022

acj commented Apr 14, 2020 •

edited

Loading

fbs commented Apr 14, 2020 •

edited

Loading