
Bpftrace fixes for s390x #1241

Merged: 5 commits merged into master on Apr 13, 2020
Conversation

sumanthkorikkar
Contributor

This patch series adds register support for s390x. It also fixes various issues that can occur on big-endian architectures.

  1. bpftrace: Add s390x register support
  2. bpftrace: Fix sarg support for s390x and other architectures
  3. bpftrace: Fix type conversion in printf() builtin
  4. bpftrace: Fix histogram output & stat output for big-endian systems
  5. bpftrace: Add architecture specific testcase support
  6. bpftrace: Introduce htonl, htons, ntohl, ntohs functions
  7. bpftrace: Fix ntop builtin function
  8. bpftrace: Fix variable assignment of 32 bit integers

@sumanthkorikkar
Contributor Author

Please review; requesting to merge this into master.

Thank you.

@sumanthkorikkar
Contributor Author

Hello @fbs , @brendangregg @yonghong-song,

The last commit, "Fix variable assignment of 32 bit integers", causes a failure in tools-parsing-test.sh.

The other commits pass the tests.

A few questions:

  1. Should I create a new pull request without the last commit?
  2. Or should I create a pull request per commit, so that it can be reviewed individually?

Any suggestions would be great. Thank you.

Best Regards
Sumanth

@danobi
Member

danobi commented Apr 3, 2020

Hi @sumanthkorikkar, thanks for the PR. I'm skimming it and it seems reasonable (will look more closely on Monday).

Should I create a new pull request without the last commit?

It looks like it's hitting some kind of assertion. Do you think you can figure out what's going on?

If you wanna do partial merges of s390x support we can also go with a build flag that's off by default and fix the abort in another PR.

Or should I create a pull request per commit, so that it can be reviewed individually?

Usually we do it all in one PR, especially if the commits depend on each other. But see above if you wanna get most of the code in first so you don't have to keep fixing merge conflicts.

Other stuff:

  • can you look at the clang-format build job failures and apply the suggestions there?
  • might be a good time to decide what direction we wanna take static binaries. See inline comment

@sumanthkorikkar
Contributor Author

Hi @danobi

Hi @sumanthkorikkar, thanks for the PR. I'm skimming it and it seems reasonable (will look more closely on Monday).

OK. Thank you.

Should I create a new pull request without the last commit?

It looks like it's hitting some kind of assertion. Do you think you can figure out what's going on?

Sure. I will give it a try.

If you wanna do partial merges of s390x support we can also go with a build flag that's off by default and fix the abort in another PR.

OK. I will separate the s390 support and fix the abort in another PR.

Or should I create a pull request per commit, so that it can be reviewed individually?

Usually we do it all in one PR, especially if the commits depend on each other. But see above if you wanna get most of the code in first so you don't have to keep fixing merge conflicts.

Other stuff:

  • can you look at the clang-format build job failures and apply the suggestions there?

OK, sure. I will make it conform to clang-format and resend. Thanks.

  • might be a good time to decide what direction we wanna take static binaries. See inline comment

Ok.

@fbs
Contributor

fbs commented Apr 4, 2020

Thanks for working on this! :)

Haven't looked closely yet either, but just some remarks:


As for the helpers: having both 'short/i16' and 'long/i32' versions is a bit different from the way most helpers currently work, which usually look at the input argument and derive things from it. Instead of having 4 helpers, can't we have one that just flips the byte order? Looking at the change, that seems possible to do.

We're not 1.0 yet, but I'd rather not add any helpers that we don't need. If a good use case pops up we can always add more helpers (calls).


Re the current compiler issue:

bpftrace is not as strict with LLVM typing as LLVM wants us to be. While it "mostly works", you can run into weird issues like the crashes you're seeing now. A simple way to test that is running bpftrace -dd: if it shows the pre-optimization code but crashes during optimization, it is likely a typing issue.

In this case it is caused by:

    $dport = $sk->__sk_common.skc_dport;
    $dport = ntohs($dport);

Which compiles down to

  %"$dport" = alloca i16
...
  %100 = load i16, i16* %"struct sock_common.skc_dport"
...
  %111 = or i16 %108, %110
  %112 = zext i16 %111 to i64
  store i64 %112, i16* %"$dport"

Doing an i64 store into an i16* causes the issue in this case. As a fix:

]$ diff /vagrant/tools/tcpconnect.bt tcpconnect.bt
46c46
<     $dport = $sk->__sk_common.skc_dport;
---
>     $dport = (uint64) $sk->__sk_common.skc_dport;

Lately I've been fixing most of these (e.g #1189) with #1231 and #1194 still open but there probably still are some left.

In this case the issue is mostly with the semantic analyser, however. Your newly added functions have call.type set to i64, which apparently doesn't get checked properly during variable assignment, as it only appears to check the type:

1481:  else if (search->second.type != assignment.expr->type.type) {

I want to start working on improving on this (#878) once the current codegen issues are fixed but that will take a while.

For now it will probably work if you make the return type of your new helpers match the input type, instead of casting it to i64.


Can you add codegen tests for the new helpers (after we've decided what to do with them, see the first part of this comment)?


What are your thoughts about testing this and making sure it keeps working? Are there any CI platforms that do s390? At least being able to do semi-regular (daily, weekly) builds would be a huge help.

@fbs
Contributor

fbs commented Apr 4, 2020

As for splitting up, having the first 4 or 5 commits go in first would be easier.

@sumanthkorikkar
Contributor Author

Hi @fbs,

Thanks for reviewing.

Thanks for working on this! :)

Haven't looked closely yet either, but just some remarks:

As for the helpers: having both 'short/i16' and 'long/i32' versions is a bit different from the way most helpers currently work, which usually look at the input argument and derive things from it. Instead of having 4 helpers, can't we have one that just flips the byte order? Looking at the change, that seems possible to do.

Something like bswap() instead of hton*(), based on the input type? Let me try that.
I added these functions for user readability.
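A width-aware swap could look roughly like this Python sketch (illustrative only; the helper name bswap and its dispatch on the input width are assumptions, not the eventual bpftrace implementation):

```python
import struct

def bswap(value: int, width: int) -> int:
    """Reverse the byte order of `value`; `width` is 2, 4, or 8 bytes.

    One helper like this can stand in for htons/htonl/ntohs/ntohl:
    the swap is its own inverse, so hton* and ntoh* collapse into a
    single operation chosen by the input's size.
    """
    fmt = {2: "H", 4: "I", 8: "Q"}[width]
    # Pack as big-endian, unpack as little-endian to flip the bytes.
    return struct.unpack("<" + fmt, struct.pack(">" + fmt, value))[0]

print(hex(bswap(0x1234, 2)))      # 0x3412
print(hex(bswap(0xDEADBEEF, 4)))  # 0xefbeadde
```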

We're not 1.0 yet, but I'd rather not add any helpers that we don't need. If a good use case pops up we can always add more helpers (calls).

Re the current compiler issue:

bpftrace is not as strict with LLVM typing as LLVM wants us to be. While it "mostly works", you can run into weird issues like the crashes you're seeing now. A simple way to test that is running bpftrace -dd: if it shows the pre-optimization code but crashes during optimization, it is likely a typing issue.

In this case it is caused by:

    $dport = $sk->__sk_common.skc_dport;
    $dport = ntohs($dport);

Which compiles down to

  %"$dport" = alloca i16
...
  %100 = load i16, i16* %"struct sock_common.skc_dport"
...
  %111 = or i16 %108, %110
  %112 = zext i16 %111 to i64
  store i64 %112, i16* %"$dport"

Doing an i64 store into an i16* causes the issue in this case. As a fix:

]$ diff /vagrant/tools/tcpconnect.bt tcpconnect.bt
46c46
<     $dport = $sk->__sk_common.skc_dport;
---
>     $dport = (uint64) $sk->__sk_common.skc_dport;

Ok. Thanks for mentioning this.

Lately I've been fixing most of these (e.g #1189) with #1231 and #1194 still open but there probably still are some left.

In this case the issue is mostly with the semantic analyser, however. Your newly added functions have call.type set to i64, which apparently doesn't get checked properly during variable assignment, as it only appears to check the type:

1481:  else if (search->second.type != assignment.expr->type.type) {

I want to start working on improving on this (#878) once the current codegen issues are fixed but that will take a while.

For now it will probably work if you make the return type of your new helpers match the input type, instead of casting it to i64.

Ok. I will check.

Can you add codegen tests for the new helpers (after we've decided what to do with them, see the first part of this comment)?

I will add these. But the current codegen tests are specific to little endian: the [llvm/*] datalayout is different for big endian. I will think about this and see how we can improve the codegen tests for both architectures.

What are your thoughts about testing this and making sure it keeps working? Are there any CI platforms that do s390? At least being able to do semi-regular (daily, weekly) builds would be a huge help.

OK. I need to check with my colleagues whether there are CI platforms for s390.

Thank you

@sumanthkorikkar
Contributor Author

Hi @fbs,

What are your thoughts about testing this and making sure it keeps working? Are there any CI platforms that do s390? At least being able to do semi-regular (daily, weekly) builds would be a huge help.

I have added bpftrace runtime-tests to our internal CI environment as of now. So, we run daily builds and runtime-tests.

@sumanthkorikkar
Contributor Author

Hi @fbs, @danobi

I have made changes to the initial 5 commits. The other commits (hton* and later) will be handled in another PR. Please merge this into master if it looks good. Thank you.

src/arch/ppc64.cpp
@@ -3,6 +3,11 @@
#include <algorithm>
#include <array>

// For little endian 64 bit, sp + 32 + 8 regs save area + argX
#define ARG0_STACK_LE 96
Member

I can't seem to find where in the spec LE and BE are different for stack frame setup. Can you point me to it?

Contributor Author

PowerPC64, little endian: see the Power Architecture 64-Bit ELF V2 ABI Specification
https://members.openpowerfoundation.org/document/dl/576

  • See parameter save area
    • When the caller allocates the Parameter Save Area, it will always be automatically quadword aligned because it must always start at SP + 32. It shall be at least 8 doublewords in length. If a function needs to pass more than 8 doublewords of arguments, the Parameter Save Area shall be large enough to spill all register-based parameters and to contain the arguments that the caller stores in it.
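The 96-byte ARG0_STACK_LE constant follows directly from that layout; a quick sketch of the arithmetic, with the values taken from the ABI text quoted above:

```python
# ELFv2 (ppc64le) layout per the quoted ABI text: the Parameter Save
# Area starts at SP + 32 and holds at least 8 doublewords for spilled
# register-based parameters, so the first stack argument sits past it.
SP_TO_PARAM_SAVE_AREA = 32
DOUBLEWORD = 8
REG_SAVE_DOUBLEWORDS = 8

ARG0_STACK_LE = SP_TO_PARAM_SAVE_AREA + REG_SAVE_DOUBLEWORDS * DOUBLEWORD
print(ARG0_STACK_LE)  # 96
```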

Member

@danobi danobi Apr 9, 2020

Hm, I'm a little confused. It seems like the ABI changed starting at Power ISA 2.07B. So why is only LE using the new ABI?

Won't block this PR on these questions. Fixing the sarg issue is a big win already.

src/arch/s390.cpp
tests/runtime/engine/parser.py (outdated)
@danobi
Member

danobi commented Apr 8, 2020

I have added bpftrace runtime-tests to our internal CI environment as of now. So, we run daily builds and runtime-tests.

Sounds good. If you can find a way to set it up in the open source CI, it would help us prevent future breakages at merge time. Not sure if it would be less work in the long run (depends on how much of our stuff implicitly depends on endianness, I guess).

@sumanthkorikkar
Contributor Author

sumanthkorikkar commented Apr 9, 2020

I have added bpftrace runtime-tests to our internal CI environment as of now. So, we run daily builds and runtime-tests.

Sounds good. If you can find a way to set it up in the open source CI, it would help us prevent future breakages at merge time. Not sure if it would be less work in the long run (depends on how much of our stuff implicitly depends on endianness, I guess).

Hi @danobi , @fbs

Yes, that would be great for the community. One of my colleagues referred me to this, so it should be possible:
https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

I can try this. But I would need some input from you about the requirements (OS requirements, services, scripts) to run it:

  • Some Docker scripts and the Travis YAML file which are currently used to run bpftrace
  • The approach above seems to use LXD containers instead of Docker; we need to verify whether we can run bpftrace as root.

OK. I will have a look at the build scripts available here. Thanks.

@danobi
Member

danobi commented Apr 9, 2020

But I would need some input from you about the requirements (OS requirements, services, scripts) to run it:

I don't know if we have any hard requirements other than it functioning as expected. I don't think any of us are too picky about how tests get run, just that they run somehow. We have GitHub Actions and Travis CI at our disposal.

The approach above seems to use LXD containers instead of Docker; we need to verify whether we can run bpftrace as root.

This seems like a pain. We also have GitHub Actions set up. I looked around and found this: https://github.com/marketplace/actions/run-on-architecture . It looks much easier to set up than LXD. It seems to be powered by https://github.com/multiarch/qemu-user-static , which looks quite powerful.

@fbs
Contributor

fbs commented Apr 9, 2020

Looks mostly good to me. I'm not that sure about "bpftrace: Fix type conversion in printf() builtin". Using cast_type probably works OK in most cases, but structs are casts and casts are casts (obviously), and not all types are 8 bytes; e.g. cpid is 4 bytes.

I think the better way to fix this is to get rid of that resize the semantic analyser does and instead make sure everything is properly stored, which is what I want to do with #1231.

I think the best we can do is to accept it for now and revert it later (e.g. in #1231).

@sumanthkorikkar
Contributor Author

But I would need some input from you about the requirements (OS requirements, services, scripts) to run it:

I don't know if we have any hard requirements other than it functioning as expected. I don't think any of us are too picky about how tests get run, just that they run somehow. We have GitHub Actions and Travis CI at our disposal.

The approach above seems to use LXD containers instead of Docker; we need to verify whether we can run bpftrace as root.

This seems like a pain. We also have GitHub Actions set up. I looked around and found this: https://github.com/marketplace/actions/run-on-architecture . It looks much easier to set up than LXD. It seems to be powered by https://github.com/multiarch/qemu-user-static , which looks quite powerful.

Hi @danobi,

My findings:

  1. I gave qemu-user-static a try. I tried running the bpftrace command on s390x/fedora with a privileged-mode docker run.
     • Run: docker run -t -v /usr/src:/usr/src:ro -v /lib/modules/:/lib/modules:ro -v /sys/kernel/debug/:/sys/kernel/debug:rw --net=host --pid=host --privileged -t test-demo
     • The Dockerfile contains the setup to run a sys_open_read() probe
     • Starting the script:
       Error creating printf map: Function not implemented
       Creation of the required BPF maps has failed.
       Make sure you have all the required permissions and are not confined (e.g. like
       snapcraft does). dmesg will likely have useful output for further troubleshooting
       ##[error]Process completed with exit code 1.
     • Compilation is quite slow and may time out when running the runtime tests.
     • Related issue: multiarch/qemu-user-static#17. Tried this, but got the same error message.

  2. However, the same works without qemu-user-static on x86.

  3. Using LXD containers may not be an option, as they do not provide access to /sys/kernel/debug:rw, which bpftrace needs.

  4. We should discuss other options with the team, but this may need some time.

Add s390x-specific registers. This is needed for bpftrace builtins like argX,
regs(), retval, etc. This commit provides various functions to perform proper
offset calculation from the pt_regs context. The builtin functions of
bpftrace use these offset functions to generate the proper bytecode for s390x.

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Introduce an arg_stack_offset() function and define it for each
architecture. When the number of arguments passed to a function is
greater than the number of available argument registers, the
additional parameters are stored on the stack. This function calculates
the argument stack offset for each architecture. This should fix the
sarg builtin support for s390x/powerpc64/aarch64. It should also be
noted that only 64-bit architectures are supported.

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
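The idea behind arg_stack_offset() can be sketched as follows (a hypothetical illustration: the 8 argument registers and 96-byte base are example values in the spirit of ppc64le, not the exact constants from the patch):

```python
def arg_stack_offset(arg_num: int, num_arg_regs: int, arg0_stack: int,
                     slot_size: int = 8) -> int:
    """Stack offset of 0-based argument `arg_num` on a 64-bit ABI.

    Once the register-passed arguments run out, each further argument
    occupies one 8-byte slot past the architecture's base stack offset.
    """
    if arg_num < num_arg_regs:
        raise ValueError("argument is passed in a register, not on the stack")
    return arg0_stack + (arg_num - num_arg_regs) * slot_size

# e.g. the 10th argument (index 9) with 8 register arguments and a
# 96-byte base offset:
print(arg_stack_offset(9, 8, 96))  # 104
```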
LLVM might change the size used to store the printf arguments on the
stack according to BPF alignment requirements. This resulted in the
bpftrace printf() builtin interpreting the values incorrectly. To
avoid this, in commit df51b11 ("promote scalar printf args to 64-bits"),
all printf integer args were promoted to 64 bits. However, this
causes issues when performing type conversion on big-endian systems.

On big-endian systems:

1. Current state:
bpftrace -e 'BEGIN { printf("%d", (uint16)0x1234); exit() }'
Returns 0

Expected result:
0x1234

2. Current state: On big-endian machines, this is interpreted as:
Address: Value
0x0 : 12
0x1 : 34
---------
0x2 : 00
0x3 : 00
0x4 : 00
0x5 : 00
0x6 : 00
0x7 : 00

format() will interpret it from 0x6-0x7 and print the result as 0. Hence
the result is always zero.

Expected state: On big-endian machines:
Address: Value
0x0 : 00
0x1 : 00
0x2 : 00
0x3 : 00
0x4 : 00
0x5 : 00
0x6 : 00
0x7 : 12
0x8 : 34

format() will print 0x1234.

Now let's see the bytecode:

0: (bf) r6 = r1
1: (b7) r1 = 0
2: (7b) *(u64 *)(r10 -8) = r1
last_idx 2 first_idx 0
regs=2 stack=0 before 1: (b7) r1 = 0
3: (b7) r2 = 4660
4: (6b) *(u16 *)(r10 -8) = r2   <<<<<<< The uppermost bytes 0x0-0x1 are
set to 0x1234, which is OK for little endian but not for big endian.
5: (7b) *(u64 *)(r10 -16) = r1
6: (18) r7 = 0xab6d0c00
8: (85) call bpf_get_smp_processor_id#8
9: (bf) r4 = r10
10: (07) r4 += -16   <<<< arg_data.offset for calls + arg_data args.
11: (bf) r1 = r6
12: (bf) r2 = r7
13: (bf) r3 = r0
14: (b7) r5 = 16
15: (85) call bpf_perf_event_output#25
last_idx 15 first_idx 0
regs=20 stack=0 before 14: (b7) r5 = 16
16: (b7) r0 = 0
17: (95) exit

bpftrace also maintains arg.type.cast_type information. Utilize this
information to correctly print the arguments on big-endian
architectures.

Test result:
./bpftrace  -e  'BEGIN { printf("%x", (uint8)-1); exit()  }'
Attaching 1 probe...
ff

Fixes: bpftrace#47
Fixes: df51b11 ("promote scalar printf args to 64-bits")

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
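The byte layouts described in this commit message can be reproduced on the host with Python's struct module (a sketch of the effect only, not the BPF code itself):

```python
import struct

val = 0x1234  # the (uint16) literal from the example above

# Buggy pattern: a u16 store into the start of a zeroed 8-byte slot.
# On a big-endian host the two meaningful bytes land in the *high*
# end of the u64, so reading the slot as a u64 yields the wrong value.
buggy_be = struct.pack(">H", val) + b"\x00" * 6
print(hex(struct.unpack(">Q", buggy_be)[0]))  # 0x1234000000000000

# Fixed pattern: promote to u64 before the store, so the value occupies
# the low-order bytes of the slot regardless of endianness.
fixed_be = struct.pack(">Q", val)
print(hex(struct.unpack(">Q", fixed_be)[0]))  # 0x1234
```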
1. bpftrace -e 'kretprobe:vfs_read { @=hist(512); exit(); }'
2. The bpftrace histogram adds the value 512 to the wrong bucket.
- On big-endian systems, the output value is always added to the 0th
bucket instead of the log2(512) bucket. The output could be wrong
even on little-endian systems, when the bucket number goes beyond 255.
Avoid this by properly reading 8 bytes of data instead of a 1-byte char.
- Also avoid 32-bit overflow of the sum.

All histogram- and stats-related testcases are now OK.

Test result:
./bpftrace -e 'kretprobe:vfs_read { @[1]=hist(512); @[1]=hist(512);
@[1]=hist(1024); exit(); }'
Attaching 1 probe...

@[1]:
[512, 1K)              2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@|

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
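The 1-byte-vs-8-byte read can also be sketched on the host (the key layout here is an assumption for illustration, not the exact bpftrace map structures):

```python
import struct

def log2_bucket(value: int) -> int:
    """Histogram bucket index: floor(log2(value)), with 0 mapped to bucket 0."""
    return value.bit_length() - 1 if value > 0 else 0

bucket = log2_bucket(512)           # 512 belongs in bucket 9
key_be = struct.pack(">Q", bucket)  # bucket index as a u64 on a big-endian host

broken = key_be[0]                        # old 1-byte read: first byte is 0 on BE
correct = struct.unpack(">Q", key_be)[0]  # full 8-byte read

print(bucket, broken, correct)  # 9 0 9
```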
- Add ARCH support for the test tool
- Fix architecture specific testcases

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
@sumanthkorikkar
Contributor Author

Looks mostly good to me. I'm not that sure about "bpftrace: Fix type conversion in printf() builtin". Using cast_type probably works OK in most cases, but structs are casts and casts are casts (obviously), and not all types are 8 bytes; e.g. cpid is 4 bytes.

I think the better way to fix this is to get rid of that resize the semantic analyser does and instead make sure everything is properly stored, which is what I want to do with #1231.

I think the best we can do is to accept it for now and revert it later (e.g. in #1231).

Ok. Thanks

Member

@danobi danobi left a comment

We can keep the CI conversation going in parallel. I think this is good to merge. I'll merge tomorrow if there aren't any new comments.

@fbs
Contributor

fbs commented Apr 13, 2020

+1. Let's get the 0.10 release out first and merge this first thing after.

@fbs fbs added this to the 0.11.0 milestone Apr 13, 2020
@danobi danobi merged commit 48d4051 into bpftrace:master Apr 13, 2020
@danobi danobi mentioned this pull request Jun 10, 2021