llvm-addr2line does not work on AOMP-generated binary #616

Closed
pearzt opened this issue Aug 22, 2023 · 7 comments

@pearzt

pearzt commented Aug 22, 2023

Consider the following minimal example:

foo.c:

#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

int bar() {
    return 42;
}

int main() {
    const void* code_addr = bar;
    Dl_info dl_info;
    dladdr(code_addr, &dl_info);
    printf("%s\n", dl_info.dli_fname);
    void* base_address = dl_info.dli_fbase;

    // this is the address in the ELF file
    size_t relative_address = (size_t) code_addr - (size_t) base_address;
    printf("0x%zx\n", relative_address);
}

run.sh:

#!/usr/bin/env bash
which clang
which llvm-addr2line

clang foo.c -o foo.out -ldl -g -O0 || exit 1
./foo.out
llvm-addr2line $(./foo.out | tail -n1) -e foo.out || exit 1

With LLVM-upstream clang 17.0.0, it works as expected and prints the source location of bar:

$ ./run.sh
/.../LLVM/upstream/bin/clang
/.../LLVM/upstream/bin/llvm-addr2line
./foo.out
0x1150
/.../foo.c:5

However, with AOMP, the source attribution fails, as dladdr seems to return a wrong address:

$ ./run.sh
/.../AOMP/17.0-3/bin/clang
/.../AOMP/17.0-3/bin/llvm-addr2line
./foo.out
0x1750
??:0

AOMP clang version: AOMP_STANDALONE_17.0-3 clang version 17.0.0 (https://github.com/radeonopencompute/llvm-project f959ea5d8d1e5aef4b6d06727a9698316d3d33cd)

@gregrodgers gregrodgers added the bug Something isn't working label Sep 26, 2023
@gregrodgers
Contributor

I confirm this fails in 18.0-0.

grodgers@ixt-sjc2-08:~/git/trunk18.0/aomp/test/smoke/foo$ cat run.sh
#!/usr/bin/env bash
which clang
which llvm-addr2line

clang foo.c -o foo.out -ldl -g -O0 || exit 1
./foo.out
llvm-addr2line $(./foo.out | tail -n1) -e foo.out || exit 1
grodgers@ixt-sjc2-08:~/git/trunk18.0/aomp/test/smoke/foo$ chmod 755 run.sh
grodgers@ixt-sjc2-08:~/git/trunk18.0/aomp/test/smoke/foo$ ./run.sh
/usr/lib/aomp_18.0-0/bin/clang
/usr/lib/aomp_18.0-0/bin/llvm-addr2line
./foo.out
0x1710
??:0

I am not familiar with either llvm-addr2line or dladdr. I will ask around.

@slinder1
Contributor

AOMP is producing a true executable, rather than a shared object or PIE:

$ readelf -h foo.aomp.out | grep Type:
  Type:                              EXEC (Executable file)

Note that dladdr does seem to succeed and return something reasonable, namely the load address of the executable, although that result is not guaranteed to be meaningful for symbols that are not defined in a shared object.

Either way, the addresses in the DWARF information of an EXEC ELF file are already virtual addresses (see section 7.3.3, "Executable Objects", of the DWARF specification), so subtracting the load address is incorrect. You can reproduce the issue without any LLVM/AMD tooling at all, e.g. with gcc (passing an explicit -no-pie) and the non-LLVM addr2line:

$ gcc foo.c -o foo.gcc.out -ldl -g -O0 -no-pie
$ addr2line $(./foo.gcc.out | tail -n1) -e foo.gcc.out
??:0

If there is any bug to file against AOMP, it would just be that it likely should not produce a non-PIE executable by default. I don't know enough about the context of that default to comment further, though.
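The rule described above (subtract the load address only for position-independent objects) can be sketched in C. This is a minimal sketch, not code from the thread: the function name `addr2line_address` and its parameters are hypothetical, and it assumes a Linux system with `<elf.h>` and a PIE whose link-time addresses start at 0, which is typical but not guaranteed.

```c
#include <assert.h>
#include <elf.h>     /* ET_EXEC, ET_DYN (Linux/glibc header) */
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): pick the address to hand to
 * addr2line for a code address observed at run time.
 *   e_type       - type field of the containing object's ELF header
 *   runtime_addr - address of the code in the running process
 *   load_base    - dli_fbase as reported by dladdr for that address
 */
uint64_t addr2line_address(uint16_t e_type, uint64_t runtime_addr,
                           uint64_t load_base)
{
    if (e_type == ET_DYN) {
        /* PIE or shared object: assumed to be linked at base 0, so the
         * DWARF address is the runtime address minus the load base. */
        assert(runtime_addr >= load_base);
        return runtime_addr - load_base;
    }
    /* ET_EXEC: DWARF addresses are already absolute virtual addresses. */
    return runtime_addr;
}
```

With purely illustrative values, a non-PIE executable loaded at 0x400000 with code at 0x401750 would be queried with 0x401750 unchanged, whereas subtracting the base yields 0x1750, an offset of the kind that produced `??:0` in the report.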

@mhalk
Contributor

mhalk commented Jan 30, 2024

Apologies for the radio silence.

I just re-tested everything with the latest versions of AOMP and upstream LLVM, and both behave the same.
That said, passing -fPIC -pie creates a position-independent executable, as expected, for which llvm-addr2line provides the expected output:

-- GCC:
./aomp-addr2line_gcc.out
0x11a9
./aomp-addr2line/aomp-addr2line.c:5
  Type:                              DYN (Position-Independent Executable file)

-- AOMP:
./aomp-addr2line_aomp.out
0x1740
./aomp-addr2line/aomp-addr2line.c:5
  Type:                              DYN (Position-Independent Executable file)

-- trunk:
./aomp-addr2line_trunk.out
0x1770
./aomp-addr2line/aomp-addr2line.c:5
  Type:                              DYN (Position-Independent Executable file)

(Without -fPIC -pie clang will generate: EXEC (Executable file))
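The `Type:` value that `readelf -h` prints comes from the `e_type` field of the ELF header, so the same check can be made programmatically. The following is a rough sketch under stated assumptions: the function name `elf_file_type` is hypothetical, and it assumes a Linux `<elf.h>` and a file whose byte order matches the host.

```c
#include <assert.h>
#include <elf.h>     /* EI_NIDENT, ELFMAG, SELFMAG, ET_* (Linux/glibc header) */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): return the e_type field of the
 * ELF file at `path` (e.g. ET_EXEC or ET_DYN), or -1 if the file cannot be
 * read or does not start with the ELF magic bytes.
 * Assumes the file uses the host's byte order.
 */
int elf_file_type(const char *path)
{
    unsigned char ident[EI_NIDENT];
    uint16_t e_type;

    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* e_type directly follows the 16-byte e_ident array in both the
     * 32-bit and the 64-bit ELF header layouts. */
    int ok = fread(ident, 1, sizeof ident, f) == sizeof ident
          && memcmp(ident, ELFMAG, SELFMAG) == 0
          && fread(&e_type, sizeof e_type, 1, f) == 1;
    fclose(f);
    return ok ? (int)e_type : -1;
}
```

A caller could pass `dl_info.dli_fname` from the original example to decide whether the base address should be subtracted at all before invoking llvm-addr2line.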

Interestingly, with an optimization level greater than 0, AOMP and upstream LLVM produce the following warning and report a different line number (that of the return statement); gcc still reports the line of the function definition:
readelf: Warning: Unrecognized form: 0x22 (repeated twice, I guess once per function)

Since this behavior is not exclusive to AOMP, I guess this should be tackled upstream.

From the history of this ticket, it seems we should expect the given example to compile into a DYN (Position-Independent Executable file) even without -fPIC -pie.
Q: Is this correct?

I will also look into (or at least try to understand) why the optimization level changes the reported line number.

@slinder1
Contributor

The form it is confused by is DW_FORM_loclistx (form code 0x22), which is new in DWARF5. It likely only shows up when optimizing because at O0 the location is generally just a single stack location for the life of the variable, so a non-loclist form works for everything.

The readelf you are using likely doesn't support DWARF5; try explicitly asking clang for -gdwarf-4.

@mhalk
Contributor

mhalk commented Jan 30, 2024

Thanks for all the info, much appreciated! Using -gdwarf-4 resolves the warnings as expected.
Just FYI: even then, we still get the wrong (or at least a different) line, namely that of the return 42; statement.

@mhalk mhalk self-assigned this Feb 1, 2024
@mhalk
Contributor

mhalk commented Feb 1, 2024

tl;dr: My take is that the issue seems resolved. (If there are no objections, I will close the issue in one week.)

So, I did quite a bit of testing with different versions of AOMP and upstream LLVM.
At the end of July '23 there seems to have been a change in AOMP that affected the generation of position-independent executables; it was resolved about a week later.
Since then, AOMP and LLVM behave the same in that regard, at least as far as I can tell.

IMHO this basically invalidates the original reason the ticket was submitted.
(This is why I tend toward closing it.)

However, note the following differences I observed between gcc and clang w.r.t. this ticket:

  • When -fPIC -pie is given, clang generates a DYN (Position-Independent Executable file).
    • Otherwise: EXEC (Executable file)
    • gcc generates DYN even without these flags (version 11.4)
  • Providing an optimization level greater than zero changes the line number reported by llvm-addr2line, which then no longer matches the one reported by the combination of gcc and addr2line.
    • It will be the line of the last statement within the function (instead of the start of the function definition).
    • gcc / addr2line output the line where the function definition starts, regardless of the optimization level.

@mhalk
Contributor

mhalk commented Feb 9, 2024

Closing as discussed.

@mhalk mhalk closed this as completed Feb 9, 2024