Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf segfaults #340

Closed
sscargal opened this issue Apr 15, 2022 · 16 comments · Fixed by #372
Closed

perf segfaults #340

sscargal opened this issue Apr 15, 2022 · 16 comments · Fixed by #372

Comments

@sscargal
Copy link

sscargal commented Apr 15, 2022

On Fedora 35 running Kernel 5.16.19-200.fc35.x86_64, gProfiler's perf segfaults on exeuction.

# setsid ./gprofiler -cu --token '<my_token>' --service-name 'test' &

Yields

[16:34:56] Running gprofiler (version 1.2.19), commandline: '-cu --token lN4td5dwQNA5PyF7D6wYQGstP8Vxh-RYPJIv6ZB0eZA --service-name pmemdev1'
[16:34:56] gProfiler Python version: 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
[16:34:56] gProfiler deployment mode: standalone_executable
[16:34:56] Kernel uname release: 5.16.19-200.fc35.x86_64
[16:34:56] Kernel uname version: #1 SMP PREEMPT Fri Apr 8 15:34:44 UTC 2022
[16:34:56] Total CPUs: 96
[16:34:56] Total RAM: 250.58 GB
[16:34:56] Linux distribution: Fedora Linux | 35 |
[16:34:56] libc version: glibc-2.34
[16:34:56] Hostname: hostname
[16:34:57] The connection to the server was successfully established (service 'pmemdev1')
[16:35:03] Initialized JavaProfiler (frequency: 11hz, duration: 60s)
[16:35:03] Couldn't create the Java profiler, continuing without it
Traceback (most recent call last):
  File "gprofiler/profilers/factory.py", line 41, in get_profilers
  File "gprofiler/profilers/java.py", line 608, in __init__
  File "gprofiler/profilers/java.py", line 624, in _init_ap_mode
  File "gprofiler/utils/perf.py", line 19, in can_i_use_perf_events
  File "gprofiler/utils/__init__.py", line 265, in run_process
gprofiler.exceptions.CalledProcessError: Command '['/tmp/_MEID3BPp6/gprofiler/resources/perf', 'record', '-o', '/dev/null', '--', '/bin/true']' died with <Signals.SIGSEGV: 11>.
stdout: b''
stderr: b''
[16:35:03] Initialized SystemProfiler (frequency: 11hz, duration: 60s)
[16:35:03] Initialized PythonEbpfProfiler (frequency: 11hz, duration: 60s)
[16:35:05] Initialized RbSpyProfiler (frequency: 11hz, duration: 60s)
[16:35:05] Could not find a Docker daemon or CRI-compatible daemon, profiling data will not include the container names. If you do have a containers runtime and it's not supported, please open a new issue here: https://github.com/Granulate/gprofiler/issues/new
[16:35:05] gProfiler initialized and ready to start profiling
[16:35:05] Starting profiling of Python processes with PyPerf
[16:35:06] Starting perf (fp mode)
[16:35:11] perf failed to start. stdout b'' stderr b''
[16:35:11] Unexpected error occurred
Traceback (most recent call last):
  File "gprofiler/main.py", line 771, in main
  File "gprofiler/main.py", line 357, in run_continuous
  File "gprofiler/main.py", line 148, in __enter__
  File "gprofiler/main.py", line 233, in start
  File "gprofiler/profilers/perf.py", line 192, in start
  File "gprofiler/profilers/perf.py", line 72, in start
  File "gprofiler/utils/__init__.py", line 155, in wait_event
TimeoutError

I see the following in dmesg

# dmesg -T | tail
[Fri Apr 15 16:35:02 2022] perf[13307]: segfault at 10 ip 00007fabc1498b34 sp 00007ffc7b92c5b0 error 4 in libssh.so.4.8.7[7fabc1486000+44000]
[Fri Apr 15 16:35:02 2022] Code: 00 00 00 31 c0 5b c3 b8 ff ff ff ff c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 83 ec 08 48 8d 3d 49 c4 04 00 e8 2c fc fe ff <8b> 80 10 00 00 00 48 83 c4 08 c3 90 f3 0f 1e fa 48 85 ff 74 1b 53

Running the command from gProfiler manually also yields a core dump

# /tmp/_MEID3BPp6/gprofiler/resources/perf record -o /dev/null -- /bin/true
Segmentation fault (core dumped)

Running the included perf does not

# /usr/bin/perf record -o /dev/null -- /bin/true
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.000 MB /dev/null ]
@sscargal sscargal changed the title perf segfaul' perf segfaults Apr 15, 2022
@Jongy
Copy link
Contributor

Jongy commented Apr 16, 2022

Hi @sscargal ! Thanks for the report.

Very weird. I can see the crash originates in libssh.so:

[Fri Apr 15 16:35:02 2022] perf[13307]: segfault at 10 ip 00007fabc1498b34 sp 00007ffc7b92c5b0 error 4 in libssh.so.4.8.7[7fabc1486000+44000]

which is weird because our perf build is static - so it must be some DSO that's coming from your system.

I will try testing gProfiler on a matching machine and update here if I can reproduce the same issue. If not, I might require more details from you to be able to fix this.

@Jongy
Copy link
Contributor

Jongy commented Apr 16, 2022

I did not manage to reproduce this issue on the following system:

core@ip-172-31-20-154:~$ uname -a
Linux ip-172-31-20-154 5.15.10-200.fc35.x86_64 #1 SMP Fri Dec 17 14:46:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
core@ip-172-31-20-154:~$ cat /etc/os-release 
NAME="Fedora Linux"
VERSION="35.20220116.20.1 (CoreOS)"
ID=fedora
VERSION_ID=35
...

Would you please be able to:

  1. Provide the source of libssh.so? I suppose it's installed from a standard package in Fedora CoreOS but I'm not acquainted with that system. My system has /usr/lib64/libssh.so.4.8.7 with this buildid: 1e466b1f1c44646e8ef4279b82980df32fa03261.
  2. Provide a log of strace /path/to/perf that ends up in a crash - I'd like to see what perf was trying to do, ending up loading some unrelated DSO (unrelated to my understanding :)

I'll try upgrading the kernel meanwhile, to get an exact environment as yours.

@Jongy
Copy link
Contributor

Jongy commented Apr 16, 2022

On my system, perf doesn't end up even trying to load libssh.so:

core@ip-172-31-20-154:~$ sudo cat /proc/3422/maps
00400000-00bfd000 r-xp 00000000 00:1f 2066                               /tmp/_MEIBB29hV/gprofiler/resources/perf
00dfc000-00fde000 rw-p 007fc000 00:1f 2066                               /tmp/_MEIBB29hV/gprofiler/resources/perf
00fde000-01092000 rw-p 00000000 00:00 0 
01ce5000-01d47000 rw-p 00000000 00:00 0                                  [heap]
7fa65cf40000-7fa65cf41000 ---p 00000000 00:00 0 
7fa65cf41000-7fa65d741000 rw-p 00000000 00:00 0 
7fa65d741000-7fa65d7c2000 rw-s 00000000 00:0e 11972                      anon_inode:[perf_event]
7fa65d7c2000-7fa65d843000 rw-s 00000000 00:0e 11972                      anon_inode:[perf_event]
7fa65d843000-7fa65d864000 rw-p 00000000 00:00 0 
7fa65d864000-7fa65d965000 rw-s 00000000 00:0e 11972                      anon_inode:[perf_event]
7fa65d965000-7fa65da66000 rw-s 00000000 00:0e 11972                      anon_inode:[perf_event]
7fa65da66000-7fa65da67000 r--p 00000000 ca:04 24944                      /usr/lib/locale/C.utf8/LC_COLLATE
7fa65da67000-7fa65da68000 r--p 00000000 ca:04 24949                      /usr/lib/locale/C.utf8/LC_MONETARY
7fa65da68000-7fa65da69000 r--p 00000000 ca:04 472482                     /usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES
7fa65da69000-7fa65da6a000 r--p 00000000 ca:04 24954                      /usr/lib/locale/C.utf8/LC_PAPER
7fa65da6a000-7fa65da6b000 r--p 00000000 ca:04 24953                      /usr/lib/locale/C.utf8/LC_NAME
7fa65da6b000-7fa65da6c000 r--p 00000000 ca:04 24939                      /usr/lib/locale/C.utf8/LC_ADDRESS
7fa65da6c000-7fa65da6d000 r--p 00000000 ca:04 24956                      /usr/lib/locale/C.utf8/LC_TELEPHONE
7fa65da6d000-7fa65da6e000 r--p 00000000 ca:04 24948                      /usr/lib/locale/C.utf8/LC_MEASUREMENT
7fa65da6e000-7fa65da6f000 r--p 00000000 ca:04 24947                      /usr/lib/locale/C.utf8/LC_IDENTIFICATION
7ffd41b3a000-7ffd41b5b000 rw-p 00000000 00:00 0                          [stack]
7ffd41bcf000-7ffd41bd3000 r--p 00000000 00:00 0                          [vvar]
7ffd41bd3000-7ffd41bd5000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

@sscargal
Copy link
Author

sscargal commented Apr 18, 2022

Thanks for the update @Jongy.

To clarify, I'm using Fedora Server 35, not Fedora CoreOS, though it should be close.

# cat /etc/fedora-release
Fedora release 35 (Thirty Five)

# cat /etc/os-release
NAME="Fedora Linux"
VERSION="35 (Server Edition)"
ID=fedora
VERSION_ID=35
VERSION_CODENAME=""
PLATFORM_ID="platform:f35"
PRETTY_NAME="Fedora Linux 35 (Server Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:35"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=35
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=35
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server
# uname -r
5.16.19-200.fc35.x86_64

Here's the requested data (+attachments):

Execute gProfiler under strace

# strace -f -o /downloads/gProfiler.perf.strace /tmp/_MEID3BPp6/gprofiler/resources/perf record -o /dev/null -- /bin/true

Checking dmesg

# dmesg -T | tail
[Mon Apr 18 09:08:21 2022] perf[45663]: segfault at 10 ip 00007f1ed1584b34 sp 00007ffc9808a200 error 4 in libssh.so.4.8.7[7f1ed1572000+44000]
[Mon Apr 18 09:08:21 2022] Code: 00 00 00 31 c0 5b c3 b8 ff ff ff ff c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 83 ec 08 48 8d 3d 49 c4 04 00 e8 2c fc fe ff <8b> 80 10 00 00 00 48 83 c4 08 c3 90 f3 0f 1e fa 48 85 ff 74 1b 53

Here's the core dump

ls -lrt /var/crash/core-
[gProfiler_core_and_strace.tar.gz](https://github.com/Granulate/gprofiler/files/8506773/gProfiler_core_and_strace.tar.gz)
perf*
[...snip...]
-rw-------. 1 root root    4837376 Apr 18 09:08 core-perf-11-0-0-45663-1650294502

# file core-perf-11-0-0-45663-1650294502
core-perf-11-0-0-45663-1650294502: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/tmp/_MEID3BPp6/gprofiler/resources/perf record -o /dev/null -- /bin/true', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/tmp/_MEID3BPp6/gprofiler/resources/perf', platform: 'x86_64'

Find the location of the 'libssh.so.4.8.7' library

# find / -name "libssh.so.4.8.7"
/usr/lib64/libssh.so.4.8.7

Identify the package that installed this library

# dnf whatprovides /usr/lib64/libssh.so.4.8.7
Last metadata expiration check: 0:56:46 ago on Mon 18 Apr 2022 08:25:58 AM MDT.
libssh-0.9.6-1.fc35.x86_64 : A library implementing the SSH protocol
Repo        : @System
Matched from:
Filename    : /usr/lib64/libssh.so.4.8.7

libssh-0.9.6-1.fc35.x86_64 : A library implementing the SSH protocol
Repo        : fedora
Matched from:
Filename    : /usr/lib64/libssh.so.4.8.7

# dnf info libssh
Last metadata expiration check: 0:54:12 ago on Mon 18 Apr 2022 08:25:58 AM MDT.
Installed Packages
Name         : libssh
Version      : 0.9.6
Release      : 1.fc35
Architecture : x86_64
Size         : 501 k
Source       : libssh-0.9.6-1.fc35.src.rpm
Repository   : @System
From repo    : fedora
Summary      : A library implementing the SSH protocol
URL          : http://www.libssh.org
License      : LGPLv2+
Description  : The ssh library was designed to be used by programmers needing a working SSH
             : implementation by the mean of a library. The complete control of the client is
             : made by the programmer. With libssh, you can remotely execute programs, transfer
             : files, use a secure and transparent tunnel for your remote programs. With its
             : Secure FTP implementation, you can play with remote files easily, without
             : third-party programs others than libcrypto (from openssl).

Attachments:
gProfiler_core_and_strace.tar.gz

@sscargal
Copy link
Author

sscargal commented Apr 18, 2022

Here's the backtrace from the perf core file

Core was generated by `/tmp/_MEID3BPp6/gprofiler/resources/perf record -o /dev/null -- /bin/true'.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  0x00007f1ed1584b34 in ssh_get_log_level () at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/log.c:197
#1  0x00007f1ed1584f5d in _ssh_log (verbosity=verbosity@entry=1, function=function@entry=0x7f1ed15bc450 <__func__.0.lto_priv.16> "ssh_crypto_init",
    format=format@entry=0x7f1ed15ba900 "libssh compiled with %s headers, currently running with %s.") at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/log.c:124
#2  0x00007f1ed1587de0 in ssh_crypto_init () at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/libcrypto.c:1087
#3  ssh_crypto_init () at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/libcrypto.c:1079
#4  _ssh_init (constructor=constructor@entry=1) at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/init.c:79
#5  0x00007f1ed1575882 in libssh_constructor () at /usr/src/debug/libssh-0.9.6-1.fc35.x86_64/src/init.c:115
#6  0x00000000008435ba in call_init.part ()
#7  0x00000000008436cb in _dl_init ()
#8  0x00000000008219b5 in dl_open_worker ()
#9  0x000000000081fac4 in _dl_catch_error ()
#10 0x0000000000821319 in _dl_open ()
#11 0x00000000007a4c87 in dlopen_doit ()
#12 0x000000000081fac4 in _dl_catch_error ()
#13 0x00000000007a50b2 in _dlerror_run ()
#14 0x00000000007a4cde in __dlopen ()
#15 0x000000000040394d in __libdwfl_debuginfod_init () at debuginfod-client.c:107
#16 0x000000000076b707 in __libc_csu_init ()
#17 0x000000000076abbe in generic_start_main ()
#18 0x000000000076b205 in __libc_start_main ()
#19 0x0000000000404379 in _start ()

@sscargal
Copy link
Author

Here's the strace of the /usr/bin/perf record -o /dev/null -- /bin/true for comparison
usr-bin-perf.strace.gz

@sscargal
Copy link
Author

I filed a new libssh bug as it seems to be in their code - https://gitlab.com/libssh/libssh-mirror/-/issues/123. I'm still trying to understand why gProfiler's included perf ends up in this code path vs the OS included perf.

@Jongy
Copy link
Contributor

Jongy commented Apr 19, 2022

Thanks for all the details @sscargal . It will take me a few days to find the time to look into them, but I'm definitely curious as to why our statically linked perf ends up loading "random" DSOs from the target system. It should not happen.

If you reach any new conclusions, please post here. I will post my findings as well.

@Jongy
Copy link
Contributor

Jongy commented Apr 27, 2022

Hi @sscargal I finally had the time to look into it. Per the crashing stacktrace, the offender is a c'tor __libdwfl_debuginfod_init which attempts to dlopen("libdebuginfod.so") and it ends up loading the faulty libssh.so.

We don't really need debuginfod, I pushed a branch https://github.com/Granulate/gprofiler/tree/perf-no-debuginfod which should disable its incorporation into our perf build, would you mind trying it out and telling me if it solved your problem?

For convenience I also built a binary from that branch, if you just wanna try it out without building yourself.

@sscargal
Copy link
Author

Sorry for the long delay.

I downloaded the binary but still encounter the same issue

Remove the old temp directories

# cd /tmp
# rm -rf gprofiler_tmp _MEI6aULwB

Here are the checksums for the new gprofiler binary I downloaded

# cksum gprofiler
1619462177 81512440 gprofiler
# md5sum gprofiler
910c1c7f66fab2288d41cb7a1d99729b  gprofiler
# sum gprofiler
35968 79602 gprofiler

Executing gprofiler through the CLI

# setsid ./gprofiler -cu --token='lN4td...' --service-name='testing'
[14:00:29] Running gProfiler (version 1.2.21), commandline: '-cu --token=lN4td5dwQNA5PyF7D6wYQGstP8Vxh-RYPJIv6ZB0eZA --service-name=testing-pmemdev'
[14:00:29] gProfiler arguments: Namespace(app_id_args_filters=[], application_metadata=True, collect_metadata=True, collect_metrics=True, config=None, container_names=True, continuous=True, controller_pid=None, duration=60, flamegraph=True, frequency=11, identify_applications=True, java_async_profiler_args=None, java_async_profiler_buildids=False, java_async_profiler_mode='auto', java_async_profiler_safemode=64, java_jattach_timeout=30, java_mode='ap', java_safemode='profiled-oom,profiled-signaled,hserr', java_version_check=True, log_file='/var/log/gprofiler/gprofiler.log', log_rotate_backup_count=1, log_rotate_max_size=5242880, log_to_server=True, log_usage=False, nodejs_mode='disabled', output_dir=None, perf_dwarf_stack_size=8192, perf_inject=False, perf_mode='fp', php_mode='disabled', php_process_filter='php-fpm', pid_ns_check=True, profile_api_version=None, python_add_versions=True, python_mode='auto', python_pyperf_user_stacks_pages=None, rotating_output=False, ruby_mode='rbspy', server_host='https://profiler.granulate.io', server_token='lN4td5dwQNA5PyF7D6wYQGstP8Vxh-RYPJIv6ZB0eZA', server_upload_timeout=120, service_name='testing-pmemdev', upload_results=True, verbose=False)
[14:00:29] gProfiler Python version: 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
[14:00:29] gProfiler deployment mode: standalone_executable
[14:00:29] Kernel uname release: 5.18.0-60.vanilla.1.fc36.x86_64
[14:00:29] Kernel uname version: #1 SMP PREEMPT_DYNAMIC Mon May 23 14:06:04 UTC 2022
[14:00:29] Total CPUs: 96
[14:00:29] Total RAM: 250.59 GB
[14:00:29] Linux distribution: Fedora Linux | 36 |
[14:00:29] libc version: glibc-2.35
[14:00:29] Hostname: hostname
[14:00:29] The connection to the server was successfully established (service 'testing')
[14:00:35] Initialized JavaProfiler (frequency: 11hz, duration: 60s)
[14:00:35] Couldn't create the Java profiler, continuing without it
Traceback (most recent call last):
  File "gprofiler/profilers/factory.py", line 41, in get_profilers
  File "gprofiler/profilers/java.py", line 637, in __init__
  File "gprofiler/profilers/java.py", line 654, in _init_ap_mode
  File "gprofiler/utils/perf.py", line 19, in can_i_use_perf_events
  File "gprofiler/utils/__init__.py", line 265, in run_process
gprofiler.exceptions.CalledProcessError: Command '['/tmp/_MEI546fJ0/gprofiler/resources/perf', 'record', '-o', '/dev/null', '--', '/bin/true']' died with <Signals.SIGSEGV: 11>.
stdout: b''
stderr: b''
[14:00:35] Initialized SystemProfiler (frequency: 11hz, duration: 60s)
[14:00:35] Initialized PythonEbpfProfiler (frequency: 11hz, duration: 60s)
[14:00:37] Initialized RbSpyProfiler (frequency: 11hz, duration: 60s)
[14:00:37] Could not find a Docker daemon or CRI-compatible daemon, profiling data will not include the container names. If you do have a containers runtime and it's not supported, please open a new issue here: https://github.com/Granulate/gprofiler/issues/new
[14:00:37] gProfiler initialized and ready to start profiling
[14:00:37] Starting profiling of Python processes with PyPerf
[14:00:38] Starting perf (fp mode)
[14:00:43] perf failed to start. stdout b'' stderr b''
[14:00:43] Unexpected error occurred
Traceback (most recent call last):
  File "gprofiler/main.py", line 781, in main
  File "gprofiler/main.py", line 357, in run_continuous
  File "gprofiler/main.py", line 148, in __enter__
  File "gprofiler/main.py", line 233, in start
  File "gprofiler/profilers/perf.py", line 192, in start
  File "gprofiler/profilers/perf.py", line 72, in start
  File "gprofiler/utils/__init__.py", line 155, in wait_event
TimeoutError
#

Running the included perf manually triggers the core dump

# /tmp/_MEIQ4xa4P/gprofiler/resources/perf record /bin/true
Segmentation fault (core dumped)

Using gdb

# gdb /tmp/_MEIQ4xa4P/gprofiler/resources/perf /var/crash/core-perf-11-0-0-5510-1653595764
[...]
Core was generated by `/tmp/_MEIQ4xa4P/gprofiler/resources/perf record /bin/true'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f54860e9e64 in ssh_get_log_level () at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/log.c:197
197       return ssh_log_level;

(gdb) bt
#0  0x00007f54860e9e64 in ssh_get_log_level () at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/log.c:197
#1  0x00007f54860ea28d in _ssh_log (verbosity=verbosity@entry=1, function=function@entry=0x7f5486121130 <__func__.0.lto_priv.16> "ssh_crypto_init",
    format=format@entry=0x7f548611f8f0 "libssh compiled with %s headers, currently running with %s.") at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/log.c:124
#2  0x00007f54860ed1b8 in ssh_crypto_init () at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/libcrypto.c:1087
#3  ssh_crypto_init () at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/libcrypto.c:1079
#4  _ssh_init (constructor=constructor@entry=1) at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/init.c:79
#5  0x00007f54860da7b2 in libssh_constructor () at /usr/src/debug/libssh-0.9.6-4.fc36.x86_64/src/init.c:115
#6  0x00000000008435ba in call_init.part ()
#7  0x00000000008436cb in _dl_init ()
#8  0x00000000008219b5 in dl_open_worker ()
#9  0x000000000081fac4 in _dl_catch_error ()
#10 0x0000000000821319 in _dl_open ()
#11 0x00000000007a4c87 in dlopen_doit ()
#12 0x000000000081fac4 in _dl_catch_error ()
#13 0x00000000007a50b2 in _dlerror_run ()
#14 0x00000000007a4cde in __dlopen ()
#15 0x000000000040394d in __libdwfl_debuginfod_init () at debuginfod-client.c:107
#16 0x000000000076b707 in __libc_csu_init ()
#17 0x000000000076abbe in generic_start_main ()
#18 0x000000000076b205 in __libc_start_main ()
#19 0x0000000000404379 in _start ()

@Jongy
Copy link
Contributor

Jongy commented May 29, 2022

Hmm, same issue. Probably my build without debuginfod did not succeed.
I will look into it this week, and this time I will make sure that __libdwfl_debuginfod_init indeed doesn't run by placing a probe on it. Once I manage to "disable" that function, I will let you know to try again.

@Jongy
Copy link
Contributor

Jongy commented Jun 4, 2022

@sscargal , letting you know that I found the problem. It is libdw which forced the link with debuginfod-client.c. I found the fixing commit for elfutils - https://sourceware.org/git/?p=elfutils.git;a=commit;h=4ced3010ad11 & upgraded our elfutils to make use of this new "feature" but currently gProfiler's build is failing for other reasons...

I'll update it once I get it built & tested (I test by ensuring that __libdwfl_debuginfod_init is not present in the final binary). I am convinced this fix should be in gProfiler's mainline by now, so you can just wait for the next release.

@Jongy
Copy link
Contributor

Jongy commented Jun 5, 2022

I managed to build a version that doesn't use libdebuginfod, so the problematic __libdwfl_debuginfod_init doesn't exist. You can see the relevant changes in the PR, and download the build artifacts to try it out.

It will be merged and included in the next release.

@Jongy Jongy closed this as completed in #372 Jun 6, 2022
@Jongy
Copy link
Contributor

Jongy commented Jun 12, 2022

@sscargal gProfiler 1.2.25 was released and it includes the fix :) you can try it out.

@sscargal
Copy link
Author

Version 1.2.25 resolves the issue, many thanks!

# setsid ./gprofiler -cu --token='lN4td5dwQNA5PyF7D6wYQGstP8Vxh-RYPJIv6ZB0eZA' --service-name='test'
[08:37:07] Running gProfiler (version 1.2.25), commandline: '-cu --token=... --service-name=test'
[08:37:07] gProfiler arguments: Namespace(app_id_args_filters=[], application_metadata=True, collect_metadata=True, collect_metrics=True, config=None, container_names=True, continuous=True, controller_pid=None, curlify_requests=False, duration=60, flamegraph=True, frequency=11, identify_applications=True, java_async_profiler_args=None, java_async_profiler_buildids=False, java_async_profiler_mcache=30, java_async_profiler_mode='auto', java_async_profiler_safemode=64, java_jattach_timeout=30, java_mode='ap', java_safemode='profiled-oom,profiled-signaled,hserr', java_version_check=True, log_file='/var/log/gprofiler/gprofiler.log', log_rotate_backup_count=1, log_rotate_max_size=5242880, log_to_server=True, log_usage=False, nodejs_mode='disabled', output_dir=None, perf_dwarf_stack_size=8192, perf_inject=False, perf_mode='fp', php_mode='disabled', php_process_filter='php-fpm', pid_ns_check=True, profile_api_version=None, python_add_versions=True, python_mode='auto', python_pyperf_user_stacks_pages=None, rotating_output=False, ruby_mode='rbspy', server_host='https://profiler.granulate.io', server_token='...', server_upload_timeout=120, service_name='test', upload_results=True, verbose=False)
[08:37:07] gProfiler Python version: 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
[08:37:07] gProfiler deployment mode: standalone_executable
[08:37:07] Kernel uname release: 5.18.0-60.vanilla.1.fc36.x86_64
[08:37:07] Kernel uname version: #1 SMP PREEMPT_DYNAMIC Mon May 23 14:06:04 UTC 2022
[08:37:07] Total CPUs: 96
[08:37:07] Total RAM: 3222.59 GB
[08:37:07] Linux distribution: Fedora Linux | 36 |
[08:37:07] libc version: glibc-2.35
[08:37:07] Hostname: hostname
[08:37:07] The connection to the server was successfully established (service 'test')
[08:37:14] Initialized JavaProfiler (frequency: 11hz, duration: 60s)
[08:37:15] Initialized SystemProfiler (frequency: 11hz, duration: 60s)
[08:37:15] Initialized PythonEbpfProfiler (frequency: 11hz, duration: 60s)
[08:37:17] Initialized RbSpyProfiler (frequency: 11hz, duration: 60s)
[08:37:17] Discovered container runtimes: ['docker']
[08:37:17] gProfiler initialized and ready to start profiling
[08:37:17] Starting profiling of Python processes with PyPerf
[08:37:18] Starting perf (fp mode)
[08:37:18] Started perf (fp mode)
[08:38:21] Successfully uploaded profiling data to the server
[08:39:23] Successfully uploaded profiling data to the server
[08:40:26] Successfully uploaded profiling data to the server
[08:41:28] Successfully uploaded profiling data to the server
[...snip...]

@Jongy
Copy link
Contributor

Jongy commented Jun 13, 2022

Excellent, happy to hear that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants