Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Version 1.3.0 test failure #2831

Closed
remicollet opened this issue Sep 3, 2024 · 28 comments
Closed

[Bug]: Version 1.3.0 test failure #2831

remicollet opened this issue Sep 3, 2024 · 28 comments
Labels
🐛 bug Something isn't working

Comments

@remicollet
Copy link
Contributor

remicollet commented Sep 3, 2024

Bug report

I'm used to NOT run test suite anymore during package build (too much time and issues)

BTW I run test suite to check current situation

  • PHP 7.4: 3 tests failed
  • PHP 8.0: 10 tests failed
  • PHP 8.1: 10 tests failed
  • PHP 8.2: 15 tests failed
  • PHP 8.3: 17 tests failed

Full build log attached (--show-diff used, using RHEL 9.4)

build-php74.txt
build-php80.txt
build-php81.txt
build-php82.txt
build-php83.txt

PHP version

7.4.33 to 8.3.11

Tracer or profiler version

1.3.0

@remicollet remicollet added the 🐛 bug Something isn't working label Sep 3, 2024
@bwoebi
Copy link
Collaborator

bwoebi commented Sep 3, 2024

  • tests/ext/startup_logging_json.phpt fails because you set export DD_INSTRUMENTATION_TELEMETRY_ENABLED=0. Same for tests/ext/crashtracker_segfault.phpt. Also DD_APPSEC_SCA_ENABLED related tests. They all expect telemetry being available.
    • If I recall correctly, you reported issues with that one some time in the past. We did some fixes recently related to that, maybe it works now without this disabled?
  • tests/ext/pcntl/pcntl_fork_long_running_autoflush.phpt fails because your test runner does not set -d datadog.trace.sources_path= (that value gets set during installation - I suppose we can just force-set it in a couple tests).
  • You have a couple segfaults related to generators, which I cannot reproduce (trying yum install php with pecl install datadog_trace on the registry.access.redhat.com/ubi9/ubi:9.4-1181.1724035907 container (which is the best thing I found to spin up RHEL 9.4 right now). It would be interesting to fix these. Is it possible for you to report some more details on that?

Thanks for reporting us these test results. In particular the segfaults are interesting (though I fail to reproduce it).

@remicollet
Copy link
Contributor Author

Without export DD_INSTRUMENTATION_TELEMETRY_ENABLED=0 things are worst

Tests failed : 82 ( 16.1%) ( 17.6%)

build-php83-el9.txt

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 4, 2024

It will cause some tests to fail, because of the log output is tested, but could you please run it once with RUST_BACKTRACE=1 DD_TRACE_LOG_LEVEL=trace DD_TRACE_LOG_FILE=/tmp/logfile.log and send me that logfile at the end? The probably cause for all these broken pipe issues are likely some rust panics in the other process - which would be visible there.

@remicollet
Copy link
Contributor Author

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 4, 2024

No [sidecar] logs present at all :-(
Is it possible to provide a strace -f? That should give a definitive hint.

EDIT: I'm blind, I found /builddir/build/BUILD/php83-php-pecl-datadog-trace-1.3.0/datadog_trace-1.3.0/../NTS/modules/ddtrace.so: undefined symbol: php_base64_encode.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 4, 2024

Does your generated (generated during make) components-rs/php_sidecar_mockgen/mock_php_syms.c contain php_base64_encode?
This file contains all symbols from the php binary which ddtrace depends on.

What I'm doing is (setting your LDFLAGS or CFLAGS does not make a difference):

docker run --rm -ti registry.access.redhat.com/ubi9/ubi:9.4-1181.1724035907 bash
dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
dnf install https://rpms.remirepo.net/enterprise/remi-release-9.rpm
dnf module install php:remi-8.3
yum install cargo php-devel curl-devel php-pear
pecl download datadog_trace
tar xzf datadog_trace-1.3.0.tgz
cd datadog_trace-1.3.0
phpize
./configure
make install

This should hopefully be the exact same php binary than the one you are building against?
I'm wondering, what exactly the difference is between what I'm doing and what you're doing.

Then:

[root@a4664eb14032 datadog_trace-1.3.0]# DD_TRACE_CLI_ENABLED=1 php -d extension=ddtrace -r ''
[root@a4664eb14032 datadog_trace-1.3.0]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4620  3200 pts/0    Ss   13:31   0:00 bash
root      9393  0.0  0.0 968596 11780 ?        Ssl  14:10   0:00 datadog-ipc-helper  /usr/lib64/php/modules/ddtrace.so /proc/self/fd/4 ddog_daemon_entry_point
root      9410  0.0  0.0   7416  2816 pts/0    R+   14:10   0:00 ps aux

I see the datadog-ipc-helper process properly running.

@remicollet
Copy link
Contributor Author

remicollet commented Sep 5, 2024

$ DD_TRACE_CLI_ENABLED=1 php -d extension=ddtrace -r ''
[ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Relais brisé (pipe)" }
[ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Relais brisé (pipe)" }
[ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Relais brisé (pipe)" }
$

But

$ DD_TRACE_CLI_ENABLED=1 php -n -d extension=ddtrace -r ''
$

P.S. looks like a conflict with swoole extension, when removed, it's ok (but is never present during build in a clean chroot, with minimal extension set)

@remicollet
Copy link
Contributor Author

Does your generated (generated during make) components-rs/php_sidecar_mockgen/mock_php_syms.c contain php_base64_encode?

No.

@remicollet
Copy link
Contributor Author

remicollet commented Sep 5, 2024

Can reproduce using (using ddtrace from package)

$ podman run --rm -ti registry.access.redhat.com/ubi9/ubi:9.4-1181.1724035907 bash
dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
dnf install https://rpms.remirepo.net/enterprise/remi-release-9.rpm
dnf module install php:remi-8.3
dnf install cargo php-devel  php-ddtrace php-pear
pecl download datadog_trace
tar xzf datadog_trace-1.3.0.tgz
cd datadog_trace-1.3.0
phpize

# run test using packaged extension
export DD_TRACE_CLI_ENABLED=1
export DD_INSTRUMENTATION_TELEMETRY_ENABLED=1
export TEST_PHP_ARGS="-n  -d extension=curl  -d extension=posix -d extension=ddtrace"
php -n run-tests.php -q --show-diff -j15
=> tons of failure

# build extension and run test with it
./configure
make -j15
export TEST_PHP_ARGS="-n  -d extension=curl  -d extension=posix -d extension=$PWD/modules/ddtrace.so"
php -n run-tests.php -q --show-diff -j15
=> only 2 failed tests

Notice: package extension is stripped

-rwxr-xr-x. 1 root root 81523024 Sep  5 08:33 /datadog_trace-1.3.0/modules/ddtrace.so
-rwxr-xr-x. 1 root root 10638400 Aug 30 14:44 /usr/lib64/php/modules/ddtrace.so

@remicollet
Copy link
Contributor Author

OK, this is related to build flags (optimization and security)

Adding, before configure/make, allow to reproduce the failures

dnf install redhat-rpm-config
CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
export CFLAGS
XXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
export CXXFLAGS
FFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/opt/remi/php83/root/usr/lib64/gfortran/modules'
export FFLAGS
FCFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/opt/remi/php83/root/usr/lib64/gfortran/modules'
export FCFLAGS
LDFLAGS='-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 '
export LDFLAGS
./configure

And this is not LTO related (I have tried without)

@remicollet
Copy link
Contributor Author

I need to play a bit more

  • with all default RPM build options; tons of failure
  • without "hardened" options, only 11 failed tests (with segfaults)
  • without all standard build options, I'm able to reduce to 3 failed tests,

But this ext takes so much time to build.... will try if I can find some spare time.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 5, 2024

Thanks Remi, I can actually reproduce it this time!

Let me try looking into it, whether I can figure anything obvious.

bwoebi added a commit that referenced this issue Sep 5, 2024
LTO seems to recognize some variable was never set and eliminated a branch we relied on to exist. Fool the compiler a bit better.

And add minor changes to the tests to allow running them more generally.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
bwoebi added a commit that referenced this issue Sep 5, 2024
LTO seems to recognize some variable was never set and eliminated a branch we relied on to exist. Fool the compiler a bit better.

And add minor changes to the tests to allow running them more generally.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
bwoebi added a commit that referenced this issue Sep 5, 2024
LTO seems to recognize some variable was never set and eliminated a branch we relied on to exist. Fool the compiler a bit better.

And add minor changes to the tests to allow running them more generally.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
@bwoebi
Copy link
Collaborator

bwoebi commented Sep 5, 2024

Looks like php_base64_encode is stored as ifunc, which the symbol resolver didn't like.
The generator failures seem LTO related, I fixed these too. And changed the tests minimally so that they'll hopefully all pass.

Thanks again for helping me actually reproduce the issues!

If you want to test yourself, a pecl package based on top of the current branch: https://output.circle-artifacts.com/output/job/c649228b-31a1-46e0-ba47-53e7c56953bf/artifacts/0/pecl/datadog_trace-1.4.0.tgz

At least for me, with your reproducer, this is now a fully passing test suite.

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

Great, this seems far better

Will all standard build options, 1 single erratic failure failure (sometime PASS sometime FAIL)

========DIFF========
--
     process
     [ddtrace] [info] Flushing trace of size 3 to send-queue for %s
     kill
005- %r\n*(Killed\n*)?(Termsig=9)?%r
========DONE========
FAIL Force flush the traces mid-way through a trace [tests/ext/force_flush_traces.phpt] 

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

Also erratic failure with (PHP 8.2 on Fedora 39)

========DIFF========
--
     17952737041105130042 - parent
     2469588189546311528 - child
     Done.
005+ 8323445853463659930 - parent
005- 2516265689700432462 - parent
     16668552215174154828 - child
     Done.
008+ 14458935525009338917 - parent
008- 15684088468973760345 - parent
     10307413207671831467 - child
     Done.
011+ 10888029678232491475 - parent
011- 3611203882987592167 - parent
     14490808261858112199 - child
     Done.
014+ 10961989281185213082 - parent
014- 8371681150192204748 - parent
     12415856028556828342 - child
     Done.
     Done.
========DONE========
FAIL Short running multiple forks [tests/ext/pcntl/pcntl_fork_reseed_span_id.phpt] 

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

I have tested with 7.4 to 8.3, test suite passes most of the time.

Do you have any ETA for a release with these fixes ?

(as I think my RPMs are probably broken)

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

Regarding the first test failure, that regex is supposed to catch a possibly empty line. Not sure what would help here. Can you dump me your tests/ext/force_flush_traces.out file (possibly with xxd) - no idea what exactly doesn't match here (maybe a \r\n, insteas of \n or such?).

The second erratic failure I've never seen and no idea why it would fail. Having a backtrace whenever genrand64_int64() in ext/vendor/mt19937/mt19937-64.c gets called would probably help. I suppose some condition leads to it being called more often than it should...?

I'll come back to you with an ETA for the fixes. (Maybe I can do a quick 1.3.1.)

@remicollet
Copy link
Contributor Author

Maybe I can do a quick 1.3.1

This will be great

@remicollet
Copy link
Contributor Author

Can you dump me your tests/ext/force_flush_traces.out

00000010  70 72 6f 63 65 73 73 0a  5b 64 64 74 72 61 63 65  |process.[ddtrace|
00000020  5d 20 5b 69 6e 66 6f 5d  20 46 6c 75 73 68 69 6e  |] [info] Flushin|
00000030  67 20 74 72 61 63 65 20  6f 66 20 73 69 7a 65 20  |g trace of size |
00000040  33 20 74 6f 20 73 65 6e  64 2d 71 75 65 75 65 20  |3 to send-queue |
00000050  66 6f 72 20 68 74 74 70  3a 2f 2f 6c 6f 63 61 6c  |for http://local|
00000060  68 6f 73 74 3a 38 31 32  36 0a 6b 69 6c 6c        |host:8126.kill|
0000006e

force_flush_traces.out.txt

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

The second erratic failure I've never seen and no idea why it would fail.

BTW, I only see it once among lot of runs, probably safe to ignore, will tell you if it comes back

P.S. sometimes running various builds for different PHP versions simultaneously raise failures, perhaps I launch a new one too fast become another fails.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

Interesting, the trailing \n in echo "kill\n"; apparently gets dropped from the out file. I suppose putting the %r%r on the line before will help.

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

Interesting, the trailing \n in echo "kill\n"; apparently gets dropped from the out file. I suppose putting the %r%r on the line before will help.

I confirm, fixed using

diff -up tests/force_flush_traces.phpt.old tests/force_flush_traces.phpt
--- tests/force_flush_traces.phpt.old	2024-09-06 14:12:42.138278530 +0200
+++ tests/force_flush_traces.phpt	2024-09-06 14:12:46.676457649 +0200
@@ -43,5 +43,4 @@ var_dump(dd_trace_serialize_closed_spans
 tracing process
 process
 [ddtrace] [info] Flushing trace of size 3 to send-queue for %s
-kill
-%r\n*(Killed\n*)?(Termsig=9)?%r
+kill%r\n*(Killed\n*)?(Termsig=9)?%r

P.S run ~100 times without failure (else fails 50%)

bwoebi added a commit that referenced this issue Sep 6, 2024
LTO seems to recognize some variable was never set and eliminated a branch we relied on to exist. Fool the compiler a bit better.

And add minor changes to the tests to allow running them more generally.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
@bwoebi bwoebi closed this as completed in ff59f29 Sep 6, 2024
@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

We'll be releasing 1.3.1 soon.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

@remicollet 1.3.1 has been released - thanks Remi.

@remicollet
Copy link
Contributor Author

@remicollet 1.3.1 has been released - thanks Remi.

Sadly lot of failure again, probably some fix missing from 1.4.0... :(

Will check next week.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

@remicollet Yes, I messed up. Sorry, I'll have to retag that.

@bwoebi
Copy link
Collaborator

bwoebi commented Sep 6, 2024

@remicollet I now uploaded the correct 1.3.1. Sorry about that :-(

@remicollet
Copy link
Contributor Author

remicollet commented Sep 6, 2024

@remicollet I now uploaded the correct 1.3.1. Sorry about that :-(

Looks good, thanks!

P.S. failure on tests/ext/pcntl/pcntl_fork_reseed_span_id.phpt, very probably related to ton of extensions installed in my developer env. Not urgent, will try to find the culprit next week.
P.S.2: full test suite passes in clean env.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants