Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otel_http_client not being built with HAVE_CPP_STDLIB when using cmake -DWITH_STL=ON causing core dump when calling GlobalLogHandler::GetLogHandler() #1362

Closed
dgtrapeze opened this issue Apr 29, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@dgtrapeze
Copy link

Describe your environment Describe any aspect of your environment relevant to the problem, including your platform, build system, version numbers of installed dependencies, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on main branch.

opentelemetry-cpp = v1.3.0
build system = centos 7.9 (although our rhel 8 build had the same issue)
Using cmake 3.17.5 using makefiles for build with gcc 11 as compiler
Using nlohmann::json supplied by third_party area

We have been using the build line as below for our normal builds:

cmake3 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 -DWITH_ABSEIL=OFF -DWITH_ETW=OFF -DWITH_STL=ON -DWITH_LOGS_PREVIEW=OFF -DWITH_OTLP=ON -DWITH_OTLP_HTTP=ON -DWITH_OTLP_GRPC=OFF -DWITH_ELASTICSEARCH=OFF -DBUILD_TESTING=OFF -DWITH_EXAMPLES=OFF -DCMAKE_INSTALL_PREFIX=$STAGING ..

Although to investigate this issue I was using the Debug build

Steps to reproduce
Describe exactly how to reproduce the error. Include a code sample if applicable.

The critical elements of this issue are:

  • Build otel libraries with WITH_STL enabled
  • Use the OtlpHttpExporter
  • The collector for this exporter is shutdown

This causes the following line in otlp_client_http.cc to be executed

      case http_client::SessionState::ConnectFailed:
        OTEL_INTERNAL_LOG_ERROR("[OTLP HTTP Client] Session state: connection failed");

While not critical to the issue, I was using the BatchSpanProcessor as well so the send calls are being done in a separate thread.

I did not get crashes with using other exporters, only the OTLP HTTP exporter caused a crash.

What is the expected behavior?
What did you expect to see?

The error message listed in the code ("[OTLP HTTP Client] Session state: connection failed") to appear on stderr. Technically I don't want this error message to appear as I don't want our users to see this message, so I typically do SetLogHandler(new NoopLogHandler) in the code, but for the sake of this issue, I used the DefaultLogHandler. The core dump occurs with either log handler set.

What is the actual behavior?
What did you see instead?

Program crashes trying to use the log_handler due to a mismatch between the definitions of nostd::shared_ptr in otlp_http_client and the rest of the code.

#0 0x0000000001849c50 in ?? ()
#1 0x0000000000a329b1 in opentelemetry::v1::exporter::otlp::(anonymous namespace)::ResponseHandler::OnEvent(opentelemetry::v1::ext::http::client::SessionState, opentelemetry::v1::nostd::string_view) ()
#2 0x0000000000a3e5b4 in opentelemetry::v1::ext::http::client::curl::HttpOperation::Send() ()
#3 0x0000000000a42f65 in std::Function_handler<std::unique_ptr<std::future_base::_Result_base, std::future_base::_Result_base::_Deleter> (), std::future_base::_Task_setter<std::unique_ptr<std::future_base::_Result<long>, std::_future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<opentelemetry::v1::ext::http::client::curl::HttpOperation::SendAsync(std::function<void (opentelemetry::v1::ext::http::client::curl::HttpOperation&)>)::{lambda()#1}> >, long> >::_M_invoke(std::_Any_data const&) ()
#4 0x0000000000a41c6b in std::_future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::future_base::_Result_base, std::_future_base::_Result_base::_Deleter> ()>, bool) ()
#5 0x00007ffff795b20b in __pthread_once_slow () from /lib64/libpthread.so.0
#6 0x0000000000a435e3 in std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<opentelemetry::v1::ext::http::client::curl::HttpOperation::SendAsync(std::function<void (opentelemetry::v1::ext::http::client::curl::HttpOperation&)>)::{lambda()#1}> >, long>::_M_run() ()
#7 0x0000000000b331b4 in execute_native_thread_routine ()
#8 0x00007ffff795cea5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff5a80b0d in clone () from /lib64/libc.so.6

Additional context
Add any other context about the problem here.

I noticed the issue during debugging of crashes in code where the telemetry was added recently. The code 'jumped' into the 'local' opentelemetry definition of nostd::shared_ptr at the OTEL_INTERNAL_LOG_ERROR line when I knew it should not as I'd built it in C++17 mode.

On checking the make build I noticed that every other source file had been built with the -DHAVE_CPP_STDLIB define EXCEPT otlp_http_client.cc. It was built with only -DJSON_USE_IMPLICIT_CONVERSIONS=1

On checking what cmake had generated, the makefile it created has

opentelemetry-cpp/build/exporters/otlp/CMakeFiles/opentelemetry_exporter_otlp_http_client.dir/flags.make:CXX_DEFINES = -DJSON_USE_IMPLICIT_CONVERSIONS=1
where as all the other source files in the otlp exporter area had

opentelemetry-cpp/build/exporters/otlp/CMakeFiles/opentelemetry_exporter_otlp_http.dir/flags.make:CXX_DEFINES = -DENABLE_LOGS_PREVIEW -DHAVE_CPP_STDLIB -DJSON_USE_IMPLICIT_CONVERSIONS=1
I don't know cmake to know what has gone wrong, or to know if it is related to my setup, way I'm building otel or versions of cmake being used.

I compensated for the issue by manually adding

#define HAVE_CPP_STDLIB

to the top of exporters/otlp/src/otlp_http_client.cc

and my crashes all stopped.

Ideally whatever has gone wrong in cmake that is preventing the extra defines being added to the build can be fixed.

@dgtrapeze dgtrapeze added the bug Something isn't working label Apr 29, 2022
@owent
Copy link
Member

owent commented Apr 29, 2022

Thanks for your feedback, this problem is already solved by #1336 . We will publish it on next release.
You can link opentelemetry-cpp::api into your library or executable in cmake script to solve this problem for temporary.

@lalitb
Copy link
Member

lalitb commented May 2, 2022

@owent - Do you know which commit in v1.3.0 caused this issue, and wondering if we need to release a patch early to fix the issue?

@owent
Copy link
Member

owent commented May 3, 2022

@owent - Do you know which commit in v1.3.0 caused this issue, and wondering if we need to release a patch early to fix the issue?

I didn't test it, but I think the problem is here https://github.com/open-telemetry/opentelemetry-cpp/pull/1314/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aL143
I think it's good idea to release a patch.

@lalitb
Copy link
Member

lalitb commented May 7, 2022

Closing as the problem is already fixed in the latest commit, and would be available in the next release.

@lalitb lalitb closed this as completed May 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants