-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
segfault/problems with error categories/std::error_codes when using dynamic.ot tracing. #5481
Comments
I think this can be fixed by changing this code to to always return the dynamic_load_failure error code (so that there's no reference to an error_category from the loaded library). And adding the error category symbols to the global section of the export map so that they resolve to the versions of the opentracing library linked in by envoy, which would make them comparable. |
The fix you suggest sounds good to me, if the error_message output parameter is set to something that preserves the information that would be contained in the error_message from the plugin and the error code + category (calling error_category::message should be safe before onloading the shared object). Regarding the export map, I thought so too but in this repro, I'm dynamically linking libopentracing.so. I do not completely grok ELF/Linux dynamic symbol resolution, but I thought that this would be sufficient for any symbols in the executable to override symbols from libopentracing. man 3 dlopen has this to say:
So maybe envoy is not liked with -rdynamic? |
It supports a build with exported symbols (See #2251). But not sure what the default is. |
Another (though less elegant and efficient) option that would working without |
I think dynamic cast would also not work without the types being exported. So far I'v worked around this by comparing the error code value (integer) and the category name (string), which seems to work well enough. |
Right, those classes would need to be available. But since Could you review the PR (opentracing/opentracing-cpp#102) to verify it will fix the segfaulting? |
Thanks, @rnburn that fixes the crash! 👍 I rebuilt envoy with the opentracing from your PR, and now get the following output, as expected:
One should just consider if error_code.message() should be prepended to error_message or if one should be able to configure opentracing-cpp to return the actual error code (in case libopentracing is linked dynamically). |
Maybe interesting (if only in the long-run) in this regard is C++ std proposal P1196R0 "Value-based std::error_category comparison" (via reddit). |
Description:
Envoy crashes when the error_category parameter of the
OpenTracingMakeTracerFactory
implementation is assigned from a dynamic.ot plugin. The cause of this is the combination of:Point 1 alone causes a lot of problems, for example even if 2 is fixed, error categories and thus std::error_codes from plugins will always compare unequal to the ones from envoy (because libstdc++ basically compares the address of error category objects and the copy in the plugin will have a different one than the statically linked copy in the envoy binary) which e.g. affects the
LookupKey
function in opentracing-cpp's mocktracer (mocktracer/src/propagation.cpp
) sinceresult.error() != opentracing::lookup_key_not_supported_error
is always true since the lhs comes from envoy and the rhs from the opentracing shared library.In combination with point 2, this means that envoy tries to dereference a pointer to memory that belongs to an unloaded shared library (whether it is in the so image itself or was deallocated in unload callbacks/static destructors).
Repro steps:
For example, start with the
/examples/jaeger-native-tracing
sample (I'm using commit 7d2e84d). Replace the jaeger tracing plugin in the front-envoy with the opentracing_mocktracer from opentracing/opentracing-cpp@v1.5.0. Modify the plugins's source code (in opentracing-cpp) like follows:Then compile it (fortunately, opentracing-cpp has a very straightforward cmake build so this should be no problem).
Change the docker-compose.yaml of the envoy example:
For better debugging, I also changed the envoy dockerfile:
Admin and Stats Output: Should not be relevant.
Config: The same as in examples/jaeger-native-tracing/front-envoy-jaeger.yaml, except that the tracing config is changed to the following:
Logs:
Notice how the suspected fault address matches the address of the error category, and that the .so was already unloaded (
~DummyUnloadDetection
in output) before the segfault occured.Call Stack: (snip from above)
The text was updated successfully, but these errors were encountered: