Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why this fixes the existing issue? also if you grep this file by json::array() its used in 7 places, so it would require to be fixed in all of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line, "json arr = json::array();" calls constructor, "basic_json(init, false, value_t::array);" in file json.hpp:1615, whereas, "json::array_t a; json arr(a);" invokes different constructor "basic_json(const array_t& val)".
Here, the issue appears to be stack corruption, where passed parameter seems to have a value different from the one passed by the calling method. We see this issue only in this flow on API "sai_serialize_acl_resource_list".
When modified the constructor call to pass only 2 arg and assign 3 arg from default value as below, issue is not seen.
"basic_json(init, false);" in file json.hpp:1615
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this is actual root cause, since the same exact line is used in different serialization methods in this file as well, and if they don't cause same error, then this could potentially by compiler bug or some memory alignment, for example same constructor us used in sai_serialize_json_fdb_event_notification_data and sai_serialize_fdb_event_ntf or sai_serialize_port_oper_status_ntf. So if you don't hit this error anywhere else, then it ether means that you don't receive those notifications in your case (if error is persistent), or the error is related to some compiler/optimization. Also, this error is not triggered on x86/64 platform, only on armhf, which would suggest armhf platform is flawed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will you be able to see if you will hit this error without -O2 enabled ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, do you have idea which actual line in json.hpp is causing this error? maybe it's better to fix this in json.hpp rather than here, if many different components may use json.hpp and this array constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lguohan, it is not clear to me from the stack backtrace why boost v 1.71 is at play here. @kcudnik are we using API across libraries that are using boost? My understanding is the swss-common had compile time dependency on boost smart pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boost was recently added as dependency as am' aware, and sairedis library is not currently using any boost bindings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was in azure pipeline build docker. Wondering if @rajkumar38 has mismatched binaries by any chance???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe all packages are installed in build docker using apt-get command and I don't see any package mismatch (verified with dpkg -l in build docker env)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With changes from PR 6532, I recompiled libswsscommon, libsairedis and libsaimetadata debs with O2 flag disabled. With this, I'm not seeing the crash.
Issue is seen when PR 6532 and O2 flag enabled together.
I don't think any compiler settings changed recently and is not clear how PR 6532 is causing the regression.