Build job with address sanitizers is flaky #164

Open
andpiccione opened this issue Nov 3, 2023 · 2 comments
Labels
bug Something isn't working

@andpiccione
Member

andpiccione commented Nov 3, 2023

Describe the bug
The build pipeline job running functional tests using a SCITT build with address sanitizers enabled is flaky and fails frequently.

To Reproduce

Run the build pipeline at commit 8e765e7 or earlier.

Example build: https://msazure.visualstudio.com/One/_build/results?buildId=82290496&view=logs&j=18afa956-2433-54b9-a984-6a92e16f0b5b&t=d846f465-8944-5ee0-9403-de1d697ca3f0&l=482

Expected behavior
Build and tests with address sanitizers enabled should succeed consistently.

Additional context
Based on the logs, it looks like cchost randomly fails in functional tests with the following error:

UndefinedBehaviorSanitizer: undefined-behavior ../include/ccf/ds/logger.h:159:34: runtime error: -1.08502 is outside the range of representable values of type 'unsigned long'

The problem seems to originate from the enclave offset being negative, which results in a negative floating-point value being converted to an unsigned long, a conversion the sanitizer reports as undefined behavior: https://github.com/microsoft/CCF/blob/0e406e48409c819aea5139391a85f89dd090f0b5/include/ccf/ds/logger.h#L159.

The negative offset seems to be the result of the difference between the time known to the enclave and the time known to the host: https://github.com/microsoft/CCF/blob/0e406e48409c819aea5139391a85f89dd090f0b5/src/host/handle_ring_buffer.h#L76.
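
To make the failure mode concrete, here is a minimal sketch of a guarded conversion, assuming the offset arrives as a `double` number of seconds that may be negative. The helper name `to_unsigned_clamped` is hypothetical and is not part of CCF or SCITT; it only illustrates the kind of protective check that would avoid the sanitizer error, not how CCF actually handles the value.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not CCF code): converts a floating-point value to
// uint64_t without triggering undefined behavior on out-of-range input.
inline uint64_t to_unsigned_clamped(double value)
{
  // NaN and negative values have no unsigned representation; map them to 0.
  if (std::isnan(value) || value <= 0.0)
  {
    return 0;
  }

  // 2^64 as a double; anything at or above it is also out of range.
  constexpr double upper = 18446744073709551616.0;
  if (value >= upper)
  {
    return std::numeric_limits<uint64_t>::max();
  }

  // In range: the cast truncates toward zero and is well defined.
  return static_cast<uint64_t>(value);
}
```

With this sketch, the value from the log above, `-1.08502`, would be clamped to 0 instead of being cast directly. Whether clamping, logging a warning, or keeping the offset signed is the right behavior is a design decision for the code that owns the offset.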

It is not yet clear whether the problem originates in the CCF code or the SCITT code; this should be investigated further in a future task.

@andpiccione andpiccione added the bug Something isn't working label Nov 3, 2023
@ivarprudnikov
Member

@andpiccione would it make sense to escalate to CCF to make sure negative numbers cannot reach the method? Seems like a bit of protective coding would help here but we cannot do much in our codebase.

@andpiccione
Member Author

@ivarprudnikov Yeah, I think it could make sense to involve the CCF team for this. However, it's not yet clear to me whether this problem originates from our codebase or from CCF's: the error above seems to point to the CCF code, but CCF has a similar build job with address sanitizers enabled (with similar steps), and from what I could see it doesn't fail with the error detailed above. This makes me think there is something wrong in our tests or build, or maybe the root cause is completely different.

Before involving CCF, I believe we should try reproducing the failure locally on a VM (should be easy by running the same commands executed during the build job) and see if we can find more logs or information on this specific error. If we can't find anything more, we can reach out to the CCF team for assistance on the matter.

@ivarprudnikov ivarprudnikov added this to the Next version milestone Jan 12, 2024