E2E failing links test is flaky #4215

Open
matzf opened this issue Jun 10, 2022 · 1 comment
Labels
bug Something isn't working c/testing Everything related to the testing stack

Comments

@matzf
Contributor

matzf commented Jun 10, 2022

The E2E failing links test has been flaky since #4168.
The general problem seems to be that the await-connectivity script waits for a path to be established between any two ASes, but the "E2E failing links" test requires specific paths to have been discovered before continuing to disable routers.
Consequently, possible fixes could be:

  • wait for a fixed amount of time
  • extend the await-connectivity script to wait for specific path segments
  • adapt the await-connectivity script to wait for all paths to be established, e.g. by waiting until there are no new segments after one full beaconing period.
@matzf matzf added bug Something isn't working c/testing Everything related to the testing stack labels Jun 10, 2022
@matzf
Contributor Author

matzf commented Jul 31, 2023

Update: I did some investigation (#4356), suspecting that the problem mainly lies with the error and timeout handling in the end-to-end test utility. However, while I found that these can be improved, this is not the (main) issue here.

The problem is related to the caching of path segments in the control service; once path segments are queried, the result is kept in the cache for a relatively long time (path.query_interval, defaults to 5 minutes). The "Refresh" flag of the sciond path requests does not propagate to the path segment requests in the control service.
If the "surviving" segments of the failing link scenario have not been registered before the first path segment request is made, the test will fail.
This suggests a possible alternative that goes beyond just fixing the test: rethink the path query caching strategy to better take into account the interaction between the different caching levels and the usage requirements of SCION applications.
