MaxSpotInstanceCountExceeded in GL tests due to GPU #675

Closed
DavidGOrtega opened this issue Jul 27, 2021 · 5 comments
Labels: ci-gitlab; gpu (Inexplicably convoluted drivers); p1-important (High priority); technical-debt (Refactoring, linting & tidying); testing (Unit tests & debugging)

Comments

@DavidGOrtega
Contributor

We are in a difficult scenario: the GitLab GPU tests are failing with MaxSpotInstanceCountExceeded errors. Enabling --reuse will probably solve this.

@DavidGOrtega DavidGOrtega added p0-critical Max priority (ASAP) technical-debt Refactoring, linting & tidying testing Unit tests & debugging labels Jul 27, 2021
@0x2b3bfa0
Member

If we enable the --reuse option, unit tests won't [always] cover the runner creation process. Do we want that?

@casperdcl
Contributor

casperdcl commented Aug 2, 2021

@shcheklein found that there are multiple instances running for about a week. We need:

  • automatic warning (email etc.) from AWS if instances run for more than e.g. 30min
  • automatic shutdown of instances (timeout) by AWS
  • figure out why CML didn't cleanly terminate the instances

--reuse will only hide the problem without solving it
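As a rough illustration of the first two items above (warn on, then shut down, long-running instances), here is a minimal sketch of the age check. It assumes the input has the same shape as the `Reservations` list returned by boto3's EC2 `describe_instances` call; the function name `find_overdue_instances` and the 30-minute default are hypothetical and are not part of CML.

```python
from datetime import datetime, timedelta, timezone


def find_overdue_instances(reservations, now, max_age=timedelta(minutes=30)):
    """Return the IDs of running instances launched more than max_age ago.

    `reservations` is assumed to follow the layout of the "Reservations"
    list in an EC2 describe_instances response: each reservation holds an
    "Instances" list whose items carry "InstanceId", a timezone-aware
    "LaunchTime" and a "State" mapping with a "Name" key.
    """
    overdue = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            # Only instances that are still running can leak spot quota.
            if instance["State"]["Name"] != "running":
                continue
            if now - instance["LaunchTime"] > max_age:
                overdue.append(instance["InstanceId"])
    return overdue


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    now = datetime(2021, 8, 2, 12, 0, tzinfo=timezone.utc)
    sample = [
        {
            "Instances": [
                {
                    "InstanceId": "i-week-old",
                    "LaunchTime": now - timedelta(days=7),
                    "State": {"Name": "running"},
                },
                {
                    "InstanceId": "i-fresh",
                    "LaunchTime": now - timedelta(minutes=5),
                    "State": {"Name": "running"},
                },
            ]
        }
    ]
    print(find_overdue_instances(sample, now))
```

A scheduled job could feed the returned IDs into an email alert or a termination call. That only covers the first two items; it does not explain why CML failed to terminate the instances in the first place.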

@DavidGOrtega DavidGOrtega self-assigned this Aug 2, 2021
@DavidGOrtega
Contributor Author

related to #680

This might be related to #678: after seeing the logs, it seems that the chrono is not working properly.

@casperdcl
Contributor

@dacbd
Contributor

dacbd commented Feb 17, 2023

We haven't seen this in a while, closing for now.

@dacbd dacbd closed this as completed Feb 17, 2023