update allocation halted on missing hash ALEPH-337 #737

Merged


olethanh
Collaborator

If a message did not exist, was fake, or had been rejected (for example, the message referenced in the ref for an instance), update_allocation halted at the VM it was processing and skipped the rest of the VM list. This meant that any bad item hash sent by the scheduler would block new VMs from being brought up.

The handling is now more general: any error that happens during VM creation is caught, so the remaining VMs in the list are still processed.
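The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `allocate_all`, `start_vm` and `fake_start_vm` are hypothetical names, and the result keys are simply modelled on the example response in the "How to test" section.

```python
import asyncio
from typing import Awaitable, Callable


async def allocate_all(
    item_hashes: list[str],
    start_vm: Callable[[str], Awaitable[None]],  # hypothetical VM starter
) -> dict:
    """Try to start every VM, recording failures instead of halting."""
    successful: list[str] = []
    errors: dict[str, str] = {}
    for item_hash in item_hashes:
        try:
            await start_vm(item_hash)
        except Exception as error:
            # Deliberately broad: one bad hash must not block the others.
            errors[item_hash] = repr(error)
        else:
            successful.append(item_hash)
    return {
        "success": not errors,
        "successful": successful,
        "failing": list(errors),
        "errors": errors,
    }


async def fake_start_vm(item_hash: str) -> None:
    """Stand-in for VM creation: fails for hashes starting with 'bad'."""
    if item_hash.startswith("bad"):
        raise LookupError(f"Hash not found: {item_hash}")


result = asyncio.run(allocate_all(["bad-1", "good-1", "bad-2"], fake_start_vm))
```

In this sketch `good-1` is still started even though it sits between two failing hashes, which is the property the fix is meant to provide.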

Related ClickUp, GitHub or Jira tickets: Jira Ticket: ALEPH-337

Self proofreading checklist

  • The new code is clear, easy to read and well commented.
  • New code does not duplicate the functions of builtin or popular libraries.
  • An LLM was used to review the new code and look for simplifications.
  • New classes and functions contain docstrings explaining what they provide.
  • [no] All new code is covered by relevant tests.
  • [n/a] Documentation has been updated regarding these changes.
  • [x] Dependencies update in the project.toml have been mirrored in the Debian package build script packaging/Makefile

Changes

Explain the changes that were made. The idea is not to list exhaustively all the changes made (GitHub already provides a full diff), but to help the reviewers better understand:

  • which specific file changes go together, e.g: when creating a table in the front-end, there usually is a config file that goes with it
  • the reasoning behind some changes, e.g: deleted files because they are now redundant
  • the behaviour to expect, e.g: tooltip has purple background color because the client likes it so, changed a key in the API response to be consistent with other endpoints

How to test

Call update_allocation with a list of item hashes that includes invalid ones.

e.g.

POST http://localhost:4020/control/allocations
Content-Type: application/json
X-Auth-Signature: test
Accept: application/json
Content-Length: 191
User-Agent: IntelliJ HTTP Client/PyCharm 2024.3.1
Accept-Encoding: br, deflate, gzip, x-gzip

{
  "persistent_vms": [],
  "instances": [
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723",
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232"
  ]
}

It should return something like:

{
  "success": false,
  "successful": [],
  "failing": [
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232",
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723"
  ],
  "errors": {
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232": "HostNotFoundError()",
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723": "<HTTPNotFound Hash not found not prepared>"
  }
}
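For a scripted version of this manual test, the request above could be built with the Python standard library along these lines. This is only a sketch: `build_request`, `send` and `check_response` are hypothetical helpers, the status code returned on failure is not stated here (so `send` reads the body either way), and the checks in `check_response` assume the response always has the shape of the example.

```python
import json
import urllib.error
import urllib.request

# Values copied from the example request above.
URL = "http://localhost:4020/control/allocations"
INSTANCES = [
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723",
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232",
]


def build_request(instances: list[str]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"persistent_vms": [], "instances": instances}).encode()
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Signature": "test",
            "Accept": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request; needs a supervisor listening on localhost:4020."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as error:
        # Assumption: an error status may still carry the JSON report.
        return json.loads(error.read())


def check_response(payload: dict, requested: list[str]) -> None:
    """Check that every requested hash is reported exactly once."""
    reported = set(payload["successful"]) | set(payload["failing"])
    assert reported == set(requested)
    assert set(payload["errors"]) == set(payload["failing"])
    assert payload["success"] == (not payload["failing"])


request = build_request(INSTANCES)
# With a local supervisor running:
# check_response(send(request), INSTANCES)
```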

Jira Ticket: ALEPH-337

If a message did not exist, for example in the ref for an instance, update_allocation halted at the VM it was processing,
leaving the other VMs as they were.

Solution:
Catch the HTTPNotFound error that is raised when a message cannot be retrieved

While creating a VM, if any unknown error was raised, update_allocation halted where it was, failing to process the rest of the list

Solution:
Catch all the possible errors

Continuation of ALEPH-337
@olethanh olethanh requested a review from hoh January 13, 2025 18:03

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Project coverage is 62.73%. Comparing base (c2ad82a) to head (654b102).
Report is 6 commits behind head on main.

Files with missing lines                      Patch %   Lines
src/aleph/vm/orchestrator/views/__init__.py   0.00%     8 Missing ⚠️
src/aleph/vm/storage.py                       0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #737      +/-   ##
==========================================
- Coverage   62.81%   62.73%   -0.08%     
==========================================
  Files          70       70              
  Lines        6314     6322       +8     
  Branches      516      516              
==========================================
  Hits         3966     3966              
- Misses       2190     2198       +8     
  Partials      158      158              


@olethanh olethanh merged commit b860265 into main Jan 15, 2025
20 of 22 checks passed