update allocation halted on missing hash ALEPH-337 #737

olethanh · 2025-01-13T18:03:43Z

If a message was nonexistent, fake or rejected, for example in the ref for an instance, update_allocation halted at the VM it was processing and never processed the rest of the VM list. This meant that any bad item hash from the scheduler would block bringing up any new VMs.

This is now handled more generally: any error raised during VM creation is caught, so the remaining VMs in the list are still processed.
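As a rough illustration of the behaviour described above (not the actual code of this PR), the sketch below isolates failures per item so that one bad hash cannot stop the loop. The names `allocate_all` and `fake_start_vm` are hypothetical; only the response keys (`success`, `successful`, `failing`, `errors`) are taken from the example response in this description.

```python
import asyncio


async def allocate_all(item_hashes, start_vm):
    """Try to start every VM and record failures instead of stopping at the first one."""
    successful, failing, errors = [], [], {}
    for item_hash in item_hashes:
        try:
            await start_vm(item_hash)
        except Exception as error:
            # Deliberately broad: one bad hash must not block the rest of the list.
            failing.append(item_hash)
            errors[item_hash] = repr(error)
        else:
            successful.append(item_hash)
    return {
        "success": not failing,
        "successful": successful,
        "failing": failing,
        "errors": errors,
    }


async def fake_start_vm(item_hash):
    """Hypothetical stand-in for VM creation: fails for hashes starting with 'bad'."""
    if item_hash.startswith("bad"):
        raise LookupError(f"Hash not found: {item_hash}")


result = asyncio.run(allocate_all(["good-1", "bad-1", "good-2"], fake_start_vm))
print(result)
```

Catching `Exception` this broadly is usually discouraged, but here it matches the stated goal: the failure is reported per hash rather than aborting the whole allocation.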

Related ClickUp, GitHub or Jira tickets: Jira Ticket: ALEPH-337

Self proofreading checklist

The new code is clear, easy to read and well commented.
New code does not duplicate the functions of builtin or popular libraries.
An LLM was used to review the new code and look for simplifications.
New classes and functions contain docstrings explaining what they provide.
[no] All new code is covered by relevant tests.
[n/a] Documentation has been updated regarding these changes.
[x] Dependencies update in the project.toml have been mirrored in the Debian package build script packaging/Makefile

Changes

Explain the changes that were made. The idea is not to list exhaustively all the changes made (GitHub already provides a full diff), but to help the reviewers better understand:

which specific file changes go together, e.g: when creating a table in the front-end, there usually is a config file that goes with it
the reasoning behind some changes, e.g: deleted files because they are now redundant
the behaviour to expect, e.g: tooltip has purple background color because the client likes it so, changed a key in the API response to be consistent with other endpoints

How to test

Call the update allocation endpoint with an invalid hash mixed in.

e.g.

POST http://localhost:4020/control/allocations
Content-Type: application/json
X-Auth-Signature: test
Accept: application/json
Content-Length: 191
User-Agent: IntelliJ HTTP Client/PyCharm 2024.3.1
Accept-Encoding: br, deflate, gzip, x-gzip

{
  "persistent_vms": [],
  "instances": [
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723",
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232"
  ]
}
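For testing without an HTTP client plugin, the same request can be built with the Python standard library. This is only a sketch: the URL, the `X-Auth-Signature: test` header and the payload are copied from the example above, while the helper name `build_allocation_request` is hypothetical.

```python
import json
import urllib.request

ALLOCATION_URL = "http://localhost:4020/control/allocations"


def build_allocation_request(instances, persistent_vms=()):
    """Build (but do not send) the POST request shown in the example above."""
    payload = {"persistent_vms": list(persistent_vms), "instances": list(instances)}
    return urllib.request.Request(
        ALLOCATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Signature": "test",
        },
        method="POST",
    )


request = build_allocation_request(
    [
        "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723",
        "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232",
    ]
)
# To actually send it against a locally running supervisor:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode())
```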

It should return something like:

{
  "success": false,
  "successful": [],
  "failing": [
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232",
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723"
  ],
  "errors": {
    "f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d7232": "HostNotFoundError()",
    "2f473bb4662888edf852b5dda6662328cca000cdf4d64c8a4576bea355d3d723": "<HTTPNotFound Hash not found not prepared>"
  }
}
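When checking such a response by hand, a small consistency check may help. The sketch below assumes, based only on the example above and not on a documented contract, that every failing hash has an entry in `errors`, that no hash is both successful and failing, and that `success` is true only when nothing failed. The helper name `check_allocation_response` is hypothetical.

```python
import json


def check_allocation_response(body: str) -> list[str]:
    """Return a list of inconsistencies found in an allocation response body."""
    data = json.loads(body)
    failing = set(data["failing"])
    successful = set(data["successful"])
    problems = []
    if set(data["errors"]) != failing:
        problems.append("'errors' keys do not match 'failing'")
    if failing & successful:
        problems.append("a hash is listed as both successful and failing")
    # Assumption: 'success' is true only when no hash failed.
    if data["success"] != (not failing):
        problems.append("'success' flag disagrees with 'failing'")
    return problems


example = json.dumps(
    {
        "success": False,
        "successful": [],
        "failing": ["hash-a", "hash-b"],
        "errors": {"hash-a": "HostNotFoundError()", "hash-b": "<HTTPNotFound>"},
    }
)
problems = check_allocation_response(example)
print(problems)
```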

Jira Ticket: ALEPH-337

If a message did not exist, for example in the ref for an instance, update_allocation halted at the VM it was processing, leaving the other VMs as they were.

Solution: Catch the HTTPNotFound error that is raised when a message cannot be retrieved.

While creating a VM, if any unknown error was raised, update_allocation halted where it was, failing to process the rest of the list.

Solution: Catch all possible errors.

Continuation of ALEPH-337.

codecov · 2025-01-13T18:09:15Z

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Project coverage is 62.73%. Comparing base (c2ad82a) to head (654b102).
Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
src/aleph/vm/orchestrator/views/__init__.py	0.00%	8 Missing ⚠️
src/aleph/vm/storage.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #737      +/-   ##
==========================================
- Coverage   62.81%   62.73%   -0.08%     
==========================================
  Files          70       70              
  Lines        6314     6322       +8     
  Branches      516      516              
==========================================
  Hits         3966     3966              
- Misses       2190     2198       +8     
  Partials      158      158


olethanh added 3 commits January 13, 2025 18:28

Problem: update_allocation stopped on unknown error

39812a7

While creating a VM, if any unknown error was raised, update_allocation halted where it was, failing to process the rest of the list. Solution: Catch all possible errors. Continuation of ALEPH-337.

Enhance verbosity on failed downloads

654b102

olethanh requested a review from hoh January 13, 2025 18:03

hoh approved these changes Jan 14, 2025

Psycojoker approved these changes Jan 15, 2025

olethanh merged commit b860265 into main Jan 15, 2025
20 of 22 checks passed
