
[24.1] Fix job paused on user defined object store #19578

Conversation

davelopez (Contributor)

Fixes #19577

This PR delays the quota check until the point where we know the object_store_id of the job, so we can properly check for quotas.

Thanks @mvdbeek for hinting in that direction; it seems to work 👍

FixRunToolOnUserStorage.mp4

How to test the changes?

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

We don't have the object_store_id until we get to this point, and we need the object_store_id to check whether the target object store is subject to quota or not.
Since they are not subject to Galaxy quotas
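
The gist of the fix, as a minimal sketch (the helper names below are hypothetical, not the actual Galaxy job wrapper or quota agent API): resolve the job's object_store_id first, then decide whether a quota check applies at all.

```python
# Rough sketch only: `is_subject_to_quota` and `is_over_quota` are
# placeholder names, not the real Galaxy object store / quota agent API.

def check_quota_after_store_resolution(job, object_store, quota_agent, pause):
    """Run the over-quota check only once the job's object_store_id is known."""
    object_store_id = job.object_store_id  # resolved during enqueue, not before

    # User-defined (and other quota-exempt) stores are skipped entirely,
    # so jobs targeting them are never paused for quota reasons.
    if not object_store.is_subject_to_quota(object_store_id):
        return

    if quota_agent.is_over_quota(job.user, object_store_id):
        pause(job, message="Execution of this dataset's job is paused")
```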
@@ -1510,7 +1510,7 @@ def pause(self, job=None, message=None):
         job = self.get_job()
         if message is None:
             message = "Execution of this dataset's job is paused"
-        if job.state == job.states.NEW:
+        if job.state in (job.states.NEW, job.states.QUEUED):
Member

Can a queued job that is already handed over to the job scheduler be paused?

davelopez (Contributor, Author)

Not really, but the "over-quota pause check" is now done during the enqueue process, and at that point the state is already queued.
I'll see if I can delay the status change until this check has been made, without other consequences 👍

davelopez (Contributor, Author)

I ended up refactoring a bit more than I would have liked for a fix, but it seems reasonable, since otherwise the logging would be inconsistent and report that the job was dispatched when it wasn't.

mvdbeek (Member), Feb 11, 2025

> Can a queued job that is already handed over to the job scheduler be paused?

Yes it can, at any point in a job's lifetime. I don't know if that means some of the refactoring can be undone?

davelopez (Contributor, Author)

OK, thanks for the clarification! I will drop the additional refactoring commits.

Member

@mvdbeek but does that mean that the job scheduler needs to remove the job from the queue? If so, should we not avoid that? I just want to understand how this works :)

mvdbeek (Member), Feb 11, 2025

Paused jobs won't be picked up by the handler loop; they're excluded from the ready-to-run query.
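
A toy illustration of that exclusion (not the actual Galaxy handler query, which is considerably more involved), assuming a minimal SQLAlchemy `Job` model:

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "job"
    id = Column(Integer, primary_key=True)
    state = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Job(state="new"), Job(state="queued"), Job(state="paused")])
    # The handler only grabs jobs whose state is in its ready list;
    # "paused" is simply never in that list, so paused jobs stay untouched.
    ready_states = ("new", "queued")
    ready = session.scalars(select(Job).where(Job.state.in_(ready_states))).all()
    print(sorted(job.state for job in ready))  # ['new', 'queued'], no 'paused'
```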

@davelopez davelopez marked this pull request as draft February 11, 2025 08:13
@davelopez davelopez marked this pull request as ready for review February 11, 2025 09:40
@@ -1604,23 +1606,28 @@ def get_destination_configuration(self, key, default=None):

     def enqueue(self):
         job = self.get_job()
         # Change to queued state before handing to worker thread so the runner won't pick it up again
mvdbeek (Member), Feb 11, 2025

This needs to be the first thing to happen to avoid races between worker threads. It is a no-op state basically.
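
In other words, the ordering is what matters. A short illustrative sketch (names are placeholders, not the real Galaxy handler code): flip the state before doing anything slow, so another pass over ready jobs cannot dispatch the same job twice.

```python
# Illustrative only: the point is the ordering, not the specific API.
def enqueue_sketch(job, dispatch):
    # 1) Mark the job as queued first. Assuming the handler loop only picks
    #    up jobs in a "ready" state such as "new", this is essentially a
    #    no-op state change that stops a second dispatch of the same job.
    job.state = "queued"

    # 2) Only then do the slower work: resolve the destination/object store,
    #    run the (now-delayed) over-quota check, and hand off to a worker.
    dispatch(job)
```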

mvdbeek (Member) commented Feb 11, 2025

The state in c2ed250 looked good to me; I am not so sure about the remaining commits. It should be possible to write an integration test for this, and I think that would be quite helpful to base any changes against.

davelopez (Contributor, Author)

I will try to work on an integration test. I was unsure about setting up a user-defined object store for the test, but I will investigate more.

@davelopez davelopez marked this pull request as draft February 11, 2025 11:37
@davelopez davelopez force-pushed the 24.1_fix_job_paused_on_user_defined_object_store branch from ed26b64 to c2ed250 Compare February 11, 2025 11:38
mvdbeek (Member) commented Feb 11, 2025

I don't think it needs to be a user object store; it just needs to be a non-default object store. "scratch" should also do. This might be helpful as a starting point: https://github.com/galaxyproject/galaxy/blob/dev/test/integration/objectstore/test_selection_with_user_preferred_object_store.py#L1
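
A rough outline of what such a test could look like (all fixture and helper names below are hypothetical placeholders; the linked test_selection_with_user_preferred_object_store.py is the real pattern to follow):

```python
# Hypothetical sketch; `galaxy_interactor`, `set_user_quota`, and
# `run_tool_and_wait` are placeholders, not real Galaxy test helpers.
def test_job_not_paused_on_quota_exempt_store(galaxy_interactor):
    # Give the user a tiny quota so a job on the default store would be paused.
    galaxy_interactor.set_user_quota("1 byte")

    # Run a simple tool, but target a non-default (e.g. "scratch") or
    # user-defined object store that is not subject to Galaxy quotas.
    job = run_tool_and_wait(
        galaxy_interactor,
        tool_id="cat1",
        preferred_object_store_id="scratch",
    )

    # The fix under test: the over-quota check now uses the job's resolved
    # object_store_id, so this job should run rather than be paused.
    assert job["state"] != "paused"
```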

davelopez (Contributor, Author)

Thanks! I found a way to use user-defined object stores by looking at some existing tests from John. I think it is important to check user-defined because this change affects those in particular, but I can add another test for non-default object stores too.

mvdbeek (Member) commented Feb 11, 2025

No need for a separate test IMO. Whether you use a user-defined store or not, the important thing is to delay the over-quota check until we have the object store id for the job.

@davelopez davelopez marked this pull request as ready for review February 11, 2025 14:00
davelopez (Contributor, Author)

Then it should be ready for review :)

@mvdbeek mvdbeek merged commit c3b6cbf into galaxyproject:release_24.1 Feb 11, 2025
46 of 52 checks passed
@davelopez davelopez deleted the 24.1_fix_job_paused_on_user_defined_object_store branch February 11, 2025 16:11