
[24.0] Defer job attributes that are usually not needed #17795

Merged
merged 2 commits on Mar 19, 2024


Member

@mvdbeek commented Mar 19, 2024

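These attributes (the two EXISTS subqueries in the "Before" query below, which check for deleted output datasets and dataset collections) are only needed in
https://github.com/galaxyproject/galaxy/blob/release_24.0/lib/galaxy/managers/jobs.py#L388-L389, so there is no need to load them every time a job is loaded.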

I was looking at the odd query in #17787. I don't expect this to fix that issue, but the query is much lighter now: the two EXISTS subqueries are no longer part of the default SELECT.

Before:

sa_session.get(Job, 100000)
INFO:sqlalchemy.engine.Engine:select pg_catalog.version()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:select current_schema()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:show standard_conforming_strings
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT job.id AS job_id, job.create_time AS job_create_time, job.update_time AS job_update_time, job.history_id AS job_history_id, job.library_folder_id AS job_library_folder_id, job.tool_id AS job_tool_id, job.tool_version AS job_tool_version, job.galaxy_version AS job_galaxy_version, job.dynamic_tool_id AS job_dynamic_tool_id, job.state AS job_state, job.info AS job_info, job.copied_from_job_id AS job_copied_from_job_id, job.command_line AS job_command_line, job.dependencies AS job_dependencies, job.job_messages AS job_job_messages, job.param_filename AS job_param_filename, job.runner_name AS job_runner_name_1, job.job_stdout AS job_job_stdout, job.job_stderr AS job_job_stderr, job.tool_stdout AS job_tool_stdout, job.tool_stderr AS job_tool_stderr, job.exit_code AS job_exit_code, job.traceback AS job_traceback, job.session_id AS job_session_id, job.user_id AS job_user_id, job.job_runner_name AS job_job_runner_name, job.job_runner_external_id AS job_job_runner_external_id, job.destination_id AS job_destination_id, job.destination_params AS job_destination_params, job.object_store_id AS job_object_store_id, job.imported AS job_imported, job.params AS job_params, job.handler AS job_handler, job.preferred_object_store_id AS job_preferred_object_store_id, job.object_store_id_overrides AS job_object_store_id_overrides, (SELECT EXISTS (SELECT history_dataset_collection_association.id
FROM history_dataset_collection_association, job, job_to_output_dataset_collection
WHERE job.id = job_to_output_dataset_collection.job_id AND history_dataset_collection_association.id = job_to_output_dataset_collection.dataset_collection_id AND history_dataset_collection_association.deleted = true) AS anon_2) AS anon_1, (SELECT EXISTS (SELECT history_dataset_association.id
FROM history_dataset_association, job, job_to_output_dataset
WHERE job.id = job_to_output_dataset.job_id AND history_dataset_association.id = job_to_output_dataset.dataset_id AND history_dataset_association.deleted = true) AS anon_4) AS anon_3
FROM job
WHERE job.id = %(pk_1)s
INFO:sqlalchemy.engine.Engine:[generated in 0.00036s] {'pk_1': 100000}

After:

sa_session.get(Job, 100000)
INFO:sqlalchemy.engine.Engine:select pg_catalog.version()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:select current_schema()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:show standard_conforming_strings
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT job.id AS job_id, job.create_time AS job_create_time, job.update_time AS job_update_time, job.history_id AS job_history_id, job.library_folder_id AS job_library_folder_id, job.tool_id AS job_tool_id, job.tool_version AS job_tool_version, job.galaxy_version AS job_galaxy_version, job.dynamic_tool_id AS job_dynamic_tool_id, job.state AS job_state, job.info AS job_info, job.copied_from_job_id AS job_copied_from_job_id, job.command_line AS job_command_line, job.dependencies AS job_dependencies, job.job_messages AS job_job_messages, job.param_filename AS job_param_filename, job.runner_name AS job_runner_name_1, job.job_stdout AS job_job_stdout, job.job_stderr AS job_job_stderr, job.tool_stdout AS job_tool_stdout, job.tool_stderr AS job_tool_stderr, job.exit_code AS job_exit_code, job.traceback AS job_traceback, job.session_id AS job_session_id, job.user_id AS job_user_id, job.job_runner_name AS job_job_runner_name, job.job_runner_external_id AS job_job_runner_external_id, job.destination_id AS job_destination_id, job.destination_params AS job_destination_params, job.object_store_id AS job_object_store_id, job.imported AS job_imported, job.params AS job_params, job.handler AS job_handler, job.preferred_object_store_id AS job_preferred_object_store_id, job.object_store_id_overrides AS job_object_store_id_overrides
FROM job
WHERE job.id = %(pk_1)s
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] {'pk_1': 100000}
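The log lines above use the layout of Python's default logging format ("LEVEL:logger.name:message") with the sqlalchemy.engine logger set to INFO. A minimal sketch of enabling the same output, assuming a throwaway in-memory SQLite engine rather than Galaxy's own PostgreSQL-backed session (so the pg_catalog and current_schema lines will not appear):

```python
import logging

from sqlalchemy import create_engine, text

# Root handler using logging's default "LEVEL:logger.name:message" layout,
# matching the "INFO:sqlalchemy.engine.Engine:..." prefix in the logs above.
logging.basicConfig(format=logging.BASIC_FORMAT)
# Emit SQL statements, parameters and transaction markers at INFO level.
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

# Hypothetical throwaway engine used only to demonstrate the logging setup.
engine = create_engine("sqlite://")
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```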

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek added the area/performance and area/database (Galaxy's database or data access layer) labels Mar 19, 2024
@github-actions bot added this to the 24.1 milestone Mar 19, 2024
@mvdbeek changed the title from "Defer any_output_dataset_collection_instances_deleted and any_output_…" to "[24.0] Defer job attributes that are usually not needed" Mar 19, 2024
@mvdbeek requested a review from jdavcs March 19, 2024 18:32
Member

@jdavcs left a comment


Makes perfect sense!

- stmt = select(model.Job).filter(model.Job.job_runner_external_id == remote_job_id)
- galaxy_job_id = self.app.model.session.execute(stmt).scalar_one().id
+ stmt = select(model.Job.id).filter(model.Job.job_runner_external_id == remote_job_id)
+ galaxy_job_id = self.app.model.session.execute(stmt).scalar_one()
Member


😮 - ouch - good catch.

@mvdbeek merged commit b36e5a1 into galaxyproject:release_24.0 Mar 19, 2024
49 checks passed

This PR was merged without a "kind/" label, please correct.

@nsoranzo deleted the optimize_job_loading branch March 19, 2024 21:36
@jdavcs changed the milestone from 24.1 to 24.0 Mar 25, 2024
Labels
area/database (Galaxy's database or data access layer), area/performance, kind/bug

3 participants