Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add job controller, job monitor and tools to support compute4punch backend #430

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

giffels
Copy link

@giffels giffels commented Feb 12, 2024

This pull request contains a job controller, job monitor and tools to support the compute4punch backend.

  • Add support for token based ssh access motley_cue the Compute4PUNCH login node. The HELMHOLTZ_TOP token needs to be added as secret using the reana-client.
  • Add Compute4PUNCH job monitor.
  • Add configurations for the Compute4PUNCH backend.
  • Add diverse utilities to simply parsing of HTCondor´s condor_q and condor_history commands.
  • Add Compute4PUNCH job controller based on ssh calls to the Compute4PUNCH login node.

@giffels
Copy link
Author

giffels commented Feb 12, 2024

I opened this as a draft pull request in order to start a discussion with the maintainers.
In particular, I am interested in early feedback and how to specify specific resource needs of the job?

At the moment I have added two new arguments to the compute4punch job controller knowing that they need to be added to the reana-commons and reana-workflow-engine-* modules as well. Before doing this, I would like to check if this is the way to go or if there is already a mechanism about that?

I have found #32, but it seems that only htcondor_max_runtime and htcondor_accounting_group have been implemented so far?

@giffels giffels force-pushed the add/compute4punch-support branch 2 times, most recently from 54768b0 to dc0fc33 Compare February 16, 2024 10:40
@giffels giffels marked this pull request as ready for review February 21, 2024 12:29
@giffels
Copy link
Author

giffels commented Feb 21, 2024

I would to share with you, that the first successful REANA workflow completed on Compute4PUNCH.

Bildschirmfoto 2024-02-21 um 13 30 42

@giffels giffels force-pushed the add/compute4punch-support branch from 6abe063 to 8177ab5 Compare February 21, 2024 12:45
Copy link

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 7.56303% with 220 lines in your changes missing coverage. Please review.

Project coverage is 39.72%. Comparing base (32ce567) to head (5ad471b).

Current head 5ad471b differs from pull request most recent head 6e62daa

Please upload reports for the commit 6e62daa to get more accurate results.

❗ There is a different number of reports uploaded between BASE (32ce567) and HEAD (5ad471b). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (32ce567) HEAD (5ad471b)
5 1
Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##           maint-0.9     #430      +/-   ##
=============================================
- Coverage      45.99%   39.72%   -6.27%     
=============================================
  Files             17       18       +1     
  Lines           1235     1470     +235     
=============================================
+ Hits             568      584      +16     
- Misses           667      886     +219     
Files Coverage Δ
reana_job_controller/config.py 100.00% <100.00%> (ø)
reana_job_controller/utils.py 28.69% <10.63%> (-12.49%) ⬇️
reana_job_controller/job_monitor.py 40.27% <11.66%> (-7.76%) ⬇️
reana_job_controller/compute4punch_job_manager.py 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

Copy link
Member

@mdonadoni mdonadoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @giffels, sorry for the delay!

I have left some comments after my first look at this PR, most of them are just due to my small knowledge of HTCondor and of Compute4PUNCH ;)

I still haven't looked at the token authentication part, though, and I haven't run the code yet.

I have found #32, but it seems that only htcondor_max_runtime and htcondor_accounting_group have been implemented so far?

Yes, those are the only ones as far as I can tell.
Regarding these new values:

  • c4p_cpu_cores
  • c4p_memory_limit
  • c4p_additional_requirements

Even though the first two could be harmonised with the ones for Kubernetes, I guess that for now these will work. We will also need to add them to the specification of reana.yaml.

@tiborsimko what do you think? Is it ok to keep these new values separate from the ones used for k8s jobs? See also this related comment: #32 (comment)

reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/compute4punch_job_manager.py Outdated Show resolved Hide resolved
reana_job_controller/job_monitor.py Outdated Show resolved Hide resolved
@mdonadoni
Copy link
Member

mdonadoni commented Feb 26, 2024

One last thing: how hard would it be to have at least some small tests for these changes? Those are the only way for us to make sure that the code keeps working also in the future, as we do not have access on Compute4PUNCH and we will not be able to test the integration with REANA there.

@giffels giffels force-pushed the add/compute4punch-support branch from 0fea500 to db63edd Compare February 27, 2024 13:05
@giffels
Copy link
Author

giffels commented Feb 27, 2024

One last thing: how hard would it be to have at least some small tests for these changes? Those are the only way for us to make sure that the code keeps working also in the future, as we do not have access on Compute4PUNCH and we will not be able to test the integration with REANA there.

Sure, it would be really useful to have test coverage for this. I will work on this.

@giffels giffels force-pushed the add/compute4punch-support branch from 16159ad to 4fc51e5 Compare March 1, 2024 09:50
@giffels giffels force-pushed the add/compute4punch-support branch 2 times, most recently from 515c974 to a244df3 Compare May 10, 2024 10:01
@giffels giffels changed the base branch from master to maint-0.9 May 10, 2024 10:02
@giffels giffels force-pushed the add/compute4punch-support branch from a244df3 to fa3ecab Compare May 10, 2024 15:16
setup.py Show resolved Hide resolved
reana_job_controller/utils.py Outdated Show resolved Hide resolved
reana_job_controller/utils.py Show resolved Hide resolved
@mdonadoni
Copy link
Member

Except for some comments that are left, I have not found issue with my local deployment of REANA, even when using HTCondor@CERN and Slurm@CERN, so these changes should not affect other REANA instances that do not use the C4P backend.

@giffels giffels force-pushed the add/compute4punch-support branch 4 times, most recently from 9b856fa to 52b903c Compare June 21, 2024 16:52
@tiborsimko
Copy link
Member

I have not found issue with my local deployment of REANA, even when using HTCondor@CERN and Slurm@CERN, so these changes should not affect other REANA instances that do not use the C4P backend.

Confirmed, just tried HTCondorCERN, and the paramiko version number change (that is coming with this PR) works perfectly well.

@giffels giffels force-pushed the add/compute4punch-support branch from 52b903c to 6e62daa Compare July 1, 2024 14:15
mdonadoni pushed a commit to giffels/reana-job-controller that referenced this pull request Jul 4, 2024
@mdonadoni mdonadoni force-pushed the add/compute4punch-support branch from 6e62daa to 3581378 Compare July 4, 2024 12:46
mdonadoni pushed a commit to giffels/reana-job-controller that referenced this pull request Jul 4, 2024
@mdonadoni mdonadoni force-pushed the add/compute4punch-support branch from 3581378 to f20e523 Compare July 4, 2024 15:15
@mdonadoni mdonadoni force-pushed the add/compute4punch-support branch from f20e523 to 4243252 Compare July 4, 2024 15:33
@mdonadoni mdonadoni merged commit 4243252 into reanahub:maint-0.9 Jul 5, 2024
12 checks passed
tiborsimko added a commit to tiborsimko/reana-job-controller that referenced this pull request Dec 4, 2024
chore(maint-0.9): release 0.9.4 (reanahub#418)
build(python): bump shared REANA packages as of 2024-11-28 (reanahub#477)
fix(kubernetes): avoid privilege escalation in Kubernetes jobs (reanahub#476)
fix(config): read secret key from env (reanahub#476)
fix(htcondorcern): support multiline commands (reanahub#474)
fix(htcondorcern): run provided command in unpacked image (reanahub#474)
build(deps): update reana-auth-vomsproxy to 1.3.0 (reanahub#466)
build(docker): pin setuptools 70 (reanahub#465)
fix(config): update reana-auth-vomsproxy to 1.2.1 to fix WLCG IAM (reanahub#457)
feat(backends): add new Compute4PUNCH backend (reanahub#430)

Note: The merge commit removes the changes related to pinning
`setuptools` to version 70, because this was only necessary for the
`maint-0.9` branches, as well as other 0.9.4 release-related changes.
tiborsimko added a commit to tiborsimko/reana-job-controller that referenced this pull request Dec 4, 2024
chore(maint-0.9): release 0.9.4 (reanahub#418)
build(python): bump shared REANA packages as of 2024-11-28 (reanahub#477)
fix(kubernetes): avoid privilege escalation in Kubernetes jobs (reanahub#476)
fix(config): read secret key from env (reanahub#476)
fix(htcondorcern): support multiline commands (reanahub#474)
fix(htcondorcern): run provided command in unpacked image (reanahub#474)
build(deps): update reana-auth-vomsproxy to 1.3.0 (reanahub#466)
build(docker): pin setuptools 70 (reanahub#465)
fix(config): update reana-auth-vomsproxy to 1.2.1 to fix WLCG IAM (reanahub#457)
feat(backends): add new Compute4PUNCH backend (reanahub#430)

Note: The merge commit removes the changes related to pinning
`setuptools` to version 70, because this was only necessary for the
`maint-0.9` branches, as well as other 0.9.4 release-related changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants