Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly parse and handle DBS client errors #12312

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

vkuznet
Copy link
Contributor

@vkuznet vkuznet commented Mar 19, 2025

Fixes #12229

Status

ready

Description

Properly collect all DBS error codes and provide new method to represent them to the client. The client code can request for DBS error codes and interpret them accordingly. Adjust DBSUploadPoller to use DBS error codes and perform necessary logic to spit out proper error message to upstream caller.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

After numerous discussions via MatterMost here is a list of all related PRs:

External dependencies / deployment changes

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/507/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/508/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/509/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/510/artifact/artifacts/PullRequestReport.html

@vkuznet
Copy link
Contributor Author

vkuznet commented Mar 24, 2025

test this please

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/512/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet changed the title Fix issue 12229 Properly parse and handle DBS client errors Mar 25, 2025
Copy link
Member

@mapellidario mapellidario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, these changes make sure that when we fail to insert a block in dbs, we keep track in the agent of all the errors that have been encountered inside dbs, to make sure that we always have the root cause for the error on our side.

If this is the case, then i approve, looks good to me

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before going with a full review, I'd rather have those commits properly squashed first - as every now and then I observe changes getting lost during the squashing process.

In addition, I think it is important to first move with the server changes (dbs2go). Once testbed is running the new dbs2go version, please test these changes in an agent with a representative use scenario.

Lastly, I think our agreement was that either error.reason or message would have a clear and friendly error message (with the actual root error). Did I misunderstand our discussion? Why do we have to have string parsing of the error served by the server?

@vkuznet
Copy link
Contributor Author

vkuznet commented Mar 26, 2025

@amaltaro

  • commits are now squashed
  • changes to dbs2go (Improve DBS errors output dbs2go#122) server should be handled by @todor-ivanov and he should report when they are deployed
  • finally, my understanding from our discussions was that we agreed that it is better to keep dbs2go agnostic and handle errors in client(s). As such the only changes on server require the better formatting the errors (the Improve DBS errors output dbs2go#122 provides that) and necessary fixes to properly return error within the reason (this was done in Fix error code when parent file is not found dbs2go#121 for issue in question, i.e. about parent file was not found). At the same time, within a client we should avoid missing any error returned by DBS server and it is better to capture all of them and use for internal logic which is what is provided in this PR.

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 5 tests no longer failing
    • 1 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/526/artifact/artifacts/PullRequestReport.html

@vkuznet
Copy link
Contributor Author

vkuznet commented Mar 26, 2025

@amaltaro please note that reported unit test failures are not related to this PR changes.

@todor-ivanov
Copy link
Contributor

hi @vkuznet :

changes to dbs2go (dmwm/dbs2go#122) server should be handled by @todor-ivanov and he should report when they are deployed

I am testing it now, before merging.

@anpicci
Copy link
Contributor

anpicci commented Mar 28, 2025

test this please

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/531/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet requested a review from amaltaro March 31, 2025 16:21
Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my previous review, I think we should not be traversing all the errors from the server in our client layer, instead we should use the final and root cause error that the server will be providing. I also left a comment along the code.

Before we converge and merge this PR, I think we need to:
a) upgrade dbs2go in testbed
b) update this PR to use the actual precise error (message or reason field?)
c) apply the candidate PR/patch to an agent
d) test this with a representative scenario.

msg = f"Error trying to process block {name} through DBS. Details: {msg}"
logging.error(msg)
results.put({'name': name, 'success': "error", 'error': msg})
for srvCode in dbsError.getCodes():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what I reported/asked in the main PR body, I think looping over the errors is not what we have discussed. Instead, the agent should only care about the actual root cause of the injection failure, without traversing the whole chain of errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaltaro , I suppose this comment is outdated after the MM discussion, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not

@@ -96,6 +96,17 @@ def getServerCode(self):
return self.data['error']['code']
return 0

def getCodes(self):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@todor-ivanov todor-ivanov Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get the comment here.... @amaltaro, what is not followed by the book in this docstring ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Parse the response of DBS Server and return all the error codes from error.code" or something providing a brief summary

Copy link
Contributor

@anpicci anpicci Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave the description as is, given that the function is not parsing and then separately putting the results in a list, but it's creating a list by making the associations between the error messages and the error codes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation of the function/method does not influence the structure of the docstring.
But if people want to update the contribution guidelines with a different recommendation, let us do it.
I am just trying to seek consistency in the project.

@vkuznet vkuznet requested a review from anpicci April 2, 2025 18:55
@dmwm-bot
Copy link

dmwm-bot commented Apr 2, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests deleted
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 7 comments to review
  • Pycodestyle check: succeeded
    • 34 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/540/artifact/artifacts/PullRequestReport.html

@vkuznet
Copy link
Contributor Author

vkuznet commented Apr 9, 2025

I addressed @amaltaro concern about "friendly" messages via the following PR: dmwm/dbs2go#129 In particular, I modified 520 (!!!) errors in DBS server code and move away from generic errors to concrete ones which address friendliness of returned DBS error code error and messages. I'll provide additional changes in this PR to address backward compatibility and avoid explicit reliance on DBS error code values. Stay tuned.

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 7 warnings
    • 49 comments to review
  • Pycodestyle check: succeeded
    • 32 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/562/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 7 warnings
    • 51 comments to review
  • Pycodestyle check: succeeded
    • 33 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/563/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 7 warnings
    • 51 comments to review
  • Pycodestyle check: succeeded
    • 33 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/564/artifact/artifacts/PullRequestReport.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WMAgent: Properly parse and handle DBS client errors
6 participants