Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAI-PMH error handling: meaningful errors in XML for verb processing #10205

Merged
merged 9 commits into from
Jan 17, 2024

Conversation

pdurbin
Copy link
Member

@pdurbin pdurbin commented Jan 4, 2024

What this PR does / why we need it:

Currently, certain OAI requests, especially related to verbs or the absence of a verb, result in a 500 error and no meaningful information for the end user. Plus the poor sysadmin gets an ugly stacktrace in server.log.

As of this PR, we send XML responses with meaningful errors, like this:

/oai?foo=bar will show "No argument 'verb' found"

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2024-01-04T21:29:01Z</responseDate>
  <request>http://localhost:8080/oai</request>
  <error code="badArgument">No argument 'verb' found</error>
</OAI-PMH>

/oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2024-01-04T21:29:01Z</responseDate>
  <request>http://localhost:8080/oai</request>
  <error code="badArgument">Verb must be singular, given: '[foo, bar]'</error>
</OAI-PMH>

Which issue(s) this PR closes:

Special notes for your reviewer:

In Slack I checked with @landreev and he's fine with the 200 response. "You have successfully queried the OAI server, and here’s your answer" even though the answer is "there was an error." This follows an established pattern set by (at least) "no set found", which I added a test for.

Due to how the xoai library is written, I have to return <error code="badArgument"> rather than ` as the spec requires. ( http://www.openarchives.org/OAI/openarchivesprotocol.html#ErrorConditions says "Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated."). However, Leonid and I are ok with deferring this as it requires work on the library side.

I updated the docker compose file so I could test this.

Related:

Suggestions on how to test this:

Play around with variations on these:

  • /oai?foo=bar will show "No argument 'verb' found"
  • /oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"

Confirm I didn't break anything (HarvestingServerIT passes for me locally.)

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

None.

@pdurbin pdurbin self-assigned this Jan 4, 2024
@coveralls
Copy link

coveralls commented Jan 4, 2024

Coverage Status

coverage: 20.152% (-0.001%) from 20.153%
when pulling edd6fc8 on 9275-harvest-invalid-query-params
into e5e232d on develop.

This comment has been minimized.

@pdurbin pdurbin added the Component: Containers Anything related to cloudy Dataverse, shipped in containers. label Jan 4, 2024
@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Jan 4, 2024

This comment has been minimized.

@pdurbin pdurbin marked this pull request as ready for review January 4, 2024 21:50
@pdurbin pdurbin removed their assignment Jan 4, 2024
@pdurbin pdurbin changed the title WIP: /oai?foo=bar (invalid params) OAI-PMH error handling: meaningful errors in XML for verb processing Jan 4, 2024
@landreev landreev self-requested a review January 5, 2024 13:21
@landreev landreev self-assigned this Jan 5, 2024
@landreev landreev removed their assignment Jan 5, 2024
@pdurbin pdurbin self-assigned this Jan 5, 2024
dataProvider.handle(params) allows us to return the correct error.
@pdurbin pdurbin removed their assignment Jan 8, 2024

This comment has been minimized.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

@jp-tosca jp-tosca self-assigned this Jan 9, 2024

This comment has been minimized.

@pdurbin
Copy link
Member Author

pdurbin commented Jan 12, 2024

@landreev from https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10205/3/testReport/ we're seeing this:

Screenshot 2024-01-12 at 3 41 35 PM

Any insight on this? I could swear I've seen this switch from noRecordsMatch to noSetHierarchy and back again, but that doesn't make sense. It should be one or the other, right?

@pdurbin pdurbin self-assigned this Jan 12, 2024
@pdurbin
Copy link
Member Author

pdurbin commented Jan 16, 2024

I just merged develop to kick off another Jenkins run to get another data point. I'm not sure if the error is flipping back and forth or not. Let's see.

This comment has been minimized.

@pdurbin
Copy link
Member Author

pdurbin commented Jan 16, 2024

In 30e357b I changed the "no such set" test to expect noSetHierarchy rather than noRecordsMatch.

This comment has been minimized.

@landreev
Copy link
Contributor

If the set does not exist, then I would expect it to consistently return "no set hierarchy". I can't really think of a timing issue or anything like that, that would be causing a "no records match" intermittently.

@landreev
Copy link
Contributor

Wait, I was clearly wrong about the above.
You only get the "no set hierarchy" when there are NO sets on this server, neither the requested set, nor any others.
So yes, the response will be different, depending on what else the developer has in their database/whether other tests have already run.

You know, I would go as far as to suggest to drop this test from the PR. Because
a) it's a bit out of scope - since the issue was specifically about a 500 for a bad verb request
b) I have another PR currently in review/QA, #10228 that changes the behavior of ListSets slightly that may actually conflict with this.

This comment has been minimized.

@landreev
Copy link
Contributor

... I should've said, I have 2 PRs in the works, #10222 and #10228 that change the behavior of both ListIdentifiers and ListSets, that may conflict with this test.

@landreev
Copy link
Contributor

... Although feel free to just modify the test, to expect either of the 2 error codes, "no sets" or "no records matching"; up to you.
Also, should we pick a weirder set name than "census" - in case a developer running the tests happens to have a set named that? 😄

@landreev
Copy link
Contributor

(sorry for the confusion; too many OAI PRs going on in parallel...)

@pdurbin
Copy link
Member Author

pdurbin commented Jan 16, 2024

So yes, the response will be different, depending on what else the developer has in their database/whether other tests have already run.

That's enough to convince me to simply drop the test, which I did in edd6fc8.

Thanks for all the feedback, @landreev

@pdurbin pdurbin removed their assignment Jan 16, 2024
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9275-harvest-invalid-query-params
ghcr.io/gdcc/configbaker:9275-harvest-invalid-query-params

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@pdurbin
Copy link
Member Author

pdurbin commented Jan 16, 2024

Tests are passing: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10205/7/testReport/edu.harvard.iq.dataverse.api/

@landreev there's probably no need for you to be assigned to this PR. Thanks for the feedback. I'll remove you.

@jp-tosca jp-tosca merged commit f74e4c1 into develop Jan 17, 2024
19 checks passed
@jp-tosca jp-tosca removed their assignment Jan 17, 2024
@pdurbin pdurbin added this to the 6.2 milestone Jan 17, 2024
@pdurbin pdurbin deleted the 9275-harvest-invalid-query-params branch January 17, 2024 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Containers Anything related to cloudy Dataverse, shipped in containers. Feature: Harvesting Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Malformed OAI-PMH requests generate an internal server error
5 participants