Service provider: ListIdentifierHandler (?) truncating resumptionToken #170
Comments
Hi there, thanks for opening the bug report. To make the process easier: are you using the library on its own or within Dataverse? Thank you!
Sorry for the short description of my problem. I'm using the library on its own, not within Dataverse. I'm trying to harvest all of the identifiers from a specific library that exposes the OAI-PMH protocol. The response I get from the OAI endpoint looks like this:
(You can probably get a similar response by pasting the URL into a browser.) The resumption token differs only in the number after LAST_ITEM. After I get my portion of identifiers and the resumptionToken for continuing the harvest, I prepare another URL, this time with the resumptionToken parameter, which looks like this: http://dlibra.umcs.lublin.pl/dlibra/oai-pmh-repository.xml?verb=ListIdentifiers&resumptionToken=F889BBA1B1F73EE9FD88BBF0F39ABDA7ListIdentifiers1691757140380_DL_LAST_ITEM_50_DL_METADATA_mets and I get the next portion of identifiers. Processing goes fine, but when the app reaches the identifier list whose resumptionToken contains _LAST_ITEM_1600, the resumptionToken is truncated. The shortened resumptionToken is passed as a parameter to my own CustomOaiClient (the same happens with JdkHttpOaiClient). I then create the URL from the parameters like this:
and with that URL I request more identifiers. The URL looks like this: http://dlibra.umcs.lublin.pl/dlibra/oai-pmh-repository.xml?verb=ListIdentifiers&resumptionToken=F889BBA1B1F73EE9FD88BBF0F39ABDA7ListIdentifiers1691757140380_DL_LAST_ITEM_1600_DL_ The METADATA_mets part is missing, and because of that I get an exception. While debugging, I checked whether I had accidentally truncated the token myself while preparing the request. The resumptionToken is already truncated here: Line 69 in 6808361
After that, this text variable is assigned to the resumptionToken variable, and the resumptionToken is passed as a parameter to the next request. It's hard to tell what could be truncating this specific resumptionToken. The token is not significantly longer than the others, nor different in any way I can see. That's why I'm asking whether you can tell me what might be going wrong here, or whether it's simply a bug. The token arrives intact in the OAI-PMH response; it's only the extraction of the token from the response into Java code that does something unexpected.
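The follow-up request described above can be sketched as follows. This is only an illustration, not the code from the report: `ResumeUrlBuilder` and `buildResumeUrl` are hypothetical names, and the sketch assumes Java 10 or later for `URLEncoder.encode(String, Charset)`. It relies on the OAI-PMH rule that `resumptionToken` is an exclusive argument, so only `verb` and the token are sent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of xoai: builds the URL for a resumed
// ListIdentifiers request from a base URL, a verb and a resumptionToken.
final class ResumeUrlBuilder {

    static String buildResumeUrl(String baseUrl, String verb, String resumptionToken) {
        // resumptionToken is an exclusive argument in OAI-PMH, so no other
        // parameters (metadataPrefix, from, until, set) are appended.
        // The token is URL-encoded in case a repository uses reserved characters.
        return baseUrl
                + "?verb=" + URLEncoder.encode(verb, StandardCharsets.UTF_8)
                + "&resumptionToken=" + URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildResumeUrl(
                "http://dlibra.umcs.lublin.pl/dlibra/oai-pmh-repository.xml",
                "ListIdentifiers",
                "F889BBA1B1F73EE9FD88BBF0F39ABDA7ListIdentifiers1691757140380_DL_LAST_ITEM_50_DL_METADATA_mets");
        System.out.println(url);
    }
}
```

Because the tokens in this thread contain only letters, digits and underscores, encoding leaves them unchanged, so a truncated token in the URL means the token was already truncated before the URL was built.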
Hi guys, hope you're doing well. I wanted to draw your attention back to the GitHub issue I raised, which seems to have slipped off the radar. Your insights and expertise on this matter would be really helpful in making progress. Looking forward to your involvement in resolving this. Thanks!
Sorry, we were (and still are) pretty busy preparing Dataverse 6.0, so there were no cycles left to address this. If you want to dive into this on your own, please feel free to give it a go! PRs are much appreciated! Personally, I'd try to record the HTTP data exchange and put it into WireMock, so we can test against it and also keep the test around for the future.
Let me add a quick comment: I spent a few cycles yesterday trying to create a reproducer. I wasn't really able to pin down repeatable failure conditions. I will push it soon so someone can play around with it some more.
Here's something to play around with: https://github.com/gdcc/xoai/tree/170-reproducer I played with different combos of parameters, but was not able to reliably reproduce the problem in https://github.com/gdcc/xoai/blob/170-reproducer/xoai-service-provider/src/test/java/io/gdcc/xoai/serviceprovider/reproducers/Issue170IT.java @Ajmma if you can provide a combo that works, please let me know!
Hello, I've run into a weird issue during a full-range identifier harvest of http://dlibra.umcs.lublin.pl
I've investigated this exception and figured out that the resumptionToken is truncated at some point during processing.
Each value assigned to the resumptionToken variable in the ListIdentifierHandler class looks like this:
2261E6AEAC7E55ECC864C955C7231E63ListIdentifiers1691656059787_DL_LAST_ITEM_50_DL_METADATA_mets
2261E6AEAC7E55ECC864C955C7231E63ListIdentifiers1691656059787_DL_LAST_ITEM_100_DL_METADATA_mets
...
2261E6AEAC7E55ECC864C955C7231E63ListIdentifiers1691656059787_DL_LAST_ITEM_1500_DL_METADATA_mets
2261E6AEAC7E55ECC864C955C7231E63ListIdentifiers1691656059787_DL_LAST_ITEM_1550_DL_METADATA_mets
2261E6AEAC7E55ECC864C955C7231E63ListIdentifiers1691656059787_DL_LAST_ITEM_1600_DL_
Every time, the resumptionToken is truncated at the same point: the token with last item = 1600.
The resumption token from this source for 1600 looks like this:
<resumptionToken completeListSize="45695" cursor="1550" expirationDate="2023-08-10T11:12:09Z">4ECCFC571D6632484E8D04ECFF3214A3ListIdentifiers1691656753766_DL_LAST_ITEM_1600_DL_METADATA_mets</resumptionToken>
Can you guys check it out? I would be grateful.
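One possible explanation, which is not confirmed anywhere in this thread, is that a streaming XML parser may deliver the text of a single element in more than one character event, so code that reads only the first event sees a cut-off value. The sketch below shows a defensive way to read the token with the JDK's StAX API. `ResumptionTokenReader` and `readResumptionToken` are hypothetical names and this is not the xoai implementation; `getElementText()` and `IS_COALESCING` are standard `javax.xml.stream` features.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of xoai: extracts the complete text of the
// first <resumptionToken> element from an OAI-PMH response.
final class ResumptionTokenReader {

    static String readResumptionToken(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "resumptionToken".equals(reader.getLocalName())) {
                        // getElementText() concatenates all text up to the
                        // matching end tag, however many events it spans.
                        return reader.getElementText();
                    }
                }
                return null; // no resumptionToken element in the response
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse OAI-PMH response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ListIdentifiers><resumptionToken completeListSize=\"45695\" cursor=\"1550\" "
                + "expirationDate=\"2023-08-10T11:12:09Z\">"
                + "4ECCFC571D6632484E8D04ECFF3214A3ListIdentifiers1691656753766_DL_LAST_ITEM_1600_DL_METADATA_mets"
                + "</resumptionToken></ListIdentifiers>";
        System.out.println(readResumptionToken(xml));
    }
}
```

Comparing the value returned this way with the value seen in ListIdentifierHandler for the 1600 token would show whether the truncation happens while reading the element text or somewhere later.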