Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File DOI registration with EZID does not make DOIs public #4828

Closed
qqmyers opened this issue Jul 10, 2018 · 24 comments
Closed

File DOI registration with EZID does not make DOIs public #4828

qqmyers opened this issue Jul 10, 2018 · 24 comments
Assignees

Comments

@qqmyers
Copy link
Member

qqmyers commented Jul 10, 2018

Using the ezid test server, I was able to create file DOIs with the registerDataFileAll api call but the resulting DOIs are 'reserved' rather than 'public'. I was able to track this to the logic in RegisterDvObjectCommand which only does setIdentifierRegistered(true); after the call to publicizeIdentifier. For the DOIEZIdServiceBean, publicize tries to re-createIdentifier() if isIdentifierRegistered() is not true, which fails (identifier already exists). While EZID is on its way out, the problem would affect any registerWhenPublished() == false id provider.

I have a fix that I will post, but it may be a few days. In the meantime, I think going to 4.9.1 with EZID and running registerDataFileAll will leave identifiers in 'reserved' and throw error messages about duplicate ids in the log.

@pdurbin
Copy link
Member

pdurbin commented Jul 10, 2018

@qqmyers thanks for making pull request #4830. I just moved this issue to code review at https://waffle.io/IQSS/dataverse

@djbrooke djbrooke assigned sekmiller and pdurbin and unassigned sekmiller Jul 11, 2018
@pdurbin pdurbin removed their assignment Jul 13, 2018
@pdurbin
Copy link
Member

pdurbin commented Jul 13, 2018

@qqmyers I'm sorry to do this but can you either fix up pull request #4830 or make a new one that doesn't have the whitespace changes? It's difficult to see what you changed.

@pdurbin
Copy link
Member

pdurbin commented Jul 20, 2018

@qqmyers I just remembered that by adding ?w=1 to the URL like https://github.com/IQSS/dataverse/pull/4830/files?w=1 I can see the "diff" without the whitespace changes:

screen shot 2018-07-20 at 1 53 55 pm

However, now there's a merge conflict with src/main/java/edu/harvard/iq/dataverse/engine/command/impl/RegisterDvObjectCommand.java

@pdurbin
Copy link
Member

pdurbin commented Jul 23, 2018

@qqmyers I see you resolved the merge conflict. Thanks! I assume you're ready for code review, but if not, please let us know!

@qqmyers
Copy link
Member Author

qqmyers commented Jul 23, 2018

+1 - ready to go.

pdurbin added a commit to QualitativeDataRepository/dataverse that referenced this issue Jul 24, 2018
@pdurbin
Copy link
Member

pdurbin commented Jul 24, 2018

@qqmyers cool. I hope you don't mind but in 8712c94 I reverted the whitespace changes to make it easier to code review the pull request.

@qqmyers
Copy link
Member Author

qqmyers commented Jul 24, 2018

That's great. I try not to change whitespace but was rushed with that commit.

@pdurbin
Copy link
Member

pdurbin commented Jul 26, 2018

@qqmyers I see you moved those methods within the null check. Thanks. @sekmiller and I just talked about this issue and he'll take a look at your pull request.

@sekmiller sekmiller removed their assignment Jul 27, 2018
@kcondon kcondon self-assigned this Jul 27, 2018
@djbrooke djbrooke assigned sekmiller and unassigned kcondon Jul 31, 2018
@kcondon
Copy link
Contributor

kcondon commented Jul 31, 2018

@qqmyers I've tested this and found that though it does not register unpublished files, it still registers published files as reserved rather than public

@qqmyers
Copy link
Member Author

qqmyers commented Jul 31, 2018

@kcondon - Hmm - we just ran ezid publication of files on qdr and I verified that they were public. Our deployed code is more like the initial commits here, so perhaps something in the shuffle of code inside if statements broke it. I'm on travel, but will look again as I can to see if I can spot an issue.

@qqmyers
Copy link
Member Author

qqmyers commented Jul 31, 2018

FWIW - the code at https://github.com/QualitativeDataRepository/dataverse/blob/v4.9.1-qdr-4777/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/RegisterDvObjectCommand.java works for QDR. Other than moving the updates inside the test to see if an identifier exists, and merging with the change to use the GlobalIdServiceBean class, I don't see any difference that would cause it not to work with ezid...

@kcondon
Copy link
Contributor

kcondon commented Aug 1, 2018

@qqmyers I think the branch you just mentioned is different from the one in the PR, no?
QualitativeDataRepository:4828-DOI_registration_with_EZID_does_not_make_DOIs_public

@qqmyers
Copy link
Member Author

qqmyers commented Aug 1, 2018

@kcondon Yes. The URL in my comment is to QDR's current deployment. My point was that it works and I don't see anything different enough from the PR to cause what you're seeing. Unless you can see something, it would have to be something in the recent merge or some config issue, etc. The only guess I have would be that this change won't update existing reserved dois to public, so if some had been created before this patch, you'd still see them.
When I get back from travel, I can try updating the qdr branch to use exactly the same code as the PR and see if that reproduces the issue.

@djbrooke djbrooke assigned kcondon and unassigned sekmiller Aug 2, 2018
@kcondon
Copy link
Contributor

kcondon commented Aug 3, 2018

OK, thanks. I've retested it and see the same "reserved" behavior.
I'm attaching a copy of my test db for you. It was created using 4.8.6, 4 datasets, 2 published, 2 unpublished, upgraded to 4.9.1, before running register endpoint. I'll also attach a copy after running it.
testdb.zip
testdbpostapi.zip

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

@kcondon - Thanks. What are you seeing in the log? For the original problem, there were warnings from EZID about the DOI already existing (because publicize calls create if the id isn't registered and create then fails.). With Fine logging, publicize adds a message but by default you should be seeing the warnings from within the create call.

@kcondon
Copy link
Contributor

kcondon commented Aug 3, 2018

[2018-08-02T18:18:14.611-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294611] [levelValue: 800] [[
Starting to register: analyzing 11 files. Thu Aug 02 18:18:14 EDT 2018]]

[2018-08-02T18:18:14.612-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294612] [levelValue: 800] [[
Only unregistered, published files will be registered.]]

[2018-08-02T18:18:14.657-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.GlobalIdServiceBean] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294657] [levelValue: 1000] [[
Unknown protocol: null]]

[2018-08-02T18:18:14.709-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.GlobalIdServiceBean] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294709] [levelValue: 1000] [[
Unknown protocol: null]]

[2018-08-02T18:18:14.715-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294715] [levelValue: 800] [[
1 of 11 files not yet published]]

[2018-08-02T18:18:14.715-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294715] [levelValue: 800] [[
2 of 11 files not yet published]]

[2018-08-02T18:18:14.715-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294715] [levelValue: 800] [[
3 of 11 files not yet published]]

[2018-08-02T18:18:14.747-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.GlobalIdServiceBean] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294747] [levelValue: 1000] [[
Unknown protocol: null]]

[2018-08-02T18:18:14.781-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.GlobalIdServiceBean] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294781] [levelValue: 1000] [[
Unknown protocol: null]]

[2018-08-02T18:18:14.814-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.GlobalIdServiceBean] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294814] [levelValue: 1000] [[
Unknown protocol: null]]

[2018-08-02T18:18:14.821-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294821] [levelValue: 800] [[
4 of 11 files not yet published]]

[2018-08-02T18:18:14.821-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294821] [levelValue: 800] [[
5 of 11 files not yet published]]

[2018-08-02T18:18:14.821-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294821] [levelValue: 800] [[
6 of 11 files not yet published]]

[2018-08-02T18:18:14.821-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294821] [levelValue: 800] [[
Final Results:]]

[2018-08-02T18:18:14.822-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294822] [levelValue: 800] [[
0 of 11 files were already registered. Thu Aug 02 18:18:14 EDT 2018]]

[2018-08-02T18:18:14.822-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294822] [levelValue: 800] [[
6 of 11 files are not yet published. Thu Aug 02 18:18:14 EDT 2018]]

[2018-08-02T18:18:14.822-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294822] [levelValue: 800] [[
5 of 11 unregistered, published files to register. Thu Aug 02 18:18:14 EDT 2018]]

[2018-08-02T18:18:14.822-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.api.Admin] [tid: _ThreadID=35 _ThreadName=http-listener-1(5)] [timeMillis: 1533248294822] [levelValue: 800] [[
5 of 5 unregistered, published files registered successfully. Thu Aug 02 18:18:14 EDT 2018]]

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

@kcondon - OK - a good clue in the log: it looks like the protocol from the dvObject coming in is null and due to that, the new code in GlobalIdServiceBean is returning null instead of a bean - you've got the SEVERE log message about that. Any exception from that would get caught and ignored in the command class. Not sure I understand how you get reserved ids at all yet, but something to dig into now...

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

@kcondon - When you said 'reserved' - did you just mean that the doi is in the DB or that you could see the DOI in EZID as 'reserved'?. Hopefully the former - I can see how the DOI would get generated before the null bean would cause an error and will add a fix in the PR, but if EZID is actually getting contacted, something else is going on. FWIW - this PR fixes the issue where EZID is contacted but the DOI is never transitioned from their 'reserved' to 'public' state.

@kcondon
Copy link
Contributor

kcondon commented Aug 3, 2018

@qqmyers I could see the doi in EZID as reserved

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

@kcondon - Hmm - I don't understand that unless some other part of the code sees the generated Id and calls EZID . Can you try the modification I just committed to the PR? It should allow the command to succeed and create IDs and make them public at EZID, and get rid of the SEVERE message in the logs. If it works, I can look a bit more and see if I can find if there is somewhere else EZID could be getting called to create but not publish/make public the DOI.

@kcondon
Copy link
Contributor

kcondon commented Aug 3, 2018

@qqmyers your branch fails to build -refresh from /develop since we've fixed a failing test today

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

@kcondon - its merged...

@kcondon
Copy link
Contributor

kcondon commented Aug 3, 2018

@qqmyers Your last fixes have resolved the issue. I'll merge this now. Thanks!

@qqmyers
Copy link
Member Author

qqmyers commented Aug 3, 2018

Yeah! So ftr the second commit handles a change between IdServiceBean and GlobalIdServiceBean where the latter no longer handles a null protocol parameter (by finding the currently configured one in the context) in the getBean() method. I did a quick search back through other places that call getBean() and I think all of the others handle situations where the protocol will always be defined. It's only for file during publication that their protocols will be null.

@djbrooke djbrooke added this to the 4.9.2 - Stata Upgrades, etc. milestone Aug 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants