ORCID integration in Dataverse #4236
One of the best ways to encourage ORCID adoption and the association of data with ORCID iDs is to enable ORCID federated sign-on. This is similar to Google/FB sign-on. A growing number of Open Science systems have adopted this approach so that researchers can use SSO across different systems and platforms. One example is seen here: https://www.ariessys.com/views-and-press/resources/video-library/orcid-single-sign-on/ |
@Richardcwynne Are you thinking of something different than the ORCID authentication that Dataverse already supports? |
Wow you have done it already! Fantastic!! Sorry I missed that. Richard. |
@Richardcwynne no worries! Please see "Dataverse supports three OAuth providers: ORCID, GitHub, and Google" at http://guides.dataverse.org/en/4.9/installation/oauth2.html |
The authentication with ORCID was delivered in 4.6.1, but this issue requests additional features. As @philippconzett pointed out, there was a discussion in the Dataverse Users Community Google Group, What else does ORCID Integration give you besides Login?. @Richardcwynne any other suggestions for ORCID related features are welcome. If this specific GitHub issue does not cover a use case of yours, please feel free to open a new issue describing it. |
I just want to emphasize that this issue is about retrieving ORCID information for a third party to be used in the metadata, not OAuth authentication, as @philippconzett and @mheppler pointed out. We also talked about ORCID as an example source when I visited this week's tech hour. It was about #4772, regarding the referencing of well-defined objects (people, vocabulary terms, or any other type of object through lists generated by external API calls) in the metadata. It seemed impractical to query for the name and institution of each returned search result (n additional requests instead of 1) by using the Fetch record details API. The corresponding discussion in the orcid-api-users Google group does not offer a solution to this by just using the search API. It might make sense to cache the results (as they keep narrowing down) until the ORCID API is improved to return ID, name and institution (to make sure the match is correct) for the search API call: https://pub.sandbox.orcid.org/v2.1/search/?q=John-Doe ( -> use header "Accept: application/json"). I hope that the use of ORCID to find a person for a metadata field can be solved by a generalized way to reference data from API sources in the metadata, as described in the to be added issue. |
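A minimal sketch of the search call described above, using only Python's standard library. The URL and the Accept header come from the comment; the function names and the sample response body are hypothetical, and the response shape (a `result` list of `orcid-identifier` objects with a `path` field) is an assumption about the v2.1 search API that should be checked against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://pub.sandbox.orcid.org/v2.1/search/"


def build_search_request(query):
    """Build (but do not send) a search request that asks for JSON."""
    url = SEARCH_URL + "?" + urlencode({"q": query})
    return Request(url, headers={"Accept": "application/json"})


def extract_orcid_ids(response_text):
    """Pull the bare iDs out of a search response body (assumed shape)."""
    payload = json.loads(response_text)
    results = payload.get("result") or []
    return [item["orcid-identifier"]["path"] for item in results]


# Hypothetical response body, for illustration only.
SAMPLE = (
    '{"result": [{"orcid-identifier": {"path": "0000-0002-1825-0097"}}],'
    ' "num-found": 1}'
)

request = build_search_request("John-Doe")
ids = extract_orcid_ids(SAMPLE)
```

The search result carries only iDs under this assumption, so showing a name and institution for each hit would take one extra record request per iD. That is the cost the comment describes, and it is why caching those per-iD lookups could help.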
I'm wondering if this issue could be split into two or three, even though they're very related.

Using ORCID IDs to search for dataset authors

For the first feature @philippconzett wrote about, making it easier for users to search for dataset authors using dataset author ORCID IDs, I interpret this a few ways. You can search for ORCID IDs using the basic search box, but you need to add quotes around the ID numbers. If you search without adding quotes, the results make me think that the search engine is treating the hyphens between each group of numbers like spaces and searching for four strings instead of one. For example, if you search for 0000-1111-2222-3333, the results will include datasets that have only 0000 in their metadata (and the search engine isn't ranking results that have the entire string as most relevant). Could something be done, besides adding quotes, to make the search treat the whole ID as one string or treat results that have the whole string as most relevant?

If we wanted to add the author identifier field to the advanced search, wouldn't we just need to edit the citation.tsv file so that for authorIdentifier, advancedSearch is TRUE?: (And then just follow the other steps in the Metadata Customization guide, e.g. loading tsv file, updating solr schema?)

Pre-filling the citation block's author identifier field with ORCID IDs

When I log into Dataverse using my ORCID account and then create a dataset, my ORCID ID in my Dataverse account is also pre-filled in the dataset's author identifier field. I think @RightInTwo's comment is more about Dataverse using metadata it already has, like author name, to recommend ORCID IDs by pulling those IDs from an external source (ORCID's database). So when I create a dataset and add an author name, Dataverse suggests ORCID IDs that might belong to that author and that I can then add to the author identifier field. Is that how this would work from the depositor's perspective? |
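On the quoting workaround for hyphenated ORCID IDs mentioned in this comment: one option, sketched here as an assumption rather than as how Dataverse's search works, would be to pre-process the query and wrap anything shaped like an ORCID ID in double quotes before it reaches the search engine. The function name and the regular expression are hypothetical.

```python
import re

# Four hyphen-separated groups of four characters; the last one may be X.
# The lookarounds skip tokens that are already quoted or that are part of
# a longer word or number.
ORCID_PATTERN = re.compile(
    r'(?<!["\w-])(\d{4}-\d{4}-\d{4}-\d{3}[\dXx])(?!["\w-])'
)


def quote_orcid_ids(query):
    """Wrap bare ORCID-shaped tokens in double quotes; leave the rest alone."""
    return ORCID_PATTERN.sub(r'"\1"', query)


quoted = quote_orcid_ids("0000-1111-2222-3333")
```

This only changes how the query is phrased. Whether whole-string matches are ranked first still depends on how the search index tokenizes the field.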
I think "metadata it already has" is a bit misleading, because at the point of adding authors, Dataverse does not necessarily have any metadata yet. But otherwise, yes! |
Yes. Absolutely. Probably even more. The smaller the better. Small items move across the board faster.
We could probably fix this by changing
Judging from https://github.com/IQSS/dataverse/blob/v4.15/src/main/java/edu/harvard/iq/dataverse/search/AdvancedSearchPage.java#L70 I think so but I haven't tried it.
As someone who logs in to Harvard Dataverse but who has an ORCID ID (without an "x" @philippconzett 😄 ) I'd love to be able to add it to my user profile. Currently, this only happens for people who log in with ORCID. I don't believe the following issue has been mentioned yet but it's an integration I think we should consider: update users' ORCID record on dataset publication #3490 |
FWIW, on the SEAD project we created an input widget that allowed users to start typing name, email, or ORCID digits and we supported autocomplete for any we had seen before, storing the ORCID as the value, but displaying name as a link to the ORCID page, and showing the email as a pop-up (iff the email in the ORCID profile was public). For new ORCIDs (ones we hadn't seen), you had to type them in but, once used, we queried ORCID to be able to display name,email, ORCID. This made it reasonably useful without us having to provide search over all ORCIDs. |
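A rough sketch of the "autocomplete over ORCIDs we have seen before" idea, not the SEAD implementation. The class and field names are hypothetical and the people in the usage lines are made up for illustration. It keeps an in-memory store keyed by ORCID and matches what the user has typed against name, public email, or ORCID digits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    orcid: str
    name: str
    email: Optional[str] = None  # None when the email is not public


class SeenPeople:
    """In-memory store of people whose ORCID has been used before."""

    def __init__(self):
        self._by_orcid = {}

    def remember(self, person):
        self._by_orcid[person.orcid] = person

    def suggest(self, typed, limit=10):
        """Return people whose name, email, or ORCID digits match the input."""
        needle = typed.strip().lower()
        if not needle:
            return []
        digits = needle.replace("-", "")
        matches = []
        for person in self._by_orcid.values():
            compact_orcid = person.orcid.replace("-", "").lower()
            if (
                needle in person.name.lower()
                or (person.email and person.email.lower().startswith(needle))
                or (digits and digits in compact_orcid)
            ):
                matches.append(person)
            if len(matches) >= limit:
                break
        return matches


people = SeenPeople()
people.remember(Person("0000-0002-1825-0097", "Ada Example", "ada@example.org"))
people.remember(Person("0000-0002-1694-233X", "Ben Sample"))
by_name = people.suggest("ada")
by_digits = people.suggest("1694-233")
```

The stored value is the ORCID, while the name and email are only there for display. That matches the description above of storing the ORCID as the value and showing the name as a link.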
@qqmyers I think your explanation is clear enough and it sounds like a great feature! Now we just need someone to code it up. 😄 @philippconzett I still agree that this issue should perhaps be broken into smaller, more clearly defined issues. We use the term "small chunks" for this. Small chunks move more quickly across our board: https://github.com/orgs/IQSS/projects/2 😄 Or maybe you could simply adjust the title of this issue to make it more specific? |
@pdurbin Makes sense to me to break up this issue into smaller ones. I think I'll concentrate on the part that is about populating the ORCID field with the value stored in the Account Information. Before I create a new issue or rename this one, I'd like to ask some questions. @jggautier explains above how the ORCID field in a dataset is automatically filled in based on the Account Information. That's nice! But in my Account Information, there is no ORCID field. This is probably because this information is fetched from our SSO provider? I cannot edit the Account Information in Dataverse either. When I look at the Account Information in my locally created test account on demo.dataverse.org, I cannot find the ORCID field either. So, how does one get the ORCID information into the Account Information in the first place? Is this only available when one uses sign-up / sign-in via ORCID? |
Yes, the ORCID ID is only stored for people who have authenticated with ORCID when logging into Dataverse. I believe it's stored in the persistentuserid column of the authenticateduserlookup table. |
Thanks, @poikilotherm + @pdurbin. Would it be possible to get the ORCID into the Dataverse Account Information also when signing up via institutional SSO? If an institution provides ORCIDs for all its researchers, it would be nice to get this information into Dataverse along with other information such as affiliation. |
@philippconzett I like to say that anything is possible with code. 😄 There are a few steps:
In short, the feature hinges on how strict you want to be about confirming that an individual "owns" the ORCID ID they say they do (like confirming an email address). I hope this helps. |
For such integrations ORCID offers API endpoints for members. You can retrieve data from ORCID (or send yours) from (to) the users profile there via their XML based REST API. This needs OAuth support on our side, but would ensure that we receive the ORCID from a trusted and validated source (vs any random human errors with such long IDs...). |
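On the risk of typing errors in such long IDs: the last character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, which is why it can be an "X". A minimal sketch of a local format and checksum test follows; the function names are hypothetical. It can catch most typos in manually entered IDs, but it says nothing about whether the person entering the ID owns it, so it is not a substitute for the OAuth route.

```python
import re

ORCID_SHAPE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """True if the string has the ORCID shape and a matching check character."""
    candidate = orcid.strip().upper()
    if not ORCID_SHAPE.fullmatch(candidate):
        return False
    compact = candidate.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


ok = is_well_formed_orcid("0000-0002-1825-0097")
typo = is_well_formed_orcid("0000-0002-1825-0098")
```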
Hi all, I was looking into the status of ORCID within Dataverse and came across this issue. Great to see the discussion and consideration. Indeed, the best practice is to gather authenticated ORCID iDs from individuals via OAuth, rather than allowing for manual entry, which has room for error. The ORCID public API can be used to gather authenticated iDs and read public data from ORCID. The ORCID member API allows for the same but also for writing data to people's ORCID records. It all depends on the scopes that are used in the ORCID auth URL. Note that currently institutions cannot provide ORCID iDs for their affiliated researchers; only individuals can register for their own ORCID iDs. So, to get ORCID iDs into the system you would really need the individuals to connect their ORCID iD via OAuth. |
@sheilarabun: Could you please explain how researchers can "connect their ORCID iD via OAuth"? |
@philippconzett Yes absolutely! Aside from the below explanation, ORCID has more detailed documentation, initially: From the researcher/user perspective:
The access token is what would allow subsequent API calls to either import data from or write data to the person's ORCID record. Here is a list of all of the possible data points that could be included on an ORCID record: https://www.lyrasis.org/Leadership/Pages/ORCID-Data-Fields.aspx You can try out the basic OAuth process in the ORCID sandbox here: https://members.orcid.org/api/oauth/presenting-oauth#try-it One option that might make sense, is to have Dataverse be an ORCID service provider, where institutions that are using their own installation of Dataverse and are also organizational ORCID members could use their own ORCID member API credentials to enable this functionality. I'm happy to chat more if there are additional questions. |
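To make the OAuth step concrete, here is a minimal sketch of the two requests a client application would prepare, written against the ORCID sandbox. The endpoint paths and the `/authenticate` scope reflect ORCID's documented OAuth flow as I understand it, so treat them as assumptions to verify. The function names, client ID, secret, and redirect URI are placeholders, and nothing here is sent over the network.

```python
from urllib.parse import urlencode

# Sandbox endpoints (assumed; production would use orcid.org instead).
AUTHORIZE_URL = "https://sandbox.orcid.org/oauth/authorize"
TOKEN_URL = "https://sandbox.orcid.org/oauth/token"


def build_authorize_url(client_id, redirect_uri, scope="/authenticate"):
    """URL the researcher is sent to in order to sign in and grant access."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "scope": scope,
        "redirect_uri": redirect_uri,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Endpoint and form fields for exchanging the returned code for a token."""
    form = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }
    return TOKEN_URL, form


# Placeholder credentials, for illustration only.
auth_url = build_authorize_url("APP-EXAMPLE", "https://dataverse.example.edu/callback")
token_url, token_form = build_token_request(
    "APP-EXAMPLE", "example-secret", "abc123", "https://dataverse.example.edu/callback"
)
```

The scope is what decides between the public and member API behaviour described earlier in the thread: `/authenticate` only yields the authenticated iD, while reading limited data or writing to a record needs member API scopes.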
Thanks, @sheilarabun, for this in-depth explanation! In our Dataverse-run repository (DataverseNO), we basically have two types of users:
A. Researchers in group 1 we want to sign up / log in through Feide. DataverseNO has already implemented A. If I'm not mistaken, B is also possible in Dataverse. I guess for DataverseNO, we would have to combine B with some other process where researchers of group 2 must specify which collection / sub-dataverse they need to have deposit access to. As for C, I'm not sure if this would work the same way for researchers of both group 1 and 2. |
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment. |
Some of the researchers depositing their data in our Dataverse installation use different author name variants, e.g. Eckhoff, Hanne vs. Eckhoff Hanne M. In some cases they prefer to use the same variants as in the article publication that is based on the dataset. This inconsistency in author names makes it somewhat difficult to use the author field for searching and filtering. The "same" author (with different name variants) appears several times in the Dataverse search, and, I guess, also in other search engines that harvest DataCite.
Dataverse already provides an ORCID field in the Citation Metadata section. But as far as I can see, this field is not available for search/filtering in a user-friendly way through the GUI. I suggest that ORCID should be used in future versions of Dataverse to enable unique searching and filtering for author names in a user-friendly GUI.
See also this discussion on Dataverse Google Group.
In addition, it should be possible to make the ORCID field in the Citation Metadata section pre-filled using the ORCID field in account information.
Best,
Philipp