
Implementing OAuth provider Dataporten #4334

Closed
xibriz opened this issue Dec 1, 2017 · 25 comments

@xibriz
Contributor

xibriz commented Dec 1, 2017

A little background information: In Norway, a large part of the public sector uses a national authentication provider. You may have heard of Feide, which integrates with Dataverse today through Shibboleth. Dataporten is the next generation, providing OAuth2 authentication.

I have been working on integrating Dataporten in Dataverse, code here: https://github.com/uit-no/dataverse/tree/dataporten

This will benefit all higher education institutions in Norway. I propose merging my code so I don't have to maintain a fork.

I can probably provide a test setup with client credentials and guest accounts if that will tip the answer in our favor.

@xibriz
Contributor Author

xibriz commented Dec 1, 2017

I have got a go-ahead to set up your test environment as an application in Dataporten. I just need your Redirect URI, and then I can provide you with all the necessary information, such as the Client ID and Secret, test users for login, and a dataporten.json file with the userEndpoint.
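For readers less familiar with these values: in the OAuth 2.0 authorization-code flow, the Client ID and Redirect URI are sent to the provider's authorization endpoint, while the Client Secret is only used later, server-side, at the token endpoint. Below is a minimal Java sketch of building that first request (RFC 6749, section 4.1.1). The class name and every URL are placeholders, not Dataporten's or Dataverse's real values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds an OAuth 2.0 authorization-code request URL.
class AuthorizationRequest {

    static String buildUrl(String authorizationEndpoint, String clientId,
                           String redirectUri, String scope, String state) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("response_type", "code");
        params.put("client_id", clientId);
        params.put("redirect_uri", redirectUri);
        params.put("scope", scope);
        params.put("state", state); // opaque value echoed back by the provider (CSRF protection)
        // Form-encode each value (application/x-www-form-urlencoded, as RFC 6749 specifies).
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return authorizationEndpoint + "?" + query;
    }

    public static void main(String[] args) {
        // All values are placeholders, not real provider settings.
        System.out.println(buildUrl("https://idp.example.org/oauth/authorization",
                "my-client-id", "https://dataverse.example.org/oauth2/callback.xhtml",
                "openid email", "xyz123"));
    }
}
```

In practice the `state` value should be random per request and checked when the provider redirects back to the Redirect URI.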

@xibriz
Contributor Author

xibriz commented Dec 4, 2017

Since you guys are using ScribeJava, I have gotten DataportenApi added to that repository: scribejava/scribejava#805

This means I only have to add one new class and make a couple of minor changes in my PR, which I will be making very soon.

Also note that to get Dataporten into ScribeJava, I had to make arrangements so that the maintainer of ScribeJava could test the code; the same arrangement will be made for the Dataverse team :)

@djbrooke
Contributor

djbrooke commented Dec 4, 2017

@xibriz thanks for the PR! @scolapasta (Dataverse Architect) and I have a meeting this afternoon and we'll discuss this.

@pdurbin
Member

pdurbin commented Dec 4, 2017

Since you guys are using ScribeJava I have gotten the DataportenApi as a part of that repository. scribejava/scribejava#805

@xibriz great! Thanks for adding a comment about this in your pull request, which looks good to me.

@xibriz
Contributor Author

xibriz commented Dec 4, 2017

Thanks. Just doing what I can to implement this in the same way you guys have implemented Google and GitHub.

@djbrooke
Contributor

djbrooke commented Dec 4, 2017

@xibriz thanks for the PR! As this is something that’s very specific to Norwegian installations, we don’t want to merge it into develop and include it in a release to the wider Dataverse community. Instead of merging this sort of thing to the Dataverse core code, we are exploring ways to make specific areas of the code modular so that this could be supported in a more sustainable way.

Is making this area of the code more modular something that you are interested in working on? If so, we could work with you to provide initial guidance. Modularity is a new area for us as well, so we’d be interested in learning with you.

@xibriz
Contributor Author

xibriz commented Dec 5, 2017

@djbrooke thanks for the info. I need to check with my project leader what he wants to do, but I'm interested in how you guys think this should be implemented.

I have some thoughts myself.

As far as I can see, the following information differs between OAuth2 providers:

  • HTTP method (GET/POST)
  • AccessToken endpoint URL
  • Authorization URL
  • Scope
  • Data format of the response (JSON/XML/other?) and its structure
  • The settings you have defined in the authentication provider JSON file

Off the top of my head, I think I have heard of a generic JSON library that lets you describe which attributes you want to extract from a structure. If this exists in Java, you could define that the email is at "user->email" for Dataporten and just "email" for Google, and so on.

These definitions could be posted to Dataverse through the same JSON file as the other settings, along with the URLs listed above.

Then you could have a generic OAuth JSON provider class. Same for XML and other response types.

Just thinking out loud here...
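The idea of describing providers as data rather than as one class per provider can be sketched in a few lines of Java. This is a hypothetical illustration, not code from Dataverse or ScribeJava. It assumes the provider's response has already been parsed into nested `Map`s (the parsing itself would need a JSON library), and it reuses the `user->email` path notation from the comment above. The record requires Java 16 or later.

```java
import java.util.Map;

// Hypothetical sketch of a "generic OAuth JSON provider": each provider is described by data.
class GenericOAuthProvider {

    // Per-provider settings that would be read from the provider's JSON definition file.
    record ProviderConfig(String id, String httpMethod, String accessTokenEndpoint,
                          String authorizationUrl, String scope, String emailPath) {}

    // Walks an already-parsed JSON object (nested Maps) along a path such as "user->email".
    // Returns null if a step is missing or the value at a step is not an object.
    static Object extract(Map<String, Object> json, String path) {
        Object current = json;
        for (String key : path.split("->")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Placeholder endpoints; these are not the providers' real URLs.
        ProviderConfig dataporten = new ProviderConfig("dataporten", "GET",
                "https://dataporten.example.org/oauth/token",
                "https://dataporten.example.org/oauth/authorization", "email", "user->email");
        ProviderConfig google = new ProviderConfig("google", "GET",
                "https://google.example.org/token", "https://google.example.org/auth",
                "email", "email");

        // Responses with different structures, already parsed into Maps.
        Map<String, Object> dataportenUser = Map.of("user", Map.of("email", "kari@example.no"));
        Map<String, Object> googleUser = Map.of("email", "ola@example.com");

        System.out.println(extract(dataportenUser, dataporten.emailPath())); // kari@example.no
        System.out.println(extract(googleUser, google.emailPath()));         // ola@example.com
    }
}
```

The same `extract` call serves both providers. Only the configured path differs, which is the point of the proposal.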

@michbarsinai
Member

My 2 ¢:
Creating a completely general OAuth2 client would be nice, but OAuth2 is very permissive, so there's a lot of variance between providers. Even ScribeJava ended up writing multiple clients, one-per-service. It may be possible to generalize a bit, though, in the same manner we did with the REST-based workflow step.
Another option is to open the OAuth2 providers as a plug-in area. This way, the Dataporten OAuth2 client will be developed as a plug-in, and so there won't be a need to maintain a fork.
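A plug-in area along these lines could use Java's built-in `java.util.ServiceLoader`. The sketch below is hypothetical. `OAuth2ProviderPlugin` and `OAuth2PluginRegistry` are invented names, not an existing Dataverse API. A real plug-in would ship in its own JAR together with a `META-INF/services` registration file naming its implementation class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical plug-in contract that each OAuth2 provider JAR would implement.
interface OAuth2ProviderPlugin {
    String id();    // e.g. "dataporten"
    String title(); // label shown on the login page
}

class OAuth2PluginRegistry {

    // Discovers every implementation registered on the classpath.
    static List<OAuth2ProviderPlugin> discover() {
        List<OAuth2ProviderPlugin> found = new ArrayList<>();
        for (OAuth2ProviderPlugin plugin : ServiceLoader.load(OAuth2ProviderPlugin.class)) {
            found.add(plugin);
        }
        return found;
    }

    public static void main(String[] args) {
        // With no plug-in JARs on the classpath, nothing is discovered.
        List<OAuth2ProviderPlugin> plugins = discover();
        System.out.println("Discovered " + plugins.size() + " OAuth2 provider plug-in(s)");
        for (OAuth2ProviderPlugin p : plugins) {
            System.out.println(p.id() + ": " + p.title());
        }
    }
}
```

With this design, adding Dataporten would mean dropping a JAR onto the classpath instead of changing core code.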

@pdurbin
Member

pdurbin commented Dec 5, 2017

Or we could simply merge pull request #4341 so that @xibriz doesn't have to maintain a fork. 😄 Maybe he'll fix other bugs for us too! He already helped with #4336 and is now our 50th contributor! 🎉


Look how quickly he got scribejava/scribejava#805 merged. We are talking about authentication for an entire nation, after all. If we merge the code now, we can factor it out later once we have modularity in place. It isn't even that much code, and as @xibriz points out, it will be even less code on the Dataverse side once ScribeJava makes a release that includes his pull request.

@xibriz
Contributor Author

xibriz commented Dec 6, 2017

I have not been able to get in touch with the project leader yet, but I suspect my focus will be switched to trying Dataverse with a national cloud provider that is going to support SWIFT and S3. We would love to get S3 working with a provider other than Amazon (not that we have anything against Amazon).

I of course agree with @pdurbin: it's not that much code, and it would be simple to refactor it once the modularity is in place :)

@djbrooke
Copy link
Contributor

djbrooke commented Dec 7, 2017

Thanks @xibriz. I understand about the desire to get this merged in, but getting things like this into the core is not sustainable long term. For this OAuth issue, the best solution will be to run a fork, but if we can talk ahead of time with you and/or your project leader about what’s next, we can provide some guidance. I’m going to assign this to @scolapasta to close out with any thoughts.

@scolapasta
Contributor

OK, added new issue for a general plugin style solution: #4383. @xibriz please feel free to add any additional feedback there. Thanks!

Closing this one out.

@pdurbin
Member

pdurbin commented Dec 9, 2019

@philippconzett and @oodu would you like me to reopen this issue? OpenID Connect (OIDC) support is a feature of Dataverse contributed by @poikilotherm and he seems pretty sure based on https://docs.feide.no/service_providers/integration_guide/oauth_oidc/openidconnect.html that you might not need to run a fork anymore. Please see discussion at #6442 (review)

If you get it working my hope would be that you could make a pull request for the Dataverse Guides to explain what you did. 😄
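One reason a standard OIDC client can need less per-provider configuration than the list of differences earlier in this thread is discovery. An OpenID Connect provider publishes a JSON document listing its `authorization_endpoint`, `token_endpoint`, `userinfo_endpoint` and so on, at a location derived from its issuer URL (OpenID Connect Discovery 1.0). The hypothetical sketch below only derives that location. The issuer is a placeholder, not Feide's actual issuer, and the sketch makes no claim about how the Dataverse OIDC provider is configured.

```java
import java.net.URI;

// Hypothetical helper: locates an OpenID Connect provider's discovery document.
class OidcDiscovery {

    static URI discoveryDocument(String issuer) {
        // Per the Discovery spec, drop a terminating "/" from the issuer, then append the well-known path.
        String base = issuer.endsWith("/") ? issuer.substring(0, issuer.length() - 1) : issuer;
        return URI.create(base + "/.well-known/openid-configuration");
    }

    public static void main(String[] args) {
        // Placeholder issuer URL.
        System.out.println(discoveryDocument("https://idp.example.org/"));
        // prints https://idp.example.org/.well-known/openid-configuration
    }
}
```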

@oodu

oodu commented Dec 9, 2019 via email

@philippconzett
Contributor

Unfortunately, @xibriz is not working with us anymore. We'll have to discuss this issue at our next DataverseNO meeting, which is on December 18. Note also that the SSHOC project has on its task list to provide the possibility to use multiple authentication protocols in the same Dataverse installation. We'll get back to you after our meeting on December 18.

@poikilotherm
Contributor

@philippconzett and @oodu this is exactly what #5974 is about. Moving the hassle of auth to a separate service, so it gets easier to maintain. Maybe you could mention that during your meeting as not everybody might be aware of it? Thx in advance! 🎉

@philippconzett
Contributor

Thanks, @poikilotherm, for clarifying this. Currently, we use local authentication + Shibboleth for our institutional log-in, which is provided by Feide (please note that Dataporten and Feide have merged into Feide). From the Feide documentation, I learn that Feide "offers both OAuth 2.0 and OpenID Connect interfaces towards applications." Would your solution, @poikilotherm, enable Dataverse to provide the possibility to offer authentication with Feide AS WELL AS with e.g. ORCID AND local authentication?

@poikilotherm
Contributor

Sure. The OIDC provider is just the same as the other providers.

You can either choose to offer people multiple ways of authentication within the Dataverse application by enabling those providers you mentioned plus adding Dataporten/Feide via OIDC. Beware that this still means converting users - Dataverse is not capable of handling multiauth.

Or you go big and use an IDM/IAM like Unity, Keycloak, ..., do all the hard work there, and attach Dataverse to it via OpenID Connect. A real-world example using Unity IDM is https://b2access.eudat.eu, which offers local accounts, eduGAIN/SAML and some other providers. The advantages of such a solution include automated mapping of multiauth, adding custom attributes, etc.

@philippconzett
Contributor

philippconzett commented Dec 11, 2019

I have just had the chance to discuss this issue with one of my colleagues, and we agreed that DataverseNO would highly prefer a solution that enables Dataverse to offer authentication with Feide as well as with e.g. ORCID and local authentication. But please note that, contrary to what @pdurbin noted in another thread (which I wasn't able to find right now), DataverseNO only uses the Dataverse main distribution provided by Harvard, and this is a policy matter for us. So the solution outlined above must be implemented in the main distribution before we can use it in DataverseNO.

@poikilotherm
Contributor

@philippconzett I'm a bit lost on your requirements. From what I see at https://dataverse.no, you are using Feide via SAML/Shibboleth, correct?


To enable ORCID and maybe others in your installation, you want "the possibility to offer authentication with Feide as well as with e.g. ORCID and local authentication". This is already possible today: you could enable all the providers you want next to Feide.

The only (but maybe huge) problem is that Dataverse does not let a user reach the same Dataverse account through different authentication options. You can log in as Oliver (local account) but not access the same user by signing in via Feide, ORCID or any other provider. This is a limitation of Dataverse and, as far as I can see as a community member, not likely to change in the near future.

If that is exactly what you need, the only way to go with multiple providers is to move the logic outside Dataverse into some kind of IDM/IAM solution. This gets much easier by using the Dataverse OIDC provider. If you are fine with enabling more providers without multiauth, this is already possible without OIDC.
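The difference between the two options can be illustrated with a small, hypothetical Java model. In the first, accounts are keyed by (provider, user id) with no linking. In the second, an IDM/IAM broker links several upstream identities to one subject before the application sees them. All names and identifiers are invented for illustration; this is not how Dataverse stores users internally.

```java
import java.util.HashMap;
import java.util.Map;

class MultiAuthSketch {

    // An identity as asserted by one login provider.
    record ExternalIdentity(String providerId, String userId) {}

    // Application side: one account per (provider, user id) pair, no linking.
    static class AccountStore {
        private final Map<ExternalIdentity, String> accounts = new HashMap<>();

        String loginOrCreate(ExternalIdentity identity) {
            String account = accounts.get(identity);
            if (account == null) {
                account = "account-" + (accounts.size() + 1);
                accounts.put(identity, account);
            }
            return account;
        }
    }

    // IDM/IAM side: upstream identities are linked to one broker subject,
    // and the application only ever sees the broker's identity.
    static class IdentityBroker {
        private final Map<ExternalIdentity, String> subjects = new HashMap<>();

        void link(ExternalIdentity upstream, String subject) {
            subjects.put(upstream, subject);
        }

        ExternalIdentity assertToApplication(ExternalIdentity upstream) {
            String subject = subjects.get(upstream);
            if (subject == null) {
                throw new IllegalStateException("identity not linked: " + upstream);
            }
            return new ExternalIdentity("oidc-broker", subject);
        }
    }

    public static void main(String[] args) {
        ExternalIdentity feide = new ExternalIdentity("feide", "oliver@example.no");
        ExternalIdentity orcid = new ExternalIdentity("orcid", "0000-0000-0000-0000");

        // Direct logins: the same person ends up with two separate accounts.
        AccountStore direct = new AccountStore();
        System.out.println(direct.loginOrCreate(feide)); // account-1
        System.out.println(direct.loginOrCreate(orcid)); // account-2

        // Via a broker: both logins resolve to one broker identity, hence one account.
        IdentityBroker broker = new IdentityBroker();
        broker.link(feide, "subject-42");
        broker.link(orcid, "subject-42");
        AccountStore viaBroker = new AccountStore();
        System.out.println(viaBroker.loginOrCreate(broker.assertToApplication(feide))); // account-1
        System.out.println(viaBroker.loginOrCreate(broker.assertToApplication(orcid))); // account-1
    }
}
```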

@pdurbin
Member

pdurbin commented Dec 11, 2019

@poikilotherm good point. And even more login options can be enabled at once, like this:

[Screenshot of a login page with several login options enabled]

@philippconzett I'm sorry I misrepresented DataverseNO as running a fork! I'm glad you aren't! 😄

I'm very glad that you and @poikilotherm will have a chance to talk in Tromsø in about a month at the European Dataverse Workshop 2020. I wish I could come!

A while back I created an issue titled "Permit multiple login options to the same Dataverse account" at #3487, which I have since closed, but if someone would like to create a fresh issue, please go ahead! 😄 These issues mean so much more when they come from the community. 😄

For pain points with multiple accounts, please see these issues:

@philippconzett
Contributor

Thanks, @poikilotherm and @pdurbin, for clarifying things.

We have been aware of the fact that authentication for the same Dataverse user coming from different authentication options is not an option in Dataverse. When we started with TROLLing, we only used local authentication. When we expanded our installation to include our institutional repository, UiT Open Research Data (which is now a collection within DataverseNO), we implemented single sign-on with Feide. We then had to convert the existing TROLLing users at UiT from local to Shibboleth.

I now had to dig deeper into the background of the original request by @xibriz. As far as I’ve been able to figure out, the reason for the request was the following: Back in 2017 (or maybe earlier), UNINETT, which is the provider of Feide (the national authentication service used at Norwegian HE institutions), was working on a new authentication service called Dataporten, which was based on OAuth2 and was meant to replace Feide. Our Feide single sign-on integration was at that time, and still is, based on Shibboleth. As far as I know, then as now, institutional log-in integration in the main distribution of Dataverse (without any additional add-on or similar) is only possible through Shibboleth. So that was, I guess, the reason why @xibriz wanted to integrate institutional log-in based on OAuth2.

Later, after @xibriz finished working on this Dataverse integration, UNINETT merged Dataporten with Feide. The features in Dataporten were integrated into a new version/generation of Feide, also called new Feide (nye Feide). New Feide is thus based on OAuth2. But old Feide is still in use, and I guess that our single sign-on integration in DataverseNO is based on old Feide, and thus still works with Shibboleth. In order for us to switch to new Feide, we need to base our single sign-on integration in DataverseNO on OAuth2. (See this information from UNINETT, translated from Norwegian: “Note that services currently connected to Feide must connect to new Feide through a new interface (OAuth 2.0/OpenIDConnect) in order to use the new functionality. This does not happen automatically.” https://www.feide.no/nye-feide)

So, if we want to migrate to new Feide, I guess @poikilotherm’s solution would allow us to enable both institutional authentication through new Feide (based on OAuth2) as well as authentication through ORCID and local authentication (e.g. to be used for test accounts)?

I see of course the advantages of enabling authentication for the same Dataverse user coming from different authentication options. But, I think there are some issues we should consider. Here is one of them:

In multi-institutional Dataverse installations, e.g. DataverseNL (pinging @4tikhonov) and DataverseNO, users are recognized by their institutional affiliation, and they are assigned access rights to their institutional collection based on that affiliation. Let’s assume that Dataverse enables authentication for the same Dataverse user coming from different authentication options, and that a researcher at UiT has an ORCID account in which she/he has registered a private email address. How can we ensure that this UiT researcher, when logging into DataverseNO via ORCID authentication, will be recognized as a UiT researcher by Dataverse?
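One possible policy, sketched under loud assumptions: only treat an affiliation as institutional when it was asserted in the current login by a provider configured as authoritative for that institution. Under that rule, a self-asserted affiliation arriving via ORCID grants nothing. The provider ids, the `UiT` entry and the class are hypothetical and do not describe how Dataverse assigns affiliation-based access today.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical policy: trust an affiliation only if an authoritative provider asserted it in this login.
class AffiliationPolicy {

    // Which login providers are authoritative for which institution (illustrative values).
    static final Map<String, Set<String>> TRUSTED_PROVIDERS = Map.of("UiT", Set.of("feide"));

    // What one login session asserted about the user.
    record Login(String providerId, String assertedAffiliation) {}

    static Optional<String> institutionalAffiliation(Login login) {
        String affiliation = login.assertedAffiliation();
        if (affiliation == null || login.providerId() == null) {
            return Optional.empty();
        }
        Set<String> trusted = TRUSTED_PROVIDERS.getOrDefault(affiliation, Set.of());
        return trusted.contains(login.providerId()) ? Optional.of(affiliation) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(institutionalAffiliation(new Login("feide", "UiT"))); // Optional[UiT]
        System.out.println(institutionalAffiliation(new Login("orcid", "UiT"))); // Optional.empty
    }
}
```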

@pdurbin
Member

pdurbin commented Dec 13, 2019

How can we assure that when this UiT researcher logs into DataverseNO via ORCID authentication, will be recognized as a UiT researcher by Dataverse?

This reminds me of a conversation I had with Scott Bradner a few years ago. I was telling him that we added ORCID login support to Dataverse. His reaction (I'm paraphrasing, of course) was, "That's nice but as far as I know ORCID does absolutely no vetting to make sure you are who you say you are and that you have the affiliations you say you do."

In contrast, having a Shibboleth login at Harvard means something. You don't get your "HarvardKey" until you visit an office and get your photo taken, etc.

I must admit that I'm completely ignorant about whether ORCID is now vetting people in any way. When I signed up it felt like creating an account on Google or Yahoo or whatever.

According to https://www.pidapalooza.org/about ORCID is one of the sponsors of PIDapalooza so maybe I can ask about this in 6 weeks or so. I should probably start a list. @philippconzett meanwhile, maybe you could ask in the ORCID forum at https://support.orcid.org/hc/en-us/community/topics

@philippconzett
Contributor

Thanks, @pdurbin! I have submitted the following question to the ORCID support as well as to the ORCID forum:

Our organization runs a national repository for research data. When logging in, researchers are automatically recognized according to the organization they are affiliated with and are assigned access rights accordingly. Authentication for institutional log-in is currently based on Shibboleth/LDAP. The repository software we are using also allows authentication through ORCID, based on OAuth2, and we are considering enabling this feature. However, the question arises whether ORCID has any control over whether the affiliation specified for a researcher is actually correct and up to date. We’d appreciate any information about this. Thank you!

@scolapasta
Contributor

A few comments on enabling authentication for the same Dataverse user coming from different authentication options:

When we first began with Dataverse 4, we didn't have as many login options (native and Shibboleth), so it wasn't as necessary. And we were concerned with a use case similar to the one described by @philippconzett: a user logging in with their native account and not seeing access to what has been granted to their institutional account(*). In particular, we were concerned with users being confused about why they don't see everything they would expect.

(*) to be clear, even in cases where we would be able to identify the connection, we can't allow access to these objects, because we don't know if they are still valid users at the institution

I do think we want to support this some day. So we will need to think of proper messaging (at login, for example, perhaps explaining that they've only authenticated one way). Additionally, we'd need to modify the permission system so that if you don't have access with one account (e.g. the native account), the system doesn't necessarily throw a 403, but affords the user the opportunity to authenticate with their other account.
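The permission change described above could be sketched as a three-way decision instead of a plain allow/403. This hypothetical Java snippet assumes the system already knows which other login providers are linked to the user. Consistent with the footnote above, it never grants access on behalf of the other account; it only offers the chance to log in with it.

```java
import java.util.Set;

// Hypothetical permission outcome: offer another login instead of an immediate 403.
class AccessDecisionSketch {

    enum Decision { ALLOW, OFFER_OTHER_LOGIN, DENY_403 }

    static Decision decide(boolean currentAccountHasAccess, Set<String> otherLinkedProviders) {
        if (currentAccountHasAccess) {
            return Decision.ALLOW;
        }
        // Access is not granted through the other account; the user must authenticate with it first.
        return otherLinkedProviders.isEmpty() ? Decision.DENY_403 : Decision.OFFER_OTHER_LOGIN;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, Set.of()));         // ALLOW
        System.out.println(decide(false, Set.of("shib")));  // OFFER_OTHER_LOGIN
        System.out.println(decide(false, Set.of()));        // DENY_403
    }
}
```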
