Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework auth/api_token parameters #201

Merged
merged 12 commits into from
Aug 26, 2024
Merged

Rework auth/api_token parameters #201

merged 12 commits into from
Aug 26, 2024

Conversation

shoeffner
Copy link
Collaborator

@shoeffner shoeffner commented Jul 19, 2024

Describe your environment

On host:

  • OS: macOS, Sonoma 14.5, M1
  • pyDataverse: this PR (i.e., main branch + patches)
  • Python: 3.12
  • Dataverse: 6.3 (local/container), 6.2 (demo.dataverse.org)

Inside container:

  • OS: I think the python images use ubuntu, but didn't check
  • pyDataverse: this PR (i.e., main branch + patches)
  • Python: 3.11
  • Dataverse: 6.3 (!) -- had to override DV_VERSION in docker-compose-test-all.yml to make tests pass, as three tests check for the version. See also Update DV_VERSION to 6.3 #197 .

Follow best practices

Describe the PR

  • What kind of change does this PR introduce?
    • This PR introduces custom authentication implementation to allow for API token and Bearer token-based authentication.
    • Sword requires BasicAuth, which this PR also addresses (see https://guides.dataverse.org/en/latest/api/sword.html#sword-auth)
    • An Auth implementation must be passed as a keyword-argument to the init function. We can discuss this requirement, but it felt better to be explicit about it and make sure implementers actually pass it correctly rather than accidentally passing it as the api_token or something.
    • I replaced some isinstance(..., ("".__class__, "".__class__)) with the much more legible and equivalent (as far as I can tell) isinstance(..., str).
    • It also deprecates the use of the previously ignored auth on individual functions in favor of generating multiple API objects with different auth methods.
      • For the deprecation, I introduced a DEPRECATION_GUARD to warn people who are trying to pass something to the functions. Initially I wanted to check for False, but there was one call defaulting to True (get_access) and this would not catch code explicitly setting auth=False on the callsite. So I decided to introduce a new default argument and show a warning if it is overwritten. Happy to discuss this decision – I have never done such an orderly deprecation before, so this is my first try :)
    • I updated the documentation.
    • Oh, and I smuggled a change to the Api.__str__ methods into the PR. As I scrolled by, I saw that they essentially could be removed and just the Api.__str__ method itself should use self.__class__.__name__ to get a similar result. However, the DataAccessApi will not print DataAccessApi: ... instead of the previous Data Access API: ... – the same applies for the others. However, this felt to be more in spirit with the docstring (which I then also updated slightly). If that's not so nice, we can move that to its own PR or even drop that commit entirely.
    • I had to adapt three unrelated things (conf.py, the self.base_url = None assignment in api.py, and the pyproject.toml) for mypy.
    • This PR almost accidentally also allows to pass params properly due to the name change of params -> headers.
  • Why is this change required? What problem does it solve?
    • First and foremost, the issue it resolves is that tokens could potentially be leaked when sharing log messages, as the error response from dataverse includes the request URL, which, if the token is provided as ?key, contains the key.
    • The PR also lays the groundwork to allow for different authentication schemas in the future and even to configure it from the outside, favoring composition over inheritance. This will hopefully make maintenance and the refactoring in Refactor api.py into sub-modules #189 easier.
    • By deprecating the auth on individual methods, it will eventually be possible to remove this unused argument and make the API a little bit leaner and reduce confusing behavior.
  • Screenshots (if appropriate)
  • Put Closes #ISSUE_NUMBER to the end of this pull request

Testing

  • Have you used tox and/or pytest for testing the changes? pytest
  • Did the local testing ran successfully? yes, if applying Update DV_VERSION to 6.3 #197. For demo.dataverse.org, I could not run test_upload.py successfully, as I was unauthorized to perform those uploads (401). However, locally (outside the containers) I was able to run the asyncio tests with pytest-asyncio. The containers seem to miss this.
  • Did the Continuous Integration testing (Travis-CI) ran successfully?

Commits

  • Have descriptive commit messages with a short title (first line).
  • Use the commit message template
  • Put Closes #ISSUE_NUMBER in your commit messages to auto-close the issue that it fixes (if such).
    • I have only done this for 3e73fbf, as that is arguably the "essence" of this PR.

Others

  • Is there anything you need from someone else? Take your time to review these changes. While some are quite repetitive and I already split some parts off into other PRs, this is a massive changeset due to the deprecation. In fact, maybe the deprecation should be its own PR? What do you think?

Documentation contribution

  • [?] Have you followed NumPy Docstring standard? I hope so

Code contribution

  • Have you used pre-commit? Yes, see also Run pre-commit run --all #196.
  • Have you formatted your code with black prior to submission (e. g. via pre-commit)? Yes
  • Have you written new tests for your changes? Yes
  • Have you ran mypy on your changes successfully? Yes, but I had to add types-jsonschema to the lint dependencies and in the conf.py. I haven't tried --strict.
  • Have you documented your update (Docstrings and/or Docs)? yes
  • Do your changes require additional changes to the documentation? No, but m

Note that this PR depends on other PRs which should be reviewed and decided/acted upon first, I can also rebase/merge the changes into this branch afterwards.
In particular:

Closes #192 .

@JR-1991 JR-1991 self-assigned this Jul 19, 2024
@JR-1991 JR-1991 self-requested a review July 19, 2024 07:46
@JR-1991 JR-1991 added the pkg:api api related activities label Jul 19, 2024
@JR-1991 JR-1991 added this to the 0.3.4 milestone Jul 19, 2024
@JR-1991
Copy link
Member

JR-1991 commented Jul 19, 2024

Thanks, @shoeffner, that looks great! I agree with the changes made to allow passing either 'auth' or 'api_token'. I just tested it locally, and it works perfectly 🙌

image

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run the code or even look at it very closely but I'm happy there are tests and the PR seems reasonable to me.

I'm leaving a few tiny suggestions for docs.

Thanks for the PR, @shoeffner! ❤️

pyDataverse/auth.py Outdated Show resolved Hide resolved
pyDataverse/auth.py Outdated Show resolved Hide resolved
pyDataverse/auth.py Outdated Show resolved Hide resolved
@@ -64,6 +65,7 @@ optional = true
black = "^24.3.0"
radon = "^6.0.1"
mypy = "^1.9.0"
types-jsonschema = "^4.23.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, where is this used?

I'm asking because upstream we've been trying to improve JSON Schema support:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In utils, jsonschema.validate is used:

from jsonschema import validate

jsonschema is not typed, so mypy requires the type stubs for it. When you run mypy on the code you will get the respective error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look, the call chain is: pyDataverse.models.(Dataset|DVObject).validate_json -> pyDataverse.utils.validate_data -> jsonschema.validate.
There are also four schema.json files checked in at pyDataverse/schemas/json (which is not a package though, so one would need the filepaths to validate against them).

The function is not called automatically and supposed to be usable by consumers of the package, the validate_json functions are tested in test_datafile.py, test_dataset.py, and test_dataverse.py. The paths to the schemas are declared in the conftest.py.

pyDataverse/api.py Outdated Show resolved Hide resolved
pyDataverse/api.py Show resolved Hide resolved
shoeffner added a commit to shoeffner/pyDataverse that referenced this pull request Jul 23, 2024
shoeffner added a commit to shoeffner/pyDataverse that referenced this pull request Jul 23, 2024
The SWORD API returns an empty string for 401 Unauthorized.
This would cause the usual response handling of extracting the error
message from the JSON to raise, so we except that in that case.

Resolves review comment on gdcc#201.
@shoeffner shoeffner mentioned this pull request Jul 25, 2024
@JR-1991
Copy link
Member

JR-1991 commented Aug 23, 2024

@shoeffner, due to the merge of the jsonData pull request, this one has a couple of merge conflicts. I am happy to take care of those if you want.

@shoeffner
Copy link
Collaborator Author

I'll take a look, thanks for letting me know!

This way, they use the class names and base URLs, instead of being specified
for each class.
This test verifies that parameter ?key is not used.
Otherwise, it will occur in error messages, which might lead to an unwanted
disclosure of the key, e.g., when sharing log messages.
- Add custom httpx.Auth implementation for Dataverse API Tokens
- Add custom httpx.Auth implementation for Dataverse Bearer Token
- Add auth to docs
It is still possible to use the api_token parameter.
In this case assume an API token was copied from the
user profile of a Dataverse instance.

If both are specified, an api_token and an explicit auth method,
warn the user and use the auth method.

Closes gdcc#192.
Instead, if required, use multiple instances of each API implementation.
Also add types-jsonschema as a lint dependency.
The SWORD API returns an empty string for 401 Unauthorized.
This would cause the usual response handling of extracting the error
message from the JSON to raise, so we except that in that case.

Resolves review comment on gdcc#201.
This change was necessary due to rebasing the feature branch onto the json-data fixes in gdcc#203.

Relates to gdcc#201.
@shoeffner
Copy link
Collaborator Author

Alright, I rebased everything and resolved one incompatibility with #203, which I didn't pay attention to during the rebase. Should work now.

@shoeffner shoeffner marked this pull request as ready for review August 24, 2024 15:37
@JR-1991
Copy link
Member

JR-1991 commented Aug 26, 2024

@shoeffner, thanks a lot - Merging!

@JR-1991 JR-1991 merged commit d57a351 into gdcc:main Aug 26, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:api api related activities
Projects
Development

Successfully merging this pull request may close these issues.

API Key and User-Agent should preferrably be passed as headers
3 participants