Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot access URLs #1607

Open
johentsch opened this issue Nov 21, 2023 · 2 comments
Open

Cannot access URLs #1607

johentsch opened this issue Nov 21, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@johentsch
Copy link

Overview

I am curating a data project where we make datapackages available both on GitHub and on Zenodo. So far, I have never managed to access data on the web using the frictionless framework.

Setup

I have installed the framework (version 5.16.0) using pip install -U frictionless[zenodo,visidata].

GitHub Fail

Our workflow publishes datapackages with every GitHub release such as this one here.

Since everyone can access the file

I would expect to be able to

frictionless explore https://github.com/DCMLab/dcml_corpora/releases/download/v2.1/dcml_corpora.datapackage.json

or to

frictionless validate https://github.com/DCMLab/dcml_corpora/releases/download/v2.1/dcml_corpora.datapackage.json

but in both cases I get

╭─ Error ─────────────────────────────────────────────╮
│ [error] Repo and user is required                   │
╰─────────────────────────────────────────────────────╯

I hope it's just that I overlooked the relevant bit in the documentation which I have extensively searched. Nevertheless, from my understanding of Frictionless this should "just work".

Zenodo Fail

Likewise, take this Zenodo record.

It provides the datapackage

so I would like to

frictionless explore https://zenodo.org/records/10181503/files/mozart_piano_sonatas.datapackage.json
frictionless validate https://zenodo.org/records/10181503/files/mozart_piano_sonatas.datapackage.json

but I get

╭─ Error ─────────────────────────────────────────────╮
│ [error] Record is required.                         │
╰─────────────────────────────────────────────────────╯

Am I doing something wrong?

@roll
Copy link
Member

roll commented Nov 22, 2023

Thanks for reporting it's a bug!

@pierrecamilleri
Copy link
Collaborator

pierrecamilleri commented Nov 20, 2024

I made a failed attempt to fix the github links, so for the next attempt (or if anyone wants to try to fix this), here is what happens : due to the github.com domain, frictionless thinks it needs to interact with a Github Portal, which in turn expects a URL shaped like github.com/user/repo with no additional URL segment and therefore throws an exception.

Instead, the URL provided in the issue is a plain URL pointing to a plain JSON file, and I suspect the same holds for the zenodo URL.

A possible solution might be to check if the URL has the expected shape before dispatching it to a data portal (which probably happens in some factory.py file => correction: the create_adapter function decides if the source is appropriate for the plugin or if it needs to be passed over to other plugins).

pierrecamilleri added a commit that referenced this issue Nov 22, 2024
- fixes part of #1607 

In the absence of other hints (e.g. a Controller object associated to
another plugin), all URLs from `github.com` were being processed by the
`GithubPlugin`, to create a Github Data Portal adapter.

With this PR, `GithubPlugin` only handles the URL if it has the expected
format "https://github.com/user_or_org/repo", and
"https://github.com/user_or_org", otherwise it is passed other to be
treated as a plain URL.

I also made some refactorings along the way, to have more readable and
less nested code.

As tests github portal are currently skipped, I checked manually that I
have not broken this functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants