Gated dataset info is leaked #2457

Closed
albertvillanova opened this issue Aug 19, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@albertvillanova
Member

Describe the bug

Unauthenticated users can access gated dataset info.

Reproduction

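from huggingface_hub import HfApi

# token=False: never send a token, even a locally saved one, so every request is anonymous
hf_api = HfApi(token=False)
dataset_info = hf_api.dataset_info("albertvillanova/gated-csv", token=False)
print(dataset_info)  # succeeds even though the repo is gated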

Logs

DatasetInfo(id='albertvillanova/gated-csv', author='albertvillanova', sha='7ce938a89113308d5a724e6f3c9a2c07315983d4', created_at=datetime.datetime(2024, 8, 19, 9, 52, 14, tzinfo=datetime.timezone.utc), last_modified=datetime.datetime(2024, 8, 19, 9, 53, 48, tzinfo=datetime.timezone.utc), private=False, gated='manual', disabled=False, downloads=0, downloads_all_time=None, likes=0, paperswithcode_id=None, tags=['region:us'], card_data=None, siblings=[RepoSibling(rfilename='.gitattributes', size=None, blob_id=None, lfs=None), RepoSibling(rfilename='train.csv', size=None, blob_id=None, lfs=None)])

System info

- huggingface_hub version: 0.24.5
- Platform: Linux-6.1.85+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Running in iPython ?: Yes
- iPython shell: Shell
- Running in notebook ?: Yes
- Running in Google Colab ?: Yes
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: 2.7.16
- Tensorflow: 2.17.0
- Torch: 2.3.1+cu121
- Jinja2: 3.1.4
- Graphviz: 0.20.3
- keras: 3.4.1
- Pydot: 1.4.2
- Pillow: 9.4.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.8.2
- aiohttp: 3.10.2
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /root/.cache/huggingface/hub
- HF_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@albertvillanova albertvillanova added the bug Something isn't working label Aug 19, 2024
@Wauplin
Contributor

Wauplin commented Aug 19, 2024

This is not a bug but a feature 😉
Even when a repo is gated, some info about it is publicly accessible. Only the actual data stored in the repo should be restricted. I'll close this issue, but I'm happy to reopen it if there is a specific attribute you'd like to discuss.

cc @Pierrci on the Hub side
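The split between public metadata and restricted files can be observed without any token. Below is a minimal sketch that uses only the Python standard library. It assumes that the anonymous JSON endpoint https://huggingface.co/api/datasets/<repo_id>, which HfApi.dataset_info queries, reports the gated field as false, "auto" or "manual". The log above shows gated='manual'. The helper names dataset_metadata_url, fetch_public_dataset_metadata and describe_gating are hypothetical and serve only as illustration.

```python
import json
import urllib.request
from typing import Any, Dict

# Default Hub endpoint (matches ENDPOINT in the system info above).
HUB_ENDPOINT = "https://huggingface.co"


def dataset_metadata_url(repo_id: str) -> str:
    """URL of the public dataset metadata endpoint (assumed to be what HfApi.dataset_info queries)."""
    return f"{HUB_ENDPOINT}/api/datasets/{repo_id}"


def fetch_public_dataset_metadata(repo_id: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch dataset metadata anonymously: no Authorization header is sent."""
    with urllib.request.urlopen(dataset_metadata_url(repo_id), timeout=timeout) as response:
        return json.load(response)


def describe_gating(metadata: Dict[str, Any]) -> str:
    """Summarize the `gated` field, assumed to be False, "auto" or "manual"."""
    gated = metadata.get("gated", False)
    if gated is False:
        return "not gated: metadata and files are public"
    if gated in ("auto", "manual"):
        return f"gated ({gated}): metadata is public, files require granted access"
    return f"unknown gated value: {gated!r}"
```

Calling describe_gating(fetch_public_dataset_metadata("albertvillanova/gated-csv")) makes a real network request. Its result depends on the repo's current settings on the Hub.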

@Wauplin Wauplin closed this as completed Aug 19, 2024
@albertvillanova
Member Author

albertvillanova commented Aug 19, 2024

Thanks for your reply, @Wauplin. Do you know since when this has been the case?

Since this feature was enabled, we have dead code in the datasets library, and the dataset-viewer CI is failing:

@Wauplin
Contributor

Wauplin commented Aug 19, 2024

Since the beginning of August, I think. It also caused some other side effects (see internal Slack), so feel free to have a look in case your issue is similar.

@Wauplin
Contributor

Wauplin commented Aug 19, 2024

You can check whether a dataset is gated with dataset_info(..., expand="gated"). You can also check whether your token has access to the gated repo with GET /api/:repoType(models|spaces|datasets)/:namespace/:repo/auth-check. It returns 200 if the token has access, 403 if it does not, and 404 if the repo does not exist.
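The auth-check endpoint can be called with plain HTTP. This is a sketch and not an official client. It uses only urllib from the standard library and builds the URL from the route quoted in the comment. It maps only the three status codes listed there. Any other status, such as one returned when no token is sent, is reported as unexpected rather than guessed. The names auth_check_url, interpret_auth_check and auth_check are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional

HUB_ENDPOINT = "https://huggingface.co"

# Only the status codes listed in the comment above; anything else is "unexpected".
AUTH_CHECK_MEANING = {
    200: "token has access",
    403: "token does not have access",
    404: "repo does not exist",
}


def auth_check_url(repo_type: str, repo_id: str) -> str:
    """Build GET /api/:repoType/:namespace/:repo/auth-check; repo_id is "namespace/repo"."""
    if repo_type not in ("models", "spaces", "datasets"):
        raise ValueError(f"repo_type must be 'models', 'spaces' or 'datasets', got {repo_type!r}")
    return f"{HUB_ENDPOINT}/api/{repo_type}/{repo_id}/auth-check"


def interpret_auth_check(status: int) -> str:
    """Translate an auth-check status code into a short description."""
    return AUTH_CHECK_MEANING.get(status, f"unexpected status {status}")


def auth_check(repo_type: str, repo_id: str, token: Optional[str] = None, timeout: float = 10.0) -> int:
    """Call auth-check and return the HTTP status code (4xx responses are returned, not raised)."""
    request = urllib.request.Request(auth_check_url(repo_type, repo_id))
    if token is not None:
        # Bearer-token header, as sent by huggingface_hub for authenticated calls.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

To check a token, call interpret_auth_check(auth_check("datasets", "albertvillanova/gated-csv", token=...)) with your own token. It reports whether that token has been granted access to the gated dataset.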

@albertvillanova
Member Author

Thanks a lot, @Wauplin.


2 participants