docs: ✏️ add links to the Datasets API #4984

Closed
wants to merge 1 commit into from

8 changes: 7 additions & 1 deletion docs/source/index.mdx
@@ -6,7 +6,13 @@

Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the [Hugging Face Hub](https://huggingface.co/datasets), allowing you to easily load and share a dataset with the wider machine learning community.

Find your dataset today on the [Hugging Face Hub](https://huggingface.co/datasets), and take an in-depth look inside of it with the live viewer.
Find your dataset today on the [Hugging Face Hub](https://huggingface.co/datasets), and take an in-depth look inside of it with the [live viewer](https://huggingface.co/datasets/glue/viewer/cola/train).

<Tip>

Are you looking for the [Datasets REST API](https://huggingface.co/docs/datasets-server)? Integrate into your apps over 10,000 datasets via simple **HTTP requests**, with pre-processed responses and scalability built-in.

</Tip>
Member

Adding this tip pushes the four blocks ("Tutorials", "How-to guides", etc.) to the very bottom of the page, so users will have to scroll to reach them. Since this is the first page, it should also stay minimal and straight to the point. That's why I'm not a big fan of changing this page.

`datasets` users will probably be mostly interested in having the preview working. I think we could add a dedicated "Dataset Preview" page in the "Dataset Repository" section that explains how it works and possibly points to the REST API.


<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
4 changes: 4 additions & 0 deletions src/datasets/inspect.py
@@ -303,6 +303,8 @@ def get_dataset_config_names(
'rte',
'wnli',
'ax']

Note that you can fetch the list of configs for a dataset on the Hugging Face Hub via an HTTP request with the [Datasets REST API endpoint /splits](https://huggingface.co/docs/datasets-server/splits).
Member

I don't think adding this here would bring in many users. It may be better to add some links in the actual docs.

```
"""
dataset_module = dataset_module_factory(
@@ -422,6 +424,8 @@ def get_dataset_split_names(
>>> get_dataset_split_names('rotten_tomatoes')
['train', 'validation', 'test']
```

Note that you can fetch the list of split names for a dataset on the Hugging Face Hub via an HTTP request with the [Datasets REST API endpoint /splits](https://huggingface.co/docs/datasets-server/splits).
"""
info = get_dataset_config_info(
path,
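
For context, the two docstring notes in this diff point readers at the `/splits` endpoint of the Datasets REST API as an HTTP alternative to `get_dataset_config_names` and `get_dataset_split_names`. Below is a minimal sketch of what that could look like on the client side. The base URL, the `dataset` query parameter, and the response shape (a top-level `"splits"` list whose entries carry `"config"` and `"split"` keys) are assumptions not stated in this PR, and the helper names are hypothetical; check the linked datasets-server docs before relying on them.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

# Assumed base URL of the Datasets REST API; it is not given in the PR.
API_BASE = "https://datasets-server.huggingface.co"


def splits_url(dataset: str) -> str:
    """Build the /splits URL for a dataset (query parameter name is assumed)."""
    return f"{API_BASE}/splits?{urlencode({'dataset': dataset})}"


def fetch_splits(dataset: str) -> dict:
    """Perform the HTTP request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(splits_url(dataset)) as response:
        return json.load(response)


def configs_and_splits(payload: dict) -> Dict[str, List[str]]:
    """Group split names by config, keeping the order found in the payload."""
    grouped: Dict[str, List[str]] = {}
    for entry in payload.get("splits", []):
        grouped.setdefault(entry["config"], []).append(entry["split"])
    return grouped


# Hand-written payload in the assumed response shape; this is not real API output,
# and the config name "default" is only illustrative.
sample_payload = {
    "splits": [
        {"dataset": "rotten_tomatoes", "config": "default", "split": "train"},
        {"dataset": "rotten_tomatoes", "config": "default", "split": "validation"},
        {"dataset": "rotten_tomatoes", "config": "default", "split": "test"},
    ]
}

grouped = configs_and_splits(sample_payload)
config_names = list(grouped)        # rough analogue of get_dataset_config_names
split_names = grouped["default"]    # rough analogue of get_dataset_split_names
```

In real use, `configs_and_splits(fetch_splits("rotten_tomatoes"))` would replace the hand-written payload. Keeping the parsing separate from the request makes the sketch easy to check offline.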