To download selected public datasets included in the benchmark, run the following command:
DATASETSROOT=/path/to/local/download/directory python -m datasets.load_datasets -d <DS_NAME_1> <DS_NAME_2>
The script relies on the DATASETSROOT environment variable to indicate the local path where datasets will be automatically downloaded.
Alternatively, you can export this variable in your shell environment before running the script:
export DATASETSROOT=/path/to/download/directory
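As an illustration of how a script can pick up this variable, here is a minimal sketch. It is not the actual code of load_datasets; the function name resolve_datasets_root and the error message are assumptions made for this example.

```python
import os
from pathlib import Path


def resolve_datasets_root(environ=None):
    """Return the download directory named by DATASETSROOT.

    Hypothetical helper: the real load_datasets script may read the
    variable differently.
    """
    # Allow a custom mapping to be passed in, which makes the helper easy to test.
    environ = os.environ if environ is None else environ
    root = environ.get("DATASETSROOT")
    if not root:
        # Fail early with a clear message instead of downloading to an unexpected place.
        raise RuntimeError(
            "DATASETSROOT is not set; export it or prefix the command with it."
        )
    # Expand a leading "~" so that values such as "~/data" work as expected.
    return Path(root).expanduser()
```

Calling resolve_datasets_root() with no argument reads the variable from the current process environment, so either way of setting it (inline prefix or export) behaves the same.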
Please do not run the load_datasets script from within the datasets folder. This will not work due to issues with relative imports. Instead, execute the load_datasets script directly from the main folder, using the -m option with the Python interpreter.
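If you launch the download from another Python program, one way to respect this constraint is to set the working directory explicitly. The sketch below is only an illustration: the helper names build_download_command and run_download, and the repo_root parameter (the path to the main folder), are assumptions and not part of the benchmark.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_download_command(dataset_names):
    """Build the module-style invocation; hypothetical helper for illustration."""
    # "-m datasets.load_datasets" runs the script as a module, so its
    # relative imports are resolved against the main folder.
    return [sys.executable, "-m", "datasets.load_datasets", "-d", *dataset_names]


def run_download(repo_root, datasets_root, dataset_names):
    """Run the download from the main folder (repo_root is an assumed path)."""
    # Copy the current environment and add DATASETSROOT for the child process.
    env = dict(os.environ, DATASETSROOT=str(datasets_root))
    return subprocess.run(
        build_download_command(dataset_names),
        cwd=Path(repo_root),  # never the datasets folder itself
        env=env,
        check=True,
    )
```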
To access the list of all the datasets included in the benchmark, please use the --list option:
python -m datasets.load_datasets --list
It is also possible to gather the list of dataset(s) to download directly from benchmark configuration files by using the --configs (-c) option:
DATASETSROOT=/path/to/download/dir python -m datasets.load_datasets -c config_1.json config_2.json ...
This method overrides the -d option, and it is highly recommended when running multiple benchmark experiments.
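The override behaviour can be pictured with the following sketch. It is not the script's actual implementation: the long option name --datasets, the "dataset" key in the JSON configuration files, and all function names are assumptions chosen for the example, since the configuration schema is not described here.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="load_datasets")
    # "--datasets" is an assumed long name; only "-d" is documented.
    parser.add_argument("-d", "--datasets", nargs="*", default=[])
    parser.add_argument("-c", "--configs", nargs="*", default=[])
    return parser.parse_args(argv)


def datasets_from_configs(config_paths, key="dataset"):
    """Collect dataset names from JSON configs (the key name is hypothetical)."""
    names = []
    for path in config_paths:
        config = json.loads(Path(path).read_text(encoding="utf-8"))
        value = config.get(key, [])
        # Accept either a single name or a list of names.
        for name in [value] if isinstance(value, str) else value:
            if name not in names:  # keep first-seen order, drop duplicates
                names.append(name)
    return names


def select_datasets(args):
    # When configuration files are given, they take precedence over -d.
    if args.configs:
        return datasets_from_configs(args.configs)
    return list(args.datasets)
```

Deduplicating across files means a dataset shared by several experiments is only requested once, which is one reason the configuration route is convenient when running many experiments.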