This is an example of how to use a single metadata file for all execution outputs instead of creating an individual sidecar file for each output.
This has benefits when you have many outputs, as parsing a single file is faster than parsing many files, and one file also takes up less storage than many very small files.
You can use valohai-utils
to handle the metadata file creation for you.
(Note: You need version 0.6.0 or later to use this feature.)
For each file, you add metadata properties to the file:
import valohai
with valohai.output_properties() as properties:
filename = "output.txt"
# write data to the file
...
# add metadata properties
properties.add(file=filename, properties={"my_property": "my_value", "number": 1.23})
You can also add a file to a dataset with the add_to_dataset
method.
The properties helper also has a method for building the dataset version URI
based on the dataset name and version.
import valohai
with valohai.output_properties() as properties:
filename = "output.txt"
# create dataset version URI
new_dataset_version = properties.dataset_version_uri("dataset_name", "new_version")
# add the file to the dataset versions
properties.add_to_dataset(file=filename, dataset_version=new_dataset_version)
Instead of creating a sidecar file for each output, you create a single metadata file that lists all the outputs.
The metadata file name must be valohai.metadata.jsonl
and it must be in the execution outputs root directory.
The file format is JSON lines, where each line is a JSON object with the following fields:
file
: The name (including the path) of the output filemetadata
: The metadata as a JSON object
The value of the metadata
property is the same JSON object that you would put in a sidecar file.
It is then applied to the file specified in the file
property.
For example, if your output file is output.txt
in a directory called my_outputs
,
and you want to set a metadata property my_property
to the value my_value
,
you would create a file called valohai.metadata.jsonl
with the following content:
{"file": "my_outputs/output.txt", "metadata": {"my_property": "my_value"}}
(Note that in the sidecar JSON file, you can divide the properties into multiple lines for readability, but in the JSON lines file, each list of file properties must be on a single line.)
To run this project as-is, you need Python 3.9 or later installed on your system.
To set up the project, first create a virtual environment,
and then install the required dependencies (from requirements.txt
).
Next, you need to log on Valohai and associate the project directory with a project. (See Valohai documentation → Command-line Client for more information.)
Run the execution step create-files
either from the UI or
from the command-line (using valohai-cli
).
No parameters are needed for this step.
vh execution run create-files
You can set the number of output files with the --nr-of-files
parameter:
vh execution run create-files --nr-of-files=100