Labels: enhancement (New feature or request), organisation (Evolution in project organisation)
Description
I propose updating the dataset architecture to mimic that of HuggingFace datasets as closely as possible:
```
folder
├── data
│   ├── train
│   │   ├── sample_000000000
│   │   │   ├── features_000000000.cgns
│   │   │   └── features_000000001.cgns
│   │   └── sample_000000001
│   │       └── ...
│   └── test
│       └── ...
├── infos.yaml
└── problem_definitions
    ├── task_1
    │   ├── problem_infos.yaml
    │   └── split.json
    └── task_2
        └── ...
```
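The fixed-width numbering in this layout makes sample and feature paths resolvable programmatically. A minimal sketch, assuming nine-digit zero padding and the file names shown above (the `sample_path` helper is hypothetical, not an existing PLAID function):

```python
from pathlib import Path

def sample_path(root: str, split: str, sample_id: int, feature_id: int) -> Path:
    """Build the path of a features file under the proposed layout.

    The root name, padding width, and file naming mirror the tree above;
    they are illustrative assumptions, not a fixed PLAID API.
    """
    return (
        Path(root)
        / "data"
        / split
        / f"sample_{sample_id:09d}"
        / f"features_{feature_id:09d}.cgns"
    )

# Example: first features file of the first train sample
print(sample_path("folder", "train", 0, 0))
```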
Like HF datasets, we can introduce `Dataset` (the actual one) and `DatasetDict: dict[str, Dataset]`. The split will contain the keys of the `DatasetDict` and subsplits, with numbering local to the corresponding key (train, test, ...).
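A minimal sketch of this split-local numbering, assuming a toy `Dataset` stand-in (the class and the shape of the split mapping are illustrative assumptions, not the actual PLAID API):

```python
class Dataset:
    """Toy stand-in for the actual PLAID Dataset."""
    def __init__(self, samples: list[dict]):
        self.samples = samples

    def __len__(self) -> int:
        return len(self.samples)

# As in HF datasets, a DatasetDict is simply a mapping split name -> Dataset.
DatasetDict = dict[str, Dataset]

dataset_dict: DatasetDict = {
    "train": Dataset([{"id": i} for i in range(3)]),  # ids local to "train"
    "test": Dataset([{"id": i} for i in range(2)]),   # ids restart at 0
}

# A split then only needs the DatasetDict keys and the local sample ids:
split = {name: list(range(len(ds))) for name, ds in dataset_dict.items()}
print(split)  # {'train': [0, 1, 2], 'test': [0, 1]}
```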
Doing this will make the HF datasets repo layout very similar to the data memory mapping (this was obtained by using `hf_dataset.push_to_hub(repo_id)` and our Hugging Face bridge).
The multiple problem definition proposal will indeed enable multiple tasks defined over the same dataset.
The work I did in #240 implements this for HF dataset repos of PLAID datasets. I think the problem definition can be modified as well to:
- indicate the `in` and `out` splits concerned by the regression task
- name the score function for the moment (maybe later we should find a way to define an implementation)
- rely on the flattened tree keys for the `in` and `out` feature identifiers, e.g. `Base_2_2/Zone/GridCoordinates/CoordinateX`
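To illustrate, a hypothetical `problem_infos.yaml` along these lines might look as follows (every field name here is an assumption for discussion, not a finalized schema):

```yaml
# Hypothetical problem_infos.yaml for task_1; all keys are illustrative.
task: regression
score_function: RMSE        # named only; no implementation defined for now
splits:
  in: train                 # splits concerned by the regression task
  out: test
in_features:                # flattened tree keys as feature identifiers
  - Base_2_2/Zone/GridCoordinates/CoordinateX
out_features:
  # - <flattened tree key of the predicted feature>
```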