Issues for loading from RAM instead of from Disk #431

Closed
4 tasks done
DomInvivo opened this issue Aug 10, 2023 · 1 comment · Fixed by #432

Comments

@DomInvivo
Collaborator

DomInvivo commented Aug 10, 2023

  • The DatasetSubsampler class fails if processed_graph_data_path is None, due to this line:

    path_with_hash = os.path.join(data_path, data_hash)

  • cache_data_path should be deprecated in favor of processed_graph_data_path:

    cache_data_path: Optional[Union[str, os.PathLike]] = None,

  • load_from_file should be its own parameter. Currently the loading mode depends on processed_graph_data_path, which leaves only two options:

    • If the path is set, we cache the data and do dataloading from disk
    • If the path is None, we cannot cache and we do dataloading from RAM

    We also need the option of caching while doing dataloading from RAM. A new load_from_file parameter in the Datamodule would enable this.
  • Why is normalize_label only applied when dataloading is from RAM? Where is it applied on disk?

    self.normalize_label(multitask_dataset, stage)
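
The first three points could be handled roughly as sketched below. This is only an illustration under assumptions: `path_with_hash` and `resolve_data_options` are hypothetical helpers, not functions from the codebase, and the real Datamodule signature may differ. The sketch guards the `os.path.join` call (which raises `TypeError` when given `None`), warns when `cache_data_path` is used, and treats `load_from_file` as a flag independent of whether a path is set.

```python
import os
import warnings
from typing import Optional, Tuple, Union

PathLike = Union[str, os.PathLike]


def path_with_hash(data_path: Optional[PathLike], data_hash: str) -> Optional[str]:
    """Hypothetical helper: join path and hash, tolerating a missing path."""
    # os.path.join(None, data_hash) raises TypeError, so guard explicitly.
    if data_path is None:
        return None
    return os.path.join(data_path, data_hash)


def resolve_data_options(
    processed_graph_data_path: Optional[PathLike] = None,
    cache_data_path: Optional[PathLike] = None,
    load_from_file: bool = False,
) -> Tuple[Optional[PathLike], bool]:
    """Hypothetical helper: decide where data is cached and how it is loaded."""
    if cache_data_path is not None:
        warnings.warn(
            "cache_data_path is deprecated; use processed_graph_data_path",
            DeprecationWarning,
            stacklevel=2,
        )
        # Fall back to the old argument only if the new one is not given.
        if processed_graph_data_path is None:
            processed_graph_data_path = cache_data_path
    # Loading from disk needs a location to load from.
    if load_from_file and processed_graph_data_path is None:
        raise ValueError("load_from_file=True requires processed_graph_data_path")
    return processed_graph_data_path, load_from_file
```

With this shape, a path combined with `load_from_file=False` would mean "cache to disk, but serve batches from RAM", which is the combination the issue says is currently missing.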

@DomInvivo DomInvivo changed the title Issues for loading from RAM instead of from DISK Issues for loading from RAM instead of from Disk Aug 10, 2023
@DomInvivo DomInvivo linked a pull request Aug 10, 2023 that will close this issue
5 tasks
@WenkelF
Collaborator

WenkelF commented Aug 10, 2023

Why is normalize_label only applied when dataloading is from RAM? Where is it applied on disk?

This is fine because _save_data_to_files() internally calls _make_multitask_dataset(..., load_from_file=False), so label normalization is applied before the data is written to disk. Datasets loaded from disk are therefore already normalized.
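
A toy model of that flow may make the argument easier to check. Everything here is a simplified stand-in, not the project's code: `normalize_label`, `make_dataset` and `save_data_to_file` only mimic the roles of `normalize_label`, `_make_multitask_dataset` and `_save_data_to_files`, and the mean-centering is an arbitrary placeholder for the real normalization.

```python
import json
import os
import tempfile
from typing import List


def normalize_label(labels: List[float]) -> List[float]:
    """Toy stand-in: center the labels on their mean."""
    mean = sum(labels) / len(labels)
    return [x - mean for x in labels]


def make_dataset(labels: List[float], path: str, load_from_file: bool) -> List[float]:
    """Toy stand-in for _make_multitask_dataset."""
    if load_from_file:
        # Data on disk was normalized before saving, so it is not redone here.
        with open(path) as f:
            return json.load(f)
    # RAM path: normalization happens while the dataset is built.
    return normalize_label(labels)


def save_data_to_file(labels: List[float], path: str) -> None:
    """Toy stand-in for _save_data_to_files: build in RAM, then write."""
    dataset = make_dataset(labels, path, load_from_file=False)
    with open(path, "w") as f:
        json.dump(dataset, f)


raw = [1.0, 2.0, 3.0]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.json")
    save_data_to_file(raw, path)
    from_disk = make_dataset(raw, path, load_from_file=True)
    from_ram = make_dataset(raw, path, load_from_file=False)

# Both routes yield the same normalized labels.
print(from_disk == from_ram)
```

In this model the disk route never calls `normalize_label` directly, yet both routes agree, because saving goes through the RAM route first.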
