Add load_dataset and save_dataset functions #392

Merged 2 commits into sgkit-dev:master from the load_and_save branch on Nov 20, 2020

tomwhite
Collaborator

This fixes #298, and also incorporates some of the implementation from https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/blob/4f862e31b8093d25fdaa8da7f841b9be8583cda4/scripts/gwas.py#L53-L71

Unlike #298, it does not rechunk the dataset before saving. I am wary of implicit rechunking that the user can't control, since we have seen it perform poorly in some cases. For the moment, I think it's preferable for the user to rechunk explicitly before saving. This is what I have been doing in the MalariaGEN notebooks, and Eric has too, judging by this example.

I also haven't added fsspec support even though the GWAS pipeline uses it. This is because I was getting a warning that files were not being closed when running on local files. That could be addressed separately.

codecov-io commented Nov 19, 2020

Codecov Report

Merging #392 (05cd754) into master (1880cfd) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #392   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           31        32    +1     
  Lines         2230      2247   +17     
=========================================
+ Hits          2230      2247   +17     

Impacted Files         Coverage Δ
sgkit/__init__.py      100.00% <100.00%> (ø)
sgkit/io/dataset.py    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

"""
    store = str(path)
    for v in ds:
        ds[v].encoding.pop("chunks", None)

Worth mentioning pydata/xarray#4380 in a comment?

@tomwhite tomwhite added the auto-merge Auto merge label for mergify test flight label Nov 20, 2020
@mergify mergify bot merged commit 92b331f into sgkit-dev:master Nov 20, 2020
@tomwhite tomwhite deleted the load_and_save branch November 20, 2020 10:05
tomwhite added a commit to tomwhite/sgkit that referenced this pull request Dec 15, 2020

Successfully merging this pull request may close these issues.

Add save/load dataset methods to API
4 participants