Remotely read/stream embargoed Zarr #1491

Open
kabilar opened this issue Aug 20, 2024 · 7 comments

Comments

@kabilar
Member

kabilar commented Aug 20, 2024

Hi team, @aaronkanzer and I are trying to read the metadata and chunks of an embargoed Zarr (on DANDI and LINC) and are unable to. What would be the best approach to remotely access a Zarr that is part of an embargoed Dandiset?

For the code snippet below, I can get the zarr_path in the dandiarchive S3 bucket from the File Browser of a Dandiset using the View Asset Metadata button, but s3fs also requires AWS credentials.

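import s3fs
import zarr

# Placeholder AWS credentials; replace with real values
access_key = 'your-access-key-id'
secret_key = 'your-secret-access-key'
session_token = 'your-session-token'  # Optional, if using temporary credentials

s3 = s3fs.S3FileSystem(key=access_key, secret=secret_key, token=session_token)

bucket_name = 'dandiarchive'
zarr_path = 'path/to/your/zarr/data.zarr'  # S3 key prefix taken from "View Asset Metadata"

# Expose the Zarr's S3 prefix as a key/value store; check=False skips the existence check
store = s3fs.S3Map(root=f'{bucket_name}/{zarr_path}', s3=s3, check=False)

# Open the array read-only
zarr_array = zarr.open_array(store, mode='r')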
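Rather than copying `zarr_path` by hand, the S3 location could in principle be read from the asset metadata shown by View Asset Metadata. The sketch below assumes that the metadata's `contentUrl` list contains an S3 URL in either `s3://bucket/key` or `https://bucket.s3.amazonaws.com/key` form (regional endpoints are not handled); the helper names are hypothetical and not part of dandi-cli or s3fs.

```python
from urllib.parse import urlparse

# Hypothetical helpers; the two URL shapes handled below are assumptions about
# what an asset's "contentUrl" list contains.
_S3_HOST_SUFFIX = ".s3.amazonaws.com"


def s3_root_from_content_url(url: str) -> str:
    """Convert an S3 URL into the 'bucket/key' root string used by s3fs.S3Map."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        # s3://bucket/key
        bucket = parsed.netloc
    elif parsed.netloc.endswith(_S3_HOST_SUFFIX):
        # https://bucket.s3.amazonaws.com/key (virtual-hosted style)
        bucket = parsed.netloc[: -len(_S3_HOST_SUFFIX)]
    else:
        raise ValueError(f"not a recognised S3 URL: {url}")
    key = parsed.path.strip("/")
    return f"{bucket}/{key}"


def find_s3_root(content_urls: list[str]) -> str:
    """Return the root for the first S3 URL in a contentUrl list, skipping others."""
    for url in content_urls:
        try:
            return s3_root_from_content_url(url)
        except ValueError:
            continue  # e.g. an https://api.dandiarchive.org/... download URL
    raise ValueError("no S3 URL found in contentUrl")
```

Under those assumptions, `s3fs.S3Map(root=find_s3_root(metadata["contentUrl"]), s3=s3, check=False)` would replace the hand-built `f'{bucket_name}/{zarr_path}'` root.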

For reference, I am also hitting a blocker with the DANDI API when trying to access a public or private Zarr, though I am not sure whether it is related to my use case. With the code snippet below (a derivative of the streaming section of the OpenScope Databook), I receive a Response [400]. I presume this is because the asset is a Zarr, for which the response is set in lines 143-148. Perhaps this is also related to #1455.

from dandi import dandiapi

dataset = "000026"
filepath = "sub-I58/ses-Hip-CT/micr/sub-I58_sample-01_chunk-01_hipCT.ome.zarr"
dandi_api_key = "<dandi_api_key>"  # placeholder; substitute your DANDI API key

client = dandiapi.DandiAPIClient(api_url="https://api.dandiarchive.org/api", token=dandi_api_key)

my_dandiset = client.get_dandiset(dandiset_id=dataset, version_id="draft")

file = my_dandiset.get_asset_by_path(filepath)

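# HEAD request on the asset's single-file download URL; for this Zarr asset it returns <Response [400]>
base_url = file.client.session.head(file.base_download_url)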
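Since the 400 here appears to stem from the asset being a Zarr, a caller could check for that before requesting a single-file download URL. This is a hypothetical, path-based heuristic that assumes Zarr assets are named with a `.zarr` or `.ngff` suffix (`.ome.zarr` included); it does not query the API.

```python
from pathlib import PurePosixPath

# Assumption: Zarr assets carry a ".zarr" or ".ngff" directory suffix
# (an ".ome.zarr" name ends in ".zarr", so it is covered too).
ZARR_SUFFIXES = (".zarr", ".ngff")


def is_zarr_path(asset_path: str) -> bool:
    """Heuristically decide whether a Dandiset asset path names a Zarr."""
    # PurePosixPath drops a trailing "/" and .suffix returns only the last suffix
    return PurePosixPath(asset_path).suffix.lower() in ZARR_SUFFIXES
```

`is_zarr_path(filepath)` is `True` for the `.ome.zarr` path above, so the `HEAD` request on `base_download_url` would be skipped in favour of opening the Zarr as a store.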

Thank you.

@jwodder
Member

jwodder commented Aug 20, 2024

How are you even creating a Zarr in an embargoed Dandiset? dandi-cli currently doesn't support that, and — last time I checked — neither does the Archive.

As to your second snippet, base_download_url and download_url are useless for Zarrs, as they normally point to a single file, but a Zarr is many files. What exactly were you expecting to happen there?
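To make the "many files" point concrete: in the Zarr v2 layout (which OME-Zarr 0.4 builds on), every chunk of an array is stored as a separate object next to a `.zarray` metadata file. The sketch below assumes the default `.` dimension separator and an array with at least one dimension, and lists the chunk object names for a given shape and chunk size.

```python
import itertools
import math


def zarr_v2_chunk_keys(shape, chunks, separator="."):
    """List the chunk object names of a Zarr v2 array, relative to the array.

    Each chunk is its own object, named by its grid index joined with
    `separator` ("." by default, "/" for nested stores). Assumes ndim >= 1.
    """
    # Number of chunks along each axis (partial edge chunks count as one)
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return [
        separator.join(map(str, idx))
        for idx in itertools.product(*(range(n) for n in grid))
    ]
```

For example, an array of shape `(100, 100, 100)` chunked as `(10, 10, 10)` already has 1,000 chunk objects, so there is no single file for one download URL to point at.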

@yarikoptic
Member

Indeed, I thought zarrbargo is yet to be implemented, correct @jjnesbitt ?

@jjnesbitt
Member

> Indeed, I thought zarrbargo is yet to be implemented, correct @jjnesbitt ?

Correct, it is not yet implemented.

@kabilar
Member Author

kabilar commented Aug 20, 2024

Thanks team. Sorry, I was a bit loose with my explanation. The first code snippet does not work on LINC, which requires authentication for all requests since the platform is private. (LINC does allow upload and download of private Zarrs.) The second code snippet works on neither DANDI nor LINC, and, as you mentioned, it may not be needed for my use case.

Overall I am just looking for advice on how we should provide LINC users read/streaming access to private Zarrs. We were thinking about creating a helper function (get_read_only_credentials) to work with s3fs as shown below.

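import lincbrain
import s3fs

# Proposed helper (does not exist yet); would return temporary read-only AWS credentials
aws_credentials = lincbrain.get_read_only_credentials(lincbrain_api_key="<lincbrain_api_key>")

s3 = s3fs.S3FileSystem(key=aws_credentials['access_key'],
                       secret=aws_credentials['secret_key'],
                       token=aws_credentials['session_token'])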
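One way the proposed `get_read_only_credentials` could be backed on the server side is with AWS STS temporary credentials issued under a session policy that limits them to reading one Zarr prefix. The sketch below is an assumption, not LINC's or DANDI's implementation: it builds such a policy and maps an STS-style `Credentials` dict (`AccessKeyId`, `SecretAccessKey`, `SessionToken`, `Expiration`) onto the `key`/`secret`/`token` arguments of `s3fs.S3FileSystem`. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def read_only_session_policy(bucket: str, prefix: str) -> dict:
    """Hypothetical IAM session policy: list and read objects under one prefix only."""
    prefix = prefix.strip("/") + "/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read individual objects (Zarr chunks and metadata files)
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {   # list keys, but only beneath the granted prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


def sts_to_s3fs_kwargs(credentials: dict) -> dict:
    """Map an STS-style 'Credentials' dict onto s3fs.S3FileSystem keyword arguments."""
    return {
        "key": credentials["AccessKeyId"],
        "secret": credentials["SecretAccessKey"],
        "token": credentials["SessionToken"],
    }


def is_expired(credentials: dict, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the credentials expire within `margin` (Expiration is a tz-aware datetime)."""
    return credentials["Expiration"] - margin <= datetime.now(timezone.utc)
```

With boto3, the policy would be passed as a JSON string, e.g. `sts.assume_role(RoleArn=..., RoleSessionName=..., Policy=json.dumps(policy))`, and `s3fs.S3FileSystem(**sts_to_s3fs_kwargs(resp["Credentials"]))` would then replace the three explicit keyword arguments. Whether every listing call s3fs makes stays within the `s3:prefix` condition has not been verified here.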

@kabilar
Member Author

kabilar commented Aug 20, 2024

And thanks for clarifying. I forgot that zarrbargo has not yet been implemented.

@satra
Member

satra commented Aug 20, 2024

The 000108 examples in the example-notebooks repository read and evaluate Zarr objects on dandiarchive.

@kabilar
Member Author

kabilar commented Aug 20, 2024

Thank you. I will take a look at these examples.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants