failed query for composite content iri [tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT] #267

Closed
jhpoelen opened this issue Nov 8, 2023 · 2 comments

jhpoelen commented Nov 8, 2023

Preston uses Apache Commons Virtual File System (VFS) file path notation to point into (compressed) archives.
For example,

tar:https://example.org/file.tar!/somefile.txt

references a file, somefile.txt, inside a tarball at https://example.org/file.tar.

For gzipped archives, VFS allows both

tar:gz:https://example.org/file.tar.gz!/file.tar!/somefile.txt

and

tar:gz:https://example.org/file.tar.gz!/somefile.txt

But when used in Preston, for some reason,

tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT

does not resolve, even though the content id and file path exist.

observed using:

Caused by: java.io.IOException: cannot find content identified by [<tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT>]
	at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:74)
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23)
	... 34 more
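
To make the notation above concrete, here is a minimal sketch of how a composite IRI of this shape could be taken apart into its wrapper schemes, the wrapped resource, and the `!/`-separated paths. The class `CompositeIri`, its fields, and the list of recognised wrapper schemes are assumptions made for illustration; this is not Preston's or Commons VFS's actual parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not Preston's or Commons VFS's parser.
final class CompositeIri {
    // Wrapper schemes this sketch recognises (an assumption; VFS supports more).
    private static final List<String> WRAPPERS = Arrays.asList("tar", "gz", "zip", "bz2");

    final List<String> schemes; // wrapper schemes in the order written, e.g. [tar, gz]
    final String base;          // the wrapped resource, e.g. hash://sha256/...
    final List<String> paths;   // the "!/"-separated paths, in the order written

    private CompositeIri(List<String> schemes, String base, List<String> paths) {
        this.schemes = schemes;
        this.base = base;
        this.paths = paths;
    }

    static CompositeIri parse(String iri) {
        List<String> schemes = new ArrayList<>();
        String rest = iri;
        String wrapper;
        // Peel off leading wrapper schemes such as "tar:" and "gz:".
        while ((wrapper = leadingWrapper(rest)) != null) {
            schemes.add(wrapper);
            rest = rest.substring(wrapper.length() + 1);
        }
        // The text before the first "!/" is the base; the remaining pieces are paths.
        String[] parts = rest.split("!/");
        List<String> paths = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
        return new CompositeIri(schemes, parts[0], paths);
    }

    // Returns the wrapper scheme that prefixes s, or null if there is none.
    private static String leadingWrapper(String s) {
        for (String w : WRAPPERS) {
            if (s.startsWith(w + ":")) {
                return w;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CompositeIri c = parse("tar:gz:https://example.org/file.tar.gz!/file.tar!/somefile.txt");
        System.out.println(c.schemes + " " + c.base + " " + c.paths);
    }
}
```

Note that the wrapper prefixes are stripped by prefix matching rather than by splitting on `:`, so the colon inside `https://` or `hash://` is left alone.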

jhpoelen commented Nov 8, 2023

Root cause was an invalid use of a split method.
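
This comment does not say which split call was at fault, so the following is only a hypothetical illustration of how splitting can go wrong with composite IRIs, not the code that was changed in Preston. It shows that an unbounded `String.split(":")` also cuts through the colon in `hash://`, while passing a limit keeps the wrapped resource intact. The class and method names, and the shortened hash, are made up for the example.

```java
// Hypothetical illustration; not the code that was changed in Preston.
final class SplitPitfall {
    // Naive: split on every ':' and take the last piece as the wrapped resource.
    static String naiveBase(String iri) {
        String[] parts = iri.split(":");
        return parts[parts.length - 1];
    }

    // Bounded: split at most `wrappers` times, so later ':' characters stay in the last piece.
    static String boundedBase(String iri, int wrappers) {
        String[] parts = iri.split(":", wrappers + 1);
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String iri = "tar:gz:hash://sha256/abc!/a.tar!/b.txt"; // shortened, made-up hash
        System.out.println(naiveBase(iri));      // the "hash:" prefix is lost
        System.out.println(boundedBase(iri, 2)); // the wrapped IRI and paths stay intact
    }
}
```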

jhpoelen changed the title failed query for composite content iri [tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT] failed query for composite content iri [tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT] Nov 8, 2023
jhpoelen pushed a commit to globalbioticinteractions/nomer that referenced this issue Nov 8, 2023

jhpoelen commented Nov 8, 2023

Resolved in Preston v0.7.7.

jhpoelen closed this as completed Nov 8, 2023