ListObjects performs un-needed recursive listing on bucket with filesystem provider #575

Closed
juangburgos opened this issue Nov 16, 2023 · 6 comments


@juangburgos

Awesome software, just one detail:

When a ListObjects request uses / as the delimiter, the filesystem provider should only list the immediate children of the requested prefix. Instead, it recursively lists the entire subtree of the bucket (the filesystem folder), which is unnecessary and very expensive for deep trees with many files.

In the logs we see one entry per file in the subtree, recursively:

[s3proxy] D [timestamp] S3Proxy-Jetty-40 o.j.b.config.LocalBlobStore:56 |::] Opening blob in container: [file]

Any S3 client times out when the subtree is large.
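
To illustrate the behaviour the report asks for, here is a minimal sketch of a delimiter-style listing that reads only one directory level. It is not S3Proxy or jclouds code; `list_with_delimiter` is a hypothetical helper, and it assumes the bucket is a plain local directory.

```python
import os
import tempfile


def list_with_delimiter(root, prefix=""):
    """Return (keys, common_prefixes) for one level under root/prefix.

    Hypothetical helper: it never descends into sub-directories, so the
    cost depends on the size of one directory, not of the whole subtree.
    """
    base = os.path.join(root, *prefix.split("/")) if prefix else root
    keys, common_prefixes = [], []
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_dir():
                # A directory becomes a common prefix ending in the "/" delimiter.
                common_prefixes.append(prefix + entry.name + "/")
            else:
                keys.append(prefix + entry.name)
    return sorted(keys), sorted(common_prefixes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bucket:
        os.makedirs(os.path.join(bucket, "some-folder", "deep"))
        open(os.path.join(bucket, "some-file.txt"), "w").close()
        open(os.path.join(bucket, "some-folder", "deep", "nested.txt"), "w").close()
        # Only the top level is read; nested.txt is never visited.
        print(list_with_delimiter(bucket))
```

A recursive walk would instead touch every file below the prefix, which matches the one-log-entry-per-file pattern described above.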

@gaul
Owner

gaul commented Nov 17, 2023

Duplicate of #473. You can work around this by setting jclouds.version to 2.6.0-SNAPSHOT.

gaul closed this as completed Nov 17, 2023
@juangburgos
Author

Sorry, I did not know it was a duplicate. I am not knowledgeable in Java; how and where would I change this parameter? In s3proxy.conf, I guess? I will try tonight. Thanks!

@gaul
Owner

gaul commented Nov 17, 2023

You need to edit pom.xml and compile S3Proxy with mvn package.

@juangburgos
Author

juangburgos commented Nov 17, 2023

Thanks, that worked. The tests are failing, though, so I had to run:

mvn package -DskipTests

@juangburgos
Author

Now the problem is that on Windows the returned sub-folders have a trailing \ instead of /, for example:

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>converted</Name>
  <Prefix/>
  <KeyCount>2</KeyCount>
  <MaxKeys>1000</MaxKeys>
  <ContinuationToken/>
  <StartAfter/>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>some-file.txt</Key>
    <LastModified>2020-05-01T23:37:35Z</LastModified>
    <Size>693</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <CommonPrefixes>
    <Prefix>some-folder\</Prefix>
  </CommonPrefixes>
</ListBucketResult>

An S3 client will then use this prefix to query the subfolder, and everything breaks from then on.
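
The mismatch can be sketched as a missing conversion from the native path separator to the S3 delimiter. The snippet below is only an illustration under that assumption, not the actual jclouds code path; `to_s3_key` and its `sep` parameter are hypothetical names.

```python
import os


def to_s3_key(native_path, sep=os.sep):
    """Convert a native relative path into an S3-style key.

    Hypothetical helper. The separator is replaced only when it differs
    from "/", because on POSIX systems a backslash is a legal character
    inside a file name and must be left alone.
    """
    if sep == "/":
        return native_path
    return native_path.replace(sep, "/")


if __name__ == "__main__":
    # Simulate Windows by passing the separator explicitly.
    print(to_s3_key("some-folder\\", sep="\\"))
    print(to_s3_key("some-folder\\child\\file.txt", sep="\\"))
```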

@juangburgos
Author

I have managed to make it work by adding the following to my reverse proxy configuration:

AddOutputFilterByType SUBSTITUTE application/xml
Substitute "s|\|/|n"

This replaces \ with / in XML responses, and now it is usable 👍
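
A blanket substitution rewrites every backslash in the response body. If the rewrite were done in a small script instead, it could be limited to the elements that carry keys. The following is a sketch under that assumption, using only the Python standard library; `fix_listing_separators` is a hypothetical name, and the set of elements to rewrite (`Key` and `Prefix`) is a guess based on the response shown earlier in this thread.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"
# Serialize the S3 namespace as the default namespace instead of "ns0:".
ET.register_namespace("", S3_NS)

# Assumption: only these elements contain object keys or prefixes.
KEY_TAGS = {f"{{{S3_NS}}}Key", f"{{{S3_NS}}}Prefix"}


def fix_listing_separators(xml_text):
    """Replace backslashes with "/" inside Key and Prefix elements only."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in KEY_TAGS and elem.text:
            elem.text = elem.text.replace("\\", "/")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        f'<ListBucketResult xmlns="{S3_NS}">'
        "<Name>converted</Name>"
        "<CommonPrefixes><Prefix>some-folder\\</Prefix></CommonPrefixes>"
        "</ListBucketResult>"
    )
    print(fix_listing_separators(sample))
```

Other elements, such as `Name`, are left untouched, which is the main difference from substituting across the whole document.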
