BUG: Multipart upload: mtime difference between storage and lakeFS can be substantial #8303

Closed
N-o-Z opened this issue Oct 23, 2024 · 1 comment · Fixed by #8311

Comments

N-o-Z (Member) commented Oct 23, 2024

For example, in S3 the mtime is set when the multipart upload is created, while the lakeFS mtime is set when the multipart upload completes.
For a long-running upload, this can leave a large gap between the S3 mtime and the lakeFS mtime.

We need a generic solution that is valid for all storage adapters.
Possible solution:
Upon CompleteMultipartUpload, stat the object on the blockstore and use its mtime to create the lakeFS entry.

To test this properly, we should consider adding a head-object interface to our block adapter.

N-o-Z added the bug and area/block-adapter labels Oct 23, 2024
arielshaqed self-assigned this Oct 27, 2024
arielshaqed (Contributor) commented:

Fortunately we can do this: GCS and Azure return this information. S3 does not, but we already headObject the generated object to get its ETag, so the Last-Modified time comes for free (and is guaranteed to be found).

We probably also want to straighten this out for put-object: any difference can be unpleasant for presigned URLs, and is generally confusing.
