Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use time of create MPU on backend storage as time of MPU #8311

Merged
merged 5 commits into from
Oct 30, 2024

Conversation

arielshaqed
Copy link
Contributor

@arielshaqed arielshaqed commented Oct 27, 2024

This time is either reported by the block adapter during completion, or it can be read by stat'ting the underlying object after complete-MPU.

Fixes #8303.

@arielshaqed arielshaqed added bug Something isn't working area/API Improvements or additions to the API area/gateway Changes to the gateway include-changelog PR description should be included in next release changelog labels Oct 27, 2024
@arielshaqed arielshaqed force-pushed the bug/8303-use-adapter-time-for-mpu branch 3 times, most recently from 84c7bbd to 2ad899c Compare October 27, 2024 16:04
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link

E2E Test Results - Quickstart

11 passed

This time is either reported by the block adapter during creation, or it can
be read by stat'ting the underlying object after complete-MPU.  Use the
latter method on S3.
@arielshaqed arielshaqed force-pushed the bug/8303-use-adapter-time-for-mpu branch 6 times, most recently from 7f17008 to bb70230 Compare October 29, 2024 08:08
Azure and GCS do _not_ use the time of create-MPU.  The important thing is
that the two share the same time.

Anything else confuses not only people but also programs.  For instance
DataBricks with presigned URLs can get confused by getting one time from
lakeFS and another from underlying storage.
@arielshaqed arielshaqed force-pushed the bug/8303-use-adapter-time-for-mpu branch from bb70230 to a732383 Compare October 29, 2024 08:20
@arielshaqed arielshaqed marked this pull request as ready for review October 29, 2024 08:34
@arielshaqed
Copy link
Contributor Author

Not a customer issue

This does not match any report by a customer of an inconsistency. It's just a relevant cleanup.

@arielshaqed arielshaqed requested a review from N-o-Z October 29, 2024 08:35
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Very elegant - some minor comments

@@ -384,6 +384,12 @@ func (c *Controller) CompletePresignMultipartUpload(w http.ResponseWriter, r *ht

writeTime := time.Now()
checksum := httputil.StripQuotesAndSpaces(mpuResp.ETag)
if mpuResp.MTime != nil {
// Anything else can be _really_ wrong when the storage layer assigns the time
// of MPU creation. For instance, the S3 block adapter takes sure to return an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// of MPU creation. For instance, the S3 block adapter takes sure to return an
// of MPU creation. For instance, the S3 block adapter makes sure to return an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -384,6 +384,12 @@ func (c *Controller) CompletePresignMultipartUpload(w http.ResponseWriter, r *ht

writeTime := time.Now()
checksum := httputil.StripQuotesAndSpaces(mpuResp.ETag)
if mpuResp.MTime != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put line 387 right above this line

@@ -309,7 +309,8 @@ func handlePut(w http.ResponseWriter, req *http.Request, o *PathOperation) {
// write metadata
metadata := amzMetaAsMetadata(req)
contentType := req.Header.Get("Content-Type")
err = o.finishUpload(req, blob.Checksum, blob.PhysicalAddress, blob.Size, true, metadata, contentType)
// BUG(ariels): Read MTime from upload!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still haven't decided if this should be fixed or not - anyway it's not related to the changes in this PR (let's track it in an issue instead?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -779,6 +779,7 @@ func (a *Adapter) CompleteMultiPartUpload(ctx context.Context, obj block.ObjectP
etag := strings.Trim(aws.ToString(resp.ETag), `"`)
return &block.CompleteMultiPartUploadResponse{
ETag: etag,
MTime: headResp.LastModified,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-)


// On S3 the object should have Last-Modified time at around time of MPU creation.
// Ensure lakeFS fails the test if it fakes it by using the current time.
time.Sleep(2 * timeResolution)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never too happy about hardcoded sleeps in tests but seems inevitable.
Perhaps scope it to blockstoreType == s3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand. Don't like it but did it, thanks!

Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Pulling once Esti passes again...


// On S3 the object should have Last-Modified time at around time of MPU creation.
// Ensure lakeFS fails the test if it fakes it by using the current time.
time.Sleep(2 * timeResolution)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand. Don't like it but did it, thanks!

@@ -384,6 +384,12 @@ func (c *Controller) CompletePresignMultipartUpload(w http.ResponseWriter, r *ht

writeTime := time.Now()
checksum := httputil.StripQuotesAndSpaces(mpuResp.ETag)
if mpuResp.MTime != nil {
// Anything else can be _really_ wrong when the storage layer assigns the time
// of MPU creation. For instance, the S3 block adapter takes sure to return an
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -779,6 +779,7 @@ func (a *Adapter) CompleteMultiPartUpload(ctx context.Context, obj block.ObjectP
etag := strings.Trim(aws.ToString(resp.ETag), `"`)
return &block.CompleteMultiPartUploadResponse{
ETag: etag,
MTime: headResp.LastModified,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-)

@@ -309,7 +309,8 @@ func handlePut(w http.ResponseWriter, req *http.Request, o *PathOperation) {
// write metadata
metadata := amzMetaAsMetadata(req)
contentType := req.Header.Get("Content-Type")
err = o.finishUpload(req, blob.Checksum, blob.PhysicalAddress, blob.Size, true, metadata, contentType)
// BUG(ariels): Read MTime from upload!
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@arielshaqed arielshaqed enabled auto-merge (squash) October 30, 2024 08:35
@arielshaqed arielshaqed force-pushed the bug/8303-use-adapter-time-for-mpu branch from edf1449 to 5dd27ca Compare October 30, 2024 15:16
@arielshaqed arielshaqed merged commit c2e44a0 into master Oct 30, 2024
38 checks passed
@arielshaqed arielshaqed deleted the bug/8303-use-adapter-time-for-mpu branch October 30, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API Improvements or additions to the API area/gateway Changes to the gateway bug Something isn't working include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Multipart upload: mtime difference between storage and lakeFS can be substantial
2 participants