Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload manager does not re-use buffer pools across requests #2036

Closed
robin865 opened this issue Jul 6, 2018 · 2 comments · Fixed by #2863
Closed

Upload manager does not re-use buffer pools across requests #2036

robin865 opened this issue Jul 6, 2018 · 2 comments · Fixed by #2863
Labels
feature-request A feature should be added or improved. performance

Comments

@robin865
Copy link

robin865 commented Jul 6, 2018

So noticed this while profiling memory usage when using the upload manager for small objects.

I have a part size of 64 MB and concurrency set to 2 for my single instance of the upload manager, since I don't want to use multipart for objects less than this size. However, I'd like to just use a single interface for all my s3 requests (big and small). The problem is that using this instance of the upload manager uses a very large amount of memory. For instance, when I perform 200,000 30 KB PUTs using this single upload manager (only doing 2 concurrent manager.Upload() calls at a time); I end up consuming 3.5 GB of memory while these requests are going on.

Looking into it further, I see that in upload.go, the "func (u *uploader) init() {" function reassigns u.bufferPool. And init() is called per-request. Hence there is no sharing of buffer pools across requests. I am wondering if this is intentional? It seems like it would make sense to have a shared buffer pool per-upload manager used by all requests? Currently the buffer pool only is used within a request.

I guess in general, I am wondering what the recommended use of s3manager.Uploader? Is it meant for this sort of workload?

Cheers,

Robin M

@xibz
Copy link
Contributor

xibz commented Jul 11, 2018

Hello @Rsm10, thank you for reaching out to us. I believe there is a couple reasons to why a lot of memory is being used. The first being that the marshalers take a lot of memory due to the use of reflection, and secondly the lack of pooling as you had stated. I'll go ahead and mark this as a feature request. If you have any additional questions, please let us know!

@xibz xibz added the feature-request A feature should be added or improved. label Jul 11, 2018
@jasdel jasdel added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jul 12, 2018
@jasdel
Copy link
Contributor

jasdel commented Jul 12, 2018

I think there might be something else at play here. I updated the s3manager.Uploader to persist a sync.Pool across all Upload calls, but was not able to see a meaningful difference.

@Rsm10 could you provide more information about how the Uploader is being used, e.g the type of value used for the UploadInput.Body parameter. Is the upload type a os.File, bytes.Buffer, bytes.Reader, etc? If you're able to profile the issue you're experiencing in more detail it would significantly help investigate this issue further.

I used the following to benchmark the Upload: (I realized the previous version of my tested had a bug causing it to have a empty body for all but first request, but after fixing still not a noticeable difference)

// go test -bench=BenchmarkUpload -benchmem -benchtime 10s -run NONE -cpu 1,2,4 ./service/s3/s3manager/
func BenchmarkUpload(b *testing.B) {
	createUploadBody := []byte(`
<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>Bucket</Bucket>
  <Key>Key</Key>
  <UploadId>uploadid</UploadId>
</InitiateMultipartUploadResult>
`)

	completeUploadBody := []byte(`
<CompleteMultipartUpload>
  <Part>
  </Part>
</CompleteMultipartUpload>
`)

	svc := s3.New(unit.Session)
	svc.Handlers.Send.Swap(corehandlers.SendHandler.Name, request.NamedHandler{
		Name: "mock.send",
		Fn: func(r *request.Request) {
			io.CopyN(ioutil.Discard, r.HTTPRequest.Body, r.HTTPRequest.ContentLength)
			req := r.HTTPRequest
			var body []byte
			switch {
			case strings.Contains(req.URL.RawQuery, "uploads") && req.Method == "POST":
				body = createUploadBody
			case strings.Contains(req.URL.RawQuery, "uploadId") && req.Method == "POST":
				body = completeUploadBody
			}
			req.Body.Close()
			respHeader := &http.Header{}
			respHeader.Set("Content-Length", strconv.Itoa(len(body)))
			r.HTTPResponse = &http.Response{
				Status:        "OK",
				StatusCode:    200,
				Header:        http.Header{},
				ContentLength: int64(len(body)),
				Body:          ioutil.NopCloser(bytes.NewReader(body)),
			}
		},
	})
	u := s3manager.NewUploaderWithClient(svc)

	bodyBuf := make([]byte, 30*1024)
	b.ResetTimer()

	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_, err := u.Upload(&s3manager.UploadInput{
				Bucket: aws.String("Bucket"),
				Key:    aws.String("Key"),
				// Use a Bytes Buffer since it is not seekable.
				Body: bytes.NewBuffer(bodyBuf),
			})
			if err != nil {
				b.Fatalf("expect no error, got %v", err)
			}
		}
	})
}

@robin865
Copy link
Author

UploadInput.Body is being supplied the body from a get response (GetObjectOutput.Body).

Here is the pprof output (-alloc_space) after running one of my tests (200,000 30 KB objects):

Showing top 10 nodes out of 14
flat flat% sum% cum cum%
475073MB 98.28% 98.28% 475073MB 98.28% github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).init.func1
2668.20MB 0.55% 98.83% 2782.74MB 0.58% main
2.50MB 0.00052% 98.83% 475242.16MB 98.31% uploader
1.50MB 0.00031% 98.83% 475239.66MB 98.31% github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader.UploadWithContext
1.50MB 0.00031% 98.83% 475611.50MB 98.39% replicator
0 0% 98.83% 475077.02MB 98.28% github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).nextReader
0 0% 98.83% 475238.16MB 98.31% github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).upload
0 0% 98.83% 475239.66MB 98.31% github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader.Upload
0 0% 98.83% 475613MB 98.39% main
0 0% 98.83% 475613MB 98.39% main.run.func5

and without alloc_space:

(pprof) top1
Showing top 1 nodes out of 54
flat flat% sum% cum cum%
640MB 83.16% 83.16% 640MB 83.16% github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).init.func1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants