CRT client not respecting pod memory limits #4034

Closed
Lunkers opened this issue May 24, 2023 · 24 comments
Labels
bug This issue is a bug. closed-for-staleness crt-client p1 This is a high priority issue transfer-manager

Comments

@Lunkers commented May 24, 2023

Describe the bug

When uploading large files using the CRT S3 client, Kubernetes memory restrictions do not appear to be respected.
When uploading a large file in a Kubernetes pod using the code snippet from #4033, a number of anonymous file-read processes are spawned that ignore the pod's memory limits, attempting to use more memory than the pod is allowed.

Expected Behavior

The SDK upload processes should not consume all available RAM in the pod, and should respect Kubernetes limits.

Current Behavior

The pod consumes more and more memory for each upload created, rarely freeing memory. Sooner or later, Kubernetes kills the pod for using too many resources. The provided screenshot shows a pod with an 8GB memory limit:
[Screenshot: 2023-05-24 at 14:50:17]

Reproduction Steps

Use the transfer manager created in the snippet provided in #4033 to upload a large file (roughly 80-100 GB) in a Kubernetes pod.

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.20.67

JDK version used

11

Operating System and version

Ubuntu 22.04 (Jammy Jellyfish)

@Lunkers Lunkers added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 24, 2023
@debora-ito (Member)

Question, as I'm not very familiar with Kubernetes:

The SDK upload processes should not consume all available RAM in the pod, and respect Kubernetes limits.

How is this limit set? Is it a "virtual" limit?

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 6, 2023
@debora-ito debora-ito self-assigned this Jun 6, 2023
@SriDeepa-s3

I am getting a similar issue as well.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jun 10, 2023
@Lunkers (Author) commented Jun 12, 2023

@debora-ito I'm not a Kubernetes expert either, but as far as I know resource allocation is done using cgroups. Our infrastructure team theorizes that the CRT client may not respect cgroups correctly.

@SriDeepa-s3 We've managed to circumvent this by using a normal AsyncClient instead for S3 uploads, which has solved the issue.
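
For reference, the non-CRT path looks roughly like this (a minimal sketch; the region and other options here are placeholders, not our exact configuration):

    S3AsyncClient s3 = S3AsyncClient.builder()
            .region(Region.EU_WEST_1)   // placeholder region
            .build();
    // use this client for the uploads instead of S3AsyncClient.crtBuilder()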

@SriDeepa-s3

@debora-ito: is there any reason why the CRT client is not bound by the memory limit?

@debora-ito (Member)

@Lunkers noted. We'll investigate.

@debora-ito debora-ito added crt-client p2 This is a standard priority issue labels Jun 16, 2023
@debora-ito debora-ito removed their assignment Jun 20, 2023
@SriDeepa-s3 commented Jun 21, 2023

@debora-ito: any update on the above issue?

@graebm (Contributor) commented Jun 26, 2023

I started investigating this last week. I can reproduce this issue on my laptop when my internet speeds are much lower than the CRT S3 client's targetThroughputInGbps setting (it's 10Gbps by default, which is much higher than my home internet).

The CRT S3 client does use memory outside the JVM, so it is able to exceed Java's normal Runtime.maxMemory() (aka -XX:MaxHeapSize). The CRT S3 client's only real tuning knob right now is targetThroughputInGbps. That leads to an opaque series of calculations for how much concurrency is ideal, but based on my back-of-the-envelope math it should be using much less than 1 GiB of memory.

Something weird is definitely going on. The CRT's memory usage climbs well over 1 GiB in the first 60 seconds of my upload, before coming back down and settling well below 1 GiB for the remainder of the upload.

I'm still investigating; this is unacceptable.
But in the meantime you could lower targetThroughputInGbps.
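
For example (a sketch only; the 2.5 value is just an illustration, pick something close to your actual link speed):

    S3AsyncClient s3 = S3AsyncClient.crtBuilder()
            .targetThroughputInGbps(2.5) // default is 10.0, far above most home or pod network speeds
            .build();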

@graebm (Contributor) commented Jun 27, 2023

Actually, I'm not reproducing this. When my memory usage climbed over 1 GiB, I had been messing around and set targetThroughputInGbps=100, much higher than the default of 10.

With the default targetThroughputInGbps=10, memory usage stays solidly below 1 GiB (the JVM using 300 MiB, the CRT using 500 MiB).

@graebm (Contributor) commented Jun 27, 2023

I guess we'll need a more reliable repro case...

Are you certain there's only 1 instance of the CRT S3 client running?

You can see how much native memory the CRT S3 client is using by running with the -Daws.crt.memory.tracing=1 system property set, and then calling software.amazon.awssdk.crt.CRT.nativeMemory() to see the current usage (in bytes).

FWIW, you can see the JVM's memory usage by calling Runtime.getRuntime().totalMemory().
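
A minimal sketch of sampling both numbers together (assuming the JVM was started with -Daws.crt.memory.tracing=1):

    long crtNativeBytes = software.amazon.awssdk.crt.CRT.nativeMemory(); // native memory currently held by the CRT
    long jvmHeapBytes = Runtime.getRuntime().totalMemory();              // memory currently reserved for the JVM heap
    System.out.println("CRT native: " + crtNativeBytes + " B, JVM heap: " + jvmHeapBytes + " B");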

@SriDeepa-s3 commented Jun 28, 2023

Our scenario: we run a consumer that listens to one topic and uploads the input stream to S3.
For every request we create a new client with the user's credentials and close the client object after each upload.

Sample code in #4094.

Is there anything I am missing here on close?
I will provide an update on CRT and JVM memory usage as suggested.

@graebm (Contributor) commented Jun 28, 2023

If possible, I would revamp your code to only have 1 instance of the S3Client. The S3Client is built with the intention of being a singleton, handling multiple requests at once in an intelligent way that doesn't use too many resources. But if you have N instances, they will use N times the system resources.
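
A sketch of the singleton pattern (class and field names here are illustrative):

    final class S3Clients {
        // One shared client for the whole process; close it only at application shutdown.
        static final S3AsyncClient CRT_CLIENT = S3AsyncClient.crtBuilder().build();

        private S3Clients() {}
    }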

@SriDeepa-s3 commented Jul 3, 2023 via email

@neel1298

@debora-ito I'm not a Kubernetes expert either, but as far as I know resource allocation is done using cgroups. Our infrastructure team theorizes that the CRT client may not respect cgroups correctly.

@SriDeepa-s3 We've managed to circumvent this by using a normal AsyncClient instead for S3 uploads, which has solved the issue.

Hello, we are also facing similar issues, but with downloading. Can you tell me what you mean by normal AsyncClients?
Did you use S3AsyncClient.builder() instead of S3AsyncClient.crtBuilder()?
And did you see any major performance drops when using the normal client?

@graebm (Contributor) commented Jul 14, 2023

Is there a way to create the client once and override the credentials?

Darn, it's not currently possible in aws-sdk-java-v2. It's possible in the underlying native code to use different buckets and credentials per request, but that configuration still needs to be exposed at this level. Sorry. The team has this task in their backlog...

@dfinucane

This definitely seems like a bug in the CRT async client, because I'm using it with a single instance shared across the process, and native memory use (non-JVM heap) soars to 11 or even 14 GB during single-file uploads. I'm going to switch to the non-CRT async client until this gets resolved. I have the same problem where I'm running inside Kubernetes, which has a limit of 6 GB, and the pod is getting terminated for far exceeding that limit while the JVM heap hovers around 2 GB.

@dfinucane

For me, switching to the S3AsyncClient.builder().build() client did not resolve the issue. I'm using SDK version 2.20.103, and regardless of which client I use, many gigabytes of native (non-JVM heap) memory get used, the process exceeds its limit, and it is terminated. I thought the S3AsyncClient.builder().build() client was implemented in Java and therefore used the JVM heap, but that does not appear to be the case, because the leak I'm experiencing is not in the JVM heap.

@dfinucane

Upon further review of my situation, I do not believe I was experiencing a problem with S3TransferManager or the CRT async client. I wanted to point that out so that nobody wastes time researching a problem based on my last two comments.
I was misreading metrics and now believe that the spikes I was seeing were the sum of the memory used by the pod being terminated and the pod taking its place.
I believe my problem was the result of not leaving enough room for non-JVM-heap memory: my pod was allowed 6 GB and my heap was allowed to consume up to 75% of that, because I had used -XX:+UseContainerSupport with -XX:MaxRAMPercentage=75.0.
Sorry if I caused confusion with this ticket.
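
For anyone hitting the same configuration trap, the arithmetic is roughly: with a 6 GB pod limit, -XX:MaxRAMPercentage=75.0 allows the heap alone to grow to about 4.5 GB, leaving only around 1.5 GB for everything outside the heap (CRT native memory, direct buffers, metaspace, thread stacks). Lowering the percentage leaves more headroom for native allocations, for example (illustrative values only, app.jar is a placeholder):

    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -jar app.jar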

@debora-ito debora-ito added p1 This is a high priority issue and removed p2 This is a standard priority issue labels Jul 25, 2023
@davidh44 (Contributor) commented Aug 4, 2023

Is there a way to create the client once and override the credentials?

Darn, it's not currently possible in aws-sdk-java-v2. It's possible in the underlying native code to use different buckets and credentials per request, but that configuration still needs to be exposed at this level. Sorry. The team has this task in their backlog...

You can configure credentials per-request using AwsRequestOverrideConfiguration:

PutObjectRequest request = PutObjectRequest.builder()
                                                   .bucket("bucket")
                                                   .key("key")
                                                   .overrideConfiguration(o -> o.credentialsProvider(DefaultCredentialsProvider.create()))
                                                   .build();

To handle buckets in different regions than that of the S3Client, you can enable crossRegionAccessEnabled:

S3AsyncClient crossRegionS3Client = S3AsyncClient.crtBuilder()
                                                 .region(Region.EU_WEST_1)
                                                 .crossRegionAccessEnabled(true)
                                                 .build();

@rtjain21

Any updates on this issue?

@alexsander-terres

I am also wondering if there are any updates on this, since I was also experiencing memory leaks with crtBuilder.

@jassuncao

I experienced the same issue of a pod getting killed due to excessive memory usage.
This happens with both the CRT and the normal/pure Java client. In the case of the normal client, I suspect the cause is the use of direct memory buffers. I was using -Xms to limit the JVM memory usage, but this is not enough; I also had to set -XX:MaxDirectMemorySize.
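
For example (illustrative values, not a recommendation; note that -Xmx caps the maximum heap, while -Xms only sets its initial size):

    java -Xmx2g -XX:MaxDirectMemorySize=1g -jar app.jar

Without an explicit -XX:MaxDirectMemorySize, the direct-buffer limit defaults to roughly the maximum heap size, on top of the heap itself.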

@jason-weddington

@jassuncao Thanks for the information. We're working on additional configuration options to make this easier to manage and will update this issue when that update is released.

@jma-9code

I had a memory leak too with the CRT implementation (I was unable to control the Java native memory).

The only viable solution I've found for continuing to use high-level interfaces for multipart operations, especially for uploading files larger than 5 GB, is to use SDK v1.

@debora-ito (Member) commented Feb 7, 2024

We apologize for the long silence, we have some updates to share.

The CRT team made some changes in the CRT client core, released in aws-crt 0.29.7, and we observed improvements in the memory usage after running benchmarks using this new version.

We are also exposing a new attribute in the S3CrtAsyncClient that will provide more control over the utilized memory at the client level: maxNativeMemoryLimitInBytes. If a value is not provided, the CRT will attempt to limit native memory usage in an optimal way, based on parameters like target throughput. The new attribute was added via #4885 and will be available in today's release.

In summary:

  • Please upgrade to SDK version 2.23.5 (which includes aws-crt 0.29.7) or greater, and let us know if you see improvements regarding memory usage
  • Upgrade to version 2.23.20 if you want to provide a custom maxNativeMemoryLimitInBytes

Edit: updated with the version of the SDK that includes the new maxNativeMemoryLimitInBytes attribute.
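
A sketch of setting the new attribute (the 1 GiB value is only an example):

    S3AsyncClient s3 = S3AsyncClient.crtBuilder()
            .maxNativeMemoryLimitInBytes(1024L * 1024 * 1024) // example: cap CRT native memory at 1 GiB
            .build();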
