OOM on AKS (docker container) in version above 1.2.6 #432

Closed
ppiwow-apay opened this issue Mar 25, 2022 · 16 comments
Labels: bug, status-in_progress (Issue is worked on by the driver team)

Comments

@ppiwow-apay

Issue description

On the local machine everything works fine with every version, but when we deploy it as a Docker image and release it to AKS, creating a new SnowflakeDbConnection causes an OOM on the pod. From our investigation it seems the constructor of this class allocates a huge amount of memory.

Does this connector support usage in a Unix Docker image?

Everything works fine up to version 1.2.6.

Both .NET 5 and .NET 6 have this issue. If you need any additional information, let me know.

sfc-gh-jfan reopened this Jul 1, 2022
github-actions bot closed this as completed Jul 2, 2022
sfc-gh-jfan reopened this Jul 6, 2022
@jimhoonan

We are running into this issue as well. Had to revert to an older version of the driver. Very frustrating.

@ppiwow-apay
Author

Any information? Are you going to look at it at all?

@sfc-gh-wfateem
Collaborator

@ppiwow-apay None of the tests cover AKS, but I don't think that should be relevant. If there is a memory leak, it should have surfaced in other environments as well.
Do you happen to have a heap dump you can share that exhibits the memory leak you're reporting?

@sfc-gh-wfateem
Collaborator

@ppiwow-apay I also recognize that we have been late in responding to you, so sorry about that. Given that your post was back in March, can you clarify what version of the .NET driver you were using at the time, and whether you have been able to reproduce the same behavior with the latest version?

@ppiwow-apay
Author

The last version I tried was the newest one. I could try once more, but I need some time to update and get memory dumps. All versions above 1.2.6 (we updated from time to time) were affected.

Locally on the machine, or locally in Docker, everything seems to work fine; the issue only occurs on AKS.

@jblackburn21

jblackburn21 commented Sep 21, 2022

@sfc-gh-jfan We have seen some similar behavior. I have been testing this recently with v2.0.16 and Dapper 2.0.123 on .NET 6.

Our simple app spins up 20 tasks and runs the method below in a loop. When using Dapper, the memory usage of the app running in k8s more than doubled over 6 hours.

using Dapper;
using Snowflake.Data.Client;

public class SnowflakeRepository : IRepository
{
    public async Task<IList<Entity>> GetEntities(long entityId, CancellationToken token)
    {
        // A fresh connection is created, opened and disposed on every call.
        await using var conn = new SnowflakeDbConnection();
        conn.ConnectionString = "account=***;user=***;password=***;db=***;role=***;warehouse=***;schema=***";
        await conn.OpenAsync(token);

        var query = "select * from presentation.vw_entities where EntityId = :entityId;";

        // QueryAsync is Dapper's extension method over IDbConnection.
        var entityResults = await conn.QueryAsync<Entity>(query, new { entityId });

        return entityResults.ToList();
    }
}
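
For reference, a minimal sketch of the 20-task loop described above (a top-level .NET 6 program is assumed; the 1-second delay, the 6-hour window and the entity id are placeholders, not the exact test code):

using var cts = new CancellationTokenSource(TimeSpan.FromHours(6)); // roughly matches the observation window

IRepository repository = new SnowflakeRepository();

// Spin up 20 workers that call the repository in a loop until the window elapses.
var tasks = Enumerable.Range(0, 20).Select(async workerId =>
{
    while (!cts.IsCancellationRequested)
    {
        try
        {
            var entities = await repository.GetEntities(1, cts.Token);
            Console.WriteLine($"Worker {workerId}: fetched {entities.Count} entities");
            await Task.Delay(TimeSpan.FromSeconds(1), cts.Token);
        }
        catch (OperationCanceledException)
        {
            break; // test window elapsed
        }
    }
}).ToArray();

await Task.WhenAll(tasks);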

@ppiwow-apay
Author

Hi, in our tests we found that the memory leak occurs when just calling the constructor:

new SnowflakeDbConnection();

We ran tests that only created new connections, and that was enough to reproduce it.
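
A hedged sketch of what such a constructor-only test might look like (the iteration count and logging are illustrative, not the exact code used; a top-level .NET 6 program is assumed):

using Snowflake.Data.Client;

// Construct and dispose connections in a loop without ever opening them,
// then watch how the process memory behaves over time.
for (var i = 0; i < 100_000; i++)
{
    using var conn = new SnowflakeDbConnection();

    if (i % 10_000 == 0)
    {
        Console.WriteLine(
            $"Iteration {i}: managed {GC.GetTotalMemory(forceFullCollection: false) / 1024 / 1024} MB, " +
            $"working set {Environment.WorkingSet / 1024 / 1024} MB");
    }
}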

@jblackburn21

I can confirm what @ppiwow-apay is reporting. We trimmed down our sample to remove Dapper and use a minimal setup, and we are seeing the same behavior.

@mattcalt

I am seeing the exact same behavior. I stripped the application down to the basics and am seeing the same memory issues.

@ppiwow-apay
Author

Any news on that?

@sfc-gh-jtang

@sfc-gh-igarish any news on that?

@sfc-gh-igarish
Collaborator

There are a few questions:
It looks like garbage collection isn't working as usual. Can the customer check their garbage collector settings? And do you know how much free memory they have inside that Docker container?

I found something that may be related to this OOM issue:

dotnet/runtime#58974

According to that issue, the .NET 5 and 6 tests all pass with the workstation GC but fail with the server GC, so the customer may be hitting the same problem.
It also says that .NET 5 or higher needs more heap;

for example, the heap count mentioned there is 10, and they may have to increase it.

It also says around 3190 MB is needed for the GC tests to pass.

I hope this can help.

In the meantime we will continue looking into it.
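
In case it helps answer the questions above, here is a hedged sketch of how the GC mode and the memory visible to the runtime inside the container could be checked from the app itself (the output formatting is illustrative):

using System.Runtime;

// Reports whether the server or workstation GC is in use and how much memory the
// runtime believes is available; in a container this reflects the cgroup limits.
var memoryInfo = GC.GetGCMemoryInfo();

Console.WriteLine($"Server GC:           {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode:        {GCSettings.LatencyMode}");
Console.WriteLine($"Available memory:    {memoryInfo.TotalAvailableMemoryBytes / 1024 / 1024} MB");
Console.WriteLine($"High-load threshold: {memoryInfo.HighMemoryLoadThresholdBytes / 1024 / 1024} MB");
Console.WriteLine($"Heap size:           {memoryInfo.HeapSizeBytes / 1024 / 1024} MB");
Console.WriteLine($"Logical processors:  {Environment.ProcessorCount}");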

sfc-gh-igarish added the status-in_progress label May 26, 2023
@amis92

amis92 commented Jun 1, 2023

@sfc-gh-igarish - @ppiwow-apay and I work together.

We're using the workstation GC everywhere, so it's not an issue with the server GC. Various services have different RAM limits, but overall they're in the 256-512 MiB range.

Please keep in mind that 1.2.6 keeps working for us (no issues), but once we upgrade just this package (same runtime image), OOMs happen, so I wouldn't blame the runtime.

Edit: I don't think this was plainly stated anywhere earlier, but we're running the containers on Azure Kubernetes Service, on Linux worker nodes. The base image is mcr.microsoft.com/dotnet/aspnet:6.0, so the OS is Debian 11 on AMD64.

@amis92

amis92 commented Jun 1, 2023

There's a repro repository I made two years ago: https://github.com/amis92/net-snowflake-memoryleak

@sfc-gh-pbulawa
Collaborator

sfc-gh-pbulawa commented Jul 20, 2023

I have run the Docker images based on the repro repository locally and found no big differences between several driver versions. The 2.0.25 driver version was run on .NET 6, whereas the rest of the tests ran on .NET 5.

It seems to be rather consistent:

Using snowflake Snowflake.Data, Version=1.2.4.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.46 MB
- WorkingSet 24.41 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 119.64 MB
- WorkingSet 62.91 MB
- ManagedMemory 0.93 MB
Executing command
- PrivateMemory 119.70 MB
- WorkingSet 62.91 MB
- ManagedMemory 0.94 MB
Executed command
- PrivateMemory 120.65 MB
- WorkingSet 65.00 MB
- ManagedMemory 1.20 MB
Connection disposed.
- PrivateMemory 128.88 MB
- WorkingSet 65.00 MB
- ManagedMemory 1.26 MB

Using snowflake Snowflake.Data, Version=1.2.6.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.46 MB
- WorkingSet 24.31 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 119.84 MB
- WorkingSet 64.17 MB
- ManagedMemory 0.98 MB
Executing command
- PrivateMemory 119.89 MB
- WorkingSet 64.17 MB
- ManagedMemory 0.99 MB
Executed command
- PrivateMemory 122.94 MB
- WorkingSet 68.08 MB
- ManagedMemory 1.38 MB
Connection disposed.
- PrivateMemory 122.99 MB
- WorkingSet 68.30 MB
- ManagedMemory 1.44 MB

Using snowflake Snowflake.Data, Version=2.0.3.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.47 MB
- WorkingSet 24.27 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 131.25 MB
- WorkingSet 66.73 MB
- ManagedMemory 0.98 MB
Executing command
- PrivateMemory 131.29 MB
- WorkingSet 66.73 MB
- ManagedMemory 0.99 MB
Executed command
- PrivateMemory 132.24 MB
- WorkingSet 68.83 MB
- ManagedMemory 1.38 MB
Connection disposed.
- PrivateMemory 132.30 MB
- WorkingSet 68.83 MB
- ManagedMemory 1.44 MB

Using snowflake Snowflake.Data, Version=2.0.10.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.47 MB
- WorkingSet 24.38 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 131.71 MB
- WorkingSet 68.16 MB
- ManagedMemory 1.04 MB
Executing command
- PrivateMemory 131.75 MB
- WorkingSet 68.16 MB
- ManagedMemory 1.05 MB
Executed command
- PrivateMemory 135.72 MB
- WorkingSet 73.07 MB
- ManagedMemory 1.44 MB
Connection disposed.
- PrivateMemory 135.81 MB
- WorkingSet 73.30 MB
- ManagedMemory 1.49 MB

Using snowflake Snowflake.Data, Version=2.0.25.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 65.33 MB
- WorkingSet 22.90 MB
- ManagedMemory 0.09 MB
Building command
- PrivateMemory 135.98 MB
- WorkingSet 67.18 MB
- ManagedMemory 1.16 MB
Executing command
- PrivateMemory 136.00 MB
- WorkingSet 67.18 MB
- ManagedMemory 1.18 MB
Executed command
- PrivateMemory 137.11 MB
- WorkingSet 69.11 MB
- ManagedMemory 1.64 MB
Connection disposed.
- PrivateMemory 137.11 MB
- WorkingSet 69.11 MB
- ManagedMemory 1.64 MB
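
For reference, the PrivateMemory / WorkingSet / ManagedMemory figures above can be collected with a helper along these lines (a hedged sketch, not necessarily the exact harness from the repro repository):

using System.Diagnostics;

// Prints the three metrics shown in the log above at a labelled checkpoint.
static void ReportMemory(string checkpoint)
{
    using var process = Process.GetCurrentProcess();
    Console.WriteLine(checkpoint);
    Console.WriteLine($"- PrivateMemory {process.PrivateMemorySize64 / 1024d / 1024d:F2} MB");
    Console.WriteLine($"- WorkingSet {process.WorkingSet64 / 1024d / 1024d:F2} MB");
    Console.WriteLine($"- ManagedMemory {GC.GetTotalMemory(forceFullCollection: false) / 1024d / 1024d:F2} MB");
}

// e.g. ReportMemory("Opening connection"); before conn.OpenAsync(), and so on for each step.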

The OOM in AKS may be related to some kind of unexpected behavior within the dotnet runtime, even when using the workstation GC, as pointed out in dotnet/runtime#49317, which indicates that the GC does not work the way users expect.

One of the comments points out that switching to the Alpine variant of the dotnet image fixed the problem, whereas another comment advised tinkering with the GC settings.

I hope this can help you with your issue.

@sfc-gh-dszmolka
Contributor

Hey all - it seems the issue is not reproducible with recent versions of the Snowflake .NET driver, and it also seems to be related to unexpected behaviour of the runtime itself.

Therefore I'm now marking this issue as closed, but please feel free to comment if you have a reproduction scenario that provides evidence of a bug or unexpected behaviour in recent versions of the Snowflake .NET driver; I'll then reopen it and we can continue troubleshooting.

sfc-gh-dszmolka closed this as not planned Jul 24, 2023