Memory leaking from version 7? #4087

Open · norrs opened this issue Aug 20, 2019 · 9 comments

Comments

@norrs

norrs commented Aug 20, 2019

Issue Summary

After upgrading from Redash 5 to Redash 7, we noticed that the Redash web server leaks memory linearly over time, which it did not do on Redash 5.

Steps to Reproduce

Deployed Redash 7 in Kubernetes and watched the memory metrics.

Technical details:

  • Redash Version: redash/redash:7.0.0.b18042
  • How did you install Redash: Helm

See helm/charts#5071 (comment) for my upgrade notes on the existing merge request for a public Redash Helm chart for Kubernetes.

Last hours:
[screenshot: 2019-08-20-09-29-27-window-select]

Last 30 days (you can see when we did the upgrade on August 6; the restarts are pods being restarted because Kubernetes evicted them for using more memory than requested):

[screenshot: 2019-08-20-10-23-56-window-select]

@norrs
Author

norrs commented Aug 20, 2019

Oh, I just noticed that f165168 is already committed to master.

I'll try the latest Docker image and redeploy our Redash 7 to see whether setting max requests and max request jitter has an effect. Even if it fixes the symptom, I would say Redash 7 still has a memory leak issue.

But this is a "decent" work-around for now; I'll post back on how it goes.
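For context, the work-around leans on gunicorn's built-in worker recycling. Here is a minimal sketch of the equivalent settings in a plain gunicorn.conf.py; these are standard gunicorn setting names, but the values and how the Redash Docker image actually wires them through its entrypoint are my assumptions, not something taken from f165168:

```python
# gunicorn.conf.py -- minimal sketch of the worker-recycling work-around.
# Standard gunicorn settings; the values below are illustrative only.

# Recycle a worker after it has served this many requests, so whatever
# memory it has accumulated is released when the process exits.
max_requests = 1000

# Randomize each worker's restart threshold by up to this many requests,
# so all workers don't restart at the same moment.
max_requests_jitter = 100
```

With this in place each worker process is periodically replaced, which caps the growth from a slow leak without fixing its cause.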

@norrs
Author

norrs commented Aug 20, 2019

Hm, https://hub.docker.com/r/redash/redash/tags doesn't seem to have any newer 7.x builds that contain f165168, as I understand it.

Is there anything special required for this upgrade, apart from maybe database migrations?
I see nothing in https://version.redash.io/ or https://github.com/getredash/redash/blob/master/CHANGELOG.md for Redash 8.

@norrs
Author

norrs commented Aug 20, 2019

The max requests and max request jitter feature comes from #4013.

@norrs
Author

norrs commented Aug 21, 2019

[screenshot: 2019-08-21-08-17-46-window-select]

This seems to work around the memory leak, at least.

@arikfr
Member

arikfr commented Sep 28, 2019

> Last 30 days (you can see when we did the upgrade on August 6 …)

Was it the only change that happened around this time?

Btw, for reference, Gunicorn can accept additional command line options via an env var:

Settings can be specified by using environment variable GUNICORN_CMD_ARGS. All available command line arguments can be used. For example, to specify the bind address and number of workers:

$ GUNICORN_CMD_ARGS="--bind=127.0.0.1 --workers=3" gunicorn app:app

So you didn't have to upgrade to benefit from the new options.

@norrs
Author

norrs commented Sep 30, 2019

@arikfr: It could be that I didn't need to upgrade to get GUNICORN_CMD_ARGS, but what I found at the time was f165168.

And having workers restart clearly indicates something is leaking somewhere; where, I don't know.

We are happy enough with the workers being restarted.

Regarding whether other things happened around this time: the only thing I can say is that
there were probably a few "start the migration over again" rounds because I made a silly mistake or two while upgrading. But the end result is what matters here, and it clearly shows pods increasing in memory usage until they were killed due to out-of-memory.

Graphs are directly from our production environment.

ATM we are on some alpha build of Redash 8, I think, which had a Docker image with f165168 in it, and it has been working perfectly. Many thanks for a great product!

@arikfr
Member

arikfr commented Nov 20, 2019

> And having workers restart clearly indicates something is leaking somewhere.

The reason we added auto restarts is that Python doesn't release memory. Query results are loaded into memory before being served, so serving a large query result can make memory usage go up.

@norrs
Author

norrs commented Nov 27, 2019

@arikfr That doesn't explain the memory usage in the graphs. Earlier versions also had to load result sets into memory before serving them, and we didn't see memory usage grow over the application's lifetime in Redash 5. So yes, Python doesn't release the memory, but I believe that is because its garbage collector cannot do its job.

I suspect something is holding references, so refcounts never drop and the garbage collector never gets to kick in and do its job. Circular references come to mind (https://stackoverflow.com/questions/2428301/should-i-worry-about-circular-references-in-python).

The best way to hunt down the cause would be to bisect between the two major versions, but you would need a reproducible test first.

sys.getrefcount() can maybe be a tool in that search.
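As a rough illustration of that kind of hunt (generic Python, not Redash code), you could snapshot live object counts with the gc module around the suspect code path and watch which types only ever grow; sys.getrefcount() then tells you how many references keep a given object alive:

```python
import gc
import sys
from collections import Counter

def object_counts():
    """Count live objects by type, after letting the GC free what it can."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

before = object_counts()
# ... exercise the suspect code path here, e.g. serve a large query result ...
after = object_counts()

# Types whose live-object count grew the most across the exercised path.
for name, growth in (after - before).most_common(10):
    print(name, growth)

# For a single suspect object, sys.getrefcount() reports how many references
# keep it alive (the count includes the temporary reference made by the call).
suspect = object()
print(sys.getrefcount(suspect))
```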

Whoever wants to devote the time to finding the exact cause is a superhero! This work is both tricky and cumbersome.

For others finding this issue: restarting the gunicorn workers is a work-around that works just fine.

(Again, thanks for a superb product.)

@arikfr
Member

arikfr commented Nov 27, 2019

> That doesn't explain the memory usage in the graphs.

Sure. I just mentioned that to give context for why we implemented the restart...

Thank you for the tips on how to hunt this down. We will probably hold off until the Python 3 migration and some other ongoing work are done, but we would definitely like to address it.
