There is a slow-ish memory leak in the agent's HTTP request library. The leak has been confirmed on OpenSuSE and Debian thus far. I have not been able to reproduce it on Ubuntu, RHEL, or CentOS.
I created a branch with tracemalloc instrumentation to help track down the issue. The leak appeared with the following file and line context.
This was just useful enough to identify that the issue had to do with sockets. (tracemalloc supports grabbing a larger stacktrace around the memory allocation source, but I was not able to get that additional context. I have not debugged why.) The couple of bare socket calls that the agent makes are infrequent. The agent does make a lot of HTTP requests, so I started with the HTTP library.
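For anyone repeating the investigation, here is a minimal sketch of the snapshot-comparison approach. It is not the code from the instrumentation branch: the helper name `top_growth` and the bytearray workload are hypothetical stand-ins. Passing a frame count to `tracemalloc.start()` is the documented way to record deeper tracebacks, which is the extra context I could not get out of the agent.

```python
import tracemalloc


def top_growth(baseline, current, limit=5, key_type="lineno"):
    """Return the allocation sites that grew the most between two snapshots.

    Hypothetical helper for illustration; not part of the agent.
    """
    # compare_to() returns StatisticDiff objects, largest differences first.
    return current.compare_to(baseline, key_type)[:limit]


if __name__ == "__main__":
    # Record up to 25 frames per allocation instead of the default of 1.
    tracemalloc.start(25)
    baseline = tracemalloc.take_snapshot()

    # Stand-in for the leaking workload (assumption: the real agent would
    # simply keep running its normal loop here).
    retained = [bytearray(1024) for _ in range(1000)]

    current = tracemalloc.take_snapshot()
    for stat in top_growth(baseline, current):
        print(stat)

    # Group by full traceback to see the call chain behind the top site.
    for stat in top_growth(baseline, current, limit=1, key_type="traceback"):
        for line in stat.traceback.format():
            print(line)

    tracemalloc.stop()
```

Taking snapshots at a fixed interval and comparing each one with the first makes a slow leak stand out, because steady-state allocations cancel out in the diff.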
The first thing I noticed is that the agent never explicitly cleans up an HTTP connection when it is done with it. Note that the following method never calls resp.connection.close(). I thought this was fine. I would expect Python to clean up once all references were removed, but this does not appear to be true, at least on Debian and OpenSuSE. I could fix the code to explicitly close the connection, but there are many places and branches where the agent makes HTTP requests. The HTTP library returns the HTTPResponse and not the connection, which is generally fine because you can usually reach the connection via the response. Another fix, which appears to be working, is to explicitly set the HTTP Connection header to close.
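To make the header approach concrete, here is a minimal sketch using the standard library's `http.client`. The function names `with_connection_close` and `fetch` are hypothetical and are not the agent's actual helpers; the point is only that every outgoing request carries `Connection: close`, so the server is asked not to keep the socket alive.

```python
import http.client


def with_connection_close(headers=None):
    """Return a copy of headers with 'Connection: close' set.

    Hypothetical helper for illustration. Any existing Connection header is
    replaced, whatever its capitalization, and the caller's dict is untouched.
    """
    merged = {k: v for k, v in (headers or {}).items()
              if k.lower() != "connection"}
    merged["Connection"] = "close"
    return merged


def fetch(host, path, port=80, headers=None, timeout=10):
    """Issue a GET that asks the server to close the connection afterwards."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    conn.request("GET", path, headers=with_connection_close(headers))
    # The response is still returned to the caller, so call sites do not change.
    return conn.getresponse()
```

Because the change is confined to how headers are built, none of the call sites that consume the HTTPResponse need to be touched, which is what makes it suitable as a small fix.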
In my travels, I found that the Python requests library also had this issue. Their fix was to explicitly close the connection after use. I think that we should make that fix too, but that is a bigger and more invasive change. I would prefer to ship this fix as a hotfix, and leave the bigger fix for another release.
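For comparison, the larger fix could look roughly like the sketch below. This is an assumption about its shape, not a proposal for the final code: `http_get` and its `conn_factory` parameter are hypothetical names. It reads the body before closing the connection, so it has to return the status and body rather than the HTTPResponse, and that change in return type is why every caller would need updating.

```python
import http.client
from contextlib import closing


def http_get(host, path, port=80, timeout=10,
             conn_factory=http.client.HTTPConnection):
    """GET a resource and always close the connection, even on error.

    Hypothetical helper for illustration. conn_factory exists only so the
    connection class can be swapped out in tests.
    """
    # closing() calls conn.close() when the block exits, including when
    # request(), getresponse(), or read() raises.
    with closing(conn_factory(host, port, timeout=timeout)) as conn:
        conn.request("GET", path)
        resp = conn.getresponse()
        # Read the body while the socket is still open.
        body = resp.read()
        return resp.status, body
```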