Django backend (possibly) not closing PostgreSQL connections #4374
If you are referring to the […]
The source of the problem is indeed that idle worker processes (REST APIs, OPTIMADE APIs, discover sections... [1]) keep their database connection open; see the write-up here. I believe we now need to look into making sure that these web services close the PostgreSQL connection when it is no longer needed.

P.S. Issue #3867 is related but concerns the use case of persistent connections, and essentially asks for a way to recover if those connections are interrupted for external reasons.

[1] The same probably holds in AiiDA lab, where every running Jupyter notebook may have its own DB connection. Since these aren't threaded, however, one is unlikely to hit the 100-connection limit.

[2] If the goal is just to fix the issue encountered on Materials Cloud, one could of course also just add a cron job on the DB server that kills connections that have been idle for too long (plus a fix of #3867). That's more of a workaround than a solution, however.

Mentioning also @giovannipizzi for info.
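For reference, the cron-job workaround mentioned in [2] could look roughly like the sketch below: terminate PostgreSQL backends that have been idle for longer than a cutoff. The SQL uses standard PostgreSQL catalog functions (`pg_stat_activity`, `pg_terminate_backend`); the connection object is assumed to be DB-API compatible, e.g. one obtained from `psycopg2.connect(...)`. This is a sketch of the idea, not a vetted production script.

```python
# Kill PostgreSQL backends that have been idle longer than a cutoff.
# Assumes a DB-API connection (e.g. psycopg2) with superuser or
# pg_signal_backend privileges.

KILL_IDLE_SQL = """
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - make_interval(mins => %s)
  AND pid <> pg_backend_pid()
"""

def kill_idle_connections(conn, idle_minutes=30):
    """Terminate backends idle for longer than ``idle_minutes`` minutes."""
    with conn.cursor() as cur:
        cur.execute(KILL_IDLE_SQL, (idle_minutes,))
        # rowcount reflects how many backends were signalled
        return cur.rowcount
```

Run from cron (e.g. every 10 minutes) this would cap how long an idle worker can hold a connection, at the cost of those workers having to reconnect afterwards.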
P.S. Just another thought: as long as we manage to keep the AiiDA version of the databases hosted on Materials Cloud homogeneous, a model where the connection/profile is coupled to a particular request would actually allow us to serve all REST API requests from the same worker process (scaling to as many processes as needed to serve all requests). With the current model, we need to give each profile's REST API server enough resources to serve a reasonable number of requests, even if that profile is requested only very rarely (multiplying the memory requirements [1] by the number of profiles we are serving). I believe the same holds for optimade requests (but not for discover sections, where the "server code" differs from one to the other).

[1] Memory usage by an apache wsgi daemon for the AiiDA REST API is roughly ~100 MB. Similar for a REST API served via a docker container running a gunicorn process.
Again here, thanks for the investigation of this @ltalirz! A context manager similar to what is implemented in the REST API would be good, but may be difficult to implement. See:

- aiida-core/aiida/restapi/resources.py, line 98 in b002a9a
- aiida-core/aiida/restapi/common/utils.py, line 853 in b002a9a

However, it may also be good to have a simple "close" functionality, if possible.

Concerning your latest comment: keeping a coupled connection/profile seems like a great solution, but at the same time quite limiting in terms of testing AiiDA versions. If we do this in production only, it might work out great. The issue would then still persist on the development servers, though, unless you would force AiiDA version limitations there as well?
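A minimal sketch of the context-manager idea under discussion (this is not aiida-core's actual API): couple the connection's lifetime to the request, so that an idle worker holds no open connection between requests. `connect` here stands in for whatever factory opens the real database connection.

```python
from contextlib import contextmanager

@contextmanager
def request_connection(connect):
    """Open a connection on entry and always close it on exit.

    ``connect`` is any zero-argument callable returning an object
    with a ``close()`` method (a DB-API connection, for instance).
    """
    conn = connect()
    try:
        yield conn
    finally:
        conn.close()
```

A web service would wrap each request handler in `with request_connection(...) as conn:`, guaranteeing the connection is released even if the handler raises.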
Thanks for the pointers to the REST API code @CasperWA. So, it seems you are taking care of opening and closing sessions for the QueryBuilder, but the underlying connection remains untouched. I.e. we would need to add handling of the connection in a similar way.
I agree that it is important to retain the ability to run REST APIs for different AiiDA versions, even if just for development purposes.
This was indeed my point; the code hinted at in the REST API was merely to show how your suggested context-manager solution might work, since we already have a solution that sort of does this. So it was for inspiration, in the end :)
Thanks 🎉 it was a combined effort between @sphuber and me :)
Hmm, is that feasible?
This forum post may be relevant https://forum.djangoproject.com/t/closing-old-db-connections-outside-of-the-request-context/299
This reminds me of a question as well @ltalirz: were the databases in your overview all using Django as the backend, or were some using SQLAlchemy? In other words, is this an issue specific to Django or a more general one?
I think this is a problem of configuration and not necessarily a problem in the code. Since you guys are mostly just using AiiDA to query the database and are not running the daemon, the only connections should go through the QueryBuilder. Even for Django, this goes through SqlAlchemy, and it is SqlAlchemy that manages a connection pool. The idea is then that we don't have to close the connections (closing/opening every time has an overhead and may not be the most efficient) and we simply reuse open ones when needed.

I think this may ultimately simply be a configuration problem. Of course, if you start serving too many applications from a single server and PSQL cluster, at some point you run out of resources. If you think the current number of projects should be perfectly manageable with the default of 100 connections, then we can always configure the connection pool of SqlAlchemy. @CasperWA added a commit not too long ago (the commit to test the correct handling of the session for the REST API) that allows configuring the parameters of the SQLA connection pool. I don't think it is fully plumbed through such that you can configure this dynamically per AiiDA profile, but if necessary you could add this. Otherwise you can try a temporary hardcoded solution and limit the number of connections in a pool. The default number of connections per pool seems to be 5.
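To illustrate the pooling behaviour described here, below is a small sketch using SQLAlchemy's `QueuePool` directly, with SQLite standing in for PostgreSQL; `pool_size` and `max_overflow` are the knobs being discussed. This is a generic SQLAlchemy example, not aiida-core's own configuration code.

```python
import sqlite3
from sqlalchemy.pool import QueuePool

# A pool of at most 5 persistent connections, plus up to 2 temporary
# "overflow" connections under load. The creator callable would be a
# psycopg2 connect in the PostgreSQL case; SQLite stands in here.
pool = QueuePool(
    lambda: sqlite3.connect(":memory:"),
    pool_size=5,
    max_overflow=2,
)

conn = pool.connect()  # checks a connection out of the pool
conn.close()           # returns it to the pool instead of really closing it
```

The point of the pool is exactly what the comment says: `close()` on a pooled connection is cheap because it only returns the connection for reuse; the process as a whole stays bounded at `pool_size + max_overflow` real connections.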
Thanks for the comment @sphuber, I'll look into exactly which connections are set up.
Just to be sure: are you saying that if one is not running the daemon, one is not using the DB backend?
No, my point was that you then have only a single process running AiiDA and so a single ORM, which means there is a single SqlAlchemy engine with a single connection pool. The default is 5 connections, so I think each AiiDA process is already limited to 5 connections max. Although, now that I think of it, there is also the overflow parameter, `max_overflow`.
I see, but what if it is using a django database? To follow up: what about when running a REST API on a […]?
That is the point: even for a Django backend, the database connection through the QueryBuilder goes through SqlAlchemy.
And that is my point ;-) I'm checking now.
See also this issue: #2039
But this will not be the case if a Node is updated, as happens for the OPTIMADE REST API server upon the first request (updating the extras).
After forcing […] (see aiida-core/aiida/backends/utils.py, lines 40 to 42 in fe8333e):

I.e. in the case of a django profile, we will need to handle both the sqla connection and the django connection (which for the REST API can probably simply be closed once and for all, if what Seb says is correct).
While this certainly makes sense for regular AiiDA usage, as the number of processes querying the DB increases, postgresql connections eventually become a scarce resource, and one is better off closing them when they are not needed (see article).

Edit: Things differ between the REST API and the verdi shell. On a django backend I see this: […]
The ideal here, I think, would then be to figure out whether there is a timer on connections.
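Django does in fact expose such a timer: the `CONN_MAX_AGE` database setting (0 closes the connection at the end of every request; a positive value lets Django reuse it for up to that many seconds). Note the caveat that Django only enforces this where its request signals run, i.e. in a Django-served app; outside that context one would still have to call `django.db.close_old_connections()` manually. A settings-file fragment might look like this (names and credentials are placeholders, not AiiDA's actual configuration):

```python
# Fragment of a hypothetical Django settings.py.
# CONN_MAX_AGE = 0 closes the DB connection after each request;
# a positive value keeps it alive for at most that many seconds.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "aiida_db",   # placeholder
        "USER": "aiida",      # placeholder
        "CONN_MAX_AGE": 60,   # seconds
    }
}
```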
Indeed. However, what we have now is then worse than that, with 2 connections per thread no matter the backend. Or did I misunderstand?
Just as a follow-up to @giovannipizzi's question concerning where time is being spent when loading the backend, I did some timing tests.

**My MacBook Pro 15" 2015 (postgres server on the same machine)**

```
$ ipython

In [1]: from aiida import load_profile

In [2]: load_profile()
Out[2]: <aiida.manage.configuration.profile.Profile at 0x7facb9d35978>

In [3]: from aiida.manage.manager import get_manager

In [4]: m = get_manager()

In [5]: %prun -D backend2.prof m.get_backend()
```

This takes ~1.5 s on my machine, with 94% of the time spent inside […]. Then I did:

```
$ python -m timeit -n 1 -r 1 -s "import psycopg2" 'conn = psycopg2.connect("dbname=... user=... password=...")'
1 loops, best of 1: 3.05 msec per loop
$ python -m timeit -n 1 -r 10 -s "import psycopg2" 'conn = psycopg2.connect("dbname=... user=... password=...")'
1 loops, best of 10: 2.44 msec per loop
```

Note: I checked also manually that this really establishes the connection, and calling […]

**Materials Cloud REST API server (postgres server on a different machine)**

```
In [1]: from aiida import load_profile

In [2]: load_profile()
Out[2]: <aiida.manage.configuration.profile.Profile at 0x7ff50f214ba8>

In [3]: from aiida.manage.manager import get_manager

In [4]: m = get_manager()

In [5]: %prun -D backend.prof m.get_backend()
```

This takes ~5.4 s.

**Conclusion**

While opening a new postgresql connection is not instantaneous (~3-10 ms), I believe it would be an acceptable delay on a per-request basis. Note: there will of course also be python code involved when considering profile switching on a per-request basis; one will need to figure out how expensive this is on the python side.
Just to add some further testing notes (current […]): it seems to me that the number of connections opened on a standard django profile depends a lot on the initial load on the server. Given that […]

Will look into this further.
Wouldn't it make sense, though, for each new request to open a new connection (SQLA session), which it then closes? Or is this irrelevant/separate from the number of connections you're seeing?
Yes! I should probably have been clearer that the number of postgresql connections I was observing was in between queries. E.g., when I say there are 7 open connections, this is "without any active request".
Before I look into trying to get rid of superfluous django connections, here is a summary of the observations.

**sqlalchemy profiles**

On […]. Here are some timings of requests to the http://127.0.0.1:5000/api/v4/nodes endpoint on a tiny database, using the built-in server on my MacBook (local postgres DB):

This is consistent with the overhead of opening a postgres connection mentioned above and would indicate that switching to a […]

[1] To see this, set […]
**django profiles**

For django profiles, one has an additional persistent connection that is initialized together with the dbenv. One simple workaround to close the connection in the […]:

```python
from django.db import connection
connection.close()
```

I've added this and the […]

Another interesting observation is that, judging from the sqlalchemy logs, also the first request to a […]

This suggests that the 4 extra connection checkouts observed on sqlalchemy profiles (on the first request) are caused by some initialization of the sqla database (e.g. I did notice […])
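One way to wire this kind of workaround into any WSGI-served API is middleware that invokes a cleanup callback after each request. This is a sketch, not what aiida-core ships; `cleanup` would be `django.db.connection.close` in the scenario above, but it is kept as an arbitrary zero-argument callable here so the sketch stays backend-agnostic.

```python
class CloseAfterRequest:
    """WSGI middleware that runs ``cleanup`` after every request.

    Note: it materializes the response body with list(), which is fine
    for a sketch but would buffer streaming responses.
    """

    def __init__(self, app, cleanup):
        self.app = app
        self.cleanup = cleanup

    def __call__(self, environ, start_response):
        try:
            return list(self.app(environ, start_response))
        finally:
            # Runs even if the wrapped app raised, so the DB
            # connection cannot leak past the request.
            self.cleanup()
```

With apache/mod_wsgi or gunicorn one would wrap the exported application object, e.g. `application = CloseAfterRequest(application, connection.close)`.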
Since Django is dropped for v2.0, I am closing this. Feel free to reopen if it persists in v2.0 nonetheless, or if this is a critical problem that needs to be fixed on […]
While using the OPTIMADE Client, I found an issue with the Materials Cloud AiiDA-OPTIMADE servers that appears to be caused by 100+ PostgreSQL database connections not being closed (see related issue).
I was looking into the PR I created to handle the issue of closing SQLAlchemy sessions for the AiiDA REST API (#3974), and came across the following asymmetry between the backends, which might be fine, but might not be.

Django:
aiida-core/aiida/backends/djsite/__init__.py, line 65 in b002a9a

SQLAlchemy:
aiida-core/aiida/backends/sqlalchemy/__init__.py, line 57 in b002a9a
Could this be the cause of the issue? The database that caused the issue was indeed using a django backend.
Pinging @sphuber and @giovannipizzi