
👌 IMPROVE: add DaemonClient.lock() #4924

Closed
wants to merge 1 commit into from

Conversation

chrisjsewell
Member

@chrisjsewell chrisjsewell commented May 6, 2021

This is a proposal for part of the solution to #4321: it provides a basic locking mechanism for the daemon, to "ensure" that it cannot be running or started from another process:

In [29]: from aiida.engine.daemon.client import get_daemon_client
    ...: client = get_daemon_client()
    ...: with client.lock():
    ...:     # try to get another client
    ...:     get_daemon_client()
    ...: 
---------------------------------------------------------------------------
DaemonLock                                Traceback (most recent call last)
<ipython-input-29-58ecd1c03adb> in <module>
      3 with client.lock():
      4     # try to get another client
----> 5     get_daemon_client()
      6 

~/Documents/GitHub/aiida-core-origin/aiida/engine/daemon/client.py in get_daemon_client(profile_name)
     61         profile = config.current_profile
     62 
---> 63     return DaemonClient(profile)
     64 
     65 

~/Documents/GitHub/aiida-core-origin/aiida/engine/daemon/client.py in __init__(self, profile)
     87         self._DAEMON_TIMEOUT: int = config.get_option('daemon.timeout')  # pylint: disable=invalid-name
     88         if Path(self.daemon_lock_file).exists():
---> 89             raise DaemonLock(
     90                 f"The '{self.profile.name}' daemon is locked for use. "
     91                 'Use `verdi daemon unlock` if this is unexpected.'

DaemonLock: The quicksetup daemon is locked for use. Use `verdi daemon unlock` if this is unexpected.

cc @giovannipizzi, @sphuber for comment

(if you think it's ok, I'll add tests; I was also thinking of adding verdi daemon unlock, in case for whatever reason the lock file is not removed)

@sphuber
Contributor

sphuber commented May 6, 2021

Concept looks good I think. There is just one edge case that I realized, which I wanted to note before I forget. The check that the "daemon is running" actually checks that the circus daemon PID file exists. However, that does not check that there are daemon worker processes running. Typically, if the circus daemon is not running there shouldn't be daemon workers, but that is not guaranteed: it is perfectly possible to launch a daemon worker outside the circus daemon. For example, verdi devel run_daemon directly runs a daemon worker process, without circus. Since this is under verdi devel we might not want or need to take this into account, but I just wanted to mention it for discussion.

def lock(self):
    if self.is_daemon_running:
        raise RuntimeError(f"Cannot lock the '{self.profile.name}' daemon whilst it is running.")
    Path(self.daemon_lock_file).touch()
Member

I think here you need to avoid concurrent creation of the file succeeding silently, so use exist_ok=False
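
For illustration, a minimal sketch (not part of this diff) of how Path.touch(exist_ok=False) turns concurrent creation into an explicit error; the lock-file path below is a placeholder:

from pathlib import Path

lock_file = Path('/tmp/hypothetical-daemon.lock')  # placeholder path for illustration
lock_file.touch()  # create the lock file (succeeds even if it already exists)
try:
    lock_file.touch(exist_ok=False)  # a second creation attempt now raises instead of silently succeeding
except FileExistsError:
    print('lock already held by another process')
finally:
    lock_file.unlink()  # clean up the illustration file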

@@ -95,6 +102,14 @@ def daemon_name(self) -> str:
        """
        return self._DAEMON_NAME.format(name=self.profile.name)

    @contextmanager
    def lock(self):
        if self.is_daemon_running:
Member

I think you are mixing the "old" check (with the PID file) with the "new" check (with the lock file). However, the logic sounds a bit complex to me, because the check on whether it is locked is actually done in __init__. Shouldn't this function raise if there is already a lock file? I.e. in the line with the touch below, using exist_ok=False, catch that exception and re-raise it as the DaemonLock exception? Maybe I'm just missing how this lock manager would actually be used in practice (I know it's in the commit message, but I'm still not sure how you envision using it in practice).

Member Author

Yeh I can do that as well 👍
The key point is to inhibit the instantiation of any further DaemonClient (from this or any other process), since that is required for both verdi daemon start-circus and verdi devel run_daemon.
Naturally you could bypass this in some way, so it's not a "100% guarantee", but I think this is a 99% solution to provide a good level of assurance that no daemons are running while we clean the repo.

            raise RuntimeError(f"Cannot lock the '{self.profile.name}' daemon whilst it is running.")
        Path(self.daemon_lock_file).touch()
        yield self
        Path(self.daemon_lock_file).unlink()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you actually want to do a:

try:
    ...
    yield self
finally:
    Path(self.daemon_lock_file).unlink()

to ensure the file is removed even if there are exceptions
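
Putting the two review suggestions together, a possible shape for the context manager (a sketch, not the actual diff; DaemonLock is the exception introduced by this PR):

@contextmanager
def lock(self):
    """Lock the daemon for this profile for the duration of the with block."""
    if self.is_daemon_running:
        raise RuntimeError(f"Cannot lock the '{self.profile.name}' daemon whilst it is running.")
    try:
        # exist_ok=False makes a concurrent lock attempt fail loudly instead of silently succeeding
        Path(self.daemon_lock_file).touch(exist_ok=False)
    except FileExistsError:
        raise DaemonLock(f"The '{self.profile.name}' daemon is locked for use.")
    try:
        yield self
    finally:
        # remove the lock file even if the body of the with block raised
        Path(self.daemon_lock_file).unlink()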

Member Author

@chrisjsewell chrisjsewell May 6, 2021

Oh yeah, I was confusing it with pytest fixtures, which have this try/finally logic built in.
I thought that was the case here, but no.

@giovannipizzi
Member

I think this is a 99% solution to provide a good level of assurance that no daemons are running while we clean the repo

I see. However, in principle, we'd also like to avoid having verdi shells running while the repo is cleaned (as well as avoiding two cleaners working at the same time).

What about this different approach instead?

In load_profile

  • This should currently be called by any process that needs to use data from AiiDA, so verdi shells, Jupyter notebooks, daemons, etc.
  • This already calls load_backend_environment, which already does checks, e.g. on the schema version from DbSetting:
    def load_backend_environment(self, profile, validate_schema=True, **kwargs):
        """Load the backend environment.

        :param profile: the profile whose backend environment to load
        :param validate_schema: boolean, if True, validate the schema first before loading the environment.
        :param kwargs: keyword arguments that will be passed on to the backend specific scoped session getter function.
        """
        self._load_backend_environment(**kwargs)
        if validate_schema:
            self.validate_schema(profile)
  • We add another check of a new locked_profile key (whose value is set to the PID of the process that has locked the profile)
    • if the key exists, stop and exit
      • if the value (that is, the PID of the process locking the profile) is a valid PID of a running process, mention in the message the PID of the process that is locking this profile (a minimal sketch of such a PID check follows this list)
      • if there is no process with that PID, maybe mention a verdi-level unlock-daemon command if people are sure? (then implement this command, but also apply the same logic there, to avoid a user just calling the command because some time has passed, which would prevent the lock from working properly)
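
A minimal, POSIX-only sketch of the "is the locking PID still a running process" check mentioned above (os.kill with signal 0 is the usual idiom; this helper is not part of AiiDA):

import os

def pid_is_alive(pid: int) -> bool:
    """Return True if a process with the given PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking without actually sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True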

In any process that needs to acquire a lock (verdi clean subcommands etc.)

Call a locking mechanism (with try/finally or a context manager, to be implemented).
The locking mechanism should do the following checks:

  • it should already load the dbenv beforehand, so it does the checks above
  • it should perform a SQL query asking PostgreSQL how many connections there are to the current DB; stop if there is any other connection except itself [1]
  • add (in an atomic way, and failing if a row with the key already exists) a new row in DbSetting, with key locked_profile and value the PID of the current process. This can be achieved by just adding a row to DbSetting, where the table has a uniqueness constraint on the key (locked_profile in this case)
    • if the transaction fails, it means someone else acquired a lock. Print a message (maybe mention the PID) and exit
    • if it goes through, the lock was correctly acquired and no one else could have acquired it or can acquire it in the future; continue
  • there is still the possibility that another process called load_profile/dbenv at the same time. Do the check [1] above on the connections again; if there is some other process, remove the lock and exit with an error message saying that there is some other process (same error as in [1])
  • At this point, it's safe to assume that a single process has locked the profile and no other process can either acquire another lock or use the DB. Start performing the operations
  • Make sure to unlock (remove the DbSetting row) in a try/finally block (or in a context manager) so we reduce the risks that the profile remains locked inadvertently

This should be quite safe to avoid multiple uses from any client. It only remains to check which query to run to accomplish [1], but I'm quite sure a google search will help.
The only risk is that the python process is killed (e.g. with kill -9, SIGSEGV, system reboot) and does not remove the row in DbSetting. But then the check in load_profile will detect that the PID no longer exists and suggest that the user run the verdi-level unlock-daemon command.

Also, in this way, we can keep the current logic for the daemons avoiding to touch that part.
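
For the "add a row in an atomic way, failing if the key already exists" step, a rough psycopg2-level sketch; the table name db_profile_lock and its columns are hypothetical placeholders, not the actual DbSetting schema:

import os
import psycopg2

def try_acquire_profile_lock(conn) -> bool:
    """Try to acquire the profile lock by inserting a uniquely-keyed row; return True on success."""
    try:
        with conn.cursor() as cursor:
            # hypothetical table with a UNIQUE constraint on "key"; in AiiDA this would be the DbSetting row
            cursor.execute(
                "INSERT INTO db_profile_lock (key, val) VALUES ('locked_profile', %s)",
                (str(os.getpid()),),
            )
        conn.commit()
        return True
    except psycopg2.errors.UniqueViolation:
        # another process inserted the row first, so the lock is already held
        conn.rollback()
        return False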

@giovannipizzi
Member

giovannipizzi commented May 7, 2021

For [1] the query could be:

select pid, client_port from pg_stat_activity where datname='PUT_YOUR_DB_NAME_HERE';

This seems to correctly count the number of connections (e.g. 1 per daemon, 1 per verdi shell, and itself from psql).
However, the pid is not the python PID but the PID of the PostgreSQL backend process launched to serve the connection.
Maybe the port can help with checking, but with this query it's not possible to easily check which python process is connected (so as to detect "yourself").
Maybe it's enough (and one just checks that there is only 1 connection?), or maybe there is a way to either get the actual python process ID, or to check (from the python side) the port from which we are connected to postgres (via the psycopg2 library or similar?).

EDIT: at least we can remove "ourselves" from the query using the pg_backend_pid() function that returns the backend PID for the current connection.
To check it:

select pg_backend_pid();

For the query above:

select pid, client_port from pg_stat_activity where datname='PUT_YOUR_DB_NAME_HERE' and pid != pg_backend_pid();

would return a non-empty result if there is anybody else.

This is enough I think: we can tell the user that there are XX other connections, and the PIDs of the backends (even if not the PIDs of the python processes - but we could provide 1. a utility function in AiiDA to check it in the verdi shell, and 2. print it in the logs when a daemon is started). And then we would just put our current PID in the DbSetting table, so load_profile knows which (python) PID to check at load, and in the unlock part.
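
A rough sketch of running check [1] from Python with psycopg2 (connection parameters are placeholders; in AiiDA they would come from the profile configuration):

import psycopg2

# placeholder connection parameters for illustration
conn = psycopg2.connect(dbname='PUT_YOUR_DB_NAME_HERE', user='aiida', host='localhost', port=5432)

with conn.cursor() as cursor:
    # every connection to this database except our own backend
    cursor.execute(
        "SELECT pid, client_port FROM pg_stat_activity "
        "WHERE datname = current_database() AND pid != pg_backend_pid()"
    )
    others = cursor.fetchall()

if others:
    print(f'Refusing to lock: {len(others)} other connection(s) to the database: {others}')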

@dev-zero
Contributor

When I was reading the summary of this I got confused: why would you need a lock file when you already have a pid file, which in the Linux world is usually sufficient to determine whether a service is up? Only when I finished reading did I realize that you need it to make sure no daemon gets started (or is running).
Since a lock (file) is often meant to prevent a task from running more than once at the same time, I would therefore suggest renaming it to something more specific, like inhibitor_lock (as used in the context of inhibiting the start of a screensaver), or maybe call it maintenance_mode (often used in web apps) or profile_lock (rather than a daemon lock).

I'd prefer @giovannipizzi's approach: since we have a DB with OS-portable transactions, it should be preferred. Only change: also add a timestamp (that would come for free in a file-based lock and gives the user more information to decide whether it is safe to manually unlock) and call the command unlock-profile.

@chrisjsewell
Member Author

Yeah, cheers for the feedback @dev-zero; indeed it was very much a rough initial proposal. I'll take the feedback and do a "take 2".

@sphuber
Contributor

sphuber commented Jan 20, 2022

@chrisjsewell I take it this can be closed, since @ramirezfranciscof is implementing the approach that also includes the functionality of tracking access by processes other than the daemon workers.

@sphuber
Contributor

sphuber commented Jan 21, 2022

Superseded by #5270

@sphuber sphuber closed this Jan 21, 2022
@sphuber sphuber deleted the daemon-lock branch January 27, 2022 16:21