Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daemonized daemon runners through circus #1217

Merged
merged 24 commits into from
Mar 7, 2018

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Mar 3, 2018

Fixes #1055

The initial work was done by @DropD in PR #1213. In this PR I built on that work to mostly rip out the old logic and remove anything that was made redundant.

This PR replaces celery with circus as the tool to daemonize the DaemonRunner process, which has replaced the old polling based daemon. The new setup allows to easily run an individual and completely independent daemon for each profile. Calling verdi daemon start will call the aiida.daemon.runner.start_daemon function, which will start a DaemonRunner, and the whole thing will be daemonized by a circus process. The messages from the aiida logger will be logged to aiida-{profile_name} in the log folder, and the messages from circus will go to circus-{profile_name}. Due to these changes a few concepts became redundant and have been removed:

  • daemon user: since now each profile can run a daemon, a single daemon user does not make sense and so all the logic dealing with this has been removed
  • profile process: due to the single daemon concept, the concept of a profile process was introduced, of which there were two: verdi and daemon. This allowed for a different default profile to be set and used. This way a user could let the daemon use a different profile then the one they used for verdi commands. Again this concept has been made redundant and therefore removed.

What I have tested so far, is that two different profiles that have two different databases and repositories can successfully have their respective daemons running in parallel and submitting for example a workchain under a specific profile, will be picked up by the correct daemon as evidenced by changes in the profiles database, repository and log file.

One potential problem that is not addressed here is what happens or should happen when two different launch their daemons in parallel and both point to the same database, repository or both. Probably this is not a good idea, and we should make the usage of a db/repo pair unique to a profile. Or at the very least, only allow one of them to run a daemon at one time. This will be addressed in a separate PR.

Things to do:

  • Automated tests for the cli
  • Prevent multiple active daemons from accessing the same resources

giovannipizzi
giovannipizzi previously approved these changes Mar 5, 2018
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me!

@sphuber
Copy link
Contributor Author

sphuber commented Mar 5, 2018

@giovannipizzi thanks. I will discuss this with Martin tomorrow. I just found a small problem with some more testing, so I will fix that first

@sphuber sphuber force-pushed the fix_1055_daemon_process branch 8 times, most recently from a7cb9af to 5622e48 Compare March 6, 2018 18:19
DropD and others added 15 commits March 6, 2018 19:35
The AiiDA logger was still being hijacked by one of the old daemon
tasks, in this case the workflow_stepper that is called by the new
daemon, but this task was calling configure_logging with the old
daemon log file 'celery.log'. By removing the logging configuration
call and doing this only in the start_daemon call and passing the
same log file as the circus watcher, the aiida logger now correctly
writes to the profile specific logfile that the circus watcher also
writes too
With the new circus daemon, every profile can have its very own
daemon, that runs completely isolated and independent of all the
other profile daemons. The original concept of a 'process' in the
profile configuration, which allowed one to set a different profile
for the 'daemon' and the 'verdi' process, no longer makes sense.
Since every profile has its own daemon, a single profile is fully
self contained, for both the verdi and daemon usage.
The functionality related to the circus daemon was too specific
to be put in the basic profile configuration class and so a
separate container is created
sphuber added 9 commits March 6, 2018 19:35
Now that each profile runs its own daemon, the concept of a daemon
user is no longer relevant. Remove all the old code that dealt
with this
Celery is no longer necessary now that the daemon is daemonized
by the circus library
There was a problem with the config.json configuration file, which
led to an exception in one of the profile_daemon.get_endpoint() calls
in 'verdi daemon start'. Because the daemonize call was done before
this, the exception was silently swallowed because the output was
already redirected to the background. By moving the creating of the
arbiter configuration before the daemonize call, the exception will
now be displayed.
The default 'verdi daemon restart' will keep the circus daemon process
alive and simply restart the workers. This will thus keep the number
of workers that were running at the invocation of the command.

The '--reset' flag will force the circus daemon process to also be
shutdown and the whole thing to be restarted. This behavior emulates
the calling of 'verdi daemon stop' followed by 'verdi daemon start'
manually. After restart a new circus daemon process will be running
with a single worker
The '--wait' option now properly changes the information that
is printed to the screen
This allows us to call one single command in a subprocess from both
the 'verdi daemon start' and 'verdi daemon restart --reset' commands.
It also allows us to print a message that we're attempting to start
the daemon and as soon as the subprocess is spawned we can try to
call the client to verify whether it started and notify the user.
This hopefully accomplishes that if the daemon fails to start, it
does not do so completly silently
The recommended behavior for stopping or restarting the daemon
should be to wait for it to be completed. The old default was not
to wait and in order to wait the '--wait' flag had to be set.

Here, we invert the logic and wait by default and if the user does
not want to wait , they can set the '--no-wait' flag. This is the
safer default behavior
On March 5 2018, tornado v5.0 was released, which would get installed
automatically, however, this breaks another dependency, circus.
As of this writing circus has not yet fixed this in a release or
locked down the tornado dependency and so we have to lock it.
Without this circus will not be able to start the daemon, which is
rather crucial
@sphuber sphuber force-pushed the fix_1055_daemon_process branch from adbcfe9 to 5927768 Compare March 6, 2018 18:36
@sphuber sphuber merged commit c0bd8f8 into aiidateam:workflows Mar 7, 2018
@sphuber sphuber deleted the fix_1055_daemon_process branch March 7, 2018 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants