
Invalid literal for int() #124

Closed
orf opened this issue Nov 30, 2017 · 17 comments

Comments


orf commented Nov 30, 2017

Firstly, thank you for this library. It's seriously impressive and so useful!

I ran into an exception when working with the is_running function:

  File "quip.py", line 40, in main
    if is_running('update'):
  File "/Users/orf/PycharmProjects/alfred-quip-workflow/workflow/background.py", line 99, in is_running
    pid = int(file_obj.read().strip())
ValueError: invalid literal for int() with base 10: ''

The code is taken from the getting started guide and isn't doing anything crazy:

    if not wf.cached_data_fresh('documents', max_age=60 * 60 * 60):
        cmd = ['/usr/bin/python', wf.workflowfile('quip-update.py')]
        run_in_background('update', cmd)

    if is_running('update'):
        wf.add_item('Getting new posts from Pinboard',
                    valid=False,
                    icon=ICON_INFO)

I haven't dug into it, but it looks like a race condition: the file is created by the new process, but the PID hasn't been written to it yet.

Using something like the pidfile library solves a lot of these issues; could you vendor it inside this library?


deanishe commented Dec 1, 2017

a race condition where the file is created by the new process but the pid has not been written to it yet.

This is probably the case. I don't worry about such issues as long as they aren't going to corrupt files, as the errors are very transitory: when the user enters the next character and your Script Filter is run again, the PID will have been written to the file and the error will be gone.

could you vendor it inside this library?

It doesn't solve the problem. The damn library raises a SystemExit exception instead.

The solution is not to immediately try to read the file after starting the background process:

updating = False
if not wf.cached_data_fresh('documents', max_age=60 * 60 * 60):
    cmd = ['/usr/bin/python', wf.workflowfile('quip-update.py')]
    run_in_background('update', cmd)
    updating = True

if updating or is_running('update'):
    wf.add_item('Getting new posts from Pinboard',
                valid=False,
                icon=ICON_INFO)

deanishe closed this as completed Dec 1, 2017

deanishe commented Dec 1, 2017

I've pulled some of the code out of pidfile and done a bit of testing with it.

I wasn't able to provoke the ValueError, but calling is_running() immediately after run_in_background() still returns False, as the background process hasn't had time to write its PID to the file.

I think I prefer the exception to be honest. The code might be "bad", but it strikes me as better than is_running() returning what is essentially the wrong value.

There's no way around setting your own flag (which is of course much faster than going to the filesystem) unless you give the background task a significant fraction of a second to get up and running before calling is_running().

Unless you can think of a way?

The only thing that occurs to me is some sort of short-term memory in background that holds on to recently started job names for maybe half a second.


orf commented Dec 1, 2017

Hmm, why not reverse it? Perhaps the Workflow process that launches the subprocess should write the child's PID to the file. That way we can ensure it's written before Alfred restarts the process. That would eliminate the race condition we're facing and make is_running() return the correct result in all cases.


deanishe commented Dec 1, 2017

process that launches the subprocess should write the child's PID to the file.

It can't: the "runner" process performs a double fork (i.e. it's a real daemon process). Thus the calling code in your process only has access to the PID of the parent process (which exits immediately), not the actual child process that runs your command.
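A stripped-down sketch of a double fork (not the library's actual code) shows why the caller never learns the worker's PID:

```python
import os


def spawn_detached():
    """Double fork: the caller only ever learns the PID of the
    intermediate process, which exits immediately."""
    pid = os.fork()
    if pid > 0:
        # Original process: pid is the *intermediate* child's PID,
        # not the PID of the process that will run the command.
        os.waitpid(pid, 0)
        return pid
    # Intermediate child: start a new session, then fork again.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)  # intermediate exits; grandchild is orphaned
    # Grandchild: the real worker. Its PID was never returned to
    # the original caller. (A real daemon would exec the job here.)
    os._exit(0)
```

The grandchild's PID exists only inside the detached process tree, which is why the PID file has to be the channel of communication.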

I've tried implementing a cache that holds onto recently-launched job names for half a second. In my tests, this causes is_running() to return True when called immediately after run_in_background().

I'm unsure of how sound this logic is, however.

I think that should be okay: the purpose of the background API is so your calling script can exit and return control to Alfred, not to run parallel jobs while your calling script waits on the results.

On the next run, the pidfile will be used.


orf commented Dec 1, 2017

Hmm, that could be an OK workaround, but it doesn't seem very elegant.

Regarding the double fork, couldn't we just make the first fork write the child's PID before it exits? That way run_in_background() could block until the first fork has exited (which means the PID has been written), and the first fork wouldn't exit until the second fork has succeeded and written the PID.

Just spitballing here, but that sounds more foolproof than the cache IMO.


deanishe commented Dec 1, 2017

couldn't we just make the first fork write the child's PID before it exits?

There are two forks, so no. The PID of the persistent background process is never known within the calling process.

The only possibility I see would be for run_in_background() to wait until the PID file exists.
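Such a wait could look roughly like this sketch (names and defaults are assumptions, not the library's code):

```python
import os
import time


def wait_for_pid_file(path, timeout=1.0, interval=0.01):
    """Poll until the PID file exists and is non-empty, or time out.

    Returns True if the file appeared within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not created yet
        time.sleep(interval)
    return False
```

Checking for a non-zero size (rather than mere existence) also guards against reading the file after it's created but before the PID is written.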


orf commented Dec 1, 2017

Hmm, what about this:

def _fork_and_exit_parent(errmsg, wait_pid=False, write_pid=False):
    try:
        pid = os.fork()
        if pid > 0:
            if write_pid:
                write_pid_file(pid)
            if wait_pid:
                os.waitpid(pid, 0)
            os._exit(0)
    except OSError as err:
        _log().critical('%s: (%d) %s', errmsg, err.errno, err.strerror)
        raise err

# Do first fork.
_fork_and_exit_parent('fork #1 failed', wait_pid=True)

# Decouple from parent environment.
os.chdir(wf().workflowdir)
os.setsid()

# Do second fork.
_fork_and_exit_parent('fork #2 failed', write_pid=True)

This would work like so:

  1. Subprocess A is spawned.
  2. A forks into B; A waits for B to terminate.
  3. B forks into C; B writes the PID of C to a file and then terminates.
  4. A then terminates, as B has terminated.
  5. The caller of the subprocess knows that the file has been written.

Correct me if I'm wrong (and I might well be; sorry for wasting your time if I've got the wrong end of the stick), but wouldn't this work? We form a chain of processes as part of the fork and use their termination as a signal that the file has been written. This means run_in_background() won't return until the whole chain is set up and the last process has launched, which means the file is 100% on disk and ready for reading.


deanishe commented Dec 1, 2017

I'm going to release a version where run_in_background() waits for the PID file to be written.

From my tests, it only adds ~0.015s (about 10%) to the run time of run_in_background().

It will time out after 1s waiting for the PID file, but that shouldn't really ever happen.

This seems the cleanest solution, though I'd still recommend setting a variable rather than calling is_running(), as it's so much faster than going to the filesystem.

I've also made the PID file write atomic, so no more ValueError from reading an empty file.
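An atomic write typically means writing to a temporary file in the same directory and renaming it into place; rename(2) is atomic within a filesystem, so a reader sees either no file or a complete one, never an empty one. A sketch of the technique (not the library's actual implementation):

```python
import os
import tempfile


def write_pid_atomically(path, pid):
    """Write `pid` to `path` via a temp file + rename.

    Readers see either no file or a complete PID file; they can
    never observe the empty file that caused the ValueError.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, 'w') as fh:
        fh.write(str(pid))
    os.rename(tmp, path)  # atomic on the same filesystem
```

The temp file must live in the same directory as the target, otherwise the rename may cross filesystems and lose its atomicity.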


deanishe commented Dec 1, 2017

wouldn't this work?

Let me try it your way. It seems a cleaner solution.

deanishe added a commit that referenced this issue Dec 1, 2017
This means that `is_running()` can be called immediately and will
return the right result.

Also write PID file atomically to prevent `is_running()` raising
a `ValueError` if PID file is empty.

deanishe commented Dec 1, 2017

Oh yes. That works very well indeed. Unsurprisingly, it's faster than waiting for the PID file to appear on disk, and much, much cleaner.

Added in v1.29.

Thanks very much for the help. That's a big improvement, imo.


orf commented Dec 1, 2017

My pleasure, thank you for this great library 👍


deanishe commented Dec 4, 2017

Hmm. It looks like this change may have re-triggered #111 (on High Sierra at least).

I hope not: I really like this feature.


orf commented Dec 4, 2017 via email


deanishe commented Dec 4, 2017

it is quite dangerous to do much else than quit unless you are very very careful

How do you mean that? A double-fork is standard for daemons to do on startup, AFAIK.

replace this double fork situation with multiprocessing

I don't think multiprocessing can help here. The primary goal isn't parallelism, it's to allow the calling Script Filter process to exit as quickly as possible, so Alfred can run it again.

The typical use case is to launch a single background process to update a local cache from a web API or an application (via AppleScript). The Script Filter can work at lightning speed from the cache data while the background process slowly fetches the new data and then updates the cache.


orf commented Dec 4, 2017 via email


deanishe commented Dec 4, 2017

the only thing we really changed was making the first fork write a file, and two waitpids.

Yeah, but the issue is down to some funky interaction between Python and macOS. In any case, I've only had one report of it misbehaving so far. If there are more (and there will be if there's a real issue), I'll back out the waitpids and go back to waiting for the PID file to appear.

starting a Python subprocess

I know multiprocessing well, but subprocesses and IPC are not wanted. The whole point is to start an entirely separate process, so the parent can exit as soon as possible, so that Alfred can run the workflow again.


deanishe commented Dec 5, 2017

Looks like I totally jumped the gun here. The bug in the workflow was (fingers crossed) entirely down to a really stupid logic error I made.

But nowhere near as bad as the High Sierra root login one, so I'll take solace in that.

mesozoic pushed a commit to mesozoic/alfred-workflow that referenced this issue Apr 20, 2022
…anishe#124

This means that `is_running()` can be called immediately and will
return the right result.

Also write PID file atomically to prevent `is_running()` raising
a `ValueError` if PID file is empty.