Horizon jobs stay on pause after 5.6 upgrade #292

orkhanahmadov · 2018-02-09T14:04:08Z

I had Horizon setup for Redis queue. Everything was working fine.

Upgraded Laravel to 5.6, now jobs stay on "pause", about 1 on every 10 jobs actually finish processing, all others stay on "pause". Horizon log file also does not have anything about "paused" jobs.

Reverted back 5.5, everything works fine.

Anyone else having this issue?

Gorbutt · 2018-02-09T20:43:11Z

Yes, I am. I wrote about the issue on Laracasts and on LaraChat but no-one has commented on it. The entry is here at Laracasts. Most jobs would get stuck but a few went through.

I came here to report this and saw your post. You're not alone!

It was also throwing an exception every 3 minutes and filling the log. No jobs were dispatched but soon as I ran 'php artisan horizon' it would throw this exception every 3 minutes or so.
local.ERROR: Error while reading line from the server. [tcp://127.0.0.1:6379] {"exception":"[object] (Predis\Connection\ConnectionException(code: 0): Error while reading line from the server. [tcp://127.0.0.1:6379] at

I wasn't sure it was a local issue so I uploaded the code to my staging server on Ubuntu (Forge provisioned) and exactly the same thing.

I rolled back to 5.5 and all is good.

Insolita · 2018-02-09T21:21:15Z

I have same trouble after upgrade to 5.6, Solved by set horizon version as 1.1.1, 1.2 seems not stable

Gorbutt · 2018-02-09T22:02:54Z

I upgraded framework to 5.6 again, same issue, downgraded Horizon to 1.1.1 and issue remained. Rolled back framework to 5.5 and ok again. Horizon 1.1.1 and 1.2 same issue for me.

shrink · 2018-02-10T02:18:02Z

I'm having what is potentially the same issue. A job is dispatched (from a model observer) and the job is created: it's visible in the queue ("Recent Jobs") and the count under "Current Workload" for the queue reflects the jobs presence. However, at no point is the job ever executed. I have added logging to both the job constructor and the handle method to verify this. The job remains in limbo, forever: "paused" state, unexecuted. The next time "Current Workload" updates (appears to be every second) it now shows no jobs pending.

I have a command running every minute populating the queue with a couple of hundred jobs, with other jobs created on demand in response to a request. These jobs are all the same class, the only difference is the model passed in.

The issue doesn't seem to have much of a pattern: it's occurring (as far as I can tell) exclusively with my jobs that are dispatched in response to a request. I can dispatch 20 jobs in a row manually and each will succeed, and then one might enter the limbo state, or a dozen might enter that state, or one might enter that state followed by success followed by another in that state. This issue appears to be most common when the queue is being populated, i.e if I add a job to the queue at the same time as my command is populating the queue it's more likely to enter the limbo state but given the seemingly random nature of this issue I'm not sure if that's just a coincidence.

Laravel 5.6
Laravel Horizon 1.1
Redis as the queue
Memory and CPU usage below 50%

I'm going to attempt to debug this more and will add anything I find here.

shrink · 2018-02-10T03:52:21Z

Making use of redis-cli monitor I think I have tracked down something that might point us in the right direction. I monitored a successful job and I monitored a job that entered "limbo" and the difference seems to be that the job is taken from the queue (BLPOP) but it isn't reserved, it just disappears. For example, here's a successful job:

1518230973.746618 [0 127.0.0.1:44506] "BLPOP" "queues:example" "0"
1518230973.746885 [0 127.0.0.1:44506] "ZADD" "queues:example:reserved" "1518231063" "{..}"
1518230973.747353 [0 127.0.0.1:45260] "HMSET" "horizon:228144" "status" "reserved" "payload"
1518230973.956982 [0 127.0.0.1:44508] "ZREM" "queues:example:reserved" "{...}"
1518230973.964735 [0 127.0.0.1:45254] "HMSET" "horizon:228137" "status" "completed" "completed_at" "1518230973"

A job is pulled from the queue, it's added to queues:example:reserved and its status is set to reserved, then the job completes and it's removed from the reserved queue and marked as complete.

...whereas, looking at a job in limbo I see the BLPOP but then nothing happens with that job: it's not reserved, there's nothing, no further log lines (in the following 60 seconds) relate to that job, it just
disappears — the rest of the log relates to a different job on a different queue being removed (included just in case there's something indicative of the issue in them).

1518230975.209830 [0 127.0.0.1:56044] "BLPOP" "queues:priority" "0"
1518230975.218347 [0 127.0.0.1:45424] "ZREM" "queues:example:reserved" "{...}"
1518230975.218792 [0 127.0.0.1:45426] "HMSET" "horizon:228143" "status" "completed" "completed_at" "1518230975"
1518230975.218893 [0 127.0.0.1:45426] "EXPIREAT" "horizon:228143" "1518234575"

So the issue appears to be: for some reason, a job in the queue is being taken but not reserved and never processed, and therefore remains in limbo (status pending but not in the queue) until it expires.

Since this works in Laravel 5.5 but not 5.6 then I'm guessing it's a Laravel issue, and the Laravel RedisQueue class was modified very recently, the retrieveNextJob method specifically... and then when I looked at the commit for those changes I see there's all these comments on the Pull Request about race conditions and lost jobs which could well explain this experience... so maybe this is it? Maybe that's the cause?

josiasmontag · 2018-02-10T12:28:32Z

As a temporary fix, adding 'block_for' => null to the queue.php config solves the issue:

    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'default',
        'retry_after' => 90,
        'block_for' => null
    ],

godbout · 2018-02-10T13:27:06Z

! Been fighting with predis for the whole day as I got a comment that this was not due to a Laravel issue. Glad I'm finding this thread.

For me, killing horizon (and having supervisord restarting it) get the jobs through. But basically, I have to kill horizon after every job queued.

shrink · 2018-02-10T16:01:09Z

@josiasmontag as far as I can tell, setting block_for to null has indeed resolved my missing jobs. Thank you :-)

sptdigital · 2018-03-06T17:09:43Z

I am still experiencing this issue on Laravel 5.5 and Horizon 1.2.2, I have cleared my config cache and restarted supervisor (I also killed supervisor, killed horizon and ran it manually).

If you follow 6fe4cd4 back it suggests this issue came about as a result to changes in the default config for 5.6, but I'm seeing it in 5.5 as well.

Now all my dependencies have 5.6 support, I'll try updating it to 5.6 tomorrow and seeing how we get on.

cspshadx · 2018-05-16T08:39:47Z

The problem persists, Laravel 5.6, Horizon 1.1.1, 1.2.3 (neither version works). Tried solution with setting block_for=null in config with no luck. Can't downgrade to 5.5 as the project was built on 5.6 initially and various packages have 5.6 deps, so downgrading is very time consuming.

nicorafales · 2018-06-04T14:46:01Z

Same as @cspshadx . My problem happens only with jobs that open a file in local disk. It seems that horizon is not so mature as we would want it to be.

shadoWalker89 · 2019-05-20T09:25:12Z

I started a new app
I'm on laravel 5.8.17 and horizon 3.1.2 (latest version of both)
I have this job that run for 831 seconds.
When the job was actually running, the horizon shows the job is paused even though it is not
My job sends progress back to the browser and i was seeing the progress advance till the end.

When my job finished, horizon showed that the job finished successfully and it run for 831seconds.
Before it was paused and the runtime is 0.4 seconds

jezmck · 2020-07-02T09:04:43Z

I found a workaround; run the standard artisan queue:work, as well as artisan horizon.

orkhanahmadov changed the title ~~Horizon jobs stay on pause after 5.6 upgraded~~ Horizon jobs stay on pause after 5.6 upgrade Feb 9, 2018

shrink mentioned this issue Feb 10, 2018

[5.6] blocking pop from redis queues laravel/framework#22284

Merged

josiasmontag mentioned this issue Feb 10, 2018

block_for should default to null #294

Merged

sisve mentioned this issue Feb 10, 2018

[5.6] The new blocking pop feature for redis queues can lose jobs in case of worker failures laravel/framework#22939

Closed

taylorotwell closed this as completed in #294 Feb 10, 2018

sisve mentioned this issue Feb 11, 2018

Jobs stays on pause with multiple horizon workers on server #279

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Horizon jobs stay on pause after 5.6 upgrade #292

Horizon jobs stay on pause after 5.6 upgrade #292

orkhanahmadov commented Feb 9, 2018

Gorbutt commented Feb 9, 2018 •

edited

Loading

Insolita commented Feb 9, 2018 •

edited

Loading

Gorbutt commented Feb 9, 2018

shrink commented Feb 10, 2018

shrink commented Feb 10, 2018 •

edited

Loading

josiasmontag commented Feb 10, 2018

godbout commented Feb 10, 2018

shrink commented Feb 10, 2018

sptdigital commented Mar 6, 2018 •

edited

Loading

cspshadx commented May 16, 2018

nicorafales commented Jun 4, 2018

shadoWalker89 commented May 20, 2019

jezmck commented Jul 2, 2020

Horizon jobs stay on pause after 5.6 upgrade #292

Horizon jobs stay on pause after 5.6 upgrade #292

Comments

orkhanahmadov commented Feb 9, 2018

Gorbutt commented Feb 9, 2018 • edited Loading

Insolita commented Feb 9, 2018 • edited Loading

Gorbutt commented Feb 9, 2018

shrink commented Feb 10, 2018

shrink commented Feb 10, 2018 • edited Loading

josiasmontag commented Feb 10, 2018

godbout commented Feb 10, 2018

shrink commented Feb 10, 2018

sptdigital commented Mar 6, 2018 • edited Loading

cspshadx commented May 16, 2018

nicorafales commented Jun 4, 2018

shadoWalker89 commented May 20, 2019

jezmck commented Jul 2, 2020

Gorbutt commented Feb 9, 2018 •

edited

Loading

Insolita commented Feb 9, 2018 •

edited

Loading

shrink commented Feb 10, 2018 •

edited

Loading

sptdigital commented Mar 6, 2018 •

edited

Loading