Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Horizon jobs stay on pause after 5.6 upgrade #292

Closed
orkhanahmadov opened this issue Feb 9, 2018 · 13 comments
Closed

Horizon jobs stay on pause after 5.6 upgrade #292

orkhanahmadov opened this issue Feb 9, 2018 · 13 comments

Comments

@orkhanahmadov
Copy link

I had Horizon setup for Redis queue. Everything was working fine.

Upgraded Laravel to 5.6, now jobs stay on "pause", about 1 on every 10 jobs actually finish processing, all others stay on "pause". Horizon log file also does not have anything about "paused" jobs.

Reverted back 5.5, everything works fine.

Anyone else having this issue?

@orkhanahmadov orkhanahmadov changed the title Horizon jobs stay on pause after 5.6 upgraded Horizon jobs stay on pause after 5.6 upgrade Feb 9, 2018
@Gorbutt
Copy link

Gorbutt commented Feb 9, 2018

Yes, I am. I wrote about the issue on Laracasts and on LaraChat but no-one has commented on it. The entry is here at Laracasts. Most jobs would get stuck but a few went through.

I came here to report this and saw your post. You're not alone!

It was also throwing an exception every 3 minutes and filling the log. No jobs were dispatched but soon as I ran 'php artisan horizon' it would throw this exception every 3 minutes or so.
local.ERROR: Error while reading line from the server. [tcp://127.0.0.1:6379] {"exception":"[object] (Predis\Connection\ConnectionException(code: 0): Error while reading line from the server. [tcp://127.0.0.1:6379] at

I wasn't sure it was a local issue so I uploaded the code to my staging server on Ubuntu (Forge provisioned) and exactly the same thing.

I rolled back to 5.5 and all is good.

@Insolita
Copy link

Insolita commented Feb 9, 2018

I have same trouble after upgrade to 5.6, Solved by set horizon version as 1.1.1, 1.2 seems not stable

@Gorbutt
Copy link

Gorbutt commented Feb 9, 2018

I upgraded framework to 5.6 again, same issue, downgraded Horizon to 1.1.1 and issue remained. Rolled back framework to 5.5 and ok again. Horizon 1.1.1 and 1.2 same issue for me.

@shrink
Copy link

shrink commented Feb 10, 2018

I'm having what is potentially the same issue. A job is dispatched (from a model observer) and the job is created: it's visible in the queue ("Recent Jobs") and the count under "Current Workload" for the queue reflects the jobs presence. However, at no point is the job ever executed. I have added logging to both the job constructor and the handle method to verify this. The job remains in limbo, forever: "paused" state, unexecuted. The next time "Current Workload" updates (appears to be every second) it now shows no jobs pending.

I have a command running every minute populating the queue with a couple of hundred jobs, with other jobs created on demand in response to a request. These jobs are all the same class, the only difference is the model passed in.

The issue doesn't seem to have much of a pattern: it's occurring (as far as I can tell) exclusively with my jobs that are dispatched in response to a request. I can dispatch 20 jobs in a row manually and each will succeed, and then one might enter the limbo state, or a dozen might enter that state, or one might enter that state followed by success followed by another in that state. This issue appears to be most common when the queue is being populated, i.e if I add a job to the queue at the same time as my command is populating the queue it's more likely to enter the limbo state but given the seemingly random nature of this issue I'm not sure if that's just a coincidence.

Laravel 5.6
Laravel Horizon 1.1
Redis as the queue
Memory and CPU usage below 50%

I'm going to attempt to debug this more and will add anything I find here.

@shrink
Copy link

shrink commented Feb 10, 2018

Making use of redis-cli monitor I think I have tracked down something that might point us in the right direction. I monitored a successful job and I monitored a job that entered "limbo" and the difference seems to be that the job is taken from the queue (BLPOP) but it isn't reserved, it just disappears. For example, here's a successful job:

1518230973.746618 [0 127.0.0.1:44506] "BLPOP" "queues:example" "0"
1518230973.746885 [0 127.0.0.1:44506] "ZADD" "queues:example:reserved" "1518231063" "{..}"
1518230973.747353 [0 127.0.0.1:45260] "HMSET" "horizon:228144" "status" "reserved" "payload"
1518230973.956982 [0 127.0.0.1:44508] "ZREM" "queues:example:reserved" "{...}"
1518230973.964735 [0 127.0.0.1:45254] "HMSET" "horizon:228137" "status" "completed" "completed_at" "1518230973"

A job is pulled from the queue, it's added to queues:example:reserved and its status is set to reserved, then the job completes and it's removed from the reserved queue and marked as complete.

...whereas, looking at a job in limbo I see the BLPOP but then nothing happens with that job: it's not reserved, there's nothing, no further log lines (in the following 60 seconds) relate to that job, it just
disappears — the rest of the log relates to a different job on a different queue being removed (included just in case there's something indicative of the issue in them).

1518230975.209830 [0 127.0.0.1:56044] "BLPOP" "queues:priority" "0"
1518230975.218347 [0 127.0.0.1:45424] "ZREM" "queues:example:reserved" "{...}"
1518230975.218792 [0 127.0.0.1:45426] "HMSET" "horizon:228143" "status" "completed" "completed_at" "1518230975"
1518230975.218893 [0 127.0.0.1:45426] "EXPIREAT" "horizon:228143" "1518234575"

So the issue appears to be: for some reason, a job in the queue is being taken but not reserved and never processed, and therefore remains in limbo (status pending but not in the queue) until it expires.

Since this works in Laravel 5.5 but not 5.6 then I'm guessing it's a Laravel issue, and the Laravel RedisQueue class was modified very recently, the retrieveNextJob method specifically... and then when I looked at the commit for those changes I see there's all these comments on the Pull Request about race conditions and lost jobs which could well explain this experience... so maybe this is it? Maybe that's the cause?

@josiasmontag
Copy link
Contributor

As a temporary fix, adding 'block_for' => null to the queue.php config solves the issue:

    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'default',
        'retry_after' => 90,
        'block_for' => null
    ],

@godbout
Copy link

godbout commented Feb 10, 2018

! Been fighting with predis for the whole day as I got a comment that this was not due to a Laravel issue. Glad I'm finding this thread.

For me, killing horizon (and having supervisord restarting it) get the jobs through. But basically, I have to kill horizon after every job queued.

@shrink
Copy link

shrink commented Feb 10, 2018

@josiasmontag as far as I can tell, setting block_for to null has indeed resolved my missing jobs. Thank you :-)

@sptdigital
Copy link

sptdigital commented Mar 6, 2018

I am still experiencing this issue on Laravel 5.5 and Horizon 1.2.2, I have cleared my config cache and restarted supervisor (I also killed supervisor, killed horizon and ran it manually).

If you follow 6fe4cd4 back it suggests this issue came about as a result to changes in the default config for 5.6, but I'm seeing it in 5.5 as well.

Now all my dependencies have 5.6 support, I'll try updating it to 5.6 tomorrow and seeing how we get on.

@cspshadx
Copy link

The problem persists, Laravel 5.6, Horizon 1.1.1, 1.2.3 (neither version works). Tried solution with setting block_for=null in config with no luck. Can't downgrade to 5.5 as the project was built on 5.6 initially and various packages have 5.6 deps, so downgrading is very time consuming.

@nicorafales
Copy link

Same as @cspshadx . My problem happens only with jobs that open a file in local disk. It seems that horizon is not so mature as we would want it to be.

@shadoWalker89
Copy link

I started a new app
I'm on laravel 5.8.17 and horizon 3.1.2 (latest version of both)
I have this job that run for 831 seconds.
When the job was actually running, the horizon shows the job is paused even though it is not
My job sends progress back to the browser and i was seeing the progress advance till the end.

When my job finished, horizon showed that the job finished successfully and it run for 831seconds.
Before it was paused and the runtime is 0.4 seconds

image

@jezmck
Copy link

jezmck commented Jul 2, 2020

I found a workaround; run the standard artisan queue:work, as well as artisan horizon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests