Database connections lost after any time waiting #1

judgej · 2018-04-18T10:31:52Z

This is a problem I've not seen on MySQL but am seeing on Azure SQL Server.

If no jobs are processed for some time - in the order of a few hours - then the database connection seems to "go bad". This results in a failure to dispatch any further jobs from the queue, with an error:

[2018-04-17 23:02:17] production.ERROR: SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. (SQL: select top 1 * from [queue_monitor] where [job_id] = 66084abc-7654-4b2c-82fa-a66e655eeb6d order by [started_at] desc, [id] desc) {"exception":"[object] (Illuminate\Database\QueryException(code: 08S02): SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. (SQL: select top 1 * from [queue_monitor] where [job_id] = 66084abc-7654-4b2c-82fa-a66e655eeb6d order by [started_at] desc, [id] desc) at .../vendor/laravel/framework/src/Illuminate/Database/Connection.php:664, PDOException(code: 08S02): SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. at .../vendor/laravel/framework/src/Illuminate/Database/Connection.php:326)

I guess the events are registered in the long-running job worker process, and it is that process that attempts to log the job details, and the database connection does not stay open for long waiting for that to happen.

The exception also does not put the job into the failed jobs queue, so it is effectively lost. This means it is absolutely essential that no event handlers registered against the job dispatcher must throw ANY exceptions. It is very important to catch any kind of error, perhaps log it, but then return to the caller.

judgej · 2018-04-18T10:36:16Z

Possible solutions:

Restart the queue worker regularly.
Extend the database connection time (but to what arbitrary length?)
Make a database connection attempt, catching any failures, then performing a reconnect if the connection has been found to have failed.

So why is this working on MySQL? A worker with just one job every 24 hours seems to work, and the MySQL database connections certainly don't last that long. Is Laravel handling the reconnections automatically for us, but missing the SQL Server failed connections due to the nature of the error ("connection not usable" rather than "not connected"). It seems like the process still thinks it is connected, but the database thinks it isn't, and the Azure infrastructure in between is not joining up the dots to tell the process it has lost the database connection.

judgej · 2018-04-18T11:07:51Z

Some info that may be useful here:

https://laracasts.com/discuss/channels/laravel/database-aborted-connections-with-queue-daemon

It seems that Laravel does automatically reconnect MySQL database connections automatically, which will explain how that works.

queue:listen starts up a new process for each job. I'm not sure if that helps at all (it probably doesn't, so long as the long-running process reconnects, but then, without this logging, there would be nothing to reconnect, so this connection error would happen in each worker if they are not started as a new connection, and even then, if a worker fails and the database is not connected, then it would not get written to the job failure table). Bit of a mess really. Perhaps I need a separate package to hook into the scheduler process and just make sure the database connection is working if it has disconnected in any way. Even the disconnect may fail in this state that is half-connected and half-not-connected.

judgej · 2018-04-18T13:35:05Z

This may just be a Laravel illuminate\database issue, with the dropped database connection not being detected.

Issue raised here: laravel/framework#23925

Hopefully it's a simple fix, which I am trying now, and will generate a PR if it works.

…e database

judgej · 2018-04-29T11:13:19Z

This is pushed over to the Laravel project (Laravel should hide the lost connections from the application), so closing this issue.

judgej changed the title ~~Database connections lost over any time waiting~~ Database connections lost after any time waiting Apr 18, 2018

judgej added a commit that referenced this issue Apr 19, 2018

Issue #1 protecting the scheduler from exceptions while logging to th…

144e8dd

…e database

judgej closed this as completed Apr 29, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database connections lost after any time waiting #1

Database connections lost after any time waiting #1

judgej commented Apr 18, 2018

judgej commented Apr 18, 2018

judgej commented Apr 18, 2018 •

edited

Loading

judgej commented Apr 18, 2018

judgej commented Apr 29, 2018

Database connections lost after any time waiting #1

Database connections lost after any time waiting #1

Comments

judgej commented Apr 18, 2018

judgej commented Apr 18, 2018

judgej commented Apr 18, 2018 • edited Loading

judgej commented Apr 18, 2018

judgej commented Apr 29, 2018

judgej commented Apr 18, 2018 •

edited

Loading