Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database connections lost after any time waiting #1

Closed
judgej opened this issue Apr 18, 2018 · 4 comments
Closed

Database connections lost after any time waiting #1

judgej opened this issue Apr 18, 2018 · 4 comments

Comments

@judgej
Copy link
Member

judgej commented Apr 18, 2018

This is a problem I've not seen on MySQL but am seeing on Azure SQL Server.

If no jobs are processed for some time - in the order of a few hours - then the database connection seems to "go bad". This results in a failure to dispatch any further jobs from the queue, with an error:

[2018-04-17 23:02:17] production.ERROR: SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. (SQL: select top 1 * from [queue_monitor] where [job_id] = 66084abc-7654-4b2c-82fa-a66e655eeb6d order by [started_at] desc, [id] desc) {"exception":"[object] (Illuminate\Database\QueryException(code: 08S02): SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. (SQL: select top 1 * from [queue_monitor] where [job_id] = 66084abc-7654-4b2c-82fa-a66e655eeb6d order by [started_at] desc, [id] desc) at .../vendor/laravel/framework/src/Illuminate/Database/Connection.php:664, PDOException(code: 08S02): SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]. at .../vendor/laravel/framework/src/Illuminate/Database/Connection.php:326)

I guess the events are registered in the long-running job worker process, and it is that process that attempts to log the job details, and the database connection does not stay open for long waiting for that to happen.

The exception also does not put the job into the failed jobs queue, so it is effectively lost. This means it is absolutely essential that no event handlers registered against the job dispatcher must throw ANY exceptions. It is very important to catch any kind of error, perhaps log it, but then return to the caller.

@judgej
Copy link
Member Author

judgej commented Apr 18, 2018

Possible solutions:

  • Restart the queue worker regularly.
  • Extend the database connection time (but to what arbitrary length?)
  • Make a database connection attempt, catching any failures, then performing a reconnect if the connection has been found to have failed.

So why is this working on MySQL? A worker with just one job every 24 hours seems to work, and the MySQL database connections certainly don't last that long. Is Laravel handling the reconnections automatically for us, but missing the SQL Server failed connections due to the nature of the error ("connection not usable" rather than "not connected"). It seems like the process still thinks it is connected, but the database thinks it isn't, and the Azure infrastructure in between is not joining up the dots to tell the process it has lost the database connection.

@judgej
Copy link
Member Author

judgej commented Apr 18, 2018

Some info that may be useful here:

https://laracasts.com/discuss/channels/laravel/database-aborted-connections-with-queue-daemon

It seems that Laravel does automatically reconnect MySQL database connections automatically, which will explain how that works.

queue:listen starts up a new process for each job. I'm not sure if that helps at all (it probably doesn't, so long as the long-running process reconnects, but then, without this logging, there would be nothing to reconnect, so this connection error would happen in each worker if they are not started as a new connection, and even then, if a worker fails and the database is not connected, then it would not get written to the job failure table). Bit of a mess really. Perhaps I need a separate package to hook into the scheduler process and just make sure the database connection is working if it has disconnected in any way. Even the disconnect may fail in this state that is half-connected and half-not-connected.

@judgej
Copy link
Member Author

judgej commented Apr 18, 2018

This may just be a Laravel illuminate\database issue, with the dropped database connection not being detected.

Issue raised here: laravel/framework#23925

Hopefully it's a simple fix, which I am trying now, and will generate a PR if it works.

@judgej judgej changed the title Database connections lost over any time waiting Database connections lost after any time waiting Apr 18, 2018
@judgej
Copy link
Member Author

judgej commented Apr 29, 2018

This is pushed over to the Laravel project (Laravel should hide the lost connections from the application), so closing this issue.

@judgej judgej closed this as completed Apr 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant