Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure SQL Server - Laravel not detecting lost database connections #23925

Closed
judgej opened this issue Apr 18, 2018 · 12 comments
Closed

Azure SQL Server - Laravel not detecting lost database connections #23925

judgej opened this issue Apr 18, 2018 · 12 comments

Comments

@judgej
Copy link
Contributor

judgej commented Apr 18, 2018

  • Laravel Version: 5.6
  • PHP Version: 7.2
  • Database Driver & Version: pdo_sqlsrv

Description:

Running Laravel on CentOS 7.4 in an Azure VM. Using Azure SQL Server database and Azure queues.

We have queue workers running. These monitor the queues with no issues. However, after a short time (a few hours or thereabouts) if a job comes in on the queue, the job throws an exception as it tries to access the database. The exception includes the message:

ERROR: SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF].

This continues for all further jobs, until the queue workers are killed.

Steps To Reproduce:

Most of the details are above. Basically, use Azure SQL Server with pdo_sqlsrv driver, run a artisan queue:work, do nothing for a couple of hours, then dispatch a job to the queue that needs to use the database.

I have found the list of error messages that Laravel uses to decide whether a database connection has been lost:

https://github.com/illuminate/database/blob/71f75f4d1d7ffaa33da4bde0da6f246ed7741479/DetectsLostConnections.php#L10

I have hacked "Physical connection is not usable" into that list on our system to see if that fixes the problem. Will know later today.

Assistance sought here: https://laracasts.com/discuss/channels/laravel/how-do-i-add-a-message-to-the-list-detecting-a-closed-database-connection

@judgej
Copy link
Contributor Author

judgej commented Apr 18, 2018

A 17:00 BST scheduled job we have ran successfully after being left without processing any queued jobs for 2.5 hours. It's looking promising. I'll leave the system for the midnight job to see how that goes, before submitting a PR.

Just a note, but the list of messages that Laravel checks for, to determine whether the database connection has been closed, are all English. How does this work in other locales, where the PDO driver may be returning non-English error messages? Are they all having problems running queue worker daemons?

@judgej
Copy link
Contributor Author

judgej commented Apr 19, 2018

Had a different "connection lost" error this time:

SQLSTATE[08S01]: [Microsoft][ODBC Driver 13 for SQL Server]TCP Provider: Error code 0x68

It is becoming apparent to me that this list of error messages that indicate a connection was lost, needs to be configurable. I can see no way to override that Trait, hard-coded in the Connection class, that is not macroable (AFAICS), except by an override in the autoloader.

If the trait moved the list of messages to a property, and provided some public methods to manipulate that list, that could be one approach. Though, this trait is used in multiple places, so maybe not. Perhaps it should be a singleton class and not a trait.

judgej added a commit to judgej/database that referenced this issue Apr 19, 2018
See laravel/framework#23925

Seeing these on MS Azure, with Azure SQL Server, and CentOS 7 in a VM with PHP 7.2 + pdo_sqlsrv. Cloud database connections seem to be made of water, and time out in many ways. I suspect there will be more to come.
@adamtester
Copy link
Contributor

Hello @judgej, I have this problem as well and it's caused by running the workers in daemon mode, causing the connection to the DB to be left open.

When changing to not use it in production it fixes the issue (although jobs and processed much slower).

You can increase the timeout in SQL Server but not sure how to do it for AzureDB

SQLSTATE[08S02]: [Microsoft][ODBC Driver 13 for SQL Server]SMux Provider: Physical connection is not usable [xFFFFFFFF]

@judgej
Copy link
Contributor Author

judgej commented Apr 24, 2018

I've not been able to find much information on what running the jobs in non-daemon mode actually does. If it continuously restarts itself every few seconds, then that is a horrendous use of resources - constant database connects, constant process restart processing costs. I would rather avoid that if I can. But the documentation on what it actually does is very sparse.

Having the database connection dropped though inactivity is fine, so long as Laravel invisibly reconnects when the connection is needed again. Obviously avoid keeping transactions open for long periods, as you could end up with lost data when that drops.

I am finding this is becoming an issue with long-running queue listeners though. When the queue connection drops, it sends an error message to Laravel, and Laravel is not able to handle it and reconnect. For now, I'm restarting the queue workers every ten minutes, which is a reasonable compromise.

The PR I've submitted has the "Physical connection is not usable" message you are seeing included in the list of message Laravel uses to recognise lost connections, which should help you if/when it gets merged in. However, I do feel a better solution is needed there, something that will work in non-English locales would be good for a start, or a way for applications to inject their own list of messages would be an okay workaround. It's all hard-coded in a trait at the moment, and I cannot see a way to extend that, except by an auto-loader override, which can be a bit flaky as you have no control over the order composer runs its filesystem search.

@adamtester
Copy link
Contributor

adamtester commented Apr 24, 2018

Would that make Laravel auto reconnect? Or just show a more friendly error message?
We never seem to have dropped connections our MySQL or PostgreSQL applications.

I will try your PR on my local and report back!

EDIT: Just read your comment on the PR!

@judgej
Copy link
Contributor Author

judgej commented Apr 24, 2018

You do have dropped connections with MySQL. It is Laravel that quietly brushes them under the carpet, reconnects, then retries the SQL or DML again for you. That's the magic of using a framework - it does stuff like that so you don't have to think about it :-) Until it doesn't :-(

@adamtester
Copy link
Contributor

adamtester commented Jun 12, 2018

This is becoming a real problem on production, I wrote some terrible code which seems to have fixed it until your PR goes through, since you can't easily override the trait

private function checkForDroppedConnection()
{
    $location = '../vendor/laravel/framework/src/Illuminate/Database/DetectsLostConnections.php';
    $file = file_get_contents($location);

    if (!\Illuminate\Support\Str::contains($file, 'Physical connection is not usable')) {
        $newFile = file_get_contents('../storage/hacks/DetectsLostConnections.php');
        $fp = fopen($location, 'wb');
        fwrite($fp, $newFile);
        fclose($fp);
    }
}

Which copies over the new file, this can be run as a composer post-install hook, or even just stick it in a middleware

$newFile is the same just with your changes

@judgej
Copy link
Contributor Author

judgej commented Jun 12, 2018

Another way is to duplicate the whole class, make your changes, then make sure it is loaded before the Laravel version is loaded (just a nasty hacky include in index.php will make sure your odified version is loaded), and the composer autoloader won't then try to load the original unpatched version.

@judgej
Copy link
Contributor Author

judgej commented Jun 12, 2018

Did I submit the PR to the wrong place?

@adamtester
Copy link
Contributor

I think so! Realised none of these PR's got merged then saw this was read only. I reopened it on the real one and was merged (sorry, feel like I took your credit) I tagged this issue to be looked at.

@judgej
Copy link
Contributor Author

judgej commented Jun 12, 2018

@adamtester I'll hunt you down - for a beer. I obviously failed to get this thing fixed, and you did, so thank you :-)

@xmaop
Copy link

xmaop commented Feb 28, 2019

@adamtester Thank you.
I ran this command in my docker container and the error disappeared.
sudo docker exec -ti dockerizedapp_app_1 supervisorctl restart laravel-worker:laravel-worker_00

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants