Layout corruption with php-fpm (a Magento 1 problem returns) #6942
Comments
I've tested the fix and unfortunately the cache corruption has occurred again, so it may be that the cause of this problem is different, or the fix needs to be different for Magento 2. |
Can you please post a screenshot of the error when it happens? I'm trying to replicate on vanilla and need to find something to hook into my bash script so that it can identify a failure programmatically. For now I'll focus on the page missing a tag in its entirety. |
I've spent an hour or so playing and have a few thoughts, although sadly no solution. I hope I actually get my hands on a demonstrable case of this bug at some point in the future.
M1 and configuration corruption
There were multiple errors that could cause config corruption in M1. Aside from the cache locking race condition (https://github.com/convenient/magento-ce-ee-config-corruption-bug#update-good-news-a-patch-from-magento), the most likely culprit would be running PHP-FPM together with the non-thread-safe function described in https://bugs.php.net/bug.php?id=64938
|
Thanks for all that work! I will try and go through it soon. For now, I'll just note that I thought there were unsafe calls in symfony and zend components that are manually turning on libxml_disable_entity_loader. Thanks again, |
One of our projects was experiencing this issue as well, and as a workaround, we configured a cron task for clearing the layout cache every 3 hours just to avoid broken pages from being shown on the site. We recently upgraded to Magento 2.1.3, and along with it, disabled the cron task for cleaning the layout cache. So far there are no reports of this issue occurring again. Anybody else experienced the same? |
@eldonbite What version did you upgrade from? Do you have any other debug info to add? |
Hi @convenient, We were previously running 2.1.2. |
@maderlock @convenient @eldonbite Are you using Redis for cache storage? |
Hi @mrIntegrator, yes we do use Redis for both cache and session storage, and Varnish for FPC. Just recently we encountered this same issue in the Admin (instead of the front-end), and clearing just the layout cache fixed the broken Admin pages. |
Sorry that it's been a while. This bug came back to mind when I noticed one of our sites still had layout cache disabled in production as a precaution. Lo and behold, the bug still exists in the latest version of Magento (currently 2.2.3). So here's what I've discovered.
Steps to Reproduce:
Now this is just how to quickly reproduce what happens under heavy traffic as mentioned in the initial bug report. We have only experienced this happening under heavy traffic as well. The problem is in the logic of this function:
We can see here that there is a check as to whether the layout updates were loaded from cache, but if they were, it's just assumed that page layout will be loaded from cache as well. The problem is that when Redis reaches its maxmemory setting (as it might under heavy traffic), it will begin evicting keys (see https://redis.io/topics/lru-cache). So if the page layout key is evicted while the layout key is not, the bug occurs. To fix this bug, we have to check for a successful load of both layout and page layout from the cache. I will create a PR with a fix for this. Hopefully Magento can get this fixed quickly and we'll all be able to turn layout cache back on! |
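The eviction scenario described above can be modelled in a few lines. This is only a sketch in Python, not Magento code: `EvictingCache`, `build_from_xml`, `load_buggy`, `load_fixed` and the key names `layout` and `page_layout` are all hypothetical stand-ins chosen for illustration. It shows why checking only one of two related cache keys breaks once the other key has been evicted, and how checking both avoids it.

```python
class EvictingCache:
    """Tiny stand-in for a Redis-like cache in which any key may be evicted."""

    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        # None models a cache miss.
        return self._data.get(key)

    def evict(self, key):
        self._data.pop(key, None)


def build_from_xml():
    """Hypothetical expensive rebuild of both values from the layout XML."""
    return {"layout": "<layout updates>", "page_layout": "2columns-left"}


def _rebuild_and_store(cache):
    fresh = build_from_xml()
    cache.save("layout", fresh["layout"])
    cache.save("page_layout", fresh["page_layout"])
    return fresh


def load_buggy(cache):
    """Checks only the layout key and assumes page_layout is cached too."""
    layout = cache.load("layout")
    if layout is not None:
        return {"layout": layout, "page_layout": cache.load("page_layout")}
    return _rebuild_and_store(cache)


def load_fixed(cache):
    """Uses the cache only if both keys were loaded successfully."""
    layout = cache.load("layout")
    page_layout = cache.load("page_layout")
    if layout is not None and page_layout is not None:
        return {"layout": layout, "page_layout": page_layout}
    return _rebuild_and_store(cache)


cache = EvictingCache()
load_buggy(cache)                 # warm both keys
cache.evict("page_layout")        # simulate eviction of one key only
broken = load_buggy(cache)        # page_layout is missing here
repaired = load_fixed(cache)      # falls back to a rebuild
```

In this model the buggy loader keeps returning a missing page layout until the whole layout cache is cleared, which matches the observation that clearing the layout cache fixes the pages for a while.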
The real production case, which I spent 40 hours debugging:
Symptoms
A full cache flush helps temporarily.
Pre-requisites
How It Happens
How to easily reproduce
Actual result: Empty page, no HTML code at all.
Root cause
Permanent fix
Pull requests
Other action items
|
As far as I can see, the fix has been included in v2.2.9 and v2.3.2, but the pull requests weren't actually merged. |
Actually, I ran into this old issue again in version 2.4. The site was running great in the months before, and I have been working around the clock to get this fixed. I noticed a correlation between the load and these broken pages, and that clearing the cache fixed things for a while (until the next load peak). |
I can hit 15 categories every second without a problem, but then one hangs for a while and comes up broken. I will raise the 1GB to 2GB again now that the database has been moved off the VPS, but that is still a lot of memory for processing pages. I'll come back with the outcome of this change. |
Setting it to 1.5GB didn't help. Even with 2GB I get a series of errors that 28480 bytes could not be allocated, on a page where a customer tried to set the product limit from 40 to 80. |
@onepack according to the logs you showed above, it seems you hit an out-of-memory issue. Please check whether your increased memory limit is actually being used; the best option is to check the memory_limit parameter in the output of the |
@ihor-sviziev I was just thinking about how, before we upgraded, the system ran for months without a restart on less RAM than we now need to keep things from crashing. I also noticed other users with the same issue: Checking above shows the same files causing memory peaks: |
@onepack This is definitely not related to the issue described above. |
I did some elastic searches and got results on the result page, but they also produced a memory error. As a visitor on the frontend you don't notice this, but the next search, or opening a category around the same time, will result in a broken layout or a blank 500 error page. This was triggered via the search, but the same happens when going through categories without the search. |
@ihor-sviziev |
@ihor-sviziev |
@onepack Your case looks different to me from the issue I described above. |
@andrey-legayev |
Preconditions
Steps to reproduce
Expected result
Actual result
This is fixed by clearing the layout cache. This displays in the same way as an issue in Magento 1 - see https://www.c3media.co.uk/blog/c3-news/security-fix-might-take-site
The fix in Magento 1 was to add the following to bootstrap.php:
Why is this needed? Because there are a number of places in Magento 2 where $oldval = libxml_disable_entity_loader(true) is called and then later libxml_disable_entity_loader($oldval). In theory this should be fine, but the value is not thread-safe, so when running under php-fpm it only requires a call to occur in between these statements for the entire thread to end up set to TRUE. Why this then corrupts the cache was never clear in Magento 1, but that was the upshot.
The above fix was added in Magento 1.9.2.0, but was not included in Magento 2. In one file (vendor/magento/framework/Xml/Security.php) there is an acknowledgment of the issue:
...but sadly it still calls libxml_disable_entity_loader despite the warning. This is also called in some of the Zend and Symfony components, so it's not really reasonable to handle all these situations. Thus the need for the suggested fix.
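To make the save-and-restore problem concrete, here is a minimal model in Python (not PHP, and not Magento code). `ProcessState` and its `disable_entity_loader` method are hypothetical stand-ins for a shared flag whose setter returns the previous value, as libxml_disable_entity_loader does. The interleaving order is an assumption chosen to show the failure; real timing under php-fpm will vary.

```python
class ProcessState:
    """Models a flag shared by every caller, like the entity-loader setting."""

    def __init__(self):
        self.disabled = False

    def disable_entity_loader(self, value):
        """Set the flag and return the previous value."""
        old = self.disabled
        self.disabled = value
        return old


def sequential_calls(state):
    """Two callers run one after another: each restore is correct."""
    old_a = state.disable_entity_loader(True)   # A saves False
    state.disable_entity_loader(old_a)          # A restores False
    old_b = state.disable_entity_loader(True)   # B saves False
    state.disable_entity_loader(old_b)          # B restores False
    return state.disabled


def interleaved_calls(state):
    """Caller B starts between A's save and A's restore."""
    old_a = state.disable_entity_loader(True)   # A saves False
    old_b = state.disable_entity_loader(True)   # B saves True (A's value)
    state.disable_entity_loader(old_a)          # A restores False
    state.disable_entity_loader(old_b)          # B "restores" True
    return state.disabled


after_sequential = sequential_calls(ProcessState())
after_interleaved = interleaved_calls(ProcessState())
```

In the interleaved case the second caller captures the first caller's temporary value as its "old" value, so its restore leaves the flag enabled for everything that runs afterwards in the same process.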
Not entirely sure how to fix this in the short term, as unlike Magento 1 we cannot just edit the bootstrap.php file in Magento 2.1.0. Any suggestions?