Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import Configurable products on big catalogs #278

Closed
barbazul opened this issue Apr 9, 2013 · 11 comments
Closed

Import Configurable products on big catalogs #278

barbazul opened this issue Apr 9, 2013 · 11 comments

Comments

@barbazul
Copy link
Contributor

barbazul commented Apr 9, 2013

I've just stumbled upon an issue when importing a small file into a large catalog using ImportExport.

The file contained 2 simple products and a configurable product associated to them via 2 attributes (size and color).

You can find the sample file in here: https://gist.github.com/barbazul/5349433

I imported the file in a catalog of about 40k products. The data validation went ok, but the actual import failed (mutely).

Debugging through the script I found the error to happen during an inventory event (which is weird on itself).

The whole path is the following:

  1. Eventually the script gets to Mage_ImportExport_Model_Import_Entity_Product_Type_Configurable::_loadSkuSuperAttributeValues()
  2. The foreach in line 233 fires a call to load on a Mage_Catalog_Model_Resource_Product_Collection filtering by product type equal to 'simple', 'downloadable' or 'virtual'
  3. The call to load fires the catalog_product_collection_load_after event
  4. CatalogInventory has an observer for that event: Mage_CatalogInventory_Model_Observer::addStockStatusToCollection()
  5. The observer in turn calls Mage_CatalogInventory_Model_Stock_Status::addStockStatusToProducts() passing the loaded collection
  6. The model will walk the entire collection (40k products at this time) and collect all product ids in an array, which is then passed, in line 479 to $this->_getResource()->getProductStatus($productIds, $websiteId, $stockId)
  7. Mage_CatalogInventory_Model_Resource_Stock_Status::getProductStatus() takes the array and passes it to a were IN condition in a select element (line 116)

The result is a query with a WHERE clause like this:

SELECT 
[...]
WHERE 
    product_id IN ( [40k IDs] ) 
    AND stock_id = $stockId 
    AND website_id = $websiteId

Conclussions

  1. I don't think there is any need to load any inventory information at all just to assign associated products to a configurable
  2. Even if it just "didn't hurt" to have inventory information at that time, there is something seriously wrong in that query
  3. The whole issue is being triggered by a method that is attempting to load the full mapping of simple products to their corresponding assigned options in the supper attributes. In many cases, that means the whole catalog will attempt to be loaded into memory. Is that safe or even necessary for the task being addressed? It looks like it won't scale well on large catalogs
@s-hoffman
Copy link

@barbazul,

I am currently working with a company that has 700K Simple Products and 3 Super Attributes for most of them.
Product Import fails due to Memory overflow during the Configurable product phase. I have increased memory to 6GB or so, and it runs through that also.

The SQL call time also spikes at about 75K. The exact number varies so I am assuming that the DB has to store some of the Joined information on the disk when the total data exceeds the memory limit.

Due to the above I have written a short Module rewrite for the Configurable product that Fixes the bug.
When I test the full import, I aim to Patch ../Configurable.php and push the change.

I am working with Magento EE 1.12 but ../Configurable.php is the same in both versions so it should carry over.

@barbazul
Copy link
Contributor Author

@s-hoffman I'd like to see that patch!

Meanwhile I spent a little time thinking about this. I think the problem is within the CatalogInventory observer which very naively assumes it is being called using a small, paged, subsection of the catalog. I think the whole query should be rewritten to assume the worst, though I am not sure how

@barbazul barbazul reopened this Apr 16, 2013
@s-hoffman
Copy link

@barbazul I have pushed the change to my fork.
I do not have Mage 2 configured and I implemented this version slightly different then the rewrite for EE1.12.
It should work though. As you are trying to import your 40K products you can test it with little effort.

Then I can create a Pull request.

@barbazul
Copy link
Contributor Author

Looks very promissing. I will give it a try and report back

@s-hoffman
Copy link

Any feedback yet?

@barbazul
Copy link
Contributor Author

Sorry. Havent found the time to check this out yet.
I'll be toying with this during the weekend
El abr 19, 2013 5:55 PM, "s-hoffman" notifications@github.com escribió:

Any feedback yet?


Reply to this email directly or view it on GitHubhttps://github.com//issues/278#issuecomment-16688232
.

@barbazul
Copy link
Contributor Author

I just tested in 1.7.0.2 And the ammount of ids used in the IN clause dropped from 48778 to 4 (I was creating a configurable with 3 options). The import run well, however the configurable product did not get associated simple products.

I am going to try again after refreshing all indexes (which could take several hours)

@barbazul
Copy link
Contributor Author

I can confirm this works like a charm in 1.7.0.2. The error I found before was entirely human (misstyped the name of the options)

I don't have any magento2 instances set up with such a large dataset though

You should definitely prepare a pull-request for this with tests 👍

@barbazul
Copy link
Contributor Author

barbazul commented May 2, 2013

@s-hoffman I strongly recommend you submit a pull request so this gets fixed in next release

@barbazul
Copy link
Contributor Author

barbazul commented May 2, 2013

Ignore my previous comment. I just found it: #284

magento-team added a commit that referenced this issue Mar 7, 2014
* Cache:
  * Implemented depersonalization of private content generation
  * Implemented content invalidation
  * Added Edge Side Includes (ESI) support
  * Added a built-in caching application
* GitHub requests:
  * [#454](#454) -- Allow to specify list of IPs in a body on maintenance.flag which will be granted access even if the flag is on
  * [#204](#204) -- Mage_ImportExport: Exporting configurable products ignores multiple configurable options
  * [#418](#418) -- Echo vs print
  * [#419](#419) -- Some translation keys are not correct.
  * [#244](#244) -- Retrieve base host URL without path in error processor
  * [#411](#411) -- Missed column 'payment_method' of table 'sales_flat_quote_address'
  * [#284](#284) -- Fix for Issue #278 (Import -> Stores with large amount of Configurable Products)
* Fixed bugs:
  * Fixed an issue where Mage_Eav_Model_Entity_Type::fetchNewIncrementId() did not rollback on exception
  * Fixed an issue where a category containing more than 1000 products could not be saved
  * Fixed inappropriate error messages displayed during installation when required extensions were not installed
  * Fixed synopsis of the install.php script
  * Fixed an issue where the schedule of recurring payments was not displayed in the shopping cart
* Modularity improvements:
  * Introduced the OfflinePayments module - a saparate module for offline payment methods
  * Added the ability to enable/disable the Paypal module
  * Moved the framework part of the Locale functionality from the Core module to library
  * The Locale logic was split among appropriate classes in library, according to their responsibilities
  * Removed the deprecated DHL functionality
  * Introduced the OfflineShipping module for offline shipping carrier functionality: Flatrate, Tablerate, Freeshipping, Pickup
  * Introduced a separate module for the DHL shipping carrier
  * Introduced a separate module for the Fedex shipping carrier
  * Introduced a separate module for the UPS shipping carrier
  * Introduced a separate module for the USPS shipping carrier
* Framework Improvements:
  * Added the ability to intercept internal public calls
  * Added the ability to access public interface of the intercepted object
  * Added a static integrity test for plugin interface validation
  * Added support for both class addressing approaches in DI: with and without slash ("\") at the beginning of a class name
* Customer Service usage:
  * Refactored the Customer module blocks and controllers to use customer service layer
* Security:
  * Introduced the ability to hash a password with a random salt of default length (32 chars) by the encryption library
  * Utilized a random salt of default length for admin users, and frontend customers
@verklov
Copy link
Contributor

verklov commented Apr 17, 2014

Fixed by #284. Closing this issue.

@verklov verklov closed this as completed Apr 17, 2014
elevateweb pushed a commit to elevateweb/AvS_FastSimpleImport that referenced this issue Nov 12, 2014
Refer to this thread for the full issue.
magento/magento2#278

We confirmed this issue Exists in Magento EE 1.13 Where for large
catalogs we were experiencing out of memory issues due to this function
_loadSkuSuperAttributeValues()

This patch will help those stores with large amounts of configurable
products process large imports with hundreds of thousands of rows
without hitting this issue as a
bottleneck
tang-yu pushed a commit that referenced this issue May 8, 2015
[Extensibility] Bug fixes, public APIs, Sprint 50
magento-team pushed a commit that referenced this issue Jan 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants