-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE-107: fix out of memory when fetching Top objects with lots of children #112
Conversation
I will attempt a refactor and i want to be able to go back!
Next step now is to report better the progress, since children rows are invisible for the user. Working bee here @alliomeria and @karomabiles
@alliomeria @karomabiles we need to test/ see if performance changed
Until @alliomeria and @karomabiles have that conversation and we come up with a decision if Spreadsheets need to be split and many attached to the same SET or multiple sets or no sets at all. This can potentially bring 400K objects into a single Sheets.
We can no (yet) load a referenced entity here based on an AMI set because it would require to load the whole mapping of the AMI set yo deduce which field is being used for the node_uuid and also know if/the ADO actually exists. So we add a noticed
Small change. We won't longer use the $per_page to calculate the children max. We set the max to static increment and use the per_page only to count the top objects. this makes the 75% reducing factor when requesting all 500 and getting less top ones to only make the top objects less still allowing the children to increment more. In practice this keeps numbers low but does reduce less when there are many children making the fetch faster
At least this way @alliomeria and @karomabiles your decisions stick and are brought from child to top and back. Also changes a bit the messages we send during the batch fetching from Solr. But we need to test this, i kinda feel now things are a bit slower than before. Maybe it is me?
But should be. Tomorrow i need to try with a collection and fetch less and then fetch with offsets, maybe whitneystudioclub:collection
Another need to set $data with a default to please the getKey function
@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) { | |||
$amisetdata->total_rows = $data['totalrows']; | |||
} | |||
|
|||
// We should probably add the UUIDs here right now. | |||
$uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid'; | |||
/* remove from the mappings any column used as file source */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave this one as it is. Single commends can have /* */ or even //
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
This needs to be removed in the next release Co-authored-by: Albert Min <albert.min@gmail.com>
@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) { | |||
$amisetdata->total_rows = $data['totalrows']; | |||
} | |||
|
|||
// We should probably add the UUIDs here right now. | |||
$uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid'; | |||
/* remove from the mappings any column used as file source */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave this one as it is. Single commends can have /* */ or even //
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
@aksm thanks for your review Any reason why we need to use
instead of Just checking my Drupal Coding standards knowledge. Was not aware that is a requirement? Please let me know |
Co-authored-by: Albert Min <albert.min@gmail.com>
@DiegoPino Sorry, I misunderstood the docs. I thought all of them had to start that way, but I see now that it's just docblocks. |
@aksm no worries. DCS are a lot to digest. Please check if I addressed all your concerns. If so we can move forward and approve and keep walking into other issues. Before merging I need to compare performance with @karomabiles and @alliomeria. I felt a small harvest last night went a bit slow. I can't see why/where it could have been slowed down so it might be just my internet, but still better to compare. Thanks a lot for reviewing! |
@DiegoPino Looks good to me. Was only scanning for (my limited understanding of) DCS and anything obvious that might jump out at me. Thanks! |
Won't go the EAD or the Transform model here, those are pure Metadata. better ways of getting EAD into Archipelago
@aksm if it looks good you need to approve it! |
There might be some Drupal 10.0 deprecations in some parts of AMI. Will deal with those in another pull. In the meantime merging this and we go from there. Thanks |
What?
See #107.
@karomabiles and @alliomeria even if this is safer, I feel it just became a bit slower so will need you both to check with me during our call the same collection being harvested via your current systems v/s with this patch. I can do a pre-test just to be sure of course and make you loose less time