Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

Merged
merged 24 commits into from
Aug 2, 2022

Conversation

DiegoPino
Copy link
Member

What?

See #107.

  • Adds adaptive Range/offset based on what was the last number of Top objects v/s Children Rations
  • Copies CMODEL mapping between Children/Top objects during setup
  • Validates UUID Mapping v/s Node to Node Mapping (to avoid those being mixed up)
  • Multiple cleanups, fixes Automatic UUID for Plugins that run via Batch (e.g Solr)

@karomabiles and @alliomeria even if this is safer, I feel it just became a bit slower so will need you both to check with me during our call the same collection being harvested via your current systems v/s with this patch. I can do a pre-test just to be sure of course and make you loose less time

DiegoPino added 10 commits July 27, 2022 20:49
I will attempt a refactor and i want to be able to go back!
Next step now is to report better the progress, since children rows are invisible for the user. Working bee here @alliomeria and @karomabiles
Until @alliomeria and @karomabiles have that conversation and we come up with a decision if Spreadsheets need to be split and many attached to the same SET or multiple sets or no sets at all. This can potentially bring 400K objects into a single Sheets.
We can no (yet) load a referenced entity here based on an AMI set because it would require to load the whole mapping of the AMI set yo deduce which field is being used for the node_uuid and also know if/the ADO actually exists. So we add a noticed
Small change. We won't longer use the $per_page to calculate the children max. We set the max to static increment and use the per_page only to count the top objects. this makes the 75% reducing factor when requesting all 500 and getting less top ones to only make the top objects less still allowing the children to increment more. In practice this keeps numbers low but does reduce less when there are many children making the fetch faster
At least this way @alliomeria and @karomabiles your decisions stick and are brought from child to top and back. Also changes a bit the messages we send during the batch fetching from Solr.
But we need to test this, i kinda feel now things are a bit slower than before. Maybe it is me?
But should be. Tomorrow i need to try with a collection and fetch less and then fetch with offsets, maybe whitneystudioclub:collection
@DiegoPino DiegoPino requested a review from aksm August 2, 2022 01:51
@DiegoPino DiegoPino self-assigned this Aug 2, 2022
@DiegoPino DiegoPino added Ingest Setup Knobs and Levers you move while thinking about feelings and metadata and CSV files AMI Import Plugins Gets data from somewhere, puts data into a strawberry basked labels Aug 2, 2022
@DiegoPino DiegoPino added this to the 0.4.0 milestone Aug 2, 2022
Another need to set $data with a default to please the getKey function
src/Controller/AmiRowAutocompleteHandler.php Outdated Show resolved Hide resolved
src/Controller/AmiRowAutocompleteHandler.php Outdated Show resolved Hide resolved
src/Form/AmiMultiStepIngest.php Outdated Show resolved Hide resolved
src/Form/AmiMultiStepIngest.php Outdated Show resolved Hide resolved
@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
$amisetdata->total_rows = $data['totalrows'];
}

// We should probably add the UUIDs here right now.
$uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid';
/* remove from the mappings any column used as file source */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this one as it is. Single commends can have /* */ or even //

src/Plugin/ImporterAdapter/SolrImporter.php Outdated Show resolved Hide resolved
src/Plugin/ImporterAdapter/SolrImporter.php Outdated Show resolved Hide resolved
src/Plugin/ImporterAdapter/SolrImporter.php Outdated Show resolved Hide resolved
src/Plugin/ImporterAdapter/SolrImporter.php Outdated Show resolved Hide resolved
src/Plugin/ImporterAdapter/SolrImporter.php Outdated Show resolved Hide resolved
DiegoPino and others added 4 commits August 2, 2022 09:39
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
This needs to be removed in the next release

Co-authored-by: Albert Min <albert.min@gmail.com>
@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
$amisetdata->total_rows = $data['totalrows'];
}

// We should probably add the UUIDs here right now.
$uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid';
/* remove from the mappings any column used as file source */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this one as it is. Single commends can have /* */ or even //

DiegoPino and others added 7 commits August 2, 2022 09:41
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
Co-authored-by: Albert Min <albert.min@gmail.com>
@DiegoPino
Copy link
Member Author

@aksm thanks for your review

Any reason why we need to use

/**comments */

instead of
/* comments */

Just checking my Drupal Coding standards knowledge. Was not aware that is a requirement? Please let me know

Co-authored-by: Albert Min <albert.min@gmail.com>
@aksm
Copy link
Contributor

aksm commented Aug 2, 2022

@aksm thanks for your review

Any reason why we need to use

/**comments */

instead of /* comments */

Just checking my Drupal Coding standards knowledge. Was not aware that is a requirement? Please let me know

@DiegoPino Sorry, I misunderstood the docs. I thought all of them had to start that way, but I see now that it's just docblocks.

@DiegoPino
Copy link
Member Author

@aksm no worries. DCS are a lot to digest. Please check if I addressed all your concerns. If so we can move forward and approve and keep walking into other issues. Before merging I need to compare performance with @karomabiles and @alliomeria. I felt a small harvest last night went a bit slow. I can't see why/where it could have been slowed down so it might be just my internet, but still better to compare. Thanks a lot for reviewing!

@aksm
Copy link
Contributor

aksm commented Aug 2, 2022

@DiegoPino Looks good to me. Was only scanning for (my limited understanding of) DCS and anything obvious that might jump out at me. Thanks!

Won't go the EAD or the Transform model here, those are pure Metadata. better ways of getting EAD into Archipelago
@DiegoPino
Copy link
Member Author

@aksm if it looks good you need to approve it!

@DiegoPino
Copy link
Member Author

There might be some Drupal 10.0 deprecations in some parts of AMI. Will deal with those in another pull. In the meantime merging this and we go from there. Thanks

@DiegoPino DiegoPino merged commit cfabf3b into 0.4.0 Aug 2, 2022
@DiegoPino DiegoPino deleted the ISSUE-107 branch August 2, 2022 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AMI Import Plugins Gets data from somewhere, puts data into a strawberry basked Ingest Setup Knobs and Levers you move while thinking about feelings and metadata and CSV files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants