ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

DiegoPino · 2022-08-02T01:50:56Z

What?

Adds adaptive Range/offset based on what was the last number of Top objects v/s Children Rations
Copies CMODEL mapping between Children/Top objects during setup
Validates UUID Mapping v/s Node to Node Mapping (to avoid those being mixed up)
Multiple cleanups, fixes Automatic UUID for Plugins that run via Batch (e.g Solr)

@karomabiles and @alliomeria even if this is safer, I feel it just became a bit slower so will need you both to check with me during our call the same collection being harvested via your current systems v/s with this patch. I can do a pre-test just to be sure of course and make you loose less time

I will attempt a refactor and i want to be able to go back!

@alliomeria

Next step now is to report better the progress, since children rows are invisible for the user. Working bee here @alliomeria and @karomabiles

@alliomeria

@alliomeria @karomabiles we need to test/ see if performance changed

@alliomeria

Until @alliomeria and @karomabiles have that conversation and we come up with a decision if Spreadsheets need to be split and many attached to the same SET or multiple sets or no sets at all. This can potentially bring 400K objects into a single Sheets.

We can no (yet) load a referenced entity here based on an AMI set because it would require to load the whole mapping of the AMI set yo deduce which field is being used for the node_uuid and also know if/the ADO actually exists. So we add a noticed

Small change. We won't longer use the $per_page to calculate the children max. We set the max to static increment and use the per_page only to count the top objects. this makes the 75% reducing factor when requesting all 500 and getting less top ones to only make the top objects less still allowing the children to increment more. In practice this keeps numbers low but does reduce less when there are many children making the fetch faster

@alliomeria

At least this way @alliomeria and @karomabiles your decisions stick and are brought from child to top and back. Also changes a bit the messages we send during the batch fetching from Solr. But we need to test this, i kinda feel now things are a bit slower than before. Maybe it is me?

But should be. Tomorrow i need to try with a collection and fetch less and then fetch with offsets, maybe whitneystudioclub:collection

Another need to set $data with a default to please the getKey function

src/Controller/AmiRowAutocompleteHandler.php

src/Form/AmiMultiStepIngest.php

DiegoPino · 2022-08-02T13:41:00Z

src/Form/AmiMultiStepIngest.php

@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
          $amisetdata->total_rows = $data['totalrows'];
        }

-        // We should probably add the UUIDs here right now.
-        $uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid';
+        /* remove from the mappings any column used as file source */


I will leave this one as it is. Single commends can have /* */ or even //

src/Plugin/ImporterAdapter/SolrImporter.php

Co-authored-by: Albert Min <albert.min@gmail.com>

This needs to be removed in the next release Co-authored-by: Albert Min <albert.min@gmail.com>

DiegoPino · 2022-08-02T13:41:00Z

src/Form/AmiMultiStepIngest.php

@@ -533,12 +576,18 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
          $amisetdata->total_rows = $data['totalrows'];
        }

-        // We should probably add the UUIDs here right now.
-        $uuid_key = isset($amisetdata->adomapping['uuid']['uuid']) && !empty($amisetdata->adomapping['uuid']['uuid']) ? $amisetdata->adomapping['uuid']['uuid'] : 'node_uuid';
+        /* remove from the mappings any column used as file source */


I will leave this one as it is. Single commends can have /* */ or even //

Co-authored-by: Albert Min <albert.min@gmail.com>

DiegoPino · 2022-08-02T13:45:19Z

@aksm thanks for your review

Any reason why we need to use

/**comments */

instead of
/* comments */

Just checking my Drupal Coding standards knowledge. Was not aware that is a requirement? Please let me know

Co-authored-by: Albert Min <albert.min@gmail.com>

aksm · 2022-08-02T14:04:33Z

@aksm thanks for your review

Any reason why we need to use

/**comments */

instead of /* comments */

Just checking my Drupal Coding standards knowledge. Was not aware that is a requirement? Please let me know

@DiegoPino Sorry, I misunderstood the docs. I thought all of them had to start that way, but I see now that it's just docblocks.

DiegoPino · 2022-08-02T14:13:55Z

@aksm no worries. DCS are a lot to digest. Please check if I addressed all your concerns. If so we can move forward and approve and keep walking into other issues. Before merging I need to compare performance with @karomabiles and @alliomeria. I felt a small harvest last night went a bit slow. I can't see why/where it could have been slowed down so it might be just my internet, but still better to compare. Thanks a lot for reviewing!

aksm · 2022-08-02T14:19:59Z

@DiegoPino Looks good to me. Was only scanning for (my limited understanding of) DCS and anything obvious that might jump out at me. Thanks!

Won't go the EAD or the Transform model here, those are pure Metadata. better ways of getting EAD into Archipelago

DiegoPino · 2022-08-02T15:03:35Z

@aksm if it looks good you need to approve it!

DiegoPino · 2022-08-02T15:46:59Z

There might be some Drupal 10.0 deprecations in some parts of AMI. Will deal with those in another pull. In the meantime merging this and we go from there. Thanks

DiegoPino added 10 commits July 27, 2022 20:49

Committing even if not ready yet

fb2ee43

I will attempt a refactor and i want to be able to go back!

haha! No more memory issues

b3ab7ce

Next step now is to report better the progress, since children rows are invisible for the user. Working bee here @alliomeria and @karomabiles

Give data nerds some hints of what is happening!

b7d589e

This upps the increments now that we now this won't fail on memory

381b4b1

@alliomeria @karomabiles we need to test/ see if performance changed

Disabled for now deep traverse

c019e44

Until @alliomeria and @karomabiles have that conversation and we come up with a decision if Spreadsheets need to be split and many attached to the same SET or multiple sets or no sets at all. This can potentially bring 400K objects into a single Sheets.

Validates Mapping to avoid Node to Node to end with UUID mappings

7fe8d1d

Not sure if this is correct

4adc794

But should be. Tomorrow i need to try with a collection and fetch less and then fetch with offsets, maybe whitneystudioclub:collection

DiegoPino requested a review from aksm August 2, 2022 01:51

DiegoPino self-assigned this Aug 2, 2022

DiegoPino added Ingest Setup Knobs and Levers you move while thinking about feelings and metadata and CSV files AMI Import Plugins Gets data from somewhere, puts data into a strawberry basked labels Aug 2, 2022

DiegoPino added this to the 0.4.0 milestone Aug 2, 2022

Update AmiMultiStepIngest.php

3baab4c

Another need to set $data with a default to please the getKey function

aksm suggested changes Aug 2, 2022

View reviewed changes

DiegoPino and others added 4 commits August 2, 2022 09:39

Update src/Controller/AmiRowAutocompleteHandler.php

caca48f

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Controller/AmiRowAutocompleteHandler.php

f145f27

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Form/AmiMultiStepIngest.php

df2df8f

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Form/AmiMultiStepIngest.php

f9c44f2

This needs to be removed in the next release Co-authored-by: Albert Min <albert.min@gmail.com>

DiegoPino commented Aug 2, 2022

View reviewed changes

DiegoPino and others added 7 commits August 2, 2022 09:41

Update src/Plugin/ImporterAdapter/SolrImporter.php

94f377a

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

2eff6a4

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

1113e6e

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

e09f675

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

5b99a68

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

2f7ca8b

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Form/AmiMultiStepIngest.php

78f0cbe

Co-authored-by: Albert Min <albert.min@gmail.com>

Update src/Plugin/ImporterAdapter/SolrImporter.php

447ff43

Co-authored-by: Albert Min <albert.min@gmail.com>

Add manuscriptCModel as possible Top Object

8a34799

Won't go the EAD or the Transform model here, those are pure Metadata. better ways of getting EAD into Archipelago

aksm approved these changes Aug 2, 2022

View reviewed changes

DiegoPino merged commit cfabf3b into 0.4.0 Aug 2, 2022

DiegoPino deleted the ISSUE-107 branch August 2, 2022 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

DiegoPino commented Aug 2, 2022

DiegoPino Aug 2, 2022

DiegoPino Aug 2, 2022

DiegoPino commented Aug 2, 2022

aksm commented Aug 2, 2022

DiegoPino commented Aug 2, 2022

aksm commented Aug 2, 2022

DiegoPino commented Aug 2, 2022

DiegoPino commented Aug 2, 2022

ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

ISSUE-107: fix out of memory when fetching Top objects with lots of children #112

Conversation

DiegoPino commented Aug 2, 2022

What?

DiegoPino Aug 2, 2022

Choose a reason for hiding this comment

DiegoPino Aug 2, 2022

Choose a reason for hiding this comment

DiegoPino commented Aug 2, 2022

aksm commented Aug 2, 2022

DiegoPino commented Aug 2, 2022

aksm commented Aug 2, 2022

DiegoPino commented Aug 2, 2022

DiegoPino commented Aug 2, 2022