Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reduce memory consumption of scans #41272

Merged
merged 1 commit into from
Nov 28, 2023
Merged

fix: reduce memory consumption of scans #41272

merged 1 commit into from
Nov 28, 2023

Conversation

solracsf
Copy link
Member

@solracsf solracsf commented Nov 3, 2023

Summary

array_filter is pretty memory intensive, specially in the context of appdata_ folder where it can have thousands of folders. Construct the array in a loop instead.

The array_filter creates a new array and populates it with elements that meet the filter condition. This means it creates a new array in memory to hold the filtered elements, which can be memory-intensive, especially if the original array $children is large.

On the other hand, a foreach loop initializes an empty array $childFolders and then iterates through the $children array, only adding elements to $childFolders if they meet the filtering condition. This approach doesn't create an additional array for the filtered elements, so it consumes less memory.

Before

old

After

new

Test system

+---------+--------+--------------+
| Folders | Files  | Elapsed time |
+---------+--------+--------------+
| 217688  | 155452 | 00:09:46     |
+---------+--------+--------------+

Checklist

@solracsf solracsf added the 3. to review Waiting for reviews label Nov 3, 2023
@solracsf solracsf added this to the Nextcloud 28 milestone Nov 3, 2023
@solracsf solracsf added the php Pull requests that update Php code label Nov 3, 2023
Signed-off-by: Git'Fellow <12234510+solracsf@users.noreply.github.com>

Fix lint

Signed-off-by: Git'Fellow <12234510+solracsf@users.noreply.github.com>
@kesselb
Copy link
Contributor

kesselb commented Nov 3, 2023

To benchmark it myself, I need to upload a good number of images, pre generate all previews, delete the previews again and run occ files:scan-app-data?

@come-nc
Copy link
Contributor

come-nc commented Nov 6, 2023

Summary

array_filter is pretty memory intensive, specially in the context of appdata_ folder where it can have thousands of folders. Construct the array in a loop instead.

The array_filter creates a new array and populates it with elements that meet the filter condition. This means it creates a new array in memory to hold the filtered elements, which can be memory-intensive, especially if the original array $children is large.

On the other hand, a foreach loop initializes an empty array $childFolders and then iterates through the $children array, only adding elements to $childFolders if they meet the filtering condition. This approach doesn't create an additional array for the filtered elements, so it consumes less memory.

I do not understand the difference, you wrote twice the same thing from what I understand.

If array_filter is worse than a loop doing the same thing an issue should be opened with the PHP project.
Did you test on several PHP version for a difference?
Also, maybe it uses more memory because it prioritizes speed?

@come-nc
Copy link
Contributor

come-nc commented Nov 6, 2023

https://externals.io/message/100452
Interesting read, 6 years old so not sure if all still apply but it seems a callback function is always expensive speedwise.

@solracsf
Copy link
Member Author

solracsf commented Nov 6, 2023

On my tests, on different php projects, the foreach loop was always less memory intensive (i'm not analysing speed/performance here) once the array is big (>hundreds). In this specific case, I've tested with v8.1 only.

This was referenced Nov 6, 2023
This was referenced Nov 14, 2023
@blizzz blizzz modified the milestones: Nextcloud 28, Nextcloud 29 Nov 23, 2023
@solracsf solracsf merged commit 394cd75 into master Nov 28, 2023
50 checks passed
@solracsf solracsf deleted the fixOccScans branch November 28, 2023 10:37
@solracsf
Copy link
Member Author

/backport to stable28

@solracsf
Copy link
Member Author

/backport to stable27

@solracsf
Copy link
Member Author

/backport to stable26

@blizzz
Copy link
Member

blizzz commented Nov 28, 2023

foreach() has a lower memory footprint the most of the array_* functions, so when you can expect to get a few more entries in your arrays, use foreach.

@solracsf
Copy link
Member Author

/backport to stable28

@solracsf
Copy link
Member Author

/skjnldsv-backport to stable28

@solracsf
Copy link
Member Author

/skjnldsv-backport to stable27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews backport-request php Pull requests that update Php code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: occ files:scan-app-data consumes too much memory
5 participants