[BUG] IslandoraUtils.php#getReferencingMedia very slow in certain circumstances #1055
Comments
Based on the query there, it looks like you're using the That said, in our instance, instead of relating every media type to the group, we implemented and use https://github.com/discoverygarden/islandora_hierarchical_access to allow the nodes' relation to the group to affect their related media and files.
I don't think that the group module is the culprit here. When I was looking at this, I tested the query above with all the group stuff removed from it and the performance was still quite bad.
This was discussed on the October 9th, 2024 tech call. Looks like that You mentioned regenerating service files; how is this being triggered? At scale, as you've noted, rewriting and optimization become a bigger issue. It may be easier to be more explicit about how that media is resolved for the triggering functions as opposed to relying on
I ran this through my IDE, and it suggested some changes to fix the performance of IslandoraUtils.php: https://github.com/Islandora/islandora/blob/2.x/src/IslandoraUtils.php
This is what we did:
Strictly speaking this isn't so much a bug as a performance issue, and one that may only occur in certain circumstances. That being the case I'm not sure how interested you'll be in it, but we thought it'd be useful to note.
We were regenerating service file derivatives and it was going quite slowly. Eventually we tracked this down to the query generated by the IslandoraUtils.php#getReferencingMedia function being very slow.
For context, our repository has approximately 500K nodes and 2M media. We also use the group module for certain permissions. We're basically running ISLE (~3.2 or so) on a well-resourced machine: 16 CPUs and 48 GB of memory.
The function generates queries like the one I've included at the end of the ticket.
This query takes about 60-90s to run in our setup. And of course, when doing this mass regeneration of derivatives, one like it runs for every single file in a post-save hook.
We're surprised that the MariaDB query optimizer doesn't handle this a bit better, since everything that seems to need a database index here is indexed, but for us it doesn't.
We have worked around this by rewriting the method so that, instead of ORing together all of the $conditions coming from the referencing fields in a single query, we make a separate query for each condition and combine the results. This all happens in a second or two, compared to the original 60s+.
I suspect that the number of media we have makes the joins in the single query get out of hand, and that smaller queries are more manageable (and perhaps better handled by the query optimizer) in our case.
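For anyone wanting to see the shape of the workaround outside Drupal, here is a minimal sketch in Python with SQLite. It is not the Islandora code: the table names, the three-field schema and both function names are made up for illustration, and a toy in-memory database says nothing about MariaDB timings. It only shows that one query ORing conditions across LEFT JOINed field tables, and one small query per field merged afterwards, return the same media IDs.

```python
import sqlite3

# Hypothetical per-field tables, loosely modelled on Drupal's
# one-table-per-field storage. The names are assumptions for this sketch.
FIELD_TABLES = [
    "media__field_media_file",
    "media__field_media_image",
    "media__field_media_document",
]


def build_db():
    """Create an in-memory database with a media table and field tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE media (mid INTEGER PRIMARY KEY)")
    for table in FIELD_TABLES:
        conn.execute(
            f"CREATE TABLE {table} (entity_id INTEGER, target_id INTEGER)"
        )
        conn.execute(f"CREATE INDEX {table}_target ON {table} (target_id)")
    return conn


def referencing_media_single_query(conn, fid):
    """Original shape: join every field table and OR the conditions."""
    joins = " ".join(
        f"LEFT JOIN {t} ON {t}.entity_id = m.mid" for t in FIELD_TABLES
    )
    where = " OR ".join(f"{t}.target_id = ?" for t in FIELD_TABLES)
    sql = f"SELECT DISTINCT m.mid FROM media m {joins} WHERE {where}"
    rows = conn.execute(sql, [fid] * len(FIELD_TABLES))
    return sorted(row[0] for row in rows)


def referencing_media_per_field(conn, fid):
    """Workaround shape: one small query per field, results merged in code."""
    mids = set()
    for t in FIELD_TABLES:
        sql = (
            f"SELECT m.mid FROM media m "
            f"INNER JOIN {t} ON {t}.entity_id = m.mid "
            f"WHERE {t}.target_id = ?"
        )
        mids.update(row[0] for row in conn.execute(sql, (fid,)))
    return sorted(mids)


# Toy data: file 10 is referenced by media 1 and 3, file 11 by media 2.
conn = build_db()
conn.executemany("INSERT INTO media (mid) VALUES (?)", [(i,) for i in range(1, 6)])
conn.execute("INSERT INTO media__field_media_file VALUES (1, 10)")
conn.execute("INSERT INTO media__field_media_image VALUES (3, 10)")
conn.execute("INSERT INTO media__field_media_document VALUES (2, 11)")

print(referencing_media_single_query(conn, 10))
print(referencing_media_per_field(conn, 10))
```

Both calls print [1, 3] for file 10. The per-field version trades one large multi-join query for several narrow ones, each of which can use the index on its own target_id column.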
Query