Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] IslandoraUtils.php#getReferencingMedia very slow in certain circumstances #1055

Open
hading opened this issue Sep 25, 2024 · 5 comments

Comments

@hading
Copy link

hading commented Sep 25, 2024

Strictly speaking this isn't so much a bug as a performance issue, and one that may only occur in certain circumstances. That being the case I'm not sure how interested you'll be in it, but we thought it'd be useful to note.

We were regenerating service file derivatives and it was going quite slow. Eventually we tracked this down to the query being generated by the IslandoraUtils.php#getReferencingMedia function being very slow.

For context, our repository has approximately 500K nodes and 2M media. We also use the group module for certain permissions. We're basically running ISLE (~3.2 or so) on a plenty resourceful machine - 16 CPUs and 48G memory.

The function generates queries like the one I've included at the end of the ticket.

This query takes about 60-90s to run in our set up. And of course when doing this mass regeneration of derivatives one like it runs for every single file in a post save hook.

While we're surprised that the mariadb query optimizer doesn't handle this a bit better (and everything that seems to need a database index here is indexed), it doesn't for us.

We have worked around this by rewriting the method so that instead of ORing together all of the $conditions coming from the referencing fields in a single query we make a separate query for each condition and combine those results. This all happens in a second or two, compared to the original 60s+.

I suspect that the number of media we have makes things get out of hand with all the joins and is more manageable (and maybe the query optimizer is better) with smaller queries in our case.

Query
SELECT base_table.vid AS vid, base_table.mid AS mid
FROM
    media base_table
        LEFT JOIN media__field_captions media__field_captions ON media__field_captions.entity_id = base_table.mid
        LEFT JOIN media__field_fits_file media__field_fits_file ON media__field_fits_file.entity_id = base_table.mid
        LEFT JOIN media__field_media_audio_file media__field_media_audio_file ON media__field_media_audio_file.entity_id = base_table.mid
        LEFT JOIN media__field_media_document media__field_media_document ON media__field_media_document.entity_id = base_table.mid
        LEFT JOIN media__field_media_file media__field_media_file ON media__field_media_file.entity_id = base_table.mid
        LEFT JOIN media__field_media_image media__field_media_image ON media__field_media_image.entity_id = base_table.mid
        LEFT JOIN media__field_media_video_file media__field_media_video_file ON media__field_media_video_file.entity_id = base_table.mid
        LEFT JOIN media__field_service_file media__field_service_file ON media__field_service_file.entity_id = base_table.mid
        LEFT JOIN media__field_track media__field_track ON media__field_track.entity_id = base_table.mid
        LEFT OUTER JOIN group_relationship_field_data gcfd ON base_table.mid=gcfd.entity_id AND gcfd.plugin_id IN ('group_media:audio', 'group_media:document', 'group_media:extracted_text', 'group_media:file', 'group_media:fits_technical_metadata', 'group_media:image', 'group_media:video')
        INNER JOIN media_field_data media_field_data ON base_table.mid=media_field_data.mid
        LEFT OUTER JOIN group_relationship_field_data gcfd_2 ON gcfd.gid=gcfd_2.gid AND gcfd_2.plugin_id='group_membership' AND gcfd_2.entity_id='5'
WHERE ((media__field_captions.field_captions_target_id = '981051') or (media__field_fits_file.field_fits_file_target_id = '981051') or (media__field_media_audio_file.field_media_audio_file_target_id = '981051') or (media__field_media_document.field_media_document_target_id = '981051') or (media__field_media_file.field_media_file_target_id = '981051') or (media__field_media_image.field_media_image_target_id = '981051') or (media__field_media_video_file.field_media_video_file_target_id = '981051') or (media__field_service_file.field_service_file_target_id = '981051') or (media__field_track.field_track_target_id = '981051')) AND ((gcfd.entity_id IS NULL) OR ((media_field_data.status = '0') AND (((gcfd.type IN ('bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_9b20217818d0f', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_5f103496bc6d6', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_bbda3c0506a8e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_7f6ed65c8beec', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_79f4033ae3129', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_456c3eefedecd', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_071e750e136ed', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NULL)) OR ((gcfd.type IN ('group_content_type_9b20217818d0f', 'bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'group_content_type_5f103496bc6d6', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_bbda3c0506a8e', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'group_content_type_7f6ed65c8beec', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_79f4033ae3129', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'group_content_type_456c3eefedecd', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'group_content_type_071e750e136ed', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NOT NULL)))) OR ((media_field_data.status = '1') AND (((gcfd.type IN ('bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_9b20217818d0f', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_5f103496bc6d6', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_bbda3c0506a8e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_7f6ed65c8beec', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_79f4033ae3129', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_456c3eefedecd', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_071e750e136ed', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NULL)) OR ((gcfd.type IN ('group_content_type_9b20217818d0f', 'bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'group_content_type_5f103496bc6d6', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_bbda3c0506a8e', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'group_content_type_7f6ed65c8beec', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_79f4033ae3129', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'group_content_type_456c3eefedecd', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'group_content_type_071e750e136ed', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NOT NULL)))));
@adam-vessey
Copy link

Based on the query there, it looks like you're using the group module. There is at least one big performance issue there that I know of that can pop up in larger sites: https://www.drupal.org/project/group/issues/3421756

That said, in our instance, instead of relating every media type to the group, we instead implemented/use https://github.com/discoverygarden/islandora_hierarchical_access to allow the nodes' relation to the group to affect their related media and files.

@hading
Copy link
Author

hading commented Oct 2, 2024

I don't think that the group module is the culprit here. When I was looking at this I tested the query above with all the group stuff removed from it and the performance was still quite bad.

@jordandukart
Copy link
Member

jordandukart commented Oct 9, 2024

Was discussed on the October 9th, 2024 tech call.

Looks like that utils function is being inefficient and generic with how it's attempting to take a fid and resolve any media references.

You mentioned regenerating service files, how is this being triggered? At scale, as you've noted, re-writing and optimization becomes a bigger issue. May be easier to be more explicit about how that media is resolved for the triggering functions as opposed to relying on IslandoraUtils.

@DonRichards
Copy link
Member

DonRichards commented Jan 22, 2025

Running this through my IDE and it suggested some changes to fix the performance for IslandoraUtils.php https://github.com/Islandora/islandora/blob/2.x/src/IslandoraUtils.php
None of this was tested. Just giving ideas on approaching the issue.

/**
 * Get array of media ids that have fields that reference $node and $term.
 *
 * @param \Drupal\node\NodeInterface $node
 *   The node to reference.
 * @param \Drupal\taxonomy\TermInterface $term
 *   The term to reference.
 *
 * @return array|null
 *   Array of media IDs or NULL.
 */
public function getMediaReferencingNodeAndTerm(NodeInterface $node, TermInterface $term) {
    // Early returns for missing references
    $term_fields = $this->getReferencingFields('media', 'taxonomy_term');
    if (empty($term_fields)) {
        \Drupal::logger('islandora')->warning('No media fields reference a taxonomy term');
        return NULL;
    }

    $node_fields = $this->getReferencingFields('media', 'node');
    if (empty($node_fields)) {
        \Drupal::logger('islandora')->warning('No media fields reference a node');
        return NULL;
    }

    // Clean up field names by removing entity prefix
    $term_fields = array_map(function($field) {
        return substr($field, strpos($field, '.') + 1);
    }, $term_fields);
    
    $node_fields = array_map(function($field) {
        return substr($field, strpos($field, '.') + 1);
    }, $node_fields);

    try {
        $query = $this->entityTypeManager->getStorage('media')->getQuery();
        $query->accessCheck(TRUE);

        // Add term condition group
        $term_group = $query->orConditionGroup();
        foreach ($term_fields as $field) {
            $term_group->condition($field, $term->id());
        }
        $query->condition($term_group);

        // Add node condition group
        $node_group = $query->orConditionGroup();
        foreach ($node_fields as $field) {
            $node_group->condition($field, $node->id());
        }
        $query->condition($node_group);

        // Execute query
        $media_ids = $query->execute();
        return !empty($media_ids) ? $media_ids : NULL;

    }
    catch (QueryException $e) {
        \Drupal::logger('islandora')->error('Error executing media query: @message', [
            '@message' => $e->getMessage()
        ]);
        return NULL;
    }
}

/**
 * Get the fields on an entity of $entity_type that reference a $target_type.
 *
 * @param string $entity_type
 *   Type of entity to search for.
 * @param string $target_type
 *   Type of entity the field references.
 *
 * @return array
 *   Array of fields.
 */
public function getReferencingFields($entity_type, $target_type) {
    $fields = $this->entityTypeManager->getStorage('field_storage_config')
        ->getQuery()
        ->condition('entity_type', $entity_type)
        ->condition('settings.target_type', $target_type)
        ->execute();
        
    return is_array($fields) ? $fields : [$fields];
}

@laurinpenland
Copy link

this is what we did:
//trico reimplementation
public function getReferencingMedia($fid) {
// Get media fields that reference files.
$fields = $this->getReferencingFields('media', 'file');

    // Process field names, stripping off 'media.' and appending 'target_id'.
    $conditions = array_map(
        function ($field) {
            return ltrim($field, 'media.') . '.target_id';
        },
        $fields
    );

    // Query for media that reference this file.
    $all_media = [];
    foreach ($conditions as $condition) {
        $query = $this->entityTypeManager->getStorage('media')->getQuery();
        $query->accessCheck(TRUE);
        $group = $query->orConditionGroup();
        $group->condition($condition, $fid);
        $query->condition($group);
        $media = $this->entityTypeManager->getStorage('media')
                                         ->loadMultiple($query->execute());
        foreach ($media as $m) {
            $all_media[] = $m;
        }
    }

    return $all_media;
    
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants