PB-962: Improve performance to list items #467

Merged: 1 commit, Oct 29, 2024

@benschs (Contributor) commented Oct 29, 2024

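The endpoint that lists items is sometimes slow. The cause is the SQL query: the PostgreSQL planner does not optimize it well, even though the indexes appear to be set up as expected. In the original plan, PostgreSQL walks item_name_idx in name order over the item table and discards 1,369,040 rows in the join filter before it reaches the first item of the requested collection. Filtering on collection_id with a subquery instead lets the planner resolve the collection id once (InitPlan 1) and use the unique index on (collection_id, name), which returns the collection's items already sorted by name. This greatly improves performance: in the measurement below, execution time drops from about 1.24 s to about 0.1 ms.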
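
A minimal sketch of why the rewrite is safe to make: on a toy schema in an in-memory SQLite database, the original JOIN-filter form and the scalar-subquery form return the same rows. SQLite is an assumption chosen only so the example is self-contained; its planner is not PostgreSQL's, so this does not reproduce the speed-up. Table and column names mirror the PR but are trimmed to a few columns, and the item rows, timestamps and the extra collection `ch.example.a` are made up for illustration. The equivalence relies on `stac_api_collection.name` identifying at most one collection (assumed unique here); in PostgreSQL a scalar subquery that returns more than one row raises an error.

```python
import sqlite3

# Toy schema mirroring the PR's tables (hypothetical, trimmed to a few columns).
SCHEMA = """
CREATE TABLE stac_api_collection (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- assumed unique, see lead-in
);
CREATE TABLE stac_api_item (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    collection_id      INTEGER NOT NULL REFERENCES stac_api_collection(id),
    properties_expires TEXT,           -- ISO-8601 string; NULL means "never expires"
    UNIQUE (collection_id, name)       -- mirrors stac_api_item_collection_id_name_78fbc154_uniq
);
CREATE INDEX item_name_idx ON stac_api_item(name);
"""

# Before: filter on the joined collection's name.
JOIN_QUERY = """
SELECT i.name
FROM stac_api_item i
JOIN stac_api_collection c ON i.collection_id = c.id
WHERE (i.properties_expires >= :now OR i.properties_expires IS NULL)
  AND c.name = :collection
ORDER BY i.name ASC
LIMIT :limit
"""

# After: resolve the collection id once with a scalar subquery.
SUBQUERY_QUERY = """
SELECT i.name
FROM stac_api_item i
JOIN stac_api_collection c ON i.collection_id = c.id
WHERE (i.properties_expires >= :now OR i.properties_expires IS NULL)
  AND i.collection_id = (
      SELECT u0.id FROM stac_api_collection u0 WHERE u0.name = :collection
  )
ORDER BY i.name ASC
LIMIT :limit
"""


def build_db() -> sqlite3.Connection:
    """Create the toy database with two collections and a few items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO stac_api_collection (id, name) VALUES (?, ?)",
        [(1, "ch.example.a"), (2, "ch.swisstopo.swisstlm3d")],
    )
    conn.executemany(
        "INSERT INTO stac_api_item (name, collection_id, properties_expires) VALUES (?, ?, ?)",
        [
            ("a-001", 1, None),
            ("b-001", 2, None),                           # never expires
            ("b-002", 2, "2000-01-01T00:00:00+00:00"),    # already expired
            ("b-003", 2, "2099-01-01T00:00:00+00:00"),    # expires in the future
        ],
    )
    return conn


def item_names(conn: sqlite3.Connection, sql: str, collection: str, now: str, limit: int) -> list[str]:
    """Run one of the query shapes and return the item names in order."""
    rows = conn.execute(sql, {"now": now, "collection": collection, "limit": limit})
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_db()
    now = "2024-10-29T07:44:53+00:00"
    for sql in (JOIN_QUERY, SUBQUERY_QUERY):
        print(item_names(conn, sql, "ch.swisstopo.swisstlm3d", now, limit=10))
```

Both shapes also return no rows for an unknown collection name: the subquery then yields NULL, and `collection_id = NULL` filters every row out, just as the join finds no matching collection.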

The following EXPLAIN ANALYZE queries were run on INT with PostgreSQL 16.4 against the collection ch.swisstopo.swisstlm3d.

EXPLAIN ANALYZE before change:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
    "stac_api_item"."id",
    "stac_api_item"."name",
    "stac_api_item"."collection_id",
    "stac_api_item"."geometry" :: bytea,
    "stac_api_item"."created",
    "stac_api_item"."updated",
    "stac_api_item"."properties_datetime",
    "stac_api_item"."properties_start_datetime",
    "stac_api_item"."properties_end_datetime",
    "stac_api_item"."properties_expires",
    "stac_api_item"."properties_title",
    "stac_api_item"."etag",
    "stac_api_item"."update_interval",
    "stac_api_collection"."id",
    "stac_api_collection"."published",
    "stac_api_collection"."name",
    "stac_api_collection"."created",
    "stac_api_collection"."updated",
    "stac_api_collection"."description",
    "stac_api_collection"."extent_out_of_sync",
    "stac_api_collection"."extent_geometry" :: bytea,
    "stac_api_collection"."extent_start_datetime",
    "stac_api_collection"."extent_end_datetime",
    "stac_api_collection"."license",
    "stac_api_collection"."summaries",
    "stac_api_collection"."summaries_proj_epsg",
    "stac_api_collection"."summaries_eo_gsd",
    "stac_api_collection"."summaries_geoadmin_variant",
    "stac_api_collection"."summaries_geoadmin_lang",
    "stac_api_collection"."title",
    "stac_api_collection"."etag",
    "stac_api_collection"."update_interval",
    "stac_api_collection"."allow_external_assets",
    "stac_api_collection"."external_asset_whitelist"
FROM
    "stac_api_item"
    INNER JOIN "stac_api_collection" ON (
        "stac_api_item"."collection_id" = "stac_api_collection"."id"
    )
WHERE
    (
        (
            "stac_api_item"."properties_expires" >= '2024-10-29 07:44:53.685705+00:00'
            OR "stac_api_item"."properties_expires" IS NULL
        )
        AND "stac_api_collection"."name" = 'ch.swisstopo.swisstlm3d'
    )
ORDER BY
    "stac_api_item"."name" ASC
LIMIT
    1;

 Limit  (cost=0.71..444.50 rows=1 width=640) (actual time=1244.442..1244.446 rows=1 loops=1)
   Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, ((stac_api_item.geometry)::bytea), stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval, stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, ((stac_api_collection.extent_geometry)::bytea), stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
   Buffers: shared hit=1363491
   ->  Nested Loop  (cost=0.71..280031.86 rows=631 width=640) (actual time=1244.441..1244.442 rows=1 loops=1)
         Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, (stac_api_item.geometry)::bytea, stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval, stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, (stac_api_collection.extent_geometry)::bytea, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
         Inner Unique: true
         Join Filter: (stac_api_item.collection_id = stac_api_collection.id)
         Rows Removed by Join Filter: 1369040
         Buffers: shared hit=1363491
         ->  Index Scan using item_name_idx on public.stac_api_item  (cost=0.43..259561.00 rows=1343137 width=281) (actual time=0.009..863.702 rows=1369041 loops=1)
               Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, stac_api_item.geometry, stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval
               Filter: ((stac_api_item.properties_expires >= '2024-10-29 07:44:53.685705+00'::timestamp with time zone) OR (stac_api_item.properties_expires IS NULL))
               Buffers: shared hit=1363488
         ->  Materialize  (cost=0.28..8.30 rows=1 width=534) (actual time=0.000..0.000 rows=1 loops=1369041)
               Output: stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, stac_api_collection.extent_geometry, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
               Buffers: shared hit=3
               ->  Index Scan using collection_name_idx on public.stac_api_collection  (cost=0.28..8.30 rows=1 width=534) (actual time=0.014..0.014 rows=1 loops=1)
                     Output: stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, stac_api_collection.extent_geometry, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
                     Index Cond: ((stac_api_collection.name)::text = 'ch.swisstopo.swisstlm3d'::text)
                     Buffers: shared hit=3
 Query Identifier: -4398731555030374451
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.322 ms
 Execution Time: 1244.502 ms
(25 rows)

EXPLAIN ANALYZE after change:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
    "stac_api_item"."id",
    "stac_api_item"."name",
    "stac_api_item"."collection_id",
    "stac_api_item"."geometry" :: bytea,
    "stac_api_item"."created",
    "stac_api_item"."updated",
    "stac_api_item"."properties_datetime",
    "stac_api_item"."properties_start_datetime",
    "stac_api_item"."properties_end_datetime",
    "stac_api_item"."properties_expires",
    "stac_api_item"."properties_title",
    "stac_api_item"."etag",
    "stac_api_item"."update_interval",
    "stac_api_collection"."id",
    "stac_api_collection"."published",
    "stac_api_collection"."name",
    "stac_api_collection"."created",
    "stac_api_collection"."updated",
    "stac_api_collection"."description",
    "stac_api_collection"."extent_out_of_sync",
    "stac_api_collection"."extent_geometry" :: bytea,
    "stac_api_collection"."extent_start_datetime",
    "stac_api_collection"."extent_end_datetime",
    "stac_api_collection"."license",
    "stac_api_collection"."summaries",
    "stac_api_collection"."summaries_proj_epsg",
    "stac_api_collection"."summaries_eo_gsd",
    "stac_api_collection"."summaries_geoadmin_variant",
    "stac_api_collection"."summaries_geoadmin_lang",
    "stac_api_collection"."title",
    "stac_api_collection"."etag",
    "stac_api_collection"."update_interval",
    "stac_api_collection"."allow_external_assets",
    "stac_api_collection"."external_asset_whitelist"
FROM
    "stac_api_item"
    INNER JOIN "stac_api_collection" ON (
        "stac_api_item"."collection_id" = "stac_api_collection"."id"
    )
WHERE
    (
        (
            "stac_api_item"."properties_expires" >= '2024-10-29 07:53:51.568155+00:00'
            OR "stac_api_item"."properties_expires" IS NULL
        )
        AND "stac_api_item"."collection_id" = (
            SELECT
                U0."id"
            FROM
                "stac_api_collection" U0
            WHERE
                U0."name" = 'ch.swisstopo.swisstlm3d'
        )
    )
ORDER BY
    "stac_api_item"."name" ASC
LIMIT
    1;

 Limit  (cost=9.00..12.70 rows=1 width=640) (actual time=0.050..0.051 rows=1 loops=1)
   Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, ((stac_api_item.geometry)::bytea), stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval, stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, ((stac_api_collection.extent_geometry)::bytea), stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
   Buffers: shared hit=10
   InitPlan 1 (returns $0)
     ->  Index Scan using collection_name_idx on public.stac_api_collection u0  (cost=0.28..8.30 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1)
           Output: u0.id
           Index Cond: ((u0.name)::text = 'ch.swisstopo.swisstlm3d'::text)
           Buffers: shared hit=3
   ->  Nested Loop  (cost=0.71..58389.98 rows=15802 width=640) (actual time=0.049..0.049 rows=1 loops=1)
         Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, (stac_api_item.geometry)::bytea, stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval, stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, (stac_api_collection.extent_geometry)::bytea, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
         Buffers: shared hit=10
         ->  Index Scan using stac_api_item_collection_id_name_78fbc154_uniq on public.stac_api_item  (cost=0.43..50283.16 rows=15802 width=281) (actual time=0.032..0.032 rows=1 loops=1)
               Output: stac_api_item.id, stac_api_item.name, stac_api_item.collection_id, stac_api_item.geometry, stac_api_item.created, stac_api_item.updated, stac_api_item.properties_datetime, stac_api_item.properties_start_datetime, stac_api_item.properties_end_datetime, stac_api_item.properties_expires, stac_api_item.properties_title, stac_api_item.etag, stac_api_item.update_interval
               Index Cond: (stac_api_item.collection_id = $0)
               Filter: ((stac_api_item.properties_expires >= '2024-10-29 07:53:51.568155+00'::timestamp with time zone) OR (stac_api_item.properties_expires IS NULL))
               Buffers: shared hit=7
         ->  Materialize  (cost=0.28..8.30 rows=1 width=534) (actual time=0.008..0.008 rows=1 loops=1)
               Output: stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, stac_api_collection.extent_geometry, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
               Buffers: shared hit=3
               ->  Index Scan using stac_api_collection_pkey on public.stac_api_collection  (cost=0.28..8.30 rows=1 width=534) (actual time=0.005..0.005 rows=1 loops=1)
                     Output: stac_api_collection.id, stac_api_collection.published, stac_api_collection.name, stac_api_collection.created, stac_api_collection.updated, stac_api_collection.description, stac_api_collection.extent_out_of_sync, stac_api_collection.extent_geometry, stac_api_collection.extent_start_datetime, stac_api_collection.extent_end_datetime, stac_api_collection.license, stac_api_collection.summaries, stac_api_collection.summaries_proj_epsg, stac_api_collection.summaries_eo_gsd, stac_api_collection.summaries_geoadmin_variant, stac_api_collection.summaries_geoadmin_lang, stac_api_collection.title, stac_api_collection.etag, stac_api_collection.update_interval, stac_api_collection.allow_external_assets, stac_api_collection.external_asset_whitelist
                     Index Cond: (stac_api_collection.id = $0)
                     Buffers: shared hit=3
 Query Identifier: -4337872972699751832
 Planning Time: 0.277 ms
 Execution Time: 0.099 ms
(26 rows)
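
The headline numbers of the two plans can be compared with a small script. This is a hypothetical helper (`plan_stats` and `compare` are not part of the project) that reads PostgreSQL's text EXPLAIN (ANALYZE, BUFFERS) output; it assumes the format shown above and counts shared buffer hits only. A single EXPLAIN ANALYZE run is one sample, so timings vary between runs and with cache state.

```python
import re
from dataclasses import dataclass


@dataclass
class PlanStats:
    execution_ms: float
    shared_hit: int


def plan_stats(plan_text: str) -> PlanStats:
    """Extract headline numbers from EXPLAIN (ANALYZE, BUFFERS) text output."""
    exec_ms = re.search(r"Execution Time: ([\d.]+) ms", plan_text)
    # The first "Buffers:" line belongs to the top plan node (the whole query);
    # later ones belong to child nodes or to planning. Only cache hits are
    # counted here; "read=" / "dirtied=" counters are ignored.
    hits = re.search(r"Buffers: shared hit=(\d+)", plan_text)
    if exec_ms is None or hits is None:
        raise ValueError("expected EXPLAIN (ANALYZE, BUFFERS) output")
    return PlanStats(float(exec_ms.group(1)), int(hits.group(1)))


def compare(before: str, after: str) -> str:
    """Summarize time saved, speed-up factor and buffer hits of two plans."""
    b, a = plan_stats(before), plan_stats(after)
    saved = b.execution_ms - a.execution_ms
    speedup = b.execution_ms / a.execution_ms
    return (
        f"execution: {b.execution_ms:.3f} ms -> {a.execution_ms:.3f} ms "
        f"(saved {saved:.1f} ms, ~{speedup:,.0f}x); "
        f"shared buffer hits: {b.shared_hit:,} -> {a.shared_hit:,}"
    )


# Excerpts copied from the two plans above.
BEFORE = """
 Limit  (cost=0.71..444.50 rows=1 width=640) (actual time=1244.442..1244.446 rows=1 loops=1)
   Buffers: shared hit=1363491
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.322 ms
 Execution Time: 1244.502 ms
"""

AFTER = """
 Limit  (cost=9.00..12.70 rows=1 width=640) (actual time=0.050..0.051 rows=1 loops=1)
   Buffers: shared hit=10
 Planning Time: 0.277 ms
 Execution Time: 0.099 ms
"""

print(compare(BEFORE, AFTER))
```

On these excerpts it reports a saving of about 1244 ms and a drop from 1,363,491 to 10 shared buffer hits, which matches the row counts in the plans: the old plan touches every item in name order, the new one reads a handful of index pages.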

@benschs requested review from ltclm and boecklic on October 29, 2024 12:28

@ltclm (Contributor) left a comment:

thank you 👍

@benschs merged commit 113e776 into develop on Oct 29, 2024
3 checks passed
@benschs deleted the feat-PB-962-list-item-performance branch on October 29, 2024 15:17

@boecklic (Contributor) left a comment:

great finding, very nice! 👏
