Heap reading optimization and fixes #366

mkaruza · 2024-10-29T08:50:08Z

This PR contains two commits that are meant to be committed separately.

The first commit improves the performance of reading heap tuples and writing them back to the duckdb output vector by converting map lookups into vector structures that are read sequentially for each row.

The second commit addresses the problem of creating a BufferAccessStrategy for each page read; it now creates a single object that lives for the complete duration of scan execution.

include/pgduckdb/scan/postgres_scan.hpp

src/scan/postgres_scan.cpp

src/pgduckdb_types.cpp

Currently we use a map to match each duckdb output column idx to its postgres column id, and for column filters we also do a lookup for each column. This is not needed, because this information does not change during the scan of a heap table. The optimization constructs a vector that holds this mapping information, so map lookups are no longer needed. The same logic is applied to filtering: we now have a vector of column filters.
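A minimal sketch of the idea, not the code from this PR: the map is flattened into a vector once per scan, and each row then walks that vector sequentially. All names here (`PgAttNum`, `OutputIdx`, `ColumnMapping`, `BuildColumnMapping`, `WriteRow`) are hypothetical, and plain `int` values stand in for tuple attributes and the duckdb output vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; these are NOT the pg_duckdb types.
using PgAttNum = int;           // Postgres attribute number (1-based)
using OutputIdx = std::size_t;  // position in the output row

struct ColumnMapping {
    PgAttNum att_num;
    OutputIdx output_idx;
};

// Done once per scan: flatten the map into a vector ordered by attribute
// number, so rows can be processed without any map lookup.
std::vector<ColumnMapping>
BuildColumnMapping(const std::unordered_map<PgAttNum, OutputIdx> &by_attr) {
    std::vector<ColumnMapping> mapping;
    mapping.reserve(by_attr.size());
    for (const auto &entry : by_attr) {
        mapping.push_back({entry.first, entry.second});
    }
    std::sort(mapping.begin(), mapping.end(),
              [](const ColumnMapping &a, const ColumnMapping &b) {
                  return a.att_num < b.att_num;
              });
    return mapping;
}

// Done per row: walk the precomputed vector sequentially and copy each
// requested attribute into its output slot.
void WriteRow(const std::vector<int> &tuple_values,
              const std::vector<ColumnMapping> &mapping,
              std::vector<int> &output_row) {
    for (const ColumnMapping &col : mapping) {
        assert(col.att_num >= 1 &&
               static_cast<std::size_t>(col.att_num) <= tuple_values.size());
        assert(col.output_idx < output_row.size());
        output_row[col.output_idx] = tuple_values[col.att_num - 1];
    }
}
```

A vector of column filters could be built the same way, indexed in the same order as the mapping, so the per-row filter check also avoids a lookup.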

@MMeent

Currently we create a BufferAccessStrategy for each page fetch. The issue was identified with the help of @MMeent, and this commit fixes it by creating one BufferAccessStrategy per scan instead.
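The lifetime change can be sketched as follows. This is a self-contained illustration that does not use the Postgres headers: `FakeStrategy`, `ScanState`, `ReadPage`, `ScanRelation` and `StrategiesCreated` are hypothetical stand-ins. In real PostgreSQL code the strategy would typically come from `GetAccessStrategy` and be released with `FreeAccessStrategy`; whether the commit uses exactly these calls is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many strategy objects were created (for illustration only).
int &StrategiesCreated() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for a Postgres BufferAccessStrategy.
struct FakeStrategy {
    FakeStrategy() { ++StrategiesCreated(); }
};

// Per-scan state: the strategy is created once, when the scan starts,
// and is reused for every page read during that scan.
struct ScanState {
    FakeStrategy strategy;
    std::size_t pages_read = 0;
};

// Reads one page. Real code would pass scan.strategy to the buffer read
// call here instead of allocating a new strategy for each page.
void ReadPage(ScanState &scan, std::size_t block_no) {
    assert(block_no == scan.pages_read);  // this sketch reads pages in order
    ++scan.pages_read;
}

// Scans n_pages pages with a single strategy object.
std::size_t ScanRelation(std::size_t n_pages) {
    ScanState scan;  // one FakeStrategy for the whole scan
    for (std::size_t block_no = 0; block_no < n_pages; ++block_no) {
        ReadPage(scan, block_no);
    }
    return scan.pages_read;
}
```

Tying the strategy to the scan state means its cost is paid once per scan rather than once per page.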

JelteF

These look like good changes to me. I'm still planning to look at refactoring this code a bit more to make the different indexes and arrays easier to understand. But for that I need to try a few things and play around with the code. The current PR is already much easier to follow than before.

JelteF · 2024-10-29T12:37:49Z

Let's add PR numbers to the commit message titles though for easy reference if you're not doing a squash merge.

mkaruza requested review from Y-- and JelteF October 29, 2024 08:50

Y-- reviewed Oct 29, 2024

include/pgduckdb/scan/postgres_scan.hpp (resolved)

src/scan/postgres_scan.cpp (outdated, resolved)

src/pgduckdb_types.cpp (outdated, resolved)

src/pgduckdb_types.cpp (outdated, resolved)

mkaruza added 2 commits October 29, 2024 10:35

Don't create BufferAccessStrategy for each tuple fetch

cf96745

mkaruza force-pushed the optimization branch from aefc6e7 to cf96745 Compare October 29, 2024 09:35

JelteF approved these changes Oct 29, 2024

Y-- approved these changes Oct 29, 2024

Y-- merged commit d42b05c into main Oct 29, 2024
4 checks passed

Y-- deleted the optimization branch October 29, 2024 13:44

JelteF mentioned this pull request Nov 27, 2024

Make InsertTupleIntoChunk more efficient #416

Closed
