Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heap reading optimization and fixes #366

Merged
merged 2 commits into from
Oct 29, 2024
Merged

Heap reading optimization and fixes #366

merged 2 commits into from
Oct 29, 2024

Conversation

mkaruza
Copy link
Collaborator

@mkaruza mkaruza commented Oct 29, 2024

PR contains two commits that are supposed to be committed separately.

First commit improves performances of reading heap tuples and writing them back to duckdb output vector by converting map lookups into vector structures that are read sequentially for each row.

Second commit addresses problem of creating BufferAccessStrategy for each page read and creates single object for complete duration of scan execution.

@mkaruza mkaruza requested review from Y-- and JelteF October 29, 2024 08:50
include/pgduckdb/scan/postgres_scan.hpp Show resolved Hide resolved
src/scan/postgres_scan.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
Currently we use map to match duckdb output column idx to postgres column
id. For column filter we would also do lookup for each column. This is
not needed as this information are not changed during scan of heap
tables.
Optimization is based on idea to construct vector that holds these mapping
information so map lookups are not needed anymmore. For filtering, same
logic is applied, now we would have vector of column filters.
Currently we are creating BufferAccessStrategy for each page fetch.
With help of @MMeent issue was identified and this commit fixes this
problem by creating BufferAccessStrategy for each scan instead.
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like good changes to me. I'm still planning to look at refactoring this code a bit more to make the different indexes and arrays easier to understand. But for that I need to try a few things and play around with the code. The current PR is already much easier to follow than before.

@JelteF
Copy link
Collaborator

JelteF commented Oct 29, 2024

Let's add PR numbers to the commit message titles though for easy reference if you're not doing a squash merge.

@Y-- Y-- merged commit d42b05c into main Oct 29, 2024
4 checks passed
@Y-- Y-- deleted the optimization branch October 29, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants