-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
services/horizon/internal/db2/history: Insert and query rows from history lookup tables with one query #5415
Conversation
2ffc951
to
877cfb4
Compare
@urvisavla can you take another look? I believe I've addressed all the feedback |
Actually it turns out there's a possible race condition in the get or create query where the query can return 0 rows if there are 2 transactions attempting to insert the same data concurrently. See https://stackoverflow.com/a/15950324 for more details. I will need to update this PR |
@urvisavla now it's ready for review again. I fixed the race condition but unfortunately we cannot do both insertion and querying within a single sql statement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're essentially removing the initial select, updating the insert to return the inserted rows, and then adding a select to fetch only the remaining rows, correct?
Also, I'm assuming there's no need to check the performance of the change on staging since we're no longer combining the queries, right?
Changes look good to me. Left a couple of comments for consideration. Also the description needs update.
yes, that's correct
I'm still checking the performance of staging to make sure there are no regressions I also updated the PR to reduce code duplication between all 4 loaders. PTAL, thanks! |
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
fix #4864
Currently, the loaders for the history lookup tables attempt to get or create history ids by issuing up to 3 SQL statements. First, we query for any existing rows. Then we insert the rows that are not present. Finally, we query the rows that were just inserted.
In #4864 we discussed that if we included a RETURNING clause to the insert statements we could obtain all the rows both present and inserted in a single query.
However, @urvisavla pointed out:
So that means the RETURNING clause could reduce the number of SQL statements from 3 to 2 but we would still need to do a SQL statement to find all the pre-existing rows.
But, it turns out we can use a UNION clause to combine theINSERT ... RETURNING *
statement with another query which just selects all the pre-existing rows. This technique is described in more detail here:https://hakibenita.com/postgresql-get-or-createKnown limitations
Unfortunately when deploying this PR to staging I did not observe any significant improvements to the "Ingestion Loaders Duration" metric.
Also, with this change we lose insight into the number of rows which were inserted vs queried.Nevertheless, I do think it's still worth including this change as it reduces code complexity