Use the new block endpoint for history fill #162
@edmundnoble Thank you very much for opening this PR! I've been waiting for the corresponding CW-N endpoints to go into a release, and now that they have, I'll look into updating, verifying and merging this PR.
@edmundnoble, I've used a CW-D instance from this branch to backfill the database overnight. Today I noticed that it currently leaves a lot of gaps of blocks ~10-20 heights tall. Seems like this is caused by the fact that we're trying to fetch chunks of 360 blocks by asking for I'm planning to fix the issue with CW-D by shrinking our chunks. Chunks of 360 are unnecessarily large anyway, and I think they also inflate memory consumption.
Unfortunately, there is no paginated endpoint which includes orphans, which would be the natural way to deal with this issue. You can sort of get around this by asking for maxheight - minheight = 120, because there are almost never more than 2 orphans at a given height.
I also opted for a chunk size of 120, which solved the problem of gaps as expected. I'll leave this fill running for a little longer and look for any issues; otherwise it seems like this PR is very close to completion.
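For illustration, the chunking discussed above can be sketched in a few lines of Python. This is only a sketch, not the CW-D code: the function name `height_chunks` and its parameters are hypothetical, and it assumes the fill walks an inclusive height range and issues one request per chunk with maxheight - minheight at most 120.

```python
from typing import Iterator, Tuple


def height_chunks(min_height: int, max_height: int,
                  span: int = 120) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (minheight, maxheight) pairs covering the range.

    `span` is the largest allowed value of maxheight - minheight for a
    single request (hypothetical parameter, 120 as in the discussion).
    """
    if span < 0:
        raise ValueError("span must be non-negative")
    lo = min_height
    while lo <= max_height:
        hi = min(lo + span, max_height)
        yield (lo, hi)
        # The next chunk starts right after this one, so no height is
        # skipped and none is requested twice.
        lo = hi + 1


if __name__ == "__main__":
    for lo, hi in height_chunks(0, 300):
        print(lo, hi)
```

Keeping the span well below the number of items a single response can carry leaves headroom for orphans at some heights, which is the reasoning given above for picking 120.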
I think this PR is ready to be merged. I've been running a During those 24 hours,

No gaps in the data

I've run the following query to check if there are any gaps in the populated data:

SELECT chainid, height, next_height
FROM (
  SELECT
    chainid,
    height,
    LEAD(height) OVER (PARTITION BY chainid ORDER BY height DESC) AS next_height
  FROM
    blocks
) AS subquery
WHERE
  (height - next_height) > 1
ORDER BY
  chainid ASC,
  height DESC LIMIT 20;

 chainid | height | next_height
---------+--------+-------------
(0 rows)

(Essentially the same query performed by

Populated data is identical to the data from the previous release

SELECT count(*)
     , count(*) FILTER (WHERE (SELECT to_jsonb(b) = to_jsonb(b2) FROM cwd.blocks b2 WHERE b2.hash = b.hash))
     , count(*) FILTER (WHERE NOT (SELECT to_jsonb(b) = to_jsonb(b2) FROM cwd.blocks b2 WHERE b2.hash = b.hash))
FROM blocks b
;

  count   |  count   | count
----------+----------+-------
 34508643 | 34498688 |     0
(1 row)

Seems like we've ended up with some additional blocks (I think this new endpoint provides all orphans seen), but all the blocks we've copied that are common with the release CW-D have all of their fields identical.

SELECT count(*)
     , count(*) FILTER (WHERE (SELECT to_jsonb(t) = to_jsonb(t2) FROM cwd.transactions t2 WHERE t2.block = t.block AND t2.requestkey = t.requestkey))
     , count(*) FILTER (WHERE NOT (SELECT to_jsonb(t) = to_jsonb(t2) FROM cwd.transactions t2 WHERE t2.block = t.block AND t2.requestkey = t.requestkey))
FROM transactions t
WHERE t.height > 4100000
;

  count   |  count   | count
----------+----------+-------
 22884000 | 22825320 |     0
(1 row)

Similar query for a subset of the transactions, no discrepancy found. The query was a little trickier for the events table, because that table contains the

SELECT count(*)
     , count(*) FILTER (WHERE (SELECT to_jsonb(e) - 'paramtext' = to_jsonb(e2) - 'paramtext' FROM cwd.events e2 WHERE e2.block = e.block AND e2.requestkey = e.requestkey AND e2.idx = e.idx))
     , count(*) FILTER (WHERE NOT (SELECT to_jsonb(e) - 'paramtext' = to_jsonb(e2) - 'paramtext' FROM cwd.events e2 WHERE e2.block = e.block AND e2.requestkey = e.requestkey AND e2.idx = e.idx))
FROM events e
WHERE e.height > 4300000
;

  count  |  count  | count
---------+---------+-------
 7695345 | 7674529 |     0
(1 row)
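The three counts in the comparison queries above can be read as: total rows, rows with an identical counterpart in the release schema, and rows whose counterpart differs. A row with no counterpart makes the scalar subquery return NULL, so it is counted by neither FILTER clause and shows up only in the total. The Python sketch below mirrors that tally on in-memory rows. It is an illustration only: `compare_rows`, its parameters and the sample data are hypothetical, with `ignore` playing the role of the `- 'paramtext'` subtraction.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Row = Dict[str, object]


def compare_rows(new_rows: Iterable[Row], old_rows: Iterable[Row],
                 key: Callable[[Row], Hashable],
                 ignore: Tuple[str, ...] = ()) -> Tuple[int, int, int]:
    """Return (total, identical, different) for new_rows against old_rows."""
    def strip(row: Row) -> Row:
        # Drop the ignored columns before comparing.
        return {k: v for k, v in row.items() if k not in ignore}

    old_by_key = {key(r): r for r in old_rows}
    total = identical = different = 0
    for row in new_rows:
        total += 1
        old = old_by_key.get(key(row))
        if old is None:
            # No counterpart: counted in the total only, like the NULL case.
            continue
        if strip(row) == strip(old):
            identical += 1
        else:
            different += 1
    return total, identical, different


if __name__ == "__main__":
    old = [{"hash": "a", "height": 1}, {"hash": "b", "height": 2}]
    new = old + [{"hash": "c", "height": 2}]  # one extra (orphan) block
    print(compare_rows(new, old, key=lambda r: r["hash"]))
```

With this reading, the difference between the first and second counts is the number of rows that exist only in the new data, and a third count of 0 means that no shared row differs.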
LGTM! Thank you very much for the PR @edmundnoble
This PR uses chainweb-node's new block headers with payloads endpoint during the history fill operations. Doing so simplifies the code and presumably reduces the CPU overhead on both the CW-N and the CW-D involved in the process. The next step is to migrate the listen process from the header stream to the header with payloads stream, but that is not included in this PR.