Add block timestamps to logs #8616

Merged: 5 commits into develop, Mar 14, 2023
Conversation

reductionista (Contributor) commented Mar 2, 2023

This adds a block timestamp field to logpoller.Log and corresponding column in evm_logs table.
Block timestamps are filled in while converting logs returned in types.Log format from the eth client into logpoller.Log format.
The block header including timestamp is already available for unfinalized logs. For finalized logs (backfilled), GetBlocksRange() is called to submit a batch request for the block headers associated with each log event found.

A block_timestamp column has also been added to evm_log_poller_blocks for consistency, and because blocks from db are checked first in GetBlocksRange(), to avoid unnecessarily re-requesting a block.

github-actions bot (Contributor) commented Mar 2, 2023

I see that you haven't updated any CHANGELOG files. Would it make sense to do so?

@reductionista force-pushed the logpoller_block_timestamps branch 6 times, most recently from 0206b33 to 2bc9987 on March 6, 2023 17:24
@reductionista marked this pull request as ready for review on March 6, 2023 17:27
@reductionista force-pushed the logpoller_block_timestamps branch 2 times, most recently from 0843e1e to 0fee2d4 on March 6, 2023 18:56
krehermann previously approved these changes Mar 6, 2023
Review threads on core/chains/evm/logpoller/log_poller.go and core/chains/evm/logpoller/log_poller_test.go (outdated, resolved)
addresses := []common.Address{th.EmitterAddress1, th.EmitterAddress2}
topics := []common.Hash{EmitterABI.Events["Log1"].ID, EmitterABI.Events["Log2"].ID}

err := th.LogPoller.RegisterFilter(Filter{"convertLogs", topics, addresses})
Contributor:
maybe overkill - I think we can just have a single table test against a pure convertLogs. Same coverage afaict but more readable/maintainable

reductionista (Author):

I should have renamed this test; Test_convertLogs is misleading. Originally I planned to have one test for lp.convertLogs() and a separate one for lp.blocksFromLogs(), but after realizing there's no way to mock an eth client from logpoller (due to circular dependencies), I decided the best option was a single test covering both. Also, since it's basically an end-to-end test of how logpoller handles block timestamps as the chain progresses, I think I should move it into logpoller/integration_test.go.

I've converted convertLogs() back into a pure function, and added a 2-parameter version of it in helper_test.go that fills in the logger and chain id params for convenience. I can also add a simple standalone convertLogs() test, but I think the end-to-end test is still useful to have.

Contributor:

blocksFromLogs is such a thin wrapper around GetBlocksRange (which has its own test) that I probably wouldn't even worry about it; just add a small assertion to the existing integration test plus pure convertLogs tests.

Two review threads on core/chains/evm/logpoller/log_poller.go (outdated, resolved)
@@ -0,0 +1,7 @@
-- +goose Up
ALTER TABLE evm_logs ADD COLUMN block_timestamp timestamptz NOT NULL DEFAULT now();
Contributor:

DEFAULT now() is interesting - not better to just error vs a potentially incorrect block timestamp?

reductionista (Author):

The DEFAULT is only there for the migration. You must have a DEFAULT set in order to add a NOT NULL column to a non-empty table. The only other options besides DEFAULT now() I can think of are:

1. Purge all logs from the table during migration.
2. Use a NULL default.
3. Use some very old date, like 1970, for the default.

I assume the first one is off the table. The problem with the second is that you can't read the block_timestamp column directly into a logpoller.Log struct unless you define the field as a pointer and allow it to be nil; otherwise the datatypes don't match. I decided that would make things uglier in several ways, and it means you've got to add permanent nil checks in lots of places just to solve this one-time migration issue.

I decided it was cleaner and simpler to just define it as time.Time instead of *time.Time:

type Log struct {
	EvmChainId     *utils.Big
	LogIndex       int64
	BlockHash      common.Hash
	BlockNumber    int64
	BlockTimestamp time.Time
	Topics         pq.ByteaArray
	EventSig       common.Hash
	Address        common.Address
	TxHash         common.Hash
	Data           []byte
	CreatedAt      time.Time
}

It would also be confusing and possibly error-prone if CreatedAt were time.Time but BlockTimestamp were *time.Time.

I'm definitely open to the third option, if that would make more sense... although I don't think I have enough context on the use cases for this feature to know which would be better. Any preference there? Also open to other ideas not on my list, if I've missed one.

reductionista (Author):

Oh, I suppose a slightly better variant of this might be: whichever we choose for the DEFAULT during migration, now() or [old date], we could add a second step after the ADD COLUMN which drops the default. I think I like that approach, actually... it fits better with not wanting a permanent solution for a temporary problem. Either way, though, I could use advice on what the best timestamp is to fill in for all pre-migration logs.
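Sketched as a goose migration under that approach (table and column names taken from this PR; the final migration file may differ):

```sql
-- +goose Up
-- The DEFAULT is only needed so the NOT NULL column can be added to a
-- non-empty table; it is dropped immediately afterward so future inserts
-- must supply a real block timestamp.
ALTER TABLE evm_logs ADD COLUMN block_timestamp timestamptz NOT NULL DEFAULT now();
ALTER TABLE evm_logs ALTER COLUMN block_timestamp DROP DEFAULT;
ALTER TABLE evm_log_poller_blocks ADD COLUMN block_timestamp timestamptz NOT NULL DEFAULT now();
ALTER TABLE evm_log_poller_blocks ALTER COLUMN block_timestamp DROP DEFAULT;

-- +goose Down
ALTER TABLE evm_log_poller_blocks DROP COLUMN block_timestamp;
ALTER TABLE evm_logs DROP COLUMN block_timestamp;
```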

connorwstein (Contributor) commented Mar 9, 2023:
Hmm, I guess the issue with actually looking up the correct timestamp in the migration is that you'd need RPC access in the migration scripts?

reductionista (Author):

Yeah, I don't know of any way we could access that in the migration scripts.

reductionista (Author):

If it were really important for these to be accurate, we do have one option, but it seems like overkill to me: set them all to 1970, then during node startup have a second migration phase that re-requests all blocks with a 1970 timestamp and updates them to accurate timestamps. I don't know how it could skip that phase after the first restart, though... it would probably at least have to run a query on each startup to check whether any are still set to 1970.

Assuming this gets rolled out in the next release (or any release <= CCIP release), it doesn't seem like something we should care much about, right? If CCIP is the only customer and that won't be enabled until later... won't these blocks be far in the past by then?

Contributor:

Correct, this shouldn't be an issue because of the CCIP release sequencing. I think that, plus the fact that we can always delete and replay manually to repopulate the correct data as a recovery technique, makes DEFAULT now() fine.

jmank88 previously approved these changes Mar 13, 2023
@reductionista dismissed stale reviews from jmank88 and krehermann via 3f5604d on March 14, 2023 17:20
@reductionista force-pushed the logpoller_block_timestamps branch 2 times, most recently from 3f5604d to c40d82c on March 14, 2023 17:23
- Change convertLogs back to pure function
- Move e2e block timestamp test into integration_test.go
- Refactor TestConvertLogsInputParams as TestConvertLogs, add happy paths
- Remove timestamp DEFAULTs after migration
- Return err without wrapping
- Inline call to convertLogs
@reductionista force-pushed the logpoller_block_timestamps branch from c40d82c to 43cebc5 on March 14, 2023 17:39
@reductionista merged commit 39fe8e3 into develop on Mar 14, 2023
@reductionista deleted the logpoller_block_timestamps branch on March 14, 2023 20:57