-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ft, nft aggregated tables #153
Conversation
@@ -0,0 +1,62 @@ | |||
CREATE TABLE aggregated__fungible_tokens | |||
( | |||
id bigserial NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that id
is useless. We should have some way to define an order of events, e.g. processing_index_in_chunk
(ORDER BY processed_in_block_timestamp, processing_index_in_chunk
we don't care if tokens have the same processing_index_in_chunk
since the receipts in the same block do not overlap and even more so, the receipts to the same FT/NFT token contract are bounded to a single shard where the contract resides)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with adding processing_index_in_chunk
for sorting purposes, but I'm still not sure is it a good idea to get rid of the artificial id
column.
Some thoughts on why having id
as PK is good:
- I still think about duplicated tx hashes and maybe it's better to gradually migrate to artificial
id
as PK everywhere. In my opinion, our first task is to show data, only after it we can validate it somehow. So, iftimestamp
+index_in_chunk
is not unique for some reason, it's better to add the value anyway, and then decide what should we do with it. - If the PK is large, you have fewer keys per page in the index and the index will take more cache memory to store it. We have 8 bytes (one id bigserial) VS 20 bytes (16 bytes in timestamp numeric(20,0) and 4 bytes in index_in_chunk integer). Performance is also degraded because we have to scan more memory.
- Comparing numerics could be more expensive than comparing integers. I'm not sure about it, it is not mentioned in the documentation in a straight way, but 5-minute googling gives this thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PK of this table should probably be originated_from_receipt_id
. timestamp
and index
cannot be PK since those are not unique by nature, so that is out of the question.
We should not expect this artificial ID to be used for sorting since it has no physical meaning on the blockchain (our source of truth) and we cannot rely on autoincrementing columns if we ever want to use partitions.
CREATE TABLE aggregated__fungible_tokens | ||
( | ||
id bigserial NOT NULL, | ||
included_in_transaction_hash text NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest originated_from_transaction_hash
and also consider adding originated_from_receipt_id
@@ -0,0 +1,62 @@ | |||
CREATE TABLE aggregated__fungible_tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that "fungible_tokens" is not a self-describing naming. What exactly about FT do we store here? We don't store them as is (like accounts
or access_keys
tables), we seem to store updates/touches/changes (like account_changes
table)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregated__fungible_token_actions
aggregated__fungible_token_updates
aggregated__fungible_token_changes
aggregated__fungible_token_operations
I vote for operations
and changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More options:
aggregated__fungible_token_function_calls
aggregated__fungible_token_calls
aggregated__fungible_token_executions
aggregated__fungible_token_interactions
So far I like operations
the best, but I would need to think about it more since I feel that it might be ambiguous and still not immediately obvious. I suggest we take some time wrapping our head around and modeling the usage to see how would we call the table if we would love to use it.
id bigserial NOT NULL, | ||
included_in_transaction_hash text NOT NULL, | ||
included_in_transaction_timestamp numeric(20, 0) NOT NULL, | ||
transaction_status execution_outcome_status NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we actually set the transaction status? Can we actually say if the FT operation was successful as a whole transaction rather than just a single receipt execution outcome status? (I have yet to read the FT standard, so keep that in mind) How would that play with cross-contract calls (e.g. FT operation is triggered as part of some other transaction just as a cross-contract call)
included_in_transaction_hash text NOT NULL, | ||
included_in_transaction_timestamp numeric(20, 0) NOT NULL, | ||
transaction_status execution_outcome_status NOT NULL, | ||
issued_contract_id text NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense, but from a first glance I cannot say what it is, so it might be worth considering a better name (I will need to read FT NEP before I can provide any suggestions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ft_account_id
ft_owner_account_id
ft_contract_account_id
ft_deployed_by_account_id
We need to think about it more
predecessor_account_id text NOT NULL, | ||
receiver_account_id text NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the "fungible token" table point of view these are ambiguous names since receiver_account_id
of a transaction, a receipt, and FT token are three different account ids, and I feel that in the context of "fungible token" table we care about the FT sender and FT receiver account ids and the rest is less relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ft_sender_account_id
ft_receiver_account_id
c80da53
to
bcfbcf3
Compare
I've pushed another version of ft/nft tables. It's even more important to discuss not the interface, but the process of collecting the data. On the call, we discussed how should we split the data by line: transaction-based or receipt-based. For now, I've chosen a transaction-based way: I guess noone will need more fine-grained splitting. We should trust each contract implementation. It does not sound reliable. Instead of that, I suggest the following algorithm for collecting the data:
A minute of self-criticism:
|
I feel that receipt-based records will be much easier to reason about and won't break apart if there is a contract that once being called initiates two independent FT transfers (being a single transaction, it will be hard to store this kind of scenario in transaction-based records); also, waiting for all the receipts sounds too much of a hassle on the indexer level since we process data block-by-block. From the usage perspective, I would love to be able to: SELECT
ft_sender_account_id,
ft_receiver_account_id,
ft_name, -- ambiguous name, so we should do better
ft_amount, -- UNSURE if we should have a transfer amount or should we store the updated balance or both (I feel that the transfer amount is more useful and the balance can be fetched on-demand via view call given we know the block reference; though we have a minor issue if a few transfers land on the same block, then we need to fetch the balance on a previous block, and all the transfers on the current block in the correct order and apply the changes correctly)
originated_from_transaction_hash,
updated_at_receipt_id,
FROM aggregated__fugnible_tokens_operations
WHERE ft_sender_account_id = 'frol.near' OR ft_receiver_account_id = 'frol.near'
ORDER BY updated_at_block_timestamp DESC, updated_at_index_in_chunk DESC
LIMIT 10 In order to see:
|
We merged NFT support recently #169; it's easier for me to start from the beginning with FT support rather than using this discussion. Closing the PR |
Please have a look at the columns in both tables.
FT
NFT
FT
near/NEPs#141
We can also add nullable
memo
field. Though, I thought it's not so important and we can store it inargs
. I also have a plan to put there all the other data that we have.NFT
near/NEPs#171
Same things about
memo
andargs
.We also can add nullable
approval_id
field, read more about it here:https://github.com/near/NEPs/blob/master/specs/Standards/NonFungibleToken/ApprovalManagement.md
Any feedback is highly appreciated!
I guess I can implement the first draft and check what we put into
args
. After that, we can finalize the columns.