Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ft, nft aggregated tables #153

Closed
wants to merge 2 commits into from
Closed

feat: add ft, nft aggregated tables #153

wants to merge 2 commits into from

Conversation

telezhnaya
Copy link
Contributor

@telezhnaya telezhnaya commented Aug 23, 2021

Please have a look at the columns in both tables.
FT
NFT

FT

near/NEPs#141

We can also add nullable memo field. Though, I thought it's not so important and we can store it in args. I also have a plan to put there all the other data that we have.

NFT

near/NEPs#171

Same things about memo and args.

We also can add nullable approval_id field, read more about it here:
https://github.com/near/NEPs/blob/master/specs/Standards/NonFungibleToken/ApprovalManagement.md

Any feedback is highly appreciated!

I guess I can implement the first draft and check what we put into args. After that, we can finalize the columns.

@telezhnaya telezhnaya self-assigned this Aug 23, 2021
@telezhnaya telezhnaya linked an issue Aug 25, 2021 that may be closed by this pull request
@@ -0,0 +1,62 @@
CREATE TABLE aggregated__fungible_tokens
(
id bigserial NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that id is useless. We should have some way to define an order of events, e.g. processing_index_in_chunk (ORDER BY processed_in_block_timestamp, processing_index_in_chunk we don't care if tokens have the same processing_index_in_chunk since the receipts in the same block do not overlap and even more so, the receipts to the same FT/NFT token contract are bounded to a single shard where the contract resides)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with adding processing_index_in_chunk for sorting purposes, but I'm still not sure is it a good idea to get rid of the artificial id column.

Some thoughts on why having id as PK is good:

  • I still think about duplicated tx hashes and maybe it's better to gradually migrate to artificial id as PK everywhere. In my opinion, our first task is to show data, only after it we can validate it somehow. So, if timestamp + index_in_chunk is not unique for some reason, it's better to add the value anyway, and then decide what should we do with it.
  • If the PK is large, you have fewer keys per page in the index and the index will take more cache memory to store it. We have 8 bytes (one id bigserial) VS 20 bytes (16 bytes in timestamp numeric(20,0) and 4 bytes in index_in_chunk integer). Performance is also degraded because we have to scan more memory.
  • Comparing numerics could be more expensive than comparing integers. I'm not sure about it, it is not mentioned in the documentation in a straight way, but 5-minute googling gives this thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PK of this table should probably be originated_from_receipt_id. timestamp and index cannot be PK since those are not unique by nature, so that is out of the question.

We should not expect this artificial ID to be used for sorting since it has no physical meaning on the blockchain (our source of truth) and we cannot rely on autoincrementing columns if we ever want to use partitions.

CREATE TABLE aggregated__fungible_tokens
(
id bigserial NOT NULL,
included_in_transaction_hash text NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest originated_from_transaction_hash and also consider adding originated_from_receipt_id

@@ -0,0 +1,62 @@
CREATE TABLE aggregated__fungible_tokens
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that "fungible_tokens" is not a self-describing naming. What exactly about FT do we store here? We don't store them as is (like accounts or access_keys tables), we seem to store updates/touches/changes (like account_changes table)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregated__fungible_token_actions
aggregated__fungible_token_updates
aggregated__fungible_token_changes
aggregated__fungible_token_operations

I vote for operations and changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More options:

aggregated__fungible_token_function_calls
aggregated__fungible_token_calls
aggregated__fungible_token_executions
aggregated__fungible_token_interactions

So far I like operations the best, but I would need to think about it more since I feel that it might be ambiguous and still not immediately obvious. I suggest we take some time wrapping our head around and modeling the usage to see how would we call the table if we would love to use it.

id bigserial NOT NULL,
included_in_transaction_hash text NOT NULL,
included_in_transaction_timestamp numeric(20, 0) NOT NULL,
transaction_status execution_outcome_status NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we actually set the transaction status? Can we actually say if the FT operation was successful as a whole transaction rather than just a single receipt execution outcome status? (I have yet to read the FT standard, so keep that in mind) How would that play with cross-contract calls (e.g. FT operation is triggered as part of some other transaction just as a cross-contract call)

included_in_transaction_hash text NOT NULL,
included_in_transaction_timestamp numeric(20, 0) NOT NULL,
transaction_status execution_outcome_status NOT NULL,
issued_contract_id text NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense, but from a first glance I cannot say what it is, so it might be worth considering a better name (I will need to read FT NEP before I can provide any suggestions)

Copy link
Contributor Author

@telezhnaya telezhnaya Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ft_account_id
ft_owner_account_id
ft_contract_account_id
ft_deployed_by_account_id

We need to think about it more

Comment on lines 9 to 10
predecessor_account_id text NOT NULL,
receiver_account_id text NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the "fungible token" table point of view these are ambiguous names since receiver_account_id of a transaction, a receipt, and FT token are three different account ids, and I feel that in the context of "fungible token" table we care about the FT sender and FT receiver account ids and the rest is less relevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ft_sender_account_id
ft_receiver_account_id

@telezhnaya
Copy link
Contributor Author

I've pushed another version of ft/nft tables. It's even more important to discuss not the interface, but the process of collecting the data.

On the call, we discussed how should we split the data by line: transaction-based or receipt-based. For now, I've chosen a transaction-based way: I guess noone will need more fine-grained splitting.

We should trust each contract implementation. It does not sound reliable. Instead of that, I suggest the following algorithm for collecting the data:

  • We find all transactions with the methods that could change ft/nft balance of the user.
  • For each transaction, we collect all receipts. (It means, we wait for all the receipt chains to be finalized)
  • For each receipt, we collect predecessor_account_id and receiver_account_id
  • For each account_id, we call view method ft_balance_of or nft_token to get the balance.
  • For FT, for each account_id (both predecessor and receiver), we store the info in the DB.
  • For NFT, we store the info about the current NFT owner.

A minute of self-criticism:

  1. The PKs are awful, it repeats the story with account_changes table, I suggest repeating the solution from there and to return id columns.
  2. While re-reading my own report, I hane an idea that maybe it's better to store data receipt-based. It will be easier to understand what's going on with more data.
  3. Imagine the scenario: A gives its NFT to B. We have 2 records about it: A has NFT, then B has NFT. I want to have third record: A does not have NFT anymore. I guess I need to add the boolean column, something like is_the_owner (I should think about naming more).

@frol
Copy link
Contributor

frol commented Aug 28, 2021

I feel that receipt-based records will be much easier to reason about and won't break apart if there is a contract that once being called initiates two independent FT transfers (being a single transaction, it will be hard to store this kind of scenario in transaction-based records); also, waiting for all the receipts sounds too much of a hassle on the indexer level since we process data block-by-block.

From the usage perspective, I would love to be able to:

SELECT
  ft_sender_account_id,
  ft_receiver_account_id,
  ft_name, -- ambiguous name, so we should do better
  ft_amount, -- UNSURE if we should have a transfer amount or should we store the updated balance or both (I feel that the transfer amount is more useful and the balance can be fetched on-demand via view call given we know the block reference; though we have a minor issue if a few transfers land on the same block, then we need to fetch the balance on a previous block, and all the transfers on the current block in the correct order and apply the changes correctly)
  originated_from_transaction_hash,
  updated_at_receipt_id,
FROM aggregated__fugnible_tokens_operations
WHERE ft_sender_account_id = 'frol.near' OR ft_receiver_account_id = 'frol.near'
ORDER BY updated_at_block_timestamp DESC, updated_at_index_in_chunk DESC
LIMIT 10

In order to see:

frol.near history:

frol.near sent 10 wNEAR to olga.near | frol.near's balance is 90 wNEAR:
  [deadbeef01] frol.near has been deducted 10 wNEAR to send to olga.near
  [deadbeef02] olga.near has received 10 wNEAR from frol.near
frol.near received 7 wNEAR from khorolets.near | frol.near's balance is 97 wNEAR
  ...

@telezhnaya
Copy link
Contributor Author

We merged NFT support recently #169; it's easier for me to start from the beginning with FT support rather than using this discussion. Closing the PR

@telezhnaya telezhnaya closed this Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FT/NFT: make a research about them
2 participants