Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag to save ABCIResults #8028

Closed
1 of 4 tasks
tac0turtle opened this issue Feb 28, 2022 · 15 comments
Closed
1 of 4 tasks

Flag to save ABCIResults #8028

tac0turtle opened this issue Feb 28, 2022 · 15 comments
Labels
stale for use by stalebot

Comments

@tac0turtle
Copy link
Contributor

tac0turtle commented Feb 28, 2022

Summary

Right now on nodes that are not indexing data, the node saves the events anyways through ABCIResults. This is unnecessary since the node opted out of indexing this data anyways.

This inflates the state.db to be 2x+ the size of the block store.db.

Problem Definition

Unnecessary computation and storage use by default.

Proposal

Provide a config in config.toml that allows the node operator to disable saving of events in ABCIResults. The flag would be on by default to keep the same behaviour as now but not operators can turn it off if they want.

I can do the work if this is accepted


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@creachadair
Copy link
Contributor

creachadair commented Feb 28, 2022

👍🏼 for removing these data. Unless we have a pretty good reason, I don't think we should bother having a flag to switch the behaviour. These data aren't exposed by any existing API of the node.

@williambanfield
Copy link
Contributor

@marbar3778 Do you know if anyone needs it at all? If they opted out of indexing, I agree with @creachadair that we should just rip it out. It would make #7983 a bit more straightforward since I could rip out all of the little extra shim logic.

@tac0turtle
Copy link
Contributor Author

In regards to the sdk, we aim to provide indexing of events from the application and send empty events to tendermint, in this case its not needed.

Oasis mentioned wanting to keep it but they also followed up with they could build something for it. I don't have a preference from the sdk perspective.

From the node operators point of view I'm up for removing it to have less bloat on disk

@cmwaters
Copy link
Contributor

cmwaters commented Mar 3, 2022

Yes Oasis have indicated that they still plan to use it. I'd be fine with first making it optional, marking it as deprecated and then in a later release removing it (i.e. a more gradual phase out approach)

@creachadair
Copy link
Contributor

In regards to the sdk, we aim to provide indexing of events from the application and send empty events to tendermint, in this case its not needed.

If the concern is indexing, then it's worth noting that indexing does not interact with the state or block stores at all—the events are indexed entirely separately, and do not access the stored data. We could delete the stored data today and not affect indexing at all.

I don't know if that's the sense in which Oasis is using these data, but if so it might not matter whether we remove them from the databases.

@tac0turtle
Copy link
Contributor Author

If the concern is indexing, then it's worth noting that indexing does not interact with the state or block stores at all—the events are indexed entirely separately, and do not access the stored data. We could delete the stored data today and not affect indexing at all.

the app side indexing would be to enable typed events. This is something tendermint can't really provide unless we do something fancy and possibly complicated. This is a niche use case and I wouldn't want to tie everyone using Tendermint to use protobuf events.

@creachadair
Copy link
Contributor

the app side indexing would be to enable typed events. This is something tendermint can't really provide unless we do something fancy and possibly complicated. This is a niche use case and I wouldn't want to tie everyone using Tendermint to use protobuf events.

Right, that makes sense—my point is that even the existing Tendermint indexing doesn't use the stored data this issue proposes to get rid of. The ABCI events are plumbed directly to the indexer, and are not read back from block/state store.

@ebuchman
Copy link
Contributor

From what I understand, these shouldn't be saved in the State db. They are only needed for the latest block to facilitate replay. If indexing is on, they would be stored separately in the indexing database.

So it should be pretty straight forward to just delete the last saved ABCIResults every time we go to save the next one when committing a block so we only ever have one saved at a time. Does that sound right ?

@ebuchman
Copy link
Contributor

// LoadABCIResponses loads the ABCIResponses for the given height from the
// database. If not found, ErrNoABCIResponsesForHeight is returned.
//
// This is useful for recovering from crashes where we called app.Commit and
// before we called s.Save(). It can also be used to produce Merkle proofs of
// the result of txs.

Right. Not sure anyone uses Merkle proofs of tx results but in principle that could be useful. I guess we should ensure it's possible to get that out of the indexer?

@ebuchman
Copy link
Contributor

We actually might not even need to delete stuff, just overwrite. We can just change the calcABCIResponsesKey to not take a height and just write over it each time we run SaveABCIResponses. Probably requires a bit more thought to ensure the crash recovery will still work properly but otherwise looks promising!

@cmwaters
Copy link
Contributor

We can just change the calcABCIResponsesKey to not take a height and just write over it each time we run SaveABCIResponses.

Yeah we've done a similar thing with the SeenCommit key where we only save the most recent one i.e. no height suffix. I think because of ABCI++, we might have already been forced to change keys (ABCIResponses is a different type) so perhaps it will be easier to integrate this change into v0.36.

On a more broader note, we've actually been planning to revisit the crash semantics by moving the AppHash around: #8632. This would allow us to completely remove storing ABCIResponses.

For v0.34 and v0.35 branches, the simplest solution would be to prune everything but the most recent.

samricotta added a commit that referenced this issue Aug 2, 2022
…#8946 (#9090)

*Adds a flag to the which enables discarding of abci responses

Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Samantha Ricotta <samantharicotta@Samanthas-MacBook-Pro.local>
Co-authored-by: Samantha Ricotta <samantharicotta@Samanthas-MBP.fritz.box>
@tac0turtle
Copy link
Contributor Author

This issue is actually wrong. it groups events into a general thing that is indexed by the kv indexer but actually the kv indexer only indexes tx events, therefore we are discarding block events entirely. This is how the current implementation works, but I don't think this was intentional, or if it was, node operator should have the option to save block events and discard tx events due to indexing them in the kv_indexer.

@thanethomson
Copy link
Contributor

This issue is actually wrong. it groups events into a general thing that is indexed by the kv indexer but actually the kv indexer only indexes tx events, therefore we are discarding block events entirely. This is how the current implementation works, but I don't think this was intentional, or if it was, node operator should have the option to save block events and discard tx events due to indexing them in the kv_indexer.

The way the issue was framed was that it would be valuable to totally discard the ABCI results data (a solution to a problem). This naturally has a number of consequences. I would highly recommend that we find a way to clearly express the problem, from a user's perspective, before continuing to implement solutions along these lines (discarding ABCI results is an internal implementation detail from an integrator's perspective).

The core of the problem seems to me to be that operators' storage costs are high and we need to find a way to bring them down. But there are different types of operators. We can't simply stop storing certain kinds of data because some people rely on that data. Therefore, operators need more control over what Tendermint stores - along with a clear understanding of the tradeoffs of using those controls.

The problem statement should, however, include information on which Tendermint functionality, from a user's/integrator's perspective, is critical to retain. For example, for each kind of integrator (relayers, block explorers, etc.), which RPC endpoints are critical? This would allow the team to look into ways of saving storage space for each major category of integrator.

@tac0turtle
Copy link
Contributor Author

The way the issue was framed was that it would be valuable to totally discard the ABCI results data (a solution to a problem). This naturally has a number of consequences. I would highly recommend that we find a way to clearly express the problem, from a user's perspective, before continuing to implement solutions along these lines (discarding ABCI results is an internal implementation detail from an integrator's perspective).

I meant I framed it wrong. not anyone else.

This issue introduced a bug in the way I opened it. It doesn't account for not indexing data that was discarded. As a user of tendermint I would like to reindex data through either the db layer or rpc layer. A tool I'm writing, in the current form when downloading a snapshot it is unknown if the snapshot has either tx_index.db or complete state.db.

If It has the state.db I can reindex everything, if it only has tx_index.db then I can't reindex block events.

The same statement can be said about RPC. If the node operator prunes ABCIResponses, but indexes txs via the kv indexer then there is no way to query for block events.

This leaves me with needing to write a rpc crawler in order to find nodes that have ABCIResponses, making the tool more complicated than needed.

@thanethomson
Copy link
Contributor

Not a problem. Fortunately it's an opt-in feature which we can deprecate over time. One thing we should do is better document the impact of turning the flag on.

What I would like to do is open a new issue that starts to outline the various ways in which we can give different types of users more control over their storage usage.

@github-actions github-actions bot added the stale for use by stalebot label May 10, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale for use by stalebot
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants