-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(index): add lotus shed commands to prune all indexes #12393
feat(index): add lotus shed commands to prune all indexes #12393
Conversation
@aarshkshah1992 We can also create separate commands for pruning the Need suggestions on a couple of things:
|
cmd/lotus-shed/indexes.go
Outdated
&cli.IntFlag{ | ||
Name: "from", | ||
Value: 0, | ||
Usage: "the tipset height to start backfilling from (0 is head of chain)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"pruning", not "backfilling"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lotus-shed/indexes.go
Outdated
if err := rows.Scan(&id, &height); err != nil { | ||
return "", "", "", xerrors.Errorf("error scanning event id: %w", err) | ||
} | ||
eventIds = append(eventIds, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, this could get big, probably not ideal for an sql query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
I think this is going to be pretty inefficient doing the select and building a really long targeted delete, can we not just use a range delete? I'm also not sure it's worth the bother doing a full range with these (i.e. the "epochs" arg probably doesn't get used by anyone). I think you'd just want to have an "older than" (in fact, you should rename "from" to be something like that, So, can we just bundle the heavy lifting directly into the queries?
|
|
I don't think it need symmetry with the backfill commands, and honestly I find 'from' a little bit awkward for those anyway (I added that "going backwards" in there because of that awkwardness, I kind of wish they were something like Separate sub-commands makes sense to me rather than all in one. You may only have one of these indexes enabled anyway. |
@akaladarshi I agree with Rodd. Looking at this now, separate pruning cmds for all Indices where we "prune everything older than epoch X" makes a lot of sense. |
I was inclined more towards having a single command to prune all the indexes at once and having individual commands as well. From a code perspective, there won't be much change as I have already separated each index pruning into its function. It's a matter of calling those functions (individually or together). |
cmd/lotus-shed/indexes.go
Outdated
basePath, err := homedir.Expand(cctx.String("repo")) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
startHeight := cctx.Int64("older-than") | ||
if startHeight == 0 { | ||
currTs, err := api.ChainHead(ctx) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
startHeight += int64(currTs.Height()) | ||
|
||
if startHeight == 0 { | ||
return xerrors.Errorf("bogus start height %d", startHeight) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about you refactor this into a utility function since this block is repeated 4 times: basePath, startHeight, err := parsePruneArgs(cctx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This is neat, thanks @akaladarshi. I have a suggestion, but I don't want to impose this as a requirement, just if you think it's interesting and worth doing: a I think this should even work to do both in one query, for each table:
This is just a nice feature, I think though running this on my own node and I don't think I want to be pruning at the moment (I'm building up a big db which is helpful for debugging) but I would like to see what it would do, so that's what I came up with. |
That sounds like a good-to-have feature, I will start working on it once I am done with testing this PR locally with the calibration testnet node before merging and migrating of indexes to common DB. |
cmd/lotus-shed/indexes.go
Outdated
blkMsgs, err := api.ChainGetBlockMessages(ctx, blockheader.Cid()) | ||
if err != nil { | ||
log.Infof("failed to get block messages at height: %d", currTs.Height()) | ||
continue // should we break here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvagg should we break here or keep going till the first tipset?
Is there a possibility, that a block is missing in between?
While testing I saw one block was missing, after that, it kept going on like this.
Testing logs:
`2024-08-21T20:21:39.118+0530 INFO lotus-shed lotus-shed/indexes.go:1148 failed to get block messages at height: 1867646
2024-08-21T20:21:39.122+0530 INFO lotus-shed lotus-shed/indexes.go:1148 failed to get block messages at height: 1867641
2024-08-21T20:21:39.127+0530 INFO lotus-shed lotus-shed/indexes.go:1148 failed to get block messages at height: 1867633
2024-08-21T20:21:39.134+0530 INFO lotus-shed lotus-shed/indexes.go:1148 failed to get block messages at height: 1867623`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this one is tricky! because we don't have an epoch to delete against we can't do it bulk, as you've discovered, but we also can't do it this way because we kind of want the inverse: we want to delete entries for messages where we don't have the message in the blockstore ourselves. So it's almost like: collect all messages, then delete everything not in that list. Which is impractical because that could get very large. The other problem with using the chain to do this is that there are going to be messages not on the canonical chain that we need to consider (i.e. there's been a reorg, so walking back from some tip doesn't lead us to tipsets that were previously considered canonical but are no longer), perhaps we want to include them, perhaps we should delete them because they are non-canonical, it's actually not even clear what you'd want to do because it depends on how the lookup API is being used. But conceptually what you'd want is something like: nothing in this txhashdb that isn't also in my blockstore. Because if I can't get to the original message anyway then there's no real help in being able to translate it from an eth hash.
BUT; just doing a quick look at the db, it's the one that we already have GC in already, and we achieve that by timestamp. If you turn on EthTxHashMappingLifetimeDays
then it'll clean up according to the timestamp it was inserted at.
So I think the strategy needs to change for this table: we just can't do it on epoch unfortunately; we either need to not offer it here at all, or we need to come up with a timestamp and run something like the existing functionality:
func (ei *EthTxHashLookup) DeleteEntriesOlderThan(days int) (int64, error) { |
We can get a timestamp. So, if you want to keep this pruning here, you could keep the existing api.ChainGetTipSetByHeight(ctx, abi.ChainEpoch(startHeight), types.EmptyTSK)
to get the tipset, look at the first block and get its timestamp (e.g. like this and then work out how many seconds ago it is and then you can run the delete using datetime()
maths. Something like DELETE FROM eth_tx_hashes WHERE insertion_time < datetime('now', '-X seconds')
. Note how sqlite datetime maths works:
sqlite> select datetime('now'), datetime('now', '-10 seconds'), datetime('now', '-86400 seconds');
2024-08-22 03:38:03|2024-08-22 03:37:53|2024-08-21 03:38:03
This is one area where we could apply some reuse of existing code: in eth_transaction_hash_lookup.go
, we could expose a method to reuse the existing delete statement, and since we only run this once per hour on the existing GC we can get rid of the preparedstatement too and expose a public function to run it on any instance of a db, here's a quick go at what what may look like over in that file:
func (ei *EthTxHashLookup) DeleteEntriesOlderThan(days int) (int64, error) {
if ei.db == nil {
return 0, xerrors.New("db closed")
}
epochs := abi.ChainEpoch(build.BlockDelaySecs * builtin.SecondsInDay * uint64(days))
return DeleteEntriesOlderThan(ei.db, epochs)
}
// DeleteEntriesOlderThan deletes entries older than the given number of epochs from now.
func DeleteEntriesOlderThan(db *sql.DB, epochs abi.ChainEpoch) (int64, error) {
secondsAgo := int64(epochs) * int64(build.BlockDelaySecs)
res, err := db.Exec(deleteOlderThan, "-"+strconv.FormatInt(secondsAgo, 10)+" seconds")
if err != nil {
return 0, err
}
return res.RowsAffected()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as per your suggestion of reusing code.
cmd/lotus-shed/indexes.go
Outdated
totalRowsAffected += rowsAffected | ||
log.Infof("pruned %s table, rows affected: %d", eventTable, rowsAffected) | ||
|
||
res, err = tx.Exec(deleteEventEntriesByEventIDs, startHeight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After executing deleteEventEntriesByEventIDs
, res.RowsAffected()
is returning 0 even though in DB data is being deleted.
My guess is that SQLite not able to figure out how many rows are affected because there is a subquery inside it.
What are your thoughts on this @rvagg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you just need to swap this with deleteEventFromStartHeight
- this query relies on the entries being in the event
table because of the sub-select, but you've just deleted the ones that it's trying to match; so clean up event_entry
first, then event
and you're all good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned the data is still being deleted but, it's not returning the number of rows affected.
Anyway, I will update it as you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how do you know it's being deleted? you've lost the entried they join with on the event
table, have you tried a select count(*) from event
and select count(*) from event_index
to see that they're actually deleted? Because of the subquery working on rows that were previously deleted, you should not have the join and therefore not get deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, I was checking it the wrong way.
cmd/lotus-shed/indexes.go
Outdated
defer closer() | ||
ctx := lcli.ReqContext(cctx) | ||
api := srvc.FullNodeAPI() | ||
closer := srvc.Close |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just keep the pattern the same across these, you have some churn in naming and you have an extra closer
here; the one below in pruneTxHashCmd
is clean:
srv, err := lcli.GetFullNodeServices(cctx)
if err != nil {
return err
}
defer srv.Close() //nolint:errcheck
api := srv.FullNodeAPI()
ctx := lcli.ReqContext(cctx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lotus-shed/indexes.go
Outdated
} | ||
|
||
// pruneEventsIndex is a helper function that prunes the events index.db for a number of epochs starting from a specified height | ||
func pruneEventsIndex(_ context.Context, basePath string, startHeight int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just ditch the context arg if you have no way of using it, I know it's nice to have symmetry but if you don't need the symmetry for any tactical reason then just keep it clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lotus-shed/indexes.go
Outdated
|
||
// database related constants | ||
const ( | ||
databaseType = "sqlite" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this naming confusing when parsing the path resolution below, perhaps something like indexesDir
? i.e. it's the name of a directory and isn't used to refer to the type of database anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realise this until after I ran the node. Initially, I thought it was DB-type.
Anyway, I have updated it.
cmd/lotus-shed/indexes.go
Outdated
err := db.Close() | ||
if err != nil { | ||
fmt.Printf("ERROR: closing db: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer compact form where you can get away with it, IMO it helps with Go's nasty eye-sprawl
err := db.Close() | |
if err != nil { | |
fmt.Printf("ERROR: closing db: %s", err) | |
} | |
if err := db.Close(); err != nil { | |
fmt.Printf("ERROR: closing db: %s", err) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lotus-shed/indexes.go
Outdated
} | ||
|
||
totalRowsAffected += rowsAffected | ||
log.Infof("pruned %s table, rows affected: %d", eventTable, rowsAffected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not critical: I suggest keeping track of these numbers separately and then logging after you've committed the transaction, because until you do, they're not really gone; if you have a rollback then these log messages will be incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're concerned about progress logging, print a message before each execution: log.Infof("pruning `%s` table", eventTable)
; then at least the user knows its working if this takes a long time for some reason (it shouldn't unless it's on a really slow disk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
cmd/lotus-shed/indexes.go
Outdated
err = tx.Commit() | ||
if err != nil { | ||
return xerrors.Errorf("error committing tx: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err = tx.Commit() | |
if err != nil { | |
return xerrors.Errorf("error committing tx: %w", err) | |
} | |
if err = tx.Commit(); err != nil { | |
return xerrors.Errorf("error committing tx: %w", err) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lotus-shed/indexes.go
Outdated
err := db.Close() | ||
if err != nil { | ||
fmt.Printf("ERROR: closing db: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := db.Close() | |
if err != nil { | |
fmt.Printf("ERROR: closing db: %s", err) | |
} | |
if err := db.Close(); err != nil { | |
fmt.Printf("ERROR: closing db: %s", err) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
e60187d
to
d092ede
Compare
d092ede
to
e25ebee
Compare
@akaladarshi : what are the next steps for this PR so it can be merged? |
@BigLep I am not sure about this PR as of now because we have removed the existing indexes in favour of #12421. I think @aarshkshah1992 or @rvagg are the best people who can comment on this. |
@aarshkshah1992 : should this be closed in light of ChainIndexer work? |
I've requested review from you @aarshkshah1992 . I assume this can be closed, but will let you comment. |
Related Issues
Fixes: #12377
Proposed Changes
Add a
lotus-shed
command to prune the indexes, otherprune-txhash
command to prune tx hash indexprune-events
commandprune-msgindex
command to prune the msg indexprune-all-indexes
command to prune all the indexes (msg, events, tx hash)at onceUse
older-than
flag to get the height to start pruning indexes older than the provided height