services/horizon: Remove by_hash index, replace with index on hash prefix #4087
Conversation
	return 0, errors.Wrap(err, "error calculating hashHexHash")
}

return int64(hexHashPrefix), nil // overflow expected
why not simply use uint64?
Ah, because postgres doesn't support it?
It may be worth clarifying it.
Correct, postgres doesn't support unsigned ints. I'll update the comment with clarifications.
Could use numeric for the column, but bleh, and will use more bytes.
-- +migrate Up
ALTER TABLE history_transactions ADD COLUMN transaction_hash_prefix bigint;
CREATE INDEX transaction_hash_prefix on history_transactions (transaction_hash_prefix);
DROP INDEX by_hash;
Should we include:
UPDATE history_transactions SET transaction_hash_prefix = ('x' || lpad(transaction_hash, 16, '0'))::bit(64)::bigint where transaction_hash_prefix is null;
here to avoid forcing a reingestion? Might take a while to run?
I tried it in fa49b48 and then removed it: I cancelled it after 10h in stg. I can try again but I think reingestion is better because you can actually track progress.
maybe could move the UPDATE to be done on a smaller block of rows obtained from an outer select loop, and then checkpoint after each iteration to keep the commit log space down?
pseudo:

CTR = 0
LOOP WHILE CTR < MAX(id)  -- or something
    UPDATE history_transactions ... WHERE id IN (SELECT id FROM history_transactions WHERE id > CTR ORDER BY id LIMIT 10000)
    CHECKPOINT
    CTR += 10000
END LOOP
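For what it's worth, a rough Go sketch of that batching idea (the function name, batch size handling, and use of database/sql are mine, not Horizon code; the prefix expression is the one from the earlier suggestion). Each UPDATE runs as its own implicit transaction, so the server can recycle WAL between batches instead of holding one huge transaction:

package backfill

import (
	"database/sql"

	_ "github.com/lib/pq" // Postgres driver, assumed
)

// backfillPrefixes fills transaction_hash_prefix in batches of batchSize rows,
// walking the table by id so each statement touches a bounded range.
func backfillPrefixes(db *sql.DB, batchSize int) error {
	cursor := int64(0)
	for {
		// Find the upper id bound of the next batch.
		var upper sql.NullInt64
		err := db.QueryRow(`
			SELECT MAX(id) FROM (
				SELECT id FROM history_transactions
				WHERE id > $1 ORDER BY id LIMIT $2
			) batch`, cursor, batchSize).Scan(&upper)
		if err != nil {
			return err
		}
		if !upper.Valid {
			return nil // no rows left past the cursor: done
		}
		// One batch per statement; committed immediately, so WAL stays small.
		_, err = db.Exec(`
			UPDATE history_transactions
			SET transaction_hash_prefix = ('x' || lpad(transaction_hash, 16, '0'))::bit(64)::bigint
			WHERE id > $1 AND id <= $2 AND transaction_hash_prefix IS NULL`,
			cursor, upper.Int64)
		if err != nil {
			return err
		}
		cursor = upper.Int64
	}
}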
I think the speed of this query is now less of a problem than the space required. The problem is that history_transactions is one of the largest tables in a full-history database and it needs to be rewritten, so it requires twice as much space. When I ran this in stg we basically ran out of memory. So I think I'll implement the idea in: #4087 (comment).
ok, cool, I was thinking that the pg commit log gets huge when attempting one update stmt on all rows under one TX, so, by CHECKPOINT'ing on a smaller batch of rows, that would allow PG to flush the commit log storage areas often, potentially keeping mem/disk usage lower
Oh I see what you mean. Looking at the charts it looks like Postgres is actually able to flush the log during the transaction. "Transaction Logs Disk Usage" peaked at 5GB and during that time "Free Storage Space" went from 1TB to 0. So it looks like the majority of data is actually updated rows.
@@ -0,0 +1,9 @@
-- +migrate Up
ALTER TABLE history_transactions ADD COLUMN transaction_hash_prefix bigint;
CREATE INDEX transaction_hash_prefix on history_transactions (transaction_hash_prefix);
Does CREATE INDEX CONCURRENTLY help this build faster? And/or adding a WHERE transaction_hash_prefix IS NOT NULL?
I'm afraid that CONCURRENTLY will affect the performance a lot because of how big the table is. I'll try WHERE transaction_hash_prefix IS NOT NULL.
concurrently doesn't make it use more db cores, it just lets other txns happen while the index is being built. But probably safer without anyway.
I think the WHERE transaction_hash_prefix IS NOT NULL would hurt your TransactionsWithoutPrefixExist query, so also probably not worth it.
"concurrently doesn't make it use more db cores, it just lets other txns happen while the index is being built. But probably safer without anyway."
Postgres docs say it actually can affect performance:
PostgreSQL supports building indexes without locking out writes. This method is invoked by specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially modify or use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment. Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.
"(ht.transaction_hash_prefix IS NULL and ht.transaction_hash = ?)", | ||
hash, | ||
) | ||
return q.Get(ctx, dest, byHash) |
Would it be worthwhile to log a debug/trace line here, just for monitoring the runtime profile, to see how often it falls into this branch?
@@ -27,8 +28,19 @@ func TestTransactionQueries(t *testing.T) {
	err := q.TransactionByHash(tt.Ctx, &tx, real)
	tt.Assert.NoError(err)

	fake := "not_real"
	err = q.TransactionByHash(tt.Ctx, &tx, fake)
	_, err = q.ExecRaw(tt.Ctx, "UPDATE history_transactions SET transaction_hash_prefix = NULL")
Nice test coverage. Maybe this new verification warrants its own test case method like 'TestNonPrefixedTransactionQueries', or maybe just add a message on the line 29 assert with detail like 'tx should be found with hash prefix', and on line 36 'tx should be found without hash prefix'. You already have an inline comment for that context there, which is enough; just mentioning it in case it's nice to have it visible up front in the test display output.
Nice work! I'm still green here, probably not qualified for approval, just observing.
I wonder if this would still be needed if we use #4094 to…
@stellar/horizon-committers I ran some tests. I started thinking about an intermediate solution that will allow people with a reingested DB to use the improved version while those with old rows still use the old index on the hash field. I came up with this:
By doing this we remove the performance issue in…
@bartekn I'm not sure if it's a good idea to have ingestion essentially perform part of the db migration automatically. Perhaps we could have a separate horizon command to drop the index? We could also run the select query when horizon starts up and log a message about how the index should be dropped if no rows are found.
Good point. I wanted it to be automagic for users but doing this via a CLI command is safer/cleaner.
@stellar/horizon-committers this is ready to review again. I implemented the idea here: #4087 with @tamirms' suggestion: we don't drop the index automatically in ingestion. Instead, we print a message on startup that optimization is possible via the new optimize-schema command.
// Should we finally set default to info in support/log?
hlog.DefaultLogger.SetLevel(hlog.InfoLevel)

hlog.Info("Checking if by_hash index can be removed...")
Maybe ln 507-523 could be refactored slightly: pull the details up into their own function, and have the cmd here invoke a standard interface for each available optimization function, e.g.
func(ctx Context, sess Session) error
Or hold off and consider the refactor when/if other optimizations come up later.
// transaction in history_transactions table without transaction_hash_prefix
// field set.
func (q *Q) TransactionsWithoutPrefixExist(ctx context.Context) (bool, error) {
	var id int64
Just wondering about a potential performance option: does the SQL query need to run forever? If the row count > 0 and there are no NULL transaction_hash_prefix values, could that first 'false' reading be safely cached here in memory, so ensuing requests short-circuit to the cached state and skip the SQL round trip? Or could that state regress, with new rows still arriving with transaction_hash_prefix = NULL?
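If it helps the discussion, a rough sketch of that caching idea (a hypothetical wrapper, not existing Horizon code). It only works if the answer is sticky, i.e. once every row has a prefix, new rows always arrive with one too — which is exactly the regression question above:

package cache

import (
	"context"
	"sync"
)

// checker is whatever exposes the existence query (history.Q in this PR).
type checker interface {
	TransactionsWithoutPrefixExist(ctx context.Context) (bool, error)
}

// cachedChecker remembers a "false" answer so later requests skip the SQL round trip.
type cachedChecker struct {
	mu          sync.Mutex
	allPrefixed bool // sticky: once true, never rechecked
	inner       checker
}

func (c *cachedChecker) TransactionsWithoutPrefixExist(ctx context.Context) (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.allPrefixed {
		return false, nil // short-circuit on the cached result
	}
	exist, err := c.inner.TransactionsWithoutPrefixExist(ctx)
	if err != nil {
		return false, err
	}
	if !exist {
		c.allPrefixed = true
	}
	return exist, nil
}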
	return q.Get(ctx, dest, union)
} else {
	byHash := selectTransaction.Where("ht.transaction_hash = ?", hash)
in this case don't we still need to union with the inner hash query?
Use: "optimize-schema", | ||
Short: "checks for possible performance improvements and execute schema migrations", | ||
Long: "Some schema migrations that improve performance can be extremely slow, depending on number of rows in tables. " + | ||
"This command detects if it's possible to run such migrations fast and runs them.", |
unclear here if you mean that the detection is fast, or the migration is fast
// TransactionsWithoutPrefixExist returns true if there is at least one
// transaction in history_transactions table without transaction_hash_prefix
// field set.
func (q *Q) TransactionsWithoutPrefixExist(ctx context.Context) (bool, error) {
A bit ugly to be leaking schema details and migrations through the db interface...
Instead, maybe we should push the optimization implementation down here into the db layer, and expose func (q *Q) MaybeOptimizeSchema(ctx context.Context) (bool, error), or something?
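Something like this hypothetical shape, maybe (just a sketch of the suggestion that would slot into the existing history package, not actual Horizon code; it assumes the ExecRaw helper used elsewhere in this PR and the TransactionsWithoutPrefixExist check introduced here):

// MaybeOptimizeSchema drops the old by_hash index once every transaction row
// has transaction_hash_prefix set. Returns true if the optimization was applied.
func (q *Q) MaybeOptimizeSchema(ctx context.Context) (bool, error) {
	withoutPrefix, err := q.TransactionsWithoutPrefixExist(ctx)
	if err != nil {
		return false, err
	}
	if withoutPrefix {
		// Prefix column not fully populated yet: keep the old index.
		return false, nil
	}
	// All rows have a prefix, so queries can rely on the new index.
	if _, err := q.ExecRaw(ctx, "DROP INDEX IF EXISTS by_hash"); err != nil {
		return false, err
	}
	return true, nil
}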
// let's use the index on `transaction_hash_prefix` because index on
// `transaction_hash` could be removed by Horizon admins for performance
// reasons (faster ingestion).
txWithoutPrefixExist, outerErr := q.TransactionsWithoutPrefixExist(ctx)
Would this code be easier to read if it had fewer double-negatives?
- txWithoutPrefixExist, outerErr := q.TransactionsWithoutPrefixExist(ctx)
+ txPrefixIndexReady, outerErr := q.TransactionPrefixIndexReady(ctx)
(ready/available/ok/whatever)
Then below you could do:
if txPrefixIndexReady {
// All transactions have prefix field set -> use new index
// ...
} else {
// no index
}
PR Checklist

PR Structure
- This PR has reasonably narrow scope (if not, break it down into smaller PRs).
- This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
- This PR's title starts with the name of the package that is most changed in the PR, ex. services/friendbot, or all or doc if the changes are broad or impact many packages.

Thoroughness
- This PR adds tests for the most critical parts of the new functionality or fixes.
- I've updated any docs (developer docs, .md files, etc... affected by this change). Take a look in the docs folder for a given service, like this one.

Release planning
- I've updated the relevant CHANGELOG (here for Horizon) if needed with deprecations, added features, breaking changes, and DB schema changes.
- I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.
What

This commit removes the by_hash index and introduces a new field, transaction_hash_prefix (and an index on it). This makes the duration of TransactionProcessor comparable to other processors (around a 3-4x speed improvement) while keeping queries by transaction hash fast (however, fast SELECT queries require reingestion).

Why

After analyzing the speed of queries sent by TransactionProcessor we understood that the majority of time is spent updating indexes on string fields. In #4085 unused indexes were removed, but one was left that is actually used by the /transactions/{id} endpoint: by_hash on the transaction_hash field. Removing it makes the aforementioned endpoint extremely slow and, in reality, useless because of timeouts.

The solution in this commit creates a bigint field that represents a 16-character (8-byte) prefix of the tx hash in numeric form (thus updating such an index is fast). As a result, all transactions are put into math.MaxUint64 buckets. The buckets are used for the index scan in the SELECT statement and the found rows are then filtered by transaction_hash. In practice the number of buckets is vastly larger than the number of existing transactions in the public network, so we should see more than one transaction in a bucket very rarely or never.
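To make the prefix concrete, here is a minimal sketch of the computation (a sketch only; the actual Horizon code may differ in details such as validation): take the first 16 hex characters (8 bytes) of the transaction hash, parse them as an unsigned 64-bit integer, and store the same bits as a signed bigint, since Postgres has no unsigned 64-bit type; wrapping into negative values is expected and harmless because the value is only used for bucketing.

package main

import (
	"fmt"
	"strconv"
)

// hashPrefix maps a hex transaction hash to a signed 64-bit bucket value.
func hashPrefix(hash string) (int64, error) {
	if len(hash) < 16 {
		return 0, fmt.Errorf("hash too short: %q", hash)
	}
	// First 16 hex characters = first 8 bytes of the hash.
	p, err := strconv.ParseUint(hash[:16], 16, 64)
	if err != nil {
		return 0, err
	}
	// Postgres bigint is signed, so reinterpret the bits as int64; overflow expected.
	return int64(p), nil
}

func main() {
	p, _ := hashPrefix("2374e99349b9ef7dba9a5db3339b78fda8f34777b1af33ba468ad5c0df946d4d")
	fmt.Println(p) // prefix bucket used by the index scan
}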
Known limitations

The problem with this solution is that while ingestion becomes faster, the select queries are slow until the prefix column is populated by reingestion. Also, while creating a new NULLable field is fast in Postgres, creating an index on it is actually very slow for a table with 2B rows (in stg it took around 6h to create such an index).