
services/horizon: Remove by_hash index, replace with index on hash prefix #4087

Closed · wants to merge 13 commits

Conversation

@bartekn (Contributor) commented Nov 17, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the name of the package that is most changed in the
    PR, e.g. services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This commit removes the by_hash index and introduces a new field, transaction_hash_prefix (with an index on it). This makes the duration of TransactionProcessor comparable to other processors (around a 3-4x speed improvement) while keeping queries by transaction hash fast (however, fast SELECT queries require reingestion).

Why

After analyzing the speed of queries sent by TransactionProcessor we found that the majority of the time is spent updating indexes on string fields. In #4085 unused indexes were removed, but one was left that is actually used by the /transactions/{id} endpoint: by_hash on the transaction_hash field. Removing it makes that endpoint extremely slow and, in practice, useless because of timeouts.

The solution in this commit creates a bigint field that holds a 16-character (8-byte) prefix of the tx hash in numeric form (so updating its index is fast). As a result, all transactions are distributed across math.MaxUint64 buckets. The buckets are used for an index scan in the SELECT statement, and the matching rows are then filtered by transaction_hash. In practice the number of buckets is vastly larger than the number of existing transactions on the public network, so we should see more than one transaction in a bucket very rarely or never.
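
A minimal, self-contained Go sketch of the prefix computation described above (the helper name hashPrefix and the exact error handling are illustrative; the PR's actual code may differ slightly):

package main

import (
	"fmt"
	"strconv"
)

// hashPrefix parses the first 16 hex characters (8 bytes) of a
// transaction hash into an int64. Postgres has no unsigned bigint, so
// the uint64 is cast to int64; overflow is expected and harmless because
// the value is only used as an opaque bucket key, never for arithmetic.
func hashPrefix(hash string) (int64, error) {
	if len(hash) < 16 {
		return 0, fmt.Errorf("hash too short: %q", hash)
	}
	prefix, err := strconv.ParseUint(hash[:16], 16, 64)
	if err != nil {
		return 0, err
	}
	return int64(prefix), nil // overflow expected
}

func main() {
	p, err := hashPrefix("2374e99349b9ef7dba9a5db3339b78fda8f34777b1af33ba468ad5c0df946d4d")
	fmt.Println(p, err)
}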

Known limitations

The problem with this solution is that while ingestion becomes faster, the SELECT queries stay slow until the prefix column is populated by reingestion. Also, while creating a new NULLable field is fast in Postgres, creating an index on it is very slow for a table with 2B rows (in stg it took around 6h to create such an index).

return 0, errors.Wrap(err, "error calculating hashHexHash")
}

return int64(hexHashPrefix), nil // overflow expected

Contributor:

why not simply use uint64?

Contributor:

Ah, because postgres doesn't support it?

Contributor:

It may be worth clarifying it.

Contributor (author):

Correct, postgres doesn't support unsigned ints. I'll update the comment with clarifications.

@paulbellamy (Contributor), Nov 18, 2021:

Could use numeric for the column, but bleh, and will use more bytes.

@bartekn marked this pull request as ready for review, November 18, 2021 18:37
-- +migrate Up
ALTER TABLE history_transactions ADD COLUMN transaction_hash_prefix bigint;
CREATE INDEX transaction_hash_prefix on history_transactions (transaction_hash_prefix);
DROP INDEX by_hash;

Contributor:

Should we include:

UPDATE history_transactions SET transaction_hash_prefix = ('x' || lpad(transaction_hash, 16, '0'))::bit(64)::bigint where transaction_hash_prefix is null;

here to avoid forcing a reingestion? Might take a while to run?

Contributor (author):

I tried it in fa49b48 and then removed it: I cancelled it after 10h in stg. I can try again but I think reingestion is better because you can actually track progress.

@sreuland (Contributor), Nov 23, 2021:

maybe could move the UPDATE to be done on a smaller block of rows obtained from an outer select loop, and then checkpoint after each iteration to keep the commit log space down?

pseudo:

CTR = 0
LOOP WHILE CTR < MAX(ID) or something
UPDATE history_transactions .. where id in (SELECT id from history_transactions where id > CTR LIMIT 10000 )
CHECKPOINT
CTR+=10000
END LOOP
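
One way that batching idea could look in plain SQL (a hedged sketch, not from the PR: it assumes PostgreSQL 11+ so COMMIT is allowed inside a DO block, reuses the hex-to-bigint conversion suggested earlier in this thread, and uses an illustrative batch size):

-- Backfill transaction_hash_prefix in id-ranged batches, committing
-- between batches so dead row versions can be vacuumed and the log
-- doesn't grow under a single huge transaction.
DO $$
DECLARE
  cur_id bigint := 0;
  max_id bigint;
BEGIN
  SELECT max(id) INTO max_id FROM history_transactions;
  WHILE cur_id <= max_id LOOP
    UPDATE history_transactions
      SET transaction_hash_prefix = ('x' || lpad(transaction_hash, 16, '0'))::bit(64)::bigint
      WHERE id > cur_id AND id <= cur_id + 100000
        AND transaction_hash_prefix IS NULL;
    cur_id := cur_id + 100000;
    COMMIT;
  END LOOP;
END $$;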

Contributor (author):

I think the speed of this query is now less of a problem than the space required. history_transactions is one of the largest tables in a full-history DB, and it needs to be rewritten, so it requires twice as much space. When I ran this in stg we basically ran out of space. So I think I'll implement the idea in: #4087 (comment).

Contributor:

ok, cool. I was thinking that the pg commit log gets huge when attempting one UPDATE statement on all rows under one TX, so by CHECKPOINT'ing on a smaller batch of rows, PG could flush the commit log storage areas often, potentially keeping mem/disk usage lower.

Contributor (author):

Oh, I see what you mean. Looking at the charts, it seems Postgres is actually able to flush the log during the transaction: "Transaction Logs Disk Usage" peaked at 5GB, and during that time "Free Storage Space" went from 1TB to 0. So it looks like the majority of the data is actually updated rows.

@@ -0,0 +1,9 @@
-- +migrate Up
ALTER TABLE history_transactions ADD COLUMN transaction_hash_prefix bigint;
CREATE INDEX transaction_hash_prefix on history_transactions (transaction_hash_prefix);

Contributor:

Does CREATE INDEX CONCURRENTLY help this build faster? and/or adding a WHERE transaction_hash_prefix IS NOT NULL?

Contributor (author):

I'm afraid that CONCURRENTLY will affect the performance a lot because of how big the table is. I'll try WHERE transaction_hash_prefix IS NOT NULL.

Contributor:

concurrently doesn't make it use more db cores, it just lets other txns happen while the index is being built. But probably safer without anyway.

I think the WHERE transaction_hash_prefix IS NOT NULL would hurt your TransactionsWithoutPrefixExist query, so also probably not worth it.

Contributor (author):

> concurrently doesn't make it use more db cores, it just lets other txns happen while the index is being built. But probably safer without anyway.

Postgres docs say it actually can affect performance:

> PostgreSQL supports building indexes without locking out writes. This method is invoked by specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially modify or use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment. Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.
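
For reference, the two alternatives discussed above would look like this (illustrative only; the PR's migration keeps the plain CREATE INDEX):

-- Build without blocking writes; slower overall because Postgres scans
-- the table twice and waits for in-flight transactions:
CREATE INDEX CONCURRENTLY transaction_hash_prefix
  ON history_transactions (transaction_hash_prefix);

-- Partial index skipping NULLs; smaller, but it cannot serve the
-- "does any row still have a NULL prefix?" check discussed below:
CREATE INDEX transaction_hash_prefix
  ON history_transactions (transaction_hash_prefix)
  WHERE transaction_hash_prefix IS NOT NULL;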

"(ht.transaction_hash_prefix IS NULL and ht.transaction_hash = ?)",
hash,
)
return q.Get(ctx, dest, byHash)

Contributor:

would it be worthwhile to log a debug/trace line here, just for monitoring the runtime profile, to see how often requests fall into this path?

@@ -27,8 +28,19 @@ func TestTransactionQueries(t *testing.T) {
err := q.TransactionByHash(tt.Ctx, &tx, real)
tt.Assert.NoError(err)

fake := "not_real"
err = q.TransactionByHash(tt.Ctx, &tx, fake)
_, err = q.ExecRaw(tt.Ctx, "UPDATE history_transactions SET transaction_hash_prefix = NULL")

@sreuland (Contributor), Nov 18, 2021:

nice test coverage. maybe this new verify warrants its own test case method, like 'TestNonPrefixedTransactionQueries', or maybe just a message on the line 29 assert adding detail like 'tx should be found with hash prefix', and on line 36 'tx should be found without hash prefix'. you already have an inline comment for that context there, which is enough; just mentioning in case it's nice to have visible up front in test display output

@sreuland (Contributor) left a comment:

Nice work! I'm green still, probably not qualified for approval, just observing.

@2opremio (Contributor) commented:

I wonder if this would still be needed if we use #4094 to COPY into a temporary table without indexes and then merge it into the definite table.

@bartekn (author) commented Nov 22, 2021:

> I wonder if this would still be needed if we use #4094 to COPY into a temporary table without indexes and then merge it into the definite table.

I'm not sure the performance improvement in #4094 actually works well with an index on a string field. The benchmark runs on a table without an index.

@bartekn (author) commented Nov 22, 2021:

@stellar/horizon-committers I ran some tests in stg last week and unfortunately I wasn't able to migrate the history_transactions table using an SQL query: after around 10h the DB ran out of disk space (because all rows need to be rewritten but the old ones are still kept until vacuum runs).

I started thinking about an intermediate solution that allows people with a reingested DB to use the improved version, while those with old rows still use the old index on the hash field. I came up with this:

  1. In TransactionByHash we first run the query: select * from history_transactions where transaction_hash_prefix is null limit 1.
    1. If the query returns a row, there are still rows without the prefix set. In that case we use the old query, which relies on the transaction_hash field only.
    2. If no rows are returned, all rows have the prefix set, so we can use the optimized query.
  2. At the same time, during ingestion (because it always has RW access to the DB), we run the same query (select * from history_transactions where transaction_hash_prefix is null limit 1).
    1. If no rows are found, we run DROP INDEX IF EXISTS by_hash.
    2. Otherwise, we do nothing.

By doing this we remove the performance issue in TransactionProcessor as soon as all rows are updated, and we don't break the performance of /transactions/{id} in the meantime. This is a bit tricky and changes the schema outside migrations, so I wanted to check with you first before implementing.
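
A minimal sketch of that fallback logic using database/sql (hypothetical: the actual PR goes through Horizon's query builder and a TransactionsWithoutPrefixExist helper; hashPrefix is the helper sketched earlier, and imports of context and database/sql are assumed):

// txByHash picks the fast prefix-indexed query only when every row has
// transaction_hash_prefix populated; otherwise it falls back to the
// legacy transaction_hash lookup so pre-reingestion DBs keep working.
func txByHash(ctx context.Context, db *sql.DB, hash string) (*sql.Row, error) {
	var withoutPrefix bool
	err := db.QueryRowContext(ctx,
		`SELECT EXISTS (SELECT 1 FROM history_transactions
		                 WHERE transaction_hash_prefix IS NULL)`).Scan(&withoutPrefix)
	if err != nil {
		return nil, err
	}
	if withoutPrefix {
		// Legacy path: some rows predate the prefix column, so the old
		// by_hash index (if still present) serves this lookup.
		return db.QueryRowContext(ctx,
			`SELECT * FROM history_transactions WHERE transaction_hash = $1`,
			hash), nil
	}
	prefix, err := hashPrefix(hash)
	if err != nil {
		return nil, err
	}
	// Fast path: index scan on the bigint bucket, then filter by the
	// full hash to weed out (rare) prefix collisions.
	return db.QueryRowContext(ctx,
		`SELECT * FROM history_transactions
		  WHERE transaction_hash_prefix = $1 AND transaction_hash = $2`,
		prefix, hash), nil
}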

@tamirms (Contributor) commented Nov 23, 2021:

> At the same time, during ingestion (because it always has RW access to the DB), we run the same query (select * from history_transactions where transaction_hash_prefix is null limit 1).
> If no rows are found, we run DROP INDEX IF EXISTS by_hash.
> Otherwise, we do nothing.

@bartekn I'm not sure if it's a good idea to have ingestion essentially perform part of the db migration automatically. perhaps we could have a separate horizon command to drop the index? we could also run the select query when horizon starts up and log a message about how the index should be dropped if no rows are found

@bartekn (author) commented Nov 23, 2021:

Good point. I wanted it to be automagic for users but doing this via CLI command is safer/cleaner.

@bartekn (author) commented Nov 24, 2021:

@stellar/horizon-committers this is ready for review again. I implemented the idea from my comment above in #4087, with @tamirms's suggestion: we don't drop the index automatically during ingestion. Instead, we print a message on startup saying that optimization is possible via the horizon db optimize-schema command (maybe we'll have more optimizations in the future, so I used a generic name).

// Should we finally set default to info in support/log?
hlog.DefaultLogger.SetLevel(hlog.InfoLevel)

hlog.Info("Checking if by_hash index can be removed...")

Contributor:

maybe ln 507-523 could be refactored slightly: pull the details up into their own function, and the cmd here invokes a standard interface for each available optimization function, e.g.:

func(ctx Context, sess Session) error

or hold off and consider the refactor when/if other optimizations come up later.

// transaction in history_transactions table without transaction_hash_prefix
// field set.
func (q *Q) TransactionsWithoutPrefixExist(ctx context.Context) (bool, error) {
var id int64

Contributor:

just wondering about a potential performance option: does the sql query need to run on every request forever? if the row count > 0 and there are no null transaction_hash_prefix rows, could that first 'false' reading be safely cached here in memory, with ensuing requests short-circuiting to the cached state and skipping the sql round trip? or could the state regress to where new rows still arrive with transaction_hash_prefix=null?
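
A sketch of what that caching could look like (hypothetical, not in the PR; imports of context, database/sql, and sync are assumed). The latch is safe only if every new row is written with the prefix set, which holds once ingestion runs the new version:

// cachedPrefixCheck memoizes the first "no NULL prefixes" observation so
// subsequent requests skip the SQL round trip. The cached false state
// cannot regress as long as ingestion always populates
// transaction_hash_prefix on new rows.
type cachedPrefixCheck struct {
	mu          sync.Mutex
	allPrefixed bool // latched true once zero NULL-prefix rows were seen
}

func (c *cachedPrefixCheck) withoutPrefixExist(ctx context.Context, db *sql.DB) (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.allPrefixed {
		return false, nil // short-circuit: no query needed
	}
	var exists bool
	err := db.QueryRowContext(ctx,
		`SELECT EXISTS (SELECT 1 FROM history_transactions
		                 WHERE transaction_hash_prefix IS NULL)`).Scan(&exists)
	if err != nil {
		return true, err
	}
	if !exists {
		c.allPrefixed = true // latch the cheap path from now on
	}
	return exists, nil
}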


return q.Get(ctx, dest, union)
} else {
byHash := selectTransaction.Where("ht.transaction_hash = ?", hash)

Contributor:

in this case don't we still need to union with the inner hash query?

@tamirms changed the base branch from master to release-horizon-v2.12.0, November 30, 2021 09:20
Use: "optimize-schema",
Short: "checks for possible performance improvements and execute schema migrations",
Long: "Some schema migrations that improve performance can be extremely slow, depending on number of rows in tables. " +
"This command detects if it's possible to run such migrations fast and runs them.",

Contributor:

unclear here if you mean that the detection is fast, or the migration is fast

// TransactionsWithoutPrefixExist returns true if there is at least one
// transaction in history_transactions table without transaction_hash_prefix
// field set.
func (q *Q) TransactionsWithoutPrefixExist(ctx context.Context) (bool, error) {

Contributor:

A bit ugly to be leaking schema details and migrations through the db interface...

Instead, maybe we should push the optimization implementation down here into the db layer, and expose func (q *Q) MaybeOptimizeSchema(ctx context.Context) (bool, error), or something?

// let's use the index on `transaction_hash_prefix` because index on
// `transaction_hash` could be removed by Horizon admins for performance
// reasons (faster ingestion).
txWithoutPrefixExist, outerErr := q.TransactionsWithoutPrefixExist(ctx)

Contributor:

Would this code be easier to read if it had fewer double-negatives?

Suggested change
txWithoutPrefixExist, outerErr := q.TransactionsWithoutPrefixExist(ctx)
txPrefixIndexReady, outerErr := q.TransactionPrefixIndexReady(ctx)

(ready/available/ok/whatever)

Then below you could do:

if txPrefixIndexReady {
  // All transactions have prefix field set -> use new index
  // ...
} else {
  // no index
}

@tamirms deleted the branch stellar:release-horizon-v2.12.0, December 7, 2021 08:16

@tamirms closed this Dec 7, 2021