Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change limits of when we delete replication slots #8619

Closed
2 of 3 tasks
Tracked by #6213
ololobus opened this issue Aug 6, 2024 · 6 comments
Closed
2 of 3 tasks
Tracked by #6213

Change limits of when we delete replication slots #8619

ololobus opened this issue Aug 6, 2024 · 6 comments
Assignees

Comments

@ololobus
Copy link
Member

ololobus commented Aug 6, 2024

  • Metric on the number of snap files
  • Pageserver: metric for number of aux files (we have pageserver_aux_file_estimated_size, so it's enough for now)
  • Adjust metric for slot removal after lagging for >N GB (we can probably relax it)

Currently, when we have >1000 snap files, we delete the slot. We can probably remove (bump) logic for replication slot removal

"neon.logical_replication_max_snap_files",

John: That'd be great for all AUX files to occupy 32 MB (???)
Important comment from Heikki on how to generate big aux files: #6626 (comment)

@tristan957
Copy link
Member

Sounds like we want a default of 32MB?

@ololobus
Copy link
Member Author

@arssher last planning we discussed adding the size-based equivalent for neon.logical_replication_max_snap_files because the cumulative size is what we also care for AUX files. I wonder, was there any specific reasoning for adding a files-count-based limit?

If not, I think we could have both, like 10k count limit and 10 MB or so for size (check the right numbers with storage)

cc @tristan957

@arssher
Copy link
Contributor

arssher commented Oct 18, 2024

I wonder, was there any specific reasoning for adding a files-count-based limit?

When we were adding this, the count was the main problem / bottleneck due to O(n^2) operations in the pageserver. Since this is apparently solved maybe size limit also reasonable. But it is probably not very different because snap files contain list of running xacts, which would usually be in range 0-100, maybe 1000. Hm, though not sure if subxacts are separately tracked there, probably they are.

@knizhnik
Copy link
Contributor

Size of LR snapshot files doesn't;t depend on number of subxids (at least as far as I understand).
looking at the code it should depend on mummer of committed transactions:

	/* copy committed xacts */
	if (builder->committed.xcnt > 0)
	{
		sz = sizeof(TransactionId) * builder->committed.xcnt;
		memcpy(ondisk_c, builder->committed.xip, sz);
		COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
		ondisk_c += sz;
	}

... but looks like I do not completely understand which committed transactions are taken in account.
I tried to get large snapshot files with the following test:

import time
from fixtures.neon_fixtures import (
    NeonEnv,
    logical_replication_sync,
)
from fixtures.log_helper import log


def test_snap_files(neon_simple_env: NeonEnv, vanilla_pg):
    env = neon_simple_env

    tenant_id = env.initial_tenant
    timeline_id = env.initial_timeline
    endpoint = env.endpoints.create_start("main", config_lines=["log_statement=all"])

    con1 = endpoint.connect()
    cur1 = con1.cursor()

    con2 = endpoint.connect()
    cur2 = con2.cursor()

    cur1.execute("create table t(pk serial primary key, payload integer)")
    cur1.execute("create publication pub1 for table t")

    # now start subscriber
    vanilla_pg.start()
    vanilla_pg.safe_psql("create table t(pk integer primary key, payload integer)")

    connstr = endpoint.connstr().replace("'", "''")
    log.info(f"ep connstr is {endpoint.connstr()}, subscriber connstr {vanilla_pg.connstr()}")
    vanilla_pg.safe_psql(f"create subscription sub1 connection '{connstr}' publication pub1")

    # Wait logical replication channel to be established
    logical_replication_sync(vanilla_pg, endpoint)

    # insert some data
    cur1.execute("begin")
    cur1.execute("insert into t (payload) values (0)")
    for _ in range(10000):
        cur2.execute("insert into t (payload) values (0)")
    log.info("Waiting 1 minute")
    time.sleep(60)
    cur1.execute("commit")

    logical_replication_sync(vanilla_pg, endpoint)
    assert vanilla_pg.safe_psql_scalar("SELECT count(*) FROM t") == 10001

but still size of snapshot file is 144 bytes.
May be somebody has idea how to get larger snapshot files?

In any case it seems to be good idea to calculate total size of snapshot files rather than just their number.

Also there was some concerns that larger transaction can cause generation of large snapshot files.
From the code it is clean that it is not true.
But in any case I tried LR with larger transactions.
Two observation:

  1. Large transactions doesn't increase size of snapshot file or their number, snapshots are still generated each 15 seconds and their sizes 144 bytes.
  2. By default larger transaction is placed in reorder buffer and is spilled to disk. So transaction size is limited by local disk size. To prevent it, subscription should be created with streaming=parallel option. There is now discussion in hackers whether parallel option should be switched on by default:
    https://www.postgresql.org/message-id/flat/TYAPR01MB5692F83F3C7A383FD13A209CF57D2%40TYAPR01MB5692.jpnprd01.prod.outlook.com#c3c2c14d14[…]71f730f9b20

@skyzh
Copy link
Member

skyzh commented Oct 21, 2024

regarding the size limit, pageserver currently issues a single vectored read for aux files when generating the basebackup, and we will print a warning if the size is >= 128KB. While a majority of tenants fall below this limit, pageserver can also handle aux file basebackups ~10MB.

tristan957 added a commit that referenced this issue Oct 21, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Oct 21, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Oct 21, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Oct 23, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Oct 29, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Oct 29, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Nov 8, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Nov 11, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Nov 11, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
tristan957 added a commit that referenced this issue Nov 12, 2024
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: #8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
@tristan957
Copy link
Member

#9887 should be ready to merge today if I get a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants