Skip to content

Commit

Permalink
utils: remove unnecessary fsync in durable_rename() (#9686)
Browse files Browse the repository at this point in the history
## Problem

WAL segment fsyncs significantly affect WAL ingestion throughput.
`durable_rename()` is used when initializing every 16 MB segment, and
issues 3 fsyncs of which 1 was unnecessary.

## Summary of changes

Remove an fsync in `durable_rename` which is unnecessary with Linux and
ext4 (which we currently use). This improves WAL ingestion throughput by
up to 23% with large appends on my MacBook.
  • Loading branch information
erikgrinaker authored Nov 12, 2024
1 parent cef1658 commit 05381a4
Showing 1 changed file with 20 additions and 11 deletions.
31 changes: 20 additions & 11 deletions libs/utils/src/crashsafe.rs
Original file line number Diff line number Diff line change
Expand Up @@ -123,15 +123,27 @@ pub async fn fsync_async_opt(
Ok(())
}

/// Like postgres' durable_rename, renames file issuing fsyncs do make it
/// durable. After return, file and rename are guaranteed to be persisted.
/// Like postgres' durable_rename, renames a file and issues fsyncs to make it durable. After
/// returning, both the file and rename are guaranteed to be persisted. Both paths must be on the
/// same file system.
///
/// Unlike postgres, it only does fsyncs to 1) file to be renamed to make
/// contents durable; 2) its directory entry to make rename durable 3) again to
/// already renamed file, which is not required by standards but postgres does
/// it, let's stick to that. Postgres additionally fsyncs newpath *before*
/// rename if it exists to ensure that at least one of the files survives, but
/// current callers don't need that.
/// Unlike postgres, it only fsyncs 1) the file to make contents durable, and 2) the directory to
/// make the rename durable. This sequence ensures the target file will never be incomplete.
///
/// Postgres also:
///
/// * Fsyncs the target file, if it exists, before the rename, to ensure either the new or existing
/// file survives a crash. Current callers don't need this as it should already be fsynced if
/// durability is needed.
///
/// * Fsyncs the file after the rename. This can be required with certain OSes or file systems (e.g.
/// NFS), but not on Linux with most common file systems like ext4 (which we currently use).
///
/// An audit of 8 other databases found that none fsynced the file after a rename:
/// <https://github.com/neondatabase/neon/pull/9686#discussion_r1837180535>
///
/// eBPF probes confirmed that this is sufficient with ext4, XFS, and ZFS, but possibly not Btrfs:
/// <https://github.com/neondatabase/neon/pull/9686#discussion_r1837926218>
///
/// virtual_file.rs has similar code, but it doesn't use vfs.
///
Expand All @@ -149,9 +161,6 @@ pub async fn durable_rename(
// Time to do the real deal.
tokio::fs::rename(old_path.as_ref(), new_path.as_ref()).await?;

// Postgres'ish fsync of renamed file.
fsync_async_opt(new_path.as_ref(), do_fsync).await?;

// Now fsync the parent
let parent = match new_path.as_ref().parent() {
Some(p) => p,
Expand Down

1 comment on commit 05381a4

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5473 tests run: 5240 passed, 3 failed, 230 skipped (full report)


Failures on Postgres 16

  • test_sharded_ingest[github-actions-selfhosted-1]: release-x86-64
  • test_storage_controller_many_tenants[github-actions-selfhosted]: release-x86-64
  • test_compaction_l0_memory[github-actions-selfhosted]: release-x86-64
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharded_ingest[release-pg16-github-actions-selfhosted-1] or test_storage_controller_many_tenants[release-pg16-github-actions-selfhosted] or test_compaction_l0_memory[release-pg16-github-actions-selfhosted]"

Code coverage* (full report)

  • functions: 31.8% (7890 of 24834 functions)
  • lines: 49.5% (62445 of 126257 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
05381a4 at 2024-11-12T20:01:45.505Z :recycle:

Please sign in to comment.