
Copy WAL frames through temp file to shadow #474

Merged

Conversation

@hifi (Collaborator) commented Apr 28, 2023

This implementation replicates the new behavior from the legacy branch, using a temporary file instead of memory to buffer WAL frames between commits.

This fixes excessive memory use when the WAL contains massive commits caused by big inserts or migrations.

A small refactor, already done in the legacy branch, would remove the need for the temporary file: keep the cached valid position of the shadow WAL in the DB struct and seek the real WAL to the last valid position, instead of expecting the file to be completely intact (which isn't guaranteed). After that we could write pages directly from the real WAL to the shadow WAL without worrying about partial pages being treated as part of the WAL.
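For illustration, here is a minimal sketch of the temp-file approach. This is not the actual Litestream code; `copyFramesViaTemp` and the surrounding names are hypothetical. It assumes the reader is already positioned past the 32-byte WAL header and uses the commit marker in each 24-byte frame header (a non-zero database-size field) to decide when the staged frames are safe to flush:

```go
package walcopy

import (
	"encoding/binary"
	"io"
	"os"
)

const walFrameHdrSize = 24

// copyFramesViaTemp reads frames of walFrameHdrSize+pageSize bytes from the
// real WAL and stages them in a temporary file. When a commit frame arrives
// (the "database size in pages" field in the frame header is non-zero), the
// staged bytes are appended to the shadow WAL and the temp file is truncated,
// so at most one transaction's worth of frames is ever staged on disk.
func copyFramesViaTemp(realWAL io.Reader, shadowWAL io.Writer, pageSize int) error {
	tmp, err := os.CreateTemp("", "shadow-wal-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	frame := make([]byte, walFrameHdrSize+pageSize)
	for {
		if _, err := io.ReadFull(realWAL, frame); err != nil {
			if err == io.EOF || err == io.ErrUnexpectedEOF {
				return nil // partial trailing frame: not committed, ignore
			}
			return err
		}
		if _, err := tmp.Write(frame); err != nil {
			return err
		}

		// Bytes 4..8 of the frame header hold the database size in pages
		// for commit frames; zero means the transaction is still open.
		if binary.BigEndian.Uint32(frame[4:8]) == 0 {
			continue
		}

		// Commit frame: flush everything staged so far to the shadow WAL,
		// then reset the temp file for the next transaction.
		if _, err := tmp.Seek(0, io.SeekStart); err != nil {
			return err
		}
		if _, err := io.Copy(shadowWAL, tmp); err != nil {
			return err
		}
		if err := tmp.Truncate(0); err != nil {
			return err
		}
		if _, err := tmp.Seek(0, io.SeekStart); err != nil {
			return err
		}
	}
}
```

The key property is that memory use stays constant regardless of transaction size: the staging cost moves to disk, which the refactor described above would later eliminate entirely.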

@hifi (Collaborator, Author) commented Apr 28, 2023

As for why this matters, I created a pathological test case that forced the WAL to grow heavily within a single write transaction.

[Screenshot: container memory usage over time during the test runs]

Between 10:05 and 10:18 I ran increasingly heavy write transactions against a Litestreamed database (a downstream-patched v0.3.9) that did not have this backport from the legacy branch. Eventually the container went into an OOMKill loop, because every time Litestream started it would exhaust all of the RAM it was given.

After the gap I ran the same tests with this patch applied. I suspect it needed as much RAM as it did only because of GC pressure, but regardless it survived. Between 10:40 and 10:45 I dropped a 6-gigabyte table, which would normally cause the same issue, since Litestream would end up buffering those 6 gigs into RAM.
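For reference, a reproduction along these lines is easy to put together. The sketch below is hypothetical (not the exact test used here) and assumes the mattn/go-sqlite3 driver; it grows the WAL by several gigabytes inside a single write transaction, which an unpatched replicator must buffer in full:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	// Open the database in WAL mode so writes accumulate in the WAL file.
	db, err := sql.Open("sqlite3", "stress.db?_journal_mode=WAL")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS blobs (id INTEGER PRIMARY KEY, data BLOB)`); err != nil {
		log.Fatal(err)
	}

	// One transaction, many large rows: the WAL cannot be checkpointed
	// mid-transaction, so it keeps growing until the final commit frame.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	chunk := make([]byte, 1<<20) // 1 MiB per row
	for i := 0; i < 4096; i++ {  // roughly 4 GiB of WAL before the commit
		if _, err := tx.Exec(`INSERT INTO blobs (data) VALUES (?)`, chunk); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```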

@benbjohnson benbjohnson merged commit e0493f9 into benbjohnson:main Aug 8, 2023