Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pop logs in log segment after cleaning up logs #3064

Conversation

ion-elgreco
Copy link
Collaborator

Description

The description of the main changes of your pull request

Related Issue(s)

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Dec 16, 2024
@ion-elgreco ion-elgreco force-pushed the fix--pop-logs-in-logsegment-after-cleanup branch 3 times, most recently from 42e9a94 to e4af46f Compare December 16, 2024 22:20
@ion-elgreco ion-elgreco force-pushed the fix--pop-logs-in-logsegment-after-cleanup branch from e4af46f to b096bab Compare December 17, 2024 19:21
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely should not be producing corrupt state and endanger querying an invalid snapshot.

However I hope we can consider a little bit the implications of some of these changes and the root cause we are looking at.

Need to dive a bit deeper, but right now I don't really understand how this can happen - if the metadata is part of the snapshot, it should still be valid? Or rather why would they get cleaned up if they are part of teh active snapshot, without any operations in-between that would cause the actions to no longer be part of a version?

Certain I am missing something, just don't know what this is.

More generally, log cleanup would usually be not on "the critical path" when it comes to cleanup it seems these would run even in completely separate processes? SO rather then trying to update the internal log, could we just consume the state and trigger a new replay? Or something along those lines?

In the "Pure" thinking snapshots are immutable - incremental updates are an optimization, but actually this just treats the current state as snapshot and replays new files on top of this.

Last but not least, I am hoping to integrate more with kernel and already did some promising experiments. Ideally we would keep the surface area of the snapshot minimal until we have some more clarity.

@ion-elgreco
Copy link
Collaborator Author

ion-elgreco commented Dec 17, 2024

@roeap The snapshot is not valid anymore after you remove the logs from the object store. The logs don't exist, hence the snapshot should not reference them anymore.

However in its current state we still reference files in the log segment that are deleted from the object store, which causes vacuum to fail because it tries to read each file from the log segment

@roeap
Copy link
Collaborator

roeap commented Dec 17, 2024

The snapshot is not valid anymore after you remove the logs from the object store.

I still donÄT understand how that situation arises :) - are we deleting data?

hence the snapshot should not reference them anymore.

could we just discard the snapshot then? i.e. have the function just consume the snapshot? It is a snapshot after all :) which can go invalid if the table is updated ...

@ion-elgreco
Copy link
Collaborator Author

ion-elgreco commented Dec 17, 2024

The snapshot is not valid anymore after you remove the logs from the object store.

I still donÄT understand how that situation arises :) - are we deleting data?

Yes, cleanup_metadata removes the physical log files, and also post_commit_hook removes logs if the interval for it has been met.

So in two instances the Snapshot will become invalid. Only by forcefully reloading the table it will be fixed. The linked issue mentions the behaviour

hence the snapshot should not reference them anymore.

could we just discard the snapshot then? i.e. have the function just consume the snapshot? It is a snapshot after all :) which can go invalid if the table is updated ...

Replaying the log stream seems more compute intensive, then just popping the log files you know got deleted from the object store

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cleanup_expired_logs_for doesn't update logSegment after removal
2 participants