raft: Garbage collect WAL files #1327

Merged 1 commit into moby:master from the gc-wals branch on Aug 10, 2016

Conversation

aaronlehmann (Collaborator)

We currently garbage collect snapshot files (keeping only
KeepOldSnapshots outdated snapshots, which defaults to 0). However, we
don't garbage collect the WAL files that the snapshots replace.

Delete any WALs which are so old that they only contain information that
predates the oldest of the snapshots we have retained. This means that
by default, we will remove old WALs once they are supplanted by a
snapshot. However, if KeepOldSnapshots is set above 0, we will keep
whichever WALs are necessary to catch up from the oldest of the retained
snapshots. This makes sure that the old snapshots we retain are actually
useful, and avoids adding an independent knob for WAL retention that
might end up with an inconsistent setting.

Also, fix serious brokenness in the deletion of old snapshots (it
was deleting the most recent outdated snapshots instead of the oldest).

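To make the retention rule above concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes etcd-style WAL names of the form %016x-%016x.wal, where the second field is roughly the raft index at which the segment begins, and a hypothetical oldestSnapshotIndex taken from the oldest snapshot we decided to keep:

```go
// Minimal sketch only; the real implementation lives in manager/state/raft.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// parseWALName pulls the sequence and index fields out of an etcd-style
// WAL filename ("%016x-%016x.wal").
func parseWALName(name string) (seq, index uint64, err error) {
	_, err = fmt.Sscanf(name, "%016x-%016x.wal", &seq, &index)
	return seq, index, err
}

// removeOldWALs deletes WAL segments whose contents entirely predate
// oldestSnapshotIndex. A segment is only removable when the *next*
// segment also begins at or before the snapshot index, so every entry
// needed to catch up from the oldest retained snapshot stays on disk.
func removeOldWALs(walDir string, oldestSnapshotIndex uint64) error {
	entries, err := os.ReadDir(walDir)
	if err != nil {
		return err
	}
	var names []string
	for _, e := range entries {
		if strings.HasSuffix(e.Name(), ".wal") {
			names = append(names, e.Name())
		}
	}
	// Lexical order matches chronological order with this naming scheme.
	sort.Strings(names)

	for i := 0; i+1 < len(names); i++ {
		_, nextIndex, err := parseWALName(names[i+1])
		if err != nil {
			return err
		}
		if nextIndex > oldestSnapshotIndex {
			// names[i] may still hold entries the snapshot doesn't cover; stop here.
			break
		}
		if err := os.Remove(filepath.Join(walDir, names[i])); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Hypothetical path and index, for illustration only.
	if err := removeOldWALs("/var/lib/docker/swarm/raft/wal", 12345); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```
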
Tested with the following patch, using a failure loop to create a steady stream of writes to raft and restarting the daemon frequently:

diff --git a/manager/state/raft/raft.go b/manager/state/raft/raft.go
index e4d6360..0c96c51 100644
--- a/manager/state/raft/raft.go
+++ b/manager/state/raft/raft.go
@@ -301,8 +301,8 @@ func DefaultNodeConfig() *raft.Config {
 // DefaultRaftConfig returns a default api.RaftConfig.
 func DefaultRaftConfig() api.RaftConfig {
    return api.RaftConfig{
-       KeepOldSnapshots:           0,
-       SnapshotInterval:           10000,
+       KeepOldSnapshots:           2,
+       SnapshotInterval:           5,
        LogEntriesForSlowFollowers: 500,
        ElectionTick:               3,
        HeartbeatTick:              1,
diff --git a/vendor/github.com/coreos/etcd/wal/wal.go b/vendor/github.com/coreos/etcd/wal/wal.go
index 99f1a9c..3004f43 100644
--- a/vendor/github.com/coreos/etcd/wal/wal.go
+++ b/vendor/github.com/coreos/etcd/wal/wal.go
@@ -46,7 +46,7 @@ const (

    // the expected size of each wal segment file.
    // the actual size might be bigger than it.
-   segmentSizeBytes = 64 * 1000 * 1000 // 64MB
+   segmentSizeBytes = 64 * 1000 // 64k
 )

 var (

Also added a unit test that adds a lot of data to force a WAL rollover, checks that the old WAL is deleted iff it completely predates the snapshot, and restarts the cluster to make sure the remaining wal/snapshot combination contains all the data.

cc @abronan @runshenzhu @aluzzardi

@aluzzardi (Member)

/cc @chanwit, this will fix the problem you had with wal files taking too much space.

I believe that if you want to try this out, you could stop dockerd, start swarmd with this patch and -d /var/lib/docker/swarm and it will automatically garbage collect the extra .wal files.

@aaronlehmann is that correct?

@aaronlehmann (Collaborator, Author)

Not quite. The garbage collection happens every time a snapshot is triggered, so it takes on the order of 10,000 raft commits before it happens.
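
For context, a rough sketch of that trigger (hypothetical names, not the swarmkit source): WAL cleanup piggybacks on snapshot creation, which only fires after SnapshotInterval applied entries, and SnapshotInterval defaults to 10000 as shown in the test patch above.

```go
package main

import "fmt"

// shouldSnapshot is a hypothetical illustration: a snapshot, and with it
// the WAL garbage collection, only runs once snapshotInterval entries
// have been applied since the last snapshot.
func shouldSnapshot(appliedIndex, lastSnapshotIndex, snapshotInterval uint64) bool {
	return appliedIndex-lastSnapshotIndex >= snapshotInterval
}

func main() {
	fmt.Println(shouldSnapshot(10500, 0, 10000)) // true: snapshot + WAL GC run
	fmt.Println(shouldSnapshot(9000, 0, 10000))  // false: old WALs stay for now
}
```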

@aaronlehmann force-pushed the gc-wals branch 2 times, most recently from ecd52e4 to 24ed09e on August 5, 2016
@aaronlehmann (Collaborator, Author)

I'm having some trouble with the unit test in CI. It seems to get stuck, though it passes locally for me. I think it may just be too memory-intensive for the CI virtual machine. The WAL package has a hardcoded 64 MB file rotation threshold, so we need to fill the logs with that much data; multiplied across the 3 nodes in the cluster, that apparently ends up being too much for the VM to handle.

I've skipped the test when -test.short is specified, for now. Let me know if there are any better ideas.
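
For reference, the skip uses the standard testing.Short() pattern; the test name below is illustrative, not necessarily the one in this PR:

```go
package raft_test

import "testing"

func TestGCWAL(t *testing.T) {
	if testing.Short() {
		// Filling three nodes' WALs past the 64 MB rotation threshold is too
		// heavy for constrained CI machines, so opt out under -test.short.
		t.Skip("skipping WAL garbage collection test in short mode")
	}
	// ... fill the log until a WAL rollover occurs, snapshot, then assert the
	// old WAL file is gone and the cluster restarts with all data intact.
}
```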

@codecov-io commented Aug 5, 2016

Current coverage is 55.07% (diff: 43.10%)

Merging #1327 into master will decrease coverage by <.01%

@@             master      #1327   diff @@
==========================================
  Files            80         80          
  Lines         12582      12606    +24   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6931       6943    +12   
- Misses         4693       4702     +9   
- Partials        958        961     +3   

Powered by Codecov. Last update 866211a...c72cdf1

err := os.Remove(filepath.Join(n.snapDir(), snapFile))
if err != nil && removeErr == nil {
removeErr = err
if curSnapshotIdx >= 0 && i > curSnapshotIdx {
runshenzhu (Contributor)

Can we construct remainingSnapshots within this loop? That would avoid the extra ioutil.ReadDir call and reduce disk I/O (or maybe it doesn't hurt performance because of caching; I don't know).

It would also remove some redundant code.

aaronlehmann (Collaborator, Author)

@runshenzhu: Updated.

}

// Sort snapshot filenames in lexical order
sort.Sort(sort.StringSlice(remainingSnapshots))
runshenzhu (Contributor)

remainingSnapshots is already sorted (in reverse order), because snapshots is sorted.

We could even replace this slice with var oldestSnap string and build it in the loop above.

aaronlehmann (Collaborator, Author)

@runshenzhu: Further simplified.

@runshenzhu (Contributor)

LGTM

@chanwit commented Aug 6, 2016

Thank you @aaronlehmann for implementing this and @aluzzardi for the ping.
This PR SGTM.

An additional request here: is it possible to make KeepOldSnapshots configurable?
This would be really great:

  • for production, we set the value to 2 and disk space will never run out.
  • for testing, we set the value to 0 and Swarm will keep all WALs for time-based troubleshooting (like we did during the analysis).

@aaronlehmann (Collaborator, Author)

Yes, I agree that KeepOldSnapshots should be configurable. I opened an issue on Docker a few days ago: moby/moby#25451

@aaronlehmann (Collaborator, Author)

But note that setting KeepOldSnapshots to 0 would mean that it only keeps the latest snapshot. If you want to keep all of them, it should be set to a large number.

@chanwit commented Aug 9, 2016

@aaronlehmann I see, thank you for that :-)

@abronan (Contributor) commented Aug 10, 2016

LGTM

@abronan merged commit f0e46aa into moby:master on Aug 10, 2016
@aaronlehmann deleted the gc-wals branch on August 10, 2016

@chanwit commented Aug 11, 2016

🎉 🎉

@aaronlehmann mentioned this pull request Aug 11, 2016