
GC the CAS storage #5136

Closed
stuhood opened this issue Nov 27, 2017 · 10 comments

@stuhood
Member

stuhood commented Nov 27, 2017

We should add a method to Scheduler or Graph to garbage collect any entries in the Store that are not currently mentioned in the Graph, and which either:

  1. are older than X
  2. have not been accessed in Y

... depending on whether storing access times has a significant impact on performance.
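A minimal sketch of that predicate (everything here is hypothetical: the real Store keys blobs by Fingerprint, and would need to persist creation/access metadata alongside each entry):

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

// Hypothetical stand-ins for the real Store's keys and per-entry metadata.
#[derive(PartialEq, Eq, Hash)]
struct Digest(String);

struct EntryMetadata {
  created_at: SystemTime,
  last_accessed: SystemTime,
}

// An entry is collectable if the Graph does not mention it, and it is
// either older than `max_age` (criterion 1) or has gone unaccessed for
// longer than `max_idle` (criterion 2).
fn is_collectable(
  digest: &Digest,
  meta: &EntryMetadata,
  live_in_graph: &HashSet<Digest>,
  max_age: Duration,
  max_idle: Duration,
) -> bool {
  if live_in_graph.contains(digest) {
    return false;
  }
  let now = SystemTime::now();
  let too_old = now
    .duration_since(meta.created_at)
    .map_or(false, |age| age > max_age);
  let idle_too_long = now
    .duration_since(meta.last_accessed)
    .map_or(false, |idle| idle > max_idle);
  too_old || idle_too_long
}
```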

@stuhood
Member Author

stuhood commented Nov 27, 2017

...but actually, if the Store database is going to be shared across workspaces, using access time might be necessary to avoid nuking things out from under other daemons.

@illicitonion
Contributor

Given the atomic update nature of lmdb, we could probably do something clever where a daemon marks a blob as being "leased", and is responsible for re-leasing at least every n hours or something.
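A minimal sketch of the lease idea, with an in-memory map standing in for the LMDB table (all names hypothetical; in the real store each update would be a single atomic write transaction, so concurrent daemons cannot corrupt each other's leases):

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

// In-memory stand-in for an LMDB table mapping digest -> lease expiry.
struct Leases(HashMap<String, SystemTime>);

impl Leases {
  // Mark a blob as leased for `lease_time` from now; re-leasing just
  // overwrites the previous expiry.
  fn lease(&mut self, digest: &str, lease_time: Duration) {
    self.0.insert(digest.to_owned(), SystemTime::now() + lease_time);
  }

  // A blob is protected from GC while any daemon's lease is unexpired.
  fn is_leased(&self, digest: &str) -> bool {
    self
      .0
      .get(digest)
      .map_or(false, |expiry| *expiry > SystemTime::now())
  }
}
```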

@stuhood
Member Author

stuhood commented Nov 28, 2017

+1 on leases... so at read time, if it looks like the lease will expire in X time, renew it.
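The read-time check could be as small as the following sketch (`renewal_window` is the "X time" above; all names hypothetical):

```rust
use std::time::{Duration, SystemTime};

// Decide at read time whether to renew: renew if there is no lease on
// record, or if the current lease would expire within `renewal_window`.
fn should_renew(expiry: Option<SystemTime>, renewal_window: Duration) -> bool {
  match expiry {
    None => true,
    Some(expiry) => expiry <= SystemTime::now() + renewal_window,
  }
}
```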

@stuhood
Member Author

stuhood commented Jan 12, 2018

@illicitonion : I think that we'll want a relatively short gap between #5106 landing and this change, as it would be nice to avoid needing to deal with backwards compatibility of the LMDB store quite yet.

@stuhood stuhood modified the milestones: 1.4.x, 1.5.x Jan 12, 2018
@illicitonion
Contributor

Yeah, that's a pretty reasonable concern - will try to get this put together today :)

@illicitonion
Contributor

Ok, I've pushed the mechanics of a possible lease-and-garbage-collect implementation to https://github.com/twitter/pants/tree/dwagnerhall/fs/cas-garbage-collection - the commit message has some discussion around how to actually wire that up to be useful. I'm going to put together an implementation of (a), because it should be quick and easy, but I could be convinced that (b) is worthwhile...

@illicitonion
Contributor

Actually, I think I'll wait until we have a chat before implementing (a) - it's not entirely clear to me what the best way to write a "walk the graph to find all DigestFile nodes" block would be.

I suspect that the correct thing to do is to give Core a Set<Digest> field, and have Core kick off a background thread which does the whole timer-at-interval thing to lease each thing in that Set. DigestFile::run would then add to the Set via its reference to Context (yay handy context objects!), but I'm not sure what would remove things from that Set; maybe DigestFile would get a Drop impl which does the removal? I believe that Nodes get dropped when they're invalidated...

Alternatively, I could add a kind of hacky single-use Walk implementation on Graph which uses all roots, rather than requiring them to be specified? Or an even hackier single-use get_all_nodes_of_kind implementation on Graph which just iterates over the Nodes HashMap, and have the interval-timer thread poll Graph for the nodes it should be looking at, extending the leases on those...
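For that last option, a pared-down sketch (Graph and NodeKey here are hypothetical stand-ins, not the real types):

```rust
use std::collections::HashMap;

// Pared-down, hypothetical stand-ins for the real Graph and NodeKey.
enum NodeKey {
  DigestFile(String), // carries the digest of the file it represents
  Other,
}

struct Graph {
  nodes: HashMap<u64, NodeKey>,
}

impl Graph {
  // The "hacky single-use" accessor: iterate the node map and collect the
  // digests of all DigestFile nodes, so an interval-timer thread can
  // extend their leases without needing explicit roots.
  fn live_digests(&self) -> Vec<String> {
    self
      .nodes
      .values()
      .filter_map(|node| match node {
        &NodeKey::DigestFile(ref digest) => Some(digest.clone()),
        _ => None,
      })
      .collect()
  }
}
```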

WDYT?

@stuhood
Member Author

stuhood commented Jan 16, 2018

but I'm not sure what would remove things from that Set; maybe DigestFile would get a Drop impl which does the removal? I believe that Nodes get dropped when they're invalidated...

Nodes are cloned in various places, so I think that that Drop impl would be a bit too clever.

Or an even hackier single-use get_all_nodes_of_kind implementation to Graph which just iterates over the Nodes HashMap, and have the interval-timer thread poll Graph for the nodes it should be looking at, and extending the leases on those...

I think this is pretty reasonable, and is what I had in mind when I mentioned "walking the graph to find all DigestFile nodes".


Regarding A/B from twitter@880d7ff: it's not clear how B actually results in cleanup occurring... i.e., what's looking for expired leases?

@illicitonion
Contributor

We had some discussion on Slack; we settled on:

  1. When a DigestFile node is created, leasing its Digest.
  2. Kicking off a background thread from Python which periodically acquires the fork lock, and calls into the Rust to acquire the Graph lock and traverse the Graph, re-leasing any DigestFile nodes. The traversal will be very special-cased.
  3. Kicking off a background thread from Python which periodically acquires the fork lock, and calls into the Rust to initiate a GC pass.

2 and 3 may end up being the same service (which can be modelled on the FSEventService), or may end up being separate.
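Tying the earlier sketches together, the Rust entry points that services 2 and 3 would call might look something like this (Graph and Leases are the hypothetical types sketched above; the real Scheduler/Store APIs will differ):

```rust
use std::time::{Duration, SystemTime};

// Hypothetical Rust-side entry points, reusing the `Graph` and `Leases`
// sketches above. The Python threads hold the fork lock while calling in.
struct Scheduler {
  graph: Graph,
  leases: Leases,
}

impl Scheduler {
  // Service 2: traverse the Graph and re-lease every DigestFile's digest.
  fn lease_files_in_graph(&mut self, lease_time: Duration) {
    for digest in self.graph.live_digests() {
      self.leases.lease(&digest, lease_time);
    }
  }

  // Service 3: a GC pass that drops any lease which has expired; the real
  // implementation would also delete the corresponding blobs from LMDB.
  fn garbage_collect_store(&mut self) {
    let now = SystemTime::now();
    self.leases.0.retain(|_digest, expiry| *expiry > now);
  }
}
```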

We're going to punt, for now, on things which don't appear in the graph as DigestFile nodes (e.g. files in Snapshots received from remote execution).

@stuhood
Member Author

stuhood commented Jan 16, 2018

That looks great, thanks.

We're going to punt, for now, on things which don't appear in the graph as DigestFile nodes (e.g. files in Snapshots received from remote execution).

I think all Digests that are live in the graph would be candidates for this... I expect that the "does this represent digests" method would be quite similar to

```rust
///
/// If this NodeKey represents an FS operation, returns its Path.
///
pub fn fs_subject(&self) -> Option<&Path> {
  match self {
    &NodeKey::ReadLink(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Scandir(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Stat(ref s) => Some(s.0.as_path()),
    _ => None,
  }
}
```
(which you've updated in one of your branches to be less fragile).
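A hedged sketch of what that analogous method might look like (the DigestFile variant's payload accessor is an assumption; only fs_subject above is real):

```rust
///
/// If this NodeKey represents a Digest held in the Store, returns it, so
/// that a lease-extension pass can keep the underlying blob alive.
///
pub fn digests(&self) -> Option<&Digest> {
  match self {
    &NodeKey::DigestFile(ref d) => Some(&d.0),
    _ => None,
  }
}
```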
