
Snapshots are stored in the LMDB store not tar files #5106

Closed

Conversation

illicitonion
Contributor

@illicitonion illicitonion commented Nov 16, 2017

This has a few advantages:

  1. It makes the Bazel remote execution API much easier to implement.
  2. We store each unique file by content exactly once, rather than
    once per snapshot that contains it.

It also allows us to delete a lot of code which handles awkward
specifics of tar files.

Fixes #5106 and fixes #4394.
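
As an illustrative aside (not code from this PR): the deduplication in point 2 falls out of content-addressed storage, where the key is a digest of the file's bytes. A minimal sketch, using sha2 as an assumed stand-in for whatever hash the store actually uses:

extern crate sha2;

use std::collections::HashMap;

use sha2::{Digest, Sha256};

// Store `bytes` keyed by their own digest: writing identical content twice
// inserts one entry, so snapshots sharing a file share its storage.
fn store(db: &mut HashMap<[u8; 32], Vec<u8>>, bytes: Vec<u8>) -> [u8; 32] {
  let mut fingerprint = [0u8; 32];
  fingerprint.copy_from_slice(&Sha256::digest(&bytes));
  db.entry(fingerprint).or_insert(bytes);
  fingerprint
}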

@illicitonion
Contributor Author

Note that this PR contains many TODOs with non-trivial narrative which are likely to trigger actual discussion. I will resolve all of those TODOs before merging this PR, but strongly welcome any comments/thoughts/ideas around them.

Depends on #5105

Member

@stuhood stuhood left a comment


Looks good! Blocked on both #4397 and #5108, methinks.

// This is called from nodes.rs's Snapshot::create.
// How should a node do this kind of rate-limiting? It seems weird for a node to be passing a
// PosixFS into fs::Snapshot::from_path_stats just so it can get at its pool.
// So I guess we need to move this pool somewhere else. But where? Maybe Context?
Member

@stuhood stuhood Nov 16, 2017

Touched on this in #5105, but: I think what we'll need is an Arc to share the pool between the various members of the Core that need it, including PosixFS and Store (which isn't using it yet, but should be to take IO off the main thread). Opened #5108 about this.

But we also need the fork-safe machinery to ensure that we recreate this when necessary. So I think it will look like an Arc<ResettablePool>, with a helper fn with_pool<F: FnOnce()->T>(f: F) -> T.
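
For concreteness, a minimal sketch of what that wrapper could look like — ResettablePool and with_pool follow the names above, but the internals (an RwLock around a futures-cpupool CpuPool) are assumptions, not the eventual implementation:

extern crate futures_cpupool;

use std::sync::RwLock;

use futures_cpupool::CpuPool;

pub struct ResettablePool {
  // Readers share the pool concurrently; a reset (e.g. after a fork, when
  // worker threads no longer exist) takes the write lock and swaps it out.
  pool: RwLock<CpuPool>,
}

impl ResettablePool {
  pub fn new() -> ResettablePool {
    ResettablePool { pool: RwLock::new(CpuPool::new_num_cpus()) }
  }

  // Runs `f` against the current pool, holding the read lock so the pool
  // can't be reset out from under the caller.
  pub fn with_pool<T, F: FnOnce(&CpuPool) -> T>(&self, f: F) -> T {
    f(&self.pool.read().unwrap())
  }

  // Recreates the underlying pool; threads don't survive a fork.
  pub fn reset(&self) {
    *self.pool.write().unwrap() = CpuPool::new_num_cpus();
  }
}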

Member

Implemented here: #5110

@@ -817,275 +819,6 @@ impl fmt::Debug for FileContent {
}
}

// Like std::fs::create_dir_all, except handles concurrent calls among multiple
// threads or processes. Originally lifted from rustc.
fn safe_create_dir_all_ioerror(path: &Path) -> Result<(), io::Error> {
Member

Mentioned in the other PR, but: should use `safe_create_dir_all_ioerror` to create the path to the Store.


// TODO: How much should we be aiming to parallelise this?
// The old method returned a Future which did all of the work in one work-unit, on the PosixFS
// CpuPool. Is that sufficient? Or should we be kicking off each LMDB read in parallel on the
// pool?
Member

> Or should we be kicking off each LMDB read in parallel on the pool?

This. Although I suspect it is already faster; if it's not (due to overhead in CpuPool), it will be soon. As mentioned elsewhere: Store should hold a pool reference and expose futures.

content: store
  .load_file_bytes(&Fingerprint::from_hex_string(
    file_node.get_digest().get_hash(),
  ).unwrap())?
Member

The unwraps in here should be handled, as they would indicate IO errors or someone having deleted the store out from under a running process (both of which become more likely with the daemon).

Member

...or maybe not...? I suppose a panic will be somewhat-properly handled, and would trigger the daemon killing itself with a log message. But the client connected to the daemon would not get a useful message. If you were to trigger an error here saying something like "internal error, please run ./pants kill-pantsd", the response is: "gee thanks, why didn't you just do that yourself?".

Six of one, half dozen of the other.

);
}

contents.sort_by(|a, b| a.path.cmp(&b.path));
Member

This will have the effect of sorting too many times due to the recursion.

If you guarantee/assert that the Directory proto has sorted file and directory lists (...maybe we should push this guarantee upstream...?), this gets a bit easier. Assert sorted, and then either:

  1. Always render the files in a directory before the directories in a directory.
  2. Execute a linear-time merge of the sorted DirNode and FileNode lists (a sketch follows).
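
A hypothetical sketch of option 2, generic over the element type since the real DirNode/FileNode lists live in the generated protos — merge_sorted and the key extractor are illustrative names only:

// Merge two individually-sorted Vecs into one sorted Vec in a single pass.
fn merge_sorted<T, K: Ord, F: Fn(&T) -> K>(a: Vec<T>, b: Vec<T>, key: F) -> Vec<T> {
  let mut merged = Vec::with_capacity(a.len() + b.len());
  let mut ia = a.into_iter().peekable();
  let mut ib = b.into_iter().peekable();
  loop {
    // Decide which list currently holds the smaller front element.
    let take_a = match (ia.peek(), ib.peek()) {
      (Some(x), Some(y)) => key(x) <= key(y),
      (Some(_), None) => true,
      (None, Some(_)) => false,
      (None, None) => break,
    };
    merged.push(if take_a { ia.next().unwrap() } else { ib.next().unwrap() });
  }
  merged
}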

@@ -86,16 +86,6 @@ impl Tasks {
);
}

// TODO: Only exists in order to support the `Snapshots` singleton replacement in `context.rs`:
// Fix by porting isolated processes to rust:
Member

This is probably blocked on landing #4397, because the python code currently cheats, consumes this object, and goes looking for tar files.

@stuhood stuhood mentioned this pull request Nov 18, 2017
stuhood pushed a commit that referenced this pull request Nov 20, 2017
### Problem

`Store` executes IO, but was doing so synchronously on the main thread, which might block other operations.

### Solution

Give `Store` a reference to a `ResettablePool`, and execute all database operations on the pool.

It was also necessary to apply the "`Arc<Inner*>` pattern" (maybe it has a name?) to allow the `Store` to be cloned into the pool... this pattern seems much cleaner than the "`impl Trait for Arc<Struct>` pattern" that we've used with the `VFS` implementations... should consider switching those over at some point.

Finally, I added `Store::load_bytes_with` to support copying directly from LMDB buffers into a destination (avoiding an intermediate buffer), and to execute the copy on the IO pool.

### Result

`save_directory` and `materialize_directory` in `fs_util`, and `DigestFile` in `nodes` will execute concurrently. Helps to unblock #5105 and #5106.
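
As an illustration of the two ideas in this commit message — the `Arc<Inner>` handle pattern and the `load_bytes_with` shape — here is a minimal sketch against the lmdb crate; struct layout, names, and error handling are assumptions, not the actual pants code:

extern crate lmdb;

use std::sync::Arc;

use lmdb::{Database, Environment, Transaction};

// The "Arc<Inner>" pattern: Store is a cheap handle around shared state, so
// it can be cloned into a pool closure without exposing Arc in its API.
#[derive(Clone)]
pub struct Store {
  inner: Arc<InnerStore>,
}

struct InnerStore {
  env: Environment,
  files: Database,
}

impl Store {
  // Runs `f` against the LMDB-owned buffer for `key` inside the read
  // transaction, so the caller copies (or parses, or hashes) at most once,
  // straight out of the memory-mapped store.
  pub fn load_bytes_with<T, F: FnOnce(&[u8]) -> T>(
    &self,
    key: &[u8],
    f: F,
  ) -> Result<Option<T>, lmdb::Error> {
    let txn = self.inner.env.begin_ro_txn()?;
    match txn.get(self.inner.files, &key) {
      Ok(bytes) => Ok(Some(f(bytes))),
      Err(lmdb::Error::NotFound) => Ok(None),
      Err(e) => Err(e),
    }
  }
}
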
@illicitonion
Contributor Author

Just pushed a rebase, didn't fix all the stuff that needs fixing yet.

Outstanding TODOs:

  • Parallelise Snapshot::contents
  • Write more tests
  • Work out how to make GetNode not be public (advice welcome!)

Member

@stuhood stuhood left a comment


Thanks!

As mentioned previously, this won't pass the tests until #4397 ports IsolatedProcess to rust, probably.

// TODO: This shouldn't be public (is for context to implement GetFileDigest).
// How should this *actually* be consumed, in such a way that it can be delegated to by traits which
// have multiple implementations (either for tests, or for weird tools like fs_util).
pub trait GetNode {
Member

I'm really not sure why this is a trait anyway... just an artifact maybe? I think you can just move this method to impl Context.

Contributor Author

Done in #5123

@@ -883,7 +886,7 @@ pub struct Snapshot {
}

impl Snapshot {
-  fn create(context: Context, path_globs: PathGlobs) -> NodeFuture<fs::Snapshot> {
+  fn create(context: Arc<Context>, path_globs: PathGlobs) -> NodeFuture<fs::Snapshot> {
Member

Context implements Clone (cheaply): no need to wrap it in an Arc.

Member

It looks like rather than taking an Arc<GFD>, Snapshot::from_path_stats should maybe take a GFD with a Clone bound?:

-  pub fn from_path_stats<GFD: GetFileDigest<Error> + Sized, Error: fmt::Debug + 'static + Send>(
+  pub fn from_path_stats<GFD: GetFileDigest<Error> + Clone, Error: fmt::Debug + 'static + Send>(
     store: Arc<Store>,
-    file_digester: Arc<GFD>,
+    file_digester: GFD,
     mut path_stats: Vec<PathStat>,
   ) -> BoxFuture<Snapshot, String> {

Contributor Author

Done

let pool = context.core.pool.clone();
pool
  .spawn_fn(move || {
    fs::Snapshot::from_path_stats(
Member

@stuhood stuhood Nov 22, 2017

from_path_stats returns a Future, so no need to spawn it onto a pool; you can:

stats
  .and_then(move |_| {
    fs::Snapshot::from_path_stats(
      ..
    )
    .map_err(...)
  })

Contributor Author

Done

@@ -901,10 +904,16 @@ impl Snapshot {
// And then create a Snapshot.
stats
Member

The Stat type (which needed to fake the file access) should be completely removed in favor of FileDigest (which is actually usefully memoized): just make sure that you update this codepath

///
/// If this NodeKey represents an FS operation, returns its Path.
///
pub fn fs_subject(&self) -> Option<&Path> {
  match self {
    &NodeKey::ReadLink(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Scandir(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Stat(ref s) => Some(s.0.as_path()),
    _ => None,
  }
}

to link FileDigest to its canonical stat.

Contributor Author

Done. Modified fs_subject to avoid relying on your review to do this in the future.

@stuhood
Member

stuhood commented Nov 22, 2017

Also, I think there are still a few cases where we are over-sorting due to recursion.

@illicitonion illicitonion force-pushed the dwagnerhall/fs/snapshot-4 branch 2 times, most recently from 8529078 to 1aa21b5 Compare November 22, 2017 15:03
@stuhood
Member

stuhood commented Nov 22, 2017

It looks like there are some unhelpful panics in Travis due to #4884

  },
)
.map(|(mut files, dirs)| {
  for mut dir in dirs.into_iter() {
Contributor Author

This all ends up doing a kind of sad amount of copying things into intermediate Vecs... Any ideas for how to avoid this are welcome!

.into_iter()
.map(|(path, maybe_bytes)| {
  maybe_bytes
    .ok_or_else(|| format!("Couldn't find file contents for {:?}", path))
Member

Seems like handling this error above would simplify the type of paths_and_maybe_byteses to just Vec<FileContent>... any reason to defer it to this separate join_all call?

Contributor Author

Much nicer indeed! :)

@stuhood
Member

stuhood commented Nov 27, 2017

Related: #5136

@illicitonion
Contributor Author

This now passes all tests which don't explicitly involve IsolatedProcess.

let env = Environment::new()
// Without this flag, read transactions don't give up their reader slots until the thread
Member

Was this detected via benchmarking? Thread local storage sounds... like a pretty good thing in general.

Contributor Author

Without this, our own test suite fails hard by exhausting the transaction count.

I don't think it's anywhere near as scary as its name suggests :)

I believe the mechanism here is that there is a finite number of transaction slots (default: 126), and without this flag, each time an OS thread opens a transaction, it takes up a slot, and those slots only get freed up when the OS thread dies. If you make multiple transactions sequentially from the same thread (say, you're using a thread pool), you accordingly exhaust your slots after 126 transactions, and can never read from the DB while those threads are alive.

The only down-side is that you need to make sure that no individual thread tries to perform multiple write transactions concurrently. We don't try to run multiple write transactions concurrently on any particular OS thread, so we're fine there. (At least, that's my understanding from https://docs.rs/lmdb/0.7.0/lmdb/constant.NO_TLS.html - I guess the other interpretation is that we need a Mutex around attempts to create new write transactions, but that's not how I read it).

The Go and Python bindings turn this option on by default. It definitely feels like this should be the default behaviour in general...
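
For reference, a hedged sketch of setting the flag when opening the environment, mirroring the snippet under review; set_max_readers is shown with LMDB's default of 126 purely to make the reader-table limit explicit:

extern crate lmdb;

use std::path::Path;

use lmdb::{Environment, NO_TLS};

fn open_store(path: &Path) -> Result<Environment, lmdb::Error> {
  Environment::new()
    // NO_TLS ties reader slots to transactions rather than to OS threads, so
    // a fixed thread pool making sequential reads can't exhaust the table.
    .set_flags(NO_TLS)
    .set_max_readers(126)
    .open(path)
}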

Member

Great, thanks. Worth expanding the comment a tiny bit probably, as I expected that it was a performance tunable.

@@ -120,6 +104,12 @@ impl Context {
}
}

impl GetFileDigest<Failure> for Context {
Member

Remind me again why we don't just put this on VFS?

Contributor Author

I couldn't answer this question, so I tried to move it, and I learnt the answer! Which definitely points to something to improve.

There's a currently-very-implicit side-effect of calling digest, which is that after you call it, the Store will be able to answer load_file_bytes calls for it. This is an important property.

So VFS is the wrong place for the digest function, because PosixFS shouldn't have a dependency on Store.

Store would be a reasonable place for it, except that Context wants to be able to cache it rather than pass through to the underlying Store.

So I think we should probably have a standalone trait which is maybe called StoresFileByDigest, which Store implements (because it does, and is useful in tests), and which Context also implements (for the caching layer).

WDYT?

Contributor Author

... Except Store can't implement it on its own, because it needs a VFS to actually load files.

Contributor Author

Did the rename, added a comment.
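
For reference, a rough sketch of the shape proposed above — the trait name follows the comment, but the signature and placeholder types are guesses rather than the code that landed:

use std::path::PathBuf;

// Placeholder for the content hash the Store uses as a key.
pub struct Fingerprint(pub [u8; 32]);

// Implemented by Store directly (read the file, write its bytes into LMDB)
// and by Context (a memoizing layer that delegates to Store). Calling it
// guarantees the Store can subsequently answer load_file_bytes for the
// returned digest.
pub trait StoresFileByDigest<Error> {
  fn store_by_digest(&self, path: PathBuf) -> Result<Fingerprint, Error>;
}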

@@ -1144,10 +1127,18 @@ impl NodeKey {
///
/// If this NodeKey represents an FS operation, returns its Path.
///
pub fn fs_subject(&self) -> Option<&Path> {
  match self {
    &NodeKey::DigestFile(ref s) => Some(s.0.path.as_path()),
    &NodeKey::ReadLink(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Scandir(ref s) => Some((s.0).0.as_path()),
    &NodeKey::Stat(ref s) => Some(s.0.as_path()),
Member

I believe that after this review Stat should be unused, since FileDigest is the only thing representing presence of a file in a Snapshot? See the comment on the Stat struct.

Contributor Author

Nice! Done! Hurrah for replacing weird placeholders which need long comments with things which actually make standalone sense!

@illicitonion
Contributor Author

Squashed and rebased. Handing over to @ity to merge into #5239

@illicitonion illicitonion force-pushed the dwagnerhall/fs/snapshot-4 branch 2 times, most recently from c3de15b to 8181271 Compare January 22, 2018 10:35
@illicitonion
Contributor Author

@ity It looks like

def _extract_snapshot(snapshot_archive_root, snapshot, sandbox_dir):

is still reading tar files through a side-channel, which blocks me landing this PR - do you have the clean-up for this on an unpublished branch, or should I start on it?

@ity
Contributor

ity commented Jan 23, 2018

@illicitonion on it, will work on removing this escape hatch today.

.to_boxed()
}
Err(e) => err(throw(&format!("PathGlobs expansion failed: {:?}", e))),
.map_err(|e| format!("PlatGlobs expansion failed: {:?}", e))
Member

PathGlobs

Contributor Author

Done

// try to perform multiple write transactions concurrently. Fortunately, this property
// holds for us.
.set_flags(NO_TLS)
// 3 DBs; one for file contents, one for directories, one for leases.
Member

indentation

Contributor Author

I really dislike rustfmt :(

(I'm aware I'm the one who advocated for it... I just wasn't aware how bad it was at things...)

Done

@@ -923,33 +897,17 @@ pub struct Snapshot {

impl Snapshot {
  fn create(context: Context, path_globs: PathGlobs) -> NodeFuture<fs::Snapshot> {
-    // Recursively expand PathGlobs into PathStats while tracking their dependencies.
+    // Recursively expand PathGlobs into PathStats.
+    // We rely on Context::expand tracking dependencies for scandirs,
Member

(technically it's tracking both ReadLinks and Scandirs)

Contributor Author

Done

@kwlzn
Member

kwlzn commented Jan 30, 2018

@ity how's the escape hatch removal looking?

@illicitonion
Contributor Author

This got replaced with #5496 (and several other PRs around it)

@illicitonion illicitonion deleted the dwagnerhall/fs/snapshot-4 branch February 26, 2018 11:29
Successfully merging this pull request may close these issues.

Inlining of PathGlob expansion causes excess invalidation with the daemon