Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warn or error on matching source globs #5769

Merged

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Apr 30, 2018

Problem

Pants gives no indication when a file specified in the sources kwarg of a target doesn't exist, or if a glob pattern doesn't match anything. This is often an error. See #5430, as well as this longstanding TODO in wrapped_globs.py.

Solution

  • Add the --glob-expansion-failure global option (defaulting to warn).
  • Warn or raise an error if any of the arguments to globs, rglobs, zglobs, or a literal file list doesn't expand to any existing source files.

Result

There are now warning messages emitted (warnings don't work yet, see #5863) when a glob doesn't match in a sources or bundles argument in a target. Setting the global option --error-on-glob-match-failure will cause an exception to be raised if any glob fails to match a file. Error messages contain a representation of the globs/rglobs/etc call from the BUILD file to aid in debugging.

Followup Issues

  1. target sources glob expansion failure warning needs followup work #5863
  2. allow target-specific overriding of GlobMatchErrorBehavior #5864
  3. move glob matching to its own file #5871

@cosmicexplorer cosmicexplorer changed the title warn on matching source globs warn or error on matching source globs May 1, 2018
@cosmicexplorer cosmicexplorer changed the title warn or error on matching source globs [WIP] warn or error on matching source globs May 2, 2018
@cosmicexplorer
Copy link
Contributor Author

I'm still working on this. After @illicitonion mentioned on slack that I'm re-parsing the global options, I realized I had to take a different approach and haven't fully worked it out. I will leave another comment when this is resolved.

@cosmicexplorer cosmicexplorer force-pushed the warn-or-err-matching-source-globs branch 2 times, most recently from fa21d9f to 7062880 Compare May 3, 2018 07:17
@cosmicexplorer cosmicexplorer changed the title [WIP] warn or error on matching source globs warn or error on matching source globs May 3, 2018
@cosmicexplorer
Copy link
Contributor Author

So I took a step back and started with a fresh branch and quickly replicated the functionality I wanted from the previous attempt without any changes to the options system, then force-pushed, so this is much cleaner and simpler. This is ready for review now.

@jsirois: I changed TypedDatatypeInstanceConstructionError to subclass TypeError instead of Exception in 4f9ba6c, and caught TypeError in tests, but thinking about your comment in the other pr I feel like I might want to revert that? It's not clear to me whether the distinction between the exception types here matters, since they really are just to add exception message prefixes. There may be a better way to arrange the exceptions as well.

relpath_adjusted_filespec = FilesetRelPathWrapper.to_filespec(filespec['globs'], spec_path)
rel_include_globs = filespec['globs']

_get_globs_owning_files(files,
Copy link
Member

@stuhood stuhood May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek... sorry. Unfortunately, I think this is the wrong place to do this check. At this point, it's necessary to reverse engineer the match.


In the ticket, I said:

...since we now control the implementation of glob matching, we no longer have a great excuse not to.

...by which I meant: we have an implementation of glob matching in:

/// Recursively expands PathGlobs into PathStats while applying excludes.

What I had in my head when I said that (but didn't explain very well: sorry), was that in fn expand we could track the original/input of each PathGlob instance (ie, this PathGlob came from this original glob string), and then at the end see which original/input glob strings never matched anything. I don't know concretely what this would look like, but I suspect that 1) it would be more efficient than re-constructing later, 2) it would be less likely to break.

@cosmicexplorer cosmicexplorer force-pushed the warn-or-err-matching-source-globs branch 2 times, most recently from 609cf65 to 69a04e0 Compare May 19, 2018 13:21
Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the goal of this change :)

I worry that we're adding a bit much complexity to the model, and a few too many layers of wrapping in the rust code.

How would you feel about having the rust code always returning up the match information when expanding PathGlobs, and having the Python code decide whether to ignore it? That way we can avoid a bunch of the do-we-report-stuff state passing around. Hopefully the overhead of tracking that data is low enough that people shouldn't notice/care...

@@ -121,6 +121,9 @@ def file_stats(self):
)


class SnapshotWithMatchData(datatype([('snapshot', Snapshot), 'match_data'])): pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to have semi-typed datatypes...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that one wasn't all the way thought through anyway. It is now gone regardless (I removed the intrinsic and all changes to the native interface).

@@ -239,3 +282,10 @@ def register_options(cls, register):
register('--lock', advanced=True, type=bool, default=True,
help='Use a global lock to exclude other versions of pants from running during '
'critical operations.')
# TODO(cosmicexplorer): make a custom type abstract class to automate the production of an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds nice, assuming it can be done simply :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left it as a TODO due to the length of this PR, I agree 100%.

@@ -121,18 +122,24 @@ impl PathStat {
}
}

impl fmt::Debug for PathStat {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we dropping the stat here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated... would be good to remove it from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been removed.

@@ -171,13 +184,26 @@ impl PathGlob {
}

pub fn create(filespecs: &[String]) -> Result<Vec<PathGlob>, String> {
let mut path_globs = Vec::new();
let filespecs_globs = Self::spread_filespecs(filespecs)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to phrase these spread and flattens as things which accept and return iterators, which can be collected by the consumer, rather than needing to allocate intermediate Vecs. Probably worth waiting for this until 1.26 lands with impl trait, though :)

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some very brief research -- is there a way to phrase that without impl trait that I'm not seeing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cosmicexplorer : In theory you can use Box<Iterator<..>>, but it's semi-advanced usage, so I wouldn't worry about it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, great! This actually can probably be removed entirely when implementing your suggestion below.

&PathGlob::Wildcard {
ref symbolic_path,
ref wildcard,
..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we ignoring fields here?

But if we're going to, let's explicitly ref _canonical_dir rather than .., so it becomes a compile error not to consider Debug formatting when adding new fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great piece of advice!!! I think the canonical_dir should be added to the Debug for sure, I think I didn't add it because it was unnecessary screen space when I was Debugging earlier, but I will add it now. I will remember your second comment for future reference though!

) -> BoxFuture<(Vec<PathStat>, Vec<PathGlob>), E> {
match path_glob {
) -> BoxFuture<SingleExpansionResult, E> {
// TODO(cosmicexplorer): I would love to do something like `glob @ PathGlob::Wildcard {...}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes me sad too :( I've generally favoured the re-construction over the clone, but I have no strong feelings other than sadness :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this now, it makes sense, I think to disallow:

match path_glob {
  glob @ PathGlob::Wildcard { canonical_dir, symbolic_path, wildcard } => {
    path_glob
  }
}

Because you're moving path_glob into the destructured canonical_dir, symbolic_path, and wildcard, but then trying to move it again into the result of the expression -- there needs to be a clone happening somewhere. However if we were to match on &path_glob (which I just tried), it seems like it's impossible to move something while borrowing it, even if the move happens at the end of the borrowed scope. That seems like it doesn't need to happen? I'm not sure if my analysis is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; your analysis is correct that there isn't a nice way to make this work, and the reasons why. I'd quite like the language to somehow support (maybe as a special case), some syntax like:

match path_globs {
  PathGlob::Wildcard => path_globs
}

or

match path_globs {
  wildcard: PathGlob::Wildcard => wildcard
}

or something similar, to allow this, because it's a pretty common desire.

WithGlobMatchData,
}

impl fmt::Debug for Snapshot {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not derive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deriving now!

ref globs,
} = entry;
let match_found_for_input: bool = globs
.into_iter()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this could just be an iter()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That particular line has been removed, but I am looking out for other places where I may have made this mistake.

@@ -915,7 +1087,8 @@ impl NodeKey {
keystr(&s.subject),
typstr(&s.product)
),
&NodeKey::Snapshot(ref s) => format!("Snapshot({})", keystr(&s.0)),
// TODO(cosmicexplorer): why aren't we implementing an fmt::Debug for all of these nodes?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should be :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned that into a TODO for a followup PR.

#[derive(Clone, Debug)]
pub struct SnapshotExpansionResult {
snapshot: fs::Snapshot,
found_files: HashMap<PathGlob, GlobMatch>,
Copy link
Member

@stuhood stuhood May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing something, but why would we expose the found_files out of VFS::expand rather than just eagerly failing in that method based on the configured behaviour?

@cosmicexplorer cosmicexplorer force-pushed the warn-or-err-matching-source-globs branch from a135429 to dd2f422 Compare May 22, 2018 22:12
@@ -42,7 +46,9 @@ def parse_address_family(address_mapper, directory):

The AddressFamily may be empty, but it will not be None.
"""
logger.debug("directory: {}".format(directory))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

default_option_value = 'warn'

@classmethod
def create(cls, value=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating a new instance of the class for each call to create, you could return singletons here:

  if value == 'ignore':
    return GlobMatchErrorBehavior.IGNORE
  ..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!! I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(did that)

@@ -121,18 +122,24 @@ impl PathStat {
}
}

impl fmt::Debug for PathStat {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated... would be good to remove it from this PR.

}

///
/// Given a filespec as Patterns, create a series of PathGlob objects.
///
fn parse_globs(
// FIXME: add as_zsh_glob() method which generates a string from the available PathGlob info, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fixme is concrete enough. Would remove, or post as a ticket if it's a thing you actually think we should do. But these weren't really intended to be zsh globs so much as git globs: see the comment in the method body about git. ...but it looks like you removed that comment maybe got removed in the refactoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I actually added it in the as_zsh_style_globs() method, and used it to generate the error message in expand() (which is why I wanted it). But I will absolutely change the naming to gitignore-style globs.

"""Given a BundlesField, request Snapshots for each of its filesets and create BundleAdaptors."""
snapshot_list = yield [Get(Snapshot, PathGlobs, pg) for pg in bundles_field.path_globs_list]
logger.debug("glob_match_error_behavior={}".format(glob_match_error_behavior))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also removed!

@@ -714,12 +837,13 @@ pub trait VFS<E: Send + Sync + 'static>: Clone + Send + Sync + 'static {
return future::ok(vec![]).to_boxed();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn, sorry... this was more challenging than I thought it would be. I'm worried that the result is too complicated right now (because the post expansion loop essentially ends up recreating/rexecuting the expansion logic, the same way you were recreating it in the python code).

In theory tracking associations all the way through (input glob string -> PathGlob -> PathStat) should allow us to avoid reconstructing the match logic afterward. It's also possible I'm missing something: would be happy to pair on this to try and figure it out together.

I also expect that this would be easier with (memoized) recursion, and wonder whether we shouldn't land a separate PR to switch this to memoized recursion before tackling this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous warn-globs-v2 branch I messaged in the slack, I had done exactly that, by having a Vec of "source" PathGlobs stored in an IndexMap (so we can iterate in reverse to do this in topological order), which would memoize exactly what we describe. I can reintroduce that now that the other kinks are worked out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also land that in a separate PR, though.

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now -- we merely record all the "sources" as GlobSource instances, for each glob in completed. A GlobSource can be either another PathGlob, or a GlobParsedSource. We then iterate in reverse over completed. This is very low-cost for the ignore case.

@@ -2,6 +2,7 @@
// Licensed under the Apache License, Version 2.0 (see LICENSE).

use std::ffi::OsString;
use std::iter::Iterator;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

static ref PYTHON_FALSE: Value = eval("False").unwrap();
}

impl From<bool> for Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; removed!


resources(
name='missing-zglobs',
sources=zglobs('**/*.a'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the globs types (z/r/etc) are converted into one underlying format, so accounting for just "recursive", "non-recursive", and "literal" should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I'd like to leave the testprojects and integration testing as is until we can close #5863. Does that sound reasonable? I would like to make sure the testing captures the error messages and ensures they display something useful to the user, and all of the skipped tests in the new test_graph_integration.py need to be updated when that happens.

@@ -9,8 +9,7 @@
import pickle
from abc import abstractmethod

from pants.util.objects import (Exactly, SubclassesOf, SuperclassesOf, TypeCheckError,
TypedDatatypeInstanceConstructionError, datatype)
from pants.util.objects import Exactly, SubclassesOf, SuperclassesOf, datatype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted!

@cosmicexplorer
Copy link
Contributor Author

This should be ready for another look. I have responded to all the review comments left previously.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Danny... I have a bunch of comments, but I think once those are addressed this should be landable.

@@ -239,3 +283,10 @@ def register_options(cls, register):
register('--lock', advanced=True, type=bool, default=True,
help='Use a global lock to exclude other versions of pants from running during '
'critical operations.')
# TODO(cosmicexplorer): Make a custom type abstract class to automate the production of an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to use the TODO($name) format, but have moved away from it in favor of including ticket links in cases where we plan to follow up on something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have edited all the TODOs to that effect and will use tickets from now on, or unnamed smaller TODOs. Sorry about that.

@@ -134,6 +134,8 @@ def __str__(self):
field_value = getattr(self, field_name)
if not constraint_for_field:
elements_formatted.append(
# FIXME(cosmicexplorer): use {field_value!r} in this method, even though this is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto above regarding names. Fine to leave TODOs/FIXMEs, but rather than targeting them at a particular person, create a ticket with enough explanation to allow anyone to fix the issue. Or if the TODO is self-explanatory or low priority, can skip a ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. Thanks for clarifying this.

@@ -121,15 +121,63 @@ impl PathStat {
}
}

#[derive(Clone, Debug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloning this type would be expensive, so it's good that GitignoreStyleExcludes::create returns an Arc<Self>. But in cases where cloning things would be expensive, it's best not to derive Clone: that way you send the signal to a user that they should avoid copying the type (such as by using an Arc, or sharing it in some other way).

I think I found the callsite below where you were cloning this, and it should not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the Clone derive and everything seems to work!

@@ -147,6 +195,31 @@ pub enum PathGlob {
},
}

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct GlobParsedSource {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence about whether this type is useful... but I guess I'd bias toward leaving it in.

Rather than giving the field a long name (since it is already in a wrapper type), you might consider using what's known as a "tuple struct":

pub struct GlobParsedSource(String);

...which comes with different matching and construction syntax:

let str = "*.txt".to_string();
match GlobParsedSource(str) {
  GlobParsedSource(s) => ..
}

The only downside to tuple structs is that because their fields aren't named, you have to refer to them by index in cases where you're not destructuring:

let gps = GlobParsedSource("*.txt".to_string());
let str = gps.0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type is useful in the sense that it means we're not passing around strings, there's at least some context. I converted it to a 1-element tuple struct as you suggested -- the named field is definitely unnecessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the in-depth description of tuple structs here, by the way!

"warn" => Ok(StrictGlobMatching::Warn),
"error" => Ok(StrictGlobMatching::Error),
_ => Err(format!(
"???/unrecognized strict glob matching behavior: {}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can drop the ???.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped it!

})
self.assert_failure(pants_run)
self.assertIn(AddressLookupError.__name__, pants_run.stderr_data)
expected_msg = """
Copy link
Member

@stuhood stuhood May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much, much too general. Changing either error reporting or the shape of the rule graph would break it.

Should validate a much more specific thing... maybe just that both the target address and glob are included.

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer May 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c3750a4 makes the testing here much more useful as you suggest.

self.assert_success(pants_run)
self.assertNotIn("WARN]", pants_run.stderr_data)

@unittest.skip('Skipped to expedite landing #5769: see #5863')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have this test set to error, I don't think this is related to the warnings issue... is it possible to actually enable this test?

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it requires overriding the __repr__ of BaseGlobs -- after b72ab17 this can now be unskipped.

# TODO: this is passing, but glob_match_error_behavior='ignore' in the target. We should have a
# field which targets with bundles (and/or sources) can override which is the same as the global
# option.
expected_msg = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above response!

"""
self.assertIn(expected_msg, pants_run.stderr_data)

def test_exception_invalid_option_value(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively just tests options parsing... would remove it (or if you don't see a test of the choices impl, move something like this test to the relevant test class).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this test. Thanks!


class GlobalOptionsTest(BaseTest):

def test_exception_glob_match_constructor(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto testing options parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this file, there is ample testing already for this functionality.

@cosmicexplorer cosmicexplorer force-pushed the warn-or-err-matching-source-globs branch 2 times, most recently from a18b5ba to 34e4aa9 Compare May 30, 2018 22:43
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating @cosmicexplorer ! I don't think any of my comments are blockers, so we can merge once @illicitonion 's feedback is in.

@@ -288,6 +293,8 @@ def legacy_globs_class(self):
"""The corresponding `wrapped_globs` class for this BaseGlobs."""

def __init__(self, *patterns, **kwargs):
self._patterns = patterns
self._kwargs = kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is mutated below, it will not end up containing anything interesting. Can fix in a followup, but IMO, attempting to reproduce the syntax here is probably overkill.

@@ -272,6 +272,9 @@ fn execute(top_match: clap::ArgMatches) -> Result<(), ExitError> {
.map(|s| s.to_string())
.collect::<Vec<String>>(),
&[],
// By using `Ignore`, we assume all elements of the globs will definitely expand to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this case could at least warn if something isn't matched, but @illicitonion could confirm.

}

pub fn should_check_glob_matches(&self) -> bool {
match self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you derive Eq/PartialEq for this enum, you can do &StrictGlobMatching::Ignore == self.

source: GlobSource,
}

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enums that don't own any data (or even enums that own very small amounts of data), it's generally fine to derive Copy, which avoids having to manually call Clone.

.collect();

if !non_matching_inputs.is_empty() {
// TODO(#5684): explain what global and/or target-specific option to set to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad ticket reference.


class GlobMatchErrorBehavior(datatype(['failure_behavior'])):
"""Describe the action to perform when matching globs in BUILD files to source files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially controversial, but: given that this type will be validated on the rust side, I think removing the python wrapper type and just having it be a checked string would be totally fine. Take it or leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the wrapper type is slightly overkill here :)

(Would maybe consider re-introducing it if it led to an easy way to keep the python and rust enums in sync, but I don't see an obvious one)

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's address comments as a follow-up to avoid merge conflicts :)

@@ -272,6 +272,9 @@ fn execute(top_match: clap::ArgMatches) -> Result<(), ExitError> {
.map(|s| s.to_string())
.collect::<Vec<String>>(),
&[],
// By using `Ignore`, we assume all elements of the globs will definitely expand to
// something here, or we don't care. Is that a valid assumption?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd say we don't care here because fs_util users are generally doing weird hacky one-off experiments, rather than wanting correctness safety nets :)

}

impl GitignoreStyleExcludes {
fn create(patterns: &[String]) -> Result<Arc<Self>, String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return an Arc? It looks like it's to avoid the overhead of cloning EMPTY_IGNORE? Maybe add a comment explaining that (and that it's presumably much more expensive it is to clone a Gitignore than an Arc)?

fn to_sourced_globs(&self) -> Vec<GlobWithSource> {
self
.globs
.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that:

self.globs
  .iter()
  .map(|path_glob| GlobWithSource {
    path_glob.clone(),
    source: GlobSource::ParsedInput(self.input.clone()),
  })
  .collect()

would be slightly more efficient here; cloning the Vec and then doing an into_iter forces you to clone more of the underlying structure of the Vec which you're just going to throw away, whereas cloning each PathGlob as you encounter it, only requires cloning the underlying data you actually need to copy.

@@ -171,13 +240,28 @@ impl PathGlob {
}

pub fn create(filespecs: &[String]) -> Result<Vec<PathGlob>, String> {
let mut path_globs = Vec::new();
// Getting a Vec<PathGlob> per filespec is needed to create a `PathGlobs`, but we don't need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this comment?

// Getting a Vec<PathGlob> per filespec is needed to create a `PathGlobs`, but we don't need
// that here.
let filespecs_globs = Self::spread_filespecs(filespecs)?;
let all_globs = Self::flatten_entries(filespecs_globs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably inline flatten_entries; it's small enough and only used once that it's more clear to be able to read its body, rather than needing to look around for its implementation

// reverse order of expansion (using .rev()), we ensure that we have already visited every
// "child" glob of the glob we are operating on while iterating. This is a reverse
// "topological ordering" which preserves the partial order from parent to child globs.
let all_globs: Vec<PathGlob> = completed.keys().rev().map(|pg| pg.clone()).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.cloned() instead of .map(|pg| pg.clone())

// `GlobSource`), which may be another glob, or it might be a `GlobParsedSource`. We record
// all `GlobParsedSource` inputs which transitively expanded to some file here, and below we
// warn or error if some of the inputs were not found.
let mut inputs_with_matches: HashSet<GlobParsedSource> = HashSet::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need this type ascription?

@@ -29,24 +30,47 @@ class Path(datatype(['path', 'stat'])):
"""


class PathGlobs(datatype(['include', 'exclude'])):
class PathGlobs(datatype([
'include',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to add tuple type constraints to include and exclude - having half-typed objects seems weird

@@ -70,7 +70,7 @@ class Field(object):
"""A marker for Target(Adaptor) fields for which the engine might perform extra construction."""


class SourcesField(datatype(['address', 'arg', 'filespecs', 'path_globs']), Field):
class SourcesField(datatype(['address', 'arg', 'filespecs', 'base_globs', 'path_globs']), Field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add pydoc explaining what base_globs are? It's not self-evident...


class GlobMatchErrorBehavior(datatype(['failure_behavior'])):
"""Describe the action to perform when matching globs in BUILD files to source files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the wrapper type is slightly overkill here :)

(Would maybe consider re-introducing it if it led to an easy way to keep the python and rust enums in sync, but I don't see an obvious one)

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2018

There is a major issue here, which is that if you have multiple globs in a target, every one has to match something separately. For example, the default globs for python_tests is ('test_*.py', '*_test.py'), so that shops can pick whether they prefer test_foo.py or foo_test.py as their idiom. But now you'll get a warning on whichever one you didn't pick...

I will open an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants