-
-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
warn or error on matching source globs #5769
warn or error on matching source globs #5769
Conversation
I'm still working on this. After @illicitonion mentioned on slack that I'm re-parsing the global options, I realized I had to take a different approach and haven't fully worked it out. I will leave another comment when this is resolved. |
fa21d9f
to
7062880
Compare
So I took a step back and started with a fresh branch and quickly replicated the functionality I wanted from the previous attempt without any changes to the options system, then force-pushed, so this is much cleaner and simpler. This is ready for review now. @jsirois: I changed |
relpath_adjusted_filespec = FilesetRelPathWrapper.to_filespec(filespec['globs'], spec_path) | ||
rel_include_globs = filespec['globs'] | ||
|
||
_get_globs_owning_files(files, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek... sorry. Unfortunately, I think this is the wrong place to do this check. At this point, it's necessary to reverse engineer the match.
In the ticket, I said:
...since we now control the implementation of glob matching, we no longer have a great excuse not to.
...by which I meant: we have an implementation of glob matching in:
pants/src/rust/engine/fs/src/lib.rs
Line 666 in cb10df5
/// Recursively expands PathGlobs into PathStats while applying excludes. |
What I had in my head when I said that (but didn't explain very well: sorry), was that in fn expand
we could track the original/input
of each PathGlob
instance (ie, this PathGlob came from this original glob
string), and then at the end see which original/input glob strings never matched anything. I don't know concretely what this would look like, but I suspect that 1) it would be more efficient than re-constructing later, 2) it would be less likely to break.
609cf65
to
69a04e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the goal of this change :)
I worry that we're adding a bit much complexity to the model, and a few too many layers of wrapping in the rust code.
How would you feel about having the rust code always returning up the match information when expanding PathGlobs
, and having the Python code decide whether to ignore it? That way we can avoid a bunch of the do-we-report-stuff state passing around. Hopefully the overhead of tracking that data is low enough that people shouldn't notice/care...
src/python/pants/engine/fs.py
Outdated
@@ -121,6 +121,9 @@ def file_stats(self): | |||
) | |||
|
|||
|
|||
class SnapshotWithMatchData(datatype([('snapshot', Snapshot), 'match_data'])): pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to have semi-typed datatypes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that one wasn't all the way thought through anyway. It is now gone regardless (I removed the intrinsic and all changes to the native interface).
@@ -239,3 +282,10 @@ def register_options(cls, register): | |||
register('--lock', advanced=True, type=bool, default=True, | |||
help='Use a global lock to exclude other versions of pants from running during ' | |||
'critical operations.') | |||
# TODO(cosmicexplorer): make a custom type abstract class to automate the production of an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds nice, assuming it can be done simply :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left it as a TODO due to the length of this PR, I agree 100%.
src/rust/engine/fs/src/lib.rs
Outdated
@@ -121,18 +122,24 @@ impl PathStat { | |||
} | |||
} | |||
|
|||
impl fmt::Debug for PathStat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we dropping the stat
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated... would be good to remove it from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been removed.
src/rust/engine/fs/src/lib.rs
Outdated
@@ -171,13 +184,26 @@ impl PathGlob { | |||
} | |||
|
|||
pub fn create(filespecs: &[String]) -> Result<Vec<PathGlob>, String> { | |||
let mut path_globs = Vec::new(); | |||
let filespecs_globs = Self::spread_filespecs(filespecs)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to phrase these spread
and flatten
s as things which accept and return iterators, which can be collected by the consumer, rather than needing to allocate intermediate Vec
s. Probably worth waiting for this until 1.26 lands with impl trait
, though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some very brief research -- is there a way to phrase that without impl trait
that I'm not seeing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cosmicexplorer : In theory you can use Box<Iterator<..>>
, but it's semi-advanced usage, so I wouldn't worry about it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, great! This actually can probably be removed entirely when implementing your suggestion below.
src/rust/engine/fs/src/lib.rs
Outdated
&PathGlob::Wildcard { | ||
ref symbolic_path, | ||
ref wildcard, | ||
.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we ignoring fields here?
But if we're going to, let's explicitly ref _canonical_dir
rather than ..
, so it becomes a compile error not to consider Debug
formatting when adding new fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great piece of advice!!! I think the canonical_dir
should be added to the Debug
for sure, I think I didn't add it because it was unnecessary screen space when I was Debug
ging earlier, but I will add it now. I will remember your second comment for future reference though!
src/rust/engine/fs/src/lib.rs
Outdated
) -> BoxFuture<(Vec<PathStat>, Vec<PathGlob>), E> { | ||
match path_glob { | ||
) -> BoxFuture<SingleExpansionResult, E> { | ||
// TODO(cosmicexplorer): I would love to do something like `glob @ PathGlob::Wildcard {...}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes me sad too :( I've generally favoured the re-construction over the clone, but I have no strong feelings other than sadness :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this now, it makes sense, I think to disallow:
match path_glob {
glob @ PathGlob::Wildcard { canonical_dir, symbolic_path, wildcard } => {
path_glob
}
}
Because you're moving path_glob
into the destructured canonical_dir
, symbolic_path
, and wildcard
, but then trying to move it again into the result of the expression -- there needs to be a clone happening somewhere. However if we were to match on &path_glob
(which I just tried), it seems like it's impossible to move something while borrowing it, even if the move happens at the end of the borrowed scope. That seems like it doesn't need to happen? I'm not sure if my analysis is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; your analysis is correct that there isn't a nice way to make this work, and the reasons why. I'd quite like the language to somehow support (maybe as a special case), some syntax like:
match path_globs {
PathGlob::Wildcard => path_globs
}
or
match path_globs {
wildcard: PathGlob::Wildcard => wildcard
}
or something similar, to allow this, because it's a pretty common desire.
src/rust/engine/src/nodes.rs
Outdated
WithGlobMatchData, | ||
} | ||
|
||
impl fmt::Debug for Snapshot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not derive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deriving now!
src/rust/engine/src/nodes.rs
Outdated
ref globs, | ||
} = entry; | ||
let match_found_for_input: bool = globs | ||
.into_iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this could just be an iter()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That particular line has been removed, but I am looking out for other places where I may have made this mistake.
src/rust/engine/src/nodes.rs
Outdated
@@ -915,7 +1087,8 @@ impl NodeKey { | |||
keystr(&s.subject), | |||
typstr(&s.product) | |||
), | |||
&NodeKey::Snapshot(ref s) => format!("Snapshot({})", keystr(&s.0)), | |||
// TODO(cosmicexplorer): why aren't we implementing an fmt::Debug for all of these nodes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should be :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned that into a TODO for a followup PR.
src/rust/engine/src/nodes.rs
Outdated
#[derive(Clone, Debug)] | ||
pub struct SnapshotExpansionResult { | ||
snapshot: fs::Snapshot, | ||
found_files: HashMap<PathGlob, GlobMatch>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm missing something, but why would we expose the found_files
out of VFS::expand
rather than just eagerly failing in that method based on the configured behaviour?
a135429
to
dd2f422
Compare
@@ -42,7 +46,9 @@ def parse_address_family(address_mapper, directory): | |||
|
|||
The AddressFamily may be empty, but it will not be None. | |||
""" | |||
logger.debug("directory: {}".format(directory)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed!
default_option_value = 'warn' | ||
|
||
@classmethod | ||
def create(cls, value=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than creating a new instance of the class for each call to create
, you could return singletons here:
if value == 'ignore':
return GlobMatchErrorBehavior.IGNORE
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!! I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(did that)
src/rust/engine/fs/src/lib.rs
Outdated
@@ -121,18 +122,24 @@ impl PathStat { | |||
} | |||
} | |||
|
|||
impl fmt::Debug for PathStat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated... would be good to remove it from this PR.
src/rust/engine/fs/src/lib.rs
Outdated
} | ||
|
||
/// | ||
/// Given a filespec as Patterns, create a series of PathGlob objects. | ||
/// | ||
fn parse_globs( | ||
// FIXME: add as_zsh_glob() method which generates a string from the available PathGlob info, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fixme is concrete enough. Would remove, or post as a ticket if it's a thing you actually think we should do. But these weren't really intended to be zsh globs so much as git globs: see the comment in the method body about git. ...but it looks like you removed that comment maybe got removed in the refactoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I actually added it in the as_zsh_style_globs()
method, and used it to generate the error message in expand()
(which is why I wanted it). But I will absolutely change the naming to gitignore-style globs.
"""Given a BundlesField, request Snapshots for each of its filesets and create BundleAdaptors.""" | ||
snapshot_list = yield [Get(Snapshot, PathGlobs, pg) for pg in bundles_field.path_globs_list] | ||
logger.debug("glob_match_error_behavior={}".format(glob_match_error_behavior)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also removed!
@@ -714,12 +837,13 @@ pub trait VFS<E: Send + Sync + 'static>: Clone + Send + Sync + 'static { | |||
return future::ok(vec![]).to_boxed(); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Darn, sorry... this was more challenging than I thought it would be. I'm worried that the result is too complicated right now (because the post expansion loop essentially ends up recreating/rexecuting the expansion logic, the same way you were recreating it in the python code).
In theory tracking associations all the way through (input glob string -> PathGlob -> PathStat)
should allow us to avoid reconstructing the match logic afterward. It's also possible I'm missing something: would be happy to pair on this to try and figure it out together.
I also expect that this would be easier with (memoized) recursion, and wonder whether we shouldn't land a separate PR to switch this to memoized recursion before tackling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous warn-globs-v2
branch I messaged in the slack, I had done exactly that, by having a Vec of "source" PathGlobs stored in an IndexMap (so we can iterate in reverse to do this in topological order), which would memoize exactly what we describe. I can reintroduce that now that the other kinks are worked out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also land that in a separate PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed now -- we merely record all the "sources" as GlobSource
instances, for each glob in completed
. A GlobSource
can be either another PathGlob
, or a GlobParsedSource
. We then iterate in reverse over completed
. This is very low-cost for the ignore
case.
src/rust/engine/src/externs.rs
Outdated
@@ -2,6 +2,7 @@ | |||
// Licensed under the Apache License, Version 2.0 (see LICENSE). | |||
|
|||
use std::ffi::OsString; | |||
use std::iter::Iterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed!
src/rust/engine/src/externs.rs
Outdated
static ref PYTHON_FALSE: Value = eval("False").unwrap(); | ||
} | ||
|
||
impl From<bool> for Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; removed!
|
||
resources( | ||
name='missing-zglobs', | ||
sources=zglobs('**/*.a'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the globs types (z
/r
/etc) are converted into one underlying format, so accounting for just "recursive", "non-recursive", and "literal" should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but I'd like to leave the testprojects and integration testing as is until we can close #5863. Does that sound reasonable? I would like to make sure the testing captures the error messages and ensures they display something useful to the user, and all of the skipped tests in the new test_graph_integration.py
need to be updated when that happens.
@@ -9,8 +9,7 @@ | |||
import pickle | |||
from abc import abstractmethod | |||
|
|||
from pants.util.objects import (Exactly, SubclassesOf, SuperclassesOf, TypeCheckError, | |||
TypedDatatypeInstanceConstructionError, datatype) | |||
from pants.util.objects import Exactly, SubclassesOf, SuperclassesOf, datatype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted!
97098d1
to
3d78392
Compare
42f2533
to
322c16e
Compare
9976e8b
to
846f0be
Compare
This should be ready for another look. I have responded to all the review comments left previously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Danny... I have a bunch of comments, but I think once those are addressed this should be landable.
@@ -239,3 +283,10 @@ def register_options(cls, register): | |||
register('--lock', advanced=True, type=bool, default=True, | |||
help='Use a global lock to exclude other versions of pants from running during ' | |||
'critical operations.') | |||
# TODO(cosmicexplorer): Make a custom type abstract class to automate the production of an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to use the TODO($name)
format, but have moved away from it in favor of including ticket links in cases where we plan to follow up on something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have edited all the TODOs to that effect and will use tickets from now on, or unnamed smaller TODOs. Sorry about that.
src/python/pants/util/objects.py
Outdated
@@ -134,6 +134,8 @@ def __str__(self): | |||
field_value = getattr(self, field_name) | |||
if not constraint_for_field: | |||
elements_formatted.append( | |||
# FIXME(cosmicexplorer): use {field_value!r} in this method, even though this is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above regarding names. Fine to leave TODOs/FIXMEs, but rather than targeting them at a particular person, create a ticket with enough explanation to allow anyone to fix the issue. Or if the TODO is self-explanatory or low priority, can skip a ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. Thanks for clarifying this.
src/rust/engine/fs/src/lib.rs
Outdated
@@ -121,15 +121,63 @@ impl PathStat { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloning this type would be expensive, so it's good that GitignoreStyleExcludes::create
returns an Arc<Self>
. But in cases where cloning things would be expensive, it's best not to derive Clone
: that way you send the signal to a user that they should avoid copying the type (such as by using an Arc
, or sharing it in some other way).
I think I found the callsite below where you were cloning this, and it should not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the Clone
derive and everything seems to work!
src/rust/engine/fs/src/lib.rs
Outdated
@@ -147,6 +195,31 @@ pub enum PathGlob { | |||
}, | |||
} | |||
|
|||
#[derive(Clone, Debug, Eq, Hash, PartialEq)] | |||
pub struct GlobParsedSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence about whether this type is useful... but I guess I'd bias toward leaving it in.
Rather than giving the field a long name (since it is already in a wrapper type), you might consider using what's known as a "tuple struct":
pub struct GlobParsedSource(String);
...which comes with different matching and construction syntax:
let str = "*.txt".to_string();
match GlobParsedSource(str) {
GlobParsedSource(s) => ..
}
The only downside to tuple structs is that because their fields aren't named, you have to refer to them by index in cases where you're not destructuring:
let gps = GlobParsedSource("*.txt".to_string());
let str = gps.0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type is useful in the sense that it means we're not passing around strings, there's at least some context. I converted it to a 1-element tuple struct as you suggested -- the named field is definitely unnecessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the in-depth description of tuple structs here, by the way!
src/rust/engine/fs/src/lib.rs
Outdated
"warn" => Ok(StrictGlobMatching::Warn), | ||
"error" => Ok(StrictGlobMatching::Error), | ||
_ => Err(format!( | ||
"???/unrecognized strict glob matching behavior: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can drop the ???
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped it!
}) | ||
self.assert_failure(pants_run) | ||
self.assertIn(AddressLookupError.__name__, pants_run.stderr_data) | ||
expected_msg = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much, much too general. Changing either error reporting or the shape of the rule graph would break it.
Should validate a much more specific thing... maybe just that both the target address and glob are included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c3750a4 makes the testing here much more useful as you suggest.
self.assert_success(pants_run) | ||
self.assertNotIn("WARN]", pants_run.stderr_data) | ||
|
||
@unittest.skip('Skipped to expedite landing #5769: see #5863') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have this test set to error
, I don't think this is related to the warnings issue... is it possible to actually enable this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it requires overriding the __repr__
of BaseGlobs
-- after b72ab17 this can now be unskipped.
# TODO: this is passing, but glob_match_error_behavior='ignore' in the target. We should have a | ||
# field which targets with bundles (and/or sources) can override which is the same as the global | ||
# option. | ||
expected_msg = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above response!
""" | ||
self.assertIn(expected_msg, pants_run.stderr_data) | ||
|
||
def test_exception_invalid_option_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively just tests options parsing... would remove it (or if you don't see a test of the choices
impl, move something like this test to the relevant test class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this test. Thanks!
|
||
class GlobalOptionsTest(BaseTest): | ||
|
||
def test_exception_glob_match_constructor(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto testing options parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this file, there is ample testing already for this functionality.
a18b5ba
to
34e4aa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @cosmicexplorer ! I don't think any of my comments are blockers, so we can merge once @illicitonion 's feedback is in.
@@ -288,6 +293,8 @@ def legacy_globs_class(self): | |||
"""The corresponding `wrapped_globs` class for this BaseGlobs.""" | |||
|
|||
def __init__(self, *patterns, **kwargs): | |||
self._patterns = patterns | |||
self._kwargs = kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is mutated below, it will not end up containing anything interesting. Can fix in a followup, but IMO, attempting to reproduce the syntax here is probably overkill.
@@ -272,6 +272,9 @@ fn execute(top_match: clap::ArgMatches) -> Result<(), ExitError> { | |||
.map(|s| s.to_string()) | |||
.collect::<Vec<String>>(), | |||
&[], | |||
// By using `Ignore`, we assume all elements of the globs will definitely expand to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this case could at least warn if something isn't matched, but @illicitonion could confirm.
} | ||
|
||
pub fn should_check_glob_matches(&self) -> bool { | ||
match self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you derive Eq
/PartialEq
for this enum, you can do &StrictGlobMatching::Ignore == self
.
source: GlobSource, | ||
} | ||
|
||
#[derive(Clone, Debug, Eq, Hash, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For enums that don't own any data (or even enums that own very small amounts of data), it's generally fine to derive Copy
, which avoids having to manually call Clone
.
.collect(); | ||
|
||
if !non_matching_inputs.is_empty() { | ||
// TODO(#5684): explain what global and/or target-specific option to set to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad ticket reference.
|
||
class GlobMatchErrorBehavior(datatype(['failure_behavior'])): | ||
"""Describe the action to perform when matching globs in BUILD files to source files. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially controversial, but: given that this type will be validated on the rust side, I think removing the python wrapper type and just having it be a checked string would be totally fine. Take it or leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the wrapper type is slightly overkill here :)
(Would maybe consider re-introducing it if it led to an easy way to keep the python and rust enums in sync, but I don't see an obvious one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's address comments as a follow-up to avoid merge conflicts :)
@@ -272,6 +272,9 @@ fn execute(top_match: clap::ArgMatches) -> Result<(), ExitError> { | |||
.map(|s| s.to_string()) | |||
.collect::<Vec<String>>(), | |||
&[], | |||
// By using `Ignore`, we assume all elements of the globs will definitely expand to | |||
// something here, or we don't care. Is that a valid assumption? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd say we don't care here because fs_util
users are generally doing weird hacky one-off experiments, rather than wanting correctness safety nets :)
} | ||
|
||
impl GitignoreStyleExcludes { | ||
fn create(patterns: &[String]) -> Result<Arc<Self>, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this return an Arc
? It looks like it's to avoid the overhead of cloning EMPTY_IGNORE
? Maybe add a comment explaining that (and that it's presumably much more expensive it is to clone a Gitignore
than an Arc
)?
fn to_sourced_globs(&self) -> Vec<GlobWithSource> { | ||
self | ||
.globs | ||
.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume that:
self.globs
.iter()
.map(|path_glob| GlobWithSource {
path_glob.clone(),
source: GlobSource::ParsedInput(self.input.clone()),
})
.collect()
would be slightly more efficient here; cloning the Vec
and then doing an into_iter
forces you to clone more of the underlying structure of the Vec
which you're just going to throw away, whereas cloning each PathGlob
as you encounter it, only requires cloning the underlying data you actually need to copy.
@@ -171,13 +240,28 @@ impl PathGlob { | |||
} | |||
|
|||
pub fn create(filespecs: &[String]) -> Result<Vec<PathGlob>, String> { | |||
let mut path_globs = Vec::new(); | |||
// Getting a Vec<PathGlob> per filespec is needed to create a `PathGlobs`, but we don't need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop this comment?
// Getting a Vec<PathGlob> per filespec is needed to create a `PathGlobs`, but we don't need | ||
// that here. | ||
let filespecs_globs = Self::spread_filespecs(filespecs)?; | ||
let all_globs = Self::flatten_entries(filespecs_globs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably inline flatten_entries
; it's small enough and only used once that it's more clear to be able to read its body, rather than needing to look around for its implementation
// reverse order of expansion (using .rev()), we ensure that we have already visited every | ||
// "child" glob of the glob we are operating on while iterating. This is a reverse | ||
// "topological ordering" which preserves the partial order from parent to child globs. | ||
let all_globs: Vec<PathGlob> = completed.keys().rev().map(|pg| pg.clone()).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.cloned()
instead of .map(|pg| pg.clone())
// `GlobSource`), which may be another glob, or it might be a `GlobParsedSource`. We record | ||
// all `GlobParsedSource` inputs which transitively expanded to some file here, and below we | ||
// warn or error if some of the inputs were not found. | ||
let mut inputs_with_matches: HashSet<GlobParsedSource> = HashSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need this type ascription?
@@ -29,24 +30,47 @@ class Path(datatype(['path', 'stat'])): | |||
""" | |||
|
|||
|
|||
class PathGlobs(datatype(['include', 'exclude'])): | |||
class PathGlobs(datatype([ | |||
'include', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to add tuple
type constraints to include
and exclude
- having half-typed objects seems weird
@@ -70,7 +70,7 @@ class Field(object): | |||
"""A marker for Target(Adaptor) fields for which the engine might perform extra construction.""" | |||
|
|||
|
|||
class SourcesField(datatype(['address', 'arg', 'filespecs', 'path_globs']), Field): | |||
class SourcesField(datatype(['address', 'arg', 'filespecs', 'base_globs', 'path_globs']), Field): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add pydoc explaining what base_globs
are? It's not self-evident...
|
||
class GlobMatchErrorBehavior(datatype(['failure_behavior'])): | ||
"""Describe the action to perform when matching globs in BUILD files to source files. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the wrapper type is slightly overkill here :)
(Would maybe consider re-introducing it if it led to an easy way to keep the python and rust enums in sync, but I don't see an obvious one)
There is a major issue here, which is that if you have multiple globs in a target, every one has to match something separately. For example, the default globs for I will open an issue. |
Problem
Pants gives no indication when a file specified in the
sources
kwarg of a target doesn't exist, or if a glob pattern doesn't match anything. This is often an error. See #5430, as well as this longstanding TODO inwrapped_globs.py
.Solution
--glob-expansion-failure
global option (defaulting towarn
).globs
,rglobs
,zglobs
, or a literal file list doesn't expand to any existing source files.Result
There are now warning messages emitted (warnings don't work yet, see #5863) when a glob doesn't match in a
sources
orbundles
argument in a target. Setting the global option--error-on-glob-match-failure
will cause an exception to be raised if any glob fails to match a file. Error messages contain a representation of theglobs
/rglobs
/etc call from the BUILD file to aid in debugging.Followup Issues