-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to proc_macro::Span
API
#43604
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
r? @jseyfried |
src/libproc_macro/lib.rs
Outdated
pub fn glue(&self, other: Span) -> Option<Span> { | ||
if self == other { return None } | ||
|
||
if self.0.lo != other.0.hi || self.0.hi != other.0.lo { return None } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing case: overlapping but no equivalent positions
Should |
6ef8080
to
5644028
Compare
@abonander you might be interested in this RFC: rust-lang/rfcs#2091 |
Yeah, I read it. This is for more than just |
src/libproc_macro/lib.rs
Outdated
|
||
/// The original source file into which this span points. | ||
#[unstable(feature = "proc_macro", issue = "38356")] | ||
pub fn file(&self) -> SourceFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return Option<SourceFile>
instead, for when the input to the current macro is the result of another macro and thus will not have a source file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the source file be the file in which the other macro was defined then?
I still think this should be Option<SourceFile>
, at least for impl FromStr for TokenStream
(i.e. DUMMY_SP
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I need to check the values for the other methods that DUMMY_SP
might return to make sure they're not nonsense or else pointing into something unrelated to the current macro expansion.
use span_api_tests::{reemit, assert_no_source_file}; | ||
|
||
reemit! { | ||
assert_no_source_file! { "Hello, world!" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jseyfried The "Hello, world!"
token here gives its source file as span-api-tests.rs
even though it was reparsed from a string by reemit! { }
. Is this the correct behavior? If so, we can't rely on the source file being "<anon>"
to determine if it was emitted from a macro or not. I think the output from external crates' macros is marked as "<anon>"
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-parsing from a string creates tokens with the macro invocation span and then applies a mark to indicate that they are macro-expanded. It doesn't create a "fake" file, so that shouldn't be used to determine whether something came from a macro or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it should test for equality with the macro invocation span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You test for equality modulo syntax context (i.e. sp1.lo == sp2.lo && sp1.hi == sp2.hi
). You can't just equality check the spans since the reparsed tokens' syntax contexts will have an extra mark.
In general, if you want to check if something is emitted from a macro or not, you should check if the syntax context is not SyntaxContext::empty()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good advice there. We wanted this to return None
if the token was produced by a macro, so that should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jseyfried It looks like tokens produced by TokenStream::from_str()
are given a non-empty SyntaxContext
, so that doesn't quite work for that. Is that a bug in that impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I think I've figured out how to reliable deduce this.
@abonander are you looking for some feedback on this PR, or are you still working on it? I note the travis build has failed. |
@aidanhs Looking for a response to #43604 (comment) since that would dictate what semantics the failing test would actually need to test for. It may necessitate touching more internals to set a different filename for the parsed |
@jseyfried I got |
@jseyfried This is ready for review, I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I could just try and do a code review. Note that I don't really know anything about compiler internals (although this might be a good thing? maybe? 😄). So maybe all comments are really useless ^_^
src/libproc_macro/lib.rs
Outdated
return None | ||
} | ||
|
||
let loc = __internal::lookup_char_pos(self.0.lo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed that lo
and hi
point to the same file? In other words: are there spans which span multiple files? If there aren't, a comment explaining this would be nice for people reading the code and not knowing stuff about compiler internals (like me 😛). If there are, the doc comment should mention that the file of the start of the span is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, because of how the compiler basically sticks an entire crate's source code together, you could potentially have a span that covers multiple files if the macro invocation contains mod
declarations and if you glue()
the tokens from the individual spans together.
So really start()
and end()
should probably return their own SourceFile
's, but that might be exposing too much of the compiler's implementation details (because if it stopped doing that then the API would be misleading, but it would still work). Either that or glue()
should not allow spans with different source files to be glued together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you glue the spans from the individual tokens together, that is. I'm on mobile for most of the day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about having something like BytePos
in proc_macro
? Then start()
and end()
would return a BytePos
. And there are methods like BytePos::line()
and BytePos::source_file()
. I think I'd like that API better; what do you think?
src/libproc_macro/lib.rs
Outdated
// If this span's token was produced by a macro | ||
if self.0.ctxt != SyntaxContext::empty() { | ||
return None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it somehow possible to recursively traverse up the macro call hierarchy? To get the span of the macro-invocation that generated the self
span? Does this question make sense? 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, yes. I believe that's done for regular macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should it be done here? If not, it should be possible to do the recursive traversal in user space IMO (which is not possible right now, right?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure what you mean. There's already Span::call_site()
which gets the span of the current macro invocation. I don't think we get much out of giving the macro call information about other macros than itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I missed call_site()
. That's exactly what I meant! :) Nevermind then.
src/libproc_macro/lib.rs
Outdated
|
||
/// Create a new span encompassing `self` and `other`. | ||
#[unstable(feature = "proc_macro", issue = "38356")] | ||
pub fn glue(&self, other: Span) -> Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is glue()
a name already used today? Maybe hull()
is more fitting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and regarding your glue_many()
idea: I wouldn't make it a method, but an associated function:
fn hull_of(spans: &[Span]) -> Span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hull
is a bit arcane for terminology. syntax_pos
's Span
uses to()
but I feel like that's too short and undescriptive for a public API. Maybe to_other()
, or Span::containing(one, another)
?
As for glue_many()
, it's trivially implementable in client code so I'd forget about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the term convex hull is actually the mathematically correct one here (although usually, 1D convex hulls are not that interesting). So hull()
makes sense and is short. I'm also fine with Span::containing()
, but it's longer than hull()
. I don't like to()
or to_other()
. Or what about join()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's too much to explain in a short piece of documentation. We're looking for something that immediately makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
join()
isn't terrible, though.
src/libproc_macro/lib.rs
Outdated
impl SourceFile { | ||
/// Get the underlying file name/path. | ||
#[unstable(feature = "proc_macro", issue = "38356")] | ||
pub fn as_str(&self) -> &str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this return the path or the name of the file? If path: relative to the manifest-file? Relative to something else? Or absolute?
And if a path is returned, why not std::path::Path
instead of &str
? And then I'd also rename the method to path()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying representation is a string but it is typically a path, yeah. I believe it's relative to the working directory for the compiler process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's relative to the directory containing the file being compiled, i.e. ~/foo/src
for rustc ~/foo/src/lib.rs
.
I think this method should be named path
and should return an absolute path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jseyfried Are you want it to be absolute? It'd be more difficult to unit-test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the unit test could just check that a suffix of the absolute path is correct. I think it's worth having an absolute path so that the proc macro author doesn't have to worry about to which the path is relative.
☔ The latest upstream changes (presumably #44186) made this pull request unmergeable. Please resolve the merge conflicts. |
@jseyfried So it would seem that |
src/libproc_macro/lib.rs
Outdated
|
||
let loc = __internal::lookup_char_pos(self.0.lo()); | ||
|
||
PathBuf::from(&loc.file.name).canonicalize().ok() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could actually return &Path
if we don't care about canonicalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I would lean against canonicalizing, but I don't have that strong of an opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it'll save an allocation to skip it so that's good enough for me.
Build failure appears to be spurious, the |
src/libproc_macro/lib.rs
Outdated
|
||
let loc = __internal::lookup_char_pos(self.0.lo()); | ||
|
||
Some(Path::new(&loc.file.name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jseyfried That's right, this borrow is tied to loc
instead of self
so we can't return &Path
here. Should I put it back to returning an opaque wrapper that can be dereferenced to Path
, or just return a copy in PathBuf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning a copy in a PathBuf
is fine, but I think the return value should be a newtype anyway for future compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if we return a newtype then getting &Path
from it is as easy as returning PathBuf
. Is a PathBuf
still preferable in case we change the internals such that it's not easy to get a borrow that lives long enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as it's future compatible (i.e. just impl Deref { type Target = Path; }
) it's up to you -- if internals change we can change later.
@LukasKalbertodt mentioned the real possibility that a span's start and end positions could be in different source files, given that the compiler expands #[my_attr]
mod foo {
fn foo();
mod bar;
} The mod foo {
// tokens from the current file
mod bar {
// tokens from `foo/bar.rs`
}
} This only becomes an issue if two spans are joined that have different source files (as a source token wouldn't have this quirk). Do we want to prohibit this by having |
@jseyfried Also, do the tests look good enough or do you want me to test the return values of |
4716f5c
to
decdf7d
Compare
IMO, that hack is that we construct "fake" files named |
Hacks beget hacks. Poetic. |
src/libproc_macro/lib.rs
Outdated
// This is a hack until intercrate spans are implemented and we can have real source files | ||
// for spans generated in external macros. | ||
// https://github.com/rust-lang/rust/pull/43604#issuecomment-333334368 | ||
self.filemap.name.ends_with("macros>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh, this condition needs to be inverted.
955261b
to
4debb56
Compare
Rebased against a significantly more recent |
src/libproc_macro/lib.rs
Outdated
/// [`is_real`]: #method.is_real | ||
# [unstable(feature = "proc_macro", issue = "38356")] | ||
pub fn as_path(&self) -> &Path { | ||
Path::new(&self.filemap.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this path be used to open files? Note that filemap.name
is not a real path when filemap.name_was_remapped
is true (using -Z remap-path-prefix-*
). It will look like a real path though, so if it's just used for display purposes then that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is_real()
should probably return filemap.is_real_file() && filemap.name_was_remapped
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it needs to be a real path that can be opened, then maybe you need the filemap.path
field that I'm adding in #44940. If users of this API need to be able to open the file, then they need to be able to open the file even when using -Z remap-path-prefix-*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main use case is to refer to files in custom error messages, so they should make sense to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in that case filemap.name
is correct, and you don't need to check name_was_remapped
. Maybe it should be returning a str
instead of a Path
, so that users are less likely to wrongly open it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it's used by some build systems. I do mention it in the as_str()
docs, think it needs to be duplicated on is_real()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remapping is for when the source package is going installed at a location that is different from where it is built. eg I may compile in /home/philipc, but the source package will be installed in /usr/src. It's also useful for reproducible builds. See #38322 for the feature request.
What it means is that the paths in the output binaries should use the remapped location (such as in debuginfo or panic messages). We're also using the remapped path for error messages during compilation, but I'm less sure about why that is useful (maybe it's just for anonymization).
But if you need to open a file during compilation then you need to use the path that isn't remapped, so maybe we need to add an API for that too. Otherwise we'll get into a situation where crates will fail to compile when remapping is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining! I guess we really should add an API to retrieve the original path in long term. The remap-flag should never break code that compiled before, I guess. But maybe this can be done later, as this PR has seen already quite a bit of iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philipc something just occurred to me: the way it's currently implemented, remapping seems to be an all-or-nothing thing, so can it just be a global flag instead of per Filemap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not all-or-nothing: it's possible the remap prefix will only match some files, or there may be multiple prefixes that match different files. I don't know if this will happen in practice, but the compiler currently supports it.
src/libproc_macro/lib.rs
Outdated
// This is a hack until intercrate spans are implemented and we can have real source files | ||
// for spans generated in external macros. | ||
// https://github.com/rust-lang/rust/pull/43604#issuecomment-333334368 | ||
!self.filemap.name.ends_with("macros>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly which paths need to be flagged as not real, but note that there is a FileMap::is_real_file
that only checks for '<' and '>'.
@philipc @jseyfried build passing with latest changes, can rebase whenever. |
r=me with commits squashed. |
I almost don't want to merge this, as of right now it's the oldest still-open PR 😄 |
2fa99b0
to
c66c990
Compare
@jseyfried rebased, much cleaner now |
@bors r=jseyfried |
📌 Commit c66c990 has been approved by |
⌛ Testing commit c66c9903e4e44fa322bd76b2a37da7dc2574ba7d with merge fddec2dec4783140b06ffba217dad25bf215725b... |
💔 Test failed - status-travis |
@bors r-
|
So the assertion is technically correct, that's not a real source file, but it doesn't really help us here. @jseyfried should I fall back to checking if the source file ends in |
@abonander I believe adding an |
c66c990
to
7be36d2
Compare
@jseyfried amended with |
@kennytm would you please re-r+ |
@bors r=jseyfried |
📌 Commit 7be36d2 has been approved by |
Improvements to `proc_macro::Span` API Motivation: https://internals.rust-lang.org/t/better-panic-location-reporting-for-unwrap-and-friends/5042/12?u=logician TODO: - [x] Bikeshedding/complete API - [x] Implement tests/verify return values cc @jseyfried @nrc
☀️ Test successful - status-appveyor, status-travis |
Motivation: https://internals.rust-lang.org/t/better-panic-location-reporting-for-unwrap-and-friends/5042/12?u=logician
TODO:
cc @jseyfried @nrc