RFC: proc macro include! #3200

Open

@CAD97 CAD97 commented Nov 25, 2021

Allow include! to be implemented in proc macros, by adding a proc_macro API to read files as Vec<u8>, String, or TokenStream. If the file is read as TokenStream, it is given Spans appropriate for diagnostics to point into the read file. In all cases, the build system knows which file(s) have been read, and can cache results / rerun the macro as desired.

Rendered

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Nov 25, 2021
@programmerjake
Member

we'd likely also want a way to list files in a directory, though that may be more difficult to integrate into build systems

@programmerjake
Member

we may want to specify that a non-existent file/directory produce a NotFound error just like File::open, cuz proc macros are likely to want to probe for specific files (e.g. a config override file for a particular directory) and fall back to some default if they don't exist.
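
A minimal sketch of that probe-and-fall-back pattern. `std::fs::read_to_string` stands in for the proposed proc_macro function here, and `read_config_or_default` is a hypothetical helper name, not part of the RFC:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads an optional override file, falling back to `default` when the file
/// does not exist. Any other I/O error is passed through to the caller.
/// NOTE: `fs::read_to_string` is only a stand-in for the proposed API.
fn read_config_or_default(path: &Path, default: &str) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        // A missing file is the expected case when probing for overrides.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(default.to_owned()),
        Err(err) => Err(err),
    }
}
```

This only works if a missing file is reported as an `io::Error` with kind `NotFound` rather than as a panic, which is the guarantee being asked for.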

Co-authored-by: Jacob Lifshay <programmerjake@gmail.com>
@Nemo157
Member

Nemo157 commented Nov 25, 2021

It would be useful to somehow allow Spans into include_str files as well, so the proc-macro can report errors (or induce later rustc errors) that reference into non-TokenStream compatible syntaxes.

@CAD97
Author

CAD97 commented Nov 25, 2021

allow Spans into include_str files as well

I agree that this is desirable. As such, I'm torn on whether include_str should return (logically) (String, Span). However, there's currently no (even unstable) way for a proc macro to split a given Span into smaller parts, so even with a Span, error reporting into a Rust lexer incompatible file wouldn't yet be possible.

This RFC I'd like to keep focused on the "read file via the build system" architecture, so introducing split spans architecture would (imo) overextend the RFC.

Perhaps the best short term approach is to just drop include_str, and have proc macros include_bytes and String::from_utf8 for the time being until include_str can give a useful Span.
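
Roughly what that would look like on the proc macro side, as a sketch only. `std::fs::read` stands in for the proposed `include_bytes`, and `decode_utf8` / `read_text` are hypothetical helper names:

```rust
use std::io;

/// Converts raw file bytes into a `String`, reporting invalid UTF-8 as an
/// `io::Error` of kind `InvalidData` so callers deal with one error type.
fn decode_utf8(bytes: Vec<u8>) -> io::Result<String> {
    String::from_utf8(bytes).map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))
}

/// Reads a file as bytes and then decodes it.
/// NOTE: `std::fs::read` is only a stand-in for the proposed API.
fn read_text(path: &str) -> io::Result<String> {
    decode_utf8(std::fs::read(path)?)
}
```

The cost is that the resulting `String` carries no span information at all, which is the gap discussed above.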

@est31
Member

est31 commented Dec 1, 2021

Two points:

  • How should relative paths be handled? Relative to the working directory of the rustc process, or the way include_str! does it, relative to the directory of the .rs file that contains the invocation? This becomes especially interesting when the proc macro is called with a relative path: it would be weird for users if that worked differently from include_str! (say, for an include_cpp! proc macro). Maybe one could pass an optional span to the functions, which would resolve the path relative to the span's location; if no span is passed, it would resolve relative to the top-level span of the crate, or in other words, relative to lib.rs.

  • I'm wondering about making the return type of include_bytes and include_str opaque, or at least using something that supports backing buffers coming from an mmap call instead of requiring the standard Rust allocator. Think of cases where the included files are, say, 4 GiB large. Reading the entire file into memory is very inefficient in that case, and ideally one would defer reading the complete file until the linking step. This optimization does not have to be implemented now, but I feel that it should be possible to implement it in the future. Also, the implementation may only want to read parts of a file, in which case reading the entire file and copying it into RAM is wasteful. Mapping it into memory reads only the needed parts.
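
To make the first point concrete, here is a sketch of include!-style resolution, assuming the path of the invoking source file is known. `resolve_like_include` is a hypothetical helper, not something the RFC defines:

```rust
use std::path::{Path, PathBuf};

/// Resolves `requested` the way `include_str!` treats relative paths:
/// relative to the directory of the source file containing the invocation.
/// Absolute paths are returned unchanged.
fn resolve_like_include(invoking_file: &Path, requested: &Path) -> PathBuf {
    if requested.is_absolute() {
        return requested.to_path_buf();
    }
    match invoking_file.parent() {
        Some(dir) => dir.join(requested),
        // No parent component: fall back to the path as written.
        None => requested.to_path_buf(),
    }
}
```

Resolving against the process working directory instead would amount to using `requested` as-is, which is what plain `std::fs` calls inside a proc macro do today.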

@programmerjake
Member

  • I'm wondering about making the return for include_bytes and include_str opaque, or at least using something that supports the backing buffers coming from a mmap call instead of having to use the standard Rust allocator.

How about:

trait BytesBuf: AsRef<[u8]> {
    // may be expensive
    fn into_vec(self: Box<Self>) -> Vec<u8>;
}
fn include_bytes<P: AsRef<str>>(path: P) -> Result<Box<dyn BytesBuf>, std::io::Error>;
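
For illustration, a sketch of two possible backings behind that trait. `OwnedBytes`, `StaticBytes`, `peek_len`, and `include_bytes_sketch` are hypothetical names; `StaticBytes` merely stands in for a borrowed or memory-mapped region, since std has no mmap API:

```rust
trait BytesBuf: AsRef<[u8]> {
    // may be expensive
    fn into_vec(self: Box<Self>) -> Vec<u8>;
}

/// Heap-backed buffer: `into_vec` is a cheap move.
struct OwnedBytes(Vec<u8>);

impl AsRef<[u8]> for OwnedBytes {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

impl BytesBuf for OwnedBytes {
    fn into_vec(self: Box<Self>) -> Vec<u8> {
        let OwnedBytes(bytes) = *self;
        bytes
    }
}

/// Borrowed buffer standing in for a mapped region: `into_vec` has to copy.
struct StaticBytes(&'static [u8]);

impl AsRef<[u8]> for StaticBytes {
    fn as_ref(&self) -> &[u8] {
        self.0
    }
}

impl BytesBuf for StaticBytes {
    fn into_vec(self: Box<Self>) -> Vec<u8> {
        self.0.to_vec()
    }
}

/// Callers that only inspect the bytes never need to know the backing.
fn peek_len(buf: &dyn BytesBuf) -> usize {
    let bytes: &[u8] = buf.as_ref();
    bytes.len()
}

/// Stand-in for the proposed function, using `std::fs::read` underneath.
fn include_bytes_sketch(path: &str) -> Result<Box<dyn BytesBuf>, std::io::Error> {
    Ok(Box::new(OwnedBytes(std::fs::read(path)?)))
}
```

The point of the boxed trait object is that the implementation could later switch to a lazier backing without changing the signature.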

@camelid
Member

camelid commented Jan 5, 2022

I would really like for the proc macro version of include_str! (and probably include_bytes! as well, for consistency) to return a Span for the included string. See rust-lang/rust#92565.

@AlbertMarashi

This would open tonnes of doors and allow extending the Rust language to work with things like single-file components in frontend frameworks written in Rust.

@CAD97
Author

CAD97 commented Apr 20, 2022

I've finally gotten around to updating the RFC text for the comments here. Changelog:

  • include_str now produces Literal instead of String
  • Mention questions brought up as well as the alternative of more specialized wrapper types for returns.

///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include_bytes<P: AsRef<str>>(path: P) -> Result<Vec<u8>, std::io::Error>;
Member

Would it make sense for include_bytes to return Literal as well, or would that not be possible?

Member

I think it should work because Literal can be a byte string.

Author

Actually, yeah, I overlooked that possibility.

The main limitation is that the only current interface for getting the contents out of a Literal is to ToString it. syn does have a .value() for LitByteStr as well as LitStr, though, so I guess it's workable.

In the short term, it's probably not good to require debug-escaping a binary file and then reparsing the byte string literal if a proc macro is going to post-process the file... but if it's just including the literal, it can put the Literal in the token stream, and we can offer ways to extract (byte) string literals without printing the string literal in the future.

Author

The one limitation that needs to be solved is how spans work. Do we just say that the byte string literal contains the raw bytes of the file (even though that would be illegal in a normal byte string, and invalid UTF-8), maybe as a new "kind" of byte string, so span offsets are mapped directly with the source file? Or are there multiple span positions (representing a \xNN in the byte string) which map to a single byte in the source file?

Member

Hmm, what bytes are not allowed in byte string literals? Does the literal itself have to be valid UTF-8?

Author

A Rust source file must be valid UTF-8. Thus, the contents of a byte string literal in the source must be valid UTF-8.

Bytes that are not < 0x80 thus must be escaped to appear in a byte string literal.
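
A small sketch of what that escaping does to offsets. `escape_with_offsets` is a hypothetical helper; it uses `std::ascii::escape_default`, which renders non-printable and non-ASCII bytes as `\xNN`:

```rust
/// Renders `bytes` as the source text of a byte string literal and records,
/// for each input byte, the offset where its escaped form starts inside the
/// literal's contents (the text between the quotes).
fn escape_with_offsets(bytes: &[u8]) -> (String, Vec<usize>) {
    let mut body = String::new();
    let mut offsets = Vec::with_capacity(bytes.len());
    for &byte in bytes {
        offsets.push(body.len());
        // One input byte becomes 1, 2, or 4 characters of literal text.
        body.extend(std::ascii::escape_default(byte).map(char::from));
    }
    (format!("b\"{}\"", body), offsets)
}
```

For the input `[b'h', 0xff, b'\n', b'i']` this yields the literal text `b"h\xff\ni"` with offsets `[0, 1, 5, 7]`, so positions in the literal no longer line up one-to-one with bytes in the file.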

Author

And then another question that's worth making explicit: what does it even mean for rustc to report a span into a binary file?

I think binary includes are better served by a different API that lets rustc point into generated code, rather than trying to point into an opaque binary file.

@AlbertMarashi

AlbertMarashi commented Apr 26, 2022

I've finally gotten around to updating the RFC text for the comments here. Changelog:

* `include_str` now produces [`Literal`](https://doc.rust-lang.org/nightly/proc_macro/struct.Literal.html#) instead of `String`

* Mention questions brought up as well as the alternative of more specialized wrapper types for returns.

Does this allow you to split the Literal into Spans?

Edit: nvm I see there is a subspan function
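
As a sketch of the bookkeeping a macro would do before calling that function: computing the byte range of one line of the included text. `line_byte_range` is a hypothetical helper, and the assumption is that the resulting range could then be handed to the unstable `Literal::subspan`, provided offsets in the file and in the literal correspond directly:

```rust
use std::ops::Range;

/// Returns the byte range of the zero-based line `line_index` within
/// `source`, excluding the trailing newline, or `None` if there is no
/// such line.
fn line_byte_range(source: &str, line_index: usize) -> Option<Range<usize>> {
    let mut start = 0;
    for (index, line) in source.split('\n').enumerate() {
        let end = start + line.len();
        if index == line_index {
            return Some(start..end);
        }
        // Skip past the newline that `split` consumed.
        start = end + 1;
    }
    None
}
```

For `"ab\ncde\n"`, line 1 maps to `3..6`.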

- That which `include!` is relative to in the source file expanding the macro.
- That which `fs` is relative to in the proc macro execution.

Both have their merits and drawbacks.
Member

One way to support both options would be to take a Span that the path is relative to. Then it would make multi-level includes easier (the macro includes a path relative to the Rust source file, then the included file references another relative file so that needs to be included based on the Span from the first proc_macro::include_str call).

Author

What would Span::mixed_site be relative to?

Also, that would kinda soft-block the feature on Span::def_site, while the RFC is currently written such that additional unstable features (such as span subslicing) are incremental improvements not required for the functionality to be useful... though I suppose requiring a span would be strictly more powerful than include!-style base path, so that fits into the same category.

Member

I suppose it should just behave the exact same as a include_str!("..") macro invocation whose tokens carry a mixed_site span.

Member

macro_rules! x {
    () => {
        include_str!("a")
    };
}

Somewhat surprisingly, this looks for a file called "a" relative to the file in which x!() is invoked, not relative to the file that contains the definition above.

@SOF3

SOF3 commented Sep 21, 2022

we'd likely also want a way to list files in a directory, though that may be more difficult to integrate into build systems

how is this not already possible with Span::call_site().source_file()?

@programmerjake
Member

we'd likely also want a way to list files in a directory, though that may be more difficult to integrate into build systems

how is this not already possible with Span::call_site().source_file()?

listing files is already possible using std::fs, the issue is that since cargo doesn't know about that, it won't rerun if you add new files to that directory, or delete, or modify files, or modify file attributes. therefore I think the proc-macro API (and probably something for build.rs too) should include functions that inform cargo that you depend on certain directories so it'll re-run if you change them.
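
For comparison, this is roughly what the build.rs side can already do: list a directory with `std::fs` and tell Cargo about the dependency with the `cargo:rerun-if-changed` directive. A sketch only; `list_dir_sorted` and `rerun_if_changed` are hypothetical helper names, and a proc macro has no stable equivalent of the second one:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the entries of `dir` in a stable (sorted) order, so the generated
/// output does not depend on the order the OS happens to return them in.
fn list_dir_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        entries.push(entry?.path());
    }
    entries.sort();
    Ok(entries)
}

/// Formats the build-script directive that asks Cargo to rerun when `path`
/// changes. A build script would print this line to stdout.
fn rerun_if_changed(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}
```

In a build script one would print `rerun_if_changed(dir)` before iterating over `list_dir_sorted(dir)?`; without the directive, the listing silently goes stale when files are added or removed.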

@sam0x17

sam0x17 commented Jun 15, 2023

we'd likely also want a way to list files in a directory, though that may be more difficult to integrate into build systems

how is this not already possible with Span::call_site().source_file()?

Additionally, the path() part has been nightly-only since the dawn of time, so it's not something we've been able to use on stable yet.

@CAD97
Author

CAD97 commented Jun 15, 2023

(This would also likely be blocked in unstable limbo by the same concerns as the tracked path interface.)

FWIW, the tracked_path API now supports tracking any changes within a directory. The change happened alongside the build script rerun-if-changed directive gaining that functionality, replacing the older, mostly useless behavior of watching only the directory entry itself (i.e. its metadata) for changes.

The "perfect" solution (with respect to tracking only) is to use a WASI target or similar in order to instrument all environment access, such that it can be transparently instrumented, sandboxed, and whatever else the compiler sees as reasonable. For what this RFC is directly trying to address — spanned manipulation of newly accessed files — though, this API surface is still required even with perfect instrumentation of environment access.

Labels
A-proc-macros Proc macro related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC.