Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Use 'trampolines' to make most of the body of a generic function non-generic #77960

Open
jyn514 opened this issue Oct 15, 2020 · 11 comments
Labels
A-codegen Area: Code generation A-mir-opt Area: MIR optimizations A-monomorphization Area: Monomorphization C-feature-request Category: A feature request, i.e: not implemented / a PR. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-compiletime Issue: Problems and improvements with respect to compile times. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jyn514
Copy link
Member

jyn514 commented Oct 15, 2020

A common pattern in Rust, especially in the standard library, is to have two versions of a function: one generic and one with the actual implementation. For example, Error::new() immediately delegates to a non-generic function _new():

#[stable(feature = "rust1", since = "1.0.0")]
pub fn new<E>(kind: ErrorKind, error: E) -> Error
where
E: Into<Box<dyn error::Error + Send + Sync>>,
{
Self::_new(kind, error.into())
}
fn _new(kind: ErrorKind, error: Box<dyn error::Error + Send + Sync>) -> Error {
Error { repr: Repr::Custom(Box::new(Custom { kind, error })) }
}

This benefits both compile- and run-time: compiletime because LLVM has to do less work compiling the large implementation of the function body many times, and runtime because generic version can safely be marked as #[inline] without bloating the instruction cache.

However, most libraries do not do this, for the simple reason that it's tedious and annoying to do (and you have to already know about it). It would be amazing if rustc could do this transformation itself. I expect this would help greatly with #65031, as well as compile times across the ecosystem.

@jyn514 jyn514 added E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. A-codegen Area: Code generation I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Oct 15, 2020
@jyn514
Copy link
Member Author

jyn514 commented Oct 15, 2020

Thanks to @matklad for bringing this up in https://matklad.github.io/2020/10/15/study-of-std-io-error.html !

@jyn514 jyn514 changed the title Feature request: Use 'trampolines' to de-monomorphize large parts of a function Feature request: Use 'trampolines' to most of the body of a generic function non-generic Oct 15, 2020
@jyn514 jyn514 changed the title Feature request: Use 'trampolines' to most of the body of a generic function non-generic Feature request: Use 'trampolines' to make most of the body of a generic function non-generic Oct 15, 2020
@dlight
Copy link

dlight commented Oct 15, 2020

@llogiq wrote a library to perform this translation with a function attribute, momo. It works with a small number of traits (Into, AsRef and AsMut). Having rustc first optimize those three traits would be a good first step before working with arbitrary traits.

In the far future, I wish that rustc were able to automatically transform generics into code that uses trait objects, whenever applicable (the opposite of the "devirtualization" optimization).

@dylni
Copy link
Contributor

dylni commented Oct 15, 2020

#77767 is a similar issue for const generics.

@thomcc
Copy link
Member

thomcc commented Oct 15, 2020

This is a specific case of "outlining", and while outlining would help various things in a lot of places, its probably better to handle this specifically.

@camelid
Copy link
Member

camelid commented Dec 18, 2020

For those (like me!) who don't know what 'trampoline' means in this sense of the word, I think it's related to this part of the Wikipedia article "Trampolines (computing)":

In embedded systems, trampolines are short snippets of code that start up other snippets of code. For example, rather than write interrupt handlers entirely in assembly language, another option is to write interrupt handlers mostly in C, and use a short trampoline to convert the assembly-language interrupt calling convention into the C calling convention.

@jyn514
Copy link
Member Author

jyn514 commented Dec 18, 2020

@camelid well sort of - I was mostly thinking of how the PLT is a trampoline: https://stackoverflow.com/questions/43048932/why-does-the-plt-exist-in-addition-to-the-got-instead-of-just-using-the-got. But yeah, same idea of having a short, well-known setup address that calls a much larger implementation.

@tgnottingham
Copy link
Contributor

tgnottingham commented Jan 4, 2021

The effect on runtime should be elaborated a bit. If I understand correctly, we wouldn't want to apply the transformation indiscriminately.

For example, functions that we actually want to be inlined shouldn't have this applied to them, because that would defeat the purpose of inlining. And I don't think we can expect LLVM to undo the transformation in these cases. At least it doesn't in the Error::_new example in the summary. That happens to be desirable in this case, or it's theorized to be, which is also something to think about. Maybe LLVM would inline if the non-generic impl was marked #[inline] though.

Anyway, there's probably more subtlety involved than that, but I just wanted to point out the general issue since it hadn't been mentioned.

There was also some discussion about this and the implementation of dispatching to the non-generic function in this thread about Momo.

@bluss
Copy link
Member

bluss commented Feb 17, 2021

One of the culprits in std (but not the only one) - is that closures inherit all type parameters in their scope, not just those that affect it, and solving that separately would allow us to remove some of the workarounds in Iterator specifically.

I don't know the status of polymorphization, is that solving enough to remove the need for the workarounds in Iterator?

@jumbatm
Copy link
Contributor

jumbatm commented Mar 15, 2021

Happy to take this :)

@rustbot claim

@cjgillot
Copy link
Contributor

Investigation report

This feature is close to type-erased code generation that other programming languages (Java, OCaml...) do. My approach was the following.

Consider a function

fn foo<T>(a: M<T>) -> R<T> {
  let c: C<T> = T::CONST;
  // stuff with T
  T::bar(a, c)
}

During MIR analyses, the trampoline and the callee version can be computed generically:

  • gather all constants out a MIR body, and replace them by appropriately-typed parameters;
  • gather all functions, and replace them by function pointer parameters.
fn foo<T>(a: M<T>) -> R<T> {
  let _2: C<T> = T::CONST;
  let _3: fn(M<T>, C<T>) -> R<T> = T::bar;
  foo_impl<T>(a, _2, _3)
}

fn foo_impl<T>(a: M<T>, c: C<T>, _3: fn(M<T>, C<T>) -> R<T>) -> R<T> {
  // stuff with T
  _3(a, c)
}

The next step is to only have one instance of foo_impl. Next, we introduce a placeholder type T_op. The code becomes:

extern { type T_impl; }

fn foo<T>(a: M<T>) -> R<T> {
  let _2: C<T> = T::CONST;
  let _3: fn(M<T>, C<T>) -> R<T> = T::bar;
  transmute(foo_impl(transmute(a), transmute(_2), transmute(_3)))
}

fn foo_impl(a: M<T_impl>, c: C<T_impl>, _3: fn(M<T_impl>, C<T_impl>) -> R<T_impl>) -> R<T_impl> {
  // stuff with T_impl
  _3(a, c)
}

As a last step, we need to check that all those transmutes are safe.

The problem is that Rust layouts all monomorphized types independently: in general M<T> and M<T_impl> have no reason to have a similar layout if T != U. In special cases, this can happen, for instance if M<T> only points to sized T through a reference.

As a consequence, we need to check that M<T> and M<T_impl> and all recursively accessible types have the same layout. Likewise for C<T> and R<T>. If any such test fails, the type-erasure is unsound, we need to abort and go back to full monomorphization. Actually, we should be able to restrict the check to all types that are involved in the MIR: types of locals, of accessed fields, of dereferenced pointers...

This last checking part is (IMHO) the most subtle part for a MVP of this feature.

@jumbatm
Copy link
Contributor

jumbatm commented Apr 19, 2021

Thank you for documenting your investigation.

For contrast and discussion's sake, I'll describe my approach.

Instead of type erasure, it does something similar to what Momo already does. First, we perform a dataflow analysis to determine whether the arguments' uses converge to a single type. For example:

fn bar<S: Into<String>>(s: S) {
    some_work(); // s: S
    more_work(); // s: S
    let s: String = s.into(); // s: String
    even_more_work(&s); // s: String
    yet_even_more_work(s); // s: String
}

We see in the example above that after the third line of bar, s is a String for all code paths. This means that from this statement onwards, the function is non-generic, so we can extract that out into a single instance. The generic "trampoline", therefore, just becomes the generic start of the function:

#[inline]
fn bar<S: Into<String>>(s: S) {
    some_work();
    more_work();
    let s: String = s.into();
    bar_impl(s);
}

fn bar_impl(s: String) {
    even_more_work(&s);
    yet_even_more_work(s);
}

As you can see, this is basically what Momo does, but generalised out with a dataflow analysis so the optimisation can be applied automatically.

I think our approaches actually target different scenarios, and may be orthogonal: While my approach is, like Momo, specifically targeted to cases like the Error example given above, your approach uses the trampoline to merge monomorphisations of parameters with the same layout. If I'm not mistaken, your approach makes the observation that when two generic functions differ in type parameters, they may really be equivalent on layout (which your approach handles by erasing the type) and only differ on specific handling (which your approach handles with the function pointer parameter). My concern with this approach is that it may not apply to scenarios such as in the Error example above: while two types may both implement Into<Box<_>>, it does not necessarily mean that their layouts will be identical, or even similar. I'm also concerned about the call via function pointer, because we don't want to introduce a dynamic dispatch where a user would expect a static dispatch (though this concern may be moot if const prop is able to resolve the call directly).

Again, thank you for documenting your investigation. I appreciate the write-up -- it's given me a lot to think about. Please do let me know what you think of my approach, and if I've missed anything.

[...] we need to check that M and M<T_impl> and all recursively accessible types have the same layout. [...] This last checking part is (IMHO) the most subtle part for a MVP of this feature.

Ooh! I've actually previously implemented a similar check for layout equivalence, which may be of interest: https://github.com/rust-lang/rust/blob/master/compiler/rustc_lint/src/builtin.rs#L2696-L2704.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-mir-opt Area: MIR optimizations A-monomorphization Area: Monomorphization C-feature-request Category: A feature request, i.e: not implemented / a PR. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-compiletime Issue: Problems and improvements with respect to compile times. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests