Closures in generic code can create duplicate monomorphizations #83010

tgnottingham · 2021-03-11T09:58:29Z

Summary

When a closure is defined in a generic context like a generic function or a method of a generic class, its type includes, in some sense or another, the parent's generic parameters. If the closure doesn't depend on some of those generic parameters, it can lead to the instantiation of mono items which are functionally identical, but distinct by virtue of false dependencies on unused generic parameters of the parent. I'll call these mono items "duplicates".

These duplicates create needless work and code bloat. At the very least, they go through the codegen and optimization pipeline, and if not optimized away, are included in the binary. I imagine there is other overhead, such as creation of more distinct types.

Example 1

fn foo<T: std::fmt::Display>(t: T) {
    let _ = vec!["hello".to_string()].iter().map(|x| x);
    println!("{}", t);
}

fn main() {
    foo(0i32);
    foo(0i64);
}

The map call is passed a closure |x| x which does not depend the parent function's generic parameter T. The code for that closure is identical in both foo::<i32> and foo::<i64>, so we should be able to only generate one map instantiation and Map type (the return type of map, which is generic over the closure that map was passed), and use it in both foo instances. Yet this creates two map instantiations, and two Map::new instantiations, because the closure type is polluted by the parent's T parameter, which, again, the closure doesn't use in any way.

Excerpt from -Z print-mono-items=lazy:

MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]
MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]

MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]
MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]

(I don't know why, but there weren't any mono items in the output for the closure itself in this example.)

If we actually did anything interesting with the Maps, it would create yet more duplicate instantiations. And if the closure captured anything by value that needed dropping, it would create duplicate drop_in_place instantiations for the closure and Map types.

Example 2

This isn't a rare event. Here's an example from the standard library which was quickly and easily found. This case is different, because the parent generics come from the struct for which the function is implemented:

impl<K, V, S> PartialEq for HashMap<K, V, S>
where
    K: Eq + Hash,
    V: PartialEq,
    S: BuildHasher,
{
    fn eq(&self, other: &HashMap<K, V, S>) -> bool {
        if self.len() != other.len() {
            return false;
        }

        self.iter().all(|(key, value)| other.get(key).map_or(false, |v| *value == *v))
    }
}

The closure passed to map_or really only depends on V, yet it has false dependencies on K and S. Creating two HashMaps with the same value type will cause duplicate mono items to be created for this closure and map_or:

use std::collections::HashMap;

fn main() {
    let m1 = HashMap::<String, u8>::new();
    assert_eq!(m1, m1);

    let m2 = HashMap::<i32, u8>::new();
    assert_eq!(m2, m2);
}

Excerpt from -Z print-mono-items=lazy:

MONO_ITEM fn <std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]
MONO_ITEM fn <std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]

MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]
MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]

Impact

Okay, so there are some duplicate mono items. Maybe even a lot in a real codebase. But is it impactful? I haven't measured, but I suspect it is. Take a look at the Rust and assembly for Option::map_or (for the above closure) in debug mode, and consider the above example:

    pub fn map_or<U, F: FnOnce(T) -> U>(self, default: U, f: F) -> U {
        match self {
            Some(t) => f(t),
            None => default,
        }
    }

Assembly

<<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>>:
	                sub    $0x58,%rsp
	                mov    %rdi,0x18(%rsp)
	                mov    %sil,%al
	                and    $0x1,%al
	                mov    %al,0x37(%rsp)
	                mov    %rdx,0x38(%rsp)
	                movb   $0x0,0x36(%rsp)
	                movb   $0x0,0x35(%rsp)
	                movb   $0x1,0x36(%rsp)
	                movb   $0x1,0x35(%rsp)
	                mov    0x18(%rsp),%rcx
	                test   %rcx,%rcx
	                setne  %al
	                movzbl %al,%r8d
	                mov    %r8d,%ecx
	                mov    %rdx,0x10(%rsp)
	                mov    %sil,0xf(%rsp)
	         /----- je     <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4d>
	         |  /-- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4b>
	      /--|--\-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x60>
	      |  \----> movb   $0x0,0x36(%rsp)
	      |         mov    0xf(%rsp),%al
	      |         and    $0x1,%al
	      |         mov    %al,0x27(%rsp)
	/-----|-------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
	|     |         ud2    
	|     \-------> mov    0x18(%rsp),%rax
	|               mov    %rax,0x40(%rsp)
	|               movb   $0x0,0x35(%rsp)
	|               mov    %rax,0x28(%rsp)
	|               mov    0x28(%rsp),%rsi
	|               mov    0x10(%rsp),%rdi
	|               callq  <<std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>
	|               mov    %al,0xe(%rsp)
	|           /-- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x89>
	|           \-> mov    0xe(%rsp),%al
	|               and    $0x1,%al
	|               mov    %al,0x27(%rsp)
	+-------------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
	|  /----------> testb  $0x1,0x36(%rsp)
	|  |  /-------- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb5>
	|  |  |  /----> mov    0x27(%rsp),%al
	|  |  |  |      and    $0x1,%al
	|  |  |  |      movzbl %al,%eax
	|  |  |  |      add    $0x58,%rsp
	|  |  |  |      retq   
	|  +--|--|--/-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
	\--|--|--|--|-> testb  $0x1,0x35(%rsp)
	   |  |  |  \-- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xaa>
	   \--|--|----- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
	      \--\----X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x9c>
	         /--/-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xc0>
	      /--|--|-> testb  $0x1,0x36(%rsp)
	      |  |  \-- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb7>
	      |  \----> mov    0x48(%rsp),%rdi
	      |         callq  <_Unwind_Resume@plt>
	      |         ud2    
	      |         mov    %rax,0x48(%rsp)
	      |         mov    %edx,0x50(%rsp)
	      \-------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb9>
	                nopw   0x0(%rax,%rax,1)

(Wow, that's surprisingly complicated.)

In debug mode, this code is included in the binary twice, once for each closure. In release mode, the code may be optimized out, but it's still processed for some part of compilation. This was a trivial example. Imagine if map_or was a complicated function!

If there are hundreds or thousands of duplicate mono items in moderate to large codebases, which seems quite possible, then this may have a noticeable impact on compile times and binary sizes.

Solution

If possible, we should make closure types depend only on the parent generics which they actually use.

The ClosureSubsts comment explains why parent generics are included:

/// All right, you say, but why include the type parameters from the
/// original function then? The answer is that codegen may need them
/// when monomorphizing, and they may not appear in the upvars. A
/// closure could capture no variables but still make use of some
/// in-scope type parameter with a bound (e.g., if our example above
/// had an extra `U: Default`, and the closure called `U::default()`).
///
/// There is another reason. This design (implicitly) prohibits
/// closures from capturing themselves (except via a trait
/// object). This simplifies closure inference considerably, since it
/// means that when we infer the kind of a closure or its upvars, we
/// don't have to handle cycles where the decisions we make for
/// closure C wind up influencing the decisions we ought to make for
/// closure C (which would then require fixed point iteration to
/// handle). Plus it fixes an ICE. :P

See #27087, where this was introduced. #27086 may also be relevant. Here and here are FIXMEs for that issue, later removed.

I have a feeling the case where the parent generics come from a generic struct on which the function including the closure is defined (example 2) may be harder to solve.

@rustbot label T-compiler A-typesystem I-compiletime I-heavy

The text was updated successfully, but these errors were encountered:

Aaron1011 · 2021-03-11T20:20:06Z

I believe this is polymorphization: https://rust-lang.github.io/compiler-team/working-groups/polymorphization/

tgnottingham · 2021-03-11T20:51:59Z

Yeah, polymorphization might be a solution. Though, if it's feasible to just not depend on the unused parent generics, that would be preferable over polymorphization, since it means the duplicates are never even introduced. And -Z polymorphize isn't smart enough to handle this yet either.

panstromek · 2021-03-15T20:42:11Z

Note for context: The original motivating issue which was closed by polymorhization talks about basically the same problem: #46477

tgnottingham · 2023-08-18T03:11:52Z

Closing this since it's redundant. Solving this by simply treating two types -- which are otherwise equivalent except that they contain different unused generics -- as equivalent, as I originally thought might be possible as a kind of alternative to polymorphization, is probably not viable, at the very least because the two types still need to have different TypeIds (see #75325). There are probably other reasons why it's not viable, but one reason is good enough.

tgnottingham closed this as completed Aug 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Closures in generic code can create duplicate monomorphizations #83010

Closures in generic code can create duplicate monomorphizations #83010

tgnottingham commented Mar 11, 2021 •

edited

Loading

Aaron1011 commented Mar 11, 2021

tgnottingham commented Mar 11, 2021 •

edited

Loading

panstromek commented Mar 15, 2021

tgnottingham commented Aug 18, 2023

Closures in generic code can create duplicate monomorphizations #83010

Closures in generic code can create duplicate monomorphizations #83010

Comments

tgnottingham commented Mar 11, 2021 • edited Loading

Summary

Example 1

Example 2

Impact

Solution

Aaron1011 commented Mar 11, 2021

tgnottingham commented Mar 11, 2021 • edited Loading

panstromek commented Mar 15, 2021

tgnottingham commented Aug 18, 2023

tgnottingham commented Mar 11, 2021 •

edited

Loading

tgnottingham commented Mar 11, 2021 •

edited

Loading