Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating Rust enums for C++ enums causes NonZero-optimization footguns #225

Closed
bholley opened this issue Nov 8, 2016 · 20 comments
Closed

Comments

@bholley
Copy link

bholley commented Nov 8, 2016

Finally got to the bottom of what was making my incremental restyle stuff crash.

Gecko has nsChangeHint, which is a bitmask declared as an enum [1]. Bindgen currently generates a compatible rust enum as shown in [2] ([3] in case the former goes away).

The problem is that Rust makes very strong assumptions about enums, and in particular assumes that enums will never have representations other than the ones listed in the declaration. And if there is no zero value declared, then [4] causes the NonZero optimization to kick in, which means that Option will use the inner enum as the discriminant. And so if somebody happens to pass nsChangeHint(0), Rust will think the entire outer struct is None.

Gecko is arguably abusing |enum| here, but people do this all the time in C++. So we should either find a safer representation, or, barring that, at least generate checked conversion routines (like bitflags! has) that callers can use instead of transmute.

Thoughts?

CC @Manishearth @emilio @nox @fitzgen @vlad @heycam @upsuper

[1] http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/layout/base/nsChangeHint.h#19
[2] https://github.com/bholley/gecko-dev/blob/59bfa64706f8dc5743fbf7c7fdcd8c1478af4864/servo/components/style/gecko_bindings/structs_debug.rs#L5067
[3] https://pastebin.mozilla.org/8926070
[4] rust-lang/rust#37224

@heycam
Copy link
Contributor

heycam commented Nov 8, 2016

Nice find. Can we make rust-bindgen generate a value for 0 if the C++ enum didn't have one? That might be awkward if we match against such enums, since we now have an extra (useless) case to handle, but for bitflag-like enums it's not so likely we'd be using match anyway.

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

Maybe we should add an option to declare all bitflag enums, and make bindgen generate a tuple struct as well as corresponding bit operators and constants for those enums instead of a Rust enum.

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

This can probably be done in regen script. We could just add those types into blacklist, and add template for generating code manually.

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

Oh, hmmm, that is impossible to be done completely in regen script, because we need the constants... Probably we need to add some options to bindgen to support those kind of things, then.

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

Yeah, it's sort of straight forward to generate constants for enums, though probably a wrapper on top of it, something like:

struct ChangeHint(u32);

impl BitOr<nsChangeHint> for ChangeHint {
    type Output = Self;
    fn bitor(self, rhs: nsChangeHint) -> Self {
        ChangeHint(self.0 | rhs as u32)
    }
}

Though given we need to pass that to Gecko, seems straight-forward to generate something like:

type nsChangeHint = u32;

const nsChangeHint_XXX: nsChangeHint = 1;
// And so on...

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

What in my mind is something like

struct nsChangeHint(u32);
impl BitOr<nsChangeHint> for nsChanghint { ... }
const nsChangeHint_XXX: nsChangeHint = nsChangeHint(1);

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

That sounds more complete, I like it... I'll try to write something today, not totally sure I'll find the time though.

@Manishearth
Copy link
Member

I had a couple solutions in mind:

  • Fix gecko's bitflags enums to have a none variant. IMO this is an abuse and I don't like that C++ folks do it all the time 😜 We should use constants.
  • Give bindgen an option to always zero-guard enums
  • Give bindgen an option to zero-guard specially tagged enums. Though if we're doing this we might as well do the first option?
  • Give bindgen an option that forces all nonzero enums to either be tagged as "never zero" or tagged as "may be zero" (gets zero-guarded or converted to a bitflag). Untagged enums will cause an error.
  • Make bindgen generate bitflags for specially tagged enums. Again, if we're doing this we might as well take the first option (in addition to this).
  • Remove the optimization for repr(C).

@Manishearth
Copy link
Member

Actually, if nsChangeHint is a bitflag-thingy, it's very wrong to use it as an enum on the Rust side even if we do handle the zero thing. When matching a composite mask, Rust will go hit a hidden match path (fortunately not UB), print something about missing discriminants, and abort. We definitely should be creating a bitmask for this.

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

I guess bitflag is probably the right approach. It seems to me bitflags! is designed for this kind of scenarios.

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

The problem with bitflags! is that you force a new dependency on bindgen consumers, plus it's not repr(C), plus generating code is probably a lot harder and a one-off. I've written #226 real quick, let me know if you think something like that (probably refined with default implementations for common stuff and similar) would do it.

@upsuper
Copy link
Contributor

upsuper commented Nov 8, 2016

Yeah, I was concerned about the additional dependency...

Bitflags can be set repr(C), anyway. IIUC, you can do

bitflags! {
    #[repr(C)] flags Flags: u32 {
        const FLAG_A       = 0b00000001,
        const FLAG_B       = 0b00000010,
        const FLAG_C       = 0b00000100,
        const FLAG_ABC     = FLAG_A.bits
                           | FLAG_B.bits
                           | FLAG_C.bits,
    }
}

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

Oh really? TIL. I guess it's a nice to have, will fill an issue for that

@nox
Copy link
Contributor

nox commented Nov 8, 2016

I'm +1 for generating bitflags code, and I think upstream already does that.

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

It doesn't (https://github.com/Yamakaky/rust-bindgen/pull/284), but yeah.

@petrochenkov
Copy link

@Manishearth
Treating zero specially is not future-proof.
The plans are to "compact" all enums using their invalid discriminant values, not necessarily zero.
(E.g. 3 will be used as a special value for enum E { A = 0, B = 1, C = 2 }.)
So, creating invalid enums through FFI is bad regardless of concrete values.

@emilio
Copy link
Contributor

emilio commented Nov 8, 2016

@petrochenkov I think, though, that doing layout or range optimizations on #[repr(C)] enums is kind of unfortunate.

@eddyb
Copy link
Member

eddyb commented Nov 8, 2016

When matching a composite mask, Rust will go hit a hidden match path (fortunately not UB), print something about missing discriminants, and abort.

What do you mean? Using values not declared is and has been UB since long before 1.0.

@Manishearth
Copy link
Member

What do you mean? Using values not declared is and has been UB since long before 1.0.

TIL, I thought it wasn't UB for C like enums. My bad.

@emilio
Copy link
Contributor

emilio commented Dec 18, 2016

I think this can be closed now, we have support for defining bitfield-like enums that avoids this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants