Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint against treating c_char as i8 or u8, for portability #79089

Open
joshtriplett opened this issue Nov 16, 2020 · 10 comments
Open

Lint against treating c_char as i8 or u8, for portability #79089

joshtriplett opened this issue Nov 16, 2020 · 10 comments
Labels
A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. O-android Operating system: Android O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-PowerPC Target: PowerPC processors O-SystemZ Target: SystemZ processors (s390x) T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@joshtriplett
Copy link
Member

One common source of portability issues is making assumptions about whether c_char is i8 or u8. Code written for one platform may fail to build on another due because of this.

I think we could detect this fairly easily: we could add an attribute that means "yes, this is a type alias, but treat it as a slightly distinct type, and lint when matching it against the aliased type if not going through this type alias to do so".

This would help people avoid gratuitous portability issues.

@joshtriplett joshtriplett added T-lang Relevant to the language team, which will review and decide on the PR/issue. A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-PowerPC Target: PowerPC processors O-SystemZ Target: SystemZ processors (s390x) labels Nov 16, 2020
@joshtriplett
Copy link
Member Author

I've tagged this T-lang (because it's a proposed new lint and new attribute), and I've also tagged it for the three common architectures where char differs from the normal signedness.

@jyn514
Copy link
Member

jyn514 commented Nov 16, 2020

Another lint I'd like: treating i32 or i64 as c_long.

@the8472
Copy link
Member

the8472 commented Nov 16, 2020

Should such a lint still fire if the code is cfg'd to a specific platform?

@est31
Copy link
Member

est31 commented Nov 16, 2020

cc #41619

@jonas-schievink jonas-schievink added the E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. label Nov 16, 2020
@jyn514
Copy link
Member

jyn514 commented Nov 16, 2020

Should such a lint still fire if the code is cfg'd to a specific platform?

I would not expect it to, no.

@jyn514 jyn514 added the O-android Operating system: Android label Nov 17, 2020
@joshtriplett
Copy link
Member Author

As a first pass, it would be acceptable if the lint triggered even in that case. It could be allow by default, for instance.

In other words I don't think it should be gated on the portability lint, though once the portability lint is available this should integrate with it.

@jacobbramley
Copy link
Contributor

Note, as well, that this isn't clear in the libc documentation (which seems to be generated from the x86_64 build).

@est31
Copy link
Member

est31 commented Sep 22, 2021

@jacobbramley in the standard library this issue has been improved recently with PR #89114 . The libc crate's repository is here: https://github.com/rust-lang/libc

@workingjubilee
Copy link
Member

A motivating example for this problem:

use std::os::raw::c_char;
fn main() {
    let x: u8 = 0b10000000; // set the last bit. 128 in decimal
    let x = x as i8 as c_char;
    let x = x >> 1; // a non-portable operation
}

Neither Clippy nor the compiler lint on this right shift, but if c_char is signed, then this is an arithmetic shift, but if c_char is unsigned, then this is a logical shift. This is a severe portability hazard, as x now either has the bit pattern 0b11000000 or 0b01000000.

@Kmeakin
Copy link
Contributor

Kmeakin commented Oct 14, 2021

The issue is deeper than we first thought. The fundamental problem is that the Rust type system does not distinguish between a type alias and its aliased type: type-alises are eagerly resolved during type-checking. This means that a lint which only has access to the HIR and the result of type-checking cannot easily detect when an i8/u8 is being used as a c_char (I will use c_char throughout this post, but the same issue applies to types like c_long whose sign and/or width changes between platforms).

We have found 3 possible solutions, none of which is fully satisfactory:

  1. Write a lint to detect uses of improper uses of c_char. This would necessarily be incomplete: for example, 0_u8 as c_char is trivial to detect, as c_char is explicitly mentioned in the cast. However, in general detecting treatment of a u8/i8 as a c_char requires arbitrarily complex type inference:
fn require_types_equal<T>(x: T, y: T) {}
 
fn main() {
    let x: c_char = 0;
    let y = 0_u8 as _;
    require_types_equal(x, y);
}

Here inferring the target type of y's cast requires information from both the line before and after its definition. As we extend the lint to detect more and more cases, we end up effectively replicating Rust's type inference algorithm (inevitably not 100% faithfully).

  1. Change how type inference works to make these kind of uses easier to detect in lints - perhaps add an attribute for type-aliases that should be stronger than an aliases but less strong than a distinct struct. This would be the most general solution, and could provide utility to other projects in the future, but also requires modifying the type-checker, one of the most complex parts of the compiler.

  2. Deprecate the c_char type-alias and instead create a repr(transparent) struct:

#[cfg(target_arch = "x86_64")]
pub type CCharRepr = i8;

#[cfg(target_arch = "aarch64")]
pub type CCharRepr = u8;

#[repr(transparent)]
pub struct CChar {
    repr: CCharRepr
}

impl CChar {
    pub fn new(repr: CCharRepr) -> Self {
        Self { repr }
    }

    pub unsafe fn get_inner(&self) -> CCharRepr {
        self.repr
    }
}

The intention here is that CChar is an opaque type with no operations defined on it: it is only good for passing to extern "C" functions. There is still a getter to access the internal representation, but it is unsafe to emphasize that any operations on the internal representation may be platform-specific (for example, bitshifts behave differently depending on sign).

This has the advantage of only requiring changes to the standard library, not the compiler. Deprecating c_char is not by itself backwards-incompatible: c_char would still be available for existing code that has not yet migrated. However, c_char is also exposed used the public APIs of parts of the standard library, for example std::ffi::CString::into_raw returns a *mut c_char, and std::ffi::CStr::as_ptr returns a *const c_char. It would be unsatisfactory for these functions to return a deprecated type, but changing their return types would be a breaking change and require a new edition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. O-android Operating system: Android O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-PowerPC Target: PowerPC processors O-SystemZ Target: SystemZ processors (s390x) T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants