Valid value checks does not check `char` surrogate values #3241

celinval · 2024-06-07T18:46:57Z

I tried this code:

#[kani::proof]
fn transmute_surrogate_ub() {
    unsafe {
        let val: u32 = kani::any();
        kani::assume(val < char::MAX.into());
        let c: char = std::mem::transmute::<u32, char>(val) as char;
        match val {
            0..0xD800 | 0xE000..0x110000 => assert!(char::from_u32(val).is_some()),
            _ => unreachable!(),
        }
    }
}

using the following command line invocation:

kani -Zvalid-value-checks char.rs

with Kani version: 0.52.0

I expected to see this happen: A valid value check failure

Instead, this happened: An unreachable block being reached due to UB

Failed Checks: internal error: entered unreachable code: 
 File: "char.rs", line 31, in transmute_surrogate_ub

VERIFICATION:- FAILED
Verification Time: 0.12348909s

The text was updated successfully, but these errors were encountered:

adpaco-aws · 2024-07-12T15:43:19Z

I've been investigating this issue and here's what I've found out.

First, the ValidValueReq we process for char specifies a valid_range that doesn't take into account the surrogate gap in the middle: [ValidValueReq { offset: 0, size: MachineSize { num_bits: 32 }, valid_range: 0..=1114111 }]

Note: 1114111 is 0x10FFFF (i.e., char::MAX)

I tried to special-case the ty_req closure in ty_validity_per_offset to return two valid ranges when a char is found:

diff --git a/kani-compiler/src/kani_middle/transform/check_values.rs b/kani-compiler/src/kani_middle/transform/check_values.rs
index a7d0f14d270..253963a4ef6 100644
--- a/kani-compiler/src/kani_middle/transform/check_values.rs
+++ b/kani-compiler/src/kani_middle/transform/check_values.rs
@@ -839,8 +839,21 @@ pub fn ty_validity_per_offset(
     ty: Ty,
     current_offset: usize,
 ) -> Result<Vec<ValidValueReq>, String> {
+    debug!(typ=?ty, kind=?ty.kind(), "ty_validity_per_offset");
     let layout = ty.layout().unwrap().shape();
     let ty_req = || {
+        if ty.kind() == TyKind::RigidTy(RigidTy::Char) {
+            let shape = ty.layout().unwrap().shape();
+            let value_size = match shape.abi {
+                ValueAbi::Scalar(Scalar::Initialized { value, .. })
+                | ValueAbi::ScalarPair(Scalar::Initialized { value, .. }, _) => {
+                    Some(value.size(machine_info))
+                }
+                _ => None,
+            };
+            return vec![ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0, end: 0xD7FF }},
+            ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0xE000, end: 0x10FFFF }}]
+        };
         if let Some(mut req) = ValidValueReq::try_from_ty(machine_info, ty)

However, that didn't have the intended effect as each range check is being encoded separately. In other words, this change will generate two separate checks as follows:

assert!((0..=0xD7FF).contains(value));
assert!((0xE000..0x10FFFF).contains(value));

which doesn't work well because it's requiring value to be in BOTH ranges.

It's not clear to me that the check generation should be done this way for multiple ranges. For example, in this code

                SourceOp::BytesValidity { ranges, target_ty, rvalue } => {
                    let value = body.new_assignment(rvalue, &mut source, InsertPosition::Before);
                    let rvalue_ptr = Rvalue::AddressOf(Mutability::Not, Place::from(value));
                    for range in ranges {
                        let result = build_limits(body, &range, rvalue_ptr.clone(), &mut source);
                        let msg =
                            format!("Undefined Behavior: Invalid value of type `{target_ty}`",);
                        body.add_check(
                            tcx,
                            &self.check_type,
                            &mut source,
                            InsertPosition::Before,
                            result,
                            &msg,
                        );
                    }
                }

in which cases does it make sense to generate more than one check?

Another option is to extend ValidValueReq to be able to work with multiple ranges. This seems to be nontracked pending work according to the comments in the source code. Maybe replacing/wrapping it with another structure like RangeSet could save us work here.

adpaco-aws · 2024-07-15T21:30:25Z

Unfortunately, the option of extending ValidValueReq is more involved than I initially expected for a couple reasons:

It's based on WrappingRange which comes from the ABI API of StableMIR.
WrappingRange doesn't consider gaps within a range like the ones that we'd need for char surrogate values.

It's not clear to me such changes would pay off if char is the only type with a gap.

Would it be OK to change the range loop above so that build_limits constructs a single check that ensures the value falls within one of the ranges? I've just realized this wouldn't be OK if the ranges have different offsets, but it doesn't make sense to generate multiple checks for a particular offset.

Ensure value validity checks for `char` in Kani matches the Rust documentation: > A char value must not be a surrogate (i.e., must not be in the range 0xD800..=0xDFFF) and must be equal to or less than char::MAX. The existing code was relying on the compiler ABI information, which can only express one continuous value range for value validity. But `char` is a special case, and the Rust compiler understands the gap in valid `char`. This change makes Kani compiler aware of this gap too. Resolves #3241 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

celinval added the [C] Bug This is a bug. Something isn't working. label Jun 7, 2024

celinval assigned adpaco-aws Jun 17, 2024

celinval mentioned this issue Jun 27, 2024

Implement validity checks for unsupported constructs #3203

Open

5 tasks

celinval added [F] Soundness Kani failed to detect an issue Z-UnstableFeature Issues that only occur if a unstable feature is enabled labels Jul 4, 2024

adpaco-aws mentioned this issue Jul 19, 2024

[Draft] Combine multiple range checks to handle noncontiguous ranges #3362

Closed

adpaco-aws added T-High Priority Tag issues that have high priority and removed T-High Priority Tag issues that have high priority labels Jul 25, 2024

adpaco-aws assigned celinval and unassigned adpaco-aws Aug 13, 2024

zhassan-aws mentioned this issue Oct 18, 2024

Verify safety of to_lower and to_upper model-checking/verify-rust-std#115

Merged

celinval mentioned this issue Jan 24, 2025

Fix validity checks for char #3853

Merged

celinval closed this as completed in #3853 Jan 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Valid value checks does not check `char` surrogate values #3241

Valid value checks does not check `char` surrogate values #3241

celinval commented Jun 7, 2024

adpaco-aws commented Jul 12, 2024

adpaco-aws commented Jul 15, 2024

Valid value checks does not check char surrogate values #3241

Valid value checks does not check char surrogate values #3241

Comments

celinval commented Jun 7, 2024

adpaco-aws commented Jul 12, 2024

adpaco-aws commented Jul 15, 2024

Valid value checks does not check `char` surrogate values #3241

Valid value checks does not check `char` surrogate values #3241