Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical section causes unexpected behaviour in --release mode #505

Closed
gregoryholder opened this issue Feb 9, 2024 · 10 comments · Fixed by #523
Closed

Critical section causes unexpected behaviour in --release mode #505

gregoryholder opened this issue Feb 9, 2024 · 10 comments · Fixed by #523
Labels
compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM

Comments

@gregoryholder
Copy link

gregoryholder commented Feb 9, 2024

I've come across inconsistent behavior when attempting to output a u64 to serial using ufmt from within a critical section.

Originally this came up because I was using a modified version of a print! macro from #115, and I was noticing that my interrupts would stop after a few iterations.

Edit: this is with the --release flag. Removing it seems to solve the issue, although I'd still be curious to know why.


(near)Minimal test-case:

#![no_std]
#![no_main]
#![feature(abi_avr_interrupt)]

use core::cell::Cell;

use avr_device::interrupt::Mutex;
use panic_halt as _;
use arduino_hal::prelude::*;

#[arduino_hal::entry]
fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);

    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    let tc0 = dp.TC0;

    tc0.tccr0a.write(|w| w.wgm0().ctc());
    tc0.ocr0a.write(|w| w.bits(255));
    tc0.tccr0b.write(|w| w.cs0().prescale_1024());
    tc0.timsk0.write(|w| w.ocie0a().set_bit());


    unsafe {
        avr_device::interrupt::enable();
    }

    loop {
        
        let counter = avr_device::interrupt::free(|cs| {
            INT_COUNTER.borrow(cs).get()
        });
        
        let int_en = avr_device::interrupt::is_enabled();
        
        // unnessesary `free` section to replicate the `print!` macro
        avr_device::interrupt::free(|_| {
            ufmt::uwriteln!(serial, "Counter: {}, interrupts enabled: {}", counter, int_en).unwrap_infallible();
        });
        
        arduino_hal::delay_ms(100);

    }
}


static INT_COUNTER : Mutex<Cell<u64>> = Mutex::new(Cell::new(0)); // doesn't work
// static INT_COUNTER : Mutex<Cell<u32>> = Mutex::new(Cell::new(0)); // works ??

#[avr_device::interrupt(atmega328p)]
unsafe fn TIMER0_COMPA() {
    let cs = avr_device::interrupt::CriticalSection::new();

    let counter_cell = INT_COUNTER.borrow(cs);

    let counter = counter_cell.get();
    counter_cell.set(counter + 1);
}

Which outputs:

[...]
  Programmed target/avr-atmega328p/release/minimal.elf
     Console /dev/ttyACM1 at 57600 baud
             CTRL+C to exit.

Counter: 0, interrupts enabled: true
Counter: 6, interrupts enabled: true
Counter: 12, interrupts enabled: false
Counter: 12, interrupts enabled: false
Counter: 12, interrupts enabled: false
[...]

Changing inner type

If I change INT_COUNTER's inner type to u32, or I don't print the counter, interrupts stay enabled and the counter correctly increments.

[...]
  Programmed target/avr-atmega328p/release/minimal.elf
     Console /dev/ttyACM1 at 57600 baud
             CTRL+C to exit.

Counter: 0, interrupts enabled: true
Counter: 6, interrupts enabled: true
Counter: 12, interrupts enabled: true
Counter: 19, interrupts enabled: true
Counter: 25, interrupts enabled: true
Counter: 32, interrupts enabled: true
[...]

Adding a condition

Even stranger still, if I add a condition to whether I print the counter (with u64), IT STARTS WORKING AGAIN...

avr_device::interrupt::free(|_| {
    // this makes it work again
    if counter > 100 {
        ufmt::uwriteln!(serial, "Counter: {}, interrupts enabled: {}", counter, int_en).unwrap_infallible();
    }
});
[...]
  Programmed target/avr-atmega328p/release/minimal.elf
     Console /dev/ttyACM1 at 57600 baud
             CTRL+C to exit.

Counter: 103, interrupts enabled: true
Counter: 110, interrupts enabled: true
Counter: 116, interrupts enabled: true
Counter: 123, interrupts enabled: true
Counter: 129, interrupts enabled: true
Counter: 136, interrupts enabled: true
[...]

Any advice would be appreciated :)


Dependencies

[dependencies]
panic-halt = "0.2.0"
ufmt = "0.2.0"
nb = "1.1.0"
embedded-hal = "1.0.0"
avr-device = "0.5.4"

[dependencies.arduino-hal]
git = "https://github.com/rahix/avr-hal"
rev = "e312e6e1620bd09470afee71e6590da0ba96f369"
features = ["arduino-uno"]

Tested on uno and mega2560.

@gregoryholder
Copy link
Author

gregoryholder commented Feb 9, 2024

Update: I was running cargo run --release (out of habit), running without the --release flag seems to fix it. Still strange...

@gregoryholder gregoryholder changed the title Using ufmt::uwriteln! with a u64 in a critical section sometimes disables interrupts Critical section causes unexpected behaviour in --release mode Feb 9, 2024
@gregoryholder
Copy link
Author

gregoryholder commented Feb 9, 2024

I've found a workaround by adding #[inline(never)] which works with the release build, as it seems the critical section closure gets optimized away into something else...

avr_device::interrupt::free(
    #[inline(never)]
    |_| {
        ufmt::uwriteln!(
            serial,
            "Counter: {}, interrupts enabled: {}",
            counter,
            int_en
        )
        .unwrap_infallible();
    },
);

@gregoryholder
Copy link
Author

I'm suspecting this is a compilation / linker issue rather than crate-level issue.
If I change

[profile.release]
panic = "abort"
codegen-units = 1
debug = true
lto = true
opt-level = "s"

to

[profile.release]
panic = "abort"
codegen-units = 2 # anything > 1
debug = true
opt-level = "s"

my example works again.

Similarly, if I enable either overflow-checks or debug-assertions, it also works:

[profile.release]
panic = "abort"
codegen-units = 1
debug = true
lto = true
opt-level = "s"
overflow-checks = true  # enable either one (or both)
debug-assertions = true # enable either one (or both)

@Rahix
Copy link
Owner

Rahix commented Feb 10, 2024

Paging @Patryk27, maybe you know something here?

Otherwise we'll need to look at the generated assembly to find out where this is falling apart. My first guess would be that some u64 intrinsics are not working correctly...

@Rahix Rahix added the compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM label Feb 10, 2024
@Patryk27
Copy link
Contributor

Judging by the behavior, this does look like a codegen bug, but I don't recall anything that could cause it (in this case the intrinsics should be fine, but always worth to take a look there, considering the history 😄) - I'll also try to take a look during the week.

@Patryk27
Copy link
Contributor

Minimized:

#![no_std]
#![no_main]

use avr_device::interrupt;
use panic_halt as _;

#[arduino_hal::entry]
fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    unsafe {
        interrupt::enable();
    }

    let irq = interrupt::disable_save();
    let counter = core::hint::black_box(123u64);

    _ = ufmt::uwriteln!(serial, "{}", counter);

    if irq.enabled() {
        _ = ufmt::uwriteln!(serial, "irq? 1");
    } else {
        _ = ufmt::uwriteln!(serial, "irq? 0");
    }

    loop {
        //
    }
}

As it is, it prints irq? 0 (which is wrong), but changing 123u64 to 123u32 brings back irq? 1, so it looks like the registers get mixed up somewhere, investigating 🔍

@Patryk27
Copy link
Contributor

Patryk27 commented Feb 15, 2024

Alright, this one seems to be a proper bug in the LLVM codegen (i.e. doesn't seem to be related to the __udivdi3 etc. intrinsics this time 😄).

Minimized even more (by inlining stuff done by uwriteln!() etc.) - prints D, while it should print E:

#![no_std]
#![no_main]

use avr_device::interrupt;
use panic_halt as _;

#[arduino_hal::entry]
fn main() -> ! {
    unsafe {
        interrupt::enable();
    }

    let irq = interrupt::disable_save();
    let counter = core::hint::black_box(12u64); // at least a two-digit number is required to trigger the bug

    let mut buf = [0; 3];

    core::hint::black_box(fmt_num(&mut buf, counter));

    if irq.enabled() {
        report_irq_enabled();
    } else {
        report_irq_disabled();
    }

    loop {
        //
    }
}

fn fmt_num(buf: &mut [u8], mut n: u64) -> &[u8] { // corresponds to uxx!() from ufmt
    let mut i = 0;

    loop {
        unsafe {
            *buf.get_unchecked_mut(i) = (n % 10) as u8;
        }

        n /= 10;

        if n == 0 {
            break;
        } else {
            i += 1;
        }
    }

    if i > 2 { // branch not taken, but required for the bug to occur
        panic!("ayy");
    }

    buf
}

// separated from `main()` to make following the assembly easier
#[inline(never)]
fn report_irq_enabled() {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    serial.write_byte(b'E');
    serial.write_byte(b'\n');
}

// ditto
#[inline(never)]
fn report_irq_disabled() {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    serial.write_byte(b'D');
    serial.write_byte(b'\n');
}

If we inspect the assembly (e.g. by running avr-objdump or gdb on the *.elf file), we'll see four important places:

 202: 78 94        sei           // `interrupt::enable()`
 204: 8f b7        in r24, 0x3f  // `interrupt::disable_save()`, remembering previous irq state into `r24`
 206: 89 83        std Y+1, r24  // saves `irq` into memory, at address Y+1
 
 /* ... a couple of register-heavy calculations ...
   (hence `irq` gets moved from registers into the memory before) */

 26e: 49 81        ldd r20, Y+1  // loads `irq` back from the memory into register r20

 /* ... some other calculations, taken when the `loop` inside `fmt_num()`
        does a *second* round (or third, fourth etc.): */
 
 29a: 19 01        movw r2, r18
 29c: a4 01        movw r20, r8  // whoops, this clobbers r20!
 29e: b3 01        movw r22, r6

 /* ... back in `main()`, after `fmt_num()` completes:
    
    `if irq.enabled()`, checked assuming that `irq` is stored inside `r20`, but
    if the `loop` inside `fmt_num()` does a second round, it will clobber `r20`,
    making this check invalid! */

 312: 44 23        and r20, r20
 314: 1a f0        brmi .+6
 316: 0e 94 9c 00  call 0x138 ; `report_irq_disabled();`
 31a: 02 c0        rjmp .+4
 31c: 0e 94 53 00  call 0xa6 ; `report_irq_enabled();`
 320: ff cf        rjmp .-2

Tracking further 🔍

@Patryk27
Copy link
Contributor

Status: got it! -- llvm/llvm-project#85277

Once the fix is merged to LLVM, I should be able to push it to rustc within a couple of days.

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 21, 2024
@Patryk27
Copy link
Contributor

fyi, toolchain's already fixed (compiling with nightly-2024-03-22 produces a correct binary) 🙂

@Rahix
Copy link
Owner

Rahix commented Mar 22, 2024

Nice, then it's time to bump the compiler version again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants