
std::fs::create_dir not failing on existing directory on aarch64, debug #50516

Closed
yrashk opened this issue May 7, 2018 · 32 comments
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

yrashk commented May 7, 2018

I have this piece of code:

src/main.rs:

use std::fs::create_dir;

fn main() {
   create_dir("hello").expect("1");
   create_dir("hello").expect("2");
}

When I tried to run it on aarch64 in debug mode, I got no output. However, I would expect the second expect to fail (provided hello didn't exist before).

In fact, when I run the same code with cargo run --release, this is the behavior I get:

$ cargo run --release
    Finished release [optimized] target(s) in 0.04s
     Running `target/release/dir-exists-aarch64`
thread 'main' panicked at '2: Os { code: 17, kind: AlreadyExists, message: "File exists" }', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

I've tried it on both stable and nightly:

rustc 1.25.0 (84203cac6 2018-03-25)
binary: rustc
commit-hash: 84203cac67e65ca8640b8392348411098c856985
commit-date: 2018-03-25
host: aarch64-unknown-linux-gnu
release: 1.25.0
LLVM version: 6.0
rustc 1.27.0-nightly (428ea5f6b 2018-05-06)
binary: rustc
commit-hash: 428ea5f6b9e7c6e5ee3294fe9f105e77e89ab407
commit-date: 2018-05-06
host: aarch64-unknown-linux-gnu
release: 1.27.0-nightly
LLVM version: 6.0

Any thoughts?

@estebank estebank added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label May 7, 2018
cuviper (Member) commented May 7, 2018

Can you run it under strace to see its actual syscalls? Debug and release builds should be identical in this respect, apart from different memory addresses between runs.

yrashk (Author) commented May 7, 2018

Of course!

Debug:

mkdirat(AT_FDCWD, "hello", 0777)        = 0
mkdirat(AT_FDCWD, "hello", 0777)        = -1 EEXIST (File exists)
newfstatat(AT_FDCWD, "hello", {st_mode=S_IFDIR|0775, st_size=4096, ...}, 0) = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=16384}, NULL) = 0
munmap(0xffffb154a000, 16384)           = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Release:

mkdirat(AT_FDCWD, "hello", 0777)        = 0
mkdirat(AT_FDCWD, "hello", 0777)        = -1 EEXIST (File exists)
write(2, "thread '", 8thread ')                 = 8
write(2, "main", 4main)                     = 4
write(2, "' panicked at '", 15' panicked at ')         = 15
write(2, "2: Os { code: 17, kind: AlreadyE"..., 632: Os { code: 17, kind: AlreadyExists, message: "File exists" }) = 63
write(2, "', ", 3', )                      = 3
write(2, "libcore/result.rs", 17libcore/result.rs)       = 17
write(2, ":", 1:)                        = 1
write(2, "945", 3945)                      = 3
write(2, ":", 1:)                        = 1
write(2, "5", 15)                        = 1
write(2, "\n", 1
)                       = 1
write(2, "note: Run with `RUST_BACKTRACE=1"..., 51note: Run with `RUST_BACKTRACE=1` for a backtrace.
) = 51
futex(0xffffb74cfccc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=16384}, NULL) = 0
munmap(0xffffb7534000, 16384)           = 0
exit_group(101)                         = ?
+++ exited with 101 +++

cuviper (Member) commented May 7, 2018

Well, at least the kernel is behaving as expected. 🙂

I can reproduce this on Fedora 27, using the official Rust 1.25 (w/ LLVM 6). Fedora 27's own rust-1.25.0 package (linked to LLVM 5) does not have the problem! But I also tried Fedora rawhide's package, where it does have LLVM 6, and the problem resurfaces.

yrashk (Author) commented May 7, 2018

I don't even know where to start on this, to be honest (besides comparing LLVM 5 and 6)... any suggestions?

yrashk (Author) commented May 7, 2018

And it's interesting that the issue manifests only in debug, not in release

cuviper (Member) commented May 7, 2018

Something is getting clobbered, or otherwise initialized improperly. Stepping through this in GDB, I found:

std::fs::DirBuilder::create (self=0xffffffffed38, path=0xaaaaaaac7c74 <str.0>) at /checkout/src/libstd/fs.rs:1976
(gdb) p *self
$11 = std::fs::DirBuilder {inner: std::sys::unix::fs::DirBuilder {mode: 511}, recursive: 254}

That mode: 511 is octal 0777 as expected, but recursive is a bool -- it should only be true (1) or false (0), and in this case we expect false! But the generated code behaves as if that 254 is true, then proceeds in the manner of create_dir_all, which doesn't error for existing paths.

yrashk (Author) commented May 7, 2018

Interesting! But stepping through the same code in release, recursive is 0?

cuviper (Member) commented May 7, 2018

Under optimization, even with debuginfo the locals are too far gone to inspect. But I can at least see in this bit of code that it's taking the latter path.

   │1979        fn _create(&self, path: &Path) -> io::Result<()> {
   │1980            if self.recursive {
   │1981                self.create_dir_all(path)
   │1982            } else {
  >│1983                self.inner.mkdir(path)
   │1984            }

yrashk (Author) commented May 7, 2018

Gotcha, as expected. So the main question now is why recursive becomes 254. I'm pretty sure I don't have the knowledge of the internals to suggest anything valuable at this moment, but if you can at least point in the direction you think is worth exploring, I'll do my best to help with the research.

yrashk (Author) commented May 7, 2018

I believe #48673 is somewhat related. I can see 254 there as well!

yrashk (Author) commented May 7, 2018

Trying out the suggestion from the above issue (wrt GlobalISel):

$ RUSTFLAGS="-C llvm-args=-fast-isel" cargo run
   Compiling dir-exists-aarch64 v0.1.0 (file:///home/yrashk/dir-exists-aarch64)
    Finished dev [unoptimized + debuginfo] target(s) in 2.73 secs
     Running `target/debug/dir-exists-aarch64`
thread 'main' panicked at '1: Os { code: 17, kind: AlreadyExists, message: "File exists" }', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace

works as expected!

cuviper (Member) commented May 8, 2018

OK, that gives something to compare more directly. Both create the DirBuilder the same:

 <_ZN3std2fs10DirBuilder3new17h4d87461ec68baf84E>:
320023e0        orr     w0, wzr, #0x1ff           
2a1f03e1        mov     w1, wzr                   
d65f03c0        ret                               

When the fast-isel create_dir gets that, it just copies w0 and w1 onto the stack:

 <_ZN3std2fs10create_dir17h0c9a5398928d3c8dE>:                                
d10183ff        sub     sp, sp, #0x60                                         
a9057bfd        stp     x29, x30, [sp, #80]                                   
910143fd        add     x29, sp, #0x50                                        
f90017e0        str     x0, [sp, #40]                                         
f9001be1        str     x1, [sp, #48]                                         
f90013e8        str     x8, [sp, #32]                                         
94000f77        bl      7764 <_ZN3std2fs10DirBuilder3new17h4d87461ec68baf84E> 
b9001fe1        str     w1, [sp, #28]                                         
b9001be0        str     w0, [sp, #24]                                         

Under the default (GlobalISel), it sets the precise bits in x8. The bfxil ... #0, #32 sets mode at lsb 0 width 32, and bfi ... #32, #1 sets recursive at lsb 32, width 1. The remaining bits of x8 are untouched! (i.e. garbage) Then x8 is written on the stack in the same place as before.

 <_ZN3std2fs10create_dir17h0c9a5398928d3c8dE>:                               
d10183ff        sub     sp, sp, #0x60                                        
a9057bfd        stp     x29, x30, [sp, #80]                                  
910143fd        add     x29, sp, #0x50                                       
9100a3e9        add     x9, sp, #0x28                                        
f9000120        str     x0, [x9]                                             
f9001be1        str     x1, [sp, #48]                                        
f90013e8        str     x8, [sp, #32]                                        
94000fcc        bl      79f4 <_ZN3std2fs10DirBuilder3new17h4d87461ec68baf84E>
2a0003e9        mov     w9, w0                                               
b3407d28        bfxil   x8, x9, #0, #32                                      
2a0103e9        mov     w9, w1                                               
b3600128        bfi     x8, x9, #32, #1                                      
f9000fe8        str     x8, [sp, #24]                                        

Then DirBuilder::_create looks the same for both: a condition on the entire byte at offset #4, branching into the different tail calls. But with GlobalISel, this is where we see our 254 (0b11111110): only the lsb was actually written (to 0) for our false; the remaining bits are garbage.

 <_ZN3std2fs10DirBuilder7_create17h1220ff0bdc1b9d5bE>:                                   
39401009        ldrb    w9, [x0, #4]                                                     
34000049        cbz     w9, 7a0c <_ZN3std2fs10DirBuilder7_create17h1220ff0bdc1b9d5bE+0xc>
14000002        b       7a10 <_ZN3std2fs10DirBuilder14create_dir_all17hdbee7d289e9930e9E>
14001ef2        b       f5d4 <_ZN3std3sys4unix2fs10DirBuilder5mkdir17ha85b362192bd0679E> 

cuviper (Member) commented May 8, 2018

Even this shows the wrong recursive:

fn main() {
    println!("{:?}", std::fs::DirBuilder::new());
}
DirBuilder { inner: DirBuilder { mode: 511 }, recursive: true }

@retep998 retep998 added the A-codegen Area: Code generation label May 8, 2018
yrashk (Author) commented May 8, 2018

@cuviper this is excellent research! I am wondering, given that this works fine in release, does this mean GlobalISel works differently in release, or is GlobalISel not used in release?

tstellar commented May 8, 2018

Could someone attach an LLVM IR dump of the example posted by @cuviper?

cuviper (Member) commented May 8, 2018

@yrashk It's the latter, LLVM only uses GlobalISel for -O0 right now. Though even if it were used in release, it's not a given that the bug would still manifest, since optimizations can greatly change what's given to isel.

@tstellar Here you go: print-dirbuilder.tar.gz

The first block in dirbuilder::main shows the same bit manipulation I found before:

start:
  %_12 = alloca { i32, i1 }, align 4
  %_10 = alloca i32*, align 8
  %_9 = alloca [1 x { i8*, i8* }], align 8
  %_2 = alloca %"core::fmt::Arguments", align 8
; call std::fs::DirBuilder::new
  %0 = call { i32, i1 } @_ZN3std2fs10DirBuilder3new17h4be627efa14f8660E()
  store { i32, i1 } %0, { i32, i1 }* %_12, align 4
  br label %bb1

With llc -O0 dirbuilder.ll:

// %bb.0:                               // %start
	sub	sp, sp, #144            // =144
	str	x30, [sp, #128]         // 8-byte Folded Spill
	.cfi_def_cfa_offset 144
	.cfi_offset w30, -16
	adrp	x1, _ZN56_$LT$std..fs..DirBuilder$u20$as$u20$core..fmt..Debug$GT$3fmt17h81b1b3fa607a0cf5E
	add	x1, x1, :lo12:_ZN56_$LT$std..fs..DirBuilder$u20$as$u20$core..fmt..Debug$GT$3fmt17h81b1b3fa607a0cf5E
	adrp	x8, .Lbyte_str.3
	add	x8, x8, :lo12:.Lbyte_str.3
	adrp	x9, .Lbyte_str.4
	add	x9, x9, :lo12:.Lbyte_str.4
	str	x1, [sp, #40]           // 8-byte Folded Spill
	str	x8, [sp, #32]           // 8-byte Folded Spill
	str	x9, [sp, #24]           // 8-byte Folded Spill
	bl	_ZN3std2fs10DirBuilder3new17h4be627efa14f8660E
                                        // implicit-def: %x8
	mov	w9, w0
	bfxil	x8, x9, #0, #32
	mov	w9, w1
	bfi	x8, x9, #32, #1
	str	x8, [sp, #120]

So again we see x8 with only some of the bits changed from its previous value.

With llc -O0 -fast-isel dirbuilder.ll:

// %bb.0:                               // %start
	sub	sp, sp, #128            // =128
	str	x30, [sp, #112]         // 8-byte Folded Spill
	.cfi_def_cfa_offset 128
	.cfi_offset w30, -16
	bl	_ZN3std2fs10DirBuilder3new17h4be627efa14f8660E
	str	w0, [sp, #104]
	and	w0, w1, #0x1
	strb	w0, [sp, #108]

tstellar commented May 9, 2018

Why does Rust emit i1 for its bool type instead of i8?

cuviper (Member) commented May 9, 2018

I don't know the history of bool->i1, but I suspect that's just to accurately reflect that it only has two permissible states. Is there an actual problem with using i1 for this? Or maybe it's just unusual?

cuviper (Member) commented May 9, 2018

Here's an interesting comment about the bool representation:

// Make sure to return the same type `immediate_llvm_type` would,
// to avoid dealing with two types and the associated conversions.
// This means that `(bool, bool)` is represented as `{i1, i1}`,
// both in memory and as an immediate, while `bool` is typically
// `i8` in memory and only `i1` when immediate. While we need to
// load/store `bool` as `i8` to avoid crippling LLVM optimizations,
// `i1` in a LLVM aggregate is valid and mostly equivalent to `i8`.

tstellar commented May 9, 2018

It seems like there is a mismatch then, because DirBuilder::_create() seems to assume that bool is 8 bits wide, but Rust only emits a 1-bit store here.

cuviper (Member) commented May 9, 2018

Seems like the last statement in that comment may be the thing we need to challenge: that "i1 in a LLVM aggregate is valid and mostly equivalent to i8."

@cuviper cuviper added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels May 9, 2018
tstellar commented May 9, 2018

An i1 value in memory will be 8 bits, so I think that statement is more or less correct. The problem as I see it is that Rust is assuming that the i1 value will be zero-extended before it is stored, but according to the LLVM language reference the value of the other 7 bits is unspecified: http://llvm.org/docs/LangRef.html#id196

cuviper (Member) commented May 9, 2018

Here's the IR of DirBuilder::_create without optimization -- note the %4 = trunc i8 %3 to i1.

; std::fs::DirBuilder::_create
; Function Attrs: uwtable
define void @_ZN3std2fs10DirBuilder7_create17he9982b501e5a1dccE(%"core::result::Result<(), io::error::Error>"* noalias nocapture sret dereferenceable(16), { i32, i1 }* noalias readonly dereferenceable(8) %self, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1) unnamed_addr #1 {
start:
  %1 = getelementptr inbounds { i32, i1 }, { i32, i1 }* %self, i32 0, i32 1
  %2 = bitcast i1* %1 to i8*
  %3 = load i8, i8* %2, align 1, !range !0
  %4 = trunc i8 %3 to i1
  br i1 %4, label %bb1, label %bb2

bb1:                                              ; preds = %start
; call std::fs::DirBuilder::create_dir_all
  call void @_ZN3std2fs10DirBuilder14create_dir_all17hcd76f7c5d629ca4bE(%"core::result::Result<(), io::error::Error>"* noalias nocapture sret dereferenceable(16) %0, { i32, i1 }* noalias readonly dereferenceable(8) %self, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1)
  br label %bb3

bb2:                                              ; preds = %start
  %5 = bitcast { i32, i1 }* %self to i32*
; call std::sys::unix::fs::DirBuilder::mkdir
  call void @_ZN3std3sys4unix2fs10DirBuilder5mkdir17hc7463290099457b9E(%"core::result::Result<(), io::error::Error>"* noalias nocapture sret dereferenceable(16) %0, i32* noalias readonly dereferenceable(4) %5, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1)
  br label %bb4

bb3:                                              ; preds = %bb1
  br label %bb5

bb4:                                              ; preds = %bb2
  br label %bb5

bb5:                                              ; preds = %bb4, %bb3
  ret void
}

With optimization, we instead get %4 = icmp eq i8 %3, 0.

; std::fs::DirBuilder::_create
; Function Attrs: uwtable
define void @_ZN3std2fs10DirBuilder7_create17he9982b501e5a1dccE(%"core::result::Result<(), io::error::Error>"* noalias nocapture sret dereferenceable(16), { i32, i1 }* noalias nocapture readonly dereferenceable(8) %self, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1) unnamed_addr #1 {
start:
  %1 = getelementptr inbounds { i32, i1 }, { i32, i1 }* %self, i64 0, i32 1
  %2 = bitcast i1* %1 to i8*
  %3 = load i8, i8* %2, align 1, !range !184
  %4 = icmp eq i8 %3, 0
  br i1 %4, label %bb2, label %bb1

bb1:                                              ; preds = %start
; call std::fs::DirBuilder::create_dir_all
  tail call fastcc void @_ZN3std2fs10DirBuilder14create_dir_all17hcd76f7c5d629ca4bE(%"core::result::Result<(), io::error::Error>"* noalias nocapture nonnull sret dereferenceable(16) %0, { i32, i1 }* noalias nonnull readonly dereferenceable(8) %self, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1)
  br label %bb5

bb2:                                              ; preds = %start
  %5 = getelementptr inbounds { i32, i1 }, { i32, i1 }* %self, i64 0, i32 0
; call std::sys::unix::fs::DirBuilder::mkdir
  tail call fastcc void @_ZN3std3sys4unix2fs10DirBuilder5mkdir17hc7463290099457b9E(%"core::result::Result<(), io::error::Error>"* noalias nocapture nonnull sret dereferenceable(16) %0, i32* noalias nonnull readonly dereferenceable(4) %5, %"path::Path"* noalias nonnull readonly %path.0, i64 %path.1)
  br label %bb5

bb5:                                              ; preds = %bb2, %bb1
  ret void
}

(and note in this bug we have pre-compiled, optimized std code, mixed with debug user code.)

cuviper (Member) commented May 9, 2018

That !range !184 in the optimized case is !184 = !{i8 0, i8 2}.

cuviper (Member) commented May 9, 2018

The !range !0 in the unoptimized case has the same definition, !0 = !{i8 0, i8 2}.

cuviper (Member) commented May 9, 2018

Summary so far:

  • Rust loads bool as an i8 with range [0,2), then trunc i8 to i1
    • With optimizations, LLVM realizes that the range makes this trunc redundant, and doesn't bother.
  • Rust also stores bool as an i8, explicitly zero-extended from i1.
  • Rust stores aggregates containing bool directly, without zero-extending or otherwise filling the remaining bits.

Possible changes on Rust's side:

  • Load bool directly with load i1, rather than load i8 range [0,2) then truncated to i1.
    • This comment implies that this might cripple optimization.
  • Define bool as i8 in aggregates, so DirBuilder would be { i32, i8 }.
    • Clang seems to do this for C, e.g. struct { int i; bool b; } becomes type { i32, i8 }. It still uses i1 for immediate bools in arguments and return values though.

@eddyb, any thoughts on this?

eddyb (Member) commented May 11, 2018

(I had written a comment but I deleted it because I missed a bit of information)

but according to the LLVM language ref the value of the other 7 bits is unspecified:

Alright, I didn't expect this. It seems counter-intuitive because now a load of an i1 followed by a zext to i8 has to mask out all bits but the lowest one, instead of being a noop.
Does AArch64 GlobalISel do that masking? If not, that seems like a bug on LLVM's side.
Also, how does this interact with _Bool? It does seem rather problematic IMO.

Note that rustc creates LLVM aggregates containing i1 only from scalar pairs, where the two values are usually kept outside of the aggregate, and only combined for returning the pair.
When the scalar pair's fields are accessed through a pointer to the scalar pair, that's where I think things start breaking down. I would've used the memory type (i8) but that created more issues.
Sadly I can't remember if any of the issues were since obsoleted.

OTOH, #50277 tried to switch all bools to i1 but it hit SROA not understanding i1 as well as i8.

tstellar commented:

Forgetting about the LLVM bug for a moment, in general I would recommend that Rust use i8 as the memory type for bool. Rust is effectively already doing this by bitcasting i1* to i8* before loads and stores, but I think it is better to be explicit about this in the IR. Clang also uses i8 as the memory type for bools, and matching Clang means you are more likely to get favorable results from the optimizer.

eddyb (Member) commented May 11, 2018

@tstellar We do that, and the only exception is an optimization for types that can be passed around as two scalars (integer/float/pointer).
LLVM is really bad at handling booleans, IMO, not being able to optimally handle memory accesses with the type it forces you to use for immediates.

Anyway, I guess we should try switching the aggregate type for scalar pairs to contain the memory types of the two components and somehow do all the conversions needed; maybe that will both fix this bug and not regress any optimizations (or even unregress others).

cuviper (Member) commented May 11, 2018

Aha, I was confused trying to find all the generated i1s -- I found it for scalar pairs, but thought I was missing something when I couldn't find it elsewhere.

yrashk added a commit to sit-fyi/sit that referenced this issue May 17, 2018
Tests pertaining to duplicate name fail in debug mode on aarch64,
the OS error is never returned.

Solution: use FastISel instead of GlobalISel

Tracking issue has been filed with Rust:
rust-lang/rust#50516
cuviper (Member) commented Jun 15, 2018

With #51566, that bfi x8, x9, #32, #1 becomes bfi x8, x9, #32, #8, copying a whole byte instead of just one bit. Otherwise the asm is identical, and the original reproducer and my println example now both work correctly.

@workingjubilee workingjubilee changed the title std::fs:create_dir not failing on existing directory on aarch64, debug std::fs::create_dir not failing on existing directory on aarch64, debug Jul 18, 2022
@workingjubilee workingjubilee added O-AArch64 Armv8-A or later processors in AArch64 mode and removed O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state labels Aug 17, 2022
workingjubilee (Member) commented:

Taking a closer look instead of merely retagging, it seems this was resolved in #51583 so I am closing this.
