Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting guidelines for inline assembly #152

Open
joshtriplett opened this issue Jun 10, 2020 · 22 comments
Open

Formatting guidelines for inline assembly #152

joshtriplett opened this issue Jun 10, 2020 · 22 comments

Comments

@joshtriplett
Copy link
Member

joshtriplett commented Jun 10, 2020

An issue I've observed in several contexts discussing the new asm! syntax for inline assembly: everyone formats asm! statements differently, and we should 1) come up with guidance on how to do so, and 2) implement that guidance in rustfmt.

Notably, this includes how to format both single-line and multi-line assembly statements.

EDIT: I've updated these guidelines to use the new support for multiple template string arguments, implemented in rust-lang/rust#73364 .

With my style team hat on, I would propose the following guidelines:

  • Single-line assembly code should be formatted as a single string argument and treated as any other argument to a format macro.
  • Multi-line assembly code should be formatted with one string argument per line of assembly, indented as separate arguments:
    asm!(
        "instruction 1",
        "instruction 2",
        ...,
    );
  • Common assembly formatting such as \n\t (often seen in inline assembly for other compilers that directly copy the provided string into their assembly output) is not necessary with Rust inline assembly. Focus on keeping the assembly readable. Note that Rust is not prescriptive about the formatting of your assembly, so if you wish to put multiple instructions or directives or labels on one line (and thus within one assembly string), you can do so, and rustfmt will treat each assembly string as a line.
  • Use the opening " of each string as the base for any indentation within the assembly code; for instance, if you want to indent instructions four spaces past labels, include the indentation inside the string before the instructions. That way, Rust formatting can keep the strings aligned and your assembly code will remain aligned within them:
    asm!(
        "1:",
        "    instruction 1",
        "    instruction 2",
        "2:",
        "    instruction 3",
    );
  • Simple asm! with only one assembly string can have the entire asm! including all its arguments on the same line, if the whole thing from asm!( to ); (plus indentation) fits in the line width.
  • Any asm! block that needs breaking across multiple lines, or any asm! block that has multiple assembly strings no matter how short, should always put each argument on its own line, aligned one indentation level past the asm!, with a trailing comma on each, just like a function.
  • options(...) goes on one line if it fits; otherwise, format it as though it were a nested function call to a function named options.
  • Never place any space or line breaks inside of in(reg) or out(reg) or inout(reg) or in("regname") or out("regname") or similar; always treat them as a single atomic unit.
  • If an inout or inlateout pair of expressions are too long to fit on one line, break before (never after) the =>, and indent the => one level further:
    asm!(
        "instruction {}",
        inout(reg) very_long_expression
            => very_long_out_expression,
    )
  • If an in(reg) or out(reg) or lateout(reg) expression is too long, break between the ) and the expression and indent one level, then format from there; however, if the expression can be broken internally, follow same the rules for when to leave the head of the expression on the same line as the in(reg) or similar as if in(reg) were the opener of a function:
    asm!(
        "instruction {}",
        in(reg)
            extremely_long_unbreakable_expression,
        in(reg) function_name()
            .method_name(method_arg)
            .further_chained_method(),
        out(reg) long_function_name(
            long_function_argument_expression,
        ),
    );
  • For named arguments like name = in(reg) expression, line-break it as you would an assignment statement with the same indentation, treating the in(reg) expression as the right-hand side of the assignment, and following all the same rules as above.
@Amanieu
Copy link
Member

Amanieu commented Jun 10, 2020

Formatting the string literal is actually very tricking since we technically need to preserve whitespace and newlines when formatting. This will limit how much we can we format the template string, in particular regarding indentation.

We could make an exception here since whitespace isn't significant for inline assembly.

@joshtriplett
Copy link
Member Author

@Amanieu That's not entirely true, if you do something especially unusual in your inline assembly. Assembly supports multi-line .string directives, among likely other things:

#![feature(asm)]

fn main() {
    let mut i: u32 = 2;
    unsafe {
        asm!(
            r#"
            add {:e}, 3
            jmp 1f
            .string "hello
            world"
            1:
            "#,
            inout(reg) i,
        );
    }
    dbg!(i);
}

(Assume for the sake of argument that I used that string for something.)

If you look at the disassembly of this, the whitespace is significant. Changing the amount of whitespace at the start of lines would change the semantic meaning of the program.

One way around this would be if we recommended using concat! for multi-line assembly. I don't know that that's worth the tradeoff.

Initially I would propose having formatting guidelines, and then doing as much as we can in rustfmt without changing the string literal.

I can absolutely think of real-world reasons someone might write something like this; for instance, embedding some long buffer. I can also think of better ways to do that, but nonetheless, formatting must not break user code. So I don't think we can consider any formatting that changes even the amount of space inside the string. In theory, we might be able to get away with changing the amount of whitespace at the start and end of the string, maybe, but I'm not sure that's worth it, and we'd still have to think very hard to see if anyone could write asm code that that would break.

@Amanieu
Copy link
Member

Amanieu commented Jun 10, 2020

I think that the case for formatting multi-line strings is more general and not restricted to inline asm: rust-lang/rustfmt#2876

@BartMassey
Copy link

I think I would like to see something along the lines of the weird C "adjacent string constants" thing for this case. Then I could put one string per line, which seems to me more convenient. Alternatively, we could just make the macro take an arbitrary number of strings, but I fear much confusion at the end there.

Maybe go nuts and have an implicit newline when the last character is not whitespace and an implicit tab when the first character is not whitespace? The whole

"
xor %rax, %rax\n
\tmov %rax, %rcx\n
.l0:\n
\tinc %rcx\n
"

nonsense I tend to go through with GCC/Clang is pretty annoying.

Speaking of which, what is the assumed whitespace state just before and just after a new asm!() statement?

@joshtriplett
Copy link
Member Author

If you want separate strings for each line, you can do that with concat!. That would certainly make formatting easier, at a cost of verbosity and extra indentation.

The assumed whitespace state should indeed be documented. There will be a newline before and after the assembly.

@Amanieu
Copy link
Member

Amanieu commented Jun 10, 2020

I don't think that whitespace state before/after the asm is significant since whitespace (except for newlines) is ignored by all assembly languages.

The use of "\t" in C is just because GCC pastes inline assembly directly into the output .S file and some developers want the -S output of the compiler to be nicely aligned when reading the resulting assembly. This does not apply to LLVM since it parses the inline asm and then re-prints the instructions from scratch when generating assembly output.

@joshtriplett
Copy link
Member Author

@Amanieu

The use of "\t" in C is just because GCC pastes inline assembly directly into the output .S file and some developers want the -S output of the compiler to be nicely aligned when reading the resulting assembly. This does not apply to LLVM since it parses the inline asm and then re-prints the instructions from scratch when generating assembly output.

Guidance on this point seems worth capturing within the formatting guidelines. I've added a new bullet point to the formatting above:

  • In assembly strings, focus on the readability and proper indentation of the assembly with respect to the surrounding Rust code. Common assembly formatting such as \n\t (often seen in inline assembly for other compilers that directly copy the provided string into their assembly output) is not necessary with Rust inline assembly.

@BartMassey
Copy link

BartMassey commented Jun 11, 2020

#![feature(asm)]

macro_rules! asm_block {
    ({$($line:literal),*$(,)?}$($stuff:tt)*) => {
        asm!(concat!($(concat!($line,"\n")),*)$($stuff)*)
    };
}

fn main() {
    let x: u64;
    unsafe { asm_block!(
        {
            "nop",
            "xor {0},{0}",
        },
        out(reg) x,
    )};
    println!("{}", x);
}

I will admit to being one of those people who would like to see lines tabbed by default in case I want to read the assembly. The macro could be fancied to do this either by providing an explicit no-tab or by inferring: e.g. lines that begin with "." or end with ":" don't get tabbed. The commas in the code block could also get left out, which might or might not be an improvement.

@Amanieu
Copy link
Member

Amanieu commented Jun 11, 2020

I really like this syntax and I'm considering supporting it directly in asm!. I would just make a small modification: adjacent literals concatenate directly and ; concatenates with a newline:

fn main() {
    let x: u64;
    unsafe { 
        asm!(
            {
                "nop";
                "xor" " {0}," " {0}";
            },
            out(reg) x,
        );
    }
    println!("{}", x);
}

I've used concat! a lot in the past with llvm_asm! but in my experience this has been rather troublesome:

macro_rules! asm_read {
    ($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
        let tmp: $type;
        llvm_asm! {
            concat!(
                "0: ", $instr, " ${1:", $width, "}, [$2];",
                asm_trap_list!()
            )
            : "+{x16}" ($trapped), "=r" (tmp)
            : "r" ($ptr as u64)
            :
            : "volatile"
        };
        mem::transmute_copy(&tmp)
    }};
}

With the new style it would look like this:

macro_rules! asm_read {
    ($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
        let tmp: $type;
        asm!(
            {
                "0: " $instr " {out:" $width "}, [{ptr}]";
                asm_trap_list!();
            },
            out = lateout(reg) tmp,
            ptr = in(reg) $ptr,
            inout("x16") $trapped,
            options(nostack, preserves_flags),
        );
        mem::transmute_copy(&tmp)
    }};
}

@BartMassey
Copy link

If you want to allow multiple literals per line, I'd suggest adding a space between each of them internally — I think it's going to be less error-prone.

@joshtriplett
Copy link
Member Author

joshtriplett commented Jun 12, 2020

@BartMassey

I will admit to being one of those people who would like to see lines tabbed by default in case I want to read the assembly.

If you mean the assembly output from the compiler, it sounds like it will get properly formatted (by LLVM) even if the input is not.

@Amanieu I really like the idea of a syntax like this as well. I would suggest a slight tweak, though: rather than putting a braced block as the first argument, could we just turn the whole asm! into a braced block rather than a parenthesized one?

asm!{
    "instruction 1",
    "instruction 2",
    "instruction 3",
    out = lateout(reg) var,
    options(nostack),
}

(I don't want to bikeshed the separators between assembly strings, though I will admit that I find a mix of semicolons and commas not ideal. I primarily want to advocate for using a single braced block, because we can easily tell the difference between string literals, options, and inputs/outputs.)

@BartMassey
Copy link

I guess I could have written the macro to avoid the braced block; I was just being lazy when I put it in, but since those strings are the only arguments of token type literal I guess I didn't need the braces. Ah well.

@Amanieu
Copy link
Member

Amanieu commented Jun 13, 2020

Note that macros don't care about the type of delimiter used: [], () and {} can all be used (this is how vec![] works).

I like the idea about allowing multiple string literals in asm! which would be treated as separate lines. I feel that this would make multi-line asm much more readable than multi-line strings.

@joshtriplett
Copy link
Member Author

@Amanieu It'll also mean that rustfmt will be able to reliably format inline asm blocks. I'd love to see this.

@joshtriplett
Copy link
Member Author

Implemented in rust-lang/rust#73364 ; RFC updated via Amanieu/rfcs#1 .

@joshtriplett
Copy link
Member Author

I've updated the formatting guidelines at the top of this issue to use the new support for multiple assembly string arguments.

@calebcartwright
Copy link
Member

@joshtriplett - is my understanding correct that you all have settled on target formatting as summarized in the updated description, and all that's pending is for someone to codify those rules within the guide? (wondering if we need a new page for special case macros that have their own spec 🤔)

@joshtriplett
Copy link
Member Author

I believe we've settled on exactly what's in the description of this issue, yes.

@calebcartwright
Copy link
Member

cc @ytmimi - know we've got a few things already in flight, but think this would be a good item for you to look at next if you're interested. we can chat more about it offline

ytmimi added a commit to ytmimi/rust that referenced this issue Jan 11, 2022
To more easily allow rustfmt to format the asm! macro as specified in
rust-lang/style-team#152 certain fields are made public.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 11, 2022
…calebcartwright

Update AsmArgs field visibility for rustfmt

To more easily allow rustfmt to format the ``asm!`` macro as specified in
rust-lang/style-team#152 certain fields are made public.

r? `@calebcartwright`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 11, 2022
…calebcartwright

Update AsmArgs field visibility for rustfmt

To more easily allow rustfmt to format the ``asm!`` macro as specified in
rust-lang/style-team#152 certain fields are made public.

r? ``@calebcartwright``
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 12, 2022
…calebcartwright

Update AsmArgs field visibility for rustfmt

To more easily allow rustfmt to format the ``asm!`` macro as specified in
rust-lang/style-team#152 certain fields are made public.

r? ```@calebcartwright```
@ytmimi
Copy link

ytmimi commented Jan 24, 2022

Hey @joshtriplett, I've spent some time working on getting this implemented in rustfmt (rust-lang/rustfmt#5191). The guidelines you wrote up were a great help. I also found the asm section of the Unstable Book useful!

Templates and options were well specified, but I have a few clarifying questions around the operands and clobber_abis that I hope you can help me with.

Operands

How should we handle const and sym operands?

Is it better to not break between the keyword (const / sym) and the expression for long, unbreakable expressions?

asm!(
    "instruction {}",
    const extremely_long_unbreakable_expression,
    sym extremely_long_unbreakable_symbol_name,
)

In the case that we're dealing with a named argument should we treat const and sym operands as follows?

asm!(
    "instruction {}",
    long_named_argument =
        const extremely_long_unbreakable_expression,
    long_named_argument =
        sym extremely_long_unbreakable_symbol_name,
);

For const specifically, is this the desired formatting for long expressions that can be broken? Given that sym is meant to refer to a function name, we don't need to worry about these cases, right?

asm!(
    "instruction {}",
    const function_name()
        .method_name(method_arg)
        .further_chained_method(),
);

asm!(
    "instruction {}",
    const long_function_name(
        long_function_argument_expression
    ),
);

Here are some other test cases that I wanted to get your opinion on:

asm!(
    "instruction {}",

    // case 1 - in chain out expr
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => very_long_out_expression,

    // case 2 - in function out expr
     inout(reg) long_function_name(
        long_function_argument_expression
    )
        => very_long_out_expression,

    // case 3 - in expr out chain
    inout(reg) very_long_expression
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 4 - in chain out chain
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 5 - in function out chain
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 6 - in expr out function
    inout(reg) very_long_expression
        => long_function_name(
            long_function_argument_expression
        ),

    // case 7 - in chain out function
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => long_function_name(
            long_function_argument_expression
        ),

    // case 8 - in function out function
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => long_function_name(
            long_function_argument_expression
        ),
)

For completeness I added test cases for inout and inlateout that also include named arguments. Just want to double check that something like this is okay.

asm!(
    aaaaaaaaaaaaaaaaaaaa =
        inout(reg) very_long_expression
            => very_long_out_expression,
);

Clobber ABIs

The Reference-level explanation in the Unstable book gives the following ABNF for clobber_abis and options:

clobber_abi := "clobber_abi(" <abi> *["," <abi>] [","] ")"
options := "options(" option *["," option] [","] ")"

Given that the ABNF is so similar should we also treat clobber_abis as we do options? So treat them like nested function calls if we need to break them.

Trailing Commas

Lastly, how should we handle trailing commas? I know that for most macros we don't want to add or remove tokens, but in this case trailing commas are optional. Is there a recommendation for how we should handle trailing commas?

Thanks in advance for your help on this!

@ytmimi
Copy link

ytmimi commented Jan 29, 2022

A few more operand related test cases that I wanted to get your opinion on:

In this case we have an inout that doesn't fit on the same line, but the out_expr is _. Should we still force an indented newline after the in expression?

asm!(
    "instruction {}",
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => _,
);

Or does this look better:

asm!(
    "instruction {}",
    inout(reg) long_function_name(
        long_function_argument_expression
    ) => _,
);

Same question as above, but with chains:

asm!(
    "instruction {}",
    in(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => _,
);
asm!(
    "instruction {}",
    in(reg) function_name()
        .method_name(method_arg)
        .further_chained_method() => _,
);

Do we always want to break inout's if the in_expr and out_expr don't fit on the same line? Here's an example where the two technically don't fit but I'm not sure if it's better to force the break or not:

asm!(
    "instruction {}",
    inout(reg) very_long_expression => function_name(
        long_function_argument_expression
    ),
);

Or do we always force the break?

asm!(
    "instruction {}",
    inout(reg) very_long_expression
        => function_name(
            long_function_argument_expression
        ),
);

Again, thanks for helping to clarify this for me.

@calebcartwright
Copy link
Member

@ytmimi - I'd suggest we move forward with a PR in this repo that adds text to the guide which covers the above. For the outstanding cases feel free to make your suggested guidance in the respective wording.

Folks can weigh in on that PR, and we can drop notes in a few different channels for broader awareness to allow others to weigh in. We'll also have a little flexibility for a while to adjust things, especially given that we have the benefit in this particular instance of being able to progressively roll out this behavior in rustfmt via a config option.

I think the existing guide text and structure should provide a good reference for framing the resultant rules identified on this thread, but feel free to ping me offline to chat about any specifics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants