pick names for fixed-size integer types #543

zygoloid · 2021-05-20T21:14:04Z

We should have names for integer types with fixed widths. For now, I'm only considering signed, two's-complement integer types that are expected to match native machine register widths: at least 8-, 16-, 32-, and 64-bit varieties should be supported. Some possible options:

1. Follow the C++ LP64 convention: char is the 8-bit type, short is the 16-bit type, int is the 32-bit type, long is the 64-bit type.
2. Use int with a length suffix: int8, int16, int32, int64. This loosely mirrors the intN_t typedefs from C++.
3. Use i with a length suffix: i8, i16, i32, i64. This follows Rust (and LLVM).

Orthogonally, but not independently: (A) capitalize the first letter, or (B) do not capitalize the first letter.


zygoloid · 2021-05-20T21:15:59Z

I prefer 2 and 3 over 1, because they don't require remembering a somewhat arbitrary name-to-width mapping, don't give any option the name int, and can easily be extended to more widths.

I prefer 3 over 2, simply because these types are common and a terse name seems helpful.

I prefer 3B over 3A, because I don't think I32 is sufficiently visually distinct from l32 or 132.

tkoeppe · 2021-05-20T21:44:33Z

I'd defer A vs B to a more general decision. Presumably we'll want to name several other built-in types, as well as standard library components, and we'll probably want them to use the same capitalization convention.

I mildly prefer 2 over 3 because the longer names stand out ever so much more and avoid confusion with variable names. (E.g. consider for (i8 i = 0; i != 8; i |= j) etc.)

Even though I feel some nostalgia for 1, there's probably too much baggage attached to chars, strings and aliasing to make "char" a tolerable name for an integer type. But on that note, we should probably also discuss byte types and string code unit types at some point, which may feed back into this discussion.

And what about an integer type for round-tripping with pointers: will that be an alias, or also a distinct fundamental type?

geoffromer · 2021-05-20T23:58:40Z

The current plan of record is that type names (like all other compile-time constants) should have camel-case names, and I don't think it would be a good idea to make an exception for something so fundamental as integer types, so I think we should choose (A) unless we want to reconsider that whole policy.

I prefer 2 over 3, because I agree with @zygoloid that I32 is too visually similar to l32 and 132, and I don't think making these names 2 characters shorter provides much benefit. Furthermore, I think such a naming scheme would set a bad precedent: would we likewise use F32 and F64 for floating-point types? And U8, U16, etc. for unsigned integral types? And U8, U16, etc. (uh oh) for Unicode code unit types? Should B be the byte type? Or should it be the boolean type?

I prefer 2 over 1 for the reasons already cited by others. I think we might want to consider having a type (or alias?) named Int in addition to the explicitly-sized types discussed here, but I can't tell if options 2 and 3 are meant to rule that out, or to preserve that option.

fowles · 2021-05-21T01:51:28Z

@geoffromer What would you do for unsigned values? Uint32, UInt32, something else?

chandlerc · 2021-05-21T03:14:14Z

I largely agree with @geoffromer's summary.

For unsigned: UInt32 (capitalizing the I is likely to help screen readers) or some other name such as Unsigned32. I don't have a strong preference among UInt, Unsigned, or yet another good name.

However, I'll offer a potential rationale for iN while keeping the general rule about capitalization and type names.

We could say that i followed by decimal digits is actually a language keyword that aliases Int(N) (or some other spelling besides Int if there is a better one). There's no naming inconsistency, and it doesn't move the type out of the library; it's just a keyword alias. I can imagine good ergonomic reasons for having these aliases:

These really are going to be pervasive, so it seems defensible to optimize their ergonomics even w/ dedicated syntax.
Name ergonomics scale non-linearly in short names. I suspect i32 is more than a 40% improvement over Int32.
If we don't make these language-based, we'll just have a finite set of aliases that has to keep expanding (forever?). The keywords can give good diagnostics.

I think the biggest downside of this is that we do have to decide "which types merit these aliases?" and stop adding them at some point. However, I think there are defensible heuristics that we could pick and stick with. Some candidates:

Any truly pervasive non-aggregate/composite/compound type that is also fundamentally size-parameterized.
- I think this would give us i32, u32 (unsigned), and f32. Maybe there are others, but I suspect not many.
Any non-aggregate/composite/compound storage type for which we have literal syntax. (Note the storage type here -- should work fine even if 5 has a type encoding its value.)
- I think this would add bool, likely some form of character type, and maaaybe a string-ish or string-view-ish type.

Maybe there are other heuristics, or tweaks to these heuristics...

To be honest, I'd be happy with any of these. I'd even be happy with this being completely ad-hoc and simply reflecting what makes code more readable, including easing the reading by C/C++ programmers.

I somewhat prefer having bool if we have i32, but it's a mild preference. I don't feel strongly about anything else.

I prefer having these alias keywords purely based on the ergonomics. But I could live without them.

geoffromer · 2021-05-21T17:14:39Z

> @geoffromer What would you do for unsigned values? Uint32, UInt32, something else?

Yeah, either of those seems fine, and Chandler's rationale for preferring the latter makes sense to me.

> I can imagine good ergonomic reasons for having these aliases:
>
> These really are going to be pervasive, so it seems defensible to optimize their ergonomics even w/ dedicated syntax.
>
> Name ergonomics scale non-linearly in short names. I suspect i32 is more than a 40% improvement over Int32.
>
> If we don't make these language-based, we'll just have a finite set of aliases that has to keep expanding (forever?). The keywords can give good diagnostics.

I don't have a good intuition for why these would be more ergonomic, other than the sheer reduction in keystrokes, which I think is offset (or more than offset) by the reduced readability that comes from abbreviation. The first two points here seem to assume the conclusion (that i8 is more ergonomic than Int8). The last point seems plausible, but being able to give good diagnostics for library aliases seems like a problem we should solve either way. It also just feels like a really weird inversion to say that all the truly fundamental types come from the library, but some aliases to library types are part of the core language.

chandlerc · 2021-07-26T23:58:03Z

This came up in open discussion today, and I think everyone (including both me and @zygoloid among the leads) was surprisingly happy trying out the super simple rule of iN, uN, and fN all being keyword-like syntaxes for writing Int(N), UInt(N), and Float(N).

The biggest downside of that is likely i1, i2, and i3 (generally iN where N < 8) colliding with variables. The idea is to try this (including renaming colliding variables) and see how it goes. We can always back off of this stance if the collisions are too high.
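A minimal Python sketch of the rule as summarized above, assuming a word is treated as a sized type literal exactly when it is i, u, or f followed by a positive decimal number with no leading zero (the [iuf][1-9][0-9]* pattern that the lexer change later in this thread uses). The function name classify_word and the "Int(32)"-style output strings are hypothetical, chosen only for illustration. It also shows the collision mentioned above: i1 stops being usable as a variable name.

```python
import re
from typing import Tuple

# Assumed token rule: i/u/f followed by a positive decimal width with no leading zero.
SIZED_TYPE_LITERAL = re.compile(r"([iuf])([1-9][0-9]*)")

# Hypothetical mapping from the prefix letter to the type it is shorthand for.
PREFIX_TO_TYPE = {"i": "Int", "u": "UInt", "f": "Float"}


def classify_word(word: str) -> Tuple[str, str]:
    """Return ("type", "Int(N)"-style spelling) or ("identifier", word)."""
    match = SIZED_TYPE_LITERAL.fullmatch(word)
    if match is None:
        return ("identifier", word)
    prefix, width = match.groups()
    return ("type", f"{PREFIX_TO_TYPE[prefix]}({width})")


if __name__ == "__main__":
    for word in ["i32", "u8", "f64", "i1", "i", "i0", "idx", "int32"]:
        print(word, "->", classify_word(word))
```

Under this assumed rule, i0 and int32 remain ordinary identifiers, while i1 through i9 become type literals, which is exactly the collision with short variable names that would need renaming.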

We can also move toward an Int32-style spelling if we get feedback about the reading ergonomics, to your point @geoffromer. So far folks seem (perhaps surprisingly) happy to have the super-short iN syntax without giving up the explicit size. The feedback from folks using this in Rust, for example, seems consistently positive.

Provided we don't get any new information, or other serious concerns, this is maybe our decision for now?

fowles · 2021-07-27T10:57:33Z

Does that include bool (he asks hopefully)?

chandlerc · 2021-07-27T10:59:31Z

> Does that include bool (he asks hopefully)?

Not in this issue. We should get a separate question for that.

josh11b · 2021-07-27T14:59:23Z

In particular, I've heard that we don't want to consider bool to be an integer type.

chandlerc · 2021-07-31T00:20:17Z

> I think everyone (including both me and @zygoloid among the leads) was surprisingly happy trying out the super simple rule of iN, uN, and fN all being keyword-like syntaxes for writing Int(N), UInt(N), and Float(N).

Also checked with @KateGregory, and we have consensus here, so closing.

Lex [iuf][1-9][0-9]* as a new kind of "sized type literal" token. When parsing that token, form a literal expression.
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>

Note this doesn't support other sizes or types, it just errors on them.
Co-authored-by: Geoff Romer <gromer@google.com>
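A rough Python sketch of the behavior this commit message describes, assuming the lexer produces a sized-type-literal token for any [iuf][1-9][0-9]* match and the parser then accepts only the spellings it currently supports. Treating i32 as the only supported spelling is an assumption taken from the commit title ("Move Int parsing to i32"); the names Token, lex_word, parse_sized_type_literal, and SUPPORTED are hypothetical.

```python
import re
from dataclasses import dataclass

SIZED_TYPE_LITERAL = re.compile(r"[iuf][1-9][0-9]*")

# Assumption: only i32 is wired up so far; every other sized literal is an error.
SUPPORTED = {"i32": "Int(32)"}


@dataclass
class Token:
    kind: str  # "sized_type_literal" or "identifier"
    text: str


def lex_word(word: str) -> Token:
    """Lexing only checks the shape of the word, not whether the width is supported."""
    if SIZED_TYPE_LITERAL.fullmatch(word):
        return Token("sized_type_literal", word)
    return Token("identifier", word)


def parse_sized_type_literal(token: Token) -> str:
    """Form a literal expression for a supported literal, or report an error."""
    if token.kind != "sized_type_literal":
        raise ValueError(f"expected a sized type literal, got {token.text!r}")
    if token.text not in SUPPORTED:
        raise ValueError(f"unsupported sized type literal {token.text!r}")
    return SUPPORTED[token.text]


if __name__ == "__main__":
    print(parse_sized_type_literal(lex_word("i32")))  # Int(32)
    try:
        parse_sized_type_literal(lex_word("u8"))
    except ValueError as err:
        print("error:", err)
```

Keeping the lexer permissive and rejecting unsupported widths in the parser means u8 is still lexed as a single token (never as an identifier) and only produces an error when used, so widening support later would not change tokenization.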

jonmeow · 2021-09-07T17:57:38Z

I would like to note a typo concern: i32, i332, i22, and i23 are easy to mistype as one another, and the compiler can do little to validate that people typed the one they meant (and, when the code compiles, there is little indication of whether a non-standard width is a typo or deliberate).

If int is idiomatic, i32 might rarely be typed, so the chances of a typo are low. However, if i32 is the norm, typo forms seem more likely to occur. The difference is that a typo of int is likely rejected by the compiler, whereas a typo of i32 is likely valid and would be caught only at later development stages (if at all -- some will be harmless other than performance overhead).
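To make the concern concrete, here is a small Python sketch assuming the [iuf][1-9][0-9]* rule with no restriction on the width: every one of the near-miss spellings lexes as a valid sized type literal, while a misspelling such as itn does not. The STANDARD_WIDTHS set and the looks_like_typo helper are hypothetical, showing the kind of extra check that would be needed to flag such typos; they do not reflect any decided behavior.

```python
import re

SIZED_TYPE_LITERAL = re.compile(r"[iuf][1-9][0-9]*")

# Hypothetical: the widths that correspond to native register sizes.
STANDARD_WIDTHS = {8, 16, 32, 64}


def is_sized_type_literal(word: str) -> bool:
    return SIZED_TYPE_LITERAL.fullmatch(word) is not None


def looks_like_typo(word: str) -> bool:
    """Flag a well-formed literal whose width is not one of the standard widths."""
    return is_sized_type_literal(word) and int(word[1:]) not in STANDARD_WIDTHS


if __name__ == "__main__":
    for word in ["i32", "i332", "i22", "i23", "itn"]:
        print(word, is_sized_type_literal(word), looks_like_typo(word))
```

A check like this only catches typos that land on a non-standard width; writing i16 where i32 was intended would still go unnoticed.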

Glancing at Rust and Swift, my impression is that both provide only a fixed set of widths, not arbitrary digit forms.

Note this will also have curious implications for the prelude, to the extent that we'd discussed having Int be in the prelude instead of a language builtin... I think of this as parsing i##, then translating that to an Int(##) identifier and doing a lookup on that. But then maybe it shouldn't be a normal lookup to avoid shadowing bugs -- maybe it means there needs to be a special lookup form that's only used by i##, or Int just ends up being a builtin to avoid that complexity?
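The shadowing hazard can be sketched in Python, assuming scopes are a simple chain of dictionaries ending in the prelude. The names lookup, lookup_in_prelude, resolve_sized_int, and the scope contents are hypothetical; the point is only that desugaring i32 to Int(32) and then doing an ordinary lookup would pick up a user-declared Int, while a dedicated prelude-only lookup would not.

```python
from typing import Dict, List

Scope = Dict[str, str]

# Hypothetical prelude scope; scope chains are listed innermost first.
PRELUDE: Scope = {"Int": "prelude.Int"}


def lookup(name: str, scopes: List[Scope]) -> str:
    """Ordinary lookup: the innermost declaration wins, so user code can shadow."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise NameError(name)


def lookup_in_prelude(name: str) -> str:
    """Special lookup used only for i##: always resolves in the prelude."""
    return PRELUDE[name]


def resolve_sized_int(word: str, scopes: List[Scope], special: bool) -> str:
    width = int(word[1:])  # assumes word already matched i[1-9][0-9]*
    target = lookup_in_prelude("Int") if special else lookup("Int", scopes)
    return f"{target}({width})"


if __name__ == "__main__":
    user_scope: Scope = {"Int": "user.Int"}  # user code declares its own Int
    scopes = [user_scope, PRELUDE]
    print(resolve_sized_int("i32", scopes, special=False))  # user.Int(32)
    print(resolve_sized_int("i32", scopes, special=True))   # prelude.Int(32)
```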


jonmeow · 2022-08-11T22:14:10Z

Filed #1998 to track turning this into a proposal.

chandlerc closed this as completed Jul 31, 2021

jonmeow added a commit that referenced this issue Aug 5, 2021

Move Int parsing to i32 for #543 (#700) (commit 5947ba1)


This was referenced Aug 13, 2021

Naming conventions for Carbon-provided features #750 (Closed)

Implement numeric types #767 (Closed)

chandlerc pushed a commit that referenced this issue Jun 28, 2022

Move Int parsing to i32 for #543 (#700) (commit 2620ba0)


jonmeow mentioned this issue Jul 20, 2022

Ambiguities in the grammar #1450 (Closed)

jonmeow added the "leads question" label (A question for the leads team) Aug 10, 2022

jonmeow mentioned this issue Aug 11, 2022

Make proposal for numeric type literal syntax #1998 (Closed)

josh11b mentioned this issue Aug 16, 2022

Numeric type literal syntax #2015 (Merged)

chandlerc mentioned this issue Aug 25, 2022

Structure, scope, and naming of the prelude and syntax aliases #2113 (Closed)
