pick names for fixed-size integer types #543

Closed
zygoloid opened this issue May 20, 2021 · 13 comments
Labels
leads question A question for the leads team

Comments

@zygoloid
Contributor

We should have names for integer types with fixed widths. For the moment I'm only considering signed, two's-complement integer types that are expected to match native machine register widths: at least 8, 16, 32, and 64 bit varieties should be supported. Some possible options:

  1. Follow C++ LP64 convention: char is the 8-bit type, short is the 16-bit type, int is the 32-bit type, long is the 64-bit type.
  2. Use int with a length suffix: int8, int16, int32, int64. This somewhat mimics the intN_t typedefs from C++.
  3. Use i with a length suffix: i8, i16, i32, i64. This follows Rust (and LLVM).

Orthogonally, but not independently: (A) capitalize the first letter, (B) do not capitalize the first letter.

@zygoloid
Contributor Author

I prefer 2 and 3 over 1, because they don't require remembering a somewhat arbitrary name -> width mapping, don't give any option the name int, and can be easily extended with more widths.

I prefer 3 over 2, simply because these types are common and a terse name seems helpful.

I prefer 3B over 3A, because I don't think I32 is sufficiently visually distinct from l32 or 132.
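
As a rough illustration of the "name -> width mapping" point above, here is a minimal Python sketch. Python is used purely for illustration, and the table and helper names are hypothetical. Option 1 needs a table that has to be memorized, while options 2 and 3 carry the width in the name, so a new width needs no new entry.

```python
import re

# Option 1: the LP64 convention relies on a memorized table (illustrative only).
LP64_WIDTHS = {"char": 8, "short": 16, "int": 32, "long": 64}

def width_from_suffix(name: str) -> int:
    """Read the width from names like int32 (option 2) or i32 (option 3)."""
    match = re.fullmatch(r"(?:int|i)([1-9][0-9]*)", name)
    if match is None:
        raise ValueError(f"not a sized integer name: {name!r}")
    return int(match.group(1))
```

With this helper, a width such as 128 works without touching any table, whereas option 1 would need a new keyword.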

@tkoeppe
Contributor

tkoeppe commented May 20, 2021

I'd defer A vs B to a more general decision. Presumably we'll want to name several other built-in types, as well as standard library components, and we'll probably want them to use the same capitalization convention.

I mildly prefer 2 over 3 because the longer names stand out ever so much more and avoid confusion with variable names. (E.g. consider for (i8 i = 0; i != 8; i |= j) etc.)

Even though I feel some nostalgia for 1, there's probably too much baggage attached to chars, strings and aliasing to make "char" a tolerable name for an integer type. But on that note, we should probably also discuss byte types and string code unit types at some point, which may feed back into this discussion.

And what about an integer type for round-tripping with pointers, will that be an alias or also a distinct fundamental type?

@geoffromer
Contributor

The current plan of record is that type names (like all other compile-time constants) should have camel-case names, and I don't think it would be a good idea to make an exception for something so fundamental as integer types, so I think we should choose (A) unless we want to reconsider that whole policy.

I prefer 2 over 3, because I agree with @zygoloid that I32 is too visually similar to l32 and 132, and I don't think making these names 2 characters shorter provides much benefit. Furthermore, I think such a naming scheme would set a bad precedent: would we likewise use F32 and F64 for floating point types? And U8, U16 etc for unsigned integral types? And U8, U16 etc (uh oh) for Unicode code unit types? Should B be the byte type? Or should it be the boolean type?

I prefer 2 over 1 for the reasons already cited by others. I think we might want to consider having a type (or alias?) named Int in addition to the explicitly-sized types discussed here, but I can't tell if options 2 and 3 are meant to rule that out, or to preserve that option.

@fowles

fowles commented May 21, 2021

@geoffromer What would you do for unsigned values? Uint32, UInt32, something else?

@chandlerc
Contributor

I largely agree with @geoffromer's summary.

For unsigned: UInt32 (capitalizing I is likely to help screen readers) or some other name such as Unsigned32. I don't have strong preferences about which name between UInt, Unsigned, or yet another good name.


However, I'll offer a potential rationale for iN while keeping the general rule about capitalization and type names.

We could say that i followed by decimal digits is actually a language keyword that aliases Int(N) (or some other spelling besides Int if there is a better one). No naming inconsistency. And it doesn't move the type out of the library, it just is a keyword alias. I can imagine good ergonomic reasons for having these aliases:

  • These really are going to be pervasive, so it seems defensible to optimize their ergonomics even w/ dedicated syntax.
  • Name ergonomics scale non-linearly in short names. I suspect i32 is more than a 40% improvement over Int32.
  • If we don't make these language-based, we'll just have a finite set of aliases that has to keep expanding (forever?). The keywords can give good diagnostics.
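
The 40% figure in the second bullet presumably refers to raw character count; that reading is an assumption, since the comment does not say so explicitly. A minimal Python sketch of the arithmetic, with a hypothetical helper name:

```python
def relative_saving(short: str, long: str) -> float:
    """Fraction of characters saved by writing `short` instead of `long`."""
    return (len(long) - len(short)) / len(long)

# "i32" has 3 characters and "Int32" has 5, so the saving is (5 - 3) / 5.
saving = relative_saving("i32", "Int32")
```

The claim in the bullet is that the perceived benefit is larger than this linear measure suggests.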

I think the biggest downside of this is we do have to decide "which types merit these aliases?" and stop adding them at some point. However, I think there are defensible heuristics here that we could pick and stick with it. Some candidates:

  • Any truly pervasive non-aggregate/composite/compound type that is also fundamentally size-parameterized.
    • I think this would give us i32, u32 (unsigned), and f32. Maybe there are others, but I suspect not many.
  • Any non-aggregate/composite/compound storage type for which we have literal syntax. (Note the storage type here -- should work fine even if 5 has a type encoding its value.)
    • I think this would add bool, likely some form of character type, and maaaybe a string-ish or string-view-ish type.

Maybe there are other heuristics, or tweaks to these heuristics...

To be honest, I'd be happy with any of these. I'd even be happy with this being completely ad-hoc and simply reflecting what makes code more readable, including easing the reading by C/C++ programmers.

I somewhat prefer having bool if we have i32, but it's a mild preference. Everything else I don't feel strongly about.

I prefer having these alias keywords purely based on the ergonomics. But I could live without them.

@geoffromer
Contributor

> @geoffromer What would you do for unsigned values? Uint32, UInt32, something else?

Yeah, either of those seems fine, and Chandler's rationale for preferring the latter makes sense to me.

> I can imagine good ergonomic reasons for having these aliases:
>
>   • These really are going to be pervasive, so it seems defensible to optimize their ergonomics even w/ dedicated syntax.
>   • Name ergonomics scale non-linearly in short names. I suspect i32 is more than a 40% improvement over Int32.
>   • If we don't make these language-based, we'll just have a finite set of aliases that has to keep expanding (forever?). The keywords can give good diagnostics.

I don't have a good intuition for why these would be more ergonomic, other than the sheer reduction in keystrokes, which I think is offset (or more than offset) by the reduced readability that comes from abbreviation. The first two points here seem to assume the conclusion (that i8 is more ergonomic than Int8). The last point seems plausible, but being able to give good diagnostics for library aliases seems like a problem we should solve either way. It also just feels like a really weird inversion to say that all the truly fundamental types come from the library, but some aliases to library types are part of the core language.

@chandlerc
Contributor

This came up in open discussion today, and I think everyone (including both me and @zygoloid among the leads) was surprisingly happy trying out the super simple rule of iN, uN, and fN all being keyword-like syntaxes for writing Int(N), UInt(N), and Float(N).

The biggest downside of that is likely i1, i2, and i3 (generally iN where N < 8) colliding with variables. The idea is to try this (including renaming colliding variables) and see how it goes. We can always back off of this stance if the collisions are too high.
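
To get a feel for which variable names would collide, here is a small Python sketch. It assumes the keyword-like names are i, u, or f followed by a decimal width with no leading zero, which is the pattern given in the commit message referenced later in this thread; the function name and the sample identifiers are hypothetical.

```python
import re

# Assumed shape of the keyword-like names: i, u, or f plus a decimal width.
SIZED_TYPE = re.compile(r"[iuf][1-9][0-9]*")

def colliding_identifiers(identifiers):
    """Return the identifiers that would be read as sized type names."""
    return [name for name in identifiers if SIZED_TYPE.fullmatch(name)]

# Typical loop and temporary names, plus one name with a leading-zero width.
hits = colliding_identifiers(["i", "i1", "i2", "j", "f0", "idx", "u8"])
```

Under that assumption a bare `i` and `f0` stay usable as variables, while `i1`, `i2`, and `u8` would need renaming.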

We can also move toward an Int32-style spelling if, to your point @geoffromer, we get feedback about the ergonomics of reading these names. So far folks seem (perhaps surprisingly) happy to have the iN super-short syntax without giving up the explicit size. The feedback from folks using this in Rust seems consistently positive, for example.

Provided we don't get any new information, or other serious concerns, this is maybe our decision for now?

@fowles

fowles commented Jul 27, 2021

Does that include bool (he asks hopefully)?

@chandlerc
Contributor

> Does that include bool (he asks hopefully)?

Not in this issue. We should get a separate question for that.

@josh11b
Contributor

josh11b commented Jul 27, 2021

In particular, I've heard that we don't want to consider bool to be an integer type.

@chandlerc
Contributor

> I think everyone (including both me and @zygoloid among the leads) was surprisingly happy trying out the super simple rule of iN, uN, and fN all being keyword-like syntaxes for writing Int(N), UInt(N), and Float(N).

Also checked with @KateGregory and we have consensus here so closing.

zygoloid added a commit that referenced this issue Aug 2, 2021
Lex [iuf][1-9][0-9]* as a new kind of "sized type literal" token. When
parsing that token, form a literal expression.

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
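
The commit message above describes lexing `[iuf][1-9][0-9]*` as a single token and forming a literal expression from it. The following Python sketch shows one way that mapping could look; it is not the actual toolchain code, and `SizedType` and `lex_sized_type` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class SizedType(NamedTuple):
    constructor: str  # "Int", "UInt", or "Float"
    width: int

_CONSTRUCTORS = {"i": "Int", "u": "UInt", "f": "Float"}
_PATTERN = re.compile(r"([iuf])([1-9][0-9]*)")

def lex_sized_type(word: str) -> Optional[SizedType]:
    """Return the expression a sized type literal stands for, else None."""
    match = _PATTERN.fullmatch(word)
    if match is None:
        return None  # an ordinary identifier or some other token
    return SizedType(_CONSTRUCTORS[match.group(1)], int(match.group(2)))
```

Here `i32` would stand for `Int(32)` and `f64` for `Float(64)`, while `i0` and `int32` are left to the ordinary identifier path.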
jonmeow added a commit that referenced this issue Aug 5, 2021
Note this doesn't support other sizes or types, it just errors on them.

Co-authored-by: Geoff Romer <gromer@google.com>
@jonmeow
Contributor

jonmeow commented Sep 7, 2021

I would like to note a typo concern: i32, i332, i22, and i23 are easy to typo between each other and there will be limited compiler validation that people typed the correct one (and, given compiling code, a limited implicit indication that a non-standard choice is a typo versus deliberate).

If int is idiomatic, i32 might rarely be typed and so the chances of a typo are low. However, if i32 is the norm, then typo forms seem more likely to occur. The difference here is that a typo of int is likely rejected by the compiler, whereas a typo of i32 is likely valid and would likely be caught at later development stages (if at all -- some will be harmless other than performance overhead).

Glancing at Rust and Swift, my impression is both provide only limited forms, not the arbitrary digit forms.
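
A rough Python sketch of the difference, under stated assumptions: the set of "expected" widths and the lint itself are hypothetical, not something proposed in this thread or implemented in the compiler.

```python
import re

FIXED_NAMES = {"int"}                     # hypothetical language with one idiomatic name
SIZED_NAME = re.compile(r"i[1-9][0-9]*")  # arbitrary-width iN names
COMMON_WIDTHS = {8, 16, 32, 64}           # assumed "expected" widths for the lint

def compiles_with_fixed_names(word: str) -> bool:
    """A typo of a fixed name is simply an unknown name."""
    return word in FIXED_NAMES

def compiles_with_sized_names(word: str) -> bool:
    """Any iN with a decimal width is a valid type name."""
    return SIZED_NAME.fullmatch(word) is not None

def suspicious_width(word: str) -> bool:
    """Hypothetical lint: a valid iN name whose width is unusual."""
    if not compiles_with_sized_names(word):
        return False
    return int(word[1:]) not in COMMON_WIDTHS
```

A typo such as `imt` is rejected outright, while `i23` and `i332` are accepted as types and would only be flagged by an extra check like `suspicious_width`.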

Note this will also have curious implications for the prelude, to the extent that we'd discussed having Int be in the prelude instead of a language builtin... I think of this as parsing i##, then translating that to an Int(##) identifier and doing a lookup on that. But then maybe it shouldn't be a normal lookup to avoid shadowing bugs -- maybe it means there needs to be a special lookup form that's only used by i##, or Int just ends up being a builtin to avoid that complexity?
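
The shadowing concern can be sketched in Python as follows. This is only a toy model of name lookup; the scope representation and both lookup functions are hypothetical and do not reflect how the toolchain resolves names.

```python
PRELUDE = {"Int": "prelude Int"}

def normal_lookup(name, scopes):
    """Search the innermost scope first, as an ordinary identifier would."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    raise KeyError(name)

def special_lookup(name):
    """Hypothetical lookup used only for i##: always goes to the prelude."""
    return PRELUDE[name]

# A user-defined Int in an inner scope shadows the prelude's Int.
scopes = [PRELUDE, {"Int": "user-defined Int"}]
```

If `i32` were translated to `Int(32)` and resolved with `normal_lookup`, it would find the user-defined `Int`; a dedicated lookup (or making `Int` a builtin) avoids that at the cost of a special case.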

chandlerc added a commit that referenced this issue Jun 28, 2022
Lex [iuf][1-9][0-9]* as a new kind of "sized type literal" token. When
parsing that token, form a literal expression.

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
chandlerc pushed a commit that referenced this issue Jun 28, 2022
Note this doesn't support other sizes or types, it just errors on them.

Co-authored-by: Geoff Romer <gromer@google.com>
jonmeow added the "leads question" label Aug 10, 2022
@jonmeow
Contributor

jonmeow commented Aug 11, 2022

Filed #1998 to track turning this into a proposal.


7 participants