Statically sized string and byte string literals #339

Merged
merged 8 commits into from
Oct 30, 2014

Conversation

petrochenkov
Contributor

@Screwtapello

I encountered this issue while learning Rust. To summarise: I'm working through the Matasano Crypto Challenges as a way to learn Rust, and as you might expect I've wound up with a bunch of functions that use 16-byte blocks for various things (keys, initialisation vectors, encrypted output, etc.).

I'd like to encode these 16-byte blocks into Rust's type-system as [u8, ..16], but for clarity I'd like to be able to call these functions with byte-array-literals like b"YELLOW SUBMARINE". Unfortunately, that's a &'static [u8] rather than a [u8, ..16], so all my functions have to take &[u8] parameters and check the length with runtime assertions, which is disappointing when this information is available at compile-time and could be checked by the compiler.
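As a rough sketch of the use case described above, written in current Rust syntax (where the thread's `[u8, ..16]` is spelled `[u8; 16]`): the helper name `xor_block` and the second key are hypothetical, chosen only for illustration.

```rust
// Hypothetical helper: XOR a 16-byte block with a 16-byte key.
// The length is part of the parameter types, so no runtime
// length assertion is needed.
fn xor_block(block: &[u8; 16], key: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = block[i] ^ key[i];
    }
    out
}

fn main() {
    // In current Rust a byte string literal of 16 bytes is a
    // `&'static [u8; 16]`, so it can be passed directly.
    let masked = xor_block(b"YELLOW SUBMARINE", b"0123456789abcdef");
    // XOR with the same key a second time restores the original block.
    let restored = xor_block(&masked, b"0123456789abcdef");
    assert_eq!(&restored, b"YELLOW SUBMARINE");
    // xor_block(b"too short", b"0123456789abcdef"); // rejected at compile time: wrong array length
}
```

With this signature a literal of the wrong length is a type error rather than a failed assertion at run time.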

I am less concerned about fixed-length string-literals. Since the syntax for a string slice doesn't resemble an array, I didn't expect it to support array-ish things like fixed-length (besides, would it be byte-length or codepoint-length, etc. etc.)

@petrochenkov
Contributor Author

@Screwtapello

besides, would it be byte-length or codepoint-length, etc. etc.

All these conventions have already been settled for runtime strings; we can simply carry them over to compile time unchanged.
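To make the runtime convention concrete: in Rust, `str::len` counts UTF-8 bytes, and code points are counted separately through `chars()`. A minimal sketch, with hypothetical helper names:

```rust
// Length in bytes of the UTF-8 encoding (what `str::len` reports).
fn byte_len(s: &str) -> usize {
    s.len()
}

// Number of Unicode scalar values (code points) in the string.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // U+00EF (i with diaeresis) takes two bytes in UTF-8.
    let word = "na\u{ef}ve";
    assert_eq!(byte_len(word), 6);
    assert_eq!(char_count(word), 5);
}
```

Under that convention a statically sized string would presumably be sized in bytes, the same way a runtime `str` is.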

@alexcrichton
Member

@petrochenkov thanks for writing up this RFC! I think at this time a fixed-length string syntax may be a little too ambitious, but I'd love to see some clarifications for the byte literal syntax! Moving forward with this RFC, would you be ok extracting out the fixed-length strings for now, and considering them a candidate for a future RFC?

With respect to byte literals, however, I'd love to help you move this RFC forward. You take an interesting approach in this RFC by having all literals produce the borrowed version of their contents as opposed to the owned version. This has interesting implications with [T, ..N] because there's no way to actually get an owned, fixed-size array (you can't move out of a borrowed pointer).

Instead of going the route of borrowing, what would you think of a rule like:

A byte literal b"foo" is exactly the same as if it were written ['f' as u8, 'o' as u8, 'o' as u8].

This would mean that all byte literals would have the type [u8, ..N] and you could always add the & in front to get the &[u8] version (via a DST coercion). It sounds like you avoided this route because "foo", if we got a fixed length string syntax, would naturally be str[N], and this could be difficult to work with.
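The equivalence being proposed can be checked in current Rust with a small sketch. Note one difference from the rule as stated here: in today's Rust the literal is already a reference, so an explicit `*` is needed to get the owned array. The function name `owned_foo` is hypothetical.

```rust
// Copy the bytes of the literal into an owned, fixed-size array.
fn owned_foo() -> [u8; 3] {
    *b"foo"
}

fn main() {
    // The spelled-out form from the proposed rule.
    let spelled_out = ['f' as u8, 'o' as u8, 'o' as u8];
    assert_eq!(owned_foo(), spelled_out);

    // Putting `&` in front and asking for a slice type triggers the
    // unsizing coercion from `&[u8; 3]` to `&[u8]`.
    let slice: &[u8] = &spelled_out;
    assert_eq!(slice.len(), 3);
}
```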

I suppose as a knee-jerk reaction I would expect something like:

  • [a, b, c] has type [T, ..3]
  • b"foo" has type &[u8, ..3]
  • "foo" has type &str
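For reference, these three typings happen to match what current Rust accepts, which the following sketch checks through return-type annotations (in modern syntax, `[T, ..3]` is written `[T; 3]`). The function names are hypothetical.

```rust
// An array literal is an owned, fixed-size array.
fn array_literal() -> [char; 3] {
    ['a', 'b', 'c']
}

// A byte string literal is a static reference to a fixed-size array.
fn byte_literal() -> &'static [u8; 3] {
    b"foo"
}

// A string literal is a static string slice; its length is not in the type.
fn str_literal() -> &'static str {
    "foo"
}

fn main() {
    assert_eq!(array_literal().len(), 3);
    assert_eq!(byte_literal().len(), 3);
    assert_eq!(str_literal().len(), 3);
}
```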

Those are all a little different, which seems odd, and it would certainly be nice to reconcile them. I don't think we can get away with [a, b, c] producing &[T, ..3] due to ownership requirements, which then raises the question of how close we can get "foo" and b"foo". There's also the question of whether, if "foo" is different from [a, b, c], b"foo" needs to be the same as either of them.

Hm, just some thoughts of mine, what do you think?

@petrochenkov
Contributor Author

@alexcrichton

I think at this time a fixed-length string syntax may be a little too ambitious ... would you be ok extracting out the fixed-length strings for now, and considering them a candidate for a future RFC?

I actually think it would be better for fixed-length strings and string slices to live in the library and not in the core language (see "Unresolved questions"), so the str[..N] syntax is mostly a placeholder for the future library type. I suppose fixed-length strings and string literals can be part of an RFC on moving str to the library; it's just important not to forget about them.
By the way, can such a move be performed now, or is something blocking it?

This has interesting implications with [T, ..N] because there's no way to actually get an owned, fixed-size array (you can't move out of a borrowed pointer).

That's unfortunate. At least it doesn't apply to fixed-length (byte) strings, since they are always Copy.
I'm not strongly opposed to array literals being [T, ..N], but &'a [T, ..N] still has its benefits: better usability, possible staticness, and consistency with other literals, as described in the RFC. Is special-casing *[a, b, c] with regard to moving out bad enough to outweigh these benefits? I can't decide.
If it _is_ bad enough, then I suggest settling on the "knee-jerk reaction" variant for now and changing &str to some library type like &str<[u8, ..N]> in the near future.
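The ownership point can be illustrated in current Rust: dereferencing a borrowed array copies it when the element type is `Copy`, but for other element types it would be a move out of a borrow, which the compiler rejects. A minimal sketch with hypothetical function names:

```rust
// `u8` is `Copy`, so `[u8; 4]` is `Copy` and `*bytes` copies the array out.
fn copy_out(bytes: &[u8; 4]) -> [u8; 4] {
    *bytes
}

// `String` is not `Copy`, so `*names` would move out of a shared
// reference and is rejected by the compiler. An explicit clone is
// needed to obtain an owned array.
fn clone_out(names: &[String; 2]) -> [String; 2] {
    // *names // does not compile: cannot move out of a shared reference
    names.clone()
}

fn main() {
    assert_eq!(copy_out(b"abcd"), [b'a', b'b', b'c', b'd']);

    let names = [String::from("a"), String::from("b")];
    let owned = clone_out(&names);
    assert_eq!(owned, names);
}
```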

With respect to byte literals, however, I'd love to help you move this RFC forward.

What needs to be done?
I implemented the "knee-jerk reaction" variant for byte string literals with small changes to librustc/middle, but then stumbled upon something very similar to issue #17233 and gave up.

@alexcrichton
Member

Currently there's no way to parameterize over the N of a fixed-length str, so I don't think that we could do fixed-length operations in a library today. I suppose in theory we could define struct str([u8]) in libcore today, but how much do you think that would end up buying us?

In the past, special-casing things like *[a] having type [T, ..1] and being valid for any T has bitten us. We used to treat &"foo" as an instance of &str (instead of &&str) for reasons that made sense at the time (I've since forgotten them), so I don't think we should go too far down the road of [a] producing &'a [T, ..1] instead of [T, ..1].

What needs to be done?

I suppose the first question to answer is what all these literals should produce. Today b"foo" is consistent with "foo" in that it is a slice, but because we have syntax for a fixed-length byte array, it seems like we should use it.

So, along those lines, here are some questions we should answer:

  1. Should b"foo" produce a slice or a fixed-size array? The pros of a slice are that it's consistent with string literals. The cons are that you're losing static information (the size).
  2. If b"foo" produces a fixed-size array, should it be [u8, ..N] or &'static [u8, ..N]? If a value is used, it continues to deviate even more from "foo", and it's also difficult to pass to a function expecting a slice.

I would currently probably answer "fixed size array" to the first question, and "reference" to the second one. Perhaps in the future "foo" could produce a static reference to a fixed-size string, but today we'll have to live with it producing a slice instead. To explain, I think I lean towards "reference" on the second question because "foo" in C produces a char* which is in Rust essentially a &'static [u8, ..N]. It would be a little surprising to produce a "foo" on the stack!
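The "difficult to pass to a function expecting a slice" concern largely disappears with the reference form, because `&[u8; N]` coerces to `&[u8]` at a call site. A sketch in current Rust, where `checksum` is a hypothetical function:

```rust
// Hypothetical function that only knows about slices.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}

fn main() {
    // The literal keeps its length in the type...
    let literal: &'static [u8; 3] = b"foo";
    // ...and still coerces to a slice when passed to `checksum`.
    assert_eq!(checksum(literal), 324); // 102 + 111 + 111
    assert_eq!(checksum(b"foo"), 324);

    // An owned array needs an explicit `&` to be passed the same way.
    let owned: [u8; 3] = *literal;
    assert_eq!(checksum(&owned), 324);
}
```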

So, in terms of moving forward: what do you think of the two questions, and what would you answer them with? The RFC will need to be updated to remove the fixed-length string syntax (but certainly mention it as a future extension!) and be updated with whatever decision we make.

Also, remember that you don't personally have to implement this!

@petrochenkov
Contributor Author

I suppose in theory we could define struct str([u8]) in libcore today, but how much do you think that would end up buying us?

Not struct str([u8]);, but struct str<Sized? T = [u8]>(T);.
Today it would buy us the main point of this RFC: statically sized string literals in the form of &'static str<[u8, ..N]> (note that parameterizing over N isn't needed here). And I'm not talking about ideological things like minimizing the core language.
@eddyb mentioned that moving str to the library was planned anyway, so why delay?
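A rough modern rendering of this idea, for illustration only: `Sized? T` is written `T: ?Sized` today, and the type is named `Str` here so it does not clash with the built-in `str`. The `len` method and the `HELLO` static are hypothetical, and the sketch performs no UTF-8 validation.

```rust
// Wrapper that is generic over its storage; defaults to the unsized slice form.
struct Str<T: ?Sized = [u8]>(T);

impl Str<[u8]> {
    // Length in bytes of the underlying storage.
    fn len(&self) -> usize {
        self.0.len()
    }
}

// A "literal" whose size is part of its type.
static HELLO: Str<[u8; 5]> = Str(*b"hello");

fn main() {
    let fixed: &'static Str<[u8; 5]> = &HELLO;
    // Unsizing coercion from the fixed-size form to the slice form.
    let slice: &'static Str<[u8]> = fixed;
    assert_eq!(slice.len(), 5);
}
```

As the comment says, no parameter over `N` is needed for the coercion itself; only code that wants to be generic over the length would need one.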

I don't think that we could do fixed-length operations in a library today.

I just want to lay the groundwork and ensure future compatibility; fixed-length strings don't need to be useful right now. Fixed-length arrays are also much less useful today than they could be.

In the past, special-casing things like *[a] having type [T, ..1] and being valid for any T has bitten us. We used to treat &"foo" as an instance of &str (instead of &&str) for reasons that made sense at the time (I've since forgotten them), so I don't think we should go too far down the road of [a] producing &'a [T, ..1] instead of [T, ..1].

Okay, let array literals stay as they are. (Although I'm still interested in the concrete reasons.)

So, along those lines, here are some questions we should answer:

  1. Should b"foo" produce a slice or a fixed-size array? The pros of a slice are that it's consistent with string literals. The cons are that you're losing static information (the size).
  2. If b"foo" produces a fixed-size array, should it be [u8, ..N] or &'static [u8, ..N]? If a value is used, it continues to deviate even more from "foo", and it's also difficult to pass to a function expecting a slice.

I would currently probably answer "fixed size array" to the first question, and "reference" to the second one.

I agree.

Perhaps in the future "foo" could produce a static reference to a fixed-size string, but today we'll have to live with it producing a slice instead.

Again, I do not understand why we should wait.

To explain, I think I lean towards "reference" on the second question because "foo" in C produces a char* which is in Rust essentially a &'static [u8, ..N]. It would be a little surprising to produce a "foo" on the stack!

"foo" in C has type const char[4] (and is special-cased to be a static lvalue!), but arrays in C decay to pointers in most cases, so this difference often stays unnoticed. But it doesn't matter, reference here is still better in all respects :)

So, in terms of moving forward: what do you think of the two questions, and what would you answer them with? The RFC will need to be updated to remove the fixed-length string syntax (but certainly mention it as a future extension!) and be updated with whatever decision we make.

My answers to the two questions are the same as yours - b"abcd" should have type &'static [u8, ..4].
I will update the RFC, but I'd still like to get more feedback on &'static str<[u8, ..N]> for string literals.

Also, remember that you don't personally have to implement this!

I know, I was just trying to study the compiler and it seemed like a relatively easy exercise to start.

@alexcrichton
Member

@eddyb mentioned that moving str to the library was planned anyway, so why delay?

I'm personally unaware of any such plans, or of the motivation for them.

Not struct str([u8]), but struct str<Sized? T = [u8]>(T)

In theory this could be added backwards-compatibly, right?

I'd still like to get more feedback on &'static str<[u8, ..N]> for string literals.

While that syntax is certainly plausible, there are many possible routes to take on the precise syntax here. I suspect that the question of b"foo" producing a reference to a fixed-length array will be unanimously approved, while taking time to bikeshed a fixed-length string syntax and discuss the semantics of impls and such may take longer.

I personally would not expect u8 to appear in a fixed-length string syntax, nor in the struct definition you proposed, because it means I could write foo: str<str> or foo: str<uint>, which don't necessarily make sense. As I see it, the point of a type parameter is that you can put "almost anything" in there, whereas in this case the only real parameter is the length of a string, which cannot currently be a type-level parameter.
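The objection can be reproduced in current Rust: an unconstrained type parameter accepts storage types that make no sense for a string. In this sketch the wrapper is named `Str` (hypothetical, to avoid clashing with the built-in `str`), and `u32` stands in for the thread's `uint`.

```rust
// The proposed shape: generic over its storage, with no further bounds.
struct Str<T: ?Sized = [u8]>(T);

// The intended use: a fixed-size byte array as storage.
fn sensible() -> Str<[u8; 3]> {
    Str(*b"foo")
}

// Nothing stops an integer from being used as "string" storage.
fn odd() -> Str<u32> {
    Str(7)
}

fn main() {
    assert_eq!(sensible().0, *b"foo");
    assert_eq!(odd().0, 7);
}
```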

@petrochenkov
Contributor Author

While that syntax is certainly plausible, there are many possible routes to take on the precise syntax here. I suspect that the question of b"foo" producing a reference to a fixed-length array will be unanimously approved, while taking time to bikeshed a fixed-length string syntax and discuss the semantics of impls and such may take longer.

Agree

I personally would not expect u8 to appear in a fixed-length string syntax, nor in the struct definition you proposed, because it means I could write foo: str<str> or foo: str<uint>, which don't necessarily make sense. As I see it, the point of a type parameter is that you can put "almost anything" in there, whereas in this case the only real parameter is the length of a string, which cannot currently be a type-level parameter.

The key requirement here is the autocoercion from a reference to a fixed string to a string slice, and we are unable to meet it now without exposing u8.

Okay, I agree, it deserves a separate discussion, and a better solution will be possible once we gain the ability to parameterize on integers.
Ideally, something like:

struct __StrImpl<Sized? T>(T); // private

pub type Str<'a> = &'a __StrImpl<[u8]>; // string slice, public
pub type FixedString<const N: uint> = __StrImpl<[u8, ..N]>; // string of fixed size, public

// &FixedString<N> -> Str : OK, including &'static FixedString<N> -> Str<'static> for string literals

I will change the "do it now" wording to something like "ensure it will be possible in the future"
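Rust has since gained const generics, so the shape sketched in this comment can be approximated in current syntax. This is only an illustration under assumptions: the leading underscores and the privacy split are dropped for brevity, `uint` becomes `usize`, and `byte_len`, `fixed_len`, and `GREETING` are hypothetical names.

```rust
// Carrier type, generic over sized or unsized storage.
struct StrImpl<T: ?Sized>(T);

// String slice: a reference to the unsized form.
type Str<'a> = &'a StrImpl<[u8]>;
// String of fixed size: the length is a const parameter.
type FixedString<const N: usize> = StrImpl<[u8; N]>;

// Works on any string slice; the length is read at run time.
fn byte_len(s: Str<'_>) -> usize {
    s.0.len()
}

// Works on fixed strings; the length comes from the type alone.
fn fixed_len<const N: usize>(_s: &FixedString<N>) -> usize {
    N
}

// Stand-in for a string literal with a statically known size.
static GREETING: FixedString<5> = StrImpl(*b"hello");

fn main() {
    // &'static FixedString<5> -> Str<'static> via unsizing coercion.
    let slice: Str<'static> = &GREETING;
    assert_eq!(byte_len(slice), 5);
    assert_eq!(fixed_len(&GREETING), 5);
}
```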

@alexcrichton
Member

Yeah, that seems more aligned with what I might expect; all we need now is to enable Foo<const N: uint>!

@petrochenkov
Contributor Author

Updated.
I couldn't find a proofreader this time, so it may contain more silly grammatical mistakes.

@alexcrichton
Member

Looks good to me, thanks @petrochenkov! I'll see if I can bring this up at this week's meeting.

@hatahet

hatahet commented Oct 13, 2014

@alexcrichton
Member

Discussion

Tracking

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Let's just go to a plain old `Mutex` instead of trying to be fancy with an
opportunistic lock. It weeds out more bugs and we can deal with perf
improvements later if a correct implementation comes along.

Closes rust-lang#339
@Centril Centril added A-string Proposals relating to strings. A-typesystem Type system related proposals & ideas labels Nov 23, 2018
6 participants