Add some additional utility methods to OsString and OsStr #1307

eefriedman · 2015-10-04T21:10:22Z

wthrowe · 2015-10-05T02:37:41Z

I've actually been thinking about OsStr(ing) interfaces recently and was thinking about writing something up, but hadn't gotten around to it yet. In terms of this proposal, is_empty and clear seem obviously reasonable, and I don't think there could be any doubt about what they do. from_string is also fine, although OsString already implements From<String>, which does the same thing.

It isn't clear to me what the length and capacity related methods would do, though. The internal representation of OsString on Windows is not very meaningful and most of the details of it are intentionally not exposed by the interface. Would these methods give values related to the internal WTF-8 representation (which should be useless, because all consequences of that encoding should be hidden), or of the system ill-formed UTF-16 representation (which would require some kind of conversion)?

eefriedman · 2015-10-05T03:41:07Z

It isn't clear to me what the length and capacity related methods would do, though.

len() and capacity() are defined in terms of the number of bytes used internally. That has uses even without being able to index into an OsString. len() and with_capacity() can be used alongside OsString::push to avoid unnecessary reallocations. capacity() can be used as a rough estimate of heap usage. Probably some other uses I'm not thinking of off the top of my head.

Granted, they don't really have any semantic meaning, but they still have uses.

Actually, come to think of it I forgot to add reserve() and reserve_exact().

wthrowe · 2015-10-05T05:35:23Z

Good points. I can see why we'd want those, now.

So I think I now like everything here except from_string, because the conversion functions related to OsString are already a mess and I don't think we want to add yet another nonstandard one that duplicates already existing functionality. (It doesn't look like anything currently exposed by the standard library has a from_string method.)

nagisa · 2015-10-05T05:41:15Z

We usually add specific conversion methods for conversions that are expected to be done commonly (e.g. str → String). I don’t see people using OsString oftenly, let alone its conversion methods. I’d vote for leaving from_string out here as well.

len and capacity both sound fine, because on all currently supported platforms the OsString representation is byte-based (bytes on unix, WTF-8 on Windows).

eefriedman · 2015-10-05T22:28:08Z

Updated: removed from_string, added reserve and reserve_exact, added a brief note in the drawbacks section noting the discussion of the meaning of len().

Diggsey · 2015-10-06T11:18:11Z

Looks good. OsStr should probably also implement AsciiExt.

I had a look through and there weren't any other methods on str that made sense to implement: anything involving substrings can't work correctly in WTF8.

However, it may be worthwhile (although out of scope for this RFC) defining an OsChar type which is a newtype around either u8 or u16.

Substring operations can then be defined in terms of iterators over OsChars and OsString can implement From<Iterator<OsChar>>, so that iter.collect() can produce an OsString.

BurntSushi · 2015-10-07T21:32:47Z

For from_string, note that today's unstable from_bytes works with anything that impls Into<Vec<u8>> (String has such an impl).

I feel like these methods won't be used that often, but if you need to use OsString, then these do seem like sensible things to have.

I think this looks good to me.

wthrowe · 2015-10-07T22:53:34Z

There's also the stable OsString::from(s: String)-> OsString.

alexcrichton · 2015-10-15T01:25:32Z

🔔 This RFC is now entering its week-long final comment period 🔔

I feel that these methods are quite conventional among collections, so the only particular question in my mind would be whether we want them at all. I originally thought that they're not useful because the length of an OS string doesn't actually mean anything in terms of decoding it, but it does mean something in terms of performance (a key insight here), which I feel is justification enough for their inclusion.

sfackler · 2015-10-15T01:32:21Z

I feel a bit iffy with the naming choice of len - it seems like people will inevitably try to apply the "intuitive" definition of length which will end up being bogus on Windows, no matter how dire any warnings in the documentation are. Maybe rename it to storage_size or something like that instead?

BurntSushi · 2015-10-15T01:37:58Z

@sfackler Isn't that the case for String's len method too? Specifically, it refers to the storage size rather than something else like [Insert Favorite Unicode Semantic Here]. (I very well could be missing a subtlety here!)

sfackler · 2015-10-15T01:47:00Z

Well, String's storage size is the UTF8 size, which is way more common than WTF8 is :)

retep998 · 2015-10-16T22:17:13Z

@sfackler Working with strings with Windows API is already a minefield. If someone doesn't understand what len on OsStr does, that's probably the least of their problems. We'd gain a lot more from just having a solid guide on how to work with Windows API, rather than trying to give functions weird names.

arthurprs · 2015-10-27T02:29:16Z

I believe len should stick to the existing semantics (string, vec, all collections) and return the number of underlying items, in this case the utf16ish words.

BurntSushi · 2015-10-27T02:38:34Z

@arthurprs &str's len returns the number of UTF-8 encoded bytes. Seems to me like OsStr should do the same (return the number of encoded bytes).

Diggsey · 2015-10-27T02:41:52Z

I believe len should stick to the existing semantics (string, vec, all collections) and return the number of underlying items, in this case the utf16ish words.

That would require iterating the OsString to calculate, when the len() functions on strings and vecs operate in constant time. It would also be totally inconsistent across platforms, and therefore unusable from non-platform-specific code.

Returning a byte count is portable, consistent and has practical usefulness for when building OsStrings. Without it, there's no direct relation between reserve and reserve_exact.

I don't think this breaks the encapsulation of the WTF8 encoding because len() is a lower level operation: it's the same reason that having size_of can be useful even when struct layout is ill-defined.

arthurprs · 2015-10-27T02:44:11Z

Yeah, I believe you guys are right since the underlying data is actually wtf8.

alexcrichton · 2015-10-29T17:22:02Z

The libs team discussed this during triage yesterday and the decision was to merge, thanks again for the RFC @eefriedman!

Tracking issue: rust-lang/rust#29453

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 5, 2015

RFC to add some additional utility methods to OsString and OsStr.

736bfce

eefriedman force-pushed the osstring-simple branch from ba84aa9 to 736bfce Compare October 5, 2015 22:25

wthrowe mentioned this pull request Oct 6, 2015

OS string string-like interface #1309

Closed

BurntSushi self-assigned this Oct 7, 2015

alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Oct 15, 2015

alexcrichton mentioned this pull request Oct 29, 2015

Add Capacity/length methods for OsString rust-lang/rust#29453

Closed

alexcrichton merged commit 736bfce into rust-lang:master Oct 29, 2015

Centril added A-platform Platform related proposals & ideas A-collections Proposals about collection APIs labels Nov 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add some additional utility methods to OsString and OsStr #1307

Add some additional utility methods to OsString and OsStr #1307

eefriedman commented Oct 4, 2015

wthrowe commented Oct 5, 2015

eefriedman commented Oct 5, 2015

wthrowe commented Oct 5, 2015

nagisa commented Oct 5, 2015

eefriedman commented Oct 5, 2015

Diggsey commented Oct 6, 2015

BurntSushi commented Oct 7, 2015

wthrowe commented Oct 7, 2015

alexcrichton commented Oct 15, 2015

sfackler commented Oct 15, 2015

BurntSushi commented Oct 15, 2015

sfackler commented Oct 15, 2015

retep998 commented Oct 16, 2015

arthurprs commented Oct 27, 2015

BurntSushi commented Oct 27, 2015

Diggsey commented Oct 27, 2015

arthurprs commented Oct 27, 2015

alexcrichton commented Oct 29, 2015

Add some additional utility methods to OsString and OsStr #1307

Add some additional utility methods to OsString and OsStr #1307

Conversation

eefriedman commented Oct 4, 2015

wthrowe commented Oct 5, 2015

eefriedman commented Oct 5, 2015

wthrowe commented Oct 5, 2015

nagisa commented Oct 5, 2015

eefriedman commented Oct 5, 2015

Diggsey commented Oct 6, 2015

BurntSushi commented Oct 7, 2015

wthrowe commented Oct 7, 2015

alexcrichton commented Oct 15, 2015

sfackler commented Oct 15, 2015

BurntSushi commented Oct 15, 2015

sfackler commented Oct 15, 2015

retep998 commented Oct 16, 2015

arthurprs commented Oct 27, 2015

BurntSushi commented Oct 27, 2015

Diggsey commented Oct 27, 2015

arthurprs commented Oct 27, 2015

alexcrichton commented Oct 29, 2015