-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes, strings: add Cut #46336
Comments
Change https://golang.org/cl/322210 mentions this issue: |
Change https://golang.org/cl/322209 mentions this issue: |
I like this idea. I have a question regarding the decision to return |
@eZanmoto Usually there is no value to return in a "negative" case. Here there is. Instead of thinking of the final bool as indicating whether the first two results exist at all, think of it as adding one more piece of information to the overall answer. That is, it's a different kind of bool from a map access, like math/big.Rat.Float64. Or think of it as similar to the values returned on range errors from strconv.ParseInt. I renamed the result in the declarations above to |
I had an idea too bad not to share:
I think this constitutes a vote for Cut. I do think it's correct to return s when found==false, because s is indeed the set of characters in s prior to the first instance of a thing which was not found in s. |
Hearty +1 from me. I am also in favor of |
The idea of the Cut function is great, but seeing hat his will be both for string and byte arrays, perhaps this could be a generic function? |
Generics are not really helpful for unifying string and []byte functions, and the separate strings and bytes packages will not be going away. |
I definitely support adding this feature, and would have used it many times already. I remember really appreciating when they added However, I don't think the name My vote would be for the exact same signature, but with the name |
Fwiw, as the author of the original issue I can say that |
Maybe pedantic, but as a mathematician, to me "partition" means a set of disjoint sets the union of which equals another set. Since Cut, as proposed, does not include the separator, it does not give a partition under that technical definition (adjusted for strings). Python's partition does include the separator in the returned value, so it does truly form a partition. |
When I first saw this proposal, I assumed |
I think Cut is a really illustrative metaphor for the operation. It takes a second to get it, but then once you get it it’s completely unambiguous. Also it’s short, so not v taxing in the page. I’m pro cut! |
I don't think "Cut" captures the meaning; it implies "remove/drop". Options: Split2 - associates with current Split ops Since many string-search ops support a LastOp() variant, I think LastCut should be added too. |
Chop? |
In case it's helpful to anyone, some evocative images (several of them food-related!) that I think capture the spirit of each word, with a comment on each:
|
But "Cut" does remove/drop the "sep" part. |
Rust calls this operation split_once |
For more readability and compatibility with other functions (Split, SplitN, ...), it is better to name it SplitOnce. |
Since the proposed API is most like the current Split APIs... SplitAt - emphasizes separator |
Let me suggest
|
This proposal has been added to the active column of the proposals project |
I think we probably want to wait on LastCut until we have more compelling data. |
This hasn't been merged yet. Would it be helpful for me to open a CL? |
I was just about to send one. Thanks though. |
Change https://golang.org/cl/351711 mentions this issue: |
Change https://golang.org/cl/351710 mentions this issue: |
Many uses of Index/IndexByte/IndexRune/Split/SplitN can be written more clearly using the new Cut functions. Do that. Also rewrite to other functions if that's clearer. For #46336. Change-Id: I68d024716ace41a57a8bf74455c62279bde0f448 Reviewed-on: https://go-review.googlesource.com/c/go/+/351711 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Should the godoc for edit: I thought I was too lazy to sign the CLA and setup gerrit etc to send a one word patch, but in fact this only required a slow Sunday afternoon: https://go-review.googlesource.com/c/go/+/354969 |
Change https://golang.org/cl/369981 mentions this issue: |
For #47694. Updates #46336. Change-Id: Ibbd058a1fd4d6b0aa38d3e8dc15b560d1e149f7e Reviewed-on: https://go-review.googlesource.com/c/go/+/369981 Trust: Austin Clements <austin@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org>
A reasonable argument against the name
was replied by:
And yet we haven't mentioned it. Besides, we should mention Moreover, in cases when users open the Maybe we should have named it |
We’re in the Go 1.18 beta period. I think the ship has sailed on the name, but it’s not too late to send docs fixes. |
Sent https://go.dev/cl/382954 for the docs. @danieljl Thanks for pointing that out. |
Change https://golang.org/cl/382954 mentions this issue: |
This comment was marked as resolved.
This comment was marked as resolved.
For #46336 Change-Id: Idc23302085e14e24d571f5995d6d33ca964a0021 Reviewed-on: https://go-review.googlesource.com/c/go/+/382954 Trust: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
I propose to add to package strings:
I similarly propose to add to package bytes:
Cut is a single function that replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once (Index, IndexByte, IndexRune, and SplitN). It has also been invented twice in the main repo. For these reasons, I propose to add Cut to the standard library.
These were previously proposed as part of #40135, which turned into quite the bikeshed discussion about compiler optimization and so on. There were also various other functions proposed, which made the discussion even more wide-ranging.
This issue is a replacement for #40135, to start a new and I hope more focused discussion. Thanks very much to @ainar-g for starting the discussion on #40135.
Edit, May 24: Renamed the third result to
found
to avoid confusion with the comma-ok form for map access, channel receive, and so on, which always return zero values with ok = false.The Unreasonable Effectiveness of Cut
To attempt to cut off the bikeshedding this time, let me present data showing why Cut is worth adding.
Anecdotally, after the discussion on #40135 I copied the suggested implementation of Cut into every program that I wrote that could use it for a while, and that turned out to be just about every program I wrote that did anything with strings. That made me think there is something here.
To check that belief, I recently searched for uses of strings.Index, strings.IndexByte, or strings.IndexRune in the main repo and converted the ones that could use strings.Cut instead. Calling those all “Index” for the sake of discussion (any call with a constant sep should use Index anyway), there were:
That leaves 285 calls. Of those, 221 were better written as Cut, leaving 64 that were not.
That is, 77% of Index calls are more clearly written using Cut. That's incredible!
A typical rewrite is to replace:
with
I have long believed that this line from the first version is an elegant Go idiom:
However, it is undeniably more elegant not to write that line at all, as in the second version. More complex examples became quite a lot more readable by working with strings instead of indexes. It seems to me that indexes are the equivalent of C pointers. Working with actual strings instead, as returned by Cut, lets the code raise the level of abstraction significantly.
As noted in #40135, Cut is very much like SplitN with n=2, but without the awkwardness and potential inefficiency of the slice. A typical rewrite is to replace:
with
In the discussion on #40135, the point was raised that maybe we could make SplitN with count 2 allocate the result slice on the caller's stack, so that the first version would be as efficient as the second. I am sure we can with mid-stack inlining, and we probably should to help existing code, but making SplitN with n=2 more efficient does not make the code any less awkward to write. The reason to adopt Cut is clarity of code, not efficiency.
Of the 33 calls to SplitN in the main repo (excluding examples), 24 were using count 2 and were more clearly written using Cut instead.
That is, 72% of SplitN calls are also more clearly written using Cut. That's also incredible!
Looking in my public Go corpus, I found 88,230 calls to strings.SplitN. Of these, 77,176 (87%) use a fixed count 2. I expect that essentially all of them would be more clearly written using Cut.
It is also worth noting that something like Cut had been reinvented in two different packages as an unexported function: golang.org/x/mod/sumdb/note's chop (7 uses) and net/url's split (4 uses). Clearly, slicing out the separator is an incredibly common thing to do with the result of strings.Index.
The conversions described here can be viewed in CL 322210.
As noted in #40135, Cut is similar to Python's str.partition(sep), which returns (before, sep, after) if sep is found and (str, "", "") if not. The boolean result seems far more idiomatic in Go, and it was used directly as a boolean in over half the Cut calls I introduced in the main repo. That is, the fact that str.partition is useful in Python is added evidence for Cut, but we need not adopt the Pythonic signature. (The more idiomatic Go signature was first suggested by @nightlyone.)
Again, Cut is a single function that replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once (Index, IndexByte, IndexRune, and SplitN), and it has been invented twice in the main repo. That seems above the bar for the standard library.
Why not other related functions?
A handful of other functions were suggested in #40135 as well. Here is some more data addressing those.
LastCut. This function would be
There are 107 calls to strings.LastIndex and strings.LastIndexByte in the main repo (compared to 285 for Index/IndexByte/IndexRune). I expect a substantial fraction of them would be made simpler using LastCut but have not investigated except for looking at a couple dozen random examples. I considered including LastCut in this proposal, but it seemed easier to limit it to the single function Cut.
It may be worth adding LastCut as well.
Until. This function would be
Of the 261 calls to strings.Cut I ended up with in the main repo, 51 (20%) were of this form. Having the single-result form would further simplify uses by allowing it to appear inside other expressions. Like with LastCut, I considered including Until in the proposal but it seemed easier to limit it to the single function Cut.
It may be worth adding Until as well.
(Until was also suggested as PrefixUntil.)
SuffixAfter. This function was suggested as meaning
Similarly, we could consider a variant using Cut instead of LastCut. In the 261 calls to strings.Cut I ended up with in the main repo, only 6 (2%) were of the form
_, x, _ = strings.Cut
. So the SuffixAfter form using the result of Cut is definitely not worth adding. It is possible that the data would come out differently for LastCut, but in the absence of that evidence, it is not worth proposing to add SuffixAfter. (Adding the LastCut form would require first adding LastCut as well.)SplitFirst. This function was suggested as meaning
Of the 261 calls to Cut I added to the main repo, only 106 (41%) discard the final result. Although the result can be reconstructed as
len(before) != len(s)
, it seems wrong to force the majority of uses to do that, especially since there are multiple results either way.Why not other unrelated functions?
There was concern in #40135 even among those who thought Cut was worth adding about where the bar was for standard library additions and whether adding Cut meant opening the floodgates. I am comfortable saying that the bar is somewhere near “replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once” and similarly confident that establishing such a bar would not open any floodgates.
The text was updated successfully, but these errors were encountered: