-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
default last in strutils.find to -1 #18173
Conversation
pre-existing API is horrible, conflating a valid value with a different meaning note that this breaks code, and the resulting API is still bad because it doesn't use openArray; instead how about this: introduce func find*(s: openArray[char], sub: string): int = ...
func find*(s, sub: string, start: Natural, last = -1): int {.deprecated: "use overload with openArray".} = ...
func find*(s, sub: string, last: int): int {.deprecated: "use overload with openArray".} = ... (the deprecated procs, as i wrote it, have at least 1 non-optional param (start or last), so that the correct overload is picked if no scuh param is used) more generally
=> just summarized this in nim-lang/RFCs#381 links |
@timotheecour I've added new variants of First, before this change, a call to Second, the disadvantage of these methods taking Lines 1857 to 1863 in 58159fc
strutils ?
For example, # This seems much more useful...
doAssert find("dexxxdex", "de", start=5) == 5
# ...than this one
doAssert find("dexxxdex".toOpenArray(5, 7), "de") == 0 |
indeed, tough problem, it's hard to satisfy both of these: find(a, b) # uses new API
find(a, b, 3) # means start = 3
find(a, b, last = 3) # means last = 3 and we can't introduce this silent breaking change I need to think more here... That would also address your 2nd point although i kind of disagree with that 2nd point; toOpenArray simplifies the interface / complexity of the API; you pass a container that knows its bounds and don't have to care about whether it was a slice from a bigger container. |
These new operations don't have to be in strutils anyway... |
ok for |
Okay great, we'll leave all overloads of Apache Commons Lang is an example of another popular library that has addressed what I raised in my second point in #18173 (comment). They have two methods:
I strongly believe this is the API we should strive for. Notably, there is no need for a |
Another nice-to-have feature would be that the return types of these |
ok for ok for: proc indexOf*(s: openArray[char], sub: openArray[char] | char): int I don't think a The place where start is really needed is when the API needs some lookbehind (eg regex), which isn't the case here.
This would just add un-necessary complexity for users which then need to do conversion between int and StringIndex everytime this is used. |
My plan going forward (in order):
Would it be useful to always have a naive implementation available (useful because the naive implementation requires no preprocessing)? |
@n5m IMO all those comments you link to our outdated because we now use For now though, I'd recommend just having a version of [1] precedent: dragonbox, schubfach |
I ran into some issues here. You'll note that in my most recent push, I have this: Lines 149 to 151 in 09a0df2
This is because although the following linse work in a standalone const haystack: seq[char] = @['a', 'b', 'c']
doAssert haystack.indexOf(@['a', 'b', 'c']) == 0 Do you know why these line fail in a test but work outside of it? Even more problematic is that if we force the use of Lines 232 to 236 in 09a0df2
then const haystack: string = "0123456789ABCDEFGH"
echo haystack.toOpenArray(5, 9).indexOf("A") prints "5" when it should print out "-1". This makes the case that perhaps |
@timotheecour I'm still having trouble using the Example: func c_strstr(haystack, needle: cstring): cstring {.
importc: "strstr", header: "<string.h>".}
func naiveIndexOf(haystack, needle: openArray[char]): int =
# naive implementation, doesn't handle -1 case
let found = c_strstr(haystack[0].unsafeAddr, needle[0].unsafeAddr)
return cast[ByteAddress](found) -% cast[ByteAddress](haystack[0].unsafeAddr)
echo naiveIndexOf("abcdef", "e") # prints 4
echo naiveIndexOf("abcdef".toOpenArray(1, 2), "e") # prints 3 $ nim --version
Nim Compiler Version 1.5.1 [Linux: amd64]
Compiled at 2021-08-21
Copyright (c) 2006-2021 by Andreas Rumpf
git hash: 5b26f2bd81d6fc7d48befbfb4fa3317f713af787 |
can't you do something like: if found == nil: result = -1
else: ... |
Yes, but unfortunately there is a bigger obstacle. I'll trim down my example from #18173 (comment) func naiveIndexOf(haystack, needle: openArray[char]): int =
# naive implementation, doesn't handle -1 case
let found = c_strstr(haystack[0].unsafeAddr, needle[0].unsafeAddr)
return cast[ByteAddress](found) -% cast[ByteAddress](haystack[0].unsafeAddr)
echo naiveIndexOf("abcdef".toOpenArray(1, 2), "e") # prints 3 This shows that |
# This is the 1st commit message: default last in strutils.find to -1 # This is the commit message nim-lang#2: Revert "default last in strutils.find to -1" This reverts commit 3a99556. # This is the commit message nim-lang#3: add openArray variable of find with SkipTable # This is the commit message nim-lang#4: add find method that accepts an openArray # This is the commit message nim-lang#5: add tests # This is the commit message nim-lang#6: Revert "add find method that accepts an openArray" This reverts commit 2ab5351. # This is the commit message nim-lang#7: Revert "add openArray variable of find with SkipTable" This reverts commit 58159fc.
fix CI tests on public nimble modules
fix koch docs generation
Succeeded by #19761. |
Refs #18127 (comment). The docs for
find
say thatlast
represents an inclusive end index, butlast=0
is not treated as the 0th index. This PR fixes that.Future directions:
IndexDefect
iflast < start
, which will help with edge cases for find and rfind differ on empty needle #18128.last
being restricted to union(-1,Natural
) is a bit inconvenient. Is there room for something adjacent toNatural
like this?This would be like
ssize_t
and this type could also be used as the return type offind
andrfind
. Alternatively, we could change the paramter infind
toinstead of what it is in the current state of this PR, which is