-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Span.Until to Superpower.Parsers #129
Conversation
Thanks! We'll take a look as soon as we have some time free to think it through 👍 |
Sorry about the slow turnaround on this. I have a few thoughts but haven't been through in enough detail to be able to offer a useful review just yet - I'll add some comments anyway in case they're useful to move things forwards. |
src/Superpower/Parsers/Span.cs
Outdated
{ | ||
TextSpan x = input; | ||
|
||
while (!x.IsAtEnd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be nice to look at using string.IndexOf
here to speed up the search, since it'll be cleverly implemented at the low level? There's a similar algorithm in the regex matcher at: https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Span.cs#L219, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understood the indexOf reference you link to - is it the right place?
Anyway, I made a new version that I believe should be faster, as it only scans the input once.
Maybe it's better that you do the implementation, I think you are far more into all the details :-)
src/Superpower/Parsers/Span.cs
Outdated
x = x.ConsumeChar().Remainder; | ||
} | ||
|
||
return Result.Empty<TextSpan>(input, $"Until expected {text}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would need to pull out the formatting here and put it in a variable outside the returned closure, so that we don't allocate strings for every non-matching result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it :-)
@@ -83,5 +84,33 @@ public void RegexMatches() | |||
var r = parser(input); | |||
Assert.Equal("Foo", r.Value.ToStringValue()); | |||
} | |||
|
|||
[Theory] | |||
[InlineData("", "")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Until("STOP")
fail if `"STOP" is missing from the input string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I moved it to the fail expecting test set, and removed the OptionalOrDefault()
[InlineData("123STOP", "123")] | ||
[InlineData("STOP123", "")] | ||
[InlineData("123STOP123STOP123", "123")] | ||
public void UntilMatches(string text, string expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could rename to be more descriptive - UntilMatchesWhenStopwordIsPresent()
perhaps? (Similar for the tests below?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
[InlineData("12345")] | ||
public void UntilParseFails(string text) | ||
{ | ||
Assert.Throws<ParseException>(() => Span.Until("STOP").Parse(text)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to check that the error message and position are what we expect, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Is it that bad :-) The only speed improvement I currently see is to avoid the remainder pattern. I'm actually a little puzzled on why you can only consume one char on the TextSpan. Why not just advance the position and keep consuming? |
Hehe 😄 no it's not bad at all! Just my week that got out of control 😅 I think this is good, I just need to give it another look over and hit the button, but getting back into the right head-space takes a bit of time. Sorry it's a a slow process right now! I haven't had a chance to check out what you mean regarding the character-by-character consumption, but I suspect it's because of the line number tracking Superpower does. There are definitely still opportunities to improve all over the place, though. |
I'm glad that you didn't press that button. There was a major issue with the calculation of the found position, as it didn't include the input offset. I found out, when I used the new version today in my other project :-) Regarding my question consider the ConsumeChar:
My question is, why it's returning a new TextSpan everytime we consume a char. I'm only asking because I'm guessing and haven't spend any time on stuff like this before. But I'm super happy about starting to get deeper into it, as it opens up a wealth of new opportunities. PS. How about a PeekChar method :-) |
Glad you were able to spot the issue!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looked over this in a bit more detail, HTH!
src/Superpower/Parsers/Span.cs
Outdated
{ | ||
if (text == null) throw new ArgumentNullException(nameof(text)); | ||
|
||
var expectation = $"Until expected '{text}'"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be "(backtick){text}(backtick)"
for consistency with other messages; see e.g. https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs#L115
We won't need this if we use the Except()
option.
src/Superpower/Parsers/Span.cs
Outdated
/// <param name="text">The text to match until. The text is not included in the result.</param> | ||
/// <returns>A parser that will match everything until the text argument is matched.</returns> | ||
/// <exception cref="ArgumentNullException">The text is null.</exception> | ||
public static TextParser<TextSpan> Until(string text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that we call this Except
when working with characters:
https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Character.cs#L95
Character.Except('c')
and Span.Except("stop")
would be similar in design; the advantage of Except()
over Until()
is that the user can choose whether or not the stopword must be present.
from content in Span.Except("stop")
from _ in Span.EqualTo("stop")
would be functionally identical to Until()
, while the second part of that construction could be omitted if the stopword is optional (say, where an end-of-file might have the same meaning).
There would be some reprocessing to match the stopword, but the position of it would be passed through via the remainder so minimal searching would need to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until() should leave the stopword in the reminder, so I think in it's current state it is actually identical to Span.Except()
I also like that naming better, more concise. So I think it's just a matter of renaming it.
src/Superpower/Parsers/Span.cs
Outdated
@@ -258,5 +258,39 @@ public static TextParser<TextSpan> MatchedBy<T>(TextParser<T> parser) | |||
result.Remainder); | |||
}; | |||
} | |||
|
|||
/// <summary> | |||
/// Parse as much of the input as matches <paramref name="text" />. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matches text from the source string until is seen, or the end of input is reached. ? (See below for the variation regarding EOF.)
TextSpan remainder = input; | ||
var matchIndex = 0; | ||
|
||
while (remainder.Length>0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still planning to use IndexOf()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By IndexOf(), you mean the string method right?
If so, I was wondering if allocating new strings every time we visit, would be a better choice than just using the TextSpan consume/reminder pattern. As a general thought and related to my last question, I think I would have chosen a design with a mutable TextSpan for performance reasons. I'm not sure that I understand the need for immutability without also knowing a lot more of the details behind your design choices - but it's just what comes to mind when you start thinking about it as an outsider. If you create a new TextSpan you could still leave it untouched without advancing it (for use as a marker), so it still puzzles me a little that it has to be immutable. I would love to hear why, but don't write a long story - I'm just curious :-)
When is this going to be merged? 🤔 |
I'm not sure whether it's me or Nicolas we are waiting for :-) I would like to know if I should go for indexOf or keep the consume/remainder pattern before continuing. Maybe it could also be determined later. Might need a benchmark test to see which is better, as I think it will vary with the size of the scan. Well, I will update according to the last comments, and then we can decide on the indexOf after that :-) |
…uld now be aligned with Character.Except.
@nblumhardt this looks OK to me! |
https://www.nuget.org/packages/Superpower/3.0.0-dev-00201 is out :-) Thanks for sticking with this, @Ellested! I made some tweaks on the way through - current version is at https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Span.cs#L275
|
Looks great. I didn't realize the source string was available all this time, so I thought you meant I should use intput.ToStringValue() and then IndexOf() on the result. Now I get it :-) |
Ah - great! 👍 Thanks again! |
I rely on the compiler to inline functions as necessary, so I have not done any optimizations.