Added Span.Until to Superpower.Parsers #129

Ellested · 2021-02-21T23:53:56Z

I rely on the compiler to inline functions as necessary, so I have not done any optimizations.

nblumhardt · 2021-02-22T22:30:53Z

Thanks! We'll take a look as soon as we have some time free to think it through 👍

nblumhardt · 2021-03-07T23:27:22Z

Sorry about the slow turnaround on this. I have a few thoughts but haven't been through in enough detail to be able to offer a useful review just yet - I'll add some comments anyway in case they're useful to move things forwards.

nblumhardt · 2021-03-07T23:31:44Z

src/Superpower/Parsers/Span.cs

+            {
+                TextSpan x = input;
+
+                while (!x.IsAtEnd)


It may be nice to look at using string.IndexOf here to speed up the search, since it'll be cleverly implemented at the low level? There's a similar algorithm in the regex matcher at: https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Span.cs#L219, I think.

Not sure I understood the indexOf reference you link to - is it the right place?
Anyway, I made a new version that I believe should be faster, as it only scans the input once.
Maybe it's better that you do the implementation, I think you are far more into all the details :-)

nblumhardt · 2021-03-07T23:32:36Z

src/Superpower/Parsers/Span.cs

+                    x = x.ConsumeChar().Remainder;
+                }
+
+                return Result.Empty<TextSpan>(input, $"Until expected {text}");


Would need to pull out the formatting here and put it in a variable outside the returned closure, so that we don't allocate strings for every non-matching result.

nblumhardt · 2021-03-07T23:33:07Z

test/Superpower.Tests/Parsers/SpanTests.cs

@@ -83,5 +84,33 @@ public void RegexMatches()
            var r = parser(input);
            Assert.Equal("Foo", r.Value.ToStringValue());
        }
+
+        [Theory]
+        [InlineData("", "")]


Shouldn't Until("STOP") fail if `"STOP" is missing from the input string?

Yep, I moved it to the fail expecting test set, and removed the OptionalOrDefault()

nblumhardt · 2021-03-07T23:33:41Z

test/Superpower.Tests/Parsers/SpanTests.cs

+        [InlineData("123STOP", "123")]
+        [InlineData("STOP123", "")]
+        [InlineData("123STOP123STOP123", "123")]
+        public void UntilMatches(string text, string expected)


Could rename to be more descriptive - UntilMatchesWhenStopwordIsPresent() perhaps? (Similar for the tests below?)

nblumhardt · 2021-03-07T23:33:59Z

test/Superpower.Tests/Parsers/SpanTests.cs

+        [InlineData("12345")]
+        public void UntilParseFails(string text)
+        {
+            Assert.Throws<ParseException>(() => Span.Until("STOP").Parse(text));


Good to check that the error message and position are what we expect, if possible.

Ellested · 2021-03-19T11:00:12Z

Is it that bad :-)

The only speed improvement I currently see is to avoid the remainder pattern. I'm actually a little puzzled on why you can only consume one char on the TextSpan. Why not just advance the position and keep consuming?
Of course the struct is allocated on the stack, but there's still some copying that needs to through, and the only difference is the position. Well, I guess that's just my limited knowledge of not working with the details.

nblumhardt · 2021-03-19T23:35:13Z

Hehe 😄 no it's not bad at all! Just my week that got out of control 😅

I think this is good, I just need to give it another look over and hit the button, but getting back into the right head-space takes a bit of time. Sorry it's a a slow process right now!

I haven't had a chance to check out what you mean regarding the character-by-character consumption, but I suspect it's because of the line number tracking Superpower does. There are definitely still opportunities to improve all over the place, though.

… accounted for.

Ellested · 2021-03-20T15:05:56Z

I'm glad that you didn't press that button. There was a major issue with the calculation of the found position, as it didn't include the input offset. I found out, when I used the new version today in my other project :-)

Regarding my question consider the ConsumeChar:

public Result<char> ConsumeChar()
{
	EnsureHasValue();
	if (IsAtEnd)
	{
		return Result.Empty<char>(this);
	}
	char ch = Source[Position.Absolute];
	return Result.Value(ch, this, new TextSpan(Source, Position.Advance(ch), Length - 1));
}

My question is, why it's returning a new TextSpan everytime we consume a char.
Why not just keep the current TextSpan, and advance the position until something interesting happens?
I'm only interested in the character, but I get a complete result struct where only the position and length is changed.
So even though it's a struct, it is still memory initialization/copying of bytes, that weighs in tight loops.

I'm only asking because I'm guessing and haven't spend any time on stuff like this before. But I'm super happy about starting to get deeper into it, as it opens up a wealth of new opportunities.

PS. How about a PeekChar method :-)

nblumhardt · 2021-03-26T06:24:30Z

Glad you were able to spot the issue!

TextSpan is immutable so that parsers can treat them like bookmarks to try different options while moving backwards and forwards within the string. HTH!

nblumhardt

Just looked over this in a bit more detail, HTH!

nblumhardt · 2021-03-29T01:52:32Z

src/Superpower/Parsers/Span.cs

+        {
+            if (text == null) throw new ArgumentNullException(nameof(text));
+
+            var expectation = $"Until expected '{text}'";


This should be "(backtick){text}(backtick)" for consistency with other messages; see e.g. https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs#L115

We won't need this if we use the Except() option.

nblumhardt · 2021-03-29T01:59:50Z

src/Superpower/Parsers/Span.cs

+        /// <param name="text">The text to match until. The text is not included in the result.</param>
+        /// <returns>A parser that will match everything until the text argument is matched.</returns>
+        /// <exception cref="ArgumentNullException">The text is null.</exception>
+        public static TextParser<TextSpan> Until(string text)


Just realized that we call this Except when working with characters:

https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Character.cs#L95

Character.Except('c') and Span.Except("stop") would be similar in design; the advantage of Except() over Until() is that the user can choose whether or not the stopword must be present.

from content in Span.Except("stop") from _ in Span.EqualTo("stop")

would be functionally identical to Until(), while the second part of that construction could be omitted if the stopword is optional (say, where an end-of-file might have the same meaning).

There would be some reprocessing to match the stopword, but the position of it would be passed through via the remainder so minimal searching would need to be done.

Until() should leave the stopword in the reminder, so I think in it's current state it is actually identical to Span.Except()
I also like that naming better, more concise. So I think it's just a matter of renaming it.

nblumhardt · 2021-03-29T02:01:47Z

src/Superpower/Parsers/Span.cs

@@ -258,5 +258,39 @@ public static TextParser<TextSpan> MatchedBy<T>(TextParser<T> parser)
                    result.Remainder);
            };
        }
+
+        /// <summary>
+        /// Parse as much of the input as matches <paramref name="text" />.


Matches text from the source string until is seen, or the end of input is reached. ? (See below for the variation regarding EOF.)

nblumhardt · 2021-03-29T02:03:45Z

src/Superpower/Parsers/Span.cs

+                TextSpan remainder = input;
+                var matchIndex = 0;
+
+                while (remainder.Length>0)


Still planning to use IndexOf() here?

By IndexOf(), you mean the string method right?
If so, I was wondering if allocating new strings every time we visit, would be a better choice than just using the TextSpan consume/reminder pattern. As a general thought and related to my last question, I think I would have chosen a design with a mutable TextSpan for performance reasons. I'm not sure that I understand the need for immutability without also knowing a lot more of the details behind your design choices - but it's just what comes to mind when you start thinking about it as an outsider. If you create a new TextSpan you could still leave it untouched without advancing it (for use as a marker), so it still puzzles me a little that it has to be immutable. I would love to hear why, but don't write a long story - I'm just curious :-)

SuperJMN · 2021-04-10T15:34:19Z

When is this going to be merged? 🤔

Ellested · 2021-04-10T18:41:01Z

I'm not sure whether it's me or Nicolas we are waiting for :-)

I would like to know if I should go for indexOf or keep the consume/remainder pattern before continuing. Maybe it could also be determined later. Might need a benchmark test to see which is better, as I think it will vary with the size of the scan.

Well, I will update according to the last comments, and then we can decide on the indexOf after that :-)

…uld now be aligned with Character.Except.

SuperJMN · 2021-04-10T22:32:16Z

@nblumhardt this looks OK to me!

nblumhardt · 2021-04-13T05:27:49Z

https://www.nuget.org/packages/Superpower/3.0.0-dev-00201 is out :-)

Thanks for sticking with this, @Ellested!

I made some tweaks on the way through - current version is at https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Span.cs#L275

Uses string.IndexOf() to simplify things a little
Disallows zero-length matches; this is important because Span.Except("STOP").Many() will otherwise loop forever if the match is zero-length (Many() is the only Superpower parser allowed to match a zero-length input, IIRC)
Adds ExceptIgnoreCase()

Ellested · 2021-04-14T16:15:15Z

Looks great. I didn't realize the source string was available all this time, so I thought you meant I should use intput.ToStringValue() and then IndexOf() on the result. Now I get it :-)

nblumhardt · 2021-04-14T22:59:32Z

Ah - great! 👍 Thanks again!

Added Span.Until to Superpower.Parsers

25e0b74

nblumhardt reviewed Mar 7, 2021

View reviewed changes

Span.Until improvements according to PR comments

5de1fc8

Fixed calculation of the found position. The input position was never…

86fa1d1

… accounted for.

nblumhardt reviewed Mar 29, 2021

View reviewed changes

Renamed Until to Except and removed the fail options. Span.Except sho…

5f48738

…uld now be aligned with Character.Except.

nblumhardt merged commit 21f53b2 into datalust:dev Apr 13, 2021

nblumhardt mentioned this pull request Sep 6, 2021

3.0.0 Release #135

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added Span.Until to Superpower.Parsers #129

Added Span.Until to Superpower.Parsers #129

Ellested commented Feb 21, 2021

nblumhardt commented Feb 22, 2021

nblumhardt commented Mar 7, 2021

nblumhardt Mar 7, 2021

Ellested Mar 8, 2021 •

edited

Loading

nblumhardt Mar 7, 2021

Ellested Mar 8, 2021

nblumhardt Mar 7, 2021

Ellested Mar 8, 2021

nblumhardt Mar 7, 2021

Ellested Mar 8, 2021

nblumhardt Mar 7, 2021

Ellested Mar 8, 2021

Ellested commented Mar 19, 2021

nblumhardt commented Mar 19, 2021

Ellested commented Mar 20, 2021 •

edited

Loading

nblumhardt commented Mar 26, 2021

nblumhardt left a comment

nblumhardt Mar 29, 2021

nblumhardt Mar 29, 2021

Ellested Mar 29, 2021

nblumhardt Mar 29, 2021

nblumhardt Mar 29, 2021

Ellested Mar 29, 2021

SuperJMN commented Apr 10, 2021

Ellested commented Apr 10, 2021

SuperJMN commented Apr 10, 2021

nblumhardt commented Apr 13, 2021

Ellested commented Apr 14, 2021

nblumhardt commented Apr 14, 2021

Added Span.Until to Superpower.Parsers #129

Added Span.Until to Superpower.Parsers #129

Conversation

Ellested commented Feb 21, 2021

nblumhardt commented Feb 22, 2021

nblumhardt commented Mar 7, 2021

Choose a reason for hiding this comment

Ellested Mar 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ellested commented Mar 19, 2021

nblumhardt commented Mar 19, 2021

Ellested commented Mar 20, 2021 • edited Loading

nblumhardt commented Mar 26, 2021

nblumhardt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SuperJMN commented Apr 10, 2021

Ellested commented Apr 10, 2021

SuperJMN commented Apr 10, 2021

nblumhardt commented Apr 13, 2021

Ellested commented Apr 14, 2021

nblumhardt commented Apr 14, 2021

Ellested Mar 8, 2021 •

edited

Loading

Ellested commented Mar 20, 2021 •

edited

Loading