Random improvements #416

MihaZupan · 2020-04-07T01:47:53Z

Dropping support for net35, net40. Now multi-targeting netstandard2.0, netcoreapp2.1, netcoreapp3.1. 🎉
Removed helpers such as ArrayHelper, MethodImplOptionPortable, ExcludeFromCodeCoverage that were needed for framework target compat.
Unsealed MarkdownParserContext (@patriksvensson I don't suppose you somehow rely on it being sealed? I figure having it unsealed adds more value)
Removed some allocations (b090f24, 84ec832, bdfaa27, 30d0924), 2% overall.
Improved codegen for some StringSlice helpers.
Used ThrowHelper everywhere, hopefully more things can inline now.
Further optimized CharacterMap in 256b4d3. @craigwi while not the reason I did it, this may also workaround the UWP issue.

Overall ~6% parsing throughput increase. Allocation-wise we are now very close to noise-free. The vast majority of allocations are now the AST itself and some strings.

KrisVandermotten · 2020-04-07T14:06:19Z

You changed BitVector128, a 16 byte struct, to BoolVector128, a 512 byte struct. Are you sure you want to do that?

MihaZupan · 2020-04-07T14:14:43Z

The allocation overhead of 128 bytes is negligible as we don't construct character maps often. Calls to CharacterMap methods don't copy the struct either so I don't see a downside.

craigwi · 2020-04-07T15:06:34Z

Lots of interesting changes. My recommendation is to split the PR into about a half of a dozen smaller PR, each of which focuses on a specific change to make it easier to review.

MihaZupan · 2020-04-07T15:13:07Z

I tried to keep changes isolated between commits and I don't think they should be too contentious (except perhaps a99e650). Is there any particular change you would find easier to review if I isolate it into a separate PR?

Edit:
A lot of changes depend on a few others like adding System.Memory / would cause merge conflicts with others like using ThrowHelper / removing compat helpers. The aim of grouping multiple changes into a single PR is an egoistic one, to save myself time resolving all conflicts / waiting for tiny changes before opening PRs, considering that there are not many other changes happening to Markdig that would conflict with this.

KrisVandermotten · 2020-04-07T15:14:54Z

The allocation overhead of 128 bytes is negligible as we don't construct character maps often. Calls to CharacterMap methods don't copy the struct either so I don't see a downside.

Well, the struct no longer fits in a single cache line.

Note that going from a 128 bool (512 bytes) array to a 4 uint (16 bytes) struct resulted in a 3% throughput improvement for the parsing.

Is there any particular change you would find easier to review if I isolate it into a separate PR?

This might be such a change yes. It would be interesting to see before and after benchmark results.

MihaZupan · 2020-04-07T15:16:39Z

This might be such a change yes. It would be interesting to see before and after benchmark results.

You got it, I'll move it out. The biggest chunk of throughput improvement from this PR comes from character map changes.

MihaZupan · 2020-04-07T15:32:42Z

Should we not ensure that the array cannot grow without bounds?

We should, looks like I over-simplified that part.

patriksvensson · 2020-04-07T15:52:51Z

@MihaZupan Just an old habit of sealing everything until there's a reason for it to be unsealed.

MihaZupan · 2020-04-07T15:55:14Z

@KrisVandermotten I've extracted it to MihaZupan#1, you can see the difference in the generated asm.

KrisVandermotten · 2020-04-07T16:02:42Z

@MihaZupan I'm sorry I did not make myself clear. I'm curious about the effect is of only the change from the 16 byte struct to 512 byte struct. I have no doubt that some of the other changes you did there are beneficial.

FYI, I may find some time later this week to take a look myself as well.

MihaZupan · 2020-04-07T16:07:44Z

I'm curious about the effect is of only the change from the 16 byte struct to 512 byte struct

Ah I see. The struct is 128 bytes, not 512. I can run a benchmark on just this change but I would expect the bool lookup to still be (not much) faster.

FYI, I may find some time later this week to take a look myself as well.

Glad to hear it!

KrisVandermotten · 2020-04-07T16:19:40Z

The struct is 128 bytes, not 512.

I stand corrected. Somehow I was convinced that sizeof(bool) == 4. It goes without saying that the difference is significant.

MihaZupan · 2020-04-07T16:34:56Z

cc @Sergio0694

Sergio0694 · 2020-04-07T16:48:02Z

@MihaZupan NICE! 🍻🚀

A couple questions:

Why don't you also add the netstandard.2.1 target while you're at it? Have you checked if there's some overlap with the .NET Core 3.1 APIs you're using and .NET Standard 2.1, so that those new features could also be available in the portable version?
Regarding the ThrowHelpers, not sure if taking an external dependency is out of the questions, but if isn't, you might be interested in referencing Microsoft.Toolkit when thre 6.1 update is out, and checking out the new Guard APIs (PR here), using those could simplify a lot of these conditional checks and remove some more code from markdig. They're already pretty optimized with respect to the codegen paths. Let me know if that helps! 😄

Sergio0694 · 2020-04-07T16:54:15Z

src/Markdig/Helpers/StringSlice.cs

+            int end = End;
+            int i = Start;
+
+            while (i <= end && (uint)i < (uint)text.Length && text[i].IsWhitespace())


I might be missing something, but why not leverage the ReadOnlySpan<T>.Trim extension (from MemoryExtensions) here instead, which is also automatically vectorized? Same for the other trim methods from this type.

I would expect the 99% case here to be 0 - 2 spaces, so creating the span would be more overhead.

I have played around with other unsafe helpers (that could be used in Match for example), that would take ref char to avoid the cost of a span creation, but that's for a different issue/pr.

Ah, that makes perfect sense, didn't know the expected leading space was so small. Good call, thanks for the additional info! 😊

MihaZupan · 2020-04-07T16:57:03Z

Why don't you also add the netstandard.2.1 target while you're at it? Have you checked if there's some overlap with the .NET Core 3.1 APIs you're using and .NET Standard 2.1, so that those new features could also be available in the portable version?

We can target netstandard2.1 as well, I added core 3.1 for the string.GetPinnableReference API, I don't know of an alternative to get ref char from a string (without going through Span / pinning).

Regarding the ThrowHelpers, not sure if taking an external dependency is out of the questions, but if isn't, you might be interested in referencing Microsoft.Toolkit

Considering the simplicity of those methods, I wouldn't add a dependency. See #296 (comment).

Sergio0694 · 2020-04-07T17:08:43Z

We can target netstandard2.1 as well, I added core 3.1 for the string.GetPinnableReference API, I don't know of an alternative to get ref char from a string (without going through Span / pinning).

If you don't mind doing dirty tricks, there's this 🙈

This also happens to be faster than GetPinnableReference, since it can skip the bounds check, though technically it is only guaranteed to work on CoreCLR-like runtimes. But, since you're using that inside the #if NETCOREAPP3_1 directive anyway, it should be fine. I understand if you didn't want to have stuff like this in the repo for maintanability though 😅

Considering the simplicity of those methods, I wouldn't add a dependency. See #296 (comment).

Yup, figured as much, makes perfect sense! 👍

MihaZupan · 2020-04-07T17:17:22Z

string's GetPinnableReference doesn't do any validation afaik (sharplab).

If you don't mind doing dirty tricks, there's this 🙈

👀 Thanks for sharing this. While I don't mind unsafe, I would rather have a preprocessor directive here in this case while we support targets without the API 😄

Sergio0694 · 2020-04-07T17:31:30Z

👀 Well thank you for sharing that too!

I was sure string.GetPinnableReference() was doing a check as well, but apparently I was wrong and that only happens for Span<T>.GetPinnableReference() (see here). That's a bit weird, I wonder why do those APIs act so differently (assuming it's not just sharplab.io acting up). I'll need to go find the source in CoreCLR, hopefully there will be some comments there that will shed some light on this! 😄

Well what do you know! Here's the source from CoreCLR.

This is great, thanks again! TIL 🍻

EDIT: here's Span<T>.GetPinnableReference() instead, source from CoreCLR (with bounds check).

Sergio0694 · 2020-04-07T19:56:09Z

Small note on that string.GetPinnableReference(), while it's great to have it built-in, looks like it unfortunately suffers from a very similar codegen issue to dotnet/runtime#12880.

In particular, we get this in x64:

public static ref char GetReference(this string text)
{
    return ref Unsafe.AsRef(in text.GetPinnableReference());   
}

C.GetReference(System.String)
    L0000: mov eax, [rcx]
    L0002: lea rax, [rcx+0xc]
    L0006: ret

That first mov eax, [rcx] looks unnecessary, since the value in eax is immediately overwritten and never actually used. Also I'm not even sure what that was supposed to be in the first place, since it's apparently reading a 32bit value (in eax), from the address of... The method table pointer? The first char is located at [rcx + 0xc] since that's 8 bytes for the method table pointer (x64) + 4 bytes for the int representing the length of the string. Might be a codegen leftover for the bounds check 🤔

Anyway, hopefully it gets fixed in a future RyuJit update, thought it might be interesting to share 😄

xoofx · 2020-04-07T20:14:42Z

L0000: mov eax, [rcx]

It should be a check for NullReferenceException so that access to text is still safe.

Sergio0694 · 2020-04-07T20:20:55Z

@xoofx Oh I see, thanks! Is that because if the address was null, trying to dereference [rcx+0xc] would have resulted in an access violation instead? As in, since that address would've been invalid but not exactlu null, so I'm not sure if the runtime is able to handle that the same? 🤔

xoofx · 2020-04-07T20:29:49Z

@xoofx Oh I see, thanks! Is that because if the address was null, trying to dereference [rcx+0xc] would have resulted in an access violation instead? As in, since that address would've been invalid but not exactlu null, so I'm not sure if the runtime is able to handle that the same?

    L0002: lea rax, [rcx+0xc]

This instruction doesn't dereference anything but just calculate an offset. That means that if the first instruction mov eax, [rcx] was not emitted, you would return rax = 12 without any error in this function. So the problem is that you really want to know that text.GetPinnableReference() failed on a NullReferenceException not much later when accessing the pointer returned by GetReference

An access to the memory behind 0xc or 0x0 should generate a NullReferenceException, usually because it is in the first range of a 4K page fault that are mapped/protected to generate such exception.

Sergio0694 · 2020-04-07T21:03:26Z

Ah, gotcha, that makes perfect sense. Also, good to know that memory accesses in that initial range would still properly trigger a NullReferenceException instead of a nasty AV, that's nice 😄

I guess if one really wanted to also skip that null check there, the solution would just be to either use the trick with the mapping type (sample here), or to just wait for .NET 5 (maybe?) to add some GetRawDataReference API or equivalent for the string type. Something like GetRawData (dotnet/runtime#28001), which for now is only for arrays as far as I know (GetArrayDataReference).

Of course these are arguably just very minor details, but mostly I was just interested in learning more about this in general. Thanks for taking the time to chime in, really appreciate it! 😊

KrisVandermotten · 2020-04-13T11:03:50Z

I've been benchmarking the difference between the 128 bool array and the 16 byte array.

It seems that the bool array is indeed slightly faster (at least on my computer, .NET Core 3.1.3) on this benchmark:

public class Program
{
    private readonly string text = File.ReadAllText("spec.md");

    [Benchmark]
    public void TestMarkdig()
    {
        Markdown.Parse(text);
    }

    private static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }
}

KrisVandermotten · 2020-04-13T14:42:29Z

LGTM

xoofx · 2020-04-18T07:19:59Z

Could you re-merge back master to your branch @MihaZupan?, the CI was broken and I had to fix it on master. You might have also to rerun the specs after merging. Thanks!

MihaZupan · 2020-04-18T12:20:38Z

Thanks for fixing up the CI! I rebased and also added a commit to cross-target standard2.1 as Sergio suggested

Sergio0694

Not an actual review, just a couple notes out of curiosity on my part 😄
Great work with the PR!

Sergio0694 · 2020-04-18T12:33:12Z

src/Markdig/Helpers/ThrowHelper.cs

@@ -10,45 +10,46 @@ namespace Markdig.Helpers
    [ExcludeFromCodeCoverage]
    internal static class ThrowHelper
    {
+        public static void ArgumentNullException(string paramName) => throw new ArgumentNullException(paramName);


I might be missing something, but shouldn't all these methods be marked as [MethodImpl(MethodImplOptions.NoInlining)]?

They could be, but the Jit will never find inlining methods that always throw profitable afaik

Ah I see, you're relying more on the JIT to do that for you. I guess I might just be overly cautious to always add that manually, just in case the code ends up running on some runtime with an older or less optimized JIT. But what you said makes perfect sense! 👍

I can add them to document that that is the intent

Yeah, it should be handled automatically in CoreCLR now (see dotnet/coreclr#6103), but I guess it wouldn't hurt considering the lib targets .NET Standard, so it could be running on everything from .NET Framework to other runtimes as well 😊

Sergio0694 · 2020-04-18T12:34:32Z

src/Markdig/Markdig.targets

@@ -6,7 +6,7 @@
    <NeutralLanguage>en-US</NeutralLanguage>
    <VersionPrefix>0.18.3</VersionPrefix>
    <Authors>Alexandre Mutel</Authors>
-    <TargetFrameworks>net35;net40;netstandard2.0;netcoreapp2.1</TargetFrameworks>
+    <TargetFrameworks>netstandard2.0;netstandard2.1;netcoreapp2.1;netcoreapp3.1</TargetFrameworks>


Out of curiosity, what .NET Core 2.1 specific APIs are you using in the lib?
Wouldn't .NET Standard 2.0 be enough to replace that target too?

And as for .NET Core 3.1, I assume that's because you have some vectorized paths using SSE/AVX/etc.?

Some APIs like TextWriter.Write(Span) aren't in standard 2.0 so that's why I added core2.1 before.
Core 3.1 is used only for string.GetPinnableReference in CharacterMap rn

Right now there's no manual hw intrinsics code in Markdig, but that can change now that we're refreshing build targets

I see, targeting .NET Core 2.1 directly as a workaround to be able to use those TextWriter.Write(Span<T>) overloads makes sense, yeah. As for .NET Core 3.1 though, if it's just for that single string.GetPinnableReference call, wouldn't it make more sense to drop that target, just target .NET Standard 2.1, and pass ReadOnlySpan<char> values around (instead of string-s), and then just use MemoryMarshal.GetReference() to get a mutable ref from there? This would also allow you to just split/tokenize over ReadOnlySpan<T> instead of allocating substrings, in case you're doing that too.

Just a thought! 😄

StringSlice is the type plummed throughout Markdig to avoid substring allocations. It provides good utility like the NextChar method that you would otherwise have to replace with slices/more manual index bookkeeping.

I'm not too worried about having the extra target since I can see us adding more code relying on new APIs in the future

I see, yeah that makes perfect sense. I was just curious, thank you for taking the time to provide more info, I appreciate it! 😊

xoofx · 2020-04-18T15:54:41Z

Amazing job @MihaZupan !

Sergio0694 reviewed Apr 7, 2020

View reviewed changes

MihaZupan added 3 commits April 18, 2020 14:07

Drop net35 and net40 targets

cce1b99

Unseal MarkdownParserContext

f7d7632

Add System.Memory package for netstandard2.0

8300a9c

MihaZupan added 17 commits April 18, 2020 14:09

Remove ExcludeFromCodeCoverage compat for net35

5f453fb

Remove substring allocation in RomanToArabic

9ca5cdc

Remove MethodImplOptionPortable compat helper

ee97d32

Optimize StringSlice helpers

2d97628

Remove ArrayHelper

f0269fc

Remove unused Helper

09593ff

Avoid allocations in EntityHelper

3d005d6

Remove .Net Foundation license header

2679c84

Use foreach where appropriate

be3a893

Improve CodeGen by using ThrowHelper everywhere

784b999

Avoid delegate allocation in LinkInlineParser.MarkParentAsInactive

86a72c3

Cache string representations of numbers 0-26

665f5f5

Remove allocation in EntityHelper.DecodeEntity

a8b6e2c

Cross-target Core 3.1

a9d78e0

Optimize CharacterMap further

b0a2f19

Don't count non-ascii chars into maxChar limit in CharacterMap

a15203a

Cross-target netstandard2.1

eb002db

MihaZupan force-pushed the new branch from 215f850 to eb002db Compare April 18, 2020 12:15

Sergio0694 reviewed Apr 18, 2020

View reviewed changes

xoofx merged commit 5c52a72 into xoofx:master Apr 18, 2020

This was referenced Aug 16, 2020

Discussion: should we keep support for net40 and net45? Kryptos-FR/markdig.wpf#41

Closed

Bring back net45 target #466

Merged

Random improvements #416

Random improvements #416

Conversation

MihaZupan commented Apr 7, 2020 • edited Loading

KrisVandermotten commented Apr 7, 2020

MihaZupan commented Apr 7, 2020

craigwi commented Apr 7, 2020

MihaZupan commented Apr 7, 2020 • edited Loading

KrisVandermotten commented Apr 7, 2020 • edited Loading

MihaZupan commented Apr 7, 2020

MihaZupan commented Apr 7, 2020

patriksvensson commented Apr 7, 2020

MihaZupan commented Apr 7, 2020

KrisVandermotten commented Apr 7, 2020

MihaZupan commented Apr 7, 2020

KrisVandermotten commented Apr 7, 2020

MihaZupan commented Apr 7, 2020

Sergio0694 commented Apr 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MihaZupan commented Apr 7, 2020

Sergio0694 commented Apr 7, 2020 • edited Loading

MihaZupan commented Apr 7, 2020

Sergio0694 commented Apr 7, 2020 • edited Loading

Sergio0694 commented Apr 7, 2020

xoofx commented Apr 7, 2020 • edited Loading

Sergio0694 commented Apr 7, 2020

xoofx commented Apr 7, 2020 • edited Loading

Sergio0694 commented Apr 7, 2020

KrisVandermotten commented Apr 13, 2020 • edited Loading

KrisVandermotten commented Apr 13, 2020

xoofx commented Apr 18, 2020

MihaZupan commented Apr 18, 2020

Sergio0694 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Sergio0694 Apr 18, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Sergio0694 Apr 18, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xoofx commented Apr 18, 2020

MihaZupan commented Apr 7, 2020 •

edited

Loading

MihaZupan commented Apr 7, 2020 •

edited

Loading

KrisVandermotten commented Apr 7, 2020 •

edited

Loading

Sergio0694 commented Apr 7, 2020 •

edited

Loading

Sergio0694 commented Apr 7, 2020 •

edited

Loading

xoofx commented Apr 7, 2020 •

edited

Loading

xoofx commented Apr 7, 2020 •

edited

Loading

KrisVandermotten commented Apr 13, 2020 •

edited

Loading

Sergio0694 Apr 18, 2020 •

edited

Loading

Sergio0694 Apr 18, 2020 •

edited

Loading