-
-
Notifications
You must be signed in to change notification settings - Fork 462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random improvements #416
Random improvements #416
Conversation
You changed |
The allocation overhead of 128 bytes is negligible as we don't construct character maps often. Calls to CharacterMap methods don't copy the struct either so I don't see a downside. |
Lots of interesting changes. My recommendation is to split the PR into about a half of a dozen smaller PR, each of which focuses on a specific change to make it easier to review. |
I tried to keep changes isolated between commits and I don't think they should be too contentious (except perhaps a99e650). Is there any particular change you would find easier to review if I isolate it into a separate PR? Edit: |
Well, the struct no longer fits in a single cache line. Note that going from a 128 bool (512 bytes) array to a 4 uint (16 bytes) struct resulted in a 3% throughput improvement for the parsing.
This might be such a change yes. It would be interesting to see before and after benchmark results. |
You got it, I'll move it out. The biggest chunk of throughput improvement from this PR comes from character map changes. |
We should, looks like I over-simplified that part. |
@MihaZupan Just an old habit of sealing everything until there's a reason for it to be unsealed. |
@KrisVandermotten I've extracted it to MihaZupan#1, you can see the difference in the generated asm. |
@MihaZupan I'm sorry I did not make myself clear. I'm curious about the effect is of only the change from the 16 byte struct to 512 byte struct. I have no doubt that some of the other changes you did there are beneficial. FYI, I may find some time later this week to take a look myself as well. |
Ah I see. The struct is 128 bytes, not 512. I can run a benchmark on just this change but I would expect the bool lookup to still be (not much) faster.
Glad to hear it! |
I stand corrected. Somehow I was convinced that |
cc @Sergio0694 |
@MihaZupan NICE! 🍻🚀 A couple questions:
|
int end = End; | ||
int i = Start; | ||
|
||
while (i <= end && (uint)i < (uint)text.Length && text[i].IsWhitespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but why not leverage the ReadOnlySpan<T>.Trim
extension (from MemoryExtensions
) here instead, which is also automatically vectorized? Same for the other trim methods from this type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the 99% case here to be 0 - 2 spaces, so creating the span would be more overhead.
I have played around with other unsafe helpers (that could be used in Match
for example), that would take ref char
to avoid the cost of a span creation, but that's for a different issue/pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes perfect sense, didn't know the expected leading space was so small. Good call, thanks for the additional info! 😊
We can target netstandard2.1 as well, I added core 3.1 for the
Considering the simplicity of those methods, I wouldn't add a dependency. See #296 (comment). |
If you don't mind doing dirty tricks, there's this 🙈 This also happens to be faster than
Yup, figured as much, makes perfect sense! 👍 |
string's
👀 Thanks for sharing this. While I don't mind unsafe, I would rather have a preprocessor directive here in this case while we support targets without the API 😄 |
👀 Well thank you for sharing that too! I was sure Well what do you know! Here's the source from CoreCLR. This is great, thanks again! TIL 🍻 EDIT: here's |
Small note on that In particular, we get this in x64: public static ref char GetReference(this string text)
{
return ref Unsafe.AsRef(in text.GetPinnableReference());
} C.GetReference(System.String)
L0000: mov eax, [rcx]
L0002: lea rax, [rcx+0xc]
L0006: ret That first Anyway, hopefully it gets fixed in a future RyuJit update, thought it might be interesting to share 😄 |
L0000: mov eax, [rcx] It should be a check for |
@xoofx Oh I see, thanks! Is that because if the address was null, trying to dereference |
This instruction doesn't dereference anything but just calculate an offset. That means that if the first instruction An access to the memory behind 0xc or 0x0 should generate a |
Ah, gotcha, that makes perfect sense. Also, good to know that memory accesses in that initial range would still properly trigger a I guess if one really wanted to also skip that Of course these are arguably just very minor details, but mostly I was just interested in learning more about this in general. Thanks for taking the time to chime in, really appreciate it! 😊 |
I've been benchmarking the difference between the 128 bool array and the 16 byte array. It seems that the bool array is indeed slightly faster (at least on my computer, .NET Core 3.1.3) on this benchmark:
|
LGTM |
Could you re-merge back master to your branch @MihaZupan?, the CI was broken and I had to fix it on master. You might have also to rerun the specs after merging. Thanks! |
Thanks for fixing up the CI! I rebased and also added a commit to cross-target standard2.1 as Sergio suggested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an actual review, just a couple notes out of curiosity on my part 😄
Great work with the PR!
@@ -10,45 +10,46 @@ namespace Markdig.Helpers | |||
[ExcludeFromCodeCoverage] | |||
internal static class ThrowHelper | |||
{ | |||
public static void ArgumentNullException(string paramName) => throw new ArgumentNullException(paramName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but shouldn't all these methods be marked as [MethodImpl(MethodImplOptions.NoInlining)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could be, but the Jit will never find inlining methods that always throw profitable afaik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, you're relying more on the JIT to do that for you. I guess I might just be overly cautious to always add that manually, just in case the code ends up running on some runtime with an older or less optimized JIT. But what you said makes perfect sense! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add them to document that that is the intent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be handled automatically in CoreCLR now (see dotnet/coreclr#6103), but I guess it wouldn't hurt considering the lib targets .NET Standard, so it could be running on everything from .NET Framework to other runtimes as well 😊
@@ -6,7 +6,7 @@ | |||
<NeutralLanguage>en-US</NeutralLanguage> | |||
<VersionPrefix>0.18.3</VersionPrefix> | |||
<Authors>Alexandre Mutel</Authors> | |||
<TargetFrameworks>net35;net40;netstandard2.0;netcoreapp2.1</TargetFrameworks> | |||
<TargetFrameworks>netstandard2.0;netstandard2.1;netcoreapp2.1;netcoreapp3.1</TargetFrameworks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what .NET Core 2.1 specific APIs are you using in the lib?
Wouldn't .NET Standard 2.0 be enough to replace that target too?
And as for .NET Core 3.1, I assume that's because you have some vectorized paths using SSE/AVX/etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some APIs like TextWriter.Write(Span)
aren't in standard 2.0 so that's why I added core2.1 before.
Core 3.1 is used only for string.GetPinnableReference in CharacterMap rn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now there's no manual hw intrinsics code in Markdig, but that can change now that we're refreshing build targets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, targeting .NET Core 2.1 directly as a workaround to be able to use those TextWriter.Write(Span<T>)
overloads makes sense, yeah. As for .NET Core 3.1 though, if it's just for that single string.GetPinnableReference
call, wouldn't it make more sense to drop that target, just target .NET Standard 2.1, and pass ReadOnlySpan<char>
values around (instead of string
-s), and then just use MemoryMarshal.GetReference()
to get a mutable ref
from there? This would also allow you to just split/tokenize over ReadOnlySpan<T>
instead of allocating substrings, in case you're doing that too.
Just a thought! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringSlice is the type plummed throughout Markdig to avoid substring allocations. It provides good utility like the NextChar method that you would otherwise have to replace with slices/more manual index bookkeeping.
I'm not too worried about having the extra target since I can see us adding more code relying on new APIs in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yeah that makes perfect sense. I was just curious, thank you for taking the time to provide more info, I appreciate it! 😊
Amazing job @MihaZupan ! |
net35
,net40
. Now multi-targetingnetstandard2.0
,netcoreapp2.1,
netcoreapp3.1
. 🎉ArrayHelper
,MethodImplOptionPortable
,ExcludeFromCodeCoverage
that were needed for framework target compat.MarkdownParserContext
(@patriksvensson I don't suppose you somehow rely on it being sealed? I figure having it unsealed adds more value)ThrowHelper
everywhere, hopefully more things can inline now.CharacterMap
in 256b4d3. @craigwi while not the reason I did it, this may also workaround the UWP issue.Overall ~6% parsing throughput increase. Allocation-wise we are now very close to noise-free. The vast majority of allocations are now the AST itself and some strings.