Introduction to vectorization with Vector128 and Vector256 #84115

adamsitnik · 2023-03-30T08:37:03Z

Disclaimer:

I am not a vectorization expert, that is why I was asked to get familiar with the APIs and write an introduction for other beginers. Please don't treate this document as source of truth until it becomes reviewed by the experts, approved and merged.
I am not a native english speaker and I am always more than happy to improve my skills, so please don't hesitate to point out grammar mistakes or places for improvements. Suggestions are preffered ;)

ghost · 2023-03-30T08:37:20Z

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.

Issue Details

Disclaimer:

I am not a vectorization expert, that is why I was asked to get familiar with the APIs and write an introduction for other beginers. Please don't treate this document as source of truth until it becomes reviewed by the experts, approved and merged.
I am not a native english speaker and I am always more than happy to improve my skills, so please don't hesitate to point out grammar mistakes or places for improvements. Suggestions are preffered ;)

Author:	adamsitnik
Assignees:	-
Labels:	`area-Meta`
Milestone:	-

gfoidl

Nice write-up.

docs/coding-guidelines/vectorization-guidelines.md

Co-authored-by: Günther Foidl <gue@korporal.at>

docs/coding-guidelines/vectorization-guidelines.md

adamsitnik · 2023-03-30T12:31:50Z

@gfoidl big thanks for a very detailed review!! 🥇

docs/coding-guidelines/vectorization-guidelines.md

EgorBo · 2023-03-30T14:39:41Z

Does Vector512 need a coverage in the doc? It seems to be already usable

danmoseley

Read half and added minor suggestions.. great doc

docs/coding-guidelines/vectorization-guidelines.md

Co-authored-by: Dan Moseley <danmose@microsoft.com> Co-authored-by: Günther Foidl <gue@korporal.at>

adamsitnik · 2023-03-30T20:18:26Z

Read half and added minor suggestions.. great doc

@danmoseley Thank you very much!

docs/coding-guidelines/vectorization-guidelines.md

gfoidl · 2023-03-31T09:19:45Z

docs/coding-guidelines/vectorization-guidelines.md

+
+### Vectorized remainder handling
+
+Now imagine that we need to check whether the given buffer contains specific number. In this case, processing some values more than once is acceptable, we don't need to handle the remainder in a non-vectorized fashion.


Suggested change

Now imagine that we need to check whether the given buffer contains specific number. In this case, processing some values more than once is acceptable, we don't need to handle the remainder in a non-vectorized fashion.

Now imagine that we need to check whether the given buffer contains specific number. In this case, processing some values more than once is acceptable, we don't need to handle the remainder in a non-vectorized fashion. This can be done for all idempotent operations (e.g. also copying).

To make this a bit more general?

Rob-Hague

Grammar suggestions, as you asked for it 🙂

docs/coding-guidelines/vectorization-guidelines.md

Co-authored-by: Rob Hague <rob.hague00@gmail.com> Co-authored-by: Günther Foidl <gue@korporal.at>

Co-authored-by: Rob Hague <rob.hague00@gmail.com>

adamsitnik · 2023-03-31T13:06:10Z

Grammar suggestions, as you asked for it

@Rob-Hague thank you very much! It's very motivating to get such detailed and helpful review!

Co-authored-by: Rob Hague <rob.hague00@gmail.com>

tannergooding · 2023-04-03T14:23:33Z

Will give this a pass this afternoon.

docs/coding-guidelines/vectorization-guidelines.md

tannergooding · 2023-04-06T22:45:42Z

docs/coding-guidelines/vectorization-guidelines.md

+    Vector128<int> sum = Vector128<int>.Zero;
+
+    while (!Unsafe.IsAddressGreaterThan(ref current, ref oneVectorAwayFromEnd))
+    {
+        sum += Vector128.LoadUnsafe(ref current);
+
+        current = ref Unsafe.Add(ref current, Vector128<int>.Count);
+    }
+
+    int result = Vector128.Sum(sum);
+
+    while (Unsafe.IsAddressLessThan(ref current, ref end))
+    {
+        result += current;
+
+        current = ref Unsafe.Add(ref current, 1);
+    }


nit: It is much preferred to use LoadUnsafe(ref bufferStart, index) over using Unsafe.Add

It is very easy to get Unsafe.Add wrong, particularly with moving the ref outside the GC tracked region for the buffer. We have written a few bugs in that space ourselves.

LoadUnsafe(ref T address, nuint index) pushes users towards writing code that is less prone to this general issue and is overall easier to adapt to from a simpler for loop that the scalar logic may be using.

-- I see we touch on this below, I think it would be much clearer to cover the "appropriate" way to do it here first in the normal reading order and then to cover things like this in a separate "what not to do and why" section.

I see we touch on this below, I think it would be much clearer to cover the "appropriate" way to do it here first in the normal reading order and then to cover things like this in a separate "what not to do and why" section.

That is why in the paragraph above (Loops), where I need to load the values, I am using the recommended way (so it's the first thing the users see). In this paragraph, I start on purpose with the "wrong" way, just to point out the disadvantages, move on & repat and at the end I mention the recommended approach.

tannergooding · 2023-04-06T22:50:42Z

docs/coding-guidelines/vectorization-guidelines.md

+## Mindset
+
+Vectorizing real-world algorithms seems complex at the beginning. And what do software engineers do with complex problems? We break them down into sub-problems until these become simple enough to be solved directly.
+
+Let's implement a vectorized method for checking whether a given byte buffer consists only from valid ASCII characters to see how similar problems can be solved.


I think it would be better to have the example of vectorization earlier in the doc.

Basically:

Brief introduction to what vectors are and what .NET provides at a high level

An example of taking a scalar algorithm and vectorizing it, showing perf wins and touching briefly on what we did

A more in depth coverage of the APIs we provide, best practices, etc

A more in depth coverage of what not to do and why

Optionally some coverage of advanced topics/scenarios, such as handling trailing elements or alignment efficiently

This different ordering would walk users through a very natural process introducing key terms, giving an example of what can be done, and then breaking it apart and going into more detail.

As we have it now, there is a lot of detail up front before we ever get into showing a user the "simple example" of taking a scalar algorithm and vectorizing it.

I think it would be better to have the example of vectorization earlier in the doc.

In a way we already have that, as in the "Loops" paragraph I provide a working Sum and Contains method implementations with comments.

I've spent a lot of time thinking about your recommendation and I believe it's a good one, but I would prefer to keep the existing order.

My main motivation is to prevent the readers from running into various traps, so I really want them to read the whole document. I am hoping that with the current structure, our target readers are going to learn something interesting in every paragraph, which is going to keep them focused and reach the end without missing anything.

With your proposal, I am afraid that some of the readers could skip some parts of it and run into issues later.

docs/coding-guidelines/vectorization-guidelines.md

danmoseley · 2023-04-21T21:22:50Z

According to this #84635 (comment) each method must protect its own use of hardware intrinsics, it cannot just have a caller check and then assume (or assert) that some ISA is available. If that's the case it should perhaps be mentioned in here.

edit: oh apparently that's just in corelib

danmoseley · 2023-04-21T21:23:28Z

BTW, will this go on learn.microsoft.com? Something like this probably should be there.

docs/coding-guidelines/vectorization-guidelines.md

Co-authored-by: Günther Foidl <gue@korporal.at>

… of introduced concepts

* update TOC * add note about imperfect perf boost * don't recommend managed references over unsafe pointers, as they can both be dangerous when used incorrectly

adamsitnik

@tannergooding @stephentoub I believe that the PR is ready for another round of review. PTAL at the unresolved comments and let me know what do you think

adamsitnik · 2023-05-30T17:45:12Z

docs/coding-guidelines/vectorization-guidelines.md

+| Contains |    Scalar | 1024 | 143.844 ns | 0.6234 ns |  1.00 |     206 B |
+| Contains | Vector128 | 1024 | 104.544 ns | 1.2792 ns |  0.73 |     335 B |
+| Contains | Vector256 | 1024 |  55.769 ns | 0.6720 ns |  0.39 |     391 B |


I've been thinking about it a little bit more and I got to the following conclusion: it's not easy to find a built-in .NET method that provides a perfect perf boost thanks to vectorization. Even simple methods like IndexOf and Contains don't provide it for buffers of small primitive types.

That is why such boost might be rare and we should simply point the users to the right profilers which can tell them where the CPU time is spent.

I've added following sentece below the table with the results:

Note: as you can see, even such simple method like Contains did not observe a perfect performance boost: x8 for Vector256 (256/32) and x4 for Vector128 (128/32). To understand why, we would need to use a profiler that provides information on CPU instruction level, which depending on the hardware could be Intel VTune or amd uprof.

@stephentoub @tannergooding thoughts?

adamsitnik · 2023-05-30T17:45:38Z

docs/coding-guidelines/vectorization-guidelines.md

+    Vector128<int> sum = Vector128<int>.Zero;
+
+    while (!Unsafe.IsAddressGreaterThan(ref current, ref oneVectorAwayFromEnd))
+    {
+        sum += Vector128.LoadUnsafe(ref current);
+
+        current = ref Unsafe.Add(ref current, Vector128<int>.Count);
+    }
+
+    int result = Vector128.Sum(sum);
+
+    while (Unsafe.IsAddressLessThan(ref current, ref end))
+    {
+        result += current;
+
+        current = ref Unsafe.Add(ref current, 1);
+    }


I see we touch on this below, I think it would be much clearer to cover the "appropriate" way to do it here first in the normal reading order and then to cover things like this in a separate "what not to do and why" section.

That is why in the paragraph above (Loops), where I need to load the values, I am using the recommended way (so it's the first thing the users see). In this paragraph, I start on purpose with the "wrong" way, just to point out the disadvantages, move on & repat and at the end I mention the recommended approach.

adamsitnik · 2023-05-30T17:46:21Z

docs/coding-guidelines/vectorization-guidelines.md

+## Mindset
+
+Vectorizing real-world algorithms seems complex at the beginning. And what do software engineers do with complex problems? We break them down into sub-problems until these become simple enough to be solved directly.
+
+Let's implement a vectorized method for checking whether a given byte buffer consists only from valid ASCII characters to see how similar problems can be solved.


I think it would be better to have the example of vectorization earlier in the doc.

In a way we already have that, as in the "Loops" paragraph I provide a working Sum and Contains method implementations with comments.

I've spent a lot of time thinking about your recommendation and I believe it's a good one, but I would prefer to keep the existing order.

My main motivation is to prevent the readers from running into various traps, so I really want them to read the whole document. I am hoping that with the current structure, our target readers are going to learn something interesting in every paragraph, which is going to keep them focused and reach the end without missing anything.

With your proposal, I am afraid that some of the readers could skip some parts of it and run into issues later.

adamsitnik · 2023-05-30T17:47:15Z

docs/coding-guidelines/vectorization-guidelines.md

+- If you have already vectorized your code, but only for `x64/x86` or `arm64/arm`, you can use the  new APIs to have a single, cross-platform implementation.
+- If you have already vectorized your code with `Vector<T>` you can use the new APIs to check if they can produce better code-gen.
+- If you are not familiar with hardware specific instructions or you are about to vectorize a scalar algorithm, you should start with the new `Vector128` and `Vector256` APIs. Get a solid and working implementation and eventually consider using hardware-specific methods for performance critical code paths.
+- Both managed references and unsafe pointers are dangerous to use incorrectly and each comes with their own tradeoff.


Prefer managed references over unsafe pointers to avoid pinning and safety issues.

I've removed this point from the Best practices and added following sentence to the Summary:

Both managed references and unsafe pointers are dangerous to use incorrectly and each comes with their own tradeoff.

jeffhandley · 2023-06-02T18:15:15Z

BTW, will this go on learn.microsoft.com? Something like this probably should be there.

I don't think that question got answered yet, @danmoseley. This document won't go out there, but this was a step toward being able to do that. The general audience will need a slightly different format/flow/ordering that can build the concepts up likely more in the order that @tannergooding had suggested in some of his feedback. We wanted to tackle this smaller audience of dotnet/runtime contributors/maintainers first.

danmoseley · 2023-06-02T19:05:51Z

BTW, will this go on learn.microsoft.com? Something like this probably should be there.

I don't think that question got answered yet, @danmoseley. This document won't go out there, but this was a step toward being able to do that. The general audience will need a slightly different format/flow/ordering that can build the concepts up likely more in the order that @tannergooding had suggested in some of his feedback. We wanted to tackle this smaller audience of dotnet/runtime contributors/maintainers first.

Fair enough. With respect to reframing it for docs use, that seems like something someone in the docs team might be open to doing, and then it can be iterated on in the docs repo. I'm reminded of how we moved many of our interop guidelines out of a markdown in this repo to the docs and they benefited everyone then.

docs/coding-guidelines/vectorization-guidelines.md

danmoseley · 2023-10-03T20:46:42Z

way too late for this comment, but in reading it fresh, I realized that we don't establish up front how these relate to Vector<T>. There's only one mention of that, and it's at the end. Eg., is Vector<T> ever worth using in new code.

EgorBo · 2023-10-03T20:55:15Z

way too late for this comment, but in reading it fresh, I realized that we don't establish up front how these relate to Vector<T>. There's only one mention of that, and it's at the end. Eg., is Vector<T> ever worth using in new code.

I presume the doc will need to be updated to include ISimdVector as well once it's production-ready cc @tannergooding

stephentoub · 2023-10-03T21:02:04Z

way too late for this comment, but in reading it fresh, I realized that we don't establish up front how these relate to Vector<T>. There's only one mention of that, and it's at the end. Eg., is Vector<T> ever worth using in new code.

I presume the doc will need to be updated to include ISimdVector as well once it's production-ready cc @tannergooding

That's still internal only (and even only to corelib right now).

More pressing would be updating for Vector512 :)

adamsitnik requested review from jeffhandley, stephentoub and tannergooding March 30, 2023 08:37

dotnet-issue-labeler bot added the area-Meta label Mar 30, 2023

ghost assigned adamsitnik Mar 30, 2023

Introduction to vectorization with Vector128 and Vector256

ce1840f

adamsitnik force-pushed the introductionToVectorization branch from 848cc7e to ce1840f Compare March 30, 2023 08:43

gfoidl reviewed Mar 30, 2023

View reviewed changes

Apply suggestions from code review

5e89ed8

Co-authored-by: Günther Foidl <gue@korporal.at>

adamsitnik commented Mar 30, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Outdated Show resolved Hide resolved

adamsitnik commented Mar 30, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Outdated Show resolved Hide resolved

adamsitnik commented Mar 30, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Outdated Show resolved Hide resolved

xtqqczze mentioned this pull request Mar 30, 2023

Vectorize Enumerable.Range(...).ToArray() #80633

Closed

EgorBo reviewed Mar 30, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Outdated Show resolved Hide resolved

EgorBo reviewed Mar 30, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Outdated Show resolved Hide resolved

danmoseley reviewed Mar 30, 2023

View reviewed changes

Apply suggestions from code review

fe4aaca

Co-authored-by: Dan Moseley <danmose@microsoft.com> Co-authored-by: Günther Foidl <gue@korporal.at>

yesmey reviewed Mar 31, 2023

View reviewed changes

docs/coding-guidelines/vectorization-guidelines.md Show resolved Hide resolved

gfoidl reviewed Mar 31, 2023

View reviewed changes

Rob-Hague reviewed Mar 31, 2023

View reviewed changes

adamsitnik commented Mar 31, 2023

View reviewed changes

adamsitnik and others added 2 commits March 31, 2023 15:01

Apply suggestions from code review

1fa325f

Co-authored-by: Rob Hague <rob.hague00@gmail.com> Co-authored-by: Günther Foidl <gue@korporal.at>

Apply suggestions from code review

e7ab250

Co-authored-by: Rob Hague <rob.hague00@gmail.com>

Apply suggestions from code review

16c1819

Co-authored-by: Rob Hague <rob.hague00@gmail.com>

tannergooding reviewed Apr 6, 2023

View reviewed changes