-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider expanding Vector<T>
to support nint
and nuint
#36160
Comments
Tagging subscribers to this area: @tannergooding |
CC. @pgovind, @CarolEidt, @echesakovMSFT |
We have an existing use case in the runtime in Utf16Utility.Validation.cs where we are using the following to workaround not having this: #if TARGET_64BIT
using nuint_t = System.UInt64;
#else
using nuint_t = System.UInt32;
#endif |
Could you also support any reference types? |
As I understand it, that would be a non-trivial ask and likely not for a lot of benefit. Many of the operations exposed aren't valid or don't make sense for reference types and the JIT doesn't support tracking GC types in the SIMD registers. |
I wonder if we could optimistically handle fast collection lookup anyway, even without support for reference types in SIMD. Consider that you have a T[] (where T is a reference type), and you want to quickly look up the index of any given T within the array. Assume referential equality checks, not deep equality checks. The algorithm would then be as follows:
Since the target is pinned, it cannot be moved by the GC, so you don't have to worry about having the GC track individual elements within the SIMD registers. It's possible that the GC might move other elements of the array while you're inspecting it, but that's ok since neither their original addresses nor their modified addresses will match the pinned address of the target you're seeking. This means you're guaranteed zero false positives and zero false negatives. Again, this would only work for referential equality checks. Since I'm not sure how common referential vs. deep equality checks are I'm not sure how much benefit this would offer in practice. |
BTW,maybe we also want support for |
Maybe we need some compiler specific support or JIT intrinsic for done this effectively. IntPtr e0,e1,e2,e3;//fixed-pointers
Unsafe.WriteUnaligned(ref Unsafe.As<IntPtr,byte>(ref e0),Unsafe.ReadUnaligned<Vector<nint>>(ref Unsafe.As<T,byte>(ref array[0])));
var vector = Unsafe.ReadUnaligned<Vector<nint>>(ref e0); |
C# going to have record type feature. This is a pseudo code. [EquatabilityContract]
[CompilerGenerated]
class Record
{
[EquatabilityMember]
long id;
}
Record[] records;
Record Find(long id)
{
for(var i = 0;i + Vector256<object>.Count <records.Length;i+=Vector256<object>.Count)
{
var vector = Unsafe.ReadUnaligned<Vector256<object>>(ref records[i]);
var ids = Avx2.GatherVector256<long>(JitHelper.OffsetOf<Record>(tokenof(id)),vector);//"tokenof" means member definition reference in IL.Compiler could generate it
var compare = Avx2.CompareEqual(vector,Vector256.Create(id));
var mm = Avx2.MoveMask(compare.AsBytes());
if(mm!=0)
{
return Unsafe.As<Record>(vector.GetElement(mm/IntPtr.Size));
}
}
}
|
namespace System.Numerics
{
public partial struct Vector<T>
{
public static explicit operator Vector<nint>(Vector<T> value);
public static explicit operator Vector<nuint>(Vector<T> value);
}
public static partial class Vector
{
public static Vector<nint> AsVectorNInt<T>(Vector<T> value);
public static Vector<nuint> AsVectorNUInt<T>(Vector<T> value);
public static Vector<nint> Equals(Vector<nint> left, Vector<nint> right);
public static Vector<nuint> Equals(Vector<nuint> left, Vector<nuint> right);
public static Vector<nint> GreaterThan(Vector<nint> left, Vector<nint> right);
public static Vector<nuint> GreaterThan(Vector<nuint> left, Vector<nuint> right);
public static Vector<nint> GreaterThanOrEqual(Vector<nint> left, Vector<nint> right);
public static Vector<nuint> GreaterThanOrEqual(Vector<nuint> left, Vector<nuint> right);
public static Vector<nint> LessThan(Vector<nint> left, Vector<nint> right);
public static Vector<nuint> LessThan(Vector<nuint> left, Vector<nuint> right);
public static Vector<nint> LessThanOrEqual(Vector<nint> left, Vector<nint> right);
public static Vector<nuint> LessThanOrEqual(Vector<nuint> left, Vector<nuint> right);
}
} |
@jkotas, @CarolEidt: How important is maintaining perf of Today, much of the code is generated via T4 templates and internally uses the However, given the existence of
This would also remove duplication between the
|
Do you have any numbers for how much regression we would potentially see on ARM32? It would impact Mono target platforms too. @marek-safar How much do we care about non-accelerated |
I don't have any currently but I can try and get some. I imagine some parts would be improved, especially in user code, due to less data being zeroed (its currently a 16-byte, 66 field explicit layout struct). But some parts, namely the methods in S.P.Corelib, would likely regress due to running a |
@jkotas we don't care about performance for that config |
@jkotas, it looks like the worst case is about a 4x* perf regression (for
BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.508 (2004/May2020Update/20H1) Current:
Generic BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.508 (2004/May2020Update/20H1) PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
|
Thanks for collecting the data. I am supportive of your proposal to simplify the template. |
I think that removing the T4 templates is the right thing to do, but it also seems that it would be worth what might be a smallish investment to reduce the perf impact for arm32. |
I'll take a look and see if adding the loop unrolling support is trivial. From what I recall on doing it for |
@tannergooding - that would be good, but perhaps that could also be done as a "cleanup" PR that includes the work to figure out why the HFA classification requires the Register type. |
So we have the SIMDHandlesCache which is set based on matching the name of the generic type: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simd.cpp#L325-L330. However, we then map that later based on just the
However, Is there some way to get the handle for CC. @CarolEidt, @dotnet/jit-contrib |
@jkotas, is there currently a JIT/EE method that can be used to resolve a handle? This would end up being "pay for play" as it would only be resolved once and only if a I was also looking at finishing connecting the |
Does |
I'm not really a fan of the
That seems like an issue we need to be able to handle in any event, and I wonder if a synthetic |
I don't think so. The issue is we are building the This means that when we later try and get the handle back, such as in https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/gentree.cpp#L17354-L17356, we will try to get a handle for I was hoping there was some way to, for example, from a class handle for
I think this, in general, probably needs a bit of thought. We also have a couple issues around rewriting intrinsics into user calls in lowering (such as to better handle operands that later become constants) in which case I think we want to carry a I believe you can get the |
BTW: This name matching code is sub-optimal. It is using slow method for name formatting that it is meant to be used for debug-only tracing. The TODO at https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simd.cpp#L257 talks about it. We can add JIT/EE interface methods that let you create new generic instantiations if it helps JIT to represent things. JIT would have to guarantee that these handles are never embedded into the code (directly or indirectly), e.g. the JITed code cannot call these instantiations. Otherwise, it would cause problems for AOT. |
This is what Is it prohibitively expensive if we used it exclusively or would we still need some caching layer to help?
I think that's reasonable and don't believe we are using the handles for anything like that today. What I'd like to do, ideally, is come to some middle ground between where we are today with the It sounds like we might be able to do something like:
This would allow us to track the proper base type and utilize |
I got most of the changes done (it rounds out to +812, -722 lines, approx), but ran into a bit of "snag". In particular This isn't hard to handle, but I was wondering if someone could explain the reason for the difference here? At first glance it looks like a bug, but I'm guessing there is some reason for it. |
Isn't this related to the fact that CLI models only 3 integral types on the computation stack - |
The base type is distinct from the node type and while they often match up, there are many cases where they don't.
That was my initial thought, but |
I'm not sure why this is the case. It does seem quite inconsistent, but I would bet that there's JIT code somewhere that depends on it. |
Rationale
Today
Vector<T>
supports the following 10 primitive types:byte
,sbyte
,short
,ushort
,int
,uint
,long
,ulong
,float
, anddouble
.C# 9 is introducing support for
nint
andnuint
which are variable sized integers matching the bitness of the underlying platform. That is, they are 32-bits on 32-bit systems and 64-bits on 64-bit systems.As such, it may be beneficial to expand
Vector<T>
to additionally support these types so we can get rid of the using aliases and support performing the cross-platform vector operations using these new primitive types.Proposal
Extend
Vector<T>
to supportnint
andnuint
as valid primitive types. This will extend a number of existing generic functions which take aVector<T>
to also support taking the new types rather than throwing aPlatformNotSupportedException
.Additionally, the following non-generic APIs should be added for parity with the existing surface area:
Other Considerations
For API names, we have a guideline that states to use the framework name rather than the language keyword name (e.g.
Int32
andInt64
rather thanint
andlong
). However, the framework name fornint
isIntPtr
but the operators exposed and general use case for the two types is somewhat different, as such a name such asNInt
andNUInt
may be a better alternative. Whatever we choose, it should likely become the standard fornint
moving forward.The same request could be made for
System.Runtime.Intrinsics
, but the API bloat for this would be much larger and would need further consideration. It might be worthwhile to allownint
/nuint
as validT
without exposing the additional overloads initially as that would at least unblock users from providing such APIs themselves.The text was updated successfully, but these errors were encountered: