-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API proposal: MemoryMarshal.GetRawArrayData #29003
Comments
Why not |
@jkotas - If we wanted to put this on |
(On the other hand, if we had a sister method |
What are the next steps to access subsequent elements once I have this
I don't fully understand the use case here for arrays, especially if users can do |
What about |
@AaronRobinsonMSFT The point is mainly to elide bounds checks in hot code paths. See below: ref T theRef = ref ThisFunction(theArray); // theRef now points to element 0 (NOT bounds checked)
theRef = ref Unsafe.Add(ref theRef, 1); // theRef now points to element 1
// ... @jkotas |
@GrabYourPitchforks But wouldn't the caller still want to know how many elements exist in order to properly call |
So essentially this would be a JIT intrinsic form of something like this?: /// <remarks>If <paramref name="array"/> is null, <see cref="ArgumentNullException"/> is thrown.</remarks>
/// <returns>A reference to the element at index 0 in the array, even if the array is empty.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ref T GetRawArrayData<T>(T[] array)
{
if (array == null)
throw new ArgumentNullException(nameof(array));
return ref MemoryMarshal.GetReference(new Span<T>(array));
} I think having the function on EDIT: Come to think of it, couldn't this function technically be implemented as an overload for |
This is very low level method and throwing ArgumentNullException from it would make it much more expensive than it should be. This should throw the implicit NullReferenceException. (This would not be the first case in the framework to do this.) |
Why not? public static class MemoryMarshal
{
public static ref T GetReference<T>(T[] array);
} There is already such method for span. |
I'll update the proposal. |
I updated the description to put the API on |
Why do we need this API at all? It seems the existing APIs would cover this, no? namespace System.Runtime.InteropServices
{
public static class MemoryMarshal
{
public static ref T GetReference<T>(Span<T> span);
public static ref T GetReference<T>(ReadOnlySpan<T> span);
}
} Like this: T[] someArray = GetData();
ref T data = MemoryMarshal.GetReference(someArray); What would |
One potential pattern that would fail by bouncing through object[] arr = new string[] { "Hello", "world!" };
ref object o = ref MemoryMarshal.GetReference(arr); The implicit conversion from Perhaps we'd want this scenario to fail, but wanted to throw it out there. One other behavioral difference is in null handling. The implicit conversion from |
@MichalStrehovsky We were having trouble in API review justifying this new API given the existence of the methods Immo called out. Can you help clarify some of the scenarios? |
This is low-level API to enable micro-optimizations in performance sensitive code. You are right that one can get the same functionality by combining existing APIs, but not with the same performance characteristics. This API is equivalent of |
I think we can expose two APIs instead, which actually will be beneficial for Frozen Objects anyway. |
I do not think we want to make the super low-level APIs like these public. |
Do you consider |
Anything that talks about managed object size super low-level. It implies what the layout of managed objects and makes changing it hard or impossible.
It should be a non-goal for your frozen object experiment to run against public APIs. |
My scenarios mostly revolved around non-variable-size objects in the version of this API that is not limited to arrays (#33706). |
Reactivating based on Jan's comment at https://github.com/dotnet/corefx/issues/36133#issuecomment-512053677. Thank you for the context! Edit: For reference, here are all the locations where the runtime calls into the current (non-public) implementation of this method. It's used primarily as a perf optimization. |
Where and what would the equivalent ref char Getxxx(string str) And would they be in same class and overloads of each other |
We considered this: namespace System.Runtime.CompilerServices
{
public static class Unsafe
{
public static ref T GetRawArrayRef<T>(ref T source);
}
} but we'd prefer this: namespace System.Runtime.InteropServices
{
public static class MemoryMarshal
{
public static ref T GetReference<T>(T[] array);
}
} (Please not that's OK because the existing |
There is |
@benaadams - Here's a more detailed response on the difference in these APIs, in case you were curious. As @jkotas points out, For example, if you have a zero-length string theString = string.Empty;
ref readonly char theCharRef = ref theString.GetPinnableReference(); // returns a reference to the terminating null
char c = theCharRef; // c = '\0' once this line runs Contrast this to ReadOnlySpan<char> theSpan = GetString();
ref readonly char theCharRef = ref theSpan.GetPinnableReference(); // might return a null ref, which is ok
char c = theCharRef; // might NRE In both of the above cases, That said, there's an unsafe equivalent method ReadOnlySpan<char> theSpan = GetString();
ref readonly char theCharRef = ref MemoryMarshal.GetReference(theSpan);
if (!theSpan.IsEmpty)
{
char c = theCharRef; // OK, will return the first element of 'theSpan'
}
else
{
char c = theCharRef; // undefined behavior, including AVing the process
} The references returned by The reason this all works from a runtime perspective is that per ECMA-335, Sec. II.14.4.2, it's valid for a managed pointer (type &) to reference just past the end of a managed array. That is: int[] arr = [ 10, 20, 30 ];
'arr' memory layout is
[ OBJ_HEADER | <vtable_ptr> | element_count | (value 10) | (value 20) | (value 30) ]
^-- 'arr' object ref ^-- ref arr[0] ^-- ref arr[1] ^-- ref arr[2] ^-- ref arr[3]
points here points here points here points here would point here Within the runtime, the object header that begins every object is 2 native integers in width. In the diagram above I've split into two native-int sized fields: OBJ_HEADER and vtable_ptr. This is just for illustrative purposes so I can demonstrate how things are laid out. Assuming a 64-bit process, OBJ_HEADER, vtable_ptr, and element_count are all 64-bit fields. The array's data immediately follows. In this case, since it's an (The above paragraph handwaves a little bit for simplicity, but the overall model is generally accurate as of the time of this writing.) Say that you have a local of type mov ebx, dword ptr [rax + 8] ; element_count is always 32 bits, so use a 32-bit mov instead of a 64-bit mov Or, assuming register ecx has the index of the element you want to dereference: ; assume 'ecx' has already been bounds-checked
mov ebx, dword ptr [rax + 16 + 4 * rcx] ; contains the value arr[i]
lea rbx, [rax + 16 + 4 * rcx] ; contains ref arr[i], which is GC-trackable Importantly, it's valid (per ECMA-335 as mentioned above) to get a reference to just past the end of the array as long as you don't try to dereference it. The .NET garbage collector knows to treat this reference as a live reference to the original array, so you're not at risk of suddenly having objects ripped out from under you. This also means that you can perform reference address comparisons, as previously demonstrated by the ; assume 'rax' is an int[] of length 3
; assume 'ecx' = 3
lea rbx, [rax + 16 + 4 * rcx] ; contains ref arr[3], which is GC-trackable
; don't do this - it's no longer GC-trackable since it's now *PAST*
; the address just past the end of the array.
add rbx, 1 ; approx. equivalent to "ref Unsafe.AddByteOffset(ref arr[3], 1)"
; don't do this either - it has undefined behavior since it attempts to read memory
; past the end of the array.
mov rbx, dword ptr [rax + 16 + 4 * rcx] ; approx. equivalent to int value = ref arr[3] So... that's where this method comes in. It's essentially a way to allow you to write Whew, that was a mouthful. :) |
@GrabYourPitchforks You should add that to the Remarks section of the API; can't imagine a better description of the problem/use case. |
@terrajobst (Please not that's OK because the existing I ran some tests after our meeting and it turns out we were mistaken. :( public static void M(byte[] bytes)
{
Foo<byte>(bytes);
}
// M(...) ends up calling this method
private static void Foo<T>(Span<T> s) { }
private static void Foo<T>(ReadOnlySpan<T> s) { } |
Yep, that compiles fine (picks the So, shall we change this to what we were initially considering: public static class Unsafe
{
public static ref T GetRawArrayRef<T>(T[] array);
} |
|
Wouldn't the particular IL instruction be |
The |
I am amazed to learn that these It seems that these instructions could be very useful if they actually were a guarantee. The ECMA standard seems to allow the runtime to provide this guarantee: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf page 334. It would be possible to guarantee and document this as a feature of CoreCLR. |
This was already approved, but the originally approved method name |
"Ref" is an ambiguous abbreviation. I think it should be either |
Either sounds good to me, with my preference towards |
Rename to |
Could you please share the rationale for the "Ref" suffix? I do not think it follows the framework naming conventions and it is ambiguous with other uses of "Ref" like |
@jkotas I'm partial to T data = MemoryMarshal.GetRawArrayData(theArray); // WRONG
ref T refToData = ref MemoryMarshal.GetRawArrayData(theArray); // CORRECT Basically, without the Ref suffix, there's too much temptation to write the first line of code and potentially cause issues in your code base. There are some counterarguments against this. The first is that |
I am ok with the "Ref" suffix, but I think it should be a fully spelled "Reference" to match existing |
(Related to https://github.com/dotnet/corefx/issues/33706, but more narrowly scoped.)
Proposal
Motivation
Per the feedback at https://github.com/dotnet/corefx/issues/33706#issuecomment-442145588, third-party libraries may want to perform certain optimizations that today can only be performed within coreclr itself because we know the internal layout of SzArray instances.
Behavior
If array is null, this results in a standard
NullReferenceException
(see https://github.com/dotnet/corefx/issues/36133#issuecomment-474389127). Otherwise it returns a reference to the element at index 0 in the array, even if the array is empty.Per ECMA-335 (PDF), § II.14.4.2, it is legal for a managed pointer (type T&) to point just past the end of an array to where the next value would have been stored. Such T& cannot be safely dereferenced, but it is guaranteed to be tracked properly by the GC, and it can be used as an input argument in binary operators like
add
orclt
.Because it is possible to get a real GC-tracked managed pointer (T&), but because we cannot guarantee that it's safe to dereference, the
GetRawArrayData
method belongs onMemoryMarshal
or some other "unsafe" type. TheUnsafe
static class is also a candidate for where this API can live, but it does not have any existing API surface for dealing withT[]
inputs.Developers who want a T& that's always safe to dereference can continue to use the standard syntax for getting a reference to the first array element:
The text was updated successfully, but these errors were encountered: