Add ContainsReferences property #4309

omariom · 2015-06-13T12:48:06Z

Currently generic List calls Array.Clear on its underlying array in the implementation of its Clear, RemoveAll, and RemoveRange methods.
It has to do so because the generic argument can contain references which must be freed for GC.
But what if it is plain Int32 or any other value type that doesn't have references in it directly or indirectly?

If Type had a property that could say if the type contains references then clearing the array could be completely skipped.
I see a huge performance opportunity here - less CPU work, less memory traffic and pollution.

And not only there. If JIT considered this property value as a JIT time constant then the check itself and the branch of the generated code could be skipped as well. Other methods could benefit from that without sacrificing a nanosecond - like RemoveAt.

omariom · 2015-06-13T12:49:05Z

Dictionary.Clear could take advantage of that too. And probably some concurrent collections.

hadibrais · 2015-06-14T12:23:31Z

@omariom Yes you are right actually. Array.Clear doesn't have to be called when the elements are of a value type. We can easily fix this by adding the condition !typeof(T).IsValueType and clearing the array only if it is true. This will reduce the asymptotic running time of List.Clear to constant. Also this won't be a breaking change because setting value typed elements to zero is not part of the API's spec as it only guarantees releasing references. However, the JIT may or may not optimize the check away. We'll have to see. Either way, I think we should make the change.

omariom · 2015-06-14T12:27:56Z

@hadibrais Unfortunately checking for valuetypeness alone is not enough. Value types can contain references.

omariom · 2015-06-14T12:30:16Z

The proposed ContainsReferences property should return true when the type is a reference type or a value type containing references either directly or indirectly.
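To make the distinction concrete, here is a minimal sketch. The types Point, NamedPoint, and Wrapper are hypothetical and exist only for illustration. All three are value types, yet only Point is free of references: NamedPoint holds one directly and Wrapper holds one indirectly.

```csharp
using System;

// Hypothetical types, for illustration only.

// Value type with no reference fields: nothing here for the GC to track.
public struct Point
{
    public int X;
    public int Y;
}

// Still a value type, but it directly contains a reference.
public struct NamedPoint
{
    public string Name;    // reference field the GC must track
    public Point Location;
}

// Value type that contains a reference only indirectly, through NamedPoint.
public struct Wrapper
{
    public NamedPoint Inner;
}
```

typeof(T).IsValueType is true for all three, so a check based on it alone would skip clearing for NamedPoint and Wrapper and leave stale string references alive in the array.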

hadibrais · 2015-06-14T12:33:59Z

@omariom Yeah I missed that. But there's an elegant way to solve this problem without incurring any performance penalties. We can add a constructor in which the user can indicate whether to call Array.Clear or not. The default, of course, would be to call Array.Clear. Otherwise, explicitly including a reliable type check might degrade the perf of the method if the list contained reference type elements.

hadibrais · 2015-06-14T12:35:05Z

ContainsReferences would be useful in this case only if it has a small constant running time.

omariom · 2015-06-14T12:45:56Z

@hadibrais
I see 2 issues with adding a new ctor:

1. Only new code will benefit from the change.
2. With any change to the type (making it ref from value or adding a ref field directly or indirectly) we will have to check ALL the usages of the type in Lists. Failing to change the param from false to true may lead to "memory leaks".

omariom · 2015-06-14T12:48:55Z

ContainsReferences would be useful in this case only if it has a small constant running time.

@hadibrais
MethodTable already has ContainsPointers method. If it conforms to all the requirements of ContainsReferences then the team could just expose it. And it will be very fast even without JIT support.

hadibrais · 2015-06-14T13:28:14Z

I agree that the constructor will reduce maintainability but that will be the fastest implementation.

hadibrais · 2015-06-14T13:29:32Z

I think ContainsReferences could be much more useful if it returned the fields that are of reference types rather than just saying whether the type contains reference fields or not.

hadibrais · 2015-06-14T14:15:07Z

As far as I know all elements of Array-based arrays are properly aligned, therefore the current implementation of Array.Clear is not optimal. It can be made smaller and faster.

omariom · 2015-06-14T14:35:26Z

@hadibrais Do you mean that when the array element doesn't contain references, it can be implemented more efficiently?

hadibrais · 2015-06-14T14:57:34Z

No I mean when it contains references, it can be made more efficient since references are aligned on address boundaries so they can be zeroed out in GC-safe way without any extra bytes left unzeroed. However, this is only true when the prefer-32bit option in Visual Studio is not checked.

billetdoux · 2015-06-14T14:58:58Z

Every little helps. Good one.

omariom · 2015-06-15T18:05:27Z

Found another place where it could help with perf and greener environment.

In the implementation of ConcurrentQueue+Segment.TryRemove if typeof(T).ContainsReferences == false then this line:

 _array[lowLocal] = default(T); //release the reference to the object.

can be skipped. And as I suspect all the voodoo around _numSnapshotTakers as well.

@stephentoub Can you pls check it? Am I right? Is it worth the efforts?

stephentoub · 2015-06-15T18:55:03Z

I personally think a ContainsReferences feature would be a nice addition. Whether or not the JIT could treat it as an intrinsic, with the JIT's ability to treat readonly statics of scalars as constants (https://github.com/dotnet/coreclr/issues/1079), a library like System.Collections could have a cached Boolean value based on it, e.g.

internal static class TypeCache<T>
{
    internal static readonly bool ContainsReferences = typeof(T).ContainsReferences;
}

The JIT should then still be able to remove conditional branches using this TypeCache<T>.ContainsReferences when it's known to be false.

You could even implement something like this yourself, e.g. it's possible there are some corner cases I've missed, but here's a quick example:

internal static class TypeCache<T>
{
    internal static readonly bool ContainsReferences = GetContainsReferences(typeof(T));

    private static bool GetContainsReferences(Type t)
    {
        if (!t.IsValueType)
            return true;

        foreach (FieldInfo fi in t.GetFields(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance))
        {
            if (fi.FieldType != t && GetContainsReferences(fi.FieldType))
                return true;
        }

        return false;
    }
}

and when I look at the disassembly for a few different call sites, I see the JIT doing exactly what you hoped it would (this is using VS2015 RC targeting x64):

        Console.WriteLine(TypeCache<int>.ContainsReferences);
00007FFDABEF542E  xor         ecx,ecx  
00007FFDABEF5430  call        00007FFE0A94A590  
        Console.WriteLine(TypeCache<DateTime>.ContainsReferences);
00007FFDABEF5435  xor         ecx,ecx  
00007FFDABEF5437  call        00007FFE0A94A590  
        Console.WriteLine(TypeCache<List<int>.Enumerator>.ContainsReferences);
00007FFDABEF543C  mov         ecx,1  
00007FFDABEF5441  call        00007FFE0A94A590

Can you pls check it?

Yes, that line should be removable if T is known to not contain references.

And as I suspect all the voodoo around _numSnapshotTakers as well.

(I think this is what you're saying, but just to be sure...) All of the code related to _numSnapshotTakers would still need to remain; it's just that you could avoid executing it if you knew that clearing wasn't necessary, e.g. instead of

if (_source._numSnapshotTakers <= 0)
{
    _array[lowLocal] = default(T); //release the reference to the object.
}

you'd have:

if (TypeCache<T>.ContainsReferences && _source._numSnapshotTakers <= 0)
{
    _array[lowLocal] = default(T); //release the reference to the object.
}

That could actually be more beneficial where _numSnapshotTakers is modified rather than where it's used (as it is here to determine whether to clear). The _numSnapshotTakers mechanism exists because we want to be able to clear, but if code has taken a snapshot of the collection (e.g. to enumerate it while other code is changing it concurrently), we don't actually want to change any of the existing elements by zeroing them out. Noting whether a snapshot is currently in progress requires some synchronization, e.g. the interlocked operation at https://github.com/dotnet/corefx/blob/master/src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentQueue.cs#L272, but if we knew that clearing was never necessary, we could avoid such synchronization.

Is it worth the efforts?

I've no doubt in some scenarios it could be a measurable win. Whether it's "worth the efforts" to do this in the BCL / runtime would I think require more effort to effectively answer.

omariom · 2015-06-15T19:31:18Z

it's just that you could avoid executing it if you knew that clearing wasn't necessary

Yes, that's what I meant.

omariom · 2015-06-16T11:09:23Z

You could even implement something like this yourself, e.g. it's possible there are some corner cases I've missed..

It reports true for pointer fields.
What if, in the future, interior pointer fields are added to CLR?
That's why I prefer CLR help me with it.

stephentoub · 2015-06-16T11:16:45Z

It reports true for pointer fields. That's why I prefer CLR help me with it.

So add if (t.IsPointer) return false; to the beginning of that GetContainsReferences method...?

omariom · 2015-06-16T11:28:20Z

I did this:

if (!t.IsValueType && !t.IsPointer)
    return true;

stephentoub · 2015-06-16T12:09:53Z

I did this

And that doesn't work for you or it does? This will end up doing some unnecessary work in the case of a pointer, as it'll still try to get the fields of the pointer (whereas in my suggestion it just exits immediately), but I'd expect it to still work: the pointer shouldn't have any declared fields, so it'll end up falling through to the return false; at the end.

omariom · 2015-06-16T12:16:35Z

Aah.. I missed your point. If put as you suggested in the beginning then yes, it works and avoids checking for fields.
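Putting the two suggestions together, the helper might look like the sketch below. It is the TypeCache<T> code from earlier in the thread with the t.IsPointer check placed first. It relies on reflection and may still miss corner cases, so treat it as an experiment rather than a definitive implementation.

```csharp
using System;
using System.Reflection;

internal static class TypeCache<T>
{
    // Computed once per closed generic type; a static readonly bool gives the JIT
    // a chance to treat the value as a constant, as discussed above.
    internal static readonly bool ContainsReferences = GetContainsReferences(typeof(T));

    private static bool GetContainsReferences(Type t)
    {
        // Unmanaged pointers (e.g. int*) are not tracked by the GC, and exiting
        // early avoids enumerating the (nonexistent) fields of the pointer type.
        if (t.IsPointer)
            return false;

        // Any other non-value type is itself a reference.
        if (!t.IsValueType)
            return true;

        foreach (FieldInfo fi in t.GetFields(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance))
        {
            // Primitives such as Int32 declare a field of their own type;
            // skip it to avoid infinite recursion.
            if (fi.FieldType != t && GetContainsReferences(fi.FieldType))
                return true;
        }

        return false;
    }
}
```

With this version, TypeCache<int>.ContainsReferences and TypeCache<DateTime>.ContainsReferences are false, while TypeCache<string>.ContainsReferences and TypeCache<System.Collections.Generic.KeyValuePair<int, string>>.ContainsReferences are true.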

stephentoub · 2015-06-16T14:24:01Z

Ok, good. If you wanted to experiment with that and use it to, for example, modify some of the collections in corefx locally to get some numbers about the benefits it could have, that would be useful information in making a decision about whether a feature like this is something that should be exposed from the BCL.

cc: @KrzysztofCwalina

omariom · 2016-08-01T20:40:19Z

Has the future come for this issue?

It could be the first step in adding a blittable constraint to the runtime and the language.

ghost · 2016-11-09T18:16:01Z

t.IsPointer tests for unmanaged pointers. Why is that something that should be forbidden? (Was this supposed to be t.IsByRef?)

nietras · 2016-11-23T11:52:30Z

In https://github.com/dotnet/corefx/issues/13427 we discussed this too, and @jkotas suggested we add ContainsReference<T> to RuntimeHelpers. Whether this should be called IsReferenceFree<T> or not is an open question too. So the proposal for an API for this could be:

public static class RuntimeHelpers
{
    public static bool ContainsReference<T>();
}

OR

public static class RuntimeHelpers
{
    public static bool IsReferenceFree<T>(); // aka !ContainsReference
}

masonwheeler · 2016-11-23T12:41:12Z

@nietras I would prefer ContainsReferences. As your code snippet explicitly points out, IsReferenceFree is a negative, which means you can end up with double negatives. It's cleaner to just use a positive notation.

omariom · 2016-11-23T12:58:53Z

@nietras
In the cases we have now ContainsReferences is more natural:

if (RuntimeHelpers.ContainsReferences<T>())
   throw new InvalidOperationException("Types having references cannot be used with Unsafecast.");

// or

if (RuntimeHelpers.ContainsReferences<T>())
{
    // Free for GC
    Array.Clear(_items, 0, _size);
}

nietras · 2016-11-23T19:23:52Z

IsReferenceFree is a negative, which means you can end up with double negatives. It's cleaner to just use a positive notation.

This depends on your perspective, but I agree that in Span<T> https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Span.cs#L114 this inverted check is weird and less readable, when the method could easily have been called ContainsReference here.

However, in my use case I would write something like:

if (RuntimeHelpers.IsReferenceFree<T>())
{
    Unsafe.CopyBlock(ref _items[0], size);
}
else
{
    // for loop
}

But of course this can be moved around. I would also hope you cache the ContainsReferences call like in https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/SpanHelpers.cs#L50 so this works well on other runtimes too.

Next question is whether to use ContainsReference<T> singular or ContainsReferences<T> plural? I've seen both.

jkotas · 2016-11-23T20:00:08Z

This method is really about GC pointers. I am wondering whether the name should reflect it. What about ContainsGCPointers<T>?

omariom · 2016-11-23T20:07:57Z

We, high level C# devs, call them references :)
GcPointers is too low-level; it is a runtime name.

nietras · 2016-11-23T20:08:41Z

GC pointers

@jkotas if you don't mind, what exactly is the difference between "reference" and "GC pointers" in your mind?

I understand that from a native view reference is something entirely different, but I thought ref is used to differentiate on .NET. Anywhere the lingo is defined? Have actually been thinking about starting a discussion on how we could coin a term that would exactly convey that something has no GC pointers, e.g. "reference free", "unmanaged" or ?? and all the terms that can not be used "primitive", "ref free", "blittable", yet are often used by some, which means conversations around this can be... imprecise at best.

ghost · 2016-11-23T21:08:36Z

Just a thought... in ECMA lingo, the term we're looking for is a union of "reference" (pointer to an object-as-a-whole) and "managed pointer" (the types denoted by the "ref" keyword in C#), or more practically, things that implicitly contain managed pointers (e.g. Span.)

But perhaps instead of the name enumerating the kind of types you can have inside, the name can talk about the motivating restriction: the type must not contain bits that are asynchronously rewritten by the garbage collector. Something like "ContainsGcManagedContent()" or something. That would make it clearer why it's used when it's used.

benaadams · 2016-11-23T21:30:45Z

GC managed items are non-value types?

ContainsClassReference<T> or ContainsObjectReference<T> or ContainsHeapReference<T>

omariom · 2016-11-23T21:31:25Z

cc @KrzysztofCwalina

svick · 2016-11-23T22:29:32Z

@benaadams I think two of your proposals are not accurate names:

ContainsClassReference<T>: interfaces are reference types that are not classes (and the value can be a boxed value type, so that doesn't have to be a class either)
ContainsHeapReference<T>: heap is an implementation detail (and there actually seems to be some work being done to allocate reference types on the stack: Work towards objects stack allocation coreclr#6653)

jkotas · 2016-11-23T22:59:36Z

what exactly is the difference between "reference" and "GC pointers" in your mind?

No real difference - they are often used interchangeably. I was just wondering about the best name to use for this API.

For reference, here are the existing names used internally in the runtime implementations for similar concept (the runtime concept is slightly different - it always describes the fields even for non-valuetypes, but I assume that what is getting proposed here should always return true for non-valuetypes):

ContainsPointers https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/methodtable.h#L1891
CORINFO_FLG_CONTAINS_GC_PTR https://github.com/dotnet/coreclr/blob/7250e6f6630839b09d54f2f71d858b33c018ae8b/src/inc/corinfo.h#L873
HasReferenceFields https://github.com/dotnet/corert/blob/4c7d585544306f6e99b1454df905ac957ac55422/src/Native/Runtime/inc/eetype.h#L415
HasPointers https://github.com/dotnet/corert/blob/406b1e533fb5adf8bd73289faa9abe962832c41d/src/System.Private.CoreLib/src/System/EETypePtr.cs#L323
ContainsGCPointers https://github.com/dotnet/corert/blob/91e5ce227650d8af0cf870bce0006c5f4c7208df/src/Common/src/TypeSystem/Common/DefType.FieldLayout.cs#L76

benaadams · 2016-11-23T23:00:59Z

@svick narrowing down my suggestions then ContainsObjectReference<T> 😄

ghost · 2016-11-23T23:25:03Z

Unsubscribing from thread as my year-end vacation is about to start. See y'all in January.

nietras · 2016-11-26T10:07:32Z

a union of "reference" (pointer to an object-as-a-whole) and "managed pointer" (the types denoted by the "ref" keyword in C#)

Aren't managed pointers disallowed as fields? Hence, these are not relevant to this functionality? At least that is how I read the ECMA, see I.8.2.1.1 "Managed pointer types are only allowed for local variable (§I.8.6.1.3) and parameter signatures (§I.8.6.1.4)". And this is also the reason for the design of Span<T>.

ContainsObjectReference<T>

Object is superfluous in my view. And perhaps even wrong since this applies to interface types etc.

In ECMA a reference is clearly defined to be a reference to an object as a whole. And since only reference types can be fields (besides value types), the check is for whether there is any "reference type" among a type's fields, seen as a whole. An example of the use of "reference" in an API would simply be object.ReferenceEquals(object,object).

And since we are nitpicking here, neither of the names IsReferenceFree and ContainsReference is entirely correct. They don't just check whether a type contains any reference types; they also check whether the type T itself is a reference type. So given this I would think we either need to split this in two or name it correctly as something like:

bool IsReferenceOrContainsReferences<T>() //sans last "s" if singular preferred
// OR the negative of this
bool IsValueTypeAndReferenceFree<T>()

Not ideal since it's a bit long, but at least IsReferenceOrContainsReferences<T>() is accurate and still reads well:

if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
     ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

compared to the current:

if (!SpanHelpers.IsReferenceFree<T>())
     ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

and ThrowArgumentException_InvalidTypeWithPointersNotSupported should be renamed to ThrowArgumentException_InvalidTypeWithReferencesNotSupported.

If we split the check in two then every time this would be used one would have to write something like:

if (!typeof(T).IsValueType || RuntimeHelpers.ContainsReference<T>())
{
   ...
}

which would be pretty terrible. But calling it just e.g. ContainsReference is incomplete in my view.

Other options would be to name it according to asking if the type is "referring" to any references which would include the type itself. Can't think of a good name though. Below a list of some other suggestions including my current preferred (most of these are poor, so suggestions are welcomed, but I think we need to keep reference as part of the name instead of pointer or similar):

bool IsReferenceOrContainsReferences<T>()
bool IsOrHasReference<T>()
bool IsReferenceOrHasReferences<T>()
bool RefersToAnyReferences<T>()
bool Refers<T>()
bool IsReferent<T>()
...

Sans last "s" if singular preferred instead of "references".

On a side note, I actually think there is a problem with the method name DangerousGetPinnableReference: it does not return a reference but a ref, and those are different things, right?

omariom · 2016-11-26T11:04:39Z

DangerousGetPinnableManagedPointer?

nietras · 2016-11-26T12:34:38Z

Yes, or DangerousGetPinnableByRef. At least from the ECMA it says:

A managed pointer (§I.12.1.1.2), or byref (§I.8.6.1.3, §I.12.4.1.5.2), can point to a local variable, parameter, field of a compound type, or element of an array.

However, not sure this means they are necessarily interchangeable.

DangerousGetPinnableRef probably doesn't work since this is more C# speak than anything; I think F# uses ref to make a Ref<T> reference type, so that is not the same. VB.NET uses ByRef, I think.

omariom · 2016-11-26T13:44:15Z

At least we don't have to invalidate a cache :)

jkotas · 2016-11-26T14:57:45Z

In ECMA a reference is clearly defined to be a reference to an object as a whole

There is definition of "reference type", "object reference", "typed reference" or "member reference". I do not think there is a clear definition of "reference" alone.

Similar for pointer, there is definition of "managed pointer", "pointer type" or "this pointer"; but no "pointer" alone.

Aren't managed pointers disallowed as fields?

Yes, they are disallowed. It may be interesting to consider what this method should return and be called if they were allowed in theory.

at least IsReferenceOrContainsReferences<T>() is accurate

I like how this makes it more accurate.

method name DangerousGetPinnableReference this does not return a reference

Yet another overload of reference, not that different from TypedReference. However, the name of this method is likely going to be revisited anyway.

nietras · 2016-11-26T17:25:08Z

do not think there is a clear definition of "reference" alone.

Ah yes, I should have written reference type in fact the method in question is checking for that i.e.

IsReferenceTypeOrContainsReferenceTypes<T>()

but that seems unnecessary given the input is type T, if it had been on Type like IsPrimitive (not IsPrimitiveType) then of course it would be without Type in the name.

And yes, overall "reference", "pointer" all have existing meanings both in managed and unmanaged cases, but since a ref (or managed pointer, which actually might just "reference" native memory) is different from a "managed reference", "Reference" seems a bit misleading in this case.

By the way, how does the runtime handle the case when DangerousGetPinnableReference returns a ref to native memory? I.e. how does fixed or pinning work in that case?

jakobbotsch · 2016-11-27T11:08:57Z

By the way, how does the runtime handle the case when DangerousGetPinnableReference returns a ref to native memory? I.e. how does fixed or pinning work in that case?

Managed pointers are allowed to point to unmanaged memory (see II.14.4.2 in ECMA-335). It is also possible with C# today:

int* ptr = (int*)Marshal.AllocHGlobal(4);
int.TryParse("", out *ptr);

fixed just adds a local managed pointer marked as pinned, so the GC should ignore these when they do not point to managed memory.

nietras · 2016-11-27T11:14:02Z

so the GC should ignore these when they do not point to managed memory

Yes, but how does the GC know that a given pointer can be ignored? That is, that the pointer does not point to managed memory? Is it the same as for ref in that it checks each ref against each managed memory segment or similar?

jkotas · 2016-11-27T15:53:40Z

ref is the thing being pinned, so it is the same thing.

BTW: This logic is in GCHeap::Promote:

Ignore null pointers
If it is ref (GC_CALL_INTERIOR), find the object that it belongs to. If it does not belong to any object, we are done.
If it is pinned (GC_CALL_PINNED), pin the object for current GC.
Mark the object graph as alive

jkotas · 2016-11-28T05:08:37Z

Ok, I have created issue in the corefx repo to get this launched into the API review process https://github.com/dotnet/corefx/issues/14047

karelz · 2017-03-17T09:27:34Z

Seems to be dupe of https://github.com/dotnet/corefx/issues/14047 which has been fixed.
Please reopen if that understanding is wrong.
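For readers arriving later: to the best of my knowledge, the API tracked in that issue is available as RuntimeHelpers.IsReferenceOrContainsReferences<T>() in System.Runtime.CompilerServices (.NET Core 2.0 and later). Below is a minimal sketch of the List.Clear idea from the start of this thread built on it. MiniList<T> is a hypothetical type used only for illustration; it is not the BCL's List<T>.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical minimal list, for illustration only; not the BCL's List<T>.
public sealed class MiniList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public void Add(T item)
    {
        // Grow the backing array when it is full.
        if (_size == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
    }

    public void Clear()
    {
        // Only pay for zeroing when stale slots could keep objects alive.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            Array.Clear(_items, 0, _size);
        }
        _size = 0;
    }
}
```

For MiniList<int> the check is false, so Clear only resets _size; for MiniList<string> it is true and the used slots are zeroed so the GC can reclaim the strings.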

omariom changed the title ~~Type.ContainsReferences property~~ Add ContainsReferences property to Type class Jun 21, 2015

joshfree assigned nguerrera Oct 28, 2015

jkotas unassigned nguerrera Nov 22, 2016

jkotas changed the title ~~Add ContainsReferences property to Type class~~ Add ContainsReferences property Nov 23, 2016

karelz closed this as completed Mar 17, 2017

buybackoff referenced this issue in Spreads/Spreads Apr 16, 2017

TypeHelper<T>.Size must be static readonly to enable JIT branch elimi…

4355585

…nation. https://github.com/dotnet/coreclr/issues/1079 https://github.com/dotnet/coreclr/issues/1135#issuecomment-112172481

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

ghost locked as resolved and limited conversation to collaborators Jan 6, 2021
