-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add string keyed dictionary ReadOnlySpan<char> lookup support #27229
Comments
Would it be better to instead have a set of extension methods on |
|
How would it perform the search (using hashcode + equals) against the current Dictionary api surface?
|
It wouldn't use the public API surface, it would use some new |
Ah.. so something like? static class StringKeyedDictionaryExtensions
{
static bool ContainsKey<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
static bool Remove<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
static bool TryGetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key, out TValue value);
} |
Right, I think I missed the point then. The idea is to have the standard dictionary at the back end but methods which allow you to use ROS as the key to search for avoiding the allocation to check if something exists. So it needs access to the internals because the comparer needs to understand that it can do ROS-string comparisons which aren't currently possible. |
@benaadams Yup. Plus probably an extension method version of the indexer: public static T GetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key); Though that's an advantage of the |
Also if they're extension methods we would need to provide overrides to specify a custom Comparer that can compare ROS right? Or that would be achieved by adding the StringEqualityComparer type and interface and in the extension methods check if Comparer is IStringEqualityComparer then use the ROS overrides if not the string ones? Also with override methods Adding values through the indexer or TryAdd/Add apis, we wouldn't be able to take advantage of the ROS comparisons right? |
Dictionary is based on
Can covert the |
Yes, but people could just use I think if we want to use the ROS comparison features and other of its features through a custom Comparer we should not use Extension methods, unless I’m missing something we could do in a different way without overriding implementation. |
Here's a comparison of the pro's and con's of the different design approaches.
Honestly, I'm kind of torn on this. Adding a new foundational Type to the ecosystem is costly in terms of what developers need to know. I'm feeling this even more now with the addition of all the types introduced with If the performance isn't an issue I now think I'd prefer the extension method route. |
@TylerBrinkley thanks for the great summary, I just updated the issue description and included this comparison in there. Will mark as ready for review now. |
I've updated the first post to correct formatting and make it a little easier to understand. |
Overloads for |
On the surface, the extension approach looks more appealing because it would allow the lookups to work against any instance of a dictionary (at least in principle, the current design only works if the dictionary was instantiated with a comparer that implements However, this also complicates the implementation and might have performance implications if we have to expose more methods that only make sense when I totally buy the scenario, but adding a new collection type (even if derived from What do others think? |
This is trying to workaround the fundamental problem that ref-like types cannot be used as keys in collections today. We will hit the same problem for all keyed collections and many ref-like types. @cdmihai mentioned that I think we need to have ref struct constraint first to make this scalable. |
I agree scalability is an issue if we add new types which is why I'm now in favor of the extension methods route which should scale more favorably. Even if the ref struct constraint is added I don't see how we can use that for this use case in a non-stack context such as non-ref-like class fields or static fields. |
It's not so much using |
Is what's equatable/alternative forms; so |
I would change NameValueCollection, making more effective version, passing allowDublicates parameter in constructor. |
Another reason the extension methods route makes better sense. What if we instead relied on the comparer to derive from |
Sounds reasonable. I can already see use-cases for wanting to determine the value equality between a //using regular old UTF-16 strings as a key due to framework reasons,
//even though 99.9% of the time, it doesn't go outside UTF-8 range
private readonly Dictionary<string, Foo> _fooMap = new Dictionary<string, Foo>(StringComparer.OrdinalIgnoreCase);
//what this proposal already boils down to:
void M(ReadOnlySpan<char> sliceOfString)
{
if (_fooMap.TryGetValue(sliceOfString, out var foo)
{
//....
}
}
//but when incoming data could also be UTF-8:
void M(ReadOnlySpan<byte> sliceOfUtf8String) //or 'ROS<Char8>' if that becomes the more appropriate thing
{
if (_fooMap.TryGetValue(sliceOfUtf8String, out var foo)
{
//....
}
} |
Really appreciated hearing the discussion of this in the .NET Design Reviews GitHub Triage. I think this should go back to triage to decide the best design approach. Again, personally I'd much prefer the extension method route. |
What if an API was added to Dictionary<TKey, TValue> that allows you to retrieve a bucket?
This way, the extension methods could operate on public APIs, and would just need to iterate over the bucket and compare values. I'm not sure how realistic this would be, and I'm absolutely certain that there are significant problems with this that I haven't considered, but I figured I'd throw the idea out there. |
It would be exposing and locking implementation details of the dictionary, it would prevent reimplementing using another method if a better one were found in future. |
As |
Look at the APIs introduced in https://github.com/dotnet/corefx/issues/31302 |
@jkotas Thanks. Since .NET Standard 2.1 is including some API's introduced for .NET Core 3.0 would including this API be considered as well? |
I quickly tried this out and could not immediately find a situation where it blows up when Maybe it would still blow up in a situation I haven't covered, but so far it looks like inheriting Also, in |
Yeah, we investigated more after the recording and came to the same conclusion, but didn't end up adding back the inheritance. We still could, though. |
I missed this API review live, but I support updating the API with |
Was there rationale for why |
@PaulusParssinen these APIs are specifically designed for hash tables, SortedSet and SortedDictionary are implemented using search trees. |
Right..🤦♂️Thanks for quick reply! |
It's out of scope for this particular feature, but it is also related - does it make sense to eventually add |
I have been prototyping a large perf oriented PR for ILCompiler name-mangling logic for a couple weeks now and I keep hitting the need for this. Given the API is approved, is it up-for-grabs? Or does someone on the team have this prototyped already while designing the API shape (IIRC during API reviews someone said they did?) edit: Oh wait, we need |
I have it implemented, and it needs allows ref struct. |
#102907 has been merged, adding the core features and implementations on I initially held off on |
@stephentoub Oh, not supporting Frozen ones - that's a pity. What about ImmutableDictionary? |
ImmutableDictionary is rarely the right type to use. Lookups are O(log N) and in genai much more exoensive than the other dictionaries. This feature is about optimizing lookups. If you want to optimize lookups, the first step is not using ImmutableDictionary. 😉 |
It seems that it would be possible to support this scenario assuming alternative keys were constrained to |
@stephentoub Yeah, well, since Frozen is out of question, there is really not much of an option if one is after immutability. |
I believe so (though it's possible there's something else lurking). It would mean trying to get a lookup would fail for a string key with an alternate key other than |
I was asking if the originally proposed It would seem like all these |
If we wanted to constrain it, we wouldn't need a new interface design; we could just change the entrypoint {Try}GetAlternateLookups for these types to themselves be constrained, such that they'd only ever work with specific keys/alternate keys. My concern there is it's elevating a current implementation-detail-based limitation to public surface area.
Yeah.
Yes, based on whether the specified TAlternateKey works with the collection's comparer. And the developer can fully control that: if they construct the collection with a comparer that supports TAlternateKey, then TAlternateKey will work. With this, we're adding an extra restriction which would be that even if that comparer/alternate key work together, we'd explicit reject it in situations implementation details don't currently support. I'll code it up and we can decide. |
UPDATED 1/22/2024 by @stephentoub
Option 1 (recommended)
Pros of Option 1:
Cons of Option 1:
Option 2
Pros of Option 2:
Cons of Option 2:
ReadOnlySpan<T>
, e.g. aSpan<T>
; calls bind to the instance members if not exact match to the extensionsRegardless of approach:
The above is based on the assumption that we will be getting a ref struct anti-constraint in for .NET 9 / C# 13. If that ends up not happening, this will be revamped to look almost the same, except using
ReadOnlySpan<TKeyElement>
instead ofTMappedKey
.UPDATED 1/3/2024 by @stephentoub.
If .NET 9 and C# 13 and up supporting ref struct constraints, we would tweak these APIs to instead take a
TMappingKey : ref struct
rather than aReadOnlySpan<TElement>
. See #27229 (comment) for more details.Open issues:
Moved from corefxlab#2438 as the proposed changes could not be implemented in a separate library.
Motivation
I would like to perform dictionary lookups using a
ReadOnlySpan<char>
value on a string keyed dictionary without having to allocate astring
.There are two different design approaches proposed, add extension methods to
Dictionary<string, TValue>
or add a new Type .Extension Method Proposed API
Implementation Details
Dictionary<TKey, TValue>
would need to be exposed with theinternal
modifier to implement theReadOnlySpan<char>
overloads.StringComparer
s that are passed into the constructor would be able to be optimized to not allocate astring
.StringComparer
would convert theReadOnlySpan<char>
to astring
and use thestring
methods instead.New Type Proposed API
Implementation Details
Dictionary<TKey, TValue>
would need to be exposed with theprivate protected
modifier to implement theReadOnlySpan<char>
overloads.StringComparer
would convert the span to a string and use the string methods instead.null
comparer passed to the constructor would default toStringComparer.Ordinal
.Possible Addition
While not specifically needed for this request a
ReadOnlySpan<char>
Compare
overload should probably be added as well while we're updatingStringComparer
.Open Questions
ReadOnlySpan<char>
Compare
overload be included?NonRandomizedStringEqualityComparer
now derive fromStringComparer
in order to be supported?TryAdd
overload be added as it has the possibility of not requiring to be converted to a string if an equal key is found?Discussion Summary.
We have two different design approaches, add a new Type or add extension methods to
Dictionary<string, TValue>
.Here's a comparison of the pro's and con's of the different design approaches.
Dictionary
needs no changes to use which is helpful when they can't easily be changed.GetValue
method instead.Dictionary
may not easily be changed to using the new type.Dictionary
internals exposed with theinternal
modifier.Dictionary
internals exposed with theprivate protected
modifier.StringComparer
.Updates
System.Collections.Specialized
.DictionaryExtensions
fromStringKeyedDictionaryExtensions
.IStringEqualityComparer
interface and instead relies on comparer deriving fromStringComparer
.The text was updated successfully, but these errors were encountered: