-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: OrderedDict, v3, based on AbstractHashDict #2548
Conversation
This is definitely much needed. I would be in favor of having all the basic collection types in Base. We've already got PriorityQueue in there at this point and OrderedDict seems like a good standard addition as well. If nothing else, having these things in Base will force us to get the type hierarchy right. Once question: why AbstractHashDict rather then just AbstractDict? Is this abstraction specific to using a hash table? |
Yup - it is great to see Base slowly becoming richer with more collection types. |
So, regarding
For comparison:
The benefit of using an array as backing, rather than a linked list, are that things like Anyway, as long as this sounds okay, I'll add the |
Slightly off-topic, but why are we planning on putting |
Well, there are conflicting goals: keep it small but also provide lots of useful functionality. |
One reason to have at least some of these collection types in base is to ensure that we get the hierarchy right. That's hard to do without having actual concrete implementations of different abstractions. |
Sure, but we can have both: small |
My impression was that |
They are the same. |
I find this |
@JeffBezanson, well, it is a hack, if by hack you mean "attempting to subclass a concrete class in julia". ;-) I've put together a gist with 3 different versions of https://gist.github.com/kmsquire/5147894 The gist gives a brief description of the versions. You should be able to require, use, and modify each directly. I did profile the first two versions, and didn't find much that I could do to improve performance, but maybe there's something I missed. I think there's more of a chance of an improvement by changing the representation than by optimizing the implementation or julia--the difference in |
Great, thank you! |
Isn't the entire problem here that the slow versions do some O(n) operation on an array for each deletion? |
That shouldn't be the case. Delete only marks array positions, and they're cleaned up only upon a call to _compact(). |
It calls |
Yes, but it's calling findfirst on a bit array which is at least 1/4 full, so the expected number of steps is 3. (Unless it's not actually being compacted...) |
Okay, that is a problem--thanks! I only am compacting on assignment. I thought that was a good thing, but maybe things will improve if I compact sooner. |
Another couple things, not sure how important they are:
|
The findfirst issue was only with the second version. I'll make that chance and fix the others you just suggested and rerun the timings. |
I was wrong about the findfirst. For the second version, |
Okay, the try/catch was the big problem--changing that made up most of the performance gap. Thanks! test_del
========
Dict{K,V}: [0.584101, 0.565818, 0.565554, 0.564916, 0.565354], median=0.5655537869999999
OrderedDict{K,V}: [0.797462, 0.785816, 0.781939, 0.783949, 0.784663], median=0.784663365 (Previously, the median for OrderedDictv1 was 11.775718537000001.) For v3, the timing is still better (median=0.609851745), but can I read your previous comments as preferring the (slightly) slower, less invasive version? |
I updated the gist with the new code and |
Closing in lieu of #2556. |
Or rather, that one is in lieu of this one... |
Minor updates to OrderedDict to fix tests.
Added a version of OrderedDict here, and retitled. One additional issue with this version, beyond anything already mentioned, is that |
While they work, both this and the other OrderedDict version in #2556 are a little too complex. I'm working on a simpler version, so I'm going to withdraw them both for now, and submit the simpler version soon. |
EDIT: updated title, description
An alterate OrderedDict. This is essentially the same version of
OrderedDict
as #2558, but without all of the copied code. As with that version, the delete performance here is better than in #2556 (~10% overhead, vs ~50% overhead in #2556), but the code in #2556 is a bit cleaner.(Copied from #2558)
If either of these move forward, I would love some specific feedback on whether to allow integer indexing for
OrderedDicts
with nonnumerical keys (e.g.,od["key"]
looks up by hashing "key", andod[1]
returns the first (key,value) pair, but only for nonnumerical keys).There is also
getitem(od, 1)
for doing numerical-index based lookup, andindexof(key)
to get the numerical index of a particular key. Suggestions for better names for these are welcome.Original AbstractHashDict description
This generalizes most
Dict
functions to work on an abstract hash type.Rationale: I needed an
OrderedDict
class for a project, and found no way to create one by wrappingDict
that was in any way efficient. The only implementation that did not have severe performance issues was to copy almost all ofDict
and modify it directly to keep track of the order of the keys. The result was the near duplication of most functions. See https://github.com/kmsquire/OrderedCollections.jl.I'm not sure if any other
Dict
subclass will ever need this, and I don't know if anOrderedDict
needs to be in base (it probably belongs in aCollections
orOrderedCollections
package). So I'm just putting this out there to get some feedback.If it goes in, I have a version of
OrderedDict
on a branch which will use it (and reduce in size greatly). If not, I'll publish the version which copies most of theDict
functions.If nothing else, this patch does show the potential utility of abstract classes with fields, as most of the functions for the abstract class depend on the existence of most of the fields of
Dict
(and which were copied inOrderedDict
).