-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP/RFC: Refactored dict.jl #7348
Conversation
Very cool! |
Hi Mauro, thanks for taking this on! I'll add a few comments inline. |
const ISEMPTY = 0x0 | ||
const ISFILLED = 0x1 | ||
const ISMISSING = 0x2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider just using EMPTY
, FILLED
, MISSING
. Most other uses of is*
in Julia are predicates.
(To prevent leakage of common names, you could put this file in its own module.)
This looks quite nice, @mauro3! I don't have any substantive comments. |
Thanks Stefan and Kevin for the encouraging comments. I changed those |
end | ||
end | ||
|
||
function deserialize{K,V}(s, T::Type{Associative{K,V}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work, due to invariance. If T
is Dict{K,V}
, this method does not match.
How is performance compared to the current |
Performance of (I'll put proper tests together which can go into
The new |
The explicit conversion question is interesting. If it's the case that the input and output of |
Since |
What would you attribute the slight performance increase to? (from Dict to TOREPDict) |
I put some performance test together in Performance of this PR vs upstream is the same within error. I think that is because the extra function in this PR are no-opts for Somewhat puzzling/interesting is the performance difference between
Upstream vs 0.2.1: about the same except for the |
sure how to do convert in general.
@JeffBezanson: yes, you're right about key conversion because |
Some random info about the not-key-conversion in the getters in upstream julia> d = Dict{Symbol, Int}()
Dict{Symbol,Int64} with 0 entries
julia> get(d, 84, 8)
8
julia> get!(d, 84, 8)
ERROR: no method convert(Type{Symbol}, Int64)
in get! at dict.jl:564
julia> get(d, :t, :a)
:a
julia> get!(d, :t, :a)
ERROR: no method convert(Type{Int64}, Symbol)
in get! at dict.jl:573
julia> d[6]
ERROR: key not found: 6
in getindex at dict.jl:615
So, the I guess this is fine either way. In fact, |
I ran some more perfomance tests but now also using Note iteration: Based on these tests I made the |
Iterating over an ObjectIdDict is way too slow. I can fix that. |
Here the new perfomance test for @JeffBezanson's updated ObjectIdDict: https://gist.github.com/mauro3/daddb89c29b7f5e5cd59#file-jjeffs-new-objectiddict Much faster now. Looping over (ASSCIString, Int) dict is about as fast as the normal |
Any interest in this? Should I rebase it? |
6c7c7e3
to
1a4c02f
Compare
We can probably get rid of the macro now that it's possible to define constructors for types with any number of unspecified parameters. We can have Needs to be updated for the new style of This also looks like a good opportunity to implement #9028 (comment) . Making a closure and/or finalizer for each element of a weak Dict can't be good for performance. |
Cool, I'll have a look at it. Probably will be after Christmas though. |
Closing. See #10116. Performance test I ran over the ordered dict: #10116 (comment) |
This is a refactor of
dict.jl
which also takes into accountDataStructures.jl/OrderedDict
. The reason I did this is because Ineeded a
WeakObjectIdDict
for PR #5572 (which I did there with an uglycopy-paste). PR #5572 is for issue #3988 which should lay the
foundation for user-defined help.
Dictionaries have quite extensive internal mechanics and small changes
in those internals make new types of dicts. For instance, whether to
take a straight hash or use an object-id in
hashindex
makes aDict
vs
ObjectIdDict
. This means that when making new dictionaries bysimply wrapping the standard
Dict
and defining newsetindex!
,getindex
, etc. one has to replicate lots of those inner mechanicsbecause dispatch cannot do its magic on inner functions. This PR is a
go at writing
dict.jl
in a more composable manner.This refactor can accommodate quite a few different hash-table based
dicts. Implemented are
Dict
,ObjectIdDict2
(a new variation),WeakKeyDict
(now works for immutables too),WeakObjectIdDict
(new),and
OrderedDict
(essentially as inDataStructures.jl
package).Performance is as before except for
ObjectIdDict2
which is 25% slowerthan
ObjectIdDict
(but has the complete interface). This is becauseObjectIdDict
is implemented differently toDict
whereasObjectIdDict2
is within the same framework.
I'm not sure the style is in line with standard practices. The
concrete subtypes of
HashDictionary
and their constructors are madewith a macro. This is reminiscent of the approach of @kmsquire in
PR #2548 which was not liked by @JeffBezanson...
If this is not the way to cleanly implement a
WeakObjectIdDict
(and itturns out its needed to solve #3988), then any comments on how to
implement it instead are welcome!
What I did:
supertype of all implemented dicts at the moment and ought to be
the supertype of all hash-table based dicts.
Dict
specific methods toAssociative
.gkey
,skey!
,...). These allow, for instance, keys to be transformed to
WeakRef
's and back.hashindex
now is dispatched on the type so, for instance,ObjectIdDict can change it to use
object_id
instead.isequal
is also dispatched on with the methodisequalkey
.WeakRef
, even for immutables, if their objectget gc-ed with the function
topurge
.@makeHashDictionary
and some of above internals are specialized.
Then to produce the standard
Dict
:@makeHashDictionary(Dict, K, K, V, V, Unordered)
To make an object-id dict only a few extras are needed:
Note 95% of this is a refactor of good code others wrote, thanks to all those others!
TODO:
ObjectIdDict2
can be made compatible withObjectIdDict
TODO irrespective of this PR:
currently there is an inconsistency between key conversion in(see below for discussion)get
and
getindex
. (some do an explicit conversion others don't).Associative
.Maybe
Dictionary
would be more consistent?dict is equal?
ObjectIdDict
seems to get special treatment at the moment.Dict
be made as performant asObjectIdDict
,i.e. 25% faster? (almost there now)
(edited 24 June)