Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC: Adding EgalDict{K,V} (aka ObjectIdDict{K,V}) #24932

Closed
wants to merge 7 commits into from

Conversation

mauro3
Copy link
Contributor

@mauro3 mauro3 commented Dec 5, 2017

Jeff's comment that a ObjectIdDict with type parameters would be good to have, prompted me to look into this again. It is inspired by my previous effort #7348.

In this PR I added that requested ObjectIdDict-like dict, called EqualDict{K,V}. This PR is not the DRY-est (e.g. WeakKeyDict should be unified too), but my idea was to keep it as simple as possible to allow rapid review of this PR. That may would make the inclusion of this and (hopefully) thus #24354 before the imminent feature freeze possible. I would be happy to DRY this up a bit later in a separate PR.

The PR does the following:

  • insert an abstract type Associative{K,V} :> AbstractDict{K,V} :> Dict{K.V} which is the supertype of all dicts based around the current machinery of Dict. Currently all subtypes of AbstractDict would need to have the same fields as Dict (but this could be dried up by using a DictStore type and have that as a field).
  • introduce the functions:
    • isequalkey(h, k1, k2) which compares two keys for a given dict h
    • keyhash(h, k) which produces a hash for key k of dict h
  • the EgalDict is then implemented by copying the Dict type-def and setting:
    • isequalkey(::EgalDict, k1, k2) = k1===k2
    • keyhash(::EgalDict, k) = object_id(k)

Questions on which further work on this PR depend:

  • is this a desired feature?
  • is this the right approach?
  • is it ok to keep it simple for now and DRY up later?

Things to ponder:

  • I think it would make sense to move the original ObjectIdDict to Core and/or not export it anymore. (If I understand correctly, it needs to stay as it is in the C-part of Julia, but I might be wrong.) Maybe call it Core.OIdDict.
  • Then this EgalDict{K,V} could be ObjectIdDict{K,V} and most code using it should just work.

TODO:

  • add more tests for EgalDict
  • better docs

Possible future DRY-ups:

  • add WeakKeyDict to the framework (would need the addition of a key-conversion mechanism to hook into).
  • use a storage type to hold the fields of Dict and EgalDict
  • think about a way to construct all the constructors

@ararslan ararslan requested a review from JeffBezanson December 5, 2017 23:34
@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Dec 5, 2017
@mauro3
Copy link
Contributor Author

mauro3 commented Dec 6, 2017

The repl-test failure was my bad (I dropped the fallback of empty to Dict). Fixed now.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 6, 2017

On further reflection, the EgalDict should not convert keys as e.g. happens here. The fix is to add a keytransform function to the internal interface which will be convert for Dict and id for EgalDict.
I'll get to this.

No, this is ok as one line after there is the check whether the converted key is isequalkey(h, key, key0). This will fail thus no key-conversion can happen.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 9, 2017

X-ref: #9381 (comment) a keycompare function was suggested, which has the same functionality as isequalkey here.

Edit: reading through #9381, I think it would make sense to change isequalkey -> keycomparison and keyhash -> keyhash (keep the name), to return the comparison and hashing function.

@JeffBezanson
Copy link
Member

+1 for keycomparison returning the comparison function (similarly for keyhash).

@JeffBezanson
Copy link
Member

JeffBezanson commented Dec 13, 2017

I'm not a fan of copying code like this. For the purpose of #24354, it would be sufficient just to add type parameters to ObjectIdDict (probably by renaming it to IdDict{K,V} and defining const ObjectIdDict = IdDict{Any,Any}) that do type assertions and that return "not found" immediately when trying to look up a key of the wrong type.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 13, 2017

I'm not a fan of copying code like this.

Me neither, although note that it is "only" the type-def of EgalDict of about ~20 lines. Unless you count the slightly ugly eval loop for the outer constructors as copy-paste.

I could DRY this up by tomorrow by moving the fields of Dict to a DictStore type and sharing that (I would probably leave WeakKeyDict for another PR unless requested). Would that be ok? Or would you rather have the wrapped ObjectIdDict?

I will also do the rename ObjectIdDict->IdDict and EgalDict -> ObjectIdDict.

@andyferris
Copy link
Member

@mauro3 FYI I just merged a land-grab on the AbstractDict type name at #25012 :)

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 14, 2017

I started the DRY-up. Currently it is plagued by a strange failure in test/inference.jl. Edit: found it.

@StefanKarpinski
Copy link
Member

I have to say I'm not thrilled with the name EgalDict. I love Henry Baker as much as the next guy, but that's going to mean very little to most people and just seem weird to them. I'd favor calling it IdDict and gaving const ObjectIdDict = IdDict{Any,Any} as an alias for convenience.

base/dict.jl Outdated
age::UInt
idxfloor::Int # an index <= the indexes of all used slots
maxprobe::Int
mutable struct Dict{K,V} <: HashDict{K,V}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not mutable. also, we should optimize to remove the allocation for this case (since it would be relatively trivial), but currently don't, so we may need to make sure the perf regression is minor in the meantime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, not mutable, thanks. And I was not aware of the perf problem. I'll run it against JuliaCI/BaseBenchmarks.jl#150 and some against older perf-tests.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 14, 2017

Yes, I'll rename: ObjectIdDict->IdDict and EgalDict -> ObjectIdDict.

@Sacha0
Copy link
Member

Sacha0 commented Dec 14, 2017

I have to say I'm not thrilled with the name EgalDict. I love Henry Baker as much as the next guy, but that's going to mean very little to most people and just seem weird to them. I'd favor calling it IdDict and gaving const ObjectIdDict = IdDict{Any,Any} as an alias for convenience.

Playing devil's advocate, EgalDict means something to me, but IdDict and ObjectIdDict do not immediately. I conjecture that neither term is meaningful to those who do not grok the first term :).

Internally IdDicts are only used early in the bootstrap.  All other
instances of ObjectIdDict use my new implementation.

Skipping CIs as I expect tests to fail.  I will force push a working
commit onto this tomorrow.

[ci skip]
[av skip]
[bsd skip]
@mauro3
Copy link
Contributor Author

mauro3 commented Dec 14, 2017

Ok, so I DRY-ed this up (apart from unifying WeakKeyDict) and renamed ObjectIdDict->IdDict and EgalDict -> ObjectIdDict. All ObjectIdDicts in Base use now the new implementation, except a few early in boot-strap use IdDict. I haven't had time to run all tests locally after the rename before my bedtime, so I expect some breakage.

Todo:

  • update pre-compile statements
  • check performance

@StefanKarpinski
Copy link
Member

IdDict stands for "identity dict" which seems to suggest the right thing. I really rather wish that there could be one type and that the other type could just be IdDict{Any,Any}.

?module works
?sin does not

rehashing the meta-dicts fixes it

for m in Base.Docs.modules
    Base.rehash!(Base.Docs.meta(m))
end

Simlarly, there was a rehashing in serialize.jl which was fixed going
ObjectIdDict->IdDict.  But with the doc-system that did not help.

Skipping some CIs to save resources:
[ci skip]
[av skip]
[bsd skip]
@mauro3
Copy link
Contributor Author

mauro3 commented Dec 15, 2017

The current state of this PR has an issue with dicts not being re-hashed at times. This shows with the doc-system, where ?sin will not find the docs. The strange thing is that the doc-system uses the old, now called IdDict, whose code was not touched apart from the rename. Has anyone an idea why this happens?

The other issue is performance. I ran these somewhat primitive and probably not exhaustive tests: https://gist.github.com/mauro3/ada59b1fd9a09202631edccf255c4e1b (results at the bottom). They show performance degraded some 10-20% for Dict and WeakKeyDict. The new ObjectIdDict is about 3x slower than IdDict. As an aside, WeakKeyDict got about 20% slower since 0.6.1.

I will not have much time to work on this over the weekend. The breaking change of this PR is the ObjectIdDict taking now type parameters. This could be implement, as Jeff suggested, by wrapping the old ObjectIdDict and adding some type asserts. I can do that, but not until Monday. So, if the feature freeze ship hasn't sailed by then, I'll do this. And #24354 can be still done. (This PR can then be done, if desired, for 1.x)

@StefanKarpinski
Copy link
Member

I'm pretty confused by the approach here. Why rename ObjectIdDict to IdDict and then introduce a new thing called ObjectIdDict that is different than the old one? That's going to cause a lot of needless pain and suffering here. Why not introduce a new parametric egal-based-dict type called IdDict? Or even better, introduce that and then just define const ObjectIdDict = IdDict{Any,Any}?

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 15, 2017

Yes, you're right, I got confused. Let's call the new one IdDict and keep the old one ObjectIdDict (but maybe stop exporting it). It is probably beyond me (at least without a lot of trial and error) to remove ObjectIdDict as it features earlier in the boot-strap.

@JeffBezanson
Copy link
Member

Replaced by #25210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants