Do not store names in two different objects #2
Comments
Hi, On Mon, Nov 18, 2013 at 1:47 PM, Milan Bouchet-Valat <
We could have a function names(a::NamedArray, d::Int) = ... that always resolves the current dict. For slices etc., I now first recompute the names and then redefine the dict, which adds a bit more overhead. Perhaps it can be done more cleverly. ---david
David van Leeuwen |
In theory when extracting a slice you only need to create a subset of the dictionary, which could be even more efficient than creating it from a vector (since you do not need to compute hashes again). So what we need is a function to extract a subset of a dict. Probably worth asking the main devs. There's also OrderedDict (see JuliaLang/julia#2548), which could be more efficient to select some elements by index (but slower when inserting). |
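To make the "subset of a dict" idea concrete, here is a minimal sketch in current Julia syntax. The name `subset_dict` and the name => position layout of the dict are assumptions for illustration, not the package's actual code. Note that it still hashes each kept name on insertion and scans the whole dict, so it does not achieve the "no rehashing" saving hoped for above; it only shows the renumbering a slice needs.

```julia
# Hypothetical helper (not part of NamedArrays): given a name => position
# dict and the positions kept by a slice, build the dict for the slice.
# Assumes the values of `d` are positions 1..n along one dimension.
function subset_dict(d::AbstractDict{K,Int}, keep::AbstractVector{Int}) where {K}
    # Map each kept old position to its new position in the slice.
    newpos = Dict{Int,Int}(old => new for (new, old) in enumerate(keep))
    out = Dict{K,Int}()
    for (name, old) in d
        new = get(newpos, old, 0)      # 0 means "not part of the slice"
        if new != 0
            out[name] = new
        end
    end
    return out
end

d = Dict("a" => 1, "b" => 2, "c" => 3)
s = subset_dict(d, [3, 1])             # keep positions 3 and 1, in that order
```

Here `s` maps "c" to 1 and "a" to 2, since the slice lists position 3 first.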
Hello, On Mon, Nov 18, 2013 at 9:48 PM, Milan Bouchet-Valat <
I worked on a more efficient getindex(), which is now pretty fast for Int / I also have a small test.jl now. It needs to be expanded. ---david David van Leeuwen |
I've just found that Yet another solution would be to avoid storing copies of the strings, and only store in the dict references to the strings in the vector - that would save some space when names are long. I'll find the time to give a try at your performance improvements. |
Hi Milan, This morning I removed names::Vector from the type definition and went I didn't do the names slicing cleverly, yet. I just re-define a new dict. Cheers, ---david On Tue, Nov 19, 2013 at 11:44 PM, Milan Bouchet-Valat <
David van Leeuwen |
This should probably be revisited when the new version of OrderedDict (JuliaLang/julia#2548) is ready. What we need is really an object allowing both fast access by name and fast extraction of a subset of the names (for cases where a new NamedArray has to be built). OrderedDict might be a good compromise, improving on Dicts with regard to the latter goal since you do not need to sort names every time. |
Currently names are duplicated in the `names` and `dicts` members. This is suboptimal in the long-term. It is probably possible to only use the dictionaries and to rebuild the ordered vector of names from it when needed. Since that vector is not typically needed in performance-critical operations, that's not a problem (indexing is fast thanks to dictionaries, and that's what matters).

What do you think?
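As a rough illustration of rebuilding the ordered vector of names from a dictionary, here is a sketch in current Julia syntax. The function name `names_from_dict` is hypothetical, and it assumes each dict maps a name to its position and that the positions are exactly 1..n with no gaps.

```julia
# Hypothetical helper (not part of NamedArrays): recover the ordered
# vector of names from a name => position dictionary.
# Assumes the values of `d` are a permutation of 1:length(d).
function names_from_dict(d::AbstractDict{K,Int}) where {K}
    names = Vector{K}(undef, length(d))
    for (name, i) in d
        names[i] = name                # place each name at its position
    end
    return names
end

d = Dict("b" => 2, "a" => 1, "c" => 3)
ordered = names_from_dict(d)
```

This is a single pass over the dict with no sorting, so the cost is linear in the number of names, which should be acceptable if the vector is only needed outside performance-critical paths.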