-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MultiTable: thin wrapper around Table with sane API for duplicate keys; deprecate Table.add #200
Comments
|
sure. |
Performance is linear in the number of duplicates per operation. Regular access is constant-time, not linear. "Quadratic" is typically a bit abused here, assuming that there is an outer loop, but it could be only exactly one key has any (or many) duplicates. No need to perpetuate the abuse. STL uses "multiset" for this concept (which I believe has broader mathematical community acceptance). So, "MultiTable" makes sense as a type name, but I don't think we need this. While I agree it's unusual, I do not have a problem with both sets and tables being their "multi" versions. Instances' history just determines things. Users just have to know not to abuse them with many duplicates. If they don't abuse, there is no problem. You can't really get around people needing to know how to use some things. Hashes are never guaranteed O(1), anyway. Things being "per instance" rather than in the type system definitely helps keep the number of table types down, as you note. There's nothing forcing people to put dups in, most new people don't know they can, and those that do know they can are probably aware of the performance gotcha. (This is actually one of the few ways to abuse B-trees as well, incidentally.) Anyway, I think the "very flexible" philosophy of Nim fits with less restrained data structures. I think the divergence between the sets & tables APIs in terms of duplicates is unnecessarily limited, though. Adding dup ability to sets would be another way to go there. That's what I'm doing in my own library. I just have an We don't really have a warning convention for the run-time library that I can tell. This cannot be the only example in the stdlib where it is easy to detect misuse/a probable but not necessarily fatal/exception oriented run-time issue. I think the better RFC here would be what this convention should be and to add warnings when abuse happens. |
on warnings
on keeping allowing duplicate keys in Tablesas I mentioned in top post, supporting duplicate keys in Table results in a half-assed design that impacts performance both for users that don't care about duplicate keys, as well as for users who care about duplicate keys. There is no solution I'm aware of that can satisfy both in the same design; if you disagree, file an RFC or show some POC code. For example, you can't get/add/modify a specific duplicate. With wrapper is not re-implementationMy main problem with nim's Table/HashSet/Intests/etc apis is not the number of unique table types, but the fact that these are more or less full re-implementations (with some partial code reuse). MultiTable as proposed here is just a thin wrapper, and in fact has a The only justification for re-implementation is a benchmark showing performance of a wrapper is not as good as a re-implementation in at least some meaningful cases; I haven't seen such benchmark. |
I don't really disagree about the warning points (at a high level, some examples I would challenge). The fact remains there will be times when unanticipated inputs create a weird situation that no one may ever know has pushed things into a "bad regime" without a warning to the end user who I realize may be different than the code author. Some code author could define a constant While I do see the current API as of limited usefulness ( I agree the code factoring is a bit messy. For what it's worth, in my "sets are primary, tables in terms of sets" approach there is a similar "down projection" control for keys which might be usable in this way (necessary to implement tables in terms of sets which may constitute another argument in that space of concerns vs. tables with void values). Once you have a multi-component key like your |
I tried to make some of my ideas (not entirely about this dups issue, but not entirely not about it, either) much more concrete over at https://github.com/c-blake/adix. { I had actually written most of that code a couple weeks before I made my comments here.} |
We did deprecate Tables.add, you can use a |
Table currently allows duplicate keys, which have issues, some fixable, others harder to fix without affecting code that doesn't care about duplicate keys (the common case is to not have duplicate keys and that should be as fast as possible); eg:
del
when there are repeated keys Nim#13500==
incorrect for tables with repeated keys Nim#13499del
just deletes "one of the duplicates"card(duplicates of a key)
proposal
Table.add
: would still work (for now at least) but would show a deprecation msg showing best course of action (including what this RFC introduces)TableDup
around Table to offer sane API to handle duplicate keys, including ability to access / re-assign / delete a specific duplicatevalues
andadd
: I'm not sure but shouldn't be worseNote
it's a thin wrapper, not a re-implementation, so all the complex logic is reused instead of re-implemented; I know we have too many Table types but the chief concern I have is that these are not mere wrappers around a few basic table types but full blown implementations (with some amount of code reuse). Wheres this is just a wrapper.
alternatives considered
IMO this is worse because you pay a price for each key, not just duplicates (seq indirection); and there are other problems as well
refinements
I didn't want to obfuscate the main idea but it's possible to extend to OrderedTable or other types eg:
(I haven't tested this)
(I haven't tested either)
links
D20200304T004500
The text was updated successfully, but these errors were encountered: