Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered collections #2

Open
pniederw opened this issue Jan 25, 2016 · 6 comments
Open

Ordered collections #2

pniederw opened this issue Jan 25, 2016 · 6 comments
Assignees

Comments

@pniederw
Copy link

Will capsule also provide sets/maps that maintain insertion order, or is this out of scope? And will it provide lists? Sorry for asking these questions here, didn't know where else to ask.

@msteindorfer
Copy link
Member

Hi Peter, sets/maps with insertion order are well in scope for capsule. Last summer I started to explore designs of such data structures for another project. At the moment, implementing insertion order is not my top priority, thus it will take some time until I'll add them.

Eventually I'm also going to add lists (likely something similar to RRB Trees).

The next data structures that I'll add to capsule will be sets/maps that can mix primitive elements with reference elements in a memory efficient way. The reasoning behind is to provide the best of specialized primitive collections and of generic collections in a single implementation (planned timeline: end of march).

I'll add milestones for what's missing / planned. Thanks for voicing your interest.

@pniederw
Copy link
Author

Sounds great! Looking forward to all of this.

msteindorfer added a commit that referenced this issue Aug 15, 2016
See issue #2 and issue #5. The current implementation is not yet
complete and still requires rework.
@msteindorfer msteindorfer self-assigned this Aug 15, 2016
@ilya-g
Copy link

ilya-g commented Mar 21, 2017

@msteindorfer
Copy link
Member

Hi @ilya-g, thanks for asking. The OrderedTrieMap implementation doesn't implement overflow handling yet, however I'll sketch the idea: in the constructor you can check if size reaches MAX_INT, if this happens you need to traverse the tree in insertion order and reassign subsequent IDs to each item (to essentially get rid of holes in the current sequence IDs).

Rewriting the tree for reassigning IDs is a costly operation, however shouldn't occur in practice often. The scenario where you'd encounter it is if you have really large collections and perform plenty of alternating inserts and deletes (because only deletes create holes in the sequence IDs). Other operations such as put do not update the sequenceId of a tuple.

If you like to use the OrderedTrieMap, give it a try. Apart from the missing overflow handling (that I'm willing to add as well) it's a fully functioning ordered map. I implemented a similar version of this data structure for a closed source project some time ago.

Note, the OrderedTrieMap is currently located in the capsule-experimental module, but with some integration work I'd be able to move it to the main capsule module in the near future.

@ilya-g
Copy link

ilya-g commented Mar 21, 2017

Another option is to use Long for sequenceId – it hardly overflows in a hundred years. 😀

The thing that concerns me though is that one has to pay O(n∙log(n)) cost for sorting entries before iteration, and in some scenarios it can happen every time iteration is started (i.e. you iterate collection once and then modify it somehow).

Have you considered keeping an immutable sorted map of sequenceId=>entry and therefore spreading that cost among modification operations?

@msteindorfer
Copy link
Member

Sorry for the late response. To answer your questions: I wouldn't like to use Long because it introduces to much overhead in my opinion.

Regarding sorting before iteration: this is all about implementation choices. To avoid sorting at all, one could use a reversed map as you pointed out correctly as well. The other solution would be to sort lazily upon the first time of iteration (as done now), and from then on update the reverse index incrementally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants