Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Containers - meta-issue to talk about quality of containers, what should be replaced, etc #3863

Closed
dbp opened this issue Oct 26, 2012 · 6 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one.

Comments

@dbp
Copy link
Contributor

dbp commented Oct 26, 2012

One of the main things people often look at standard libraries for are good containers.

Right now, as far as I can tell, there are the following containers:

HashMap - in std/map.rs - a chained hash-table, with the usual performance characteristics (expected constant time access, insert, removal, no order, no comparing)

LinearMap - in core/send_map.rs - open addressing hash table, sendable (same general behavior as above, but sendable)

TreeMap - in std/treemap.rs - unbalanced binary search tree, very bad pathological inputs, is ordered, can be compared

FunTreeMap - in std/fun_treemap.rs - looks to be the same as above, but without mutation.

SmallIntMap - in std/smallintmap.rs - wee! constant time everything, ordered, comparable! oh yeah, space order the maximum key :)

So here are some questions I have:

  1. There are tradeoffs between chained hashing and open addressing - are the implementation differences relevant for sendability? If not, could they both be sendable? It seems that in general having two different hash tables to have different performance characteristics makes sense, as long as it is clear and documented, but having them differ in sendability seems like a separate issue (but I may just not understand what's going on). Also, is LinearMap in core because it is used internally for rustc? Otherwise, seems a bit odd (as even the Map trait is in std).
  2. The TreeMaps should be replaced with self-balancing trees, as the comments in the file suggest (Red-black seems like the choice), but I'm wondering if the distinction between TreeMap and FunTreeMap makes sense. Is there a way that this could be pushed into the interface? I'm not sure if this is possible, but if it is, it definitely seems desirable (as having two basically parallel implementations is not good).
  3. Right now there is a Map trait (in std/map.rs) - How about also having an OrderedMap interface? Is there a way to tie mutability into the interfaces, and is that desired?
  4. There is a type alias in TreeMap that defines a Set as a TreeMap to unit values. It would be good to have an actual Set type (though it could internally be the the same).

Any thoughts people have about containers? Time permitting, I might work on a red-black tree implementation to replace the TreeMap.

Related issues (that I found): #3385, #2009

@nikomatsakis
Copy link
Contributor

This is an important issue. My general plan is to remove the existing HashMap and make SendMap (well, an optimized, improved version of SendMap) the default. In general I think we should try to make all of our containers sendable and freezable when possible; but sometimes this does not work so well. We'll have to experiment. I think a chained hashtable can be built that is sendable, but I haven't tried it.

Right now we don't offer any functional-style persistent data structures other than List. We should do this. Ironically, functional data structures are not well-suited to sendability because they tend to require sharing between structures and hence are well-suited only to @.

@nikomatsakis
Copy link
Contributor

An interesting thought: one could implement persistent/functional data structures based on ARCs, as well.

@graydon
Copy link
Contributor

graydon commented Oct 27, 2012

I think making all container types sendable and freezable is a good default. I think the libs should shy away from managed types unless they're mandated by the data structure (eg. a circular list or a list / tree that supports tail-sharing).

While there are some advantages to chaining over open addressing, I don't feel they're substantial enough in a language with automatic memory management (esp. one with owning pointers, engineered around swapping). I think if someone wants a "small" hashmap entry they can just make the value type ~V rather than V. We can make this space/speed tradeoff clear in the docs. Given our very pleasant randomized / strong hash function, I think a double-hashed open address table should be default there. The second (...Nth) hash function can be derived from the key values in the table: just use k+n.

The default TreeMap should be an LLRB tree, yes. Imperative, to match the Map interface.

I don't think the functional interface matters much for sendable types. We should have a different interface for managed (@), substructure-sharing maps: Tries and functional LLRBs at least satisfy this. Various HAMTs and Heaps do too. FunMap? ManagedMap? PersistMap? I'm not sure how to name that.

Given that core is merging with rt and we want to implement as much of rt as possible in rust itself, I think it reasonable to move a few primary container types into core. Not every possible container ever, but the Normal Bunch Every Program Uses: the Map trait, HashMap and TreeMap, a small variety of Lists (sendable, managed, circular), Sets (Tree and Hash), Queue (unsafe, double-ended, circular) and Heap (vector-based).

@nikomatsakis
Copy link
Contributor

By the way, I started doing some of the work towards refactoring the map trait (you see the SendMap trait for example) but hit some bugs in the implementation, particularly concerning region parameterization and self pointers. Those will need to be fixed before we can go too much farther. I don't know that there is any issue filed for the particular problem I hit, I ought to do so.

@lucian1900
Copy link

I think Persistent* is a good naming scheme. Other languages (like Clojure) use this naming as well to differentiate between those and the imperative versions.

@thestinger
Copy link
Contributor

I think it would be best to close this issue and coordinate on this wiki page instead. Issues can be opened for individual bugs, features and wanted containers. I've worked on a lot on related issues since this was opened/discussed, so the issues discussed here are mostly dealt with.

Summary of changes since this was opened:

  • TreeMap is now a self-balancing binary search tree, sendable, copy-free, and implements Eq and Ord
  • core::container now exists, with a redone Map trait (implemented by TreeMap, LinearMap and SmallIntMap) and a Set trait (implemented by the new TreeSet and LinearSet types)
  • there's now a std::priority_queue (binary heap)
  • TreeMap, TreeSet, LinearMap, LinearSet, SmallIntMap and PriorityQueue now all implement BaseIter

thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 5, 2025
rustfmt fails to format this match expression, because it has several
long string literals over the maximum line width. This seems to exhibit
rustfmt issues rust-lang#3863 (Gives up on chains if any line is too long) and
rust-lang#3156 (Fail to format match arm when other arm has long line).
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 5, 2025
rustfmt fails to format this match expression, because it has several
long string literals over the maximum line width. This seems to exhibit
rustfmt issues rust-lang#3863 (Gives up on chains if any line is too long) and
rust-lang#3156 (Fail to format match arm when other arm has long line).

Format it with a large line width (e.g., by setting `max_width = 200` in
rustfmt.toml) and, in case the rustfmt bugs are later fixed, mark it
with `#[rustfmt::skip]`, as it is more legible with each case on one
line.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 5, 2025
rustfmt fails to format this match expression, because it has several
long string literals over the maximum line width. This seems to exhibit
rustfmt issues rust-lang#3863 (Gives up on chains if any line is too long) and
rust-lang#3156 (Fail to format match arm when other arm has long line).

Format it with a large line width (e.g., by setting `max_width = 200` in
rustfmt.toml) and, in case the rustfmt bugs are later fixed, mark it
with `#[rustfmt::skip]`, as it is more legible with each case on one
line.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 11, 2025
rustfmt fails to format this match expression, because it has several
long string literals over the maximum line width. This seems to exhibit
rustfmt issues rust-lang#3863 (Gives up on chains if any line is too long) and
rust-lang#3156 (Fail to format match arm when other arm has long line).
jhpratt added a commit to jhpratt/rust that referenced this issue Feb 11, 2025
…braheemdev

Fix long lines which rustfmt fails to format

rustfmt fails to format this match expression, because it has several long string literals over the maximum line width. This seems to exhibit rustfmt issues [rust-lang#3863](rust-lang/rustfmt#3863) (Gives up on chains if any line is too long) and [rust-lang#3156](rust-lang/rustfmt#3156) (Fail to format match arm when other arm has long line).

Format it with a large line width (e.g., by setting `max_width = 200` in rustfmt.toml) and, in case the rustfmt bugs are later fixed, mark it with `#[rustfmt::skip]`, as it is more legible with each case on one line.
jhpratt added a commit to jhpratt/rust that referenced this issue Feb 11, 2025
…braheemdev

Fix long lines which rustfmt fails to format

rustfmt fails to format this match expression, because it has several long string literals over the maximum line width. This seems to exhibit rustfmt issues [rust-lang#3863](rust-lang/rustfmt#3863) (Gives up on chains if any line is too long) and [rust-lang#3156](rust-lang/rustfmt#3156) (Fail to format match arm when other arm has long line).

Format it with a large line width (e.g., by setting `max_width = 200` in rustfmt.toml) and, in case the rustfmt bugs are later fixed, mark it with `#[rustfmt::skip]`, as it is more legible with each case on one line.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Feb 11, 2025
Rollup merge of rust-lang#136606 - thaliaarchi:format-long-lines, r=ibraheemdev

Fix long lines which rustfmt fails to format

rustfmt fails to format this match expression, because it has several long string literals over the maximum line width. This seems to exhibit rustfmt issues [rust-lang#3863](rust-lang/rustfmt#3863) (Gives up on chains if any line is too long) and [rust-lang#3156](rust-lang/rustfmt#3156) (Fail to format match arm when other arm has long line).

Format it with a large line width (e.g., by setting `max_width = 200` in rustfmt.toml) and, in case the rustfmt bugs are later fixed, mark it with `#[rustfmt::skip]`, as it is more legible with each case on one line.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one.
Projects
None yet
Development

No branches or pull requests

6 participants