add a pairing heap data structure #39046

dcoudert · 2024-11-27T18:22:10Z

This PR adds a pairing heap data structure, that is a min-heap data structure with decrease key operation (see https://en.wikipedia.org/wiki/Pairing_heap).

3 implementations are proposed:

one with fixed capacity, items in range $0..n-1$ and values of any comparable type.
one with fixed capacity accepting any type of hashable items. Values can be of any comparable type.
one with unbounded capacity where the types of items and values must be defined at C level. It is an interface to a templated C++ implementation and is for internal use only. The 2 other implementations can be accessed from the shell.

This can be useful for example for algorithms Dijkstra.

📝 Checklist

The title is concise and informative.
The description explains in detail what this PR is about.
I have linked a relevant issue or discussion.
I have created tests covering the changes.
I have updated the documentation and checked the documentation preview.

⌛ Dependencies

dcoudert · 2024-11-27T18:23:01Z

Note that several exceptions raised by the methods (value(...), etc.) should be a KeyError and not a ValueError, but I don't know how to convert C++ exceptions to KeyError (see http://docs.cython.org/en/latest/src/userguide/wrapping_CPlusPlus.html#exceptions for correspondances). Also, to get the same behaviors, I'm raising only ValueError.

github-actions · 2024-11-27T22:06:08Z

Documentation preview for this PR (built with commit e15825a; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

dcoudert · 2024-11-28T09:46:57Z

It seems that the meson/conda CI is not using the current stable version of Cython. Reported errors about new are not in accordance with the documentation http://docs.cython.org/en/stable/src/userguide/wrapping_CPlusPlus.html

cyrilbouvier · 2024-11-28T10:52:50Z

One remark after a quick look at the C++ code:

why do you use a std::map and not a std::unordered_map ?
According to cppreference, for map:

Search, removal, and insertion operations have logarithmic complexity.

But for unordered_map

Search, insertion, and removal of elements have average constant-time complexity

It implies, for exemple, that with the current code, the method contains is not O(1) as advertised but O(log(n))
If there is no need for the the nodes to have its keys sorted, a unordered_map seems more suited.

dcoudert · 2024-11-28T11:05:33Z

why do you use a std::map and not a std::unordered_map ? According to cppreference, for map:

If I understand well, the objects you use with unorderer_map must have a hash function. I don't know how to do that "easily".

cyrilbouvier · 2024-11-28T12:35:34Z

why do you use a std::map and not a std::unordered_map ? According to cppreference, for map:

If I understand well, the objects you use with unorderer_map must have a hash function. I don't know how to do that "easily".

Indeed, it is a additional constraint on the type TI but if you want to prevent duplicates items in the tree (and keep the advertised complexities) I do not see a way around that.

The use of std::map implies that the complexity of the push method cannot be lower than O(log(n)).

Why do you need the nodes attribute in the class ?

dcoudert · 2024-11-28T12:43:09Z

I find it easier for the user to interact with this class using an item rather than with an object of type PairingHeapNode.
Hence, I need a mapping (or a method) that given an item returns the corresponding PairingHeapNode.

cyrilbouvier · 2024-11-28T13:00:18Z

I find it easier for the user to interact with this class using an item rather than with an object of type PairingHeapNode. Hence, I need a mapping (or a method) that given an item returns the corresponding PairingHeapNode.

You could implement a iterator class similar to what exist for the containers of the standard library:

the iterator is just a class that store a pointer to PairingHeapNode
on push, you return a iterator to the inserted element (top could also return an iterator)
the iterator should only be invalidated on delete (but here it seems to be case: inserting new elements do not change the pointer)
the PairingHeapNode class can now contains only one member T data, with the only constraint is that the type T is comparable (in fact you just need less_than)
it seems that the only method that interact with a node (from the user point of view) is decrease, it could be called by passing a iterator as parameter.
the only changes that I see is that there should not be a copy constructor but a move constructor: so if one has an iterator for a node in tree1 and that tree is moved into the object tree2, the iterator is still valid for the new tree.

This structure would be very similar to std::list with the nodes holding an additional pointer to the children.

dcoudert · 2024-11-28T13:15:45Z

We can certainly do that but then the use of the data structure will be more complex and some parts of the complexity are only deported to the user. For instance, if I want to implement Dijkstra, I will need a mapping for the association between the nodes of the graph and the iterator.

I might be wrong, but if we have examples of hash functions and how to use them, then we can use unordered_map and preserve a reasonably simple interface for users.

cyrilbouvier · 2024-11-28T13:17:16Z

For information, I am interested in this PR because I contemplated the idea of implementing something similar to have cleaner code in my PR #39038 (and because it could be of interested in other case).

There would have been two differences with what you have implemented:

a node would have had 4 pointers: left, right, children and parent to ensure O(1) access to the parent node
there would have been no constraint between a node and its children

So i spent some time looking at the code of std::list in GCC to try to do something similar but it is not easy.

cyrilbouvier · 2024-11-28T13:27:44Z

We can certainly do that but then the use of the data structure will be more complex and some parts of the complexity are only deported to the user. For instance, if I want to implement Dijkstra, I will need a mapping for the association between the nodes of the graph and the iterator.

I might be wrong, but if we have examples of hash functions and how to use them, then we can use unordered_map and preserve a reasonably simple interface for users.

As it is an internal data structures, you can assume to have some control on which types will be used for TI, no ?
In the case of Graph, as their vertices is represented with ints, you can use unordered_map without worry, they are hashable (see https://en.cppreference.com/w/cpp/utility/hash).
Storing the mapping inside the datastructure or outside of it is then just a matter of choice.

dcoudert · 2024-11-28T13:29:20Z

We can provide multiple implementations. This PR already has 3 ;) and it can become a subproject like cysignals, memory_allocator, etc.

dcoudert · 2024-11-28T13:56:15Z

I changed to unordered_map. You are right that for internal use we can ensure that we use simple types.

It is much faster this way.

cyrilbouvier

I just reviewed the C++ code for now

src/sage/data_structures/pairing_heap.h

dcoudert · 2024-11-30T10:59:41Z

I was finally able to integrate all your suggestions in pairing_heap.h. I have moved the content of the method inside the class declaration to simplify/shorten the code.

src/sage/data_structures/pairing_heap.pxd

dcoudert · 2024-12-12T14:35:22Z

May be we can skip that for the moment. It can be added later if really needed.

gmou3

I've seen people insist on it, so I assume it has some importance.

src/sage/data_structures/pairing_heap.pyx

gmou3

Can the third C++ implementation be used? If not, why is it included?

I feel there is some redundancy with having a PairingHeap_class. There should essentially only be a PairingHeap_of_n_integers with a PairingHeap_of_n_hashables delegating to PairingHeap_of_n_integers after some dictionary translations (similar architecture to DisjointSet).

src/sage/data_structures/pairing_heap.h

src/sage/data_structures/pairing_heap.pyx

src/sage/data_structures/pairing_heap.h

dcoudert · 2024-12-13T12:31:17Z

Can the third C++ implementation be used? If not, why is it included?

Yes, it can be used as shown in methods _test_PairingHeap_from_C and _compare_heaps. The main difference is that the number of items is unbounded.

I feel there is some redundancy with having a PairingHeap_class. There should essentially only be a PairingHeap_of_n_integers with a PairingHeap_of_n_hashables delegating to PairingHeap_of_n_integers after some dictionary translations (similar architecture to DisjointSet).

Doing so would force to also maintain the bitset active in PairingHeap_of_n_hashables and would make methods push, decrease, etc. more complicated. I'm not sure it's worth the effort.

gmou3 · 2024-12-13T12:52:21Z

Ok. Another thing I was thinking about is to use a vector instead of an unordered_map, which I think would collapse the C++ implementation and the PairingHeap_of_n_integers into one. For hashables then one would use PairingHeap_of_n_hashables (or without the _n_) which would also use the C++ implementation eventually. If n is given then the vector can preallocate consecutive places in memory (unlike unordered_map) and can also be dynamically resized later if desired. Do you think this is worth it? I think it is good in two fronts , i.e., code simplification and potential speedup.

PS. there remain two origianls.

dcoudert · 2024-12-13T13:15:01Z

Ok. Another thing I was thinking about is to use a vector instead of an unordered_map, which I think would collapse the C++ implementation and the PairingHeap_of_n_integers into one. For hashables then one would use PairingHeap_of_n_hashables (or without the _n_) which would also use the C++ implementation eventually. If n is given then the vector can preallocate consecutive places in memory (unlike unordered_map) and can also be dynamically resized later if desired. Do you think this is worth it? I think it is good in two fronts , i.e., code simplification and potential speedup.

But then, how do you know the association between an item and the pointer / node in the heap ? With current implementation it is transparent for users.

gmou3 · 2024-12-13T13:19:59Z

Instead of an unordered_map you would have a vector that is indexed by integers (the items, say 0 to n-1 but can increase) and would store the pointers. If item i does not exist then it would return a nullptr. If you want hashable items you would use the wrapper class PairingHeap_of_hashables.

dcoudert · 2024-12-13T14:09:00Z

I understand what you have in mind, but I don't think it is a good idea. Instead of targeting the fastest possible implementation, we should target the most convenient one / the most suited for the kind of usage we will have. This is why I proposed 3 versions.
If you know that you are only using integers in range $[0, n-1]$, PairingHeap_of_n_integers is certainly the best possible version. If you don't know the range of integers, or if you prefer a memory usage that is proportional to the number of items in the heap, then the current C++ version might be better.

Note that we can add methods to PairingHeap_of_n_xxx in order to increase the capacity n if one needs such option.

gmou3

Ok then.

dcoudert · 2024-12-13T14:30:25Z

Thank you all for the thorough review. The code has been greatly improved !

This PR adds a pairing heap data structure, that is a min-heap data structure with decrease key operation (see https://en.wikipedia.org/wiki/Pairing_heap). 3 implementations are proposed: - one with fixed capacity, items in range $0..n-1$ and values of any comparable type. - one with fixed capacity accepting any type of hashable items. Values can be of any comparable type. - one with unbounded capacity where the types of items and values must be defined at C level. It is an interface to a templated C++ implementation and is for internal use only. The 2 other implementations can be accessed from the shell. This can be useful for example for algorithms Dijkstra. ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [ ] I have linked a relevant issue or discussion. - [x] I have created tests covering the changes. - [x] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#39046 Reported by: David Coudert Reviewer(s): cyrilbouvier, David Coudert, gmou3, user202729

Similarly to sagemath#39216, we avoid the conversion to `short_digraph` when the input graph is an instance of `StaticSparseBackend`. Furthermore, we use the new `PairingHeap_of_n_integers` data structure (sagemath#39046) instead of a `priority_queue` to emulate a max-heap. This is slightly faster this way and cleaner. ### 📝 Checklist  - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [x] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies    URL: sagemath#39217 Reported by: David Coudert Reviewer(s): Travis Scrimshaw

add a pairing heap data structure

7b67376

github-actions bot added the s: needs review label Nov 27, 2024

dcoudert added 4 commits November 27, 2024 20:14

fix lint errors

aaf0502

fix error in src/doc/en/reference/references/index.rst

747cc75

remove some trailing whitespaces

f3a88f5

try to fix references

ac0ef3e

dcoudert added 3 commits November 27, 2024 23:55

meson build

bd63d91

remove useless reference

7d2982e

detail in pairing_heap.pyx

f0d2c3a

dcoudert self-assigned this Nov 28, 2024

use unordered_map

fe3751d

cyrilbouvier reviewed Nov 29, 2024

View reviewed changes

dcoudert added 2 commits November 29, 2024 18:43

sagemath#39046: improvements in pairing_heap.h

2804a77

sagemath#39046: improve pairing_heap.h

780dc33

Merge branch 'develop' into data_structures/pairing_heap

9365926

cyrilbouvier reviewed Dec 2, 2024

View reviewed changes

src/sage/data_structures/pairing_heap.pxd Outdated Show resolved Hide resolved

gmou3 suggested changes Dec 12, 2024

View reviewed changes

src/sage/data_structures/pairing_heap.pyx Outdated Show resolved Hide resolved

src/sage/data_structures/pairing_heap.pyx Show resolved Hide resolved

src/sage/data_structures/pairing_heap.pyx Show resolved Hide resolved

gmou3 suggested changes Dec 12, 2024

View reviewed changes

src/sage/data_structures/pairing_heap.pyx Outdated Show resolved Hide resolved

src/sage/data_structures/pairing_heap.pyx Outdated Show resolved Hide resolved

sagemath#39046: suggested changes

7682e5b

gmou3 suggested changes Dec 13, 2024

View reviewed changes

sagemath#39046: review comments

abafad9

sagemath#39046: typos

fc81b14

gmou3 approved these changes Dec 13, 2024

View reviewed changes

gmou3 added s: positive review and removed s: needs review labels Dec 13, 2024

Merge branch 'develop' into data_structures/pairing_heap

e15825a

github-actions bot added s: needs review and removed s: positive review labels Dec 15, 2024

gmou3 added s: positive review and removed s: needs review labels Dec 15, 2024

user202729 mentioned this pull request Dec 19, 2024

tools/update-meson.py is too undiscoverable #39172

Open

vbraun merged commit f8d5510 into sagemath:develop Dec 22, 2024
19 of 22 checks passed

github-actions bot removed the s: positive review label Dec 22, 2024

This was referenced Dec 28, 2024

improve graph traversal methods #39217

Merged

faster bidirectional dijkstra using pairing heap #39228

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add a pairing heap data structure #39046

add a pairing heap data structure #39046

dcoudert commented Nov 27, 2024

dcoudert commented Nov 27, 2024

github-actions bot commented Nov 27, 2024 •

edited

Loading

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

dcoudert commented Nov 28, 2024 •

edited

Loading

cyrilbouvier left a comment •

edited

Loading

dcoudert commented Nov 30, 2024

dcoudert commented Dec 12, 2024

gmou3 left a comment

gmou3 left a comment

dcoudert commented Dec 13, 2024

gmou3 commented Dec 13, 2024

dcoudert commented Dec 13, 2024

gmou3 commented Dec 13, 2024

dcoudert commented Dec 13, 2024

gmou3 left a comment

dcoudert commented Dec 13, 2024

add a pairing heap data structure #39046

add a pairing heap data structure #39046

Conversation

dcoudert commented Nov 27, 2024

📝 Checklist

⌛ Dependencies

dcoudert commented Nov 27, 2024

github-actions bot commented Nov 27, 2024 • edited Loading

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

cyrilbouvier commented Nov 28, 2024

dcoudert commented Nov 28, 2024

dcoudert commented Nov 28, 2024 • edited Loading

cyrilbouvier left a comment • edited Loading

Choose a reason for hiding this comment

dcoudert commented Nov 30, 2024

dcoudert commented Dec 12, 2024

gmou3 left a comment

Choose a reason for hiding this comment

gmou3 left a comment

Choose a reason for hiding this comment

dcoudert commented Dec 13, 2024

gmou3 commented Dec 13, 2024

dcoudert commented Dec 13, 2024

gmou3 commented Dec 13, 2024

dcoudert commented Dec 13, 2024

gmou3 left a comment

Choose a reason for hiding this comment

dcoudert commented Dec 13, 2024

github-actions bot commented Nov 27, 2024 •

edited

Loading

dcoudert commented Nov 28, 2024 •

edited

Loading

cyrilbouvier left a comment •

edited

Loading