Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in-order iteration of HeapQueue #373

Open
timotheecour opened this issue Nov 23, 2020 · 5 comments
Open

in-order iteration of HeapQueue #373

timotheecour opened this issue Nov 23, 2020 · 5 comments

Comments

@timotheecour
Copy link
Owner

timotheecour commented Nov 23, 2020

proposal

allow efficient in-order iterator for a HeapQueue
I actually fell into a trap in nim-lang#16067 (maybe docs should make this clearer) by using the seq directly, which is not in order

proposal:

see https://stackoverflow.com/a/22187277/1426932

O(1) additional space.
Assume you have the heap in the usual array form.
Remove items one at a time in sorted order to "visit" them for the traversal.
After visiting the i'th item, put it back in the heap array at location n-i, which is currently unused by the heap (assuming zero-based array indices).
After traversal reverse the array to create a new heap.
Removing the items requires O(n log n) time. Reversing is another O(n).
If you don't need to traverse all the way, you can stop at any time and "fix up" the array by running the O(n) heapify operation. See pseudocode here for example.

links

@ringabout
Copy link
Collaborator

ringabout commented Nov 24, 2020

heapqueue only supports minHeap. Is it a bit inconvenient?

Should we allow user-defined cmp function?

type
  CmpCallback[T] = proc (x, y: T): bool
  HeapQueue*[T] = object
    ## A heap queue, commonly known as a priority queue.
    data: seq[T]
    cmp: CmpCallback[T]

Yeah, compared to Dlang, we don't have alias. But doesn't have big influence.

Or we could use concepts to reduce function pointer costs.

@timotheecour
Copy link
Owner Author

timotheecour commented Nov 24, 2020

heapqueue only supports minHeap. Is it a bit inconvenient?

maxheap would be redundant, can be emulated by cmp (with 0 overhead)

Should we allow user-defined cmp function?

absolutely; the lack of it is another limitation I ran into when using heapqueue.
Related to lack of cmp, heapqueue relies on generic caching which runs into exactly this issue: nim-lang#13572

eg:

import std/heapqueue
import ./t11358b

proc `<`(a,b: Foo): bool = a.x > b.x
proc `<`(a,b: tuple[x,y: int]): bool = a.x > b.x

proc fn3()=
  var list: HeapQueue[Foo]
  list.push Foo(x: 2)
  list.push Foo(x: 3)
  echo "fn3", list[0]
fn3()


proc fn4()=
  var list: HeapQueue[tuple[x, y: int]]
  list.push (x: 2, y: 30)
  list.push (x: 3, y: 20)
  echo "fn4:", list[0]
fn4()

# t11358b.nim
import std/heapqueue
type Foo* = object
  x*: int

proc `<`(a,b: Foo): bool = a.x < b.x

proc `<`(a,b: tuple[x,y: int]): bool = a.x < b.x

proc fn1[]()=
  var list: HeapQueue[Foo]
  list.push Foo(x: 2)
  list.push Foo(x: 3)
  echo "fn1:", list[0]

proc fn2[]()=
  var list: HeapQueue[tuple[x, y: int]]
  list.push (x: 2, y: 30)
  list.push (x: 3, y: 20)
  echo "fn2:", list[0]

when defined case1:
  fn1()
  fn2()
nim r main
# ok, main.< is honored
fn3(x: 3)
fn4:(x: 3, y: 20)

nim r -d:case1 main
fn1:(x: 2)
fn2:(x: 2, y: 30)
# bug, main.< is not honored, instead uses t11358b.< (which is not even visible from main)
# and worse, that is order sensitive as described in https://github.com/nim-lang/Nim/issues/13572
fn3(x: 2)
fn4:(x: 2, y: 30)

Yeah, compared to Dlang, we don't have alias. But doesn't have big influence.

As you see above, it does; unrelated code in some dependency (eg 2 modules defining some tuple) can affect each other's semantics in order-dependent ways. And that's just one of the bugs nim-lang#8677 has more (not all relate to this).

With alias PR nim-lang#11992, you can avoid this, while retaining 0 cost as follows:

var list: HeapQueue[Foo, alias myCmp]
# or var list = initHeapQueue(alias myCmp)
# or with a lambda, as shown in PR: var list: HeapQueue[Foo, (a,b) ~> a.x < b.x]
# or with a unary lambda, ditto: var list: HeapQueue[Foo, a ~> a.x] # uses cmp(lambda(a), lambda(b))

as shown with isSorted2 Example in that PR.

Or we could use concepts to reduce function pointer costs.

how?

@ringabout
Copy link
Collaborator

ringabout commented Nov 24, 2020

made a simple draft PR:
https://github.com/nim-lang/Nim/pull/16113/files

Edited:
I have closed this PR because heapqueue is used so widely. I don't want to break it. Better to have a featured one in nimble file.

@timotheecour
Copy link
Owner Author

but did it break code, or tests in testament?

Better to have a featured one in nimble file.

this would guarantee to lead to code duplication.

@ringabout
Copy link
Collaborator

I will make a better heapqueue library(Python like) first to verify it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants