Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List construction works very slow in REPL if I don't explicitly define its type as the (List Int) #4290

Open
sergey-kintsel opened this issue Jan 14, 2018 · 8 comments

Comments

@sergey-kintsel
Copy link

sergey-kintsel commented Jan 14, 2018

Idris version: 1.2.0
OSs: macOS & Ubuntu

Steps to Reproduce

Open REPL
type [1..1000] and hit Enter.
It takes very long and consumes 100% of 1 CPU core

Expected Behavior

It should be as fast as a bullet

Important notes:

  • if you run the (List Nat/Integer) [1..1000] it works slow
  • if you run the (List Int) [1..1000] in REPL, it works not so slow (I am able to wait for results)
  • if you compile the file with something like:
module Main

main : IO ()
main = print $ foldr (+) 1 (the (List Nat) [1..1000])

it works fast.

I tried to investigate the problem myself, but I got only to this point: https://github.com/idris-lang/Idris-dev/blob/master/src/Idris/REPL.hs#L859.

@steshaw
Copy link
Member

steshaw commented Jan 17, 2018

Possibly related to #3976 (unfortunately closed AFAICS) and #3516.

I see you found the REPL code! It looks to me that the evaluation of the term happens happens where normaliseBlocking is called.

When we spoke on the Idris Slack channel, I'm pretty sure I was misguided to suggest that it might be a problem of not having the GMP flag enabled. The evaluation of the [1..1000] expression in the REPL should be a matter of how it's evaluated by the Haskell code in normaliseBlocking.

Hopefully one of the other @idris-lang/contributors will be able to give you more guidance about how the evaluator works and why it might be slow in this case. Otherwise, dig into src/Idris/Core/Evaluate.hs to see what you can find!

@steshaw
Copy link
Member

steshaw commented Jan 17, 2018

I recalled that Agda has a mechanism to optimise datatypes that are like Nat. Googling, I came up with:

data  : Set where
  zero :suc  : (n : ℕ) {-# BUILTIN NATURAL ℕ #-}

See

Integers have a similar "builtin" mechanism in Agda.

I'm not sure if Idris has a similar mechanism. If not, perhaps that's what's required to resolve these performance issues?

@ziman
Copy link
Contributor

ziman commented Jan 17, 2018

Nat is unary and therefore a number N is represented by a linked list with N nodes.

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda compiles Nat the same way as Idris does.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

@steshaw
Copy link
Member

steshaw commented Jan 17, 2018

Hi @ziman, thanks for stopping by!

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

That's sad. We should use a similar optimisation for REPL evaluation.

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda special-cases Nat the same way as Idris does.

No, the use of NATURAL in Agda has 4 effects:

  1. The use of natural number literals is enabled. By default the type of a natural number literal will be Nat, but it can be overloaded to include other types as well.
  2. Closed natural numbers are represented as Haskell integers at compile-time.
  3. The compiler backends compile natural numbers to the appropriate number type in the target language.
  4. Enabled binding the built-in natural number functions: plus, minus, times, equals, less-than, div-suc-aux, mod-suc-aux.

See http://agda.readthedocs.io/en/v2.5.3/language/built-ins.html#natural-numbers

So, for Idris, we'd want to add (2) as (1) and (3) seem already taken care of. Not sure about (4), it looks like a complication 😄.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

I'm not entirely following but I think we want Haskell Integer.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

@sergey-kintsel seems keen to solve this. I'm happy to help. Rather than "one day", with a little direction, we could hope for "one day next week" 😉! Could you point to the optimisation that the compiler does to replace uses of Nat with GMP integer operations? I imagine the structure of that operation would be very similar to what's needed for evaluation-time optimisation.

@ziman
Copy link
Contributor

ziman commented Jan 17, 2018

Ah, thank you for the clarification wrt. Agda.

I'm not entirely following but I think we want Haskell Integer.

In my personal opinion (which may be a bit radical), sometimes Nat is simply not the right type and it is better to use Int or Integer in your source -- because the meaning is different. For example, I'd most likely use a pair of Ints for widget dimensions: an opaque number is sufficient there, and it performs better. I prefer to read Nat as Count, and you hardly ever count pixels one by one in your types or elsewhere, as opposed to elements of a list or the depth of a tree.

Either way, the translation currently happens in Idris.DataOpts, which lives in src/Idris/DataOpts.hs. Compile-time optimisation is performed on the Raw representation.

We've been talking about generalising this for a very long time but I can't find anything written up online. You could extend this idea of big ints to Fin besides Nat, represent vectors as arrays, binary trees packed in arrays (like in heapsort), etc.

I have some notes and even syntax ideas but they are far from anything complete. For example, changing the representation changes the complexity of operations (e.g. (+1) becomes more expensive for big integers than for Nat, the same goes for cons in arrays vs. vectors) so you probably want to let the programmer choose the implementation somehow...

@clayrat
Copy link
Contributor

clayrat commented Jan 17, 2018

Another cool optimization would be to substitute Bits for binary numbers modulo (or bitvectors) .

@sergey-kintsel
Copy link
Author

I was looking DataOpts.hs though and found this:

(Var n, args) <- raw_unapply t -- MAGIC HERE

that's an amazing code comment 😂

@sergey-kintsel
Copy link
Author

looks like I am not the person to solve this. I have no idea how it works internally tbh 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants