List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

sergey-kintsel · 2018-01-14T10:28:52Z

Idris version: 1.2.0
OSs: macOS & Ubuntu

Steps to Reproduce

Open REPL
type [1..1000] and hit Enter.
It takes very long and consumes 100% of 1 CPU core

Expected Behavior

It should be as fast as a bullet

Important notes:

if you run the (List Nat/Integer) [1..1000] it works slow
if you run the (List Int) [1..1000] in REPL, it works not so slow (I am able to wait for results)
if you compile the file with something like:

module Main

main : IO ()
main = print $ foldr (+) 1 (the (List Nat) [1..1000])

it works fast.

I tried to investigate the problem myself, but I got only to this point: https://github.com/idris-lang/Idris-dev/blob/master/src/Idris/REPL.hs#L859.

The text was updated successfully, but these errors were encountered:

steshaw · 2018-01-17T10:49:03Z

Possibly related to #3976 (unfortunately closed AFAICS) and #3516.

I see you found the REPL code! It looks to me that the evaluation of the term happens happens where normaliseBlocking is called.

When we spoke on the Idris Slack channel, I'm pretty sure I was misguided to suggest that it might be a problem of not having the GMP flag enabled. The evaluation of the [1..1000] expression in the REPL should be a matter of how it's evaluated by the Haskell code in normaliseBlocking.

Hopefully one of the other @idris-lang/contributors will be able to give you more guidance about how the evaluator works and why it might be slow in this case. Otherwise, dig into src/Idris/Core/Evaluate.hs to see what you can find!

steshaw · 2018-01-17T10:59:41Z

I recalled that Agda has a mechanism to optimise datatypes that are like Nat. Googling, I came up with:

data ℕ : Set where
  zero : ℕ
  suc  : (n : ℕ) → ℕ

{-# BUILTIN NATURAL ℕ #-}

See

Integers have a similar "builtin" mechanism in Agda.

http://agda.readthedocs.io/en/v2.5.3/language/built-ins.html#integers

I'm not sure if Idris has a similar mechanism. If not, perhaps that's what's required to resolve these performance issues?

ziman · 2018-01-17T11:21:44Z

Nat is unary and therefore a number N is represented by a linked list with N nodes.

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda compiles Nat the same way as Idris does.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

steshaw · 2018-01-17T12:36:48Z

Hi @ziman, thanks for stopping by!

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

That's sad. We should use a similar optimisation for REPL evaluation.

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda special-cases Nat the same way as Idris does.

No, the use of NATURAL in Agda has 4 effects:

The use of natural number literals is enabled. By default the type of a natural number literal will be Nat, but it can be overloaded to include other types as well.
Closed natural numbers are represented as Haskell integers at compile-time.
The compiler backends compile natural numbers to the appropriate number type in the target language.
Enabled binding the built-in natural number functions: plus, minus, times, equals, less-than, div-suc-aux, mod-suc-aux.

See http://agda.readthedocs.io/en/v2.5.3/language/built-ins.html#natural-numbers

So, for Idris, we'd want to add (2) as (1) and (3) seem already taken care of. Not sure about (4), it looks like a complication 😄.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

I'm not entirely following but I think we want Haskell Integer.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

@sergey-kintsel seems keen to solve this. I'm happy to help. Rather than "one day", with a little direction, we could hope for "one day next week" 😉! Could you point to the optimisation that the compiler does to replace uses of Nat with GMP integer operations? I imagine the structure of that operation would be very similar to what's needed for evaluation-time optimisation.

ziman · 2018-01-17T17:27:26Z

Ah, thank you for the clarification wrt. Agda.

I'm not entirely following but I think we want Haskell Integer.

In my personal opinion (which may be a bit radical), sometimes Nat is simply not the right type and it is better to use Int or Integer in your source -- because the meaning is different. For example, I'd most likely use a pair of Ints for widget dimensions: an opaque number is sufficient there, and it performs better. I prefer to read Nat as Count, and you hardly ever count pixels one by one in your types or elsewhere, as opposed to elements of a list or the depth of a tree.

Either way, the translation currently happens in Idris.DataOpts, which lives in src/Idris/DataOpts.hs. Compile-time optimisation is performed on the Raw representation.

We've been talking about generalising this for a very long time but I can't find anything written up online. You could extend this idea of big ints to Fin besides Nat, represent vectors as arrays, binary trees packed in arrays (like in heapsort), etc.

I have some notes and even syntax ideas but they are far from anything complete. For example, changing the representation changes the complexity of operations (e.g. (+1) becomes more expensive for big integers than for Nat, the same goes for cons in arrays vs. vectors) so you probably want to let the programmer choose the implementation somehow...

clayrat · 2018-01-17T17:47:16Z

Another cool optimization would be to substitute Bits for binary numbers modulo (or bitvectors) .

sergey-kintsel · 2018-01-18T10:37:11Z

I was looking DataOpts.hs though and found this:

(Var n, args) <- raw_unapply t -- MAGIC HERE

that's an amazing code comment 😂

sergey-kintsel · 2018-01-20T14:48:11Z

looks like I am not the person to solve this. I have no idea how it works internally tbh 😅

ahmadsalim mentioned this issue Jan 17, 2018

Constructing lists takes a long time #4283

Closed

ahmadsalim added C-Moderate Effort A-Confirmed S-Normal U-Type Checking U-Interactive labels Jan 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

sergey-kintsel commented Jan 14, 2018 •

edited

Loading

steshaw commented Jan 17, 2018 •

edited

Loading

steshaw commented Jan 17, 2018

ziman commented Jan 17, 2018 •

edited

Loading

steshaw commented Jan 17, 2018

ziman commented Jan 17, 2018

clayrat commented Jan 17, 2018

sergey-kintsel commented Jan 18, 2018

sergey-kintsel commented Jan 20, 2018

List construction works very slow in REPL if I don't explicitly define its type as the (List Int) #4290

List construction works very slow in REPL if I don't explicitly define its type as the (List Int) #4290

Comments

sergey-kintsel commented Jan 14, 2018 • edited Loading

Steps to Reproduce

Expected Behavior

steshaw commented Jan 17, 2018 • edited Loading

steshaw commented Jan 17, 2018

ziman commented Jan 17, 2018 • edited Loading

steshaw commented Jan 17, 2018

ziman commented Jan 17, 2018

clayrat commented Jan 17, 2018

sergey-kintsel commented Jan 18, 2018

sergey-kintsel commented Jan 20, 2018

List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

sergey-kintsel commented Jan 14, 2018 •

edited

Loading

steshaw commented Jan 17, 2018 •

edited

Loading

ziman commented Jan 17, 2018 •

edited

Loading