Incremental compilation #46

Araq · 2018-05-24T16:58:21Z

Currently in larger Nim projects, the Nim compiler is the bottleneck, not the C/C++ compiler. This is caused by the fact that the Nim compiler always recompiles the full program including all used libraries. This is not required, Nim should use a "compilation cache" so that only the modules that were changed (plus maybe the modules that depended on these modules) are recompiled. The current idea is to base this compilation cache on an SQLite database. This feature is crucial for Nim's scalability to bigger projects.

Araq · 2018-06-08T06:11:55Z

The implementation of the compilation cache is tricky: There are lots
of issues to be solved for the front- and backend.

General approach: AST replay

We store a module's AST of a successful semantic check in a SQLite
database. There are plenty of features that require a sub sequence
to be re-applied, for example:

  {.compile: "foo.c".} # even if the module is loaded from the DB,
                       # "foo.c" needs to be compiled/linked.

The solution is to re-play the module's top level statements.
This solves the problem without having to special case the logic
that fills the internal seqs which are affected by the pragmas.

In fact, this decribes how the AST should be stored in the database,
as a "shallow" tree. Let's assume we compile module m with the
following contents:

  import strutils

  var x*: int = 90
  {.compile: "foo.c".}
  proc p = echo "p"
  proc q = echo "q"
  static:
    echo "static"

Conceptually this is the AST we store for the module:

  import strutils

  var x*
  {.compile: "foo.c".}
  proc p
  proc q
  static:
    echo "static"

The symbol's body is loaded lazily, on demand. This is where most
savings come from, only the shallow outer AST is reconstructed immediately.

It is also important that the replay involves the import statement so
that the dependencies are resolved properly.

Shared global compiletime state

Nim allows .global, compiletime variables that can be filled by macro
invokations across different modules. This feature breaks modularity in a
severe way. Plenty of different solutions have been proposed:

Restrict the types of global compiletime variables to Set[T] or
similar unordered, only-growable collections so that we can track
the module's write effects to these variables and reapply the changes
in a different order.
In every module compilation, reset the variable to its default value.
Provide a restrictive API that can load/save the compiletime state to
a file.

(These solutions are not mutually exclusive.)

Since we adopt the "replay the top level statements" idea, the natural
solution to this problem is to emit pseudo top level statements that
reflect the mutations done to the global variable. However, this is
MUCH harder than it sounds, for example squeaknim uses this
snippet:

  apicall.add(") module: '" & dllName & "'>\C" &
              "\t^self externalCallFailed\C!\C\C")
  stCode.add(st & "\C\t\"Generated by NimSqueak\"\C\t" & apicall)

We can "replay" stCode.add only if the values of st
and apicall are known. And even then a hash table's add with its
hashing mechanism is too hard to replay.

In practice, things are worse still, consider someGlobal[i][j].add arg.
We only know the root is someGlobal but the concrete path to the data
is unknown as is the value that is added. We could compute a "diff" between
the global states and use that to compute a symbol patchset, but this is
quite some work, expensive to do at runtime (it would need to run after
every module has been compiled) and also would break for hash tables.

We need an API that hides the complex aliasing problems by not relying
on Nim's global variables. The obvious solution is to use string keys
instead of global variables:

  proc cachePut*(key: string; value: string)
  proc cacheGet*(key: string): string

However, the values being strings or json is quite problematic: Many
lookup tables that are built at compiletime embed proc vars and
types which have no obvious string representation... Seems like
AST diffing is still the best idea as it will not require to use
an alien API and works with some existing Nimble packages, at least.

On the other hand, in Nim's future I would like to replace the VM
by native code. A diff algorithm wouldn't work for that.
Instead the native code would work with an API like put, get:

  proc cachePut*(key: string; value: NimNode)
  proc cacheGet*(key: string): NimNode

The API should embrace the AST diffing notion: See the
module macrocache for the final details.

Methods and type converters

In the following
sections global means shared between modules or property of the whole
program.

Nim contains language features that are global. The best example for that
are multi methods: Introducing a new method with the same name and some
compatible object parameter means that the method's dispatcher needs to take
the new method into account. So the dispatching logic is only completely known
after the whole program has been translated!

Other features that are implicitly triggered cause problems for modularity
too. Type converters fall into this category:

  # module A
  converter toBool(x: int): bool =
    result = x != 0

  # module B
  import A

  if 1:
    echo "ugly, but should work"

If in the above example module B is re-compiled, but A is not then
B needs to be aware of toBool even though toBool is not referenced
in B explicitly.

Both the multi method and the type converter problems are solved by the
AST replay implementation.

Generics

We cache generic instantiations and need to ensure this caching works
well with the incremental compilation feature. Since the cache is
attached to the PSym datastructure, it should work without any
special logic.

Backend issues

Init procs must not be "forgotten" to be called.
Files must not be "forgotten" to be linked.
Method dispatchers are global.
DLL loading via dlsym is global.
Emulated thread vars are global.

However the biggest problem is that dead code elimination breaks modularity!
To see why, consider this scenario: The module G (for example the huge
Gtk2 module...) is compiled with dead code elimination turned on. So none
of G's procs is generated at all.

Then module B is compiled that requires G.P1. Ok, no problem,
G.P1 is loaded from the symbol file and G.c now contains G.P1.

Then module A (that depends on B and G) is compiled and B
and G are left unchanged. A requires G.P2.

So now G.c MUST contain both P1 and P2, but we haven't even
loaded P1 from the symbol file, nor do we want to because we then quickly
would restore large parts of the whole program.

Solution

The backend must have some logic so that if the currently processed module
is from the compilation cache, the body is not accessed. Instead
the generated C(++) for the symbol's body needs to be cached too and
inserted back into the produced C file. This approach seems to deal with
all the outlined problems above.

Araq · 2018-06-08T06:17:26Z

Appendix: macrocache.nim

## This module provides an API for macros that need to collect compile
## time information across module boundaries in global variables.
## Starting with version 0.19 of Nim this is not directly supported anymore
## as it breaks incremental compilations.
## Instead the API here needs to be used. See XXX (wikipedia page) for a
## theoretical foundation behind this.

type
  CacheSeq* = distinct string
  CacheTable* = distinct string
  CacheCounter* = distinct string

proc value*(c: CacheCounter): int {.magic: "NccValue".}
proc inc*(c: CacheCounter; by = 1) {.magic: "NccInc".}

proc add*(s: CacheSeq; value: NimNode) {.magic: "NcsAdd".}
proc incl*(s: CacheSeq; value: NimNode) {.magic: "NcsIncl".}
proc len*(s: CacheSeq): int {.magic: "NcsLen".}
proc `[]`*(s: CacheSeq; i: int): NimNode {.magic: "NcsAt".}

iterator items*(s: CacheSeq): NimNode =
  for i in 0 ..< len(s): yield s[i]

proc `[]=`*(t: CacheTable; key: string, value: NimNode) {.magic: "NctPut".}
  ## 'key' has to be unique!

proc len*(t: CacheTable): int {.magic: "NctLen".}
proc `[]`*(t: CacheTable; key: string): NimNode {.magic: "NctGet".}

proc hasNext(t: CacheTable; iter: int): bool {.magic: "NctHasNext".}
proc next(t: CacheTable; iter: int): (string, NimNode, int) {.magic: "NctNext".}

iterator pairs*(t: CacheTable): (string, NimNode) =
  var h = 0
  while hasNext(t, h):
    let (a, b, h2) = next(t, h)
    yield (a, b)
    h = h2

This is the upcoming API that replaces the .global, compileTime vars that are currently available for Nim's macro system, but need to be deprecated since they cannot work with incremental compilation.

Please try to imagine and tell me if your current macros could be based on this API. :-) Note again that only global compileTime variables that are used across module boundaries are affected. The vast majority of macro code out there is not affected.

zah · 2018-06-08T08:17:03Z

Araq and I have discussed this issue many times in private conversations, so I'll give a bit more background information.

The main problem with global compile-time variables is that they should be deterministic in the presence of arbitrary module recompilations. For this to work reliably, every time you recompile a module you must first undo all of its previous effects on the global variables and then apply the effects produced from the changed code. Undoing and redoing changes to a data structure is not trivial because it may depend on the order of operations (imagine a simple set type with incl and excl being used from multiple modules that can be reordered in arbitrary way).

Luckily, there is a category of data structures which support manipulation from multiple participants in arbitrary order always arriving at the same result - the so called Conflict-free replicated data types. The CRDT structures include counters, sets and other useful primitives that can serve as the basis for implementing everything else (from Set Theory, we know that sets are enough to represent any concept).

Thus, to increase modularity, the proposed solution is to limit the interaction with global compile-time variables to a set operations over CRDT types which will be recorded by the compiler. This could work by blessing certain types for use in compile-time vars or by introducing a low-level API like the one described above allowing you to build abstractions on top of it.

It should be noted that just manipulating the compile-time variables correctly is not enough for implementing any generative programming idea. The user must also be able to generate code that is derived from the final "program-wide" value of the variable. I usually call such generative programming approaches the "Gather and Generate" pattern. Here are some examples:

1. Language bridge library

A library like nimpy or NimBorg may gather all Nim types that are being passed to Python and Lua functions (by adding them to a global compile-time set as a side-effect from instantiating a generic dot operator). It will then produce a registerNimTypes(foreignVM) proc that will register all the run-time type information required by the foreign VM for working with the Nim values.

Initially registerNimTypes exists as an empty placeholder proc (it can be marked in the code with the partial pragma). Once the final value of the global compile-time variable is determined, a "sealing" compile-time proc is executed that will generate the body of registerNimTypes.

2. RPC library

A RCP library may allow you to create RCP end-points spread across several modules. A rpc macro is used to register them in a global compile-time set. In the sealing proc, the set or RPC procs is sorted by alphabetical order or by time of introduction (this can be achieved with a staticWrite API that will automatically update a file stored in version history). The protocol is versioned by hashing the signatures of all RPCs and each of them is assigned a numeric ID for maximum efficiency (from the sorter order). The numeric ID is written to a placeholder symbol referenced in the generated code of each RPC call.

3. User-defined multi-methods, HTTP requests router, and so on.

These are quite similar to the RPC library. The sealing proc can traverse the hierarchy of types to generate the required run-time dispatching logic, it can gather all HTTP routes and compile a single Ragel state machine for the most efficient routing possible and so on.

This Gather and Generate technique is one of the most common generative programming patterns and I have a lot of experience simulating it with global run-time variables and type erasure tricks in both C++ and Nim with the currently available functionality.

zah · 2018-06-08T08:30:54Z

So, from the description above and Araq's current proposal, we need to address the following:

How does the user introduce a new instance of a shared data structures (CacheSeq, CacheTable, CacheCounter, etc)? Is this done by introducing a variable?
Are these types CRDT data structures? How are they implemented?
How is the "sealing" stage exposed to the user?
Modifications to global variables can be introduced from static: code executed once per generic instantiation. For this to work, the instantiations need to be reference counted (each client module increases the count). Once a side effect is introduced, it should be removed only when the count drops back to zero. Does this play well with the rest of the proposal?
If the lower-level API is based on NimNodes, can we have higher-level APIs that are strongly typed?
To complete the feature, we also need the staticWrite magic for creating files that will go into version control. This is important, because it allows you to produce "cached" artifacts that will affect your co-workers as well.

Araq · 2018-06-08T08:35:54Z

const
  key = CacheCounter"package.module.myCounter" # obviously we can hide that better later.

static:
  inc(key, 2)
  inc(key)

The table is an ordered BTree, the seq is a sequence with "set" semantics, but some order is exposed as you need to be able to iterate over it.
There is no sealing stage.
I don't understand this question.
Probably, but under the hood it can only work with NimNodes.
Ok, that seems rather easy to add.

tavurth · 2018-08-22T11:21:02Z

As far as I understand from this thread, we're going to see a huge speed increase for this in the future.

It seems like most of the slowdown is being caused by the top level statements being replayed on each compilation.

Does this mean that by reducing the number or complexity of top-level statements, we can reduce the processing time required for each module?

Example of current implementation method:

## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b:int): int =
    return a + b

## ------------
## a.nim
import b

echo add(a + b)

Would any performance improvement be given by for example first compiling b.nim to a dylib or so and then writing a small wrapper which interfaces with the compiled library?

## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b:int): int {.exportc: "add"} =
    return a + b

b.nim is then precompiled into ./lib/b.so.

## ------------
## b_wrapper.nim
proc add*(a: int): int {.importc, dynlib: "./lib/b.so".}

## ------------
## a.nim
import b_wrapper

echo add(a + b)

Would this perhaps speed up development processing cycles?

Araq · 2018-08-22T13:48:06Z

@tavurth This describes the "simplest thing that can possibly work" and later on we have more ideas of how to make the compiler even more incremental and avoid the replaying of top level statements. However, I doubt DLLs are a viable solution to this problem as the code/AST must be available so that the compiler can perform the many compiletime evaluation features that Nim offers.

xomachine · 2018-09-10T12:27:16Z

That is my use-case:
The NESM library, I'm working on, uses global variable "ctx" which contains closures for generating [de]serialization code for all previously encountered types.
The closures are used to subtitute the object name in the generated code like:

proc serialize(source: NimNode): NimNode = ... # the closure
# it generates folowing code
# for source = IdentNode("a")
a.writeData(...)
# for source = DotExpr(IdentNode("a"), IdenNode("b"))
a.b.writeData(...)
# and so on...

It can be rewritten to use only NimNode (with some efforts though), but I need a contains proc for the CacheTable.
And I want to know if CacheTable can be used as argument to the function located in the other module.

# moduleA.nim
proc useCache*(a: CacheTable) {.compileTime.} =
  ...

# moduleB.nim
from moduleA import useCache
const x = CacheTable("mycachetable")
...
useCache(x)

PS: it would be nice to have an ability to overwrite the cached value from the CacheTable at the compile time without 'key already exists' error.

zielmicha · 2019-02-10T19:22:05Z

I have put a small bounty ($50) on Bountysource on this issue. (https://www.bountysource.com/issues/68171609-incremental-compilation)

metagn · 2020-11-29T17:15:14Z

3 questions for clarification:

Are macrocache items supposed to be mutable? Because the current implementation allows this, like so:

import macros, macrocache
static:
  let cache = CacheSeq"foo"
  cache.add(newTree(nnkPar))
  cache[0].add(ident"abc")

Will using compileTime globals instead of macrocache still be allowed at the cost of breaking incremental compilation? The macrocache documentation says it "is not directly supported anymore" and I'm only guessing that means "problems that compiletime globals have that macrocache can fix will not be addressed" rather than "compiletime globals should never be used". I don't support the latter because macrocache is a restriction on the power that macros have and there are some things that it might not be able to do or won't support that you might want to do.
the seq is a sequence with "set" semantics, but some order is exposed as you need to be able to iterate over it.

This means that they have order, however they won't store identical references or something, is that correct? I don't know because the current implementation allows for you to use both "equal" and identical nodes.

Araq · 2020-11-30T14:46:31Z

IIRC this won't work and shouldn't be allowed.
Yes but I consider features that "break IC" moribund. IC is not negotiable as far as I'm concerned. Maybe we need a more extensive macrocache API to support more use-cases, maybe we can do away by tracking dependencies in the VM without API additions but IC is more important than anything else to me. Especially now that arc/orc and =hooks are production ready.
They have an arbitrary order, much like iterating over a hash table.

Araq · 2021-04-09T17:02:40Z

The first milestone is complete: IC works for 4 carefully selected test cases. There will be a follow-up milestone.

narimiran transferred this issue from nim-lang/Nim Jan 2, 2019

zah mentioned this issue May 16, 2019

make global variables static when possible nim-lang/Nim#11236

Closed

Araq added this to the Nim2020 milestone Nov 8, 2019

mratsim mentioned this issue Dec 11, 2020

Developer tooling #300

Open

vitreo12 mentioned this issue Mar 22, 2021

Incremental compilation vitreo12/omni#151

Open

Araq added Accepted RFC implemented labels Apr 9, 2021

Araq closed this as completed Apr 9, 2021

metagn mentioned this issue Feb 5, 2022

[Feature Request] Add contains, pop, dec and more to macrocache nim-lang/Nim#19494

Open

6 tasks

zielmicha mentioned this issue Sep 9, 2018

[Grant Proposal] Make Nim suitable for interactive programming (Implement hot code-reloading, a REPL and a Jupyter kernel) #511

Open

Ethosa mentioned this issue Sep 13, 2023

Migrating to macrocache stdlib 🔥 HapticX/happyx#145

Closed

zah mentioned this issue Jan 13, 2024

Avoid global in p2p macro (fixes #4578) status-im/nimbus-eth2#5719

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental compilation #46

Incremental compilation #46

Araq commented May 24, 2018

Araq commented Jun 8, 2018

Araq commented Jun 8, 2018

zah commented Jun 8, 2018 •

edited

Loading

zah commented Jun 8, 2018 •

edited

Loading

Araq commented Jun 8, 2018 •

edited

Loading

tavurth commented Aug 22, 2018

Araq commented Aug 22, 2018

xomachine commented Sep 10, 2018 •

edited

Loading

zielmicha commented Feb 10, 2019

metagn commented Nov 29, 2020 •

edited

Loading

Araq commented Nov 30, 2020

Araq commented Apr 9, 2021

Incremental compilation #46

Incremental compilation #46

Comments

Araq commented May 24, 2018

Araq commented Jun 8, 2018

General approach: AST replay

Shared global compiletime state

Methods and type converters

Generics

Backend issues

Solution

Araq commented Jun 8, 2018

Appendix: macrocache.nim

zah commented Jun 8, 2018 • edited Loading

1. Language bridge library

2. RPC library

3. User-defined multi-methods, HTTP requests router, and so on.

zah commented Jun 8, 2018 • edited Loading

Araq commented Jun 8, 2018 • edited Loading

tavurth commented Aug 22, 2018

Araq commented Aug 22, 2018

xomachine commented Sep 10, 2018 • edited Loading

zielmicha commented Feb 10, 2019

metagn commented Nov 29, 2020 • edited Loading

Araq commented Nov 30, 2020

Araq commented Apr 9, 2021

zah commented Jun 8, 2018 •

edited

Loading

zah commented Jun 8, 2018 •

edited

Loading

Araq commented Jun 8, 2018 •

edited

Loading

xomachine commented Sep 10, 2018 •

edited

Loading

metagn commented Nov 29, 2020 •

edited

Loading