Incremental compilation #46
The implementation of the compilation cache is tricky: there are lots of issues to be solved for both the frontend and the backend.

General approach: AST replay

We store a module's AST after a successful semantic check in a SQLite database. Plenty of features make this tricky, for example:

```nim
{.compile: "foo.c".} # even if the module is loaded from the DB,
                     # "foo.c" needs to be compiled/linked.
```

The solution is to re-play the module's top level statements. In fact, this describes how the AST should be stored in the database, because replaying has to be cheap. Consider this module:

```nim
import strutils

var x*: int = 90
{.compile: "foo.c".}
proc p = echo "p"
proc q = echo "q"
static:
  echo "static"
```

Conceptually this is the AST we store for the module:

```nim
import strutils

var x*
{.compile: "foo.c".}
proc p
proc q
static:
  echo "static"
```

The symbol's body is loaded lazily, on demand. It is also important that the replay covers the `import` statements so that a module's dependencies are loaded as well.

Shared global compiletime state

Nim allows global `.compileTime` variables that can be mutated by macro invocations across module boundaries; this breaks modularity, and several solutions have been proposed for it. (These solutions are not mutually exclusive.) Since we adopt the "replay the top level statements" idea, the natural solution is to record and replay the mutations performed on such state. However, that is much harder than it sounds; for example, NimSqueak builds its state via string concatenations:

```nim
apicall.add(") module: '" & dllName & "'>\C" &
  "\t^self externalCallFailed\C!\C\C")
stCode.add(st & "\C\t\"Generated by NimSqueak\"\C\t" & apicall)
```

We can "replay" such mutations only if the involved values are known precisely. In practice, things are worse still: the mutated values can alias each other in arbitrary ways. We need an API that hides these complex aliasing problems by not relying on Nim's global compiletime variables:

```nim
proc cachePut*(key: string; value: string)
proc cacheGet*(key: string): string
```

However, the values being strings or JSON is quite problematic: many compile-time tables embed symbols and types that have no natural string representation. On the other hand, in Nim's future I would like to replace the VM with native code execution, so a `NimNode` based API is preferable:

```nim
proc cachePut*(key: string; value: NimNode)
proc cacheGet*(key: string): NimNode
```

The API should embrace the AST diffing notion: see the `macrocache` module in the appendix for the final details.

Methods and type converters

In the following, "global" means shared between modules, a property of the whole program.

Nim contains language features that are global. The best example for that is multi methods: a dispatcher has to take the methods declared in every module into account. Other features that are implicitly triggered cause problems for modularity too. Type converters fall into this category:

```nim
# module A
converter toBool(x: int): bool =
  result = x != 0
```

```nim
# module B
import A

if 1:
  echo "ugly, but should work"
```

If in the above example module B is re-compiled but module A is not, B still needs to be aware of `toBool`, even though B never references it explicitly. Both the multi method and the type converter problems are solved by the AST replay: re-playing A's top level statements re-registers the converter.

Generics

We cache generic instantiations and need to ensure this caching works well with the incremental compilation feature.

Backend issues

However, the biggest problem is that dead code elimination breaks modularity! Suppose module G is compiled with dead code elimination, so initially none of its procs is put into the generated code. Then module B is recompiled and needs G.p1, so G's generated code must now contain p1. Then module A is recompiled and needs G.p2. So now G's generated code must contain both p1 and p2, even though G itself never changed: the compilation of other modules affects the code generated for G.

Solution

The backend must have some logic so that if the currently processed module was loaded from the compilation cache, code generation can still be triggered for its symbols on demand.
Appendix: macrocache.nim

```nim
## This module provides an API for macros that need to collect compile
## time information across module boundaries in global variables.
## Starting with version 0.19 of Nim this is not directly supported anymore
## as it breaks incremental compilations.
## Instead the API here needs to be used. See XXX (wikipedia page) for a
## theoretical foundation behind this.

type
  CacheSeq* = distinct string
  CacheTable* = distinct string
  CacheCounter* = distinct string

proc value*(c: CacheCounter): int {.magic: "NccValue".}
proc inc*(c: CacheCounter; by = 1) {.magic: "NccInc".}

proc add*(s: CacheSeq; value: NimNode) {.magic: "NcsAdd".}
proc incl*(s: CacheSeq; value: NimNode) {.magic: "NcsIncl".}
proc len*(s: CacheSeq): int {.magic: "NcsLen".}
proc `[]`*(s: CacheSeq; i: int): NimNode {.magic: "NcsAt".}

iterator items*(s: CacheSeq): NimNode =
  for i in 0 ..< len(s): yield s[i]

proc `[]=`*(t: CacheTable; key: string, value: NimNode) {.magic: "NctPut".}
  ## 'key' has to be unique!

proc len*(t: CacheTable): int {.magic: "NctLen".}
proc `[]`*(t: CacheTable; key: string): NimNode {.magic: "NctGet".}

proc hasNext(t: CacheTable; iter: int): bool {.magic: "NctHasNext".}
proc next(t: CacheTable; iter: int): (string, NimNode, int) {.magic: "NctNext".}

iterator pairs*(t: CacheTable): (string, NimNode) =
  var h = 0
  while hasNext(t, h):
    let (a, b, h2) = next(t, h)
    yield (a, b)
    h = h2
```

This is the upcoming API that replaces global `.compileTime` variables. Please try to imagine and tell me if your current macros could be based on this API. :-) Note again that only global compileTime variables that are used across module boundaries are affected. The vast majority of macro code out there is not affected.
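To make the proposed API more concrete, here is a small usage sketch; the key names and the `record`/`dumpCollected` macros below are invented for illustration and assume the `macrocache` semantics shown above:

```nim
import macros, macrocache  # assumes the macrocache module proposed above

const
  collected = CacheSeq"example.collected"    # hypothetical key
  invocations = CacheCounter"example.count"  # hypothetical key

macro record*(n: typed): untyped =
  ## Store the expression's AST in the cross-module cache
  ## and count how often `record` was invoked.
  collected.add n
  inc(invocations)
  result = newEmptyNode()

macro dumpCollected*(): untyped =
  ## Emit an `echo` of the rendered AST for each recorded expression.
  result = newStmtList()
  for n in collected:
    result.add newCall(ident"echo", newLit(n.repr))
```

Any module in the program could call `record(someExpr)`; the main module would call `dumpCollected()` once to act on the whole gathered set.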
Araq and I have discussed this issue many times in private conversations, so I'll give a bit more background information. The main problem with global compile-time variables is that they should be deterministic in the presence of arbitrary module recompilations. For this to work reliably, every time you recompile a module you must first undo all of its previous effects on the global variables and then apply the effects produced by the changed code. Undoing and redoing changes to a data structure is not trivial, because the end result may depend on the order in which the operations are applied.

Luckily, there is a category of data structures that support manipulation by multiple participants in arbitrary order while always arriving at the same result: the so-called Conflict-free replicated data types (CRDTs). CRDTs include counters, sets and other useful primitives that can serve as the basis for implementing everything else (from set theory, we know that sets are enough to represent any concept). Thus, to increase modularity, the proposed solution is to limit the interaction with global compile-time variables to a set of operations over CRDT types, which will be recorded by the compiler. This could work by blessing certain types for use in compile-time vars or by introducing a low-level API like the one described above, allowing you to build abstractions on top of it.

It should be noted that just manipulating the compile-time variables correctly is not enough for implementing any generative programming idea. The user must also be able to generate code that is derived from the final "program-wide" value of the variable. I usually call such generative programming approaches the "Gather and Generate" pattern. Here are some examples:

1. Language bridge library: A library like nimpy or NimBorg may gather all Nim types that are being passed to Python and Lua functions (by adding them to a global compile-time set as a side effect of instantiating a generic dot operator). It can then generate the marshalling code for these types in a final "sealing" step.

2. RPC library: An RPC library may allow you to create RPC end-points spread across several modules. A sealing step can then gather all registered end-points and generate the dispatching code.

3. User-defined multi-methods, HTTP request routers, and so on: These are quite similar to the RPC library. The sealing proc can traverse the hierarchy of types to generate the required run-time dispatching logic, it can gather all HTTP routes and compile a single Ragel state machine for the most efficient routing possible, and so on.

This Gather and Generate technique is one of the most common generative programming patterns, and I have a lot of experience simulating it with global run-time variables and type-erasure tricks in both C++ and Nim with the currently available functionality.
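The Gather and Generate pattern described above can be sketched on top of the proposed `macrocache` API; everything below (the key, the `{.rpc.}` pragma, the `sealRpc` dispatcher) is invented for illustration, not an existing library:

```nim
import macros, macrocache  # assumes the proposed macrocache module

const endpoints = CacheSeq"rpc.endpoints"  # hypothetical key

macro rpc*(p: untyped): untyped =
  ## Gather: record the proc's name; usable as a `{.rpc.}` pragma
  ## on proc definitions spread across several modules.
  endpoints.add p.name
  result = p  # leave the proc definition itself unchanged

macro sealRpc*(): untyped =
  ## Generate: called once in the main module, after all imports,
  ## to emit a dispatcher over every gathered end-point.
  let nameParam = ident"name"
  var body = newStmtList()
  for ep in endpoints:
    let epName = newLit(ep.repr)
    body.add quote do:
      if `nameParam` == `epName`: `ep`()
  result = quote do:
    proc dispatch*(`nameParam`: string) =
      `body`
```

A module would declare `proc ping() {.rpc.} = ...`, and the main module would call `sealRpc()` once to obtain the program-wide `dispatch` proc.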
So, from the description above and Araq's current proposal, there are several points we need to address.
```nim
const
  key = CacheCounter"package.module.myCounter" # obviously we can hide that better later.

static:
  inc(key, 2)
  inc(key)
```
As far as I understand from this thread, we're going to see a huge speed increase for this in the future. It seems like most of the slowdown is caused by the top level statements being replayed on each compilation. Does this mean that by reducing the number or complexity of top-level statements, we can reduce the processing time required for each module?

Example of the current implementation method:

```nim
## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b: int): int =
  return a + b
```

```nim
## ------------
## a.nim
import b

echo add(1, 2)
```

Would any performance improvement be given by, for example, first compiling `b` into a shared library and importing it through a wrapper?

```nim
## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b: int): int {.exportc: "add".} =
  return a + b
```

```nim
## ------------
## b_wrapper.nim
proc add*(a: int, b: int): int {.importc, dynlib: "./lib/b.so".}
```

```nim
## ------------
## a.nim
import b_wrapper

echo add(1, 2)
```

Would this perhaps speed up development processing cycles?
@tavurth This describes the "simplest thing that can possibly work" and later on we have more ideas for how to make the compiler even more incremental and avoid the replaying of top level statements. However, I doubt DLLs are a viable solution to this problem, as the code/AST must be available so that the compiler can perform the many compile-time evaluation features that Nim offers.
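To illustrate why the AST must remain available (an invented example, not from the thread): if `b.nim` exports a template or macro, importing modules expand it during their own compilation, which a pre-compiled shared library cannot provide.

```nim
# b.nim (invented example)
template withLogging*(body: untyped) =
  ## Expanded inside the *importing* module at compile time, so the
  ## compiler needs b's AST, not its machine code.
  echo "begin"
  body
  echo "end"
```

```nim
# a.nim (invented example)
import b

withLogging:
  echo "work"  # expands to: echo "begin"; echo "work"; echo "end"
```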
That is my use-case:

```nim
proc serialize(source: NimNode): NimNode = ... # the closure
# it generates the following code
# for source = IdentNode("a"):
a.writeData(...)
# for source = DotExpr(IdentNode("a"), IdentNode("b")):
a.b.writeData(...)
# and so on...
```

It can be rewritten to use only NimNode (with some effort though), but I need a way to share the cache between modules:

```nim
# moduleA.nim
proc useCache*(a: CacheTable) {.compileTime.} =
  ...
```

```nim
# moduleB.nim
from moduleA import useCache

const x = CacheTable("mycachetable")
...
useCache(x)
```

PS: it would be nice to have the ability to overwrite a cached value in the CacheTable at compile time without a 'key already exists' error.
I have put a small bounty ($50) on Bountysource on this issue. (https://www.bountysource.com/issues/68171609-incremental-compilation)
3 questions for clarification:
The first milestone is complete: IC works for 4 carefully selected test cases. There will be a follow-up milestone.
Currently, in larger Nim projects the Nim compiler is the bottleneck, not the C/C++ compiler. This is caused by the fact that the Nim compiler always recompiles the full program, including all used libraries. This is not necessary: Nim should use a "compilation cache" so that only the modules that were changed (plus possibly the modules that depend on them) are recompiled. The current idea is to base this compilation cache on an SQLite database. This feature is crucial for Nim's scalability to bigger projects.
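For illustration only, a toy sketch in Nim of what an SQLite-backed cache check might look like; the schema, the hashes, and `needsRecompile` are invented here, not the compiler's actual design:

```nim
import db_sqlite  # Nim stdlib SQLite wrapper

let db = open(":memory:", "", "", "")

# Invented schema: one row per module with a hash of its source;
# a module is recompiled only when its stored hash differs.
db.exec(sql"""CREATE TABLE modules(
  id INTEGER PRIMARY KEY,
  fullpath TEXT NOT NULL UNIQUE,
  srcHash TEXT NOT NULL)""")

proc needsRecompile(db: DbConn; path, currentHash: string): bool =
  # getValue returns "" when the module is unknown, which also
  # (correctly) forces a recompile.
  let stored = db.getValue(
    sql"SELECT srcHash FROM modules WHERE fullpath = ?", path)
  result = stored != currentHash

db.exec(sql"INSERT INTO modules(fullpath, srcHash) VALUES (?, ?)",
        "lib/pure/strutils.nim", "deadbeef")
echo needsRecompile(db, "lib/pure/strutils.nim", "deadbeef") # false
echo needsRecompile(db, "lib/pure/strutils.nim", "cafebabe") # true
db.close()
```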