optimizing nim code for performance

Timothee Cour edited this page May 22, 2020 · 6 revisions

template vs inline performance

Inline procs WILL be noticeably slower than templates in debug builds (or in any build with --stacktrace), unless #13582 is merged and the user passes the --stacktrace:noinline option that it enables.

In debug builds (even with --stacktrace:off, and regardless of #13582) I've observed that templates are a little faster than inline procs, while the two run at the same speed in danger builds (and maybe with -d:release; I haven't checked). However, IMO codegen can be improved to make inline procs and templates equally fast even in debug builds, by removing the redundant assignment incurred by an inline proc (with --stacktrace:noinline or --stacktrace:off).
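To make the comparison above concrete, here is a minimal sketch of the two forms being compared. The names `addInline` and `addTemplate` are hypothetical and chosen only for illustration; the snippet shows that both forms compute the same result, and it does not measure or demonstrate any speed difference by itself.

```nim
# Illustrative sketch only: addInline and addTemplate are hypothetical names.

proc addInline(a, b: int): int {.inline.} =
  ## A proc marked with the inline pragma: the backend C compiler is
  ## asked to inline calls to it.
  a + b

template addTemplate(a, b: int): int =
  ## A template: the Nim compiler substitutes the body at each call site.
  a + b

var total1 = 0
var total2 = 0
for i in 0 ..< 1000:
  total1 = addInline(total1, i)
  total2 = addTemplate(total2, i)

# Both forms must agree on the result.
doAssert total1 == total2
```

To look for the difference described above, one could time a loop like this in a debug build and in a danger build; any numbers will depend on the compiler version and flags.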

inline semantics

pgo

optimization tricks

const t = "abcdefghijklmnopqrstuvwxyz".indent(65).cstring

Microbenchmarks are misleading: in real programs you can also run into I-cache problems, and then you appreciate that not everything was "optimized" with 8x loop unrolling, SSE instructions, and 1KB lookup tables.

strlen(s) < 5 => strnlen(s, 5) < 5 (strnlen stops scanning after at most 5 bytes, whereas strlen walks the whole string)
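A minimal sketch of this rewrite from Nim, assuming a POSIX-like C library that declares `strnlen` in `<string.h>`. The `importc` binding and the helper name `shorterThan5` are written here for illustration and are not taken from the Nim standard library.

```nim
# Sketch only: assumes the C library provides strnlen in <string.h>.
proc strnlen(s: cstring, maxlen: csize_t): csize_t {.importc, header: "<string.h>".}

proc shorterThan5(s: cstring): bool =
  ## Hypothetical helper: true when s has fewer than 5 bytes before the
  ## terminating NUL. strnlen looks at no more than 5 bytes, so the cost
  ## no longer grows with the length of s.
  strnlen(s, 5) < 5

doAssert shorterThan5("abc")
doAssert not shorterThan5("abcdefghijklmnopqrstuvwxyz")
```

The bound only pays off when long strings are common; for short inputs the two forms do about the same amount of work.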

https://lists.llvm.org/pipermail/llvm-dev/2016-February/095032.html