diff --git a/.travis.yml b/.travis.yml
index f4636e759ecff9..d7cf3002f686bb 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -101,6 +101,7 @@ before_install:
           export JULIA_CPU_CORES=2;
           export JULIA_TEST_MAXRSS_MB=600;
           TESTSTORUN="all --skip linalg/triangular subarray"; fi # TODO: re enable these if possible without timing out
+    - echo "override JULIA_CPU_TARGET=generic;native" >> Make.user
     - git clone -q git://git.kitenet.net/moreutils
 script:
     - echo BUILDOPTS=$BUILDOPTS
diff --git a/contrib/windows/appveyor_build.sh b/contrib/windows/appveyor_build.sh
index 4f1ab3451016c1..33f101cf18eb68 100755
--- a/contrib/windows/appveyor_build.sh
+++ b/contrib/windows/appveyor_build.sh
@@ -53,6 +53,7 @@ else
   echo 'LIBBLAS = -L$(JULIAHOME)/usr/bin -lopenblas' >> Make.user
   echo 'LIBBLASNAME = libopenblas' >> Make.user
 fi
+echo "override JULIA_CPU_TARGET=generic;native" >> Make.user
 
 # Set XC_HOST if in Cygwin or Linux
 case $(uname) in
diff --git a/doc/src/devdocs/sysimg.md b/doc/src/devdocs/sysimg.md
index 2248911710e907..ba0221d4752082 100644
--- a/doc/src/devdocs/sysimg.md
+++ b/doc/src/devdocs/sysimg.md
@@ -38,3 +38,68 @@ and `force` set to `true`, one would execute:
 ```
 julia build_sysimg.jl /tmp/sys core2 ~/userimg.jl --force
 ```
+
+## System image optimized for multiple microarchitectures
+
+The system image can be compiled simultaneously for multiple CPU microarchitectures
+under the same instruction set architecture (ISA). Multiple versions of the same function
+may be created, with a minimal number of dispatch points inserted into shared functions,
+in order to take advantage of different ISA extensions or other microarchitecture features.
+The version that offers the best performance will be selected automatically at runtime
+based on the available features.
+
+### Specifying multiple system image targets
+
+A multi-microarchitecture system image can be enabled by passing multiple targets
+during system image compilation. This can be done either with the `JULIA_CPU_TARGET` make option
+or with the `-C` command line option when running the compilation command manually.
+Multiple targets are separated by `;` in the option.
+The syntax for each target is a CPU name followed by zero or more features, separated by `,`.
+All features supported by LLVM are supported, and a feature can be disabled with a `-` prefix.
+(A `+` prefix is also accepted and ignored, for consistency with LLVM syntax.)
+Additionally, two special features are supported to control the function cloning behavior.
+
+1. `clone_all`
+
+    By default, only functions that are most likely to benefit from
+    the microarchitecture features will be cloned.
+    When `clone_all` is specified for a target, however,
+    **all** functions in the system image will be cloned for the target.
+    The negative form `-clone_all` can be used to prevent the built-in
+    heuristic from cloning all functions.
+
+2. `base(<n>)`
+
+    Here `<n>` is a placeholder for a non-negative integer (e.g. `base(0)`, `base(1)`).
+    By default, a partially cloned (i.e. not `clone_all`) target will use functions
+    from the default target (the first one specified) if a function is not cloned.
+    This behavior can be changed by specifying a different base with the `base(<n>)` option.
+    The `n`th target (0-based) will then be used as the base target instead of the default (`0`th) one.
+    The base target has to be either `0` or another `clone_all` target.
+    Specifying a non-`clone_all` target other than `0` as the base target will cause an error.
+
+### Implementation overview
+
+This is a brief overview of the different parts involved in the implementation.
+See the code comments in each component for more implementation details.
+
+1. System image compilation
+
+    The parsing and cloning decisions are made in `src/processor*`.
+    We currently support cloning functions based on the presence of loops, SIMD instructions,
+    or other math operations (e.g. fastmath, fma, muladd).
+    This information is passed on to `src/llvm-multiversioning.cpp`, which does the actual cloning.
+    In addition to cloning functions and inserting dispatch slots
+    (see the comments in `MultiVersioning::runOnModule` for how this is done),
+    the pass also generates metadata so that the runtime can load and initialize the
+    system image correctly.
+    A detailed description of the metadata is available in `src/processor.h`.
+
+2. System image loading
+
+    The loading and initialization of the system image are done in `src/processor*` by
+    parsing the metadata saved during system image generation.
+    Host feature detection and target selection are done in `src/processor_*.cpp`,
+    depending on the ISA. Target selection prefers an exact CPU name match,
+    a larger vector register size, and a larger number of features.
+    An overview of this process is in `src/processor.cpp`.
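The target-string syntax described under "Specifying multiple system image targets" can be illustrated with a small Python sketch. This is not Julia's parser, which is C++ code in `src/processor*`. The names `Target`, `parse_cpu_target`, and `validate_bases` are hypothetical, and the sketch models only the rules stated in the text: `;` between targets, `,` between features, `-` to disable a feature, an ignored leading `+`, `clone_all`/`-clone_all`, and the `base(<n>)` constraint. It also assumes that a `base(<n>)` index that does not name another target is an error.

```python
import re
from dataclasses import dataclass, field

# Matches the special base(<n>) feature, where <n> is a non-negative integer.
BASE_RE = re.compile(r"base\((\d+)\)", re.ASCII)


@dataclass
class Target:
    """One ';'-separated entry of a CPU target string (hypothetical model)."""
    name: str
    enabled: list = field(default_factory=list)   # features with no prefix or a '+'
    disabled: list = field(default_factory=list)  # features with a '-' prefix
    clone_all: bool = False
    base: int = 0  # index of the target used for functions that are not cloned


def parse_cpu_target(spec):
    """Split e.g. 'generic;haswell,-rdrnd,base(0)' into Target objects."""
    targets = []
    for entry in spec.split(";"):
        name, *features = entry.split(",")
        if not name:
            raise ValueError(f"missing CPU name in {entry!r}")
        target = Target(name=name)
        for feat in features:
            match = BASE_RE.fullmatch(feat)
            if match:
                target.base = int(match.group(1))
            elif feat in ("clone_all", "+clone_all"):
                target.clone_all = True
            elif feat == "-clone_all":
                # Explicit opt-out; the real cloning heuristic is not modeled here.
                target.clone_all = False
            elif feat.startswith("-"):
                target.disabled.append(feat[1:])
            else:
                # A leading '+' is accepted and ignored, as in LLVM syntax.
                target.enabled.append(feat[1:] if feat.startswith("+") else feat)
        targets.append(target)
    return targets


def validate_bases(targets):
    """Enforce the rule: a base must be target 0 or another clone_all target."""
    for i, target in enumerate(targets):
        if target.base == 0:
            continue  # the default (first) target is always an allowed base
        if target.base >= len(targets) or target.base == i:
            raise ValueError(f"target {i}: base({target.base}) is not another target")
        if not targets[target.base].clone_all:
            raise ValueError(f"target {i}: base({target.base}) is not a clone_all target")
```

Parsing and validation are kept separate because syntax errors can be detected entry by entry, while the `base(<n>)` rule can only be checked once the whole target list is known.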
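The selection preference described under "System image loading" can also be sketched in Python. This is a simplified model that rests on several assumptions. Each compiled target is reduced to a hypothetical `CompiledTarget` record (name, required features, vector register width). A target is considered usable only if the host supports every feature it requires. The three stated preferences are compared lexicographically in the order listed, and ties go to the earlier target. The real ordering and tie-breaking live in `src/processor_*.cpp` and may differ. The feature names used in any example are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledTarget:
    """Hypothetical summary of one target recorded in the system image."""
    name: str
    features: frozenset  # features the cloned code requires
    vector_bits: int     # widest vector register the clone uses


def select_target(targets, host_name, host_features):
    """Return the index of the preferred target that the host can run."""
    best_index, best_key = None, None
    for i, target in enumerate(targets):
        if not target.features <= host_features:
            continue  # requires a feature the host lacks, so it cannot run here
        # Assumed lexicographic priority: name match, vector width, feature count.
        key = (target.name == host_name, target.vector_bits, len(target.features))
        # A strict '>' keeps the earlier target when two keys tie.
        if best_key is None or key > best_key:
            best_index, best_key = i, key
    if best_index is None:
        raise RuntimeError("no system image target is compatible with this host")
    return best_index
```

Filtering on feature compatibility before ranking means an exact CPU name match can never select code that uses an instruction the host lacks, for example when a virtual machine hides some features.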