Finish up function multiversioning support
* Enable test function multiversioning on the CI

  We can't do too much cloning on the CI before hitting the timeout or memory limit...
  Also avoid turning on cloning on circle CI since we seem to be very close to the memory limit.

* Add devdoc
yuyichao committed Oct 13, 2017
1 parent 06ad31d commit 28a215d
Showing 3 changed files with 67 additions and 0 deletions.
1 change: 1 addition & 0 deletions .travis.yml
@@ -101,6 +101,7 @@ before_install:
export JULIA_CPU_CORES=2;
export JULIA_TEST_MAXRSS_MB=600;
TESTSTORUN="all --skip linalg/triangular subarray"; fi # TODO: re enable these if possible without timing out
- echo "override JULIA_CPU_TARGET=generic;native" >> Make.user
- git clone -q git://git.kitenet.net/moreutils
script:
- echo BUILDOPTS=$BUILDOPTS
1 change: 1 addition & 0 deletions contrib/windows/appveyor_build.sh
@@ -53,6 +53,7 @@ else
echo 'LIBBLAS = -L$(JULIAHOME)/usr/bin -lopenblas' >> Make.user
echo 'LIBBLASNAME = libopenblas' >> Make.user
fi
echo "override JULIA_CPU_TARGET=generic;native" >> Make.user

# Set XC_HOST if in Cygwin or Linux
case $(uname) in
65 changes: 65 additions & 0 deletions doc/src/devdocs/sysimg.md
@@ -38,3 +38,68 @@ and `force` set to `true`, one would execute:
```
julia build_sysimg.jl /tmp/sys core2 ~/userimg.jl --force
```

## System image optimized for multiple microarchitectures

The system image can be compiled simultaneously for multiple CPU microarchitectures
under the same instruction set architecture (ISA). Multiple versions of the same function
may be created, with minimal dispatch points inserted into shared functions,
in order to take advantage of different ISA extensions or other microarchitecture features.
The version that offers the best performance will be selected automatically at runtime
based on available features.
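
As a rough illustration of the idea (not the actual mechanism, which operates on compiled code), the sketch below models a dispatch point as a slot that is filled once when the image is loaded. The names `CLONES`, `SLOTS`, `load_image`, `dot_generic`, `dot_wide`, and `norm_squared` are all hypothetical.

```python
# Hypothetical clones of one function, indexed by target number.
def dot_generic(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def dot_wide(xs, ys):
    # Stand-in for a clone compiled with wider vector instructions.
    return sum(x * y for x, y in zip(xs, ys))


CLONES = {"dot": [dot_generic, dot_wide]}
SLOTS = {}  # one slot per cloned function, filled at load time


def load_image(selected_target):
    """Point every slot at the clone built for the selected target."""
    for name, versions in CLONES.items():
        SLOTS[name] = versions[selected_target]


def norm_squared(xs):
    # A shared (uncloned) function reaches the clone through its slot.
    return SLOTS["dot"](xs, xs)


load_image(1)
print(norm_squared([1, 2, 3]))
```

Shared code only pays for one indirection per call into cloned code, and the choice of version is made once rather than on every call.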

### Specifying multiple system image targets

A multi-microarchitecture system image can be enabled by passing multiple targets
during system image compilation. This can be done either with the `JULIA_CPU_TARGET` make option
or with the `-C` command line option when running the compilation command manually.
Multiple targets are separated by `;` in the option.
The syntax for each target is a CPU name followed by multiple features separated by `,`.
All features supported by LLVM are supported, and a feature can be disabled with a `-` prefix.
(A `+` prefix is also allowed and ignored, for consistency with LLVM syntax.)
Additionally, two special features are supported to control the function cloning behavior.

1. `clone_all`

By default, only functions that are the most likely to benefit from
the microarchitecture features will be cloned.
When `clone_all` is specified for a target, however,
**all** functions in the system image will be cloned for the target.
The negative form `-clone_all` can be used to prevent the built-in
heuristic from cloning all functions.

2. `base(<n>)`

Where `<n>` is a placeholder for a non-negative number (e.g. `base(0)`, `base(1)`).
By default, a partially cloned (i.e. not `clone_all`) target will use functions
from the default target (first one specified) if a function is not cloned.
This behavior can be changed by specifying a different base with the `base(<n>)` option.
The `n`th target (0-based) will be used as the base target instead of the default (`0`th) one.
The base target has to be either `0` or another `clone_all` target.
Specifying any other target (one that is neither `0` nor `clone_all`) as the base target will cause an error.
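
To make the syntax concrete, here is a minimal Python sketch of a parser for the target string described above. It is not the parser in `src/processor*`; the `Target` class and `parse_targets` function are hypothetical, and the example string is only an illustration that uses LLVM CPU and feature names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    cpu: str
    enabled: List[str] = field(default_factory=list)
    disabled: List[str] = field(default_factory=list)
    clone_all: Optional[bool] = None  # None: leave it to the built-in heuristic
    base: int = 0                     # index of the base target (0-based)


def parse_targets(spec):
    """Split on ';' for targets and ',' for the CPU name and features."""
    targets = []
    for part in spec.split(";"):
        cpu, *features = [item.strip() for item in part.split(",")]
        if not cpu:
            raise ValueError("empty CPU name")
        target = Target(cpu=cpu)
        for feature in features:
            if feature.startswith("+"):
                feature = feature[1:]  # '+' is allowed and ignored
            match = re.fullmatch(r"base\((\d+)\)", feature)
            if match:
                target.base = int(match.group(1))
            elif feature == "clone_all":
                target.clone_all = True
            elif feature == "-clone_all":
                target.clone_all = False
            elif feature.startswith("-"):
                target.disabled.append(feature[1:])
            elif feature:
                target.enabled.append(feature)
        targets.append(target)
    # The base must be target 0 or a target marked clone_all.
    for target in targets:
        if target.base != 0 and (
            target.base >= len(targets)
            or targets[target.base].clone_all is not True
        ):
            raise ValueError(f"invalid base for target {target.cpu!r}")
    return targets


spec = "generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
for index, target in enumerate(parse_targets(spec)):
    print(index, target)
```

In this example the third target is partially cloned and falls back to the second target (index `1`), which is valid only because that target is marked `clone_all`.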

### Implementation overview

This is a brief overview of the different parts involved in the implementation.
See the code comments for each component for more implementation details.

1. System image compilation

Parsing and cloning decisions are made in `src/processor*`.
We currently support cloning of functions based on the presence of loops, SIMD instructions,
or other math operations (e.g. fastmath, fma, muladd).
This information is passed on to `src/llvm-multiversioning.cpp` which does the actual cloning.
In addition to doing the cloning and inserting dispatch slots
(see comments in `MultiVersioning::runOnModule` for how this is done),
the pass also generates metadata so that the runtime can load and initialize the
system image correctly.
A detailed description of the metadata is available in `src/processor.h`.

2. System image loading

The loading and initialization of the system image are done in `src/processor*` by
parsing the metadata saved during system image generation.
Host feature detection and the selection decision are handled in `src/processor_*.cpp`
depending on the ISA. The target selection prefers an exact CPU name match,
a larger vector register size, and a larger number of features.
An overview of this process is in `src/processor.cpp`.
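
The selection preference described above can be sketched as a sort key. This is an illustration only: the `Candidate` class and `select_target` function are hypothetical, the strict priority order (name, then vector size, then feature count) and the compatibility filter are assumptions, and the vector sizes and feature sets are made-up sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    vector_bits: int
    features: frozenset


def select_target(candidates, host_name, host_features):
    """Pick the preferred target among those the host can run."""
    # Assumption: a target is usable only if the host has all its features.
    usable = [c for c in candidates if c.features <= host_features]
    if not usable:
        raise RuntimeError("no compatible target in the system image")
    # Assumed priority: exact name match, then vector size, then feature count.
    return max(
        usable,
        key=lambda c: (c.name == host_name, c.vector_bits, len(c.features)),
    )


candidates = [
    Candidate("generic", 128, frozenset()),
    Candidate("sandybridge", 256, frozenset({"avx"})),
    Candidate("haswell", 256, frozenset({"avx", "avx2", "fma"})),
]
print(select_target(candidates, "haswell", {"avx", "avx2", "fma", "bmi2"}).name)
```

Using a tuple as the key keeps the tie-breaking order explicit, so a change in priority is a one-line edit.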
