Preparing for v2.3 #170

Merged: 50 commits, Sep 29, 2018
Commits
c4056a7
started basic work on avx512 (completely wrong results currently)
manodeep Mar 22, 2018
ce978f4
Fixed the number of bits set in the mask. npairs now agrees with the …
manodeep Mar 22, 2018
ccf47e6
Trying mask loads. everything is broken now, including npairs
manodeep Mar 23, 2018
638f9fe
Working version with AVX512 (requires AVX512VL, ie Skylake cpus). Not…
manodeep Mar 23, 2018
904715a
Compiles with gcc7.3 (but not with gcc6.4) - might be a compiler bug
manodeep Mar 24, 2018
55ce36a
AVX512 is awesome! Supports masked horizontal adds across vector regi…
manodeep Mar 24, 2018
48e4f6b
Cleaned up the logic in the initial part of the loop.
manodeep Mar 24, 2018
b0f0b81
Improved handling of missing numpy. Renamed the fma macros to reflect…
manodeep Mar 24, 2018
8a9784c
Moved all the union declarations into the header files. Removed the m…
manodeep Mar 24, 2018
0e3122e
Added the AVX2 implementation for wp. No performance improvement beca…
manodeep Mar 24, 2018
eb71b3a
mostly finished with the avx512
manodeep Mar 27, 2018
350d97a
Cleaned up the dependency statements in the Makefiles. Protected the …
manodeep Apr 1, 2018
a464060
Replaced the mask loads with maskz loads. Replaced the comparison wit…
manodeep Apr 3, 2018
661af64
Should not compile python extensions if python/numpy are not available
manodeep Apr 3, 2018
4386083
Added the avx512f kernels for DDrppi_mocks. Integration tests pass. T…
manodeep Apr 3, 2018
e0a0daf
Bumped version
manodeep Apr 3, 2018
d65be85
Silencing compiler warning about meaningless type qualifier
manodeep Apr 3, 2018
ae76711
Replaced the Newton-Raphson steps with FMA equivalents
manodeep Apr 10, 2018
dea1d18
Fixed a logic error for the AVX512 case with count_vectorized option.…
manodeep Apr 10, 2018
c1323ff
Updated the scripts for benchmarking the speedup from avx512f
manodeep Apr 11, 2018
bedfd16
Added the DD kernel. Integration tests pass
manodeep Apr 22, 2018
6b58e17
Added the xi kernel. Integration tests pass
manodeep Apr 22, 2018
76a5784
Added the avx512f kernel for DDrppi. Integration tests pass
manodeep Apr 24, 2018
5dc64d0
Replaced the set zero to the actual intrinsic dedicated to that purpose
manodeep Apr 24, 2018
d8e83f3
Fixed the setzero int call
manodeep Apr 25, 2018
ff926cf
Added the DDsmu pair counter. Not real speed improvement; tests pass
manodeep Apr 25, 2018
e30cee9
Cleaned up the initial search for valid pairs in wp
manodeep Apr 25, 2018
cb7a073
Added the DDsmu_mocks (compiles but no checks have been run)
manodeep Apr 28, 2018
561bfeb
Added the DDtheta kernel (not compiled, tested or debugged)
manodeep Apr 29, 2018
a566f43
fast divide options needs to be added to DDsmu theory at some point […
manodeep Apr 29, 2018
30696ba
Merge branch 'avx512' of https://github.com/manodeep/Corrfunc into av…
manodeep Apr 29, 2018
16470c2
Fixed bug in DDtheta. Integration tests pass now
manodeep Apr 29, 2018
d3be28d
Removed unused variable. Tests pass for DDsmu_mocks
manodeep Apr 29, 2018
c741056
Adding in the fast_divide option to theory/DDsmu paircounter. Not tested
manodeep Apr 30, 2018
00c0ac5
Fixing the typos in fast-divide part of DDsmu
manodeep May 12, 2018
4a13dba
Fixed typo. And ported the nicer way of figuring out the start index …
manodeep May 12, 2018
2498523
Removed icc as default compiler
manodeep May 13, 2018
0573375
Attempting to fix travis failures from AVX512 code
manodeep May 13, 2018
51559b2
Removing the old xcode6.4 - seems to be not supported on travis any more
manodeep May 13, 2018
285791d
No longer counting blank lines. Partly fixes #160
manodeep Jun 29, 2018
f3f95b4
Thetamin of 0.0 is allowed
manodeep Jun 29, 2018
91ce6ba
Merge branch 'master' into avx512
manodeep Aug 20, 2018
a28cdb7
Fixing the build failure in #168
manodeep Aug 23, 2018
82d6ad9
Adding in the fast_divide option to the kernels. hopefully fixing bui…
manodeep Aug 27, 2018
52572a1
Fixing build failure for numpy dependency
manodeep Aug 24, 2018
b871725
Made a changelog entry [ci skip]
manodeep Aug 27, 2018
db9c19c
conda uses secure channels now
manodeep Aug 28, 2018
6ad6106
Merge branch 'min_sep_optimizations' into avx512
manodeep Sep 29, 2018
5ea7f49
Corrected the PR # [ci skip]
manodeep Sep 29, 2018
4463e52
Added in the PR # to the min separations [ci skip]
manodeep Sep 29, 2018
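
Several of the commits above mention getting the number of mask bits right and switching from mask loads to maskz loads. The kernels themselves are not part of this excerpt, so the following is only a scalar sketch of the general idea, assuming 8 doubles per 512-bit register. The names `remainder_mask` and `maskz_load_emulated` are hypothetical; with real intrinsics the load would be something like `_mm512_maskz_loadu_pd`.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8   /* doubles per 512-bit register (assumption for this sketch) */

/* Mask with the lowest `nleft` bits set, for the last partial chunk of a loop.
 * nleft must lie in [0, LANES]. */
static uint8_t remainder_mask(const int nleft)
{
    assert(nleft >= 0 && nleft <= LANES);
    return (uint8_t) ((1u << nleft) - 1u);
}

/* Scalar emulation of a zero-masked load: lanes whose mask bit is clear are
 * set to 0.0, and the source is never read for those lanes. */
static void maskz_load_emulated(double dst[LANES], const uint8_t mask, const double *src)
{
    for (int lane = 0; lane < LANES; lane++) {
        dst[lane] = ((mask >> lane) & 1u) ? src[lane] : 0.0;
    }
}
```

Zeroing the inactive lanes means later arithmetic on the register sees well-defined values instead of whatever was left in it, which is one plausible reason to prefer maskz loads over plain mask loads.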
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ wp
xi
DDrppi
wprp
+DDsmu
bin/*
include/*
run_correlations
25 changes: 12 additions & 13 deletions .travis.yml
@@ -33,7 +33,7 @@ matrix:
# before_install:
# - brew update
# - brew tap homebrew/versions && brew reinstall gcc49 --without-multilib
# - wget http://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
# - wget https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh

# - os: osx
# compiler: clang
@@ -42,36 +42,35 @@ matrix:
# - brew update
# - brew outdated xctool || brew upgrade xctool
# - brew tap homebrew/versions && brew install clang-omp
# - wget http://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
# - wget https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
- os: osx
osx_image: xcode9
compiler: clang
env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=3.6 DOCTEST=FALSE
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh


- os: osx
osx_image: xcode8
compiler: clang
env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=3.5 DOCTEST=FALSE
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh

- os: osx
osx_image: xcode7.3
compiler: clang
env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=2.7 DOCTEST=FALSE
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh

- wget https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh

# - os: osx
# compiler: gcc
# env: COMPILER=gcc-4.8 V='4.8' PYTHON_VERSION=3.5 FAMILY=gcc
# before_install:
# - brew update && brew tap homebrew/versions && brew install gcc48 --without-multilib
# - wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
# - wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh

# - os: linux
# dist: trusty
@@ -83,7 +82,7 @@ matrix:
# packages: ['clang-3.6','libgsl0-dev']
# env: COMPILER=clang-3.6 V=3.6 PYTHON_VERSION=2.7
# before_install:
# - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
# - wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh

# - os: linux
# dist: trusty
@@ -95,39 +94,39 @@ matrix:
# packages: ['clang-3.6','libgsl0-dev']
# env: COMPILER=clang-3.6 V=3.6 PYTHON_VERSION=3.5
# before_install:
# - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

- os: linux
dist: trusty
sudo: required
compiler: gcc
env: COMPILER=gcc PYTHON_VERSION=2.7
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh

- os: linux
dist: trusty
sudo: required
compiler: gcc
env: COMPILER=gcc PYTHON_VERSION=3.4
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

- os: linux
dist: trusty
sudo: required
compiler: gcc
env: COMPILER=gcc PYTHON_VERSION=3.5
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

- os: linux
dist: trusty
sudo: required
compiler: gcc
env: COMPILER=gcc PYTHON_VERSION=3.6
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

install:
- bash miniconda.sh -b -p $HOME/miniconda
14 changes: 14 additions & 0 deletions CHANGES.rst
@@ -10,6 +10,20 @@ New features
- conda installable package
- GPU version

2.3.0
=======

**Breaking Changes**
--------------------

New features
------------
- AVX512F kernels for all pair-counters [#167, #170]
- Faster code from new optimizations using the minimum separation between pairs of cells [#170]

Bug fixes
---------
- Fix segmentation fault in vpf_mocks [#168]

2.3.0
=======
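
The changelog entry about using the minimum separation between pairs of cells is not backed by code in this excerpt. As a rough illustration of the idea only, assuming axis-aligned cells and no periodic wrapping, a cell pair can be skipped when even its closest points are farther apart than the largest separation of interest. The struct `cell_bounds` and both function names are hypothetical.

```c
#include <assert.h>

/* Axis-aligned cell described by its lower and upper corners (hypothetical layout). */
typedef struct {
    double lo[3];
    double hi[3];
} cell_bounds;

/* Squared minimum distance between any point in cell a and any point in cell b.
 * Returns 0 when the cells touch or overlap. No periodic wrapping is applied. */
static double min_sep_sq(const cell_bounds *a, const cell_bounds *b)
{
    double sqr = 0.0;
    for (int d = 0; d < 3; d++) {
        double gap = 0.0;
        if (b->lo[d] > a->hi[d]) {
            gap = b->lo[d] - a->hi[d];
        } else if (a->lo[d] > b->hi[d]) {
            gap = a->lo[d] - b->hi[d];
        }
        sqr += gap * gap;
    }
    return sqr;
}

/* Returns 1 when no pair drawn from the two cells can be closer than rmax. */
static int can_skip_cell_pair(const cell_bounds *a, const cell_bounds *b, const double rmax)
{
    assert(rmax >= 0.0);
    return min_sep_sq(a, b) > rmax * rmax;
}
```

The check costs a handful of comparisons per cell pair, while the work it avoids scales with the product of the particle counts in the two cells.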
2 changes: 1 addition & 1 deletion Corrfunc/utils.py
@@ -485,7 +485,7 @@ def translate_isa_string_to_enum(isa):
except NameError:
if not isinstance(isa, str):
raise TypeError(msg)
-valid_isa = ['FALLBACK', 'AVX', 'SSE42', 'FASTEST']
+valid_isa = ['FALLBACK', 'AVX512F', 'AVX2', 'AVX', 'SSE42', 'FASTEST']
isa_upper = isa.upper()
if isa_upper not in valid_isa:
msg = "Desired instruction set = {0} is not in the list of valid "\
14 changes: 4 additions & 10 deletions common.mk
@@ -248,10 +248,10 @@ ifeq ($(DO_CHECKS), 1)
## done with check for conflicting options

ifeq (icc,$(findstring icc,$(CC)))
-CFLAGS += -xhost -opt-prefetch -opt-prefetch-distance=16 #-vec-report6
-ifeq (USE_OMP,$(findstring USE_OMP,$(OPT)))
-CFLAGS += -openmp
-CLINK += -openmp
+CFLAGS += -xhost -axCORE-AVX512
+ifeq (USE_OMP,$(findstring USE_OMP,$(OPT)))
+CFLAGS += -qopenmp
+CLINK += -qopenmp
endif ##openmp with icc
else ## not icc -> gcc or clang follow

@@ -353,11 +353,6 @@ ifeq ($(DO_CHECKS), 1)
endif # USE_OMP
endif # CC is clang

-# #### common options for gcc and clang
-# ifeq (USE_AVX,$(findstring USE_AVX,$(OPT)))
-# CFLAGS += -mavx
-# endif
-
CFLAGS += -funroll-loops
CFLAGS += -march=native -fno-strict-aliasing
CFLAGS += -Wformat=2 -Wpacked -Wnested-externs -Wpointer-arith -Wredundant-decls -Wfloat-equal -Wcast-qual
@@ -435,7 +430,6 @@ ifeq ($(DO_CHECKS), 1)
# python3-config failed; let's try python-config (for Python 2 or 3)
PYTHON_CONFIG_EXE:="$(PYTHON_SCRIPTS)/python-config"
endif

$(warning $(ccblue)"PYTHON"$(ccreset) is set to $(ccblue)$(PYTHON)$(ccreset); using $(ccblue)$(PYTHON_CONFIG_EXE)$(ccreset) as $(ccblue)python-config$(ccreset). If this is not correct, please also set $(ccblue)"PYTHON_CONFIG_EXE"$(ccreset) in $(ccgreen)"common.mk"$(ccreset) to appropriate $(ccblue)python-config$(ccreset))
endif

2 changes: 1 addition & 1 deletion mocks.options
@@ -11,7 +11,7 @@ OPT += -DLINK_IN_RA #link_in_dec must be enabled before link_in_ra
#### Floating point precision to use
OPT += -DDOUBLE_PREC

-#### If input distances are already in co-moving (relevant for DDrppi_mocks and vpf)
+#### If input distances are already in co-moving (relevant for DDrppi_mocks, DDsmu_mocks and vpf)
#OPT += -DCOMOVING_DIST


2 changes: 1 addition & 1 deletion mocks/DDrppi_mocks/DDrppi_mocks.c
@@ -219,7 +219,7 @@ int main(int argc, char *argv[])
/*---Count-pairs--------------------------------------*/
results_countpairs_mocks results;
struct config_options options = get_config_options();

/* Pack weights into extra options */
struct extra_options extra = get_extra_options(weight_method);
for(int w = 0; w < num_weights; w++){
6 changes: 3 additions & 3 deletions mocks/DDrppi_mocks/Makefile
@@ -11,8 +11,8 @@ LIBNAME := countpairs_rp_pi_mocks
LIBRARY := lib$(LIBNAME).a
LIBSRC := countpairs_rp_pi_mocks.c countpairs_rp_pi_mocks_impl_double.c countpairs_rp_pi_mocks_impl_float.c \
$(UTILS_DIR)/gridlink_mocks_impl_float.c $(UTILS_DIR)/gridlink_mocks_impl_double.c \
-$(UTILS_DIR)/utils.c $(UTILS_DIR)/progressbar.c $(UTILS_DIR)/cpu_features.c \
-$(UTILS_DIR)/set_cosmo_dist.c $(UTILS_DIR)/cosmology_params.c
+$(UTILS_DIR)/utils.c $(UTILS_DIR)/progressbar.c $(UTILS_DIR)/cpu_features.c $(UTILS_DIR)/avx512_calls.c \
+$(UTILS_DIR)/set_cosmo_dist.c $(UTILS_DIR)/cosmology_params.c
LIBRARY_HEADERS := $(LIBNAME).h

TARGET := DDrppi_mocks
@@ -25,7 +25,7 @@ INCL := countpairs_rp_pi_mocks_kernels_float.c countpairs_rp_pi_mocks_kernel
$(UTILS_DIR)/gridlink_mocks_impl_double.h $(UTILS_DIR)/gridlink_mocks_impl_float.h $(UTILS_DIR)/gridlink_mocks_impl.h.src \
$(UTILS_DIR)/cellarray_mocks_float.h $(UTILS_DIR)/cellarray_mocks_double.h $(UTILS_DIR)/cellarray_mocks.h.src \
$(UTILS_DIR)/set_cosmo_dist.h $(UTILS_DIR)/cosmology_params.h $(UTILS_DIR)/progressbar.h $(UTILS_DIR)/cpu_features.h \
-$(UTILS_DIR)/utils.h $(UTILS_DIR)/function_precision.h $(UTILS_DIR)/avx_calls.h $(UTILS_DIR)/defs.h \
+$(UTILS_DIR)/utils.h $(UTILS_DIR)/function_precision.h $(UTILS_DIR)/avx512_calls.h $(UTILS_DIR)/avx_calls.h $(UTILS_DIR)/defs.h \
$(UTILS_DIR)/weight_functions_double.h $(UTILS_DIR)/weight_functions_float.h $(UTILS_DIR)/weight_functions.h.src \
$(UTILS_DIR)/weight_defs_double.h $(UTILS_DIR)/weight_defs_float.h $(UTILS_DIR)/weight_defs.h.src

20 changes: 16 additions & 4 deletions mocks/DDrppi_mocks/countpairs_rp_pi_mocks_impl.c.src
@@ -107,15 +107,18 @@ countpairs_mocks_func_ptr_DOUBLE countpairs_rp_pi_mocks_driver_DOUBLE(const stru
{

static countpairs_mocks_func_ptr_DOUBLE function = NULL;
-static isa old_isa=-1;
+static isa old_isa = (isa) -1;
if(old_isa == options->instruction_set) {
return function;
}

/* Array of function pointers */
countpairs_mocks_func_ptr_DOUBLE allfunctions[] = {
+#ifdef __AVX512F__
+countpairs_rp_pi_mocks_avx512_intrinsics_DOUBLE,
+#endif
#ifdef __AVX__
-countpairs_rp_pi_mocks_avx_intrinsics_DOUBLE,
+countpairs_rp_pi_mocks_avx_intrinsics_DOUBLE,
#endif
#ifdef __SSE4_2__
countpairs_rp_pi_mocks_sse_intrinsics_DOUBLE,
@@ -125,10 +128,17 @@ countpairs_mocks_func_ptr_DOUBLE countpairs_rp_pi_mocks_driver_DOUBLE(const stru

const int num_functions = sizeof(allfunctions)/sizeof(void *);
const int fallback_offset = num_functions - 1;
-#if defined(__AVX__) || defined __SSE4_2__
+#if defined(__AVX512F__) || defined(__AVX__) || defined(__SSE4_2__)
const int highest_isa = instrset_detect();
#endif
int curr_offset = 0;

+/* Check for AVX512F support */
+int avx512_offset = fallback_offset;
+#ifdef __AVX512F__
+avx512_offset = highest_isa >= 9 ? curr_offset:fallback_offset;
+curr_offset++;
+#endif

/* Now check if AVX is supported by the CPU */
int avx_offset = fallback_offset;
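
The offset bookkeeping in this hunk can be tried out in isolation. The sketch below mirrors the pattern under some assumptions: `kernel_fn`, the stub kernels and `compute_offsets` are hypothetical stand-ins, and only the AVX512F threshold (`highest_isa >= 9`) comes from the diff; the thresholds 7 and 6 for AVX and SSE4.2 are assumed. Each kernel that is compiled in takes the next slot in the table, and any kernel that is either not compiled in or not supported by the CPU is aliased to the fallback slot at the end.

```c
#include <assert.h>

typedef int (*kernel_fn)(void);   /* stand-in for the real kernel signature */

#ifdef __AVX512F__
static int kernel_avx512(void) { return 512; }
#endif
#ifdef __AVX__
static int kernel_avx(void) { return 256; }
#endif
#ifdef __SSE4_2__
static int kernel_sse(void) { return 128; }
#endif
static int kernel_fallback(void) { return 0; }

/* Widest kernels first, fallback always last. */
static kernel_fn allfunctions[] = {
#ifdef __AVX512F__
    kernel_avx512,
#endif
#ifdef __AVX__
    kernel_avx,
#endif
#ifdef __SSE4_2__
    kernel_sse,
#endif
    kernel_fallback
};

typedef struct {
    int avx512, avx, sse, fallback, num_functions;
} kernel_offsets;

/* highest_isa: value reported by a runtime CPU check (9 == AVX512F in the diff). */
static kernel_offsets compute_offsets(const int highest_isa)
{
    assert(highest_isa >= 0);
    kernel_offsets o;
    o.num_functions = (int) (sizeof(allfunctions) / sizeof(allfunctions[0]));
    o.fallback = o.num_functions - 1;
    o.avx512 = o.avx = o.sse = o.fallback;   /* default: alias to fallback */

    int curr_offset = 0;
#ifdef __AVX512F__
    o.avx512 = highest_isa >= 9 ? curr_offset : o.fallback;
    curr_offset++;
#endif
#ifdef __AVX__
    o.avx = highest_isa >= 7 ? curr_offset : o.fallback;   /* threshold assumed */
    curr_offset++;
#endif
#ifdef __SSE4_2__
    o.sse = highest_isa >= 6 ? curr_offset : o.fallback;   /* threshold assumed */
    curr_offset++;
#endif
    (void) curr_offset;   /* unused when no SIMD kernels are compiled in */
    return o;
}
```

Note that `curr_offset` advances whenever a kernel is compiled in, regardless of CPU support, because the slot exists in the table either way.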
@@ -153,7 +163,7 @@ countpairs_mocks_func_ptr_DOUBLE countpairs_rp_pi_mocks_driver_DOUBLE(const stru
/* Check that cpu supports feature */
if(options->instruction_set >= 0) {
switch(options->instruction_set) {
-case(AVX512F):
+case(AVX512F):function_dispatch=avx512_offset;break;
case(AVX2):
case(AVX):function_dispatch=avx_offset;break;
case(SSE42): function_dispatch=sse_offset;break;
@@ -173,6 +183,8 @@ countpairs_mocks_func_ptr_DOUBLE countpairs_rp_pi_mocks_driver_DOUBLE(const stru
// This must be first (AVX/SSE may be aliased to fallback)
if(function_dispatch == fallback_offset){
fprintf(stderr,"Using fallback kernel\n");
+} else if(function_dispatch == avx512_offset){
+fprintf(stderr,"Using AVX512 kernel\n");
} else if(function_dispatch == avx_offset){
fprintf(stderr,"Using AVX kernel\n");
} else if(function_dispatch == sse_offset){
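
The comment "This must be first (AVX/SSE may be aliased to fallback)" is easier to see with a small standalone version of the last two hunks. This is a sketch only: the enum values other than `AVX512F = 9`, the `default` branch, and the function names `pick_offset` and `kernel_name` are assumptions made for illustration.

```c
#include <assert.h>

/* Requested instruction set; numeric values other than AVX512F are assumed. */
typedef enum { FALLBACK = 0, SSE42 = 6, AVX = 7, AVX2 = 8, AVX512F = 9 } isa_request;

/* Map a request to a slot in the kernel table. AVX2 has no case body of its
 * own here, so it falls through to the AVX slot. */
static int pick_offset(const isa_request req, const int avx512_offset,
                       const int avx_offset, const int sse_offset,
                       const int fallback_offset)
{
    assert(fallback_offset >= 0);
    switch (req) {
    case AVX512F: return avx512_offset;
    case AVX2:    /* fall through */
    case AVX:     return avx_offset;
    case SSE42:   return sse_offset;
    default:      return fallback_offset;   /* assumed default */
    }
}

/* Report which kernel a slot refers to. The fallback test comes first because
 * an unsupported ISA shares the fallback slot, and would otherwise be reported
 * under the wrong name. */
static const char *kernel_name(const int dispatch, const int avx512_offset,
                               const int avx_offset, const int sse_offset,
                               const int fallback_offset)
{
    if (dispatch == fallback_offset) return "fallback";
    if (dispatch == avx512_offset)   return "AVX512";
    if (dispatch == avx_offset)      return "AVX";
    if (dispatch == sse_offset)      return "SSE";
    return "unknown";
}
```

For example, on a build whose table is {AVX, SSE, fallback} and where AVX-512 is unavailable, `avx512_offset` equals `fallback_offset`, so a request for AVX512F is reported as the fallback kernel rather than as AVX512.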