AVX512F kernels, min separation optimizations, not duplicating particle positions, array of cell-pairs (#173)

* WIP: Added the machinery for quicker exits
* WIP: Adding in early exits to the theory routines
* Break out of next j-loop if any dz values in current iteration are larger than 'max_dz'
* [WIP] Started implementing the min sep optimizations for mocks
* Preparing for v2.3 (#170)
* started basic work on avx512 (completely wrong results currently)
* Fixed the number of bits set in the mask. npairs now agrees with the test, rpavg and weightavg are still wrong
* Trying mask loads. everything is broken now, including npairs
* Working version with AVX512 (requires AVX512VL, ie Skylake cpus). Not been valgrind'ed but the test passes. Only compiles with icc right now
* Compiles with gcc7.3 (but not with gcc6.4) - might be a compiler bug
* AVX512 is awesome! Supports masked horizontal adds across vector registers
* Cleaned up the logic in the initial part of the loop.
* Improved handling of missing numpy. Renamed the fma macros to reflect that there are many FMA options available
* Moved all the union declarations into the header files. Removed the mask horizontal add because that was only supported by intel compilers and composed of multiple separate intrinsics. Fixed the missing closing brace from c++ compilers in the sse42 header
* Added the AVX2 implementation for wp. No performance improvement because I could not get the integer blends to work with gcc. So the AVX2 is really identical to the AVX implementation except for the fma involved
* mostly finished with the avx512
* Cleaned up the dependency statements in the Makefiles. Protected the FMA calls for different instruction sets
* Replaced the mask loads with maskz loads. Replaced the comparison within the histogram update loop to a faster bitwise operation. Changed the floating point operations to the quiet kinds from the signalling kinds. Added in some more masked operations to the avx512 header, and removed ops that do not carry over from avx
* Should not compile python extensions if python/numpy are not available
* Added the avx512f kernels for DDrppi_mocks. Integration tests pass. The speedup is not that impressive though
* Bumped version
* Silencing compiler warning about meaningless type qualifier
* Replaced the Newton-Raphson steps with FMA equivalents
* Fixed a logic error for the AVX512 case with count_vectorized option. Removed extraneous comment symbols that were causing compile failures
* Updated the scripts for benchmarking the speedup from avx512f
* Added the DD kernel. Integration tests pass
* Added the xi kernel. Integration tests pass
* Added the avx512f kernel for DDrppi. Integration tests pass
* Replaced the set zero to the actual intrinsic dedicated to that purpose
* Fixed the setzero int call
* Added the DDsmu pair counter. Not real speed improvement; tests pass
* Cleaned up the initial search for valid pairs in wp
* Added the DDsmu_mocks (compiles but no checks have been run)
* Added the DDtheta kernel (not compiled, tested or debugged)
* fast divide options needs to be added to DDsmu theory at some point [ci skip]
* Fixed bug in DDtheta. Integration tests pass now
* Removed unused variable. Tests pass for DDsmu_mocks
* Adding in the fast_divide option to theory/DDsmu paircounter. Not tested
* Fixing the typos in fast-divide part of DDsmu
* Fixed typo. And ported the nicer way of figuring out the start index for the second set of points to the AVX kernel
* Removed icc as default compiler
* Attempting to fix travis failures from AVX512 code
* Removing the old xcode6.4 - seems to be not supported on travis any more
* No longer counting blank lines. Partly fixes #160
* Thetamin of 0.0 is allowed
* Fixing the build failure in #168
* Adding in the fast_divide option to the kernels. hopefully fixing build failure
* Fixing build failure for numpy dependency
* Made a changelog entry [ci skip]
* conda uses secure channels now
* Corrected the PR # [ci skip]
* Added in the PR # to the min separations [ci skip]
* Fixing build failure
* Added in the min sep into the avx512 kernels
* Added the avx512 intrinsic for computing the NOT of a mask
* Added the avx512 kernel for vpf
* Added in the boolean option for min_sep_opt to theory pair-counters
* Noted that avx512f is now available
* Changed docstrings for avx512f
* Added the min_sep_opt keyword handling to the C extension for python
* Integration tests pass for min. sep optimizations for theory routines.
* Integration tests pass for mocks
* Added the min-sep-opt handling to the python extension and the wrappers
* Ignoring the files with output from integration tests [ci skip]
* Added in the enable_min_sep option to the extensions
* Fix benchmark scripts for Python 3. Make Python output redirection not break if output has already been redirected.
* Added an option to generate SIMD speedup from numpart/rmax scalings
* Added the speedup tables
* Added in min-sep based optimizations for xi
* Option to turn off min_sep_opt for paper tests
* Added in a bounds to the lattice to compute min separations
* Fix min-sep optimization in xi kernel; needs porting to other kernels
* Added in the min_sep_optimizations for xi
* Removing zmin pointer since code seems to run slower. Tests pass
* Added the z1_min pointer back in
* Removed zmin again since the tests are 10\% slower
* Added in min_sep_optmizations for wp. Integration tests pass
* Added in the updated gridlinking code for 2D separations
* Fixing the doctest failure (white-space issues)
* Undoing (mostly) the last commit
* Should fully undo the whitespace fix that broke the build
* Adding a continue with max_dz condition check
* Added in min-sep-opt for DD. Integration tests pass
* Added min sep opt for DDrppi. Integration tests pass
* Adding DDsmu min-sep-opt and gridlink fixes. Untested
* Attempting to fix the bounding box calculations.
* Tried to make the variable naming conventions clearer. Plus, continue statements in simd modes
* Fixing compile failure
* Fixing (integration) test failure in xi
* Removed the negative pimax check
* Corrected the logic for updating min distance between cell-pairs back to the original implementation
* Moved the assignment after all the bounding box checks
* Added min-sep-opt for mocks DDrppi
* Fixing compile failure
* Fixed (inconsequential) typos [ci skip]
* Added min-sep-opt for DDsmu mocks. Integration tests pass
* Added an integration test by default
* Trying to fix (travis) compile failure for integration tests
* Splitting up the make tests into two stages
* Fixed whatspace errors
* Forgot to pass through the min-sep-opt option to the python extension
* Perhaps a missing whitespace before semi-colon
* Updated the min_dz calculation
* Added the min_dz update to the mocks
* Fixed #161
* Changed the variable type to int64_t for the rmin==0.0 bin counts correction. Integration tests PASS for all mocks and all theory routines
* Bumped version
* Copied from the numpy setup.py file. Deleting the Corrfunc setup env var
* Trying to fix travis failure for integration tests
* Added the PR into the changelog [ci skip]
* The warning was failing the build
* Fixed the compile failure with gcc
* The bug-fix should only be for gcc <= gcc8 and not for other compilers
* The integration tests exceed time-limit on travis - removing
* Only print the missing openmp support for Apple clang once
* Dropped xcode7.3 from travis - seems to be causing build failure (due to wurlitzer)
* Only print the openmp warning once
* WIP: Adding min-sep-opt for wtheta (based on chord separation)
* WIP: Min-sep-opt for DDtheta mocks
* Removed cz from the mocks lattice structure
* WIP: Tests fail for DDtheta
* Fix weights corruption when using brute-force DDtheta kernel. Now appears to fail npairs by one particle when linking in RA.
* Improved the error message under test failure
* WIP: Possibly a working solution for DDtheta. Integration tests not done yet
* Changing the avg_np calculation to include both datasets
* Fixed the DDtheta bug
* Propagated the -max_dz search across all relevant pair-counters
* Only print the boosting bin-ref message in verbose mode
* Should only print the info message in verbose mode
* Fixed bug in xi
* Removed unnecessary if condition
* Had missed adding the double-counting check in the brute force wtheta (#161)
* DDtheta brute-force is always a cross-corr (#161)
* Added a few missing error messages about malloc failures
* Added in the low-32 bits multiplication for avx512
* Only printing the clang-openmp message once
* Use int instead of int8 for max_nmesh. Fixes #179 (grid refinement bug).
* WIP: Option to use the particle positions in-place and returning an array of cell-pairs
* WIP: Implementations for the theory paircounters
* Adding the python bindings and tests for theory paircounters
* Added in the new config options for the python wrappers
* Attempting to fix build failure and pep8 issues
* Fixed docstrings and updated changelog
* Fixing docstrings [ci skip]
* Fixed another docstring issue [ci skip]
* More docstring fixes
* Docstring fixes plus added big/little endian fix to DDsmu
* Renamed copy_particle_positions to copy_particles (since both positions and weights are copied)
* Reordered the sequence when running integration tests
* Header for the new cell-pair struct, only one theory cellarray struct now (rather than two)
* Initializing the xmin/xmax etc variables to floating limits
* Created a new file containing gridlink utilities
* Removed the reorder option completely. Fixed memory leaks during integration tests
* Added the copy_particles option for mocks. Passing sqr_smax/sqr_smin into the DDsmu kernels instead of smax/smin
* Added the copy_particles (and min-sep-opt for DDtheta) into mocks python extension
* Fixed up the docs in the python theory extensions
* A fix for automatic cache linesize detection (not used currently)
* Added the copy_particles to the mocks python wrappers
* Forgot to pass on the copy_particles value into the python extensions
* Free the memory only when tests are successful
* Fixing build failure
* Removed duplicate function prototype
* Fixed one set of memory leaks
* More fixes for memory leaks
* Changed the particle numbers in the tests to be 64-bit integers
* Added the sanitize options if running on travis/other CI
* Added utility functions to find the min and max separation between two cells
* Fixed compile failure
* Commenting out the fsanitize options to fix 'unknown compiler option' compile failure
* Making copy_particles be the default
* setting copy_particles=True as the default in theory
* Fixed memory leaks with ra-dec linking and DDtheta
* Add in the sanitize flags when running integration tests
* Fix automatic uniform weight arrays and broadcasting of scalar weights. Also fixes weights reference leak. Closes #180 and #181.
* Update changelog [ci skip]
* Fix doctest backwards compatibility with old Numpy print formatting
* Edit Travis config to only build PRs and master branch (eliminates duplicate builds in PRs)
* Fix indentation in utils.py
* Update changelog Trying to appease astropy-bot
* Another attempt at changelog
* Fix changelog formatting? [ci skip]
* Fix RST parser warnings in change log [ci skip]

manodeep · May 21, 2019 · 91e40fc
1 parent 753aa50
Showing 115 changed files with 14,469 additions and 8,400 deletions.
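
The commit message refers to optimizations based on the minimum separation between pairs of cells. As a rough illustration of that idea only, and not the C implementation in this commit, the sketch below computes the smallest possible distance between two axis-aligned bounding boxes and drops cell-pairs that cannot contain any pair within `rmax`. The function names (`min_sep_1d`, `min_cell_separation`, `prune_cell_pairs`) and the box layout are hypothetical, and periodic wrapping is deliberately ignored.

```python
import math


def min_sep_1d(lo1, hi1, lo2, hi2):
    """Smallest gap between intervals [lo1, hi1] and [lo2, hi2]; 0 if they overlap."""
    return max(0.0, lo2 - hi1, lo1 - hi2)


def min_cell_separation(box1, box2):
    """Minimum distance between two axis-aligned boxes (non-periodic).

    Each box is ((xmin, xmax), (ymin, ymax), (zmin, zmax)); this layout is
    an assumption made for the sketch.
    """
    return math.sqrt(sum(min_sep_1d(a[0], a[1], b[0], b[1]) ** 2
                         for a, b in zip(box1, box2)))


def prune_cell_pairs(boxes, rmax):
    """Return the (i, j) pairs, i <= j, that could hold a pair within rmax."""
    pairs = []
    for i in range(len(boxes)):
        for j in range(i, len(boxes)):
            # A cell-pair is skipped when even its closest corners are too far apart.
            if min_cell_separation(boxes[i], boxes[j]) <= rmax:
                pairs.append((i, j))
    return pairs
```

The pruned list plays the role of an "array of cell-pairs": only the surviving pairs would then be handed to a pair-counting kernel.
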
diff --git a/.gitignore b/.gitignore
@@ -4,6 +4,7 @@
 *.a
 *.so
 *.so.*
+*.bak
 xx*
 yy*
 zz*
@@ -28,10 +29,12 @@ wp
 xi
 DDrppi
 wprp
+DDsmu
 bin/*
 include/*
 run_correlations
 test_*period*
+*output*integration*
 *.tgz
 cov-int
 *.gcno

diff --git a/.travis.yml b/.travis.yml
@@ -33,7 +33,7 @@ matrix:
     #   before_install:
     #     - brew update
     #     - brew tap homebrew/versions && brew reinstall gcc49 --without-multilib
-    #     - wget http://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
+    #     - wget https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
 
     # - os: osx
     #   compiler: clang
@@ -42,36 +42,35 @@ matrix:
     #     - brew update
     #     - brew outdated xctool || brew upgrade xctool
     #     - brew tap homebrew/versions && brew install clang-omp
-    #     - wget http://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
+    #     - wget https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
     - os: osx
       osx_image: xcode9
       compiler: clang
       env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=3.6 DOCTEST=FALSE
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
 
 
     - os: osx
       osx_image: xcode8
       compiler: clang
       env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=3.5 DOCTEST=FALSE
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
 
     - os: osx
-      osx_image: xcode7.3
+      osx_image: xcode8
       compiler: clang
       env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=2.7 DOCTEST=FALSE
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh
-
+        - wget https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh
 
     # - os: osx
     #   compiler: gcc
     #   env: COMPILER=gcc-4.8 V='4.8' PYTHON_VERSION=3.5 FAMILY=gcc
     #   before_install:
     #     - brew update && brew tap homebrew/versions && brew install gcc48 --without-multilib
-    #     - wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
+    #     - wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
 
     # - os: linux
     #   dist: trusty
@@ -83,7 +82,7 @@ matrix:
     #       packages: ['clang-3.6','libgsl0-dev']
     #   env: COMPILER=clang-3.6 V=3.6 PYTHON_VERSION=2.7 
     #   before_install:
-    #     - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
+    #     - wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
 
     # - os: linux
     #   dist: trusty
@@ -95,39 +94,39 @@ matrix:
     #       packages: ['clang-3.6','libgsl0-dev']
     #   env: COMPILER=clang-3.6 V=3.6 PYTHON_VERSION=3.5
     #   before_install:
-    #     - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+    #     - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 
     - os: linux
       dist: trusty
       sudo: required
       compiler: gcc
       env: COMPILER=gcc PYTHON_VERSION=2.7
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
 
     - os: linux
       dist: trusty
       sudo: required
       compiler: gcc
       env: COMPILER=gcc PYTHON_VERSION=3.4 
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 
     - os: linux
       dist: trusty
       sudo: required
       compiler: gcc
       env: COMPILER=gcc PYTHON_VERSION=3.5
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 
     - os: linux
       dist: trusty
       sudo: required
       compiler: gcc
       env: COMPILER=gcc PYTHON_VERSION=3.6
       before_install:
-        - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+        - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 
 install:
   - bash miniconda.sh -b -p $HOME/miniconda
@@ -142,8 +141,11 @@ install:
   - python setup.py install
 
 script:
+  - echo $CFLAGS
   - make tests CC=$COMPILER
   - make -C docs html
   - if [[ "${DOCTEST}" == "TRUE" ]]; then make -C docs doctest ; fi
 
-
+branches:
+  only: 
+    - master
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -10,23 +10,25 @@ New features
 - conda installable package
 - GPU version
 
-
-2.3.0
-=======
+2.3.0 (upcoming)
+================
 
 **Breaking Changes**
 --------------------
 
 New features
 ------------
+- AVX512F kernels for all pair-counters, faster code from new optimizations using the minimum separation between pairs of cells, option to use the input particle arrays directly and not make a copy of the particle positions, internal code changes to (hopefully) achieve better OpenMP scaling [#167, #170, #173]
 
 Bug fixes
 ---------
 - Fix segmentation fault in vpf_mocks [#168]
+- Fix automatic uniform weights array when only one set of weights (or a scalar) is passed [#180]
+- Fix memory leak due to Python reference leak when using weights [#181]
 
 
-2.2.0
-=======
+2.2.0 (2018-08-18)
+==================
 
 **Breaking Changes**
 --------------------
@@ -43,8 +45,8 @@ Bug fixes
   instead of the unhelpful "TypeError: 'NoneType' object is not iterable". [#158]
 
 
-2.1.0
-=======
+2.1.0 (2018-08-17)
+==================
 
 New features
 ------------
@@ -56,12 +58,12 @@ Enhancements
 ------------
 - GSL version now specified and tested by Travis [#164]
 - Now possible to specify the number of Newton-Raphson steps to
-improve accuracy of approximate reciprocals. Available in `DD(rp, pi)` for mocks,
-and `DD(s, mu)` for both theory and mocks
+  improve accuracy of approximate reciprocals. Available in `DD(rp, pi)` for mocks,
+  and `DD(s, mu)` for both theory and mocks
 
 
-2.0.0
-=======
+2.0.0 (2017-04-06)
+==================
 
 New features
 ------------
@@ -86,8 +88,9 @@ Enhancements
 
 - Ctrl-C now aborts even within python extensions (cleans up memory too!, `#12 <https://github.com/manodeep/Corrfunc/issues/12>`_)
 - Significantly improved installation for python
-  - compiler can now be specified within ``python setup.py install
-    CC=yourcompiler`` `#31<https://github.com/manodeep/Corrfunc/issues/31>`_
+
+  - compiler can now be specified within ``python setup.py install CC=yourcompiler``
+    `#31<https://github.com/manodeep/Corrfunc/issues/31>`_
   - python via an alias is now solved `#52 <https://github.com/manodeep/Corrfunc/issues/52>`_
 
 
@@ -107,50 +110,50 @@ Outstanding issues
 - Parameter parsing in python extensions can be flaky (`#79 <https://github.com/manodeep/Corrfunc/issues/79>`_)
 
 
-1.1.0 (June 8, 2016)
-=====================
+1.1.0 (2016-06-08)
+===================
 
 - SSE kernels for all statistics
 - Incorrect normalization in ``xi``. **ALL** previous
   ``xi`` calculations were wrong.
 
 
-1.0.0 (Apr 14, 2016)
-====================
+1.0.0 (2016-04-14)
+==================
 
 - Improved installation process  
 - Detecting ``AVX`` capable CPU at compile time
 - Double-counting bug fixes in ``wp`` and ``xi``
 
 
-0.2.3 (Mar 30, 2016)
-=====================
+0.2.3 (2016-03-30)
+==================
 
 - Streamlined compilation on MACs
 - PyPI version is not verbose by default
 
 
-0.2.2 (Feb 9, 2016)
-====================
+0.2.2 (2016-02-09)
+==================
 
 - First version on `PyPI <https://pypi.python.org/pypi/Corrfunc>`_
 
 
-0.2.1 (Feb 6, 2016)
-====================
+0.2.1 (2016-02-06)
+==================
 
 - ``AVX`` enabled by default
 
 
-0.2.0 (Feb 5, 2016)
-====================
+0.2.0 (2016-02-05)
+==================
 
 - Python 2/3 compatible
 
 
 
-0.0.1 (Nov 11, 2015)
-====================
+0.0.1 (2015-11-11)
+==================
 
 - Initial release
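
The release headings above were switched from free-form dates such as `June 8, 2016` and `Apr 14, 2016` to ISO `YYYY-MM-DD`. A minimal sketch of that normalization is shown below; the helper name `to_iso_date` is hypothetical, and it assumes English month names (the default C locale).

```python
from datetime import datetime


def to_iso_date(text):
    """Parse 'June 8, 2016' or 'Apr 14, 2016' and return 'YYYY-MM-DD'."""
    # Try the full month name first, then the three-letter abbreviation.
    for fmt in ("%B %d, %Y", "%b %d, %Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError("unrecognised date: {!r}".format(text))
```
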
 
diff --git a/Corrfunc/__init__.py b/Corrfunc/__init__.py
@@ -10,7 +10,7 @@
                         unicode_literals)
 import os
 
-__version__ = "2.2.0"
+__version__ = "2.3.0"
 __author__ = "Manodeep Sinha <manodeep@gmail.com>"
 
 

diff --git a/Corrfunc/call_correlation_functions.py b/Corrfunc/call_correlation_functions.py
@@ -30,6 +30,7 @@ def main():
     tstart = time.time()
     t0 = tstart
     x, y, z = read_catalog()
+    w = np.ones((1,len(x)), dtype=x.dtype)
     boxsize = 420.0
     t1 = time.time()
     print("Done reading the data - time taken = {0:10.1f} seconds"
@@ -47,7 +48,7 @@ def main():
 
     print("Running 3-D correlation function DD(r)")
     results_DD, _ = DD_extn(autocorr, nthreads, binfile, x, y, z,
-                            weights1=np.ones_like(x), weight_type='pair_product',
+                            weights1=w, weight_type='pair_product',
                             verbose=True, periodic=periodic, boxsize=boxsize)
     print("\n#      **** DD(r): first {0} bins  *******       "
           .format(numbins_to_print))
@@ -62,7 +63,7 @@ def main():
     print("\nRunning 2-D correlation function DD(rp,pi)")
     results_DDrppi, _ = DDrppi_extn(autocorr, nthreads, pimax,
                                     binfile, x, y, z,
-                                    weights1=np.ones_like(x), weight_type='pair_product',
+                                    weights1=w, weight_type='pair_product',
                                     verbose=True, periodic=periodic,
                                     boxsize=boxsize)
     print("\n#            ****** DD(rp,pi): first {0} bins  *******      "
@@ -82,7 +83,7 @@ def main():
     results_DDsmu, _ = DDsmu_extn(autocorr, nthreads, binfile,
                                     mu_max, nmu_bins,
                                     x, y, z,
-                                    weights1=np.ones_like(x), weight_type='pair_product',
+                                    weights1=w, weight_type='pair_product',
                                     verbose=True, periodic=periodic,
                                     boxsize=boxsize, output_savg=True)
     print("\n#            ****** DD(s,mu): first {0} bins  *******      "
@@ -98,7 +99,7 @@ def main():
     print("\nRunning 2-D projected correlation function wp(rp)")
     results_wp, _, _ = wp_extn(boxsize, pimax, nthreads,
                             binfile, x, y, z,
-                            weights=np.ones_like(x), weight_type='pair_product',
+                            weights=w, weight_type='pair_product',
                             verbose=True)
     print("\n#            ******    wp: first {0} bins  *******         "
           .format(numbins_to_print))
@@ -113,7 +114,7 @@ def main():
     print("\nRunning 3-D auto-correlation function xi(r)")
     results_xi, _ = xi_extn(boxsize, nthreads, binfile,
                             x, y, z,
-                            weights=np.ones_like(x), weight_type='pair_product',
+                            weights=w, weight_type='pair_product',
                             verbose=True)
 
     print("\n#            ******    xi: first {0} bins  *******         "
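
The hunks above replace `np.ones_like(x)` with a single array `w = np.ones((1,len(x)), dtype=x.dtype)` that is reused for every call. The sketch below only illustrates the shape difference between the two, plus one way scalar or 1-D weights could be broadcast to a 2-D layout. The helper `as_weight_rows` is hypothetical, and reading the leading axis as "number of weights per particle" is an assumption, not something this diff states.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5, dtype=np.float32)

w_old = np.ones_like(x)                       # shape (5,), as in the removed lines
w_new = np.ones((1, len(x)), dtype=x.dtype)   # shape (1, 5), as in the added lines


def as_weight_rows(weights, npart, dtype):
    """Hypothetical helper: broadcast scalar, (N,) or (k, N) weights to (k, N)."""
    w = np.atleast_2d(np.asarray(weights, dtype=dtype))
    # broadcast_to raises ValueError if the trailing axis is neither 1 nor npart.
    return np.ascontiguousarray(np.broadcast_to(w, (w.shape[0], npart)))
```

For example, `as_weight_rows(2.0, len(x), x.dtype)` gives a `(1, 5)` array filled with `2.0`, while a `(5,)` input is promoted to `(1, 5)` unchanged.
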

diff --git a/Corrfunc/call_correlation_functions_mocks.py b/Corrfunc/call_correlation_functions_mocks.py
@@ -29,6 +29,7 @@ def main():
 
     t0 = time.time()
     ra, dec, cz = read_catalog(filename)
+    w = np.ones((1,len(ra)), dtype=ra.dtype)
     t1 = time.time()
     print("RA min  = {0} max = {1}".format(np.min(ra), np.max(ra)))
     print("DEC min = {0} max = {1}".format(np.min(dec), np.max(dec)))
@@ -49,7 +50,7 @@ def main():
     results_DDrppi, _ = rp_pi_mocks_extn(autocorr, cosmology, nthreads,
                                          pimax, binfile,
                                          ra, dec, cz,
-                                         weights1=np.ones_like(ra), weight_type='pair_product',
+                                         weights1=w, weight_type='pair_product',
                                          output_rpavg=True, verbose=True)
     print("\n#            ****** DD(rp,pi): first {0} bins  *******      "
           .format(numbins_to_print))
@@ -68,7 +69,7 @@ def main():
     print("\nRunning 2-D correlation function xi(s,mu)")
     results_DDsmu, _ = s_mu_mocks_extn(autocorr, cosmology, nthreads,
                                        mu_max, nmu_bins, binfile,
-                                       ra, dec, cz, weights1=np.ones_like(ra),
+                                       ra, dec, cz, weights1=w,
                                        output_savg=True, verbose=True,
                                        weight_type='pair_product')
     print("\n#            ****** DD(s,mu): first {0} bins  *******      "
@@ -87,8 +88,8 @@ def main():
     print("\nRunning angular correlation function DD(theta)")
     results_wtheta, _ = theta_mocks_extn(autocorr, nthreads, binfile,
                                          ra, dec, RA2=ra, DEC2=dec,
-                                         weights1=np.ones_like(ra),
-                                         weights2=np.ones_like(ra),
+                                         weights1=w,
+                                         weights2=w,
                                          weight_type='pair_product',
                                          output_thetaavg=True, fast_acos=True,
                                          verbose=1)