Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing only one weights array for cross-correlations of unequal sizes #180

Closed
lgarrison opened this issue May 16, 2019 · 16 comments
Closed
Assignees
Labels
Milestone

Comments

@lgarrison
Copy link
Collaborator

lgarrison commented May 16, 2019

General information

  • Corrfunc version: v2.2

Issue description

Passing only one weights array when doing a cross-correlation of particle sets of different sizes causes a RuntimeError.

The cause is that the Python wrappers create a uniform weights array for the second set of particles if only the first was provided, but it is created with size equal to the first particle set.

Minimal failing example

import numpy as np
from Corrfunc.theory.DD import DD

N_d = 10000
Lbox = 250.
N_r = N_d*5

x_DM_jack = np.random.uniform(0.,Lbox,N_d)                                                                  
y_DM_jack = np.random.uniform(0.,Lbox,N_d)                                                                  
z_DM_jack = np.random.uniform(0.,Lbox,N_d)

x_rand_jack = np.random.uniform(0.,Lbox,N_r)                                                                  
y_rand_jack = np.random.uniform(0.,Lbox,N_r)                                                                  
z_rand_jack = np.random.uniform(0.,Lbox,N_r)

w_DM_jack = np.full_like(x_DM_jack,1.)                                                                        
w_DM_jack[:] = np.random.uniform(1.,2.,N_d)

bins = np.logspace(-1,1.,25)

autocorr = 0                                                                                 
results = DD(autocorr,nthreads=16,binfile=bins,                                            
                           X1=x_rand_jack, Y1=y_rand_jack, Z1=z_rand_jack,
                           X2=x_DM_jack, Y2=y_DM_jack, Z2=z_DM_jack,weights2=w_DM_jack,                  
                           boxsize=Lbox,periodic=False)
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
error: ERROR: the last dimension of `weights` must match the number of positions.  Instead found n_weights=10000, nx=50000

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-fac2f1ff5530> in <module>
     23                            X1=x_rand_jack, Y1=y_rand_jack, Z1=z_rand_jack,
     24                            X2=x_DM_jack, Y2=y_DM_jack, Z2=z_DM_jack,weights2=w_DM_jack,
---> 25                            boxsize=Lbox,periodic=False)

~/anaconda3/lib/python3.6/site-packages/Corrfunc-2.3.0-py3.6-linux-x86_64.egg/Corrfunc/theory/DD.py in DD(autocorr, nthreads, binfile, X1, Y1, Z1, weights1, periodic, X2, Y2, Z2, weights2, verbose, boxsize, output_ravg, xbin_refine_factor, ybin_refine_factor, zbin_refine_factor, max_cells_per_dim, enable_min_sep_opt, c_api_timer, isa, weight_type)
    246     if extn_results is None:
    247         msg = "RuntimeError occurred"
--> 248         raise RuntimeError(msg)
    249     else:
    250         extn_results, api_time = extn_results

RuntimeError: RuntimeError occurred

Thanks to @boryanah for finding this and providing the failing example!

@lgarrison lgarrison added the bug label May 16, 2019
@lgarrison lgarrison added this to the v2.3.0 milestone May 16, 2019
@lgarrison lgarrison self-assigned this May 16, 2019
@manodeep
Copy link
Owner

What is the expected behaviour here? If there was only a scalar passed, then I can understand numpy-style dimension broadcasting. But if there is only one weights array passed, then we should raise a ValueError and request that the user pass another weights array of shape (num_weights, ND2) with the values all set to 1.0 for PAIR_PRODUCT (or whatever would be equivalent for their weight type for custom weighting schemes). The num_weights for the second weights array should match that of the first array.

@lgarrison
Copy link
Collaborator Author

lgarrison commented May 16, 2019 via email

@manodeep
Copy link
Owner

I am not sure how to automatically provide the array for the second set of particles. Say, the weights1 is full of uniform random numbers - how do you generate an equivalent array for the second dataset?
Are you saying to generate an array of the correct shape only when weights1 contains only 1 element (i.e., weights1 is a scalar)?

@lgarrison
Copy link
Collaborator Author

lgarrison commented May 16, 2019 via email

@boryanah
Copy link

boryanah commented May 16, 2019 via email

@manodeep
Copy link
Owner

@lgarrison Ahh okay. So a matrix of the correct shape, filled with 1's for the PAIR_PRODUCT case. ValueError with the "Please supply weights" for all other weighting schemes...

@boryanah Yeah - we really don't have an utility to compute the weighted correlation function (or even check that weights have been used and warn the user). This has been open for a while now in issue #139

@boryanah
Copy link

boryanah commented May 16, 2019 via email

@manodeep
Copy link
Owner

@boryanah You have to first compute the average weights on the dataset1 (and dataset2 for cross-correlations) with avgweight := sum(weights)/N_particles. In each radial bin, the total sum of weights is weightsum := weightavg * num_pairs_in_that_bin (where the weightavg is in itself a sum of the product of the two weights). Now, you have to figure out what would have been the weightsum if the points were randomly distributed -- so multiply by the number of points in one catalog with the density, volume, and average weight of the other catalog.

@lgarrison One thing I realise is that our rpavg is currently completely unweighted. We should probably also have an option to say return_weighted_rpavg -- where the math will be rp += weight*r within the kernels (instead of rp += r) and then divided by weightavg * num_pairs_in_bin at the end. (I have a vague memory of we discussing this previously, and deciding against weighted rpavg for backwards compatibility)

@manodeep
Copy link
Owner

@boryanah And then use the different weightsum for each DD, DR, and RR and combine with Landy-Szalay to get the weighted average. @lgarrison Will you please check that I have not omitted steps (or completely mis-stated) ? :)

@lgarrison
Copy link
Collaborator Author

lgarrison commented May 17, 2019 via email

@boryanah
Copy link

boryanah commented May 17, 2019 via email

@boryanah
Copy link

boryanah commented May 17, 2019 via email

@manodeep
Copy link
Owner

Hey @boryanah - no worries at all. Unfortunately, I have a deadline right now but if I have not responded by Monday, please ping me again!

And thank you for catching the bug in the first place!

@boryanah
Copy link

boryanah commented May 17, 2019 via email

lgarrison added a commit that referenced this issue May 20, 2019
…s. Also fixes weights reference leak. Closes #180 and #181.
@manodeep
Copy link
Owner

@boryanah Thanks for your patience. Would it be possible to add in some lines of code to demonstrate what you are doing?

@boryanah
Copy link

boryanah commented May 22, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants