Passing only one weights array for cross-correlations of unequal sizes #180
Manodeep Sinha: What is the expected behaviour here? If only a scalar is passed, then I can understand numpy-style dimension broadcasting. But if there is only one weights array passed, then we should raise a ValueError and request that the user pass another weights array of shape (num_weights, ND2) with the values all set to 1.0 for PAIR_PRODUCT (or whatever would be equivalent for their weight type for custom weighting schemes). The num_weights for the second weights array should match that of the first array.
Lehman Garrison: The expected behavior is that the Python wrapper automatically provides such an array. Indeed, this is the current documented behavior (but it only works as intended when the two sets are of equal size). I think it's sensible to raise a ValueError if the weighting scheme is not `pair_product`.
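(For concreteness, a minimal sketch of what that wrapper logic could look like; the helper name and signature are hypothetical, not Corrfunc's actual code:)

```python
import numpy as np

def default_weights2(weights1, ND2, weight_type):
    """Hypothetical helper: build the default second weights array
    when only weights1 was supplied."""
    if weight_type != 'pair_product':
        # No safe default exists for custom weighting schemes.
        raise ValueError("weights2 is required for weight_type {!r}; "
                         "only 'pair_product' has a safe default of "
                         "all ones".format(weight_type))
    num_weights = np.atleast_2d(weights1).shape[0]
    # The shape uses ND2, the size of the *second* set; using the size
    # of the first set is exactly the bug this issue reports.
    return np.ones((num_weights, ND2), dtype=np.float64)
```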
Manodeep Sinha: I am not sure how to automatically provide the array for the second set of particles. Say weights1 is full of uniform random numbers - how do you generate an equivalent array for the second dataset? Are you saying to generate an array of the correct shape *only* when weights1 contains only 1 element (i.e., weights1 is a scalar)?
Lehman Garrison: I just meant generate an array filled with 1s.
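(A small illustration, with made-up sizes, of why ones are the safe default for `pair_product`: the pair weight is w1[i] * w2[j], so setting w2 to 1 leaves each pair weighted by the first particle's weight alone:)

```python
import numpy as np

# Made-up sizes: 10 particles in set 1, 25 in set 2.
w1 = np.random.uniform(0.5, 1.5, size=10)  # user-supplied weights for set 1
w2 = np.ones(25)                           # auto-filled default for set 2

# pair_product weights every pair by w1[i] * w2[j]; with w2 = 1 this
# reduces to w1[i], i.e. the unweighted behaviour for set 2.
pair_weights = np.outer(w1, w2)
assert np.allclose(pair_weights, w1[:, None])
```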
Boryana: Hi! I also wonder: I am writing my own estimator to convert DD, DR, and RR for non-periodic BC into a correlation function, but I wonder whether there are any normalization factors when you have weights (I thought there shouldn't be, but without them I don't seem to be getting the right answer) -- this also doesn't seem to be handled in convert_3d_counts_to_cf(), but I may be wrong. Thank you for helping.
Best, Boryana
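(For reference, a sketch of the unweighted conversion being described; all numbers here are toy placeholders, not from a real run:)

```python
import numpy as np
from Corrfunc.utils import convert_3d_counts_to_cf

# Toy inputs, purely to pin down the calling convention (no weights here):
N_d, N_r = 1000, 5000
DD_counts = np.array([120.0, 480.0, 1900.0])      # data-data pairs per bin
DR_counts = np.array([550.0, 2300.0, 9500.0])     # data-random pairs per bin
RR_counts = np.array([2900.0, 11800.0, 47000.0])  # random-random pairs per bin

cf = convert_3d_counts_to_cf(N_d, N_d, N_r, N_r,
                             DD_counts, DR_counts, DR_counts, RR_counts)
```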
Manodeep Sinha: @lgarrison Ahh okay. So a matrix of the correct shape, filled with 1's for the PAIR_PRODUCT case, and a ValueError with "Please supply weights" for all other weighting schemes...

@boryanah Yeah - we really don't have a utility to compute the weighted correlation function (or even check that weights have been used and warn the user). This has been open for a while now in issue #139.
Boryana: Thanks for the reference! I am really sorry if this is very obvious, but which pair counts should be multiplied by the average in what is fed to the convert function, in the case of different weights for R and D and pair_product weighting? I tried:

```python
cf = convert_3d_counts_to_cf(N_d, N_d, N_r, N_r,
                             DD*w_avg, DR*w_avg,
                             DR*w_avg, RR)
```

where w_avg is the mean weight of the data (the mean weight of the randoms is 1), but it didn't give me the right answer.
Manodeep Sinha: @boryanah You have to first compute the average weights on dataset1 (and dataset2 for cross-correlations) with […]

@lgarrison One thing I realise is that our […]
Manodeep Sinha: @boryanah And then use the different weightsum for each DD, DR, and RR and combine with Landy-Szalay to get the weighted average. @lgarrison Will you please check that I have not omitted steps (or completely mis-stated)? :)
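(A sketch of this recipe, assuming pair_product weighting; the function name and the normalization by total catalog weights are one reading of the thread, not a Corrfunc API:)

```python
import numpy as np

def weighted_landy_szalay(dd, dr, rr, weights_d, weights_r):
    """Sketch only: combine pair_product-weighted counts with
    Landy-Szalay. dd, dr, rr are structured results from the pair
    counters, carrying 'npairs' and 'weightavg' per bin."""
    W_d = weights_d.sum()   # total weight of the data catalog
    W_r = weights_r.sum()   # total weight of the random catalog
    # weightsum per bin = npairs * weightavg; normalize each term by
    # the total weighted pair count it could have formed.
    DD = dd['npairs'] * dd['weightavg'] / (W_d * W_d)
    DR = dr['npairs'] * dr['weightavg'] / (W_d * W_r)
    RR = rr['npairs'] * rr['weightavg'] / (W_r * W_r)
    return (DD - 2.0 * DR + RR) / RR
```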
Lehman Garrison: That sounds right to me! It seems like we should be returning weightsum instead of weightavg; it was a poor choice on my part. I guess it couldn't hurt to add the latter to the results struct. And good point about rpavg. One option would be to return both, although that would be extra work in the kernels.
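(In the meantime, the per-bin weightsum can be recovered from what the counters already return; `results` here stands for any structured output from the pair counters:)

```python
# weightavg is the mean pair weight in the bin, so the sum of pair
# weights is just the product with the raw pair count:
weightsum = results['npairs'] * results['weightavg']
```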
Boryana: Dear Manodeep,
Thanks a lot! This is what I understood (I think I got 2 wrong). Say 1 is Data and 2 is Random. We are trying to compute the factor F that should multiply RD before it goes into the LS estimator.
1) N1,2 = number of particles; Np1,2 = number of pairs per bin (a vector); w_avg1,2 = the "weight average" as returned by the DD function when computing DD(XYZ1, XYZ2, weights1, weights2) (also a vector); sum1,2 = w_avg1,2 * Np1,2 (a vector): the sum of the products of weights in a given bin.
2) We also have mean weights: w_mean1,2 = sum(weights1,2)/N1,2. To compute the sum of weights in the case of a random for dataset 1 with weights for 2, we do sum_rand1 = N1 * [ N2/Vol * Vol * w_mean2 ] = N1 * N2 * w_mean2. (I think I misunderstood this; maybe it is w_mean1 * w_mean2 * N1 * N2 for the same volume?)
3) Here you say to multiply the sum of weights per bin, sum1, by what would have been the sum of weights for the catalog if random, which is sum_rand1, so F1 = sum_rand1 * sum1.
4) We substitute DR1 and DR2 in LS (or convert_3d_counts_to_cf) with DR1*F1 and DR2*F2. This doesn't seem symmetric, so I guess this is DR1, and there is also a DR2 which should go into the final estimator.
Best, Boryana
Boryana: I suspect that for 2), sum_rand1 = N1 * [ N2/Vol * Vol_bin * w_mean2 ], and it is thus a vector, where Vol is the total volume and Vol_bin is only the volume of the bin; and for 3), maybe we want F1 = sum_rand1/sum1. I am really sorry for bothering you.
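(Transcribing the two comments above into code for concreteness: this follows Boryana's notation and her proposed corrections, which are a conjecture awaiting confirmation, not a verified recipe; all inputs are toy placeholders:)

```python
import numpy as np

# Toy placeholders, just to pin down shapes and units.
N1, N2, Vol = 1000, 5000, 1.0e6          # particle counts, total volume
Vol_bin = np.array([4.1, 29.2, 96.6])    # shell volume of each r bin
weights2 = np.random.uniform(0.5, 1.5, N2)

# Per-bin quantities that would come from the DR pair counter:
npairs = np.array([550.0, 2300.0, 9500.0])
weightavg = np.array([0.98, 1.01, 1.00])

sum1 = weightavg * npairs            # step 1: sum of pair weights per bin
w_mean2 = weights2.sum() / N2        # mean weight of set 2

# Corrected step 2: expected per-bin weight sum if set 1 were random;
# note Vol_bin (per bin) rather than the total volume in the bracket.
sum_rand1 = N1 * (N2 / Vol) * Vol_bin * w_mean2

# Corrected step 3: a ratio rather than a product.
F1 = sum_rand1 / sum1
```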
Manodeep Sinha: Hey @boryanah - no worries at all. Unfortunately, I have a deadline right now, but if I have not responded by Monday, please ping me again! And thank you for catching the bug in the first place!
Boryana: Thanks a lot! Good luck!! I appreciate it!
Manodeep Sinha: @boryanah Thanks for your patience. Would it be possible to add some lines of code to demonstrate what you are doing?
Boryana: Hi, I will try to write it up soon, but I have switched to something else for the moment!
Best, Boryana
General information
Issue description
Passing only one weights array when doing a cross-correlation of particle sets of different sizes causes a RuntimeError. The cause is that the Python wrappers create a uniform weights array for the second set of particles if only the first was provided, but it is created with a size equal to that of the first particle set.
Minimal failing example
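(The original snippet is not reproduced here; below is a hedged reconstruction of the failing pattern from the description above, with made-up coordinates and weights:)

```python
import numpy as np
from Corrfunc.theory import DD

# Two sets of different sizes; weights supplied for the first set only.
N1, N2 = 1000, 2000
boxsize = 100.0
rng = np.random.default_rng(42)
X1, Y1, Z1 = rng.uniform(0, boxsize, (3, N1))
X2, Y2, Z2 = rng.uniform(0, boxsize, (3, N2))
weights1 = rng.uniform(0.5, 1.5, N1)
bins = np.linspace(0.1, 10.0, 11)

# The wrapper fills weights2 with ones of length N1 instead of N2,
# so this cross-correlation raises a RuntimeError:
res = DD(autocorr=0, nthreads=1, binfile=bins,
         X1=X1, Y1=Y1, Z1=Z1, weights1=weights1,
         X2=X2, Y2=Y2, Z2=Z2,
         weight_type='pair_product', periodic=True)
```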
Thanks to @boryanah for finding this and providing the failing example!