-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cythonize logsignalrate #4114
Cythonize logsignalrate #4114
Conversation
@spxiwh One thing I see is the use of a 'cdef' block rather than explicit declaration. Is there anything else you are thinking about? Maybe you are referring to writing new functions only with the buffer interface? I think we've done that for most new functions at this point, I hadn't realized that the cluster function wasn't using it. Did I miss any key differences? |
# Scale by signal population SNR | ||
rate[rtype] *= (sref / self.ref_snr) ** -4.0 | ||
# Scale by signal population SNR | ||
rate[rtype] *= (sref[rtype] / self.ref_snr) ** -4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not seeing what this change is doing .. & how is sref an object that can be indexed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdent This isn't really a change at all. For the two-detector case this call is now done in the cython code, so this call is moved up a level to only be done in the [unoptimized] three-detector code. The [rtype] index was always there, if you compare line 516 (old code) to 536 (new code) you see that I am now not applying the rtype index so it just moves to here.
sref is an array (it stores the SNR in the reference ifo for all triggers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, this is inside the loop over ifos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not going to learn about cython quickly enough to meaningfully review most of this, but see one comment on a non-cython code change that looks a bit weird.
Presumably changes in the actual stat calculation can be/have been tested.
@tdent I just want to ensure that the people [who I think are] the authors of this code are alerted to the change. Functionally it should not change anything, but if there are any edge cases that unittests aren't covering, it's possible for bugs to creep in. So if there are unittests that you'd like to be included as part of checking this, please let me know! This seems to be passing the unittests (but I haven't yet checked why the search is failing). |
@ahnitz I think maybe 3/4 things I wanted to highlight:
|
... The search failure does seem genuine and would require what I think is a small patch to the all-sky coinc front-end code. |
The patch was even smaller than I expected (cython cares if you set what should be a float to an int!) All tests working now. |
Thanks, I made a minor edit - concerning PR approval I think someone with cython experience should do that, so I won't. |
734be71
to
215d548
Compare
a190f7d
to
b70cba2
Compare
@titodalcanton I would like to proceed with this one now that the unit test is in place. I have made two minor changes to the unit test here:
|
@spxiwh I have no objections apart from the minor comments above, and the following one here: in your last comment, you said the sample rate could be an issue. Is that only relevant to the unit test? Please confirm that the early-warning analysis, which uses a much lower sample rate than the 4096 Hz you mentioned, will behave correctly. |
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
@titodalcanton To clarify: The sample rate being a concern is nothing to do with this patch. It's just that it could cause large numerical differences if numerical precision caused an event to be in a different "bin". With a sample rate of 4096 the delta_t is larger than the precision, and so this doesn't happen (probably this was considered when casting time to a double). If the sample rate is smaller (larger delta_t) the numerical precision becomes even smaller compared to delta_t and so this is not a problem unless sample rate becomes larger. ... I also note that this isn't really a problem. If the time delay between two signals is exactly 0.02 s do you put something in the 0.01 - 0.02 s time delay bin, or the 0.02 - 0.03s time delay bin? Both choices would be right (but 0.02 also isn't possible with our sample rates anyway). |
Thanks @titodalcanton! Can we move through other PRs in the optimization series in the next day or two as well? |
* DOc additions and Tom-suggested changes * Stat restructure for profiling DO NOT MERGE * More test changes * Finish cython-izing logsignalrate * Remove test function * Remove unused variable * Update pycbc_coinc_findtrigs * Update eventmgr_cython.pyx * minor rewrite for less chatty comment * FIx modulus issue * Resolve rebase issue * Update test * Update pycbc/events/stat.py Co-authored-by: Tito Dal Canton <tito@dalcanton.it> Co-authored-by: Thomas Dent <thomas.dent@usc.es> Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
* DOc additions and Tom-suggested changes * Stat restructure for profiling DO NOT MERGE * More test changes * Finish cython-izing logsignalrate * Remove test function * Remove unused variable * Update pycbc_coinc_findtrigs * Update eventmgr_cython.pyx * minor rewrite for less chatty comment * FIx modulus issue * Resolve rebase issue * Update test * Update pycbc/events/stat.py Co-authored-by: Tito Dal Canton <tito@dalcanton.it> Co-authored-by: Thomas Dent <thomas.dent@usc.es> Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
I've been spending quite a bit of my free time in the last few weeks trying to optimize PyCBC Live so that it does better keeping up with data, and hopefully reduce latency when it is keeping up.
One bottleneck we've noticed is the logsignalrate function. In the offline all-sky search this function is sent many triggers every call, and was optimized for that. However, in the online search it is called many more times with orders of magnitude fewer triggers. This requires some differences in how we optimize this code (while not impacting on the offline use case).
Cython seems the natural solution here. Cython is not slower when doing operations on arrays with small numbers of components, is just as fast when there's lots of components, and we can change
range
toprange
if we want to use openmpi to speed things up further in the "lots of components" case.Additionally I have reduced the number of numpy arrays being assigned on the fly (important in particular when being called often, with arrays containing relative few events).
This is still being tested, and I want to get a callgraph comparison of this running at scale, but wanted to post this now for any immediate comments/complaints.
(@ahnitz You may notice a difference in style for the Cython code. I read a good book on Cython practices in the last few weeks, and we do a few things suboptimally in places, in particular with regard to numpy arrays. I will try to improve the other cython code in this regard in the future)