Cythonize logsignalrate #4114

spxiwh · 2022-08-23T13:49:29Z

I've been spending quite a bit of my free time in the last few weeks trying to optimize PyCBC Live so that it does better keeping up with data, and hopefully reduce latency when it is keeping up.

One bottleneck we've noticed is the logsignalrate function. In the offline all-sky search this function is sent many triggers every call, and was optimized for that. However, in the online search it is called many more times with orders of magnitude fewer triggers. This requires some differences in how we optimize this code (while not impacting on the offline use case).

Cython seems the natural solution here. Cython is not slower when doing operations on arrays with small numbers of components, is just as fast when there's lots of components, and we can change range to prange if we want to use openmpi to speed things up further in the "lots of components" case.

Additionally I have reduced the number of numpy arrays being assigned on the fly (important in particular when being called often, with arrays containing relative few events).

This is still being tested, and I want to get a callgraph comparison of this running at scale, but wanted to post this now for any immediate comments/complaints.

(@ahnitz You may notice a difference in style for the Cython code. I read a good book on Cython practices in the last few weeks, and we do a few things suboptimally in places, in particular with regard to numpy arrays. I will try to improve the other cython code in this regard in the future)

ahnitz · 2022-08-23T15:59:04Z

@spxiwh One thing I see is the use of a 'cdef' block rather than explicit declaration. Is there anything else you are thinking about? Maybe you are referring to writing new functions only with the buffer interface? I think we've done that for most new functions at this point, I hadn't realized that the cluster function wasn't using it. Did I miss any key differences?

tdent · 2022-08-25T12:48:28Z

pycbc/events/stat.py

-            # Scale by signal population SNR
-            rate[rtype] *= (sref / self.ref_snr) ** -4.0
+                # Scale by signal population SNR
+                rate[rtype] *= (sref[rtype] / self.ref_snr) ** -4.0


I am not seeing what this change is doing .. & how is sref an object that can be indexed?

@tdent This isn't really a change at all. For the two-detector case this call is now done in the cython code, so this call is moved up a level to only be done in the [unoptimized] three-detector code. The [rtype] index was always there, if you compare line 516 (old code) to 536 (new code) you see that I am now not applying the rtype index so it just moves to here.

sref is an array (it stores the SNR in the reference ifo for all triggers).

ah, this is inside the loop over ifos.

tdent

I am not going to learn about cython quickly enough to meaningfully review most of this, but see one comment on a non-cython code change that looks a bit weird.
Presumably changes in the actual stat calculation can be/have been tested.

spxiwh · 2022-08-25T14:44:51Z

I am not going to learn about cython quickly enough to meaningfully review most of this, but see one comment on a non-cython code change that looks a bit weird. Presumably changes in the actual stat calculation can be/have been tested.

@tdent I just want to ensure that the people [who I think are] the authors of this code are alerted to the change. Functionally it should not change anything, but if there are any edge cases that unittests aren't covering, it's possible for bugs to creep in. So if there are unittests that you'd like to be included as part of checking this, please let me know! This seems to be passing the unittests (but I haven't yet checked why the search is failing).

spxiwh · 2022-08-25T14:51:32Z

@spxiwh One thing I see is the use of a 'cdef' block rather than explicit declaration. Is there anything else you are thinking about? Maybe you are referring to writing new functions only with the buffer interface? I think we've done that for most new functions at this point, I hadn't realized that the cluster function wasn't using it. Did I miss any key differences?

@ahnitz I think maybe 3/4 things I wanted to highlight:

The cdef block is cute and helps to make declarations clear.
Use the buffer protocol, not the numpy cython object. I hadn't really understood why that's a bad idea before, but I'm told that (at least with python 3.7+) it's better to just use the Cython memoryview object. I think in any cython code I've written for PyCBC I've used the cimport numpy Cython object (unless I'm interfacing to native C code).
Don't confuse the import numpy and cimport numpy. Use different names for these so that we know when C objects are being used, and when python objects are in use (although I think I ended up removing the cimport numpy in the end ...).
I also found it to be a bad idea to do things like x = numpy.zeros(BLAH); LOTS_OF_WORK; return x in a def cython function. It seems better to declare x outside of the cython code and then send it to the cython function without returning anything. (Especially beneficial if the x array can then be cached).

spxiwh · 2022-08-25T15:00:52Z

... The search failure does seem genuine and would require what I think is a small patch to the all-sky coinc front-end code.

spxiwh · 2022-08-26T08:50:40Z

The patch was even smaller than I expected (cython cares if you set what should be a float to an int!) All tests working now.

pycbc/events/eventmgr_cython.pyx

tdent · 2022-08-30T11:38:36Z

Thanks, I made a minor edit - concerning PR approval I think someone with cython experience should do that, so I won't.

spxiwh · 2022-09-09T16:21:20Z

@titodalcanton I would like to proceed with this one now that the unit test is in place. I have made two minor changes to the unit test here:

The computation of the statistic can now be different at the numerical precision level because of order-of-operations. So I only check that the values are almost the same (to 1 part in 10^6)
The statistic uses bins for time difference etc. If a trigger falls on a different bin due to precision effects, the absolute difference becomes much larger. This should be rare, but because we store end-times as floats, we are only accurate to around microsecond precision, with time bins at millisecond .... So it was actually quite common that a trigger ended up in a different time bin due to numerical issues. This concern isn't present if we discretely sample time (once every 4096 of a second is fine), because then the time accuracy seems to be enough that we don't get triggers with time delays of almost exactly 1ms (or 2ms or ....).

pycbc/events/stat.py

titodalcanton · 2022-09-19T08:29:26Z

@spxiwh I have no objections apart from the minor comments above, and the following one here: in your last comment, you said the sample rate could be an issue. Is that only relevant to the unit test? Please confirm that the early-warning analysis, which uses a much lower sample rate than the 4096 Hz you mentioned, will behave correctly.

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

spxiwh · 2022-09-19T09:10:19Z

@titodalcanton To clarify: The sample rate being a concern is nothing to do with this patch. It's just that it could cause large numerical differences if numerical precision caused an event to be in a different "bin".

With a sample rate of 4096 the delta_t is larger than the precision, and so this doesn't happen (probably this was considered when casting time to a double). If the sample rate is smaller (larger delta_t) the numerical precision becomes even smaller compared to delta_t and so this is not a problem unless sample rate becomes larger.

... I also note that this isn't really a problem. If the time delay between two signals is exactly 0.02 s do you put something in the 0.01 - 0.02 s time delay bin, or the 0.02 - 0.03s time delay bin? Both choices would be right (but 0.02 also isn't possible with our sample rates anyway).

spxiwh · 2022-09-19T11:28:46Z

Thanks @titodalcanton! Can we move through other PRs in the optimization series in the next day or two as well?

* DOc additions and Tom-suggested changes * Stat restructure for profiling DO NOT MERGE * More test changes * Finish cython-izing logsignalrate * Remove test function * Remove unused variable * Update pycbc_coinc_findtrigs * Update eventmgr_cython.pyx * minor rewrite for less chatty comment * FIx modulus issue * Resolve rebase issue * Update test * Update pycbc/events/stat.py Co-authored-by: Tito Dal Canton <tito@dalcanton.it> Co-authored-by: Thomas Dent <thomas.dent@usc.es> Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

spxiwh requested review from ahnitz, tdent and titodalcanton August 23, 2022 13:49

titodalcanton added the low latency label Aug 23, 2022

tdent reviewed Aug 25, 2022

View reviewed changes

tdent reviewed Aug 29, 2022

View reviewed changes

pycbc/events/eventmgr_cython.pyx Outdated Show resolved Hide resolved

tdent reviewed Aug 29, 2022

View reviewed changes

pycbc/events/eventmgr_cython.pyx Show resolved Hide resolved

spxiwh force-pushed the pr_cythonize_logsignalrate branch from 734be71 to 215d548 Compare September 9, 2022 10:40

spxiwh and others added 12 commits September 9, 2022 07:57

DOc additions and Tom-suggested changes

b3b97f1

Stat restructure for profiling DO NOT MERGE

ba3509f

More test changes

4308404

Finish cython-izing logsignalrate

9b5e624

Remove test function

8b82264

Remove unused variable

76c441c

Update pycbc_coinc_findtrigs

79778bb

Update eventmgr_cython.pyx

30c7df9

minor rewrite for less chatty comment

7a80133

FIx modulus issue

7aadf89

Resolve rebase issue

b37c1d5

Update test

b70cba2

spxiwh force-pushed the pr_cythonize_logsignalrate branch from a190f7d to b70cba2 Compare September 9, 2022 16:16

titodalcanton reviewed Sep 19, 2022

View reviewed changes

pycbc/events/stat.py Show resolved Hide resolved

titodalcanton reviewed Sep 19, 2022

View reviewed changes

pycbc/events/stat.py Outdated Show resolved Hide resolved

Update pycbc/events/stat.py

e6e3b0b

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

titodalcanton approved these changes Sep 19, 2022

View reviewed changes

titodalcanton merged commit 5477f18 into gwastro:master Sep 19, 2022

spxiwh deleted the pr_cythonize_logsignalrate branch September 19, 2022 11:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cythonize logsignalrate #4114

Cythonize logsignalrate #4114

spxiwh commented Aug 23, 2022

ahnitz commented Aug 23, 2022

tdent Aug 25, 2022

spxiwh Aug 25, 2022

tdent Aug 29, 2022

tdent left a comment

spxiwh commented Aug 25, 2022

spxiwh commented Aug 25, 2022

spxiwh commented Aug 25, 2022

spxiwh commented Aug 26, 2022

tdent commented Aug 30, 2022

spxiwh commented Sep 9, 2022

titodalcanton commented Sep 19, 2022

spxiwh commented Sep 19, 2022

spxiwh commented Sep 19, 2022

Cythonize logsignalrate #4114

Cythonize logsignalrate #4114

Conversation

spxiwh commented Aug 23, 2022

ahnitz commented Aug 23, 2022

tdent Aug 25, 2022

Choose a reason for hiding this comment

spxiwh Aug 25, 2022

Choose a reason for hiding this comment

tdent Aug 29, 2022

Choose a reason for hiding this comment

tdent left a comment

Choose a reason for hiding this comment

spxiwh commented Aug 25, 2022

spxiwh commented Aug 25, 2022

spxiwh commented Aug 25, 2022

spxiwh commented Aug 26, 2022

tdent commented Aug 30, 2022

spxiwh commented Sep 9, 2022

titodalcanton commented Sep 19, 2022

spxiwh commented Sep 19, 2022

spxiwh commented Sep 19, 2022