Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cythonize logsignalrate #4114

Merged
merged 13 commits into from
Sep 19, 2022
Merged

Conversation

spxiwh
Copy link
Contributor

@spxiwh spxiwh commented Aug 23, 2022

I've been spending quite a bit of my free time in the last few weeks trying to optimize PyCBC Live so that it does better keeping up with data, and hopefully reduce latency when it is keeping up.

One bottleneck we've noticed is the logsignalrate function. In the offline all-sky search this function is sent many triggers every call, and was optimized for that. However, in the online search it is called many more times with orders of magnitude fewer triggers. This requires some differences in how we optimize this code (while not impacting on the offline use case).

Cython seems the natural solution here. Cython is not slower when doing operations on arrays with small numbers of components, is just as fast when there's lots of components, and we can change range to prange if we want to use openmpi to speed things up further in the "lots of components" case.

Additionally I have reduced the number of numpy arrays being assigned on the fly (important in particular when being called often, with arrays containing relative few events).

This is still being tested, and I want to get a callgraph comparison of this running at scale, but wanted to post this now for any immediate comments/complaints.

(@ahnitz You may notice a difference in style for the Cython code. I read a good book on Cython practices in the last few weeks, and we do a few things suboptimally in places, in particular with regard to numpy arrays. I will try to improve the other cython code in this regard in the future)

@ahnitz
Copy link
Member

ahnitz commented Aug 23, 2022

@spxiwh One thing I see is the use of a 'cdef' block rather than explicit declaration. Is there anything else you are thinking about? Maybe you are referring to writing new functions only with the buffer interface? I think we've done that for most new functions at this point, I hadn't realized that the cluster function wasn't using it. Did I miss any key differences?

# Scale by signal population SNR
rate[rtype] *= (sref / self.ref_snr) ** -4.0
# Scale by signal population SNR
rate[rtype] *= (sref[rtype] / self.ref_snr) ** -4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not seeing what this change is doing .. & how is sref an object that can be indexed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdent This isn't really a change at all. For the two-detector case this call is now done in the cython code, so this call is moved up a level to only be done in the [unoptimized] three-detector code. The [rtype] index was always there, if you compare line 516 (old code) to 536 (new code) you see that I am now not applying the rtype index so it just moves to here.

sref is an array (it stores the SNR in the reference ifo for all triggers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this is inside the loop over ifos.

Copy link
Contributor

@tdent tdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not going to learn about cython quickly enough to meaningfully review most of this, but see one comment on a non-cython code change that looks a bit weird.
Presumably changes in the actual stat calculation can be/have been tested.

@spxiwh
Copy link
Contributor Author

spxiwh commented Aug 25, 2022

I am not going to learn about cython quickly enough to meaningfully review most of this, but see one comment on a non-cython code change that looks a bit weird. Presumably changes in the actual stat calculation can be/have been tested.

@tdent I just want to ensure that the people [who I think are] the authors of this code are alerted to the change. Functionally it should not change anything, but if there are any edge cases that unittests aren't covering, it's possible for bugs to creep in. So if there are unittests that you'd like to be included as part of checking this, please let me know! This seems to be passing the unittests (but I haven't yet checked why the search is failing).

@spxiwh
Copy link
Contributor Author

spxiwh commented Aug 25, 2022

@spxiwh One thing I see is the use of a 'cdef' block rather than explicit declaration. Is there anything else you are thinking about? Maybe you are referring to writing new functions only with the buffer interface? I think we've done that for most new functions at this point, I hadn't realized that the cluster function wasn't using it. Did I miss any key differences?

@ahnitz I think maybe 3/4 things I wanted to highlight:

  • The cdef block is cute and helps to make declarations clear.
  • Use the buffer protocol, not the numpy cython object. I hadn't really understood why that's a bad idea before, but I'm told that (at least with python 3.7+) it's better to just use the Cython memoryview object. I think in any cython code I've written for PyCBC I've used the cimport numpy Cython object (unless I'm interfacing to native C code).
  • Don't confuse the import numpy and cimport numpy. Use different names for these so that we know when C objects are being used, and when python objects are in use (although I think I ended up removing the cimport numpy in the end ...).
  • I also found it to be a bad idea to do things like x = numpy.zeros(BLAH); LOTS_OF_WORK; return x in a def cython function. It seems better to declare x outside of the cython code and then send it to the cython function without returning anything. (Especially beneficial if the x array can then be cached).

@spxiwh
Copy link
Contributor Author

spxiwh commented Aug 25, 2022

... The search failure does seem genuine and would require what I think is a small patch to the all-sky coinc front-end code.

@spxiwh
Copy link
Contributor Author

spxiwh commented Aug 26, 2022

The patch was even smaller than I expected (cython cares if you set what should be a float to an int!) All tests working now.

@tdent
Copy link
Contributor

tdent commented Aug 30, 2022

Thanks, I made a minor edit - concerning PR approval I think someone with cython experience should do that, so I won't.

@spxiwh spxiwh force-pushed the pr_cythonize_logsignalrate branch from 734be71 to 215d548 Compare September 9, 2022 10:40
@spxiwh spxiwh force-pushed the pr_cythonize_logsignalrate branch from a190f7d to b70cba2 Compare September 9, 2022 16:16
@spxiwh
Copy link
Contributor Author

spxiwh commented Sep 9, 2022

@titodalcanton I would like to proceed with this one now that the unit test is in place. I have made two minor changes to the unit test here:

  • The computation of the statistic can now be different at the numerical precision level because of order-of-operations. So I only check that the values are almost the same (to 1 part in 10^6)
  • The statistic uses bins for time difference etc. If a trigger falls on a different bin due to precision effects, the absolute difference becomes much larger. This should be rare, but because we store end-times as floats, we are only accurate to around microsecond precision, with time bins at millisecond .... So it was actually quite common that a trigger ended up in a different time bin due to numerical issues. This concern isn't present if we discretely sample time (once every 4096 of a second is fine), because then the time accuracy seems to be enough that we don't get triggers with time delays of almost exactly 1ms (or 2ms or ....).

pycbc/events/stat.py Outdated Show resolved Hide resolved
@titodalcanton
Copy link
Contributor

@spxiwh I have no objections apart from the minor comments above, and the following one here: in your last comment, you said the sample rate could be an issue. Is that only relevant to the unit test? Please confirm that the early-warning analysis, which uses a much lower sample rate than the 4096 Hz you mentioned, will behave correctly.

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
@spxiwh
Copy link
Contributor Author

spxiwh commented Sep 19, 2022

@titodalcanton To clarify: The sample rate being a concern is nothing to do with this patch. It's just that it could cause large numerical differences if numerical precision caused an event to be in a different "bin".

With a sample rate of 4096 the delta_t is larger than the precision, and so this doesn't happen (probably this was considered when casting time to a double). If the sample rate is smaller (larger delta_t) the numerical precision becomes even smaller compared to delta_t and so this is not a problem unless sample rate becomes larger.

... I also note that this isn't really a problem. If the time delay between two signals is exactly 0.02 s do you put something in the 0.01 - 0.02 s time delay bin, or the 0.02 - 0.03s time delay bin? Both choices would be right (but 0.02 also isn't possible with our sample rates anyway).

@titodalcanton titodalcanton merged commit 5477f18 into gwastro:master Sep 19, 2022
@spxiwh
Copy link
Contributor Author

spxiwh commented Sep 19, 2022

Thanks @titodalcanton! Can we move through other PRs in the optimization series in the next day or two as well?

@spxiwh spxiwh deleted the pr_cythonize_logsignalrate branch September 19, 2022 11:28
connor-mcisaac pushed a commit to connor-mcisaac/pycbc that referenced this pull request Oct 12, 2022
* DOc additions and Tom-suggested changes

* Stat restructure for profiling DO NOT MERGE

* More test changes

* Finish cython-izing logsignalrate

* Remove test function

* Remove unused variable

* Update pycbc_coinc_findtrigs

* Update eventmgr_cython.pyx

* minor rewrite for less chatty comment

* FIx modulus issue

* Resolve rebase issue

* Update test

* Update pycbc/events/stat.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

Co-authored-by: Thomas Dent <thomas.dent@usc.es>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
acorreia61201 pushed a commit to acorreia61201/pycbc that referenced this pull request Apr 4, 2024
* DOc additions and Tom-suggested changes

* Stat restructure for profiling DO NOT MERGE

* More test changes

* Finish cython-izing logsignalrate

* Remove test function

* Remove unused variable

* Update pycbc_coinc_findtrigs

* Update eventmgr_cython.pyx

* minor rewrite for less chatty comment

* FIx modulus issue

* Resolve rebase issue

* Update test

* Update pycbc/events/stat.py

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

Co-authored-by: Thomas Dent <thomas.dent@usc.es>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants