Reorganizing stat.py #3535

spxiwh · 2020-11-25T15:34:43Z

stat.py has become a little bit difficult to work with due to the large amount of classes now being needed to implement even minor changes. I want to rework this according to the pasted image. So far I've implemented this basically as illustrated below, with only those classes. It's possible this could be compressed further, but perhaps this is better? Clearly this PR will not work alone as we will need to remove completely the old stat executables, and hook the new ones up to the renamed methods here. However, I think it's worth seeing first if this is how we want this to look. I think there are some design choices here (most of the questions in red/green in the image) which are not obviously clear.

There's also the question of how to handle singles. At the moment, as I am removing the "sngl_statistic" concept from stat.py there are two places singles can get information. Either via the single_multiifo method for getting a full ranking statistic, or via the get_ranking function in ranking.py to access anything that can be computed from the trigger alone. Are these two places sufficient? If only pycbc_multiifo_sngls_findtrigs uses the single_multiifo method, is it okay that the plotting codes cannot get to exp_fitted singles ranking? Is a requirement to hook that up in some way?

When implementing get_sngls_ranking_from_trigs I first thought of doing something complicated allowing arbitrary extensions to the statistic name (or kwargs) to turn off/on things like sgveto, cuts, psdvar etc. However as the PSDvar affects the chi-squared these things are not all nicely orthogonal. As there are only a small enough number of these rankings at the moment anyway, I did not do the more involved thing. Perhaps at some point we would need to swap over, but I think that could be done ... Or maybe it needs doing now??

tdent · 2020-11-26T19:15:05Z

Thanks Ian. Are the failing builds meaningful?

spxiwh · 2020-11-27T10:01:01Z

@tdent I would expect failures on this, as the changes here are going to require small updates in other parts of the code to match. I'll address those before removing the Work-In-Progress flag ... For now I just post this to ensure that anyone has a chance to speak up if you think this should be done differently.

GarethCabournDavies · 2020-11-27T10:25:58Z

pycbc/events/stat.py

-        the threshold for each of the input triggers.
+    # FIXME: Why does this function not just call straight into single?
+    # FIXME: Should this be renamed for clarity if multiifo is being removed?
+    def single_multiifo(self, single_info):


The single_multiifo() function is to generate a ranking statistic for a single-detector candidate which is equivalent to the ranking statistic of a coincident event.

This is distinct from the single() function as that is used as the intermediate step in the likelihood-based statistics. In sum-of-squares statistics, it is entirely equivalent.

I agree that a better name should be used for this. e.g. rank_stat_single (and coinc_multiifo/coinc renamed to rank_stat_coinc)

@GarethDaviesGW The naming is fine with me. Would you be okay with having rank_stat_single just call straight through to the single method and replace this:
https://github.com/gwastro/pycbc/blob/master/bin/hdfcoinc/pycbc_multiifo_sngls_findtrigs#L144
with just the one call?

Yeah that makes sense

I missed this comment. 'rank_stat_coinc' seems like an unnecessarily long name given that the module and classes in it and instances in the rest of the code already have the name 'ranking' or 'stat'. I'd suggest 'coinc_rank' and 'single_rank'.

Also if we're renaming things, how about renaming 'ranking.py' -> 'sngl_statistic.py' ?

I'm happy to rename these to whatever is most useful.

I would avoid single_rank to avoid confusion with the stuff defined in ranking.py. I'm not quite sure what the right words are to distinguish between "single detector ranking using only information from the trigger itself", which is being referred to as "single ranking" and "single detector ranking using bulk information and fits". Maybe coinc_stat and single_stat??

For similar reasons renaming ranking.py to sngl_statistic confuses in my head the distinction between these two things.

what I'm suggesting would end up with the logic that 'single detector number calculated using only trigger information' is a 'sngl stat(istic)' and things analogous to the coinc ranking stat which may use global information are called 'rank(ing)'
i.e. something calculated at single trigger label is always a 'statistic', but in general won't be a ranking.

and yes, this is a bit backwards on wrt the current naming of stat.py and ranking.py, but i think makes more sense. (also consistent with naming instances of a ranking-stat object as 'rank_method' in scripts)

spxiwh · 2020-12-03T16:01:12Z

This is closer now to something that works, but I'll try to separate off parts of this to get that merged first.

tdent · 2020-12-03T16:32:37Z

ok, sorry, i haven't managed to take much of a detailed look at this yet. presumably the thing to look at first is still ranking.py and stat.py.

spxiwh · 2020-12-03T16:41:21Z

@tdent Yeah, stat.py and ranking.py are the things to look at.

Really there are now 3 ways to get information here:

The coinc method of the stat class for coinc statistic
The single_multiifo (needs renaming) method of the stat class for single statistic (some followup codes e.g. minifollowups could call into this to decide on most interesting singles if they so desire).
The ranking function in ranking.py. This must be computable from the values contained in a single-detector trigger and nothing else.

I find this quite a nice breakdown of information (and the old sngl_statistic_dict is now removed, as Gareth basically duplicated it anyway with the single_multiifo method).

All statistics will have these methods, but in some cases they are not implemented so will fail if called (but one can always add them in if needed).

spxiwh · 2020-12-08T14:14:16Z

@ahnitz @tdent @GarethDaviesGW I've now rebased this against master. I have successfully tested this in a couple of end-to-end workflows now. I think I might still have a few loose ends to connect to the new code, but the main workflow runs and this is ready for review at this point.

Some things to highlight:

Obviously stat.py has had a major reorganization, but it still does the same things (excepting the removal of the old stat code).
When initializing statistics a new argument is now needed: the single-detector ranking. This avoids having to define 48 statistics to be able to do the various versions of PSD_Var on/off SgChisq on/off etc. The single-detector stuff in ranking.py could be reorganized if this starts to get out of hand, but stat.py can handle this.
A new command line group is added for stat.py. This makes it easier for things like the minifollowup code to call into this (very few codes could take the mchirp key-word added some time ago).
Single-detector statistics/rankings: If you want to rank/get statistic values for single-detector triggers you now have two choices. Either the stuff in ranking.py which just takes the trigger and must be usable even by pycbc_inspiral. Or the single-detector statistic as used in 2-OGC, the latter requires using the stat.py command line options.
This does necessitate changes to config files, and cannot be backwards compatible, but the changes are not too much.

tdent · 2020-12-09T12:01:32Z

@spxiwh thanks - I assume you mean "When initializing statistics a new argument is now needed".

spxiwh · 2020-12-09T13:13:53Z

@spxiwh thanks - I assume you mean "When initializing statistics a new argument is now needed".

Indeed, thanks .. .Fixed above.

spxiwh · 2020-12-09T13:32:09Z

I'm not now aware of any more changes that need to be added to this (other than deciding appropriate names for some of the methods as discussed above).

For the remaining CC points, I'll deal with the FIXME in a separate PR and the long line is better than splitting according to (https://black.now.sh/).

tdent · 2020-12-09T14:01:51Z

bin/pycbc_process_sngls

@@ -67,11 +62,12 @@ sngls = hdf.SingleDetTriggers(args.single_trig_file, args.bank_file,\

 if args.statname is not None:
    logging.info('Calculating stat')
-    statengine = stat.sngl_statistic_dict[args.statname](args.statistic_files)
-    #  FIXME: inefficient, as we are calculating the stat on all
+    statengine = stat.get_statistic_from_opts(args, [args.instrument])


can this be called 'rank_method' as in other scripts

pycbc/events/stat.py

tdent · 2020-12-09T14:49:25Z

@spxiwh actually, it might make more sense to merge this once the functionality / testing aspects have been dealt with and then do any renaming in a subsequent PR (which someone else could deal with).

tdent · 2020-12-09T14:53:54Z

@spxiwh I think rather than ValueError the placeholder methods of the base class should raise NotImplementedError

see https://docs.python.org/3/library/exceptions.html#Exception
"In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method"

tdent · 2020-12-09T16:54:32Z

pycbc/events/stat.py

-        be applied (unused in this statistic)
+        err_msg = "This function is a stub that should be overridden by the "
+        err_msg += "sub-classes. You shouldn't be seeing this error!"
+        raise ValueError(err_msg)


as noted in the main PR thread, this and analogous exceptions should preferably be NotImplementedErrors.

tdent

apart from the request to change the 'statengine' instance name to 'rank_method' for consistency, and change ValueError to NotImplementedError for stub methods, the changes look ok to me

spxiwh · 2020-12-09T17:03:19Z

Thanks @tdent I'll make the requested changes, and leave the renaming for a further commit as you suggest.

tdent · 2020-12-09T17:10:43Z

fwiw if you want to make pycbc generally into a '90-ish-character-long-line' codebase as opposed to '79-character-long-line' that should be taken up at a higher level. However in this case it looks like there's only actually one exceptionally long line, which is ignorable.

GarethCabournDavies · 2020-12-10T11:15:05Z

N.B: Just seen another bit of old-workflow-compatible code in add_statmap

titodalcanton · 2020-12-11T12:51:16Z

I am doing a quick test of PyCBC Live with this patch applied and the phasetd_newsnr coinc statistic, as the test in /examples/live does not use the PTA stat files. I switched the PyCBC Live call to use these options:

--sngl-ranking newsnr \
--ranking-statistic phasetd_newsnr \
--statistic-files /same/as/previous/working/tests.hdf \

I get this error:

  File "/somewhere/bin/pycbc_live", line 755, in <module>
    len(bank), args.analysis_chunk, list(combo)))
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/coinc.py", line 864, in from_cli
    **kwargs)
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/coinc.py", line 776, in __init__
    **kwargs
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/stat.py", line 310, in __init__
    ifos=ifos, **kwargs)
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/stat.py", line 58, in __init__
    f = h5py.File(filename, 'r')
  File "/somewhere/lib/python3.6/site-packages/h5py/_hl/files.py", line 395, in __init__
    name = filename_encode(name)
  File "/somewhere/lib/python3.6/site-packages/h5py/_hl/compat.py", line 111, in filename_encode
    filename = fspath(filename)
TypeError: expected str, bytes or os.PathLike object, not list

Apparently this happens because of this (this is output from a print() I added to pycbc_live just before instantiating the LiveCoincTimeslideBackgroundEstimator class, not actual Python code):

args.statistic_files = [['/home/pycbc.live/production/data/H1L1-PHASE_TIME_AMP_v2.hdf']]

Where does the list-of-lists comes from? I would expect a (correct) list of strings by looking at the add_argument() call. @spxiwh any idea?

titodalcanton · 2020-12-11T14:18:26Z

PyCBC Live now gets past the initialization and crashes as soon as some triggers show up:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/somewhere/lib/python3.6/site-packages/pycbc/pool.py", line 42, in _lockstep_fcn
    return fcn(args)
  File "/somewhere/bin/pycbc_live", line 766, in get_coinc
    r = c.add_singles(results)
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/coinc.py", line 1180, in add_singles
    _, coinc_results = self._find_coincs(results, ifos=valid_ifos)
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/coinc.py", line 1053, in _find_coincs
    [0, -1]
  File "/somewhere/lib/python3.6/site-packages/pycbc/events/stat.py", line 628, in rank_stat_coinc
    raise ValueError(err_msg)
ValueError: Sorry! No-one has implemented this method yet!

Am I right that you expect this error @spxiwh? Do you prefer to fix this in a followup PR?

titodalcanton · 2020-12-11T14:19:14Z

Also, FWIW maybe that ValueError should be a NotImplementedError?

spxiwh · 2020-12-11T14:35:26Z

@titodalcanton Yes, I expected this error. It should be fixed in a separate PR, as it would be adding new functionality, not migrating old. (I think PyCBC Live was running the old coinc code with an old version of the PT statistic, neither of which are supported now). Adding this back in should not take long, but I'm not quite sure what the rank_stat_coinc function should be here so I'm leaving the NotImplemented stuff to be added in as people need it.

... And yes, this should be a NotImplemented error. Tom already asked for that and I forgot it, I will change that.

bin/plotting/pycbc_plot_singles_vs_params

spxiwh · 2020-12-11T17:12:08Z

This should now pass all tests, and I think I've addressed all comments/feedback given.

ahnitz

Please merge once Travis / any final cc passes.

* First cut at breaking down stat module * Documentation * More documentation * More documentation * More documentation * Some more doc updates * More doc updating * Updating more docs * Updates * More updating * More updates * More updating stat.py * CodeClimate-ing * More codeclimating * Some more codeclimating * Update coinc code to use reorganized stat.py * Updating exes to new interface * Update eventmgr to new interface * Remove some old stuff from ranking.py * Remove old sngl_statistic concept * Updating hdf.py to new interface * Change name in exe * Fix to ranking.py * Typo fix in hdf.py * Typo in stat.py * Adding in some more missing info * Update in coinc_findtrigs * Missed a line in pycbc_plot_singles_vs_params * Rewire single_multiifo * Hook up mask_to_n_loudest properly * Add new option to sngl_minifollowup * Simplify sngls_findtrigs * Implement a stat_option_group for convenience * Add reminder FIXME * Fix typo * Fix another typo * Fix typo in fit_sngls_by_template * Fix bug in sngl_minifollowup * CodeClimating * Fix typo * Hook up pycbc_live * Fix typo * Some codeclimating * Rename coinc and single_multiifo as suggested * Live must use new interface * Typo * More typos * Fix another typo * More CodeClimating * Fix to pycbc_process_sngls * Update testing config files * Some minor renaming (remove "New") * Fix PyCBC Live interface * Make ValueErrors NotImplementedErrors * Fix nasty typo * Rename as requested by Tom * Hopefully final CodeClimating

spxiwh requested review from ahnitz and tdent November 25, 2020 15:34

GarethCabournDavies reviewed Nov 27, 2020

View reviewed changes

This was referenced Dec 3, 2020

Move hdfcoinc plotting executables to plotting #3543

Merged

Remove old coinc workflow #3544

Merged

This was referenced Dec 4, 2020

Remove old coincidence code #3545

Merged

More removal of old coinc code #3547

Merged

spxiwh force-pushed the pr_stat_reworking branch from 0517eee to db85940 Compare December 8, 2020 14:05

tdent reviewed Dec 9, 2020

View reviewed changes

pycbc/events/stat.py Show resolved Hide resolved

tdent reviewed Dec 9, 2020

View reviewed changes

pycbc/events/stat.py Show resolved Hide resolved

tdent reviewed Dec 9, 2020

View reviewed changes

pycbc/events/stat.py Show resolved Hide resolved

tdent reviewed Dec 9, 2020

View reviewed changes

tdent requested changes Dec 9, 2020

View reviewed changes

ahnitz mentioned this pull request Dec 10, 2020

add search workflow run with condor + pegasus to github actions #3554

Merged

First cut at breaking down stat module

584186c

spxiwh added 4 commits December 11, 2020 10:28

More typos

1ea72e3

Fix another typo

4f5327e

More CodeClimating

09c2a0b

Fix to pycbc_process_sngls

015d664

spxiwh force-pushed the pr_stat_reworking branch from b57337e to 015d664 Compare December 11, 2020 10:29

Update testing config files

4dd89ed

spxiwh added 2 commits December 11, 2020 12:52

Some minor renaming (remove "New")

9fb5a30

Fix PyCBC Live interface

9225761

Make ValueErrors NotImplementedErrors

330df48

titodalcanton reviewed Dec 11, 2020

View reviewed changes

bin/plotting/pycbc_plot_singles_vs_params Show resolved Hide resolved

spxiwh added 2 commits December 11, 2020 17:05

Fix nasty typo

ebe686f

Rename as requested by Tom

3c703cb

spxiwh changed the title ~~[WIP] Reorganizing stat.py~~ Reorganizing stat.py Dec 11, 2020

Hopefully final CodeClimating

c6dc06c

ahnitz approved these changes Dec 14, 2020

View reviewed changes

tdent approved these changes Dec 14, 2020

View reviewed changes

spxiwh merged commit bde15e7 into gwastro:master Dec 14, 2020

spxiwh deleted the pr_stat_reworking branch December 14, 2020 16:01

spxiwh mentioned this pull request Dec 14, 2020

Add getitem to singledettrigs #3558

Merged

titodalcanton mentioned this pull request Dec 15, 2020

Statistic calculations are inflexible #2593

Closed

This was referenced Jan 11, 2021

PyCBC Live test: generate PTA files and use a PTA-based ranking #3580

Closed

PyCBC Live: re-enable the O3 coinc ranking #3584

Closed

titodalcanton mentioned this pull request Jan 26, 2021

Reimplement PhaseTDStatistic #3596

Merged

spxiwh mentioned this pull request Aug 31, 2022

Fix serious bug(?) in PyCBC Live background estimate #4120

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganizing stat.py #3535

Reorganizing stat.py #3535

spxiwh commented Nov 25, 2020

tdent commented Nov 26, 2020

spxiwh commented Nov 27, 2020

GarethCabournDavies Nov 27, 2020

spxiwh Nov 27, 2020

GarethCabournDavies Nov 27, 2020

tdent Dec 9, 2020

spxiwh Dec 9, 2020

spxiwh Dec 9, 2020

tdent Dec 9, 2020

tdent Dec 9, 2020

spxiwh commented Dec 3, 2020

tdent commented Dec 3, 2020

spxiwh commented Dec 3, 2020

spxiwh commented Dec 8, 2020 •

edited

Loading

tdent commented Dec 9, 2020

spxiwh commented Dec 9, 2020

spxiwh commented Dec 9, 2020

tdent Dec 9, 2020

tdent commented Dec 9, 2020

tdent commented Dec 9, 2020

tdent Dec 9, 2020

tdent left a comment

spxiwh commented Dec 9, 2020

tdent commented Dec 9, 2020

GarethCabournDavies commented Dec 10, 2020

titodalcanton commented Dec 11, 2020

titodalcanton commented Dec 11, 2020

titodalcanton commented Dec 11, 2020

spxiwh commented Dec 11, 2020

spxiwh commented Dec 11, 2020

ahnitz left a comment

Reorganizing stat.py #3535

Reorganizing stat.py #3535

Conversation

spxiwh commented Nov 25, 2020

tdent commented Nov 26, 2020

spxiwh commented Nov 27, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spxiwh commented Dec 3, 2020

tdent commented Dec 3, 2020

spxiwh commented Dec 3, 2020

spxiwh commented Dec 8, 2020 • edited Loading

tdent commented Dec 9, 2020

spxiwh commented Dec 9, 2020

spxiwh commented Dec 9, 2020

Choose a reason for hiding this comment

tdent commented Dec 9, 2020

tdent commented Dec 9, 2020

Choose a reason for hiding this comment

tdent left a comment

Choose a reason for hiding this comment

spxiwh commented Dec 9, 2020

tdent commented Dec 9, 2020

GarethCabournDavies commented Dec 10, 2020

titodalcanton commented Dec 11, 2020

titodalcanton commented Dec 11, 2020

titodalcanton commented Dec 11, 2020

spxiwh commented Dec 11, 2020

spxiwh commented Dec 11, 2020

ahnitz left a comment

Choose a reason for hiding this comment

spxiwh commented Dec 8, 2020 •

edited

Loading