-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganizing stat.py #3535
Reorganizing stat.py #3535
Conversation
Thanks Ian. Are the failing builds meaningful? |
@tdent I would expect failures on this, as the changes here are going to require small updates in other parts of the code to match. I'll address those before removing the Work-In-Progress flag ... For now I just post this to ensure that anyone has a chance to speak up if you think this should be done differently. |
pycbc/events/stat.py
Outdated
the threshold for each of the input triggers. | ||
# FIXME: Why does this function not just call straight into single? | ||
# FIXME: Should this be renamed for clarity if multiifo is being removed? | ||
def single_multiifo(self, single_info): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single_multiifo() function is to generate a ranking statistic for a single-detector candidate which is equivalent to the ranking statistic of a coincident event.
This is distinct from the single() function as that is used as the intermediate step in the likelihood-based statistics. In sum-of-squares statistics, it is entirely equivalent.
I agree that a better name should be used for this. e.g. rank_stat_single (and coinc_multiifo/coinc renamed to rank_stat_coinc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GarethDaviesGW The naming is fine with me. Would you be okay with having rank_stat_single
just call straight through to the single
method and replace this:
https://github.com/gwastro/pycbc/blob/master/bin/hdfcoinc/pycbc_multiifo_sngls_findtrigs#L144
with just the one call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this comment. 'rank_stat_coinc' seems like an unnecessarily long name given that the module and classes in it and instances in the rest of the code already have the name 'ranking' or 'stat'. I'd suggest 'coinc_rank' and 'single_rank'.
Also if we're renaming things, how about renaming 'ranking.py' -> 'sngl_statistic.py' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to rename these to whatever is most useful.
I would avoid single_rank
to avoid confusion with the stuff defined in ranking.py
. I'm not quite sure what the right words are to distinguish between "single detector ranking using only information from the trigger itself", which is being referred to as "single ranking" and "single detector ranking using bulk information and fits". Maybe coinc_stat
and single_stat
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For similar reasons renaming ranking.py to sngl_statistic
confuses in my head the distinction between these two things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I'm suggesting would end up with the logic that 'single detector number calculated using only trigger information' is a 'sngl stat(istic)' and things analogous to the coinc ranking stat which may use global information are called 'rank(ing)'
i.e. something calculated at single trigger label is always a 'statistic', but in general won't be a ranking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, this is a bit backwards on wrt the current naming of stat.py and ranking.py, but i think makes more sense. (also consistent with naming instances of a ranking-stat object as 'rank_method' in scripts)
This is closer now to something that works, but I'll try to separate off parts of this to get that merged first. |
ok, sorry, i haven't managed to take much of a detailed look at this yet. presumably the thing to look at first is still ranking.py and stat.py. |
@tdent Yeah, Really there are now 3 ways to get information here:
I find this quite a nice breakdown of information (and the old All statistics will have these methods, but in some cases they are not implemented so will fail if called (but one can always add them in if needed). |
0517eee
to
db85940
Compare
@ahnitz @tdent @GarethDaviesGW I've now rebased this against master. I have successfully tested this in a couple of end-to-end workflows now. I think I might still have a few loose ends to connect to the new code, but the main workflow runs and this is ready for review at this point. Some things to highlight:
|
@spxiwh thanks - I assume you mean "When initializing statistics a new argument is now needed". |
Indeed, thanks .. .Fixed above. |
I'm not now aware of any more changes that need to be added to this (other than deciding appropriate names for some of the methods as discussed above). For the remaining CC points, I'll deal with the FIXME in a separate PR and the long line is better than splitting according to (https://black.now.sh/). |
bin/pycbc_process_sngls
Outdated
@@ -67,11 +62,12 @@ sngls = hdf.SingleDetTriggers(args.single_trig_file, args.bank_file,\ | |||
|
|||
if args.statname is not None: | |||
logging.info('Calculating stat') | |||
statengine = stat.sngl_statistic_dict[args.statname](args.statistic_files) | |||
# FIXME: inefficient, as we are calculating the stat on all | |||
statengine = stat.get_statistic_from_opts(args, [args.instrument]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be called 'rank_method' as in other scripts
@spxiwh actually, it might make more sense to merge this once the functionality / testing aspects have been dealt with and then do any renaming in a subsequent PR (which someone else could deal with). |
@spxiwh I think rather than
|
pycbc/events/stat.py
Outdated
be applied (unused in this statistic) | ||
err_msg = "This function is a stub that should be overridden by the " | ||
err_msg += "sub-classes. You shouldn't be seeing this error!" | ||
raise ValueError(err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as noted in the main PR thread, this and analogous exceptions should preferably be NotImplementedError
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from the request to change the 'statengine' instance name to 'rank_method' for consistency, and change ValueError to NotImplementedError for stub methods, the changes look ok to me
Thanks @tdent I'll make the requested changes, and leave the renaming for a further commit as you suggest. |
fwiw if you want to make pycbc generally into a '90-ish-character-long-line' codebase as opposed to '79-character-long-line' that should be taken up at a higher level. However in this case it looks like there's only actually one exceptionally long line, which is ignorable. |
N.B: Just seen another bit of old-workflow-compatible code in add_statmap |
b57337e
to
015d664
Compare
I am doing a quick test of PyCBC Live with this patch applied and the
I get this error:
Apparently this happens because of this (this is output from a
Where does the list-of-lists comes from? I would expect a (correct) list of strings by looking at the |
PyCBC Live now gets past the initialization and crashes as soon as some triggers show up:
Am I right that you expect this error @spxiwh? Do you prefer to fix this in a followup PR? |
Also, FWIW maybe that |
@titodalcanton Yes, I expected this error. It should be fixed in a separate PR, as it would be adding new functionality, not migrating old. (I think PyCBC Live was running the old coinc code with an old version of the PT statistic, neither of which are supported now). Adding this back in should not take long, but I'm not quite sure what the rank_stat_coinc function should be here so I'm leaving the NotImplemented stuff to be added in as people need it. ... And yes, this should be a NotImplemented error. Tom already asked for that and I forgot it, I will change that. |
This should now pass all tests, and I think I've addressed all comments/feedback given. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please merge once Travis / any final cc passes.
* First cut at breaking down stat module * Documentation * More documentation * More documentation * More documentation * Some more doc updates * More doc updating * Updating more docs * Updates * More updating * More updates * More updating stat.py * CodeClimate-ing * More codeclimating * Some more codeclimating * Update coinc code to use reorganized stat.py * Updating exes to new interface * Update eventmgr to new interface * Remove some old stuff from ranking.py * Remove old sngl_statistic concept * Updating hdf.py to new interface * Change name in exe * Fix to ranking.py * Typo fix in hdf.py * Typo in stat.py * Adding in some more missing info * Update in coinc_findtrigs * Missed a line in pycbc_plot_singles_vs_params * Rewire single_multiifo * Hook up mask_to_n_loudest properly * Add new option to sngl_minifollowup * Simplify sngls_findtrigs * Implement a stat_option_group for convenience * Add reminder FIXME * Fix typo * Fix another typo * Fix typo in fit_sngls_by_template * Fix bug in sngl_minifollowup * CodeClimating * Fix typo * Hook up pycbc_live * Fix typo * Some codeclimating * Rename coinc and single_multiifo as suggested * Live must use new interface * Typo * More typos * Fix another typo * More CodeClimating * Fix to pycbc_process_sngls * Update testing config files * Some minor renaming (remove "New") * Fix PyCBC Live interface * Make ValueErrors NotImplementedErrors * Fix nasty typo * Rename as requested by Tom * Hopefully final CodeClimating
stat.py
has become a little bit difficult to work with due to the large amount of classes now being needed to implement even minor changes. I want to rework this according to the pasted image. So far I've implemented this basically as illustrated below, with only those classes. It's possible this could be compressed further, but perhaps this is better? Clearly this PR will not work alone as we will need to remove completely the old stat executables, and hook the new ones up to the renamed methods here. However, I think it's worth seeing first if this is how we want this to look. I think there are some design choices here (most of the questions in red/green in the image) which are not obviously clear.There's also the question of how to handle singles. At the moment, as I am removing the "sngl_statistic" concept from
stat.py
there are two places singles can get information. Either via thesingle_multiifo
method for getting a full ranking statistic, or via theget_ranking
function inranking.py
to access anything that can be computed from the trigger alone. Are these two places sufficient? If onlypycbc_multiifo_sngls_findtrigs
uses thesingle_multiifo
method, is it okay that the plotting codes cannot get to exp_fitted singles ranking? Is a requirement to hook that up in some way?When implementing
get_sngls_ranking_from_trigs
I first thought of doing something complicated allowing arbitrary extensions to the statistic name (or kwargs) to turn off/on things like sgveto, cuts, psdvar etc. However as the PSDvar affects the chi-squared these things are not all nicely orthogonal. As there are only a small enough number of these rankings at the moment anyway, I did not do the more involved thing. Perhaps at some point we would need to swap over, but I think that could be done ... Or maybe it needs doing now??