Add fraction of TPC clusters used for PID to AO2D #13417

mpuccio · 2024-08-21T20:03:33Z

We add this fraction as a binned variable contained in the track flags (like the PID in tracking).
@shahor02 this is how I would implement the change we discussed before.
@ddobrigk @pzhristov @jgrosseo please have a look: we need this information to clean up the dE/dx signal following the change of the dE/dx calculation in the reconstruction, which now excludes the clusters close to the sector boundaries. A few open points:

this is far from elegant, but it avoids changing the data model
My naive guess is that 12.5% is a good enough binning, but maybe @wiechula can tell us if this is indeed the case

I am still testing, so I am opening this as a draft.
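To make the idea concrete, here is a minimal sketch of storing a fraction in 12.5% bins inside a 32-bit flags word. It is not the code of this PR: the bit position (`kFracShift`), the field width, the function names and the choice of decoding to the bin centre are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical layout: three bits at the top of the flags word.
// The actual bits used in the PR are not shown in this thread.
constexpr uint32_t kFracBits = 3;
constexpr uint32_t kFracShift = 29;
constexpr uint32_t kFracLevels = 1u << kFracBits; // 8 bins of width 0.125
constexpr uint32_t kFracMask = (kFracLevels - 1u) << kFracShift;

// Store a fraction in [0, 1] as a 3-bit bin index, leaving all other bits untouched.
inline uint32_t packFraction(uint32_t flags, float fraction)
{
  const float clamped = fraction < 0.f ? 0.f : (fraction > 1.f ? 1.f : fraction);
  uint32_t code = static_cast<uint32_t>(clamped * kFracLevels);
  if (code >= kFracLevels) {
    code = kFracLevels - 1u; // fraction == 1 falls into the last bin
  }
  return (flags & ~kFracMask) | (code << kFracShift);
}

// Decode to the bin centre, so the error is at most half a bin (0.0625).
inline float unpackFraction(uint32_t flags)
{
  const uint32_t code = (flags & kFracMask) >> kFracShift;
  return (static_cast<float>(code) + 0.5f) / static_cast<float>(kFracLevels);
}
```

With this assumed encoding, a stored value never differs from the true fraction by more than 0.0625, and the bits outside `kFracMask` keep whatever the other flags put there.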


github-actions · 2024-08-21T20:03:47Z

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass3
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0

Detectors/AOD/src/AODProducerWorkflowSpec.cxx

ddobrigk · 2024-08-23T09:21:38Z

Hi @mpuccio , thanks a lot! I have a few comments:

1. Indeed, we had better be sure that a certain resolution allows us to do what we want before making a change to the data model (to avoid a scenario in which we actually have to re-adjust later on).
2. This is maybe a predictable question, but ... did you check the data volume increase? Since this is heavily rounded I guess it's small, but I have to ask for completeness.
3. As a general remark: I share your dislike of converters and - as I have stated in the past - I believe we should improve and automate that. However, I am not so sure that the general idea of using existing variables to pack information in not-so-intuitive ways (e.g. packing a float inside a flags column) is a good way out: it makes the data model needlessly complicated. In addition, relying on dynamics for something like a track property (such as minFractionOfTPCclustersForPID) goes against using Filters on them, since dynamics are not filterable. Since tracks, in particular, constitute the bulk of our data, that's a pretty relevant drawback - and in this case, you're making a naive guess (in your words ;-) ) that a certain precision is OK just to make sure the data fits in that flags column. That's all a little scary to me...

Having all this in mind: I will not really object to carrying on in this particular instance provided (important!) that we make sure the precision is reasonable, but still, let's please restrict the 'repurposing' of columns to a bare minimum in the future. Instead, let's please discuss how to improve the way we deal with versioning and converters (tagging also @ktf @pzhristov for that). Thank you!

alibuild · 2024-08-25T13:40:05Z

Error while checking build/O2/fullCI for e714151 at 2024-09-25 15:06:

## sw/BUILD/Rivet-latest/log
make[2]: *** [Makefile:544: core.cpp] Error 127
make[1]: *** [Makefile:440: all-recursive] Error 1
make: *** [Makefile:561: all-recursive] Error 1

Full log here.

mpuccio · 2024-08-26T07:04:40Z

Hi @ddobrigk,
sorry, I only saw your comments now. I will try to reply to each point:

1. I will let Jens comment, but for the nuclei analyses, which traditionally rely heavily on this cut, this is enough.
2. I will check this week.
3. It is true that dynamic columns can't be filtered; however, this is true in general for the number of clusters in the TPC (see https://github.com/AliceO2Group/AliceO2/blob/7a6665b55cb67b6b9dcd82fba4777496394e6b85/Framework/Core/include/Framework/AnalysisDataModel.h#L349C28-L349C40), so we are not worsening the situation. We already use the flags to store "non-flaggy" info like the PID in tracking; the main difference is that the information added here is a float, stored in binned form to save space. It is not the first time we pack floats in integers in the analysis (pidTiny), and indeed I am not a fan, but I dislike even more the idea of having a converter, especially because it will then be necessary for all the analyses running on the new datasets that we are producing. However, I can provide a comparison of the two in terms of disk usage, in case you prefer going with the explicit information.

Of course if the spawner, or any other process, could handle the conversions in one go, I would be happier.

Cheers,
Max

ddobrigk · 2024-08-26T07:57:53Z

Hi @mpuccio , thanks for the reply - I think I am just a bit curious about the data volume at this point but otherwise I will not go against this being merged, provided also Jens is fine with 12.5% resolution in the ratio you are saving (though let's still hear from @wiechula to be sure)

Just for completeness and for the general data model discussion, I would still like to provide replies to your point 3: while this is not the first time we pack info in a non-direct way, there is a certain new component to what you are doing when you pack a float into an existing Flag. It's different wrt the PID in tracking, because PID-in-tracking is definitely a discrete option (and therefore flag-like) instead of a float. It is also different wrt pidTiny, where the truncated variables were designed for proper packing in the first place and aren't used for other, unrelated purposes. Therefore, rigorously, even if we have packed info in various ways before, this PR is still doing something new on top as far as I can tell ;-)

In this context, when you say "we are not making it worse", well ... we actually are, because we're adding extra packed info and at the same time combining float bitpacking and variable type mixing into one single 32-bit column. Arguably, if we can do that, we can simply have bitwise columns for everything, and we'll never need any converter, but the raw data model will be rather difficult to read :-D. In that sense, what matters most to me is that this does not become a major trend, and that's why I pinged Peter/Giulio about possibilities for improving converters for the future. Maybe we can also touch base now in the Monday meeting - but still, no objection to merging this one, let's decouple the two discussions. Thank you!

P.S.: One last word about filterability: what I usually do when I want to do a Filter on TPC clusters is to write the expression for clusters (a simple subtraction, in that case) into the filter. Arguably, one could likewise create an expression (perhaps even provided centrally?) to bit-unpack the variable you've packed, and it could then be used in Filters too. That's actually a cool option to have filterability...
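A rough sketch of what such a selection could look like on the raw stored column, written in plain C++ rather than in the framework's expression language. Whether Filter expressions support shifts and masks is not stated in this thread; the bit position `kShift`, the 3-bit width and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical position and width of the packed fraction inside the flags word.
constexpr uint32_t kShift = 29;
constexpr uint32_t kCodeMask = 0x7u;

// The cut is applied directly on the stored integer: shift, mask, compare.
// No decoding to a float is needed to select on a minimum bin index.
inline bool passesMinCode(uint32_t flags, uint32_t minCode)
{
  return ((flags >> kShift) & kCodeMask) >= minCode;
}

// Keep only the flag words whose packed bin index is at least minCode.
inline std::vector<uint32_t> selectTracks(const std::vector<uint32_t>& flags, uint32_t minCode)
{
  std::vector<uint32_t> selected;
  std::copy_if(flags.begin(), flags.end(), std::back_inserter(selected),
               [minCode](uint32_t f) { return passesMinCode(f, minCode); });
  return selected;
}
```

The point of the sketch is only that the predicate depends on the static column alone, which is the property a filterable expression would need.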

wiechula · 2024-09-03T19:20:27Z

Dear @mpuccio , I'm not sure I fully understand the purpose of introducing this fraction. From the analysis side it is hard for me to comment, since I'm not in analysis any more. My guess is that this should be used to remove very bad quality tracks for which many clusters were not used for PID. There, I assume the resolution will do. If the intention is to use it in the n-Sigma estimate (instead of the tracking clusters used at the moment, since the PID clusters are not in AO2D), the +- 0.125/2 binning should result in a +-3% variation of the sigma estimate (since sigma goes with 1/sqrt(ncl_PID)). That also does not sound dramatic. There is anyhow some bias using the tracking clusters...
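The +-3% figure can be reproduced with a few lines, assuming only the scaling quoted above (sigma proportional to 1/sqrt(ncl_PID)) and a true fraction close to 1. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative change of a resolution that scales as 1/sqrt(n) when the cluster
// fraction is reported as (fraction + delta) instead of fraction:
//   sigma'/sigma - 1 = sqrt(fraction / (fraction + delta)) - 1
inline double relativeSigmaShift(double fraction, double delta)
{
  return std::sqrt(fraction / (fraction + delta)) - 1.0;
}
```

For a fraction of 1 and a half-bin offset of 0.0625 in either direction, the shift is close to 3% in magnitude. The same offset gives a larger relative shift when the true fraction is smaller, since to first order the shift is delta / (2 * fraction).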

Add fraction of TPC clusters used for PID to AO2D

80a7f35

We add this fraction as a binned variable contained in the track flags (like the PID in tracking).

shahor02 reviewed Aug 21, 2024

Detectors/AOD/src/AODProducerWorkflowSpec.cxx (outdated)

Use uint32_t for intermediate calculations

e714151

mpuccio marked this pull request as ready for review August 22, 2024 17:20

mpuccio requested review from a team as code owners August 22, 2024 17:20
