Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fraction of TPC clusters used for PID to AO2D #13417

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

mpuccio
Copy link
Contributor

@mpuccio mpuccio commented Aug 21, 2024

We add this fraction as a binned variable contained in the track flags (like the PID in tracking).
@shahor02 this is how I would implement the change we discuss before.
@ddobrigk @pzhristov @jgrosseo please have a look: we need this information to clean up the dE/dx signal following the change of dE/dx calculation in the reconstruction that exclude the clusters close to the sector boundaries. A few open points:

  • this is far from elegant, but it avoids changing the data model
  • My naive guess is that 12.5% is a good enough binning, but maybe @wiechula can tell us if this is indeed the case

I am still testing so I open in draft.

We add this fraction as a binned variable contained in the track flags (like the PID in tracking).
Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass3
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0

@mpuccio mpuccio marked this pull request as ready for review August 22, 2024 17:20
@mpuccio mpuccio requested review from a team as code owners August 22, 2024 17:20
@ddobrigk
Copy link
Contributor

Hi @mpuccio , thanks a lot! I have a few comments:

  • indeed better be sure that a certain resolution allows us to do what we want before making a change to the data model (to avoid a scenario in which we actually have to re-adjust later on)
  • This is maybe a predictable question, but ... did you check the data volume increase? Since this is heavily rounded I guess it's small but I have to ask for completeness.
  • As a general remark: I share your dislike of converters and - as I have stated in the past - I believe we should improve and automatize that. However, I am not so sure that the general idea of using existing variables to pack information in not so intuitive ways (e.g. packing a float inside a flags column) is a good way out: it makes the data model needlessly complicated. In addition, relying on dynamics for something like a track property (such as minFractionOfTPCclustersForPID) goes against using Filters on them, since dynamics are not filterable. Since tracks, in particular, constitute the bulk of our data, that's a pretty relevant drawback - and in this case, you're making a naive guess (in your words ;-) ) that a certain precision is OK just to make sure the data fits in that flags column. That's all a little scary to me...

Having all this in mind: I will not really object to carrying on in this particular instance provided (important!) that we make sure the precision is reasonable, but still, let's please restrict the 'repurposing' of columns to a bare minimum in the future. Instead, let's please discuss how to improve the way we deal with versioning and converters (tagging also @ktf @pzhristov for that). Thank you!

@alibuild
Copy link
Collaborator

alibuild commented Aug 25, 2024

Error while checking build/O2/fullCI for e714151 at 2024-09-25 15:06:

## sw/BUILD/Rivet-latest/log
make[2]: *** [Makefile:544: core.cpp] Error 127
make[1]: *** [Makefile:440: all-recursive] Error 1
make: *** [Makefile:561: all-recursive] Error 1

Full log here.

@mpuccio
Copy link
Contributor Author

mpuccio commented Aug 26, 2024

Hi @ddobrigk,
sorry, I saw only now your comments. I try to reply to each point:

  1. I let Jens comment, but for the analyses of nuclei that traditionally use a lot this cut this is enough
  2. I will check this week
  3. It is true that dynamic column can't be filtered, however this is true in general for the number of clusters in TPC (see https://github.com/AliceO2Group/AliceO2/blob/7a6665b55cb67b6b9dcd82fba4777496394e6b85/Framework/Core/include/Framework/AnalysisDataModel.h#L349C28-L349C40), so we are not worsening the situation. We already use the flags to store a "non-flaggy" info like the PID in tracking, the main difference is that the current information is stored as a float (to save space). It is not the first time we pack floats in integers in the analysis (pidTiny), and indeed I am not a fan, but I dislike even more the idea of having a converter, especially because this will then be necessary for all the analyses running on the new datasets that we are producing. However, I can provide a comparison in terms of performance of disk of the two, in case you prefer going with the explicit information.

Of course if the spawner, or any other process, could handle the conversions in one go, I would be happier.

Cheers,
Max

@ddobrigk
Copy link
Contributor

Hi @mpuccio , thanks for the reply - I think I am just a bit curious about the data volume at this point but otherwise I will not go against this being merged, provided also Jens is fine with 12.5% resolution in the ratio you are saving (though let's still hear from @wiechula to be sure)

Just for completeness and for the general data model discussion, I would still like to provide replies to your point 3: while this is not the first time we pack info in a non-direct way, there is a certain new component to what you are doing when you pack a float into an existing Flag. It's different wrt the PID in tracking, because PID-in-tracking is definitely a discrete option (and therefore flag-like) instead of a float. It is also different wrt pidTiny, where the truncated variables were designed for proper packing in the first place and aren't used for other, unrelated purposes. Therefore, rigorously, even if we have packed info in various ways before, this PR is still doing something new on top as far as I can tell ;-)

In this context, when you say "we are not making it worse", well ... we actually are, because we're adding extra packed info and at the same time combining float bitpacking and variable type mixing into one single 32-bit column. Arguably, if we can do that, we can simply have bitwise columns for everything, and we'll never need any converter but the raw data model will be rather difficult to read :-D. In that sense, what worries me most is that this does not become a major trend, and that's why I pinged Peter/Giulio about possibilities for improving converters for the future. Maybe we can also touch base now in the Monday meeting - but still, no objection to merging this one, let's decouple the two discussions. Thank you!

P.S.: One last word about filterability: what I usually do when I want to do a Filter on TPC clusters is to write the expression for clusters (a simple subtraction, in that case) into the filter, and arguably one could create an expression (perhaps even provided centrally?) to bit-unpack also the variable you've packed and it could be used in Filters too, in fact. That's actually a cool option, in fact, to have filterability...

@wiechula
Copy link
Collaborator

wiechula commented Sep 3, 2024

Dear @mpuccio , I'm not sure I fully understand the purpose of introducing this fraction. From analysis side it is hard for me to comment, since I'm not in analysis any more. My guess is this should be used to remove very bad quality tracks for which many clusters were not used for PID. There, I assume the resolution will do. If the intention is to use it in the n-Sigma estimate (instead of the tracking clusters used at the moment, since the PID clusters are not in AO2D), the +- 0.125/2 binning should result in +-3% variation of the sigma estimate (since sigma goes with 1/sqrt(ncl_PID)). Does also not sound dramatic. There is anyhow some bias using the tracking clusters...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants