Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve string->index mapping in Tau ID data format #30604

Closed
makortel opened this issue Jul 8, 2020 · 11 comments
Closed

Improve string->index mapping in Tau ID data format #30604

makortel opened this issue Jul 8, 2020 · 11 comments

Comments

@makortel
Copy link
Contributor

makortel commented Jul 8, 2020

I noticed (a bit accidentally via #30356) that that #28417 had introduced the use of provenance to map a Tau ID string to an index in an event product. While that is technically ok, the provenance system is not really meant for that. One downside of that approach is that a change in the producing module configuration structure leads to code changes elsewhere.

An alternative would be to store the string->index mapping in LuminosityBlock, in Run, or eventually in ProcessBlock.

This is certainly not urgent, but something to think about for longer term.

@makortel
Copy link
Contributor Author

makortel commented Jul 8, 2020

assign reconstruction

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

New categories assigned: reconstruction

@slava77,@perrotta,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jul 8, 2020

@swozniewski @mbluj
please take a note.
... not that we didn't consider this, if my memory is right

@mbluj
Copy link
Contributor

mbluj commented Jul 8, 2020

Yes, it was considered, but because of some reasons (@swozniewski should remember it better) provenance was chosen...

@swozniewski
Copy link
Contributor

Hi, I had a look into our email thread on this. For the record, I post the corresponding passage below. I see that Michal raised a corresponding question and Slava brought up provenance and adding some pros and cons for it or additionally storing the config instead. But we did not go much in detail about possibilities to store at higher levels than event level. Altogether I would say we arrived at using provenance because it is already stored anyway and therefore does not increase the file size while it should also be reasonably fast.
I think we were aware (not in the thread below) that changing the config keys of the concerned producer would induce changes in the modules using provenance but we did not see a reason why this config would need to change in future.

ME (summarizing the todo): - Indexing: Try string to value mapping at event level. This should by
well compressed by root, so let's try at least. (I would put this on
top
of the other changes in order to be able to roll back if necessary)

MICHAL: This sounds good. But, is there an easy to use way to store the names at
run or lumi-section level? Or even better at a sample level?
Discriminator / index names are properties of a given processing rather
than a given event.

SLAVA: Provenance could be used as well; it can allow to get to the list of
parameters per current configuration or per input file, in case of
reprocessing.
This is probably the most minimal way of tracking what went into the
input ID product,
or what was the meaning/naming of the working points.
This is a more direct solution during run time, perhaps more
straightforward than the python level solution, which knows only about
the current release configuration.

Saving an aligned vector of names per vector of values is the most flexible,
with the cost localized to the CPU needed for data product streaming and
the additional memory allocations during run time.
In the tau ID context, and also thinking of the similarities with the
pat::Lepton/Tau content,
this approach of keeping the string names inside the event data is more
direct.

@jpata
Copy link
Contributor

jpata commented May 18, 2022

type tau

@cmsbuild cmsbuild added the tau label May 18, 2022
@jpata
Copy link
Contributor

jpata commented May 18, 2022

@mbluj can you summarize, is this something that will be addressed, or rather should we read from the last message that you will stick with provenance?

@mbluj
Copy link
Contributor

mbluj commented May 18, 2022

We will stick with provenance. However, if you are strongly in favour of solution proposed by Slava we can look for someone who will be interested in implementing it (Sebastian who rearranged tauID format left CMS).

@jpata
Copy link
Contributor

jpata commented May 18, 2022

-reconstruction

  • will not be addressed in the near term

@jpata
Copy link
Contributor

jpata commented May 18, 2022

@cmsbuild please close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants