Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add function to get single jungfrau module data #164

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tmichela
Copy link
Member

@tmichela tmichela commented Jun 8, 2022

  • gets data
  • add ASIC gaps
  • add mask
  • mask edges

@takluyver This is the kind of function people would like to have, as discussed here
(module and function name are up for suggestion, I wasn't particularly inspired)

@takluyver
Copy link
Member

Thanks! On naming, I'd call it something like jungfrau_module_data to make it clear that it's for a single module. And I'd make it importable directly from extra_geom, so the name of the module it lives in doesn't matter too much.

In terms of the API:

  • I'm not super keen on parameters that pass directly through to EXtra-data (name and trains). Why not take a SourceData object, so it looks like jungfrau_module_data(run['SPB_IRDA_JF4M/DET/JNGFR01:daqOutput']) instead of jungfrau_module_data(run, 'SPB_IRDA_JF4M/DET/JNGFR01:daqOutput')? It's slightly more complex for this specific use case, but it's a reusable pattern, so people have less to learn overall.
    • But maybe if there's only one JUNGFRAU module, it can accept a DataCollection and find the source automatically, like the things in extra_data.components do?
    • Should it also take an array or something so it's not tied specifically to loading data from files? 🤔
  • I'd combine the parameters mask_edges and pix_edge as something like edge_pixels, so edge_pixels=0 doesn't do the masking. False == 0 in Python, so you'd also be able to pass False for this.
  • For use_mask, I wonder if we should have 3 options, with the default meaning 'if available', and use_mask=True leading to an error if it's used on raw data (with no mask in the files). Maybe that's getting too clever, though. 🤷

@tmichela
Copy link
Member Author

Thanks for the feedback!

I'd call it something like jungfrau_module_data

👍

* I'm not super keen on parameters that pass directly through to EXtra-data (`name` and `trains`). Why not take a SourceData object, so it looks like `jungfrau_module_data(run['SPB_IRDA_JF4M/DET/JNGFR01:daqOutput'])` instead of `jungfrau_module_data(run, 'SPB_IRDA_JF4M/DET/JNGFR01:daqOutput')`? It's slightly more complex for this specific use case, but it's a reusable pattern, so people have less to learn overall.

Yea I can change that.
I was originally looking in the oposite direction with only giving a proposal/run number jungfrau_module_data(1234, 1, 'det') to avoid interract with extra_data at all, but that required to have extra_data as dependency of extra_geom... Any opinion?

  * But maybe if there's only one JUNGFRAU module, it can accept a DataCollection and find the source automatically, like the things in `extra_data.components` do?

Probably risky, there are usually multiple modules saved in a same run, and explicilty chosing the module avoids risk of working on the wrong data.

  * Should it also take an array or something so it's not tied specifically to loading data from files? 🤔

Probably not, but I can ask for feedback here.

* I'd combine the parameters `mask_edges` and `pix_edge` as something like `edge_pixels`, so `edge_pixels=0` doesn't do the masking. `False == 0` in Python, so you'd also be able to pass False for this.

good point 👍

* For `use_mask`, I wonder if we should have 3 options, with the default meaning 'if available', and `use_mask=True` leading to an error if it's used on raw data (with no mask in the files). Maybe that's getting too clever, though. 🤷

what about a warning if set to True and missing?

…ct instead of run/sourcename, import the function from the base module
@takluyver
Copy link
Member

I'd get rid of the trains parameter as well, for a similar reason to the source name. This reminds me, I wonder if we should make slicing trains work on DataCollection and SourceData objects like it already does on KeyData?

I was originally looking in the oposite direction with only giving a proposal/run number jungfrau_module_data(1234, 1, 'det') to avoid interract with extra_data at all, but that required to have extra_data as dependency of extra_geom... Any opinion?

I'd rather not make extra_geom require extra_data - the reason for separating them in the first place is that EXtra-geom is also useful for online data analysis where you're not working with files.

Maybe that implies this function is actually better off as part of extra_data.components, as it's primarily about accessing a specific type of data from files, not geometry. It could add fixed gaps as a hard-coded thing, and we can point people to EXtra-geom for anything more complicated than that. 🤔

Probably risky, there are usually multiple modules saved in a same run, and explicilty chosing the module avoids risk of working on the wrong data.

OK, that sounds reasonable. Possibly at some point we'll want to allow specifying module number instead of source name, but that can be added later.

what about a warning if [use_mask is] set to True and missing?

Maaaybe... my feeling is that warnings are rarely a good answer, because it's easy to either overlook them entirely or to read them as errors. I'm also OK with use_mask meaning 'use mask if available' if you prefer that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants