Adapting the pangeo approach to microscopy #144
What kinds of computations do you want to do on this data? Are you just looking at various slices? Are you doing aggregations? Would you want to re-arrange your data by space, by time, etc.? There is an example currently in the examples folder named something like … So, some concrete questions:

How is your data stored now?
What kinds of computations do you want to run on your data?
What is your time-budget (interactive, don't mind waiting a few minutes, ...)?
Did you want to keep things local or put it on a cluster?
Ha, I know that dataset - I did my Ph.D. in Drosophila neuro at Janelia (though not on the FlyEM side). Flashbacks abound...

I'd be happy to test my holoviews/bokeh visualization on the FlyEM data, but I think I'd need a couple of things that don't seem to be available in the pangeo stack (though I could be totally missing something here). First, holoviews, which is listed in pangeo/environments/environment.yml but doesn't show up when I run help('modules') within the notebook. Second, interactive bokeh plotting - currently I get this:

from bokeh.io import output_notebook
output_notebook()
> JavaScript output is disabled in JupyterLab

which is presumably fixable by jupyter labextension install jupyterlab_bokeh. I haven't tried running this (or installing holoviews) via the jupyter console - let me know if that's kosher.

...

Otherwise, here's the gist of the approach I'm using for now. To summarize - data storage is very flexible, and I'm mostly focused on (cloud) visualization for now as other things can be run offline.
> How is your data stored now?

For my test visualizations I load data from a tiff file and hold it in memory. I'm moving towards HDF5, but this part of the stack is still very flexible. My current wrapper could switch from hdf5 > zarr without too much trouble, I think.
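A minimal sketch of such a backend-agnostic wrapper (class and method names are hypothetical; the actual wrapper isn't shown in the thread) - anything supporting numpy-style slicing, such as an h5py Dataset, a zarr Array, or a plain ndarray, could be dropped in:

```python
import numpy as np

class VolumeStore:
    """Hypothetical thin wrapper over a 5-D (t, c, z, y, x) volume.
    Any backend that supports numpy-style slicing (h5py Dataset,
    zarr Array, dask array, plain ndarray) can be swapped in."""

    def __init__(self, backend):
        self.backend = backend

    def slice2d(self, t, z):
        # Return a (channel, y, x) view for one time point and z plane.
        return self.backend[t, :, z, :, :]

# Plain ndarray backend with made-up sizes, standing in for hdf5/zarr.
data = np.zeros((3, 4, 5, 64, 64), dtype="uint16")
store = VolumeStore(data)
plane = store.slice2d(0, 2)   # shape (4, 64, 64)
```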
> What kinds of computations do you want to run on your data?

To start with I'd mostly like to do pretty simple visualizations - i.e. a 2D slice of the data across a few channels, possibly including color compositing. There is more intensive computation to be done, but that is either extremely fast (e.g. base-calling a single pixel is an arg-max across channels) or can be done offline (filtering, some compressed sensing, neuronal reconstruction, etc.). I'm fine with either pre-computing or losing interactivity for any serious computation.
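As an illustration of the color-compositing step (the channel colors and data here are made up, not from the thread), a (channel, y, x) slice can be collapsed to RGB by weighting each channel with a color and summing:

```python
import numpy as np

# One z-plane with 4 channels (made-up data).
rng = np.random.default_rng(0)
chans = rng.random((4, 64, 64))

# Hypothetical per-channel RGB weights: red, green, blue, yellow.
colors = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 0]], dtype=float)

# Contract the channel axis against the color table -> (y, x, 3).
rgb = np.tensordot(chans, colors, axes=([0], [0]))
rgb = np.clip(rgb / rgb.max(), 0.0, 1.0)   # normalize for display
```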
> What is your time-budget (interactive, don't mind waiting a few minutes, ...)?

I am most interested in interactive visualization. More complex computation can be offline, but the outputs of that computation will be either n-d rasters or point-clouds that will be visualized on top of the "raw-data" views that I'm working on now. I'm relying on the holoviews/bokeh stack to make these overlays, and to make plots interactive (e.g. pan, zoom, select ROIs).
> Did you want to keep things local or put it on a cluster?

I'd like to put visualization on a cluster, though I could also set up a single beefy server if it made more sense. One possible outcome of this work is to develop a stack that would serve as a sort of lightweight online ImageJ/Fiji - mostly for visualization, with some image processing / biological interpretation functions available.
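The base-calling operation mentioned in this thread - assigning each pixel the channel with the strongest signal - really is a one-liner; a small numpy sketch with made-up data:

```python
import numpy as np

# Base-calling as an arg-max across channels: each pixel of a
# (channel, y, x) stack gets the index of its brightest channel.
rng = np.random.default_rng(0)
signal = rng.random((4, 128, 128))        # 4 sequencing channels, made-up data
base_calls = np.argmax(signal, axis=0)    # (y, x) array of channel indices 0-3
```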
Interestingly, Holoviews just came up in #145
Ah, some good solutions there. I can confirm that using the legacy notebook (/user/.../tree) allows bokeh to work fine.
Actually, that seems to do it for the basics! The code below makes a regridded 2D view of the EM data, with a slider for Z. Tiling in X/Y isn't incredibly fast with default chunking (~2 s to update a new tile), but is much faster when chunked as [200, 200, 200] (not seamless, but <0.5 s). The interactivity of changing z depends a lot on zoom level, etc.
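The code block itself didn't survive in this copy of the thread; a minimal dask sketch of just the rechunking step described above (array sizes and variable names are made up) might look like:

```python
import numpy as np
import dask.array as da

# Made-up stand-in for the EM volume: one full (y, x) plane per chunk,
# which forces a whole-plane read for every small tile request.
vol = da.from_array(np.zeros((50, 2000, 2000), dtype="uint8"),
                    chunks=(1, 2000, 2000))

# Rechunk toward the ~[200, 200, 200] layout mentioned above, so a
# zoomed-in tile only touches a handful of small chunks.
rechunked = vol.rechunk((50, 200, 200))
tile = rechunked[10, 400:600, 400:600].compute()   # one 200x200 tile
```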
It's pretty cool to see that work in action :) It also gives good motivation for opportunistic caching.
@agvaughan again I'm glad to see your engagement here. Now that you've had a moment to play with pangeo.pydata.org, what are some of the things that you would like to see to make you and people like you more productive on a system like this?
FWIW, Dask Arrays now work as a direct input to Holoviews. Also, if you are interested, I can point you to some basic image processing libraries on top of Dask, which may help as you play with your data.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
Hi all,
I'm a neuroscientist working on a new form of microscopy (in-situ sequencing) that has some particular challenges in visualization. I've been working to build an interactive visualization using holoviews/bokeh, and am now digging into the back end more. I've been admiring the pangeo project from afar, and am looking for advice on adapting some of your back-end infrastructure.
Our approach creates a 5-d matrix (i.e., data[time, channel, z, y, x]) that we typically visualize as a 4-color image (i.e. a colorized 2D view from data[time, :, z, :, :]). We expect datasets with a maximum size of ~20-100 GB, with shapes up to about [30, 4, 50, 5k, 5k].
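Concretely (with the y/x extents shrunk from ~5k so the example stays small), the layout and the per-timepoint view look like:

```python
import numpy as np

# Stand-in for the 5-D layout described above: (time, channel, z, y, x).
# The y/x extents are shrunk from ~5k to keep the example small.
data = np.zeros((30, 4, 50, 64, 64), dtype="uint16")

# A colorized 2-D image is built from one time point and one z plane:
view = data[0, :, 25, :, :]   # (channel, y, x), here (4, 64, 64)
```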
I've been using holoviews for visualization of in-memory datasets with good results. For example, I can show 4 channels interactively in separate panes using hv.Image(...).regrid() with sufficient detail and interactivity to be biologically useful. This is fast enough to be usable on my laptop, including various add-ons such as hv.DynamicMap streams.
In the near future I'd like to make two big changes: embrace out-of-memory (OOM) datasets, and host data off of the user's machine. Currently data is passed around as numpy > xarray.DataArray > holoviews > bokeh. The approach you folks have here (particularly dask/zarr/gcsfs) seems like a great foundation, but I'm not sure whether our constraints are the same. For instance, I'm not sure where the bottleneck would likely be for my live visualization - networking to GCS via gcsfs, passing data around between processes, or simple bandwidth to the user?
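The first hop of that pipeline (with made-up array sizes) can be sketched as follows: wrap the raw numpy volume in a labeled xarray.DataArray before handing it on to the plotting layers:

```python
import numpy as np
import xarray as xr

# numpy -> xarray.DataArray: label the raw volume's axes
# (sizes are made up for the example).
raw = np.zeros((30, 4, 50, 64, 64), dtype="uint16")
xda = xr.DataArray(raw, dims=("time", "channel", "z", "y", "x"))

# Select one time point and z plane for display.
view = xda.isel(time=0, z=25)   # dims (channel, y, x)
```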
I'm almost certain I don't know where the real crux is going to be in this project, so would love any and all advice!
Thanks!
Alex
PS, thanks to Matt for pointing me here. Matt, if there was anything relevant in my previous email that's worth mentioning, feel free.