
Pangeo framework for LIDAR and Hyperspectral Forestry #170

Closed
bw4sz opened this issue Mar 20, 2018 · 4 comments

Comments

@bw4sz

bw4sz commented Mar 20, 2018

Hi all,

Following #144, I'm introducing myself and my interest in this project. I am working on tree delineation and segmentation using airborne LIDAR and hyperspectral data for the NEON sites. Some project info is here. I am working on the UF HiPerGator HPC environment, and I appreciate the wiki doc on getting dask started on HPC. If I'm successful, I'll try to contribute additional information that might help users on other clusters (SLURM instead of PBS).

If I understand correctly, a lot of the speedup and memory management comes from xarray and dask distributed processing? I'm inheriting a lot of code, so I'll need to decide how much to refactor to match these workflows. Our data is split into tiles, and I'd like to subset those tiles, distribute them to workers, perform our supervised classification algorithms, and recombine the results. This will be my first experience with dask; I was using Apache Beam on Google Cloud Dataflow before moving to the university cluster.
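The tile workflow described above (subset tiles, distribute to workers, classify, recombine) maps naturally onto `dask.delayed`. A minimal sketch, where `classify_tile` and the tile data are hypothetical placeholders standing in for a real per-tile supervised classifier:

```python
import dask
from dask import delayed

def classify_tile(tile):
    # Hypothetical stand-in for a per-tile classifier; a real version
    # would load a LIDAR/hyperspectral tile and return predictions.
    return [x * 2 for x in tile]  # placeholder computation

tiles = [[1, 2], [3, 4], [5, 6]]  # placeholder tile data

# Build one lazy task per tile; nothing executes yet.
tasks = [delayed(classify_tile)(t) for t in tiles]

# Run all tiles in parallel, then recombine the per-tile results.
results = dask.compute(*tasks, scheduler="threads")
combined = [item for tile_result in results for item in tile_result]
print(combined)  # [2, 4, 6, 8, 10, 12]
```

The same task graph runs unchanged on a `dask.distributed` cluster by connecting a `Client` before calling `compute`.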

Ben Weinstein
Postdoctoral Fellow
University of Florida

@mrocklin
Member

Welcome @bw4sz ! We're glad to see you. I'd like to recommend a couple links to you:

  • the dask-jobqueue project, and the SLURM integration there in particular: https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/slurm.py

    I suspect that this will mostly work, but I would not be surprised to learn that some tweaks are needed to generalize it. As we encounter more and more clusters, we routinely find assumptions we made based on the clusters at hand that do not generalize.

  • The documentation on making dask arrays from different data sources: http://dask.pydata.org/en/latest/array-creation.html

    I suspect that if you include more information here about the kind of file format you're using and how you currently access that data from within Python, people here will have more suggestions on how to get started.
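For reference, the SLURM integration linked above now lives in the dask-jobqueue `SLURMCluster` class. A configuration sketch (the queue name, resources, and worker count here are placeholder values for a hypothetical cluster, so this won't run outside a SLURM environment):

```python
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# Placeholder resource values; adjust to the target cluster's
# partition names and per-node limits.
cluster = SLURMCluster(
    queue="hpg-default",   # hypothetical SLURM partition name
    cores=4,               # cores per worker job
    memory="16GB",         # memory per worker job
    walltime="02:00:00",   # SLURM walltime per job
)
cluster.scale(10)          # ask SLURM for enough jobs for 10 workers

client = Client(cluster)   # connect a dask client to the cluster
```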

@bw4sz
Author

bw4sz commented Mar 20, 2018

Thanks @mrocklin, I'll report back on my success. I think the overarching question I have is whether this pipeline will also be appropriate for some traditional embarrassingly parallel operations when needed. I can see in the mission statement that the goal is to work interactively. While that is 100% helpful and crucial in the development stage, eventually we hope to scale via a traditional batch submission approach.

In terms of data, we have thousands of .laz files stored locally on the HPC. We load them similarly to this stack overflow question. This is a very new project, so it will be a couple of days before I have much to add.
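One pattern from the array-creation docs linked above that fits per-file loading: wrap the loader in `dask.delayed` and stack the pieces with `da.from_delayed`. The `load_laz` below is a hypothetical stand-in (a real loader might use laspy to read x/y/z); it fabricates points so the sketch is self-contained:

```python
import numpy as np
import dask
import dask.array as da

N_POINTS = 100  # hypothetical: assume every tile holds this many points

def load_laz(path):
    # Hypothetical loader: a real version would read point records
    # from a .laz file (e.g. with laspy); we fabricate points instead.
    rng = np.random.default_rng(abs(hash(path)) % (2**32))
    return rng.random((N_POINTS, 3))

paths = [f"tile_{i}.laz" for i in range(4)]  # placeholder file names

# One lazy array per file, then concatenate into a single dask array.
arrays = [
    da.from_delayed(dask.delayed(load_laz)(p), shape=(N_POINTS, 3), dtype=float)
    for p in paths
]
points = da.concatenate(arrays, axis=0)
print(points.shape)            # (400, 3)
print(points.mean().compute())
```

Because each file becomes its own chunk, per-tile operations stay embarrassingly parallel and only touch the files they need.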

@mrocklin
Member

mrocklin commented Mar 20, 2018 via email

@stale

stale bot commented Jun 25, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 25, 2018
@bw4sz bw4sz closed this as completed Jun 25, 2018