Pangeo framework for LIDAR and Hyperspectral Forestry #170
Welcome @bw4sz! We're glad to see you. I'd like to recommend a couple of links to you:
Thanks @mrocklin, I'll report back on my success. The overarching question I have is whether this pipeline will also be appropriate for some traditional embarrassingly parallel operations when needed. I can see in the mission statement that the goal is to work interactively. While that is 100% helpful and crucial in the development stage, eventually we hope to scale in a traditional batch-submission approach.

In terms of data, we have thousands of .laz files stored locally on the HPC. We load them similarly to this Stack Overflow question. This is a very new project, so it will be a couple of days before I have much to add.
People do plenty of non-interactive work with this tool chain as well.
Interactive is more of a goal than a constraint.
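For non-interactive batch use on an HPC scheduler, dask-jobqueue is one common entry point. A configuration sketch for a SLURM cluster follows; the queue name, core count, memory, and walltime are placeholders that would need to match your site's partitions, not values from this thread.

```python
# Sketch only: assumes dask-jobqueue is installed and a SLURM
# scheduler is reachable. All resource values are placeholders.
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    queue="hpg-default",   # hypothetical partition name
    cores=8,               # cores per SLURM job
    memory="32GB",         # memory per SLURM job
    walltime="04:00:00",
)
cluster.scale(jobs=10)     # submit ten worker jobs to the queue
client = Client(cluster)   # subsequent dask work runs on those jobs
```

The same driver script then works identically from an interactive session or from a batch submission script, which is what makes the "interactive is a goal, not a constraint" point above concrete.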
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi all,
Following #144, introducing myself and my interest in this project. I am working on tree delineation and segmentation using airborne LIDAR and hyperspectral data for the NEON sites. Some project info is here. I am working in the UF HiPerGator HPC environment. I appreciate the wiki doc on getting dask started on HPC. If I'm successful, I'll try to contribute additional information that might help users on other clusters (SLURM instead of PBS). If I understand correctly, a lot of the speedup and memory management comes from xarray and dask distributed processing? I'm inheriting a lot of code, so I'll need to decide how much to refactor to match these workflows. Our data is split into tiles, and I'd like to subset those tiles, distribute them to workers, perform our supervised classification algorithms, and recombine the results. This will be my first experience with dask. I was using Apache Beam on Google Cloud Dataflow before moving to the university cluster.
Ben Weinstein
Postdoctoral Fellow
University of Florida
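The subset → distribute → classify → recombine workflow described above maps naturally onto `dask.delayed`. A minimal sketch, with a toy height-threshold "classifier" standing in for the real supervised model and hand-made point lists standing in for tiles:

```python
from dask import delayed, compute

@delayed
def classify_tile(tile):
    # Hypothetical stand-in for a supervised classifier: label each
    # point "canopy" or "ground" by a simple height threshold.
    return ["canopy" if z > 2.0 else "ground" for z in tile]

# Toy tiles: each is just a list of point heights (z values).
tiles = [[0.5, 3.1, 2.7],
         [1.9, 4.0],
         [0.1, 0.2, 5.5]]

# Build the task graph lazily, then run all tiles in parallel.
tasks = [classify_tile(t) for t in tiles]
results = compute(*tasks, scheduler="threads")

# Recombine per-tile outputs into one flat label list.
labels = [label for tile_labels in results for label in tile_labels]
```

The same graph runs unchanged on a distributed scheduler (e.g. workers launched through dask-jobqueue), so swapping the local `scheduler="threads"` for a `Client` is the only change needed to scale out.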