This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Handle large area processing #49

Merged
sophieherrmann merged 19 commits into master from update-load-helper on Oct 6, 2021

Conversation

ValentinaHutter (Collaborator)

odc_load_helper now only changes nodata values to np.nan

ValentinaHutter self-assigned this on Sep 8, 2021
ValentinaHutter changed the title from "Changed odc_load_helper to improve CPU and MEM usage" to "WIP: Handle large area processing" on Sep 10, 2021
sophieherrmann (Contributor) left a comment

There are some nice improvements!

Did you already check all the other functions to verify that attributes are passed through properly?

Review threads (resolved during review) on:
src/openeo_processes/comparison.py
src/openeo_processes/cubes.py
src/openeo_processes/math.py
src/openeo_processes/utils.py
@sophieherrmann (Contributor)

I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html

It could improve the speed of writing NetCDF files.

@ValentinaHutter (Collaborator, Author)

> I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html
> It could improve the speed of writing NetCDF files.

Thanks, I just inserted that :)

clausmichele (Member) left a comment

Why are you splitting the request into two separate ones? This should be more flexible: split it into parts of a size you know is safe. Otherwise it might work for, say, x = 100, split into x1 = 50 and x2 = 50, but what if x = 1000? (The numbers are just examples.)

@ValentinaHutter (Collaborator, Author)

Thanks for reviewing it! That was just a test to try out splitting it into two parts; I will remove the change, as it does not improve the process.

@ValentinaHutter (Collaborator, Author)

To handle large areas and apply processes to them, I had a look at all the processes. The ones that still need an update to work for large areas are `sort` and `order`. The issue is described here: #52

sophieherrmann changed the title from "WIP: Handle large area processing" to "Handle large area processing" on Oct 6, 2021
@sophieherrmann (Contributor)

As this PR already provides a number of new features and bug fixes which are urgently needed, I'll merge it now.
The main new feature is that nearly all jobs now run completely on dask, with no direct access to the array values. This also makes large-area jobs possible. The exceptions are described by @ValentinaHutter in #49 (comment) and will be solved in a separate PR.

A related issue is that the fit_curve process is computationally quite expensive, especially for large areas. This is documented in #53 and will also be addressed in a separate PR.

sophieherrmann merged commit 9727cc0 into master on Oct 6, 2021
ValentinaHutter deleted the update-load-helper branch on May 2, 2022