This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Handle large area processing #49

Merged
sophieherrmann merged 19 commits into master from update-load-helper on Oct 6, 2021

Conversation

ValentinaHutter (Collaborator)

odc_load_helper now only changes nodata values to np.nan

ValentinaHutter self-assigned this on Sep 8, 2021
ValentinaHutter changed the title from "Changed odc_load_helper to improve CPU and MEM usage" to "WIP: Handle large area processing" on Sep 10, 2021
sophieherrmann (Contributor) left a comment

There are some nice improvements!

Did you already check all the other functions to verify that attributes are passed through properly?

Review threads (resolved during review) on:
src/openeo_processes/comparison.py
src/openeo_processes/cubes.py
src/openeo_processes/math.py
src/openeo_processes/utils.py
@sophieherrmann (Contributor)

I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html

It could improve the speed of writing NetCDF files.

@ValentinaHutter (Collaborator, Author)

> I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html
> It could improve the speed of writing NetCDF files.

Thanks, I just inserted that :)

clausmichele (Member) left a comment

Why are you splitting the request into two separate ones? This should be more flexible: split it into parts of a size you know is safe. Otherwise it might work for, say, x = 100, split into x1 = 50 and x2 = 50, but what if x = 1000? (The numbers are just examples.)

@ValentinaHutter (Collaborator, Author)

Thanks for reviewing it! That was just a test to try out splitting it into two parts; I will remove the change, as it does not improve the process.

@ValentinaHutter (Collaborator, Author)

To handle large areas and apply processes to them, I had a look at all the processes. The ones that still need an update to work for large areas are `sort` and `order`. The issue is described here: #52

sophieherrmann changed the title from "WIP: Handle large area processing" to "Handle large area processing" on Oct 6, 2021
@sophieherrmann (Contributor)

As this PR already provides a number of new features and bug fixes which are urgently needed, I'll merge it now.
The main new feature is that nearly all jobs now run completely on dask, with no direct access to the array values. This also makes large-area jobs possible. The exceptions are described by @ValentinaHutter in #49 (comment) and will be solved in a separate PR.

A related issue is that the fit_curve process is computationally quite expensive, especially for large areas. This is documented in #53 and will also be addressed in a separate PR.

sophieherrmann merged commit 9727cc0 into master on Oct 6, 2021
ValentinaHutter deleted the update-load-helper branch on May 2, 2022