
Process fit_curve needs a lot of time #53

Open

ValentinaHutter opened this issue Oct 6, 2021 · 2 comments

@ValentinaHutter
Collaborator

The fit_curve process works for both small and large spatial extents, but it takes significantly longer for large ones.
While fit_curve is calculating the parameters, the data must not be chunked along the temporal dimension. We therefore tried chunking along the spatial dimensions instead, but this did not speed the process up.
With an extent of 'x': (11.390419, 11.501999), 'y': (46.311778, 46.373875), 'time': ['2016-09-01', '2018-09-01'], 'measurements': ['B01', 'B02', 'B03', 'B04', 'B07'] and 'dask_chunks': {'bands': 1, 'time': 150, 'x': 1000, 'y': 1000}, applying the fit_curve process takes just under an hour.
With the same extent but 'dask_chunks': {'bands': 1, 'time': 150, 'x': 250, 'y': 250}, the process took almost 2 hours. So chunking the dataset along the spatial dimensions does not work the way it should.
By contrast, a much smaller extent like 'x': (11.436012, 11.43804), 'y': (46.346286, 46.34833) takes about a minute.
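
For reference, a minimal sketch of how such a cube might be loaded, assuming an Open Data Cube-style loader as the parameter names above suggest. The product name is hypothetical; only the extents, bands, and chunk sizes come from the report (dc.load returns one data variable per band, so there is no 'bands' dimension to chunk here):

```python
import datacube

dc = datacube.Datacube()

# Hypothetical product name; extents, bands, and chunk sizes from the report above.
ds = dc.load(
    product="s2_l2a",
    x=(11.390419, 11.501999),
    y=(46.311778, 46.373875),
    time=("2016-09-01", "2018-09-01"),
    measurements=["B01", "B02", "B03", "B04", "B07"],
    dask_chunks={"time": 150, "x": 1000, "y": 1000},  # just under an hour
    # dask_chunks={"time": 150, "x": 250, "y": 250},  # almost 2 hours
)
```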

@clausmichele
Member

Thanks for the tests, I'll also try out some alternatives locally to see if there is room for improvement.

@clausmichele
Member

Here is some info about the tests I ran:

  1. Using a good estimate for the initial parameters makes a huge difference: a sample case I'm trying takes ~16 seconds with initial parameters [0,0,0] and ~6 seconds with [2000,0,0] (see the sketch after the timings below).
  2. Resampling the data to a weekly average (aggregate_temporal_period with reducer=mean) decreases the number of samples in the time series and therefore slightly reduces the time required for fitting. The result is very similar, but the performance gain is not worth it.
  3. The most important thing to check is the chunk size of the input data:

chunks={'time': -1, 'x': 8, 'y': 8}
...
CPU times: user 16.6 s, sys: 866 ms, total: 17.4 s
Wall time: 17.4 s

chunks={'time': -1, 'x': 64, 'y': 64}
...
CPU times: user 1.96 s, sys: 484 ms, total: 2.44 s
Wall time: 4.8 s

chunks={'time': -1, 'x': 128, 'y': 128}
...
CPU times: user 1.55 s, sys: 488 ms, total: 2.04 s
Wall time: 3.8 s

chunks={'time': 1, 'x': 128, 'y': 128} with apply_ufunc option 'allow_rechunk': True
...
CPU times: user 7.27 s, sys: 786 ms, total: 8.06 s
Wall time: 11.8 s

So the data must be chunked only along the spatial dimensions and not along the temporal dimension, keeping the rechunking option set to False: 'allow_rechunk': False.
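
To make points 1 and 3 concrete, here is a minimal sketch of a fit along these lines. It is not the implementation in this repository: the three-parameter seasonal model, the 128x128 chunking, and the [2000, 0, 0] initial guess are assumptions taken from the numbers above.

```python
import numpy as np
import xarray as xr
from scipy.optimize import curve_fit

def model(t, a, b, c):
    # Assumed seasonal model with three parameters (a yearly
    # sine/cosine around a baseline), in the spirit of fit_curve.
    omega = 2 * np.pi / 31557600.0  # one year in seconds
    return a + b * np.cos(omega * t) + c * np.sin(omega * t)

def fit_pixel(y, t, p0):
    # Fit a single time series; real data would also need NaN masking.
    popt, _ = curve_fit(model, t, y, p0=p0)
    return popt

def fit_cube(data, p0=(2000.0, 0.0, 0.0)):
    # Chunk only along the spatial dimensions: each pixel's full time
    # series has to sit in a single chunk for the fit.
    data = data.chunk({"time": -1, "x": 128, "y": 128})
    t = data["time"].values.astype("datetime64[s]").astype(float)
    return xr.apply_ufunc(
        fit_pixel,
        data,
        kwargs={"t": t, "p0": p0},
        input_core_dims=[["time"]],
        output_core_dims=[["param"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[np.float64],
        # allow_rechunk=True would instead let dask merge the time
        # chunks on the fly, which is what made the last run above slow.
        dask_gufunc_kwargs={"allow_rechunk": False,
                            "output_sizes": {"param": 3}},
    )
```

For the weekly resampling in point 2, the plain xarray equivalent would be something like data.resample(time="1W").mean().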
