This repository has been archived by the owner on Mar 6, 2023. It is now read-only.
The fit_curve process works for both small and large spatial extents, but it takes significantly longer for large ones.
When fit_curve calculates parameters, the temporal dimension must not be chunked. We therefore tried chunking along the spatial dimensions, but this did not speed up the process.
With an extent of 'x': (11.390419, 11.501999), 'y': (46.311778, 46.373875), 'time': ['2016-09-01', '2018-09-01'], 'measurements': ['B01', 'B02', 'B03', 'B04', 'B07'] and 'dask_chunks': {'bands': 1, 'time': 150, 'x': 1000, 'y': 1000}, applying the fit_curve process takes < 1 hour.
With the same extent but 'dask_chunks': {'bands': 1, 'time': 150, 'x': 250, 'y': 250}, the process took almost 2 hours. So chunking the dataset along the spatial dimensions does not work the way it should.
By contrast, a smaller extent like 'x': (11.436012, 11.43804), 'y': (46.346286, 46.34833) takes about a minute.
Using a good estimate for the initial parameters makes a huge difference: a sample case I'm trying takes ~16 seconds with initial parameters [0, 0, 0] and ~6 seconds with [2000, 0, 0].
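The effect of the initial guess can be sketched with SciPy's `curve_fit` via its `p0` argument. The model and the numbers below are illustrative, not the exact fit_curve internals; the point is that starting the offset term near the data's mean (e.g. ~2000 for Sentinel-2 reflectance values) gives the optimizer far less work than starting at zero:

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative harmonic model: a constant offset plus one annual
# sine/cosine term, similar in shape to what fit_curve fits per pixel.
def model(t, a, b, c):
    return a + b * np.sin(2 * np.pi * t) + c * np.cos(2 * np.pi * t)

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 150)                       # two "years" of samples
y = model(t, 2000, 300, -150) + rng.normal(0, 50, t.size)

# Poor initial guess: the optimizer starts far from the solution.
popt_bad, _ = curve_fit(model, t, y, p0=[0, 0, 0])

# Informed guess (e.g. the series mean for the offset) starts much closer.
popt_good, _ = curve_fit(model, t, y, p0=[np.mean(y), 0, 0])

print(popt_bad, popt_good)  # both recover roughly (2000, 300, -150)
```

Both runs converge to the same least-squares solution; the difference shows up in iteration count, which is what dominates when the fit is repeated for every pixel.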
Resampling the data to a weekly average (aggregate_temporal_period with reducer=mean) decreases the number of samples in the time series and therefore slightly reduces the fitting time. The result is very similar, but the performance gain is not worth it.
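The shape arithmetic behind that resampling can be sketched with plain NumPy (array sizes are illustrative): averaging daily samples into weekly bins shrinks the time axis the per-pixel fit has to iterate over by a factor of 7.

```python
import numpy as np

# Illustrative stand-in for aggregate_temporal_period(period="week",
# reducer="mean"): collapse groups of 7 daily samples into one weekly mean.
daily = np.arange(728, dtype=float).reshape(728, 1, 1)   # (time, y, x)

weekly = daily.reshape(-1, 7, 1, 1).mean(axis=1)         # 728 days -> 104 weeks
print(daily.shape, "->", weekly.shape)
```

The time series handed to the optimizer is ~7x shorter, but since each `curve_fit` call is dominated by iteration count rather than sample count, the overall speedup stays modest, matching the observation above.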
The most important thing to check is the chunk size of the input data:
| chunks | apply_ufunc option | CPU time (total) | Wall time |
|---|---|---|---|
| `{'time': -1, 'x': 8, 'y': 8}` | default | 17.4 s | 17.4 s |
| `{'time': -1, 'x': 64, 'y': 64}` | default | 2.44 s | 4.8 s |
| `{'time': -1, 'x': 128, 'y': 128}` | default | 2.04 s | 3.8 s |
| `{'time': 1, 'x': 128, 'y': 128}` | `'allow_rechunk': True` | 8.06 s | 11.8 s |
So the data must be chunked only along the spatial dimensions, not along the temporal one, keeping the rechunking option set to False: 'allow_rechunk': False.
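That chunking pattern can be expressed directly with dask (array shapes here are illustrative): `-1` along time puts the full time series of each pixel into a single chunk, so every spatial tile can be fitted independently with no rechunking.

```python
import dask.array as da

# Sketch of the layout that worked: one chunk spanning the whole time axis
# (chunks=-1 means "full dimension"), tiling only the spatial dimensions.
arr = da.zeros((150, 512, 512), chunks=(-1, 64, 64))  # (time, y, x)

print(arr.chunks[0])  # time axis: a single chunk of 150
print(arr.chunks[1])  # y axis: eight 64-pixel tiles
```

With this layout each of the 64 spatial chunks carries a complete time series, which is exactly the precondition for running the curve fit per chunk with `'allow_rechunk': False`.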