import numpy as np
import xarray as xr
from scipy.stats import t

def lag_linregress_3D(x, y, lagx=0, lagy=0):
    """
    Input: Two xr.DataArrays of any dimensions with the first dim being time.
    Thus the input data could be a 1D time series, or for example, have three
    dimensions (time,lat,lon).
    Datasets can be provided in any order, but note that the regression slope
    and intercept will be calculated for y with respect to x.
    Output: Covariance, correlation, regression slope and intercept, p-value,
    and standard error on regression between the two datasets along their
    aligned time dimension.
    Lag values can be assigned to either of the data, with lagx shifting x, and
    lagy shifting y, with the specified lag amount.
    """
    # 1. Ensure that the data are properly aligned to each other.
    x, y = xr.align(x, y)
    # 2. Add lag information if any, and shift the data accordingly
    if lagx != 0:
        # If x lags y by 1, x must be shifted 1 step backwards.
        # But as the 'zero-th' value is nonexistent, xr assigns it as invalid
        # (nan). Hence it needs to be dropped
        x = x.shift(time=-lagx).dropna(dim='time')
        # Next important step is to re-align the two datasets so that y adjusts
        # to the changed coordinates of x
        x, y = xr.align(x, y)
    if lagy != 0:
        y = y.shift(time=-lagy).dropna(dim='time')
        x, y = xr.align(x, y)
    # 3. Compute data length, mean and standard deviation along time axis:
    n = y.notnull().sum(dim='time')
    xmean = x.mean(axis=0)
    ymean = y.mean(axis=0)
    xstd = x.std(axis=0)
    ystd = y.std(axis=0)
    # 4. Compute covariance along time axis
    cov = np.sum((x - xmean)*(y - ymean), axis=0)/(n)
    # 5. Compute correlation along time axis
    cor = cov/(xstd*ystd)
    # 6. Compute regression slope and intercept:
    slope = cov/(xstd**2)
    intercept = ymean - xmean*slope
    # 7. Compute P-value and standard error
    # Compute t-statistics
    tstats = cor*np.sqrt(n-2)/np.sqrt(1-cor**2)
    stderr = slope/tstats
    # Use |t| so the two-sided p-value stays <= 1 for negative correlations
    pval = t.sf(np.abs(tstats), n-2)*2
    pval = xr.DataArray(pval, dims=cor.dims, coords=cor.coords)
    return cov, cor, slope, intercept, pval, stderr, x, y
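The formulas in steps 4 to 7 can be sanity-checked on a plain 1D series against `scipy.stats.linregress`. The sketch below is only an illustration: the synthetic data, the seed, and the variable names are made up for the check and are not part of the function. It takes the absolute value of the t-statistic before calling `t.sf`, which keeps the two-sided p-value at or below 1 when the correlation is negative.

```python
import numpy as np
from scipy import stats

# Synthetic 1D series (made up for illustration): y depends negatively on x.
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = -0.5 * x + rng.normal(size=60)
n = x.size

# Steps 4 to 6: covariance (divided by n), correlation, slope, intercept.
cov = np.sum((x - x.mean()) * (y - y.mean())) / n
cor = cov / (x.std() * y.std())
slope = cov / x.std() ** 2
intercept = y.mean() - x.mean() * slope

# Step 7: t-statistic, standard error, and two-sided p-value.
tstat = cor * np.sqrt(n - 2) / np.sqrt(1 - cor ** 2)
stderr = slope / tstat
pval = 2 * stats.t.sf(np.abs(tstat), n - 2)

# Reference values from scipy for the same data.
ref = stats.linregress(x, y)
print(np.allclose([cor, slope, intercept, stderr],
                  [ref.rvalue, ref.slope, ref.intercept, ref.stderr]))
print(pval, ref.pvalue)
```

If the two printed p-values agree, the hand-written formulas match the scipy reference for this series.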
I noticed that when you change lagy from 0 to, say, 12, the function drops the last 12 months (time steps) of the 3D data. Say the original data set runs from January 2000 to December 2018; with lagy set to 12, it then runs from January 2000 to December 2017.
But then, as far as I can tell, xr.align realigns the time series and the 3D array to January 2000 - December 2018 and then takes the correlation etc.
My question is whether the lagged correlation is actually taken, or whether the function is just correlating the time series and the 3D array over the shortened interval (say January 2000 - December 2017).
I expected that the time series would run from January 2000 to December 2018, and that the 3D data set (with lagy set to 12) would run from January 2000 to December 2017, with NaNs for the last 12 months.
Am I interpreting this program correctly?
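One way to check this is a small toy example that repeats only the shift, dropna, and align steps. The sketch below uses two made-up 24-month series (the values and names are illustrative, not from the rasm data) and relies on `xr.align` using its default `join="inner"`. In this toy case the inner join trims both arrays to the first 12 months rather than restoring the full period, and each x value at time t is paired with the y value from 12 months later, which is what a lag of 12 with x leading should look like.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two toy monthly series on the same 24-month axis (illustrative values).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
x = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")
y = xr.DataArray(np.arange(24.0) * 10, coords={"time": time}, dims="time")

lagy = 12
# shift(-lagy) moves the value from t + lagy back to time t;
# the last lagy steps become NaN and are dropped.
y_shifted = y.shift(time=-lagy).dropna(dim="time")

# Default join="inner": both arrays are cut to the common time steps.
x_al, y_al = xr.align(x, y_shifted)

print(x_al.time.values[[0, -1]])      # first and last remaining time step
print(x_al.values[:3], y_al.values[:3])  # x(t) next to y(t + 12)
```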
The goal is to get the correlation at different lag times (the correlation of the 3D data set with respect to the time series), with the time series leading and the multidimensional data set lagging (to look for potential advection or propagation).
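A possible alternative for this goal, sketched below, is to shift the 3D field by each lag and pass it to `xr.corr`, which correlates along a named dimension and only uses time steps where both inputs are valid. The helper name `lagged_corr`, the toy data, and the grid size are hypothetical and chosen only for illustration; this assumes an xarray version that provides `xr.corr`. One grid cell is built as the series delayed by 3 steps, so its correlation at lag 3 should be close to 1.

```python
import numpy as np
import pandas as pd
import xarray as xr

def lagged_corr(series, field, lags, dim="time"):
    """Correlate series(t) with field(t + lag) for each lag (hypothetical helper)."""
    maps = []
    for lag in lags:
        # shift(-lag) moves field values from t + lag back to t,
        # so the series leads the field by `lag` steps.
        shifted = field.shift({dim: -lag})
        maps.append(xr.corr(series, shifted, dim=dim))
    return xr.concat(maps, dim=pd.Index(list(lags), name="lag"))

# Toy data: random series and a small random field on the same time axis.
rng = np.random.default_rng(1)
time = pd.date_range("2000-01-01", periods=120, freq="MS")
series = xr.DataArray(rng.normal(size=120), coords={"time": time}, dims="time")
field = xr.DataArray(rng.normal(size=(120, 2, 2)),
                     coords={"time": time}, dims=("time", "lat", "lon"))
# Make one cell equal to the series delayed by 3 months.
field[:, 0, 0] = series.shift(time=3).values

out = lagged_corr(series, field, lags=[0, 3, 6])
print(out.dims)
print(out.sel(lag=3).isel(lat=0, lon=0).item())
```

The result has a `lag` dimension, so a single map can be selected with `out.sel(lag=...)` and plotted as usual.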
One other thing to note is that the function only works if there are no NaNs (so I set all NaNs to zero).
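A likely reason for the NaN problem is the `dropna(dim='time')` call: with its default `how="any"`, a time step is dropped if any grid cell is NaN at that step, so a cell that is always missing removes every time step. The toy sketch below (made-up data, two cells, one permanently NaN) compares the default with `how="all"`, which only drops time steps where every cell is NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=6, freq="MS")
data = np.arange(12.0).reshape(6, 2)
data[:, 1] = np.nan  # one cell is always missing, like a masked grid point
y = xr.DataArray(data, coords={"time": time}, dims=("time", "cell"))

# Shifting by 2 makes the last 2 time steps NaN in every cell.
shifted = y.shift(time=-2)

n_any = shifted.dropna(dim="time").sizes["time"]             # default how="any"
n_all = shifted.dropna(dim="time", how="all").sizes["time"]  # drop only all-NaN steps
print(n_any, n_all)
```

Here the default leaves no time steps at all, while `how="all"` keeps the four steps that still hold data.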
# Reproducible example using the rasm tutorial data set:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
ds = xr.tutorial.open_dataset('rasm')
Tair = ds.Tair
Tair_anoms = Tair.groupby('time.month') - Tair.groupby('time.month').mean('time')
time_series = Tair_anoms[:,25,150]
time_series_anoms = time_series.groupby('time.month') - time_series.groupby('time.month').mean('time')
# Lagged correlation
Tair_anoms_nan_to_zeros = np.nan_to_num(Tair_anoms)  # replace all NaNs with zeros
# Convert back to an xarray DataArray
Tair_anoms = xr.DataArray(Tair_anoms_nan_to_zeros, coords=[Tair_anoms.time, Tair_anoms.y, Tair_anoms.x], dims=["time", "lat", "lon"])
lag_12 = lag_linregress_3D(time_series_anoms,Tair_anoms,0,12)
lag_12[1].plot()
plt.title('Correlation at lag 12')
plt.show()
print(lag_12[-2].time)
print(lag_12[-1].time)
I am hoping to perform a lagged correlation analysis between a time series and a 3D data set (time, lat, lon), where the time series leads the 3D data set.
I've been looking at this program (source: https://stackoverflow.com/questions/52108417/how-to-apply-linear-regression-to-every-pixel-in-a-large-multi-dimensional-array).
Thank you!!!