Xarray lagged correlation function question #5757

pecos27 · 2021-09-01T16:06:51Z

I am hoping to perform a lagged correlation analysis between a time series and a 3D data set (time, lat, lon), where the time series leads the 3D data set.
I've been looking at the function below (source: https://stackoverflow.com/questions/52108417/how-to-apply-linear-regression-to-every-pixel-in-a-large-multi-dimensional-array).

import numpy as np
import xarray as xr

def lag_linregress_3D(x, y, lagx=0, lagy=0):
    """
    Input: Two xr.DataArrays of any dimensions with the first dim being time. 
    Thus the input data could be a 1D time series, or for example, have three 
    dimensions (time,lat,lon). 
    Datasets can be provided in any order, but note that the regression slope 
    and intercept will be calculated for y with respect to x.
    Output: Covariance, correlation, regression slope and intercept, p-value, 
    and standard error on regression between the two datasets along their 
    aligned time dimension.  
    Lag values can be assigned to either of the data, with lagx shifting x, and
    lagy shifting y, with the specified lag amount. 
    """ 
    #1. Ensure that the data are properly aligned to each other. 
    x,y = xr.align(x,y)

    #2. Add lag information if any, and shift the data accordingly
    if lagx!=0:

        # shift(time=-lagx) moves the value of x at t + lagx onto the label t.
        # The steps left without data (the last lagx steps for positive lagx)
        # are filled with NaN by xarray and need to be dropped.
        x   = x.shift(time = -lagx).dropna(dim='time')

        # Next important step is to re-align the two datasets so that y adjusts
        # to the changed coordinates of x
        x,y = xr.align(x,y)

    if lagy!=0:
        y   = y.shift(time = -lagy).dropna(dim='time')
        x,y = xr.align(x,y)

    #3. Compute data length, mean and standard deviation along time axis: 
    n = y.notnull().sum(dim='time')
    xmean = x.mean(axis=0)
    ymean = y.mean(axis=0)
    xstd  = x.std(axis=0)
    ystd  = y.std(axis=0)

    #4. Compute covariance along time axis
    cov   =  np.sum((x - xmean)*(y - ymean), axis=0)/(n)

    #5. Compute correlation along time axis
    cor   = cov/(xstd*ystd)

    #6. Compute regression slope and intercept:
    slope     = cov/(xstd**2)
    intercept = ymean - xmean*slope  

    #7. Compute P-value and standard error
    #Compute t-statistics
    tstats = cor*np.sqrt(n-2)/np.sqrt(1-cor**2)
    stderr = slope/tstats

    from scipy.stats import t
    # Two-sided p-value: use |t| so that negative correlations do not give p > 1
    pval   = t.sf(np.abs(tstats), n-2)*2
    pval   = xr.DataArray(pval, dims=cor.dims, coords=cor.coords)

    return cov,cor,slope,intercept,pval,stderr, x,y

I noticed that when you change lagy from 0 to, say, 12, the function drops the last 12 months (time steps) of the 3D data. Say the original data set runs from January 2000 to December 2018; with lagy set to 12, the data set then runs from January 2000 to December 2017.
But then xr.align realigns the time series and the 3D array to January 2000 - December 2018 and computes the correlation etc.
My question is whether a lagged correlation is actually computed, or whether the function is just correlating the time series and the 3D array over the same time interval (say January 2000 - December 2017).
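One way to check this is to run the same shift, dropna, and align steps on a toy series where every value equals its own index. This is only a minimal sketch: the 24-month length and the values are made up for illustration and are not taken from the rasm data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy monthly series (made-up data): 24 steps, each value equals its index.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
x = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")
y = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")

lagy = 12
# The same two steps that lag_linregress_3D applies when lagy != 0.
y_lagged = y.shift(time=-lagy).dropna(dim="time")
x_aligned, y_aligned = xr.align(x, y_lagged)

# Both arrays now carry only the first 12 time labels ...
print(x_aligned.time.values[[0, -1]])
# ... but the values of y are the ones that originally sat 12 steps later.
print(float(x_aligned[0]), float(y_aligned[0]))  # 0.0 12.0
```

In this toy case the time labels of both arrays cover the same shortened interval, but the values paired at each label are x(t) and y(t + 12), so the statistics computed afterwards would be lagged ones.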

I expected the time series to run from January 2000 to December 2018, and the 3D data set (with lagy set to 12) to run from January 2000 to December 2017, with NaNs for the last 12 months.
Am I interpreting this function correctly?
The goal is to get the correlation of the 3D data set with respect to the time series at different lag times, with the time series leading and the multidimensional data set lagging (to look for potential advection or propagation).
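For the goal of getting a correlation map at several lags, a possible alternative sketch uses `xr.corr`, which skips positions where either input is NaN, so no dropna or realignment is needed. The helper name `lagged_corr` and the synthetic data are assumptions made for illustration; this is not the Stack Overflow function and it returns only the correlation, not the regression statistics.

```python
import numpy as np
import pandas as pd
import xarray as xr

def lagged_corr(x, y, lags, dim="time"):
    """Correlate x(t) with y(t + lag) for each lag (hypothetical helper)."""
    maps = []
    for lag in lags:
        # shift(-lag) moves y(t + lag) onto the label t; emptied steps become NaN.
        y_shifted = y.shift({dim: -lag})
        # xr.corr ignores positions where either input is NaN.
        maps.append(xr.corr(x, y_shifted, dim=dim))
    return xr.concat(maps, dim=pd.Index(list(lags), name="lag"))

# Synthetic data (made up): the field at pixel (0, 0) is x delayed by 3 steps.
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=120, freq="MS")
x_vals = rng.standard_normal(120)
field_vals = rng.standard_normal((120, 2, 2))
field_vals[3:, 0, 0] = x_vals[:-3]  # field(t) = x(t - 3)

x = xr.DataArray(x_vals, coords={"time": time}, dims="time")
field = xr.DataArray(field_vals, coords={"time": time}, dims=("time", "lat", "lon"))

corr_by_lag = lagged_corr(x, field, lags=range(0, 6))
# Correlation at pixel (0, 0) for lags 0..5; it should peak at lag 3.
print(corr_by_lag.isel(lat=0, lon=0).values.round(2))
```

Positive lags here mean the time series leads the field, which matches the lagy convention of the function above.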

One other thing to note: the function only works if there are no NaNs, so in the example below all NaNs are set to zero.
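A likely reason for this is that `dropna(dim='time')` defaults to `how='any'`, so a pixel that is always NaN causes every time step to be dropped. The sketch below illustrates the difference from `how='all'` on a small made-up array. It only covers the dropna step; whether the remaining steps of the function handle NaNs correctly is not checked here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up field: 6 time steps on a 2x2 grid, with one permanently masked pixel.
time = pd.date_range("2000-01-01", periods=6, freq="MS")
data = np.arange(24.0).reshape(6, 2, 2)
data[:, 0, 0] = np.nan
field = xr.DataArray(data, coords={"time": time}, dims=("time", "lat", "lon"))

shifted = field.shift(time=-2)

# Default how="any": every time step contains the masked pixel, so all are dropped.
n_any = shifted.dropna(dim="time").sizes["time"]
# how="all": only the two time steps emptied by the shift are dropped.
n_all = shifted.dropna(dim="time", how="all").sizes["time"]
print(n_any, n_all)  # 0 4
```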

# Reproducible example with the rasm data set:

import matplotlib.pyplot as plt

ds = xr.tutorial.open_dataset('rasm')

Tair = ds.Tair

Tair_anoms = Tair.groupby('time.month') - Tair.groupby('time.month').mean('time')

time_series = Tair_anoms[:,25,150]
time_series_anoms = time_series.groupby('time.month') - time_series.groupby('time.month').mean('time')

# lagged correlation


Tair_anoms_nan_to_zeros = np.nan_to_num(Tair_anoms)  # set all NaNs to zero

# convert back to xarray
Tair_anoms =  xr.DataArray(Tair_anoms_nan_to_zeros, coords=[Tair_anoms.time, Tair_anoms.y, Tair_anoms.x], dims=["time","lat", "lon"])


lag_12 = lag_linregress_3D(time_series_anoms,Tair_anoms,0,12)

lag_12[1].plot()
plt.title('Correlation at lag 12')
plt.show()

print(lag_12[-2].time)
print(lag_12[-1].time)

Thank you!!!

Replies: 0 comments
