Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERA5 Solar Position Time Shift Broken for Certain Time Spans #256

Closed
zoltanmaric opened this issue Oct 5, 2022 · 5 comments · Fixed by #257
Closed

ERA5 Solar Position Time Shift Broken for Certain Time Spans #256

zoltanmaric opened this issue Oct 5, 2022 · 5 comments · Fixed by #257

Comments

@zoltanmaric
Copy link
Contributor

Description

When building an ERA5 cutout for a period that doesn't cover entire months (e.g. for time=slice("2022-06-30 00:00", "2022-07-01 23:00")), the solar position is shifted by half a day instead of half an hour - effectively inverting the solar position (maximum at local midnight, minimum at local noon).

Expected Behavior

Solar position should be shifted by 30 minutes.

Actual Behavior

Solar position is shifted by 12 hours.

Example

import atlite
import matplotlib.pyplot as plt
from atlite.pv import solar_position

cutout = atlite.Cutout("/tmp/era5", module="era5", bounds=(-4, 56, 1.5, 62), time=slice("2022-06-30", "2022-07-01"))
cutout.prepare()

solar_altitude = solar_position.SolarPosition(cutout.data).sel(x=0, y=56).altitude.load()
solar_altitude.plot()
plt.savefig("/tmp/solar_altitude_wrong.png")

image

Your Environment

  • I ran this from a git clone of atlite at revision 230aa8a

Related to #158 and #199

@zoltanmaric
Copy link
Contributor Author

zoltanmaric commented Oct 5, 2022

The culprit is this attempt to infer the frequency of the time index:

# ERA5 variables are mean values for previous hour, i.e. 13:01 to 14:00 are labelled as "14:00"
# account by calculating the SolarPosition for the center of the interval for aggregation happens
# see https://github.com/PyPSA/atlite/issues/158
# Do not show DeprecationWarning from new SolarPosition calculation (#199)
with warnings.catch_warnings():
warnings.simplefilter("ignore", DeprecationWarning)
# Convert dt / time frequency to timedelta and shift solar position by half
# (freqs like ["H","30T"] do not work with pd.to_timedelta(...)
time_shift = (
-1
/ 2
* pd.to_timedelta(
pd.date_range(
"1970-01-01", periods=1, freq=pd.infer_freq(ds["time"])
).freq
)
)
sp = SolarPosition(ds, time_shift=time_shift)

An ERA5 CDS request spanning June 30 and July 1st looks like this:

{
  'product': 'reanalysis-era5-single-levels',
  'year': '2022',
  'month': [6, 7],
  'day': [30, 1],
  'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00']
}

which returns results covering June 1st, June 30th, July 1st, and July 30th:

Time index contents

From get_data_influx

print(ds.time)

Output:

<xarray.DataArray 'time' (time: 96)>
array(['2022-06-01T00:00:00.000000000', '2022-06-01T01:00:00.000000000',
       '2022-06-01T02:00:00.000000000', '2022-06-01T03:00:00.000000000',
       '2022-06-01T04:00:00.000000000', '2022-06-01T05:00:00.000000000',
       '2022-06-01T06:00:00.000000000', '2022-06-01T07:00:00.000000000',
       '2022-06-01T08:00:00.000000000', '2022-06-01T09:00:00.000000000',
       '2022-06-01T10:00:00.000000000', '2022-06-01T11:00:00.000000000',
       '2022-06-01T12:00:00.000000000', '2022-06-01T13:00:00.000000000',
       '2022-06-01T14:00:00.000000000', '2022-06-01T15:00:00.000000000',
       '2022-06-01T16:00:00.000000000', '2022-06-01T17:00:00.000000000',
       '2022-06-01T18:00:00.000000000', '2022-06-01T19:00:00.000000000',
       '2022-06-01T20:00:00.000000000', '2022-06-01T21:00:00.000000000',
       '2022-06-01T22:00:00.000000000', '2022-06-01T23:00:00.000000000',
       '2022-06-30T00:00:00.000000000', '2022-06-30T01:00:00.000000000',
       '2022-06-30T02:00:00.000000000', '2022-06-30T03:00:00.000000000',
       '2022-06-30T04:00:00.000000000', '2022-06-30T05:00:00.000000000',
       '2022-06-30T06:00:00.000000000', '2022-06-30T07:00:00.000000000',
       '2022-06-30T08:00:00.000000000', '2022-06-30T09:00:00.000000000',
       '2022-06-30T10:00:00.000000000', '2022-06-30T11:00:00.000000000',
       '2022-06-30T12:00:00.000000000', '2022-06-30T13:00:00.000000000',
       '2022-06-30T14:00:00.000000000', '2022-06-30T15:00:00.000000000',
       '2022-06-30T16:00:00.000000000', '2022-06-30T17:00:00.000000000',
       '2022-06-30T18:00:00.000000000', '2022-06-30T19:00:00.000000000',
       '2022-06-30T20:00:00.000000000', '2022-06-30T21:00:00.000000000',
       '2022-06-30T22:00:00.000000000', '2022-06-30T23:00:00.000000000',
       '2022-07-01T00:00:00.000000000', '2022-07-01T01:00:00.000000000',
       '2022-07-01T02:00:00.000000000', '2022-07-01T03:00:00.000000000',
       '2022-07-01T04:00:00.000000000', '2022-07-01T05:00:00.000000000',
       '2022-07-01T06:00:00.000000000', '2022-07-01T07:00:00.000000000',
       '2022-07-01T08:00:00.000000000', '2022-07-01T09:00:00.000000000',
       '2022-07-01T10:00:00.000000000', '2022-07-01T11:00:00.000000000',
       '2022-07-01T12:00:00.000000000', '2022-07-01T13:00:00.000000000',
       '2022-07-01T14:00:00.000000000', '2022-07-01T15:00:00.000000000',
       '2022-07-01T16:00:00.000000000', '2022-07-01T17:00:00.000000000',
       '2022-07-01T18:00:00.000000000', '2022-07-01T19:00:00.000000000',
       '2022-07-01T20:00:00.000000000', '2022-07-01T21:00:00.000000000',
       '2022-07-01T22:00:00.000000000', '2022-07-01T23:00:00.000000000',
       '2022-07-30T00:00:00.000000000', '2022-07-30T01:00:00.000000000',
       '2022-07-30T02:00:00.000000000', '2022-07-30T03:00:00.000000000',
       '2022-07-30T04:00:00.000000000', '2022-07-30T05:00:00.000000000',
       '2022-07-30T06:00:00.000000000', '2022-07-30T07:00:00.000000000',
       '2022-07-30T08:00:00.000000000', '2022-07-30T09:00:00.000000000',
       '2022-07-30T10:00:00.000000000', '2022-07-30T11:00:00.000000000',
       '2022-07-30T12:00:00.000000000', '2022-07-30T13:00:00.000000000',
       '2022-07-30T14:00:00.000000000', '2022-07-30T15:00:00.000000000',
       '2022-07-30T16:00:00.000000000', '2022-07-30T17:00:00.000000000',
       '2022-07-30T18:00:00.000000000', '2022-07-30T19:00:00.000000000',
       '2022-07-30T20:00:00.000000000', '2022-07-30T21:00:00.000000000',
       '2022-07-30T22:00:00.000000000', '2022-07-30T23:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2022-06-01 ... 2022-07-30T23:00:00
Attributes:
    long_name:  time

Because the time index is not continuous, pd.infer_freq returns None, resulting in a time shift of minus 12 hours.

@zoltanmaric
Copy link
Contributor Author

zoltanmaric commented Oct 5, 2022

I think the fix should be as simple as just always setting the time shift to 30 minutes (proposed diff).

The get_data function of era5.py already has the reanalysis-era5-single-levels hard-coded as the product:

retrieval_params = {
"product": "reanalysis-era5-single-levels",
"area": _area(coords),
"chunks": cutout.chunks,
"grid": [cutout.dx, cutout.dy],
"tmpdir": tmpdir,
"lock": lock,
}

To my understanding, this product's resolution is always hourly (see docs), so there's no reason to attempt to infer a different frequency.
image

What do you think @euronion @FabianHofmann ?

zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Oct 6, 2022
@zoltanmaric
Copy link
Contributor Author

zoltanmaric commented Oct 7, 2022

Comparing ERA5 Values Requested at Different Time Samplings

Here's a comparison of values received from ERA5 at 1h, 2h, 3h, 4h, and 6h sampling:

image

Code to Generate the Above Table

import xarray
import atlite.datasets.era5 as era5
from dask.utils import SerializableLock
import functools

# Create lists of hours like [00:00, 01:00, 02:00, ...], sampled
# every hour, every 2 hours, etc.
time_sampling = {}
for rate in [1, 2, 3, 4, 6]:
    time_sampling[rate] = [f"{hour:02}:00" for hour in range(0, 24, rate)]

retrieval_params = {
    'product': 'reanalysis-era5-single-levels',
    'area': [57.0, -0.5, 56.0, 0.5],
    'chunks': {'time': 100},
    'grid': [0.25, 0.25],
    'tmpdir': '/tmp',
    'lock': SerializableLock(),
    'year': '2013',
    'month': [1],
    'day': [1]
}

param_sets = {hour: {**retrieval_params, **{'time': time}} for hour, time in time_sampling.items()}


def retrieve_data_for_single_raster(params: dict) -> "xarray.DataSet":
    variable = [
        "surface_net_solar_radiation",
        "surface_solar_radiation_downwards",
        "toa_incident_solar_radiation",
        "total_sky_direct_solar_radiation_at_surface",
    ]
    ds = era5.retrieve_data(variable=variable, **params)
    return ds.sel(latitude=56, longitude=0).load().to_dataframe()[["ssr", "ssrd", "tisr", "fdir"]]


# Retrieve ERA5 data for each different time sampling
raw_ds = {hour: retrieve_data_for_single_raster(params) for hour, params in param_sets.items()}


def join_dfs(left_sampling_and_df, right_sampling_and_df):
    left_sampling, left_df = left_sampling_and_df
    right_sampling, right_df = right_sampling_and_df

    suffix = f"_{right_sampling}h"

    return left_sampling, left_df.join(right_df, on='time', how='left', rsuffix=suffix)


# Merge all data into a single dataframe to show differences in values
# for each hour next to each other
_, merged = functools.reduce(join_dfs, raw_ds.items())


samplings_compared = merged.sort_index(axis='columns')\
    .query('time.dt.hour > 7 and time.dt.hour < 18')\
    .astype(float)\
    .round(decimals=2)

# Remove date part
samplings_compared.index = samplings_compared.index.time

samplings_compared.to_csv('/tmp/samplings_compared.csv')

While this does show that the time shift is not proportional to the sampling, there's an additional weird twist to it. The values for 2h, 3h, 4h, and 6h sampling seem to be equal - but the values for 1h sampling are slightly different (by less than 1% at noon).

It would be interesting to find out why that is, but as far as this issue is concerned - I think it's still more appropriate to always shift the time by 30 minutes, than by half of the sampling interval.

@FabianHofmann
Copy link
Contributor

@zoltanmaric I am impressed how fast you are catching up with the atlite code! Definitely makes sense!

@FabianHofmann
Copy link
Contributor

Interesting, but I would not bother with the value differences for different time resolutions, since it is related to ERA5 internals only.

zoltanmaric added a commit to zoltanmaric/atlite that referenced this issue Oct 7, 2022
zoltanmaric added a commit to zoltanmaric/pypsa-eur that referenced this issue Nov 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants