query_crossborder_flows fails in August 2021 for DE->NL #144

Closed
charelF opened this issue Nov 23, 2021 · 15 comments
Labels
server bug when an issue isn't entsoe-py's issue, but on the server side

Comments

@charelF

charelF commented Nov 23, 2021

Problem

There seems to be a problem with the crossborder flows query, query_crossborder_flows.
In particular, when querying DE -> NL crossborder flows over summer 2021, the query fails for August.
I assume it happens because the data for July is hourly, while the data for September is quarter-hourly (15 min). The resolution probably switches from hourly to quarter-hourly somewhere in August, which in turn messes up the dataframe and causes the error.
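
One way to check a hypothesis like this is to look at the spacing between consecutive timestamps of a returned series. The following is a minimal sketch and not part of entsoe-py: `resolution_changes` is a hypothetical helper, and the sample series holds placeholder values rather than real flows.

```python
import pandas as pd


def resolution_changes(series: pd.Series) -> pd.Series:
    """Return the sample spacings at the points where the spacing changes.

    The result is indexed by the first timestamp that uses the new spacing.
    """
    steps = series.index.to_series().diff().dropna()
    if steps.empty:
        return steps
    changed = steps.ne(steps.shift())
    changed.iloc[0] = False  # the first step has nothing to compare against
    return steps[changed]


# Placeholder data: three hourly samples followed by 15-minute samples.
hourly = pd.date_range("2021-08-01 00:00", periods=3, freq="60min", tz="Europe/Amsterdam")
quarter = pd.date_range("2021-08-01 02:15", periods=3, freq="15min", tz="Europe/Amsterdam")
index = hourly.append(quarter)
sample = pd.Series(range(len(index)), index=index, dtype="float64")

switches = resolution_changes(sample)
print(switches)
```

Applied to a series that was returned successfully (for example one covering a range on either side of the suspected switch), an empty result would suggest a single resolution throughout.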

Information

pandas: 1.3.4
entsoe-py: 0.4.1
python: 3.9.7
os: macOS monterey 12.0.1

Reproduction

July 2021: Correct

start = pd.Timestamp(str(int(2021_07_01)), tz='Europe/Amsterdam')
end = pd.Timestamp(str(int(2021_08_01)), tz='Europe/Amsterdam')
client.query_crossborder_flows("DE", "NL", start=start, end=end)

returns the expected output:

2021-07-01 00:00:00+02:00    1110.0
2021-07-01 01:00:00+02:00     925.0
2021-07-01 02:00:00+02:00     540.0
2021-07-01 03:00:00+02:00     743.0
2021-07-01 04:00:00+02:00     538.0
                              ...  
2021-07-31 19:00:00+02:00    1437.0
2021-07-31 20:00:00+02:00    1266.0
2021-07-31 21:00:00+02:00    2218.0
2021-07-31 22:00:00+02:00    2171.0
2021-07-31 23:00:00+02:00    2313.0
Freq: 60T, Length: 744, dtype: float64

September 2021: Correct

Similarly,

start = pd.Timestamp(str(int(2021_09_01)), tz='Europe/Amsterdam')
end = pd.Timestamp(str(int(2021_10_01)), tz='Europe/Amsterdam')
client.query_crossborder_flows("DE", "NL", start=start, end=end)

returns the expected output:

2021-09-01 00:00:00+02:00    2579.0
2021-09-01 00:15:00+02:00    2430.0
2021-09-01 00:30:00+02:00    2378.0
2021-09-01 00:45:00+02:00    2281.0
2021-09-01 01:00:00+02:00    2287.0
                              ...  
2021-09-30 22:45:00+02:00     843.0
2021-09-30 23:00:00+02:00    1210.0
2021-09-30 23:15:00+02:00    1191.0
2021-09-30 23:30:00+02:00    1233.0
2021-09-30 23:45:00+02:00    1233.0
Freq: 15T, Length: 2880, dtype: float64

August 2021: Problem

However this one fails:

start = pd.Timestamp(str(int(2021_08_01)), tz='Europe/Amsterdam')
end = pd.Timestamp(str(int(2021_09_01)), tz='Europe/Amsterdam')
client.query_crossborder_flows("DE", "NL", start=start, end=end)

with

ValueError: Length mismatch: Expected axis has 22 elements, new values have 85 elements

Lastly, I just wanted to thank all contributors of this library; it has saved me an enormous amount of time and spared me the headaches I had working with the entsoe XML API. Thanks!

@charelF
Author

charelF commented Nov 23, 2021

Here is the full error message btw:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/d9/6_lw279s459fpbps9x1mxd6m0000gn/T/ipykernel_18198/2060622407.py in <module>
      1 start = pd.Timestamp(str(int(2021_08_01)), tz='Europe/Amsterdam')
      2 end = pd.Timestamp(str(int(2021_09_01)), tz='Europe/Amsterdam')
----> 3 client.query_crossborder_flows("DE", "NL", start=start, end=end)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/entsoe/decorators.py in year_wrapper(start, end, *args, **kwargs)
     62         for _start, _end in blocks:
     63             try:
---> 64                 frame = func(*args, start=_start, end=_end, **kwargs)
     65             except NoMatchingDataError:
     66                 logging.debug(f"NoMatchingDataError: between {_start} and {_end}")

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/entsoe/entsoe.py in query_crossborder_flows(self, country_code_from, country_code_to, start, end, **kwargs)
   1180             start=start,
   1181             end=end)
-> 1182         ts = parse_crossborder_flows(text)
   1183         ts = ts.tz_convert(area_from.tz)
   1184         ts = ts.truncate(before=start, after=end)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/entsoe/parsers.py in parse_crossborder_flows(xml_text)
    229     series = pd.Series(dtype = 'object')
    230     for soup in _extract_timeseries(xml_text):
--> 231         series = series.append(_parse_crossborder_flows_timeseries(soup))
    232     series = series.sort_index()
    233     return series

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/entsoe/parsers.py in _parse_crossborder_flows_timeseries(soup)
    659     series = pd.Series(index=positions, data=flows)
    660     series = series.sort_index()
--> 661     series.index = _parse_datetimeindex(soup)
    662 
    663     return series

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    557         if not fastpath:
    558             # The ensure_index call above ensures we have an Index object
--> 559             self._mgr.set_axis(axis, labels)
    560 
    561     # ndarray compatibility

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218 

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     55 
     56         elif new_len != old_len:
---> 57             raise ValueError(
     58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 22 elements, new values have 85 elements

@fboerman
Collaborator

fboerman commented Nov 23, 2021 via email

@charelF charelF changed the title query_crossborder_flows fails in August 2021 query_crossborder_flows fails in August 2021 for DE->NL Nov 23, 2021
@charelF
Author

charelF commented Nov 23, 2021

I'm not sure it's related to #137, given that that user seems to get incorrect data whereas in my case it just fails.

Although, interestingly,

start = pd.Timestamp(str(int(2021_08_01)), tz='Europe/Amsterdam')
end = pd.Timestamp(str(int(2021_09_01)), tz='Europe/Amsterdam')
client.query_crossborder_flows("CZ", "AT", start=start, end=end)

returns

2021-08-01 00:00:00+02:00    1103.0
2021-08-01 01:00:00+02:00    1121.0
2021-08-01 02:00:00+02:00    1062.0
2021-08-01 03:00:00+02:00    1037.0
2021-08-01 04:00:00+02:00     945.0
                              ...  
2021-08-31 19:00:00+02:00     899.0
2021-08-31 20:00:00+02:00     821.0
2021-08-31 21:00:00+02:00     766.0
2021-08-31 22:00:00+02:00    1054.0
2021-08-31 23:00:00+02:00     969.0
Freq: 60T, Length: 744, dtype: float64

which is indeed a valid result (a pandas Series), which makes me think that the issue I have is potentially only present for DE->NL queries. I added this to the title of the issue.

@fboerman
Collaborator

fboerman commented Nov 23, 2021 via email

@charelF
Author

charelF commented Nov 23, 2021

It gets a bit weirder:

I narrowed the problem down to the 1st of August 2021, because from August 2nd onwards we get the correct result (15 min interval):

start = pd.Timestamp(year=2021, month=8, day=2, tz='Europe/Amsterdam')
end = pd.Timestamp(year=2021, month=9, day=1, tz='Europe/Amsterdam')
client.query_crossborder_flows("DE", "NL", start=start, end=end)

gives the correct looking df:

2021-08-02 02:00:00+02:00    2144.0
2021-08-02 02:15:00+02:00    1819.0
2021-08-02 02:30:00+02:00    1677.0
2021-08-02 02:45:00+02:00    1559.0
2021-08-02 03:00:00+02:00    1343.0
                              ...  
2021-08-31 22:45:00+02:00    3312.0
2021-08-31 23:00:00+02:00    3314.0
2021-08-31 23:15:00+02:00    3290.0
2021-08-31 23:30:00+02:00    3276.0
2021-08-31 23:45:00+02:00    3145.0
Freq: 15T, Length: 2872, dtype: float64

The problem seems to occur only on the 1st of August, for example:

start = pd.Timestamp(year=2021, month=8, day=1, hour=10, tz='Europe/Amsterdam')
end = pd.Timestamp(year=2021, month=8, day=1, hour=15, tz='Europe/Amsterdam')
client.query_crossborder_flows("DE", "NL", start=start, end=end)

gives again:

ValueError: Length mismatch: Expected axis has 5 elements, new values have 17 elements
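
This kind of narrowing-down could be automated by splitting a range into windows and recording which ones raise. A minimal sketch under stated assumptions: `find_failing_windows` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that simply fails for windows overlapping 1 August 2021, so the example runs without an API key.

```python
import pandas as pd


def find_failing_windows(query, start, end, freq="1D"):
    """Call query(window_start, window_end) per window and collect ValueErrors."""
    edges = pd.date_range(start, end, freq=freq)
    failures = []
    for window_start, window_end in zip(edges[:-1], edges[1:]):
        try:
            query(window_start, window_end)
        except ValueError as exc:
            failures.append((window_start, window_end, str(exc)))
    return failures


# Stand-in for the real API call (assumption, not entsoe-py code).
bad_day = pd.Timestamp("2021-08-01", tz="Europe/Amsterdam")


def fake_query(start, end):
    # Fail for any window that overlaps the bad day, mimicking the report.
    if start < bad_day + pd.Timedelta(days=1) and end > bad_day:
        raise ValueError("Length mismatch")
    return pd.Series(dtype="float64")


failures = find_failing_windows(
    fake_query,
    pd.Timestamp("2021-07-30", tz="Europe/Amsterdam"),
    pd.Timestamp("2021-08-03", tz="Europe/Amsterdam"),
)
print([(str(a.date()), str(b.date())) for a, b, _ in failures])
```

With a real client, the callable could be something like `lambda s, e: client.query_crossborder_flows("DE", "NL", start=s, end=e)`. The window size matters, since a window that is too short may not trigger the failure.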

However, what is really interesting:

Querying every single hour of August 1st individually does work:

for h1, h2 in zip(range(23), range(1,24)):
    start = pd.Timestamp(year=2021, month=8, day=1, hour=h1, tz='Europe/Amsterdam')
    end = pd.Timestamp(year=2021, month=8, day=1, hour=h2, tz='Europe/Amsterdam')
    print(client.query_crossborder_flows("DE", "NL", start=start, end=end))

gives

2021-08-01 00:00:00+02:00    1345.0
Freq: 60T, dtype: float64
2021-08-01 01:00:00+02:00    1046.0
Freq: 60T, dtype: float64
2021-08-01 02:00:00+02:00    1070.0
Freq: 15T, dtype: float64
2021-08-01 03:00:00+02:00    1965.0
Freq: 15T, dtype: float64
2021-08-01 04:00:00+02:00    1950.0
Freq: 15T, dtype: float64
2021-08-01 05:00:00+02:00    2696.0
Freq: 15T, dtype: float64
2021-08-01 06:00:00+02:00    2276.0
Freq: 15T, dtype: float64
2021-08-01 07:00:00+02:00    2174.0
Freq: 15T, dtype: float64
2021-08-01 08:00:00+02:00    1366.0
Freq: 15T, dtype: float64
2021-08-01 09:00:00+02:00    799.0
Freq: 15T, dtype: float64
2021-08-01 10:00:00+02:00    453.0
Freq: 15T, dtype: float64
2021-08-01 11:00:00+02:00    252.0
Freq: 15T, dtype: float64
2021-08-01 12:00:00+02:00    561.0
Freq: 15T, dtype: float64
2021-08-01 13:00:00+02:00    451.0
Freq: 15T, dtype: float64
2021-08-01 14:00:00+02:00    226.0
Freq: 15T, dtype: float64
2021-08-01 15:00:00+02:00    179.0
Freq: 15T, dtype: float64
2021-08-01 16:00:00+02:00    563.0
Freq: 15T, dtype: float64
2021-08-01 17:00:00+02:00    1833.0
Freq: 15T, dtype: float64
2021-08-01 18:00:00+02:00    2561.0
Freq: 15T, dtype: float64
2021-08-01 19:00:00+02:00    1432.0
Freq: 15T, dtype: float64
2021-08-01 20:00:00+02:00    1609.0
Freq: 15T, dtype: float64
2021-08-01 21:00:00+02:00    1405.0
Freq: 15T, dtype: float64
2021-08-01 22:00:00+02:00    2261.0
Freq: 15T, dtype: float64

To me it looks like the frequency changes from 60T to 15T partway through the day (at 02:00 in the output above), which I assume means from 1 h to 15 min; however, the data returned for that day is still only hourly. So when querying multiple hours on that day, we get hourly data from entsoe while the index frequency is 15 min, which causes the length mismatch. At least that is my hypothesis right now.
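
The pandas side of this hypothesis can be reproduced in isolation: assigning an index with more entries than the series has values raises the same kind of error. This is a sketch with placeholder values; the counts 22 and 85 are taken from the error message above, and the 02:00 start time is an assumption. How the library arrives at those counts is not shown here.

```python
import pandas as pd

# 22 placeholder values standing in for hourly flows (not real data).
flows = pd.Series([0.0] * 22)

# An index built at 15-minute resolution with 85 entries, as in the error message.
start = pd.Timestamp("2021-08-01 02:00", tz="Europe/Amsterdam")
index_15min = pd.date_range(start, periods=85, freq="15min")

message = None
try:
    flows.index = index_15min
except ValueError as exc:
    message = str(exc)

print(message)
```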

@fboerman
Collaborator

Hi @charelF, thank you for your thorough investigation. That seems like a very good explanation of why it goes wrong. The question is what the best fix would be. This is a one-time occurrence, so maybe just putting in a static fix would be fine. But it is possible that this will occur on more borders. I am not sure yet what the best solution is. Do you have a good idea?

@charelF
Author

charelF commented Nov 23, 2021

Do you have a good idea?

Unfortunately I am both new to this library and also new to this domain, so I do not feel qualified to give an answer on how to best fix this in the library (particularly since the call stack is still somewhat obscure to me, in particular the soup parts) nor to comment on how common this would be on other borders.

That being said, I did more investigations:

On here, I looked at the 3 dates:
https://transparency.entsoe.eu/transmission-domain/physicalFlow/show?name=&defaultValue=false&viewType=TABLE&areaType=BORDER_CTY&atch=false&dateTime.dateTime=31.07.2021+00:00|UTC|DAY&border.values=CTY|10YNL----------L!CTY_CTY|10YNL----------L_CTY_CTY|10Y1001A1001A83F&dateTime.timezone=UTC&dateTime.timezone_input=UTC

July 31st 2021

image

We have normal 1 hour intervals

August 2nd 2021

image

Normal 15 min intervals

August 1st 2021

image

Oh no... so it looks like a mix of 1 hour intervals and 15 min intervals --> This is probably where the error comes from, so I assume it's an issue on entsoe's side.

I checked, and this issue is not present for any of the other borders of NL, at least not on that particular day.

With these insights, I would say this is an error from entsoe, so probably a manual fix would apply here, since one can assume this should not happen again...
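
For users who end up with data at mixed resolutions (for example by querying the hourly and 15 min parts separately), one possible client-side workaround is to bring everything onto a uniform 15 min grid. This is only a sketch, not something entsoe-py does: it assumes that repeating an hourly value for each of its quarter-hours is acceptable for the use case, and the four 15 min values are placeholders.

```python
import pandas as pd

tz = "Europe/Amsterdam"

# Hourly part of the day (values taken from the per-hour output in this thread).
hourly = pd.Series(
    [1345.0, 1046.0],
    index=pd.date_range("2021-08-01 00:00", periods=2, freq="60min", tz=tz),
)

# 15-minute part; these four values are placeholders, not real flows.
quarter = pd.Series(
    [1070.0, 1080.0, 1090.0, 1100.0],
    index=pd.date_range("2021-08-01 02:00", periods=4, freq="15min", tz=tz),
)

mixed = pd.concat([hourly, quarter]).sort_index()

# Upsample to a uniform 15-minute grid, repeating each hourly value
# for the quarter-hours it covers.
uniform = mixed.resample("15min").ffill()
print(uniform)
```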

@fboerman
Collaborator

I can perhaps look up the exact implementation date of the 15 min trade. It is very possible that this is not a problem on entsoe's side but on TenneT's side, in the way it is reported. I will talk to my contact at the TenneT transparency team tomorrow (disclaimer: I also work at TenneT) about how this was handled. Thanks for the thorough background info! I wish that all issue authors would be like this haha

@charelF
Author

charelF commented Nov 23, 2021

Thanks for the thorough background info! I wish that all issue authors would be like this haha

Thanks, I am happy to pay some of the time this library saved me back!

@fboerman fboerman added the server bug when an issue isn't entsoe-py's issue, but on the server side label Nov 24, 2021
@fboerman
Collaborator

Hi @charelF, I have poked some people and they are looking into it. I'll update the issue when I know more.

@MattEwen

Hiya - just wanted to add that the same issue seems to occur for Germany with both Switzerland:
Screenshot 2021-12-16 at 10 00 37

and Luxembourg:
Screenshot 2021-12-16 at 10 02 00

where the time granularity changes early in the month and means the data can't be parsed. I get the same

ValueError: Length mismatch: Expected axis has 22 elements, new values have 85 elements

I only mention this because, strangely, I don't get this error for DE and NL, only in these two cases.

And thanks again so much for the library, it really is great

@fboerman
Collaborator

Hi @MattEwen, this seems like a consistent issue then. I will open a support ticket with entsoe.

@fboerman
Collaborator

fboerman commented Jan 6, 2022

Hi @MattEwen, I have received a reply: it is now fixed on the Dutch borders, but they are still looking into the other borders.

@fboerman
Collaborator

fboerman commented Jan 14, 2022

hi @MattEwen and @charelF the issue is now fixed by ENTSO-E for all cases we discussed here :D

@charelF
Author

charelF commented Jan 15, 2022

Thanks, that's nice to hear!
