-
-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115
Comments
Hi @Alexander-Serov, I'm a little confused; does it keep running because of the Could you also provide a minimalistic code example of the plotly- figure (data) that gets stuck? Cheers, |
Hey @jonasvdd, No, I think it is the
I will see if I can provide a minimal example, the problem is I am working with copyright code and data, so I cannot just copy-paste... I will see what I can do. But the problem clearly appears in the 0.8.0 release. |
Hi, is it possible to create a summary of your data properties (e.g. which trace sizes are we speaking of, what dtype does your data have) and share your plotting code! Than I can try to look whether I can replicate it! |
The plotting code for this function is very simple, the problem likely comes from the data 🤔. Here is a simplified version of the code:
Basically grouping a dataset with time index by a parameter and plotting several curves. |
And here is a sample of data. I hope you are able to reproduce. Remember to convert the index to a time index (perhaps it has an effect?). |
For me, this still works, when I run this in an isolated notebook. What is your behavior with this code snippet (note that it uses the sample_data you shared with me)? from plotly_resampler import FigureResampler
import plotly.graph_objects as go
import pandas as pd
data = pd.read_csv('sample_data.csv', index_col='time')
fig = go.Figure()
for param in sorted(data["param"].unique()):
param_data = data.loc[data["param"] == param,
["param1", "param2"]].set_index(
"param1", drop=True
)
param_data = param_data.sort_index()
fig.add_trace(
go.Scattergl(
x=param_data.index,
y=param_data["param2"],
name=f"{param}",
mode="markers",
marker=dict(color='lightblue', size=5),
)
)
fig.update_xaxes(title_text="D")
fig.update_yaxes(title_text='L')
fig.update_layout(title="Av")
fig.show()
print("-"*80)
FigureResampler(fig, default_n_shown_samples=10_000).show_dash(mode='inline') 📷 ⬇️ |
Hey! It does work indeed, but the index in your example is of
Let me know. |
From your silence, I understand you were able to reproduce the issue @jonasvdd? Let me know if you need anything else. |
Hi @Alexander-Serov, I was able to find 2 errors based on minor adjustments of your snippet and implemented a solution in PR #116
We can create a pre-release so that you can test on your end whether this resolves the issue or not :) PS: we (@jonasvdd & me) were both afk because we were backpacking in the Alps! ⛺ ⛰️ |
Hey! No worries, it seems you guys were closer to my place than I expected. 🏔️ Glad to hear you found the problems and no problem testing the pre-release, let me know! Just a short question for error 2 though. You are saying you are considering dropping the TZ information and using UTC. This might render the FigureResampler unusable for our applications, for example, because UTC time could be hard to interpret for specific projects. E.g. for this particular data set, with a -7h UTC offset, the daytime data would be shown at night. Perhaps, there is a way to keep the TZ? It seems plotly is able to plot the original figure just fine. Or could we use UTC internally, but show the original labels on the x axis? It is plotting non-UTC data in the UTC timezone that would create the most confusion for us. Also, a remark just to be sure we are talking about the same things. In the data set I've sent you, you will see the -07:00 and -08:00 UTC offsets. However, it is still the same timezone and it is just the result of the daylight saving time (DST) change. To properly process this kind of DST data, one normally casts them to a named TZ, for instance, for the data I've sent you, you could use
Would this adjustment solve the problem you were having? (Surely, not all input data can be converted to such format if you don't know the TZ name, but this is a very common case. And perhaps you could process differently [try-catch] those cases, when this is not a problem?). |
Hey! It was a long (night)train-ride away to get there, but what a coincidence indeed 😃 Internally, when there are multiple timezones in the x-data, we now omit the timezone information (but do not convert the time) In the other cases nothing special happens - see additions to our parsing code; plotly-resampler/plotly_resampler/figure_resampler/figure_resampler_interface.py Lines 690 to 718 in c0cd89e
To answer your two questions;
Indeed, plotly can cope well with this! I've checked in tests if omitting timezone information (when there are multiple timezones in the x-data) results in the same x-values as vanilla plotly would show. You are more than welcome to validate this yourself :)
Yes, because in that case the data has only 1 timezone (and not two fixedoffsets). Perhaps we should also include this in the warning message 🤔 I've created pre-release Also, feel free to check out the pull request for this issue #116! |
Hey @Alexander-Serov, does the pre-release fixes the issue? |
Hey @jvdd, |
Hey @jvdd, Sorry again for the delay. I had wanted to test but had forgotten about it! But I've found time to test it now. Here is what I see:
Here is the new (even shorter snippet).
As before, the original plotly plot works fine. Do you have any idea of what could go wrong? Also, I see that you were testing with |
Hi @Alexander-Serov, First of all thanks for the thorough review on PR #116, @jonasvdd and me will take your remarks into account and update the PR accordingly in the following days. It is really nice to dive into & discuss code with other (more) experienced people - this is the power of open source 😄 Regarding your comment above: I was also able to reproduce the kernel crash (actually I encountered this issue as well when creating the first implementation in PR #116, but I thought the issue was fixed in the pre-release - which is apparently not the case 🙃 ). We'll try to properly find the root cause for this issue and hopefully fix it as well in PR #116 |
@Alexander-Serov, taking your comments into account I updated the code and created a new pre-release P.S., I still have to look into finding the most optimal way to drop (or converting?) the tzinfo when there are multiple timezones before we can create a new major release |
Hey! Considering the question of dropping/conversion of datetimes with different TZs, I think it's important to ask oneself why different TZs got there in the first place:
This said, I know plotly itself manages some of these cases without converting all to UTC. I wonder if we can look up what they do in their code. I will now try the rc2 to see if it fixes the kernel crash and have a look at the PR replies. :) If all goes well, we should have a new version soon. :) |
After first tests, it seems like I am having this AssertionError although all my index datetimes are in the same TZ: I am trying to plot
on the above data set converted to a single named TZ by applying
Do you confirm? |
At the same time, if I plot this plot (the one discussed above):
on the dataset with a single named TZ, but DST, this cell continues to run indefinitely. Let me know if you can confirm. I will continue looking at it tomorrow. 😃 Thanks for your efforts! |
I do not get an AssertionError when all the there is only a single named TZ. Code from screenshot above: data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index,utc=True).tz_convert('America/Los_Angeles')
fig = go.Figure()
fig.add_trace(
go.Scattergl(
x=data.index,
y=data["param2"],
mode="markers",
marker=dict(color='lightblue', size=5),
)
)
fig.update_layout(height=300)
fig.show()
print("-"*80)
FigureResampler(fig).show_dash(mode='inline') However, I did get an AssertionError when the data was not converted to a single named TZ.
This plot works perfectly on my computer as well It is rather bizarre that I cannot reproduce your errors.. To be sure that we are comparing apples to apples I created a new pre-release rc3 (which contains the fixes from earlier today) |
This PR still needs some more thought - I tend to agree with your comment @Alexander-Serov #115 (comment) At the moment, I'm unsure if dropping the timezone information is the responsibility of this library - as users can convert it to UTC or drop them themselves..
|
Hey guys! I have tested the latest
I think it pretty much fits into our 99% discussed above. I also see that this conde is exactly the same as your code above. Are we running different plotly/numpy versions? 🤔 Here is my simplified
I am running py3.7 in a Windows environment. |
Update. |
Update. Do you have an idea why it could hang on |
Hi @Alexander-Serov, Thx for investigating this - we both appreciate your efforts :) I could not reproduce your issue in a Python 3.7.14 environment (with the same packages as from the To further answer your questions / comments:
So the issue only occurs when you pass
Nope... really boggles my mind |
I have several figures in a notebook. All other figures plot correctly and I can wrap PlotlyResampler around and show them. However, one particular figure plots just fine, but when I wrap it in PlotlyResampler my cell keeps running indefinitely. This unfortunately blocks my update to 0.8.1. Do you have any idea @jonasvdd ?
#32
FigureResampler(fig, default_n_shown_samples=MAX_POINTS).show_dash(mode="inline")
Other observations:
dash show
, it works fineThe text was updated successfully, but these errors were encountered: