pd.Series interpolate with method='time' returns inconsistent results for first or last NaN #15356

Closed
bertrandhaut opened this issue Feb 9, 2017 · 2 comments
Labels
Duplicate Report Duplicate issue or pull request Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@bertrandhaut
Contributor

Calling pd.Series.interpolate with method='time' returns inconsistent results depending on whether the first or the last value is NaN.

When the first value is NaN, interpolation is not performed on the first value. This is, for me, the expected behaviour since interpolation is not possible.

from datetime import datetime
import pandas as pd
pd.Series(index=[datetime(2017,1,1), datetime(2017,1,2), datetime(2017,1,7)], data=[float('nan'), float('nan'), 3]).interpolate(method='time')

2017-01-01 NaN
2017-01-02 NaN
2017-01-07 3.0
dtype: float64

When the last value is NaN, 'interpolation' is performed anyway, and it behaves like a forward-fill. I was expecting the trailing NaN values to be kept.

pd.Series(index=[datetime(2017,1,1), datetime(2017,1,2), datetime(2017,1,7)], data=[1, float('nan'), float('nan')]).interpolate(method='time')

2017-01-01 1.0
2017-01-02 1.0
2017-01-07 1.0
dtype: float64
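A possible workaround, given here only as a sketch and not as part of the original report: interpolate as usual, then put NaN back on every position outside the range spanned by the first and last valid values. The helper name `interpolate_between_valid` is hypothetical, and the sketch assumes a sorted index whose labels can be compared with `<` and `>`.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def interpolate_between_valid(s, **kwargs):
    """Interpolate s, but keep leading and trailing NaN values as NaN.

    Hypothetical helper for illustration; assumes a sorted, comparable index.
    """
    first, last = s.first_valid_index(), s.last_valid_index()
    if first is None:
        # Every value is NaN, so there is nothing to interpolate from.
        return s.copy()
    filled = s.interpolate(**kwargs)
    # Positions before the first or after the last valid observation
    # cannot be interpolated, only extrapolated, so reset them to NaN.
    outside = (s.index < first) | (s.index > last)
    filled.loc[outside] = np.nan
    return filled


idx = pd.DatetimeIndex(
    [datetime(2017, 1, 1), datetime(2017, 1, 2), datetime(2017, 1, 7)]
)
trailing = pd.Series([1.0, np.nan, np.nan], index=idx)
result = interpolate_between_valid(trailing, method="time")
print(result)
```

With this helper the two trailing values stay NaN, which matches the behaviour already seen for leading NaN values.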

pd.show_versions()
INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jorisvandenbossche
Member

This does not seem to be specific to 'time' interpolation, as you see the same behaviour in simpler cases:

In [58]: pd.Series([1, 2, np.nan], index=[1,2,4]).interpolate()
Out[58]: 
1    1.0
2    2.0
4    2.0
dtype: float64

In [59]: pd.Series([1, 2, np.nan], index=[1,2,4]).interpolate(method='index')
Out[59]: 
1    1.0
2    2.0
4    2.0
dtype: float64

To me that feels like a bug (or at least a behaviour that is not mentioned in the docs), but I am not too familiar with the interpolate code.
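For readers on a newer pandas release than the 0.19.1 shown in this report, `interpolate` accepts a `limit_area` argument, and `limit_area='inside'` restricts filling to NaN values that are surrounded by valid values. The sketch below assumes such a release is installed; the argument is not available in the version used above.

```python
import numpy as np
import pandas as pd

# One interior gap (label 2) and one trailing gap (label 5).
s = pd.Series([1.0, np.nan, 3.0, np.nan], index=[1, 2, 3, 5])

# Assumption: a pandas release that supports limit_area.
# 'inside' fills only NaN values that have valid values on both sides.
inside_only = s.interpolate(method="index", limit_area="inside")
print(inside_only)
```

The interior gap is filled from its neighbours, while the trailing NaN is left untouched.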

@jorisvandenbossche
Member

OK, this is a duplicate of #8000. You are always welcome to look into this and submit a PR!

@jorisvandenbossche jorisvandenbossche added Duplicate Report Duplicate issue or pull request Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Feb 9, 2017
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Feb 9, 2017