You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
date vals vals_2
0 2018-01-01 00:00:00-06:00 a NaN
1 2018-01-02 00:00:00-06:00 b NaN
2 2018-01-03 00:00:00-06:00 c NaN
3 2018-01-04 00:00:00-06:00 d NaN
4 2018-01-05 00:00:00-06:00 e NaN
When i was expecting
date vals vals_2
0 2018-01-01 00:00:00-06:00 a NaN
1 2018-01-02 00:00:00-06:00 b NaN
2 2018-01-03 00:00:00-06:00 c t
3 2018-01-04 00:00:00-06:00 d u
4 2018-01-05 00:00:00-06:00 e v
In PR #25260 the test case was specified with all NaN in vals_2 as expected. I don't understand why considering how merge on two columns or join on two indicies work:
Yes, it's a bit strange that this is inconsistent with identical DataFrames that are tz naive:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+520.g2ef50aea04'
In [2]: df1 = pd.DataFrame({
...: 'date': pd.date_range(start='2018-01-01', periods=5, tz='America/Chicago'),
...: 'vals': list('abcde')})
...:
...: df2 = pd.DataFrame({
...: 'date': pd.date_range(start='2018-01-03', periods=5, tz='America/Chicago'),
...: 'vals_2': list('tuvwx')})
...:
...: df1.join(df2.set_index('date'), on='date')
Out[2]:
date vals vals_2
0 2018-01-01 00:00:00-06:00 a NaN
1 2018-01-02 00:00:00-06:00 b NaN
2 2018-01-03 00:00:00-06:00 c NaN
3 2018-01-04 00:00:00-06:00 d NaN
4 2018-01-05 00:00:00-06:00 e NaN
In [3]: df1 = pd.DataFrame({
...: 'date': pd.date_range(start='2018-01-01', periods=5),
...: 'vals': list('abcde')})
...:
...: df2 = pd.DataFrame({
...: 'date': pd.date_range(start='2018-01-03', periods=5),
...: 'vals_2': list('tuvwx')})
...:
...: df1.join(df2.set_index('date'), on='date')
Out[3]:
date vals vals_2
0 2018-01-01 a NaN
1 2018-01-02 b NaN
2 2018-01-03 c t
3 2018-01-04 d u
4 2018-01-05 e v
I stumbled across some unexpected behavior when performing a left join on a specified column with time zone aware columns.
When i was expecting
In PR #25260 the test case was specified with all NaN in vals_2 as expected. I don't understand why considering how merge on two columns or join on two indicies work:
Both yield the expected behavior.
Is there something I'm missing or is this inconsistency a bug?
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.1
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: None
The text was updated successfully, but these errors were encountered: