Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Try to sort result of Index.union rather than guessing sortability #17378

Closed
wants to merge 3 commits into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Aug 30, 2017

The lines of code I removed seemed written with the intention of guaranteeing the same result under Python 2 and 3, but the docs say "Form the union of two Index objects and sorts if possible.": so if it is possible in Python 2 we should do it, regardless of whether it is in Python 3. After all, this is how .sort_values() works. And more practically, trying to guess whether the result of an union can be sorted is hard.

@toobaz toobaz force-pushed the index_union_sort branch 2 times, most recently from 1ca8f9c to 35c2ec7 Compare August 30, 2017 07:51
@codecov
Copy link

codecov bot commented Aug 30, 2017

Codecov Report

Merging #17378 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17378      +/-   ##
==========================================
- Coverage   91.01%   90.99%   -0.02%     
==========================================
  Files         163      163              
  Lines       49567    49559       -8     
==========================================
- Hits        45113    45096      -17     
- Misses       4454     4463       +9
Flag Coverage Δ
#multiple 88.77% <100%> (-0.01%) ⬇️
#single 40.25% <60%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 95.86% <100%> (-0.02%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d676a3...35c2ec7. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 30, 2017

Codecov Report

Merging #17378 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17378      +/-   ##
==========================================
- Coverage   91.72%   91.72%   -0.01%     
==========================================
  Files         150      150              
  Lines       49122    49104      -18     
==========================================
- Hits        45057    45039      -18     
  Misses       4065     4065
Flag Coverage Δ
#multiple 90.1% <100%> (-0.01%) ⬇️
#single 41.86% <60%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.65% <100%> (-0.02%) ⬇️
pandas/core/reshape/tile.py 92.94% <0%> (-0.43%) ⬇️
pandas/core/window.py 96.31% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c14e4f...899c040. Read the comment docs.

warnings.warn("%s, sort order is undefined for "
"incomparable objects" % e, RuntimeWarning,
stacklevel=3)
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably can simply use core.sorting.safe_sort here instead and just remove the warning entirely (as it fails in many less cases)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I left the warning as it's still raised if not even safe_sort can sort them.

(I assume the failure is a problem with AppVeyor)

@jreback jreback added 2/3 Compat Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 30, 2017
@toobaz toobaz force-pushed the index_union_sort branch 2 times, most recently from 0a0e0f2 to 0a725fc Compare August 30, 2017 14:07
@toobaz toobaz force-pushed the index_union_sort branch 3 times, most recently from bc4b8ba to e533022 Compare September 6, 2017 15:23
@toobaz
Copy link
Member Author

toobaz commented Sep 6, 2017

Any idea about what's happening with AppVeyor? It fails reliably on this PR with 2.7, but the trace doesn't point at any specific test.

@gfyoung
Copy link
Member

gfyoung commented Sep 6, 2017

@toobaz : Nope, not a clue 😢 . There's no way your changes AFAICT could have caused internal errors on the build. I restarted AppVeyor to double check.

@toobaz
Copy link
Member Author

toobaz commented Sep 6, 2017

I restarted AppVeyor to double check.

Failed again... shall I try closing this and opening a new PR?

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

Failed again... shall I try closing this and opening a new PR?

NO. that just creates more work. let me look.

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

ok this is a legitimate catch. pytest-dist is segfaulting, hiding the traceback. So a couple of options to debug.

setup a vm on 3.6/windows is easiest / best.

you can temporarily change the options in the test runner (in ci/scripts_multiple.py) to something like:
pytest -v -r xX -m "not single" --junitxml=/tmp/multiple.xml $TEST_ARGS pandas

which disables the multi-processing and shows each test (so you can see where its failing).

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

it looks like its either segfaulting or in a recursive loop somewhere.

@toobaz
Copy link
Member Author

toobaz commented Sep 7, 2017

setup a vm on 3.6/windows is easiest / best.

Do you mean Python 3.6?! (tests fail only on 2.7)

@gfyoung
Copy link
Member

gfyoung commented Sep 7, 2017

@toobaz : I think @jreback probably meant 2.7 😄

@jreback
Copy link
Contributor

jreback commented Oct 14, 2017

@toobaz this is related to #17839, could simply recast this to add the sort= kwarg (I think for back compat we have to use =False, but maybe be able to change that, not sure).

@toobaz
Copy link
Member Author

toobaz commented Oct 15, 2017

@toobaz this is related to #17839, could simply recast this to add the sort= kwarg (I think for back compat we have to use =False, but maybe be able to change that, not sure).

Certainly related, but I'm confused: are you suggesting the two methods should share more code? Or just that Index.union too should have a sort= argument? In this case I think that for backward compatibility it should default to True...

@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

Certainly related, but I'm confused: are you suggesting the two methods should share more code? Or just that Index.union too should have a sort= argument? In this case I think that for backward compatibility it should default to True...

yes I would add a sort=True kwarg.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2017

can you rebase. I think we do want to add this kw.

@toobaz
Copy link
Member Author

toobaz commented Nov 26, 2017

Unfortunately the problem on Windows + Python 2.7 persists. I haven't found the time to set up a virtual machine to debug.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

@toobaz if you can rebase would be great. I would also be ok with simply adding the keyword w/o changing much code as a first step.

@gfyoung
Copy link
Member

gfyoung commented Mar 3, 2018

Hmmm...so this issue might be known with pytest (or its plugins) already:

pytest-dev/pytest#2223

Still not sure how @toobaz is triggering that though.

Unless anyone has any other insights into this, I feel inclined to file an issue against one of the pytest repositories and see what they have to say.

@toobaz
Copy link
Member Author

toobaz commented Mar 4, 2018

Maybe we could ask whether there is a way to use this catchsegv within a pytest run?

@gfyoung
Copy link
Member

gfyoung commented Mar 6, 2018

Maybe we could ask whether there is a way to use this catchsegv within a pytest run?

@toobaz : Well, this is a Windows machine, not UNIX 😢

I notice that we are using pytest-forked, which AFAICT, isn't well-supported on Windows:

https://github.com/pytest-dev/pytest-forked

I was about to file an issue against pytest-xdist, but I wonder if this might be the problem child.

@toobaz
Copy link
Member Author

toobaz commented Mar 6, 2018

@toobaz : Well, this is a Windows machine, not UNIX cry

right 🙄

I was about to file an issue against pytest-xdist, but I wonder if this might be the problem child.

What makes you think this is a bug in pytest, not a segfault somewhere in pandas? (or do you think it's both?)

@gfyoung
Copy link
Member

gfyoung commented Mar 7, 2018

What makes you think this is a bug in pytest, not a segfault somewhere in pandas? (or do you think it's both?)

I'm highly skeptical that your changes are causing a segfault. That would be pretty impressive actually given that you're only touching high-level Python 😄

In addition, I've tested your changes on multiple OS's (Windows including). No segfault anywhere.

@gfyoung gfyoung force-pushed the index_union_sort branch 3 times, most recently from 156fd86 to f4571a2 Compare March 7, 2018 04:00
@gfyoung
Copy link
Member

gfyoung commented Mar 7, 2018

So removing xdist + rebasing somehow resulted in test failures on the first machine?

Going to leave this PR alone for now until master is patched,

@toobaz
Copy link
Member Author

toobaz commented Mar 7, 2018

I'm highly skeptical that your changes are causing a segfault. That would be pretty impressive actually given that you're only touching high-level Python

Sure, that would require that my changes are triggering a bug somewhere in lower-level code... but I consider this more probable than a bug in pydist which affects precisely that test, in precisely this PR.

Then again, I'm definitely not an expert.

@gfyoung
Copy link
Member

gfyoung commented Mar 7, 2018

I suppose, but if it's a bug in your code, I haven't been able to reproduce it locally.

@toobaz
Copy link
Member Author

toobaz commented Mar 7, 2018

I suppose, but if it's a bug in your code, I haven't been able to reproduce it locally.

Aha! I had missed that you were testing on Windows. Good to know.

toobaz and others added 3 commits March 9, 2018 20:01
@@ -73,7 +73,7 @@ install:
- cmd: conda info -a

# create our env
- cmd: conda create -n pandas python=%PYTHON_VERSION% cython pytest>=3.1.0 pytest-xdist
- cmd: conda create -n pandas python=%PYTHON_VERSION% cython pytest>=3.1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so does xdist cause the issues with this?

Copy link
Member

@gfyoung gfyoung Mar 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really hard to tell at the moment. I've just been experimenting. I'm just at a loss right now as to how that one build on AppVeyor is crashing is so badly with changes like these.

@gfyoung
Copy link
Member

gfyoung commented Mar 10, 2018

@toobaz @jreback : So good news (I guess?), I was able to replicate the AppVeyor failure locally.

@gfyoung
Copy link
Member

gfyoung commented Mar 10, 2018

@toobaz : It appears that you loosened the sorting restrictions too much, at least for Windows:

  • The test that is breaking AppVeyor is TestAlignment.test_basic_frame_alignment
  • The condition where len(indexer) > 0 is the problem. For some reason, I need to put back the comparison of lvals[0] and other_diff[0] to remove the break OR only trying sorting when in the else branch (i.e. when len(indexer) = 0).

@toobaz
Copy link
Member Author

toobaz commented Mar 11, 2018

@gfyoung thanks for your analysis!

Does the problem arise with any specific combination of index types? (from args = product(...)

@gfyoung
Copy link
Member

gfyoung commented Mar 11, 2018

Does the problem arise with any specific combination of index types?

I can't tell unfortunately. It seems to have crashed with multiple types.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2018

@toobaz whats' the status on this?

toobaz added a commit to toobaz/pandas that referenced this pull request Apr 24, 2018
@toobaz toobaz mentioned this pull request Apr 24, 2018
4 tasks
toobaz added a commit to toobaz/pandas that referenced this pull request Apr 24, 2018
@jreback
Copy link
Contributor

jreback commented Oct 11, 2018

closing as stale, if you want to continue working, pls ping and we can re-open. you will need to merge master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Union of indexes of tuples is not sorted
4 participants