Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Decorators and Better Pytest Integration in 'test_excel' #19829

Merged
merged 19 commits into from
Feb 26, 2018

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Feb 21, 2018

Once I started looking at it, it made sense to make the module more "pytest compatible". There was a lot of skipping being done imperatively that seemed better suited to using class-level decorators, so as I moved some of that around I also replaced the setup/teardown mechanism of the "Excel Writing" tests with a pytest fixture

@@ -1907,10 +1780,10 @@ def test_invalid_columns(self):
with pytest.raises(KeyError):
write_frame.to_excel(path, 'test1', columns=['C', 'D'])

def test_comment_arg(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few one-off functions like this that didn't previously have the skip_if_no_xlrd call in them However, when trying to run them without xlrd but say xlwt installed they would throw:

ModuleNotFoundError: No module named 'xlrd'

thrown by the ExcelFile constructor. So I'm assuming it was an oversight not to have these skipped, (they will be going forward with the class structure I put in place)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that’s good thanks

pytest.param(_XlwtWriter, '.xls', marks=pytest.mark.skipif(
not td.safe_import('xlwt'), reason='No xlwt'))
])
def test_ExcelWriter_dispatch(self, klass, ext):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was pretty wonky to begin with. With the parametrization I've split out the tests and given preference to xlsxwriter if both that and openpyxl are installed with .xlsx files, which is what the original function was doing in a roundabout way

pytest.param('xlsxwriter', '.xlsx', marks=pytest.mark.skipif(
not td.safe_import('xlsxwriter'), reason='No xlsxwriter'))
])
class TestExcelWriter(_WriterBase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating subclasses and adding arbitrary class metadata it seemed easy enough and clearer to parametrize the writing tests for each engine and the extension it should be serving

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that came to mind - want me to add parametrization for .xlsm with openpyxl and xlsxwriter? Neither were being tested before, but I think it makes sense to add those extensions for more coverage.

Happy to add here or in separate change if we don't want to expand the scope further

ext = '.xlsx'
engine_name = 'xlrd'
check_skip = staticmethod(_skip_if_no_xlrd)
class _WriterBase(SharedItems):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original ExcelWriterBase class had an explicit setup / teardown method to set / reset options before each test case. Because I parametrized that test the setup / teardown paradigm did not have visibility into the actual fixtures being passes, so setting up an autoused fixture here to wrap each test case was the best option to inspect the parameters and set / reset the appropriate excel options before and after test execution.

I made this it's own subclass because the three classes that test specific features of openpyxl, xlsxwriter and xlwt still benefit from having this fixture, although it does require them to provide some extra metadata (namely the merge_cells fixture) that they don't use.

If this is too wonky, I think an alternate approach would be to re-wire the tests to specifically pass the engine fixture as a keyword argument to their read functions (didn't explore that option in detail so could be wrong)

@WillAyd WillAyd changed the title Excel skipif decs Test Decorators and Better Pytest Integration in 'test_excel' Feb 21, 2018
@@ -103,7 +67,7 @@ def get_csv_refdf(self, basename):
dfref = read_csv(pref, index_col=0, parse_dates=True, engine='python')
return dfref

def get_excelfile(self, basename):
def get_excelfile(self, basename, ext):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More verbose doing this, but with the parametrization of Test Classes performed further down in the module I think it's better to explicitly pass ext as an argument rather than creating subclasses that bind their own ext variable

@WillAyd
Copy link
Member Author

WillAyd commented Feb 21, 2018

The Travis network failures seem specific to Py27 (ran fine locally on Py3 but could reproduce with Py27). Assume that something is getting crossed up between the fixtures and the decorator for that test

@WillAyd
Copy link
Member Author

WillAyd commented Feb 22, 2018

So this is extremely nuanced but it looks like the combination of Py27, using functools.wraps and the tm.network decorator do not play nice together. I found another like issue open on Pytest's bug tracker and their suggestion was to use the wraps method from six to be backwards compatible with Py27.

Before making a change on that front I figure it's worth stopping here and getting feedback on the PR as is, but that again is one possible solution to the failures.

Here's the issue I mentioned:
pytest-dev/pytest#2782

And the "offender" in the pandas source:

@wraps(t)

@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

looks pretty reasonable
ok on fixing the wraps
thing can just add the six wraps inline in pandas.compat (or separate file of too big)

@codecov
Copy link

codecov bot commented Feb 22, 2018

Codecov Report

Merging #19829 into master will decrease coverage by 0.01%.
The diff coverage is 53.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19829      +/-   ##
==========================================
- Coverage   91.66%   91.65%   -0.02%     
==========================================
  Files         150      150              
  Lines       48938    48949      +11     
==========================================
+ Hits        44860    44865       +5     
- Misses       4078     4084       +6
Flag Coverage Δ
#multiple 90.03% <53.84%> (-0.02%) ⬇️
#single 41.81% <53.84%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/util/testing.py 83.85% <100%> (ø) ⬆️
pandas/util/_test_decorators.py 92.42% <100%> (+0.36%) ⬆️
pandas/compat/__init__.py 57.62% <25%> (-1.15%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92dbc78...872e7d6. Read the comment docs.

import xlrd

# Test reading times with and without milliseconds. GH5945.
if LooseVersion(xlrd.__VERSION__) >= LooseVersion("0.9.3"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could put this in the decorator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional doesn't skip anything - it just sets a different expectation on the precision of the result. I suppose we could split this into two separate tests and skip one or the other depending on what's installed but I think that is more trouble than its worth, especially considering the current skip_if_no decorator only has a min_version but not a max_version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, no prolem.

@@ -1907,10 +1780,10 @@ def test_invalid_columns(self):
with pytest.raises(KeyError):
write_frame.to_excel(path, 'test1', columns=['C', 'D'])

def test_comment_arg(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that’s good thanks

@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

I think my comment got garbled. ok to fix the wraps things by using the six version. just include it in pandas.compat (or as a separate file if too big).

@jreback jreback added Testing pandas testing functions or related to the test suite IO Excel read_excel, to_excel labels Feb 22, 2018
@TomAugspurger
Copy link
Contributor

Can you check to make sure this doesn't hit the issue described in pytest-dev/pytest#568?

I don't recall the details, but it was something to do with applying skip marks to child classes causing parents & or siblings to skip, which sounds similar to what's changing here.

@WillAyd
Copy link
Member Author

WillAyd commented Feb 22, 2018

@TomAugspurger thanks for sharing. Fortunately that issue should not affect this design here, as the only subclasses with class-level skip_if decorators inherit _WriterBase, which doesn't have any tests it needs to execute on its own.

FWIW I compare the before / after results with various combinations of these packages installed in a virtual environment and the numbers matched in all cases, save the fact that there are 3 new tests as a result of parametrizing test_ExcelWriter_dispatch and creating test_ExcelWriter_dispatch_raises

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 22, 2018 via email

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just small request on docs. ping on green.

@@ -365,6 +365,18 @@ def callable(obj):
return any("__call__" in klass.__dict__ for klass in type(obj).__mro__)


if sys.version_info[0:2] < (3, 4):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob ok to just do this for PY2. can hyou add a comment and ref to the CPython bug report of what this is doing

version = getattr(sys.modules[mod_name], '__version__')
try:
version = getattr(sys.modules[mod_name], '__version__')
except AttributeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugg, really?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose another way to go about this would be do a str.lower on the attribute, but I did it this way because it's clearer why there's even a need for it. Whenever xlrd becomes obsolete it's an easy identifier that we can get rid of it.

If you prefer the former then no problem I'll do that in next commit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I put in a PR to xlrd to add this python-excel/xlrd#275 for future versions

import xlrd

# Test reading times with and without milliseconds. GH5945.
if LooseVersion(xlrd.__VERSION__) >= LooseVersion("0.9.3"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, no prolem.

@jreback jreback added this to the 0.23.0 milestone Feb 24, 2018
@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

I removed xlwt (only) and got this:

______________________________________________________________________________________ TestExcelWriter.test_to_excel_timedelta[openpyxl-.xlsx-True] ______________________________________________________________________________________
pandas/tests/io/test_excel.py:1386: in test_to_excel_timedelta
    frame.to_excel(path, 'test1')
pandas/core/frame.py:1599: in to_excel
    engine=engine)
pandas/io/formats/excel.py:646: in write
    writer = ExcelWriter(_stringify_path(writer), engine=engine)
pandas/io/excel.py:1432: in __init__
    import xlwt
E   ModuleNotFoundError: No module named 'xlwt'
_____________________________________________________________________________________ TestExcelWriter.test_to_excel_timedelta[openpyxl-.xlsx-False] ______________________________________________________________________________________
pandas/tests/io/test_excel.py:1386: in test_to_excel_timedelta
    frame.to_excel(path, 'test1')
pandas/core/frame.py:1599: in to_excel
    engine=engine)
pandas/io/formats/excel.py:646: in write
    writer = ExcelWriter(_stringify_path(writer), engine=engine)
pandas/io/excel.py:1432: in __init__
    import xlwt
E   ModuleNotFoundError: No module named 'xlwt'
_____________________________________________________________________________________ TestExcelWriter.test_to_excel_timedelta[xlsxwriter-.xlsx-True] _____________________________________________________________________________________
pandas/tests/io/test_excel.py:1386: in test_to_excel_timedelta
    frame.to_excel(path, 'test1')
pandas/core/frame.py:1599: in to_excel
    engine=engine)
pandas/io/formats/excel.py:646: in write
    writer = ExcelWriter(_stringify_path(writer), engine=engine)
pandas/io/excel.py:1432: in __init__
    import xlwt
E   ModuleNotFoundError: No module named 'xlwt'
____________________________________________________________________________________ TestExcelWriter.test_to_excel_timedelta[xlsxwriter-.xlsx-False] _____________________________________________________________________________________
pandas/tests/io/test_excel.py:1386: in test_to_excel_timedelta
    frame.to_excel(path, 'test1')
pandas/core/frame.py:1599: in to_excel
    engine=engine)
pandas/io/formats/excel.py:646: in write
    writer = ExcelWriter(_stringify_path(writer), engine=engine)
pandas/io/excel.py:1432: in __init__
    import xlwt
E   ModuleNotFoundError: No module named 'xlwt'
=========================================================================================== 4 failed, 293 passed, 116 skipped in 13.11 seconds ===========================================================================================

recons = read_excel(reader, 'test1')
tm.assert_frame_equal(expected, recons)

def test_to_excel_timedelta(self, merge_cells, engine, ext):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed without xlwt because the extension was hard-coded to .xls all the time, rather than using the ext that the fixture provides for all other test cases. I updated that to use the fixture, but in the process discovered that this doesn't work correctly with openpyxl so explicitly added a pytest.xfail and plan to open an issue to address separately

self.frame.to_excel(self.path, 'test1', header=False)
self.frame.to_excel(self.path, 'test1', index=False)

@pytest.mark.parametrize("np_type", [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured I might as well parametrize this and the following two functions as part of this

option_name = 'io.excel.{ext}.writer'.format(ext=ext.strip('.'))
prev_engine = get_option(option_name)
set_option(option_name, engine)
with ensure_clean(ext) as path:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because every write test uses ensure_clean as a context manager I figured it made more sense to add to the fixture rather than duplicating for each test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally fine. can you add some comments in the set_options, e.g. also a line explaining that this automatically used

# avoid mixed inferred_type
df = DataFrame([[u'\u0192', u'\u0193', u'\u0194'],
[u'\u0195', u'\u0196', u'\u0197']],
index=[u'A\u0192', u'B'],
columns=[u'X\u0193', u'Y', u'Z'])

with ensure_clean('__tmp_to_excel_float_format__.' + self.ext)\
as filename:
with ensure_clean('__tmp_to_excel_float_format__.' + ext) as filename:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next test modified the name of the file generated by ensure_clean. In these tests, the content manager within the setup fixture is therefore a no-op. Could move these into a separate class or do something tricky to prevent that, but I figure it was easiest just to keep as is

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase as well

option_name = 'io.excel.{ext}.writer'.format(ext=ext.strip('.'))
prev_engine = get_option(option_name)
set_option(option_name, engine)
with ensure_clean(ext) as path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally fine. can you add some comments in the set_options, e.g. also a line explaining that this automatically used

@WillAyd
Copy link
Member Author

WillAyd commented Feb 26, 2018

Just pushed updates. FWIW there was an error in the last Circle / Travis builds for TestExcelWriter.test_to_excel_timedelta. That test that originally hard-coded to use '.xls' files so was never testing for openpyxl and xlsxwriter. I'm imperatively skipping the former as that was not working and needs to be addressed separately (see #19900), but the later seems to work for most configurations.

I can't reproduce the issue locally on OSX with Py27 so perhaps it's a Linux-only issue? Waiting to see what happens this time around, but assuming it pops up again I would plan on imperatively skipping for the combination of Py27 and Linux for now and opening a separate issue to address that

INSTALLED VERSIONS

commit: 849933d
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0.dev0+400.g849933d2c
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Feb 26, 2018

happy to hve in this one
but if it is holding this back can do in a follow up
lmk

@WillAyd
Copy link
Member Author

WillAyd commented Feb 26, 2018

Tried locally and everything passed with '.xlsm' so I'd prefer to include in this. Will re-push after seeing output of latest commit in Travis/CircleCI


if engine == 'openpyxl':
pytest.xfail('Timedelta roundtrip broken with openpyxl')
if engine == 'xlsxwriter' and (sys.version_info[0] == 2 and
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This keeps failing on Travis with Py2, but AppVeyor was fine and I can't reproduce locally (macOS) so I assume it to be linux-only (?). Since this is a "new" test I figure just skip for now and I'll open an issue to see if someone on a Linux platform can help troubleshoot. If you have a better idea on how to manage let me know

@jreback jreback merged commit 1e4c50a into pandas-dev:master Feb 26, 2018
@jreback
Copy link
Contributor

jreback commented Feb 26, 2018

thanks @WillAyd keep em coming!

and ideally if you can submit PR's for the followups you noted (I think you created issues for all)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: Use Skipif Decorators in 'test_excel.py'
3 participants