Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage in 3.3.0 #2968

Closed
4 tasks done
mplanchard opened this issue Nov 28, 2017 · 22 comments
Closed
4 tasks done

Memory usage in 3.3.0 #2968

mplanchard opened this issue Nov 28, 2017 · 22 comments
Labels
plugin: logging related to the logging builtin plugin status: critical grave problem or usability issue that affects lots of users type: regression indicates a problem that was introduced in a release which was working previously

Comments

@mplanchard
Copy link

Bug Description

TL;DR: memory usage seems to have increased several fold in 3.3.0. With the default log capturing enabled, memory usage has the potential to increase to ludicrous levels.

I was troubleshooting issues with tests hanging on our CI server and massive swap utilization, and I found it to be due to an upgrade to pytest 3.3.0. Running our test suite with 3.3.0 quickly consumed all of the memory and began swapping.

Looking at the changelog, the capturing of log output seemed to be the most likely cause. Our application uses SQLAlchemy, which has verbose logging, and has rather verbose logging itself, so I attempted to disable log capturing via both of the options specified here, and while the first seemed to decrease the massive consumption of virtual memory seen with the default options, neither was able to reduce memory utilization enough for our test suite to run.

I have collected the memory utilization of various configurations below. For all of the 3.3.0 runs, the CI server began swapping and the test suite hung, although some configurations resulted in more virtual memory consumption than others.

In all cases, regardless of attempting to disable log capturing, the virtual memory output jumped significantly when hitting tests that produced a large amount of log output. However, even prior to hitting those tests, virtual memory utilization was around twice that of its peak in version 3.2.5.

The tests were running in a python:3.5.2 docker container, via an Ubuntu 16.04 OS. The system has 2 GB of RAM and 2 GB of swap space. In all cases, pytest was invoked via tox, with the command coverage run --source <project directory> -m py.test {posargs}

pytest version additional options peak virtual memory consumed peak resident memory consumed test suite completed
3.2.5 none 2685 MB 243 MB yes
3.3.0 none 11.9 GB !!! 418 MB no
3.3.0 addopts=-p no:logging 9105 MB 366 MB no
3.3.0 --no-print-logs 12.2 GB !!! 431 MB no
3.3.0 addopts=-p no:logging and --no-print-logs 12.2 GB !!! 415 MB no

pytest is not included in the pip output below, since its version varied by test. All other packages remained consistent between tests. Some private distributions have been redacted.

We have had to pin our version of pytest to <3.3.0 for the moment. Is there anything else I can try to reduce the base memory footprint of version 3.3.0?

I'm happy to gather more data as needed.

Thanks!

Pip List

aiohttp (2.2.5)
alembic (0.9.5)
amqp (2.2.2)
aniso8601 (1.3.0)
apipkg (1.4)
apispec (0.25.4)
appnope (0.1.0)
asn1crypto (0.23.0)
astroid (1.5.3)
async-timeout (2.0.0)
attrs (17.3.0)
autopep8 (1.3.3)
Babel (2.5.1)
billiard (3.5.0.3)
bleach (2.1.1)
celery (4.1.0)
certifi (2017.7.27.1)
cffi (1.11.2)
chardet (3.0.4)
click (6.7)
coverage (4.4.2)
cryptography (2.1.1)
debtcollector (1.18.0)
decorator (4.1.2)
dill (0.2.7.1)
docutils (0.14)
entrypoints (0.2.3)
execnet (1.5.0)
flake8 (3.5.0)
flasgger (0.8.0)
Flask (0.12.2)
Flask-RESTful (0.3.6)
Flask-Script (2.0.6)
frozendict (1.2)
html5lib (1.0b10)
httplib2 (0.10.3)
idna (2.6)
ipdb (0.10.3)
ipykernel (4.6.1)
ipython (6.2.1)
ipython-genutils (0.2.0)
ipywidgets (7.0.5)
iso8601 (0.1.12)
isort (4.2.15)
itsdangerous (0.24)
jedi (0.11.0)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.2.0)
jupyter-core (4.4.0)
keystoneauth1 (3.2.0)
kombu (4.1.0)
lazy-object-proxy (1.3.1)
loggly-python-handler (1.0.0)
Mako (1.0.7)
MarkupSafe (1.0)
marshmallow (2.13.6)
marshmallow-sqlalchemy (0.13.1)
mccabe (0.6.1)
mistune (0.7.4)
monotonic (1.3)
msgpack-python (0.4.8)
multidict (3.2.0)
mypy (0.550)
mysqlclient (1.3.12)
nbconvert (5.3.1)
nbformat (4.4.0)
netaddr (0.7.19)
netifaces (0.10.6)
notebook (5.2.1)
oslo.config (4.13.1)
oslo.i18n (3.18.0)
oslo.policy (1.28.1)
oslo.serialization (2.21.1)
oslo.utils (3.30.0)
pandocfilters (1.4.2)
parso (0.1.0)
passlib (1.7.1)
pbr (3.1.1)
pexpect (4.2.1)
pickleshare (0.7.4)
pika (0.11.0)
pip (9.0.1)
pluggy (0.6.0)
positional (1.2.1)
prompt-toolkit (1.0.15)
psutil (5.4.1)
psycopg2 (2.7.3.1)
ptyprocess (0.5.2)
py (1.5.2)
pycodestyle (2.3.1)
pycparser (2.18)
pydecor (1.1.3)
pydocstyle (2.1.1)
pyflakes (1.6.0)
Pygments (2.2.0)
PyJWT (1.5.3)
pylint (1.7.4)
pymongo (3.5.1)
pyparsing (2.2.0)
pyrabbit (1.1.0)
pytest-aiohttp (0.1.3)
pytest-cov (2.5.1)
pytest-forked (0.2)
pytest-xdist (1.20.1)
python-dateutil (2.6.1)
python-editor (1.0.3)
python-keystoneclient (3.13.0)
python-memcached (1.58)
pytz (2017.2)
PyYAML (3.12)
pyzmq (16.0.3)
qtconsole (4.3.1)
redis (2.10.6)
requests (2.18.4)
requests-futures (0.9.7)
restructuredtext-lint (1.1.2)
rfc3986 (1.1.0)
setuptools (20.10.1)
simplegeneric (0.8.1)
simplejson (3.11.1)
six (1.11.0)
snowballstemmer (1.2.1)
SQLAlchemy (1.1.14)
SQLAlchemy-Utils (0.32.18)
stevedore (1.27.1)
stripe (1.67.0)
terminado (0.8.1)
testpath (0.3.1)
tornado (4.5.2)
tox (2.9.1)
traitlets (4.3.2)
typed-ast (1.1.0)
typing (3.6.2)
urllib3 (1.22)
vine (1.1.4)
virtualenv (15.1.0)
wcwidth (0.1.7)
webargs (1.8.1)
webencodings (0.5.1)
Werkzeug (0.12.2)
widgetsnbextension (3.0.8)
wrapt (1.10.11)
yarl (0.13.0)

Submission Checklist

Thanks for submitting an issue!

Here's a quick checklist in what to include:

  • Include a detailed description of the bug or suggestion
  • pip list of the virtual environment you are using
  • pytest and operating system versions
  • Minimal example if possible
@mplanchard
Copy link
Author

mplanchard commented Nov 28, 2017

I was just trying to make a repeatable example of this issue, and I have found it to be relatively easy. The file here uses a ton of RAM during testing, without all the docker/tox/coverage complications of our real environment:

OS: MacOS 10.13.2
Python: 3.5

pip version options virtual memory resident memory test time
3.2.5 none 132 MB 32 MB 78.76 seconds
3.3.0 none 890 MB 731 MB 140.96 seconds
3.3.0 -p no:logging 125 MB 32 MB 74.27 seconds
3.3.0 --no-print-logs 819 MB 711 MB 140.96 seconds

This indicates that the additional memory overhead seen in our full test environment when -p no:logging is set is due to something other than the log capturing. Has the fixture setup/teardown lifecycle changed significantly?

Edit: it's also possible that this test doesn't really replicate our actual test environment well enough. We've got ~2500 tests, hundreds of fixtures, etc.

@nicoddemus
Copy link
Member

@mplanchard thanks a ton for the reproducible example, we really appreciate it.

I can reproduce your memory usage on my Windows computer. Indeed I suspected the logging module when I first read your post a few minutes ago because it will append the captured log to the test items, which are kept alive for the entire test session. Setting --no-print-logs should be enough as well as this prevents the logging module from appending the captured output to the items:

pytest/_pytest/logging.py

Lines 307 to 310 in 4d2f05e

if self.print_logs:
# Add a captured log section to the report.
log = log_handler.stream.getvalue().strip()
item.add_report_section(when, 'log', log)

But this does not explain why even with -p no:logging you still see an increase in memory in your full test environment of course. There has not been any significant change in fixture setup/teardown that I remember.

Are you passing -p no:logging on the command line or in the pytest.ini file? As #2964 shows, there might be a bug related to that.

@nicoddemus nicoddemus added status: critical grave problem or usability issue that affects lots of users type: regression indicates a problem that was introduced in a release which was working previously plugin: logging related to the logging builtin plugin labels Nov 28, 2017
@mplanchard
Copy link
Author

In the full environment, I was passing it via the pytest.ini file, although in my examples from my local machine I was just appending it to the command line (which seemed to work).

I can enable debugging, config tracing, etc. and run any particular combination that you think might be helpful. The tests take quite a while, so just let me know and I can run them in the background during the workday.

@mplanchard
Copy link
Author

mplanchard commented Nov 29, 2017

It's odd that --no-print-logs is showing virtually the same memory consumption as with no options in both my local and the full test environment. Maybe it's the record list being kept by the LogCaptureHandler? It seems like that might get appended to even if self.print_logs is False

pytest/_pytest/logging.py

Lines 117 to 128 in 4d2f05e

class LogCaptureHandler(logging.StreamHandler):
"""A logging handler that stores log records and the log text."""
def __init__(self):
"""Creates a new log handler."""
logging.StreamHandler.__init__(self, py.io.TextIO())
self.records = []
def emit(self, record):
"""Keep the log records in a list in addition to the log text."""
self.records.append(record)
logging.StreamHandler.emit(self, record)

@nicoddemus
Copy link
Member

Maybe it's the record list being kept by the LogCaptureHandler?

Good point and I thought about that, but the handler is deleted from the item at the end of each phase:

pytest/_pytest/logging.py

Lines 296 to 310 in 4d2f05e

@contextmanager
def _runtest_for(self, item, when):
"""Implements the internals of pytest_runtest_xxx() hook."""
with catching_logs(LogCaptureHandler(),
formatter=self.formatter) as log_handler:
item.catch_log_handler = log_handler
try:
yield # run test
finally:
del item.catch_log_handler
if self.print_logs:
# Add a captured log section to the report.
log = log_handler.stream.getvalue().strip()
item.add_report_section(when, 'log', log)

I'm really puzzled that even with logging disabled with -p no:logging you are still seeing the abnormal memory increase...

@mplanchard
Copy link
Author

The memory usage seems to be higher from the start, which is what made me ask about differences in fixture setup. I'm doing a run now with -p no:logging, --debug, and --trace-config. I'll need to review the output for sensitive information before sending it. Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

@nicoddemus
Copy link
Member

Thanks for the patience in debugging this!

Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

Hopefully yes, so it would be great if you could that as well. 😁

@mplanchard
Copy link
Author

No problem! We love pytest, and use it for all our projects. It's the least we can do.

Also, just collecting statistics, out test suite outputs about 54000 log messages, which is about 10 MB, so even if every log message were held in active memory simultaneously, it would not explain the sudden jump in virtual memory by several gigabytes.

@mplanchard
Copy link
Author

mplanchard commented Nov 29, 2017

I'm having some issues with collecting tests in --debug mode, because several of our fixtures rely on setting/transmitting a Flask application context, and having --debug on seems to interfere with that. There's a relatively complex fixture hierarchy relying on those fixtures, and of course a lot of tests. Any ideas as to how to get around that?

___________________________________ ERROR collecting invision_api/tests/integration/v3/test_log.py ____________________________________
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:617: in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:222: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:179: in __call__
    self.before(hook.name, hook_impls, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:441: in before
    hooktrace(hook_name, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:160: in __call__
    self.root.processmessage(self.tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:136: in processmessage
    lines = self.format_message(tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:131: in format_message
    lines.append("%s    %s: %s\n" % (indent, name, value))
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:366: in <lambda>
    __str__ = lambda x: str(x._get_current_object())
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:306: in _get_current_object
    return self.__local()
.tox/py35/lib/python3.5/site-packages/flask/globals.py:51: in _find_app
    raise RuntimeError(_app_ctx_err_msg)
E   RuntimeError: Working outside of application context.
E
E   This typically means that you attempted to use functionality that needed
E   to interface with the current application object in a way.  To solve
E   this set up an application context with app.app_context().  See the
E   documentation for more information.

@mplanchard
Copy link
Author

mplanchard commented Nov 29, 2017

I think I may have found something interesting. Looking at htop, I felt like there were more threads being used in 3.3.0, so I counted the threads every two seconds. With version 3.3.0, the thread count is increased by a factor of ~10 for the majority of the test, and then bumps up to a thread count increase of ~25 when it hits the latter tests (which have many more log messages, although I'm not sure that's causal).

Version 3.3.0 with -p no:logging
2 -> 4 -> 8 -> 28 -> 70 -> 136 -> 180 -> 226 -> 372 (majority of tests) -> 816 (beginning of latter tests till freeze)

Version 3.2.5
2 -> 4 -> 8 -> 30 -> 114 -> 148 -> 32 -> 34 (majority of tests) -> 166 (beginning of latter tests) -> 92 -> 150 -> 222 -> 90 -> 216 -> 82 -> 186 -> 220 -> 222 -> 230 -> 57 -> 38 -> 32 (until tests finished)

So, it looks like pytest may be holding onto threads that it was letting go previously, or spinning up more threads than it was before.

Edit: thread count collected with while true; do ps -eLf | grep -c "py.test"; sleep 2; done

@mplanchard
Copy link
Author

And the thread count gets even higher with the logging plugin enabled.

Version 3.3.0 with default options
2 -> 4 -> 8 -> 4 -> 28 -> 30 -> 76 -> 130 -> 162 -> 202 -> 290 -> 348 -> 372 (majority of tests) -> 554 (beginning of latter tests) -> 712 -> 922 -> 1178 (freezes)

@nicoddemus
Copy link
Member

Hi @mplanchard, thanks a lot for the detailed statistics. The part about the high number of threads is interesting: pytest itself (nor py and pluggy) spawn any threads AFAIK. This to me also suggests it might be a side effect of something not being teardown at the appropriate time (fixtures might be one case as you suggest). I would love to help debug this further but it is hard to speculate further without looking at the source code.

I would try to reduce the tests and code involved step by step until I got to a minimal reproducer. For example running half of the test suite, looking for fixtures which might spawn threads and see if the same number of threads keep spawning if tests which don't use that fixture are executed, and so on. Really sorry I can't help more than that, but I will be sure to come back here if more suggestions pop in my head.

@mplanchard
Copy link
Author

No worries. I didn't find any reference to the threading module in the pytest code, so I'm inclined to think you're right. I wonder if it might be due to the changes referenced in #2959? Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

I'll try to get a minimal example by this evening that demonstrates a difference in thread consumption at the lowest possible level.

@nicoddemus
Copy link
Member

Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

Actually the change in #2959 refers to the fact that the params argument of pytest.fixture could be mutated after the declaration and it would affect the parametrization with the mutated list; returning mutable objects from fixtures is common practice and should work.

Another thing to look for: do all your fixtures implement an explicit cleanup (if makes sense)? For example, explicitly closing a db connection or a destroying a thread pool? I can see some bug sneaking in that would keep the fixture frame alive for some reason, and a fixture which implicitly relied on the fact that the locals of that frame were destroyed for cleanup might start acting up (#2798 came to mind, but that was reported in 3.2).

@mplanchard
Copy link
Author

You were absolutely right about explicit cleanup. We had a couple of test fixtures that were relying on an implicit closing of a MongoClient connection upon garbage collection (sneakily, because they were instantiating objects that instantiated clients). Updating these to close() the clients explicitly fixed the memory leak so that the -p no:logging utilization more closely mirrors that of 3.2.5.

I should have time this evening both to check the memory utilization with logging enabled and to hopefully generate a test based on our fixtures that is able to demonstrate the difference between 3.2.5 and 3.3.0.

@The-Compiler
Copy link
Member

I haven't really had time to follow this discussion, but FWIW I've had good experiences tracking down memory leaks using tracemalloc and/or pympler.muppy.

@nicoddemus
Copy link
Member

nicoddemus commented Nov 29, 2017

@mplanchard that's both great news (that you found a solution to your problem) and a bit of sad news (it seems we introduced a regression about leaking frames/fixtures in 3.3). 😅

We would really appreciate if you can come up with a reproducible example that demonstrates the differences between 3.2 and 3.3 regarding fixture cleanup. 👍

@nicoddemus
Copy link
Member

Rá managed to find a reproducer in #2981! 👍

@mplanchard
Copy link
Author

mplanchard commented Nov 30, 2017 via email

@nicoddemus
Copy link
Member

Thanks for the patience as well!

out test suite outputs about 54000 log messages, which is about 10 MB

With the fixed tear down code, could you please check the memory consumption of your test suite with the logging plugin active?

@mplanchard
Copy link
Author

mplanchard commented Nov 30, 2017

@nicoddemus It seems like after integrating our teardown fixes, thread utilization and memory consumption is pretty much back to normal, even with the logging plugin enabled. In a run with pytest 3.3.0 and the logging plugin enabled, both metrics look pretty much identical to running with 3.2.5.

The changes required were remarkably minimal, but of course the fixtures are used in hundreds and hundreds of tests. I've included all of the required fixture changes for our test suite below:

-@pytest.yield_fixture()
-def legacy_app():
+@pytest.fixture(scope='session')
+def legacy_app_session():
+    """Yield a legacy application for the session."""
+    instance = api.create_custom_app()
+    yield instance
+    instance.mongo.db.client.close()
+
+
+@pytest.fixture()
+def legacy_app(legacy_app_session):
     """Yield an isolated flask test app instance"""
-    app = api.create_custom_app()
-    context = app.app_context()
+    context = legacy_app_session.app_context()
     context.push()
-    yield app
+    yield legacy_app_session
     context.pop()

 @pytest.fixture(scope='session')
 def session_app(sql_session):
     """Return an app for the session, with no context"""
     instance = api.create_app()
     instance.config['TESTING'] = True
     scoped_session = instance.sql.session
     instance.sql.session = instance.sql.session()
     yield instance
+    instance.mongo.db.client.close()
     instance.sql.session.expunge_all()
     instance.sql.session.rollback()
     instance.sql.session.close()

 @pytest.fixture()
-def db(mongo_util):
+def db(mongo_util, db_client):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client.get_database(mongo_util.db_info.db)
     return db
 
 
 @pytest.fixture(scope='session')
-def db_session(mongo_util):
+def db_session(mongo_util, db_client_session):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client_session.get_database(mongo_util.db_info.db)
     return db

The mongo_util.get_db() method being used previously created a client when it was called, which was then not getting closed. Also, the db_client fixtures, which were not being used previously in the fixtures where they've been added, already had explicit teardowns.

@nicoddemus
Copy link
Member

That's great news, thanks @mplanchard for the followup! I'm really glad we got to the bottom of the issue and the solution was simple.

I'm closing this for now because the root of the issue is detailed in #2981 already. Thanks again for all the patience and understanding while we tracked down the problem! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plugin: logging related to the logging builtin plugin status: critical grave problem or usability issue that affects lots of users type: regression indicates a problem that was introduced in a release which was working previously
Projects
None yet
Development

No branches or pull requests

3 participants