Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Maint] The Gallery is broken in CircleCI previews #174

Closed
psobolewskiPhD opened this issue Jun 8, 2023 · 11 comments · Fixed by #207
Closed

[Maint] The Gallery is broken in CircleCI previews #174

psobolewskiPhD opened this issue Jun 8, 2023 · 11 comments · Fixed by #207

Comments

@psobolewskiPhD
Copy link
Member

🧰 Task

Figure out what the issue is, the CircleCI action is different than the build_docs action, which works correctly—the gallery is fine in the .zip.
There is a difference in the order of the installs, which results in different ninja version.
There is a difference in python version
There is a difference in Qt backend.

I took some stabs at the first and second part, but that resulted in worse outcomes, with one of the examples failing totally and the build failing as as result, see:
#173

@melissawm
Copy link
Member

It looks like all of the examples are failing with the following message:

WARNING: /home/circleci/project/napari/examples/3Dimage_plane_rendering.py failed to execute correctly: Traceback (most recent call last):
  File "/home/circleci/project/napari/examples/3Dimage_plane_rendering.py", line 16, in <module>
    viewer = napari.Viewer(ndisplay=3)
  File "/home/circleci/project/napari/napari/viewer.py", line 67, in __init__
    self._window = Window(self, show=show)
  File "/home/circleci/project/napari/napari/_qt/qt_main_window.py", line 576, in __init__
    self._add_menus()
  File "/home/circleci/project/napari/napari/_qt/qt_main_window.py", line 732, in _add_menus
    self.view_menu = build_qmodel_menu(
  File "/home/circleci/project/napari/napari/_qt/_qapp_model/_menus.py", line 32, in build_qmodel_menu
    return QModelMenu(
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 62, in __init__
    self.rebuild()
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 105, in rebuild
    _rebuild(
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 312, in _rebuild
    submenu = QModelSubmenu(item, app, parent=menu)
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 148, in __init__
    super().__init__(
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 62, in __init__
    self.rebuild()
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 105, in rebuild
    _rebuild(
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qmenu.py", line 315, in _rebuild
    action = QMenuItemAction(item, app=app, parent=menu)
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qaction.py", line 166, in __init__
    self.update_from_context({})
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qaction.py", line 170, in update_from_context
    super().update_from_context(ctx)
  File "/home/circleci/project/venv/lib/python3.10/site-packages/app_model/backends/qt/_qaction.py", line 94, in update_from_context
    self.setEnabled(expr.eval(ctx) if (expr := self._cmd_rule.enablement) else True)
RuntimeError: Internal C++ object (QMenuItemAction) already deleted.

@Czaki
Copy link
Contributor

Czaki commented Jun 13, 2023

It is boring to see the same error again and again. The next problem from the storage of global reference to QtWidget, which is cleaned during destroy of the viewer. :(

@melissawm
Copy link
Member

melissawm commented Jun 23, 2023

Ok! Currently, changing the system/qt dependencies for CircleCI the error has moved from the one above to

generating gallery for gallery... [ 91%] nD_multiscale_image_non_uniform.py
Killed
make: *** [Makefile:30: docs-build] Error 137

Exited with code exit status 2
CircleCI received exit code 2

As @psobolewskiPhD has found, this may mean CircleCI is running out of memory: https://support.circleci.com/hc/en-us/articles/115014359648-Exit-Code-137-Out-of-Memory

I tried increasing the CircleCI image memory but we don't have that machine available in our tier. So currently our options are:

  • Increase the CircleCI memory tier (by paying)
  • Figuring out why the gallery is consuming so much memory.
    I am hoping maybe one of the Qt experts may help with finding a way to either clean up after each example or make sure we handle this better?

Feel free to inspect things here: #178

@Czaki
Copy link
Contributor

Czaki commented Jun 24, 2023

Did qtgallery allow to create custom teardown setup after each example?

@psobolewskiPhD
Copy link
Member Author

I think that should be possible, if not via qtgallery, which has reset_qapp:
https://qtgallery.readthedocs.io/en/latest/api.html
Then by sphinx-gallery, which has options for before or after:
https://sphinx-gallery.github.io/stable/advanced.html#custom-reset
I mean looking at this example script:
https://github.com/napari/napari/blob/49e235ae94e4a3c9d473208a10fa191a22a07c76/examples/add_image.py#L1-L18

It doesn't look like napari would be closed afterwards? So making the gallery would just pile up naparis? maybe the qtgallery handling of the event loop isn't working with napari.run()

@melissawm
Copy link
Member

@dalthviz this is the issue we discussed today, maybe you can take a look 😄

@dalthviz
Copy link
Member

Checking, I think the issue comes from the way sphinx-gallery works (here a comment I found in their issue tracker about how it works: sphinx-gallery/sphinx-gallery#1149 (comment)). If I understand correctly, all the examples run in a single Python interpreter which essencially means that any data that an example script uses/generates stays in memory as long as the gallery/docs are being generated. Testing locally, for the full gallery/docs generation, currently you need ~12 Gb so makes sense that the CI is failing due to a memory error (I think the large CI image comes with 8Gb while the xlarge comes with 16Gb). Maybe running the docs generation while setting the memory consumption option could help to detected the examples that use a lot of memory.

Also, an idea that maybe could be worthy to explore is to divide the examples generation so kind of calling the make docs-build multiple times while passing different matching patterns. Maybe using patterns like "/3D*.py", "/add_.py", "/nD.py", etc, and as a last run use the current pattern to catch any missing examples ("/*.py") or the names of the examples that use a lot of memory to run them independently could work. Since sphinx-gallery has a way to detect which examples have been already build I think something like the above could help.

@lucyleeow
Copy link
Collaborator

Ah I think I could have been better help as I wrote some of this code in Sphinx Gallery 🤦

If I understand correctly, all the examples run in a single Python interpreter which essencially means that any data that an example script uses/generates stays in memory as long as the gallery/docs are being generated.

Yes this is true.

We can do a garbage collection after each example by adding a 'resetting' function to run at the end of each example. Sphinx Gallery should delete references to variables created in examples, at the end of execution of each example. See: sphinx-gallery/sphinx-gallery#853

I can have a go at doing this tomorrow.

@melissawm
Copy link
Member

melissawm commented Jul 24, 2023

As a starting point, we are using qtgallery and we did add a reset to gallery examples here:

docs/docs/conf.py

Lines 210 to 225 in 704fb54

sphinx_gallery_conf = {
#'examples_dirs': '../../napari/examples', # path to your example scripts
# this value is set in the Makefile
'gallery_dirs': 'gallery', # path to where to save gallery generated output
'filename_pattern': '/*.py',
'ignore_pattern': 'README.rst|/*_.py',
'default_thumb_file': Path(__file__).parent / 'images' / 'logo.png',
'plot_gallery': "'True'", # https://github.com/sphinx-gallery/sphinx-gallery/pull/304/files
'download_all_examples': False,
'min_reported_time': 10,
'only_warn_on_example_error': True,
'image_scrapers': ("matplotlib", qtgallery.qtscraper,),
'reset_modules': (reset_napari_theme,),
'reference_url': {'napari': None},
'within_subsection_order': ExampleTitleSortKey,
}

This is probably not enough or there is some interaction still missing here...

@aganders3
Copy link
Contributor

I spent entirely too much time debugging the docs CI for napari/napari#4865 and ended up making a more specific "napari scraper" that seems to help and might be of use here as well (see #207). This also vastly speeds up the build for me on CI; from ~25 min to ~11 min on GitHub Actions, though no improvement on CircleCI.

One caveat is that it only scrapes napari Viewer windows - I'm not sure if we have any gallery examples that need to scrape other windows but will go through them to check.

@lucyleeow
Copy link
Collaborator

Amazing. This is great because I've realised that Sphinx gallery will do a garbage collection between every example.

@jni jni closed this as completed in #207 Jul 24, 2023
jni pushed a commit that referenced this issue Jul 24, 2023
# Description
This is a CI change that removes `qtgallery` in favor of a
napari-specific scraper. It's not that different but gives us a little
more control and is not much code.

This seems to fix the error seen in "Build PR Docs" for
napari/napari#4865 and also speeds up the docs
build quite a bit (~11 min instead of ~25 min).

I'm no Qt expert but suspect the main improvements here are related to
adding `napari.Viewer.close_all()` (which maybe belongs in the reset fn)
and calling `processEvents()` one more time after this.

The main drawback right now is that this doesn't capture non-Viewer
windows, but this could probably be added if needed.

## Type of change
- [x] Fixes or improves workflow, documentation build or deployment

# References
closes #174 (maybe?)
fixes errors in docs build for
napari/napari#4865

## Final checklist:
- [x] My PR is the minimum possible work for the desired functionality
- [x] I have commented my code, particularly in hard-to-understand areas
melissawm pushed a commit to melissawm/napari-docs that referenced this issue Sep 6, 2023
# Description
This is a CI change that removes `qtgallery` in favor of a
napari-specific scraper. It's not that different but gives us a little
more control and is not much code.

This seems to fix the error seen in "Build PR Docs" for
napari/napari#4865 and also speeds up the docs
build quite a bit (~11 min instead of ~25 min).

I'm no Qt expert but suspect the main improvements here are related to
adding `napari.Viewer.close_all()` (which maybe belongs in the reset fn)
and calling `processEvents()` one more time after this.

The main drawback right now is that this doesn't capture non-Viewer
windows, but this could probably be added if needed.

## Type of change
- [x] Fixes or improves workflow, documentation build or deployment

# References
closes napari#174 (maybe?)
fixes errors in docs build for
napari/napari#4865

## Final checklist:
- [x] My PR is the minimum possible work for the desired functionality
- [x] I have commented my code, particularly in hard-to-understand areas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment