Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Adds support for lazy and optional importing of dependencies #1415

Closed
wants to merge 4 commits into from

Conversation

mnmelo
Copy link
Member

@mnmelo mnmelo commented Jun 21, 2017

This addresses issues #577, #1361 and #1159. I have rebased against @orbeckst's branch issue-1159-analysis-deps, not develop, hence the conflicts.

Namely, it provides an easy way to import modules that are only really loaded on accession. This allows one to:

  1. delay module loading for potentially heavy dependencies until they're really needed;
  2. import optional dependencies without always having to catch potential ImportErrors in order to replace them with meaningful exit messages;
  3. concentrate imports at file headers regardless of the needs in points 1. and 2;
  4. via 3., reduce repetition when conditional imports are needed in more than one place.

Mechanism

My strategy was to adapt the code from the importing module from the PEAK package. On lazy loading a dummy, hollow module class is created and instantiated that triggers a proper module import on accession.

API

Right now two functions are exposed: lazy.import_module and lazy.import_function that emulate the typical uses of import and from import. I'd like feedback on this.
Here's how it now looks in analysis/psa.py, which uses several lazy/optional imports:

# Optional and/or lazily loaded modules
#from scipy import spatial, cluster
#import matplotlib
from MDAnalysis.lib import lazy
spatial = lazy.import_module("scipy.spatial")
cluster = lazy.import_module("scipy.cluster")

matplotlib = lazy.import_module('matplotlib')
plt = lazy.import_module('matplotlib.pyplot')

sns = lazy.import_module('seaborn.apionly')

Also, as it is now lazy modules lack the option to customize an error message, spitting out a generic

"attempted to use a functionality that requires module xx.yy, but it couldn't be loaded. Please install xx and retry."

For this reason I left the netcdf4 optional dependency untouched, because the error message is quite informative installation-wise. But something to add if this PR gets accepted.

Timing

Since I was developing this alongside @orbeckst's take on #1159 (PR #1404) I took the initiative to also make matplotlib and scipy lazy imports.
I just did some crude timing (100 repeats using UNIX's time) and found that on my machine lazy loading of scipy saves a grand total of 5ms when importing the MDAnalysis.analysis.distances submodule (on top of ~250ms taken to load MDAnalysis). This shows that point 1. above might not be that relevant.

Testing

Added tests for both lazy loads and missing dependencies (making good use of @richardjgowers's block_import), but still in nose format. Since this adds stuff to MDAnalysisTests/lib, which @utkbansal already took care of (#1413), it might be best to adapt to pytest directly before merging.

orbeckst and others added 4 commits June 17, 2017 17:55
- require scipy and matplotlib in setup.py (fixes #1361)
- scipy and matplotlib are imported at top in analysis
  - updated all modules
  - removed any code that guards against scipy or matplotlib import
  - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis
  - fixes #1159
- removed conditional skipping of tests when scipy or matplotlib are missing
- travis.yml
- docs, code style in analysis and analysis tests
This now addresses issues #577 (facultative imports), #1361 (cleaner
optional dependencies) and #1159 (optional dependencies in the analysis
module)
del msg
# Optional and/or lazily imported modules
from MDAnalysis.lib import lazy
scipy = lazy.import_module('scipy.sparse', level='base')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only load scipy.sparse and nothing else from scipy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be equivalent to
import scipy.sparse
in that it results on the 'scipy' name being added to the namespace.
On module access both scipy and scipy.sparse are fully imported.

The level='base' keyword tells the function to return a reference to the base module (scipy). From then on you use scipy.sparse.whatever, just like it had been imported with import.

The difference to the default import_module call (without level='base') is that in the default case the returned reference is to the submodule. In that case you use sparse = import_module('scipy.sparse'), as equivalent to from scipy import sparse.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fancy! The only thing that annoys me a little is that you can end up with a calculation failing at the end because something is missing and you were not warn ahead of time.



def _import_module(modname, caller_name):
acquire_lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a context manager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context manager already exists, but inside _imp. I'd rather keep it high-level and not use _imp, but I already have to anyway for Python 3 support...
So I guess the answer is yes :)

@mnmelo
Copy link
Member Author

mnmelo commented Jun 21, 2017

@jbarnoud, yes, that is a possibility. Right now I don't think any of the analyses commits to heavy calculations before using optional dependencies.
I imagine early loading of optionals could become a design guideline for future (heavy) analyses.

@mnmelo mnmelo changed the title Adds for support for lazy and optional importing of dependencies Adds support for lazy and optional importing of dependencies Jun 21, 2017
@jbarnoud
Copy link
Contributor

This introduces an odd behaviour for a python user. Maybe it should be documented somewhere; at least in the developer page of the wiki.

When should we merge this? It is going to introduce conflicts in a bunch of test files and in .travis.yml.

@mnmelo
Copy link
Member Author

mnmelo commented Jun 23, 2017

I opened the PR more as a request-for-comments. I think its best to bring it in only after @utkbansal's GSoC effort, no?

As to the impact to the end-user, I'm not so sure. This is internally hidden, and just as before some dependency errors/warnings only show up when executing analysis code, not at import-time.
What use-case do you foresee to be unexpected for users?

OTOH, I should definitely write up something for would-be contributors of code, as this would be a new guideline for using optional dependencies.

@mnmelo
Copy link
Member Author

mnmelo commented Jun 23, 2017

There's a potentially neat aspect:
because of how import caching works, if I lazy.import_module('scipy') anywhere, later calls to import scipy still get the hollow LazyModule.

This means that we can condense the calls to all those optional/lazy dependencies in a single file (say, lib/lazy.py) and let the rest of the files use regular import scipy calls.

However, a drawback is that functions cannot be preloaded: issuing from scipy.stats import gaussian_kde will immediately trigger the full loading of scipy.stats. Can be compensated for by always using scipy.stats.gaussian_kde, which @orbeckst prefers, but I'm not so sure about it.

@richardjgowers richardjgowers changed the title Adds support for lazy and optional importing of dependencies WIP: Adds support for lazy and optional importing of dependencies Jun 24, 2017
@orbeckst orbeckst added the close? Evaluate if issue/PR is stale and can be closed. label Sep 28, 2018
@RMeli RMeli deleted the issue-1159-lazy-optional-imports branch December 23, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
close? Evaluate if issue/PR is stale and can be closed. Component-Analysis Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants