Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PEP561 #4693

Merged
merged 51 commits into from
Apr 9, 2018
Merged

Implement PEP561 #4693

merged 51 commits into from
Apr 9, 2018

Conversation

ethanhs
Copy link
Collaborator

@ethanhs ethanhs commented Mar 7, 2018

This is an implementation of PEP 561. I am picking up from #4403 because I'd rather not rebase over 100 commits (and some of the changes shouldn't go in!). Third times a charm I hope :)

Tasks:

  • Test PEP 561 implementation
  • Check PEP 561 conformance to resolution order
  • support running with alternate Python executable --python-executable
  • document PEP 561 features
  • document --python-executable flag
  • test --python-executable flag
  • test --no-site-packages flag
  • document --no-site-packages flag
  • --python-version flag sets --python-executable and visa-versa if possible

This branch likely works as intended and is feature complete, but it is possible that I've overlooked something/made a mistake.

Fixes #2625, #1190, #965.

@ethanhs ethanhs mentioned this pull request Mar 7, 2018
8 tasks
mypy/build.py Outdated
@@ -804,7 +806,7 @@ def remove_cwd_prefix_from_path(p: str) -> str:


# Cache find_module: (id, lib_path) -> result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating

mypy/build.py Outdated
"""Return the path of the module source file, or None if not found."""
lib_path = tuple(lib_path_arg)

def find() -> Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this refactor in a separate PR too?

@ethanhs
Copy link
Collaborator Author

ethanhs commented Mar 7, 2018

I just reverted some of the more significant refactoring started in the last PR so that this is easier to review. Most of it will be refactored later anyway.

mypy/main.py Outdated
options.python_version = _python_version_from_executable(
special_opts.python_executable)
options.python_executable = special_opts.python_executable
return options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps misleading that this modifies in place and returns the modified value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that the comment at the beginning might be misleading? (That could be, I'm just not sure what you mean by this comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect either

options.python_version, options.python_executable = infer_python_version_and_executable(special_opts)

or

infer_python_version_and_executable(options, special_opts)

But not

options = infer_python_version_and_executable(options, special_opts)

(think how list.sort(), dict.update(), etc, all modify in place and do not return the new result)

mypy/build.py Outdated
path = os.path.join(pkg_dir, dir_chain)
third_party_dirs.append(path)

candidate_base_dirs = tuple(third_party_dirs + find_module_dir_cache[dir_chain, lib_path])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the tuple conversion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is just due to it being in my previous commits. I will get rid of it when I probably end up rebasing this on #4623.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty close to done to me - looks like your patch got a lot cleaner after adding the refactor and removing it again :). So maybe it will get in before #4623 after all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. It is probably easier for me to rebase since my implementation is just one commit to rollback and apply, but I would prefer to get this PR in first. :)

mypy/build.py Outdated

if python_executable is not None:
site_packages_dirs = get_site_packages_dirs(python_executable)
if not site_packages_dirs:
Copy link
Contributor

@eric-wieser eric-wieser Mar 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be site_packages_dirs is not None?

In fact, is there any point in checking this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just realized it is dead code.


def test_executable_inference(self) -> None:
"""Test the --python-executable flag with --python-version"""
sys_ver_str = '.'.join(map(str, sys.version_info[:2]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: possibly more readable as '{v.major}.{v.minor}'.format(sys.version_info)

with pytest.raises(PythonExecutableInferenceError) as e:
options = infer_python_version_and_executable(options, special_opts)
assert str(e.value) == 'Python version (2, 10) did not match executable {}, got' \
' version {}.'.format(sys.executable, str(sys.version_info[:2]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: No need for the str here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed needed. type(e.value) is mypy.main.PythonExecutableInferenceError.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the second str on sys.version_info

mypy/build.py Outdated
# the python executable is likely in a virtual environment, thus lacking
# needed site methods
output = call_python(python_executable, VIRTUALENV_SITE_PACKAGES)
return [line for line in output.splitlines() if os.path.isdir(line)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check the directories exist in this path, but not in the python_executable == sys.executable path?

mypy/build.py Outdated
if key not in find_module_cache:
components = id.split('.')
dir_chain = os.sep.join(components[:-1]) # e.g., 'foo/bar'
site_packages_dirs = get_site_packages_dirs(python_executable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to put these three lines inside find, which makes the flow a little clearer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. They originally were broken out due to the refactor, but I agree putting them in find is ideal right now.

mypy/build.py Outdated
dirs = []
# TODO (ethanhs): refactor to use lru_cache on each of these searches
dirs = find_module_dir_cache.get((dir_chain, lib_path), [])
if not dirs:
Copy link
Contributor

@eric-wieser eric-wieser Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent to the previous code. Your new code ignores the cache if if says that dirs == [], which seems undesirable. I'd be inclined to leave it untouched

mypy/build.py Outdated
# If we're looking for a module like 'foo.bar.baz', then candidate_base_dirs now
# contains just the subdirectories 'foo/bar' that actually exist under the
# elements of lib_path. This is probably much shorter than lib_path itself.
# Now just look for 'baz.pyi', 'baz/__init__.py', etc., inside those directories.
seplast = os.sep + components[-1] # so e.g. '/baz'
seplast = os.sep + components[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this comment inaccurate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was at one revision but is no longer, so I should add it back.

mypy/build.py Outdated
def find_module(id: str, lib_path_arg: Iterable[str]) -> Optional[str]:
USER_SITE_PACKAGES = \
'from __future__ import print_function; import site; print(site.getusersitepackages());' \
'print(*site.getsitepackages(), sep="\\n")'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails in the corner-case where the paths contain newlines(!)

Better would be to do

print(repr(site.getsitepackages() + [site.getusersitepackages()]))

and then read it with ast.literal_eval

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also worry that you might have a different between the stdout encoding and the default .decode() decoding, which using repr/ast.literal_eval or maybe even pickle would resolve.

Are site package paths unicode, bytes, or either?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be str, for each meaning in 2 and 3, so your recommendation is well taken.

@gvanrossum
Copy link
Member

gvanrossum commented Mar 12, 2018 via email

@ethanhs
Copy link
Collaborator Author

ethanhs commented Mar 14, 2018

Okay now that #4623 is merged I will revert the last few commits I have made and hand-reapply them.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another batch of review comments. I'll continue tomorrow.


Generally, you do not need to do anything to use installed packages for the
Python executable used to run mypy. They should be automatically picked up by
mypy and used for type checking.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works if the packages ship with type information, and currently no (or hardly any) packages do it. So I'd like to add a qualifier and explicitly say that this does not work for most packages yet.

Making PEP 561 compatible packages
**********************************

Packages that must be used at runtime and supply type information via type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this can be a bit confusing still, since "used at runtime" is too vague. What about instead saying that this section is relevant if you are publishing your own packages on PyPI, or if you want mypy to type check installed packages? Typically user code is type checked by directly pointing to the source code in your repository instead of first installing the package. As written right now, some people may get the wrong impression that they are always supposed to install their package before running mypy.

Maybe give examples of some use cases where this could be useful? Examples:

  • You publish your package on PyPI (or a local package repository) and want users to be able to type check code that uses your package.
  • You want to type check a codebase spread over multiple repositories or directories, and you use setup.py script(s) to install your packages. Alternatively, you could also give mypy multiple files/directories to type check as command line arguments.

mypy/build.py Outdated
@@ -33,6 +36,7 @@
if MYPY:
from typing import Deque

from . import sitepkgs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: We don't use relative imports in the mypy codebase.

mypy/build.py Outdated
@@ -809,6 +814,21 @@ def remove_cwd_prefix_from_path(fscache: FileSystemCache, p: str) -> str:
return p


@functools.lru_cache(maxsize=None)
def _get_site_packages_dirs(python_executable: Optional[str]) -> List[str]:
"""Find package directories for given python."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention in the docstring that this involves running Python to determine where site packages live. Document why we are using lru_cache (starting up a Python interpreter is pretty slow).

@@ -822,20 +842,30 @@ class FindModuleCache:

def __init__(self, fscache: Optional[FileSystemMetaCache] = None) -> None:
self.fscache = fscache or FileSystemMetaCache()
# Cache find_module: (id, lib_path) -> result.
self.results = {} # type: Dict[Tuple[str, Tuple[str, ...]], Optional[str]]
self.find_lib_path_dirs = functools.lru_cache(maxsize=None)(self._find_lib_path_dirs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lru_cache uses a dictionary internally so I suspect that using lru_cache doesn't perform better. I'd prefer to not use it unless there is a clear benefit since it's not type safe (argument types can't be checked).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that the cache-correctness of lru_cache is more valuable than the type-checking provided by mypy

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that we've spent more effort arguing about what to do that would have been the relative benefits of either approach :-)

We have an informal style guideline that we avoid idioms that can't be type checked unless there is particularly good reason, and here the benefit we gain at the cost of losing some typing precision is minor.

But I won't argue about this any more.

mypy/main.py Outdated
if sys.version_info[:2] == python_version:
return sys.executable
str_ver = '.'.join(map(str, python_version))
print(str_ver)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a debug print?


def _python_version_from_executable(python_executable: str) -> Tuple[int, int]:
try:
check = subprocess.check_output([python_executable, '-c',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running multiple Python subprocess invocations can be pretty slow -- for example, I have a bunch of packages installed that seem to result in a simple python -c 'print(1)' run taking about 100ms. We seem to be running the interpreter multiple times: 1) for finding the executable 2) for finding the Python version and 3) for finding the search directories. It seems that all of these could be combined into a single call that prints all of that information, potentially saving hundreds of milliseconds per invocation. We could also cache the result of that call, just in case.

Copy link
Collaborator Author

@ethanhs ethanhs Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only will at most run it twice. (3) is always run, but only one of (1) or (2) is needed to be run (to infer the other value). Though perhaps combining them is for the best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah then this isn't as big of a problem as I first thought. Still, many of my incremental mypy runs only take under 1s, and a potential 20% performance impact is significant. I'll do some measurements -- maybe we can leave this optimization out for now.

Copy link
Collaborator Author

@ethanhs ethanhs Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave a TODO. (Done)

mypy/main.py Outdated
help='use Python x.y')
help='use Python x.y', dest='special-opts:python_version')
parser.add_argument('--python-executable', action='store', metavar='EXECUTABLE',
help="Python executable which will be used in typechecking.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text is a bit vague. What about something like "Python executable used for finding PEP 561 compliant installed packages and stubs"?

Remove dot from the end of the help text.

mypy/main.py Outdated
dest='special-opts:python_executable')
parser.add_argument('--no-site-packages', action='store_true',
dest='special-opts:no_executable',
help="Do not search for installed PEP 561 compliant packages.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove dot from the end of the help text.

mypy/main.py Outdated
parser.add_argument('--platform', action='store', metavar='PLATFORM',
help="typecheck special-cased code for the given OS platform "
"(defaults to sys.platform).")
"(defaults to sys.platform).")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove dot from the end of the help text.

@ethanhs
Copy link
Collaborator Author

ethanhs commented Apr 5, 2018

@JukkaL I've made all of the changes you requested save changing lru_caches to dicts, since there seems to be some disagreement over which to use. I also wanted to hear more of your thoughts on combining subprocess calls.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my final batch of comments. If you are short on time now, some of the suggestions can be left out and implemented in another PR so that we can merge this sooner. Let me know what you prefer. If you are able to update the PR by Monday we may just have enough time to include this in the next release, but even if that doesn't work out the next release after that won't be many weeks away.

Thanks for persevering with this -- this is going to make a big difference!

@@ -822,20 +842,30 @@ class FindModuleCache:

def __init__(self, fscache: Optional[FileSystemMetaCache] = None) -> None:
self.fscache = fscache or FileSystemMetaCache()
# Cache find_module: (id, lib_path) -> result.
self.results = {} # type: Dict[Tuple[str, Tuple[str, ...]], Optional[str]]
self.find_lib_path_dirs = functools.lru_cache(maxsize=None)(self._find_lib_path_dirs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that we've spent more effort arguing about what to do that would have been the relative benefits of either approach :-)

We have an informal style guideline that we avoid idioms that can't be type checked unless there is particularly good reason, and here the benefit we gain at the cost of losing some typing precision is minor.

But I won't argue about this any more.


def _python_version_from_executable(python_executable: str) -> Tuple[int, int]:
try:
check = subprocess.check_output([python_executable, '-c',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah then this isn't as big of a problem as I first thought. Still, many of my incremental mypy runs only take under 1s, and a potential 20% performance impact is significant. I'll do some measurements -- maybe we can leave this optimization out for now.


def ex(a):
# type: (Iterable[str]) -> Tuple[str, ...]
"""Example typed package. This intentionally has an error."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious what the error is. Maybe describe it or make it more obvious?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, this may in fact be an out of date docstring!

class TestPEP561(TestCase):
@contextmanager
def install_package(self, pkg: str,
python_executable: str = sys.executable) -> Generator[None, None, None]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: Iterator[None] would be sufficient for the return type and a little simpler.

working_dir = os.path.join(package_path, pkg)
install_cmd = [python_executable, '-m', 'pip', 'install', '.']
# if we aren't in a virtualenv, install in the
# user package directory so we don't need sudo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like sys.real_prefix is monkey patched by virtualenv. May be worth mentioning this in a comment, as it's doc mentioned in stdlib docs, I think.

mypy/build.py Outdated
stub_name = components[0] + '-stubs'
typed_file = os.path.join(pkg_dir, components[0], 'py.typed')
stub_dir = os.path.join(pkg_dir, stub_name)
if os.path.isdir(stub_dir):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fscache instead?

mypy/build.py Outdated
if os.path.isdir(stub_dir):
stub_components = [stub_name] + components[1:]
path = os.path.join(pkg_dir, *stub_components[:-1])
if os.path.isdir(path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fscache?

mypy/build.py Outdated
path = os.path.join(pkg_dir, *stub_components[:-1])
if os.path.isdir(path):
third_party_dirs.append(path)
elif os.path.isfile(typed_file):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fscache?

@@ -2129,8 +2167,6 @@ def dispatch(sources: List[BuildSource], manager: BuildManager) -> Graph:
stubs_found=sum(g.path is not None and g.path.endswith('.pyi')
for g in graph.values()),
graph_load_time=(t1 - t0),
fm_cache_size=len(manager.find_module_cache.results),
fm_dir_cache_size=len(manager.find_module_cache.dirs),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you refactor lru_cache uses away, these could be restored. Or maybe these stats could be retrieved from the lru caches?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I should add these back based on the lru_cache.cache_info() currsize.

from mypy.util import read_with_python_encoding


class FileSystemMetaCache:
def __init__(self) -> None:
self.flush()
self.stat = functools.lru_cache(maxsize=None)(self._stat)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the current status of the memory leak investigation? Do you believe it may still be happening?

@ethanhs
Copy link
Collaborator Author

ethanhs commented Apr 6, 2018

If you are short on time now, some of the suggestions can be left out and implemented in another PR so that we can merge this sooner.

I left the lru_caches, as that is still somewhat a point of contention (also the memory usage issues disappeared when I put them in, so I'm a little hesitant to go back). I also left the current Python version/executable inference in as-is, which should be addressed at a later date once there is a better understanding of their performance impact.

If you are able to update the PR by Monday we may just have enough time to include this in the next release

Great! Now that I have updated it, and it will be merged by Monday, perhaps @dmoisset and or @carljm would be interested in testing this? (I realize it is a bad time, as it is already kinda late on a Friday...)

Thanks for persevering with this -- this is going to make a big difference!

Happy to do it, and I want to thank you, Eric, and Jelle for the reviews! They were quite helpful.

@dmoisset
Copy link
Contributor

dmoisset commented Apr 7, 2018 via email

@ethanhs
Copy link
Collaborator Author

ethanhs commented Apr 8, 2018

@dmoisset FWIW the release itself is Friday, so you do have some time.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to merge now! A few follow-up issues remain:

  • Revert changes where existing caching was replaced with lru_cache, as discussed in earlier comments. If this causes memory leaks we should understand why the leaks are happening instead of (or in addition to) only finding a workaround.
  • Only invoke one Python subprocess per invocation.
  • Installed non-stubs can take precedence even if a stubs package exists (if they are installed in differnet places). I'll create an issue about this.
  • Restore caching for isfile_case.

@@ -88,8 +84,6 @@ def isfile_case(self, path: str) -> bool:
TODO: We should maybe check the case for some directory components also,
to avoid permitting wrongly-cased *packages*.
"""
if path in self.isfile_case_cache:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests may not be indicative of real-world performance. More importantly, the cache is not only used for improving performance -- the goal is to provide more transactional semantics for file operations, and omitting caching for some file operations breaks that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants