This change is an RFC and WIP (please read the whole change message).
Add `MypyTypeInferenceProvider` as an alternative to
`TypeInferenceProvider`. The provider infers types using mypy as a
library. The only requirement is to have the latest mypy installed.
The inferred types are native mypy types: mypy's type system is well
designed, and using it directly avoids a conversion step and keeps the
implementation simple. For compatibility and extensibility reasons,
these types are stored in a separate field, `MypyType.mypy_type`.
Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]
s = set()
from enum import Enum
class E(Enum):
    f = "f"
e = E.f
```
Then, to experiment with mypy types, one can use code like this:
```python
import libcst as cst
from libcst.metadata import MypyTypeInferenceProvider
filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)
mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target
print(mypy_type[x_name_node])
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].fullname)
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].mypy_type.type.fullname)
# prints: builtins.list
print(mypy_type[x_name_node].mypy_type.args)
# prints: (builtins.int,)
print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
# prints: typing.MutableSequence
print(mypy_type[set_call_node])
# prints: builtins.set
print("issuperset" in mypy_type[set_call_node].mypy_type.names)
# prints: True
print(mypy_type[set_call_node.func])
# prints: typing.Type[builtins.set]
print(mypy_type[e_name_node].mypy_type.type.is_enum)
# prints: True
```
Why?
1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
installed. mypy is more popular than pyre. If an organization already
uses mypy (which is almost always the case), it may be difficult
to convince colleagues (including the security team) that "we need yet
another type checker".
2. Even though it is possible to run pyre without installing watchman,
this is not advertised. Installing watchman is not always possible,
either because of system requirements or because of security
requirements like "we install only packages from our favorite
GNU/Linux distribution".
3. Using `TypeInferenceProvider` requires running `pyre start` before
execution and `pyre stop` afterwards. This may be inconvenient,
especially when pyre was not used before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
For example, it is not easy to infer that some variable is an
enum instance. `MypyTypeInferenceProvider` makes this easy; see the
code above.
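To illustrate point 4, here is a minimal sketch of the kind of check that structured types allow. It assumes only what the example above shows: an inferred value has a `mypy_type` attribute whose `type.is_enum` flag is set for enum classes. The helper name `is_enum_instance` is hypothetical and not part of this change, and stand-in objects replace real inference results so that the snippet runs without mypy installed.

```python
from types import SimpleNamespace


def is_enum_instance(inferred) -> bool:
    """Return True if `inferred` looks like an instance of an Enum class.

    Hypothetical helper: it relies only on the attribute chain used in the
    example above (`.mypy_type.type.is_enum`) and treats anything that
    lacks those attributes as "not an enum".
    """
    mypy_type = getattr(inferred, "mypy_type", None)
    type_info = getattr(mypy_type, "type", None)
    return bool(getattr(type_info, "is_enum", False))


# Stand-ins for provider results (assumption: real results expose the
# same attributes as in the example above).
enum_like = SimpleNamespace(
    mypy_type=SimpleNamespace(type=SimpleNamespace(is_enum=True))
)
list_like = SimpleNamespace(
    mypy_type=SimpleNamespace(type=SimpleNamespace(is_enum=False))
)

print(is_enum_instance(enum_like))  # prints: True
print(is_enum_instance(list_like))  # prints: False
```

With string-only types, the same check would need to parse the type name and resolve its base classes separately.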
Drawbacks:
1. Speed. mypy is slower than pyre, so `MypyTypeInferenceProvider` is
   slower than `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST caching in mypy. It may be difficult, but it
      would speed up every project that uses this functionality.
   2. Implement caching of inferred types inside LibCST. As far as I
      know, LibCST implements no caching at all, and general caching
      is a prerequisite for caching inferred types, so the task is big.
   3. Implement a conversion from LibCST CST to mypy AST. I am not
      sure this is possible at all. Even if it is, the task is huge.
2. LibCST will contain two providers that do similar things. This can
   potentially lead to a situation where two type checkers must be
   installed to run all codemods from the library.
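As a rough illustration of the "inferred types caching" idea from drawback 1, the sketch below memoizes an inference callback by a hash of the source text. Everything in it is hypothetical: `ContentKeyedCache` and the `infer` callback are not LibCST or mypy APIs, and a real implementation would also have to invalidate entries when imported modules change, which this sketch ignores.

```python
import hashlib
from typing import Callable, Dict


class ContentKeyedCache:
    """Hypothetical cache: maps a hash of the source text to inferred types."""

    def __init__(self, infer: Callable[[str], Dict[str, str]]) -> None:
        self._infer = infer  # the expensive inference step (assumed)
        self._store: Dict[str, Dict[str, str]] = {}
        self.misses = 0

    def get(self, source: str) -> Dict[str, str]:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Run inference only for source text that has not been seen yet.
            self.misses += 1
            self._store[key] = self._infer(source)
        return self._store[key]


def fake_infer(source: str) -> Dict[str, str]:
    # Stand-in for a real type checker run; returns a fixed mapping.
    return {"x": "builtins.list[builtins.int]"}


cache = ContentKeyedCache(fake_infer)
first = cache.get("x = [42]\n")
second = cache.get("x = [42]\n")  # served from the cache
```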
Alternatives considered:
1. Put `MypyTypeInferenceProvider` in a separate library (say,
   LibCST-mypy, or `libcst-mypy` on PyPI). This would explicitly
   separate `MypyTypeInferenceProvider` from the rest of LibCST.
   Drawbacks:
   1. The need to maintain a separate library.
   2. Limited discoverability (people need to know that the library
      exists).
   3. Some codemods cannot be implemented easily without the
      library, for example an `if-elif-else` to `match` converter
      (it needs powerful type inference), so they could never be
      shipped with LibCST.
2. Implement a base class for inferred types that inherits from `str`
   (to keep compatibility with the existing codebase), together with
   a mechanism for dynamically selecting the type checker behind
   `TypeInferenceProvider` (mypy or pyre; the user can choose via an
   environment variable). Code inside LibCST that requires only
   shallow type information (so `str` is enough) can then run with
   either type checker. The remaining code (such as the
   `if-elif-else` to `match` converter) will still require mypy.
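Alternative 2 could look roughly like the sketch below. The class name `InferredType`, the `backend_type` attribute, the function `select_type_checker`, and the environment variable `LIBCST_TYPE_CHECKER` are all hypothetical names chosen for illustration; none of them exist in LibCST.

```python
import os
from typing import Any, Mapping, Optional


class InferredType(str):
    """Hypothetical inferred type that still behaves like a plain string."""

    backend_type: Any

    def __new__(cls, fullname: str, backend_type: Any = None) -> "InferredType":
        obj = super().__new__(cls, fullname)
        # Checker-specific object (for example a mypy type); None for pyre.
        obj.backend_type = backend_type
        return obj


def select_type_checker(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the type checker from a (hypothetical) environment variable."""
    env = os.environ if environ is None else environ
    name = env.get("LIBCST_TYPE_CHECKER", "pyre").lower()
    if name not in ("mypy", "pyre"):
        raise ValueError(f"unsupported type checker: {name!r}")
    return name


t = InferredType("builtins.int", backend_type=object())
# Existing code that compares types as strings keeps working.
print(t == "builtins.int")  # prints: True
print(select_type_checker({"LIBCST_TYPE_CHECKER": "mypy"}))  # prints: mypy
```

Code that needs only the string form ignores `backend_type`, while code that needs deep type information can check that it is present.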
Misc:
The code does not lint in my environment: for some reason, `pyre check`
cannot find the `mypy` library.