Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make generated Python files mypy error-free #2325

Closed
shunichironomura opened this issue Sep 16, 2024 · 9 comments · Fixed by #3143
Closed

Make generated Python files mypy error-free #2325

shunichironomura opened this issue Sep 16, 2024 · 9 comments · Fixed by #3143
Milestone

Comments

@shunichironomura
Copy link

Description

When running mypy on marimo notebooks, mypy reports error: Name "__" already defined on line X [no-redef] errors because cells in marimo notebooks are all defined as functions with the same name: __. This makes it difficult to properly type-check marimo notebooks.

Suggested solution

There are at least three approaches to resolve this:

  1. Define cells as functions with unique names. This approach eradicates the root cause, but possibly has side effects causing other issues.
  2. Add # type: ignore[no-redef] comments to lines containing @app.cell to suppress errors. This approach has probably the fewest side effects but makes the files a bit noisy.
  3. Add a top-level #mypy: disable-error-code="no-redef" comment. This approach is probably the least noisy but possibly suppresses other no-redef errors in the file.

Alternative

No response

Additional context

As far as I know, there is no way to configure mypy to ignore a certain error in specific files (like per-file-ignores config in Ruff) via configuration files such as pyproject.toml. So it's impossible to suppress the no-redef errors in marimo notebooks without modifying the notebook files.

So my current workaround is to take approach 3 and manually adding the top-level #mypy: disable-error-code="no-redef" comment to every marimo notebook.

@dmadisetti
Copy link
Collaborator

Related is the idea of a marimo specific lint: #1543

Let us know if you come across any other type checking/ static analysis issues particular to mypy and marimo

@mscolnick
Copy link
Contributor

@shunichironomura - I dont think this is an alternative or solution, but you can rename cells (in the cell dropdown menu), which is the name given in the generated python file

@shunichironomura
Copy link
Author

shunichironomura commented Sep 17, 2024

Related is the idea of a marimo specific lint: #1543

Let us know if you come across any other type checking/ static analysis issues particular to mypy and marimo

Another mypy/marimo issue that I encountered is that mypy cannot infer the module imported in a different cell. Here's a minimal example:

import marimo

__generated_with = "0.8.15"
app = marimo.App(width="medium")


@app.cell
def __():
    import pathlib
    return pathlib,


@app.cell
def __(pathlib):
    def read_file(file: pathlib.Path) -> str:
        return file.read_text()
    return read_file,


if __name__ == "__main__":
    app.run()

Running mypy against this file gives:

notebooks/mypy_issue.py:13: error: Name "__" already defined on line 7  [no-redef]
notebooks/mypy_issue.py:15: error: Name "pathlib.Path" is not defined  [name-defined]
Found 2 errors in 1 file (checked 1 source file)

mypy doesn't report error if it's imported via from pathlib import Path instead, but in that case the type of file is inferred as Any:

import marimo

__generated_with = "0.8.15"
app = marimo.App(width="medium")


@app.cell
def __():
    from pathlib import Path
    return Path,


@app.cell
def __(Path):
    def read_file(file: Path) -> str:
        reveal_type(file) # Any
        return file.read_text()
    return read_file,


if __name__ == "__main__":
    app.run()

Still, there are cases where I want to import the module (e.g., numpy, networkx).

My current workaround is to modify the cell like this:

from typing import TYPE_CHEKCING

if TYPE_CHECKING:
    import pathlib as _pathlib

def read_file(file: _pathlib.Path) -> str:
    return file.read_text()

Maybe this is a known issue, I haven't searched the issues, but let me know if you want to open a new issue for this.

@shunichironomura
Copy link
Author

@shunichironomura - I dont think this is an alternative or solution, but you can rename cells (in the cell dropdown menu), which is the name given in the generated python file

Oh I didn't know about that, thank you. I agree that it cannot be the solution, but it's good to know.

@shunichironomura
Copy link
Author

Another option is to make a mypy plugin for marimo? I don't know much about what's possible with mypy plugins, but it may solve these issues without changing the generated Python files.

@patrick-kidger
Copy link

patrick-kidger commented Sep 18, 2024

Whatever is done here, do bear in mind that there are other static type-checkers beyond mypy; most notably pyright. (Which personally I much prefer, as it seems to have far fewer edge cases.)

If I had to give my +1 to something, it would be to give each cell a unique name. This would also avoid trigger spurious Ruff lints like N807.

@wasimsandhu
Copy link
Collaborator

Wanted to bump this issue. This issue is causing our CI/CD to fail (with both mpy and pyright). Renaming cells is a solution, but manual and cumbersome when you have many cells in a notebook.

Is there a reason cells can't simply be serialized in the file as cell_1, cell_2, etc.?

@nambrosini17
Copy link

Thank you for creating marimo and congratulations for the seeding round! The experience has been pleasant.
As one of the selling points of marimo is better git tracking, i think an important part of it is that it inserts itself seemlessly into git workflows. Many of python devs (target audience of marimo) already use mypy/other linter to track files.
So removing this friction will likely satisfy a large part of the target users.
I've read the 2 issues in which this behaviour is discussed and i think assigning names with a __ prefix should be enough to avoid naming collisions. In fact it seems you've already decided that when assigning __generated_with = "0.x.y" to notebook version so this new cell naming convention would be consistent with that.

@mscolnick mscolnick added this to the marimo 1.0.0 milestone Dec 6, 2024
@mscolnick
Copy link
Contributor

The simplest solution might be best. I'd propose we enumerate the cells:

@app.cell
def __1():
    pass

@app.cell
def __2():
    pass

Users can rename cells, but if they are prefixed with two underscores and a number, we will assume they were generated by marimo. Similar to @wasimsandhu's suggestion, but a bit easier to know if the name was user-generated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants