Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can Marimo use unique default names for cells to serialize to a nicer Python file? #1954

Closed
danielhfrank opened this issue Aug 5, 2024 · 7 comments · Fixed by #3143
Closed

Comments

@danielhfrank
Copy link

Description

Right now, when I work on a Marimo notebook, I'm delighted that the serialized format is a plain python file. However, by default, each cell is serialized as a function with name "__". This makes for rather funny-looking python files, and if you have any sort of linter in your codebase, it will certainly not be happy.

I can see that if I create a new name for my cell, then it will be used in the serialized python file. I think it would be great if each cell had a unique name, so that the rendered python file inherits those names for the autogenerated functions.

Suggested solution

When we create a new cell, we should assign a unique name. I've not been able to look into the codebase to see if it would be practical to do something like cell_0, cell_1, etc, or if a more stateless identifier should be used. Since this is pretty transparent to the user, I would not be too concerned about readability of the name, but if it can be achieved, that would be nice!

I think this behavior is triggered by the default here:

name: str = "__",

I'm not quite sure though!

Alternative

No response

Additional context

I would be happy to contribute this change myself! I could use some pointers on whether the functionality I'm talking about is coming from where I linked above. If so, I can read up on contributing guidelines, and try to put it together!

@akshayka
Copy link
Contributor

akshayka commented Aug 5, 2024

The issue is that we want to introduce as few names into the file as possible. For example, in the future we might make it so that functions defined in a notebook that don't depend on any other variables are serialized directly into the notebook, instead of being wrapped in an app.cell decorator. If cells had unique names, especially human-readable ones, there would be a chance that these names would conflict with user-defined function names.

I understand the concern about linting, though. I am open to considering making the cell names unique to appease linters, though the names likely won't be very readable.

@dmadisetti
Copy link
Collaborator

I think there was the suggestion on naming based on execution graph position.

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated. Related, this code cleanliness check could be incorporated in a lint command: #1543

@akshayka
Copy link
Contributor

akshayka commented Aug 6, 2024

I think there was the suggestion on naming based on execution graph position.

Right. But this will add additional constraints to function names, thinking ahead to when we add top-level functions (@app.function). Perhaps that's okay, because most users likely wouldn't create functions named _cell_0, _cell_1, ...?

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated.

Yes definitely. I suppose @danielhfrank's concern is that this is too much work?

@dmadisetti
Copy link
Collaborator

marimo lint --suggest-names could pass cells into a GPT and run a prompt to rename cells?

I think renaming to _cell_x gives the false security that default cell can be reliably utilized in an import. Or maybe prefix _ are hidden in the app API?

@danielhfrank
Copy link
Author

Glad to see that this has sparked some conversation, I appreciate the attention!

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated.

Yes definitely. I suppose @danielhfrank's concern is that this is too much work?

Yes, indeed, I see that I can manually rename cells, but it's a lot of work - I think there could be a much smoother user experience if this were done automatically.

Having thought about it a bit, from my own perspective, I wouldn't really care if the function names were very human readable, or even if they reflected any sort of ordering. I think that a name like _cell_33c443 (some random hex identifier) would work well for my and others' use cases, and I expect would not interfere with any user code.

marimo lint --suggest-names could pass cells into a GPT and run a prompt to rename cells?

I think that a marimo lint command could help here if this didn't sound appealing as default behavior. Of course, my personal preference would be to change the defaults to unique names.

Thanks again for the consideration, and again, I would be happy to help contribute if you all think this is a constructive change!

@ggggggggg
Copy link
Contributor

ggggggggg commented Aug 7, 2024

Right. But this will add additional constraints to function names, thinking ahead to when we add top-level functions (@app.function). Perhaps that's okay, because most users likely wouldn't create functions named _cell_0, _cell_1, ...?

I think it's pretty reasonable to define some cell naming scheme and enforce that users are not allowed to name cells that conflict with that scheme. As long as it has a clear error message when someone does conflict with a clear action item to fix it, I don't think you'll get any pushback. Think of it like a keyword in a language, you just can't use them as variables, end of story.

@dmadisetti
Copy link
Collaborator

A thought on this- the new cell hashing could be leveraged to come up with names that are unique to the cell and it's position in the notebook's graph. This would also catch dupes, since identical cells would hash to the same cell name.

I think we could truncate the hash to be more friendly. Is a _ prefix seem like a reasonable cell name constraint?

This would be nice in a source control aspect too, because you could see the downrange effects that a single change might have

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants