Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compile() with PyCF_ONLY_AST flag ignores the optimize arg #108113

Closed
iritkatriel opened this issue Aug 18, 2023 · 17 comments · Fixed by #108154
Closed

compile() with PyCF_ONLY_AST flag ignores the optimize arg #108113

iritkatriel opened this issue Aug 18, 2023 · 17 comments · Fixed by #108154
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@iritkatriel
Copy link
Member

iritkatriel commented Aug 18, 2023

compile() (or ast.parse(), which calls it) can return the ast for some source code, converted to a Python object version of the AST. But this AST is not optimized (with _PyAST_Optimize).

Static Python re-implemented _PyAST_Optimize for this reason. If we expose a way for them to get an optimized AST, they won't have to.

For performance, it would be better not to call another API function on a Python AST, but to add an API option that runs _PyAST_Optimize before converting the AST to Python.

Linked PRs

@iritkatriel
Copy link
Member Author

iritkatriel commented Aug 18, 2023

I think we can use the optimize parameter for this. When the ast.PyCF_ONLY_AST flag is set, the optimize flag is not currently used at all (this is arguably a bug because the AST should replace __debug__ by True or False).

We can decide that when ast.PyCF_ONLY_AST and optimize >= 2 then we return an optimized AST.

@iritkatriel
Copy link
Member Author

Currently:

>>> ast.dump(compile("x = __debug__", "f.py", "exec", flags = ast.PyCF_ONLY_AST, optimize=2))
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Name(id='__debug__', ctx=Load()))], type_ignores=[])"
>>> 

With the suggested change:

>>> ast.dump(compile("x = __debug__", "f.py", "exec", flags = ast.PyCF_ONLY_AST, optimize=2))
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Constant(value=False))], type_ignores=[])"

@carljm
Copy link
Member

carljm commented Aug 18, 2023

Currently the design is that the optimize level is taken by _PyAST_Optimize as an argument, and then used internally by the AST optimizer to decide what and how to optimize.

So rather than deciding to call _PyAST_Optimize only if optimize >= 2, would it be more consistent to just say that _PyAST_Optimize should be called anytime we build an AST, and continue to let it decide internally what to do based on the optimize level?

Presumably for optimize = 0 at least, that should result in no change in behavior.

@iritkatriel
Copy link
Member Author

Another thing to decide is whether we want this to work when the input is already an AST. Currently if ast.PyCF_ONLY_AST then the AST is just returned (a new reference to it). I think we should make the new API optimize the AST (convert it to a C AST, optimize, and then convert back to Python).

@carljm
Copy link
Member

carljm commented Aug 18, 2023

Hmm, my leaning would be rather to stick with the current behavior (return input unchanged) if the input is already an AST and you request an AST. IMO it's too subtle for compile to implicitly try to re-optimize a given AST. If we want a way to take a constructed AST and run the optimizer over it, I would provide that as dedicated API: ast.optimize(...)?

@iritkatriel
Copy link
Member Author

So rather than deciding to call _PyAST_Optimize only if optimize >= 2, would it be more consistent to just say that _PyAST_Optimize should be called anytime we build an AST, and continue to let it decide internally what to do based on the optimize level?

Presumably for optimize = 0 at least, that should result in no change in behavior.

My concern is that in normal compilation we would end up calling _PyAST_Optimize twice, because it's called in compile.c when we start the compilation. We could try to pull it out and make _PyAST_Compile expect an optimized AST, but I'm not sure what that might break.

@iritkatriel
Copy link
Member Author

Note that _PyAST_Compile takes its own version of the flags.

@carljm
Copy link
Member

carljm commented Aug 18, 2023

My concern is that in normal compilation we would end up calling _PyAST_Optimize twice, because it's called in compile.c when we start the compilation. We could try to pull it out and make _PyAST_Compile expect an optimized AST, but I'm not sure what that might break.

Isn't this a problem regardless, for optimize = 2?

I think the main back-compat issue would be if there are C extensions currently calling _PyAST_Compile (with an AST they have in some way constructed themselves) and expecting AST optimization to occur. I agree this is possible. One way to address it could be to add a new compiler flag that says "AST already optimized" and skip _PyAST_Optimize in _PyAST_Compile iff this flag is set?

Or another (IMO less clean, but perhaps also less invasive) option would be to call _PyAST_Optimize in builtin_compile iff PyCF_ONLY_AST is set. Then we'd still optimize only once in the case of a full compile?

@iritkatriel
Copy link
Member Author

Or another (IMO less clean, but perhaps also less invasive) option would be to call _PyAST_Optimize in builtin_compile iff PyCF_ONLY_AST is set. Then we'd still optimize only once in the case of a full compile?

This is what I was thinking we'd do.

@carljm
Copy link
Member

carljm commented Aug 18, 2023

I think that's reasonable, but in that case I see no reason to also limit it to optimize >= 2 -- I think we can safely just always call it and let _PyAST_Optimize make use of the optimize level?

@iritkatriel iritkatriel changed the title Add a way to get an optimized AST compile() with PyCF_ONLY_AST flag ignores the optimize arg Aug 18, 2023
@iritkatriel iritkatriel added the type-bug An unexpected behavior, bug, or error label Aug 18, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Aug 18, 2023
@iritkatriel
Copy link
Member Author

As discussed offline, the AST optimization performs const folding regardless of the value of optimize, so calling it always it too disabling. We'll go with a new flag ast.PyCF_OPTIMIZED_AST.

@corona10
Copy link
Member

corona10 commented Aug 20, 2023

Maybe this is another topic with this issue,
but I feel the need for a flag that can be used for not applying the CFG optimization phase for debugging purposes.
(To observe the origin bytecodes and how they changed)

Do you think that it's worth adding it too?
If so, I would like to introduce a new mode of compile flag PyCF_DISABLE_CFG_OPT(I am not sure where the flag should exist, AST looks not proper, but no proper places exist), and then if the PyCF_DISABLE_CFG_OPT flag is passed from builtin.compile then optimize_and_assemble_code_unit will not apply the related optimization process.

WDYT?

There might be another way to deal with this issue too.

@iritkatriel
Copy link
Member Author

There is another way. I refactored the compiler to separate the codegen/opt/assemble stages, so we can pick and choose which ones to apply (not through compile but a new function). We already have unit tests that exercise the different stages.

See #104240.

@corona10
Copy link
Member

corona10 commented Aug 20, 2023

Ah for debugging purposes, it will be enough.

@carljm
Copy link
Member

carljm commented Aug 21, 2023

Ironically the solution we chose does not actually fix the issue as described in the issue title :) But I think it's OK, we now clearly present an alternative option that does respect the optimize flag.

@iritkatriel
Copy link
Member Author

I think we should make ast.parse() optimize also when the input is an AST. At the moment there is no way to apply AST optimization to an AST that you created, so there is a gap. I'd expect these to give the same result:

>>> ast.dump(ast.parse("print(1+2)", optimize=2))
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Constant(value=3)], keywords=[]))], type_ignores=[])"


>>> ast.dump(ast.parse(ast.parse("print(1+2)"), optimize=2))
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2))], keywords=[]))], type_ignores=[])"

@carljm
Copy link
Member

carljm commented Aug 21, 2023

I think we should make ast.parse() optimize also when the input is an AST. At the moment there is no way to apply AST optimization to an AST that you created, so there is a gap. I'd expect these to give the same result:

I agree. When we were discussing respecting optimize implicitly and always with PyCF_ONLY_AST, I did not think we should do this, but now that you can explicitly ask for compile to either optimize your AST or not, if you ask for it I think we should do it even on an AST input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants