-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add replace_import_package Python option. #7470
Add replace_import_package Python option. #7470
Conversation
I believe the failing check requires the addition of labels that I'm not authorized to add. |
Is it possible to get some feedback on this PR, or to have the proper labels applied so that I can see the results of an initial CI run? Thanks in advance. |
@jamescooke, I saw you mentioned this PR yesterday. Would this approach help your issue? |
@stackedsax Just some background: the dsrf project uses protocol buffers and ideally would support Python 3. It suffers from the problem mentioned in the pull text above:
Which means that Python 3 support is broken with protocol buffer compilation as it is at the moment. I went for a quick work-around with
If there was a way to adjust the imports used in compiled Python files, as I believe this PR provides, then yes - I think that would help the dsrf project support Python 3 more easily. I don't think that doing a |
Thanks for the feedback, @jamescooke. Certainly, |
Fixing relative imports seems like a good thing to do. However, renaming packages doesn't strike me as a sound practice. The background in #1491 lays out a lot of the issues... what's relevant here is that aligning the proto package and Python module is partly a correctness guarantee, not just a style consideration. In other words, remapping the Python package path would actually be a correctness regression, since it would make it possible to import two different "foo.bar.Baz" messages (which may be completely unrelated). If package styling is truly necessary, that can be done in pure Python, without any changes to protoc: # my/package/__init__.py
# Generated from foo/bar/my_message.proto:
from foo.bar.my_message_pb2 import MyMessage This ensures that the correctness guarantees provided by Python's import mechanism (and depended upon by the runtime) are not lost. |
@dlj-NaN thanks for the feedback. And thanks for agreeing that the relative import fix seems useful. That was the original motivation for this patch and I at first considered a narrower option that just added a I'm not sure I completely follow your objection to this remapping mechanism, however. It's true that this option provides an override to the default generated import statements and that of course introduces the possibility of mistake or misuse. But that just means it should be used judiciously and carefully. I'm not sure exactly what you're referring to when you say "aligning the proto package and Python module" ensures correctness. That alignment doesn't happen automatically, as currently, protoc completely ignores (for sound reasons) the protobuf package statement when generating python code. So it's up to the user to arrange the directory structure of the protobuf message files to match the desired generated python package structure to get properly generated python import statements. Perhaps you're saying this filesystem based approach is the preferable? That's not unreasonable -- I'm sure lots of users are doing it. But that option may not be available (if the user doesn't control the protobuf message repo) and may not be desirable (if, for example, in a polyglot environment there's a better directory structure that's not dictated by the python output). |
@dlj-NaN there were a couple of questions in @tpboudreau's comment that I'd love to hear your thoughts on. Thanks! |
Any movement on this? It's currently a big pain point on the python side of our protobuf packages. Would be nice to have an equivalent of go's go_package option. Thank you! |
@bpeake-illuscio makes a great point. I currently achieve this goal in my Go language target through an option in the proto file to set the generated package import name in the Go source. Other languages have similar options, and I think being able to do the same to control the python import statement would align well and have a consistent place for the setting instead of externally from the command line. |
I currently work around this by post-processing the generated python files. |
37a3061
to
5f10c6c
Compare
For me, I think the proposed functionality would help me improve correctness. Let me describe my scenario.
Does this use case seem reasonable? |
The same issue occurs when you utilize grpc generation. For |
This comment has been minimized.
This comment has been minimized.
So, Now is 2022, the patch is still not merged. Oh No! |
Unfortunately, remapping at the Python package level isn't something we would be interested in pursuing. The existing path import semantics are directly mapped to the import structure of the generated Python code, and although I certainly sympathize with having to live with past choices in code structure, I don't think adding a knob is actually in the overall best interest. One thing that isn't particularly well-documented is that the import paths can be arbitrarily remapped and fine-tuned with the In other words, if the desired generated output is (notionally): import a.b.c_pb2 then the
and the protocol compiler step should stitch those paths together, say, The This tends to fall out nicely with (reasonable :-) ) language-level semantics, like On a side note, I would medium-strongly advise against post-processing the generated sources. We explicitly do not promise stability of how imports are spelled, so it might work today, but it's not guaranteed behavior and might block a future upgrade for you. Even in extreme cases, a forwarding-import-only is usually more principled than trying to fake out things like # your/target/package/__init__.py:
from actual.generated.path import your_proto_file_pb2
# etc. or: # your/target/package/your_proto_file_pb2.py
from actual.generated.path.your_proto_file_pb2 import * Unfortunately, I think I will have to close this PR. We certainly do appreciate the contribution, but on balance, it sounds to me like this particular problem is just surfacing an issue that probably should be resolved at the |
Overview
This patch proposes a new python_opt option,
replace_import_package
, for the Python code generator that provides a lightweight mechanism for replacing the package name ordinarily generated by protoc in Python import statements (the "generated" package name) with a user supplied package name (the "preferred" package name) in order to allow the imported modules to be found more easily at runtime.In the simplest case, given a message file (say, "MainMessage.proto") that imports another message file (say, "ImportedMessage.proto") from within the same directory, invoking the new option like:
protoc --proto_path /protobuf/files --python_out ./my_package --python_opt replace_import_package MainMessage.proto
will replace the currently generated statement
import ImportedMessage_pb2 as ...
(with nofrom
clause), with the preferred statementfrom . import ImportedMessage_pb2 as ...
(with afrom
clause referencing an explicit package).Issue Background
When generating Python code from input files that import other protobuf files, protoc will generate Python import statements that either contain no
from
clause (if the input file's import statement contains no relative path prefix) or contain afrom
clause listing a package name derived solely from the relative path used in the protobuf import statement (protoc always ignores the protobuf package statement when generating Python code). This behavior occasionally results in Python import statements that don't work as expected: for example, import statements with nofrom
clause are no longer interpreted as implicit relative imports in Python3.Current workarounds include:
Each has drawbacks. Extended discussion of issues and workarounds can be found here.
A Few Preliminaries
Note that the scope of the new option is limited. In particular:
If you do not supply the new
replace_import_package
option for Python compiles, there is no change in behavior for protoc.If you do not use the protobuf "import" statement in any input files, the new
replace_import_package
option (even if present) is inoperative; it only changes the form of the Python import statements produced by protoc when generating Python code from protobuf files that themselves contain import statements.If your protobuf files happen to be arranged in a directory hierarchy that precisely matches the desired Python package structure, then the new
replace_import_package
option is unnecessary; protoc already creates output files in the desired locations (relative to the directory named in--python_out
) and produces correct import statements in the generated Python code.The new
replace_import_package
option does not change the protoc dataflow; output files are still generated 1:1 with input files and the location of the generated files is still controlled by the--python_out
optionProposed Option Operation
--python_opt
command line argumentreplace_import_package=MAPPINGS
generated.package|preferred.package
generated.package
name:preferred.package
name:generated.package
in one of the mappings, that generated package name is fully replaced by thepreferred.package
name in that mappingfrom
clause) matches a mapping with an emptygenerated.package
, causing the insertion of afrom
clause containing thepreferred.package
name in that mappingreplace_import_package
flag appears more than once, the effective set of mappings will be the union of the mappings from all the mapping stringsgenerated.package
value appears in more than one mapping, the latest appearing mapping (the rightmost mapping in the rightmost mapping string) is operativeUsage Scenarios
Some example usage scenarios for the new option (with runnable code) can be found here.