-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🍏 Fix imports #88
🍏 Fix imports #88
Conversation
import
with proto definition
#23
…with two underscores
…same name broke import
….org/dev/peps/pep-0008/#id34)
8667dc7
to
83e13aa
Compare
…tterproto.lib instead of using local forward references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍌 Nice work! This is a big milestone.
Considering functional change number 🍏 (the first one), is there any scenario that this would cause newly generated code to not be a drop in replacement for packages generated with the existing approach with respect to import statements?
I've included comments from a first pass review focusing on the simpler parts. More to follow 🍍
("foobar", "Foobar"), | ||
("FooBar", "FooBar"), | ||
("foo.bar", "FooBar"), | ||
("foo_bar", "FooBar"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always producing class names with conventional PascalCase style would be nice, but underscores are technically allowed in python class names so removing them like this creates the possibility for naming collisions unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yes it does create collissions. However, I do not see a practical way to avoid collissions and still convert all other cases to PascalCase, since its not guaranteed the that source strings are even snake_case.
For example:
message MyMessage {
}
should generate MyMessage
and
message my_message {
}
should generate MyMessage
as well.
If we generate for the last case My_Message
, we are not really converting anything to PascalCase, and we might as well use the original name.
The same problem goes for the fields.
I think we are generating snake_case there to make the code look nice and pythonic, like we are generating PascalCase for classes to make them nicely pythonic.
In my opinion, two messages or fields with names that are only distinguishable by casing style, is not really a valid use case, but maybe I'm biased..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument to be made here is that for users following best practices of message naming, they'll get nice names in python so there's no issue. However those with weird naming styles might not have a choice, because they might be consuming an API they don't control. So there's value in being as forgiving as possible without degrading the happy path.
("FOOBAR1", "foobar1"), | ||
("FOOBAR_1", "foobar1"), | ||
("FOO1BAR2", "foo1Bar2"), | ||
("foo__bar", "fooBar"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not sure this one is correct in terms of avoiding unnecessary collisions.
Maybe should be camel_case("foo__bar") => "foo_Bar"
and snake_case("foo_Bar") => "foo__bar"
if that works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other popular libraries like npm camelcase and apache CaseUtils also are not lossless encoders, but rather try to generate a string that matches whatever the casing format prescribes.
I'm inclined to not make this more complicated then necessary, as I don't think anyone in their right mind would make two fields in the same class called FOOBAR1
and FOOBAR_1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna push this point a little bit more. The question isn't whether this is the preferred or most elegant behaviour for a case mapping function, but rather how to generate classnames from message names.
Now having a message called foo_bar
and one called FOO__BAR
in the same scope might not be advisable, but some users might have reasons to want to, or might not have a choice about it (if they're just consumers of an API). So I think there's value in supporting as many additional use cases of this unfortunate nature as possible without compromising on the more sane use cases.
I see that this could ruin your elegant regex based solution, though there are a number of alternatives that are more flexible. Admittedly this is one weird cases that can be handled among a larger number that can't. Maybe the decider should be whether other tools that specifically deal with pb message names also choke on this case.
I'll let you decide what to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented a strict
parameter for all case functions, which when strict=True
will output strings in strict format (snake_case = single underscores, pascalCase+CamelCase = no underscores). When strict=False
, the delimiter count is preserved. E.g. camel_case('foo__bar') == 'Foo_Bar'
.
At the moment we cannot use this however, because of nested types.
Protobuf plugin will report Nested.Type as NestedType
, so while generating the class, its not known that this type was originally nested (and which part of the name is the parent class).
References to this type will be broken if we use strict=False
, since that would result in a type reference of Nested_Type
, which can then not be resolved.
I'm not sure how to solve this now, but we could make an attempt at a later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ Looks good. Just a lot of details to process.
assert name == "nested_child.Message" | ||
|
||
|
||
def test_import_deeply_nested_child_from_root(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find "..._from_root" a little confusing, like root implies sys.path...
I would think of this as importing from sibling, as this present module and the module that it's importing from share the same parent package.
I would think of what is described below as import_from_sibling instead as not importing at all, unless I've misunderstood?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your comments.
test_import_deeply_nested_child_from_root
This test is about importing a deeply nested package from the context of the top-level package (no package).
def test_import_deeply_nested_child_from_root():
imports = set()
name = get_ref_type(
package="", imports=imports, source_type="deeply.nested.child.Message"
)
assert imports == {"from .deeply.nested import child as deeply_nested_child"}
assert name == "deeply_nested_child.Message"
as you can see from the generated import, it imports from 2 levels lower, so they are not siblings.
import_from_sibling
when a type is referenced from a 'sibling' (same package), then indeed, an import statement is not necessary.
maybe i should rename import_*
to reference_*
, because its about referencing types, that sometimes lead to imports.
agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by rename, i also mean the functions in the library code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes done as a result of these comments:
- replaced method comments with functionally descriptive docstrings
- terminology:
reference
instead ofimport
- added asserts for empty import set cases
I'm not sure what to do about _from_root
. I understand it can be confused with sys.path.
The context from which the import is done, is the root of the module, of which all packages for the generated code are descendants. It's the root
from the perspective of the protobuf package hierarchy.
Would be great if we can find a term for that, that is not confusing.
imports = set() | ||
name = get_ref_type(package="package.child", imports=imports, source_type="Message") | ||
|
||
assert imports == {"from ... import Message"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case looks like it follows a different rule to the previous test case with respect to importing the renamed module? Is this avoidable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat yes and no.
What I tried to do wherever possible was alias the modules, but not the imported types.
Although aliases look ugly, at least the type names are preserved this way.
For ancestor imports, there is no identified package to import that wraps the Type, so you cannot give it an alias. e.g, this is not accepted by python:
from ... import . as ___root__
___root__.Message
- If you know for sure that the ancestor is not the root of the package, you can import 1 level higher up, and alias that. This is what i have done in
reference_ancestor
python-betterproto/betterproto/compile/importing.py
Lines 126 to 135 in e2d672a
if py_package: string_import = py_package[-1] # Add trailing __ to avoid name mangling (python.org/dev/peps/pep-0008/#id34) string_alias = f"_{'_' * distance_up}{string_import}__" string_from = f"..{'.' * distance_up}" imports.add(f"from {string_from} import {string_import} as {string_alias}") return f"{string_alias}.{py_type}" else: imports.add(f"from .{'.' * distance_up} import {py_type}") return py_type
- However this approach is unsafe when the target type is in the top level package, as its not guaranteed that the generated code resides in its own package, so we might end up ascending outside of the package
- So for the case you mentioned, this approach does not work safely.
- You could alias the type itself. e.g.
from ... import Message as ___Message__
, or
from ... import Message as ___root_Message_
2 would be a safe and valid solution that prevents name collisions when importing 'Message' from root into a subpackage that also contains 'Message'.
Unfortunately, it makes the name ugly, in that case.
In the future, I hope to improve the importing logic such that it keeps track of occupied names per package, and only creates aliases when necessary, although this will complicate things if in the future we want to support parallel compilation (e.g. multiple protoc processes compiling a single package hierarchy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think robustness should be prioritized over the prettiness of the names in general. The first goal of betterproto should be that it works. Then, that it generates beautiful code. So if you agree, I will also alias the root types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ... import Message as ___root_Message_
Until someone calls a package root 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point 😄
I've aliased them now as ___Message__
, I don't think adding root
will help a lot in the end
This reverts commit c88edfd
Yes, there is:
I've updated the docs to reflect this. It was not possible to keep the behavior for proto files without packages, because if you mix some package-less files with packaged files, it will be impossible to know from where to import the referred types. When everything is package based, we can import |
…me from an ancestor package
# Conflicts: # Pipfile # README.md # betterproto/__init__.py # betterproto/plugin.py # betterproto/tests/util.py
Fixes:
import
with proto definition #23 Import Bug - No arguments are generated for stub methods when using import with proto definitionFunctional changes:
my/package/__init__.py
instead ofmy/package.py
my/package.py
shadowingmy/package/foo.py
output
and import the generated code withfrom output.messages import PingMessage
import user.v1 as user_v1
/import post.v1 as post_v1
) to avoid naming conflictsMessage.SubType
vspackage.SomeType
). I expect this to not cause anyone trouble, but Import bug - Message from Capitalized package is mistaken for Nested Type #87 documents this behavior just in case.Technical improvements:
inputs/
) will automatically recognize messages calledTest
as the main subject, even if they are in packages or in a proto-file different from the test-case name.stringcasing
, which had a very buggy implementation forsnake_case
that we relied on. Replaced with custom implementation that covers our scenarios (let me know if some are not covered!).