🍏 Fix imports #88

boukeversteegh · 2020-06-10T22:11:54Z

Fixes:

🍎 Import Bug - No arguments are generated for stub methods when using import with proto definition #23
🍎 Import bug - two packages with the same name suffix should not cause naming conflict #25
🍎 Import bug - import child package from root #57
🍎 Import bug - import child package from package #58
🍎 Import bug - import parent package from child package #59
🍎 Import bug - import root package from child package #60
🍎 Import bug - import root package from root #61
🍎 ALL_CAPS message fields are parsed incorrectly. #11

Functional changes:

🍏 Python files are generated as pure packages my/package/__init__.py instead of my/package.py
- This prevents import name clashes such as my/package.py shadowing my/package/foo.py
- Output is entirely determined by the protobuf package structure, and not filenames.
🍏 All import statements are generated relatively
- You can therefore compile into locations other than the root of the Python project, e.g. output, and import the generated code with from output.messages import PingMessage
- Generated files now correctly resolve children, siblings, parents, cousins and root packages
🍏 All import statements are aliased (import user.v1 as user_v1 / import post.v1 as post_v1) to avoid naming conflicts
⚠️ To correctly resolve Nested Types vs Packaged Types (in a package), it is assumed that Style Guide is followed, and that package names are lowercase, while Types are Capitalized (Message.SubType vs package.SomeType). I expect this to not cause anyone trouble, but Import bug - Message from Capitalized package is mistaken for Nested Type #87 documents this behavior just in case.
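
To make the package-based layout concrete, here is a minimal sketch of how a protobuf package name could map to an output path. The helper name package_to_init_path is hypothetical and not part of betterproto; it only assumes what is described in this PR, namely that each package becomes a directory with an __init__.py and that files without a package end up in the root __init__.py.

```python
from pathlib import PurePosixPath


def package_to_init_path(package: str) -> str:
    """Map a protobuf package name to the path of its generated module.

    Hypothetical helper for illustration only, not betterproto's code.
    """
    # An empty package name means the proto file declares no package,
    # so its output is assumed to land in the root __init__.py.
    parts = package.split(".") if package else []
    return str(PurePosixPath(*parts, "__init__.py"))


print(package_to_init_path("my.package"))
print(package_to_init_path(""))
```

Because the path depends only on the package name, two proto files that declare the same package would share one output module in this sketch, regardless of their filenames.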

Technical improvements:

🥝 Standard Tests (inputs/) will automatically recognize messages called Test as the main subject, even if they are in packages or in a proto-file different from the test-case name.
🥝 Removed dependency on stringcase, which had a very buggy implementation for snake_case that we relied on. Replaced with a custom implementation that covers our scenarios (let me know if some are not covered!).


nat-n

🍌 Nice work! This is a big milestone.

Considering functional change number 🍏 (the first one), is there any scenario that this would cause newly generated code to not be a drop in replacement for packages generated with the existing approach with respect to import statements?

I've included comments from a first pass review focusing on the simpler parts. More to follow 🍍

betterproto/__init__.py

nat-n · 2020-06-12T20:25:00Z

betterproto/tests/test_casing.py

+        ("foobar", "Foobar"),
+        ("FooBar", "FooBar"),
+        ("foo.bar", "FooBar"),
+        ("foo_bar", "FooBar"),


Always producing class names with conventional PascalCase style would be nice, but underscores are technically allowed in python class names so removing them like this creates the possibility for naming collisions unnecessarily.

Hm, yes, it does create collisions. However, I do not see a practical way to avoid collisions and still convert all other cases to PascalCase, since it's not guaranteed that the source strings are even snake_case.

For example:

message MyMessage { }

should generate MyMessage

and

message my_message { }

should generate MyMessage as well.

If we generate for the last case My_Message, we are not really converting anything to PascalCase, and we might as well use the original name.

The same problem goes for the fields.
I think we are generating snake_case there to make the code look nice and pythonic, like we are generating PascalCase for classes to make them nicely pythonic.

In my opinion, two messages or fields with names that are only distinguishable by casing style, is not really a valid use case, but maybe I'm biased..

The argument to be made here is that for users following best practices of message naming, they'll get nice names in python so there's no issue. However those with weird naming styles might not have a choice, because they might be consuming an API they don't control. So there's value in being as forgiving as possible without degrading the happy path.
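
The collision discussed in this thread can be reproduced with a minimal sketch of a strict PascalCase conversion. The function pascal_case_sketch is hypothetical and deliberately simpler than the implementation in this PR (for instance, it makes no attempt to handle ALL_CAPS names); it only covers the four test cases quoted above.

```python
import re


def pascal_case_sketch(value: str) -> str:
    """Strictly convert a name to PascalCase (illustrative and lossy)."""
    # Split on anything that is not a letter or digit, dropping empty parts.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", value) if w]
    # Upper-case only the first character so existing inner capitals survive.
    return "".join(w[0].upper() + w[1:] for w in words)


for name in ("foobar", "FooBar", "foo.bar", "foo_bar"):
    print(name, "->", pascal_case_sketch(name))

# Two distinct message names collapse to the same class name.
assert pascal_case_sketch("my_message") == pascal_case_sketch("MyMessage")
```

Since the delimiters are discarded, the mapping cannot be inverted, which is exactly the trade-off being debated here.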

nat-n · 2020-06-12T20:35:17Z

betterproto/tests/test_casing.py

+        ("FOOBAR1", "foobar1"),
+        ("FOOBAR_1", "foobar1"),
+        ("FOO1BAR2", "foo1Bar2"),
+        ("foo__bar", "fooBar"),


Also not sure this one is correct in terms of avoiding unnecessary collisions.

Maybe should be camel_case("foo__bar") => "foo_Bar" and snake_case("foo_Bar") => "foo__bar" if that works?

Other popular libraries like npm camelcase and Apache CaseUtils also are not lossless encoders, but rather try to generate a string that matches whatever the casing format prescribes.

I'm inclined to not make this more complicated than necessary, as I don't think anyone in their right mind would make two fields in the same class called FOOBAR1 and FOOBAR_1.

I'm gonna push this point a little bit more. The question isn't whether this is the preferred or most elegant behaviour for a case mapping function, but rather how to generate classnames from message names.

Now having a message called foo_bar and one called FOO__BAR in the same scope might not be advisable, but some users might have reasons to want to, or might not have a choice about it (if they're just consumers of an API). So I think there's value in supporting as many additional use cases of this unfortunate nature as possible without compromising on the more sane use cases.

I see that this could ruin your elegant regex-based solution, though there are a number of alternatives that are more flexible. Admittedly this is one weird case that can be handled among a larger number that can't. Maybe the decider should be whether other tools that specifically deal with pb message names also choke on this case.

I'll let you decide what to do.

I've implemented a strict parameter for all case functions, which when strict=True will output strings in strict format (snake_case = single underscores, PascalCase and camelCase = no underscores). When strict=False, the delimiter count is preserved. E.g. camel_case('foo__bar') == 'Foo_Bar'.
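
As a rough illustration of what a strict flag can mean for one of the case functions, here is a sketch for snake_case only. The name snake_case_sketch and its regexes are assumptions made for this example and are not the code from this PR; the sketch shows only that strict output collapses repeated delimiters while non-strict output keeps their count.

```python
import re


def snake_case_sketch(value: str, strict: bool = True) -> str:
    """Convert a name to snake_case (illustrative, not betterproto's code)."""
    # Insert a delimiter at lower/digit -> upper boundaries: fooBar -> foo_Bar.
    value = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", value)
    # Treat any other symbol (for example a dot) as a delimiter too.
    value = re.sub(r"[^A-Za-z0-9_]", "_", value)
    if strict:
        # Strict format allows only single underscores between words.
        value = re.sub(r"_+", "_", value)
    return value.lower()


print(snake_case_sketch("foo__bar"))                # strict
print(snake_case_sketch("foo__bar", strict=False))  # delimiter count kept
```

With strict=False, foo_bar and foo__bar stay distinguishable, which is the property requested earlier in this thread.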

At the moment we cannot use this however, because of nested types.

The protobuf plugin will report Nested.Type as NestedType, so while generating the class, it's not known that this type was originally nested (and which part of the name is the parent class).

References to this type will be broken if we use strict=False, since that would result in a type reference of Nested_Type, which can then not be resolved.

I'm not sure how to solve this now, but we could make an attempt at a later time.

betterproto/tests/test_casing.py

betterproto/tests/util.py

betterproto/tests/test_inputs.py

nat-n

⭐️ Looks good. Just a lot of details to process.

betterproto/plugin.py

betterproto/compile/importing.py

nat-n · 2020-06-13T17:01:17Z

betterproto/tests/test_get_ref_type.py

+    assert name == "nested_child.Message"
+
+
+def test_import_deeply_nested_child_from_root():


I find "..._from_root" a little confusing, like root implies sys.path...

I would think of this as importing from sibling, as this present module and the module that it's importing from share the same parent package.

I would think of what is described below as import_from_sibling as not importing at all, unless I've misunderstood?

thanks for your comments.

test_import_deeply_nested_child_from_root

This test is about importing a deeply nested package from the context of the top-level package (no package).

    def test_import_deeply_nested_child_from_root():
        imports = set()
        name = get_ref_type(
            package="", imports=imports, source_type="deeply.nested.child.Message"
        )
        assert imports == {"from .deeply.nested import child as deeply_nested_child"}
        assert name == "deeply_nested_child.Message"

as you can see from the generated import, it imports from 2 levels lower, so they are not siblings.
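
For readers who want to see how such an import string can be derived, here is a minimal sketch of resolving a reference to a type in a descendant package. The function reference_descendant_sketch is hypothetical and is not the get_ref_type implementation; it assumes every dotted part before the last one is a package name (so nested types are not handled) and that single-level children are imported without an alias.

```python
from typing import Set


def reference_descendant_sketch(package: str, source_type: str, imports: Set[str]) -> str:
    """Return a type reference and record the import it needs (illustrative)."""
    current = package.split(".") if package else []
    # Assumption: the last part is the type, everything before it is a package.
    *target, type_name = source_type.split(".")
    if target[: len(current)] != current or len(target) <= len(current):
        raise ValueError("source_type is not in a descendant package")
    # Path from the current package down to the target package.
    relative = target[len(current):]
    *parents, module = relative
    if parents:
        # Alias the module with its full relative path to avoid name clashes.
        alias = "_".join(relative)
        imports.add(f"from .{'.'.join(parents)} import {module} as {alias}")
    else:
        alias = module
        imports.add(f"from . import {module}")
    return f"{alias}.{type_name}"


imports: Set[str] = set()
name = reference_descendant_sketch("", "deeply.nested.child.Message", imports)
print(name, imports)
```

For the inputs of the test above, this sketch yields the same import and reference as the assertions in that test.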

import_from_sibling

when a type is referenced from a 'sibling' (same package), then indeed, an import statement is not necessary.

maybe I should rename import_* to reference_*, because it's about referencing types, which sometimes leads to imports.

agree?

by rename, i also mean the functions in the library code

Changes done as a result of these comments:

replaced method comments with functionally descriptive docstrings

terminology: reference instead of import

added asserts for empty import set cases

I'm not sure what to do about _from_root. I understand it can be confused with sys.path.
The context from which the import is done, is the root of the module, of which all packages for the generated code are descendants. It's the root from the perspective of the protobuf package hierarchy.

Would be great if we can find a term for that, that is not confusing.

betterproto/tests/test_get_ref_type.py

nat-n · 2020-06-13T17:08:07Z

betterproto/tests/test_get_ref_type.py

+    imports = set()
+    name = get_ref_type(package="package.child", imports=imports, source_type="Message")
+
+    assert imports == {"from ... import Message"}


This case looks like it follows a different rule to the previous test case with respect to importing the renamed module? Is this avoidable?

Somewhat yes and no.

What I tried to do wherever possible was alias the modules, but not the imported types.
Although aliases look ugly, at least the type names are preserved this way.

For ancestor imports, there is no identified package to import that wraps the Type, so you cannot give it an alias. E.g., this is not accepted by Python:

    from ... import . as ___root__
    ___root__.Message

If you know for sure that the ancestor is not the root of the package, you can import 1 level higher up, and alias that. This is what i have done in reference_ancestor

python-betterproto/betterproto/compile/importing.py

Lines 126 to 135 in e2d672a

    if py_package:
        string_import = py_package[-1]
        # Add trailing __ to avoid name mangling (python.org/dev/peps/pep-0008/#id34)
        string_alias = f"_{'_' * distance_up}{string_import}__"
        string_from = f"..{'.' * distance_up}"
        imports.add(f"from {string_from} import {string_import} as {string_alias}")
        return f"{string_alias}.{py_type}"
    else:
        imports.add(f"from .{'.' * distance_up} import {py_type}")
        return py_type

However, this approach is unsafe when the target type is in the top-level package, as it's not guaranteed that the generated code resides in its own package, so we might end up ascending outside of the package.

So for the case you mentioned, this approach does not work safely.
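
To see what the two branches of the quoted excerpt produce, the lines can be wrapped in a small standalone function. The wrapper name reference_ancestor_sketch and its signature are assumptions for this example; distance_up is passed in directly, since the excerpt does not show how it is computed.

```python
from typing import List, Set


def reference_ancestor_sketch(
    py_package: List[str], py_type: str, distance_up: int, imports: Set[str]
) -> str:
    """Standalone wrapper around the excerpt above (illustrative only)."""
    if py_package:
        # The ancestor has a name, so import that package from its parent
        # and give it an alias.
        string_import = py_package[-1]
        string_alias = f"_{'_' * distance_up}{string_import}__"
        string_from = f"..{'.' * distance_up}"
        imports.add(f"from {string_from} import {string_import} as {string_alias}")
        return f"{string_alias}.{py_type}"
    else:
        # The ancestor is the root: import the type itself, without an alias.
        imports.add(f"from .{'.' * distance_up} import {py_type}")
        return py_type


named: Set[str] = set()
print(reference_ancestor_sketch(["foo"], "Message", 1, named), named)

root: Set[str] = set()
print(reference_ancestor_sketch([], "Message", 2, root), root)
```

The root branch reproduces the from ... import Message statement from the test under review, which is the unaliased import that can clash with a local Message.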

You could alias the type itself, e.g.
1. from ... import Message as ___Message__, or
2. from ... import Message as ___root_Message_

Option 2 would be a safe and valid solution that prevents name collisions when importing 'Message' from root into a subpackage that also contains 'Message'.

Unfortunately, it makes the name ugly, in that case.

In the future, I hope to improve the importing logic such that it keeps track of occupied names per package, and only creates aliases when necessary, although this will complicate things if in the future we want to support parallel compilation (e.g. multiple protoc processes compiling a single package hierarchy).

I think robustness should be prioritized over the prettiness of the names in general. The first goal of betterproto should be that it works. Then, that it generates beautiful code. So if you agree, I will also alias the root types.

from ... import Message as ___root_Message_

Until someone calls a package root 🙃

Good point 😄

I've aliased them now as ___Message__, I don't think adding root will help a lot in the end

betterproto/tests/test_get_ref_type.py


boukeversteegh · 2020-06-14T21:23:23Z

Considering functional change number 🍏 (the first one), is there any scenario that this would cause newly generated code to not be a drop in replacement for packages generated with the existing approach with respect to import statements?

Yes, there is:

proto files without package are now generated in the root of the module instead of the equivalent relative path of the proto file
- previously, protoc -I foo foo/bar/baz.proto would compile to bar/baz.py
- now it will compile to __init__.py, straight in the root
- for proto files without packages, it will be best to generate them into a subdirectory, e.g. --custom_out=lib

I've updated the docs to reflect this.

It was not possible to keep the behavior for proto files without packages, because if you mix some package-less files with packaged files, it will be impossible to know from where to import the referred types. When everything is package based, we can import foo.bar.Message easily, but if we need to import Message, and it could be in any odd submodule, depending on which proto file it was defined in, things get complicated.


boukeversteegh requested review from danielgtaylor, nat-n and cetanu June 10, 2020 22:11

boukeversteegh added 19 commits June 11, 2020 13:55

Create unit tests for importing

9fd1c05

Implement some import scenarios

e5e61c8

Implement importing unrelated package

57523a9

fix all broken imports

d7ba27d

Update tests to reflect new generated package structure

d8abb85

Support nested messages, fix casing. Support test-cases in packages.

f7c2fd1

Compile proto files based on package structure

fdf3b2e

Break up importing logic in methods

c00e2ae

Add test cases for cousin imports that break due to aliases starting …

7c8d47d

…with two underscores

Fixes issue where importing cousin where path has a package with the …

3105e95

…same name broke import

Simplify logic for generating package init files

8567892

Add import aliases to ancestor imports

76db2f1

Ensure uniquely generated import aliases are not name mangled (python…

1a95a79

….org/dev/peps/pep-0008/#id34)

Detect entry-point of tests automatically

fb54917

Add failing test for importing a message from package that looks like…

34c34bd

… a nested type danielgtaylor#87

Update readme with new output structure and fix example inconsistencies

65c1f36

Remove fixed test from xfail list danielgtaylor#11

5d2f3a2

Remove dependency on stringcase, apply black

3ca75da

Fix method name

83e13aa

boukeversteegh force-pushed the fix/imports branch from 8667dc7 to 83e13aa Compare June 11, 2020 11:55

boukeversteegh added 3 commits June 12, 2020 13:54

Support running plugin without installing betterproto

c88edfd

Fixes issue where generated Google Protobuf messages imported from be…

d9fa6d2

…tterproto.lib instead of using local forward references

Recompile Google Protobuf files

32c8e77

nat-n reviewed Jun 12, 2020


nat-n reviewed Jun 13, 2020


boukeversteegh added 6 commits June 14, 2020 16:51

Readability for generating init_files

2c360a5

Revert "Support running plugin without installing betterproto"

87f4b34

This reverts commit c88edfd

Shorten list selectors

63f5191

Fix terminology, improve docstrings and add missing asserts to tests

e2d672a

find_module docstring and search for init files instead of directories

fdbe020

Added missing tests for casing

52eea5c

boukeversteegh added the "has test" (Has a (xfail) test that verifies the bugfix or feature) and "bug" (Something isn't working) labels Jun 24, 2020

boukeversteegh added 6 commits July 1, 2020 09:39

Add parameter for non-strict cased output that preserves delimiter count

e3135ce

Avoid naming conflicts when importing multiple types with the same na…

81711d2

…me from an ancestor package

Merge remote-tracking branch 'daniel/master' into fix/imports

f4ebcb0

# Conflicts: # Pipfile # README.md # betterproto/__init__.py # betterproto/plugin.py # betterproto/tests/util.py

Remove stringcase dependency

0d9387a

Expose betterproto.ServiceStub

af71154

black

d21cd6e

nat-n approved these changes Jul 1, 2020


boukeversteegh merged commit cdddb2f into danielgtaylor:master Jul 4, 2020

boukeversteegh deleted the fix/imports branch July 4, 2020 09:29
