Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🍏 Fix imports #88

Merged
merged 34 commits into from
Jul 4, 2020
Merged

Conversation

boukeversteegh
Copy link
Collaborator

@boukeversteegh boukeversteegh commented Jun 10, 2020

Fixes:


Functional changes:

  • 🍏 Python files are generated as pure packages my/package/__init__.py instead of my/package.py
    • This prevents import name clashes such as my/package.py shadowing my/package/foo.py
    • Output is entirely determined by the protobuf package structure, and not filenames.
  • 🍏 All imports statements are generated relatively
    • You can therefor compile into other locations than the root of the python project, e.g. output and import the generated code with from output.messages import PingMessage
    • Generated files now correctly resolve children, sibilings, parents, cousins and root packages
  • 🍏 All import statements are aliased (import user.v1 as user_v1 / import post.v1 as post_v1 ) to avoid naming conflicts
  • ⚠️ To correctly resolve Nested Types vs Packaged Types (in a package), it is assumed that Style Guide is followed, and that package names are lowercase, while Types are Capitalized (Message.SubType vs package.SomeType). I expect this to not cause anyone trouble, but Import bug - Message from Capitalized package is mistaken for Nested Type #87 documents this behavior just in case.

Technical improvements:

  • 🥝 Standard Tests (inputs/) will automatically recognize messages called Test as the main subject, even if they are in packages or in a proto-file different from the test-case name.
  • 🥝 Removed dependency on stringcasing, which had a very buggy implementation for snake_case that we relied on. Replaced with custom implementation that covers our scenarios (let me know if some are not covered!).

Copy link
Collaborator

@nat-n nat-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍌 Nice work! This is a big milestone.

Considering functional change number 🍏 (the first one), is there any scenario that this would cause newly generated code to not be a drop in replacement for packages generated with the existing approach with respect to import statements?

I've included comments from a first pass review focusing on the simpler parts. More to follow 🍍

betterproto/__init__.py Outdated Show resolved Hide resolved
("foobar", "Foobar"),
("FooBar", "FooBar"),
("foo.bar", "FooBar"),
("foo_bar", "FooBar"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always producing class names with conventional PascalCase style would be nice, but underscores are technically allowed in python class names so removing them like this creates the possibility for naming collisions unnecessarily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes it does create collissions. However, I do not see a practical way to avoid collissions and still convert all other cases to PascalCase, since its not guaranteed the that source strings are even snake_case.

For example:

message MyMessage {

}

should generate MyMessage

and

message my_message {

}

should generate MyMessage as well.

If we generate for the last case My_Message, we are not really converting anything to PascalCase, and we might as well use the original name.

The same problem goes for the fields.
I think we are generating snake_case there to make the code look nice and pythonic, like we are generating PascalCase for classes to make them nicely pythonic.

In my opinion, two messages or fields with names that are only distinguishable by casing style, is not really a valid use case, but maybe I'm biased..

Copy link
Collaborator

@nat-n nat-n Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument to be made here is that for users following best practices of message naming, they'll get nice names in python so there's no issue. However those with weird naming styles might not have a choice, because they might be consuming an API they don't control. So there's value in being as forgiving as possible without degrading the happy path.

("FOOBAR1", "foobar1"),
("FOOBAR_1", "foobar1"),
("FOO1BAR2", "foo1Bar2"),
("foo__bar", "fooBar"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure this one is correct in terms of avoiding unnecessary collisions.

Maybe should be camel_case("foo__bar") => "foo_Bar" and snake_case("foo_Bar") => "foo__bar" if that works?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other popular libraries like npm camelcase and apache CaseUtils also are not lossless encoders, but rather try to generate a string that matches whatever the casing format prescribes.

I'm inclined to not make this more complicated then necessary, as I don't think anyone in their right mind would make two fields in the same class called FOOBAR1 and FOOBAR_1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna push this point a little bit more. The question isn't whether this is the preferred or most elegant behaviour for a case mapping function, but rather how to generate classnames from message names.

Now having a message called foo_bar and one called FOO__BAR in the same scope might not be advisable, but some users might have reasons to want to, or might not have a choice about it (if they're just consumers of an API). So I think there's value in supporting as many additional use cases of this unfortunate nature as possible without compromising on the more sane use cases.

I see that this could ruin your elegant regex based solution, though there are a number of alternatives that are more flexible. Admittedly this is one weird cases that can be handled among a larger number that can't. Maybe the decider should be whether other tools that specifically deal with pb message names also choke on this case.

I'll let you decide what to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented a strict parameter for all case functions, which when strict=True will output strings in strict format (snake_case = single underscores, pascalCase+CamelCase = no underscores). When strict=False, the delimiter count is preserved. E.g. camel_case('foo__bar') == 'Foo_Bar'.

At the moment we cannot use this however, because of nested types.

Protobuf plugin will report Nested.Type as NestedType, so while generating the class, its not known that this type was originally nested (and which part of the name is the parent class).

References to this type will be broken if we use strict=False, since that would result in a type reference of Nested_Type, which can then not be resolved.

I'm not sure how to solve this now, but we could make an attempt at a later time.

betterproto/tests/test_casing.py Show resolved Hide resolved
betterproto/tests/test_casing.py Show resolved Hide resolved
betterproto/tests/util.py Show resolved Hide resolved
betterproto/tests/util.py Outdated Show resolved Hide resolved
betterproto/tests/util.py Show resolved Hide resolved
betterproto/tests/test_inputs.py Show resolved Hide resolved
betterproto/tests/test_inputs.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@nat-n nat-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐️ Looks good. Just a lot of details to process.

betterproto/plugin.py Show resolved Hide resolved
betterproto/plugin.py Outdated Show resolved Hide resolved
betterproto/plugin.py Outdated Show resolved Hide resolved
betterproto/compile/importing.py Outdated Show resolved Hide resolved
betterproto/compile/importing.py Outdated Show resolved Hide resolved
assert name == "nested_child.Message"


def test_import_deeply_nested_child_from_root():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "..._from_root" a little confusing, like root implies sys.path...

I would think of this as importing from sibling, as this present module and the module that it's importing from share the same parent package.

I would think of what is described below as import_from_sibling instead as not importing at all, unless I've misunderstood?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your comments.

test_import_deeply_nested_child_from_root

This test is about importing a deeply nested package from the context of the top-level package (no package).

def test_import_deeply_nested_child_from_root():
    imports = set()
    name = get_ref_type(
        package="", imports=imports, source_type="deeply.nested.child.Message"
    )

    assert imports == {"from .deeply.nested import child as deeply_nested_child"}
    assert name == "deeply_nested_child.Message"

as you can see from the generated import, it imports from 2 levels lower, so they are not siblings.

import_from_sibling

when a type is referenced from a 'sibling' (same package), then indeed, an import statement is not necessary.

maybe i should rename import_* to reference_*, because its about referencing types, that sometimes lead to imports.

agree?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by rename, i also mean the functions in the library code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes done as a result of these comments:

  • replaced method comments with functionally descriptive docstrings
  • terminology: reference instead of import
  • added asserts for empty import set cases

I'm not sure what to do about _from_root. I understand it can be confused with sys.path.
The context from which the import is done, is the root of the module, of which all packages for the generated code are descendants. It's the root from the perspective of the protobuf package hierarchy.

Would be great if we can find a term for that, that is not confusing.

betterproto/tests/test_get_ref_type.py Show resolved Hide resolved
betterproto/tests/test_get_ref_type.py Outdated Show resolved Hide resolved
imports = set()
name = get_ref_type(package="package.child", imports=imports, source_type="Message")

assert imports == {"from ... import Message"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case looks like it follows a different rule to the previous test case with respect to importing the renamed module? Is this avoidable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat yes and no.

What I tried to do wherever possible was alias the modules, but not the imported types.
Although aliases look ugly, at least the type names are preserved this way.

For ancestor imports, there is no identified package to import that wraps the Type, so you cannot give it an alias. e.g, this is not accepted by python:

from ... import . as ___root__

___root__.Message
  1. If you know for sure that the ancestor is not the root of the package, you can import 1 level higher up, and alias that. This is what i have done in reference_ancestor
    if py_package:
    string_import = py_package[-1]
    # Add trailing __ to avoid name mangling (python.org/dev/peps/pep-0008/#id34)
    string_alias = f"_{'_' * distance_up}{string_import}__"
    string_from = f"..{'.' * distance_up}"
    imports.add(f"from {string_from} import {string_import} as {string_alias}")
    return f"{string_alias}.{py_type}"
    else:
    imports.add(f"from .{'.' * distance_up} import {py_type}")
    return py_type
  • However this approach is unsafe when the target type is in the top level package, as its not guaranteed that the generated code resides in its own package, so we might end up ascending outside of the package
  • So for the case you mentioned, this approach does not work safely.
  1. You could alias the type itself. e.g.
    from ... import Message as ___Message__, or
    from ... import Message as ___root_Message_

2 would be a safe and valid solution that prevents name collisions when importing 'Message' from root into a subpackage that also contains 'Message'.

Unfortunately, it makes the name ugly, in that case.

In the future, I hope to improve the importing logic such that it keeps track of occupied names per package, and only creates aliases when necessary, although this will complicate things if in the future we want to support parallel compilation (e.g. multiple protoc processes compiling a single package hierarchy).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think robustness should be prioritized over the prettiness of the names in general. The first goal of betterproto should be that it works. Then, that it generates beautiful code. So if you agree, I will also alias the root types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from ... import Message as ___root_Message_

Until someone calls a package root 🙃

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 😄

I've aliased them now as ___Message__, I don't think adding root will help a lot in the end

betterproto/tests/test_get_ref_type.py Show resolved Hide resolved
@boukeversteegh
Copy link
Collaborator Author

boukeversteegh commented Jun 14, 2020

Considering functional change number 🍏 (the first one), is there any scenario that this would cause newly generated code to not be a drop in replacement for packages generated with the existing approach with respect to import statements?

Yes, there is:

  • proto files without package are now generated in the root of the module instead of the equivalent relative path of the proto file
    • so protoc -I foo foo/bar/baz.proto would compile to bar/baz.py
    • now it will compile to __init__.py, straight in the root
    • for proto files without packages, it will be best to generate them into a subdirectory, e.g. --custom_out=lib

I've updated the docs to reflect this.

It was not possible to keep the behavior for proto files without packages, because if you mix some package-less files with packaged files, it will be impossible to know from where to import the referred types. When everything is package based, we can import foo.bar.Message easily, but if we need to import Message, and it could be in any odd submodule, depending on which proto file it was defined in, things get complicated.

@boukeversteegh boukeversteegh added has test Has a (xfail) test that verifies the bugfix or feature bug Something isn't working labels Jun 24, 2020
@boukeversteegh boukeversteegh merged commit cdddb2f into danielgtaylor:master Jul 4, 2020
@boukeversteegh boukeversteegh deleted the fix/imports branch July 4, 2020 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working has test Has a (xfail) test that verifies the bugfix or feature
Projects
None yet
2 participants