Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data #121

Merged
merged 23 commits into from
Jul 25, 2020

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Jul 15, 2020

This is an attempt to modularize plugin.py so that it can easily be extended to make new plugins. For reference, discussion in #119.

@boukeversteegh
Copy link
Collaborator

just a quick reply while i'm packing for a holiday.. please have a look also at #49 and my comment there.
i think it may be helpful if we first split up plugin.py into separate files, and merge that, before we end up with even more objects/variables in plugin.py.
on another point, small steps may be wise also, because if the refactor takes a long time, it will start to block new incoming work

@adriangb
Copy link
Contributor Author

Yeah splitting things up I think would be great.

Perhaps we can do this in the following steps:

  1. Define the data structures and constants in a seperate file (plugin_dataclasses.py?)
  2. Insert the data structures into the current generate_code method (i.e. still a single monolithic function, but replace the nested dicts with these new classes).
  3. Refactor generate_code into a modular class.

Regarding these dataclasses specifically: how do you feel about their structure, and about moving the logic for each protobuf object into it's corresponding data class? Thus generating the response mainly consists of instantiating these days classes passing them (1) their parent, (2) their path and (3) their corresponding protobuf object.

@boukeversteegh
Copy link
Collaborator

Yeah splitting things up I think would be great.

Perhaps we can do this in the following steps:

  1. Define the data structures and constants in a seperate file (plugin_dataclasses.py?)
  2. Insert the data structures into the current generate_code method (i.e. still a single monolithic function, but replace the nested dicts with these new classes).
  3. Refactor generate_code into a modular class.

Regarding these dataclasses specifically: how do you feel about their structure, and about moving the logic for each protobuf object into it's corresponding data class? Thus generating the response mainly consists of instantiating these days classes passing them (1) their parent, (2) their path and (3) their corresponding protobuf object.

I like the idea of gradually moving towards OOP. It still feels somewhat more risky coming up with a top-down design and fitting the code into it, as opposed to isolating a responsibility from the code, and moving it into a (small) class, and see immediately that it made the code more simple.

I see a number of responsibilities that might get mixed up (they are already mixed up in the current codebase):

  • analysis (reading all the source files to gather information that we need to start do mapping)
  • mapping (deciding on the proper output structure, based on input structure, for example the files, packages and classes)
  • rendering (transforming the output structure into actual python code)

It's not clear if we can split up the code immediately into clean components that separate these concerns, but even if we don't, we should know to some level which responsibilities each class takes.

But even if we mix it up, it would still be a step forward, and I support your proposal to have each protobuf type handled by separate components.

I haven't spent time to think about the paths/parents you mentioned, but i suppose it can be handy.

In any case, I would stick to always working code as much as possible, instead of a big bang change, so that we will be able to keep releasing. It could be a good start to just extract a single object like Service, Message, Enum, or even ServiceMethod. Whichever is easiest. It will already be challenging.

if field_val is PLACEHOLDER:
raise ValueError(f"`{field_name}` is a required field with no default value.")

@property # TODO: replace with functools.cached_property
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would bump the required version to generate protobufs, might be better to just copy the source for it somewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I see three alternatives:

  1. Copy it from source (as you suggested).
  2. Leave it as @property which means it will be recalculated every time.
  3. Use @functools.lru_cache(maxsize=1) which would achieve the same thing, albeit with overkill complexity in the backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested pulling the source, it's pretty straightforward. Only question is if it's worth the trouble or if the performance is okay as is.

@adriangb
Copy link
Contributor Author

I pushed a version that has a separate plugin_dataclasses.py and is built on the latest version in master.

The way it is currently structured, there is no way to go one dataclass at a time, since they all rely on the parent and child tree-like architecture.

Makefile Outdated
Comment on lines 11 to 12
test: ## - Run tests, ingoring collection errors (ex from missing imports)
poetry run pytest --cov betterproto --continue-on-collection-errors
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously not needed for this PR, but I found it very useful, will remove before final review.

@adriangb
Copy link
Contributor Author

Moving from draft -> real PR.

Still have some failing tests, but most are passing.

@adriangb adriangb marked this pull request as ready for review July 16, 2020 17:32
@adriangb adriangb changed the title Modularize plugin.py REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data Jul 17, 2020
class Message(ProtoContentBase):
"""Representation of a protobuf message.
"""
parent: Union[Message, OutputTemplate] = PLACEHOLDER
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main features of this refactor is the tree structure.

@adriangb
Copy link
Contributor Author

Tests are working on 3.8 and 3.7. 3.6 is broken because I was using PEP 563. It's easy enough to just use string literal type hints, I'll change that when needed.

Copy link
Collaborator

@nat-n nat-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort on this.

Splitting up the plugin module is overdue. But since you've started I think it would be nicer to organise things a bit more explicitly with a sub-package.

consider:

betterproto/
    plugin/
        __init__.py
        __main__.py  # this is executed as $ python -m betterproto.plugin
        models.py

This structure would then be easier to extend to split things out further.

@adriangb
Copy link
Contributor Author

@nat-n please take a look at the structure I just created and let me know if that falls within the framework you were thinking of.

Comment on lines +1 to +2
@SET plugin_dir=%~dp0
@python -m %plugin_dir% %*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this will pan out, not a .bat person...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test this out when I have time in the next couple of days 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anymore?

Comment on lines -1 to -2
@SET plugin_dir=%~dp0
@python %plugin_dir%/plugin.py %*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to src/betterproto/plugin/plugin.bat

@@ -1,480 +0,0 @@
#!/usr/bin/env python
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split up into src/betterproto/plugin/__init__.py and src/betterproto/plugin/parser.py

Copy link
Collaborator

@nat-n nat-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to find time to review this properly, but it looks better already.

fh.write(request.SerializeToString())


if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this here too doesn't hurt (except for breaking the relative import in this module), but it would be good to also have a __main__.py like:

from . import main
main()

see: https://docs.python.org/3/using/cmdline.html#cmdoption-m



def main():

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be an empty line before the docstring

Copy link
Collaborator

@boukeversteegh boukeversteegh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adriangb for taking up this big task!

Since I'm not able to fully review now, I would just like to share a few high level questions/remarks.

I think you've described the structure of protobuf files correctly in the docblock, and it should be a useful to represent them as a nested structure as to allow navigating to definitions of field types, and so on (and for example knowing to which file they will be written).

Within the concept of a tree, I would go personally go with more specialized concepts, rather than a generic term, I.e. messages have a package, and a field have a "containing message" or just message. That way we stick close to the real domain terms, and do not introduce new concepts or abstractions that are not part of Protobuf, and thus lower the barrier of understanding the library.

Other questions:

Is building this tree part of this PR? What is the general approach to generating the tree?

Will we be able to take advantage of the classes, before having the tree building logic complete?

Even with a lot of changes already moved out of scope (and it's great you're keeping an eye on scope, and keeping diffs manageable), I'm still concerned about the scope being too big.

Could you list briefly which changes you consider explicit part of this PR, and which are out of scope?

Sorry if any of my questions were already answered or should have been clear from the code, I'm on mobile ☺️🏕️

@adriangb
Copy link
Contributor Author

adriangb commented Jul 18, 2020 via email

Copy link
Collaborator

@boukeversteegh boukeversteegh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! Still need to look at the actual objects. Coming later

message_data = Message(parent=output_package, proto_obj=item, path=path)
for index, field in enumerate(item.field):
if is_map(field, item):
MapField(parent=message_data, proto_obj=field, path=path + [2, index])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so much cleaner now! Nice!

Because protobuf itself also has class names for describing protobuf objects , I suggest we suffix all our classes representing protobuf objects with Definition.

It will still be somewhat confusing, as protobuf uses Description, which is very similar, and newcomers won't easily be able to distinguish the input types from the intermediate types.

Some terms come to mind:

  • Parsed...Definition (ParsedFieldDefinition)
  • Python... (PythonField) — not my favorite as the terms represent more the protobuf objects, than the python objects
  • ...Compiler (MessageCompiler, FieldCompiler) — kinda close to what it is, perhaps?
  • ...Mapper/Transformer

I personally like Compiler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah either compiler or Parsed..Definition sound good to me.

Copy link
Contributor Author

@adriangb adriangb Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "Compiler" it would be:

  • PluginRequestCompiler
  • MessageCompiler
  • FieldCompiler
  • MapEntryCompiler
  • OneOfFieldCompiler
  • EnumDefCompiler
  • ServiceCompiler
  • ServiceMethodCompiler

Right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "Compiler" it would be:

  • RequestCompiler
  • MessageCompiler
  • FieldCompiler
  • MapEntryCompiler
  • OneOfFieldCompiler
  • EnumDefCompiler
  • ServiceCompiler
  • ServiceMethodCompiler

Right?

Yes 👍

Although I haven't looked at protobuf descriptions to see if the classnames match, but if it doesn't, it would be an easy fix. So this looks fine to me.

EnumDefCompiler could just be EnumCompiler, I guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I suggested EnumDefCompiler instead of EnumCompiler is that I originally was confused because I thought the enum objects protoc was supplying were enum fields or something like that instead of the actual enum defenitions. But maybe that was just me not knowing enough about protobuf structures... Again I really am open to anything you or others decide on 😃

Copy link
Collaborator

@boukeversteegh boukeversteegh Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I understand. I may have had the same confusion, until I realized enums are also independent objects.

What further confuses things is that Enum values are also distinct from the Enum definition itself, although I think the Enum values end up being just fields on the Enum, but not sure (?).

How about EnumTypeCompiler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep "Enum Fields" are just Fields of the type if that makes sense. They're treated just like any other field. So the only time you encounter anything "Enum" related is the enum definitions. Why not EnumDefinitionCompiler? It's a bit long but I think it's most explicit. But otherwise EnumTypeCompiler sounds fine as well.

src/betterproto/plugin/parser.py Outdated Show resolved Hide resolved
if is_map(field, item):
MapField(parent=message_data, proto_obj=field, path=path + [2, index])
elif is_oneof(field):
OneOfField(parent=message_data, proto_obj=field, path=path + [2, index])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this represent a field in a group, or the group itself?

If its the group, we could call it Group, or OneOfGroup. If that is in line with the descriptor name passed by the plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a field that is part of a OneOf group. I don't think the oneof group itself is represented as an entity by protoc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the reply. In the current library implementation, one of fields and normal fields are declared in pretty much the same way, except that one of has a group attribute set.

It seems a simple way to model the domain from the outside, as it avoids a nested structure (as does yours) but also another concept altogether.

Would you say keeping this approach in the OOP version could work, or did you have a specific motivation to split them into two classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning for still splitting up into two classes is:

  1. Be able to use an isinstance check instead of adding an is_oneof attribute or something like that.
  2. Don't pollute the already complex Field namespace with a method that isn't relevant to a field unless it's a OneOf

return current.package_proto_obj

@property
def request(self) -> "Request":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a convenience method or do we need it for compilation?
If possible, I'd prefer to avoid too much treewalking unless really needed, because it makes it harder to think about the code, especially when the tree walking is very abstract.

If we really need the plug-in request object from any object defined in it, we could also just pass it as a constructor arg, and keep things flat. Or would that make things more difficult?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, it's only being used by ServiceMethod.py_input_message to get all of the messages. I guess it could be a constructor arg, that might simplify things, or make them more complicated, not sure, I'd have to implement it.

src/betterproto/plugin/models.py Outdated Show resolved Hide resolved
@adriangb
Copy link
Contributor Author

@nat-n I think the structure is now how you indicated and all of the entry points should work
@boukeversteegh I renamed the classes as discussed

boukeversteegh
boukeversteegh previously approved these changes Jul 20, 2020
Copy link
Collaborator

@boukeversteegh boukeversteegh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangb Congratulations on this amazing job! You've single handedly turned our ticker tape code into a nicely organized structure. I still recognize most of the pieces, but now each of them are in their proper place, and it fits together beautifully. I'm very happy with the result, and I expect it will make contributing this library much more pleasant, easy and much less daunting.

I see you've put a lot of effort in this, as is visible from the well readable and logically structured code, and you took great care to not overdo it. That is important and much appreciated.

Any suggested changes that came to my mind are insignificant compared to the big jump forward that this PR represents, so rather than to keep polishing it and blocking its merge, I suggest that we merge this as soon as there are no showstoppers.

Any finetuning (which for my part is limited to clarifying some variable names) can easily be done in follow PRs.

Your work will be one of the major milestones in this project, so I thank you on behalf of all contributors and users of betterproto who will find it in a much better shape.

Muchas gracias! 🙏🎉

src/betterproto/plugin/models.py Outdated Show resolved Hide resolved
src/betterproto/plugin/models.py Outdated Show resolved Hide resolved
@adriangb
Copy link
Contributor Author

I think you are giving me too much credit, I could not have made this PR without the work you and others have done before me!

I made the minor edits from your comments.

Since these changes are all internal, I think it's okay if we have to make changes again later, so I too am on board for merging what we've got so far and going from there.

@adriangb adriangb requested a review from nat-n July 21, 2020 19:39
Copy link
Collaborator

@abn abn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All for this change. Sorry I am a bit late to the party, but wanted to add a few comments regarding structure and change sequencing. Not show stoppers, but nice to haves.

# plugin: python-betterproto
from dataclasses import dataclass
{% if description.datetime_imports %}
from datetime import {% for i in description.datetime_imports %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %}
from datetime import {% for i in description.datetime_imports|sort %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to implement this using isort similar to the way we use black today? Obviously as part of another changeset and not within here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That definitely makes sense for a future PR. Previously sort was being used somewhere else, I just moved it.

@@ -1,13 +1,13 @@
# Generated by the protocol buffer compiler. DO NOT EDIT!
# sources: {{ ', '.join(description.files) }}
# sources: {{ ', '.join(description.input_filenames) }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for sanity sake it would be great to see the changes to names etc (and the template as a whole) to be split out into another PR just so that in case of any downstream issues we can revert easily. Just food for thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree. There's two main reasons why names were changed:
(1) When using the old names in the new structure really didn't make sense (ex: properties being a container for fields).
(2) When I just renamed things as I was developing for the sake of keeping my sanity and understanding everything.

I'd say the former makes sense for this PR but the latter (along with other re-naming for clarity's sake) would have made sense for another PR. The issue is that I don't remember which is which at this point 😣

@@ -0,0 +1 @@
from .main import main
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from .main import main
from betterproto.plugin.main import main

Or better yet, drop this in favour of consolidating main.py into __main__.py and update pyproject.toml to use betterproto.plugin.__main__:main.

@boukeversteegh
Copy link
Collaborator

boukeversteegh commented Jul 23, 2020 via email

@boukeversteegh
Copy link
Collaborator

As proposed earlier, I will merge this PR since there don't seem to be showstoppers in it.
Any small improvements / nice to haves are welcome as new PR's.

Thank you!

@boukeversteegh boukeversteegh merged commit b5dcac1 into danielgtaylor:master Jul 25, 2020
Gobot1234 added a commit to Gobot1234/python-betterproto that referenced this pull request Aug 6, 2020
…ke structure to represent parsed data (danielgtaylor#121)"

This reverts commit b5dcac1
Gobot1234 pushed a commit to Gobot1234/python-betterproto that referenced this pull request Aug 24, 2020
…ture to represent parsed data (danielgtaylor#121)

Refactor plugin to parse input into data-class based hierarchical structure
@abn abn mentioned this pull request Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants