PGV Python Approach #237
Comments
I have a couple issues with this approach:
Hey @akonradi. Thanks for your input. You're right that this approach has tradeoffs, but we believe it's the right way to go considering the lack of a good alternative for finding a
Sorry, meant to reply and then got lost for two weeks. I see, so the problem is that there's no way to do static registration in Python like in C++ without loading the modules containing the

I'm also a bit confused by how this works with the Bazel-based build. The norm there is to have a separate PGV build rule for each .proto file. Would that just check that the validation annotations are sane and then exit? Would the

Lastly, this seems like it might be a good use case for proto insertion points, provided we can add PGV to all the
Hey @akonradi. Thank you for the follow-up and your good suggestions. See below for our thoughts:

Even then, that presents other challenges, such as: which of the classes or functions in that file is the right one? Two solutions there: either name the classes/functions descriptively and do name matching, i.e. to validate Foo, do

OR we could try to have the generated code in
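The name-matching option could be sketched like this (a stdlib-only illustration; all names are hypothetical, and it assumes pre-written validators follow a `validate_<message>` naming convention):

```python
import sys

# Hypothetical pre-written validator following a validate_<message> convention.
def validate_foo(msg) -> bool:
    # Illustrative rule: Foo messages must have a non-empty name.
    return bool(getattr(msg, "name", ""))

def find_validator(msg):
    """Find a validator for msg's class by matching names."""
    module = sys.modules[type(msg).__module__]
    name = f"validate_{type(msg).__name__.lower()}"
    return getattr(module, name, None)

class Foo:
    def __init__(self, name: str = ""):
        self.name = name

check = find_validator(Foo("x"))
print(check(Foo("x")), check(Foo("")))  # True False
```

The obvious drawback, as noted above, is that the convention is fragile: nothing ties the validator to the message class except its name.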
@akonradi, given the gotchas Python presents here that Dmitriy and Aditya have laid out (for the first interpreted language compared to the rest of the pack), I think the JIT strategy makes sense as an initial endeavor; we can iterate on it to see if a long-term imperative strategy can grow from it. As mentioned by @dkunitsk, insertion points would be a break from existing behavior: where previously PGV was not required to run simultaneously with the base codegen plugin, it would become necessary. Towards your Bazel point, we should assume, as a default, that consumers of PGV do not use Bazel.
Agreed, there's really not a great way to make the Java approach (where validators can be loaded by class name) or the C++ approach (where validators are registered statically) work for Python. Agreed on not using insertion points; I think that would end up being more of a headache than we want. I'm still wary of how this will work with Bazel, where we invoke PGV once per .proto file, and given that Envoy is one of the main users of PGV, I don't think that's a good assumption. I worry that we're going to end up with some weird asymmetries between using PGV to generate Python and C++/Java code.
Before I found this library, I was actually thinking of doing something similar to this project by using a combination of python-betterproto and pydantic. It looks something like this (a working but incomplete example):

```python
import re

import betterproto
from pydantic import validator
from pydantic.dataclasses import dataclass


@dataclass
class Bar(betterproto.Message):
    boo: str = betterproto.string_field(1)

    @validator('boo')
    def validate_boo(cls, v: str) -> str:
        """A very bad regex email matcher."""
        pattern = r"\w+@\w+\.\w{2,5}"
        if re.fullmatch(pattern, v):
            return v
        else:
            raise ValueError("boo is not a valid email")
```

This seems to achieve what PGV is doing, but using pure idiomatic Python. Does anyone have any insights on this approach vs PGV, or how the two might be reconciled (ex: compile the PGV protobufs to this type of Python validation)?
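For readers without pydantic installed, the same per-field check can be illustrated with a plain stdlib dataclass and `__post_init__` (a stand-in for the pydantic validator above, not the approach proposed in this thread):

```python
import re
from dataclasses import dataclass

EMAIL_PATTERN = r"\w+@\w+\.\w{2,5}"  # the same naive matcher as above

@dataclass
class PlainBar:
    boo: str

    def __post_init__(self):
        # Runs right after field assignment, mimicking a pydantic @validator.
        if not re.fullmatch(EMAIL_PATTERN, self.boo):
            raise ValueError(f"boo does not match {EMAIL_PATTERN!r}")

print(PlainBar("a@b.com").boo)  # a@b.com
```

Construction with an invalid value raises `ValueError` immediately, which is the same fail-on-build behavior the pydantic version gives.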
One of the goals with PGV is to have the validation specifications be language-neutral, so any approach would need to take protos as input and produce Python code.
Right, so couldn't code like the above be generated based on the protos? It's not clear to me how the PGV part of the protos is parsed. So the input to the above example would be:

```proto
import "validate/validate.proto";

message Bar {
  string boo = 1 [(validate.rules).string.pattern = "\\w+@\\w+\\.\\w{2,5}"];
}
```
Absolutely. The PGV part is parsed by looking at the descriptor for the message being passed in, using jinja2 templates to generate some Python code, and then executing that. See here, for example. For the Go and C++ code, we've been reluctant to add additional run-time dependencies, including additional libraries, since that makes PGV code harder to use for consumers. I don't know what the stance is for Python code.
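That flow (read rules, render a code template, execute the result) can be sketched with only the stdlib; here a `string.Template` stands in for jinja2, and a plain dict stands in for the rules read off a message Descriptor:

```python
import re
from string import Template

# Stand-in for rules read off a message Descriptor.
rules = {"field": "boo", "pattern": r"\w+@\w+\.\w{2,5}"}

# Stand-in for a jinja2 template.
code_template = Template(
    "def validate(msg):\n"
    "    import re\n"
    "    return re.fullmatch($pattern, msg.$field) is not None\n"
)

# Render the template, then execute the generated source at runtime.
source = code_template.substitute(
    field=rules["field"], pattern=repr(rules["pattern"])
)
namespace = {}
exec(source, namespace)
validate = namespace["validate"]

class FakeBar:
    boo = "a@b.com"

print(validate(FakeBar()))  # True
```

Real PGV renders much larger templates, but the shape is the same: rules in, source text out, `exec` to get a callable.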
That looks pretty similar to how python-betterproto itself generates the compiled code. It should be possible to pre-write the Python validation functions and have a jinja2 template that ties them to the python-betterproto compiled messages. A quick example/summary:

```python
from typing import Callable
import re

import betterproto
from pydantic.dataclasses import dataclass


def string_pattern_validator(pattern: str) -> Callable:
    def validator(cls, v: str) -> str:
        if not re.fullmatch(pattern, v):
            raise ValueError(f"value does not match {pattern!r}")
        return v
    # see https://github.com/samuelcolvin/pydantic/issues/806
    validator.__qualname__ = 'validator' + pattern
    return validator
```

The pydantic dataclass/message that was generated by python-betterproto:

```python
@dataclass
class Bar(betterproto.Message):
    boo: str = betterproto.string_field(1)
```

The jinja2 template would then generate something like:

```python
from better_proto_generated_py_file import Bar
from pydantic import validator

import pgv.python_validators as vals


class Bar(Bar):  # shadows the generated Bar with a constrained version
    boo_validator = validator("boo")(
        vals.string_pattern_val(pattern="regex parsed from .proto")
    )
```

Then the end-user would import this constrained Bar class.
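A stdlib-only stand-in for that constrained-subclass pattern (no pydantic or betterproto; all names here are hypothetical) might look like:

```python
import re

# Stand-in for the message class python-betterproto would generate.
class GeneratedBar:
    def __init__(self, boo: str = ""):
        self.boo = boo

def string_pattern_val(pattern: str):
    """Pre-written validator factory, as sketched above."""
    def check(value: str) -> str:
        if not re.fullmatch(pattern, value):
            raise ValueError(f"value does not match {pattern!r}")
        return value
    return check

# What the rendered template could emit: a constrained subclass
# that shadows the generated class and validates on construction.
class Bar(GeneratedBar):
    _check_boo = staticmethod(string_pattern_val(r"\w+@\w+\.\w{2,5}"))

    def __init__(self, boo: str = ""):
        super().__init__(self._check_boo(boo))
```

The end user imports the constrained `Bar`; constructing it with a non-matching string raises `ValueError` before the message exists.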
@akonradi any further thoughts on this approach? I've been working with

In terms of steps, it would look something like:
@rodaine @kothariaditya can you chime in?
Realizing that the same end goal (i.e. compile
@akonradi @rodaine @kothariaditya any updates on this? I'm happy to work on this if there is interest.
I wrote an open-source tool that is compatible with PGV rules and can generate pydantic model code through a protobuf plugin: https://github.com/so1n/protobuf_to_pydantic
We are considering a new approach for Python that's a bit different from existing language support, as it wouldn't do meaningful compile-time generation and would instead generate the validation function just-in-time (JIT).

**What is the approach?**

Hand-code a `generate_validate` function (`proto.Message -> (proto.Message -> bool)`) that would do JIT generation at runtime of the `validate` function for each `proto.Message` class.

The generated `pb2.py` files have Descriptors that provide access to the options and annotations on a proto message. In `generate_validate` we'd walk a Descriptor and get the validation rules for every field. Based on the validation rules, we generate a validation function at runtime and return it to the user.

For performance, we could cache the `validate` function for each message class. We could also allow pre-priming the cache to avoid cold starts.
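A stdlib-only sketch of that descriptor-walk-plus-cache flow (a dict stands in for the rules a real Descriptor would carry; all names other than `generate_validate` are hypothetical):

```python
import re
from functools import lru_cache

# Stand-in for per-field rules that generate_validate would read
# off a message class's Descriptor.
RULES = {"User": {"email": r"\w+@\w+\.\w{2,5}"}}

@lru_cache(maxsize=None)  # cache one validate function per message class
def generate_validate(cls):
    compiled = {
        field: re.compile(pattern)
        for field, pattern in RULES.get(cls.__name__, {}).items()
    }

    def validate(msg) -> bool:
        return all(
            rx.fullmatch(getattr(msg, field, "")) is not None
            for field, rx in compiled.items()
        )
    return validate

class User:
    def __init__(self, email: str = ""):
        self.email = email

validate = generate_validate(User)
print(validate(User("a@b.com")), validate(User("nope")))  # True False
assert generate_validate(User) is validate  # second lookup hits the cache
```

The `lru_cache` line is the caching idea from the paragraph above: generation happens once per class, and every later call is a dict lookup.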
A proof of concept of `generate_validate`:

This would be used through a `validate` function that the user can write themselves, or we could write a minimal Python package containing this.

Although this approach doesn't do compile-time code generation, we plan to output the file containing `generate_validate` as part of PGV. This keeps the Python implementation together in the same repo as the other languages, but more importantly allows for the same option verification performed now (e.g. an int can't have string rules, etc.).

**Benefits**

This avoids a user needing to find and import each specific `foo_validate.py` file where the `validate()` is located, which is difficult at runtime without a mess of extra code. Unlike Go, there isn't a way to hook the validate method into the instance itself without extra imports.

This also allows the validation code to be totally generic. As long as the user has imported the standard generated code (`foo_pb2.py`) before calling `validate` (as that registers the descriptor in the symbol db), `generate_validate` will work.

**Potential problems**

It would involve a performance hit from generating the validation code JIT. However, we minimize the pitfalls through caching, and by allowing the cache to be primed to avoid cold starts.