add to/from_pydict methods #203

crflynn · 2021-01-29T02:41:17Z

This PR adds to_pydict() and from_pydict(), which serialize messages to and parse them from Python dictionaries, as an alternative to the current to_dict()/from_dict() methods, which cater to JSON serialization.

In addition to defining messages, our protobuf definitions double as Spark table schemas. Ingesting messages requires declaring the Spark schema explicitly, so we map the betterproto message dataclass field types to Spark SQL types. The ingestion itself is done by decoding into a betterproto message, calling to_dict(), and then applying additional business logic to transform the JSON-prepared datetime strings back into datetime objects. To do this we have to write a bit of extra code that looks like this:

from datetime import datetime
from typing import Callable, Type

import betterproto
from dateutil import parser


def get_decoder(message_type: Type[betterproto.Message]) -> Callable:
    """Get a decoder used to create a udf."""

    # determine fields which are datetimes so we can convert them from strings
    datetime_fields = [
        name
        for name, field in message_type().__dataclass_fields__.items()
        if field.type == datetime
    ]

    def decoder(s):
        """betterproto converts datetimes to strings on to_dict() so we convert them back
        using dateutil (iso8601) so that they go into the spark table as proper datetimes
        """
        asdict = message_type().parse(bytes(s)).to_dict(casing=betterproto.Casing.SNAKE)
        for field in datetime_fields:
            if field in asdict:
                asdict[field] = parser.parse(asdict[field])
        return asdict

    return decoder
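
For context, the schema mapping described above could look roughly like the sketch below. It uses a plain dataclass rather than a real betterproto message, and Reading, SPARK_TYPE_NAMES, and spark_ddl are hypothetical names made up for illustration. It produces a Spark DDL schema string, which may not be how the actual pipeline declares its schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Assumed mapping from Python field types to Spark SQL type names.
SPARK_TYPE_NAMES = {
    str: "string",
    int: "bigint",
    float: "double",
    bool: "boolean",
    bytes: "binary",
    datetime: "timestamp",
}


@dataclass
class Reading:
    # Hypothetical stand-in for a generated message dataclass.
    sensor_id: str = ""
    reading: float = 0.0
    taken_at: datetime = datetime(1970, 1, 1)


def spark_ddl(message_type) -> str:
    """Build a DDL-style schema string from the dataclass field types."""
    return ", ".join(
        f"{f.name} {SPARK_TYPE_NAMES[f.type]}" for f in fields(message_type)
    )


print(spark_ddl(Reading))
# sensor_id string, reading double, taken_at timestamp
```

Because the schema declares taken_at as a timestamp, the decoded dictionary has to carry a real datetime for that field, which is why the string conversion above is needed.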

With this change our code would look like this, and the output dictionary value types match those declared in the dataclasses:

from typing import Callable, Type

import betterproto


def get_decoder(message_type: Type[betterproto.Message]) -> Callable:
    """Get a decoder used to create a udf."""

    def decoder(s):
        return message_type().parse(bytes(s)).to_pydict(casing=betterproto.Casing.SNAKE)

    return decoder

An example copied from the test:

from dataclasses import dataclass
from datetime import datetime, timedelta

import betterproto


@dataclass
class TestDatetimeMessage(betterproto.Message):
    bar: datetime = betterproto.message_field(1)
    baz: timedelta = betterproto.message_field(2)

test = TestDatetimeMessage().from_dict(
    {"bar": "2020-01-01T00:00:00Z", "baz": "86400.000s"}
)

print(test.to_pydict())
# {'bar': datetime.datetime(2020, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'baz': datetime.timedelta(days=1)}

This relates to some of the discussion happening in #36 with respect to datetimes. Personally, this is the behavior I expected from to_dict(): I assumed it would act like dataclasses.asdict, except that it omits default values and handles key casing.
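
To make that expectation concrete, here is a minimal sketch using only the standard library. It is not betterproto code: Event, to_camel, and as_python_dict are hypothetical names, and the default-omission rule is a simplification (a field is dropped when its value equals the declared default).

```python
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class Event:
    # Hypothetical stand-in for a generated message class.
    created_at: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    retry_count: int = 0
    name: str = ""


def to_camel(snake: str) -> str:
    """Convert snake_case to camelCase."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


def as_python_dict(obj, camel_case: bool = False) -> dict:
    """Like dataclasses.asdict, but skip fields still at their default."""
    defaults = {f.name: f.default for f in fields(obj)}
    out = {}
    for key, value in asdict(obj).items():
        if value == defaults[key]:
            continue  # omit defaults
        out[to_camel(key) if camel_case else key] = value
    return out


event = Event(created_at=datetime(2020, 1, 1, tzinfo=timezone.utc), name="x")
print(as_python_dict(event, camel_case=True))
```

The datetime stays a datetime object in the result, and retry_count is left out because it was never changed from its default.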

Thanks a ton for betterproto. I'm happy to iterate on this if there are any requested changes.

Gobot1234 · 2021-10-25T22:22:55Z

Hi, sorry for the slow response on this.

I like this idea; however, I'm not a fan of the large amount of duplication this brings, for which I may have a solution. Would it be possible to add an internal generator method that yields the keys and values, which to/from_dict and to/from_pydict would then each process in their own way?
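
A rough sketch of the shape this suggestion describes, written against a plain dataclass rather than betterproto's internals. Sample and _iter_set_fields are hypothetical names, and the real methods handle far more cases (nested messages, casing, enums, and so on).

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Any, Iterator, Tuple


@dataclass
class Sample:
    # Hypothetical stand-in for a betterproto message.
    name: str = ""
    seen_at: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def _iter_set_fields(self) -> Iterator[Tuple[str, Any]]:
        """Shared traversal: yield (name, value) for non-default fields."""
        for f in fields(self):
            value = getattr(self, f.name)
            if value != f.default:
                yield f.name, value

    def to_pydict(self) -> dict:
        # Python-native output: values pass through unchanged.
        return dict(self._iter_set_fields())

    def to_dict(self) -> dict:
        # JSON-oriented output: datetimes become ISO 8601 strings.
        return {
            key: value.isoformat() if isinstance(value, datetime) else value
            for key, value in self._iter_set_fields()
        }


msg = Sample(name="a", seen_at=datetime(2020, 1, 1, tzinfo=timezone.utc))
print(msg.to_pydict())
print(msg.to_dict())
```

Only the per-value conversion differs between the two public methods; the field walk and default filtering live in one place.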

guysz · 2021-12-17T21:39:13Z

Voted +1
I can assist if needed

Gobot1234 · 2021-12-17T21:46:50Z

> Voted +1 I can assist if needed

If you'd be willing to pick this up with the feedback I've suggested, I'd be very grateful.

tcamise-gpsw · 2022-01-12T22:35:41Z

This does in fact satisfy my request from #316 (after the addition of a meta.proto_type == TYPE_ENUM case in the to_pydict method).

@Gobot1234, with regard to minimizing the code duplication: I took a stab at your generator idea. I don't think it turned out as clean as you imagined (or maybe I'm missing something). I also tried another approach: merging each to/from_dict method with its pydict counterpart into a single method that branches on whether it was called as to/from_pydict.

The generator approach is here: https://github.com/tcamise-gpsw/python-betterproto/commit/e1cb903f882446124a68bb7e54f234dda614504d
The merging approach is here: https://github.com/tcamise-gpsw/python-betterproto/commit/b749ad4cbf4910915ce3344fa24ae1473fd29643

Does either of these look acceptable? If so, I can open a PR and try to set up the poetry environment for more testing.
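
For readers who do not want to open the commits, the merging approach has roughly the shape sketched below. This is an illustration on a plain dataclass, not the code from either commit: Sample, _load, and the python_types flag are hypothetical names.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class Sample:
    # Hypothetical stand-in for a betterproto message.
    name: str = ""
    seen_at: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def _load(self, values: Dict[str, Any], python_types: bool) -> "Sample":
        """Single implementation; the flag selects JSON or Python input."""
        types = {f.name: f.type for f in fields(self)}
        for key, value in values.items():
            if types[key] is datetime and not python_types:
                # JSON input carries datetimes as ISO 8601 strings.
                value = datetime.fromisoformat(value)
            setattr(self, key, value)
        return self

    def from_dict(self, values: Dict[str, Any]) -> "Sample":
        return self._load(values, python_types=False)

    def from_pydict(self, values: Dict[str, Any]) -> "Sample":
        return self._load(values, python_types=True)


a = Sample().from_dict({"name": "a", "seen_at": "2020-01-01T00:00:00+00:00"})
b = Sample().from_pydict(
    {"name": "a", "seen_at": datetime(2020, 1, 1, tzinfo=timezone.utc)}
)
print(a == b)
```

The trade-off compared with a generator is that every type-specific branch has to check the flag, but the public methods stay one-liners.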

Gobot1234 · 2022-02-15T15:23:39Z

> This does in fact satisfy my request from #316 (after the addition of a meta.proto_type == TYPE_ENUM case in the to_pydict method).
>
> @Gobot1234, in regards to minimizing the code duplication... I took a stab at your generator idea. I don't think it was as clean as you imagined (or maybe I'm missing something). I also tried another approach of merging the two to/from_dict methods together into one and conditionally doing things depending on whether or not it is a to/from_pydict.
>
> The generator approach is here: tcamise-gpsw@e1cb903
> The merging approach is here: tcamise-gpsw@b749ad4
>
> Do either of these look acceptable? If so, i can open a PR and try to set up the poetry environment for more testing.

Hi, sorry for the late reply. Either looks fine to me. Thank you for looking into this.

einarjohnson · 2022-04-22T09:17:59Z

Any ETA on when this might be released?

Gobot1234 · 2022-04-22T09:42:04Z

> Any ETA on when this might be released?

I'm going to try and get this into b5, but no promises.

Gobot1234

> This does in fact satisfy my request from #316 (after the addition of a meta.proto_type == TYPE_ENUM case in the to_pydict method).

Assuming #293 is merged, which should make enum fields on Message instances always be enums, special casing here shouldn't be necessary.

Gobot1234 · 2022-05-09T16:34:16Z

Thanks for this!

add to/from_pydict methods

82c0835

nat-n added the "enhancement" (New feature or request) and "has test" (Has a (xfail) test that verifies the bugfix or feature) labels Apr 6, 2021

Gobot1234 mentioned this pull request Jan 12, 2022: Return actual enum's from to_dict #316 (Open)

Gobot1234 added 3 commits April 22, 2022 11:15

Merge branch 'master' into pr/203

# Conflicts:
#   src/betterproto/__init__.py
#   tests/test_features.py

cb4ce69

Remove unnecessary method call

ec4bf4c

Fix formatting

6edbfd8

Gobot1234 approved these changes Apr 22, 2022

Gobot1234 merged commit 6536181 into danielgtaylor:master May 9, 2022
