Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(event_sources): implement Mapping protocol on DictWrapper for better interop with existing middlewares #1516

Merged
merged 2 commits into from
Sep 28, 2022

Conversation

Tankanow
Copy link
Contributor

@Tankanow Tankanow commented Sep 13, 2022

Issue #1503

Summary

DictWrapper and all of its subclasses should fully implement the Mapping interface so that it can be used interchangeably with code that expects the event to be a dictionary. This helps Powertools Middleware play nicely with existing middleware.

Changes

Add Mapping as parent class of DictWrapper, implement missing __len__ and __iter__ abstract methods.

User experience

Before this change, a user would encounter exceptions when trying to use many standard functions on dictionaries coerced into DictWrapper. Though a user could access the raw dictionary with the raw_event property, this was often clumsy, causing existing middleware to break (or to be order dependent) or causing the user to write lots of type-checking code.

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change? No

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.


View rendered CHANGELOG.md

@Tankanow Tankanow requested a review from a team as a code owner September 13, 2022 13:43
@Tankanow Tankanow requested review from rubenfonseca and removed request for a team September 13, 2022 13:43
@boring-cyborg boring-cyborg bot added area/commons github-actions Pull requests that update Github_actions code internal Maintenance changes labels Sep 13, 2022
@pull-request-size pull-request-size bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 13, 2022
@boring-cyborg boring-cyborg bot added dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation labels Sep 13, 2022
@Tankanow
Copy link
Contributor Author

Has something gone wild with the git history? I definitely did not change 52 files. =) Gimme a bit to fix.

@rubenfonseca
Copy link
Contributor

@Tankanow yes we had a small problem with a pipeline process. I think the best way is to reset your local repo from develop and cherry-pick your new commit on top. Thanks for the effort!

…methods to DictWrapper

ISSUE-1503: Add DictWrapper Mapping abc tests

ISSUE-1503: add StreamRecord tests
@pull-request-size pull-request-size bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 13, 2022


class DictWrapper:
class DictWrapper(Mapping):
Copy link
Contributor Author

@Tankanow Tankanow Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: I thought for a long time whether or not to explicitly add the Mapping protocol here or to simply implement the missing dunder methods and allow duck-typing to carry the day.

In the end, I decided it was better to explicitly add the Mapping protocol because, all things being equal, I think it is bad for subclasses of this class to break that protocol. It is good for mypy to warn when an implementer chooses to break the protocol. It should require an explicit type: ignore comment when choosing to break the protocol.

There is one existing class, StreamRecord, that breaks the protocol. There are no subclasses of StreamRecord and it is used only within the context of a DynamoDBRecord object. Because there is only one instance of this type breakage, and it is so localized, I marked StreamRecord.keys as type: ignore and added a comment and some tests to indicate that this class explicitly breaks the Mapping protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In older days I'd ask to remove it as we're breaking it in StreamRecord - I'd take this as an instance of Practicality Beats Purity.

It's highly unlikely a customer will use .keys to get the keys of a StreamRecord dict given the intent. However, if we do receive any customer complaint we'll consider removing the subclassing.

@Tankanow
Copy link
Contributor Author

@rubenfonseca, this is ready for review. I added a review comment to describe one design choice. I'm excited for your feedback.

@rubenfonseca rubenfonseca removed dependencies Pull requests that update a dependency file github-actions Pull requests that update Github_actions code labels Sep 13, 2022
@github-actions github-actions bot added the bug Something isn't working label Sep 15, 2022
@codecov-commenter
Copy link

Codecov Report

Base: 99.73% // Head: 99.73% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (c5394f5) compared to base (8be7164).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #1516   +/-   ##
========================================
  Coverage    99.73%   99.73%           
========================================
  Files          127      127           
  Lines         5729     5733    +4     
  Branches       651      651           
========================================
+ Hits          5714     5718    +4     
  Misses           8        8           
  Partials         7        7           
Impacted Files Coverage Δ
...lambda_powertools/utilities/data_classes/common.py 100.00% <100.00%> (ø)
...s/utilities/data_classes/dynamo_db_stream_event.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rubenfonseca
Copy link
Contributor

rubenfonseca commented Sep 15, 2022

Thank you so much for following up! After discussing this internally, we would like to move the discussion to our v2 release.

We are concerned with the changes that this PR might introduce to consumers of DictWrapper and its many subclasses. We want to take time to better understand and document the eventual changes of this PR. The release of v2 is being tracked to the end of September.

Action: move PR base to v2

@rubenfonseca rubenfonseca removed bug Something isn't working documentation Improvements or additions to documentation labels Sep 15, 2022
@rubenfonseca rubenfonseca changed the base branch from develop to v2 September 15, 2022 13:00
@rubenfonseca rubenfonseca changed the title fix(DataClasses): Implement Mapping protocol on DictWrapper fix(event_sources): implement Mapping protocol on DictWrapper Sep 15, 2022
@github-actions github-actions bot added the bug Something isn't working label Sep 15, 2022
@heitorlessa
Copy link
Contributor

Adding to @rubenfonseca comments on our concern about the MappingInterface.

The way Event Source Data Classes was done it abuses the word "Dict", when in fact is a class.

If we implement an iter for instance, it wouldn't be true for most Event Source Data Classes as the data varies in format - e.g., DynamoDB keeps a list of records in data.records, while API GW keeps in data.body.

A long-term strategy is to get rid of the duplication of Event Source Data Classes and Parser (Pydantic). We're waiting for Pydantic v2 to be launched to analyse its package size -- doing so would improve speed (~17x faster), validation, and open the door to more easily implement strict typing for all let alone a single source of truth for Event Source Models.

We'd still love to hear from you if you feel strongly inclined in having the protocol implemented either way in v2

@Tankanow
Copy link
Contributor Author

Thanks for the response @heitorlessa.

I respectfully disagree on this point:

If we implement an iter for instance, it wouldn't be true for most Event Source Data Classes as the data varies in format - e.g., DynamoDB keeps a list of records in data.records, while API GW keeps in data.body.

The event is always a dictionary so you can always iterate over its keys. I think you can rightly call the class DictWrapper to turn raw data into objects. In my humble opinion, classes wrapping the Lambda event should implement the Mapping interface (decorator pattern if you like) so that code can benefit from the Python standard library and data model without resorting to typing checking and access via raw_event.

To this point, in practice - that is in the entire code base - I could only find 1 subclass that breaks the interface. It seems that there are no consumers that expect the behavior you are worried about, i.e. that iter should behave differently for different event subclasses.

Lastly, it is true that this is a breaking change for anyone using Powertools who has currently implemented a negative check on the event type in order to coerce things back to a dict via raw_event if needed, as in this example:

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from aws_lambda_powertools.utilities.data_classes.common import DictWrapper
from typing import Mapping, Union

def handler(event: Union[Mapping, DictWrapper], context):
    event_dict = event.raw_event if not isinstance(event, Mapping) else event
    return event_dict
        
event_dict = {'foo': 'bar'}
event_obj = APIGatewayProxyEvent(event_dict)
assert handler(event_dict, None) == handler(event_obj, None)

I feel like it is a corner case. As I feel like most consumers would use the positive check if isinstance(event, DictWrapper) to ensure the event has the .raw_event property.

@heitorlessa
Copy link
Contributor

Ah, I missed an important detail on iter - you want to access the raw_event property (actual dict), and not the data property where a payload might be (records, body etc).

If I got that right I take it back what I alluded earlier with different subclasses, and I fully agree we should do that.

I don't have data points on whether people are checking the superclass instance (DictWrapper) as opposed to the Event Source Class itself like we do even internally -- for the sake of Hyrum's law as customers might have taken a dependency on this, we should do it safely for V2 and document in the upgrade guide.

Let me know if I still haven't processed this correctly

@Tankanow
Copy link
Contributor Author

@heitorlessa,

If I got that right I take it back what I alluded earlier with different subclasses, and I fully agree we should do that.

Yes!

  • for the sake of Hyrum's law

I totally agree. I regret missing this when I opened this PR against V1. I'm glad you and @rubenfonseca caught it. I love your commitment to non-breaking changes as even breaking changes in a major release can be really frustrating to users. I would be happy to help V2 by testing against our code bases. Please let me know how I can help.

@heitorlessa
Copy link
Contributor

Perfect, we have a plan then.

Could you create a PR implementing the mapping interface against the v2 branch?

We can take specific questions, if any, about implementation and the messaging on upgrade guide.

@rubenfonseca @leandrodamascena and I can support you

Thank you!!!

@Tankanow
Copy link
Contributor Author

Tankanow commented Sep 26, 2022

@heitorlessa and @rubenfonseca,

Regarding this request:

Could you create a PR implementing the mapping interface against the v2 branch?

It looks like someone moved this PR destination to awslabs:v2. Is there any action item for me on this ticket?

@heitorlessa
Copy link
Contributor

@Tankanow we'll be looking into the PR this week ;) I mistakenly added a comment in the PR asking to create a PR :D, I'll be flying back home tomorrow, so EOW in the worst case.

@rubenfonseca rubenfonseca requested review from heitorlessa and removed request for rubenfonseca September 27, 2022 13:13
@heitorlessa
Copy link
Contributor

Looking into this now...

@heitorlessa heitorlessa self-assigned this Sep 28, 2022
Copy link
Contributor

@heitorlessa heitorlessa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @Tankanow - a minor note on subclassing (practicality vs purity) with our intent to keep but will revert if it affects any use case we couldn't account for, and code suggestions to improve maintenance.

We can merge once these are in ;)

Thank you!!!!



class DictWrapper:
class DictWrapper(Mapping):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In older days I'd ask to remove it as we're breaking it in StreamRecord - I'd take this as an instance of Practicality Beats Purity.

It's highly unlikely a customer will use .keys to get the keys of a StreamRecord dict given the intent. However, if we do receive any customer complaint we'll consider removing the subclassing.

aws_lambda_powertools/utilities/data_classes/common.py Outdated Show resolved Hide resolved
tests/functional/test_data_classes.py Outdated Show resolved Hide resolved
tests/functional/test_data_classes.py Outdated Show resolved Hide resolved
tests/functional/test_data_classes.py Outdated Show resolved Hide resolved
Co-authored-by: Heitor Lessa <lessa@amazon.nl>
@rubenfonseca
Copy link
Contributor

Just re-checking this PR, I agree that we should merge this into develop instead :)

@heitorlessa heitorlessa changed the base branch from v2 to develop September 28, 2022 13:44
@heitorlessa heitorlessa changed the title fix(event_sources): implement Mapping protocol on DictWrapper fix(event_sources): implement Mapping protocol on DictWrapper for better interop with existing middlewares Sep 28, 2022
@heitorlessa
Copy link
Contributor

Thanks for addressing the feedback @Tankanow - merging now. THANK YOU <3

@heitorlessa heitorlessa merged commit effb910 into aws-powertools:develop Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working internal Maintenance changes size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants