-
Notifications
You must be signed in to change notification settings - Fork 16
Improved param_value aggregation to support one-to-many relationships #1253
Conversation
@@ -38,10 +41,8 @@ saas_config: | |||
- dataset: saas_connector_example | |||
field: conversations.id | |||
direction: from | |||
data_path: conversation_messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to our changes but I cleaned this up since I was here to add some new collections for testing.
@@ -85,8 +85,6 @@ saas_config: | |||
query_params: | |||
- name: charge | |||
value: <charge_id> | |||
- name: payment_intent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed payment_intent
since the way we are building the requests now causes this to be invalid, the Stripe API doesn't let us specify both params at once. The result is still the same since we have a list of disputes that can be identified either by charge_id
or payment_intent_id
.
model: Type[SaaSSchema] = create_model( | ||
f"{self.saas_config.type}_schema", | ||
**field_definitions, | ||
__base__=SaaSSchema, | ||
_connector_params=PrivateAttr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was be best way I could find to dynamically set a property on the base class. I didn't include this with the other field_definitions
since that is only used to describe the secret
fields.
) | ||
|
||
# Build SaaS requests for fields that are dependent on each other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grouped_input
scenario is now covered by the same general approach above.
return request_params | ||
|
||
def _filtered_secrets(self, current_request: SaaSRequest) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to create a product of reference values with connector params that are not used by the request, so we filter to only keep the connector params in use.
|
||
if self.privacy_request: | ||
param_values["privacy_request_id"] = self.privacy_request.id | ||
param_values[PRIVACY_REQUEST_ID] = self.privacy_request.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More cleanup, moving these values to constants.
@@ -27,6 +28,8 @@ def zendesk_secrets(saas_config): | |||
"domain": pydash.get(saas_config, "zendesk.domain") or secrets["domain"], | |||
"username": pydash.get(saas_config, "zendesk.username") or secrets["username"], | |||
"api_key": pydash.get(saas_config, "zendesk.api_key") or secrets["api_key"], | |||
"page_size": pydash.get(saas_config, "zendesk.page_size") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been caught before but it actually throws an error now if this value is missing, which is the correct behavior.
@@ -107,6 +110,8 @@ def zendesk_create_erasure_data( | |||
zendesk_connection_config: ConnectionConfig, zendesk_erasure_identity_email: str | |||
) -> None: | |||
|
|||
time.sleep(60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to wait performing the access request or else we will get a rate limit exception when we try to create the erasure data. More robust solution coming up in a future ticket.
|
||
|
||
@pytest.mark.unit_saas | ||
class TestSaaSQueryConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving these tests to their own file.
@@ -137,7 +137,6 @@ async def test_zendesk_access_request_task( | |||
"custom_fields", | |||
"satisfaction_rating", | |||
"sharing_agreement_ids", | |||
"fields", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing assertion for optional fields, these are not always returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this generally looks good to me - nice job taking the opportunity to refactor and improve some things a bit more broadly! my comments are relatively minor - the more substantive ones aren't blockers, but just food for thought.
just noting here that we don't yet have an integration test for the new flatten
/one-to-many relationship, so that's just something to keep in mind once we do actually have our first use case -- it may need some ironing out.
also, i think we probably need some doc updates with this with the new functionality?
f"[{', '.join(invalid_options)}] are not valid options, default_value must be a list of values from [{', '.join(options)}]" | ||
) | ||
|
||
if multiselect and not options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as i was going through this, i started to wonder why we need to tie the multivalue (i.e. "multiselect") support strictly to options
params - why shouldn't we able to support proper lists/multiple values for any param, if we can support it in the case of an option
/enum param?
perhaps there's not a valid use case, or perhaps there's more complexity to supporting it than i'm speculating about here. but i wanted to bring it up at least for consideration.
if it's a valid use case but is not trivial to support here, then i think it may be worth creating a new issue to track it.
note that if we do want to just enable multi value support generally (and not only for options
params), then i think we should probably rename it from multiselect
to something like multivalue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a valid concern but we haven't run into such use-case yet. If a connector param can have multiple values, they are usually from a well defined list, I haven't seen a scenario where we would expect the user to come up with a list of valid values. Plus from a UI/UX perspective, we'd have to figure out how we would expect the user to freehand a list of values, at least by selecting from a list we can re-use our multiselect UI element. I would like to put this on hold for now.
"api_version": "2.0", | ||
"page_size": 10, | ||
"api_key": "letmein", | ||
"account_types": ["checking", "savings", "investment"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to make sure i'm following, are the account_types
not coming into play here because we're going to the mailing_lists
node, which doesn't actually leverage the account_types
param?
what i'm trying to understand is why there are only 3 requests that are formed, if we are flattening the list_id
list (that has 3 elements) and we are providing 3 separate account_types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to demonstrate that we won't generate the product with connector_params that are not in use by the request. I can add a comment to make this clearer.
@@ -72,3 +72,42 @@ def test_default_value_fields( | |||
del saas_example_secrets["domain"] | |||
config = saas_example_secrets | |||
schema.parse_obj(config) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to include a test that confirms an exception is thrown if multiselect
= True
but without an an options
field in the param, since that's supposed to be invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch with this scenario, that's already covered by test_multiselect_without_options
in test_saasconfig.py
. The tests in test_connection_secrets_saas.py
are to test the validation of the secrets themselves, the scenario you are describing is testing the validation of the schema/connector_params.
return [ | ||
dict(zip(merged_dicts.keys(), values)) | ||
for values in product( | ||
*( | ||
value if isinstance(value, list) else [value] | ||
for value in merged_dicts.values() | ||
) | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a mouthful :)
i'm not sure i've got a better way to write it, but it did take me quite a bit of time to understand what's being done here. i guess that's why you've got the example above! :)
@@ -26,6 +26,7 @@ class ParamValue(BaseModel): | |||
identity: Optional[str] | |||
references: Optional[List[FidesopsDatasetReference]] | |||
connector_param: Optional[str] | |||
flatten: Optional[bool] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see why flatten
makes sense as a name for this configuration flag when looking at what's done in the code. but as an end user, i don't really think about the data coming through as a list of lists that would be flattened - but maybe that's because i'm not thinking about it correctly.
the outer list in the code comes from multiple records traversed at once (i think), but the config here really is focusing on how each individual record/traversal should be handled - at least that's how i think of it. so for me, a better name might be unpack
, since you're unpacking the multiple values into individual requests?
again, i may be thinking about this wrong, so please take this with a grain of salt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think unpack is a better name, I agree with your reasoning that we need to describe what is happening to each individual record instead of the entire input list.
# Example: | ||
# | ||
# _generate_product_list({ "first": ["a", "b"] }, { "second": ["1", "2", "3"] }) | ||
# | ||
# Returns: | ||
# [ | ||
# { "first": "a", "second": "1" } | ||
# { "first": "a", "second": "2" } | ||
# { "first": "a", "second": "3" } | ||
# { "first": "b", "second": "1" } | ||
# { "first": "b", "second": "2" } | ||
# { "first": "b", "second": "3" } | ||
# ] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't really like this left as a comment. it's helpful generally, but i think you should either add it to the method docstring or put it as a comment in a test, etc. this gets a bit messy in the source code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had it in the docstring but the formatting wasn't preserved when I hover over the function to read the docstring. Is there a way to preserve formatting? Or is it still worth putting it in the docstring without formatting being preserved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my change in 6c9700a makes things look good on my IDE. that being said, there seem to be some standards for this (e.g. https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html), and i sort of followed them but can't say i followed them to a tee...
anyway, i think this is good enough for now, let me know what you think.
Co-authored-by: Adam Sachs <adam@ethyca.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all looking good now.
Created #1286 for documentation updates (@ethyca/docs-authors FYI)
…#1253) * Improved param_value aggregation to support one-to-many relationships * pylint and mypy fixes * Fixing mypy and unit tests * Adding async await to Shopify test * Fixing tests * Adding tests for generate_product_list * Updating changelog * Fixing tests * Simplifying flatten logic Co-authored-by: Adam Sachs <adam@ethyca.com> * Adding comment to clarify test * Renaming flatten to unpack * Move method example to docstring Co-authored-by: Adam Sachs <adam@ethyca.com> Co-authored-by: Adam Sachs <adam@Adams-MacBook-Pro.local>
Purpose
Improved param value aggregation to support one-to-many/many-to-many relationships between references and connector params.
Changes
ConnectorParam
options
: a list of valid values for the connector param valuesmultiselect
: a boolean which defaults to False indicating that the value is a list of values from theoptions
fieldParamValue
unpack
: a boolean which defaults to False indicating that the values must be unpacked from a list/array before being used to generate the requests[1,2,3] -> 1,2,3
(3 separate requests)SaaSSchema
validator to validate secret values are consistent with the newoption
andmultiselect
connector param fieldsChecklist
CHANGELOG.md
fileCHANGELOG.md
file is being appended toUnreleased
section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.Run Unsafe PR Checks
label has been applied, and checks have passed, if this PR touches any external servicesTicket
Fixes #458