Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Improved param_value aggregation to support one-to-many relationships #1253

Merged
merged 16 commits into from
Sep 9, 2022

Conversation

galvana
Copy link
Collaborator

@galvana galvana commented Sep 2, 2022

Purpose

Improved param value aggregation to support one-to-many/many-to-many relationships between references and connector params.

Changes

  • Added new fields, validators, and unit tests for ConnectorParam
    • options: a list of valid values for the connector param values
    • multiselect: a boolean which defaults to False indicating that the value is a list of values from the options field
  • Added a new field to ParamValue
    • unpack: a boolean which defaults to False indicating that the values must be unpacked from a list/array before being used to generate the requests [1,2,3] -> 1,2,3 (3 separate requests)
  • Updated SaaSSchema validator to validate secret values are consistent with the new option and multiselect connector param fields
  • Updated the core logic for generating SaaS requests from input values (references and connector_params)
  • Moved SaaS query config logic and tests into their own files

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
  • Good unit test/integration test coverage
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #458

@galvana galvana added the run unsafe ci checks Triggers running of unsafe CI checks label Sep 2, 2022
@galvana galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 2, 2022
@galvana galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 3, 2022
@galvana galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 6, 2022
@galvana galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 7, 2022
@galvana galvana marked this pull request as ready for review September 7, 2022 19:10
@galvana galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 7, 2022
@@ -38,10 +41,8 @@ saas_config:
- dataset: saas_connector_example
field: conversations.id
direction: from
data_path: conversation_messages
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to our changes but I cleaned this up since I was here to add some new collections for testing.

@@ -85,8 +85,6 @@ saas_config:
query_params:
- name: charge
value: <charge_id>
- name: payment_intent
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed payment_intent since the way we are building the requests now causes this to be invalid, the Stripe API doesn't let us specify both params at once. The result is still the same since we have a list of disputes that can be identified either by charge_id or payment_intent_id.

model: Type[SaaSSchema] = create_model(
f"{self.saas_config.type}_schema",
**field_definitions,
__base__=SaaSSchema,
_connector_params=PrivateAttr(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was be best way I could find to dynamically set a property on the base class. I didn't include this with the other field_definitions since that is only used to describe the secret fields.

)

# Build SaaS requests for fields that are dependent on each other
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grouped_input scenario is now covered by the same general approach above.

return request_params

def _filtered_secrets(self, current_request: SaaSRequest) -> Dict[str, Any]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to create a product of reference values with connector params that are not used by the request, so we filter to only keep the connector params in use.


if self.privacy_request:
param_values["privacy_request_id"] = self.privacy_request.id
param_values[PRIVACY_REQUEST_ID] = self.privacy_request.id
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More cleanup, moving these values to constants.

@@ -27,6 +28,8 @@ def zendesk_secrets(saas_config):
"domain": pydash.get(saas_config, "zendesk.domain") or secrets["domain"],
"username": pydash.get(saas_config, "zendesk.username") or secrets["username"],
"api_key": pydash.get(saas_config, "zendesk.api_key") or secrets["api_key"],
"page_size": pydash.get(saas_config, "zendesk.page_size")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been caught before but it actually throws an error now if this value is missing, which is the correct behavior.

@@ -107,6 +110,8 @@ def zendesk_create_erasure_data(
zendesk_connection_config: ConnectionConfig, zendesk_erasure_identity_email: str
) -> None:

time.sleep(60)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to wait performing the access request or else we will get a rate limit exception when we try to create the erasure data. More robust solution coming up in a future ticket.



@pytest.mark.unit_saas
class TestSaaSQueryConfig:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving these tests to their own file.

@@ -137,7 +137,6 @@ async def test_zendesk_access_request_task(
"custom_fields",
"satisfaction_rating",
"sharing_agreement_ids",
"fields",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing assertion for optional fields, these are not always returned.

Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this generally looks good to me - nice job taking the opportunity to refactor and improve some things a bit more broadly! my comments are relatively minor - the more substantive ones aren't blockers, but just food for thought.

just noting here that we don't yet have an integration test for the new flatten/one-to-many relationship, so that's just something to keep in mind once we do actually have our first use case -- it may need some ironing out.

also, i think we probably need some doc updates with this with the new functionality?

f"[{', '.join(invalid_options)}] are not valid options, default_value must be a list of values from [{', '.join(options)}]"
)

if multiselect and not options:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i was going through this, i started to wonder why we need to tie the multivalue (i.e. "multiselect") support strictly to options params - why shouldn't we able to support proper lists/multiple values for any param, if we can support it in the case of an option/enum param?

perhaps there's not a valid use case, or perhaps there's more complexity to supporting it than i'm speculating about here. but i wanted to bring it up at least for consideration.

if it's a valid use case but is not trivial to support here, then i think it may be worth creating a new issue to track it.

note that if we do want to just enable multi value support generally (and not only for options params), then i think we should probably rename it from multiselect to something like multivalue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a valid concern but we haven't run into such use-case yet. If a connector param can have multiple values, they are usually from a well defined list, I haven't seen a scenario where we would expect the user to come up with a list of valid values. Plus from a UI/UX perspective, we'd have to figure out how we would expect the user to freehand a list of values, at least by selecting from a list we can re-use our multiselect UI element. I would like to put this on hold for now.

src/fidesops/ops/service/connectors/saas_query_config.py Outdated Show resolved Hide resolved
"api_version": "2.0",
"page_size": 10,
"api_key": "letmein",
"account_types": ["checking", "savings", "investment"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make sure i'm following, are the account_types not coming into play here because we're going to the mailing_lists node, which doesn't actually leverage the account_types param?

what i'm trying to understand is why there are only 3 requests that are formed, if we are flattening the list_id list (that has 3 elements) and we are providing 3 separate account_types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to demonstrate that we won't generate the product with connector_params that are not in use by the request. I can add a comment to make this clearer.

@@ -72,3 +72,42 @@ def test_default_value_fields(
del saas_example_secrets["domain"]
config = saas_example_secrets
schema.parse_obj(config)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to include a test that confirms an exception is thrown if multiselect = True but without an an options field in the param, since that's supposed to be invalid?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch with this scenario, that's already covered by test_multiselect_without_options in test_saasconfig.py. The tests in test_connection_secrets_saas.py are to test the validation of the secrets themselves, the scenario you are describing is testing the validation of the schema/connector_params.

Comment on lines +192 to +200
return [
dict(zip(merged_dicts.keys(), values))
for values in product(
*(
value if isinstance(value, list) else [value]
for value in merged_dicts.values()
)
)
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a mouthful :)

i'm not sure i've got a better way to write it, but it did take me quite a bit of time to understand what's being done here. i guess that's why you've got the example above! :)

@@ -26,6 +26,7 @@ class ParamValue(BaseModel):
identity: Optional[str]
references: Optional[List[FidesopsDatasetReference]]
connector_param: Optional[str]
flatten: Optional[bool] = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see why flatten makes sense as a name for this configuration flag when looking at what's done in the code. but as an end user, i don't really think about the data coming through as a list of lists that would be flattened - but maybe that's because i'm not thinking about it correctly.

the outer list in the code comes from multiple records traversed at once (i think), but the config here really is focusing on how each individual record/traversal should be handled - at least that's how i think of it. so for me, a better name might be unpack, since you're unpacking the multiple values into individual requests?

again, i may be thinking about this wrong, so please take this with a grain of salt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unpack is a better name, I agree with your reasoning that we need to describe what is happening to each individual record instead of the entire input list.

Comment on lines 177 to 190
# Example:
#
# _generate_product_list({ "first": ["a", "b"] }, { "second": ["1", "2", "3"] })
#
# Returns:
# [
# { "first": "a", "second": "1" }
# { "first": "a", "second": "2" }
# { "first": "a", "second": "3" }
# { "first": "b", "second": "1" }
# { "first": "b", "second": "2" }
# { "first": "b", "second": "3" }
# ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really like this left as a comment. it's helpful generally, but i think you should either add it to the method docstring or put it as a comment in a test, etc. this gets a bit messy in the source code here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had it in the docstring but the formatting wasn't preserved when I hover over the function to read the docstring. Is there a way to preserve formatting? Or is it still worth putting it in the docstring without formatting being preserved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my change in 6c9700a makes things look good on my IDE. that being said, there seem to be some standards for this (e.g. https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html), and i sort of followed them but can't say i followed them to a tee...

anyway, i think this is good enough for now, let me know what you think.

Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all looking good now.

Created #1286 for documentation updates (@ethyca/docs-authors FYI)

@adamsachs adamsachs merged commit df58807 into main Sep 9, 2022
@adamsachs adamsachs deleted the 458-one-to-many-references branch September 9, 2022 11:55
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
…#1253)

* Improved param_value aggregation to support one-to-many relationships

* pylint and mypy fixes

* Fixing mypy and unit tests

* Adding async await to Shopify test

* Fixing tests

* Adding tests for generate_product_list

* Updating changelog

* Fixing tests

* Simplifying flatten logic

Co-authored-by: Adam Sachs <adam@ethyca.com>

* Adding comment to clarify test

* Renaming flatten to unpack

* Move method example to docstring

Co-authored-by: Adam Sachs <adam@ethyca.com>
Co-authored-by: Adam Sachs <adam@Adams-MacBook-Pro.local>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
run unsafe ci checks Triggers running of unsafe CI checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow for multi-value (i.e. one-to-many or many-to-many) references between collections/endpoints
2 participants