Improved param_value aggregation to support one-to-many relationships #1253

galvana · 2022-09-02T20:41:49Z

Purpose

Improved param value aggregation to support one-to-many/many-to-many relationships between references and connector params.

Changes

Added new fields, validators, and unit tests for ConnectorParam
- options: a list of valid values for the connector param values
- multiselect: a boolean which defaults to False indicating that the value is a list of values from the options field
Added a new field to ParamValue
- unpack: a boolean which defaults to False indicating that the values must be unpacked from a list/array before being used to generate the requests [1,2,3] -> 1,2,3 (3 separate requests)
Updated SaaSSchema validator to validate secret values are consistent with the new option and multiselect connector param fields
Updated the core logic for generating SaaS requests from input values (references and connector_params)
Moved SaaS query config logic and tests into their own files

Checklist

Update CHANGELOG.md file
- Merge in main so the most recent CHANGELOG.md file is being appended to
- Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
- Add a link to this PR at the end of the description with the PR number as the text. example: #1
Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
If docs updated (select one):
- documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
Good unit test/integration test coverage
The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #458

galvana · 2022-09-07T19:58:32Z

data/saas/config/saas_example_config.yml

@@ -38,10 +41,8 @@ saas_config:
                - dataset: saas_connector_example
                  field: conversations.id
                  direction: from
+          data_path: conversation_messages


Unrelated to our changes but I cleaned this up since I was here to add some new collections for testing.

galvana · 2022-09-07T20:15:47Z

data/saas/config/stripe_config.yml

@@ -85,8 +85,6 @@ saas_config:
          query_params:
            - name: charge
              value: <charge_id>
-            - name: payment_intent


Removed payment_intent since the way we are building the requests now causes this to be invalid, the Stripe API doesn't let us specify both params at once. The result is still the same since we have a list of disputes that can be identified either by charge_id or payment_intent_id.

galvana · 2022-09-07T20:17:51Z

src/fidesops/ops/schemas/connection_configuration/connection_secrets_saas.py

        model: Type[SaaSSchema] = create_model(
            f"{self.saas_config.type}_schema",
            **field_definitions,
            __base__=SaaSSchema,
+            _connector_params=PrivateAttr(


This was be best way I could find to dynamically set a property on the base class. I didn't include this with the other field_definitions since that is only used to describe the secret fields.

galvana · 2022-09-07T20:18:51Z

src/fidesops/ops/service/connectors/saas_query_config.py

                )

-        # Build SaaS requests for fields that are dependent on each other


The grouped_input scenario is now covered by the same general approach above.

galvana · 2022-09-07T20:19:46Z

src/fidesops/ops/service/connectors/saas_query_config.py

        return request_params

+    def _filtered_secrets(self, current_request: SaaSRequest) -> Dict[str, Any]:


We don't want to create a product of reference values with connector params that are not used by the request, so we filter to only keep the connector params in use.

galvana · 2022-09-07T20:20:12Z

src/fidesops/ops/service/connectors/saas_query_config.py


        if self.privacy_request:
-            param_values["privacy_request_id"] = self.privacy_request.id
+            param_values[PRIVACY_REQUEST_ID] = self.privacy_request.id


More cleanup, moving these values to constants.

galvana · 2022-09-07T20:21:13Z

tests/ops/fixtures/saas/zendesk_fixtures.py

@@ -27,6 +28,8 @@ def zendesk_secrets(saas_config):
        "domain": pydash.get(saas_config, "zendesk.domain") or secrets["domain"],
        "username": pydash.get(saas_config, "zendesk.username") or secrets["username"],
        "api_key": pydash.get(saas_config, "zendesk.api_key") or secrets["api_key"],
+        "page_size": pydash.get(saas_config, "zendesk.page_size")


This should have been caught before but it actually throws an error now if this value is missing, which is the correct behavior.

galvana · 2022-09-07T20:22:01Z

tests/ops/fixtures/saas/zendesk_fixtures.py

@@ -107,6 +110,8 @@ def zendesk_create_erasure_data(
    zendesk_connection_config: ConnectionConfig, zendesk_erasure_identity_email: str
 ) -> None:

+    time.sleep(60)


We need to wait performing the access request or else we will get a rate limit exception when we try to create the erasure data. More robust solution coming up in a future ticket.

galvana · 2022-09-07T20:23:20Z

tests/ops/service/connectors/test_saas_queryconfig.py

+
+
+@pytest.mark.unit_saas
+class TestSaaSQueryConfig:


Moving these tests to their own file.

galvana · 2022-09-07T20:23:51Z

tests/ops/integration_tests/saas/test_zendesk_task.py

@@ -137,7 +137,6 @@ async def test_zendesk_access_request_task(
            "custom_fields",
            "satisfaction_rating",
            "sharing_agreement_ids",
-            "fields",


Removing assertion for optional fields, these are not always returned.

adamsachs

this generally looks good to me - nice job taking the opportunity to refactor and improve some things a bit more broadly! my comments are relatively minor - the more substantive ones aren't blockers, but just food for thought.

just noting here that we don't yet have an integration test for the new flatten/one-to-many relationship, so that's just something to keep in mind once we do actually have our first use case -- it may need some ironing out.

also, i think we probably need some doc updates with this with the new functionality?

adamsachs · 2022-09-08T18:25:47Z

src/fidesops/ops/schemas/saas/saas_config.py

+                        f"[{', '.join(invalid_options)}] are not valid options, default_value must be a list of values from [{', '.join(options)}]"
+                    )
+
+        if multiselect and not options:


as i was going through this, i started to wonder why we need to tie the multivalue (i.e. "multiselect") support strictly to options params - why shouldn't we able to support proper lists/multiple values for any param, if we can support it in the case of an option/enum param?

perhaps there's not a valid use case, or perhaps there's more complexity to supporting it than i'm speculating about here. but i wanted to bring it up at least for consideration.

if it's a valid use case but is not trivial to support here, then i think it may be worth creating a new issue to track it.

note that if we do want to just enable multi value support generally (and not only for options params), then i think we should probably rename it from multiselect to something like multivalue

I think this is a valid concern but we haven't run into such use-case yet. If a connector param can have multiple values, they are usually from a well defined list, I haven't seen a scenario where we would expect the user to come up with a list of valid values. Plus from a UI/UX perspective, we'd have to figure out how we would expect the user to freehand a list of values, at least by selecting from a list we can re-use our multiselect UI element. I would like to put this on hold for now.

src/fidesops/ops/service/connectors/saas_query_config.py

adamsachs · 2022-09-08T19:08:20Z

tests/ops/service/connectors/test_saas_queryconfig.py

+                "api_version": "2.0",
+                "page_size": 10,
+                "api_key": "letmein",
+                "account_types": ["checking", "savings", "investment"],


just to make sure i'm following, are the account_types not coming into play here because we're going to the mailing_lists node, which doesn't actually leverage the account_types param?

what i'm trying to understand is why there are only 3 requests that are formed, if we are flattening the list_id list (that has 3 elements) and we are providing 3 separate account_types

This is to demonstrate that we won't generate the product with connector_params that are not in use by the request. I can add a comment to make this clearer.

adamsachs · 2022-09-08T19:15:47Z

tests/ops/schemas/connection_configuration/test_connection_secrets_saas.py

@@ -72,3 +72,42 @@ def test_default_value_fields(
        del saas_example_secrets["domain"]
        config = saas_example_secrets
        schema.parse_obj(config)
+


might be good to include a test that confirms an exception is thrown if multiselect = True but without an an options field in the param, since that's supposed to be invalid?

Nice catch with this scenario, that's already covered by test_multiselect_without_options in test_saasconfig.py. The tests in test_connection_secrets_saas.py are to test the validation of the secrets themselves, the scenario you are describing is testing the validation of the schema/connector_params.

adamsachs · 2022-09-08T20:28:12Z

src/fidesops/ops/service/connectors/saas_query_config.py

+        return [
+            dict(zip(merged_dicts.keys(), values))
+            for values in product(
+                *(
+                    value if isinstance(value, list) else [value]
+                    for value in merged_dicts.values()
+                )
+            )
+        ]


this is a mouthful :)

i'm not sure i've got a better way to write it, but it did take me quite a bit of time to understand what's being done here. i guess that's why you've got the example above! :)

adamsachs · 2022-09-08T22:52:13Z

src/fidesops/ops/schemas/saas/saas_config.py

@@ -26,6 +26,7 @@ class ParamValue(BaseModel):
    identity: Optional[str]
    references: Optional[List[FidesopsDatasetReference]]
    connector_param: Optional[str]
+    flatten: Optional[bool] = False


i see why flatten makes sense as a name for this configuration flag when looking at what's done in the code. but as an end user, i don't really think about the data coming through as a list of lists that would be flattened - but maybe that's because i'm not thinking about it correctly.

the outer list in the code comes from multiple records traversed at once (i think), but the config here really is focusing on how each individual record/traversal should be handled - at least that's how i think of it. so for me, a better name might be unpack, since you're unpacking the multiple values into individual requests?

again, i may be thinking about this wrong, so please take this with a grain of salt.

I think unpack is a better name, I agree with your reasoning that we need to describe what is happening to each individual record instead of the entire input list.

adamsachs · 2022-09-08T22:53:23Z

src/fidesops/ops/service/connectors/saas_query_config.py

+        # Example:
+        #
+        # _generate_product_list({ "first": ["a", "b"] }, { "second": ["1", "2", "3"] })
+        #
+        # Returns:
+        # [
+        #     { "first": "a", "second": "1" }
+        #     { "first": "a", "second": "2" }
+        #     { "first": "a", "second": "3" }
+        #     { "first": "b", "second": "1" }
+        #     { "first": "b", "second": "2" }
+        #     { "first": "b", "second": "3" }
+        # ]
+


i don't really like this left as a comment. it's helpful generally, but i think you should either add it to the method docstring or put it as a comment in a test, etc. this gets a bit messy in the source code here.

I originally had it in the docstring but the formatting wasn't preserved when I hover over the function to read the docstring. Is there a way to preserve formatting? Or is it still worth putting it in the docstring without formatting being preserved?

my change in 6c9700a makes things look good on my IDE. that being said, there seem to be some standards for this (e.g. https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html), and i sort of followed them but can't say i followed them to a tee...

anyway, i think this is good enough for now, let me know what you think.

Co-authored-by: Adam Sachs <adam@ethyca.com>

adamsachs

This is all looking good now.

Created #1286 for documentation updates (@ethyca/docs-authors FYI)

…#1253) * Improved param_value aggregation to support one-to-many relationships * pylint and mypy fixes * Fixing mypy and unit tests * Adding async await to Shopify test * Fixing tests * Adding tests for generate_product_list * Updating changelog * Fixing tests * Simplifying flatten logic Co-authored-by: Adam Sachs <adam@ethyca.com> * Adding comment to clarify test * Renaming flatten to unpack * Move method example to docstring Co-authored-by: Adam Sachs <adam@ethyca.com> Co-authored-by: Adam Sachs <adam@Adams-MacBook-Pro.local>

Improved param_value aggregation to support one-to-many relationships

52b1545

galvana linked an issue Sep 2, 2022 that may be closed by this pull request

allow for multi-value (i.e. one-to-many or many-to-many) references between collections/endpoints #458

Closed

galvana added 2 commits September 2, 2022 13:55

pylint and mypy fixes

d7bd0e4

Fixing mypy and unit tests

017abbd

galvana added the run unsafe ci checks Triggers running of unsafe CI checks label Sep 2, 2022

Adding async await to Shopify test

1c3a3de

galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 2, 2022

galvana added 2 commits September 2, 2022 22:53

Merge branch 'main' into 458-one-to-many-references

1b6c644

Fixing tests

1968d1e

galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 3, 2022

galvana added 3 commits September 6, 2022 15:53

Merging in main

bd5e69a

Adding tests for generate_product_list

2b47001

Updating changelog

06d3ef8

galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 6, 2022

Fixing tests

1876045

galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 7, 2022

Merge branch 'main' into 458-one-to-many-references

a8e117e

galvana marked this pull request as ready for review September 7, 2022 19:10

galvana added run unsafe ci checks Triggers running of unsafe CI checks and removed run unsafe ci checks Triggers running of unsafe CI checks labels Sep 7, 2022

galvana commented Sep 7, 2022

View reviewed changes

galvana requested a review from adamsachs September 7, 2022 20:24

adamsachs suggested changes Sep 8, 2022

View reviewed changes

galvana and others added 3 commits September 8, 2022 16:29

Simplifying flatten logic

42e0e80

Co-authored-by: Adam Sachs <adam@ethyca.com>

Adding comment to clarify test

6916848

Renaming flatten to unpack

49c5e94

galvana requested a review from adamsachs September 8, 2022 23:56

galvana and others added 2 commits September 8, 2022 16:57

Merge branch 'main' into 458-one-to-many-references

0cf190e

Move method example to docstring

6c9700a

adamsachs mentioned this pull request Sep 9, 2022

Document new saas param configs: options, multiselect and unpack #1286

Open

adamsachs approved these changes Sep 9, 2022

View reviewed changes

adamsachs merged commit df58807 into main Sep 9, 2022

adamsachs deleted the 458-one-to-many-references branch September 9, 2022 11:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved param_value aggregation to support one-to-many relationships #1253

Improved param_value aggregation to support one-to-many relationships #1253

galvana commented Sep 2, 2022 •

edited by adamsachs

Loading

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

galvana Sep 7, 2022

adamsachs left a comment

adamsachs Sep 8, 2022

galvana Sep 8, 2022

adamsachs Sep 8, 2022

galvana Sep 8, 2022

adamsachs Sep 8, 2022

galvana Sep 8, 2022

adamsachs Sep 8, 2022

adamsachs Sep 8, 2022

galvana Sep 8, 2022

adamsachs Sep 8, 2022

galvana Sep 8, 2022

adamsachs Sep 9, 2022

adamsachs left a comment

		)

		# Build SaaS requests for fields that are dependent on each other

		return request_params

		def _filtered_secrets(self, current_request: SaaSRequest) -> Dict[str, Any]:

Improved param_value aggregation to support one-to-many relationships #1253

Improved param_value aggregation to support one-to-many relationships #1253

Conversation

galvana commented Sep 2, 2022 • edited by adamsachs Loading

Purpose

Changes

Checklist

Ticket

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamsachs left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamsachs left a comment

Choose a reason for hiding this comment

galvana commented Sep 2, 2022 •

edited by adamsachs

Loading