Send config events #75

asatwal · 2023-05-18T18:38:02Z

Send dfe analytics version and required config items from DfE Analytics to BigQuery as an "analytics_initialise" event.

The "analytics_initialise" is a one-off on startup event.

Also, some miscellaneous READ.me updates:

Document the anonymise data method available
A further information and instructions on pushing tags

stevenleggdfe · 2023-05-19T07:56:18Z

README.md

@@ -260,6 +260,18 @@ DfE::Analytics.config.anonymise_web_request_user_id = false

 Anonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` anonymisation would *not* be required so that the IDs could still be joined together.

+### Data Anonymisation Algorithm


@asatwal should we label this pseudonymisation rather than anonymisation to match ICO terminology? Technically the approach we take is not true anonymisation, so under GDPR is still PII. The advantage is masking which helps with security, rather than eliminating the need for it.

And also in other places where we've used the term.

See https://ico.org.uk/media/about-the-ico/consultations/4019579/chapter-3-anonymisation-guidance.pdf for more info

@stevenleggdfe agree that pseudonymisation would be better terminology. However, historically it's been called anonymisation in the gem and there are methods such as DfE::Analytics.anonymise that are used externally and config variables that would need to be updated as well as all the other documentation. For consistency we would have to update all these and this would be part of a bigger change with external dependencies, so I don't want to tackle this just yet.

Strictly speaking, this is just a hashing algorithm, so I would say it's more correct to call this "hashing" rather than "anonymisation", but that's a different story.

@stevenleggdfe I decided to action this review comment in the end as this is a sensitive and important area.

The only item I didn't change is the anonymised_user_agent_and_ip field in event data for web requests. I thought this item would require a big migration and also this data doesn't identify a user right ? Only a user session ?

I've kept the DfE:Analytics.anonymise for backwards compatibility. We can deprecate in future releases. I believe the only team that use this is tv, although I think it's best to send out deprecation warnings and remove in a more release friendly way.

Sounds good to me - thank you!

stevenleggdfe · 2023-05-19T07:57:13Z

README.md

+
+The `dfe-analytics` gem also anonymises such data, if it is configured to do so. If you are anonymising database IDs in your code (in custom events for example), then you should uses the same hashing algorithm for anonymisation that the gem uses.
+
+The following method should be used in your code for anonymisation:


@asatwal worth explaining why? i.e. "... to ensure that pseudonymised data can still be joined to other pseudonymised data"

stevenleggdfe · 2023-05-19T07:58:20Z

README.md

+```ruby
+DfE::Analytics.anonymise(value)
+```
+


@asatwal worth adding a note here to say that the GoogleSQL equivalent of this is TO_HEX(SHA256(value)), and thus SQL can be used to anonymise data, but (obviously) not to de-anonymise it?

I think that could usefully go into a comment in the anonymise method? I like that callers just use anonymise and don't have to worry about the algorithm. Not that we're likely to change it, but it feels hygienic.

Agree. I'm trying to keep too much technical details out of the READ.me anyway as it's getting very long.

stevenleggdfe · 2023-05-19T08:01:15Z

lib/dfe/analytics/event.rb

+        update_entity
+        delete_entity
+        import_entity
+        analytics_initialise


@asatwal maybe ignore this (!) but given we've used verb_noun form for the entity event types what would you think of using initialise_analytics for this new event type for consistency?

@stevenleggdfe yes initialise_analytics sounds better !

duncanjbrown · 2023-05-19T09:19:24Z

README.md

@@ -260,6 +260,18 @@ DfE::Analytics.config.anonymise_web_request_user_id = false

 Anonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` anonymisation would *not* be required so that the IDs could still be joined together.

+### Data Anonymisation Algorithm


duncanjbrown · 2023-05-19T09:21:41Z

README.md

+```ruby
+DfE::Analytics.anonymise(value)
+```
+


I think that could usefully go into a comment in the anonymise method? I like that callers just use anonymise and don't have to worry about the algorithm. Not that we're likely to change it, but it feels hygienic.

duncanjbrown · 2023-05-19T09:22:04Z

README.md

-5. Push the tags: `git push --tags`
-6. Cut a PR on GitHub with the label `version-release`, and merge once approved
+5. Cut a PR on GitHub with the label `version-release`, and wait for approval 
+6. Once the PR is approved push the tags, immediately prior to merging: `git push --tags`


Thank you — that was bugging me!

duncanjbrown · 2023-05-19T09:23:30Z

lib/dfe/analytics.rb

@@ -103,6 +104,9 @@ def self.initialize!
          model.include(DfE::Analytics::Entities)
        end
      end
+
+      DfE::Analytics::Initialise.trigger_initialise_event


We already have an init routine in railtie.rb — could we use that?

In railtie.rb, I find the App isn't fully initialised, although it depends on where in railtie the methods are called. I suppose it could be called in the after_initialize hook after the DfE::Analytics.initialize!call, after which the config is fully initialised, as we need the config loaded before calling trigger_initialise_event.

I'll give that a try.

So I did try the suggestion of moving to railtie.rb and it does work. However, it's not possible to have collaboration tests, so if the call to DfE::Analytics.initialise.trigger_initialise_event ever got removed, the tests would not pick that up. For this reason I've left the trigger called in DfE::Analytics.initialize!.

duncanjbrown · 2023-05-19T09:23:41Z

lib/dfe/analytics/event.rb

+        update_entity
+        delete_entity
+        import_entity
+        analytics_initialise


duncanjbrown · 2023-05-19T09:24:15Z

lib/dfe/analytics/initialise.rb

+
+      private
+
+      def initialise_data


is this initialisation_data?

… event triggered

duncanjbrown

Thanks @asatwal !

asatwal added 2 commits May 18, 2023 18:38

Send initialisation event with version and config

15c80f9

Update docs for anynonimsing data and pushing tags to github

f137c01

asatwal requested a review from duncanjbrown May 18, 2023 18:38

asatwal self-assigned this May 18, 2023

stevenleggdfe reviewed May 19, 2023

View reviewed changes

duncanjbrown reviewed May 19, 2023

View reviewed changes

asatwal added 4 commits May 19, 2023 11:35

Actioned review comments

b08cdc2

Action review comment: change terminlogy anonymise -> pseudonymise

6b986dc

Delay sending of initialise event until after startup and until first…

75d7fb7

… event triggered

Remove unecessary spec expectations when initialise event not posted

fe4859f

duncanjbrown approved these changes May 26, 2023

View reviewed changes

asatwal merged commit cbef517 into main May 26, 2023

asatwal deleted the send-config-events branch May 26, 2023 20:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Send config events #75

Send config events #75

asatwal commented May 18, 2023 •

edited

Loading

stevenleggdfe May 19, 2023

duncanjbrown May 19, 2023

asatwal May 19, 2023

asatwal May 22, 2023

stevenleggdfe May 23, 2023

stevenleggdfe May 19, 2023

stevenleggdfe May 19, 2023

duncanjbrown May 19, 2023

asatwal May 19, 2023 •

edited

Loading

stevenleggdfe May 19, 2023

duncanjbrown May 19, 2023

asatwal May 19, 2023

duncanjbrown May 19, 2023

duncanjbrown May 19, 2023

duncanjbrown May 19, 2023

duncanjbrown May 19, 2023

asatwal May 19, 2023 •

edited

Loading

duncanjbrown May 19, 2023

duncanjbrown May 19, 2023

duncanjbrown left a comment

		@@ -260,6 +260,18 @@ DfE::Analytics.config.anonymise_web_request_user_id = false

		Anonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` anonymisation would not be required so that the IDs could still be joined together.

		### Data Anonymisation Algorithm


		The `dfe-analytics` gem also anonymises such data, if it is configured to do so. If you are anonymising database IDs in your code (in custom events for example), then you should uses the same hashing algorithm for anonymisation that the gem uses.

		The following method should be used in your code for anonymisation:

Send config events #75

Send config events #75

Conversation

asatwal commented May 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

asatwal May 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

asatwal May 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

duncanjbrown left a comment

Choose a reason for hiding this comment

asatwal commented May 18, 2023 •

edited

Loading

asatwal May 19, 2023 •

edited

Loading

asatwal May 19, 2023 •

edited

Loading