-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send config events #75
Merged
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
15c80f9
Send initialisation event with version and config
asatwal f137c01
Update docs for anynonimsing data and pushing tags to github
asatwal b08cdc2
Actioned review comments
asatwal 6b986dc
Action review comment: change terminlogy anonymise -> pseudonymise
asatwal 75d7fb7
Delay sending of initialise event until after startup and until first…
asatwal fe4859f
Remove unecessary spec expectations when initialise event not posted
asatwal File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ | |
|
||
**👉 Send every web request and database update to BigQuery** | ||
|
||
**✋ Skip or anonymise fields containing PII** | ||
**✋ Skip or pseudonymise fields containing PII**. For an explanation of pseudonymisation, see [ICO Guidance](https://ico.org.uk/media/about-the-ico/consultations/4019579/chapter-3-anonymisation-guidance.pdf) | ||
|
||
**✌️ Configure and forget** | ||
|
||
|
@@ -250,15 +250,27 @@ If a field other than `id` is required for the user identifier, then a custom us | |
DfE::Analytics.config.user_identifier = proc { |user| user&.id } | ||
``` | ||
|
||
#### User ID anonymisation | ||
#### User ID pseudonymisation | ||
|
||
The `user_id` in the web request event will not be anonymised by default. This can be changed by updating the configuration option in `config/initializers/dfe_analytics.rb`: | ||
The `user_id` in the web request event will not be pseudonymised by default. This can be changed by updating the configuration option in `config/initializers/dfe_analytics.rb`: | ||
|
||
```ruby | ||
DfE::Analytics.config.anonymise_web_request_user_id = false | ||
DfE::Analytics.config.pseudonymise_web_request_user_id = false | ||
``` | ||
|
||
Anonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` anonymisation would *not* be required so that the IDs could still be joined together. | ||
Pseudonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` pseudonymisation would *not* be required so that the IDs could still be joined together. | ||
|
||
### Data Pseudonymisation Algorithm | ||
|
||
Generally all PII data should be pseudonymised, including data that directly or indirect references PII, for example database IDs. | ||
|
||
The `dfe-analytics` gem also pseudonymises such data, if it is configured to do so. If you are pseudonymising database IDs in your code (in custom events for example), then you should use the same hashing algorithm for pseudonymisation that the gem uses in order to allow joining of pseudonymised data across different database tables. | ||
|
||
The following method should be used in your code for pseudonymisation: | ||
|
||
```ruby | ||
DfE::Analytics.pseudonymise(value) | ||
``` | ||
|
||
### Adding specs | ||
|
||
|
@@ -432,8 +444,11 @@ Please note that page caching is project specific and each project must carefull | |
> It could be nice to have tests to prove that connectivity to GCP still works after an update, but we aren't setup for that yet. | ||
3. (Optional) Verify committed `CHANGELOG.md` changes and alter if necessary: `git show` | ||
4. Push the branch: `git push origin v${NEW_VERSION}-release`, e.g. `git push origin v1.3.0-release` | ||
5. Push the tags: `git push --tags` | ||
6. Cut a PR on GitHub with the label `version-release`, and merge once approved | ||
5. Cut a PR on GitHub with the label `version-release`, and wait for approval | ||
6. Once the PR is approved push the tags, immediately prior to merging: `git push --tags` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you — that was bugging me! |
||
7. Merge the PR. | ||
|
||
IMPORTANT: Pushing the tags will immediately make the release available even on a unmerged branch. Therefore, push the tags to Github only when the PR is approved and immediately prior to merging the PR. | ||
|
||
## License | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# frozen_string_literal: true | ||
|
||
module DfE | ||
module Analytics | ||
# DfE Analytics initialisation event | ||
# - Event should only be sent once, but NOT on startup as this causes errors on some services | ||
# - Event contains the dfe analytics version, config and other items | ||
class Initialise | ||
# Disable rubocop class variable warnings for class - class variable required to control sending of event | ||
# rubocop:disable Style:ClassVars | ||
@@initialise_event_sent = false # rubocop:disable Style:ClassVars | ||
|
||
def self.trigger_initialise_event | ||
new.send_initialise_event | ||
end | ||
|
||
def self.initialise_event_sent? | ||
@@initialise_event_sent | ||
end | ||
|
||
def self.initialise_event_sent=(value) | ||
@@initialise_event_sent = value # rubocop:disable Style:ClassVars | ||
end | ||
|
||
def send_initialise_event | ||
return unless DfE::Analytics.enabled? | ||
|
||
initialise_event = DfE::Analytics::Event.new | ||
.with_type('initialise_analytics') | ||
.with_data(initialisation_data) | ||
.as_json | ||
|
||
if DfE::Analytics.async? | ||
DfE::Analytics::SendEvents.perform_later([initialise_event]) | ||
else | ||
DfE::Analytics::SendEvents.perform_now([initialise_event]) | ||
end | ||
|
||
@@initialise_event_sent = true # rubocop:disable Style:ClassVars | ||
end | ||
|
||
private | ||
|
||
def initialisation_data | ||
{ | ||
analytics_version: DfE::Analytics::VERSION, | ||
config: { | ||
pseudonymise_web_request_user_id: DfE::Analytics.config.pseudonymise_web_request_user_id | ||
} | ||
} | ||
end | ||
# rubocop:enable Style:ClassVars | ||
end | ||
end | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# frozen_string_literal: true | ||
|
||
RSpec.describe DfE::Analytics::Initialise do | ||
before do | ||
allow(DfE::Analytics::SendEvents).to receive(:perform_later) | ||
allow(DfE::Analytics).to receive(:enabled?).and_return(true) | ||
end | ||
|
||
describe 'trigger_initialise_event ' do | ||
it 'includes the expected attributes' do | ||
described_class.trigger_initialise_event | ||
|
||
expect(DfE::Analytics::SendEvents).to have_received(:perform_later) | ||
.with([a_hash_including({ | ||
'event_type' => 'initialise_analytics', | ||
'data' => [ | ||
{ 'key' => 'analytics_version', 'value' => [DfE::Analytics::VERSION] }, | ||
{ 'key' => 'config', | ||
'value' => ['{"pseudonymise_web_request_user_id":false}'] } | ||
] | ||
})]) | ||
end | ||
end | ||
|
||
describe '.initialise_event_sent=' do | ||
it 'allows setting of the class variable' do | ||
described_class.initialise_event_sent = true | ||
expect(described_class.initialise_event_sent?).to eq(true) | ||
end | ||
end | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asatwal worth adding a note here to say that the GoogleSQL equivalent of this is
TO_HEX(SHA256(value))
, and thus SQL can be used to anonymise data, but (obviously) not to de-anonymise it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that could usefully go into a comment in the
anonymise
method? I like that callers just useanonymise
and don't have to worry about the algorithm. Not that we're likely to change it, but it feels hygienic.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I'm trying to keep too much technical details out of the READ.me anyway as it's getting very long.