-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send config events #75
Changes from 2 commits
15c80f9
f137c01
b08cdc2
6b986dc
75d7fb7
fe4859f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -260,6 +260,18 @@ DfE::Analytics.config.anonymise_web_request_user_id = false | |
|
||
Anonymisation of `user_id` would be required if the source field in the schema is in `analytics_pii.yml` so that analysts can join the IDs together. If the `user_id` is not in `analytics_pii.yml` but is in `analytics.yml` then `user_id` anonymisation would *not* be required so that the IDs could still be joined together. | ||
|
||
### Data Anonymisation Algorithm | ||
|
||
Generally all PII data should be anonymised, including data that directly or indirect references PII, for example database IDs. | ||
|
||
The `dfe-analytics` gem also anonymises such data, if it is configured to do so. If you are anonymising database IDs in your code (in custom events for example), then you should uses the same hashing algorithm for anonymisation that the gem uses. | ||
|
||
The following method should be used in your code for anonymisation: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @asatwal worth explaining why? i.e. "... to ensure that pseudonymised data can still be joined to other pseudonymised data" |
||
|
||
```ruby | ||
DfE::Analytics.anonymise(value) | ||
``` | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @asatwal worth adding a note here to say that the GoogleSQL equivalent of this is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that could usefully go into a comment in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree. I'm trying to keep too much technical details out of the READ.me anyway as it's getting very long. |
||
### Adding specs | ||
|
||
#### Testing modes | ||
|
@@ -432,8 +444,11 @@ Please note that page caching is project specific and each project must carefull | |
> It could be nice to have tests to prove that connectivity to GCP still works after an update, but we aren't setup for that yet. | ||
3. (Optional) Verify committed `CHANGELOG.md` changes and alter if necessary: `git show` | ||
4. Push the branch: `git push origin v${NEW_VERSION}-release`, e.g. `git push origin v1.3.0-release` | ||
5. Push the tags: `git push --tags` | ||
6. Cut a PR on GitHub with the label `version-release`, and merge once approved | ||
5. Cut a PR on GitHub with the label `version-release`, and wait for approval | ||
6. Once the PR is approved push the tags, immediately prior to merging: `git push --tags` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you — that was bugging me! |
||
7. Merge the PR. | ||
|
||
IMPORTANT: Pushing the tags will immediately make the release available even on a unmerged branch. Therefore, push the tags to Github only when the PR is approved and immediately prior to merging the PR. | ||
|
||
## License | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ | |
require 'dfe/analytics/load_entities' | ||
require 'dfe/analytics/load_entity_batch' | ||
require 'dfe/analytics/requests' | ||
require 'dfe/analytics/initialise' | ||
require 'dfe/analytics/version' | ||
require 'dfe/analytics/middleware/request_identity' | ||
require 'dfe/analytics/middleware/send_cached_page_request_event' | ||
|
@@ -103,6 +104,9 @@ def self.initialize! | |
model.include(DfE::Analytics::Entities) | ||
end | ||
end | ||
|
||
DfE::Analytics::Initialise.trigger_initialise_event | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We already have an init routine in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In I'll give that a try. So I did try the suggestion of moving to |
||
|
||
rescue ActiveRecord::PendingMigrationError | ||
Rails.logger.info('Database requires migration; DfE Analytics not initialized') | ||
rescue ActiveRecord::ActiveRecordError | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,14 @@ | |
module DfE | ||
module Analytics | ||
class Event | ||
EVENT_TYPES = %w[web_request create_entity update_entity delete_entity import_entity].freeze | ||
EVENT_TYPES = %w[ | ||
web_request | ||
create_entity | ||
update_entity | ||
delete_entity | ||
import_entity | ||
analytics_initialise | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @asatwal maybe ignore this (!) but given we've used verb_noun form for the entity event types what would you think of using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @stevenleggdfe yes |
||
].freeze | ||
|
||
def initialize | ||
time_zone = 'London' | ||
|
@@ -24,9 +31,7 @@ def with_type(type) | |
allowed_types = EVENT_TYPES + DfE::Analytics.custom_events | ||
raise 'Invalid analytics event type' unless allowed_types.include?(type.to_s) | ||
|
||
@event_hash.merge!( | ||
event_type: type | ||
) | ||
@event_hash.merge!(event_type: type) | ||
|
||
self | ||
end | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# frozen_string_literal: true | ||
|
||
module DfE | ||
module Analytics | ||
# Send event on dfe analytics startup during initialisation | ||
# - Event contains the dfe analytics version, config and other items | ||
class Initialise | ||
def self.trigger_initialise_event | ||
new.send_initialise_event | ||
end | ||
|
||
def send_initialise_event | ||
return unless DfE::Analytics.enabled? | ||
|
||
initialise_event = DfE::Analytics::Event.new | ||
.with_type('analytics_initialise') | ||
.with_data(initialise_data) | ||
|
||
DfE::Analytics::SendEvents.do([initialise_event.as_json]) | ||
end | ||
|
||
private | ||
|
||
def initialise_data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this |
||
{ | ||
analytics_version: DfE::Analytics::VERSION, | ||
config: { | ||
anonymise_web_request_user_id: DfE::Analytics.config.anonymise_web_request_user_id | ||
} | ||
} | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# frozen_string_literal: true | ||
|
||
RSpec.describe DfE::Analytics::Initialise do | ||
before do | ||
allow(DfE::Analytics::SendEvents).to receive(:perform_later) | ||
allow(DfE::Analytics).to receive(:enabled?).and_return(true) | ||
end | ||
|
||
describe 'trigger_initialise_event ' do | ||
it 'includes the expected attributes' do | ||
described_class.trigger_initialise_event | ||
|
||
expect(DfE::Analytics::SendEvents).to have_received(:perform_later) | ||
.with([a_hash_including({ | ||
'event_type' => 'analytics_initialise', | ||
'data' => [ | ||
{ 'key' => 'analytics_version', 'value' => [DfE::Analytics::VERSION] }, | ||
{ 'key' => 'config', | ||
'value' => ['{"anonymise_web_request_user_id":false}'] } | ||
] | ||
})]) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asatwal should we label this pseudonymisation rather than anonymisation to match ICO terminology? Technically the approach we take is not true anonymisation, so under GDPR is still PII. The advantage is masking which helps with security, rather than eliminating the need for it.
And also in other places where we've used the term.
See https://ico.org.uk/media/about-the-ico/consultations/4019579/chapter-3-anonymisation-guidance.pdf for more info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevenleggdfe agree that pseudonymisation would be better terminology. However, historically it's been called anonymisation in the gem and there are methods such as
DfE::Analytics.anonymise
that are used externally and config variables that would need to be updated as well as all the other documentation. For consistency we would have to update all these and this would be part of a bigger change with external dependencies, so I don't want to tackle this just yet.Strictly speaking, this is just a hashing algorithm, so I would say it's more correct to call this "hashing" rather than "anonymisation", but that's a different story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevenleggdfe I decided to action this review comment in the end as this is a sensitive and important area.
The only item I didn't change is the
anonymised_user_agent_and_ip
field in event data for web requests. I thought this item would require a big migration and also this data doesn't identify a user right ? Only a user session ?I've kept the
DfE:Analytics.anonymise
for backwards compatibility. We can deprecate in future releases. I believe the only team that use this is tv, although I think it's best to send out deprecation warnings and remove in a more release friendly way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me - thank you!