-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hidden_pii.yml #121
Add hidden_pii.yml #121
Conversation
README.md
Outdated
**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.** | ||
| `config/analytics.yml` | List all fields we will send to BigQuery | | ||
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` | | ||
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is - will the fields within the analytics_hidden_pii be a subset of fields within analytics.yml in the same way analytics_pii is at present? If they aren't, then I will likely need to update the validations since we currently check that a field is either on the allowlist or the blocklist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe so. In that respect analytics_hidden_pii.yml
is no different to analytics_pii.yml
. It's just that the method of sending and holding the data is different.
Also I the field should only be in either analytics_hidden_pii.yml
or analytics_pii.yml
. Having the field in both would be ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok great. There is a ticket to add a validation so that they don't appear in both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, mainly around conventions
README.md
Outdated
**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.** | ||
| `config/analytics.yml` | List all fields we will send to BigQuery | | ||
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` | | ||
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe so. In that respect analytics_hidden_pii.yml
is no different to analytics_pii.yml
. It's just that the method of sending and holding the data is different.
Also I the field should only be in either analytics_hidden_pii.yml
or analytics_pii.yml
. Having the field in both would be ambiguous.
lib/dfe/analytics.rb
Outdated
@@ -141,6 +141,10 @@ def self.allowlist_pii | |||
Rails.application.config_for(:analytics_pii) | |||
end | |||
|
|||
def self.analytics_hidden_pii |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for consistency, should we call this method hidden_pii
?
spec/dfe/analytics/fields_spec.rb
Outdated
end | ||
end | ||
|
||
let(:existing_allowlist) { { Candidate.table_name.to_sym => ['email_address'] } } | ||
let(:existing_allowlist) { { Candidate.table_name.to_sym => %w[email_address] } } | ||
let(:existing_blocklist) { { Candidate.table_name.to_sym => ['id'] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency should we change the following also ?
['id'] -> %w[id]
['dob'] -> %w[dob]
spec/dfe/analytics/fields_spec.rb
Outdated
|
||
before do | ||
allow(DfE::Analytics).to receive(:allowlist).and_return(existing_allowlist) | ||
allow(DfE::Analytics).to receive(:blocklist).and_return(existing_blocklist) | ||
allow(DfE::Analytics).to receive(:analytics_hidden_pii).and_return(hidden_pii) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hidden_pii
rather than analytics_hidden_pii
? Please see above.
9de2908
to
285bb9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM. Maybe we should create an epic branch and merge all the PRs into an epic branch rather than having separate PR branches ? We can discuss.
* Add hidden_pii.yml * Address PR comments * Update Readme for hidden_pii.yml
* Add hidden_pii.yml * Address PR comments * Update Readme for hidden_pii.yml
* Add hidden_pii.yml * Address PR comments * Update Readme for hidden_pii.yml
* Add DATA_hidden to events table creation SQL script * Add hidden PII required permissions to GCP custom IAM role setup instructions * Add policy tag setup instructions to GCP setup instructions * Update role configuration to match latest BAT configuration * Add hidden_pii.yml (#121) * Allow hidden data to be sent separately (#128) * Mask hidden_pii from logs * Add validation for hidden pii fields appearing on both lists * Change DATA_hidden to hidden_DATA in google_cloud_bigquery_setup.md * Change DATA_hidden to hidden_DATA in create-events-table.sql * Cover edge cases for validations (#146) * Update log masking method (#150) * Add reference to hidden_pii to cutom events (#153)
Ticket Add analytics_hidden_pii.yml configuration file to dfe-analytics