Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hidden_pii.yml #121

Merged
merged 3 commits into from
Apr 5, 2024
Merged

Add hidden_pii.yml #121

merged 3 commits into from
Apr 5, 2024

Conversation

ericaporter
Copy link
Contributor

@ericaporter ericaporter commented Mar 21, 2024

Ticket Add analytics_hidden_pii.yml configuration file to dfe-analytics

README.md Outdated
**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is - will the fields within the analytics_hidden_pii be a subset of fields within analytics.yml in the same way analytics_pii is at present? If they aren't, then I will likely need to update the validations since we currently check that a field is either on the allowlist or the blocklist

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe so. In that respect analytics_hidden_pii.yml is no different to analytics_pii.yml. It's just that the method of sending and holding the data is different.

Also I the field should only be in either analytics_hidden_pii.yml or analytics_pii.yml. Having the field in both would be ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great. There is a ticket to add a validation so that they don't appear in both

Copy link
Collaborator

@asatwal asatwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, mainly around conventions

README.md Outdated
**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe so. In that respect analytics_hidden_pii.yml is no different to analytics_pii.yml. It's just that the method of sending and holding the data is different.

Also I the field should only be in either analytics_hidden_pii.yml or analytics_pii.yml. Having the field in both would be ambiguous.

@@ -141,6 +141,10 @@ def self.allowlist_pii
Rails.application.config_for(:analytics_pii)
end

def self.analytics_hidden_pii
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency, should we call this method hidden_pii ?

end
end

let(:existing_allowlist) { { Candidate.table_name.to_sym => ['email_address'] } }
let(:existing_allowlist) { { Candidate.table_name.to_sym => %w[email_address] } }
let(:existing_blocklist) { { Candidate.table_name.to_sym => ['id'] } }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency should we change the following also ?
['id'] -> %w[id]
['dob'] -> %w[dob]


before do
allow(DfE::Analytics).to receive(:allowlist).and_return(existing_allowlist)
allow(DfE::Analytics).to receive(:blocklist).and_return(existing_blocklist)
allow(DfE::Analytics).to receive(:analytics_hidden_pii).and_return(hidden_pii)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hidden_pii rather than analytics_hidden_pii? Please see above.

Copy link
Collaborator

@asatwal asatwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM. Maybe we should create an epic branch and merge all the PRs into an epic branch rather than having separate PR branches ? We can discuss.

@ericaporter ericaporter changed the base branch from main to hidden-pii-updates April 4, 2024 13:39
@ericaporter ericaporter merged commit 617128e into hidden-pii-updates Apr 5, 2024
7 checks passed
@ericaporter ericaporter deleted the add-hidden-pii-yml branch April 5, 2024 08:14
ericaporter added a commit that referenced this pull request May 2, 2024
* Add hidden_pii.yml
* Address PR comments
* Update Readme for hidden_pii.yml
ericaporter added a commit that referenced this pull request May 2, 2024
* Add hidden_pii.yml
* Address PR comments
* Update Readme for hidden_pii.yml
ericaporter added a commit that referenced this pull request May 8, 2024
* Add hidden_pii.yml
* Address PR comments
* Update Readme for hidden_pii.yml
ericaporter added a commit that referenced this pull request Jul 8, 2024
* Add DATA_hidden to events table creation SQL script
* Add hidden PII required permissions to GCP custom IAM role setup instructions
* Add policy tag setup instructions to GCP setup instructions
* Update role configuration to match latest BAT configuration
* Add hidden_pii.yml (#121)
* Allow hidden data to be sent separately (#128)
* Mask hidden_pii from logs
* Add validation for hidden pii fields appearing on both lists
* Change DATA_hidden to hidden_DATA in google_cloud_bigquery_setup.md
* Change DATA_hidden to hidden_DATA in create-events-table.sql
* Cover edge cases for validations (#146)
* Update log masking method (#150)
* Add reference to hidden_pii to cutom events (#153)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants