Hidden pii updates #143

Merged
merged 23 commits into from
Jul 8, 2024
c3a00b4
Add DATA_hidden to events table creation SQL script
stevenleggdfe Apr 29, 2024
b40b42e
Add hidden PII required permissions to GCP custom IAM role setup inst…
stevenleggdfe Apr 29, 2024
7a89f78
Add policy tag setup instructions to GCP setup instructions
stevenleggdfe Apr 29, 2024
bc720fe
Add missing comma
stevenleggdfe Apr 29, 2024
d6417c9
Update role configuration to match latest BAT configuration
stevenleggdfe Apr 29, 2024
60e29d9
Fix a couple of incorrect IDs
stevenleggdfe Apr 29, 2024
9f4d5c0
Add hidden_pii.yml (#121)
ericaporter Apr 5, 2024
7cb4942
Allow hidden data to be sent separately (#128)
ericaporter Apr 29, 2024
b229b0e
Mask hidden_pii from logs
ericaporter May 1, 2024
1e6540b
Rubocop updates
ericaporter May 1, 2024
5877aa2
Fix specs after rebase
ericaporter May 2, 2024
10bcb1f
Update the tests for events when event_debug_filters are required
ericaporter May 7, 2024
7b6ca82
Update syntax
ericaporter May 7, 2024
5894bf6
Add hidden_pii.yml (#121)
ericaporter Apr 5, 2024
f5114a2
Allow hidden data to be sent separately (#128)
ericaporter Apr 29, 2024
81da9c1
Merge branch 'mask-pii-in-logs' into hidden-pii-updates
ericaporter May 9, 2024
abc81f7
Change DATA_hidden to hidden_DATA in google_cloud_bigquery_setup.md
stevenleggdfe May 9, 2024
f524643
Change DATA_hidden to hidden_DATA in create-events-table.sql
stevenleggdfe May 9, 2024
1eefb72
Merge pull request #138 from DFE-Digital/hidden-pii-gcp-setup-instruc…
stevenleggdfe May 10, 2024
ff894de
Cover edge cases for validations (#146)
ericaporter Jul 1, 2024
ff14154
Update log masking method (#150)
ericaporter Jul 1, 2024
16d02f9
Add reference to hidden_pii to cutom events (#153)
ericaporter Jul 1, 2024
dcf3bb1
Fix documentation
ericaporter Jul 8, 2024
39 changes: 29 additions & 10 deletions README.md
@@ -162,13 +162,13 @@ The `dfe:analytics:install` generator will also initialize some empty config fil

| Filename | Purpose |
|---------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
| `config/analytics_custom_events.yml` | Optional file including list of all custom event names

**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. This should be a subset of fields in `analytics.yml` |
| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
| `config/analytics_custom_events.yml` | Optional file including list of all custom event names |

**It is imperative that you perform a full check of the fields that are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_hidden_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
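
The subset requirement stated in the table can be checked mechanically. The following is a minimal sketch, not part of the gem: it assumes both files use a top-level `shared` key that maps table names to lists of field names, and `fields_missing_from_analytics` is a hypothetical helper name.

```ruby
require 'yaml'

# Hypothetical helper (not part of dfe-analytics): returns, per table, the
# fields listed in the hidden PII config that are absent from analytics.yml.
# Assumes both configs look like { 'shared' => { 'table' => ['field', ...] } }.
def fields_missing_from_analytics(analytics, hidden_pii)
  allowed = analytics.fetch('shared', {}) || {}
  hidden = hidden_pii.fetch('shared', {}) || {}

  hidden.each_with_object({}) do |(table, fields), missing|
    extra = Array(fields) - Array(allowed[table])
    missing[table] = extra unless extra.empty?
  end
end

# Inline stand-ins for config/analytics.yml and config/analytics_hidden_pii.yml.
analytics = YAML.safe_load(<<~YML)
  shared:
    users:
      - id
      - email
YML

hidden_pii = YAML.safe_load(<<~YML)
  shared:
    users:
      - email
      - date_of_birth
YML

# Prints the tables and fields that break the subset rule, if any.
puts fields_missing_from_analytics(analytics, hidden_pii).inspect
```

An empty result means every hidden field is also listed in `analytics.yml`; in this example `date_of_birth` would be reported for the `users` table.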

When you first install the gem, none of your fields will be listed in `analytics.yml`, so no data will be sent to BigQuery. To get started, generate a blocklist using this command:

@@ -177,7 +177,7 @@ bundle exec rails dfe:analytics:regenerate_blocklist
```

Work through `analytics_blocklist.yml` to move entries into `analytics.yml` and
optionally also to `analytics_pii.yml`.
optionally also to `analytics_hidden_pii.yml`.

When you boot your app, DfE::Analytics will raise an error if there are
fields in your field configuration which are present in the database but
@@ -256,7 +256,7 @@ it might be necessary to add a primary key to the table and to update the releva

## Custom events

If you wish to send custom analytics event, create a file `config/analytics_custom_events.yml` containing an array of your custom events types under a `shared` key like:
If you wish to send custom analytics events, for example if you have data about emails sent, server-side validation errors, API query data, or data relating to searches performed, create a file `config/analytics_custom_events.yml` containing an array of your custom event types under a `shared` key, like:

```yaml
shared:
@@ -275,6 +275,26 @@ event = DfE::Analytics::Event.new
.with_data(some: 'custom details about event')
```

If you need to include hidden PII, you can use the `hidden_data` key which will allow all fields listed to be sent separately to BigQuery where they will be hidden.

```ruby
event = DfE::Analytics::Event.new
  .with_type(:some_custom_event)
  .with_user(current_user)
  .with_request_details(request)
  .with_namespace('some_namespace')
  .with_data(
    data: {
      some: 'custom details about event'
    },
    hidden_data: {
      some_hidden: 'some data to be hidden',
      more_hidden: 'more data to be hidden'
    }
  )
```
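
To illustrate what "sent separately" could mean for such a payload, here is a minimal sketch, not the gem's actual implementation. It assumes each hash is converted into an array of key/value pairs shaped like the `DATA` and `hidden_DATA` columns in `docs/create-events-table.sql`; `to_key_value_pairs` and `split_payload` are hypothetical names.

```ruby
# Hypothetical helper: turns a hash into [{ key:, value: [...] }] pairs, with
# every value wrapped in an array of strings (ARRAY<STRING> in BigQuery).
def to_key_value_pairs(hash)
  hash.map do |key, value|
    { key: key.to_s, value: Array(value).map(&:to_s) }
  end
end

# Hypothetical helper: keeps visible and hidden fields in separate arrays so
# that they can be written to separate columns.
def split_payload(data: {}, hidden_data: {})
  {
    data: to_key_value_pairs(data),
    hidden_data: to_key_value_pairs(hidden_data)
  }
end

payload = split_payload(
  data: { some: 'custom details about event' },
  hidden_data: { some_hidden: 'some data to be hidden' }
)
puts payload.inspect
```

Keeping the two arrays apart from the moment the event is built means a hidden field never passes through the visible `data` array on its way to BigQuery.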

Once all the events have been constructed, simply send them to your analytics:

```ruby
@@ -389,8 +409,7 @@ See the list of existing event types below for what kinds of event types can be
The different types of events that DfE Analytics send are:

- `web_request` - sent after a controller action is performed using controller callbacks
- `create_entity` - sent after an object is created using model callbacks
- `update_entity` - sent after an object is updated using model callbacks
- `create_entity` - sent after an object is created using model callbacks
- `delete_entity` - sent after an object is deleted using model callbacks
- `import_entity` - sent for each object imported using the DfE Analytics import rake tasks

2 changes: 2 additions & 0 deletions docs/create-events-table.sql
@@ -14,6 +14,8 @@ CREATE TABLE
response_status STRING OPTIONS(description="HTTP response code returned by the application in response to this web request, if this event is a web request. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Status."),
DATA ARRAY < STRUCT <key STRING NOT NULL OPTIONS(description="Name of the field in the entity_table_name table in the database after it was created or updated, or just before it was imported or destroyed."),
value ARRAY < STRING > OPTIONS(description="Contents of the field in the database after it was created or updated, or just before it was imported or destroyed.") > > OPTIONS(description="ARRAY of STRUCTs, each with a key and a value. Contains a set of data points appropriate to the event_type of this event. For example, if this event was an entity create, update, delete or import event, data will contain the values of each field in the database after this event took place - according to the settings in the analytics.yml configured for this instance of dfe-analytics. Values may be anonymised as a one way hash, depending on configuration settings."),
hidden_DATA ARRAY < STRUCT <key STRING NOT NULL OPTIONS(description="Name of the field in the entity_table_name table in the database after it was created or updated, or just before it was imported or destroyed."),
value ARRAY < STRING > OPTIONS(description="Contents of the field in the database after it was created or updated, or just before it was imported or destroyed.") > > OPTIONS(description="Defined in the same way as the DATA ARRAY of STRUCTs, except containing fields configured to be hidden in analytics_hidden_pii.yml"),
entity_table_name STRING OPTIONS(description="If event_type was an entity create, update, delete or import event, the name of the table in the database that this entity is stored in. NULL otherwise."),
event_tags ARRAY < STRING > OPTIONS(description="Currently left blank for future use."),
anonymised_user_agent_and_ip STRING OPTIONS(description="One way hash of a combination of the user's IP address and user agent, if this event is a web request. Can be used to identify the user anonymously, even when user_id is not set. Cannot be used to identify the user over a time period of longer than about a month, because of IP address changes and browser updates."),