Hidden pii updates (#143)
* Add DATA_hidden to events table creation SQL script
* Add hidden PII required permissions to GCP custom IAM role setup instructions
* Add policy tag setup instructions to GCP setup instructions
* Update role configuration to match latest BAT configuration
* Add hidden_pii.yml (#121)
* Allow hidden data to be sent separately (#128)
* Mask hidden_pii from logs
* Add validation for hidden pii fields appearing on both lists
* Change DATA_hidden to hidden_DATA in google_cloud_bigquery_setup.md
* Change DATA_hidden to hidden_DATA in create-events-table.sql
* Cover edge cases for validations (#146)
* Update log masking method (#150)
* Add reference to hidden_pii to custom events (#153)
ericaporter authored Jul 8, 2024
1 parent fd56ad1 commit 1bfd587
Showing 21 changed files with 735 additions and 86 deletions.
38 changes: 29 additions & 9 deletions README.md
@@ -162,13 +162,13 @@ The `dfe:analytics:install` generator will also initialize some empty config fil

| Filename | Purpose |
|---------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
| `config/analytics_custom_events.yml` | Optional file including list of all custom event names

**It is imperative that you perform a full check of those fields are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
| `config/analytics.yml` | List all fields we will send to BigQuery |
| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
| `config/analytics_hidden_pii.yml` | List all fields we will send separately to BigQuery where they will be hidden. This should be a subset of fields in `analytics.yml` |
| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
| `config/analytics_custom_events.yml` | Optional file including list of all custom event names |

**It is imperative that you perform a full check of the fields that are being sent, and exclude those containing personally-identifiable information (PII) in `config/analytics_hidden_pii.yml`, in order to comply with the requirements of the [Data Protection Act 2018](https://www.gov.uk/data-protection), unless an exemption has been obtained.**
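The table above says the hidden PII list should be a subset of `analytics.yml`, and this commit adds validation for fields that appear on both PII lists. The following is a minimal sketch of that kind of check, not the gem's actual implementation. The helper name `hidden_pii_problems` and the assumed structure (a hash of table names to field lists, as if already loaded from the YAML files) are hypothetical.

```ruby
# Hypothetical helper, not part of the dfe-analytics API.
# Each argument is assumed to be a hash of table name => list of field names.
def hidden_pii_problems(allowlist:, pii:, hidden_pii:)
  problems = []

  hidden_pii.each do |table, fields|
    allowed    = Array(allowlist[table])
    obfuscated = Array(pii[table])

    fields.each do |field|
      # A hidden field must also be listed in analytics.yml
      unless allowed.include?(field)
        problems << "#{table}.#{field} is not listed in analytics.yml"
      end

      # A field should be obfuscated or hidden, not both
      if obfuscated.include?(field)
        problems << "#{table}.#{field} is in both analytics_pii.yml and analytics_hidden_pii.yml"
      end
    end
  end

  problems
end

# Example data (illustrative only)
allowlist  = { users: %i[id email created_at] }
pii        = { users: %i[email] }
hidden_pii = { users: %i[email date_of_birth] }

puts hidden_pii_problems(allowlist: allowlist, pii: pii, hidden_pii: hidden_pii)
```

An empty result means the three lists are consistent with each other under these assumptions.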

When you first install the gem, none of your fields will be listed in `analytics.yml`, so no data will be sent to BigQuery. To get started, generate a blocklist using this command:

@@ -177,7 +177,7 @@ bundle exec rails dfe:analytics:regenerate_blocklist
```

Work through `analytics_blocklist.yml` to move entries into `analytics.yml` and
optionally also to `analytics_pii.yml`.
optionally also to `analytics_hidden_pii.yml`.

When you boot your app, DfE::Analytics will raise an error if there are
fields in your field configuration which are present in the database but
@@ -256,7 +256,7 @@ it might be necessary to add a primary key to the table and to update the releva

## Custom events

If you wish to send custom analytics event, create a file `config/analytics_custom_events.yml` containing an array of your custom events types under a `shared` key like:
If you wish to send a custom analytics event, for example if you have data about emails sent, server-side validation errors, API query data, or data relating to searches performed, create a file `config/analytics_custom_events.yml` containing an array of your custom event types under a `shared` key like:

```yaml
shared:
@@ -275,6 +275,26 @@ event = DfE::Analytics::Event.new
.with_data(some: 'custom details about event')
```

If you need to include hidden PII, use the `hidden_data` key. All fields listed under it are sent separately to BigQuery, where they are hidden.

```ruby
event = DfE::Analytics::Event.new
  .with_type(:some_custom_event)
  .with_user(current_user)
  .with_request_details(request)
  .with_namespace('some_namespace')
  .with_data(
    data: {
      some: 'custom details about event'
    },
    hidden_data: {
      some_hidden: 'some data to be hidden',
      more_hidden: 'more data to be hidden'
    }
  )
```
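To see how `data` and `hidden_data` might map onto the `DATA` and `hidden_DATA` columns (each an array of structs with a string `key` and an array-of-strings `value`), here is a rough sketch. The helper `to_key_value_pairs` and the `payload` hash are hypothetical and only illustrate the shape of the columns; the gem's own serialisation may differ.

```ruby
# Hypothetical helper, not part of the dfe-analytics API.
# Turns a flat hash into an array of { key:, value: } structs, where
# value is always an array of strings (matching ARRAY<STRING>).
def to_key_value_pairs(hash)
  hash.map do |key, value|
    { key: key.to_s, value: Array(value).map(&:to_s) }
  end
end

# Illustrative payload: visible and hidden fields are kept in separate lists
payload = {
  data: to_key_value_pairs({ some: 'custom details about event' }),
  hidden_data: to_key_value_pairs(
    {
      some_hidden: 'some data to be hidden',
      more_hidden: 'more data to be hidden'
    }
  )
}

puts payload.inspect
```

Keeping the two lists separate is what allows a policy tag to be applied to the hidden column without affecting the visible one.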

Once all the events have been constructed, simply send them to your analytics:

```ruby
@@ -389,7 +409,7 @@ See the list of existing event types below for what kinds of event types can be
The different types of events that DfE Analytics send are:

- `web_request` - sent after a controller action is performed using controller callbacks
- `create_entity` - sent after an object is created using model callbacks
- `create_entity` - sent after an object is created using model callbacks
- `update_entity` - sent after an object is updated using model callbacks
- `delete_entity` - sent after an object is deleted using model callbacks
- `import_entity` - sent for each object imported using the DfE Analytics import rake tasks
2 changes: 2 additions & 0 deletions docs/create-events-table.sql
@@ -14,6 +14,8 @@ CREATE TABLE
response_status STRING OPTIONS(description="HTTP response code returned by the application in response to this web request, if this event is a web request. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Status."),
DATA ARRAY < STRUCT <key STRING NOT NULL OPTIONS(description="Name of the field in the entity_table_name table in the database after it was created or updated, or just before it was imported or destroyed."),
value ARRAY < STRING > OPTIONS(description="Contents of the field in the database after it was created or updated, or just before it was imported or destroyed.") > > OPTIONS(description="ARRAY of STRUCTs, each with a key and a value. Contains a set of data points appropriate to the event_type of this event. For example, if this event was an entity create, update, delete or import event, data will contain the values of each field in the database after this event took place - according to the settings in the analytics.yml configured for this instance of dfe-analytics. Value may be anonymised as a one way hash, depending on configuration settings."),
hidden_DATA ARRAY < STRUCT <key STRING NOT NULL OPTIONS(description="Name of the field in the entity_table_name table in the database after it was created or updated, or just before it was imported or destroyed."),
value ARRAY < STRING > OPTIONS(description="Contents of the field in the database after it was created or updated, or just before it was imported or destroyed.") > > OPTIONS(description="Defined in the same way as the DATA ARRAY of STRUCTs, except containing fields configured to be hidden in analytics_hidden_pii.yml"),
entity_table_name STRING OPTIONS(description="If event_type was an entity create, update, delete or import event, the name of the table in the database that this entity is stored in. NULL otherwise."),
event_tags ARRAY < STRING > OPTIONS(description="Currently left blank for future use."),
anonymised_user_agent_and_ip STRING OPTIONS(description="One way hash of a combination of the user's IP address and user agent, if this event is a web request. Can be used to identify the user anonymously, even when user_id is not set. Cannot be used to identify the user over a time period of longer than about a month, because of IP address changes and browser updates."),