diff --git a/README.md b/README.md
index efcd591c..3d6d15f2 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,8 @@
## Overview
-This gem provides an _opinionated integration_ with Google BigQuery.
+This gem provides an _opinionated integration_ with Google Cloud Platform (GCP)
+BigQuery.
Once it is set up, every web request and database update (as permitted by
configuration) will flow to BigQuery.
@@ -64,8 +65,19 @@ A Rails app with `ActiveJob` configured.
## Installation
+Before you can send data to BigQuery with `dfe-analytics` you'll need to set up
+your Google Cloud project. See the [Google Cloud setup guide](docs/google_cloud_bigquery_setup.md)
+for instructions on how to do that.
+
+### 1. Add the `dfe-analytics` gem to your app
+
+The `dfe-analytics` gem hasn't been published to RubyGems yet, so it needs to be
+retrieved from GitHub. Check for the latest tagged version on GitHub and pass it
+to the `tag` argument in your Gemfile. Dependabot will update this for you when
+it finds a new tagged version.
+
```ruby
-gem 'dfe-analytics'
+gem 'dfe-analytics', github: 'DFE-Digital/dfe-analytics', tag: 'v1.3.2'
```
then
@@ -74,265 +86,22 @@ then
bundle install
```
-## Configuration
-
-
-### 1. Get a BigQuery project setup and add initial owners
-
-Ask in Slack on the `#twd_data_insights` channel for someone to help you
-procure a BigQuery instance in the `digital.education.gov.uk` Google Cloud
-Organisation.
-
-Ask - or ask your Delivery Manager to ask - for your `@digital.education.gov.uk` Google account to be setup as an owner
-via the IAM and Admin settings. Add other team members as necessary.
-
-#### Set up billing
-
-You also need to set up your BigQuery instance with paid billing. This is
-because `dfe-analytics` uses streaming, and streaming isn't allowed in the free
-tier:
-
-```
-accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed
-in the free tier
-```
-
-### 2. Create a dataset and table
-
-You should create separate datasets for each environment (dev, qa, preprod, prod etc.).
-
-1. Open your project's BigQuery instance
-2. Go to the Analysis -> SQL Workspace section
-3. Tap on the 3 dots next to the project name, "Create data set"
-4. Name it something like `APPLICATIONNAME_events_ENVIRONMENT`, such as `applyforqts_events_production`, and set the location to `europe-west2 (London)`
-5. Select your new dataset
-6. Open a new query execution tab.
-7. Edit [create-events-table.sql](https://github.com/DFE-Digital/dfe-analytics/create-events-table.sql) to add your table name, and run it in the query execution tab in BigQuery to create a blank events table for dfe-analytics to stream data into.
-
-### 3. Create custom roles
-
-The following steps can be performed either through the IAM section of the Google Cloud console, or using the cloud shell feature inside the Google Cloud console.
-
-The shell commands require using a command-line interface so may not be appropriate for everyone.
-
-
-
-Instructions for GCloud CLI
-
-> **NB:** These instructions are appropriate for people who are comfortable running shell commands.
-
-1. Go to the IAM section of the Google Console for your project.
-2. Click ![Google Cloud shell button](https://user-images.githubusercontent.com/15608/184917222-80397b08-83fa-41e5-b485-acb4f7a8b7a0.png) to activate the Google Cloud shell.
-3. Copy the command provided into the shell, replacing `YOUR_PROJECT_ID` with your own project ID.
-
-
-
-
-Instructions for GCloud IAM Web UI
-
-> **NB:** Adding permissions to a role is a manual process that requires using the permission browser to add permissions one at a time.
-
-1. Go to the IAM section of the Google Console for your project.
-1. Go to Roles section using the sidebar on the left.
-1. Click on "+ Create role" near the top.
-1. Fill in the details from the info below.
-
-
-
-
-#### Analyst
-
-This role is used for analysts or other users who don't need to write to or modify data in BigQuery.
-
-
-Using the GCloud CLI
-
-``` bash
-gcloud iam roles create bigquery_analyst_custom --title="BigQuery Analyst Custom" --description="Assigned to accounts used by analysts and SQL developers." --permissions=bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.routines.get,bigquery.routines.list,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.createSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,resourcemanager.projects.get --project=YOUR_PROJECT_ID
-```
-
-
-
-
-Using the GCloud IAM Web UI
-
-| Field | Value |
-| ----------------- | -------------------------------------------------- |
-| Title | **BigQuery Analyst Custom** |
-| Description | Assigned to accounts used by analysts and SQL developers. |
-| ID | `bigquery_analyst_custom` |
-| Role launch stage | General Availability |
-| + Add permissions | See below |
-
-##### Permissions for `bigquery_analyst_custom`
-
-```
-bigquery.datasets.get
-bigquery.datasets.getIamPolicy
-bigquery.datasets.updateTag
-bigquery.jobs.create
-bigquery.jobs.get
-bigquery.jobs.list
-bigquery.jobs.listAll
-bigquery.models.export
-bigquery.models.getData
-bigquery.models.getMetadata
-bigquery.models.list
-bigquery.routines.get
-bigquery.routines.list
-bigquery.savedqueries.create
-bigquery.savedqueries.delete
-bigquery.savedqueries.get
-bigquery.savedqueries.list
-bigquery.savedqueries.update
-bigquery.tables.createSnapshot
-bigquery.tables.export
-bigquery.tables.get
-bigquery.tables.getData
-bigquery.tables.getIamPolicy
-bigquery.tables.list
-bigquery.tables.restoreSnapshot
-resourcemanager.projects.get
-```
-
-
-
-#### Developer
-
-This role is used for developers or other users who need to be able to write to or modify data in BigQuery.
-
-
-Using the GCloud CLI
-
-``` bash
-gcloud iam roles create bigquery_developer_custom --title="BigQuery Developer Custom" --description="Assigned to accounts used by developers." --permissions=bigquery.connections.create,bigquery.connections.delete,bigquery.connections.get,bigquery.connections.getIamPolicy,bigquery.connections.list,bigquery.connections.update,bigquery.connections.updateTag,bigquery.connections.use,bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.update,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.delete,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.models.create,bigquery.models.delete,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.models.updateData,bigquery.models.updateMetadata,bigquery.models.updateTag,bigquery.routines.create,bigquery.routines.delete,bigquery.routines.get,bigquery.routines.list,bigquery.routines.update,bigquery.routines.updateTag,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.create,bigquery.tables.createSnapshot,bigquery.tables.delete,bigquery.tables.deleteSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,bigquery.tables.setCategory,bigquery.tables.update,bigquery.tables.updateData,bigquery.tables.updateTag,resourcemanager.projects.get --project=YOUR_PROJECT_ID
-```
-
-
-
-
-Using the GCloud IAM Web UI
-
-| Field | Value |
-| ----------------- | ---------------------------------------- |
-| Title | **BigQuery Developer Custom** |
-| Description | Assigned to accounts used by developers. |
-| ID | `bigquery_developer_custom` |
-| Role launch stage | General Availability |
-| + Add permissions | See below |
-
-##### Permissions for `bigquery_developer_custom`
-
-```
-bigquery.connections.create
-bigquery.connections.delete
-bigquery.connections.get
-bigquery.connections.getIamPolicy
-bigquery.connections.list
-bigquery.connections.update
-bigquery.connections.updateTag
-bigquery.connections.use
-bigquery.datasets.create
-bigquery.datasets.delete
-bigquery.datasets.get
-bigquery.datasets.getIamPolicy
-bigquery.datasets.update
-bigquery.datasets.updateTag
-bigquery.jobs.create
-bigquery.jobs.delete
-bigquery.jobs.get
-bigquery.jobs.list
-bigquery.jobs.listAll
-bigquery.jobs.update
-bigquery.models.create
-bigquery.models.delete
-bigquery.models.export
-bigquery.models.getData
-bigquery.models.getMetadata
-bigquery.models.list
-bigquery.models.updateData
-bigquery.models.updateMetadata
-bigquery.models.updateTag
-bigquery.routines.create
-bigquery.routines.delete
-bigquery.routines.get
-bigquery.routines.list
-bigquery.routines.update
-bigquery.routines.updateTag
-bigquery.savedqueries.create
-bigquery.savedqueries.delete
-bigquery.savedqueries.get
-bigquery.savedqueries.list
-bigquery.savedqueries.update
-bigquery.tables.create
-bigquery.tables.createSnapshot
-bigquery.tables.delete
-bigquery.tables.deleteSnapshot
-bigquery.tables.export
-bigquery.tables.get
-bigquery.tables.getData
-bigquery.tables.getIamPolicy
-bigquery.tables.list
-bigquery.tables.restoreSnapshot
-bigquery.tables.setCategory
-bigquery.tables.update
-bigquery.tables.updateData
-bigquery.tables.updateTag
-resourcemanager.projects.get
-```
+### 2. Get an API JSON key :key:
-
+Depending on how your app environments are set up, we recommend you use the
+service account created for the `development` environment on your localhost to
+test the integration with BigQuery. This requires that your project is set up in
+Google Cloud as per the instructions above.
-#### Appender
-
-This role is assigned to the service account used by the application connecting to Google Cloud to append data to the `events` tables.
-
-
-Using the GCloud CLI
-
-``` bash
-gcloud iam roles create bigquery_appender_custom --title="BigQuery Appender Custom" --description="Assigned to service accounts used to append data to events tables." --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.updateData
-```
-
-
-
-
-Using the GCloud IAM Web UI
-
-| Field | Value |
-| ----------------- | ---------------------------------------------------------- |
-| Title | **BigQuery Appender Custom** |
-| Description | Assigned to service accounts used to append data to events tables. |
-| ID | `bigquery_appender_custom` |
-| Role launch stage | General Availability |
-| + Add permissions | See below |
-
-##### Permissions for bigquery_appender_custom
-
-```
-bigquery.datasets.get
-bigquery.tables.get
-bigquery.tables.updateData
-```
-
-
-
-### 4. Create an appender service account
-
-1. Go to [IAM and Admin settings > Create service account](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project)
-1. Name it like "Appender NAME_OF_SERVICE ENVIRONMENT" e.g. "Appender ApplyForQTS Production"
-1. Add a description, like "Used when developing locally."
-1. Grant the service account access to the project, use the "BigQuery Appender Custom" role you set up earlier
-
-### 5. Get an API JSON key :key:
-
-1. Access the service account you previously set up
-1. Go to the keys tab, click on "Add key > Create new key"
-1. Create a JSON private key
+1. Access the `development` service account you previously set up
+1. Go to the keys tab, click on "Add key" > "Create new key"
+1. Create a JSON private key. This file will be downloaded to your local system.
The full contents of this JSON file is your `BIGQUERY_API_JSON_KEY`.
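+
+If you prefer the command line, a key can also be created with the Google Cloud
+SDK. This is a sketch only; the service account email shown is an example
+following the naming convention in the setup guide, not a real account.
+
+```bash
+# Create and download a JSON key for the development appender service account.
+# The full contents of the downloaded file is your BIGQUERY_API_JSON_KEY.
+gcloud iam service-accounts keys create bigquery_api_key.json \
+  --iam-account=appender-publish-development@YOUR_PROJECT_ID.iam.gserviceaccount.com
+```
+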
-### 6. Set up environment variables
+Follow the same steps to download a key for each deployed environment and store
+it in that environment's secrets.
+
+### 3. Set up environment variables
Putting the previous things together, to finish setting up `dfe-analytics`, you
need these environment variables:
@@ -344,7 +113,7 @@ BIGQUERY_DATASET=your-bigquery-dataset-name
BIGQUERY_API_JSON_KEY=
```
-### 7. Configure BigQuery connection, feature flags etc
+### 4. Configure BigQuery connection, feature flags etc
```bash
bundle exec rails generate dfe:analytics:install
@@ -354,13 +123,13 @@ and follow comments in `config/initializers/dfe_analytics.rb`.
The `dfe:analytics:install` generator will also initialize some empty config files:
-| Filename | Purpose |
-|----------|---------|
-| `config/analytics.yml` | List all fields we will send to BigQuery |
-| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
-| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
+| Filename | Purpose |
+|----------------------------------|--------------------------------------------------------------------------------------------------------------------|
+| `config/analytics.yml` | List all fields we will send to BigQuery |
+| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
+| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
-### 8. Check your fields
+### 5. Check your fields
A good place to start is to run
@@ -384,7 +153,7 @@ config but missing from the database.
**It's recommended to run this task regularly - at least as often as you run
database migrations. Consider enhancing db:migrate to run it automatically.**
-### 9. Enable callbacks
+### 6. Enable callbacks
Mix in the following modules. It's recommended to include them at the
highest possible level in the inheritance hierarchy of your controllers and
diff --git a/docs/bigquery-new-query-button.png b/docs/bigquery-new-query-button.png
new file mode 100644
index 00000000..190348ec
Binary files /dev/null and b/docs/bigquery-new-query-button.png differ
diff --git a/create-events-table.sql b/docs/create-events-table.sql
similarity index 100%
rename from create-events-table.sql
rename to docs/create-events-table.sql
diff --git a/docs/google-cloud-shell-button.png b/docs/google-cloud-shell-button.png
new file mode 100644
index 00000000..2c11f4cf
Binary files /dev/null and b/docs/google-cloud-shell-button.png differ
diff --git a/docs/google_cloud_bigquery_setup.md b/docs/google_cloud_bigquery_setup.md
new file mode 100644
index 00000000..2e300f19
--- /dev/null
+++ b/docs/google_cloud_bigquery_setup.md
@@ -0,0 +1,339 @@
+# Google Cloud BigQuery Setup
+
+Before you can start to use BigQuery and send events to it with `dfe-analytics`,
+you'll need to set up your project in the Google Cloud Platform (GCP).
+
+## Initial Configuration
+
+These steps need to be performed only once, when you set up your Google Cloud
+project.
+
+### 1. Create a Google Cloud project
+
+Ask in Slack on the `#twd_data_insights` channel for someone to help you create
+your project in the `digital.education.gov.uk` Google Cloud Organisation.
+
+Each team is responsible for managing their project in Google Cloud. Ensure
+you've added users with the `Owner` role through the IAM section of Google
+Cloud.
+
+### 2. Set up billing
+
+You also need to set up your Google Cloud project with paid billing. This is
+because `dfe-analytics` uses streaming, and streaming to BigQuery isn't allowed
+in the free tier:
+
+```
+accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed
+in the free tier
+```
+
+The following steps can be accomplished without billing set up, but with certain
+restrictions:
+
+- Streaming data to BigQuery isn't allowed, so you won't be able to use
+  `dfe-analytics`.
+- Tables are limited to 60 days' retention. It's not clear whether this
+  limitation is automatically lifted once the project is connected to a billing
+  account; tables may have to be modified or recreated.
+
+### 3. Create custom roles
+
+We use customised roles to grant permissions to users who need to use BigQuery.
+
+Follow the instructions below to create each role. There are two ways to create
+custom roles: using the Google Cloud shell CLI, which is appropriate for
+advanced users comfortable with command-line interfaces, or through the Google
+Cloud IAM web UI, which requires more manual work, especially when adding
+permissions.
+
+**Instructions for GCloud CLI**
+
+> **NB:** These instructions are appropriate for people who are comfortable
+> running shell commands.
+
+1. Go to the IAM section of the Google Console for your project.
+2. Click the ![Google Cloud shell button](google-cloud-shell-button.png) to
+ activate the Google Cloud shell.
+3. Copy the command provided into the shell, replacing `YOUR_PROJECT_ID` with
+ your own project ID.
+
+
+
+**Instructions for GCloud IAM Web UI**
+
+> **NB:** Adding permissions to a role is a manual process that requires using
+> the permission browser to add permissions one at a time.
+
+1. Go to the IAM section of the Google Console for your project.
+1. Go to Roles section using the sidebar on the left.
+1. Click on "+ Create role" near the top.
+1. Fill in the details from the info below.
+
+
+
+
+#### Analyst Role
+
+This role is used for analysts or other users who don't need to write to or
+modify data in BigQuery.
+
+**Using the GCloud CLI**
+
+``` bash
+gcloud iam roles create bigquery_analyst_custom --title="BigQuery Analyst Custom" --description="Assigned to accounts used by analysts and SQL developers." --permissions=bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.routines.get,bigquery.routines.list,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.createSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,resourcemanager.projects.get --project=YOUR_PROJECT_ID
+```
+
+
+
+**Using the GCloud IAM Web UI**
+
+| Field | Value |
+|-------------------|-----------------------------------------------------------|
+| Title | **BigQuery Analyst Custom** |
+| Description | Assigned to accounts used by analysts and SQL developers. |
+| ID | `bigquery_analyst_custom` |
+| Role launch stage | General Availability |
+| + Add permissions | See below |
+
+##### Permissions for `bigquery_analyst_custom`
+
+```
+bigquery.datasets.get
+bigquery.datasets.getIamPolicy
+bigquery.datasets.updateTag
+bigquery.jobs.create
+bigquery.jobs.get
+bigquery.jobs.list
+bigquery.jobs.listAll
+bigquery.models.export
+bigquery.models.getData
+bigquery.models.getMetadata
+bigquery.models.list
+bigquery.routines.get
+bigquery.routines.list
+bigquery.savedqueries.create
+bigquery.savedqueries.delete
+bigquery.savedqueries.get
+bigquery.savedqueries.list
+bigquery.savedqueries.update
+bigquery.tables.createSnapshot
+bigquery.tables.export
+bigquery.tables.get
+bigquery.tables.getData
+bigquery.tables.getIamPolicy
+bigquery.tables.list
+bigquery.tables.restoreSnapshot
+resourcemanager.projects.get
+```
+
+
+
+#### Developer Role
+
+This role is used for developers or other users who need to be able to write to
+or modify data in BigQuery.
+
+**Using the GCloud CLI**
+
+``` bash
+gcloud iam roles create bigquery_developer_custom --title="BigQuery Developer Custom" --description="Assigned to accounts used by developers." --permissions=bigquery.connections.create,bigquery.connections.delete,bigquery.connections.get,bigquery.connections.getIamPolicy,bigquery.connections.list,bigquery.connections.update,bigquery.connections.updateTag,bigquery.connections.use,bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.update,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.delete,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.models.create,bigquery.models.delete,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.models.updateData,bigquery.models.updateMetadata,bigquery.models.updateTag,bigquery.routines.create,bigquery.routines.delete,bigquery.routines.get,bigquery.routines.list,bigquery.routines.update,bigquery.routines.updateTag,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.create,bigquery.tables.createSnapshot,bigquery.tables.delete,bigquery.tables.deleteSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,bigquery.tables.setCategory,bigquery.tables.update,bigquery.tables.updateData,bigquery.tables.updateTag,resourcemanager.projects.get --project=YOUR_PROJECT_ID
+```
+
+
+
+**Using the GCloud IAM Web UI**
+
+| Field | Value |
+| ----------------- | ---------------------------------------- |
+| Title | **BigQuery Developer Custom** |
+| Description | Assigned to accounts used by developers. |
+| ID | `bigquery_developer_custom` |
+| Role launch stage | General Availability |
+| + Add permissions | See below |
+
+##### Permissions for `bigquery_developer_custom`
+
+```
+bigquery.connections.create
+bigquery.connections.delete
+bigquery.connections.get
+bigquery.connections.getIamPolicy
+bigquery.connections.list
+bigquery.connections.update
+bigquery.connections.updateTag
+bigquery.connections.use
+bigquery.datasets.create
+bigquery.datasets.delete
+bigquery.datasets.get
+bigquery.datasets.getIamPolicy
+bigquery.datasets.update
+bigquery.datasets.updateTag
+bigquery.jobs.create
+bigquery.jobs.delete
+bigquery.jobs.get
+bigquery.jobs.list
+bigquery.jobs.listAll
+bigquery.jobs.update
+bigquery.models.create
+bigquery.models.delete
+bigquery.models.export
+bigquery.models.getData
+bigquery.models.getMetadata
+bigquery.models.list
+bigquery.models.updateData
+bigquery.models.updateMetadata
+bigquery.models.updateTag
+bigquery.routines.create
+bigquery.routines.delete
+bigquery.routines.get
+bigquery.routines.list
+bigquery.routines.update
+bigquery.routines.updateTag
+bigquery.savedqueries.create
+bigquery.savedqueries.delete
+bigquery.savedqueries.get
+bigquery.savedqueries.list
+bigquery.savedqueries.update
+bigquery.tables.create
+bigquery.tables.createSnapshot
+bigquery.tables.delete
+bigquery.tables.deleteSnapshot
+bigquery.tables.export
+bigquery.tables.get
+bigquery.tables.getData
+bigquery.tables.getIamPolicy
+bigquery.tables.list
+bigquery.tables.restoreSnapshot
+bigquery.tables.setCategory
+bigquery.tables.update
+bigquery.tables.updateData
+bigquery.tables.updateTag
+resourcemanager.projects.get
+```
+
+
+
+#### Appender Role
+
+This role is assigned to the service account used by the application connecting
+to Google Cloud to append data to the `events` tables.
+
+**Using the GCloud CLI**
+
+``` bash
+gcloud iam roles create bigquery_appender_custom --title="BigQuery Appender Custom" --description="Assigned to service accounts used to append data to events tables." --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.updateData --project=YOUR_PROJECT_ID
+```
+
+
+
+**Using the GCloud IAM Web UI**
+
+| Field | Value |
+|-------------------|--------------------------------------------------------------------|
+| Title | **BigQuery Appender Custom** |
+| Description | Assigned to service accounts used to append data to events tables. |
+| ID | `bigquery_appender_custom` |
+| Role launch stage | General Availability |
+| + Add permissions | See below |
+
+##### Permissions for `bigquery_appender_custom`
+
+```
+bigquery.datasets.get
+bigquery.tables.get
+bigquery.tables.updateData
+```
+
+
+
+## Dataset and Table Setup
+
+`dfe-analytics` inserts events into a table in BigQuery with a pre-defined
+schema. Access is granted via a service account that is permitted to append
+data to the given events table. The recommended setup is a separate dataset
+and service account for each application/environment combination in your
+project.
+
+For example, let's say you have the applications `publish` and `find` in your
+project, and use `development`, `qa`, `staging` and `production` environments.
+You should create a separate dataset for each combination of the above, as well
+as a separate service account that can append data to events in only one
+dataset. The following table illustrates how this might look for this example:
+
+| Application | Environment | BigQuery Dataset | Service Account |
+|-------------|-------------|----------------------------|--------------------------------------------------------------|
+| publish | development | publish_events_development | appender-publish-development@project.iam.gserviceaccount.com |
+| publish | qa | publish_events_qa | appender-publish-qa@project.iam.gserviceaccount.com |
+| publish | staging | publish_events_staging | appender-publish-staging@project.iam.gserviceaccount.com |
+| publish | production | publish_events_production | appender-publish-production@project.iam.gserviceaccount.com |
+| find | development | find_events_development | appender-find-development@project.iam.gserviceaccount.com |
+| find | qa | find_events_qa | appender-find-qa@project.iam.gserviceaccount.com |
+| find | staging | find_events_staging | appender-find-staging@project.iam.gserviceaccount.com |
+| find | production | find_events_production | appender-find-production@project.iam.gserviceaccount.com |
+
+This approach helps prevent events being sent to the wrong dataset, and reduces
+the risk should a secret key for one of these accounts be leaked.
+
+> **NB:** It may be easier to perform these instructions with two browser tabs
+> open, one for BigQuery and the other for IAM.
+
+### 1. Create dataset(s)
+
+Start by creating a dataset.
+
+1. Open your project's BigQuery instance and go to the SQL Workspace section.
+2. Click on the 3 dots next to the project name, then "Create dataset".
+3. Name it something like `APPLICATIONNAME_events_ENVIRONMENT`, as per above
+ examples, e.g. `publish_events_development`, and set the location to
+ `europe-west2 (London)`.
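+
+If you prefer the command line, the same dataset can be created with the `bq`
+tool from the Google Cloud SDK. This is a sketch only; `YOUR_PROJECT_ID` is a
+placeholder and the dataset name follows the example naming convention above.
+
+```bash
+# Create a dataset in the London region. Assumes you are authenticated with
+# the Google Cloud SDK and have access to YOUR_PROJECT_ID.
+bq mk --dataset --location=europe-west2 YOUR_PROJECT_ID:publish_events_development
+```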
+
+### 2. Create the events table
+
+Once the dataset is ready, you need to create the `events` table in it:
+
+1. Select your new dataset and click the ![BigQuery new query
+ button](bigquery-new-query-button.png) to open a new query execution tab.
+2. Copy the contents of [create-events-table.sql](create-events-table.sql)
+ into the query editor.
+3. Edit your project and dataset names in the query editor.
+4. Run the query to create a blank events table.
+
+BigQuery allows you to copy a table to a new dataset, so now is a good time to
+create all the datasets you need and copy the blank `events` table to each of
+them.
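+
+Copying the blank table can also be scripted with the `bq` tool. This is a
+sketch, assuming the example dataset names used above.
+
+```bash
+# Copy the blank events table from one dataset to another.
+bq cp YOUR_PROJECT_ID:publish_events_development.events \
+      YOUR_PROJECT_ID:publish_events_qa.events
+```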
+
+### 3. Create an appender service account
+
+Create a service account that will be given permission to append data to tables
+in the new dataset.
+
+1. Go to [IAM and Admin settings > Create service
+ account](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project)
+2. Name it like "Appender NAME_OF_SERVICE ENVIRONMENT" e.g. "Appender
+ ApplyForQTS Development".
+3. Add a description, like "Used for appending data from development
+ environments."
+4. Copy the email address using the button next to it. You'll need this in the
+ next step to give this account access to your dataset.
+5. Click the "CREATE AND CONTINUE" button.
+6. Click "DONE", skipping the steps to grant roles and user access to this
+ account. Access will be given to the specific dataset in the next step.
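+
+The same service account can also be created from the Google Cloud shell. This
+is a sketch only; the account ID, display name and description follow the
+example naming convention above.
+
+```bash
+# Create the appender service account for one application/environment pair.
+gcloud iam service-accounts create appender-publish-development \
+  --display-name="Appender Publish Development" \
+  --description="Used for appending data from development environments." \
+  --project=YOUR_PROJECT_ID
+```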
+
+### 4. Give the service account access to your dataset
+
+Ensure you have the email address of the service account handy for this.
+
+1. Go to the dataset you created and click "SHARING" > "Permissions" near the
+ top right.
+2. Click "ADD PRINCIPAL".
+3. Paste in the email address of the service account you created into the "New
+ principals" box.
+4. Select the "BigQuery Appender Custom" role you created previously.
+5. Click "SAVE" to finish.
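+
+Dataset-level access can also be granted from the command line by editing the
+dataset's access entries with `bq`. This is a sketch only; the custom role path
+and service account email are examples based on the names used above.
+
+```bash
+# Export the dataset's current access entries to a local file.
+bq show --format=prettyjson YOUR_PROJECT_ID:publish_events_development > dataset.json
+# Edit dataset.json, appending an entry like this to the "access" array:
+#   { "role": "projects/YOUR_PROJECT_ID/roles/bigquery_appender_custom",
+#     "userByEmail": "appender-publish-development@YOUR_PROJECT_ID.iam.gserviceaccount.com" }
+# Apply the updated access list back to the dataset.
+bq update --source dataset.json YOUR_PROJECT_ID:publish_events_development
+```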
+
+
+