diff --git a/README.md b/README.md
index efcd591c..3d6d15f2 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,8 @@
 ## Overview
 
-This gem provides an _opinionated integration_ with Google BigQuery.
+This gem provides an _opinionated integration_ with Google Cloud Platform (GCP)
+BigQuery.
 
 Once it is set up, every web request and database update (as permitted by
 configuration) will flow to BigQuery.
@@ -64,8 +65,19 @@ A Rails app with `ActiveJob` configured.
 
 ## Installation
 
+Before you can send data to BigQuery with `dfe-analytics` you'll need to set up
+your Google Cloud project. See the [Google Cloud setup guide](docs/google_cloud_bigquery_setup.md)
+for instructions on how to do that.
+
+### 1. Add the dfe-analytics gem to your app
+
+The `dfe-analytics` gem hasn't been published to RubyGems yet, so it needs to be
+retrieved from GitHub. Check for the latest tagged version on GitHub and pass it
+to the `tag` argument in your Gemfile. Dependabot will update this for you when
+it finds a new tagged version.
+
 ```ruby
-gem 'dfe-analytics'
+gem 'dfe-analytics', github: 'DFE-Digital/dfe-analytics', tag: 'v1.3.2'
 ```
 
 then
@@ -74,265 +86,22 @@ then
 bundle install
 ```
 
-## Configuration
-
-
-### 1. Get a BigQuery project setup and add initial owners
-
-Ask in Slack on the `#twd_data_insights` channel for someone to help you
-procure a BigQuery instance in the `digital.education.gov.uk` Google Cloud
-Organisation.
-
-Ask - or ask your Delivery Manager to ask - for your `@digital.education.gov.uk` Google account to be setup as an owner
-via the IAM and Admin settings. Add other team members as necessary.
-
-#### Set up billing
-
-You also need to set up your BigQuery instance with paid billing. This is
-because `dfe-analytics` uses streaming, and streaming isn't allowed in the free
-tier:
-
-```
-accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed
-in the free tier
-```
-
-### 2. Create a dataset and table
-
-You should create separate datasets for each environment (dev, qa, preprod, prod etc.).
-
-1. Open your project's BigQuery instance
-2. Go to the Analysis -> SQL Workspace section
-3. Tap on the 3 dots next to the project name, "Create data set"
-4. Name it something like `APPLICATIONNAME_events_ENVIRONMENT`, such as `applyforqts_events_production`, and set the location to `europe-west2 (London)`
-5. Select your new dataset
-6. Open a new query execution tab.
-7. Edit [create-events-table.sql](https://github.com/DFE-Digital/dfe-analytics/create-events-table.sql) to add your table name, and run it in the query execution tab in BigQuery to create a blank events table for dfe-analytics to stream data into.
-
-### 3. Create custom roles
-
-The following steps can be performed either through the IAM section of the Google Cloud console, or using the cloud shell feature inside the Google Cloud console.
-
-The shell commands require using a command-line interface so may not be appropriate for everyone.
-
-<details>
- -Instructions for GCloud CLI - -> **NB:** These instructions are appropriate for people who are comfortable running shell commands. - -1. Go to the IAM section of the Google Console for your project. -2. Click ![Google Cloud shell button](https://user-images.githubusercontent.com/15608/184917222-80397b08-83fa-41e5-b485-acb4f7a8b7a0.png) to activate the Google Cloud shell. -3. Copy the command provided into the shell, replacing `YOUR_PROJECT_ID` with your own project ID. - -
- -
-Instructions for GCloud IAM Web UI - -> **NB:** Adding permissions to a role is a manual process that requires using the permission browser to add permissions one at a time. - -1. Go to the IAM section of the Google Console for your project. -1. Go to Roles section using the sidebar on the left. -1. Click on "+ Create role" near the top. -1. Fill in the details from the info below. - -
- - -#### Analyst - -This role is used for analysts or other users who don't need to write to or modify data in BigQuery. - -
-Using the GCloud CLI - -``` bash -gcloud iam roles create bigquery_analyst_custom --title="BigQuery Analyst Custom" --description="Assigned to accounts used by analysts and SQL developers." --permissions=bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.routines.get,bigquery.routines.list,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.createSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,resourcemanager.projects.get --project=YOUR_PROJECT_ID -``` - -
- -
-Using the GCloud IAM Web UI - -| Field | Value | -| ----------------- | -------------------------------------------------- | -| Title | **BigQuery Analyst Custom** | -| Description | Assigned to accounts used by analysts and SQL developers. | -| ID | `bigquery_analyst_custom` | -| Role launch stage | General Availability | -| + Add permissions | See below | - -##### Permissions for `bigquery_analyst_custom` - -``` -bigquery.datasets.get -bigquery.datasets.getIamPolicy -bigquery.datasets.updateTag -bigquery.jobs.create -bigquery.jobs.get -bigquery.jobs.list -bigquery.jobs.listAll -bigquery.models.export -bigquery.models.getData -bigquery.models.getMetadata -bigquery.models.list -bigquery.routines.get -bigquery.routines.list -bigquery.savedqueries.create -bigquery.savedqueries.delete -bigquery.savedqueries.get -bigquery.savedqueries.list -bigquery.savedqueries.update -bigquery.tables.createSnapshot -bigquery.tables.export -bigquery.tables.get -bigquery.tables.getData -bigquery.tables.getIamPolicy -bigquery.tables.list -bigquery.tables.restoreSnapshot -resourcemanager.projects.get -``` - -
- -#### Developer - -This role is used for developers or other users who need to be able to write to or modify data in BigQuery. - -
-Using the GCloud CLI - -``` bash -gcloud iam roles create bigquery_developer_custom --title="BigQuery Developer Custom" --description="Assigned to accounts used by developers." --permissions=bigquery.connections.create,bigquery.connections.delete,bigquery.connections.get,bigquery.connections.getIamPolicy,bigquery.connections.list,bigquery.connections.update,bigquery.connections.updateTag,bigquery.connections.use,bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.update,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.delete,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.models.create,bigquery.models.delete,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.models.updateData,bigquery.models.updateMetadata,bigquery.models.updateTag,bigquery.routines.create,bigquery.routines.delete,bigquery.routines.get,bigquery.routines.list,bigquery.routines.update,bigquery.routines.updateTag,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.create,bigquery.tables.createSnapshot,bigquery.tables.delete,bigquery.tables.deleteSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,bigquery.tables.setCategory,bigquery.tables.update,bigquery.tables.updateData,bigquery.tables.updateTag,resourcemanager.projects.get --project=YOUR_PROJECT_ID -``` - -
- -
-Using the GCloud IAM Web UI - -| Field | Value | -| ----------------- | ---------------------------------------- | -| Title | **BigQuery Developer Custom** | -| Description | Assigned to accounts used by developers. | -| ID | `bigquery_developer_custom` | -| Role launch stage | General Availability | -| + Add permissions | See below | - -##### Permissions for `bigquery_developer_custom` - -``` -bigquery.connections.create -bigquery.connections.delete -bigquery.connections.get -bigquery.connections.getIamPolicy -bigquery.connections.list -bigquery.connections.update -bigquery.connections.updateTag -bigquery.connections.use -bigquery.datasets.create -bigquery.datasets.delete -bigquery.datasets.get -bigquery.datasets.getIamPolicy -bigquery.datasets.update -bigquery.datasets.updateTag -bigquery.jobs.create -bigquery.jobs.delete -bigquery.jobs.get -bigquery.jobs.list -bigquery.jobs.listAll -bigquery.jobs.update -bigquery.models.create -bigquery.models.delete -bigquery.models.export -bigquery.models.getData -bigquery.models.getMetadata -bigquery.models.list -bigquery.models.updateData -bigquery.models.updateMetadata -bigquery.models.updateTag -bigquery.routines.create -bigquery.routines.delete -bigquery.routines.get -bigquery.routines.list -bigquery.routines.update -bigquery.routines.updateTag -bigquery.savedqueries.create -bigquery.savedqueries.delete -bigquery.savedqueries.get -bigquery.savedqueries.list -bigquery.savedqueries.update -bigquery.tables.create -bigquery.tables.createSnapshot -bigquery.tables.delete -bigquery.tables.deleteSnapshot -bigquery.tables.export -bigquery.tables.get -bigquery.tables.getData -bigquery.tables.getIamPolicy -bigquery.tables.list -bigquery.tables.restoreSnapshot -bigquery.tables.setCategory -bigquery.tables.update -bigquery.tables.updateData -bigquery.tables.updateTag -resourcemanager.projects.get -``` +### 2. Get an API JSON key :key: -
+Depending on how your app environments are set up, we recommend you use the
+service account created for the `development` environment on your localhost to
+test integration with BigQuery. This requires that your project is set up in
+Google Cloud as per the instructions above.
-
-#### Appender
-
-This role is assigned to the service account used by the application connecting to Google Cloud to append data to the `events` tables.
-
-<details>
-Using the GCloud CLI - -``` bash -gcloud iam roles create bigquery_appender_custom --title="BigQuery Appender Custom" --description="Assigned to service accounts used to append data to events tables." --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.updateData -``` - -
- -
-Using the GCloud IAM Web UI - -| Field | Value | -| ----------------- | ---------------------------------------------------------- | -| Title | **BigQuery Appender Custom** | -| Description | Assigned to service accounts used to append data to events tables. | -| ID | `bigquery_appender_custom` | -| Role launch stage | General Availability | -| + Add permissions | See below | - -##### Permissions for bigquery_appender_custom - -``` -bigquery.datasets.get -bigquery.tables.get -bigquery.tables.updateData -``` - -
-
-### 4. Create an appender service account
-
-1. Go to [IAM and Admin settings > Create service account](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project)
-1. Name it like "Appender NAME_OF_SERVICE ENVIRONMENT" e.g. "Appender ApplyForQTS Production"
-1. Add a description, like "Used when developing locally."
-1. Grant the service account access to the project, use the "BigQuery Appender Custom" role you set up earlier
-
-### 5. Get an API JSON key :key:
-
-1. Access the service account you previously set up
-1. Go to the keys tab, click on "Add key > Create new key"
-1. Create a JSON private key
+1. Access the `development` service account you previously set up
+1. Go to the keys tab, click on "Add key" > "Create new key"
+1. Create a JSON private key. This file will be downloaded to your local system.
 
 The full contents of this JSON file is your `BIGQUERY_API_JSON_KEY`.
 
-### 6. Set up environment variables
+Use the same steps to download a key for use in your deployed environments' secrets.
+
+### 3. Set up environment variables
 
 Putting the previous things together, to finish setting up `dfe-analytics`, you
 need these environment variables:
 
 ```
 BIGQUERY_PROJECT_ID=your-bigquery-project-name
 BIGQUERY_TABLE_NAME=your-bigquery-table-name
 BIGQUERY_DATASET=your-bigquery-dataset-name
 BIGQUERY_API_JSON_KEY=
 ```
 
-### 7. Configure BigQuery connection, feature flags etc
+### 4. Configure BigQuery connection, feature flags etc.
 
 ```bash
 bundle exec rails generate dfe:analytics:install
 ```
 
 and follow comments in `config/initializers/dfe_analytics.rb`.
 
 The `dfe:analytics:install` generator will also initialize some empty config
 files:
 
-| Filename | Purpose |
-|----------|---------|
-| `config/analytics.yml` | List all fields we will send to BigQuery |
-| `config/analytics_pii.yml` | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml` |
-| `config/analytics_blocklist.yml` | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task |
+| Filename                          | Purpose                                                                                                             |
+|-----------------------------------|---------------------------------------------------------------------------------------------------------------------|
+| `config/analytics.yml`            | List all fields we will send to BigQuery                                                                            |
+| `config/analytics_pii.yml`        | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in `analytics.yml`  |
+| `config/analytics_blocklist.yml`  | Autogenerated file to list all fields we will NOT send to BigQuery, to support the `analytics:check` task           |
 
-### 8. Check your fields
+### 5. Check your fields
 
 A good place to start is to run
 
@@ -384,7 +153,7 @@ config but missing from the database.
 **It's recommended to run this task regularly - at least as often as you run
 database migrations. Consider enhancing db:migrate to run it automatically.**
 
-### 9. Enable callbacks
+### 6. Enable callbacks
 
 Mix in the following modules.
 It's recommended to include them at the highest possible level in the
 inheritance hierarchy of your controllers and
diff --git a/docs/bigquery-new-query-button.png b/docs/bigquery-new-query-button.png
new file mode 100644
index 00000000..190348ec
Binary files /dev/null and b/docs/bigquery-new-query-button.png differ
diff --git a/create-events-table.sql b/docs/create-events-table.sql
similarity index 100%
rename from create-events-table.sql
rename to docs/create-events-table.sql
diff --git a/docs/google-cloud-shell-button.png b/docs/google-cloud-shell-button.png
new file mode 100644
index 00000000..2c11f4cf
Binary files /dev/null and b/docs/google-cloud-shell-button.png differ
diff --git a/docs/google_cloud_bigquery_setup.md b/docs/google_cloud_bigquery_setup.md
new file mode 100644
index 00000000..2e300f19
--- /dev/null
+++ b/docs/google_cloud_bigquery_setup.md
@@ -0,0 +1,339 @@
+# Google Cloud BigQuery Setup
+
+Before you can start to use BigQuery and send events to it with `dfe-analytics`
+you'll need to set up your project in the Google Cloud Platform (GCP).
+
+## Initial Configuration
+
+These steps need to be performed only once, when you first set up your Google
+Cloud project.
+
+### 1. Create a Google Cloud project
+
+Ask in Slack on the `#twd_data_insights` channel for someone to help you create
+your project in the `digital.education.gov.uk` Google Cloud Organisation.
+
+Each team is responsible for managing their project in Google Cloud. Ensure
+you've added users with the `Owner` role through the IAM section of Google
+Cloud.
+
+### 2. Set up billing
+
+You also need to set up your Google Cloud project with paid billing. This is
+because `dfe-analytics` uses streaming, and streaming to BigQuery isn't allowed
+in the free tier:
+
+```
+accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed
+in the free tier
+```
+
+The following steps can be accomplished without billing set up; however, there
+are certain restrictions:
+
+- Streaming data to BigQuery isn't allowed, so you won't be able to use
+  `dfe-analytics`.
+- Tables are limited to 60 days' retention. It's not clear if this limitation
+  is automatically lifted once the project is connected to a billing account;
+  tables may have to be modified or recreated.
+
+### 3. Create custom roles
+
+We use customised roles to give permissions to users who need to use BigQuery.
+
+Instructions are provided below and must be followed to create each role. There
+are two approaches to creating custom roles: the Google Cloud shell CLI, which
+is appropriate for advanced users comfortable with command-line interfaces, and
+the Google Cloud IAM web UI, which requires more manual work, especially when
+it comes to adding permissions.
+
+<details> <summary> 
Instructions for GCloud CLI + +> **NB:** These instructions are appropriate for people who are comfortable +> running shell commands. + +1. Go to the IAM section of the Google Console for your project. +2. Click the ![Google Cloud shell button](google-cloud-shell-button.png) to + activate the Google Cloud shell. +3. Copy the command provided into the shell, replacing `YOUR_PROJECT_ID` with + your own project ID. + +
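+
+If you're taking the CLI route, it's also worth confirming that the shell is
+pointing at the project you expect before creating any roles. A quick sanity
+check (assuming you're already authenticated in the Cloud Shell):
+
+``` bash
+gcloud config get-value project
+```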
+ +
Instructions for GCloud IAM Web UI + +> **NB:** Adding permissions to a role is a manual process that requires using +> the permission browser to add permissions one at a time. + +1. Go to the IAM section of the Google Console for your project. +1. Go to Roles section using the sidebar on the left. +1. Click on "+ Create role" near the top. +1. Fill in the details from the info below. + +
+ + +#### Analyst Role + +This role is used for analysts or other users who don't need to write to or +modify data in BigQuery. + +
Using the GCloud CLI + +``` bash +gcloud iam roles create bigquery_analyst_custom --title="BigQuery Analyst Custom" --description="Assigned to accounts used by analysts and SQL developers." --permissions=bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.routines.get,bigquery.routines.list,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.createSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,resourcemanager.projects.get --project=YOUR_PROJECT_ID +``` + +
+ +
Using the GCloud IAM Web UI + +| Field | Value | +|-------------------|-----------------------------------------------------------| +| Title | **BigQuery Analyst Custom** | +| Description | Assigned to accounts used by analysts and SQL developers. | +| ID | `bigquery_analyst_custom` | +| Role launch stage | General Availability | +| + Add permissions | See below | + +##### Permissions for `bigquery_analyst_custom` + +``` +bigquery.datasets.get +bigquery.datasets.getIamPolicy +bigquery.datasets.updateTag +bigquery.jobs.create +bigquery.jobs.get +bigquery.jobs.list +bigquery.jobs.listAll +bigquery.models.export +bigquery.models.getData +bigquery.models.getMetadata +bigquery.models.list +bigquery.routines.get +bigquery.routines.list +bigquery.savedqueries.create +bigquery.savedqueries.delete +bigquery.savedqueries.get +bigquery.savedqueries.list +bigquery.savedqueries.update +bigquery.tables.createSnapshot +bigquery.tables.export +bigquery.tables.get +bigquery.tables.getData +bigquery.tables.getIamPolicy +bigquery.tables.list +bigquery.tables.restoreSnapshot +resourcemanager.projects.get +``` + +
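+
+Whichever route you used, you can confirm the role now exists. A quick check
+from the CLI (assuming `YOUR_PROJECT_ID` is replaced with your own project ID):
+
+``` bash
+gcloud iam roles describe bigquery_analyst_custom --project=YOUR_PROJECT_ID
+```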
+ +#### Developer Role + +This role is used for developers or other users who need to be able to write to +or modify data in BigQuery. + +
Using the GCloud CLI + +``` bash +gcloud iam roles create bigquery_developer_custom --title="BigQuery Developer Custom" --description="Assigned to accounts used by developers." --permissions=bigquery.connections.create,bigquery.connections.delete,bigquery.connections.get,bigquery.connections.getIamPolicy,bigquery.connections.list,bigquery.connections.update,bigquery.connections.updateTag,bigquery.connections.use,bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.datasets.update,bigquery.datasets.updateTag,bigquery.jobs.create,bigquery.jobs.delete,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.models.create,bigquery.models.delete,bigquery.models.export,bigquery.models.getData,bigquery.models.getMetadata,bigquery.models.list,bigquery.models.updateData,bigquery.models.updateMetadata,bigquery.models.updateTag,bigquery.routines.create,bigquery.routines.delete,bigquery.routines.get,bigquery.routines.list,bigquery.routines.update,bigquery.routines.updateTag,bigquery.savedqueries.create,bigquery.savedqueries.delete,bigquery.savedqueries.get,bigquery.savedqueries.list,bigquery.savedqueries.update,bigquery.tables.create,bigquery.tables.createSnapshot,bigquery.tables.delete,bigquery.tables.deleteSnapshot,bigquery.tables.export,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.getIamPolicy,bigquery.tables.list,bigquery.tables.restoreSnapshot,bigquery.tables.setCategory,bigquery.tables.update,bigquery.tables.updateData,bigquery.tables.updateTag,resourcemanager.projects.get --project=YOUR_PROJECT_ID +``` + +
+ +
Using the GCloud IAM Web UI + +| Field | Value | +| ----------------- | ---------------------------------------- | +| Title | **BigQuery Developer Custom** | +| Description | Assigned to accounts used by developers. | +| ID | `bigquery_developer_custom` | +| Role launch stage | General Availability | +| + Add permissions | See below | + +##### Permissions for `bigquery_developer_custom` + +``` +bigquery.connections.create +bigquery.connections.delete +bigquery.connections.get +bigquery.connections.getIamPolicy +bigquery.connections.list +bigquery.connections.update +bigquery.connections.updateTag +bigquery.connections.use +bigquery.datasets.create +bigquery.datasets.delete +bigquery.datasets.get +bigquery.datasets.getIamPolicy +bigquery.datasets.update +bigquery.datasets.updateTag +bigquery.jobs.create +bigquery.jobs.delete +bigquery.jobs.get +bigquery.jobs.list +bigquery.jobs.listAll +bigquery.jobs.update +bigquery.models.create +bigquery.models.delete +bigquery.models.export +bigquery.models.getData +bigquery.models.getMetadata +bigquery.models.list +bigquery.models.updateData +bigquery.models.updateMetadata +bigquery.models.updateTag +bigquery.routines.create +bigquery.routines.delete +bigquery.routines.get +bigquery.routines.list +bigquery.routines.update +bigquery.routines.updateTag +bigquery.savedqueries.create +bigquery.savedqueries.delete +bigquery.savedqueries.get +bigquery.savedqueries.list +bigquery.savedqueries.update +bigquery.tables.create +bigquery.tables.createSnapshot +bigquery.tables.delete +bigquery.tables.deleteSnapshot +bigquery.tables.export +bigquery.tables.get +bigquery.tables.getData +bigquery.tables.getIamPolicy +bigquery.tables.list +bigquery.tables.restoreSnapshot +bigquery.tables.setCategory +bigquery.tables.update +bigquery.tables.updateData +bigquery.tables.updateTag +resourcemanager.projects.get +``` + +
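+
+Creating the role doesn't grant it to anybody by itself. You can assign it
+through the IAM web UI, or from the CLI with something like the following
+sketch (the member email address is a placeholder):
+
+``` bash
+gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+  --member="user:firstname.lastname@digital.education.gov.uk" \
+  --role="projects/YOUR_PROJECT_ID/roles/bigquery_developer_custom"
+```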
+ +#### Appender Role + +This role is assigned to the service account used by the application connecting +to Google Cloud to append data to the `events` tables. + +
Using the GCloud CLI </summary>
+
+``` bash
+gcloud iam roles create bigquery_appender_custom --title="BigQuery Appender Custom" --description="Assigned to service accounts used to append data to events tables." --permissions=bigquery.datasets.get,bigquery.tables.get,bigquery.tables.updateData --project=YOUR_PROJECT_ID
+```
+
+</details>
+ +
Using the GCloud IAM Web UI + +| Field | Value | +|-------------------|--------------------------------------------------------------------| +| Title | **BigQuery Appender Custom** | +| Description | Assigned to service accounts used to append data to events tables. | +| ID | `bigquery_appender_custom` | +| Role launch stage | General Availability | +| + Add permissions | See below | + +##### Permissions for bigquery_appender_custom + +``` +bigquery.datasets.get +bigquery.tables.get +bigquery.tables.updateData +``` + +
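+
+At this point all three custom roles should exist in your project. If you used
+the CLI, a quick way to double-check is to list them:
+
+``` bash
+gcloud iam roles list --project=YOUR_PROJECT_ID
+```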
+
+## Dataset and Table Setup
+
+`dfe-analytics` inserts events into a table in BigQuery with a pre-defined
+schema. Access is given using a service account that has access to append data
+to the given events table. The recommended setup is to have a separate dataset
+and service account for each application/environment combination in your
+project.
+
+For example, let's say you have the applications `publish` and `find` in your
+project, and use `development`, `qa`, `staging` and `production` environments.
+You should create a separate dataset for each combination of the above, as well
+as a separate service account that has access to append data to events in only
+one dataset. The following table illustrates how this might look for this
+example:
+
+| Application | Environment | BigQuery Dataset           | Service Account                                               |
+|-------------|-------------|----------------------------|---------------------------------------------------------------|
+| publish     | development | publish_events_development | appender-publish-development@project.iam.gserviceaccount.com  |
+| publish     | qa          | publish_events_qa          | appender-publish-qa@project.iam.gserviceaccount.com           |
+| publish     | staging     | publish_events_staging     | appender-publish-staging@project.iam.gserviceaccount.com      |
+| publish     | production  | publish_events_production  | appender-publish-production@project.iam.gserviceaccount.com   |
+| find        | development | find_events_development    | appender-find-development@project.iam.gserviceaccount.com     |
+| find        | qa          | find_events_qa             | appender-find-qa@project.iam.gserviceaccount.com              |
+| find        | staging     | find_events_staging        | appender-find-staging@project.iam.gserviceaccount.com         |
+| find        | production  | find_events_production     | appender-find-production@project.iam.gserviceaccount.com      |
+
+This approach helps prevent the possibility of sending events to the wrong
+dataset, and reduces the risk should a secret key for one of these accounts be
+leaked.
+
+> **NB:** It may be easier to perform these instructions with two browser tabs
+> open, one for BigQuery and the other for IAM.
+
+### 1. Create dataset(s)
+
+Start by creating a dataset.
+
+1. Open your project's BigQuery instance and go to the SQL Workspace section.
+2. Tap on the 3 dots next to the project name, then "Create dataset".
+3. Name it something like `APPLICATIONNAME_events_ENVIRONMENT`, as per the
+   examples above, e.g. `publish_events_development`, and set the location to
+   `europe-west2 (London)`.
+
+### 2. Create the events table
+
+Once the dataset is ready, you need to create the `events` table in it:
+
+1. Select your new dataset and click the ![BigQuery new query
+   button](bigquery-new-query-button.png) to open a new query execution tab.
+2. Copy the contents of [create-events-table.sql](create-events-table.sql)
+   into the query editor.
+3. Edit your project and dataset names in the query editor.
+4. Run the query to create a blank events table.
+
+BigQuery allows you to copy a table to a new dataset, so now is a good time to
+create all the datasets you need and copy the blank `events` table to each of
+them.
+
+### 3. Create an appender service account
+
+Create a service account that will be given permission to append data to tables
+in the new dataset.
+
+1. Go to [IAM and Admin settings > Create service
+   account](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project)
+2. Name it like "Appender NAME_OF_SERVICE ENVIRONMENT", e.g. "Appender
+   ApplyForQTS Development".
+3. 
Add a description, like "Used for appending data from development
+   environments."
+4. Copy the email address using the button next to it. You'll need this in the
+   next step to give this account access to your dataset.
+5. Click the "CREATE AND CONTINUE" button.
+6. Click "DONE", skipping the steps to grant roles and user access to this
+   account. Access will be given to the specific dataset in the next step.
+
+### 4. Give the service account access to your dataset
+
+Ensure you have the email address of the service account handy for this.
+
+1. Go to the dataset you created and click "SHARING" > "Permissions" near the
+   top right.
+2. Click "ADD PRINCIPAL".
+3. Paste the email address of the service account you created into the "New
+   principals" box.
+4. Select the "BigQuery Appender Custom" role you created previously.
+5. Click "SAVE" to finish.
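+
+If you're comfortable with the CLI, the service account in step 3 and its JSON
+key can also be created from the shell. A rough sketch for a hypothetical
+`publish` app in its `development` environment - adjust the names and project
+ID to your own setup, and note that granting dataset access (step 4) is still
+easiest through the BigQuery UI as described above:
+
+``` bash
+# Create the appender service account (the name is an example)
+gcloud iam service-accounts create appender-publish-development \
+  --project=YOUR_PROJECT_ID \
+  --display-name="Appender Publish Development" \
+  --description="Used for appending data from development environments."
+
+# Download a JSON key for it - its contents become BIGQUERY_API_JSON_KEY
+gcloud iam service-accounts keys create key.json \
+  --iam-account=appender-publish-development@YOUR_PROJECT_ID.iam.gserviceaccount.com
+```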