Utility project to maintain BigQuery views. The main interface is the CLI.
Main features:
- Synchronize BigQuery Views between GCP and local file system
- Materialize Views (by running a view and saving it to a table):
- BigQuery Materialized Views are now available as Pre-GA
- Python 3
- Google Cloud SDK for gcloud
pip install bigquery-views-manager
SQL code of the view queries is assumed to be in files with the .sql
files. By default they will be in the views
directory.
The files can contain placeholders (surrounded by curly brackets, e.g. {placeholder}
) for the following variables:
name | description |
---|---|
project | The GCP project |
dataset | The BigQuery dataset |
Using the placeholders allows you to deploy the views to for example test, staging and production separately.
Example:
SELECT *
FROM `{project}.{dataset}.view1`
The views.yml
file contains the list of views that should be processed. It is important that the list of views are in the correct insert order. i.e. if v_view2
depends on v_view1
then v_view1
should appear first.
The format is a yaml file. In the simplest case it will be the list of the views, e.g.:
- v_view1
- v_view2
Additional parameters can be added, e.g. to materialize v_view1
:
- v_view1:
materialize: true
- v_view2
Or to materialize v_view1
to another table name:
- v_view1:
materialize: true
materialize_as: output_table1
- v_view2
The dataset could also be specified:
- v_view1:
materialize: true
materialize_as: output_dataset1.output_table1
- v_view2
When working with multiple datasets, this can also be conditional:
- v_view1:
materialize: true
conditions:
- if:
dataset: source_dataset1
materialize_as: "output_dataset1.output_table1"
- v_view2
The condition will depend on the passed in --dataset
.
Config tables are tables loaded from CSV. They are meant to assist views with configuration data, rather than loading large data. Config tables are generally used by views to avoid having to hard-code certain values in the views.
It is assumed that the filename is target table name with the .csv
file extension. By default in the ./config-tables/tables
directory. A BigQuery table schema can be specified via a file with the _schema.json
in the ./config-tables/schema
directory.
Another directory can be specified via the --config-tables-base-dir
CLI argument.
See example-data.
To get the command help:
python -m bigquery_views_manager --help
Or:
python -m bigquery_views_manager <sub-command> --help
python -m bigquery_views_manager \
create-or-replace-views \
--dataset=my_dataset \
[--view-list-config=/path/to/views.yml] \
[<view name> [<other view name> ...]]
Adding the --materialize
flag will additionally materialize the views (where it has been enabled). In that case views will be materialized immediately after updating a view.
python -m bigquery_views_manager \
materialize-views \
--dataset=my_dataset \
[--view-list-config=/path/to/views.yml] \
[<view name> [<other view name> ...]]
Show differences between local views and views within BigQuery.
python -m bigquery_views_manager \
diff-views \
--dataset=my_dataset \
[--view-list-config=/path/to/views.yml] \
[<view name> [<other view name> ...]]
Copy views from BigQuery to the local file system.
To get all of the files listed in views/views.yml
:
python -m bigquery_views_manager \
get-views \
--dataset=my_dataset \
[--view-list-config=/path/to/views.yml]
To get a particular view or views:
python -m bigquery_views_manager \
get-views \
--dataset=my_dataset \
[--view-list-config=/path/to/views.yml] \
<view name> [<other view name> ...]
When views are retrieved, the project name and dataset are replaced with placeholders.
Copy config tables (CSV) to BigQuery. The config tables are by default stored in ./config-tables
.
python -m bigquery_views_manager \
create-or-replace-config-tables \
--dataset=my_dataset \
[--config-tables-base-dir=/path/to/config-tables] \
[<table name> ...]
Add the view to the views
directory with the view name and .sql
file extension.
The view name also needs to be added to views/views.yml
in the correct order (i.e. if a view depends on another view, the other view should appear first).
The CLI also supports additional sub commands to delete views etc. Those are in particular use-ful in a CI environment.
The following commands are supported:
delete-config-tables
delete-views
delete-materialized-tables
There is also a Docker Image that can be used directly:
docker pull elifesciences/bigquery-views-manager:latest
docker run --rm \
--volume ~/.config/gcloud:/root/.config/gcloud \
--volume $$PWD/views:/data/views \
--env GOOGLE_CLOUD_PROJECT=my-gcp-project \
elifesciences/bigquery-views-manager:latest \
diff-views \
--dataset my_dataset