- Cloak
This component acts as a query layer over sensitive data. It works together with the air system to allow users to issue queries which return aggregated anonymized statistics (properties and counts).
You need to have following installed:
- Erlang and Elixir (see here for details)
- PostgreSQL 9.4 or more recent
- linux:
- packages:
unixodbc
,odbc-postgresql
- Configured ODBC DSN for PostgreSQL - execute the following as root from the
cloak
folder:odbcinst -i -s -l -f priv/odbc/odbc.ini
- packages:
- macos
- packages (homebrew):
unixodbc
,psqlodbc
- Configured ODBC DSN for PostgreSQL - execute the following as root from the
cloak
folder: `odbcinst -i -s -l -f priv/odbc/osx/odbc.ini - Double check the macos guide to ensure you have Erlang built with support for odbc.
brew install wget
as the Makefile relies onwget
to install ODBC drivers
- packages (homebrew):
Run mix deps.get
to install the dependencies, followed by make
or make start
to build or build and start
the cloak, respectively.
You can override some of the configuration settings by creating a config/dev.local.exs
file. The format is the same as
config/dev.exs
.
You can create a database with initial data by running make recreate-db
.
It assumes you have a postgres server running on port 20002, which is started by default with start_dependencies.sh.
If everything is properly installed and setup, standard tests invoked with make test
should pass.
Cloaks have two sets of configuration files.
One set contains static configuration that remain the same across
installations. These configuration files can be found under config/*
.
The other contain configuration parameters that are potentially unique to a particular installation,
and which are what you are going to adapt when installing a cloak. These configuration files are written
as a JSON file that is mounted into the deployed cloak container.
A sample cloak configuration can be found under priv/config/dev.json
.
The skeleton of a configuration looks like this:
{
"air_site": "<URL OF AIR INSTALLATION>",
"data_sources": []
}
The data sources section contain a list of individual data source configurations. A data source configuration might look like this:
{
"driver": "<driver-name>",
"marker": "<optional marker>",
"parameters": {
...
},
"tables": {
"<table name as seen by user>": {
"db_name": "<table as named in the database>",
"user_id": "<user id column name in the table>"
},
...
}
},
A data source requires a driver supported by the cloak. For example postgressql
or odbc
.
The ODBC connector is a catch-all for drivers that don't have native support in the cloak.
The parameters are used by the driver to connect to the data store. In the case of the ODBC
driver, the
parameters are concatenated into an ODBC connection string.
Data sources are identified by a global identifier created from a combination of the connection parameters.
As a result, multiple distinct cloaks connecting to the same data store, will all make available a data source
with the same id. If you want to make multiple data sources unique despite being backed by the same database and user,
you can use the marker
option to add more context to the global identifier produced.
The tables
section lists the set of tables that should be made available to the analyst.
It allows the system to be configured such that the table name presented to the analyst doesn't necessarily
reflect the name used by the underlying data store.
make start
- starts the systemmake test
- runs standard tests (unit and simple property tests)make dialyze
- runs the dialyzermake lint
- style checking of Elixir codemake doc
- generates the documentationmake release-local
- creates the OTP release that can be run locally (on a dev machine)mix coveralls.html
- runs ExUnit tests with test coverage (generates an HTML output in thecover
folder)
Occasionally you may want to run only some tests. Depending on the kind of test, you have the following commands at your disposal:
mix test
- ExUnit testsmix eunit
- EUnit (Erlang) testsmake proper
- runs all property testsmake proper-std
- runs only simple property testsmake proper-extended
- runs extended (longer running) property testsmix test/some_test_script.exs
to run a single ExUnit test suite. See here for more information.mix eunit --module erlang_module
EUnit test of a single Erlang module.mix proper --module erlang_module
PropEr test of a single module.
Note that when specifying Erlang modules, you need to provide the name of the real module and not the test one. For
example, let's say you have the module anonymizer
and property tests are residing in the anonymizer_test
module. The
corresponding command is mix proper --module anonymizer
(without the _test
suffix).
By default, only native PostgreSQL adapter is tested locally, while other drivers are excluded. To change this you can run following commands:
mix test --only compliance
- to run only the compliance testsmake test_all
- to run all tests which are running on CI: standard tests, and tests for all other database adapters (Oracle, ...). Note however that compliance tests are going to be executed on a reduced database set (as specified incompliance.json
).
In order to have working tests on other drivers, you need to start corresponding database servers locally - see Installing database servers.
Each compliance test gets its tag which is reported for each failing test. This simplifies running a single compliance test. Example:
mix test --only "compliance:upper(<col>) changes.change notes_changes"
Note that there is no space after :
.
In addition, tests for each functions are tagged, so one can exercise a single function with:
mix test --only "pow(<col1>, <col2>)"
It is possible to run cloak as a local docker container:
- Make sure all the air dependencies are started with
../air/start_dependencies.sh
. - If needed, create the cloak database on docker with
DB_PORT=20002 ./regenerate-db.sh
. - Start the
air
container with../air/container.sh console
. - Run
./build_image.sh
to create the docker image. - Start the container with
./container.sh console
.
You can now interact with the cloak via the dockerized air (http://localhost:8080 or https://insights.air-local:8443).
To run the full compliance CI test locally with the command make ci.compliance
. This will start the required database
containers, start the cloak container, generate test data, and then enter the bash shell in the container. Now, you can
run tests with mix test --only compliance
. The container uses the source files mounted from your host, so you can
easily edit those files and repeatedly run the tests without needing to rebuild the image.
If you want to test some specific databases, you can set the CLOAK_DATA_SOURCES
env variable. For example, to test
only PostgreSQL and Oracle, you can run the following command:
CLOAK_DATA_SOURCES="postgresql oracle" make ci.compliance
The CLOAK_DATA_SOURCES
env var is a whitespace separated list of data source names which you want to use in the test
suite. For the list of available names, take a look at configuration files in this folder.
The default number of generated users is 10. You can change this by setting the COMPLIANCE_USERS
env variable:
COMPLIANCE_USERS=50 make ci.compliance
A task is available to run randomly-generated queries against the current version of cloak (fuzz testing). To invoke it:
MIX_ENV=test mix fuzzer.run --queries 10
Note that the queries
option specifies how many queries to run. This test runs against the same configuration as the
compliance tests. You can use mix gen.test_data
to create these databases:
MIX_ENV=test mix gen.test_data compliance 10
Or you can run from inside the compliance container to have access to different data source types.
The fuzzer creates three output files (by default /tmp/all.txt
, /tmp/stats.txt
, and /tmp/crashes.txt
) where the
results are stored. The crashes
file contains all queries that failed in an unexpected way along with the details of
the error - this should be the most interesting output for regular usage, the other outputs are mostly meant for
debugging the fuzzer itself. Each query in crashes
is a potential bug; however, it might also be a bug in the fuzzer.
The all
file lists all queries executed and the class of result obtained for each of them - ok
means the query
executed successfully, unexpected_error
means the query failed in a way unknown to the fuzzer, and the other
categories are different kinds of common errors, like anonymity constraint violations. The stats
file lists the kinds
of results from the all
file along with the number of times each result was obtained.
See here.
Before running the tests you need to prepare the performance database.
make recreate-perf-db
- generate apreformance
DB with 10k usersmake perftest
- run the performance tests
Note that the tests submit results to InfluxDB - it will be started with start_dependencies.sh
.
Dev container is built to allow developers who work on macOS to develop against other databases, such as Oracle. Before starting the container, you need to first build the air container and start it locally (with air/build-image.sh
and air/container.sh console
). Then, you can start the cloak dev container by going to the cloak folder and invoking make dev-container
.
Once the container is started, you can start the cloak with make start
, and visit http://localhost:8080
. At this point, you should be able to access Oracle datasources.
The source files of the cloak project are mounted into the container. Therefore, you can freely edit the source files, and restart the cloak without needing to rebuild and restart the container.