Merge pull request #520 from EverythingMe/docs-datasources
Docs: update documentation about data sources
arikfr committed Aug 2, 2015
2 parents 464402a + 6e45706 commit 05d1886
Showing 3 changed files with 88 additions and 172 deletions.
249 changes: 85 additions & 164 deletions docs/datasources.rst
Supported Data Sources
######################

re:dash supports several types of data sources, and if you set it up using the provided images, it should already have
the needed dependencies to use them all. Starting from version 0.7, you can manage data sources from the UI
by browsing to ``/data_sources`` on your instance.

If one of the listed data source types isn't available when trying to create a new data source, make sure that:

1. You installed the required dependencies.
2. If you've set a custom value for the ``REDASH_ENABLED_QUERY_RUNNERS`` setting, the relevant query runner is included in the list.
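
The ``REDASH_ENABLED_QUERY_RUNNERS`` setting is a comma-separated list of query runner modules. As a rough illustration (not re:dash's actual implementation, and with made-up module values), parsing such a setting looks like this:

```python
import os

# Illustrative sketch: REDASH_ENABLED_QUERY_RUNNERS holds a comma-separated
# list of query runner modules; a data source type is only offered in the UI
# if its module appears in this list (and its dependencies import cleanly).
DEFAULT_RUNNERS = [
    "redash.query_runner.pg",
    "redash.query_runner.mysql",
    "redash.query_runner.mongodb",
]

def enabled_runners(env=os.environ, default=DEFAULT_RUNNERS):
    raw = env.get("REDASH_ENABLED_QUERY_RUNNERS")
    if raw is None:
        return list(default)
    # Split on commas and drop surrounding whitespace and empty entries.
    return [name.strip() for name in raw.split(",") if name.strip()]

# With a custom value set, only the listed runners are enabled:
custom = {"REDASH_ENABLED_QUERY_RUNNERS": "redash.query_runner.pg, redash.query_runner.mysql"}
print(enabled_runners(custom))
```

If a data source type is missing from the UI, checking the parsed list against the runner's module name is a quick first diagnostic.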
PostgreSQL / Redshift
---------------------

- **Options**:

  - Database name (mandatory)
  - User
  - Password
  - Host
  - Port

- **Additional requirements**:

  - None

MySQL
-----

- **Options**:

  - Database name (mandatory)
  - User
  - Password
  - Host
  - Port

- **Additional requirements**:

  - ``MySQL-python`` python package


Google BigQuery
---------------

- **Options**:

  - Project ID (mandatory)
  - JSON key file, generated when creating a service account (see `instructions <https://developers.google.com/console/help/new/#serviceaccounts>`__).

- **Additional requirements**:

  - ``google-api-python-client``, ``oauth2client`` and ``pyopenssl`` python packages (on Ubuntu it might require installing ``libffi-dev`` and ``libssl-dev`` as well).


Graphite
--------

- **Options**:

  - Url (mandatory)
  - User
  - Password
  - Verify SSL certificate

MongoDB
-------

- **Options**:

  - Connection String (mandatory)
  - Database name
  - Replica set name

- **Additional requirements**:

  - ``pymongo`` python package.

For information on how to write MongoDB queries, see the :doc:`documentation </usage/mongodb_querying>`.

ElasticSearch
-------------

...

InfluxDB
--------

...

Presto
------

...

Hive
----

...

Impala
------

...

URL
---

A URL based data source which requests URLs that return the :doc:`results JSON
format </dev/results_format>`.

Very useful in situations where you want to expose the data without
connecting directly to the database.
The query itself inside re:dash will simply contain the URL to be
executed (e.g. http://myserver/path/myquery).

- **Options**:

  - Url - set this if you want to limit queries to a certain base path.

Google Spreadsheets
-------------------

- **Options**:

  - JSON key file, generated when creating a service account (see `instructions <https://developers.google.com/console/help/new/#serviceaccounts>`__).

- **Additional requirements**:

  - ``gspread`` and ``oauth2client`` python packages.

Notes:

1. To be able to load the spreadsheet in re:dash - share it with
   your ServiceAccount's email (it can be found in the credentials json
   file, for example
   43242343247-fjdfakljr3r2@developer.gserviceaccount.com).
2. The query format is "DOC\_UUID\|SHEET\_NUM" (for example
   "kjsdfhkjh4rsEFSDFEWR232jkddsfh\|0").
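
The ``DOC_UUID|SHEET_NUM`` query format splits on the pipe character; a tiny sketch using the made-up UUID from the example above:

```python
# The Google Spreadsheets query is just "DOC_UUID|SHEET_NUM": the document
# key followed by a sheet index. A minimal sketch of splitting it apart.
query = "kjsdfhkjh4rsEFSDFEWR232jkddsfh|0"
doc_uuid, _, sheet = query.partition("|")
sheet_index = int(sheet)
print(doc_uuid, sheet_index)
```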


Python
------

**Execute other queries, manipulate and compute with Python code**

This is a special query runner that will execute the provided Python code as the query. Useful for various scenarios such as
merging data from different data sources, doing data transformation/manipulation that isn't trivial with SQL, merging
with remote data or using data analysis libraries such as Pandas (see `example query <https://gist.github.com/arikfr/be7c2888520c44cf4f0f>`__).

While the Python query runner uses a sandbox (RestrictedPython), it's not 100% secure and the security depends on the
modules you allow to import. We recommend enabling the Python query runner only in a trusted environment (meaning: behind
a VPN and with users you trust).

- **Options**:

  - Allowed Modules in a comma separated list (optional). **NOTE:**
    You MUST make sure these modules are installed on the machine
    running the Celery workers.

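
As a rough sketch of what a Python data source query can look like: the real runner provides helper functions for building the result set (see the example query linked above for its actual API); they are stubbed locally here so the sketch is self-contained, and the helper names should be treated as assumptions:

```python
# Stub stand-ins for helpers the Python query runner provides; defined
# locally so this illustration runs on its own.
def add_result_column(result, name, friendly_name, column_type):
    result.setdefault("columns", []).append(
        {"name": name, "friendly_name": friendly_name, "type": column_type})

def add_result_row(result, row):
    result.setdefault("rows", []).append(row)

# A "query" that synthesizes a small result set, e.g. after merging data
# fetched from two other data sources.
result = {}
add_result_column(result, "source", "Source", "string")
add_result_column(result, "events", "Events", "integer")
for source, events in [("mysql", 120), ("mongodb", 45)]:
    add_result_row(result, {"source": source, "events": events})

print(len(result["rows"]))
```

The code's job is to leave ``result`` holding the same columns/rows structure every other query runner produces, which is why the output can be charted and dashboarded like any SQL result.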
10 changes: 3 additions & 7 deletions docs/setup.rst

Once you created the instance with either the image or the script, you
should have a running re:dash instance with everything you need to get
started. You can now log in to it with the user "admin" (password:
"admin"). But to make it useful, there are a few more steps that you
need to manually do to complete the setup:

Datasources
-----------

To make re:dash truly useful, you need to set up your data sources in it. Browse to ``/data_sources`` on your instance
to create a new data source connection.

See the :doc:`documentation </datasources>` for the different options. Your instance comes ready with the dependencies
needed to set up all supported sources.


How to upgrade?
---------------

1 change: 0 additions & 1 deletion redash/settings.py
The default query runner list in ``all_settings()`` no longer includes the script query runner:

.. code:: python

    'redash.query_runner.mongodb',
    'redash.query_runner.mysql',
    'redash.query_runner.pg',
    'redash.query_runner.url',
    'redash.query_runner.influx_db',
    'redash.query_runner.elasticsearch',
