Added FAQ and db dependencies to docs
mistercrunch committed Apr 8, 2016
1 parent eff0beb commit 0afa5d2
Showing 3 changed files with 70 additions and 1 deletion.
34 changes: 34 additions & 0 deletions docs/faq.rst
@@ -0,0 +1,34 @@
FAQ
===


Can I query/join multiple tables at one time?
---------------------------------------------
Not directly, no. A Caravel SQLAlchemy datasource can only be a single table
or a view.

When working with tables, the solution would be to materialize
a table that contains all the fields needed for your analysis, most likely
through some scheduled batch process.
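
As a minimal sketch of that materialization step, assuming two hypothetical
tables ``orders`` and ``customers`` in an in-memory SQLite database, a batch
job could rebuild a denormalized table along these lines. The table names,
column names and the ``materialize_orders_wide`` helper are illustrative
assumptions, not part of Caravel.

```python
import sqlite3


def materialize_orders_wide(conn):
    """Rebuild a denormalized table holding every field needed for analysis."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS orders_wide")
        conn.execute(
            """
            CREATE TABLE orders_wide AS
            SELECT o.id AS order_id, o.amount, c.name AS customer_name, c.country
            FROM orders AS o
            JOIN customers AS c ON c.id = o.customer_id
            """
        )


# Hypothetical source tables, created here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'UK'), (2, 'Linus', 'FI');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 25.5), (3, 1, 7.25);
    """
)

materialize_orders_wide(conn)
rows = conn.execute(
    "SELECT order_id, amount, customer_name, country FROM orders_wide ORDER BY order_id"
).fetchall()
print(rows)
```

In practice the function would be called by whatever scheduler you already
use, and ``orders_wide`` is the single table you would register as a datasource.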

A view is a simple logical layer that abstracts an arbitrary SQL query as
a virtual table. This allows you to join and union multiple tables, and
to apply transformations using arbitrary SQL expressions. The limitation
there is your database's performance, as Caravel effectively runs a query
on top of your query (the view). A good practice is to limit yourself to
joining your main large table to one or more small tables only, and to avoid
using ``GROUP BY`` where possible, as Caravel will do its own ``GROUP BY`` and
doing the work twice might slow down performance.
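
The sketch below illustrates that practice with an in-memory SQLite database.
It assumes a hypothetical large table ``events`` and a small lookup table
``countries``; the view only joins and renames columns, and the aggregation
is left to the query issued on top of it. The final ``SELECT`` only
approximates the shape of an aggregate query and is not the exact SQL that
Caravel generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: one large fact table and one small lookup table.
    CREATE TABLE events (id INTEGER PRIMARY KEY, country_code TEXT, clicks INTEGER);
    CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
    INSERT INTO countries VALUES ('FR', 'France'), ('JP', 'Japan');
    INSERT INTO events (country_code, clicks) VALUES ('FR', 3), ('JP', 5), ('FR', 4);

    -- The view joins and renames only; it deliberately contains no GROUP BY.
    CREATE VIEW events_enriched AS
    SELECT e.id, e.clicks, c.name AS country
    FROM events AS e
    JOIN countries AS c ON c.code = e.country_code;
    """
)

# An aggregate query of the kind a charting tool might run on top of the view.
totals = conn.execute(
    "SELECT country, SUM(clicks) FROM events_enriched GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```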

Whether you use a table or a view, the important factor is whether your
database is fast enough to serve it in an interactive fashion to provide
a good user experience in Caravel.


How BIG can my data source be?
------------------------------

It can be gigantic! As mentioned above, the main criterion is whether your
database can execute queries and return results in a time frame that is
acceptable to your users. Many distributed databases out there can execute
queries that scan through terabytes in an interactive fashion.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -34,6 +34,7 @@ Contents
tutorial
videos
gallery
faq


Indices and tables
36 changes: 35 additions & 1 deletion docs/installation.rst
@@ -121,6 +121,40 @@ the `Flask App Builder Documentation
<http://flask-appbuilder.readthedocs.org/en/latest/config.html>`_
for more information on how to configure Caravel.

Database dependencies
---------------------

Caravel does not ship bundled with connectivity to databases, except
for SQLite, which is part of the Python standard library.
You'll need to install the required packages for the database you
want to use as your metadata database, as well as the packages needed to
connect to the databases you want to access through Caravel.

Here's a list of some of the recommended packages.

+---------------+-------------------------------------+-------------------------------------------------+
| Database      | PyPI package                        | SQLAlchemy URI prefix                           |
+===============+=====================================+=================================================+
| MySQL         | ``pip install mysqlclient``         | ``mysql://``                                    |
+---------------+-------------------------------------+-------------------------------------------------+
| Postgres      | ``pip install psycopg2``            | ``postgresql+psycopg2://``                      |
+---------------+-------------------------------------+-------------------------------------------------+
| Presto        | ``pip install pyhive``              | ``presto://``                                   |
+---------------+-------------------------------------+-------------------------------------------------+
| Oracle        | ``pip install cx_Oracle``           | ``oracle://``                                   |
+---------------+-------------------------------------+-------------------------------------------------+
| SQLite        |                                     | ``sqlite://``                                   |
+---------------+-------------------------------------+-------------------------------------------------+
| Redshift      | ``pip install sqlalchemy-redshift`` | ``redshift+psycopg2://``                        |
+---------------+-------------------------------------+-------------------------------------------------+
| MSSQL         | ``pip install pymssql``             | ``mssql://``                                    |
+---------------+-------------------------------------+-------------------------------------------------+

Note that many other databases are supported, the main criterion being the
existence of a functional SQLAlchemy dialect and Python driver. Searching
the web for the keyword ``sqlalchemy`` together with a keyword that describes
the database you want to connect to should get you to the right place.
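
Before pointing Caravel at a database, it can help to confirm that the
matching driver is importable in the current environment. The snippet below
is a minimal sketch using only the standard library. The ``DRIVER_MODULES``
mapping and the ``installed_drivers`` helper are assumptions made for
illustration; note that the import name can differ from the PyPI package
name (``mysqlclient`` is imported as ``MySQLdb``).

```python
import importlib.util

# Import names for some of the drivers listed above (illustrative mapping,
# not something Caravel itself defines).
DRIVER_MODULES = {
    "MySQL": "MySQLdb",
    "Postgres": "psycopg2",
    "Presto": "pyhive",
    "Oracle": "cx_Oracle",
    "SQLite": "sqlite3",
    "MSSQL": "pymssql",
}


def installed_drivers(modules=DRIVER_MODULES):
    """Return {database: bool} saying whether each driver module can be found."""
    return {
        db: importlib.util.find_spec(name) is not None
        for db, name in modules.items()
    }


status = installed_drivers()
for db, ok in sorted(status.items()):
    print(f"{db:10s} {'installed' if ok else 'missing'}")
```

Any database reported as missing needs its package installed with ``pip``
before its SQLAlchemy URI will work.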


Caching
-------
@@ -147,7 +181,7 @@ parameters exposed by SQLAlchemy. In the ``Database`` edit view, you will
find an ``extra`` field as a ``JSON`` blob.

.. image:: _static/img/tutorial/add_db.png
- :scale: 50 %
+ :scale: 30 %

This JSON string contains extra configuration elements. The ``engine_params``
object gets unpacked into the