docs: Updates to Superset Site for 1.0 (#12626)
* incorporating precommit logic

* add 1.0 page

* fixed annoying docz config issue 2

* tweaked indentation

* added asf link 2

* changed Dockerhub link

* reverted frontend package lock json: precommit
srinify authored Jan 26, 2021
1 parent da63b4b commit 017f11f
Showing 53 changed files with 250 additions and 278 deletions.
@@ -1,11 +1,11 @@
---
name: New Drivers to Docker Image
name: Adding New Drivers in Docker
menu: Connecting to Databases
route: /docs/databases/dockeradddrivers
index: 1
version: 1
---
## Install New Database Drivers in Docker Image
## Adding New Database Drivers in Docker

Superset requires a Python database driver to be installed for each additional type of database you
want to connect to. When setting up Superset locally via `docker-compose`, the drivers and packages
27 changes: 11 additions & 16 deletions docs/src/pages/docs/Connecting to Databases/index.mdx
@@ -1,5 +1,5 @@
---
name: Install Database Drivers
name: Installing Database Drivers
menu: Connecting to Databases
route: /docs/databases/installing-database-drivers
index: 0
@@ -8,15 +8,13 @@ version: 1

## Install Database Drivers

Superset requires a Python database driver to be installed for each additional type of database you
want to connect to.
Superset requires a Python DB-API database driver and a SQLAlchemy dialect to be installed for each datastore you want to connect to.

Superset interacts with the underlying databases using the provided SQL interface (often times
through a SQLAlchemy library).
You can read more [here](/docs/databases/dockeradddrivers) about how to install new database drivers into your Superset configuration.

### Supported Databases and Dependecies
### Supported Databases and Dependencies

Superset does not ship bundled with connectivity to databases, except for Sqlite, which is part of the Python standard library. You’ll need to install the required packages for the database you want to use as your metadata database as well as the packages needed to connect to the databases you want to access through Superset.
Superset does not ship bundled with connectivity to databases, except for SQLite, which is part of the Python standard library. You’ll need to install the required packages for the database you want to use as your metadata database as well as the packages needed to connect to the databases you want to access through Superset.
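
For example, if you were to use PostgreSQL as the metadata database, the connection could be configured in `superset_config.py` roughly as follows (the driver, host, and credentials are placeholders, not part of the original docs):

```python
# superset_config.py -- sketch of a metadata database connection.
# The driver, host, and credentials below are placeholders; adjust them for
# your environment and install the matching driver package first
# (e.g. `pip install psycopg2-binary` for PostgreSQL).
SQLALCHEMY_DATABASE_URI = (
    "postgresql+psycopg2://superset_user:superset_password@localhost:5432/superset_meta"
)
```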

A list of some of the recommended packages.

@@ -32,7 +30,7 @@ A list of some of the recommended packages.
|[Apache Pinot](/docs/databases/pinot)|```pip install pinotdb```|```pinot+http://CONTROLLER:5436/ query?server=http://CONTROLLER:5983/```|
|[Apache Solr](/docs/databases/solr)|```pip install sqlalchemy-solr```|```solr://{username}:{password}@{hostname}:{port}/{server_path}/{collection}```
|[Apache Spark SQL](/docs/databases/spark)|```pip install pyhive```|```hive://hive@{hostname}:{port}/{database}```
|[Azure MS SQL](/docs/databases/sqlserver)||```mssql+pymssql://UserName@presetSQL:TestPassword@presetSQL.database.windows.net:1433/TestSchema```
|[Azure MS SQL](/docs/databases/sql-server)|```pip install pymssql``` |```mssql+pymssql://UserName@presetSQL:TestPassword@presetSQL.database.windows.net:1433/TestSchema```
|[Big Query](/docs/databases/bigquery)|```pip install pybigquery```|```bigquery://{project_id}```|
|[ClickHouse](/docs/databases/clickhouse)|```pip install sqlalchemy-clickhouse```|```clickhouse://{username}:{password}@{hostname}:{port}/{database}```|
|[CockroachDB](/docs/databases/cockroachdb)|```pip install cockroachdb```|```cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable```|
@@ -48,24 +46,21 @@ A list of some of the recommended packages.
|[SAP Hana](/docs/databases/hana)|```pip install hdbcli sqlalchemy-hana or pip install apache-superset[hana]```|```hana://{username}:{password}@{host}:{port}```|
|[Snowflake](/docs/databases/snowflake)|```pip install snowflake-sqlalchemy```|```snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}```|
|SQLite||```sqlite://```|
|[SQL Server](/docs/databases/sqlserver)|```pip install pymssql```|```mssql://```|
|[SQL Server](/docs/databases/sql-server)|```pip install pymssql```|```mssql://```|
|[Teradata](/docs/databases/teradata)|```pip install sqlalchemy-teradata```|```teradata://{user}:{password}@{host}```|
|[Vertica](/docs/databases/vertica)|```pip install sqlalchemy-vertica-python```|```vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name>```|

***

Note that many other databases are supported, the main criteria being the existence of a functional SqlAlchemy dialect and Python driver. Googling the keyword sqlalchemy in addition of a keyword that describes the database you want to connect to should get you to the right place.
Note that many other databases are supported, the main criterion being the existence of a functional
SQLAlchemy dialect and Python driver. Searching for the keyword "sqlalchemy + (database name)"
should help get you to the right place.
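
One quick way to verify that a driver and its SQLAlchemy dialect are installed correctly is to open a connection with SQLAlchemy directly, using the same URI format shown in the table above; a rough sketch, with a placeholder PostgreSQL URI:

```python
# Quick sanity check that a SQLAlchemy dialect and driver pair is usable.
# The URI below is a placeholder; substitute the connection string for your
# database from the table above.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")

with engine.connect() as conn:
    # Any trivial query works; this only proves the driver can open a connection.
    print(conn.execute(text("SELECT 1")).scalar())
```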

If your database or data engine isn't on the list but a SQL interface
exists, please file an issue on the
[Superset GitHub repo](https://github.com/apache/superset/issues), so we can work on
[Superset GitHub repo](https://github.com/apache/superset/issues), so we can work on documenting and
supporting it.

[StackOverflow](https://stackoverflow.com/questions/tagged/apache-superset+superset) and the
[Superset community Slack](https://join.slack.com/t/apache-superset/shared_invite/zt-l5f5e0av-fyYu8tlfdqbMdz_sPLwUqQ)
are great places to get help with connecting to databases in Superset.

In the end, you should be looking for a Python package compatible with your database. One part that
makes database driver installation tricky is the fact that local binaries are sometimes required in
order for them to bind properly, which means that various apt packages might need to be installed
before pip can get things set up.
@@ -21,24 +21,26 @@ following information about each flight is given:
- Information about the origin and destination.
- The distance between the origin and destination, in kilometers (km).

### Enabling Upload a CSV Functionality
### Enabling Data Upload Functionality

You may need to enable the functionality to upload a CSV to your database. The following section
You may need to enable the functionality to upload a CSV or Excel file to your database. The following section
explains how to enable this functionality for the examples database.

In the top menu, select **Sources ‣ Databases**. Find the **examples** database in the list and
select the edit record button.
In the top menu, select **Data ‣ Databases**. Find the **examples** database in the list and
select the **Edit** button.

<img src="/images/edit-record.png" />

Within the **Edit Database** page, check the **Allow Csv Upload** checkbox. Save by selecting
**Save** at the bottom of the page.
In the resulting modal window, switch to the **Extra** tab and
tick the checkbox for **Allow Data Upload**. End by clicking the **Save** button.

<img src="/images/add-data-upload.png" />

### Loading CSV Data

Download the CSV dataset to your computer from
[Github](https://raw.githubusercontent.com/apache-superset/examples-data/master/tutorial_flights.csv).
In the Superset menu, select **Sources > Upload a CSV**.
In the Superset menu, select **Data ‣ Upload a CSV**.

<img src="/images/upload_a_csv.png" />

@@ -54,53 +56,42 @@ Leaving all the other options in their default settings, select **Save** at the

### Table Visualization

In this section, we’ll create our first visualization: a table to show the number of flights and
cost per travel class.

To create a new chart, select **New > Chart**.

<img src="/images/add_new_chart.png" />

Once in the **Create a new chart** form, select _tutorial_flights_ from the **Chose a datasource**
dropdown.

<img src="/images/chose_a_datasource.png" />
You should now see _tutorial_flights_ as a dataset in the **Datasets** tab. Click on the entry to
launch an Explore workflow using this dataset.

Next, select the visualization type as **Table**.
In this section, we'll create a table visualization
to show the number of flights and cost per travel class.

<img src="/images/select_table_visualization_type.png" />

Then, select **Create new chart** to go into the chart view.

By default, Apache Superset only shows the last week of data: in our example, we want to look at all
the data in the dataset. No problem - within the **Time** section, remove the filter on **Time
range** by selecting **Last week** then changing the selection to **No filter**, with a final **OK**
to confirm your selection.
By default, Apache Superset only shows the last week of data. In our example, we want to visualize all
of the data in the dataset. Click the **Time ‣ Time Range** section and change
the **Range Type** to **No Filter**.

<img src="/images/no_filter_on_time_filter.png" />

Click **Apply** to save.

Now, we want to specify the rows in our table by using the **Group by** option. Since in this
example, we want to understand different Travel Classes, we select **Travel Class** in this menu.

Next, we can specify the metrics we would like to see in our table with the **Metrics**option.
Count(\*), which represents the number of rows in the table (in this case corresponding to the
number of flights since we have a row per flight), is already there. To add cost, within
**Metrics**, select **Cost**.

**Save** the default aggregation option, which is to sum the column.
- `COUNT(*)`, which represents the number of rows in the table
(in this case, quantity of flights in each Travel Class)
- `SUM(Cost)`, which represents the total cost spent by each Travel Class

<img src="/images/sum_cost_column.png" />

Finally, select **Run Query** to see the results of the table.

<img src="/images/tutorial_table.png" />

Congratulations, you have created your first visualization in Apache Superset!
To save the visualization, click on **Save** in the top left of the screen. In the following modal,

To save the visualization, click on **Save** in the top left of the screen. Select the ** Save as**
option, and enter the chart name as Tutorial Table (you will be able to find it again through the
**Charts** screen, accessible in the top menu). Similarly, select **Add to new dashboard** and enter
Tutorial Dashboard. Finally, select **Save & go to dashboard**.
- Select the **Save as**
option and enter the chart name as Tutorial Table (you will be able to find it again through the
**Charts** screen, accessible in the top menu).
- Select **Add To Dashboard** and enter
Tutorial Dashboard. Finally, select **Save & Go To Dashboard**.

<img src="/images/save_tutorial_table.png" />

@@ -124,10 +115,12 @@ In this section, we will extend our analysis using a more complex visualization,
end of this section, you will have created a table that shows the monthly spend on flights for the
first six months, by department, by travel class.

As before, create a new visualization by selecting **New > Chart** on the top menu. Choose
Create a new chart by selecting **+ ‣ Chart** from the top right corner. Choose
tutorial_flights again as a datasource, then click on the visualization type to get to the
visualization menu. Select the **Pivot Table** visualization (you can filter by entering text in the
search box) and then **Create a new chart**.
search box) and then **Create New Chart**.

<img src="/images/create_pivot.png" />

In the **Time** section, keep the Time Column as Travel Date (this is selected automatically as we
only have one time column in our dataset). Then select Time Grain to be month as having daily data
@@ -151,22 +144,18 @@ see some data!

<img src="/images/tutorial_pivot_table.png" />

You should see months in the rows and Department and Travel Class in the columns. To get this in our
dashboard, select Save, name the chart Tutorial Pivot and using **Add chart to existing dashboard**
select **Tutorial Dashboard**, and then finally **Save & go to dashboard**.
You should see months in the rows and Department and Travel Class in the columns. Publish this chart
to the Tutorial Dashboard you created earlier.
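
For reference, the pivot table computes roughly the following (again a pandas sketch, with column labels assumed to match the Superset UI):

```python
# Rough pandas equivalent of the Pivot Table chart: months as rows, Department
# and Travel Class as columns, summed Cost in the cells. Column labels are
# assumed to match the ones shown in the Superset UI.
import pandas as pd

flights = pd.read_csv(
    "https://raw.githubusercontent.com/apache-superset/examples-data/master/tutorial_flights.csv"
)
flights["Month"] = pd.to_datetime(flights["Travel Date"]).dt.to_period("M")

pivot = flights.pivot_table(
    index="Month",
    columns=["Department", "Travel Class"],
    values="Cost",
    aggfunc="sum",
)
print(pivot.head())
```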

### Line Chart

In this section, we are going to create a line chart to understand the average price of a ticket by
month across the entire dataset. As before, select **New > Chart**, and then tutorial_flights as the
datasource and Line Chart as the visualization type.
month across the entire dataset.

In the Time section, as before, keep the Time Column as Travel Date and the Time Grain as month, but this
time set the Time range to No filter, as we want to look at the entire dataset.

Within Metrics, remove the default COUNT(\*) and add Cost. This time, we want to change how this
column is aggregated to show the mean value: we can do this by selecting AVG in the aggregate
dropdown.
Within Metrics, remove the default `COUNT(*)` metric and instead add `AVG(Cost)`, to show the mean value.

<img src="/images/average_aggregate_for_cost.png" />

@@ -187,8 +176,7 @@ and Y Axis Label.

<img src="/images/tutorial_line_chart.png" />

Once you’re done, Save as Tutorial Line Chart, use **Add chart to existing dashboard** to add this
chart to the previous ones on the Tutorial Dashboard and then **Save & go to dashboard**.
Once you’re done, publish the chart in your Tutorial Dashboard.

### Markup

@@ -216,8 +204,8 @@ To exit, select any other part of the dashboard. Finally, don’t forget to keep
In this section, you will learn how to add a filter to your dashboard. Specifically, we will create
a filter that allows us to look at those flights that depart from a particular country.

A filter box visualization can be created as any other visualization by selecting **New > Chart**,
and then tutorial_flights as the datasource and Filter Box as the visualization type.
A filter box visualization can be created like any other visualization by selecting **+ ‣ Chart**,
and then _tutorial_flights_ as the datasource and Filter Box as the visualization type.

First of all, in the **Time** section, remove the filter from the Time range selection by selecting
No filter.