documentation on spatial data #2750

Merged
merged 10 commits into from
Dec 24, 2022
1 change: 1 addition & 0 deletions doc/user_guide/data.rst
Original file line number Diff line number Diff line change
@@ -441,6 +441,7 @@ data before usage in Altair using GeoPandas for example as such:
:hidden:

self
data/index
encoding
marks/index
transform/index
8 changes: 8 additions & 0 deletions doc/user_guide/data/dataframe.rst
@@ -0,0 +1,8 @@
.. currentmodule:: altair

.. _user-guide-dataframe-data:

DataFrame
~~~~~~~~~

Describe
8 changes: 8 additions & 0 deletions doc/user_guide/data/dict.rst
@@ -0,0 +1,8 @@
.. currentmodule:: altair

.. _user-guide-dict-data:

Dictionary
~~~~~~~~~~

Describe
8 changes: 8 additions & 0 deletions doc/user_guide/data/generator.rst
@@ -0,0 +1,8 @@
.. currentmodule:: altair

.. _user-guide-generator-data:

Generated data
~~~~~~~~~~~~~~

Describe
49 changes: 49 additions & 0 deletions doc/user_guide/data/index.rst
@@ -0,0 +1,49 @@
.. currentmodule:: altair

.. _user-guide-data:

Data
~~~~

The basic data model used by Altair is tabular data, similar to a spreadsheet, pandas DataFrame or a database table. Individual data sets are assumed to contain a collection of records, which may contain any number of named data fields.

Each top-level chart object (i.e. :class:`Chart`, :class:`LayerChart`,
:class:`VConcatChart`, :class:`HConcatChart`, :class:`RepeatChart`,
and :class:`FacetChart`) accepts a dataset as its first argument.

Altair provides the following ways to specify a dataset:

=========================================  ================================================================================
Data                                       Description
=========================================  ================================================================================
:ref:`user-guide-dataframe-data`           A `Pandas DataFrame <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`_.
:ref:`user-guide-dict-data`                A :class:`Data` or related object (i.e. :class:`UrlData`, :class:`InlineData`, :class:`NamedData`).
:ref:`user-guide-url-data`                 A URL string pointing to a ``json`` or ``csv`` formatted text file.
:ref:`user-guide-spatial-data`             An object that supports the ``__geo_interface__`` (e.g. `Geopandas GeoDataFrame <http://geopandas.org/data_structures.html#geodataframe>`_, `Shapely Geometries <https://shapely.readthedocs.io/en/latest/manual.html#geometric-objects>`_, `GeoJSON Objects <https://github.com/jazzband/geojson#geojson-objects>`_).
:ref:`user-guide-generator-data`           A generated dataset such as numerical sequences or geographic reference elements.
=========================================  ================================================================================

When data is specified as a DataFrame, the encoding is quite simple, as Altair
uses the data type information provided by Pandas to automatically determine
the data types required in the encoding. For example, here we specify data via a Pandas DataFrame:

.. altair-plot::

    import altair as alt
    import pandas as pd

    data = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E'],
                         'y': [5, 3, 6, 7, 2]})
    alt.Chart(data).mark_bar().encode(
        x='x',
        y='y',
    )

.. toctree::
    :hidden:

    dataframe
    dict
    url
    spatial
    generator
280 changes: 280 additions & 0 deletions doc/user_guide/data/spatial.rst
@@ -0,0 +1,280 @@
.. currentmodule:: altair

.. _user-guide-spatial-data:

Spatial Data
~~~~~~~~~~~~

This page explains the following methods for working with spatial data in Altair:

- :ref:`spatial-data-gdf`
- :ref:`spatial-data-inline-geojson`
- :ref:`spatial-data-remote-geojson`
- :ref:`spatial-data-inline-topojson`
- :ref:`spatial-data-remote-topojson`
- :ref:`spatial-data-nested-geojson`


.. _spatial-data-gdf:

GeoPandas GeoDataFrame
~~~~~~~~~~~~~~~~~~~~~~

GeoPandas is a convenient source for your spatial data. It can read many
types of spatial data, and Altair is optimized for reading the resulting
GeoDataFrames. Here we define four polygon geometries in a GeoDataFrame
and visualize them using ``mark_geoshape``.

.. altair-plot::
    :output: repr

    from shapely import geometry
    import geopandas as gpd
    import altair as alt

    data_geoms = [
        {"color": "#F3C14F", "geometry": geometry.Polygon([[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]])},
        {"color": "#4098D7", "geometry": geometry.Polygon([[1.45, 0], [1.45, 3.75], [2.57, 3.75], [2.57, 0], [2.33, 0], [1.45, 0]])},
        {"color": "#66B4E2", "geometry": geometry.Polygon([[2.33, 0], [2.33, 2.5], [3.47, 2.5], [3.47, 0], [3.2, 0], [2.57, 0], [2.33, 0]])},
        {"color": "#A9CDE0", "geometry": geometry.Polygon([[3.2, 0], [3.2, 1.25], [4.32, 1.25], [4.32, 0], [3.47, 0], [3.2, 0]])},
    ]

    gdf_geoms = gpd.GeoDataFrame(data_geoms)
    gdf_geoms


This data uses a non-geographic projection. Therefore we use the
``project`` configuration ``type="identity", reflectY=True`` to draw the
geometries without applying a projection. Setting ``scale=None`` disables
the scale for the color channel, so Altair uses the defined hex color
codes directly.

.. altair-plot::

    alt.Chart(gdf_geoms, title="Vega-Altair").mark_geoshape().encode(
        color=alt.Color("color:N", scale=None)
    ).project(type="identity", reflectY=True)
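
A GeoDataFrame is one of the objects that support the ``__geo_interface__``
protocol listed in the data overview. As a rough illustration of what that
protocol looks like, the sketch below defines a hypothetical ``Triangle``
class (not part of Altair, GeoPandas or Shapely) whose ``__geo_interface__``
property returns a GeoJSON-like dictionary:

.. code:: python

    class Triangle:
        """Hypothetical shape class exposing the ``__geo_interface__`` protocol."""

        def __init__(self, a, b, c):
            self.corners = [a, b, c]

        @property
        def __geo_interface__(self):
            # GeoJSON polygon rings are closed: the first point is repeated last.
            ring = [list(p) for p in self.corners + [self.corners[0]]]
            return {"type": "Polygon", "coordinates": [ring]}


    tri = Triangle((1.45, 3.75), (1.45, 0), (0, 0))
    geo = tri.__geo_interface__
    print(geo["type"], len(geo["coordinates"][0]))

The triangle uses the same corner points as the first polygon of the
GeoDataFrame example above.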


.. _spatial-data-inline-geojson:

Inline GeoJSON object
~~~~~~~~~~~~~~~~~~~~~

If your source data is a GeoJSON file and you do not want to load it
into a GeoPandas GeoDataFrame, you can specify it directly in Altair. A
GeoJSON file normally consists of a ``FeatureCollection`` with a list of
``features``, where the information for each geometry is stored in a
``properties`` dictionary. In the following example, a GeoJSON-like
object is wrapped in an Altair ``Data`` object, and the ``property``
parameter of ``alt.DataFormat`` names the key that contains the nested
list (here ``features``).

.. altair-plot::
    :output: repr

    obj_geojson = {
        "type": "FeatureCollection",
        "features":[
            {"type": "Feature", "properties": {"location": "left"}, "geometry": {"type": "Polygon", "coordinates": [[[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]]]}},
            {"type": "Feature", "properties": {"location": "middle-left"}, "geometry": {"type": "Polygon", "coordinates": [[[1.45, 0], [1.45, 3.75], [2.57, 3.75], [2.57, 0], [2.33, 0], [1.45, 0]]]}},
            {"type": "Feature", "properties": {"location": "middle-right"}, "geometry": {"type": "Polygon", "coordinates": [[[2.33, 0], [2.33, 2.5], [3.47, 2.5], [3.47, 0], [3.2, 0], [2.57, 0], [2.33, 0]]]}},
            {"type": "Feature", "properties": {"location": "right"}, "geometry": {"type": "Polygon", "coordinates": [[[3.2, 0], [3.2, 1.25], [4.32, 1.25], [4.32, 0], [3.47, 0], [3.2, 0]]]}}
        ]
    }
    data_obj_geojson = alt.Data(values=obj_geojson, format=alt.DataFormat(property="features"))
    data_obj_geojson

The information for each geometry is stored in the ``properties``
dictionary. We reference the nested variable (here ``location``) in the
color channel encoding as ``properties.location`` and apply the ``magma``
color scheme as a custom scale for the ordinal data.

.. altair-plot::

    alt.Chart(data_obj_geojson, title="Vega-Altair - ordinal scale").mark_geoshape().encode(
        color=alt.Color("properties.location:O", scale=alt.Scale(scheme='magma'))
    ).project(type="identity", reflectY=True)
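
To make the role of ``property="features"`` and of the dotted field name
``properties.location`` more concrete, the following plain-Python sketch
mimics both lookups on a small sample object. The helper names
``extract_records`` and ``get_field`` are hypothetical and only illustrate
the idea; they are not how Altair or Vega-Lite implement it.

.. code:: python

    def extract_records(doc, prop):
        """Return the list stored under ``prop``, like ``property="features"``."""
        return doc[prop]


    def get_field(record, dotted_name):
        """Resolve a dotted field name such as ``properties.location``."""
        value = record
        for key in dotted_name.split("."):
            value = value[key]
        return value


    sample = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"location": "left"}, "geometry": None},
            {"type": "Feature", "properties": {"location": "right"}, "geometry": None},
        ],
    }

    records = extract_records(sample, "features")
    locations = [get_field(r, "properties.location") for r in records]
    print(locations)  # ['left', 'right']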


.. _spatial-data-remote-geojson:

GeoJSON file by URL
~~~~~~~~~~~~~~~~~~~

Altair can load GeoJSON resources directly from a web URL. Here we use
an example from geojson.xyz. As explained in
:ref:`spatial-data-inline-geojson`, we specify ``features`` as the
value of the ``property`` parameter in the ``alt.DataFormat()`` object
and prefix the field ``scalerank`` with the name of the nested dictionary
in which the information for each geometry is stored (``properties``).

.. altair-plot::
    :output: repr

    url_geojson = "https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson"
    data_url_geojson = alt.Data(url=url_geojson, format=alt.DataFormat(property="features"))
    data_url_geojson

.. altair-plot::

    alt.Chart(data_url_geojson).mark_geoshape().encode(color='properties.scalerank:N')


.. _spatial-data-inline-topojson:

Inline TopoJSON object
~~~~~~~~~~~~~~~~~~~~~~

TopoJSON is an extension of GeoJSON in which the geometries of the
features are referenced from a top-level object named ``arcs``. Each
shared arc is stored only once to reduce file size. A TopoJSON file can
contain multiple objects (e.g. boundary borders and province borders).
When defining a TopoJSON object for Altair, we specify ``topojson`` as the
data format type and pass the name of the object we want to visualize
through the ``feature`` parameter (here ``MY_DATA``).

Note: the key name ``MY_DATA`` is arbitrary and differs between datasets.

.. altair-plot::
    :output: repr

    obj_topojson = {
        "arcs": [
            [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]],
            [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0]],
            [[1.0, 1.0], [1.0, 0.0]],
        ],
        "objects": {
            "MY_DATA": {
                "geometries": [
                    {"arcs": [[-3, 0]], "properties": {"name": "abc"}, "type": "Polygon"},
                    {"arcs": [[1, 2]], "properties": {"name": "def"}, "type": "Polygon"},
                ],
                "type": "GeometryCollection",
            }
        },
        "type": "Topology",
    }
    data_obj_topojson = alt.Data(
        values=obj_topojson, format=alt.DataFormat(feature="MY_DATA", type="topojson")
    )
    data_obj_topojson

.. altair-plot::

    alt.Chart(data_obj_topojson).mark_geoshape(
    ).encode(
        color="properties.name:N"
    ).project(
        type='identity', reflectY=True
    )
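
The way polygons are rebuilt from shared arcs can be sketched in plain
Python. The hypothetical helper ``stitch_ring`` below follows the TopoJSON
convention that a negative index ``i`` refers to arc ``~i`` traversed in
reverse. It assumes a topology without a ``transform`` (so arcs hold
absolute coordinates), as in the example above, and is only meant as an
illustration of the format.

.. code:: python

    def stitch_ring(arcs, arc_indexes):
        """Join the referenced arcs into one closed ring of coordinates."""
        ring = []
        for index in arc_indexes:
            if index < 0:
                # A negative index i refers to arc ~i (that is, -i - 1), reversed.
                points = arcs[~index][::-1]
            else:
                points = arcs[index]
            # Skip the first point of every arc after the first one: it
            # duplicates the last point of the previous arc.
            ring.extend(points if not ring else points[1:])
        return ring


    arcs = [
        [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0]],
        [[1.0, 1.0], [1.0, 0.0]],
    ]

    left = stitch_ring(arcs, [-3, 0])   # geometry named "abc"
    right = stitch_ring(arcs, [1, 2])   # geometry named "def"
    print(left)
    print(right)

Both rings share the arc between ``[1.0, 1.0]`` and ``[1.0, 0.0]``, which is
stored only once in ``arcs``.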


.. _spatial-data-remote-topojson:

TopoJSON file by URL
~~~~~~~~~~~~~~~~~~~~

Altair can load TopoJSON resources directly from a web URL. As
explained in :ref:`spatial-data-inline-topojson`, we specify
``boroughs`` as the object name for the ``feature`` parameter and set
the data type to ``topojson`` in the ``alt.DataFormat()`` object.

.. altair-plot::
    :output: repr

    from vega_datasets import data

    url_topojson = data.londonBoroughs.url

    data_url_topojson = alt.Data(
        url=url_topojson, format=alt.DataFormat(feature="boroughs", type="topojson")
    )

    data_url_topojson

Note: There is also a shorthand for extracting the objects from a
TopoJSON file that is accessible by URL:
``alt.topo_feature(url=url_topojson, feature="boroughs")``

We color-encode the boroughs by their names, which are stored as a
unique identifier (``id``). We use a ``symbolLimit`` of 33 and two
legend columns to display all entries in the legend.

.. altair-plot::

    alt.Chart(data_url_topojson, title="London-Boroughs").mark_geoshape(
        tooltip=True
    ).encode(
        color=alt.Color("id:N", legend=alt.Legend(columns=2, symbolLimit=33))
    )



.. _spatial-data-nested-geojson:

Nested GeoJSON objects
~~~~~~~~~~~~~~~~~~~~~~

GeoJSON data can also be nested within another dataset. In this case it
is possible to use the ``shape`` encoding channel to visualize the
nested dictionary that contains the GeoJSON objects. In the following
example the GeoJSON object is nested in ``geo``:

.. altair-plot::

    nested_features = [
        {"color": "#F3C14F", "geo": {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [[[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]]]}}},
        {"color": "#4098D7", "geo": {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [[[1.45, 0], [1.45, 3.75], [2.57, 3.75], [2.57, 0], [2.33, 0], [1.45, 0]]]}}},
        {"color": "#66B4E2", "geo": {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [[[2.33, 0], [2.33, 2.5], [3.47, 2.5], [3.47, 0], [3.2, 0], [2.57, 0], [2.33, 0]]]}}},
        {"color": "#A9CDE0", "geo": {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [[[3.2, 0], [3.2, 1.25], [4.32, 1.25], [4.32, 0], [3.47, 0], [3.2, 0]]]}}},
    ]
    data_nested_features = alt.Data(values=nested_features)

    alt.Chart(data_nested_features, title="Vega-Altair").mark_geoshape().encode(
        shape="geo:G",
        color=alt.Color("color:N", scale=None)
    ).project(type="identity", reflectY=True)
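
If the data starts out as a regular ``FeatureCollection``, records of this
nested shape can be derived with a few lines of plain Python. The helper
``nest_features`` below is a hypothetical sketch: it copies each feature's
``properties`` to the top level of a record and stores the geometry as a
nested GeoJSON feature under a key of your choice (here ``geo``).

.. code:: python

    def nest_features(feature_collection, key="geo"):
        """Turn GeoJSON features into flat records with a nested feature."""
        records = []
        for feature in feature_collection["features"]:
            # Properties become top-level fields of the record.
            record = dict(feature.get("properties") or {})
            # The geometry stays a GeoJSON feature, nested under ``key``.
            record[key] = {"type": "Feature", "geometry": feature["geometry"]}
            records.append(record)
        return records


    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"color": "#F3C14F"},
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [[[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]]],
                },
            },
        ],
    }

    nested = nest_features(collection)
    print(nested[0]["color"], nested[0]["geo"]["geometry"]["type"])

The resulting list has the same structure as ``nested_features`` above and
could be passed to ``alt.Data(values=...)`` in the same way.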


.. _data-projections:

Projections
~~~~~~~~~~~
For geographic data it is best to use the World Geodetic System 1984 (WGS 84) as
the geographic coordinate reference system, with units in decimal degrees.

Avoid passing projected data to Altair; reproject your spatial data to
EPSG:4326 first.

If your data comes in a different projection (e.g. with units in meters) and you
cannot reproject it, try the project configuration
``type='identity', reflectY=True``. It draws the geometries without applying a projection.
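
A quick way to spot data that is probably not in decimal degrees is to check
whether the coordinates fit within longitude and latitude bounds. The helper
``looks_like_degrees`` below is a hypothetical heuristic, not an Altair
feature: it can flag obviously projected data, but small projected
coordinates will still pass. With GeoPandas, the reprojection itself is
usually a call such as ``gdf.to_crs(epsg=4326)``, assuming ``gdf`` is a
GeoDataFrame with a known coordinate reference system.

.. code:: python

    def looks_like_degrees(coordinates):
        """Heuristic: True if every (x, y) pair fits within lon/lat bounds."""
        return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coordinates)


    lonlat_points = [(4.9, 52.37), (-0.13, 51.51)]   # decimal degrees
    metric_points = [(121000.0, 487000.0)]           # units in meters (assumed)

    print(looks_like_degrees(lonlat_points))  # True
    print(looks_like_degrees(metric_points))  # False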


.. _data-winding-order:

Winding Order
~~~~~~~~~~~~~
LineString, Polygon and MultiPolygon geometries contain coordinates in an order: lines
go in a certain direction, and polygon rings do too. The GeoJSON-like structure of the
``__geo_interface__`` recommends the right-hand rule winding order for Polygon and
MultiPolygon geometries. This means that exterior rings should be counterclockwise and
interior rings clockwise. While it recommends the right-hand rule winding order, it does
not reject geometries that do not follow it.

Altair does NOT follow the right-hand rule for geometries, but uses the left-hand rule.
This means that exterior rings should be clockwise and interior rings should be
counterclockwise.

If you face a problem regarding winding order, try to force the left-hand rule on your
data before using it in Altair, for example with GeoPandas:

.. code:: python

    from shapely.ops import orient

    # sign=-1 orients exterior rings clockwise (left-hand rule)
    gdf.geometry = gdf.geometry.apply(orient, args=(-1,))
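
To check which way a ring is wound before or after reorienting, the signed
area from the shoelace formula can be used. The sketch below uses
hypothetical helper names and assumes a closed ring (first point repeated
last) in a coordinate system where y increases upwards; a negative area then
indicates a clockwise ring.

.. code:: python

    def signed_area(ring):
        """Shoelace formula: positive for counterclockwise, negative for clockwise."""
        total = 0.0
        for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
            total += x1 * y2 - x2 * y1
        return total / 2.0


    def is_clockwise(ring):
        return signed_area(ring) < 0


    # Exterior ring of the first polygon used in the examples on this page.
    exterior = [[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]]

    print(is_clockwise(exterior))        # True
    print(is_clockwise(exterior[::-1]))  # False

The exterior ring of the example polygon is clockwise, which matches the
left-hand rule described above.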
8 changes: 8 additions & 0 deletions doc/user_guide/data/url.rst
@@ -0,0 +1,8 @@
.. currentmodule:: altair

.. _user-guide-url-data:

URL
~~~

Describe