[DOCS] Fix footer, banner, TOC, EMR doc, Databricks doc #1114

Merged
merged 4 commits into from
Nov 13, 2023
4 changes: 2 additions & 2 deletions docs-overrides/main.html
@@ -4,9 +4,9 @@
{% extends "base.html" %}

{% block outdated %}
You're not viewing the latest snapshot version.
You're not viewing the latest stable version.
<a href="{{ '../' ~ base_url }}">
<strong>Click here to go to the latest snapshot.</strong>
<strong>Click here to go to the latest stable version.</strong>
</a>
{% endblock %}

57 changes: 0 additions & 57 deletions docs-overrides/partials/footer.html

This file was deleted.

23 changes: 0 additions & 23 deletions docs-overrides/partials/toc-item.html

This file was deleted.

25 changes: 0 additions & 25 deletions docs-overrides/partials/toc.html

This file was deleted.

2 changes: 1 addition & 1 deletion docs/community/contact.md
@@ -15,7 +15,7 @@ You can participate in the community as follows:

## Discord Server

[Join Apache Sedona community server](discord-invite-form.html)!
[Join Apache Sedona community server](./discord-invite-form.html)!

## Mailing list

41 changes: 23 additions & 18 deletions docs/setup/databricks.md
@@ -4,29 +4,29 @@ You just need to install the Sedona jars and Sedona Python on Databricks using D

## Advanced editions

We recommend Databricks 10.x+.

!!!tip
Wherobots Cloud provides a free tool to deploy Apache Sedona to Databricks. Please sign up [here](https://www.wherobots.services/).

* Sedona 1.0.1 and 1.1.0 are compiled against Spark 3.1 (~ Databricks DBR 9 LTS; DBR 7 is Spark 3.0)
* Sedona 1.1.1 and 1.2.0 are compiled against Spark 3.2 (~ DBR 10 & 11)
* Sedona 1.2.1, 1.3.1, and 1.4.0 are compiled against Spark 3.3
* Sedona 1.4.1 and 1.5.0 are compiled against Spark 3.3 and 3.4

> In Spark 3.2, `org.apache.spark.sql.catalyst.expressions.Generator` class added a field `nodePatterns`. Any SQL functions that rely on Generator class may have issues if compiled for a runtime with a differing spark version. For Sedona, those functions are:
> * ST_MakeValid
> * ST_SubDivideExplode

__Sedona `1.1.1-incubating` and above is overall the recommended version to use. It is generally backwards compatible with earlier Spark releases but you should be aware of what Spark version Sedona was compiled against versus which is being executed in case you hit issues.__

#### Databricks 10.x+ (Recommended)
!!!note
If you are using Spark 3.4+ and Scala 2.12, please use `sedona-spark-shaded-3.4_2.12`. Pay attention to both the Spark version postfix and the Scala version postfix. Sedona does not support Databricks Photon acceleration: Sedona relies on Spark internal APIs to inject many optimization strategies, and those APIs are not accessible in Photon.
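
The postfix rule in the note above can be sketched as a small helper. This is only an illustration: `sedona_shaded_artifact` is a hypothetical name, and the mapping (Spark 3.4 and later use the `3.4` postfix, earlier Spark 3.x releases use `3.0`) is an assumption drawn from the two artifact names that appear in this document, not from a published compatibility matrix.

```python
def sedona_shaded_artifact(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the shaded Sedona artifact name for a Spark/Scala pair.

    Assumption: Spark 3.4+ maps to the "3.4" postfix and earlier Spark 3.x
    releases map to "3.0" (the only two postfixes named in this document).
    """
    # Only the major and minor components decide the postfix.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if major != 3:
        raise ValueError(f"unsupported Spark version: {spark_version}")
    spark_postfix = "3.4" if minor >= 4 else "3.0"
    return f"sedona-spark-shaded-{spark_postfix}_{scala_version}"


if __name__ == "__main__":
    print(sedona_shaded_artifact("3.4.1"))
    print(sedona_shaded_artifact("3.3.0"))
```

Check the result against the artifacts actually published for your Sedona release before relying on it.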

* You need to use Sedona version `1.1.1-incubating` or higher.
* In order to activate the Kryo serializer (this speeds up the serialization and deserialization of geometry types) you need to install the libraries via init script as described below.

#### Databricks DBR 7.x - 9.x
## Install Sedona from the web UI (not recommended)

* If you are using the commercial version of Databricks you can install the Sedona jars and Sedona Python using the Databricks default web UI. DBR 7 matches with Sedona `1.1.0-incubating` and DBR 9 matches better with Sedona `1.1.1-incubating` due to Databricks cherry-picking some Spark 3.2 private APIs.
This method does not deliver Sedona's best performance and does not work in a pure SQL environment.

## Install Sedona from the web UI
### Install libraries

1) From the Libraries tab install from Maven Coordinates
```
@@ -37,17 +37,17 @@ __Sedona `1.1.1-incubating` and above is overall the recommended version to use.
2) For enabling python support, from the Libraries tab install from PyPI
```
apache-sedona
keplergl==0.3.2
pydeck==0.8.0
```

3) (Only for DBR up to 7.3 LTS) You can speed up the serialization of geometry types by adding to your spark configurations (`Cluster` -> `Edit` -> `Configuration` -> `Advanced options`) the following lines:
```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
```
> For DBRs after 7.3, use the Init Script method described further down.


## Initialise
### Initialize

After you have installed the libraries and started the cluster, you can initialize the Sedona `ST_*` functions and types by running from your code:

@@ -63,11 +63,13 @@ from sedona.register.geo_registrator import SedonaRegistrator
SedonaRegistrator.registerAll(spark)
```

## Pure SQL environment
## Install Sedona from the init script

In order to activate the Kryo serializer (this speeds up the serialization and deserialization of geometry types) you need to install the libraries via init script as described below.

In order to use the Sedona `ST_*` functions from SQL without having to register the Sedona functions from a python/scala cell, you need to install the Sedona libraries from the [cluster init-scripts](https://docs.databricks.com/clusters/init-scripts.html) as follows.

## Install Sedona via init script (for DBRs > 7.3)
### Download Sedona jars

Download the Sedona jars to a DBFS location. You can do that manually via UI or from a notebook by executing this code in a cell:

@@ -80,10 +82,10 @@ mkdir -p /dbfs/FileStore/jars/sedona/{{ sedona.current_version }}
curl -o /dbfs/FileStore/jars/sedona/{{ sedona.current_version }}/geotools-wrapper-{{ sedona.current_geotools }}.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar"

curl -o /dbfs/FileStore/jars/sedona/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar"

curl -o /dbfs/FileStore/jars/sedona/{{ sedona.current_version }}/sedona-viz-3.0_2.12-{{ sedona.current_version }}.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-viz-3.0_2.12/{{ sedona.current_version }}/sedona-viz-3.0_2.12-{{ sedona.current_version }}.jar"
```
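
The download URLs above all follow the same Maven Central layout, so they can be generated rather than typed by hand. The following is a minimal sketch under that assumption: `maven_jar_url` and `sedona_jar_urls` are hypothetical helper names, and the versions in the usage lines are the `current_version` and `current_geotools` values from `mkdocs.yml` in this PR.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a Maven Central jar URL: group path / artifact / version / file."""
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def sedona_jar_urls(sedona_version, geotools_version,
                    spark_postfix="3.0", scala_version="2.12"):
    """Return the three jar URLs used by the curl commands above."""
    suffix = f"{spark_postfix}_{scala_version}"
    return [
        maven_jar_url("org.datasyslab", "geotools-wrapper", geotools_version),
        maven_jar_url("org.apache.sedona", f"sedona-spark-shaded-{suffix}", sedona_version),
        maven_jar_url("org.apache.sedona", f"sedona-viz-{suffix}", sedona_version),
    ]


if __name__ == "__main__":
    for url in sedona_jar_urls("1.5.0", "1.5.0-28.2"):
        print(url)
```

Passing `spark_postfix="3.4"` would produce the URLs for the Spark 3.4 artifacts instead.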

### Create an init script

Create an init script in DBFS that loads the Sedona jars into the cluster's default jar directory. You can create that from any notebook by running:

```bash
@@ -97,8 +99,6 @@ cat > /dbfs/FileStore/sedona/sedona-init.sh <<'EOF'
#!/bin/bash
#
# File: sedona-init.sh
# Author: Erni Durdevic
# Created: 2021-11-01
#
# On cluster startup, this script will copy the Sedona jars to the cluster's default jar directory.
# In order to activate Sedona functions, remember to add to your spark configuration the Sedona extensions: "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
@@ -108,6 +108,8 @@ cp /dbfs/FileStore/jars/sedona/{{ sedona.current_version }}/*.jar /databricks/ja
EOF
```

### Set up cluster config

From your cluster configuration (`Cluster` -> `Edit` -> `Configuration` -> `Advanced options` -> `Spark`), activate the Sedona functions and the Kryo serializer by adding the following to the Spark config:
```
spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
@@ -123,6 +125,9 @@ dbfs:/FileStore/sedona/sedona-init.sh
For enabling python support, from the Libraries tab install from PyPI
```
apache-sedona
keplergl==0.3.2
pydeck==0.8.0
```

*Note: You need to install the Sedona libraries via init script because the libraries installed via UI are installed after the cluster has already started, and therefore the classes specified by the config `spark.sql.extensions`, `spark.serializer`, and `spark.kryo.registrator` are not available at startup time.*
!!!tip
You need to install the Sedona libraries via init script because libraries installed via the UI are installed after the cluster has already started, so the classes specified by the configs `spark.sql.extensions`, `spark.serializer`, and `spark.kryo.registrator` are not available at startup time.
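
Because all three settings have to be present before the cluster starts, it can help to check the Spark config text before saving it. The sketch below is hypothetical: the function names are illustrative, and it assumes the config box holds one `key value` pair per line with a space between key and value, as in the snippets in this document. It only checks that the keys are present, not that the classes they name exist on the cluster.

```python
REQUIRED_KEYS = (
    "spark.sql.extensions",
    "spark.serializer",
    "spark.kryo.registrator",
)


def parse_spark_config(text: str) -> dict:
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first space only; values may contain commas or spaces.
        key, _, value = line.partition(" ")
        config[key] = value.strip()
    return config


def missing_sedona_keys(text: str) -> list:
    """Return the required keys that are absent or have an empty value."""
    config = parse_spark_config(text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    sample = (
        "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,"
        "org.apache.sedona.sql.SedonaSqlExtensions\n"
        "spark.serializer org.apache.spark.serializer.KryoSerializer\n"
    )
    print(missing_sedona_keys(sample))
```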
10 changes: 8 additions & 2 deletions docs/setup/emr.md
@@ -5,6 +5,9 @@ We recommend Sedona-1.3.1-incubating and above for EMR. In the tutorial, we use

This tutorial is tested on EMR on EC2 with EMR Studio (notebooks). EMR on EC2 uses YARN to manage resources.

!!!note
If you are using Spark 3.4+ and Scala 2.12, please use `sedona-spark-shaded-3.4_2.12`. Please pay attention to the Spark version postfix and Scala version postfix.

## Prepare initialization script

In your S3 bucket, add a script that has the following content:
@@ -22,8 +25,11 @@ sudo curl -o /jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar
sudo curl -o /jars/geotools-wrapper-{{ sedona.current_geotools }}.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar"

# Install necessary python libraries
sudo python3 -m pip install pandas shapely==1.8.5
sudo python3 -m pip install pandas geopandas==0.10.2
sudo python3 -m pip install pandas==1.3.5
sudo python3 -m pip install shapely==1.8.5
sudo python3 -m pip install geopandas==0.11.1
sudo python3 -m pip install keplergl==0.3.2
sudo python3 -m pip install pydeck==0.8.0
sudo python3 -m pip install attrs matplotlib descartes apache-sedona=={{ sedona.current_version }}
```
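
Since the script now pins each library separately, a quick check on a running node can confirm that the installed versions match the pins. This is a rough sketch, not part of the PR: `PINNED` copies the pins from the script above, `installed_version` and `find_mismatches` are hypothetical helper names, and the lookup is injectable so the comparison logic can be exercised without the packages installed.

```python
from importlib import metadata

# Pins copied from the init script above.
PINNED = {
    "pandas": "1.3.5",
    "shapely": "1.8.5",
    "geopandas": "0.11.1",
    "keplergl": "0.3.2",
    "pydeck": "0.8.0",
}


def installed_version(package: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, actual)."""
    mismatches = {}
    for package, expected in pins.items():
        actual = lookup(package)
        if actual != expected:
            mismatches[package] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    for name, (expected, actual) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {actual}")
```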

10 changes: 6 additions & 4 deletions docs/setup/overview.md
@@ -1,9 +1,11 @@
# Download statistics

|Download statistics| **Maven** | **PyPI** | **CRAN** | **DockerHub** |
|:-------------:|:------------------:|:--------------:|:---------:|:------:|
| Apache Sedona | 225k/month |[![PyPI - Downloads](https://img.shields.io/pypi/dm/apache-sedona)](https://pepy.tech/project/apache-sedona) [![Downloads](https://static.pepy.tech/personalized-badge/apache-sedona?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=total%20downloads)](https://pepy.tech/project/apache-sedona)|[![](https://cranlogs.r-pkg.org/badges/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona) [![](https://cranlogs.r-pkg.org/badges/grand-total/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona)|[![Docker pulls](https://img.shields.io/docker/pulls/apache/sedona?color=brightgreen)](https://hub.docker.com/r/apache/sedona)|
| Archived GeoSpark releases |10k/month|[![PyPI - Downloads](https://img.shields.io/pypi/dm/geospark)](https://pepy.tech/project/geospark)[![Downloads](https://static.pepy.tech/personalized-badge/geospark?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=total%20downloads)](https://pepy.tech/project/geospark)| | |
| Download statistics | **Maven** | **PyPI** | Conda-forge | **CRAN** | **DockerHub** |
|----------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| Apache Sedona | 225k/month | [![PyPI - Downloads](https://img.shields.io/pypi/dm/apache-sedona)](https://pepy.tech/project/apache-sedona) [![Downloads](https://static.pepy.tech/personalized-badge/apache-sedona?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=total%20downloads)](https://pepy.tech/project/apache-sedona) | [![Anaconda-Server Badge](https://anaconda.org/conda-forge/apache-sedona/badges/downloads.svg)](https://anaconda.org/conda-forge/apache-sedona) | [![](https://cranlogs.r-pkg.org/badges/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona) [![](https://cranlogs.r-pkg.org/badges/grand-total/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona) | [![Docker pulls](https://img.shields.io/docker/pulls/apache/sedona?color=brightgreen)](https://hub.docker.com/r/apache/sedona) |
| Archived GeoSpark releases | 10k/month | [![PyPI - Downloads](https://img.shields.io/pypi/dm/geospark)](https://pepy.tech/project/geospark)[![Downloads](https://static.pepy.tech/personalized-badge/geospark?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=total%20downloads)](https://pepy.tech/project/geospark) | | | |



# What can Sedona do?

6 changes: 5 additions & 1 deletion mkdocs.yml
@@ -126,6 +126,7 @@ theme:
- search.suggest
- search.highlight
- search.share
- navigation.footer
extra:
version:
provider: mike
@@ -134,6 +135,8 @@ extra:
link: 'https://github.com/apache/sedona'
- icon: fontawesome/brands/twitter
link: 'https://twitter.com/ApacheSedona'
- icon: fontawesome/brands/discord
link: './community/discord-invite-form.html'
sedona:
current_version: 1.5.0
current_geotools: 1.5.0-28.2
@@ -143,7 +146,8 @@ extra:
current_rc: 1.5.0-rc1
current_snapshot: 1.5.1-SNAPSHOT
next_version: 1.5.1
copyright: Copyright © 2023 The Apache Software Foundation
copyright: Copyright © 2023 The Apache Software Foundation. Apache Sedona, Sedona, Apache, the Apache feather logo, and the Apache Sedona project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. All other marks mentioned may be trademarks or registered trademarks of their respective owners. Please visit <a href="http://www.apache.org/">Apache Software Foundation</a> for more details.

markdown_extensions:
- admonition
- attr_list