Commit

Merge pull request databrickslabs#519 from mjohns-databricks/mjohns-0.4.0-docs-3

Mjohns 0.4.0 docs 20240124
Milos Colic authored Jan 25, 2024
2 parents d89cc16 + bdfbdd4 commit 2a96291
Showing 9 changed files with 258 additions and 237 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -49,7 +49,7 @@ We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.

> DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.
As of the 0.4.0 release, Mosaic issues the following ERROR when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:
:warning: **Mosaic 0.4.x series issues the following ERROR on a standard, non-Photon cluster [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:**

> DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.
213 changes: 11 additions & 202 deletions docs/source/api/raster-functions.rst
@@ -190,6 +190,7 @@ rst_combineavg
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_combineavg_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -229,58 +230,6 @@ rst_combineavg
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_combineavgagg
*****************

.. function:: rst_combineavgagg(tile)

Aggregates the raster tiles in each group into a single raster by averaging the pixel values.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy()\
.agg(mos.rst_combineavgagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy()
.agg(rst_combineavgagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_combineavgagg(tile)
FROM table
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+


rst_derivedband
***************
@@ -295,6 +244,7 @@ rst_derivedband
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_derivedband_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -364,96 +314,6 @@ rst_derivedband
+----------------------------------------------------------------------------------------------------------------+

rst_derivedbandagg
******************

.. function:: rst_derivedbandagg(tile, python_func, func_name)

Aggregates the raster tiles in each group into a single raster by applying the provided Python function.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tile(s).
:type tile: Column (RasterTileType)
:param python_func: A function to evaluate in python.
:type python_func: Column (StringType)
:param func_name: name of the function to evaluate in python.
:type func_name: Column (StringType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

from textwrap import dedent
from pyspark.sql import functions as F

# Carry the pixel function source and its name as literal columns,
# then group by them so they can be passed to the aggregator.
df\
.select(
"date", "tile",
F.lit(dedent(
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""")).alias("py_func1"),
F.lit("average").alias("func1_name")
)\
.groupBy("date", "py_func1", "func1_name")\
.agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
.. code-tab:: scala

df
.select(
col("date"), col("tile"),
lit(
"""
|import numpy as np
|def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
|    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
|""".stripMargin).as("py_func1"),
lit("average").as("func1_name")
)
.groupBy("date", "py_func1", "func1_name")
.agg(rst_derivedbandagg(col("tile"), col("py_func1"), col("func1_name"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT
date, py_func1, func1_name,
rst_derivedbandagg(tile, py_func1, func1_name)
FROM (
SELECT
date, tile,
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""" as py_func1,
"average" as func1_name
FROM table
)
GROUP BY date, py_func1, func1_name
LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_frombands
**************

@@ -527,6 +387,7 @@ rst_fromcontent

.. tabs::
.. code-tab:: py

# binary is python bytearray data type
df = spark.read.format("binaryFile")\
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\
@@ -538,6 +399,7 @@ rst_fromcontent
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

//binary is scala/java Array(Byte) data type
val df = spark.read
.format("binaryFile")
@@ -910,9 +772,12 @@ rst_mapalgebra
Here are examples of the json_spec': (1) shows default indexing, (2) shows reusing an index,
and (3) shows band indexing.
(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'

.. code-block:: text
(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'
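As a rough illustration, a json_spec string in any of the three forms above could be assembled with Python's standard ``json`` module rather than typed by hand. The helper name ``build_json_spec`` is hypothetical and not part of Mosaic; only the ``calc``, ``*_index``, and ``*_band`` keys come from the examples above.

```python
import json


def build_json_spec(calc, indexes=None, bands=None):
    """Hypothetical helper (not part of Mosaic): build a json_spec string.

    calc    -- the expression, e.g. "A+B/C"
    indexes -- optional mapping of variable name to tile index, e.g. {"A": 0}
    bands   -- optional mapping of variable name to band number, e.g. {"A": 1}
    """
    spec = {"calc": calc}
    # Emit "<name>_index" keys first, then "<name>_band" keys,
    # matching the key order used in the examples above.
    for name, index in (indexes or {}).items():
        spec[f"{name}_index"] = index
    for name, band in (bands or {}).items():
        spec[f"{name}_band"] = band
    return json.dumps(spec)


# (1) default indexing, (2) reusing an index, (3) band indexing
spec_default = build_json_spec("A+B/C")
spec_reuse = build_json_spec("A+B/C", indexes={"A": 0, "B": 1, "C": 1})
spec_bands = build_json_spec(
    "A+B/C",
    indexes={"A": 0, "B": 1, "C": 2},
    bands={"A": 1, "B": 1, "C": 1},
)
```

Building the string from a dictionary keeps the quoting valid and lets the spec be round-tripped with ``json.loads`` before it is passed on.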
:param tile: A column containing the raster tile.
:type tile: Column (RasterTileType)
@@ -1011,6 +876,7 @@ rst_merge
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_merge_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -1048,63 +914,6 @@ rst_merge
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_mergeagg
************

.. function:: rst_mergeagg(tile)

Combines a grouped aggregate of raster tiles into a single raster.
The rasters do not need to have the same extent.
The rasters must have the same coordinate reference system.
The rasters are combined using gdalwarp.
The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster.
The rasters are stacked in the order they are provided.
This order is randomized since this is an aggregation function.
If the order of rasters is important please first collect rasters and sort them by metadata information and then use
rst_merge function.
The output raster will have the extent covering all input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy("date")\
.agg(mos.rst_mergeagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy("date")
.agg(rst_mergeagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_mergeagg(tile)
FROM table
GROUP BY date
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_metadata
*************