Commit

adjustments to docs from initial PR merge. Still need to address st_geomfrom* and st_srid, st_setsrid, and st_transform.
mjohns-databricks committed Jan 24, 2024
1 parent dbd037d commit 2578d2d
Showing 7 changed files with 246 additions and 226 deletions.
213 changes: 11 additions & 202 deletions docs/source/api/raster-functions.rst
@@ -190,6 +190,7 @@ rst_combineavg
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_combineavg_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
Expand Down Expand Up @@ -229,58 +230,6 @@ rst_combineavg
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_combineavgagg
*****************

.. function:: rst_combineavgagg(tile)

Aggregates the raster tiles in each group into a single tile by averaging the pixel values.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy()\
.agg(mos.rst_combineavgagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy()
.agg(rst_combineavgagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_combineavgagg(tile)
FROM table
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
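
The averaging that ``rst_combineavgagg`` describes can be pictured with a small, self-contained sketch. This is plain Python on nested lists, not Mosaic code: the helper name ``average_tiles`` and the list-of-rows tile layout are assumptions made purely for illustration, and NoData handling is deliberately left out.

```python
from typing import List

Tile = List[List[float]]  # hypothetical stand-in for one single-band raster tile


def average_tiles(tiles: List[Tile]) -> Tile:
    """Average same-shaped tiles pixel by pixel."""
    if not tiles:
        raise ValueError("at least one tile is required")
    rows, cols = len(tiles[0]), len(tiles[0][0])
    for tile in tiles:
        # Mirrors the documented requirement that all rasters share the same extent.
        if len(tile) != rows or any(len(row) != cols for row in tile):
            raise ValueError("all tiles must have the same shape")
    return [
        [sum(tile[r][c] for tile in tiles) / len(tiles) for c in range(cols)]
        for r in range(rows)
    ]


# Two 2x2 tiles that belong to the same group.
group = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
averaged = average_tiles(group)
```

Each output pixel is the mean of the pixels at the same position across the group, which is why the inputs must agree on extent, pixel size, and band count.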


rst_derivedband
***************
@@ -295,6 +244,7 @@ rst_derivedband
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_derivedband_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
Expand Down Expand Up @@ -364,96 +314,6 @@ rst_derivedband
+----------------------------------------------------------------------------------------------------------------+

rst_derivedbandagg
******************

.. function:: rst_derivedbandagg(tile, python_func, func_name)

Aggregates the raster tiles in each group into a single tile by applying the provided Python function.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tile(s).
:type tile: Column (RasterTileType)
:param python_func: The source code of the function to evaluate in Python.
:type python_func: Column (StringType)
:param func_name: The name of the function to evaluate in Python.
:type func_name: Column (StringType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

from textwrap import dedent
df\
.select(
"date", "tile",
F.lit(dedent(
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""")).alias("py_func1"),
F.lit("average").alias("func1_name")
)\
.groupBy("date", "py_func1", "func1_name")\
.agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df
.select(
col("date"), col("tile"),
lit(
"""
|import numpy as np
|def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
|    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
|""".stripMargin).as("py_func1"),
lit("average").as("func1_name")
)
.groupBy("date", "py_func1", "func1_name")
.agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT
date, py_func1, func1_name,
rst_derivedbandagg(tile, py_func1, func1_name)
FROM (
SELECT
date, tile,
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""" as py_func1,
"average" as func1_name
FROM table
)
GROUP BY date, py_func1, func1_name
LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
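
Because ``python_func`` and ``func_name`` are passed as two separate strings, a mismatch between them is easy to introduce. The sketch below is one way to check, before building the aggregation, that the source string really defines a top-level function with the expected name. It uses only the standard library; ``defines_function`` is a hypothetical helper for illustration and is not part of Mosaic.

```python
import ast
from textwrap import dedent

# Same pixel function source as in the examples above.
PY_FUNC = dedent(
    """
    import numpy as np
    def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
    """
)


def defines_function(source: str, name: str) -> bool:
    """Return True if `source` parses and defines a top-level function `name`."""
    tree = ast.parse(source)  # raises SyntaxError if the source is malformed
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in tree.body
    )


ok = defines_function(PY_FUNC, "average")
missing = defines_function(PY_FUNC, "median")
```

Parsing with ``ast`` does not import or execute anything, so the check works even on a machine where ``numpy`` is not installed.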

rst_frombands
**************

Expand Down Expand Up @@ -527,6 +387,7 @@ rst_fromcontent

.. tabs::
.. code-tab:: py

# binary is python bytearray data type
df = spark.read.format("binaryFile")\
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\
@@ -538,6 +399,7 @@ rst_fromcontent
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

//binary is scala/java Array(Byte) data type
val df = spark.read
.format("binaryFile")
Expand Down Expand Up @@ -910,9 +772,12 @@ rst_mapalgebra
Here are examples of the json_spec: (1) shows default indexing, (2) shows reusing an index,
and (3) shows band indexing.
(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'

.. code-block:: text

(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'

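
Writing the json_spec string by hand invites quoting mistakes, so it may be easier to build it from a dictionary. The sketch below does that with the standard ``json`` module; ``make_json_spec`` is a hypothetical helper introduced here for illustration, and only the key names shown in the examples above (``calc``, ``X_index``, ``X_band``) are taken from the documentation.

```python
import json
from typing import Dict, Optional


def make_json_spec(
    calc: str,
    indexes: Optional[Dict[str, int]] = None,
    bands: Optional[Dict[str, int]] = None,
) -> str:
    """Build a json_spec string such as '{"calc": "A+B/C", "A_index": 0}'."""
    spec = {"calc": calc}
    for var, idx in (indexes or {}).items():
        spec[f"{var}_index"] = idx  # which tile in the array the variable refers to
    for var, band in (bands or {}).items():
        spec[f"{var}_band"] = band  # which band of that tile to read
    return json.dumps(spec)


default_spec = make_json_spec("A+B/C")
reuse_spec = make_json_spec("A+B/C", indexes={"A": 0, "B": 1, "C": 1})
band_spec = make_json_spec(
    "A+B/C",
    indexes={"A": 0, "B": 1, "C": 2},
    bands={"A": 1, "B": 1, "C": 1},
)
```

The three resulting strings correspond to examples (1), (2), and (3) and can be passed wherever a json_spec is expected.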
:param tile: A column containing the raster tile.
:type tile: Column (RasterTileType)
Expand Down Expand Up @@ -1011,6 +876,7 @@ rst_merge
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_merge_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
Expand Down Expand Up @@ -1048,63 +914,6 @@ rst_merge
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_mergeagg
************

.. function:: rst_mergeagg(tile)

Combines a grouped aggregate of raster tiles into a single raster.
The rasters do not need to have the same extent.
The rasters must have the same coordinate reference system.
The rasters are combined using gdalwarp.
The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster.
The rasters are stacked in the order they are provided.
This order is randomized since this is an aggregation function.
If the order of the rasters is important, first collect the rasters, sort them by metadata information, and then use
the rst_merge function.
The output raster will have the extent covering all input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy("date")\
.agg(mos.rst_mergeagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy("date")
.agg(rst_mergeagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_mergeagg(tile)
FROM table
GROUP BY date
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
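
The warning about stacking order and NoData can be made concrete with a toy model. The sketch below is plain Python, not Mosaic or GDAL code. It assumes that a later tile overwrites earlier ones wherever it holds a valid pixel, and it simplifies by giving every tile the same shape, whereas ``rst_mergeagg`` accepts differing extents. ``stack_in_order`` and the ``NODATA`` marker are hypothetical names.

```python
from typing import List, Optional

Tile = List[List[Optional[float]]]
NODATA = None  # hypothetical NoData marker for this sketch


def stack_in_order(tiles: List[Tile]) -> Tile:
    """Overlay tiles in the given order; later valid pixels win (assumed behaviour)."""
    rows, cols = len(tiles[0]), len(tiles[0][0])
    out: Tile = [[NODATA] * cols for _ in range(rows)]
    for tile in tiles:
        for r in range(rows):
            for c in range(cols):
                # NoData pixels are skipped so they never hide valid data underneath.
                if tile[r][c] is not NODATA:
                    out[r][c] = tile[r][c]
    return out


a = [[1.0, NODATA]]
b = [[2.0, 3.0]]
a_then_b = stack_in_order([a, b])
b_then_a = stack_in_order([b, a])
```

The two orderings disagree on the overlapping pixel, which is why a randomized aggregation order matters and why sorting followed by ``rst_merge`` is suggested when order is significant. If NoData were not marked, the empty pixel in ``a`` would be treated as a real value and overwrite valid data.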

rst_metadata
*************