From 8f452d2ab582e838b4025f04d63af581f4ec9e21 Mon Sep 17 00:00:00 2001
From: Kristin Cowalcijk
Date: Thu, 21 Mar 2024 12:38:55 +0800
Subject: [PATCH] Update documentation

---
 docs/setup/compile.md   | 19 ++++++++++++++-----
 docs/tutorial/raster.md | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/docs/setup/compile.md b/docs/setup/compile.md
index 417038c279..49e627775d 100644
--- a/docs/setup/compile.md
+++ b/docs/setup/compile.md
@@ -73,11 +73,20 @@ For example,
 export SPARK_HOME=$PWD/spark-3.0.1-bin-hadoop2.7
 export PYTHONPATH=$SPARK_HOME/python
 ```
-2. Compile the Sedona Scala and Java code with `-Dgeotools` and then copy the ==sedona-spark-shaded-{{ sedona.current_version }}.jar== to ==SPARK_HOME/jars/== folder.
+2. Put the JAI jars in the ==SPARK_HOME/jars/== folder.
+```
+export JAI_CORE_VERSION="1.1.3"
+export JAI_CODEC_VERSION="1.1.3"
+export JAI_IMAGEIO_VERSION="1.1"
+wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_core/${JAI_CORE_VERSION}/jai_core-${JAI_CORE_VERSION}.jar
+wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_codec/${JAI_CODEC_VERSION}/jai_codec-${JAI_CODEC_VERSION}.jar
+wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_imageio/${JAI_IMAGEIO_VERSION}/jai_imageio-${JAI_IMAGEIO_VERSION}.jar
+```
+3. Compile the Sedona Scala and Java code with `-Dgeotools` and then copy the ==sedona-spark-shaded-{{ sedona.current_version }}.jar== to the ==SPARK_HOME/jars/== folder.
 ```
 cp spark-shaded/target/sedona-spark-shaded-xxx.jar $SPARK_HOME/jars/
 ```
-3. Install the following libraries
+4. Install the following libraries
 ```
 sudo apt-get -y install python3-pip python-dev libgeos-dev
 sudo pip3 install -U setuptools
@@ -86,12 +95,12 @@ sudo pip3 install -U virtualenvwrapper
 sudo pip3 install -U pipenv
 ```
 Homebrew can be used to install libgeos-dev in macOS: `brew install geos`
-4. Set up pipenv to the desired Python version: 3.7, 3.8, or 3.9
+5. Set up pipenv with the desired Python version: 3.7, 3.8, or 3.9
 ```
 cd python
 pipenv --python 3.7
 ```
-5. Install the PySpark version and the other dependency
+6. Install PySpark and the other dependencies
 ```
 cd python
 pipenv install pyspark
@@ -99,7 +108,7 @@ pipenv install --dev
 ```
 `pipenv install pyspark` installs the latest version of pyspark.
 In order to remain consistent with the installed spark version, use `pipenv install pyspark==`
-6. Run the Python tests
+7. Run the Python tests
 ```
 cd python
 pipenv run python setup.py build_ext --inplace
diff --git a/docs/tutorial/raster.md b/docs/tutorial/raster.md
index ee4a922e2f..b95141b6d4 100644
--- a/docs/tutorial/raster.md
+++ b/docs/tutorial/raster.md
@@ -583,6 +583,44 @@ SELECT RS_AsPNG(raster)
 
 Please refer to [Raster writer docs](../../api/sql/Raster-writer) for more details.
 
+## Collecting raster DataFrames and working with them locally in Python
+
+Since `v1.6.0`, Sedona allows collecting DataFrames with raster columns and working with them locally in Python.
+The raster objects are represented as `SedonaRaster` objects in Python, which can be used to perform raster operations.
+
+```python
+df_raster = sedona.read.format("binaryFile").load("/path/to/raster.tif").selectExpr("RS_FromGeoTiff(content) as rast")
+rows = df_raster.collect()
+raster = rows[0].rast
+raster #
+```
+
+You can retrieve the metadata of the raster by accessing the properties of the `SedonaRaster` object.
+
+```python
+raster.width # width of the raster
+raster.height # height of the raster
+raster.affine_trans # affine transformation matrix
+raster.crs_wkt # coordinate reference system as WKT
+```
+
+You can get a numpy array containing the band data of the raster using the `as_numpy` or `as_numpy_masked` methods. The
+band data is organized in CHW (band, row, column) order.
+
+```python
+raster.as_numpy() # numpy array of the raster
+raster.as_numpy_masked() # numpy array with nodata values masked as NaN
+```
+
+If you want to work with the raster data using `rasterio`, you can retrieve a `rasterio.DatasetReader` object using the
+`as_rasterio` method.
+
+```python
+ds = raster.as_rasterio() # rasterio.DatasetReader object
+# Work with the raster using rasterio
+band1 = ds.read(1) # read the first band
+```
+
 ## Performance optimization
 
 When working with large raster datasets, refer to the [documentation on storing raster geometries in Parquet format](../storing-blobs-in-parquet) for recommendations to optimize performance.
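
The CHW layout and NaN masking described in the raster tutorial can be tried with plain NumPy, without a Spark session. This is a minimal sketch under stated assumptions: a synthetic array stands in for the output of `as_numpy()` / `as_numpy_masked()`, and a nodata value of 0 is assumed purely for illustration. It moves the band axis last (HWC), which many plotting libraries expect, and uses NaN-aware reductions so masked pixels are skipped.

```python
import numpy as np

# Synthetic stand-in for raster.as_numpy(): 3 bands, 2 rows, 4 columns,
# laid out in CHW (band, row, column) order. Values are illustrative only.
chw = np.arange(3 * 2 * 4, dtype=np.float64).reshape(3, 2, 4)

# Many image/plotting libraries expect HWC (row, column, band) order,
# so move the band axis from first to last.
hwc = np.transpose(chw, (1, 2, 0))

# Stand-in for raster.as_numpy_masked(): pixels equal to the nodata value
# become NaN. The nodata value of 0 is an assumption for this sketch.
nodata = 0.0
masked = np.where(chw == nodata, np.nan, chw)

# NaN-aware reductions ignore the masked pixels, e.g. a per-band mean
# computed over the row and column axes.
band_means = np.nanmean(masked, axis=(1, 2))
```

With a real raster, the array returned by `raster.as_numpy_masked()` would take the place of `masked`, assuming it is a floating-point array with NaN at nodata pixels as described above; the same `np.nan*` reductions then skip those pixels.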